[TIP / Sklearn] Custom Estimator (Ex: Combined Regressor)
2020-10-27 23:12 · Analysis Python / Scikit Learn
While building an estimator with sklearn, I came across a good reference and want to share it.
The material seems to have been written for an estimator called CombinedRegressor; seeing a Grid Search run over a custom estimator caught my interest, so I am saving it here.
If you ever need to build a custom estimator, this should be a useful reference.
Sklearn compatibility
If we want to achieve full sklearn compatibility (model selection, pipelines, etc.) and also use sklearn’s onboard testing utilities, we have to make some modifications to the estimator:
- we need to add the setters and getters for parameters (we use sklearn’s convention to prefix the parameters with the name and two underscores, i.e. base_regressor__some_param)
- consistent handling of the random state
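As a quick illustration of the double-underscore convention, sklearn's built-in `Pipeline` already exposes nested parameters this way; a minimal sketch using standard sklearn components:

```python
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# sklearn's own composite objects (e.g. Pipeline) expose nested parameters
# with the <component>__<param> double-underscore convention; the custom
# estimator below mimics exactly this naming scheme.
pipe = Pipeline([('scaler', StandardScaler()), ('reg', LinearRegression())])

print('reg__fit_intercept' in pipe.get_params())  # True: nested param exposed
pipe.set_params(reg__fit_intercept=False)         # and settable the same way
print(pipe.get_params()['reg__fit_intercept'])    # False
```

This is the naming scheme `GridSearchCV` relies on, which is why the custom `get_params`/`set_params` below prefix sub-estimator parameters the same way.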
```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression


class CombinedRegressor(BaseEstimator, RegressorMixin):

    def __init__(self, base_regressor=RandomForestRegressor,
                 backup_regressor=LinearRegression,
                 lower=0.1, upper=1.9,
                 random_state=None, **kwargs):
        self.base_regressor = base_regressor()
        self.backup_regressor = backup_regressor()
        self.set_random_state(random_state)
        self.lower = lower
        self.upper = upper
        self.set_params(**kwargs)

    def fit(self, X, y):
        self.base_regressor.fit(X, y)
        self.backup_regressor.fit(X, y)
        return self

    def predict(self, X, y=None):
        y_base = self.base_regressor.predict(X)
        y_backup = self.backup_regressor.predict(X)
        y_pred = np.where(
            (self.lower * y_backup <= y_base)
            & (y_base <= self.upper * y_backup),
            y_base, y_backup)
        return y_pred

    def __repr__(self):
        # not as good as sklearn pretty printing,
        # but shows updated params of subestimator
        return f'CombinedRegressor({self.get_params()})'

    def get_params(self, deep=False, **kwargs):
        base_regressor_params = self.base_regressor.get_params(**kwargs)
        # remove random state as it should be a global param of the estimator
        base_regressor_params.pop('random_state', None)
        base_regressor_params = {
            'base_regressor__' + key: value
            for key, value in base_regressor_params.items()}

        backup_regressor_params = self.backup_regressor.get_params(**kwargs)
        backup_regressor_params.pop('random_state', None)
        backup_regressor_params = {
            'backup_regressor__' + key: value
            for key, value in backup_regressor_params.items()}

        own_params = {
            'lower': self.lower,
            'upper': self.upper,
            'random_state': self.random_state,
        }

        params = {**own_params,
                  **base_regressor_params,
                  **backup_regressor_params}

        if deep:
            params['base_regressor'] = self.base_regressor
            params['backup_regressor'] = self.backup_regressor

        return params

    def set_random_state(self, value):
        self.random_state = value
        if 'random_state' in self.base_regressor.get_params().keys():
            self.base_regressor.set_params(random_state=value)
        # linear reg does not have random state, but just in case..
        if 'random_state' in self.backup_regressor.get_params().keys():
            self.backup_regressor.set_params(random_state=value)

    def set_params(self, **params):
        for key, value in params.items():
            if key.startswith('base_regressor__'):
                trunc_key = {key[len('base_regressor__'):]: value}
                self.base_regressor.set_params(**trunc_key)
            elif key.startswith('backup_regressor__'):
                trunc_key = {key[len('backup_regressor__'):]: value}
                self.backup_regressor.set_params(**trunc_key)
            elif key == 'random_state':
                self.set_random_state(value)
            else:
                # try to fetch old value first to raise AttributeError
                # if it does not exist
                old_value = getattr(self, key)
                setattr(self, key, value)
        # set_params needs to return self to make gridsearch work
        return self

    def _more_tags(self):
        # no_validation added because validation is happening
        # within built-in sklearn estimators
        return {**self.base_regressor._more_tags(), 'no_validation': True}
```
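To make the fallback rule in `predict` concrete, here is the same `np.where` logic run in isolation on made-up prediction vectors (all values are illustrative only):

```python
import numpy as np

# hypothetical predictions from the two sub-estimators (made-up numbers)
y_base = np.array([1.0, 5.0, 2.0])    # e.g. random forest output
y_backup = np.array([1.1, 2.0, 2.0])  # e.g. linear regression output
lower, upper = 0.1, 1.9

# keep the base prediction only where it lies within
# [lower * backup, upper * backup]; otherwise fall back to the backup
y_pred = np.where((lower * y_backup <= y_base)
                  & (y_base <= upper * y_backup),
                  y_base, y_backup)
print(y_pred)  # -> [1. 2. 2.]  (5.0 exceeds 1.9 * 2.0, so the backup is used)
```

The middle element shows the point of the estimator: when the base model's prediction drifts too far from the backup model's, the backup value wins.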
```python
from sklearn.model_selection import GridSearchCV

# X, y: any regression training data, e.g.
# from sklearn.datasets import make_regression
# X, y = make_regression(n_samples=100, random_state=0)

cv = GridSearchCV(CombinedRegressor(),
                  param_grid={'base_regressor__n_estimators': [50, 100]},
                  verbose=1)
cv.fit(X, y)
cv.best_params_
```
Check estimator
```python
from sklearn.utils.estimator_checks import check_estimator

# run all checks at once
check_estimator(CombinedRegressor())

# or iterate over the individual checks
for estimator, check in check_estimator(CombinedRegressor(),
                                        generate_only=True):
    print(f'Running {check}')
    check(estimator)
```
To bypass a specific validation check
```python
import mock
from sklearn.utils.estimator_checks import check_estimator

# patch out a single check so the remaining suite still runs
# (renamed the context variable so it does not shadow the mock module)
with mock.patch(
        'sklearn.utils.estimator_checks.check_estimators_data_not_an_array',
        return_value=True) as mock_check:
    check_estimator(CombinedRegressor())
```