
While building an estimator with sklearn, I came across a good reference and am sharing it here.
The material was apparently written for an estimator called CombinedRegressor; what caught my attention is that the custom estimator can even be run through Grid Search, so I am saving it for later.
If you need to build a custom estimator, this should be a useful reference.
Sklearn compatibility
If we want to achieve full sklearn compatibility (model selection, pipelines, etc.) and also use sklearn’s onboard testing utilities, we have to make some modifications to the estimator:
- we need to add setters and getters for the parameters (we use sklearn’s convention of prefixing nested parameters with the component name and two underscores, i.e. base_regressor__some_param)
- consistent handling of the random state
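As a quick sketch of that double-underscore convention using only stock sklearn classes (the step names `scaler` and `ridge` here are arbitrary choices for illustration):

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

# sklearn addresses nested parameters as <component name>__<param name>
pipe = Pipeline([('scaler', StandardScaler()), ('ridge', Ridge())])
pipe.set_params(ridge__alpha=0.5)
print(pipe.get_params()['ridge__alpha'])  # 0.5
```

This is exactly the behavior the custom get_params/set_params below reproduces for base_regressor and backup_regressor.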
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

class CombinedRegressor(BaseEstimator, RegressorMixin):
    def __init__(self,
                 base_regressor=RandomForestRegressor,
                 backup_regressor=LinearRegression,
                 lower=0.1,
                 upper=1.9,
                 random_state=None,
                 **kwargs):
        self.base_regressor = base_regressor()
        self.backup_regressor = backup_regressor()
        self.set_random_state(random_state)
        self.lower = lower
        self.upper = upper
        self.set_params(**kwargs)

    def fit(self, X, y):
        self.base_regressor.fit(X, y)
        self.backup_regressor.fit(X, y)
        return self

    def predict(self, X, y=None):
        y_base = self.base_regressor.predict(X)
        y_backup = self.backup_regressor.predict(X)
        # fall back to the backup prediction whenever the base prediction
        # leaves the [lower * y_backup, upper * y_backup] band
        y_pred = np.where((self.lower * y_backup <= y_base) & (y_base <= self.upper * y_backup),
                          y_base,
                          y_backup)
        return y_pred

    def __repr__(self):
        # not as good as sklearn pretty printing,
        # but shows updated params of subestimator
        return f'CombinedRegressor({self.get_params()})'

    def get_params(self, deep=False, **kwargs):
        base_regressor_params = self.base_regressor.get_params(**kwargs)
        # remove random state as it should be a global param of the estimator
        base_regressor_params.pop('random_state', None)
        base_regressor_params = {'base_regressor__' + key: value
                                 for key, value
                                 in base_regressor_params.items()}
        backup_regressor_params = self.backup_regressor.get_params(**kwargs)
        backup_regressor_params.pop('random_state', None)
        backup_regressor_params = {'backup_regressor__' + key: value
                                   for key, value
                                   in backup_regressor_params.items()}
        own_params = {
            'lower': self.lower,
            'upper': self.upper,
            'random_state': self.random_state
        }
        params = {**own_params,
                  **base_regressor_params,
                  **backup_regressor_params,
                  }
        if deep:
            params['base_regressor'] = self.base_regressor
            params['backup_regressor'] = self.backup_regressor
        return params

    def set_random_state(self, value):
        self.random_state = value
        if 'random_state' in self.base_regressor.get_params().keys():
            self.base_regressor.set_params(random_state=value)
        # linear regression does not have a random state, but just in case..
        if 'random_state' in self.backup_regressor.get_params().keys():
            self.backup_regressor.set_params(random_state=value)

    def set_params(self, **params):
        for key, value in params.items():
            if key.startswith('base_regressor__'):
                trunc_key = {key[len('base_regressor__'):]: value}
                self.base_regressor.set_params(**trunc_key)
            elif key.startswith('backup_regressor__'):
                trunc_key = {key[len('backup_regressor__'):]: value}
                self.backup_regressor.set_params(**trunc_key)
            elif key == 'random_state':
                self.set_random_state(value)
            else:
                # try to fetch the old value first to raise an
                # AttributeError if it does not exist
                old_value = getattr(self, key)
                setattr(self, key, value)
        # set_params needs to return self to make grid search work
        return self

    def _more_tags(self):
        # no_validation added because validation happens
        # within the built-in sklearn estimators
        return {**self.base_regressor._more_tags(), 'no_validation': True}

from sklearn.model_selection import GridSearchCV
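The snippets below reference X and y without defining them; here is a minimal sketch of toy data (make_regression and the sample sizes are my choice, not from the original post) so the grid search has something to fit:

```python
from sklearn.datasets import make_regression

# small synthetic regression problem; variable names X, y match the snippets below
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
print(X.shape, y.shape)  # (200, 5) (200,)
```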
cv = GridSearchCV(CombinedRegressor(),
                  param_grid={'base_regressor__n_estimators': [50, 100]},
                  verbose=1)
cv.fit(X, y)
cv.best_params_

Check estimator
from sklearn.utils.estimator_checks import check_estimator

# run all checks at once
check_estimator(CombinedRegressor())

# or iterate over the individual checks
for estimator, check in check_estimator(CombinedRegressor(), generate_only=True):
    print(f'Running {check}')
    check(estimator)

To satisfy a specific validation
import mock
from sklearn.utils.estimator_checks import check_estimator

# patch a single problematic check away; renamed the `as` target so it
# does not shadow the mock module itself
with mock.patch('sklearn.utils.estimator_checks.check_estimators_data_not_an_array',
                return_value=True) as mocked_check:
    check_estimator(CombinedRegressor())
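For reference, the same patching pattern is available in the standard library as unittest.mock (the standalone `mock` package predates it); a minimal sketch of how patch swaps a target by its dotted import path, here using os.getcwd purely as a stand-in for the sklearn check:

```python
from unittest import mock
import os

# inside the context manager, the patched target returns the stubbed value
with mock.patch('os.getcwd', return_value='/fake/dir'):
    patched = os.getcwd()
# outside, the original function is restored
restored = os.getcwd()
print(patched)  # /fake/dir
```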
Source: Sebastian Telsemeyer, "Combining tree based models with a linear baseline model to improve extrapolation (writing your own sklearn functions, part 1)", October 30, 2019, blog.telsemeyer.com (also on towardsdatascience.com).