[ Python ] Scikit-Learn: a Pipeline that standardizes numeric features and one-hot encodes categorical features, plus modeling code

2019. 6. 15. 18:38 | Analysis Python/Scikit Learn

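from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Numeric features: impute missing values with the median, then standardize
numeric_features = ['age', 'fare']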
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])
# Categorical features, if present: fill missing values with a constant, then one-hot encode
categorical_features = ['embarked', 'sex', 'pclass']
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])

clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('classifier', LogisticRegression(solver='lbfgs'))])

param_grid = {
    'preprocessor__num__imputer__strategy': ['mean', 'median'],
    'classifier__C': [0.1, 1.0, 10, 100],
}

# Note: the iid argument was deprecated in scikit-learn 0.22 and removed in 0.24, so it is dropped here.
#grid_search = GridSearchCV(clf, param_grid, cv=10)
#grid_search.fit(X_train, y_train)
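
The grid search above is commented out because the post does not define X_train and y_train. As a rough sketch of how the whole thing might be run end to end, the block below builds a small synthetic DataFrame with the same column names. The data, the label rule, the train/test split, and cv=5 are assumptions made for illustration, not part of the original post.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data that only mimics the column names used in the post.
rng = np.random.RandomState(0)
n = 200
X = pd.DataFrame({
    'age': rng.uniform(1, 80, n),
    'fare': rng.uniform(5, 500, n),
    'embarked': rng.choice(['S', 'C', 'Q'], n),
    'sex': rng.choice(['male', 'female'], n),
    'pclass': rng.choice(['1', '2', '3'], n),
})
# Blank out a few values so that both imputers have something to fill.
X.loc[rng.choice(n, 20, replace=False), 'age'] = np.nan
X.loc[rng.choice(n, 10, replace=False), 'embarked'] = np.nan
# Made-up label: mostly determined by 'sex', with about 20% of labels flipped.
y = ((X['sex'] == 'female') ^ (rng.rand(n) < 0.2)).astype(int)

numeric_features = ['age', 'fare']
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])

categorical_features = ['embarked', 'sex', 'pclass']
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])

clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('classifier', LogisticRegression(solver='lbfgs'))])

# Keys follow the <step>__<sub-step>__<parameter> naming of nested estimators.
param_grid = {
    'preprocessor__num__imputer__strategy': ['mean', 'median'],
    'classifier__C': [0.1, 1.0, 10, 100],
}

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# cv=5 is an arbitrary choice for this small sample (the post uses cv=10).
grid_search = GridSearchCV(clf, param_grid, cv=5)
grid_search.fit(X_train, y_train)

test_score = grid_search.score(X_test, y_test)
print('best params:', grid_search.best_params_)
print('test accuracy: %.3f' % test_score)
```

Because the preprocessing lives inside the pipeline, each cross-validation fold fits the imputers, scaler, and encoder on its own training portion only, so the validation portion does not leak into the preprocessing statistics.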
 

Column Transformer with Mixed Types — scikit-learn 0.21.2 documentation

This example illustrates how to apply different preprocessing and feature extraction pipelines to different subsets of features, using sklearn.compose.ColumnTransformer.

scikit-learn.org
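
To make "different preprocessing for different subsets of features" concrete, here is a minimal sketch that applies a ColumnTransformer to a four-row DataFrame and inspects the resulting matrix. The tiny DataFrame and its values are invented for illustration only.

```python
import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical four-row sample: two numeric columns and one categorical column.
df = pd.DataFrame({
    'age': [22.0, 38.0, 26.0, 35.0],
    'fare': [7.25, 71.28, 7.92, 53.10],
    'sex': ['male', 'female', 'female', 'male'],
})

preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), ['age', 'fare']),
        ('cat', OneHotEncoder(handle_unknown='ignore'), ['sex'])])

Xt = preprocessor.fit_transform(df)
# The result can be sparse or dense depending on its density, so normalize it.
if sparse.issparse(Xt):
    Xt = Xt.toarray()

# Output columns follow the order of the transformers list:
# 2 standardized numeric columns, then 2 one-hot columns for 'sex'.
print(Xt.shape)
print(Xt)
```

The numeric block comes out with zero mean per column, and each row of the one-hot block contains a single 1, which is the combined feature matrix the classifier step receives.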

 
