[ Python ] scikitplot: a variety of ready-made metric plots

2019. 10. 19. 17:38 / Analysis Python/Scikit Learn


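When doing analysis, you often end up having to draw several kinds of metric plots.
R provides many of them out of the box, but in Python a surprising number had to be drawn by hand.
This time I want to introduce a package called scikit-plot (imported as scikitplot), which offers a range of ready-made plots.
I may need it later, so here is a quick summary of the basic usage.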

## Multiclass

  1. ROC CURVE
  2. Confusion Matrix Plot
  3. silhouette
  4. precision_recall

## Binary 

  1. lift_curve
  2. cumulative_gain
  3. ks_statistic
  4. calibration_curve

## KMeans

  • elbow curve (optimal cluster)

## PCA

  • pca component variance

# Install with: pip install scikit-plot (the import name is scikitplot)
import matplotlib.pyplot as plt
import scikitplot as skplt
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC

## ROC Curve

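# Load the 10-class digits dataset and fit a small random forest.
X, y = load_digits(return_X_y=True)
random_forest_clf = RandomForestClassifier(n_estimators=5, max_depth=5, random_state=1)
rf = random_forest_clf.fit(X, y)
# Class probabilities, shape (n_samples, n_classes); note these are computed on the training data.
y_probas = rf.predict_proba(X)
skplt.metrics.plot_roc(y, y_probas)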
plt.show()
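
One caveat about the snippet above: the model is scored on the same data it was fitted on, so the curves will look better than they would on unseen data. Below is a minimal sketch of the same model evaluated on a held-out split. The 70/30 split and the use of `roc_auc_score` with `multi_class="ovr"` are my own choices for illustration and are not part of the original example.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)

# Hold out 30% of the samples (an arbitrary, illustrative split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

clf = RandomForestClassifier(n_estimators=5, max_depth=5, random_state=1)
clf.fit(X_train, y_train)

# Probabilities for samples the model has not seen during fitting.
test_probas = clf.predict_proba(X_test)

# Macro-averaged one-vs-rest AUC over the 10 digit classes.
macro_auc = roc_auc_score(y_test, test_probas, multi_class="ovr", average="macro")
print(f"held-out macro one-vs-rest AUC: {macro_auc:.3f}")
```

The same `y_test` and `test_probas` could then be passed to `skplt.metrics.plot_roc` in place of `y` and `y_probas`.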

## ConfusionMatrix

predictions = rf.predict(X)
# normalize=True shows proportions instead of raw counts.
skplt.metrics.plot_confusion_matrix(y, predictions,
                                    normalize=True)
plt.show()
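
To make the effect of `normalize=True` concrete, here is a small pure-Python sketch of a row-normalized confusion matrix. It assumes normalization means dividing each row (true label) by its total; I have not verified scikit-plot's internals line by line, and the function name `normalized_confusion_matrix` and the toy labels are made up for this example.

```python
from collections import Counter


def normalized_confusion_matrix(y_true, y_pred):
    """Row-normalized confusion matrix as nested lists (rows = true labels)."""
    labels = sorted(set(y_true) | set(y_pred))
    counts = Counter(zip(y_true, y_pred))
    matrix = []
    for true_label in labels:
        row = [counts[(true_label, pred_label)] for pred_label in labels]
        total = sum(row)
        # A label that never occurs in y_true keeps an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return labels, matrix


# Toy labels, purely illustrative.
labels, matrix = normalized_confusion_matrix(
    [0, 0, 0, 1, 1, 2],
    [0, 0, 1, 1, 1, 0],
)
for label, row in zip(labels, matrix):
    print(label, [round(v, 2) for v in row])
```

Each row sums to 1 (unless the label has no true samples), so the diagonal can be read as per-class recall.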

 

## silhouette

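# plot_silhouette expects cluster labels; here the classifier's predicted classes stand in for them.
skplt.metrics.plot_silhouette(X, predictions)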
plt.show()

## Precision Recall

skplt.metrics.plot_precision_recall(y, y_probas)
plt.show()

## Binary Class

# Keep only digits 0 and 1 to get a binary problem.
zero_one_index = (y == 0) | (y == 1)
X_1, Y_1 = X[zero_one_index], y[zero_one_index]

## KS Curve

# Refit the same random forest on the binary subset.
rf = random_forest_clf.fit(X_1, Y_1)
y_probas = rf.predict_proba(X_1)
skplt.metrics.plot_ks_statistic(Y_1, y_probas)
plt.show()
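
For reference, the KS statistic is usually defined as the largest gap between the cumulative score distributions of the two classes. The sketch below computes it in plain Python under that definition. The helper `ks_statistic` and the toy data are hypothetical, and scikit-plot's own implementation may differ in details such as how thresholds are chosen.

```python
def ks_statistic(y_true, scores):
    """Largest gap between the score CDFs of the negative and positive class.

    Assumes y_true contains only 0 and 1. Returns (gap, threshold).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")

    best_gap, best_threshold = 0.0, None
    for threshold in sorted(set(scores)):
        # Empirical CDF of each class evaluated at this threshold.
        cdf_pos = sum(s <= threshold for s in pos) / len(pos)
        cdf_neg = sum(s <= threshold for s in neg) / len(neg)
        gap = abs(cdf_neg - cdf_pos)
        if gap > best_gap:
            best_gap, best_threshold = gap, threshold
    return best_gap, best_threshold


# Toy labels and scores, purely illustrative.
y_demo = [0, 0, 0, 1, 0, 1, 1, 1]
s_demo = [0.1, 0.2, 0.3, 0.35, 0.6, 0.7, 0.8, 0.9]
ks, at = ks_statistic(y_demo, s_demo)
print(f"KS = {ks:.2f} at threshold {at}")
```

With the variables from this post, the positive-class column would be passed as the scores, something like `ks_statistic(list(Y_1), list(y_probas[:, 1]))`.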

## Cumulative GAIN

skplt.metrics.plot_cumulative_gain(Y_1, y_probas)
plt.show()

## Lift Curve

skplt.metrics.plot_lift_curve(Y_1, y_probas)
plt.show()
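
The cumulative gain and lift curves are closely related, which a few lines of plain Python can show. The sketch assumes the common definitions: gain is the share of all positives found in the top fraction of samples ranked by score, and lift is that gain divided by the fraction. The function names and toy data are hypothetical, and this is not scikit-plot's code.

```python
def cumulative_gain(y_true, scores, fraction):
    """Share of all positives found in the top `fraction` of samples by score.

    Assumes y_true contains only 0 and 1.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Labels ordered from highest to lowest score.
    ranked = [t for _, t in sorted(zip(scores, y_true),
                                   key=lambda pair: pair[0], reverse=True)]
    total_pos = sum(ranked)
    if total_pos == 0:
        raise ValueError("y_true contains no positives")
    # Rounded sample count, so results are approximate for awkward fractions.
    top_n = max(1, round(fraction * len(ranked)))
    return sum(ranked[:top_n]) / total_pos


def lift(y_true, scores, fraction):
    """Gain relative to random targeting, which would capture `fraction` of positives."""
    return cumulative_gain(y_true, scores, fraction) / fraction


# Toy labels and scores, purely illustrative.
y_demo = [1, 1, 0, 1, 0, 0, 0, 0]
s_demo = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
for frac in (0.25, 0.5, 1.0):
    print(frac, round(cumulative_gain(y_demo, s_demo, frac), 3),
          round(lift(y_demo, s_demo, frac), 3))
```

At a fraction of 1.0 the gain is always 1 and the lift is always 1, which is why both curves converge to the baseline on the right-hand side.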

## calibration_curve

# Fit four classifiers with default settings on the binary subset.
rf = RandomForestClassifier()
lr = LogisticRegression()
nb = GaussianNB()
svm = LinearSVC()
rf_probas = rf.fit(X_1, Y_1).predict_proba(X_1)
lr_probas = lr.fit(X_1, Y_1).predict_proba(X_1)
nb_probas = nb.fit(X_1, Y_1).predict_proba(X_1)
# LinearSVC has no predict_proba, so its decision_function scores are used instead.
svm_scores = svm.fit(X_1, Y_1).decision_function(X_1)
probas_list = [rf_probas, lr_probas, nb_probas, svm_scores]
clf_names = ['Random Forest', 'Logistic Regression',
             'Gaussian Naive Bayes', 'Support Vector Machine']
skplt.metrics.plot_calibration_curve(Y_1,
                                     probas_list,
                                     clf_names)
plt.show()
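
A calibration curve compares, bin by bin, the average predicted probability with the fraction of samples that are actually positive. Here is a rough pure-Python sketch of that binning, assuming equal-width bins on [0, 1]. The helper `calibration_bins`, the choice of two bins, and the toy data are made up for illustration and do not reproduce scikit-plot's exact procedure.

```python
def calibration_bins(y_true, probas, n_bins=10):
    """Per-bin (mean predicted probability, observed positive rate, count).

    Assumes y_true contains only 0 and 1 and probas lie in [0, 1].
    """
    bins = [[] for _ in range(n_bins)]
    for t, p in zip(y_true, probas):
        # Equal-width bins; p == 1.0 is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, t))

    rows = []
    for members in bins:
        if not members:
            continue  # empty bins are skipped
        mean_pred = sum(p for p, _ in members) / len(members)
        frac_pos = sum(t for _, t in members) / len(members)
        rows.append((mean_pred, frac_pos, len(members)))
    return rows


# Toy data, purely illustrative: the observed rate matches the mean prediction in both bins.
y_demo = [0, 0, 0, 1, 0, 1, 1, 1]
p_demo = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
rows = calibration_bins(y_demo, p_demo, n_bins=2)
for mean_pred, frac_pos, count in rows:
    print(f"mean predicted {mean_pred:.2f}, observed {frac_pos:.2f}, n={count}")
```

A well-calibrated model has points close to the diagonal, meaning the two numbers in each bin are about equal.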

## elbow_curve

# plot_elbow_curve refits the estimator once for each value in cluster_ranges.
kmeans = KMeans(random_state=1)
skplt.cluster.plot_elbow_curve(kmeans, X,
                               cluster_ranges=range(1, 30))
plt.show()
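
Roughly speaking, an elbow curve plots the within-cluster sum of squared distances against the number of clusters. The sketch below collects that quantity with plain scikit-learn, assuming `inertia_` is the value of interest; scikit-plot may compute or display it differently. The range 1 to 9 and `n_init=10` are arbitrary choices to keep the example quick.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

X, _ = load_digits(return_X_y=True)

# Sum of squared distances to the nearest centroid, for each candidate k.
inertias = {}
for k in range(1, 10):
    km = KMeans(n_clusters=k, n_init=10, random_state=1).fit(X)
    inertias[k] = km.inertia_

for k, value in inertias.items():
    print(k, round(value))
```

The "elbow" is the k after which adding more clusters reduces the inertia only slowly; picking it is a judgment call rather than an exact rule.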

## PCA Component Variance

pca = PCA(random_state=1)
pca.fit(X)
skplt.decomposition.plot_pca_component_variance(pca)
plt.show()
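
The numbers behind this plot are available directly from the fitted PCA object. As a small sketch, the cumulative explained variance can be used to pick the number of components needed to reach a target. The 95% target here is a hypothetical threshold chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)
pca = PCA(random_state=1).fit(X)

# Running total of the variance explained by the first 1, 2, 3, ... components.
cumulative = np.cumsum(pca.explained_variance_ratio_)

target = 0.95  # hypothetical threshold, not from the original post
# Index of the first component count whose cumulative variance reaches the target.
n_components = int(np.searchsorted(cumulative, target) + 1)
print(f"{n_components} components explain at least {target:.0%} of the variance")
```

Passing that count back as `PCA(n_components=n_components)` would give a reduced representation that keeps roughly the targeted share of variance.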

 

- End -

 

https://scikit-plot.readthedocs.io/en/stable/cluster.html

 
