Why using a mean for missing data is a bad idea. Alternative imputation algorithms.
2019. 6. 30. 23:02ㆍPost candidates
The part that impressed me most is this:
Mean imputation reduces the variance of the data
The variance was reduced after using mean imputation (the change is that large because the dataset is very small). Going deeper into the mathematics, a smaller variance leads to a narrower confidence interval in the probability distribution.
Mean imputation shrinks the variance, and when the variance shrinks, the confidence interval gets narrower.
In other words, it can bias the model!
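To see the effect for myself, here is a minimal sketch in Python with NumPy. The data are made up (200 normal draws, with 30% removed completely at random), and the seed, the sample size, and the helper name ci_half_width are my own assumptions, not something from the quoted article. Because every imputed value sits exactly on the observed mean, the sum of squared deviations does not change while the sample size grows, so the sample variance shrinks by the factor (n_observed - 1) / (n_total - 1).

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice

# Hypothetical data: 200 draws from a normal distribution
x = rng.normal(loc=50.0, scale=10.0, size=200)

# Remove about 30% of the values completely at random (MCAR)
missing = rng.random(x.size) < 0.3
observed = x[~missing]

# Mean imputation: fill every missing slot with the observed mean
imputed = x.copy()
imputed[missing] = observed.mean()


def ci_half_width(values):
    """Half-width of a normal-approximation 95% CI for the mean."""
    return 1.96 * values.std(ddof=1) / np.sqrt(values.size)


var_observed = observed.var(ddof=1)
var_imputed = imputed.var(ddof=1)

print("variance, observed only :", var_observed)
print("variance, mean-imputed  :", var_imputed)
print("CI half-width, observed :", ci_half_width(observed))
print("CI half-width, imputed  :", ci_half_width(imputed))
```

The imputed column reports a smaller variance and a tighter interval even though no new information was added, which is exactly the false confidence the quote warns about.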
A good explanation of MAR, MCAR, and MNAR:
https://towardsdatascience.com/how-to-handle-missing-data-8646b18db0d4