Why using a mean for missing data is a bad idea. Alternative imputation algorithms.
2019. 6. 30. 23:02 · Post candidates
Why using a mean for missing data is a bad idea. Alternative imputation algorithms.
We all know the pain when the dataset we want to use for Machine Learning contains missing data. The quick and easy workaround is to…
towardsdatascience.com
The part that struck me most is this:
Mean reduces a variance of the data
the variance was reduced (that big change is because the dataset is very small) after using the Mean Imputation. Going deeper into mathematics, a smaller variance leads to the narrower confidence interval in the probability distribution
Mean imputation shrinks the variance, and a smaller variance makes the confidence interval narrower.
In other words, it can bias the model!
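To see the effect for myself, here is a minimal sketch using only the Python standard library. The toy data (20 Gaussian draws, 8 of them hidden at random) and the helper name `mean_impute` are my own assumptions for illustration and are not taken from the article. The imputed values sit exactly on the mean, so they add nothing to the sum of squared deviations while the sample size grows, which is why the variance drops.

```python
import random
import statistics


def mean_impute(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical toy dataset: 20 draws from a normal distribution.
complete = [random.gauss(50, 10) for _ in range(20)]

# Hide 8 of the 20 values completely at random.
missing_idx = set(random.sample(range(len(complete)), 8))
with_gaps = [None if i in missing_idx else v for i, v in enumerate(complete)]

observed = [v for v in with_gaps if v is not None]
imputed = mean_impute(with_gaps)

# statistics.variance is the sample variance (divisor: count - 1).
var_observed = statistics.variance(observed)
var_imputed = statistics.variance(imputed)

# Standard error of the mean, which sets the width of the confidence interval.
se_observed = (var_observed / len(observed)) ** 0.5
se_imputed = (var_imputed / len(imputed)) ** 0.5

print(f"variance, observed values only:  {var_observed:.2f}")
print(f"variance, after mean imputation: {var_imputed:.2f}")
print(f"standard error, observed only:   {se_observed:.2f}")
print(f"standard error, after imputation: {se_imputed:.2f}")
```

With n observed values out of N in total, the sample variance after imputation is the observed variance multiplied by (n - 1) / (N - 1), here 11 / 19. The standard error shrinks even more because the smaller variance is also divided by the larger N, so the confidence interval looks tighter than the data can justify.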
A good explanation of MAR, MCAR, and MNAR:
https://towardsdatascience.com/how-to-handle-missing-data-8646b18db0d4
How to Handle Missing Data
“The idea of imputation is both seductive and dangerous” (R.J.A Little & D.B. Rubin)
towardsdatascience.com