Why using a mean for missing data is a bad idea. Alternative imputation algorithms.


2019. 6. 30. 23:02 · Post candidates

https://towardsdatascience.com/why-using-a-mean-for-missing-data-is-a-bad-idea-alternative-imputation-algorithms-837c731c1008

We all know the pain when the dataset we want to use for Machine Learning contains missing data. The quick and easy workaround is to…

towardsdatascience.com

The part that struck me most is this:

Mean reduces a variance of the data

the variance was reduced (that big change is because the dataset is very small) after using the Mean Imputation. Going deeper into mathematics, a smaller variance leads to the narrower confidence interval in the probability distribution

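Mean imputation shrinks the variance, and a smaller variance makes the confidence interval narrower than the data really justify.

In other words, it can bias the model!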
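To see the effect for myself, here is a minimal sketch using only the Python standard library. The toy data, the seed, and the number of dropped values are all my own assumptions, not figures from the article, and the interval uses a rough normal approximation (1.96 standard errors).

```python
import math
import random
import statistics

random.seed(0)

# Hypothetical toy data (an assumption for illustration): 20 normal draws.
values = [random.gauss(50, 10) for _ in range(20)]

# Pretend 8 of the 20 entries are missing, chosen at random.
missing_idx = set(random.sample(range(len(values)), 8))
observed = [v for i, v in enumerate(values) if i not in missing_idx]

# Mean imputation: every missing entry is replaced by the observed mean.
mean_obs = statistics.mean(observed)
imputed = [mean_obs if i in missing_idx else v for i, v in enumerate(values)]

# Sample variance before and after imputation.
var_observed = statistics.variance(observed)
var_imputed = statistics.variance(imputed)

# Naive 95% half-width for the mean (normal approximation).
half_observed = 1.96 * math.sqrt(var_observed / len(observed))
half_imputed = 1.96 * math.sqrt(var_imputed / len(imputed))

print("variance, observed only :", var_observed)
print("variance, mean-imputed  :", var_imputed)
print("CI half-width, observed :", half_observed)
print("CI half-width, imputed  :", half_imputed)
```

The imputed points sit exactly on the mean, so they add nothing to the sum of squared deviations while still increasing the sample size. The variance therefore shrinks by the factor (12 - 1) / (20 - 1), and the interval looks tighter even though no new information was added.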

A good explanation of MAR, MCAR, and MNAR:

https://towardsdatascience.com/how-to-handle-missing-data-8646b18db0d4

How to Handle Missing Data

“The idea of imputation is both seductive and dangerous” (R.J.A Little & D.B. Rubin)

towardsdatascience.com
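As a note to self, the three mechanisms can be simulated in a few lines. This is only a sketch under my own assumptions: the age and income columns, the drop probabilities, and the helper `observed_mean` are hypothetical and do not come from the linked article. MCAR drops values with the same probability everywhere, MAR lets the drop probability depend on a column that is fully observed (age), and MNAR lets it depend on the missing value itself (income).

```python
import random
import statistics

random.seed(1)

# Hypothetical toy data: income rises with age, plus noise.
n = 2000
ages = [random.uniform(20, 60) for _ in range(n)]
incomes = [1000 + 50 * a + random.gauss(0, 300) for a in ages]
median_age = statistics.median(ages)
median_income = statistics.median(incomes)
full_mean = statistics.mean(incomes)

def observed_mean(keep):
    """Mean income over the rows that were not dropped (hypothetical helper)."""
    return statistics.mean(x for x, k in zip(incomes, keep) if k)

# MCAR: every income has the same 30% chance of being missing.
keep_mcar = [random.random() > 0.3 for _ in range(n)]

# MAR: the chance of missing income depends only on the observed age.
keep_mar = [random.random() > (0.6 if a > median_age else 0.1) for a in ages]

# MNAR: the chance of missing income depends on the income itself.
keep_mnar = [random.random() > (0.6 if x > median_income else 0.1) for x in incomes]

print("full mean :", full_mean)
print("MCAR mean :", observed_mean(keep_mcar))
print("MAR mean  :", observed_mean(keep_mar))
print("MNAR mean :", observed_mean(keep_mnar))
```

In this setup the MCAR mean stays close to the full mean, while the MAR and MNAR means drift downward because higher incomes are dropped more often. The difference is that under MAR the drift can in principle be corrected using age, since age is fully observed, whereas under MNAR the cause of the missingness is exactly the value we cannot see.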
