# Testing for Normality in Python
## shapiro, normaltest
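The Shapiro-Wilk test (`shapiro`) is said to be the strictest of these normality tests. `normaltest` is the D'Agostino-Pearson test, which is based on skewness and kurtosis. In the code below, `train` is assumed to be a pandas DataFrame and `num_var` the list of its numeric column names.

    from scipy.stats import shapiro, normaltest, anderson, kstest

    alpha = 0.05
    normal = []
    notnormal = []
    for var in num_var:
        values = train[var].dropna().values
        # Keep the two p-values separate; reusing one name would discard the Shapiro result.
        _, p_shapiro = shapiro(values)
        _, p_normaltest = normaltest(values)
        # Treat a column as normal only if neither test rejects at level alpha.
        if p_shapiro > alpha and p_normaltest > alpha:
            normal.append(var)
        else:
            notnormal.append(var)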
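To try the same idea without a real dataset, here is a minimal sketch on synthetic data. The DataFrame `demo`, its column names, and the helper `split_by_normality` are made up for illustration; only `shapiro` and `normaltest` come from SciPy. It also keeps both p-values in a table, which makes it easier to see where the two tests disagree.

```python
import numpy as np
import pandas as pd
from scipy.stats import shapiro, normaltest


def split_by_normality(df, columns, alpha=0.05):
    """Hypothetical helper: return (normal, notnormal, table of p-values)."""
    rows = []
    for col in columns:
        values = df[col].dropna().to_numpy()
        _, p_shapiro = shapiro(values)
        _, p_normaltest = normaltest(values)
        rows.append(
            {"column": col, "p_shapiro": p_shapiro, "p_normaltest": p_normaltest}
        )
    pvalues = pd.DataFrame(rows).set_index("column")
    # A column counts as normal only if neither test rejects at level alpha.
    is_normal = (pvalues > alpha).all(axis=1)
    normal = list(pvalues.index[is_normal])
    notnormal = list(pvalues.index[~is_normal])
    return normal, notnormal, pvalues


# Synthetic example data (an assumption, not from the original post).
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    {
        "gaussian": rng.normal(loc=50, scale=10, size=500),
        "skewed": rng.exponential(scale=2.0, size=500),
    }
)

normal, notnormal, pvalues = split_by_normality(demo, ["gaussian", "skewed"])
print(pvalues)
print("normal:", normal, "| not normal:", notnormal)
```

The strongly skewed exponential column should land in `notnormal`. Where the Gaussian column lands depends on the random draw, since any test at the 5% level rejects a truly normal sample about one time in twenty.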
## kstest
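The Kolmogorov-Smirnov test compares a sample against a reference distribution (or two samples against each other), so it works for any distribution, the normal included. Note that `"norm"` on its own means the standard normal N(0, 1), so either pass the sample mean and standard deviation through `args` or standardize the data first. Because those parameters are estimated from the same data, the resulting p-value is only approximate and tends to be conservative.

    alpha = 0.05
    normal = []
    notnormal = []
    for var in num_var:
        values = train[var].dropna().values
        # "norm" defaults to N(0, 1); compare against a normal fitted to the column instead.
        stat, p = kstest(values, "norm", args=(values.mean(), values.std(ddof=1)))
        if p > alpha:
            normal.append(var)
        else:
            notnormal.append(var)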
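One thing to watch with `kstest(..., "norm")` is that the reference distribution is the standard normal unless told otherwise. The sketch below uses a synthetic sample (an assumption for illustration) that is normal but centered at 50, and compares three calls: the raw one, one passing the estimated mean and standard deviation via `args`, and one that standardizes the data first.

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(0)
sample = rng.normal(loc=50, scale=10, size=500)  # normal, but not N(0, 1)

# Compared with the standard normal, the sample is rejected outright.
_, p_raw = kstest(sample, "norm")

# Option 1: pass the estimated location and scale through `args`.
mu, sigma = sample.mean(), sample.std(ddof=1)
_, p_args = kstest(sample, "norm", args=(mu, sigma))

# Option 2: standardize first, then compare with N(0, 1).
_, p_std = kstest((sample - mu) / sigma, "norm")

print(f"raw: {p_raw:.3g}, args: {p_args:.3g}, standardized: {p_std:.3g}")
```

The two corrected calls are the same comparison written two ways, so their p-values should agree. The raw call rejects normality even though the data are normal, which is the mistake to avoid.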
## anderson
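Like `kstest`, the Anderson-Darling test can be run against distributions other than the normal, though only a fixed set of them. When the reference distribution is known, it allows a finer comparison than the KS test.
Supported values for the `dist` argument are {'norm', 'expon', 'logistic', 'gumbel', 'gumbel_l', 'gumbel_r', 'extreme1'}; the default is 'norm'.

    normal = []
    notnormal = []
    for var in num_var:
        result = anderson(train[var].dropna().values)
        # Count the critical values that the test statistic stays below.
        # For dist="norm" there are five, at the 15%, 10%, 5%, 2.5% and 1% levels.
        n_below = sum(result.statistic < cv for cv in result.critical_values)
        # Requiring at least 3 of 5 is the same as not rejecting at the 5% level,
        # because the critical values grow as the significance level shrinks.
        if n_below >= 3:
            normal.append(var)
        else:
            notnormal.append(var)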
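Unlike the other tests, `anderson` returns a statistic plus a table of critical values rather than a single p-value, so it helps to print the table once. The sketch below runs it on a synthetic exponential sample (an assumption for illustration) and also checks that the "at least 3 of 5 critical values" rule gives the same decision as a plain comparison at the 5% level.

```python
import numpy as np
from scipy.stats import anderson

rng = np.random.default_rng(0)
skewed = rng.exponential(scale=2.0, size=500)  # clearly non-normal sample

result = anderson(skewed, dist="norm")
print(f"statistic = {result.statistic:.3f}")

# Each critical value belongs to one significance level (in percent).
for level, cv in zip(result.significance_level, result.critical_values):
    verdict = "reject" if result.statistic > cv else "fail to reject"
    print(f"{level:>4}% level: critical value {cv:.3f} -> {verdict} normality")

# The "at least 3 of 5" rule versus a direct comparison at the 5% level.
n_below = int(sum(result.statistic < cv for cv in result.critical_values))
idx_5pct = list(result.significance_level).index(5.0)
same_decision = (n_below >= 3) == (
    result.statistic < result.critical_values[idx_5pct]
)
print("decisions agree:", same_decision)
```

For a sample this skewed the statistic sits far above every critical value, so normality is rejected at all five levels.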
For more detail, see the posts below.
6 ways to test for a Normal Distribution — which one to use?
towardsdatascience.com
https://machinelearningmastery.com/a-gentle-introduction-to-normality-tests-in-python/
A Gentle Introduction to Normality Tests in Python