Testing for Normality in Python

데이터분석뉴비 2020. 3. 22. 20:09

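## shapiro, normaltest

The Shapiro-Wilk test (`shapiro`) is said to be the strictest of these normality tests. `normaltest` is the D'Agostino-Pearson test, which is based on skewness and kurtosis.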

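from scipy.stats import shapiro, normaltest, anderson, kstest

# `train` is assumed to be a pandas DataFrame and `num_var` a list of its numeric column names.
normal = []
notnormal = []
alpha = 0.05
for var in num_var:
    x = train[var].dropna().values
    # Keep the two p-values in separate names so that one does not overwrite the other.
    _, p_shapiro = shapiro(x)
    _, p_normaltest = normaltest(x)
    # Treat the variable as normal only if neither test rejects normality.
    if p_shapiro > alpha and p_normaltest > alpha:
        normal.append(var)
    else:
        notnormal.append(var)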
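
To try both tests without the original data, here is a minimal self-contained sketch. The DataFrame `train`, its column names, and the random seed are made up for illustration; only `shapiro` and `normaltest` come from the post.

```python
import numpy as np
import pandas as pd
from scipy.stats import shapiro, normaltest

# Hypothetical stand-in for `train`: one Gaussian column and one clearly skewed column.
rng = np.random.default_rng(0)
train = pd.DataFrame({
    "gauss": rng.normal(loc=10.0, scale=2.0, size=500),
    "skewed": rng.exponential(scale=2.0, size=500),
})
num_var = list(train.columns)

alpha = 0.05
p_values = {}
for var in num_var:
    x = train[var].dropna().values
    _, p_shapiro = shapiro(x)
    _, p_normaltest = normaltest(x)
    p_values[var] = (p_shapiro, p_normaltest)
    print(f"{var}: shapiro p={p_shapiro:.4f}, normaltest p={p_normaltest:.4f}")
```

A p-value below `alpha` means normality is rejected; a p-value above it only means the test found no evidence against normality.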

## kstest

`kstest` compares a sample against a reference distribution, so it works for any continuous distribution, including the normal.

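normal = []
notnormal = []
alpha = 0.05
for var in num_var:
    x = train[var].dropna().values
    # kstest(x, "norm") alone compares against the standard normal N(0, 1),
    # so pass the sample mean and standard deviation as the reference parameters.
    # Estimating them from the same data makes the p-value approximate (too lenient).
    stat, p = kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))
    if p > alpha:
        normal.append(var)
    else:
        notnormal.append(var)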
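
One pitfall is worth illustrating: with `"norm"` and no extra arguments, `kstest` compares against the standard normal N(0, 1). The sketch below uses made-up data (mean 50, standard deviation 5, arbitrary seed) to show that a Gaussian sample is rejected unless the reference parameters are supplied through `args`.

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(0)
x = rng.normal(loc=50.0, scale=5.0, size=300)  # Gaussian, but not N(0, 1)

# Compared against N(0, 1): rejected, because location and scale do not match.
_, p_raw = kstest(x, "norm")

# Compared against a normal with the sample's own mean and standard deviation.
_, p_fitted = kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))

print(f"raw p={p_raw:.3g}, fitted p={p_fitted:.3g}")
```

Since the parameters are estimated from the same sample, the fitted p-value tends to be too lenient; the Lilliefors variant of the test is the usual correction for that.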

## anderson

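Like `kstest`, `anderson` is not limited to the normal distribution, although scipy only supports a fixed set of candidates. When the candidate distribution is known, it can compare more precisely than the KS test.

Supported distributions: {'norm', 'expon', 'logistic', 'gumbel', 'gumbel_l', 'gumbel_r', 'extreme1'}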

normal = []
notnormal = []
for var in num_var:
    result = anderson(train[var].dropna().values, dist="norm")
    # Count the significance levels (15%, 10%, 5%, 2.5%, 1%) at which the
    # statistic stays below the critical value, i.e. normality is not rejected.
    passed = sum(result.statistic < cv for cv in result.critical_values)
    # Passing at least 3 of the 5 levels is the same as not rejecting at the 5% level.
    if passed >= 3:
        normal.append(var)
    else:
        notnormal.append(var)
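
`anderson` returns critical values rather than a p-value, so it can help to print them next to their significance levels. A minimal sketch on made-up, deliberately non-normal data (the sample and seed are assumptions for illustration):

```python
import numpy as np
from scipy.stats import anderson

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=500)  # deliberately non-normal

result = anderson(x, dist="norm")
print(f"statistic = {result.statistic:.3f}")

# Pair each significance level (in percent) with its critical value.
cv_by_level = dict(zip(result.significance_level, result.critical_values))
for sl, cv in cv_by_level.items():
    verdict = "not rejected" if result.statistic < cv else "rejected"
    print(f"  {sl:>4}% level: critical value {cv:.3f} -> normality {verdict}")
```

The critical values grow as the significance level shrinks, so once normality is not rejected at one level it is not rejected at any stricter (smaller) level either.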

For more detail, see the articles below!

https://towardsdatascience.com/6-ways-to-test-for-a-normal-distribution-which-one-to-use-9dcf47d8fa93

6 ways to test for a Normal Distribution — which one to use?

https://machinelearningmastery.com/a-gentle-introduction-to-normality-tests-in-python/

A Gentle Introduction to Normality Tests in Python
