Drawing bar graphs by group in Python

2019. 6. 9. 19:36 Analysis Python/Visualization


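I needed to visualize some data by group and went looking for how to do it. Some of it is available out of the box, as in R, but other parts take a bit of thought.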

 

The goal is to compare the proportion of each category between train and test.

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# data, index_info, idx, test_idx, and catcols come from earlier in the analysis (not shown here).
# One figure for the proportion bars, another for the count plots.
f, ax = plt.subplots(3, 4, figsize=(20, 20))
axx = ax.flatten()
f2, ax2 = plt.subplots(3, 4, figsize=(20, 20))
axx3 = ax2.flatten()

for axx2, axx4, j in zip(axx, axx3, catcols):
    # Values of column j in the train and test splits
    tr = data.iloc[index_info[0]].reset_index(drop=True).iloc[idx][j].values.tolist()
    te = data.iloc[index_info[1]].reset_index(drop=True).iloc[test_idx][j].values.tolist()
    a = tr + te
    # Long-form frame: one row per observation, labelled with its split
    aa = pd.DataFrame({"Category": a, "cat": ["train"] * len(tr) + ["test"] * len(te)})
    # Skip the legend for "Location" and "Date"
    leg = j not in ("Location", "Date")
    # Stacked bars of each category's share within train and within test
    aa.groupby("cat")["Category"].value_counts(normalize=True).unstack().plot(
        kind="bar", legend=leg, stacked=True, ax=axx2
    )
    axx2.set_title(j, fontsize=20)
    # Raw counts per category, split by train/test
    sns.countplot(data=aa, x="Category", hue="cat", ax=axx4)
    axx4.set_title(j, fontsize=20)
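
The loop above depends on variables from the rest of the analysis, so here is a minimal self-contained sketch of just the proportion step. The values in `tr` and `te` are made up for illustration, and `props` is a name introduced here; only the `groupby` / `value_counts(normalize=True)` / `unstack` chain comes from the code above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-ins for one categorical column in the train and test splits (made-up values).
tr = ["a", "a", "b", "c"]
te = ["a", "b", "b", "b"]

aa = pd.DataFrame({"Category": tr + te,
                   "cat": ["train"] * len(tr) + ["test"] * len(te)})

# Rows: split (test/train); columns: category;
# values: share of that category within the split.
# fill_value=0 covers categories that are missing from one split ("c" in test here).
props = (aa.groupby("cat")["Category"]
           .value_counts(normalize=True)
           .unstack(fill_value=0))

# One stacked bar per split; each bar adds up to 1.
fig, ax = plt.subplots(figsize=(4, 4))
props.plot(kind="bar", stacked=True, ax=ax)
ax.set_ylabel("proportion")
```

Printing `props` before plotting is a quick way to check that each row adds up to 1.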

 

When drawing a count plot by group,

 

https://seaborn.pydata.org/generated/seaborn.countplot.html

 

seaborn.countplot — seaborn 0.9.0 documentation
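
As a rough sketch of how `x` and `hue` work together in `sns.countplot`, the example below uses a small made-up frame with the same column names as the code above (`Category` and `cat`). The `pd.crosstab` table is an addition of mine for checking the bar heights and is not part of the seaborn call.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up long-form data: one row per observation, labelled with its split.
aa = pd.DataFrame({
    "Category": ["a", "a", "b", "c", "a", "b", "b", "b"],
    "cat": ["train"] * 4 + ["test"] * 4,
})

fig, ax = plt.subplots(figsize=(5, 4))
# x puts the categories on the axis; hue draws one bar per split within each category.
sns.countplot(data=aa, x="Category", hue="cat", ax=ax)
ax.set_title("Counts by category and split")

# The same counts as a table, handy for checking the plot.
counts = pd.crosstab(aa["Category"], aa["cat"])
```

Because the bars show raw counts, splits of different sizes are hard to compare directly, which is why the proportion view is also useful.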


 

 

When you want to do in Python what R does when it draws a bar graph as proportions

 

https://github.com/mwaskom/seaborn/issues/1027

 

Add percentages instead of counts to countplot · Issue #1027 · mwaskom/seaborn

Hello, I would like to make a proposal - could we add an option to a countplot which would allow to instead displaying counts display percentages/frequencies? Thanks
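
One common workaround, not necessarily the one discussed in the issue, is to compute the percentages with pandas first and hand the result to `sns.barplot`. The sketch below assumes the same `Category` / `cat` layout as above with made-up values; `pct` and the `percent` column name are introduced here for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up long-form data: one row per observation, labelled with its split.
aa = pd.DataFrame({
    "Category": ["a", "a", "b", "c", "a", "b", "b", "b"],
    "cat": ["train"] * 4 + ["test"] * 4,
})

# Percentage of each category within its split, as a long-form frame.
# Renaming the Series before reset_index avoids a column-name clash
# with the "Category" index level.
pct = (aa.groupby("cat")["Category"]
         .value_counts(normalize=True)
         .mul(100)
         .rename("percent")
         .reset_index())

fig, ax = plt.subplots(figsize=(5, 4))
# Side-by-side bars per category, with heights in percent rather than counts.
sns.barplot(data=pct, x="Category", y="percent", hue="cat", ax=ax)
ax.set_ylabel("percent within split")
```

Since `pct` has exactly one row per (split, category) pair, `barplot` has nothing to aggregate and simply draws the precomputed values.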


 

- End -
