Pandas: finding the start position and the cumulative end position of repeated values

데이터분석뉴비 2020. 7. 14. 20:24

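I want to find positions under a condition, taking runs of consecutive repeated values into account.

For example, as in the figure below: elu is (0, 0), leaky_relu is (1, 2), selu is (3, 3), and the next leaky_relu is (4, 6), where each pair is the (start, end) row position of one run.

That is the kind of result I needed.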
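To make the target concrete, here is a small sketch in plain Python. The list `activations` is a made-up stand-in for the `activation` column (its values are assumed from the example above, not read from the real CSV), and `runs` is a hypothetical helper name.

```python
from itertools import groupby

# Toy stand-in for the "activation" column (assumed from the example above).
activations = ["elu", "leaky_relu", "leaky_relu", "selu",
               "leaky_relu", "leaky_relu", "leaky_relu"]

def runs(values):
    """Return (value, start, end) for each run of consecutive equal values."""
    result = []
    position = 0
    for value, group in groupby(values):
        length = len(list(group))
        # The run covers positions [position, position + length - 1].
        result.append((value, position, position + length - 1))
        position += length
    return result

for name, start, end in runs(activations):
    print(name, (start, end))
```

This only illustrates the desired output; the rest of the post works on the pandas DataFrame directly.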

import pandas as pd
data = pd.read_csv("./save_parameter.csv")
data.head()

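# Label each run of consecutive equal values, then count rows within each run (1-based).
run_id = (data["activation"] != data["activation"].shift(1)).cumsum()
(data["activation"].groupby(run_id).cumcount() + 1).head(7)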
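The expression above is easier to follow when split into steps. The sketch below uses a toy Series in place of the CSV column (the values are assumed from the example earlier in the post); `changed`, `run_id`, and `count_in_run` are names introduced here only for illustration.

```python
import pandas as pd

# Toy stand-in for data["activation"] (assumed values, not the real CSV).
s = pd.Series(["elu", "leaky_relu", "leaky_relu", "selu",
               "leaky_relu", "leaky_relu", "leaky_relu"], name="activation")

# True wherever the value differs from the previous row (the first row is always True).
changed = s != s.shift(1)

# The running total of change flags gives every run its own label.
run_id = changed.cumsum()

# 1-based position of each row inside its run.
count_in_run = s.groupby(run_id).cumcount() + 1

print(pd.DataFrame({"activation": s, "run_id": run_id, "count": count_in_run}))
```

So the one-liner yields a running count inside each run, but not yet the start and end positions.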

After a lot of trial and error, I finally managed to get it working.

I ended up writing a very long function, though.

def find_index(x):
    # x is the sub-DataFrame for one activation; its index holds the original row positions.
    df_by_group_index = pd.DataFrame({"index": x.index})
    # Position of the next row in the same group (NaN for the last row).
    df_by_group_index["shift_1"] = df_by_group_index["index"].shift(-1)
    # diff == 1 means the next row of this group is adjacent in the original data.
    df_by_group_index["diff"] = df_by_group_index["shift_1"] - df_by_group_index["index"]
    ck_pass = []  # positions already absorbed into an earlier run
    store = []    # collected [start, end] pairs
    for idx_1, row_1 in df_by_group_index.iterrows():
        length_ = 0
        start_index = row_1["index"]
        if start_index in ck_pass:
            continue
        # Walk forward from the first row of the run until adjacency breaks.
        for idx_2, row_2 in df_by_group_index.iloc[idx_1:, :].iterrows():
            if row_2["diff"] == 1:
                ck_pass.append(row_2["shift_1"])
                length_ += 1
            else:
                # iterrows() yields floats here, so cast back to int positions.
                store.append([int(start_index), int(start_index + length_)])
                break
    return store

find_data = data.groupby("activation").apply(find_index)

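# Build one DataFrame of runs per activation, then combine them and order by start.
collection = []
for key, list_ in find_data.items():  # Series.iteritems() was removed in pandas 2.0
    df = pd.DataFrame(list_, columns=["start", "end"])
    df["name"] = key
    collection.append(df)
pd.concat(collection, axis=0).sort_values(["start"])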

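With the code above, I can get the start and end position of each run, as shown below!

I only figured this out after digging and digging, so I should try to put it to good use.

There is probably an easier way, but I'll stop here for now...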
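One possibly simpler route, sketched here as an assumption rather than as the method used in this post: reuse the shift/cumsum run label and aggregate the first and last row position of each run. The toy `data` frame stands in for the CSV, and the `position` and `run_id` names are introduced only for this illustration.

```python
import pandas as pd

# Toy stand-in for the CSV (assumed values, not the real file).
data = pd.DataFrame({"activation": ["elu", "leaky_relu", "leaky_relu", "selu",
                                    "leaky_relu", "leaky_relu", "leaky_relu"]})

# A new label starts wherever the value differs from the previous row.
run_id = (data["activation"] != data["activation"].shift(1)).cumsum().rename("run_id")

runs = (
    data.assign(position=data.index)      # expose the row position as a column
    .groupby(run_id)
    .agg(start=("position", "first"),     # first row position of the run
         end=("position", "last"),        # last row position of the run
         name=("activation", "first"))    # the value that is repeated
    .reset_index(drop=True)
)
print(runs)
```

Because the run labels increase in row order, the result is already sorted by `start`, so no explicit loop or sort is needed.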

stackoverflow.com/questions/25119524/pandas-conditional-rolling-count

Pandas: conditional rolling count

