All I Need Is Data.

Pandas: finding the start and end position of each run of consecutive repeated values

2020. 7. 14. 20:24 · Analysis Python/Pandas Tip

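I want to find the positions of values in a column while taking consecutive repeats into account.

For example, with the data shown below, the result should be elu (0, 0), leaky_relu (1, 2), selu (3, 3), leaky_relu (4, 6).

Each pair is the (start, end) row index of one consecutive run, so the same value can appear in more than one run.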

import pandas as pd
data = pd.read_csv("./save_parameter.csv")
data.head()
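save_parameter.csv is not reproduced here, so the following is a minimal stand-in, assuming the file has an "activation" column laid out as in the example above. The values are hypothetical and only serve to make the later snippets reproducible.

```python
import pandas as pd

# Hypothetical stand-in for save_parameter.csv: only the "activation" column
# is modeled, with the run layout described in the text.
data = pd.DataFrame(
    {
        "activation": [
            "elu",
            "leaky_relu", "leaky_relu",
            "selu",
            "leaky_relu", "leaky_relu", "leaky_relu",
        ]
    }
)
print(data)
```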

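# Give each consecutive run of identical values its own id, then number the rows inside each run from 1.
run_id = (data["activation"] != data["activation"].shift(1)).cumsum()
(data["activation"].groupby(run_id).cumcount() + 1).head(7)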
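To see why the expression above works, it helps to split it into its intermediate steps. This is a small sketch on a hypothetical Series that mirrors the example in this post; the variable names are chosen here for illustration.

```python
import pandas as pd

s = pd.Series(
    ["elu", "leaky_relu", "leaky_relu", "selu",
     "leaky_relu", "leaky_relu", "leaky_relu"],
    name="activation",
)

# True wherever the value differs from the previous row.
# The first row is compared against NaN, so it is always True.
is_new_run = s != s.shift(1)

# The cumulative sum of the change flags gives every consecutive run its own id.
run_id = is_new_run.cumsum()

# Position of each row inside its run, counted from 1.
count_in_run = s.groupby(run_id).cumcount() + 1

print(pd.DataFrame({"activation": s, "run_id": run_id, "count_in_run": count_in_run}))
```

The two leaky_relu runs get different ids (2 and 4) because selu sits between them, which is exactly what a plain groupby on the value alone cannot express.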

After a lot of trial and error, I finally managed to get it.

I ended up writing a very long function.

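def find_index(x):
    # x is the sub-DataFrame for one activation; its index holds the original row positions.
    df_by_group = pd.DataFrame(x.index)
    df_by_group_index = df_by_group.reset_index(drop=True).rename(columns={0: "index"})
    # shift_1 is the next row position in this group; diff == 1 means the two rows are adjacent.
    df_by_group_index["shift_1"] = df_by_group_index["index"].shift(-1)
    df_by_group_index["diff"] = df_by_group_index["shift_1"] - df_by_group_index["index"]
    ck_pass = []  # row positions already absorbed into an earlier run
    store = []    # collected [start, end] pairs
    for idx_1, row_1 in df_by_group_index.iterrows():
        length_ = 0
        start_index = row_1["index"]
        if start_index in ck_pass:
            continue
        # Walk forward from the start of the run until the next position is not adjacent.
        for idx_2, row_2 in df_by_group_index.iloc[idx_1:, :].iterrows():
            if idx_2 == 0:
                # Only reached for the very first row of the group.
                diff = row_1["diff"]
                if diff == 1:
                    ck_pass.append(row_1["shift_1"])
                    length_ += 1
                else:
                    store.append([start_index, start_index + length_])
                    break
            else:
                diff = row_2["diff"]
                if diff == 1:
                    ck_pass.append(row_2["index"])
                    ck_pass.append(row_2["shift_1"])
                    length_ += 1
                else:
                    # The last row of the group has diff NaN, so every run is closed here.
                    store.append([start_index, start_index + length_])
                    break
    return store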

find_data = data.groupby("activation").apply(find_index)

collection = []
for key, list_ in find_data.items():  # Series.iteritems() was removed in pandas 2.0
    df = pd.DataFrame(list_, columns = ["start", "end"])
    df["name"] = key
    collection.append(df)
pd.concat(collection,axis=0).sort_values(["start"])

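With the code above, you get the start and end point of every run.

I only figured this out after a lot of trial and error, so I should make good use of it.

There is probably an easier way, but this is as far as I go for now.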
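One candidate for that easier way, offered as a sketch rather than a drop-in replacement: reuse the run id from the cumcount expression earlier in this post and take the first and last row position of each run. The sample data and the names run_id and runs are assumptions made for this example, not part of the original code.

```python
import pandas as pd

# Hypothetical sample mirroring the example in this post.
data = pd.DataFrame(
    {
        "activation": [
            "elu",
            "leaky_relu", "leaky_relu",
            "selu",
            "leaky_relu", "leaky_relu", "leaky_relu",
        ]
    }
)

# A new run starts wherever the value differs from the previous row.
run_id = (data["activation"] != data["activation"].shift(1)).cumsum()

runs = (
    data.reset_index()              # expose the row positions as an "index" column
    .groupby(run_id.to_numpy())     # group by run id, passed positionally as an array
    .agg(
        name=("activation", "first"),
        start=("index", "first"),
        end=("index", "last"),
    )
    .reset_index(drop=True)
)
print(runs)
```

This assumes the DataFrame index is unnamed, so that reset_index() produces a column called "index". Unlike find_index, it returns integer positions and does not need a second pass to sort the runs by start.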

stackoverflow.com/questions/25119524/pandas-conditional-rolling-count

Pandas: conditional rolling count


