Comparing the first and last rows of a groupby to create a new value

wjie08

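I have a DataFrame with multiple values. I want to group it by the "email" column, take the first and last row of each group, and compare them to see whether the status in the category column has changed. For example, if the category goes from MGR to MGR, there is no change. If the category goes from EMP to MGR, the status should be marked as changed.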

date                 email               category
13-04-2018            [email protected]     MGR
13-04-2018            [email protected]      EMP
18-04-2018            [email protected]     EMP
20-04-2018            [email protected]      MGR
11-01-2019            [email protected]     MGR
15-10-2019            [email protected]     MGR
16-11-2019            [email protected]     MGR
31-01-2020            [email protected]      EMP
02-05-2020            [email protected]      MGR
05-08-2020            [email protected]      MGR
14-02-2021            [email protected]      MGR
15-02-2021            [email protected]      MGR
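
"First" and "last" here depend on row order. The dates above are day-month-year strings, which do not sort chronologically as plain text. The following is a minimal sketch, assuming the rows may not already be in date order; the address `user@example.com` and the three sample rows are placeholders, not data from the post.

```python
import pandas as pd

# Placeholder rows in non-chronological order (hypothetical data).
df = pd.DataFrame({
    "date": ["15-02-2021", "13-04-2018", "31-01-2020"],
    "email": ["user@example.com"] * 3,
    "category": ["MGR", "EMP", "EMP"],
})

# Parse day-month-year strings into real timestamps so they sort by time.
df["date"] = pd.to_datetime(df["date"], format="%d-%m-%Y")

# A stable sort keeps the original relative order of rows sharing a date.
df = df.sort_values("date", kind="stable").reset_index(drop=True)
```

After this step, the first and last row of each email group are the earliest and latest entries.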

I would like to get the following result:

date                 email               category    status
13-04-2018            [email protected]     MGR        no change
15-10-2019            [email protected]     MGR        no change
13-04-2018            [email protected]      EMP        change
15-02-2021            [email protected]      MGR        change
18-04-2018            [email protected]     EMP        change 
16-11-2019            [email protected]     MGR        change 
20-04-2018            [email protected]      MGR        no change
05-08-2020            [email protected]      MGR        no change
31-01-2020            [email protected]      EMP        change 
14-02-2021            [email protected]      MGR        change

I tried the following code, but it only retrieves the first and last row of each group. Is there a way to compare the values of the first and last rows?

#get the first and last row of the groupby
df2 = df.groupby('email', as_index=False).nth([0,-1])

Any help is appreciated, thank you.

ゼバルチン

I'm not sure whether this is efficient enough, but it works.

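def check_status(group):
    # Boolean mask that keeps only the first and last row of the group
    selected = [False] * len(group)
    selected[0] = selected[-1] = True
    # .copy() avoids SettingWithCopyWarning when the new column is added
    new_group = group[selected].copy()
    # Changed only if first and last category differ (a one-row group is 'no change')
    new_group['status'] = 'change' if new_group.category.nunique() > 1 else 'no change'
    return new_group

print(df.groupby('email').apply(check_status).reset_index(drop=True))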
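
If efficiency is a concern, the same comparison can probably be done without `apply`. The following is a sketch of one possible alternative, not the answerer's method: it broadcasts each group's first and last category with `transform` and keeps the boundary rows with `cumcount`. The helper name `first_last_status` and the `example.com` addresses are hypothetical, since the real addresses are not visible in the post.

```python
import pandas as pd

# Hypothetical sample data; the addresses are placeholders.
df = pd.DataFrame({
    "date": ["13-04-2018", "13-04-2018", "18-04-2018", "11-01-2019",
             "15-10-2019", "16-11-2019", "15-02-2021"],
    "email": ["a@example.com", "b@example.com", "c@example.com", "a@example.com",
              "a@example.com", "c@example.com", "b@example.com"],
    "category": ["MGR", "EMP", "EMP", "MGR", "MGR", "MGR", "MGR"],
})

def first_last_status(frame, key="email", col="category"):
    """Keep the first and last row per group and flag whether `col` differs."""
    grouped = frame.groupby(key)[col]
    # Broadcast each group's first and last value to every row of that group.
    changed = grouped.transform("first") != grouped.transform("last")
    # Position of each row within its group, and the size of that group.
    pos = frame.groupby(key).cumcount()
    size = grouped.transform("size")
    keep = (pos == 0) | (pos == size - 1)
    out = frame[keep].copy()
    out["status"] = changed[keep].map({True: "change", False: "no change"})
    # Stable sort so the first row stays ahead of the last row in each group.
    return out.sort_values(key, kind="stable").reset_index(drop=True)

result = first_last_status(df)
print(result)
```

A group with a single row yields one output row marked "no change", because its first and last values are the same row.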
