Below is a simplified version of the df in question:
import pandas as pd

df = pd.DataFrame({'date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-02-01', '2021-02-02', '2021-02-03', '2021-02-04'],
                   'month': ['Jan', 'Jan', 'Jan', 'Jan', 'Feb', 'Feb', 'Feb', 'Feb'],
                   'label': ['A', 'A', 'B', 'A', 'A', 'B', 'C', 'A']})
df
         date month label
0  2021-01-01   Jan     A
1  2021-01-02   Jan     A
2  2021-01-03   Jan     B
3  2021-01-04   Jan     A
4  2021-02-01   Feb     A
5  2021-02-02   Feb     B
6  2021-02-03   Feb     C
7  2021-02-04   Feb     A
I would like a new column showing the running count of unique labels seen so far within each month.
Intended df:
         date month label  count
0  2021-01-01   Jan     A      1
1  2021-01-02   Jan     A      1
2  2021-01-03   Jan     B      2
3  2021-01-04   Jan     A      2
4  2021-02-01   Feb     A      1
5  2021-02-02   Feb     B      2
6  2021-02-03   Feb     C      3
7  2021-02-04   Feb     A      3
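To pin down exactly what "count" means here, a plain-Python reference can help: for each row, it is the number of distinct labels that have appeared so far (including the current row) within that row's month. The helper `running_unique_count` below is a hypothetical name for illustration, and it assumes the rows are already in date order within each month, as in the sample.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    "date": ["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04",
             "2021-02-01", "2021-02-02", "2021-02-03", "2021-02-04"],
    "month": ["Jan", "Jan", "Jan", "Jan", "Feb", "Feb", "Feb", "Feb"],
    "label": ["A", "A", "B", "A", "A", "B", "C", "A"],
})


def running_unique_count(months, labels):
    """Hypothetical reference helper: for each row, return how many distinct
    labels have been seen so far (inclusive) within that row's month."""
    seen = {}  # month -> set of labels seen so far in that month
    counts = []
    for month, label in zip(months, labels):
        labels_so_far = seen.setdefault(month, set())
        labels_so_far.add(label)
        counts.append(len(labels_so_far))
    return counts


df["count"] = running_unique_count(df["month"], df["label"])
print(df)
```

This loop is slow on large frames, but it is handy as a ground truth to check a vectorised answer against.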
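We can use sort_values to order the rows by month and label, then use shift to flag each row whose (month, label) pair differs from the row before it, i.e. the first occurrence of each label within a month. Join this boolean Series back to our dataframe and use groupby.cumsum to get the counter:
d = df.sort_values(["month", "label"])
# Compare the whole (month, label) pair with the previous row, so the first row
# of a month is counted even if the previous month ends with the same label
s = d[["month", "label"]].ne(d[["month", "label"]].shift()).any(axis=1).rename("count")
df = df.join(s)
df["count"] = df.groupby("month")["count"].cumsum()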
         date month label  count
0  2021-01-01   Jan     A      1
1  2021-01-02   Jan     A      1
2  2021-01-03   Jan     B      2
3  2021-01-04   Jan     A      2
4  2021-02-01   Feb     A      1
5  2021-02-02   Feb     B      2
6  2021-02-03   Feb     C      3
7  2021-02-04   Feb     A      3
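A variant that skips the sort and join is to mark first occurrences directly with `DataFrame.duplicated`, which by default flags every repeat of a (month, label) pair after its first appearance in row order. This is a swapped-in technique rather than the one described above, sketched under the assumption that rows are already in date order within each month.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04",
             "2021-02-01", "2021-02-02", "2021-02-03", "2021-02-04"],
    "month": ["Jan", "Jan", "Jan", "Jan", "Feb", "Feb", "Feb", "Feb"],
    "label": ["A", "A", "B", "A", "A", "B", "C", "A"],
})

# True on the first row where each (month, label) pair appears, in row order.
first_seen = ~df.duplicated(subset=["month", "label"])

# Running total of first occurrences within each month; cast to int so the
# result is an integer column rather than relying on bool handling in cumsum.
df["count"] = first_seen.astype(int).groupby(df["month"]).cumsum()
print(df)
```

Because the flags follow the original row order, this keeps the same "first time seen this month" semantics as the sorted version without needing a stable sort.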
OLD ANSWER
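We can make use of a cumulative sum of booleans, by checking whether each label differs from the previous one, and then grouping by month and taking the cumsum. This only counts correctly when repeated labels within a month sit next to each other, since each row is compared only with the row directly above it: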
s = df["label"].ne(df["label"].shift())
df["count"] = s.groupby(df["month"]).cumsum()
         date month label  count
0  2021-01-01   Jan     A      1
1  2021-01-01   Jan     A      1
2  2021-01-03   Jan     B      2
3  2021-02-01   Feb     A      1
4  2021-02-02   Feb     B      2
5  2021-02-03   Feb     C      3
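To see why the neighbour comparison matters, here is a small sketch that runs the shift-based approach on the eight-row sample from the question, where label A reappears after B in January and February starts with the same label January ended on. The `expected` list is simply the "count" column of the intended df, typed in by hand.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04",
             "2021-02-01", "2021-02-02", "2021-02-03", "2021-02-04"],
    "month": ["Jan", "Jan", "Jan", "Jan", "Feb", "Feb", "Feb", "Feb"],
    "label": ["A", "A", "B", "A", "A", "B", "C", "A"],
})

# Flag a row when its label differs from the row directly above it.
changed = df["label"].ne(df["label"].shift())
shift_count = changed.astype(int).groupby(df["month"]).cumsum()

# "count" column of the intended df in the question.
expected = [1, 1, 2, 2, 1, 2, 3, 3]

print(shift_count.tolist())  # the returning A in January is counted again
print(expected)
```

The January row 3 is counted as new because its neighbour is B, and the first February row gets 0 because its neighbour (the last January row) is also A.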
Or, more safely, make use of your dates by grouping on year-month:
df["date"] = pd.to_datetime(df["date"])
s = df["label"].ne(df["label"].shift())
df["count"] = s.groupby(df["date"].dt.strftime("%Y-%m")).cumsum()
        date month label  count
0 2021-01-01   Jan     A      1
1 2021-01-01   Jan     A      1
2 2021-01-03   Jan     B      2
3 2021-02-01   Feb     A      1
4 2021-02-02   Feb     B      2
5 2021-02-03   Feb     C      3
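The reason a year-month key is safer than the month name is that data spanning more than one year would otherwise merge, say, January 2021 and January 2022 into one group. The sketch below uses hypothetical two-year data and a hypothetical helper `distinct_running_count` (first-occurrence flags via `duplicated`, then a grouped cumsum) to compare a month-name key with a `dt.to_period("M")` key.

```python
import pandas as pd

# Hypothetical data spanning two Januaries.
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-05", "2021-01-06", "2022-01-03", "2022-01-04"]),
    "label": ["A", "B", "A", "C"],
})
df["month"] = df["date"].dt.strftime("%b")         # same month name for all rows
df["year_month"] = df["date"].dt.to_period("M")    # 2021-01 vs 2022-01


def distinct_running_count(frame, key):
    """Hypothetical helper: running count of distinct labels per `key` group."""
    first_seen = ~frame.duplicated(subset=[key, "label"])
    return first_seen.astype(int).groupby(frame[key]).cumsum()


by_name = distinct_running_count(df, "month")
by_period = distinct_running_count(df, "year_month")
print(pd.DataFrame({"label": df["label"], "by_name": by_name, "by_period": by_period}))
```

With the month-name key, the 2022 rows inherit the labels already seen in 2021, while the period key restarts the count for each calendar month.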