I have the following Pandas DataFrame object df, which records incidents that occurred between 2000-07-01 and 2018-03-31. Each row represents an incident that occurred on that particular date. FID_1 is the index column and uniquely identifies each incident. The ICC_NAME column contains 33 unique values indicating where the incident occurred.
comb_date ICC_NAME
FID_1
267 2000-09-18 09:49:00 Alexandra
462 2000-10-19 01:00:00 Alexandra
696 2000-11-26 15:08:00 Alexandra
734 2000-11-27 19:20:00 Alexandra
760 2000-11-28 20:00:00 Alexandra
761 2000-11-28 20:30:00 Alexandra
945 2000-05-12 12:37:00 Alexandra
1242 2000-12-12 14:35:00 Alexandra
1440 2000-12-16 06:45:00 Alexandra
1523 2000-12-17 12:55:00 Alexandra
1701 2000-12-19 18:40:00 Alexandra
1899 2000-12-26 11:42:00 Alexandra
1963 2000-12-29 09:43:00 Alexandra
1975 2000-12-29 15:54:00 Alexandra
2004 2000-12-30 13:26:00 Alexandra
2044 2000-12-31 13:18:00 Alexandra
2100 2001-01-01 00:06:00 Alexandra
2202 2001-02-01 13:34:00 Alexandra
2826 2001-11-01 13:32:00 Alexandra
2991 2001-01-15 10:55:00 Alexandra
3175 2001-01-20 11:18:00 Alexandra
3176 2001-01-20 11:35:00 Alexandra
3212 2001-01-20 22:55:00 Alexandra
3371 2001-01-26 14:25:00 Alexandra
3386 2001-01-26 19:05:00 Alexandra
3395 2001-01-27 13:20:00 Alexandra
3432 2001-01-28 18:03:00 Alexandra
3701 2001-06-02 18:29:00 Alexandra
3881 2001-02-14 10:00:00 Alexandra
4131 2001-02-21 17:48:00 Alexandra
... ... ...
... ... ...
... ... Boort
... ... Boort
... ... ...
... ... ...
96968 2018-01-25 17:27:00 Woori Yallock
96983 2018-01-25 19:04:00 Woori Yallock
96995 2018-01-26 00:03:00 Woori Yallock
97002 2018-01-26 09:39:00 Woori Yallock
97105 2018-01-28 11:12:00 Woori Yallock
97143 2018-01-29 14:42:00 Woori Yallock
97144 2018-01-29 15:00:00 Woori Yallock
97160 2018-01-30 21:54:00 Woori Yallock
97249 2018-06-02 22:40:00 Woori Yallock
97314 2018-11-02 12:38:00 Woori Yallock
97361 2018-02-13 16:49:00 Woori Yallock
97362 2018-02-13 16:55:00 Woori Yallock
97368 2018-02-14 05:48:00 Woori Yallock
97446 2018-02-18 11:17:00 Woori Yallock
97475 2018-02-19 18:52:00 Woori Yallock
97485 2018-02-20 15:42:00 Woori Yallock
97496 2018-02-20 22:19:00 Woori Yallock
97514 2018-02-22 14:47:00 Woori Yallock
97563 2018-02-25 20:37:00 Woori Yallock
97641 2018-02-28 17:19:00 Woori Yallock
97642 2018-02-28 17:45:00 Woori Yallock
97769 2018-07-03 07:35:00 Woori Yallock
97786 2018-07-03 22:05:00 Woori Yallock
97902 2018-11-03 16:20:00 Woori Yallock
97938 2018-12-03 14:33:00 Woori Yallock
97939 2018-12-03 14:35:00 Woori Yallock
97946 2018-12-03 20:23:00 Woori Yallock
98046 2018-03-17 18:24:00 Woori Yallock
98090 2018-03-18 11:06:00 Woori Yallock
98207 2018-03-22 19:58:00 Woori Yallock
[98372 rows x 2 columns]
What I want to achieve is to get the number of incidents per YYYY-MM for each ICC_NAME.
yyyy-mm Alexandra Boort ... Woori Yallock
2000-07 29 12 ... 8
2000-08 20 16 ... 13
... ...
... ...
2018-03 41 8 ... 28
I was thinking of using resample, but I am not sure which column sum() should be applied to.
Use crosstab, converting the datetimes to month periods with Series.dt.to_period. Then set the index name and remove the columns name with DataFrame.rename_axis, and convert the PeriodIndex to a column with DataFrame.reset_index:
import pandas as pd

df['comb_date'] = pd.to_datetime(df['comb_date'])
# Count incidents per month (rows) and per ICC_NAME (columns)
df1 = (pd.crosstab(df['comb_date'].dt.to_period('M'), df['ICC_NAME'])
         .rename_axis(columns=None, index='yyyy-mm')
         .reset_index())
print(df1)
yyyy-mm Alexandra Woori Yallock
0 2000-05 1 0
1 2000-09 1 0
2 2000-10 1 0
3 2000-11 4 0
4 2000-12 9 0
5 2001-01 9 0
6 2001-02 3 0
7 2001-06 1 0
8 2001-11 1 0
9 2018-01 0 8
10 2018-02 0 11
11 2018-03 0 3
12 2018-06 0 1
13 2018-07 0 2
14 2018-11 0 2
15 2018-12 0 3
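Since the question mentions resample and sum(): what is needed here is a count of rows rather than a sum of a column, so a groupby with size() gives the same table as crosstab. The following is only a minimal sketch on a tiny made-up sample (the five rows below are illustrative, not the full data), and the names counts, via_crosstab and counts_full are assumptions introduced for this example. The last step, which fills in months that have no incidents at all, is an optional extra that seems to match the desired output listing every month.

```python
import pandas as pd

# Tiny made-up sample with the same layout as df in the question.
df = pd.DataFrame(
    {
        "comb_date": [
            "2000-09-18 09:49:00",
            "2000-11-26 15:08:00",
            "2000-11-27 19:20:00",
            "2018-01-25 17:27:00",
            "2018-01-26 00:03:00",
        ],
        "ICC_NAME": [
            "Alexandra",
            "Alexandra",
            "Alexandra",
            "Woori Yallock",
            "Woori Yallock",
        ],
    },
    index=pd.Index([267, 696, 734, 96968, 96995], name="FID_1"),
)
df["comb_date"] = pd.to_datetime(df["comb_date"])

# Count rows per (month, ICC_NAME), then pivot ICC_NAME into columns.
counts = (
    df.groupby([df["comb_date"].dt.to_period("M"), "ICC_NAME"])
    .size()
    .unstack(fill_value=0)
)

# The same table built with crosstab, for comparison.
via_crosstab = pd.crosstab(df["comb_date"].dt.to_period("M"), df["ICC_NAME"])

# Optional: add rows of zeros for months in which nothing happened.
all_months = pd.period_range(counts.index.min(), counts.index.max(), freq="M")
counts_full = counts.reindex(all_months, fill_value=0)
```

Calling counts_full.rename_axis(index='yyyy-mm', columns=None).reset_index() afterwards should give the same layout as df1 above, with one row per calendar month.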