Pandas - How to get the sum of rows by multiple columns in a DataFrame

alextc

I have the following Pandas DataFrame object df, which records incidents that occurred between 2000-07-01 and 2018-03-31. Each row represents one incident that occurred on that particular date. FID_1 is the index column and uniquely identifies each incident. The ICC_NAME column holds the location where each incident occurred and contains 33 unique values.

                comb_date       ICC_NAME
FID_1                                   
267   2000-09-18 09:49:00      Alexandra
462   2000-10-19 01:00:00      Alexandra
696   2000-11-26 15:08:00      Alexandra
734   2000-11-27 19:20:00      Alexandra
760   2000-11-28 20:00:00      Alexandra
761   2000-11-28 20:30:00      Alexandra
945   2000-05-12 12:37:00      Alexandra
1242  2000-12-12 14:35:00      Alexandra
1440  2000-12-16 06:45:00      Alexandra
1523  2000-12-17 12:55:00      Alexandra
1701  2000-12-19 18:40:00      Alexandra
1899  2000-12-26 11:42:00      Alexandra
1963  2000-12-29 09:43:00      Alexandra
1975  2000-12-29 15:54:00      Alexandra
2004  2000-12-30 13:26:00      Alexandra
2044  2000-12-31 13:18:00      Alexandra
2100  2001-01-01 00:06:00      Alexandra
2202  2001-02-01 13:34:00      Alexandra
2826  2001-11-01 13:32:00      Alexandra
2991  2001-01-15 10:55:00      Alexandra
3175  2001-01-20 11:18:00      Alexandra
3176  2001-01-20 11:35:00      Alexandra
3212  2001-01-20 22:55:00      Alexandra
3371  2001-01-26 14:25:00      Alexandra
3386  2001-01-26 19:05:00      Alexandra
3395  2001-01-27 13:20:00      Alexandra
3432  2001-01-28 18:03:00      Alexandra
3701  2001-06-02 18:29:00      Alexandra
3881  2001-02-14 10:00:00      Alexandra
4131  2001-02-21 17:48:00      Alexandra
...                   ...            ...
...                   ...            ...
...                   ...          Boort
...                   ...          Boort
...                   ...            ...
...                   ...            ...
96968 2018-01-25 17:27:00  Woori Yallock
96983 2018-01-25 19:04:00  Woori Yallock
96995 2018-01-26 00:03:00  Woori Yallock
97002 2018-01-26 09:39:00  Woori Yallock
97105 2018-01-28 11:12:00  Woori Yallock
97143 2018-01-29 14:42:00  Woori Yallock
97144 2018-01-29 15:00:00  Woori Yallock
97160 2018-01-30 21:54:00  Woori Yallock
97249 2018-06-02 22:40:00  Woori Yallock
97314 2018-11-02 12:38:00  Woori Yallock
97361 2018-02-13 16:49:00  Woori Yallock
97362 2018-02-13 16:55:00  Woori Yallock
97368 2018-02-14 05:48:00  Woori Yallock
97446 2018-02-18 11:17:00  Woori Yallock
97475 2018-02-19 18:52:00  Woori Yallock
97485 2018-02-20 15:42:00  Woori Yallock
97496 2018-02-20 22:19:00  Woori Yallock
97514 2018-02-22 14:47:00  Woori Yallock
97563 2018-02-25 20:37:00  Woori Yallock
97641 2018-02-28 17:19:00  Woori Yallock
97642 2018-02-28 17:45:00  Woori Yallock
97769 2018-07-03 07:35:00  Woori Yallock
97786 2018-07-03 22:05:00  Woori Yallock
97902 2018-11-03 16:20:00  Woori Yallock
97938 2018-12-03 14:33:00  Woori Yallock
97939 2018-12-03 14:35:00  Woori Yallock
97946 2018-12-03 20:23:00  Woori Yallock
98046 2018-03-17 18:24:00  Woori Yallock
98090 2018-03-18 11:06:00  Woori Yallock
98207 2018-03-22 19:58:00  Woori Yallock

[98372 rows x 2 columns]

What I want to achieve is the number of incidents per month (YYYY-MM) for each ICC_NAME:

yyyy-mm      Alexandra      Boort      ...      Woori Yallock
2000-07             29         12      ...                  8
2000-08             20         16      ...                 13
... ...
... ...
2018-03             41         8       ...                 28
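The YYYY-MM key can be derived from comb_date in two ways: Series.dt.to_period('M') gives monthly Period values, which sort chronologically and work with pd.period_range, while Series.dt.strftime('%Y-%m') gives plain strings. A minimal sketch using three timestamps copied from the listing above:

```python
import pandas as pd

# Three timestamps copied from the listing above
dates = pd.Series(pd.to_datetime([
    "2000-09-18 09:49:00",
    "2000-11-26 15:08:00",
    "2018-01-25 17:27:00",
]))

# Monthly periods keep a real time type (sortable, usable with period_range)
month_periods = dates.dt.to_period("M")

# Plain 'YYYY-MM' strings, if only a text label is needed
month_labels = dates.dt.strftime("%Y-%m")

print(month_periods.astype(str).tolist())  # ['2000-09', '2000-11', '2018-01']
print(month_labels.tolist())               # ['2000-09', '2000-11', '2018-01']
```

Periods are usually the better grouping key because they still behave like dates, whereas the strings only sort correctly because of the zero-padded YYYY-MM format.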

I was thinking of using resample, but I'm not sure which column sum() should be applied to.
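Because every row is one incident, the monthly figure is a row count, so there is no column to sum(). A minimal sketch of a groupby-based alternative to resample, run on a hypothetical subset of rows copied from the listing: group by month and ICC_NAME, count rows with size(), then spread ICC_NAME into columns.

```python
import pandas as pd

# Hypothetical subset of df: a few incidents copied from the listing above
df = pd.DataFrame({
    "comb_date": pd.to_datetime([
        "2000-09-18 09:49:00",
        "2000-10-19 01:00:00",
        "2000-11-26 15:08:00",
        "2000-11-27 19:20:00",
        "2000-11-28 20:00:00",
        "2018-01-25 17:27:00",
        "2018-01-25 19:04:00",
        "2018-02-13 16:49:00",
        "2018-03-17 18:24:00",
    ]),
    "ICC_NAME": ["Alexandra"] * 5 + ["Woori Yallock"] * 4,
})

# Group by (month, location) and count rows; no column needs to be summed
counts = (
    df.groupby([df["comb_date"].dt.to_period("M"), "ICC_NAME"])
      .size()                  # number of rows (incidents) in each group
      .unstack(fill_value=0)   # one column per ICC_NAME, 0 where there were none
)
print(counts)
```

The counts should match those from pd.crosstab on the same two keys; crosstab is essentially a shortcut for this groupby/size/unstack pattern.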

jezrael

Use crosstab after converting the datetimes to monthly periods with Series.dt.to_period. Then rename the index and columns axes with DataFrame.rename_axis, and finally turn the PeriodIndex into a column with DataFrame.reset_index:

import pandas as pd

df['comb_date'] = pd.to_datetime(df['comb_date'])
# count rows (incidents) for every (month, ICC_NAME) pair
df1 = (pd.crosstab(df['comb_date'].dt.to_period('M'), df['ICC_NAME'])
         .rename_axis(columns=None, index='yyyy-mm')
         .reset_index())
print(df1)
    yyyy-mm  Alexandra  Woori Yallock
0   2000-05          1              0
1   2000-09          1              0
2   2000-10          1              0
3   2000-11          4              0
4   2000-12          9              0
5   2001-01          9              0
6   2001-02          3              0
7   2001-06          1              0
8   2001-11          1              0
9   2018-01          0              8
10  2018-02          0             11
11  2018-03          0              3
12  2018-06          0              1
13  2018-07          0              2
14  2018-11          0              2
15  2018-12          0              3
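crosstab only produces rows for months that actually contain incidents, so quiet months are absent from the result, whereas the desired output runs continuously from 2000-07 to 2018-03. One way to fill the gaps, sketched here on a hypothetical subset of the rows above, is to reindex against pd.period_range with fill_value=0:

```python
import pandas as pd

# Hypothetical subset of df: a few incidents copied from the listing above
df = pd.DataFrame({
    "comb_date": pd.to_datetime([
        "2000-09-18 09:49:00",
        "2000-10-19 01:00:00",
        "2000-11-26 15:08:00",
        "2000-11-27 19:20:00",
        "2000-11-28 20:00:00",
        "2018-01-25 17:27:00",
        "2018-01-25 19:04:00",
        "2018-02-13 16:49:00",
        "2018-03-17 18:24:00",
    ]),
    "ICC_NAME": ["Alexandra"] * 5 + ["Woori Yallock"] * 4,
})

df1 = pd.crosstab(df["comb_date"].dt.to_period("M"), df["ICC_NAME"])

# Every month of the stated reporting period, including months with no incidents
all_months = pd.period_range("2000-07", "2018-03", freq="M")

df1 = (
    df1.reindex(all_months, fill_value=0)   # missing months become rows of zeros
       .rename_axis(index="yyyy-mm", columns=None)
       .reset_index()
)
print(df1.head())
```

Reindexing to a fixed range also drops any month outside it. The output above includes 2000-05 and 2018-06 through 2018-12, which lie outside the stated period, so those dates may be worth checking before applying this.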
