How can I split a data frame by two columns and count the number of rows in each group more efficiently?

Zihu Guo

I have a data.frame with more than 120,000 rows. It looks like this:

> head(mydf)
   ID MONTH.YEAR VALUE
1 110  JAN. 2012  1000
2 111  JAN. 2012  1000
3 121  FEB. 2012  3000
4 131  FEB. 2012  3000
5 141  MAR. 2012  5000
6 142  MAR. 2012  4000
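To make the example reproducible, the six rows printed by head(mydf) can be rebuilt as shown below. This is a sketch: the column types are an assumption (MONTH.YEAR as character, VALUE as numeric), and the real 120,000-row data may differ, for example if MONTH.YEAR is a factor.

```r
# Rebuild the six sample rows printed by head(mydf).
# Assumption: MONTH.YEAR is stored as character and VALUE as numeric.
mydf <- data.frame(
  ID         = c(110, 111, 121, 131, 141, 142),
  MONTH.YEAR = c("JAN. 2012", "JAN. 2012", "FEB. 2012",
                 "FEB. 2012", "MAR. 2012", "MAR. 2012"),
  VALUE      = c(1000, 1000, 3000, 3000, 5000, 4000),
  stringsAsFactors = FALSE
)
str(mydf)
```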

I want to split the data.frame by the MONTH.YEAR and VALUE columns and count the rows in each group. My expected result looks like this:

MONTH.YEAR VALUE count
JAN. 2012  1000  2
FEB. 2012  3000  2
MAR. 2012  5000  1
MAR. 2012  4000  1

I tried splitting it and using sapply to count the rows in each group. This is my code:

sp <- split(mydf, list(mydf$MONTH.YEAR, mydf$VALUE), drop = TRUE)
counts <- sapply(sp, nrow)
result <- data.frame(yearandvalue = names(counts), count = counts)

but the process is very slow. Is there a more efficient way to implement this? Thank you very much.
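If you want to keep the split-based idea from the question, one variant that is likely much cheaper (a sketch, not benchmarked here) is to split only the row indices instead of the whole data frame: each group then holds a short integer vector rather than a sub-data.frame, and lengths() gives the row counts directly. The sample data and column types below are assumed from the head(mydf) output.

```r
# Sample data from the question (column types are an assumption).
mydf <- data.frame(
  ID         = c(110, 111, 121, 131, 141, 142),
  MONTH.YEAR = c("JAN. 2012", "JAN. 2012", "FEB. 2012",
                 "FEB. 2012", "MAR. 2012", "MAR. 2012"),
  VALUE      = c(1000, 1000, 3000, 3000, 5000, 4000),
  stringsAsFactors = FALSE
)

# Split the row indices (not the data frame) by the two grouping columns;
# drop = TRUE removes MONTH.YEAR/VALUE combinations that never occur.
idx <- split(seq_len(nrow(mydf)), list(mydf$MONTH.YEAR, mydf$VALUE), drop = TRUE)

# lengths() returns the length of each list element, i.e. the rows per group.
counts <- lengths(idx)

# Take the first row index of each group to recover its MONTH.YEAR and VALUE.
first <- vapply(idx, `[`, integer(1), 1L)
result <- data.frame(
  MONTH.YEAR = mydf$MONTH.YEAR[first],
  VALUE      = mydf$VALUE[first],
  count      = unname(counts)
)
print(result)
```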

akrun

Try

aggregate(ID ~ ., mydf, length)
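The aggregate() call above relies on the formula ID ~ . meaning "group ID by every other column" (here MONTH.YEAR and VALUE); applying length to each group's ID values yields its row count. A self-contained sketch on the sample data (column types assumed) that also renames the counted column to match the desired output. Note that the formula method's default na.action drops rows with NA in any of the involved columns.

```r
# Sample data from the question (column types are an assumption).
mydf <- data.frame(
  ID         = c(110, 111, 121, 131, 141, 142),
  MONTH.YEAR = c("JAN. 2012", "JAN. 2012", "FEB. 2012",
                 "FEB. 2012", "MAR. 2012", "MAR. 2012"),
  VALUE      = c(1000, 1000, 3000, 3000, 5000, 4000),
  stringsAsFactors = FALSE
)

# ID ~ . : group ID by all remaining columns (MONTH.YEAR and VALUE);
# length() of each group's ID vector is that group's row count.
result <- aggregate(ID ~ ., data = mydf, FUN = length)

# The counted column keeps the name "ID"; rename it to "count".
names(result)[names(result) == "ID"] <- "count"
print(result)
```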

Or

library(dplyr)
mydf %>%
  group_by(MONTH.YEAR, VALUE) %>%
  summarise(count = n())

Or

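library(data.table)
# setDT() converts mydf to a data.table in place (by reference);
# .N is the number of rows in each MONTH.YEAR/VALUE group.
setDT(mydf)[, list(count = .N), by = list(MONTH.YEAR, VALUE)]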
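For completeness, a dependency-free base R sketch (not part of the original answer) cross-tabulates the two columns with table() and keeps only the combinations that occur. Caveats: table() builds the full MONTH.YEAR by VALUE grid before the zeros are dropped, so it can use a lot of memory when both columns have many distinct values, and VALUE comes back as character because it is taken from the table's dimnames. The sample data and column types are assumed.

```r
# Sample data from the question (column types are an assumption).
mydf <- data.frame(
  ID         = c(110, 111, 121, 131, 141, 142),
  MONTH.YEAR = c("JAN. 2012", "JAN. 2012", "FEB. 2012",
                 "FEB. 2012", "MAR. 2012", "MAR. 2012"),
  VALUE      = c(1000, 1000, 3000, 3000, 5000, 4000),
  stringsAsFactors = FALSE
)

# Cross-tabulate the two grouping columns; each cell is a row count.
tab <- table(MONTH.YEAR = mydf$MONTH.YEAR, VALUE = mydf$VALUE)

# Convert to long format with the count column named "count",
# then drop combinations that never occur in the data.
result <- as.data.frame(tab, responseName = "count", stringsAsFactors = FALSE)
result <- result[result$count > 0, ]
print(result)
```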
