Summarizing multiple columns by two variables

HVS

This is my first time using R, so please forgive me if this question is not worded properly. I have a .csv file that I imported into R, and I am trying to summarize some data. Each row of data is for a given year, study site, and area, and the remaining columns hold the number of each species present. There are 4 columns for each species, as there were 4 surveys in which the species could have been seen.

I am trying to get the sum of each species by year and study site. Columns 5:8 are one species, 9:12 another, 13:16 another, and so on. Here is the code that I thought would summarize columns 5:8 by year (YYYY) and study area (SAR):

aggregate(test[,5:8],by = list("SAR","YYYY"), FUN = sum, na.rm = TRUE)

This gives me the error message "arguments must have same length". Can anyone help me through this initial step?
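Why the first attempt fails: the `by` argument of `aggregate()` expects a list of grouping vectors, each with one value per row of the data. `list("SAR","YYYY")` is a list of two length-one strings, so its lengths do not match the 9 rows. A minimal sketch of the corrected call, rebuilding the example data from this question as a plain `data.frame`:

```r
# The question's example data (one species, four survey columns)
test <- data.frame(
  SAR = rep("BCPALP", 9),
  YYYY = rep(2005L, 9),
  GRID_ID = rep(1L, 9),
  WID = c(1189L, 1190L, 1191L, 1192L, 1194L, 1195L, 1196L, 1198L, 1199L),
  col1 = c(NA, 0L, 0L, 0L, NA, NA, 0L, 0L, 0L),
  col2 = c(NA, NA, 0L, NA, NA, NA, NA, NA, NA),
  col3 = c(0L, 0L, NA, NA, 1L, 1L, 0L, 0L, 0L),
  col4 = c(0L, 0L, NA, NA, NA, NA, NA, NA, 0L),
  stringsAsFactors = FALSE
)

# `by` must hold the grouping vectors themselves (one value per row),
# not the column names as strings; na.rm = TRUE is passed on to sum()
res <- aggregate(test[, 5:8],
                 by = list(SAR = test$SAR, YYYY = test$YYYY),
                 FUN = sum, na.rm = TRUE)
res   # col1 = 0, col2 = 0, col3 = 2, col4 = 0
```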

Here is some of the data:

SAR     YYYY  GRID_ID  WID   col1  col2  col3  col4
BCPALP  2005  1        1189  NA    NA    0     0
BCPALP  2005  1        1190  0     NA    0     0
BCPALP  2005  1        1191  0     0     NA    NA
BCPALP  2005  1        1192  0     NA    NA    NA
BCPALP  2005  1        1194  NA    NA    1     NA
BCPALP  2005  1        1195  NA    NA    1     NA
BCPALP  2005  1        1196  0     NA    0     NA
BCPALP  2005  1        1198  0     NA    0     NA
BCPALP  2005  1        1199  0     NA    0     0

I'm hoping to get an output that is something like this:

SAR     YYYY  total of col1:col4
BCPALP  2005  2
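For the single total asked for here (all four surveys of one species added together), a base-R sketch: first sum across the four survey columns within each row, then add those row totals up per SAR and YYYY. Treating a missing survey (NA) as contributing zero is an assumption, but it is what reproduces the expected total of 2; the helper column name `total` is hypothetical.

```r
# The question's example data (one species, four survey columns)
test <- data.frame(
  SAR = rep("BCPALP", 9),
  YYYY = rep(2005L, 9),
  GRID_ID = rep(1L, 9),
  WID = c(1189L, 1190L, 1191L, 1192L, 1194L, 1195L, 1196L, 1198L, 1199L),
  col1 = c(NA, 0L, 0L, 0L, NA, NA, 0L, 0L, 0L),
  col2 = c(NA, NA, 0L, NA, NA, NA, NA, NA, NA),
  col3 = c(0L, 0L, NA, NA, 1L, 1L, 0L, 0L, 0L),
  col4 = c(0L, 0L, NA, NA, NA, NA, NA, NA, 0L),
  stringsAsFactors = FALSE
)

# Row-wise total over the four surveys; na.rm = TRUE skips missing surveys
# (a row whose surveys are all NA gets a total of 0)
test$total <- rowSums(test[, c("col1", "col2", "col3", "col4")], na.rm = TRUE)

# Add the row totals up within each SAR/YYYY group
res <- aggregate(total ~ SAR + YYYY, data = test, FUN = sum)
res   # total = 2 for BCPALP, 2005
```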

This is the code I just tried.

aggregate(cbind("col1", "col2", "col3", "col4")~SAR+YYYY, test, FUN=sum, na.rm=TRUE, na.action=NULL)

It gives me an error message that states "variable lengths differ (found for 'SAR')".

I went back and checked the data and all the variable lengths are the same.

akrun

We can use aggregate, data.table, or dplyr. If we use the formula method of aggregate, the column names inside cbind() must be unquoted, and we need to set na.action = NULL when there are NA values in different columns. By default, na.action = na.omit, so a row with a single NA in any of the columns is removed from the calculation entirely.

aggregate(cbind(col1, col2, col3, col4)~SAR+YYYY, test,
                        FUN=sum, na.rm=TRUE, na.action=NULL)
#   SAR YYYY col1 col2 col3 col4
#1 BCPALP 2005    0    0    2    0
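The reason the quoted version in the question failed: inside `cbind()`, `"col1"` and friends are just character strings, not references to the data columns. `cbind()` therefore returns a 1 x 4 character matrix, and `model.frame()` (which the formula method of `aggregate()` uses) compares that 1-row response with grouping variables that have one value per data row, hence "variable lengths differ (found for 'SAR')". A small sketch using the first three rows of the question's data:

```r
# First three rows of the question's data (survey columns for one species)
test <- data.frame(SAR = c("BCPALP", "BCPALP", "BCPALP"),
                   YYYY = c(2005L, 2005L, 2005L),
                   col1 = c(NA, 0L, 0L), col2 = c(NA, NA, 0L),
                   col3 = c(0L, 0L, NA), col4 = c(0L, 0L, NA),
                   stringsAsFactors = FALSE)

# Quoted names are plain strings: a 1 x 4 character matrix
quoted <- cbind("col1", "col2", "col3", "col4")
dim(quoted)     # 1 4

# Unquoted names are looked up in `test`: one row per observation
unquoted <- with(test, cbind(col1, col2, col3, col4))
dim(unquoted)   # 3 4
```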

Using dplyr, we group by 'SAR' and 'YYYY' and use summarise_each to get the sum of each of the 'col' columns.

library(dplyr)
test %>%
     group_by(SAR, YYYY) %>%
     summarise_each(funs(sum=sum(., na.rm=TRUE)), 5:ncol(test))
#     SAR  YYYY  col1  col2  col3  col4
#   (chr) (int) (int) (int) (int) (int)
#1 BCPALP  2005     0     0     2     0
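`summarise_each()` and `funs()` have since been deprecated in dplyr. A sketch of the same summary with `across()`, assuming dplyr 1.0.0 or later; `starts_with("col")` is used here instead of `5:ncol(test)` so the survey columns are selected by name rather than by position.

```r
library(dplyr)

# The question's example data (one species, four survey columns)
test <- data.frame(
  SAR = rep("BCPALP", 9),
  YYYY = rep(2005L, 9),
  GRID_ID = rep(1L, 9),
  WID = c(1189L, 1190L, 1191L, 1192L, 1194L, 1195L, 1196L, 1198L, 1199L),
  col1 = c(NA, 0L, 0L, 0L, NA, NA, 0L, 0L, 0L),
  col2 = c(NA, NA, 0L, NA, NA, NA, NA, NA, NA),
  col3 = c(0L, 0L, NA, NA, 1L, 1L, 0L, 0L, 0L),
  col4 = c(0L, 0L, NA, NA, NA, NA, NA, NA, 0L),
  stringsAsFactors = FALSE
)

res <- test %>%
  group_by(SAR, YYYY) %>%
  # Apply the same NA-ignoring sum to every column whose name starts with "col"
  summarise(across(starts_with("col"), ~ sum(.x, na.rm = TRUE)),
            .groups = "drop")
res   # col1 = 0, col2 = 0, col3 = 2, col4 = 0
```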

Or with data.table: we convert the data.frame to a data.table (setDT(test)), group by 'SAR' and 'YYYY', loop through the Subset of Data.table (.SD), and get the sum. The columns to loop over are specified in .SDcols.

library(data.table)
setDT(test)[, lapply(.SD, sum, na.rm=TRUE), by = .(SAR, YYYY),
             .SDcols= 5:ncol(test)]  
#      SAR YYYY col1 col2 col3 col4
#1: BCPALP 2005    0    0    2    0

Update

Suppose that, after aggregating, we need the row-wise sum of each block of four columns: col1:col4, then col5:col8, and so on.

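 library(data.table)
 # Sum every survey column within each SAR/YYYY group
 DT <- setDT(test1)[, lapply(.SD, sum, na.rm = TRUE),
                    by = .(SAR, YYYY), .SDcols = 5:ncol(test1)]
 # Long format; index each block of 4 consecutive columns (one species) as 1, 2, ...
 DT1 <- melt(DT, id.vars = c('SAR', 'YYYY'))[, i1 := as.numeric(gl(.N, 4, .N)),
                                             by = .(SAR, YYYY)]
 # Back to wide: one column per species block, holding the block total
 dcast(DT1, SAR + YYYY ~ i1, value.var = 'value', fun.aggregate = sum)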
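For readers without data.table, a base-R sketch of the same Update, assuming the survey columns for each species always come in consecutive blocks of four (col1:col4, col5:col8, ...). It rebuilds `test1` with the same simulation as the data section below; the `speciesN` column names are hypothetical labels, and the simulated values themselves may differ between R versions.

```r
# The question's example data (one species, four survey columns)
test <- data.frame(
  SAR = rep("BCPALP", 9),
  YYYY = rep(2005L, 9),
  GRID_ID = rep(1L, 9),
  WID = c(1189L, 1190L, 1191L, 1192L, 1194L, 1195L, 1196L, 1198L, 1199L),
  col1 = c(NA, 0L, 0L, 0L, NA, NA, 0L, 0L, 0L),
  col2 = c(NA, NA, 0L, NA, NA, NA, NA, NA, NA),
  col3 = c(0L, 0L, NA, NA, 1L, 1L, 0L, 0L, 0L),
  col4 = c(0L, 0L, NA, NA, NA, NA, NA, NA, 0L),
  stringsAsFactors = FALSE
)

# Simulated second species (col5:col8), as in the data section
set.seed(24)
m1 <- matrix(sample(c(NA, 0:5), 9 * 4, replace = TRUE), ncol = 4,
             dimnames = list(NULL, paste0("col", 5:8)))
test1 <- cbind(test, m1)

survey_cols <- names(test1)[5:ncol(test1)]

# Step 1: sum every survey column within each SAR/YYYY group
agg <- aggregate(test1[survey_cols],
                 by = list(SAR = test1$SAR, YYYY = test1$YYYY),
                 FUN = sum, na.rm = TRUE)

# Step 2: assign each column to its species block: 1 1 1 1 2 2 2 2 ...
block <- (seq_along(survey_cols) - 1) %/% 4 + 1

# Step 3: add up the columns inside each block
totals <- lapply(split(survey_cols, block),
                 function(cols) rowSums(agg[cols]))
names(totals) <- paste0("species", names(totals))

result <- cbind(agg[c("SAR", "YYYY")], as.data.frame(totals))
result
```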

data

 test <- structure(list(
   SAR = c("BCPALP", "BCPALP", "BCPALP", "BCPALP", "BCPALP",
           "BCPALP", "BCPALP", "BCPALP", "BCPALP"),
   YYYY = c(2005L, 2005L, 2005L, 2005L, 2005L, 2005L, 2005L, 2005L, 2005L),
   GRID_ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
   WID = c(1189L, 1190L, 1191L, 1192L, 1194L, 1195L, 1196L, 1198L, 1199L),
   col1 = c(NA, 0L, 0L, 0L, NA, NA, 0L, 0L, 0L),
   col2 = c(NA, NA, 0L, NA, NA, NA, NA, NA, NA),
   col3 = c(0L, 0L, NA, NA, 1L, 1L, 0L, 0L, 0L),
   col4 = c(0L, 0L, NA, NA, NA, NA, NA, NA, 0L)),
   .Names = c("SAR", "YYYY", "GRID_ID", "WID", "col1", "col2", "col3", "col4"),
   class = "data.frame", row.names = c(NA, -9L))

# test1: 'test' plus four simulated survey columns (col5:col8) for a second species
set.seed(24)
m1 <- matrix(sample(c(NA, 0:5), 9*4, replace=TRUE), ncol=4,
             dimnames=list(NULL, paste0('col', 5:8)))
test1 <- cbind(test, m1)
