I am trying to apply multiple functions to multiple columns by a grouping variable. I can get the results, but not in a useful format. Below, I would like res2 to be an extension of res1 by the by variable "cyl" and as many rows as unique values of cyl.
I have tried omitting unlist and redefining the my.sum.function to return a numeric rather than a list. But I can't get the format I need.
library(data.table)
## The well known data
data(mtcars)
DT <- data.table(mtcars)
## a custom set of summary functions
my.sum.fun = function(x){list(
mean = mean(x, na.rm=T),
median = median(x, na.rm=T),
sd = sd(x, na.rm=T)
)}
## I can summarize multiple columns. This works
res1 <- DT[,unlist(lapply(.SD,my.sum.fun)),.SDcols=c("mpg","hp")]
res1
mpg.mean mpg.median mpg.sd hp.mean hp.median hp.sd
20.090625 19.200000 6.026948 146.687500 123.000000 68.562868
## Now I add a by column. What I would like is the format as res1 but with the by column "cyl" added and with as many rows as unique values of "cyl".
res2 <- DT[,unlist(lapply(.SD,my.sum.fun)),.SDcols=c("mpg","hp"),by=list(cyl)]
res2
cyl V1
1: 6 19.742857
2: 6 19.700000
3: 6 1.453567
4: 6 122.285714
5: 6 110.000000
6: 6 24.260491
7: 4 26.663636
8: 4 26.000000
9: 4 4.509828
10: 4 82.636364
11: 4 91.000000
12: 4 20.934530
13: 8 15.100000
14: 8 15.200000
15: 8 2.560048
16: 8 209.214286
17: 8 192.500000
18: 8 50.976886
There is an option in unlist
to avoid unlisting recursively - the recursive
parameter (By default, the recursive = TRUE
)
DT[,unlist(lapply(.SD,my.sum.fun),
recursive = FALSE),.SDcols=c("mpg","hp"),by=list(cyl)]
# cyl mpg.mean mpg.median mpg.sd hp.mean hp.median hp.sd
#1: 6 19.74286 19.7 1.453567 122.28571 110.0 24.26049
#2: 4 26.66364 26.0 4.509828 82.63636 91.0 20.93453
#3: 8 15.10000 15.2 2.560048 209.21429 192.5 50.97689
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments