R - Automatically stack every nth column of a data frame and save them as new objects

Hanx70

I have a date frame called DF with, say, three variables that repeat each other cyclically:

      A      B      C      A      B      C 
1    a1     b1     c1     a5     b5     c5
2    a2     b2     c2     a6     b6     c6
3    a3     b3     c3     a7     b7     c7
4    a4     b4     c4     a8     b8     c8

I want to stack the first A column on the second A column (and on the third, and fourth and so on, if they exist), and do the same with the other variables, and then save the result as new objects (as vectors, for example). So what I want to obtain is

V_A <- c(a1,a2,a3,a4,a5,a6,a7,a8)
V_B <- c(b1,b2,b3,b4,b5,b6,b7,b8)
V_C <- c(c1,c2,c3,c4,c5,c6,c7,c8)

While it's very easy to do it manually, like this

V_A <- DF[,seq(1, ncol(DF), 3]
V_A <- stack(DF)
V_B <- DF[,seq(2, ncol(DF), 3]
V_B <- stack(DF)
V_C <- DF[,seq(3, ncol(DF), 3]
V_C <- stack(DF)

what I'm looking for is a code that does this automatically, so that it will work for data frames with every number of variables without having to write ad-hoc codes every time. To sum up, the code should: 1) select every nth column in the data frame 2) stack this columns 3) save the result in new objects automatically created

I feel there must be a way to do this but I haven't succeeded so far. Thanks very much in advance.

EDIT Let's say I am in a slightly different situation, in which the columns repeat but not with exactly the same name, and I still want to do the same thing. So I have:

     A1      B1      C1      A2      B2      C2 
1    a11     b11     c11     a25     b25     c25
2    a12     b12     c12     a26     b26     c26
3    a13     b13     c13     a27     b27     c27
4    a14     b14     c14     a28     b28     c28

and I want:

V_A <- c(a11,a12,a13,a14,a25,a26,a27,a28)
V_B <- c(b11,b12,b13,b14,b25,b26,b27,b28)
V_C <- c(c11,c12,c13,c14,c25,c26,c27,c28)

How can I do it?

G. Grothendieck

Here are some alternatives. No packages are used.

1) aperm Create a 3d array a, permute the dimensions and reshape into a matrix m and then convert that to a data frame. This one only works if all values are of the same type. (2) and (3) do not have this limitation.

k <- 3
nr <- nrow(DF)
nc <- ncol(DF)
unames <- unique(names(DF))

a <- array(as.matrix(DF), c(nr, k, nc/k))
m <- matrix(aperm(a, c(1, 3, 2)),, k, dimnames = list(NULL, unames))
as.data.frame(m, stringsAsFactors = FALSE)

giving:

   A  B  C
1 a1 b1 c1
2 a2 b2 c2
3 a3 b3 c3
4 a4 b4 c4
5 a5 b5 c5
6 a6 b6 c6
7 a7 b7 c7
8 a8 b8 c8

If we are in the situation given in the question's EDIT then replace unames with the following where DF2 is DF with the revised names as per Note at end:

unames <- unique(sub("\\d*$", "", names(DF2)))

2) lapply This generalizes the code in the question. unames is defined above:

L <- lapply(split(as.list(DF), names(DF)), unlist)
as.data.frame(L, stringsAsFactors = FALSE)

giving:

   A  B  C
1 a1 b1 c1
2 a2 b2 c2
3 a3 b3 c3
4 a4 b4 c4
5 a5 b5 c5
6 a6 b6 c6
7 a7 b7 c7
8 a8 b8 c8

With the input shown in the question's EDIT it could be done like this where DF2 is given reproducibly in the Note at the end.

names0 <- sub("\\d*$", "", names(DF2))   # names without the trailing digits
L <- lapply(split(as.list(DF2), names0), unlist)
as.data.frame(L, stringsAsFactors = FALSE)

3) reshape nc and unames are from above. varying is a list with k components such as that the ith component contains the index vector c(i, i+k, ...). It seems that reshape does not like duplicated names so we have given it setNames(DF, 1:nc) as the input. This solution does have the advantage of also generating the index vectors time and id which relate the output to the input data.

varying <- split(1:nc, names(DF))
reshape(setNames(DF, 1:nc), dir = "long", varying = varying, v.names = unames)

giving:

    time  A  B  C id
1.1    1 a1 b1 c1  1
2.1    1 a2 b2 c2  2
3.1    1 a3 b3 c3  3
4.1    1 a4 b4 c4  4
1.2    2 a5 b5 c5  1
2.2    2 a6 b6 c6  2
3.2    2 a7 b7 c7  3
4.2    2 a8 b8 c8  4

With the input shown in the question's EDIT it actually simplifies. We no longer need to use setNames(DF, 1:nc) but can just use the data frame as is as input. Also, we can use varying=TRUE (also see @thelatemail's comment) instead of calculating a complex argument for varying. The input DF2 is as shown in the Note at the end and names0 is as in (2) above.

reshape(DF2, dir = "long", varying = TRUE, v.names = unique(names0))

Note:

Lines <- "      A      B      C      A      B      C 
1    a1     b1     c1     a5     b5     c5
2    a2     b2     c2     a6     b6     c6
3    a3     b3     c3     a7     b7     c7
4    a4     b4     c4     a8     b8     c8"
DF <- read.table(text = Lines, as.is = TRUE, check.names = FALSE)

DF2 <- setNames(DF, c("A1", "B1", "C1", "A2", "B2", "C2")) # test input

Upate: A number of simplifications. Also added DF2 in Note at end and discuss in each alternative how to modify the code to deal with it. (A general method might be just to reduce DF2 to DF as I discussed in the comments below.)

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

Nth smallest value for every column in data.frame in R

Save for loop data to a new column in data frame

Save every nth file to a new subfolder following a if statement in R

Add every three rows of data frame and them multiply the new results - R

How to sort the columns of every row of a data frame and then save the column name in the corresponding columns in R?

How to split the the strings of column in a data frame and save them in different columns?

Get sum of every n th column for each individual and create new data frame in r

How to fill every nth row of a data.frame column with values from another data.frame's column?

Create a new row containing column sums for every data frame in a list

How do I split a data frame among columns, say at every nth column?

Change column name every nth column in R

R Creating new row for every subject in the data frame

More efficient way to get averages from data frame in R and save them back to same data frame

Stack Based Column Sum in a data frame using R

Sum all values in every column of a data.frame in R

Is there an R function for performing basic operations on every column of a data frame?

Stack every other column in R

How to calculate the correlation of 2 variables for every nth rows in a data frame in r?

Transpose the data in a column every nth rows in PANDAS

Create new data frame column based on conditioned column names in R

Splitting a column in a data frame by an nth instance of a character

Recode column every nth element in R

Deleting every nth column from a dataframe in r

How can I transpose every nth (9th) column to a new row in R?

Get nth element of a vector inside a data.frame column for the whole column - R

Stack every nth column under neath each other

Query data from another data frame for new column in r

How to create a new column in R from data in another data frame?

pairwise substring a data frame and save them in a list