Flatten a list with complex nested structure

Zelazny7

I have a list with the following example structure:

> dput(test)
structure(list(id = 1, var1 = 2, var3 = 4, section1 = structure(list(
    var1 = 1, var2 = 2, var3 = 3), .Names = c("var1", "var2", 
"var3")), section2 = structure(list(row = structure(list(var1 = 1, 
    var2 = 2, var3 = 3), .Names = c("var1", "var2", "var3")), 
    row = structure(list(var1 = 4, var2 = 5, var3 = 6), .Names = c("var1", 
    "var2", "var3")), row = structure(list(var1 = 7, var2 = 8, 
        var3 = 9), .Names = c("var1", "var2", "var3"))), .Names = c("row", 
"row", "row"))), .Names = c("id", "var1", "var3", "section1", 
"section2"))


> str(test)
List of 5
 $ id      : num 1
 $ var1    : num 2
 $ var3    : num 4
 $ section1:List of 3
  ..$ var1: num 1
  ..$ var2: num 2
  ..$ var3: num 3
 $ section2:List of 3
  ..$ row:List of 3
  .. ..$ var1: num 1
  .. ..$ var2: num 2
  .. ..$ var3: num 3
  ..$ row:List of 3
  .. ..$ var1: num 4
  .. ..$ var2: num 5
  .. ..$ var3: num 6
  ..$ row:List of 3
  .. ..$ var1: num 7
  .. ..$ var2: num 8
  .. ..$ var3: num 9

Notice that the section2 list contains elements named rows. These represent multiple records. What I have is a nested list where some elements are at the root level and others are multiple nested records for the same observation. I would like the following output in a data.frame format:

> desired
  id var1 var3 section1.var1 section1.var2 section1.var3 section2.var1 section2.var2 section2.var3
1  1    2    4             1             2               3             1             4             7
2 NA   NA   NA            NA            NA              NA             2             5             8
3 NA   NA   NA            NA            NA              NA             3             6             9

Root-level elements should populate the first row, while row elements should have their own rows. As an added complication, the number of variables in the row entries can vary.

tiffany

Here's a general approach. It doesn't assume that you'll have only three row; it will work with however many rows you have. And if a value is missing in the nested structure (e.g. var1 doesn't exist for some sub-lists in section2), the code correctly returns an NA for that cell.

E.g. if we use the following data:

test <- structure(list(id = 1, var1 = 2, var3 = 4, section1 = structure(list(var1 = 1, var2 = 2, var3 = 3), .Names = c("var1", "var2", "var3")), section2 = structure(list(row = structure(list(var1 = 1, var2 = 2), .Names = c("var1", "var2")), row = structure(list(var1 = 4, var2 = 5), .Names = c("var1", "var2")), row = structure(list( var2 = 8, var3 = 9), .Names = c("var2", "var3"))), .Names = c("row", "row", "row"))), .Names = c("id", "var1", "var3", "section1", "section2"))

The general approach is to use melt to create a dataframe that includes information about the nested structure, and then dcast to mold it into the format you desire.

library("reshape2")

flat <- unlist(test, recursive=FALSE)
names(flat)[grep("row", names(flat))] <- gsub("row", "var", paste0(names(flat)[grep("row", names(flat))], seq_len(length(names(flat)[grep("row", names(flat))]))))  ## keeps track of rows by adding an ID
ul <- melt(unlist(flat))
split <- strsplit(rownames(ul), split=".", fixed=TRUE) ## splits the names into component parts
max <- max(unlist(lapply(split, FUN=length)))
pad <- function(a) {
  c(a, rep(NA, max-length(a)))
}
levels <- matrix(unlist(lapply(split, FUN=pad)), ncol=max, byrow=TRUE)

## Get the nesting structure
nested <- data.frame(levels, ul)
nested$X3[is.na(nested$X3)] <- levels(as.factor(nested$X3))[[1]]
desired <- dcast(nested, X3~X1 + X2)
names(desired) <- gsub("_", "\\.", gsub("_NA", "", names(desired)))
desired <- desired[,names(flat)]

> desired
  ## id var1 var3 section1.var1 section1.var2 section1.var3 section2.var1 section2.var2 section2.var3
## 1  1    2    4             1             2             3             1             4             7
## 2 NA   NA   NA            NA            NA            NA             2             5             8
## 3 NA   NA   NA            NA            NA            NA             3             6             9

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

How to flatten irregular nested sublists inside a list

How to flatten out a nested json structure in go

How initialize nested complex structure

flatten nested list by averaging vectors

Recursively flatten nested list in python

Flatten nested lists in a list

Nested list comprehension to flatten nested list

flatten a nested list by using generator

flatten nested data structure in Spark

Flatten list of nested namedtuples to list of dictionaries

How to flatten nested list keeping inner structure

Flatten complex directory structure in Python

Flatten nested list into 1-deep list

How to flatten a nested list in python?

SelectMany to flatten a nested structure

Why does my Pradicate my_flatten/2 not flatten a nested list structure? (Prolog)

Flatten Nested List using AutoMapper

Complex model to nested list

Flatten a nested dict structure into a dataset

How can I prepopulate the structure of a complex nested form on the Create button in a List view in react-admin?

Typescript flatten complex nested array

How to validate a complex nested data structure with Pydantic?

Flatten a deeply nested data structure of arrays, objects + strings into a list of data items while mapping the former parent-child relationship too

GO is a complex nested structure

alternately flatten values of nested list

Flatten deeply nested list of dataframes

Flatten a nested list structure using concatMap gives error

Flatten a partially nested list in Ansible

Flatten a complex nested having a common column JSON using jolt transform