Reading data from an XML File Using R

DesertProject

After reading through and trying several earlier StackOverflow examples on reading an XML file in R, it seems that the “jagged” structure of the following file rules out XPath-based methods.

https://www.dropbox.com/s/jz8sj2fifuobkva/Data.xml?oref=e&n=305307914

Therefore, it seems I need to use a combination of xmlToList() and ldply() to read data from this file.

Specifically, for all 20 events in the file (i.e. event.1, event.2, … event.20), I want to get the following variables, structured as:

$movements$movement$clips$clip$data$event$begin (vector)
$movements$movement$clips$clip$data$event$end (vector)
$movements$movement$clips$clip$data$event$max$cells (data frame)
As above, but for $rollover$data$quant$cells, where there are several samples within an event (n data frames)

Based on other StackOverflow examples, the code I have tried (using R v3.1.2) to read the “begin” data is as follows:

library(XML)
library(plyr)

datfile <- "D:/Data.xml"
xmlfile <- xmlTreeParse(datfile, useInternalNodes = TRUE)
sampledata <- xmlToList(xmlfile)
startdata <- ldply(sampledata$movements$movement$clips$clip$data$event$begin)

When I do this I only get the first value (0.240), from event.1. I am now stuck and have exhausted my investigations into how to do this.
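A likely cause, sketched here with a toy list rather than the real file: when sibling elements repeat, the parsed list has several entries with the same name, and `$` returns only the first entry whose name matches. The list `data_node` and its values are invented for illustration, and the sketch assumes the `event` elements sit side by side under one parent. Selecting by name keeps every `event`.

```r
# Toy stand-in for the parsed XML: three sibling entries share the name "event".
# (Values are invented; the real file is not reproduced here.)
data_node <- list(
  event = list(begin = "0.240", end = "0.700"),
  event = list(begin = "0.730", end = "1.200"),
  event = list(begin = "1.250", end = "1.710")
)

# `$` stops at the first entry named "event", so only one value comes back.
first_only <- data_node$event$begin

# Selecting by name keeps every "event"; vapply() then pulls out each begin.
events <- data_node[names(data_node) == "event"]
begin  <- as.numeric(vapply(events, function(e) e$begin, character(1)))
```

The same subsetting idea would apply at whichever level of the real list the repeated names occur.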

hrbrmstr

If you're willing to give xml2 a go, you can get to begin in a few lines:

library(xml2)
library(magrittr)

# get a vector

doc <- read_xml("~/Dropbox/Data.xml")

doc %>%
  xml_find_all("//d1:event/d1:begin", ns=xml_ns(doc)) %>%
  xml_text() %>%
  as.numeric()

##  [1] 0.24 0.73 1.25 1.75 2.24 2.75 3.27 3.76 4.30 4.77 5.28 5.78 6.32 6.82
## [15] 7.34 7.85 8.37 8.86 9.39 9.89
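The `d1:` prefix in the XPath is the name xml2 assigns to a document's default namespace. A minimal sketch of why it is needed, using an invented two-event document (the namespace URI and the values are placeholders, not taken from Data.xml), and showing that `end` can be read the same way as `begin`:

```r
library(xml2)

# Invented two-event document with a default namespace (the URI is a placeholder).
toy <- read_xml(
  '<root xmlns="http://example.com/ns">
     <event><begin>0.24</begin><end>0.70</end></event>
     <event><begin>0.73</begin><end>1.20</end></event>
   </root>'
)

ns <- xml_ns(toy)   # the default namespace is registered under the prefix "d1"

# Without the prefix, the XPath looks for un-namespaced elements and finds none.
no_prefix <- xml_find_all(toy, "//event/begin")

# With the prefix, the pattern used for begin also works for end.
begin <- as.numeric(xml_text(xml_find_all(toy, "//d1:event/d1:begin", ns = ns)))
end   <- as.numeric(xml_text(xml_find_all(toy, "//d1:event/d1:end", ns = ns)))
```

Here `no_prefix` is an empty node set, while `begin` and `end` each hold one value per event.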

# get data frames

library(stringr)

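make_df <- function(txt) {

  # split the node text into lines, trim each one, and expose them as a connection
  txt %>%
    str_split("\n") %>% extract2(1) %>%
    str_trim() %>%
    textConnection() -> con

  # parse the whitespace-separated values, then release the connection
  dat <- read.table(con)
  close(con)

  dat

}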

doc %>%
  xml_find_all("//d1:max/d1:cells", ns=xml_ns(doc)) %>%
  xml_text() %>%
  lapply(make_df) -> df_list

df_list[[1]]

##     V1   V2   V3   V4   V5   V6   V7   V8   V9 V10 V11 V12
## 1  0.0  0.0  1.5  3.5  3.0  1.5  0.0  0.0  0.0 0.0 0.0   0
## 2  0.0  1.0  5.5  8.5  7.0  3.5  2.0  2.0  1.0 0.0 0.0   0
## 3  0.0  3.0  9.0 13.0  9.0  4.0  3.0  3.5  2.5 1.0 0.0   0
## 4  0.0  4.5 11.0 14.0  9.0  4.0  3.0  4.0  4.0 2.0 0.0   0
## 5  0.0  4.0 10.5 12.0  7.5  4.0  3.0  4.0  4.5 3.0 0.0   0
## 6  0.0  4.5  8.5 10.0  8.0  7.5  6.5  4.5  4.0 2.5 0.0   0
## 7  2.0  8.0 14.5 16.0 14.0 13.5 13.0  9.5  5.5 2.5 0.0   0
## 8  3.5 12.0 20.0 20.5 18.0 18.0 18.0 14.5  9.0 4.0 1.5   0
## 9  4.5 12.5 20.5 21.0 18.0 18.0 18.5 16.0 11.5 6.5 2.5   0
## 10 4.5 12.0 19.0 20.0 17.5 17.5 18.0 16.5 12.5 7.5 3.5   0
## 11 3.5  9.5 15.5 16.5 15.0 14.5 14.5 14.0 11.5 8.0 4.0   1
## 12 2.0  6.5 10.0 12.0 11.0 11.0 12.0 12.0 10.5 7.5 4.0   0
## 13 1.5  4.5  6.5  7.0  7.0  7.0  8.0  9.0  8.0 6.5 3.5   0
## 14 1.0  4.0  5.5  5.5  5.5  5.5  6.0  6.0  6.0 4.5 2.5   0
## 15 1.5  4.5  6.0  5.5  5.5  5.5  5.5  5.5  5.5 4.0 2.0   0
## 16 2.0  5.0  7.0  7.0  6.0  6.0  6.0  6.0  5.5 4.0 1.5   0
## 17 2.5  5.5  7.5  7.5  7.0  7.0  6.5  6.5  5.5 4.0 1.5   0
## 18 2.0  5.5  7.0  7.5  7.5  7.5  7.5  6.5  5.5 3.5 0.0   0
## 19 2.5  5.5  7.5  8.0  7.5  8.0  7.5  6.5  5.0 2.5 0.0   0
## 20 2.0  5.0  6.5  7.5  7.5  8.0  7.5  6.5  4.5 2.0 0.0   0
## 21 1.5  4.0  6.0  7.5  8.5  8.5  8.0  6.0  3.5 1.0 0.0   0
## 22 1.0  3.5  6.5  8.5  9.5  9.5  8.0  5.5  3.0 0.0 0.0   0
## 23 0.0  4.0  8.0 11.0 12.5 11.0  8.5  5.5  2.5 0.0 0.0   0
## 24 0.0  4.5  9.5 13.5 14.5 12.0  8.5  5.5  2.0 0.0 0.0   0
## 25 0.0  5.5 13.0 17.5 17.0 14.5  9.5  5.5  1.5 0.0 0.0   0
## 26 0.0  6.5 16.0 21.0 19.5 15.5 10.0  5.0  1.0 0.0 0.0   0
## 27 0.0  7.0 17.0 22.5 21.0 16.0 10.0  5.0  0.0 0.0 0.0   0
## 28 0.0  7.0 17.5 22.5 20.5 15.5  9.0  3.5  0.0 0.0 0.0   0
## 29 0.0  5.5 14.5 20.5 18.5 14.0  8.0  2.5  0.0 0.0 0.0   0
## 30 0.0  3.5 10.0 14.5 14.0 10.0  5.0  1.0  0.0 0.0 0.0   0
## 31 0.0  1.5  5.5  8.5  8.0  5.5  2.5  0.0  0.0 0.0 0.0   0
## 32 0.0  0.0  0.0  2.5  2.5  0.0  0.0  0.0  0.0 0.0 0.0   0

length(df_list)

## [1] 20
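If you'd rather not pull in stringr for this step, a base-R sketch of the same idea as make_df() is possible, since `read.table()` can parse a string directly through its `text` argument. The name `make_df_base` and the sample text are made up for illustration; the sketch assumes each cells node holds whitespace-separated numbers, one row per line.

```r
# Base-R variant of make_df(): read.table() parses a string via `text =`.
# Blank lines and leading spaces are skipped by read.table()'s defaults.
make_df_base <- function(txt) {
  read.table(text = txt)
}

# Invented 2 x 3 block standing in for the text of one cells node.
cells_txt <- "\n  0.0 1.5 3.5\n  1.0 5.5 8.5\n"
df <- make_df_base(cells_txt)
```

It can be dropped into the `lapply()` call in place of make_df.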

# get the deeply nested ones

quant_cells <- function(node) {
  node %>%
    xml_find_all("./d1:data/d1:quant/d1:cells", ns=xml_ns(doc)) %>%
    xml_text() %>%
    lapply(make_df)
}

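# iterate over the node set directly: as_list() in current xml2 releases
# converts nodes to plain R lists, which xml_find_all() cannot query
doc %>%
  xml_find_all("//d1:rollover", ns=xml_ns(doc)) %>%
  lapply(quant_cells) -> quant_df_list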

length(quant_df_list)

## [1] 20

length(quant_df_list[[1]])

## [1] 63

quant_df_list[[1]]

## [[1]]
##    V1  V2  V3  V4  V5 V6
## 1 0.0 0.0 0.0 0.0 0.0  0
## 2 0.0 0.0 0.2 0.0 0.0  0
## 3 0.0 0.5 1.7 0.5 0.0  0
## 4 0.5 2.7 3.4 2.3 0.3  0
## 5 2.3 4.3 4.4 3.0 0.4  0
## 6 3.2 4.8 4.8 3.3 0.4  0
## 7 2.2 4.1 3.8 2.3 0.3  0
## 8 0.3 1.4 1.4 0.4 0.0  0
## 
## [[2]]
##    V1  V2   V3   V4  V5  V6  V7  V8 V9
## 1 0.0 0.0  0.0  0.0 0.0 0.0 0.0 0.0  0
## 2 0.0 0.3  0.9  1.3 1.1 0.4 0.0 0.0  0
## 3 0.2 2.2  4.5  5.9 4.7 2.0 0.2 0.0  0
## 4 1.0 5.3  8.5  9.1 7.1 3.7 0.4 0.0  0
## 5 2.9 8.3 12.0 11.6 9.0 5.4 1.0 0.0  0
## 6 3.5 9.2 13.5 12.9 9.6 5.8 1.5 0.1  0
## 7 3.0 8.2 11.6 11.3 8.3 4.4 0.5 0.0  0
## 8 1.1 3.7  6.4  6.3 4.0 1.8 0.2 0.0  0
## 9 0.0 0.2  1.4  1.5 0.3 0.0 0.0 0.0  0
## ...
## (down to [[63]])

Edited at 2020-10-23
