From reading through and then trying several earlier StackOverflow examples on reading an XML file in R, it seems that the “jagged” nature of the following file means I can’t use XPath-related methods.
https://www.dropbox.com/s/jz8sj2fifuobkva/Data.xml?oref=e&n=305307914
Therefore, it seems I need to use a combination of xmlToList() and ldply() to read data from this file.
Specifically, for all 20 events in the file (i.e. event.1, event.2, … event.20), I want to get the following variables (structured as):
$movements$movement$clips$clip$data$event$begin (vector)
$movements$movement$clips$clip$data$event$end (vector)
$movements$movement$clips$clip$data$event$max$cells (data frame)
$rollover$data$quant$cells (n data frames, as there are several samples within an event)
Based on other StackOverflow examples, the code I have tried (using R v3.1.2) to read the “begin” data is as follows:
library(XML)
library(plyr)
datfile <- "D:/Data.xml"
xmlfile <- xmlTreeParse(datfile,useInternal = TRUE)
sampledata <- xmlToList(xmlfile)
startdata <- ldply(sampledata$movements$movement$clips$clip$data$event$begin)
When I do this I only get the first variable (0.240) in event.1. I have now got to the point where I am stuck, and have exhausted my investigations on how to do this.
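A likely reason only one value comes back: as far as I know, xmlToList() gives repeated child elements the same name, and $ on an R list returns only the first element whose name matches. The sketch below reproduces that behaviour with a small hand-built list and shows one way to collect every event instead. The data_node name and the end values are made up for illustration and are not taken from Data.xml.

```r
# Hand-built stand-in for the list xmlToList() returns; values are illustrative.
data_node <- list(
  event = list(begin = "0.240", end = "0.480"),
  event = list(begin = "0.730", end = "0.960"),
  event = list(begin = "1.250", end = "1.490")
)

# `$` stops at the first element named "event".
first_only <- data_node$event$begin

# Select every element named "event", then pull `begin` from each one.
events <- data_node[names(data_node) == "event"]
all_begins <- as.numeric(vapply(events, function(e) e$begin, character(1)))

print(first_only)
print(all_begins)
```

The same subsetting by name would need to be repeated at every level of the path that can hold more than one child.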
If you're willing to give xml2 a go, you can get to begin in a few lines:
library(xml2)
library(magrittr)
# get a vector
doc <- read_xml("~/Dropbox/Data.xml")
doc %>%
  xml_find_all("//d1:event/d1:begin", ns=xml_ns(doc)) %>%
  xml_text() %>%
  as.numeric()
## [1] 0.24 0.73 1.25 1.75 2.24 2.75 3.27 3.76 4.30 4.77 5.28 5.78 6.32 6.82
## [15] 7.34 7.85 8.37 8.86 9.39 9.89
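The d1: prefix in the XPath above suggests that the file declares a default namespace, which xml2 exposes under the name d1. Here is a minimal sketch of that behaviour on a toy document. The namespace URI and the toy structure are placeholders, not the real contents of Data.xml.

```r
library(xml2)

# Toy document with a default namespace; the URI is a made-up placeholder.
toy <- read_xml(
  '<root xmlns="http://example.com/ns">
     <event><begin>0.24</begin></event>
     <event><begin>0.73</begin></event>
   </root>'
)

# Without a prefix, the XPath matches nothing because the elements are namespaced.
no_prefix <- xml_find_all(toy, "//event/begin")

# xml_ns() maps the default namespace to the prefix d1.
ns <- xml_ns(toy)
with_prefix <- xml_find_all(toy, "//d1:event/d1:begin", ns = ns)
begins <- as.numeric(xml_text(with_prefix))

print(length(no_prefix))
print(begins)
```

Recent versions of xml2 also provide xml_ns_strip(), which removes the default namespace so that unprefixed XPath expressions match. Whether that is preferable depends on whether the namespace matters elsewhere in your workflow.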
# get data frames
library(stringr)
make_df <- function(txt) {
  # split the text block into lines, trim whitespace, and read it as a table
  txt %>%
    str_split("\n") %>% extract2(1) %>%
    str_trim() %>%
    textConnection() -> con
  dat <- read.table(con)
  close(con)
  dat
}
doc %>%
  xml_find_all("//d1:max/d1:cells", ns=xml_ns(doc)) %>%
  xml_text() %>%
  lapply(make_df) -> df_list
df_list[[1]]
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
## 1 0.0 0.0 1.5 3.5 3.0 1.5 0.0 0.0 0.0 0.0 0.0 0
## 2 0.0 1.0 5.5 8.5 7.0 3.5 2.0 2.0 1.0 0.0 0.0 0
## 3 0.0 3.0 9.0 13.0 9.0 4.0 3.0 3.5 2.5 1.0 0.0 0
## 4 0.0 4.5 11.0 14.0 9.0 4.0 3.0 4.0 4.0 2.0 0.0 0
## 5 0.0 4.0 10.5 12.0 7.5 4.0 3.0 4.0 4.5 3.0 0.0 0
## 6 0.0 4.5 8.5 10.0 8.0 7.5 6.5 4.5 4.0 2.5 0.0 0
## 7 2.0 8.0 14.5 16.0 14.0 13.5 13.0 9.5 5.5 2.5 0.0 0
## 8 3.5 12.0 20.0 20.5 18.0 18.0 18.0 14.5 9.0 4.0 1.5 0
## 9 4.5 12.5 20.5 21.0 18.0 18.0 18.5 16.0 11.5 6.5 2.5 0
## 10 4.5 12.0 19.0 20.0 17.5 17.5 18.0 16.5 12.5 7.5 3.5 0
## 11 3.5 9.5 15.5 16.5 15.0 14.5 14.5 14.0 11.5 8.0 4.0 1
## 12 2.0 6.5 10.0 12.0 11.0 11.0 12.0 12.0 10.5 7.5 4.0 0
## 13 1.5 4.5 6.5 7.0 7.0 7.0 8.0 9.0 8.0 6.5 3.5 0
## 14 1.0 4.0 5.5 5.5 5.5 5.5 6.0 6.0 6.0 4.5 2.5 0
## 15 1.5 4.5 6.0 5.5 5.5 5.5 5.5 5.5 5.5 4.0 2.0 0
## 16 2.0 5.0 7.0 7.0 6.0 6.0 6.0 6.0 5.5 4.0 1.5 0
## 17 2.5 5.5 7.5 7.5 7.0 7.0 6.5 6.5 5.5 4.0 1.5 0
## 18 2.0 5.5 7.0 7.5 7.5 7.5 7.5 6.5 5.5 3.5 0.0 0
## 19 2.5 5.5 7.5 8.0 7.5 8.0 7.5 6.5 5.0 2.5 0.0 0
## 20 2.0 5.0 6.5 7.5 7.5 8.0 7.5 6.5 4.5 2.0 0.0 0
## 21 1.5 4.0 6.0 7.5 8.5 8.5 8.0 6.0 3.5 1.0 0.0 0
## 22 1.0 3.5 6.5 8.5 9.5 9.5 8.0 5.5 3.0 0.0 0.0 0
## 23 0.0 4.0 8.0 11.0 12.5 11.0 8.5 5.5 2.5 0.0 0.0 0
## 24 0.0 4.5 9.5 13.5 14.5 12.0 8.5 5.5 2.0 0.0 0.0 0
## 25 0.0 5.5 13.0 17.5 17.0 14.5 9.5 5.5 1.5 0.0 0.0 0
## 26 0.0 6.5 16.0 21.0 19.5 15.5 10.0 5.0 1.0 0.0 0.0 0
## 27 0.0 7.0 17.0 22.5 21.0 16.0 10.0 5.0 0.0 0.0 0.0 0
## 28 0.0 7.0 17.5 22.5 20.5 15.5 9.0 3.5 0.0 0.0 0.0 0
## 29 0.0 5.5 14.5 20.5 18.5 14.0 8.0 2.5 0.0 0.0 0.0 0
## 30 0.0 3.5 10.0 14.5 14.0 10.0 5.0 1.0 0.0 0.0 0.0 0
## 31 0.0 1.5 5.5 8.5 8.0 5.5 2.5 0.0 0.0 0.0 0.0 0
## 32 0.0 0.0 0.0 2.5 2.5 0.0 0.0 0.0 0.0 0.0 0.0 0
length(df_list)
## [1] 20
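The make_df() helper turns a block of whitespace-separated text into a data frame. As a possible simplification, base R's read.table() accepts the text directly through its text argument, which avoids managing the connection by hand. This is only a sketch: make_df_text is a hypothetical name, and the sample string assumes the cell text is laid out as indented rows of numbers.

```r
# Illustrative cell text, shaped like the contents of a <cells> element.
cells_txt <- "\n    0.0 0.0 1.5\n    0.0 1.0 5.5\n    0.0 3.0 9.0\n"

# read.table() splits on whitespace by default and skips blank lines.
make_df_text <- function(txt) {
  read.table(text = txt)
}

cells_df <- make_df_text(cells_txt)
print(cells_df)
print(dim(cells_df))
```

If that assumption holds for the real file, make_df_text could be passed to lapply() in place of make_df.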
# get the deeply nested ones
quant_cells <- function(node) {
  node %>%
    xml_find_all("./d1:data/d1:quant/d1:cells", ns=xml_ns(doc)) %>%
    xml_text() %>%
    lapply(make_df)
}
# iterate over the nodeset directly: in current xml2, as_list() converts
# nodes to plain R lists, which xml_find_all() cannot search
doc %>%
  xml_find_all("//d1:rollover", ns=xml_ns(doc)) %>%
  lapply(quant_cells) -> quant_df_list
length(quant_df_list)
## [1] 20
length(quant_df_list[[1]])
## [1] 63
quant_df_list[[1]]
## [[1]]
## V1 V2 V3 V4 V5 V6
## 1 0.0 0.0 0.0 0.0 0.0 0
## 2 0.0 0.0 0.2 0.0 0.0 0
## 3 0.0 0.5 1.7 0.5 0.0 0
## 4 0.5 2.7 3.4 2.3 0.3 0
## 5 2.3 4.3 4.4 3.0 0.4 0
## 6 3.2 4.8 4.8 3.3 0.4 0
## 7 2.2 4.1 3.8 2.3 0.3 0
## 8 0.3 1.4 1.4 0.4 0.0 0
##
## [[2]]
## V1 V2 V3 V4 V5 V6 V7 V8 V9
## 1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0
## 2 0.0 0.3 0.9 1.3 1.1 0.4 0.0 0.0 0
## 3 0.2 2.2 4.5 5.9 4.7 2.0 0.2 0.0 0
## 4 1.0 5.3 8.5 9.1 7.1 3.7 0.4 0.0 0
## 5 2.9 8.3 12.0 11.6 9.0 5.4 1.0 0.0 0
## 6 3.5 9.2 13.5 12.9 9.6 5.8 1.5 0.1 0
## 7 3.0 8.2 11.6 11.3 8.3 4.4 0.5 0.0 0
## 8 1.1 3.7 6.4 6.3 4.0 1.8 0.2 0.0 0
## 9 0.0 0.2 1.4 1.5 0.3 0.0 0.0 0.0 0
## ...
## (down to [[63]])
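The question also asks for end, and worries about the file being “jagged”. One way to keep per-event values aligned, even if some events lack a child, is to query each event node with xml_find_first(), which returns one result per input node. The sketch below uses a toy document in which the namespace URI and the end values are invented, and the second event deliberately has no end element.

```r
library(xml2)

# Toy document; the namespace URI and the end values are made up, and the
# second event deliberately lacks an <end> element.
toy <- read_xml(
  '<data xmlns="http://example.com/ns">
     <event><begin>0.24</begin><end>0.48</end></event>
     <event><begin>0.73</begin></event>
     <event><begin>1.25</begin><end>1.49</end></event>
   </data>'
)
ns <- xml_ns(toy)
events <- xml_find_all(toy, "//d1:event", ns = ns)

# xml_find_first() returns one result per event, so the columns stay aligned;
# a missing child becomes NA.
timing <- data.frame(
  begin = as.numeric(xml_text(xml_find_first(events, "./d1:begin", ns = ns))),
  end   = as.numeric(xml_text(xml_find_first(events, "./d1:end", ns = ns)))
)

print(timing)
```

Applied to the real file, the same pattern should give one row per event, assuming begin and end sit directly under each event as the paths in the question suggest.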