Reading data from an XML File Using R

DesertProject

After reading through and trying several earlier StackOverflow examples on reading XML files in R, it seems that the “jagged” structure of the following file rules out XPath-based methods.

https://www.dropbox.com/s/jz8sj2fifuobkva/Data.xml?oref=e&n=305307914

It therefore seems that I need to combine xmlToList() and ldply() to read the data from this file.

Specifically, for all 20 events in the file (i.e. event.1, event.2, … event.20), I want to extract the following variables, structured as:

  • $movements$movement$clips$clip$data$event$begin (vector)
  • $movements$movement$clips$clip$data$event$end (vector)
  • $movements$movement$clips$clip$data$event$max$cells (data frame)
  • The same as above, but for $rollover$data$quant$cells, where each event contains several samples (n data frames)

Based on other StackOverflow examples, the code I have tried (using R v3.1.2) to read the “begin” data is as follows:

library(XML)
library(plyr)

datfile <- "D:/Data.xml"
xmlfile <- xmlTreeParse(datfile, useInternalNodes = TRUE)
sampledata <- xmlToList(xmlfile)
startdata <- ldply(sampledata$movements$movement$clips$clip$data$event$begin)

When I run this, I get only the first value (0.240) from event.1. I am now stuck and have exhausted my ideas for how to proceed.
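A likely reason for getting only one value: when a list holds several elements with the same name, `$` returns just the first match. The sketch below uses a hand-built list as a stand-in for what xmlToList() probably produces for repeated event elements. The `begin` values come from this thread; the `end` values are invented for illustration.

```r
# Toy stand-in for an xmlToList() result: repeated <event> children
# become list elements that all share the name "event".
# The "end" values are made up for this example.
events <- list(
  event = list(begin = "0.240", end = "0.700"),
  event = list(begin = "0.730", end = "1.210"),
  event = list(begin = "1.250", end = "1.720")
)

# `$` stops at the first element called "event", so only event 1 is reached.
first_only <- events$event$begin

# Selecting by name keeps every element called "event".
all_events <- events[names(events) == "event"]
begin <- as.numeric(vapply(all_events, function(e) e$begin, character(1)))
```

In the real file the same repetition may occur at more than one level (movement, clip, event), so each level would need the same treatment.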

hrbrmstr

If you're willing to give xml2 a go, you can get the begin values in a few lines:

library(xml2)
library(magrittr)

# get a vector

doc <- read_xml("~/Dropbox/Data.xml")

doc %>%
  xml_find_all("//d1:event/d1:begin", ns=xml_ns(doc)) %>%
  xml_text() %>%
  as.numeric()

##  [1] 0.24 0.73 1.25 1.75 2.24 2.75 3.27 3.76 4.30 4.77 5.28 5.78 6.32 6.82
## [15] 7.34 7.85 8.37 8.86 9.39 9.89
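The `d1:` prefix in the XPath above is needed because the file appears to declare a default namespace, which xml2 exposes under the generated prefix `d1`. A minimal sketch on an inline document shows the difference. The element layout and the namespace URI here are assumptions, not taken from Data.xml.

```r
library(xml2)

# Hypothetical miniature document; the namespace URI is invented.
mini <- read_xml('<root xmlns="http://example.com/ns">
  <event><begin>0.240</begin></event>
  <event><begin>0.730</begin></event>
</root>')

# xml_ns() lists the namespaces; the default one is given the prefix "d1".
ns <- xml_ns(mini)

# Without the prefix the XPath matches nothing.
no_prefix <- xml_find_all(mini, "//event/begin")

# With the prefix and the namespace map, both <begin> nodes are found.
with_prefix <- xml_find_all(mini, "//d1:event/d1:begin", ns)
begin_vals <- as.numeric(xml_text(with_prefix))
```

The same pattern with `d1:end` in place of `d1:begin` should return the end values, assuming they sit alongside `begin` inside each event.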

# get data frames

library(stringr)

make_df <- function(txt) {

  txt %>%
    str_split("\n") %>% extract2(1) %>%
    str_trim() %>%
    textConnection() -> con

  dat <- read.table(con)
  close(con)

  dat

}
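As a possible simplification, read.table() can parse a string directly through its `text` argument, which avoids managing the connection by hand. This is a sketch, not a drop-in replacement. `make_df_text` is a hypothetical name, and the sample string assumes the cells text is whitespace-separated numbers, one row per line.

```r
# Hypothetical shorter variant of make_df(): read.table() opens and
# closes its own text connection when given `text =`.
make_df_text <- function(txt) {
  read.table(text = txt)
}

# Sample input shaped like a <cells> block: leading newline, indented rows.
cells_txt <- "\n  0.0 0.0 1.5\n  0.0 1.0 5.5\n"
small_df <- make_df_text(cells_txt)
```

With the default `sep = ""`, leading whitespace and blank lines are skipped, so no explicit trimming or splitting is required for input of this shape.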

doc %>%
  xml_find_all("//d1:max/d1:cells", ns=xml_ns(doc)) %>%
  xml_text() %>%
  lapply(make_df) -> df_list

df_list[[1]]

##     V1   V2   V3   V4   V5   V6   V7   V8   V9 V10 V11 V12
## 1  0.0  0.0  1.5  3.5  3.0  1.5  0.0  0.0  0.0 0.0 0.0   0
## 2  0.0  1.0  5.5  8.5  7.0  3.5  2.0  2.0  1.0 0.0 0.0   0
## 3  0.0  3.0  9.0 13.0  9.0  4.0  3.0  3.5  2.5 1.0 0.0   0
## 4  0.0  4.5 11.0 14.0  9.0  4.0  3.0  4.0  4.0 2.0 0.0   0
## 5  0.0  4.0 10.5 12.0  7.5  4.0  3.0  4.0  4.5 3.0 0.0   0
## 6  0.0  4.5  8.5 10.0  8.0  7.5  6.5  4.5  4.0 2.5 0.0   0
## 7  2.0  8.0 14.5 16.0 14.0 13.5 13.0  9.5  5.5 2.5 0.0   0
## 8  3.5 12.0 20.0 20.5 18.0 18.0 18.0 14.5  9.0 4.0 1.5   0
## 9  4.5 12.5 20.5 21.0 18.0 18.0 18.5 16.0 11.5 6.5 2.5   0
## 10 4.5 12.0 19.0 20.0 17.5 17.5 18.0 16.5 12.5 7.5 3.5   0
## 11 3.5  9.5 15.5 16.5 15.0 14.5 14.5 14.0 11.5 8.0 4.0   1
## 12 2.0  6.5 10.0 12.0 11.0 11.0 12.0 12.0 10.5 7.5 4.0   0
## 13 1.5  4.5  6.5  7.0  7.0  7.0  8.0  9.0  8.0 6.5 3.5   0
## 14 1.0  4.0  5.5  5.5  5.5  5.5  6.0  6.0  6.0 4.5 2.5   0
## 15 1.5  4.5  6.0  5.5  5.5  5.5  5.5  5.5  5.5 4.0 2.0   0
## 16 2.0  5.0  7.0  7.0  6.0  6.0  6.0  6.0  5.5 4.0 1.5   0
## 17 2.5  5.5  7.5  7.5  7.0  7.0  6.5  6.5  5.5 4.0 1.5   0
## 18 2.0  5.5  7.0  7.5  7.5  7.5  7.5  6.5  5.5 3.5 0.0   0
## 19 2.5  5.5  7.5  8.0  7.5  8.0  7.5  6.5  5.0 2.5 0.0   0
## 20 2.0  5.0  6.5  7.5  7.5  8.0  7.5  6.5  4.5 2.0 0.0   0
## 21 1.5  4.0  6.0  7.5  8.5  8.5  8.0  6.0  3.5 1.0 0.0   0
## 22 1.0  3.5  6.5  8.5  9.5  9.5  8.0  5.5  3.0 0.0 0.0   0
## 23 0.0  4.0  8.0 11.0 12.5 11.0  8.5  5.5  2.5 0.0 0.0   0
## 24 0.0  4.5  9.5 13.5 14.5 12.0  8.5  5.5  2.0 0.0 0.0   0
## 25 0.0  5.5 13.0 17.5 17.0 14.5  9.5  5.5  1.5 0.0 0.0   0
## 26 0.0  6.5 16.0 21.0 19.5 15.5 10.0  5.0  1.0 0.0 0.0   0
## 27 0.0  7.0 17.0 22.5 21.0 16.0 10.0  5.0  0.0 0.0 0.0   0
## 28 0.0  7.0 17.5 22.5 20.5 15.5  9.0  3.5  0.0 0.0 0.0   0
## 29 0.0  5.5 14.5 20.5 18.5 14.0  8.0  2.5  0.0 0.0 0.0   0
## 30 0.0  3.5 10.0 14.5 14.0 10.0  5.0  1.0  0.0 0.0 0.0   0
## 31 0.0  1.5  5.5  8.5  8.0  5.5  2.5  0.0  0.0 0.0 0.0   0
## 32 0.0  0.0  0.0  2.5  2.5  0.0  0.0  0.0  0.0 0.0 0.0   0

length(df_list)

## [1] 20
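Since the question refers to the events as event.1 … event.20, it may help to name the list elements accordingly. The sketch below uses three tiny made-up data frames in place of the real twenty, and assumes each event contributes exactly one data frame in document order.

```r
# Stand-in for df_list: three small invented data frames.
df_list <- list(
  data.frame(V1 = c(0.0, 0.0), V2 = c(0.0, 1.0)),
  data.frame(V1 = c(2.0, 3.5), V2 = c(8.0, 12.0)),
  data.frame(V1 = c(0.0, 0.0), V2 = c(5.5, 6.5))
)

# Label each element event.1, event.2, ... in order.
names(df_list) <- paste0("event.", seq_along(df_list))

# Look up one event by name and tabulate rows/columns for every event.
second <- df_list[["event.2"]]
dims <- t(sapply(df_list, dim))
```

The `dims` matrix has one row per event, which gives a quick check on whether all the data frames share the same shape.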

# get the deeply nested ones

quant_cells <- function(node) {
  node %>%
    xml_find_all("./d1:data/d1:quant/d1:cells", ns=xml_ns(doc)) %>%
    xml_text() %>%
    lapply(make_df)
}

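# iterate over the nodeset directly: in current xml2, as_list() converts
# nodes to plain R lists, which xml_find_all() cannot search
doc %>%
  xml_find_all("//d1:rollover", ns=xml_ns(doc)) %>%
  lapply(quant_cells) -> quant_df_list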

length(quant_df_list)

## [1] 20

length(quant_df_list[[1]])

## [1] 63

quant_df_list[[1]]

## [[1]]
##    V1  V2  V3  V4  V5 V6
## 1 0.0 0.0 0.0 0.0 0.0  0
## 2 0.0 0.0 0.2 0.0 0.0  0
## 3 0.0 0.5 1.7 0.5 0.0  0
## 4 0.5 2.7 3.4 2.3 0.3  0
## 5 2.3 4.3 4.4 3.0 0.4  0
## 6 3.2 4.8 4.8 3.3 0.4  0
## 7 2.2 4.1 3.8 2.3 0.3  0
## 8 0.3 1.4 1.4 0.4 0.0  0
## 
## [[2]]
##    V1  V2   V3   V4  V5  V6  V7  V8 V9
## 1 0.0 0.0  0.0  0.0 0.0 0.0 0.0 0.0  0
## 2 0.0 0.3  0.9  1.3 1.1 0.4 0.0 0.0  0
## 3 0.2 2.2  4.5  5.9 4.7 2.0 0.2 0.0  0
## 4 1.0 5.3  8.5  9.1 7.1 3.7 0.4 0.0  0
## 5 2.9 8.3 12.0 11.6 9.0 5.4 1.0 0.0  0
## 6 3.5 9.2 13.5 12.9 9.6 5.8 1.5 0.1  0
## 7 3.0 8.2 11.6 11.3 8.3 4.4 0.5 0.0  0
## 8 1.1 3.7  6.4  6.3 4.0 1.8 0.2 0.0  0
## 9 0.0 0.2  1.4  1.5 0.3 0.0 0.0 0.0  0
## ...
## (down to [[63]])
