Saul Frank - 15 days ago
R Question

R Convert complex xml to dataframe

I am converting a complex XML file to a dataframe.

I have two problems with this approach, and one further question:


  1. The single value of two is replicated in every row, where it should be null for lines that have no <two> node.

  2. If one of the nodes has more data points than another, I sometimes get this error: "arguments imply differing number of rows: 198, 240". Each value should map back to its own row, and be null where it does not exist.

  3. How do I add a column calculated from the two existing columns (here, one + two = 3)?



This is a simplified version of that:

require(xml2)

xml_data = "
<top>
<line>
<one>1</one>
</line>
<line>
<one>1</one>
<two>2</two>
</line>
<line>
<one>1</one>
</line>
</top>
"

data2 <- read_xml(xml_data)


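df <- data.frame(
  # purchase
  one = xml_text(xml_find_all(data2, ".//line/one")),  # 3 matches
  two = xml_text(xml_find_all(data2, ".//line/two")),  # 1 match, recycled into all 3 rows (problem 1)
  sum1 = one + two  # problem 3: fails, because one and two are not visible inside data.frame()
)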
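
For readers following along, here is a small sketch of why the first two symptoms appear. It assumes nothing beyond base R: data.frame() recycles a shorter column when its length divides evenly into the longest one, and stops with an error otherwise. The names recycled and msg are only illustrative.

```r
# A length-1 column is silently recycled to match the longest column,
# which is why the lone <two> value shows up in every row.
recycled <- data.frame(one = c("1", "1", "1"), two = "2")
print(recycled)

# Lengths that do not divide evenly cannot be recycled, so data.frame()
# stops; tryCatch() captures the error message instead of aborting.
msg <- tryCatch(
  data.frame(one = seq_len(198), two = seq_len(240)),
  error = function(e) conditionMessage(e)
)
print(msg)
```

In both cases the root cause is the same: the two xml_find_all() calls search the whole document, so the results lose track of which line each value came from.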

Answer

After I wrote the comment, I realized that any actual searching effort was probably unlikely, so here is one approach:

library(xml2)
library(purrr)
library(dplyr)

xml_data = "
<top>
    <line>
        <one>1</one>
    </line>
    <line>
        <one>1</one>
        <two>2</two>
    </line>
    <line>
        <one>1</one>
    </line>
</top>
"

data2 <- read_xml(xml_data)

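xml_find_all(data2, ".//line") %>% 
  map_df(function(x) {
    # Query each <line> on its own so a missing child stays in its own row
    one <- xml_find_all(x, ".//one") %>% xml_text() %>% as.numeric()
    two <- xml_find_all(x, ".//two") %>% xml_text() %>% as.numeric()
    # No <two> in this line: use a numeric NA so the column stays double
    if (length(two) == 0) two <- NA_real_
    # tibble() replaces the deprecated data_frame()
    tibble(one, two, sum=sum(one, two, na.rm=TRUE))
  })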
## # A tibble: 3 × 3
##     one   two   sum
##   <dbl> <dbl> <dbl>
## 1     1    NA     1
## 2     1     2     3
## 3     1    NA     1
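
A possible variation on the code above, offered as a sketch rather than a definitive version: when xml_find_first() from xml2 is applied to a node set, it returns one result per input node, with a missing node where there is no match, and xml_text() turns a missing node into NA. That lets the columns be built without an explicit length check. It only fits when each line has at most one of each child. The names doc and lines, and the use of rowSums() for the total, are my own choices here.

```r
library(xml2)

xml_data <- "
<top>
  <line><one>1</one></line>
  <line><one>1</one><two>2</two></line>
  <line><one>1</one></line>
</top>
"

doc <- read_xml(xml_data)
lines <- xml_find_all(doc, ".//line")

# One result per <line>; a line without the child yields a missing node,
# whose text is NA, so both columns always have the same length.
df <- data.frame(
  one = as.numeric(xml_text(xml_find_first(lines, "./one"))),
  two = as.numeric(xml_text(xml_find_first(lines, "./two")))
)

# Row-wise total that skips NA values instead of propagating them.
df$sum <- rowSums(df[, c("one", "two")], na.rm = TRUE)
print(df)
```

Because every column is derived from the same set of line nodes, each value maps back to its own row, and the differing-rows error should not occur.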