salvu - 3 months ago
R Question

How to recursively build a tree of unknown depth in R

I am trying to store scraped media comment data, obtained via RSelenium, XML and JSON, in a tree using Christoph Glur's data.tree package in R. The problem is that I do not know the depth of the tree beforehand, because of the way comments are posted. The primary site is Disqus, where people can comment on the article, reply to other people's comments, and reply to those replies in turn. Thus the depth of the comments is not known in advance.

My data is held in a list of lists, where each list represents a Disqus comment and contains either 6 elements, if the comment has no "child" comments, or 7 if child comments exist. The seventh element is another list of one or more comments. The following structure gives an idea of the data:

library(data.tree)

newtree <- Node$new("article_Name")
post <- newtree$AddChild("Post ID_1")$
  AddChild("Date")$
  AddSibling("Poster")$
  AddSibling("Disqus Name")$
  AddSibling("Message")$
  AddSibling("Num Children")$
  parent$
  AddChildNode(Node$new("Child Post ID_1"))$
    AddChild("Date")$
    AddSibling("Poster")$
    AddSibling("Disqus Name")$
    AddSibling("Message")$
    AddSibling("Num Children")$
    parent$
    AddChildNode(Node$new("Child-Child Post ID_1"))$
      AddChild("Date")$
      AddSibling("Poster")$
      AddSibling("Disqus Name")$
      AddSibling("Message")$
      AddSibling("Num Children")$
      parent$
    parent$
  parent$
  AddChildNode(Node$new("Child Post ID_2"))$
  root$
  AddChild("Post ID_2")
print(newtree)


I have tried creating the above using loops, but I cannot go beyond the first level of children, because I do not know whether a child of a child has children of its own.
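The usual way around an unknown depth is a function that calls itself on each reply, so it descends exactly as far as the data goes and no further. A minimal base-R sketch, assuming each comment is a named list whose optional `child` element holds its replies (the field name used in the sample data further down); `comment_depth` is a hypothetical helper, not part of any package:

```r
# Hypothetical helper: maximum nesting depth of a comment thread.
# Assumes each comment is a named list with an optional `child` list of replies.
comment_depth <- function(comment) {
  replies <- comment$child
  if (is.null(replies) || length(replies) == 0) {
    return(1L)  # a comment with no replies is one level deep
  }
  # Recurse into every reply; the calls go only as deep as the thread does
  1L + max(vapply(replies, comment_depth, integer(1)))
}

thread <- list(postId = "a", child = list(
  list(postId = "b", child = list(list(postId = "c"))),
  list(postId = "d")
))
comment_depth(thread)  # 3
```

A loop needs one nesting level per level of replies; the recursive call replaces all of them, which is why it copes with threads of any depth.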

I have looked for posts on recursion, but could not find any for R, although there are quite a few for other languages such as JavaScript. The following is the code I tried for creating the tree, and it looks ugly with all the [[ ]] required to access the deeper elements:

library(data.tree)

commentTree <- Node$new("article_Name")
for (i in seq_along(postData)) {
  # Add the post node first, then its fields as children of that node
  post <- commentTree$AddChild(postData[[i]][[1]])
  post$AddChild(postData[[i]][[2]])$
    AddSibling(postData[[i]][[3]])$
    AddSibling(postData[[i]][[4]])$
    AddSibling(postData[[i]][[5]])$
    AddSibling(postData[[i]][[6]])

  # Only the first level of replies is handled; replies to replies are lost
  if (postData[[i]][[6]] > 0) {
    for (j in seq_along(postData[[i]][[7]])) {
      post$AddChildNode(Node$new(postData[[i]][[7]][[j]][[1]]))$
        AddChild(postData[[i]][[7]][[j]][[2]])$
        AddSibling(postData[[i]][[7]][[j]][[3]])$
        AddSibling(postData[[i]][[7]][[j]][[4]])$
        AddSibling(postData[[i]][[7]][[j]][[5]])$
        AddSibling(postData[[i]][[7]][[j]][[6]])
    }
  }
}
print(commentTree)


Any help to write a recursive function would be greatly appreciated. Thanks.

EDIT - Added sample data, as posted in a comment, for clarity

[[1]]
[[1]]$postId
[1] "2794864846"

[[1]]$date
[1] "Thursday, July 21, 2016 9:28 AM"

[[1]]$poster
[1] "Lucienne"

[[1]]$disqusUname
[1] "disqus_AEt1ZsgK9N"

[[1]]$message
[1] "200 hundred pilots for 7 planes? Wow each of them must work very long hours. "

[[1]]$numChildren
[1] 1

[[1]]$child
[[1]]$child[[1]]
[[1]]$child[[1]]$postId
[1] "2795010796"

[[1]]$child[[1]]$date
[1] "Thursday, July 21, 2016 11:50 AM"

[[1]]$child[[1]]$poster
[1] "Jesmond Tedesco Triccas"

[[1]]$child[[1]]$disqusUname
[1] "jesmondtedescotriccas"

[[1]]$child[[1]]$message
[1] "My thoughts exactly"

[[1]]$child[[1]]$numChildren
[1] 0


When I used
tmpTree <- as.Node(postData)
to convert the list to a tree, I obtained the following:

      levelName
1 Root
2  °--1
3      °--child
4          °--1

The tree can be accessed by position: tmpTree$'1'$poster gives "Lucienne" and tmpTree$'1'$child$'1'$poster gives "Jesmond Tedesco..". Can the child node names somehow be set to the value of the postId field during the conversion?

I'm also still stuck on how to read all of a comment's data recursively.
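Reading every comment recursively does not strictly need a tree at all. A minimal base-R sketch, assuming the field names of the sample above (`postId`, `poster`, `child`); `flatten_comments` is a hypothetical helper that walks the thread depth-first and returns one row per comment, with its parent's `postId` and its nesting depth:

```r
# Hypothetical helper: walk a list of comments depth-first and return one
# row per comment, recording its parent's postId and its nesting depth.
flatten_comments <- function(comments, parent = NA_character_, depth = 1L) {
  rows <- lapply(comments, function(cm) {
    row <- data.frame(postId = cm$postId, parent = parent, depth = depth,
                      poster = cm$poster, stringsAsFactors = FALSE)
    if (length(cm$child) > 0) {
      # Recurse into the replies: this comment is their parent, one level down
      row <- rbind(row, flatten_comments(cm$child, cm$postId, depth + 1L))
    }
    row
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

sample_thread <- list(
  list(postId = "1", poster = "A", child = list(
    list(postId = "2", poster = "B", child = list(
      list(postId = "3", poster = "C")
    ))
  )),
  list(postId = "4", poster = "D")
)
flat <- flatten_comments(sample_thread)
flat
```

Called as `flatten_comments(postData)`, the same function should cover the real list; further columns such as `date` or `message` can be added to the `data.frame()` call in the same way.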


EDIT - Added reproducible data. This is a single comment whose replies have replies of their own, several levels deep. I apologize for the length of the example.

list(structure(list(postId = "2794968061", date = "Thursday, July 21, 2016 10:56 AM",
poster = "toni", disqusUname = "disqus_bujblK3zF5", message = "unbeleivable, to hear today's socialists condemning workers for trying to organise a strike, where are the likes of GWU and the Labour of old defending workers rights?",
numChildren = 1L, child = list(structure(list(postId = "2794971958",
date = "Thursday, July 21, 2016 11:01 AM", poster = "Glorfindel",
disqusUname = "disqus_daQLxWKMFy", message = "Workers rights yes, but these are not workers but wanna be millionaires in the making! They should do some research and see what a great life they have, then maybe drop these unrealistic demands! Shame on these pilots!",
numChildren = 2L, child = list(structure(list(postId = "2798727439",
date = "Saturday, July 23, 2016 9:14 AM", poster = "Christopher Hitch Borg",
disqusUname = "christopherhitchborg", message = "Pilots are workers.",
numChildren = 1L, child = list(structure(list(postId = "2798801249",
date = "Saturday, July 23, 2016 11:06 AM", poster = "Glorfindel",
disqusUname = "disqus_daQLxWKMFy", message = "Dream on. Sounds to me you are either very dumb or a capitalist trying to confuse issues.",
numChildren = 0), .Names = c("postId", "date",
"poster", "disqusUname", "message", "numChildren"
)))), .Names = c("postId", "date", "poster", "disqusUname",
"message", "numChildren", "child")), structure(list(postId = "2794982098",
date = "Thursday, July 21, 2016 11:14 AM", poster = "toni",
disqusUname = "disqus_bujblK3zF5", message = "pilots all over the world have a good salary, to be were they are they had to make big sacrifices and pay lots of money for the studies. The shame is on persons getting 13000 euros for absolutely nothing, or persons put in high places with no experience at all, shame is making an 18 year old a CEO, and I could go on for ever.We should thank all Air Malta pilots for doing a good job for all this time",
numChildren = 2L, child = list(structure(list(postId = "2795785527",
date = "Thursday, July 21, 2016 8:00 PM", poster = "Glorfindel",
disqusUname = "disqus_daQLxWKMFy", message = "Agan: when a company is fighting for survival, it is shameful, distasteful and counterproductive to demand a 30% salary increase!!! Especially that when compared to other pilots their perks are already better than most!The other stuff you mentio has nothing to do with this article. However two wrongs do not make a right. Simple as that.",
numChildren = 0), .Names = c("postId", "date",
"poster", "disqusUname", "message", "numChildren"
)), structure(list(postId = "2795010275", date = "Thursday, July 21, 2016 11:50 AM",
poster = "Jesmond Tedesco Triccas", disqusUname = "jesmondtedescotriccas",
message = "Air Malta pilots have their training paid for by the company. And do they have definite or indefinite contracts?",
numChildren = 1L, child = list(structure(list(
postId = "2795206130", date = "Thursday, July 21, 2016 2:53 PM",
poster = "toni", disqusUname = "disqus_bujblK3zF5",
message = "I have my doubts about the company paying for training, because I know of persons who couldin't make it for the financial reasons, however whatever the situation one cannot deny that they have one of the most difficult and responsible jobs existing",
numChildren = 0), .Names = c("postId", "date",
"poster", "disqusUname", "message", "numChildren"
)))), .Names = c("postId", "date", "poster",
"disqusUname", "message", "numChildren", "child")))), .Names = c("postId",
"date", "poster", "disqusUname", "message", "numChildren",
"child")))), .Names = c("postId", "date", "poster", "disqusUname",
"message", "numChildren", "child")))), .Names = c("postId",
"date", "poster", "disqusUname", "message", "numChildren", "child"
)))
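Since each comment also stores `numChildren`, a small recursive check can confirm that the scraped nesting is complete, that is, that every comment holds as many replies in `child` as its `numChildren` says. A minimal sketch assuming those two field names; `counts_consistent` is a hypothetical helper, and a `FALSE` would only suggest that some replies were not captured:

```r
# Hypothetical helper: recursively check that numChildren equals the
# number of replies actually stored in `child`, at every level.
counts_consistent <- function(comment) {
  replies <- comment$child
  if (is.null(replies)) replies <- list()
  length(replies) == comment$numChildren &&
    all(vapply(replies, counts_consistent, logical(1)))
}

good <- list(postId = "1", numChildren = 1L, child = list(
  list(postId = "2", numChildren = 0)
))
bad <- list(postId = "1", numChildren = 2L, child = list(
  list(postId = "2", numChildren = 0)
))
counts_consistent(good)  # TRUE
counts_consistent(bad)   # FALSE
```

For the whole data set, `all(vapply(lol, counts_consistent, logical(1)))` applies the check to every top-level comment.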

Answer

This is already implemented for you, and you do not need to apply recursion yourself.

Taking your posted data above, and assuming it's called lol (for "list-of-list", no pun intended), we can do:

tree <- FromListExplicit(lol[[1]], nameName = "postId", childrenName = "child")
print(tree, "date", "poster", "disqusUname")

This will print as:

                   levelName                             date                  poster           disqusUname
1 2794968061                 Thursday, July 21, 2016 10:56 AM                    toni     disqus_bujblK3zF5
2  °--2794971958             Thursday, July 21, 2016 11:01 AM              Glorfindel     disqus_daQLxWKMFy
3      ¦--2798727439          Saturday, July 23, 2016 9:14 AM  Christopher Hitch Borg  christopherhitchborg
4      ¦   °--2798801249     Saturday, July 23, 2016 11:06 AM              Glorfindel     disqus_daQLxWKMFy
5      °--2794982098         Thursday, July 21, 2016 11:14 AM                    toni     disqus_bujblK3zF5
6          ¦--2795785527      Thursday, July 21, 2016 8:00 PM              Glorfindel     disqus_daQLxWKMFy
7          °--2795010275     Thursday, July 21, 2016 11:50 AM Jesmond Tedesco Triccas jesmondtedescotriccas
8              °--2795206130  Thursday, July 21, 2016 2:53 PM                    toni     disqus_bujblK3zF5
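For comparison, the recursion that FromListExplicit takes care of can also be written by hand with data.tree's `Node$new` and `AddChild`, which may be useful if fields need transforming on the way in. A minimal sketch; `add_comment` is a hypothetical helper (not part of data.tree), and the two-comment sample is made up in the shape of the question's data:

```r
library(data.tree)

# Hypothetical helper: add `comment` under `parent`, naming the node after
# its postId, then recurse into its replies so any depth is handled.
add_comment <- function(parent, comment) {
  node <- parent$AddChild(comment$postId)
  node$date <- comment$date
  node$poster <- comment$poster
  node$disqusUname <- comment$disqusUname
  node$message <- comment$message
  for (reply in comment$child) {  # loop body never runs when there are no replies
    add_comment(node, reply)
  }
  invisible(node)
}

# A two-level sample in the same shape as the question's data
sample_thread <- list(
  list(postId = "1", date = "d1", poster = "A", disqusUname = "a",
       message = "m1", numChildren = 1L, child = list(
         list(postId = "2", date = "d2", poster = "B", disqusUname = "b",
              message = "m2", numChildren = 0)
       ))
)

tree <- Node$new("article_Name")
for (comment in sample_thread) add_comment(tree, comment)
print(tree, "poster", "date")
```

Unlike the chained `AddChild()$AddSibling()` calls, the fields here are stored as node attributes rather than as child nodes, which is also how FromListExplicit stores them.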

For details on FromListExplicit, see

?FromListExplicit

or (easier to remember):

?as.Node.list
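Once the tree exists, data.tree's own helpers cover the "read all of a comment's data" part of the question without any further recursion. A short sketch using `FindNode`, `Get` and `ToDataFrameTree` on a small made-up stand-in for `lol[[1]]`:

```r
library(data.tree)

# Stand-in for lol[[1]]: one comment with a reply that has its own reply
comment <- list(postId = "1", poster = "A", child = list(
  list(postId = "2", poster = "B", child = list(
    list(postId = "3", poster = "C")
  ))
))

tree <- FromListExplicit(comment, nameName = "postId", childrenName = "child")

FindNode(tree, "3")$poster       # look a comment up by its postId
tree$Get("poster")               # every poster, named by postId
ToDataFrameTree(tree, "poster")  # levelName plus a poster column
```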