mql4beginner - 3 months ago
R Question

How to overcome this error in XPath

I tried to extract the Holders table ("Direct Holders (Forms 3 and 4)") for FB. I copied the table's XPath using Chrome's "Inspect element", but I keep getting the error below. How can I solve it?

library(XML)

url = "http://finance.yahoo.com/quote/FB/holders?p=FB"
doc = htmlTreeParse(url, useInternalNodes = TRUE)
tab_nodes = xpathApply(doc, "//*[@id="main-0-Quote-Proxy"]/section/div[2]/section/div/section/div[3]/div[2]/div[2]/table")

Error: unexpected symbol in "tab_nodes = xpathApply(doc, "//*[@id="main"

Answer

You can't scrape that table from the page's HTML because it is dynamic content: the browser builds it from data fetched in an XHR request. With Developer Tools open, switch to the Network tab, filter on "XHR" and refresh the page. Among the requests listed, one returns the data you need as JSON.
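Separately, the `unexpected symbol` error in the question is an R syntax problem, not an XPath one: the double quotes around `main-0-Quote-Proxy` end the R string early, so R never even calls `xpathApply()`. A minimal offline sketch, using the XML package and a small hypothetical HTML snippet in place of the Yahoo page (and a shortened `//table` path instead of the copied one), shows two ways to quote the attribute value. Even with the quoting fixed, the query would likely find no table in the live page's downloaded HTML, for the reason given above.

```r
library(XML)  # htmlParse(), xpathApply(), xpathSApply(), xmlValue()

# Inside a double-quoted R string, write the XPath attribute value
# with single quotes...
xp_single <- "//*[@id='main-0-Quote-Proxy']//table"
# ...or escape the inner double quotes with backslashes.
xp_escaped <- "//*[@id=\"main-0-Quote-Proxy\"]//table"

# Hypothetical stand-in markup (not the real Yahoo page): the live page
# builds its table with JavaScript, so downloaded HTML would lack it.
html <- '<div id="main-0-Quote-Proxy"><table><tr><td>HOLDER A</td></tr></table></div>'
doc <- htmlParse(html, asText = TRUE)

tabs_single <- xpathApply(doc, xp_single)
tabs_escaped <- xpathApply(doc, xp_escaped)
length(tabs_single)   # 1
length(tabs_escaped)  # 1
xpathSApply(doc, "//td", xmlValue)
```

Single quotes are usually the simpler choice, since XPath accepts either quote style for string literals.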

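library(dplyr)  # glimpse()
library(httr)   # GET(), content(), stop_for_status()
library(purrr)  # map_df()
library(readr)

# JSON endpoint found under Network > XHR in Developer Tools
URL <- "https://query2.finance.yahoo.com/v10/finance/quoteSummary/FB?lang=en-US&region=US&modules=institutionOwnership%2CfundOwnership%2CmajorDirectHolders%2CmajorHoldersBreakdown%2CinsiderTransactions%2CinsiderHolders%2CnetSharePurchaseActivity&corsDomain=finance.yahoo.com"
res <- GET(URL)
stop_for_status(res)  # stop on an HTTP error instead of parsing an error page
dat <- content(res)   # parse the JSON body into a nested list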
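# Flatten each nested holder record into one row
df <- map_df(dat$quoteSummary$result[[1]]$majorDirectHolders$holders,
             ~as.list(unlist(.)))
df <- type_convert(df)  # unlist() leaves every column character; re-guess types
glimpse(df)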
## Observations: 10
## Variables: 22
## $ maxAge                   <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
## $ name                     <chr> "KOUM JAN", "SANDBERG SHERYL", "ZUCKERBERG MAR...
## $ relation                 <chr> "Director", "Officer", "Officer", "Officer", "...
## $ url                      <chr> "http://biz.yahoo.com/t/28/9464.html", "http:/...
## $ transactionDescription   <chr> "Automatic Sale", "Sale", "Automatic Sale", "A...
## $ latestTransDate.raw      <int> 1471824000, 1471219200, 1471478400, 1471219200...
## $ latestTransDate.fmt      <date> 2016-08-22, 2016-08-15, 2016-08-18, 2016-08-1...
## $ positionDirect.raw       <int> 2576396, 4593776, NA, 651044, 648776, 420525, ...
## $ positionDirect.fmt       <dbl> 2.58, 4.59, NA, 651.04, 648.78, 420.52, 222.19...
## $ positionDirect.longFmt   <dbl> 2576396, 4593776, NA, 651044, 648776, 420525, ...
## $ positionDirectDate.raw   <int> 1447632000, 1471219200, NA, 1471219200, 143164...
## $ positionDirectDate.fmt   <date> 2015-11-16, 2016-08-15, NA, 2016-08-15, 2015-...
## $ positionIndirect.raw     <int> 38729593, 23824, 3756744, NA, NA, NA, NA, 2144...
## $ positionIndirect.fmt     <dbl> 38.73, 23.82, 3.76, NA, NA, NA, NA, 214.41, 17...
## $ positionIndirect.longFmt <dbl> 38729593, 23824, 3756744, NA, NA, NA, NA, 2144...
## $ positionIndirectDate.raw <int> 1471824000, 1444348800, 1471478400, NA, NA, NA...
## $ positionIndirectDate.fmt <date> 2016-08-22, 2015-10-09, 2016-08-18, NA, NA, N...
## $ positionSummary.raw      <int> 41305989, 4617600, NA, NA, NA, NA, NA, 218185,...
## $ positionSummary.fmt      <dbl> 41.31, 4.62, NA, NA, NA, NA, NA, 218.19, 185.3...
## $ positionSummary.longFmt  <dbl> 41305989, 4617600, NA, NA, NA, NA, NA, 218185,...
## $ positionSummaryDate.raw  <int> 1471824000, 1471219200, NA, NA, NA, NA, NA, 14...
## $ positionSummaryDate.fmt  <date> 2016-08-22, 2016-08-15, NA, NA, NA, NA, NA, 2...
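The `~as.list(unlist(.))` step is what turns each nested holder record into a flat row. A minimal offline sketch with two hypothetical records (field names borrowed from the `glimpse()` output above, values made up) shows where the dotted column names come from, how a field missing from one record becomes `NA`, and why the columns are character until readr's `type_convert()` re-guesses them. It assumes current purrr/readr releases: `map_df()` relies on dplyr being installed, and with `type_convert()`'s default `guess_integer = FALSE`, whole-number columns come back as double rather than the `<int>` shown above.

```r
library(dplyr)  # map_df() binds rows with dplyr::bind_rows()
library(purrr)  # map_df()
library(readr)  # type_convert()

# Two hypothetical holder records shaped like elements of
# majorDirectHolders$holders; the second lacks positionDirect.
holders <- list(
  list(name = "HOLDER A",
       latestTransDate = list(raw = 1471824000L, fmt = "2016-08-22"),
       positionDirect = list(raw = 2576396L)),
  list(name = "HOLDER B",
       latestTransDate = list(raw = 1471219200L, fmt = "2016-08-15"))
)

# unlist() flattens nested fields into dotted names (latestTransDate.raw)
# and coerces every value to character; as.list() makes it one row.
df <- map_df(holders, ~as.list(unlist(.)))
str(df)  # all character; positionDirect.raw is NA for HOLDER B

# Re-guess column types from the character data
df <- type_convert(df)
str(df)
```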