Osprey Eagle - 1 month ago
R Question

RSelenium - How to scrape all drop-down list option values

How can all the option values from a drop-down list be scraped using RSelenium?

Sample of page source:

<select name="main$ddArea" onchange="javascript:setTimeout(&#39;__doPostBack(\&#39;main$ddArea\&#39;,\&#39;\&#39;)&#39;, 0)" id="main_ddArea" class="groupTextBox">
<option selected="selected" value="95182">Area 1</option>
<option value="95183">Area 2</option>
<option value="95184">Area 3</option>
<option value="95185">Area 4</option>
<option value="95186">Area 4</option>
</select>


The result wanted is a vector with each value as an element. For example, values = c("95182", "95183", "95184", "95185", "95186")

Obtaining a string of the values would also likely work as it could be split into elements, e.g., using strsplit.

getElementAttribute() with 'value' or 'option' does not work. E.g.,

dd.areas = remDr$findElement(using='id', value="main_ddArea")
dd.areas$getElementAttribute('option')


or

dd.areas$getElementAttribute('value')


getElementText() returns the text as a single string, e.g., "Area 1 \n Area 2 \n Area 3 \n ...". But the text can't later be used to navigate the drop-down list. In other words, when navigating the drop-down list using $findElement(), an option value is needed to make the selection; the text does not work.

The package documentation does not appear to contain references to drop down lists and neither does the vignette.

Answer

You can use findElement to target the select tag then get the outerHTML and parse the resulting html:

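library(RSelenium)
library(XML)  # provides htmlParse() and xmlGetAttr()

# remDr is assumed to be an already-open remoteDriver session
remDr$navigate("https://www.tutorialspoint.com/html/html_select_tag.htm")
webElem <- remDr$findElement("name", "dropdown")
appHTML <- webElem$getElementAttribute("outerHTML")[[1]]
doc <- htmlParse(appHTML, asText = TRUE)
doc["//option", fun = function(x) xmlGetAttr(x, "value")]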

> doc["//option", fun = function(x) xmlGetAttr(x, "value")]
[[1]]
[1] "Data Structures"

[[2]]
[1] "Data Mining"
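
The call above returns a list, while the question asks for a character vector. A minimal sketch of one way to get there, run offline against the sample markup from the question (the onchange attribute is left out for brevity, and the variable names html, values and labels are only illustrative). It assumes the XML package is installed and uses xpathSApply, which simplifies the per-node results:

```r
library(XML)

# Sample markup adapted from the question (onchange attribute omitted).
html <- '<select name="main$ddArea" id="main_ddArea" class="groupTextBox">
<option selected="selected" value="95182">Area 1</option>
<option value="95183">Area 2</option>
<option value="95184">Area 3</option>
<option value="95185">Area 4</option>
<option value="95186">Area 4</option>
</select>'

# asText = TRUE tells the parser the input is markup, not a file name or URL.
doc <- htmlParse(html, asText = TRUE)

# One attribute per <option>, simplified to a character vector.
values <- xpathSApply(doc, "//option", xmlGetAttr, "value")

# The visible text of each option, in the same order as values.
labels <- xpathSApply(doc, "//option", xmlValue)
```

With a live session, the html string would be replaced by the outerHTML retrieved from the select element as shown above.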

There were some recent issues with Firefox and getElementAttribute, which appear when running Selenium Server 2 with a Gecko-based browser; see "GetAttribute of WebElement in Selenium Firefox Driver Returns Empty". In such a case you can use JavaScript to get the attributes:

remDr$navigate("https://www.tutorialspoint.com/html/html_select_tag.htm")
webElem <- remDr$findElement("name", "dropdown")
jsScript <- "var element = arguments[0]; return element.outerHTML;"
appHTML <- remDr$executeScript(jsScript, list(webElem))[[1]]
doc <- htmlParse(appHTML)
doc["//option", fun = function(x) xmlGetAttr(x, "value")]

> doc["//option", fun = function(x) xmlGetAttr(x, "value")]
[[1]]
[1] "Data Structures"

[[2]]
[1] "Data Mining"
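
If getElementAttribute behaves normally in your browser, it may also be possible to skip the HTML parsing and ask Selenium for the option elements directly. The following is only a sketch: option_values is a hypothetical helper name, the id main_ddArea comes from the question's page source, and it relies on RSelenium's findElements with a CSS selector. It is subject to the same Firefox caveat mentioned above.

```r
# Hypothetical helper: collect the value attribute of every <option>
# inside the <select> with the given id, as a character vector.
# remDr is assumed to be an open RSelenium remoteDriver session.
option_values <- function(remDr, select_id) {
  css <- sprintf("#%s option", select_id)
  opts <- remDr$findElements(using = "css selector", value = css)
  # getElementAttribute() returns a list, so take its first element.
  vapply(opts, function(el) el$getElementAttribute("value")[[1]], character(1))
}

# Usage against the page from the question (requires a running session):
# values <- option_values(remDr, "main_ddArea")
#
# A value can then be used to pick an option, for example:
# opt <- remDr$findElement("css selector", "#main_ddArea option[value='95183']")
# opt$clickElement()
```

The helper takes the session as an argument so it can be reused for other drop-down lists on the same page.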