I have been using Python Selenium for web automation testing. The key part of automation is finding the right element for a user-visible object in an HTML page. The following API works most of the time, but not all the time: find_element_by_xxx, where xxx can be id, name, xpath, tag_name, etc.
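For reference, current Selenium spells these locators with a By constant rather than the older find_element_by_xxx helpers. A minimal example (the page is just a stand-in, and the commented-out locator values are hypothetical):

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com")

# Each locator strategy maps to one By constant; in Selenium 4 these
# replace the old find_element_by_id/name/xpath/tag_name helpers.
heading = driver.find_element(By.TAG_NAME, "h1")
link = driver.find_element(By.XPATH, "//a")
# driver.find_element(By.ID, "some-id")      # hypothetical id
# driver.find_element(By.NAME, "some-name")  # hypothetical name

driver.quit()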
Ok, so there may be cases where you need to perform some substantial processing of a page on the client (Python) side rather than on the server (browser) side. For instance, if you have some sort of machine learning system already written in Python and it needs to analyze the whole page before performing actions on it, then although this is possible with a series of find_element calls, it gets very expensive, because each call is a round trip between the client and the server. And rewriting the system to work in the browser may be too expensive.
However, I do not see an efficient way to get a serialization of the DOM together with Selenium's own identifiers. Selenium creates these identifiers on an as-needed basis: when you call find_element, or when DOM nodes are returned from an execute_script call (or passed to the callback that execute_async_script gives to the script). But if you call find_element to get an identifier for each element, then you are back to square one. I could imagine decorating the DOM in the browser with the required information, but there is no public API to request some sort of pre-assignment of WebElement ids. As a matter of fact, these identifiers are designed to be opaque, so even if a solution somehow managed to get the required information, I'd be concerned about cross-browser viability and ongoing support.
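To illustrate the two identifier-creation paths just mentioned, here is a small sketch (using example.com as a stand-in page):

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com")

# An identifier is minted when find_element resolves a node...
el1 = driver.find_element(By.TAG_NAME, "body")

# ...or when execute_script returns a DOM node, which Selenium wraps
# in a WebElement on its way back to the client.
el2 = driver.execute_script("return document.body")

# Both wrap the same node; the ids are opaque, browser-managed tokens.
print(el1.id, el2.id)

driver.quit()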
There is, however, a way to get an addressing system that works on both sides: XPath. The idea is to parse the DOM serialization into a tree on the client side, get the XPath of the nodes you are interested in, and use that to obtain the corresponding WebElement. So where you would otherwise perform dozens of client-server round trips to determine which single element you need to click, you can reduce this to an initial query of the page source plus a single find_element call with the XPath you need.
Here is a super simple proof of concept. It fetches the main input field of the Google front page.
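A minimal sketch of that proof of concept, with the caveat that the locator for the search box (an input named "q") is an assumption, since Google's markup changes over time:

from selenium import webdriver
from selenium.webdriver.common.by import By
import lxml.html

driver = webdriver.Chrome()
driver.get("https://www.google.com")

# Serialize the live DOM on the browser side in one round trip.
html = driver.execute_script("return document.documentElement.outerHTML")

# Parse the serialization on the client side with lxml.
root = lxml.html.fromstring(html)

# Find the node of interest in the lxml tree and compute its XPath.
field = root.xpath("//input[@name='q']")[0]
path = field.getroottree().getpath(field)

# A single find_element call fetches the corresponding WebElement.
search_box = driver.find_element(By.XPATH, path)
search_box.send_keys("proof of concept")

driver.quit()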
Some notes on this approach:

1. The code above does not use driver.page_source because Selenium's documentation states that there is no guarantee as to the freshness of what it returns; it could be the state of the current DOM or the state of the DOM when the page was first loaded.
2. This solution suffers from the exact same problems that find_element suffers from regarding dynamic content: if the DOM changes while the analysis is occurring, then you are working on a stale representation of the DOM. (A series of find_element calls could conceivably avoid this problem by carefully ordering the sequence of calls.)
3. lxml's tree could possibly differ structurally from the DOM tree in such a way that the XPath obtained from lxml does not address the corresponding element in the DOM. What lxml processes is the cleaned-up, serialized view that the browser has of the HTML passed to it. Therefore, as long as the code is written to prevent the problem mentioned in point 2, I do not see this as a likely scenario, but it is not impossible. A cheap sanity check is sketched below.
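As a guard against the divergence described in point 3 (a sketch, not something the proof of concept above includes), one can verify that the lxml-derived XPath resolves to exactly one element in the live DOM before acting on it:

from selenium.webdriver.common.by import By

def resolve_unique(driver, path):
    """Return the single WebElement addressed by path, or None if the
    lxml-derived XPath matches zero or several nodes in the live DOM."""
    matches = driver.find_elements(By.XPATH, path)
    if len(matches) == 1:
        return matches[0]
    # Zero matches: the trees diverged, or the DOM changed since the
    # serialization. Several matches: the path is ambiguous in the
    # browser's tree. Either way, re-serialize the page and try again.
    return None

If this returns None, re-serializing the DOM and recomputing the path also mitigates the staleness issue from point 2.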