Winterflags - 5 months ago
Python Question

How to shorten long XPath expressions with many OR alternatives?

I am working to get Selenium to go through a large number of alternative conditional XPaths, looking for elements that may match, and to pass any match on to the object `elmnt`.

Currently, because I use the OR operator (`|`), the code quickly becomes very repetitive and verbose, especially when there are many possible variations.

In the example below, the three groups differ only in whether I start by looking for `h1`, `h2`, or `h3`. The rest is the same.

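```python
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

# `driver` is an existing WebDriver instance (e.g. webdriver.Firefox())
for word in ["testString1", "testString2", "testString3"]:
    try:
        # Every heading/following-node combination is spelled out by hand
        elmnt = driver.find_element(
            By.XPATH,
            "//h1[text()[contains(., '%s')]]" % word +
            "/following::p" +
            "|" +
            "//h1[text()[contains(., '%s')]]" % word +
            "/following::span" +
            "|" +
            "//h2[text()[contains(., '%s')]]" % word +
            "/following::p" +
            "|" +
            "//h2[text()[contains(., '%s')]]" % word +
            "/following::span" +
            "|" +
            "//h3[text()[contains(., '%s')]]" % word +
            "/following::p" +
            "|" +
            "//h3[text()[contains(., '%s')]]" % word +
            "/following::span"
        ).text
    except NoSuchElementException:
        pass  # nothing matched this word; try the next one
    else:
        print(elmnt)
        break
```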


But in my actual code, I will be looking at even more variations, including other node types in `/following::` besides `p` and `span`.

Question: Is there some way to simplify (shorten) this?

My first hope was that it would be possible to do something like:

```python
"//[h1|h2|h3][text()[contains(., '%s')]]" % word
```


i.e. that the `or` alternatives could be "baked into" the XPath expression itself, without the fully exhaustive string concatenation from the example. If so, the same idea could be applied across the board.

However, this does not seem to be possible.

Is the solution to write some sort of generative function that builds the entire XPath string, or something else?
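For reference, a generative function of the kind asked about could build the long union from lists of tags. This is a minimal sketch, not taken from the thread: `build_union_xpath` is a hypothetical name, it uses `itertools.product` so the alternatives come out in the same order as the hand-written version, and it assumes `word` contains no single quote (the value is wrapped in `'...'` without escaping).

```python
from itertools import product


def build_union_xpath(word, heading_tags, following_tags):
    """Hypothetical helper: generate the '|'-joined XPath from the question.

    One alternative is produced per (heading, following-node) pair, each
    matching a heading with an immediate text node that contains `word`.
    Assumes `word` has no single quote, since it is inserted inside '...'.
    """
    alternatives = [
        "//%s[text()[contains(., '%s')]]/following::%s" % (heading, word, node)
        for heading, node in product(heading_tags, following_tags)
    ]
    return "|".join(alternatives)


# Same expression as the hand-written example, for one search word
xpath = build_union_xpath("testString1", ["h1", "h2", "h3"], ["p", "span"])
```

The returned string can be passed to `driver.find_element(By.XPATH, xpath)` in the original loop. Note that this only shortens the Python source; the XPath sent to the browser is as long as before.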

Answer

I would use this shortened XPath (leveraging the `self::` axis, as recommended by @alecxe in a comment):

```python
xpath = ("//*[self::h1 or self::h2 or self::h3][contains(., '%s')]" % word
         + "/following::*[self::p or self::span]")
```

Note that this tests whether the string value of the `h1`, `h2`, or `h3` element contains the value of the `word` variable (rather than whether one of its immediate text nodes does). Also, if you want to test that the string value of those elements equals `word` rather than merely contains it, use `[.='%s']` instead of `[contains(., '%s')]`.
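The same `self::` pattern generalizes to any number of heading and following-node types. A minimal sketch, assuming the tag lists hold plain element names and `word` contains no single quote; `self_axis_test` and `build_compact_xpath` are hypothetical helper names, not part of Selenium. The `exact` flag switches between the `contains(., ...)` test and the `[.='...']` test.

```python
def self_axis_test(tags):
    """Return a predicate body such as 'self::h1 or self::h2'."""
    return " or ".join("self::%s" % tag for tag in tags)


def build_compact_xpath(word, heading_tags, following_tags, exact=False):
    """Hypothetical helper: build the shortened self:: XPath for any tag lists.

    exact=False: the heading's string value must contain `word`.
    exact=True:  the heading's string value must equal `word`.
    Assumes `word` has no single quote, since it is inserted inside '...'.
    """
    if exact:
        text_test = "[.='%s']" % word
    else:
        text_test = "[contains(., '%s')]" % word
    return "//*[%s]%s/following::*[%s]" % (
        self_axis_test(heading_tags),
        text_test,
        self_axis_test(following_tags),
    )
```

With `["h1", "h2", "h3"]` and `["p", "span"]` this yields the same expression as the answer, and supporting more node types only means extending the lists rather than adding more `|` alternatives.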
