cman77 cman77 - 7 months ago 19
Ruby Question

Filtering by multiple values using XPath

I am trying to filter an XML document of Jobs by the Company name.

I am able to pull all items that match specific Company names using:

doc.xpath("/source/job[company[text() = 'BigCorp' or text() = 'MegaCorp']]")


I am unable to do the opposite and exclude by these values, using something like:

doc.xpath("/source/job[company[text() != 'Hodes' or text() != 'Scurri']]")


Where am I going wrong? Is there a way to provide a comma-separated list of values?

Answer

Try changing the or to and:

doc.xpath("/source/job[company[text() != 'Hodes' and text() != 'Scurri']]")

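If you use or, the predicate is true for every company, so every job is returned. A company name can never equal both 'Hodes' and 'Scurri' at once, which means at least one of the two != tests always passes.

For example, it would return the job with the company Hodes because text() != 'Scurri' is true (and the job with the company Scurri because text() != 'Hodes' is true).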
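The difference between or and and can be checked without any XML at all. The following is a minimal plain-Ruby sketch that models the two predicates; the method names kept_with_or? and kept_with_and? are made up for illustration, and name stands in for the text of a company element.

```ruby
# Plain-Ruby model of the two XPath predicates.
# `name` stands in for the text of a <company> element.

# "or" version: false only if name equals BOTH values at once, which is impossible.
def kept_with_or?(name)
  name != 'Hodes' || name != 'Scurri'
end

# "and" version: true only when name differs from every excluded value.
def kept_with_and?(name)
  name != 'Hodes' && name != 'Scurri'
end

names = ['Hodes', 'Scurri', 'BigCorp']

kept_or  = names.select { |n| kept_with_or?(n) }
kept_and = names.select { |n| kept_with_and?(n) }

puts kept_or.inspect   # every name passes the "or" filter
puts kept_and.inspect  # only names outside the excluded pair pass
```

By De Morgan's law the same filter can also be written as not(text() = 'Hodes' or text() = 'Scurri'), assuming each company element holds a single text node.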


Follow-up from the asker: normalize-space() did it, but I'm not sure why:

doc.xpath("/source/job[company[normalize-space() != 'Hodes' and normalize-space() != 'Scurri']]")

normalize-space() worked because text() returns the text node exactly as it appears in the document, including any leading and trailing whitespace.

For example, if you have an element like:

<company>
 Hodes
</company>

or:

<company> Hodes </company>

then text() would return "_Hodes_". (I replaced the whitespace with _ to make it easier to see.)

Because of the whitespace, "_Hodes_" doesn't equal "Hodes", so text() != 'Hodes' is true and the job is not filtered out.
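The mismatch can be reproduced in plain Ruby. This is only a sketch: normalize_space below is a hypothetical helper that approximates what XPath 1.0's normalize-space() does (collapse runs of space, tab, CR and LF, then trim), not a call into any XML library.

```ruby
# Hypothetical helper approximating XPath 1.0 normalize-space():
# collapse each run of space, tab, CR, LF into one space, then trim both ends.
def normalize_space(str)
  str.gsub(/[ \t\r\n]+/, ' ').sub(/\A /, '').sub(/ \z/, '')
end

multiline = "\n Hodes\n"   # text of the multi-line <company> element
padded    = ' Hodes '      # text of <company> Hodes </company>

# Raw comparison, as text() = 'Hodes' would see it.
raw_matches = [multiline, padded].map { |s| s == 'Hodes' }

# Comparison after whitespace normalization.
normalized_matches = [multiline, padded].map { |s| normalize_space(s) == 'Hodes' }

puts raw_matches.inspect
puts normalized_matches.inspect
```

Neither raw string equals 'Hodes', while both normalized strings do.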

Using normalize-space() strips the leading and trailing whitespace and collapses each run of whitespace characters into a single space.
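On the comma-separated list part of the question: XPath 1.0 has no list literal, so one option is to build the predicate in Ruby from an array of names. The sketch below is one possible approach, not the only one; xpath_literal and exclude_companies_xpath are hypothetical helper names, and the /source/job and company paths are taken from the question.

```ruby
# Quote a value as an XPath 1.0 string literal. XPath 1.0 has no escape
# sequences, so a value containing both quote characters is rejected here
# (handling it would need concat(), which this sketch leaves out).
def xpath_literal(value)
  return "'#{value}'" unless value.include?("'")
  return "\"#{value}\"" unless value.include?('"')

  raise ArgumentError, "cannot quote #{value.inspect} without concat()"
end

# Build an expression that keeps jobs whose company is none of `names`.
def exclude_companies_xpath(names)
  # With nothing to exclude, keep every job that has a company element.
  return '/source/job[company]' if names.empty?

  conditions = names.map { |n| "normalize-space() != #{xpath_literal(n)}" }
  "/source/job[company[#{conditions.join(' and ')}]]"
end

# Turn a comma-separated string into an array of trimmed names.
excluded   = 'Hodes, Scurri'.split(',').map(&:strip)
expression = exclude_companies_xpath(excluded)

puts expression
```

The resulting string can then be passed to doc.xpath(expression), as in the question.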