neenkart - 7 months ago
HTML Question

XPath to get data that starts with a specific character or string

I need to extract certain text elements from the following code.

<div class="inhalt-links">
<h2>
Deutsche Verkehrswacht
<br>
Verkehrswacht Dortmund e. V.
<br>
</h2>
<h3>
Standnummer:&nbsp;
<span style="font-weight: normal;">4.E08</span>
</h3>
<div class="clear"></div>
<br>
Benediktinerstraße 82
<br>
44287&nbsp;Dortmund
<br>
Deutschland
<br>
<br>
Tel.:+49 231 447687
<br>
Fax:+49 231 447136
<br>
E-Mail:info@verkehrswacht-dortmund.de
<br>
<a href="http://www.verkehrswacht-dortmund.de" class="url" target="_blank">www.verkehrswacht-dortmund.de</a>
<br>
<div class="social"></div>
<br>
</div>


To extract Tel.:+49 231 447687, I can use
div[@class='inhalt-links']/text()[4]
For the other details, such as fax, e-mail, and website, I only need to change the position index of the text() node. However, these text nodes sometimes appear in a different order, as in the following code:

<div class="inhalt-links">
<h2>
DEW21
<br>
</h2>
<h3>
Standnummer:&nbsp;
<span style="font-weight: normal;">4.B56</span>
</h3>
<div class="clear"></div>
<br>
Günter-Samtlebe-Platz 1
<br>
44135&nbsp;Dortmund
<br>
Postfach:104141
<br>
44041&nbsp;Dortmund
<br>
Deutschland
<br>
<br>
Tel.:+49 231 544-0
<br>
Fax:+49 231 544-1130
<br>
E-Mail:vertrieb@dew21.de
<br>
<a href="http://www.dew21.de" class="url" target="_blank">www.dew21.de</a>
<br>
<div class="social"></div>
<br>
</div>


The XPath
div[@class='inhalt-links']/text()[4]
will select the text "44041 Dortmund" instead of "Tel.:+49 231 544-0". Is there any XPath like
"div[@class='inhalt-links']/text[starts with "Tel.:"]"
to select the Tel.: element?

Answer

"Is there any XPath like "//div[@class='inhalt-links']/text[starts with "Tel.:"]" to select the Tel.: element?"

Sure, try this:

//div[@class='inhalt-links']/text()[starts-with(normalize-space(), 'Tel.:')]

This XPath returns a text node (rather than an element): the one whose value, after leading and trailing whitespace is removed*, starts with the keyword Tel.:.
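To try the expression outside a browser, here is a minimal sketch in Python. It assumes the third-party lxml library is installed and uses the second snippet from the question, where the extra Postfach lines shift the positions. The helper name text_starting_with is made up for this illustration. It passes the label in as an XPath variable, so the same expression can be reused for Fax: and E-Mail:.

```python
from lxml import html  # assumption: lxml is installed (pip install lxml)

# Second snippet from the question; the Postfach lines shift the positions.
SNIPPET = """
<div class="inhalt-links">
<h2>DEW21<br></h2>
<h3>Standnummer:&nbsp;<span style="font-weight: normal;">4.B56</span></h3>
<div class="clear"></div>
<br>
Günter-Samtlebe-Platz 1
<br>
44135&nbsp;Dortmund
<br>
Postfach:104141
<br>
44041&nbsp;Dortmund
<br>
Deutschland
<br>
<br>
Tel.:+49 231 544-0
<br>
Fax:+49 231 544-1130
<br>
E-Mail:vertrieb@dew21.de
<br>
<a href="http://www.dew21.de" class="url" target="_blank">www.dew21.de</a>
<br>
<div class="social"></div>
<br>
</div>
"""


def text_starting_with(tree, label):
    """Return the first direct text node of the div that starts with label, or None.

    Hypothetical helper for this example, not part of lxml.
    """
    # $label is an XPath variable, filled in from the keyword argument below.
    nodes = tree.xpath(
        "//div[@class='inhalt-links']/text()"
        "[starts-with(normalize-space(), $label)]",
        label=label,
    )
    # The matched text node still carries its surrounding newlines, so strip them.
    return nodes[0].strip() if nodes else None


tree = html.fromstring(SNIPPET)
for label in ("Tel.:", "Fax:", "E-Mail:"):
    print(text_starting_with(tree, label))
# Tel.:+49 231 544-0
# Fax:+49 231 544-1130
# E-Mail:vertrieb@dew21.de
```

Because the predicate tests the content of each text node instead of its position, the result does not depend on how many address lines come before the phone number.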


*) For reference, here is more precisely what normalize-space() does:

The normalize-space function strips leading and trailing white-space from a string, replaces sequences of whitespace characters by a single space, and returns the resulting string. [Mozilla Developer Network]
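
To make that behaviour concrete, here is a rough Python approximation of normalize-space() using only the standard library. The function name normalize_space is chosen for this sketch. One assumption worth spelling out: in XPath 1.0, whitespace means only space, tab, carriage return, and line feed, so a non-breaking space (the &nbsp; in "44287&nbsp;Dortmund") is left untouched.

```python
import re

# XPath 1.0 whitespace characters: space, tab, carriage return, line feed.
_XPATH_WS = " \t\r\n"


def normalize_space(value):
    """Approximate XPath 1.0 normalize-space() for a Python string."""
    # Collapse each run of whitespace into a single space ...
    collapsed = re.sub(r"[ \t\r\n]+", " ", value)
    # ... then drop whatever whitespace remains at either end.
    return collapsed.strip(_XPATH_WS)


raw = "\n   Tel.:+49   231\t447687 \n"
print(repr(normalize_space(raw)))                # 'Tel.:+49 231 447687'
print(normalize_space(raw).startswith("Tel.:"))  # True
```

This is why the starts-with() test in the answer still matches when the text node begins with a line break or indentation.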