Chris Rockwell - 1 month ago
HTML Question

Replace double slash with single slash in XPath selector

I am using a tool to scrape some pages. I came across a page that uses internal hrefs like this:
Notice the double slash after the domain name. From my research, this is done for SEO purposes, but I need to get the URL with a single slash instead.

I am trying to use XPath (which I'm very new to) and I can get the link fine with:
//a[contains(@class, 'event-info-btn')]//@href

My next step was to try fn:replace with this:
fn:replace(//a[contains(@class, 'event-info-btn')]//@href, '', '')
This isn't working: nothing is returned.

I'm not sure if my implementation is bad, or if the scraping tool just doesn't support this.

  • I'll also note the reason why I'm trying to do this: the scraping tool is failing on all of the URLs. If I manually remove the extra slash and try again, it works fine.


Note that the scraping tool claims to support XPath 2.0.


You probably mean /@href rather than //@href, but that's not the real problem.

Your XPath is returning a sequence of href attributes where replace() is expecting a string.


For this HTML,

  <a class="event-info-btn" href="">one</a>
  <a class="event-info-btn" href="">one</a>
  <a class="event-info-btn" href="">one</a>

this XPath,

for $href in //a[contains(@class, 'event-info-btn')]/@href 
    return replace($href, '', '')

will return

as requested.
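
As a rough analogy for the sequence-versus-string point, here is a small Python sketch. It is not XPath and not what the scraping tool runs; the `example.com` URLs and the regular expression are placeholders I made up, since the real hrefs aren't shown in this thread. It only illustrates why the replacement has to be applied to each href in turn rather than to the whole collection at once.

```python
import re

# Hypothetical href values standing in for the attribute sequence
# that the XPath selects (the real URLs are not shown in the thread).
hrefs = [
    "http://example.com//events/1",
    "http://example.com//events/2",
]

# Assumed pattern: a double slash that is NOT preceded by a colon,
# so the "//" in "http://" is left alone.
DOUBLE_SLASH = r"(?<!:)//"

# Passing the whole collection to a string function fails,
# much like handing replace() a sequence of attributes.
try:
    re.sub(DOUBLE_SLASH, "/", hrefs)
    whole_list_ok = True
except TypeError:
    whole_list_ok = False

# Applying the replacement item by item mirrors
# "for $href in ... return replace($href, ...)".
fixed = [re.sub(DOUBLE_SLASH, "/", href) for href in hrefs]
```

The list comprehension plays the role of the `for ... return` expression: one string goes in and one string comes out for each href.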


This doesn't work in the scraping tool, and I'm having trouble finding a fiddle-like site to test it.

You can see this working here.

The scraping tool, it seems, only allows you to input one line of XPath.

You might try putting the XPath on a single line, then:

for $href in //a[contains(@class, 'event-info-btn')]/@href return replace($href, '', '')

If that doesn't work, then the tool's claim to support XPath 2.0 is not correct.
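
If the single-line XPath turns out not to be supported, one fallback is to leave the XPath as a plain `/@href` selection and clean the URLs afterwards, assuming you can post-process the scraped values yourself. The sketch below uses only Python's standard library; the function name `collapse_path_slashes` and the sample URL are hypothetical, not part of any tool mentioned here.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_path_slashes(url):
    """Collapse runs of '/' in the path part of a URL into a single '/'.

    The scheme separator ("http://") and the query string are left untouched.
    """
    parts = urlsplit(url)
    clean_path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit(parts._replace(path=clean_path))


# Hypothetical scraped value with a double slash after the domain name.
cleaned = collapse_path_slashes("http://example.com//events/1")
```

Splitting the URL first means only the path is rewritten, so a `//` that happens to appear inside a query string is not changed by accident.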