R Question

R: How to extract specific digits from a string?

I want to retrieve the first number (here, 344002) from a string:

string <- '<a href="/Archiv-Suche/!344002&amp;s=&amp;SuchRahmen=Print/" ratiourl-ressource="344002"'

Preferably, I am looking for a regular expression that matches the digits after the ! and before the &amp.

All I have come up with is this, but it captures the ! as well (!344002):

regmatches(string, gregexpr("\\!([[:digit:]]+)", string, perl = TRUE))

Any ideas?

Answer Source

Use this regex:

(?<=\!)\d+(?=&amp)

Use this code (inside an R string literal, each backslash must be doubled):

regmatches(string, gregexpr("(?<=\\!)\\d+(?=&amp)", string, perl = TRUE))
  • (?<=\!) is a lookbehind: the match starts immediately after the !
  • \d+ matches one or more digits
  • (?=&amp) is a lookahead: the match ends where the next characters are &amp
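
Since only the first number is wanted, a minimal sketch of two variants may help. It assumes the string from the question; the names first_id and id_sub are made up for illustration. regexpr() returns just the first match, so regmatches() gives a plain character vector instead of the list that gregexpr() produces. The sub() line is an alternative that uses a capture group rather than lookarounds.

```r
# Input string from the question.
string <- '<a href="/Archiv-Suche/!344002&amp;s=&amp;SuchRahmen=Print/" ratiourl-ressource="344002"'

# Lookaround variant: regexpr() finds only the first match,
# so the result is a character vector, not a list.
first_id <- regmatches(string, regexpr("(?<=\\!)\\d+(?=&amp)", string, perl = TRUE))

# Capture-group variant: match the whole string and keep only group 1.
# Caveat: if the pattern is absent, sub() returns the input unchanged.
id_sub <- sub(".*?!(\\d+)&amp.*", "\\1", string, perl = TRUE)

print(first_id)  # "344002"
print(id_sub)    # "344002"
```

Both results are character strings; wrap them in as.integer() if a numeric value is needed.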