Java Question

Substring between two delimiters

I have a string like this: "This is a URL which should be used"

I just need to extract the URL, which starts with http and ends with pdf:

String sLeftDelimiter = "http://";
String[] tempURL = sValueFromAddAtt.split(sLeftDelimiter);
String sRequiredURL = sLeftDelimiter + tempURL[1];

This gives me the URL followed by the rest of the sentence (" which should be used") instead of just the URL.

Any help would be appreciated.
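The split approach keeps everything after "http://" because split only cuts at the left delimiter; nothing tells it where to stop. For comparison, here is a minimal sketch of the same two-delimiter idea using String.indexOf and substring, assuming the URL runs from the first "http://" to the next ".pdf" after it. The class DelimiterSubstring, the method extractInclusive and the sample URL are hypothetical, chosen only for illustration.

```java
public class DelimiterSubstring {

    /**
     * Returns the text from the first occurrence of left up to and including
     * the next occurrence of right, or null if either delimiter is missing.
     * (Hypothetical helper for illustration.)
     */
    static String extractInclusive(String text, String left, String right) {
        int start = text.indexOf(left);
        if (start < 0) {
            return null; // left delimiter not found
        }
        // Look for the right delimiter only after the left one.
        int end = text.indexOf(right, start + left.length());
        if (end < 0) {
            return null; // no right delimiter after the left one
        }
        // Include the right delimiter itself in the result.
        return text.substring(start, end + right.length());
    }

    public static void main(String[] args) {
        // Hypothetical input with a made-up URL.
        String input = "This is a URL http://example.com/files/report.pdf which should be used";
        System.out.println(extractInclusive(input, "http://", ".pdf"));
    }
}
```

Unlike a regular expression with word boundaries, this plain search does not care what surrounds the delimiters, so "xhttp://..." would also be accepted.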

This kind of problem is what regular expressions were made for:

Pattern findUrl = Pattern.compile("\\bhttp.*?\\.pdf\\b");
Matcher matcher = findUrl.matcher("This is a URL which should be used");
while (matcher.find()) {
    System.out.println(matcher.group()); // print each matched URL
}

The regular expression explained:

  • \b: a word boundary before "http" (so "xhttp" does not match)
  • http: the literal string "http" (note that this also matches the beginning of "https" and "httpsomething")
  • .*?: any character (.), any number of times (*), but as few times as possible (?), so the match ends at the nearest ".pdf" that fits
  • \.pdf: the literal string ".pdf" (the backslash escapes the dot, which would otherwise match any character)
  • \b: a word boundary after ".pdf" (so ".pdfoo" does not match)
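To see the pieces above in action, the following sketch wraps the same pattern in a small self-contained program that collects every match. The class PdfUrlFinder, the method findPdfUrls and the sample text are hypothetical; the sample contains two links so the effect of the lazy .*? is visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PdfUrlFinder {

    // Word boundary, "http", lazy run of any characters, literal ".pdf", word boundary.
    private static final Pattern PDF_URL = Pattern.compile("\\bhttp.*?\\.pdf\\b");

    /** Returns every substring of text matched by PDF_URL, in order of appearance. */
    static List<String> findPdfUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = PDF_URL.matcher(text);
        while (matcher.find()) {
            urls.add(matcher.group()); // group() returns the whole current match
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical sample text containing two PDF links.
        String text = "Get http://example.com/a.pdf and http://example.com/b.pdf today";
        System.out.println(findPdfUrls(text));
    }
}
```

With a greedy .* instead of .*?, the same input would produce a single match running from the first "http" to the last ".pdf". Inputs such as "xhttp://example.com/a.pdf" or "http://example.com/a.pdfoo" produce no match because of the two \b anchors.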

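If you only want to match http and https, use this in place of http in the pattern:

  • https?: matches "http", then an optional "s" (the ? applies only to the s), then a colon. Writing the colon as \: also works, but a colon is not a special character, so the escape is unnecessary. As a Java string literal the full pattern becomes "\\bhttps?:.*?\\.pdf\\b".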
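A sketch of the http/https variant, assuming the same lazy body and ".pdf" ending as the original pattern. The class HttpOrHttpsPdf, the method firstPdfUrl and the sample inputs are hypothetical. Because the colon is now required, a word such as "httpsomething" is no longer accepted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HttpOrHttpsPdf {

    // "https?:" = "http", an optional "s", then a colon (no escape needed for the colon).
    private static final Pattern PDF_URL = Pattern.compile("\\bhttps?:.*?\\.pdf\\b");

    /** Returns the first http or https PDF URL in text, or null if there is none. */
    static String firstPdfUrl(String text) {
        Matcher matcher = PDF_URL.matcher(text);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        // Hypothetical inputs for illustration.
        System.out.println(firstPdfUrl("see https://example.com/x.pdf now"));
        System.out.println(firstPdfUrl("see httpsomething.pdf now"));
    }
}
```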