Jordan Davis - 1 month ago
C Question

Regex URL Capturing Group

I'm writing a regex and trying to get each part of a URL into its own capture group for extraction:

  • Protocol (http,https)

  • Sub Domain (sub)

  • Domain (domain)

  • Domain Extension (com,net)

  • Path (/path/to/file - this is to be the path to the directory the file is contained in)

  • URI (file name)

  • URI Extension (file extension - js,css,pdf)

Sample URLs:

What I have so far:


Desired Output:

  • Group1: protocol

  • Group2: sub domain (if it exists, or blank if not)

  • Group3: domain

  • Group4: domain extension

  • Group5: directory path

  • Group6: file name

  • Group7: file extension

Question: How can I get each URL part into its own capture group across all the examples I have listed above?


You can use a regex tester to check the group numbers, but (if having extra groups doesn't bother you) with


you'll get

Group 1: protocol

Group 3: subdomain

Group 4: domain

Group 5: top-level domain (or, as you say, domain extension)

Group 6: /path/to/file.js

Group 8: filename

Group 9: extension
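
The gaps in that numbering (no Group 2 or Group 7) presumably come from helper groups: a regex engine numbers capture groups by the position of their opening parenthesis, so parentheses added only to make a part optional still take a number. A minimal Python sketch of that effect, using a hypothetical host-only pattern and made-up URLs rather than the regex from this answer:

```python
import re

# Hypothetical pattern for illustration only. The outer parentheses around
# the subdomain and its dot exist just to make that part optional, yet they
# still claim group number 2 and push the real subdomain to group 3.
pattern = re.compile(r"(https?)://((\w+)\.)?(\w+)\.(\w+)")

with_sub = pattern.match("http://sub.domain.com")
without_sub = pattern.match("http://domain.com")

# Group 2 is the helper ("sub."), group 3 is the subdomain itself.
print(with_sub.groups())     # ('http', 'sub.', 'sub', 'domain', 'com')
# Groups that did not take part in the match come back as None.
print(without_sub.groups())  # ('http', None, None, 'domain', 'com')
```
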

If you DO care about the numbers, you can always use non-capturing groups, written (?:...):


That way you'll indeed get

Group 1: protocol

Group 2: subdomain

Group 3: domain

Group 4: domain extension (TLD)

Group 5: /path/to/file.js

Group 6: filename

Group 7: extension
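
For reference, here is one way a pattern with that seven-group layout might look in Python. It is only a sketch under assumptions: the pattern, the `split_url` helper, and the sample URLs are all made up for illustration, it expects exactly one optional subdomain and a single-label TLD, and it requires the URL to end in a file name with an extension.

```python
import re

# Assumed pattern, not the original from this answer. Every helper group is
# written as (?:...) so only the seven wanted parts receive a number.
URL_RE = re.compile(
    r"^(https?)://"                         # 1: protocol
    r"(?:([^./]+)\.)?"                      # 2: subdomain (optional)
    r"([^./]+)\."                           # 3: domain
    r"([^./]+)"                             # 4: domain extension (TLD)
    r"(/(?:[^/]+/)*([^/]+)\.([^/.]+))$"     # 5: full path, 6: file name, 7: extension
)

def split_url(url):
    """Return the seven groups as a tuple, or None if the URL does not match."""
    m = URL_RE.match(url)
    # default="" turns a missing subdomain into a blank string instead of None.
    return m.groups(default="") if m else None

print(split_url("http://sub.domain.com/path/to/file.js"))
print(split_url("https://domain.net/file.css"))
```

Hosts such as example.co.uk, URLs with ports or query strings, and paths without a file extension would need a looser pattern than this one.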