Ozuf Ozuf - 1 month ago 7
Java Question

Regex to match a URL directory path without matching a file name

I want a regex that will match

https://example.com/studio/
or
https://example.com/studio
without matching
https://example.com/studio/path-to-file-blah-blah
or
https://example.com/studio/path-to-file-blah-blah.html


I tried
https?:\/\/(?:w{3}[.])?example[.]com\/studio\S*
but it's matching both groups above.

I have also tried
https?:\/\/(?:w{3}[.])?example[.]com\/studio\/?
and it matched only the first group. The problem is matching only the second group. How can I do that?

Answer

I'm assuming you need to extract the URL from unstructured text. Assuming the URL is followed by a space, a newline, or the end of the string, the following should work for you. If a period or another character directly follows the URL, the match will fail, but the pattern is easy to extend with additional terminating characters.

https?:\/\/(?:w{3}[.])?example[.]com\/studio\/?(?:\s|$)

(?:\s|$) just says: match a whitespace character (which includes line endings such as a newline character) OR match the end of the string.
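
Since this is a Java question, here is a minimal sketch of how the pattern above could be applied with java.util.regex. The class name StudioDirectoryMatcher and the method findDirectoryUrls are made up for illustration. I wrapped the URL part in a capturing group so the trailing whitespace consumed by (?:\s|$) is not part of the result, and dropped the \/ escapes because Java does not require a forward slash to be escaped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class StudioDirectoryMatcher {
    // Group 1 holds the URL; (?:\s|$) requires whitespace or end of input right after it.
    static final Pattern STUDIO_DIR = Pattern.compile(
            "(https?://(?:w{3}[.])?example[.]com/studio/?)(?:\\s|$)");

    // Returns every directory-style studio URL found in the text.
    static List<String> findDirectoryUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = STUDIO_DIR.matcher(text);
        while (m.find()) {
            urls.add(m.group(1)); // the URL without the terminating whitespace
        }
        return urls;
    }

    public static void main(String[] args) {
        String text = "See https://example.com/studio/ and https://example.com/studio\n"
                + "but not https://example.com/studio/path-to-file-blah-blah.html";
        System.out.println(findDirectoryUrls(text));
    }
}
```

If you would rather not consume the terminating character at all, a lookahead such as (?=\s|$) in place of (?:\s|$) should behave the same way without needing the extra group.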


EDIT

I think you're saying group 2 is:

https://example.com/studio/path-to-file-blah-blah
https://example.com/studio/path-to-file-blah-blah.html

To match these, you'll need the following regex:

https?:\/\/(?:w{3}[.])?example[.]com\/studio\/\S+

Compared with your first attempt, the changes are the \/ that is now required after studio and the final quantifier: it was \S*, but it should be \S+.

* means 0 or more

+ means 1 or more.
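
To see the effect of \S+ in Java, here is a small sketch that tests whole strings against the pattern above. The class name StudioFileMatcher and the method isFileUrl are hypothetical, and I am assuming each candidate URL is already isolated in its own string, which is why it uses matches() rather than find().

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class StudioFileMatcher {
    // \S+ demands at least one non-whitespace character after "studio/".
    static final Pattern STUDIO_FILE = Pattern.compile(
            "https?://(?:w{3}[.])?example[.]com/studio/\\S+");

    // True only when the whole string is a studio URL with something after the slash.
    static boolean isFileUrl(String url) {
        return STUDIO_FILE.matcher(url).matches();
    }

    public static void main(String[] args) {
        String[] candidates = {
            "https://example.com/studio/",
            "https://example.com/studio",
            "https://example.com/studio/path-to-file-blah-blah",
            "https://example.com/studio/path-to-file-blah-blah.html"
        };
        for (String url : candidates) {
            System.out.println(url + " -> " + isFileUrl(url));
        }
    }
}
```

With this, the two bare directory URLs should be rejected and the two file URLs accepted.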

Hopefully this touches on what you're looking for. If I'm still off, labeling the groups would help me understand what you need so I can write the correct regex.