I am trying to capture the first occurrence of anything that looks like a domain name from a string. For example:
'dfasdf https://www.my.domain.home.com fadsfas'
SELECT 'dfasdf https://www.my.domain.home.com fadsfas' AS string,
) AS url_to_match;
'dfasdf https://my.domain.home.com fadsfas'
'dfasdf my.domain.home.com fadsfas'
'dfasdf ,my.domain.home.com-- fadsfas'
The problem with www. being included in the match seems to be because you're using the 0th group (which is the full match, not just the capturing groups). While I don't know how to change that, it is possible to reformulate the regex so that group 0 and group 1 have the same value, like so:
This just says the match can't start at www., rather than allowing the match to start there and then having to ignore it.
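To illustrate the idea, here is a minimal Python sketch. The pattern and the helper name first_domain are my own assumptions for illustration, not the regex from the question. It relies on the regex engine supporting a negative lookahead, which not every SQL dialect does.

```python
import re

# One domain label: starts and ends with an alphanumeric, hyphens allowed inside.
# (Illustrative assumption, not the pattern from the question.)
LABEL = r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"

# \b anchors the start at a word boundary, and (?!www\.) refuses to start
# at a literal "www.", so the full match (group 0) already excludes it.
DOMAIN = re.compile(rf"\b(?!www\.){LABEL}(?:\.{LABEL})+")


def first_domain(text):
    """Return the first domain-like substring of text, or None if there is none."""
    match = DOMAIN.search(text)
    return match.group(0) if match else None


samples = [
    "dfasdf https://www.my.domain.home.com fadsfas",
    "dfasdf https://my.domain.home.com fadsfas",
    "dfasdf my.domain.home.com fadsfas",
    "dfasdf ,my.domain.home.com-- fadsfas",
]
for s in samples:
    print(first_domain(s))  # each line prints my.domain.home.com
```

The word boundary matters here: without it, the engine could skip the first w and start the match at ww.my.domain.home.com, which the lookahead alone would not reject.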
I've made a modified version of your regex that shows how it works. Note that if you want to match names with mixed-case alphanumerics you'll need to add A-Z to the a-z0-9, or turn on case-insensitivity; matching non-ASCII domain names is more work, and left for the interested reader to work out.
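As a rough sketch of the case-insensitivity option in Python: the pattern and the name first_domain_ci are hypothetical, chosen only to show the flag. The character classes stay a-z0-9 and re.IGNORECASE does the rest, including for the www. lookahead.

```python
import re

# Illustrative lowercase-only label and pattern (assumptions, not the original regex).
LABEL = r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"
PATTERN = rf"\b(?!www\.){LABEL}(?:\.{LABEL})+"


def first_domain_ci(text):
    """Return the first domain-like substring, ignoring letter case, or None."""
    match = re.search(PATTERN, text, flags=re.IGNORECASE)
    return match.group(0) if match else None


print(first_domain_ci("see HTTPS://WWW.My.Domain.Home.COM now"))  # My.Domain.Home.COM
```

The alternative is to widen every class to a-zA-Z0-9 and write the lookahead so it also rejects WWW., which is easy to get subtly wrong. The flag keeps the pattern itself unchanged.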