dukeofgaming - 6 months ago

How to extract multiple environment variables from a regular expression in Bash?

I essentially want to extract the parts of a URL into different environment variables for later use, so I expected to be able to do something like this:

echo "my-app.domain.com:8080" | \
sed -r 's/((\w+)\.)?(\w+\.\w+)(\:(\d+))?/\2\n\3\n\5/g' | \
read SUBDOMAIN DOMAIN PORT


However, this does not seem to work, and for some reason the ":" from the port is always included in the "\5" output:

sh-4.2# echo "my-app.domain.com:8080" | \
sed -r 's/((\w+)\.)?(\w+\.\w+)(\:(\d+))?/\2\n\3\n\5/g'
my-app
domain.com
:8080


Stranger still, if I print a newline after \5, the output is:

sh-4.2# echo "my-app.domain.com:8080" | sed -r 's/((\w+)\.)?(\w+\.\w+)(\:(\d+))?/\2\n\3\n\5\n/g'
my-app
domain.com

:8080


In any case, none of the variables are set when I use read either. It seems I am doing a number of things wrong, but I am unable to spot exactly what.

Answer

With GNU bash, match the string with the =~ operator and read the capture groups from the BASH_REMATCH array:

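url="my-app.domain.com:8080"
# Groups: 1 = text before the first dot, 2 = the rest up to the last colon, 3 = the port
if [[ $url =~ ^([^.]*)\.(.*):(.*)$ ]]; then
  subdomain="${BASH_REMATCH[1]}"
  domain="${BASH_REMATCH[2]}"
  port="${BASH_REMATCH[3]}"
  echo "$subdomain $domain $port"
fi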

Output:

my-app domain.com 8080
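The regex above requires both a subdomain and a port. If, as in the question's sed expression, either part may be missing, the same BASH_REMATCH approach works with optional groups. This is a minimal sketch, assuming host names of the form `[sub.]domain.tld[:port]` (a host with more than one subdomain level will not match); `parse_url` is a hypothetical helper name, not part of bash:

```shell
#!/usr/bin/env bash
# Hypothetical helper: split "[sub.]domain.tld[:port]" into the global
# variables subdomain, domain and port. Returns 1 if the input does not fit.
parse_url() {
  # Group 2 = subdomain, group 3 = domain.tld, group 5 = port digits.
  local re='^(([^.:]+)\.)?([^.:]+\.[^.:]+)(:([0-9]+))?$'
  # $re must stay unquoted on the right of =~ so bash treats it as a regex.
  if [[ $1 =~ $re ]]; then
    subdomain="${BASH_REMATCH[2]}"   # empty if there was no subdomain
    domain="${BASH_REMATCH[3]}"
    port="${BASH_REMATCH[5]}"        # empty if there was no port
    return 0
  fi
  return 1
}

parse_url "my-app.domain.com:8080" && echo "$subdomain $domain $port"
parse_url "domain.com" && echo "[$subdomain] [$domain] [$port]"
```

Run as-is, this prints `my-app domain.com 8080` followed by `[] [domain.com] []`. Keeping the regex in a variable avoids having to escape it inside `[[ ]]`.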

See: The Stack Overflow Regular Expressions FAQ
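The question's sed-and-read pipeline can also be made to work. Several things go wrong in the original: by default bash runs each command of a pipeline in a subshell, so the variables that `read` sets disappear when the pipeline ends; `read` consumes only one line, so newline-separated fields leave DOMAIN and PORT empty; sed's regular expressions have no `\d` digit class, so the optional port group matches nothing and the unmatched ":8080" is copied through after the replacement text; and `\w` does not match the hyphen in "my-app". A sketch, assuming the same `[sub.]domain.tld[:port]` shape, that feeds `read` through process substitution and separates the fields with colons (an arbitrary choice, so that an empty subdomain still occupies its own field):

```shell
#!/usr/bin/env bash
url="my-app.domain.com:8080"

# [^.:] instead of \w so hyphens are accepted; [0-9] instead of \d.
# The fields are printed on ONE line, separated by ':', because read consumes a single line.
# Process substitution < <(...) keeps read in the current shell, so the variables survive.
IFS=: read -r SUBDOMAIN DOMAIN PORT < <(
  echo "$url" | sed -E 's/^(([^.:]+)\.)?([^.:]+\.[^.:]+)(:([0-9]+))?$/\2:\3:\5/'
)

echo "subdomain=$SUBDOMAIN domain=$DOMAIN port=$PORT"
```

If the input does not match the pattern, sed prints it unchanged, so for untrusted input the BASH_REMATCH version, which can test whether the match succeeded, is the safer choice.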