JPShook - 1 month ago
Linux Question

How to parse HTTP headers using Bash?

I need to get 2 values from a web page header that I am getting using curl. I have been able to get the values individually using:

response1=$(curl -I -s http://www.example.com | grep HTTP/1.1 | awk '{print $2}')
response2=$(curl -I -s http://www.example.com | grep Server: | awk '{print $2}')


But I cannot figure out how to grep the values separately using a single curl request like:

response=$(curl -I -s http://www.example.com)
http_status=$response | grep HTTP/1.1 | awk '{print $2}'
server=$response | grep Server: | awk '{print $2}'


Every attempt leads to either an error message or empty values. I am sure it is just a syntax issue.

Answer

A full bash solution. It demonstrates how to easily parse other headers as well, without requiring awk:

shopt -s extglob # Required to trim whitespace; see below

while IFS=':' read -r key value; do
    # trim whitespace in "value"
    value=${value##+([[:space:]])}; value=${value%%+([[:space:]])}

    case "$key" in
        Server) SERVER="$value"
                ;;
        Content-Type) CT="$value"
                ;;
        # status line: re-join key and value (if any), then split on whitespace
        HTTP*) read -r PROTO STATUS MSG <<< "$key${value:+:$value}"
                ;;
    esac
done < <(curl -sI http://www.google.com)
echo "$STATUS"
echo "$SERVER"
echo "$CT"

Producing:

302
GFE/2.0
text/html; charset=UTF-8
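
For comparison, the grep/awk approach from the question can also work with a single request, provided the saved response is piped back into the filters with printf (or echo) inside a command substitution. A minimal sketch, assuming a canned response in place of the real curl call; the sample header values below are made up for illustration:

```shell
#!/usr/bin/env bash
# Stand-in for: response=$(curl -sI http://www.example.com)
# The header values are hypothetical sample data, not real server output.
response=$'HTTP/1.1 200 OK\r\nServer: ExampleServer/1.0\r\nContent-Type: text/html\r\n'

# Feed the saved text to awk; "$response | grep ..." alone would try to
# run the response text as a command instead of piping it.
http_status=$(printf '%s\n' "$response" | awk '/^HTTP\// {print $2; exit}')

# Header names are case-insensitive, so compare in lowercase, and strip
# the trailing CR that HTTP line endings leave on the value.
server=$(printf '%s\n' "$response" |
    awk -F': ' 'tolower($1) == "server" {sub(/\r$/, "", $2); print $2; exit}')

echo "$http_status"
echo "$server"
```

This keeps the original tools but still forks two awk processes, which the read loop above avoids.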

According to RFC 2616, HTTP headers are modeled as described in "Standard for the Format of ARPA Internet Text Messages" (RFC 822), which states clearly in section 3.1.2:

The field-name must be composed of printable ASCII characters (i.e., characters that have values between 33. and 126., decimal, except colon). The field-body may be composed of any ASCII characters, except CR or LF. (While CR and/or LF may be present in the actual text, they are removed by the action of unfolding the field.)

So the above script should catch any RFC [2]822-compliant header, with the notable exception of folded headers (a header whose body continues on following lines that begin with a space or tab).
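
If folded headers do need to be handled, one option is to unfold them before the read loop sees them. The following is a rough sketch rather than a tested production filter: unfold_headers is a hypothetical helper name, and the sample response (including the X-Example header) is invented for the demonstration.

```shell
#!/usr/bin/env bash
# unfold_headers (hypothetical helper): reads header lines on stdin and
# joins each continuation line (one starting with a space or tab) onto
# the preceding line, separated by a single space.
unfold_headers() {
    local line current='' have_current=0
    while IFS= read -r line || [[ -n $line ]]; do
        line=${line%$'\r'}                         # drop the CR of a CRLF ending
        if (( have_current )) && [[ $line == [[:blank:]]* ]]; then
            line=${line#"${line%%[![:blank:]]*}"}  # strip leading blanks
            current+=" $line"
        else
            if (( have_current )); then
                printf '%s\n' "$current"
            fi
            current=$line
            have_current=1
        fi
    done
    if (( have_current )); then
        printf '%s\n' "$current"
    fi
}

# Invented sample response with one folded header.
sample=$'HTTP/1.1 200 OK\r\nX-Example: first part,\r\n\tsecond part\r\nServer: ExampleServer/1.0\r'
unfolded=$(unfold_headers <<< "$sample")
printf '%s\n' "$unfolded"
```

With such a filter in place, the loop above could read from < <(curl -sI http://www.google.com | unfold_headers) instead of from curl directly.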
