JPShook JPShook - 10 months ago 59
Linux Question

How to parse HTTP headers using Bash?

I need to get 2 values from a web page header that I am getting using curl. I have been able to get the values individually using:

response1=$(curl -I -s | grep HTTP/1.1 | awk {'print $2'})
response2=$(curl -I -s | grep Server: | awk {'print $2'})

But I cannot figure out how to grep the values separately using a single curl request like:

response=$(curl -I -s
http_status=$response | grep HTTP/1.1 | awk {'print $2'}
server=$response | grep Server: | awk {'print $2'}

Every attempt either leads to a error message or empty values. I am sure it is just a syntax issue.

Answer Source

Full bashsolution. Demonstrate how to easily parse other headers without requiring awk:

shopt -s extglob # Required to trim whitespace; see below

while IFS=':' read key value; do
    # trim whitespace in "value"
    value=${value##+([[:space:]])}; value=${value%%+([[:space:]])}

    case "$key" in
        Server) SERVER="$value"
        Content-Type) CT="$value"
        HTTP*) read PROTO STATUS MSG <<< "$key{$value:+:$value}"
done < <(curl -sI
echo $STATUS
echo $SERVER
echo $CT


text/html; charset=UTF-8

According to RFC-2616, HTTP headers are modeled as described in "Standard for the Format of ARPA Internet Text Messages" (RFC822), which states clearly section 3.1.2:

The field-name must be composed of printable ASCII characters (i.e., characters that have values between 33. and 126., decimal, except colon). The field-body may be composed of any ASCII characters, except CR or LF. (While CR and/or LF may be present in the actual text, they are removed by the action of unfolding the field.)

So the above script should catch any RFC-[2]822 compliant header with the notable exception of folded headers.