zarathustra - 3 months ago
HTTP Question

Umlauts in ISO-8859-1 encoded website

My very simple code snippet:

package main

import (
    "io"
    "net/http"
    "os"
)

func main() {
    resp, err := http.Get("http://example.com")
    if err == nil {
        defer resp.Body.Close()
        io.Copy(os.Stdout, resp.Body)
    }
}


When example.com is served with charset=iso-8859-1, my output is garbled. Umlauts, for example, are not displayed correctly:

Hällo Wörld --> H?llo W?rld


What's a good way to display umlauts correctly?

Answer

You can use the golang.org/x/net/html/charset package to determine the website's encoding and to create a reader that converts the content to UTF-8.
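As far as I know, the charset package's detection looks at a byte-order mark, the charset parameter of the Content-Type header, and <meta> tags near the start of the document. The header step alone can be sketched with the standard library's mime.ParseMediaType; the helper name charsetFromContentType is hypothetical, and this sketch ignores the BOM and <meta> fallbacks that the package also handles.

```go
package main

import (
    "fmt"
    "mime"
    "strings"
)

// charsetFromContentType is a hypothetical helper: it returns the
// lower-cased charset parameter of a Content-Type header value, or ""
// if there is none or the header cannot be parsed.
func charsetFromContentType(contentType string) string {
    _, params, err := mime.ParseMediaType(contentType)
    if err != nil {
        return ""
    }
    return strings.ToLower(params["charset"])
}

func main() {
    fmt.Println(charsetFromContentType("text/html; charset=ISO-8859-1")) // iso-8859-1
    fmt.Println(charsetFromContentType("text/html") == "")              // true
}
```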

Below is a working example:

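package main

import (
    "io"
    "log"
    "net/http"
    "os"

    "golang.org/x/net/html/charset"
)

func main() {
    resp, err := http.Get("http://example.com")
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    // Detect the page's encoding (from the Content-Type header or the
    // document itself) and wrap the body in a reader that yields UTF-8.
    r, err := charset.NewReader(resp.Body, resp.Header.Get("Content-Type"))
    if err != nil {
        log.Fatal(err)
    }

    if _, err := io.Copy(os.Stdout, r); err != nil {
        log.Fatal(err)
    }
}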