Elegant Metal - 1 year ago
HTML Question

Extract URLs from Google search result page

I'm trying to grab all the URLs off of a Google search page and there are two ways I think I could do it, but I don't really have any idea how to do them.

First, I could simply scrape them from the <a> tags and get the href attribute for each link. However, this gives me a really long string that I would have to parse through to get the URL. Here's an example of what would have to be parsed through:

https://www.google.com/search?sourceid=chrome-psyapi2&ion=1&espv=2&ie=UTF-8&q=mh4u%20items&oq=mh4u%20items&aqs=chrome.0.0l2j69i59j69i60j0l2.1754j0j7/url?q=https://youknowumsayin.wordpress.com/2015/03/16/the-inventory-and-you-what-items-should-i-bring-mh4u/&sa=U&ei=n8NvVdSvBMOsyATSzYKoCQ&ved=0CEUQFjAL&usg=AFQjCNGyD5NjsqOncyLElJt9C0hqVQ7gyA
The URL I would want out of this would be:

https://youknowumsayin.wordpress.com/2015/03/16/the-inventory-and-you-what-items-should-i-bring-mh4u/
So I would have to pull out the substring between the "/url?q=" and the "&sa=" that follows it, which I'm not 100% sure how to do, because each really long string Google gives me is a different size, so just using slice and cutting off "x" amount of characters wouldn't work.
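For reference, that delimiter-based extraction can be done with strings.Index instead of fixed offsets, so the varying string length doesn't matter. A minimal sketch, assuming the wanted URL always sits between the "/url?q=" marker and the next "&" (the helper name and sample href are made up for illustration):

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// extractBetween pulls the URL that sits between the "/url?q=" marker and
// the next "&" in a Google result href. Hypothetical helper, not production code.
func extractBetween(s string) (string, bool) {
	const marker = "/url?q="
	start := strings.Index(s, marker)
	if start == -1 {
		return "", false
	}
	rest := s[start+len(marker):]
	if end := strings.Index(rest, "&"); end != -1 {
		rest = rest[:end]
	}
	// The embedded URL may be percent-encoded, so decode it.
	decoded, err := url.QueryUnescape(rest)
	if err != nil {
		return "", false
	}
	return decoded, true
}

func main() {
	href := "/url?q=https://example.com/page&sa=U&ved=0"
	if u, ok := extractBetween(href); ok {
		fmt.Println(u) // https://example.com/page
	}
}
```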

Second, underneath each link in a Google search there is the URL in green text. Right-clicking that and inspecting the element gives:

<cite class="_Rm">

which I don't know how to find with goquery, because looking for that class with my small function just gives me more long strings of characters.

Here is my small function, it currently does the first option without the parsing and gives me a long string of text that just takes me to the search page:

func GetUrls(url string) {
	doc, err := goquery.NewDocument(url)
	if err != nil {
		return // give up on a fetch/parse error
	}
	doc.Find(".r a").Each(func(i int, s *goquery.Selection) {
		link, _ := s.Attr("href")
		link = url + link
		fmt.Printf("link is [%s]\n", link)
	})
}

Answer Source

The standard library has support for parsing URLs: check out the net/url package. Using this package, we can get query parameters from URLs.

Note that your original raw URL contains the URL you want to extract in the "aqs" parameter, in the form of

chrome.0.0l2j69i59j69i60j0l2.1754j0j7/url?q=https://youknowumsayin.wordpress.com/2015/03/16/the-inventory-and-you-what-items-should-i-bring-mh4u/

Which is basically another URL.

Let's write a little helper function which gets a parameter from a raw URL text:

func getParam(raw, param string) (string, error) {
    u, err := url.Parse(raw)
    if err != nil {
        return "", err
    }

    q := u.Query()
    if q == nil {
        return "", fmt.Errorf("no query part")
    }

    v := q.Get(param)
    if v == "" {
        return "", fmt.Errorf("param not found")
    }
    return v, nil
}
Using this we can get the "aqs" parameter from the original URL, and using this again we can get the "q" parameter which is exactly your desired URL:

raw := "https://www.google.com/search?sourceid=chrome-psyapi2&ion=1&espv=2&ie=UTF-8&q=mh4u%20items&oq=mh4u%20items&aqs=chrome.0.0l2j69i59j69i60j0l2.1754j0j7/url?q=https://youknowumsayin.wordpress.com/2015/03/16/the-inventory-and-you-what-items-should-i-bring-mh4u/&sa=U&ei=n8NvVdSvBMOsyATSzYKoCQ&ved=0CEUQFjAL&usg=AFQjCNGyD5NjsqOncyLElJt9C0hqVQ7gyA"
aqs, err := getParam(raw, "aqs")
if err != nil {
    fmt.Println(err)
    return
}

result, err := getParam(aqs, "q")
if err != nil {
    fmt.Println(err)
    return
}
fmt.Println(result)
Output (try it on the Go Playground):

https://youknowumsayin.wordpress.com/2015/03/16/the-inventory-and-you-what-items-should-i-bring-mh4u/
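For completeness, here is the answer's helper and the two lookups assembled into one self-contained program (the error handling and the final print are my additions; the helper and the URL are from the answer):

```go
package main

import (
	"fmt"
	"net/url"
)

// getParam parses a raw URL and returns the named query parameter.
func getParam(raw, param string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	v := u.Query().Get(param)
	if v == "" {
		return "", fmt.Errorf("param %q not found", param)
	}
	return v, nil
}

func main() {
	raw := "https://www.google.com/search?sourceid=chrome-psyapi2&ion=1&espv=2&ie=UTF-8&q=mh4u%20items&oq=mh4u%20items&aqs=chrome.0.0l2j69i59j69i60j0l2.1754j0j7/url?q=https://youknowumsayin.wordpress.com/2015/03/16/the-inventory-and-you-what-items-should-i-bring-mh4u/&sa=U&ei=n8NvVdSvBMOsyATSzYKoCQ&ved=0CEUQFjAL&usg=AFQjCNGyD5NjsqOncyLElJt9C0hqVQ7gyA"

	// First pull out the "aqs" parameter, which is itself URL-shaped...
	aqs, err := getParam(raw, "aqs")
	if err != nil {
		panic(err)
	}
	// ...then pull its "q" parameter, which is the wanted URL.
	result, err := getParam(aqs, "q")
	if err != nil {
		panic(err)
	}
	fmt.Println(result)
}
```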