Adel Ahmed - 5 months ago
C Question

downloading words with libcurl

I'm trying to download the words on a web page (including the title). I'm using this libcurl option:

curl_easy_setopt(myHandle, CURLOPT_HEADER, 0L);


to keep the unnecessary data out. However, I still get the style data:

example.com:

Example Domain body { background-color: #f0f0f2; margin: 0; padding: 0; font-family: "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif; } div { width: 600px; margin: 5em auto; padding: 50px; background-color: #fff; border-radius: 1em; } a:link, a:visited { color: #38488f; text-decoration: none; } @media (max-width: 700px) { body { background-color: #fff; } div { width: auto; margin: 0 auto; border-radius: 0; padding: 1em; } } Example Domain This domain is established to be used for illustrative examples in documents. You may use this domain in examples without prior coordination or asking for permission. More information...http://www.iana.org/domains/example
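For context: CURLOPT_HEADER set to 0 (its default) only keeps the HTTP response headers out of the data handed to the write callback. It does not touch the HTML itself, so the style block arrives as part of the page body. Below is a minimal sketch of a write callback that collects that body into a growable buffer. The names struct buffer and write_cb are hypothetical, and the libcurl wiring is shown in a comment, so the callback itself compiles with only the C standard library.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer that accumulates the downloaded page. */
struct buffer {
    char  *data;
    size_t len;
};

/* Same shape as a CURLOPT_WRITEFUNCTION callback:
   size_t cb(char *ptr, size_t size, size_t nmemb, void *userdata).
   Returning anything other than size * nmemb makes libcurl abort the transfer. */
static size_t write_cb(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    struct buffer *buf = userdata;
    size_t n = size * nmemb;
    char *grown = realloc(buf->data, buf->len + n + 1);
    if (grown == NULL)
        return 0;                   /* out of memory: signal an error to libcurl */
    buf->data = grown;
    memcpy(buf->data + buf->len, ptr, n);
    buf->len += n;
    buf->data[buf->len] = '\0';     /* keep the buffer usable as a C string */
    return n;
}

/* Wiring (requires <curl/curl.h> and linking with -lcurl):
 *   struct buffer page = { NULL, 0 };
 *   curl_easy_setopt(myHandle, CURLOPT_URL, "http://example.com/");
 *   curl_easy_setopt(myHandle, CURLOPT_HEADER, 0L);   // affects HTTP headers only
 *   curl_easy_setopt(myHandle, CURLOPT_WRITEFUNCTION, write_cb);
 *   curl_easy_setopt(myHandle, CURLOPT_WRITEDATA, &page);
 *   curl_easy_perform(myHandle);
 *   // page.data now holds the raw HTML, <style> block included
 *   free(page.data);
 */
```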


Is there another option that can remove lines like this one:

body { background-color: #f0f0f2; margin: 0; padding: 0; font-family: "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif; }


or do I have to parse the characters manually?
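libcurl only handles the transfer and has no option for stripping HTML, so removing the CSS does mean processing the markup yourself (or handing it to an HTML parser such as the one in libxml2). Below is a minimal character-level sketch, assuming reasonably well-formed HTML. The hypothetical helper strip_html keeps text outside tags and drops everything from an opening style or script tag up to its closing tag. It is not a real HTML parser: comments, entities such as &amp;, and a '>' inside an attribute value are not handled.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Case-insensitive check that s begins with prefix (prefix given in lowercase). */
static int starts_with_ci(const char *s, const char *prefix)
{
    for (; *prefix != '\0'; s++, prefix++)
        if (tolower((unsigned char)*s) != *prefix)
            return 0;               /* also stops safely at the end of s */
    return 1;
}

/* Hypothetical helper: returns a newly allocated copy of `in` with all tags
   removed and the contents of <style> and <script> elements dropped.
   The caller must free() the result. Returns NULL if allocation fails. */
char *strip_html(const char *in)
{
    char *out = malloc(strlen(in) + 1);   /* output never exceeds the input */
    if (out == NULL)
        return NULL;

    char *o = out;
    const char *p = in;
    while (*p != '\0') {
        if (*p != '<') {
            *o++ = *p++;                  /* plain text: keep it */
            continue;
        }

        /* Does this tag open an element whose body should be skipped? */
        const char *close = NULL;
        if (starts_with_ci(p, "<style"))
            close = "</style";
        else if (starts_with_ci(p, "<script"))
            close = "</script";

        while (*p != '\0' && *p != '>')   /* drop the tag itself */
            p++;
        if (*p == '>')
            p++;

        if (close != NULL)                /* drop the element body; the closing */
            while (*p != '\0' && !starts_with_ci(p, close))
                p++;                      /* tag is removed on the next pass */
    }
    *o = '\0';
    return out;
}
```

Text inside the title element is kept, so the title survives while the CSS rules disappear. Adjacent elements are concatenated without a separator; emitting a space per removed tag is an easy tweak if the words run together.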

Answer

I ended up simply skipping the whole head section of the page with:

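char *body = strstr(htmlCode, "</head>");        /* NULL if the page has no </head> (needs <string.h>) */
if (body != NULL) body += strlen("</head>");     /* skip the tag itself; htmlCode stays valid for free() */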

That should do it for now.
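One caveat with the strstr approach: the title element lives inside the head, so skipping to </head> also drops the title that the question wanted to keep. Also, strstr returns NULL when the page has no literal "</head>" (the head tags are optional in HTML). The sketch below handles both cases. It uses hypothetical helpers named skip_head and copy_title and assumes lowercase tags and a title tag without attributes.

```c
#include <stddef.h>
#include <string.h>

/* Returns a pointer just past "</head>", or `html` itself if there is none.
   The caller's original pointer is untouched, so it can still be free()d. */
const char *skip_head(const char *html)
{
    const char *end = strstr(html, "</head>");
    return end != NULL ? end + strlen("</head>") : html;
}

/* Copies the text between <title> and </title> into `out` (capacity `cap`,
   truncating if necessary). Returns 1 if a title was found, 0 otherwise. */
int copy_title(const char *html, char *out, size_t cap)
{
    const char *start = strstr(html, "<title>");
    if (start == NULL || cap == 0)
        return 0;
    start += strlen("<title>");

    const char *end = strstr(start, "</title>");
    if (end == NULL)
        return 0;

    size_t n = (size_t)(end - start);
    if (n >= cap)
        n = cap - 1;                /* leave room for the terminator */
    memcpy(out, start, n);
    out[n] = '\0';
    return 1;
}
```

Read the title from the full buffer first, then work on the pointer returned by skip_head. Keep the original buffer pointer around for free().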