Vikas Dhochak - 3 months ago
C# Question

Download a file from a dynamically generated link found in a page's HTML source

I am trying to get weather data from BOM Australia. The manual way is to go to http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av?p_nccObsCode=136&p_display_type=dailyDataFile&p_startYear=&p_c=&p_stn_num=2064 and click 'All years of data', which downloads the file.

Here's what I have tried to automate this:

using (WebClient client = new WebClient())
{
    string html = client.DownloadString("http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av?p_nccObsCode=136&p_display_type=dailyDataFile&p_startYear=&p_c=&p_stn_num=2064");

    List<string> list = LinkExtractor.Extract(html);
    foreach (var link in list)
    {
        if (link.StartsWith("/jsp/ncc/cdio/weatherData/av?p_display_type=dailyZippedDataFile"))
        {
            string resource = "http://www.bom.gov.au" + link;
            MessageBox.Show(resource);

            client.DownloadFileAsync(new Uri(resource), Dts.Connections["data.zip"].ConnectionString);
            break;
        }
    }
}

Don't worry about LinkExtractor; it works, as I can see the link that points to the file. The problem is that DownloadFileAsync creates a new request, and the file does not download because it needs the same session.

Is there a way I can do this? Please ask if anything needs clarification.

UPDATE:

Here are the changes I made, utilising cookies from HttpWebRequest. However, I am still not able to download the file.

HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av?p_nccObsCode=136&p_display_type=dailyDataFile&p_startYear=&p_c=&p_stn_num=2064");
request.CookieContainer = new CookieContainer();

HttpWebResponse response = (HttpWebResponse)request.GetResponse();

foreach (Cookie cook in response.Cookies)
{
    MessageBox.Show(cook.ToString());
}

if (response.StatusCode == HttpStatusCode.OK)
{
    Stream receiveStream = response.GetResponseStream();
    StreamReader readStream = null;

    if (response.CharacterSet == null)
    {
        readStream = new StreamReader(receiveStream);
    }
    else
    {
        readStream = new StreamReader(receiveStream, Encoding.GetEncoding(response.CharacterSet));
    }

    string data = readStream.ReadToEnd();

    using (WebClient client = new WebClient())
    {
        foreach (Cookie cook in response.Cookies)
        {
            MessageBox.Show(cook.ToString());
            client.Headers.Add(HttpRequestHeader.Cookie, cook.ToString());
        }

        List<string> list = LinkExtractor.Extract(data);
        foreach (var link in list)
        {
            if (link.StartsWith("/jsp/ncc/cdio/weatherData/av?p_display_type=dailyZippedDataFile"))
            {
                string initial = "http://www.bom.gov.au" + link;
                MessageBox.Show(initial);

                //client.Headers.Add(HttpRequestHeader.Cookie, "JSESSIONID=2EBAFF7EFE2EEFE8140118CE5170B8F6");
                client.DownloadFile(new Uri(initial), Dts.Connections["data.zip"].ConnectionString);
                break;
            }
        }
    }

    response.Close();
    readStream.Close();
}

Answer

The HTML you get back, and the URLs inside it, are HTML-encoded. So when you cut the URL out of the HTML with a substring, you need to decode it before using it. This is what the download URL for the zip looks like in the page source:

   /jsp/ncc/cdio/weatherData/av?p_display_type=dailyZippedDataFile&amp;p_stn_num=2064&amp;p_c=-938623&amp;p_nccObsCode=136&amp;p_startYear=2016

There is a helper class that does the decoding for us: WebUtility (its HtmlDecode method).
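To see what the decoding does to the link shown above, here is a minimal sketch. The class and method names (DecodeDemo, DecodeLink) are made up for illustration; only WebUtility.HtmlDecode comes from the framework.

```csharp
using System;
using System.Net;

// Hypothetical helper, named for illustration only.
public static class DecodeDemo
{
    // Turns HTML entities such as &amp; back into the literal characters,
    // so the query string separators become plain '&' again.
    public static string DecodeLink(string encodedLink)
    {
        return WebUtility.HtmlDecode(encodedLink);
    }
}
```

Calling DecodeDemo.DecodeLink on the encoded link above should return the same path with every &amp; replaced by &, which is the form the server expects in a request.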

This code does download the zip file:

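// Requires: using System; using System.Net;
using (var client = new WebClient())
{
    var url = "http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av?p_nccObsCode=136&p_display_type=dailyDataFile&p_startYear=&p_c=&p_stn_num=2064";
    string html = client.DownloadString(url);

    // Locate the zip link in the page source; it runs up to the closing quote of the attribute.
    var pos = html.IndexOf("/jsp/ncc/cdio/weatherData/av?p_display_type=dailyZippedDataFile");
    if (pos < 0)
        throw new InvalidOperationException("Zip download link not found in the page.");
    var endpos = html.IndexOf('"', pos);
    string link = html.Substring(pos, endpos - pos);

    // The link is HTML-encoded in the source (&amp; instead of &), so decode it first.
    var decodedLink = WebUtility.HtmlDecode(link);
    string resource = "http://www.bom.gov.au" + decodedLink;

    client.DownloadFile(new Uri(resource), @"c:\temp\bom2.zip");
}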

In this case you don't need to keep the cookies, but you do need to be careful with the URLs you parse.
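One way to make the parsing step a little more defensive is sketched below. It is only an illustration: ZipLinkParser and FindZipLink are hypothetical names, and the code assumes the link sits inside a double-quoted attribute, as the code above does. It returns null when the link is missing instead of failing inside Substring, and it lets the Uri class combine the site root with the decoded relative path.

```csharp
using System;
using System.Net;

// Hypothetical helper, named for illustration only.
public static class ZipLinkParser
{
    // Prefix of the zip link as it appears in the page source.
    private const string Marker =
        "/jsp/ncc/cdio/weatherData/av?p_display_type=dailyZippedDataFile";

    // Returns the absolute, decoded download URL, or null if no link is found.
    public static Uri FindZipLink(string html, Uri baseUri)
    {
        int start = html.IndexOf(Marker, StringComparison.Ordinal);
        if (start < 0) return null;

        // Assumption: the link ends at the closing double quote of its attribute.
        int end = html.IndexOf('"', start);
        if (end < 0) return null;

        string decoded = WebUtility.HtmlDecode(html.Substring(start, end - start));

        // Resolve the relative path against the site root.
        return new Uri(baseUri, decoded);
    }
}
```

A caller could pass new Uri("http://www.bom.gov.au") as baseUri and hand the result to client.DownloadFile, checking for null first.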
