Download a file from a dynamically generated link found in the HTML source of a page
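I am trying to get weather data from BOM Australia. The manual way is to go to http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av?p_nccObsCode=136&p_display_type=dailyDataFile&p_startYear=&p_c=&p_stn_num=2064 , click 'All years of data', and the file downloads!
This is what I have tried in order to automate it: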
using (WebClient client = new WebClient())
{
    string html = client.DownloadString("http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av?p_nccObsCode=136&p_display_type=dailyDataFile&p_startYear=&p_c=&p_stn_num=2064");
    List<string> list = LinkExtractor.Extract(html);
    foreach (var link in list)
    {
        if (link.StartsWith("/jsp/ncc/cdio/weatherData/av?p_display_type=dailyZippedDataFile"))
        {
            string resource = "http://www.bom.gov.au" + link;
            MessageBox.Show(resource);
            client.DownloadFileAsync(new Uri(resource), Dts.Connections["data.zip"].ConnectionString);
            break;
        }
    }
}
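Don't worry about LinkExtractor; it works, since I can see the link that serves the file. The problem is that 'DownloadFileAsync' creates a new request, and that request is not allowed to download the file because the file requires the same session.
Is there a way I can do this? Please let me know if you need more clarification.
UPDATE:
Here are the changes I made, using the cookies from an HttpWebRequest. However, I am still unable to download the file.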
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av?p_nccObsCode=136&p_display_type=dailyDataFile&p_startYear=&p_c=&p_stn_num=2064");
request.CookieContainer = new CookieContainer();
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
foreach (Cookie cook in response.Cookies)
{
    MessageBox.Show(cook.ToString());
}
if (response.StatusCode == HttpStatusCode.OK)
{
    Stream receiveStream = response.GetResponseStream();
    StreamReader readStream = null;
    if (response.CharacterSet == null)
    {
        readStream = new StreamReader(receiveStream);
    }
    else
    {
        readStream = new StreamReader(receiveStream, Encoding.GetEncoding(response.CharacterSet));
    }
    string data = readStream.ReadToEnd();
    using (WebClient client = new WebClient())
    {
        foreach (Cookie cook in response.Cookies)
        {
            MessageBox.Show(cook.ToString());
            client.Headers.Add(HttpRequestHeader.Cookie, cook.ToString());
        }
        List<string> list = LinkExtractor.Extract(data);
        foreach (var link in list)
        {
            if (link.StartsWith("/jsp/ncc/cdio/weatherData/av?p_display_type=dailyZippedDataFile"))
            {
                string initial = "http://www.bom.gov.au" + link;
                MessageBox.Show(initial);
                //client.Headers.Add(HttpRequestHeader.Cookie, "JSESSIONID=2EBAFF7EFE2EEFE8140118CE5170B8F6");
                client.DownloadFile(new Uri(initial), Dts.Connections["data.zip"].ConnectionString);
                break;
            }
        }
    }
    response.Close();
    readStream.Close();
}
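The HTML you receive, and the URLs in it, are HTML-encoded. So when you extract a URL substring from the HTML, you should decode it before using it. This is what the zip download URL looks like in the page source: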
/jsp/ncc/cdio/weatherData/av?p_display_type=dailyZippedDataFile&amp;p_stn_num=2064&amp;p_c=-938623&amp;p_nccObsCode=136&amp;p_startYear=2016
There is a helper class that does the decoding for us: WebUtility.
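As a small illustration of that decoding step, here is a minimal sketch that works on a hard-coded string instead of the live page. The class name HtmlDecodeDemo, the method ExtractAndDecode, and the shortened HTML fragment are assumptions made up for this example; only WebUtility.HtmlDecode comes from the framework.

```csharp
using System;
using System.Net;

// Hypothetical demo class; not part of the original answer.
class HtmlDecodeDemo
{
    // Finds the first substring that starts with `prefix` and runs up to the
    // next double quote, then HTML-decodes it. Returns null if nothing matches.
    public static string ExtractAndDecode(string html, string prefix)
    {
        int pos = html.IndexOf(prefix, StringComparison.Ordinal);
        if (pos < 0) return null;

        int end = html.IndexOf('"', pos);
        if (end < 0) return null;

        // Turns entities such as &amp; back into the literal character (&).
        return WebUtility.HtmlDecode(html.Substring(pos, end - pos));
    }

    static void Main()
    {
        // Made-up, shortened fragment in the shape of the page's download link.
        string html = "<a href=\"/jsp/ncc/cdio/weatherData/av?p_display_type=dailyZippedDataFile&amp;p_stn_num=2064\">All years of data</a>";

        string link = ExtractAndDecode(html, "/jsp/ncc/cdio/weatherData/av?p_display_type=dailyZippedDataFile");

        // Prints: /jsp/ncc/cdio/weatherData/av?p_display_type=dailyZippedDataFile&p_stn_num=2064
        Console.WriteLine(link);
    }
}
```

Without the decode, the request would be sent with a literal `&amp;` between parameters, so the server would most likely not see the parameter names it expects.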
This code downloads the zip file:
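// Requires: using System; using System.Net;
using (var client = new WebClient())
{
    var url = "http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av?p_nccObsCode=136&p_display_type=dailyDataFile&p_startYear=&p_c=&p_stn_num=2064";
    string html = client.DownloadString(url);

    // Locate the zip link in the page source; it ends at the closing quote of the attribute.
    var pos = html.IndexOf("/jsp/ncc/cdio/weatherData/av?p_display_type=dailyZippedDataFile");
    if (pos < 0)
        throw new InvalidOperationException("Zip download link not found in the page.");
    var endpos = html.IndexOf('"', pos);
    string link = html.Substring(pos, endpos - pos);

    // The link is HTML-encoded in the source, so decode it before building the URL.
    var decodedLink = WebUtility.HtmlDecode(link);
    string resource = "http://www.bom.gov.au" + decodedLink;
    client.DownloadFile(new Uri(resource), @"c:\temp\bom2.zip");
}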
In this case you don't need to keep cookies, but you do need to be careful with the URL you parse out of the page.
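For a site that really does tie the download to a session, a common pattern is to subclass WebClient so that every request shares one CookieContainer. The following is only a sketch under that assumption; the class name CookieAwareWebClient is hypothetical and this is not needed for the BOM page discussed here.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of the original answer: a WebClient that
// attaches the same CookieContainer to every HTTP request it creates.
class CookieAwareWebClient : WebClient
{
    private readonly CookieContainer cookies = new CookieContainer();

    // Exposed so callers can inspect or pre-populate the cookies.
    public CookieContainer Cookies
    {
        get { return cookies; }
    }

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);

        // Only HTTP(S) requests carry cookies; other schemes are left untouched.
        var http = request as HttpWebRequest;
        if (http != null)
        {
            http.CookieContainer = cookies;
        }
        return request;
    }
}
```

With such a class, calling DownloadString for the page and then DownloadFile for the link on the same instance would send back any session cookie the first response set, which avoids copying cookies into headers by hand.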