Extract plain HTML from a website
I am trying to fetch the content of a website with the following code:
HttpClient httpClient = new HttpClient();
string htmlresult = "";
var response = await httpClient.GetAsync(url);
if (response.IsSuccessStatusCode)
{
htmlresult = await response.Content.ReadAsStringAsync();
}
return htmlresult;
It gives me the correct HTML for every site except https://www.yahoo.com, which returns what looks like an encrypted string instead of plain HTML, as shown below.
‹ Ľç–ãF¶.øÿ<»Ž4Kj“ð¦ÔÒ½÷ž·îÊO0$ Úž~÷ 4@D™U:ëNgK"bÛÄïÿõr¯4^ô
How can I get plain HTML out of this encrypted-looking text?
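Yahoo works with the encodings gzip, deflate, br (the values you would see in an Accept-Encoding header), so the content in your case is gzipped, not encrypted. A quick fix for your code is to enable automatic decompression: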
// Requires: using System.Net; using System.Net.Http; using System.Threading.Tasks;
private async Task<string> GetUrl(string url)
{
    // Ask the handler to decompress gzip/deflate response bodies transparently.
    HttpClientHandler handler = new HttpClientHandler()
    {
        AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
    };
    HttpClient httpClient = new HttpClient(handler);
    string htmlresult = "";
    var response = await httpClient.GetAsync(url);
    if (response.IsSuccessStatusCode)
    {
        // The body arrives here already decompressed.
        htmlresult = await response.Content.ReadAsStringAsync();
    }
    return htmlresult;
}
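If you cannot change the handler, you could also decompress the body yourself. Below is a minimal sketch of that idea, not a definitive implementation. The class name GzipText and its methods are hypothetical helpers I made up for illustration, and it assumes the page is UTF-8 encoded. It checks for the gzip magic bytes 0x1F 0x8B (the 0x8B byte is what shows up as the leading ‹ in your output) and inflates the data with GZipStream. Compress is only there so the round trip can be tried without a network call.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper class, not part of any library.
public static class GzipText
{
    // Builds gzip bytes from a string; used only to produce sample input.
    public static byte[] Compress(string text)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                byte[] raw = Encoding.UTF8.GetBytes(text);
                gzip.Write(raw, 0, raw.Length);
            }
            // ToArray remains valid after the stream has been closed.
            return output.ToArray();
        }
    }

    // A gzip stream always starts with the two bytes 0x1F 0x8B.
    public static bool LooksGzipped(byte[] data)
    {
        return data != null && data.Length >= 2 && data[0] == 0x1F && data[1] == 0x8B;
    }

    // Inflates gzip bytes and decodes them as UTF-8 text (encoding is an assumption).
    public static string Decompress(byte[] data)
    {
        using (var input = new MemoryStream(data))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

With HttpClient you would read the raw body via response.Content.ReadAsByteArrayAsync(), and call GzipText.Decompress only when GzipText.LooksGzipped returns true. Enabling AutomaticDecompression is still the simpler option in most cases.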