WebClient.DownloadString() result has encoding issues with some websites (Persian/Farsi language)

I want to open a website and read its HTML source, so I wrote this code:

WebClient client = new WebClient();
string htmlCode = client.DownloadString("http://www.varzesh3.com");

But I get garbage data. I also added this code, and it still does not work:

client.Encoding = Encoding.UTF8;
client.Headers.Add("charset", "utf-8");

I also tried downloading the raw bytes and decoding them with each of these encodings, but none of them worked:

byte[] raw = client.DownloadData("http://www.varzesh3.com");

string webData1 = Encoding.ASCII.GetString(raw);
string webData2 = Encoding.BigEndianUnicode.GetString(raw);
string webData3 = Encoding.Unicode.GetString(raw);
string webData4 = Encoding.UTF32.GetString(raw);
string webData5 = Encoding.UTF7.GetString(raw);
string webData6 = Encoding.UTF8.GetString(raw);

Note: I can open and read any other Persian (Farsi) website, but I cannot read www.varzesh3.com. Can you help me?

The response from that site is compressed, so you need to decompress it first. More info here. Now, by using the custom MyWebClient, you will have:

using (var client = new MyWebClient { Encoding = Encoding.UTF8 })
{
    var test = client.DownloadString("http://www.varzesh3.com/");
}
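
The MyWebClient class itself is not shown in this post. A minimal sketch of what such a class might look like, assuming it simply overrides GetWebRequest to turn on automatic decompression (this body is a reconstruction for illustration, not necessarily the original author's code):

```csharp
using System;
using System.Net;

// Hypothetical reconstruction: the post does not include MyWebClient's source.
public class MyWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);

        // Only HTTP(S) requests expose AutomaticDecompression.
        if (request is HttpWebRequest http)
        {
            // Sends Accept-Encoding and transparently inflates gzip/deflate bodies,
            // so DownloadString receives plain bytes to decode.
            http.AutomaticDecompression =
                DecompressionMethods.GZip | DecompressionMethods.Deflate;
        }

        return request;
    }
}
```

With decompression handled at the request level, the Encoding = Encoding.UTF8 setting is applied to the actual page bytes rather than to the compressed stream.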

This happens because the site sends its output gzip-compressed. You need to decompress it:

using (var hc = new HttpClient())
using (var stream = await hc.GetStreamAsync(@"http://www.varzesh3.com/"))
using (var gzstream = new GZipStream(stream, CompressionMode.Decompress))
using (var reader = new StreamReader(gzstream))
{
    var text = await reader.ReadToEndAsync();
    // do what you want with text
}
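
This also explains why none of the Encoding.*.GetString(raw) attempts in the question could work: the bytes were still compressed. As a rough sketch, the byte array returned by DownloadData could be checked for the gzip signature and inflated before decoding. The GzipText class and its method names are hypothetical helpers, and UTF-8 is assumed to be the page's charset:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper, not part of the .NET framework.
public static class GzipText
{
    // gzip streams begin with the magic bytes 0x1F 0x8B (RFC 1952).
    public static bool LooksGzipped(byte[] data) =>
        data != null && data.Length >= 2 && data[0] == 0x1F && data[1] == 0x8B;

    // Inflates the payload if it is gzipped, then decodes it as UTF-8
    // (assumption: the page is served as UTF-8).
    public static string DecodeUtf8(byte[] raw)
    {
        if (!LooksGzipped(raw))
            return Encoding.UTF8.GetString(raw);

        using (var input = new MemoryStream(raw))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Usage with the question's code would be along the lines of `string html = GzipText.DecodeUtf8(client.DownloadData("http://www.varzesh3.com"));`. Unlike wrapping the response in a GZipStream unconditionally, this falls back to plain decoding when the server does not compress.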