C# Get site source code with letters other than English

Question

I'm trying to get the source code of a site in C# using:

WebClient client = new WebClient();
string content = client.DownloadString(url);

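and everything works. However, the source contains Hebrew characters, and they show up as gibberish in the content variable. What do I need to do so that they are recognized?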

Answer 1

WebClient client = new WebClient();
client.Encoding = System.Text.Encoding.UTF8; // added
string content = client.DownloadString(url);

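You have to specify the encoding. By default WebClient decodes with Encoding.Default, which on .NET Framework is the system's ANSI code page rather than UTF8, while the content is probably UTF8. The example above sets the encoding to UTF8. If you're not sure what the page uses, check the source manually first and then specify the encoding accordingly. For details, see the Remarks section in the documentation.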
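If checking the source by hand is not practical, one option is to read the charset the server declares in its Content-Type header and decode the raw bytes with it. The following is only a minimal sketch: PageDownloader, PickEncoding and DownloadPage are hypothetical names, the UTF-8 fallback is an assumption, and the sketch ignores any charset declared only in an HTML meta tag.

```csharp
using System;
using System.Net;
using System.Net.Mime;
using System.Text;

public static class PageDownloader
{
    // Hypothetical helper: picks an encoding from a Content-Type header value.
    // Falls back to UTF-8 (an assumption) when no usable charset is declared.
    public static Encoding PickEncoding(string contentTypeHeader)
    {
        if (string.IsNullOrWhiteSpace(contentTypeHeader))
            return Encoding.UTF8;

        try
        {
            string charset = new ContentType(contentTypeHeader).CharSet;
            return string.IsNullOrEmpty(charset)
                ? Encoding.UTF8
                : Encoding.GetEncoding(charset);
        }
        catch (FormatException)
        {
            return Encoding.UTF8; // malformed header value
        }
        catch (ArgumentException)
        {
            return Encoding.UTF8; // charset name not known on this platform
        }
    }

    // Downloads the raw bytes and decodes them with the declared charset.
    public static string DownloadPage(string url)
    {
        using (var client = new WebClient())
        {
            byte[] raw = client.DownloadData(url);
            string header = client.ResponseHeaders[HttpResponseHeader.ContentType];
            return PickEncoding(header).GetString(raw);
        }
    }
}
```

Calling PageDownloader.DownloadPage(url) would then take the place of client.DownloadString(url). Legacy code pages such as windows-1255 may need an extra encoding provider registered on newer .NET runtimes; without it the sketch silently falls back to UTF-8.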

Answer 2

The problem is the encoding used by your WebClient. MSDN says:

... the method uses the encoding specified in the Encoding property to convert the resource to a String.

Solution: set a specific encoding, such as

client.Encoding = Encoding.UTF8;

and try again:

string content = client.DownloadString(url);

UTF8 should be able to encode Hebrew characters as well.
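To see why the choice of encoding matters, here is a small offline sketch that decodes the same bytes twice. It assumes the page body is UTF-8, and EncodingDemo and Decode are hypothetical names used only for illustration.

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Decodes the given bytes with the given encoding.
    public static string Decode(byte[] bytes, Encoding encoding)
    {
        return encoding.GetString(bytes);
    }

    public static void Main()
    {
        // The Hebrew word "shalom", written with escapes so the source file's
        // own encoding does not matter. It stands in for a downloaded page body.
        string original = "\u05E9\u05DC\u05D5\u05DD";
        byte[] body = Encoding.UTF8.GetBytes(original);

        string wrong = Decode(body, Encoding.ASCII); // bytes above 127 cannot be mapped
        string right = Decode(body, Encoding.UTF8);  // recovers the original text

        Console.WriteLine(wrong == original);
        Console.WriteLine(right == original);
    }
}
```

The bytes are identical in both calls. Only the encoding used to interpret them differs, which is the same thing that client.Encoding controls for DownloadString.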
