C#: WebClient - Cyrillic characters are not recognized
I am trying to parse the site: link
The code that downloads the content:
WebClient client = new WebClient();
client.Encoding = System.Text.Encoding.ASCII; // OR UTF8
string reply = client.DownloadString(url);
The response:
<!DOCTYPE HTML>
<html prefix="og: http://ogp.me/ns#">
<head><meta http-equiv="Content-Type" content="text/html; charset=windows-1251">
<link rel="icon" type="image/vnd.microsoft.icon" href="https://spravnik.com/favicon.ico"/>
<link rel="SHORTCUT ICON" href="https://spravnik.com/favicon.ico"/>
<link href="/src/main.css?v=1.25" rel="stylesheet" type="text/css" />
<script src="https://cdn.contentsitesrv.com/js/push/subscribe.js?v=1.3.0"></script>
<title>??????????? 12 ??????? ??. - ?????????? ?????????? ??????</title>
<meta name="keywords" content="?????????? ?????????? ????????????, ???? 09 ????????????, ?????????? ????? ????????????"/>
<meta name="description" content="? ??????????? ☎ ????????? ?? ??????? ????? ??? ???? ???????????? ?? 12 ??????? ??. ????? ???????? ?? ?????? ????????, ?????? ???????? ????? ???????? ? ????? ?? ?????? ????????."/>
<meta property="og:title" content="?????????? ??????????. ??????????? ? ?? ??????...!"/>
All Cyrillic characters are converted to "???" (with ASCII) or to "����" (with UTF8).
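It looks like the site simply ignores your client encoding and returns its data encoded as windows-1251. I would prefer to use RestClient and check the response ContentType. But if you are absolutely sure about this site's encoding, the code below works.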
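using System.Net;
using System.Text;

// On .NET Core / .NET 5+, windows-1251 is only available after registering the code pages provider.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding encoding1251 = Encoding.GetEncoding("windows-1251");
WebClient client = new WebClient();
// Download the raw bytes so that WebClient does not decode them with the wrong encoding.
byte[] reply = client.DownloadData(url);
// Decode the bytes directly as windows-1251; no intermediate conversion to UTF-8 bytes is needed.
string result = encoding1251.GetString(reply);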
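If you would rather not hard-code the encoding, here is a rough sketch of the "check the response ContentType" idea. It assumes the server actually states the charset in its Content-Type header (this page only shows it in the meta tag, so I cannot confirm that). `DecodeBody` and its parameters are names I made up for illustration; they are not part of WebClient.

```csharp
using System;
using System.Net.Http.Headers;
using System.Text;

// Code-page encodings such as windows-1251 need this provider on .NET Core / .NET 5+.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// In-memory demo: bytes as a windows-1251 server would send them.
byte[] sample = Encoding.GetEncoding("windows-1251").GetBytes("Справочник");
Console.WriteLine(DecodeBody(sample, "text/html; charset=windows-1251", "utf-8"));

// Hypothetical helper: decode a response body with the charset named in its
// Content-Type header, or with fallbackCharset when the header names none.
static string DecodeBody(byte[] body, string contentType, string fallbackCharset)
{
    string charset = fallbackCharset;
    if (!string.IsNullOrEmpty(contentType) &&
        MediaTypeHeaderValue.TryParse(contentType, out var parsed) &&
        !string.IsNullOrEmpty(parsed.CharSet))
    {
        // Some servers quote the value, e.g. charset="windows-1251".
        charset = parsed.CharSet.Trim('"');
    }
    // Throws ArgumentException if the charset name is unknown.
    return Encoding.GetEncoding(charset).GetString(body);
}
```

With WebClient you would pass `client.DownloadData(url)` as the body and `client.ResponseHeaders["Content-Type"]` as the header value, with "windows-1251" as the fallback for this particular site.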