Encoding error when parsing the response from a REST request

Context: I use Selenium to parse the page at http://bizhub.vn/tech/start-up-cvn-loyalty-receives-vnd11-billion-investment_318377.html, extract the body of the news post, and then send it to the API at meaningcloud.com (text summarization service).

I am using .NET 5.0.100-preview.8.20417.9 with ASP.NET Core Web API 5, and I have the following.

File SysUtil.cs:

using System;

namespace bizhubdotvn_tech.Controllers
{
    public class SysUtil
    {
        public static String StringEncodingConvert(String strText, String strSrcEncoding, String strDestEncoding)
        {
            System.Text.Encoding srcEnc = System.Text.Encoding.GetEncoding(strSrcEncoding);
            System.Text.Encoding destEnc = System.Text.Encoding.GetEncoding(strDestEncoding);
            byte[] bData = srcEnc.GetBytes(strText);
            byte[] bResult = System.Text.Encoding.Convert(srcEnc, destEnc, bData);
            return destEnc.GetString(bResult);
        }
    }
}

The method that calls the summarization API:

        public static string SummaryText(string newsContent)
        {
            var client = new RestClient("https://api.meaningcloud.com/summarization-1.0");
            client.Timeout = -1;
            var request = new RestRequest(Method.POST);
            request.AddParameter("key", "25870359b682ec3c93f9becd850eb442");
            request.AddParameter("txt", JsonEncodedText.Encode(newsContent));
            request.AddParameter("sentences", 4);
            IRestResponse response = client.Execute(request);
            var mm = JObject.Parse(response.Content);
            string raw_string = (string)mm["summary"];
            // FIXME: this produces strange characters.
            string foo2 = SysUtil.StringEncodingConvert(raw_string, "Windows-1251", "UTF-8");
            Console.WriteLine("summary4 = " + foo2);
            return foo2;
        }

response.Content =

"{\"status\":{\"code\":\"0\",\"msg\":\"OK\",\"credits\":\"1\",\"remaining_credits\":\"19756\"},\"summary\":\"NextPay Joint Stock Company on Monday announced it had invested VND11 billion (US3,000) in CNV Loyalty.Established at the end of 2017, CNV Loyalty creates customer care applications for businesses.Nguyen Tuan Phu, founder cum CEO of CNV Loyalty said: \\u201CWith only the cost of VND50 - 150 million, significantly lower than building a customer care system in a traditional way, Loyalty designs a customised application for the business with its own brand to interact directly with customers. In particular, the rate of accessing customers accurately up to 95 per cent.\\u201DThe start-up has received an investment of VND11 billion from NextPay and Next100 - investment fund from Nguyen Hoa Binh after two months of an appraisal.Nguyen Huu Tuat, CEO of NextPay said: \\u201CWe are living in a digital economy whose main form is the app economy. [...] CNV Loyalty is a solution to help brands always present directly, interact directly, understand their customers directly at a zero cost. [...] The investment of NextPay will help CNV Loyalty to invest more deeply in technology and products.With the market heavily influenced by COVID-19, Vietnamese start-ups still developing and receiving investment from domestic investors without being dependent on venture capital abroad is a great encouragement to Viet Nam\\u0027s start-up community.\"}"

foo2 =

"NextPay Joint Stock Company on Monday announced it had invested VND11 billion (US3,000) in CNV Loyalty.Established at the end of 2017, CNV Loyalty creates customer care applications for businesses.Nguyen Tuan Phu, founder cum CEO of CNV Loyalty said: \u201CWith only the cost of VND50 - 150 million, significantly lower than building a customer care system in a traditional way, Loyalty designs a customised application for the business with its own brand to interact directly with customers. In particular, the rate of accessing customers accurately up to 95 per cent.\u201DThe start-up has received an investment of VND11 billion from NextPay and Next100 - investment fund from Nguyen Hoa Binh after two months of an appraisal.Nguyen Huu Tuat, CEO of NextPay said: \u201CWe are living in a digital economy whose main form is the app economy. [...] CNV Loyalty is a solution to help brands always present directly, interact directly, understand their customers directly at a zero cost. [...] The investment of NextPay will help CNV Loyalty to invest more deeply in technology and products.With the market heavily influenced by COVID-19, Vietnamese start-ups still developing and receiving investment from domestic investors without being dependent on venture capital abroad is a great encouragement to Viet Nam\u0027s start-up community."

How do I fix the problem at \u201C (for this whole class of characters in general, not just one specific character)?

I believe you want to replace special characters such as \u201c.

Nick van Esch posted an answer to much the same question in this thread, which may help you.
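
If the string you end up with really does contain literal \uXXXX sequences (a backslash, a u, and four hex digits), one way to handle all of them at once, rather than one character at a time, is a regex replace. This is only a sketch: `UnicodeEscapes` and `Decode` are names made up for illustration, and it assumes every such sequence in the text is meant as an escape.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnicodeEscapes
{
    // Matches a literal backslash, 'u', and exactly four hex digits, e.g. \u201C.
    private static readonly Regex EscapePattern =
        new Regex(@"\\u([0-9A-Fa-f]{4})", RegexOptions.Compiled);

    // Hypothetical helper: replaces every literal \uXXXX sequence
    // with the UTF-16 code unit it encodes.
    public static string Decode(string text)
    {
        return EscapePattern.Replace(
            text,
            m => ((char)Convert.ToInt32(m.Groups[1].Value, 16)).ToString());
    }
}
```

For example, `UnicodeEscapes.Decode(@"said: \u201CWe are living")` returns the text with a real left double quotation mark in place of the escape. Text without any such sequences comes back unchanged.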

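So this is probably caused by the post-processing you are doing on the returned object. You shouldn't be doing any of it. It could also be the RestClient you are using. There is no reason to use RestClient; you can do anything you need over HTTP with HttpClient.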
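
To illustrate why the extra processing is a likely culprit: the `\u0027` for the apostrophe and the upper-case hex digits in your output match what System.Text.Json's default encoder produces, so the `JsonEncodedText.Encode(newsContent)` call may be what puts those sequences into the text before it is ever sent. A small sketch (the class and method names are hypothetical, and this is a reading of the symptoms, not something confirmed against the API):

```csharp
using System;
using System.Text.Json;

public static class EscapeDemo
{
    // Mirrors what the question's code passes as the "txt" parameter:
    // a JSON-escaped copy of the text, not the text itself.
    public static string AsSentInQuestion(string text)
    {
        return JsonEncodedText.Encode(text).ToString();
    }

    // Parsing a JSON document already turns genuine \uXXXX escapes
    // back into characters, so no re-encoding step is needed afterwards.
    public static string ReadSummary(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        return doc.RootElement.GetProperty("summary").GetString();
    }
}
```

If that is what happens, passing `newsContent` as-is and reading `summary` from the parsed JSON should be enough; no encoding conversion is needed.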

I went and signed up for a key and tried it; it works fine, with no need to "re-encode" the data. Here is my implementation:

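using System;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using Newtonsoft.Json;

// The members below go inside your own class.
private static readonly HttpClient _httpClient = new HttpClient();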

private sealed class MeaningResponseModel
{
    [JsonProperty("summary")]
    public string Summary { get; set; }
}

private static async Task<MeaningResponseModel> GetMeaningfulDataAsync(
    string key, int sentences, Uri uri)
{
    var queryString = $"key={key}&sentences={sentences}" +
        $"&url={WebUtility.UrlEncode(uri.ToString())}";

    using (var req = new HttpRequestMessage(HttpMethod.Post,
        new UriBuilder("https://api.meaningcloud.com/summarization-1.0")
    {
        Query = queryString
    }.Uri))
    {
        using (var res = await _httpClient.SendAsync(req))
        {
            res.EnsureSuccessStatusCode();
            using(var s = await res.Content.ReadAsStreamAsync())
            using(var sr = new StreamReader(s))
            using(var jtr = new JsonTextReader(sr))
            {
                return new JsonSerializer().Deserialize<MeaningResponseModel>(jtr);
            }
        }
    }
}

private async Task TestThis()
{
    var test = await GetMeaningfulDataAsync(
        "YOUR KEY HERE",
        20,
        new Uri("http://bizhub.vn/tech/start-up-cvn-loyalty-receives-vnd11-billion-investment_318377.html"));

    Console.WriteLine(test.Summary);
}

Output:

Based on the answer above:

private static readonly HttpClient _httpClient = new HttpClient();

private sealed class MeaningResponseModel
{
    [JsonProperty("summary")]
    public string Summary { get; set; }
}

private static async Task<MeaningResponseModel> GetMeaningfulDataAsync(string key, int sentences, string content)
{
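    // URL-encode the text so characters such as '&', '#', or '+' do not break the query string.
    var queryString = $"key={key}&sentences={sentences}&txt={System.Net.WebUtility.UrlEncode(content)}";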
    using (var req = new HttpRequestMessage(HttpMethod.Post, new UriBuilder("https://api.meaningcloud.com/summarization-1.0") { Query = queryString }.Uri))
    {
        using (var res = await _httpClient.SendAsync(req))
        {
            res.EnsureSuccessStatusCode();
            using (var s = await res.Content.ReadAsStreamAsync())
            using (var sr = new StreamReader(s))
            using (var jtr = new JsonTextReader(sr))
            {
                return new Newtonsoft.Json.JsonSerializer().Deserialize<MeaningResponseModel>(jtr);
            }
        }
    }
}

Usage:

// Get a 4-sentence summary.
var temp = await GetMeaningfulDataAsync("25870359b682ec3c93f9becd850eb442", 4, contentAfterTrim);
string summary4 = temp.Summary;
summary4 = summary4.Replace("[...] ", "");
news.Summary4 = summary4;
Console.WriteLine("summary4 = " + summary4);
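
One caveat with putting the whole article into the query string is that long texts can run into URL length limits. A possible alternative is to send the parameters in the POST body, which is also what RestSharp's `AddParameter` does for a POST request as far as I know. The sketch below assumes the API accepts form-encoded body parameters; `SummarizationRequest` and `Build` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Net.Http;

public static class SummarizationRequest
{
    // Hypothetical helper: builds the POST request with key, sentences and txt
    // in the body instead of the query string.
    public static HttpRequestMessage Build(string key, int sentences, string content)
    {
        var form = new Dictionary<string, string>
        {
            ["key"] = key,
            ["sentences"] = sentences.ToString(),
            ["txt"] = content,
        };

        return new HttpRequestMessage(
            HttpMethod.Post, "https://api.meaningcloud.com/summarization-1.0")
        {
            // FormUrlEncodedContent percent-encodes every name and value,
            // so the text needs no manual escaping.
            Content = new FormUrlEncodedContent(form),
        };
    }
}
```

The returned message can be passed to `_httpClient.SendAsync(...)` in place of the `HttpRequestMessage` built from the `UriBuilder`; the response handling stays the same.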