Character encoding looks different from what I get when I download the file manually

I am trying to use the following Google Translate API endpoint to translate text in my application: https://clients5.google.com/translate_a/t?client=dict-chrome-ex&sl=auto&tl=en&q=контрольная%20работа

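When I click the link, it downloads a text file which, when opened, contains all the information I need and appears to be correctly formatted (sentences[0].trans = "test", exactly as if I had typed the word "test" by hand).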

However, when I request the file with WWW in C#, with requests.get in Python, or through Postman, I get the following string in "trans" instead: "ÐºÐ¾Ð½Ñ‚Ñ € оР»ÑŒÐ½Ð ° Ñ Ñ € Ð ° Ð ± отР°".

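I have tried converting it to a bunch of different encodings, but none of them gives the correct value. I have also noticed that the English parts of the full response (the JSON keys, for example) are correct, but the translation, which should be in English, is displayed wrongly, and the Russian parts that show the original text are displayed wrongly as well.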

No matter which encoding I try in C# (UTF-7, UTF-8, UTF-16, UTF-16 BE), the text I get never converts back to "test".

Am I missing something here?

The code that makes the request, the result of downloading the file manually, and the result of running the code are shown below:

Code:

import json
import requests

text = "контрольная работа"
lang = "en"
url = f"https://clients5.google.com/translate_a/t?client=dict-chrome-ex&sl=auto&tl={lang}&q={text}"

url = url.replace(" ", "%20")

res = requests.get(url)

res = res.text

jres = json.loads(res)
translation = jres["sentences"][0]["trans"]
print(res, end="\n\n")
print("\t", translation)

Manual download (clicking the link in Chrome downloads the file):

{
  "sentences": [
    {
      "trans": "test",
      "orig": "контрольная работа",
      "backend": 10
    },
    {
      "src_translit": "kontrol'naya rabota"
    }
  ],
  "dict": [
    {
      "pos": "noun",
      "terms": [
        "test"
      ],
      "entry": [
        {
          "word": "test",
          "reverse_translation": [
            "тест",
            "испытание",
            "анализ",
            "проверка",
            "критерий",
            "контрольная работа"
          ],
          "score": 0.18498141
        }
      ],
      "base_form": "контрольная работа",
      "pos_enum": 1
    }
  ],
  "src": "ru",
  "alternative_translations": [
    {
      "src_phrase": "контрольная работа",
      "alternative": [
        {
          "word_postproc": "test",
          "score": 1000,
          "has_preceding_space": true,
          "attach_to_next_token": false,
          "backends": [
            10
          ]
        },
        {
          "word_postproc": "test work",
          "score": 0,
          "has_preceding_space": true,
          "attach_to_next_token": false,
          "backends": [
            3
          ]
        }
      ],
      "srcunicodeoffsets": [
        {
          "begin": 0,
          "end": 18
        }
      ],
      "raw_src_segment": "контрольная работа",
      "start_pos": 0,
      "end_pos": 0
    }
  ],
  "confidence": 1,
  "ld_result": {
    "srclangs": [
      "ru"
    ],
    "srclangs_confidences": [
      1
    ],
    "extended_srclangs": [
      "ru"
    ]
  },
  "target_inflections": [
    {
      "written_form": "test",
      "features": {
        "number": 2
      }
    },
    {
      "written_form": "tests",
      "features": {
        "number": 1
      }
    }
  ]
}

Requesting the file with WWW in C# (.NET Framework 3.5, in a version of the Unity engine from before WWW was deprecated) or with requests in Python:

{
  "sentences": [
    {
      "trans": "ÐºÐ¾Ð½Ñ‚Ñ € оР»ÑŒÐ½Ð ° Ñ Ñ € Ð ° Ð ± отР°",
      "orig": "ÐºÐ¾Ð½Ñ‚Ñ€Ð¾Ð»ÑŒÐ½Ð°Ñ  работа",
      "backend": 3,
      "translation_engine_debug_info": [
        {
          "model_tracking": {
            "checkpoint_md5": "ef4a126affdcc2d3c84e987e2d0fb6b1",
            "launch_doc": "tea_GermanicB_afdaislbnosvfyyiiw_en_2020q2.md"
          }
        }
      ]
    }
  ],
  "src": "is",
  "alternative_translations": [
    {
      "src_phrase": "ÐºÐ¾Ð½Ñ‚Ñ€Ð¾Ð»ÑŒÐ½Ð°Ñ  работа",
      "alternative": [
        {
          "word_postproc": "ÐºÐ¾Ð½Ñ‚Ñ € оР»ÑŒÐ½Ð ° Ñ Ñ € Ð ° Ð ± отР°",
          "score": 0,
          "has_preceding_space": true,
          "attach_to_next_token": false,
          "backends": [
            3
          ]
        },
        {
          "word_postproc": "ÐºÐ¾Ð½Ñ‚Ñ € оР»ÑŒÐ½Ð ° Ñ Ñ € Ð ° Ð °",
          "score": 0,
          "has_preceding_space": true,
          "attach_to_next_token": false,
          "backends": [
            8
          ]
        }
      ],
      "srcunicodeoffsets": [
        {
          "begin": 0,
          "end": 35
        }
      ],
      "raw_src_segment": "ÐºÐ¾Ð½Ñ‚Ñ€Ð¾Ð»ÑŒÐ½Ð°Ñ  работа",
      "start_pos": 0,
      "end_pos": 0
    }
  ],
  "confidence": 1,
  "ld_result": {
    "srclangs": [
      "is"
    ],
    "srclangs_confidences": [
      1
    ],
    "extended_srclangs": [
      "is"
    ]
  }
}
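
One detail in the response above may help narrow this down: srcunicodeoffsets reports an end of 35, whereas the manually downloaded file reports 18. The phrase is 18 characters long but 35 bytes in UTF-8, which would be consistent with the query bytes having been read one byte per character before the text was translated. A minimal sketch of that effect, assuming Latin-1 as the single-byte encoding (that choice is an assumption; the output above only suggests that some single-byte reading took place):

```python
text = "контрольная работа"

# UTF-8 uses two bytes per Cyrillic letter and one byte for the space.
utf8_bytes = text.encode("utf-8")

# Reading those bytes one byte per character (Latin-1 here, as an assumption)
# produces the "Ðº..." pattern seen in the response.
garbled = utf8_bytes.decode("latin-1")

print(len(text), len(utf8_bytes), len(garbled))
print(garbled[:4])

# Latin-1 maps every byte to exactly one character, so a string that was
# garbled only in this way can be turned back into the original.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == text)
```

If that reading is right, re-decoding on the client could at best recover "orig". The "trans" value would stay wrong, because the translation itself seems to have been run on the garbled input (the response even reports "src" as "is").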

Since it works directly in Chrome, I added a Chrome user-agent header, and then it works correctly:

import json
import requests

url = 'https://clients5.google.com/translate_a/t'
# Passing the query as params lets requests percent-encode the Cyrillic text.
params = {'client': 'dict-chrome-ex',
          'sl': 'auto',
          'tl': 'en',
          'q': 'контрольная работа'}
# Present the request as coming from Chrome.
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36'}

r = requests.get(url, params=params, headers=headers)

jres = r.json()
# ensure_ascii=False prints Cyrillic as-is instead of \uXXXX escapes.
print(json.dumps(jres, indent=2, ensure_ascii=False))

Output:

{
  "sentences": [
    {
      "trans": "test",
      "orig": "контрольная работа",
      "backend": 10
    },
    {
      "src_translit": "kontrol'naya rabota"
    }
  ],
  "dict": [
    {
      "pos": "noun",
      "terms": [
        "test"
      ],
      "entry": [
        {
          "word": "test",
          "reverse_translation": [
            "тест",
            "испытание",
            "анализ",
            "проверка",
            "критерий",
            "контрольная работа"
          ],
          "score": 0.18498141
        }
      ],
      "base_form": "контрольная работа",
      "pos_enum": 1
    }
  ],
  "src": "ru",
  "alternative_translations": [
    {
      "src_phrase": "контрольная работа",
      "alternative": [
        {
          "word_postproc": "test",
          "score": 1000,
          "has_preceding_space": true,
          "attach_to_next_token": false,
          "backends": [
            10
          ]
        },
        {
          "word_postproc": "control work",
          "score": 0,
          "has_preceding_space": true,
          "attach_to_next_token": false,
          "backends": [
            3
          ]
        }
      ],
      "srcunicodeoffsets": [
        {
          "begin": 0,
          "end": 18
        }
      ],
      "raw_src_segment": "контрольная работа",
      "start_pos": 0,
      "end_pos": 0
    }
  ],
  "confidence": 1,
  "ld_result": {
    "srclangs": [
      "ru"
    ],
    "srclangs_confidences": [
      1
    ],
    "extended_srclangs": [
      "ru"
    ]
  },
  "target_inflections": [
    {
      "written_form": "test",
      "features": {
        "number": 2
      }
    },
    {
      "written_form": "tests",
      "features": {
        "number": 1
      }
    }
  ]
}
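
To check what is actually sent without contacting the endpoint, the request can be built offline. The sketch below uses only the standard library's urllib instead of requests, with the same URL, parameters, and user-agent string as above; build_request is a hypothetical helper name introduced here for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs
from urllib.request import Request

BASE_URL = "https://clients5.google.com/translate_a/t"
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36"
)


def build_request(text, target="en"):
    """Build, but do not send, the GET request (hypothetical helper)."""
    # urlencode percent-encodes non-ASCII characters as UTF-8 bytes.
    query = urlencode(
        {"client": "dict-chrome-ex", "sl": "auto", "tl": target, "q": text}
    )
    return Request(f"{BASE_URL}?{query}", headers={"User-Agent": USER_AGENT})


req = build_request("контрольная работа")
print(req.full_url)                   # the query string is plain ASCII
print(req.get_header("User-agent"))   # urllib stores header names capitalized
```

Passing req to urllib.request.urlopen would send it. Whether the endpoint still answers as shown above is not something this sketch checks.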