How to test for encoding type in Python 2.7?

Question

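I'm trying to solve a problem I'm having with foreign characters (any and all alphabets). My script (Python 2.7) receives the characters (a mix of English letters and other foreign characters) as unicode JSON and passes them to a database insert function that uses psycopg2 to insert them into some tables. This works perfectly as a script, but not so well once it runs as a service (the inserted foreign characters come out as garbage). This unicoding/encoding/decoding stuff is so confusing! I'm trying to follow this ( https://www.pythoncentral.io/python-unicode-encode-decode-strings-python-2x/ ) in the hope of understanding exactly what I receive and what I then send to the database, but it seems to me that I need to know what the encoding is at each stage. How do you find out what the encoding is? Sorry, this must be simple, but I haven't found how to get that information, and I don't think other people's questions on this have been answered accurately. It can't be that elusive. Please help.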

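Additional information requested...
- Yes, I'd love to move to 3.x, but can't right now.
- For now it's mostly me testing; it isn't open to users yet. I'm testing and developing on a Windows 2012 Server AWS machine, and the service is hosted on a similar machine.
- Yes, but how do you find the locale information?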

Did some testing with the front-end (JS) developer, and he says the JSON input is passed to me URL-encoded... when I check its type, it just shows unicode. Thoughts??

Answer 1

Don't rely on the default system encoding. Instead, always set it yourself:

    import sys

    # read in a byte string (you should know which encoding these bytes are in)
    raw = sys.stdin.read()
    # decode the bytes into a unicode string
    u = raw.decode('ISO-8859-1', errors='replace')
    # do stuff with the string
    # ...
    # always operate on unicode inside your program:
    # make a 'unicode sandwich'.
    # ...
    # encode the unicode string in preparation for writing it out
    out = u.encode('UTF-8')
    # great, now you have bytes you can just write out
    with open('myfile.txt', 'wb') as f:
        f.write(out)

Note that I hardcoded the encodings throughout.
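If you want to see which defaults are in play (and why a script and a service on the same box can behave differently), a rough sketch like the one below prints them. The helper name `encoding_report` is made up for illustration; the calls it uses are from the standard `sys` and `locale` modules. Treat the output as diagnostic information only, not as something to build on.

```python
from __future__ import print_function

import locale
import sys


def encoding_report():
    """Collect the encodings this interpreter falls back to by default."""
    return {
        'default': sys.getdefaultencoding(),
        'filesystem': sys.getfilesystemencoding(),
        'locale_preferred': locale.getpreferredencoding(),
        # stdin/stdout can be missing, or have no encoding, under a service
        'stdin': getattr(sys.stdin, 'encoding', None),
        'stdout': getattr(sys.stdout, 'encoding', None),
    }


if __name__ == '__main__':
    for name, value in sorted(encoding_report().items()):
        print('%-16s %s' % (name, value))
```

Running this once from the console and once from inside the service should show whether the two environments disagree, which is one plausible reason for the garbage characters.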

But what if you don't know the encoding of your input? Well, that's a problem. You need to know that. But I also understand unicode can be painful, and there's this guy from the Python community who tells you how to stop the pain (video).
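A byte string carries no label saying which encoding it is in, so the most you can do from inside Python is (a) check whether you are holding bytes or already-decoded text, and (b) see which candidate encodings decode the bytes without an error. Here is a minimal sketch of both; `describe` and `candidate_decodings` are hypothetical helper names, and the candidate list is only an assumption about what a Windows setup might produce.

```python
text_type = type(u'')  # unicode on Python 2, str on Python 3


def describe(value):
    """Say whether value is encoded bytes, decoded text, or something else."""
    if isinstance(value, bytes):  # on Python 2, bytes is an alias of str
        return 'bytes'
    if isinstance(value, text_type):
        return 'text'
    return 'other'


def candidate_decodings(raw, candidates=('utf-8', 'cp1252', 'iso-8859-1')):
    """Return (encoding, text) pairs for candidates that decode raw cleanly."""
    results = []
    for name in candidates:
        try:
            results.append((name, raw.decode(name)))
        except UnicodeDecodeError:
            pass  # these bytes are not valid in this encoding
    return results


if __name__ == '__main__':
    raw = u'caf\xe9'.encode('utf-8')
    print(describe(raw))
    for name, text in candidate_decodings(raw):
        print(name, repr(text))
```

A clean decode is only a hint, not proof: ISO-8859-1 accepts every possible byte sequence, so it never fails. A strict UTF-8 decode that succeeds on non-ASCII input is a much stronger signal, because arbitrary bytes rarely happen to be valid UTF-8.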

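Note that one of the big changes in Python 3 is better unicode support. Instead of juggling a separate unicode type alongside the confusing py2 str type, in Python 3 the str type is exactly what Python 2's unicode type was, and you can specify the encoding in more convenient places: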

    # Python 3
    with open('myfile.txt', 'w', encoding='UTF-8', errors='ignore') as f:
        f.write(u)  # u is text (str); it is encoded to UTF-8 on write
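Since moving to 3.x isn't an option for the asker yet, it may help to know that `io.open` in Python 2.7 accepts the same `encoding` argument, so the file object does the encoding and decoding for you. A small sketch, assuming a throwaway temp directory and made-up sample text:

```python
import io
import os
import tempfile

text = u'na\xefve caf\xe9'  # sample unicode text, for illustration only
path = os.path.join(tempfile.mkdtemp(), 'myfile.txt')

# text mode with an explicit encoding: write unicode, get UTF-8 bytes on disk
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(text)

# reading back with the same encoding yields unicode again
with io.open(path, 'r', encoding='utf-8') as f:
    round_tripped = f.read()

# binary mode shows the raw bytes that were actually stored
with io.open(path, 'rb') as f:
    raw = f.read()
```

The same code runs unchanged on Python 3, where the builtin `open` is `io.open`, so it should not need rewriting when the migration does happen.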

python

python-2.7

python-unicode