UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 1: ordinal not in range(128)

Question

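So, as the title says, I have a question about encoding/decoding strings.

I am using: Python 2.7 | Django 1.11 | jinja2 2.8

Basically, I retrieve some data from the database, serialize it, store it in the cache, then read it back from the cache, deserialize it and render it in the template.

Problem:

I have first and last names of people whose names contain characters such as "ă". I serialize them with json.dumps.

An example of the dictionaries being serialized (I have 10 of these):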

active_agents = User.region_objects.get_active_agents()
agents_by_commission_last_month = active_agents.values(....
                                                          "first_name", "last_name").order_by(
        '-total_paid_transaction_value_last_month')

Then, when I set the cache, I do this:

for key, value in context.items():
   ......
   value = json.dumps(list(value), default=str, ensure_ascii=False).encode('utf-8')

where value is the list of dictionaries returned by .values() in the code above, and key is region_agents_by_commission_last_month (like the variable in the previous code).
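The serialization step described above can be sketched in isolation. This is a minimal illustration, not the code from the project: the `rows` list is a hypothetical stand-in for the output of `.values()`, and the `u"..."` escapes are used so the snippet behaves the same on Python 2.7 and Python 3.

```python
import json

# Hypothetical sample standing in for list(queryset.values(...)).
# u"C\u0103t\u0103lin" is the text "Cătălin".
rows = [{u"first_name": u"C\u0103t\u0103lin", u"last_name": u"Pintea"}]

# ensure_ascii=False keeps "ă" as a real character instead of \u0103,
# and .encode("utf-8") turns the resulting text into bytes for the cache.
payload = json.dumps(rows, default=str, ensure_ascii=False).encode("utf-8")

# Reading it back: decode the bytes to text, then parse.
# json.loads returns text (unicode) strings by itself.
restored = json.loads(payload.decode("utf-8"))
```

Note that the values in `restored` are already text, so nothing further needs to be encoded before they reach the template.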

Now I have to read the cache back. So I run the same process, but in reverse.

serialized_keys = ['agencies_by_commission_last_month',
                       'region_agents_by_commission_last_month', 'region_agents_by_commission_last_12_months',
                       'region_agents_by_commission_last_30_days',
                       'agencies_by_commission_last_year',
                       'agencies_by_commission_last_12_months',
                       'agencies_by_commission_last_30_days',
                       'region_agents_by_commission_last_year',
                       'agency',
                       'for_agent']
    context = {}

    for key, value in region_ranking_cache.items():
        if key in serialized_keys:
            objects = json.loads(value, object_hook=_decode_dict)
            for serilized_dict in objects:
                ....
                d['full_name'] = d['first_name'] + " " + d['last_name']
                full_name = d['full_name'].decode('utf-8').encode('utf-8')
                d['full_name'] = full_name
                print(d['full_name'])
                ....
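As a side note on the `.decode('utf-8').encode('utf-8')` line above: when the value is already a UTF-8 byte string, that round trip hands back the same bytes, so it does not change what ends up in the dictionary. A small sketch, using a literal byte string as an assumed stand-in for `d['full_name']`:

```python
# Assumed stand-in for d['full_name'] after the object hook has run.
full_name = b"C\xc4\x83t\xc4\x83lin Pintea"

# Decoding to text and encoding straight back yields identical bytes.
round_tripped = full_name.decode("utf-8").encode("utf-8")
same_bytes = round_tripped == full_name
```

This would also explain why `print` looks fine: a UTF-8 terminal presumably interprets the raw bytes correctly, while the value itself is still a byte string.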

The print result is Cătălin Pintea, which is fine. But in the dictionary I render I get: 'full_name': 'C\xc4\x83t\xc4\x83lin Pintea',

where _decode_dict, used as the object_hook, looks like:

def _decode_list(data):
    rv = []
    for item in data:
        if isinstance(item, unicode):
            item = item.encode('utf-8')
        elif isinstance(item, list):
            item = _decode_list(item)
        elif isinstance(item, dict):
            item = _decode_dict(item)
        rv.append(item)
    return rv


def _decode_dict(data):
    rv = {}
    for key, value in data.items():
        if isinstance(key, unicode):
            key = key.encode('utf-8')
        if isinstance(value, unicode):
            value = value.encode('utf-8')
        elif isinstance(value, list):
            value = _decode_list(value)
        elif isinstance(value, dict):
            value = _decode_dict(value)
        rv[key] = value
    return rv

Basically, when I call json.loads, I use this object hook function to encode() all keys and values to utf-8.

This is how I avoided this error being thrown in views.py.
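To make the effect of such a hook visible, here is a reduced sketch comparing `json.loads` with and without it. `encode_values` is a hypothetical, simplified stand-in for `_decode_dict` (it only encodes top-level values), and `text_type` is an assumed helper name so the snippet runs on both Python 2.7 and Python 3.

```python
import json

text_type = type(u"")  # unicode on Python 2, str on Python 3

def encode_values(data):
    """Simplified stand-in for _decode_dict: encode text values to UTF-8 bytes."""
    return dict(
        (key, value.encode("utf-8") if isinstance(value, text_type) else value)
        for key, value in data.items()
    )

raw = u'{"first_name": "C\u0103t\u0103lin", "last_name": "Pintea"}'

with_hook = json.loads(raw, object_hook=encode_values)  # values become bytes
without_hook = json.loads(raw)                          # values stay text
```

With the hook, `first_name` comes back as the bytes `C\xc4\x83t\xc4\x83lin`; without it, the value is already text and could be passed to the template unchanged.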

Error

Somewhere in the template, I am using:

<td>{{ agent.full_name }}</td>

and agent.full_name comes from: 'full_name': 'C\xc4\x83t\xc4\x83lin Pintea',

Traceback

Traceback:

File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/exception.py" in inner
  41.             response = get_response(request)

File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py" in _legacy_get_response
  249.             response = self._get_response(request)

File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py" in _get_response
  187.                 response = self.process_exception_by_middleware(e, request)

File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py" in _get_response
  185.                 response = wrapped_callback(request, *callback_args, **callback_kwargs)

File "/usr/local/lib/python2.7/dist-packages/django/utils/decorators.py" in inner
  185.                     return func(*args, **kwargs)

File "/usr/local/lib/python2.7/dist-packages/django/contrib/auth/decorators.py" in _wrapped_view
  23.                 return view_func(request, *args, **kwargs)

File "/app/crmrebs/utils/__init__.py" in wrapper
  255.             return http_response_class(t.render(output, request))

File "/usr/local/lib/python2.7/dist-packages/django_jinja/backend.py" in render
  106.         return mark_safe(self.template.render(context))

File "/usr/local/lib/python2.7/dist-packages/jinja2/environment.py" in render
  989.         return self.environment.handle_exception(exc_info, True)

File "/usr/local/lib/python2.7/dist-packages/jinja2/environment.py" in handle_exception
  754.         reraise(exc_type, exc_value, tb)

File "/app/crmrebs/jinja2/ranking/dashboard_ranking.html" in top-level template code
  1. {% extends "base.html" %}

File "/app/crmrebs/jinja2/base.html" in top-level template code
  1. {% extends "base_stripped.html" %}

File "/app/crmrebs/jinja2/base_stripped.html" in top-level template code
  94.           {% block content %}

File "/app/crmrebs/jinja2/ranking/dashboard_ranking.html" in block "content"
  83.           {% include "dashboard/region_ranking.html" %}

File "/app/crmrebs/jinja2/dashboard/region_ranking.html" in top-level template code
  41.         {% include "dashboard/_agent_ranking_row_month.html" %}

File "/app/crmrebs/jinja2/dashboard/_agent_ranking_row_month.html" in top-level template code
  2.   <td>{{ agent.full_name }}</td>

Exception Type: UnicodeDecodeError at /ranking
Exception Value: 'ascii' codec can't decode byte 0xc4 in position 1: ordinal not in range(128)
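The exception value can be reproduced outside the template. Jinja2 works with unicode text, so on Python 2 a byte string that reaches {{ ... }} is presumably decoded implicitly with the ASCII codec. The sketch below makes that decode explicit; the byte string is copied from the rendered dictionary shown earlier.

```python
# The byte string that reaches the template.
full_name = b"C\xc4\x83t\xc4\x83lin Pintea"

try:
    # Explicit version of the implicit ASCII decode.
    full_name.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)
    position = exc.start  # index of the first byte that is not ASCII

# Decoding with the codec the bytes were written in works.
decoded = full_name.decode("utf-8")
```

Byte 0xc4 at position 1 is the first half of the two-byte UTF-8 encoding of "ă", which matches the position reported in the traceback.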

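This is where the error comes from. I tried other things, but I guess this is a limitation of Python 2.7. I usually use Python 3.9, but for this project I have to use 2.7. I tried other answers here, but nothing helped.

Can anyone help me serialize this dictionary properly, and tell me how I can avoid this mess?

I hope I made myself clear.

Have a nice day, everyone!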

Answer 1

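So, I managed to solve my problem.

I found that active_agents.values(...."first_name", "last_name").order_by('-total_paid_transaction_value_last_month') returns dictionaries whose keys and values are already unicode (because of how this is configured in models.py, with Django 1.11 and Python 2.7). So the serialization process was fine. The end result going into the template does indeed look like 'C\xc4\x83t\xc4\x83lin'. The error came from the \xc4 byte.
To fix it in the template, I did this: {{ agent.full_name.decode("utf-8") }}, which gave me the correct result: Cătălin Pintea

Thanks @BoarGules. d['last_name'] and d['first_name'] were indeed unicode. So when I did the concatenation, I had to add u" ".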
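An alternative to decoding inside the template is to normalize values to text in the view, so the template only ever sees unicode. This is only a sketch of that idea, not what the answer above does: `to_text` is a hypothetical helper and the dictionary is made-up sample data mixing a byte string and a text string.

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical helper: decode byte strings, pass text through unchanged."""
    if isinstance(value, bytes):  # bytes is an alias of str on Python 2
        return value.decode(encoding)
    return value

# Made-up sample row with one byte-string value and one text value.
d = {"first_name": b"C\xc4\x83t\xc4\x83lin", "last_name": u"Pintea"}

# Join with u" " so the result is text on Python 2 as well.
d["full_name"] = to_text(d["first_name"]) + u" " + to_text(d["last_name"])
```

With the value normalized like this, a plain {{ agent.full_name }} should render without the .decode("utf-8") call in the template.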