Django 模型 __unicode__ 在记录时引发异常

Django Model __unicode__ raising exception when logging

我有一个模型 class,如下所示:

class Address(models.Model):
    # taking length of address/city fields from existing UserProfile model
    address_1 = models.CharField(max_length=128,
                                 blank=False,
                                 null=False)

    address_2 = models.CharField(max_length=128,
                                 blank=True,
                                 null=True)

    address_3 = models.CharField(max_length=128,
                                 blank=True,
                                 null=True)

    unit = models.CharField(max_length=10,
                            blank=True,
                            null=True)

    city = models.CharField(max_length=128,
                            blank=False,
                            null=False)

    state_or_province = models.ForeignKey(StateOrProvince)

    postal_code = models.CharField(max_length=20,
                                   blank=False,
                                   null=False)

    phone = models.CharField(max_length=20,
                             blank=True,
                             null=True)

    is_deleted = models.BooleanField(default=False,
                                     null=False)

    def __unicode__(self):
        return u"{}, {} {}, {}".format(
            self.city, self.state_or_province.postal_abbrev, self.postal_code, self.address_1)

关键是 __unicode__ 方法。我有一个客户模型,它有一个指向此 table 的外键字段,我正在执行以下日志记录:

log.debug(u'Generated customer [{}]'.format(vars(customer)))

这很好用,但是如果 address_1 字段值包含非 ascii 值,比如

57562 Vån Ness Hwy

系统抛出以下异常:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 345: ordinal not in range(128)

我在 django/db/models/base.py:

中找到了一个奇怪的方法
def __repr__(self):
        try:
            u = six.text_type(self)
        except (UnicodeEncodeError, UnicodeDecodeError):
            u = '[Bad Unicode data]'
        return force_str('<%s: %s>' % (self.__class__.__name__, u))

如您所见,此方法被调用到 force_str,但未正确处理。这是一个错误吗?如果在我的对象上调用 unicode,难道不应该所有内容都采用 unicode 格式吗?

尝试 decode 非 utf-8 字符:

def __unicode__(self):
        return u"{}, {} {}, {}".format(
            self.city, self.state_or_province.postal_abbrev, self.postal_code, self.address_1.decode('utf-8'))

根据 docs,当 python 对象作为参数传递给 '{}'.format(obj) 时,

A general convention is that an empty format string ("") [within the "{}"] produces the same result as if you had called str() on the value.

这意味着您实际上是在调用 str(vars(customer))vars(customer) returns dict.

dict 上调用 str() 将在其键和值上调用 repr(),否则你会得到不明确的输出(例如 str(1) == str('1') == '1'repr(1) == '1' and repr('1') == '"1"'(参见 Difference between __str__ and __repr__ in Python

因此 repr() 仍在您的 Address 上调用,其中 returns 是一个字符串。

现在 Python 2 - , so you'll need to either override __str__() in your model to make it handle decoding into ascii (Django docs) 中不允许从 repr() 返回 unicode,或者执行类似的操作:

string_dict = {str(k): str(v) for (k, v) in vars(customer).items()}
log.debug(u'Generated customer [{}]'.format(string_dict))

这与其说是一个漂亮的答案,不如说是一个黑客,但我仍然会把我的两分钱扔到堆里。只需将您正在使用的 "logging.Handler" 子类化,然后更改 'emit' 方法(如果它是导致异常的方法)。

优点

设置非常简单。设置后,无需对任何 model/data.

执行任何操作

缺点

结果是不会有 UnicodeErrors,但日志文件将有 "strange looking strings starting with a backslash" 曾经有 unicode 标记的地方。例如会变成'\xf0\x9f\xa6\x84\'。也许您可以在需要时使用脚本将“\xf0\x9f\xa6\x84\”转换回日志文件中的 unicode。

步骤是

1) 制作一个 "custom_logging.py",您可以将其导入您的 settings.py

from logging import FileHandler

class Utf8FileHandler(FileHandler):
    """
          This is a hack-around version of the logging.Filehandler

        Prevents errors of the type
        UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f984' in position 150: character maps to <undefined>
    """
    def __init__(self, *args, **kwargs):
        FileHandler.__init__(self, *args, **kwargs)

    def emit(self, record):
        """
        Emit a record.

        If a formatter is specified, it is used to format the record.
        The record is then written to the stream with a trailing newline.  If
        exception information is present, it is formatted using
        traceback.print_exception and appended to the stream.  If the stream
        has an 'encoding' attribute, it is used to determine how to do the
        output to the stream.
        """
        try:
            msg = self.format(record)
            stream = self.stream
            stream.write(msg)
            stream.write(self.terminator)
            self.flush()
        except Exception:
            # The hack.
            try:
                stream.write(str(msg.encode('utf-8'))[2:-1])
                stream.write(self.terminator)
                self.flush()
            # End of the hack.
            except Exception:
                self.handleError(record)

2) 在您的 settings.py 中,使用您自定义的文件处理程序,像这样(将 LOGGING['handlers']['file']['class'] 设置为指向到 custom_logging 模块。):

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'verbose': {
            'format': '%(levelname)s %(asctime)s %(module)s %(process)d %(thread)d %(message)s'
        },
    },
    'handlers': {
        'file': {
            'level': 'DEBUG',
            'class': 'config.custom_logging.Utf8FileHandler',
            'filename': secrets['DJANGO_LOG_FILE'],
            'formatter': 'verbose',
        },
    },
    'loggers': {
        'django': {
            'handlers': ['file'],
            'level': 'DEBUG',
            'propagate': True,
        },
    },
}