Django Model __unicode__ raising exception when logging
I have a model class that looks as follows:
class Address(models.Model):
    # taking length of address/city fields from existing UserProfile model
    address_1 = models.CharField(max_length=128,
                                 blank=False,
                                 null=False)
    address_2 = models.CharField(max_length=128,
                                 blank=True,
                                 null=True)
    address_3 = models.CharField(max_length=128,
                                 blank=True,
                                 null=True)
    unit = models.CharField(max_length=10,
                            blank=True,
                            null=True)
    city = models.CharField(max_length=128,
                            blank=False,
                            null=False)
    state_or_province = models.ForeignKey(StateOrProvince)
    postal_code = models.CharField(max_length=20,
                                   blank=False,
                                   null=False)
    phone = models.CharField(max_length=20,
                             blank=True,
                             null=True)
    is_deleted = models.BooleanField(default=False,
                                     null=False)

    def __unicode__(self):
        return u"{}, {} {}, {}".format(
            self.city, self.state_or_province.postal_abbrev, self.postal_code, self.address_1)
The key part is the __unicode__ method. I have a customer model that has a foreign key field to this table, and I am doing the following logging:
log.debug(u'Generated customer [{}]'.format(vars(customer)))
This works fine, but if the address_1 field value contains a non-ASCII value, say
57562 Vån Ness Hwy
the system throws the following exception:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 345: ordinal not in range(128)
I found a strange method in django/db/models/base.py:
def __repr__(self):
    try:
        u = six.text_type(self)
    except (UnicodeEncodeError, UnicodeDecodeError):
        u = '[Bad Unicode data]'
    return force_str('<%s: %s>' % (self.__class__.__name__, u))
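As you can see, this method passes its result to force_str, and that does not get handled correctly. Is this a bug? If unicode is being called on my object, shouldn't everything be in unicode?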
Try decoding the non-ASCII value as UTF-8:
def __unicode__(self):
    return u"{}, {} {}, {}".format(
        self.city, self.state_or_province.postal_abbrev, self.postal_code, self.address_1.decode('utf-8'))
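To see where the 0xc3 byte in the traceback comes from, here is a minimal sketch that reproduces the same failure outside Django. It is written for Python 3 and is only an illustration: the variable names are made up for the example, and the position in the message differs from the one in the question because the string here is shorter.

```python
# -*- coding: utf-8 -*-
# Illustration only: the sample address from the question, with the
# non-ASCII letter written as an escape so the file encoding does not matter.
text = u'57562 V\xe5n Ness Hwy'

# Encoding to UTF-8 turns the single character U+00E5 into two bytes.
raw = text.encode('utf-8')
assert raw[7:9] == b'\xc3\xa5'

# Decoding those bytes with the ASCII codec fails on the first of them,
# which is the kind of error reported in the question.
try:
    raw.decode('ascii')
    failure = None
except UnicodeDecodeError as exc:
    failure = str(exc)

# Decoding with the codec that was used for encoding works.
roundtrip = raw.decode('utf-8')

print(failure)
print(roundtrip == text)
```

The error only goes away when the bytes are decoded with the codec they were actually encoded in, which is why the suggestion above calls .decode('utf-8') explicitly.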
According to the docs, when a Python object is passed as an argument to '{}'.format(obj):
A general convention is that an empty format string ("") [within the "{}"] produces the same result as if you had called str() on the value.
This means you are effectively calling str(vars(customer)), and vars(customer) returns a dict.
Calling str() on a dict calls repr() on its keys and values, because otherwise you would get ambiguous output (e.g. str(1) == str('1') == '1', but repr(1) == '1' and repr('1') == "'1'"; see Difference between __str__ and __repr__ in Python).
So repr() is still being called on your Address, and it returns a string.
Now, returning unicode from repr() is not allowed in Python 2, so you'll need to either override __str__() in your model to make it handle decoding into ascii (Django docs), or do something like:
string_dict = {str(k): str(v) for (k, v) in vars(customer).items()}
log.debug(u'Generated customer [{}]'.format(string_dict))
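The str()/repr() behaviour described in this answer can be checked with a small sketch. The Place class is hypothetical and only stands in for a model instance; it is not part of Django or of the code in the question.

```python
class Place(object):
    """Hypothetical stand-in for a model instance, with distinct str and repr."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name

    def __repr__(self):
        return '<Place: %s>' % self.name


place = Place('Oslo')

# An empty format spec behaves like str() on the argument...
direct = '{}'.format(place)

# ...but str() on a dict uses repr() for the keys and values inside it.
inside_dict = '{}'.format({'address': place})

print(direct)
print(inside_dict)
```

This is why defining __unicode__ (or __str__) on the model does not change what ends up in the log message when the object is formatted as part of vars(customer).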
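This is more of a hack than a pretty answer, but I'll still throw my two cents on the pile. Just subclass the "logging.Handler" you are using and change the 'emit' method (if that is the one causing the exceptions).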
Pros
Very simple to set up. After the setup, nothing needs to be done to any model/data.
Cons
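The result is that there will be no UnicodeErrors, but the log file will have "strange looking strings starting with a backslash" where the unicode characters used to be. For example, the character U+1F984 will turn into '\xf0\x9f\xa6\x84'. Perhaps you could use a script to convert '\xf0\x9f\xa6\x84' in the log file back to unicode when needed.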
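A minimal sketch of such a conversion script, for Python 3. The function name unescape_utf8 is made up for this example, and it only handles runs of \xNN escapes; other escapes that str(bytes) can produce (such as \\ or \n) are left alone, so a literal backslash in the original message may not round-trip.

```python
import re

# One or more consecutive \xNN escapes, as written by str(bytes)[2:-1].
_ESCAPED_BYTES = re.compile(r'(?:\\x[0-9a-fA-F]{2})+')
_HEX_PAIR = re.compile(r'\\x([0-9a-fA-F]{2})')


def unescape_utf8(line):
    """Turn runs of \\xNN escapes in a log line back into UTF-8 text."""

    def _decode(match):
        raw = bytes(int(h, 16) for h in _HEX_PAIR.findall(match.group(0)))
        try:
            return raw.decode('utf-8')
        except UnicodeDecodeError:
            # Not valid UTF-8: keep the escaped form unchanged.
            return match.group(0)

    return _ESCAPED_BYTES.sub(_decode, line)


restored = unescape_utf8(r'DEBUG unicorn \xf0\x9f\xa6\x84 seen')
print(restored == u'DEBUG unicorn \U0001f984 seen')
```

Lines without escapes pass through unchanged, so the function can be applied to every line of a log file.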
The steps are
1) Make a "custom_logging.py", which you can import into your settings.py
from logging import FileHandler


class Utf8FileHandler(FileHandler):
    """
    This is a hack-around version of the logging.Filehandler
    Prevents errors of the type
    UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f984' in position 150: character maps to <undefined>
    """

    def __init__(self, *args, **kwargs):
        FileHandler.__init__(self, *args, **kwargs)

    def emit(self, record):
        """
        Emit a record.
        If a formatter is specified, it is used to format the record.
        The record is then written to the stream with a trailing newline. If
        exception information is present, it is formatted using
        traceback.print_exception and appended to the stream. If the stream
        has an 'encoding' attribute, it is used to determine how to do the
        output to the stream.
        """
        try:
            msg = self.format(record)
            stream = self.stream
            stream.write(msg)
            stream.write(self.terminator)
            self.flush()
        except Exception:
            # The hack.
            try:
                stream.write(str(msg.encode('utf-8'))[2:-1])
                stream.write(self.terminator)
                self.flush()
            # End of the hack.
            except Exception:
                self.handleError(record)
2) In your settings.py, use your custom file handler like this (set LOGGING['handlers']['file']['class'] to point to the custom_logging module):
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'verbose': {
            'format': '%(levelname)s %(asctime)s %(module)s %(process)d %(thread)d %(message)s'
        },
    },
    'handlers': {
        'file': {
            'level': 'DEBUG',
            'class': 'config.custom_logging.Utf8FileHandler',
            'filename': secrets['DJANGO_LOG_FILE'],
            'formatter': 'verbose',
        },
    },
    'loggers': {
        'django': {
            'handlers': ['file'],
            'level': 'DEBUG',
            'propagate': True,
        },
    },
}
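As a possible alternative to the hack, and assuming Python 3 (where the 'charmap' error quoted in the docstring above occurs), the standard logging.FileHandler accepts an encoding argument, so opening the log file as UTF-8 may avoid the error without escaping anything. The sketch below is only an illustration; the temporary file path and the logger name are made up for the example.

```python
import logging
import os
import tempfile

# Illustration only: write to a throwaway file instead of the real log.
log_path = os.path.join(tempfile.mkdtemp(), 'example.log')

# encoding='utf-8' makes the handler open the file as UTF-8 instead of
# relying on the platform default (e.g. a 'charmap' codec on Windows).
handler = logging.FileHandler(log_path, encoding='utf-8')
handler.setFormatter(logging.Formatter('%(levelname)s %(message)s'))

logger = logging.getLogger('encoding_example')  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False
logger.addHandler(handler)

logger.debug(u'Generated customer [57562 V\xe5n Ness Hwy \U0001f984]')

handler.close()
logger.removeHandler(handler)

with open(log_path, encoding='utf-8') as fh:
    written = fh.read()

print(u'\U0001f984' in written)
```

In a dictConfig-style LOGGING setting, extra keys in a handler entry are passed to the handler's constructor, so adding 'encoding': 'utf-8' to a 'logging.FileHandler' entry should have the same effect.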