Celery worker's log contains question marks (???) instead of correct unicode characters

Question

I'm using Celery 3.1.18 with Python 2.7.8 on CentOS 6.5.

In a Celery task module, I have the following code:

# someapp/tasks.py
from celery import shared_task
from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)


@shared_task()
def foo():
    logger.info('Test output: %s', u"测试中")

I run a Celery worker with the init.d script. I have also put the following settings in /etc/default/celeryd:

CELERYD_NODES="bar"

# %N will be replaced with the first part of the nodename.
CELERYD_LOG_FILE="/var/log/celery/%N.log"

# Workers should run as an unprivileged user.
#   You need to create this user manually (or you can choose
#   a user/group combination that already exists, e.g. nobody).
CELERYD_USER="nobody"
CELERYD_GROUP="nobody"

So my log file is located at /var/log/celery/bar.log.

However, once the task is executed by the worker, the log file above shows:

[2015-05-07 03:51:14,438: INFO/Worker-1/someapp.tasks.foo(...)] Test output: ???

The unicode characters are gone, replaced by question marks.

How can I get the unicode characters back in the log file?

Answer 1

You need to set LANG=zh_CN.UTF-8 in the environment in which the celery application is started.

If you are using celeryd, there is a simple way: in /etc/default/celeryd, set

CELERY_BIN="env LANG=zh_CN.UTF-8 /path/to/celery/binary"
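To check whether the setting actually reaches the worker, a small diagnostic along these lines can be run in the same environment (for example, called temporarily from a task and logged). This is only a sketch: `describe_encodings` is a hypothetical helper name, not part of Celery, and it relies only on standard-library calls.

```python
import locale
import os
import sys


def describe_encodings():
    """Collect the locale-related values that influence text encoding."""
    return {
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "filesystem_encoding": sys.getfilesystemencoding(),
        "preferred_encoding": locale.getpreferredencoding(False),
        # Some replaced or redirected streams have no encoding attribute.
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for name, value in sorted(describe_encodings().items()):
        print("%s = %r" % (name, value))
```

If the filesystem encoding still reports an ASCII variant (such as ANSI_X3.4-1968 on Python 2), the LANG value is probably not reaching the worker process.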

Explanation:

Celery formats log messages with ColorFormatter, which is defined in celery.utils.log.
ColorFormatter uses kombu.utils.encoding.safe_str.
kombu.utils.encoding.safe_str encodes unicode to str, using the encoding returned by default_encoding in kombu.utils.encoding.
default_encoding returns getattr(get_default_encoding_file(), 'encoding', None) or sys.getfilesystemencoding()
I also could not find any place where celery sets the encoding explicitly, so I think celery uses sys.getfilesystemencoding() as the encoding when converting unicode to str.
The documentation for sys.getfilesystemencoding says:

On Unix, the encoding is the user’s preference according to the result of nl_langinfo(CODESET), or None if the nl_langinfo(CODESET) failed

So, setting LANG=zh_CN.UTF-8 in the celery process environment tells celery to convert unicode to str using UTF-8.
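The effect described above can be reproduced without Celery. The sketch below is a simplified model, not kombu's actual code: `pick_encoding` and `encode_for_log` are hypothetical names, and the `replace` error handler is an assumption chosen because it matches the `???` seen in the log. It is written to run on Python 3 as well as Python 2.7.

```python
# -*- coding: utf-8 -*-
import sys


def pick_encoding(stream=None):
    """Mimic the fallback quoted above: stream encoding, else filesystem encoding."""
    return getattr(stream, "encoding", None) or sys.getfilesystemencoding()


def encode_for_log(text, encoding):
    """Encode text to bytes, replacing unencodable characters with '?'."""
    return text.encode(encoding, "replace")


message = u"测试中"

# With an ASCII encoding (typical on Python 2 when LANG is unset or "C"),
# each of the three characters is replaced by a question mark.
ascii_bytes = encode_for_log(message, "ascii")

# With UTF-8 the characters survive and decode back to the original text.
utf8_bytes = encode_for_log(message, "utf-8")

print(ascii_bytes == b"???")                  # True
print(utf8_bytes.decode("utf-8") == message)  # True
print(pick_encoding(sys.stdout))              # depends on the environment
```

The first comparison shows why the log line ends in exactly three question marks: one per character that the ASCII codec cannot represent.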
