Unicode error from Haystack indexing
I have a Django system with search powered by haystack and elasticsearch. More specifically, I'm using django-cms and aldryn-search to integrate search with the CMS.
aldryn-search==0.3.0
Django==1.10.7
django-cms==3.4.3
django-haystack==2.6.0
elasticsearch==2.4.1
requests==2.13.0
requests-aws4auth==0.9
The whole haystack indexing process is driven by the third-party app, but I've never run into this before, so I'm hoping to find a fix.
Running haystack's update_index fails with a Unicode error:
  File "/Users/mwalker/Sites/myproj/lib/python2.7/site-packages/elasticsearch/client/__init__.py", line 785, in bulk
    doc_type, '_bulk'), params=params, body=self._bulk_body(body))
  File "/Users/mwalker/Sites/myproj/lib/python2.7/site-packages/elasticsearch/transport.py", line 327, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/Users/mwalker/Sites/myproj/lib/python2.7/site-packages/elasticsearch/connection/http_requests.py", line 68, in perform_request
    response = self.session.request(method, url, data=body, timeout=timeout or self.timeout)
  File "/Users/mwalker/Sites/myproj/lib/python2.7/site-packages/requests/sessions.py", line 488, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/mwalker/Sites/myproj/lib/python2.7/site-packages/requests/sessions.py", line 609, in send
    r = adapter.send(request, **kwargs)
  File "/Users/mwalker/Sites/myproj/lib/python2.7/site-packages/requests/adapters.py", line 423, in send
    timeout=timeout
  File "/Users/mwalker/Sites/myproj/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "/Users/mwalker/Sites/myproj/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py", line 356, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1053, in request
    self._send_request(method, url, body, headers)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1093, in _send_request
    self.endheaders(body)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1049, in endheaders
    self._send_output(message_body)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 891, in _send_output
    msg += message_body
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1815: ordinal not in range(128)
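Byte 0xE2 is outside the ASCII range. In UTF-8 it is the lead byte of a three-byte sequence; for example, the en dash U+2013 is encoded as E2 80 93. The failing line `msg += message_body` in httplib most likely mixes a unicode string (presumably a header or the URL) with the UTF-8 byte-string body, so Python 2 implicitly decodes the body as ASCII. The real bulk body is not shown. The sketch below uses a made-up payload to reproduce the byte-level situation: the data decodes as UTF-8 but not as ASCII. The `decodes_as` helper is hypothetical.

```python
# Made-up payload standing in for the real bulk body: text containing an en dash (U+2013).
text = u"Opening hours \u2013 weekdays"
body = text.encode("utf-8")

# U+2013 encodes to the three bytes E2 80 93, so byte 0xE2 appears in the payload.
offset = body.index(b"\xe2")


def decodes_as(data, codec):
    """Return True if the byte string `data` decodes cleanly with `codec`."""
    try:
        data.decode(codec)
    except UnicodeDecodeError:
        return False
    return True


print(offset, decodes_as(body, "ascii"), decodes_as(body, "utf-8"))
```

The body itself is valid UTF-8. The error therefore comes from something decoding it with the wrong codec, not from corrupt data.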
Haystack settings:
awsauth = AWS4Auth(TBH_AWS_ACCESS_KEY, TBH_AWS_SECRET_KEY, AWS_S3_REGION, 'es')
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch2_backend.Elasticsearch2SearchEngine',
        'URL': 'https://search.eu-west-1.es.amazonaws.com/',
        'INDEX_NAME': 'myproj_dev',
        'TIMEOUT': 30,
        'KWARGS': {
            'port': 443,
            'http_auth': awsauth,
            'use_ssl': True,
            'verify_certs': True,
            'connection_class': elasticsearch.RequestsHttpConnection,
        }
    },
}
Once the process reaches urllib3, it hits this block:
# conn.request() calls httplib.*.request, not the method in
# urllib3.request. It also calls makefile (recv) on the socket.
if chunked:
    conn.request_chunked(method, url, **httplib_request_kw)
else:
    conn.request(method, url, **httplib_request_kw)
At that point I can see the problem in the data:
>>> httplib_request_kw['body'][1815]
'\xe2'
>>> httplib_request_kw['body'].decode('utf-8')[1815]
u'-'
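The two REPL lines index different things. `body[1815]` indexes bytes, while `body.decode('utf-8')[1815]` indexes characters. Every multi-byte character earlier in the body shifts the two offsets apart, so the `u'-'` shown is not necessarily the character that contains byte 0xE2. The sketch below maps a byte offset to the character it starts. The payload and the helper name `char_at_byte_offset` are hypothetical, and the helper assumes the offset falls on a character boundary. That holds here, because 0xE2 is a UTF-8 lead byte.

```python
def char_at_byte_offset(body, offset):
    """Return the decoded character whose UTF-8 encoding starts at byte `offset`.

    Assumes `offset` falls on a character boundary (e.g. it points at a lead byte).
    """
    prefix = body[:offset].decode("utf-8")   # characters that precede the offset
    return body.decode("utf-8")[len(prefix)]


# Hypothetical payload: two en dashes appear before a plain ASCII hyphen.
text = u"a \u2013 b \u2013 c - d"
body = text.encode("utf-8")

byte_pos = body.index(b"-")   # offset of the hyphen in the encoded bytes
char_pos = text.index(u"-")   # offset of the same hyphen in the decoded text
# Each en dash is 3 bytes but 1 character, so the offsets drift apart by 2 per dash.
```

In a debugger, something like `char_at_byte_offset(httplib_request_kw['body'], 1815)` should show the actual offending character.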
Is there some kind of monkey patch or settings tweak I can apply to haystack to make sure the data gets decoded?
In your lib/python2.7/site-packages/haystack/fields.py, in the convert function (around line 209), try adding:
if isinstance(value, str):
    value = value.decode("utf-8")
The full class after the change:
class CharField(SearchField):
    field_type = 'string'

    def __init__(self, **kwargs):
        if kwargs.get('facet_class') is None:
            kwargs['facet_class'] = FacetCharField

        super(CharField, self).__init__(**kwargs)

    def prepare(self, obj):
        return self.convert(super(CharField, self).prepare(obj))

    def convert(self, value):
        if value is None:
            return None

        # Python 2: decode UTF-8 byte strings explicitly instead of
        # letting six.text_type() fall back to the ASCII codec.
        if isinstance(value, str):
            value = value.decode("utf-8")

        return six.text_type(value)
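Any edit made directly in site-packages is overwritten the next time the package is reinstalled. Because the question asks about monkey patching, the same decode step could instead be applied at runtime. The sketch below wraps an existing `convert` method so byte strings are decoded as UTF-8 before the original method runs. `decode_bytes_first` and `patch_convert` are hypothetical helpers, not part of haystack, and the sketch assumes the byte strings really are UTF-8.

```python
import functools

try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str      # Python 3


def decode_bytes_first(convert):
    """Wrap a field's convert() so UTF-8 byte strings are decoded before conversion."""
    @functools.wraps(convert)
    def wrapper(self, value):
        if isinstance(value, bytes):  # `bytes` is the same type as `str` on Python 2
            value = value.decode("utf-8")
        return convert(self, value)
    wrapper._decodes_bytes = True  # marker so the patch is only applied once
    return wrapper


def patch_convert(field_cls):
    """Replace field_cls.convert in place; calling it more than once is harmless."""
    if not getattr(field_cls.convert, "_decodes_bytes", False):
        field_cls.convert = decode_bytes_first(field_cls.convert)
    return field_cls
```

One possible way to apply it is to call `patch_convert(CharField)` (after `from haystack.fields import CharField`) from the `ready()` method of a hypothetical `AppConfig` in one of your own apps. Django runs `ready()` during setup, before a management command such as update_index does its work, so the third-party index code should pick up the patched method. This has not been tested against aldryn-search.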