UnicodeEncodeError when writing the selected file
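I have made several attempts based on other questions that have already been answered, but my code always returns an error.
The only purpose of this code is to tag the sentences of a document and dump to a file the sentences that contain at least N occurrences of a specific POS: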
```python
import os
import nlpnet
import codecs

TAGGER = nlpnet.POSTagger('pos-pt', language='pt')

# You could have a function that tags the text and verifies whether
# it meets the criteria for storage.
def is_worth_saving(text, pos, pos_count):
    # Tagged sentences are lists of tagged words, which in
    # nlpnet are (word, pos) tuples. Tagged texts may contain
    # several sentences.
    pos_words = [word for sentence in TAGGER.tag(text)
                 for word in sentence
                 if word[1] == pos]
    return len(pos_words) >= pos_count

with codecs.open('dataset.txt', encoding='utf8') as original_file:
    with codecs.open('dataset_new.txt', 'w') as output_file:
        for text in original_file:
            # For example, only save sentences with at least 5 verbs in them
            if is_worth_saving(text, 'V', 5):
                output_file.write(text + os.linesep)
```
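You can exercise the filtering criterion without installing nlpnet. The sketch below assumes tagger output in the shape the code's comments describe: a list of sentences, where each sentence is a list of (word, pos) tuples. The sample words and tags are made up, and so are the helper names `count_pos` and `is_worth_saving_tagged`. The comparison is `>=`, so a sentence with exactly N tokens of the POS is kept.

```python
# Hypothetical tagger output in the shape described by the question's comments:
# a list of sentences, each a list of (word, pos) tuples.
# These words and tags are invented for illustration, not real nlpnet output.
TAGGED = [
    [("Ele", "PROPESS"), ("correu", "V"), ("e", "KC"), ("saltou", "V")],
    [("Ela", "PROPESS"), ("disse", "V"), ("que", "KS"), ("viu", "V")],
]

def count_pos(tagged_sentences, pos):
    """Count the tokens tagged with `pos` across all sentences."""
    return sum(1 for sentence in tagged_sentences
               for word, tag in sentence
               if tag == pos)

def is_worth_saving_tagged(tagged_sentences, pos, pos_count):
    """Same criterion as is_worth_saving, applied to already-tagged input."""
    return count_pos(tagged_sentences, pos) >= pos_count

print(count_pos(TAGGED, "V"))                  # 4
print(is_worth_saving_tagged(TAGGED, "V", 5))  # False: 4 < 5
print(is_worth_saving_tagged(TAGGED, "V", 4))  # True: 4 >= 4
```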
I get this error:

```
Traceback (most recent call last):
  File "D:/Word Sorter/Classifier.py", line 31, in <module>
    output_file.write(text + os.linesep)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 161-162: ordinal not in range(128)
```
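The traceback reflects Python 2 behavior. When `codecs.open` is called without an encoding, it returns a plain file object. Writing a unicode string to that object makes Python encode it implicitly with the default ASCII codec, which fails on any non-ASCII character. The following is a minimal Python 3 reproduction. It assumes that a stream explicitly encoded as ASCII is a fair stand-in for that implicit encoding, and the sample sentence is made up.

```python
import io

text = "Olá, coração\n"  # hypothetical input line with non-ASCII characters

# A text stream whose encoding is ASCII, standing in for the implicit ASCII
# encoding Python 2 applies when unicode is written to a plain file object.
stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
try:
    stream.write(text)
    stream.flush()
    failed = None
except UnicodeEncodeError as err:
    failed = err
    # Reports the codec and the span of the first unencodable run ('á' here)
    print(err.encoding, err.start, err.end, err.reason)
```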
Have you seen these questions before?

- UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)
- UnicodeEncodeError: ascii codec can't encode

They report exactly the same error as yours, so my guess is that you need to encode `text` with `text.encode('utf8')`.
Edit:

Try using it here:

```python
output_file.write(text.encode('utf8') + os.linesep)
```
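Below is a Python 3 sketch of two equivalent fixes, using made-up sample lines and temporary files. The first encodes each line to UTF-8 bytes yourself, as suggested above. The second lets the file object do the encoding by opening it with `encoding='utf-8'`. In the question's Python 2 code, the second approach corresponds to passing `encoding='utf8'` to the output `codecs.open` call, just as the input file already does.

```python
import os
import tempfile

lines = ["Olá, coração\n", "plain ascii line\n"]  # hypothetical input lines
out_dir = tempfile.mkdtemp()

# Option 1: encode explicitly and write bytes.
# In Python 3, bytes must be written to a file opened in binary mode ('wb').
path_bytes = os.path.join(out_dir, "encoded.txt")
with open(path_bytes, "wb") as f:
    for text in lines:
        f.write(text.encode("utf-8"))

# Option 2: give the file object an encoding and write str directly.
# newline="" disables newline translation so both files get identical bytes.
path_text = os.path.join(out_dir, "text_mode.txt")
with open(path_text, "w", encoding="utf-8", newline="") as f:
    for text in lines:
        f.write(text)  # each line already ends with "\n"

with open(path_bytes, "rb") as a, open(path_text, "rb") as b:
    same = a.read() == b.read()
print(same)  # True
```

The sketch does not append `os.linesep`. Lines read by iterating over a file keep their trailing newline, so appending `os.linesep` as in the question adds an extra line break after every saved line.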