Informative features are not returning Cyrillic characters
I have now switched to Python 3.6, but when I show the most informative features, the Russian text coming out of my feature extractor ends up as gibberish.
Most Informative Features
three_last_letters = 'оÌ' noun : verb = 6.6 : 1.0
three_last_letters = 'гÐ' noun : verb = 5.4 : 1.0
three_last_letters = 'еÐ' noun : verb = 4.7 : 1.0
three_last_letters = 'мÐ' noun : verb = 4.4 : 1.0
three_last_letters = 'нÑ' noun : verb = 3.5 : 1.0
As for the feature extractor itself:
def POS_features(word):
    # Use the last three characters of the word as the only feature.
    return {'three_last_letters': word[-3:]}

print(POS_features(u'Богатир'))
That prints тир just fine. What can I do to make the most informative features return Russian characters?
I figured out what I was doing wrong:
import nltk
vocab = nltk.corpus.reader.CategorizedPlaintextCorpusReader(
    r"C:\Users\Admin\AppData\Roaming\nltk_data\corpora\russian\vocab",  # raw string: "\U" would otherwise be parsed as an escape
    r'.*\.txt', cat_pattern=r'^(noun|verb)', encoding="utf8")
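When I imported my vocab folder, I had it encoded as latin-1.
With the encoding set to utf8, all is well and the Cyrillic characters are returned to me: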
Most Informative Features
three_last_letters = 'ать' verb : noun = 15.2 : 1.0
three_last_letters = 'де' noun : verb = 2.6 : 1.0
three_last_letters = 'сть' noun : verb = 1.5 : 1.0
three_last_letters = 'пра' noun : verb = 1.4 : 1.0
three_last_letters = 'ина' noun : verb = 1.4 : 1.0
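For anyone hitting the same thing, here is a minimal sketch of why the wrong encoding garbles the feature values. It assumes the vocab files are stored as UTF-8 (their contents are not shown above) and imitates the bad read by decoding UTF-8 bytes as latin-1. The names `garbled` and `repaired` are only illustrative.

```python
def POS_features(word):
    # Same feature extractor as in the question.
    return {"three_last_letters": word[-3:]}

word = "Богатир"

# Imitate reading a UTF-8 file with encoding="latin-1":
# every byte becomes its own character.
garbled = word.encode("utf-8").decode("latin-1")

# latin-1 maps all 256 byte values, so the mistake is reversible.
repaired = garbled.encode("latin-1").decode("utf-8")

print(len(word), len(garbled))                  # 7 14
print(ascii(POS_features(garbled)))             # escaped, not Cyrillic
print(POS_features(repaired) == POS_features(word))
```

Each Cyrillic letter takes two bytes in UTF-8, so the latin-1 read doubles the string length and `word[-3:]` cuts through the middle of a letter instead of returning three letters. Passing the right `encoding` to the corpus reader avoids the problem at the source, which is simpler than repairing strings afterwards.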