Python Gensim LDA "DeprecationWarning: invalid escape sequence"
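I'm new to Stack Overflow and to Python, so please bear with me.
I'm trying to run Latent Dirichlet Allocation (LDA) on a text corpus with the gensim package in Python, using the PyCharm editor. I prepared the corpus in R and exported it to a CSV file with this R command: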
write.csv(testdf, "C://...//test.csv", fileEncoding = "utf-8")
This produced the following CSV structure (the real texts are longer and have already been preprocessed):
,"datetimestamp","id","origin","text"
1,"1960-01-01","id_1","Newspaper1","Test text one"
2,"1960-01-02","id_2","Newspaper1","Another text"
3,"1960-01-03","id_3","Newspaper1","Yet another text"
4,"1960-01-04","id_4","Newspaper2","Four Five Six"
5,"1960-01-05","id_5","Newspaper2","Alpha Bravo Charly"
6,"1960-01-06","id_6","Newspaper2","Singing Dancing Laughing"
I then tried to run a simple LDA analysis with the following basic Python code (based on the gensim tutorials):
import gensim
from gensim import corpora, models, similarities, parsing
import pandas as pd
from six import iteritems
import os
import pyLDAvis.gensim

class MyCorpus(object):
    def __iter__(self):
        for row in pd.read_csv('//mpifg.local/dfs/home/lu/Meine Daten/Imagined Futures and Greek State Bonds/Topic Modelling/Python/test.csv', index_col=False, header=0, encoding='utf-8')['text']:
            # assume there's one document per line, tokens separated by whitespace
            yield dictionary.doc2bow(row.split())

if __name__ == '__main__':
    dictionary = corpora.Dictionary(row.split() for row in pd.read_csv(
        '//.../test.csv', index_col=False, encoding='utf-8')['text'])
    print(dictionary)
    dictionary.save(
        '//.../greekdict.dict')  # store the dictionary, for future reference
    ## create an mmCorpus
    corpora.MmCorpus.serialize('//.../greekcorpus.mm', MyCorpus())
    corpus = corpora.MmCorpus('//.../greekcorpus.mm')
    dictionary = corpora.Dictionary.load('//.../greekdict.dict')
    # train model
    lda = gensim.models.ldamodel.LdaModel(corpus=corpus, id2word=dictionary, num_topics=50, iterations=1000)
I get the following messages and the code exits:
...\Python\venv\lib\site-packages\setuptools-28.8.0-py3.6.egg\pkg_resources_vendor\pyparsing.py:832: DeprecationWarning: invalid escape sequence \d
\...\Python\venv\lib\site-packages\setuptools-28.8.0-py3.6.egg\pkg_resources_vendor\pyparsing.py:2736: DeprecationWarning: invalid escape sequence \d
\...\Python\venv\lib\site-packages\setuptools-28.8.0-py3.6.egg\pkg_resources_vendor\pyparsing.py:2914: DeprecationWarning: invalid escape sequence \g
\...\Python\venv\lib\site-packages\pyLDAvis_prepare.py:387: DeprecationWarning: .ix is deprecated. Please use .loc for label based indexing or .iloc for positional indexing
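I can't find any solution, and to be honest I don't really know where exactly the problem comes from. I spent hours making sure that the CSV is encoded as utf-8 and that it is exported (from R) and imported (in Python) correctly.
What am I doing wrong, or where else could I look? Cheers!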
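A DeprecationWarning is exactly that: a warning that a feature is deprecated. It is meant to prompt users to switch to some other functionality so that their code stays compatible in the future. So in your case I would simply watch for updates of the libraries you use.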
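To illustrate that a DeprecationWarning on its own does not stop a program, here is a minimal sketch using only the standard `warnings` module. The function `old_function` is hypothetical and exists only for this illustration; it is not part of gensim, pandas, or pyLDAvis.

```python
import warnings


def old_function():
    # Hypothetical deprecated helper, used only for illustration.
    warnings.warn("old_function is deprecated", DeprecationWarning, stacklevel=2)
    return 42


# Record warnings instead of printing them, so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    result = old_function()          # execution continues past the warning

print(result)
print([w.category.__name__ for w in caught])
```

The call still returns its value, and the warning is only recorded. If a script really does stop, the cause is most likely something other than the warnings themselves.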
Starting with the last warning: it looks like it originates from pandas, and it has already been reported against pyLDAvis.
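The rest come from the pyparsing module, but you don't seem to import it explicitly. Maybe one of the libraries you use depends on it and relies on some relatively old, deprecated functionality. To get rid of the warnings, I would start by checking whether upgrading helps. Good luck!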
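For context, the "invalid escape sequence" message is typically produced when a normal string literal contains a backslash sequence such as `\d` that Python does not recognise. The sketch below reproduces that with the built-in `compile` function. It is an illustration only, not pyparsing's actual code, and the helper name `compile_warnings` is made up for this example. Depending on the Python version, the category may be `DeprecationWarning` or `SyntaxWarning`.

```python
import warnings


def compile_warnings(source):
    """Compile a source string and return the names of the warning categories emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<example>", "exec")
    return [w.category.__name__ for w in caught]


# The compiled source is:  pattern = "\d+"   (normal string literal)
bad = 'pattern = "\\d+"'
# The compiled source is:  pattern = r"\d+"  (raw string literal)
good = 'pattern = r"\\d+"'

print(compile_warnings(bad))   # at least one warning about the escape sequence
print(compile_warnings(good))  # no warnings for the raw string
```

A raw string (`r"..."`) is the usual fix in the library that owns the code, which is why upgrading the dependency is a reasonable first step.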