Gensim mallet CalledProcessError: returned non-zero exit status
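I'm getting an error when trying to use gensim's mallet wrapper in a Jupyter notebook. I have the specified file 'mallet' in the same folder as my notebook, but I can't seem to access it. I also tried pointing to it from the C drive, but I still get the same error. Please help :)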
import os
from gensim.models.wrappers import LdaMallet
#os.environ.update({'MALLET_HOME':r'C:/Users/new_mallet/mallet-2.0.8/'})
mallet_path = 'mallet' # update this path
ldamallet = LdaMallet(mallet_path, corpus=bow_corpus, num_topics=20, id2word=dictionary)
result = ldamallet.show_topics(num_topics=3, num_words=10, formatted=False)
for each in result:
    print(each)
Update the path to:
mallet_path = 'C:/mallet/mallet-2.0.8/bin/mallet.bat'
and edit mallet.bat inside the mallet-2.0.8 folder with Notepad so that it reads:
@echo off
rem This batch file serves as a wrapper for several
rem MALLET command line tools.
if not "%MALLET_HOME%" == "" goto gotMalletHome
echo MALLET requires an environment variable MALLET_HOME.
goto :eof
:gotMalletHome
set MALLET_CLASSPATH=C:\mallet\mallet-2.0.8\class;C:\mallet\mallet-2.0.8\lib\mallet-deps.jar
set MALLET_MEMORY=1G
set MALLET_ENCODING=UTF-8
set CMD=%1
shift
set CLASS=
if "%CMD%"=="import-dir" set CLASS=cc.mallet.classify.tui.Text2Vectors
if "%CMD%"=="import-file" set CLASS=cc.mallet.classify.tui.Csv2Vectors
if "%CMD%"=="import-svmlight" set CLASS=cc.mallet.classify.tui.SvmLight2Vectors
if "%CMD%"=="info" set CLASS=cc.mallet.classify.tui.Vectors2Info
if "%CMD%"=="train-classifier" set CLASS=cc.mallet.classify.tui.Vectors2Classify
if "%CMD%"=="classify-dir" set CLASS=cc.mallet.classify.tui.Text2Classify
if "%CMD%"=="classify-file" set CLASS=cc.mallet.classify.tui.Csv2Classify
if "%CMD%"=="classify-svmlight" set CLASS=cc.mallet.classify.tui.SvmLight2Classify
if "%CMD%"=="train-topics" set CLASS=cc.mallet.topics.tui.TopicTrainer
if "%CMD%"=="infer-topics" set CLASS=cc.mallet.topics.tui.InferTopics
if "%CMD%"=="evaluate-topics" set CLASS=cc.mallet.topics.tui.EvaluateTopics
if "%CMD%"=="prune" set CLASS=cc.mallet.classify.tui.Vectors2Vectors
if "%CMD%"=="split" set CLASS=cc.mallet.classify.tui.Vectors2Vectors
if "%CMD%"=="bulk-load" set CLASS=cc.mallet.util.BulkLoader
if "%CMD%"=="run" set CLASS=%1 & shift
if not "%CLASS%" == "" goto gotClass
echo Mallet 2.0 commands:
echo import-dir load the contents of a directory into mallet instances (one per file)
echo import-file load a single file into mallet instances (one per line)
echo import-svmlight load a single SVMLight format data file into mallet instances (one per line)
echo info get information about Mallet instances
echo train-classifier train a classifier from Mallet data files
echo classify-dir classify data from a single file with a saved classifier
echo classify-file classify the contents of a directory with a saved classifier
echo classify-svmlight classify data from a single file in SVMLight format
echo train-topics train a topic model from Mallet data files
echo infer-topics use a trained topic model to infer topics for new documents
echo evaluate-topics estimate the probability of new documents given a trained model
echo prune remove features based on frequency or information gain
echo split divide data into testing, training, and validation portions
echo bulk-load for big input files, efficiently prune vocabulary and import docs
echo Include --help with any option for more information
goto :eof
:gotClass
set MALLET_ARGS=
:getArg
if "%1"=="" goto run
set MALLET_ARGS=%MALLET_ARGS% %1
shift
goto getArg
:run
"C:\Program Files\Java\jdk-12\bin\java" -ea -Dfile.encoding=%MALLET_ENCODING% -classpath %MALLET_CLASSPATH% %CLASS% %MALLET_ARGS%
:eof
On the command line, these commands helped me figure out what was going on:
notepad mallet.bat
java
"C:\Program Files\Java\jdk-12\bin\java"
dir /OD
cd %userdir%
cd %userpath%
cd\
cd users
cd your_username
cd appdata\local\temp
dir /OD
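The problem was that either Java was not installed correctly or the PATH did not include Java, and the MALLET classpath was not defined correctly. More information here: https://docs.oracle.com/javase/7/docs/technotes/tools/windows/classpath.html. This fixed my error; I hope it helps others :)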
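To make the two failure causes above easier to spot from inside the notebook, here is a minimal sketch of a pre-flight check. The function name `check_java_and_mallet_home` is made up for illustration, and the rule that MALLET_HOME should contain a `bin` folder is an assumption based on the folder layout used in this thread.

```python
import os
import shutil


def check_java_and_mallet_home(env=None):
    """Return a list of problems found; an empty list means both checks passed."""
    env = os.environ if env is None else env
    problems = []

    # The MALLET launcher scripts start a JVM, so `java` has to be findable on PATH
    # (unless the launcher hard-codes the full path, as the edited mallet.bat does).
    if shutil.which("java", path=env.get("PATH", "")) is None:
        problems.append("java was not found on PATH")

    # mallet.bat stops immediately when MALLET_HOME is empty.
    home = env.get("MALLET_HOME", "")
    if not home:
        problems.append("MALLET_HOME is not set")
    elif not os.path.isdir(os.path.join(home, "bin")):
        # Assumption: a valid MALLET folder contains a bin directory.
        problems.append("MALLET_HOME has no bin directory: " + home)

    return problems


if __name__ == "__main__":
    for problem in check_java_and_mallet_home():
        print(problem)
```

Running it before creating the LdaMallet model should tell you which of the two things to fix first; it does not verify the classpath inside mallet.bat.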
I had the same problem. What I did was move the mallet folder to C:/new_mallet, and after that it worked fine:
import os
from gensim.models.wrappers import LdaMallet
os.environ.update({'MALLET_HOME': r'C:/new_mallet/mallet-2.0.8/'})
mallet_path = 'C:/new_mallet/mallet-2.0.8/bin/mallet' # update this path
ldamallet = LdaMallet(mallet_path, corpus=corpus, num_topics=10, id2word=id2word)
Working in a Jupyter Notebook (Python), I ran
conda uninstall gensim
conda install gensim
in cmd as administrator and then restarted my kernel. After a lot of time spent searching online, this worked like a charm.
Make sure you have the Java Development Kit (JDK) installed.
Credit goes to another answer for this tip (the link is missing here).
After installing the JDK, the following code for LDA Mallet worked perfectly!
import os
from gensim.models.wrappers import LdaMallet
os.environ.update({'MALLET_HOME':r'C:/mallet/mallet-2.0.8/'})
mallet_path = r'C:/mallet/mallet-2.0.8/bin/mallet.bat'
lda_mallet = LdaMallet(
    mallet_path,
    corpus=corpus_bow,
    num_topics=n_topics,
    id2word=dct,
)
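For me, it was not an import or path problem.
I spent hours trying to solve it.
I tried this solution, but nothing worked.
Looking at my earlier successful calls to LDA Mallet, I noticed that some parameters were not set, so I did this: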
gensim.models.wrappers.LdaMallet(mallet_path=mallet_path, corpus=corpus, num_topics=num_topics, id2word=id2word, prefix='temp_file_', workers=4)
I sincerely hope this helps. Finding a fix for this problem was painful.
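A relative prefix such as 'temp_file_' puts MALLET's intermediate files in whatever the current working directory happens to be. As far as I understand the gensim 3.x wrapper, it builds its file names by appending to the prefix string, so a prefix can also be a directory path plus a file-name stem. The sketch below builds such a prefix in a fresh writable directory; `make_mallet_prefix` is a hypothetical helper, not part of gensim.

```python
import os
import tempfile


def make_mallet_prefix(name="temp_file_", base_dir=None):
    """Create a new empty directory and return a prefix string located inside it."""
    # mkdtemp creates a directory that the current user can write to.
    work_dir = tempfile.mkdtemp(dir=base_dir)
    # Directory path + file-name stem, e.g. /tmp/abc123/temp_file_
    return os.path.join(work_dir, name)


prefix = make_mallet_prefix()
print(prefix)
```

You would then pass `prefix=prefix` to LdaMallet instead of the bare 'temp_file_' string. I have not confirmed that this is what fixed the error above; it only rules out an unwritable working directory.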
On Linux, I found that I needed to define the path to the mallet binary explicitly. The following code works.
from gensim.test.utils import common_corpus, common_dictionary
from gensim.models.wrappers import LdaMallet
mallet_path = "/path/Mallet/bin/mallet"
model = LdaMallet(mallet_path=mallet_path, corpus=common_corpus, num_topics=2, id2word=common_dictionary)
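If you are not sure whether the path you pass is really usable, a small check like the following can help before handing it to LdaMallet. This is only a sketch: `resolve_launcher` is a name I made up, and the example path in the comment is the placeholder from the snippet above.

```python
import os


def resolve_launcher(path):
    """Expand ~ and relative parts, then confirm the file exists and is executable."""
    full = os.path.abspath(os.path.expanduser(path))
    if not os.path.isfile(full):
        raise FileNotFoundError("no such file: " + full)
    if not os.access(full, os.X_OK):
        raise PermissionError("not executable: " + full)
    return full


# Example (placeholder path, adjust to your install):
# mallet_path = resolve_launcher("/path/Mallet/bin/mallet")
```

A FileNotFoundError means the path is wrong; a PermissionError usually means the launcher needs `chmod +x`.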
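For anyone else still struggling after spending hours trying many different suggestions: I finally got it working!
I followed the instructions here (I'm on a Mac):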
https://ps.au.dk/fileadmin/ingen_mappe_valgt/installing_mallet.pdf
I also closed anaconda before starting this; I don't know whether that matters.
I got the following error in the terminal:
(base) myname-MacBook-Air:mallet-2.0.8 myname$ ./bin/mallet
-bash: ./bin/mallet: /bin/bash: bad interpreter: Operation not permitted
Then I followed the instructions in this post to remove the quarantine attribute:
“bad interpreter: Operation not permitted” Error on El Capitan
I reopened anaconda and everything worked!
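Running the launcher by hand, as in the terminal session above, is what exposed the real error behind CalledProcessError. The same thing can be done from the notebook. Here is a rough sketch using only the standard library; `run_and_report` is a hypothetical helper name, and the path in the comment is just the one from this answer.

```python
import subprocess


def run_and_report(command):
    """Run a command and return (exit_code, combined stdout and stderr text)."""
    try:
        done = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr so error text is not lost
            universal_newlines=True,
        )
    except OSError as exc:
        # Raised when the file is missing or the OS refuses to execute it.
        return None, str(exc)
    return done.returncode, done.stdout


# Example (adjust the path to your install):
# code, output = run_and_report(["./bin/mallet"])
# print(code, output)
```

An exit code of None means the launcher could not be started at all (wrong path or permissions); a non-zero code with output usually points at Java or the classpath.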
I solved this by downloading the Java JDK: https://docs.oracle.com/en/java/javase/15/install/installation-jdk-macos.html#GUID-F9183C70-2E96-40F4-9104-F3814A5A331F
I got the same error because I had forgotten to install Java on Ubuntu.