How to get TextBlob to work with all users on Ubuntu?
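I'm trying to get TextBlob up and running for some teammates on a Unix server. When I run a script that uses TextBlob while logged in as root, it works fine, but when I try it from a new account I created, I get the following error: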
**********************************************************************
Resource u'tokenizers/punkt/english.pickle' not found. Please
use the NLTK Downloader to obtain the resource: >>>
nltk.download()
Searched in:
- '/home/USERNAME/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- u''
**********************************************************************
Traceback (most recent call last):
File "sampleClassifier.py", line 25, in <module>
cl = NaiveBayesClassifier(train)
File "/usr/local/lib/python2.7/dist-packages/textblob/classifiers.py", line 192, in __init__
self.train_features = [(self.extract_features(d), c) for d, c in self.train_set]
File "/usr/local/lib/python2.7/dist-packages/textblob/classifiers.py", line 169, in extract_features
return self.feature_extractor(text, self.train_set)
File "/usr/local/lib/python2.7/dist-packages/textblob/classifiers.py", line 81, in basic_extractor
word_features = _get_words_from_dataset(train_set)
File "/usr/local/lib/python2.7/dist-packages/textblob/classifiers.py", line 63, in _get_words_from_dataset
return set(all_words)
File "/usr/local/lib/python2.7/dist-packages/textblob/classifiers.py", line 62, in <genexpr>
all_words = chain.from_iterable(tokenize(words) for words, _ in dataset)
File "/usr/local/lib/python2.7/dist-packages/textblob/classifiers.py", line 59, in tokenize
return word_tokenize(words, include_punc=False)
File "/usr/local/lib/python2.7/dist-packages/textblob/tokenizers.py", line 72, in word_tokenize
for sentence in sent_tokenize(text))
File "/usr/local/lib/python2.7/dist-packages/textblob/base.py", line 64, in itokenize
return (t for t in self.tokenize(text, *args, **kwargs))
File "/usr/local/lib/python2.7/dist-packages/textblob/decorators.py", line 38, in decorated
raise MissingCorpusError()
textblob.exceptions.MissingCorpusError:
Looks like you are missing some required data for this feature.
To download the necessary data, simply run
python -m textblob.download_corpora
or use the NLTK downloader to download the missing data: http://nltk.org/data.html
If this doesn't fix the problem, file an issue at https://github.com/sloria/TextBlob/issues.
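The machine we're using is very small, so I can't afford to overwhelm it by downloading the corpora separately for every user. Does anyone know how I can fix this? I've already installed the data for root, but I don't know where the packages are or how to find them.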
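Following the instructions in the docs should work. Try setting the NLTK_DATA environment variable to the directory that already holds the corpora, and see whether that works for the new user.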
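As a rough sketch of that suggestion: the snippet below looks for an existing copy of the punkt tokenizer and points NLTK_DATA at it. The helper names (find_data_dirs, use_shared_data) are made up for illustration, and /root/nltk_data is only a guess at where root's download ended up; the other directories are the ones listed in the error message. NLTK reads NLTK_DATA when it is imported, so the variable has to be set before nltk or textblob is imported.

```python
import os

# Directories NLTK reported searching in the error message, plus
# /root/nltk_data, which is an ASSUMPTION about where root's download landed.
CANDIDATE_DIRS = [
    "/root/nltk_data",
    "/usr/share/nltk_data",
    "/usr/local/share/nltk_data",
    "/usr/lib/nltk_data",
    "/usr/local/lib/nltk_data",
]


def find_data_dirs(candidates, resource="tokenizers/punkt"):
    """Return the candidate directories that already contain `resource`."""
    found = []
    for base in candidates:
        # isdir() returns False (no exception) if the path is unreadable.
        if os.path.isdir(os.path.join(base, *resource.split("/"))):
            found.append(base)
    return found


def use_shared_data(data_dir):
    """Point NLTK at `data_dir`; call this before importing nltk/textblob."""
    os.environ["NLTK_DATA"] = data_dir
    return os.environ["NLTK_DATA"]


if __name__ == "__main__":
    dirs = find_data_dirs(CANDIDATE_DIRS)
    print("punkt found in:", dirs)
    if dirs:
        use_shared_data(dirs[0])
```

If the data turns out to live in root's home directory, other users probably won't be able to read it there. In that case an administrator could move it once into one of the system-wide directories from the error message (for example /usr/share/nltk_data) and make it world-readable; NLTK already searches those locations for every user, so no per-user download or environment variable would be needed.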