NLTK reuters dataset not found
I downloaded the Reuters dataset from nltk with the following commands:
import nltk
nltk.download('reuters')
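I confirmed that the dataset was downloaded; I can see it under "C:/Users/username/AppData/Roaming/nltk_data".
However, when I try to read the dataset, Python cannot see it! I get the following error: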
C:\Users\username\python\Python37-32\Lib\site-packages\sklearn\externals\joblib\externals\cloudpickle\cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
Traceback (most recent call last):
File "C:\Users\username\python\Python37-32\Lib\site-packages\nltk\corpus\util.py", line 80, in __load
try: root = nltk.data.find('{}/{}'.format(self.subdir, zip_name))
File "C:\Users\username\python\Python37-32\Lib\site-packages\nltk\data.py", line 675, in find
raise LookupError(resource_not_found)
LookupError:
**********************************************************************
Resource reuters not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('reuters')
Searched in:
- 'C:\Users\username/nltk_data'
- 'C:\nltk_data'
- 'D:\nltk_data'
- 'E:\nltk_data'
- 'C:\Users\username\python\Python37-32\nltk_data'
- 'C:\Users\username\python\Python37-32\share\nltk_data'
- 'C:\Users\username\python\Python37-32\lib\nltk_data'
- 'C:\Users\username\AppData\Roaming\nltk_data'
*******
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\username\eclipse-workspace\ML\src\PAs\pa2\Test.py", line 17, in <module>
from commons import util, datasets, runClassifier, mlGraphics
File "C:\Users\username\eclipse-workspace\ML\src\commons\datasets.py", line 258, in <module>
class Reuters:
File "C:\Users\username\eclipse-workspace\ML\src\commons\datasets.py", line 259, in Reuters
documents = reuters.fileids()
File "C:\Users\username\python\Python37-32\Lib\site-packages\nltk\corpus\util.py", line 116, in __getattr__
self.__load()
File "C:\Users\username\python\Python37-32\Lib\site-packages\nltk\corpus\util.py", line 81, in __load
except LookupError: raise e
File "C:\Users\username\python\Python37-32\Lib\site-packages\nltk\corpus\util.py", line 78, in __load
root = nltk.data.find('{}/{}'.format(self.subdir, self.__name))
File "C:\Users\username\python\Python37-32\Lib\site-packages\nltk\data.py", line 675, in find
raise LookupError(resource_not_found)
LookupError:
*********
Resource reuters not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('reuters')
Searched in:
- 'C:\Users\username/nltk_data'
- 'C:\nltk_data'
- 'D:\nltk_data'
- 'E:\nltk_data'
- 'C:\Users\username\python\Python37-32\nltk_data'
- 'C:\Users\username\python\Python37-32\share\nltk_data'
- 'C:\Users\username\python\Python37-32\lib\nltk_data'
- 'C:\Users\username\AppData\Roaming\nltk_data'
I tried manually creating a directory "C:/Users/username/nltk_data" and pasting reuters.zip into it, but that did not help!
When I download it again with nltk.download(), it prints the following:
[nltk_data] Downloading package reuters to C:\Users\username/nltk_data...
[nltk_data] Package reuters is already up-to-date!
Any hints?
I am also wondering why the paths printed by Python contain both forward slashes (/) and backslashes (\).
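On the mixed separators: a likely explanation (an assumption about how NLTK builds its default search path, not something the traceback confirms) is that NLTK expands '~/nltk_data' with os.path.expanduser, which on Windows replaces only the '~' and leaves the forward slash in place. Windows accepts both separators, so the mixed form is cosmetic. A minimal sketch using ntpath, the Windows flavour of os.path that can be imported on any OS, shows that both spellings refer to the same location:

```python
import ntpath  # Windows path rules, importable on any operating system

# Path exactly as it appears in the "Searched in" list of the error message.
mixed = r"C:\Users\username/nltk_data"

# normpath rewrites forward slashes to backslashes under Windows rules.
normalized = ntpath.normpath(mixed)
print(normalized)

# normcase makes the two spellings directly comparable.
same = ntpath.normcase(mixed) == ntpath.normcase(normalized)
print(same)
```

So the mixed slashes are unlikely to be the reason the corpus is not found.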
Here is my code. You may find it helpful.
import nltk
# nltk.download('punkt')
# nltk.download('averaged_perceptron_tagger')

# Raw string so the backslashes in the Windows path are not treated as escapes
with open(r"e:\Assignment\my_file.txt", "r") as f:  # open file
    text = f.read()  # read the whole file
sentences = nltk.sent_tokenize(text)  # split the text into sentences
nouns = []  # list to hold all nouns
for sentence in sentences:
    for word, pos in nltk.pos_tag(nltk.word_tokenize(sentence)):
        if pos in ('NN', 'NNP', 'NNS', 'NNPS'):  # noun tags
            nouns.append(word)
print(nouns)
Because the imp module is deprecated when using nltk with Python 3.7, use import importlib instead of import imp, or try running the code with an older version of Python.
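As a minimal sketch of that replacement: importlib.import_module and importlib.reload cover the most common uses of imp. Note that in the output above the warning is reported from cloudpickle.py inside sklearn, so the import imp line lives in that package rather than in the asker's own code. The json module below is only a stand-in chosen for illustration.

```python
import importlib

# Stand-in module for illustration; any importable module name works here.
mod = importlib.import_module("json")

# importlib.reload takes over from the old imp.reload and returns the module.
reloaded = importlib.reload(mod)
print(reloaded.__name__)
```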
In my case, I simply went to the folder where the corpus was downloaded and unzipped the archive.
To see where the corpus was downloaded:
nltk.download('reuters')
[nltk_data] Downloading package reuters to /home/denys/nltk_data...
[nltk_data] Package reuters is already up-to-date!
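The unzip step can be sketched as follows. This is a minimal sketch using only the standard library, not an NLTK API: the helper name unzip_corpus is hypothetical, and the corpora/<name>.zip layout is an assumption (the traceback in the question shows NLTK looking the resource up under self.subdir, which for corpora is presumably "corpora"), so check how your own nltk_data folder is laid out first.

```python
import zipfile
from pathlib import Path


def unzip_corpus(nltk_data_dir, name="reuters"):
    """Extract <nltk_data_dir>/corpora/<name>.zip next to the archive.

    Hypothetical helper, not part of NLTK. Returns the extracted folder path.
    """
    corpora_dir = Path(nltk_data_dir) / "corpora"
    archive = corpora_dir / f"{name}.zip"
    if not archive.is_file():
        raise FileNotFoundError(f"no archive at {archive}")
    with zipfile.ZipFile(archive) as zf:
        # Assumes the archive contains a top-level "<name>/" folder.
        zf.extractall(corpora_dir)
    return corpora_dir / name
```

For example, unzip_corpus("/home/denys/nltk_data") would use the folder printed by the downloader above. If the archive was pasted directly into nltk_data rather than into its corpora subfolder, as described in the question, it would need to be moved there first.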