How to make a bag of words using the split method on a text file in Python
I am trying to learn TF-IDF, but I cannot extract the words from a file.
Code:
docA = open("/home/user/Desktop/da/doca","r")
print(docA.read())
bowA = docA.split(" ")
Error:
AttributeError
Traceback (most recent call last)
<ipython-input-32-06e07f9dd975> in <module>
----> 1 bowA = docA.split(" ")
AttributeError: '_io.TextIOWrapper' object has no attribute 'split'
Can anyone help me solve this?
You want to call split on the string returned by read(), not on the file handle:
docA = open("/home/user/Desktop/da/doca","r")
document_string = docA.read()
bowA = document_string.split()
You can call split() with no arguments; by default it splits on whitespace.
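To make the difference between split() and split(" ") concrete, here is a small sketch. The sample string is made up for illustration and is not the contents of the asker's file.

```python
# Hypothetical sample text with a double space, a newline and a tab.
text = "the quick  brown fox\njumps over\tthe lazy dog"

# split(" ") cuts only at single space characters, so the double space
# yields an empty string and "\n" / "\t" stay inside the tokens.
by_space = text.split(" ")

# split() with no argument cuts at any run of whitespace and drops empty strings.
by_whitespace = text.split()

print(by_space)
print(by_whitespace)
```

For a file that spans several lines, the no-argument form is usually the one you want, because otherwise the last word of one line and the first word of the next end up in the same token.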
I assume you meant:
docA = open("/home/user/Desktop/da/doca","r")
# print(docA.read())
bowA = docA.read().split(" ") # or just split() will do
docA.close()
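When you call read(), it reads the entire file and leaves the read cursor at the end, so calling read() again returns an empty string. If you want to print the contents, assign them to a variable, print it, and then use it however you like: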
docA = open("/home/user/Desktop/da/doca","r")
data = docA.read()
print(data)
bowA = data.split()
docA.close()
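The read-cursor behaviour described above can be checked with a short sketch. It writes a throwaway temporary file (the file and its contents are assumptions for illustration, not the original doca file) and reads it twice. The seek(0) call is not part of the answer above; it is shown only as another way to read the contents again.

```python
import os
import tempfile

# Create a throwaway file so the sketch does not depend on the original path.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("the cat sat on the mat")
    path = tmp.name

f = open(path, "r")
first = f.read()    # reads the whole file; the position is now at the end
second = f.read()   # nothing is left to read, so this is an empty string
f.seek(0)           # move the position back to the start of the file
third = f.read()    # the full contents again
f.close()
os.remove(path)     # clean up the temporary file

print(repr(first), repr(second), repr(third))
```

Storing the result of the first read() in a variable, as in the code above, avoids the need to rewind at all.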
The same read-once pattern with a with block, which closes the file for you:
with open("/home/user/Desktop/da/doca","r") as docA:
    data = docA.read()
    print(data)
    bowA = data.split()
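The code above produces a list of tokens. If by "bag of words" you mean word counts, which is what a TF-IDF computation typically starts from, one possible next step is collections.Counter. This is a minimal sketch under that assumption; the temporary file and its text are invented so that the example runs on its own.

```python
import os
import tempfile
from collections import Counter

# Stand-in for the real document: a temporary file with made-up text.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("the cat sat on the mat\nthe dog sat too")
    path = tmp.name

# Read once, split on any whitespace, and let the with block close the file.
with open(path, "r") as doc:
    words = doc.read().split()

os.remove(path)  # clean up the temporary file

# Counter maps each distinct word to the number of times it occurs.
bag = Counter(words)
print(bag.most_common(2))
```

This sketch does no lowercasing or punctuation stripping, so "The" and "the", or "mat" and "mat.", would be counted as different words.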