AttributeError: 'list' object has no attribute 'lower' gensim
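I have a list of 10k words in a text file, like this:
G15
KDN
C30A
Action Standard
Air Brush
Air Dilution
I am trying to convert them to lowercase tokens with the following code, for later processing with GenSim: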
data = [line.strip() for line in open("C:\corpus\TermList.txt", 'r')]
texts = [[word for word in data.lower().split()] for word in data]
I get the following traceback:
AttributeError                            Traceback (most recent call last)
<ipython-input-84-33bbe380449e> in <module>()
      1 data = [line.strip() for line in open("C:\corpus\TermList.txt", 'r')]
----> 2 texts = [[word for word in data.lower().split()] for word in data]
      3
AttributeError: 'list' object has no attribute 'lower'
Any suggestions about what I am doing wrong and how to fix it would be greatly appreciated. Thanks!
You need
texts = [[word.lower() for word in line.split()] for line in data]
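For each line in data (the outer [... for line in data]), this code builds a list of lowercase words (the inner [word.lower() for word in line.split()]). Each line is a str containing a sequence of space-separated words. line.split() turns that sequence into a list, and word.lower() converts each word to lowercase.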
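To see each step in isolation, here is a minimal sketch that runs on a small in-memory list instead of the file. The sample values in data are an assumption, made up to mirror the terms in the question.

```python
# Hypothetical sample standing in for the stripped lines of TermList.txt.
data = ["G15", "KDN", "C30A", "Action Standard", "Air Brush", "Air Dilution"]

# Outer loop: one entry per line.
# Inner loop: split the line on whitespace and lowercase every word.
texts = [[word.lower() for word in line.split()] for line in data]

print(texts)
# [['g15'], ['kdn'], ['c30a'], ['action', 'standard'], ['air', 'brush'], ['air', 'dilution']]
```

Multi-word terms such as "Action Standard" end up as two tokens in the same inner list, which matters if each term is meant to stay a single token.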
Try:
data = [line.strip() for line in open("C:\corpus\TermList.txt", 'r')]
texts = [[word.lower() for word in text.split()] for text in data]
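You are trying to apply .lower() to data, which is a list.
.lower() can only be applied to strings.
What you are doing wrong is calling a string method (lower()) on a list (data, in your case).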
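The difference can be reproduced in a few lines. This is only a sketch on a made-up two-line sample (the values in data are an assumption), and the message shown is what CPython 3 reports.

```python
data = ["Air Brush", "Air Dilution"]  # hypothetical sample lines

# A list has no .lower(); only its str elements do.
try:
    data.lower()
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)  # 'list' object has no attribute 'lower'

# Calling .lower() on each element works.
lowered = [line.lower() for line in data]
print(lowered)  # ['air brush', 'air dilution']
```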
data = [line.strip() for line in open('corpus.txt', 'r')]
Once you have the lines as list entries, what you should do is
texts = [[words for words in sentences.lower().split()] for sentences in data]
#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^*********^^^^^^^^^^^^^^^^^^^^^^*********^^^^
#you should call lower on iter. value - in our case it is "sentences"
This gives you a list of lists. Each inner list contains the lowercase words of one line.
$ tail -n 10 corpus.txt
G15 KDN C30A Action Standard Air Brush Air Dilution
G15 KDN C30A Action Standard Air Brush Air Dilution
G15 KDN C30A Action Standard Air Brush Air Dilution
G15 KDN C30A Action Standard Air Brush Air Dilution
G15 KDN C30A Action Standard Air Brush Air Dilution
G15 KDN C30A Action Standard Air Brush Air Dilution
G15 KDN C30A Action Standard Air Brush Air Dilution
G15 KDN C30A Action Standard Air Brush Air Dilution
G15 KDN C30A Action Standard Air Brush Air Dilution
G15 KDN C30A Action Standard Air Brush Air Dilution
$ python
Python 2.7.12 (default, Nov 19 2016, 06:48:10)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> data = [line.strip() for line in open('corpus.txt', 'r')]
>>> texts = [[words for words in sentences.lower().split()] for sentences in data]
>>> texts[:5]
[['g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution'], ['g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution'], ['g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution'], ['g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution'], ['g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution']]
>>>
Of course, you can flatten the result or keep it as it is.
>>> flattened = reduce(lambda x,y: x+y, texts)
>>> flattened[:30]
['g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution', 'g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution', 'g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution', 'g15', 'kdn', 'c30a']
>>>
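The reduce call above works in the Python 2.7 session shown, but on Python 3 reduce is no longer a builtin and has to be imported from functools. Joining lists with + also copies the accumulated list at every step. As an alternative sketch, itertools.chain.from_iterable from the standard library flattens in a single pass. The sample texts below are made up for illustration.

```python
from itertools import chain

# Hypothetical sample: a list of token lists, as produced by the comprehension.
texts = [["g15", "kdn"], ["action", "standard"], ["air", "brush"]]

# chain.from_iterable walks every inner list in order without building
# intermediate lists; list() materializes the result.
flattened = list(chain.from_iterable(texts))

print(flattened)
# ['g15', 'kdn', 'action', 'standard', 'air', 'brush']
```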
We can simply convert the items of the list to lowercase.
>>> words = ["PYTHON", "PROGRAMMING"]
>>> type((words))
>>> for i in words:
...     print(i.lower())
...
Output:
python
programming
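The loop only prints the lowercase words. To keep them for later use, a list comprehension can collect them into a new list. This is a small sketch and the name lowered is chosen only for illustration.

```python
words = ["PYTHON", "PROGRAMMING"]

# Build a new list; the original list is left unchanged.
lowered = [w.lower() for w in words]

print(lowered)  # ['python', 'programming']
```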