How can I show a specific word in a data set?
I've just started learning Python. I have a question about matching some words in my data set, which comes from Excel.
words_list contains the words I want to find in the data set.
words_list = ('tried','mobile','abc')
df is an extract from Excel with a single column selected.
df =
0 to make it possible or easier for someone to do ...
1 unable to acquire a buffer item very likely ...
2 The organization has tried to make...
3 Broadway tried a variety of mobile Phone for the..
I want to get a result like this:
'None',
'None',
'tried',
'tried','mobile'
I tried this in Jupyter:
list = []
for word in df:
    if any(aa in word for aa in words_list):
        list.append(word)
    else:
        list.append('None')
print(list)
But the result shows the whole sentence from df:
'None'
'None'
'The organization has tried to make...'
'Broadway tried a variety of mobile Phone for the..'
Can I show only the words from the word list in the result?
Sorry for my English, and thank you all.
I suggest operating on the DataFrame directly (that should always be your first thought: use the power of pandas).
import pandas as pd
words_list = {'tried', 'mobile', 'abc'}
df = pd.DataFrame({'col': ['to make it possible or easier for someone to do',
                           'unable to acquire a buffer item very likely',
                           'The organization has tried to make',
                           'Broadway tried a variety of mobile Phone for the']})
df['matches'] = df['col'].str.split().apply(lambda x: set(x) & words_list)
print(df)
col matches
0 to make it possible or easier for someone to do {}
1 unable to acquire a buffer item very likely {}
2 The organization has tried to make {tried}
3 Broadway tried a variety of mobile Phone for the {mobile, tried}
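If the goal is the exact output from the question ('None' for rows without a match, and matches listed in the order of words_list), the set intersection above could be swapped for a small helper. This is only a sketch: find_matches is a hypothetical name, and it assumes whitespace-separated, case-sensitive words, as in the sample data.

```python
import pandas as pd

words_list = ('tried', 'mobile', 'abc')
df = pd.DataFrame({'col': [
    'to make it possible or easier for someone to do',
    'unable to acquire a buffer item very likely',
    'The organization has tried to make',
    'Broadway tried a variety of mobile Phone for the',
]})

def find_matches(sentence):
    """Hypothetical helper: return matched words in words_list order, or 'None'."""
    tokens = set(sentence.split())                    # whitespace tokens of this row
    found = [w for w in words_list if w in tokens]    # keeps the order of words_list
    return ', '.join(found) if found else 'None'

df['matches'] = df['col'].apply(find_matches)
print(df['matches'].tolist())
# ['None', 'None', 'tried', 'tried, mobile']
```

Iterating over words_list rather than intersecting sets is what keeps 'tried' before 'mobile'; a set has no guaranteed display order.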
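It prints the whole line because of this loop:
for word in df:
Your "word" variable actually holds the entire line. The code then checks the whole line to see whether it contains one of your search words. If it does, it effectively says, "Yes, I found ____ in this line, so add the line to your list."
It sounds like what you want is to split each line into words first, and then check.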
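results = []   # avoid naming this "list", which would shadow the built-in
found = False
for line in df:
    # split the line into individual words before checking
    words = line.split(" ")
    for word in words_list:
        if word in words:
            found = True
            results.append(word)
    # this is just to append "None" if nothing found
    if found:
        found = False
    else:
        results.append("None")
print(results)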
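The loop above puts every match into one flat list, so it is no longer clear which line a word came from. A possible variant that keeps one entry per line is sketched below. It is not part of the original answer: matches_per_line and lines are hypothetical names, and the lowercase, regex-based tokenizing is an assumption added so that trailing punctuation such as "make..." does not hide a match.

```python
import re

words_list = ('tried', 'mobile', 'abc')
# Stand-in for the Excel column; in the question this would be the df values.
lines = [
    'to make it possible or easier for someone to do ...',
    'unable to acquire a buffer item very likely ...',
    'The organization has tried to make...',
    'Broadway tried a variety of mobile Phone for the..',
]

def matches_per_line(lines, words):
    """Hypothetical helper: one list of matched words per input line."""
    results = []
    for line in lines:
        # \w+ extracts word characters only, so "make..." becomes "make"
        tokens = set(re.findall(r"\w+", line.lower()))
        found = [w for w in words if w in tokens]
        results.append(found or ['None'])   # ['None'] marks a line with no match
    return results

print(matches_per_line(lines, words_list))
# [['None'], ['None'], ['tried'], ['tried', 'mobile']]
```

Because the text is lowercased before comparison, this sketch assumes the search words in words_list are lowercase as well.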
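As a side note, when working with lists you may want to use pprint instead of print. It prints lists, dictionaries, and so on in a layout that is easier to read. pprint is part of the Python standard library, so it should not need a separate install. Usage looks like this: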
from pprint import pprint
dictionary = {'firstkey':'firstval','secondkey':'secondval','thirdkey':'thirdval'}
pprint(dictionary)
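For a short dictionary like the one above, pprint and print look almost the same; the difference shows once the structure is nested or wider than the line limit. A small sketch, where the sample data and the width value are illustrative assumptions:

```python
from pprint import pformat, pprint

# Illustrative nested result: one list of matches per line of the data set
results = [['None'], ['None'], ['tried'], ['tried', 'mobile']]

# width sets the maximum line length; the whole list does not fit in
# 30 characters, so each inner list is placed on its own line
pprint(results, width=30)

# pformat returns the same layout as a string instead of printing it
formatted = pformat(results, width=30)
```

pformat is handy when the formatted text should go to a log or a file rather than straight to the screen.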