Keep only matched words in pandas column
I want to keep only the words that appear in a list. All other words should be removed (pandas DataFrame).
cuisine_list = ['breakfast', 'american', 'tea', 'chicken']
name | cuisine |
---|---|
dominos pizza | breakfast american tea dine in |
kfc | american chicken play area |
The result should look like this:
name | cuisine |
---|---|
dominos pizza | breakfast american tea |
kfc | american chicken |
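I am using the following code, but it takes a lot of time.
# words_to_match is not defined in the question; presumably it is cuisine_list
file1_cuisine = file1[["Cuisine"]].copy()  # copy so the assignment below does not hit a view
for index, row in file1_cuisine.iterrows():
    words_to_keep = []
    for word in row["Cuisine"].split(' '):
        if word in words_to_match:
            words_to_keep.append(word)
    # join with single spaces so no trailing space is left behind
    file1_cuisine.loc[index, 'final_input_text'] = ' '.join(words_to_keep)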
Use set intersection (the & operator) with Series.str.split and apply:
In [760]: y = set(cuisine_list)
In [766]: df['cuisine'] = df['cuisine'].str.split().apply(lambda x: list(set(x) & y)).str.join(',')
In [767]: df
Out[767]:
            name                 cuisine
0  dominos pizza  tea,american,breakfast
1            kfc        chicken,american
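Since a set is unordered, the output above does not keep the words in their original order, and it is joined with commas rather than spaces. If you need exactly the result shown in the question, one option is to keep the set only for the membership test and walk the words in order. This is a minimal sketch, assuming the sample data from the question; the helper name keep_listed_words is made up for illustration.

```python
import pandas as pd

cuisine_list = ['breakfast', 'american', 'tea', 'chicken']
keep = set(cuisine_list)  # set lookup is fast; used only for membership tests

# Sample data copied from the question
df = pd.DataFrame({
    'name': ['dominos pizza', 'kfc'],
    'cuisine': ['breakfast american tea dine in',
                'american chicken play area'],
})

def keep_listed_words(text):
    # Hypothetical helper: iterate the words in their original order
    # and keep only those present in the set.
    return ' '.join(word for word in text.split() if word in keep)

df['cuisine'] = df['cuisine'].apply(keep_listed_words)
print(df)
```

This still calls a Python function once per row, but it avoids iterrows and the per-row .loc assignment from the question.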
Use a lambda function with split and set.intersection, then join the values with a comma:
cuisine_list = ['breakfast', 'american', 'tea', 'chicken']
df['cuisine'] = df['cuisine'].apply(lambda x: ','.join(set(x.split()).intersection(cuisine_list)))
print(df)
            name                 cuisine
0  dominos pizza  tea,breakfast,american
1            kfc        chicken,american
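If the original word order matters, use Series.str.findall with a word-boundary pattern:
cuisine_list = ['breakfast', 'american', 'tea', 'chicken']
# \b on both sides so that only whole words match
pat = '|'.join(r"\b{}\b".format(x) for x in cuisine_list)
df['cuisine'] = df['cuisine'].str.findall(pat).str.join(',')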
print(df)
            name                 cuisine
0  dominos pizza  breakfast,american,tea
1            kfc        american,chicken
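The findall approach can also produce the space-separated result from the question. The sketch below is a variation, not the answer's exact code: it wraps the alternatives in one non-capturing group, joins with a space, and passes each word through re.escape in case a list entry ever contains a regex metacharacter (an assumption about possible future input; the sample words do not need it).

```python
import re

import pandas as pd

cuisine_list = ['breakfast', 'american', 'tea', 'chicken']

# Sample data copied from the question
df = pd.DataFrame({
    'name': ['dominos pizza', 'kfc'],
    'cuisine': ['breakfast american tea dine in',
                'american chicken play area'],
})

# One pattern of the form \b(?:w1|w2|...)\b. The group is non-capturing,
# so findall returns the full matched words.
pat = r'\b(?:' + '|'.join(re.escape(word) for word in cuisine_list) + r')\b'

# findall scans each string left to right, so the original order is kept
df['cuisine'] = df['cuisine'].str.findall(pat).str.join(' ')
print(df)
```

The word boundaries mean that a list entry such as tea does not match inside a longer word.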