Match all words from a string in another string (words can be in different positions)
I have a list of strings that must be matched against a dataframe column.
The list looks like this:
list = ['golden village lte', 'pones wcdma', 'coral gbts', 'street view gbts', 'street view wcdma']
The column in the dataframe looks like this:
data = {'COLUMN': ['wcdma street view disconnected', 'gbts planned work street view', 'lte atn golden village optical invalid', 'wcdma street view planned work']}
I want to find every row that contains every word of a string from the list, so that I end up with the following dataframe:
COLUMN                                 | String
wcdma street view disconnected         | street view wcdma
gbts planned work street view          | street view gbts
lte atn golden village optical invalid | golden village lte
wcdma street view planned work         | street view wcdma
The way I tried to find matches was to supply each string in the list as a list of elements (like ['street', 'view', 'wcdma']) and search:
df.apply(lambda x: all(er in x.COLUMN for er in list), axis=1)
But this returns nothing for me, even though I know for certain there must be at least one match. If I change all() to any(), it returns something, but that is not what I need.
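A minimal sketch of why the attempt above comes back empty, assuming `list` holds the full phrases as strings: `all()` then requires every whole phrase to be a substring of the same row, which no row satisfies. The names `phrases` and `has_all_words` are introduced here for illustration only. Checking the words of one phrase at a time behaves as expected:

```python
import pandas as pd

phrases = ['golden village lte', 'pones wcdma', 'coral gbts',
           'street view gbts', 'street view wcdma']
df = pd.DataFrame({'COLUMN': ['wcdma street view disconnected',
                              'gbts planned work street view',
                              'lte atn golden village optical invalid',
                              'wcdma street view planned work']})

# Same shape as the attempt: every whole phrase must be a substring of the row.
all_phrases = df.apply(lambda row: all(p in row.COLUMN for p in phrases), axis=1)

# Per-phrase check: every word of ONE phrase must occur in the row.
def has_all_words(text, phrase):
    return all(word in text for word in phrase.split())

street_view_wcdma = df['COLUMN'].apply(
    lambda text: has_all_words(text, 'street view wcdma'))

print(all_phrases.tolist())        # no row contains all five phrases
print(street_view_wcdma.tolist())  # rows 0 and 3 contain all three words
```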
You can try this.
df = pd.DataFrame({'COLUMN': ['wcdma street view disconnected', 'gbts planned work street view', 'lte atn golden village optical invalid', 'wcdma street view planned work']})
df
COLUMN
0 wcdma street view disconnected
1 gbts planned work street view
2 lte atn golden village optical invalid
3 wcdma street view planned work
Now, use apply on the column:
lst = ['golden village lte', 'pones wcdma', 'coral gbts', 'street view gbts', 'street view wcdma']
df['String'] = df.COLUMN.apply(lambda x:[i for i in lst if all(j in x for j in i.split())].pop())
df
   COLUMN                                  String
0  wcdma street view disconnected          street view wcdma
1  gbts planned work street view           street view gbts
2  lte atn golden village optical invalid  golden village lte
3  wcdma street view planned work          street view wcdma
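One caveat with the one-liner above: `.pop()` raises `IndexError` when a row matches no phrase, and returns the last match when several phrases match. A hedged variant, keeping the same substring test but using `next()` with a default of `None`, might look like the sketch below. The helper name `first_match` and the extra unmatched row are assumptions added for illustration.

```python
import pandas as pd

lst = ['golden village lte', 'pones wcdma', 'coral gbts',
       'street view gbts', 'street view wcdma']
df = pd.DataFrame({'COLUMN': ['wcdma street view disconnected',
                              'gbts planned work street view',
                              'lte atn golden village optical invalid',
                              'wcdma street view planned work',
                              'unrelated alarm text']})  # hypothetical row with no match

def first_match(text, phrases):
    # First phrase whose words all occur in the text, or None if there is none.
    return next((p for p in phrases
                 if all(word in text for word in p.split())), None)

df['String'] = df['COLUMN'].apply(lambda text: first_match(text, lst))
print(df)
```

Rows without a match end up with `None` in the `String` column instead of stopping the whole `apply` with an exception.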
import pandas as pd

list1 = ['golden village lte', 'pones wcdma', 'coral gbts', 'street view gbts', 'street view wcdma']
# Split every phrase into its words, e.g. ['golden', 'village', 'lte']
list2 = [x.split(' ') for x in list1]
data = {'COLUMN': ['wcdma street view disconnected', 'gbts planned work street view', 'lte atn golden village optical invalid', 'wcdma street view planned work']}
data = pd.DataFrame(data)

def search(x):
    # Words of the current row
    words = x.split(' ')
    for y in list2:
        # True only if every word of the phrase occurs in the row
        if all(item in words for item in y):
            return ' '.join(y)
    return None

data['matched'] = data['COLUMN'].transform(search)
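Explanation: I first split each string on spaces into a list of words. Using transform() on 'COLUMN', I use all() to check whether every word of a phrase 'y' is among the words of the row. If so, I return that phrase joined back into a string; otherwise the function returns None.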
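Note that testing membership in a list of words is stricter than testing `word in text` on the raw string, which also accepts partial-word hits. The sketch below contrasts the two; the function names and the sample row are made up for illustration and do not come from the question's data.

```python
def substring_match(text, phrase):
    # Each word only has to appear somewhere inside the raw string.
    return all(word in text for word in phrase.split())

def whole_word_match(text, phrase):
    # Each word has to equal one of the space-separated tokens of the text.
    tokens = text.split()
    return all(word in tokens for word in phrase.split())

row = 'filter village golden alarm'  # hypothetical row: 'lte' hides inside 'filter'

print(substring_match(row, 'golden village lte'))   # True
print(whole_word_match(row, 'golden village lte'))  # False
```

Which behaviour is preferable depends on the data: substring matching tolerates prefixes and suffixes, while whole-word matching avoids accidental hits on short tokens.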