Match all words from string in another string (words can be in different positions)

I have a list of strings that must be matched against a DataFrame column.

The list looks like this:

list = ['golden village lte', 'pones wcdma', 'coral gbts', 'street view gbts', 'street view wcdma']

The DataFrame column looks like this:

data = {'COLUMN': ['wcdma street view disconnected', 'gbts planned work street view', 'lte atn golden village optical invalid', 'wcdma street view planned work']}

For each row, I want to find the string from the list whose words all appear in that row, so that I end up with the following DataFrame:

  COLUMN                               |  String    
 wcdma street view disconnected        | street view wcdma  
 gbts planned work street view         | street view gbts  
 lte atn golden village optical invalid| golden village lte  
 wcdma street view planned work        | street view wcdma   

I tried to find matches by passing a string from the list as a list of its words (such as ['street', 'view', 'wcdma']) and searching:

df.apply(lambda x: all(er in x.COLUMN for er in list), axis=1)

But this returns nothing for me, even though I know there must be at least one match. If I change all() to any(), it returns something, but that is not what I need.

You can try this.

df = pd.DataFrame({'COLUMN': ['wcdma street view disconnected', 'gbts planned work street view', 'lte atn golden village optical invalid', 'wcdma street view planned work']})
df

                                   COLUMN
0          wcdma street view disconnected
1           gbts planned work street view
2  lte atn golden village optical invalid
3          wcdma street view planned work

Now, use apply on the column:

lst = ['golden village lte', 'pones wcdma', 'coral gbts', 'street view gbts', 'street view wcdma']  
df['String'] = df.COLUMN.apply(lambda x:[i for i in lst if all(j in x for j in i.split())].pop())
df
                                   COLUMN              String
0          wcdma street view disconnected   street view wcdma
1           gbts planned work street view    street view gbts
2  lte atn golden village optical invalid  golden village lte
3          wcdma street view planned work   street view wcdma
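
One caveat with the one-liner above: .pop() raises IndexError for any row that matches none of the phrases. A minimal sketch of a more defensive variant follows, assuming that unmatched rows should simply get None. The helper name first_match and the third sample row are hypothetical and not part of the original question.

```python
import pandas as pd

lst = ['golden village lte', 'pones wcdma', 'coral gbts',
       'street view gbts', 'street view wcdma']

df = pd.DataFrame({'COLUMN': ['wcdma street view disconnected',
                              'gbts planned work street view',
                              'no phrase matches this row']})  # hypothetical unmatched row

def first_match(text, phrases):
    # Hypothetical helper: return the first phrase whose words all occur
    # in text (substring check, as in the one-liner), or None if none do.
    return next((p for p in phrases if all(w in text for w in p.split())), None)

df['String'] = df['COLUMN'].apply(lambda x: first_match(x, lst))
print(df)
```

Unlike .pop(), which takes the last matching phrase, next() takes the first one; for the sample data each row matches at most one phrase, so the result is the same.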

import pandas as pd

# Phrases to look for; each phrase is split into a list of its words.
list2 = ['golden village lte', 'pones wcdma', 'coral gbts', 'street view gbts', 'street view wcdma']
list2 = [x.split(' ') for x in list2]
data = {'COLUMN': ['wcdma street view disconnected', 'gbts planned work street view', 'lte atn golden village optical invalid', 'wcdma street view planned work']}
data = pd.DataFrame(data)

def search(x):
    # Words of the current row.
    list1 = x.split(' ')
    for y in list2:
        # True when every word of the phrase occurs in the row.
        check = all(item in list1 for item in y)
        if check:
            return ' '.join(y)
    return None

data['matched'] = data['COLUMN'].transform(search)

Explanation: I first split each string on spaces into a list of words. Using transform() on 'COLUMN', I use all() to check whether every element of 'y' is in 'list1'. If so, I return that string.
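
The same whole-word check can also be written as a set comparison. The following is only a sketch of that idea, not the method used above; the function name match_words and the 'streetview' test string are hypothetical. It assumes words are separated by whitespace and that the first matching phrase should win.

```python
phrases = ['golden village lte', 'pones wcdma', 'coral gbts',
           'street view gbts', 'street view wcdma']

def match_words(text, phrases):
    # Hypothetical helper: compare whole words, not substrings.
    words = set(text.split())
    for phrase in phrases:
        # set <= set is a subset test: every word of the phrase is in the row.
        if set(phrase.split()) <= words:
            return phrase
    return None

print(match_words('wcdma street view planned work', phrases))
# Whole-word matching does not treat 'streetview' as 'street' plus 'view'.
print(match_words('wcdma streetview disconnected', phrases))
```

Because it compares whole words, this version will not match 'view' inside a longer token, whereas a substring test such as `'view' in text` would.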