Search a list of lists of strings inside a list of strings in Python

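I want to search for a list of lists of strings inside another list of strings in Python. When a match is found, I want to retrieve the matching strings from both lists. I also want to get partial matches. Both list 1 and list 2 are very large, so I am only providing a sample.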

Example:

list1 = [ 'The tablets are filled into cylindrically shaped bottles made of white coloured\npolyethylene. The volumes of the bottles depend on the tablet strength and amount of\ntablets, ranging from 20 to 175 ml. The screw type cap is made of white coloured\npolypropylene and is equipped with a tamper proof ring.', 'PVC/PVDC blister pack', 'Blisters are made in a cold-forming process from an aluminium base web. Each tablet is\nfilled into a separate blister and a lidding foil of aluminium is welded on. The blisters\nare opened by pressing the tablets through the lidding foil.', '\n']



list2 = [['Blister', 'Foil', 'Aluminium'], ['Blister', 'Base Web', 'PVC/PVDC'], ['Bottle', 'Cylindrically shaped Bottles', 'Polyethylene'], ['Bottle', 'Screw Type Cap', 'Polypropylene'], ['Bottle', 'Safety Ring', ''], ['Blister', 'Base Web', 'PVC'], ['Blister', 'Base Web', 'PVD/PVDC'], ['Bottle', 'Square Shaped Bottle', 'Polyethylene']]

If the matches from list 2 are not found in the same string of list 1, each match should be output as a separate stage.

Expected output for the example:

Stage 1: 'The tablets are filled into cylindrically shaped bottles made of white coloured\npolyethylene. The volumes of the bottles depend on the tablet strength and amount of\ntablets, ranging from 20 to 175 ml. The screw type cap is made of white coloured\npolypropylene and is equipped with a tamper proof ring.', values : ['Bottle', 'Cylindrically shaped Bottles', 'Polyethylene']

Stage 2: 'Blisters are made in a cold-forming process from an aluminium base web. Each tablet is\nfilled into a separate blister and a lidding foil of aluminium is welded on. The blisters\nare opened by pressing the tablets through the lidding foil.', Values: ['Blister', 'Foil', 'Aluminium']

Matching conditions:

1.) Line breaks (\n) in list 1 should be ignored when matching.
2.) Matching list 2 against list 1 should ignore plural/singular forms, so 'Bottle' should match 'bottles' in list 1.
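
The two conditions can be sketched as a normalisation step applied to both sides before comparing. The helper names `normalise` and `term_in_text` are hypothetical, and the plural handling is a deliberately naive assumption (only a trailing 's' is stripped), so irregular plurals are not covered:

```python
import re

def normalise(text):
    """Lower-case `text` and collapse newlines, tabs and repeated spaces into single spaces."""
    return re.sub(r"\s+", " ", text).strip().lower()

def term_in_text(term, text):
    """Check whether `term` occurs in `text`, ignoring case, line breaks and a trailing 's'.

    Assumption: plurals are formed by appending 's'. An empty term matches
    trivially, mirroring Python's `'' in some_string`.
    """
    term_n = normalise(term)
    text_n = normalise(text)
    candidates = {term_n}
    if term_n.endswith("s"):
        candidates.add(term_n[:-1])  # also try the naive singular form
    return any(c in text_n for c in candidates)

print(term_in_text("Bottle", "shaped\nbottles made of"))                            # True
print(term_in_text("Cylindrically shaped Bottles", "a cylindrically\nshaped bottle"))  # True
print(term_in_text("Safety Ring", "a tamper proof ring"))                           # False
```

Because this is a substring test, 'Bottle' already matches 'bottles' without any special handling; the suffix rule only matters when the search term is the plural and the text holds the singular.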

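I tried this code, which I found on Stack Overflow, but it does not really work. With it I cannot get multiple matches, nor can I retrieve the whole string from list 1 that contains the values from list 2. It only lists some of the values from list 2: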

from itertools import product

def generate_edges(iterable, control):
    edges = []
    control_set = set(control)
    for e in iterable:
        e_set = set(e)
        common = e_set & control_set
        to_pair = e_set - common
        edges.extend(product(to_pair, common))
    return edges

generate_edges(list2, list1)
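
A likely reason this approach finds so little: `set(e) & set(control)` keeps only elements that are exactly equal to a whole string in the other list, so a keyword embedded in a longer sentence never counts. A minimal sketch with shortened, made-up sample data (the names `paragraphs` and `keywords` are illustrative) contrasts that with a substring test:

```python
paragraphs = ["PVC/PVDC blister pack", "Polyethylene"]
keywords = ["Blister", "Base Web", "Polyethylene"]

# Set intersection compares whole strings for exact equality.
exact = set(keywords) & set(paragraphs)

# A case-insensitive substring test also finds keywords inside longer strings.
partial = [(k, p) for k in keywords for p in paragraphs if k.lower() in p.lower()]

print(exact)    # {'Polyethylene'}
print(partial)  # [('Blister', 'PVC/PVDC blister pack'), ('Polyethylene', 'Polyethylene')]
```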

Latest changes:

# final_ref (keyword lists) and paragraphs (description strings) are assumed to be defined earlier.
colours = ["White","Yellow","Blue","Red","Green","Black","Brown","Silver","Purple","Navy blue","Gray","Orange","Maroon","pink","colourless","blue"]
result = []
counter = 1

for words in final_ref:
    for sen in paragraphs:
        # Every keyword must occur in the paragraph (case-insensitive substring match).
        if not all(w.lower() in sen.lower() for w in words):
            continue
        colour = None
        if words[0] == 'Bottle':
            # Take the first listed colour that appears as a whole word in the paragraph.
            # Multi-word colours such as "Navy blue" cannot match a single split word.
            sen_words = sen.lower().split()
            colour = next((c for c in colours if c.lower() in sen_words), None)
        fr = f"Stage {counter}: Package Description: {sen} Values: {words} Colour: {colour}"
        # Strip line breaks and tabs from the finished entry.
        result.append(fr.replace('\n', '').replace('\t', ''))
        counter += 1
print(result)

Normally you are expected to show some effort before getting an answer, but this should help you this time:

counter = 1
for words in list2:
    for sen in list1:
        all_exist = True
        for w in words:
            if w.lower() not in sen.lower():
                all_exist = False
                break
        if all_exist:
            print("Stage " + str(counter) + ": " + sen + " Values" + str(words) + "\n")
            counter += 1

Output:

Stage 1: Blisters are made in a thermo-forming process from a PVC/PVDC base web. Each tablet
is filled into a separate blister and a lidding foil of aluminium is welded on. The blisters
are opened by pressing the tablets through the lidding foil. PVDC foil is in contact with
the tablets. Values['Blister', 'Foil', 'Aluminium']

Stage 2: Blisters are made in a cold-forming process from an aluminium base web. Each tablet is
filled into a separate blister and a lidding foil of aluminium is welded on. The blisters
are opened by pressing the tablets through the lidding foil. Values['Blister', 'Foil', 'Aluminium']

Stage 3: Blisters are made in a thermo-forming process from a PVC/PVDC base web. Each tablet
is filled into a separate blister and a lidding foil of aluminium is welded on. The blisters
are opened by pressing the tablets through the lidding foil. PVDC foil is in contact with
the tablets. Values['Blister', 'Base Web', 'PVC/PVDC']

Stage 4: The tablets are filled into cylindrically shaped bottles made of white coloured
polyethylene. The volumes of the bottles depend on the tablet strength and amount of
tablets, ranging from 20 to 175 ml. The screw type cap is made of white coloured
polypropylene and is equipped with a tamper proof ring. Values['Bottle', 'Cylindrically shaped Bottles', 'Polyethylene']

Stage 5: The tablets are filled into cylindrically shaped bottles made of white coloured
polyethylene. The volumes of the bottles depend on the tablet strength and amount of
tablets, ranging from 20 to 175 ml. The screw type cap is made of white coloured
polypropylene and is equipped with a tamper proof ring. Values['Bottle', 'Screw Type Cap', 'Polypropylene']

Stage 6: Blisters are made in a thermo-forming process from a PVC/PVDC base web. Each tablet
is filled into a separate blister and a lidding foil of aluminium is welded on. The blisters
are opened by pressing the tablets through the lidding foil. PVDC foil is in contact with
the tablets. Values['Blister', 'Base Web', 'PVC']
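
The output above repeats a paragraph once for every entry of list 2 that matches it. If one stage per paragraph is preferred, the matching entries could be grouped instead. This is only a sketch under that assumption: the function name `stages` and the shortened sample data are made up for illustration, and plural handling is left to plain substring matching.

```python
def stages(paragraphs, keyword_lists):
    """Yield (stage_number, paragraph, matching_keyword_lists), one stage per matching paragraph."""
    counter = 1
    for sen in paragraphs:
        # Collapse line breaks and extra spaces so phrases split across lines still match.
        flat = " ".join(sen.split()).lower()
        hits = [words for words in keyword_lists
                if all(w.lower() in flat for w in words)]
        if hits:
            yield counter, sen, hits
            counter += 1

list1 = [
    "The tablets are filled into cylindrically shaped bottles made of white coloured\n"
    "polyethylene. The screw type cap is made of white coloured\npolypropylene.",
    "PVC/PVDC blister pack",
    "\n",
]
list2 = [
    ["Bottle", "Cylindrically shaped Bottles", "Polyethylene"],
    ["Bottle", "Screw Type Cap", "Polypropylene"],
    ["Blister", "Foil", "Aluminium"],
]

for n, sen, hits in stages(list1, list2):
    print(f"Stage {n}: {sen!r} Values: {hits}")
```

With this sample only the first paragraph produces a stage, and it carries both 'Bottle' entries; 'PVC/PVDC blister pack' is skipped because no entry has all three of its keywords in that string.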