Iterate row by row through a column of lists and return matches occurring more than X times in a new pandas DataFrame

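I'm trying to iterate through the lists stored in a pandas DataFrame column and return, in a new DataFrame, the keywords that match the lists in other rows more than 3 times.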

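The data looks like this:

Desired output:

(These keywords are in the output because each of them is found in the lists of at least three other rows.)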
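Since a keyword also occurs in its own row, appearing in at least three other rows means appearing in at least four rows in total, which is "more than 3 times" as long as no list repeats a keyword. A minimal pure-Python sketch of that rule, assuming the lists have already been parsed into real Python lists; `keywords_in_other_rows` and `min_other_rows` are hypothetical names used only for illustration:

```python
from collections import Counter


def keywords_in_other_rows(rows, min_other_rows=3):
    """Hypothetical helper: keep keywords found in at least `min_other_rows`
    rows besides their own, returning {keyword: number_of_rows}."""
    # set(row) makes each keyword count at most once per row
    row_counts = Counter(kw for row in rows for kw in set(row))
    # Subtract 1 so a keyword's own row is not counted as "another" row
    return {kw: n for kw, n in row_counts.items() if n - 1 >= min_other_rows}


# Toy data: 'a' is in 4 rows, 'b' in 3, 'c' in 2
rows = [["a", "b"], ["a", "c"], ["a", "b"], ["a"], ["b", "c"]]
print(keywords_in_other_rows(rows))  # {'a': 4}
```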

Minimal reproducible example:

import pandas as pd

# Initialize data. Note that each 'keyword' value is a string representation of a list, not an actual list.
data = {'url': ["www.bbc.co.uk", "www.cabinzero.com", "www.cntraveller.com", "www.forbes.com", "www.gov.scot", "www.gov.uk", "www.ons.gov.uk"],
        'keyword': ["['amber travel list', 'travel amber list', 'amber list countries uk travel', 'travel amber list countries', 'amber list countries travel']", "['amber list countries uk travel', 'travel amber list countries', 'amber travel list', 'travel amber list', 'amber list countries travel']", "['travel amber list', 'amber list countries uk travel', 'amber travel list', 'amber list countries travel', 'travel amber list countries']", "['amber travel list', 'travel amber list countries', 'travel amber list', 'amber list countries travel', 'amber list countries uk travel']", "['amber list countries travel', 'travel amber list countries', 'amber list countries uk travel', 'travel amber list', 'amber travel list']", "['amber list countries travel', 'amber list countries uk travel', 'amber travel list']", "['amber list countries uk travel', 'amber travel list', 'travel amber list countries', 'amber list countries travel']"]}

# Create DataFrame
df = pd.DataFrame(data)

# Print the output.
print(df)
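Because each `keyword` cell in this example is a string that only looks like a list, it has to be parsed before its elements can be compared across rows. A minimal sketch, assuming every cell is a valid Python list literal: `ast.literal_eval` parses the string without executing arbitrary code (unlike `eval`), and `DataFrame.explode` (available since pandas 0.25) turns the result into one row per (url, keyword) pair. The two-row frame is a shortened stand-in for the data above.

```python
import ast

import pandas as pd

# Shortened stand-in for the question's data: each cell is a string of a list
df = pd.DataFrame({
    "url": ["www.bbc.co.uk", "www.gov.uk"],
    "keyword": [
        "['amber travel list', 'travel amber list']",
        "['amber list countries travel', 'amber travel list']",
    ],
})

# Parse each string into a real Python list without using eval()
df["keyword"] = df["keyword"].apply(ast.literal_eval)

# One row per (url, keyword) pair, with a fresh 0..n-1 index
long_df = df.explode("keyword").reset_index(drop=True)
print(long_df)
```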

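What I've tried: I tried flattening the column of lists into a single list and iterating over it to count occurrences, but I couldn't get it to work, and I'm not sure this is the best approach anyway.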

If each keyword is unique within its own list, you can do the following:

import ast
from itertools import chain

import numpy as np
import pandas as pd

# Parse each string such as "['a', 'b']" into a real list (ast.literal_eval is safer than eval)
listed_keywords = df.keyword.apply(ast.literal_eval).values  # returns an array of lists
all_keywords = list(chain.from_iterable(listed_keywords))  # Concatenate all the lists into one global list of keywords

# Return the unique keywords and their respective frequency among all the keywords
unique_keyword, nunique_keyword = np.unique(all_keywords, return_counts=True)

# Create a DataFrame so you can easily filter according to keyword frequency
df_keywords = pd.DataFrame(dict(keyword=unique_keyword, frequency=nunique_keyword))
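The remaining step is the filter itself, e.g. `df_keywords[df_keywords['frequency'] > 3]`. If a keyword could repeat inside a single list, counting distinct rows instead of raw occurrences avoids over-counting. A minimal pandas sketch of that variant, assuming `url` uniquely identifies a row and using a hypothetical five-row stand-in for the data (with a deliberate duplicate in the first list); the threshold of 3 comes from the question:

```python
import ast

import pandas as pd

# Hypothetical stand-in data; the first list repeats 'amber travel list'
df = pd.DataFrame({
    "url": ["u1", "u2", "u3", "u4", "u5"],
    "keyword": [
        "['amber travel list', 'amber travel list', 'travel amber list']",
        "['amber travel list', 'travel amber list']",
        "['amber travel list']",
        "['amber travel list', 'amber list countries travel']",
        "['travel amber list']",
    ],
})

# Parse the strings and flatten to one row per (url, keyword) pair
long_df = df.assign(keyword=df["keyword"].apply(ast.literal_eval)).explode("keyword")

# Count how many distinct rows (urls) each keyword appears in
rows_per_keyword = long_df.groupby("keyword")["url"].nunique()

# Keep keywords found in more than 3 rows (their own row plus at least 3 others)
matches = rows_per_keyword[rows_per_keyword > 3].reset_index(name="rows")
print(matches)
```

Here 'amber travel list' occurs 5 times but in only 4 distinct rows, so the row count is what gets compared with the threshold. On the full data in the question, all five keywords clear the threshold, since the least common one ('travel amber list') still appears in 5 of the 7 rows.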

Hope this answers your question!