Iterate row by row through a column of lists and return matches found more than X times in a new pandas DataFrame
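I'm trying to iterate through the lists in a pandas DataFrame column and return, in a new DataFrame, the keywords that match lists in other rows more than 3 times.
The data looks like this:
Desired output:
(These particular keywords are in the output because they are found in the lists of at least three other rows.)
Minimal reproducible example: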
import pandas as pd
# initialize data of lists.
data = {'url': ["www.bbc.co.uk", "www.cabinzero.com", "www.cntraveller.com", "www.forbes.com", "www.gov.scot", "www.gov.uk", "www.ons.gov.uk"],
'keyword': ["['amber travel list', 'travel amber list', 'amber list countries uk travel', 'travel amber list countries', 'amber list countries travel']", "['amber list countries uk travel', 'travel amber list countries', 'amber travel list', 'travel amber list', 'amber list countries travel']", "['travel amber list', 'amber list countries uk travel', 'amber travel list', 'amber list countries travel', 'travel amber list countries']", "['amber travel list', 'travel amber list countries', 'travel amber list', 'amber list countries travel', 'amber list countries uk travel']", "['amber list countries travel', 'travel amber list countries', 'amber list countries uk travel', 'travel amber list', 'amber travel list']", "['amber list countries travel', 'amber list countries uk travel', 'amber travel list']", "['amber list countries uk travel', 'amber travel list', 'travel amber list countries', 'amber list countries travel']"]}
# Create DataFrame
df = pd.DataFrame(data)
# Print the output.
print(df)
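One way to read "row by row" literally is this: for each row, keep only the keywords that also occur in at least three *other* rows' lists. The sketch below assumes that reading. It uses a small made-up dataset in the question's format (each `keyword` cell is a string that looks like a list) and a hypothetical helper `matches_per_row` with a `min_other_rows` parameter. To run it on the real data, swap in the `df` from the example above.

```python
import ast
from collections import Counter

import pandas as pd

# Hypothetical sample data in the same shape as the question: each "keyword"
# cell is a *string* that looks like a Python list.
data = {
    "url": ["a.example", "b.example", "c.example", "d.example", "e.example"],
    "keyword": [
        "['amber travel list', 'travel amber list', 'green list']",
        "['amber travel list', 'travel amber list']",
        "['amber travel list', 'red list']",
        "['amber travel list', 'travel amber list']",
        "['travel amber list', 'green list']",
    ],
}
df = pd.DataFrame(data)


def matches_per_row(df: pd.DataFrame, min_other_rows: int = 3) -> pd.DataFrame:
    """Hypothetical helper: for each row, keep the keywords that also appear
    in at least `min_other_rows` *other* rows' lists."""
    # Parse the list-like strings into real lists without using eval().
    lists = df["keyword"].apply(ast.literal_eval)
    # Number of rows containing each keyword; set() counts a row only once.
    row_counts = Counter(kw for kws in lists for kw in set(kws))
    # row_counts[kw] includes the current row, so subtract 1 for "other" rows.
    # dict.fromkeys drops repeats within a list while keeping their order.
    matched = lists.apply(
        lambda kws: [kw for kw in dict.fromkeys(kws) if row_counts[kw] - 1 >= min_other_rows]
    )
    return pd.DataFrame({"url": df["url"], "matches": matched})


result = matches_per_row(df)
print(result)
```

Because each row is counted at most once per keyword (via `set`), a keyword repeated inside the same list does not inflate its score. `ast.literal_eval` also parses the list strings without the security risks of `eval`.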
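What I've tried
I tried dumping the list column into a single list and iterating over it to count occurrences, but I couldn't get it to work, and I'm not sure this is the best approach.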
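The flatten-then-count approach described above does work. The following is a minimal sketch using `collections.Counter`. It assumes a made-up sample of the `keyword` column and reads "more than 3 times" as a count greater than 3. Note that it counts occurrences, not rows, so a keyword repeated within one list is counted more than once.

```python
import ast
from collections import Counter
from itertools import chain

# Hypothetical sample: values of the "keyword" column, stored as strings.
keyword_column = [
    "['amber travel list', 'travel amber list', 'green list']",
    "['amber travel list', 'travel amber list']",
    "['amber travel list', 'red list']",
    "['amber travel list', 'travel amber list']",
    "['travel amber list', 'green list']",
]

# 1. Parse each string into a list, then flatten everything into one list.
all_keywords = list(chain.from_iterable(ast.literal_eval(s) for s in keyword_column))

# 2. Count how often each keyword occurs across the whole column.
counts = Counter(all_keywords)

# 3. Keep keywords that occur more than 3 times (sorted for stable output).
frequent = sorted(kw for kw, n in counts.items() if n > 3)
print(frequent)
```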
If each keyword appears only once within its own list, you can do:
import ast
from itertools import chain
import numpy as np
import pandas as pd
listed_keywords = df.keyword.apply(ast.literal_eval).values  # parse each string into a list (safer than eval); returns an array of lists
all_keywords = list(chain.from_iterable(listed_keywords))  # concatenate all the lists into one global list of keywords
unique_keyword, nunique_keyword = np.unique(all_keywords, return_counts=True)  # unique keywords and their frequency across all keywords
df_keywords = pd.DataFrame(dict(keyword=unique_keyword, frequency=nunique_keyword))  # a DataFrame you can easily filter by keyword frequency
Hope this answers your question!
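If keywords can repeat inside a single list, the frequency above counts occurrences rather than rows. A pandas-only sketch that removes that restriction works as follows: de-duplicate each list, `explode` it to one keyword per element, then use `value_counts`. The data is made up for illustration (row 0 repeats `'green list'`), and "at least three other rows" is taken to mean a row count of 4 or more.

```python
import ast

import pandas as pd

# Hypothetical sample data; row 0 repeats 'green list' inside its own list.
df = pd.DataFrame({
    "url": ["a.example", "b.example", "c.example", "d.example", "e.example"],
    "keyword": [
        "['amber travel list', 'green list', 'green list', 'green list']",
        "['amber travel list', 'travel amber list']",
        "['amber travel list', 'travel amber list']",
        "['amber travel list', 'travel amber list']",
        "['travel amber list', 'green list']",
    ],
})

# Parse each string and drop repeats within the same list (order preserved).
keywords = df["keyword"].apply(lambda s: list(dict.fromkeys(ast.literal_eval(s))))

# One element per keyword, then count the number of rows each keyword is in.
row_freq = keywords.explode().value_counts()

# Keep keywords present in at least 4 rows (the row itself plus 3 others).
frequent = row_freq[row_freq >= 4].sort_index()
print(frequent)
```

With a plain occurrence count, `'green list'` would reach 4 here and pass the filter. Counted by rows, it appears in only 2.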