How to store a dictionary inside defaultdict(list)
import pandas as pd
import re
from collections import defaultdict
d = defaultdict(list)
df = pd.read_csv('https://raw.githubusercontent.com/twittergithub/hello/main/category_app_id_text_1_month_march_2021%20(1).csv')
The DataFrame looks like this:
                                             suggestions                                           category
0      ['jio tv', 'jio', 'jiosaavn', 'jiomart', 'jio ...  ['BOOKS_AND_REFERENCE', 'PRODUCTIVITY', 'MUSIC...
1      ['instagram', 'internet', 'instacart', 'instag...  ['SOCIAL', 'COMMUNICATION', 'FOOD_AND_DRINK', ...
2      ['instagram', 'instacart', 'instagram download...  ['SOCIAL', 'FOOD_AND_DRINK', 'VIDEO_PLAYERS', ...
3      ['vpn', 'vpn free', 'vpn master', 'vpn private...  ['TOOLS', 'TOOLS', 'TOOLS', 'TOOLS', 'TOOLS', ...
4      ['pubg', 'pubg mobile lite', 'pubg lite', 'pub...  ['GAME_ACTION', 'GAME_ACTION', 'TOOLS', 'GAME_...
...                                                  ...                                                ...
49610  ['inbuilt camera app', 'inbuilt screen recorde...  ['PHOTOGRAPHY', 'VIDEO_PLAYERS', 'TOOLS', 'PRO...
49611  ['mpsc science app in marathi', 'mpsc science ...  ['EDUCATION', 'EDUCATION', 'EDUCATION', 'EDUCA...
49612  ['ryerson', 'ryerson university', 'ryerson mob...  ['BOOKS_AND_REFERENCE', 'EDUCATION', 'EDUCATIO...
49613  ['eeze', 'eezee english', 'ezee tab', 'deezer'...  ['TRAVEL_AND_LOCAL', 'EDUCATION', 'BUSINESS', ...
49614  ['hindi love story books free download', 'hind...  ['BOOKS_AND_REFERENCE', 'BOOKS_AND_REFERENCE',...
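I want to build a dictionary keyed by category. For every row, and for each item in that row's category list, I want to create a nested dictionary of the suggestions from the suggestions column. If a suggestion or category is repeated, the counter inside the dictionary should just be incremented.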
dictionary = defaultdict(list)
for i in range(df.shape[0]):
    categories = set(re.sub(r'[^\w\s]', '', df.loc[i, 'category']).split())
    for category in categories:
        suggestions = set(re.sub(r'[^\w\s]', '', df.loc[i, 'suggestions']).split())
        for suggestion in suggestions:
            if suggestion not in dictionary[category]:
                dictionary[category][suggestion] = 1
            else:
                dictionary[category][suggestion] += 1
But I am getting empty lists for the categories in the defaultdict. I hope my question is clear.
Doing this with pandas is probably easier and faster:
from ast import literal_eval
# create cartesian product of categories and suggestions for each record,
# and calculate value_counts
z = pd.merge(
    df['category'].apply(literal_eval).explode(),
    df['suggestions'].apply(literal_eval).explode(),
    left_index=True,
    right_index=True).value_counts()
# convert to nested dict
d = {l: z.xs(l).to_dict() for l in z.index.levels[0]}
d
Output:
{'ART_AND_DESIGN': {'flipaclip': 39,
  'mehndi design': 28,
  'ibis paint x': 22,
  'u launcher lite': 21,
  'poster maker': 20,
  'poster maker design app free': 20,
  'ibis paint': 18,
  'mehndi design 2021': 18,
  'mehandi ka design': 18,
  'u launcher': 18,
  ...
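To make it easier to see what the merge and value_counts steps compute, here is a minimal sketch on made-up data. The `toy` frame and the names `pairs`, `counts` and `nested` are assumptions for illustration only; the real CSV is not used. It assumes pandas 1.1 or newer, where `DataFrame.value_counts` is available.

```python
from ast import literal_eval

import pandas as pd

# Hypothetical toy data in the same shape as the CSV: every cell holds the
# string representation of a Python list.
toy = pd.DataFrame({
    "suggestions": ["['jio tv', 'jio']", "['vpn', 'jio']"],
    "category": ["['TOOLS', 'MUSIC']", "['TOOLS', 'TOOLS']"],
})

# Parse each cell into a real list, explode to one item per row, then join the
# two exploded Series on the original row index. Because the index values
# repeat, the join yields every (category, suggestion) pair within a record.
pairs = pd.merge(
    toy["category"].apply(literal_eval).explode(),
    toy["suggestions"].apply(literal_eval).explode(),
    left_index=True,
    right_index=True,
)

# Count identical pairs; the result is a Series with a two-level MultiIndex.
counts = pairs.value_counts()

# One inner dict per category (the first index level).
nested = {cat: counts.xs(cat).to_dict() for cat in counts.index.levels[0]}
print(nested)
```

Note that in this sketch a category listed twice in the same record (as `'TOOLS'` is in the second toy row) contributes twice to the counts. If each record should count a category only once, the parsed lists would need to be de-duplicated before exploding.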
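That said, if you want to keep your original approach, you only need to declare dictionary as defaultdict(dict) instead of defaultdict(list), because you are building a nested dictionary, not a dictionary of lists: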
dictionary = defaultdict(dict)
for i in range(df.shape[0]):
    ...
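For reference, a complete version of the loop might look like the sketch below. It runs on a couple of made-up rows instead of the real DataFrame (the `rows` list is hypothetical), and it swaps the regex-and-split parsing for `ast.literal_eval`, which is an assumption on my part: splitting on whitespace would break a multi-word suggestion such as 'jio tv' into separate words.

```python
from ast import literal_eval
from collections import defaultdict

# Hypothetical rows standing in for the (suggestions, category) columns;
# each value is the string form of a list, as in the CSV.
rows = [
    ("['jio tv', 'jio']", "['TOOLS', 'MUSIC']"),
    ("['vpn', 'jio']", "['TOOLS', 'TOOLS']"),
]

# defaultdict(dict) creates an empty inner dict the first time a category is seen.
dictionary = defaultdict(dict)

for suggestions_str, categories_str in rows:
    # set() drops repeats within a single row, as the original loop does.
    categories = set(literal_eval(categories_str))
    suggestions = set(literal_eval(suggestions_str))
    for category in categories:
        for suggestion in suggestions:
            # dict.get supplies 0 the first time a suggestion appears.
            inner = dictionary[category]
            inner[suggestion] = inner.get(suggestion, 0) + 1

print(dict(dictionary))
```

Because of the set() calls, a category or suggestion repeated inside one row is counted once for that row; counts only grow when the same pair shows up in different rows.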