Compare two defaultdicts and keep only records that satisfy a condition
I have a df that I want to convert into a dictionary keyed by id, with a list of dictionaries as the values:
import pandas as pd
from collections import defaultdict

data = {'id': [1,1,1,1,2,2,3,3,3,4,4,4,4],
        'label': ['A','A','B','G','A','BB','C','C','A','BB','B','AA','AA'],
        'amount': [2,-12,12,-12,5,-5,2,3,5,3,3,10,-10]}
df = pd.DataFrame(data)

d = defaultdict(lambda: defaultdict(list))
# only append the negative amounts
for index, row in df.iterrows():
    if row["amount"] < 0:
        d[row["id"]][row["amount"]].append(
            {"id": row["id"],
             "label": row["label"]})
print(d)
Out: defaultdict(<function __main__.<lambda>()>,
            {1: defaultdict(list,
                            {-12: [{'id': 1, 'label': 'A'},
                                   {'id': 1, 'label': 'G'}]}),
             2: defaultdict(list, {-5: [{'id': 2, 'label': 'BB'}]}),
             4: defaultdict(list, {-10: [{'id': 4, 'label': 'AA'}]})})
d2 = defaultdict(lambda: defaultdict(list))
# only append the positive amounts
for index, row in df.iterrows():
    account_id = row["id"]
    amount = row["amount"]
    if amount > 0:
        d2[account_id][amount].append(
            {"id": account_id,
             "label": row["label"]})
print(d2)
Out: defaultdict(<function __main__.<lambda>()>,
            {1: defaultdict(list,
                            {2: [{'id': 1, 'label': 'A'}],
                             12: [{'id': 1, 'label': 'B'}]}),
             2: defaultdict(list, {5: [{'id': 2, 'label': 'A'}]}),
             3: defaultdict(list,
                            {2: [{'id': 3, 'label': 'C'}],
                             3: [{'id': 3, 'label': 'C'}],
                             5: [{'id': 3, 'label': 'A'}]}),
             4: defaultdict(list,
                            {3: [{'id': 4, 'label': 'BB'},
                                 {'id': 4, 'label': 'B'}],
                             10: [{'id': 4, 'label': 'AA'}]})})
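How can I compare the two dictionaries so that I keep only the records where the same id has a matching positive and negative amount, giving a dictionary like the one below? I want to use only dictionaries, not pandas operations.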
defaultdict(<function __main__.<lambda>()>,
            {1: defaultdict(list,
                            {-12: [{'id': 1, 'label': 'A'},
                                   {'id': 1, 'label': 'G'}],
                             12: [{'id': 1, 'label': 'B'}]}),
             2: defaultdict(list, {-5: [{'id': 2, 'label': 'BB'}],
                                   5: [{'id': 2, 'label': 'A'}]}),
             4: defaultdict(list, {-10: [{'id': 4, 'label': 'AA'}],
                                   10: [{'id': 4, 'label': 'AA'}]})})
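So you basically want to keep, for each id, only the amounts that exist as both a positive and a negative integer? I would suggest filtering in pandas before converting to a dictionary. You can group by id and then filter each group by checking which amounts also appear in the negated amount Series of that same group: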
new_df = df.groupby('id').apply(lambda g: g[g['amount'].isin(g['amount']*-1)]).reset_index(drop=True)
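Output:

|   | id | label | amount |
|---|---|---|---|
| 0 | 1 | A | -12 |
| 1 | 1 | B | 12 |
| 2 | 1 | G | -12 |
| 3 | 2 | A | 5 |
| 4 | 2 | BB | -5 |
| 5 | 4 | AA | 10 |
| 6 | 4 | AA | -10 |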
Then you can export the whole filtered df in whatever shape you like:
d = defaultdict(lambda: defaultdict(list))
for index, row in new_df.iterrows():
    d[row["id"]][row["amount"]].append(
        {"id": row["id"],
         "label": row["label"]})
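The same group-then-filter idea can also be sketched without pandas. The snippet below is only an illustration under stated assumptions: `rows` is a hypothetical list of `(id, label, amount)` tuples standing in for the rows of df, and the names `amounts_by_id`, `kept`, and `nested` are made up for this example.

```python
from collections import defaultdict

# Hypothetical plain-Python stand-in for the rows of df: (id, label, amount).
rows = [(1, "A", 2), (1, "A", -12), (1, "B", 12), (1, "G", -12),
        (2, "A", 5), (2, "BB", -5), (3, "C", 2), (3, "C", 3), (3, "A", 5),
        (4, "BB", 3), (4, "B", 3), (4, "AA", 10), (4, "AA", -10)]

# Step 1: collect the set of amounts seen for each id (the "group by").
amounts_by_id = defaultdict(set)
for rec_id, _label, amount in rows:
    amounts_by_id[rec_id].add(amount)

# Step 2: keep a row only if its negated amount occurs for the same id.
kept = [r for r in rows if -r[2] in amounts_by_id[r[0]]]

# Step 3: build the nested id -> amount -> list of records mapping.
nested = defaultdict(lambda: defaultdict(list))
for rec_id, label, amount in kept:
    nested[rec_id][amount].append({"id": rec_id, "label": label})

print({k: dict(v) for k, v in nested.items()})
```

Using a set per id keeps the membership test cheap, and the filter never looks at amounts that belong to a different id.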
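If you only want to compare the dictionaries, you can iterate over one dictionary, check whether the negated amount exists in the other dictionary, and write both values into a new dictionary: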
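# d holds the negative amounts and d2 the positive amounts, as built in the question
new_dict = {}
for item in d:
    for i in d[item]:
        # look up the negated amount under the same id only;
        # .get avoids adding empty entries to the defaultdict d2
        if -i in d2.get(item, {}):
            new_dict.setdefault(item, {}).update({i: d[item][i], -i: d2[item][-i]})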
Result:
{1: {-12: [{'id': 1, 'label': 'A'}, {'id': 1, 'label': 'G'}],
     12: [{'id': 1, 'label': 'B'}]},
 2: {-5: [{'id': 2, 'label': 'BB'}], 5: [{'id': 2, 'label': 'A'}]},
 4: {-10: [{'id': 4, 'label': 'AA'}], 10: [{'id': 4, 'label': 'AA'}]}}
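If the nested defaultdict shape from the question should be preserved, the comparison could be wrapped in a small helper. This is a minimal sketch, not a definitive implementation: the function name `match_opposite_amounts` and the hand-built `neg` and `pos` inputs are assumptions made for illustration, mirroring the shape of d and d2.

```python
from collections import defaultdict

def match_opposite_amounts(negatives, positives):
    """Keep, per id, only the amounts whose negation exists for the same id."""
    matched = defaultdict(lambda: defaultdict(list))
    for key, by_amount in negatives.items():
        # .get works on dict and defaultdict alike and never creates new keys
        other = positives.get(key, {})
        for amount, records in by_amount.items():
            if -amount in other:
                matched[key][amount].extend(records)
                matched[key][-amount].extend(other[-amount])
    return matched

# Hand-built inputs with the same nesting as d (negatives) and d2 (positives).
neg = {1: {-12: [{"id": 1, "label": "A"}, {"id": 1, "label": "G"}]},
       2: {-5: [{"id": 2, "label": "BB"}]},
       4: {-10: [{"id": 4, "label": "AA"}]}}
pos = {1: {2: [{"id": 1, "label": "A"}], 12: [{"id": 1, "label": "B"}]},
       2: {5: [{"id": 2, "label": "A"}]},
       3: {2: [{"id": 3, "label": "C"}]},
       4: {3: [{"id": 4, "label": "BB"}], 10: [{"id": 4, "label": "AA"}]}}

result = match_opposite_amounts(neg, pos)
print({k: dict(v) for k, v in result.items()})
```

Because matches are accumulated per amount rather than assigned per id, an id with several matching pairs keeps all of them.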