How can I create a conditional column that refers to existing columns of a DataFrame and a dictionary without using a loop in Python?
I have a DataFrame:
import pandas as pd
df = pd.DataFrame({"type": ["A" ,"A1" ,"A" ,"A1","B" ],
"group": ["g1", "g2","g2","g2","g1"]})
I also have a dictionary:
dic ={"AlphaA": {"A": {"g1":"A_GRP1", "g2":"A_GRP2"},
"A1": {"g1":"A1_GRP1", "g2":"A1_GRP2"}},
"AlphaB": {"B": {"g1":"B_GRP1", "g2":"B_GRP2"}},
}
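I need to create a column called "value" that uses the DataFrame and the dictionary to look up the value for each row.
Conditions to apply:
- If the type is "A" or "A1", refer to the dictionary key AlphaA, get the value for the corresponding group, and assign it to the new column.
- If the type is "B", refer to the dictionary key AlphaB and get the value for the corresponding group.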
Example for the first row:
The type is "A", so refer to the dictionary key "AlphaA".
The group is "g1".
Therefore:
dic["AlphaA"]["A"]["g1"]  # would be the answer
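To make the two conditions concrete, here is a minimal sketch of the three-level lookup for the first row. The helper `outer_key` is hypothetical (not from the question): it inverts the outer level of `dic` so each type points to the outer key that contains it, instead of hard-coding the "A"/"A1" → AlphaA and "B" → AlphaB rules.

```python
import pandas as pd

df = pd.DataFrame({"type": ["A", "A1", "A", "A1", "B"],
                   "group": ["g1", "g2", "g2", "g2", "g1"]})
dic = {"AlphaA": {"A": {"g1": "A_GRP1", "g2": "A_GRP2"},
                  "A1": {"g1": "A1_GRP1", "g2": "A1_GRP2"}},
       "AlphaB": {"B": {"g1": "B_GRP1", "g2": "B_GRP2"}}}

# Hypothetical helper: map each type to the outer key that holds it,
# e.g. "A" -> "AlphaA", "A1" -> "AlphaA", "B" -> "AlphaB".
outer_key = {t: outer for outer, inner in dic.items() for t in inner}

# First row: type "A", group "g1" -> dic["AlphaA"]["A"]["g1"]
row = df.iloc[0]
first_value = dic[outer_key[row["type"]]][row["type"]][row["group"]]
print(first_value)  # A_GRP1
```

Deriving `outer_key` from the dictionary itself means a new type added under AlphaA or AlphaB is picked up without changing any condition.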
Required output:
final_df = pd.DataFrame({"type" : ["A" ,"A1" ,"A" ,"A1","B" ],
"group": ["g1", "g2","g2","g2","g1"],
"value": ["A_GRP1", "A1_GRP2", "A_GRP2",
"A1_GRP2", "B_GRP1"]})
I was able to achieve this with a loop, but it takes a long time,
so I am looking for a faster technique.
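Use DataFrame.join
with a Series created from the dictionary by a dictionary comprehension:
d1 = {(k1, k2): v2 for k, v in dic.items() for k1, v1 in v.items() for k2, v2 in v1.items()}
# tuple keys (type, group) give the Series a MultiIndex that join matches against the two columns
df = df.join(pd.Series(d1).rename('value'), on=['type','group'])
print(df)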
type group value
0 A g1 A_GRP1
1 A1 g2 A1_GRP2
2 A g2 A_GRP2
3 A1 g2 A1_GRP2
4 B g1 B_GRP1
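A minimal sketch of what the join answer does internally, assuming an extra row with type "C" (hypothetical, not in the question) that has no entry in the dictionary. It shows that the tuple keys produce a two-level index and that `join(on=...)` behaves as a left join, so unmatched rows get NaN instead of being dropped.

```python
import pandas as pd

# The "C" row is a hypothetical extra row with no entry in the dictionary.
df = pd.DataFrame({"type": ["A", "A1", "A", "A1", "B", "C"],
                   "group": ["g1", "g2", "g2", "g2", "g1", "g1"]})
dic = {"AlphaA": {"A": {"g1": "A_GRP1", "g2": "A_GRP2"},
                  "A1": {"g1": "A1_GRP1", "g2": "A1_GRP2"}},
       "AlphaB": {"B": {"g1": "B_GRP1", "g2": "B_GRP2"}}}

# Flatten three levels into {(type, group): value}; the outer AlphaA/AlphaB
# keys are discarded because the type alone identifies the inner dict.
d1 = {(k1, k2): v2
      for k, v in dic.items()
      for k1, v1 in v.items()
      for k2, v2 in v1.items()}

s = pd.Series(d1).rename("value")
print(s.index.nlevels)  # 2: tuple keys become a two-level MultiIndex

# join(on=...) keeps every row of df in order;
# combinations missing from the dictionary get NaN.
out = df.join(s, on=["type", "group"])
print(out)
```

The comprehension iterates over the dictionary, which is small, while the per-row matching is done by pandas.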
Assuming dic is the input dictionary, you can merge the dictionary values into a single dictionary (with the help of ChainMap), convert it to a DataFrame, unstack it to a Series, and merge:
from collections import ChainMap
s = pd.DataFrame(dict(ChainMap(*dic.values()))).unstack()
# without ChainMap:
# d = {k: v for inner in dic.values() for k, v in inner.items()}
# s = pd.DataFrame(d).unstack()
out = df.merge(s.rename('value'), left_on=['type', 'group'], right_index=True)
Output:
type group value
0 A g1 A_GRP1
1 A1 g2 A1_GRP2
3 A1 g2 A1_GRP2
2 A g2 A_GRP2
4 B g1 B_GRP1
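The output above lists row 3 before row 2. A minimal sketch of the same ChainMap approach with `how="left"`, which pandas documents as preserving the key order of the left frame and which keeps rows that have no match (filled with NaN) rather than dropping them as the default inner merge does. Note that if the same type appeared under two outer keys, `ChainMap` would return the value from the first mapping that contains it.

```python
from collections import ChainMap

import pandas as pd

df = pd.DataFrame({"type": ["A", "A1", "A", "A1", "B"],
                   "group": ["g1", "g2", "g2", "g2", "g1"]})
dic = {"AlphaA": {"A": {"g1": "A_GRP1", "g2": "A_GRP2"},
                  "A1": {"g1": "A1_GRP1", "g2": "A1_GRP2"}},
       "AlphaB": {"B": {"g1": "B_GRP1", "g2": "B_GRP2"}}}

# Merge the inner dicts into one mapping: type -> {group: value}.
flat = dict(ChainMap(*dic.values()))

# Columns become types and the index becomes groups; unstack turns the
# frame into a Series indexed by (type, group).
s = pd.DataFrame(flat).unstack()

# how="left" keeps all rows of df in their original order.
out = df.merge(s.rename("value"), left_on=["type", "group"],
               right_index=True, how="left")
print(out)
```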
You can drop the outer keys of the original dictionary and apply the lookup row by row:
d = {k: v for vs in dic.values() for k, v in vs.items()}
df['value'] = (df.assign(value=df['type'].map(d))
.apply(lambda row: row['value'][row['group']], axis=1)
)
print(d)
{'A': {'g1': 'A_GRP1', 'g2': 'A_GRP2'}, 'A1': {'g1': 'A1_GRP1', 'g2': 'A1_GRP2'}, 'B': {'g1': 'B_GRP1', 'g2': 'B_GRP2'}}
print(df)
type group value
0 A g1 A_GRP1
1 A1 g2 A1_GRP2
2 A g2 A_GRP2
3 A1 g2 A1_GRP2
4 B g1 B_GRP1
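`apply(..., axis=1)` still calls a Python function once per row. As a swapped-in alternative (not one of the answers above), here is a minimal sketch that flattens the dictionary once into a Series with a `(type, group)` MultiIndex and then looks up every row at once with `Series.reindex`.

```python
import pandas as pd

df = pd.DataFrame({"type": ["A", "A1", "A", "A1", "B"],
                   "group": ["g1", "g2", "g2", "g2", "g1"]})
dic = {"AlphaA": {"A": {"g1": "A_GRP1", "g2": "A_GRP2"},
                  "A1": {"g1": "A1_GRP1", "g2": "A1_GRP2"}},
       "AlphaB": {"B": {"g1": "B_GRP1", "g2": "B_GRP2"}}}

# Flatten to {(type, group): value}; tuple keys give a two-level index.
lookup = pd.Series({(t, g): v
                    for inner in dic.values()
                    for t, groups in inner.items()
                    for g, v in groups.items()}).rename_axis(["type", "group"])

# Build one key per row and fetch all values in a single reindex call;
# keys missing from the dictionary would come back as NaN.
keys = pd.MultiIndex.from_arrays([df["type"], df["group"]])
df["value"] = lookup.reindex(keys).to_numpy()
print(df)
```

`.to_numpy()` drops the MultiIndex so the values are assigned by position rather than aligned against the DataFrame's RangeIndex. `reindex` accepts repeated keys in the target (rows 1 and 3 share `("A1", "g2")`) as long as the lookup index itself is unique.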