Multiple column update in dataframe from a nested dictionary
Hello, I need to update specific columns in a DataFrame based on a dictionary. My initial DataFrame looks like this:
Date | Var_1 | Var_2 | Var_3 | Var_4 |
---|---|---|---|---|
01/01/2022 | 100 | Yes | Yes | 104 |
02/01/2022 | 100 | Yes | Yes | 104 |
03/01/2022 | 100 | Yes | Yes | 104 |
04/01/2022 | 100 | Yes | Yes | 104 |
05/01/2022 | 100 | Yes | No | 104 |
06/01/2022 | 100 | Yes | No | 104 |
07/01/2022 | 100 | Yes | No | 104 |
08/01/2022 | 100 | No | Yes | 104 |
This is my nested dictionary (the DataFrame needs to be updated based on it):
my_dict = {
"01/01/2022" : { "Var_2": "Yes","Var_3": "No"},
"02/01/2022" : { "Var_2": "Yes","Var_3": "No"},
"03/01/2022" : { "Var_2": "Yes","Var_3": "Yes"},
"05/01/2022" : { "Var_2": "No", "Var_3": "Yes"},
"06/01/2022" : { "Var_2": "No", "Var_3": "Yes"}
}
My desired output is:
Date | Var_1 | Var_2 | Var_3 | Var_4 |
---|---|---|---|---|
01/01/2022 | 100 | Yes | No | 104 |
02/01/2022 | 100 | Yes | No | 104 |
03/01/2022 | 100 | Yes | Yes | 104 |
04/01/2022 | 100 | Yes | Yes | 104 |
05/01/2022 | 100 | No | Yes | 104 |
06/01/2022 | 100 | No | Yes | 104 |
07/01/2022 | 100 | Yes | No | 104 |
08/01/2022 | 100 | No | Yes | 104 |
I tried .replace(my_dict), but it did not work.
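A likely reason .replace(my_dict) has no effect: per the pandas documentation, DataFrame.replace reads a nested dictionary as {column label: {old value: new value}}. Here the outer keys are dates, not column labels, so no column should match. The snippet below is a minimal sketch of that behaviour on a two-row sample built only for illustration; the name `unchanged` is hypothetical.

```python
import pandas as pd

# Two-row sample in the same shape as the question's DataFrame.
df = pd.DataFrame({
    "Date": ["01/01/2022", "02/01/2022"],
    "Var_2": ["Yes", "Yes"],
    "Var_3": ["Yes", "Yes"],
})
my_dict = {"01/01/2022": {"Var_2": "Yes", "Var_3": "No"}}

# replace() treats the outer key "01/01/2022" as a column label.
# No such column exists, so no replacement rule applies.
result = df.replace(my_dict)

# Compare the result with the original frame.
unchanged = result.equals(df)
print(unchanged)
```

So the dictionary has to be aligned on the Date values rather than passed to replace directly.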
One option is to convert my_dict to a DataFrame and use it to update df:
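import pandas as pd

# Parse the dates and make them the index so update() can align on them
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date')
# Outer keys of my_dict become the index, inner keys become the columns
tmp = pd.DataFrame.from_dict(my_dict, orient='index')
tmp.index = pd.to_datetime(tmp.index)
# Overwrite the matching cells of df in place
df.update(tmp)
df = df.reset_index()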
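For anyone who wants to run the update approach end to end, here is a self-contained sketch. It rebuilds the question's data by hand and, as a simplifying assumption, keeps Date as plain strings instead of converting to datetime, since the dictionary keys and the Date column already use the same text format.

```python
import pandas as pd

# Rebuild the question's DataFrame (Date kept as plain strings here).
df = pd.DataFrame({
    "Date": [f"{day:02d}/01/2022" for day in range(1, 9)],
    "Var_1": [100] * 8,
    "Var_2": ["Yes"] * 7 + ["No"],
    "Var_3": ["Yes"] * 4 + ["No"] * 3 + ["Yes"],
    "Var_4": [104] * 8,
})

my_dict = {
    "01/01/2022": {"Var_2": "Yes", "Var_3": "No"},
    "02/01/2022": {"Var_2": "Yes", "Var_3": "No"},
    "03/01/2022": {"Var_2": "Yes", "Var_3": "Yes"},
    "05/01/2022": {"Var_2": "No", "Var_3": "Yes"},
    "06/01/2022": {"Var_2": "No", "Var_3": "Yes"},
}

# Outer keys become the index, inner keys become the columns.
tmp = pd.DataFrame.from_dict(my_dict, orient="index")

# update() aligns on index and column labels and overwrites in place;
# rows and columns that are missing from tmp are left untouched.
df = df.set_index("Date")
df.update(tmp)
df = df.reset_index()

print(df)
```

Rows 04/01/2022, 07/01/2022 and 08/01/2022 are not in the dictionary, so they keep their original values, and Var_1 and Var_4 are never touched.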
Or use combine_first:
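# Assumes 'Date' is still a regular column; parse it so both indexes hold datetimes
df['Date'] = pd.to_datetime(df['Date'])
tmp = pd.DataFrame.from_dict(my_dict, orient='index')
tmp.index = pd.to_datetime(tmp.index)
# Values from tmp take priority; everything missing from tmp is filled in from df
df = tmp.combine_first(df.set_index('Date')).reset_index().rename(columns={'index':'Date'})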
Output:
Date Var_1 Var_2 Var_3 Var_4
0 01/01/2022 100 Yes No 104
1 02/01/2022 100 Yes No 104
2 03/01/2022 100 Yes Yes 104
3 04/01/2022 100 Yes Yes 104
4 05/01/2022 100 No Yes 104
5 06/01/2022 100 No Yes 104
6 07/01/2022 100 Yes No 104
7 08/01/2022 100 No Yes 104
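One thing to watch with pd.to_datetime: strings such as 02/01/2022 are ambiguous between day-first and month-first. Both snippets parse df and my_dict the same way, so the alignment works either way, but if the real data is day-first it may be safer to say so explicitly. The sketch below assumes a day/month/year layout (an assumption, the question does not state it) and also shows how to turn the parsed dates back into the original text form.

```python
import pandas as pd

dates = pd.Series(["01/01/2022", "02/01/2022", "08/01/2022"])

# Assumption: the strings are day/month/year.
parsed = pd.to_datetime(dates, format="%d/%m/%Y")

# Convert back to the original text layout for display.
restored = parsed.dt.strftime("%d/%m/%Y")

print(parsed.dt.day.tolist())
print(restored.tolist())
```

The same format argument can be passed when converting df['Date'] and tmp.index, as long as both use the same one.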