KeyError when trying to modify column values based on a column from another DataFrame
I have two pandas DataFrames (df1 and df2).
df1
address mon tue wed ...
address1 40 40 40 ...
address2 20 20 20 ...
address3 30 30 0 ...
address3 0 0 30 ...
... ... ... ... ...
df2
address mon tue wed ...
address1 0 15 0 ...
address2 0 6 0 ...
address3 15 0 0 ...
... ... ... ... ...
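What I want to do: when a value in a df1 column (e.g. mon) is greater than 0 and the corresponding df2 value is also greater than 0, replace the df1 value with the df2 value:
df1 modified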
address mon tue wed ...
address1 40 15 40 ...
address2 20 6 20 ...
address3 15 30 0 ...
address3 0 0 30 ...
... ... ... ... ...
I am trying this code, based on this:
for index, _ in df1.iterrows():
    if df1.loc[index, 'mon'] > 0:
        df1.loc[index, 'mon'] = float(
            df2.loc[(df2['address'] == df1[index, 'address']), 'mon'])
But I get KeyError: (4, 'address')
Traceback (most recent call last):
  File "/usr/lib64/python3.8/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: (4, 'address')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/mnt/project/script.py", line 78, in <module>
    df2.loc[(df2['address'] == df1[index, 'address']), 'mon'])
  File "/usr/lib64/python3.8/site-packages/pandas/core/frame.py", line 3458, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/usr/lib64/python3.8/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: (4, 'address')
What could I be doing wrong?
Thanks in advance.
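Use mask and combine_first.
Set the address column as the index of both DataFrames, then create a boolean mask that is True where both the df1 and df2 values are greater than 0. Use mask to set every cell that matches the condition to NaN, then use combine_first to fill those NaN values in df1 with the values from df2.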
df1 = df1.set_index('address')
df2 = df2.set_index('address').reindex(df1.index)
mask = df1.gt(0) & df2.gt(0)
df1 = df1.mask(mask).combine_first(df2).reset_index()
Output:
>>> df1
address mon tue wed
0 address1 40.0 15.0 40
1 address2 20.0 6.0 20
2 address3 15.0 30.0 0
3 address3 0.0 0.0 30
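As for the KeyError itself: df1[index, 'address'] is missing .loc, so pandas hands the tuple (4, 'address') to DataFrame.__getitem__, which looks it up as a column label. That matches the self.columns.get_loc(key) frame in the traceback, and writing df1.loc[index, 'address'] avoids it. Below is a minimal sketch of a loop-free, per-column variant that looks values up by address with Series.map. It assumes the addresses in df2 are unique; the sample frames are reconstructed from the question (without the elided columns), and the helper name replace_positive is made up for illustration.

```python
import pandas as pd

# Sample data reconstructed from the question (the "..." columns are omitted).
df1 = pd.DataFrame({
    "address": ["address1", "address2", "address3", "address3"],
    "mon": [40, 20, 30, 0],
    "tue": [40, 20, 30, 0],
    "wed": [40, 20, 0, 30],
})
df2 = pd.DataFrame({
    "address": ["address1", "address2", "address3"],
    "mon": [0, 0, 15],
    "tue": [15, 6, 0],
    "wed": [0, 0, 0],
})


def replace_positive(left, right, key="address"):
    """Hypothetical helper: where both frames hold a value > 0, take right's value."""
    out = left.copy()
    lookup = right.set_index(key)  # assumption: keys in `right` are unique
    for col in lookup.columns.intersection(out.columns):
        # Bring right's values onto left's rows by key; unmatched keys become NaN.
        candidate = out[key].map(lookup[col])
        # NaN > 0 is False, so rows without a match keep their original value.
        both_positive = out[col].gt(0) & candidate.gt(0)
        out.loc[both_positive, col] = candidate[both_positive]
    return out


result = replace_positive(df1, df2)
print(result)
```

With this sample data no NaN is introduced along the way, so the columns stay integer instead of being upcast to float as in the mask and combine_first output above.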