Set single value on pandas multiindex dataframe
For a single-index DataFrame, we can use loc to get, set, and change values:
>>> import pandas as pd
>>> df=pd.DataFrame()
>>> df.loc['A',1]=1
>>> df
     1
A  1.0
>>> df.loc['A',1]=2
>>> df.loc['A',1]
2.0
However, with a MultiIndex DataFrame, loc can get and change existing values:
>>> df=pd.DataFrame([['A','B',1]])
>>> df=df.set_index([0,1])
>>> df.loc[('A','B'),2]
1
>>> df.loc[('A','B'),2]=3
>>> df.loc[('A','B'),2]
3
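The reason this case works is that `set_index([0, 1])` builds a two-level MultiIndex, so the 2-tuple `('A', 'B')` matches one row exactly. A minimal sketch (assuming a reasonably recent pandas) that checks the index and also shows that a new 2-tuple key is appended as a new row on a frame that already has a MultiIndex:

```python
import pandas as pd

df = pd.DataFrame([["A", "B", 1]]).set_index([0, 1])

# set_index with two columns yields a two-level MultiIndex
print(isinstance(df.index, pd.MultiIndex), df.index.nlevels)  # True 2

df.loc[("A", "B"), 2] = 3  # change an existing value
df.loc[("C", "D"), 2] = 4  # unknown 2-tuple key: appended as a new row
print(df)
```

Because the key length matches `df.index.nlevels`, pandas treats an unknown tuple as one new row label instead of a list of labels.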
But setting them on a new, empty DataFrame seems to fail:
>>> df=pd.DataFrame()
>>> df.loc[('A','B'),2]=3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Program Files\Python39\lib\site-packages\pandas\core\indexing.py", line 688, in __setitem__
    indexer = self._get_setitem_indexer(key)
  File "C:\Program Files\Python39\lib\site-packages\pandas\core\indexing.py", line 630, in _get_setitem_indexer
    return self._convert_tuple(key, is_setter=True)
  File "C:\Program Files\Python39\lib\site-packages\pandas\core\indexing.py", line 754, in _convert_tuple
    idx = self._convert_to_indexer(k, axis=i, is_setter=is_setter)
  File "C:\Program Files\Python39\lib\site-packages\pandas\core\indexing.py", line 1212, in _convert_to_indexer
    return self._get_listlike_indexer(key, axis, raise_missing=True)[1]
  File "C:\Program Files\Python39\lib\site-packages\pandas\core\indexing.py", line 1266, in _get_listlike_indexer
    self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
  File "C:\Program Files\Python39\lib\site-packages\pandas\core\indexing.py", line 1308, in _validate_read_indexer
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index(['A', 'B'], dtype='object')] are in the [index]"
Why does this happen, and what is the "correct" way to set a single value in a MultiIndex DataFrame with loc?
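This fails because the empty DataFrame does not have a MultiIndex with the right number of levels. Its index is an ordinary single-level index, so loc interprets the tuple ('A','B') as a list of two separate row labels rather than as one two-level key; neither label exists, hence the KeyError in the traceback.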
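To see the mismatch directly, here is a small sketch that inspects the default index of an empty DataFrame and reproduces the error. It assumes a pandas version that behaves like the one in the traceback; the exact class of the empty default index differs between versions, so only the level count is checked:

```python
import pandas as pd

plain = pd.DataFrame()
# The default index is single-level, not a MultiIndex
print(isinstance(plain.index, pd.MultiIndex), plain.index.nlevels)  # False 1

failed = False
try:
    # On a single-level index the tuple is read as the label list ['A', 'B']
    plain.loc[("A", "B"), 2] = 3
except KeyError as exc:
    failed = True
    print("KeyError:", exc)
```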
You need to initialize the empty DataFrame with an index that already has the correct number of levels, for example with pandas.MultiIndex.from_arrays:
import pandas as pd
idx = pd.MultiIndex.from_arrays([[],[]])  # empty index with two levels
df = pd.DataFrame(index=idx)
df.loc[('A','B'), 2] = 3  # adds row ('A','B') and column 2
Output:
       2
A B  3.0
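Once the index has two levels, further 2-tuple keys keep working, both to add rows and to overwrite existing cells. A sketch extending the answer's code, where the level names "outer" and "inner" are hypothetical and only there to show that `from_arrays` accepts `names`:

```python
import pandas as pd

# Two empty arrays -> an empty MultiIndex with two (named) levels
idx = pd.MultiIndex.from_arrays([[], []], names=["outer", "inner"])
df = pd.DataFrame(index=idx)

df.loc[("A", "B"), 2] = 3  # creates row ('A', 'B') and column 2
df.loc[("A", "C"), 2] = 4  # another new row in the existing column
df.loc[("A", "B"), 2] = 5  # overwrites the existing cell

print(df)
print(df.loc[("A", "B"), 2])
```

The key point is only that the tuple length matches `idx.nlevels`; with three levels you would pass three empty arrays and use 3-tuples.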