Assigning values to a Pandas MultiIndex DataFrame by index level
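I have a Pandas DataFrame with a MultiIndex, and I need to assign values to one of its columns from a Series. The Series shares its index with the first level of the DataFrame's index.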
import pandas as pd
import numpy as np
idx0 = np.array(['bar', 'bar', 'bar', 'baz', 'foo', 'foo'])
idx1 = np.array(['one', 'two', 'three', 'one', 'one', 'two'])
df = pd.DataFrame(index=[idx0, idx1], columns=['A', 'B'])
s = pd.Series([True, False, True], index=np.unique(idx0))
print(df)
print(s)
Output:
             A    B
bar one    NaN  NaN
    two    NaN  NaN
    three  NaN  NaN
baz one    NaN  NaN
foo one    NaN  NaN
    two    NaN  NaN
bar     True
baz    False
foo     True
dtype: bool
These do not work:
df.A = s # does not raise an error, but does nothing
df.loc[s.index,'A'] = s # raises an error
Expected output:
               A    B
bar one     True  NaN
    two     True  NaN
    three   True  NaN
baz one    False  NaN
foo one     True  NaN
    two     True  NaN
A Series (or a dictionary) can be passed to map and apply just like a function (thanks to @normanius for improving the syntax):
df['A'] = pd.Series(df.index.get_level_values(0)).map(s).values
Or, similarly:
df['A'] = df.reset_index(level=0)['level_0'].map(s).values
Result:
               A    B
bar one     True  NaN
    two     True  NaN
    three   True  NaN
baz one    False  NaN
foo one     True  NaN
    two     True  NaN
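The map-based approach above can be wrapped in a small function. The following is a minimal sketch, not part of the original answer: the name assign_by_level and its level parameter are hypothetical, and it assumes every key on the chosen level is present in the Series index.

```python
import numpy as np
import pandas as pd

idx0 = np.array(['bar', 'bar', 'bar', 'baz', 'foo', 'foo'])
idx1 = np.array(['one', 'two', 'three', 'one', 'one', 'two'])
df = pd.DataFrame(index=[idx0, idx1], columns=['A', 'B'])
s = pd.Series([True, False, True], index=np.unique(idx0))


def assign_by_level(frame, column, values, level=0):
    """Hypothetical helper: broadcast `values` over one index level of `frame`."""
    # One key per row, taken from the requested level of the MultiIndex.
    keys = frame.index.get_level_values(level)
    # Index.map looks each key up in `values`; to_numpy() drops the labels,
    # so the assignment is positional and no index alignment takes place.
    frame[column] = keys.map(values).to_numpy()
    return frame


assign_by_level(df, 'A', s)
print(df)
```

Calling map on the level values directly avoids building an intermediate Series; level can be a position or, if the index levels are named, a level name.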
Regarding "df.A = s does not raise an error, but does nothing": indeed, this ought to work. Your point is actually related to mine.
Workaround:
>>> s.index = pd.Index((c,) for c in s.index)  # wrap each label in a 1-tuple
>>> df.A = s
>>> df
               A    B
bar one     True  NaN
    two     True  NaN
    three   True  NaN
baz one    False  NaN
foo one     True  NaN
    two     True  NaN
Why does the above work?
Because when you do df.A = s directly, without the workaround, you are trying to assign values labelled by a plain pandas.Index to a frame whose index is an instance of a subclass, pandas.MultiIndex. That looks somewhat like a counterexample to the Liskov substitution principle. See for yourself:
>>> type(s.index).__name__
'Index'
whereas
>>> type(df.index).__name__
'MultiIndex'
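The class relationship referred to here can be checked directly. This is a small illustrative sketch; the variable names flat and multi are made up for the example and stand in for s.index and df.index.

```python
import pandas as pd

flat = pd.Index(['bar', 'baz', 'foo'])
multi = pd.MultiIndex.from_arrays([['bar', 'baz', 'foo'],
                                   ['one', 'one', 'one']])

# MultiIndex derives from Index, not the other way around.
is_subclass = issubclass(pd.MultiIndex, pd.Index)
# A plain Index is therefore not a MultiIndex instance.
flat_is_multi = isinstance(flat, pd.MultiIndex)

print(is_subclass, flat_is_multi)   # True False
print(flat.nlevels, multi.nlevels)  # 1 2
```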
So the workaround consists of converting the index of s into a single-level pandas.MultiIndex instance.
>>> s.index = pd.Index((c,) for c in s.index)
>>> type(s.index).__name__
'MultiIndex'
with no visible change:
>>> s
bar     True
baz    False
foo     True
dtype: bool
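For readers who prefer to build the single-level MultiIndex explicitly, here is a hedged sketch using MultiIndex.from_arrays instead of the generator of 1-tuples. It only illustrates the index conversion and the way back to a flat Index; it is an alternative spelling assumed for illustration, not the answer's own code, and the names one_level and restored are hypothetical.

```python
import pandas as pd

s = pd.Series([True, False, True], index=['bar', 'baz', 'foo'])

# Build a MultiIndex that has exactly one level, holding the original labels.
one_level = pd.MultiIndex.from_arrays([s.index])
print(type(one_level).__name__, one_level.nlevels)

# get_level_values(0) recovers a flat Index with the same labels.
restored = one_level.get_level_values(0)
print(list(restored))
```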
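A thought: from several points of view (mathematical, ontological), all of this somehow suggests that pandas.Index should have been designed as a subclass of pandas.MultiIndex, rather than the other way around as it is now.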