Assigning values to Pandas Multiindex DataFrame by index level

Question

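I have a Pandas MultiIndex DataFrame, and I need to assign values to one of its columns from a Series. The Series shares its index with the first level of the DataFrame's index.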

import pandas as pd
import numpy as np
idx0 = np.array(['bar', 'bar', 'bar', 'baz', 'foo', 'foo'])
idx1 = np.array(['one', 'two', 'three', 'one', 'one', 'two'])
df = pd.DataFrame(index = [idx0, idx1], columns = ['A', 'B'])
s = pd.Series([True, False, True],index = np.unique(idx0))
print(df)
print(s)

Output:

             A    B
bar one    NaN  NaN
    two    NaN  NaN
    three  NaN  NaN
baz one    NaN  NaN
foo one    NaN  NaN
    two    NaN  NaN

bar     True
baz    False
foo     True
dtype: bool

These do not work:

df.A = s # does not raise an error, but does nothing
df.loc[s.index,'A'] = s # raises an error

Expected output:

             A     B
bar one    True   NaN
    two    True   NaN
    three  True   NaN
baz one    False  NaN
foo one    True   NaN
    two    True   NaN

Answer 1

A Series (or a dictionary) can be used just like a function with map and apply (thanks to @normanius for improving the syntax):

df['A'] = pd.Series(df.index.get_level_values(0)).map(s).values

Or, similarly:

df['A'] = df.reset_index(level=0)['level_0'].map(s).values

Result:

               A    B
bar one     True  NaN
    two     True  NaN
    three   True  NaN
baz one    False  NaN
foo one     True  NaN
    two     True  NaN
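
The map-based lookup above can be sketched end to end as follows. This is a minimal sketch that reuses the question's setup; the names level0 and lookup are introduced here for illustration only, and the dictionary variant is shown simply because the answer mentions that dictionaries work with map as well.

```python
import numpy as np
import pandas as pd

# Same setup as in the question.
idx0 = np.array(['bar', 'bar', 'bar', 'baz', 'foo', 'foo'])
idx1 = np.array(['one', 'two', 'three', 'one', 'one', 'two'])
df = pd.DataFrame(index=[idx0, idx1], columns=['A', 'B'])
s = pd.Series([True, False, True], index=np.unique(idx0))

# Level-0 label of every row, as a flat Series with a default integer index
# (level0 is an illustrative name, not part of the original answer).
level0 = pd.Series(df.index.get_level_values(0))

# Look each label up in a plain dict built from s. Taking .values drops the
# integer index, so the result is assigned by position instead of being
# aligned against df.index.
lookup = s.to_dict()
df['A'] = level0.map(lookup).values

print(df)
```

Assigning the bare array is the important step: it sidesteps index alignment between the flat Series and the MultiIndex frame.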

Answer 2

df.A = s does not raise an error, but does nothing

Indeed, this should have worked. (Your point is actually related to mine.)

Workaround

>>> s.index = pd.Index((c,) for c in s.index)
>>> df.A = s
>>> df
               A    B
bar one     True  NaN
    two     True  NaN
    three   True  NaN
baz one    False  NaN
foo one     True  NaN
    two     True  NaN

Why does the above work?

Because when you write df.A = s directly, without the workaround, you are trying to assign values whose coordinates are held in a pandas.Index to a frame indexed by an instance of its subclass, pandas.MultiIndex (which somehow looks like a "counter-opposition" to the LS principle). See for yourself:

>>> type(s.index).__name__
'Index'

whereas

>>> type(df.index).__name__
'MultiIndex'

So the workaround consists of converting the index of s into a one-dimensional pandas.MultiIndex instance.

>>> s.index = pd.Index((c,) for c in s.index)
>>> type(s.index).__name__
'MultiIndex'

There is no visible change:

>>> s
bar     True
baz    False
foo     True
dtype: bool
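
The index conversion can also be written with an explicit constructor. The sketch below is only an illustration of the index types involved: it uses pd.MultiIndex.from_arrays in place of the generator expression from this answer, and one_level is a name assumed here. It does not assign anything to df, and it leaves s itself untouched.

```python
import pandas as pd

s = pd.Series([True, False, True], index=['bar', 'baz', 'foo'])

# Build a one-level MultiIndex from the flat index. This swaps in an explicit
# constructor for the pd.Index(generator of 1-tuples) call used in the answer.
one_level = pd.MultiIndex.from_arrays([s.index])

print(type(s.index).__name__)    # flat index type
print(type(one_level).__name__)  # converted index type
print(one_level.nlevels)         # number of levels in the converted index
print(list(one_level))           # each label is now wrapped in a 1-tuple
```

The labels are the same in both cases; only the container type differs, which is why printing s shows no visible change.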

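One thought: from several points of view (mathematical, ontological), all of this somehow suggests that pandas.Index should have been designed as a subclass of pandas.MultiIndex, rather than the other way around as it is now.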
