How to get away with a multidimensional index in pandas

Question

In pandas, what is a good way to select an arbitrary set of rows from a multi-index?

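import pandas as pd

df = pd.DataFrame(columns=['A', 'B', 'C'])
df['A'] = ['a', 'a', 'b', 'b']
df['B'] = [1,2,3,4]
df['C'] = [1,2,3,4]

# .ix (used throughout this question) was deprecated in pandas 0.20 and
# removed in 1.0; on current versions use .loc for label-based selection.
the_indices_we_want = df.ix[[0,3],['A','B']]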
df = df.set_index(['A', 'B']) #Create a multiindex

df.ix[the_indices_we_want] #ValueError: Cannot index with multidimensional key

df.ix[[tuple(x) for x in the_indices_we_want.values]]
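On current pandas the same tuple-based lookup can be written with .loc, and DataFrame.itertuples(index=False, name=None) already yields one plain tuple per row, so the manual tuple(x) conversion over .values is not needed. This is a minimal sketch that rebuilds the question's toy frame; it still builds a Python list of tuples, so it only tidies the syntax rather than removing the cost.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({'A': ['a', 'a', 'b', 'b'],
                   'B': [1, 2, 3, 4],
                   'C': [1, 2, 3, 4]})

# Rows labelled 0 and 3, columns A and B: the keys to look up.
the_indices_we_want = df.loc[[0, 3], ['A', 'B']]
df = df.set_index(['A', 'B'])  # MultiIndex on (A, B)

# itertuples(index=False, name=None) yields plain tuples, one per row.
keys = list(the_indices_we_want.itertuples(index=False, name=None))

# A list of tuples is interpreted as a list of MultiIndex entries.
result = df.loc[keys]
print(result)
```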

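That last line is an answer, but it feels clunky: the keys can't even be lists, they have to be tuples. It also involves generating a new object just to do the indexing. My actual situation is that I'm trying to do a lookup on a multi-indexed DataFrame using indices taken from another DataFrame: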

data_we_want = dataframe_with_the_data.ix[dataframe_with_the_indices[['Index1','Index2']]]

Right now it seems I need to write this:

data_we_want = dataframe_with_the_data.ix[[tuple(x) for x in dataframe_with_the_indices[['Index1','Index2']].values]]

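This works, but if there are many rows (i.e., hundreds of millions of desired indices), generating this list of tuples becomes quite a burden. Is there a better solution?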
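Two vectorized alternatives on current pandas are sketched below, using the frame and column names from the question (dataframe_with_the_data, dataframe_with_the_indices, Index1, Index2) filled with made-up toy values; the value column is hypothetical. Option 1 builds the MultiIndex directly from the key columns with pd.MultiIndex.from_frame (available since pandas 0.24) instead of a Python list of tuples. Option 2 treats the lookup as an inner join, which is often the more natural fit when the key table is very large; no timings are claimed here.

```python
import pandas as pd

# Toy stand-ins for the frames named in the question; values are made up.
dataframe_with_the_data = pd.DataFrame(
    {'Index1': ['x', 'x', 'y', 'y'],
     'Index2': [10, 20, 10, 20],
     'value': [0.1, 0.2, 0.3, 0.4]}  # hypothetical payload column
).set_index(['Index1', 'Index2'])

dataframe_with_the_indices = pd.DataFrame(
    {'Index1': ['y', 'x'], 'Index2': [20, 10]}
)

# Option 1: turn the key columns into a MultiIndex in one step
# and look the rows up with .loc.
wanted = pd.MultiIndex.from_frame(dataframe_with_the_indices[['Index1', 'Index2']])
data_we_want = dataframe_with_the_data.loc[wanted]

# Option 2: an inner join on the key columns; only keys present
# in both frames survive.
data_we_want_merge = dataframe_with_the_indices.merge(
    dataframe_with_the_data.reset_index(), on=['Index1', 'Index2'], how='inner'
)

print(data_we_want)
print(data_we_want_merge)
```

One behavioral difference worth knowing: .loc raises a KeyError if a requested key is missing from the data, whereas the inner merge silently drops it, and DataFrame.reindex(wanted) would instead return NaN rows for missing keys.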

EDIT: @joris's solution works, but not when the indices are all numeric. Here is an example where all index levels are integers:

df = pd.DataFrame(columns=['A', 'B', 'C'])
df['A'] = ['a', 'a', 'b', 'b']
df['B'] = [1,2,3,4]
df['C'] = [1,2,3,4]

the_indices_we_want = df.ix[[0,3],['B','C']]
df = df.set_index(['B', 'C'])

df.ix[pd.Index(the_indices_we_want)] #ValueError: Cannot index with multidimensional key

df.ix[pd.Index(the_indices_we_want.astype('object'))] #Works, though feels clunky.
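For the all-integer case, a sketch on current pandas: pd.MultiIndex.from_frame builds one level per column, so it does not depend on the frame's dtype being object, and no astype('object') step is needed. This reuses the toy data above with .loc in place of the removed .ix.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'a', 'b', 'b'],
                   'B': [1, 2, 3, 4],
                   'C': [1, 2, 3, 4]})

# Both key columns are integers here.
the_indices_we_want = df.loc[[0, 3], ['B', 'C']]
df = df.set_index(['B', 'C'])

# from_frame builds the MultiIndex level by level, so integer keys
# work without converting the key frame to object dtype first.
result = df.loc[pd.MultiIndex.from_frame(the_indices_we_want)]
print(result)
```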

Answer 1

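You indeed can't index with a DataFrame directly, but if you convert it to an Index object, it does the right thing (each row of the DataFrame is treated as one multi-index entry):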

In [43]: pd.Index(the_indices_we_want)
Out[43]: Index([(u'a', 1), (u'b', 4)], dtype='object')

In [44]: df.ix[pd.Index(the_indices_we_want)]
Out[44]:
     C
A B
a 1  1
b 4  4

In [45]: df.ix[[tuple(x) for x in the_indices_we_want.values]]
Out[45]:
     C
A B
a 1  1
b 4  4

This is a bit cleaner. From some quick tests, it also seems somewhat faster (though not by much, about 2x).
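The "about 2x" figure comes from the answerer's own quick tests. A rough way to repeat that kind of comparison on your own machine is sketched below; it swaps in pd.MultiIndex.from_frame for the pd.Index(...) conversion, the data is synthetic, and the printed timings will vary by machine and pandas version, so no expected numbers are given.

```python
import time

import numpy as np
import pandas as pd

# Synthetic data, made up for timing: unique (k1, k2) integer keys.
n = 100_000
rng = np.random.default_rng(0)
data = pd.DataFrame({'k1': np.arange(n) // 100,
                     'k2': np.arange(n) % 100,
                     'v': rng.random(n)}).set_index(['k1', 'k2'])

# Request a random half of the existing keys.
keys = data.index.to_frame(index=False).sample(n // 2, random_state=0)


def timed(label, fn):
    """Run fn once, print the wall-clock time, and return its result."""
    start = time.perf_counter()
    out = fn()
    print(f'{label}: {time.perf_counter() - start:.3f}s')
    return out


via_tuples = timed('list of tuples',
                   lambda: data.loc[[tuple(x) for x in keys.values]])
via_frame = timed('MultiIndex.from_frame',
                  lambda: data.loc[pd.MultiIndex.from_frame(keys)])
```

A single run is noisy; timeit or IPython's %timeit gives steadier numbers.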

Answer 2

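In newer versions of pandas, you can simply use .iloc for positional row selection. This replaces the df.ix[[0,3],['A','B']] step from the question: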

import pandas as pd

df = pd.DataFrame(columns=['A', 'B', 'C'])
df['A'] = ['a', 'a', 'b', 'b']
df['B'] = [1,2,3,4]
df['C'] = [1,2,3,4]
df.iloc[[0, 3]][['A', 'B']]  # rows 0 and 3 by position, then columns A and B
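The chained form above indexes twice and builds an intermediate frame. A sketch of a single-step equivalent, which maps the positions to labels through df.index so that rows and columns go through one .loc call, and then feeds the result into the multi-index lookup via pd.MultiIndex.from_frame (pandas 0.24+):

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'a', 'b', 'b'],
                   'B': [1, 2, 3, 4],
                   'C': [1, 2, 3, 4]})

# Chained form from the answer: positional rows, then columns by label.
chained = df.iloc[[0, 3]][['A', 'B']]

# Single-step form: df.index[[0, 3]] turns positions into row labels,
# so rows and columns are selected in one .loc call.
single = df.loc[df.index[[0, 3]], ['A', 'B']]

# Either selection can drive the lookup into the multi-indexed frame.
indexed = df.set_index(['A', 'B'])
result = indexed.loc[pd.MultiIndex.from_frame(single)]
print(result)
```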
