How do I divide two rows from a pandas DataFrame and store the result in a second DataFrame?

Question

I want to divide two rows of one DataFrame and store the result in a second DataFrame. My attempt is based on this question, but so far it has not worked.

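The row indices of the first DataFrame are tuples of the form (str, int). The ratios I want to calculate (which are also the row indices of the second DataFrame) are represented as tuples of tuples and stored in a list: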

(gene, position)
((gene, position1), (gene, position2))

My code is as follows:

import pandas as pd

df1 = pd.DataFrame(data={'A':[1,2,3], 'B':[4,5,6], 'C':[7,8,9], 'D':[10,11,12]},
                   index=[('geneA', 1538), ('geneA', 1591), ('geneA', 1687)])

               A  B  C   D
(geneA, 1538)  1  4  7  10
(geneA, 1591)  2  5  8  11
(geneA, 1687)  3  6  9  12

pairs = [(('geneA', 1538), ('geneA', 1591))]

df2 = pd.DataFrame()
for pair in pairs:
    df2.loc[[pair]] = df1.loc[[pair[0]]] / df1.loc[[pair[1]]]

When I run this code, I get a ValueError:

ValueError: Buffer has wrong number of dimensions (expected 1, got 3)

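The earlier example I linked above has no nested brackets in the line that performs the division, but when I remove the brackets I get index-related KeyErrors. I suspect this is because I am using tuples, and nested tuples, as indices. Any help would be greatly appreciated; I have spent the whole afternoon trying to solve this.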

Answer 1

pandas uses tuples for a MultiIndex (see docs):

The MultiIndex object is the hierarchical analogue of the standard Index object which typically stores the axis labels in pandas objects. You can think of MultiIndex as an array of tuples where each tuple is unique. A MultiIndex can be created from a list of arrays (using MultiIndex.from_arrays), an array of tuples (using MultiIndex.from_tuples), or a crossed set of iterables (using MultiIndex.from_product).

So it is probably most appropriate to define a MultiIndex first.

df1 = pd.DataFrame(data={'A':[1,2,3], 'B':[4,5,6], 'C':[7,8,9], 'D': [10,11,12]}, index=pd.MultiIndex.from_tuples([('geneA', 1538), ('geneA', 1591), ('geneA', 1687)]))

            A  B  C   D
geneA 1538  1  4  7  10
      1591  2  5  8  11
      1687  3  6  9  12
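To make the role of the tuples concrete, here is a minimal sketch of how a full (gene, position) key selects a single row once the index is a MultiIndex. The variable names tuples, data, multi, row and ratio are only illustrative and do not come from the question.

```python
import pandas as pd

tuples = [("geneA", 1538), ("geneA", 1591), ("geneA", 1687)]
data = {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9], "D": [10, 11, 12]}

# Build the frame with an explicit two-level MultiIndex (gene, position).
multi = pd.DataFrame(data, index=pd.MultiIndex.from_tuples(tuples))

# A complete (gene, position) tuple identifies exactly one row, which comes
# back as a Series labelled by the column names.
row = multi.loc[("geneA", 1538)]

# Two such Series align on the column labels, so the division is element-wise.
ratio = row / multi.loc[("geneA", 1591)]
print(ratio)
```

Because both operands are Series with the same labels A to D, no extra brackets are needed around the keys.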

Defined this way, the division works:

pairs = [(('geneA', 1538), ('geneA', 1591))]
df2 = pd.DataFrame()
for pair in pairs:
    df2[pair] = df1.loc[pair[0]].div(df1.loc[pair[1]])

df2.T

                                  A    B      C         D
((geneA, 1538), (geneA, 1591))  0.5  0.8  0.875  0.909091
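If nested tuples as labels turn out to be awkward to work with, one possible variation (not what the loop above does) is to flatten each pair into a four-level MultiIndex on the result. The sketch below assumes that layout; the level names num_gene, num_pos, den_gene and den_pos are hypothetical, and the second pair is added only to show more than one row.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9], "D": [10, 11, 12]},
    index=pd.MultiIndex.from_tuples(
        [("geneA", 1538), ("geneA", 1591), ("geneA", 1687)]
    ),
)
pairs = [
    (("geneA", 1538), ("geneA", 1591)),
    (("geneA", 1591), ("geneA", 1687)),  # extra pair, for illustration only
]

# One array of ratios per (numerator, denominator) pair.
rows = [(df1.loc[num] / df1.loc[den]).to_numpy() for num, den in pairs]

# Flatten each pair into a single 4-tuple: numerator key followed by
# denominator key. The level names are hypothetical.
index = pd.MultiIndex.from_tuples(
    [num + den for num, den in pairs],
    names=["num_gene", "num_pos", "den_gene", "den_pos"],
)

df2 = pd.DataFrame(rows, index=index, columns=df1.columns)
print(df2)
```

This builds the result in one step instead of growing an empty DataFrame inside the loop, and the ratios end up as rows directly, so no transpose is needed.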

python

multi-index

pandas