Insert multiple rows into a data frame based on index position

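I have two multi-index dataframes, shown below, that I need to merge. Specifically, I need to insert the rows of df2 into df1 based on the matching index, at the end of each level. The data below shows what I mean, along with the expected output. It needs to be "programmatic" and run for loads of these instances! Many thanks.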

Data


import numpy as np
import pandas as pd

array1 = [["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"]]
tuples1 = list(zip(*array1))
index1 = pd.MultiIndex.from_tuples(tuples1, names=["first"])
df1 = pd.DataFrame(np.random.randn(8), index=index1)
df1.reset_index(inplace=True)


array2 = [["bar", "bar", "bar", "qux", "qux", "qux", "qux"]]
tuples2 = list(zip(*array2))
index2 = pd.MultiIndex.from_tuples(tuples2, names=["first"])
df2 = pd.DataFrame(np.random.randn(7), index=index2)
df2.reset_index(inplace=True)

The expected output would have this shape (but with actual values):

array3 = [["bar", "bar", "bar", "bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux", "qux", "qux", "qux", "qux", "qux"]]
tuples3 = list(zip(*array3))
index3 = pd.MultiIndex.from_tuples(tuples3, names=["first"])
df3 = pd.DataFrame(np.random.randn(16), index=index3)
df3.reset_index(inplace=True)

Here is one way to do it: once you concat the two dataframes, use pd.factorize on the first index level to get an ordering of that level.

# Assumes df1 and df2 still have 'first' as an index level,
# i.e. they were built without the reset_index(inplace=True) calls above.
np.random.seed(1)
df3 = pd.concat([df1, df2])

df3 = (
    df3.set_index( # add two index levels for sorting
         [list(range(len(df3))), # to keep the current order of rows
          pd.factorize(df3.index.get_level_values('first'))[0]], # to keep the order of the first index level
         append=True) # to not replace the original index
       .sort_index(level=[-1, -2]) # sort by group order, then by row order
       .droplevel([-2, -1]) # delete the extra index levels
)
print(df3)
                     0
first second          
bar   one     1.624345
      two    -0.611756
      one     0.319039
      two    -0.249370
      three   1.462108
baz   one    -0.528172
      two    -1.072969
foo   one     0.865408
      two    -2.301539
qux   one     1.744812
      two    -0.761207
      one    -2.060141
      two    -0.322417
      three  -0.384054
      four    1.133769
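To see why this ordering works, here is a minimal sketch on plain labels. The `labels` Index is a hypothetical stand-in for the 'first' level after the concat (df1's labels, then df2's). pd.factorize numbers each group in order of first appearance, and a stable sort on those codes keeps the rows of each group in their existing order, so df2's rows end up at the end of their matching group.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 'first' level after pd.concat([df1, df2]):
# df1's labels come first, then df2's labels are appended at the end.
labels = pd.Index(["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux",
                   "bar", "bar", "bar", "qux", "qux", "qux", "qux"])

# pd.factorize numbers each label in order of first appearance,
# so df1's group order (bar, baz, foo, qux) is preserved.
codes, uniques = pd.factorize(labels)
print(codes)          # [0 0 1 1 2 2 3 3 0 0 0 3 3 3 3]
print(list(uniques))  # ['bar', 'baz', 'foo', 'qux']

# A stable sort on the codes keeps the original row order inside each group,
# which moves df2's rows to the end of the matching group from df1.
order = np.argsort(codes, kind="stable")
print(list(labels[order]))
```

This is the same idea as the two extra index levels in the answer: the factorize codes give the group order, and the row positions break ties within a group.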

Note that you can do the same thing by adding the two levels as columns for sorting and using sort_values.
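A minimal sketch of that sort_values variant, assuming 'first' is a column in both frames (as it is in the question's data after reset_index). The helper name insert_rows_by_group and the temporary _group and _row columns are hypothetical, introduced only for illustration. Wrapping the steps in a function makes it easy to run over many pairs of frames.

```python
import numpy as np
import pandas as pd


def insert_rows_by_group(df1, df2, key="first"):
    # Hypothetical helper: assumes `key` is a column in both frames,
    # as it is after the reset_index(inplace=True) calls in the question.
    combined = pd.concat([df1, df2], ignore_index=True)
    return (
        combined.assign(
            # group order as first seen (df1's order; df2-only groups go last)
            _group=pd.factorize(combined[key])[0],
            # current row order, used to break ties within a group
            _row=np.arange(len(combined)),
        )
        .sort_values(["_group", "_row"])
        .drop(columns=["_group", "_row"])
        .reset_index(drop=True)
    )


# Data built the same way as in the question, with 'first' as a column.
np.random.seed(1)
df1 = pd.DataFrame(
    np.random.randn(8),
    index=pd.Index(["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"], name="first"),
).reset_index()
df2 = pd.DataFrame(
    np.random.randn(7),
    index=pd.Index(["bar", "bar", "bar", "qux", "qux", "qux", "qux"], name="first"),
).reset_index()

result = insert_rows_by_group(df1, df2)
print(result)
```

Groups that appear only in df2 get the highest factorize codes, so in this sketch they would end up after all of df1's groups.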