Insert multiple rows into a data frame based on index position
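I have two multi-level index data frames, shown below. I need to merge them together.
Specifically, I need to insert the rows from df2 into df1 based on the matching index, at the end of each level. The data below shows what I mean, and the expected output is included as well. It needs to be "programmatic" and run over loads of these instances! Many thanks.
Data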
import numpy as np
import pandas as pd

array1 = [["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"]]
tupples1 = list(zip(*array1))
index1 = pd.MultiIndex.from_tuples(tupples1, names=["first"])
df1 = pd.DataFrame(np.random.randn(8), index=index1)
df1.reset_index(inplace=True)
array2 = [["bar", "bar","bar","qux","qux","qux","qux"]]
tupples2 = list(zip(*array2))
index2 = pd.MultiIndex.from_tuples(tupples2, names=["first"])
df2 = pd.DataFrame(np.random.randn(7), index=index2)
df2.reset_index(inplace=True)
The expected output would have this shape (but with the actual values):
array3 = [["bar", "bar","bar", "bar","bar", "baz", "baz", "foo", "foo", "qux", "qux","qux","qux","qux","qux","qux"]]
tupples3 = list(zip(*array3))
index3 = pd.MultiIndex.from_tuples(tupples3, names=["first"])
df3 = pd.DataFrame(np.random.randn(16), index=index3)
df3.reset_index(inplace=True)
Here is one way to do it: once you concat the two data frames, use pd.factorize on the first index level to get an ordering for that level.
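# Assumes df1 and df2 still carry 'first' as an index level
# (i.e. reset_index was not called on them).
np.random.seed(1)  # only affects the values if called before df1 and df2 are built
df3 = pd.concat([df1, df2])
df3 = (
    df3.set_index(  # add two temporary index levels for sorting
        [list(range(len(df3))),  # current order of the rows
         pd.factorize(df3.index.get_level_values('first'))[0]],  # order of the first index level
        append=True)  # keep the original index levels
    .sort_index(level=[-1, -2])  # sort by group code, then by original row order
    .droplevel([-2, -1])  # remove the two temporary levels
)
print(df3)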
                     0
first second
bar   one     1.624345
      two    -0.611756
      one     0.319039
      two    -0.249370
      three   1.462108
baz   one    -0.528172
      two    -1.072969
foo   one     0.865408
      two    -2.301539
qux   one     1.744812
      two    -0.761207
      one    -2.060141
      two    -0.322417
      three  -0.384054
      four    1.133769
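For intuition about why this ordering works, here is a small sketch of what pd.factorize returns. The label list is made up for illustration. With its default settings, pd.factorize assigns integer codes in order of first appearance, so the groups keep the order they have in df1 rather than being sorted alphabetically.

```python
import pandas as pd

# Hypothetical labels, standing in for the 'first' level after concat
labels = pd.Index(["bar", "bar", "baz", "foo", "qux", "bar", "qux"])

# codes: one integer per label, numbered by first appearance
# uniques: the distinct labels, in that same order
codes, uniques = pd.factorize(labels)

print(codes)    # [0 0 1 2 3 0 3]
print(uniques)
```

The late "bar" and "qux" entries get the same codes as the earlier ones, which is what lets a sort on the codes pull the df2 rows up next to their matching df1 group.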
Note that you can do the same thing by adding the two levels as columns for sorting and using sort_values.
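A minimal sketch of that sort_values variant might look like the following. The helper column names `_group` and `_pos` are invented for this example, and the "second" labels in the sample data are an assumption inferred from the printed output, not taken from the question's code.

```python
import numpy as np
import pandas as pd

np.random.seed(1)

# Sample data; the "second" level labels are assumed for illustration
idx1 = pd.MultiIndex.from_arrays(
    [["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
     ["one", "two", "one", "two", "one", "two", "one", "two"]],
    names=["first", "second"],
)
idx2 = pd.MultiIndex.from_arrays(
    [["bar", "bar", "bar", "qux", "qux", "qux", "qux"],
     ["one", "two", "three", "one", "two", "three", "four"]],
    names=["first", "second"],
)
df1 = pd.DataFrame(np.random.randn(8), index=idx1)
df2 = pd.DataFrame(np.random.randn(7), index=idx2)

combined = pd.concat([df1, df2])

# Temporary helper columns (hypothetical names):
# _group: integer code of the 'first' level, numbered by first appearance
# _pos:   position of each row in the concatenated frame
combined["_group"] = pd.factorize(combined.index.get_level_values("first"))[0]
combined["_pos"] = np.arange(len(combined))

# Sort by group, then by original position, and drop the helpers again
result = (
    combined.sort_values(["_group", "_pos"])
    .drop(columns=["_group", "_pos"])
)
print(result)
```

Compared with the index-based version, this keeps the original index untouched while sorting, at the cost of briefly adding two columns to the frame.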