Stacking a number of columns into one column in Python

Question

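I have a pandas dataframe of 100 rows x 7 columns like this:

The values in the source column are connected to the values in the other columns. For example, a is connected to contact_1, contact_2 ... contact_5. Likewise, b is connected to contact_6, contact_7 ... contact_10.

I want to stack these columns into just two columns (i.e. source and destination) so that I can build a graph from the data in edge-list format.

The expected output data format is:

I tried df.stack() but did not get the desired result. I got the following:

Any suggestions?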

Answer 1

You are looking for pd.wide_to_long. This should do it:

pd.wide_to_long(df, stubnames='destination_', i=['source'], j='number')

The destination_ column will then contain the information you are looking for.

Example:

import pandas as pd
d = {'source': ['a', 'b'],
 'destination_1': ['contact_1', 'contact_6'],
 'destination_2': ['contact_2', 'contact_7']}
df = pd.DataFrame(d)
pd.wide_to_long(df, stubnames='destination_', i=['source'], j='number')

Output:

              destination_
source number             
a      1         contact_1
b      1         contact_6
a      2         contact_2
b      2         contact_7
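
The result above still carries source and number in the index, and the value column keeps the stub name destination_. A minimal sketch of turning it into the two-column edge list from the question follows. The names long_df and edges are made up for this illustration, and the sketch assumes every source value appears in exactly one row, since pd.wide_to_long requires the id columns to identify each row uniquely.

```python
import pandas as pd

df = pd.DataFrame({
    "source": ["a", "b"],
    "destination_1": ["contact_1", "contact_6"],
    "destination_2": ["contact_2", "contact_7"],
})

# Reshape wide -> long; the numeric suffix of each column goes into "number".
long_df = pd.wide_to_long(df, stubnames="destination_", i=["source"], j="number")

# Hypothetical post-processing (not part of the original answer):
# move the index levels back into columns, rename the stub column,
# keep only the two edge-list columns, and group rows by source.
edges = (
    long_df
    .reset_index()
    .rename(columns={"destination_": "destination"})
    .loc[:, ["source", "destination"]]
    .sort_values("source", kind="stable")
    .reset_index(drop=True)
)

print(edges)
```

The stable sort keeps the destinations of each source in their original column order.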

Answer 2

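You can try pandas.DataFrame.melt, which reshapes the dataframe so that the columns you name are kept as identifier variables and the remaining columns are unpivoted into rows as value variables. You can read more about it in the pandas documentation for DataFrame.melt.

You can apply DataFrame.melt to your data as follows: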

import pandas as pd

df = pd.DataFrame(data={
    "source": ["a", "b", "c"],
    "destination_1": ["contact_1", "contact_6", "contact_11"],
    "destination_2": ["contact_2", "contact_7", "contact_12"],
    # ... (remaining destination columns elided)
})

# Unpivot every non-identifier column into rows.
# value_vars is automatically inferred to be the remaining columns.
output_df = df.melt(id_vars=["source"])

This will output a DataFrame object that looks like:

   source       variable       value
0       a  destination_1   contact_1
1       b  destination_1   contact_6
2       c  destination_1  contact_11
3       a  destination_2   contact_2
4       b  destination_2   contact_7
5       c  destination_2  contact_12
.       .              .           .
.       .              .           .
.       .              .           .

You can sort by the source column using output_df.sort_values(by=["source"]). If needed, you can drop the variable column and rename the value column to destination. You can also reset the index after sorting using output_df.reset_index(drop=True).
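
Put together, the melt, drop, sort and reset steps might look like the sketch below. It passes value_name="destination" to melt instead of renaming the value column afterwards. The variable name edges and the dropna step are assumptions added for this illustration: dropna only matters if some sources have fewer contacts than others and therefore leave empty cells, which the question does not state.

```python
import pandas as pd

df = pd.DataFrame({
    "source": ["a", "b", "c"],
    "destination_1": ["contact_1", "contact_6", "contact_11"],
    "destination_2": ["contact_2", "contact_7", "contact_12"],
})

edges = (
    # Unpivot the destination_* columns; name the value column directly.
    df.melt(id_vars=["source"], value_name="destination")
    # The original column names are not needed in an edge list.
    .drop(columns="variable")
    # Assumption: discard rows created from empty cells, if there are any.
    .dropna(subset=["destination"])
    # A stable sort groups rows by source without reordering within a group.
    .sort_values(by="source", kind="stable")
    .reset_index(drop=True)
)

print(edges)
```

Each row of edges is then one (source, destination) pair, which is the shape an edge list needs.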
