Reshape a pandas DataFrame

Question

Suppose I have a DataFrame like this:

import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]], columns=['A', 'B', 'A1', 'B1'])

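I want a DataFrame in which the A1 and B1 values are stacked beneath A and B, with an id column identifying the pair each row came from.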

What did not work:

new_rows = int(df.shape[1]/2) * df.shape[0]
new_cols = 2
df.values.reshape(new_rows, new_cols, order='F')

Of course I could loop over the data and build a new list of lists, but there must be a better way. Any ideas?

Answer 1

You can use lreshape, with numpy.repeat for the id column:

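import numpy as np

# Collect the source columns that feed each target column
a = [col for col in df.columns if 'A' in col]
b = [col for col in df.columns if 'B' in col]
df1 = pd.lreshape(df, {'A': a, 'B': b})

# One id per column pair, repeated once for every original row
df1['id'] = np.repeat(np.arange(len(df.columns) // 2), len(df.index)) + 1
print(df1)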
    A   B  id
0   1   2   1
1   5   6   1
2   9  10   1
3   3   4   2
4   7   8   2
5  11  12   2
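
To make the id construction above easier to follow, here is a minimal sketch of the np.repeat step on its own. The names n_groups and n_rows are illustrative stand-ins for len(df.columns) // 2 and len(df.index); they are not part of the original answer.

```python
import numpy as np

# Illustrative values for the example frame: 2 column pairs, 3 rows
n_groups = 2  # stands in for len(df.columns) // 2
n_rows = 3    # stands in for len(df.index)

# np.arange gives [0, 1]; np.repeat repeats each element n_rows times,
# and adding 1 makes the ids start at 1 instead of 0
ids = np.repeat(np.arange(n_groups), n_rows) + 1
print(ids)  # [1 1 1 2 2 2]
```

This relies on lreshape stacking the blocks in the order the columns are listed, so the first n_rows rows belong to the first pair.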

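Edit:

lreshape is currently undocumented, and it is possible that it will be removed (along with pd.wide_to_long).

A possible solution is to merge all three functions into one, perhaps melt, but that has not been implemented yet. Maybe it will arrive in some new version of pandas, in which case I will update my answer.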

Answer 2

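I solved this in three steps:

1. Create a new DataFrame df2 that holds only the data you want to add to the initial DataFrame df.
2. Delete from df the data that will be added back below (the data used to make df2).
3. Append df2 to df.

Like this: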

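# step 1: create new dataframe
df2 = df[['A1', 'B1']].copy()
df2.columns = ['A', 'B']

# step 2: delete that data from original
df = df.drop(columns=["A1", "B1"])

# step 3: append
# (DataFrame.append was removed in pandas 2.0; pd.concat does the same job)
df = pd.concat([df, df2], ignore_index=True)

Note that when you append, you need to specify ignore_index=True so that the new rows get a fresh index instead of keeping their old one.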

Your final result should be your original DataFrame with the data rearranged the way you want:

In [16]: df
Out[16]:
    A   B
0   1   2
1   5   6
2   9  10
3   3   4
4   7   8
5  11  12

Answer 3

Use pd.concat() like this:

# Split into separate tables
df_1 = df[['A', 'B']]
df_2 = df[['A1', 'B1']]
df_2.columns = ['A', 'B'] # Make column names line up

# Add the ID column
df_1 = df_1.assign(id=1)
df_2 = df_2.assign(id=2)

# Concatenate
pd.concat([df_1, df_2])
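
If there are more than two column groups, the same idea can be written as a loop. The following is only a sketch under the assumption that you can list the source columns for each block yourself; the groups list and the variable names are hypothetical and not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
    columns=["A", "B", "A1", "B1"],
)

# Hypothetical grouping: each entry lists the source columns of one block,
# in the order they should map onto the target names A and B
groups = [["A", "B"], ["A1", "B1"]]

parts = []
for group_id, cols in enumerate(groups, start=1):
    part = df[cols].copy()
    part.columns = ["A", "B"]  # make column names line up
    part["id"] = group_id      # record which block the rows came from
    parts.append(part)

# ignore_index=True renumbers the rows 0..n-1 instead of repeating 0, 1, 2
result = pd.concat(parts, ignore_index=True)
print(result)
```

Passing ignore_index=True is a choice made in this sketch; leave it out if you want each row to keep its original index label.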

Answer 4

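The pd.wide_to_long function is built almost exactly for this situation, where you have many of the same variable prefixes ending in different numeric suffixes. The only difference here is that your first set of variables has no suffix, so you need to rename your columns first.

The only problem with pd.wide_to_long is that, unlike melt, it must have an identifier variable, i. reset_index is used here to create a column that uniquely identifies each row, and that column is dropped afterwards. I think this may be fixed in the future.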

df1 = df.rename(columns={'A':'A1', 'B':'B1', 'A1':'A2', 'B1':'B2'}).reset_index()
pd.wide_to_long(df1, stubnames=['A', 'B'], i='index', j='id')\
  .reset_index()[['A', 'B', 'id']]

    A   B id
0   1   2  1
1   5   6  1
2   9  10  1
3   3   4  2
4   7   8  2
5  11  12  2
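
To see what the two preparatory steps do, the intermediate objects can be inspected. This is a small sketch that only reuses the calls from the answer; the name long_df is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
    columns=["A", "B", "A1", "B1"],
)

# rename maps every label independently, so 'A' -> 'A1' and 'A1' -> 'A2'
# do not chain; reset_index adds the 'index' column used as identifier i
df1 = df.rename(
    columns={"A": "A1", "B": "B1", "A1": "A2", "B1": "B2"}
).reset_index()
print(df1.columns.tolist())  # ['index', 'A1', 'B1', 'A2', 'B2']

# The result is indexed by (i, j): the row identifier and the parsed suffix
long_df = pd.wide_to_long(df1, stubnames=["A", "B"], i="index", j="id")
print(list(long_df.index.names))  # ['index', 'id']
```

The final reset_index()[['A', 'B', 'id']] in the answer then turns that two-level index back into columns and discards the helper index column.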
