Pivot data-frame with list as value in column?

Question

I have a data frame like this:

name	A	B	i
x	3	[1,1,1]	1
y	3	4	1
z	5	[1,1,1]	1
x	5	3	2
y	5	7	2
z	7	3	2

What I want is this:

x_A	x_B	y_A	y_B	z_A	z_B	i
3	[1,1,1]	3	4	5	[1,1,1]	1
5	3	5	7	7	3	2

So far, my code looks like this:

df = df.pivot_table(columns='name', values=['A', 'B'])
df2 = df.unstack().to_frame().T
df2.columns = df2.columns.map('_'.join)

However, when I run this, it seems to skip the column that contains lists (column B) and gives me:

x_A	y_A	z_A
3	3	5

Is there another way to approach this? Am I missing something? TIA.

Answer 1

Try:

df = df.set_index("name").stack().to_frame().T
df.columns = df.columns.map("_".join)
print(df)

Prints:

  x_A        x_B y_A y_B z_A        z_B
0   3  [1, 1, 1]   3   4   5  [1, 1, 1]

EDIT: For the updated question:

df = df.set_index(["name", "i"]).unstack(level=0).swaplevel(axis=1)
df.columns = df.columns.map("_".join)
print(df.reset_index())

Prints:

   i  x_A  y_A  z_A        x_B y_B        z_B
0  1    3    3    5  [1, 1, 1]   4  [1, 1, 1]
1  2    5    5    7          3   7          3
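
If the column order from the question matters (x_A, x_B, y_A, ...), one option is to sort the column MultiIndex after swapping its levels and before joining the labels. The following is a minimal self-contained sketch, assuming the sample data from the question with the scalar B values stored as strings; the name `wide` is only illustrative:

```python
import pandas as pd

# Sample data from the question; column B mixes lists and scalars.
df = pd.DataFrame({
    "name": ["x", "y", "z", "x", "y", "z"],
    "A": [3, 3, 5, 5, 5, 7],
    "B": [[1, 1, 1], "4", [1, 1, 1], "3", "7", "3"],
    "i": [1, 1, 1, 2, 2, 2],
})

# unstack only reshapes (no aggregation), so list cells pass through unchanged.
wide = df.set_index(["name", "i"]).unstack(level=0)
# Put the name first in each column label, then sort the labels
# so that each name's A and B columns sit next to each other.
wide.columns = wide.columns.swaplevel()
wide = wide.sort_index(axis=1)
# Collapse the two-level labels into strings such as "x_A".
wide.columns = wide.columns.map("_".join)
wide = wide.reset_index()
print(wide.columns.tolist())
```

Sorting happens while the labels are still tuples, so the order is decided by name first and by A/B second.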

Answer 2

See https://numpy.org/doc/stable/reference/generated/numpy.ravel.html

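from io import StringIO

import pandas as pd

# Sample data as whitespace-separated text
data = """name    A   B
x   3   [1,1,1]
y   3   4
z   5   [1,1,1]"""
# sep=r'\s+' splits on runs of whitespace; the list cells are read as plain strings
df = pd.read_table(StringIO(data), sep=r'\s+').set_index('name')
# ravel flattens the values row by row, which matches the name_column label order
s = pd.Series(df.values.ravel(),
              index=[i+'_'+c for i in df.index for c in df.columns])
s.to_frame().T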

	x_A	x_B	y_A	y_B	z_A	z_B
0	3	[1,1,1]	3	4	5	[1,1,1]
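
The ravel approach above covers a single block of rows. For the updated question, where the rows are grouped by i, one possible extension is to flatten each group separately and stack the results. This is a rough sketch, assuming the column names from the question; `flatten_group` is a hypothetical helper, not a pandas function:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["x", "y", "z", "x", "y", "z"],
    "A": [3, 3, 5, 5, 5, 7],
    "B": [[1, 1, 1], "4", [1, 1, 1], "3", "7", "3"],
    "i": [1, 1, 1, 2, 2, 2],
})


def flatten_group(group):
    """Flatten one i-group into a single labelled row (hypothetical helper)."""
    block = group.set_index("name")[["A", "B"]]
    # ravel walks the 2-D object array row by row: x's A, x's B, y's A, ...
    labels = [f"{name}_{col}" for name in block.index for col in block.columns]
    return pd.Series(block.to_numpy().ravel(), index=labels)


# One flattened Series per value of i, keyed by that value.
rows = {i: flatten_group(group) for i, group in df.groupby("i")}
# Each Series becomes a column, so transpose to get one row per i.
out = pd.DataFrame(rows).T.rename_axis("i").reset_index()
print(out)
```

Because the values pass through an object array, the resulting value columns have object dtype; calling `infer_objects()` on the result may restore numeric dtypes where a column holds only numbers.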

Answer 3

We can use pivot_table with an aggfunc of first (to handle object types like list) and sort_index to group the level 1 keys together. Then collapse the MultiIndex with MultiIndex.swaplevel and Index.map. Lastly, return i to the columns with DataFrame.reset_index:

out_df = (
    df.pivot_table(
        index='i',
        columns='name',
        aggfunc='first'
    ).sort_index(axis=1, level=1)
)
out_df.columns = out_df.columns.swaplevel().map('_'.join)
out_df = out_df.reset_index()

out_df:

	i	x_A	x_B	y_A	y_B	z_A	z_B
0	1	3	[1, 1, 1]	3	4	5	[1, 1, 1]
1	2	5	3	5	7	7	3

Setup:

import pandas as pd

df = pd.DataFrame({
    'name': ['x', 'y', 'z', 'x', 'y', 'z'],
    'A': [3, 3, 5, 5, 5, 7],
    'B': [[1, 1, 1], '4', [1, 1, 1], '3', '7', '3'],
    'i': [1, 1, 1, 2, 2, 2]
})

The pyjanitor module has an abstraction for this operation called pivot_wider, which simplifies the transformation to:

out_df = df.pivot_wider(index='i', names_from='name')

	i	x_A	y_A	z_A	x_B	y_B	z_B
0	1	3	3	5	[1, 1, 1]	4	[1, 1, 1]
1	2	5	5	7	3	7	3

Complete working example:

# pip install pyjanitor
# conda install pyjanitor -c conda-forge
import janitor
import pandas as pd

df = pd.DataFrame({
    'name': ['x', 'y', 'z', 'x', 'y', 'z'],
    'A': [3, 3, 5, 5, 5, 7],
    'B': [[1, 1, 1], '4', [1, 1, 1], '3', '7', '3'],
    'i': [1, 1, 1, 2, 2, 2]
})

out_df = df.pivot_wider(index='i', names_from='name')
print(out_df)
