Pivot a data frame with a list as a value in a column?

I have a data frame like this:

name  A  B        i
x     3  [1,1,1]  1
y     3  4        1
z     5  [1,1,1]  1
x     5  3        2
y     5  7        2
z     7  3        2

What I want is this:

x_A  x_B      y_A  y_B  z_A  z_B      i
3    [1,1,1]  3    4    5    [1,1,1]  1
5    3        5    7    7    3        2

So far, my code looks like this:

df = df.pivot_table(columns='name', values=['A', 'B'])
df2 = df.unstack().to_frame().T
df2.columns = df2.columns.map('_'.join)

However, when I run this, it seems to skip the column that contains lists (i.e. column B) and gives me:

x_A y_A z_A
3 3 5

Is there another way to solve this? Am I missing something? TIA.

Try:

df = df.set_index("name").stack().to_frame().T
df.columns = df.columns.map("_".join)
print(df)

Prints:

  x_A        x_B y_A y_B z_A        z_B
0   3  [1, 1, 1]   3   4   5  [1, 1, 1]

EDIT: For the updated question:

df = df.set_index(["name", "i"]).unstack(level=0).swaplevel(axis=1)
df.columns = df.columns.map("_".join)
print(df.reset_index())

Prints:

   i  x_A  y_A  z_A        x_B y_B        z_B
0  1    3    3    5  [1, 1, 1]   4  [1, 1, 1]
1  2    5    5    7          3   7          3
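If the columns should be interleaved per name, as in the layout the question asks for (x_A, x_B, y_A, ...), one possible tweak is to sort the swapped column MultiIndex before joining it. The following is a minimal sketch, assuming the six-row frame from the question with B held as an object column; the variable name `wide` is only illustrative.

```python
import pandas as pd

# Sample frame from the question; B mixes lists and scalars (object dtype).
df = pd.DataFrame({
    "name": ["x", "y", "z", "x", "y", "z"],
    "A": [3, 3, 5, 5, 5, 7],
    "B": [[1, 1, 1], 4, [1, 1, 1], 3, 7, 3],
    "i": [1, 1, 1, 2, 2, 2],
})

# Move "name" into the columns, then put it in front of the A/B level.
wide = df.set_index(["name", "i"]).unstack(level=0).swaplevel(axis=1)

# Sorting the (name, A/B) pairs groups each name's columns together.
wide = wide.sort_index(axis=1)

# Flatten the two column levels into single labels such as "x_A".
wide.columns = wide.columns.map("_".join)
wide = wide.reset_index()

print(wide)
```

The sort happens before the levels are joined, so the ordering is by name first and by A/B second rather than by the flattened strings.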

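Use numpy.ravel (https://numpy.org/doc/stable/reference/generated/numpy.ravel.html):

from io import StringIO

import pandas as pd

# Sample data; the columns are separated by whitespace
data = """name    A   B
x   3   [1,1,1]
y   3   4
z   5   [1,1,1]"""
df = pd.read_table(StringIO(data), sep=r'\s+').set_index('name')
# Flatten the values row by row and label each one as <name>_<column>
s = pd.Series(df.values.ravel(),
              index=[i+'_'+c for i in df.index for c in df.columns])
print(s.to_frame().T)

Prints:

  x_A      x_B y_A y_B z_A      z_B
0   3  [1,1,1]   3   4   5  [1,1,1]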
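The ravel approach relies on the flattened values and the generated labels being produced in the same row-by-row order. Here is a small sketch of that pairing, assuming the frame holds real Python lists instead of the strings read from text above; `flat`, `labels` and `row` are illustrative names.

```python
import pandas as pd

# Three-row frame indexed by name; B is an object column holding lists and ints.
df = pd.DataFrame(
    {"A": [3, 3, 5], "B": [[1, 1, 1], 4, [1, 1, 1]]},
    index=pd.Index(["x", "y", "z"], name="name"),
)

# ravel() walks the 2-D values in row-major order: x's A, x's B, y's A, ...
flat = df.to_numpy().ravel()

# Build the labels with the same nesting: rows outside, columns inside.
labels = [f"{name}_{col}" for name in df.index for col in df.columns]

# A single-row frame whose columns are the combined labels.
row = pd.Series(flat, index=labels).to_frame().T

print(row)
```

If the comprehension looped over columns first, the labels would no longer line up with the values, so the two orders have to be kept in sync.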

We can use pivot_table with an aggfunc of first and sort_index to group the level 1 keys together. The explicit aggfunc is needed because pivot_table defaults to aggfunc='mean', which cannot average object values such as lists. Then collapse the MultiIndex with MultiIndex.swaplevel and Index.map. Lastly, return i to the columns with DataFrame.reset_index:

out_df = (
    df.pivot_table(
        index='i',
        columns='name',
        aggfunc='first'
    ).sort_index(axis=1, level=1)
)
out_df.columns = out_df.columns.swaplevel().map('_'.join)
out_df = out_df.reset_index()

out_df:

   i  x_A        x_B  y_A y_B  z_A        z_B
0  1    3  [1, 1, 1]    3   4    5  [1, 1, 1]
1  2    5          3    5   7    7          3

Setup:

import pandas as pd

df = pd.DataFrame({
    'name': ['x', 'y', 'z', 'x', 'y', 'z'],
    'A': [3, 3, 5, 5, 5, 7],
    'B': [[1, 1, 1], '4', [1, 1, 1], '3', '7', '3'],
    'i': [1, 1, 1, 2, 2, 2]
})
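Because every (i, name) pair occurs exactly once in this data, no aggregation is strictly required. As an alternative sketch (not part of the answer above), DataFrame.pivot only reshapes and so leaves the list values untouched; it would raise on duplicate (i, name) pairs, in which case pivot_table with aggfunc='first' remains the safer choice. The name `wide` is illustrative.

```python
import pandas as pd

# Same setup as above: B holds lists and strings (object dtype).
df = pd.DataFrame({
    "name": ["x", "y", "z", "x", "y", "z"],
    "A": [3, 3, 5, 5, 5, 7],
    "B": [[1, 1, 1], "4", [1, 1, 1], "3", "7", "3"],
    "i": [1, 1, 1, 2, 2, 2],
})

# pivot reshapes without aggregating; columns become (A|B, name) pairs.
wide = df.pivot(index="i", columns="name")

# Group the columns by name, then flatten (name, A|B) into "x_A" style labels.
wide = wide.sort_index(axis=1, level=1)
wide.columns = wide.columns.swaplevel().map("_".join)
wide = wide.reset_index()

print(wide)
```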

The pyjanitor module has an abstraction for this operation called pivot_wider, which simplifies this transformation to:

out_df = df.pivot_wider(index='i', names_from='name')

out_df:

   i  x_A  y_A  z_A        x_B y_B        z_B
0  1    3    3    5  [1, 1, 1]   4  [1, 1, 1]
1  2    5    5    7          3   7          3

Complete working example:

# pip install pyjanitor
# conda install pyjanitor -c conda-forge
import janitor
import pandas as pd

df = pd.DataFrame({
    'name': ['x', 'y', 'z', 'x', 'y', 'z'],
    'A': [3, 3, 5, 5, 5, 7],
    'B': [[1, 1, 1], '4', [1, 1, 1], '3', '7', '3'],
    'i': [1, 1, 1, 2, 2, 2]
})

out_df = df.pivot_wider(index='i', names_from='name')
print(out_df)