将行转置为列并在 pandas 中创建新列

Transpose rows into columns and create new columns in pandas

运行 创建混淆数据的代码:

from itertools import product

import numpy as np
import pandas as pd


a = [1, 2, 3]
b = [20, 40, 80, 160]
c = ['A', 'B']
d = [10, 20, 30, 40]
e = ['Alpha', 'Beta']
f = ['X', 'Y', 'Z']
i = ['a', 'b', 'c', 'd', 'e', 'f']

df = pd.DataFrame(product(a, b, c, d, e, f, i),
                  columns=['col_A', 'col_B', 'col_C', 'col_D', 'col_E', 'col_F', 'col_G'])

df['col_H'] = np.random.randint(0, 100, size=df.shape[0])
df['col_I'] = np.random.randint(0, 100, size=df.shape[0])
df['col_J'] = np.random.randint(0, 100, size=df.shape[0])

print(df)

我得到以下 table:

      col_A  col_B col_C  col_D  col_E col_F col_G  col_H  col_I  col_J
0         1     20     A     10  Alpha     X     a     44     55     12
1         1     20     A     10  Alpha     X     b     50     15     46
2         1     20     A     10  Alpha     X     c     42     36     46
3         1     20     A     10  Alpha     X     d     71     19     61
4         1     20     A     10  Alpha     X     e     74      1      7
5         1     20     A     10  Alpha     X     f     11     99      2
...     ...    ...   ...    ...    ...   ...   ...    ...    ...    ...
3450      3    160     B     40   Beta     Z     a     47     81     58
3451      3    160     B     40   Beta     Z     b     63     82     26
3452      3    160     B     40   Beta     Z     c     37     28     64
3453      3    160     B     40   Beta     Z     d     54     60     91
3454      3    160     B     40   Beta     Z     e     16     14     55
3455      3    160     B     40   Beta     Z     f     40     18     97

[3456 rows x 10 columns]

我想转置最后 4 列,将第一列的值设置为 headers,并根据其余 3 列名称添加一个新列。

见下文:

      col_A  col_B col_C  col_D  col_E col_F new_col   a   b   c   d   e   f
0         1     20     A     10  Alpha     X   col_H  68  89   0  99  50  11
1         1     20     A     10  Alpha     X   col_I  27  43  20  45  97  99
2         1     20     A     10  Alpha     X   col_J  40  43  36  89  90   2
...     ...    ...   ...    ...    ...   ...     ... ... ... ... ... ... ...
1150      3    160     B     40   Beta     Z   col_H  59  50  95  31  32  68
1151      3    160     B     40   Beta     Z   col_I  14  94  25  20  46  90
1152      3    160     B     40   Beta     Z   col_J  87  32  34  99  91  90

我试过用 melt, pivot, and pivot_table 来做,但我想不通。

这是通过 set_index / stack / unstack 的一种方式:

df = df.set_index(df.columns[:-3].to_list()).stack().unstack(-2).reset_index()

输出:

print(df.head(10))

col_G  col_A  col_B col_C  col_D  col_E col_F level_6   a   b   c   d   e   f
0          1     20     A     10  Alpha     X   col_H  97  96  59  71  98  19
1          1     20     A     10  Alpha     X   col_I   0  55   8  26  70  85
2          1     20     A     10  Alpha     X   col_J  90  87  16  70  68  39
3          1     20     A     10  Alpha     Y   col_H  65  69   9  92  35  11
4          1     20     A     10  Alpha     Y   col_I  28  47  56  83  43  81
5          1     20     A     10  Alpha     Y   col_J   8  29  89  47  84  32
6          1     20     A     10  Alpha     Z   col_H  97  12  95  17  34  47
7          1     20     A     10  Alpha     Z   col_I  78  64  51  81  15  62
8          1     20     A     10  Alpha     Z   col_J  47  83  58   3  90  11
9          1     20     A     10   Beta     X   col_H  19   8  92  69  65  32