Get column name as new column with the same column value

I have a DataFrame similar to this:


name    hobby   date         country      5           10           15         20 ...
Toby    Guitar  2020-01-19    Brazil     0.1245       0.2543      0.7763     0.2264
Linda   Cooking 2020-03-05    Italy      0.5411       0.2213      NaN        0.3342
Ben     Diving  2020-04-02    USA        0.8843       0.2333      0.4486     0.2122
...

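I want to take the columns whose names are ints, duplicate each one, and fill the duplicate with the int column name as its value, like this: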


name    hobby   date         country      5      5     10      10     15     15    20      20...
Toby    Guitar  2020-01-19    Brazil     0.1245  5     0.2543  10    0.7763  15   0.2264   20
Linda   Cooking 2020-03-05    Italy      0.5411  5     0.2213  10    NaN     15   0.3342   20
Ben     Diving  2020-04-02    USA        0.8843  5     0.2333  10    0.4486  15   0.2122   20
...

I'm not sure how to approach this and am looking for ideas.

Here is a solution you can try:

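# One constant column per digit-named column; str() also covers int labels.
digits_ = pd.DataFrame(
    {col: [int(col)] * len(df) for col in df.columns if str(col).isdigit()},
    index=df.index,  # keep rows aligned with df when concatenating
)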

pd.concat([df, digits_], axis=1)

    name    hobby        date country       5  ...      20  5  10  15  20
0   Toby   Guitar  2020-01-19  Brazil  0.1245  ...  0.2264  5  10  15  20
1  Linda  Cooking  2020-03-05   Italy  0.5411  ...  0.3342  5  10  15  20
2    Ben   Diving  2020-04-02     USA  0.8843  ...  0.2122  5  10  15  20
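Note that the concat approach appends the constant columns at the end of the frame. If each copy should sit right next to its source column, as in the question's desired output, one possible sketch is to assemble the frame piece by piece. It assumes string column labels and that the identifier columns come before the numeric ones; `value_cols`, `pieces`, and `wide` are illustrative names, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with string column labels (an assumption).
df = pd.DataFrame(
    [["Toby", "Guitar", "2020-01-19", "Brazil", 0.1245, 0.2543, 0.7763, 0.2264],
     ["Linda", "Cooking", "2020-03-05", "Italy", 0.5411, 0.2213, np.nan, 0.3342],
     ["Ben", "Diving", "2020-04-02", "USA", 0.8843, 0.2333, 0.4486, 0.2122]],
    columns=["name", "hobby", "date", "country", "5", "10", "15", "20"],
)

# Columns whose label is made of digits only.
value_cols = [c for c in df.columns if str(c).isdigit()]

# Start with the identifier columns, then add each value column
# followed by a constant column that repeats its label as an int.
pieces = [df.drop(columns=value_cols)]
for col in value_cols:
    pieces.append(df[col])
    pieces.append(pd.Series(int(col), index=df.index, name=col))

wide = pd.concat(pieces, axis=1)
print(wide)
```

Because the result has repeated column labels, selecting `wide["5"]` returns both columns; positional access with `wide.iloc` is the safer way to pick one of them.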

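That said, I'm not sure a frame with duplicate column names is the best way to organize this data. I'd suggest stacking (melting) it into long format instead: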

df.melt(id_vars=["name", "hobby", "date", "country"])

Result:

    name       hobby    date        country variable    value
0   Toby       Guitar   2020-01-19  Brazil  5           0.1245
1   Linda      Cooking  2020-03-05  Italy   5           0.5411
2   Ben        Diving   2020-04-02  USA     5           0.8843
3   Toby       Guitar   2020-01-19  Brazil  10          0.2543
4   Linda      Cooking  2020-03-05  Italy   10          0.2213
5   Ben        Diving   2020-04-02  USA     10          0.2333
6   Toby       Guitar   2020-01-19  Brazil  15          0.7763
7   Linda      Cooking  2020-03-05  Italy   15          NaN
8   Ben        Diving   2020-04-02  USA     15          0.4486
9   Toby       Guitar   2020-01-19  Brazil  20          0.2264
10  Linda      Cooking  2020-03-05  Italy   20          0.3342
11  Ben        Diving   2020-04-02  USA     20          0.2122
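As a rough sketch of how the long format might be used, the melted columns can be given descriptive names and the former column labels converted to integers. The names `step` and `score` are placeholders chosen for illustration, since the question does not say what the numbers represent.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    [["Toby", "Guitar", "2020-01-19", "Brazil", 0.1245, 0.2543, 0.7763, 0.2264],
     ["Linda", "Cooking", "2020-03-05", "Italy", 0.5411, 0.2213, np.nan, 0.3342],
     ["Ben", "Diving", "2020-04-02", "USA", 0.8843, 0.2333, 0.4486, 0.2122]],
    columns=["name", "hobby", "date", "country", 5, 10, 15, 20],
)

# One row per (person, numeric column) pair.
long = df.melt(
    id_vars=["name", "hobby", "date", "country"],
    var_name="step",     # hypothetical name for the old column label
    value_name="score",  # hypothetical name for the measured value
)

# The old labels become ordinary data, so they can be typed and filtered.
long["step"] = long["step"].astype(int)
print(long[long["step"] >= 15])
```

In this shape the integer that used to be a column name is already a regular column, so no duplicate labels are needed.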

You can use the pandas DataFrame.insert(...) method together with a for loop:

import numpy as np
import pandas as pd

df = pd.DataFrame([['Toby', 'Guitar', '2020-01-19', 'Brazil', 0.1245, 0.2543, 0.7763, 0.2264],
                   ['Linda', 'Cooking', '2020-03-05', 'Italy', 0.5411, 0.2213, np.nan, 0.3342],
                   ['Ben', 'Diving', '2020-04-02', 'USA', 0.8843, 0.2333, 0.4486, 0.2122]],
                  columns=['name', 'hobby', 'date', 'country', 5, 10, 15, 20])

start_col = 4  # position of the first numeric column
# range() is evaluated once, so the loop runs once per original numeric column.
for i in range(len(df.columns) - start_col):
    dcol = df.columns[start_col + i*2]  # digit col name to duplicate
    # Insert the copy right after its source column; duplicate labels are allowed.
    df.insert(start_col + i*2 + 1, dcol, [dcol] * len(df.index), allow_duplicates=True)

Result:

    name    hobby        date country       5  ...  10      15  15      20  20
0   Toby   Guitar  2020-01-19  Brazil  0.1245  ...  10  0.7763  15  0.2264  20
1  Linda  Cooking  2020-03-05   Italy  0.5411  ...  10     NaN  15  0.3342  20
2    Ben   Diving  2020-04-02     USA  0.8843  ...  10  0.4486  15  0.2122  20

[3 rows x 12 columns]

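I assumed that all of your columns from the 5th position onward have numeric names. If that is not the case, you can add an if condition inside the for loop to guard against it: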

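# Run this on the original df, before any copies have been inserted.
start_col = 4
pos = start_col  # current position in the growing frame
for dcol in list(df.columns[start_col:]):
    pos += 1  # step past the source column
    if isinstance(dcol, (int, np.integer)):
        df.insert(pos, dcol, [dcol] * len(df.index), allow_duplicates=True)
        pos += 1  # step past the inserted copy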