Pandas Melt row into a column

I have a DataFrame that currently reads in as follows:

import pandas as pd

df_new = pd.DataFrame({'Week':['nan',14, 14, 14, 14, 14],
                          'Date':['NaT','2020-04-01', '2020-04-02', '2020-04-03', '2020-04-04', '2020-04-05'],
                          'site 1':['entry',0, 0, 0, 0, 0],
                          'site 1':['exit',0, 0, 0, 0, 0],
                          'site 2':['entry',1, 0,50, 7, 0],
                          'site 2':['exit',10, 0, 7, 19, 0],
                          'site 3':['entry',0, 100, 14, 9, 0],
                          'site 3':['exit',0, 0, 7, 0, 0],
                          'site 4':['entry',0, 0, 0, 0, 0],
                          'site 4':['exit',0, 0, 0, 0, 0],
                          'site 5':['entry',0, 0, 0, 0, 0],
                          'site 5':['exit',15, 0, 25, 0, 80],
                          })

However, what I want is a column that specifies exit/entry for each site (the columns come from merged Excel headers).

Below is an example of what I need (ignore the actual values I typed in):

df_target = pd.DataFrame({'Week':[14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14],
                          'Date':['2020-04-01', '2020-04-02', '2020-04-03', '2020-04-04', '2020-04-05','2020-04-01', '2020-04-02', '2020-04-03', '2020-04-04', '2020-04-05','2020-04-01', '2020-04-02', '2020-04-03', '2020-04-04', '2020-04-05'],
                          'site':['site 1', 'site 1', 'site 1', 'site 1', 'site 1', 'site 1', 'site 1', 'site 1', 'site 1', 'site 2', 'site 2','site 2','site 2','site 2','site 2'],
                          'entry/exit':['exit','exit', 'exit', 'entry', 'entry', 'entry', 'entry', 'entry', 'entry', 'exit', 'exit', 'exit', 'exit', 'entry', 'entry'],
                          'Value':[12 ,1, 0, 50, 7, 0, 12 ,1, 0, 50, 7, 0, 12 ,1, 0]               
                          })

I tried:

df_target = df_new.melt(id_vars=['Week','Date'], var_name="Site", value_name="Value")

But I think I also need to somehow group by the second row, or treat it as a second header row?

First, create a MultiIndex from the input DataFrame:
# if possible, read both header rows directly when loading the file
# df = pd.read_csv(file, header=[0,1], index_col=[0,1])

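# combine the existing column labels with the first data row into a two-level header
df_new.columns = [df_new.columns, df_new.iloc[0]]
# drop the row that was promoted into the header
df = df_new.iloc[1:]
print(df.columns)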
MultiIndex([(  'Week',  'nan'),
            (  'Date',  'NaT'),
            ('site 1', 'exit'),
            ('site 2', 'exit'),
            ('site 3', 'exit'),
            ('site 4', 'exit'),
            ('site 5', 'exit')],
           )
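Only the exit columns appear in this MultiIndex because the dict literal in the question repeats each site key, and Python keeps just the last value for a repeated key. Here is a minimal sketch of how the frame could be built so that both entry and exit survive. It assumes the raw sheet is available as a list of rows. The names `collapsed`, `df_raw` and `two_level` are hypothetical and the data is a shortened version of the question's.

```python
import pandas as pd

# A dict literal silently keeps only the last value for a repeated key.
collapsed = {'site 1': ['entry', 0], 'site 1': ['exit', 0]}
print(len(collapsed))  # 1

# Hypothetical stand-in for the raw sheet: rows plus explicit column labels.
# A list of labels, unlike dict keys, may contain repeats.
columns = ['Week', 'Date', 'site 1', 'site 1', 'site 2', 'site 2']
rows = [
    ['nan', 'NaT', 'entry', 'exit', 'entry', 'exit'],
    [14, '2020-04-01', 0, 0, 1, 10],
    [14, '2020-04-02', 0, 0, 0, 0],
]
df_raw = pd.DataFrame(rows, columns=columns)

# Pair each top-level label with the value found in the first data row.
two_level = pd.MultiIndex.from_arrays([df_raw.columns, df_raw.iloc[0]])
print(two_level.tolist())
```

With the labels paired this way, ('site 1', 'entry') and ('site 1', 'exit') are distinct columns, so nothing is lost before reshaping.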

Then convert the first 2 MultiIndex columns to the index, so that DataFrame.unstack can be used for melting, followed by Series.rename_axis and Series.reset_index:

df = (df.set_index(df.columns[:2].tolist())
        .unstack([0,1])
        .rename_axis(['site','entry/exit','Week','Date'])
        .reset_index(name='Value'))
print(df)
      site entry/exit  Week        Date Value
0   site 1       exit    14  2020-04-01     0
1   site 1       exit    14  2020-04-02     0
2   site 1       exit    14  2020-04-03     0
3   site 1       exit    14  2020-04-04     0
4   site 1       exit    14  2020-04-05     0
5   site 2       exit    14  2020-04-01    10
6   site 2       exit    14  2020-04-02     0
7   site 2       exit    14  2020-04-03     7
8   site 2       exit    14  2020-04-04    19
9   site 2       exit    14  2020-04-05     0
10  site 3       exit    14  2020-04-01     0
11  site 3       exit    14  2020-04-02     0
12  site 3       exit    14  2020-04-03     7
13  site 3       exit    14  2020-04-04     0
14  site 3       exit    14  2020-04-05     0
15  site 4       exit    14  2020-04-01     0
16  site 4       exit    14  2020-04-02     0
17  site 4       exit    14  2020-04-03     0
18  site 4       exit    14  2020-04-04     0
19  site 4       exit    14  2020-04-05     0
20  site 5       exit    14  2020-04-01    15
21  site 5       exit    14  2020-04-02     0
22  site 5       exit    14  2020-04-03    25
23  site 5       exit    14  2020-04-04     0
24  site 5       exit    14  2020-04-05    80
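To check that the same chain also separates entry from exit, it can be run on a frame that keeps both columns per site. The sketch below assumes a shortened, hypothetical `df_raw` built from a list of rows. The final dtype conversions are an optional extra, not part of the steps above.

```python
import pandas as pd

# Hypothetical shortened input with both entry and exit columns per site.
columns = ['Week', 'Date', 'site 1', 'site 1', 'site 2', 'site 2']
rows = [
    ['nan', 'NaT', 'entry', 'exit', 'entry', 'exit'],
    [14, '2020-04-01', 0, 0, 1, 10],
    [14, '2020-04-02', 0, 0, 0, 0],
]
df_raw = pd.DataFrame(rows, columns=columns)

# Promote the first data row to a second header level, then drop it from the data.
df = df_raw.iloc[1:].copy()
df.columns = pd.MultiIndex.from_arrays([df_raw.columns, df_raw.iloc[0]])

# Move Week/Date into the index, then unstack both index levels to get a long Series.
long_df = (df.set_index(df.columns[:2].tolist())
             .unstack([0, 1])
             .rename_axis(['site', 'entry/exit', 'Week', 'Date'])
             .reset_index(name='Value'))

# Optional cleanup: the columns start as object dtype because of the header row.
long_df['Date'] = pd.to_datetime(long_df['Date'])
long_df['Value'] = long_df['Value'].astype(int)
print(long_df)
```

Each combination of site, entry/exit and date ends up as one row, which matches the layout requested in the question.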