Pandas Melt row into a column
I have a dataframe that currently reads as follows:
import pandas as pd

df_new = pd.DataFrame({'Week':['nan',14, 14, 14, 14, 14],
                       'Date':['NaT','2020-04-01', '2020-04-02', '2020-04-03', '2020-04-04', '2020-04-05'],
                       'site 1':['entry',0, 0, 0, 0, 0],
                       'site 1':['exit',0, 0, 0, 0, 0],
                       'site 2':['entry',1, 0,50, 7, 0],
                       'site 2':['exit',10, 0, 7, 19, 0],
                       'site 3':['entry',0, 100, 14, 9, 0],
                       'site 3':['exit',0, 0, 7, 0, 0],
                       'site 4':['entry',0, 0, 0, 0, 0],
                       'site 4':['exit',0, 0, 0, 0, 0],
                       'site 5':['entry',0, 0, 0, 0, 0],
                       'site 5':['exit',15, 0, 25, 0, 80],
                       })
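However, what I want is a column that specifies exit/entry for each site (the columns come from merged Excel headers).
Below is an example of what is needed (ignore the actual values I entered):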
df_target = pd.DataFrame({'Week':[14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14],
'Date':['2020-04-01', '2020-04-02', '2020-04-03', '2020-04-04', '2020-04-05','2020-04-01', '2020-04-02', '2020-04-03', '2020-04-04', '2020-04-05','2020-04-01', '2020-04-02', '2020-04-03', '2020-04-04', '2020-04-05'],
'site':['site 1', 'site 1', 'site 1', 'site 1', 'site 1', 'site 1', 'site 1', 'site 1', 'site 1', 'site 2', 'site 2','site 2','site 2','site 2','site 2'],
'entry/exit':['exit','exit', 'exit', 'entry', 'entry', 'entry', 'entry', 'entry', 'entry', 'exit', 'exit', 'exit', 'exit', 'entry', 'entry'],
'Value':[12 ,1, 0, 50, 7, 0, 12 ,1, 0, 50, 7, 0, 12 ,1, 0]
})
I tried:
df_target = df_new.melt(id_vars=['Week','Date'], var_name="Site", value_name="Value")
But I think I also need to somehow group by the second row, or treat it as a second header row?
First create a MultiIndex in the columns from the input DataFrame:
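# if possible, read both header rows directly when loading the file
#df = pd.read_csv(file, header=[0,1], index_col=[0,1])
# otherwise combine the existing column names with the first row of data
df_new.columns = [df_new.columns, df_new.iloc[0]]
# drop the row that now serves as the second header level
df = df_new.iloc[1:]
print (df.columns)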
MultiIndex([( 'Week', 'nan'),
( 'Date', 'NaT'),
('site 1', 'exit'),
('site 2', 'exit'),
('site 3', 'exit'),
('site 4', 'exit'),
('site 5', 'exit')],
)
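The printed MultiIndex lists only 'exit' because the dict literal in the question repeats each 'site N' key, and Python keeps only the last value for a repeated key. The following is a minimal sketch, assuming the real data arrive with repeated top-level names (as merged Excel headers would produce); the names `collapsed`, `top`, `rows`, `raw` and `wide` are hypothetical and only two sites and two dates are used for brevity.

```python
import pandas as pd

# Repeated keys in a dict literal collapse: the last value wins,
# so only the 'exit' list survives for 'site 1'.
collapsed = pd.DataFrame({'site 1': ['entry', 0, 0],
                          'site 1': ['exit', 0, 0]})
print(collapsed.columns.tolist())

# Hypothetical alternative: build the frame from rows so that the
# repeated top-level names are preserved as separate columns.
top = ['Week', 'Date', 'site 1', 'site 1', 'site 2', 'site 2']
rows = [['nan', 'NaT', 'entry', 'exit', 'entry', 'exit'],
        [14, '2020-04-01', 0, 0, 1, 10],
        [14, '2020-04-02', 0, 0, 0, 0]]
raw = pd.DataFrame(rows, columns=top)

# Pair each top-level name with the sub-header held in the first row,
# then drop that row from the data.
raw.columns = pd.MultiIndex.from_arrays([top, raw.iloc[0].tolist()])
wide = raw.iloc[1:]
print(wide.columns.tolist())
```

Built this way, both the 'entry' and the 'exit' column of every site are available for the reshaping step.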
Then convert the first 2 MultiIndex columns to the index, so it is possible to use DataFrame.unstack for melting, followed by Series.rename_axis and Series.reset_index:
df = (df.set_index(df.columns[:2].tolist())
.unstack([0,1])
.rename_axis(['site','entry/exit','Week','Date'])
.reset_index(name='Value'))
print (df)
site entry/exit Week Date Value
0 site 1 exit 14 2020-04-01 0
1 site 1 exit 14 2020-04-02 0
2 site 1 exit 14 2020-04-03 0
3 site 1 exit 14 2020-04-04 0
4 site 1 exit 14 2020-04-05 0
5 site 2 exit 14 2020-04-01 10
6 site 2 exit 14 2020-04-02 0
7 site 2 exit 14 2020-04-03 7
8 site 2 exit 14 2020-04-04 19
9 site 2 exit 14 2020-04-05 0
10 site 3 exit 14 2020-04-01 0
11 site 3 exit 14 2020-04-02 0
12 site 3 exit 14 2020-04-03 7
13 site 3 exit 14 2020-04-04 0
14 site 3 exit 14 2020-04-05 0
15 site 4 exit 14 2020-04-01 0
16 site 4 exit 14 2020-04-02 0
17 site 4 exit 14 2020-04-03 0
18 site 4 exit 14 2020-04-04 0
19 site 4 exit 14 2020-04-05 0
20 site 5 exit 14 2020-04-01 15
21 site 5 exit 14 2020-04-02 0
22 site 5 exit 14 2020-04-03 25
23 site 5 exit 14 2020-04-04 0
24 site 5 exit 14 2020-04-05 80
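Since the question started from DataFrame.melt, the same reshape can also be sketched with melt on two-level columns. This is a minimal sketch rather than a definitive implementation: it assumes a frame that still holds both 'entry' and 'exit' for each site, built here by hand with the hypothetical names `cols`, `wide` and `long` and only two sites and two dates taken from the question's values.

```python
import pandas as pd

# Two-level header: top level is the site, second level is entry/exit.
# The first two columns keep the placeholder sub-headers 'nan' and 'NaT'.
cols = pd.MultiIndex.from_tuples(
    [('Week', 'nan'), ('Date', 'NaT'),
     ('site 1', 'entry'), ('site 1', 'exit'),
     ('site 2', 'entry'), ('site 2', 'exit')],
    names=['site', 'entry/exit'])
wide = pd.DataFrame(
    [[14, '2020-04-01', 0, 0, 1, 10],
     [14, '2020-04-02', 0, 0, 0, 0]],
    columns=cols)

# With MultiIndex columns, id_vars must be a list of full column tuples.
# Each column level becomes its own variable column in the result.
long = wide.melt(id_vars=[('Week', 'nan'), ('Date', 'NaT')],
                 value_name='Value')

# The id columns come back labelled by their tuples; rename by position.
long.columns = ['Week', 'Date', 'site', 'entry/exit', 'Value']
print(long)
```

The result has one row per date, site and entry/exit combination, which matches the layout of df_target in the question.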