Reshape Pandas DataFrame with TimeSeries in rows instead of columns
I have a DataFrame df that contains price data (open, close, high, low) for every day between January 2010 and December 2021:
Name | ISIN | Data | 02.01.2010 | 05.01.2010 | 06.01.2010 | ... | 31.12.2021 |
---|---|---|---|---|---|---|---|
Apple | US9835635986 | Price Open | 12.45 | 13.45 | 12.48 | ... | 54.12 |
Apple | US9835635986 | Price Close | 12.58 | 15.35 | 12.38 | ... | 54.43 |
Apple | US9835635986 | Price High | 12.78 | 15.85 | 12.83 | ... | 54.91 |
Apple | US9835635986 | Price Low | 12.18 | 13.35 | 12.21 | ... | 53.98 |
Microsoft | US1223928384 | Price Open | 12.45 | 13.45 | 12.48 | ... | 43.56 |
... | .. | ... | ... | ... | ... | ... | ... |
I am trying to reshape the table into the following format:
Date | Name | ISIN | Price Open | Price Close | Price High | Price Low |
---|---|---|---|---|---|---|
02.01.2010 | Apple | US9835635986 | 12.45 | 12.58 | 12.78 | 12.18 |
05.01.2010 | Apple | US9835635986 | 13.45 | 15.35 | 15.85 | 13.35 |
... | ... | ... | ... | ... | ... | ... |
02.01.2010 | Microsoft | US1223928384 | 12.45 | 13.67 | 13.74 | 12.35 |
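Simply transposing the DataFrame does not work. I also tried pivot, which raised an error saying that the operands could not be broadcast together because of their different shapes.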
dates = ['NAME','ISIN']
dates.append(df.columns.tolist()[3:]) # appends all columns names starting with 02.01.2010
df.pivot(index = dates, columns = 'Data', Values = 'Data')
How can I get the DataFrame into the desired format?
Use DataFrame.melt first, convert the Date column to datetimes, pivot, and finally sort the MultiIndex:
df = (df.melt(['Name','ISIN','Data'], var_name='Date')
.assign(Date = lambda x: pd.to_datetime(x['Date'], format='%d.%m.%Y'))
.pivot(index = ['Date','Name','ISIN'], columns = 'Data', values = 'value')
.sort_index(level=[1,2,0])
.reset_index()
)
print (df)
Data Date Name ISIN Price Close Price High Price Low \
0 2010-01-02 Apple US9835635986 12.58 12.78 12.18
1 2010-01-05 Apple US9835635986 15.35 15.85 13.35
2 2010-01-06 Apple US9835635986 12.38 12.83 12.21
3 2021-12-31 Apple US9835635986 54.43 54.91 53.98
4 2010-01-02 Microsoft US1223928384 NaN NaN NaN
5 2010-01-05 Microsoft US1223928384 NaN NaN NaN
6 2010-01-06 Microsoft US1223928384 NaN NaN NaN
7 2021-12-31 Microsoft US1223928384 NaN NaN NaN
Data Price Open
0 12.45
1 13.45
2 12.48
3 54.12
4 12.45
5 13.45
6 12.48
7 43.56
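To make the individual steps easier to follow, here is a minimal self-contained sketch of the same melt and pivot chain. The small frame is a stand-in built from the cells visible in the question (two date columns only), and the names `long_df` and `wide` are just illustrative. Microsoft only has a `Price Open` row in the sample, so its other price columns come out as NaN.

```python
import pandas as pd

# Stand-in for the question's table; values copied from the visible cells.
df = pd.DataFrame({
    "Name": ["Apple", "Apple", "Apple", "Apple", "Microsoft"],
    "ISIN": ["US9835635986"] * 4 + ["US1223928384"],
    "Data": ["Price Open", "Price Close", "Price High", "Price Low", "Price Open"],
    "02.01.2010": [12.45, 12.58, 12.78, 12.18, 12.45],
    "05.01.2010": [13.45, 15.35, 15.85, 13.35, 13.45],
})

# Step 1: melt turns every date column into rows,
# giving one row per (Name, ISIN, Data, Date) combination.
long_df = df.melt(["Name", "ISIN", "Data"], var_name="Date")
print(long_df.shape)  # (10, 5): 5 input rows * 2 date columns

# Step 2: parse the day.month.year strings into real datetimes.
long_df["Date"] = pd.to_datetime(long_df["Date"], format="%d.%m.%Y")

# Step 3: pivot the 'Data' labels into columns, sort by Name, ISIN, Date,
# and move the index levels back into ordinary columns.
wide = (long_df.pivot(index=["Date", "Name", "ISIN"], columns="Data", values="value")
               .sort_index(level=[1, 2, 0])
               .reset_index())
print(wide)
```

Passing a list to `index` in `DataFrame.pivot` needs a reasonably recent pandas (1.1 or later, as far as I know); on older versions the stack/unstack route is an alternative.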
Another idea is to convert the column names to datetimes first, then reshape with DataFrame.stack and Series.unstack:
L = df.columns.tolist()
df = (df.set_axis(L[:3] + pd.to_datetime(L[3:], format='%d.%m.%Y').tolist(), axis=1)
.rename_axis('Date', axis=1)
.set_index(L[:3])
.stack()
.unstack(2)
.reorder_levels([2,0,1])
.reset_index())
print (df)
Data Date Name ISIN Price Close Price High Price Low \
0 2010-01-02 Apple US9835635986 12.58 12.78 12.18
1 2010-01-05 Apple US9835635986 15.35 15.85 13.35
2 2010-01-06 Apple US9835635986 12.38 12.83 12.21
3 2021-12-31 Apple US9835635986 54.43 54.91 53.98
4 2010-01-02 Microsoft US1223928384 NaN NaN NaN
5 2010-01-05 Microsoft US1223928384 NaN NaN NaN
6 2010-01-06 Microsoft US1223928384 NaN NaN NaN
7 2021-12-31 Microsoft US1223928384 NaN NaN NaN
Data Price Open
0 12.45
1 13.45
2 12.48
3 54.12
4 12.45
5 13.45
6 12.48
7 43.56
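Both outputs list the price columns alphabetically and keep `Data` as the columns name, while the desired table in the question has the order Open, Close, High, Low and dates written as `dd.mm.yyyy`. A possible follow-up step is sketched below, assuming the reshaped result looks like the printed output; the frame `out` and the list `wanted` are hypothetical names used only for this illustration.

```python
import pandas as pd

# Stand-in for the reshaped result printed above (first two rows only).
out = pd.DataFrame({
    "Date": pd.to_datetime(["2010-01-02", "2010-01-05"]),
    "Name": ["Apple", "Apple"],
    "ISIN": ["US9835635986", "US9835635986"],
    "Price Close": [12.58, 15.35],
    "Price High": [12.78, 15.85],
    "Price Low": [12.18, 13.35],
    "Price Open": [12.45, 13.45],
}).rename_axis("Data", axis=1)

# Column order requested in the question.
wanted = ["Date", "Name", "ISIN",
          "Price Open", "Price Close", "Price High", "Price Low"]

# Select the columns in that order and drop the leftover 'Data' columns name.
out = out[wanted].rename_axis(None, axis=1)

# Optional: turn the datetimes back into dd.mm.yyyy strings for display.
out["Date"] = out["Date"].dt.strftime("%d.%m.%Y")
print(out)
```

Formatting the dates as strings is only worthwhile for display or export; keeping them as datetimes is usually more convenient for sorting and filtering.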