Python pandas pivot from long to wide
My data is currently in long format. Here is a sample:
Stock Date Time Price Year
AAA 2001-01-05 15:20:09 2.380 2001
AAA 2002-02-23 10:13:24 2.440 2002
AAA 2002-02-27 17:17:55 2.460 2002
BBB 2006-05-13 16:03:49 2.780 2006
BBB 2006-10-04 10:33:10 2.800 2006
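To reproduce the problem, the sample above can be built as a DataFrame roughly like this. It is only a sketch: the values are copied from the table, and keeping Date and Time as plain strings is an assumption about how the real data is stored.

```python
import pandas as pd

# Sample rows copied from the table above.
# Assumption: Date and Time are stored as strings, Price as float, Year as int.
df = pd.DataFrame({
    "Stock": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "Date": ["2001-01-05", "2002-02-23", "2002-02-27", "2006-05-13", "2006-10-04"],
    "Time": ["15:20:09", "10:13:24", "17:17:55", "16:03:49", "10:33:10"],
    "Price": [2.38, 2.44, 2.46, 2.78, 2.80],
    "Year": [2001, 2002, 2002, 2006, 2006],
})

print(df)
```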
I would like to reshape it to wide format by "Stock" and "Year", like this:
Stock Year Date1 Time1 Price1 Date2 Time2 Price2
AAA 2001 2001-01-05 15:20:09 2.380
AAA 2002 2002-02-23 10:13:24 2.440 2002-02-27 17:17:55 2.460
BBB 2006 2006-05-13 16:03:49 2.780 2006-10-04 10:33:10 2.800
I tried the solution posted in "Pandas long to wide reshape" and ended up with this:
df['idx'] = df.groupby(['Stock', 'Year']).cumcount()
df['date_idx'] = 'date_' + df.idx.astype(str)
df['time_idx'] = 'time_' + df.idx.astype(str)
df['price_idx'] = 'price_' + df.idx.astype(str)
date = df.pivot(index=['Stock', 'Year'], columns='date_idx', values='Date')
time = df.pivot(index=['Stock', 'Year'], columns='time_idx', values='Time')
price = df.pivot(index=['Stock', 'Year'], columns='price_idx', values='Price')
reshape = pd.concat([date, time, price], axis=1)
But the last line gives me this error:
ValueError: Wrong number of items passed 15624, placement implies 2
Where did my code go wrong? Or is there another, neater way to do this reshape?
I think you can use pivot_table, but it needs an aggfunc. I chose first, because the default, np.mean, is a problem with datetime values.
There is a fuller explanation, with examples, in the docs.
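Both solutions below rely on a per-group counter built with cumcount, plus aggfunc='first' so that non-numeric values survive the pivot. A small sketch of just those two steps, assuming the sample from the question with Date kept as strings (the variable name dates is only for illustration):

```python
import pandas as pd

# Sample rows from the question (assumption: Date and Time are strings).
df = pd.DataFrame({
    "Stock": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "Date": ["2001-01-05", "2002-02-23", "2002-02-27", "2006-05-13", "2006-10-04"],
    "Time": ["15:20:09", "10:13:24", "17:17:55", "16:03:49", "10:33:10"],
    "Price": [2.38, 2.44, 2.46, 2.78, 2.80],
    "Year": [2001, 2002, 2002, 2006, 2006],
})

# Number the rows inside each (Stock, Year) group: 1, 2, ...
df["idx"] = df.groupby(["Stock", "Year"]).cumcount() + 1
print(df["idx"].tolist())

# 'first' simply picks the first value per cell, so it also works on strings,
# where a numeric aggregation such as the mean would not.
dates = df.pivot_table(index=["Stock", "Year"], columns="idx",
                       values="Date", aggfunc="first")
print(dates)
```

Because the counter makes every (Stock, Year, idx) combination unique, each cell holds exactly one value and 'first' does not discard anything.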
Solution 1:
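# Counter within each (Stock, Year) group, as a string so it can be joined to the column names
df['idx'] = (df.groupby(['Stock', 'Year']).cumcount() + 1).astype(str)
df1 = (df.pivot_table(index=['Stock', 'Year'],
                      columns=['idx'],
                      values=['Date', 'Time', 'Price'],
                      aggfunc='first'))
# Flatten the MultiIndex columns: ('Date', '1') -> 'Date1'
df1.columns = [''.join(col) for col in df1.columns]
df1 = df1.reset_index()
print(df1)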
Stock Year Date1 Date2 Time1 Time2 Price1 Price2
0 AAA 2001 2001-01-05 None 15:20:09 None 2.38 None
1 AAA 2002 2002-02-23 2002-02-27 10:13:24 17:17:55 2.44 2.46
2 BBB 2006 2006-05-13 2006-10-04 16:03:49 10:33:10 2.78 2.8
Then you can convert the price columns to float and the date columns with to_datetime:
cols = df1.columns[df1.columns.str.contains('Price')]
df1[cols] = df1[cols].astype(float)
cols = df1.columns[df1.columns.str.contains('Date')]
df1[cols] = df1[cols].apply(pd.to_datetime)
print (df1)
Stock Year Date1 Date2 Time1 Time2 Price1 Price2
0 AAA 2001 2001-01-05 NaT 15:20:09 None 2.38 NaN
1 AAA 2002 2002-02-23 2002-02-27 10:13:24 17:17:55 2.44 2.46
2 BBB 2006 2006-05-13 2006-10-04 16:03:49 10:33:10 2.78 2.80
print (df1.dtypes)
Stock object
Year int64
Date1 datetime64[ns]
Date2 datetime64[ns]
Time1 object
Time2 object
Price1 float64
Price2 float64
Solution 2:
df['idx'] = df.groupby(['Stock', 'Year']).cumcount() + 1
df['date_idx'] = 'date_' + df.idx.astype(str)
df['time_idx'] = 'time_' + df.idx.astype(str)
df['price_idx'] = 'price_' + df.idx.astype(str)
date = df.pivot_table(index=['Stock', 'Year'], columns='date_idx', values='Date', aggfunc='first')
time = df.pivot_table(index=['Stock', 'Year'], columns='time_idx', values='Time', aggfunc='first')
price = df.pivot_table(index=['Stock', 'Year'], columns='price_idx', values='Price', aggfunc='first')
reshape = pd.concat([date, time, price], axis=1).reset_index()
print (reshape)
Stock Year date_1 date_2 time_1 time_2 price_1 price_2
0 AAA 2001 2001-01-05 None 15:20:09 None 2.38 NaN
1 AAA 2002 2002-02-23 2002-02-27 10:13:24 17:17:55 2.44 2.46
2 BBB 2006 2006-05-13 2006-10-04 16:03:49 10:33:10 2.78 2.80
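If the interleaved column order from the question (Date1, Time1, Price1, Date2, ...) matters, one possible alternative is set_index plus unstack. This is only a sketch and not part of the solutions above; the names value_cols, ordered and wide are made up for illustration, and the sample frame is rebuilt from the question's table with Date and Time assumed to be strings.

```python
import pandas as pd

# Sample rows from the question (assumption: Date and Time are strings).
df = pd.DataFrame({
    "Stock": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "Date": ["2001-01-05", "2002-02-23", "2002-02-27", "2006-05-13", "2006-10-04"],
    "Time": ["15:20:09", "10:13:24", "17:17:55", "16:03:49", "10:33:10"],
    "Price": [2.38, 2.44, 2.46, 2.78, 2.80],
    "Year": [2001, 2002, 2002, 2006, 2006],
})

value_cols = ["Date", "Time", "Price"]  # hypothetical helper name

# Counter within each (Stock, Year) group; it makes the index unique for unstack.
df["idx"] = df.groupby(["Stock", "Year"]).cumcount() + 1

# Move the counter from the row index into the columns.
wide = df.set_index(["Stock", "Year", "idx"])[value_cols].unstack("idx")

# Flatten ('Date', 1) -> 'Date1', then interleave the columns by counter.
wide.columns = [f"{name}{i}" for name, i in wide.columns]
n = int(df["idx"].max())
ordered = [f"{name}{i}" for i in range(1, n + 1) for name in value_cols]
wide = wide[ordered].reset_index()

print(wide)
```

No aggfunc is needed here because unstack does not aggregate; it would raise an error instead if a (Stock, Year, idx) combination were duplicated.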