Pandas: reshape DataFrame to condense multiple columns into a single row value
I have the following dataframe:
pointId april august december february \
0 307 None None None NaN
1 307 None None None NaN
2 307 None None None NaN
3 307 None None None 0.88
4 307 None None None 0.60
january july june march may november october september year
0 NaN None None NaN None None None None 2014
1 NaN None None NaN None None None None 2015
2 NaN None None NaN None None None None 2016
3 0.7 None None 1.1 None None None None 2017
4 0.5 None None NaN None None None None 2018
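For a given year and pointId, it has values in some of the month columns.
I need to reshape it so that the 12 month columns are condensed into a single Date column. That column should hold the last day of the month for which a value is given, so I need one row for each value present in the month columns. The resulting dataframe should look like this: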
pointId Date Value
0 307 01/31/2017 0.7
1 307 02/28/2017 0.88
2 307 03/31/2017 1.1
3 307 01/31/2018 0.5
4 307 02/28/2018 0.6
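As usual, thanks for all the help. I couldn't work without SO :)
Use stack. After that, the only remaining step is to convert the year and month into a month-end date: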
df.set_index(['pointId','year']).replace('None',np.nan).stack()
Out[1127]:
pointId year
307 2017 february 0.88
january 0.70
march 1.10
2018 february 0.60
january 0.50
dtype: float64
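For anyone who wants to run the stack step locally, here is a minimal sketch. The frame is rebuilt by hand from the question's printout, with two assumptions of mine: only three of the twelve month columns are included, and the empty cells are real missing values rather than the literal string 'None'. The explicit dropna() is also my addition, since some pandas versions keep missing values when stacking.

```python
import numpy as np
import pandas as pd

# Hand-built approximation of the question's frame (assumption: only three
# month columns are included; the real frame has all twelve).
df = pd.DataFrame({
    "pointId": [307] * 5,
    "january": [np.nan, np.nan, np.nan, 0.7, 0.5],
    "february": [np.nan, np.nan, np.nan, 0.88, 0.60],
    "march": [np.nan, np.nan, np.nan, 1.1, np.nan],
    "year": [2014, 2015, 2016, 2017, 2018],
})

# Move the identifying columns into the index, then pivot the month columns
# into a third index level. dropna() keeps only the cells that hold a value.
stacked = df.set_index(["pointId", "year"]).stack().dropna()
print(stacked)
```

Each remaining entry is keyed by (pointId, year, month name), which is what the next step turns into a date.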
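Update
import calendar
from pandas.tseries.offsets import MonthEnd
# dropna() guards against pandas versions whose stack() keeps missing values
s = df.set_index(['pointId', 'year']).replace('None', np.nan).stack().dropna().reset_index()
# map every lowercase month name to its number (january -> 1, ..., december -> 12)
months = {m.lower(): i for i, m in enumerate(calendar.month_name) if m}
s['level_2'] = s['level_2'].map(months)
# build YYYYMM as year*100 + month so two-digit months parse correctly, then roll to month end
s['Date'] = pd.to_datetime((s['year'] * 100 + s['level_2']).astype(str), format='%Y%m') + MonthEnd(1)
s.drop(columns=['year', 'level_2']).rename(columns={0: 'Value'})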
Out[1143]:
pointId Value Date
0 307 0.88 2017-02-28
1 307 0.70 2017-01-31
2 307 1.10 2017-03-31
3 307 0.60 2018-02-28
4 307 0.50 2018-01-31
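As an alternative sketch (not the answerer's method), the same reshape can be written with melt, which also makes it easy to get the row order and MM/DD/YYYY format shown in the question. The hand-built frame and the names month_number, tidy and result are mine for illustration; the month lookup assumes lowercase English column names.

```python
import calendar

import numpy as np
import pandas as pd
from pandas.tseries.offsets import MonthEnd

# Hand-built stand-in for the question's frame (assumption: three month
# columns shown; the other months would be handled the same way).
df = pd.DataFrame({
    "pointId": [307] * 5,
    "january": [np.nan, np.nan, np.nan, 0.7, 0.5],
    "february": [np.nan, np.nan, np.nan, 0.88, 0.60],
    "march": [np.nan, np.nan, np.nan, 1.1, np.nan],
    "year": [2014, 2015, 2016, 2017, 2018],
})

# Lookup from lowercase English month name to month number (1..12).
month_number = {name.lower(): i for i, name in enumerate(calendar.month_name) if name}

# Wide -> long: one row per (pointId, year, month), dropping empty cells.
tidy = (
    df.melt(id_vars=["pointId", "year"], var_name="month", value_name="Value")
      .dropna(subset=["Value"])
)

# Assemble the first day of each month, then roll forward to its last day.
first_day = pd.to_datetime(
    pd.DataFrame({
        "year": tidy["year"],
        "month": tidy["month"].map(month_number),
        "day": 1,
    })
)
tidy["Date"] = first_day + MonthEnd(0)

result = (
    tidy.sort_values(["pointId", "Date"])
        .reset_index(drop=True)[["pointId", "Date", "Value"]]
)

# Format as MM/DD/YYYY for display only; result keeps real datetimes.
print(result.assign(Date=result["Date"].dt.strftime("%m/%d/%Y")))
```

Keeping Date as a datetime column until the final print means sorting and any later date arithmetic still work; the string format is applied only at the end.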