Reshape a Series into a DataFrame matrix in Python
I have a Series with 305 entries and a DatetimeIndex. The data looks like this:
1992-01-31 1.123077
1992-02-28 -2.174845
1992-03-31 -3.884848
1992-04-30 8.682919
1992-05-29 1.312976
1992-06-30 7.851080
1992-07-31 -3.192788
1992-08-31 -7.351976
1992-09-30 -6.782217
1992-10-30 -17.182738
1992-11-30 3.898782
1992-12-31 -26.190414
1993-01-29 2.233359
1993-02-26 6.709006
continues with monthly data till December 2017
I want to reshape the data into a DataFrame with the years as rows and the months as columns, filled in with the corresponding values:
January February March etc >> December
2017 values values values values values
2016 values values values values values
2015 values values values values values
etc \//
1992 values
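I have looked at other posts and tried reshape and asmatrix, but because the series is uneven I keep getting this error:
ValueError: total size of new array must be unchanged
What I really want is to insert NaN for the missing values when the matrix would be ragged. So if 2017 had no values for November or December, those cells would be NaN.
Let me know if anyone can help.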
Source DF:
In [159]: df
Out[159]:
val
date
1992-01-31 1.123077
1992-02-28 -2.174845
1992-03-31 -3.884848
1992-04-30 8.682919
1992-05-29 1.312976
1992-06-30 7.851080
1992-07-31 -3.192788
1992-08-31 -7.351976
1992-09-30 -6.782217
1992-10-30 -17.182738
1992-11-30 3.898782
1992-12-31 -26.190414
1993-01-29 2.233359
1993-02-26 6.709006
Solution:
import calendar
In [158]: (df.assign(year=df.index.year, mon=df.index.month)
             .pivot(index='year', columns='mon', values='val')
             .rename(columns=dict(zip(range(13), calendar.month_name))))
Out[158]:
mon January February March April May June July August September October November December
year
1992 1.123077 -2.174845 -3.884848 8.682919 1.312976 7.85108 -3.192788 -7.351976 -6.782217 -17.182738 3.898782 -26.190414
1993 2.233359 6.709006 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
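UPDATE: or a nicer, shorter version. Note that this passes arrays straight to pd.pivot, which only older pandas versions accept; current versions expect a DataFrame as the first argument:
In [164]: pd.pivot(df.index.year, df.index.month, df['val']) \
            .rename(columns=calendar.month_name.__getitem__)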
Out[164]:
date January February March April May June July August September October November December
date
1992 1.123077 -2.174845 -3.884848 8.682919 1.312976 7.85108 -3.192788 -7.351976 -6.782217 -17.182738 3.898782 -26.190414
1993 2.233359 6.709006 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
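In recent pandas versions, pd.pivot takes a DataFrame as its first argument, so one way to build the same table directly from a Series is groupby plus unstack. The following is only a minimal sketch under that assumption: the names s and wide are illustrative, and the sample values are the 14 rows shown in the question.

```python
import calendar

import pandas as pd

# Sample data copied from the question: month-end values, 1992-01 .. 1993-02.
dates = ["1992-01-31", "1992-02-28", "1992-03-31", "1992-04-30",
         "1992-05-29", "1992-06-30", "1992-07-31", "1992-08-31",
         "1992-09-30", "1992-10-30", "1992-11-30", "1992-12-31",
         "1993-01-29", "1993-02-26"]
vals = [1.123077, -2.174845, -3.884848, 8.682919, 1.312976, 7.851080,
        -3.192788, -7.351976, -6.782217, -17.182738, 3.898782,
        -26.190414, 2.233359, 6.709006]
s = pd.Series(vals, index=pd.to_datetime(dates), name="val")

# Group by (year, month); each pair is assumed to hold exactly one value,
# so first() just picks it. unstack() moves the month level into columns.
year = s.index.year.rename("year")
month = s.index.month.rename("month")
wide = s.groupby([year, month]).first().unstack()

# Make sure all 12 month columns exist (missing ones become NaN),
# then replace the month numbers with month names.
wide = (wide.reindex(columns=range(1, 13))
            .rename(columns=lambda m: calendar.month_name[m]))

print(wide)
```

The reindex step is what guarantees a full January to December layout even when some month never occurs in the data.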
Try:
#Give your series index a name so that we can reset index and have a new column
your_series.index = your_series.index.rename('Time')
df = your_series.to_frame('Values').reset_index()
#Create variables for month and year
df['Month'] = df.Time.dt.month
df['Year'] = df.Time.dt.year
#Assuming they are unique, create a pivot table
df.pivot(index='Year', columns='Month', values='Values')
The months will be numeric. If you want month names instead, you have to do:
df['Month'] = df.Time.dt.strftime('%B')
If your month/year pairs are not unique, do something like:
df.groupby(['Year','Month']).Values.sum().unstack()
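To illustrate why uniqueness matters, here is a small sketch with made-up data (the frame below is hypothetical, not from the question) in which January 1992 appears twice. As far as I know, pivot raises a ValueError in that case, while pivot_table with aggfunc='sum' gives the same result as the groupby/sum/unstack line above.

```python
import pandas as pd

# Hypothetical data: two observations for January 1992.
df = pd.DataFrame({
    "Year": [1992, 1992, 1992, 1993],
    "Month": [1, 1, 2, 1],
    "Values": [1.0, 2.0, 3.0, 4.0],
})

# pivot cannot reshape when a (Year, Month) pair occurs more than once.
try:
    df.pivot(index="Year", columns="Month", values="Values")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table aggregates the duplicates instead of failing.
table = df.pivot_table(index="Year", columns="Month",
                       values="Values", aggfunc="sum")

# The groupby/unstack form from the answer, for comparison.
same = df.groupby(["Year", "Month"]).Values.sum().unstack()

print(pivot_failed)
print(table)
```

Whether sum is the right aggregation depends on the data; mean, first or last may fit better for duplicated monthly readings.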
s
1992-01-31 1.123077
1992-02-28 -2.174845
1992-03-31 -3.884848
1992-04-30 8.682919
1992-05-29 1.312976
1992-06-30 7.851080
1992-07-31 -3.192788
1992-08-31 -7.351976
1992-09-30 -6.782217
1992-10-30 -17.182738
1992-11-30 3.898782
1992-12-31 -26.190414
1993-01-29 2.233359
1993-02-26 6.709006
Name: 1, dtype: float64
type(s)
pandas.core.series.Series
If necessary, convert the index to datetime:
s.index = pd.to_datetime(s.index, errors='coerce')
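A short sketch of what errors='coerce' does in the conversion above: labels that cannot be parsed become NaT rather than raising. The series raw and its bad label are invented for illustration.

```python
import pandas as pd

# Hypothetical series whose index contains one unparseable label.
raw = pd.Series([1.0, 2.0, 3.0],
                index=["1992-01-31", "not a date", "1992-03-31"])

# Unparseable labels are turned into NaT instead of raising an error.
raw.index = pd.to_datetime(raw.index, errors="coerce")

# Drop the rows whose label could not be parsed before pivoting.
clean = raw[raw.index.notna()]

print(clean)
```

Dropping the NaT rows first avoids ending up with a NaN year or month group later on.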
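Now, use pd.pivot (the array-based call below relies on the older pandas signature; current versions expect a DataFrame as the first argument):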
x = pd.Series(s.index.strftime('%Y %B')).str.split()
y, m = x.str[0], x.str[1]
pd.pivot(y, m, s)
April August December February January July June \
1992 8.682919 -7.351976 -26.190414 -2.174845 1.123077 -3.192788 7.85108
1993 NaN NaN NaN 6.709006 2.233359 NaN NaN
March May November October September
1992 -3.884848 1.312976 3.898782 -17.182738 -6.782217
1993 NaN NaN NaN NaN NaN
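The month-name columns in this output come out in alphabetical order. One possible way to put them in calendar order is to reindex the columns against calendar.month_name. A minimal sketch, assuming an English (default) locale; the small frame wide is a hypothetical cut-down version of the output above.

```python
import calendar

import pandas as pd

# Cut-down version of the pivoted output, columns in alphabetical order.
wide = pd.DataFrame(
    {
        "April": [8.682919, None],
        "February": [-2.174845, 6.709006],
        "January": [1.123077, 2.233359],
    },
    index=["1992", "1993"],
)

# calendar.month_name[0] is an empty string, so skip it.
month_order = list(calendar.month_name)[1:]  # 'January' .. 'December'

# reindex sorts the columns chronologically and adds any missing
# months as all-NaN columns.
ordered = wide.reindex(columns=month_order)

print(ordered)
```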