Resampling and slicing data in Python
The data looks like this:
High Low Open Close Volume Adj Close
Date
1999-12-31 1472.420044 1458.189941 1464.469971 1469.250000 374050000 1469.250000
2000-01-03 1478.000000 1438.359985 1469.250000 1455.219971 931800000 1455.219971
2000-01-04 1455.219971 1397.430054 1455.219971 1399.420044 1009000000 1399.420044
2000-01-05 1413.270020 1377.680054 1399.420044 1402.109985 1085500000 1402.109985
2000-01-06 1411.900024 1392.099976 1402.109985 1403.449951 1092300000 1403.449951
... ... ... ... ... ... ...
2020-01-06 3246.840088 3214.639893 3217.550049 3246.280029 3674070000 3246.280029
2020-01-07 3244.909912 3232.429932 3241.860107 3237.179932 3420380000 3237.179932
2020-01-08 3267.070068 3236.669922 3238.590088 3253.050049 3720890000 3253.050049
2020-01-09 3275.580078 3263.669922 3266.030029 3274.699951 3638390000 3274.699951
2020-01-10 3282.989990 3268.010010 3281.810059 3273.739990 920449258 3273.739990
5039 rows × 6 columns
Since this is daily data, I resampled it with:
weekly_resample = data.High.resample('M')
This produces a DatetimeIndexResampler object. Now I want to slice this data to see only the last 10 weeks, so I tried:
weekly_resample = data.High.resample('M')[-1:10]
But this raises an error:
KeyError: 'Column not found: slice(-1, 10, None)'
How can I slice out the last 10 weeks?
For the last 10 rows of each group, use DataFrame.groupby with Grouper, which makes GroupBy.tail available:
weekly_resample = data.High.groupby(pd.Grouper(freq='M')).tail(10)
print(weekly_resample)
Date
1999-12-31 1472.420044
2000-01-03 1478.000000
2000-01-04 1455.219971
2000-01-05 1413.270020
2000-01-06 1411.900024
2020-01-06 3246.840088
2020-01-07 3244.909912
2020-01-08 3267.070068
2020-01-09 3275.580078
2020-01-10 3282.989990
Name: High, dtype: float64
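To make the behaviour of the grouped tail concrete, here is a minimal sketch on synthetic data. The `data` frame below is made up for illustration and is not the price series from the question, and it uses weekly bins (`freq="W"`) rather than `'M'`, which is the month-end alias. The point to notice is that `GroupBy.tail(n)` keeps the last n rows of every group, not the last n groups.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the price data: 20 business days of made-up highs,
# starting on a Monday so that every week holds exactly 5 rows.
idx = pd.bdate_range("2020-01-06", periods=20, name="Date")
data = pd.DataFrame({"High": np.arange(100.0, 120.0)}, index=idx)

# Group the daily rows into weekly bins ('W' = weeks ending on Sunday),
# then keep the last 2 rows of each bin.
last_two_per_week = data.High.groupby(pd.Grouper(freq="W")).tail(2)
print(last_two_per_week)
```

With four full weeks in the sample, the result holds eight rows: the Thursday and Friday of each week, in their original order.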
A solution with resample is also possible; it only needs Resampler.transform:
weekly_resample = data.High.resample('M').transform(lambda x: x.iloc[-10:])
# alternative
# weekly_resample = data.High.resample('M').transform(lambda x: x.tail(10))
print(weekly_resample)
Date
1999-12-31 1472.420044
2000-01-03 1478.000000
2000-01-04 1455.219971
2000-01-05 1413.270020
2000-01-06 1411.900024
2020-01-06 3246.840088
2020-01-07 3244.909912
2020-01-08 3267.070068
2020-01-09 3275.580078
2020-01-10 3282.989990
Name: High, dtype: float64
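If the goal is one value per week for the last 10 weeks, rather than rows within each period, another option is to aggregate first and slice afterwards. The KeyError in the question comes from the fact that `[]` on a Resampler selects columns, not rows; once an aggregation is applied, the result is an ordinary Series that can be sliced by position. The sketch below uses made-up data and assumes the weekly maximum of `High` is the value wanted; swap in `.mean()`, `.last()`, or another aggregation as needed.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data: 100 business days of made-up highs,
# starting on a Monday, which gives 20 full weeks.
idx = pd.bdate_range("2019-08-26", periods=100, name="Date")
data = pd.DataFrame({"High": np.arange(100.0, 200.0)}, index=idx)

# Aggregate each week to a single value (assumption: the weekly maximum).
# This turns the Resampler into a regular Series indexed by week end.
weekly_high = data.High.resample("W").max()

# A regular Series supports positional slicing: keep the last 10 weeks.
last_10_weeks = weekly_high.iloc[-10:]
print(last_10_weeks)
```

`weekly_high.tail(10)` would give the same result as the `iloc` slice.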