Downsample the time series data of accelerometer and gyroscope
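I have time series data of physical activities. The data was recorded at 50 Hz. I now want to downsample it to 20 Hz, because I want to train the model and make predictions at 20 Hz.
Is there an efficient way to do this in Python? I have heard of pandas' resample function, but I don't quite know how to use it effectively for my problem. Any piece of code would be really helpful.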
epoch (ms) time (10:00) elapsed (s) x-axis (g) y-axis (g) z-axis (g)
1613977400899 2021-02-22T12:03:20.899 0 -0.336 0.886 0.649
1613977400920 2021-02-22T12:03:20.920 0.021 -0.233 0.799 0.648
1613977400940 2021-02-22T12:03:20.940 0.041 -0.173 0.771 0.629
1613977400961 2021-02-22T12:03:20.961 0.062 -0.132 0.757 0.596
1613977400981 2021-02-22T12:03:20.981 0.082 -0.113 0.724 0.57
1613977401002 2021-02-22T12:03:21.002 0.103 -0.127 0.713 0.538
1613977401021 2021-02-22T12:03:21.021 0.122 -0.175 0.743 0.488
1613977401041 2021-02-22T12:03:21.041 0.142 -0.266 0.775 0.417
1613977401062 2021-02-22T12:03:21.062 0.163 -0.281 0.774 0.402
1613977401082 2021-02-22T12:03:21.082 0.183 -0.212 0.713 0.427
1613977401103 2021-02-22T12:03:21.103 0.204 -0.17 0.649 0.46
1613977401123 2021-02-22T12:03:21.123 0.224 -0.204 0.649 0.524
1613977401144 2021-02-22T12:03:21.144 0.245 -0.313 0.684 0.658
1613977401164 2021-02-22T12:03:21.164 0.265 -0.415 0.727 0.785
1613977401183 2021-02-22T12:03:21.183 0.284 -0.419 0.726 0.82
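The main issue here seems to be that your original sampling period is only approximately 20 ms (i.e. 50 Hz), not exactly. So we need to resample in two steps:
- upsample to 1 ms, where we can define which interpolation to use
- downsample to 50 ms (this is just selecting one row out of every 50, which is easy)
First let's build a time index. You have the time information twice here, so either of the following works: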
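The irregular spacing mentioned above can be checked directly on the epoch column. This is a small sketch that uses only the 15 sample rows from the question; the variable names are illustrative.

```python
# Epoch timestamps (ms) copied from the sample rows in the question.
epoch_ms = [
    1613977400899, 1613977400920, 1613977400940, 1613977400961,
    1613977400981, 1613977401002, 1613977401021, 1613977401041,
    1613977401062, 1613977401082, 1613977401103, 1613977401123,
    1613977401144, 1613977401164, 1613977401183,
]

# Gap between consecutive samples, in milliseconds.
gaps = [b - a for a, b in zip(epoch_ms, epoch_ms[1:])]
mean_gap = sum(gaps) / len(gaps)

print(sorted(set(gaps)))           # distinct gap sizes in ms
print(round(1000 / mean_gap, 1))   # effective rate in Hz over this excerpt
```

On this excerpt the gaps are 19, 20 or 21 ms, so simply keeping a fixed fraction of the rows would not land on an even 50 ms grid.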
>>> df = df.set_index(df['epoch (ms)'].astype('datetime64[ms]'))
>>> df = df.set_index(pd.to_datetime(df['time (10:00)']))
>>> df
epoch (ms) time (10:00) elapsed (s) x-axis (g) y-axis (g) z-axis (g)
time (10:00)
2021-02-22 12:03:20.899 1613977400899 2021-02-22T12:03:20.899 0.000 -0.336 0.886 0.649
2021-02-22 12:03:20.920 1613977400920 2021-02-22T12:03:20.920 0.021 -0.233 0.799 0.648
2021-02-22 12:03:20.940 1613977400940 2021-02-22T12:03:20.940 0.041 -0.173 0.771 0.629
2021-02-22 12:03:20.961 1613977400961 2021-02-22T12:03:20.961 0.062 -0.132 0.757 0.596
2021-02-22 12:03:20.981 1613977400981 2021-02-22T12:03:20.981 0.082 -0.113 0.724 0.570
2021-02-22 12:03:21.002 1613977401002 2021-02-22T12:03:21.002 0.103 -0.127 0.713 0.538
2021-02-22 12:03:21.021 1613977401021 2021-02-22T12:03:21.021 0.122 -0.175 0.743 0.488
2021-02-22 12:03:21.041 1613977401041 2021-02-22T12:03:21.041 0.142 -0.266 0.775 0.417
2021-02-22 12:03:21.062 1613977401062 2021-02-22T12:03:21.062 0.163 -0.281 0.774 0.402
2021-02-22 12:03:21.082 1613977401082 2021-02-22T12:03:21.082 0.183 -0.212 0.713 0.427
2021-02-22 12:03:21.103 1613977401103 2021-02-22T12:03:21.103 0.204 -0.170 0.649 0.460
2021-02-22 12:03:21.123 1613977401123 2021-02-22T12:03:21.123 0.224 -0.204 0.649 0.524
2021-02-22 12:03:21.144 1613977401144 2021-02-22T12:03:21.144 0.245 -0.313 0.684 0.658
2021-02-22 12:03:21.164 1613977401164 2021-02-22T12:03:21.164 0.265 -0.415 0.727 0.785
2021-02-22 12:03:21.183 1613977401183 2021-02-22T12:03:21.183 0.284 -0.419 0.726 0.820
(Now we don't really need the epoch and time columns anymore, since the information is in the index.)
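As a minimal sketch of that clean-up, assuming the column names from the question, the two timestamp columns can be dropped once the index is set. Only the first three sample rows are used here, and the index name `time` is an illustrative choice, not something from the original.

```python
import pandas as pd

# First three sample rows from the question, enough to show the idea.
df = pd.DataFrame({
    "epoch (ms)": [1613977400899, 1613977400920, 1613977400940],
    "time (10:00)": [
        "2021-02-22T12:03:20.899",
        "2021-02-22T12:03:20.920",
        "2021-02-22T12:03:20.940",
    ],
    "elapsed (s)": [0.0, 0.021, 0.041],
    "x-axis (g)": [-0.336, -0.233, -0.173],
    "y-axis (g)": [0.886, 0.799, 0.771],
    "z-axis (g)": [0.649, 0.648, 0.629],
})

# Build the DatetimeIndex from the string column, then drop both
# timestamp columns so that only numeric columns remain.
df = df.set_index(pd.to_datetime(df["time (10:00)"]))
df = df.drop(columns=["epoch (ms)", "time (10:00)"])
df.index.name = "time"  # illustrative name

print(df.dtypes)
```

Keeping only numeric columns is a cautious choice: depending on the pandas version, interpolating a frame that still contains the string column may warn or fail.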
Now we can do the resampling:
>>> df.resample('1ms').interpolate().resample('50ms').last()
epoch (ms) time (10:00) elapsed (s) x-axis (g) y-axis (g) z-axis (g)
time (10:00)
2021-02-22 12:03:20.850 1.613977e+12 2021-02-22T12:03:20.899 0.000 -0.336000 0.886000 0.649000
2021-02-22 12:03:20.900 1.613977e+12 2021-02-22T12:03:20.940 0.050 -0.155429 0.765000 0.614857
2021-02-22 12:03:20.950 1.613977e+12 2021-02-22T12:03:20.981 0.100 -0.125000 0.714571 0.542571
2021-02-22 12:03:21.000 1.613977e+12 2021-02-22T12:03:21.041 0.150 -0.271714 0.774619 0.411286
2021-02-22 12:03:21.050 1.613977e+12 2021-02-22T12:03:21.082 0.200 -0.178000 0.661190 0.453714
2021-02-22 12:03:21.100 1.613977e+12 2021-02-22T12:03:21.144 0.250 -0.338500 0.694750 0.689750
2021-02-22 12:03:21.150 1.613977e+12 2021-02-22T12:03:21.183 0.284 -0.419000 0.726000 0.820000
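For reference, here is a self-contained sketch of the same two steps on the x-axis column only, with the sample values typed in by hand. The index is built from the epoch column, which pandas interprets as UTC, so the clock hour differs from the `time` column while the millisecond alignment is the same.

```python
import pandas as pd

# Sample values copied from the question.
epoch_ms = [
    1613977400899, 1613977400920, 1613977400940, 1613977400961,
    1613977400981, 1613977401002, 1613977401021, 1613977401041,
    1613977401062, 1613977401082, 1613977401103, 1613977401123,
    1613977401144, 1613977401164, 1613977401183,
]
x = [-0.336, -0.233, -0.173, -0.132, -0.113, -0.127, -0.175, -0.266,
     -0.281, -0.212, -0.170, -0.204, -0.313, -0.415, -0.419]

s = pd.Series(x, index=pd.to_datetime(epoch_ms, unit="ms"), name="x-axis (g)")

# Step 1: upsample to a regular 1 ms grid and fill the gaps linearly.
fine = s.resample("1ms").interpolate()

# Step 2: keep the last 1 ms value of every 50 ms bin (20 Hz).
out = fine.resample("50ms").last()

print(out)
```

Note that each output row is labelled with the left edge of its 50 ms bin but holds the last value inside that bin. The row labelled 20.900, for example, holds the interpolated value at 20.949, which is why the elapsed (s) column in the output above reads 0.050 there.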
Note that you can perform different types of interpolation by specifying the argument passed to .interpolate(). See the doc:
method : str, default ‘linear’
Interpolation technique to use. One of:
- ‘linear’: Ignore the index and treat the values as equally spaced. This is the only method supported on MultiIndexes.
- ‘time’: Works on daily and higher resolution data to interpolate given length of interval.
- ‘index’, ‘values’: use the actual numerical values of the index.
- ‘pad’: Fill in NaNs using existing values.
- ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘spline’, ‘barycentric’, ‘polynomial’: Passed to scipy.interpolate.interp1d. These methods use the numerical values of the index. Both ‘polynomial’ and ‘spline’ require that you also specify an order (int), e.g. df.interpolate(method='polynomial', order=5).
- ‘krogh’, ‘piecewise_polynomial’, ‘spline’, ‘pchip’, ‘akima’, ‘cubicspline’: Wrappers around the SciPy interpolation methods of similar names. See Notes.
- ‘from_derivatives’: Refers to scipy.interpolate.BPoly.from_derivatives which replaces ‘piecewise_polynomial’ interpolation method in scipy 0.18.
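You can see the coordinates differ slightly between methods (in this example only 'cubic' differs from the default), so it's up to you to choose the one that suits you: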
>>> df.resample('1ms').interpolate('time').resample('50ms').last()
epoch (ms) time (10:00) elapsed (s) x-axis (g) y-axis (g) z-axis (g)
time (10:00)
2021-02-22 12:03:20.850 1.613977e+12 2021-02-22T12:03:20.899 0.000 -0.336000 0.886000 0.649000
2021-02-22 12:03:20.900 1.613977e+12 2021-02-22T12:03:20.940 0.050 -0.155429 0.765000 0.614857
2021-02-22 12:03:20.950 1.613977e+12 2021-02-22T12:03:20.981 0.100 -0.125000 0.714571 0.542571
2021-02-22 12:03:21.000 1.613977e+12 2021-02-22T12:03:21.041 0.150 -0.271714 0.774619 0.411286
2021-02-22 12:03:21.050 1.613977e+12 2021-02-22T12:03:21.082 0.200 -0.178000 0.661190 0.453714
2021-02-22 12:03:21.100 1.613977e+12 2021-02-22T12:03:21.144 0.250 -0.338500 0.694750 0.689750
2021-02-22 12:03:21.150 1.613977e+12 2021-02-22T12:03:21.183 0.284 -0.419000 0.726000 0.820000
>>> df.resample('1ms').interpolate('cubic').resample('50ms').last()
epoch (ms) time (10:00) elapsed (s) x-axis (g) y-axis (g) z-axis (g)
time (10:00)
2021-02-22 12:03:20.850 1.613977e+12 2021-02-22T12:03:20.899 0.000 -0.336000 0.886000 0.649000
2021-02-22 12:03:20.900 1.613977e+12 2021-02-22T12:03:20.940 0.050 -0.153162 0.766266 0.615219
2021-02-22 12:03:20.950 1.613977e+12 2021-02-22T12:03:20.981 0.100 -0.122950 0.711454 0.543581
2021-02-22 12:03:21.000 1.613977e+12 2021-02-22T12:03:21.041 0.150 -0.285487 0.781273 0.403123
2021-02-22 12:03:21.050 1.613977e+12 2021-02-22T12:03:21.082 0.200 -0.172478 0.656944 0.452494
2021-02-22 12:03:21.100 1.613977e+12 2021-02-22T12:03:21.144 0.250 -0.342439 0.695493 0.693425
2021-02-22 12:03:21.150 1.613977e+12 2021-02-22T12:03:21.183 0.284 -0.419000 0.726000 0.820000