Downsample time series data from an accelerometer and gyroscope

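I have time series data of sports activities. The data was recorded at 50 Hz, but I now want to downsample it to 20 Hz, because I want to train a model and make predictions at 20 Hz.

Is there an efficient way to do this in Python? I have heard of the pandas resample function, but I don't quite know how to use it effectively for my problem. Any snippet of code would be very helpful.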

   epoch (ms)              time (10:00)  elapsed (s)  x-axis (g)  y-axis (g)  z-axis (g)
1613977400899   2021-02-22T12:03:20.899            0      -0.336       0.886       0.649
1613977400920   2021-02-22T12:03:20.920        0.021      -0.233       0.799       0.648
1613977400940   2021-02-22T12:03:20.940        0.041      -0.173       0.771       0.629
1613977400961   2021-02-22T12:03:20.961        0.062      -0.132       0.757       0.596
1613977400981   2021-02-22T12:03:20.981        0.082      -0.113       0.724       0.57
1613977401002   2021-02-22T12:03:21.002        0.103      -0.127       0.713       0.538
1613977401021   2021-02-22T12:03:21.021        0.122      -0.175       0.743       0.488
1613977401041   2021-02-22T12:03:21.041        0.142      -0.266       0.775       0.417
1613977401062   2021-02-22T12:03:21.062        0.163      -0.281       0.774       0.402
1613977401082   2021-02-22T12:03:21.082        0.183      -0.212       0.713       0.427
1613977401103   2021-02-22T12:03:21.103        0.204      -0.17        0.649       0.46
1613977401123   2021-02-22T12:03:21.123        0.224      -0.204       0.649       0.524
1613977401144   2021-02-22T12:03:21.144        0.245      -0.313       0.684       0.658
1613977401164   2021-02-22T12:03:21.164        0.265      -0.415       0.727       0.785
1613977401183   2021-02-22T12:03:21.183        0.284      -0.419       0.726       0.82

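A main issue here seems to be that your original sampling interval is only approximately 20 ms (about 50 Hz), not exactly. We need to resample in two steps:

  1. Upsample to 1 ms, where we can choose the interpolation to use
  2. Downsample to 50 ms (this just keeps one row out of every 50, so it is simple)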
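
The two steps can be sketched on synthetic data as follows. The jittered timestamps, the sine signal, and the names signal, dense and downsampled are made up for illustration; resample, interpolate and last are the pandas calls the approach relies on.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 15 samples spaced 19 to 21 ms apart, mimicking an imperfect 50 Hz clock.
rng = np.random.default_rng(0)
offsets_ms = np.cumsum(rng.integers(19, 22, size=15))
index = pd.Timestamp("2021-02-22 12:03:20.880") + pd.to_timedelta(offsets_ms, unit="ms")
signal = pd.Series(np.sin(offsets_ms / 50.0), index=index, name="x")

# Step 1: upsample to a regular 1 ms grid and fill the gaps by linear interpolation.
dense = signal.resample("1ms").interpolate("linear")

# Step 2: downsample by keeping one value per 50 ms bin (20 Hz).
downsampled = dense.resample("50ms").last()

print(downsampled)
```

Because every original timestamp falls on a whole millisecond, all recorded samples survive step 1 unchanged and only the gaps between them are interpolated.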

First, let's build a time index. You have the time information twice here, so either column will work:

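>>> # Option 1: build the index from the epoch column
>>> df = df.set_index(df['epoch (ms)'].astype('datetime64[ms]'))
>>> # Option 2: build it from the timestamp column (used for the output below)
>>> df = df.set_index(pd.to_datetime(df['time (10:00)']))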
>>> df
                            epoch (ms)             time (10:00)  elapsed (s)  x-axis (g)  y-axis (g)  z-axis (g)
time (10:00)                                                                                                    
2021-02-22 12:03:20.899  1613977400899  2021-02-22T12:03:20.899        0.000      -0.336       0.886       0.649
2021-02-22 12:03:20.920  1613977400920  2021-02-22T12:03:20.920        0.021      -0.233       0.799       0.648
2021-02-22 12:03:20.940  1613977400940  2021-02-22T12:03:20.940        0.041      -0.173       0.771       0.629
2021-02-22 12:03:20.961  1613977400961  2021-02-22T12:03:20.961        0.062      -0.132       0.757       0.596
2021-02-22 12:03:20.981  1613977400981  2021-02-22T12:03:20.981        0.082      -0.113       0.724       0.570
2021-02-22 12:03:21.002  1613977401002  2021-02-22T12:03:21.002        0.103      -0.127       0.713       0.538
2021-02-22 12:03:21.021  1613977401021  2021-02-22T12:03:21.021        0.122      -0.175       0.743       0.488
2021-02-22 12:03:21.041  1613977401041  2021-02-22T12:03:21.041        0.142      -0.266       0.775       0.417
2021-02-22 12:03:21.062  1613977401062  2021-02-22T12:03:21.062        0.163      -0.281       0.774       0.402
2021-02-22 12:03:21.082  1613977401082  2021-02-22T12:03:21.082        0.183      -0.212       0.713       0.427
2021-02-22 12:03:21.103  1613977401103  2021-02-22T12:03:21.103        0.204      -0.170       0.649       0.460
2021-02-22 12:03:21.123  1613977401123  2021-02-22T12:03:21.123        0.224      -0.204       0.649       0.524
2021-02-22 12:03:21.144  1613977401144  2021-02-22T12:03:21.144        0.245      -0.313       0.684       0.658
2021-02-22 12:03:21.164  1613977401164  2021-02-22T12:03:21.164        0.265      -0.415       0.727       0.785
2021-02-22 12:03:21.183  1613977401183  2021-02-22T12:03:21.183        0.284      -0.419       0.726       0.820

(We don't really need the epoch and time columns any more, since that information is now in the index.)
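
Dropping the redundant columns could look like the sketch below, which rebuilds the first three rows of the sample data. The index name timestamp and the variable utc_index are hypothetical. Leaving only numeric columns is also a sensible precaution, since newer pandas versions may warn or raise when asked to interpolate a string column.

```python
import pandas as pd

# First three rows of the sample data from the question.
df = pd.DataFrame({
    "epoch (ms)": [1613977400899, 1613977400920, 1613977400940],
    "time (10:00)": ["2021-02-22T12:03:20.899", "2021-02-22T12:03:20.920",
                     "2021-02-22T12:03:20.940"],
    "elapsed (s)": [0.0, 0.021, 0.041],
    "x-axis (g)": [-0.336, -0.233, -0.173],
    "y-axis (g)": [0.886, 0.799, 0.771],
    "z-axis (g)": [0.649, 0.648, 0.629],
})

# The epoch column holds the same instants in UTC, so it reads 07:03:20 rather than 12:03:20.
utc_index = pd.to_datetime(df["epoch (ms)"], unit="ms")

# Build the index from the local-time column, then drop both time columns.
df = df.set_index(pd.to_datetime(df["time (10:00)"]))
df.index.name = "timestamp"  # hypothetical name, avoids clashing with the column name
df = df.drop(columns=["epoch (ms)", "time (10:00)"])

print(df)
```

Either column gives the same spacing between samples, so the resampling result differs only by the constant offset between the two clocks.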

Now we can do the resampling:

>>> df.resample('1ms').interpolate().resample('50ms').last()
                           epoch (ms)             time (10:00)  elapsed (s)  x-axis (g)  y-axis (g)  z-axis (g)
time (10:00)                                                                                                   
2021-02-22 12:03:20.850  1.613977e+12  2021-02-22T12:03:20.899        0.000   -0.336000    0.886000    0.649000
2021-02-22 12:03:20.900  1.613977e+12  2021-02-22T12:03:20.940        0.050   -0.155429    0.765000    0.614857
2021-02-22 12:03:20.950  1.613977e+12  2021-02-22T12:03:20.981        0.100   -0.125000    0.714571    0.542571
2021-02-22 12:03:21.000  1.613977e+12  2021-02-22T12:03:21.041        0.150   -0.271714    0.774619    0.411286
2021-02-22 12:03:21.050  1.613977e+12  2021-02-22T12:03:21.082        0.200   -0.178000    0.661190    0.453714
2021-02-22 12:03:21.100  1.613977e+12  2021-02-22T12:03:21.144        0.250   -0.338500    0.694750    0.689750
2021-02-22 12:03:21.150  1.613977e+12  2021-02-22T12:03:21.183        0.284   -0.419000    0.726000    0.820000

Note that you can perform different kinds of interpolation by changing the method argument passed to .interpolate(). From the documentation:

method : str, default ‘linear’
Interpolation technique to use. One of:

  • ‘linear’: Ignore the index and treat the values as equally spaced. This is the only method supported on MultiIndexes.
  • ‘time’: Works on daily and higher resolution data to interpolate given length of interval.
  • ‘index’, ‘values’: use the actual numerical values of the index.
  • ‘pad’: Fill in NaNs using existing values.
  • ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘spline’, ‘barycentric’, ‘polynomial’: Passed to scipy.interpolate.interp1d. These methods use the numerical values of the index. Both ‘polynomial’ and ‘spline’ require that you also specify an order (int), e.g. df.interpolate(method='polynomial', order=5).
  • ‘krogh’, ‘piecewise_polynomial’, ‘spline’, ‘pchip’, ‘akima’, ‘cubicspline’: Wrappers around the SciPy interpolation methods of similar names. See Notes.
  • ‘from_derivatives’: Refers to scipy.interpolate.BPoly.from_derivatives which replaces ‘piecewise_polynomial’ interpolation method in scipy 0.18.

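You can see that the resulting coordinates differ slightly between methods; it is up to you to choose the one that suits your data: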

>>> df.resample('1ms').interpolate('time').resample('50ms').last()
                           epoch (ms)             time (10:00)  elapsed (s)  x-axis (g)  y-axis (g)  z-axis (g)
time (10:00)                                                                                                   
2021-02-22 12:03:20.850  1.613977e+12  2021-02-22T12:03:20.899        0.000   -0.336000    0.886000    0.649000
2021-02-22 12:03:20.900  1.613977e+12  2021-02-22T12:03:20.940        0.050   -0.155429    0.765000    0.614857
2021-02-22 12:03:20.950  1.613977e+12  2021-02-22T12:03:20.981        0.100   -0.125000    0.714571    0.542571
2021-02-22 12:03:21.000  1.613977e+12  2021-02-22T12:03:21.041        0.150   -0.271714    0.774619    0.411286
2021-02-22 12:03:21.050  1.613977e+12  2021-02-22T12:03:21.082        0.200   -0.178000    0.661190    0.453714
2021-02-22 12:03:21.100  1.613977e+12  2021-02-22T12:03:21.144        0.250   -0.338500    0.694750    0.689750
2021-02-22 12:03:21.150  1.613977e+12  2021-02-22T12:03:21.183        0.284   -0.419000    0.726000    0.820000
>>> df.resample('1ms').interpolate('cubic').resample('50ms').last()
                           epoch (ms)             time (10:00)  elapsed (s)  x-axis (g)  y-axis (g)  z-axis (g)
time (10:00)                                                                                                   
2021-02-22 12:03:20.850  1.613977e+12  2021-02-22T12:03:20.899        0.000   -0.336000    0.886000    0.649000
2021-02-22 12:03:20.900  1.613977e+12  2021-02-22T12:03:20.940        0.050   -0.153162    0.766266    0.615219
2021-02-22 12:03:20.950  1.613977e+12  2021-02-22T12:03:20.981        0.100   -0.122950    0.711454    0.543581
2021-02-22 12:03:21.000  1.613977e+12  2021-02-22T12:03:21.041        0.150   -0.285487    0.781273    0.403123
2021-02-22 12:03:21.050  1.613977e+12  2021-02-22T12:03:21.082        0.200   -0.172478    0.656944    0.452494
2021-02-22 12:03:21.100  1.613977e+12  2021-02-22T12:03:21.144        0.250   -0.342439    0.695493    0.693425
2021-02-22 12:03:21.150  1.613977e+12  2021-02-22T12:03:21.183        0.284   -0.419000    0.726000    0.820000