Rescale a price list from a longer length to a shorter length

Given the following pandas DataFrame with 60 elements:

import pandas as pd
data = [60,62.75,73.28,75.77,70.28
    ,67.85,74.58,72.91,68.33,78.59
    ,75.58,78.93,74.61,85.3,84.63
    ,84.61,87.76,95.02,98.83,92.44
    ,84.8,89.51,90.25,93.82,86.64
    ,77.84,76.06,77.75,72.13,80.2
    ,79.05,76.11,80.28,76.38,73.3
    ,72.28,77,69.28,71.31,79.25
    ,75.11,73.16,78.91,84.78,85.17
    ,91.53,94.85,87.79,97.92,92.88
    ,91.92,88.32,81.49,88.67,91.46
    ,91.71,82.17,93.05,103.98,105]

data_pd = pd.DataFrame(data, columns=["price"])

Is there a formula to rescale it so that every window running from index 0 to index i+1 that is longer than 20 elements gets rescaled down to 20 elements?

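Here is a loop that builds the windows meant to hold the rescaled data. I just don't know how to do the rescaling itself for this problem. Any suggestions on how to do this?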

miniLenght = 20
rescaledData = []
for i in range(len(data_pd)):
    if(i >= miniLenght):
        dataForScaling = data_pd[0:i]
        scaledDataToMinLenght = dataForScaling #do the scaling here so that the length of the rescaled data is always equal to miniLenght
        rescaledData.append(scaledDataToMinLenght)

Basically, after rescaling, rescaledData should contain 40 arrays, each holding 20 prices.

Judging from the paper, it looks like you are resizing the list back down to 20 indices and then interpolating the data at those 20 indices.

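We'll build the indices the way they do (range(0, len(large), step = len(large)/miniLenght)) and then use numpy's interp. There are a million ways of interpolating data; np.interp uses linear interpolation, so if you ask for index 1.5 you get the average of points 1 and 2, and so on.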
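To make the linear-interpolation behaviour concrete, here is a minimal sketch that uses only the first three prices from the question. The names `prices`, `positions`, `midpoint` and `near_first` are illustrative and not part of the original code.

```python
import numpy as np

# First three prices from the question's data
prices = np.array([60.0, 62.75, 73.28])
positions = np.arange(len(prices))  # 0, 1, 2

# A fractional index of 1.5 lies halfway between positions 1 and 2,
# so the result is the plain average of 62.75 and 73.28
midpoint = np.interp(1.5, positions, prices)
print(midpoint)

# An index of 1.05 is weighted 95% toward position 1 and 5% toward position 2
near_first = np.interp(1.05, positions, prices)
print(near_first)
```

The second value corresponds to the second entry of the 21-element window, where the step between sample positions is 21/20 = 1.05.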

So here is a quick modification of your code (note: we could probably vectorize this fully using 'rolling'):

import numpy as np
miniLenght = 20
rescaledData = []

for i in range(len(data_pd)):
    if(i >= miniLenght):
        dataForScaling = data_pd['price'][0:i]
        #figure out how many 'steps' we have
        steps = len(dataForScaling)
        #make indices where the data needs to be sliced to get 20 points
        indices = np.arange(0,steps, step = steps/miniLenght)
        #use np.interp at those points, with the original values as given
        rescaledData.append(np.interp(indices, np.arange(steps), dataForScaling))

And the output is as expected:

[array([ 60.  ,  62.75,  73.28,  75.77,  70.28,  67.85,  74.58,  72.91,
         68.33,  78.59,  75.58,  78.93,  74.61,  85.3 ,  84.63,  84.61,
         87.76,  95.02,  98.83,  92.44]),
 array([ 60.    ,  63.2765,  73.529 ,  74.9465,  69.794 ,  69.5325,
         74.079 ,  71.307 ,  72.434 ,  77.2355,  77.255 ,  76.554 ,
         81.024 ,  84.8645,  84.616 ,  86.9725,  93.568 ,  98.2585,
         93.079 ,  85.182 ]),.....