Python Pandas Dataframe: length of index does not match - df['column'] = ndarray

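I have a pandas DataFrame containing EOD financial data (OHLC) for analysis.

I am using the https://github.com/cirla/tulipy library to generate technical indicator values, which take a time period as an option. For example, ADX with timeperiod=5 shows the ADX for the last 5 days.

Because of this time period, the generated array of indicator values is always shorter than the DataFrame, since the prices of the first 5 days are used to produce the ADX for day 6.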

    pdi14, mdi14 = ti.di(
    high=highData, low=lowData, close=closeData, period=14)

    df['mdi_14'] = mdi14
    df['pdi_14'] = pdi14
    >> ValueError: Length of values does not match length of index

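Unfortunately, unlike TA-LIB, the Tulip library does not provide NaN values for those first empty days.

Is there an easy way to prepend these NaNs to the ndarray? Or to insert into the df at a certain index and have it automatically create NaNs for the rows before it?

Thanks in advance, I have been researching this for days!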

A complete MCVE

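    import numpy as np
    import pandas as pd

    df = pd.DataFrame(1, range(10), list('ABC'))

    # "a" stands in for the (shorter) indicator output, "b" is the NaN padding
    a = np.full((len(df) - 6, df.shape[1]), 2)
    b = np.full((6, df.shape[1]), np.nan)

    # Stack the padding on top of the values (np.row_stack is deprecated in NumPy 2.0)
    c = np.vstack([b, a])

    d = pd.DataFrame(c, df.index, df.columns)
    d

         A    B    C
    0  NaN  NaN  NaN
    1  NaN  NaN  NaN
    2  NaN  NaN  NaN
    3  NaN  NaN  NaN
    4  NaN  NaN  NaN
    5  NaN  NaN  NaN
    6  2.0  2.0  2.0
    7  2.0  2.0  2.0
    8  2.0  2.0  2.0
    9  2.0  2.0  2.0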
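The same padding idea applied to a single 1-D array, which is closer to what an indicator call returns. This is only a sketch: `short` is a made-up stand-in for the tulipy output, and the number of missing rows is inferred from the length difference rather than from the indicator's period.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"close": np.arange(10, dtype=float)})

# Hypothetical stand-in for an indicator output that is shorter than the frame
short = np.array([2.0, 2.0, 2.0, 2.0])

# Number of leading rows that have no indicator value
missing = len(df) - len(short)

# Prepend NaNs so the array has the same length as the DataFrame
padded = np.concatenate([np.full(missing, np.nan), short])

df["indicator"] = padded
```

Inferring `missing` from the lengths avoids hard-coding an offset that differs from one indicator to another.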

Maybe do the shift yourself in the code?

    period = 14
    pdi14, mdi14 = ti.di(
        high=highData, low=lowData, close=closeData, period=period
    )

    # Start with an all-NaN column, then fill everything from row period - 1 onward
    df['mdi_14'] = np.nan
    df.loc[df.index[period - 1:], 'mdi_14'] = mdi14

I hope they pad the first values with NaN in a future version of the library. Leaving time series data like this without any labels is dangerous.
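A variation that lets pandas create the NaNs: wrap the output in a Series labelled with the last rows of the DataFrame's index and assign it, so index alignment fills the earlier rows. This is a minimal sketch; `align_to_tail` is a hypothetical helper name and `fake_indicator` is made-up data standing in for a tulipy result.

```python
import numpy as np
import pandas as pd


def align_to_tail(values, index):
    """Label `values` with the last len(values) entries of `index`."""
    values = np.asarray(values, dtype=float)
    if len(values) > len(index):
        raise ValueError("indicator output is longer than the index")
    # Slice from the front so an empty output yields an empty Series
    return pd.Series(values, index=index[len(index) - len(values):])


dates = pd.date_range("2020-01-01", periods=8, freq="D")
df = pd.DataFrame({"close": np.linspace(10.0, 17.0, 8)}, index=dates)

# Hypothetical indicator output, three values for the three most recent days
fake_indicator = np.array([1.0, 2.0, 3.0])

# Assignment aligns on the index; rows without a label become NaN
df["ind"] = align_to_tail(fake_indicator, df.index)
```

This assumes the output always corresponds to the most recent rows, which matches the oldest-to-newest ordering described for Tulip.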

The C version of the Tulip library includes a start function for each indicator (reference: https://tulipindicators.org/usage) that can be used to determine the output length of an indicator given a set of input options. Unfortunately, it does not appear that the Python bindings library, tulipy, includes this functionality. Instead, you have to resort to dynamically reassigning index values to align the output with the original DataFrame.

Here is an example using the price series from the Tulip documentation:

    import numpy as np
    import pandas as pd
    import tulipy as ti

    # Create the DataFrame with close prices (a list, not a set, so the order is kept)
    prices = pd.DataFrame(data={'close': [81.59, 81.06, 82.87, 83, 83.61, 83.15, 82.84, 83.99, 84.55,
                                          84.36, 85.53, 86.54, 86.89, 87.77, 87.29]})

    # Compute the technical indicator using tulipy and save the result in a DataFrame
    bbands = pd.DataFrame(data=np.transpose(ti.bbands(real=prices['close'].to_numpy(), period=5, stddev=2)))

    # Dynamically realign the index; note from the Tulip library documentation that the
    # price/volume data is expected to be ordered "oldest to newest (index 0 is oldest)"
    bbands.index += prices.index.max() - bbands.index.max()

    # Put the indicator values with the original DataFrame
    prices[['BBANDS_5_2_low', 'BBANDS_5_2_mid', 'BBANDS_5_2_up']] = bbands
    prices.head(15)

        close  BBANDS_5_2_low  BBANDS_5_2_mid  BBANDS_5_2_up
    0   81.59             NaN             NaN            NaN
    1   81.06             NaN             NaN            NaN
    2   82.87             NaN             NaN            NaN
    3   83.00             NaN             NaN            NaN
    4   83.61       80.530042          82.426      84.321958
    5   83.15       81.494061          82.844      84.193939
    6   82.84       82.533343          83.094      83.654657
    7   83.99       82.471983          83.318      84.164017
    8   84.55       82.417750          83.628      84.838250
    9   84.36       82.435203          83.778      85.120797
    10  85.53       82.511331          84.254      85.996669
    11  86.54       83.142618          84.994      86.845382
    12  86.89       83.536488          85.574      87.611512
    13  87.77       83.870324          86.218      88.565676
    14  87.29       85.288871          86.804      88.319129
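To sanity-check the alignment, the same bands can be recomputed with pandas rolling windows, which already return NaN for the first rows. This is a sketch of a cross-check, not part of tulipy; it assumes the bands are the rolling mean plus or minus 2 population standard deviations (`ddof=0`), which is what the values in the table correspond to.

```python
import pandas as pd

close = pd.Series([81.59, 81.06, 82.87, 83, 83.61, 83.15, 82.84, 83.99, 84.55,
                   84.36, 85.53, 86.54, 86.89, 87.77, 87.29], name="close")

period, stddev = 5, 2

# Rolling statistics are NaN until a full window of `period` rows is available
mid = close.rolling(period).mean()
std = close.rolling(period).std(ddof=0)  # population standard deviation

check = pd.DataFrame({
    "close": close,
    "low": mid - stddev * std,
    "mid": mid,
    "up": mid + stddev * std,
})
print(check.round(6))
```

If the realigned tulipy columns match `check` row for row, the index shift was applied correctly.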