Python Pandas DataFrame: length of index does not match - df['column'] = ndarray

Question

I have a pandas DataFrame containing EOD (end-of-day) financial data (OHLC) for analysis.

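I'm using the https://github.com/cirla/tulipy library to generate technical indicator values, which take a specific time period as an option. For example, ADX with period=5 gives the ADX over the last 5 days.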

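Because of this period, the resulting array of indicator values is always shorter than the DataFrame, since the prices of the first 5 days are used to produce the ADX for day 6.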

    import tulipy as ti

    # highData, lowData, closeData: float64 NumPy arrays taken from df
    pdi14, mdi14 = ti.di(
        high=highData, low=lowData, close=closeData, period=14)

    df['mdi_14'] = mdi14
    df['pdi_14'] = pdi14
    >> ValueError: Length of values does not match length of index
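A minimal, self-contained reproduction of the mismatch without tulipy. The `trailing_mean` function below is a hypothetical stand-in for a tulipy indicator: like tulipy, it returns only the fully formed windows (`period - 1` fewer values than the input), so assigning its output directly to a column fails.

```python
import numpy as np
import pandas as pd


def trailing_mean(values: np.ndarray, period: int) -> np.ndarray:
    """Hypothetical stand-in for a tulipy indicator: a simple moving average
    that, like tulipy, omits the first period - 1 incomplete windows."""
    kernel = np.ones(period) / period
    # mode="valid" keeps only windows that fully overlap the input
    return np.convolve(values, kernel, mode="valid")


df = pd.DataFrame({"close": np.arange(1.0, 11.0)})  # 10 rows of dummy prices
sma5 = trailing_mean(df["close"].to_numpy(), period=5)
print(len(df), len(sma5))  # 10 6

try:
    df["sma_5"] = sma5  # 6 values for a 10-row index
except ValueError as exc:
    print("ValueError:", exc)
```

The column is never created, because pandas checks the length before inserting it.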

Unfortunately, unlike TA-Lib, this tulip library does not return NaN values for the leading days that have no result...

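Is there an easy way to add these NaNs to the ndarray? Or to insert it into the df starting at a certain index and have NaNs created automatically for the rows before it?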

Thanks in advance, I've been researching this for days!

Answer 1

Full MCVE

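import numpy as np
import pandas as pd

df = pd.DataFrame(1, range(10), list('ABC'))

# 4 rows of "indicator" values, i.e. 6 fewer rows than df
a = np.full((len(df) - 6, df.shape[1]), 2)
# 6 rows of NaN to fill the leading gap
b = np.full((6, df.shape[1]), np.nan)

# Stack the NaN block on top of the values (np.vstack replaces the deprecated np.row_stack)
c = np.vstack([b, a])

d = pd.DataFrame(c, df.index, df.columns)
d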

     A    B    C
0  NaN  NaN  NaN
1  NaN  NaN  NaN
2  NaN  NaN  NaN
3  NaN  NaN  NaN
4  NaN  NaN  NaN
5  NaN  NaN  NaN
6  2.0  2.0  2.0
7  2.0  2.0  2.0
8  2.0  2.0  2.0
9  2.0  2.0  2.0
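For a single indicator column, the same idea reduces to prepending a block of NaNs to a 1-D array. A minimal sketch, assuming the indicator output lines up with the *end* of the DataFrame (tulipy expects data ordered oldest to newest); the helper name `pad_front` is made up for illustration:

```python
import numpy as np
import pandas as pd


def pad_front(values: np.ndarray, target_len: int) -> np.ndarray:
    """Hypothetical helper: prepend NaNs so `values` has length `target_len`."""
    values = np.asarray(values, dtype=float)
    missing = target_len - len(values)
    if missing < 0:
        raise ValueError("indicator output is longer than the DataFrame")
    return np.concatenate([np.full(missing, np.nan), values])


df = pd.DataFrame(1, range(10), list('ABC'))
indicator = np.full(4, 2.0)  # pretend output with 6 leading values dropped

df['D'] = pad_front(indicator, len(df))
print(df['D'].tolist())  # six NaNs followed by four 2.0 values
```

`np.pad(values, (missing, 0), constant_values=np.nan)` does the same for a float array.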

Answer 2

Maybe do the conversion yourself in your code?

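import numpy as np
import tulipy as ti

period = 14
pdi14, mdi14 = ti.di(
    high=highData, low=lowData, close=closeData, period=period
)

# Fill the column with NaN, then write the indicator into the rows from period - 1 on.
# A single .loc call avoids chained assignment (df['mdi_14'][...] = ...), which may write to a copy.
df['mdi_14'] = np.nan
df.loc[df.index[period - 1:], 'mdi_14'] = mdi14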

I hope that in the future the library will pad the first values with NaN. Leaving time-series data like this without any labels is dangerous.
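Hard-coding `period - 1` only works if you know the warm-up length of that particular indicator, and it differs between indicators. A sketch that instead derives the offset from the two lengths, assuming the output always lines up with the last rows of the DataFrame; `assign_tail` and the sample values are hypothetical:

```python
import numpy as np
import pandas as pd


def assign_tail(df: pd.DataFrame, column: str, values: np.ndarray) -> None:
    """Hypothetical helper: write `values` into the last len(values) rows of
    `column`, leaving NaN in the leading rows that have no indicator value."""
    offset = len(df) - len(values)  # number of warm-up rows to leave empty
    if offset < 0:
        raise ValueError("more indicator values than rows")
    df[column] = np.nan
    # .iloc with a column position avoids chained assignment
    df.iloc[offset:, df.columns.get_loc(column)] = values


df = pd.DataFrame({"close": np.linspace(80.0, 89.0, 10)})
assign_tail(df, "mdi_14", np.array([21.5, 23.0, 19.8]))
print(df["mdi_14"].isna().sum())  # 7
```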

Answer 3

The tulip library includes a start function for each indicator (reference: https://tulipindicators.org/usage) that can be used to determine the output length of an indicator for a given set of input options. Unfortunately, it does not appear that tulipy, the Python bindings for the C version of the library, include this function. Instead, you have to dynamically reassign the index values so that the output aligns with the original DataFrame.

Here is an example using the price series from the tulip documentation:

import numpy as np
import pandas as pd
import tulipy as ti

# Create the DataFrame with close prices (a list, not a set: sets are unordered)
prices = pd.DataFrame(data=[81.59, 81.06, 82.87, 83, 83.61, 83.15, 82.84, 83.99, 84.55,
                            84.36, 85.53, 86.54, 86.89, 87.77, 87.29], columns=['close'])

# Compute the technical indicator using tulipy and save the result in a DataFrame
bbands = pd.DataFrame(data=np.transpose(ti.bbands(real=prices['close'].to_numpy(), period=5, stddev=2)))

# Dynamically realign the index; per the tulip library documentation, price/volume data
# is expected to be ordered "oldest to newest (index 0 is oldest)"
bbands.index += prices.index.max() - bbands.index.max()

# Put the indicator values into the original DataFrame
prices[['BBANDS_5_2_low', 'BBANDS_5_2_mid', 'BBANDS_5_2_up']] = bbands
prices.head(15)

    close  BBANDS_5_2_low  BBANDS_5_2_mid  BBANDS_5_2_up
0   81.59             NaN             NaN            NaN
1   81.06             NaN             NaN            NaN
2   82.87             NaN             NaN            NaN
3   83.00             NaN             NaN            NaN
4   83.61       80.530042          82.426      84.321958
5   83.15       80.987142          82.738      84.488858
6   82.84       82.533343          83.094      83.654657
7   83.99       82.471983          83.318      84.164017
8   84.55       82.417750          83.628      84.838250
9   84.36       82.435203          83.778      85.120797
10  85.53       82.511331          84.254      85.996669
11  86.54       83.142618          84.994      86.845382
12  86.89       83.536488          85.574      87.611512
13  87.77       83.870324          86.218      88.565676
14  87.29       85.288871          86.804      88.319129
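The `bbands.index += ...` shift assumes the default integer `RangeIndex`. EOD data is often indexed by date, where that arithmetic does not apply. A sketch that instead labels the shorter output with the last `len(out)` entries of the price index, so pandas aligns on the index and fills the earlier rows with NaN; the dates are hypothetical and a NumPy moving average stands in for the tulipy call:

```python
import numpy as np
import pandas as pd

# Hypothetical daily closes indexed by business date instead of 0..n-1
dates = pd.date_range("2021-01-04", periods=8, freq="B")
prices = pd.DataFrame(
    {"close": [81.59, 81.06, 82.87, 83.0, 83.61, 83.15, 82.84, 83.99]}, index=dates
)

period = 5
# Stand-in for a tulipy output: a simple moving average without the
# first period - 1 incomplete windows (tulipy itself is not called here)
out = np.convolve(prices["close"].to_numpy(), np.ones(period) / period, mode="valid")

# Label the shorter output with the LAST len(out) dates (data is oldest to newest)
sma = pd.Series(out, index=prices.index[-len(out):])

# Column assignment aligns on the index; the earlier dates get NaN
prices["sma_5"] = sma
print(prices)
```

For a multi-output indicator such as `ti.bbands`, the same labels can be passed as `index=` when building the DataFrame from `np.transpose(...)`.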

Tags: python, time-series, dataframe, pandas, valueerror