Python Pandas Dataframe:索引长度不匹配 - df['column'] = ndarray
Python Pandas Dataframe: length of index does not match - df['column'] = ndarray
我有一个 pandas 数据框,其中包含用于分析的 EOD 财务数据 (OHLC)。
我正在使用 https://github.com/cirla/tulipy 库生成技术指标值,这些值有特定的时间段作为选项。例如。 timeperiod=5 的 ADX 显示最近 5 天的 ADX。
由于这个时间段,生成的带有指标值的数组的长度总是比 Dataframe 短。因为前 5 天的价格用于生成第 6 天的 ADX..
pdi14, mdi14 = ti.di(
high=highData, low=lowData, close=closeData, period=14)
df['mdi_14'] = mdi14
df['pdi_14'] = pdi14
>> ValueError: Length of values does not match length of index
不幸的是,与 TA-LIB 不同,这个郁金香库不为前几天的空白日提供 NaN 值...
有没有一种简单的方法可以将这些 NaN 添加到 ndarray 中?
或者在某个索引处插入 df 并让它自动为它之前的行创建 NaN?
提前致谢,我研究了好几天!
完整的 MCVE
df = pd.DataFrame(1, range(10), list('ABC'))
a = np.full((len(df) - 6, df.shape[1]), 2)
b = np.full((6, df.shape[1]), np.nan)
c = np.row_stack([b, a])
d = pd.DataFrame(c, df.index, df.columns)
d
A B C
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN
5 NaN NaN NaN
6 2.0 2.0 2.0
7 2.0 2.0 2.0
8 2.0 2.0 2.0
9 2.0 2.0 2.0
也许自己在代码中进行转换?
period = 14
pdi14, mdi14 = ti.di(
high=highData, low=lowData, close=closeData, period=period
)
df['mdi_14'] = np.NAN
df['mdi_14'][period - 1:] = mdi14
我希望他们将来在库中用 NAN 填充第一个值。留下这样的时间序列数据没有任何标签是很危险的。
tulip library includes a start
function for each indicator (reference: https://tulipindicators.org/usage) that can be used to determine the output length of an indicator given a set of input options. Unfortunately, it does not appear that the python bindings library, tulipy 的 C 版本包含此功能。相反,您必须求助于动态重新分配索引值以使输出与原始 DataFrame 对齐。
这是一个使用郁金香文档中的价格系列的示例:
#Create the dataframe with close prices
prices = pd.DataFrame(data={81.59, 81.06, 82.87, 83, 83.61, 83.15, 82.84, 83.99, 84.55,
84.36, 85.53, 86.54, 86.89, 87.77, 87.29}, columns=['close'])
#Compute the technical indicator using tulipy and save the result in a DataFrame
bbands = pd.DataFrame(data=np.transpose(ti.bbands(real = prices['close'].to_numpy(), period = 5, stddev = 2)))
#Dynamically realign the index; note from the tulip library documentation that the price/volume data is expected be ordered "oldest to newest (index 0 is oldest)"
bbands.index += prices.index.max() - bbands.index.max()
#Put the indicator values with the original DataFrame
prices[['BBANDS_5_2_low', 'BBANDS_5_2_mid', 'BBANDS_5_2_up']] = bbands
prices.head(15)
close BBANDS_5_2_low BBANDS_5_2_mid BBANDS_5_2_up
0 81.06 NaN NaN NaN
1 81.59 NaN NaN NaN
2 82.87 NaN NaN NaN
3 83.00 NaN NaN NaN
4 83.61 80.530042 82.426 84.321958
5 83.15 81.494061 82.844 84.193939
6 82.84 82.533343 83.094 83.654657
7 83.99 82.471983 83.318 84.164017
8 84.55 82.417750 83.628 84.838250
9 84.36 82.435203 83.778 85.120797
10 85.53 82.511331 84.254 85.996669
11 86.54 83.142618 84.994 86.845382
12 86.89 83.536488 85.574 87.611512
13 87.77 83.870324 86.218 88.565676
14 87.29 85.288871 86.804 88.319129
我有一个 pandas 数据框,其中包含用于分析的 EOD 财务数据 (OHLC)。
我正在使用 https://github.com/cirla/tulipy 库生成技术指标值,这些值有特定的时间段作为选项。例如。 timeperiod=5 的 ADX 显示最近 5 天的 ADX。
由于这个时间段,生成的带有指标值的数组的长度总是比 Dataframe 短。因为前 5 天的价格用于生成第 6 天的 ADX..
pdi14, mdi14 = ti.di(
high=highData, low=lowData, close=closeData, period=14)
df['mdi_14'] = mdi14
df['pdi_14'] = pdi14
>> ValueError: Length of values does not match length of index
不幸的是,与 TA-LIB 不同,这个郁金香库不为前几天的空白日提供 NaN 值...
有没有一种简单的方法可以将这些 NaN 添加到 ndarray 中? 或者在某个索引处插入 df 并让它自动为它之前的行创建 NaN?
提前致谢,我研究了好几天!
完整的 MCVE
df = pd.DataFrame(1, range(10), list('ABC'))
a = np.full((len(df) - 6, df.shape[1]), 2)
b = np.full((6, df.shape[1]), np.nan)
c = np.row_stack([b, a])
d = pd.DataFrame(c, df.index, df.columns)
d
A B C
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN
5 NaN NaN NaN
6 2.0 2.0 2.0
7 2.0 2.0 2.0
8 2.0 2.0 2.0
9 2.0 2.0 2.0
也许自己在代码中进行转换?
period = 14
pdi14, mdi14 = ti.di(
high=highData, low=lowData, close=closeData, period=period
)
df['mdi_14'] = np.NAN
df['mdi_14'][period - 1:] = mdi14
我希望他们将来在库中用 NAN 填充第一个值。留下这样的时间序列数据没有任何标签是很危险的。
tulip library includes a start
function for each indicator (reference: https://tulipindicators.org/usage) that can be used to determine the output length of an indicator given a set of input options. Unfortunately, it does not appear that the python bindings library, tulipy 的 C 版本包含此功能。相反,您必须求助于动态重新分配索引值以使输出与原始 DataFrame 对齐。
这是一个使用郁金香文档中的价格系列的示例:
#Create the dataframe with close prices
prices = pd.DataFrame(data={81.59, 81.06, 82.87, 83, 83.61, 83.15, 82.84, 83.99, 84.55,
84.36, 85.53, 86.54, 86.89, 87.77, 87.29}, columns=['close'])
#Compute the technical indicator using tulipy and save the result in a DataFrame
bbands = pd.DataFrame(data=np.transpose(ti.bbands(real = prices['close'].to_numpy(), period = 5, stddev = 2)))
#Dynamically realign the index; note from the tulip library documentation that the price/volume data is expected be ordered "oldest to newest (index 0 is oldest)"
bbands.index += prices.index.max() - bbands.index.max()
#Put the indicator values with the original DataFrame
prices[['BBANDS_5_2_low', 'BBANDS_5_2_mid', 'BBANDS_5_2_up']] = bbands
prices.head(15)
close BBANDS_5_2_low BBANDS_5_2_mid BBANDS_5_2_up
0 81.06 NaN NaN NaN
1 81.59 NaN NaN NaN
2 82.87 NaN NaN NaN
3 83.00 NaN NaN NaN
4 83.61 80.530042 82.426 84.321958
5 83.15 81.494061 82.844 84.193939
6 82.84 82.533343 83.094 83.654657
7 83.99 82.471983 83.318 84.164017
8 84.55 82.417750 83.628 84.838250
9 84.36 82.435203 83.778 85.120797
10 85.53 82.511331 84.254 85.996669
11 86.54 83.142618 84.994 86.845382
12 86.89 83.536488 85.574 87.611512
13 87.77 83.870324 86.218 88.565676
14 87.29 85.288871 86.804 88.319129