IndexError when trying to perform ffill() on a pandas DataFrame
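Can someone explain what this error means? I have a large DataFrame with a lot of NaN values, and I just want to fill certain columns with the previous values. Here is the code: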
import tables as tb
import pandas as pd
Here I open some PyTables files and import the tables into DataFrames:
FGBL = tb.open_file(r"C:\Users\SUPER\Documents\NewQSPythonSamples\FGBL.h5")
FGBM = tb.open_file(r"C:\Users\SUPER\Documents\NewQSPythonSamples\FGBM.h5")
FGBS = tb.open_file(r"C:\Users\SUPER\Documents\NewQSPythonSamples\FGBS.h5")
FGBLtable = FGBL.root.trade.Z4
FGBMtable = FGBM.root.trade.Z4
FGBStable = FGBS.root.trade.Z4
FGBStableq = FGBS.root.quote.Z4
FGBMtableq = FGBM.root.quote.Z4
FGBLtableq = FGBL.root.quote.Z4
fgbltrade = pd.DataFrame.from_records(FGBLtable.read())
fgbmtrade = pd.DataFrame.from_records(FGBMtable.read())
fgbstrade = pd.DataFrame.from_records(FGBStable.read())
fgblquote = pd.DataFrame.from_records(FGBLtableq.read())
fgbmquote = pd.DataFrame.from_records(FGBMtableq.read())
fgbsquote = pd.DataFrame.from_records(FGBStableq.read())
Then I convert the timestamps to datetime format:
fgbltrade["DateTimes"] = pd.to_datetime(fgbltrade.dateTime, unit="s")
fgbmtrade["DateTimes"] = pd.to_datetime(fgbmtrade.dateTime, unit="s")
fgbstrade["DateTimes"] = pd.to_datetime(fgbstrade.dateTime, unit="s")
fgblquote["DateTimes"] = pd.to_datetime(fgblquote.dateTime, unit="s")
fgbmquote["DateTimes"] = pd.to_datetime(fgbmquote.dateTime, unit="s")
fgbsquote["DateTimes"] = pd.to_datetime(fgbsquote.dateTime, unit="s")
I do some simple math on the frames, then drop the NaNs and the columns I don't need:
fgblquote["VWPfgbl"] = (fgblquote.askPrc*fgblquote.bidSize + fgblquote.bidPrc*fgblquote.askSize)/(fgblquote.askSize + fgblquote.bidSize)
fgbmquote["VWPfgbm"] = (fgbmquote.askPrc*fgbmquote.bidSize + fgbmquote.bidPrc*fgbmquote.askSize)/(fgbmquote.askSize + fgbmquote.bidSize)
fgbsquote["VWPfgbs"] = (fgbsquote.askPrc*fgbsquote.bidSize + fgbsquote.bidPrc*fgbsquote.askSize)/(fgbsquote.askSize + fgbsquote.bidSize)
fgblquote = fgblquote.dropna()
fgbmquote = fgbmquote.dropna()
fgbsquote = fgbsquote.dropna()
fgblquote = fgblquote.drop(["askPrc", "askSize", "bidPrc", "bidSize", "dateTime"], axis=1)
fgbmquote = fgbmquote.drop(["askPrc", "askSize", "bidPrc", "bidSize", "dateTime"], axis=1)
fgbsquote = fgbsquote.drop(["askPrc", "askSize", "bidPrc", "bidSize", "dateTime"], axis=1)
Then I merge the frames together:
df = pd.merge(fgbltrade, fgbmtrade, on='DateTimes', how = "outer")
df = pd.merge(df, fgbstrade, on='DateTimes', how = "outer")
df = pd.merge(df, fgblquote, on='DateTimes', how = "outer")
df = pd.merge(df, fgbmquote, on='DateTimes', how = "outer")
df = pd.merge(df, fgbsquote, on='DateTimes', how = "outer")
and try to forward fill:
df = df["VWPfgbl"].ffill()
df = df["VWPfgbm"].ffill()
df = df["VWPfgbs"].ffill()
And the error:
In [3]: df = df["VWPfgbl"].ffill()
...: df = df["VWPfgbm"].ffill()
...: df = df["VWPfgbs"].ffill()
...:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-3-20f62c2a5da9> in <module>()
1 df = df["VWPfgbl"].ffill()
----> 2 df = df["VWPfgbm"].ffill()
3 df = df["VWPfgbs"].ffill()
4
C:\Anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
482 def __getitem__(self, key):
483 try:
--> 484 result = self.index.get_value(self, key)
485
486 if not np.isscalar(result):
C:\Anaconda3\lib\site-packages\pandas\core\index.py in get_value(self, series, key)
1214 # python 3
1215 if np.isscalar(key): # pragma: no cover
-> 1216 raise IndexError(key)
1217 raise InvalidIndexError(key)
1218
IndexError: VWPfgbm
The error, IndexError: VWPfgbm, means that df has no column named 'VWPfgbm'. You can confirm this by inspecting df.columns.
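A quick way to run that check is sketched below. The small frame is made up for illustration and only borrows the column names from the question; it is not the asker's data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the merged frame; the values are invented.
df = pd.DataFrame({
    "DateTimes": pd.to_datetime([0, 1, 2], unit="s"),
    "VWPfgbl": [1.0, np.nan, 3.0],
    "VWPfgbm": [np.nan, 2.0, np.nan],
})

wanted = ["VWPfgbl", "VWPfgbm", "VWPfgbs"]

# Names we expect to find but that are not columns of df.
missing = [name for name in wanted if name not in df.columns]

print(type(df).__name__)   # DataFrame or Series?
print(list(df.columns))
print(missing)
```

Printing the type as well is a cheap sanity check: if it says Series instead of DataFrame, the variable no longer holds the merged frame at all.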
You may be wondering: given fgbmquote["VWPfgbm"] and
df = pd.merge(df, fgbmquote, on='DateTimes', how = "outer")
why would df not contain the column "VWPfgbm"?
One possible reason is that df and fgbmquote both had a "VWPfgbm" column. In that case pd.merge disambiguates by naming the columns "VWPfgbm_x" and "VWPfgbm_y" in the merged DataFrame. See the suffixes parameter of the pd.merge function.
For example,
import pandas as pd
foo = pd.DataFrame({'VWPfgbm':range(3), 'baz':list('ABC')})
bar = pd.DataFrame({'VWPfgbm':range(3,6), 'baz':list('CAB')})
pd.merge(foo, bar, on='baz', how='outer')
yields
   VWPfgbm_x baz  VWPfgbm_y
0          0   A          4
1          1   B          5
2          2   C          3
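If overlapping names do turn out to be the cause, the suffixes parameter lets you choose the resulting names yourself. The sketch below reuses foo and bar from the example above; the suffix strings '_foo' and '_bar' are arbitrary choices for illustration.

```python
import pandas as pd

foo = pd.DataFrame({'VWPfgbm': range(3), 'baz': list('ABC')})
bar = pd.DataFrame({'VWPfgbm': range(3, 6), 'baz': list('CAB')})

# Non-key columns present in both frames; these are the ones merge will rename.
overlap = (set(foo.columns) & set(bar.columns)) - {'baz'}
print(overlap)

# Pick explicit suffixes instead of the default '_x' and '_y'.
merged = pd.merge(foo, bar, on='baz', how='outer', suffixes=('_foo', '_bar'))
print(list(merged.columns))
```

Either way, the plain name 'VWPfgbm' is gone from the merged frame, so looking it up afterwards fails.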
Your mistake is that you overwrite the df variable with a single forward-filled column:
df = df["VWPfgbl"].ffill()
df = df["VWPfgbm"].ffill()
df = df["VWPfgbs"].ffill()
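The first line there reassigns df, making it a single (filled) column of the original DataFrame. That is why the second line fails: df no longer has any other columns, so you get an IndexError.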
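The failure can be reproduced on a tiny frame, as in the sketch below. The values are invented. Note one assumption: the traceback in the question comes from an old pandas that raised IndexError here, while recent versions, as far as I know, raise KeyError for the same lookup, so the example catches both.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "VWPfgbl": [1.0, np.nan, 3.0],
    "VWPfgbm": [np.nan, 2.0, np.nan],
})

# Same pattern as in the question: df is rebound to one filled column.
df = df["VWPfgbl"].ffill()
print(type(df).__name__)

lookup_failed = False
try:
    # df is a Series now, so "VWPfgbm" is looked up in its row index.
    df["VWPfgbm"]
except (KeyError, IndexError) as exc:
    lookup_failed = True
    print(type(exc).__name__, exc)
```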
You should rewrite the code as:
df["VWPfgbl"] = df["VWPfgbl"].ffill()
df["VWPfgbm"] = df["VWPfgbm"].ffill()
df["VWPfgbs"] = df["VWPfgbs"].ffill()
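The same fix can also be written for several columns in one assignment. This is only a sketch on invented data; the "price" column is a hypothetical extra column added to show that the rest of the frame is left alone.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "price": [10.0, 11.0, 12.0, 13.0],           # hypothetical untouched column
    "VWPfgbl": [1.0, np.nan, np.nan, 4.0],
    "VWPfgbm": [np.nan, 2.0, np.nan, np.nan],
    "VWPfgbs": [5.0, np.nan, 6.0, np.nan],
})

cols = ["VWPfgbl", "VWPfgbm", "VWPfgbs"]

# Assign back into the selected columns; df itself stays a DataFrame.
df[cols] = df[cols].ffill()
print(df)
```

A NaN at the very top of a column stays NaN after ffill(), because there is no earlier value to carry forward.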