ValueError: If using all scalar values, you must pass an index

Question

Take the following code:

import MySQLdb as mdb
import pandas as pd

# db_host, db_user, db_pass and db_name hold the connection settings
con = mdb.connect(db_host, db_user, db_pass, db_name)

# Select both the TIME and BID-CLOSE columns (comma-separated)
query = """SELECT `TIME`, `BID-CLOSE`
          FROM `EUR-USD`.`tbl_EUR-USD_1-Day`
          WHERE TIME >= '2006-12-15 22:00:00' AND TIME <= '2007-01-03 22:00:00'
          ORDER BY TIME ASC;"""

# Create a pandas dataframe from the SQL query
eurusd = pd.read_sql_query(query, con=con, index_col='TIME')
# One row per calendar day; dates missing from the table become NaN
idx = pd.date_range('2006-12-17 22:00:00', '2007-01-03 22:00:00')
eurusd.reindex(idx, fill_value=None)

This gives the output:

                     BID-CLOSE
2006-12-17 22:00:00    1.30971
2006-12-18 22:00:00    1.31971
2006-12-19 22:00:00    1.31721
2006-12-20 22:00:00    1.31771
2006-12-21 22:00:00    1.31411
2006-12-22 22:00:00        NaN
2006-12-23 22:00:00        NaN
2006-12-24 22:00:00        NaN
2006-12-25 22:00:00    1.30971
2006-12-26 22:00:00    1.31131
2006-12-27 22:00:00    1.31491
2006-12-28 22:00:00    1.32021
2006-12-29 22:00:00        NaN
2006-12-30 22:00:00        NaN
2006-12-31 22:00:00    1.32731
2007-01-01 22:00:00    1.32731
2007-01-02 22:00:00    1.31701
2007-01-03 22:00:00    1.30831

Reindex the data:

eurusd = eurusd.reindex(idx, fill_value=None)
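Since the MySQL table is not available here, the following sketch replaces the query with a few BID-CLOSE values copied from the output above, as a hypothetical in-memory stand-in. It shows how reindexing onto a daily `pd.date_range` inserts NaN rows for the dates the table does not contain (the default `fill_value` of `reindex` is NaN).

```python
import pandas as pd

# Hypothetical stand-in for the SQL result: four BID-CLOSE rows from the
# output above; the table has no rows for 2006-12-22 .. 2006-12-24.
raw = pd.DataFrame(
    {"BID-CLOSE": [1.31771, 1.31411, 1.30971, 1.31131]},
    index=pd.to_datetime([
        "2006-12-20 22:00:00",
        "2006-12-21 22:00:00",
        "2006-12-25 22:00:00",
        "2006-12-26 22:00:00",
    ]),
)

# One label per calendar day; labels absent from `raw` get NaN rows.
idx = pd.date_range("2006-12-20 22:00:00", "2006-12-26 22:00:00", freq="D")
daily = raw.reindex(idx)

print(daily)
```

The three missing days show up as NaN, which is what the interpolation step below is meant to fill.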

List of interpolation types:

methods = ['linear', 'quadratic', 'cubic']

The next line throws an exception...

pd.DataFrame({m: eurusd.interpolate(method=m) for m in methods})

ValueError: If using all scalar values, you must pass an index
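A minimal reproduction of the error, assuming a hypothetical three-row frame with one gap in place of `eurusd`. The dict passed to `pd.DataFrame` maps each method name to a whole DataFrame, which is what triggers the message.

```python
import pandas as pd

# Hypothetical three-row frame with a single gap, shaped like eurusd.
df = pd.DataFrame(
    {"BID-CLOSE": [1.31411, None, 1.30971]},
    index=pd.date_range("2006-12-21 22:00:00", periods=3, freq="D"),
)

methods = ["linear"]
error_message = None

try:
    # Each dict value is a 2-D DataFrame, not a 1-D Series.
    pd.DataFrame({m: df.interpolate(method=m) for m in methods})
except ValueError as err:
    error_message = str(err)
    print("ValueError:", error_message)
```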

Following the interpolation section of this guide, http://pandas.pydata.org/pandas-docs/stable/missing_data.html, how do I correctly 'pass an index' in this case?

Update 1

eurusd.interpolate('linear')

produces the output:

                     BID-CLOSE
2006-12-17 22:00:00   1.309710
2006-12-18 22:00:00   1.319710
2006-12-19 22:00:00   1.317210
2006-12-20 22:00:00   1.317710
2006-12-21 22:00:00   1.314110
2006-12-22 22:00:00   1.313010
2006-12-23 22:00:00   1.311910
2006-12-24 22:00:00   1.310810
2006-12-25 22:00:00   1.309710
2006-12-26 22:00:00   1.311310
2006-12-27 22:00:00   1.314910
2006-12-28 22:00:00   1.320210
2006-12-29 22:00:00   1.322577
2006-12-30 22:00:00   1.324943
2006-12-31 22:00:00   1.327310
2007-01-01 22:00:00   1.327310
2007-01-02 22:00:00   1.317010
2007-01-03 22:00:00   1.308310
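As a quick check on the values above: with evenly spaced daily rows, `'linear'` fills the three NaN days between 1.31411 and 1.30971 in equal steps of (1.30971 - 1.31411) / 4 = -0.0011, giving 1.31301, 1.31191 and 1.31081, the same as the Dec 22-24 rows. The sketch below uses a hypothetical five-row excerpt and also shows that calling `interpolate` on a DataFrame returns a DataFrame, not a Series.

```python
import pandas as pd

# Hypothetical excerpt of the reindexed frame: three NaN days between two quotes.
eurusd = pd.DataFrame(
    {"BID-CLOSE": [1.31411, None, None, None, 1.30971]},
    index=pd.date_range("2006-12-21 22:00:00", periods=5, freq="D"),
)

filled = eurusd.interpolate("linear")
print(type(filled).__name__)  # interpolate keeps the caller's type: DataFrame
print(filled["BID-CLOSE"].round(6).tolist())
```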

Update 2

In[9]: pd.DataFrame({m: eurusd['BID-CLOSE'].interpolate(method=m) for m in methods})
Out[9]: 
                        cubic    linear  quadratic
2006-12-17 22:00:00  1.309710  1.309710   1.309710
2006-12-18 22:00:00  1.319710  1.319710   1.319710
2006-12-19 22:00:00  1.317210  1.317210   1.317210
2006-12-20 22:00:00  1.317710  1.317710   1.317710
2006-12-21 22:00:00  1.314110  1.314110   1.314110
2006-12-22 22:00:00  1.310762  1.313010   1.307947
2006-12-23 22:00:00  1.309191  1.311910   1.305159
2006-12-24 22:00:00  1.308980  1.310810   1.305747
2006-12-25 22:00:00  1.309710  1.309710   1.309710
2006-12-26 22:00:00  1.311310  1.311310   1.311310
2006-12-27 22:00:00  1.314910  1.314910   1.314910
2006-12-28 22:00:00  1.320210  1.320210   1.320210
2006-12-29 22:00:00  1.323674  1.322577   1.321632
2006-12-30 22:00:00  1.325553  1.324943   1.323998
2006-12-31 22:00:00  1.327310  1.327310   1.327310
2007-01-01 22:00:00  1.327310  1.327310   1.327310
2007-01-02 22:00:00  1.317010  1.317010   1.317010
2007-01-03 22:00:00  1.308310  1.308310   1.308310

Answer 1

The problem is that when you use the DataFrame constructor:

pd.DataFrame({m: eurusd.interpolate(method=m) for m in methods})

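the value for each m is a DataFrame. pandas does not treat a DataFrame as a column (it is neither a Series nor a one-dimensional sequence), so it ends up on the "all scalar values" path, which is admittedly confusing. The constructor expects each dict value to be a Series or some other one-dimensional sequence. The following should do the trick: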

pd.DataFrame({m: eurusd['BID-CLOSE'].interpolate(method=m) for m in methods})

since selecting a single column returns a Series. So, for example, instead of:

In [34]: pd.DataFrame({'linear':df.interpolate('linear')})
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-34-4b6c095c6da3> in <module>()
----> 1 pd.DataFrame({'linear':df.interpolate('linear')})

/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    222                                  dtype=dtype, copy=copy)
    223         elif isinstance(data, dict):
--> 224             mgr = self._init_dict(data, index, columns, dtype=dtype)
    225         elif isinstance(data, ma.MaskedArray):
    226             import numpy.ma.mrecords as mrecords

/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in _init_dict(self, data, index, columns, dtype)
    358             arrays = [data[k] for k in keys]
    359 
--> 360         return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
    361 
    362     def _init_ndarray(self, values, index, columns, dtype=None, copy=False):

/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
   5229     # figure out the index, if necessary
   5230     if index is None:
-> 5231         index = extract_index(arrays)
   5232     else:
   5233         index = _ensure_index(index)

/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in extract_index(data)
   5268 
   5269         if not indexes and not raw_lengths:
-> 5270             raise ValueError('If using all scalar values, you must pass'
   5271                              ' an index')
   5272 

ValueError: If using all scalar values, you must pass an index

Use this instead:

In [35]: pd.DataFrame({'linear':df['BID-CLOSE'].interpolate('linear')})
Out[35]: 
                       linear
timestamp                    
2016-10-10 22:00:00  1.309710
2016-10-10 22:00:00  1.319710
2016-10-10 22:00:00  1.317210
2016-10-10 22:00:00  1.317710
2016-10-10 22:00:00  1.314110
2016-10-10 22:00:00  1.313010
2016-10-10 22:00:00  1.311910
2016-10-10 22:00:00  1.310810
2016-10-10 22:00:00  1.309710
2016-10-10 22:00:00  1.311310
2016-10-10 22:00:00  1.314910
2016-10-10 22:00:00  1.320210
2016-10-10 22:00:00  1.322577
2016-10-10 22:00:00  1.324943
2016-10-10 22:00:00  1.327310
2016-10-10 22:00:00  1.327310
2016-10-10 22:00:00  1.317010
2016-10-10 22:00:00  1.308310
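To see what "pass an index" means literally, here is a small sketch of the two dict-value cases the constructor handles, using a hypothetical two-day Series in place of `eurusd['BID-CLOSE']`. When every value is a scalar, pandas cannot infer how many rows to build, so it needs an explicit `index=`. When the values are Series, the row index comes from the Series themselves, which is why selecting a column fixes the error.

```python
import pandas as pd

# Hypothetical two-day series standing in for eurusd['BID-CLOSE'].
idx = pd.date_range("2006-12-25 22:00:00", periods=2, freq="D")
close = pd.Series([1.30971, 1.31131], index=idx, name="BID-CLOSE")

# 1) All-scalar values: an explicit index= is required; each scalar is
#    broadcast down that index.
scalars = pd.DataFrame({"linear": 1.0, "cubic": 2.0}, index=idx)

# 2) Series values: no index= needed; rows are aligned on the Series' index.
from_series = pd.DataFrame({"linear": close, "cubic": close})

print(scalars)
print(from_series)
```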

Fair warning, though: when I try 'quadratic' and 'cubic' interpolation on your data, I get a LinAlgError: singular matrix error. Not sure why.
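If the frame has more than one column to interpolate, selecting one column at a time stops being convenient. One alternative, not from this thread, is `pd.concat`, which does accept a dict of DataFrames: with `axis=1` the dict keys become the outer level of a column MultiIndex. The sketch below assumes a hypothetical two-column frame (the `ASK-CLOSE` column is invented for illustration) and uses `'linear'` and `'time'`, which need only pandas; `'quadratic'` and `'cubic'` are passed through to SciPy.

```python
import pandas as pd

# Hypothetical two-column frame with interior gaps; ASK-CLOSE is invented
# purely to show the multi-column case.
idx = pd.date_range("2006-12-21 22:00:00", periods=5, freq="D")
frame = pd.DataFrame(
    {
        "BID-CLOSE": [1.31411, None, None, None, 1.30971],
        "ASK-CLOSE": [1.31431, None, None, None, 1.30991],
    },
    index=idx,
)

# 'time' interpolates against the DatetimeIndex; no SciPy needed for either.
methods = ["linear", "time"]

# Dict keys become the outer column level, e.g. ('linear', 'BID-CLOSE').
result = pd.concat({m: frame.interpolate(method=m) for m in methods}, axis=1)

# Selecting an outer key returns that method's sub-frame.
print(result["time"])
```

The trade-off is a two-level column index instead of one flat column per method; `result.xs('BID-CLOSE', axis=1, level=1)` recovers a flat per-method view of a single original column.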
