ValueError: If using all scalar values, you must pass an index
ValueError: If using all scalar values, you must pass an index
取以下代码:
import MySQLdb as mdb
import pandas as pd
con = mdb.connect(db_host, db_user, db_pass, db_name)
query = """SELECT `TIME`.`BID-CLOSE`
FROM `EUR-USD`.`tbl_EUR-USD_1-Day`
WHERE TIME >= '2006-12-15 22:00:00' AND TIME <= '2007-01-03 22:00:00'
ORDER BY TIME ASC;"""
# Create a pandas dataframe from the SQL query
eurusd = pd.read_sql_query(query, con=con, index_col='TIME')
idx = pd.date_range('2006-12-17 22:00:00', '2007-01-03 22:00:00')
eurusd.reindex(idx, fill_value=None)
这给出了
的输出
BID-CLOSE
2006-12-17 22:00:00 1.30971
2006-12-18 22:00:00 1.31971
2006-12-19 22:00:00 1.31721
2006-12-20 22:00:00 1.31771
2006-12-21 22:00:00 1.31411
2006-12-22 22:00:00 NaN
2006-12-23 22:00:00 NaN
2006-12-24 22:00:00 NaN
2006-12-25 22:00:00 1.30971
2006-12-26 22:00:00 1.31131
2006-12-27 22:00:00 1.31491
2006-12-28 22:00:00 1.32021
2006-12-29 22:00:00 NaN
2006-12-30 22:00:00 NaN
2006-12-31 22:00:00 1.32731
2007-01-01 22:00:00 1.32731
2007-01-02 22:00:00 1.31701
2007-01-03 22:00:00 1.30831
重新索引数据
eurusd = eurusd.reindex(idx, fill_value=None)
插值类型列表
methods = ['linear', 'quadratic', 'cubic']
下一行抛出异常...
pd.DataFrame({m: eurusd.interpolate(method=m) for m in methods})
ValueError: If using all scalar values, you must pass an index
遵循本指南的插值部分http://pandas.pydata.org/pandas-docs/stable/missing_data.html
在这种情况下我该如何正确 'pass an index'?
更新 1
eurusd.interpolate('linear')
的输出
BID-CLOSE
2006-12-17 22:00:00 1.309710
2006-12-18 22:00:00 1.319710
2006-12-19 22:00:00 1.317210
2006-12-20 22:00:00 1.317710
2006-12-21 22:00:00 1.314110
2006-12-22 22:00:00 1.313010
2006-12-23 22:00:00 1.311910
2006-12-24 22:00:00 1.310810
2006-12-25 22:00:00 1.309710
2006-12-26 22:00:00 1.311310
2006-12-27 22:00:00 1.314910
2006-12-28 22:00:00 1.320210
2006-12-29 22:00:00 1.322577
2006-12-30 22:00:00 1.324943
2006-12-31 22:00:00 1.327310
2007-01-01 22:00:00 1.327310
2007-01-02 22:00:00 1.317010
2007-01-03 22:00:00 1.308310
更新 2
In[9]: pd.DataFrame({m: eurusd['BID-CLOSE'].interpolate(method=m) for m in methods})
Out[9]:
cubic linear quadratic
2006-12-17 22:00:00 1.309710 1.309710 1.309710
2006-12-18 22:00:00 1.319710 1.319710 1.319710
2006-12-19 22:00:00 1.317210 1.317210 1.317210
2006-12-20 22:00:00 1.317710 1.317710 1.317710
2006-12-21 22:00:00 1.314110 1.314110 1.314110
2006-12-22 22:00:00 1.310762 1.313010 1.307947
2006-12-23 22:00:00 1.309191 1.311910 1.305159
2006-12-24 22:00:00 1.308980 1.310810 1.305747
2006-12-25 22:00:00 1.309710 1.309710 1.309710
2006-12-26 22:00:00 1.311310 1.311310 1.311310
2006-12-27 22:00:00 1.314910 1.314910 1.314910
2006-12-28 22:00:00 1.320210 1.320210 1.320210
2006-12-29 22:00:00 1.323674 1.322577 1.321632
2006-12-30 22:00:00 1.325553 1.324943 1.323998
2006-12-31 22:00:00 1.327310 1.327310 1.327310
2007-01-01 22:00:00 1.327310 1.327310 1.327310
2007-01-02 22:00:00 1.317010 1.317010 1.317010
2007-01-03 22:00:00 1.308310 1.308310 1.308310
问题是,当您使用 DataFrame
构造函数时:
pd.DataFrame({m: eurusd.interpolate(method=m) for m in methods})
每个 m
的值是一个 DataFrame
,它将被解释为标量值,这无疑是令人困惑的。此构造函数需要某种序列或 Series
。以下应该可以解决问题:
pd.DataFrame({m: eurusd['BID-CLOSE'].interpolate(method=m) for m in methods})
自从对列 returns 进行子集化后 Series
。因此,例如代替:
In [34]: pd.DataFrame({'linear':df.interpolate('linear')})
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-34-4b6c095c6da3> in <module>()
----> 1 pd.DataFrame({'linear':df.interpolate('linear')})
/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
222 dtype=dtype, copy=copy)
223 elif isinstance(data, dict):
--> 224 mgr = self._init_dict(data, index, columns, dtype=dtype)
225 elif isinstance(data, ma.MaskedArray):
226 import numpy.ma.mrecords as mrecords
/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in _init_dict(self, data, index, columns, dtype)
358 arrays = [data[k] for k in keys]
359
--> 360 return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
361
362 def _init_ndarray(self, values, index, columns, dtype=None, copy=False):
/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
5229 # figure out the index, if necessary
5230 if index is None:
-> 5231 index = extract_index(arrays)
5232 else:
5233 index = _ensure_index(index)
/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in extract_index(data)
5268
5269 if not indexes and not raw_lengths:
-> 5270 raise ValueError('If using all scalar values, you must pass'
5271 ' an index')
5272
ValueError: If using all scalar values, you must pass an index
改用这个:
In [35]: pd.DataFrame({'linear':df['BID-CLOSE'].interpolate('linear')})
Out[35]:
linear
timestamp
2016-10-10 22:00:00 1.309710
2016-10-10 22:00:00 1.319710
2016-10-10 22:00:00 1.317210
2016-10-10 22:00:00 1.317710
2016-10-10 22:00:00 1.314110
2016-10-10 22:00:00 1.313010
2016-10-10 22:00:00 1.311910
2016-10-10 22:00:00 1.310810
2016-10-10 22:00:00 1.309710
2016-10-10 22:00:00 1.311310
2016-10-10 22:00:00 1.314910
2016-10-10 22:00:00 1.320210
2016-10-10 22:00:00 1.322577
2016-10-10 22:00:00 1.324943
2016-10-10 22:00:00 1.327310
2016-10-10 22:00:00 1.327310
2016-10-10 22:00:00 1.317010
2016-10-10 22:00:00 1.308310
公平警告,但是,当我尝试对您的数据进行 'quadratic'
和 'cubic'
插值时,出现 LinAlgError: singular matrix
错误。不知道为什么。
取以下代码:
import MySQLdb as mdb
import pandas as pd
con = mdb.connect(db_host, db_user, db_pass, db_name)
query = """SELECT `TIME`.`BID-CLOSE`
FROM `EUR-USD`.`tbl_EUR-USD_1-Day`
WHERE TIME >= '2006-12-15 22:00:00' AND TIME <= '2007-01-03 22:00:00'
ORDER BY TIME ASC;"""
# Create a pandas dataframe from the SQL query
eurusd = pd.read_sql_query(query, con=con, index_col='TIME')
idx = pd.date_range('2006-12-17 22:00:00', '2007-01-03 22:00:00')
eurusd.reindex(idx, fill_value=None)
这给出了
的输出 BID-CLOSE
2006-12-17 22:00:00 1.30971
2006-12-18 22:00:00 1.31971
2006-12-19 22:00:00 1.31721
2006-12-20 22:00:00 1.31771
2006-12-21 22:00:00 1.31411
2006-12-22 22:00:00 NaN
2006-12-23 22:00:00 NaN
2006-12-24 22:00:00 NaN
2006-12-25 22:00:00 1.30971
2006-12-26 22:00:00 1.31131
2006-12-27 22:00:00 1.31491
2006-12-28 22:00:00 1.32021
2006-12-29 22:00:00 NaN
2006-12-30 22:00:00 NaN
2006-12-31 22:00:00 1.32731
2007-01-01 22:00:00 1.32731
2007-01-02 22:00:00 1.31701
2007-01-03 22:00:00 1.30831
重新索引数据
eurusd = eurusd.reindex(idx, fill_value=None)
插值类型列表
methods = ['linear', 'quadratic', 'cubic']
下一行抛出异常...
pd.DataFrame({m: eurusd.interpolate(method=m) for m in methods})
ValueError: If using all scalar values, you must pass an index
遵循本指南的插值部分http://pandas.pydata.org/pandas-docs/stable/missing_data.html 在这种情况下我该如何正确 'pass an index'?
更新 1
eurusd.interpolate('linear')
BID-CLOSE
2006-12-17 22:00:00 1.309710
2006-12-18 22:00:00 1.319710
2006-12-19 22:00:00 1.317210
2006-12-20 22:00:00 1.317710
2006-12-21 22:00:00 1.314110
2006-12-22 22:00:00 1.313010
2006-12-23 22:00:00 1.311910
2006-12-24 22:00:00 1.310810
2006-12-25 22:00:00 1.309710
2006-12-26 22:00:00 1.311310
2006-12-27 22:00:00 1.314910
2006-12-28 22:00:00 1.320210
2006-12-29 22:00:00 1.322577
2006-12-30 22:00:00 1.324943
2006-12-31 22:00:00 1.327310
2007-01-01 22:00:00 1.327310
2007-01-02 22:00:00 1.317010
2007-01-03 22:00:00 1.308310
更新 2
In[9]: pd.DataFrame({m: eurusd['BID-CLOSE'].interpolate(method=m) for m in methods})
Out[9]:
cubic linear quadratic
2006-12-17 22:00:00 1.309710 1.309710 1.309710
2006-12-18 22:00:00 1.319710 1.319710 1.319710
2006-12-19 22:00:00 1.317210 1.317210 1.317210
2006-12-20 22:00:00 1.317710 1.317710 1.317710
2006-12-21 22:00:00 1.314110 1.314110 1.314110
2006-12-22 22:00:00 1.310762 1.313010 1.307947
2006-12-23 22:00:00 1.309191 1.311910 1.305159
2006-12-24 22:00:00 1.308980 1.310810 1.305747
2006-12-25 22:00:00 1.309710 1.309710 1.309710
2006-12-26 22:00:00 1.311310 1.311310 1.311310
2006-12-27 22:00:00 1.314910 1.314910 1.314910
2006-12-28 22:00:00 1.320210 1.320210 1.320210
2006-12-29 22:00:00 1.323674 1.322577 1.321632
2006-12-30 22:00:00 1.325553 1.324943 1.323998
2006-12-31 22:00:00 1.327310 1.327310 1.327310
2007-01-01 22:00:00 1.327310 1.327310 1.327310
2007-01-02 22:00:00 1.317010 1.317010 1.317010
2007-01-03 22:00:00 1.308310 1.308310 1.308310
问题是,当您使用 DataFrame
构造函数时:
pd.DataFrame({m: eurusd.interpolate(method=m) for m in methods})
每个 m
的值是一个 DataFrame
,它将被解释为标量值,这无疑是令人困惑的。此构造函数需要某种序列或 Series
。以下应该可以解决问题:
pd.DataFrame({m: eurusd['BID-CLOSE'].interpolate(method=m) for m in methods})
自从对列 returns 进行子集化后 Series
。因此,例如代替:
In [34]: pd.DataFrame({'linear':df.interpolate('linear')})
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-34-4b6c095c6da3> in <module>()
----> 1 pd.DataFrame({'linear':df.interpolate('linear')})
/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
222 dtype=dtype, copy=copy)
223 elif isinstance(data, dict):
--> 224 mgr = self._init_dict(data, index, columns, dtype=dtype)
225 elif isinstance(data, ma.MaskedArray):
226 import numpy.ma.mrecords as mrecords
/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in _init_dict(self, data, index, columns, dtype)
358 arrays = [data[k] for k in keys]
359
--> 360 return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
361
362 def _init_ndarray(self, values, index, columns, dtype=None, copy=False):
/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
5229 # figure out the index, if necessary
5230 if index is None:
-> 5231 index = extract_index(arrays)
5232 else:
5233 index = _ensure_index(index)
/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in extract_index(data)
5268
5269 if not indexes and not raw_lengths:
-> 5270 raise ValueError('If using all scalar values, you must pass'
5271 ' an index')
5272
ValueError: If using all scalar values, you must pass an index
改用这个:
In [35]: pd.DataFrame({'linear':df['BID-CLOSE'].interpolate('linear')})
Out[35]:
linear
timestamp
2016-10-10 22:00:00 1.309710
2016-10-10 22:00:00 1.319710
2016-10-10 22:00:00 1.317210
2016-10-10 22:00:00 1.317710
2016-10-10 22:00:00 1.314110
2016-10-10 22:00:00 1.313010
2016-10-10 22:00:00 1.311910
2016-10-10 22:00:00 1.310810
2016-10-10 22:00:00 1.309710
2016-10-10 22:00:00 1.311310
2016-10-10 22:00:00 1.314910
2016-10-10 22:00:00 1.320210
2016-10-10 22:00:00 1.322577
2016-10-10 22:00:00 1.324943
2016-10-10 22:00:00 1.327310
2016-10-10 22:00:00 1.327310
2016-10-10 22:00:00 1.317010
2016-10-10 22:00:00 1.308310
公平警告,但是,当我尝试对您的数据进行 'quadratic'
和 'cubic'
插值时,出现 LinAlgError: singular matrix
错误。不知道为什么。