"ValueError: Index contains duplicate entries, cannot reshape" Pandas DataReader

Question

I can read historical data for the "AAPL" symbol from Yahoo:

dfcomp3 = web.DataReader(["AAPL"],'yahoo',start=start,end=end)['Adj Close']

I can read historical data for the "GE" symbol from Yahoo:

dfcomp3 = web.DataReader(["GE"],'yahoo',start=start,end=end)['Adj Close']

I can read historical data for the "BTC-USD" symbol from Yahoo:

dfcomp3 = web.DataReader(["BTC-USD"],'yahoo',start=start,end=end)['Adj Close']

I can read historical data for "AAPL" and "GE" together from Yahoo:

dfcomp7 = web.DataReader(["GE", "AAPL"],'yahoo',start=start,end=end)['Adj Close']

But I cannot read historical data for "AAPL" and "BTC-USD" together from Yahoo:

dfcomp7 = web.DataReader(["BTC-USD", "AAPL"],'yahoo',start=start,end=end)['Adj Close']

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-58-0cbbb3aa9346> in <module>()
----> 1 dfcomp7 = web.DataReader(["BTC-USD", "AAPL" ],'yahoo',start=start,end=end)['Adj Close']

7 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/reshape/reshape.py in _make_selectors(self)
    164 
    165         if mask.sum() < len(self.index):
--> 166             raise ValueError('Index contains duplicate entries, '
    167                              'cannot reshape')
    168 

ValueError: Index contains duplicate entries, cannot reshape

Why?

Answer 1

Go into debug mode and call value_counts() on self.index. That shows you which date and which symbol are causing the problem.
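Outside a debugger, the same check can be run on any frame before it is unstacked. The sketch below uses a small, made-up long-format frame indexed by (Date, Symbols) rather than the actual object pandas-datareader builds internally, and it counts each index pair with groupby(...).size() instead of value_counts(); the prices are hypothetical.

```python
import pandas as pd

# Hypothetical long-format data: one row per (Date, Symbols) pair,
# except that BTC-USD has two rows for 2019-12-04.
index = pd.MultiIndex.from_tuples(
    [
        (pd.Timestamp("2019-12-04"), "AAPL"),
        (pd.Timestamp("2019-12-04"), "BTC-USD"),
        (pd.Timestamp("2019-12-04"), "BTC-USD"),
        (pd.Timestamp("2019-12-05"), "AAPL"),
        (pd.Timestamp("2019-12-05"), "BTC-USD"),
    ],
    names=["Date", "Symbols"],
)
df = pd.DataFrame({"Adj Close": [65.5, 7200.0, 7210.0, 66.4, 7300.0]}, index=index)

# Count how often each (Date, Symbols) pair occurs; any count above 1 blocks unstack().
counts = df.groupby(level=["Date", "Symbols"]).size()
duplicates = counts[counts > 1]
print(duplicates)
```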

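Downloading BTC-USD on its own does not trigger this error. pandas-datareader unstacks the result so that each symbol becomes a column name, and with only one symbol that is not a problem. With several symbols, however, the duplicate entries make the unstack step fail.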
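To see this mechanism without calling Yahoo, here is a small reproduction in plain pandas. It assumes, as the answer describes, that the multi-symbol download ends with an unstack that turns the symbols into columns; the data is made up and `build_long_frame` is a hypothetical helper.

```python
import pandas as pd


def build_long_frame(rows):
    """Build a (Date, Symbols)-indexed frame from (date, symbol, price) tuples."""
    index = pd.MultiIndex.from_tuples(
        [(pd.Timestamp(d), s) for d, s, _ in rows], names=["Date", "Symbols"]
    )
    return pd.DataFrame({"Adj Close": [p for _, _, p in rows]}, index=index)


# Hypothetical prices; BTC-USD has two rows for the same day.
rows = [
    ("2019-12-04", "AAPL", 65.5),
    ("2019-12-04", "BTC-USD", 7200.0),
    ("2019-12-04", "BTC-USD", 7210.0),
    ("2019-12-05", "AAPL", 66.4),
    ("2019-12-05", "BTC-USD", 7300.0),
]
long_df = build_long_frame(rows)

error_message = None
try:
    # Moving the Symbols level into columns requires unique (Date, Symbols) pairs.
    long_df["Adj Close"].unstack("Symbols")
except ValueError as exc:
    error_message = str(exc)

# Dropping the repeated pairs (keeping the first row) lets the reshape succeed.
deduped = long_df[~long_df.index.duplicated(keep="first")]
wide = deduped["Adj Close"].unstack("Symbols")
print(error_message)
print(wide)
```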

I ran into the same problem with the symbols CBS, STI and VIAB, on December 4, 2019 and December 6, 2019.

Answer 2

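I realize this is an old question, but I ran into the same problem when downloading from Yahoo Finance. I believe this particular issue is specific to Yahoo, which for some reason sends more than one price for a single day. One suggestion involves reindexing, but because of the way DataReader converts the data into pandas, the DataFrame is never created in the first place, so there is nothing to reindex.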

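Here is my solution. I wrapped it in a try/except because I think the problem may be temporary (for example, Yahoo may fix the duplicates in the future), and because my code runs every day I want it to work whether or not the duplicates are present, rather than depend on whatever Yahoo happens to send. I'm using the major Russell indices as my sample. The code first tries the normal multi-symbol download; if that raises a ValueError, it loops over each symbol individually, drops any duplicate dates (keeping the first occurrence, which is the default), and joins the results into one DataFrame.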

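import datetime as dt

import pandas as pd
import pandas_datareader.data as web


def get_yahoo():
    start = dt.datetime(1995, 12, 31)
    end = dt.datetime.today()
    yh_fields = ['^RLG', '^RLV', '^RUO', '^RUJ']  # Russell index symbols
    try:
        # Normal path: download all symbols in one call
        yho = web.DataReader(yh_fields, 'yahoo', start, end)['Adj Close']
    except ValueError:
        # Fallback: download each symbol separately so duplicates can be removed
        yho = pd.DataFrame()
        for y in yh_fields:
            temp = web.DataReader(y, 'yahoo', start, end)['Adj Close']
            temp = temp.rename(y)
            # Drop repeated dates, keeping the first row for each
            temp = temp[~temp.index.duplicated(keep='first')]
            yho = yho.join(temp, how='outer')
    return yho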
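The fallback branch can be checked offline on in-memory data. This is a minimal sketch with hypothetical prices and a hypothetical helper, `merge_deduplicated`; it swaps the repeated `join(how='outer')` calls for a single `pd.concat(..., axis=1)`, which performs the same outer alignment on the date index.

```python
import pandas as pd


def merge_deduplicated(series_by_symbol):
    """Combine per-symbol price Series into one DataFrame, one column per symbol.

    Mirrors the except-branch above: rename each Series to its symbol,
    drop repeated dates (keeping the first), then align on the date index.
    """
    cleaned = []
    for symbol, series in series_by_symbol.items():
        series = series.rename(symbol)
        cleaned.append(series[~series.index.duplicated(keep="first")])
    # Outer alignment on the index; concat raises an error if it has to
    # reindex a Series whose index still contains duplicates.
    return pd.concat(cleaned, axis=1)


# Hypothetical data: BTC-USD repeats 2019-12-04 and also has a weekend row.
aapl = pd.Series([65.5, 66.4], index=pd.to_datetime(["2019-12-04", "2019-12-05"]))
btc = pd.Series(
    [7200.0, 7210.0, 7300.0, 7400.0],
    index=pd.to_datetime(["2019-12-04", "2019-12-04", "2019-12-05", "2019-12-07"]),
)
merged = merge_deduplicated({"AAPL": aapl, "BTC-USD": btc})
print(merged)
```

Dates that exist for only one symbol (such as the weekend row for BTC-USD) appear as NaN in the other columns, just as with an outer join.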

python

api

yahoo

pandas

pandas-datareader