Python pandas datareader no longer works for yahoo-finance (changed URL)

Since Yahoo discontinued their API support, pandas datareader now fails:

import pandas_datareader.data as web
import datetime
start = datetime.datetime(2016, 1, 1)
end = datetime.datetime(2017, 5, 17)
web.DataReader('GOOGL', 'yahoo', start, end)

HTTPError: HTTP Error 401: Unauthorized

Is there an unofficial library that lets us work around this for now? Maybe something on Quandl?

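So they changed the URL and now protect it with cookies (and possibly JavaScript), so I solved my own problem using dryscrape, which emulates a browser. This is only an FYI, since it almost certainly violates their terms and conditions now... so use at your own risk? I am looking at Quandl as an alternative EOD price source.

I could not get past the cookie protection with CookieJar, so I ended up using dryscrape to "fake" a user download: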

import dryscrape
from bs4 import BeautifulSoup
import time
import datetime
import re

#we visit the main page to initialise sessions and cookies
session = dryscrape.Session()
session.set_attribute('auto_load_images', False)
session.set_header('User-agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36')

#call this once as it is slow(er) and then you can do multiple download, though there seems to be a limit after which you have to reinitialise...
session.visit("https://finance.yahoo.com/quote/AAPL/history?p=AAPL")
response = session.body()


#get the download link
soup = BeautifulSoup(response, 'lxml')
for taga in soup.findAll('a'):
    if taga.has_attr('download'):
        url_download = taga['href']
print(url_download)

#now replace the default start date and end date that Yahoo provides
s = "2017-02-18"
period1 = '%.0f' % time.mktime(datetime.datetime.strptime(s, "%Y-%m-%d").timetuple())
e = "2017-05-18"
period2 = '%.0f' % time.mktime(datetime.datetime.strptime(e, "%Y-%m-%d").timetuple())

#now we replace the period1/period2 values in the download URL with our dates
#(re.sub only touches the digits after each parameter name, so an identical
#number elsewhere in the URL is left alone)
url_download = re.sub(r'period1=\d+', 'period1=' + period1, url_download)
url_download = re.sub(r'period2=\d+', 'period2=' + period2, url_download)

#and now visit the download URL and get the body, and you have your csv
session.visit(url_download)
csv_data = session.body()

#and finally if you want to get a dataframe from it
import sys
if sys.version_info[0] < 3: 
    from StringIO import StringIO
else:
    from io import StringIO

import pandas as pd
df = pd.read_csv(StringIO(csv_data), index_col=[0], parse_dates=True)
df
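
The date handling in the script above can be tried on its own, without dryscrape or a network connection. The following is only a minimal sketch: the URL is a made-up placeholder shaped like the scraped download link, the helper names `to_epoch` and `set_period` are hypothetical, and the dates are interpreted as UTC midnight so the result does not depend on the local time zone (the script above uses `time.mktime`, which uses local time).

```python
import re
from datetime import datetime, timezone


def to_epoch(date_str):
    """Convert a YYYY-MM-DD date to whole seconds since the Unix epoch (UTC midnight)."""
    parsed = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


def set_period(url, start, end):
    """Replace the period1/period2 query values in a download URL."""
    url = re.sub(r"period1=\d+", "period1=%d" % to_epoch(start), url)
    url = re.sub(r"period2=\d+", "period2=%d" % to_epoch(end), url)
    return url


# Placeholder URL (an assumption), shaped like the link scraped from the history page.
example = "https://example.com/download/AAPL?period1=111&period2=222&interval=1d&crumb=abc"
print(set_period(example, "2017-02-18", "2017-05-18"))
```

Only the two period values change; the rest of the query string, including whatever session token the real link carries, is passed through untouched.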

I found the "fix-yahoo-finance" workaround at https://pypi.python.org/pypi/fix-yahoo-finance useful, for example:

from pandas_datareader import data as pdr
import fix_yahoo_finance

data = pdr.get_data_yahoo('AAPL', start='2017-04-23', end='2017-05-24')

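Note that the last two data columns now come in the order 'Adj Close', 'Volume', i.e. not the previous format. To restore the old order:

# 'Date' is the index of the returned frame, so it is not listed as a column
cols = ['Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']
data = data.reindex(columns=cols)  # reindex returns a new frame, so assign it back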
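
To see what the column reordering does without downloading anything, here is a small sketch on a hand-built frame. The prices are made up for illustration, and the frame is assumed to be indexed by date as the downloaded data would be.

```python
import pandas as pd

# Made-up prices, only to show the column handling.
data = pd.DataFrame(
    {
        "Open": [1.0, 2.0],
        "High": [1.5, 2.5],
        "Low": [0.5, 1.5],
        "Close": [1.2, 2.2],
        "Adj Close": [1.1, 2.1],
        "Volume": [100, 200],
    },
    index=pd.to_datetime(["2017-05-01", "2017-05-02"]),
)
data.index.name = "Date"

cols = ["Open", "High", "Low", "Close", "Volume", "Adj Close"]
reordered = data.reindex(columns=cols)  # returns a new frame; data itself is unchanged
print(list(reordered.columns))
```

Because `reindex` does not modify the frame in place, the result has to be kept in a variable for the new order to have any effect.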

I switched from Yahoo to Google Finance, and it works for me. So I changed

data.DataReader(ticker, 'yahoo', start_date, end_date)

to

data.DataReader(ticker, 'google', start_date, end_date)

and adapted my "old" Yahoo! symbols from:

tickers = ['AAPL','MSFT','GE','IBM','AA','DAL','UAL', 'PEP', 'KO']

to:

tickers = ['NASDAQ:AAPL','NASDAQ:MSFT','NYSE:GE','NYSE:IBM','NYSE:AA','NYSE:DAL','NYSE:UAL', 'NYSE:PEP', 'NYSE:KO']
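
If the symbol list is long, the exchange prefix can be added from a lookup table rather than by hand. This is just a sketch: the `EXCHANGES` table and the `to_google_symbol` helper are hypothetical names, and the exchange assignments are copied from the two lists above.

```python
# Exchange for each symbol, taken from the two ticker lists above.
EXCHANGES = {
    "AAPL": "NASDAQ", "MSFT": "NASDAQ",
    "GE": "NYSE", "IBM": "NYSE", "AA": "NYSE", "DAL": "NYSE",
    "UAL": "NYSE", "PEP": "NYSE", "KO": "NYSE",
}


def to_google_symbol(ticker, exchanges=EXCHANGES):
    """Prefix a plain ticker with its exchange, e.g. 'AAPL' -> 'NASDAQ:AAPL'."""
    return "%s:%s" % (exchanges[ticker], ticker)


yahoo_tickers = ["AAPL", "MSFT", "GE", "IBM", "AA", "DAL", "UAL", "PEP", "KO"]
google_tickers = [to_google_symbol(t) for t in yahoo_tickers]
print(google_tickers)
```

A ticker missing from the table raises `KeyError`, which is probably preferable to silently requesting the wrong symbol.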

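Make the thread sleep after each read. That will probably work most of the time, so try 5-6 times and save the data to a CSV file so that next time you can read it from the file.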

import pandas_datareader as web
import time
import datetime as dt
import pandas as pd

# example date range (an assumption; the original snippet left these undefined)
startDate = dt.datetime(2016, 1, 1)
endDate = dt.datetime(2017, 5, 17)

symbols = ['AAPL', 'MSFT', 'AABA', 'DB', 'GLD']
webData = pd.DataFrame()
for stockSymbol in symbols:
    webData[stockSymbol] = web.DataReader(stockSymbol, data_source='yahoo',
                                          start=startDate, end=endDate,
                                          retry_count=10)['Adj Close']
    time.sleep(22)  # sleep for 22 seconds between requests
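
The "retry a few times, then cache to CSV" idea can be sketched independently of any data source. In this sketch `fetch_with_cache` is a hypothetical helper, and `fetch` stands for whatever function actually downloads the rows (for example a wrapper around the `DataReader` call above); neither is part of pandas-datareader.

```python
import csv
import os
import time


def fetch_with_cache(symbol, fetch, cache_dir, retries=5, pause=1.0):
    """Return rows for symbol, reading a CSV cache before calling fetch."""
    path = os.path.join(cache_dir, symbol + ".csv")
    if os.path.exists(path):
        # A previous run already saved this symbol: read it from disk.
        with open(path, newline="") as fh:
            return list(csv.reader(fh))
    last_error = None
    for _ in range(retries):
        try:
            rows = fetch(symbol)
            break
        except Exception as exc:
            last_error = exc
            time.sleep(pause)  # wait before the next attempt
    else:
        # Every attempt failed.
        raise RuntimeError("giving up on %s" % symbol) from last_error
    with open(path, "w", newline="") as fh:
        csv.writer(fh).writerows(rows)
    return rows
```

Rows read back from the cache are lists of strings, so numeric columns need converting again after loading.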

Try this:

import fix_yahoo_finance as yf
data = yf.download('SPY', start = '2012-01-01', end='2017-01-01')

Yahoo Finance works fine with pandas. Use it like this:

import pandas as pd
from pandas_datareader import data as wb

ticker = 'GOOGL'
start_date = '2019-1-1'
data_source = 'yahoo'

ticker_data = wb.DataReader(ticker, data_source=data_source, start=start_date)
df = pd.DataFrame(ticker_data)

The fix_yahoo_finance package has been renamed to yfinance, so you can try this code:

import yfinance as yf
data = yf.download('MSFT', start = '2012-01-01', end='2017-01-01')