R: getting google finance JSON data into a dataframe
I'm trying to get Google Finance JSON data into a data frame.
I tried:
library(jsonlite)
dat1 <- fromJSON("http://www.google.com/finance/info?q=NSE:%20AAPL,MSFT,TSLA,AMZN,IBM")
dat1
But I get an error:
Error in feed_push_parser(readBin(con, raw(), n), reset = TRUE) :
parse error: trailing garbage
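Thanks for your help.
I got the following code from here. Let me know if it helps. I would also recommend Netfonds: it is the only source I have found that provides historical intraday tick-level data for both price and open interest. If you are interested, I've posted some additional links below on pulling Netfonds data.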
http://www.blackarbs.com/blog/3/22/2015/how-to-get-free-intraday-stock-data-from-netfonds
http://www.onestepremoved.com/free-stock-data/
""" googlefinance
This module provides a Python API for retrieving stock data from Google Finance.
"""
import urllib.request
from datetime import date

_month_dict = {
    'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
    'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}

# Google doesn't like Python's user agent, so send a Firefox one instead.
_USER_AGENT = 'Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11'


def __request(symbol):
    url = 'http://google.com/finance/historical?q=%s&output=csv' % symbol
    req = urllib.request.Request(url, headers={'User-Agent': _USER_AGENT})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode('utf-8').strip().strip('"')


def get_historical_prices(symbol, start_date=None, end_date=None):
    """
    Get historical prices for the given ticker symbol.
    Returns a nested list. Fields are Date, Open, High, Low, Close, Volume.
    Note: start_date and end_date are accepted but currently ignored.
    """
    price_data = [data.split(',') for data in __request(symbol).split('\n')[1:]]
    for quote in price_data:
        quote[0] = _format_date(quote[0])
    return price_data


def _format_date(datestr):
    """Change datestr from Google format ('20-Jul-12') to the format Yahoo uses ('2012-07-20')."""
    parts = datestr.split('-')
    day = int(parts[0])
    month = _month_dict[parts[1]]
    year = int('20' + parts[2])
    return date(year, month, day).strftime('%Y-%m-%d')
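Since the live endpoint may no longer respond, the parsing half of `get_historical_prices` can be tried offline. The sketch below is a hypothetical helper, `parse_historical_csv`, that assumes the body is a CSV with a header row and dates like `20-Jul-12`. It swaps the manual `split(',')` and month table for the standard `csv` module and `datetime.strptime`. The sample rows are made up for illustration.

```python
import csv
import io
from datetime import datetime


def parse_historical_csv(text):
    """Parse historical-price CSV text into [date, open, high, low, close, volume] rows.

    Hypothetical helper: assumes a header row followed by rows whose first
    field is a date such as '20-Jul-12'.
    """
    reader = csv.reader(io.StringIO(text.strip()))
    next(reader, None)  # skip the header row (Date,Open,High,Low,Close,Volume)
    rows = []
    for fields in reader:
        if not fields:
            continue  # ignore blank lines
        # '%d-%b-%y' matches '20-Jul-12'; reformat as ISO '2012-07-20'
        fields[0] = datetime.strptime(fields[0], '%d-%b-%y').strftime('%Y-%m-%d')
        rows.append(fields)
    return rows


# Made-up sample data, not real quotes
sample = """Date,Open,High,Low,Close,Volume
20-Jul-12,100.0,101.5,99.2,100.8,123456
19-Jul-12,98.7,100.1,98.0,99.9,234567"""

for row in parse_historical_csv(sample):
    print(row)
```

One difference from `_format_date`: `%y` maps two-digit years 69 to 99 onto 1969 to 1999, whereas `'20' + parts[2]` always assumes the 2000s.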
Because of a proxy issue on my side I couldn't reproduce your error with fromJSON, but the following works using httr:
library(jsonlite)
library(httr)
# Set your proxy settings if needed
#set_config(use_proxy(url='hostname',port= port,username="",password=""))
url.name = "http://www.google.com/finance/info?q=NSE:%20AAPL,MSFT,TSLA,AMZN,IBM"
url.get = GET(url.name)
# Parsing the content directly as JSON gives an error similar to yours:
#url.content = content(url.get,type="application/json")
#Error in parseJSON(txt) : parse error: trailing garbage
# " : "0.57" ,"yld" : "2.46" } ,{ "id": "358464" ,"t" : "MSFT"
# (right here) ------^
# Read the content as plain text instead
url.content = content(url.get, as="text", encoding="UTF-8")
# Remove HTML tags
clean.text = gsub("<.*?>", "", url.content)
# Remove newlines and the "//" prefix
clean.text = gsub("\n|//", "", clean.text)
DF = fromJSON(clean.text)
head(DF[,1:10],5)
#        id    t      e      l  l_fix  l_cur s        ltt                 lt               lt_dts
#1    22144 AAPL NASDAQ  92.51  92.51  92.51 1 4:00PM EDT May 11, 4:00PM EDT 2016-05-11T16:00:02Z
#2   358464 MSFT NASDAQ  51.05  51.05  51.05 1 4:00PM EDT May 11, 4:00PM EDT 2016-05-11T16:00:02Z
#3 12607212 TSLA NASDAQ 208.96 208.96 208.96 1 4:00PM EDT May 11, 4:00PM EDT 2016-05-11T16:00:02Z
#4   660463 AMZN NASDAQ 713.23 713.23 713.23 1 4:00PM EDT May 11, 4:00PM EDT 2016-05-11T16:00:02Z
#5    18241  IBM   NYSE 148.95 148.95 148.95 2 6:59PM EDT May 11, 6:59PM EDT 2016-05-11T18:59:12Z
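The same cleanup can be sketched in Python with only the standard library. It assumes what the `gsub` call above implies: the raw body is a JSON array preceded by a `//` marker and newlines. `clean_google_json` is a hypothetical helper name. The sample body is hand-built from values in the output above; it was not captured from the live endpoint.

```python
import json


def clean_google_json(raw):
    """Strip a leading '//' marker and surrounding whitespace, then parse the JSON array.

    Assumption: the body is a '//' marker followed by a JSON array of quote objects.
    """
    text = raw.strip()
    if text.startswith('//'):
        text = text[2:]  # drop only the leading '//' prefix
    return json.loads(text)  # newlines inside the array are valid JSON whitespace


raw_body = ('\n// [\n{ "id": "22144", "t": "AAPL", "e": "NASDAQ", "l": "92.51" }\n'
            ',{ "id": "358464", "t": "MSFT", "e": "NASDAQ", "l": "51.05" }\n]')
quotes = clean_google_json(raw_body)
for q in quotes:
    print(q["t"], q["l"])
```

This version removes only the leading marker, not every `//` and newline. That keeps any string values that happen to contain `//`, such as URLs, intact.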
If the Google Finance endpoint returned newline-delimited JSON, the solution in R would be:
library(jsonlite)
dat1 <- stream_in(url("http://www.google.com/finance/info?q=NSE:%20AAPL,MSFT,TSLA,AMZN,IBM"))
But the endpoint doesn't seem to accept this kind of request (anymore?):
HTTP status was '403 Forbidden'
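`stream_in` expects newline-delimited JSON, meaning one complete JSON value per line. The Python sketch below illustrates that format; it is not a workaround for the 403. It uses a hypothetical `read_ndjson` helper and made-up sample text. It also shows one way a "trailing garbage" style error can arise: a parser expecting a single JSON document rejects the content after the first value. Python's `json.loads` reports this as "Extra data", and the yajl parser used by jsonlite reports it as "trailing garbage".

```python
import io
import json


def read_ndjson(stream):
    """Parse newline-delimited JSON: one complete JSON object per line."""
    records = []
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines
            records.append(json.loads(line))
    return records


ndjson_text = '{"t": "AAPL", "l": "92.51"}\n{"t": "MSFT", "l": "51.05"}\n'
rows = read_ndjson(io.StringIO(ndjson_text))
print(rows)

# Parsing the whole text as a single JSON document fails after the first object
try:
    json.loads(ndjson_text)
except json.JSONDecodeError as err:
    print("whole-text parse fails:", err.msg)
```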