Can I import historical market cap from a website to R?
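I tried using getHisMktCap to import historical market caps for my stocks. Because that function requires numeric ticker symbols, it does not work for me.
I found a website that shows a stock's historical market cap, and I would like to import it into R.
https://stockrow.com/interactive_chart/54e53957-2723-4e78-8b73-1b922e9d3300
As you can see, the data comes from a chart. I only need the market cap for a single day, 2015-10-30. I also have hundreds of tickers.
I tried: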
library(data.table)
mydat <- fread('https://stockrow.com/interactive_chart/54e53957-2723-4e78-8b73-1b922e9d3300')
But it does not import the data from the web. How can I do this?
According to stockrow's community forum, there is no API available:
Hi we currently don’t offer any APIs as our data provider doesn’t allow it. If you interested in fundamental data APIs, check Sharadar SF1 database on Quandl, it’s available for a very reasonable price.
If we follow the suggestion in that comment and visit the Quandl website, we can see that they provide a dedicated R package for their API.
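I'm late to the party, but here is a custom solution for your specific problem.
It requires RSelenium to fetch the HTML data, which contains the values of the chart indicators and the chart tickers. These values need to be fed to a JSON API (?). Then, using jsonlite and httr, you can formulate a POST query that fetches the data from that URL in JSON format. Finally, the data can be formatted and plotted in R.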
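To make the request-building step concrete, here is a minimal sketch of how the JSON body for such a POST query could be assembled with jsonlite. The indicator and ticker values are hypothetical placeholders (the real ones are scraped from the chart page), and no network call is made here.

```r
library(jsonlite)

# Hypothetical placeholder values; the real ones come from the chart page
chart_indicators <- c("example-indicator-id")
chart_tickers    <- c("AAPL", "MSFT")

reqargs <- list(
  indicators = chart_indicators,
  tickers    = chart_tickers,
  start_date = "1960-01-01T00:00:00.000+01:00"
)

# By default toJSON() writes every vector as a JSON array, so the
# length-one start_date also becomes a one-element array.
# Passing auto_unbox = TRUE would write it as a plain string instead.
body_json <- jsonlite::toJSON(reqargs)

# Round-trip the string to check that nothing was lost
parsed <- jsonlite::fromJSON(body_json)
print(body_json)
```

Whether the endpoint expects start_date as a string or as a one-element array is not something this sketch can confirm; it only shows what the default toJSON() call produces.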
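The set of functions below does this for you (the last function in the code block depends on the helper functions before it). srvisualize is tailored to retrieving and plotting data from stockrow. All you need to provide is the URL of your stock chart. Besides plotting the data, it also returns the raw formatted data (as a data.frame object, for downstream wrangling), the plot object (for custom plotting), and the ID of the Docker container (in which the Selenium browser is deployed to load the URL; this container is shut down when the function finishes).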
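The prerequisite for using srvisualize is a Docker installation, because srvisualize installs and deploys the Selenium browser as a Docker container. Note: if srvisualize dies or crashes, you must manually kill the Docker container it started, if it started one (the Docker ID should have been printed to the R console).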
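One way to reduce the need for manual cleanup is base R's on.exit(), which runs a cleanup expression when a function exits, whether normally or through an R error. The sketch below uses hypothetical stand-ins (start_resource, stop_resource) in place of the Docker helpers so that it runs without Docker. It does not help if the whole R session is killed.

```r
# Hypothetical stand-ins for the Docker start/stop helpers
cleanup_log <- character(0)
start_resource <- function() "container-123"
stop_resource <- function(id) {
  # Record which resource was cleaned up
  cleanup_log <<- c(cleanup_log, id)
  invisible(NULL)
}

with_resource <- function(fail = FALSE) {
  id <- start_resource()
  # Registered right after the resource exists, so it runs on a normal
  # return and also when an error unwinds the function
  on.exit(stop_resource(id), add = TRUE)
  if (fail) stop("simulated crash")
  "ok"
}

res_ok  <- with_resource()
res_err <- tryCatch(with_resource(fail = TRUE), error = function(e) "failed")
```

After both calls, cleanup_log holds the resource ID twice, once for the successful run and once for the failed one.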
#AUXILIARY FUNCTIONS 1 & 2
#----
#Functions used to find the docker ID
#Courtesy
longest_string <- function(s){return(s[which.max(nchar(s))])}
lcsbstr_no_lib <- function(a, b) {
matches <- gregexpr("M+", drop(attr(adist(a, b, counts = TRUE), "trafos")))[[1]];
lengths<- attr(matches, 'match.length')
which_longest <- which.max(lengths)
index_longest <- matches[which_longest]
length_longest <- lengths[which_longest]
longest_cmn_sbstr <- substring(longest_string(c(a, b)), index_longest , index_longest + length_longest - 1)
return(longest_cmn_sbstr)
}
#----
#AUXILIARY FUNCTIONS 3 & 4
#----
startseleniumdocker <- function(){
#Loading a Selenium web browser via docker
#system("docker pull selenium/standalone-chrome", wait = TRUE)
#system("docker run -d -p 4445:4444 -p 5901:5900 selenium/standalone-chrome", wait = TRUE)
cat("Getting Selenium browser docker!\n")
system("docker pull selenium/standalone-chrome-debug", wait = TRUE)
Sys.sleep(4)
cat("Starting docker container!\n")
mydocker <- system("docker run -d -p 4445:4444 -p 5901:5900 selenium/standalone-chrome-debug",
wait = TRUE, intern = TRUE)
Sys.sleep(4)
dockers <- paste0(system("docker ps", wait = TRUE, intern = TRUE), collapse = " ")
#Storing the docker ID for later--to close the docker container upon function completion
mydockerid <- lcsbstr_no_lib(dockers, mydocker)
return(mydockerid)
}
stopseleniumdocker <- function(mydockerid){
cat("Closing Selenium browser contained in docker", mydockerid, "\n")
system(paste0("docker stop ", mydockerid), wait = TRUE, intern = TRUE)
#Check if docker has been closed properly
dockers <- paste0(system("docker ps", wait = TRUE, intern = TRUE), collapse = " ")
if(lcsbstr_no_lib(dockers, mydockerid) != mydockerid) cat("Docker closed successfully.")
}
#----
#MAIN FUNCTION
#----
#Start docker container, fetch + plot data from Stockrow, stop docker container
srvisualize <- function(url = NULL){
require(RSelenium) #For getting HTML data
require(devtools) #RSelenium dependency
require(stringi) #RSelenium dependency
require(jsonlite) #For parsing JSON data
require(httr) #For getting JSON data
require(ggplot2) #For plotting
require(magrittr) #For plotting
require(stringr)
if(is.null(url)) stop("No URL provided!")
#if(is.null(remDr)) stop("No Selenium remote driver provided!")
#start docker
mydockid <- startseleniumdocker()
if(!is.null(mydockid)) cat("Selenium browser running from docker container", mydockid, "\nStarting remote driver!\n")
#Starting remote driver
remDr <- RSelenium::remoteDriver(port=4445L, browserName="chrome")
Sys.sleep(10)
#Opening the webpage
remDr$open()
if(!remDr$getStatus()$ready) stop("Something's wrong with Selenium, please check!")
remDr$navigate(url)
remDr$getCurrentUrl() #to check where we are
cat("The current URL is: ", unlist(remDr$getCurrentUrl()), "\n")
#Stockrow passes queries from the interactive_chart
#to an internal API URL: https://stockrow.com/api/fundamentals.json
#which returns the requested data (as a JSON)
#There are only two things that define the request uniquely
#Namely: the chart indicators and the tickers
#So to get the interactive_chart data in R
#We first need to scrape the chart indicators
#and the chart tickers from the webpage
#Once we have these
#we can reconstruct the request ourselves
#and pass it to fundamentals.json
#to get our data
#First get the hidden chart indicator string
webElem <- remDr$findElements(using = "name", value = "indicator-input")
#chartindicators <- webElem[[1]]$getElementAttribute("value")
chart_indicators <- unlist(lapply(webElem, function(x){x$getElementAttribute("value")}))
chart_indicators
#Then get the set of tickers for the plot
webElem <- remDr$findElements(using = "name", value = "compare-input")
#charttickers <- unlist(webElem$getElementAttribute("value"))
chart_tickers <- unlist(lapply(webElem, function(x){x$getElementAttribute("value")}))
chart_tickers
#Also set the start_date for the data
chart_start_date <- "1960-01-01T00:00:00.000+01:00"
#Put the indicators, tickers, and a start_date value
#into a list that will then be converted into a JSON string
#with jsonlite::toJSON()
reqargs <- list(indicators = chart_indicators,
tickers = chart_tickers,
start_date = chart_start_date)
#Request URL
jsonurl <- "https://stockrow.com/api/fundamentals.json"
cat("Fetching data.\n")
#Make the request with httr::POST()
#Notice the application/json Content-Type specified
#in the header
#The JSON string composed earlier is submitted as the body
#of the request
chartdat <- httr::POST(jsonurl,
httr::add_headers(
"Content-Type" = "application/json;charset=utf-8"
),
body = jsonlite::toJSON(reqargs)
)
#Check if the request was successful
#i.e., status code 200
if(httr::status_code(chartdat) == 200) cat("Data acquired!\n")
#Get the contents
chartdat <- httr::content(chartdat, as = "text")
chartdat <- jsonlite::fromJSON(chartdat)
#Writing the data to a data.frame for plotting
dat <- data.frame(name = c(), date = c(), value = c())
for(i in seq_along(chartdat$series$name)){
#i <- 1
curdat <- as.data.frame(chartdat$series$data[i])
names(curdat) <- c("date", "value")
curdat$series <- rep_len(chartdat$series$name[i], nrow(curdat))
#For some reason, the dates are off by 10 years.
#So the chart_start_date value can't be used directly
#to parse the datetime data in milliseconds to date-time
#So a custom value is used here
curdat$date <- as.POSIXct(curdat$date/1000, origin = "1969-12-31T00:00:00.000+01:00")
dat <- rbind(dat, curdat)
}
#Plotting the data
cat("Plotting data!\n")
plotdat <- dat %>%
ggplot(aes(x = date, y = value/10^12, color = series)) +
geom_line() +
xlab("Date") +
ylab("Cash (trillion USD)")
print(plotdat)
cat("Done!\n")
stopseleniumdocker(mydockid)
return(list(dat, plotdat, mydockid))
}
#----
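For reference, the loop above turns millisecond timestamps into date-times using a custom origin. The sketch below shows the standard conversion under the assumption that the values are Unix epoch milliseconds in UTC. Whether stockrow's timestamps follow that convention is not verified here, so compare a few converted dates against the chart before relying on either origin.

```r
# Assumption: timestamps are milliseconds since 1970-01-01 00:00:00 UTC
ms <- c(0, 1446163200000)

# Divide by 1000 to get seconds, then convert with an explicit time zone
dates <- as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC")
days  <- format(dates, "%Y-%m-%d")
print(days)
```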
Here are some examples of the function in action:
With your URL:
url1 <- "https://stockrow.com/interactive_chart/54e53957-2723-4e78-8b73-1b922e9d3300"
url1_dat <- srvisualize(url = url1)
A URL with two tickers:
url2 <- "https://stockrow.com/interactive_chart/0c3a40d2-ca06-4df1-9115-b45d2df2e5f5"
url2_dat <- srvisualize(url = url2)
A URL with two indicators and two tickers:
url3 <- "https://stockrow.com/interactive_chart/281a8ff5-b055-41d5-8b06-7b4b84f70210"
url3_dat <- srvisualize(url = url3)
One more example, a URL with random tickers and random indicators:
url4 <- "https://stockrow.com/interactive_chart/5e95b5a0-cc15-4620-b9bf-f0c4f7436490"
url4_dat <- srvisualize(url = url4)
Of course, there is still plenty of work to do to extend and polish the functionality and usability of srvisualize, but this is a start.
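Since the question asks for the market cap on a single day (2015-10-30), here is a minimal sketch of how the returned data could be filtered down to that date. It uses a toy data.frame with the same columns as the first element of the list returned by srvisualize (date, value, series). The helper market_cap_on and the ticker names are hypothetical.

```r
# Toy data shaped like the first element returned by srvisualize()
dat <- data.frame(
  date   = as.POSIXct(c("2015-10-29", "2015-10-30", "2015-10-30"), tz = "UTC"),
  value  = c(1.0e11, 1.1e11, 2.0e11),
  series = c("AAA", "AAA", "BBB"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only the rows that fall on the requested day
market_cap_on <- function(dat, day) {
  keep <- as.Date(dat$date, tz = "UTC") == as.Date(day)
  dat[keep, c("series", "date", "value")]
}

one_day <- market_cap_on(dat, "2015-10-30")
print(one_day)
```

With real output the call would look like market_cap_on(url1_dat[[1]], "2015-10-30"), because srvisualize returns an unnamed list. Results from several charts could then be stacked with do.call(rbind, ...).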