Can I import historical market cap from a website to R?

Question

I tried using getHisMktCap to import historical market caps for stocks. Because this function requires numeric ticker symbols, it doesn't work for me.

I found a website that shows historical market caps for stocks, and I want to import that data into R.

https://stockrow.com/interactive_chart/54e53957-2723-4e78-8b73-1b922e9d3300

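As you can see, the data come from an interactive chart. I only need the market cap for a single day, 2015-10-30, but I have hundreds of tickers.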

I tried:

library(data.table)
mydat <- fread('https://stockrow.com/interactive_chart/54e53957-2723-4e78-8b73-1b922e9d3300')

but it doesn't import the data from the web. How can I do this?

Answer 1

According to the stockrow community, there's no available API:

Hi we currently don’t offer any APIs as our data provider doesn’t allow it. If you interested in fundamental data APIs, check Sharadar SF1 database on Quandl, it’s available for a very reasonable price.

If we visit the Quandl website, as suggested in that reply, we can see that they provide an R package dedicated to their API.

Answer 2

I'm late to the party, but here is a custom solution for your specific problem.

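This requires using RSelenium to fetch the page's HTML, which contains the values of the chart indicators and chart tickers that need to be fed to a JSON API (?). Then, with jsonlite and httr, you can build a POST request that fetches the data from that API URL in JSON format. Finally, the data can be formatted and plotted in R.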
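The POST body that srvisualize (below) sends is a JSON object with three fields: indicators, tickers and start_date. A minimal sketch of building it with jsonlite, assuming hypothetical placeholder indicator IDs and the ticker AAPL (the real values are scraped from the page). It shows one subtlety: by default jsonlite::toJSON() writes every vector, even a length-1 one, as a JSON array, so the original code sends start_date as a one-element array. Whether the stockrow endpoint expects an array or a scalar there is undocumented; jsonlite::unbox() is shown only as the way to send a scalar if needed.

```r
library(jsonlite)

# Hypothetical placeholder values; srvisualize scrapes the real ones from
# the "indicator-input" and "compare-input" elements of the chart page.
indicators <- c("indicator-id-1", "indicator-id-2")
tickers <- "AAPL"
start_date <- "1960-01-01T00:00:00.000+01:00"

# Default behaviour: every vector, including length-1 ones, becomes an array.
body_arrays <- toJSON(list(indicators = indicators,
                           tickers = tickers,
                           start_date = start_date))

# unbox() marks a single value to be written as a JSON scalar instead.
body_scalar_date <- toJSON(list(indicators = indicators,
                                tickers = tickers,
                                start_date = unbox(start_date)))

cat(body_arrays, "\n")
cat(body_scalar_date, "\n")
```

Note that tickers stays an array in both variants, which matches the chart's ability to compare several tickers.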

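The set of functions below does this for you (the last function in the code block depends on the helper functions before it). srvisualize is tailored to retrieving and plotting data from stockrow; all you need to provide is the URL of your stock chart. Besides plotting the data, it returns a list containing the raw formatted data (as a data.frame, for downstream wrangling), the plot object (for custom plotting), and the ID of the Docker container in which the Selenium browser that loads the URL is deployed (this container is stopped when the function finishes).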

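The prerequisite for using srvisualize is a Docker installation, because srvisualize pulls the Selenium browser image and deploys it as a Docker container. Note: if srvisualize dies or crashes, you must manually kill the Docker container it started, if it started one (its ID should have been printed to the R console).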
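Because an error anywhere in srvisualize skips the final stopseleniumdocker() call, the container can be left running. One way to guarantee cleanup is base R's on.exit(), registered right after the container starts. A minimal sketch with hypothetical names (with_cleanup, docker_short_id); start_fn and stop_fn stand in for startseleniumdocker() and stopseleniumdocker(). It also shows a simpler way to get the ID that `docker ps` lists: `docker run -d` prints the full container ID, and the short ID shown by `docker ps` is its first 12 characters, so substr() could replace the longest-common-substring helper.

```r
# Hypothetical helper: the short ID shown by `docker ps` is the first
# 12 characters of the full ID printed by `docker run -d`.
docker_short_id <- function(full_id) {
  substr(trimws(full_id), 1, 12)
}

# Hypothetical wrapper: acquire a resource, use it, and always release it.
# start_fn() and stop_fn(id) stand in for startseleniumdocker() and
# stopseleniumdocker(); body(id) stands in for the scraping and plotting.
with_cleanup <- function(start_fn, stop_fn, body) {
  id <- start_fn()
  # The handler runs when this function exits, whether it returns
  # normally or is unwound by an error.
  on.exit(stop_fn(id), add = TRUE)
  body(id)
}

# Example with stand-in functions (no Docker needed):
events <- character(0)
result <- with_cleanup(
  start_fn = function() docker_short_id(strrep("a1", 32)),
  stop_fn = function(id) events <<- c(events, paste("stopped", id)),
  body = function(id) paste("used", id)
)
```

Inside srvisualize, the same effect would come from calling on.exit(stopseleniumdocker(mydockid), add = TRUE) right after startseleniumdocker() and dropping the explicit call at the end.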

#AUXILIARY FUNCTIONS 1 & 2
#----

#Functions used to find the docker ID
#Courtesy 
longest_string <- function(s){return(s[which.max(nchar(s))])}

lcsbstr_no_lib <- function(a, b) { 
  
  matches <- gregexpr("M+", drop(attr(adist(a, b, counts = TRUE), "trafos")))[[1]];
  lengths<- attr(matches, 'match.length')
  which_longest <- which.max(lengths)
  index_longest <- matches[which_longest]
  length_longest <- lengths[which_longest]
  longest_cmn_sbstr  <- substring(longest_string(c(a, b)), index_longest , index_longest + length_longest - 1)
  return(longest_cmn_sbstr) 
  
}

#----


#AUXILIARY FUNCTIONS 3 & 4
#----

startseleniumdocker <- function(){
  #Loading a Selenium web browser via docker
  #system("docker pull selenium/standalone-chrome", wait = TRUE)
  #system("docker run -d -p 4445:4444 -p 5901:5900 selenium/standalone-chrome", wait = TRUE)
  cat("Getting Selenium browser docker!\n")
  system("docker pull selenium/standalone-chrome-debug", wait = TRUE)
  Sys.sleep(4)
  cat("Starting docker container!\n")
  mydocker <- system("docker run -d -p 4445:4444 -p 5901:5900 selenium/standalone-chrome-debug", 
                     wait = TRUE, intern = TRUE)
  Sys.sleep(4)
  dockers <- paste0(system("docker ps", wait = TRUE, intern = TRUE), collapse = " ")
  #Storing the docker ID for later--to close the docker container upon function completion
  mydockerid <- lcsbstr_no_lib(dockers, mydocker)
  
  return(mydockerid)
}

stopseleniumdocker <- function(mydockerid){
  
  cat("Closing Selenium browser contained in docker", mydockerid, "\n")
  system(paste0("docker stop ", mydockerid), wait = TRUE, intern = TRUE)
  #Check if docker has been closed properly
  dockers <- paste0(system("docker ps", wait = TRUE, intern = TRUE), collapse = " ")
  if(lcsbstr_no_lib(dockers, mydockerid) != mydockerid) cat("Docker closed succesfully.")
  
}

#----


#MAIN FUNCTION
#----

#Start docker container, fetch + plot data from Stockrow, stop docker container
srvisualize <- function(url = NULL){
  
  require(RSelenium) #For getting HTML data
  require(devtools) #RSelenium dependency
  require(stringi) #RSelenium dependency
  require(jsonlite) #For parsing JSON data
  require(httr) #For getting JSON data
  require(ggplot2) #For plotting
  require(magrittr) #For the %>% pipe used when plotting
  require(stringr)

  
  
  if(is.null(url)) stop("No URL provided!")
  #if(is.null(remDr)) stop("No Selenium remote driver provided!")
  
  #start docker
  mydockid <- startseleniumdocker()
  
  if(!is.null(mydockid)) cat("Selenium browser running from docker container", mydockid, "\nStarting remote driver!\n")
  
  #Starting remote driver
  remDr <- RSelenium::remoteDriver(port=4445L, browserName="chrome")
  Sys.sleep(10)
  #Opening the webpage
  remDr$open()
  
  if(!remDr$getStatus()$ready) stop("Something's wrong with Selenium, please check!")
  
  remDr$navigate(url)
  remDr$getCurrentUrl() #to check where we are
  cat("The current URL is: ", unlist(remDr$getCurrentUrl()), "\n")
  
  
  #Stockrow passes queries from the interactive_chart
  #to an internal API URL: https://stockrow.com/api/fundamentals.json
  #which returns the requested data (as a JSON)
  #There are only two things that define the request uniquely
  #Namely: the chart indicators and the tickers
  
  #So to get the interactive_chart data in R
  #We first need to scrape the chart indicators
  #and the chart tickers from the webpage
  
  #Once we have these
  #we can reconstruct the request ourselves
  #and pass it to fundamentals.json
  #to get our data
  
  
  #First get the hidden chart indicator string
  webElem <- remDr$findElements(using = "name", value = "indicator-input")
  #chartindicators <- webElem[[1]]$getElementAttribute("value")
  chart_indicators <- unlist(lapply(webElem, function(x){x$getElementAttribute("value")}))
  chart_indicators
  
  #Then get the set of tickers for the plot
  webElem <- remDr$findElements(using = "name", value = "compare-input")
  #charttickers <- unlist(webElem$getElementAttribute("value"))
  chart_tickers <- unlist(lapply(webElem, function(x){x$getElementAttribute("value")}))
  chart_tickers
  
  #Also set the start_date for the data
  chart_start_date <- "1960-01-01T00:00:00.000+01:00"
  
  #Put the indicators, tickers, and a start_date value
  #into a list that will then be converted into a JSON string
  #with jsonlite::toJSON()
  reqargs <- list(indicators = chart_indicators, 
                  tickers = chart_tickers, 
                  start_date = chart_start_date)
  
  #Request URL
  jsonurl <- "https://stockrow.com/api/fundamentals.json"
  
  cat("Fetching data.\n")
  
  #Make the request with httr::POST()
  #Notice the application/json Content-Type specified
  #in the header
  #The JSON string composed earlier is submitted as the body
  #of the request
  chartdat <- httr::POST(jsonurl,
                         httr::add_headers(
                           "Content-Type" = "application/json;charset=utf-8"
                         ),
                         body = jsonlite::toJSON(reqargs)
  )
  
  #Stop with an informative error unless the request succeeded
  #(otherwise parsing an error page would fail confusingly below)
  httr::stop_for_status(chartdat)
  cat("Data acquired!\n")
  
  
  #Get the contents
  chartdat <- httr::content(chartdat, as = "text")
  chartdat <- jsonlite::fromJSON(chartdat) 
  
  #Writing the data to a data.frame for plotting
  dat <- data.frame(name = c(), date = c(), value = c())
  
  for(i in seq_along(chartdat$series$name)){
    #i <- 1
    curdat <- as.data.frame(chartdat$series$data[i])
    names(curdat) <- c("date", "value")
    curdat$series <- rep_len(chartdat$series$name[i], nrow(curdat))
    #For some reason, the dates are off by 10 years.
    #So the chart_start_date value can't be used directly
    #to parse the datetime data in milliseconds to date-time
    #So a custom value is used here
    curdat$date <- as.POSIXct(curdat$date/1000, origin = "1969-12-31T00:00:00.000+01:00")
    
    dat <- rbind(dat, curdat)
    
  }
  
  
  
  #Plotting the data
  cat("Plotting data!\n")
  
  plotdat <- dat %>% 
    ggplot(aes(x = date, y = value/10^12, color = series)) + 
    geom_line() + 
    xlab("Date") + 
    ylab("Value (trillions)")
  
  print(plotdat)
  cat("Done!\n")
  
  stopseleniumdocker(mydockid)
  
  
  return(list(data = dat, plot = plotdat, docker_id = mydockid))
  
}

#----
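The loop in srvisualize that builds dat grows a data frame with rbind() on every iteration and converts timestamps with a hand-tuned origin. A sketch of an alternative, under two assumptions that stockrow does not document: that fromJSON() returns series as a data frame whose data column holds one two-column matrix (timestamp, value) per series, as the loop's as.data.frame() plus names(...) <- c("date", "value") implies, and that the timestamps are Unix-epoch milliseconds, in which case the standard 1970-01-01 origin in UTC applies directly. series_to_df is a hypothetical name.

```r
# Hypothetical replacement for the loop: convert every series at once and
# bind the pieces in a single rbind() call.
series_to_df <- function(series) {
  pieces <- lapply(seq_len(nrow(series)), function(i) {
    m <- series$data[[i]]  # assumed: column 1 = timestamp (ms), column 2 = value
    data.frame(
      # Assumption: timestamps are milliseconds since 1970-01-01 UTC
      date = as.POSIXct(m[, 1] / 1000, origin = "1970-01-01", tz = "UTC"),
      value = m[, 2],
      series = series$name[i],
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, pieces)
}

# Mock of the assumed structure (1446163200000 ms = 2015-10-30 00:00 UTC)
s <- data.frame(name = c("AAPL MarketCap", "MSFT MarketCap"), stringsAsFactors = FALSE)
s$data <- list(matrix(c(1446163200000, 1446249600000, 1, 2), ncol = 2),
               matrix(c(1446163200000, 3), ncol = 2))
df <- series_to_df(s)
print(df)
```

If the dates come out shifted, the timestamp unit or time zone assumption is the first thing to check against the chart on the website.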

Here are some examples of the function in action.

With your URL:

url1 <- "https://stockrow.com/interactive_chart/54e53957-2723-4e78-8b73-1b922e9d3300"
url1_dat <- srvisualize(url = url1)

A URL with two tickers:

url2 <- "https://stockrow.com/interactive_chart/0c3a40d2-ca06-4df1-9115-b45d2df2e5f5"
url2_dat <- srvisualize(url = url2)

A URL with two indicators and two tickers:

url3 <- "https://stockrow.com/interactive_chart/281a8ff5-b055-41d5-8b06-7b4b84f70210"
url3_dat <- srvisualize(url = url3)

One more example, a URL with random tickers and random indicators:

url4 <- "https://stockrow.com/interactive_chart/5e95b5a0-cc15-4620-b9bf-f0c4f7436490"
url4_dat <- srvisualize(url = url4)
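The question asks for a single day (2015-10-30) across many tickers. A sketch of a hypothetical helper that takes the data frame returned as the first element of srvisualize()'s result and keeps, for each series, the last observation on or before a given day. Using "on or before" rather than an exact match is a design choice based on an assumption worth checking: the series may not have a point on every calendar date. Dates are compared in UTC.

```r
# Hypothetical helper: latest value per series on or before `day`.
value_on_or_before <- function(dat, day) {
  day <- as.Date(day)
  d <- dat[as.Date(dat$date, tz = "UTC") <= day, ]
  # Sort by series, then date, so the last row of each series is the latest
  d <- d[order(d$series, d$date), ]
  d <- d[!duplicated(d$series, fromLast = TRUE), c("series", "date", "value")]
  rownames(d) <- NULL
  d
}

# Mock data: series A has points on 10-29, 10-30 and 10-31; B only on 10-29
dat <- data.frame(
  date = as.POSIXct(c(1446076800, 1446163200, 1446249600, 1446076800),
                    origin = "1970-01-01", tz = "UTC"),
  value = c(1, 2, 3, 10),
  series = c("A", "A", "A", "B"),
  stringsAsFactors = FALSE
)
out <- value_on_or_before(dat, "2015-10-30")
print(out)
```

For hundreds of tickers, one option is something like do.call(rbind, lapply(urls, function(u) value_on_or_before(srvisualize(u)[[1]], "2015-10-30"))), where urls is a hypothetical vector of chart URLs. Each srvisualize() call pulls and starts its own Docker container, so putting several tickers into one chart URL (as in url2) should be cheaper.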

Of course, there is still plenty of work to do to improve srvisualize's functionality and usability, but this is a start.

r

data-import