Scraping financial data with R and rvest
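I am trying to get financial data from morningstar.com; for example, I would like MSFT's yearly revenue data.
The figures sit in a table built from rows of <div> elements inside the main <div>.
Following some examples, I tried to grab the main table: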
library(rvest)

url <- "http://financials.morningstar.com/income-statement/is.html?t=MSFT&region=usa&culture=en-US"
table <- url %>%
  read_html() %>%
  html_nodes(xpath='//*[@id="sfcontent"]/div[3]/div[3]') %>%
  html_table()
But I get an empty list(). html_nodes itself returns {xml_nodeset (0)}, and I don't know how to deal with that.
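A likely explanation, stated here as an assumption rather than something confirmed for this page: the table is filled in by JavaScript after the page loads, so the HTML that read_html() downloads has no nodes at that XPath. The sketch below reproduces the symptom offline with a made-up HTML string (the markup is hypothetical, not Morningstar's real page):

```r
library(rvest)

# Hypothetical static HTML: the container exists, but the rows a browser
# would add later via JavaScript are absent from the downloaded markup.
static_page <- read_html(
  '<html><body><div id="sfcontent"><div></div></div></body></html>'
)

# Same XPath as in the question; nothing in the static markup matches it.
nodes <- static_page %>%
  html_nodes(xpath = '//*[@id="sfcontent"]/div[3]/div[3]')

n_matches <- length(nodes)   # number of matched nodes
tables <- html_table(nodes)  # html_table() maps over the (empty) nodeset
```

With no matching nodes, html_table() has nothing to convert, which is consistent with the empty list() seen above.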
read.csv("http://financials.morningstar.com/ajax/ReportProcess4CSV.html?&t=XNAS:MSFT&region=usa&culture=en-US&cur=&reportType=is&period=12&dataType=A&order=asc&columnYear=5&curYearPart=1st5year&rounding=3&view=raw&r=865827&denominatorView=raw&number=3", skip=1)
Fiscal.year.ends.in.June..USD.in.millions.except.per.share.data. X2011.06 X2012.06 X2013.06 X2014.06 X2015.06 TTM
1 Revenue 69943.00 73723.00 77849.00 86833.00 93580.00 90758.00
2 Cost of revenue 15577.00 17530.00 20249.00 26934.00 33038.00 31972.00
3 Gross profit 54366.00 56193.00 57600.00 59899.00 60542.00 58786.00
4 Operating expenses NA NA NA NA NA NA
5 Research and development 9043.00 9811.00 10411.00 11381.00 12046.00 11943.00
6 Sales, General and administrative 18162.00 18426.00 20425.00 20632.00 20324.00 19862.00
7 Restructuring, merger and acquisition NA NA NA 127.00 NA NA
8 Other operating expenses NA 6193.00 NA NA 10011.00 8871.00
9 Total operating expenses 27205.00 34430.00 30836.00 32140.00 42381.00 40676.00
10 Operating income 27161.00 21763.00 26764.00 27759.00 18161.00 18110.00
11 Interest Expense 295.00 380.00 429.00 597.00 781.00 869.00
12 Other income (expense) 1205.00 884.00 717.00 658.00 1127.00 883.00
13 Income before taxes 28071.00 22267.00 27052.00 27820.00 18507.00 18124.00
14 Provision for income taxes 4921.00 5289.00 5189.00 5746.00 6314.00 5851.00
15 Net income from continuing operations 23150.00 16978.00 21863.00 22074.00 12193.00 12273.00
16 Net income 23150.00 16978.00 21863.00 22074.00 12193.00 12273.00
17 Net income available to common shareholders 23150.00 16978.00 21863.00 22074.00 12193.00 12273.00
18 Earnings per share NA NA NA NA NA NA
19 Basic 2.73 2.02 2.61 2.66 1.49 1.51
20 Diluted 2.69 2.00 2.58 2.63 1.48 1.50
21 Weighted average shares outstanding NA NA NA NA NA NA
22 Basic 8490.00 8396.00 8375.00 8299.00 8177.00 8114.00
23 Diluted 8593.00 8506.00 8470.00 8399.00 8254.00 8183.00
24 EBITDA 31132.00 25614.00 31236.00 33629.00 25245.00 24983.00
Making the "Network" tab of your browser's Developer Tools your friend is super-helpful.
(The URL came from inspecting what the "Export" button does.)
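To show what skip=1 does and how to pull out the revenue row, here is a small offline sketch. The header and the two data rows are reconstructed from the output above; the first line of csv_text is a hypothetical stand-in for whatever title line the real export begins with, and the column name "item" is my own choice.

```r
# Two-row excerpt shaped like the exported CSV. The first line is a
# HYPOTHETICAL title line; skip = 1 drops it so the real header is used.
csv_text <- paste(
  "HYPOTHETICAL TITLE LINE",
  "Fiscal year ends in June. USD in millions except per share data.,2011-06,2012-06,2013-06,2014-06,2015-06,TTM",
  "Revenue,69943,73723,77849,86833,93580,90758",
  "Cost of revenue,15577,17530,20249,26934,33038,31972",
  sep = "\n"
)

income <- read.csv(text = csv_text, skip = 1, stringsAsFactors = FALSE)

# The first column name is unwieldy; give it a short name (an assumption).
names(income)[1] <- "item"

# Keep only the Revenue row and drop the label column.
revenue <- income[income$item == "Revenue", -1]
```

read.csv() turns the period labels into syntactic names, so "2011-06" becomes X2011.06, matching the column names in the output above.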
Stefano, you may find this very useful.
library(quantmod)
setwd("C:/Users/your_path_here/")
stocks <- c("AXP","BA","CAT","CSCO","CVX","DD","DIS","GE","GS","HD","IBM","INTC","JNJ","JPM","KO","MCD","MMM","MRK","MSFT","NKE","PFE","PG","T","TRV","UNH","UTX","V","VZ","WMT","XOM")
# Alternatively, read the tickers from a file:
# equityList <- read.csv("EquityList.csv", header = FALSE, stringsAsFactors = FALSE)
# names(equityList) <- c("Ticker")
# Note: src = "google" depends on Google Finance, so this call may fail with current quantmod releases.
for (ticker in stocks) {
  temp <- getFinancials(ticker, src = "google", auto.assign = FALSE)
  # Annual statements
  write.csv(temp$IS$A, paste0(ticker, "_Income_Statement(Annual).csv"))
  write.csv(temp$BS$A, paste0(ticker, "_Balance_Sheet(Annual).csv"))
  write.csv(temp$CF$A, paste0(ticker, "_Cash_Flow(Annual).csv"))
  # Quarterly statements (the $Q element, not $A)
  write.csv(temp$IS$Q, paste0(ticker, "_Income_Statement(Quarterly).csv"))
  write.csv(temp$BS$Q, paste0(ticker, "_Balance_Sheet(Quarterly).csv"))
  write.csv(temp$CF$Q, paste0(ticker, "_Cash_Flow(Quarterly).csv"))
}
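Six near-identical write.csv() lines make it easy to pair a file name with the wrong statement or period. One way around that, sketched here with a made-up list standing in for the getFinancials() result (fake_fin, write_statements and the lookup vectors are all hypothetical names), is to loop over the statement and period codes:

```r
# Hypothetical stand-in for a getFinancials() result: IS/BS/CF entries,
# each holding an annual (A) and a quarterly (Q) table.
fake_fin <- list(
  IS = list(A = matrix(1:4, 2), Q = matrix(5:8, 2)),
  BS = list(A = matrix(1:4, 2), Q = matrix(5:8, 2)),
  CF = list(A = matrix(1:4, 2), Q = matrix(5:8, 2))
)

statement_names <- c(IS = "Income_Statement", BS = "Balance_Sheet", CF = "Cash_Flow")
period_names <- c(A = "Annual", Q = "Quarterly")

# Write one CSV per statement/period pair and return the paths written.
write_statements <- function(fin, ticker, out_dir) {
  written <- character(0)
  for (s in names(statement_names)) {
    for (p in names(period_names)) {
      file_name <- paste0(ticker, "_", statement_names[[s]],
                          "(", period_names[[p]], ").csv")
      path <- file.path(out_dir, file_name)
      write.csv(fin[[s]][[p]], path)
      written <- c(written, path)
    }
  }
  written
}

# Write into a scratch directory so nothing lands in the working directory.
out_dir <- file.path(tempdir(), "statements")
dir.create(out_dir, showWarnings = FALSE)
files <- write_statements(fake_fin, "MSFT", out_dir)
```

Because the file name and the list element come from the same pair of codes, the two cannot drift apart.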