Web scraping the data using R

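Goal: I am trying to scrape the historical daily stock prices for all companies from the webpage http://www.nepalstock.com/datanepse/previous.php. The following code runs; however, it always returns the daily stock prices for the most recent date (Feb 5, 2015) only. In other words, the output is the same regardless of the date I pass in. I would appreciate any help with this.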

    library(RHTMLForms)
    library(RCurl)
    library(XML)
    url <- "http://www.nepalstock.com/datanepse/previous.php"
    forms <- getHTMLFormDescription(url)

    # we are interested in the second list with date forms
    # forms[[2]]
    # HTML Form: http://www.nepalstock.com/datanepse/ 
    #   Date: [  ]

    get_stock<-createFunction(forms[[2]])

    # create a sequence of dates from start to end and store it as a list

    date_daily<-as.list(seq(as.Date("2011-08-24"), as.Date("2011-08-30"), "days"))

    # determine the number of elements in the list

    num<-length(date_daily)

    daily_1<-lapply(date_daily,function(x){
      show(x) #displays the particular date
      readHTMLTable(htmlParse(get_stock(Date = x)), which = 7)

    })


    # the page contains 18 tables; the 7th is the one we want

    # change the column names

    col_name<-c("SN","Traded_Companies","No_of_Transactions","Max_Price","Min_Price","Closing_Price","Total_Share","Amount","Previous_Closing","Difference_Rs.")
    daily_2<-lapply(daily_1,setNames,nm=col_name)

Output:
> head(daily_2[[1]],5)
 SN                                   Traded_Companies No_of_Transactions Max_Price Min_Price Closing_Price Total_Share    Amount
1  1                  Agricultural Development Bank Ltd                 24       489       471           473       2,868 1,359,038
2  2 Arun Valley Hydropower Development Company Limited                 40       365       360           362       8,844 3,199,605
3  3                    Alpine Development Bank Limited                 11       297       295           295         150    44,350
4  4                   Asian Life Insurance Co. Limited                 10     1,230     1,215         1,225         898 1,098,452
5  5                         Apex Development Bank Ltd.                 23       131       125           131       6,033   769,893
  Previous_Closing Difference_Rs.
1              480             -7
2              363             -1
3              303             -8
4            1,242            -17
5              132             -1
> tail(daily_2[[1]],5)
     SN                 Traded_Companies No_of_Transactions Max_Price Min_Price Closing_Price Total_Share    Amount Previous_Closing
140 140               United Finance Ltd                  4       255       242           242         464   115,128              255
141 141  United Insurance Co.(Nepal)Ltd.                  3       905       905           905         234   211,770              915
142 142         Vibor Bikas Bank Limited                  7       158       152           156         710   109,510              161
143 143 Western Development Bank Limited                 35       320       311           313       7,631 2,402,497              318
144 144    Yeti Development Bank Limited                 22       139       132           139      14,355 1,921,511              134
    Difference_Rs.
140            -13
141            -10
142             -5
143             -5
144              5

Here's a quick approach. Note that the site uses a POST request to send the date to the server.

library(rvest)
library(httr)

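# send the date as a POST form field, then parse the returned HTML
# (read_html() replaces html(), which is deprecated in newer rvest releases)
page <- "http://www.nepalstock.com/datanepse/previous.php" %>% 
  POST(body = list(Date = "2015-02-01")) %>% 
  read_html()

# pick the price table by its CSS class and convert it to a data frame
page %>%
  html_node(".dataTable") %>%
  html_table(header = TRUE)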
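
To cover the date range from the question, the same POST call can be wrapped in a function and applied to each date. This is only a rough sketch: `fetch_day` and `fetch_all` are hypothetical helper names, and it assumes the form field is still called `Date`, that it accepts `YYYY-MM-DD` strings, and that the table still carries the `.dataTable` class, as in the snippet above. The one-second pause between requests is an arbitrary choice.

```r
library(httr)
library(rvest)

url <- "http://www.nepalstock.com/datanepse/previous.php"

# Hypothetical helper: fetch the price table for a single date string.
# Assumes the server reads a POST field named "Date" ("YYYY-MM-DD").
fetch_day <- function(day) {
  resp <- POST(url, body = list(Date = day))
  stop_for_status(resp)  # stop early on an HTTP error
  resp %>%
    read_html() %>%
    html_node(".dataTable") %>%
    html_table(header = TRUE)
}

# Hypothetical helper: fetch every date, pausing between requests.
fetch_all <- function(days, pause = 1) {
  tables <- lapply(days, function(day) {
    Sys.sleep(pause)
    fetch_day(day)
  })
  setNames(tables, days)  # name each table by its date
}

# Same date range as in the question, formatted as character strings
days <- format(seq(as.Date("2011-08-24"), as.Date("2011-08-30"), by = "day"))
```

Calling `prices <- fetch_all(days)` should then return a list with one data frame per date, so `prices[["2011-08-24"]]` would hold that day's table. Nothing is requested from the site until `fetch_all` is actually called.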