Error/exception handling with bind_rows() and lapply() functions

I have a function that scrapes a table from each URL in a list:

getscore <- function(www0) {

    require(rvest)
    require(dplyr)

    www <- read_html(www0)  # read_html() replaces html(), which newer rvest versions deprecate

    boxscore <- www %>% html_table(fill = TRUE) %>% .[[1]]
    names(boxscore)[3] <- "VG"
    names(boxscore)[5] <- "HG"
    names(boxscore)[6] <- "Type"

    return(boxscore)
}

Example data that works:

www_list <- c("http://www.hockey-reference.com/boxscores/2014/12/20/",
              "http://www.hockey-reference.com/boxscores/2014/12/21/",
              "http://www.hockey-reference.com/boxscores/2014/12/22/")

nhl14_15 <- bind_rows(lapply(www_list, getscore))

However, URLs for dates on which no games were played break my function:

www_list <- c("http://www.hockey-reference.com/boxscores/2014/12/22/",
              "http://www.hockey-reference.com/boxscores/2014/12/23/",
              "http://www.hockey-reference.com/boxscores/2014/12/24/",
              "http://www.hockey-reference.com/boxscores/2014/12/25/")

nhl14_15 <- bind_rows(lapply(www_list, getscore))

How can I build error/exception handling into my function so that it skips the URLs that break it?


The code should be reproducible...

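The table you get when there are no games has a completely different structure. You can check whether colnames(boxscore) is what you expect. As an example, here is an adaptation of your function that checks whether the column Visitor is present.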

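getscore <- function(www0) {

  require(rvest)
  require(dplyr)

  # read_html() replaces html(), which newer rvest versions deprecate
  www <- read_html(www0)

  boxscore <- www %>% html_table(fill = TRUE) %>% .[[1]]

  # Only rename and return the table when it has the expected structure
  if ("Visitor" %in% colnames(boxscore)) {
    names(boxscore)[3] <- "VG"
    names(boxscore)[5] <- "HG"
    names(boxscore)[6] <- "Type"

    return(boxscore)
  }

  # No games on this date: return NULL, which bind_rows() ignores
  NULL
}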

With this function, your example no longer breaks:

www_list <- c("http://www.hockey-reference.com/boxscores/2014/12/22/",
              "http://www.hockey-reference.com/boxscores/2014/12/23/",
              "http://www.hockey-reference.com/boxscores/2014/12/24/",
              "http://www.hockey-reference.com/boxscores/2014/12/25/")

nhl14_15 <- bind_rows(lapply(www_list, getscore))
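
The column check covers pages whose table has a different shape. For failures that actually raise an error, a tryCatch() wrapper is one possible complement. The following is only a minimal sketch: safely_scrape and fake_getscore are hypothetical names, and fake_getscore is a stand-in that fails for one URL instead of downloading anything, so the example runs offline. Base R is used to stack the results; with dplyr loaded, bind_rows(results) would serve the same purpose, since it ignores NULL elements.

```r
# Hypothetical wrapper: run `scraper` on one URL and return NULL on any error.
safely_scrape <- function(url, scraper) {
  tryCatch(
    scraper(url),
    error = function(e) {
      message("Skipping ", url, ": ", conditionMessage(e))
      NULL
    }
  )
}

# Stand-in for getscore(), for illustration only: it fails for the 12/25 URL
# instead of downloading anything.
fake_getscore <- function(url) {
  if (grepl("/12/25/$", url)) stop("no box score table found")
  data.frame(Date = "2014-12-22", VG = 2L, HG = 3L, stringsAsFactors = FALSE)
}

www_list <- c("http://www.hockey-reference.com/boxscores/2014/12/22/",
              "http://www.hockey-reference.com/boxscores/2014/12/25/")

results <- lapply(www_list, safely_scrape, scraper = fake_getscore)

# Drop the NULL entries left by failed URLs, then stack the remaining tables.
scores <- do.call(rbind, Filter(Negate(is.null), results))
```

In real use you would pass getscore as the scraper argument. Returning NULL rather than an empty data frame keeps the failed URLs out of the combined result without affecting its column types.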

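A nice approach here is to use rbindlist from the data.table package, which lets you set fill=TRUE, so you can bind everything even in cases where bind_rows does not work. You can then keep only the rows with a non-NA Date (which drops the pages that bind_rows fails on) and restrict the result to the first 6 columns, which I'm guessing is the valid data you are looking for.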

library(data.table) # development version 1.9.5
www_list <- c("http://www.hockey-reference.com/boxscores/2014/12/20/",
              "http://www.hockey-reference.com/boxscores/2014/12/21/",
              "http://www.hockey-reference.com/boxscores/2014/12/22/",
              "http://www.hockey-reference.com/boxscores/2014/12/24/") # not working
resdt <- rbindlist(
    lapply(www_list, function(www0) {
        message("web is ", www0) # comment this out if you don't want the message to appear
        getscore(www0)
    }),
    fill = TRUE)
resdt[!is.na(Date), 1:6, with = FALSE] # the first 6 columns hold the valid data

          Date             Visitor VG                  Home HG Type
 1: 2014-12-20  Colorado Avalanche  5        Buffalo Sabres  1     
 2: 2014-12-20    New York Rangers  3   Carolina Hurricanes  2   SO
 3: 2014-12-20  Chicago Blackhawks  2 Columbus Blue Jackets  3   SO
 4: 2014-12-20     Arizona Coyotes  2     Los Angeles Kings  4     
 5: 2014-12-20 Nashville Predators  6        Minnesota Wild  5   OT
 6: 2014-12-20     Ottawa Senators  1    Montreal Canadiens  4     
 7: 2014-12-20 Washington Capitals  4     New Jersey Devils  0     
 8: 2014-12-20 Tampa Bay Lightning  1    New York Islanders  3     
 9: 2014-12-20    Florida Panthers  1   Pittsburgh Penguins  3     
10: 2014-12-20     St. Louis Blues  2       San Jose Sharks  3   OT
11: 2014-12-20 Philadelphia Flyers  7   Toronto Maple Leafs  4     
12: 2014-12-20      Calgary Flames  2     Vancouver Canucks  3   OT
13: 2014-12-21      Buffalo Sabres  3         Boston Bruins  4   OT
14: 2014-12-21 Toronto Maple Leafs  0    Chicago Blackhawks  4     
15: 2014-12-21  Colorado Avalanche  2     Detroit Red Wings  1   SO
16: 2014-12-21        Dallas Stars  6       Edmonton Oilers  5   SO
17: 2014-12-21 Carolina Hurricanes  0      New York Rangers  1     
18: 2014-12-21 Philadelphia Flyers  4         Winnipeg Jets  3   OT
19: 2014-12-22     San Jose Sharks  2         Anaheim Ducks  3   OT
20: 2014-12-22 Nashville Predators  5 Columbus Blue Jackets  1     
21: 2014-12-22 Pittsburgh Penguins  3      Florida Panthers  4   SO
22: 2014-12-22      Calgary Flames  4     Los Angeles Kings  3   OT
23: 2014-12-22     Arizona Coyotes  1     Vancouver Canucks  7     
24: 2014-12-22     Ottawa Senators  1   Washington Capitals  2     
          Date             Visitor VG                  Home HG Type
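
To make the effect of fill=TRUE concrete, here is a small offline sketch, assuming data.table is installed. The no_games table and its X1/X2 columns are hypothetical placeholders for whatever a no-game page returns; the actual structure of that page is not shown in this thread. The two game rows reuse values from the output above.

```r
library(data.table)

# Two rows shaped like a regular box score page (subset of columns).
games <- data.table(Date    = c("2014-12-22", "2014-12-22"),
                    Visitor = c("San Jose Sharks", "Nashville Predators"),
                    VG      = c(2L, 5L))

# Hypothetical table from a date without games: unrelated column names.
no_games <- data.table(X1 = "placeholder", X2 = "text")

# fill = TRUE matches columns by name and pads the missing ones with NA,
# so tables with different columns can still be stacked.
combined <- rbindlist(list(games, no_games), fill = TRUE)

# Rows that came from the no-game table have NA in Date; drop them and
# keep only the columns of interest.
valid <- combined[!is.na(Date), 1:3, with = FALSE]
print(valid)
```

The stray row survives the binding step but is removed by the Date filter, which is the same pattern as the resdt[!is.na(Date), 1:6, with = FALSE] call above.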

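If you are not familiar with data.table, you can use it just for rbindlist, then convert the data.table back to a data.frame and carry on with the usual data.frame operations. That said, you really should learn data.table, because it is very fast and efficient on large data.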

resdf <- as.data.frame(resdt)
with(resdf, resdf[!is.na(Date), 1:6])

         Date             Visitor VG                  Home HG Type
1  2014-12-20  Colorado Avalanche  5        Buffalo Sabres  1     
2  2014-12-20    New York Rangers  3   Carolina Hurricanes  2   SO
3  2014-12-20  Chicago Blackhawks  2 Columbus Blue Jackets  3   SO
4  2014-12-20     Arizona Coyotes  2     Los Angeles Kings  4     
5  2014-12-20 Nashville Predators  6        Minnesota Wild  5   OT
6  2014-12-20     Ottawa Senators  1    Montreal Canadiens  4     
7  2014-12-20 Washington Capitals  4     New Jersey Devils  0     
8  2014-12-20 Tampa Bay Lightning  1    New York Islanders  3     
9  2014-12-20    Florida Panthers  1   Pittsburgh Penguins  3     
10 2014-12-20     St. Louis Blues  2       San Jose Sharks  3   OT
11 2014-12-20 Philadelphia Flyers  7   Toronto Maple Leafs  4     
12 2014-12-20      Calgary Flames  2     Vancouver Canucks  3   OT
13 2014-12-21      Buffalo Sabres  3         Boston Bruins  4   OT
14 2014-12-21 Toronto Maple Leafs  0    Chicago Blackhawks  4     
15 2014-12-21  Colorado Avalanche  2     Detroit Red Wings  1   SO
16 2014-12-21        Dallas Stars  6       Edmonton Oilers  5   SO
17 2014-12-21 Carolina Hurricanes  0      New York Rangers  1     
18 2014-12-21 Philadelphia Flyers  4         Winnipeg Jets  3   OT
19 2014-12-22     San Jose Sharks  2         Anaheim Ducks  3   OT
20 2014-12-22 Nashville Predators  5 Columbus Blue Jackets  1     
21 2014-12-22 Pittsburgh Penguins  3      Florida Panthers  4   SO
22 2014-12-22      Calgary Flames  4     Los Angeles Kings  3   OT
23 2014-12-22     Arizona Coyotes  1     Vancouver Canucks  7     
24 2014-12-22     Ottawa Senators  1   Washington Capitals  2