rbind txt files from online directory (R)

Question

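I am trying to concatenate text files from a URL, but I don't know how to do this when the directory listing is HTML and the files sit in different folders.

Here is the code I tried, but it only lists the text files, and the listing contains a lot of HTML markup, such as this. How can I fix this so that I can combine the text files into one CSV file?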

library(RCurl)
library(readr)
url <- "http://weather.ggy.uga.edu/data/daily/"
dir <- getURL(url, dirlistonly = TRUE)
filenames <- unlist(strsplit(dir, "\n")) # split into filenames
# append the files one after another
for (i in seq_along(filenames)) {
  file <- paste(url, filenames[i], sep = "") # build the URL of each file
  if (i == 1) {
    cp <- read_delim(file, col_names = FALSE, delim = ",")
  } else {
    temp <- read_delim(file, col_names = FALSE, delim = ",")
    cp <- rbind(cp, temp) # append to the existing data
    rm(temp) # remove the temporary object
  }
}

Answer 1

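Here is a code snippet that worked for me. I prefer rvest over RCurl because that is what I learned. In this case, I was able to use the html_nodes function to isolate each file name ending in .txt. The resulting table stores the time as a character string, but you can fix that later. Let me know if you have any questions.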

library(rvest)
library(readr)

url <- "http://weather.ggy.uga.edu/data/daily/"

doc <- xml2::read_html(url)
text <- rvest::html_text(rvest::html_nodes(doc, "tr td a:contains('.txt')"))


# define column types of fwf data ("c" = character, "n" = number)
ctypes <- paste0("c", paste0(rep("n",11), collapse = ""))
data <- data.frame()

for (i in 1:2){  # first two files only; loop over seq_along(text) to read them all
  file <- paste0(url, text[i])

  date <- as.Date(read_lines(file, n_max = 1), "%m/%d/%y")

  # Read file to determine widths
  columns <- fwf_empty(file, skip = 3)

  # Manually expand `solar` column to be 3 spaces wider
  columns$begin[8] <- columns$begin[8] - 3

  data <- rbind(data, cbind(date,read_fwf(file, columns, 
                                          skip = 3, col_types = ctypes)))
}
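
If you end up reading many files, growing `data` with rbind inside the loop can get slow. Below is a minimal sketch of an alternative: read every file into a list and bind once at the end, then write the CSV the question asks for. It is not the code I ran above. Local temporary files stand in for the remote .txt files, and `read_one`, the column names `station` and `value`, and the file names are all made up for illustration, so adapt them to the real layout.

# Sketch only: local temporary files stand in for the remote .txt files.
tmp_dir <- tempfile("daily_")
dir.create(tmp_dir)
writeLines(c("A 1.5", "B 2.5"), file.path(tmp_dir, "day1.txt"))
writeLines(c("C 3.5"), file.path(tmp_dir, "day2.txt"))

# Keep only the .txt files, as the html_nodes selector does for the web page
files <- list.files(tmp_dir, pattern = "\\.txt$", full.names = TRUE)

# Hypothetical helper: read one file and record which file each row came from
read_one <- function(path) {
  df <- read.table(path, header = FALSE,
                   col.names = c("station", "value"),
                   stringsAsFactors = FALSE)
  df$source <- basename(path)
  df
}

# Read every file into a list, then bind all pieces in a single call
pieces <- lapply(files, read_one)
combined <- do.call(rbind, pieces)

# Write the combined table out as one CSV file
out_csv <- file.path(tmp_dir, "combined.csv")
write.csv(combined, out_csv, row.names = FALSE)

For the real data you would swap the body of `read_one` for the read_fwf call from the loop above and pass it the file URLs instead of local paths.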

r

concatenation

rbind

stringr