R - Cumulative storage of looped cbind() results and possible lapply solution to double for-loop
I found a solution based on @Ryan's recommendation. The code is as follows:
library(rvest)  # provides read_html(), html_nodes(), html_text() and %>%

for (i in seq_along(url)) {
  webpage <- read_html(url[i])  # loop through the URL list to access the HTML data
  fac_data <- html_nodes(webpage, '.tableunder') %>% html_text()
  fac_data1 <- html_nodes(webpage, '.tableunder1') %>% html_text()
  fac_data <- c(fac_data, fac_data1)  # store the table data from each URL in one variable
  x <- fac_data %>% matrix(ncol = length(headers[[i]]), byrow = TRUE)  # make a matrix to extract column data
  for (j in seq_along(headers[[i]])) {
    y <- cbind(x[, j])  # extract one column and store it in a temporary variable
    colnames(y) <- as.character(headers[[i]][j])  # add the column name
    # Print each column in sequence. ** The result gets overwritten when I
    # try to store it with 'z <- cbind(y)'.
    print(cbind(y))
  }
}
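I can now print all the values, including the header-related data.
Some follow-up questions:
- How can I cumulatively save the output of cbind(y) in a data.frame or a list? Storing cbind(y) inside the loop overwrites the previous value each time, which leaves me with only the last column of the last table, like this: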
     退休年月
[1,] "82年8月"
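These variants do not work either:
z[[x]][j] <- cbind(y)
> source('~/Google 云端硬盘/R/scrapeFaculty.R')
Error in `*tmp*`[[x]] : 最多只能選擇一個元素
(roughly: "attempt to select more than one element")
z[j] <- cbind(y)
> source('~/Google 云端硬盘/R/scrapeFaculty.R')
There were 13 warnings (use warnings() to see them)
z[[j]] <- cbind(y)
> source('~/Google 云端硬盘/R/scrapeFaculty.R')
Error in z[[j]] <- cbind(y) : 用來替換的元素比所要替換的值多
(roughly: "more elements supplied than there are to replace")
- Is it possible to replace the double for-loop with a simple lapply() call to solve the problem above?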
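On the first follow-up question, here is a minimal sketch of accumulating each column in a list. The question does not show how z was created, so this assumes z started as an atomic vector, which would be consistent with the last error. The matrix x and the names in hdr are made-up stand-ins for the scraped data.

```r
# Hypothetical stand-ins for one scraped table and its header names.
x <- matrix(c("a", "1", "b", "2"), ncol = 2, byrow = TRUE)
hdr <- c("name", "year")

# Assumption: z was an atomic vector. One slot of an atomic vector holds a
# single value, so assigning a whole column with [[<- fails.
z_atomic <- character(0)
failed <- tryCatch({
  z_atomic[[1]] <- cbind(x[, 1])
  FALSE
}, error = function(e) TRUE)

# A list slot can hold any object, so a preallocated list keeps every column.
z <- vector("list", length(hdr))
for (j in seq_along(hdr)) {
  y <- cbind(x[, j])        # one-column matrix
  colnames(y) <- hdr[j]     # name the column
  z[[j]] <- y               # store it in slot j instead of overwriting
}
str(z)
```

In the first variant, z[[x]] uses the whole matrix x as an index, which is likely what triggers the first error; a table index such as i was probably intended.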
Edit:
Here is the final code I used to solve this problem:
library(rvest)  # provides read_html(), html_nodes(), html_text() and %>%

# Assumed: the result list is created here if it was not defined earlier.
ntu.hist <- vector("list", length(url))

for (i in seq_along(url)) {
  webpage <- read_html(url[i])
  fac_data <- html_nodes(webpage, '.tableunder') %>% html_text()
  fac_data1 <- html_nodes(webpage, '.tableunder1') %>% html_text()
  fac_data <- c(fac_data, fac_data1)
  x <- fac_data %>% matrix(ncol = length(headers[[i]]), byrow = TRUE)  # make a matrix to extract column data
  y <- cbind(x[, 1:length(headers[[i]])])  # extract all columns at once
  colnames(y) <- as.character(headers[[i]])  # add the column names
  ntu.hist[[i]] <- y  # accumulate the results in a list
}
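On the lapply() question, the loop body could be wrapped in a function and passed to lapply(), which returns the list directly, so no accumulator is needed. This is a minimal sketch that assumes the scraped cell text is already available: cells and headers below are made-up stand-ins, and build_table is a hypothetical helper name.

```r
# Hypothetical stand-ins: cell text per page (row by row) and header names.
cells <- list(
  c("a", "1", "b", "2"),
  c("p", "q", "r")
)
headers <- list(c("name", "year"), c("c1", "c2", "c3"))

# Hypothetical helper: reshape one page's cells and name the columns.
build_table <- function(i) {
  m <- matrix(cells[[i]], ncol = length(headers[[i]]), byrow = TRUE)
  colnames(m) <- as.character(headers[[i]])
  m
}

# lapply() collects one matrix per page into a list.
ntu.hist <- lapply(seq_along(cells), build_table)
print(ntu.hist[[1]])
```

In the real script, the read_html() and html_nodes() calls would presumably move inside the helper.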
I wonder whether you could bind several columns at once instead of looping. Do these syntax options help?
y <- data.frame(col1=c(1:3),col2=c(4:6),col3=c(7:9))
cbind(y[,c(1:3)])
  col1 col2 col3
1    1    4    7
2    2    5    8
3    3    6    9
#In R, you can use ":" to specify a range. So 1,2,3,4 is equal to 1:4.
#If you don't want number 3 in that range, you can use c(1,2,4).
#For example:
cbind(y[,c(1,3)])
  col1 col3
1    1    7
2    2    8
3    3    9
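A related sketch, offered as an illustration rather than as part of the original answer: indexing already returns several columns together, drop = FALSE keeps a single selected column as a data frame, and columns held in a list can be bound in one call with do.call(). The names sel, one_vec, one_df, cols and m are made up for this example.

```r
y <- data.frame(col1 = 1:3, col2 = 4:6, col3 = 7:9)

# Selecting several columns needs no cbind(); indexing returns them together.
sel <- y[, c(1, 3)]

# A single column collapses to a plain vector unless drop = FALSE is given.
one_vec <- y[, 1]
one_df <- y[, 1, drop = FALSE]

# Columns stored in a named list can be bound in one call instead of a loop.
cols <- list(col1 = 1:3, col2 = 4:6, col3 = 7:9)
m <- do.call(cbind, cols)
print(m)
```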