How to skip an error in a loop while preserving the missing element as a blank column?
I have the following chunk of data:
> dput(data)
structure(c(0.640372781, 0.54596394, 0.364612178, 0.554321638,
0.623891566, 0.299900389, 0.629781465, 0.502673674, 0.414942748,
0.485381455, 0.629032253, 0.201974626, 0.549820206, 0.49277897,
0.299640651, 0.443151949, 0.506297992, 0.259198111, 0.635090505,
0.597640686, 0.430193856, 0.631067648, 0.662995875, 0.391062922,
0.632248042, 0.627503454, 0.432827825, 0.418849204, 0.612201188,
0.227470395, 0.556520484, 0.6095603, 0.414923451, 0.57634896,
0.543780581, 0.320027087, 0.655818488, 0.648937123, 0.497094053,
0.429772696, 0.632386262, 0.270060224, 0.564427852, 0.456642259,
0.492407708, 0.436349654, 0.616355794, 0.248897538, 0.642866477,
0.555022037, 0.358901689, 0.53184597, 0.606299729, 0.342449093,
0.667681177, 0.506448197, 0.370292817, 0.555462276, 0.642302168,
0.42487856, 0.649249462, 0.544035494, 0.394793334, 0.383522657,
0.557789563, 0.220189788, 0.636151283, 0.547825201, 0.391789202,
0.653913292, 0.649412792, 0.452257495, 0.648866884, 0.535907987,
0.392093314, 0.724788138, 0.674157973, 0.494385979, 0.673032345,
0.450686601, 0.369089571, 0.397124065, 0.502592807, 0.197922003
), class = c("xts", "zoo"), .indexCLASS = "Date", tclass = "Date", .indexTZ = "UTC", tzone = "UTC", index = structure(c(1025049600,
1025136000, 1025222400, 1025481600, 1025568000, 1025654400), tzone = "UTC", tclass = "Date"), .Dim = c(6L,
14L), .Dimnames = list(NULL, c("AN8068571086", "BMG3223R1088",
"BMG4388N1065", "BMG6359F1032", "BMG7496G1033", "BMG812761002",
"CA88157K1012", "CH0044328745", "CH0048265513", "GB00B4VLR192",
"GB00B5BT0K07", "GB00B6SLMV12", "GB00BFG3KF26", "GB00BVVBC028"
)))
And this code:
######## INPUTS ######
a <- 0.5
b <- 0.6
results <- list() # list containing loop results
#######################
for (i in 1:nrow(data)) {
  input <- as.matrix(data[i,])
  #extract column names with a value between a and b
  stocks <- matrix(colnames(data[,which(input > a & input < b)]))
  # make a vector with new name for the output
  date <- head(rownames(input), n=1)
  #rename column
  colnames(stocks) <- date
  #export to list under "date" name
  results[[date]] <- stocks
}
If you run it exactly as is, you get this error:
Error in matrix(colnames(data[, which(input > a & input < b)])) :
'data' must be of a vector type, was 'NULL'
In addition: Warning messages:
1: In min(j, na.rm = TRUE) :
no non-missing arguments to min; returning Inf
2: In max(j, na.rm = TRUE) :
no non-missing arguments to max; returning -Inf
This comes from the third row of data, which contains no values between 0.5 and 0.6, so the following line fails:
matrix(colnames(data[,which(input > a & input < b)]))
After running the code above, I run the following, which merges all the results together and prepares them for other calculations:
# merge all results in a list
max_length <- max(sapply(results ,length))
final_results <- sapply(results, function(x){
c(x, rep(NA, max_length - length(x)))
})
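I need a way to skip that error while still keeping the date name as a blank column in final_results. I was thinking of an if statement, so that when there are no values between a and b, an empty (1x1) matrix with colname = date is created and stored in the results list.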
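Another option is to use tryCatch, but that would omit the date entirely, and among the thousands of dates I have here it would be impossible to find the missing columns.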
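If you use tryCatch to handle the error by returning a matrix containing NA, you get a column with the proper date as its name and only NA values in final_results. However, this handles every error the same way, so it may not be the best solution if your data can raise other errors.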
stocks <- tryCatch(matrix(colnames(data[,which(input > a & input < b)])),
                   error = function(e) matrix(NA))
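If catching every error is too broad, the if check suggested in the question can be sketched as below. This is only a minimal illustration: the small `data` matrix and its `AAA`/`BBB`/`CCC` column names are made-up stand-ins for the xts object, and it assumes `as.matrix()` yields the dates as row names, as the original loop already relies on.

```r
# Hypothetical stand-in for the xts object: a plain matrix with dates as row names.
data <- matrix(
  c(0.64, 0.36, 0.55,
    0.55, 0.41, 0.62,
    0.58, 0.30, 0.20),
  nrow = 3,
  dimnames = list(c("2002-06-26", "2002-06-27", "2002-06-28"),
                  c("AAA", "BBB", "CCC"))
)

a <- 0.5
b <- 0.6
m <- as.matrix(data)  # assumption: row names of the result are the dates
results <- list()

for (i in seq_len(nrow(m))) {
  date <- rownames(m)[i]
  # names of the columns whose value in row i lies strictly between a and b
  hits <- colnames(m)[which(m[i, ] > a & m[i, ] < b)]
  # no match: store a single NA so the date still gets its own column
  if (length(hits) == 0) hits <- NA_character_
  results[[date]] <- matrix(hits, dimnames = list(NULL, date))
}

# same merge step as in the question
max_length <- max(sapply(results, length))
final_results <- sapply(results, function(x) {
  c(x, rep(NA, max_length - length(x)))
})
```

Because only the empty-selection case is handled, any other error in the loop still surfaces instead of being silently turned into NA.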
A quick solution to your task:
DF <- as.data.frame(data)
DF <- apply(DF, 1, function(x) {
ifelse(x > a & x < b, x, NA_real_)
})
The result is:
> DF
2002-06-26 2002-06-27 2002-06-28 2002-07-01 2002-07-02 2002-07-03
AN8068571086 NA 0.5459639 NA 0.5543216 NA NA
BMG3223R1088 NA 0.5026737 NA NA NA NA
BMG4388N1065 0.5498202 NA NA NA 0.5062980 NA
BMG6359F1032 NA 0.5976407 NA NA NA NA
BMG7496G1033 NA NA NA NA NA NA
BMG812761002 0.5565205 NA NA 0.5763490 0.5437806 NA
CA88157K1012 NA NA NA NA NA NA
CH0044328745 0.5644279 NA NA NA NA NA
CH0048265513 NA 0.5550220 NA 0.5318460 NA NA
GB00B4VLR192 NA 0.5064482 NA 0.5554623 NA NA
GB00B5BT0K07 NA 0.5440355 NA NA 0.5577896 NA
GB00B6SLMV12 NA 0.5478252 NA NA NA NA
GB00BFG3KF26 NA 0.5359080 NA NA NA NA
GB00BVVBC028 NA NA NA NA 0.5025928 NA
If you want to remove the columns that are all NA, you can filter them out like this:
DF <- DF[, apply(DF, 2, function(x) {
  sum(is.na(x)) != length(x)
})]
The filtered result is:
> DF
2002-06-26 2002-06-27 2002-07-01 2002-07-02
AN8068571086 NA 0.5459639 0.5543216 NA
BMG3223R1088 NA 0.5026737 NA NA
BMG4388N1065 0.5498202 NA NA 0.5062980
BMG6359F1032 NA 0.5976407 NA NA
BMG7496G1033 NA NA NA NA
BMG812761002 0.5565205 NA 0.5763490 0.5437806
CA88157K1012 NA NA NA NA
CH0044328745 0.5644279 NA NA NA
CH0048265513 NA 0.5550220 0.5318460 NA
GB00B4VLR192 NA 0.5064482 0.5554623 NA
GB00B5BT0K07 NA 0.5440355 NA 0.5577896
GB00B6SLMV12 NA 0.5478252 NA NA
GB00BFG3KF26 NA 0.5359080 NA NA
GB00BVVBC028 NA NA NA 0.5025928
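If the end goal is still the name-per-date layout of final_results from the question, it could be derived from the unfiltered DF (before the all-NA columns are dropped, so that empty dates survive as blank columns). The following is a rough sketch on a made-up three-stock DF; the `hits` helper name and the sample values are illustrative assumptions, not part of the original answer.

```r
# Hypothetical small stand-in for DF: stocks in rows, dates in columns,
# NA wherever the value was outside (a, b).
DF <- matrix(
  c(NA,   0.55, 0.58,
    NA,   NA,   NA,
    0.55, NA,   NA),
  nrow = 3,
  dimnames = list(c("AAA", "BBB", "CCC"),
                  c("2002-06-26", "2002-06-27", "2002-06-28"))
)

# for each date, collect the names of the stocks that have a non-NA value
hits <- lapply(seq_len(ncol(DF)), function(j) rownames(DF)[!is.na(DF[, j])])
names(hits) <- colnames(DF)

# pad every date to the same length; dates without hits become all-NA columns
max_length <- max(1L, lengths(hits))
final_results <- sapply(hits, function(x) {
  c(x, rep(NA_character_, max_length - length(x)))
})
```

Note that sapply only returns a matrix here when max_length is greater than 1; with at most one hit per date it simplifies to a named vector.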