Rejection sampling loop producing a "length zero" error in R
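I am getting an "argument is of length zero" error and I cannot work out why.
This code is supposed to sample from the range of numbers between the start and end columns. Some rows have additional dependencies, where certain IDs need to come after other IDs. The code checks these dependencies by replacing the values in the after1 and after2 columns with the corresponding sampled values. If a required dependency is not met, the value is resampled.
When the code runs successfully, the sampled column should be filled with numbers that satisfy the required dependencies. The code is run for i iterations, and a column is added at the end indicating which run each sampled value belongs to.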
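For reference, the intended procedure can be sketched compactly in base R. This is only an illustration under assumptions, not the code in question: sample_with_deps and max_tries are hypothetical names, and the toy data is made up. A row is drawn only once all of the IDs it depends on have a sampled value, and a draw is rejected while any dependency is larger than it.

```r
# Minimal sketch of dependency-aware rejection sampling (base R only).
# sample_with_deps() and max_tries are hypothetical, chosen for illustration.
sample_with_deps <- function(df, max_tries = 1000L) {
  df$sampled <- NA_real_
  pending <- seq_len(nrow(df))
  while (length(pending) > 0) {
    done <- integer(0)
    for (j in pending) {
      deps <- c(df$after1[j], df$after2[j])
      deps <- deps[!is.na(deps)]
      dep_vals <- df$sampled[match(deps, df$ID)]
      if (anyNA(dep_vals)) next  # a dependency has not been sampled yet
      rng <- seq(df$start[j], df$end[j])
      for (try in seq_len(max_tries)) {
        draw <- rng[sample.int(length(rng), 1L)]
        if (all(dep_vals <= draw)) {  # accept only if it comes after all dependencies
          df$sampled[j] <- draw
          done <- c(done, j)
          break
        }
      }
    }
    if (length(done) == 0) stop("no progress: circular or unsatisfiable dependency")
    pending <- setdiff(pending, done)
  }
  df
}

# Made-up toy data using the same column names as the real data set.
toy <- data.frame(
  ID     = c("a", "b", "c"),
  start  = c(0, 0, 0),
  end    = c(10, 10, 10),
  after1 = c(NA, "a", "a"),
  after2 = c(NA, NA, "b"),
  stringsAsFactors = FALSE
)
set.seed(1)
out <- sample_with_deps(toy)
out
```

Sampling an index with sample.int avoids the special case where sample() is given a single positive number and treats it as 1:n.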
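I recently changed how the data is cleaned and prepared before it is read into R. I believe I ported the code correctly, but I now get the following error, which did not occur in the old version, and I am not sure how to fix it. I have looked at other posts but found no solution that applies.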
Error in if (is.na(filter(dftemp, ID == dftemp[j, k])[6])) { :
argument is of length zero
In addition: Warning message:
In as.integer(dftemp[j, k]) : NAs introduced by coercion
Here is the code I am currently working with:
library(readr) #read_csv
library(dplyr) #mutate_if, filter, %>%

df <- read_csv("sampling sample set.csv", na = c("#VALUE!", "#N/A", ""))
dftemp <- df
dftemp %>% mutate_if(is.factor, as.character) -> dftemp #change factors to characters
for (i in 1:200){ #determines how many iterations to run
  row_list<-as.list(1:nrow(dftemp))
  q<-0
  while(length(row_list)!=0 & q<10){
    q<-q+1
    for(j in row_list){ #this loop replaces the check values
      skip_flag<-FALSE #initialize skip flag used to check the replacement sampling
      for(k in 4:5){ #checking the after columns
        if(is.na(dftemp[j,k])){
          print("NA break")
          print(i)
          break
        } else if(is.na(as.integer(dftemp[j,k]))==FALSE) { #if it's already an integer, we already did this, next
          print("integer next")
          next
          print("integer next")
        } else if(dftemp[j,k]==""){ #check for blank values
          print("empty string next")
          dftemp[j,k]<-NA #if blank value found, replace with NA
          print("fixed blank to NA")
          next
        } else if(is.na(filter(dftemp,ID==dftemp[j,k])[6])) { #if the replacement has not yet been generated, move on, but set flag to jump this to the end
          skip_flag<-TRUE
          print("skip flag set")
        } else {
          dftemp[j,k]<-as.integer(filter(dftemp,ID==dftemp[j,k])[6]) #replacing IDs with the sampled dates of those IDs
          print("successful check value grab")
        } #if-else
      } #k for loop
      if(skip_flag==FALSE){
        row_list<-row_list[row_list!=j]
      } else {
        next
      }
      #sampling section
      if(skip_flag==FALSE){
        dftemp[j,6] <- mapply(function(x, y) sample(seq(x, y), 1), dftemp[j,"start"], dftemp[j,"end"])
        dftemp[j,7]<-i #identifying the run number
        if(any(as.numeric(dftemp[j,4:5])>as.numeric(dftemp[j,6]),na.rm=TRUE)){
          print(j)
          while(any(as.numeric(dftemp[j,4:5])>as.numeric(dftemp[j,6]),na.rm=TRUE)){
            dftemp[j,6] <- mapply(function(x, y) sample(seq(x, y), 1), dftemp[j,"start"], dftemp[j,"end"])
          } #while
          dftemp[j,7]=i
        }#if
      }
    } #j for loop
  } #while loop wrapper around j loop
  if(i==1){
    dftemp2<-dftemp
  }else{
    dftemp2<-rbind(dftemp2,dftemp)
  }#else
  #blank out dftemp to prepare for another run
  dftemp<-dftemp
  dftemp$sampled <- NA
  dftemp %>% mutate_if(is.factor, as.character) -> dftemp
}#i for loop
Here is the sample data.
structure(list(ID = c("a123-1", "b123-1", "c123-1", "d123-1",
"e123-1", "f123-1", "g123-1", "h123-1", "i123-1", "j123-1", "k123-1",
"l123-1", "m123-1", "n123-1"), start = c(-5100, -4760, -4930,
-4930, -5380, -5280, -4855, -4855, -4855, -4855, -4855, -4855,
-4810, -4810), end = c(-4760, -4420, -4420, -4420, -5080, -5080,
-4750, -4750, -4750, -4750, -4750, -4750, -4710, -4710), after1 = c(NA,
NA, NA, NA, NA, NA, NA, "g123-1", "g123-1", NA, "j123-1", "j123-1",
NA, NA), after2 = c(NA, NA, NA, NA, NA, NA, NA, NA, "h123-1",
NA, NA, "k123-1", NA, NA), sampled = c(NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA)), class = c("spec_tbl_df", "tbl_df",
"tbl", "data.frame"), row.names = c(NA, -14L), spec = structure(list(
cols = list(ID = structure(list(), class = c("collector_character",
"collector")), start = structure(list(), class = c("collector_double",
"collector")), end = structure(list(), class = c("collector_double",
"collector")), after1 = structure(list(), class = c("collector_character",
"collector")), after2 = structure(list(), class = c("collector_character",
"collector")), sampled = structure(list(), class = c("collector_logical",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
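As the error message indicates, the problem occurs at least on the line is.na(filter(dftemp,ID==dftemp[j,k])[6]). It appears to be related to the input that dplyr's filter expects. Consider what the following calls return: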
#returns a tibble with one value
str(dftemp[8,4])
#returns an empty tibble
filter(dftemp,ID==dftemp[8,4])
#returns TRUE
is.data.frame(filter(dftemp,ID==dftemp[8,4]))
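The two ingredients of the error can be reproduced without dplyr. The sketch below uses a made-up base data frame, with drop = FALSE standing in for a tibble (whose single-bracket subsetting always returns another tibble). It shows that a single-cell subset is still a data frame, and that if() fails once is.na() is applied to a zero-row result.

```r
# Made-up toy data; a base data.frame so the sketch needs no packages.
df <- data.frame(
  ID      = c("g123-1", "h123-1"),
  after1  = c(NA, "g123-1"),
  sampled = c(NA, NA),
  stringsAsFactors = FALSE
)

# drop = FALSE mimics a tibble, whose `[` always returns another tibble.
cell_as_frame <- df[2, "after1", drop = FALSE]  # 1 x 1 data frame
cell_as_value <- df[[2, "after1"]]              # plain character value

is.data.frame(cell_as_frame)
is.character(cell_as_value)

# A lookup that matches no rows has zero rows, so is.na() has length zero ...
no_match <- df[df$ID == "not-an-id", "sampled", drop = FALSE]
length(is.na(no_match))

# ... and if() refuses a zero-length condition; capture the error message.
msg <- tryCatch(
  if (is.na(no_match)) "unreachable",
  error = function(e) conditionMessage(e)
)
msg
```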
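filter wants the value itself, not a data frame containing the value. Because it is handed a one-cell tibble, the comparison matches no rows, is.na() of the empty result has length zero, and if() cannot evaluate a zero-length condition. Wrapping your subset in as.character should resolve this. Note that the same thing may happen elsewhere in the code, so you may need to make sure you have the right data type in other places too. Here is an example: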
#replace line in question with the following:
is.na(filter(dftemp,ID==as.character(dftemp[8,4]) )[6])
#testing
if(is.na(filter(dftemp,ID==as.character(dftemp[8,4]) )[6])){print("working")}
#output
[1] "working"
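Another option, sketched here as an assumption rather than a drop-in replacement, is to extract the cell with double brackets and look the sampled value up with match(), which returns exactly one value even when the ID is missing or not yet sampled. lookup_sampled is a hypothetical helper, and the toy data is made up.

```r
# lookup_sampled() is a hypothetical helper, not part of dplyr or the original code.
lookup_sampled <- function(df, key) {
  if (is.data.frame(key)) key <- key[[1]]  # unwrap a 1 x 1 data frame or tibble
  key <- as.character(key)[1]
  df$sampled[match(key, df$ID)]            # always length 1; NA if absent or unsampled
}

# Made-up toy data with the same column names as the question.
dftoy <- data.frame(
  ID      = c("g123-1", "h123-1", "i123-1"),
  after1  = c(NA, "g123-1", "g123-1"),
  sampled = c(-4800, NA, NA),
  stringsAsFactors = FALSE
)

found    <- lookup_sampled(dftoy, dftoy[2, "after1", drop = FALSE])  # value for g123-1
unsampled <- lookup_sampled(dftoy, "h123-1")                         # NA: not sampled yet
unknown  <- lookup_sampled(dftoy, "zzz")                             # NA: unknown ID

# The condition is now always length one, so if() is safe.
if (is.na(lookup_sampled(dftoy, dftoy[[3, "after1"]]))) {
  print("dependency not sampled yet")
} else {
  print("dependency available")
}
```

With double brackets, dftemp[[j, k]] returns the bare value for both tibbles and base data frames, so the same lookup can be used wherever the loop compares against ID.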