uniroot() function in source code does not work with modification; could not figure out the error

Question

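I am trying to find the coordinates of the intersection point of two curves in R. The input data are the empirical point coordinates of the two curves. My solution is to use the function curve_intersect(). I need to do this for 2000 replications (i.e. 2000 pairs of curves), so I put the data in two lists. Each list contains 2000 data frames, and each data frame holds the x and y coordinates of one curve.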

Here is my data: data

Below is the code I used.

library(purrr)  # for map2_df()

threshold_or1 <- map2_df(recall_or1_4, precision_or1_4,
                         ~curve_intersect(.x, .y, empirical = TRUE, domain = NULL))

# recall_or1_4 is a list of 2000 data frames. Each data frame
# contains the coordinates of curve #1.

# precision_or1_4 is a list of 2000 data frames. Each data frame
# contains the coordinates of curve #2.

I get the error message below.

Error in uniroot(function(x) curve1_f(x) - curve2_f(x), c(min(curve1$x),  : f() values at end points not of opposite sign

Since curve_intersect() works on some of the individual data frames in the two lists, I ran the code below to see exactly which pair of data frames made the process fail.

for (i in 1:2000) {
  curve_intersect(recall_or1_4[[i]], precision_or1_4[[i]], empirical = TRUE, domain = NULL)
  print(paste("i=", i))
}

I then got the following output, which means the loop ran successfully until it reached data pair #460. So I examined that individual pair.

[1] "i= 457"
[1] "i= 458"
[1] "i= 459"
Error in uniroot(function(x) curve1_f(x) - curve2_f(x), c(min(curve1$x),  : f() values at end points not of opposite sign

I plotted data pair #460.

test1 <- precision_or1_4[[460]] %>% mutate(statistics = 'precision')
test2 <- recall_or1_4[[460]] %>% mutate(statistics = 'recall')
test3 <- rbind(test1, test2)
test3 <- test3 %>% mutate(statistics = as.factor(statistics))
curve_test3 <- ggplot(test3, aes(x = x, y = y))+
        geom_line(aes(colour = statistics))
curve_test3

[Image: Find coordinates of the intersection point]

Then I modified the source code of curve_intersect(). The original source code is:

curve_intersect <- function(curve1, curve2, empirical=TRUE, domain=NULL) {
        if (!empirical & missing(domain)) {
                stop("'domain' must be provided with non-empirical curves")
        }
        
        if (!empirical & (length(domain) != 2 | !is.numeric(domain))) {
                stop("'domain' must be a two-value numeric vector, like c(0, 10)")
        }
        
        if (empirical) {
                # Approximate the functional form of both curves
                curve1_f <- approxfun(curve1$x, curve1$y, rule = 2)
                curve2_f <- approxfun(curve2$x, curve2$y, rule = 2)
                
                # Calculate the intersection of curve 1 and curve 2 along the x-axis
                point_x <- uniroot(function(x) curve1_f(x) - curve2_f(x),
                                   c(min(curve1$x), max(curve1$x)))$root
                
                # Find where point_x is in curve 2
                point_y <- curve2_f(point_x)
        } else {
                # Calculate the intersection of curve 1 and curve 2 along the x-axis
                # within the given domain
                point_x <- uniroot(function(x) curve1(x) - curve2(x), domain)$root
                
                # Find where point_x is in curve 2
                point_y <- curve2(point_x)
        }
        
        return(list(x = point_x, y = point_y))
}

I modified the uniroot() part of the third if statement. Instead of passing c(min(curve1$x), max(curve1$x)) as the search interval to uniroot(), I used lower = -100000000, upper = 100000000. The modified function is:

curve_intersect_tq <- function(curve1, curve2, empirical=TRUE, domain=NULL) {
        if (!empirical & missing(domain)) {
                stop("'domain' must be provided with non-empirical curves")
        }
        
        if (!empirical & (length(domain) != 2 | !is.numeric(domain))) {
                stop("'domain' must be a two-value numeric vector, like c(0, 10)")
        }
        
        if (empirical) {
                # Approximate the functional form of both curves
                curve1_f <- approxfun(curve1$x, curve1$y, rule = 2)
                curve2_f <- approxfun(curve2$x, curve2$y, rule = 2)
                
                # Calculate the intersection of curve 1 and curve 2 along the x-axis
                point_x <- uniroot(function(x) curve1_f(x) - curve2_f(x),
                                   lower = -100000000, upper = 100000000)$root
                
                # Find where point_x is in curve 2
                point_y <- curve2_f(point_x)
        } else {
                # Calculate the intersection of curve 1 and curve 2 along the x-axis
                # within the given domain
                point_x <- uniroot(function(x) curve1(x) - curve2(x), domain)$root
                
                # Find where point_x is in curve 2
                point_y <- curve2(point_x)
        }
        
        return(list(x = point_x, y = point_y))
}

I tried changing the values of the lower = and upper = arguments. It did not work; I got the same error message, shown below.

curve_intersect_tq(recall_or1_4[[460]], precision_or1_4[[460]], empirical = TRUE, domain = NULL)

Error in uniroot(function(x) curve1_f(x) - curve2_f(x), c(min(curve1$x),  : 
  f() values at end points not of opposite sign

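I also tried possibly(fun, NA) from the tidyverse (purrr), hoping the process would keep running even when an error occurs. It did not help when I used

(1) possibly(curve_intersect(), NA) or (2) possibly(uniroot(), NA)

The same error message appeared.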

Why do I get this error message, and what are possible solutions? Thanks in advance.

Answer 1

Maybe a bit late to the party, but here is why your code still fails and what you can do about it, depending on what you want from your analysis:

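First, the reason your code fails even after your change: you only told uniroot to search a wider window in x. But the underlying curves never intersect, so there is simply no x for which curve1_f(x) - curve2_f(x) == 0.

From the uniroot documentation: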

"The function values at the endpoints must be of opposite signs (or zero), for extendInt="no", the default."

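In the original curve_intersect implementation, uniroot searches the x-interval defined by the data (i.e. c(min(curve1$x), max(curve1$x))). With your change, you tell it to search the x-interval [-100000000, 100000000]. You could also set extendInt = "yes", but it would not change anything.
The problem is not the search interval, it is approxfun!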
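To see this concretely, here is a minimal, self-contained sketch with two made-up curves (the toy data frames are assumptions standing in for one of your pairs; curve 1 lies above curve 2 over the whole observed range). The same uniroot() call fails on the data range, on a huge window like the one in your modified function, and with extendInt = "yes", because with rule = 2 the difference between the interpolated curves is constant outside the data and never changes sign.

```r
# Two toy "empirical" curves on x in [0, 1] (made-up data, not from the question).
# Curve 1 stays above curve 2 everywhere in the observed range.
curve1 <- data.frame(x = c(0, 0.5, 1), y = c(0.9, 0.8, 0.7))
curve2 <- data.frame(x = c(0, 0.5, 1), y = c(0.2, 0.4, 0.6))

# Same interpolation as in curve_intersect()
curve1_f <- approxfun(curve1$x, curve1$y, rule = 2)
curve2_f <- approxfun(curve2$x, curve2$y, rule = 2)
diff_f <- function(x) curve1_f(x) - curve2_f(x)

# Run uniroot() and return the error object instead of stopping
try_root <- function(...) tryCatch(uniroot(diff_f, ...), error = function(e) e)

r_data <- try_root(interval = c(0, 1))                    # the data range
r_wide <- try_root(lower = -1e8, upper = 1e8)             # a huge window
r_ext  <- try_root(interval = c(0, 1), extendInt = "yes") # let uniroot widen it

conditionMessage(r_data)

# With rule = 2 the difference is 0.7 for every x < 0 and 0.1 for every x > 1,
# so no window, however wide, contains a sign change.
diff_f(c(-1e8, 1e8))
```

Widening the interval only helps when the function actually changes sign somewhere; here the flat extension produced by approxfun rules that out.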

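approxfun only helps you by interpolating the empirical data between points. Outside the data you passed in, the returned function does not know what to do.
approxfun lets you either specify explicit y values to return outside the empirically defined window (with its yleft/yright arguments), or set a rule for each side.
In the code you posted above, rule = 2 means "use the value at the closest data extreme". So approxfun does not extrapolate your data; it only carries the values at the edges of the known data outward as constants.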
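A small sketch of the out-of-range options of approxfun(), on three made-up points (the values are illustrative only). Each function is evaluated once to the left of the data, once inside it, and once to the right.

```r
x <- c(0, 0.5, 1)
y <- c(0.2, 0.4, 0.6)
probe <- c(-1, 0.25, 2)  # left of the data, inside it, right of it

f_rule1 <- approxfun(x, y)                          # default rule = 1: NA outside
f_rule2 <- approxfun(x, y, rule = 2)                # repeat the nearest end value
f_fixed <- approxfun(x, y, yleft = 0, yright = 1)   # explicit outside values
f_mixed <- approxfun(x, y, rule = c(1, 2))          # NA on the left, constant on the right

f_rule1(probe)  # NA, 0.3, NA
f_rule2(probe)  # 0.2, 0.3, 0.6
f_fixed(probe)  # 0, 0.3, 1
f_mixed(probe)  # NA, 0.3, 0.6
```

None of these options continues the trend of the data, which is why extrapolation needs a model of some kind.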

We can plot how curve1_f and curve2_f extend beyond the empirically defined x-interval:

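library(tidyverse)

# The failing pair; curve_intersect() treats its first argument as curve1
curve1 <- recall_or1_4[[460]]
curve2 <- precision_or1_4[[460]]
curve1_f <- approxfun(curve1$x, curve1$y, rule = 2)
curve2_f <- approxfun(curve2$x, curve2$y, rule = 2)

tibble(
    x = seq(0, 1, by = 0.001),
    curve1_approxed = curve1_f(x),
    curve2_approxed = curve2_f(x)
  ) %>%
  pivot_longer(starts_with("curve"), names_to = "curve", values_to = "y") %>%
  ggplot(aes(x = x, y = y, color = curve)) +
  geom_line() +
  geom_vline(xintercept = c(min(curve1$x), max(curve1$x)), color = "grey75")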

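So what can you do now to keep your code from crashing?
(Spoiler: it depends heavily on what you want to achieve with your project.)

Accept that there is no intersection within the observed range of your data.
If you don't want to make any assumptions, I suggest wrapping your mapping function in a tryCatch statement and letting it fail wherever the out-of-the-box solution gives you no result. Let's run this on your list, including the element that previously crashed the whole thing: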

threshold_or1.fix1 <- map2_df(
  recall_or1_4, precision_or1_4,
  ~tryCatch({
    curve_intersect(.x, .y, empirical = TRUE, domain = NULL)
  }, error = function(e){
    return(tibble(.rows = 1))
  }),
  .id = "i"
)

Now, wherever curve_intersect cannot give you a result, you simply get a row of NA.
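The same effect can be had with purrr::possibly(), which you tried in the question. possibly() expects the function itself (e.g. possibly(curve_intersect, ...)), not a call such as curve_intersect() or uniroot(). A minimal sketch, using a hypothetical toy function root_or_fail() in place of curve_intersect():

```r
library(purrr)

# Hypothetical stand-in for curve_intersect(): errors when there is no root in [0, 1]
root_or_fail <- function(a) uniroot(function(x) x - a, interval = c(0, 1))$root

# Pass the function itself (no parentheses); `otherwise` is returned on error
safe_root <- possibly(root_or_fail, otherwise = NA_real_)

roots <- map_dbl(c(0.25, 5, 0.75), safe_root)
roots  # approximately 0.25, NA, 0.75
```

For your data, the analogous (untested) call would be map2_df(recall_or1_4, precision_or1_4, possibly(curve_intersect, otherwise = tibble(.rows = 1)), .id = "i").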

threshold_or1.fix1[459:461,]
# A tibble: 3 x 3
  i          x      y
  <chr>  <dbl>  <dbl>
1 459    0.116  0.809
2 460   NA     NA    
3 461    0.264  0.773

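Try extrapolating your data with a linear model
In this case we use a custom curve_intersect function. We wrap the problematic uniroot call in tryCatch, and if no root is found, we fit an lm to each curve and let uniroot search for the intersection of the fitted lines.
Depending on your experiment this may or may not make sense, so I'll leave that judgment to you. Obviously, if your data are more complex than this, you can use models other than a simple lm...
Just to visualize this approach compared with the default one: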

tibble(
    x = seq(-1, 2, by = 0.001),
    curve1_approxed = curve1_f(x),
    curve2_approxed = curve2_f(x),
    curve1_lm = predict(lm(y ~ x, data = curve1), newdata = tibble(x = x)),
    curve2_lm = predict(lm(y ~ x, data = curve2), newdata = tibble(x = x))
  ) %>%
  pivot_longer(starts_with("curve"), names_to = "curve", values_to = "y") %>%
  ggplot(aes(x = x, y = y, color = curve)) +
  geom_line() +
  geom_vline(xintercept = c(min(curve1$x), max(curve1$x)), color = "grey75")

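As you can see, where approxfun "fails", lm lets us assume that we can extrapolate linearly, and we find an intersection around x = 1.27, outside the observed range.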
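Because both fits are straight lines, their crossing point can also be computed in closed form: for y = a1 + b1*x and y = a2 + b2*x, the intersection is at x = (a2 - a1) / (b1 - b2), and there is none when the slopes are equal. A minimal sketch on made-up data (the two toy curves are assumptions, not your pair #460), checked against the uniroot(..., extendInt = "yes") search that the custom function below relies on:

```r
# Two made-up curves that do not cross on the observed range [0, 1]
curve1 <- data.frame(x = c(0, 0.5, 1), y = c(0.9, 0.8, 0.7))
curve2 <- data.frame(x = c(0, 0.5, 1), y = c(0.2, 0.4, 0.6))

# Intercept and slope of each fitted line
b1 <- unname(coef(lm(y ~ x, data = curve1)))
b2 <- unname(coef(lm(y ~ x, data = curve2)))

# Closed form: a1 + b1*x = a2 + b2*x  =>  x = (a2 - a1) / (b1 - b2)
# (undefined if b1[2] == b2[2], i.e. parallel lines)
x_cross <- (b2[1] - b1[1]) / (b1[2] - b2[2])
y_cross <- b2[1] + b2[2] * x_cross

# Same root found numerically, letting uniroot() widen the interval
line_diff <- function(x) (b1[1] + b1[2] * x) - (b2[1] + b2[2] * x)
x_root <- uniroot(line_diff, interval = c(0, 1), extendInt = "yes")$root

# x_cross is 7/6 (about 1.167), just outside the data range
c(x_cross = x_cross, y_cross = y_cross, x_root = x_root)
```

The closed form avoids root finding altogether for linear fits; uniroot() remains the more general route if you swap in a nonlinear model.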

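To use the second approach and include the lm extrapolation in the search, you can cobble together something like this:
(Again, only the third if is edited.)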

curve_intersect_custom <- function(curve1, curve2, empirical=TRUE, domain=NULL) {
  if (!empirical & missing(domain)) {
    stop("'domain' must be provided with non-empirical curves")
  }
  
  if (!empirical & (length(domain) != 2 | !is.numeric(domain))) {
    stop("'domain' must be a two-value numeric vector, like c(0, 10)")
  }
  
  if (empirical) {
    
    return(
      tryCatch({
        # Approximate the functional form of both curves
        curve1_f <- approxfun(curve1$x, curve1$y, rule = 2)
        curve2_f <- approxfun(curve2$x, curve2$y, rule = 2)
        
        # Calculate the intersection of curve 1 and curve 2 along the x-axis
        point_x <- uniroot(
          f = function(x) curve1_f(x) - curve2_f(x),
          interval = c(min(curve1$x), max(curve1$x))
        )$root
        
        # Find where point_x is in curve 2
        point_y <- curve2_f(point_x)
        
        return(list(x = point_x, y = point_y, method = "approxfun"))
        
      }, error = function(e) {
        tryCatch({
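          # Fit each line once, rather than refitting inside every uniroot() evaluation
          curve1_lm <- lm(y ~ x, data = curve1)
          curve2_lm <- lm(y ~ x, data = curve2)
          curve1_lm_f <- function(x) predict(curve1_lm, newdata = tibble(x = x))
          curve2_lm_f <- function(x) predict(curve2_lm, newdata = tibble(x = x))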
          
          point_x <- uniroot(
            f = function(x) curve1_lm_f(x) - curve2_lm_f(x),
            interval = c(min(curve1$x), max(curve1$x)),
            extendInt = "yes"
          )$root
          
          point_y <- curve2_lm_f(point_x)
          
          return(list(x = point_x, y = point_y, method = "lm"))
          
        }, error = function(e) {
          return(list(x = NA_real_, y = NA_real_, method = NA_character_))
        })
      })
    )
    
    
  } else {
    # Calculate the intersection of curve 1 and curve 2 along the x-axis
    # within the given domain
    point_x <- uniroot(function(x) curve1(x) - curve2(x), domain)$root
    
    # Find where point_x is in curve 2
    point_y <- curve2(point_x)
  }
  
  return(list(x = point_x, y = point_y))
}

For your problematic list element, the function now falls back to extrapolating with a naively fitted lm:

threshold_or1.fix2 <- map2_df(
    recall_or1_4, precision_or1_4,
    ~curve_intersect_custom(.x, .y, empirical = TRUE, domain = NULL),
    .id = "i"
)

threshold_or1.fix2[459:461,]
# A tibble: 3 x 4
  i         x     y method   
  <chr> <dbl> <dbl> <chr>    
1 459   0.116 0.809 approxfun
2 460   1.27  0.813 lm       
3 461   0.264 0.773 approxfun

Hope this helps you understand and solve your problem :)


Tags: statistics, r, purrr, tidyverse, uniroot