R: obtain a matrix of overlaps between ranges

I have a data frame with ranges, as shown below:

df <- data.frame(label = c("A", "B", "C"),
                 start = c(2, 11, 22),
                 stop = c(37, 45, 29))

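Now I would like to obtain a matrix in which I can see how much overlap (as a percentage) there is between A:B, B:C, A:C, etc., i.e. how much of range A occurs in range B, and so on. The output should look like this: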

          A       B      C
 A        100     76.5   100
 B        74.3    100    100
 C        20      20.6   100

I have tried to obtain such a matrix with IRanges or GRanges, but that does not seem to be possible. I hope someone can help me with this!
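
As a quick check of what the expected matrix encodes, the A/B entries can be reproduced by hand. This is only a sketch; the variable names `ov_AB`, `len_A`, and `len_B` are made up for illustration. It suggests that each cell is the overlap length divided by the length of the column's range.

```r
# Ranges taken from the data frame above: A = [2, 37], B = [11, 45]
ov_AB <- min(37, 45) - max(2, 11)  # length of the stretch shared by A and B
len_A <- 37 - 2
len_B <- 45 - 11

100 * ov_AB / len_B  # row A, column B of the expected output (about 76.5)
100 * ov_AB / len_A  # row B, column A of the expected output (about 74.3)
```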

Base R

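# outer() gives every pairwise overlap length; dividing by (stop - start) scales
# each row by that range's length, and t() moves that range into the columns
out <- 100 * with(df, t((outer(stop, stop, pmin) - outer(start, start, pmax)) / (stop - start)))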
dimnames(out) <- list(df$label, df$label)
out
#           A         B   C
# A 100.00000  76.47059 100
# B  74.28571 100.00000 100
# C  20.00000  20.58824 100
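
One caveat: for two ranges that do not overlap at all, the minimum of the stops minus the maximum of the starts is negative, so the formula above would report a negative percentage. A minimal sketch of one way to handle this, assuming such pairs should count as 0% overlap, is to clamp the overlap length at zero. The data frame `df2` and the extra range `D` are hypothetical and only there to show the effect.

```r
# Hypothetical data: the same ranges as above plus D, which is disjoint from A and C
df2 <- data.frame(label = c("A", "B", "C", "D"),
                  start = c(2, 11, 22, 40),
                  stop  = c(37, 45, 29, 50))

# Pairwise overlap length; negative values mean the two ranges do not overlap
ov <- with(df2, outer(stop, stop, pmin) - outer(start, start, pmax))
ov <- pmax(ov, 0)  # assumption: no overlap should be reported as 0

# Divide each row by that range's length, then transpose as in the answer above
out2 <- 100 * t(ov / with(df2, stop - start))
dimnames(out2) <- list(df2$label, df2$label)
round(out2, 1)
```

The upper-left 3 x 3 block of `out2` is unchanged from `out`, since A, B, and C all overlap one another; only the pairs involving `D` are affected by the clamp.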

tidyverse

library(dplyr)
library(tidyr)
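# Build every (Var1, Var2) pair of labels and attach both ranges (.x = Var1, .y = Var2)
expand_grid(Var1 = df$label, Var2 = df$label) %>%
  left_join(df, by = c("Var1" = "label")) %>%
  left_join(df, by = c("Var2" = "label")) %>%
  mutate(
    # Intersection of the two ranges
    start = pmax(start.y, start.x),
    stop  = pmin(stop.x, stop.y),
    # Overlap as a percentage of the Var2 (column) range
    overlap = 100 * (stop - start) / (stop.y - start.y)
  ) %>%
  # Name id_cols explicitly; recent tidyr versions deprecate passing it by position
  pivot_wider(id_cols = Var1, names_from = Var2, values_from = overlap)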
# # A tibble: 3 x 4
#   Var1      A     B     C
#   <chr> <dbl> <dbl> <dbl>
# 1 A     100    76.5   100
# 2 B      74.3 100     100
# 3 C      20    20.6   100