How do I split the value of a column based on a lookup table?

Question

I have a dataset containing IDs and associated values:

df <- data.frame(id = c("1", "2", "3"), value = c("12", "20", "16"))

I also have a lookup table that maps each id to another reference label, ref:

lookup <- data.frame(id = c("1", "1", "1", "2", "2", "3", "3", "3", "3"), ref = c("a", "b", "c", "a", "d", "d", "e", "f", "a"))

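Note that id to ref is a many-to-many match: the same id can be associated with multiple ref values, and the same ref can be associated with multiple id values.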

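I am trying to split the value associated with each df$id equally across the ref labels linked to that id. The output dataset should look like this:

output <- data.frame(ref = c("a", "b", "c", "d", "e", "f"), value = c("18", "4", "4", "14", "4", "4"))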

ref	value
a	18
b	4
c	4
d	14
e	4
f	4

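I tried breaking this down into four steps:

(1) Call pivot_wider on lookup to turn rows with the same id value into columns (e.g., a, b, c).
(2) Join with df by id.
(3) Split each df$value equally among the columns a, b, c, etc. that are not empty.
(4) Transpose the dataset and sum across the id columns.

However, I can't figure out how to make step (3) work, and I suspect there is a simpler way.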
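
The wide-format idea behind the four steps above can be sketched in base R. This is only a sketch under assumptions: it swaps pivot_wider for a table() incidence matrix, and the names inc, shares, and res are illustrative rather than taken from the question.

```r
df <- data.frame(id = c("1", "2", "3"), value = c("12", "20", "16"))
lookup <- data.frame(id  = c("1", "1", "1", "2", "2", "3", "3", "3", "3"),
                     ref = c("a", "b", "c", "a", "d", "d", "e", "f", "a"))

# Steps 1-2: id-by-ref incidence matrix (count of lookup rows per id/ref pair)
inc <- unclass(table(lookup$id, lookup$ref))

# Align df$value with the matrix rows
value <- as.numeric(df$value[match(rownames(inc), df$id)])

# Step 3: each id's value divided equally among its non-empty ref columns
# (the length-3 vector recycles down the rows, one entry per id)
shares <- inc * (value / rowSums(inc))

# Step 4: sum each ref column over all ids
res <- data.frame(ref = colnames(shares), value = unname(colSums(shares)))
res
```

Multiplying by the incidence matrix leaves zeros wherever an id has no link to a ref, so no special handling of empty cells is needed in this sketch.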

Answer 1

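Here is one possible approach: merge value from df into lookup by id, divide value by the number of rows matching each id, then group by ref and sum. Pick whichever implementation you prefer.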

Base R

tmp <- merge(lookup, df, by="id", all.x=TRUE)
tmp$value <- ave(as.numeric(tmp$value), tmp$id, FUN=\(x) x/length(x) )
aggregate(value ~ ref, tmp, sum)
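
To see what each step of the base R version produces, the intermediate table can be inspected before aggregating. This is a sketch only: the share column and the res name are illustrative additions so that the original value stays visible next to the per-row share.

```r
df <- data.frame(id = c("1", "2", "3"), value = c("12", "20", "16"))
lookup <- data.frame(id  = c("1", "1", "1", "2", "2", "3", "3", "3", "3"),
                     ref = c("a", "b", "c", "a", "d", "d", "e", "f", "a"))

# One row per (id, ref) pair, carrying that id's value
tmp <- merge(lookup, df, by = "id", all.x = TRUE)

# ave() applies FUN within each id group and returns a vector aligned with tmp,
# so every row gets its id's value divided by the number of rows for that id
tmp$share <- ave(as.numeric(tmp$value), tmp$id,
                 FUN = function(x) x / length(x))
tmp

# Sum the shares per ref
res <- aggregate(share ~ ref, tmp, sum)
res
```

For id 1 each of the three rows gets 12 / 3 = 4, for id 2 each of the two rows gets 20 / 2 = 10, and for id 3 each of the four rows gets 16 / 4 = 4.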

dplyr

library(dplyr)
lookup %>%
  left_join(df, by="id") %>%
  group_by(id) %>% 
  mutate(value = as.numeric(value) / n() ) %>%
  group_by(ref) %>%
  summarise(value = sum(value))

data.table

library(data.table)
setDT(df)
setDT(lookup)
lookup[df, on="id", value := as.numeric(value)/.N, by=.EACHI][
   , .(value = sum(value)), by=ref]

#   ref value
#1:   a    18
#2:   b     4
#3:   c     4
#4:   d    14
#5:   e     4
#6:   f     4
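
One way to sanity-check any of the three versions is to confirm that splitting does not change the total. The sketch below wraps the base R version in a helper, split_by_lookup, which is a hypothetical name introduced here for illustration. It assumes every id in df appears at least once in lookup.

```r
df <- data.frame(id = c("1", "2", "3"), value = c("12", "20", "16"))
lookup <- data.frame(id  = c("1", "1", "1", "2", "2", "3", "3", "3", "3"),
                     ref = c("a", "b", "c", "a", "d", "d", "e", "f", "a"))

# Hypothetical helper: same logic as the base R version above
split_by_lookup <- function(df, lookup) {
  tmp <- merge(lookup, df, by = "id", all.x = TRUE)
  tmp$value <- ave(as.numeric(tmp$value), tmp$id,
                   FUN = function(x) x / length(x))
  aggregate(value ~ ref, tmp, sum)
}

res <- split_by_lookup(df, lookup)

# The shares should add back up to the original total (12 + 20 + 16 = 48)
total_in  <- sum(as.numeric(df$value))
total_out <- sum(res$value)
isTRUE(all.equal(total_in, total_out))
```

If the totals differ, a likely cause is an id in df with no row in lookup, since its value is then never distributed.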

Answer 2

This might work:

library(dplyr)

lookup %>%
  left_join(lookup %>%
              group_by(id) %>%
              summarise(n = n()) %>%               # number of refs per id
              left_join(df, by = "id") %>%         # df is the data frame from the question
              mutate(value = as.numeric(value)) %>%
              mutate(repl = value/n) %>%           # equal share of value per ref
              select(id, repl) ,
            by = "id"
  ) %>% select(ref, repl) %>%
  group_by(ref) %>% summarise(value = sum(repl))

  ref   value
  <chr> <dbl>
1 a        18
2 b         4
3 c         4
4 d        14
5 e         4
6 f         4

Answer 3

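A variation on @thelatemail's answer, using the base pipe (the |> operator and the \(x) shorthand require R 4.1 or later).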

merge(df, lookup) |> type.convert(as.is=TRUE) |>
  transform(value=ave(value, id, FUN=\(x) x/length(x))) |>
  with(aggregate(list(value=value), list(ref=ref), sum))
#   ref value
# 1   a    18
# 2   b     4
# 3   c     4
# 4   d    14
# 5   e     4
# 6   f     4
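
The type.convert(as.is=TRUE) call is what lets this version skip an explicit as.numeric(). A minimal sketch of its effect on the question's df follows; conv is an illustrative name.

```r
df <- data.frame(id = c("1", "2", "3"), value = c("12", "20", "16"))

# Both columns start out as character
sapply(df, class)

# type.convert() re-reads each column and picks the narrowest fitting type;
# as.is = TRUE keeps any remaining character columns as character, not factor
conv <- type.convert(df, as.is = TRUE)
sapply(conv, class)
#        id     value
# "integer" "integer"
```

Because the conversion runs after merge() in the answer, the join itself still matches on the original character id values.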
