R multiply unequal dataframes based on specific column values

Updated the question to include the extra columns that carry over from df1 to the output, and added that the main df has 7 million rows.


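I have two data frames, similar to this pandas SO question.

I need to multiply them where the names match. I'm not sure how to do this cleanly. Is there an apply function that does this easily?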

DF1: (has 16 additional columns of data, 7 million rows)

Data1   Data2   Name      Value
aa      bb      sample1   50
ff      ff      sample1   100
ef      fd      sample1   75
ff      df      sample2   100
bbf     ad3     sample2   200
dd      a       sample2   300
33      3rf     sample3   25
ddd     dd      sample3   50
dd      dd      sample3   40

DF2:

Name      Value
sample1   1
sample2   0.5
sample3   2

Output: (with the 16 additional columns, not shown)

Data1   Data2   Name      Value
aa      bb      sample1   50
ff      ff      sample1   100
ef      fd      sample1   75
ff      df      sample2   50
bbf     ad3     sample2   100
dd      a       sample2   150
33      3rf     sample3   50
ddd     dd      sample3   100
dd      dd      sample3   80

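You can try the base R code below, using merge. Note that merge sorts its result by Name by default, so assigning the product back to DF1 lines up only because DF1 is already ordered by Name here.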

DF1$Value <- do.call(`*`,merge(DF1[c("Name","Value")],DF2,all = TRUE,by="Name")[-1])

which gives

> DF1
  Data1 Data2    Name Value
1    aa    bb sample1    50
2    ff    ff sample1   100
3    ef    fd sample1    75
4    ff    df sample2    50
5   bbf   ad3 sample2   100
6    dd     a sample2   150
7    33   3rf sample3    50
8   ddd    dd sample3   100
9    dd    dd sample3    80

Data

DF1 <- structure(list(Data1 = c("aa", "ff", "ef", "ff", "bbf", "dd", 
"33", "ddd", "dd"), Data2 = c("bb", "ff", "fd", "df", "ad3", 
"a", "3rf", "dd", "dd"), Name = c("sample1", "sample1", "sample1", 
"sample2", "sample2", "sample2", "sample3", "sample3", "sample3"
), Value = c(50L, 100L, 75L, 100L, 200L, 300L, 25L, 50L, 40L)), class = "data.frame", row.names = c(NA, 
-9L))

DF2 <- structure(list(Name = c("sample1", "sample2", "sample3"), Value = c(1, 
0.5, 2)), class = "data.frame", row.names = c(NA, -3L))
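
If DF1 is not already sorted by Name, the merge approach above can be adapted by recording the original row position first. The following is a minimal sketch under that assumption; the toy data, the helper column `.row`, and the `.mult` suffix are made up for illustration and are not part of the original answer.

```r
# Toy data in a deliberately unsorted order (hypothetical, not the asker's data)
DF1 <- data.frame(
  Data1 = c("ff", "aa", "33", "dd"),
  Name  = c("sample2", "sample1", "sample3", "sample2"),
  Value = c(100, 50, 25, 300),
  stringsAsFactors = FALSE
)
DF2 <- data.frame(
  Name  = c("sample1", "sample2", "sample3"),
  Value = c(1, 0.5, 2),
  stringsAsFactors = FALSE
)

# Remember the original row position before merging (helper column, assumed name)
DF1$.row <- seq_len(nrow(DF1))

# Left join: keep every row of DF1; DF2's Value column becomes Value.mult
merged <- merge(DF1, DF2, by = "Name", all.x = TRUE,
                suffixes = c("", ".mult"))

# Restore the original order, multiply, and drop the helper columns
merged <- merged[order(merged$.row), ]
merged$Value <- merged$Value * merged$Value.mult
result <- merged[, c("Data1", "Name", "Value")]
rownames(result) <- NULL
```

With `all.x = TRUE`, names that are missing from DF2 end up with `NA` in `Value` rather than being dropped.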

The most direct way is to use match to get the row indices of df2 in the correct order.

df2$Value[match(df1$Name, df2$Name)] * df1$Value
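
As a sketch of how the match lookup might be written back into df1, here is a small example that also covers names with no counterpart in df2. The toy data (including the unmatched `sample4`) and the choice to treat unmatched names as "multiply by 1" are assumptions for illustration only.

```r
# Toy data; "sample4" has no entry in df2 (hypothetical example)
df1 <- data.frame(
  Name  = c("sample1", "sample2", "sample4", "sample3"),
  Value = c(50, 100, 10, 25),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Name  = c("sample1", "sample2", "sample3"),
  Value = c(1, 0.5, 2),
  stringsAsFactors = FALSE
)

# Position in df2 of each df1 name; NA where there is no match
idx <- match(df1$Name, df2$Name)

# Multiplier per df1 row. Unmatched names get 1 here, which is an assumption;
# skip the second line to keep NA for those rows instead.
mult <- df2$Value[idx]
mult[is.na(idx)] <- 1

# Overwrite Value in place; all other columns of df1 are untouched
df1$Value <- df1$Value * mult
```

Because only one vector lookup and one vectorised multiplication are involved, no join result is materialised, which may matter for a frame with millions of rows.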

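You could also convert df2 to a vector whose names are based on the Name column, then use the df1$Name column to extract values from it. If Name is a factor, convert it with as.character first so the lookup goes by name rather than by integer code.

df1$Value * setNames(df2$Value, df2$Name)[as.character(df1$Name)]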
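
One caveat with the named-vector lookup is worth illustrating: indexing with a factor uses its integer codes, not its labels. The sketch below uses made-up data in which the factor levels are ordered differently from df2, so the two ways of indexing give different answers; the variable names are hypothetical.

```r
# Hypothetical data: Name is a factor whose level order differs from df2
df1 <- data.frame(
  Name  = factor(c("sample3", "sample1"), levels = c("sample3", "sample1")),
  Value = c(25, 50)
)
df2 <- data.frame(
  Name  = c("sample1", "sample3"),
  Value = c(1, 2),
  stringsAsFactors = FALSE
)

# Named lookup vector: c(sample1 = 1, sample3 = 2)
lookup <- setNames(df2$Value, df2$Name)

# Indexing with the factor uses its integer codes (1, 2), i.e. positions
by_code <- unname(lookup[df1$Name])
# Indexing with character strings uses the names, which is the intended lookup
by_name <- unname(lookup[as.character(df1$Name)])

wrong <- df1$Value * by_code   # multipliers picked by position
right <- df1$Value * by_name   # multipliers picked by name
```

When the factor levels happen to be in the same order as the rows of df2, both forms agree, which can hide the problem.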

You can use the data.table package:

library(data.table)
setDT(df1)[setDT(df2), Value_new := Value * i.Value, on = "Name"]

#     Data1  Data2    Name Value Value_new
# 1:     aa     bb sample1    50        50
# 2:     ff     ff sample1   100       100
# 3:     ef     fd sample1    75        75
# 4:     ff     df sample2   100        50
# 5:    bbf    ad3 sample2   200       100
# 6:     dd      a sample2   300       150
# 7:     33    3rf sample3    25        50
# 8:    ddd     dd sample3    50       100
# 9:     dd     dd sample3    40        80
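
If the goal is to overwrite Value rather than add a new column, the same update join can assign to Value directly. This is a minimal sketch on made-up data, assuming Value is stored as double; with an integer column the fractional products would not fit, so it would need converting with as.numeric first.

```r
library(data.table)

# Toy tables (hypothetical); Value is double so products like 100 * 0.5 fit
df1 <- data.table(
  Data1 = c("aa", "ff", "33"),
  Name  = c("sample1", "sample2", "sample3"),
  Value = c(50, 100, 25)
)
df2 <- data.table(
  Name  = c("sample1", "sample2", "sample3"),
  Value = c(1, 0.5, 2)
)

# Update join: for each df1 row that matches df2 on Name, overwrite Value.
# i.Value refers to the Value column of df2 (the table inside the brackets).
df1[df2, Value := Value * i.Value, on = "Name"]
```

The assignment happens by reference, so df1 is modified without being copied, and rows of df1 with no match in df2 keep their original Value.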

We can use inner_join to join the two data frames and then multiply the corresponding Value columns with each other. Using dplyr this can be done as:

library(dplyr)

inner_join(df1, df2, by = 'Name') %>%
   mutate(Value = Value.x * Value.y) %>%
   select(names(df1))

#     Name Value
#1 sample1    50
#2 sample1   100
#3 sample1    75
#4 sample2    50
#5 sample2   100
#6 sample2   150
#7 sample3    50
#8 sample3   100
#9 sample3    80

Data

df1 <- structure(list(Name = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 
3L, 3L), .Label = c("sample1", "sample2", "sample3"), class = "factor"), 
Value = c(50L, 100L, 75L, 100L, 200L, 300L, 25L, 50L, 40L
)), class = "data.frame", row.names = c(NA, -9L))

df2 <- structure(list(Name = structure(1:3, .Label = c("sample1", "sample2", 
"sample3"), class = "factor"), Value = c(1, 0.5, 2)), class = "data.frame", 
row.names = c(NA, -3L))