R multiply unequal dataframes based on specific column values
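Updated the question to include the extra columns that are carried over from df1 to the output, and to note that the main df has 7 million rows.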
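I have two data frames, similar to this Pandas SO question.
I need to multiply them wherever the names match, but I'm not sure how to do this cleanly. Is there an apply function that makes this easy?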
DF1: (has 16 extra columns of data and 7 million rows)
Data1 Data2 Name Value
aa bb sample1 50
ff ff sample1 100
ef fd sample1 75
ff df sample2 100
bbf ad3 sample2 200
dd a sample2 300
33 3rf sample3 25
ddd dd sample3 50
dd dd sample3 40
DF2:
Name Value
sample1 1
sample2 0.5
sample3 2
Output: (also has the 16 extra columns, not shown)
Data1 Data2 Name Value
aa bb sample1 50
ff ff sample1 100
ef fd sample1 75
ff df sample2 50
bbf ad3 sample2 100
dd a sample2 150
33 3rf sample3 50
ddd dd sample3 100
dd dd sample3 80
You can try the base R code below, using merge:
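# merge() returns rows sorted by Name, so this relies on DF1 already being sorted by Name
# and on every Name in DF2 also occurring in DF1 (all = TRUE would otherwise add extra rows)
DF1$Value <- do.call(`*`, merge(DF1[c("Name", "Value")], DF2, all = TRUE, by = "Name")[-1])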
which gives
> DF1
Data1 Data2 Name Value
1 aa bb sample1 50
2 ff ff sample1 100
3 ef fd sample1 75
4 ff df sample2 50
5 bbf ad3 sample2 100
6 dd a sample2 150
7 33 3rf sample3 50
8 ddd dd sample3 100
9 dd dd sample3 80
Data
DF1 <- structure(list(Data1 = c("aa", "ff", "ef", "ff", "bbf", "dd",
"33", "ddd", "dd"), Data2 = c("bb", "ff", "fd", "df", "ad3",
"a", "3rf", "dd", "dd"), Name = c("sample1", "sample1", "sample1",
"sample2", "sample2", "sample2", "sample3", "sample3", "sample3"
), Value = c(50L, 100L, 75L, 100L, 200L, 300L, 25L, 50L, 40L)), class = "data.frame", row.names = c(NA,
-9L))
DF2 <- structure(list(Name = c("sample1", "sample2", "sample3"), Value = c(1,
0.5, 2)), class = "data.frame", row.names = c(NA, -3L))
The most direct approach is to use match to get the row indices of df2 in the right order.
df2$Value[match(df1$Name, df2$Name)] * df1$Value
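To overwrite Value in df1 rather than just compute the products, the same match() lookup can be assigned back. Only one numeric vector is computed, so the other columns (the 16 extra ones in the question) are left as they are, and no apply() loop is needed because match() is vectorised. The data below is a small made-up example, with `Extra` standing in for the additional columns.

```r
# Hypothetical inputs; 'Extra' stands in for the 16 additional columns.
df1 <- data.frame(
  Name  = c("sample1", "sample2", "sample2", "sample3"),
  Value = c(50, 100, 200, 25),
  Extra = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Name  = c("sample3", "sample1", "sample2"),  # order need not match df1
  Value = c(2, 1, 0.5),
  stringsAsFactors = FALSE
)

# For each df1$Name, match() returns the row of df2 holding that name.
idx <- match(df1$Name, df2$Name)

# Overwrite Value in place; every other column of df1 is untouched.
df1$Value <- df1$Value * df2$Value[idx]
print(df1)
```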
You can also convert df2 into a vector named by its Name column, and then use df1's Name column to pull values out of it:
df1$Value * setNames(df2$Value, df2$Name)[df1$Name]
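Two details of the named-vector lookup are worth checking, sketched below with made-up data: a Name that is missing from df2 (here the hypothetical `sample4`) indexes as NA, so the product for that row is NA; and the result carries names from the lookup vector, which unname() removes. If Name is stored as a factor, as in the dplyr answer's data, it should be converted with as.character() before indexing, because R indexes by a factor's integer codes rather than its labels.

```r
# Hypothetical inputs; sample4 has no multiplier in df2.
df1 <- data.frame(
  Name  = c("sample1", "sample2", "sample4"),
  Value = c(50, 200, 10),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Name  = c("sample1", "sample2", "sample3"),
  Value = c(1, 0.5, 2),
  stringsAsFactors = FALSE
)

# Named lookup vector: names are sample IDs, values are multipliers.
factors <- setNames(df2$Value, df2$Name)

# as.character() guards against factor codes being used as positions.
# A name that is absent from 'factors' yields NA for that row.
# unname() drops the names the lookup would otherwise carry along.
result <- df1$Value * unname(factors[as.character(df1$Name)])
print(result)
```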
You can use the data.table package:
library(data.table)
setDT(df1)[setDT(df2), Value_new := Value * i.Value, on = "Name"]
# Data1 Data2 Name Value Value_new
# 1: aa bb sample1 50 50
# 2: ff ff sample1 100 100
# 3: ef fd sample1 75 75
# 4: ff df sample2 100 50
# 5: bbf ad3 sample2 200 100
# 6: dd a sample2 300 150
# 7: 33 3rf sample3 25 50
# 8: ddd dd sample3 50 100
# 9: dd dd sample3 40 80
We can join the two data frames with left_join or inner_join and then multiply the corresponding Value columns together. With dplyr this can be done as:
library(dplyr)
inner_join(df1, df2, by = 'Name') %>%
mutate(Value = Value.x * Value.y) %>%
select(names(df1))
# Name Value
#1 sample1 50
#2 sample1 100
#3 sample1 75
#4 sample2 50
#5 sample2 100
#6 sample2 150
#7 sample3 50
#8 sample3 100
#9 sample3 80
Data
df1 <- structure(list(Name = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L,
3L, 3L), .Label = c("sample1", "sample2", "sample3"), class = "factor"),
Value = c(50L, 100L, 75L, 100L, 200L, 300L, 25L, 50L, 40L
)), class = "data.frame", row.names = c(NA, -9L))
df2 <- structure(list(Name = structure(1:3, .Label = c("sample1", "sample2",
"sample3"), class = "factor"), Value = c(1, 0.5, 2)), class = "data.frame",
row.names = c(NA, -3L))
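The answer mentions both left_join and inner_join; they differ only for df1 rows whose Name has no match in df2. The sketch below illustrates that difference with base R's merge(), used here as a stand-in for the dplyr verbs (`all.x = TRUE` gives left-join semantics, the default gives inner-join semantics), on made-up data where `sample4` has no multiplier.

```r
# Hypothetical inputs; sample4 has no multiplier in df2.
df1 <- data.frame(Name = c("sample1", "sample2", "sample4"),
                  Value = c(50, 200, 10), stringsAsFactors = FALSE)
df2 <- data.frame(Name = c("sample1", "sample2", "sample3"),
                  Value = c(1, 0.5, 2), stringsAsFactors = FALSE)

# Inner join: rows of df1 without a matching Name are dropped.
inner <- merge(df1, df2, by = "Name")
# Left join: every row of df1 is kept; a missing multiplier gives NA.
left <- merge(df1, df2, by = "Name", all.x = TRUE)

# Both joins suffix the clashing Value columns as Value.x and Value.y.
inner$Value <- inner$Value.x * inner$Value.y
left$Value <- left$Value.x * left$Value.y
print(nrow(inner))
print(nrow(left))
```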