Conditional subtraction Part 2

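I have a large data.frame (TOTAL) containing some value columns (cols11-16). From each of these values I need to subtract a base that has first been multiplied by a value chosen according to two conditions in the data.frame (the channel and the hour).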

The data.frame (TOTAL) looks something like this:

Channel    Hour      Category cols11 cols12 cols13 cols14 cols15 base
TV1        01:00:00  New      2      5      4      5      6      2.4
TV5        23:00:00  Old      1      5      3      9      7      1.8
TV1        02:00:00  New      8      7      9      2      4      5.4

There are 4 different channels and 24 different hours (00:00:00-23:00:00).

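I also have four other vectors holding the conditioned variables, which are multiplied by the base depending on the hour and the channel. So for each channel I have a vector like this: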

TV1Slope:
TV1Slope00 TV1Slope01 TV1Slope02.. TV1Slope23
 0.0012      0.0015    0.013       0.0009

TV5Slope:
TV5Slope00 TV5Slope01 TV5Slope02.. TV5Slope23
0.0032      0.0023    0.016       0.002

TOTAL$Uplift0 <- (TOTAL$cols11 - TOTAL$base * conditionedvariable)
TOTAL$Uplift1 <- (TOTAL$cols12 - TOTAL$base * conditionedvariable)
TOTAL$Uplift2 <- (TOTAL$cols13 - TOTAL$base * conditionedvariable)
TOTAL$Uplift3 <- (TOTAL$cols14 - TOTAL$base * conditionedvariable)
TOTAL$Uplift4 <- (TOTAL$cols15 - TOTAL$base * conditionedvariable)

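How can I make R pick the conditioned variable according to these conditions?

For example:

For TOTAL$Uplift0 I would get:

 cols11 - base * conditionedvariable

For the first row, where Channel is TV1 and Hour is 01:00:00: 2 - 2.4 * 0.0015

For the second row, where Channel is TV5 and Hour is 23:00:00: 1 - 1.8 * 0.002

For the third row, where Channel is TV1 and Hour is 02:00:00: 8 - 5.4 * 0.013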

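We paste the 'Channel' column together with a substring of the 'Hour' column (its first two characters) to get 'nm1', and concatenate the 'TV1Slope' and 'TV5Slope' vectors into 'TV15'. We then match 'nm1' against the names of 'TV15' after removing the 'Slope' substring with sub, which gives the corresponding 'TV15' value for each row. Using grep, we subset the columns whose names start with 'cols', do the calculation, and assign the result to new columns ('nm2').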

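# key per row: channel plus two-digit hour, e.g. "TV101"
nm1 <- with(TOTAL, paste0(Channel, substr(Hour, 1,2)))
# all slopes in one named vector
TV15 <- c(TV1Slope, TV5Slope)
# strip 'Slope' from the names so they line up with nm1, then look up each row's slope
val <- TV15[match(nm1, sub('Slope', '', names(TV15)))]
# positions of the value columns
indx <- grep('^cols', names(TOTAL))
# new column names: Uplift0, Uplift1, ...
nm2 <- paste0('Uplift',seq_along(indx)-1)
TOTAL[nm2] <- TOTAL[indx]-(TOTAL$base*val)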
TOTAL
#  Channel     Hour Category cols11 cols12 cols13 cols14 cols15 base   Uplift0
#1     TV1 01:00:00      New      2      5      4      5      6  2.4 1.9946026
#2     TV5 23:00:00      Old      1      5      3      9      7  1.8 0.9823184
#3     TV1 02:00:00      New      8      7      9      2      4  5.4 7.9619720
#   Uplift1  Uplift2  Uplift3  Uplift4
#1 4.994603 3.994603 4.994603 5.994603
#2 4.982318 2.982318 8.982318 6.982318
#3 6.961972 8.961972 1.961972 3.961972
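
The output above is based on randomly generated example slopes, so the numbers differ from the hand calculation in the question. As a minimal sketch for checking the lookup step on its own, the snippet below uses only the three illustrative slope values quoted in the question and indexes the slope vector directly by name instead of using match and sub. The names `slopes`, `rows`, `key` and `uplift0` are assumptions made for this illustration and are not part of the original code.

```r
# Illustrative slopes from the question (only the three entries needed here)
slopes <- c(TV1Slope01 = 0.0015, TV1Slope02 = 0.013, TV5Slope23 = 0.002)

# The three example rows, restricted to the columns used in the calculation
rows <- data.frame(Channel = c("TV1", "TV5", "TV1"),
                   Hour    = c("01:00:00", "23:00:00", "02:00:00"),
                   cols11  = c(2, 1, 8),
                   base    = c(2.4, 1.8, 5.4),
                   stringsAsFactors = FALSE)

# Build the full slope name for each row, e.g. "TV1Slope01"
key <- paste0(rows$Channel, "Slope", substr(rows$Hour, 1, 2))

# Look up the slope by name and apply cols11 - base * slope
uplift0 <- rows$cols11 - rows$base * unname(slopes[key])
uplift0
# [1] 1.9964 0.9964 7.9298
```

These values agree with the three expressions written out in the question, which suggests the channel-plus-hour key selects the intended slope for each row.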

NOTE: Reproducible example 'TV1Slope' and 'TV5Slope' vectors were created for this answer.

data

TOTAL <- structure(list(Channel = c("TV1", "TV5", "TV1"), Hour = c("01:00:00", 
"23:00:00", "02:00:00"), Category = c("New", "Old", "New"), cols11 = c(2L, 
1L, 8L), cols12 = c(5L, 5L, 7L), cols13 = c(4L, 3L, 9L), cols14 = c(5L, 
9L, 2L), cols15 = c(6L, 7L, 4L), base = c(2.4, 1.8, 5.4)), .Names = c("Channel", 
"Hour", "Category", "cols11", "cols12", "cols13", "cols14", "cols15", 
"base"), class = "data.frame", row.names = c(NA, -3L))

set.seed(24)
TV1Slope <- setNames(runif(24)/100, sprintf('TV1Slope%02d', 0:23))
set.seed(29)
TV5Slope <- setNames(runif(24)/100, sprintf('TV5Slope%02d', 0:23))
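
The question mentions four channels, while the example combines only the TV1 and TV5 slope vectors. If a row's channel and hour have no entry in the combined vector, a name-based lookup returns NA and the uplift columns for that row silently become NA. One possible safeguard, sketched here under that assumption, is a small wrapper that stops with a message listing the missing keys. `lookup_slope` is a hypothetical helper written for this illustration; it is not part of the original answer or of any package.

```r
# Hypothetical helper: return the slope for each (channel, hour) pair,
# and stop if any pair has no entry in the named slope vector.
lookup_slope <- function(channel, hour, slopes) {
  # Full slope name per row, e.g. "TV1Slope01"
  key <- paste0(channel, "Slope", substr(hour, 1, 2))
  val <- slopes[key]
  not_found <- is.na(val)
  if (any(not_found)) {
    stop("no slope found for: ",
         paste(unique(key[not_found]), collapse = ", "))
  }
  unname(val)
}

# Small illustrative slope vector (values taken from the question)
example_slopes <- c(TV1Slope01 = 0.0015, TV5Slope23 = 0.002)

# A pair that is present returns its slope
found <- lookup_slope("TV1", "01:00:00", example_slopes)

# A pair that is absent raises an error; capture the message for inspection
msg <- tryCatch(lookup_slope("TV2", "01:00:00", example_slopes),
                error = function(e) conditionMessage(e))
```

With the data from this answer, the call would look like `lookup_slope(TOTAL$Channel, TOTAL$Hour, c(TV1Slope, TV5Slope))`, and the result could be used in place of 'val'.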