Conditional subtraction Part 2
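I have a large data.frame (TOTAL) with some values (cols11-16) from which I need to subtract a base multiplied by a value, where that value depends on two conditions in the data.frame.
The data.frame (TOTAL) looks roughly like this: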
Channel Hour Category cols11 cols12 cols13 cols14 cols15 base
TV1 01:00:00 New 2 5 4 5 6 2.4
TV5 23:00:00 Old 1 5 3 9 7 1.8
TV1 02:00:00 New 8 7 9 2 4 5.4
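There are 4 different channels and 24 different hours (00:00:00-23:00:00).
I also have four other vectors holding the conditioned variables, which need to be multiplied with the base depending on hour and channel. So for each channel I have a vector like this: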
TV1Slope:
TV1Slope00 TV1Slope01 TV1Slope02.. TV1Slope23
0.0012 0.0015 0.013 0.0009
TV5Slope:
TV5Slope00 TV5Slope01 TV5Slope02.. TV5Slope23
0.0032 0.0023 0.016 0.002
TOTAL$Uplift0 <- (TOTAL$cols11 - TOTAL$base * conditionedvariable)
TOTAL$Uplift1 <- (TOTAL$cols12 - TOTAL$base * conditionedvariable)
TOTAL$Uplift2 <- (TOTAL$cols13 - TOTAL$base * conditionedvariable)
TOTAL$Uplift3 <- (TOTAL$cols14 - TOTAL$base * conditionedvariable)
TOTAL$Uplift4 <- (TOTAL$cols15 - TOTAL$base * conditionedvariable)
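How can I get R to pick the conditioned variable based on these two conditions?
For example, for TOTAL$Uplift0 I want cols11 - base * conditionedvariable, which gives:
First row, Channel TV1 and Hour 01:00:00: 2 - 2.4 * 0.0015
Second row, Channel TV5 and Hour 23:00:00: 1 - 1.8 * 0.002
Third row, Channel TV1 and Hour 02:00:00: 8 - 5.4 * 0.013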
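The three expected values can be checked by hand with a short, self-contained sketch. The slopes are the illustrative ones quoted in the question, typed in directly rather than looked up, and `slope` and `uplift0` are names assumed here for illustration.

```r
# Illustrative values copied from the three example rows in the question
cols11 <- c(2, 1, 8)
base   <- c(2.4, 1.8, 5.4)

# Slopes picked by hand for TV1 at 01:00, TV5 at 23:00 and TV1 at 02:00
slope <- c(0.0015, 0.002, 0.013)

# Element-wise: each row uses its own base and its own slope
uplift0 <- cols11 - base * slope
print(uplift0)
```

The open question is how to fill `slope` automatically from Channel and Hour instead of typing it in.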
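We paste the 'Channel' and a substring of the 'Hour' column together ('nm1'), concatenate the 'TV1Slope' and 'TV5Slope' vectors ('TV15'), match 'nm1' against the names of 'TV15' after removing the 'Slope' substring with sub, and get the corresponding 'TV15' values. Then we subset the columns whose names start with 'cols' using grep, do the calculation, and assign the result to new columns ('nm2').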
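# Lookup key per row: channel plus two-digit hour, e.g. "TV101"
nm1 <- with(TOTAL, paste0(Channel, substr(Hour, 1, 2)))
# All slopes in one named vector (names like "TV1Slope01")
TV15 <- c(TV1Slope, TV5Slope)
# Drop 'Slope' from the names so they line up with the keys, then look up
val <- TV15[match(nm1, sub('Slope', '', names(TV15)))]
# Positions of the value columns and names for the result columns
indx <- grep('^cols', names(TOTAL))
nm2 <- paste0('Uplift', seq_along(indx) - 1)
# Subtract base * slope from every 'cols' column at once
TOTAL[nm2] <- TOTAL[indx] - (TOTAL$base * val)
TOTAL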
# Channel Hour Category cols11 cols12 cols13 cols14 cols15 base Uplift0
#1 TV1 01:00:00 New 2 5 4 5 6 2.4 1.9946026
#2 TV5 23:00:00 Old 1 5 3 9 7 1.8 0.9823184
#3 TV1 02:00:00 New 8 7 9 2 4 5.4 7.9619720
# Uplift1 Uplift2 Uplift3 Uplift4
#1 4.994603 3.994603 4.994603 5.994603
#2 4.982318 2.982318 8.982318 6.982318
#3 6.961972 8.961972 1.961972 3.961972
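To see what the matching step does, the sketch below repeats it with the illustrative slope values from the question instead of random ones. The two slope vectors are shortened stand-ins that hold only the hours quoted in the question, not the full 24 elements, and the `stopifnot()` guard is an addition for illustration rather than part of the answer.

```r
TOTAL <- data.frame(
  Channel = c("TV1", "TV5", "TV1"),
  Hour    = c("01:00:00", "23:00:00", "02:00:00"),
  cols11  = c(2, 1, 8),
  base    = c(2.4, 1.8, 5.4),
  stringsAsFactors = FALSE
)

# Shortened stand-ins: only the slopes quoted in the question
TV1Slope <- c(TV1Slope00 = 0.0012, TV1Slope01 = 0.0015,
              TV1Slope02 = 0.013,  TV1Slope23 = 0.0009)
TV5Slope <- c(TV5Slope00 = 0.0032, TV5Slope01 = 0.0023,
              TV5Slope02 = 0.016,  TV5Slope23 = 0.002)

nm1  <- with(TOTAL, paste0(Channel, substr(Hour, 1, 2)))  # "TV101" "TV523" "TV102"
TV15 <- c(TV1Slope, TV5Slope)
keys <- sub("Slope", "", names(TV15))                     # "TV100" "TV101" ...
pos  <- match(nm1, keys)

# Guard (assumption, not in the answer): match() gives NA for unknown keys
stopifnot(!anyNA(pos))

val <- unname(TV15[pos])
TOTAL$Uplift0 <- TOTAL$cols11 - TOTAL$base * val
print(TOTAL)
```

With these inputs the lookup picks 0.0015, 0.002 and 0.013, which are the slopes the question expects for the three rows.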
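NOTE: 'TV1Slope' and 'TV5Slope' were created as reproducible examples from random numbers, so the output above does not use the slope values quoted in the question.
data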
TOTAL <- structure(list(Channel = c("TV1", "TV5", "TV1"), Hour = c("01:00:00",
"23:00:00", "02:00:00"), Category = c("New", "Old", "New"), cols11 = c(2L,
1L, 8L), cols12 = c(5L, 5L, 7L), cols13 = c(4L, 3L, 9L), cols14 = c(5L,
9L, 2L), cols15 = c(6L, 7L, 4L), base = c(2.4, 1.8, 5.4)), .Names = c("Channel",
"Hour", "Category", "cols11", "cols12", "cols13", "cols14", "cols15",
"base"), class = "data.frame", row.names = c(NA, -3L))
set.seed(24)
TV1Slope <- setNames(runif(24)/100, sprintf('TV1Slope%02d', 0:23))
set.seed(29)
TV5Slope <- setNames(runif(24)/100, sprintf('TV5Slope%02d', 0:23))
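As an alternative sketch, and not the approach used in the answer, the slopes could be kept in a channel-by-hour matrix and looked up with a two-column character index. The matrix name `slopes`, the seed and the random values are assumptions for illustration, and only two of the four channels are included.

```r
TOTAL <- data.frame(
  Channel = c("TV1", "TV5", "TV1"),
  Hour    = c("01:00:00", "23:00:00", "02:00:00"),
  cols11  = c(2L, 1L, 8L),
  cols12  = c(5L, 5L, 7L),
  base    = c(2.4, 1.8, 5.4),
  stringsAsFactors = FALSE
)

# Hypothetical slope table: one row per channel, one column per hour "00".."23"
set.seed(1)
slopes <- matrix(runif(2 * 24) / 100, nrow = 2,
                 dimnames = list(c("TV1", "TV5"), sprintf("%02d", 0:23)))

# A two-column character matrix selects one (row name, column name) pair per row
val <- slopes[cbind(TOTAL$Channel, substr(TOTAL$Hour, 1, 2))]

# Same final step as in the answer: apply to every 'cols' column at once
indx <- grep("^cols", names(TOTAL))
nm2  <- paste0("Uplift", seq_along(indx) - 1)
TOTAL[nm2] <- TOTAL[indx] - TOTAL$base * val
print(TOTAL)
```

This keeps channel and hour as separate dimensions, which avoids building and parsing pasted names, at the cost of first reshaping the per-channel vectors into a matrix.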