R conditional apply does not work equally for all records
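I have the following data frame. Change_Month is the month before which all values for an Adv fall into category N, and from which (inclusive) they fall into category V.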
Adv_Code Change_Month Change_Dt April May June July August September October November December January February March
A201 April 7/4/2017 0 0 0 1 0 0 0 0 0 0 0 0
A198 April 7/4/2017 1 1 0 0 3 0 0 0 0 0 0 0
S1212 May 16/04/2017 0 0 0 0 0 0 0 0 0 0 0 1
S1213 January 4/1/2018 1 0 0 1 1 1 1 0 1 1 2 1
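So for A201, all values fall into category V. The same holds for A198.
For S1212, the April value falls into category N and the rest into category V.
Similarly, for S1213, April through December fall into category N and January through March into category V.
So the expected output is: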
Adv_Code Change_Month Change_Dt April May June July August September October November December January February March N_OPN V_OPN
A201 April 7/4/2017 0 0 0 1 0 0 0 0 0 0 0 0 0 1
A198 April 7/4/2017 1 1 0 0 3 0 0 0 0 0 0 0 0 5
S1212 May 16/04/2017 0 0 0 0 0 0 0 0 0 0 0 1 0 1
S1213 January 4/1/2018 1 0 0 1 1 1 1 0 1 1 2 1 6 4
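As a quick check on the rule, the S1213 row can be split by hand. This is only a sketch: the `months` vector is typed in from the table above and the names `change`, `n_opn` and `v_opn` are made up for the illustration.

```r
# Month values for S1213, April through March, copied from the table above
months <- c(April = 1, May = 0, June = 0, July = 1, August = 1, September = 1,
            October = 1, November = 0, December = 1, January = 1,
            February = 2, March = 1)

# Position of the change month among the month columns
change <- match("January", names(months))

n_opn <- sum(months[seq_len(change - 1)])     # April through December
v_opn <- sum(months[change:length(months)])   # January through March
c(N_OPN = n_opn, V_OPN = v_opn)
```

The two sums come out as 6 and 4, matching the expected N_OPN and V_OPN for that row.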
I tried the following code:
col <- 4 #Column number from where the months start
df[c("N_OPN", "V_OPN")] <- t(apply(df, 1, function(x) {
  inds <- grep(x[["Change_Month"]], names(x))
  if (as.numeric(substr(x["Change_Dt"], 1, 2)) > 15)
    c(sum(as.numeric(x[col:pmax(col, inds - 1)])),
      sum(as.numeric(x[inds:ncol(df)])))
  else
    c(sum(as.numeric(x[col:inds])),
      sum(as.numeric(x[pmin(ncol(df), inds + 1):ncol(df)])))
}))
However, this gives:
Adv_Code Change_Month Change_Dt April May June July August September October November December January February March N_OPN V_OPN
A201 April 7/4/2017 0 0 0 1 0 0 0 0 0 0 0 0 0 1
A198 April 7/4/2017 1 1 0 0 3 0 0 0 0 0 0 0 1 5
S1212 May 16/04/2017 0 0 0 0 0 0 0 0 0 0 0 1 0 1
S1213 January 4/1/2018 1 0 0 1 1 1 1 0 1 1 2 1 7 3
I am not sure why this happens.
Can someone help me solve this?
Below is the code to reproduce the data frame:
df <- structure(list(Adv_Code = structure(c(2L, 1L, 3L,4L), .Label = c("A198","A201", "S1212","S1213"), class = "factor"),
Change_Dt = structure(c(2L,3L, 1L,1L), .Label = c("07/04/2017", "07/04/2017", "16/04/2017","4/1/2018"), class = "factor"),
Change_Month = structure(1:4, .Label = c("April", "April","May","January"), class = "factor"), April = c(0L, 1L, 0L,1L),
May = c(0L, 1L, 0L,0L), June = c(0L, 0L, 0L,0L), July = c(1L, 0L,0L,1L),
August = c(0L, 3L, 0L,1L), September = c(0L,0L, 0L,1L), October = c(0L, 0L, 0L,1L), November = c(0L,0L, 0L,0L),
December = c(0L, 0L, 0L,1L), January = c(0L,0L, 0L,1L), February = c(0L, 0L, 0L,2L), March = c(0L,0L, 1L,1L)), class = "data.frame", row.names = c(NA, -4L))
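One thing that may explain the mismatch, assuming the attempt was run on exactly the data reproduced above: the index ranges in the two branches can include the change month itself in the N sum. The sketch below only replays the index arithmetic from the attempt; `last`, `n_range_late`, `n_range_early` and `v_range_early` are names introduced here for illustration.

```r
col  <- 4    # first month column, as in the question
last <- 15   # ncol(df) before N_OPN and V_OPN are added

# Day > 15 branch with Change_Month = "April": grep() returns 4
inds <- 4
n_range_late <- col:pmax(col, inds - 1)
n_range_late     # 4, so April itself is summed into N_OPN

# Day <= 15 branch with Change_Month = "January": grep() returns 13
inds <- 13
n_range_early <- col:inds
v_range_early <- pmin(last, inds + 1):last
n_range_early    # 4 to 13, so January is summed into N_OPN
v_range_early    # 14 to 15, so only February and March reach V_OPN
```

If the expected output is meant to ignore the day of the month, as the tables above suggest, both ranges put one month too many into N, which would be consistent with the 1 for A198 and the 7 and 3 for S1213.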
An option with a for loop is
df1 <- df[4:ncol(df)]                       # month columns only
j1 <- match(df$Change_Month, names(df1))    # position of each row's change month
N_OPN <- numeric(nrow(df1))
V_OPN <- numeric(nrow(df1))
for (i in seq_len(nrow(df1))) {
  j2 <- j1[i] - 1                           # number of months before the change month
  N_OPN[i] <- if (j2 == 0) 0 else sum(df1[i, seq_len(j2)])
  V_OPN[i] <- sum(df1[i, j1[i]:ncol(df1)])  # change month through the last month
}
df[c("N_OPN", "V_OPN")] <- list(N_OPN, V_OPN)
df
# Adv_Code Change_Dt Change_Month April May June July August September October November December January
#1 A201 07/04/2017 April 0 0 0 1 0 0 0 0 0 0
#2 A198 16/04/2017 April 1 1 0 0 3 0 0 0 0 0
#3 S1212 07/04/2017 May 0 0 0 0 0 0 0 0 0 0
#4 S1213 07/04/2017 January 1 0 0 1 1 1 1 0 1 1
# February March N_OPN V_OPN
#1 0 0 0 1
#2 0 0 0 5
#3 0 1 0 1
#4 2 1 6 4
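The same split can probably be done without a loop by comparing column positions against the change-month position. This is a sketch under assumptions rather than a drop-in replacement: it rebuilds the month values as a plain matrix named `vals` (typed in from the question's table), uses made-up names `change_month` and `is_v`, and, like the loop above, ignores the day of the month in Change_Dt.

```r
# Month values from the question, one row per Adv_Code
months <- c("April", "May", "June", "July", "August", "September",
            "October", "November", "December", "January", "February", "March")
vals <- matrix(c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
                 1, 1, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0,
                 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
                 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 2, 1),
               nrow = 4, byrow = TRUE, dimnames = list(NULL, months))
change_month <- c("April", "April", "May", "January")

# Position of each row's change month among the month columns
j1 <- match(change_month, months)

# TRUE where the column is at or after the row's change month;
# j1 is recycled down each column, so row i is compared with j1[i]
is_v <- col(vals) >= j1

V_OPN <- rowSums(vals * is_v)      # change month onward
N_OPN <- rowSums(vals) - V_OPN     # everything before the change month
cbind(N_OPN, V_OPN)
```

On a real data frame the matrix would come from something like `as.matrix(df[4:ncol(df)])`, with `j1` computed as in the loop version.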