Summarizing column values by groups created based on conditionals
I have the following dataset:
Adv_Code Change_Dt Change_Month April_OPN May_OPN June_OPN July_OPN August_OPN September_OPN October_OPN November_OPN December_OPN January_OPN February_OPN March_OPN
A201 12/04/2018 April 0 0 1 0 0 0 0 0 0 0 0 0
A198 27/07/2018 August 2 0 0 1 2 0 5 0 0 0 0 0
S1212 10/11/2018 November 0 3 4 0 0 3 0 1 0 0 0 0
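I need to split the monthly transactions into N and V based on Change_Month and Change_Dt.
When the date is after the 15th of the month, Change_Month falls in the following month; otherwise it is the same month as Change_Dt.
For example, for A198, Change_Month is August, so April_OPN through July_OPN fall under category N and the remaining months stay under V.
For S1212, since the date is before the 15th, the OPN values for April through October fall under N and the remaining months stay under V.
Expected output: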
Adv_Code Change_Dt Change_Month N_OPN V_OPN
A201 12/04/2018 April 0 1
A198 27/07/2018 August 3 7
S1212 10/11/2018 November 10 1
Can someone help me solve this?
Below is the code to reproduce the dataset:
Adv_Code <- c('A201','A198','S1212')
# Dates are day/month/year, so the format has to be given explicitly
Change_Dt <- as.Date(c('12/04/2018','27/07/2018','10/11/2018'), format = '%d/%m/%Y')
Change_Month <- c('April','August','November')
# Column names and values follow the table shown above
April_OPN <- c(0,2,0)
May_OPN <- c(0,0,3)
June_OPN <- c(1,0,4)
July_OPN <- c(0,1,0)
August_OPN <- c(0,2,0)
September_OPN <- c(0,0,3)
October_OPN <- c(0,5,0)
November_OPN <- c(0,0,1)
December_OPN <- c(0,0,0)
January_OPN <- c(0,0,0)
February_OPN <- c(0,0,0)
March_OPN <- c(0,0,0)
df <- data.frame(Adv_Code,Change_Dt,Change_Month,April_OPN,May_OPN,June_OPN,July_OPN,August_OPN,September_OPN,October_OPN,November_OPN,December_OPN,January_OPN,February_OPN,March_OPN)
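We can use apply with MARGIN = 1 (row-wise). For each row, store in inds the column number where Change_Month occurs. Take the first two characters of Change_Dt, check whether that day is greater than 15, and depending on the result split the month values into two parts, sum each part, and add the sums as new columns.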
col <- 4  # column number where the month columns start
df[c("N_OPN", "V_OPN")] <- t(apply(df, 1, function(x) {
  # column whose name contains this row's Change_Month
  inds <- grep(x[["Change_Month"]], names(x))
  # day > 15: N = months before Change_Month, V = Change_Month onwards
  # otherwise: N = months up to and including Change_Month, V = the rest
  if (as.numeric(substr(x["Change_Dt"], 1, 2)) > 15)
    c(sum(as.numeric(x[col:pmax(col, inds - 1)])),
      sum(as.numeric(x[inds:ncol(df)])))
  else
    c(sum(as.numeric(x[col:inds])),
      sum(as.numeric(x[pmin(ncol(df), inds + 1):ncol(df)])))
}))
df[c(1:3, 16, 17)]
#  Adv_Code  Change_Dt Change_Month N_OPN V_OPN
#1     A201 12/04/2018        April     0     1
#2     A198 27/07/2018       August     3     7
#3    S1212 10/11/2018     November    11     0
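Note that the S1212 row above (11 and 0) differs from the expected output in the question (10 and 1). The following is a minimal sketch of one alternative reading, not the approach used above: it assumes that Change_Month already reflects the 15th-of-the-month rule, so N is simply the sum of the months strictly before Change_Month and V is the sum from Change_Month onwards. The helper names month_cols, vals, pos and before are introduced here for illustration only, and the data frame is rebuilt from the table in the question so that the block runs on its own.

```r
# Sample data copied from the table in the question
df <- data.frame(
  Adv_Code     = c("A201", "A198", "S1212"),
  Change_Dt    = c("12/04/2018", "27/07/2018", "10/11/2018"),
  Change_Month = c("April", "August", "November"),
  April_OPN = c(0, 2, 0), May_OPN = c(0, 0, 3), June_OPN = c(1, 0, 4),
  July_OPN = c(0, 1, 0), August_OPN = c(0, 2, 0), September_OPN = c(0, 0, 3),
  October_OPN = c(0, 5, 0), November_OPN = c(0, 0, 1), December_OPN = c(0, 0, 0),
  January_OPN = c(0, 0, 0), February_OPN = c(0, 0, 0), March_OPN = c(0, 0, 0),
  stringsAsFactors = FALSE
)

# Month columns, in the April-to-March order used by the data
month_cols <- grep("_OPN$", names(df), value = TRUE)
vals <- as.matrix(df[month_cols])

# Position of each row's Change_Month among the month columns
pos <- match(paste0(df$Change_Month, "_OPN"), month_cols)

# TRUE where a column lies strictly before that row's Change_Month
# (pos is recycled down the rows, so row i is compared with pos[i])
before <- col(vals) < pos

# Assumption: N = months before Change_Month, V = Change_Month onwards
df$N_OPN <- rowSums(vals * before)
df$V_OPN <- rowSums(vals * (!before))

df[c("Adv_Code", "Change_Dt", "Change_Month", "N_OPN", "V_OPN")]
```

Under that assumption the three sample rows give N_OPN = 0, 3, 10 and V_OPN = 1, 7, 1, which matches the expected output in the question. Because the split is driven by Change_Month alone, the day of Change_Dt is not inspected here.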
Data
df <- structure(list(Adv_Code = structure(c(2L, 1L, 3L), .Label =
c("A198",
"A201", "S1212"), class = "factor"), Change_Dt = structure(c(2L,
3L, 1L), .Label = c("10/11/2018", "12/04/2018", "27/07/2018"), class =
"factor"),
Change_Month = structure(1:3, .Label = c("April", "August",
"November"), class = "factor"), April_OPN = c(0L, 2L, 0L),
May_OPN = c(0L, 0L, 3L), June_OPN = c(1L, 0L, 4L), July_OPN = c(0L,
1L, 0L), August_OPN = c(0L, 2L, 0L), September_OPN = c(0L,
0L, 3L), October_OPN = c(0L, 5L, 0L), November_OPN = c(0L,
0L, 1L), December_OPN = c(0L, 0L, 0L), January_OPN = c(0L,
0L, 0L), February_OPN = c(0L, 0L, 0L), March_OPN = c(0L,
0L, 0L)), class = "data.frame", row.names = c(NA, -3L))
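The rule from the question (a date after the 15th moves Change_Month to the following month) can also be sketched in code, which may be useful when Change_Month is not already present in the data. This is only an illustration: derive_change_month is a hypothetical helper name, and it assumes the dates are dd/mm/yyyy strings (or factors) and that December rolls over to January.

```r
# Hypothetical helper: derive Change_Month from a dd/mm/yyyy date
derive_change_month <- function(change_dt) {
  d   <- as.Date(as.character(change_dt), format = "%d/%m/%Y")
  day <- as.integer(format(d, "%d"))
  mon <- as.integer(format(d, "%m"))
  # After the 15th, move to the next month; 12 wraps around to 1
  mon <- ifelse(day > 15, mon %% 12 + 1, mon)
  # month.name is R's built-in vector of English month names
  month.name[mon]
}

derive_change_month(c("12/04/2018", "27/07/2018", "10/11/2018"))
# [1] "April"    "August"   "November"
```

Using month.name rather than format(d, "%B") keeps the result in English regardless of the session locale, so it lines up with the column names such as August_OPN.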