Change data from wide to long format and create calculated fields in R
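The data below captures the monthly OPN (optimal product number) for each Adv (Adv_Code). Change_Dt captures the month in which the Adv's status changed from A to B.
All OPN before the change month belong to the Adv's A status; all OPN from the change month onward belong to the B status.
Here is the existing data: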
Adv_Code Change_Dt April_OPN May_OPN June_OPN July_OPN Aug_OPN Sep_OPN Oct_OPN Nov_OPN Dec_OPN Jan_OPN Feb_OPN March_OPN
A201 April 0 0 0 0 0 0 0 0 0 0 0 0
A198 July 2 0 0 1 2 0 5 0 0 0 0 0
S1212 Nov 0 3 4 0 0 3 0 1 0 0 0 0
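For reproducibility, the table above can be read into a data frame roughly as follows. This is only a sketch: the object name `df` is an assumption (chosen because the answer's code refers to `df`), and the values are copied from the table as posted.

```r
# Read the posted table from inline text; leading blank line is skipped by read.table
df <- read.table(text = "
Adv_Code Change_Dt April_OPN May_OPN June_OPN July_OPN Aug_OPN Sep_OPN Oct_OPN Nov_OPN Dec_OPN Jan_OPN Feb_OPN March_OPN
A201 April 0 0 0 0 0 0 0 0 0 0 0 0
A198 July 2 0 0 1 2 0 5 0 0 0 0 0
S1212 Nov 0 3 4 0 0 3 0 1 0 0 0 0
", header = TRUE, stringsAsFactors = FALSE)

# Inspect the structure: 3 rows, 2 id columns plus 12 monthly OPN columns
str(df)
```
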
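I want to create the data structure below by converting to long format and deriving Adv_Status from the OPN month: if Month_OPN < Change_Dt, Adv_Status is A, otherwise B.
Month_OPN runs from April to March, i.e. 12 months.
OPN captures the monthly OPN for each Adv, so it is the transpose of the values in the April_OPN to March_OPN columns for each Adv.
Expected output: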
Agent_Code Change_Dt Month_OPN Adv_Status OPN
S1198201 April April B 0
S1198201 April May B 0
S1198201 April June B 0
S1198201 April July B 0
S1198201 April Aug B 0
S1198201 April Sep B 0
S1198201 April Oct B 0
S1198201 April Nov B 0
S1198201 April Dec B 0
S1198201 April Jan B 0
S1198201 April Feb B 0
S1198201 April Mar B 0
S1198203 July April A 2
S1198203 July May A 0
S1198203 July June A 0
S1198203 July July B 1
S1198203 July Aug B 2
S1198203 July Sep B 0
S1198203 July Oct B 5
S1198203 July Nov B 0
S1198203 July Dec B 0
S1198203 July Jan B 0
S1198203 July Feb B 0
S1198203 July Mar B 0
S1198212 Nov April A 0
S1198212 Nov May A 3
S1198212 Nov June A 4
S1198212 Nov July A 0
S1198212 Nov Aug A 0
S1198212 Nov Sep A 3
S1198212 Nov Oct A 0
S1198212 Nov Nov B 1
S1198212 Nov Dec B 0
S1198212 Nov Jan B 0
S1198212 Nov Feb B 0
S1198212 Nov Mar B 0
Can someone help me do this in R?
Consider base R's reshape, with the built-in constants month.name and month.abb for cleanup and month-number calculation:
# RESHAPE WIDE TO LONG (ONE ROW PER Adv_Code AND MONTH)
rdf <- reshape(df, idvar=c("Adv_Code", "Change_Dt"),
               varying=list(names(df)[-(1:2)]), v.names="OPN",
               times=names(df)[-(1:2)], timevar="Month_OPN",
               new.row.names=1:1E5, direction="long")

# CALCULATION
final_df <- within(rdf, {
    # RETRIEVE MONTH NUMBER FROM MONTH NAME/MONTH ABBREV (e.g., July or Jul => 7)
    Change_Dt_Num <- sapply(Change_Dt, function(x) max(which(month.name==x), which(month.abb==x)))

    # REMOVE THE "_OPN" SUFFIX FROM Month_OPN VALUES
    Month_OPN <- sub("_OPN$", "", Month_OPN)

    # RETRIEVE MONTH NUMBER FROM MONTH NAME/MONTH ABBREV (e.g., July or Jul => 7)
    Month_OPN_Num <- sapply(Month_OPN, function(x) max(which(month.name==x), which(month.abb==x)))

    # SHIFT BOTH NUMBERS TO AN APRIL-TO-MARCH YEAR (APRIL => 0, ..., MARCH => 11)
    # SO THAT JAN-MARCH SORT AFTER DEC, THEN ASSIGN "A" BEFORE THE CHANGE MONTH, ELSE "B"
    Adv_Status <- ifelse((Month_OPN_Num - 4) %% 12 < (Change_Dt_Num - 4) %% 12, "A", "B")

    # REMOVE HELPER COLUMNS (USED FOR ABOVE CALCULATION ONLY)
    rm(Change_Dt_Num, Month_OPN_Num)
})

# RE-ORDER ROWS AND RESET ROW NAMES
final_df <- final_df[order(final_df$Adv_Code), ]
row.names(final_df) <- NULL
Output
final_df
# Adv_Code Change_Dt Month_OPN OPN Adv_Status
# 1 A198 July April 2 A
# 2 A198 July May 0 A
# 3 A198 July June 0 A
# 4 A198 July July 1 B
# 5 A198 July Aug 2 B
# 6 A198 July Sep 0 B
# 7 A198 July Oct 5 B
# 8 A198 July Nov 0 B
# 9 A198 July Dec 0 B
# 10 A198 July Jan 0 B
# 11 A198 July Feb 0 B
# 12 A198 July March 0 B
# 13 A201 April April 0 B
# 14 A201 April May 0 B
# 15 A201 April June 0 B
# 16 A201 April July 0 B
# 17 A201 April Aug 0 B
# 18 A201 April Sep 0 B
# 19 A201 April Oct 0 B
# 20 A201 April Nov 0 B
# 21 A201 April Dec 0 B
# 22 A201 April Jan 0 B
# 23 A201 April Feb 0 B
# 24 A201 April March 0 B
# 25 S1212 Nov April 0 A
# 26 S1212 Nov May 3 A
# 27 S1212 Nov June 4 A
# 28 S1212 Nov July 0 A
# 29 S1212 Nov Aug 0 A
# 30 S1212 Nov Sep 3 A
# 31 S1212 Nov Oct 0 A
# 32 S1212 Nov Nov 1 B
# 33 S1212 Nov Dec 0 B
# 34 S1212 Nov Jan 0 B
# 35 S1212 Nov Feb 0 B
# 36 S1212 Nov March 0 B
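The status rule depends on ordering months within an April-to-March year rather than a calendar year, so Jan, Feb and March must compare as later than Dec. A minimal sketch of that comparison in isolation follows; `fiscal_pos` and `adv_status` are hypothetical helper names, not part of the answer's code, and matching on the first three letters of the label is an assumption about how months are spelled in the data.

```r
# Hypothetical helper: position of a month within an April-to-March year
# (1 = April, ..., 9 = Dec, 10 = Jan, 12 = March).
fiscal_pos <- function(x) {
  # Match on the first three letters so "July"/"Jul" and "March"/"Mar" both work
  cal <- match(substr(x, 1, 3), month.abb)  # calendar month number, 1..12
  (cal - 4) %% 12 + 1
}

# Hypothetical helper: "A" strictly before the change month, "B" from it onward
adv_status <- function(month, change) {
  ifelse(fiscal_pos(month) < fiscal_pos(change), "A", "B")
}

adv_status(c("April", "June", "July", "Jan", "March"), "July")
# "A" "A" "B" "B" "B"

# A change month in Jan-March still treats April as earlier in the year
adv_status("April", "Feb")
# "A"
```

Comparing raw calendar month numbers would give the wrong answer for a change month of Jan, Feb or March, which is why the shift by four months is applied to both sides before comparing.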