Put last month's status label in a new column next to the current label
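I'm trying to write a script that compares a set of customers' tags across periods by creating a new column, "Last_month", that shows each customer's tag from the previous period. If this is the customer's first appearance in the dataset, the "Last_month" column should say NA or "New".
I tried several approaches based on the answers to a related question asked a few years ago (), but I can't get the label for the first month set to NA or "New".
If I only needed to compare the last two periods, I would use a left join between those two periods, but here I have a large dataset covering many periods. Is there a shortcut?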
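For reference, the left-join idea can be extended to many periods with a rolling self-join. The sketch below is an illustration, not the asker's code: it assumes dplyr >= 1.1.0 (for `join_by()` and `closest()`) and at most one row per customer per period. Each row is matched to the same customer's closest earlier period, and rows with no earlier period come back as NA.

```r
library(dplyr)  # join_by() and closest() require dplyr >= 1.1.0

df <- tibble(Period = c(202201, 202202, 202202, 202202, 202201, 202203, 202203),
             CustomerID = c(1, 1, 2, 3, 2, 2, 1),
             Tag = c("A", "A", "B", "C", "D", "C", "D"))

# Self-join: for each row (x), find the same customer's most recent row (y)
# whose Period is strictly earlier. A customer's first row has no match,
# so the left join fills Tag_prev with NA.
prev <- df %>%
  left_join(df,
            by = join_by(CustomerID, closest(Period > Period)),
            suffix = c("", "_prev")) %>%
  select(Period, CustomerID, Tag, Last_month = Tag_prev) %>%
  arrange(CustomerID, Period)

# Variant with "New" instead of NA for first appearances
prev_new <- prev %>%
  mutate(Last_month = coalesce(Last_month, "New"))
```

Because `closest()` picks the nearest earlier period rather than the row immediately above, this version does not depend on the data being sorted before the join.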
My best attempt, plus the desired results:
library(tidyverse)
# Example data
df <- tibble(Period = c(202201, 202202, 202202, 202202, 202201, 202203, 202203),
             CustomerID = c(1, 1, 2, 3, 2, 2, 1),
             Tag = c("A", "A", "B", "C", "D", "C", "D"))
# My best result
df %>%
  arrange(CustomerID, Period) %>%
  group_by(CustomerID) %>%
  mutate(Last_month = lag(Tag, default = Tag[1]))
# Desired outcome 1
df2 <- tibble(Period = c(202201, 202202, 202203, 202201, 202202, 202203, 202202),
              Customer = c(1, 1, 1, 2, 2, 2, 3),
              Tag = c("A", "A", "D", "D", "B", "C", "C"),
              Last_month = c("New", "A", "A", "New", "D", "B", "New"))
# Desired outcome 2
df3 <- tibble(Period = c(202201, 202202, 202203, 202201, 202202, 202203, 202202),
              Customer = c(1, 1, 1, 2, 2, 2, 3),
              Tag = c("A", "A", "D", "D", "B", "C", "C"),
              Last_month = c(NA, "A", "A", NA, "D", "B", NA))
Something like this?
library(dplyr)
df %>%
  arrange(CustomerID, Period) %>%
  group_by(CustomerID) %>%
  # lag() returns NA for each customer's first period
  mutate(Last_month = lag(Tag, 1))
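The difference from the question's attempt is the `default` argument: `default = Tag[1]` fills each customer's first row with that customer's own first tag, whereas leaving it out gives NA and `default = "New"` gives the literal label. A minimal sketch producing both desired outcomes side by side (the column names `Last_month_na` and `Last_month_new` are illustrative only):

```r
library(dplyr)

df <- tibble(Period = c(202201, 202202, 202202, 202202, 202201, 202203, 202203),
             CustomerID = c(1, 1, 2, 3, 2, 2, 1),
             Tag = c("A", "A", "B", "C", "D", "C", "D"))

result <- df %>%
  arrange(CustomerID, Period) %>%   # lag() works on row order, so sort first
  group_by(CustomerID) %>%          # restart the lag for every customer
  mutate(
    # Desired outcome 2: no default, so the first row per customer is NA
    Last_month_na = lag(Tag),
    # Desired outcome 1: the first row per customer gets the label "New"
    Last_month_new = lag(Tag, default = "New")
  ) %>%
  ungroup()
```

Note that `lag()` returns the previous row's tag within each customer, not strictly the previous calendar month; if a customer skips a month, the last tag seen is carried over.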