Grouping by column and finding preceding value of another column
I have a very long sales dataset; here is a sample excerpt:
| Date       | CountryA | CountryB | PriceA | PriceB |
|------------|----------|----------|--------|--------|
| 05/09/2019 | US       | Japan    | 20     | 55     |
| 28/09/2019 | Japan    | Germany  | 30     | 28     |
| 16/10/2019 | Canada   | US       | 25     | 78     |
| 28/10/2019 | Germany  | Japan    | 60     | 17     |
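I want to group by the "CountryB" column and then create a new column showing the previous value of PriceA for that country, i.e. the PriceA from the last time (in date order) that this particular country appeared in the "CountryA" column. For this example table, I would like to get the following result: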
| Date       | CountryA | CountryB | PriceA | PriceB | PriceA_lag1 |
|------------|----------|----------|--------|--------|-------------|
| 05/09/2019 | US       | Japan    | 20     | 55     |             |
| 28/09/2019 | Japan    | Germany  | 30     | 28     |             |
| 16/10/2019 | Canada   | US       | 25     | 78     | 20          |
| 28/10/2019 | Germany  | Japan    | 60     | 17     | 30          |
I tried the following with dplyr:
data <- data %>%
  group_by(CountryB) %>%
  mutate_at(vars(PriceA), list(lag1 = ~dplyr::lag(., 1, order_by = Date)))
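However, this doesn't give me the previous value from when the country was in the "CountryA" column; instead it gives the previous value from when it was in "CountryB".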
Can anyone help me with this? Thanks.
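To see why the attempt fails, here is a minimal reproduction on the sample data. It is rewritten with plain `mutate()` instead of `mutate_at()` (the grouping and the `lag()` call are the same), and the dates are parsed with `as.Date()`, assuming dd/mm/yyyy, so that `order_by` sorts chronologically rather than as text. Inside `group_by(CountryB)`, `lag()` can only look at earlier rows of the same CountryB group, so the Japan row on 28/10/2019 picks up 20, the PriceA of the 05/09/2019 row where Japan was also in CountryB.

```r
library(dplyr)

# Sample data from the question; Date parsed (assumed dd/mm/yyyy) for chronological order
d <- data.frame(
  Date = as.Date(c("05/09/2019", "28/09/2019", "16/10/2019", "28/10/2019"),
                 format = "%d/%m/%Y"),
  CountryA = c("US", "Japan", "Canada", "Germany"),
  CountryB = c("Japan", "Germany", "US", "Japan"),
  PriceA = c(20L, 30L, 25L, 60L),
  PriceB = c(55L, 28L, 78L, 17L),
  stringsAsFactors = FALSE
)

# Same logic as the attempt: lag() only sees rows within one CountryB group,
# so it returns PriceA from the previous row where the country was in CountryB.
attempt <- d %>%
  group_by(CountryB) %>%
  mutate(PriceA_lag1 = lag(PriceA, 1, order_by = Date)) %>%
  ungroup()

print(attempt$PriceA_lag1)  # NA NA NA 20, versus the desired NA NA 20 30
```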
Quite possibly the ugliest code I've ever written, but...
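# install.packages(c('dplyr', 'magrittr'))
library(dplyr)
library(magrittr)  # provides the %$% exposition pipe used below

# Sample data; Date is converted from dd/mm/yyyy text to a real Date
d <- data.frame(
  stringsAsFactors = FALSE,
  Date = c("05/09/2019", "28/09/2019", "16/10/2019", "28/10/2019"),
  CountryA = c("US", "Japan", "Canada", "Germany"),
  CountryB = c("Japan", "Germany", "US", "Japan"),
  PriceA = c(20L, 30L, 25L, 60L),
  PriceB = c(55L, 28L, 78L, 17L)
) %>%
  mutate(Date = as.Date(Date, format = '%d/%m/%Y'))

priceA_lag <- c()
for (row in seq_len(nrow(d))) {
  # Country (taken from CountryB) and date of the current row
  country <- slice(d, row) %$% CountryB
  row_date <- slice(d, row) %$% Date
  # Strictly earlier rows in which that country appears in CountryA
  earlier <- d %>%
    filter(CountryA == country,
           Date < row_date)
  # Most recent earlier PriceA, or NA if there is none
  # (checking nrow() first avoids a max() warning on an empty selection)
  thePrice <- if (nrow(earlier) > 0) {
    earlier %>%
      filter(Date == max(Date)) %$%
      PriceA %>%
      first()
  } else {
    NA
  }
  priceA_lag <- priceA_lag %>%
    append(thePrice)
}
d$priceA_lag <- priceA_lag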
> d
Date CountryA CountryB PriceA PriceB priceA_lag
1 2019-09-05 US Japan 20 55 NA
2 2019-09-28 Japan Germany 30 28 NA
3 2019-10-16 Canada US 25 78 20
4 2019-10-28 Germany Japan 60 17 30
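For a very long dataset, re-filtering the whole data frame once per row (as the loop above does) can get slow. A minimal base-R sketch of an alternative, assuming dates are already parsed and that only rows from strictly earlier dates should count (as in the loop above): walk through the days in chronological order and keep a named vector holding the latest PriceA seen for each country in CountryA. `add_priceA_lag` is a hypothetical helper name, not part of any package.

```r
# Sample data from the question, with Date parsed so rows sort chronologically
d <- data.frame(
  Date = as.Date(c("05/09/2019", "28/09/2019", "16/10/2019", "28/10/2019"),
                 format = "%d/%m/%Y"),
  CountryA = c("US", "Japan", "Canada", "Germany"),
  CountryB = c("Japan", "Germany", "US", "Japan"),
  PriceA = c(20L, 30L, 25L, 60L),
  PriceB = c(55L, 28L, 78L, 17L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one chronological pass that remembers, for each country,
# the PriceA of its most recent appearance in CountryA.
add_priceA_lag <- function(df) {
  out <- rep(NA_integer_, nrow(df))
  last_seen <- integer(0)  # named vector: country -> latest PriceA seen in CountryA
  # split() on a Date yields one group of row indices per day, in date order
  for (rows in split(seq_len(nrow(df)), df$Date)) {
    # Look up first, so only rows from strictly earlier dates are used
    keys <- df$CountryB[rows]
    known <- keys %in% names(last_seen)
    out[rows[known]] <- last_seen[keys[known]]
    # Then record this day's CountryA prices (a later row on the same day wins)
    for (r in rows) last_seen[df$CountryA[r]] <- df$PriceA[r]
  }
  df$PriceA_lag1 <- out
  df
}

print(add_priceA_lag(d))
```

On the sample data this gives NA, NA, 20, 30 in `PriceA_lag1`, matching the desired table. One difference from the loop: if a country appears in CountryA more than once on the same date, this sketch keeps the later row's price, whereas the loop keeps the first.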