Variable creation - Inferring age
I have a grouped data frame:
Truck <- c('A','A','A','A','B','B','B','B','C','C','C','C')
OilChanged <- c('True','NewOil','False','False','False','False','False','False','True','NewOil','True','NewOil')
Odometer <- c(1000, 1000, 2000,3000,700,800,900,1000,20000,20000,30000,30000)
DF <- data.frame(Truck, OilChanged, Odometer)
# Truck OilChanged Odometer
# 1 A True 1000
# 2 A NewOil 1000
# 3 A False 2000
# 4 A False 3000
# 5 B False 700
# 6 B False 800
# 7 B False 900
# 8 B False 1000
# 9 C True 20000
# 10 C NewOil 20000
# 11 C True 30000
# 12 C NewOil 30000
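I am trying to infer the age of the oil (in kilometres) wherever possible. The age can only be inferred after an oil change. If the oil is never changed, its age remains a mystery (for example, Truck B).
Below is the desired result: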
Truck <- c('A','A','A','A','B','B','B','B','C','C','C','C')
OilChanged <- c('True','NewOil','False','False','False','False','False','False','True','NewOil','True','NewOil')
Odometer <- c(1000, 1000, 2000, 3000,700,800,900,1000,20000,20000,30000,30000)
OilAge <- c(NA,0,1000,2000,NA,NA,NA,NA,NA,0,10000,0)
Result <- data.frame(Truck, OilChanged, Odometer, OilAge)
# Truck OilChanged Odometer OilAge
# 1 A True 1000 NA
# 2 A NewOil 1000 0
# 3 A False 2000 1000
# 4 A False 3000 2000
# 5 B False 700 NA
# 6 B False 800 NA
# 7 B False 900 NA
# 8 B False 1000 NA
# 9 C True 20000 NA
# 10 C NewOil 20000 0
# 11 C True 30000 10000
# 12 C NewOil 30000 0
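Note: the odometer reading on a True OilChanged row and on the NewOil row that follows it will always be identical, because the oil sample is taken immediately before the oil is changed. Both rows must be kept, however, so that downstream calculations such as rate-of-change formulas work correctly.
An NA in the OilAge column means the age is unknown.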
Let me know if this solution works for you.
Truck <- c('A','A','A','A','B','B','B','B','C','C','C','C')
OilChanged <- c('True','NewOil','False','False','False','False','False','False','True','NewOil','True','NewOil')
Odometer <- c(1000, 1000, 2000,3000,700,800,900,1000,20000,20000,30000,30000)
DF <- data.frame(Truck, OilChanged, Odometer)
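library(dplyr)

DF %>%
  group_by(Truck) %>%
  # status = number of distinct OilChanged values per truck; 1 means no change was recorded
  mutate(status = n_distinct(OilChanged),
         OilAge = ifelse(OilChanged == "NewOil", 0,
                  # Odometer - (Odometer - lag(Odometer)) simplifies to lag(Odometer)
                  ifelse(OilChanged == "False", Odometer - (Odometer - lag(Odometer)),
                  ifelse(OilChanged == "True", Odometer - lag(Odometer), NA)))) %>%
  # Caveat: the lag()-based branches match the desired output for this sample,
  # but in general the age is Odometer minus the reading at the latest NewOil row
  mutate(OilAge = ifelse(status != 1, OilAge, NA)) %>%
  select(Truck, OilChanged, Odometer, OilAge)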
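Because the nested ifelse() above leans on lag(), it may help to compare it with a version that measures every row against the most recent NewOil reading. The following is only a sketch, not the answerer's method: the names Sketch and LastChange are made up for illustration, and it assumes that Odometer never decreases within a truck.

```r
library(dplyr)

DF <- data.frame(
  Truck      = rep(c("A", "B", "C"), each = 4),
  OilChanged = c("True", "NewOil", "False", "False",
                 "False", "False", "False", "False",
                 "True", "NewOil", "True", "NewOil"),
  Odometer   = c(1000, 1000, 2000, 3000,
                 700, 800, 900, 1000,
                 20000, 20000, 30000, 30000)
)

Sketch <- DF %>%
  group_by(Truck) %>%
  mutate(
    # Reading at the most recent NewOil row; -Inf until the first oil change.
    # Assumption: Odometer is non-decreasing within each truck.
    LastChange = cummax(ifelse(OilChanged == "NewOil", Odometer, -Inf)),
    # Age is unknown (NA) until an oil change has been seen
    OilAge     = ifelse(is.finite(LastChange), Odometer - LastChange, NA_real_)
  ) %>%
  ungroup() %>%
  select(-LastChange)

print(Sketch)
```

On the sample data this reproduces the OilAge column of the desired Result. Unlike the lag() version, it does not depend on the readings being evenly spaced after a change.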
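Another approach:
DF %>%
  group_by(Truck) %>%
  # d counts the oil changes seen so far within each truck (0 = none yet)
  mutate(d = cumsum(OilChanged == 'NewOil')) %>%
  group_by(Truck, d) %>%
  # NA^TRUE is NA and NA^FALSE is 1, so every group with d == 0 becomes NA;
  # otherwise cumsum(c(0, diff(Odometer))) is the distance since the oil change
  mutate(OilAge = cumsum(c(0*NA^(as.logical(!(first(d)))), diff(NA^(as.logical(!d))*Odometer))))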
# A tibble: 12 x 5
# Groups: Truck, d [6]
Truck OilChanged Odometer d OilAge
<chr> <chr> <dbl> <int> <dbl>
1 A True 1000 0 NA
2 A NewOil 1000 1 0
3 A False 2000 1 1000
4 A False 3000 1 2000
5 B False 700 0 NA
6 B False 800 0 NA
7 B False 900 0 NA
8 B False 1000 0 NA
9 C True 20000 0 NA
10 C NewOil 20000 1 0
11 C True 30000 1 10000
12 C NewOil 30000 2 0
d is a helper variable; you can drop it (for example with select(-d)) once you understand what was done.
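The NA^ expression in the second approach is compact but easy to misread, so here is a small sketch of what it relies on, followed by a more explicit way of writing the same grouping idea. The names mask, odo, age and Plain are illustrative only, and the if/else form is an assumed equivalent rather than the answerer's code.

```r
library(dplyr)

# In R, NA^0 is 1 while NA^1 stays NA, so NA^<logical> acts as an NA mask
mask <- NA^c(TRUE, FALSE)          # TRUE -> NA^1, FALSE -> NA^0

# cumsum(c(0, diff(x))) is the distance from the first element of x
odo <- c(1000, 2000, 3000)
age <- cumsum(c(0, diff(odo)))     # same values as odo - odo[1]

DF <- data.frame(
  Truck      = rep(c("A", "B", "C"), each = 4),
  OilChanged = c("True", "NewOil", "False", "False",
                 "False", "False", "False", "False",
                 "True", "NewOil", "True", "NewOil"),
  Odometer   = c(1000, 1000, 2000, 3000,
                 700, 800, 900, 1000,
                 20000, 20000, 30000, 30000)
)

Plain <- DF %>%
  group_by(Truck) %>%
  mutate(d = cumsum(OilChanged == "NewOil")) %>%
  group_by(Truck, d) %>%
  # d == 0 means no oil change has been seen yet, so the age is unknown
  mutate(OilAge = if (first(d) == 0) NA_real_ else Odometer - first(Odometer)) %>%
  ungroup() %>%
  select(-d)

print(Plain)
```

The if/else is evaluated once per (Truck, d) group, which is why a scalar condition on first(d) is enough here.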