Fill NAs with a multiplier of the previous value in R
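I have a data frame containing an ordered list of values (var1) for each country on each date. Where there are NAs, I want to fill them by multiplying the previous value by the value in the multiplier column. In effect, the series keeps growing from the last known value at the rate given by the multiplier.
Existing data frame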
library(lubridate)  # provides ymd()

df <- data.frame(
  Date = seq(ymd("2020-01-01", tz = Sys.timezone()), ymd("2020-01-05", tz = Sys.timezone()), 86400),
  Country = c(rep("USA", 5), rep("INDIA", 5), rep("POLAND", 5), rep("SWITZERLAND", 5)),
  var1 = c(20:21, rep(NA, 3)),
  multiplier = c(rep(1.1, 5), rep(1.2, 5), rep(1.5, 5), rep(1.1, 5)))
df
Date Country var1 multiplier
1 2020-01-01 USA 20 1.1
2 2020-01-02 USA 21 1.1
3 2020-01-03 USA NA 1.1
4 2020-01-04 USA NA 1.1
5 2020-01-05 USA NA 1.1
6 2020-01-01 INDIA 20 1.2
7 2020-01-02 INDIA 21 1.2
8 2020-01-03 INDIA NA 1.2
9 2020-01-04 INDIA NA 1.2
10 2020-01-05 INDIA NA 1.2
11 2020-01-01 POLAND 20 1.5
12 2020-01-02 POLAND 21 1.5
13 2020-01-03 POLAND NA 1.5
14 2020-01-04 POLAND NA 1.5
15 2020-01-05 POLAND NA 1.5
16 2020-01-01 SWITZERLAND 20 1.1
17 2020-01-02 SWITZERLAND 21 1.1
18 2020-01-03 SWITZERLAND NA 1.1
19 2020-01-04 SWITZERLAND NA 1.1
20 2020-01-05 SWITZERLAND NA 1.1
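Expected output
The expected output fills each NA in var1 with the product of the multiplier and the previous value. For example, the var1 value for USA on 3 January would be 21 * 1.1 = 23.1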
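As a quick check of that arithmetic, here is a minimal sketch that applies the rule step by step to the three missing USA rows. The variable names (`prev`, `jan3`, and so on) are only illustrative and do not appear in the data frame.

```r
prev <- 21          # last observed var1 for USA (2 January)
multiplier <- 1.1   # USA multiplier

# each filled value is the previous (possibly filled) value times the multiplier
jan3 <- prev * multiplier
jan4 <- jan3 * multiplier
jan5 <- jan4 * multiplier

usa_filled <- c(jan3, jan4, jan5)
print(usa_filled)
```

These should agree with the 23.100, 25.410 and 27.951 shown for USA in the table that follows.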
#After manipulation I should get the following
df
Date Country var1 multiplier
1 1-Jan USA 20.000 1.1
2 2-Jan USA 21.000 1.1
3 3-Jan USA 23.100 1.1
4 4-Jan USA 25.410 1.1
5 5-Jan USA 27.951 1.1
6 1-Jan INDIA 20.000 1.2
7 2-Jan INDIA 21.000 1.2
8 3-Jan INDIA 25.200 1.2
9 4-Jan INDIA 30.240 1.2
10 5-Jan INDIA 36.288 1.2
11 1-Jan POLAND 20.000 1.5
12 2-Jan POLAND 21.000 1.5
13 3-Jan POLAND 31.500 1.5
14 4-Jan POLAND 47.250 1.5
15 5-Jan POLAND 70.875 1.5
16 1-Jan SWITZERLAND 20.000 1.1
17 2-Jan SWITZERLAND 21.000 1.1
18 3-Jan SWITZERLAND 23.100 1.1
19 4-Jan SWITZERLAND 25.410 1.1
20 5-Jan SWITZERLAND 27.951 1.1
Thanks in advance for your replies.
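We can use accumulate2 from purrr. After grouping by 'Country', add a second grouping created by taking the cumulative sum of the logical vector that marks the non-NA elements of 'var1', so that each observed value starts a new group containing the NAs that follow it. Within each group, accumulate2 multiplies the running value by 'multiplier' and uses the result in place of the next element.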
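library(dplyr)
library(purrr)
library(tidyr)  # provides unnest()
df %>%
  group_by(Country) %>%
  # each non-NA var1 starts a new group that also holds the NAs following it
  group_by(grp = cumsum(!is.na(var1)), .add = TRUE) %>%
  # ..1 is the running value, ..3 the multiplier of the row being filled
  mutate(var1 = accumulate2(var1, multiplier[-1], ~ ..1 * ..3)) %>%
  unnest(c(var1)) %>%
  as.data.frame()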
# Date Country var1 multiplier grp
#1 2020-01-01 USA 20.000 1.1 1
#2 2020-01-02 USA 21.000 1.1 2
#3 2020-01-03 USA 23.100 1.1 2
#4 2020-01-04 USA 25.410 1.1 2
#5 2020-01-05 USA 27.951 1.1 2
#6 2020-01-01 INDIA 20.000 1.2 1
#7 2020-01-02 INDIA 21.000 1.2 2
#8 2020-01-03 INDIA 25.200 1.2 2
#9 2020-01-04 INDIA 30.240 1.2 2
#10 2020-01-05 INDIA 36.288 1.2 2
#11 2020-01-01 POLAND 20.000 1.5 1
#12 2020-01-02 POLAND 21.000 1.5 2
#13 2020-01-03 POLAND 31.500 1.5 2
#14 2020-01-04 POLAND 47.250 1.5 2
#15 2020-01-05 POLAND 70.875 1.5 2
#16 2020-01-01 SWITZERLAND 20.000 1.1 1
#17 2020-01-02 SWITZERLAND 21.000 1.1 2
#18 2020-01-03 SWITZERLAND 23.100 1.1 2
#19 2020-01-04 SWITZERLAND 25.410 1.1 2
#20 2020-01-05 SWITZERLAND 27.951 1.1 2
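To make the fold inside each group easier to follow, here is a rough base R sketch of the same idea on a single group, using `Reduce(accumulate = TRUE)` in place of `accumulate2`. This is a swapped-in illustration rather than the code above; the vectors are typed in by hand to mimic one USA group (the observed 21 followed by three NAs).

```r
# One group from the example: the last observed value followed by NAs
var1 <- c(21, NA, NA, NA)
multiplier <- c(1.1, 1.1, 1.1, 1.1)

# Start from the observed value and multiply the running result by the
# multiplier of each following row; the NA entries themselves are never used
filled <- Reduce(
  function(prev, m) prev * m,
  multiplier[-1],
  accumulate = TRUE,
  init = var1[1]
)
print(filled)
```

As in the `accumulate2` call, the first multiplier is dropped because the first element of the group is already known and only the later rows need to be computed.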
I don't see how to do this easily with dplyr, but it is simple enough with a loop:
n <- nrow(df)
for (i in 2:n) {
  # fill a missing value from the (possibly already filled) previous row
  if (is.na(df$var1[i])) {
    df$var1[i] <- df$var1[i - 1] * df$multiplier[i]
  }
}
This of course assumes that the first row is not NA. If you want to handle that case, you have to add an if statement.
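One possible way to add that guard is sketched below. It also runs the loop separately for each country, on the assumption that a value should never be carried over from the previous country's last row. The helper name `fill_with_multiplier` and the `toy` data frame are made up for this illustration.

```r
# Hypothetical helper: fill NAs in one country's series from the previous value
fill_with_multiplier <- function(var1, multiplier) {
  var1 <- as.numeric(var1)
  for (i in seq_along(var1)[-1]) {
    # fill only when the current value is missing and the previous one is known
    if (is.na(var1[i]) && !is.na(var1[i - 1])) {
      var1[i] <- var1[i - 1] * multiplier[i]
    }
  }
  var1
}

# Toy data: country A starts with an NA, country B has two separate NA runs
toy <- data.frame(
  Country = rep(c("A", "B"), each = 4),
  var1 = c(NA, 10, NA, NA, 5, NA, 7, NA),
  multiplier = rep(c(2, 3), each = 4)
)

# Apply the helper to the rows of each country in turn
for (idx in split(seq_len(nrow(toy)), toy$Country)) {
  toy$var1[idx] <- fill_with_multiplier(toy$var1[idx], toy$multiplier[idx])
}
print(toy)
```

A leading NA is left as NA here because there is nothing to grow it from; whether that is the right behaviour depends on the data.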
Here is an option in data.table, also using base::cumprod:
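library(data.table)
# row indices where var1 is NA, recorded before filling
ix <- setDT(df)[is.na(var1), which = TRUE]
# carry the last observation forward, then scale the filled rows by the
# cumulative product of multiplier within each Country
df[, var1 := as.double(nafill(var1, "locf"))][
  ix, var1 := var1 * cumprod(multiplier), by = Country]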
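The reason a cumulative product works here can be seen on a single country in base R. The sketch below uses hand-typed POLAND values and assumes, as in the example data, that the NAs form one trailing run; with several separate NA runs in a country the cumulative product would keep accumulating across runs, so this shortcut would probably need adjusting.

```r
# One country's series: two observed values followed by a trailing run of NAs
var1 <- c(20, 21, NA, NA, NA)
multiplier <- rep(1.5, 5)

na_rows <- is.na(var1)
# last observed value (assumes every NA comes after it)
last_obs <- var1[max(which(!na_rows))]

# the k-th missing row equals last_obs times the product of the first k multipliers
filled <- var1
filled[na_rows] <- last_obs * cumprod(multiplier[na_rows])
print(filled)
```

This replaces the row-by-row recurrence with one vectorised step, which is the same trick the data.table answer applies per Country.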