Filling NA values with last non-NA's if between repeated identical non-NA values
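I want to replace the NA values in my dataset with the previous non-NA value, but only when the NAs lie between two identical values.
To illustrate, here is a small part of the data: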
date 1 2 3
1 2004-12-27 NA NA NA
2 2004-12-28 2.299 2.349 2.348
3 2004-12-29 NA NA NA
4 2005-01-03 NA NA NA
5 2005-01-04 NA NA NA
6 2005-01-05 2.299 NA NA
7 2005-01-06 NA NA NA
8 2005-01-10 NA NA NA
9 2005-01-11 2.299 2.349 2.348
10 2005-01-12 NA NA NA
11 2005-01-17 NA NA NA
12 2005-01-18 2.299 NA NA
13 2005-01-19 NA NA NA
14 2005-01-24 NA NA NA
15 2005-01-25 NA 2.369 2.368
16 2005-01-26 2.299 NA NA
17 2005-01-31 2.299 NA NA
18 2005-02-01 NA NA NA
19 2005-02-02 NA NA NA
20 2005-02-08 NA NA NA
The desired output is:
date 1 2 3
1 2004-12-27 NA NA NA
2 2004-12-28 2.299 2.349 2.348
3 2004-12-29 2.299 2.349 2.348
4 2005-01-03 2.299 2.349 2.348
5 2005-01-04 2.299 2.349 2.348
6 2005-01-05 2.299 2.349 2.348
7 2005-01-06 2.299 2.349 2.348
8 2005-01-10 2.299 2.349 2.348
9 2005-01-11 2.299 2.349 2.348
10 2005-01-12 2.299 NA NA
11 2005-01-17 2.299 NA NA
12 2005-01-18 2.299 NA NA
13 2005-01-19 2.299 NA NA
14 2005-01-24 2.299 NA NA
15 2005-01-25 2.299 2.369 2.368
16 2005-01-26 2.299 NA NA
17 2005-01-31 2.299 NA NA
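Here is a reproducible sample of the dataset, produced with dput (assigned to df so the answer below can use it directly):
df <- structure(list(data_gas = structure(c(12779, 12780, 12781, 12786,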
12787, 12788, 12789, 12793, 12794, 12795, 12800, 12801, 12802,
12807, 12808, 12809, 12814, 12815, 12816, 12822), class = "Date"),
`1` = c(NA, 2.299, NA, NA, NA, 2.299, NA, NA, 2.299, NA,
NA, 2.299, NA, NA, NA, 2.299, 2.299, NA, NA, NA), `3` = c(NA,
2.349, NA, NA, NA, NA, NA, NA, 2.349, NA, NA, NA, NA, NA,
2.369, NA, NA, NA, NA, NA), `4` = c(NA, 2.348, NA, NA, NA,
NA, NA, NA, 2.348, NA, NA, NA, NA, NA, 2.368, NA, NA, NA,
NA, NA)), row.names = c(NA, 20L), class = "data.frame")
I tried several for loops without success.
Any help would be greatly appreciated.
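Here is a base R for loop solution.
Write a function that compares each pair of consecutive non-NA values and, if they are identical, fills the NA values between them with that value.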
fill_NA_values <- function(x) {
  # Indices of the non-NA values
  non_na_values <- which(!is.na(x))
  # Loop over each pair of consecutive non-NA indices
  for (i in seq_along(non_na_values[-1])) {
    start <- non_na_values[i]
    end <- non_na_values[i + 1]
    # If the two consecutive non-NA values are the same and at least one NA
    # lies between them (otherwise (start + 1):(end - 1) would count downwards)
    if (x[start] == x[end] && end - start > 1) {
      # Fill the NA values in between with that value
      x[(start + 1):(end - 1)] <- x[start]
    }
  }
  x
}
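To see how the function behaves at the edges, here is a minimal, self-contained sketch on hypothetical toy vectors (not taken from the question's data). The function is repeated so the block runs on its own, with an explicit check that the two non-NA indices are not adjacent: without it, for two neighbouring values such as rows 16 and 17, (start + 1):(end - 1) evaluates to 17:16 and simply rewrites the two non-NA cells, which is harmless only because they hold the same value. Note that == compares the doubles exactly; that works here because the repeated prices are identical literals, but values produced by arithmetic may need isTRUE(all.equal(a, b)) instead.

```r
# Self-contained copy of fill_NA_values() with a guard for adjacent non-NA values
fill_NA_values <- function(x) {
  non_na_values <- which(!is.na(x))
  for (i in seq_along(non_na_values[-1])) {
    start <- non_na_values[i]
    end <- non_na_values[i + 1]
    # Fill only when both ends match and at least one NA lies between them
    if (x[start] == x[end] && end - start > 1) {
      x[(start + 1):(end - 1)] <- x[start]
    }
  }
  x
}

# Hypothetical toy vectors, chosen to exercise the edge cases
fill_NA_values(c(NA, 5, NA, NA, 5, NA, 7, NA, NA))
# [1] NA  5  5  5  5 NA  7 NA NA
fill_NA_values(c(1, 1, NA, 1))   # adjacent equal values are skipped, the gap is filled
# [1] 1 1 1 1
fill_NA_values(c(NA, 3, NA))     # a single non-NA value: nothing to compare against
# [1] NA  3 NA
```

Leading NAs (no earlier value) and trailing NAs (no later value) are left untouched, which matches rows 1 and 18 to 20 of the output.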
Use lapply to apply this to multiple columns.
df[-1] <- lapply(df[-1], fill_NA_values)
df
# data_gas 1 3 4
#1 2004-12-27 NA NA NA
#2 2004-12-28 2.299 2.349 2.348
#3 2004-12-29 2.299 2.349 2.348
#4 2005-01-03 2.299 2.349 2.348
#5 2005-01-04 2.299 2.349 2.348
#6 2005-01-05 2.299 2.349 2.348
#7 2005-01-06 2.299 2.349 2.348
#8 2005-01-10 2.299 2.349 2.348
#9 2005-01-11 2.299 2.349 2.348
#10 2005-01-12 2.299 NA NA
#11 2005-01-17 2.299 NA NA
#12 2005-01-18 2.299 NA NA
#13 2005-01-19 2.299 NA NA
#14 2005-01-24 2.299 NA NA
#15 2005-01-25 2.299 2.369 2.368
#16 2005-01-26 2.299 NA NA
#17 2005-01-31 2.299 NA NA
#18 2005-02-01 NA NA NA
#19 2005-02-02 NA NA NA
#20 2005-02-08 NA NA NA
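If you prefer to avoid the explicit loop, the same rule can also be written with vectorised base R. This is an alternative technique, not the method above: for every position it looks up the previous and the next non-NA value (using cummax/cummin over indices, the idea behind last-observation-carried-forward) and fills an NA only when both exist and are equal. The helper name fill_between_equal is hypothetical, and the data is copied from the question's dput. On this data it should give the same result as the loop.

```r
# Reproducible data from the question's dput()
df <- structure(list(data_gas = structure(c(12779, 12780, 12781, 12786,
  12787, 12788, 12789, 12793, 12794, 12795, 12800, 12801, 12802,
  12807, 12808, 12809, 12814, 12815, 12816, 12822), class = "Date"),
  `1` = c(NA, 2.299, NA, NA, NA, 2.299, NA, NA, 2.299, NA,
  NA, 2.299, NA, NA, NA, 2.299, 2.299, NA, NA, NA), `3` = c(NA,
  2.349, NA, NA, NA, NA, NA, NA, 2.349, NA, NA, NA, NA, NA,
  2.369, NA, NA, NA, NA, NA), `4` = c(NA, 2.348, NA, NA, NA,
  NA, NA, NA, 2.348, NA, NA, NA, NA, NA, 2.368, NA, NA, NA,
  NA, NA)), row.names = c(NA, 20L), class = "data.frame")

# Hypothetical helper: fill an NA only when the previous and the next
# non-NA values both exist and are equal (no explicit loop)
fill_between_equal <- function(x) {
  n <- length(x)
  idx <- seq_len(n)
  ok <- !is.na(x)
  # Largest non-NA index at or before each position (0 if there is none)
  prev_idx <- cummax(ifelse(ok, idx, 0L))
  # Smallest non-NA index at or after each position (n + 1 if there is none)
  next_idx <- rev(cummin(rev(ifelse(ok, idx, n + 1L))))
  # Look up the neighbouring values; the padding NA covers the "none" cases
  prev_val <- c(NA, x)[prev_idx + 1L]
  next_val <- c(x, NA)[next_idx]
  # FALSE & NA is FALSE in R, so positions without a neighbour are never filled
  fill <- is.na(x) & !is.na(prev_val) & !is.na(next_val) & prev_val == next_val
  x[fill] <- prev_val[fill]
  x
}

df[-1] <- lapply(df[-1], fill_between_equal)
df
```

The trade-off is readability: the loop states the rule directly, while this version does all positions at once, which may matter only for very long columns.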