If there are a certain number of consecutive NAs in a column, then replace the values
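I have a column named meanSR_strong and another column named meanSR_weak. If there are 10 or more consecutive NAs in the meanSR_strong column, I want to replace those values with the values from the meanSR_weak column, even if the replacement values are also NA. If there are under 10 consecutive NAs in the meanSR_strong column, I don't need to do any replacement.
For example, rows 3-6 are all NA, but that is only four in a row, so it doesn't matter. Rows 15-28, however, are all NA (more than 10 in a row), so I want to pull the values from the meanSR_weak column.
I know how to replace all NAs, but I haven't figured out a good way to code this!
Here is my data: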
x=structure(list(meanSR_strong = c(NA, 0.376009009009009, NA, NA,
NA, NA, 0.615585585585586, NA, 0.607354054054054, 0.590210810810811,
0.57005045045045, 0.596616216216216, 0.584066666666667, 0.538597297297297,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.639010810810811,
0.634272972972973), meanSR_weak = c(0.574724324324324, 0.562030630630631,
0.586247747747748, NA, NA, NA, 0.615585585585586, NA, 0.607354054054054,
0.590210810810811, 0.57005045045045, 0.596616216216216, 0.608510810810811,
0.538597297297297, NA, NA, NA, 0.555463063063063, 0.376715315315315,
NA, NA, NA, NA, NA, NA, 0.60972972972973, NA, NA, 0.639010810810811,
0.634272972972973), cloud.pct_strong = c(100, 36.036036036036,
98.1981981981982, 100, 100, 100, 0, 100, 0, 0, 0, 0, 3.6036036036036,
0, NA, NA, 100, 67.5675675675676, 100, 100, NA, 100, 100, 100,
100, 74.7747747747748, 100, 100, 0, 0), cloud.pct_weak = c(0,
0, 0, 100, 100, 100, 0, 100, 0, 0, 0, 0, 0, 0, NA, NA, 100, 0,
36.036036036036, 67.5675675675676, NA, 100, 100, 100, 100, 0.900900900900901,
100, 60.3603603603604, 0, 0), date = structure(c(951868800, 951955200,
952041600, 952128000, 952214400, 952300800, 952387200, 952473600,
952560000, 952646400, 952732800, 952819200, 952905600, 952992000,
953078400, 953164800, 953251200, 953337600, 953424000, 953510400,
953596800, 953683200, 953769600, 953856000, 953942400, 954028800,
954115200, 954201600, 954288000, 954374400), class = c("POSIXct",
"POSIXt"), tzone = "UTC")), .Names = c("meanSR_strong", "meanSR_weak",
"cloud.pct_strong", "cloud.pct_weak", "date"), row.names = c(NA,
-30L), class = c("tbl_df", "tbl", "data.frame"))
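# Flag NA runs in meanSR_strong that are at least 10 long with the value 2
# (shorter NA runs stay 1, non-NA runs stay 0), expanded to one flag per row
temp = inverse.rle(with(rle(is.na(x$meanSR_strong)),
                        list(lengths = lengths,
                             values = replace(values, which(values & lengths >= 10), 2))))
# Take the meanSR_weak value wherever the flag is 2
replace(x$meanSR_strong, temp == 2, x$meanSR_weak[temp == 2])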
# [1] NA 0.3760090 NA NA NA
# [6] NA 0.6155856 NA 0.6073541 0.5902108
#[11] 0.5700505 0.5966162 0.5840667 0.5385973 NA
#[16] NA NA 0.5554631 0.3767153 NA
#[21] NA NA NA NA NA
#[26] 0.6097297 NA NA 0.6390108 0.6342730
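To see what the flagging step does, here is a minimal sketch on a made-up vector. The names `v`, `w`, `min_run` and the threshold of 3 are illustrative assumptions, not part of the question's data. NA runs at least `min_run` long are marked 2, shorter NA runs stay 1, and non-NA runs stay 0:

```r
# Toy data (hypothetical): one short NA run (length 2) and one long NA run (length 3)
v <- c(1, NA, NA, 4, NA, NA, NA, 8)
w <- c(10, 20, 30, 40, 50, NA, 70, 80)
min_run <- 3  # assumed threshold for this toy example

r <- rle(is.na(v))
# Convert TRUE/FALSE to 1/0, then mark only the long NA runs with 2
flag_values <- replace(as.numeric(r$values),
                       which(r$values & r$lengths >= min_run), 2)
flag <- inverse.rle(list(lengths = r$lengths, values = flag_values))
flag
# [1] 0 1 1 0 2 2 2 0

# Positions flagged 2 take the value from w, even when that value is NA
filled <- replace(v, flag == 2, w[flag == 2])
filled
# [1]  1 NA NA  4 50 NA 70  8
```

The short NA run at positions 2-3 is left alone, while positions 5-7 take whatever `w` holds there.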
The R rle function can be used for this. First build the rle list ("values" and "lengths", see ?rle) of the is.na values:
z <- rle(is.na(x$meanSR_strong))
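It may help to look at what this returns for the question's data. The sketch below rebuilds only the NA pattern of meanSR_strong from the row numbers (the names `strong_is_na` and `z_demo` are illustrative stand-ins, so the block runs without `x`) and shows the runs that `rle` reports:

```r
# NA pattern of meanSR_strong in the question:
# rows 1, 3-6, 8 and 15-28 are NA, everything else has a value
strong_is_na <- seq_len(30) %in% c(1, 3:6, 8, 15:28)

z_demo <- rle(strong_is_na)
z_demo$lengths
# [1]  1  1  4  1  1  6 14  2
z_demo$values
# [1]  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE

# Only one NA run reaches a length of 10: the 14-row run covering rows 15-28
which(z_demo$values & z_demo$lengths >= 10)
# [1] 7
```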
Then change the z$values entries from TRUE to FALSE wherever the run of NAs is shorter than a length of your choosing. Here I chose 10:
z$values[z$lengths < 10 & z$values] <- FALSE
Then use the rep function, which is essentially the inverse of rle, to rebuild a logical vector for indexing with the [<- function:
x[rep(z$values, z$lengths), "meanSR_strong"] <-
  x[rep(z$values, z$lengths), "meanSR_weak"]
print(x, n=30)
# A tibble: 30 x 5
meanSR_strong meanSR_weak cloud.pct_strong cloud.pct_weak date
<dbl> <dbl> <dbl> <dbl> <dttm>
1 NA 0.5747243 100.000000 0.0000000 2000-03-01
2 0.3760090 0.5620306 36.036036 0.0000000 2000-03-02
3 NA 0.5862477 98.198198 0.0000000 2000-03-03
4 NA NA 100.000000 100.0000000 2000-03-04
5 NA NA 100.000000 100.0000000 2000-03-05
6 NA NA 100.000000 100.0000000 2000-03-06
7 0.6155856 0.6155856 0.000000 0.0000000 2000-03-07
8 NA NA 100.000000 100.0000000 2000-03-08
9 0.6073541 0.6073541 0.000000 0.0000000 2000-03-09
10 0.5902108 0.5902108 0.000000 0.0000000 2000-03-10
11 0.5700505 0.5700505 0.000000 0.0000000 2000-03-11
12 0.5966162 0.5966162 0.000000 0.0000000 2000-03-12
13 0.5840667 0.6085108 3.603604 0.0000000 2000-03-13
14 0.5385973 0.5385973 0.000000 0.0000000 2000-03-14
15 NA NA NA NA 2000-03-15
16 NA NA NA NA 2000-03-16
17 NA NA 100.000000 100.0000000 2000-03-17
18 0.5554631 0.5554631 67.567568 0.0000000 2000-03-18
19 0.3767153 0.3767153 100.000000 36.0360360 2000-03-19
20 NA NA 100.000000 67.5675676 2000-03-20
21 NA NA NA NA 2000-03-21
22 NA NA 100.000000 100.0000000 2000-03-22
23 NA NA 100.000000 100.0000000 2000-03-23
24 NA NA 100.000000 100.0000000 2000-03-24
25 NA NA 100.000000 100.0000000 2000-03-25
26 0.6097297 0.6097297 74.774775 0.9009009 2000-03-26
27 NA NA 100.000000 100.0000000 2000-03-27
28 NA NA 100.000000 60.3603604 2000-03-28
29 0.6390108 0.6390108 0.000000 0.0000000 2000-03-29
30 0.6342730 0.6342730 0.000000 0.0000000 2000-03-30
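The same steps can be wrapped in a small function so that the threshold becomes a parameter. This is only a sketch: the name `fill_long_na_runs` and its arguments are made up for illustration, and it assumes both vectors have the same length.

```r
# Hypothetical helper: replace NA runs of at least `min_run` in `primary`
# with the values at the same positions in `fallback` (which may also be NA).
fill_long_na_runs <- function(primary, fallback, min_run = 10) {
  stopifnot(length(primary) == length(fallback))
  z <- rle(is.na(primary))
  # Keep TRUE only for NA runs that are long enough
  z$values <- z$values & z$lengths >= min_run
  idx <- inverse.rle(z)  # logical index, one entry per element
  primary[idx] <- fallback[idx]
  primary
}

# Made-up example with a threshold of 4: the single leading NA is kept,
# the four-element NA run is filled from the second vector
fill_long_na_runs(c(NA, 2, NA, NA, NA, NA, 7),
                  c(1.5, 2.5, 3.5, NA, 5.5, 6.5, 7.5),
                  min_run = 4)
# [1]  NA 2.0 3.5  NA 5.5 6.5 7.0
```

With the question's data this would be called as `x$meanSR_strong <- fill_long_na_runs(x$meanSR_strong, x$meanSR_weak)`, using the default threshold of 10.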