Replace NA with original value when using lag() function in R
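I am using dplyr's lag() function, and I am trying to work out how to get the original value, rather than NA, as the default for the empty lagged cells.
Here is my code: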
library(tidyverse)  # provides data_frame(), rownames_to_column(), gather() and the dplyr verbs

df <- data_frame(d1 = runif(10, 1, 5),
                 d2 = runif(10, 2, 6),
                 d3 = runif(10, 3, 7),
                 d4 = runif(10, 4, 8),
                 d5 = runif(10, 5, 9),
                 d6 = runif(10, 6, 10),
                 d7 = runif(10, 7, 11),
                 d8 = runif(10, 8, 12)) %>% rownames_to_column()
df %>%
  gather(key = "col", value = "val", -"rowname") %>%
  group_by(col) %>%
  mutate(new_col = ifelse(val >= lag(val, 2) + lag(val, 2)*0.4, NA, val))
If I run the following code instead, it does not work (which, to be honest, I expected):
df %>%
  gather(key = "col", value = "val", -"rowname") %>%
  group_by(col) %>%
  mutate(new_col = if_else(val >= lag(val, 2, default = val) + lag(val, 2, default = val)*0.4, NA, val))
What am I missing to get this result?
rowname col val new_col
<chr> <chr> <dbl> <dbl>
1 1 d1 1.31 **1.31**
2 2 d1 4.10 **4.10**
3 3 d1 3.81 NA
4 4 d1 4.52 4.52
5 5 d1 3.89 3.89
6 6 d1 1.01 1.01
7 7 d1 2.68 2.68
8 8 d1 2.81 NA
9 9 d1 1.18 1.18
10 10 d1 1.19 1.19
# ... with 70 more rows
Any help is appreciated!
You can use replace() to overwrite the first n lagged values with the original values.
library(dplyr)
n <- 2
df %>%
  tidyr::pivot_longer(cols = -rowname, values_to = 'val', names_to = 'col') %>%
  group_by(col) %>%
  mutate(new_col = if_else(val >= lag(val, n) + lag(val, n)*0.4, NA_real_, val),
         new_col = replace(new_col, 1:n, val[1:n]))
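To check the replace() approach by hand, here is a minimal sketch on a single ungrouped column. The toy tibble and its values are made up for illustration and are not from the question; the rule is the same one used above.

```r
library(dplyr)

# Hypothetical toy data (not from the question), small enough to verify by hand
toy <- tibble(val = c(1, 4, 3, 4.5, 3.9))
n <- 2

res <- toy %>%
  mutate(
    # NA where val is at least 1.4 times the value n rows earlier.
    # The first n rows also come out NA, because lag() has nothing to return there.
    new_col = if_else(val >= lag(val, n) + lag(val, n) * 0.4, NA_real_, val),
    # Put the original values back into the first n rows only
    new_col = replace(new_col, seq_len(n), val[seq_len(n)])
  )

res$new_col
# 1.0 4.0 NA 4.5 3.9
```

Only row 3 stays NA (3 >= 1 * 1.4); rows 1 and 2 get their original values back. This assumes every group has at least n rows, otherwise replace() would index past the end of the column.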
coalesce() is made for exactly this kind of problem.
library(tidyverse)
set.seed(42)
df <- data_frame(d1 = runif(10, 1, 5),
                 d2 = runif(10, 2, 6),
                 d3 = runif(10, 3, 7),
                 d4 = runif(10, 4, 8),
                 d5 = runif(10, 5, 9),
                 d6 = runif(10, 6, 10),
                 d7 = runif(10, 7, 11),
                 d8 = runif(10, 8, 12)) %>% rownames_to_column()
#> Warning: `data_frame()` is deprecated as of tibble 1.1.0.
#> Please use `tibble()` instead.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_warnings()` to see where this warning was generated.
df %>%
  gather(key = "col", value = "val", -"rowname") %>%
  group_by(col) %>%
  mutate(new_col = ifelse(val >= lag(val, 2) + lag(val, 2)*0.4, NA, val),
         new_col_no_na = coalesce(new_col, val))
#> # A tibble: 80 x 5
#> # Groups: col [8]
#> rowname col val new_col new_col_no_na
#> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 1 d1 4.66 NA 4.66
#> 2 2 d1 4.75 NA 4.75
#> 3 3 d1 2.14 2.14 2.14
#> 4 4 d1 4.32 4.32 4.32
#> 5 5 d1 3.57 NA 3.57
#> 6 6 d1 3.08 3.08 3.08
#> 7 7 d1 3.95 3.95 3.95
#> 8 8 d1 1.54 1.54 1.54
#> 9 9 d1 3.63 3.63 3.63
#> 10 10 d1 3.82 NA 3.82
#> # ... with 70 more rows
Created on 2020-06-07 by the reprex package (v0.3.0)
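Note that in the output above, new_col_no_na also restores rows 5 and 10, which the condition set to NA on purpose, so it ends up equal to val everywhere. If the goal is to fill only the leading rows where no lagged value exists, one possible variation is to apply coalesce() to the lagged value itself rather than to the result. The sketch below is an assumption about what the question wants, not part of the original answer; the toy data and the helper column prev are hypothetical. It also sidesteps the default = val attempt from the question, since the default argument of lag() appears to be meant as a single padding value rather than a whole column.

```r
library(dplyr)

# Hypothetical toy data (not from the question), small enough to verify by hand
toy <- tibble(val = c(1, 4, 3, 4.5, 3.9))
n <- 2

res <- toy %>%
  mutate(
    # Where lag() returns NA (the first n rows), fall back to the current value
    prev = coalesce(lag(val, n), val),
    # Same rule as in the question, now compared against prev
    new_col = if_else(val >= prev + prev * 0.4, NA_real_, val)
  )

res$new_col
# 1.0 4.0 NA 4.5 3.9
```

In the first n rows the comparison becomes val >= 1.4 * val, which is FALSE for positive values, so the original value is kept. That holds for the runif() data in the question, where every value is at least 1, but it would not hold for zero or negative values.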