Create new variable where all values that are not between the 1st and the 99th percentile are substituted with missing values (NA)

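I want to duplicate the india04 data frame and use mutate() to add a new variable called "incwage_adj", in which all income values that are not between the 1st and the 99th percentile are replaced with missing values (NA). Packages: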

library(tidyverse)
require(nycflights13)
data(diamonds)
load("india04.Rdata")

Code:

india04_new <- india04 %>%
mutate(incwage_adj = ifelse(incwage != quantile(india04_new2$incwage, 0.99), NA, incwage))

We can use between() to build a logical condition and change the values that fall outside the 1st to 99th percentile range to NA:

library(dplyr)
 mtcars %>% 
    mutate(mpg_adj =  ifelse(between(mpg, 
      quantile(mpg, 0.01, na.rm = TRUE), 
      quantile(mpg, 0.99, na.rm = TRUE)), mpg, NA))

Or with case_when(), which returns NA for every row that matches none of the conditions:

mtcars %>% 
    mutate(mpg_adj =  case_when(between(mpg, 
      quantile(mpg, 0.01, na.rm = TRUE), 
      quantile(mpg, 0.99, na.rm = TRUE))~ mpg))
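
Applied to the column from the question, the same idea might look like the sketch below. The india04 data are not shown in the question, so a toy tibble with made-up incwage values stands in for it. if_else() with NA_real_ is swapped in for ifelse() as a type-strict variant; that is a choice made for this sketch, not part of the answer above.

```r
library(dplyr)

# Toy stand-in for india04 (hypothetical values; the real data are not shown).
# 100 rows: one very low value, one very high value, the rest in between.
india04 <- tibble(incwage = c(0, seq(100, 9800, by = 100), 1e6))

india04_new <- india04 %>%
  mutate(incwage_adj = if_else(
    between(incwage,
            quantile(incwage, 0.01, na.rm = TRUE),
            quantile(incwage, 0.99, na.rm = TRUE)),
    incwage,   # inside the 1st to 99th percentile range: keep the value
    NA_real_   # outside the range: set to missing
  ))

# Count how many values were set to NA
sum(is.na(india04_new$incwage_adj))
```

Note that the quantiles are computed on the incwage column inside mutate(), so the pipeline never needs to refer to another data frame by name, and the original incwage column stays unchanged.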

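Try the following:

india04_new2$incwage[with(india04_new2, 
 incwage < quantile(incwage, 0.01, na.rm = TRUE) | 
 incwage > quantile(incwage, 0.99, na.rm = TRUE))] <- NA

This should replace every incwage value that lies outside the 1st to 99th percentile range with NA. Note that it overwrites incwage in place rather than creating a new column.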
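
A base R variant that leaves the original column untouched could be sketched as follows. The helper name trim_to_na and the sample vector are hypothetical, introduced only for illustration. na.rm = TRUE is passed because quantile() stops with an error when the input contains NA and na.rm is FALSE.

```r
# Hypothetical helper: returns a copy of x in which values outside the
# [lo, hi] percentile range are set to NA. Existing NAs are left as they are.
trim_to_na <- function(x, lo = 0.01, hi = 0.99) {
  bounds <- quantile(x, probs = c(lo, hi), na.rm = TRUE)
  outside <- !is.na(x) & (x < bounds[1] | x > bounds[2])
  x[outside] <- NA
  x
}

# Made-up income values, including one value that is already missing
incwage <- c(0, seq(100, 9800, by = 100), 1e6, NA)
incwage_adj <- trim_to_na(incwage)

# Number of missing values after trimming
sum(is.na(incwage_adj))
```

With a data frame, the result would be assigned to a new column, for example india04_new$incwage_adj <- trim_to_na(india04_new$incwage), assuming india04_new is a copy of india04.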