How to replace a column in R by a modified column, dependent on filtered values? (removing outliers in panel data)

Question

I have a panel data set like this:

year	id	treatment_year	time_to_treatment	outcome
2000	1	2011	-11	2
2002	1	2011	-10	3
2004	2	2015	-9	22

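and so on. I am trying to deal with outliers by winsorizing them. The final goal is a scatter plot with time_to_treatment on the X axis and outcome on the Y axis.

I would like to replace the outcome within each time_to_treatment value by its winsorized counterpart, i.e. replace all extreme values with the 5% and 95% quantiles. This is what I have tried so far, but it does not work.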

for(i in range(dataset$time_to_treatment)){
    dplyr::filter(dataset, time_to_treatment == i)$outcome <-  DescTools::Winsorize(dplyr::filter(dataset,time_to_treatment==i)$outcome)
}

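I get the error: Error in filter(dataset, time_to_treatment == i) <- `*vtmp*` : could not find function "filter<-"

Can anyone offer a better way? Thanks.

My actual data, where conflicts = outcome, commission = treatment_year, and CD_MUN = id:

The relevant time period indicator is time_to_t.

Groups: year, CD_MUN, type [6]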

type	CD_MUN	year	time_to_t	conflicts	commission
chr	dbl	dbl	dbl	int	dbl
manif	1100023	2000	-11	1	2011
manif	1100189	2000	-3	2	2003
manif	1100205	2000	-9	5	2009
manif	1500602	2000	-4	1	2004
manif	3111002	2000	-11	2	2011
manif	3147006	2000	-10	1	2010

Answer 1

To begin with, you can use this:

# The data
set.seed(123)
df <- data.frame(
  time_to_treatment = seq(-15, 0, 1),
  outcome = sample(1:30, 16, replace = TRUE)
)

# A solution without Winsorize based solely on dplyr
library(dplyr)
df %>% 
  mutate(outcome05 = quantile(outcome, probs = 0.05), # 5% quantile
         outcome95 = quantile(outcome, probs = 0.95), # 95% quantile
         outcome = ifelse(outcome <= outcome05, outcome05, outcome), # replace
         outcome = ifelse(outcome >= outcome95, outcome95, outcome)) %>% 
  select(-c(outcome05, outcome95))

You can adapt it to your specific problem.
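The pipeline above computes the two quantiles over the whole outcome column. The question asks for winsorizing separately within each time_to_treatment value, so one possible adaptation is to add a group_by() step. The following is only a sketch: the example data (25 observations per period) and the column name outcome_w are assumptions made for illustration, not part of the original answer.

```r
library(dplyr)

# Hypothetical example data: several outcomes per time_to_treatment value
set.seed(123)
df <- data.frame(
  time_to_treatment = rep(-3:0, each = 25),
  outcome = sample(1:30, 100, replace = TRUE)
)

# Winsorize within each time_to_treatment group:
# values below the group's 5% quantile are raised to it,
# values above the group's 95% quantile are lowered to it
df_w <- df %>%
  group_by(time_to_treatment) %>%
  mutate(
    lower = quantile(outcome, probs = 0.05, names = FALSE),
    upper = quantile(outcome, probs = 0.95, names = FALSE),
    outcome_w = pmin(pmax(outcome, lower), upper)
  ) %>%
  ungroup() %>%
  select(-lower, -upper)

head(df_w)
```

Writing the result to a new column (outcome_w) rather than overwriting outcome keeps the raw values available for comparison, for example when plotting both versions against time_to_treatment.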

Answer 2

Assuming the "time period" refers to the 'commission' column, you can use ave().

transform(dat, conflicts_w=ave(conflicts, commission, FUN=DescTools::Winsorize))
#    type  CD_MUN year time_to_t conflicts commission conflicts_w
# 1 manif 1100023 2000       -11         1       2011        1.05
# 2 manif 1100189 2000        -3         2       2003        2.00
# 3 manif 1100205 2000        -9         5       2009        5.00
# 4 manif 1500602 2000        -4         1       2004        1.00
# 5 manif 3111002 2000       -11         2       2011        1.95
# 6 manif 3147006 2000       -10         1       2010        1.00
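If the relevant period is instead time_to_t, as the question states, the same ave() pattern should carry over by changing the grouping variable. The sketch below also swaps DescTools::Winsorize for a small base R helper so that it runs without extra packages; winsorize_5_95 is a hypothetical name introduced here, not a function from any package, and the data frame is rebuilt by hand from the six sample rows.

```r
# Same six rows as the sample data in the question
dat <- data.frame(
  type = "manif",
  CD_MUN = c(1100023, 1100189, 1100205, 1500602, 3111002, 3147006),
  year = 2000,
  time_to_t = c(-11, -3, -9, -4, -11, -10),
  conflicts = c(1, 2, 5, 1, 2, 1),
  commission = c(2011, 2003, 2009, 2004, 2011, 2010)
)

# Hypothetical helper (not part of any package): clamp x to its own
# 5% and 95% quantiles
winsorize_5_95 <- function(x) {
  q <- quantile(x, probs = c(0.05, 0.95), names = FALSE)
  pmin(pmax(x, q[1]), q[2])
}

# Apply the helper within each time_to_t group
dat$conflicts_w <- ave(dat$conflicts, dat$time_to_t, FUN = winsorize_5_95)
dat$conflicts_w
# [1] 1.05 2.00 5.00 1.00 1.95 1.00
```

In this six-row sample only time_to_t = -11 has more than one observation, so the result coincides with grouping by commission; on the full data the two groupings may differ.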

Data:

dat <- structure(list(type = c("manif", "manif", "manif", "manif", "manif", 
"manif"), CD_MUN = c(1100023L, 1100189L, 1100205L, 1500602L, 
3111002L, 3147006L), year = c(2000L, 2000L, 2000L, 2000L, 2000L, 
2000L), time_to_t = c(-11L, -3L, -9L, -4L, -11L, -10L), conflicts = c(1L, 
2L, 5L, 1L, 2L, 1L), commission = c(2011L, 2003L, 2009L, 2004L, 
2011L, 2010L)), class = "data.frame", row.names = c(NA, -6L))
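For completeness, the loop from the question appears to fail for two reasons. First, range() returns only the minimum and maximum of time_to_treatment, not every distinct value. Second, filter(...)$outcome <- ... assigns into the result of a function call, which R treats as a call to a replacement function "filter<-" that does not exist. A minimal sketch of a loop that avoids both problems, using logical indexing on the data frame itself, could look like this; the data frame below is a hypothetical stand-in for the question's dataset.

```r
# Hypothetical stand-in for the question's `dataset`
set.seed(1)
dataset <- data.frame(
  time_to_treatment = rep(-2:0, each = 20),
  outcome = c(rnorm(59, mean = 10), 100)  # one obvious outlier in the last row
)

# unique() yields every period; range() would yield only min and max
for (i in unique(dataset$time_to_treatment)) {
  rows <- dataset$time_to_treatment == i  # rows belonging to this period
  q <- quantile(dataset$outcome[rows], probs = c(0.05, 0.95), names = FALSE)
  # Assign into the data frame directly instead of into a filter() result
  dataset$outcome[rows] <- pmin(pmax(dataset$outcome[rows], q[1]), q[2])
}

max(dataset$outcome)  # the outlier has been pulled well below 100
```

The grouped approaches in the answers are usually more concise, but the explicit loop makes clear which rows are modified at each step.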
