Filter rows with measurements for more than one year in R
This is a subset of my dataset, in which the variable elevated was measured in several experiments:
        Experiment.Name Sampling.Year   elevated
3409       Swiss Jura_c          1999   17.30000
3410       Swiss Jura_c          1999    9.10000
3411 SwissFACE_lolium_c          2000   -1.45545
3412 SwissFACE_lolium_c          2000   -2.94843
3413 SwissFACE_lolium_c          2000   -3.74132
3414 SwissFACE_lolium_c          2000   -1.42080
3461              DRI_c          1993  122.87900
3462              DRI_c          1993   13.71500
3463              DRI_c          1993    0.91800
3464              DRI_c          1993    1.29800
3465              DRI_c          1993    2.43600
3466              DRI_c          1993    3.46600
3467              DRI_c          1994    0.42700
3469              DRI_c          1994    1.74100
3470              DRI_c          1994    1.01700
3471              DRI_c          1994    2.38300
3640 Bonanza Creek_pb_f          2001 3222.00000
3641 Bonanza Creek_pg_f          2001 3455.00000
3665    Fork Mountain_f          2000    0.24900
3669    Fork Mountain_f          2001    0.23100
4037            KFFL_wh          2003   42.07000
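I want to subset the whole dataset so that I keep only the experiments in which elevated was measured in more than one year. For example, in the table above I would exclude the rows for the Swiss Jura_c experiment, because it has measurements for only one year: 1999. However, I would include the rows for the DRI_c experiment, because it has measurements for more than one year: 1993 and 1994. How can I do this kind of subsetting in R?
Thanks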
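Try
library(data.table)
# setDT() converts df1 to a data.table by reference; within each experiment,
# all rows are kept only if the group spans more than one distinct year
setDT(df1)[, .SD[uniqueN(Sampling.Year) > 1], by = Experiment.Name]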
Or
library(dplyr)
# Keep only the experiments (groups) with more than one distinct sampling year
df1 %>%
    group_by(Experiment.Name) %>%
    filter(n_distinct(Sampling.Year) > 1)
Data
df1 <- structure(list(Experiment.Name = c("Swiss Jura_c",
"Swiss Jura_c",
"SwissFACE_lolium_c", "SwissFACE_lolium_c", "SwissFACE_lolium_c",
"SwissFACE_lolium_c", "DRI_c", "DRI_c", "DRI_c", "DRI_c", "DRI_c",
"DRI_c", "DRI_c", "DRI_c", "DRI_c", "DRI_c", "Bonanza Creek_pb_f",
"Bonanza Creek_pg_f", "Fork Mountain_f", "Fork Mountain_f", "KFFL_wh"
), Sampling.Year = c(1999L, 1999L, 2000L, 2000L, 2000L, 2000L,
1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1994L, 1994L, 1994L,
1994L, 2001L, 2001L, 2000L, 2001L, 2003L), elevated = c(17.3,
9.1, -1.45545, -2.94843, -3.74132, -1.4208, 122.879, 13.715,
0.918, 1.298, 2.436, 3.466, 0.427, 1.741, 1.017, 2.383, 3222,
3455, 0.249, 0.231, 42.07)), .Names = c("Experiment.Name",
"Sampling.Year",
"elevated"), row.names = c(3409L, 3410L, 3411L, 3412L, 3413L,
3414L, 3461L, 3462L, 3463L, 3464L, 3465L, 3466L, 3467L, 3469L,
3470L, 3471L, 3640L, 3641L, 3665L, 3669L, 4037L), class = "data.frame")
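Or using base R:
# Count the distinct sampling years per experiment
a <- aggregate(Sampling.Year ~ Experiment.Name, data = df1, FUN = function(x) length(unique(x)))
# Keep the rows of experiments that have more than one sampling year
df1[df1$Experiment.Name %in% a$Experiment.Name[a$Sampling.Year > 1], ]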
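Another base R variant, shown here only as a minimal sketch, computes the per-experiment count of distinct years directly on each row with ave(), so no intermediate lookup table is needed. The data frame df_small and the names n_years and res are illustrative assumptions: df_small is a trimmed copy of a few rows from the question's data, not the full dataset.

```r
# Trimmed, illustrative copy of a few rows from the question's data
# (df_small is a hypothetical name, not part of the original post).
df_small <- data.frame(
  Experiment.Name = c("Swiss Jura_c", "Swiss Jura_c", "DRI_c", "DRI_c",
                      "DRI_c", "Fork Mountain_f", "Fork Mountain_f", "KFFL_wh"),
  Sampling.Year   = c(1999L, 1999L, 1993L, 1993L, 1994L, 2000L, 2001L, 2003L),
  elevated        = c(17.3, 9.1, 122.879, 13.715, 0.427, 0.249, 0.231, 42.07),
  stringsAsFactors = FALSE
)

# For every row, the number of distinct sampling years within its experiment.
n_years <- ave(df_small$Sampling.Year, df_small$Experiment.Name,
               FUN = function(x) length(unique(x)))

# Keep only the rows of experiments sampled in more than one year.
res <- df_small[n_years > 1, ]
print(res)
```

With this trimmed data, only the DRI_c and Fork Mountain_f rows should remain, since those are the only experiments with two different sampling years.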