Remove duplicates based on two criteria within an interval in R
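I am using R to clean and process data, and I want to remove duplicates from a matrix. See the example below.
I would like to remove duplicates based on two criteria, using an interval if possible: if the same row is detected more than once within RT ± 0.1 and m.z ± 0.001, the extra rows should be removed.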
RT m.z
1 2.02 326.1988
2 2.03 326.1989
3 2.06 326.1990
4 2.03 331.1533
5 2.03 375.1785
6 2.03 301.2852
7 2.04 301.2852
8 2.06 301.2852
9 2.07 357.2609
10 2.07 308.0327
11 2.08 218.2221
12 2.08 312.3617
13 2.10 473.3453
14 2.15 388.3929
I would like output like this:
RT m.z
1 2.02 326.1988
2
3 2.06 326.1990
4 2.03 331.1533
5 2.03 375.1785
6 2.03 301.2852
7
8 2.06 301.2852
9 2.07 357.2609
10 2.07 308.0327
11 2.08 218.2221
12 2.08 312.3617
13 2.10 473.3453
14 2.15 388.3929
Any help would be greatly appreciated.
Thanks in advance.
Here is one approach using dplyr. I'm not sure whether it is the most efficient one.
df <- read.table(textConnection("RT m.z
1 2.02 326.1988
2 2.03 326.1989
3 2.06 326.1990
4 2.03 331.1533
5 2.03 375.1785
6 2.03 301.2852
7 2.04 301.2852
8 2.06 301.2852
9 2.07 357.2609
10 2.07 308.0327
11 2.08 218.2221
12 2.08 312.3617
13 2.10 473.3453
14 2.15 388.3929"))
Now, using the same data you provided:
library(dplyr)
# This calculates the difference in RT and m.z between consecutive rows
# and looks for absolute differences on which we filter further down the chain
df %>%
  mutate(
    rtdiff = abs(lag(RT) - RT),
    mzdiff = abs(lag(m.z) - m.z)
  ) %>%
  # This replaces the NAs in the first row
  # with large values so filter does not have to deal with NAs
  mutate(rtdiff = replace(rtdiff, is.na(rtdiff), 999),
         mzdiff = replace(mzdiff, is.na(mzdiff), 999)) %>%
  # Remove the rows that don't meet your condition
  filter(!(rtdiff < 0.02 & mzdiff < 0.0002)) %>%
  # select only the columns you need and lose the rest
  select(RT, m.z)
This gives us:
RT m.z
1 2.02 326.1988
2 2.06 326.1990
3 2.03 331.1533
4 2.03 375.1785
5 2.03 301.2852
6 2.06 301.2852
7 2.07 357.2609
8 2.07 308.0327
9 2.08 218.2221
10 2.08 312.3617
11 2.10 473.3453
12 2.15 388.3929
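For readers without dplyr, the same adjacent-row comparison can be sketched in base R with diff(). This is only an illustration, not part of the original answer: the thresholds are copied from the code above, and is_dup and result are names introduced here. Like the dplyr version, it compares each row only with the row directly before it.

```r
# Same sample data as in the question; the first column becomes the row names
df <- read.table(text = "RT m.z
1 2.02 326.1988
2 2.03 326.1989
3 2.06 326.1990
4 2.03 331.1533
5 2.03 375.1785
6 2.03 301.2852
7 2.04 301.2852
8 2.06 301.2852
9 2.07 357.2609
10 2.07 308.0327
11 2.08 218.2221
12 2.08 312.3617
13 2.10 473.3453
14 2.15 388.3929", header = TRUE)

# Absolute difference to the previous row. The first row has no predecessor,
# so it is padded with Inf and can never be flagged as a duplicate.
rtdiff <- c(Inf, abs(diff(df$RT)))
mzdiff <- c(Inf, abs(diff(df$m.z)))

# is_dup (a name introduced for this sketch) marks rows that are close to
# the previous row on both criteria
is_dup <- rtdiff < 0.02 & mzdiff < 0.0002

result <- df[!is_dup, ]
print(result)
```

Unlike the dplyr output, base R subsetting keeps the original row names, so the gaps at rows 2 and 7 stay visible in the printed result.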
Hi, it seems that in my data there are other values inserted between my duplicates.
So I suggest a small change to Maiasaura's code.
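# reduced.list.pre.filtering is my data frame; setRT and setmz are the
# RT and m/z tolerances and must be defined beforehand
library(dplyr)
for (i in 1:100) {
  reduced.list.pre.filtering <- reduced.list.pre.filtering %>%
    # compare each row with the row i positions before it
    mutate(rtdiff = abs(lag(RT..min., i) - RT..min.),
           mzdiff = abs(lag(Max..m.z, i) - Max..m.z)) %>%
    mutate(rtdiff = replace(rtdiff, is.na(rtdiff), 999),
           mzdiff = replace(mzdiff, is.na(mzdiff), 999)) %>%
    filter(!(rtdiff < setRT & mzdiff < setmz)) %>%
    select(RT..min., Max..m.z)
}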
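This way each row is checked against neighbors up to 100 rows away, not just the adjacent one. I hope it helps someone. If you have a better solution, don't hesitate to post it.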
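One possible alternative to the fixed window of 100 lags is to compare each row against every row kept so far. The sketch below is an assumption-laden illustration, not code from either answer: keep_first_within() is a hypothetical helper, rt_tol and mz_tol play the role of setRT and setmz, and the example data is made up. It is quadratic in the number of rows, so it may be slow on very large tables.

```r
# keep_first_within() is a hypothetical helper, not part of dplyr or base R.
# It returns TRUE for rows to keep: a row is kept only if no previously kept
# row lies within both tolerances (strict "<", as in the answers above).
keep_first_within <- function(rt, mz, rt_tol, mz_tol) {
  keep <- logical(length(rt))
  for (i in seq_along(rt)) {
    kept <- which(keep)
    near <- abs(rt[kept] - rt[i]) < rt_tol & abs(mz[kept] - mz[i]) < mz_tol
    keep[i] <- !any(near)
  }
  keep
}

# Made-up example: rows 1 and 3 are near-duplicates separated by an
# unrelated row, which an adjacent-row comparison would miss
peaks <- data.frame(RT  = c(2.02, 2.03, 2.03, 2.10),
                    m.z = c(326.1988, 400.0000, 326.1989, 326.1988))

keep <- keep_first_within(peaks$RT, peaks$m.z, rt_tol = 0.02, mz_tol = 0.0002)
result <- peaks[keep, ]
print(result)
```

Because every candidate is compared with the rows already kept, the result does not depend on how many unrelated rows sit between two near-duplicates. Which of several near-duplicates survives still depends on row order: the first one encountered is kept.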