dplyr drop values based on difference
I have a table like this:
Stimuli Subject Block TChosen Percentage
<fct> <fct> <fct> <int> <chr>
1 1 1 13 7 14.29%
2 2 1 13 18 36.73%
3 3 1 13 24 48.98%
4 1 2 13 3 6.12%
5 2 2 13 15 30.61%
6 3 2 13 31 63.27%
7 13 100 13 13 26.53%
8 14 100 13 11 22.45%
9 15 100 13 25 51.02%
10 1 1002 13 9 18.37%
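For each Subject and Block, I want to remove rows whose Percentage is within 10% (10 percentage points) of another entry. So in this case, entries 7 and 8 above would be removed.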
Desired output:
Stimuli Subject Block TChosen Percentage
<fct> <fct> <fct> <int> <chr>
1 1 1 13 7 14.29%
2 2 1 13 18 36.73%
3 3 1 13 24 48.98%
4 1 2 13 3 6.12%
5 2 2 13 15 30.61%
6 3 2 13 31 63.27%
7 15 100 13 25 51.02%
8 1 1002 13 9 18.37%
Thanks!
You can try this approach:
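library(dplyr)

df %>%
  # "14.29%" -> 14.29
  mutate(Percentage = readr::parse_number(Percentage)) %>%
  # sort so that the closest values within a group are adjacent
  arrange(Subject, Block, Percentage) %>%
  group_by(Subject, Block) %>%
  # keep a row only if it is more than 10 away from both neighbours
  filter(Percentage - lag(Percentage, default = -Inf) > 10 &
           lead(Percentage, default = Inf) - Percentage > 10) %>%
  ungroup()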
# Stimuli Subject Block TChosen Percentage
# <int> <int> <int> <int> <dbl>
#1 1 1 13 7 14.3
#2 2 1 13 18 36.7
#3 3 1 13 24 49.0
#4 1 2 13 3 6.12
#5 2 2 13 15 30.6
#6 3 2 13 31 63.3
#7 15 100 13 25 51.0
#8 1 1002 13 9 18.4
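Convert Percentage to a number and, within each Subject and Block, keep only the rows whose value differs by more than 10 from both the previous value and the next value (after sorting by Percentage).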
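To see why rows 7 and 8 drop out, here is a minimal sketch of the same lag/lead comparison applied by hand to the three percentages of Subject 100. The names pct, gap_below, gap_above and keep are only for illustration and are not part of the answer above.

```r
library(dplyr)

# Percentages for Subject 100, Block 13, already sorted ascending
pct <- c(22.45, 26.53, 51.02)

# Distance to the previous value; the first element has no predecessor,
# so default = -Inf makes its gap infinite
gap_below <- pct - dplyr::lag(pct, default = -Inf)

# Distance to the next value; the last element has no successor,
# so default = Inf makes its gap infinite
gap_above <- dplyr::lead(pct, default = Inf) - pct

# A value survives only if both neighbours are more than 10 away
keep <- gap_below > 10 & gap_above > 10
pct[keep]
```

Because the values are sorted, checking only the two neighbours is enough: if the nearest neighbour on each side is more than 10 away, every other value in the group is too.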
Data
df <- structure(list(Stimuli = c(1L, 2L, 3L, 1L, 2L, 3L, 13L, 14L,
15L, 1L), Subject = c(1L, 1L, 1L, 2L, 2L, 2L, 100L, 100L, 100L,
1002L), Block = c(13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L,
13L), TChosen = c(7L, 18L, 24L, 3L, 15L, 31L, 13L, 11L, 25L,
9L), Percentage = c("14.29%", "36.73%", "48.98%", "6.12%", "30.61%",
"63.27%", "26.53%", "22.45%", "51.02%", "18.37%")),
class = "data.frame", row.names = c(NA, -10L))