Filter in R based on Alphabetical Order
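It seems like something I should already know how to do. Basically, I have a table with duplicated rows that differ in only one column. Searching turned up plenty of questions about sorting by alphabetical order, but none about filtering by alphabetical order.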
Apologies in advance: I'm also not sure how to format sample data nicely.
ResultID  Condition  nVariedSolute  tabscore5  ItemPartID
644040    LDoF       2              2B         540000
644040    LDoF       1              3B         540000
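So I'm trying to filter on the maximum (alphabetically) value of tabscore5. Everything I've found that uses split() assumes the value is numeric.
I want to keep the entire row, but only the row with the maximum tabscore5 value for each ResultID.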
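For what it's worth, R's max() is not limited to numbers. On a character vector it returns the alphabetically largest string, which is exactly the comparison needed here. Below is a minimal sketch using dplyr's filter(), assuming dplyr is installed. Here df holds a trimmed copy of the data from the dput() output below (three of its six columns), and result is just an illustrative name:

```r
library(dplyr)

# max() on a character vector compares strings, so "3B" beats "2B"
max(c("2B", "3B"))  # "3B"

# Trimmed copy of the question's data (three of the six columns)
df <- data.frame(
  ResultID = c(644040L, 644040L, 644043L, 644047L, 644047L, 644050L, 644050L,
               644249L, 644251L, 644251L, 644252L, 644252L, 644259L, 644259L),
  nVariedSolute = c(-1, 2, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1),
  tabscore5 = c("3B", "2B", "1", "1", "2A", "2B", "3A",
                "1", "1", "2A", "1", "2A", "1", "2A"),
  stringsAsFactors = FALSE
)

# Within each ResultID, keep the row(s) whose tabscore5 equals the
# alphabetical maximum of that group (tied rows would all be kept)
result <- df %>%
  group_by(ResultID) %>%
  filter(tabscore5 == max(tabscore5)) %>%
  ungroup()

result
```

filter() keeps the surviving rows in their original order, so the result has one row per ResultID for this data.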
I thought it might look something like this:
df %>% group_by(ResultID) %>% split(max(c(which.min(tabscore5))))
But I keep getting no data back. What am I missing?
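split() by itself is not a filter. It divides a data frame into a list of smaller data frames, one per value of a grouping vector, so the maximum still has to be picked inside each piece. The following is a base R sketch of that approach (no dplyr), again using a trimmed copy of the question's data. The names pieces, kept and result are illustrative only:

```r
# Trimmed copy of the question's data (three of the six columns)
df <- data.frame(
  ResultID = c(644040L, 644040L, 644043L, 644047L, 644047L, 644050L, 644050L,
               644249L, 644251L, 644251L, 644252L, 644252L, 644259L, 644259L),
  nVariedSolute = c(-1, 2, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1),
  tabscore5 = c("3B", "2B", "1", "1", "2A", "2B", "3A",
                "1", "1", "2A", "1", "2A", "1", "2A"),
  stringsAsFactors = FALSE
)

# split() returns a list with one data frame per ResultID
pieces <- split(df, df$ResultID)

# In each piece, keep the row(s) whose tabscore5 is the alphabetical maximum
kept <- lapply(pieces, function(d) d[d$tabscore5 == max(d$tabscore5), ])

# Stack the pieces back into one data frame and reset the row names
result <- do.call(rbind, unname(kept))
rownames(result) <- NULL

result
```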
Below I've tried to include the output of dput(my_df), as user @MikeH suggested, but I may have done it wrong:
structure(list(ResultID = c(644040L, 644040L, 644043L, 644047L, 644047L, 644050L, 644050L, 644249L, 644251L, 644251L, 644252L, 644252L, 644259L, 644259L), Condition = structure(c(2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L,1L, 1L, 1L, 1L, 1L, 1L), .Label = c("HDoF", "LDoF"), class = "factor"), nVariedSolute = c(-1, 2, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1), tabscore5 = c("3B", "2B", "1", "1", "2A", "2B", "3A", "1", "1", "2A", "1", "2A", "1", "2A"), Question = c("1 - DrinkMix_SometimesClaim", "1 - DrinkMix_SometimesClaim", "1 - DrinkMix_SometimesClaim", "1 - DrinkMix_SometimesClaim", "1 - DrinkMix_SometimesClaim", "1 - DrinkMix_SometimesClaim", "1 - DrinkMix_SometimesClaim", "1 - DrinkMix_SometimesClaim", "1 - DrinkMix_SometimesClaim", "1 - DrinkMix_SometimesClaim", "1 - DrinkMix_SometimesClaim", "1 - DrinkMix_SometimesClaim", "1 - DrinkMix_SometimesClaim", "1 - DrinkMix_SometimesClaim"), ItemPartID = c(540000, 540000, 540000, 539941, 539941, 539941, 539941, 540000, 539941, 539941, 539941, 539941, 539941, 539941)), .Names = c("ResultID", "Condition", "nVariedSolute", "tabscore5", "Question", "ItemPartID"), row.names = c(NA, -14L), class = "data.frame")
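One answer: dplyr's top_n() also accepts a character column as wt and ranks its values alphabetically. It therefore keeps the row with the largest tabscore5 in each ResultID group, and tied rows would all be kept:
library(dplyr)

# df is the data frame from the dput() output above
df %>%
  group_by(ResultID) %>%          # one group per ResultID
  top_n(n = 1, wt = tabscore5)    # keep the alphabetically largest tabscore5 per group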
#   ResultID Condition nVariedSolute tabscore5 Question                    ItemPartID
#      <int> <fctr>            <dbl> <chr>     <chr>                            <dbl>
# 1   644040 LDoF                 -1 3B        1 - DrinkMix_SometimesClaim     540000
# 2   644043 LDoF                  1 1         1 - DrinkMix_SometimesClaim     540000
# 3   644047 HDoF                  1 2A        1 - DrinkMix_SometimesClaim     539941
# 4   644050 HDoF                  2 3A        1 - DrinkMix_SometimesClaim     539941
# 5   644249 LDoF                  1 1         1 - DrinkMix_SometimesClaim     540000
# 6   644251 HDoF                  1 2A        1 - DrinkMix_SometimesClaim     539941
# 7   644252 HDoF                  1 2A        1 - DrinkMix_SometimesClaim     539941
# 8   644259 HDoF                  1 2A        1 - DrinkMix_SometimesClaim     539941
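In dplyr 1.0.0 and later, top_n() is marked as superseded in favour of slice_max(). Here is a sketch of the same query with slice_max(), assuming a recent dplyr. The default with_ties = TRUE keeps tied rows just as top_n() does. As before, df holds a trimmed copy of the question's data:

```r
library(dplyr)

# Trimmed copy of the question's data (three of the six columns)
df <- data.frame(
  ResultID = c(644040L, 644040L, 644043L, 644047L, 644047L, 644050L, 644050L,
               644249L, 644251L, 644251L, 644252L, 644252L, 644259L, 644259L),
  nVariedSolute = c(-1, 2, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1),
  tabscore5 = c("3B", "2B", "1", "1", "2A", "2B", "3A",
                "1", "1", "2A", "1", "2A", "1", "2A"),
  stringsAsFactors = FALSE
)

# slice_max() orders each group by tabscore5 (as text) and keeps the top row
result <- df %>%
  group_by(ResultID) %>%
  slice_max(tabscore5, n = 1, with_ties = TRUE) %>%
  ungroup()

result

# Caveat: strings compare character by character, not numerically
max(c("9", "10"))  # "9"
```

Every approach here ranks tabscore5 as text, not as numbers. That works for codes like 1, 2A and 3B. However, a code such as "10" would rank below "9", as the last line of the sketch shows, so codes like that would need their numeric part extracted before ranking.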