R: Using the sort function in a dataframe based on multiple columns
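I'm a cardiologist who loves coding in R. I'm having a real problem sorting a data frame, and I suspect the solution is really easy!
I have a data frame with summary values from multiple studies (df$study). Most studies have only one summary value (df$summary). However, as you can see, study A has three summary values (df$no.of.estimate). See below: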
study <- c("E", "A", "F", "A", "B", "A", "C", "D")
no.of.estimate <- c(1, 2, 1, 3, 1, 1, 1, 1)
summary <- c(1, 2, 3, 5, 6, 7, 8, 9)
df <- data.frame(study, no.of.estimate, summary)
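So I want to sort the data frame by df$summary, which is easy. However, where a study has more than one estimate, I want to keep that study's rows together and display them in order using the "no.of.estimate" column.
So basically the desired output is: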
study <- c("E", "A", "A", "A", "F", "B", "C", "D")
no.of.estimate <- c(1, 1, 2, 3, 1, 1, 1, 1)
summary <- c(1, 7, 2, 5, 3, 6, 8, 9)
df <- data.frame(study, no.of.estimate, summary)
You can try:
library(dplyr)
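df %>%
  # fix the factor levels to the order of first appearance (E, A, F, B, C, D)
  mutate(study = factor(study, levels = unique(study))) %>%
  # sort by that level order, then by estimate number within each study
  arrange(study, no.of.estimate)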
#   study no.of.estimate summary
# 1     E              1       1
# 2     A              1       7
# 3     A              2       2
# 4     A              3       5
# 5     F              1       3
# 6     B              1       6
# 7     C              1       8
# 8     D              1       9
Or a base R approach:
df$study <- factor(df$study, levels=unique(df$study))
df[with(df, order(study, no.of.estimate)), ]
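Both approaches above order the studies by first appearance, which matches the summary order here only because the sample rows already arrive sorted by summary. If the rows could arrive in any order, one possible reading of the question is that each study should sit at the position of its smallest summary value. That reading is an assumption, not something the question confirms. A minimal base R sketch under that assumption, where `shuffled`, `grp_min` and `res` are hypothetical names introduced for illustration:

```r
df <- data.frame(
  study = c("E", "A", "F", "A", "B", "A", "C", "D"),
  no.of.estimate = c(1, 2, 1, 3, 1, 1, 1, 1),
  summary = c(1, 2, 3, 5, 6, 7, 8, 9),
  stringsAsFactors = FALSE
)

# Reorder the rows with a fixed permutation so the input is no longer
# sorted by summary (illustration only).
shuffled <- df[c(8, 3, 6, 1, 5, 2, 7, 4), ]

# grp_min: the smallest summary value within each study,
# repeated for every row of that study.
grp_min <- ave(shuffled$summary, shuffled$study, FUN = min)

# Sort studies by their smallest summary, keep each study's rows together,
# then sort by estimate number within the study.
res <- shuffled[order(grp_min, shuffled$study, shuffled$no.of.estimate), ]
rownames(res) <- NULL
res
```

On the sample data this gives the same row order as the desired output, but it no longer depends on how the input rows happen to be arranged.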
Data
df <- structure(list(study = structure(c(5L, 1L, 6L, 1L, 2L, 1L, 3L,
4L), .Label = c("A", "B", "C", "D", "E", "F"), class = "factor"),
no.of.estimate = c(1, 2, 1, 3, 1, 1, 1, 1), summary = c(1,
2, 3, 5, 6, 7, 8, 9)), .Names = c("study", "no.of.estimate",
"summary"), row.names = c(NA, -8L), class = "data.frame")
The expected dataset is:
df1 <- structure(list(study = structure(c(5L, 1L, 1L, 1L, 6L, 2L, 3L,
4L), .Label = c("A", "B", "C", "D", "E", "F"), class = "factor"),
no.of.estimate = c(1, 1, 2, 3, 1, 1, 1, 1), summary = c(1,
7, 2, 5, 3, 6, 8, 9)), .Names = c("study", "no.of.estimate",
"summary"), row.names = c(NA, -8L), class = "data.frame")
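Here is my data.table attempt, which keeps your columns and creates a new index (though please see my comment first). The main advantage is that you update the dataset by reference instead of creating a new copy.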
library(data.table)
setorder(setDT(df)[, indx := .GRP, study], indx, no.of.estimate)[]
#    study no.of.estimate summary indx
# 1:     E              1       1    1
# 2:     A              1       7    2
# 3:     A              2       2    2
# 4:     A              3       5    2
# 5:     F              1       3    3
# 6:     B              1       6    4
# 7:     C              1       8    5
# 8:     D              1       9    6
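For readers who do not have data.table installed, the index that `.GRP` yields in the output above (1 for E, 2 for A, and so on, in order of first appearance) can be approximated in base R with `match()` against `unique()`. This is only a sketch of the same idea: it makes a sorted copy rather than updating by reference, and `sorted` is a hypothetical name.

```r
df <- data.frame(
  study = c("E", "A", "F", "A", "B", "A", "C", "D"),
  no.of.estimate = c(1, 2, 1, 3, 1, 1, 1, 1),
  summary = c(1, 2, 3, 5, 6, 7, 8, 9),
  stringsAsFactors = FALSE
)

# Group index by order of first appearance: E = 1, A = 2, F = 3, ...
df$indx <- match(df$study, unique(df$study))

# Sort by group index, then by estimate number within each study.
sorted <- df[order(df$indx, df$no.of.estimate), ]
rownames(sorted) <- NULL
sorted
```

Dropping the helper column afterwards with `sorted$indx <- NULL` leaves only the original three columns.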