How to reshape a data.frame in R without a loop?
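I have a data.frame in R. I need to compare two rows of data, and if they are the same, I need to merge the rows and combine the data into one column. I feel this is a common need when using R, so ddply or some other package should be able to accomplish this task. Below is the data as it is, dat, and what it should look like after some code, foo.
I'm new to R, so any help is much appreciated.
Before: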
dat <- structure(list(detected_id = c(11, 11, 4), reviewer_name = c("mike",
"mike", "john"), created_at = c("2016-05-04 10:02:45", "2016-05-04 10:02:45",
"2016-05-04 10:02:45"), stage = c(2L, 2L, 1L), V7 = c("Detected Organism: Staphylococcus Aureus, Comment: Looks good",
"Detected Organism: Staphylococcus Aureus, Comment: Note 1",
"Detected Organism: Human Adenovirus 7, Comment: test")), .Names = c("detected_id",
"reviewer_name", "created_at", "stage", "V7"), row.names = c(NA,
-3L), class = "data.frame")
After:
foo <- structure(list(detected_id = c(11L, 4L), reviewer_name = c("mike",
"john"), created_at = structure(c(1L, 1L), .Label = "5/4/16 10:02", class = "factor"),
stage = c(2L, 1L), V7 = structure(c(2L, 1L), .Label = c("Detected Organism: Human Adenovirus 7, Comment: test",
"Detected Organism: Staphylococcus Aureus, Comment: Looks good; Detected Organism: Staphylococcus Aureus, Comment: Note 1"
), class = "factor")), .Names = c("detected_id", "reviewer_name",
"created_at", "stage", "V7"), row.names = c(NA, -2L), class = "data.frame")
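Edit:
The solutions below work for the dataset I provided, but I found that they do not actually work as expected on my real data. Here is an example data.frame that fails. Note that the detected_id column is obsolete for me.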
dat <- structure(list(detected_id = c(11, 11, 11, 11, 12, 4), reviewer_name = c("Mike",
"Mike", "Mike", "Mike", "John", "John"), created_at = c("2016-05-04 10:02:45",
"2016-05-04 10:02:45", "2016-05-04 10:02:45", "2016-05-04 10:02:45",
"2016-05-04 10:02:45", "2016-05-04 10:02:45"), stage = c(2L,
3L, 2L, 3L, 1L, 1L), V7 = c("Detected Organism: Staphylococcus Aureus, Comment: Looks good",
"Detected Organism: Staphylococcus Aureus, Comment: Looks good",
"Detected Organism: Staphylococcus Aureus, Comment: Note 1",
"Detected Organism: Staphylococcus Aureus, Comment: Note 1",
"Detected Organism: Stenotrophomonas Maltophilia, Comment: new note",
"Detected Organism: Human Adenovirus 7, Comment: test")), .Names = c("detected_id",
"reviewer_name", "created_at", "stage", "V7"), row.names = c(NA,
-6L), class = "data.frame")
Solution: remove the detected_id column before reshaping the data.frame. Thanks, @eddi.
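To illustrate why dropping detected_id helps, here is a minimal base R sketch that counts the distinct grouping keys in the second example with and without that column. The names `keys`, `n_with_id`, and `n_without_id` are made up for this sketch, and V7 is left out because only the grouping columns matter here.

```r
# Grouping columns copied from the second example data.frame (V7 omitted).
# `keys` is a hypothetical name used only in this sketch.
keys <- data.frame(
  detected_id   = c(11, 11, 11, 11, 12, 4),
  reviewer_name = c("Mike", "Mike", "Mike", "Mike", "John", "John"),
  created_at    = "2016-05-04 10:02:45",  # same timestamp on every row
  stage         = c(2L, 3L, 2L, 3L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Number of distinct groups when detected_id is part of the key.
n_with_id <- nrow(unique(keys))

# Number of distinct groups once detected_id is dropped.
n_without_id <- nrow(unique(keys[setdiff(names(keys), "detected_id")]))
```

With detected_id in the key, John's two rows (ids 12 and 4) land in separate groups, so they are never merged. Without it they share one key.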
library(data.table)
setDT(dat)[, paste(V7, collapse = "; ")
, by = .(detected_id, reviewer_name, created_at, stage)]
# detected_id reviewer_name created_at stage
#1: 11 mike 2016-05-04 10:02:45 2
#2: 4 john 2016-05-04 10:02:45 1
# V1
#1: Detected Organism: Staphylococcus Aureus, Comment: Looks good; Detected Organism: Staphylococcus Aureus, Comment: Note 1
#2: Detected Organism: Human Adenovirus 7, Comment: test
Using base R
# Group by the four key columns and join the V7 strings with "; ", as in foo
with(dat, aggregate(V7, list(detected_id = detected_id, reviewer_name = reviewer_name, created_at = created_at, stage = stage), paste, collapse = "; "))
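As a sketch of how the same base R idea might be applied to the second example from the question, the code below leaves detected_id out and groups by the remaining key columns using the formula interface of `aggregate`. The names `dat2` and `res` are chosen for this example only. This is one possible approach, not necessarily what the asker ended up using.

```r
# Second example data from the question, with detected_id left out.
# `dat2` and `res` are hypothetical names for this sketch.
dat2 <- data.frame(
  reviewer_name = c("Mike", "Mike", "Mike", "Mike", "John", "John"),
  created_at    = "2016-05-04 10:02:45",
  stage         = c(2L, 3L, 2L, 3L, 1L, 1L),
  V7 = c(
    "Detected Organism: Staphylococcus Aureus, Comment: Looks good",
    "Detected Organism: Staphylococcus Aureus, Comment: Looks good",
    "Detected Organism: Staphylococcus Aureus, Comment: Note 1",
    "Detected Organism: Staphylococcus Aureus, Comment: Note 1",
    "Detected Organism: Stenotrophomonas Maltophilia, Comment: new note",
    "Detected Organism: Human Adenovirus 7, Comment: test"
  ),
  stringsAsFactors = FALSE
)

# One output row per (reviewer_name, created_at, stage) combination;
# the V7 strings within each group are joined with "; ".
res <- aggregate(V7 ~ reviewer_name + created_at + stage,
                 data = dat2, FUN = paste, collapse = "; ")
```

The formula form keeps the original column name V7 in the result, whereas passing a bare vector to `aggregate` names the aggregated column `x`.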