How to summarise and subset a multi-level grouped data frame in dplyr and R
I have the following long-format data:
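library(dplyr)  # also provides tibble() and %>%
library(tidyr)  # for pivot_wider(), used in the answers

testdf <- tibble(
  name  = c(rep("john", 4), rep("joe", 2)),
  rep   = c(1, 1, 2, 2, 1, 1),
  field = rep(c("pet", "age"), 3),
  value = c("dog", "young", "cat", "old", "fish", "young")
)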
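For each named person (John and Joe), I want to summarise each of their pets.
For some reason I can't seem to handle the repeated events/pets in the "John" data.
If I filter for Joe only (who has just one pet), the code works.
Any help is much appreciated...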
testdf %>%
  group_by(name, rep) %>%
  # filter(name == "joe") %>% # when I filter only for Joe, the code works
  summarise(
    about = paste0(
      "The pet is a: ", .[field == "pet", "value"], " and it is ", .[field == "age", "value"]
    )
  )
testdf %>%
  pivot_wider(id_cols = name:rep, names_from = field) %>%
  mutate(about = paste0("The pet is a: ", pet, " and it is ", age))
  name    rep pet   age   about
  <chr> <dbl> <chr> <chr> <chr>
1 john      1 dog   young The pet is a: dog and it is young
2 john      2 cat   old   The pet is a: cat and it is old
3 joe       1 fish  young The pet is a: fish and it is young
This can also be done with data.table, as follows:
library(data.table)
setDT(testdf)[
  , j = .(about = paste0("The pet is a ", .SD[field == "pet", value], " and it is ", .SD[field == "age", value])),
  by = .(name, rep)
]
   name rep                             about
1: john   1  The pet is a dog and it is young
2: john   2    The pet is a cat and it is old
3:  joe   1 The pet is a fish and it is young
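For anyone running the pivot_wider() version from a fresh session, here is a self-contained sketch. It assumes dplyr and tidyr are installed, spells out values_from = value (which is the default anyway), and the final select() and the name result are my own additions for illustration, not part of the answer above.

```r
library(dplyr)
library(tidyr)

testdf <- tibble(
  name  = c(rep("john", 4), rep("joe", 2)),
  rep   = c(1, 1, 2, 2, 1, 1),
  field = rep(c("pet", "age"), 3),
  value = c("dog", "young", "cat", "old", "fish", "young")
)

result <- testdf %>%
  # One row per (name, rep); `pet` and `age` become ordinary columns.
  pivot_wider(id_cols = c(name, rep), names_from = field, values_from = value) %>%
  mutate(about = paste0("The pet is a: ", pet, " and it is ", age)) %>%
  # Keep only the grouping keys and the sentence (illustrative extra step).
  select(name, rep, about)

print(result)
```

Once the data is wide, no grouping is needed at all, because each pet already sits on its own row.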
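Your data is in long format and untidy, with several fields packed into a single column. That is why langtang's answer spreads it out, i.e. pivots it wider. (The data.table version is arguably even better, but I still find .SD hard to use.)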
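I prefer to do these things as simply as possible in dplyr.
Another approach, which avoids spreading, is shown below; it gives the same result. (No data.table here, since .SD is still hard for me to grasp!)
So, in 3 lines: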
testdf %>%
  group_by(name, rep) %>%
  summarise(about = paste("The pet is", value[field == "pet"], "and it is", value[field == "age"]))
Which yields:
  name    rep about
  <chr> <dbl> <chr>
1 joe       1 The pet is fish and it is young
2 john      1 The pet is dog and it is young
3 john      2 The pet is cat and it is old
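As for why the attempt in the question breaks on the full data: as far as I can tell, inside summarise() the `.` is the pipe's placeholder for the whole data frame on the left-hand side, while a bare column name such as field is sliced to the current group. The sketch below makes that mismatch visible and then applies the bare-column fix. The names sizes and fixed are made up for illustration, and it assumes dplyr 1.0.0 or later for the .groups argument.

```r
library(dplyr)

testdf <- tibble(
  name  = c(rep("john", 4), rep("joe", 2)),
  rep   = c(1, 1, 2, 2, 1, 1),
  field = rep(c("pet", "age"), 3),
  value = c("dog", "young", "cat", "old", "fish", "young")
)

# Compare the size of the current group with the size of `.`.
sizes <- testdf %>%
  group_by(name, rep) %>%
  summarise(
    rows_in_group = n(),      # rows in the current (name, rep) group
    rows_in_dot   = nrow(.),  # rows in the whole piped data frame
    .groups = "drop"
  )
print(sizes)

# Index the per-group columns directly instead of going through `.`.
fixed <- testdf %>%
  group_by(name, rep) %>%
  summarise(
    about = paste0(
      "The pet is a: ", value[field == "pet"],
      " and it is ", value[field == "age"]
    ),
    .groups = "drop"
  )
print(fixed)
```

Each group has 2 rows while `.` has all 6, so field == "pet" no longer lines up with the rows of `.`. After filter(name == "joe") both are 2 rows long, which is presumably why that case ran.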