How do you group and merge columns in R?
I have this data frame:
d <- structure(list(Product = structure(c(3L, 1L, 2L, 4L, 4L, 6L,
4L, 5L), .Label = c("App_Servers ", "Db_servers,application ",
"Server1,Serve2,Server4", "Server1,Serve2,Server4 ", "Server1,Serve2,Server4 ",
"Server1,Serve2,Sever4 "), class = "factor"), Day = structure(c(3L,
5L, 4L, 5L, 2L, 4L, 1L, 1L), .Label = c("Mon ", "Thu ", "Tue",
"Tue ", "Wed "), class = "factor"), Date = structure(c(1L, 2L,
3L, 4L, 5L, 6L, 7L, 7L), .Label = c(" 2015-01-06 ", "2015-01-07 ",
"2015-01-13 ", "2015-01-14 ", "2015-01-15 ", "2015-01-20 ", "2015-02-16 "
), class = "factor"), Month = structure(c(2L, 2L, 2L, 2L, 2L,
2L, 1L, 1L), .Label = c("Feb", "Jan"), class = "factor")), .Names = c("Product",
"Day", "Date", "Month"), class = "data.frame", row.names = c(NA,
-8L))
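I need to put the dates, comma-separated and grouped by Product, Day and Month, into a single cell. For example,
Server1,Serve2,Server4 appears on 2015-01-06, 2015-01-14, 2015-01-15 and 2015-01-20 in January.
My new df needs to look like this: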
Product Day Date Month Day_list
Server1,Serve2,Server4 Tues 2015-01-06 Jan 2015-01-06,2015-01-13,2015-01-20
Is there a package that can help me do this in R?
I tried using the data.table package:
d[,d:=paste(Date,Date), c("Product","Day","Month")]
It doesn't work.
Here is a solution using dplyr:
library(dplyr)

d %>%
  mutate(
    # strip the stray spaces so that identical values compare equal
    Product = gsub("[ ]", "", Product),
    Day = gsub("[ ]", "", Day),
    Date = gsub("[ ]", "", Date)
  ) %>%
  group_by(Product, Month) %>%
  mutate(
    # join all dates of the group into one comma-separated string
    Day_list = paste(Date, collapse = ",")
  )
Product Day Date Month Day_list
1 Server1,Serve2,Server4 Tue 2015-01-06 Jan 2015-01-06,2015-01-14,2015-01-15
2 App_Servers Wed 2015-01-07 Jan 2015-01-07
3 Db_servers,application Tue 2015-01-13 Jan 2015-01-13
4 Server1,Serve2,Server4 Wed 2015-01-14 Jan 2015-01-06,2015-01-14,2015-01-15
5 Server1,Serve2,Server4 Thu 2015-01-15 Jan 2015-01-06,2015-01-14,2015-01-15
6 Server1,Serve2,Sever4 Tue 2015-01-20 Jan 2015-01-20
7 Server1,Serve2,Server4 Mon 2015-02-16 Feb 2015-02-16,2015-02-16
8 Server1,Serve2,Server4 Mon 2015-02-16 Feb 2015-02-16,2015-02-16
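If one row per group is preferred over a column repeated on every row, something like the following base R sketch might work. The data frame `toy` is a made-up, already-cleaned stand-in for `d`, and `per_group` is just an illustrative name.

```r
# Hypothetical, already-cleaned stand-in for the question's data.
toy <- data.frame(
  Product = c("Server1,Serve2,Server4", "App_Servers",
              "Server1,Serve2,Server4", "Server1,Serve2,Server4"),
  Month   = c("Jan", "Jan", "Jan", "Feb"),
  Date    = c("2015-01-06", "2015-01-07", "2015-01-14", "2015-02-16"),
  stringsAsFactors = FALSE
)

# One row per Product/Month pair; the extra argument collapse = ","
# is passed on to paste() for each group.
per_group <- aggregate(Date ~ Product + Month, data = toy,
                       FUN = paste, collapse = ",")
print(per_group)
```

This collapses the data to one row per Product/Month pair, so the Day column is dropped here; whether that is acceptable depends on what the final table is used for.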
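There are a couple of things going on here.
First, your columns contain extra whitespace. You have to remove it, otherwise values that look identical will not end up in the same group.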
require(data.table)
setDT(d)[, `:=`(Product = gsub("[ ]", "", Product),
Date = gsub("[ ]", "", Date))]
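As a side note, gsub("[ ]", "", x) removes every space, including any inside a value. If only leading and trailing whitespace should go, base R's trimws() may be a gentler option. A minimal sketch on made-up vectors modelled on the question's columns:

```r
# Toy values modelled on the question's columns (hypothetical, not the full data).
product <- c("Server1,Serve2,Server4", "Server1,Serve2,Server4 ", " App_Servers ")
date    <- c(" 2015-01-06 ", "2015-01-14 ", "2015-01-07 ")

# trimws() strips leading and trailing whitespace only.
product_clean <- trimws(product)
date_clean    <- trimws(date)

# Before cleaning the first two products differ by a trailing space;
# afterwards they compare equal and would fall into the same group.
length(unique(product))        # 3 distinct values
length(unique(product_clean))  # 2 distinct values
```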
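Second, you used paste() and := incorrectly. paste(Date, Date) pastes each date to itself element by element; to join all the dates of a group into one string you need collapse, and the grouping columns go in by.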
d[, Date_list := paste(Date, collapse=","), by=c("Product", "Month")]
d
# Product Day Date Month Date_list
# 1: Server1,Serve2,Server4 Tue 2015-01-06 Jan 2015-01-06,2015-01-14,2015-01-15
# 2: App_Servers Wed 2015-01-07 Jan 2015-01-07
# 3: Db_servers,application Tue 2015-01-13 Jan 2015-01-13
# 4: Server1,Serve2,Server4 Wed 2015-01-14 Jan 2015-01-06,2015-01-14,2015-01-15
# 5: Server1,Serve2,Server4 Thu 2015-01-15 Jan 2015-01-06,2015-01-14,2015-01-15
# 6: Server1,Serve2,Sever4 Tue 2015-01-20 Jan 2015-01-20
# 7: Server1,Serve2,Server4 Mon 2015-02-16 Feb 2015-02-16,2015-02-16
# 8: Server1,Serve2,Server4 Mon 2015-02-16 Feb 2015-02-16,2015-02-16
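To see the difference between the two paste() calls without any packages, here is a small base R sketch. The vector `dates` and the data frame `toy` are made up for illustration, and ave() is used here only as a base R stand-in for the grouped := above.

```r
dates <- c("2015-01-06", "2015-01-14", "2015-01-15")

# Element-wise: one string per input element, each date pasted to itself.
pairwise <- paste(dates, dates)

# collapse joins all elements into a single string.
joined <- paste(dates, collapse = ",")

# Grouped version in base R: ave() applies the function per group and
# recycles the single result over the rows of that group.
toy <- data.frame(
  Product = c("A", "B", "A"),
  Month   = c("Jan", "Jan", "Jan"),
  Date    = c("2015-01-06", "2015-01-07", "2015-01-14"),
  stringsAsFactors = FALSE
)
toy$Date_list <- ave(toy$Date, toy$Product, toy$Month,
                     FUN = function(x) paste(x, collapse = ","))
print(toy)
```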
Take a look at the Introduction to data.table and Reference semantics vignettes.
Edit: I just noticed that row 6 has a typo in Product. It has Sever4 instead of Server4, which is why that row ends up in a group of its own.
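If that typo is unintended, correcting it before grouping should bring the row into the same group as the others. A rough base R sketch on made-up vectors (`product`, `date` and `product_fixed` are illustrative names, and the assumption is that Sever4 really should read Server4):

```r
# Toy values modelled on the question (hypothetical subset).
product <- c("Server1,Serve2,Server4", "Server1,Serve2,Sever4",
             "Server1,Serve2,Server4")
date    <- c("2015-01-06", "2015-01-20", "2015-01-14")

# Replace the misspelled name literally (fixed = TRUE, no regex).
product_fixed <- sub("Sever4", "Server4", product, fixed = TRUE)

# All three rows now share one product, so their dates collapse together.
date_list <- tapply(date, product_fixed, paste, collapse = ",")
print(date_list)
```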