Grouping data from a data frame

I have data of the following form:

first second data_col1 data_col2 data_col3
lu    NA     <number>  <number>  <number>
lu    NA     <number>  <number>  <number>
lu    NA     <number>  <number>  <number>
lu    NA     <number>  <number>  <number>
lu    mult   <number>  <number>  <number>
lu    mult   <number>  <number>  <number>
lu    mult   <number>  <number>  <number>
lu    mult   <number>  <number>  <number>
mult  NA     <number>  <number>  <number>
mult  NA     <number>  <number>  <number>
mult  NA     <number>  <number>  <number>
mult  NA     <number>  <number>  <number>

and so on.

I want to group this data by the first two columns and plot each group separately.

I tried this:

comb <- unique(total.df[c(1,2)])
apply(comb, 1, function(x) {
  d<-total.df[total.df$guess==FALSE &
              total.df$second==x[2] &
              total.df$first==x[1] &
              total.df$tasks=='tasks_const',]
  p = ggplot(d, aes(x=d$platform, y=d$time,
                    group=as.factor(d$sched),
             colour=as.factor(d$sched))) +
      geom_point() + geom_line()
  ggsave(filename=sprintf("/tmp/a_%s_%s.png", x[1], x[2]))
})

My comb looks like this:

        first   second
1        mult     <NA>
121        lu     mult
241        lu     <NA>
361      heat     mult
481      heat       lu
601      heat     <NA>
721  cholesky     mult
841  cholesky       lu
961  cholesky     heat
1081 cholesky     <NA>
1201 pipeline     mult
1321 pipeline       lu
1441 pipeline     heat
1561 pipeline cholesky
1681 pipeline     <NA>
1801      gen     mult
1921      gen       lu
2041      gen     heat
2161      gen cholesky
2281      gen pipeline
2401      gen     <NA>

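facet_wrap almost solves my task, but I want each plot as a separate image so that I can see what is actually in it. With facet_wrap, each panel is too small.

The code with facet_wrap is as follows: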

ggplot(total.df, aes(x=total.df$platform, y=total.df$time,
       group=as.factor(total.df$sched),
       colour=as.factor(total.df$sched))) +
geom_point() + geom_line() + facet_wrap(first ~ second);

I would suggest plotting each chart on a separate page of a single PDF file. I can also recommend data.table, since it makes the code look nicer:

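library(data.table)
library(ggplot2)

total.dt <- data.table(total.df)
setkey(total.dt, first, second)
# One row per distinct (first, second) pair
comb <- unique(total.dt[, list(first, second)])

pdf("test.pdf")
for (n in seq_len(nrow(comb))) {
  # Keyed join picks the rows for this pair; the extra filters are applied afterwards
  d <- total.dt[comb[n, ]][guess == FALSE & tasks == "tasks_const"]
  # Inside a loop the plot has to be print()ed to be drawn on a new PDF page
  print(ggplot(d, aes(x = platform, y = time,
                      group = as.factor(sched),
                      colour = as.factor(sched))) +
        geom_point() + geom_line() +
        ggtitle(sprintf("first=%s, second=%s", comb[n, first], comb[n, second])))
}
dev.off()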
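
If separate image files are preferred over one multi-page PDF, the same idea can be sketched with base R's split(). This is only a minimal sketch under some assumptions: the toy total.df below is made up for illustration (it only reuses the column names from the question), and the output directory, file name pattern and image size are arbitrary choices. The point it illustrates is that a comparison such as second == x[2] yields NA whenever second is NA, whereas paste() turns NA into the string "NA", so those pairs get a group of their own.

```r
library(ggplot2)

# Hypothetical toy data standing in for total.df (column names as in the question)
total.df <- data.frame(
  first    = rep(c("lu", "lu", "mult"), each = 4),
  second   = rep(c(NA, "mult", NA), each = 4),
  platform = rep(c(1, 2), times = 6),
  time     = seq(1, 12),
  sched    = rep(c("a", "a", "b", "b"), times = 3),
  guess    = FALSE,
  tasks    = "tasks_const",
  stringsAsFactors = FALSE
)

# Apply the row filters once, up front
d.all <- subset(total.df, guess == FALSE & tasks == "tasks_const")

# paste() renders NA as the string "NA", so NA pairs form their own group
key <- paste(d.all$first, d.all$second, sep = "_")
groups <- split(d.all, key)

out.dir <- tempdir()  # assumed output location, change as needed
files <- character(0)
for (k in names(groups)) {
  p <- ggplot(groups[[k]], aes(x = platform, y = time,
                               group = as.factor(sched),
                               colour = as.factor(sched))) +
    geom_point() + geom_line() + ggtitle(k)
  f <- file.path(out.dir, sprintf("a_%s.png", k))
  # Pass the plot explicitly; otherwise ggsave() falls back to last_plot()
  ggsave(filename = f, plot = p, width = 8, height = 6)
  files <- c(files, f)
}
```

Each element of groups is an ordinary data frame, so the plotting code stays the same as in the loop above; only the grouping step and the output device differ.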