Grouping data from data frame
I have data of the following form:
first second data_col1 data_col2 data_col3
lu NA <number> <number> <number>
lu NA <number> <number> <number>
lu NA <number> <number> <number>
lu NA <number> <number> <number>
lu mult <number> <number> <number>
lu mult <number> <number> <number>
lu mult <number> <number> <number>
lu mult <number> <number> <number>
mult NA <number> <number> <number>
mult NA <number> <number> <number>
mult NA <number> <number> <number>
mult NA <number> <number> <number>
and so on.
I want to group this data by the first two columns and plot each group separately.
This is what I tried:
comb <- unique(total.df[c(1,2)])
apply(comb, 1, function(x) {
d<-total.df[total.df$guess==FALSE &
total.df$second==x[2] &
total.df$first==x[1] &
total.df$tasks=='tasks_const',]
p = ggplot(d, aes(x=d$platform, y=d$time,
group=as.factor(d$sched),
colour=as.factor(d$sched))) +
geom_point() + geom_line()
ggsave(filename=sprintf("/tmp/a_%s_%s.png", x[1], x[2]))
})
My comb looks like this:
first second
1 mult <NA>
121 lu mult
241 lu <NA>
361 heat mult
481 heat lu
601 heat <NA>
721 cholesky mult
841 cholesky lu
961 cholesky heat
1081 cholesky <NA>
1201 pipeline mult
1321 pipeline lu
1441 pipeline heat
1561 pipeline cholesky
1681 pipeline <NA>
1801 gen mult
1921 gen lu
2041 gen heat
2161 gen cholesky
2281 gen pipeline
2401 gen <NA>
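Several of the combinations above have NA in second, which is one likely reason the `==` filter in the attempt does not return the expected rows. The sketch below uses a made-up three-row data frame (`toy`, and the helper `same`, are hypothetical names, not part of the original code) to show that comparing against NA with `==` yields NA, and one possible NA-aware comparison.

```r
# Toy stand-in for total.df; the values are made up for illustration.
toy <- data.frame(first  = c("lu", "lu", "mult"),
                  second = c(NA, "mult", NA),
                  time   = c(1.5, 2.0, 3.2),
                  stringsAsFactors = FALSE)

# One row of comb, roughly as apply() would pass it to the function.
x <- c(first = "lu", second = NA)

# `==` against NA yields NA, so the real row is never selected;
# indexing a data frame with NA returns rows filled with NA instead.
bad <- toy[toy$first == x[1] & toy$second == x[2], ]

# Hypothetical helper: treat "both NA" as a match explicitly.
same <- function(col, value) {
  if (is.na(value)) is.na(col) else !is.na(col) & col == value
}
good <- toy[same(toy$first, x[1]) & same(toy$second, x[2]), ]
```

Here `bad` contains only NA-filled rows, while `good` contains the single row with first equal to "lu" and a missing second.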
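facet_wrap almost solves my problem, but I want each plot as a separate image so that I can actually see what is in it; with facet_wrap each panel is too small.
The code with facet_wrap is as follows: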
ggplot(total.df, aes(x=total.df$platform, y=total.df$time,
group=as.factor(total.df$sched),
colour=as.factor(total.df$sched))) +
geom_point() + geom_line() + facet_wrap(first ~ second);
I would suggest plotting each chart on a separate page of a single PDF file. I would also recommend data.table, since it makes the code cleaner:
library(ggplot2)
library(data.table)
total.dt <- data.table(total.df)
setkey(total.dt, first, second)
# One row per distinct (first, second) pair
comb <- unique(total.dt[, list(first, second)])
pdf("test.pdf")
for (n in seq_len(nrow(comb))) {
  # Keyed join on (first, second), then filter the matching rows
  d <- total.dt[comb[n, ]][guess == FALSE & tasks == "tasks_const"]
  # print() is needed inside a loop so that each plot becomes a PDF page
  print(ggplot(d, aes(x = platform, y = time,
                      group = as.factor(sched),
                      colour = as.factor(sched))) +
    geom_point() + geom_line() +
    ggtitle(sprintf("first=%s, second=%s", comb[n, first], comb[n, second])))
}
dev.off()
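If separate image files are preferred over PDF pages, as the question originally asked, something along the following lines might work. It is only a sketch on made-up data: `toy`, `key`, `out_dir` and the file-name pattern are assumptions for illustration, and the `guess`/`tasks` filter from the question is left out. It relies on `paste()` turning NA into the string "NA", so groups with a missing second are kept.

```r
library(ggplot2)

# Toy stand-in for the (already filtered) total.df; values are made up.
toy <- data.frame(
  first    = rep(c("lu", "lu", "mult"), each = 4),
  second   = rep(c(NA, "mult", NA), each = 4),
  platform = rep(c(1, 2), times = 6),
  sched    = rep(c("a", "a", "b", "b"), times = 3),
  time     = seq(1, 12),
  stringsAsFactors = FALSE
)

# One label per (first, second) pair; paste() renders NA as "NA",
# so rows with a missing `second` still form their own group.
key <- paste(toy$first, toy$second, sep = "_")
groups <- split(toy, key)

out_dir <- tempdir()  # hypothetical output location

# Save one PNG per group and collect the file paths.
files <- vapply(names(groups), function(k) {
  p <- ggplot(groups[[k]], aes(x = platform, y = time,
                               group = as.factor(sched),
                               colour = as.factor(sched))) +
    geom_point() + geom_line() + ggtitle(k)
  f <- file.path(out_dir, sprintf("a_%s.png", k))
  # Pass the plot explicitly rather than relying on last_plot().
  ggsave(filename = f, plot = p, width = 6, height = 4)
  f
}, character(1))
```

Passing `plot = p` to `ggsave()` makes it explicit which plot is written, which is safer inside a function than depending on the most recently created plot.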