How to generate a list of datasets and graphs and export them?
Here is the dataset:
# dataset called DT
library(data.table)
DT <- data.table(
  Store = rep(c("store_A","store_B","store_C","store_D","store_E"),4),
  Amount = sample(1000,20))
I have two goals to achieve:
- 1. Generate independent grouped datasets to export as Excel/CSV files.
- 2. Generate independent graphs to export as PNG files.
*They do not need to run in a single operation.
Constraint:
So far I can only do this with basic one-by-one operations, such as:
# For dataset & CSV export (one store at a time)
store_A <- DT %>% filter(Store == "store_A") %>% group_by(Store) %>% summarise(Total = sum(Amount))
fwrite(store_A,"PATH/store_A.csv")
store_B <- DT %>% filter(Store == "store_B") %>% group_by(Store) %>% summarise(Total = sum(Amount))
fwrite(store_B,"PATH/store_B.csv")
.....
# For graph (pass the plot explicitly instead of relying on the last plot):
Plt_A <- ggplot(store_A,aes(x = Store, y = Total)) + geom_point()
ggsave("PATH/Plt_A.png", plot = Plt_A)
Plt_B <- ggplot(store_B,aes(x = Store, y = Total)) + geom_point()
ggsave("PATH/Plt_B.png", plot = Plt_B)
.....
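*I can find approaches written with for loops, but I am confused about which is more efficient and actually works for generating the graphs: for loops vs. the lapply family.
The real dataset has more than 2 million rows, 70 columns, and 10k groups to generate, so a for loop may run very slowly and crash R itself.
The bottleneck in the real dataset is that it contains 10k "Store" groups.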
Since everything needs to be looped anyway:
library(tidyverse)
library(data.table)
setwd("Your working directory")
# dataset called DT
DT <- data.table(
  Store = rep(c("store_A","store_B","store_C","store_D","store_E"),4),
  Amount = sample(1000,20)) %>%
  # Arrange by store and amount
  arrange(Store, Amount) %>%
  # Nest by store, so the loop index goes store by store
  # (nest(-Store) is the older syntax and warns in tidyr >= 1.0)
  nest(data = -Store)
# Export CSVs by store
for (i in seq_len(nrow(DT))) {
  # [[i]] extracts the nested data frame itself rather than a one-element list
  write.csv(DT$data[[i]], paste(DT$Store[i], "csv", sep = "."))
}
# Export graphs by store
for (i in seq_len(nrow(DT))) {
  # One histogram of Amount per store
  Graph <- DT$data[[i]] %>%
    ggplot(aes(Amount)) + geom_histogram()
  ggsave(paste0(DT$Store[i], ".png"), plot = Graph, width = 14, height = 10, units = "cm")
}
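On the for loop vs. lapply question: as far as I know the two are comparable in speed in current R, and with 10k groups most of the time is likely spent writing files and rendering plots rather than in the loop itself, so the choice is mostly about readability. The following is only a sketch of the same export using data.table's split() and Map() (a member of the apply family). The output folder out_dir and the helper export_store() are hypothetical names introduced for illustration, and the fixed seed and bins = 30 are assumptions made for the example.

```r
library(data.table)
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the sample data is reproducible
DT <- data.table(
  Store  = rep(c("store_A", "store_B", "store_C", "store_D", "store_E"), 4),
  Amount = sample(1000, 20)
)

# Hypothetical output folder; replace with your own path
out_dir <- file.path(tempdir(), "store_exports")
dir.create(out_dir, showWarnings = FALSE)

# split() on a data.table returns a named list with one data.table per store
by_store <- split(DT, by = "Store")

# Hypothetical helper: write one store's CSV and histogram, return both paths
export_store <- function(store, d) {
  csv_path <- file.path(out_dir, paste0(store, ".csv"))
  png_path <- file.path(out_dir, paste0(store, ".png"))
  fwrite(d, csv_path)
  p <- ggplot(d, aes(Amount)) + geom_histogram(bins = 30)
  ggsave(png_path, plot = p, width = 14, height = 10, units = "cm")
  c(csv = csv_path, png = png_path)
}

# One call per store; the result is a list of file paths named by store
paths <- Map(export_store, names(by_store), by_store)
```

Because each store is handled by a self-contained function, nothing accumulates between iterations except the returned file paths, which may help keep memory use flat when the number of groups is large.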