R: aggregating a data.frame misformats a column holding dates and hours
I have a data frame that looks like this:
   kWh   Equipment                date
1 1.53 aquecedor01 2015-01-01 00:00:00
2 5.29 aquecedor01 2015-01-01 01:00:00
3 5.73 aquecedor01 2015-01-01 02:00:00
But when I aggregate the data by the Equipment variable to find the maximum kWh, the date column comes out misformatted, like this:
    Equipment kWh       date
1 aquecedor01 6.5 1433023200
2 aquecedor02 6.5 1433023200
3    exaustor 6.5 1433023200
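I have been struggling with this for a while, and most of what I found handles either dates or times, but not both together. In my case, since I am building the plot inside a Shiny app, it would be easier for me to do everything in one step.
I want to plot the maximum value for each piece of equipment in a bar chart and write the time of that value on the bar. Here is my code: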
ggplotly(ggplot(data = aggregate(
    . ~ Equipment,
    data = dt.hourly[(as.character(input$dateRange[1]) <= dt.hourly$date) &
                     (as.character(input$dateRange[2]) > dt.hourly$date) &
                     (dt.hourly$Equipment %in% input$equipments), ], max),
    aes(x = Equipment, y = kWh)) +
  geom_bar(position = 'dodge', stat = 'identity') +
  geom_text(aes(label = date),
            position = position_stack(vjust = 0.5),
            angle = 90,
            size = 2) +
  xlab("Date") +
  ylab("Consumption (kWh)") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
)
Besides that, angle=90 is ignored, and I don't understand why.
This is what I get: [plot image]
Thanks in advance.
As a reproducible example:
library(plotly)
set.seed(1)
dt <- data.frame(
  kWh = sample(10:100, 10, replace = TRUE)/100,
  Equipment = sample(c("heater", "furnace", "AC"), 10, replace = TRUE),
  date = sample(as.POSIXct(c("2015-01-14 17:00:00", "2015-01-21 20:00:00",
                             "2015-01-21 22:00:00", "2015-02-21 20:00:00",
                             "2015-01-22 14:00:00", "2015-02-14 17:00:00",
                             "2015-02-21 20:00:00", "2015-02-21 22:00:00",
                             "2015-03-21 20:00:00", "2015-03-22 14:00:00")),
                10, replace = TRUE)
)
And the plot:
ggplotly(ggplot(data = aggregate(
    . ~ Equipment,
    data = dt[("2015-01-12" <= dt$date) &
              ("2015-02-22" > dt$date) &
              (dt$Equipment %in% c("AC", "furnace")), ], max),
    aes(x = Equipment, y = kWh)) +
  geom_bar(position = 'dodge', stat = 'identity') +
  geom_text(aes(label = date),
            position = position_stack(vjust = 0.5),
            angle = 90,
            size = 2) +
  xlab("Date") +
  ylab("Consumption (kWh)") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)))
And the dput output is:
structure(list(kWh = c(0.34, 0.43, 0.62, 0.92, 0.28, 0.91, 0.95,
0.7, 0.67, 0.15), Equipment = structure(c(3L, 3L, 1L, 2L, 1L,
2L, 1L, 1L, 2L, 1L), .Label = c("AC", "furnace", "heater"), class = "factor"),
date = structure(c(1427032800, 1421877600, 1424548800, 1421870400,
1421877600, 1424548800, 1421254800, 1424548800, 1426968000,
1424548800), class = c("POSIXct", "POSIXt"), tzone = "")), class = "data.frame", row.names = c(NA,
-10L))
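Since your objective is to annotate the date on which the maximum kWh occurs, you want to leave date out of the aggregation. Therefore, consider using ave to compute the grouped max_kWh, which adds a column of the same length as the data (inline aggregation). Then subset the data frame where kWh == max_kWh.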
dt$max_kWh <- with(dt, ave(kWh, Equipment, FUN = max))
agg_dt <- subset(dt, kWh == max_kWh)
ggplot(data = agg_dt, aes(x = Equipment, y = kWh)) +
  geom_bar(position = 'dodge', stat = 'identity') +
  geom_text(aes(label = date),
            position = position_stack(vjust = 0.5),
            angle = 0,
            size = 2) +
  xlab("Equipment") +
  ylab("Consumption (kWh)") +
  theme(axis.text.x = element_text(angle = 0, hjust = 1))
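To see why leaving date out of the aggregation matters, here is a minimal sketch that rebuilds the question's data from its dput output and compares the two approaches. The tz = "UTC" argument is an assumption added here so the result does not depend on the local time zone (the original data has tzone = ""). aggregate applies max to each column on its own, so the date it reports need not belong to the row with the largest kWh, whereas ave plus subset keeps whole rows.

```r
# Data rebuilt from the dput() output in the question.
# tz = "UTC" is an assumption made for this sketch only.
dt <- data.frame(
  kWh = c(0.34, 0.43, 0.62, 0.92, 0.28, 0.91, 0.95, 0.7, 0.67, 0.15),
  Equipment = factor(c("heater", "heater", "AC", "furnace", "AC",
                       "furnace", "AC", "AC", "furnace", "AC")),
  date = as.POSIXct(c(1427032800, 1421877600, 1424548800, 1421870400,
                      1421877600, 1424548800, 1421254800, 1424548800,
                      1426968000, 1424548800),
                    origin = "1970-01-01", tz = "UTC")
)

# aggregate() takes max() of kWh and of date independently per group,
# so the two values in a result row can come from different source rows.
agg <- aggregate(. ~ Equipment, data = dt, FUN = max)

# ave() returns the group maximum aligned with every row; subsetting on it
# keeps the complete row (including its own date) for each maximum.
dt$max_kWh <- with(dt, ave(kWh, Equipment, FUN = max))
rows <- subset(dt, kWh == max_kWh)

# Compare the date reported for "AC" by each approach (seconds since epoch).
as.numeric(agg$date[agg$Equipment == "AC"])    # latest AC date in the data
as.numeric(rows$date[rows$Equipment == "AC"])  # date of the AC row with max kWh
```

If the maximum kWh is tied within a group, subset returns every tied row, so a bar could receive more than one label.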
For the Shiny integration that reads the input values, use transform to add the max_kWh column, then wrap the result in subset:
agg_dt <- subset(
  transform(dt.hourly[(as.character(input$dateRange[1]) <= dt.hourly$date) &
                      (as.character(input$dateRange[2]) > dt.hourly$date) &
                      (dt.hourly$Equipment %in% input$equipments), ],
            max_kWh = ave(kWh, Equipment, FUN = max)),
  kWh == max_kWh
)
ggplotly(ggplot(data = agg_dt, aes(x = Equipment, y = kWh)) +
  geom_bar(position = 'dodge', stat = 'identity') +
  geom_text(aes(label = date),
            position = position_stack(vjust = 0.5),
            angle = 0,
            size = 2) +
  xlab("Date") +
  ylab("Consumption (kWh)") +
  theme(axis.text.x = element_text(angle = 0, hjust = 1))
)
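The filter-then-select step can also be pulled out of the plotting call so it can be checked outside Shiny. The sketch below is one possible way to do that; the helper name max_kwh_rows and its arguments from, to, equipments and tz are hypothetical and stand in for input$dateRange[1], input$dateRange[2] and input$equipments. The UTC time zone is an assumption.

```r
# Hypothetical helper (not part of the original answer): returns the rows
# holding the maximum kWh per Equipment within [from, to).
max_kwh_rows <- function(df, from, to, equipments, tz = "UTC") {
  # as.character() lets the bounds be Date objects or strings
  from <- as.POSIXct(as.character(from), tz = tz)
  to   <- as.POSIXct(as.character(to), tz = tz)
  keep <- df$date >= from & df$date < to & df$Equipment %in% equipments
  sub  <- df[keep, , drop = FALSE]
  # Group on the character values so unused factor levels create no empty groups
  sub$max_kWh <- ave(sub$kWh, as.character(sub$Equipment), FUN = max)
  sub[sub$kWh == sub$max_kWh, , drop = FALSE]
}

# Data rebuilt from the dput() output in the question (UTC assumed).
dt <- data.frame(
  kWh = c(0.34, 0.43, 0.62, 0.92, 0.28, 0.91, 0.95, 0.7, 0.67, 0.15),
  Equipment = factor(c("heater", "heater", "AC", "furnace", "AC",
                       "furnace", "AC", "AC", "furnace", "AC")),
  date = as.POSIXct(c(1427032800, 1421877600, 1424548800, 1421870400,
                      1421877600, 1424548800, 1421254800, 1424548800,
                      1426968000, 1424548800),
                    origin = "1970-01-01", tz = "UTC")
)

res <- max_kwh_rows(dt, "2015-01-12", "2015-02-22", c("AC", "furnace"))
res[, c("Equipment", "kWh", "date")]
```

Inside the app the call would then read max_kwh_rows(dt.hourly, input$dateRange[1], input$dateRange[2], input$equipments), and its result can be passed to ggplot as agg_dt.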
You can filter the data as needed before plotting:
library(tidyverse)
dt_sum <- dt %>%
  # First filter according to your input
  filter(Equipment %in% c("AC", "furnace") & ("2015-01-12" <= date) & ("2015-02-22" > date)) %>%
  group_by(Equipment) %>% # Group the data by Equipment
  top_n(1, kWh) %>%       # Take the maximum kWh value per Equipment
  top_n(1, date)          # Take the maximum date if there are several with the same max kWh value
dt_sum
# A tibble: 2 x 3
# Groups: Equipment [2]
# kWh Equipment date
# <dbl> <fct> <dttm>
# 1 0.92 furnace 2015-01-21 20:00:00
# 2 0.95 AC 2015-01-14 17:00:00
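In dplyr 1.0.0 and later, top_n is documented as superseded by slice_max. A minimal sketch of the same selection with slice_max follows, assuming dplyr 1.0.0 or newer is installed; it reuses the dput data from the question with an assumed UTC time zone, and the name dt_sum2 is only illustrative.

```r
library(dplyr)

# Data rebuilt from the dput() output in the question (UTC assumed).
dt <- data.frame(
  kWh = c(0.34, 0.43, 0.62, 0.92, 0.28, 0.91, 0.95, 0.7, 0.67, 0.15),
  Equipment = factor(c("heater", "heater", "AC", "furnace", "AC",
                       "furnace", "AC", "AC", "furnace", "AC")),
  date = as.POSIXct(c(1427032800, 1421877600, 1424548800, 1421870400,
                      1421877600, 1424548800, 1421254800, 1424548800,
                      1426968000, 1424548800),
                    origin = "1970-01-01", tz = "UTC")
)

dt_sum2 <- dt %>%
  filter(Equipment %in% c("AC", "furnace"),
         date >= as.POSIXct("2015-01-12", tz = "UTC"),
         date <  as.POSIXct("2015-02-22", tz = "UTC")) %>%
  group_by(Equipment) %>%
  slice_max(kWh, n = 1) %>%                      # rows with the max kWh (ties kept)
  slice_max(date, n = 1, with_ties = FALSE) %>%  # among ties, keep the latest date
  ungroup()

dt_sum2
```

Setting with_ties = FALSE in the second step guarantees at most one row per Equipment, so each bar gets a single label.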
p <- ggplot(dt_sum, aes(x = Equipment, y = kWh)) +
  geom_bar(position = 'dodge', stat = 'identity') +
  geom_text(aes(label = date), position = position_stack(vjust = 0.5),
            angle = 90, size = 2) +
  xlab("Date") +
  ylab("Consumption (kWh)") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
p
The angle problem is caused by ggplotly (as you can see, angle = 90 is not ignored in the plain ggplot call).
ggplotly(p)