如何在 R 中动态地对时间范围内的重叠值求和?
How to sum overlapping values in a time range dynamically in R?
我有一个数据集,其中包含项目名称、开始日期、结束日期以及分配给每个项目的价值点,描述了它的强度。我想动态地可视化这些数据(f.ex.with plotly)并显示在给定时间范围内重叠的值点的总和。这是一些示例数据:
Project Title
Start date
End date
Points
Project A
20/04/2019
14/07/2023
10
Project B
18/06/2020
15/05/2022
5
Project C
01/12/2021
19/04/2023
3
Project D
09/07/2023
17/08/2024
2
(还有很多行)
在上面的例子中,项目A、B、C重叠,共20分,后来项目A和D重叠了一小段时间,共计12分几天。
我已将数据框融化为长格式以制作甘特图,例如:
df_tidy <- dataset2 %>%
melt(dataset2,
id.vars = "Project Title", "Points"),
measure.vars = c("start_date", "finish_date"),
variable.name = "variable")
我现在需要找到一种方法来对时间重叠的点求和,并以某种方式将其绘制在甘特图中。目标是能够在任何给定时间查看项目计划的总点数。我的主要想法是在任何给定时间悬停时让它以 plotly 显示,但我找不到办法做到这一点。有人对如何做有意见吗?谢谢!!
这不是情节,但你明白了。特别是,我创建了一个新的数据框,其中包含对点数发生变化的每个时间点的观察。然后可以用 geom_step()
.
绘制
# Data
d <- structure(list(name = c("Project A", "Project B", "Project C",
"Project D"), start = structure(c(18006, 18431, 18962, 19547), class = "Date"),
end = structure(c(19552, 19127, 19466, 19952), class = "Date"),
pts = c(10, 5, 3, 2)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -4L))
library(tidyverse)
library(lubridate)
# Generate interval variable
d <- d |>
mutate(interval = start %--% end)
# New data set with point values where changes occur
points <-
tibble(dates = c(d$start, d$end + days(1))) |>
arrange(dates) |>
mutate(pts = map_dbl(dates, \(x) d$pts[x %within% d$interval] |> sum()))
pts_range <- range(points$pts)
# Calculate positions for combining discrete and continuous scale of names and points in plot
d <- d |>
mutate(pos = name |>
factor() |>
as.numeric() |>
{\(x) (x - 1) / (max(x)-1) * (pts_range[2] - pts_range[1] - 1) + pts_range[1] + 1}()
)
ggplot(d) +
geom_linerange(aes(xmin = start,
xmax = end,
y = pos,
color = name),
size = 12) +
geom_text(aes(x = start + ((end - start) / 2),
y = pos,
label = name)) +
geom_step(data = points,
aes(x = dates, y = pts),
alpha = .6,
size = 2) +
labs(x = NULL,
y = "Points") +
theme(legend.position = "none")
由 reprex package (v2.0.1)
创建于 2022-02-01
我有一个数据集,其中包含项目名称、开始日期、结束日期以及分配给每个项目的价值点,描述了它的强度。我想动态地可视化这些数据(f.ex.with plotly)并显示在给定时间范围内重叠的值点的总和。这是一些示例数据:
Project Title | Start date | End date | Points |
---|---|---|---|
Project A | 20/04/2019 | 14/07/2023 | 10 |
Project B | 18/06/2020 | 15/05/2022 | 5 |
Project C | 01/12/2021 | 19/04/2023 | 3 |
Project D | 09/07/2023 | 17/08/2024 | 2 |
(还有很多行)
在上面的例子中,项目A、B、C重叠,共20分,后来项目A和D重叠了一小段时间,共计12分几天。 我已将数据框融化为长格式以制作甘特图,例如:
df_tidy <- dataset2 %>%
melt(dataset2,
id.vars = "Project Title", "Points"),
measure.vars = c("start_date", "finish_date"),
variable.name = "variable")
我现在需要找到一种方法来对时间重叠的点求和,并以某种方式将其绘制在甘特图中。目标是能够在任何给定时间查看项目计划的总点数。我的主要想法是在任何给定时间悬停时让它以 plotly 显示,但我找不到办法做到这一点。有人对如何做有意见吗?谢谢!!
这不是情节,但你明白了。特别是,我创建了一个新的数据框,其中包含对点数发生变化的每个时间点的观察。然后可以用 geom_step()
.
# Data
d <- structure(list(name = c("Project A", "Project B", "Project C",
"Project D"), start = structure(c(18006, 18431, 18962, 19547), class = "Date"),
end = structure(c(19552, 19127, 19466, 19952), class = "Date"),
pts = c(10, 5, 3, 2)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -4L))
library(tidyverse)
library(lubridate)
# Generate interval variable
d <- d |>
mutate(interval = start %--% end)
# New data set with point values where changes occur
points <-
tibble(dates = c(d$start, d$end + days(1))) |>
arrange(dates) |>
mutate(pts = map_dbl(dates, \(x) d$pts[x %within% d$interval] |> sum()))
pts_range <- range(points$pts)
# Calculate positions for combining discrete and continuous scale of names and points in plot
d <- d |>
mutate(pos = name |>
factor() |>
as.numeric() |>
{\(x) (x - 1) / (max(x)-1) * (pts_range[2] - pts_range[1] - 1) + pts_range[1] + 1}()
)
ggplot(d) +
geom_linerange(aes(xmin = start,
xmax = end,
y = pos,
color = name),
size = 12) +
geom_text(aes(x = start + ((end - start) / 2),
y = pos,
label = name)) +
geom_step(data = points,
aes(x = dates, y = pts),
alpha = .6,
size = 2) +
labs(x = NULL,
y = "Points") +
theme(legend.position = "none")
由 reprex package (v2.0.1)
创建于 2022-02-01