Time series plot for missing data
I have some sequential event data, and I want to plot how the missingness of value changes over time. An example is shown below:
id time value
1 aa122 1 1
2 aa2142 1 1
3 aa4341 1 1
4 bb132 1 2
5 bb2181 2 1
6 bb3242 2 3
7 bb3321 2 NA
8 cc122 2 1
9 cc2151 2 2
10 cc3241 3 1
11 dd161 3 3
12 dd2152 3 NA
13 dd3282 3 NA
14 ee162 3 1
15 ee2201 4 2
16 ee3331 4 NA
17 ff1102 4 NA
18 ff2141 4 NA
19 ff3232 5 1
20 gg142 5 3
21 gg2192 5 NA
22 gg3311 5 NA
23 gg4362 5 NA
24 ii111 5 NA
The number of NA values is expected to increase over time (the behavior is fading out). How can I plot NA across time?
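I think this is what you are looking for: you want to see how many NAs occur over time. Assuming that is right, and treating each value of time as a group, you can count the number of NAs in each group.
Data: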
df <- structure(list(id = structure(1:24, .Label = c("aa122", "aa2142",
"aa4341", "bb132", "bb2181", "bb3242", "bb3321", "cc122", "cc2151",
"cc3241", "dd161", "dd2152", "dd3282", "ee162", "ee2201", "ee3331",
"ff1102", "ff2141", "ff3232", "gg142", "gg2192", "gg3311", "gg4362",
"ii111"), class = "factor"), time = c(1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L,
5L, 5L), value = c(1L, 1L, 1L, 2L, 1L, 3L, NA, 1L, 2L, 1L, 3L,
NA, NA, 1L, 2L, NA, NA, NA, 1L, 3L, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA,
-24L))
library(tidyverse)  # attaches dplyr and ggplot2

# Count the missing values of `value` within each time point
df %>%
  group_by(time) %>%
  summarise(sumNA = sum(is.na(value)))
# A tibble: 5 × 2
time sumNA
<int> <int>
1 1 0
2 2 1
3 3 2
4 4 3
5 5 4
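One caveat worth checking: the time points in this sample do not all have the same number of rows (4, 5, 5, 4 and 6), so raw counts and shares can tell slightly different stories. The sketch below is an optional addition, not part of the original approach. It rebuilds only the time and value columns so that it runs on its own, and the column names n and propNA are made up for illustration.

```r
library(dplyr)

# Same time/value columns as df above, rebuilt compactly so this block is self-contained
df <- data.frame(
  time  = rep(1:5, times = c(4, 5, 5, 4, 6)),
  value = c(1, 1, 1, 2,
            1, 3, NA, 1, 2,
            1, 3, NA, NA, 1,
            2, NA, NA, NA,
            1, 3, NA, NA, NA, NA)
)

na_by_time <- df %>%
  group_by(time) %>%
  summarise(
    n      = n(),                 # rows observed at this time point
    sumNA  = sum(is.na(value)),   # count of missing values
    propNA = mean(is.na(value))   # share of missing values
  )

na_by_time
```

With this sample the count rises steadily from 0 to 4, while the share is 0.75 at time 4 and about 0.67 at time 5, because time 5 has more rows. Whether the count or the share is the better measure depends on what "fading out" should mean for your data.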
You can then plot the NA count per time point with ggplot2:
df %>%
  group_by(time) %>%
  summarise(sumNA = sum(is.na(value))) %>%
  ggplot(aes(x = time, y = sumNA)) +
  geom_line()
As you can see, the number of NAs increases as time goes on.
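If you would rather see the missing rows next to the observed ones, a stacked bar chart is another option that needs no summarising step. This is only a sketch of an alternative view, not something the question asked for. The status column is a hypothetical helper introduced here, and the data frame is rebuilt from the time and value columns so the block runs on its own.

```r
library(ggplot2)

# Same time/value columns as df above, rebuilt compactly so this block is self-contained
df <- data.frame(
  time  = rep(1:5, times = c(4, 5, 5, 4, 6)),
  value = c(1, 1, 1, 2,
            1, 3, NA, 1, 2,
            1, 3, NA, NA, 1,
            2, NA, NA, NA,
            1, 3, NA, NA, NA, NA)
)

# Hypothetical helper column: label each row as missing or observed
df$status <- ifelse(is.na(df$value), "missing", "observed")

# geom_bar() counts rows per time point and stacks them by status
p <- ggplot(df, aes(x = factor(time), fill = status)) +
  geom_bar() +
  labs(x = "time", y = "number of rows", fill = "value")

p
```

Each bar's total height is the number of rows at that time point, so differences in group size stay visible alongside the growing missing segment.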