How to plot the availability of a variable by year?
year <- c(2000:2014)
group <- c("A","A","A","A","A","A","A","A","A","A","A","A","A","A","A",
"B","B","B","B","B","B","B","B","B","B","B","B","B","B","B",
"C","C","C","C","C","C","C","C","C","C","C","C","C","C","C")
value <- sample(1:5, 45, replace=TRUE)
df <- data.frame(year,group,value)
df$value[df$value==1] <- NA
year group value
1 2000 A NA
2 2001 A 2
3 2002 A 2
...
11 2010 A 2
12 2011 A 3
13 2012 A 5
14 2013 A NA
15 2014 A 3
16 2000 B 2
17 2001 B 3
...
26 2010 B NA
27 2011 B 5
28 2012 B 4
29 2013 B 3
30 2014 B 5
31 2000 C 5
32 2001 C 4
33 2002 C 3
34 2003 C 4
...
44 2013 C 5
45 2014 C 3
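Above is a sample data frame for my question.
Each group (A, B, or C) has values from 2000 to 2014, but in some years the value may be missing for some groups.
The plot I would like to draw is as follows:
The x-axis is the year
The y-axis is the group (i.e. A, B, and C should appear as the y-axis labels)
A bar or line indicates the availability of the value for each group
If the value is NA, no bar is shown at that time point.
ggplot2 is preferred if possible.
Can anyone help?
Thanks.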
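I think my description was confusing. I am expecting a figure like the one below, but with the year on the x-axis. The bars or lines indicate whether a value is available for a given group in a given year.
In the sample data frame, for group A we have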
2012 A 5
2013 A NA
2014 A 3
So nothing should be drawn for group A at 2013, and a point should appear again for group A at 2014.
You can use geom_errorbar with no range (geom_errorbarh for the horizontal version). Then just subset the data with complete.cases (or !is.na(df$value)).
library(ggplot2)
set.seed(10)  # make the random sample reproducible
year <- c(2000:2014)
group <- rep(c("A", "B", "C"), each = 15)
value <- sample(1:5, 45, replace = TRUE)
df <- data.frame(year, group, value)  # year is recycled across the three groups
df$value[df$value == 1] <- NA         # treat 1 as "missing"
# keep only the rows where a value is available
no_na_df <- df[complete.cases(df), ]
# xmin == xmax gives a zero-length bar, so only the end caps are drawn as ticks
# (ggplot2 3.4.0 and later prefer linewidth over size for line thickness)
ggplot(no_na_df, aes(x = year, y = group)) +
  geom_errorbarh(aes(xmax = year, xmin = year), size = 2)
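The subsetting step can be checked on its own, without any plotting. The following is a minimal sketch in base R, assuming the same sample data as the question; the names by_complete_cases and by_is_na are only illustrative. Because value is the only column that contains NA, the two filters should select the same rows.

```r
# Rebuild the sample data from the question (the seed is arbitrary).
set.seed(10)
year  <- 2000:2014
group <- rep(c("A", "B", "C"), each = 15)
value <- sample(1:5, 45, replace = TRUE)
df <- data.frame(year, group, value)
df$value[df$value == 1] <- NA

# Two ways to drop the rows with a missing value.
by_complete_cases <- df[complete.cases(df), ]  # checks every column
by_is_na          <- df[!is.na(df$value), ]    # checks only `value`

# Compare the two subsets; only `value` can be NA in this data.
identical(by_complete_cases, by_is_na)

# Number of years with an available value, per group.
table(by_is_na$group)
```

If other columns could also contain NA, complete.cases(df) would drop those rows too, so !is.na(df$value) is the narrower choice when only value matters.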
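Edit:
To get bars that run continuously across consecutive years, you can use this slightly unappealing approach. The group data has to be given a numeric representation so that the bars can have a height. After that, we can make the scale display the variable as discrete again.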
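# since R 4.0.0, data.frame() keeps strings as character, so convert to factor first
df$group <- factor(df$group)
df$group_n <- as.numeric(df$group)  # A = 1, B = 2, C = 3
no_na_df <- df[complete.cases(df), ]
# each available year becomes a rectangle one year wide and 0.2 units tall
ggplot(no_na_df, aes(xmin = year - 0.5, xmax = year + 0.5, y = group_n)) +
  geom_rect(aes(ymin = group_n - 0.1, ymax = group_n + 0.1)) +
  scale_y_discrete(limits = levels(df$group))  # label the axis with A, B, C again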
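As a possible alternative to the numeric conversion, geom_tile can draw fixed-size rectangles directly on a discrete y-axis. This is a different technique from the geom_rect approach above, shown here only as a sketch under the assumption that the same sample data is used; the names avail and p are illustrative.

```r
library(ggplot2)

# Rebuild the sample data from the question (the seed is arbitrary).
set.seed(10)
df <- data.frame(
  year  = rep(2000:2014, times = 3),
  group = rep(c("A", "B", "C"), each = 15),
  value = sample(1:5, 45, replace = TRUE)
)
df$value[df$value == 1] <- NA

# Keep only the year/group combinations where a value is available.
avail <- df[!is.na(df$value), ]

# One tile per available year: width = 1 makes tiles of neighbouring years
# touch, and height = 0.2 keeps each group's band thin.
p <- ggplot(avail, aes(x = year, y = group)) +
  geom_tile(width = 1, height = 0.2)
p
```

Years with NA leave a gap in the band for that group, and the y-axis keeps the A, B, C labels without a separate scale call.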