How to construct a single graph for two variables with completely different scales?
I have this dataset:
df <- data.frame(year = seq(1970, 2015, by = 5),
staff = c(219, 231, 259, 352, 448, 427, 556, 555, 602, 622),
applications = c(5820, 7107, 6135, 16119, 19381, 36611, 54962, 45759, 40358, 458582))
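I want to do an exploratory analysis and check whether the number of staff is growing in line with the number of applications received. I drew a line chart in Excel, but it is not very informative.
I also took the log of both variables, which gives almost the result I expected, but I wonder whether a chart on a log scale would be harder for non-mathematicians to interpret. I want to use these charts in a presentation to managers who do not know much about statistics or mathematics.
My question is how to handle this situation so that the chart is meaningful.
I have a feeling that R may offer a better solution than Excel (which is why I am asking here), but the question is how.
Any help would be appreciated.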
One suggestion is to change your measure to some kind of ratio, for example staff per application. In what follows, I use staff per 1,000 applications:
library(ggplot2)
df <- data.frame(year = seq(1970, 2015, by = 5),
staff = c(219, 231, 259, 352, 448, 427, 556, 555, 602, 622),
applications = c(5820, 7107, 6135, 16119, 19381, 36611, 54962, 45759, 40358, 458582))
ggplot(data = df, aes(x = year, y = staff / (applications / 1000))) +
geom_point(size = 3) +
geom_line() +
ggtitle("Staff per 1,000 Applications")
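To see the numbers behind that plot, here is a minimal base R sketch that stores the ratio as a column and prints it. The column name staff_per_1000 is only an illustrative choice, not something from the original answer.

```r
# Same data as above, repeated so the snippet runs on its own
df <- data.frame(year = seq(1970, 2015, by = 5),
                 staff = c(219, 231, 259, 352, 448, 427, 556, 555, 602, 622),
                 applications = c(5820, 7107, 6135, 16119, 19381, 36611, 54962, 45759, 40358, 458582))

# Staff per 1,000 applications (staff_per_1000 is a hypothetical column name)
df$staff_per_1000 <- df$staff / (df$applications / 1000)

# Show the ratio next to the year, rounded to three significant digits
print(df[, c("year", "staff_per_1000")], digits = 3)
```

A falling value means applications are growing faster than staff; a rising value means the opposite.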
We can get the same result without ggplot2:
with(df, {
  # Base graphics calls are separate statements; they are not chained with `+`
  plot(x = year, y = staff / (applications / 1000), type = "l", main = "Staff per 1,000 Applications")
  points(x = year, y = staff / (applications / 1000), pch = 21, cex = 2, bg = "black")
})
Alternatively, you can make your dataset a bit tidier (one row per year and measure) and plot the two measures in separate facets with free_y scales:
library(tidyr)
df_tidy <- gather(df, measure, value, -year)
ggplot(data = df_tidy, aes(x = year, y = value)) +
geom_point(size = 3) +
geom_line() +
facet_grid(measure ~ ., scales = "free_y")
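In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). The following is a sketch of the same reshaping step with the newer function, assuming such a tidyr version is installed; the column names measure and value are kept from the code above.

```r
library(tidyr)

# Same data as above, repeated so the snippet runs on its own
df <- data.frame(year = seq(1970, 2015, by = 5),
                 staff = c(219, 231, 259, 352, 448, 427, 556, 555, 602, 622),
                 applications = c(5820, 7107, 6135, 16119, 19381, 36611, 54962, 45759, 40358, 458582))

# One row per (year, measure) pair; every column except year is stacked
df_tidy <- pivot_longer(df, cols = -year,
                        names_to = "measure", values_to = "value")
```

The resulting df_tidy has the same columns as the gather() version (the row order differs), so the ggplot call above should work with it unchanged.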
We can use this process:
library(ggplot2)
library(reshape2)
# Define the data before plotting it
df <- data.frame(year = seq(1970, 2015, by = 5),
  staff = c(219, 231, 259, 352, 448, 427, 556, 555, 602, 622),
  applications = c(5820, 7107, 6135, 16119, 19381, 36611, 54962, 45759, 40358, 458582))
# Both series on one shared y-axis, distinguished by colour
ggplot(df, aes(year)) +
  geom_line(aes(y = staff, colour = "staff")) +
  geom_line(aes(y = applications, colour = "applications"))
I suggest using facet_grid with scales = "free_y".
ggplot(reshape2::melt(df, 1), aes(year, value)) +
geom_line() + geom_point() +
facet_grid(variable ~ ., scales = 'free_y')
The output is a plot with one panel per variable, each with its own y-axis.
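The question also mentions taking logs of both variables. As a hedged sketch of that idea, the data can be left untransformed and only the axis put on a log scale with ggplot2's scale_y_log10(), so the tick labels stay in original units. The helper plain_labels, the df_long layout, and the chosen breaks are illustrative assumptions, not part of any answer above.

```r
library(ggplot2)

# Same data as above, repeated so the snippet runs on its own
df <- data.frame(year = seq(1970, 2015, by = 5),
                 staff = c(219, 231, 259, 352, 448, 427, 556, 555, 602, 622),
                 applications = c(5820, 7107, 6135, 16119, 19381, 36611, 54962, 45759, 40358, 458582))

# Long format built with base R: one row per (year, measure) pair
df_long <- data.frame(year = rep(df$year, times = 2),
                      measure = rep(c("staff", "applications"), each = nrow(df)),
                      value = c(df$staff, df$applications))

# Hypothetical helper: plain numbers with thousands separators, no "1e+05"
plain_labels <- function(x) format(x, big.mark = ",", scientific = FALSE, trim = TRUE)

# Both series on one log-scaled y-axis, labelled in original units
p <- ggplot(df_long, aes(x = year, y = value, colour = measure)) +
  geom_line() +
  geom_point() +
  scale_y_log10(breaks = c(100, 1000, 10000, 100000), labels = plain_labels) +
  labs(y = "Count (log scale)")
print(p)
```

On a log axis, equal vertical distances correspond to equal percentage changes, which may be an easier message for a non-technical audience than the word "logarithm".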