ggplot/scatterplot of rank in one year against rank in a different year
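This seems simple enough; if you know of a duplicate, please point me to it. With the data arranged in long format using the melt() function of package reshape2 and plotted with the ggplot() command of package ggplot2 (both by Hadley Wickham), I would like to plot the ids of a variable ordered by their 2007 rank against the same ids ordered by their 2009 rank.
My best shot: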
ggplot(data = df, aes(
    x = reorder(subset(id, year == "2007"), subset(rank, year == "2007")),
    y = reorder(subset(id, year == "2009"), subset(rank, year == "2009")))) +
  geom_point()
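In the plot above, the points lie along the 45-degree line rather than at the (id, id) intersections, e.g. (HSBC, HSBC). The ids are ordered as expected along the x-axis, but in reverse order along the y-axis.
Note: my ultimate aim is a bubble chart with dot sizes proportional to the values, and with the variable labels and values printed next to the circles.
Data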
head(df)
## id year value rank
## 13 Citigroup 2007 255 1
## 15 HSBC 2007 215 2
## 14 JP Morgan 2007 165 3
## 2 Royal Bank of Scotland 2007 120 4
## 9 UBS 2007 116 5
## 12 Santander 2007 116 6
df <- structure(list(id = c("Citigroup", "HSBC", "JP Morgan", "Royal Bank of Scotland",
"UBS", "Santander", "BNP Paribas", "Goldman Sachs", "Unicredit",
"Barclays", "Societe Generale", "Deutsche Bank", "Credit Suisse",
"Credit Agricole", "Morgan Stanley", "HSBC", "JP Morgan", "Santander",
"UBS", "Goldman Sachs", "BNP Paribas", "Credit Suisse", "Societe Generale",
"Unicredit", "Citigroup", "Credit Agricole", "Morgan Stanley",
"Deutsche Bank", "Barclays", "Royal Bank of Scotland"), year = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("2007",
"2009"), class = "factor"), value = c(255, 215, 165, 120, 116,
116, 108, 100, 93, 91, 80, 76, 75, 67, 49, 97, 85, 64, 35, 35,
32.5, 27, 26, 26, 19, 17, 16, 10.3, 7.4, 4.6), rank = c(1, 2,
3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1, 2, 3, 4, 5, 6,
7, 8, 9, 10, 11, 12, 13, 14, 15)), .Names = c("id", "year", "value",
"rank"), row.names = c(13L, 15L, 14L, 2L, 9L, 12L, 7L, 11L, 8L,
6L, 5L, 3L, 10L, 4L, 1L, 30L, 29L, 27L, 24L, 26L, 22L, 25L, 20L,
23L, 28L, 19L, 16L, 18L, 21L, 17L), class = "data.frame")
## Data Source: A lecture handout by professor Andrei Shleifer of Harvard university, with data source quoted as J.P. Morgan and dated February 2009.
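I think the main issue is that your data is in a "too long" format, i.e. your x values (2007 rank) and y values (2009 rank) end up in the same column. Perhaps you can easily change this upstream, in a data-massaging step not shown in the post.
Anyway, given the data in the post, I would first cast it to a wider format (here using data.table::dcast), so that the x and y values are in separate columns: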
library(data.table)
df2 <- dcast(setDT(df), id ~ year, value.var = c("value", "rank"))
head(df2)
# id value_2007 value_2009 rank_2007 rank_2009
# 1: BNP Paribas 108 32.5 7 6
# 2: Barclays 91 7.4 10 14
# 3: Citigroup 255 19.0 1 10
# 4: Credit Agricole 67 17.0 14 11
# 5: Credit Suisse 75 27.0 13 7
# 6: Deutsche Bank 76 10.3 12 13
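If data.table is not available, the same cast can be sketched with base R's reshape(). This is only an illustration on a three-bank sample copied from the post; the names `long` and `wide` are made up here, and the resulting columns are separated by "." (value.2007, rank.2007, ...) rather than "_".

```r
# Small long-format sample in the same layout as the question's df
# (three banks only, values copied from the post).
long <- data.frame(
  id    = c("Citigroup", "HSBC", "JP Morgan", "HSBC", "JP Morgan", "Citigroup"),
  year  = c("2007", "2007", "2007", "2009", "2009", "2009"),
  value = c(255, 215, 165, 97, 85, 19),
  rank  = c(1, 2, 3, 1, 2, 10),
  stringsAsFactors = FALSE
)

# One row per id; value and rank are spread into one column per year.
wide <- reshape(long, idvar = "id", timevar = "year", direction = "wide")
rownames(wide) <- NULL

names(wide)
```

With the columns named this way, the plotting call only needs rank.2007 and rank.2009 in place of rank_2007 and rank_2009.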
Then the plotting is rather straightforward:
library(ggplot2)
ggplot(data = df2, aes(x = rank_2007, y = rank_2009, label = id)) +
  geom_text(vjust = 1) +
  geom_point(aes(size = value_2007), alpha = 0.2) +
  geom_point(aes(size = value_2009), alpha = 0.2)
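Of course, there are plenty of possibilities for beautification (label positioning, scaling of the point sizes, etc.), but that is another story.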
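As a footnote to the "too long" diagnosis, here is a small sketch of why the attempt in the question probably puts every point on the diagonal. It assumes, as in the posted data, that the rows of each year are already sorted by rank; the sample data frame `long` and the names `x` and `y` are made up for illustration. Each subset() returns the ids in rank order, so the i-th element of x and the i-th element of y both sit at factor level i, even though they are different banks.

```r
# Three-bank sample in the question's long layout, rows sorted by rank
# within each year (values copied from the post).
long <- data.frame(
  id    = c("Citigroup", "HSBC", "JP Morgan", "HSBC", "JP Morgan", "Citigroup"),
  year  = c("2007", "2007", "2007", "2009", "2009", "2009"),
  rank  = c(1, 2, 3, 1, 2, 10),
  stringsAsFactors = FALSE
)

# The same expressions as in the question's aes() call.
x <- reorder(subset(long$id, long$year == "2007"),
             subset(long$rank, long$year == "2007"))
y <- reorder(subset(long$id, long$year == "2009"),
             subset(long$rank, long$year == "2009"))

# Positions on the two discrete axes: identical, hence the 45-degree line.
data.frame(x_id = as.character(x), x_pos = as.integer(x),
           y_id = as.character(y), y_pos = as.integer(y))
```

The two vectors are paired by position, not by id, which is why casting to wide format (one row per id) is needed before plotting.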
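lukeA (in the comments section) and Henrik provided some great suggestions and answers to my question. Thanks! As a follow-up, I would like to show how I combined their suggestions to produce the bubble chart and rank-correlation visual:
The first plot uses geom_point() combined with colour and size, while the second plot uses geom_point() combined with fill and size instead, using the shape = 21 argument to print a hollow circle in the legend. I find the size legend bubbles filled in black a bit too overwhelming visually.
For some reason, the name argument of the legends is not printed, which is something I have not come across before and cannot explain. Maybe some more tweaking of colours and shapes is needed... Comments welcome!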
df <- structure(list(id = c("HSBC", "JP Morgan", "Santander", "UBS",
"Goldman Sachs", "BNP Paribas", "Credit Suisse", "Unicredit",
"Societe Generale", "Citigroup", "Credit Agricole", "Morgan Stanley",
"Deutsche Bank", "Barclays", "Royal Bank of Scotland"), value.2007 = c(215L,
165L, 116L, 116L, 100L, 108L, 75L, 93L, 80L, 255L, 67L, 49L,
76L, 91L, 120L), value.2009 = c(97, 85, 64, 35, 35, 32.5, 27,
26, 26, 19, 17, 16, 10.3, 7.4, 4.6), rank.2007 = c(2, 3, 6, 5,
8, 7, 13, 9, 11, 1, 14, 15, 12, 10, 4), rank.2009 = c(1, 2, 3,
4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)), .Names = c("id",
"value.2007", "value.2009", "rank.2007", "rank.2009"), row.names = c(15L,
14L, 12L, 9L, 11L, 7L, 10L, 8L, 5L, 13L, 4L, 1L, 3L, 6L, 2L), class = "data.frame")
## Comments:
# 1. properly scale bubbles in bubble-chart:
v1 <- min(df$value.2007, df$value.2009)
v2 <- max(df$value.2007, df$value.2009)
# use + scale_size(range = c(v1, v2)/10) or similar
# 2. increase the size of points in the legend
# with + guides(colour = guide_legend(override.aes = list(size = 10)))
# 3. add a name to the legend guides: FAIL!
# + scale_size(name = "Market Cap ($bn)", range = c(v1, v2)/10)
# + scale_color_manual(name = "Year", values = c("royalblue", "forestgreen"))
# Version 1. solid shape with colour
library("ggplot2")
library("scales")
p <- ggplot(data = df, aes(x = rank.2007, y = rank.2009, label = id)) +
  geom_point(aes(size = value.2007, colour = "2007"), alpha = 0.8) +
  geom_point(aes(size = value.2009, colour = "2009"), alpha = 0.8) +
  geom_text(size = 4, vjust = -5) +
  scale_x_continuous(limits = c(-1, 17), breaks = seq(1, 16, 2)) +
  scale_y_continuous(limits = c(-1, 17), breaks = seq(1, 16, 2)) +
  coord_fixed() +
  scale_color_manual(values = c("royalblue", "forestgreen")) +
  scale_size(range = c(v1, v2)/10) +
  guides(colour = guide_legend(override.aes = list(size = 10))) +
  theme_bw() +
  xlab("Rank by Market Capitalization in 2007") +
  ylab("Rank by Market Capitalization in 2009") +
  ggtitle("Market Capitalization Before and After the Crisis \n(Selected Banks: 2009 versus 2007)") +
  theme(legend.position = "right", legend.direction = "vertical") +
  theme(legend.title = element_blank()) +
  theme(legend.key = element_blank())
p
ggsave("p1.jpg", plot = p, width = 12, height = 10)
# Version 2: hollow shape with fill
library("ggplot2")
library("scales")
p <- ggplot(data = df, aes(x = rank.2007, y = rank.2009, label = id)) +
  geom_point(aes(size = value.2007, fill = "2007"),
             shape = 21, alpha = 0.8) +
  geom_point(aes(size = value.2009, fill = "2009"),
             shape = 21, alpha = 0.8) +
  geom_text(size = 4, vjust = -5) +
  scale_size(name = "Market Cap ($bn)", range = c(v1, v2)/10) +
  scale_shape(solid = FALSE) + # combined with shape=21
  scale_x_continuous(limits = c(-1, 17), breaks = seq(1, 16, 2)) +
  scale_y_continuous(limits = c(-1, 17), breaks = seq(1, 16, 2)) +
  coord_fixed() +
  scale_fill_manual(name = "Year", values = c("royalblue", "forestgreen")) +
  guides(fill = guide_legend(override.aes = list(size = 10))) +
  theme_bw() +
  xlab("Rank by Market Capitalization in 2007") +
  ylab("Rank by Market Capitalization in 2009") +
  ggtitle("Market Capitalization Before and After the Crisis \n(Selected Banks: 2009 versus 2007)") +
  theme(legend.position = "right", legend.direction = "vertical") +
  theme(legend.title = element_blank()) +
  theme(legend.key = element_blank())
p
ggsave("p2.jpg", plot = p, width = 12, height = 10)
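On the missing legend names: one likely explanation is the theme(legend.title = element_blank()) line in both versions, since that theme setting blanks every legend title no matter what name is passed to the scales. The sketch below is a cut-down comparison on three banks from the data above; `banks`, `base_plot`, `p_named`, and `p_blank` are names made up for illustration, not part of the plots above.

```r
library(ggplot2)

# Three banks from the post's wide-format data, enough to draw both legends.
banks <- data.frame(
  id         = c("HSBC", "JP Morgan", "Citigroup"),
  value.2007 = c(215, 165, 255),
  value.2009 = c(97, 85, 19),
  rank.2007  = c(2, 3, 1),
  rank.2009  = c(1, 2, 10)
)

base_plot <- ggplot(banks, aes(x = rank.2007, y = rank.2009)) +
  geom_point(aes(size = value.2007, fill = "2007"), shape = 21, alpha = 0.8) +
  geom_point(aes(size = value.2009, fill = "2009"), shape = 21, alpha = 0.8) +
  scale_size(name = "Market Cap ($bn)") +
  scale_fill_manual(name = "Year", values = c("royalblue", "forestgreen")) +
  theme_bw()

# Legend titles are drawn: nothing in the theme removes them.
p_named <- base_plot

# Legend titles disappear: legend.title = element_blank() overrides the
# name arguments given to scale_size() and scale_fill_manual().
p_blank <- base_plot + theme(legend.title = element_blank())
```

Printing p_named and p_blank side by side should show the difference; if that is indeed the cause, dropping the legend.title line from Version 1 and Version 2 would be enough to bring the names back.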