Label the range between two points on a scatterplot with the percent difference
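I have a simple scatter plot showing the difference in sales between years across different ranges.
So, when the range is ">$400", sales for 2013 were X and sales for 2014 were X.
I am trying to add annotations at certain points to show the percent difference from 2013 to 2014. Is this possible?
Here is the data (dput output):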
structure(list(Year = c(2013L, 2013L, 2013L, 2013L, 2013L, 2013L,
2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L,
2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2014L,
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L,
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L,
2014L, 2014L), Range = structure(c(8L, 9L, 10L, 11L, 12L, 13L,
14L, 16L, 17L, 18L, 19L, 20L, 21L, 23L, 24L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 26L, 8L, 9L, 10L, 11L, 12L, 13L, 15L, 17L, 18L, 19L,
20L, 21L, 23L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 26L), .Label = c("M",
"M", "M", "M", "-80M", "-90M", "-100M", "1-110M",
"1-120M", "1-130M", "1-140M", "1-150M", "1-160M",
"1-170M", "1-180M", "1-190M", "1-200M", "0-225M",
"6-250M", "1-275M", "6-300M", "1-325M", "6-350M",
"1-375M", "6-400M", ">0M"), class = "factor"), Avg_TOTALS = c(44732492.5,
42902206, 47355762, 49604750.6666667, 51132411, 51943986, 54798652.5,
61313778.5, 68577392, 74457422.6666667, 84805802.5, 96762417,
99355792, 172956681, 189815908, 31762600.8571429, 33042576.2857143,
34964083.8, 34349980.2, 35193407, 36049038.6666667, 42039793.3333333,
486133671, 35996925, 35496337.5, 39139472.5, 36993568.5, 39570379,
40139421.5, 43835119, 51358298.5, 53024160, 61185564, 67726723,
71481251, 89873814, 27746650.1428571, 27633867, 29855703.5714286,
29655265.2, 31163788.8, 29240507, 33810795.25, 192756973)), .Names = c("Year",
"Range", "Avg_TOTALS"), class = "data.frame", row.names = c(NA,
-44L))
Here is how I currently generate the chart:
library(ggplot2)
library(ggthemes)  # provides theme_tufte()
library(scales)    # provides dollar for the axis labels

orderlist = c("M", "M", "M", "M", "-80M", "-90M", "-100M", "1-110M", "1-120M", "1-130M",
              "1-140M", "1-150M", "1-160M", "1-170M", "1-180M", "1-190M", "1-200M", "0-225M",
              "6-250M", "1-275M", "6-300M", "1-325M", "6-350M", "1-375M", "6-400M", ">0M")
# Reorder the factor levels so the x axis follows orderlist.
myDF = transform(myDF, Range = factor(Range, levels = orderlist))
myChart <- ggplot(myDF, aes(x = Range, y = Avg_TOTALS)) +
    geom_point(aes(color = factor(Year))) +
    theme_tufte() +
    theme(axis.text.x = element_text(angle = 90, hjust = 0)) +
    labs(x = "Range", y = "Sales by Range", title = "MyChart") +
    scale_y_continuous(breaks = c(50000000, 100000000, 200000000,
                                  300000000, 400000000, 500000000),
                       labels = dollar)
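This gives me the plot and leads to this question:
Using 2013 as the base year, how can I add the percent difference between these points? Also, some ranges have sales in only one of the two years. Is it possible to skip the percent label for those ranges, so that a range is labelled only if it has data in both years?
Thanks for your help!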
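Here is one approach. I think there are better ways, but this is the best my sleepy brain could come up with; I hope you don't mind. Let me briefly explain the code. I followed your code to draw the plot. Then I pulled out the data that ggplot used for the points, which I call foo. I created a master data frame containing every year and x-position combination and joined foo to it, so that the missing data points show up as NA. The dplyr part then computes, for each range, the 2014 value as a proportion of the 2013 value, and leaves NA where either year is missing. I passed that result to annotate() to place the labels you wanted. Hope this helps. zzz...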
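To make the master-data-frame idea concrete, here is a minimal sketch on toy data. The data frame `toy`, its column names, and its values are made up for illustration and are not the asker's data; it uses base R `merge()` rather than the dplyr join used in the answer.

```r
# Toy data (hypothetical): range "B" has no 2014 row.
toy <- data.frame(
  year  = c("2013", "2013", "2014"),
  range = c("A", "B", "A"),
  y     = c(100, 200, 90),
  stringsAsFactors = FALSE
)

# Every year/range combination that could exist.
master <- expand.grid(year = c("2013", "2014"), range = c("A", "B"),
                      stringsAsFactors = FALSE)

# Keep all rows of master; combinations absent from toy get y = NA.
full <- merge(master, toy, by = c("year", "range"), all.x = TRUE)
full <- full[order(full$range, full$year), ]
print(full)
```

Because the missing pair is now an explicit NA row, any ratio computed per range becomes NA for that range, which is what lets the label be skipped.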
Data
mydf <- structure(list(Year = c(2013L, 2013L, 2013L, 2013L, 2013L, 2013L,
2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L,
2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2014L,
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L,
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L,
2014L, 2014L), Range = structure(c(8L, 9L, 10L, 11L, 12L, 13L,
14L, 16L, 17L, 18L, 19L, 20L, 21L, 23L, 24L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 26L, 8L, 9L, 10L, 11L, 12L, 13L, 15L, 17L, 18L, 19L,
20L, 21L, 23L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 26L), .Label = c("M",
"M", "M", "M", "-80M", "-90M", "-100M", "1-110M",
"1-120M", "1-130M", "1-140M", "1-150M", "1-160M",
"1-170M", "1-180M", "1-190M", "1-200M", "0-225M",
"6-250M", "1-275M", "6-300M", "1-325M", "6-350M",
"1-375M", "6-400M", ">0M"), class = "factor"), Avg_TOTALS = c(44732492.5,
42902206, 47355762, 49604750.6666667, 51132411, 51943986, 54798652.5,
61313778.5, 68577392, 74457422.6666667, 84805802.5, 96762417,
99355792, 172956681, 189815908, 31762600.8571429, 33042576.2857143,
34964083.8, 34349980.2, 35193407, 36049038.6666667, 42039793.3333333,
486133671, 35996925, 35496337.5, 39139472.5, 36993568.5, 39570379,
40139421.5, 43835119, 51358298.5, 53024160, 61185564, 67726723,
71481251, 89873814, 27746650.1428571, 27633867, 29855703.5714286,
29655265.2, 31163788.8, 29240507, 33810795.25, 192756973)), .Names = c("Year",
"Range", "Avg_TOTALS"), class = "data.frame", row.names = c(NA,
-44L))
library(ggplot2)
library(scales)  # provides dollar for the axis labels

orderlist = c("M", "M", "M", "M", "-80M", "-90M", "-100M", "1-110M", "1-120M", "1-130M",
              "1-140M", "1-150M", "1-160M", "1-170M", "1-180M", "1-190M", "1-200M", "0-225M",
              "6-250M", "1-275M", "6-300M", "1-325M", "6-350M", "1-375M", "6-400M", ">0M")
# Reorder the factor levels so the x axis follows orderlist.
mydf = transform(mydf, Range = factor(Range, levels = orderlist))

g <- ggplot(mydf, aes(x = Range, y = Avg_TOTALS)) +
     geom_point(aes(color = factor(Year))) +
     #theme_tufte() +
     theme(axis.text.x = element_text(angle = 90, hjust = 0)) +
     labs(x = "Range", y = "Sales by Range", title = "MyChart") +
     scale_y_continuous(breaks = c(50000000, 100000000, 200000000, 300000000, 400000000, 500000000), labels = dollar)
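library(dplyr)

# Data ggplot used for the points. Sorting by group puts the 23 rows for
# 2013 first and the 21 rows for 2014 after them, so year is added by position.
foo <- ggplot_build(g)$data[[1]] %>%
       arrange(group) %>%
       mutate(year = c(rep("2013", times = 23), rep("2014", times = 21)))

# Every year and x-position combination, so missing points appear as NA after the join.
master <- expand.grid(year = c("2013", "2014"), group = 1:24)

# Within each range, divide y by the first (2013) value. min() with na.rm = FALSE
# keeps the smaller of 1 (the 2013 row) and the 2014/2013 ratio, or NA if a year is missing.
full_join(master, foo, by = c("year", c("group" = "x"))) %>%
group_by(group) %>%
mutate(prop = round(order_by(year, y / first(y)), 2)) %>%
summarise(y = first(y), prop = min(prop, na.rm = FALSE)) -> txt

# Draw each label 15,000,000 above the first (2013) point of its range.
g + annotate("text", x = txt$group, y = txt$y + 15000000, label = txt$prop)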
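As a possible alternative, the percent change can be computed from the data frame itself instead of from ggplot_build(). The following is only a sketch under assumptions: `sales` is a made-up stand-in with the same column names as the question (`Year`, `Range`, `Avg_TOTALS`), the range labels "A", "B", "C" and the values are hypothetical, and the plotting step runs only if ggplot2 is installed.

```r
# Toy stand-in for the asker's data (hypothetical values):
# range "C" has sales in 2013 only.
sales <- data.frame(
  Year       = c(2013L, 2013L, 2013L, 2014L, 2014L),
  Range      = c("A", "B", "C", "A", "B"),
  Avg_TOTALS = c(100, 200, 300, 80, 250),
  stringsAsFactors = FALSE
)

# An inner merge of the two years keeps only ranges present in both.
y13 <- sales[sales$Year == 2013L, c("Range", "Avg_TOTALS")]
y14 <- sales[sales$Year == 2014L, c("Range", "Avg_TOTALS")]
both <- merge(y13, y14, by = "Range", suffixes = c("_2013", "_2014"))

# Percent change relative to the 2013 base year.
both$pct <- 100 * (both$Avg_TOTALS_2014 - both$Avg_TOTALS_2013) /
  both$Avg_TOTALS_2013
both$label <- sprintf("%+.1f%%", both$pct)

# Anchor each label at the higher of the two points in its range.
both$y <- pmax(both$Avg_TOTALS_2013, both$Avg_TOTALS_2014)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(sales, aes(x = Range, y = Avg_TOTALS)) +
    geom_point(aes(color = factor(Year))) +
    # vjust = -1 nudges the text above the anchor point.
    geom_text(data = both, aes(x = Range, y = y, label = label),
              vjust = -1, inherit.aes = FALSE)
  print(p)
}
```

Here range "C" gets no label because the inner merge drops it, which covers the condition that data must exist in both years. The labels are signed percent changes rather than the proportion used in the answer above.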