Plot percentage of clients that earn

I have a dataset showing how much each person earns. Generating synthetic data:

library(dplyr)
library(ggplot2)

set.seed(100)
ddata <- data.frame(amount = rbeta(10000, 2, 20) * 1000)
ddata <- ddata %>%
  group_by(amount) %>%
  summarise(proportion = n()) %>%
  mutate(Perc = cumsum(100 * proportion / sum(proportion)),  # % earning X or less
         reverse = length(.) - Perc)                         # intended: % earning X or more

Looking at the distribution, the data is skewed:

hist(ddata$amount)

I created a percentile-rank column, 'Perc', which shows the percentage of clients who earn X or less. Here is the code for the plot:

ddata %>%
  ggplot() +
  geom_line(aes(x = amount, y = Perc), color = '#EF010C') +
  ylab("% Clients") +
  xlab("Amount earned")
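Because amount is continuous, every simulated value is almost surely unique, so group_by(amount) produces groups of size 1 and Perc reduces to (rank / n) * 100. A minimal check, assuming dplyr is installed and reusing the same synthetic data; the names amounts, perc_tbl, n_clients and perc_by_rank are hypothetical and only used for this sketch:

```r
library(dplyr)

set.seed(100)
amounts <- data.frame(amount = rbeta(10000, 2, 20) * 1000)

# Same cumulative-percentage computation as in the question
perc_tbl <- amounts %>%
  group_by(amount) %>%
  summarise(proportion = n()) %>%
  mutate(Perc = cumsum(100 * proportion / sum(proportion)))

# With unique amounts each group has size 1, so the k-th sorted value
# has Perc = 100 * k / n
n_clients <- nrow(amounts)
perc_by_rank <- 100 * seq_len(n_clients) / n_clients

all.equal(perc_tbl$Perc, perc_by_rank)
```

The last value of Perc is therefore always 100, which is why a "reverse" column has to be measured against 100.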

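I need to create the opposite of this plot; in other words, I also need a plot showing the percentage of clients who earn X or more. I created the column 'reverse' for this. The shape of the curve seems correct, but the percentages on the y-axis are negative. How can I fix this? Any help would be greatly appreciated. In the end, the curve should follow a shape similar to the histogram shown above.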

The plot with the wrong axis:

ddata %>%
  ggplot() +
  geom_line(aes(x = amount, y = reverse), color = '#EF010C') +
  ylab("% Clients") +
  xlab("Amount earned")

set.seed(100)
ddata <- data.frame(amount = rbeta(10000, 2, 20) * 1000)
ddata <- ddata %>%
  group_by(amount) %>%
  summarise(proportion = n()) %>%
  mutate(Perc = cumsum(100 * proportion / sum(proportion)),
         reverse = 100 - Perc)  # Changed this: 100 instead of length(.)

ddata %>%
  ggplot() +
  geom_line(aes(x = amount, y = Perc), color = '#EF010C') +
  ylab("% Clients") +
  xlab("Amount earned")

ddata %>%
  ggplot() +
  geom_line(aes(x = amount, y = reverse), color = '#EF010C') +
  ylab("% Clients") +
  xlab("Amount earned")

Like this?
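Why the original reverse column went negative: with the magrittr pipe, the dot inside mutate() is the whole summarised data frame, and length() of a data frame is its number of columns (here 2: amount and proportion). So length(.) - Perc is really 2 - Perc. A small illustration on a toy data frame; the values in toy are made up for this sketch:

```r
library(dplyr)

# Hypothetical stand-in for the summarised table: 4 clients, one per amount
toy <- data.frame(amount = c(10, 20, 30, 40), proportion = c(1, 1, 1, 1))

toy <- toy %>%
  mutate(Perc = cumsum(100 * proportion / sum(proportion)),
         wrong_reverse = length(.) - Perc,  # length() of a data frame = number of columns (2)
         fixed_reverse = 100 - Perc)        # share earning more than the amount

toy
```

wrong_reverse comes out as -23, -48, -73, -98: the right shape, shifted down by about 100, which matches the symptom in the question.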

You can use ecdf to get the empirical cumulative distribution of your values. The reverse is simply 100% minus the ECDF:

set.seed(100)
ddata <- data.frame(amount = rbeta(10000, 2, 20) * 1000)
ddata <- ddata %>%
  mutate(Perc = ecdf(amount)(amount) * 100,  # % of clients earning this amount or less
         reverse = 100 - Perc)               # % of clients earning more

ddata %>%
  ggplot() +
  geom_line(aes(x = amount, y = Perc), color = '#EF010C') +
  ylab("% Clients") +
  xlab("Amount earned")

ddata %>%
  ggplot() +
  geom_line(aes(x = amount, y = reverse), color = '#EF010C') +
  ylab("% Clients") +
  xlab("Amount earned")
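ecdf(amount)(x) returns the fraction of values less than or equal to x, so 100 - Perc is strictly the share of clients earning more than x. With continuous amounts the gap to "X or more" is only 100/n percentage points per value, but if the inclusive version is wanted exactly, one option (a sketch, not part of the answer above; the column name at_least is hypothetical) is to count with rank():

```r
library(dplyr)

set.seed(100)
ddata <- data.frame(amount = rbeta(10000, 2, 20) * 1000)

ddata <- ddata %>%
  mutate(Perc = ecdf(amount)(amount) * 100,  # % earning this amount or less
         reverse = 100 - Perc,               # % earning strictly more
         # % earning this amount or more:
         # rank(..., ties.method = "min") - 1 is the count strictly below
         at_least = 100 * (n() - rank(amount, ties.method = "min") + 1) / n())
```

It can be plotted exactly like reverse, for example with geom_line(aes(x = amount, y = at_least)).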