显示 ggalluvium 的流量

Question

寻求有关使用 ggalluvium 的一些建议，以证明澳大利亚的偏好分布。

上下文，在澳大利亚我们有优先投票。假设我住在一个有 4 名候选人参加竞选的地区。根据您的 party/candidate 偏好，通过在方框 1-4 中编号来完成投票。第一次点票后得票比例最低的候选人将被淘汰，他们的选票将分配给选民在选票上指定的位置。重复此过程，直到有两名候选人保留并在两党首选的50％以上选举中选出候选人。

我正在寻求使用流程图和 ggalluvium 可视化上述重复分发过程。

然而，我似乎不太能描绘出在下一次计票中向候选人提供选票的流程。

这是我目前得到的：

library(tidyverse)
library(magrittr)
library(ggalluvial)


Load Data
house_of_reps <- read_csv("https://results.aec.gov.au/24310/Website/Downloads/HouseDopByDivisionDownload-24310.csv", skip = 1)
house_of_reps$BallotPosition  %<>% as.factor()
house_of_reps$CountNumber %<>% as.factor()

cooper <- house_of_reps %>% 
      filter(DivisionNm == "Cooper") %>% 
      spread(CalculationType, CalculationValue) %>% 
      select(4,9,10,14)

cooper %>% ggplot(aes(x = CountNumber, alluvium = PartyNm, stratum = `Preference Percent`, y = `Preference Percent`, fill = PartyAb)) +
       geom_alluvium(aes(fill = PartyAb), decreasing = TRUE) +
       geom_stratum(decreasing = TRUE) +
       geom_text(stat = "stratum",decreasing = TRUE, aes(label = after_stat(fill))) +
       stat_stratum(decreasing = TRUE) +
       stat_stratum(geom = "text", aes(label = PartyAb), decreasing = TRUE) +
       scale_fill_viridis_d() +
       theme_minimal()

Output image

对于如何显示每次后续计票后的选票流向下一层中哪个政党的任何指导，我们将不胜感激。

Answer 1

很遗憾，您的数据集不太适合您想要的那种情节。虽然绘图本身很容易，但要获得所需的绘图涉及“一些”数据整理和准备步骤。

一般问题是您的数据集没有显示从一个政党到另一个政党的选票流向。它只显示一个政党在每次计票中失去或获得的总票数。

但是，由于在每个步骤中只有一方遗漏了这一缺失信息，因此可以从您的数据中提取这些信息。基本思想是根据选民的次要政党偏好为每个政党或更准确地说是每个在后来的计数之一中退出的政党分配 obs。

不确定每个步骤是否清楚，但我添加了一些解释作为注释，并添加了数据集最终结构图，希望能更清楚地说明所有步骤的最终结果是什么：

library(tidyverse)
library(magrittr)
library(ggalluvial)

# Load Data
house_of_reps <- read_csv("https://results.aec.gov.au/24310/Website/Downloads/HouseDopByDivisionDownload-24310.csv", skip = 1)
house_of_reps$BallotPosition  %<>% as.factor()
house_of_reps$CountNumber %<>% as.factor()

cooper <- house_of_reps %>% 
  filter(DivisionNm == "Cooper") %>% 
  spread(CalculationType, CalculationValue) %>% 
  select(count = CountNumber, party = PartyAb, pref = `Preference Count`, trans = `Transfer Count`)

# Helper function to
make_rows <- function(x) {
  # Name of party which gets dropped in this period
  dropped <- filter(x, trans < 0) %>% pull(party)
  if (length(dropped) > 0) {
    x <- filter(x, trans >= 0)
    # Replacements are added two times. Once for the period where the party drops out, 
    # and also for the previous period
    xdrop <- mutate(x, party = dropped, pref = trans, trans = 0, is_drop = FALSE)
    xdrop1 <- mutate(xdrop, count = count - 1, to = party, is_drop = FALSE)
    # For the parties to keep or which receive transfered votes have to adjust the number of votes
    xkeep <- mutate(x, pref = pref - trans, trans = 0) 
    bind_rows(xdrop1, xdrop, xkeep)  
  } else {
    x
  }
}

cooper1 <- cooper %>% 
  # First: Convert count to a numeric. Add a "to" variable for second 
  # party preference or the party where votes are transferred to. This variable 
  # will later on be mapped on the "fill" aes 
  mutate(to = party, count = as.numeric(as.character(count))) %>% 
  group_by(party) %>%
  # Add identifier of obs. to drop. Obs. to drop are obs. of parties which 
  # drop out in the following count
  mutate(is_drop = lead(trans, default = 0) < 0) %>% 
  ungroup() %>% 
  # Split obs. to be dropped by secondary party preference, i.e. in count 0 the 
  # obs for party "IND" is replaced by seven obs. reflecting the secondary preference 
  # for one of the other seven parties
  split(.$count) %>% 
  map(make_rows) %>% 
  bind_rows() %>% 
  # Now drop original obs.
  filter(!is_drop, pref > 0) %>%
  # Add a unique identifier
  group_by(count, party) %>% 
  mutate(id = paste0(party, row_number())) %>% 
  ungroup() %>% 
  # To make the flow chart work we have make the dataset complete, i.e. add 
  # "empty" obs for each type of voter and each count
  complete(count, id, fill = list(pref = 0, trans = 0, is_drop = FALSE)) %>% 
  # Fill up party and "to" columns  
  mutate(across(c(party, to), ~ if_else(is.na(.), str_extract(id, "[^\d]+"), .))) %>%
  # Filling up the "to" column with last observed value for "to" if any
  group_by(id) %>% 
  mutate(last_id = last(which(party != to)),
         to = if_else(count >= last_id & !is.na(last_id), to[last_id], to)) %>% 
  ungroup()

数据集的最终结构可以用瓦片图来说明：

cooper1 %>% 
  add_count(count, party) %>% 
  ggplot(aes(count, reorder(id, n), fill = to)) +
  geom_tile(color = "white")

正如我所说的，在所有繁琐的数据争论之后，制作流程图本身是最简单的任务，可以像这样实现：

cooper1 %>% 
  ggplot(aes(x = count, alluvium = id, stratum = to, y = pref, fill = to)) +
  geom_flow(decreasing = TRUE) +
  geom_stratum(decreasing = TRUE) +
  scale_fill_viridis_d() +
  theme_minimal()

显示 ggalluvium 的流量

Showing flows for ggalluvium

r

ggplot2

ggalluvial