geom_bar counts the group size, not the number of cases

I'm new to R, and since googling and browsing existing questions hasn't helped so far, I decided to post my own.

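For descriptive statistics I want a geom_bar plot. My data frame consists of 21 IDs, each with one or more diagnoses. The IDs are simply the numbers 1 to 21, and the diagnoses are coded as 0 and 1 (no and yes). So far my code plots the bars next to each other, but instead of counting the number of cases per group, it plots the number of people per group. So for each diagnosis I get two bars that always show the number of people in each group (attempters vs. non-attempters) rather than the number of cases.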

My original data frame looks like this:

code MDD Anxiety PTBS age attempters
01 0 1 1 17 1
02 1 1 0 53 0
03 0 0 1 32 0
04 0 1 0 60 0

However, I don't actually need many of these columns for my thesis.

First I reshaped my data from wide to long format, keeping only the columns I need:

df_long <- data_gesamt %>%
  select(code, MDD, Anxiety, PTBS, attempters) %>%
  group_by(code, attempters) %>%
  tidyr::gather(key = predictors,
                value = severity,
                MDD, Anxiety, PTBS) %>% 
  mutate(attempters = as.factor(attempters)) %>% 
  drop_na(attempters)

This gave me a data frame like the following:

code attempters predictors severity
01 1 MDD 0
02 0 MDD 1
03 0 MDD 0
04 0 MDD 0
01 1 Anxiety 1
02 0 Anxiety 1
03 0 Anxiety 0
04 0 Anxiety 1
01 1 PTBS 1
02 0 PTBS 0
03 0 PTBS 1
04 0 PTBS 0
01 1 age 17
02 0 age 53
03 0 age 32
04 0 age 60

Then I plotted it with:

plot <- df_long %>%
  ggplot(aes(x = attempters, fill = attempters)) +
  geom_bar() +
  facet_grid(.~ predictors) +
  theme(legend.position = "bottom")

plot

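I need to count how many people in each group have MDD, anxiety, and PTBS, plus the mean age (though I can leave that out). So far I only get the number of people per group (non-attempters vs. attempters)... What am I missing, or what went wrong?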

[Update: corrected the part that misread the nature of the data]

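I recommend summarizing the data before plotting, so that ggplot only handles the visualization and not any extra calculation. That way you can also double-check that the computed statistics are the ones you want, instead of leaving all the calculation to ggplot behind the scenes.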

library(dplyr)
library(tidyr)
library(ggplot2)

# dput of your original table
df <- structure(list(code = 1:4, MDD = c(0L, 1L, 0L, 0L), Anxiety = c(1L, 
  1L, 0L, 1L), PTBS = c(1L, 0L, 1L, 0L), age = c(17L, 53L, 32L, 
    60L), attempters = c(1L, 0L, 0L, 0L)), row.names = c(NA, -4L), 
  class = "data.frame")

# Count the cases per predictor and attempter group
graph_data <- df %>%
  # pivot to long format and keep only the rows where the
  # predictor is present (value == 1)
  pivot_longer(cols = MDD:PTBS,
    names_to = "predictors", values_to = "value") %>%
  filter(value == 1) %>%
  # convert attempters to a factor, since it only takes the values 0 and 1
  # and ggplot would otherwise treat the numeric column as continuous
  group_by(predictors, attempters = factor(attempters)) %>%
  summarize(severity = n(),
    mean_age = mean(age), .groups = "drop")

# data after summarizing
graph_data
#> # A tibble: 5 x 4
#>   predictors attempters severity mean_age
#>   <chr>      <fct>         <int>    <dbl>
#> 1 Anxiety    0                 2     56.5
#> 2 Anxiety    1                 1     17  
#> 3 MDD        0                 1     53  
#> 4 PTBS       0                 1     32  
#> 5 PTBS       1                 1     17
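
As a sanity check, and to see why the original plot showed group sizes, it can help to compare the number of rows with the number of cases in the unfiltered long data. The following is a minimal sketch on the same four sample rows; `long_data`, `check`, `n_rows` and `n_cases` are names chosen here for illustration and do not appear in the question.

```r
library(dplyr)
library(tidyr)

# the four sample rows from the question
df <- data.frame(
  code = 1:4,
  MDD = c(0L, 1L, 0L, 0L),
  Anxiety = c(1L, 1L, 0L, 1L),
  PTBS = c(1L, 0L, 1L, 0L),
  age = c(17L, 53L, 32L, 60L),
  attempters = c(1L, 0L, 0L, 0L)
)

# long format WITHOUT filtering: one row per person and diagnosis
long_data <- df %>%
  pivot_longer(cols = MDD:PTBS,
    names_to = "predictors", values_to = "severity")

# n_rows  = what a row count sees (people per group)
# n_cases = number of people whose 0/1 flag is 1
check <- long_data %>%
  group_by(predictors, attempters) %>%
  summarize(n_rows = n(), n_cases = sum(severity), .groups = "drop")

check
```

In every facet `n_rows` is 3 for non-attempters and 1 for attempters, which is what the default `geom_bar()` draws because it counts rows, while `n_cases` only counts the rows whose flag is 1.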

Here is the plot:

# now the plot takes graph_data as input
ggplot(data = graph_data) +
  # I prefer to do the mapping per geom instead of in the ggplot() call
  geom_bar(mapping = aes(
    x = attempters,
    # here the y value is the case count (severity) calculated earlier
    y = severity,
    # when using fill/colour I prefer to specify the group explicitly,
    # even though ggplot can infer it
    fill = attempters, group = attempters),
    # stat is "identity" here instead of the default "count"
    stat = "identity",
    # position_dodge() keeps the bars from being stacked on each other
    position = position_dodge()) +
  facet_grid(.~ predictors) +
  theme(legend.position = "bottom")

Created on 2021-04-17 by the reprex package (v2.0.0)
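
If you would rather keep the long data and let ggplot do the summing, a possible alternative to the pre-summarized approach above is the `weight` aesthetic of `geom_bar()`: with it, the default count stat adds up the weights instead of counting rows. A minimal sketch under that assumption, on the same sample rows (`long_data` and `p` are illustrative names):

```r
library(tidyr)
library(ggplot2)

# the four sample rows from the question
df <- data.frame(
  code = 1:4,
  MDD = c(0L, 1L, 0L, 0L),
  Anxiety = c(1L, 1L, 0L, 1L),
  PTBS = c(1L, 0L, 1L, 0L),
  age = c(17L, 53L, 32L, 60L),
  attempters = c(1L, 0L, 0L, 0L)
)

# unfiltered long data: one row per person and diagnosis
long_data <- pivot_longer(df, cols = MDD:PTBS,
  names_to = "predictors", values_to = "severity")
long_data$attempters <- factor(long_data$attempters)

p <- ggplot(long_data, aes(x = attempters, fill = attempters)) +
  # weight = severity makes each bar the sum of the 0/1 flags
  geom_bar(aes(weight = severity)) +
  facet_grid(. ~ predictors) +
  theme(legend.position = "bottom")

p
```

This keeps the calculation inside ggplot, so it gives up the benefit of being able to inspect the summary table before plotting.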

We can use:

library(dplyr)
library(tidyr)
library(ggplot2)
data_gesamt %>%
  pivot_longer(cols = MDD:PTBS, names_to = 'predictors',
               values_to = 'severity', values_drop_na = TRUE) %>%
  group_by(attempters = factor(attempters), predictors) %>%
  summarise(Count = sum(severity), age = mean(age)) %>%
  ggplot(aes(x = attempters, y = Count, fill = predictors)) +
  geom_col(position = 'dodge')
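
The two approaches (filtering and counting rows vs. summing the flags) differ in one detail that may matter for the plot: filtering to rows with a flag of 1 drops diagnosis/group combinations that have no cases, whereas summing the 0/1 flags keeps them with a count of 0. A small sketch on the four sample rows from the question, assuming the flags contain no missing values (`by_filter`, `by_sum` and `only_in_sum` are hypothetical names used only here):

```r
library(dplyr)
library(tidyr)

# the four sample rows from the question
df <- data.frame(
  code = 1:4,
  MDD = c(0L, 1L, 0L, 0L),
  Anxiety = c(1L, 1L, 0L, 1L),
  PTBS = c(1L, 0L, 1L, 0L),
  age = c(17L, 53L, 32L, 60L),
  attempters = c(1L, 0L, 0L, 0L)
)

long_data <- df %>%
  pivot_longer(cols = MDD:PTBS,
    names_to = "predictors", values_to = "severity") %>%
  mutate(attempters = factor(attempters))

# variant 1: keep only flagged rows, then count rows
by_filter <- long_data %>%
  filter(severity == 1) %>%
  count(predictors, attempters, name = "Count")

# variant 2: keep all rows and sum the 0/1 flags
by_sum <- long_data %>%
  group_by(predictors, attempters) %>%
  summarise(Count = sum(severity), .groups = "drop")

# combinations that only the summing variant reports
only_in_sum <- anti_join(by_sum, by_filter,
  by = c("predictors", "attempters"))
only_in_sum
```

Here `only_in_sum` holds the single combination MDD with attempters = 1, which has no cases in the sample. Note also that `mean(age)` in the summing variant averages over everyone in the group, not only over those with the diagnosis.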