Frequency table with mean and SD in R, with multiple cases per row

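I would like to create a frequency table that gives the mean and SD of the number of consultations per year, based on the following (dummy) data: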

    id icpc icpc2 date 
1:  123 D95 F15   2015-06-19 
2:  123 F85       2016-08-15 
3:  332 A01       2010-03-16
4:  332 A04       2018-01-20 
5:  332 K20       2017-02-20
6:  100 B10       2017-06-01
7:  100 A04       2008-01-11
8:  113 T08       2018-03-18
9:  113 P28       2017-01-19
10: 113 D95 A01   2013-01-16
11: 113 A04       2009-05-01
12: 551 B12 A01   2011-04-03
13: 551 D95       2015-05-09

Reproducible data:

df <- structure(list(id = c(123L, 123L, 332L, 332L, 332L, 100L, 100L, 
113L, 113L, 113L, 113L, 551L, 551L), icpc = c("D95", "F85", "A01", 
"A04", "K20", "B10", "A04", "T08", "P28", "D95", "A04", "B12", 
"D95"), icpc2 = c("F15", "", "", "", "", "", "", "", "", "A01", 
"", "A01", ""), date = c("2015-06-19", "2016-08-15", "2010-03-16", 
"2018-01-20", "2017-02-20", "2017-05-01", "2008-01-11", "2018-03-18", 
"2017-01-19", "2013-01-16", "2009-05-01", "2011-04-03", "2015-05-09"
)), class = "data.frame", row.names = c(NA, -13L))

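I followed the steps below and was able to get the frequency table with the mean, but I think there should be an easier way, and I am still unable to get the SD. Please help me get the SD per year.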

To calculate the mean, I created a new column (consult) that holds a 1 for each consultation (based on icpc):

library(data.table)

# every row is one consultation, so consult is simply 1 for all rows
setDT(df)[, consult := 1]

From there:

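library(dplyr)

#consultation frequency per patient per year
#(date is stored as character in the data above, so convert it before extracting the year)
df.freq.year <- df %>%
  mutate(year = format(as.Date(date), "%Y")) %>%
  group_by(id, year) %>%
  summarise(frequency = sum(consult), .groups = "drop")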

#mean consultations per year
#(group by year only; grouping by id and year would just return each frequency unchanged)
df.mean.year <- df.freq.year %>%
  group_by(year) %>%
  summarise(mean = mean(frequency))

#make table with number of patients per year
df.pat <- df %>%
  mutate(year = format(as.Date(date), "%Y")) %>%
  group_by(year) %>%
  summarise(Nbr.patients = n_distinct(id))

I tried the following (without success):

sqrt(var(df.freq.year$frequency, by = "year"))
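
The call above fails because `var()` has no `by` argument, so it cannot split the values by year. A minimal base R sketch of a grouped SD, assuming a data frame with one row per patient and year; the `freq` values below are made up purely for illustration and are not taken from the data above:

```r
# Hypothetical per-patient, per-year counts, for illustration only
freq <- data.frame(
  year = c("2017", "2017", "2017", "2018", "2018"),
  frequency = c(1, 3, 2, 4, 2)
)

# tapply() splits frequency by year and applies sd() to each group
sd.by.year <- tapply(freq$frequency, freq$year, sd)
```

`sd.by.year` is a named vector with one SD per year; `aggregate(frequency ~ year, data = freq, FUN = sd)` should give the same numbers as a data frame.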

My output should look like this:

   year mean SD
1:  2008 5.2  1.3
2:  2009 4.0  1.1
3:  2010 8.9  1.6
4:  2011 4.9  2.1
5:  2012 3.4  1.1
6:  2013 2.3  1.1
7:  2014 9.5  1.3
8:  2015 12.0 2.1
9:  2016 11.4 2.6
10: 2017 8.9  2.0
11: 2018 6.7  2.2

OK, I managed to solve it...

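#consultation frequency per patient per year
df.freq.patyear <- df %>%
  mutate(year = format(as.Date(date), "%Y")) %>%  # df itself has no year column yet
  group_by(id, year) %>%
  summarise(frequency = sum(consult), .groups = "drop")

#calculate SD per year
df.sd <- df.freq.patyear %>%
  group_by(year) %>%
  summarise(SD = sd(frequency))

#combine mean and SD (this assumes df.mean.year holds one row per year)
df.table <- merge(df.mean.year, df.sd, by = "year")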
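
The steps above can probably be collapsed into a single pipeline, without the helper `consult` column or the final merge. This is only a sketch, assuming a reasonably recent dplyr and that every row is exactly one consultation; it rebuilds just the `id` and `date` columns of the dummy data, and the `Nbr.patients` column is an optional extra:

```r
library(dplyr)

# id and date columns of the dummy data from the question
df <- data.frame(
  id = c(123L, 123L, 332L, 332L, 332L, 100L, 100L,
         113L, 113L, 113L, 113L, 551L, 551L),
  date = c("2015-06-19", "2016-08-15", "2010-03-16", "2018-01-20",
           "2017-02-20", "2017-05-01", "2008-01-11", "2018-03-18",
           "2017-01-19", "2013-01-16", "2009-05-01", "2011-04-03",
           "2015-05-09")
)

df.table <- df %>%
  mutate(year = format(as.Date(date), "%Y")) %>%
  count(id, year, name = "frequency") %>%  # consultations per patient per year
  group_by(year) %>%
  summarise(
    Nbr.patients = n(),        # patients with at least one consultation that year
    mean = mean(frequency),
    SD = sd(frequency)         # NA when only one patient was seen that year
  )
```

Note that a patient with no consultations in a given year does not appear in that year at all, so such zeros are not included in the mean or the SD. If they should count, the patient-year combinations would have to be filled in first, for example with `tidyr::complete()`.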