Calculating averages per patient using n_distinct with a condition

Question

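In my data frame, I want to calculate the absolute frequency, the relative frequency, and the average number of times per patient that a healthcare activity was performed.

I use the following code to calculate healthcare utilization: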

Df %>%
   group_by(A) %>%
   summarize(n = n()) %>%
   mutate(rel.freq = (n/sum(n))*100) %>%
   mutate(avg.A.pt = n/sum(n_distinct(Person[A == A])))

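I have a problem with the last line of the code. I need the number of activities per patient for one specific type of care: the total number of activities n divided by the number of unique patients n_distinct(Person), but counting only the patients who received that specific type of care, Person[HCU == HCU].

My desired result looks like this: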

HCU    n     rel.freq   avg.hcu.pt
ECG    486   10%        4.0
Echo   301   8%         1.8

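Could you help me fix the code?

Thanks in advance!

Some additional information, added after the replies:

I work through remote access in a secure environment, so unfortunately I cannot provide a data sample. My data set covers about 20,000 patients who received 11,000,000 healthcare activities (rows), with 34 columns such as specialty, medical center, age, and person code. For my article, I want to show:
- the percentage of (unique) patients who received a specific healthcare activity at least once (what I call the relative frequency)
- the average number of healthcare activities (of a specific type) per (unique) patient

Basically, I have already mapped the types of care, for example laboratory tests, using group_by and filter from dplyr, which gives me the total number of lab tests. Now I want to be more specific: for example, how many patients had at least one MRI, how many never had an MRI, and how many MRIs patients received (on average).

I tried your suggestion: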

Df %>%
  group_by(A, Person) %>%
  summarise(n = n())

# A= healthcare activities

This gives me:

A            Person         n
MRI        1                 6
MRI        2                 2
… for all >1000 patients who received MRI
Echo      1                 3
And so on

How do I get the percentage of patients who had an MRI? And the average number of MRIs per patient?

Answer 1

Let's create some toy data: four treatments with different probabilities, and 1000 visits spread over 100 patients.

set.seed(123)
df<-data.frame(A = sample(c("MRI", "ECG", "Echo", "PET"), 1000,
                          prob=c(0.05, 0.8, 0.13, 0.02), replace=TRUE),
               p = sample(1:100, 1000, replace=TRUE))

Now we summarise the data:

df %>%
  # group by treatment and patient
  group_by(A, p) %>%
  # first summary: number of treatments of each type for each patient
  summarise(n = n()) %>%
  # the result is still grouped by A, so the next summary works per treatment:
  # rel.freq is the number of distinct patients in the group divided by
  # the number of distinct patients overall;
  # avg.hcu.pt is the mean number of treatments per patient
  summarise(rel.freq   = n_distinct(p) / n_distinct(df$p),
            avg.hcu.pt = mean(n))

Result:

# A tibble: 4 x 3
A     rel.freq avg.hcu.pt
<fct>    <dbl>      <dbl>
1 ECG       1          8.02
2 Echo      0.76       1.72
3 MRI       0.37       1.30
4 PET       0.17       1.12
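
The question also asked for the total count n and for the number of patients who never received a treatment. As a minimal sketch, assuming the same toy columns A and p as above, all of these can be computed in one summarise per treatment. The names total_patients, patients, never and result are made up for this illustration. The mean per patient is simply the total count divided by the number of distinct patients who had the treatment, so it should agree with the two-step version.

```r
library(dplyr)

# Toy data with the same shape as in the answer above (columns A and p).
set.seed(123)
df <- data.frame(
  A = sample(c("MRI", "ECG", "Echo", "PET"), 1000,
             prob = c(0.05, 0.8, 0.13, 0.02), replace = TRUE),
  p = sample(1:100, 1000, replace = TRUE)
)

# Hypothetical helper: number of distinct patients in the whole data set.
total_patients <- n_distinct(df$p)

result <- df %>%
  group_by(A) %>%
  summarise(
    n          = n(),                        # total activities of this type
    patients   = n_distinct(p),              # patients with at least one
    rel.freq   = patients / total_patients,  # share of all patients
    never      = total_patients - patients,  # patients who never had it
    avg.hcu.pt = n / patients                # mean per treated patient
  )

result
```

On the real data, p would be replaced by Person, and rel.freq can be multiplied by 100 to report a percentage.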
