Adding up numbers after an if operation in R
I have a data set containing the number of medication courses each patient took on particular dates.
subject<-c(111,111,111,222,222,333,333,333,333)
date<-as.Date(c("2010-12-12","2011-12-01","2009-8-7","2010-5-7","2011-3-7","2011-8-5","2013-8-27","2016-9-3","2011-8-5"))
medicationCourses<-c(1,0,NA,3,4,2,4,5,6)
data<-data.frame(subject,date,medicationCourses)
data
  subject       date medicationCourses
1     111 2010-12-12                 1
2     111 2011-12-01                 0
3     111 2009-08-07                NA
4     222 2010-05-07                 3
5     222 2011-03-07                 4
6     333 2011-08-05                 2
7     333 2013-08-27                 4
8     333 2016-09-03                 5
9     333 2011-08-05                 6
I also have their hospital admission dates:
hospitalSubject<-c(111,222,333)
admissionDate<-as.Date(c("2011-12-31","2013-12-31","2013-12-31"))
hospitalData<-data.frame(hospitalSubject,admissionDate)
hospitalData
  hospitalSubject admissionDate
1             111    2011-12-31
2             222    2013-12-31
3             333    2013-12-31
I would like to add up the medication courses taken on or before the admission date, to get the following result:
subject admissionDate totalMedicationCourses
    111    2011-12-31                      1
    222    2013-12-31                      7
    333    2013-12-31                     12
Could anyone tell me how to do this in R? I am new to R, so any guidance would be much appreciated!
One option is to merge the two datasets by 'subject'/'hospitalSubject', subset the rows where date <= admissionDate, and then use aggregate to get the sum of 'medicationCourses' grouped by 'subject' and 'admissionDate'.
d1 <- subset(merge(data, hospitalData, by.x='subject',
                   by.y='hospitalSubject'), date <= admissionDate)
aggregate(medicationCourses~subject+admissionDate, d1, sum,
          na.rm=TRUE, na.action=NULL)
#  subject admissionDate medicationCourses
#1     111    2011-12-31                 1
#2     222    2013-12-31                 7
#3     333    2013-12-31                12
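The na.action argument in the aggregate call matters more than it may look. As far as I understand it, the formula interface of aggregate defaults to na.omit, which drops rows containing NA before the groups are formed, so a subject whose rows are all NA would vanish from the result. Here is a minimal sketch of that effect on hypothetical toy data (the `toy` data frame and its columns are made up for illustration), using na.pass, the documented function that leaves NA rows in place:

```r
# Hypothetical toy data: subject 2 has only missing values.
toy <- data.frame(subject = c(1, 1, 2), courses = c(3, NA, NA))

# Default na.action (na.omit): NA rows are removed before grouping,
# so subject 2 does not appear in the result at all.
dropped <- aggregate(courses ~ subject, toy, sum, na.rm = TRUE)

# na.pass keeps the NA rows; sum(..., na.rm = TRUE) then skips the NAs,
# so subject 2 is reported with a total of 0.
kept <- aggregate(courses ~ subject, toy, sum, na.rm = TRUE,
                  na.action = na.pass)

print(dropped)
print(kept)
```

Whether a total of 0 or a missing row is the better answer for a patient with no recorded values depends on how the totals are used downstream.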
Or we can use data.table: convert the 'data.frame' to a 'data.table' (setDT(data)), set the key to 'subject' (setkey(..)), join with hospitalData, filter the rows where date <= admissionDate, and get the sum of 'medicationCourses' grouped by 'subject' and 'admissionDate'.
library(data.table)
setkey(setDT(data), subject)[hospitalData][date <= admissionDate,
    list(TotalMedicationCourses=sum(medicationCourses, na.rm=TRUE)),
    list(subject, admissionDate)]
#   subject admissionDate TotalMedicationCourses
#1:     111    2011-12-31                      1
#2:     222    2013-12-31                      7
#3:     333    2013-12-31                     12
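The same join can also be written without setting a key, by naming the join columns through the `on` argument. This is only a sketch of that variant, not a replacement for the code above: it rebuilds the question's data as fresh data.tables (the names `dt`, `hosp`, `joined` and `totals` are mine), so the original `data` is not converted or reordered by reference.

```r
library(data.table)

# The question's data, rebuilt as data.tables.
dt <- data.table(
  subject = c(111, 111, 111, 222, 222, 333, 333, 333, 333),
  date = as.Date(c("2010-12-12", "2011-12-01", "2009-8-7", "2010-5-7",
                   "2011-3-7", "2011-8-5", "2013-8-27", "2016-9-3",
                   "2011-8-5")),
  medicationCourses = c(1, 0, NA, 3, 4, 2, 4, 5, 6)
)
hosp <- data.table(
  hospitalSubject = c(111, 222, 333),
  admissionDate = as.Date(c("2011-12-31", "2013-12-31", "2013-12-31"))
)

# Join on differently named columns: the name is the column in dt,
# the value is the matching column in hosp.
joined <- dt[hosp, on = c(subject = "hospitalSubject")]

# Keep courses taken on or before admission, then sum per patient.
totals <- joined[date <= admissionDate,
                 .(TotalMedicationCourses = sum(medicationCourses, na.rm = TRUE)),
                 by = .(subject, admissionDate)]
print(totals)
```

The totals should match the keyed version; the main difference is that nothing is sorted or modified in place.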
Or a similar approach with dplyr:
library(dplyr)
left_join(data, hospitalData, by=c('subject'='hospitalSubject')) %>%
    filter(date <= admissionDate) %>%
    group_by(subject, admissionDate) %>%
    summarise(TotalMedicationCourses=sum(medicationCourses, na.rm=TRUE))
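One thing to be aware of with the join-then-filter pattern: a patient with no medication record on or before admission is filtered out and gets no row in the result. If such patients should be reported with a total of 0, one possible variant is to start from hospitalData and apply the date condition inside the sum. The sketch below assumes dplyr 1.0 or later (for the `.groups` argument), and patient 444 is hypothetical, added only to show the edge case.

```r
library(dplyr)

data <- data.frame(
  subject = c(111, 111, 111, 222, 222, 333, 333, 333, 333),
  date = as.Date(c("2010-12-12", "2011-12-01", "2009-8-7", "2010-5-7",
                   "2011-3-7", "2011-8-5", "2013-8-27", "2016-9-3",
                   "2011-8-5")),
  medicationCourses = c(1, 0, NA, 3, 4, 2, 4, 5, 6)
)

# Patient 444 is hypothetical: admitted, but with no medication records.
hospitalData <- data.frame(
  hospitalSubject = c(111, 222, 333, 444),
  admissionDate = as.Date(c("2011-12-31", "2013-12-31", "2013-12-31",
                            "2014-06-30"))
)

totals <- hospitalData %>%
  # Start from the admissions so every admitted patient keeps a row.
  left_join(data, by = c("hospitalSubject" = "subject")) %>%
  group_by(hospitalSubject, admissionDate) %>%
  summarise(
    # which() drops NA comparisons, so unmatched patients sum to 0.
    TotalMedicationCourses =
      sum(medicationCourses[which(date <= admissionDate)], na.rm = TRUE),
    .groups = "drop"
  )
print(totals)
```

For the three patients in the question this should give the same totals as the code above, with an extra row of 0 for the hypothetical patient.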