How do I match patient data for conditional logistic regression in R?
I have a dataset like this:
patient_id pre.int.outcome post.int.outcome
302949 1 1
993564 0 1
993570 1 1
993575 0 1
993792 1 0
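I want to run a clogit comparing pre- and post-intervention outcomes for each patient.
I understand that I need to get the data into a form like this: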
strata outcome
1 1
1 1
2 0
2 0
3 0
3 1
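In this form, each stratum is one patient, with that patient's pair of outcomes listed under it, but I'm not sure how to get there. Can anyone help or point me to a useful resource?
Edit: What I ended up doing was using the reshape function to make the dataset 'long' rather than wide: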
ds1<-reshape(ds, varying=c('pre.int.outcome','post.int.outcome'), v.names='outcome', timevar='before_after', times=c(0,1), direction='long')
I then sorted by patient_id so that I could use it as my 'strata'.
ds1 <- ds1[order(ds1$patient_id), ]
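For readers following along, here is a minimal sketch of how the reshaped data might then be passed to clogit. It assumes the survival package (which provides clogit() and strata()); the model formula, the idvar argument, and the object name fit are my assumptions and do not come from the original post.

```r
library(survival)  # provides clogit() and strata()

# The five patients from the question, in wide format
ds <- data.frame(
  patient_id       = c(302949, 993564, 993570, 993575, 993792),
  pre.int.outcome  = c(1, 0, 1, 0, 1),
  post.int.outcome = c(1, 1, 1, 1, 0)
)

# Wide -> long: one row per patient per time point (0 = pre, 1 = post)
ds1 <- reshape(ds,
               varying   = c("pre.int.outcome", "post.int.outcome"),
               v.names   = "outcome",
               timevar   = "before_after",
               times     = c(0, 1),
               idvar     = "patient_id",
               direction = "long")
ds1 <- ds1[order(ds1$patient_id, ds1$before_after), ]

# Each patient is their own stratum; before_after is the covariate of interest
# (assumed formula, not taken from the original post)
fit <- clogit(outcome ~ before_after + strata(patient_id), data = ds1)
summary(fit)
```

Only patients whose outcome changes between the two time points contribute to the conditional likelihood, so with data this small the estimate rests on very few pairs.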
This may help:
data.frame(strata= rep(1:nrow(df1), each=2), outcome=c(t(df1[2:3])))
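To see why that one-liner works, here is a small worked sketch using the five rows from the question. The name df1 follows the comment above; the name long is mine. t() turns the two outcome columns into a 2 x n matrix, and c() reads it column by column, so each patient's pre and post values end up next to each other.

```r
# The question's data in wide format
df1 <- data.frame(
  patient_id       = c(302949, 993564, 993570, 993575, 993792),
  pre.int.outcome  = c(1, 0, 1, 0, 1),
  post.int.outcome = c(1, 1, 1, 1, 0)
)

# rep(..., each = 2) gives each patient two consecutive rows;
# c(t(...)) interleaves pre and post outcomes patient by patient
long <- data.frame(strata  = rep(seq_len(nrow(df1)), each = 2),
                   outcome = c(t(df1[2:3])))
long
```

Here strata is just the row number of the patient in df1, so the original patient_id no longer appears in the result.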
Building on akrun's comment and answer, here is a solution that uses melt from the reshape2 package:
library(reshape2)
# I created dummy data to make sure my answer works
# I assumed 4 intervention treatments, but this would work with
# two treatments. With the dummy data, just make sure nObs/4 is an integer
nObs = 100 # number of observations
# note: the length-4 columns below are recycled by data.frame() to fill nObs rows
d = data.frame(patient_id = 1:4,
               pre.int.outcome = rbinom(4, 1, 0.7),
               post.int.outcome = rbinom(4, 1, 0.5),
               intervention = rep(c("a", "b", "c", "d"), each = nObs/4))
# melting the data as suggested by akrun
d2 = melt(d, id.vars = c("patient_id", "intervention"))
# Creating a strata variable for you with paste
d2$strata = as.factor(paste(d2$patient_id, d2$variable))
# I also clean up the variable to remove patient_id
# useful if you are concerned about protecting pii
levels(d2$strata) = seq_len(nlevels(d2$strata))
# last, I clean up the data and create a third "pretty" data.frame
d3 = d2[ , c("intervention", "value", "strata")]
head(d3)
# intervention value strata
# 1 a 1 2
# 2 a 1 4
# 3 a 1 6
# 4 a 1 8
# 5 a 1 2
# 6 a 1 4
# I also throw in the logistic regression
myGLM = glm(value ~ intervention, data = d3, family = 'binomial')
summary(myGLM)
# prints lots of outputs to screen ...
# or if you need odds ratios
myGLM2 = glm(value ~ intervention - 1, data = d3, family = 'binomial')
exp(myGLM2$coef)
exp(confint(myGLM2))
# also prints lots of outputs to screen ...
Edit: I added intervention based on the OP's comment. I also added glm to further help her or him.