Is it possible to run a Cox-Proportional-Hazards-Model with an exponential distribution for the baseline hazard in `lifelines` or another package?
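I am considering using the lifelines package to fit a Cox-Proportional-Hazards-Model. I read that lifelines uses a nonparametric approach to fit the baseline hazard, which results in different baseline_hazards for some time points (see the code example below). For my application, I need an exponential distribution, leading to a baseline hazard h0(t) = lambda that is constant across time.
So my question is: is it (in the meantime) possible to run a Cox-Proportional-Hazards-Model with an exponential distribution for the baseline hazard in lifelines or another Python package?
Example code: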
from lifelines import CoxPHFitter
import pandas as pd
df = pd.DataFrame({'duration': [4, 6, 5, 5, 4, 6],
                   'event': [0, 0, 0, 1, 1, 1],
                   'cat': [0, 1, 0, 1, 0, 1]})
cph = CoxPHFitter()
cph.fit(df, duration_col='duration', event_col='event', show_progress=True)
cph.baseline_hazard_
gives
baseline hazard
T
4.0 0.160573
5.0 0.278119
6.0 0.658032
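lifelines author here.
So this model is not natively in lifelines, but you can easily implement it yourself (and maybe I'll do something for a future release). The idea relies on the intersection of proportional hazards models and AFT (accelerated failure time) models. In the cox-ph model with an exponential hazard (i.e. a constant baseline hazard), the hazard looks like: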
h(t|x) = lambda_0(t) * exp(beta * x) = lambda_0 * exp(beta * x)
In the AFT specification for the exponential distribution, the hazard looks like:
h(t|x) = exp(-beta * x - beta_0) = exp(-beta * x) * exp(-beta_0) = exp(-beta * x) * lambda_0
Note the difference in the minus sign!
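A quick numeric check can make the equivalence concrete. The sketch below uses made-up coefficient and covariate values (they are assumptions for illustration, not fitted from any data) and shows that the exponential AFT hazard equals the proportional-hazards form once the signs are flipped and lambda_0 is set to exp(-beta_0):

```python
import numpy as np

# Illustrative values only; none of these come from a fitted model.
beta_aft = np.array([0.4, -0.2])   # AFT coefficients
beta_0 = 3.0                       # AFT intercept
x = np.array([1.0, 2.5])           # covariates for one subject

# Exponential AFT hazard: h(t|x) = exp(-beta . x - beta_0), constant in t.
hazard_aft = np.exp(-(beta_aft @ x) - beta_0)

# The same model written in proportional-hazards form.
lambda_0 = np.exp(-beta_0)         # constant baseline hazard
beta_ph = -beta_aft                # flipped signs
hazard_ph = lambda_0 * np.exp(beta_ph @ x)

print(hazard_aft, hazard_ph)       # the two values agree
```

Neither expression involves t, which is exactly the constant-baseline property asked for in the question.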
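So instead of doing a CoxPH fit, we can do an exponential AFT fit (and flip the signs if we want the same interpretation as the CoxPH). We can use the custom regression model syntax to do this: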
from lifelines.fitters import ParametricRegressionFitter
from autograd import numpy as np


class ExponentialAFTFitter(ParametricRegressionFitter):

    # this is necessary, and should always be a non-empty list of strings.
    _fitted_parameter_names = ['lambda_']

    def _cumulative_hazard(self, params, T, Xs):
        # params is a dictionary that maps unknown parameters to a numpy vector.
        # Xs is a dictionary that maps unknown parameters to a numpy 2d array
        lambda_ = np.exp(np.dot(Xs['lambda_'], params['lambda_']))
        return T / lambda_
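To see why returning T / lambda_ is enough to pin down the model: with cumulative hazard H(t) = t / lambda, the hazard is its derivative 1 / lambda, which is constant in t, and the usual right-censored log-likelihood is sum(E * log h - H). The sketch below is a hand-written illustration without covariates, using the toy data from the question. The function name `log_likelihood` and the grid search are my own assumptions for illustration and are not how lifelines optimizes. It compares the maximizer with the textbook closed form, events divided by total time at risk.

```python
import numpy as np

# Toy data from the question: durations and event indicators (1 = event observed).
T = np.array([4, 6, 5, 5, 4, 6], dtype=float)
E = np.array([0, 0, 0, 1, 1, 1], dtype=float)


def log_likelihood(log_lambda):
    """Right-censored exponential log-likelihood with H(t) = t / lambda."""
    lambda_ = np.exp(log_lambda)
    cumulative_hazard = T / lambda_
    log_hazard = -log_lambda  # h(t) = 1 / lambda, so log h = -log(lambda)
    return np.sum(E * log_hazard - cumulative_hazard)


# Crude grid search over log(lambda); a stand-in for a real optimizer.
grid = np.linspace(0.0, 5.0, 5001)
best_log_lambda = grid[np.argmax([log_likelihood(g) for g in grid])]
rate_hat = np.exp(-best_log_lambda)   # estimated constant hazard, 1 / lambda

# Closed-form MLE of a constant hazard: events / total time at risk.
closed_form = E.sum() / T.sum()

print(rate_hat, closed_form)
```

With covariates, log(lambda) is replaced by the linear predictor, which is what the `np.dot(Xs['lambda_'], params['lambda_'])` line in the class expresses.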
Testing the custom fitter on the Rossi dataset:
from lifelines.datasets import load_rossi
from lifelines import CoxPHFitter
rossi = load_rossi()
rossi['intercept'] = 1
regressors = {'lambda_': rossi.columns}
eaf = ExponentialAFTFitter().fit(rossi, "week", "arrest", regressors=regressors)
eaf.print_summary()
"""
<lifelines.ExponentialAFTFitter: fitted with 432 observations, 318 censored>
event col = 'arrest'
number of subjects = 432
number of events = 114
log-likelihood = -686.37
time fit was run = 2019-06-27 15:13:18 UTC
---
coef exp(coef) se(coef) z p -log2(p) lower 0.95 upper 0.95
lambda_ fin 0.37 1.44 0.19 1.92 0.06 4.18 -0.01 0.74
age 0.06 1.06 0.02 2.55 0.01 6.52 0.01 0.10
race -0.30 0.74 0.31 -0.99 0.32 1.63 -0.91 0.30
wexp 0.15 1.16 0.21 0.69 0.49 1.03 -0.27 0.56
mar 0.43 1.53 0.38 1.12 0.26 1.93 -0.32 1.17
paro 0.08 1.09 0.20 0.42 0.67 0.57 -0.30 0.47
prio -0.09 0.92 0.03 -3.03 <0.005 8.65 -0.14 -0.03
_intercept 4.05 57.44 0.59 6.91 <0.005 37.61 2.90 5.20
_fixed _intercept 0.00 1.00 0.00 nan nan nan 0.00 0.00
---
"""
CoxPHFitter().fit(load_rossi(), 'week', 'arrest').print_summary()
"""
<lifelines.CoxPHFitter: fitted with 432 observations, 318 censored>
duration col = 'week'
event col = 'arrest'
number of subjects = 432
number of events = 114
partial log-likelihood = -658.75
time fit was run = 2019-06-27 15:17:41 UTC
---
coef exp(coef) se(coef) z p -log2(p) lower 0.95 upper 0.95
fin -0.38 0.68 0.19 -1.98 0.05 4.40 -0.75 -0.00
age -0.06 0.94 0.02 -2.61 0.01 6.79 -0.10 -0.01
race 0.31 1.37 0.31 1.02 0.31 1.70 -0.29 0.92
wexp -0.15 0.86 0.21 -0.71 0.48 1.06 -0.57 0.27
mar -0.43 0.65 0.38 -1.14 0.26 1.97 -1.18 0.31
paro -0.08 0.92 0.20 -0.43 0.66 0.59 -0.47 0.30
prio 0.09 1.10 0.03 3.19 <0.005 9.48 0.04 0.15
---
Concordance = 0.64
Log-likelihood ratio test = 33.27 on 7 df, -log2(p)=15.37
"""
Notice the sign change! So if you want the constant baseline hazard in the model, it's exp(-4.05).
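As a rough sketch of that conversion, the snippet below takes the rounded coefficients printed in the ExponentialAFTFitter summary, flips their signs to get the CoxPH-style interpretation, and exponentiates the negated intercept to get the baseline hazard. The dictionary is typed in by hand from the printed table (two-decimal rounding), so the results are only approximate. The baseline is the hazard for a subject whose covariates are all zero, in events per week since `week` is the duration column.

```python
import math

# Rounded coefficients copied by hand from the ExponentialAFTFitter summary above.
aft_coefs = {
    "fin": 0.37, "age": 0.06, "race": -0.30, "wexp": 0.15,
    "mar": 0.43, "paro": 0.08, "prio": -0.09,
}
aft_intercept = 4.05

# Constant baseline hazard: lambda_0 = exp(-beta_0), roughly 0.017 per week.
baseline_hazard = math.exp(-aft_intercept)

# Flip signs to read the coefficients the way a CoxPH summary would report them.
ph_coefs = {name: -coef for name, coef in aft_coefs.items()}
hazard_ratios = {name: math.exp(coef) for name, coef in ph_coefs.items()}

print(f"baseline hazard: {baseline_hazard:.4f}")
for name in ph_coefs:
    print(f"{name}: coef={ph_coefs[name]:+.2f}, hazard ratio={hazard_ratios[name]:.2f}")
```

The flipped coefficients land close to the CoxPHFitter output above (for example, fin goes from 0.37 to -0.37 versus -0.38 in the Cox fit), which is the agreement the two summaries are meant to show.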