如何从 R 中的二维 copula 中找到联合累积分布函数?
How to find the joint cumulative distribution function from a 2-D copula in R?
我现在正在研究 R 中的 copula,我想知道如何找到 R 中的联合累积分布?
D = c(1,3,2,2,8,2,1,3,1,1,3,3,1,1,2,1,2,1,1,3,4,1,1,3,1,1,2,1,3,7,1,4,6,1,2,1,1,3,1,2,2,3,4,1,1,1,1,2,2,12,1,1,2,1,1,1,3,4)
S = c(1.42,5.15,2.52,2.29,12.36,2.82,1.49,3.53,1.17,1.03,4.03,5.26,1.65,1.41,3.75,1.09,3.44,1.36,1.19,4.76,5.58,1.23,2.29,7.71,1.12,1.26,2.78,1.13,3.87,15.43,1.19,4.95,7.69,1.17,3.27,1.44,1.05,3.94,1.58,2.29,2.73,3.75,6.80,1.16,1.01,1.00,1.02,2.32,2.86,22.90,1.42,1.10,2.78,1.23,1.61,1.33,3.53,10.44)
经过一番摸索,我发现Gamma分布最能描述上述数据。
library(fitdistrplus)
fg_d <- fitdist(data = Dur, distr = "gamma", method = "mle")
fg_s <- fitdist(data = Sev, distr = "gamma", method = "mle")
然后,我尝试 select 使用 VineCopula
包的 copula 家族:
mydata <- cbind(D=D, S=S)
u1 <- pobs(mydata[,1])
u2 <- pobs(mydata[,2])
fitCopula <- BiCopSelect(u1, u2, familyset=NA)
summary(fitCopula)
结果表示"Survival Clayton"。然后,我尝试构建以下 copula:
library(copula)
cop_model <- surClaytonCopula(param = 5.79)
现在,根据下面的等式(假设E(L)为常数):
我需要为给定的 D 和 S 值找到 FD(d)、FS(s) 和 C(FD(d),FS(s))。
比如我们取D=3,S=2,那么我们要求F(D<=3),F(S<=2),C(D<=3 and
S<=2).我想知道如何使用包 copula
在 R 中做到这一点?
另外,我们如何找到C(D<=3 or
S<=2)?感谢您的帮助。
这是一个仅使用基础 R 和 copula 包的答案:
FD(d) 是伽玛 CDF。根据您的代码,它的形状为 2.20,速率为 0.98,因此 FD(3) 为 pgamma(3, 2.20, 0.98) = 0.7495596
FS(s) 是伽玛 CDF。根据您的代码,它的形状为 1.56,速率为 0.45,因此 FS(2) 为 pgamma(2, 1.56, 0.45) = 0.3631978
C(FD(d), FS(s)) 是生存的 Clayton Copula (也称为旋转 Clayton copula)用上述边缘计算。在 R 中,这是
library(copula)
D_shape <- 2.20
D_rate <- 0.98
S_shape <- 1.56
S_rate <- 0.45
surv_clay <- rotCopula(claytonCopula(5.79))
pCopula(c(pgamma(3, D_shape, D_rate),pgamma(2, S_shape, S_rate)), surv_clay)
- OP评论中Shiau 2006年论文第810页等式(23)的分母表明P(D>=3 or S>=2) = 1- C(FD (d), FS(s)) 即:
1 - pCopula(c(pgamma(3, D_shape, D_rate),pgamma(2, S_shape, S_rate)), surv_clay)
- P(D<=3 或 S<=2) = P(D<=3) + P(S<=2) - P(D<=3,S<=2) 所以
pgamma(3, D_shape, D_rate) +
pgamma(2, S_shape, S_rate) -
pCopula(c(pgamma(3, D_shape, D_rate),pgamma(2, S_shape, S_rate)), surv_clay)
来源
- R-bloggers post on Modelling Dependence with Copulas
- 通过阅读 copula 手册,意识到
VineCopula::surClaytonCopula(5.79)
对应于 copula::rotCopula(copula::claytonCopula(5.79))
下面是一些代码,可以通过模拟仔细检查一些事情。
library(fitdistrplus)
library(copula)
library(VineCopula)
D = c(1,3,2,2,8,2,1,3,1,1,3,3,1,1,2,1,2,1,1,3,4,1,1,3,1,1,2,1,3,7,1,4,6,1,2,1,1,3,1,2,2,3,4,1,1,1,1,2,2,12,1,1,2,1,1,1,3,4)
S = c(1.42,5.15,2.52,2.29,12.36,2.82,1.49,3.53,1.17,1.03,4.03,5.26,1.65,1.41,3.75,1.09,3.44,1.36,1.19,4.76,5.58,1.23,2.29,7.71,1.12,1.26,2.78,1.13,3.87,15.43,1.19,4.95,7.69,1.17,3.27,1.44,1.05,3.94,1.58,2.29,2.73,3.75,6.80,1.16,1.01,1.00,1.02,2.32,2.86,22.90,1.42,1.10,2.78,1.23,1.61,1.33,3.53,10.44)
(fg_d <- fitdist(data = D, distr = "gamma", method = "mle"))
(fg_s <- fitdist(data = S, distr = "gamma", method = "mle"))
mydata <- cbind(D=D, S=S)
u1 <- pobs(mydata[,1])
u2 <- pobs(mydata[,2])
fitCopula <- BiCopSelect(u1, u2, familyset=NA)
summary(fitCopula)
D_shape <- fg_d$estimate[1]
D_rate <- fg_d$estimate[2]
S_shape <- fg_s$estimate[1]
S_rate <- fg_s$estimate[2]
copula_dist <- mvdc(copula=rotCopula(claytonCopula(5.79)), margins=c("gamma","gamma"),
paramMargins=list(list(shape=D_shape, rate=D_rate),
list(shape=S_shape, rate=S_rate)))
sim <- rMvdc(n = 1e5,
copula_dist)
plot(sim, col="red")
points(D,S, col="black")
legend('bottomright',c('Observed','Simulated'),col=c('black','red'),pch=21)
并回答有关具体计算的问题:
## F_D(d) for d=3
mean(sim[,1] <=3) ## simulated
pgamma(3, D_shape, D_rate) ## theory
## F_S(s) for s=2
mean(sim[,2] <=2) ## simulated
pgamma(2, S_shape, S_rate) ## theory
## C(F_D(d) for d=3 AND F_S(s) for s=2)
## simulated value:
mean(sim[,1] <=3 & sim[,2] <=2)
## with copula:
surv_clay <- rotCopula(claytonCopula(5.79))
pCopula(c(pgamma(3, D_shape, D_rate),pgamma(2, S_shape, S_rate)), surv_clay)
## P(D>=3 or S>=2)
## simulated
mean(sim[,1] >= 3 | sim[,2] >=2)
## with copula:
1-pCopula(c(pgamma(3, D_shape, D_rate),pgamma(2, S_shape, S_rate)), surv_clay)
## In case you want:
## P(D<=3 or S<=2) = P(D<=3) + P(S<=2) - P(D<=3,S<=2)
## simulated:
mean(sim[,1] <= 3 | sim[,2] <= 2)
## theory with copula:
pgamma(3, D_shape, D_rate) + pgamma(2, S_shape, S_rate) - pCopula(c(pgamma(3, D_shape, D_rate),pgamma(2, S_shape, S_rate)), surv_clay)
运行 上面的两段代码给出了以下输出:
> (fg_d <- fitdist(data = D, distr = "gamma", method = "mle"))
Fitting of the distribution ' gamma ' by maximum likelihood
Parameters:
estimate Std. Error
shape 2.2082572 0.3831383
rate 0.9775783 0.1903410
> (fg_s <- fitdist(data = S, distr = "gamma", method = "mle"))
Fitting of the distribution ' gamma ' by maximum likelihood
Parameters:
estimate Std. Error
shape 1.5628338 0.26500235
rate 0.4494518 0.08964724
>
> mydata <- cbind(D=D, S=S)
> u1 <- pobs(mydata[,1])
> u2 <- pobs(mydata[,2])
> fitCopula <- BiCopSelect(u1, u2, familyset=NA)
Warning message:
In cor(x[(x[, 1] < 0) & (x[, 2] < 0), ]) : the standard deviation is zero
> summary(fitCopula)
Family
------
No: 13
Name: Survival Clayton
Parameter(s)
------------
par: 5.79
Dependence measures
-------------------
Kendall's tau: 0.74 (empirical = 0.82, p value < 0.01)
Upper TD: 0.89
Lower TD: 0
Fit statistics
--------------
logLik: 57.68
AIC: -113.37
BIC: -111.31
>
> D_shape <- fg_d$estimate[1]
> D_rate <- fg_d$estimate[2]
> S_shape <- fg_s$estimate[1]
> S_rate <- fg_s$estimate[2]
>
> copula_dist <- mvdc(copula=rotCopula(claytonCopula(5.79)), margins=c("gamma","gamma"),
+ paramMargins=list(list(shape=D_shape, rate=D_rate),
+ list(shape=S_shape, rate=S_rate)))
>
> sim <- rMvdc(n = 1e5,
+ copula_dist)
>
> plot(sim, col="red")
> points(D,S, col="black")
> legend('bottomright',c('Observed','Simulated'),col=c('black','red'),pch=21)
并且--
> ## F_D(d) for d=3
> mean(sim[,1] <=3) ## simulated
[1] 0.74759
> pgamma(3, D_shape, D_rate) ## theory
[1] 0.746482
>
> ## F_S(s) for s=2
> mean(sim[,2] <=2) ## simulated
[1] 0.36233
> pgamma(2, S_shape, S_rate) ## theory
[1] 0.3617122
>
> ## C(F_D(d) for d=3 AND F_S(s) for s=2)
> ## simulated value:
> mean(sim[,1] <=3 & sim[,2] <=2)
[1] 0.362
> ## with copula:
> surv_clay <- rotCopula(claytonCopula(5.79))
> pCopula(c(pgamma(3, D_shape, D_rate),pgamma(2, S_shape, S_rate)), surv_clay)
[1] 0.3615195
>
> ## P(D>=3 or S>=2)
> ## simulated
> mean(sim[,1] >= 3 | sim[,2] >=2)
[1] 0.638
> ## with copula:
> 1-pCopula(c(pgamma(3, D_shape, D_rate),pgamma(2, S_shape, S_rate)), surv_clay)
[1] 0.6384805
> ## In case you want:
> ## P(D<=3 or S<=2) = P(D<=3) + P(S<=2) - P(D<=3,S<=2)
> ## simulated:
> mean(sim[,1] <= 3 | sim[,2] <= 2)
[1] 0.74792
> ## theory with copula:
> pgamma(3, D_shape, D_rate) + pgamma(2, S_shape, S_rate) - pCopula(c(pgamma(3, D_shape, D_rate),pgamma(2, S_shape, S_rate)), surv_clay)
[1] 0.7466747
我现在正在研究 R 中的 copula,我想知道如何找到 R 中的联合累积分布?
D = c(1,3,2,2,8,2,1,3,1,1,3,3,1,1,2,1,2,1,1,3,4,1,1,3,1,1,2,1,3,7,1,4,6,1,2,1,1,3,1,2,2,3,4,1,1,1,1,2,2,12,1,1,2,1,1,1,3,4)
S = c(1.42,5.15,2.52,2.29,12.36,2.82,1.49,3.53,1.17,1.03,4.03,5.26,1.65,1.41,3.75,1.09,3.44,1.36,1.19,4.76,5.58,1.23,2.29,7.71,1.12,1.26,2.78,1.13,3.87,15.43,1.19,4.95,7.69,1.17,3.27,1.44,1.05,3.94,1.58,2.29,2.73,3.75,6.80,1.16,1.01,1.00,1.02,2.32,2.86,22.90,1.42,1.10,2.78,1.23,1.61,1.33,3.53,10.44)
经过一番摸索,我发现Gamma分布最能描述上述数据。
library(fitdistrplus)
fg_d <- fitdist(data = Dur, distr = "gamma", method = "mle")
fg_s <- fitdist(data = Sev, distr = "gamma", method = "mle")
然后,我尝试 select 使用 VineCopula
包的 copula 家族:
mydata <- cbind(D=D, S=S)
u1 <- pobs(mydata[,1])
u2 <- pobs(mydata[,2])
fitCopula <- BiCopSelect(u1, u2, familyset=NA)
summary(fitCopula)
结果表示"Survival Clayton"。然后,我尝试构建以下 copula:
library(copula)
cop_model <- surClaytonCopula(param = 5.79)
现在,根据下面的等式(假设E(L)为常数):
我需要为给定的 D 和 S 值找到 FD(d)、FS(s) 和 C(FD(d),FS(s))。
比如我们取D=3,S=2,那么我们要求F(D<=3),F(S<=2),C(D<=3 and
S<=2).我想知道如何使用包 copula
在 R 中做到这一点?
另外,我们如何找到C(D<=3 or
S<=2)?感谢您的帮助。
这是一个仅使用基础 R 和 copula 包的答案:
FD(d) 是伽玛 CDF。根据您的代码,它的形状为 2.20,速率为 0.98,因此 FD(3) 为
pgamma(3, 2.20, 0.98) = 0.7495596
FS(s) 是伽玛 CDF。根据您的代码,它的形状为 1.56,速率为 0.45,因此 FS(2) 为
pgamma(2, 1.56, 0.45) = 0.3631978
C(FD(d), FS(s)) 是生存的 Clayton Copula (也称为旋转 Clayton copula)用上述边缘计算。在 R 中,这是
library(copula)
D_shape <- 2.20
D_rate <- 0.98
S_shape <- 1.56
S_rate <- 0.45
surv_clay <- rotCopula(claytonCopula(5.79))
pCopula(c(pgamma(3, D_shape, D_rate),pgamma(2, S_shape, S_rate)), surv_clay)
- OP评论中Shiau 2006年论文第810页等式(23)的分母表明P(D>=3 or S>=2) = 1- C(FD (d), FS(s)) 即:
1 - pCopula(c(pgamma(3, D_shape, D_rate),pgamma(2, S_shape, S_rate)), surv_clay)
- P(D<=3 或 S<=2) = P(D<=3) + P(S<=2) - P(D<=3,S<=2) 所以
pgamma(3, D_shape, D_rate) +
pgamma(2, S_shape, S_rate) -
pCopula(c(pgamma(3, D_shape, D_rate),pgamma(2, S_shape, S_rate)), surv_clay)
来源
- R-bloggers post on Modelling Dependence with Copulas
- 通过阅读 copula 手册,意识到
VineCopula::surClaytonCopula(5.79)
对应于copula::rotCopula(copula::claytonCopula(5.79))
下面是一些代码,可以通过模拟仔细检查一些事情。
library(fitdistrplus)
library(copula)
library(VineCopula)
D = c(1,3,2,2,8,2,1,3,1,1,3,3,1,1,2,1,2,1,1,3,4,1,1,3,1,1,2,1,3,7,1,4,6,1,2,1,1,3,1,2,2,3,4,1,1,1,1,2,2,12,1,1,2,1,1,1,3,4)
S = c(1.42,5.15,2.52,2.29,12.36,2.82,1.49,3.53,1.17,1.03,4.03,5.26,1.65,1.41,3.75,1.09,3.44,1.36,1.19,4.76,5.58,1.23,2.29,7.71,1.12,1.26,2.78,1.13,3.87,15.43,1.19,4.95,7.69,1.17,3.27,1.44,1.05,3.94,1.58,2.29,2.73,3.75,6.80,1.16,1.01,1.00,1.02,2.32,2.86,22.90,1.42,1.10,2.78,1.23,1.61,1.33,3.53,10.44)
(fg_d <- fitdist(data = D, distr = "gamma", method = "mle"))
(fg_s <- fitdist(data = S, distr = "gamma", method = "mle"))
mydata <- cbind(D=D, S=S)
u1 <- pobs(mydata[,1])
u2 <- pobs(mydata[,2])
fitCopula <- BiCopSelect(u1, u2, familyset=NA)
summary(fitCopula)
D_shape <- fg_d$estimate[1]
D_rate <- fg_d$estimate[2]
S_shape <- fg_s$estimate[1]
S_rate <- fg_s$estimate[2]
copula_dist <- mvdc(copula=rotCopula(claytonCopula(5.79)), margins=c("gamma","gamma"),
paramMargins=list(list(shape=D_shape, rate=D_rate),
list(shape=S_shape, rate=S_rate)))
sim <- rMvdc(n = 1e5,
copula_dist)
plot(sim, col="red")
points(D,S, col="black")
legend('bottomright',c('Observed','Simulated'),col=c('black','red'),pch=21)
并回答有关具体计算的问题:
## F_D(d) for d=3
mean(sim[,1] <=3) ## simulated
pgamma(3, D_shape, D_rate) ## theory
## F_S(s) for s=2
mean(sim[,2] <=2) ## simulated
pgamma(2, S_shape, S_rate) ## theory
## C(F_D(d) for d=3 AND F_S(s) for s=2)
## simulated value:
mean(sim[,1] <=3 & sim[,2] <=2)
## with copula:
surv_clay <- rotCopula(claytonCopula(5.79))
pCopula(c(pgamma(3, D_shape, D_rate),pgamma(2, S_shape, S_rate)), surv_clay)
## P(D>=3 or S>=2)
## simulated
mean(sim[,1] >= 3 | sim[,2] >=2)
## with copula:
1-pCopula(c(pgamma(3, D_shape, D_rate),pgamma(2, S_shape, S_rate)), surv_clay)
## In case you want:
## P(D<=3 or S<=2) = P(D<=3) + P(S<=2) - P(D<=3,S<=2)
## simulated:
mean(sim[,1] <= 3 | sim[,2] <= 2)
## theory with copula:
pgamma(3, D_shape, D_rate) + pgamma(2, S_shape, S_rate) - pCopula(c(pgamma(3, D_shape, D_rate),pgamma(2, S_shape, S_rate)), surv_clay)
运行 上面的两段代码给出了以下输出:
> (fg_d <- fitdist(data = D, distr = "gamma", method = "mle"))
Fitting of the distribution ' gamma ' by maximum likelihood
Parameters:
estimate Std. Error
shape 2.2082572 0.3831383
rate 0.9775783 0.1903410
> (fg_s <- fitdist(data = S, distr = "gamma", method = "mle"))
Fitting of the distribution ' gamma ' by maximum likelihood
Parameters:
estimate Std. Error
shape 1.5628338 0.26500235
rate 0.4494518 0.08964724
>
> mydata <- cbind(D=D, S=S)
> u1 <- pobs(mydata[,1])
> u2 <- pobs(mydata[,2])
> fitCopula <- BiCopSelect(u1, u2, familyset=NA)
Warning message:
In cor(x[(x[, 1] < 0) & (x[, 2] < 0), ]) : the standard deviation is zero
> summary(fitCopula)
Family
------
No: 13
Name: Survival Clayton
Parameter(s)
------------
par: 5.79
Dependence measures
-------------------
Kendall's tau: 0.74 (empirical = 0.82, p value < 0.01)
Upper TD: 0.89
Lower TD: 0
Fit statistics
--------------
logLik: 57.68
AIC: -113.37
BIC: -111.31
>
> D_shape <- fg_d$estimate[1]
> D_rate <- fg_d$estimate[2]
> S_shape <- fg_s$estimate[1]
> S_rate <- fg_s$estimate[2]
>
> copula_dist <- mvdc(copula=rotCopula(claytonCopula(5.79)), margins=c("gamma","gamma"),
+ paramMargins=list(list(shape=D_shape, rate=D_rate),
+ list(shape=S_shape, rate=S_rate)))
>
> sim <- rMvdc(n = 1e5,
+ copula_dist)
>
> plot(sim, col="red")
> points(D,S, col="black")
> legend('bottomright',c('Observed','Simulated'),col=c('black','red'),pch=21)
并且--
> ## F_D(d) for d=3
> mean(sim[,1] <=3) ## simulated
[1] 0.74759
> pgamma(3, D_shape, D_rate) ## theory
[1] 0.746482
>
> ## F_S(s) for s=2
> mean(sim[,2] <=2) ## simulated
[1] 0.36233
> pgamma(2, S_shape, S_rate) ## theory
[1] 0.3617122
>
> ## C(F_D(d) for d=3 AND F_S(s) for s=2)
> ## simulated value:
> mean(sim[,1] <=3 & sim[,2] <=2)
[1] 0.362
> ## with copula:
> surv_clay <- rotCopula(claytonCopula(5.79))
> pCopula(c(pgamma(3, D_shape, D_rate),pgamma(2, S_shape, S_rate)), surv_clay)
[1] 0.3615195
>
> ## P(D>=3 or S>=2)
> ## simulated
> mean(sim[,1] >= 3 | sim[,2] >=2)
[1] 0.638
> ## with copula:
> 1-pCopula(c(pgamma(3, D_shape, D_rate),pgamma(2, S_shape, S_rate)), surv_clay)
[1] 0.6384805
> ## In case you want:
> ## P(D<=3 or S<=2) = P(D<=3) + P(S<=2) - P(D<=3,S<=2)
> ## simulated:
> mean(sim[,1] <= 3 | sim[,2] <= 2)
[1] 0.74792
> ## theory with copula:
> pgamma(3, D_shape, D_rate) + pgamma(2, S_shape, S_rate) - pCopula(c(pgamma(3, D_shape, D_rate),pgamma(2, S_shape, S_rate)), surv_clay)
[1] 0.7466747