如何在 semPaths 的估计中包含 p 值和 R 方？

Question

我正在使用 semPaths（semPlot 包）来绘制我的结构方程模型。经过反复试验，我有一个很好的脚本来展示我想要的东西。除了，我还没弄清楚如何在图中包含 estimates/regression 系数的 p-value/significance 级别。

Can/how 我可以包括显着性水平吗？边缘标签中的 p 值低于估计值或作为无关紧要的虚线或......？我也有兴趣包括 R 平方，但不像显着性水平那么重要。

这是我目前使用的脚本：

semPaths(fitmod.bac.class2,
     what = "std",
     whatLabels = "std",
     style="ram",
     edge.label.cex = 1.3,
     layout = 'tree',
     intercepts=FALSE,
     residuals=FALSE,
     nodeLabels = c("Negati-\nvicutes","cand_class\n_MB_A2_108", "CO2", "Bacilli","Ignavi-\nbacteria","C/N", "pH","Water\ncontent"),
     sizeMan=7 )

SemPath 输出之一的示例

在此示例中，以下内容不重要：

Ignavibacteria -> First_C_CO2_ugC_gC_day，p = 0.096
pH -> Ignavibacteria，p = 0.151
cand_class_MB_A2_108 <-> 杆菌相关性，p = 0.054

我是 R 用户而不是真正的编码员，所以我可能只是遗漏了论点中的一个关键点。

我目前正在测试很多不同的模型，真的很希望不必手工绘制它们。

更新：使用 semPlotModel：我对 semPlotModel 不包括 sem 函数的显着性水平的理解是否正确（请参阅下面的脚本和输出）？我特别希望将 P(>|z|) 用于回归和协方差。是只有我缺少它，还是不包括在内？如果不包含，我的解决方案只是自定义边缘标签。

    {model.NA.UP.bac.class2 <- '

  #LATANT VARIABLES

  #REGRESSIONS 

  #soil organic carbon quality

  c_Negativicutes ~ CN  

  #microorganisms
  First_C_CO2_ugC_gC_day    ~ c_Bacilli
  First_C_CO2_ugC_gC_day    ~ c_Ignavibacteria
  First_C_CO2_ugC_gC_day    ~ c_cand_class_MB_A2_108
  First_C_CO2_ugC_gC_day    ~ c_Negativicutes

  #pH
  c_Bacilli            ~pH
  c_Ignavibacteria    ~pH
  c_cand_class_MB_A2_108~pH
  c_Negativicutes     ~pH

  #COVARIANCE
  initial_water       ~~ CN
  c_cand_class_MB_A2_108 ~~ c_Bacilli 


  '

  fitmod.bac.class2 <- sem(model.NA.UP.bac.class2, data=datapNA.UP.log, missing="ml", meanstructure=TRUE, fixed.x=FALSE, std.lv=FALSE, std.ov=FALSE)
  summary(fitmod.bac.class2, standardized=TRUE, fit.measures=TRUE, rsq=TRUE)

  out <- capture.output(summary(fitmod.bac.class2, standardized=TRUE, fit.measures=TRUE, rsq=TRUE))
}

输出：

  lavaan 0.6-5 ended normally after 188 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of free parameters                         28

  Number of observations                            30
  Number of missing patterns                         1

Model Test User Model:

  Test statistic                                17.816
  Degrees of freedom                                16
  P-value (Chi-square)                           0.335

Model Test Baseline Model:

  Test statistic                               101.570
  Degrees of freedom                                28
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.975
  Tucker-Lewis Index (TLI)                       0.957

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)                472.465
  Loglikelihood unrestricted model (H1)        481.373

  Akaike (AIC)                                -888.930
  Bayesian (BIC)                              -849.697
  Sample-size adjusted Bayesian (BIC)         -936.875

Root Mean Square Error of Approximation:

  RMSEA                                          0.062
  90 Percent confidence interval - lower         0.000
  90 Percent confidence interval - upper         0.185
  P-value RMSEA <= 0.05                          0.414

Standardized Root Mean Square Residual:

  SRMR                                           0.107

Parameter Estimates:

  Information                                 Observed
  Observed information based on                Hessian
  Standard errors                             Standard

Regressions:
                           Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  c_Negativicutes ~                                                             
    CN                        0.419    0.143    2.939    0.003    0.419    0.416
  c_cand_class_MB_A2_108 ~                                                      
    CN                       -0.433    0.160   -2.707    0.007   -0.433   -0.394
  First_C_CO2_ugC_gC_day ~                                                      
    c_Bacilli                 0.525    0.128    4.092    0.000    0.525    0.496
    c_Ignavibacter            0.207    0.124    1.667    0.096    0.207    0.195
    c_c__MB_A2_108            0.310    0.125    2.475    0.013    0.310    0.301
    c_Negativicuts            0.304    0.137    2.220    0.026    0.304    0.271
  c_Bacilli ~                                                                   
    pH                        0.624    0.135    4.604    0.000    0.624    0.643
  c_Ignavibacteria ~                                                            
    pH                        0.245    0.171    1.436    0.151    0.245    0.254
  c_cand_class_MB_A2_108 ~                                                      
    pH                        0.393    0.151    2.597    0.009    0.393    0.394
  c_Negativicutes ~                                                             
    pH                        0.435    0.129    3.361    0.001    0.435    0.476

Covariances:
                            Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  CN ~~                                                                          
    initial_water              0.001    0.000    2.679    0.007    0.001    0.561
 .c_cand_class_MB_A2_108 ~~                                                      
   .c_Bacilli                 -0.000    0.000   -1.923    0.054   -0.000   -0.388

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .c_Negativicuts    0.145    0.198    0.734    0.463    0.145    3.826
   .c_c__MB_A2_108    1.038    0.226    4.594    0.000    1.038   25.076
   .Frs_C_CO2_C_C_   -0.346    0.233   -1.485    0.137   -0.346   -8.115
   .c_Bacilli         0.376    0.135    2.778    0.005    0.376    9.340
   .c_Ignavibacter    0.754    0.170    4.424    0.000    0.754   18.796
    CN                0.998    0.007  145.158    0.000    0.998   26.502
    pH                0.998    0.008  131.642    0.000    0.998   24.034
    initial_water     0.998    0.008  125.994    0.000    0.998   23.003

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .c_Negativicuts    0.001    0.000    3.873    0.000    0.001    0.600
   .c_c__MB_A2_108    0.001    0.000    3.833    0.000    0.001    0.689
   .Frs_C_CO2_C_C_    0.001    0.000    3.873    0.000    0.001    0.408
   .c_Bacilli         0.001    0.000    3.873    0.000    0.001    0.586
   .c_Ignavibacter    0.002    0.000    3.873    0.000    0.002    0.936
    CN                0.001    0.000    3.873    0.000    0.001    1.000
    initial_water     0.002    0.000    3.873    0.000    0.002    1.000
    pH                0.002    0.000    3.873    0.000    0.002    1.000

R-Square:
                   Estimate
    c_Negativicuts    0.400
    c_c__MB_A2_108    0.311
    Frs_C_CO2_C_C_    0.592
    c_Bacilli         0.414
    c_Ignavibacter    0.064

Warning message:
In lav_model_hessian(lavmodel = lavmodel, lavsamplestats = lavsamplestats,  :
  lavaan WARNING: Hessian is not fully symmetric. Max diff = 5.15131396241486e-05

Answer 1

这个例子取自 ?semPaths 因为我们没有你的对象。

library('semPlot')

modFile <- tempfile(fileext = '.OUT')
download.file('http://sachaepskamp.com/files/mi1.OUT', modFile)

使用semPlotModel获取对象而不绘制。在那里你可以检查要绘制的内容。我只是在没有阅读文档的情况下四处寻找，直到我发现它似乎在使用什么。

在运行 semPlotModel 之后，对象有一个元素 x@Pars，其中包含边、节点和用于边标签的 std在你的情况下。 semPaths 也有一个允许您制作自定义边缘标签的参数，因此您可以从 x@Pars 中获取所需的数据并添加您的 p 值：

x <- semPlotModel(modFile)
x@Pars
#                     label    lhs edge    rhs    est       std   group fixed par
# 1        lambda[11]^{(y)} perfIQ   ->     pc  1.000 0.6219648 Group 1  TRUE   0
# 2        lambda[21]^{(y)} perfIQ   ->     pa  0.923 0.5664888 Group 1 FALSE   1
# 3        lambda[31]^{(y)} perfIQ   ->     oa  1.098 0.6550159 Group 1 FALSE   2
# 4        lambda[41]^{(y)} perfIQ   ->     ma  0.784 0.4609990 Group 1 FALSE   3
# 5   theta[11]^{(epsilon)}     pc  <->     pc  5.088 0.6131598 Group 1 FALSE   5
# 10  theta[22]^{(epsilon)}     pa  <->     pa  5.787 0.6790905 Group 1 FALSE   6
# 15  theta[33]^{(epsilon)}     oa  <->     oa  5.150 0.5709541 Group 1 FALSE   7
# 20  theta[44]^{(epsilon)}     ma  <->     ma  7.311 0.7874800 Group 1 FALSE   8
# 21                psi[11] perfIQ  <-> perfIQ  3.210 1.0000000 Group 1 FALSE   4
# 22           tau[1]^{(y)}         int     pc 10.500        NA Group 1 FALSE   9
# 23           tau[2]^{(y)}         int     pa 10.374        NA Group 1 FALSE  10
# 24           tau[3]^{(y)}         int     oa 10.663        NA Group 1 FALSE  11
# 25           tau[4]^{(y)}         int     ma 10.371        NA Group 1 FALSE  12
# 11       lambda[11]^{(y)} perfIQ   ->     pc  1.000 0.6515609 Group 2  TRUE   0
# 27       lambda[21]^{(y)} perfIQ   ->     pa  0.923 0.5876948 Group 2 FALSE   1
# 31       lambda[31]^{(y)} perfIQ   ->     oa  1.098 0.6981974 Group 2 FALSE   2
# 41       lambda[41]^{(y)} perfIQ   ->     ma  0.784 0.4621919 Group 2 FALSE   3
# 51  theta[11]^{(epsilon)}     pc  <->     pc  5.006 0.5754684 Group 2 FALSE  14
# 101 theta[22]^{(epsilon)}     pa  <->     pa  5.963 0.6546148 Group 2 FALSE  15
# 151 theta[33]^{(epsilon)}     oa  <->     oa  4.681 0.5125204 Group 2 FALSE  16
# 201 theta[44]^{(epsilon)}     ma  <->     ma  8.356 0.7863786 Group 2 FALSE  17
# 211               psi[11] perfIQ  <-> perfIQ  3.693 1.0000000 Group 2 FALSE  13
# 221          tau[1]^{(y)}         int     pc 10.500        NA Group 2 FALSE   9
# 231          tau[2]^{(y)}         int     pa 10.374        NA Group 2 FALSE  10
# 241          tau[3]^{(y)}         int     oa 10.663        NA Group 2 FALSE  11
# 251          tau[4]^{(y)}         int     ma 10.371        NA Group 2 FALSE  12
# 26               alpha[1]         int perfIQ -2.469        NA Group 2 FALSE  18

如您所见，边缘标签比绘制的边缘标签多，我不知道它如何选择使用哪个，所以我只从每组中取出前四个（因为显示了四个边缘并且 std 匹配那些。也许有一个选项可以绘制所有这些或者 select 你需要哪些——我还没有读过文档。

## take first four stds from each group, generate some p-values
l <- sapply(split(x@Pars$std, x@Pars$group), function(x) head(x, 4))
set.seed(1)
l <- sprintf('%.3f, p=%s', l, format.pval(runif(length(l)), digits = 2))
l
# [1] "0.622, p=0.27" "0.566, p=0.37" "0.655, p=0.57" "0.461, p=0.91" "0.652, p=0.20" "0.588, p=0.90" "0.698, p=0.94" "0.462, p=0.66"

然后你可以用你的新标签绘制对象，edgeLabels = l

layout(1:2)
semPaths(
  x,
  edgeLabels = l,
  ask = FALSE, title = FALSE,
  what = 'std',
  whatLabels = 'std',
  style = 'ram',
  edge.label.cex = 1.3,
  layout = 'tree',
  intercepts = FALSE,
  residuals = FALSE,
  sizeMan = 7 
)

Answer 2

在@rawr 的帮助下，我已经解决了。如果其他人需要在他们的 semPath 中包含来自 Lavaan 的估计值和 p 值，请按以下方法完成。

#extracting the parameters from the sem model and selecting the interactions relevant for the semPaths (here, I need 12 estimates and p-values)
table2<-parameterEstimates(fitmod.bac.class2,standardized=TRUE)  %>%  head(12)

#turning the chosen parameters into text
b<-gettextf('%.3f \n p=%.3f', table2$std.all, digits=table2$pvalue)

老实说，我不明白最后一段脚本是如何工作的。这是从 rawr 的答案中复制过来的，然后经过大量的反复试验，直到它起作用。可能（很可能）有更好的写法，但它确实有效 :)

#putting that list into edgeLabels in sempaths
semPaths(fitmod.bac.class2,
         what = "std",
         edgeLabels = b,
         style="ram",
         edge.label.cex = 1,
         layout = 'tree',
         intercepts=FALSE,
         residuals=FALSE,
         nodeLabels = c("Negati-\nvicutes","cand_class\n_MB_A2_108", "CO2", "Bacilli","Ignavi-\nbacteria","C/N", "pH","Water\ncontent"),
         sizeMan=7
)

enter image description here

Answer 3

只是一个小但相关的细节，用于改进上述答案。上面的代码需要检查参数 table 以计算要维护的行数，以在 %>%head(4) 中指定。

我们可以从提取的参数中排除 table 那些 lhs 和 rhs 不相等的行。

#extracting the parameters from the sem model and selecting the interactions relevant for the semPaths 
table2<-parameterEstimates(fitmod.bac.class2,standardized=TRUE)%>%as.dataframe()
table2<-table2[!table2$lhs==table2$rhs,]

如果公式还包含带有“:=”的额外行，这些行也将包含参数 table，应将其删除。

其余保持不变...

#turning the chosen parameters into text
b<-gettextf('%.3f \n p=%.3f', table2$std.all, digits=table2$pvalue)
#putting that list into edgeLabels in sempaths
semPaths(fitmod.bac.class2,
         what = "std",
         edgeLabels = b,
         style="ram",
         edge.label.cex = 1,
         layout = 'tree',
         intercepts=FALSE,
         residuals=FALSE,
         nodeLabels = c("Negati-\nvicutes","cand_class\n_MB_A2_108", "CO2", "Bacilli","Ignavi-\nbacteria","C/N", "pH","Water\ncontent"),
         sizeMan=7
)

如何在 semPaths 的估计中包含 p 值和 R 方？

How do I include p-value and R-square for the estimates in semPaths?

r

significance

r-lavaan

p-value

semplot