Expand.grid p-value matrix: fill equal variables with NA

Question

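I have to run a large number of chi-square/Fisher tests on the categorical data in my dataset. Given the number of categorical variables, I knew that doing this one pair at a time would take a long time, so I found a function on here and modified it to suit my needs.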

> HRchi
# A tibble: 6 x 13
  Position   State Sex   MaritalDesc CitizenDesc HispanicLatino RaceDesc TermReason  EmploymentStatus  Department ManagerName RecruitmentSour~
  <chr>      <chr> <chr> <chr>       <chr>       <chr>          <chr>    <chr>       <chr>             <chr>      <chr>       <chr>           
1 Productio~ MA    "M "  Single      US Citizen  No             White    N/A-StillE~ Active            "Producti~ Michael Al~ LinkedIn        
2 Sr. DBA    MA    "M "  Married     US Citizen  No             White    career cha~ Voluntarily Term~ "IT/IS"    Simon Roup  Indeed          
3 Productio~ MA    "F"   Married     US Citizen  No             White    hours       Voluntarily Term~ "Producti~ Kissy Sull~ LinkedIn        
4 Productio~ MA    "F"   Married     US Citizen  No             White    N/A-StillE~ Active            "Producti~ Elijiah Gr~ Indeed          
5 Productio~ MA    "F"   Divorced    US Citizen  No             White    return to ~ Voluntarily Term~ "Producti~ Webster Bu~ Google Search   
6 Productio~ MA    "F"   Single      US Citizen  No             White    N/A-StillE~ Active            "Producti~ Amy Dunn    LinkedIn        
# ... with 1 more variable: PerformanceScore <chr>
>

The function I use to run the tests is as follows:

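# All ordered pairs of column names (13 x 13 = 169 combinations)
col_combinations <- expand.grid(names(HRchi), names(HRchi))

# Fisher's exact test for one pair of columns; returns the simulated
# p-value formatted as a fixed-notation string
cor_test_wrapper <- function(col_name1, col_name2, data_frame) {
  format(fisher.test(data_frame[[col_name1]], data_frame[[col_name2]],
                     simulate.p.value = TRUE, B = 1e6)$p.value, scientific = FALSE)
}

# Run the test for every pair of columns
p_vals <- mapply(cor_test_wrapper,
                 col_name1 = col_combinations[[1]],
                 col_name2 = col_combinations[[2]],
                 MoreArgs = list(data_frame = HRchi))

# Arrange the p-values in a 13 x 13 matrix labelled by variable name
Ficher.pvalue.matrix <- matrix(p_vals, 13, 13, dimnames = list(names(HRchi), names(HRchi)))
Ficher.pvalue.matrix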

This returns a matrix of p-values:

   rowname Position State Sex   MaritalDesc CitizenDesc HispanicLatino RaceDesc TermReason EmploymentStatus Department ManagerName RecruitmentSour~
   <chr>   <chr>    <chr> <chr> <chr>       <chr>       <chr>          <chr>    <chr>      <chr>            <chr>      <chr>       <chr>           
 1 Positi~ 0.00000~ 0.00~ 0.31~ 0.8194522   0.6830553   0.03777396     0.16237~ 0.9216931  0.01563398       0.0000009~ 0.00000099~ 0.000002999997  
 2 State   0.00000~ 0.00~ 0.14~ 0.5327625   0.4954165   0.4240866      0.00748~ 0.980687   0.8377042        0.0000009~ 0.00000099~ 0.02947497      
 3 Sex     0.31226~ 0.14~ 0.00~ 0.6979593   0.6987973   0.8145132      0.94932~ 0.6053784  0.959038         0.2443258  0.06263294  0.1271179       
 4 Marita~ 0.81893~ 0.53~ 0.69~ 0.00000099~ 0.9265121   0.5331945      0.48005~ 0.0059059~ 0.008646991      0.7705712  0.8863871   0.2533087       
 5 Citize~ 0.68347~ 0.49~ 0.70~ 0.9270521   0.00000099~ 1              0.05806~ 0.1407349  0.2222708        0.4063666  0.8475872   0.1891118       
 6 Hispan~ 0.03778~ 0.42~ 0.81~ 0.5330425   1           0.000000999999 0.04130~ 0.8368642  1                0.05423295 0.1162419   0.06414394      
 7 RaceDe~ 0.16164~ 0.00~ 0.94~ 0.4804555   0.05764794  0.04088996     0.00000~ 0.972402   0.8328322        0.08990291 0.01743098  0.000000999999  
 8 TermRe~ 0.92143~ 0.98~ 0.60~ 0.005702994 0.1414139   0.8366842      0.97238  0.0000009~ 0.000000999999   0.2481378  0.7842482   0.0002929997    
 9 Employ~ 0.01571~ 0.83~ 0.95~ 0.008722991 0.2230458   1              0.83268~ 0.0000009~ 0.000000999999   0.0025569~ 0.001606998 0.000000999999  
10 Depart~ 0.00000~ 0.00~ 0.24~ 0.7694292   0.4063906   0.05454395     0.09036~ 0.2486848  0.002619997      0.0000009~ 0.00000099~ 0.000000999999  
11 Manage~ 0.00000~ 0.00~ 0.06~ 0.8851031   0.8472942   0.1168469      0.01726~ 0.7852542  0.001648998      0.0000009~ 0.00000099~ 0.000001999998  
12 Recrui~ 0.00000~ 0.02~ 0.12~ 0.2529637   0.1878758   0.06357094     0.00000~ 0.0003429~ 0.000002999997   0.0000009~ 0.00000099~ 0.000000999999  
13 Perfor~ 0.76044~ 0.56~ 0.47~ 0.9184571   0.7584852   1              0.15887~ 0.06789893 0.003164997      0.6032454  0.2900097   0.3136187       
# ... with 1 more variable: PerformanceScore <chr>

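What I would like to know is whether it is possible to set everything above the diagonal (Position = Position, State = State, and so on) to NA, so that the data frame is less cluttered.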

Answer 1

You can use the diag function to replace the values on the diagonal. For example:

# Create example matrix (correlation matrix of mtcars data)
myMatrix <- cor(mtcars)

# Replace diagonal with NA
diag(myMatrix) <- NA

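To replace the upper or lower triangle instead (use whichever one you need; applying both leaves only the diagonal):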

# Upper
myMatrix[upper.tri(myMatrix)] <- NA

# Lower
myMatrix[lower.tri(myMatrix)] <- NA
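
As a minimal sketch of how these calls behave, the toy 3 x 3 matrix below may make the difference easier to see. The names m, only_diag and upper_and_diag are made up for illustration. Note that upper.tri() leaves the diagonal out by default; passing diag = TRUE includes it, which may be closer to what the question asks for, since it also mentions the Position = Position cells.

```r
# Toy matrix (illustrative only), filled column by column with 1..9
m <- matrix(1:9, nrow = 3,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# Blank only the diagonal cells a-a, b-b, c-c
only_diag <- m
diag(only_diag) <- NA

# Blank the diagonal and everything above it in one step
upper_and_diag <- m
upper_and_diag[upper.tri(upper_and_diag, diag = TRUE)] <- NA

# Only the cells below the diagonal keep their values:
# row b / col a (2), row c / col a (3), row c / col b (6)
upper_and_diag
```

The same indexing works on a character matrix such as the one in the question; the blanked cells simply become character NA values.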

Answer 2

You can use upper.tri:

Ficher.pvalue.matrix[upper.tri(Ficher.pvalue.matrix)] <- NA
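
Since the matrix is symmetric apart from simulation noise, an alternative that might also save running time is to test each pair of variables only once and leave the remaining cells as NA from the start. The following is a rough sketch under stated assumptions, not the asker's method: toy is a small made-up data frame standing in for HRchi, p_mat is a hypothetical name, B is reduced to 2000 to keep the example quick, and the p-values are kept numeric rather than formatted as strings.

```r
# Toy stand-in for HRchi (made-up data, not the asker's dataset)
set.seed(1)
toy <- data.frame(
  Sex        = sample(c("F", "M"), 40, replace = TRUE),
  State      = sample(c("MA", "CT", "TX"), 40, replace = TRUE),
  Department = sample(c("IT/IS", "Production", "Sales"), 40, replace = TRUE)
)

vars  <- names(toy)
pairs <- combn(vars, 2)  # one column per unordered pair of distinct variables

# Start from an all-NA matrix so cells that are never tested stay NA
p_mat <- matrix(NA_real_, length(vars), length(vars),
                dimnames = list(vars, vars))

for (k in seq_len(ncol(pairs))) {
  a <- pairs[1, k]  # earlier variable, used as the column
  b <- pairs[2, k]  # later variable, used as the row (below the diagonal)
  p_mat[b, a] <- fisher.test(toy[[a]], toy[[b]],
                             simulate.p.value = TRUE, B = 2000)$p.value
}

p_mat
```

With 13 variables this would run 78 tests instead of 169, and the diagonal and upper triangle never need to be overwritten afterwards.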

Tags: r, mapply