Kruskal-Wallis test in R gives an error: Error in model.frame.default: variable lengths differ


I am trying to run a Kruskal-Wallis test on multiple columns of my example data frame (df) in R, but I get the following error:

 Error in model.frame.default(formula = as.numeric(x) ~ as.factor(Groups),  : 
  variable lengths differ (found for 'as.factor(Groups)') 

Here is my example data frame (df):

Groups  Gene1   Gene2   Gene3   Gene4   Gene5   Gene6   Gene7   Gene8   Gene9   Gene10
Group1  120.67  69.33   1.24    2.31    0.39    6.57    2.49    383.84  415.23  NA
Group1  157 110.67  0.4 0.84    0.28    2.62    2.11    245.42  325.23  NA
Group1  113.5   66.75   1.07    4.53    0.33    2.37    2.35    421.25  352.03  73.51
Group1  131 79.67   1.13    5.03    0.72    3.36    2.24    305.32  432.81  71.11
Group1  120 79.67   0.91    3.84    0.74    3.77    1.92    298.91  382.43  66.49
Group2  125.67  83.67   2.07    1.73    0.38    3.89    2.09    233.81  377.21  72.1
Group2  103.33  68.67   1.01    4.89    0.3 4.5 1.75    231.5   381.73  53
Group2  121.33  74.67   0.54    2.39    3.95    3.7 2.46    310.66  355.97  143.61
Group2  136 83.67   1.6 1.75    0.32    5.17    2.36    410.21  389.62  170.34
Group2  143.67  71.33   0.56    1.22    0.26    4.48    2.62    294.01  491.57  96.72
Group2  134.67  69.67   0.85    1.77    0.45    3.58    2.44    236.61  441.32  69.06
Group2  158.33  98.33   0.87    3.69    0.51    2.53    2.6 257.66  396.96  41.94
Group2  147.33  88.33   NA  NA  NA  NA  NA  NA  NA  NA
Group2  95.67   59  1.39    0.56    0.31    2.49    2.09    395.38  420.28  64.83
Group3  135 82  13.31   24.05   1.21    3.83    2.83    313.71  327.84  66.8
Group3  124.67  78  1.12    2   0.71    3.77    2.42    334.36  358.9   131.35
Group3  152 98.33   1.11    1.54    0.35    2.11    2.21    297.68  433.48  117.18
Group3  135.33  73.67   0.13    2.99    0.3 2.4 1.86    296.82  415.13  112.97
Group3  135.33  87  0.91    3.73    0.65    2.92    1.85    335.31  412.16  103.18
Group4  124.67  77.67   0.28    0.81    0.49    2.62    1.96    251.49  468.19  80.27
Group4  125.67  72.33   1.01    1.82    0.35    3.65    1.62    335.18  264.74  145.15
Group4  169 105 0.6 3.12    0.29    3.9 2.22    311.01  459.85  82.89
Group4  123.67  76.33   0.65    1.78    0.47    2.77    1.57    253.56  283.38  59.07
Group5  132.67  76.33   2.94    17.01   0.27    3.99    2.55    354.78  493.02  145.36
Group5  NA  NA  1.34    1.42    0.4 4.21    2.02    243.26  345.2   43.91
Group5  144.33  75  NA  NA  0.55    3.26    2.85    312.16  419.86  55.71
Group5  136.25  78.25   NA  1.32    0.65    3.63    1.52    267.13  256.18  53.49
Group5  123.67  69.33   1.81    1.52    0.67    3.89    2   303.89  346.57  112.16
Group5  116.67  66.33   0.7 1.68    0.27    3.55    2.16    284.96  407.04  102.97
Group5  136.67  76  2.68    4.3 0.33    7.36    2.26    237.28  423.29  88.65
Group6  122 63.33   0.87    4.2 0.17    3.92    2.11    159.04  300.24  60.13
Group6  130.67  82.67   0.8 1.85    1   5.26    2.46    388.61  558.51  66.76
Group6  136.33  70.33   0.54    2.26    0.35    NA  NA  388.81  551.69  113.39
Group6  127.33  73  1.32    2.19    0.99    4.42    2.59    378.57  501.12  85.56
Group7  186.67  89.67   0.79    1.77    0.53    5.22    2.73    269.87  490.25  77.74
Group7  203 93  5.63    22.08   0.82    6.97    2.92    341.87  611.33  92.7
Group7  127 72.67   0.55    1.07    0.38    3.2 1.69    310.9   410.19  65.62
Group7  142 79.67   1.61    1.35    3.24    3.73    2.08    304.52  495.79  60.15

Here is my code:

   kw.tests <- lapply(
         data[, -1],
         function(x) { kruskal.test(as.numeric(x) ~ as.factor(Groups), data = data_test, na.action=na.omit) }
   )

     Error in model.frame.default(formula = as.numeric(x) ~ as.factor(Groups),  : 
      variable lengths differ (found for 'as.factor(Groups)') 
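The message comes from model.frame(), which the formula interface of kruskal.test() calls and which requires every variable in the formula to have the same length. In the lapply() call, x is not a column of the data argument, so it is looked up in the surrounding function instead, and nothing forces it to have as many values as Groups. A minimal reproduction on hypothetical toy data (toy and x are made-up names):

```r
# Hypothetical toy data: 6 rows in 3 groups
toy <- data.frame(Groups = factor(c("A", "A", "B", "B", "C", "C")))

# `x` is NOT a column of `toy`, so model.frame() looks it up in the
# formula's environment; here it has 5 values instead of 6
x <- c(1.2, 3.4, 2.2, 5.1, 4.0)

# Capture the error message instead of stopping
msg <- tryCatch(
  kruskal.test(x ~ Groups, data = toy),
  error = function(e) conditionMessage(e)
)
print(msg)
```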

When I run the test on each gene individually, it works perfectly, for example for Gene1:

kruskal.test(Gene1 ~ as.factor(Groups), data = data_test, na.action=na.omit)

    Kruskal-Wallis rank sum test

data:  Gene1 by as.factor(Groups)
Kruskal-Wallis chi-squared = 5.6607, df = 6, p-value = 0.4622

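However, when I use lapply, or even a for loop, I get this error. I have searched for this error many times, but none of the answers I found helped, for the following reasons: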

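  1. I read that this might be caused by the NAs in the file. However, I cannot avoid the NAs, because my real data frame is much larger than this. Moreover, even with the NAs, the test runs perfectly for each gene without lapply or a loop.
  2. The 'Groups' variable has the same length as all the other variables, so that is not the problem either.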
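Point 1 is right that the NAs are not the cause. As far as the stats package goes, both interfaces of kruskal.test() already discard incomplete observations: the formula method through na.action, and the default (x, g) method by keeping only complete (x, g) pairs before ranking. A small check on hypothetical toy data (toy, Gene, and the values are made up):

```r
# Hypothetical toy data with two missing values in Gene
toy <- data.frame(
  Groups = factor(rep(c("A", "B", "C"), each = 3)),
  Gene   = c(1.1, NA, 2.3, 3.5, 4.2, NA, 6.0, 5.5, 7.1)
)

# Formula interface: na.omit drops the incomplete rows before testing
res_formula <- kruskal.test(Gene ~ Groups, data = toy, na.action = na.omit)

# Default (x, g) interface: incomplete pairs are dropped internally
res_default <- kruskal.test(toy$Gene, toy$Groups)

# Same test on explicitly complete data, for comparison
res_complete <- kruskal.test(Gene ~ Groups, data = na.omit(toy))

# All three statistics should be identical
print(c(res_formula$statistic, res_default$statistic, res_complete$statistic))
```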

Here is a snippet of my data:

> dput(data_test)
structure(list(Groups = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L), .Label = c("Group1", 
"Group2", "Group3", "Group4", "Group5", "Group6", "Group7"), class = "factor"), 
    Gene1 = c(120.67, 157, 113.5, 131, 120, 125.67, 103.33, 121.33, 
    136, 143.67, 134.67, 158.33, 147.33, 95.67, 135, 124.67, 
    152, 135.33, 135.33, 124.67, 125.67, 169, 123.67, 132.67, 
    NA, 144.33, 136.25, 123.67, 116.67, 136.67, 122, 130.67, 
    136.33, 127.33, 186.67, 203, 127, 142), Gene2 = c(69.33, 
    110.67, 66.75, 79.67, 79.67, 83.67, 68.67, 74.67, 83.67, 
    71.33, 69.67, 98.33, 88.33, 59, 82, 78, 98.33, 73.67, 87, 
    77.67, 72.33, 105, 76.33, 76.33, NA, 75, 78.25, 69.33, 66.33, 
    76, 63.33, 82.67, 70.33, 73, 89.67, 93, 72.67, 79.67), Gene3 = c(1.24, 
    0.4, 1.07, 1.13, 0.91, 2.07, 1.01, 0.54, 1.6, 0.56, 0.85, 
    0.87, NA, 1.39, 13.31, 1.12, 1.11, 0.13, 0.91, 0.28, 1.01, 
    0.6, 0.65, 2.94, 1.34, NA, NA, 1.81, 0.7, 2.68, 0.87, 0.8, 
    0.54, 1.32, 0.79, 5.63, 0.55, 1.61), Gene4 = c(2.31, 0.84, 
    4.53, 5.03, 3.84, 1.73, 4.89, 2.39, 1.75, 1.22, 1.77, 3.69, 
    NA, 0.56, 24.05, 2, 1.54, 2.99, 3.73, 0.81, 1.82, 3.12, 1.78, 
    17.01, 1.42, NA, 1.32, 1.52, 1.68, 4.3, 4.2, 1.85, 2.26, 
    2.19, 1.77, 22.08, 1.07, 1.35), Gene5 = c(0.39, 0.28, 0.33, 
    0.72, 0.74, 0.38, 0.3, 3.95, 0.32, 0.26, 0.45, 0.51, NA, 
    0.31, 1.21, 0.71, 0.35, 0.3, 0.65, 0.49, 0.35, 0.29, 0.47, 
    0.27, 0.4, 0.55, 0.65, 0.67, 0.27, 0.33, 0.17, 1, 0.35, 0.99, 
    0.53, 0.82, 0.38, 3.24), Gene6 = c(6.57, 2.62, 2.37, 3.36, 
    3.77, 3.89, 4.5, 3.7, 5.17, 4.48, 3.58, 2.53, NA, 2.49, 3.83, 
    3.77, 2.11, 2.4, 2.92, 2.62, 3.65, 3.9, 2.77, 3.99, 4.21, 
    3.26, 3.63, 3.89, 3.55, 7.36, 3.92, 5.26, NA, 4.42, 5.22, 
    6.97, 3.2, 3.73), Gene7 = c(2.49, 2.11, 2.35, 2.24, 1.92, 
    2.09, 1.75, 2.46, 2.36, 2.62, 2.44, 2.6, NA, 2.09, 2.83, 
    2.42, 2.21, 1.86, 1.85, 1.96, 1.62, 2.22, 1.57, 2.55, 2.02, 
    2.85, 1.52, 2, 2.16, 2.26, 2.11, 2.46, NA, 2.59, 2.73, 2.92, 
    1.69, 2.08), Gene8 = c(383.84, 245.42, 421.25, 305.32, 298.91, 
    233.81, 231.5, 310.66, 410.21, 294.01, 236.61, 257.66, NA, 
    395.38, 313.71, 334.36, 297.68, 296.82, 335.31, 251.49, 335.18, 
    311.01, 253.56, 354.78, 243.26, 312.16, 267.13, 303.89, 284.96, 
    237.28, 159.04, 388.61, 388.81, 378.57, 269.87, 341.87, 310.9, 
    304.52), Gene9 = c(415.23, 325.23, 352.03, 432.81, 382.43, 
    377.21, 381.73, 355.97, 389.62, 491.57, 441.32, 396.96, NA, 
    420.28, 327.84, 358.9, 433.48, 415.13, 412.16, 468.19, 264.74, 
    459.85, 283.38, 493.02, 345.2, 419.86, 256.18, 346.57, 407.04, 
    423.29, 300.24, 558.51, 551.69, 501.12, 490.25, 611.33, 410.19, 
    495.79), Gene10 = c(NA, NA, 73.51, 71.11, 66.49, 72.1, 53, 
    143.61, 170.34, 96.72, 69.06, 41.94, NA, 64.83, 66.8, 131.35, 
    117.18, 112.97, 103.18, 80.27, 145.15, 82.89, 59.07, 145.36, 
    43.91, 55.71, 53.49, 112.16, 102.97, 88.65, 60.13, 66.76, 
    113.39, 85.56, 77.74, 92.7, 65.62, 60.15)), class = "data.frame", row.names = c(NA, 
-38L))

Any further help is appreciated. Thanks.

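You are using the wrong data set name in the lapply / apply call. data[, -1] loops over the columns of data, while the formula takes Groups from data_test; if those two data frames have different numbers of rows, x and Groups end up with different lengths, which is exactly what model.frame reports. Using data_test in both places: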

apply(data_test[,-1],2,function(x){kruskal.test(as.numeric(x)~as.factor(data_test$Groups))})

works for me.
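Applied to the original lapply() call, the fix amounts to iterating over data_test[, -1] instead of data[, -1], so that x and Groups come from the same data frame. The sketch below shows this on a small hypothetical stand-in (toy), plus an alternative that builds a "GeneN ~ Groups" formula per column name with reformulate(), so that both sides are looked up in the data argument, as in the working single-gene call. The names toy, gene_cols, kw.tests2, and p_values are illustrative.

```r
# Hypothetical toy data standing in for data_test
toy <- data.frame(
  Groups = factor(rep(c("Group1", "Group2", "Group3"), each = 4)),
  Gene1  = c(120.7, 157.0, 113.5, 131.0, 125.7, 103.3, 121.3, 136.0,
             135.0, 124.7, 152.0, 135.3),
  Gene2  = c(69.3, 110.7, NA, 79.7, 83.7, 68.7, 74.7, 83.7,
             82.0, 78.0, 98.3, 73.7)
)

# Fix 1: iterate over columns of the SAME data frame that supplies Groups
kw.tests <- lapply(
  toy[, -1],
  function(x) kruskal.test(as.numeric(x) ~ as.factor(Groups), data = toy,
                           na.action = na.omit)
)

# Fix 2: build e.g. Gene1 ~ Groups from the column name,
# so both variables are resolved inside `toy`
gene_cols <- setdiff(names(toy), "Groups")
kw.tests2 <- lapply(gene_cols, function(gene) {
  kruskal.test(reformulate("Groups", response = gene), data = toy,
               na.action = na.omit)
})
names(kw.tests2) <- gene_cols

# Collect the p-values into one named vector (names come from the columns)
p_values <- sapply(kw.tests, function(res) res$p.value)
print(p_values)
```

The reformulate() variant has the advantage that the printed "data:" line of each result names the gene (e.g. "Gene1 by Groups") rather than "as.numeric(x) by as.factor(Groups)".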