Kruskal-Wallis test in R gives an error: Error in model.frame.default: variable lengths differ
I am trying to run a Kruskal-Wallis test on multiple columns of my example data frame (df) in R, but I get the following error:
Error in model.frame.default(formula = as.numeric(x) ~ as.factor(Groups), :
variable lengths differ (found for 'as.factor(Groups)')
Here is my example data frame (df):
Groups Gene1 Gene2 Gene3 Gene4 Gene5 Gene6 Gene7 Gene8 Gene9 Gene10
Group1 120.67 69.33 1.24 2.31 0.39 6.57 2.49 383.84 415.23 NA
Group1 157 110.67 0.4 0.84 0.28 2.62 2.11 245.42 325.23 NA
Group1 113.5 66.75 1.07 4.53 0.33 2.37 2.35 421.25 352.03 73.51
Group1 131 79.67 1.13 5.03 0.72 3.36 2.24 305.32 432.81 71.11
Group1 120 79.67 0.91 3.84 0.74 3.77 1.92 298.91 382.43 66.49
Group2 125.67 83.67 2.07 1.73 0.38 3.89 2.09 233.81 377.21 72.1
Group2 103.33 68.67 1.01 4.89 0.3 4.5 1.75 231.5 381.73 53
Group2 121.33 74.67 0.54 2.39 3.95 3.7 2.46 310.66 355.97 143.61
Group2 136 83.67 1.6 1.75 0.32 5.17 2.36 410.21 389.62 170.34
Group2 143.67 71.33 0.56 1.22 0.26 4.48 2.62 294.01 491.57 96.72
Group2 134.67 69.67 0.85 1.77 0.45 3.58 2.44 236.61 441.32 69.06
Group2 158.33 98.33 0.87 3.69 0.51 2.53 2.6 257.66 396.96 41.94
Group2 147.33 88.33 NA NA NA NA NA NA NA NA
Group2 95.67 59 1.39 0.56 0.31 2.49 2.09 395.38 420.28 64.83
Group3 135 82 13.31 24.05 1.21 3.83 2.83 313.71 327.84 66.8
Group3 124.67 78 1.12 2 0.71 3.77 2.42 334.36 358.9 131.35
Group3 152 98.33 1.11 1.54 0.35 2.11 2.21 297.68 433.48 117.18
Group3 135.33 73.67 0.13 2.99 0.3 2.4 1.86 296.82 415.13 112.97
Group3 135.33 87 0.91 3.73 0.65 2.92 1.85 335.31 412.16 103.18
Group4 124.67 77.67 0.28 0.81 0.49 2.62 1.96 251.49 468.19 80.27
Group4 125.67 72.33 1.01 1.82 0.35 3.65 1.62 335.18 264.74 145.15
Group4 169 105 0.6 3.12 0.29 3.9 2.22 311.01 459.85 82.89
Group4 123.67 76.33 0.65 1.78 0.47 2.77 1.57 253.56 283.38 59.07
Group5 132.67 76.33 2.94 17.01 0.27 3.99 2.55 354.78 493.02 145.36
Group5 NA NA 1.34 1.42 0.4 4.21 2.02 243.26 345.2 43.91
Group5 144.33 75 NA NA 0.55 3.26 2.85 312.16 419.86 55.71
Group5 136.25 78.25 NA 1.32 0.65 3.63 1.52 267.13 256.18 53.49
Group5 123.67 69.33 1.81 1.52 0.67 3.89 2 303.89 346.57 112.16
Group5 116.67 66.33 0.7 1.68 0.27 3.55 2.16 284.96 407.04 102.97
Group5 136.67 76 2.68 4.3 0.33 7.36 2.26 237.28 423.29 88.65
Group6 122 63.33 0.87 4.2 0.17 3.92 2.11 159.04 300.24 60.13
Group6 130.67 82.67 0.8 1.85 1 5.26 2.46 388.61 558.51 66.76
Group6 136.33 70.33 0.54 2.26 0.35 NA NA 388.81 551.69 113.39
Group6 127.33 73 1.32 2.19 0.99 4.42 2.59 378.57 501.12 85.56
Group7 186.67 89.67 0.79 1.77 0.53 5.22 2.73 269.87 490.25 77.74
Group7 203 93 5.63 22.08 0.82 6.97 2.92 341.87 611.33 92.7
Group7 127 72.67 0.55 1.07 0.38 3.2 1.69 310.9 410.19 65.62
Group7 142 79.67 1.61 1.35 3.24 3.73 2.08 304.52 495.79 60.15
Here is my code:
kw.tests <- lapply(
data[, -1],
function(x) { kruskal.test(as.numeric(x) ~ as.factor(Groups), data = data_test, na.action=na.omit) }
)
Error in model.frame.default(formula = as.numeric(x) ~ as.factor(Groups), :
variable lengths differ (found for 'as.factor(Groups)')
The test runs perfectly when I run it on each gene individually, e.g. for Gene1:
kruskal.test(Gene1 ~ as.factor(Groups), data = data_test, na.action=na.omit)
Kruskal-Wallis rank sum test
data: Gene1 by as.factor(Groups)
Kruskal-Wallis chi-squared = 5.6607, df = 6, p-value = 0.4622
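One way to generalize the working single-gene call is to build each formula from the column name with stats::reformulate(), so the formula only refers to columns of the data frame passed as data, and each result's "data:" line names the gene instead of "as.numeric(x)". This is a minimal sketch run on a 12-row subset of the posted data (Groups 1 to 3, Gene1 and Gene10); data_sub is a hypothetical name for that subset, and with the full data you would pass data_test instead.

```r
# Hypothetical 12-row subset of data_test (first 4 rows of Group1-Group3)
data_sub <- data.frame(
  Groups = factor(rep(c("Group1", "Group2", "Group3"), each = 4)),
  Gene1  = c(120.67, 157, 113.5, 131,
             125.67, 103.33, 121.33, 136,
             135, 124.67, 152, 135.33),
  Gene10 = c(NA, NA, 73.51, 71.11,
             72.1, 53, 143.61, 170.34,
             66.8, 131.35, 117.18, 112.97)
)

# Every column except the grouping column is treated as a gene
genes <- setdiff(names(data_sub), "Groups")

kw_by_name <- lapply(genes, function(g) {
  # e.g. g = "Gene1" builds the formula Gene1 ~ Groups
  f <- reformulate("Groups", response = g)
  # All variables now come from the same data frame, so lengths always match
  kruskal.test(f, data = data_sub, na.action = na.omit)
})
names(kw_by_name) <- genes

print(kw_by_name$Gene1)
```

Because the formula names a real column, the printed result reads "data: Gene1 by Groups", which makes the list easier to inspect than a list of tests all labelled "as.numeric(x) by as.factor(Groups)".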
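However, when I use lapply or even a for loop, I get this error. I have searched for this error many times, but none of the answers I found helped:
- I read that it can be caused by NAs in the file. However, I cannot avoid the NAs, because my real data frame is much larger than this one. Also, even with the NAs, the test runs perfectly for each gene when I don't use lapply or a loop.
- The length of the 'Groups' variable is the same as that of all the other variables, so that is not the problem either.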
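A small sketch that separates the two suspects. It uses a hypothetical 12-row subset of the posted data (data_sub) and a hypothetical vector x_other that stands in for a column taken from some other, larger object. NAs are simply dropped by na.omit, whereas a response vector whose length differs from the number of rows in data reproduces the error, because model.frame() checks that all variables have the same length before na.action is applied.

```r
# Hypothetical 12-row subset of data_test (first 4 rows of Group1-Group3)
data_sub <- data.frame(
  Groups = factor(rep(c("Group1", "Group2", "Group3"), each = 4)),
  Gene1  = c(120.67, 157, 113.5, 131,
             125.67, 103.33, 121.33, 136,
             135, 124.67, 152, 135.33),
  Gene10 = c(NA, NA, 73.51, 71.11,
             72.1, 53, 143.61, 170.34,
             66.8, 131.35, 117.18, 112.97)
)

# NAs alone are fine: na.omit drops the two incomplete Group1 rows
ok <- kruskal.test(Gene10 ~ Groups, data = data_sub, na.action = na.omit)
print(ok)

# Hypothetical vector from a different object: 14 values instead of 12
x_other <- c(data_sub$Gene1, 140, 150)

# Same formula shape as in the question; model.frame() cannot line up
# 14 response values with 12 group labels, so it stops with an error
err <- tryCatch(
  kruskal.test(as.numeric(x_other) ~ as.factor(Groups),
               data = data_sub, na.action = na.omit),
  error = function(e) conditionMessage(e)
)
print(err)
```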
Here is a snippet of my data:
> dput(data_test)
structure(list(Groups = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L), .Label = c("Group1",
"Group2", "Group3", "Group4", "Group5", "Group6", "Group7"), class = "factor"),
Gene1 = c(120.67, 157, 113.5, 131, 120, 125.67, 103.33, 121.33,
136, 143.67, 134.67, 158.33, 147.33, 95.67, 135, 124.67,
152, 135.33, 135.33, 124.67, 125.67, 169, 123.67, 132.67,
NA, 144.33, 136.25, 123.67, 116.67, 136.67, 122, 130.67,
136.33, 127.33, 186.67, 203, 127, 142), Gene2 = c(69.33,
110.67, 66.75, 79.67, 79.67, 83.67, 68.67, 74.67, 83.67,
71.33, 69.67, 98.33, 88.33, 59, 82, 78, 98.33, 73.67, 87,
77.67, 72.33, 105, 76.33, 76.33, NA, 75, 78.25, 69.33, 66.33,
76, 63.33, 82.67, 70.33, 73, 89.67, 93, 72.67, 79.67), Gene3 = c(1.24,
0.4, 1.07, 1.13, 0.91, 2.07, 1.01, 0.54, 1.6, 0.56, 0.85,
0.87, NA, 1.39, 13.31, 1.12, 1.11, 0.13, 0.91, 0.28, 1.01,
0.6, 0.65, 2.94, 1.34, NA, NA, 1.81, 0.7, 2.68, 0.87, 0.8,
0.54, 1.32, 0.79, 5.63, 0.55, 1.61), Gene4 = c(2.31, 0.84,
4.53, 5.03, 3.84, 1.73, 4.89, 2.39, 1.75, 1.22, 1.77, 3.69,
NA, 0.56, 24.05, 2, 1.54, 2.99, 3.73, 0.81, 1.82, 3.12, 1.78,
17.01, 1.42, NA, 1.32, 1.52, 1.68, 4.3, 4.2, 1.85, 2.26,
2.19, 1.77, 22.08, 1.07, 1.35), Gene5 = c(0.39, 0.28, 0.33,
0.72, 0.74, 0.38, 0.3, 3.95, 0.32, 0.26, 0.45, 0.51, NA,
0.31, 1.21, 0.71, 0.35, 0.3, 0.65, 0.49, 0.35, 0.29, 0.47,
0.27, 0.4, 0.55, 0.65, 0.67, 0.27, 0.33, 0.17, 1, 0.35, 0.99,
0.53, 0.82, 0.38, 3.24), Gene6 = c(6.57, 2.62, 2.37, 3.36,
3.77, 3.89, 4.5, 3.7, 5.17, 4.48, 3.58, 2.53, NA, 2.49, 3.83,
3.77, 2.11, 2.4, 2.92, 2.62, 3.65, 3.9, 2.77, 3.99, 4.21,
3.26, 3.63, 3.89, 3.55, 7.36, 3.92, 5.26, NA, 4.42, 5.22,
6.97, 3.2, 3.73), Gene7 = c(2.49, 2.11, 2.35, 2.24, 1.92,
2.09, 1.75, 2.46, 2.36, 2.62, 2.44, 2.6, NA, 2.09, 2.83,
2.42, 2.21, 1.86, 1.85, 1.96, 1.62, 2.22, 1.57, 2.55, 2.02,
2.85, 1.52, 2, 2.16, 2.26, 2.11, 2.46, NA, 2.59, 2.73, 2.92,
1.69, 2.08), Gene8 = c(383.84, 245.42, 421.25, 305.32, 298.91,
233.81, 231.5, 310.66, 410.21, 294.01, 236.61, 257.66, NA,
395.38, 313.71, 334.36, 297.68, 296.82, 335.31, 251.49, 335.18,
311.01, 253.56, 354.78, 243.26, 312.16, 267.13, 303.89, 284.96,
237.28, 159.04, 388.61, 388.81, 378.57, 269.87, 341.87, 310.9,
304.52), Gene9 = c(415.23, 325.23, 352.03, 432.81, 382.43,
377.21, 381.73, 355.97, 389.62, 491.57, 441.32, 396.96, NA,
420.28, 327.84, 358.9, 433.48, 415.13, 412.16, 468.19, 264.74,
459.85, 283.38, 493.02, 345.2, 419.86, 256.18, 346.57, 407.04,
423.29, 300.24, 558.51, 551.69, 501.12, 490.25, 611.33, 410.19,
495.79), Gene10 = c(NA, NA, 73.51, 71.11, 66.49, 72.1, 53,
143.61, 170.34, 96.72, 69.06, 41.94, NA, 64.83, 66.8, 131.35,
117.18, 112.97, 103.18, 80.27, 145.15, 82.89, 59.07, 145.36,
43.91, 55.71, 53.49, 112.16, 102.97, 88.65, 60.13, 66.76,
113.39, 85.56, 77.74, 92.7, 65.62, 60.15)), class = "data.frame", row.names = c(NA,
-38L))
Any further help is appreciated.
Thanks.
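You are using the wrong data set name in your lapply / apply call. The columns you loop over come from data[, -1], but the formula looks up Groups in data_test. Since data is a different object with a different number of rows, each x has a different length from data_test$Groups, which is exactly what model.frame.default reports. Loop over the same data frame the formula uses, for example: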
apply(data_test[,-1],2,function(x){kruskal.test(as.numeric(x)~as.factor(data_test$Groups))})
This works for me.
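A sketch of the question's original lapply call with only the data set name corrected, so that the columns being looped over and the data argument are the same data frame, followed by one way to collect each test's statistic, degrees of freedom and p-value into a table. It runs on a hypothetical 12-row subset of the posted data (data_sub); replace data_sub with data_test for the full set. The summary table is an illustration added here, not part of the original answer.

```r
# Hypothetical 12-row subset of data_test (first 4 rows of Group1-Group3)
data_sub <- data.frame(
  Groups = factor(rep(c("Group1", "Group2", "Group3"), each = 4)),
  Gene1  = c(120.67, 157, 113.5, 131,
             125.67, 103.33, 121.33, 136,
             135, 124.67, 152, 135.33),
  Gene10 = c(NA, NA, 73.51, 71.11,
             72.1, 53, 143.61, 170.34,
             66.8, 131.35, 117.18, 112.97)
)

# Loop over the columns of the SAME data frame passed as `data`,
# so x and Groups always have the same length
kw.tests <- lapply(
  data_sub[, -1],
  function(x) kruskal.test(as.numeric(x) ~ as.factor(Groups),
                           data = data_sub, na.action = na.omit)
)

# One row per gene: Kruskal-Wallis chi-squared, df and p-value
kw.summary <- data.frame(
  gene      = names(kw.tests),
  statistic = sapply(kw.tests, function(t) unname(t$statistic)),
  df        = sapply(kw.tests, function(t) unname(t$parameter)),
  p.value   = sapply(kw.tests, function(t) t$p.value),
  row.names = NULL
)
print(kw.summary)
```

Because lapply() over a data frame keeps the column names, names(kw.tests) lines up with the genes, and the sapply() calls keep the same order.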