t.test 基于 R 中两组不同的因素跨数据框
t.test across a dataframe based on two different group of factors in R
我有一个数据框,其中包含在 2 个位置记录的 11 种植物的变量。对于每个物种,我试图使用 t.test(或 wilcoxon 检验)比较两个不同位置之间变量的平均值。
这是我数据的前几行
SPECIES LOCATION X.COLONIZATION SPORE_DENSITY pH NO3 NH4 P Organic_C K Cu Mn Zn BD X.Sand
1 C. comosa Gauteng 90 387 5.40 8.24 1.35 1.10 0.95 94.40 3.36 84.40 4.72 1.45 68.0
2 C. comosa Gauteng 84 270 5.25 8.36 1.37 1.20 0.99 94.87 3.39 84.87 4.77 1.36 76.0
3 C. comosa Gauteng 96 404 5.55 8.19 1.32 1.11 0.94 94.01 3.35 84.01 4.68 1.54 78.0
4 C. comosa Mpumalanga 79 382 5.84 4.05 3.46 3.04 1.55 130.40 0.28 25.43 2.00 1.66 73.6
5 C. comosa Mpumalanga 82 383 5.49 4.45 3.48 3.09 1.53 131.36 0.27 25.35 2.12 1.45 76.5
6 C. comosa Mpumalanga 86 371 6.19 4.43 3.44 3.04 1.58 129.95 0.29 25.45 2.14 1.87 74.9
7 C. distans Gauteng 80 334 5.48 8.88 1.96 3.33 0.99 130.24 0.99 40.01 3.94 1.55 70.0
8 C. distans Gauteng 75 409 5.29 8.54 1.99 3.28 0.99 130.28 0.95 40.25 3.89 1.48 79.0
9 C. distans Gauteng 85 259 5.67 8.63 1.93 3.39 1.02 130.30 0.98 40.12 3.97 1.62 79.0
10 C. distans Mpumalanga 65 326 5.61 6.02 2.65 4.45 2.58 163.25 1.79 53.11 6.11 1.68 72.0
11 C. distans Mpumalanga 79 351 5.43 6.58 2.55 4.49 2.59 163.55 1.78 52.89 6.04 1.63 78.0
12 C. distans Mpumalanga 71 251 5.79 6.24 2.59 4.41 2.59 163.27 1.75 53.03 6.19 1.73 75.0
X.Silt X.Clay
1 12 9
2 16 13
3 14 14
4 9 10
5 11 16
6 13 16
7 8 11
8 12 15
9 10 16
10 8 10
11 15 14
12 16 12
例如,对于每个物种,我想比较(检验显着性差异)豪登省和普马兰加省的孢子密度平均值。有什么帮助吗?
我们按 'SPECIES' 分组,然后在数字列上使用 summarise
和 across
,子集列值是 'LOCATION' 是 'Gauteng' 或另一个,应用 t.test
并提取 pvalue
library(dplyr) #1.0.0
df1 %>%
group_by(SPECIES) %>%
summarise(across(where(is.numeric), ~
t.test(.[LOCATION == 'Gauteng'], .[LOCATION == 'Mpumalanga'])$p.value))
# A tibble: 2 x 16
# SPECIES X.COLONIZATION SPORE_DENSITY pH NO3 NH4 P Organic_C K Cu Mn Zn BD X.Sand X.Silt X.Clay
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 C. comosa 0.146 0.614 0.149 0.000269 7.27e-8 1.35e-5 0.00000970 2.15e-6 3.12e-7 1.35e-5 7.23e-6 0.219 0.779 0.140 0.474
#2 C. dista… 0.177 0.667 0.438 0.000624 1.94e-4 2.04e-5 0.00000670 4.48e-6 1.22e-6 1.90e-8 2.07e-5 0.0653 0.791 0.363 0.359
数据
df1 <- structure(list(SPECIES = c("C. comosa", "C. comosa", "C. comosa",
"C. comosa", "C. comosa", "C. comosa", "C. distans", "C. distans",
"C. distans", "C. distans", "C. distans", "C. distans"), LOCATION = c("Gauteng",
"Gauteng", "Gauteng", "Mpumalanga", "Mpumalanga", "Mpumalanga",
"Gauteng", "Gauteng", "Gauteng", "Mpumalanga", "Mpumalanga",
"Mpumalanga"), X.COLONIZATION = c(90L, 84L, 96L, 79L, 82L, 86L,
80L, 75L, 85L, 65L, 79L, 71L), SPORE_DENSITY = c(387L, 270L,
404L, 382L, 383L, 371L, 334L, 409L, 259L, 326L, 351L, 251L),
pH = c(5.4, 5.25, 5.55, 5.84, 5.49, 6.19, 5.48, 5.29, 5.67,
5.61, 5.43, 5.79), NO3 = c(8.24, 8.36, 8.19, 4.05, 4.45,
4.43, 8.88, 8.54, 8.63, 6.02, 6.58, 6.24), NH4 = c(1.35,
1.37, 1.32, 3.46, 3.48, 3.44, 1.96, 1.99, 1.93, 2.65, 2.55,
2.59), P = c(1.1, 1.2, 1.11, 3.04, 3.09, 3.04, 3.33, 3.28,
3.39, 4.45, 4.49, 4.41), Organic_C = c(0.95, 0.99, 0.94,
1.55, 1.53, 1.58, 0.99, 0.99, 1.02, 2.58, 2.59, 2.59), K = c(94.4,
94.87, 94.01, 130.4, 131.36, 129.95, 130.24, 130.28, 130.3,
163.25, 163.55, 163.27), Cu = c(3.36, 3.39, 3.35, 0.28, 0.27,
0.29, 0.99, 0.95, 0.98, 1.79, 1.78, 1.75), Mn = c(84.4, 84.87,
84.01, 25.43, 25.35, 25.45, 40.01, 40.25, 40.12, 53.11, 52.89,
53.03), Zn = c(4.72, 4.77, 4.68, 2, 2.12, 2.14, 3.94, 3.89,
3.97, 6.11, 6.04, 6.19), BD = c(1.45, 1.36, 1.54, 1.66, 1.45,
1.87, 1.55, 1.48, 1.62, 1.68, 1.63, 1.73), X.Sand = c(68,
76, 78, 73.6, 76.5, 74.9, 70, 79, 79, 72, 78, 75), X.Silt = c(12L,
16L, 14L, 9L, 11L, 13L, 8L, 12L, 10L, 8L, 15L, 16L), X.Clay = c(9L,
13L, 14L, 10L, 16L, 16L, 11L, 15L, 16L, 10L, 14L, 12L)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"))
我有一个数据框,其中包含在 2 个位置记录的 11 种植物的变量。对于每个物种,我试图使用 t.test(或 wilcoxon 检验)比较两个不同位置之间变量的平均值。
这是我数据的前几行
SPECIES LOCATION X.COLONIZATION SPORE_DENSITY pH NO3 NH4 P Organic_C K Cu Mn Zn BD X.Sand
1 C. comosa Gauteng 90 387 5.40 8.24 1.35 1.10 0.95 94.40 3.36 84.40 4.72 1.45 68.0
2 C. comosa Gauteng 84 270 5.25 8.36 1.37 1.20 0.99 94.87 3.39 84.87 4.77 1.36 76.0
3 C. comosa Gauteng 96 404 5.55 8.19 1.32 1.11 0.94 94.01 3.35 84.01 4.68 1.54 78.0
4 C. comosa Mpumalanga 79 382 5.84 4.05 3.46 3.04 1.55 130.40 0.28 25.43 2.00 1.66 73.6
5 C. comosa Mpumalanga 82 383 5.49 4.45 3.48 3.09 1.53 131.36 0.27 25.35 2.12 1.45 76.5
6 C. comosa Mpumalanga 86 371 6.19 4.43 3.44 3.04 1.58 129.95 0.29 25.45 2.14 1.87 74.9
7 C. distans Gauteng 80 334 5.48 8.88 1.96 3.33 0.99 130.24 0.99 40.01 3.94 1.55 70.0
8 C. distans Gauteng 75 409 5.29 8.54 1.99 3.28 0.99 130.28 0.95 40.25 3.89 1.48 79.0
9 C. distans Gauteng 85 259 5.67 8.63 1.93 3.39 1.02 130.30 0.98 40.12 3.97 1.62 79.0
10 C. distans Mpumalanga 65 326 5.61 6.02 2.65 4.45 2.58 163.25 1.79 53.11 6.11 1.68 72.0
11 C. distans Mpumalanga 79 351 5.43 6.58 2.55 4.49 2.59 163.55 1.78 52.89 6.04 1.63 78.0
12 C. distans Mpumalanga 71 251 5.79 6.24 2.59 4.41 2.59 163.27 1.75 53.03 6.19 1.73 75.0
X.Silt X.Clay
1 12 9
2 16 13
3 14 14
4 9 10
5 11 16
6 13 16
7 8 11
8 12 15
9 10 16
10 8 10
11 15 14
12 16 12
例如,对于每个物种,我想比较(检验显着性差异)豪登省和普马兰加省的孢子密度平均值。有什么帮助吗?
我们按 'SPECIES' 分组,然后在数字列上使用 summarise
和 across
,子集列值是 'LOCATION' 是 'Gauteng' 或另一个,应用 t.test
并提取 pvalue
library(dplyr) #1.0.0
df1 %>%
group_by(SPECIES) %>%
summarise(across(where(is.numeric), ~
t.test(.[LOCATION == 'Gauteng'], .[LOCATION == 'Mpumalanga'])$p.value))
# A tibble: 2 x 16
# SPECIES X.COLONIZATION SPORE_DENSITY pH NO3 NH4 P Organic_C K Cu Mn Zn BD X.Sand X.Silt X.Clay
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 C. comosa 0.146 0.614 0.149 0.000269 7.27e-8 1.35e-5 0.00000970 2.15e-6 3.12e-7 1.35e-5 7.23e-6 0.219 0.779 0.140 0.474
#2 C. dista… 0.177 0.667 0.438 0.000624 1.94e-4 2.04e-5 0.00000670 4.48e-6 1.22e-6 1.90e-8 2.07e-5 0.0653 0.791 0.363 0.359
数据
df1 <- structure(list(SPECIES = c("C. comosa", "C. comosa", "C. comosa",
"C. comosa", "C. comosa", "C. comosa", "C. distans", "C. distans",
"C. distans", "C. distans", "C. distans", "C. distans"), LOCATION = c("Gauteng",
"Gauteng", "Gauteng", "Mpumalanga", "Mpumalanga", "Mpumalanga",
"Gauteng", "Gauteng", "Gauteng", "Mpumalanga", "Mpumalanga",
"Mpumalanga"), X.COLONIZATION = c(90L, 84L, 96L, 79L, 82L, 86L,
80L, 75L, 85L, 65L, 79L, 71L), SPORE_DENSITY = c(387L, 270L,
404L, 382L, 383L, 371L, 334L, 409L, 259L, 326L, 351L, 251L),
pH = c(5.4, 5.25, 5.55, 5.84, 5.49, 6.19, 5.48, 5.29, 5.67,
5.61, 5.43, 5.79), NO3 = c(8.24, 8.36, 8.19, 4.05, 4.45,
4.43, 8.88, 8.54, 8.63, 6.02, 6.58, 6.24), NH4 = c(1.35,
1.37, 1.32, 3.46, 3.48, 3.44, 1.96, 1.99, 1.93, 2.65, 2.55,
2.59), P = c(1.1, 1.2, 1.11, 3.04, 3.09, 3.04, 3.33, 3.28,
3.39, 4.45, 4.49, 4.41), Organic_C = c(0.95, 0.99, 0.94,
1.55, 1.53, 1.58, 0.99, 0.99, 1.02, 2.58, 2.59, 2.59), K = c(94.4,
94.87, 94.01, 130.4, 131.36, 129.95, 130.24, 130.28, 130.3,
163.25, 163.55, 163.27), Cu = c(3.36, 3.39, 3.35, 0.28, 0.27,
0.29, 0.99, 0.95, 0.98, 1.79, 1.78, 1.75), Mn = c(84.4, 84.87,
84.01, 25.43, 25.35, 25.45, 40.01, 40.25, 40.12, 53.11, 52.89,
53.03), Zn = c(4.72, 4.77, 4.68, 2, 2.12, 2.14, 3.94, 3.89,
3.97, 6.11, 6.04, 6.19), BD = c(1.45, 1.36, 1.54, 1.66, 1.45,
1.87, 1.55, 1.48, 1.62, 1.68, 1.63, 1.73), X.Sand = c(68,
76, 78, 73.6, 76.5, 74.9, 70, 79, 79, 72, 78, 75), X.Silt = c(12L,
16L, 14L, 9L, 11L, 13L, 8L, 12L, 10L, 8L, 15L, 16L), X.Clay = c(9L,
13L, 14L, 10L, 16L, 16L, 11L, 15L, 16L, 10L, 14L, 12L)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"))