Perform PCA on replicate treatments instead of parameters
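I have a data set in which column 1 contains the treatment names and the remaining columns contain the values for those treatments, with three replicates per treatment. To illustrate, I created a mock data set from the iris data set, as follows: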
df <- read.table(text = '"Treatment" "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
"treatment_a" 5.1 3.5 1.4 0.2
"treatment_a" 4.9 3 1.4 0.2
"treatment_a" 4.7 3.2 1.3 0.2
"treatment_b" 4.6 3.1 1.5 0.2
"treatment_b" 5 3.6 1.4 0.2
"treatment_b" 5.4 3.9 1.7 0.4
"treatment_c" 4.6 3.4 1.4 0.3
"treatment_c" 5 3.4 1.5 0.2
"treatment_c" 4.4 2.9 1.4 0.2
"treatment_d" 4.9 3.1 1.5 0.1
"treatment_d" 5.4 3.7 1.5 0.2
"treatment_d" 4.8 3.4 1.6 0.2
"treatment_e" 4.8 3 1.4 0.1
"treatment_e" 4.3 3 1.1 0.1
"treatment_e" 5.8 4 1.2 0.2
"treatment_f" 5.7 4.4 1.5 0.4
"treatment_f" 5.4 3.9 1.3 0.4
"treatment_f" 5.1 3.5 1.4 0.3
"treatment_g" 5.7 3.8 1.7 0.3
"treatment_g" 5.1 3.8 1.5 0.3
"treatment_g" 5.4 3.4 1.7 0.2
"treatment_h" 5.1 3.7 1.5 0.4
"treatment_h" 4.6 3.6 1 0.2
"treatment_h" 5.1 3.3 1.7 0.5', header = TRUE)
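I want to perform PCA on this data set in R so that the replicate treatments, rather than the variables, are plotted on the graph. The treatment names should also be labelled on the plot.
I looked for similar questions on Stack Overflow but did not find one that matches my problem.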
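Original answer
Are you looking to make a scatter plot with the first and second principal components on the x and y axes, respectively, and then label the points by treatment? If so, you can give this a try. I am using the ggplot2 package.
I also added a colour aesthetic to the plot. Feel free to remove that part if you don't want it.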
df <- read.table(text = '"Treatment" "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
"treatment_a" 5.1 3.5 1.4 0.2
"treatment_a" 4.9 3 1.4 0.2
"treatment_a" 4.7 3.2 1.3 0.2
"treatment_b" 4.6 3.1 1.5 0.2
"treatment_b" 5 3.6 1.4 0.2
"treatment_b" 5.4 3.9 1.7 0.4
"treatment_c" 4.6 3.4 1.4 0.3
"treatment_c" 5 3.4 1.5 0.2
"treatment_c" 4.4 2.9 1.4 0.2
"treatment_d" 4.9 3.1 1.5 0.1
"treatment_d" 5.4 3.7 1.5 0.2
"treatment_d" 4.8 3.4 1.6 0.2
"treatment_e" 4.8 3 1.4 0.1
"treatment_e" 4.3 3 1.1 0.1
"treatment_e" 5.8 4 1.2 0.2
"treatment_f" 5.7 4.4 1.5 0.4
"treatment_f" 5.4 3.9 1.3 0.4
"treatment_f" 5.1 3.5 1.4 0.3
"treatment_g" 5.7 3.8 1.7 0.3
"treatment_g" 5.1 3.8 1.5 0.3
"treatment_g" 5.4 3.4 1.7 0.2
"treatment_h" 5.1 3.7 1.5 0.4
"treatment_h" 4.6 3.6 1 0.2
"treatment_h" 5.1 3.3 1.7 0.5', header = TRUE)
# run principal components analysis, ignoring the first (Treatment) column
pr <- prcomp(df[, 2:5])
# run predict to get the principal component scores for each row
pr_pred <- predict(pr)
# put the first two components into a data frame so we can use ggplot
df2 <- data.frame(Treatment = df$Treatment,
                  pr_pred[, 1:2])
library(ggplot2)
# plot PC1 against PC2, using the treatment name as both label and colour
ggplot(data = df2, aes(x = PC1, y = PC2,
                       colour = Treatment,
                       label = Treatment)) +
  geom_text()
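As a possible refinement of the plot above, here is a minimal sketch that standardises the variables and prints the share of variance on each axis label. It assumes the mock data are the first 24 rows of the built-in iris data set, which is what the values above appear to be. The `scale. = TRUE` argument and the names `pr_scaled`, `var_explained` and `scores` are illustrative choices, not part of the original answer.

```r
library(ggplot2)

# Rebuild the mock data (assumption: it equals iris rows 1 to 24)
df <- data.frame(
  Treatment = rep(paste0("treatment_", letters[1:8]), each = 3),
  iris[1:24, 1:4]
)

# scale. = TRUE standardises each variable before PCA (optional choice)
pr_scaled <- prcomp(df[, 2:5], scale. = TRUE)

# proportion of total variance carried by each component
var_explained <- pr_scaled$sdev^2 / sum(pr_scaled$sdev^2)

# pr_scaled$x holds the same scores that predict(pr_scaled) returns
scores <- data.frame(Treatment = df$Treatment, pr_scaled$x[, 1:2])

p <- ggplot(scores, aes(x = PC1, y = PC2,
                        colour = Treatment,
                        label = Treatment)) +
  geom_text() +
  labs(x = sprintf("PC1 (%.1f%%)", 100 * var_explained[1]),
       y = sprintf("PC2 (%.1f%%)", 100 * var_explained[2]))
print(p)
```

Whether to scale depends on the data: if the variables share a unit and a comparable range, the unscaled call in the answer may be the better fit.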
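Added ellipses
To add these, we have to change the number of categories, since three points per treatment are too few for stat_ellipse to fit an ellipse. Let's go with three categories. Hopefully your actual data set has enough data to draw the ellipses you are looking for.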
df_mod <- read.table(text = '"Treatment" "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
"treatment_a" 5.1 3.5 1.4 0.2
"treatment_a" 4.9 3 1.4 0.2
"treatment_a" 4.7 3.2 1.3 0.2
"treatment_b" 4.6 3.1 1.5 0.2
"treatment_b" 5 3.6 1.4 0.2
"treatment_b" 5.4 3.9 1.7 0.4
"treatment_c" 4.6 3.4 1.4 0.3
"treatment_c" 5 3.4 1.5 0.2
"treatment_c" 4.4 2.9 1.4 0.2
"treatment_a" 4.9 3.1 1.5 0.1
"treatment_a" 5.4 3.7 1.5 0.2
"treatment_a" 4.8 3.4 1.6 0.2
"treatment_b" 4.8 3 1.4 0.1
"treatment_b" 4.3 3 1.1 0.1
"treatment_b" 5.8 4 1.2 0.2
"treatment_c" 5.7 4.4 1.5 0.4
"treatment_c" 5.4 3.9 1.3 0.4
"treatment_c" 5.1 3.5 1.4 0.3
"treatment_a" 5.7 3.8 1.7 0.3
"treatment_a" 5.1 3.8 1.5 0.3
"treatment_b" 5.4 3.4 1.7 0.2
"treatment_b" 5.1 3.7 1.5 0.4
"treatment_c" 4.6 3.6 1 0.2
"treatment_c" 5.1 3.3 1.7 0.5', header = TRUE)
pr_mod <- prcomp(df_mod[, 2:5])
pr_pred_mod <- predict(pr_mod)
df2_mod <- data.frame(Treatment = df_mod$Treatment,
                      pr_pred_mod[, 1:2])
ggplot(data = df2_mod, aes(x = PC1, y = PC2,
                           colour = Treatment,
                           label = Treatment)) +
  geom_text() +
  stat_ellipse(show.legend = FALSE)
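If regrouping the treatments is not acceptable, one alternative (a swapped-in technique, not what the answer does) is to outline each treatment with a convex hull, which works with only three replicates. This is a rough sketch under the same assumption that the mock data are the first 24 rows of iris. The name `hulls` is hypothetical, and `chull()` comes from the grDevices package that ships with R.

```r
library(ggplot2)

# Rebuild the original eight-treatment mock data (assumption: iris rows 1 to 24)
df <- data.frame(
  Treatment = rep(paste0("treatment_", letters[1:8]), each = 3),
  iris[1:24, 1:4]
)

pr <- prcomp(df[, 2:5])
scores <- data.frame(Treatment = df$Treatment, pr$x[, 1:2])

# for each treatment, keep the rows that lie on its convex hull
hulls <- do.call(rbind, lapply(split(scores, scores$Treatment), function(g) {
  g[chull(g$PC1, g$PC2), ]
}))

p <- ggplot(scores, aes(x = PC1, y = PC2, colour = Treatment)) +
  geom_polygon(data = hulls, aes(fill = Treatment),
               alpha = 0.2, show.legend = FALSE) +
  geom_text(aes(label = Treatment), show.legend = FALSE)
print(p)
```

With three replicates each hull is simply a triangle, so it shows the spread of the replicates but carries no statistical meaning the way a confidence ellipse would.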