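Can I overlay the coordinates of the variables from one PCA on the coordinates of the individuals from a second PCA and still interpret the results?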

Question

I have two sets of data, abundance data and environmental data, and I need to "link" them or "overlay" them in a PCA:

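I want to run a PCA in R that gives me the individual ciliate species as individuals and the environmental parameters as variables. What I have are two different data frames: "abundance", which gives the abundance of the different species at the sampling sites, and "environment", which gives the environmental parameters of the sampling sites. So the two data frames share one common parameter: the site! If I run a PCA, I get either a plot with the sites as individuals and the environmental parameters as variables, or one with the species as individuals and the sites as variables. What I need is to link the datasets through the site parameter, so that I can run a PCA with the ciliate species as individuals and the environmental parameters as variables. So somehow I need to link the two PCAs/data frames on the common site parameter.

What I have done so far: I ran two different PCAs, kept the coordinates of the individuals (ciliate species) from PCA1 and the coordinates of the variables (environmental parameters) from PCA2, and plotted them together. The plot is exactly what I need, but can it still be interpreted as a PCA, i.e. are the data frames really linked by the site parameter? Or have I simply cheated with the data and lost interpretability?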
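Joining the two data frames on the shared site key is straightforward once both tables have the sites in the same orientation. A minimal sketch, assuming the example data from the code below (rebuilt here with data.frame(..., row.names = ...)); the object names site_by_species and linked are illustrative only:

```r
# Sketch only; site_by_species and linked are illustrative names.
# Example data from the question: species are rows, sites are columns
abundance <- data.frame(
  site1 = c(2, 4, 19, 34, 3, 6, 9),
  site2 = c(5, 8, 9, 12, 0, 1, 1),
  site3 = c(23, 56, 7, 1, 1, 1, 2),
  site4 = c(4, 6, 2, 8, 5, 1, 7),
  row.names = c("spec1", "spec2", "spec3", "spec 4", "spec 5", "spec 6", "spec7")
)
# Environmental parameters: sites are rows
environment <- data.frame(
  Temp = c(24, 24.5, 23.5, 25),
  Chla = c(2.2, 1.5, 2.0, 3.4),
  Plo  = c(1000, 2000, 1500, 200),
  Plo2 = c(200, 400, 600, 200),
  row.names = c("site1", "site2", "site3", "site4")
)

# Transpose so that the sites are rows in the abundance table as well
site_by_species <- as.data.frame(t(abundance))

# Join the two tables on their row names, i.e. on the site
linked <- merge(site_by_species, environment, by = "row.names")
rownames(linked) <- as.character(linked$Row.names)
linked$Row.names <- NULL
linked
```

The joined table has one row per site, with species abundances and environmental parameters side by side. The sites, not the species, are still the rows, which is why linking the tables does not by itself produce a species-by-environment PCA.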

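Another option I tried was to compute, for each ciliate species, the environmental parameters as weighted averages (weighted by the abundance of the ciliate at each site), and then run a PCA on a data frame of the ciliate species and their weighted-average environmental parameters... That works, but I think I lose a lot of information this way... What do you think?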
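The weighted-average approach can be written as a single matrix product: for species i and parameter k, the value is the sum over sites of abundance times parameter, divided by the species' total abundance. A minimal sketch with the example data from the code below; A, E, wa and wa_pca are illustrative names, and standardising the parameters (scale. = TRUE) mirrors what the question does for the environmental PCA:

```r
# Sketch only; A, E, wa and wa_pca are illustrative names.
abundance <- data.frame(
  site1 = c(2, 4, 19, 34, 3, 6, 9),
  site2 = c(5, 8, 9, 12, 0, 1, 1),
  site3 = c(23, 56, 7, 1, 1, 1, 2),
  site4 = c(4, 6, 2, 8, 5, 1, 7),
  row.names = c("spec1", "spec2", "spec3", "spec 4", "spec 5", "spec 6", "spec7")
)
environment <- data.frame(
  Temp = c(24, 24.5, 23.5, 25),
  Chla = c(2.2, 1.5, 2.0, 3.4),
  Plo  = c(1000, 2000, 1500, 200),
  Plo2 = c(200, 400, 600, 200),
  row.names = c("site1", "site2", "site3", "site4")
)

A <- as.matrix(abundance)                   # species x sites
E <- as.matrix(environment[colnames(A), ])  # sites x parameters, rows aligned to A's columns

# Abundance-weighted mean of every parameter for every species:
# wa[i, k] = sum_j A[i, j] * E[j, k] / sum_j A[i, j]
wa <- sweep(A %*% E, 1, rowSums(A), "/")

# PCA with species as individuals and environmental parameters as variables;
# the parameters have very different units, so they are standardised
wa_pca <- prcomp(wa, scale. = TRUE)
round(wa, 2)
```

Each species is reduced to one mean per parameter, so how widely its sites spread along each gradient is not carried into the PCA; that is one concrete form of the information loss suspected above.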

    #Create example data frame of abundance data (species x sites); I am sure this can be done more simply and elegantly than this ;)
    species<-c("spec1", "spec2", "spec3", "spec 4", "spec 5", "spec 6", "spec7")
    site1<-c(2,4,19,34,3,6,9)
    site2<-c(5,8,9,12,0,1,1)
    site3<-c(23,56,7,1,1,1,2)
    site4<-c(4,6,2,8,5,1,7)
    abundance<-data.frame(species,site1,site2,site3,site4)
    rownames(abundance)<-abundance$species
    abundance<-abundance[,-1]
    #Create example data frame of the environmental parameters of the sites
    X<-c("site1","site2","site3","site4")
    Temp<-c(24,24.5,23.5,25)
    Chla<-c(2.2,1.5,2.0,3.4)
    Plo<-c(1000,2000,1500,200)
    Plo2<-c(200,400,600,200)
    environment<-data.frame(X,Temp,Chla,Plo,Plo2)
    rownames(environment)<-environment$X
    environment<-environment[,-1]
    ###PCA on abundance data
    #hellinger pre-transformation of abundance data
    library(vegan)
    abu.h<-decostand(abundance,"hellinger")
    abu.h.pca<-prcomp(abu.h)
    envir.pca<-prcomp(environment,scale.=TRUE)
    biplot(abu.h.pca)
    ##...and now, according to my prof, I would need to discard the site vectors
    #and overlay the environmental factors of the sites instead?
    library(factoextra)
    #Graph of individuals 
    fviz_pca_ind(abu.h.pca) 
    ##get coordinates 
    ind<-get_pca_ind(abu.h.pca) 
    head(ind$coord) 
    #x in biplot 
    ind<-ind$coord 
    ind<-ind[,1:2]
    ind 
    #y in biplot: variables (environmental parameters) of the second PCA
    # Extract the results for variables only
    vari<-get_pca_var(envir.pca) 
    var<-vari$coord 
    var<-var[,1:2] 
    var 
    biplot(ind, var, var.axes = TRUE)
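For reference, the Hellinger pre-transformation used above is the square root of each value divided by its row total. decostand() applies it row-wise by default, and in the question's abundance table the rows are species, so here each species' distribution across the sites is transformed; with the more common site-by-species layout (as in the transposed table used in the answer) each site's profile would be transformed instead. A minimal check with the example data; manual and auto are illustrative names:

```r
# Sketch only; manual and auto are illustrative names.
library(vegan)

# Example abundance data from the question: species are rows, sites are columns
abundance <- data.frame(
  site1 = c(2, 4, 19, 34, 3, 6, 9),
  site2 = c(5, 8, 9, 12, 0, 1, 1),
  site3 = c(23, 56, 7, 1, 1, 1, 2),
  site4 = c(4, 6, 2, 8, 5, 1, 7),
  row.names = c("spec1", "spec2", "spec3", "spec 4", "spec 5", "spec 6", "spec7")
)

# Hellinger by hand: relative abundance within each row, then square root
manual <- sqrt(abundance / rowSums(abundance))

# vegan's version (works on rows by default)
auto <- decostand(abundance, "hellinger")

# Same values; decostand only adds a bookkeeping attribute
all.equal(as.matrix(manual), as.matrix(auto), check.attributes = FALSE)
```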

Answer 1

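I have never done what you describe, but I know you can relate an nMDS to environmental (abiotic) data with vector overlays. I'm not sure whether you can do that with a PCA, but my PRIMER manual at least mentions that a PCA of the abiotic data based on Euclidean distance matches well with an nMDS of the biological data, which is how PRIMER's BEST function works. But this isn't PRIMER.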
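In R, vegan's closest counterpart to the BIO-ENV procedure behind PRIMER's BEST is bioenv(), which searches subsets of the environmental variables for the one whose Euclidean distances (on standardised variables) have the highest rank correlation with the community dissimilarities. A minimal sketch with the example data, sites as rows; comm and best are illustrative names, and with only four sites (six pairwise distances) the result is illustrative at best:

```r
# Sketch only; comm and best are illustrative names.
library(vegan)

# Community table with sites as rows (the transposed abundance data)
comm <- rbind(
  site1 = c(2, 4, 19, 34, 3, 6, 9),
  site2 = c(5, 8, 9, 12, 0, 1, 1),
  site3 = c(23, 56, 7, 1, 1, 1, 2),
  site4 = c(4, 6, 2, 8, 5, 1, 7)
)
colnames(comm) <- c("spec1", "spec2", "spec3", "spec 4", "spec 5", "spec 6", "spec7")

environment <- data.frame(
  Temp = c(24, 24.5, 23.5, 25),
  Chla = c(2.2, 1.5, 2.0, 3.4),
  Plo  = c(1000, 2000, 1500, 200),
  Plo2 = c(200, 400, 600, 200),
  row.names = rownames(comm)
)

# Bray-Curtis dissimilarities of the community compared, by Spearman rank
# correlation, with Euclidean distances of standardised environmental subsets
best <- bioenv(comm, environment, method = "spearman", index = "bray")
best
summary(best)
```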

See the vegan::envfit function. The intro vignette covers it briefly; the vegan tutor covers it in a bit more detail.

I transposed the species data and did it with an nMDS of the species data.

    library(vegan)

    species <- c("spec1", "spec2", "spec3", "spec 4", "spec 5", "spec 6", "spec7")
    site1 <- c(2,4,19,34,3,6,9)
    site2 <- c(5,8,9,12,0,1,1)
    site3 <- c(23,56,7,1,1,1,2)
    site4 <- c(4,6,2,8,5,1,7)
    abundance <- data.frame(species,site1,site2,site3,site4)
    rownames(abundance) <- abundance$species
    abundance <- abundance[,-1]
    abundance <- t(abundance) # transpose: sites as rows, species as columns

    X <- c("site1","site2","site3","site4")
    Temp <- c(24,24.5,23.5,25)
    Chla <- c(2.2,1.5,2.0,3.4)
    Plo <- c(1000,2000,1500,200)
    Plo2 <- c(200,400,600,200)
    environment <- data.frame(X,Temp,Chla,Plo,Plo2)
    rownames(environment) <- environment$X
    environment <- environment[,-1]

    AbEnvMDS <- metaMDS(abundance, k = 2)
    AbEnvFit <- envfit(AbEnvMDS, environment)

    plt <- plot(AbEnvMDS) # displays both sites (empty circles) and species (red +)
    plt <- plot(AbEnvMDS, display = "species") # displays only species (red +)
    plt
    identify(plt, what = "species") # choose your points
    plot(AbEnvFit) # overlays your environment
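What the overlaid vectors mean: envfit() treats each environmental variable separately as the response in a linear regression on the ordination site scores. The arrow shows the direction of the increasing fitted gradient, and its r² is the R² of that regression (significance is then assessed by permutation), so the link between the two data sets is made through the sites. A minimal sketch reproducing r² for Temp with an ordinary lm(); comm, mds, fit, site_scores and r2_lm are illustrative names, and with only four sites metaMDS will probably warn that the stress is nearly zero, so the numbers themselves mean little:

```r
# Sketch only; comm, mds, fit, site_scores and r2_lm are illustrative names.
library(vegan)
set.seed(1)  # metaMDS starts from random configurations

# Community table with sites as rows (the transposed abundance data)
comm <- rbind(
  site1 = c(2, 4, 19, 34, 3, 6, 9),
  site2 = c(5, 8, 9, 12, 0, 1, 1),
  site3 = c(23, 56, 7, 1, 1, 1, 2),
  site4 = c(4, 6, 2, 8, 5, 1, 7)
)
colnames(comm) <- c("spec1", "spec2", "spec3", "spec 4", "spec 5", "spec 6", "spec7")

environment <- data.frame(
  Temp = c(24, 24.5, 23.5, 25),
  Chla = c(2.2, 1.5, 2.0, 3.4),
  Plo  = c(1000, 2000, 1500, 200),
  Plo2 = c(200, 400, 600, 200),
  row.names = rownames(comm)
)

mds <- metaMDS(comm, k = 2, trace = FALSE)
fit <- envfit(mds, environment)

# Refit the Temp vector by ordinary least squares on the same site scores
site_scores <- scores(mds, display = "sites")
r2_lm <- summary(lm(environment$Temp ~ site_scores))$r.squared

c(envfit = fit$vectors$r[["Temp"]], lm = r2_lm)
```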

r

pca