How to get `skbio` PCoA (Principal Coordinate Analysis) results?
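I'm looking at the attributes of skbio's PCoA method (listed below). I'm new to this API, and I want to get the eigenvectors and the original points projected onto the new axes, similar to `.fit_transform` in `sklearn.decomposition.PCA`, so I can create some PC_1 vs PC_2-style plots. I figured out how to get `eigvals` and `proportion_explained`, but `features` comes back as `None`.
Is that because it's in beta?
If there are any tutorials that use this, pointers would be greatly appreciated. I'm a huge fan of scikit-learn and would like to start using more of the scikits.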
| Attributes
| ----------
| short_method_name : str
| Abbreviated ordination method name.
| long_method_name : str
| Ordination method name.
| eigvals : pd.Series
| The resulting eigenvalues. The index corresponds to the ordination
| axis labels
| samples : pd.DataFrame
| The position of the samples in the ordination space, row-indexed by the
| sample id.
| features : pd.DataFrame
| The position of the features in the ordination space, row-indexed by
| the feature id.
| biplot_scores : pd.DataFrame
| Correlation coefficients of the samples with respect to the features.
| sample_constraints : pd.DataFrame
| Site constraints (linear combinations of constraining variables):
| coordinates of the sites in the space of the explanatory variables X.
| These are the fitted site scores
| proportion_explained : pd.Series
| Proportion explained by each of the dimensions in the ordination space.
| The index corresponds to the ordination axis labels
Here's my code to generate the principal coordinate analysis (PCoA) object.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn import decomposition
import seaborn as sns; sns.set_style("whitegrid", {'axes.grid' : False})
import skbio
from scipy.spatial import distance
%matplotlib inline
np.random.seed(0)
# Iris dataset
DF_data = pd.DataFrame(load_iris().data,
index = ["iris_%d" % i for i in range(load_iris().data.shape[0])],
columns = load_iris().feature_names)
n,m = DF_data.shape
# print(n,m)
# 150 4
Se_targets = pd.Series(load_iris().target,
index = ["iris_%d" % i for i in range(load_iris().data.shape[0])],
name = "Species")
# Scaling mean = 0, var = 1
DF_standard = pd.DataFrame(StandardScaler().fit_transform(DF_data),
index = DF_data.index,
columns = DF_data.columns)
# Distance Matrix
Ar_dist = distance.squareform(distance.pdist(DF_standard.T, metric="braycurtis")) # (m x m) distance measure
DM_dist = skbio.stats.distance.DistanceMatrix(Ar_dist, ids=DF_standard.columns)
PCoA = skbio.stats.ordination.pcoa(DM_dist)
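You can access the transformed sample coordinates with `OrdinationResults.samples`. This returns a `pandas.DataFrame` row-indexed by the sample IDs (i.e., the IDs in your distance matrix). Because principal coordinate analysis operates on a distance matrix of samples, transformed feature coordinates (`OrdinationResults.features`) are not available. Other ordination methods in scikit-bio that accept a sample x feature table as input will have transformed feature coordinates available (e.g., CA, CCA, RDA).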
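To illustrate why only sample coordinates come out of PCoA, here is a minimal NumPy sketch of the classical computation (Gower centering of the squared distances followed by an eigendecomposition). It is an illustration under stated assumptions, not scikit-bio's implementation: the helper name `classical_pcoa`, the random toy data, and the Euclidean metric are all hypothetical choices made for this example. The only input is the sample-by-sample distance matrix, so there is no feature table from which feature coordinates could be derived.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def classical_pcoa(dist, n_axes=2):
    """Hypothetical helper: classical PCoA on a square distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Gower-center the squared distances: B = -1/2 * J * D^2 * J
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # B is symmetric, so eigh applies; sort eigenpairs in descending order
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Sample coordinates: eigenvectors scaled by sqrt of (non-negative) eigenvalues
    kept = np.clip(eigvals[:n_axes], 0.0, None)
    coords = eigvecs[:, :n_axes] * np.sqrt(kept)
    return coords, eigvals


# Toy data (assumption): 10 samples, 4 features
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))
D = squareform(pdist(X, metric="euclidean"))

coords, eigvals = classical_pcoa(D, n_axes=4)
# With Euclidean distances, the coordinates reproduce the input distances
print(np.allclose(squareform(pdist(coords)), D))
```

The returned `coords` array has one row per sample, which is the role `OrdinationResults.samples` plays in scikit-bio.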
Side note: the `distance.squareform` call is unnecessary because `skbio.DistanceMatrix` supports square- or vector-form arrays.