如何使用 Python(SciPy、NumPy 和 Matplotlib)绘制 R 的 vegan::procrustes?
How to plot R's vegan::procrustes using Python (SciPy, NumPy, and Matplotlib)?
我正在学习关于 procrustes 分析的素食教程:
https://www.rdocumentation.org/packages/vegan/versions/2.4-2/topics/procrustes
# Load data
library(vegan)
data(varespec)
# Ordinations
vare.dist <- vegdist(wisconsin(varespec))
mds.null <- monoMDS(vare.dist, y = cmdscale(vare.dist))
mds.alt <- monoMDS(vare.dist)
# Run procrustes
vare.proc <- procrustes(mds.alt, mds.null)
# Summary of procrustes
summary(vare.proc)
Call:
procrustes(X = mds.alt, Y = mds.null)
Number of objects: 24 Number of dimensions: 2
Procrustes sum of squares:
13.99951
Procrustes root mean squared error:
0.7637492
Quantiles of Procrustes errors:
Min 1Q Median 3Q Max
0.08408327 0.23038165 0.49316643 0.65854198 1.99105837
Rotation matrix:
[,1] [,2]
[1,] 0.9761893 0.2169202
[2,] 0.2169202 -0.9761893
of averages:
[,1] [,2]
[1,] -2.677059e-17 -5.251452e-18
Scaling of target:
[1] 0.6455131
plot(vare.proc)
现在 Python 我可以选择:
# Returns
mtx1: array_like
A standardized version of data1.
mtx2: array_like
The orientation of data2 that best fits data1. Centered, but not necessarily .
disparity: float
as defined above.
R(N, N): ndarray
The matrix solution of the orthogonal Procrustes problem. Minimizes the Frobenius norm of (A @ R) - B, subject to R.T @ R = I.
scale: float
Sum of the singular values of A.T @ B.
我的问题:
- 哪个实现更类似于
vegan::procrustes
、scipy.spatial.procrustes
或 scipy.linalg.orthogonal_procrustes
?
- 如何使用输出生成上图(在 Matplotlib 中而不是通过 Rpy2)?
编辑:
将 and 用于 SciPy/NumPy 实现,应该能够仅使用 PyData 中的标准包在 Python 中准确地重现 VEGAN 分析。
编辑 2:
为了防止对任何人有帮助,我在 soothsayer
中实现了 vegan
中的 Procrustes
和 Protest
(以及此处描述的绘图方法) python 套餐:
https://github.com/jolespin/soothsayer/blob/60accb669062c559ba785be201deddeede26a049/soothsayer/ordination/ordination.py#L715
不完全一样,但我觉得很接近。
为此,我将 varespec
数据带入了 Python。然后尝试模仿 vegan
包(和你的代码)采取的行动。按照这些思路,我坚持使用您使用的对象名称(大部分)。
必须有更简单的方法来做到这一点。
import scipy.spatial.distance as ssd
import sklearn.manifold as sm # MDS, smacof
import procrustes
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
wVares_py = r.wVares # I brought in the varespec data from R (named wVares)
# find the distance
vare_dist = ssd.pdist(wVares_py, 'braycurtis')
# make it square (Required for MDS and smaof
vare_dist = ssd.squareform(vare_dist)
mds = sm.MDS(metric = False, dissimilarity = 'precomputed')
mds_alt = mds.fit_transform(vare_dist)
mds_null = sm.smacof(vare_dist, metric = False, init = mds_alt)
vare_proc = procrustes.rotational(mds_alt, mds_null[0], translate = False)
# create an array that represents the axes at the origin, then rotate them
xAx = np.array([[-.4, 0],[.4, 0]])
yAx = np.array([[0, .4], [0, -.4]])
# create the rotated origins
xHat = vare_proc["t"] * xAx
yHat = vare_proc["t"] * yAx
# add line segments
vareA = pd.DataFrame(vare_proc.new_a, columns = ['x1', 'y1'])
vareB = pd.DataFrame(vare_proc.new_b, columns = ['x2', 'y2'])
df = pd.concat([vareA, vareB], axis = 1)
# now find the differences
df['dx'] = df.x2 - df.x1
df['dy'] = df.y2 - df.y1
现在一切就绪,可以绘制图表了。
plt.figure(figsize = (10, 10), dpi = 100)
fig, ax = plt.subplots()
# plot the points
# plot the points
ax.scatter(vare_proc.new_a[:, 0],
vare_proc.new_a[:, 1],
marker = 'o',
label = 'a',
c = '#cc5500')
# plot the x, y axes
ax.axhline(y=0, color='k')
ax.axvline(x=0, color='k')
# plot the rotated origin
ax.axline(xHat[:,0], xHat[:,1], linestyle = '-.')
ax.axline(yHat[:,0], yHat[:,1], linestyle = '-.')
# add arrows to show the direction per point
ax.quiver(df.x1, df.y1, df.dx, df.dy, units='xy', color = 'gray', scale=.7)
plt.show()
vegan
包在 monoMDS
中运行的方式似乎与 Python 使用的任何方法都不一致。 vegan
中的代码全部基于Fortran。您可以 link Fortan 到 Python,就像在 R 中一样。
虽然缩放比例不同,但vare.proc
内的旋转矩阵与上面Python编码对象中的旋转矩阵(数组?)完全相同vare_proc
。我发现这很有趣。
如果我为您的对象 vare.proc
引入 R 中的内容片段,我可以做出精确的图表。
首先,我使用 R Markdown 在 Python 和 R 之间共享对象。
# compare to R
# vpR = vareProc$rotation
vpR_py = r.vpR
# vpYrot = vareProc$Yrot
vpYrot_py = r.vpYrot
# vpX = vareProc$X
vpX_py = r.vpX
然后我用这些物件做了复制品
# build the plot exactly like vegan
# using https://github.com/vegandevs/vegan/blob/master/R/plot.procrustes.R
tails = vpYrot_py
heads = vpX_py
# find the ranges
xMax = max(abs(np.hstack((tails[:,0], heads[:,0]))))
yMax = max(abs(np.hstack((tails[:,1], heads[:,1]))))
plt.figure(dpi = 100)
fig, ax = plt.subplots()
ax.set_aspect('equal')
ax.set_xlim(-xMax, xMax)
ax.set_ylim(-yMax, yMax)
# add points
ax.scatter(x = tails[:,0],
y = tails[:,1],
facecolors = 'none',
edgecolors = 'k')
# add origin axes
ax.axhline(y = 0, color='k', ls = '--', linewidth = 1) # using dashed for origin
ax.axvline(x = 0, color='k', ls = '--', linewidth = 1)
# add rotation axes
ax.axline(xy1 = (0, 0), color = 'k',
slope = vpR_py[0, 1]/vpR_py[0, 0])
ax.axline(xy1 = (0, 0), color = 'k',
slope = vpR_py[1, 1]/vpR_py[1, 0])
# add arrows
for i in range(0, len(tails)):
ax.annotate("", xy = heads[i,:],
xycoords = 'data',
xytext = tails[i,:],
arrowprops=dict(arrowstyle="->",
color = 'blue'))
plt.tight_layout(pad = 2)
plt.show()
来自matplotlib
来自vegan
以下是如何仅使用 SciPy
和 NumPy
在 Python 中重现 vegan
procrustes
分析。有关如何绘制的上下文,请参阅上面的@Kat 答案:
# Get X and Y which are the first 2 embeddings
# -------------------------------------------
# X <- scores(X, display = scores, ...)
mds_null = pd.read_csv("https://pastebin.com/raw/c1Zwb4pu", index_col=0, sep="\t")
X_original = mds_null.values
# X_original[:5]
# array([[-0.28600328, -0.23135498],
# [ 0.00510282, -0.39170001],
# [ 0.80888622, 1.18656269],
# [ 1.44897916, -0.18887702],
# [ 0.54496396, -0.09403705]])
# Y <- scores(X, display = scores, ...)
mds_alt = pd.read_csv("https://pastebin.com/raw/UMGLgXmu", index_col=0, sep="\t")
Y_original = mds_alt.values
# Y_original[:5]
# array([[ 0.2651293 , 0.09755548],
# [ 0.17639036, 0.45328207],
# [-0.16167398, -1.42304932],
# [-1.44706047, 0.3966337 ],
# [-0.49825949, 0.15239689]])
# Procrustes in VEGAN
# ----------------
procrustes_result = vegan.procrustes(X_original, Y_original, scale=True, symmetric=False, scores="sites")
# Rotation matrix from VEGAN
A = np.array(dict(procrustes_result.items())["rotation"])
# A
# # array([[-0.98171787, -0.19034186],
# # [ 0.19034186, -0.98171787]])
# Center the X and Y data
# -----------------------
X_mean = X_original.mean(axis=0)
X = X_original - X_mean
Y_mean = Y_original.mean(axis=0)
Y = Y_original - Y_mean
# # Rotation matrix from SciPy
# ----------------------------
A, sca = orthogonal_procrustes(X,Y)
# A
# array([[-0.98171787, 0.19034186],
# [-0.19034186, -0.98171787]])
# Manual calculation
# ------------------
# np.linalg.svd returns:
# u{ (…, M, M), (…, M, K) } array
# Unitary array(s). The first a.ndim - 2 dimensions have the same size as those of the input a. The size of the last two dimensions depends on the value of full_matrices. Only returned when compute_uv is True.
# s(…, K) array
# Vector(s) with the singular values, within each vector sorted in descending order. The first a.ndim - 2 dimensions have the same size as those of the input a.
# vh{ (…, N, N), (…, K, N) } array
# Unitary array(s). The first a.ndim - 2 dimensions have the same size as those of the input a. The size of the last two dimensions depends on the value of full_matrices. Only returned when compute_uv is True.
XY = np.dot(X.T, Y) # crossprod(X,Y)
U,s,Vh = np.linalg.svd(XY)
V = Vh.T
A = np.dot(V, U.T)
# A
# array([[-0.98171787, -0.19034186],
# [ 0.19034186, -0.98171787]])
# Reproduce the Yrot object
# -------------------------
scale = True
symmetric = False
# Yrot from VEGAN
Yrot = np.array(dict(procrustes_result.items())["Yrot"])
# Yrot[:5]
# array([[-0.20919556, -0.12656386],
# [-0.07519809, -0.41418755],
# [-0.09706039, 1.23572329],
# [ 1.29483042, -0.098617 ],
# [ 0.44844991, -0.04740275]])
# NumPy implementation
# ctrace <- function(MAT) sum(MAT^2)
def ctrace(MAT):
return np.sum(MAT**2)
# if (scale) {
# c <- sum(sol$d)/ctrace(Y)
if scale:
c = np.sum(s)/ctrace(Y)
# Yrot <- c * Y %*% A
Yrot = c * np.dot(Y,A)
Yrot[:5]
# array([[-0.20919556, -0.12656386],
# [-0.07519809, -0.41418755],
# [-0.09706039, 1.23572329],
# [ 1.29483042, -0.098617 ],
# [ 0.44844991, -0.04740275]])
python
的另一种 PROTEST 实现
感谢分享!如果有人感兴趣,这是我在 cross-checking 文献和 VEGAN 实施之后根据 @O.rka 的工作制定的 PROTEST 实施。
from tqdm import trange
def protest(X, Y, no_permutations=999, with_replacement=False):
'''Performs procrustean test (PROTEST).
H_0: "X and Y are no more related than random datasets would be."
Args:
X (np.ndarray): Matrix of shape (n x p11), where n is the number of observations
Y (np.ndarray): Matrix of shape (n x p12)
no_permutations (int): Number of permutations to consider
with_replacement (bool): If set to True, permutations are sampled with replacement.
Returns:
test_statistic (float): Procrustean correlation coefficient
p_value (float): P-Value of PROTEST
RSS (float): Residual Sum of Squares of prucrustes rotations
References:
Gower & Dijksterhuis (2004): "Procrustes Problems"
Gower (1971): "Statistical methods of comparing different multivariate analyses
of the same data", p. 138-149
Peres-Neto & Jackson (2001): "How well do multivariate data sets match? The advantages of
a Procrustean superimposition approach over the Mantel test"
'''
n = X.shape[0]
assert n == Y.shape[0], 'X has to be of shape (n x p1) and Y has to be of shape (n x p2).'
# Get machine float error
EPS = np.sqrt(np.finfo(float).eps)
# # standardize the matrices using Gower-Transformation (Gower 1971)
X, Y = gower_trafo(X), gower_trafo(Y)
# calculate test statistic & residual sum of squares
test_statistic = procrustes_corr(X, Y)
RSS = np.sum(X**2) + np.sum(
Y**2) - 2 * test_statistic # c.f. Gower & Dijksterhuis (2004), equation 4.3
# Get integer index for permutations
index = np.arange(0, n)
# Permute and calculate goodness of fit
permutations = list()
for rs in trange(no_permutations):
permutated_idx = np.random.RandomState(rs).choice(index, replace=with_replacement, size=n)
corr = procrustes_corr(X[permutated_idx], Y)
permutations.append(corr)
permutations = np.array(permutations)
p_value = (np.sum(permutations >= test_statistic - EPS) + 1) / (no_permutations + 1)
# Print Output
print(f'Performed PROTEST with {no_permutations} permutations.')
print(f'Procrustean correlation coefficient: {test_statistic}')
print(f'p-Value: {p_value}')
print(f'Residual Sum of Squares: {RSS}')
return test_statistic, p_value, RSS
def procrustes_corr(X, Y):
'''Calculates the procrustean correlation coefficient.
Args:
X (np.ndarray): Gower-transformed Matrix of shape (n x p1), where n is the number of observations
Y (np.ndarray): Gower-transformed Matrix of shape (n x p2)
Returns:
corr (float): The goodness of fit of the orthogonal procrustes rotation. Procrustean form of Corelation coefficient.
References:
Peres-Neto & Jackson (2001): "How well do multivariate data sets match? The advantages of
a Procrustean superimposition approach over the Mantel test"
'''
XY = np.dot(X.T, Y)
_, s, _ = np.linalg.svd(XY)
# for performing PROTEST (Peres-Neto & Jackson 2001)
corr = np.trace(np.diag(s))
return corr
def gower_trafo(A):
'''Standardizes Matrix A using Gower-Transformation.
Args:
A (np.ndarray): Matrix of shape (n x p), where n is the number of observations
Returns:
A (np.ndarray): Standardized Matrix A
References:
Gower (1971): "Statistical methods of comparing different multivariate analyses
of the same data", p. 138-149
'''
A = A - A.mean(axis=0, keepdims=True)
A = A / np.sqrt(np.sum(A**2))
return A
我正在学习关于 procrustes 分析的素食教程:
https://www.rdocumentation.org/packages/vegan/versions/2.4-2/topics/procrustes
# Load data
library(vegan)
data(varespec)
# Ordinations
vare.dist <- vegdist(wisconsin(varespec))
mds.null <- monoMDS(vare.dist, y = cmdscale(vare.dist))
mds.alt <- monoMDS(vare.dist)
# Run procrustes
vare.proc <- procrustes(mds.alt, mds.null)
# Summary of procrustes
summary(vare.proc)
Call:
procrustes(X = mds.alt, Y = mds.null)
Number of objects: 24 Number of dimensions: 2
Procrustes sum of squares:
13.99951
Procrustes root mean squared error:
0.7637492
Quantiles of Procrustes errors:
Min 1Q Median 3Q Max
0.08408327 0.23038165 0.49316643 0.65854198 1.99105837
Rotation matrix:
[,1] [,2]
[1,] 0.9761893 0.2169202
[2,] 0.2169202 -0.9761893
of averages:
[,1] [,2]
[1,] -2.677059e-17 -5.251452e-18
Scaling of target:
[1] 0.6455131
plot(vare.proc)
现在 Python 我可以选择:
# Returns
mtx1: array_like
A standardized version of data1.
mtx2: array_like
The orientation of data2 that best fits data1. Centered, but not necessarily .
disparity: float
as defined above.
R(N, N): ndarray
The matrix solution of the orthogonal Procrustes problem. Minimizes the Frobenius norm of (A @ R) - B, subject to R.T @ R = I.
scale: float
Sum of the singular values of A.T @ B.
我的问题:
- 哪个实现更类似于
vegan::procrustes
、scipy.spatial.procrustes
或scipy.linalg.orthogonal_procrustes
? - 如何使用输出生成上图(在 Matplotlib 中而不是通过 Rpy2)?
编辑:
将
编辑 2:
为了防止对任何人有帮助,我在 soothsayer
中实现了 vegan
中的 Procrustes
和 Protest
(以及此处描述的绘图方法) python 套餐:
https://github.com/jolespin/soothsayer/blob/60accb669062c559ba785be201deddeede26a049/soothsayer/ordination/ordination.py#L715
不完全一样,但我觉得很接近。
为此,我将 varespec
数据带入了 Python。然后尝试模仿 vegan
包(和你的代码)采取的行动。按照这些思路,我坚持使用您使用的对象名称(大部分)。
必须有更简单的方法来做到这一点。
import scipy.spatial.distance as ssd
import sklearn.manifold as sm # MDS, smacof
import procrustes
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
wVares_py = r.wVares # I brought in the varespec data from R (named wVares)
# find the distance
vare_dist = ssd.pdist(wVares_py, 'braycurtis')
# make it square (Required for MDS and smaof
vare_dist = ssd.squareform(vare_dist)
mds = sm.MDS(metric = False, dissimilarity = 'precomputed')
mds_alt = mds.fit_transform(vare_dist)
mds_null = sm.smacof(vare_dist, metric = False, init = mds_alt)
vare_proc = procrustes.rotational(mds_alt, mds_null[0], translate = False)
# create an array that represents the axes at the origin, then rotate them
xAx = np.array([[-.4, 0],[.4, 0]])
yAx = np.array([[0, .4], [0, -.4]])
# create the rotated origins
xHat = vare_proc["t"] * xAx
yHat = vare_proc["t"] * yAx
# add line segments
vareA = pd.DataFrame(vare_proc.new_a, columns = ['x1', 'y1'])
vareB = pd.DataFrame(vare_proc.new_b, columns = ['x2', 'y2'])
df = pd.concat([vareA, vareB], axis = 1)
# now find the differences
df['dx'] = df.x2 - df.x1
df['dy'] = df.y2 - df.y1
现在一切就绪,可以绘制图表了。
plt.figure(figsize = (10, 10), dpi = 100)
fig, ax = plt.subplots()
# plot the points
# plot the points
ax.scatter(vare_proc.new_a[:, 0],
vare_proc.new_a[:, 1],
marker = 'o',
label = 'a',
c = '#cc5500')
# plot the x, y axes
ax.axhline(y=0, color='k')
ax.axvline(x=0, color='k')
# plot the rotated origin
ax.axline(xHat[:,0], xHat[:,1], linestyle = '-.')
ax.axline(yHat[:,0], yHat[:,1], linestyle = '-.')
# add arrows to show the direction per point
ax.quiver(df.x1, df.y1, df.dx, df.dy, units='xy', color = 'gray', scale=.7)
plt.show()
vegan
包在 monoMDS
中运行的方式似乎与 Python 使用的任何方法都不一致。 vegan
中的代码全部基于Fortran。您可以 link Fortan 到 Python,就像在 R 中一样。
虽然缩放比例不同,但vare.proc
内的旋转矩阵与上面Python编码对象中的旋转矩阵(数组?)完全相同vare_proc
。我发现这很有趣。
如果我为您的对象 vare.proc
引入 R 中的内容片段,我可以做出精确的图表。
首先,我使用 R Markdown 在 Python 和 R 之间共享对象。
# compare to R
# vpR = vareProc$rotation
vpR_py = r.vpR
# vpYrot = vareProc$Yrot
vpYrot_py = r.vpYrot
# vpX = vareProc$X
vpX_py = r.vpX
然后我用这些物件做了复制品
# build the plot exactly like vegan
# using https://github.com/vegandevs/vegan/blob/master/R/plot.procrustes.R
tails = vpYrot_py
heads = vpX_py
# find the ranges
xMax = max(abs(np.hstack((tails[:,0], heads[:,0]))))
yMax = max(abs(np.hstack((tails[:,1], heads[:,1]))))
plt.figure(dpi = 100)
fig, ax = plt.subplots()
ax.set_aspect('equal')
ax.set_xlim(-xMax, xMax)
ax.set_ylim(-yMax, yMax)
# add points
ax.scatter(x = tails[:,0],
y = tails[:,1],
facecolors = 'none',
edgecolors = 'k')
# add origin axes
ax.axhline(y = 0, color='k', ls = '--', linewidth = 1) # using dashed for origin
ax.axvline(x = 0, color='k', ls = '--', linewidth = 1)
# add rotation axes
ax.axline(xy1 = (0, 0), color = 'k',
slope = vpR_py[0, 1]/vpR_py[0, 0])
ax.axline(xy1 = (0, 0), color = 'k',
slope = vpR_py[1, 1]/vpR_py[1, 0])
# add arrows
for i in range(0, len(tails)):
ax.annotate("", xy = heads[i,:],
xycoords = 'data',
xytext = tails[i,:],
arrowprops=dict(arrowstyle="->",
color = 'blue'))
plt.tight_layout(pad = 2)
plt.show()
来自matplotlib
来自vegan
以下是如何仅使用 SciPy
和 NumPy
在 Python 中重现 vegan
procrustes
分析。有关如何绘制的上下文,请参阅上面的@Kat 答案:
# Get X and Y which are the first 2 embeddings
# -------------------------------------------
# X <- scores(X, display = scores, ...)
mds_null = pd.read_csv("https://pastebin.com/raw/c1Zwb4pu", index_col=0, sep="\t")
X_original = mds_null.values
# X_original[:5]
# array([[-0.28600328, -0.23135498],
# [ 0.00510282, -0.39170001],
# [ 0.80888622, 1.18656269],
# [ 1.44897916, -0.18887702],
# [ 0.54496396, -0.09403705]])
# Y <- scores(X, display = scores, ...)
mds_alt = pd.read_csv("https://pastebin.com/raw/UMGLgXmu", index_col=0, sep="\t")
Y_original = mds_alt.values
# Y_original[:5]
# array([[ 0.2651293 , 0.09755548],
# [ 0.17639036, 0.45328207],
# [-0.16167398, -1.42304932],
# [-1.44706047, 0.3966337 ],
# [-0.49825949, 0.15239689]])
# Procrustes in VEGAN
# ----------------
procrustes_result = vegan.procrustes(X_original, Y_original, scale=True, symmetric=False, scores="sites")
# Rotation matrix from VEGAN
A = np.array(dict(procrustes_result.items())["rotation"])
# A
# # array([[-0.98171787, -0.19034186],
# # [ 0.19034186, -0.98171787]])
# Center the X and Y data
# -----------------------
X_mean = X_original.mean(axis=0)
X = X_original - X_mean
Y_mean = Y_original.mean(axis=0)
Y = Y_original - Y_mean
# # Rotation matrix from SciPy
# ----------------------------
A, sca = orthogonal_procrustes(X,Y)
# A
# array([[-0.98171787, 0.19034186],
# [-0.19034186, -0.98171787]])
# Manual calculation
# ------------------
# np.linalg.svd returns:
# u{ (…, M, M), (…, M, K) } array
# Unitary array(s). The first a.ndim - 2 dimensions have the same size as those of the input a. The size of the last two dimensions depends on the value of full_matrices. Only returned when compute_uv is True.
# s(…, K) array
# Vector(s) with the singular values, within each vector sorted in descending order. The first a.ndim - 2 dimensions have the same size as those of the input a.
# vh{ (…, N, N), (…, K, N) } array
# Unitary array(s). The first a.ndim - 2 dimensions have the same size as those of the input a. The size of the last two dimensions depends on the value of full_matrices. Only returned when compute_uv is True.
XY = np.dot(X.T, Y) # crossprod(X,Y)
U,s,Vh = np.linalg.svd(XY)
V = Vh.T
A = np.dot(V, U.T)
# A
# array([[-0.98171787, -0.19034186],
# [ 0.19034186, -0.98171787]])
# Reproduce the Yrot object
# -------------------------
scale = True
symmetric = False
# Yrot from VEGAN
Yrot = np.array(dict(procrustes_result.items())["Yrot"])
# Yrot[:5]
# array([[-0.20919556, -0.12656386],
# [-0.07519809, -0.41418755],
# [-0.09706039, 1.23572329],
# [ 1.29483042, -0.098617 ],
# [ 0.44844991, -0.04740275]])
# NumPy implementation
# ctrace <- function(MAT) sum(MAT^2)
def ctrace(MAT):
return np.sum(MAT**2)
# if (scale) {
# c <- sum(sol$d)/ctrace(Y)
if scale:
c = np.sum(s)/ctrace(Y)
# Yrot <- c * Y %*% A
Yrot = c * np.dot(Y,A)
Yrot[:5]
# array([[-0.20919556, -0.12656386],
# [-0.07519809, -0.41418755],
# [-0.09706039, 1.23572329],
# [ 1.29483042, -0.098617 ],
# [ 0.44844991, -0.04740275]])
python
的另一种 PROTEST 实现感谢分享!如果有人感兴趣,这是我在 cross-checking 文献和 VEGAN 实施之后根据 @O.rka 的工作制定的 PROTEST 实施。
from tqdm import trange
def protest(X, Y, no_permutations=999, with_replacement=False):
'''Performs procrustean test (PROTEST).
H_0: "X and Y are no more related than random datasets would be."
Args:
X (np.ndarray): Matrix of shape (n x p11), where n is the number of observations
Y (np.ndarray): Matrix of shape (n x p12)
no_permutations (int): Number of permutations to consider
with_replacement (bool): If set to True, permutations are sampled with replacement.
Returns:
test_statistic (float): Procrustean correlation coefficient
p_value (float): P-Value of PROTEST
RSS (float): Residual Sum of Squares of prucrustes rotations
References:
Gower & Dijksterhuis (2004): "Procrustes Problems"
Gower (1971): "Statistical methods of comparing different multivariate analyses
of the same data", p. 138-149
Peres-Neto & Jackson (2001): "How well do multivariate data sets match? The advantages of
a Procrustean superimposition approach over the Mantel test"
'''
n = X.shape[0]
assert n == Y.shape[0], 'X has to be of shape (n x p1) and Y has to be of shape (n x p2).'
# Get machine float error
EPS = np.sqrt(np.finfo(float).eps)
# # standardize the matrices using Gower-Transformation (Gower 1971)
X, Y = gower_trafo(X), gower_trafo(Y)
# calculate test statistic & residual sum of squares
test_statistic = procrustes_corr(X, Y)
RSS = np.sum(X**2) + np.sum(
Y**2) - 2 * test_statistic # c.f. Gower & Dijksterhuis (2004), equation 4.3
# Get integer index for permutations
index = np.arange(0, n)
# Permute and calculate goodness of fit
permutations = list()
for rs in trange(no_permutations):
permutated_idx = np.random.RandomState(rs).choice(index, replace=with_replacement, size=n)
corr = procrustes_corr(X[permutated_idx], Y)
permutations.append(corr)
permutations = np.array(permutations)
p_value = (np.sum(permutations >= test_statistic - EPS) + 1) / (no_permutations + 1)
# Print Output
print(f'Performed PROTEST with {no_permutations} permutations.')
print(f'Procrustean correlation coefficient: {test_statistic}')
print(f'p-Value: {p_value}')
print(f'Residual Sum of Squares: {RSS}')
return test_statistic, p_value, RSS
def procrustes_corr(X, Y):
'''Calculates the procrustean correlation coefficient.
Args:
X (np.ndarray): Gower-transformed Matrix of shape (n x p1), where n is the number of observations
Y (np.ndarray): Gower-transformed Matrix of shape (n x p2)
Returns:
corr (float): The goodness of fit of the orthogonal procrustes rotation. Procrustean form of Corelation coefficient.
References:
Peres-Neto & Jackson (2001): "How well do multivariate data sets match? The advantages of
a Procrustean superimposition approach over the Mantel test"
'''
XY = np.dot(X.T, Y)
_, s, _ = np.linalg.svd(XY)
# for performing PROTEST (Peres-Neto & Jackson 2001)
corr = np.trace(np.diag(s))
return corr
def gower_trafo(A):
'''Standardizes Matrix A using Gower-Transformation.
Args:
A (np.ndarray): Matrix of shape (n x p), where n is the number of observations
Returns:
A (np.ndarray): Standardized Matrix A
References:
Gower (1971): "Statistical methods of comparing different multivariate analyses
of the same data", p. 138-149
'''
A = A - A.mean(axis=0, keepdims=True)
A = A / np.sqrt(np.sum(A**2))
return A