Python 由许多概率的乘积、大协方差矩阵的求逆及其行列式的计算引起的溢出

Question

所以我有这样的功能：

def logpp(X,m,S):
# Find the number of dimensions from the data vector
d = X.shape[1]

# Invert the covariance matrix
Sinv = np.linalg.inv(S)

# Compute the quadratic terms for all data points
Q = -0.5*(np.dot(X-m,Sinv)*(X-m)).sum(axis=1)

# Raise them quadratic terms to the exponential
Q = np.exp(Q)

# Divide by the terms in the denominator
P = Q / np.sqrt((2*np.pi)**d * np.linalg.det(S))

# Take the product of the probability of each data points
Pprod = np.prod(P)

# Return the log-probability
return np.log(Pprod)

当我生成较大的输入时，结果会溢出。如何重写才能避免溢出？

我的输入函数：

X1 = numpy.random.mtrand.RandomState(123).normal(0,1,[5,len(m1)])
X2 = numpy.random.mtrand.RandomState(123).normal(0,1,[20,len(m2)])
X3 = numpy.random.mtrand.RandomState(123).normal(0,1,[100,len(m3)])

Answer 1

几点建议：

请注意，在处理概率时，您通常希望留在 "log space" 中，因为所有内容都是非负的，并且代码倾向于使用 under/overflow 容易浮点数的值
线性代数库有很多处理病态和其他数值不稳定性的工具，请查看 linalg tutorial 以获得一些指导

在你的情况下，我将函数重写为：

import numpy as np
from scipy import linalg

def logpp(X, m, S):
    X = X - m
    Q = -0.5 * (linalg.solve(S, X.T).T * X).sum(axis=1)

    sgn, logdetS = np.linalg.slogdet(S)
    assert sgn > 0

    logP = Q - 0.5 * (np.log(2*np.pi)*d + logdetS)

    return np.sum(logP)

哪个更适合我

我不完全确定这里使用 solve 的有效性，但我认为它做对了。如果有人可以对此发表评论，那就太好了！

Answer 2

不使用"slove"函数：

def logp_robust(X,m,S):

# Find the number of dimensions from the data vector
d = X.shape[1]
N = X.shape[0]
# Invert the covariance matrix
Sinv = numpy.linalg.inv(S)

# Compute the quadratic terms for all data points
Q = numpy.sum(-0.5*(numpy.dot(X-m,Sinv)*(X-m)).sum(axis=1))


return (Q-0.5 * (d * N)* numpy.log(2*numpy.pi)-0.5* N* numpy.log(numpy.linalg.det(S)))

Python 由许多概率的乘积、大协方差矩阵的求逆及其行列式的计算引起的溢出

Python overflow cause by product of many probabilities, inversion of a large covariance matrix, and computation of its determinant

python

matrix

linear-algebra