Python 由许多概率的乘积、大协方差矩阵的求逆及其行列式的计算引起的溢出
Python overflow cause by product of many probabilities, inversion of a large covariance matrix, and computation of its determinant
所以我有这样的功能:
def logpp(X,m,S):
# Find the number of dimensions from the data vector
d = X.shape[1]
# Invert the covariance matrix
Sinv = np.linalg.inv(S)
# Compute the quadratic terms for all data points
Q = -0.5*(np.dot(X-m,Sinv)*(X-m)).sum(axis=1)
# Raise them quadratic terms to the exponential
Q = np.exp(Q)
# Divide by the terms in the denominator
P = Q / np.sqrt((2*np.pi)**d * np.linalg.det(S))
# Take the product of the probability of each data points
Pprod = np.prod(P)
# Return the log-probability
return np.log(Pprod)
当我生成较大的输入时,结果会溢出。如何重写才能避免溢出?
我的输入函数:
X1 = numpy.random.mtrand.RandomState(123).normal(0,1,[5,len(m1)])
X2 = numpy.random.mtrand.RandomState(123).normal(0,1,[20,len(m2)])
X3 = numpy.random.mtrand.RandomState(123).normal(0,1,[100,len(m3)])
几点建议:
- 请注意,在处理概率时,您通常希望留在 "log space" 中,因为所有内容都是非负的,并且代码倾向于使用 under/overflow 容易浮点数的值
- 线性代数库有很多处理病态和其他数值不稳定性的工具,请查看 linalg tutorial 以获得一些指导
在你的情况下,我将函数重写为:
import numpy as np
from scipy import linalg
def logpp(X, m, S):
X = X - m
Q = -0.5 * (linalg.solve(S, X.T).T * X).sum(axis=1)
sgn, logdetS = np.linalg.slogdet(S)
assert sgn > 0
logP = Q - 0.5 * (np.log(2*np.pi)*d + logdetS)
return np.sum(logP)
哪个更适合我
我不完全确定这里使用 solve
的有效性,但我认为它做对了。如果有人可以对此发表评论,那就太好了!
不使用"slove"函数:
def logp_robust(X,m,S):
# Find the number of dimensions from the data vector
d = X.shape[1]
N = X.shape[0]
# Invert the covariance matrix
Sinv = numpy.linalg.inv(S)
# Compute the quadratic terms for all data points
Q = numpy.sum(-0.5*(numpy.dot(X-m,Sinv)*(X-m)).sum(axis=1))
return (Q-0.5 * (d * N)* numpy.log(2*numpy.pi)-0.5* N* numpy.log(numpy.linalg.det(S)))
所以我有这样的功能:
def logpp(X,m,S):
# Find the number of dimensions from the data vector
d = X.shape[1]
# Invert the covariance matrix
Sinv = np.linalg.inv(S)
# Compute the quadratic terms for all data points
Q = -0.5*(np.dot(X-m,Sinv)*(X-m)).sum(axis=1)
# Raise them quadratic terms to the exponential
Q = np.exp(Q)
# Divide by the terms in the denominator
P = Q / np.sqrt((2*np.pi)**d * np.linalg.det(S))
# Take the product of the probability of each data points
Pprod = np.prod(P)
# Return the log-probability
return np.log(Pprod)
当我生成较大的输入时,结果会溢出。如何重写才能避免溢出?
我的输入函数:
X1 = numpy.random.mtrand.RandomState(123).normal(0,1,[5,len(m1)])
X2 = numpy.random.mtrand.RandomState(123).normal(0,1,[20,len(m2)])
X3 = numpy.random.mtrand.RandomState(123).normal(0,1,[100,len(m3)])
几点建议:
- 请注意,在处理概率时,您通常希望留在 "log space" 中,因为所有内容都是非负的,并且代码倾向于使用 under/overflow 容易浮点数的值
- 线性代数库有很多处理病态和其他数值不稳定性的工具,请查看 linalg tutorial 以获得一些指导
在你的情况下,我将函数重写为:
import numpy as np
from scipy import linalg
def logpp(X, m, S):
X = X - m
Q = -0.5 * (linalg.solve(S, X.T).T * X).sum(axis=1)
sgn, logdetS = np.linalg.slogdet(S)
assert sgn > 0
logP = Q - 0.5 * (np.log(2*np.pi)*d + logdetS)
return np.sum(logP)
哪个更适合我
我不完全确定这里使用 solve
的有效性,但我认为它做对了。如果有人可以对此发表评论,那就太好了!
不使用"slove"函数:
def logp_robust(X,m,S):
# Find the number of dimensions from the data vector
d = X.shape[1]
N = X.shape[0]
# Invert the covariance matrix
Sinv = numpy.linalg.inv(S)
# Compute the quadratic terms for all data points
Q = numpy.sum(-0.5*(numpy.dot(X-m,Sinv)*(X-m)).sum(axis=1))
return (Q-0.5 * (d * N)* numpy.log(2*numpy.pi)-0.5* N* numpy.log(numpy.linalg.det(S)))