Calculating a 3D gradient with unevenly spaced points
I currently have a volume spanned by a few million unevenly spaced particles, each with an attribute (potential, for those curious), from which I would like to calculate the local force (acceleration).
np.gradient only works with evenly spaced data. I looked here: Second order gradient in numpy, where interpolation is necessary, but I could not find a 3D spline implementation in Numpy.
Some code that will produce representative data:
import numpy as np
from scipy.spatial import cKDTree
x = np.random.uniform(-10, 10, 10000)
y = np.random.uniform(-10, 10, 10000)
z = np.random.uniform(-10, 10, 10000)
phi = np.random.uniform(-10**9, 0, 10000)
kdtree = cKDTree(np.c_[x,y,z])
_, index = kdtree.query([0,0,0], 32) #find 32 nearest particles to the origin
#find the gradient at (0,0,0) by considering the 32 nearest particles?
(My question is very similar to Function to compute 3D gradient with unevenly spaced sample locations, but that one never seemed to get a solution, so I thought I'd ask again.)
Any help would be appreciated.
Intuitively, for the derivative at a single data point I would do the following:
- Slice the surrounding data: data = phi[x_id-1:x_id+1, y_id-1:y_id+1, z_id-1:z_id+1]. The kd-tree approach looks very good, and of course you can also use it to obtain a subset of the data.
- Fit a 3D polynomial; you may want to look at polyvander3d. Define the point in the middle of the slice as the center, compute the offsets to the other points, and pass these as coordinates to polyfit.
- Differentiate the polynomial at your position.
This would be a simple solution to your problem. It might be slow, though.
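As a rough sketch of this recipe (query the k nearest neighbors, fit a local 3D polynomial to them, differentiate at the center), here is one way it could look; the helper name, the test field, and the degree are my own illustrative choices, not the OP's data:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
pts = rng.uniform(-10, 10, (10000, 3))
phi = np.sum(pts**2, axis=1)        # smooth test field r^2, gradient = 2*x

def local_gradient(center, tree, pts, phi, k=32, deg=2):
    """Fit a local 3D polynomial to the k nearest neighbors of `center`
    and return the gradient of the fit at `center`."""
    _, idx = tree.query(center, k)
    d = pts[idx] - center           # offsets from the center
    # Vandermonde matrix of the monomials x^i y^j z^l, 0 <= i,j,l <= deg
    V = np.polynomial.polynomial.polyvander3d(
        d[:, 0], d[:, 1], d[:, 2], [deg, deg, deg])
    coef, *_ = np.linalg.lstsq(V, phi[idx], rcond=None)
    c = coef.reshape(deg + 1, deg + 1, deg + 1)
    # at the center every offset is 0, so only the linear monomials survive
    return np.array([c[1, 0, 0], c[0, 1, 0], c[0, 0, 1]])

tree = cKDTree(pts)
g = local_gradient(np.array([1.0, -2.0, 3.0]), tree, pts, phi)
print(g)  # should be close to the analytical gradient (2, -4, 6)
```

Note that the phi in the question is pure noise, so a fit like this is only meaningful on reasonably smooth data.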
Edit:
Actually, this seems to be the usual approach: https://scicomp.stackexchange.com/questions/480/how-can-i-numerically-differentiate-an-unevenly-sampled-function
The accepted answer discusses differentiating an interpolating polynomial, although apparently that polynomial should cover all the data (Vandermonde matrix). That is impossible for you; there is too much data. Taking local subsets seems quite reasonable.
A lot depends on the signal/noise ratio of your potential data. Your example is all noise, so "fitting" anything to it will be "over-fitting." The degree of noisiness will determine the degree to which you want to fit polynomials (as in lhk's answer) versus krige (using pyKriging or otherwise).
I'd suggest using query(x, distance_upper_bound) instead of query(x, k), as this may prevent some instabilities due to clustering.
I'm not a mathematician, but I would expect that fitting polynomials to a distance-dependent subset of data would be spatially unstable, especially as the polynomial order increases. This would make your resulting gradient field discontinuous.
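For what it's worth, a small sketch of the radius-capped query (the radius and k values here are arbitrary illustrations): in SciPy's cKDTree, distance_upper_bound is combined with a neighbor count k, and absent neighbors come back with an infinite distance and index == tree.n, so they must be filtered out; query_ball_point is the pure fixed-radius alternative:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)
pts = rng.uniform(-10, 10, (10000, 3))
tree = cKDTree(pts)

# cap the search radius instead of trusting the k-th neighbor distance;
# neighbors missing within the radius get dist == inf and index == tree.n
dists, idxs = tree.query([0.0, 0.0, 0.0], k=64, distance_upper_bound=1.5)
neighbors = idxs[np.isfinite(dists)]

# a pure fixed-radius query returns the same set of points
ball = tree.query_ball_point([0.0, 0.0, 0.0], r=1.5)
print(sorted(neighbors), sorted(ball))
```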
Here is a Julia implementation that does what you ask:
using NearestNeighbors, LinearAlgebra, Statistics, Printf
n = 3
k = 32  # for stability use k > n*(n+3)/2
# take a point near the center of the cube
point = 0.5 .+ rand(n) * 1e-3
data = rand(n, 10^4)
kdtree = KDTree(data)
idxs, dists = knn(kdtree, point, k, true)
# coords of the k nearest neighbors
X = data[:, idxs]
# least-squares recipe for the coefficients
C = point * ones(1, k)   # central node
dX = X - C               # diffs from the central node
G = dX' * dX
F = G .* G
v = diag(G)
N = I - pinv(G) * G      # projector onto the nullspace of G
a = N * pinv(F * N) * v  # ...these are the coeffs
# use a temperature distribution of T = 25.4 * r^2,
# whose analytical gradient is gradT = 25.4 * 2x
T  = 25.4 * n * vec(mean(X .* X, dims=1))
Tc = 25.4 * n * vec(mean(C .* C, dims=1))  # central node
dT = T - Tc              # diffs from the central node
y = dX * (a .* dT)       # estimated gradient
g = 2 * 25.4 * point     # analytical gradient
# print results
@printf "Estimated Grad  = %s\n" string(y')
@printf "Analytical Grad = %s\n" string(g')
@printf "Relative Error  = %.8f\n" norm(g - y) / norm(g)
This approach gives about 1% relative error. Here are the results from a few runs...
Estimated Grad = [25.51670916224472 25.421038632006926 25.6711949674633]
Analytical Grad = [25.41499027802736 25.44913042322385 25.448202594123806]
Relative Error = 0.00559934
Estimated Grad = [25.310574056859014 25.549736360607493 25.368056350800604]
Analytical Grad = [25.43200914200516 25.43243178887198 25.45061497749628]
Relative Error = 0.00426558
Update
I don't know Python very well, but here is a translation that seems to work:
import numpy as np
from scipy.spatial import KDTree

n = 3
k = 32
# fill the cube with random points
data = np.random.rand(10000, n)
kdtree = KDTree(data)
# pick a point (at the center of the cube)
point = 0.5 * np.ones((1, n))
# coords of the k nearest neighbors
dists, idxs = kdtree.query(point, k)
idxs = idxs[0]
X = data[idxs, :]
# calculate the coefficients
C = np.tile(point, (k, 1))              # central node
dX = X - C                              # diffs from the central node
G = dX @ dX.T
F = G * G                               # Hadamard product
v = np.diag(G)
N = np.eye(k) - np.linalg.pinv(G) @ G
a = N @ np.linalg.pinv(F @ N) @ v       # these are the coeffs
# temperature distribution is T = 25.4 * r^2
T = 25.4 * n * np.mean(X * X, axis=1)
Tc = 25.4 * n * np.mean(C * C, axis=1)  # central node
dT = T - Tc                             # diffs from the central node
# analytical gradient ==> gradT = 2 * 25.4 * x
g = 2 * 25.4 * point
print("g[]: %s" % (g))
# estimated gradient
y = dX.T @ (a * dT)
print("y[]: %s, Relative Error = %.8f" % (y, np.linalg.norm(g - y) / np.linalg.norm(g)))
Update #2
I figured I could write something understandable using formatted ASCII instead of LaTeX...
Given a set of M vectors in n-dimensions (call them b_k), find a set of
coeffs (call them a_k) which yields the best estimate of the identity
matrix and the zero vector

                              M
 (1) min ||E - I||, where E = sum a_k b_k b_k
     a_k                      k=1

                              M
 (2) min ||z - 0||, where z = sum a_k b_k
     a_k                      k=1


Note that the basis vectors {b_k} are not required
to be normalized, orthogonal, or even linearly independent.

First, define the following quantities:

 B ==> matrix whose columns are the b_k
 G = B'.B ==> transpose of B times B
 F = G @ G ==> @ represents the Hadamard product
 v = diag(G) ==> vector composed of diag elements of G

The above minimizations are equivalent to this linearly constrained problem

 Solve F.a = v
 s.t.  G.a = 0

Let {X} denote the Moore-Penrose inverse of X.
Then the solution of the linear problem can be written:

 N = I - {G}.G ==> projector into nullspace of G
 a = N . {F.N} . v

The utility of these coeffs is that they allow you to write
very simple expressions for the derivatives of a tensor field.


Let D be the del (or nabla) operator
and d be the difference operator wrt the central (aka 0th) node,
so that, for any scalar/vector/tensor quantity Y, we have:
 dY = Y - Y_0

Let x_k be the position vector of the kth node.
And for our basis vectors, take
 b_k = dx_k = x_k - x_0.

Assume that each node has a field value associated with it
(e.g. temperature), and assume a quadratic model [about x = x_0]
for the field [g=gradient, H=hessian, ":" is the double-dot product]

 Y = Y_0 + (x-x_0).g + (x-x_0)(x-x_0):H/2
 dY = dx.g + dxdx:H/2
 D2Y = I:H ==> Laplacian of Y


Evaluate the model at the kth node

 dY_k = dx_k.g + dx_k dx_k:H/2

Multiply by a_k and sum

  M               M                 M
 sum a_k dY_k =  sum a_k dx_k.g +  sum a_k dx_k dx_k:H/2
 k=1             k=1               k=1

               = 0.g + I:H/2
               = D2Y / 2

Thus, we have a second order estimate of the Laplacian

              M
 Lap(Y_0) = sum 2 a_k dY_k
            k=1


Now play the same game with a linear model
 dY_k = dx_k.g

But this time multiply by (a_k dx_k) and sum

  M                    M
 sum a_k dx_k dY_k =  sum a_k dx_k dx_k.g
 k=1                  k=1

                   = I.g
                   = g


In general, the derivatives at the central node can be estimated as

        M
 D#Y = sum a_k dx_k#dY_k
       k=1

        M
 D2Y = sum 2 a_k dY_k
       k=1

 where
   # stands for the {dot, cross, or tensor} product
     yielding the {div, curl, or grad} of Y
 and
   D2Y stands for the Laplacian of Y
   D2Y = D.DY = Lap(Y)
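To illustrate the general formulas, the same coefficients a_k give the Laplacian as well as the gradient. This sketch repeats the coefficient recipe in NumPy (as in the Python translation above) and applies both estimates to the quadratic test field T = 25.4*r^2, whose analytical gradient is 2*25.4*x and whose Laplacian is 2*25.4*n:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(2)
n, k = 3, 32
data = rng.random((10000, n))
tree = cKDTree(data)
point = 0.5 * np.ones(n)

_, idxs = tree.query(point, k)
dX = data[idxs] - point                    # the basis vectors b_k = dx_k
G = dX @ dX.T
F = G * G                                  # Hadamard product
v = np.diag(G)
N = np.eye(k) - np.linalg.pinv(G) @ G      # projector onto nullspace of G
a = N @ np.linalg.pinv(F @ N) @ v          # the coefficients a_k

T = 25.4 * np.sum(data[idxs]**2, axis=1)   # field values at the k nodes
dT = T - 25.4 * np.sum(point**2)           # dY_k = Y_k - Y_0

grad = dX.T @ (a * dT)                     # D#Y with # = tensor product
lap = 2.0 * np.sum(a * dT)                 # D2Y = sum 2 a_k dY_k
print(grad)  # analytical gradient is 2*25.4*point = (25.4, 25.4, 25.4)
print(lap)   # analytical Laplacian is 2*25.4*n = 152.4
```

Like the gradient, the Laplacian estimate carries a small residual error, since E = sum a_k b_k b_k only approximates the identity.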
My late two cents. In the case where the points are uniformly distributed over a large region of space, it is usual to extract only local information for each particle.
As you may have noticed, there are several ways to extract local information:
- The N nearest neighbors, using a KD-tree for example. This defines locality dynamically, which may or may not be a good idea.
- Randomly partitioning space with planes to group the particles; essentially testing N inequalities to cut space N times.
Once a locality is defined, you can fit a polynomial to it, which can be differentiated analytically. I'd encourage more thought about the different definitions of locality. (They might produce interesting differences.)
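A sketch of the second locality definition (the cut positions and counts here are arbitrary illustrations): axis-aligned planes reduce each particle to a cell index through a few inequality tests per axis, which np.digitize performs directly:

```python
import numpy as np

rng = np.random.default_rng(3)
pts = rng.uniform(-10, 10, (10000, 3))

# four cutting planes per axis -> 5*5*5 = 125 cells
planes = np.linspace(-10, 10, 6)[1:-1]     # interior plane positions
bins = [np.digitize(pts[:, i], planes) for i in range(3)]
cell_id = bins[0] * 25 + bins[1] * 5 + bins[2]

# group particle indices by cell; each group is one "locality"
groups = {}
for i, c in enumerate(cell_id):
    groups.setdefault(int(c), []).append(i)
print(len(groups), "non-empty cells")
```

A polynomial could then be fitted per group exactly as in lhk's answer; unlike the k-nearest-neighbor definition, the group size here varies with the local particle density.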