Numpy: Numerical integration with integration limits

Question

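I have measured peaks that I want to integrate over a specific range.

The data I want to integrate is a numpy array containing wavenumbers and intensities: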

peakQ1_2500_smoothened =
array([[ 1.95594400e+04, -3.70074342e-17,  3.26000000e+00],
       [ 1.95594500e+04,  1.66666667e-03,  4.81500000e+00],
       [ 1.95594600e+04,  2.83333333e-02,  4.80833333e+00],
       [ 1.95594700e+04,  1.33333333e-02,  4.82166667e+00],
       [ 1.95594800e+04,  5.00000000e-03,  4.92416667e+00],
       [ 1.95594900e+04,  5.55555556e-04,  4.99305556e+00],
       [ 1.95595100e+04, -7.77777778e-03,  5.03972222e+00],
       [ 1.95595200e+04, -5.55555556e-03,  4.96888889e+00],
       [ 1.95595300e+04, -1.77777778e-02,  4.91333333e+00],
       [ 1.95595400e+04,  1.38888889e-02,  4.82500000e+00],
       [ 1.95595500e+04,  7.05555556e-02,  4.85722222e+00],
       [ 1.95595600e+04,  1.43888889e-01,  4.86638889e+00],
       [ 1.95595700e+04,  1.98888889e-01,  4.85138889e+00],
       [ 1.95595800e+04,  2.84444444e-01,  4.90694444e+00],
       [ 1.95595900e+04,  4.64444444e-01,  4.93611111e+00],
       [ 1.95596000e+04,  6.61111111e-01,  4.98166667e+00],
       [ 1.95596100e+04,  9.61666667e-01,  4.96722222e+00],
       [ 1.95596200e+04,  1.23222222e+00,  4.94388889e+00],
       [ 1.95596400e+04,  1.43555556e+00,  5.02166667e+00],
       [ 1.95596500e+04,  1.53222222e+00,  5.00500000e+00],
       [ 1.95596600e+04,  1.59833333e+00,  5.03666667e+00],
       [ 1.95596700e+04,  1.66388889e+00,  4.94555556e+00],
       [ 1.95596800e+04,  1.60111111e+00,  4.92777778e+00],
       [ 1.95596900e+04,  1.42333333e+00,  4.94666667e+00],
       [ 1.95597000e+04,  1.14111111e+00,  5.00777778e+00],
       [ 1.95597100e+04,  9.52222222e-01,  5.08555556e+00],
       [ 1.95597200e+04,  7.25555556e-01,  5.09222222e+00],
       [ 1.95597300e+04,  5.80555556e-01,  5.08055556e+00],
       [ 1.95597400e+04,  3.92777778e-01,  5.09611111e+00],
       [ 1.95597500e+04,  2.43222222e-01,  5.01655556e+00],
       [ 1.95597600e+04,  1.36555556e-01,  4.99822222e+00],
       [ 1.95597700e+04,  6.32222222e-02,  4.87044444e+00],
       [ 1.95597800e+04,  3.88888889e-02,  4.91944444e+00],
       [ 1.95597900e+04,  3.22222222e-02,  4.93611111e+00],
       [ 1.95598000e+04,  2.44444444e-02,  5.10277778e+00],
       [ 1.95598100e+04,  5.11111111e-02,  5.11277778e+00],
       [ 1.95598200e+04,  4.44444444e-02,  5.21944444e+00],
       [ 1.95598300e+04,  4.33333333e-02,  5.05333333e+00],
       [ 1.95598400e+04,  3.58333333e-02,  5.08750000e+00],
       [ 1.95598500e+04,  7.50000000e-03,  5.12750000e+00],
       [ 1.95598600e+04,  4.16666667e-03,  5.22916667e+00],
       [ 1.95598800e+04, -1.33333333e-02,  3.51000000e+00]])

I found that I can integrate over the whole array with:

def integratePeak(yvals, xvals):
    I = np.trapz(yvals, x = xvals)
    return I

But how do I integrate between x-limits, for example from 19559.52 to 19559.78?

def integratePeak(yvals, xvals, xlower, xupper):
    '''integrate y over x from xlower to xupper'''
    return I

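I could of course supply the x and y values by explicitly slicing the array as peakQ1_2500_smoothened[7:33,0] and peakQ1_2500_smoothened[7:33,1], but I do not want to refer to array elements. I want to define the integration limits as wavenumbers, because different measured peaks have different array lengths.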

The functions that reduce the data to one point per wavenumber and then take the running average:

import numpy as np
import numpy_indexed as npi  # third-party package that provides group_by

def averagePerWavenumber(data):
    wavenum, intensity, power = data[:,0], data[:,1], data[:,2]
    wavenum_unique, intensity_mean = npi.group_by(wavenum).mean(intensity)
    wavenum_unique, power_mean = npi.group_by(wavenum).mean(power)
    output = np.zeros(shape=(len(wavenum_unique), 3))
    output[:,0] = wavenum_unique
    output[:,1] = intensity_mean
    output[:,2] = power_mean
    return output

def smoothening(data, bins):
    output = np.zeros(shape=(len(data[:,0]), 3))
    output[:,0] = data[:,0]
    output[:,1] = np.convolve(data[:,1], np.ones(bins), mode='same') / bins
    output[:,2] = np.convolve(data[:,2], np.ones(bins), mode='same') / bins
    return output

Answer 1

def integratePeak(yvals, xvals, xlower, xupper):
    '''integrate y over x from xlower to xupper.

    Use trapz to integrate over points closest to xlower, xupper.
    
    the +1 to idx_max is for numpy half-open indexing.
    '''
    idx_min = np.argmin(np.abs(xvals - xlower))
    idx_max = np.argmin(np.abs(xvals - xupper)) + 1
    result = np.trapz(yvals[idx_min:idx_max], x=xvals[idx_min:idx_max])
    return result
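
One caveat worth illustrating: because the limits are snapped to the nearest samples, the interval that is actually integrated can differ from the one requested. A small sketch with a made-up grid (the array x and the limits 0.4 and 2.6 are illustrative only, not taken from the question):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # made-up sample positions

# Same index logic as integratePeak above, applied to limits 0.4 and 2.6.
idx_min = int(np.argmin(np.abs(x - 0.4)))       # nearest sample to 0.4
idx_max = int(np.argmin(np.abs(x - 2.6))) + 1   # nearest sample to 2.6, plus 1

# The slice x[idx_min:idx_max] runs between these two sample positions.
effective_limits = (float(x[idx_min]), float(x[idx_max - 1]))
print(effective_limits)
```

Here a request for 0.4 to 2.6 ends up integrating from 0.0 to 3.0. With finely spaced data the difference is usually small, but it grows with the sample spacing.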

As an aside, you may benefit from using pandas for tabular data. It interoperates well with numpy arrays and, most importantly, lets you label your data:

import pandas as pd
df = pd.DataFrame(peakQ1_2500_smoothened, columns=["wave_num", "intensity", "col3"])

integratePeak(yvals=df.intensity, xvals=df.wave_num, xlower=19559.52, xupper=19559.78)

# 0.18853555549577536

Answer 2

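Let's first look at what np.trapz actually does. The area of the i-th trapezoid is the average height times the width: 0.5 * (y[i + 1] + y[i]) * (x[i + 1] - x[i]). If you have a fixed dx instead of an x array, the last term is just a scalar. So let's rewrite your first function: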

def integrate_peak0(y, x):
    """ x can be array of same size as y or a scalar """
    # np.size also accepts a plain Python float, which has no .size attribute
    dx = x if np.size(x) <= 1 else np.diff(x)
    return np.sum(0.5 * (y[1:] + y[:-1]) * dx)
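
As a quick sanity check, the manual sum can be compared against NumPy's own routine. The sketch below uses a made-up, unevenly spaced grid and the hypothetical name trapezoid_sum. It also allows for newer NumPy versions, where np.trapz is deprecated in favour of np.trapezoid:

```python
import numpy as np

# np.trapezoid was added in NumPy 2.0 as the new name for np.trapz;
# use whichever one this installation provides.
trapezoid = np.trapezoid if hasattr(np, "trapezoid") else np.trapz

def trapezoid_sum(y, x):
    """Sum of trapezoid areas; x is an array like y, or a scalar spacing."""
    y = np.asarray(y, dtype=float)
    dx = x if np.size(x) <= 1 else np.diff(x)
    return np.sum(0.5 * (y[1:] + y[:-1]) * dx)

x = np.array([0.0, 1.0, 3.0, 4.0])   # unevenly spaced sample positions
y = np.array([0.0, 2.0, 2.0, 0.0])

manual = trapezoid_sum(y, x)         # trapezoid areas 1 + 4 + 1
library = trapezoid(y, x=x)          # NumPy's result for the same data
uniform = trapezoid_sum(y, 0.5)      # same y, constant spacing of 0.5
print(manual, library, uniform)
```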

Now the hard part is handling the limits of integration. Since x is sorted, you can use np.searchsorted to convert the limits into indices into the data:

limits = np.array([xlower, xupper])
indices = np.searchsorted(x, limits)
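
To make the behaviour of np.searchsorted concrete, here is a minimal sketch on a made-up grid (the values are illustrative only). With the default side="left", the returned index i satisfies x[i - 1] < limit <= x[i], so limits that fall between samples map to the same indices as limits that sit exactly on the next sample:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # sorted, made-up sample positions

# Limits that coincide with samples.
on_grid = np.searchsorted(x, [1.0, 3.0])

# Limits that fall between samples give the index of the next sample up.
off_grid = np.searchsorted(x, [0.4, 2.6])

print(on_grid, off_grid)
```

Both calls return the same pair of indices, which is why the raw indices alone are not enough when the limits lie between samples.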

If the limits always fell on exact values of x, you could use indices directly:

def integrate_peak1(y, x, xlower, xupper):
    indices = np.searchsorted(x, [xlower, xupper])
    s = slice(indices[0], indices[1] + 1)
    return np.trapz(y[s], x[s])

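Since that will almost never be the case, you can try the next simplest approach: rounding to the nearest value. You can use fancy indexing to get a 2D array of the candidate bounds, to which np.argmin can then be applied: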

candidates = x[np.stack((indices - 1, indices), axis=0)]
offset = np.abs(candidates - limits).argmin(axis=0) - 1
indices += offset

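candidates is a 2x2 array in which the columns correspond to the two limits and the rows hold the smaller and larger candidate for each. offset is the amount by which you need to adjust each index to reach the nearest neighbor. Here is a version of the integrator that picks the nearest bin for each integration limit: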

def integrate_peak2(y, x, xlower, xupper):
    limits = np.array([xlower, xupper])
    indices = np.searchsorted(x, limits)
    candidates = x[np.stack((indices - 1, indices), axis=0)]
    indices += np.abs(candidates - limits).argmin(axis=0) - 1

    s = slice(indices[0], indices[1] + 1)
    return np.trapz(y[s], x[s])
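
The intermediate arrays are easier to follow on a tiny example. The sketch below traces the same steps on a made-up grid with the limits 0.4 and 2.6 (illustrative values, not from the question):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # made-up sample positions
limits = np.array([0.4, 2.6])

indices = np.searchsorted(x, limits)
# Row 0 holds the sample below each limit, row 1 the sample at or above it.
candidates = x[np.stack((indices - 1, indices), axis=0)]
# argmin picks row 0 or 1 per limit; subtracting 1 turns that into -1 or 0.
offset = np.abs(candidates - limits).argmin(axis=0) - 1
nearest = indices + offset

print(candidates)
print(offset, nearest)
```

For these values, 0.4 snaps down to x[0] and 2.6 snaps up to x[3]. Note that a limit below x[0] makes indices - 1 equal to -1, which wraps around to the last element, so the limits should be validated before this step.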

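The final version interpolates the values of y at the limits, based on x. It can be implemented in one of two ways. You can either compute the target y values and pass them to np.trapz with the appropriate x, or you can do the operation yourself using the function defined in integrate_peak0.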

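Given a point xn with x[i] < xn <= x[i + 1], you can estimate yn = y[i] + (y[i + 1] - y[i]) * (xn - x[i]) / (x[i + 1] - x[i]). Here, x[i] and x[i + 1] are the values of candidates shown above, y[i] and y[i + 1] are the corresponding elements of y, and xn is limits. You can therefore compute the interpolated values in a couple of different ways.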
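
As a worked instance of the interpolation formula, the sketch below evaluates it for a single point on made-up data and compares the result with np.interp, which performs the same piecewise-linear interpolation. The use of np.interp is a cross-check added here, not part of the approach described above:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])     # made-up sample positions
y = np.array([0.0, 10.0, 20.0, 0.0])   # made-up sample values
xn = 2.25                               # point between x[2] and x[3]

# searchsorted gives the upper neighbour, so subtract 1 for the lower one.
i = int(np.searchsorted(x, xn)) - 1     # x[i] < xn <= x[i + 1]
yn = y[i] + (y[i + 1] - y[i]) * (xn - x[i]) / (x[i + 1] - x[i])

yn_interp = np.interp(xn, x, y)         # same linear interpolation via NumPy
print(i, yn, yn_interp)
```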

One way is to adjust the inputs to trapz:

def integrate_peak3a(y, x, xlower, xupper):
    limits = np.array([xlower, xupper])
    indices = np.searchsorted(x, limits)
    indices = np.stack((indices - 1, indices), axis=0)
    xi = x[indices]
    yi = y[indices]
    yn = yi[0] + np.diff(yi, axis=0) * (limits - xi[0]) / np.diff(xi, axis=0)

    indices = indices[[1, 0], [0, 1]]
    s = slice(indices[0], indices[1] + 1)
    return np.trapz(np.r_[yn[0, 0], y[s], yn[0, 1]], np.r_[xlower, x[s], xupper])

Another way is to add the areas of the edge segments manually:

def integrate_peak3b(y, x, xlower, xupper):
    limits = np.array([xlower, xupper])
    indices = np.searchsorted(x, limits)
    indices = np.stack((indices - 1, indices), axis=0)
    xi = x[indices]
    yi = y[indices]
    yn = yi[0] + np.diff(yi, axis=0) * (limits - xi[0]) / np.diff(xi, axis=0)

    indices = indices[[1, 0], [0, 1]]
    s = slice(indices[0], indices[1] + 1)
    return np.trapz(y[s], x[s]) - 0.5 * np.diff((yn + y[indices]) * (x[indices] - limits))

Of course, you can also run the inputs to np.trapz in integrate_peak3a through the "manual" computation in integrate_peak0.

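In all of these cases, checking that the limits of integration are within acceptable bounds and in the correct order is left as an exercise for the reader.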
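
For readers who want a starting point for that exercise, here is one possible sketch. The function name integrate_peak_checked and the specific checks are assumptions made for illustration, and the end points are obtained with np.interp rather than the manual indexing used above:

```python
import numpy as np

# np.trapezoid was added in NumPy 2.0 as the new name for np.trapz.
trapezoid = np.trapezoid if hasattr(np, "trapezoid") else np.trapz

def integrate_peak_checked(y, x, xlower, xupper):
    """Hypothetical wrapper: validate the limits, then integrate with
    linearly interpolated end points."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D arrays of equal length")
    if np.any(np.diff(x) <= 0):
        raise ValueError("x must be strictly increasing")
    if xlower > xupper:
        raise ValueError("xlower must not exceed xupper")
    if xlower < x[0] or xupper > x[-1]:
        raise ValueError("limits must lie within [x[0], x[-1]]")

    inside = (x > xlower) & (x < xupper)   # samples strictly between the limits
    xs = np.concatenate(([xlower], x[inside], [xupper]))
    ys = np.concatenate(([np.interp(xlower, x, y)], y[inside],
                         [np.interp(xupper, x, y)]))
    return trapezoid(ys, x=xs)

# Made-up triangular peak: y = 2x on [0, 2] and y = 8 - 2x on [2, 4].
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 2.0, 4.0, 2.0, 0.0])
area = integrate_peak_checked(y, x, 0.5, 3.5)
print(area)
```

Raising ValueError is only one design choice. Depending on the application, clipping the limits to the data range or swapping reversed limits may be more appropriate.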
