NumPy: how to calculate variance along each row of a 2D array using np.var and by hand (i.e., not using np.var; calculating each term explicitly)?

Question

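I am using Python to import data from a large file. There are three columns, corresponding to x, y, and z data. Each row represents the time at which the data were collected. For example: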

importedData = [[1, 2, 3],  # This row: x, y, and z data at time 0.
                [4, 5, 6],
                [7, 8, 9]]

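I want to calculate the variance at each time (row). As far as I know, one way to do this is as follows (I'd appreciate a correction if this is wrong):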

import numpy as np
varPerTimestep = np.var(importedData, axis=1)
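For a three-column array, `axis=1` collapses the columns, so the result has one entry per row (time step). One detail that may matter when comparing against a hand calculation: `np.var` divides by N by default (`ddof=0`, the population variance), while `ddof=1` divides by N - 1 (the sample variance). A minimal sketch on the example data above, assuming it is converted to a float array:

```python
import numpy as np

importedData = np.array([[1, 2, 3],
                         [4, 5, 6],
                         [7, 8, 9]], dtype=float)

# axis=1 reduces across the columns: one value per row (time step).
# ddof=0 (the default) divides by N, giving the population variance.
pop_var = np.var(importedData, axis=1)

# ddof=1 divides by N - 1, giving the sample (unbiased) variance.
sample_var = np.var(importedData, axis=1, ddof=1)

print(pop_var)     # [0.66666667 0.66666667 0.66666667]
print(sample_var)  # [1. 1. 1.]
```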
Here is my question. To convince a colleague that it works, I next want to do the same thing without using np.var. That means solving:

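$$\mathrm{Var}(S) = \langle \bar{S} \cdot \bar{S} \rangle - \langle \bar{S} \rangle \langle \bar{S} \rangle$$

where $\bar{S} = (x, y, z)$ is one row of the data, $\bar{S} \cdot \bar{S}$ is taken element-wise, and $\langle \cdot \rangle$ is the mean over that row.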
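The formula above (mean of the squares minus the square of the mean) maps directly onto two row-wise means. A minimal sketch, assuming the data are held in a float NumPy array; the names `mean_of_squares` and `square_of_mean` are illustrative only:

```python
import numpy as np

importedData = np.array([[1, 2, 3],
                         [4, 5, 6],
                         [7, 8, 9]], dtype=float)

# <S.S>: mean of the element-wise squares in each row
mean_of_squares = np.mean(importedData * importedData, axis=1)

# <S><S>: square of each row's mean
square_of_mean = np.mean(importedData, axis=1) ** 2

varPerTimestep = mean_of_squares - square_of_mean

# Compare with the library result; the two should agree to rounding error.
print(varPerTimestep)
print(np.var(importedData, axis=1))
```

This form gives the population variance (the same as `ddof=0`). It can lose precision when a row's mean is large compared with its spread, because two nearly equal numbers are subtracted; subtracting the mean first and then averaging the squared deviations avoids that.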

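I'm an occasional Python user and just don't know how to do this for each row. I found a suggestion online, but I don't know how to adapt the code below so that it works on each row (sorry, I can't provide the link: when I try, I get an error saying my code is not formatted correctly and I can't post the question, which is also why some of the code is formatted as quotes below):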

def variance(data, ddof=0):
    # ddof=0 gives the population variance; ddof=1 gives the sample variance.
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - ddof)
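The `variance` function above works on a single sequence of numbers, so one way to adapt it is to call it once per row. A minimal sketch, reusing that function unchanged and assuming the data are a list of rows as in the example:

```python
def variance(data, ddof=0):
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - ddof)

importedData = [[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]]

# Apply the one-dimensional function to each row (time step) in turn.
varPerTimestep = [variance(row) for row in importedData]
print(varPerTimestep)  # [0.6666666666666666, 0.6666666666666666, 0.6666666666666666]
```

Iterating over a list of lists (or over a 2D NumPy array) yields one row at a time, so the same comprehension also works on the array version of the data. Passing `ddof=1` would match `np.var(..., ddof=1)` instead of the default.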

I have tried various approaches. For example, to put the function in a loop, I first tried to get the row means:

for row in importedData:
    mean_test = np.mean(importedData,axis=1)
print(mean_test)

This gave me an error I couldn't figure out:

Traceback (most recent call last):
  File "<string>", line 13, in <module>
TypeError: list indices must be integers or slices, not tuple
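In the loop shown earlier, the loop variable `row` is never used: each pass recomputes the means of the whole dataset, so the loop adds nothing. A minimal sketch of two working alternatives, assuming the data are first converted to a NumPy array:

```python
import numpy as np

importedData = np.array([[1, 2, 3],
                         [4, 5, 6],
                         [7, 8, 9]], dtype=float)

# Option 1: a single vectorized call, no loop needed.
mean_test = np.mean(importedData, axis=1)

# Option 2: an explicit loop that actually uses the loop variable `row`.
row_means = []
for row in importedData:
    row_means.append(row.mean())

print(mean_test)  # [2. 5. 8.]
print(row_means)
```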

I also tried this, but got no output, because I seem to be stuck in a loop:

n = len(importedData[0,:])         # Trying to get the length of each row.
mean = mean(importedData[0,:])     # Likewise trying to get the mean of each row.
deviations = [(x - mean) ** 2 for x in importedData]
variance = sum(deviations) / n
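Three things in that attempt are worth separating. `importedData[0,:]` is NumPy-style indexing; a plain Python list receives the tuple `(0, slice(None))` and raises `TypeError: list indices must be integers or slices, not tuple`. `mean = mean(...)` rebinds the name `mean` to a number, so it can no longer be called afterwards. And the comprehension iterates over the rows of `importedData`, not over the entries of one row. A minimal sketch that addresses all three, assuming the list is converted with `np.asarray`; the names `data` and `row_mean` are illustrative:

```python
import numpy as np

importedData = [[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]]

# A plain list rejects [row, column] indexing with a tuple.
try:
    importedData[0, :]
except TypeError as err:
    print("list indexing failed:", err)

# Converting to an ndarray enables [row, column] indexing.
data = np.asarray(importedData, dtype=float)

variances = []
for i in range(data.shape[0]):
    row = data[i, :]                  # all columns of row i
    n = len(row)
    row_mean = np.mean(row)           # a new name, so `mean` is not shadowed
    deviations = (row - row_mean) ** 2
    variances.append(deviations.sum() / n)

print(variances)
```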

I'd be grateful if anyone could point me in the right direction.

Answer 1

Well, you could do something like this to make things more explicit:

import numpy as np 

importedData = np.arange(1,10).reshape(3,3)

# Get means for each row
means = [row.mean() for row in importedData]

# Calculate squared errors
squared_errors = [(row-mean)**2 for row, mean in zip(importedData, means)]

# Calculate "mean for each row of squared errors" (aka the variance)
variances = [row.mean() for row in squared_errors]

# Sanity check
print(variances)
print(importedData.var(1))

# [0.6666666666666666, 0.6666666666666666, 0.6666666666666666]
# [0.66666667 0.66666667 0.66666667]
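The same explicit steps (row means, squared errors, mean of the squared errors) can also be written without Python-level loops by keeping the row means as a column and letting NumPy broadcast them across each row. A minimal sketch under that assumption, using the same example data:

```python
import numpy as np

importedData = np.arange(1, 10).reshape(3, 3)

# keepdims=True keeps the means as a (3, 1) column so that they
# broadcast against the (3, 3) data row by row.
means = importedData.mean(axis=1, keepdims=True)

squared_errors = (importedData - means) ** 2

# Mean of each row of squared errors = population variance per row.
variances = squared_errors.mean(axis=1)

print(np.allclose(variances, importedData.var(axis=1)))  # True
```

Without `keepdims=True` the means would have shape `(3,)` and would broadcast across columns instead of rows, which gives the wrong result for a non-symmetric layout.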
