Z-score normalization in pandas DataFrame (python)

Question

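I am using python3 (spyder), and I have a table whose object type is "pandas.core.frame.DataFrame". I want to z-score normalize the values in that table (subtract from each value the mean of its row and divide by the sd of its row), so that every row has mean = 0 and sd = 1. I have tried two approaches.

First approach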

from scipy.stats import zscore
zetascore_table=zscore(table,axis=1)

Second approach

rows=table.index.values
columns=table.columns
import numpy as np
for i in range(len(rows)):
    for j in range(len(columns)):
         table.loc[rows[i],columns[j]]=(table.loc[rows[i],columns[j]] - np.mean(table.loc[rows[i],]))/np.std(table.loc[rows[i],])
table

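Both approaches seem to run, but when I check the mean and standard deviation of each row, they are not 0 and 1 as expected but other float values. I don't know what the problem might be.

Thanks in advance for your help!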

Answer 1

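Sorry, after thinking about it I found another way to compute the z-score that is simpler than the for loops (subtract each row's mean and divide the result by that row's sd):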

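table = table.T  # transpose so that each original row becomes a column
# axis and ddof are spelled out so the result does not depend on how np.std/np.mean dispatch on a DataFrame
sd = table.std(axis=0, ddof=0)  # population sd of each column (each original row)
mean = table.mean(axis=0)  # mean of each column (each original row)
numerator = table - mean  # numerator in the formula for z-score
z_score = numerator / sd
z_norm_table = z_score.T  # transpose again: the initial table, with all values z-scored by row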

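I checked, and now each row's mean is 0 or very close to 0 and the sd is 1 or very close to 1, so this works for me. Sorry, I don't have much coding experience, and sometimes simple things take a lot of trial and error before I figure out how to solve them.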
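
For comparison, the same row-wise calculation can be sketched without the double transpose by aligning the per-row statistics on the index. The small `table` below is made-up sample data, not the asker's, and `ddof=0` is an assumption chosen to match NumPy's default for `np.std`. pandas' own `.std()` defaults to `ddof=1`, so checking the result with a plain `.std(axis=1)` would show row sds slightly different from 1.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's table
table = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 50.0], [5.0, 7.0, 9.0, 15.0]],
    index=["r1", "r2", "r3"],
    columns=["a", "b", "c", "d"],
)

# Compute the statistics once, before anything is modified
row_mean = table.mean(axis=1)        # one mean per row
row_sd = table.std(axis=1, ddof=0)   # population sd per row

# axis=0 aligns each per-row Series with the DataFrame's index
z_norm_table = table.sub(row_mean, axis=0).div(row_sd, axis=0)

# Check: each row should have mean ~0 and population sd ~1
print(z_norm_table.mean(axis=1))
print(z_norm_table.std(axis=1, ddof=0))
```

Note that the nested loop in the question recomputes the mean and sd from a row that has already been partly overwritten, which is one reason its results drift. Computing the statistics once up front, as here, avoids that.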

Answer 2

The code below computes the z-score of every value in a column of a pandas df. It then stores the z-scores in a new column (here called 'num_1_zscore'). Very easy to do.

from scipy.stats import zscore
import pandas as pd

# Create a sample df
df = pd.DataFrame({'num_1': [1,2,3,4,5,6,7,8,9,3,4,6,5,7,3,2,9]})

# Calculate the zscores and drop zscores into new column
df['num_1_zscore'] = zscore(df['num_1'])

print(df)  # display(df) also works inside an IPython/Jupyter session
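
Since the question asks for normalization by row rather than by column, here is a minimal sketch of how the same `zscore` function might be applied across rows with `axis=1`. The sample frame and the name `row_z` are hypothetical. The values are passed in as a NumPy array and the result is wrapped back into a DataFrame so the row and column labels are kept.

```python
import numpy as np
import pandas as pd
from scipy.stats import zscore

# Hypothetical sample data with several columns per row
df = pd.DataFrame(
    {"a": [1.0, 10.0, 5.0], "b": [2.0, 20.0, 7.0], "c": [6.0, 60.0, 15.0]},
    index=["r1", "r2", "r3"],
)

# axis=1 standardizes across each row (zscore uses ddof=0 by default);
# wrapping the array restores the original index and column labels
row_z = pd.DataFrame(
    zscore(df.to_numpy(), axis=1), index=df.index, columns=df.columns
)

print(row_z)
```

When verifying the result, use `row_z.std(axis=1, ddof=0)`, because the pandas default of `ddof=1` will not give exactly 1.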

normalization

python-3.x

pandas

spyder