pandas: fill null values with the mean of their category (using a loop?)
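I am trying to fill the missing values in my dataset with the mean of the values observed in the same year, but writing this out one column at a time takes a long time. I could not get the same structure working with a for loop. How should I code it?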
df['TOTAL_REVENUE'] = df.TOTAL_REVENUE.fillna(df.groupby('YEAR')['TOTAL_REVENUE'].transform('mean'))
df['FEDERAL_REVENUE'] = df.FEDERAL_REVENUE.fillna(df.groupby('YEAR')['FEDERAL_REVENUE'].transform('mean'))
df['STATE_REVENUE'] = df.STATE_REVENUE.fillna(df.groupby('YEAR')['STATE_REVENUE'].transform('mean'))
df['TOTAL_EXPENDITURE'] = df.TOTAL_EXPENDITURE.fillna(df.groupby('YEAR')['TOTAL_EXPENDITURE'].transform('mean'))
I know the following is wrong, but it shows what I am aiming for:
for column in df.columns:
    df[column] = df.column.fillna(df.groupby('YEAR')[column].transform('mean'))
    #df['TOTAL_REVENUE'] = df.TOTAL_REVENUE.fillna(df.groupby('YEAR')['TOTAL_REVENUE'].transform('mean'))
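You would do it like this, using df[column] instead of df.column. Attribute access looks for a column literally named "column", while bracket indexing uses the value of the loop variable: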
for column in df.columns:
    df[column] = df[column].fillna(df.groupby('YEAR')[column].transform('mean'))
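One caveat with looping over every entry in df.columns: the loop also visits YEAR itself and any non-numeric columns, for which a mean is not meaningful and may raise an error. A minimal sketch of a loop-free variant follows, assuming the columns to fill are all numeric. The sample frame, the STATE column, and its values are made up for illustration; only the YEAR and revenue column names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the values are invented for illustration.
df = pd.DataFrame({
    "STATE": ["A", "B", "C", "A", "B", "C"],
    "YEAR": [1992, 1992, 1992, 1993, 1993, 1993],
    "TOTAL_REVENUE": [100.0, np.nan, 300.0, 10.0, 20.0, np.nan],
    "FEDERAL_REVENUE": [np.nan, 4.0, 6.0, 1.0, np.nan, 3.0],
})

# Keep only numeric columns and leave out the grouping key.
value_cols = df.select_dtypes(include="number").columns.drop("YEAR").tolist()

# Per-year mean of each column, aligned row by row with the original frame.
year_means = df.groupby("YEAR")[value_cols].transform("mean")

# Fill every selected column in one step; existing values are left untouched.
df[value_cols] = df[value_cols].fillna(year_means)
```

If a year has no observed value at all for some column, its mean is NaN and those rows stay missing, so a second fallback (for example the overall column mean) may still be needed.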