Reproducing Excel average and rounding in Python
I have read several questions about rounding in Python, but I can't reproduce my Excel results.
I have a row of numbers and NAs:
Reprex via print(reprex.to_dict())
{'Column 1': {0: 0}, 'Column 2': {0: 0}, 'Column 3': {0: 95}, 'Column 4': {0: 2}, 'Column 5': {0: 2}, 'Column 6': {0: 0}, 'Column 7': {0: 83}, 'Column 8': {0: 95}, 'Column 9': {0: 100}, 'Column 10': {0: 90}, 'Column 11': {0: 7}, 'Column 12': {0: 0}, 'Column 13': {0: 98}, 'Column 14': {0: 97}, 'Column 15': {0: 14}, 'Column 16': {0: 1}, 'Column 17': {0: 0}, 'Column 18': {0: 3}, 'Column 19': {0: 7}, 'Column 20': {0: 9}, 'Column 21': {0: 5}, 'Column 22': {0: 6}, 'Column 23': {0: 10}, 'Column 24': {0: 4}, 'Column 25': {0: 7}, 'Column 26': {0: 5}, 'Column 27': {0: 13}, 'Column 28': {0: 3}, 'Column 29': {0: 5}, 'Column 30': {0: 0}, 'Column 31': {0: 97}, 'Column 32': {0: 96}, 'Column 33': {0: 97}, 'Column 34': {0: 98}, 'Column 35': {0: 97}, 'Column 36': {0: 100}, 'Column 37': {0: 97}, 'Column 38': {0: 97}, 'Column 39': {0: 97}, 'Column 40': {0: 91}, 'Column 41': {0: 97}, 'Column 42': {0: 5}, 'Column 43': {0: 10}, 'Column 44': {0: nan}, 'Column 45': {0: 10}, 'Column 46': {0: 7}, 'Column 47': {0: 8}, 'Column 48': {0: 6}, 'Column 49': {0: 14}, 'Column 50': {0: 22}, 'Column 51': {0: 17}, 'Column 52': {0: 8}, 'Column 53': {0: 21}, 'Column 54': {0: 19}, 'Column 55': {0: 20}, 'Column 56': {0: 18}, 'Column 57': {0: 15}, 'Column 58': {0: 19}}
The Excel function Average() gives me 35.85964912, which rounds to 36.
I checked several times that I am subsetting the columns correctly.
When I do
cols = df.iloc[: , 133:191]
df['score'] = cols.mean(axis = 1)
Python gives a mean of 37.905660.
Rounding therefore gives 38 instead of the 36 from Excel, using
df = df.round({'Overall_mean_procedure_PB_score': 0})
A difference of 2 is huge. Maybe the NAs are causing the problem here.
How can I do this kind of calculation correctly, matching Excel?
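On the rounding step alone: Excel's ROUND rounds ties away from zero, whereas Python's built-in round() rounds ties to the nearest even number. That cannot explain a gap of 2, but it can matter for values ending in exactly .5. A minimal sketch of Excel-style rounding, assuming a hypothetical helper named excel_round (not part of pandas or the original post):

```python
from decimal import Decimal, ROUND_HALF_UP


def excel_round(value, digits=0):
    """Round ties away from zero, as Excel's ROUND does (hypothetical helper)."""
    quantum = Decimal(1).scaleb(-digits)  # 1, 0.1, 0.01, ... depending on digits
    as_decimal = Decimal(str(float(value)))  # go through str to avoid binary noise
    return float(as_decimal.quantize(quantum, rounding=ROUND_HALF_UP))


print(excel_round(35.85964912))  # 36.0
print(excel_round(2.5))          # 3.0
print(round(2.5))                # 2
```

For the two means in the question, both rounding rules give the same integers (36 and 38), so the discrepancy has to come from the mean itself.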
Reproducible data:
import pandas as pd
import numpy as np
data = {
"Column 1": {0: 0},
"Column 2": {0: 0},
"Column 3": {0: 95},
"Column 4": {0: 2},
"Column 5": {0: 2},
"Column 6": {0: 0},
"Column 7": {0: 83},
"Column 8": {0: 95},
"Column 9": {0: 100},
"Column 10": {0: 90},
"Column 11": {0: 7},
"Column 12": {0: 0},
"Column 13": {0: 98},
"Column 14": {0: 97},
"Column 15": {0: 14},
"Column 16": {0: 1},
"Column 17": {0: 0},
"Column 18": {0: 3},
"Column 19": {0: 7},
"Column 20": {0: 9},
"Column 21": {0: 5},
"Column 22": {0: 6},
"Column 23": {0: 10},
"Column 24": {0: 4},
"Column 25": {0: 7},
"Column 26": {0: 5},
"Column 27": {0: 13},
"Column 28": {0: 3},
"Column 29": {0: 5},
"Column 30": {0: 0},
"Column 31": {0: 97},
"Column 32": {0: 96},
"Column 33": {0: 97},
"Column 34": {0: 98},
"Column 35": {0: 97},
"Column 36": {0: 100},
"Column 37": {0: 97},
"Column 38": {0: 97},
"Column 39": {0: 97},
"Column 40": {0: 91},
"Column 41": {0: 97},
"Column 42": {0: 5},
"Column 43": {0: 10},
"Column 44": {0: np.nan},
"Column 45": {0: 10},
"Column 46": {0: 7},
"Column 47": {0: 8},
"Column 48": {0: 6},
"Column 49": {0: 14},
"Column 50": {0: 22},
"Column 51": {0: 17},
"Column 52": {0: 8},
"Column 53": {0: 21},
"Column 54": {0: 19},
"Column 55": {0: 20},
"Column 56": {0: 18},
"Column 57": {0: 15},
"Column 58": {0: 19},
}
df = pd.DataFrame(data).T
df.mean()
Output:
0 35.859649
dtype: float64
That looks right. I think the NA is the problem. Convert it to Python's NaN type.
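One way to do that conversion, sketched under the assumption that the missing value arrived as the text "NA" (the post does not say how it was stored), is pd.to_numeric with errors="coerce". The Series below is made-up illustration data, not the asker's frame:

```python
import pandas as pd

# Hypothetical row in which a missing value arrived as the string "NA"
raw = pd.Series([0, 95, "NA", 2, 83], dtype=object)

# Entries that cannot be parsed as numbers become NaN; the rest become floats
clean = pd.to_numeric(raw, errors="coerce")

# mean() skips NaN by default (skipna=True), so only the 4 real values count
print(clean.mean())  # 45.0
```

This mirrors how Excel's AVERAGE ignores empty cells in a range, which is why the two results agree once the missing entries are real NaN values.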
Thanks to @Henrik Bo, I solved it. I had mixed up NaN and NA in the data frame.
When I do
# Replace all outlier values with np.nan (the np.NaN alias was removed in NumPy 2.0)
df = df.replace(outlier_value, np.nan)
the calculation is correct.
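To illustrate why that replacement changes the row mean, here is a small sketch. The sentinel 999 and the three-column frame are assumptions made for the example; the actual outlier_value is not given in the post:

```python
import numpy as np
import pandas as pd

outlier_value = 999  # hypothetical sentinel, not the value from the post

df = pd.DataFrame({"a": [0, 10], "b": [95, 999], "c": [2, 20]})

# While the sentinel is still a number, it is averaged in like any other value
before = df.mean(axis=1)

# Once it is NaN, mean() leaves it out of both the sum and the count
after = df.replace(outlier_value, np.nan).mean(axis=1)

print(before.iloc[1])  # 343.0
print(after.iloc[1])   # 15.0
```

Rows without the sentinel are unaffected, so comparing the means before and after the replacement is a quick way to find the rows that differ from Excel.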