Pandas Dataframe sum result is wrong
I wrote a program that uses pandas and openpyxl to manipulate Excel files. The series data is:
l=[466629703, NA, 527821349, NA,734823364, NA,1667241489, NA,502673377, NA,491316417, NA,505520276, NA,2840580259, NA,1399526794, NA,468709318, NA,425220764, NA,409771252, NA,643692418, NA,1193809483, NA,353829950, NA,424820400, NA,406999623, NA,389293014, NA,1168972722, NA,420654309, NA,390431735, NA,356588382, NA]
deposit_sum = sep_df[sep_kward][deposit].dropna().astype(int).sum()
The result should be 16188926398,
but the code above returns 11200862491. The error only occurs with one file. What could be the problem?
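After dropping the NaNs, don't convert the values to int;
convert them to int64 instead.
astype(int) uses NumPy's default integer type, which is 32-bit on some platforms (notably Windows with NumPy versions before 2.0), and 2840580259.0
is outside the 32-bit integer range (maximum 2147483647). In the data shown it is the only value above that limit, which would explain why only one file is affected: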
deposit_sum = df[0].dropna().astype('int64').sum()
# deposit_sum = sep_df[sep_kward][deposit].dropna().astype('int64').sum()
deposit_sum
Output:
16188926398
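For comparison with the wrong result in the question: the bad total (11200862491) is itself larger than any int32, so the sum was accumulated in 64 bits and only the cast of the large value went wrong. The plain-Python sketch below (an illustration, not part of the original answer) checks that the wrong total is exactly what you get if 2840580259 is replaced by -2147483648, the int32 minimum. That replacement is an assumption: it is a common outcome of an out-of-range float-to-int32 cast on x86, but the exact value of such a cast is not guaranteed.

```python
# Values from the question with the NaNs already dropped
values = [466629703, 527821349, 734823364, 1667241489, 502673377,
          491316417, 505520276, 2840580259, 1399526794, 468709318,
          425220764, 409771252, 643692418, 1193809483, 353829950,
          424820400, 406999623, 389293014, 1168972722, 420654309,
          390431735, 356588382]

INT32_MIN = -2**31      # -2147483648
INT32_MAX = 2**31 - 1   # 2147483647

correct = sum(values)

# Assumption: the cast turned every out-of-range value into INT32_MIN
assumed_cast = [v if v <= INT32_MAX else INT32_MIN for v in values]
wrong = sum(assumed_cast)

print(correct)                                 # 16188926398
print(wrong)                                   # 11200862491
print([v for v in values if v > INT32_MAX])    # [2840580259]
```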
Sample DataFrame used:
import pandas as pd
NA = float('NaN')
l=[466629703, NA, 527821349, NA,734823364, NA,1667241489, NA,502673377, NA,491316417, NA,505520276, NA,2840580259, NA,1399526794, NA,468709318, NA,425220764, NA,409771252, NA,643692418, NA,1193809483, NA,353829950, NA,424820400, NA,406999623, NA,389293014, NA,1168972722, NA,420654309, NA,390431735, NA,356588382, NA]
df=pd.DataFrame(l)
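To see which width astype(int) picks on a given machine, np.dtype(int) reports it. The sketch below, on a shortened version of the sample data, avoids the platform-dependent int entirely, either by naming int64 explicitly as the answer does, or by using pandas' nullable Int64 dtype so the NaNs do not have to be dropped first. The Int64 variant is an alternative not mentioned in the original answer.

```python
import numpy as np
import pandas as pd

NA = float('nan')
df = pd.DataFrame([466629703, NA, 2840580259, NA, 356588382, NA])

# int32 on Windows with NumPy < 2.0; int64 on 64-bit Linux/macOS and with NumPy >= 2.0
print(np.dtype(int))

# Explicit 64-bit cast after dropping NaNs, as in the answer
explicit_sum = df[0].dropna().astype('int64').sum()

# Nullable integer dtype: NaN becomes <NA>, which sum() skips by default
nullable_sum = df[0].astype('Int64').sum()

print(explicit_sum)  # 3663798344
print(nullable_sum)  # 3663798344
```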