OverflowError with pandas Series.apply
I have a function that works fine on individual values, but when I use it with pandas Series.apply() it raises an OverflowError.
from __future__ import division
import pandas as pd
import numpy as np
birthdays = pd.DataFrame(np.empty([365,2]), columns = ['k','probability'], index = range(1,366))
birthdays['k'] = birthdays.index
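I define a function:
def probability_of_shared_bday(k):
    # Product of the k integers (366 - k) .. 365.
    end_point = 366 - k
    numerator = 1
    for i in range(end_point, 366):
        numerator = numerator*i
    # Both numerator and denominator become very large integers as k grows.
    denominator = 365**k
    # Despite the name, this is 1 - P(no match): the chance of at least one shared birthday.
    probability_of_no_match = (1 - numerator/denominator)
    return probability_of_no_match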
When I try this with a single integer, it works fine:
probability_of_shared_bday(1)
0.0
probability_of_shared_bday(100)
0.9999996927510721
But when I try to use this function with apply:
birthdays['probability'] = birthdays['k'].apply(probability_of_shared_bday, convert_dtype=False)
OverflowError: integer division result too large for a float
This happens whether convert_dtype is True or False.
Checking birthdays['k'].dtypes, I get dtype('int64').
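I'm not sure why apply runs into this problem, but you shouldn't write the function the way you originally did. Here is a suggestion that avoids dividing one huge number by another: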
def probability_of_shared_bday(k):
    end_point = 366 - k
    ratio = 1
    for i in range(end_point, 366):
        ratio *= i / 365
    probability_of_no_match = (1 - ratio)
    return probability_of_no_match
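As a minimal sketch of how the rewritten function might be used with apply, assuming Python 3 and a frame shaped like the one in the question: the names `birthdays`, `k`, and `probability` come from the question, while the `np.cumprod` variant and the `vectorized` name are additions for illustration and not part of the answer.

```python
import numpy as np
import pandas as pd


def probability_of_shared_bday(k):
    # Multiply the per-person factors one at a time so that every
    # intermediate value stays between 0 and 1.
    ratio = 1.0
    for i in range(366 - k, 366):
        ratio *= i / 365
    return 1 - ratio


birthdays = pd.DataFrame({"k": range(1, 366)}, index=range(1, 366))
birthdays["probability"] = birthdays["k"].apply(probability_of_shared_bday)

# Vectorized alternative (an assumption of this sketch, not from the answer):
# the running product of (365 - j) / 365 for j = 0 .. k - 1.
factors = (365 - np.arange(365)) / 365
vectorized = 1 - np.cumprod(factors)

print(birthdays.loc[[1, 23, 100], "probability"])
```

Because no intermediate value ever leaves the range 0 to 1, neither version needs to hold a large integer, so the result does not depend on which integer type apply passes in.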
That takes care of the problem.
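The answer leaves the cause open. One plausible explanation, offered here as an assumption and not a confirmed diagnosis, is that apply may hand the function NumPy int64 scalars instead of Python ints. In fixed-width 64-bit arithmetic, 365**k cannot hold the true value for large k, so the denominator would no longer be the real power while the numerator keeps growing. The sketch below only demonstrates that mismatch, and shows that the original formula works once k is converted with int(); the function name `probability_with_python_ints` is hypothetical.

```python
import warnings

import numpy as np


def probability_with_python_ints(k):
    # Hypothetical variant of the question's function: int(k) forces
    # arbitrary-precision Python integers, whatever type is passed in.
    k = int(k)
    numerator = 1
    for i in range(366 - k, 366):
        numerator *= i
    denominator = 365 ** k
    return 1 - numerator / denominator


with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # some NumPy versions warn on overflow
    wrapped = np.int64(365) ** np.int64(100)  # 64-bit integer arithmetic

exact = 365 ** 100  # Python int, arbitrary precision

print(int(wrapped) == exact)  # the 64-bit result cannot hold the true value
print(probability_with_python_ints(np.int64(100)))
```

Converting with int() keeps the original formula intact, but the ratio-based rewrite in the answer is still the more robust choice because it never builds the large integers in the first place.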