Unexpectedly long computation time with uncertainties package
Consider the following code snippet:
import random
from uncertainties import unumpy, ufloat
x = [random.uniform(0,1) for p in range(1,8200)]
y = [random.randrange(0,1000) for p in range(1,8200)]
xerr = [random.uniform(0,1)/1000 for p in range(1,8200)]
yerr = [random.uniform(0,1)*10 for p in range(1,8200)]
x = unumpy.uarray(x, xerr)
y = unumpy.uarray(y, yerr)
diff = sum(x*y)
u = ufloat(0.0, 0.0)
for k in range(len(x)):
    u += (diff - x[k])**2 * y[k]
print(u)
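If I try to run it on my computer, it takes up to 10 minutes to produce a result. I am not sure why this is the case and would appreciate some kind of explanation.
If I had to guess, I would say that computing the uncertainties is for some reason more complicated than one would think, but as I said, that is just a guess. Interestingly, if I remove the print instruction at the end, the code finishes almost instantly, which honestly confuses me more than it helps...
In case you don't know it, this is the repository of the uncertainties library.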
I can reproduce this: the print takes forever. Or rather, what takes forever is the conversion to a string that print calls implicitly.
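A claim like this can be checked without extra dependencies by using the standard library's cProfile. The following is only a sketch on a stand-in object: the class `Lazy` and the functions `build` and `show` are hypothetical and merely imitate an object that is cheap to construct and expensive to convert to a string.

```python
import cProfile
import io
import pstats


class Lazy:
    """Hypothetical object that defers its expensive work to str()."""

    def __init__(self, n):
        self.n = n  # construction only stores the size

    def _expensive(self):
        # Stand-in for the deferred computation.
        return sum(i * i for i in range(self.n))

    def __str__(self):
        return str(self._expensive())


def build():
    return Lazy(200_000)


def show(obj):
    return str(obj)


profiler = cProfile.Profile()
profiler.enable()
obj = build()      # cheap
text = show(obj)   # expensive: the work happens during string conversion
profiler.disable()

# Collect the report in a string, sorted by cumulative time.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

In the report, nearly all of the cumulative time should sit under `show` and `_expensive`, with almost none under `build`.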
I used line_profiler to time the __format__ function of AffineScalarFunc. (It is called by __str__, which is called by print.)
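The call chain print, then __str__, then __format__ can be made visible with a small stand-in class. This is a sketch only: `Traced` and its `calls` list are hypothetical, and having __str__ delegate to __format__ is an assumption that mirrors the chain described above, not a copy of the library's code.

```python
class Traced:
    """Hypothetical value type that records which special methods run."""

    def __init__(self, value):
        self.value = value
        self.calls = []

    def __format__(self, format_spec):
        # In this toy, any expensive work would live here.
        self.calls.append("__format__")
        return format(self.value, format_spec)

    def __str__(self):
        # Assumption: string conversion delegates to __format__.
        self.calls.append("__str__")
        return format(self, "")


t = Traced(1.5)
print(t)        # print() converts its argument with str()
print(t.calls)  # the order in which the special methods ran
```

Profiling __format__ therefore captures everything that print makes the object do.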
I reduced the array size from 8200 to 1000 to make it run a bit faster. Here is the result (trimmed for readability):
Timer unit: 1e-06 s
Total time: 29.1365 s
File: /home/veith/Projects/Whosebug/test/lib/python3.6/site-packages/uncertainties/core.py
Function: __format__ at line 1813
Line #      Hits         Time    Per Hit   % Time  Line Contents
==============================================================
  1813                                                 @profile
  1814                                                 def __format__(self, format_spec):
  1960
  1961                                                     # Since the '%' (percentage) format specification can change
  1962                                                     # the value to be displayed, this value must first be
  1963                                                     # calculated. Calculating the standard deviation is also an
  1964                                                     # optimization: the standard deviation is generally
  1965                                                     # calculated: it is calculated only once, here:
  1966         1          2.0        2.0      0.0          nom_val = self.nominal_value
  1967         1   29133097.0 29133097.0    100.0          std_dev = self.std_dev
  1968
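You can see that almost all of the time is spent on line 1967, where the standard deviation is computed. If you dig a little deeper, you find that the error_components property is the problem, within which the derivatives property is the problem, within which _linear_part.expand() is the problem. If you profile that, you start to get to the root of the problem. Most of the work here is evenly distributed: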
Function: expand at line 1481
Line #      Hits         Time    Per Hit   % Time  Line Contents
==============================================================
  1481                                                 @profile
  1482                                                 def expand(self):
  1483                                                     """
  1484                                                     Expand the linear combination.
  1485
  1486                                                     The expansion is a collections.defaultdict(float).
  1487
  1488                                                     This should only be called if the linear combination is not
  1489                                                     yet expanded.
  1490                                                     """
  1491
  1492                                                     # The derivatives are built progressively by expanding each
  1493                                                     # term of the linear combination until there is no linear
  1494                                                     # combination to be expanded.
  1495
  1496                                                     # Final derivatives, constructed progressively:
  1497         1          2.0        2.0      0.0          derivatives = collections.defaultdict(float)
  1498
  1499  15995999    4942237.0        0.3      9.7          while self.linear_combo:  # The list of terms is emptied progressively
  1500
  1501                                                         # One of the terms is expanded or, if no expansion is
  1502                                                         # needed, simply added to the existing derivatives.
  1503                                                         #
  1504                                                         # Optimization note: since Python's operations are
  1505                                                         # left-associative, a long sum of Variables can be built
  1506                                                         # such that the last term is essentially a Variable (and
  1507                                                         # not a NestedLinearCombination): popping from the
  1508                                                         # remaining terms allows this term to be quickly put in
  1509                                                         # the final result, which limits the number of terms
  1510                                                         # remaining (and whose size can temporarily grow):
  1511  15995998    6235033.0        0.4     12.2              (main_factor, main_expr) = self.linear_combo.pop()
  1512
  1513                                                         # print "MAINS", main_factor, main_expr
  1514
  1515  15995998   10572206.0        0.7     20.8              if main_expr.expanded():
  1516  15992002    6822093.0        0.4     13.4                  for (var, factor) in main_expr.linear_combo.items():
  1517   7996001    8070250.0        1.0     15.8                      derivatives[var] += main_factor*factor
  1518
  1519                                                         else:  # Non-expanded form
  1520  23995993    8084949.0        0.3     15.9                  for (factor, expr) in main_expr.linear_combo:
  1521                                                                 # The main_factor is applied to expr:
  1522  15995996    6208091.0        0.4     12.2                      self.linear_combo.append((main_factor*factor, expr))
  1523
  1524                                                         # print "DERIV", derivatives
  1525
  1526         1          2.0        2.0      0.0          self.linear_combo = derivatives
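You can see that there are a great many calls to expanded (about 16 million for an array of 1000 elements), which in turn calls isinstance.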
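Also note the comments, which hint that the library only computes the derivatives when they are actually needed (and is aware that doing otherwise would be really slow). This is why the conversion to a string takes so long, while no time was spent before it.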
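To see why deferred expansion can become expensive when one large sub-expression is reused many times, as diff is in the loop of the question, here is a toy model. It is not the library's code: `Var`, `Combo`, `expand` and `build` are hypothetical names. The only assumption is that nested terms are walked again each time they appear, rather than being expanded once and shared.

```python
from collections import defaultdict


class Var:
    """Hypothetical independent variable (a leaf of the expression)."""

    def __init__(self, name):
        self.name = name


class Combo:
    """Hypothetical lazy linear combination: a list of (factor, term) pairs."""

    def __init__(self, terms):
        self.terms = terms  # each term is a Var or another Combo


def expand(combo):
    """Flatten nested combos into {Var: coefficient}, counting loop steps."""
    derivatives = defaultdict(float)
    stack = list(combo.terms)
    steps = 0
    while stack:
        steps += 1
        factor, expr = stack.pop()
        if isinstance(expr, Var):
            derivatives[expr] += factor
        else:
            # Nested combination: push its terms, scaled by the outer factor.
            stack.extend((factor * f, e) for f, e in expr.terms)
    return derivatives, steps


def build(n):
    """Build sum_k (shared - x_k), where shared = x_0 + ... + x_(n-1)."""
    xs = [Var(f"x{i}") for i in range(n)]
    shared = Combo([(1.0, xs[0])])
    for v in xs[1:]:
        shared = Combo([(1.0, shared), (1.0, v)])  # left-associative sum
    total = Combo([])
    for v in xs:
        term = Combo([(1.0, shared), (-1.0, v)])   # reuses `shared` every time
        total = Combo([(1.0, total), (1.0, term)])
    return total


for n in (100, 200, 400):
    _, steps = expand(build(n))
    print(n, steps)
```

In this toy the step count roughly quadruples each time n doubles, which is quadratic growth. The roughly 16 million loop iterations in the profile above for 1000 elements are consistent with growth of that kind, although the toy does not try to reproduce the library's exact counts.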
In the __init__ of AffineScalarFunc:
# In order to have a linear execution time for long sums, the
# _linear_part is generally left as is (otherwise, each
# successive term would expand to a linearly growing sum of
# terms: efficiently handling such terms [so, without copies]
# is not obvious, when the algorithm should work for all
# functions beyond sums).
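The trade-off this comment describes can be illustrated with a toy comparison. It is a sketch under stated assumptions, not the library's implementation: `eager_sum` and `lazy_sum` are hypothetical helpers. The eager version assumes every partial sum must keep its own fully expanded copy of the coefficients, while the lazy version only stores a reference to the previous partial sum.

```python
def eager_sum(n):
    """Each partial sum stores a fully expanded copy of its coefficients."""
    coeffs = {}
    copied = 0
    for i in range(n):
        coeffs = dict(coeffs)   # copy, so earlier partial sums stay valid
        copied += len(coeffs)   # work done by the copy
        coeffs[f"x{i}"] = 1.0
    return coeffs, copied


def lazy_sum(n):
    """Each partial sum only references the previous one plus the new term."""
    node = None
    created = 0
    for i in range(n):
        node = (node, f"x{i}")  # constant work per term
        created += 1
    return node, created


_, copied = eager_sum(1000)
_, created = lazy_sum(1000)
print(copied, created)
```

Building the eager sum copies n(n-1)/2 entries in total, whereas the lazy sum creates n nodes. The lazy form pays for this later, when the nested structure finally has to be expanded.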
In the std_dev of AffineScalarFunc:
#! It would be possible to not allow the user to update the
#std dev of Variable objects, in which case AffineScalarFunc
#objects could have a pre-calculated or, better, cached
#std_dev value (in fact, many intermediate AffineScalarFunc do
#not need to have their std_dev calculated: only the final
#AffineScalarFunc returned to the user does).
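The caching concern raised in this comment can be shown with a short sketch. The classes `Variable` and `CachedSum` here are hypothetical stand-ins, not the library's classes. The example assumes a sum of independent variables, whose standard deviation is the square root of the sum of squares.

```python
import math
from functools import cached_property


class Variable:
    """Hypothetical variable whose std_dev the user may change later."""

    def __init__(self, std_dev):
        self.std_dev = std_dev


class CachedSum:
    """Hypothetical sum of independent variables with a cached std_dev."""

    def __init__(self, variables):
        self.variables = variables

    @cached_property
    def std_dev(self):
        # Computed on first access, then stored on the instance.
        return math.sqrt(sum(v.std_dev ** 2 for v in self.variables))


a, b = Variable(3.0), Variable(4.0)
total = CachedSum([a, b])

first = total.std_dev   # computed from 3.0 and 4.0
a.std_dev = 0.0         # the user updates a variable afterwards
stale = total.std_dev   # cache is not invalidated: still the old value
fresh = math.sqrt(sum(v.std_dev ** 2 for v in total.variables))
print(first, stale, fresh)
```

The cached value no longer matches the recomputed one after the update, which is why caching is only safe if the standard deviation of a variable cannot be changed.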
In the expand of LinearCombination:
# The derivatives are built progressively by expanding each
# term of the linear combination until there is no linear
# combination to be expanded.
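So all in all, this is somewhat expected, since the library deals with these non-native numbers, which (apparently) take a lot of operations to handle.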