Can Cython further reduce Python method calling overhead for this function?

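I have a function that can be called a great many times (10^6+), depending on end-user input. According to cProfile the function itself executes quickly, but the sheer number of calls hurts performance.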

Here is a minimal case:

# condition_counter.pyx
# cython: profile=True

import cProfile
import pstats
import pyximport

pyximport.install()

USER_DEFINED_NUM = 10
USER_DEFINED_SPECIAL_VALUES = 1, 3, 8


def condition_met(number):
    value = USER_DEFINED_NUM % number
    return value in USER_DEFINED_SPECIAL_VALUES


cdef cy_condition_met(number):
    value = USER_DEFINED_NUM % number
    return value in USER_DEFINED_SPECIAL_VALUES


def condition_counter(end_number):
    current_number = 1
    special_nums = [num for num in range(current_number, end_number) if condition_met(num)]
    return len(special_nums)

def cy_condition_counter(end_number):
    current_number = 1
    special_nums = [num for num in range(current_number, end_number) if cy_condition_met(num)]
    return len(special_nums)

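The above is not my actual code; it is just a small example that shows the optimization problem I am running into. When I profile the Cython and Python versions, I see very little difference.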

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  ...
  9999999    2.117    0.000    2.117    0.000 min_case_py_overhead.pyx:13(condition_met)
  ...

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  ...
  9999999    2.090    0.000    2.090    0.000 min_case_py_overhead.pyx:18(cy_condition_met)
  ...

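Judging by the percall statistics, the bodies of the Python and Cython functions execute equally fast. That is why I suspect Python call overhead is the problem. It is also why I don't think PyPy would help.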
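One rough way to check that suspicion in pure Python is to time the same predicate once behind a function call and once written inline. The sketch below is only a diagnostic, not a fix: `count_with_calls`, `count_inlined`, and `N` are hypothetical names introduced for illustration, and the timings it prints will vary by machine and interpreter.

```python
import timeit

USER_DEFINED_NUM = 10
USER_DEFINED_SPECIAL_VALUES = (1, 3, 8)


def condition_met(number):
    return USER_DEFINED_NUM % number in USER_DEFINED_SPECIAL_VALUES


def count_with_calls(end_number):
    # One Python-level function call per candidate number.
    return sum(1 for num in range(1, end_number) if condition_met(num))


def count_inlined(end_number):
    # Same predicate written inline, so no per-number function call.
    return sum(
        1
        for num in range(1, end_number)
        if USER_DEFINED_NUM % num in USER_DEFINED_SPECIAL_VALUES
    )


N = 100_000  # hypothetical size, kept small so the sketch runs quickly
t_calls = timeit.timeit(lambda: count_with_calls(N), number=5)
t_inline = timeit.timeit(lambda: count_inlined(N), number=5)
print(f"with calls: {t_calls:.4f}s  inlined: {t_inline:.4f}s")
```

If the inlined variant is noticeably faster, the gap is a reasonable estimate of what the per-call overhead costs for this predicate.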

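Is there any way to reduce the overhead further? I tried declaring variables statically, but that sometimes slowed things down. I welcome performance improvements outside Cython as well. My main problem is that a single function is called many, many times. In my scenario, reducing the number of calls is not an option.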

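You can reduce the Python object overhead by using cdef for everything. I removed the profiling code in favor of a separate module that times the functions over 10M iterations, and cut the run time by more than 90%. Below are your existing functions plus new ones prefixed with "cp".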

condition_counter.pyx

USER_DEFINED_NUM = 10
USER_DEFINED_SPECIAL_VALUES = 1, 3, 8

def condition_met(number):
    value = USER_DEFINED_NUM % number
    return value in USER_DEFINED_SPECIAL_VALUES

cdef cy_condition_met(number):
    value = USER_DEFINED_NUM % number
    return value in USER_DEFINED_SPECIAL_VALUES

def condition_counter(end_number):
    current_number = 1
    special_nums = [num for num in range(current_number, end_number) if condition_met(num)]
    return len(special_nums)

def cy_condition_counter(end_number):
    current_number = 1
    special_nums = [num for num in range(current_number, end_number) if cy_condition_met(num)]
    return len(special_nums)

#----------------------------------------------------------------------
# Really go down the cython path
#----------------------------------------------------------------------

cdef int CP_USER_DEFINED_NUM = 10
cdef int CP_USER_DEFINED_SPECIAL_VALUES[3]
CP_USER_DEFINED_SPECIAL_VALUES = [1, 3, 8]

cdef int cp_condition_met(int number):
    cdef int value = CP_USER_DEFINED_NUM % number
    return value in CP_USER_DEFINED_SPECIAL_VALUES

cpdef int cp_condition_counter(int end_number):
    cdef int current_number = 1
    cdef int num
    cdef int count = 0
    for num in range(current_number, end_number):
        if cp_condition_met(num):
            count += 1
    return count

Test script

#!/usr/bin/env python3

import condition_counter
from time import perf_counter

iterations = 10_000_000

start = perf_counter()
result = condition_counter.condition_counter(iterations)
delta = perf_counter()-start
print("py", delta)

start = perf_counter()
result = condition_counter.cy_condition_counter(iterations)
delta = perf_counter()-start
print("cy", delta)

start = perf_counter()
result = condition_counter.cp_condition_counter(iterations)
delta = perf_counter()-start
print("cp", delta)
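The timing script above overwrites `result` on every run and never confirms that the three implementations return the same count. A minimal sketch of such a check, using a pure-Python reference, might look like this; `reference_count` and `check_agreement` are hypothetical helpers, not part of the module.

```python
USER_DEFINED_NUM = 10
USER_DEFINED_SPECIAL_VALUES = (1, 3, 8)


def reference_count(end_number):
    """Count n in [1, end_number) where USER_DEFINED_NUM % n is a special value."""
    return sum(
        1
        for n in range(1, end_number)
        if USER_DEFINED_NUM % n in USER_DEFINED_SPECIAL_VALUES
    )


def check_agreement(counters, end_number):
    """Raise AssertionError if any counter disagrees with the reference."""
    expected = reference_count(end_number)
    for name, counter in counters.items():
        got = counter(end_number)
        assert got == expected, f"{name}: expected {expected}, got {got}"
    return expected


# Stand-in usage with only the reference itself, so the sketch runs on its own.
print(check_agreement({"reference": reference_count}, 1_000))  # prints 3
```

With the compiled module available, the dictionary could instead map "py", "cy", and "cp" to `condition_counter.condition_counter`, `condition_counter.cy_condition_counter`, and `condition_counter.cp_condition_counter`.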

And the performance numbers

py 0.6689409520004119
cy 0.5783118550007202
cp 0.03368412400050147
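For reference, the relative gains implied by these numbers can be worked out as below. The timings are copied from the single run above, so the ratios are only indicative and will differ between machines.

```python
# Timings (seconds) copied from the run shown above.
py = 0.6689409520004119
cy = 0.5783118550007202
cp = 0.03368412400050147

cy_reduction = 1 - cy / py  # fraction of run time saved by the untyped cdef version
cp_reduction = 1 - cp / py  # fraction of run time saved by the fully typed version
cp_speedup = py / cp        # how many times faster the fully typed version is

print(f"cy saves {cy_reduction:.1%} of the py run time")  # about 13.5%
print(f"cp saves {cp_reduction:.1%} of the py run time")  # about 95%
print(f"cp is about {cp_speedup:.1f}x faster than py")    # about 19.9x
```

The untyped cdef function barely helps, while typing the arguments, the loop variable, and the counter accounts for almost all of the improvement.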