字符串比较方法之间的性能差异

Question

作为一名学生，我有一门关于使用 Python3.8+ 新功能的课程，它还包括 Python3.10。所以我现在正尝试在我的家庭作业中使用 match-case 来消除一长串 if-else，它应该被 cpp 中的 switch-case 替换。

在我的作业中，我需要一个将字符串“male/female/...”转换为整数的函数。这很容易，但我突然对不同字符串比较方法的性能感兴趣，所以我设计了下面的测试代码来测试三种不同方法的性能。

import timeit
from typing import Callable


def test_match_case(sex: str):
    match sex:
        case 'male':
            gender = 1
        case 'female':
            gender = 2
        case '________female':
            gender = 3
        case '#________________female':
            gender = 4
        case _:
            gender = 0

    return gender


def test_if_full_compare(sex: str):
    if sex == 'male':
        gender = 1
    elif sex == 'female':
        gender = 2
    elif sex == '________female':
        gender = 3
    elif sex == '#________________female':
        gender = 4
    else:
        gender = 0

    return gender


def test_if_startswith(sex: str):
    if sex.startswith('m'):
        gender = 1
    elif sex.startswith('f'):
        gender = 2
    elif sex.startswith('_'):
        gender = 3
    elif sex.startswith('#'):
        gender = 4
    else:
        gender = 0

    return gender


def _test_and_print(func: Callable, arg: str, times: int = 10000000):
    func_name = func.__name__
    time_cost = timeit.timeit(f"""{func_name}("{arg}")""", f"""from __main__ import {func_name}""", number=times)
    print(f"func: {func_name} arg: {arg} time_cost: {time_cost:.6f}")


_test_and_print(test_match_case, "male")
_test_and_print(test_match_case, "female")
_test_and_print(test_match_case, "________female")
_test_and_print(test_match_case, "#________________female")
_test_and_print(test_if_full_compare, "male")
_test_and_print(test_if_full_compare, "female")
_test_and_print(test_if_full_compare, "________female")
_test_and_print(test_if_full_compare, "#________________female")
_test_and_print(test_if_startswith, "male")
_test_and_print(test_if_startswith, "female")
_test_and_print(test_if_startswith, "________female")
_test_and_print(test_if_startswith, "#________________female")

这是我电脑上的测试结果：

func: test_match_case arg: male time_cost: 0.517885
func: test_match_case arg: female time_cost: 0.610869
func: test_match_case arg: ________female time_cost: 0.726382
func: test_match_case arg: #________________female time_cost: 0.846604
func: test_if_full_compare arg: male time_cost: 0.487761
func: test_if_full_compare arg: female time_cost: 0.578670
func: test_if_full_compare arg: ________female time_cost: 0.666413
func: test_if_full_compare arg: #________________female time_cost: 0.827917
func: test_if_startswith arg: male time_cost: 0.917373
func: test_if_startswith arg: female time_cost: 1.362152
func: test_if_startswith arg: ________female time_cost: 1.817514
func: test_if_startswith arg: #________________female time_cost: 2.249916

我在Python中了解了一些关于“常驻内存”的知识，但我对上面的结果仍有很多疑问。

match-case 在每种情况下都比 if-else 慢，为什么？
L4 和 L8 之间的时间成本差距比其他三种情况（L1-L5、L2-L6、L3-L7）小得多，为什么？这是受“常驻内存”影响吗？
startswith 处理长字符串的速度太差了。在我的直觉中，这个内置函数只需要比较第一个字符，所以最后 4 种情况的时间成本应该几乎相同，而且 L12 应该比 L8 快得多，因为它不需要遍历整个字符串。但事实与我的估计恰恰相反。这怎么解释？
如果您发现任何其他问题，请告诉我。

我已经用“[python] 字符串比较”一词进行了搜索，但没有得到任何预期的结果。

Answer 1

您的性能问题有三个原因，前两个本质上是 language/interpreter 的“故障”，另一个是您构建测试用例的结果：

Python 不是完全编译的静态类型语言，这意味着 match 通常不能优化太多（如果我没记错的话，甚至可能需要考虑 cases 顺序，使其更难优化），并且在实践中，从不执行有意义的优化以“跳”到右边 case。 match 构造的设计实际上在此处的 if/elif/else 链上增加了少量开销，但在其他方面等效到 if/elif/else 链，使其在大多数情况下稍微慢一些（请参阅下面的详细信息；这是它赶上最后一个案例的原因）。
没有专用字节码支持的通用代码路径（例如任意方法调用）通常比字节码支持的操作（例如直接 == 检查）具有更高的固定开销，因此如果完成的工作相同，字节码操作赢了。随着时间的推移，他们一直在通过避免创建绑定方法对象的 LOAD_METHOD/CALL_METHOD 操作码和减少参数处理开销的 Vectorcall 协议来改进这一点，但在大多数情况下专用字节码仍然胜出，所以如果== 和 startswith 做同样的实际工作，== 会赢。
即使在理论上，startswith 也不会保存任何内容 在此特定用例中；每个固定字符串的长度都不同，因此实际的相等性测试从“长度是否匹配？”开始，发现它们不匹配，并立即说“不相等”；当它们不相等时，它甚至从不读取字符串数据的单个字节。当它们相等时，很可能是因为所涉及的四个字符串中的三个是包含合法变量名称的文字（CPython 实习生作为实现细节），甚至不比较字符串的数据'平等;它检查“is interned”位，在两者中找到它的集合，并且知道可以比较指针地址（它甚至可能在大小检查之前这样做）；当且仅当它们是相同字符串时，两个 interned 字符串才相等，因此再次强调，不需要数据比较。

第 #1 点/问题 #2 的详细信息：我检查了在两个相同函数上调用 dis.dis 的结果，其中一个使用了 match /case 其中之一使用了 if/elif/else 链。 CPython 3.10.0 生成的字节码几乎相同，除了：

match/case 链只加载一次字符串，然后在每次测试保存最后一个之前，它使用 DUP_TOP 指令将字符串顶部的值加倍堆栈（只是引用的两倍，没有复制）所以新的副本可以用于比较，而原始的则保留下来用于下一次比较。 DUP_TOP 是一个 super-cheap 指令（大致相当于它在 if 链中使用的 LOAD_FAST ，你必须一遍又一遍地命名局部变量），但是.. .
因为任何成功比较意味着不需要重复值，所有路径保存最后一个需要添加一个POP_TOP指令来清除重复值if/elif链不需要。

这解释了您对问题 #2 的观察；所有以涉及 DUP_TOP 的 case 结尾的代码路径执行一条额外的字节码指令（对应的 POP_TOP），而最后一种情况不需要它，因此执行的指令结束up 大致相同（它执行一个 LOAD_FAST 和 case count DUP_TOPs，而不是 case count LOAD_FASTs); DUP_TOP 非常比 LOAD_FAST 稍微便宜一点（但还不足以在需要时弥补 POP_TOP 的成本），您执行稍微比 match 的最后 case 更好，它使用匹配的值。

Answer 2

您错过了我认为最 pythonic 的解决方案，使用字典。我希望它在每种情况下都是最快的，但事实并非如此。级联 elif 子句在列表中越往下越慢，因为它必须先通过每个前面的测试才能移动到下一个。

_sex_dict = {
    'male':                   1,
    'female':                 2,
    '________female':         3,
    '#________________female':4
    }

def test_dict(sex: str):
    gender = _sex_dict.get(sex, 0)
    return gender

func: test_if_full_compare arg: male time_cost: 1.621118
func: test_if_full_compare arg: female time_cost: 1.786827
func: test_if_full_compare arg: ________female time_cost: 2.041604
func: test_if_full_compare arg: #________________female time_cost: 2.434724
func: test_if_startswith arg: male time_cost: 2.886888
func: test_if_startswith arg: female time_cost: 4.518326
func: test_if_startswith arg: ________female time_cost: 5.808293
func: test_if_startswith arg: #________________female time_cost: 7.220475
func: test_dict arg: male time_cost: 1.917939
func: test_dict arg: female time_cost: 1.924376
func: test_dict arg: ________female time_cost: 1.930555
func: test_dict arg: #________________female time_cost: 2.079617

字符串比较方法之间的性能差异

Performance variance among string-compare methods

python

string

string-comparison