How to use an assignment expression with pandas.DataFrame.apply

I have the following minimal working example (specific to Python >= 3.8), which converts filename strings into full paths:

# running this block will produce the expected output
import pandas as pd
from pathlib import Path


def make_path(filename):
    f = filename.split('_')
    return directory / f[-2][:4] / '_'.join(f[:3]) / filename


directory = Path('/ifs/archive/ops/hst/public')

data = {'productFileName': ['hst_15212_ad_wfc3_ir_total_idq2ad_segment-cat.ecsv',
                            'hst_15212_ad_wfc3_ir_total_idq2ad_point-cat.ecsv',
                            'hst_15212_bt_wfc3_ir_total_idq2bt_segment-cat.ecsv',
                            'hst_15212_bt_wfc3_ir_total_idq2bt_point-cat.ecsv',
                            'hst_15212_4g_wfc3_ir_f160w_idq24g_point-cat.ecsv']}
dfx = pd.DataFrame(data)

dfx['filePath'] = dfx.productFileName.apply(make_path)

How can I accomplish this with an assignment expression (:=) inside .apply(...)?

Something along the lines of:

dfx['filePath'] = dfx.productFileName.apply(lambda filename: directory / f[-2][:4] / '_'.join(f[:3]) / filename for (f := filename.split('_')))

Current result:

  File "/tmp/ipykernel_3834754/3286169981.py", line 1
    dfx['filePath'] = dfx.productFileName.apply(lambda filename: directory / f[-2][:4] / '_'.join(f[:3]) / filename for (f := filename.split('_')))
                                                                                                                         ^
SyntaxError: cannot assign to named expression

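You should make the assignment at the point of first use. A lambda body is a single expression whose operands are evaluated left to right, so f has to be bound inside the first sub-expression that needs it; a trailing for clause is not valid there: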

dfx['filePath'] = dfx.productFileName.apply(lambda filename: directory / (f := filename.split('_'))[-2][:4] / '_'.join(f[:3]) / filename)
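To check that the one-liner behaves like the original make_path, here is a minimal sketch that compares the two on plain strings, without pandas. It assumes PurePosixPath in place of Path so the result is the same on any OS, and the name to_path is only an illustrative label for the lambda; the same lambda can be passed to .apply unchanged.

```python
from pathlib import PurePosixPath

# Assumption: PurePosixPath instead of Path, so the output does not depend on the OS.
directory = PurePosixPath('/ifs/archive/ops/hst/public')


def make_path(filename):
    # Reference implementation from the question.
    f = filename.split('_')
    return directory / f[-2][:4] / '_'.join(f[:3]) / filename


# Walrus version: f is bound in the leftmost sub-expression that uses it,
# so it is already defined when '_'.join(f[:3]) is evaluated.
# (to_path is a hypothetical name used only for this comparison.)
to_path = lambda filename: (
    directory / (f := filename.split('_'))[-2][:4] / '_'.join(f[:3]) / filename
)

names = ['hst_15212_ad_wfc3_ir_total_idq2ad_segment-cat.ecsv',
         'hst_15212_4g_wfc3_ir_f160w_idq24g_point-cat.ecsv']

for name in names:
    # Both versions should build the same path for every filename.
    assert to_path(name) == make_path(name)
    print(to_path(name))
```

The parentheses around the assignment expression are required here: they let the result be indexed with [-2][:4] and keep := from being parsed as part of the surrounding / chain.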