Adding parameters to an applied dataframe function

Suppose I have a dataframe:

                 Pop_By_Area    CensusPop
 ID         
 100010401001000    77.0        77           
 100010401001001    294.0       294 
 100010401001002    20.0        20
 100010401001003    91.0        91  
 100010401001004    53.0        53  

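I want to write a function that compares the values of two columns in a row and returns the value for a new column: the difference between the two columns.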

def pop_compare(row):
    pop_by_area_sum = row.Pop_By_Area
    census_pop_avg = float(row.CensusPop)
    diff = 0
    if pop_by_area_sum != census_pop_avg:
        diff = abs(int(pop_by_area_sum - census_pop_avg))
    return diff

cb_pop_sum['Difference'] = cb_pop_sum.apply(pop_compare, axis=1)

No problem there; it works fine, but I have to hard-code the specific column names:

                    Pop_By_Area CensusPop Difference
 ID         
 100010401001000    77.0        77        0   
 100010401001001    294.0       294       0
 100010401001002    20.0        20        0
 100010401001003    91.0        91        0
 100010401001004    53.0        53        0

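Now suppose I want to use a similar function to compare any two columns of a larger dataframe and return their difference. Besides the row, I need to pass the names of the columns to compare into the function as parameters.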

def pop_compare2(row, colA, colB):
    valA = row.colA
    valB = row.colB
    diff = 0
    if (valA != valB):
        diff = abs(int(valA - valB))
    return diff

This does not work when I run the following:

c_A = "Pop_By_Area"
c_B = "CensusPop"
cb_pop_sum['Difference2'] = cb_pop_sum.apply(pop_compare2(colA=c_A, colB=c_B), axis=1)
cb_pop_sum.head()

It throws the error TypeError: pop_compare2() missing 1 required positional argument: 'row'. What am I doing wrong here?

Maybe I have misunderstood your question, but this should work:

from io import StringIO
csv = StringIO("""
 ID                 Pop_By_Area    CensusPop      
 100010401001000    77.0        77           
 100010401001001    294.0       294 
 100010401001002    20.0        20
 100010401001003    91.0        91  
 100010401001004    53.0        53 
""")

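import pandas as pd
# Raw string so the regex separator \s+ is not treated as a string escape
df = pd.read_csv(csv, sep=r'\s+')
# Plain column arithmetic needs no apply at all
df['Difference'] = df['Pop_By_Area'] - df['CensusPop']

def custom_func(subdf):
    # subdf is one row holding only the two selected columns
    x, y = subdf.values
    return x**3 - y/123  # arbitrary formula, only for illustration

df['Difference2'] = df[['Pop_By_Area', 'CensusPop']].apply(custom_func, axis=1)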
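As for the TypeError itself: `pop_compare2(colA=c_A, colB=c_B)` is called immediately, before `apply` ever runs, so no `row` is supplied. Separately, `row.colA` looks for a column literally named `colA`; bracket indexing uses the value stored in the variable instead. The following is a minimal sketch of one way to pass the extra parameters, relying on `DataFrame.apply` forwarding `args` and extra keyword arguments to the function. The small `demo` frame and its column names `a` and `b` are hypothetical, added only so that the differences are non-zero.

```python
import pandas as pd

# Frame mirroring the sample data from the question.
cb_pop_sum = pd.DataFrame(
    {
        "Pop_By_Area": [77.0, 294.0, 20.0, 91.0, 53.0],
        "CensusPop": [77, 294, 20, 91, 53],
    },
    index=pd.Index(
        [100010401001000, 100010401001001, 100010401001002,
         100010401001003, 100010401001004],
        name="ID",
    ),
)

def pop_compare2(row, colA, colB):
    # row[colA] uses the column name held in colA;
    # row.colA would look for a column literally called "colA".
    valA = row[colA]
    valB = row[colB]
    diff = 0
    if valA != valB:
        diff = abs(int(valA - valB))
    return diff

c_A = "Pop_By_Area"
c_B = "CensusPop"

# Pass the function itself (not the result of calling it);
# apply forwards the extra keyword arguments on every row.
cb_pop_sum["Difference2"] = cb_pop_sum.apply(
    pop_compare2, axis=1, colA=c_A, colB=c_B
)

# Equivalent call using positional extras via args=(...).
cb_pop_sum["Difference3"] = cb_pop_sum.apply(
    pop_compare2, axis=1, args=(c_A, c_B)
)

# Hypothetical frame with differing values, to show non-zero output.
demo = pd.DataFrame({"a": [10.0, 3.0], "b": [7, 8]})
demo["diff"] = demo.apply(pop_compare2, axis=1, colA="a", colB="b")

# Vectorized alternative that avoids apply entirely.
demo["diff_vec"] = (demo["a"] - demo["b"]).abs().astype(int)
```

For large frames the vectorized form is usually the better choice, since `apply` with `axis=1` calls the Python function once per row.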