Adding parameters to an applied dataframe function
Suppose I have a dataframe:
Pop_By_Area CensusPop
ID
100010401001000 77.0 77
100010401001001 294.0 294
100010401001002 20.0 20
100010401001003 91.0 91
100010401001004 53.0 53
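I want to create a function that compares the values of two columns in a row and returns the value for a new column, namely the difference between the two columns: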
def pop_compare(row):
    pop_by_area_sum = row.Pop_By_Area
    census_pop_avg = float(row.CensusPop)
    diff = 0
    if (pop_by_area_sum != census_pop_avg):
        diff = abs(int(pop_by_area_sum - census_pop_avg))
    return diff

cb_pop_sum['Difference'] = cb_pop_sum.apply(pop_compare, axis=1)
No problem; this works fine, but I have to use specific column names:
Pop_By_Area CensusPop Difference
ID
100010401001000 77.0 77 0
100010401001001 294.0 294 0
100010401001002 20.0 20 0
100010401001003 91.0 91 0
100010401001004 53.0 53 0
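Now, suppose I want to use a similar function to compare any two columns in a larger dataframe and return the difference. Besides the row, I need to add parameters for the columns being compared to the function.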
def pop_compare2(row, colA, colB):
    valA = row.colA
    valB = row.colB
    diff = 0
    if (valA != valB):
        diff = abs(int(valA - valB))
    return diff
This does not work when I run the following:
c_A = "Pop_By_Area"
c_B = "CensusPop"
cb_pop_sum['Difference2'] = cb_pop_sum.apply(pop_compare2(colA=c_A, colB=c_B), axis=1)
cb_pop_sum.head()
It throws the error TypeError: pop_compare2() missing 1 required positional argument: 'row'. What am I doing wrong here?
Maybe I misunderstood your question, but this should work:
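from io import StringIO

import pandas as pd

# Sample data from the question, whitespace-separated
csv = StringIO("""
ID Pop_By_Area CensusPop
100010401001000 77.0 77
100010401001001 294.0 294
100010401001002 20.0 20
100010401001003 91.0 91
100010401001004 53.0 53
""")

# Raw string so that \s reaches the regex engine unchanged
df = pd.read_csv(csv, sep=r'\s+')

# A plain difference between two columns needs no apply at all
df['Difference'] = df['Pop_By_Area'] - df['CensusPop']

# For an arbitrary row-wise function, select only the two columns first
def custom_func(subdf):
    x, y = subdf.values
    return x**3 - y/123

df['Difference2'] = df[['Pop_By_Area', 'CensusPop']].apply(custom_func, axis=1)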
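To address the TypeError directly: writing pop_compare2(colA=c_A, colB=c_B) calls the function immediately, before apply ever supplies a row, which is why Python complains that 'row' is missing. A minimal sketch of one way around it follows. It passes the function itself and lets DataFrame.apply forward the extra arguments, and it looks the columns up with row[colA], since row.colA would search for a column literally named "colA". The dataframe is rebuilt from the question's sample, and the column names Difference3 and Difference4 are made up for illustration.

```python
import pandas as pd

# Sample data from the question, rebuilt by hand
cb_pop_sum = pd.DataFrame(
    {
        "Pop_By_Area": [77.0, 294.0, 20.0, 91.0, 53.0],
        "CensusPop": [77, 294, 20, 91, 53],
    },
    index=pd.Index(
        [
            100010401001000,
            100010401001001,
            100010401001002,
            100010401001003,
            100010401001004,
        ],
        name="ID",
    ),
)


def pop_compare2(row, colA, colB):
    # Index by the column name held in the variable;
    # row.colA would look for a column literally called "colA".
    valA = row[colA]
    valB = row[colB]
    return abs(int(valA - valB))


c_A = "Pop_By_Area"
c_B = "CensusPop"

# Pass the function object; apply forwards extra keyword arguments to it.
cb_pop_sum["Difference2"] = cb_pop_sum.apply(
    pop_compare2, axis=1, colA=c_A, colB=c_B
)

# Same thing with positional extras (hypothetical column name).
cb_pop_sum["Difference3"] = cb_pop_sum.apply(pop_compare2, axis=1, args=(c_A, c_B))

# Vectorized alternative that avoids apply (hypothetical column name).
cb_pop_sum["Difference4"] = (cb_pop_sum[c_A] - cb_pop_sum[c_B]).abs().astype(int)

print(cb_pop_sum.head())
```

For a simple absolute difference, the vectorized form is usually the better choice on large dataframes; the apply variants are mainly useful when the per-row logic cannot be expressed with column arithmetic.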