Calculate tax liabilities based on a marginal tax rate schedule
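The question "income tax calculation python" asks how to calculate taxes given a marginal tax rate schedule, and its answer provides a function that works (below).
However, it only works for a single income value. How would I adapt it to work for a list/numpy array/pandas Series of income values? That is, how do I vectorize this code?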
from bisect import bisect

rates = [0, 10, 20, 30]   # 10%  20%  30%
brackets = [10000,        # first 10,000
            30000,        # next  20,000
            70000]        # next  40,000
base_tax = [0,            # 10,000 * 0%
            2000,         # 20,000 * 10%
            10000]        # 40,000 * 20% + 2,000

def tax(income):
    i = bisect(brackets, income)
    if not i:
        return 0
    rate = rates[i]
    bracket = brackets[i-1]
    income_in_bracket = income - bracket
    tax_in_bracket = income_in_bracket * rate / 100
    total_tax = base_tax[i-1] + tax_in_bracket
    return total_tax
One (possibly inefficient) approach is to use a list comprehension:
def tax_multiple(incomes):
    return [tax(income) for income in incomes]
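Two dataframes are created, one for the tax parameters and one for the incomes.
For each income, we use the "searchsorted" method to get the index of the corresponding row in the tax table.
With that index, we build a new table (df_tax.loc[rows]) and concatenate it with the income table,
then compute the tax and drop the columns that are no longer needed.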
import numpy as np
import pandas as pd

# Test data:
df = pd.DataFrame({"name": ["Bob", "Julie", "Mary", "John", "Bill", "George", "Andie"],
                   "income": [0, 9_000, 10_000, 11_000, 30_000, 69_999, 200_000]})
OUT:
     name  income
0     Bob       0
1   Julie    9000
2    Mary   10000
3    John   11000
4    Bill   30000
5  George   69999
6   Andie  200000
df_tax = pd.DataFrame({"brackets": [0, 10_000, 30_000, 70_000],  # lower limits
                       "rates":    [0, .10, .20, .30],
                       "base_tax": [0, 0, 2_000, 10_000]})
rows = df_tax["brackets"].searchsorted(df["income"], side="right") - 1  # aka bisect()
OUT:
[0 0 1 1 2 2 3]
df = pd.concat([df, df_tax.loc[rows].reset_index(drop=True)], axis=1)
df["total_tax"] = df["income"].sub(df["brackets"]).mul(df["rates"]).add(df["base_tax"])
OUT:
     name  income  brackets  rates  base_tax  total_tax
0     Bob       0         0    0.0         0        0.0
1   Julie    9000         0    0.0         0        0.0
2    Mary   10000     10000    0.1         0        0.0
3    John   11000     10000    0.1         0      100.0
4    Bill   30000     30000    0.2      2000     2000.0
5  George   69999     30000    0.2      2000     9999.8
6   Andie  200000     70000    0.3     10000    49000.0
df=df.reindex(columns=["name","income","total_tax"])
OUT:
     name  income  total_tax
0     Bob       0        0.0
1   Julie    9000        0.0
2    Mary   10000        0.0
3    John   11000      100.0
4    Bill   30000     2000.0
5  George   69999     9999.8
6   Andie  200000    49000.0
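If the intermediate columns are not needed, the same lookup can probably be done on plain arrays. The following is only a sketch under that assumption: `tax_searchsorted` is a hypothetical helper name, the three arrays are copied from the tax table above, and incomes are assumed to be non-negative (a negative income would give index -1 and silently pick the top bracket).

```python
import numpy as np

# Lower limits, marginal rates and tax already owed at each lower limit,
# copied from the tax table above.
brackets = np.array([0, 10_000, 30_000, 70_000])
rates = np.array([0.0, 0.10, 0.20, 0.30])
base_tax = np.array([0, 0, 2_000, 10_000])

def tax_searchsorted(incomes):
    """Hypothetical helper: marginal tax for many incomes at once.

    Assumes every income is >= 0.
    """
    incomes = np.asarray(incomes, dtype=float)
    # Row of the bracket each income falls into (same role as bisect() - 1).
    rows = np.searchsorted(brackets, incomes, side="right") - 1
    # Tax owed below the bracket plus the marginal rate on the remainder.
    return base_tax[rows] + (incomes - brackets[rows]) * rates[rows]

incomes = [0, 9_000, 10_000, 11_000, 30_000, 69_999, 200_000]
print(tax_searchsorted(incomes))
```

The printed values should agree with the total_tax column above, up to floating-point rounding.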
Edit:
You can also compute base_tax up front:
df_tax["base_tax"] = (df_tax.brackets  # edit2
                      .sub(df_tax.brackets.shift(fill_value=0))
                      .mul(df_tax.rates.shift(fill_value=0))
                      .cumsum())
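To see why the base_tax calculation above reproduces the hand-written column, it may help to look at each intermediate step. This is an illustrative breakdown, not part of the original answer; the names `width`, `prev_rate` and `tax_per_bracket` are introduced here only for the example.

```python
import pandas as pd

df_tax = pd.DataFrame({"brackets": [0, 10_000, 30_000, 70_000],  # lower limits
                       "rates": [0, .10, .20, .30]})

# Width of the bracket that ends at each lower limit: 0, 10000, 20000, 40000.
width = df_tax["brackets"].sub(df_tax["brackets"].shift(fill_value=0))
# Rate that applied inside that previous bracket: 0, 0, 0.1, 0.2.
prev_rate = df_tax["rates"].shift(fill_value=0)
# Tax collected by each fully used bracket, then the running total.
tax_per_bracket = width.mul(prev_rate)
df_tax["base_tax"] = tax_per_bracket.cumsum()

print(df_tax)
```

Each base_tax entry is the tax owed by someone whose income sits exactly on that bracket's lower limit, which is why it can be added to the marginal part afterwards.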
Adapting kantal's answer to run as a function:
def income_tax(income, brackets, rates):
    df_tax = pd.DataFrame({'brackets': brackets, 'rates': rates})
    df_tax['base_tax'] = df_tax.brackets.\
        sub(df_tax.brackets.shift(fill_value=0)).\
        mul(df_tax.rates.shift(fill_value=0)).cumsum()
    rows = df_tax.brackets.searchsorted(income, side='right') - 1
    income_bracket_df = df_tax.loc[rows].reset_index(drop=True)
    return pd.Series(income).sub(income_bracket_df.brackets).\
        mul(income_bracket_df.rates).add(income_bracket_df.base_tax)
For example:
income = [0, 9_000, 10_000, 11_000, 30_000, 69_999, 200_000]
brackets = [0, 10_000, 30_000, 70_000] # Lower limits.
rates = [0, .10, .20, .30]
income_tax(income, brackets, rates).tolist()
# [0.0, 0.0, 0.0, 100.0, 2000.0, 9999.8, 49000.0]
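One thing to watch for, as far as I can tell: if income is passed as a pandas Series with a non-default index, the arithmetic in income_tax aligns on index labels against a table whose index was reset to 0..n-1, so I would expect NaN results. The sketch below is one possible way around that, assuming non-negative incomes; `income_tax_keep_index` is a hypothetical name, and it does the arithmetic on NumPy arrays before re-attaching the caller's index.

```python
import numpy as np
import pandas as pd

def income_tax_keep_index(income, brackets, rates):
    """Hypothetical variant of income_tax that keeps the caller's index."""
    income = pd.Series(income)
    brackets = np.asarray(brackets, dtype=float)  # lower limits
    rates = np.asarray(rates, dtype=float)
    # Tax accumulated below each lower limit.
    base_tax = np.concatenate(([0.0], np.cumsum(np.diff(brackets) * rates[:-1])))
    values = income.to_numpy()
    rows = np.searchsorted(brackets, values, side="right") - 1
    tax = base_tax[rows] + (values - brackets[rows]) * rates[rows]
    # Re-attach the original labels so the result lines up with the input.
    return pd.Series(tax, index=income.index)

income = pd.Series([9_000, 11_000, 200_000], index=["Julie", "John", "Andie"])
result = income_tax_keep_index(income, [0, 10_000, 30_000, 70_000], [0, .10, .20, .30])
print(result)
```

Plain lists and arrays get a default index, so for those inputs the result should match what income_tax returns.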
If needed, this approach implements a vectorized marginal tax calculation using only NumPy.
def tax(incomes, bands, rates):
    # Broadcast incomes so that we can compute an amount per income, per band
    incomes_ = np.broadcast_to(incomes, (bands.shape[0] - 1, incomes.shape[0]))
    # Find amounts in bands for each income
    amounts_in_bands = np.clip(incomes_.transpose(),
                               bands[:-1], bands[1:]) - bands[:-1]
    # Calculate tax per band
    taxes = rates * amounts_in_bands
    # Sum tax bands per income
    total_taxes = taxes.sum(axis=1)
    return total_taxes
For usage, the bands should include the upper limit, which in my opinion makes them more explicit.
incomes = np.array([0, 7000, 14000, 28000, 56000, 77000, 210000])
bands = np.array([0, 12500, 50000, 150000, np.inf])
rates = np.array([0, 0.2, 0.4, 0.45])
df = pd.DataFrame()
df['pre_tax'] = incomes
df['post_tax'] = incomes - tax(incomes, bands, rates)
print(df)
Output:
   pre_tax  post_tax
0        0       0.0
1     7000    7000.0
2    14000   13700.0
3    28000   24900.0
4    56000   46100.0
5    77000   58700.0
6   210000  135500.0
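To make the clipping step more concrete, here is a small worked example for a single income of 56000 using the same bands and rates. It is only an illustration; the names `lower`, `upper` and `tax_per_band` are introduced here and do not appear in the function above.

```python
import numpy as np

bands = np.array([0, 12500, 50000, 150000, np.inf])
rates = np.array([0, 0.2, 0.4, 0.45])
income = 56000

lower, upper = bands[:-1], bands[1:]
# Clipping the income into [lower, upper] and subtracting lower gives the
# slice of income that falls inside each band (0 for bands not reached).
amounts_in_bands = np.clip(income, lower, upper) - lower
tax_per_band = rates * amounts_in_bands

print(amounts_in_bands)    # slice of the 56000 taxed in each band
print(tax_per_band)        # tax owed per band
print(tax_per_band.sum())  # total tax for this income
```

The slices should come to about 12500, 37500, 6000 and 0, for a total tax of roughly 9900, which is consistent with the post_tax value of 46100.0 in the table above.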