Calculate tax liabilities based on a marginal tax rate schedule

The question "income tax calculation python" asks how to calculate taxes given a marginal tax rate schedule, and its answer provides a function that works (below).

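However, it only works for a single income value. How would I adapt it to work for a list, NumPy array, or pandas Series of income values? That is, how do I vectorize this code?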

from bisect import bisect

rates = [0, 10, 20, 30]   # 10%  20%  30%

brackets = [10000,        # first 10,000
            30000,        # next  20,000
            70000]        # next  40,000

base_tax = [0,            # 10,000 * 0%
            2000,         # 20,000 * 10%
            10000]        # 40,000 * 20% + 2,000

def tax(income):
    i = bisect(brackets, income)
    if not i:
        return 0
    rate = rates[i]
    bracket = brackets[i-1]
    income_in_bracket = income - bracket
    tax_in_bracket = income_in_bracket * rate / 100
    total_tax = base_tax[i-1] + tax_in_bracket
    return total_tax

One (probably inefficient) way is to use a list comprehension:

def tax_multiple(incomes):
    return [tax(income) for income in incomes]

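Two DataFrames are created, one for the tax parameters and one for the incomes. For each income, we use the `searchsorted` method to get the index of the matching row in the tax table. With those indices we build a new table (df_tax.loc[rows]) and concatenate it with the income table, then compute the tax and drop the columns we no longer need.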

import numpy as np
import pandas as pd

# Test data:
df = pd.DataFrame({"name": ["Bob", "Julie", "Mary", "John", "Bill", "George", "Andie"],
                   "income": [0, 9_000, 10_000, 11_000, 30_000, 69_999, 200_000]})
OUT:
     name  income
0     Bob       0
1   Julie    9000
2    Mary   10000
3    John   11000
4    Bill   30000
5  George   69999
6   Andie  200000

df_tax=pd.DataFrame({"brackets": [0, 10_000, 30_000, 70_000 ],   # lower limits
                     "rates":    [0,  .10,    .20,    .30   ],
                     "base_tax": [0,   0,    2_000,  10_000 ]} )


rows= df_tax["brackets"].searchsorted(df["income"], side="right") - 1  # aka bisect()
OUT:
[0 0 1 1 2 2 3]

df= pd.concat([df,df_tax.loc[rows].reset_index(drop=True)], axis=1) 

df["total_tax"]= df["income"].sub(df["brackets"]).mul(df["rates"]).add(df["base_tax"])

OUT:
     name  income  brackets  rates  base_tax  total_tax
0     Bob       0         0    0.0         0        0.0
1   Julie    9000         0    0.0         0        0.0
2    Mary   10000     10000    0.1         0        0.0
3    John   11000     10000    0.1         0      100.0
4    Bill   30000     30000    0.2      2000     2000.0
5  George   69999     30000    0.2      2000     9999.8
6   Andie  200000     70000    0.3     10000    49000.0

df=df.reindex(columns=["name","income","total_tax"])
OUT:
     name  income  total_tax
0     Bob       0        0.0
1   Julie    9000        0.0
2    Mary   10000        0.0
3    John   11000      100.0
4    Bill   30000     2000.0
5  George   69999     9999.8
6   Andie  200000    49000.0
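
The lookup above does not strictly need DataFrames. As a rough sketch, the same idea can be written with plain NumPy arrays and `np.searchsorted`. The function name `tax_searchsorted` is made up for illustration, the schedule is copied from `df_tax`, and incomes are assumed to be non-negative (a negative income would produce index -1 and silently pick the top bracket).

```python
import numpy as np

# Same schedule as the df_tax table (brackets are lower limits).
brackets = np.array([0, 10_000, 30_000, 70_000])
rates = np.array([0.0, 0.10, 0.20, 0.30])
base_tax = np.array([0, 0, 2_000, 10_000])

def tax_searchsorted(incomes):
    """Hypothetical helper: vectorized bracket lookup without DataFrames.

    Assumes every income is >= 0.
    """
    incomes = np.asarray(incomes)
    # Index of the last bracket whose lower limit is <= income,
    # the array equivalent of bisect() - 1.
    rows = np.searchsorted(brackets, incomes, side="right") - 1
    # Tax owed below the bracket plus the marginal tax inside it.
    return base_tax[rows] + (incomes - brackets[rows]) * rates[rows]

taxes = tax_searchsorted([0, 9_000, 10_000, 11_000, 30_000, 200_000])
```

Indexing the three arrays with `rows` plays the role of `df_tax.loc[rows]`, so no concatenation or column cleanup is needed afterwards.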

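Edit:

You can also compute base_tax up front instead of typing it in by hand:

df_tax["base_tax"] = (df_tax.brackets
                      .sub(df_tax.brackets.shift(fill_value=0))   # width of the bracket below
                      .mul(df_tax.rates.shift(fill_value=0))      # taxed at the rate of the bracket below
                      .cumsum())                                  # running total up to each lower limit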
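
To make the arithmetic behind base_tax easier to follow, here is a small sketch of the same computation in plain NumPy. The intermediate names `widths` and `tax_per_full_bracket` are introduced only for illustration, and the schedule is assumed to be the one from `df_tax`.

```python
import numpy as np

brackets = np.array([0, 10_000, 30_000, 70_000])  # lower limits
rates = np.array([0.0, 0.10, 0.20, 0.30])

# Width of every bracket except the open-ended top one: 10000, 20000, 40000.
widths = np.diff(brackets)
# Tax owed when a bracket is used in full: roughly 0, 2000, 8000.
tax_per_full_bracket = widths * rates[:-1]
# Tax already accumulated on reaching each lower limit: roughly 0, 0, 2000, 10000.
base_tax = np.concatenate(([0.0], np.cumsum(tax_per_full_bracket)))
```

The result matches the hand-written base_tax column, so only the brackets and rates have to be maintained.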

Adapting kantal's answer to run as a function:

def income_tax(income, brackets, rates):
    df_tax = pd.DataFrame({'brackets': brackets, 'rates': rates})
    df_tax['base_tax'] = df_tax.brackets.\
        sub(df_tax.brackets.shift(fill_value=0)).\
        mul(df_tax.rates.shift(fill_value=0)).cumsum()
    rows = df_tax.brackets.searchsorted(income, side='right') - 1
    income_bracket_df = df_tax.loc[rows].reset_index(drop=True)
    return pd.Series(income).sub(income_bracket_df.brackets).\
        mul(income_bracket_df.rates).add(income_bracket_df.base_tax)

For example:

income = [0, 9_000, 10_000, 11_000, 30_000, 69_999, 200_000]
brackets = [0, 10_000, 30_000, 70_000]  # Lower limits.
rates =    [0,    .10,    .20,    .30]

income_tax(income, brackets, rates).tolist()
# [0.0, 0.0, 0.0, 100.0, 2000.0, 9999.8, 49000.0]

If you need it, this approach implements the vectorized marginal tax calculation using only NumPy.

def tax(incomes, bands, rates):
    # Broadcast incomes so that we can compute an amount per income, per band
    incomes_ = np.broadcast_to(incomes, (bands.shape[0] - 1, incomes.shape[0]))
    # Find amounts in bands for each income
    amounts_in_bands = np.clip(incomes_.transpose(),
                               bands[:-1], bands[1:]) - bands[:-1]
    # Calculate tax per band
    taxes = rates * amounts_in_bands
    # Sum tax bands per income
    total_taxes = taxes.sum(axis=1)
    return total_taxes

For usage, the bands should include the upper limit of each band, which in my view makes the schedule more explicit.

incomes = np.array([0, 7000, 14000, 28000, 56000, 77000, 210000])
bands = np.array([0, 12500, 50000, 150000, np.inf])
rates = np.array([0, 0.2, 0.4, 0.45])

df = pd.DataFrame()
df['pre_tax'] = incomes
df['post_tax'] = incomes - tax(incomes, bands, rates)
print(df)

Output:

   pre_tax  post_tax
0        0       0.0
1     7000    7000.0
2    14000   13700.0
3    28000   24900.0
4    56000   46100.0
5    77000   58700.0
6   210000  135500.0
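
To see what the clipping step does, it may help to trace a single income through the bands. This is only an illustrative sketch using the same `bands` and `rates` as in the usage example. The scalar `income` and the intermediate names are assumptions made for the walkthrough.

```python
import numpy as np

bands = np.array([0, 12_500, 50_000, 150_000, np.inf])
rates = np.array([0, 0.2, 0.4, 0.45])

income = 56_000
# Clip the income into each band's [lower, upper] range, then subtract the
# lower limit to get the part of the income that falls inside that band.
amount_in_band = np.clip(income, bands[:-1], bands[1:]) - bands[:-1]
# amount_in_band holds 12500, 37500, 6000, 0
tax_per_band = rates * amount_in_band   # roughly 0, 7500, 2400, 0
total_tax = tax_per_band.sum()          # about 9900, leaving 46100 after tax
```

A band the income never reaches is clipped up to its own lower limit, so it contributes zero, and a band the income passes through completely is clipped down to its upper limit, so it contributes its full width.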