
Compare bank account statements in xlsx or csv file format to find any matching transactions

I have merged and sorted multiple .xlsx bank account statements so that I can manually compare and match any transactions made between my accounts. The goal is to weed out all inter-account transactions before importing the statements into GnuCash, so that no duplicate transaction records end up in my account registers.

So far I have written a bash script that parses a .csv file I create after manually comparing the merged .xlsx files. A combination of sed and awk is used to generate the .qif files needed when importing into GnuCash.

I need some help automating the manual comparison of transactions, since I have not been able to parse the matching transactions successfully with sed or awk.

My bank statements are formatted as follows:

Date recorded;Date occurred;Verification number;Memo;Amount;Balance
2014-04-25;2014-04-25;5629374859;MORTGAGE;-5000;3470.69
2014-04-25;2014-04-25;5284690010;SAVINGS;-200;8470.69
2014-04-25;2014-04-25;5284690010;SAVINGS;-1730;8670.69
2014-04-25;2014-04-25;3788765004S;SALARY;10000.69;10400.69
2014-04-24;2014-04-24;5484384195;PHARMACY /14-04-23;-79;400
2014-04-23;2014-04-22;5434473478;GAS STATION/14-04-22;-521;479
2014-04-23;2014-04-22;5487473797;GROCERY STORE/14-04-22;-661;1000

After merging and sorting all of my bank statements, I add a column containing the account number of the source statement:

Date recorded;Date occurred;Verification number;Memo;Amount;Balance;Source account
2014-04-25;2014-04-25;5629374859;MORTGAGE;-5000;3470.69;123456789
2014-04-25;2014-04-25;5629374859;MORTGAGE;5000;10000;543219876              # Merged from my second account's statement
2014-04-25;2014-04-25;5284690010;SAVINGS;-200;8470.69;123456789
2014-04-25;2014-04-25;5284690010;SAVINGS;200;1930;987654321                 # Merged from my third account's statement
2014-04-25;2014-04-25;5284690010;SAVINGS;-1730;8670.69;123456789
2014-04-25;2014-04-25;5284690010;SAVINGS;1730;1730;987654321                # Merged from my third account's statement
2014-04-25;2014-04-25;3788765004S;SALARY;10000.69;10400.69;123456789
2014-04-24;2014-04-24;5484384195;PHARMACY /14-04-23;-79;400;123456789
2014-04-23;2014-04-22;5434473478;GAS STATION/14-04-22;-521;479;123456789
2014-04-23;2014-04-22;5487473797;GROCERY STORE/14-04-22;-661;1000;123456789

What I need help with is parsing the file of merged bank statements in order to find the transactions made between my accounts. Any transactions (lines in the file) whose Date recorded, Date occurred, Verification number, Memo and Amount all match (ignoring the sign of the amount when comparing two lines) should be handled as follows: 1) keep the source account's transaction line in the file, 2) add a new column ("Destination account") to that source line, holding the destination account's number, and 3) delete the destination account's transaction line from the file.
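The rules above boil down to a comparison key over five of the seven fields. Here is a minimal sketch of such a key function, assuming the ';'-separated field order shown in the samples; `match_key` and the sample lines are illustrative only, not part of any existing script:

```python
def match_key(fields):
    """fields: one ';'-split statement line (without the header row).

    Two lines describe the same inter-account transaction when every
    field except Balance and Source account is equal, with the sign
    of Amount ignored (lstrip('-') drops a leading minus sign).
    """
    date_rec, date_occ, verif, memo, amount = fields[:5]
    return (date_rec, date_occ, verif, memo, amount.lstrip('-'))

src = "2014-04-25;2014-04-25;5629374859;MORTGAGE;-5000;3470.69;123456789".split(';')
dst = "2014-04-25;2014-04-25;5629374859;MORTGAGE;5000;10000;543219876".split(';')
# match_key(src) == match_key(dst)  ->  the two lines pair up
```

Note that this compares amounts as strings, so "5000" and "5000.00" would not match; if your statements format amounts inconsistently, parse them to Decimal and take abs() instead.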

As an example, here is a match:

Date recorded;Date occurred;Verification number;Memo;Amount;Balance;Source account
2014-04-25;2014-04-25;5629374859;MORTGAGE;-5000;3470.69;123456789           # Source account
2014-04-25;2014-04-25;5629374859;MORTGAGE;5000;10000;543219876              # Destination account

After processing these two lines, the resulting output in the file should be:

Date recorded;Date occurred;Verification number;Memo;Amount;Balance;Source account;Destination account
2014-04-25;2014-04-25;5629374859;MORTGAGE;-5000;3470.69;123456789;543219876

After processing all the transactions in my merged bank statement example, the final output should be a file with the following lines:

Date recorded;Date occurred;Verification number;Memo;Amount;Balance;Source account;Destination account
2014-04-25;2014-04-25;5629374859;MORTGAGE;-5000;3470.69;123456789;543219876
2014-04-25;2014-04-25;5284690010;SAVINGS;-200;8470.69;123456789;987654321
2014-04-25;2014-04-25;5284690010;SAVINGS;-1730;8670.69;123456789;987654321
2014-04-25;2014-04-25;3788765004S;SALARY;10000.69;10400.69;123456789;
2014-04-24;2014-04-24;5484384195;PHARMACY /14-04-23;-79;400;123456789;
2014-04-23;2014-04-22;5434473478;GAS STATION/14-04-22;-521;479;123456789;
2014-04-23;2014-04-22;5487473797;GROCERY STORE/14-04-22;-661;1000;123456789;

Note: these four transactions are not transactions between my accounts — they should be kept in the file, with the added "Destination account" column left empty:

2014-04-25;2014-04-25;3788765004S;SALARY;10000.69;10400.69;123456789;
2014-04-24;2014-04-24;5484384195;PHARMACY /14-04-23;-79;400;123456789;
2014-04-23;2014-04-22;5434473478;GAS STATION/14-04-22;-521;479;123456789;
2014-04-23;2014-04-22;5487473797;GROCERY STORE/14-04-22;-661;1000;123456789;

Any solution using tools compatible with my current bash script (or perhaps one using Python's pandas library?) would be greatly appreciated!
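Since pandas is mentioned as an option, here is a minimal sketch of the matching step done that way. This is not anyone's existing pipeline — the sample data is inlined via io.StringIO for illustration, and the grouping logic is one possible approach. Note that the non-numeric verification number ("3788765004S") makes pandas read that whole column as strings, which keeps the keys consistent:

```python
import io
import pandas as pd

sample = """Date recorded;Date occurred;Verification number;Memo;Amount;Balance;Source account
2014-04-25;2014-04-25;5629374859;MORTGAGE;-5000;3470.69;123456789
2014-04-25;2014-04-25;5629374859;MORTGAGE;5000;10000;543219876
2014-04-25;2014-04-25;3788765004S;SALARY;10000.69;10400.69;123456789
"""

df = pd.read_csv(io.StringIO(sample), sep=';')

# Key: all match columns, with the sign of Amount ignored.
df['key'] = list(zip(df['Date recorded'], df['Date occurred'],
                     df['Verification number'].astype(str), df['Memo'],
                     df['Amount'].abs()))

rows = []
for _, grp in df.groupby('key', sort=False):
    neg = grp[grp['Amount'] < 0]
    pos = grp[grp['Amount'] > 0]
    if len(grp) == 2 and len(neg) == 1 and len(pos) == 1:
        # Matched pair: keep the source (negative amount) row and
        # record the other row's account as the destination.
        row = neg.iloc[0].copy()
        row['Destination account'] = pos.iloc[0]['Source account']
        rows.append(row)
    else:
        # Unmatched transactions keep an empty destination column.
        for _, row in grp.iterrows():
            row = row.copy()
            row['Destination account'] = ''
            rows.append(row)

merged = pd.DataFrame(rows).drop(columns='key')
```

Writing the result back out would then just be `merged.to_csv('banktransmerged.csv', sep=';', index=False)` (filename illustrative).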

Assuming you have loaded these line items into a list of lists...

matcher = dict()
for li in line_items:
    # use Verification number (field 2) as key, append (Amount, Source account)
    matcher.setdefault(li[2], []).append((li[4], li[6]))
# then sort these by amount so that "from" is first (negative value means "from")
for k in matcher.keys():
    matcher[k].sort()
[...]
# later, can obtain accounts using Verification...
# assuming "v" has value of Verification number
from_acct, to_acct = [i[1] for i in matcher.get(v, ((None, None), (None, None)))]

I think this handles the transaction records described in your updated question.

It first creates a dictionary of type defaultdict(list) from the input csv file, keyed on the criteria that define a transaction match. All transactions with the same key are stored in the associated list.

After that, it walks the list of transactions collected for each key in pairs and creates merged transaction records from them, adding an extra destination-account field taken from the source account of the second transaction in the pair. Each merged transaction record created is then written to the output csv file.

Unpaired transactions simply become merged records with an empty destination field. Paired transactions are only merged when the signs of the two amounts differ; otherwise they are treated as two unpaired transactions, as just described.
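The pairwise walk relies on a standard Python grouper idiom: passing two references to the *same* iterator to `zip_longest` (`izip_longest` in Python 2) consumes the list in non-overlapping pairs, padding a trailing odd element with None. A tiny standalone illustration:

```python
from itertools import zip_longest

# Hypothetical list of transactions collected under one match key.
transacts = ['t1', 't2', 't3']

# [iter(transacts)] * 2 is two references to the SAME iterator, so
# zip_longest pulls items off in non-overlapping pairs; the odd
# trailing element is paired with the fillvalue (None by default).
pairs = list(zip_longest(*([iter(transacts)] * 2)))
# pairs == [('t1', 't2'), ('t3', None)]
```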

from collections import defaultdict, namedtuple
import csv
from itertools import zip_longest

# A couple of utility string conversion functions.
def rename(name):
    """ Convert csv column name to a valid namedtuple fieldname which must be a
    valid Python identifier. Not exhaustive, but good enough for the headers
    shown (and is reversible, see below).
    """
    return name.lower().replace(' ', '_')

def undo_rename(name):
    """ Convert munged namedtuple fieldname back to a csv column name. """
    return name.replace('_', ' ').capitalize()

banktrans_filename = 'banktrans.csv'
banktrans_merged_filename = 'banktransmerged.csv'
DELIMITER = ';'
matched_trans = defaultdict(list)

with open(banktrans_filename, 'r', newline='') as banktrans_file:
    reader = csv.reader(banktrans_file, delimiter=DELIMITER)
    # create namedtuple fieldnames from csv header row
    fieldnames = [rename(column_name) for column_name in next(reader)]
    Transaction = namedtuple('Transaction', fieldnames)
    for transact in map(Transaction._make, reader):
        match_key = (transact.date_recorded, transact.date_occurred,
                     transact.verification_number, transact.memo,
                     # disregard any leading minus sign in amount field
                     transact.amount[transact.amount.startswith('-'):])
        matched_trans[match_key].append(transact)

with open(banktrans_merged_filename, 'w', newline='') as banktrans_merged_file:
    writer = csv.writer(banktrans_merged_file, delimiter=DELIMITER)
    # merged transactions have an additional fieldname at the end
    mergedfieldnames = fieldnames + [rename('Destination account')]
    MergedTransaction = namedtuple('MergedTransaction', mergedfieldnames)
    # write header row
    writer.writerow([undo_rename(fieldname) for fieldname in mergedfieldnames])
    # merge pairs of matched transactions
    for match_key, transacts in sorted(matched_trans.items()):
        for trans_pair in zip_longest(*([iter(transacts)]*2)):
            if trans_pair[1] is None:  # unmatched trans, copy & add empty col
                merged_transact = MergedTransaction._make(trans_pair[0] + ('',))
            elif (trans_pair[0].amount.startswith('-') ==
                  trans_pair[1].amount.startswith('-')):  # amts have same sign?
                # records shouldn't be merged, treat as two unmatched trans
                merged_transact = MergedTransaction._make(trans_pair[0] + ('',))
                writer.writerow(merged_transact)
                merged_transact = MergedTransaction._make(trans_pair[1] + ('',))
                writer.writerow(merged_transact)
                continue  # skip remainder of loop
            else:  # merge pair by making source of the second the dest account
                merged_transact = MergedTransaction._make(
                    trans_pair[0] + (trans_pair[1].source_account,))
            writer.writerow(merged_transact)

print('merged transactions saved to file: ' + repr(banktrans_merged_filename))

Contents of the resulting output file:

Date recorded;Date occurred;Verification number;Memo;Amount;Balance;Source account;Destination account
2014-04-23;2014-04-22;5434473478;GAS STATION/14-04-22;-521;479;123456789;
2014-04-23;2014-04-22;5487473797;GROCERY STORE/14-04-22;-661;1000;123456789;
2014-04-24;2014-04-24;5484384195;PHARMACY /14-04-23;-79;400;123456789;
2014-04-25;2014-04-25;3788765004S;SALARY;10000.69;10400.69;123456789;
2014-04-25;2014-04-25;5284690010;SAVINGS;-1730;8670.69;123456789;987654321
2014-04-25;2014-04-25;5284690010;SAVINGS;-200;8470.69;123456789;987654321
2014-04-25;2014-04-25;5629374859;MORTGAGE;-5000;3470.69;123456789;543219876