Possible float error when reconciling totals?

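My problem is that I have a pre-process that reads data from a CSV and reconciles it against two client-given fields (document count and check total). It parses the data, calculates its own totals, and then compares the two sets of totals.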

First, here are my imports:

from csv import reader, writer, QUOTE_MINIMAL
import logging
from os import getcwd, mkdir, path
from sys import argv
from datetime import date
from types import IntType, FloatType 

Next, here is the reconciliation step itself:

def _recon_totals(self):
        """
        Reconcile the check total amount and document count and write out the file name,
        check numbers, vendor names, and timestamp to weekly report.
        """

        # Client totals
        client_doc_count = int(self.header_data[0][6])
        client_check_tot = float(self.header_data[0][7])
        # Double check variable typing for reconciliation totals.
        logging.info('Document count is: {0}'.format(client_doc_count))
        # doc_var_type = type(client_doc_count)
        # assert doc_var_type is IntType, 'Doc count is not an integer: {0}'.format(
        #    doc_var_type) 
        logging.info('Check Total is: {0}'.format(client_check_tot))
        # check_var_type = type(client_check_tot)
        # assert check_var_type is FloatType, 'Check tot is not a float: {0}'.format(
        #    check_var_type)

        # RRD totals
        rrd_doc_count = 0
        rrd_check_tot = 0.0

        with open(self.rpt_of, 'a') as rpt_outfile:
            for transact in self.transact_data:
                row_type = transact[0]
                logging.debug('Transaction type is: {0}'.format(row_type))

                if row_type == 'P':
                    # Reconciliation
                    rrd_doc_count += 1
                    trans_chk_amt = float(transact[12])
                    # trans_chk_type = type(trans_chk_amt)
                    # assert trans_chk_type is FloatType, 'Transaction Check Total is '\
                    #                                     'not a float: {0}'.format(
                    #                                         trans_chk_type)
                    rrd_check_tot += trans_chk_amt
                    # Reporting
                    vend_name = transact[2]
                    file_name = self.infile.split('/')[-1]
                    print('File name', file_name)
                    check_num = transact[9]
                    cur_time = date.today()
                    rpt_outfile.write('{0:<50}{1:<50}{2:<30}{3}\n'.format(file_name,
                                                                          vend_name,
                                                                          check_num,
                                                                          cur_time))
        # Reconcile totals and return the lists for writing if they are correct
        # if (client_doc_count, client_check_tot) == (rrd_doc_count, rrd_check_tot):
        #     logging.info('Recon totals match!')
        if client_doc_count == rrd_doc_count and client_check_tot == rrd_check_tot:
        #     logging.info('Recon totals match!')
            return True

        else:
            raise ValueError('Recon totals do not match! Client: {0} {1} {2} {3}\n'
                             'RRD {4} {5} {6} {7}'.format(client_doc_count,
                                                          client_check_tot,
                                                          type(client_doc_count),
                                                          type(client_check_tot),
                                                          rrd_doc_count,
                                                          rrd_check_tot,
                                                          type(rrd_doc_count),
                                                          type(rrd_check_tot)))

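I ran 6 files: 4 of them ran fine (passed reconciliation) and 2 failed. That could be normal, since clients do sometimes send us bad data, except that I can't find anything in the data showing that it's wrong. Even my stack trace shows that the client's totals and my totals should agree: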

ValueError: Recon totals do not match! Client: 2 8739.54 <type 'int'> <type 'float'>
RRD 2 8739.54 <type 'int'> <type 'float'>

I tried two different ways of writing the comparison statement and got the same result both times (as expected).

Finally, here is a sample record (anonymized except for the relevant fields). This is the header record with its counts:

"H","XXX","XXX","XXX","XXX","XXX","2","8739.54","","","","","","","","","","","","","","","",""

And here are the rows I reconcile against it:

"P","XXX","XXX","XXX","","XXX","XXX","XXX","XXX","XXX","XXX","XXX","846.80",...(more fields that aren't pertinent)
"P","XXX","XXX","XXX","","XXX","XXX","XXX","XXX","XXX","XXX","XXX","7892.74",...(more fields that aren't pertinent)

For each "P" record I increment my document count and then add the non-"XXX" field to a running total.
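A minimal, self-contained sketch of that reconciliation loop, written for Python 3 and run against shortened copies of the sample rows above (the elided trailing fields are simply dropped, so the check amount is still at index 12). The variable names mirror the question's, but this is an illustration, not the asker's actual code. It also shows one likely reason the error message looks identical on both sides: in Python 2, formatting a float with {0} goes through str(), which rounds to 12 significant digits, whereas {0!r} shows the repr.

```python
import csv
import io

# Shortened copies of the sample rows: elided fields are left off,
# so index 7 of the header and index 12 of each "P" row still line up.
SAMPLE = '''"H","XXX","XXX","XXX","XXX","XXX","2","8739.54"
"P","XXX","XXX","XXX","","XXX","XXX","XXX","XXX","XXX","XXX","XXX","846.80"
"P","XXX","XXX","XXX","","XXX","XXX","XXX","XXX","XXX","XXX","XXX","7892.74"
'''

rows = list(csv.reader(io.StringIO(SAMPLE)))

# Client totals from the header record
header = rows[0]
client_doc_count = int(header[6])
client_check_tot = float(header[7])

# Our own totals, accumulated the same way as in the question
rrd_doc_count = 0
rrd_check_tot = 0.0
for row in rows[1:]:
    if row[0] == 'P':
        rrd_doc_count += 1
        rrd_check_tot += float(row[12])

# !r prints the repr, which shows the digits that Python 2's str() hides
print('Client: {0!r}  RRD: {1!r}'.format(client_check_tot, rrd_check_tot))
print('Counts match:', client_doc_count == rrd_doc_count)
print('Totals match:', client_check_tot == rrd_check_tot)
```

With these three rows the document counts agree, but the float comparison of the check totals comes out False, even though both totals would print as 8739.54 through Python 2's str().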

In short, any help would be greatly appreciated; I can't see any logic error on my part.

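I wouldn't rely on floating-point equality checks for real-world data, because floating-point math is imprecise in all sorts of strange ways. First, make sure the discrepancy really is caused by floating-point imprecision: print the difference between the two values you're comparing and confirm that it is very, very small compared to the numbers you're working with. Then define a margin of error within which the two totals are considered effectively equal; for real-world money, half a cent seems like a natural value for that tolerance.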
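A minimal sketch of both steps in Python 3, assuming the check amounts from the sample rows in the question. The helper name totals_match and the HALF_CENT constant are illustrative, not from the question; math.isclose (Python 3.5+) is called with rel_tol=0 so that only the absolute half-cent tolerance applies.

```python
import math

# Totals as the question computes them: parsed from strings with float()
client_check_tot = float('8739.54')
rrd_check_tot = float('846.80') + float('7892.74')

# Step 1: confirm the mismatch is float noise by looking at the difference
diff = rrd_check_tot - client_check_tot
print('Difference: {0!r}'.format(diff))

# Step 2: compare with an absolute tolerance of half a cent
HALF_CENT = 0.005  # illustrative constant


def totals_match(a, b, tolerance=HALF_CENT):
    """Return True if two money totals agree to within `tolerance`."""
    # rel_tol=0.0 disables the relative check; only abs_tol is used
    return math.isclose(a, b, rel_tol=0.0, abs_tol=tolerance)


print('Exact match:', rrd_check_tot == client_check_tot)
print('Within half a cent:', totals_match(rrd_check_tot, client_check_tot))
```

For these values the difference from step 1 is on the order of 1e-12, far below a cent, which points to float rounding rather than bad client data.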

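I disagree with the answers suggesting a margin of error. It's unreliable (the size of the error changes with the floats you're summing) and really doesn't seem like a good solution. It reminds me of the movie Office Space, where they shave fractions of a penny off each transaction and move them into another bank account (your margin of error).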

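I do, however, fully agree with the suggestion to check the difference by subtraction to make sure this really is a floating-point error.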

I would drop floats entirely and use the decimal library. All you need to do is replace every float constructor with a Decimal constructor:

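# logging and date are used below (same as in the question's imports)
import logging
from datetime import date
from decimal import Decimal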


def _recon_totals(self):
    """
    Reconcile the check total amount and document count and write out the file name,
    check numbers, vendor names, and timestamp to weekly report.
    """

    # Client totals
    client_doc_count = int(self.header_data[0][6])
    client_check_tot = Decimal(self.header_data[0][7])
    # Double check variable typing for reconciliation totals.
    logging.info('Document count is: {0}'.format(client_doc_count))
    # doc_var_type = type(client_doc_count)
    # assert doc_var_type is IntType, 'Doc count is not an integer: {0}'.format(
    #    doc_var_type) 
    logging.info('Check Total is: {0}'.format(client_check_tot))

    # RRD totals
    rrd_doc_count = 0
    rrd_check_tot = Decimal('0')  # start from an exact zero, not a float

    with open(self.rpt_of, 'a') as rpt_outfile:
        for transact in self.transact_data:
            row_type = transact[0]
            logging.debug('Transaction type is: {0}'.format(row_type))

            if row_type == 'P':
                # Reconciliation
                rrd_doc_count += 1
                trans_chk_amt = Decimal(transact[12])
                rrd_check_tot += trans_chk_amt
                # Reporting
                vend_name = transact[2]
                file_name = self.infile.split('/')[-1]
                print('File name', file_name)
                check_num = transact[9]
                cur_time = date.today()
                rpt_outfile.write('{0:<50}{1:<50}{2:<30}{3}\n'.format(file_name,
                                                                      vend_name,
                                                                      check_num,
                                                                      cur_time))
    # Reconcile totals and return the lists for writing if they are correct
    # if (client_doc_count, client_check_tot) == (rrd_doc_count, rrd_check_tot):
    #     logging.info('Recon totals match!')
    if client_doc_count == rrd_doc_count and client_check_tot == rrd_check_tot:
    #     logging.info('Recon totals match!')
        return True

    else:
        raise ValueError('Recon totals do not match! Client: {0} {1} {2} {3}\n'
                         'RRD {4} {5} {6} {7}'.format(client_doc_count,
                                                      client_check_tot,
                                                      type(client_doc_count),
                                                      type(client_check_tot),
                                                      rrd_doc_count,
                                                      rrd_check_tot,
                                                      type(rrd_doc_count),
                                                      type(rrd_check_tot)))

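Decimals work by storing numbers in base 10 rather than in base 2 as floats do. Here are some examples of float inaccuracy. Since money is generally handled in base 10, it only makes sense to work with it in base-10 notation rather than lossily converting it to base 2 first and then back to base 10.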
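A short sketch (Python 3, standard decimal module only) of the point above, using the amounts from the sample rows. Building each Decimal from the CSV string keeps the exact base-10 value; building it from a float carries the binary approximation along, which is why the code above passes transact[12] straight to Decimal rather than going through float() first.

```python
from decimal import Decimal

# Decimals built from the CSV strings keep the exact base-10 values
client_check_tot = Decimal('8739.54')
rrd_check_tot = Decimal('0')
for amount in ('846.80', '7892.74'):
    rrd_check_tot += Decimal(amount)

print(rrd_check_tot)                      # 8739.54
print(rrd_check_tot == client_check_tot)  # True

# Constructing from a float instead copies the float's binary approximation
print(Decimal(846.80) == Decimal('846.80'))  # False
```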