如何正确包装单元测试差异?

How to wrap correctly the unit testing diff?

我是 运行 python 3.6.4,但有时单元测试差异无法按预期工作。例如,在下面有一个强制单元测试错误与预期的换行行为是不需要的。

import unittest

class TestSemanticRules(unittest.TestCase):
    maxDiff = None

    def test_badWrapping(self):
        self.assertEqual(
            "1. Duplicated target language name defined in your grammar on: [@-1,63:87='Abstract Machine Language'<__ANON_3>,3:19]\n"
            "2. Duplicated master scope name defined in your grammar on: [@-1,138:147='source.sma'<__ANON_3>,5:20]"
            ,
            "1. Duplicated target language name defined in your grammar on: free_input_string\n"
            "  text_chunk_end  Abstract Machine Language"
            "\n"
            "2. Duplicated master scope name defined in your grammar on: free_input_string\n"
            "  text_chunk_end  source.sma"
        )

unittest.main(failfast=True)

运行 它与 python3 test.py 你看到第一个错误 diff 行没有被包裹:

预期结果为:

我尝试搜索替代差异库,然后我尝试用自定义差异库替换 unittest 差异作为内置 difflib,但差异结果相同。所以,我假设 unittest 包正在使用 difflib.

import unittest
import difflib

class TestSemanticRules(unittest.TestCase):
    maxDiff = None

    def myAssertEquals(self, expected, actual):
        expected = expected.splitlines( 1 )
        actual = actual.splitlines( 1 )

        if expected != actual:
            diff = difflib.context_diff( expected, actual, fromfile='expected input', tofile='actual output', lineterm='\n' )
            self.fail( '\n' + ''.join( diff ) )

    def test_badWrapping(self):
        self.myAssertEquals(
            "1. Duplicated target language name defined in your grammar on: [@-1,63:87='Abstract Machine Language'<__ANON_3>,3:19]\n"
            "2. Duplicated master scope name defined in your grammar on: [@-1,138:147='source.sma'<__ANON_3>,5:20]"
            ,
            "1. Duplicated target language name defined in your grammar on: free_input_string\n"
            "  text_chunk_end  Abstract Machine Language"
            "\n"
            "2. Duplicated master scope name defined in your grammar on: free_input_string\n"
            "  text_chunk_end  source.sma"
        )

可以配置unittest包使用的difflib内置库,所以,不会出现这种情况?或者 difflib 包有可靠的替代品吗?

搜索 difflib 的替代品我得到了 3 个结果:

  1. https://github.com/Carreau/difflib2.py(4 年无更新)
  2. https://github.com/google/diff-match-patch
  3. https://github.com/seperman/deepdiff

然后,使用 diff-match-patch 我设法构建了以下代码:

import re
import unittest

import textwrap
import diff_match_patch

class DiffMatchPatch(diff_match_patch.diff_match_patch):

    def diff_prettyText(self, diffs):
        """Convert a diff array into a pretty Text report.
        Args:
          diffs: Array of diff tuples.
        Returns:
          Text representation.
        """
        results_diff = []
        cut_next_new_line = [False]
        # print('\ndiffs:\n%s\n' % diffs)

        operations = (self.DIFF_INSERT, self.DIFF_DELETE)

        def parse(sign):
            # print('new1:', text.encode( 'ascii' ))

            if text:
                new = text

            else:
                return ''

            new = textwrap.indent( "%s" % new, sign, lambda line: True )

            # force the diff change to show up on a new line for highlighting
            if len(results_diff) > 0:
                new = '\n' + new

            if new[-1] == '\n':

                if op == self.DIFF_INSERT and next_text and new[-1] == '\n' and next_text[0] == '\n':
                    cut_next_new_line[0] = True;

                    # Avoids a double plus sign showing up when the diff has the element (1, '\n')
                    if len(text) > 1: new = new + '%s\n' % sign

            elif next_op not in operations and next_text and next_text[0] != '\n':
                new = new + '\n'

            # print('new2:', new.encode( 'ascii' ))
            return new

        for index in range(len(diffs)):
            op, text = diffs[index]
            if index < len(diffs) - 1: 
                next_op, next_text = diffs[index+1]
            else:
                next_op, next_text = (0, "")

            if op == self.DIFF_INSERT:
                results_diff.append( parse( "+ " ) )

            elif op == self.DIFF_DELETE:
                results_diff.append( parse( "- " ) )

            elif op == self.DIFF_EQUAL:
                # print('new3:', text.encode( 'ascii' ))
                text = textwrap.indent(text, "  ")

                if cut_next_new_line[0]:
                    cut_next_new_line[0] = False
                    text = text[1:]

                results_diff.append(text)
                # print('new4:', text.encode( 'ascii' ))

        return "".join(results_diff)

    def diff_linesToWords(self, text1, text2, delimiter=re.compile('\n')):
        """
            Split two texts into an array of strings.  Reduce the texts to a string
            of hashes where each Unicode character represents one line.

            95% of this function code is copied from `diff_linesToChars` on:
                https://github.com/google/diff-match-patch/blob/895a9512bbcee0ac5a8ffcee36062c8a79f5dcda/python3/diff_match_patch.py#L381

            Copyright 2018 The diff-match-patch Authors.
            https://github.com/google/diff-match-patch
            Licensed under the Apache License, Version 2.0 (the "License");
            you may not use this file except in compliance with the License.
            You may obtain a copy of the License at
              http://www.apache.org/licenses/LICENSE-2.0

            Args:
                text1: First string.
                text2: Second string.
                delimiter: a re.compile() expression for the word delimiter type

            Returns:
                Three element tuple, containing the encoded text1, the encoded text2 and
                the array of unique strings.  The zeroth element of the array of unique
                strings is intentionally blank.
        """
        lineArray = []  # e.g. lineArray[4] == "Hello\n"
        lineHash = {}   # e.g. lineHash["Hello\n"] == 4

        # "\x00" is a valid character, but various debuggers don't like it.
        # So we'll insert a junk entry to avoid generating a null character.
        lineArray.append('')

        def diff_linesToCharsMunge(text):
            """Split a text into an array of strings.  Reduce the texts to a string
            of hashes where each Unicode character represents one line.
            Modifies linearray and linehash through being a closure.
            Args:
                text: String to encode.
            Returns:
                Encoded string.
            """
            chars = []
            # Walk the text, pulling out a substring for each line.
            # text.split('\n') would would temporarily double our memory footprint.
            # Modifying text would create many large strings to garbage collect.
            lineStart = 0
            lineEnd = -1
            while lineEnd < len(text) - 1:
                lineEnd = delimiter.search(text, lineStart)

                if lineEnd:
                    lineEnd = lineEnd.start()

                else:
                    lineEnd = len(text) - 1

                line = text[lineStart:lineEnd + 1]

                if line in lineHash:
                    chars.append(chr(lineHash[line]))
                else:
                    if len(lineArray) == maxLines:
                        # Bail out at 1114111 because chr(1114112) throws.
                        line = text[lineStart:]
                        lineEnd = len(text)
                    lineArray.append(line)
                    lineHash[line] = len(lineArray) - 1
                    chars.append(chr(len(lineArray) - 1))
                lineStart = lineEnd + 1
            return "".join(chars)

        # Allocate 2/3rds of the space for text1, the rest for text2.
        maxLines = 666666
        chars1 = diff_linesToCharsMunge(text1)
        maxLines = 1114111
        chars2 = diff_linesToCharsMunge(text2)
        return (chars1, chars2, lineArray)

class TestRules(unittest.TestCase):
    ## Set the maximum size of the assertion error message when Unit Test fail
    maxDiff = None

    ## Whether `characters diff=0`, `words diff=1` or `lines diff=2` will be used
    diffMode = 1

    def __init__(self, *args, **kwargs):
        diffMode = kwargs.pop('diffMode', -1)
        if diffMode > -1: self.diffMode = diffMode

        super(TestRules, self).__init__(*args, **kwargs)

    def setUp(self):
        if diff_match_patch: self.addTypeEqualityFunc(str, self.myAssertEqual)

    def myAssertEqual(self, expected, actual, msg=""):
        """
            How to wrap correctly the unit testing diff?
            
        """
        # print( '\n\nexpected\n%s' % expected )
        # print( '\n\nactual\n%s' % actual )

        if expected != actual:
            diff_match = DiffMatchPatch()

            if self.diffMode == 0:
                diffs = diff_match.diff_main(expected, actual)

            else:
                diff_struct = diff_match.diff_linesToWords(expected, actual,
                        re.compile(r'\b') if self.diffMode == 1 else re.compile(r'\n') )

                lineText1 = diff_struct[0] # .chars1;
                lineText2 = diff_struct[1] # .chars2;
                lineArray = diff_struct[2] # .lineArray;

                diffs = diff_match.diff_main(lineText1, lineText2, False);
                diff_match.diff_charsToLines(diffs, lineArray);
                diff_match.diff_cleanupSemantic(diffs)

            if msg:
                msg += '\n'

            else:
                msg = "The strings does not match...\n"

            self.fail( msg + diff_match.diff_prettyText(diffs) )

    def test_characthersDiffModeExample1(self):
        self.diffMode = 0
        expected = "1. Duplicated target language name defined in your grammar on: [@-1,63:87='Abstract Machine Language'<__ANON_3>,3:19]\n" \
                   "2. Duplicated master scope name defined in your grammar on: [@-1,138:147='source.sma'<__ANON_3>,5:20]"

        actual = "1. Duplicated target language name defined in your grammar on: free_input_string\n" \
                 "  text_chunk_end  Abstract Machine Language\n" \
                 "\n" \
                 "2. Duplicated master scope name defined in your grammar on: free_input_string\n" \
                 "  text_chunk_end  source.sma" \

        with self.assertRaises( AssertionError ) as error:
            self.myAssertEqual( expected, actual )

        print( '\nerror.exception\n%s\n' % str(error.exception) )
        self.assertEqual(
            "The strings does not match...\n"
            "  1. Duplicated target language name defined in your grammar on: \n"
            "- [@-1,63:87='\n"
            "+ free_input_string\n"
            "+   text_chunk_end  \n"
            "  Abstract Machine Language\n"
            "- '<__ANON_3>,3:19]\n"
            "+ \n"
            "  2. Duplicated master scope name defined in your grammar on: \n"
            "- [@-1,138:147='\n"
            "+ free_input_string\n"
            "+   text_chunk_end  \n"
            "  source.sma\n"
            "- '<__ANON_3>,5:20]"
            , str(error.exception) )

    def test_wordsDiffModeExample1(self):
        self.diffMode = 1
        expected = "1. Duplicated target language name defined in your grammar on: [@-1,63:87='Abstract Machine Language'<__ANON_3>,3:19]\n" \
                   "2. Duplicated master scope name defined in your grammar on: [@-1,138:147='source.sma'<__ANON_3>,5:20]"

        actual = "1. Duplicated target language name defined in your grammar on: free_input_string\n" \
                 "  text_chunk_end  Abstract Machine Language\n" \
                 "\n" \
                 "2. Duplicated master scope name defined in your grammar on: free_input_string\n" \
                 "  text_chunk_end  source.sma" \

        with self.assertRaises( AssertionError ) as error:
            self.myAssertEqual( expected, actual )

        print( '\nerror.exception\n%s\n' % str(error.exception) )
        self.assertEqual(
            "The strings does not match...\n"
            "  1. Duplicated target language name defined in your grammar on: \n"
            "- [@-1,63:87='Abstract Machine Language'<__ANON_3>,3:19]\n"
            "+ free_input_string\n"
            "+   text_chunk_end  Abstract Machine Language\n"
            "+ \n"
            "  2. Duplicated master scope name defined in your grammar on: \n"
            "- [@-1,138:147='source.sma'<__ANON_3>,5:20]\n"
            "+ free_input_string\n"
            "+   text_chunk_end  source.sma"
            , str(error.exception) )

    def test_linesDiffModeExample1(self):
        self.diffMode = 2
        expected = "1. Duplicated target language name defined in your grammar on: [@-1,63:87='Abstract Machine Language'<__ANON_3>,3:19]\n" \
                   "2. Duplicated master scope name defined in your grammar on: [@-1,138:147='source.sma'<__ANON_3>,5:20]"

        actual = "1. Duplicated target language name defined in your grammar on: free_input_string\n" \
                 "  text_chunk_end  Abstract Machine Language\n" \
                 "\n" \
                 "2. Duplicated master scope name defined in your grammar on: free_input_string\n" \
                 "  text_chunk_end  source.sma" \

        with self.assertRaises( AssertionError ) as error:
            self.myAssertEqual( expected, actual )

        print( '\nerror.exception\n%s\n' % str(error.exception) )
        self.assertEqual(
            "The strings does not match...\n"
            "- 1. Duplicated target language name defined in your grammar on: [@-1,63:87='Abstract Machine Language'<__ANON_3>,3:19]\n"
            "- 2. Duplicated master scope name defined in your grammar on: [@-1,138:147='source.sma'<__ANON_3>,5:20]\n"
            "+ 1. Duplicated target language name defined in your grammar on: free_input_string\n"
            "+   text_chunk_end  Abstract Machine Language\n"
            "+ \n"
            "+ 2. Duplicated master scope name defined in your grammar on: free_input_string\n"
            "+   text_chunk_end  source.sma"
            , str(error.exception) )

unittest.main(failfast=True, verbosity=2)

使用diffMode=0作为字符


使用diffMode=1作为文字


使用diffMode=2

这似乎已经比 unittest 模块的内置行为更好。这个新的diff_prettyText()怎么还能改进?

参考资料

  1. How to print the comparison of two multiline strings in unified diff format?

  2. python difflib comparing files
  3. How to print the 2 full objects instead of show diff on a python unit test error?

这是 python 上的错误,可以通过应用此补丁修复:

  1. unittest assertEqual difference output foiled by newlines
  2. [https://bugs.python.org/issue35687](The 单元测试模块差异 missing/forgetting/not 在 + 和 ? 之前放置换行符对于一些输入)