Is there a good, controllable way to compare two .json files and write their differences to an Excel sheet in Python?

I tried using jsondiff, but the output doesn't meet my requirements.

import json

from jsondiff import diff

with open('C/abc/file1.json') as f:
    a = json.load(f)

with open('C/abc/file2.json') as s:
    b = json.load(s)

c = diff(a, b)

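What I want to print in the Excel sheet is the delta: what exists in file1 but not in file2, what exists in file1 but was changed in file2, and what was newly added in file2 and is not in file1, plus the line numbers if possible. If anyone knows how to achieve this, please share it, and let me know if you need more clarification.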

Because of restrictions I can't paste the raw contents of the files, but I have put content from these files below.

file1.json

{
  "Indicator": {
    "key1": "value 1",
    "key2": "value 2",
    "Name": "value 3",
    "key4": "value 4",
    "Description": "some text",
    "Subformulas": {
      "some key": "some sub-formula"
    },
    "Formula": "some formula"
  }
}

file2.json

{
  "Indicator": {
    "key1": "value 1",
    "key2": "value 2",
    "key3":"value changed",
    "Name": "value 3",
    "key4": "value 4",
    "Description": "some text",
    "Subformulas": {
      "some key": "change in some sub-formula"
    },
    "Formula": "some formula"
  }
}

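I have used the code below to print the differences to an Excel sheet. I found it in an answer to another Stack Overflow question, but it doesn't put the differences into the sheet: the terminal shows them, but the Excel sheet doesn't. So I think I'm doing something wrong, and I don't know how to correct it.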

import xlsxwriter

# c is the diff computed above with jsondiff
workbook = xlsxwriter.Workbook('myfile.xlsx')
worksheet = workbook.add_worksheet()
row = 0
col = 0

order = sorted(c.keys())

for key in order:
    row += 1
    # first column: the top-level key of the diff
    worksheet.write(row, col, key)
    # when c[key] is a dict, this loop yields its keys, not the changed values
    for item in c[key]:
        worksheet.write(row, col + 1, item)
        col += 1
    col = 0

workbook.close()
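One likely reason only names end up in the sheet is plain Python behaviour: iterating over a dict yields its keys only. A minimal illustration, using a hand-written dict shaped like a nested diff (hypothetical data, not actual jsondiff output):

```python
# Hand-written stand-in shaped like a nested diff (hypothetical data,
# not actual jsondiff output).
diff_result = {
    "Indicator": {
        "key3": "value changed",
        "Subformulas": {"some key": "change in some sub-formula"},
    }
}

# Iterating over a dict yields only its keys, so this collects names, not values.
keys_only = [item for item in diff_result["Indicator"]]
print(keys_only)  # ['key3', 'Subformulas']

# .items() yields (key, value) pairs; the second value is itself a dict,
# so nested results still have to be walked before they can go into cells.
pairs = list(diff_result["Indicator"].items())
print(pairs)
```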

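If you read the jsondiff source code (https://github.com/xlwings/jsondiff/tree/05dd7dd6c0b712fe54491289d3972ab58a125e11/jsondiff), you'll see that there are several options for the 'syntax' of the result, which are not mentioned in the short documentation. The default is 'compact', which, as you found, seems to show only what has changed. If you choose the 'explicit' syntax, you get 'insert', 'update', 'delete', etc. in the diff output. There is also a 'symmetric' syntax you could explore.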

So change:

c = jsondiff.diff(a,b)

to:

c = jsondiff.diff(a,b,syntax='explicit')

I modified your file2 so that it adds and removes some keys:

b = {
  "Indicator": {
    "key1": "value 1",
    "key3":"value changed",
    "Name": "value 3",
    "key4": "value 4",
    "key5": "added",
    
    "Description": "some text",
    "Subformulas": {
      "some key": "change in some sub-formula"
    },
    "Formula": "some formula"
  }
}

The diff result c (pretty-printed with pprint) is:

{update: {'Indicator': {insert: {'key3': 'value changed', 'key5': 'added'},
                        delete: ['key2'],
                        update: {'Subformulas': {update: {'some key': 'change '
                                                                      'in some '
                                                                      'sub-formula'}}}}}}

The code that writes the spreadsheet has to 'walk' the nested dictionaries to produce its output.

After some research, I found the code for a generic list/dictionary walker, objwalk(): https://code.activestate.com/recipes/577982-recursively-walk-python-objects/
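For plain JSON data, a smaller walker may be enough. The sketch below uses the hypothetical name `walk`; it handles only dicts and lists/tuples and drops objwalk's `memo` cycle check, on the assumption that the input is a tree, which is what `json.load` produces.

```python
from typing import Any, Iterator, Tuple

Path = Tuple[Any, ...]


def walk(obj: Any, path: Path = ()) -> Iterator[Tuple[Path, Any]]:
    """Yield (path, leaf_value) for every value that is not a dict or list."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from walk(value, path + (key,))
    elif isinstance(obj, (list, tuple)):
        # list positions become integer path components
        for index, value in enumerate(obj):
            yield from walk(value, path + (index,))
    else:
        yield path, obj


nested = {"Indicator": {"key1": "value 1", "Subformulas": {"some key": "x"}}}
for path, value in walk(nested):
    print(path, value)
# ('Indicator', 'key1') value 1
# ('Indicator', 'Subformulas', 'some key') x
```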

I chose to flatten each key path by joining the key names with /. Obviously, if your keys can contain /, you will have to adapt this.
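One way to keep joined paths unambiguous when keys may contain / is the escaping used by JSON Pointer (RFC 6901): ~ becomes ~0, / becomes ~1, and every token is prefixed with /. The helper names `to_pointer` and `from_pointer` are hypothetical. Note that the result starts with a leading /, unlike the "Indicator/key3" style used in this answer, and `from_pointer` returns every token as a string, so integer list positions come back as text.

```python
def to_pointer(path):
    """Join a tuple of keys into an RFC 6901 JSON Pointer string."""
    # Escape '~' before '/', otherwise the '~' produced by '~1' would be escaped again.
    return "".join("/" + str(key).replace("~", "~0").replace("/", "~1") for key in path)


def from_pointer(pointer):
    """Split a JSON Pointer back into a tuple of (string) keys."""
    if pointer == "":
        return ()  # the empty pointer refers to the whole document
    tokens = pointer[1:].split("/")
    # Unescape in the reverse order: '~1' -> '/' first, then '~0' -> '~'.
    return tuple(t.replace("~1", "/").replace("~0", "~") for t in tokens)


print(to_pointer(("Indicator", "Subformulas", "some key")))  # /Indicator/Subformulas/some key
print(to_pointer(("Indicator", "a/b", "c~d")))               # /Indicator/a~1b/c~0d
print(from_pointer("/Indicator/a~1b/c~0d"))                  # ('Indicator', 'a/b', 'c~d')
```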

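Here is the resulting code as a minimal reproducible example. Note: the next time you post a question here, take the same approach with the code in your question, i.e. provide everything needed to run it.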

import collections.abc
import jsondiff
import pprint

import xlsxwriter

a = {
  "Indicator": {
    "key1": "value 1",
    "key2": "value 2",
    "Name": "value 3",
    "key4": "value 4",
    "Description": "some text",
    "Subformulas": {
      "some key": "some sub-formula"
    },
    "Formula": "some formula"
  }
}

b = {
  "Indicator": {
    "key1": "value 1",
    "key3":"value changed",
    "Name": "value 3",
    "key4": "value 4",
    "key5": "added",
    
    "Description": "some text",
    "Subformulas": {
      "some key": "change in some sub-formula"
    },
    "Formula": "some formula"
  }
}

c = jsondiff.diff(a,b,syntax='explicit')

pprint.pprint( c)

# dual python 2/3 compatability, inspired by the "six" library
string_types = (str, unicode) if str is bytes else (str, bytes)
iteritems = lambda mapping: getattr(mapping, 'iteritems', mapping.items)()
# from https://code.activestate.com/recipes/577982-recursively-walk-python-objects/
def objwalk(obj, path=(), memo=None):
    if memo is None:
        memo = set()
    iterator = None
    if isinstance(obj, collections.abc.Mapping):
        iterator = iteritems
    elif isinstance(obj, (collections.abc.Sequence, collections.abc.Set)) and not isinstance(obj, string_types):
        iterator = enumerate
    if iterator:
        if id(obj) not in memo:
            memo.add(id(obj))
            for path_component, value in iterator(obj):
                for result in objwalk(value, path + (path_component,), memo):
                    yield result
            memo.remove(id(obj))
    else:
        yield path, obj
       
# from 
def grouped(iterable, n):
    "s -> (s0,s1,s2,...sn-1), (sn,sn+1,sn+2,...s2n-1), (s2n,s2n+1,s2n+2,...s3n-1), ..."
    return zip(*[iter(iterable)]*n)

workbook = xlsxwriter.Workbook('myfile.xlsx')
worksheet = workbook.add_worksheet()

row = 0
# write the column headings
worksheet.write(row, 0, "Path")
worksheet.write(row, 1, "Change")
worksheet.write(row, 2, "New Value")

for path, value in objwalk(c):
    print(f"{path=} {value=}")
    # for the dict diffs here, the explicit syntax nests as
    # (operation, key, operation, key, ...), so take the path two items at a time
    keypath = []
    lastop = None
    for op, key in grouped(path, 2):
        # keep dict keys; integer list positions (e.g. under delete) are skipped
        if isinstance(key, str):
            keypath.append(key)
        print(f"  {op=} {key=}")
        lastop = op
    print(f"{keypath=} {lastop=} {value=}")
    row += 1
    # write the values for this row
    worksheet.write(row, 0, "/".join(keypath))
    worksheet.write(row, 1, repr(lastop))
    worksheet.write(row, 2, value)

workbook.close()

Output:

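The 'delete' rows look odd, with the deleted key shown in the 'New Value' column, but that is how jsondiff's explicit syntax produces its output. You can of course handle it differently.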
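If you would rather see old and new values side by side (so removed keys do not land in a 'New Value' column), a minimal alternative that does not use jsondiff at all is to flatten both documents into path-to-value maps and compare their key sets. `flatten` and `delta_rows` are hypothetical helpers; as above, paths are joined with /, and lists are compared as whole values rather than element by element.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Tuple[str, str, Optional[Any], Optional[Any]]


def flatten(obj: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into {"A/B/C": leaf}; lists are kept as leaf values."""
    if not isinstance(obj, dict) or not obj:
        return {prefix: obj}
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        path = f"{prefix}/{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat


def delta_rows(a: Any, b: Any) -> List[Row]:
    """Return (path, change, old value, new value) rows comparing a to b."""
    fa, fb = flatten(a), flatten(b)
    rows: List[Row] = []
    # in file1 but not in file2
    for path in sorted(fa.keys() - fb.keys()):
        rows.append((path, "removed", fa[path], None))
    # in both files, but the value differs
    for path in sorted(fa.keys() & fb.keys()):
        if fa[path] != fb[path]:
            rows.append((path, "changed", fa[path], fb[path]))
    # new in file2
    for path in sorted(fb.keys() - fa.keys()):
        rows.append((path, "added", None, fb[path]))
    return rows


a = {"Indicator": {"key1": "value 1", "key2": "value 2", "Name": "value 3",
                   "key4": "value 4", "Description": "some text",
                   "Subformulas": {"some key": "some sub-formula"},
                   "Formula": "some formula"}}
b = {"Indicator": {"key1": "value 1", "key3": "value changed", "Name": "value 3",
                   "key4": "value 4", "key5": "added", "Description": "some text",
                   "Subformulas": {"some key": "change in some sub-formula"},
                   "Formula": "some formula"}}

for row in delta_rows(a, b):
    print(row)
# ('Indicator/key2', 'removed', 'value 2', None)
# ('Indicator/Subformulas/some key', 'changed', 'some sub-formula', 'change in some sub-formula')
# ('Indicator/key3', 'added', None, 'value changed')
# ('Indicator/key5', 'added', None, 'added')
```

Each row could then be written under headings such as Path / Change / Old Value / New Value with xlsxwriter's `worksheet.write_row(row_number, 0, row)`, replacing None with an empty string if you prefer. Line numbers are not available this way, because `json.load` does not keep them.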