Is there a good, controllable way to compare two .json files and write their differences to an Excel sheet in Python?
I tried using jsondiff, but the output does not meet my requirements.
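import json
from jsondiff import diff

# load both files and compute the diff with jsondiff's default syntax
with open('C/abc/file1.json') as f:
    a = json.load(f)
with open('C/abc/file2.json') as s:
    b = json.load(s)
c = diff(a, b)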
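What I want to print in the Excel sheet is the delta: things that exist in file1 but not in file2, things that exist in file1 but were changed in file2, and things newly added in file2 that are not in file1, along with line numbers if possible.
If anyone knows how to achieve this, please share it, and let me know if more clarification is needed.
Due to restrictions I cannot paste the original contents of the files, but I have put a sample of what those files contain below.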
file1.json
{
    "Indicator": {
        "key1": "value 1",
        "key2": "value 2",
        "Name": "value 3",
        "key4": "value 4",
        "Description": "some text",
        "Subformulas": {
            "some key": "some sub-formula"
        },
        "Formula": "some formula"
    }
}
file2.json
{
    "Indicator": {
        "key1": "value 1",
        "key2": "value 2",
        "key3": "value changed",
        "Name": "value 3",
        "key4": "value 4",
        "Description": "some text",
        "Subformulas": {
            "some key": "change in some sub-formula"
        },
        "Formula": "some formula"
    }
}
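I used the following code to print the differences to an Excel sheet. I found it in an answer to a question on Stack Overflow, but it does not write the differences into the Excel sheet: the terminal shows the differences, but the Excel sheet does not. So I think I am doing something wrong, and I don't know how to correct it.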
c.items()
c1 = list(c.items())
workbook = xlsxwriter.Workbook('myfile.xlsx')
worksheet = workbook.add_worksheet()
row = 0
col = 0
order = sorted(c.keys())
for key in order:
    row += 1
    #print(key)
    worksheet.write(row, col, key)
    for item in c[key]:
        #print(item,row, col+1)
        worksheet.write(row, col + 1, item)
        col += 1
    col = 0
workbook.close()
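If you read the jsondiff source code, https://github.com/xlwings/jsondiff/tree/05dd7dd6c0b712fe54491289d3972ab58a125e11/jsondiff, you will see that there are several options for the 'syntax' of the result, which the brief documentation does not mention. The default is 'compact', which seems to show only the content that changed, as you found. If you choose the 'explicit' syntax, you get 'insert', 'update', 'delete', and so on in the diff output. Another syntax you can explore is 'symmetric'.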
So change:
c = jsondiff.diff(a, b)
to:
c = jsondiff.diff(a, b, syntax='explicit')
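To make the three categories concrete, here is a minimal plain-Python sketch that classifies the keys of two flat dictionaries in the same spirit as the labels of the explicit syntax. It is not jsondiff's implementation and handles only one level of nesting. The function name classify_keys and the sample data are assumptions made for illustration.

```python
def classify_keys(old, new):
    """Split the keys of two flat dicts into insert / delete / update groups."""
    # keys present only in the new dict
    inserted = {k: new[k] for k in new if k not in old}
    # keys present only in the old dict
    deleted = [k for k in old if k not in new]
    # keys present in both dicts whose values differ
    updated = {k: new[k] for k in new if k in old and old[k] != new[k]}
    return {"insert": inserted, "delete": deleted, "update": updated}


# illustrative sample data, not the real files
old = {"key1": "value 1", "key2": "value 2", "Name": "value 3"}
new = {"key1": "value 1", "key3": "value changed", "Name": "renamed"}
changes = classify_keys(old, new)
print(changes)
```

A nested document needs the same comparison applied recursively, which is the part jsondiff does for you.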
I modified your file2 so that it both adds and removes some keys:
b = {
    "Indicator": {
        "key1": "value 1",
        "key3": "value changed",
        "Name": "value 3",
        "key4": "value 4",
        "key5": "added",
        "Description": "some text",
        "Subformulas": {
            "some key": "change in some sub-formula"
        },
        "Formula": "some formula"
    }
}
The resulting diff, c, is (pprint-ed):
{update: {'Indicator': {insert: {'key3': 'value changed', 'key5': 'added'},
                        delete: ['key2'],
                        update: {'Subformulas': {update: {'some key': 'change '
                                                                      'in some '
                                                                      'sub-formula'}}}}}}
The code that writes the xlsx file has to 'walk' the nested dictionary to produce the output.
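As an illustration of what 'walking' means here, the following is a minimal sketch of a recursive generator that yields one (path, value) pair per leaf of a nested structure. It is a simplified stand-in, not the recipe used in the full example; the name walk and the sample data (including the "tags" list) are assumptions.

```python
def walk(obj, path=()):
    """Yield (path, leaf_value) pairs for every leaf in nested dicts/lists."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from walk(value, path + (key,))
    elif isinstance(obj, (list, tuple)):
        # list positions become integer path components
        for index, value in enumerate(obj):
            yield from walk(value, path + (index,))
    else:
        yield path, obj


# illustrative sample data
nested = {
    "Indicator": {
        "Subformulas": {"some key": "some sub-formula"},
        "tags": ["x", "y"],
    }
}
leaves = list(walk(nested))
for path, value in leaves:
    print(path, value)
```

Each yielded pair maps naturally to one spreadsheet row: the path goes in one column and the leaf value in another.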
After some research, I found code for a generic list/dictionary walker, objwalk():
https://code.activestate.com/recipes/577982-recursively-walk-python-objects/
I chose to flatten each key's path by joining the key names with /. Obviously, if your keys may contain /, you will have to adjust this.
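One possible adjustment, sketched below, is to escape the separator inside each key before joining. This sketch borrows the escaping rule from JSON Pointer (RFC 6901), where ~ becomes ~0 and / becomes ~1. The helper names are assumptions, and this is only one of several ways to keep the flattened path unambiguous.

```python
def escape_key(key):
    """Escape '~' and '/' so that '/' can safely act as the path separator."""
    # '~' must be escaped first, otherwise the '~' in '~1' would be escaped again
    return str(key).replace("~", "~0").replace("/", "~1")


def unescape_key(token):
    """Reverse escape_key; '~1' is decoded before '~0'."""
    return token.replace("~1", "/").replace("~0", "~")


def join_path(keys):
    return "/".join(escape_key(k) for k in keys)


def split_path(path):
    return [unescape_key(t) for t in path.split("/")]


# illustrative keys, one containing '/' and one containing '~'
keys = ["Indicator", "rate/hour", "a~b"]
flat = join_path(keys)
print(flat)
print(split_path(flat))
```

With this in place the path column can always be split back into the original keys.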
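Here is the resulting code as a minimal reproducible example. Note: the next time you create a question here, you should take the same approach with the code in your question, i.e. provide everything needed to run the code.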
import collections.abc
import jsondiff
import pprint
import xlsxwriter

a = {
    "Indicator": {
        "key1": "value 1",
        "key2": "value 2",
        "Name": "value 3",
        "key4": "value 4",
        "Description": "some text",
        "Subformulas": {
            "some key": "some sub-formula"
        },
        "Formula": "some formula"
    }
}
b = {
    "Indicator": {
        "key1": "value 1",
        "key3": "value changed",
        "Name": "value 3",
        "key4": "value 4",
        "key5": "added",
        "Description": "some text",
        "Subformulas": {
            "some key": "change in some sub-formula"
        },
        "Formula": "some formula"
    }
}
c = jsondiff.diff(a, b, syntax='explicit')
pprint.pprint(c)

# dual python 2/3 compatibility, inspired by the "six" library
string_types = (str, unicode) if str is bytes else (str, bytes)
iteritems = lambda mapping: getattr(mapping, 'iteritems', mapping.items)()

# from https://code.activestate.com/recipes/577982-recursively-walk-python-objects/
def objwalk(obj, path=(), memo=None):
    if memo is None:
        memo = set()
    iterator = None
    if isinstance(obj, collections.abc.Mapping):
        iterator = iteritems
    elif isinstance(obj, (collections.abc.Sequence, collections.abc.Set)) and not isinstance(obj, string_types):
        iterator = enumerate
    if iterator:
        if id(obj) not in memo:
            memo.add(id(obj))
            for path_component, value in iterator(obj):
                for result in objwalk(value, path + (path_component,), memo):
                    yield result
            memo.remove(id(obj))
    else:
        yield path, obj

# from
def grouped(iterable, n):
    "s -> (s0,s1,s2,...sn-1), (sn,sn+1,sn+2,...s2n-1), (s2n,s2n+1,s2n+2,...s3n-1), ..."
    return zip(*[iter(iterable)]*n)

workbook = xlsxwriter.Workbook('myfile.xlsx')
worksheet = workbook.add_worksheet()
row = 0
# write the column headings
worksheet.write(row, 0, "Path")
worksheet.write(row, 1, "Change")
worksheet.write(row, 2, "New Value")
for o in objwalk(c):
    print(f"{o=}")
    keypath = []
    # each path alternates (operation, key), so read it in pairs
    for op, key in grouped(o[0], 2):
        if type(key) == str:
            keypath.append(str(key))
        # print( f" {op=}" )
        print(f" {op=} {key=}")
        lastop = op
    print(f"{keypath=} {lastop=} {o[1]=}")
    row += 1
    # write the values for this row
    worksheet.write(row, 0, "/".join(keypath))
    worksheet.write(row, 1, repr(lastop))
    worksheet.write(row, 2, o[1])
workbook.close()
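In the output, the 'delete' row looks odd, with the deleted key shown in the 'New Value' column, but that is how jsondiff generates its output for the explicit syntax. You can of course handle it differently.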
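One way to handle it differently is sketched below: move the deleted key into the path and leave the value column empty. The sketch assumes each row has already been reduced to a key path, an operation given as a plain string such as "delete", and a value. The function name normalize_row and the sample rows are illustrative, and converting jsondiff's operation symbol to a string is left to the caller.

```python
def normalize_row(keypath, op, value):
    """Return (path, change, new_value), moving a deleted key into the path."""
    if op == "delete":
        # for deletions the walked value is the name of the removed key
        return "/".join(list(keypath) + [str(value)]), op, ""
    return "/".join(keypath), op, value


# illustrative rows in the shape (keypath, operation, value)
rows = [
    (["Indicator", "key3"], "insert", "value changed"),
    (["Indicator"], "delete", "key2"),
    (["Indicator", "Subformulas", "some key"], "update", "change in some sub-formula"),
]
normalized = [normalize_row(*r) for r in rows]
for line in normalized:
    print(line)
```

Each normalized tuple can then be written to the worksheet in place of the three separate values.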