I am trying to use Python's objectpath to pick specific values out of a multi-level JSON/dictionary, but can't get to my desired target format
Imagine I hit an API that returns a multi-level JSON blob. I then want to extract specific values from that blob and upload them to a database, so I need to flatten it.
Basically, I want to go from something like this:
d1 = {'results': [
    {'a': 1, 'b': 10},
    {'a': 2, 'b': 20},
    {'a': 3, 'b': 30, 'c': {'d': 100, 'e': 1000}},
    {'a': 4, 'c': {'d': 200, 'e': 2000}}
]}
to something like this (ideally with the labels adjusted to reflect the original hierarchy):
d2 = [
    {'a': 1, 'b': 10},
    {'a': 2, 'b': 20},
    {'a': 3, 'b': 30, 'c.d': 100},
    {'a': 4, 'c.d': 200}
]
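I feel like jsonpath or objectpath should be able to do this, but I haven't been able to get it to work. I could easily loop through this one example, but I have a bunch of these to do, so something more "declarative" would be preferable.
I must be missing something about how these paths work. Here's my attempt: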
from objectpath import Tree

# starting here...
d1 = {'results': [
    {'a': 1, 'b': 10},
    {'a': 2, 'b': 20},
    {'a': 3, 'b': 30, 'c': {'d': 100, 'e': 1000}},
    {'a': 4, 'c': {'d': 200, 'e': 2000}}
]}

# trying to get here...
# d2 = [
#     {'a': 1, 'b': 10},
#     {'a': 2, 'b': 20},
#     {'a': 3, 'b': 30, 'c.d': 100},
#     {'a': 4, 'c.d': 200}
# ]

if __name__ == "__main__":
    t = Tree(d1)
    print([x for x in t.execute('$.results.a')])  # works to get value of a
    print([x for x in t.execute('$.results.(a,b)')])  # creates dictionary of a & b -- cool
    print([x for x in t.execute('$.results.(a,b,c)')])  # adds all of c's sub document, makes sense
    print([x for x in t.execute('$.results.(a,b,c.d)')])  # nothing changed?
    print([x for x in t.execute('$.results.*')])  # selects everything, sure
    print([x for x in t.execute('$.results.*["a"]')])  # just "a" value again, makes sense
    print([x for x in t.execute('$.results.*["a" or "b"]')])  # apparently this means HAS "a" or "b" -- weird?
    print([x for x in t.execute('$.results..(a,b,d)')])  # almost works but puts d in its own dictionary?!
    print([x for x in t.execute('{"a": $.results.a, "b": $.results.b, "c.d": $.results.c.d}')])  # what I would expect, but not even close
Result:
[1, 2, 3, 4]
[{'b': 10, 'a': 1}, {'b': 20, 'a': 2}, {'b': 30, 'a': 3}, {'a': 4}]
[{'b': 10, 'a': 1}, {'b': 20, 'a': 2}, {'b': 30, 'c': {'d': 100, 'e': 1000}, 'a': 3}, {'c': {'d': 200, 'e': 2000}, 'a': 4}]
[{'b': 10, 'a': 1}, {'b': 20, 'a': 2}, {'b': 30, 'c': {'d': 100, 'e': 1000}, 'a': 3}, {'c': {'d': 200, 'e': 2000}, 'a': 4}]
[{'b': 10, 'a': 1}, {'b': 20, 'a': 2}, {'b': 30, 'c': {'d': 100, 'e': 1000}, 'a': 3}, {'c': {'d': 200, 'e': 2000}, 'a': 4}]
[1, 2, 3, 4]
[{'b': 10, 'a': 1}, {'b': 20, 'a': 2}, {'b': 30, 'c': {'d': 100, 'e': 1000}, 'a': 3}, {'c': {'d': 200, 'e': 2000}, 'a': 4}]
[{'b': 10, 'a': 1}, {'b': 20, 'a': 2}, {'b': 30, 'a': 3}, {'d': 100}, {'a': 4}, {'d': 200}]
['b', 'a', 'c.d']
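I seem to be close, but maybe I'm going about this completely wrong?
Would something like marshmallow work better? That seems like overkill, since I'd have to define a class hierarchy.
Thanks!
This is simple recursion: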
from pprint import pprint


def flat_dict(d: dict) -> dict:
    """Flatten nested dicts, joining the keys of each level with '.'."""
    o = {}
    for k, v in d.items():
        if isinstance(v, dict):
            # Recurse into the sub-dict and prefix its keys with the parent key.
            o.update({
                k + '.' + key: value
                for key, value in flat_dict(v).items()
            })
        else:
            o[k] = v
    return o


def main():
    d = {
        'result': [
            {'a': 1, 'b': 10},
            {'a': 2, 'b': 20},
            {'a': 3, 'b': 30, 'c': {'d': 100, 'e': 1000}},
            {'a': 4, 'c': {'d': 200, 'e': 2000}}
        ]
    }
    res = [
        flat_dict(e)
        for e in d['result']
    ]
    pprint(res)


if __name__ == '__main__':
    main()
Result:
[{'a': 1, 'b': 10},
{'a': 2, 'b': 20},
{'a': 3, 'b': 30, 'c.d': 100, 'c.e': 1000},
{'a': 4, 'c.d': 200, 'c.e': 2000}]
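Note that this output also contains 'c.e', which the target d2 in the question leaves out. If only specific values are wanted, one possible variation is to list the dotted paths to keep and walk each record along them. The following is only a sketch under that assumption: pick, WANTED and _MISSING are hypothetical names made up for this illustration, not part of objectpath or any other library, and it only descends through plain dicts (not lists).

```python
from typing import Any, Iterable

_MISSING = object()  # sentinel, so that a stored None still counts as a real value


def pick(record: dict, paths: Iterable[str], sep: str = ".") -> dict:
    """Return a flat dict with only the requested dotted paths that exist in record."""
    out = {}
    for path in paths:
        node: Any = record
        for key in path.split(sep):
            if isinstance(node, dict) and key in node:
                node = node[key]  # step one level down
            else:
                node = _MISSING   # path does not exist in this record
                break
        if node is not _MISSING:
            out[path] = node      # keep the dotted path as the flat label
    return out


d1 = {'results': [
    {'a': 1, 'b': 10},
    {'a': 2, 'b': 20},
    {'a': 3, 'b': 30, 'c': {'d': 100, 'e': 1000}},
    {'a': 4, 'c': {'d': 200, 'e': 2000}},
]}

WANTED = ['a', 'b', 'c.d']  # the "declarative" part: one list of paths per payload
d2 = [pick(row, WANTED) for row in d1['results']]
print(d2)
```

With the sample data this should print the d2 from the question. Records that lack a path simply omit that key, which may or may not be what a database load expects; filling in None instead would be a small change.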