Is a List Comprehension the Right Way to Merge These JSON Files in Python?
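How can I use a Python list comprehension to replace values in one JSON file with the linked values from another JSON file?
One file looks like this. It has an "a" value that I need to substitute into the other list, using "b" as the connector (the a, b, and c values are all unique IDs):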
{
"records":[
{
"a": "7hk2k989u23lesdfsfd",
"b":"b8",
},
{
"a": "9ty562349u23lesdfsfd",
"b":"b6",
},
{
"a": "Ur233Fglesdfsfd",
"b":"b2",
}
]
}
The other looks like this. The values in "d" need to be replaced with the corresponding "a" values, with "b" as the key:
{
"records":[
{
"c":00023414,
"d":["b8","b6"]
},
{
"c":0005814,
"d":["b8","b2","b6"]
}
]
}
So that I end up with:
{
"records":[
{
"c":00023414,
"d":["7hk2k989u23lesdfsfd","9ty562349u23lesdfsfd"]
},
{
"c":0005814,
"d":["7hk2k989u23lesdfsfd","Ur233Fglesdfsfd","9ty562349u23lesdfsfd"]
}
]
}
What is the right way to solve this in Python, especially if the code needs to perform well?
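Your files are not valid JSON: JSON allows neither the trailing commas after the "b" values nor numbers with leading zeros such as 00023414. You should run them through a JSON validator such as JSON Lint. With the files corrected: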
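As a quick local check, Python's own parser rejects both problems. This is a minimal sketch: the helper name `is_valid_json` and the inline strings are illustrative and only mimic fragments of the files in the question. If the leading zeros in "c" are significant, storing "c" as a string is one way to keep them.

```python
import json

# Illustrative fragments that mimic the two problems in the question's files.
trailing_comma = '{"records": [{"a": "7hk2k989u23lesdfsfd", "b": "b8",}]}'
leading_zeros = '{"records": [{"c": 00023414, "d": ["b8", "b6"]}]}'
# The same record with the leading zeros removed parses fine.
valid = '{"records": [{"c": 23414, "d": ["b8", "b6"]}]}'


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


for name, text in [("trailing comma", trailing_comma),
                   ("leading zeros", leading_zeros),
                   ("valid", valid)]:
    print(name, is_valid_json(text))
```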
In [494]: import json
In [495]: with open('/Users/ado/Desktop/ab.json') as f:
     ...:     ab = json.load(f)
     ...:
In [496]: with open('/Users/ado/Desktop/cd.json') as f:
     ...:     cd = json.load(f)
     ...:
Note that you can think of ab simply as a collection of related a and b values. This is a good time to use a dictionary that maps each b to its a:
In [497]: d_ab = {r['b']: r['a'] for r in ab['records']}
In [498]: d_ab
Out[498]:
{'b2': 'Ur233Fglesdfsfd',
'b6': '9ty562349u23lesdfsfd',
'b8': '7hk2k989u23lesdfsfd'}
Now you can iterate over the records in cd and use a list comprehension to create the new values:
In [499]: for r in cd['records']:
     ...:     r['d'] = [d_ab.get(d) for d in r['d']]
     ...:
In [500]: cd
Out[500]:
{'records': [{'c': 23414,
'd': ['7hk2k989u23lesdfsfd', '9ty562349u23lesdfsfd']},
{'c': 5814,
'd': ['7hk2k989u23lesdfsfd', 'Ur233Fglesdfsfd', '9ty562349u23lesdfsfd']}]}
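Outside an interactive session, the same two steps can be wrapped in a function. The sketch below is one possible arrangement, not the only one: the name `replace_links` is made up for illustration, the data is inlined instead of read from files, and unlike the loop above it returns a new structure and leaves cd unchanged.

```python
# Inline copies of the (corrected) data from the question.
ab = {"records": [
    {"a": "7hk2k989u23lesdfsfd", "b": "b8"},
    {"a": "9ty562349u23lesdfsfd", "b": "b6"},
    {"a": "Ur233Fglesdfsfd", "b": "b2"},
]}
cd = {"records": [
    {"c": 23414, "d": ["b8", "b6"]},
    {"c": 5814, "d": ["b8", "b2", "b6"]},
]}


def replace_links(ab, cd):
    """Return a copy of cd with each "d" entry replaced by the matching "a" value."""
    # One pass over ab builds the b -> a lookup table.
    b_to_a = {r["b"]: r["a"] for r in ab["records"]}
    # One pass over cd rebuilds every record with a translated "d" list.
    return {
        "records": [
            {**r, "d": [b_to_a.get(d) for d in r["d"]]}
            for r in cd["records"]
        ]
    }


mapped = replace_links(ab, cd)
print(mapped)
```

Building the dictionary once keeps each replacement a constant-time lookup on average, so the total work grows with the number of records plus the number of "d" entries.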
Finally, write the new contents to a file:
In [502]: with open('/Users/ado/Desktop/cd-mapped.json', 'w') as f:
     ...:     json.dump(cd, f)
     ...:
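If the output should be easy to read or diff, the write step can also set an indent. A small sketch, assuming a temporary directory as the destination; the helper name `write_json` and the 2-space indent are arbitrary choices for illustration.

```python
import json
import tempfile
from pathlib import Path


def write_json(data, path):
    """Serialize data to path as UTF-8 JSON with 2-space indentation."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)


# Already-mapped sample data, inlined so the example is self-contained.
cd_mapped = {"records": [
    {"c": 23414, "d": ["7hk2k989u23lesdfsfd", "9ty562349u23lesdfsfd"]},
]}

# Hypothetical destination; the original answer writes to the desktop instead.
out_path = Path(tempfile.gettempdir()) / "cd-mapped.json"
write_json(cd_mapped, out_path)

# Reading the file back gives the same structure.
round_trip = json.loads(out_path.read_text(encoding="utf-8"))
print(round_trip == cd_mapped)
```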
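This solution assumes that every record in ab always has both a and b. Note also that d_ab.get returns None for any value in d that has no matching b.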
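When that assumption might not hold, the lookup can be made explicit about missing keys. The following is a sketch under assumed requirements: the function name `map_ids` and the `strict` flag are hypothetical, and whether to raise or to keep the unmatched ID depends on what the data should guarantee.

```python
def map_ids(ids, b_to_a, strict=True):
    """Translate each ID through b_to_a.

    With strict=True, raise KeyError listing every ID that has no match.
    With strict=False, leave unmatched IDs as they are.
    """
    if strict:
        missing = [i for i in ids if i not in b_to_a]
        if missing:
            raise KeyError(f"no 'a' value for: {missing}")
    # dict.get with a default keeps the original ID when there is no match.
    return [b_to_a.get(i, i) for i in ids]


b_to_a = {"b8": "7hk2k989u23lesdfsfd", "b6": "9ty562349u23lesdfsfd"}

print(map_ids(["b8", "b6"], b_to_a))
print(map_ids(["b8", "b2"], b_to_a, strict=False))
```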
P.S. Just for fun, you can use map and dict.get instead of a comprehension:
In [505]: for r in cd['records']:
     ...:     r['d'] = list(map(d_ab.get, r['d']))
     ...:
In [506]: cd
Out[506]:
{'records': [{'c': 23414,
'd': ['7hk2k989u23lesdfsfd', '9ty562349u23lesdfsfd']},
{'c': 5814,
'd': ['7hk2k989u23lesdfsfd', 'Ur233Fglesdfsfd', '9ty562349u23lesdfsfd']}]}
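Performance-wise, the comprehension will usually beat map; in the runs below it takes 1.34 µs per loop versus 1.74 µs for map: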
In [509]: %timeit for r in cd['records']: r['d'] = [d_ab.get(d) for d in r['d']]
...:
The slowest run took 7.19 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.34 µs per loop
In [511]: %timeit for r in cd['records']: r['d'] = list(map(d_ab.get, r['d']))
The slowest run took 7.19 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.74 µs per loop
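The timed loops above overwrite r['d'] in place, so every run after the first looks up values that have already been replaced. To compare the two variants on unchanged input, a script using the standard timeit module could look like the sketch below. The list size and repeat counts are arbitrary assumptions, and the timings will differ by machine and Python version, so no numbers are shown.

```python
import timeit

d_ab = {
    "b2": "Ur233Fglesdfsfd",
    "b6": "9ty562349u23lesdfsfd",
    "b8": "7hk2k989u23lesdfsfd",
}
# A longer input than the question's, so the loop body dominates the timing.
ids = ["b8", "b2", "b6"] * 100


def with_comprehension():
    """Translate ids with a list comprehension; the input is not modified."""
    return [d_ab.get(d) for d in ids]


def with_map():
    """Translate ids with map and dict.get; the input is not modified."""
    return list(map(d_ab.get, ids))


# Take the best of several repeats to reduce noise from other processes.
for fn in (with_comprehension, with_map):
    best = min(timeit.repeat(fn, number=1000, repeat=3))
    print(f"{fn.__name__}: {best:.4f} s per 1000 calls")
```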