Parse plain text API response into JSON using Python
The API endpoint I'm using for a project returns a plain-text response of the following form:
[RESPONSE]
code = 200
description = Command completed successfully
queuetime = 0
runtime = 0.071
property[abuse policy][0] = The policies are published at the REGISTRY_OPERATOR website at:
property[abuse policy][1] = =>https://registry.in/Policies
property[abuse policy][2] =
property[abuse policy][3] = IN Policy Framework: https://registry.in/system/files/inpolicy_0.pdf
property[abuse policy][4] = IN Domain Anti-Abuse policy: https://registry.in/Policies/IN_Anti_Abuse_Policy
property[abuse policy url][0] = https://registry.in/Policies/IN_Anti_Abuse_Policy
property[active][0] = 0
I'm trying to parse it into a dictionary using Python. Currently, I have the following code:
import re

def text_to_dict(text):
    js = {}
    for s in text.splitlines():
        # Split each line into key and value at the first "=" only
        x = s.split("=", maxsplit=1)
        if len(x) > 1:
            # "property[a][0]" -> ["property", "a", "0"]
            keys = [k for i in re.split(r"\]|\[", x[0]) if (k := i.strip())]
            for i, k in enumerate(keys):
                pd = js
                # Walk down to the parent container of the current key part
                for j, pk in enumerate(keys[:i]):
                    if keys[j+1:j+2] and not (keys[j+1:j+2][0]).isnumeric():
                        pd = pd[pk]
                if k not in pd:
                    if k.isnumeric():
                        pd[keys[i-1]].append((x[1]).strip())
                    else:
                        pd[k] = (x[1]).strip() if i == len(keys)-1 else [] if keys[i+1:i+2] and (keys[i+1:i+2][0]).isnumeric() else {}
    return js
This code handles the example above and returns:
{
    "code": "200",
    "description": "Command completed successfully",
    "queuetime": "0",
    "runtime": "0.071",
    "property": {
        "abuse policy": [
            "The policies are published at the REGISTRY_OPERATOR website at:",
            "=>https://registry.in/Policies",
            "",
            "IN Policy Framework: https://registry.in/system/files/inpolicy_0.pdf",
            "IN Domain Anti-Abuse policy: https://registry.in/Policies/IN_Anti_Abuse_Policy"
        ],
        "abuse policy url": [
            "https://registry.in/Policies/IN_Anti_Abuse_Policy"
        ],
        "active": [
            "0"
        ]
    }
}
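However, it cannot handle either of the following lines when they are appended to the example above:
...
property[active][1][test] = TEST
or
...
property[active][1][0] = TEST
These should return
{
    ...
    "active": [
        "0",
        {"test": "TEST"}
    ]
}
and
{
    ...
    "active": [
        "0",
        ["TEST"]
    ]
}
respectively.
I feel there must be a simpler way to cover all the possibilities without writing a pile of nested ifs, but I'm not sure what it is.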
Your input data is actually in INI file format. Python ships with the configparser module for exactly this.
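As a quick illustration of what configparser gives you before any nesting is built, here is a minimal sketch. The `sample` string is a shortened copy of the response from the question, and the variable names are only for illustration.

```python
from configparser import ConfigParser

# Shortened copy of the response shown in the question.
sample = """
[RESPONSE]
code = 200
description = Command completed successfully
property[active][0] = 0
"""

config = ConfigParser()
config.read_string(sample)

section = config["RESPONSE"]
# Values always come back as strings; bracketed keys are kept as flat option names.
print(section["code"])
print(section["property[active][0]"])
print(list(section.keys()))
```

Note that ConfigParser lowercases option names by default and treats both "=" and ":" as key/value delimiters, so this approach assumes the keys themselves contain neither character.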
If we assume that every part of a key such as 'property[foo][0][test]' is simply a dictionary key (no nested lists), it parses into the following structure:
{'property': {'foo': {'0': {'test': 'value'}}}}
This can be done with a loop that keeps creating nested dictionaries:
from configparser import ConfigParser

def parse(text):
    config = ConfigParser()
    config.read_string(text)
    root = {}
    for key in config['RESPONSE'].keys():
        curr = root
        # "property[foo][0]" -> "property", "foo", "0"
        for key_part in key.replace(']', '').split('['):
            if key_part not in curr:
                curr[key_part] = {}
            prev = curr
            curr = curr[key_part]
        # Overwrite the innermost placeholder dict with the actual value
        prev[key_part] = config['RESPONSE'][key]
    return root
Usage:
from pprint import pprint
text = """
[RESPONSE]
code = 200
description = Command completed successfully
queuetime = 0
runtime = 0.071
property[abuse policy][0] = The policies are published at the REGISTRY_OPERATOR website at:
property[abuse policy][1] = =>https://registry.in/Policies
property[abuse policy][2] =
property[abuse policy][3] = IN Policy Framework: https://registry.in/system/files/inpolicy_0.pdf
property[abuse policy][4] = IN Domain Anti-Abuse policy: https://registry.in/Policies/IN_Anti_Abuse_Policy
property[abuse policy url][0] = https://registry.in/Policies/IN_Anti_Abuse_Policy
property[active][0] = 0
property[foo][0][test] = a
property[foo][1][test] = b
property[bar][0][0] = A
property[bar][1][1] = B
"""
pprint(parse(text))
Result:
{'code': '200',
 'description': 'Command completed successfully',
 'property': {'abuse policy': {'0': 'The policies are published at the '
                                    'REGISTRY_OPERATOR website at:',
                               '1': '=>https://registry.in/Policies',
                               '2': '',
                               '3': 'IN Policy Framework: '
                                    'https://registry.in/system/files/inpolicy_0.pdf',
                               '4': 'IN Domain Anti-Abuse policy: '
                                    'https://registry.in/Policies/IN_Anti_Abuse_Policy'},
              'abuse policy url': {'0': 'https://registry.in/Policies/IN_Anti_Abuse_Policy'},
              'active': {'0': '0'},
              'bar': {'0': {'0': 'A'}, '1': {'1': 'B'}},
              'foo': {'0': {'test': 'a'}, '1': {'test': 'b'}}},
 'queuetime': '0',
 'runtime': '0.071'}
You can check whether key_part is numeric and convert it to int, so that the resulting structure behaves more as if it contained lists, i.e.
{'property': {'foo': {0: {'test': 'value'}}}}
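A minimal sketch of that idea follows. `parse_int_keys` is a variant of parse() above that converts numeric key parts to int, and `listify` is a hypothetical helper (not part of configparser) that then turns any dict keyed 0..n-1 into a real list, which should give the shape asked for in the question. It assumes no key is used both as a plain value and as a parent of other keys.

```python
from configparser import ConfigParser


def parse_int_keys(text, section="RESPONSE"):
    """Like parse() above, but numeric key parts become int keys."""
    config = ConfigParser()
    config.read_string(text)
    root = {}
    for key, value in config[section].items():
        # "property[active][1][test]" -> ["property", "active", 1, "test"]
        parts = [int(p) if p.isdecimal() else p
                 for p in key.replace("]", "").split("[")]
        curr = root
        for part in parts[:-1]:
            # Create intermediate dicts on demand while walking down.
            curr = curr.setdefault(part, {})
        curr[parts[-1]] = value
    return root


def listify(node):
    """Hypothetical helper: recursively turn dicts keyed 0..n-1 into lists."""
    if not isinstance(node, dict):
        return node
    converted = {k: listify(v) for k, v in node.items()}
    keys_are_indices = (
        bool(converted)
        and all(isinstance(k, int) for k in converted)
        and sorted(converted) == list(range(len(converted)))
    )
    if keys_are_indices:
        return [converted[i] for i in range(len(converted))]
    return converted


text = """
[RESPONSE]
code = 200
property[active][0] = 0
property[active][1][test] = TEST
"""
print(listify(parse_int_keys(text)))
# {'code': '200', 'property': {'active': ['0', {'test': 'TEST'}]}}
```

Doing the list conversion as a separate pass keeps the parsing loop free of nested ifs. Dicts whose integer keys have gaps (for example only index 1) are left as dicts rather than padded.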