Strange JSON-like data format in CSV
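I have to work with a strange JSON-like format that breaks parsers because it is not quite JSON (no quotes, equals signs instead of colons, and so on).
Has anyone seen a data format like this, and if so, what is it?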
"[{location_type=1, location_fullname=Papua New Guinea, location_countrycode=PP, location_adm1code=PP, location_adm2code=, location_latitude=-6, location_longitude=147, location_featureid=PP, character_offset=16},..."
It is nested inside a CSV structure, so I wonder whether that has something to do with it.
Edit:
Full example
[{location_type=1, location_fullname=Papua New Guinea, location_countrycode=PP, location_adm1code=PP, location_adm2code=, location_latitude=-6, location_longitude=147, location_featureid=PP, character_offset=16}, {location_type=1, location_fullname=Papua New Guinea, location_countrycode=PP, location_adm1code=PP, location_adm2code=, location_latitude=-6, location_longitude=147, location_featureid=PP, character_offset=290}, {location_type=1, location_fullname=Indonesia, location_countrycode=ID, location_adm1code=ID, location_adm2code=, location_latitude=-5, location_longitude=120, location_featureid=ID, character_offset=676}, {location_type=1, location_fullname=North Korea, location_countrycode=KN, location_adm1code=KN, location_adm2code=, location_latitude=40, location_longitude=127, location_featureid=KN, character_offset=748}, {location_type=1, location_fullname=British Indian Ocean Territory, location_countrycode=IO, location_adm1code=IO, location_adm2code=, location_latitude=-6, location_longitude=71.5, location_featureid=IO, character_offset=892}]
Running it through yaml in Python gives the following:
import yaml
dct = yaml.safe_load(body)
dct
[{'location_type=1': None,
'location_fullname=Papua New Guinea': None,
'location_countrycode=PP': None,
...
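The format seems regular enough that a little string manipulation is all it takes to convert it to YAML or JSON.
1. YAML conversion
This is the simplest way to load the string into a list of dictionaries. If your Python project already depends on YAML, there is no reason not to use this solution: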
import yaml
body = '[{location_type=1, location_fullname=Papua New Guinea, location_countrycode=PP, location_adm1code=PP, location_adm2code=, location_latitude=-6, location_longitude=147, location_featureid=PP, character_offset=16}, {location_type=1, location_fullname=Papua New Guinea, location_countrycode=PP, location_adm1code=PP, location_adm2code=, location_latitude=-6, location_longitude=147, location_featureid=PP, character_offset=290}, {location_type=1, location_fullname=Indonesia, location_countrycode=ID, location_adm1code=ID, location_adm2code=, location_latitude=-5, location_longitude=120, location_featureid=ID, character_offset=676}, {location_type=1, location_fullname=North Korea, location_countrycode=KN, location_adm1code=KN, location_adm2code=, location_latitude=40, location_longitude=127, location_featureid=KN, character_offset=748}, {location_type=1, location_fullname=British Indian Ocean Territory, location_countrycode=IO, location_adm1code=IO, location_adm2code=, location_latitude=-6, location_longitude=71.5, location_featureid=IO, character_offset=892}]'
dct = yaml.safe_load( body.replace('=',': ') )
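2. JSON conversion
Use this solution if you want to avoid adding an external dependency (YAML) to your project. Most people don't care about that, but I do (the fewer the better). The caveat is that the regex turns every value into a string, so you need to know the structure of the data in order to convert the data types afterwards (the function is a slightly modified version of @dawg's):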
import re
import json
body = '[{location_type=1, location_fullname=Papua New Guinea, location_countrycode=PP, location_adm1code=PP, location_adm2code=, location_latitude=-6, location_longitude=147, location_featureid=PP, character_offset=16}, {location_type=1, location_fullname=Papua New Guinea, location_countrycode=PP, location_adm1code=PP, location_adm2code=, location_latitude=-6, location_longitude=147, location_featureid=PP, character_offset=290}, {location_type=1, location_fullname=Indonesia, location_countrycode=ID, location_adm1code=ID, location_adm2code=, location_latitude=-5, location_longitude=120, location_featureid=ID, character_offset=676}, {location_type=1, location_fullname=North Korea, location_countrycode=KN, location_adm1code=KN, location_adm2code=, location_latitude=40, location_longitude=127, location_featureid=KN, character_offset=748}, {location_type=1, location_fullname=British Indian Ocean Territory, location_countrycode=IO, location_adm1code=IO, location_adm2code=, location_latitude=-6, location_longitude=71.5, location_featureid=IO, character_offset=892}]'
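def conv(s):
    # Try int first, then float; an empty string becomes None,
    # anything else stays a string.
    try:
        return int(s)
    except ValueError:
        pass
    try:
        return float(s)
    except ValueError:
        return None if s == '' else s

dct = [
    { k: conv(v) for k,v in d.items() }
    for d in json.loads(
        # Raw strings keep \1, \2 and \3 as regex backreferences.
        re.sub(
            r'([^\s[{^=]+)=([^,}]*)([,}\]])',
            r'"\1":"\2"\3',
            body
        )
    )
]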
Both solutions produce:
# dct
[
    {
        'location_type': 1,
        'location_fullname': 'Papua New Guinea',
        'location_countrycode': 'PP',
        'location_adm1code': 'PP',
        'location_adm2code': None,
        'location_latitude': -6,
        'location_longitude': 147,
        'location_featureid': 'PP',
        'character_offset': 16
    },
    {
        'location_type': 1,
        'location_fullname': 'Papua New Guinea',
        'location_countrycode': 'PP',
        'location_adm1code': 'PP',
        'location_adm2code': None,
        'location_latitude': -6,
        'location_longitude': 147,
        'location_featureid': 'PP',
        'character_offset': 290
    },
    {
        'location_type': 1,
        'location_fullname': 'Indonesia',
        'location_countrycode': 'ID',
        'location_adm1code': 'ID',
        'location_adm2code': None,
        'location_latitude': -5,
        'location_longitude': 120,
        'location_featureid': 'ID',
        'character_offset': 676
    },
    {
        'location_type': 1,
        'location_fullname': 'North Korea',
        'location_countrycode': 'KN',
        'location_adm1code': 'KN',
        'location_adm2code': None,
        'location_latitude': 40,
        'location_longitude': 127,
        'location_featureid': 'KN',
        'character_offset': 748
    },
    {
        'location_type': 1,
        'location_fullname': 'British Indian Ocean Territory',
        'location_countrycode': 'IO',
        'location_adm1code': 'IO',
        'location_adm2code': None,
        'location_latitude': -6,
        'location_longitude': 71.5,
        'location_featureid': 'IO',
        'character_offset': 892
    }
]
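If the regex feels too opaque, the same conversion can be sketched with plain string splitting from the standard library. This is only a minimal sketch, not part of the original answers: the name `parse_records` is made up for illustration, and it assumes that no value contains `, ` or `}` and that no key contains `=`, which holds for the sample data but may not hold for yours.

```python
def conv(s):
    """Turn a raw string into int or float if possible, '' into None."""
    for cast in (int, float):
        try:
            return cast(s)
        except ValueError:
            pass
    return None if s == '' else s


def parse_records(body):
    """Parse '[{k=v, k=v}, {k=v}]' into a list of dicts (hypothetical helper).

    Assumes no value contains ', ' or '}' and no key contains '='.
    """
    inner = body.strip()[1:-1].strip()        # drop the outer [ ]
    if not inner:
        return []
    records = []
    # Drop the first '{' and the last '}', then split between records.
    for chunk in inner[1:-1].split('}, {'):
        record = {}
        for pair in chunk.split(', '):
            # partition splits at the first '=' only
            key, _, value = pair.partition('=')
            record[key.strip()] = conv(value.strip())
        records.append(record)
    return records


sample = ('[{location_type=1, location_fullname=Papua New Guinea, '
          'location_adm2code=, location_longitude=71.5}, '
          '{location_type=1, location_fullname=Indonesia, '
          'location_adm2code=, location_longitude=120}]')
rows = parse_records(sample)
```

Here `rows` is a list of two dictionaries with the same value types as in the output above (ints, floats, `None` for empty values, strings otherwise).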
I would work with what yaml gives you, since it is almost there:
import yaml
ex='''
[{location_type=1, location_fullname=Papua New Guinea, location_countrycode=PP, location_adm1code=PP, location_adm2code=, location_latitude=-6, location_longitude=147, location_featureid=PP, character_offset=16}, {location_type=1, location_fullname=Papua New Guinea, location_countrycode=PP, location_adm1code=PP, location_adm2code=, location_latitude=-6, location_longitude=147, location_featureid=PP, character_offset=290}, {location_type=1, location_fullname=Indonesia, location_countrycode=ID, location_adm1code=ID, location_adm2code=, location_latitude=-5, location_longitude=120, location_featureid=ID, character_offset=676}, {location_type=1, location_fullname=North Korea, location_countrycode=KN, location_adm1code=KN, location_adm2code=, location_latitude=40, location_longitude=127, location_featureid=KN, character_offset=748}, {location_type=1, location_fullname=British Indian Ocean Territory, location_countrycode=IO, location_adm1code=IO, location_adm2code=, location_latitude=-6, location_longitude=71.5, location_featureid=IO, character_offset=892}]'''
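def conv(s):
    try:
        return int(s)
    except ValueError:
        pass
    try:
        return float(s)
    except ValueError:
        return s

# yaml returns keys such as 'location_type=1', so split each at the first '='.
# yaml.CLoader is only available when PyYAML is built with libyaml.
res=[{x:conv(y) for x,y in map(lambda s: s.split('=', 1), di)}
     for di in yaml.load(ex, Loader=yaml.CLoader)]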
>>> res
[{'location_type': 1, 'location_fullname': 'Papua New Guinea', 'location_countrycode': 'PP', 'location_adm1code': 'PP', 'location_adm2code': '', 'location_latitude': -6, 'location_longitude': 147, 'location_featureid': 'PP', 'character_offset': 16}, {'location_type': 1, 'location_fullname': 'Papua New Guinea', 'location_countrycode': 'PP', 'location_adm1code': 'PP', 'location_adm2code': '', 'location_latitude': -6, 'location_longitude': 147, 'location_featureid': 'PP', 'character_offset': 290}, {'location_type': 1, 'location_fullname': 'Indonesia', 'location_countrycode': 'ID', 'location_adm1code': 'ID', 'location_adm2code': '', 'location_latitude': -5, 'location_longitude': 120, 'location_featureid': 'ID', 'character_offset': 676}, {'location_type': 1, 'location_fullname': 'North Korea', 'location_countrycode': 'KN', 'location_adm1code': 'KN', 'location_adm2code': '', 'location_latitude': 40, 'location_longitude': 127, 'location_featureid': 'KN', 'character_offset': 748}, {'location_type': 1, 'location_fullname': 'British Indian Ocean Territory', 'location_countrycode': 'IO', 'location_adm1code': 'IO', 'location_adm2code': '', 'location_latitude': -6, 'location_longitude': 71.5, 'location_featureid': 'IO', 'character_offset': 892}]
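Since the question says the value sits inside a CSV file, the outer double quotes in the first snippet are most likely ordinary CSV quoting, needed because the field itself contains commas. A rough sketch of reading such a cell with the standard `csv` module and then applying the regex conversion from the JSON approach follows. The two-column layout, the column names `id` and `locations`, and the helper name `to_dicts` are assumptions made up for this example.

```python
import csv
import io
import json
import re

# Hypothetical CSV: the header names "id" and "locations" are invented here.
csv_text = (
    'id,locations\n'
    '7,"[{location_type=1, location_fullname=North Korea, location_adm2code=}]"\n'
)


def to_dicts(cell):
    """Quote keys and values so the cell becomes valid JSON, then load it.

    Every value comes back as a string; convert types afterwards if needed.
    Assumes values contain no double quotes, commas or closing braces.
    """
    as_json = re.sub(r'([^\s\[{=]+)=([^,}]*)([,}\]])', r'"\1":"\2"\3', cell)
    return json.loads(as_json)


rows = []
for row in csv.DictReader(io.StringIO(csv_text)):
    # csv has already removed the surrounding quotes from the field.
    row['locations'] = to_dicts(row['locations'])
    rows.append(row)
```

After this runs, `rows[0]['locations']` is a list of dictionaries whose values are still strings, so a function like `conv` above can be applied to them if typed values are wanted.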