Strange JSON-like data format in CSV

I have to work with a strange JSON-like format that breaks parsers because it is not quite JSON (no quotes, equals signs instead of colons, etc.).

Has anyone seen a data format like this before, and if so, what is it?

"[{location_type=1, location_fullname=Papua New Guinea, location_countrycode=PP, location_adm1code=PP, location_adm2code=, location_latitude=-6, location_longitude=147, location_featureid=PP, character_offset=16},..."

It is nested inside a CSV structure, so I wonder whether that has something to do with it.

Edit:

Full example

[{location_type=1, location_fullname=Papua New Guinea, location_countrycode=PP, location_adm1code=PP, location_adm2code=, location_latitude=-6, location_longitude=147, location_featureid=PP, character_offset=16}, {location_type=1, location_fullname=Papua New Guinea, location_countrycode=PP, location_adm1code=PP, location_adm2code=, location_latitude=-6, location_longitude=147, location_featureid=PP, character_offset=290}, {location_type=1, location_fullname=Indonesia, location_countrycode=ID, location_adm1code=ID, location_adm2code=, location_latitude=-5, location_longitude=120, location_featureid=ID, character_offset=676}, {location_type=1, location_fullname=North Korea, location_countrycode=KN, location_adm1code=KN, location_adm2code=, location_latitude=40, location_longitude=127, location_featureid=KN, character_offset=748}, {location_type=1, location_fullname=British Indian Ocean Territory, location_countrycode=IO, location_adm1code=IO, location_adm2code=, location_latitude=-6, location_longitude=71.5, location_featureid=IO, character_offset=892}]

Running it through yaml in Python gives the following:

import yaml
dct = yaml.safe_load(body)
dct
[{'location_type=1': None,
  'location_fullname=Papua New Guinea': None,
  'location_countrycode=PP': None,
...

The format seems regular enough that a little string manipulation is all it takes to convert it to YAML or JSON.

1. YAML conversion

This is the simplest way to load the string into a list of dicts. If your Python project already depends on YAML, there is no reason not to use this solution:

import yaml

body = '[{location_type=1, location_fullname=Papua New Guinea, location_countrycode=PP, location_adm1code=PP, location_adm2code=, location_latitude=-6, location_longitude=147, location_featureid=PP, character_offset=16}, {location_type=1, location_fullname=Papua New Guinea, location_countrycode=PP, location_adm1code=PP, location_adm2code=, location_latitude=-6, location_longitude=147, location_featureid=PP, character_offset=290}, {location_type=1, location_fullname=Indonesia, location_countrycode=ID, location_adm1code=ID, location_adm2code=, location_latitude=-5, location_longitude=120, location_featureid=ID, character_offset=676}, {location_type=1, location_fullname=North Korea, location_countrycode=KN, location_adm1code=KN, location_adm2code=, location_latitude=40, location_longitude=127, location_featureid=KN, character_offset=748}, {location_type=1, location_fullname=British Indian Ocean Territory, location_countrycode=IO, location_adm1code=IO, location_adm2code=, location_latitude=-6, location_longitude=71.5, location_featureid=IO, character_offset=892}]'

dct = yaml.safe_load( body.replace('=',': ') )
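To see why this works, here is a minimal sketch of what the `replace` call does to the text before YAML ever sees it. The shortened `sample` string is made up for illustration. Two caveats to keep in mind: any `=` inside a value would be rewritten too, and PyYAML resolves plain scalars such as `NO` or `ON` as booleans, so it is worth checking that no country code in your data gets turned into `False` or `True`.

```python
# Shortened, hypothetical sample in the same shape as the real data.
sample = '[{location_type=1, location_adm2code=, location_longitude=71.5}]'

# Turning every "=" into ": " yields YAML flow syntax:
# [ ... ] is a sequence, { ... } is a mapping, "key: value" is an entry.
as_yaml = sample.replace('=', ': ')
print(as_yaml)
# [{location_type: 1, location_adm2code: , location_longitude: 71.5}]
```

The empty value after `location_adm2code:` is what ends up as `None` once the text is loaded.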

2. JSON conversion

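Use this solution if you would rather not add an external dependency (YAML) to your project. Most people don't care about that, but I do (the fewer the better). The caveat is that every value comes back from JSON as a string, so the data types have to be converted afterwards (the function is a slightly modified version of @dawg's):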

import re
import json

body = '[{location_type=1, location_fullname=Papua New Guinea, location_countrycode=PP, location_adm1code=PP, location_adm2code=, location_latitude=-6, location_longitude=147, location_featureid=PP, character_offset=16}, {location_type=1, location_fullname=Papua New Guinea, location_countrycode=PP, location_adm1code=PP, location_adm2code=, location_latitude=-6, location_longitude=147, location_featureid=PP, character_offset=290}, {location_type=1, location_fullname=Indonesia, location_countrycode=ID, location_adm1code=ID, location_adm2code=, location_latitude=-5, location_longitude=120, location_featureid=ID, character_offset=676}, {location_type=1, location_fullname=North Korea, location_countrycode=KN, location_adm1code=KN, location_adm2code=, location_latitude=40, location_longitude=127, location_featureid=KN, character_offset=748}, {location_type=1, location_fullname=British Indian Ocean Territory, location_countrycode=IO, location_adm1code=IO, location_adm2code=, location_latitude=-6, location_longitude=71.5, location_featureid=IO, character_offset=892}]'

def conv(s):
    try:
        return int(s)
    except ValueError:
        pass
    try:
        return float(s)
    except ValueError:
        return None if s == '' else s

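# Quote every key and value so the text becomes valid JSON, parse it,
# then convert each string value to int, float, or None.
# Raw strings are required: in a normal string '\1' is the character \x01.
dct = [
    {k: conv(v) for k, v in d.items()}
    for d in json.loads(
        re.sub(
            r'([^\s[{^=]+)=([^,}]*)([,}\]])',
            r'"\1":"\2"\3',
            body
        )
    )
]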
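The substitution step is easier to follow on a small input. The sketch below uses a shortened, made-up `sample` and a slightly simplified version of the same pattern, and prints the intermediate JSON text. It assumes that no value contains a comma, a closing brace, a double quote, or a backslash.

```python
import json
import re

# Shortened, hypothetical sample in the same shape as the real data.
sample = ('[{location_type=1, location_fullname=Papua New Guinea, '
          'location_adm2code=, location_longitude=71.5}]')

# Group 1: the key (no whitespace, brackets, braces or "=").
# Group 2: the value (everything up to the next "," or "}", possibly empty).
# Group 3: the delimiter that ended the value, copied back unchanged.
as_json = re.sub(r'([^\s\[{=]+)=([^,}]*)([,}\]])', r'"\1":"\2"\3', sample)
print(as_json)

rows = json.loads(as_json)
print(rows[0]['location_longitude'])  # still the string '71.5' at this point
```

Every value is a string after `json.loads`, which is why the `conv` pass is needed afterwards.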

Both solutions produce:
# dct
[
  {
    'location_type': 1,
    'location_fullname': 'Papua New Guinea',
    'location_countrycode': 'PP',
    'location_adm1code': 'PP',
    'location_adm2code': None,
    'location_latitude': -6,
    'location_longitude': 147,
    'location_featureid': 'PP',
    'character_offset': 16
  },
  {
    'location_type': 1,
    'location_fullname': 'Papua New Guinea',
    'location_countrycode': 'PP',
    'location_adm1code': 'PP',
    'location_adm2code': None,
    'location_latitude': -6,
    'location_longitude': 147,
    'location_featureid': 'PP',
    'character_offset': 290
  },
  {
    'location_type': 1,
    'location_fullname': 'Indonesia',
    'location_countrycode': 'ID',
    'location_adm1code': 'ID',
    'location_adm2code': None,
    'location_latitude': -5,
    'location_longitude': 120,
    'location_featureid': 'ID',
    'character_offset': 676
  },
  {
    'location_type': 1,
    'location_fullname': 'North Korea',
    'location_countrycode': 'KN',
    'location_adm1code': 'KN',
    'location_adm2code': None,
    'location_latitude': 40,
    'location_longitude': 127,
    'location_featureid': 'KN',
    'character_offset': 748
  },
  {
    'location_type': 1,
    'location_fullname': 'British Indian Ocean Territory',
    'location_countrycode': 'IO',
    'location_adm1code': 'IO',
    'location_adm2code': None,
    'location_latitude': -6,
    'location_longitude': 71.5,
    'location_featureid': 'IO',
    'character_offset': 892
  }
]

I would work with what yaml gives you, since it is almost there:

import yaml 

ex='''
[{location_type=1, location_fullname=Papua New Guinea, location_countrycode=PP, location_adm1code=PP, location_adm2code=, location_latitude=-6, location_longitude=147, location_featureid=PP, character_offset=16}, {location_type=1, location_fullname=Papua New Guinea, location_countrycode=PP, location_adm1code=PP, location_adm2code=, location_latitude=-6, location_longitude=147, location_featureid=PP, character_offset=290}, {location_type=1, location_fullname=Indonesia, location_countrycode=ID, location_adm1code=ID, location_adm2code=, location_latitude=-5, location_longitude=120, location_featureid=ID, character_offset=676}, {location_type=1, location_fullname=North Korea, location_countrycode=KN, location_adm1code=KN, location_adm2code=, location_latitude=40, location_longitude=127, location_featureid=KN, character_offset=748}, {location_type=1, location_fullname=British Indian Ocean Territory, location_countrycode=IO, location_adm1code=IO, location_adm2code=, location_latitude=-6, location_longitude=71.5, location_featureid=IO, character_offset=892}]'''

def conv(s):
    try:
        return int(s)
    except ValueError:
        pass 
        
    try:
        return float(s)
    except ValueError:
        return s
        

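# YAML parses each "key=value" pair as a key whose value is None,
# so split every key on the first '=' and convert the right-hand side.
res = [{k: conv(v) for k, v in (s.split('=', 1) for s in di)}
       for di in yaml.safe_load(ex)]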

>>> res
[{'location_type': 1, 'location_fullname': 'Papua New Guinea', 'location_countrycode': 'PP', 'location_adm1code': 'PP', 'location_adm2code': '', 'location_latitude': -6, 'location_longitude': 147, 'location_featureid': 'PP', 'character_offset': 16}, {'location_type': 1, 'location_fullname': 'Papua New Guinea', 'location_countrycode': 'PP', 'location_adm1code': 'PP', 'location_adm2code': '', 'location_latitude': -6, 'location_longitude': 147, 'location_featureid': 'PP', 'character_offset': 290}, {'location_type': 1, 'location_fullname': 'Indonesia', 'location_countrycode': 'ID', 'location_adm1code': 'ID', 'location_adm2code': '', 'location_latitude': -5, 'location_longitude': 120, 'location_featureid': 'ID', 'character_offset': 676}, {'location_type': 1, 'location_fullname': 'North Korea', 'location_countrycode': 'KN', 'location_adm1code': 'KN', 'location_adm2code': '', 'location_latitude': 40, 'location_longitude': 127, 'location_featureid': 'KN', 'character_offset': 748}, {'location_type': 1, 'location_fullname': 'British Indian Ocean Territory', 'location_countrycode': 'IO', 'location_adm1code': 'IO', 'location_adm2code': '', 'location_latitude': -6, 'location_longitude': 71.5, 'location_featureid': 'IO', 'character_offset': 892}]
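If neither PyYAML nor a regex is wanted, the same result can be sketched with plain string splitting. This is only an illustration, not one of the approaches above: `parse_records` is a hypothetical helper, and it assumes that no value contains `}` or the sequence `, ` and that no key contains `=`.

```python
def conv(s):
    """Convert a raw string to int or float; empty becomes None, else keep str."""
    if s == '':
        return None
    for cast in (int, float):
        try:
            return cast(s)
        except ValueError:
            pass
    return s


def parse_records(text):
    """Parse '[{k=v, k=v}, {k=v}]' into a list of dicts (hypothetical helper)."""
    inner = text.strip()[1:-1].strip()  # drop the outer [ ]
    if not inner:
        return []
    records = []
    # Drop the first "{" and last "}", then split between records.
    for chunk in inner.strip('{}').split('}, {'):
        record = {}
        for pair in chunk.split(', '):
            key, _, value = pair.partition('=')  # split on the first "=" only
            record[key] = conv(value)
        records.append(record)
    return records


rows = parse_records(
    '[{location_type=1, location_fullname=Papua New Guinea, '
    'location_adm2code=, location_longitude=71.5}, '
    '{location_type=1, location_countrycode=NO}]'
)
print(rows)
```

Because nothing here guesses at booleans, a country code such as `NO` stays a string.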