Convert csv to json with nested objects sorted by value

I'm new to JSON and tried the suggestions that were proposed, but it didn't work.

My original (abbreviated) tab-separated file, named test.csv, looks like this:

person_uuid sample_uuid sample_slot sample_info
aa  AB  A   anything
aa  BD  B   more info
bc  FD  A   just info 
bc  AD  B   even more info 
bc  OI  C   text
hu  KL  B   texttext
hu  HF  C   information
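For reference, csv.DictReader turns each line into a dict keyed by the header row, and that dict is what row.items() iterates over in the scripts that follow. This small sketch assumes the file really is tab-separated, as the script's delimiter='\t' implies, and reads the first two rows from an in-memory string instead of the real file:

```python
import csv
import io

# First two data rows of test.csv, assuming tab-separated columns
sample = (
    "person_uuid\tsample_uuid\tsample_slot\tsample_info\n"
    "aa\tAB\tA\tanything\n"
    "aa\tBD\tB\tmore info\n"
)

# Each row becomes a dict mapping header names to that row's values
reader = csv.DictReader(io.StringIO(sample), delimiter="\t", skipinitialspace=True)
rows = list(reader)
for row in rows:
    print(row)
```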

The script I tried for the conversion is called csv2json.py:

import csv
import json
import sys

base_name = sys.argv[1]
csvFilePath = "data/"+base_name+".csv"
jsonFilePath = "data/"+base_name+".json"

# Columns used to group rows and to key the nested samples
primary_fields = ['person_uuid']
secondary_fields = ['sample_slot']
result = []
with open(csvFilePath) as csv_file:
    reader = csv.DictReader(csv_file, delimiter='\t', skipinitialspace=True)
    for row in reader:
        d = {k: v for k, v in row.items() if k in primary_fields}
        e = {k: v for k, v in row.items() if k in secondary_fields}

        d['samples'] = [{k: v, }
                        for k, v in row.items() if k not in primary_fields]

        result.append(d)

# convert python jsonArray to JSON String and write to file
with open(jsonFilePath, 'w', encoding='utf-8') as jsonf:
    jsonString = json.dumps(result, indent=4)
    jsonf.write(jsonString)

I run the conversion with python csv2json.py test, and the result is:

[
    {
        "person_uuid": "aa",
        "samples": [
            {
                "sample_uuid": "AB"
            },
            {
                "sample_slot": "A"
            },
            {
                "sample_info": "anything"
            }
        ]
    },
    {
        "person_uuid": "aa",
        "samples": [
            {
                "sample_uuid": "BD"
            },
            {
                "sample_slot": "B"
            },
            {
                "sample_info": "more info"
            }
        ]
    },
    {
        "person_uuid": "bc",
        "samples": [
            {
                "sample_uuid": "FD"
            },
            {
                "sample_slot": "A"
            },
            {
                "sample_info": "just info "
            }
        ]
    },
    {
        "person_uuid": "bc",
        "samples": [
            {
                "sample_uuid": "AD"
            },
            {
                "sample_slot": "B"
            },
            {
                "sample_info": "even more info "
            }
        ]
    },
    {
        "person_uuid": "bc",
        "samples": [
            {
                "sample_uuid": "OI"
            },
            {
                "sample_slot": "C"
            },
            {
                "sample_info": "text"
            }
        ]
    },
    {
        "person_uuid": "hu",
        "samples": [
            {
                "sample_uuid": "KL"
            },
            {
                "sample_slot": "B"
            },
            {
                "sample_info": "texttext"
            }
        ]
    },
    {
        "person_uuid": "hu",
        "samples": [
            {
                "sample_uuid": "HF"
            },
            {
                "sample_slot": "C"
            },
            {
                "sample_info": "information"
            }
        ]
    }
]
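The flat shape comes from two things in the script: result.append(d) runs once per CSV row, so every row becomes its own top-level object, and the list comprehension builds a separate single-key dict for each non-primary column. This minimal sketch uses the first row of test.csv, written out by hand, to contrast that with a single nested entry keyed by sample_slot:

```python
row = {"person_uuid": "aa", "sample_uuid": "AB", "sample_slot": "A", "sample_info": "anything"}
primary_fields = ["person_uuid"]
secondary_fields = ["sample_slot"]

# What the question's comprehension builds: one single-key dict per column
flat = [{k: v} for k, v in row.items() if k not in primary_fields]

# One nested entry: the slot value becomes the key, the rest go inside it
nested = {
    row["sample_slot"]: {
        k: v for k, v in row.items() if k not in primary_fields + secondary_fields
    }
}

print(flat)
print(nested)
```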

But I would like to get this instead:

[
    {
        "person_uuid": "aa",
        "samples": {
            "A": {
                "sample_uuid": "AB",
                "sample_info": "anything"
            },
            "B": {
                "sample_uuid": "BD",
                "sample_info": "more info"
            }
        }
    },
    {
        "person_uuid": "bc",
        "samples": {
            "A": {
                "sample_uuid": "FD",
                "sample_info": "just info"
            },
            "B": {
                "sample_uuid": "AD",
                "sample_info": "even more info"
            },
            "C": {
                "sample_uuid": "OI",
                "sample_info": "text"
            }
        }
    },
    {
        "person_uuid": "hu",
        "samples": {
            "B": {
                "sample_uuid": "KL",
                "sample_info": "texttext"
            },
            "C": {
                "sample_uuid": "HF",
                "sample_info": "information"
            }
        }
    }
]

Any help on how to nest this correctly is appreciated (I tried using e = {k: v for k, v in row.items() if k in secondary_fields}).
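Building on that attempt, one alternative to groupby (not part of the original post) is a plain dict keyed by person_uuid that collects each person's samples, with the sample_slot value used as the key of the nested entry. In this sketch, nest_rows is a hypothetical helper name, and the rows are assumed to be shaped like csv.DictReader output:

```python
def nest_rows(rows, primary="person_uuid", secondary="sample_slot"):
    """Group rows by `primary`, nesting the other columns under `secondary`.

    Rows are collected in a dict keyed by `primary`, so they do not need
    to be sorted; people appear in the order they are first seen.
    """
    people = {}
    for row in rows:
        # Create the person's entry the first time their key appears
        person = people.setdefault(row[primary], {primary: row[primary], "samples": {}})
        # Every column except the two grouping keys goes into the nested entry
        person["samples"][row[secondary]] = {
            k: v for k, v in row.items() if k not in (primary, secondary)
        }
    return list(people.values())
```

With the question's file, something like result = nest_rows(csv.DictReader(csv_file, delimiter='\t', skipinitialspace=True)) inside the existing with open(...) block could replace the for loop, and the existing json.dumps(result, indent=4) step would then write the nested structure.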

This can be solved with itertools.groupby (also see this answer).

For example:

import csv
from itertools import groupby

# csvFilePath is defined as in the question's script
primary_fields = "person_uuid"
secondary_fields = "sample_slot"

with open(csvFilePath) as csv_file:
    reader = csv.DictReader(csv_file, delimiter='\t', skipinitialspace=True)
    result = []
    # groupby only merges consecutive rows with the same key, so sort first
    rows = sorted(reader, key=lambda x: x[primary_fields])
    # Group together all rows that share the same primary field
    for key, group in groupby(rows, key=lambda x: x[primary_fields]):
        # Build "samples" only from this group's rows, keyed by the slot
        samples = {
            elem[secondary_fields]: {
                "sample_uuid": elem["sample_uuid"],
                "sample_info": elem["sample_info"],
            }
            for elem in group
        }
        result.append({primary_fields: key, "samples": samples})

The result is:

[{'person_uuid': 'aa',
  'samples': {'A': {'sample_uuid': 'AB', 'sample_info': 'anything'},
   'B': {'sample_uuid': 'BD', 'sample_info': 'more info'}}},
 {'person_uuid': 'bc',
  'samples': {'A': {'sample_uuid': 'FD', 'sample_info': 'just info '},
   'B': {'sample_uuid': 'AD', 'sample_info': 'even more info '},
   'C': {'sample_uuid': 'OI', 'sample_info': 'text'}}},
 {'person_uuid': 'hu',
  'samples': {'B': {'sample_uuid': 'KL', 'sample_info': 'texttext'},
   'C': {'sample_uuid': 'HF', 'sample_info': 'information'}}}]
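One caveat: itertools.groupby only groups consecutive items with equal keys. The sample test.csv is already ordered by person_uuid, so the grouping works on it, but if one person's rows were scattered through the file, that person would appear more than once in result. This minimal check uses a hypothetical list of person_uuid values:

```python
from itertools import groupby

# person_uuid values where "aa" is not contiguous (hypothetical order)
keys = ["aa", "bc", "aa"]

# Count the group sizes produced with and without sorting first
unsorted_groups = [(k, len(list(g))) for k, g in groupby(keys)]
sorted_groups = [(k, len(list(g))) for k, g in groupby(sorted(keys))]

print(unsorted_groups)  # [('aa', 1), ('bc', 1), ('aa', 1)]
print(sorted_groups)    # [('aa', 2), ('bc', 1)]
```

Sorting the rows by person_uuid before grouping, for example with sorted(reader, key=lambda x: x[primary_fields]), avoids the split. Because sorted is stable, each person's rows keep their original file order. The question's json.dumps(result, indent=4) step can then write result out as JSON.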