Convert csv to json with nested objects sorted by value
I am new to JSON and tried the suggestions I found, but without success.
My original file (abbreviated) is named test.csv and looks like this:
person_uuid    sample_uuid    sample_slot    sample_info
aa             AB             A              anything
aa             BD             B              more info
bc             FD             A              just info
bc             AD             B              even more info
bc             OI             C              text
hu             KL             B              texttext
hu             HF             C              information
The script I tried to convert it with is called csv2json.py:
import csv
import json
import sys

base_name = sys.argv[1]
csvFilePath = "data/"+base_name+".csv"
jsonFilePath = "data/"+base_name+".json"
#
primary_fields = ['person_uuid']
secondary_fields = ['sample_slot']
result = []
with open(csvFilePath) as csv_file:
    reader = csv.DictReader(csv_file, delimiter='\t', skipinitialspace=True)
    for row in reader:
        d = {k: v for k, v in row.items() if k in primary_fields}
        e = {k: v for k, v in row.items() if k in secondary_fields}
        d['samples'] = [{k: v, }
                        for k, v in row.items() if k not in primary_fields]
        result.append(d)
# convert python jsonArray to JSON String and write to file
with open(jsonFilePath, 'w', encoding='utf-8') as jsonf:
    jsonString = json.dumps(result, indent=4)
    jsonf.write(jsonString)
I invoke the conversion with python csv2json.py test, and the result is:
[
    {
        "person_uuid": "aa",
        "samples": [
            {
                "sample_uuid": "AB"
            },
            {
                "sample_slot": "A"
            },
            {
                "sample_info": "anything"
            }
        ]
    },
    {
        "person_uuid": "aa",
        "samples": [
            {
                "sample_uuid": "BD"
            },
            {
                "sample_slot": "B"
            },
            {
                "sample_info": "more info"
            }
        ]
    },
    {
        "person_uuid": "bc",
        "samples": [
            {
                "sample_uuid": "FD"
            },
            {
                "sample_slot": "A"
            },
            {
                "sample_info": "just info "
            }
        ]
    },
    {
        "person_uuid": "bc",
        "samples": [
            {
                "sample_uuid": "AD"
            },
            {
                "sample_slot": "B"
            },
            {
                "sample_info": "even more info "
            }
        ]
    },
    {
        "person_uuid": "bc",
        "samples": [
            {
                "sample_uuid": "OI"
            },
            {
                "sample_slot": "C"
            },
            {
                "sample_info": "text"
            }
        ]
    },
    {
        "person_uuid": "hu",
        "samples": [
            {
                "sample_uuid": "KL"
            },
            {
                "sample_slot": "B"
            },
            {
                "sample_info": "texttext"
            }
        ]
    },
    {
        "person_uuid": "hu",
        "samples": [
            {
                "sample_uuid": "HF"
            },
            {
                "sample_slot": "C"
            },
            {
                "sample_info": "information"
            }
        ]
    }
]
But I would like to get this instead:
[
    {
        "person_uuid": "aa",
        "samples": {
            "A": {
                "sample_uuid": "AB",
                "sample_info": "anything"
            },
            "B": {
                "sample_uuid": "BD",
                "sample_info": "more info"
            }
        }
    },
    {
        "person_uuid": "bc",
        "samples": {
            "A": {
                "sample_uuid": "FD",
                "sample_info": "just info"
            },
            "B": {
                "sample_uuid": "AD",
                "sample_info": "even more info"
            },
            "C": {
                "sample_uuid": "OI",
                "sample_info": "text"
            }
        }
    },
    {
        "person_uuid": "hu",
        "samples": {
            "B": {
                "sample_uuid": "KL",
                "sample_info": "texttext"
            },
            "C": {
                "sample_uuid": "HF",
                "sample_info": "information"
            }
        }
    }
]
Any help on how to nest this correctly would be appreciated (I tried using e = {k: v for k, v in row.items() if k in secondary_fields}).
This can be solved with itertools.groupby (also see this answer).
For example:
import csv
from itertools import groupby

primary_fields = "person_uuid"
secondary_fields = "sample_slot"
# csvFilePath is defined as in the question's script
with open(csvFilePath) as csv_file:
    reader = csv.DictReader(csv_file, delimiter='\t', skipinitialspace=True)
    result = []
    # Group the rows that share the same primary_fields value.
    # groupby only merges consecutive rows, so the file must already be
    # ordered by person_uuid (as the sample file is).
    for key, group in groupby(reader, key=lambda x: x[primary_fields]):
        # Build the "samples" mapping from the rows of this group only
        samples = {
            elem[secondary_fields]: {
                "sample_uuid": elem["sample_uuid"],
                "sample_info": elem["sample_info"],
            }
            for elem in group
        }
        result.append({primary_fields: key, "samples": samples})
The result is:
[{'person_uuid': 'aa',
  'samples': {'A': {'sample_uuid': 'AB', 'sample_info': 'anything'},
              'B': {'sample_uuid': 'BD', 'sample_info': 'more info'}}},
 {'person_uuid': 'bc',
  'samples': {'A': {'sample_uuid': 'FD', 'sample_info': 'just info '},
              'B': {'sample_uuid': 'AD', 'sample_info': 'even more info '},
              'C': {'sample_uuid': 'OI', 'sample_info': 'text'}}},
 {'person_uuid': 'hu',
  'samples': {'B': {'sample_uuid': 'KL', 'sample_info': 'texttext'},
              'C': {'sample_uuid': 'HF', 'sample_info': 'information'}}}]
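The grouping above depends on the rows already being ordered by person_uuid, and it keeps the trailing spaces that the wanted output does not show. The following is a minimal end-to-end sketch of one way to handle both, not the answer's own code. The helper name nest_samples and the inline SAMPLE_TSV string (the question's rows, deliberately shuffled) are illustrative assumptions, and every row is assumed to have exactly the four columns.

```python
import csv
import io
import json
from itertools import groupby
from operator import itemgetter

# Hypothetical stand-in for data/test.csv: the question's rows, shuffled,
# with the trailing spaces that show up in the printed result.
SAMPLE_TSV = (
    "person_uuid\tsample_uuid\tsample_slot\tsample_info\n"
    "bc\tFD\tA\tjust info \n"
    "aa\tAB\tA\tanything\n"
    "hu\tKL\tB\ttexttext\n"
    "bc\tAD\tB\teven more info \n"
    "aa\tBD\tB\tmore info\n"
    "bc\tOI\tC\ttext\n"
    "hu\tHF\tC\tinformation\n"
)


def nest_samples(rows, primary="person_uuid", secondary="sample_slot"):
    """Group rows by `primary` and key each group's samples by `secondary`."""
    # groupby only merges consecutive rows, so sort by the grouping key first.
    # sorted() is stable, so rows of one person keep their file order.
    ordered = sorted(rows, key=itemgetter(primary))
    result = []
    for key, group in groupby(ordered, key=itemgetter(primary)):
        samples = {
            row[secondary]: {
                # Keep every remaining column and strip stray whitespace.
                name: value.strip()
                for name, value in row.items()
                if name not in (primary, secondary)
            }
            for row in group
        }
        result.append({primary: key, "samples": samples})
    return result


reader = csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t")
nested = nest_samples(reader)
print(json.dumps(nested, indent=4))
```

In the question's script, the io.StringIO(...) source would be replaced by the file opened from csvFilePath, and the print call by writing the json.dumps output to jsonFilePath as before.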