在 Python 中将 CSV 转换为结构良好的 JSON
Convert CSV to well-structured JSON in Python
我有一个结构如下的 CSV 文件:
Store, Region, District, MallName, Location
1234,90,910,MallA,GMT
4567,87,902,MallB,EST
2468,90,811,MallC,PST
1357,87,902,MallD,CST
我通过反复的挑衅能够完成的是获得如下格式:
{
"90": {
"910": {
"1234": {
"name": "MallA",
"location": "GMT"
}
},
"811": {
"2468": {
"name": "MallB",
"location": "PST"
}
}
},
"87": {
"902": {
"4567": {
"name": "MallB",
"location": "EST"
},
"1357": {
"name": "MallD",
"location": "CST"
}
}
}
}
下面的代码被精简以匹配我提供的示例数据集,但您知道发生了什么。同样,它是非常迭代的和非 pythonic 的,我也在努力朝着这个方向发展。 (如果有人觉得定义的程序值得 post 我可以)。
#************
# Main()
#************
dictHierarchy = {}
with open(getFilePath(), 'r') as f:
content = [line.strip('\n') for line in f.readlines()]
for data in content:
data = data.split(",")
myRegion = data[1]
myDistrict = data[2]
myName = data[3]
myLocation = data[4]
myStore = data[0]
if myRegion in dictHierarchy:
#check for District
if myDistrict in dictHierarchy[myRegion]:
#checkforStore
dictHierarchy[myRegion][myDistrict].update({myStore:addStoreDetails(data)})
else:
#add district
dictHierarchy[myRegion].update({myDistrict:addStore(data)})
else:
#add region
dictHierarchy.update({myRegion:addDistrict(data)})
with open('hierarchy.json', 'w') as outfile:
json.dump(dictHierarchy, outfile)
强迫症我看着上面的 JSON 输出并认为对于盲目打开文件的人来说它看起来像一个大杂烩。我希望为纯文本可读性做的是将数据分组并将其放入 JSON 格式,如下所示:
{"Regions":[
{"Region":"90", "Districts":[
{"District":"910", "Stores":[
{"Store":"1234", "name":"MallA", "location":"GMT"}]},
{"District":"811", "Stores":[
{"Store":"2468", "name":"MallC", "location":"PST"}]}]},
{"Region":"87", "Districts":[
{"District":"902", "Stores":[
{"Store":"4567", "name":"MallB", "location":"EST"},
{"Store":"1357", "name":"MallD", "location":"CST"}]}]}]}
长话短说,我今天浪费了很多时间试图弄清楚如何在 Python 中实际填充数据结构,但基本上没有结果。有没有一种干净的 pythonic 方法来实现这一目标?值得付出努力吗?
我已将 headers 添加到您的输入中,例如:
Store,Region,District,name,location
1234,90,910,MallA,GMT
4567,87,902,MallB,EST
2468,90,811,MallC,PST
1357,87,902,MallD,CST
然后像这样使用 python csv reader and group by:
import csv
from itertools import groupby, ifilter
from operator import itemgetter
data = []
with open('in.csv') as csvfile:
reader = csv.DictReader(csvfile)
regions = []
regions_dict = sorted(list(reader), key=itemgetter('Region'))
for region_id, region_group in groupby(regions_dict, itemgetter('Region')):
districts = []
regions.append({'Region': region_id, 'Districts': districts})
districts_dict = sorted(region_group, key=itemgetter('District'))
for district_id, district_group in groupby(districts_dict, itemgetter('District')):
districts.append({'District': district_id, 'Stores': list(district_group)})
print regions
我有一个结构如下的 CSV 文件:
Store, Region, District, MallName, Location
1234,90,910,MallA,GMT
4567,87,902,MallB,EST
2468,90,811,MallC,PST
1357,87,902,MallD,CST
我通过反复的挑衅能够完成的是获得如下格式:
{
"90": {
"910": {
"1234": {
"name": "MallA",
"location": "GMT"
}
},
"811": {
"2468": {
"name": "MallB",
"location": "PST"
}
}
},
"87": {
"902": {
"4567": {
"name": "MallB",
"location": "EST"
},
"1357": {
"name": "MallD",
"location": "CST"
}
}
}
}
下面的代码被精简以匹配我提供的示例数据集,但您知道发生了什么。同样,它是非常迭代的和非 pythonic 的,我也在努力朝着这个方向发展。 (如果有人觉得定义的程序值得 post 我可以)。
#************
# Main()
#************
dictHierarchy = {}
with open(getFilePath(), 'r') as f:
content = [line.strip('\n') for line in f.readlines()]
for data in content:
data = data.split(",")
myRegion = data[1]
myDistrict = data[2]
myName = data[3]
myLocation = data[4]
myStore = data[0]
if myRegion in dictHierarchy:
#check for District
if myDistrict in dictHierarchy[myRegion]:
#checkforStore
dictHierarchy[myRegion][myDistrict].update({myStore:addStoreDetails(data)})
else:
#add district
dictHierarchy[myRegion].update({myDistrict:addStore(data)})
else:
#add region
dictHierarchy.update({myRegion:addDistrict(data)})
with open('hierarchy.json', 'w') as outfile:
json.dump(dictHierarchy, outfile)
强迫症我看着上面的 JSON 输出并认为对于盲目打开文件的人来说它看起来像一个大杂烩。我希望为纯文本可读性做的是将数据分组并将其放入 JSON 格式,如下所示:
{"Regions":[
{"Region":"90", "Districts":[
{"District":"910", "Stores":[
{"Store":"1234", "name":"MallA", "location":"GMT"}]},
{"District":"811", "Stores":[
{"Store":"2468", "name":"MallC", "location":"PST"}]}]},
{"Region":"87", "Districts":[
{"District":"902", "Stores":[
{"Store":"4567", "name":"MallB", "location":"EST"},
{"Store":"1357", "name":"MallD", "location":"CST"}]}]}]}
长话短说,我今天浪费了很多时间试图弄清楚如何在 Python 中实际填充数据结构,但基本上没有结果。有没有一种干净的 pythonic 方法来实现这一目标?值得付出努力吗?
我已将 headers 添加到您的输入中,例如:
Store,Region,District,name,location
1234,90,910,MallA,GMT
4567,87,902,MallB,EST
2468,90,811,MallC,PST
1357,87,902,MallD,CST
然后像这样使用 python csv reader and group by:
import csv
from itertools import groupby, ifilter
from operator import itemgetter
data = []
with open('in.csv') as csvfile:
reader = csv.DictReader(csvfile)
regions = []
regions_dict = sorted(list(reader), key=itemgetter('Region'))
for region_id, region_group in groupby(regions_dict, itemgetter('Region')):
districts = []
regions.append({'Region': region_id, 'Districts': districts})
districts_dict = sorted(region_group, key=itemgetter('District'))
for district_id, district_group in groupby(districts_dict, itemgetter('District')):
districts.append({'District': district_id, 'Stores': list(district_group)})
print regions