Convert CSV to list tree
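I have a CSV list of towns, with the town, county, and country for each.
So that I don't have to deal with them in code, I removed the headers.
This is what I have: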
locations = {}

class Location:
    def __init__(self, town, county, country):
        self.town = town
        self.county = county
        self.country = country

    def store(self):
        locations.update({self.county: self.country})

for line in open('town-county-country.csv', 'r'):
    line = line.strip()
    line = line.split(',')
    x = Location(line[0], line[1], line[2])
    x.store()
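This stores everything in the dictionary perfectly. But now I want a dictionary with each country as a key and a list of its counties as the value.
I considered using a for loop to build the list of countries and a nested for loop to add the counties, but that would also need if statements to check whether each key already exists, and it doesn't look like the best way to do this.
Does anyone know a better way?
Ideally the output would look like this: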
counties = {
    'AK': [
        'ALEUTIANS EAST',
        'ALEUTIANS WEST',
        'ANCHORAGE',
        'BETHEL',
        'BRISTOL BAY',
        'DENALI',
        'DILLINGHAM',
        'FAIRBANKS NORTH STAR',
        'HAINES',
        'HOONAH ANGOON',
        'JUNEAU',
        'KENAI PENINSULA',
        'KETCHIKAN GATEWAY',
        'KODIAK ISLAND',
        'LAKE AND PENINSULA',
        'MATANUSKA SUSITNA',
        'NOME',
        'NORTH SLOPE',
        'NORTHWEST ARCTIC',
        'PETERSBURG',
        'PRINCE OF WALES HYDER',
        'SITKA',
        'SKAGWAY',
        'SOUTHEAST FAIRBANKS',
        'VALDEZ CORDOVA',
        'WADE HAMPTON',
        'WRANGELL',
        'YAKUTAT',
        'YUKON KOYUKUK'
    ],
Here is a short sample of the file I am opening:
Ampthill,Bedfordshire,England
Arlesey,Bedfordshire,England
Bedford,Bedfordshire,England
Biggleswade,Bedfordshire,England
Dunstable,Bedfordshire,England
Flitwick,Bedfordshire,England
Houghton Regis,Bedfordshire,England
Kempston,Bedfordshire,England
Leighton Buzzard,Bedfordshire,England
Linslade,Bedfordshire,England
Luton,Bedfordshire,England
Potton,Bedfordshire,England
Sandy,Bedfordshire,England
Shefford,Bedfordshire,England
Stotfold,Bedfordshire,England
Wixams,Bedfordshire,England
Woburn,Bedfordshire,England
I added test data to your sample CSV, because it only had one county:
Ampthill,Bedfordshire,England
Reading,Berkshire,England
Aylesbury,Buckinghamshire,England
Munster,Cork,Ireland
import pandas as pd  # pandas needs fewer lines of code than parsing the raw file by hand

counties = {}  # will contain the final data

# header=None tells pandas the file has no header row; columns are then numbered 0, 1, 2
df = pd.read_csv('town-county-country.csv', header=None)

# df[2] is the country column; unique() drops duplicates and keeps first-seen order
for country in df[2].unique():
    # pick the rows for this country, then the distinct counties (df[1]) among them
    counties[country] = df.loc[df[2] == country, 1].unique().tolist()

print(counties)
output: {'England': ['Bedfordshire', 'Berkshire', 'Buckinghamshire'], 'Ireland': ['Cork']}
Also, as I mentioned regarding your original code, you can shorten it to:
line = line.strip().split(',')
x = Location(*line)
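If you would rather not depend on pandas, the same grouping can be sketched with the standard library alone. This is only an illustration, not the one right way to do it: the function name counties_by_country and the inline sample rows are made up for the example, and it assumes every row has exactly three fields (town, county, country) with no blank lines. A defaultdict removes the need for an if statement to check whether a key exists, and the inner dict is used as an ordered set so each county appears once, in first-seen order.

```python
import csv
import io
from collections import defaultdict


def counties_by_country(rows):
    """Map each country to a list of its distinct counties, in first-seen order."""
    # The inner dict acts as an ordered set: its keys are the county names.
    grouped = defaultdict(dict)
    for town, county, country in rows:  # assumes exactly three fields per row
        grouped[country][county] = None
    return {country: list(seen) for country, seen in grouped.items()}


# Inline sample standing in for town-county-country.csv (hypothetical data subset).
sample = io.StringIO(
    "Ampthill,Bedfordshire,England\n"
    "Arlesey,Bedfordshire,England\n"
    "Reading,Berkshire,England\n"
    "Aylesbury,Buckinghamshire,England\n"
    "Munster,Cork,Ireland\n"
)

counties = counties_by_country(csv.reader(sample))
print(counties)
```

To run it against the real file, pass csv.reader(f) for a file opened with open('town-county-country.csv', newline='') instead of the inline sample.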