Python OrderedDict issue
I have a CSV file in which each row is a record with the columns Location, MovieDate, Formatted_Address, Lat and Lng. I was told to use an OrderedDict if I want to group the rows by Location and join together all the MovieDate values that share the same Location value.
Sample data:
Location,MovieDate,Formatted_Address,Lat,Lng
"Edgebrook Park, Chicago ",Jun-7 A League of Their Own,"Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
"Edgebrook Park, Chicago ","Jun-9 It's a Mad, Mad, Mad, Mad World","Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
For every group of rows with the same Location (like the two rows above), I want output like this, so that no Location is repeated:
"Edgebrook Park, Chicago ","Jun-7 A League of Their Own Jun-9 It's a Mad, Mad, Mad, Mad World","Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
What is wrong with my code that uses an OrderedDict to do this?
from collections import OrderedDict
od = OrderedDict()
import csv
with open("MovieDictFormatted.csv") as f, open("MoviesCombined.csv", "w") as out:
    r = csv.reader(f)
    wr = csv.writer(out)
    header = next(r)
    for row in r:
        loc, rest = row[0], row[1]
        od.setdefault(loc, []).append(rest)
    wr.writerow(header)
    for loc, vals in od.items():
        wr.writerow([loc] + vals)
What I end up with is something like this:
['Edgebrook Park, Chicago ', 'Jun-7 A League of Their Own']
['Gage Park, Chicago ', "Jun-9 It's a Mad, Mad, Mad, Mad World"]
['Jefferson Memorial Park, Chicago ', 'Jun-12 Monsters University ', 'Jul-11 Frozen ', 'Aug-8 The Blues Brothers ']
['Commercial Club Playground, Chicago ', 'Jun-12 Despicable Me 2']
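The problem is that the other columns don't show up this way. What is the best way to keep them? I would also prefer the MovieDate values to be one long string, like this:
'Jun-12 Monsters University Jul-11 Frozen Aug-8 The Blues Brothers '
instead of:
'Jun-12 Monsters University ', 'Jul-11 Frozen ', 'Aug-8 The Blues Brothers '
Thanks everyone, much appreciated. I'm a Python newbie.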
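For reference, a minimal sketch of the transformation being asked for, working on rows already in memory rather than on the CSV files. The function name merge_rows is hypothetical; it assumes Location is column 0 and MovieDate is column 1, keeps the other columns from the first row seen for each location, and appends every later MovieDate with a space.

```python
from collections import OrderedDict


def merge_rows(rows):
    """Group rows by Location (column 0) and join their MovieDate values (column 1).

    The remaining columns are copied from the first row seen for each location.
    """
    merged = OrderedDict()
    for row in rows:
        loc = row[0]
        if loc in merged:
            merged[loc][1] += " " + row[1]
        else:
            merged[loc] = list(row)  # copy so the caller's row is not modified
    return list(merged.values())


address = "Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA"
rows = [
    ["Edgebrook Park, Chicago ", "Jun-7 A League of Their Own", address, "41.9998876", "-87.7627672"],
    ["Edgebrook Park, Chicago ", "Jun-9 It's a Mad, Mad, Mad, Mad World", address, "41.9998876", "-87.7627672"],
]
for merged_row in merge_rows(rows):
    print(merged_row)
```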
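Unfortunately, changing row[0], row[1] to row[0], row[1:] did not give me the result I want. I only want to add the values of the second column (MovieDate), not copy all the other columns as well, which is what happens now: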
['Jefferson Memorial Park, Chicago ', ['Jun-12 Monsters University ', 'Jefferson Memorial Park, 4822 North Long Avenue, Chicago, IL 60630, USA', '41.76083920000001', '-87.6294353'], ['Jul-11 Frozen ', 'Jefferson Memorial Park, 4822 North Long Avenue, Chicago, IL 60630, USA', '41.76083920000001', '-87.6294353'], ['Aug-8 The Blues Brothers ', 'Jefferson Memorial Park, 4822 North Long Avenue, Chicago, IL 60630, USA', '41.76083920000001', '-87.6294353']]
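Try changing
od.setdefault(loc, []).append(rest)
to
od[loc] = ' '.join([od.get(loc, ''), ' '.join(rest)])
with rest = row[1:], so that each location maps to a single growing string instead of a list. Because the values are now strings, [loc]+vals would raise a TypeError, so write each row with:
wr.writerow([loc, vals])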
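A self-contained sketch of this string-accumulation idea, assuming only the MovieDate column should be joined; io.StringIO stands in for the real input and output files, and the sample rows are the ones from the question.

```python
import csv
import io
from collections import OrderedDict

data = '''Location,MovieDate,Formatted_Address,Lat,Lng
"Edgebrook Park, Chicago ",Jun-7 A League of Their Own,"Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
"Edgebrook Park, Chicago ","Jun-9 It's a Mad, Mad, Mad, Mad World","Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
'''

od = OrderedDict()
r = csv.reader(io.StringIO(data))
header = next(r)  # skip the header row
for row in r:
    loc, movie = row[0], row[1]
    # Grow one space-separated string per location instead of a list;
    # strip() removes the leading space left by the first join (and any trailing spaces).
    od[loc] = (od.get(loc, "") + " " + movie).strip()

out = io.StringIO()
wr = csv.writer(out)
wr.writerow(header[:2])
for loc, movies in od.items():
    # movies is a str, so build the row as a list; [loc] + movies would raise TypeError
    wr.writerow([loc, movies])
print(out.getvalue())
```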
You only need a couple of changes. To avoid repeating the Lat and Lng values, use them as part of the key together with the location:
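import csv
from collections import OrderedDict

od = OrderedDict()
with open("data.csv", newline="") as f, open("new.csv", "w", newline="") as out:
    r = csv.reader(f)
    wr = csv.writer(out)
    header = next(r)
    for row in r:
        # key on (Location, Lat, Lng); join MovieDate and Formatted_Address into one string
        od.setdefault((row[0], row[-2], row[-1]), []).append(" ".join(row[1:-2]))
    wr.writerow(header)
    for loc, vals in od.items():
        # Location first, then the joined strings, then Lat and Lng once
        wr.writerow([loc[0]] + vals + list(loc[1:]))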
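If the goal is the exact output shown in the question, one variation (a sketch, not part of the answer above) is to put every column except MovieDate into the key, so that only the MovieDate values are joined and Formatted_Address, Lat and Lng are written once. It assumes rows with the same Location always share the same address and coordinates; rows that differ in any of those fields end up on separate output lines. io.StringIO again stands in for the files.

```python
import csv
import io
from collections import OrderedDict

data = '''Location,MovieDate,Formatted_Address,Lat,Lng
"Edgebrook Park, Chicago ",Jun-7 A League of Their Own,"Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
"Edgebrook Park, Chicago ","Jun-9 It's a Mad, Mad, Mad, Mad World","Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
'''

od = OrderedDict()
r = csv.reader(io.StringIO(data))
header = next(r)
for row in r:
    # Key on (Location, Formatted_Address, Lat, Lng); collect only MovieDate.
    key = (row[0],) + tuple(row[2:])
    od.setdefault(key, []).append(row[1])

out = io.StringIO()
wr = csv.writer(out)
wr.writerow(header)
for key, movies in od.items():
    # Put the columns back in their original order, with the joined MovieDate second.
    wr.writerow([key[0], " ".join(movies)] + list(key[1:]))
print(out.getvalue())
```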
Output:
Location,MovieDate,Formatted_Address,Lat,Lng
"Edgebrook Park, Chicago ","Jun-7 A League of Their Own Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA","Jun-9 It's a Mad, Mad, Mad, Mad World Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
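A League of Their Own comes first because its row precedes the "Mad, Mad" row in the input, so it is appended to the list first. row[1:-2] takes every column except Location, Lat and Lng; Lat and Lng are stored in the key tuple so they are written only once, at the end of each row, instead of being repeated for every movie.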
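To make the slicing concrete, here is the first sample row from the question split the same way; this only illustrates the indexing and does not read any file.

```python
row = [
    "Edgebrook Park, Chicago ",
    "Jun-7 A League of Their Own",
    "Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",
    "41.9998876",
    "-87.7627672",
]

key = (row[0], row[-2], row[-1])  # Location, Lat, Lng
middle = row[1:-2]                # MovieDate and Formatted_Address
print(key)
print(" ".join(middle))
```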
Using names and tuple unpacking may be easier to follow:
import csv
from collections import OrderedDict

od = OrderedDict()
with open("data.csv", newline="") as f, open("new.csv", "w", newline="") as out:
    r = csv.reader(f)
    wr = csv.writer(out)
    header = next(r)
    for row in r:
        loc, mov, form, lat, long = row  # each row must have exactly five fields
        od.setdefault((loc, lat, long), []).append("{} {}".format(mov, form))
    wr.writerow(header)
    for loc, vals in od.items():
        wr.writerow([loc[0]] + vals + list(loc[1:]))
To keep all five columns, use csv.DictWriter:
import csv
from collections import OrderedDict

od = OrderedDict()
with open("data.csv", newline="") as f, open("new.csv", "w", newline="") as out:
    r = csv.DictReader(f)  # field names are taken from the header row
    wr = csv.DictWriter(out, fieldnames=r.fieldnames)
    wr.writeheader()
    for row in r:
        od.setdefault(row["Location"], dict(Location=row["Location"], Lat=row["Lat"], Lng=row["Lng"],
                                            MovieDate=[], Formatted_Address=row["Formatted_Address"]))
        od[row["Location"]]["MovieDate"].append(row["MovieDate"])
    for loc, vals in od.items():
        # turn the list of movies into one string before writing
        od[loc]["MovieDate"] = ", ".join(od[loc]["MovieDate"])
        wr.writerow(vals)
Output:
"Edgebrook Park, Chicago ","Jun-7 A League of Their Own, Jun-9 It's a Mad, Mad, Mad, Mad World","Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
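So all five columns are kept: the MovieDate values are joined into a single string, and because Formatted_Address is always the same for a given location, we don't need to update it.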
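A self-contained variant of the DictWriter approach, sketched under two assumptions: the input has a header row (so DictReader can take the field names from it and DictWriter.writeheader() writes it back out), and MovieDate values should be joined with a space, as the question prefers, rather than ", ". The function name combine_movies is hypothetical, and io.StringIO replaces the real files.

```python
import csv
import io
from collections import OrderedDict


def combine_movies(src, dst):
    """Write one CSV row per Location from src to dst, joining MovieDate with spaces."""
    reader = csv.DictReader(src)  # field names come from the header row
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    grouped = OrderedDict()
    for row in reader:
        # The first row seen for a location supplies Formatted_Address, Lat and Lng.
        entry = grouped.setdefault(row["Location"], dict(row, MovieDate=[]))
        entry["MovieDate"].append(row["MovieDate"])
    for entry in grouped.values():
        entry["MovieDate"] = " ".join(entry["MovieDate"])
        writer.writerow(entry)


data = '''Location,MovieDate,Formatted_Address,Lat,Lng
"Edgebrook Park, Chicago ",Jun-7 A League of Their Own,"Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
"Edgebrook Park, Chicago ","Jun-9 It's a Mad, Mad, Mad, Mad World","Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
'''

out = io.StringIO()
combine_movies(io.StringIO(data), out)
print(out.getvalue())
```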
It turns out that all we needed to do was join the MovieDate values and drop the duplicate Location, Lat, Lng and Formatted_Address entries.
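The same "join MovieDate, keep one copy of everything else" step can also be written with itertools.groupby, a swapped-in technique rather than the OrderedDict approach above. It is only a sketch, and it is only correct when rows for the same Location are adjacent in the file (sort them by Location first otherwise); the other columns are taken from the first row of each group.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

data = '''Location,MovieDate,Formatted_Address,Lat,Lng
"Edgebrook Park, Chicago ",Jun-7 A League of Their Own,"Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
"Edgebrook Park, Chicago ","Jun-9 It's a Mad, Mad, Mad, Mad World","Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
'''

reader = csv.reader(io.StringIO(data))
header = next(reader)
merged = []
# groupby only merges *consecutive* rows that share the same Location (column 0).
for loc, group in groupby(reader, key=itemgetter(0)):
    rows = list(group)
    movies = " ".join(row[1] for row in rows)
    # Formatted_Address, Lat and Lng are taken from the first row of the group.
    merged.append([loc, movies] + rows[0][2:])

out = io.StringIO()
writer = csv.writer(out)
writer.writerow(header)
writer.writerows(merged)
print(out.getvalue())
```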
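Assuming the location is the first field of each row (wrap the file in csv.reader so that line[0] is the whole Location field rather than the first character of the text line):
import csv

groups = {}
for line in csv.reader(f):
    if line[0] not in groups:
        groups[line[0]] = []
    groups[line[0]].append(line[1:])
For each location you now have the rest of every matching row:
wr = csv.writer(out)
for key, value in groups.items():
    # value is a list of rows; flatten it so it can follow the key in one CSV row
    wr.writerow([key] + [field for rest in value for field in rest])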
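On Python 3.7 and later a plain dict also keeps keys in insertion order, so this answer's approach does not need an OrderedDict. A minimal sketch along those lines, assuming the address, Lat and Lng are identical for every row of a location so only one copy of them is written; io.StringIO replaces the real files.

```python
import csv
import io

data = '''Location,MovieDate,Formatted_Address,Lat,Lng
"Edgebrook Park, Chicago ",Jun-7 A League of Their Own,"Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
"Edgebrook Park, Chicago ","Jun-9 It's a Mad, Mad, Mad, Mad World","Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
'''

groups = {}  # on Python 3.7+ a plain dict keeps keys in insertion order
reader = csv.reader(io.StringIO(data))
header = next(reader)
for line in reader:
    groups.setdefault(line[0], []).append(line[1:])

out = io.StringIO()
writer = csv.writer(out)
writer.writerow(header)
for key, rests in groups.items():
    movies = " ".join(rest[0] for rest in rests)  # rest[0] is MovieDate
    # Assumes address, Lat and Lng are the same for every row of a location.
    writer.writerow([key, movies] + rests[0][1:])
print(out.getvalue())
```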