Adding multiple values to a dictionary and then sorting it

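I have a file with three fields: id, text, and date. It contains about 70K records. I want to load each record into a dictionary and then sort the data by date. Here is the code: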

matchinput = csv.reader(open(filename,"rb"),delimiter=',', quotechar='|')
tweets = []
for row in matchinput:
    data = dict()
    data['id']=str(row[0])
    data['text']=str(row[1])
    data['date']=str(row[2])
    tweets.append(data)

sorted(tweets, key=lambda tweets: tweets[2])
print tweets

The code gives the following error:

sorted(tweets, key=lambda tweets: tweets[2])
KeyError: 2

Input file:

566561942949474304,"lala is only 52 runs and 7 wickets away from being the only player to score 8000 runs and take 400 wickets in odi's !!! #pakvsind #cwc15",2015-02-14 22:37:48
566561925178200064,"rt @shoaibakhtarpk: captain @misbahulhaqpk, speaking to media, says want to make history by wining match against india #cwc15#pakvind #ind",2015-02-14 22:37:43

Expected output file (sorted by date):

566561925178200064,"rt @shoaibakhtarpk: captain @misbahulhaqpk, speaking to media, says want to make history by wining match against india #cwc15#pakvind #ind",2015-02-14 22:37:43
566561942949474304,"lala is only 52 runs and 7 wickets away from being the only player to score 8000 runs and take 400 wickets in odi's !!! #pakvsind #cwc15",2015-02-14 22:37:48

Why not store each row as a list/tuple, knowing that row[0] = id, row[1] = text and row[2] = date, just as you had it when parsing the .csv file? That way, each id/text/date combination stays together:

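import csv

# The with-block closes the file automatically once all rows have been read
with open(filename, newline='') as csvfile:  # Python 2: open(filename, 'rb')
    # The text field is wrapped in double quotes and may contain commas, so use
    # quotechar='"' (the default); with quotechar='|' such rows split into extra fields
    data = list(csv.reader(csvfile, delimiter=',', quotechar='"'))
sorted_data = sorted(data, key=lambda t: t[-1])  # or t[2]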
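
To try the list-based approach without the real file, here is a minimal sketch that feeds the two sample rows from the question through `io.StringIO`. The names `SAMPLE` and `read_sorted` are hypothetical, introduced only for this illustration. It keeps the csv module's default `quotechar='"'` so the quoted text, which contains commas, stays in one field:

```python
import csv
import io

# Hypothetical in-memory stand-in for the real file: the two sample rows from the question
SAMPLE = """566561942949474304,"lala is only 52 runs and 7 wickets away from being the only player to score 8000 runs and take 400 wickets in odi's !!! #pakvsind #cwc15",2015-02-14 22:37:48
566561925178200064,"rt @shoaibakhtarpk: captain @misbahulhaqpk, speaking to media, says want to make history by wining match against india #cwc15#pakvind #ind",2015-02-14 22:37:43
"""


def read_sorted(fileobj):
    """Parse id,text,date rows and return them sorted by the date column."""
    # Default quotechar='"' keeps the quoted text (which may contain commas) in one field
    rows = list(csv.reader(fileobj, delimiter=','))
    # 'YYYY-MM-DD HH:MM:SS' strings sort chronologically when compared as plain strings
    return sorted(rows, key=lambda row: row[2])


sorted_rows = read_sorted(io.StringIO(SAMPLE))
for row in sorted_rows:
    print(row[0], row[2])
```

With a real file, pass the open file object instead of the `StringIO` instance.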

If you want the ids, texts and dates as separate sequences, you can use zip:

ids, texts, dates = zip(*sorted_data)
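
As a small illustration of what `zip(*sorted_data)` does, here is a sketch on two hypothetical rows that reuse the sample ids and dates but with the text shortened. Unpacking the rows into `zip` transposes them, giving one tuple per column. Note that `zip` stops at the shortest row, so this only lines up correctly when every row has exactly three fields:

```python
# Hypothetical rows already sorted by date, shaped like [id, text, date] (text shortened)
sorted_data = [
    ["566561925178200064", "rt @shoaibakhtarpk", "2015-02-14 22:37:43"],
    ["566561942949474304", "lala is only 52 runs", "2015-02-14 22:37:48"],
]

# zip(*rows) transposes the rows: one tuple per column
ids, texts, dates = zip(*sorted_data)
print(ids)    # ('566561925178200064', '566561942949474304')
print(dates)  # ('2015-02-14 22:37:43', '2015-02-14 22:37:48')
```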

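Edit: To address your concern about the dates: with the format used in your sample data, the dates sort correctly as plain strings. More generally, though, you can always do the following to make sure any date/time format sorts correctly (I used the strptime format string that matches your current date/time format):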

import datetime
date_key = lambda t: datetime.datetime.strptime(t[-1], '%Y-%m-%d %H:%M:%S')
sorted_data = sorted(data, key=date_key)
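
If you prefer to keep the original dict-per-row approach, the KeyError comes from the key function: each element of `tweets` is a dict keyed by `'id'`, `'text'` and `'date'`, so indexing it with `2` fails. Also, `sorted()` returns a new list and leaves `tweets` unchanged, so its result has to be kept (or `list.sort()` used instead). A minimal sketch, assuming hypothetical in-memory data with shortened text and using `operator.itemgetter` as one possible key:

```python
import datetime
from operator import itemgetter

# Hypothetical data in the question's dict-per-row shape (text fields shortened)
tweets = [
    {"id": "566561942949474304", "text": "lala is only 52 runs", "date": "2015-02-14 22:37:48"},
    {"id": "566561925178200064", "text": "rt @shoaibakhtarpk", "date": "2015-02-14 22:37:43"},
]

# Each element is a dict, so the key must look up "date"; indexing it with 2
# is what raised KeyError: 2. sorted() returns a new list, so keep the result.
by_string = sorted(tweets, key=itemgetter("date"))

# Alternative: sort the list in place, comparing parsed datetimes instead of strings
tweets.sort(key=lambda d: datetime.datetime.strptime(d["date"], "%Y-%m-%d %H:%M:%S"))

for t in tweets:
    print(t["id"], t["date"])
```

Both orderings agree here because this date format already sorts correctly as text.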