Ordered Dictionary and Sorting

I'm trying to solve a simple practice exercise:

Parse the CSV file to:

  • Find only the rows where the user started before September 6th, 2010.
  • Next, order the values from the "words" column in ascending order (by start date)
  • Return the compiled "hidden" phrase

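The CSV file has 19 columns and 1,000 rows of data, most of which are irrelevant. As the exercise states, we only care about two columns: we sort the matching rows by start_date in ascending order and take the associated word from the 'words' column. Putting those words together gives the "hidden" phrase.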
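The heart of the task is ordering the words by start date and then joining them. Here is a minimal Python 3 sketch of just that step. It assumes the rows have already been filtered and reduced to hypothetical (timestamp, word) pairs. It also assumes the words are joined with single spaces, because the exercise does not specify a separator.

```python
# Hypothetical (start_date, word) pairs that already passed the date filter.
pairs = [
    (1283300000, "world"),
    (1283200000, "hello"),
    (1283400000, "again"),
]

# Sort on the timestamp only. sorted() is stable, so rows that share a
# timestamp keep their original file order instead of being ordered by word.
ordered = sorted(pairs, key=lambda pair: pair[0])

# Assumption: the hidden phrase uses single spaces between words.
phrase = " ".join(word for _, word in ordered)
print(phrase)  # hello world again
```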

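The dates in the source file are UTC Unix timestamps (seconds since the epoch), so I had to convert them. I think I've now selected the correct rows, but I'm having trouble sorting by date.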
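The code below parses start_date with int() and datetime.fromtimestamp(), so the values appear to be Unix timestamps. Without a tz argument, fromtimestamp() returns the machine's local time. A cutoff checked against local dates can therefore shift by the local UTC offset. The following Python 3 sketch keeps everything in UTC and compares datetime objects rather than formatted strings. It treats the cutoff as midnight UTC on 2010-09-06, which is an assumption, and the helper name started_before_cutoff is hypothetical.

```python
from datetime import datetime, timezone

# Assumption: the exercise's cutoff means midnight UTC on 2010-09-06.
CUTOFF = datetime(2010, 9, 6, tzinfo=timezone.utc)


def started_before_cutoff(raw_start_date):
    """Hypothetical helper: True if a Unix-timestamp string is before CUTOFF."""
    # tz=timezone.utc returns an aware datetime in UTC instead of local time.
    started = datetime.fromtimestamp(int(raw_start_date), tz=timezone.utc)
    return started < CUTOFF


print(started_before_cutoff("1283731199"))  # True: one second before the cutoff
print(started_before_cutoff("1283731200"))  # False: exactly 2010-09-06 00:00:00 UTC
```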

Here is my code:

import csv
from collections import OrderedDict
from datetime import datetime


with open('TSE_sample_data.csv', 'rb') as csvIn:
    reader = csv.DictReader(csvIn)
    for row in reader:

        # convert the Unix timestamp to a YYYYMMDD string (fromtimestamp uses local time)
        startdt = datetime.fromtimestamp(int(row['start_date']))
        new_startdt = datetime.strftime(startdt, '%Y%m%d')

        # find dates before Sep 6th, 2010
        if new_startdt < '20100906':

            # add the values from the 'words' column to a list 
            words = []
            words.append(row['words'])

            # add the dates to a list
            dates = []
            dates.append(new_startdt)

            # create an ordered dictionary to sort the dates... this is where I'm having issues 
            dict1 = OrderedDict(zip(words, dates))
            print dict1
            #print list(dict1.items())[0][1]
            #dict2 = sorted([(y,x) for x,y in dict1.items()])
            #print dict2

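When I print dict1, I expect a single ordered dictionary with the words and dates as its items. Instead, I get a separate ordered dictionary for every key-value pair that is created.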
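You can reproduce the symptom without the CSV file. This Python 3 sketch uses hypothetical rows in place of those that pass the date filter. As in the code above, it creates the lists and the OrderedDict inside the loop. Each iteration therefore starts from empty lists and builds a dictionary that holds only the current row.

```python
from collections import OrderedDict

# Hypothetical (word, YYYYMMDD date) rows that passed the date filter.
matching_rows = [("hello", "20100901"), ("world", "20100902")]

per_row_dicts = []
for word, date in matching_rows:
    # Re-created on every iteration, just like words = [] and dates = [] above.
    words = []
    words.append(word)
    dates = []
    dates.append(date)
    # A brand-new OrderedDict is built for the current row only.
    per_row_dicts.append(OrderedDict(zip(words, dates)))

for d in per_row_dicts:
    print(list(d.items()))
# [('hello', '20100901')]
# [('world', '20100902')]
```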

Here is the corrected version:

import csv
from collections import OrderedDict
from datetime import datetime


with open('TSE_sample_data.csv', 'rb') as csvIn:
    reader = csv.DictReader(csvIn)
    words = []
    dates = []
    for row in reader:

        # convert the Unix timestamp to a YYYYMMDD string (fromtimestamp uses local time)
        startdt = datetime.fromtimestamp(int(row['start_date']))
        new_startdt = datetime.strftime(startdt, '%Y%m%d')

        # find dates before Sep 6th, 2010
        if new_startdt < '20100906':

            # add the values from the 'words' column to a list 
            words.append(row['words'])
            # add the dates to a list
            dates.append(new_startdt)

    # This is where I was going wrong! The list initialisation and the lines below had to move outside the for loop.
    # Originally everything was inside the loop, so words and dates were reset to empty lists, and a new OrderedDict was created for each "row in reader" that met the if condition.
    # Now the lists are created once before the loop and the OrderedDict is built once after it, so it holds all of the matching (word, date) pairs.
    # Build the ordered dictionary, then sort its items by date.
    dict1 = OrderedDict(zip(words, dates))
    dict2 = sorted([(y,x) for x,y in dict1.items()])

    # print the hidden message
    for i in dict2: 
        print i[1]
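The corrected version has one caveat. Because dict1 uses the words as keys, a word that appears in more than one matching row keeps only one entry, namely the date of its last occurrence. One copy of that word is then lost from the phrase, and the surviving copy may land in the wrong position. The Python 3 sketch below uses hypothetical data to compare the dictionary approach with sorting the (date, word) pairs directly, which keeps every row.

```python
from collections import OrderedDict

# Hypothetical filtered data in which "the" appears in two rows.
words = ["the", "cat", "sat", "the"]
dates = ["20100901", "20100902", "20100903", "20100904"]

# Same approach as the corrected version: the second "the" overwrites the
# first one's date, so only three keys remain.
dict1 = OrderedDict(zip(words, dates))
dict2 = sorted([(y, x) for x, y in dict1.items()])
collapsed = " ".join(pair[1] for pair in dict2)

# Alternative: sort the (date, word) pairs themselves, so no row is lost.
pairs = sorted(zip(dates, words), key=lambda pair: pair[0])
phrase = " ".join(word for _, word in pairs)

print(collapsed)  # cat sat the
print(phrase)     # the cat sat the
```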