Ordered Dictionary and Sorting
I'm trying to solve a simple practice problem:
Parse the CSV file to:
- Find only the rows where the user started before September 6th, 2010.
- Next, order the values from the "words" column in ascending order (by start date)
- Return the compiled "hidden" phrase
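The CSV file has 19 columns and 1000 rows of data, most of which are irrelevant. As the problem states, we only care about sorting the start_date column in ascending order to get the associated words from the 'words' column. Put together, those words give the "hidden" phrase.
The dates in the source file are stored as UTC timestamps, so I had to convert them. I'm now at the point where I think I've selected the correct rows, but I'm having trouble sorting by date.
Here is my code: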
import csv
from collections import OrderedDict
from datetime import datetime

with open('TSE_sample_data.csv', 'rb') as csvIn:
    reader = csv.DictReader(csvIn)
    for row in reader:
        # convert from UTC to more standard date format
        startdt = datetime.fromtimestamp(int(row['start_date']))
        new_startdt = datetime.strftime(startdt, '%Y%m%d')
        # find dates before Sep 6th, 2010
        if new_startdt < '20100906':
            # add the values from the 'words' column to a list
            words = []
            words.append(row['words'])
            # add the dates to a list
            dates = []
            dates.append(new_startdt)
            # create an ordered dictionary to sort the dates... this is where I'm having issues
            dict1 = OrderedDict(zip(words, dates))
            print dict1
            #print list(dict1.items())[0][1]
            #dict2 = sorted([(y,x) for x,y in dict1.items()])
            #print dict2
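When I print dict1, I expect a single ordered dictionary containing the words and dates as items. Instead, I get multiple ordered dictionaries, one for each key-value pair that is created.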
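The symptom can be reproduced without the CSV file. The sketch below is only an illustration, written for Python 3: the two rows and the helper names build_inside_loop and build_after_loop are made up for the example and are not part of the exercise. It contrasts building the lists and the dictionary on every iteration with building them once after the loop.

```python
from collections import OrderedDict

# Hypothetical stand-in rows; the real file has 19 columns.
rows = [
    {"words": "hello", "start_date": "20100101"},
    {"words": "world", "start_date": "20100202"},
]


def build_inside_loop(rows):
    """Mimic the question: lists and dict are recreated for every row."""
    results = []
    for row in rows:
        words = [row["words"]]        # reset on each iteration
        dates = [row["start_date"]]   # reset on each iteration
        results.append(OrderedDict(zip(words, dates)))
    return results


def build_after_loop(rows):
    """Accumulate across all rows, then build one dict at the end."""
    words, dates = [], []
    for row in rows:
        words.append(row["words"])
        dates.append(row["start_date"])
    return OrderedDict(zip(words, dates))


inside = build_inside_loop(rows)
after = build_after_loop(rows)
print(len(inside), [len(d) for d in inside])  # several one-item dicts
print(len(after))                             # one dict holding every item
```

The first helper yields one single-item dictionary per row, which matches the output described above; the second yields one dictionary with all pairs.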
Here is the corrected version:
import csv
from collections import OrderedDict
from datetime import datetime

with open('TSE_sample_data.csv', 'rb') as csvIn:
    reader = csv.DictReader(csvIn)
    words = []
    dates = []
    for row in reader:
        # convert from UTC to more standard date format
        startdt = datetime.fromtimestamp(int(row['start_date']))
        new_startdt = datetime.strftime(startdt, '%Y%m%d')
        # find dates before Sep 6th, 2010
        if new_startdt < '20100906':
            # add the values from the 'words' column to a list
            words.append(row['words'])
            # add the dates to a list
            dates.append(new_startdt)
    # This is where I was going wrong! Had to move the lines below outside of the for loop
    # Originally, because I was still inside the for loop, I was creating a new Ordered Dict for each "row in reader" that met my if condition
    # By doing this outside of the for loop, I'm able to create the ordered dict storing all of the values that have been found in tuples inside the ordered dict
    # create an ordered dictionary to sort by the dates
    dict1 = OrderedDict(zip(words, dates))
    dict2 = sorted([(y,x) for x,y in dict1.items()])
    # print the hidden message
    for i in dict2:
        print i[1]
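One caveat with the version above: the dictionary is keyed by word, so a word that appears twice in the phrase would be kept only once, and datetime.fromtimestamp without a timezone converts to local time rather than UTC. A possible alternative, sketched here for Python 3 and not tested against the real file, skips the dictionary and sorts (timestamp, word) pairs directly. The function name hidden_phrase, the CUTOFF constant, the in-memory sample data, and joining the words with spaces are all assumptions made for the illustration.

```python
import csv
import io
from datetime import datetime, timezone

# Assumed cutoff: midnight UTC on September 6th, 2010.
CUTOFF = datetime(2010, 9, 6, tzinfo=timezone.utc)


def hidden_phrase(csv_file, cutoff=CUTOFF):
    """Return the words of rows started before cutoff, ordered by start date."""
    reader = csv.DictReader(csv_file)
    selected = []
    for row in reader:
        ts = int(row["start_date"])
        # Interpret the timestamp explicitly as UTC, not local time.
        started = datetime.fromtimestamp(ts, tz=timezone.utc)
        if started < cutoff:
            selected.append((ts, row["words"]))
    # Sort on the numeric timestamp only; list.sort is stable, so rows
    # with equal timestamps keep their order from the file.
    selected.sort(key=lambda pair: pair[0])
    return " ".join(word for _, word in selected)


# Made-up sample standing in for TSE_sample_data.csv.
sample = io.StringIO(
    "start_date,words\n"
    "1262390400,the\n"      # 2010-01-02 UTC
    "1293840000,ignored\n"  # 2011-01-01 UTC, after the cutoff
    "1262304000,find\n"     # 2010-01-01 UTC
    "1262476800,the\n"      # 2010-01-03 UTC
)
print(hidden_phrase(sample))  # find the the
```

With the real data this would be called as hidden_phrase(open('TSE_sample_data.csv', newline='')). Because the pairs live in a list, the repeated word in the sample survives, which it would not if the words were used as dictionary keys.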