Read csv file, parse data, and store in a dictionary
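I have a file containing the songs a radio station played recently, with the artist and play time, in the following format: "November 4, 2019 8:02 PM","Wagon Wheel","Darius Rucker". I am trying to store the contents of this file in a string variable playlist_csv, use splitlines() to store the records in a variable lines, and then iterate through lines to store the data in a dictionary. The keys should be datetime objects for the timestamps and the values should be tuples of song and artist: {datetime_key: (song, artist)}
Here is an excerpt of the file: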
"November 4, 2019 8:02 PM","Wagon Wheel","Darius Rucker"
"November 4, 2019 7:59 PM","Remember You Young","Thomas Rhett"
"November 4, 2019 7:55 PM","Long Hot Summer","Keith Urban"
The desired dictionary should look like this:
{datetime.datetime(2019, 11, 4, 20, 2): ('Wagon Wheel', 'Darius Rucker'),
datetime.datetime(2019, 11, 4, 19, 59): ('Remember You Young', 'Thomas Rhett'),
datetime.datetime(2019, 11, 4, 19, 55): ('Long Hot Summer', 'Keith Urban')}
Here is my code so far:
# read the file and store content in string variable playlist_csv
with open('playlist.txt', 'r') as csv_file:
    playlist_csv = csv_file.read().replace('\n', '')
# use splitlines() method to store records in variable lines (it is list)
split_playlist = playlist_csv.splitlines()
# iterate through lines to store data in playlist_dict dictionary
playlist_dict = {}
for l in csv.reader(split_playlist, quotechar='"', delimiter=',',
                    quoting=csv.QUOTE_ALL, skipinitialspace=True):
    dt=datetime.strptime(l[0], '%B %d, %Y %I:%M %p')
    playlist_dict[l[dt]].append(dt)
print(playlist_dict)
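However, I keep running into errors when I try to store this data in the dictionary (specifically "'datetime.datetime' object is not subscriptable" and "list indices must be integers or slices" as I modify the code).
Any help is appreciated!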
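Splitting the csv file yourself first seems unnecessary: csv.reader will handle all of that for you. In playlist_dict[l[dt]].append(dt), the expression l[dt] indexes the row list with a datetime, which is what raises the "list indices must be integers or slices" error. Instead, you need something like playlist_dict[dt].append((song, artist)). This should work: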
import csv
from datetime import datetime

with open('playlist.txt', 'r') as csv_file:
    playlist = {}
    for time, song, artist in csv.reader(csv_file):
        # parse e.g. "November 4, 2019 8:02 PM" into a datetime object
        time = datetime.strptime(time, '%B %d, %Y %I:%M %p')
        if time in playlist:
            playlist[time].append((song, artist))
        else:
            playlist[time] = [(song, artist)]
(The optional arguments you pass to csv.reader are probably not needed either; the defaults should work for the kind of input you've shown.)
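To illustrate both points, here is a minimal sketch that feeds the three sample rows from the question to csv.reader with its default settings and then reproduces the "list indices" error. The io.StringIO object only stands in for the open file, and the names SAMPLE, rows, first and error_message are made up for this example.

```python
import csv
import io
from datetime import datetime

# Sample rows copied from the question; io.StringIO stands in for the open file.
SAMPLE = '''"November 4, 2019 8:02 PM","Wagon Wheel","Darius Rucker"
"November 4, 2019 7:59 PM","Remember You Young","Thomas Rhett"
"November 4, 2019 7:55 PM","Long Hot Summer","Keith Urban"
'''

# Default dialect, no extra arguments: the comma inside the quoted
# timestamp is kept as part of the first field.
rows = list(csv.reader(io.StringIO(SAMPLE)))

# Each row is a list of three strings with the quotes already stripped.
first = rows[0]
dt = datetime.strptime(first[0], '%B %d, %Y %I:%M %p')

# Reproduce the error from the question: a list cannot be indexed by a datetime.
try:
    first[dt]
except TypeError as exc:
    error_message = str(exc)
```

The row list accepts only integer or slice indices, so the datetime has to be used as the dictionary key rather than as an index into the row.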
Or, if there is only one possible song/artist for each datetime, you don't need a list and can do this instead (which seems to be the output you are looking for):
with open('playlist.txt', 'r') as f:
    playlist = {datetime.strptime(time, '%B %d, %Y %I:%M %p'): (song, artist)
                for time, song, artist in csv.reader(f)}
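For a version that can be run without playlist.txt, the comprehension can be wrapped in a small function and fed the sample rows directly. This is only a sketch: load_playlist and TIME_FORMAT are hypothetical names, and io.StringIO replaces the real file.

```python
import csv
import io
from datetime import datetime

TIME_FORMAT = '%B %d, %Y %I:%M %p'  # e.g. "November 4, 2019 8:02 PM"


def load_playlist(csv_source):
    """Map each play time to a (song, artist) tuple.

    csv_source is any iterable of CSV lines, such as an open file.
    A later row with the same timestamp replaces the earlier one.
    """
    return {datetime.strptime(time, TIME_FORMAT): (song, artist)
            for time, song, artist in csv.reader(csv_source)}


# Sample rows copied from the question.
sample = io.StringIO(
    '"November 4, 2019 8:02 PM","Wagon Wheel","Darius Rucker"\n'
    '"November 4, 2019 7:59 PM","Remember You Young","Thomas Rhett"\n'
    '"November 4, 2019 7:55 PM","Long Hot Summer","Keith Urban"\n'
)
playlist = load_playlist(sample)
```

With a real file, the csv module documentation recommends opening it with newline='', for example open('playlist.txt', newline=''). Note that if two rows share the same minute, only the last one survives in this form; the list-based version above keeps both.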
In case it turns out to be a better option for this scenario, here is a solution using Pandas. As a bonus, it computes the time between each song.
import pandas as pd
df = pd.read_csv('../resources/radio_songs.csv', dtype={'song_name': str, 'artist': str},
                 parse_dates=[0], header=None, names=['time_played', 'song_name', 'artist'])
df['time_diff'] = df['time_played'].diff(periods=-1)
DataFrame output:
          time_played           song_name         artist time_diff
0 2019-11-04 20:02:00         Wagon Wheel  Darius Rucker  00:03:00
1 2019-11-04 19:59:00  Remember You Young   Thomas Rhett  00:04:00
2 2019-11-04 19:55:00     Long Hot Summer    Keith Urban       NaT
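For comparison, the same gaps can be worked out without Pandas. The sketch below uses only the standard library and assumes the rows are ordered newest first, as in the excerpt; play_times and gaps are made-up names, and this is not what diff(periods=-1) does internally, just the equivalent arithmetic.

```python
from datetime import datetime

# Play times from the sample data, newest first.
play_times = [
    datetime(2019, 11, 4, 20, 2),
    datetime(2019, 11, 4, 19, 59),
    datetime(2019, 11, 4, 19, 55),
]

# Pair each play time with the next (older) one and subtract.
gaps = [newer - older for newer, older in zip(play_times, play_times[1:])]

# The last row has no older neighbour, so pad with None
# (Pandas shows NaT in that position).
gaps.append(None)
```

Each entry in gaps is a datetime.timedelta, so gaps[0].total_seconds() gives the gap in seconds.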
If for some reason you need it, that dictionary format can be recreated in the following fun way:
tuples_dict = dict(zip(df['time_played'], zip(df['song_name'], df['artist'])))
Output:
{Timestamp('2019-11-04 20:02:00'): ('Wagon Wheel', 'Darius Rucker'), Timestamp('2019-11-04 19:59:00'): ('Remember You Young', 'Thomas Rhett'), Timestamp('2019-11-04 19:55:00'): ('Long Hot Summer', 'Keith Urban')}