Reading lines from a txt file and creating a dictionary whose values are lists of tuples
student.txt:
Akçam Su Tilsim PSYC 3.9
Aksel Eda POLS 2.78
Alpaydin Dilay ECON 1.2
Atil Turgut Uluç IR 2.1
Deveci Yasemin PSYC 2.9
Erserçe Yasemin POLS 3.0
Gülle Halil POLS 2.7
Gündogdu Ata Alp ECON 4.0
Gungor Muhammed Yasin POLS 3.1
Hammoud Rawan IR 1.7
Has Atakan POLS 1.97
Ince Kemal Kahriman IR 2.0
Kaptan Deniz IR 3.5
Kestir Bengisu IR 3.8
Koca Aysu ECON 2.5
Kolayli Sena Göksu IR 2.8
Kumman Gizem PSYC 2.9
Madenoglu Zeynep PSYC 3.1
Naghiyeva Gulustan IR 3.8
Ok Arda Mert IR 3.2
Var Berna ECON 2.9
Yeltekin Sude PSYC 1.2
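Hello, I want to write a function that reads each student's information from the file into a dictionary, where the keys are departments and the values are lists of the students in that department (lists of tuples). Each student's information is stored in a tuple containing (surname, GPA). Students in the file may have several given names, but only the surname and GPA are stored. The function should return the dictionary. (The surname is the first word on each line.)
Here is what I tried: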
def read_student(ifile):
    D={}
    f1=open(ifile,'r')
    for line in f1:
        tab=line.find('\t')
        space=line.rfind(' ')
        rtab=line.rfind('\t')
        student_surname=line[0:tab]
        gpa=line[space+1:]
        department=line[rtab+1:space]
        if department not in D:
            D[department]=[(student_surname,gpa)]
        else:
            D[department].append((student_surname,gpa))
    f1.close()
    return D
print(read_student('student.txt'))
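I think the main problem is that the separators are inconsistent: sometimes a word is followed by a tab and sometimes by a space, so I don't know how to use find correctly in this case.
See below. You still have to take care of the surname, but the other details in the question are handled.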
from collections import defaultdict
data = defaultdict(list)
with open('data.txt', encoding="utf-8") as f:
    lines = [l.strip() for l in f.readlines()]
    for line in lines:
        # Last space separates the department from the grade
        first_space_idx = line.rfind(' ')
        # Second-to-last space separates the name from the department
        sec_space_idx = line.rfind(' ', 0, first_space_idx - 1)
        grade = line[first_space_idx+1:]
        # +1 skips the space itself so the key is 'PSYC', not ' PSYC'
        dep = line[sec_space_idx+1:first_space_idx]
        student = line[:sec_space_idx].strip()
        data[dep].append((student, grade))
for dep, students in data.items():
    print(f'{dep} --> {students}')
Output
PSYC --> [('Akçam Su Tilsim', '3.9'), ('Deveci Yasemin', '2.9'), ('Kumman Gizem', '2.9'), ('Madenoglu Zeynep', '3.1'), ('Yeltekin Sude', '1.2')]
POLS --> [('Aksel Eda', '2.78'), ('Erserçe Yasemin', '3.0'), ('Gülle Halil', '2.7'), ('Gungor Muhammed Yasin', '3.1'), ('Has Atakan', '1.97')]
ECON --> [('Alpaydin Dilay', '1.2'), ('Gündogdu Ata Alp', '4.0'), ('Koca Aysu', '2.5'), ('Var Berna', '2.9')]
IR --> [('Atil Turgut Uluç', '2.1'), ('Hammoud Rawan', '1.7'), ('Ince Kemal Kahriman', '2.0'), ('Kaptan Deniz', '3.5'), ('Kestir Bengisu', '3.8'), ('Kolayli Sena Göksu', '2.8'), ('Naghiyeva Gulustan', '3.8'), ('Ok Arda Mert', '3.2')]
Why bother with rfind and find when you can just split?
def read_student(ifile):
    D = {}
    f1 = open(ifile,'r')
    for line in f1:
        cols = line.split() # Splits at one or more whitespace
        surname = cols[0].strip()
        department = cols[-2].strip() # Because you know the last-but-one is dept
        gpa = float(cols[-1].strip()) # Because you know the last one is GPA
        fname = ' '.join(cols[1:-2]).strip()
        # cols[1:-2] gives you everything starting at col 1 up to but excluding the second-last.
        # Then you join these with spaces.
        if department not in D:
            D[department] = [(surname, gpa)]
        else:
            D[department].append((surname, gpa))
    f1.close()
    return D
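If you know your columns are always separated by \t, you can use cols = line.split('\t') instead. Then the student's given names are in the second column, the department in the third, and the GPA in the fourth.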
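To make the difference between the two split calls concrete, here is a small sketch. The sample line and its mix of tabs and a space are assumed for illustration, based on how the question describes the file; the real separators may differ.

```python
# Hypothetical sample line mixing tabs and a space, as the question describes.
line = "Atil\tTurgut Uluç\tIR 2.1\n"

# With no argument, split() breaks on any run of whitespace (spaces, tabs, newline).
cols = line.split()
print(cols)        # ['Atil', 'Turgut', 'Uluç', 'IR', '2.1']

# With an explicit '\t', only tabs separate fields; spaces and the newline stay in place.
tab_cols = line.split('\t')
print(tab_cols)    # ['Atil', 'Turgut Uluç', 'IR 2.1\n']
```

With the bare split(), the surname is always cols[0], the department cols[-2], and the GPA cols[-1], no matter how many given names sit in between.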
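Some suggestions:
- You can use defaultdict to avoid checking if department not in D every time.
- You can use with to manage reading the file, so you don't have to worry about f1.close(). This is the preferred way to read files in Python.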
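The two suggestions above could be combined roughly as follows. This is only a sketch: the utf-8 encoding and the skipping of short lines are assumptions, not something the question specifies.

```python
from collections import defaultdict

def read_student(ifile):
    # defaultdict(list) creates an empty list the first time a department is seen.
    by_dept = defaultdict(list)
    # The with block closes the file automatically, even if an error occurs.
    # encoding="utf-8" is an assumption about how the file was saved.
    with open(ifile, encoding="utf-8") as f:
        for line in f:
            cols = line.split()
            if len(cols) < 3:  # assumed policy: skip blank or malformed lines
                continue
            by_dept[cols[-2]].append((cols[0], float(cols[-1])))
    return dict(by_dept)  # hand back a plain dict to the caller
```

Calling read_student('student.txt') would then return a dictionary keyed by department, with (surname, GPA) tuples as the list items.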
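You can use split(' ', 1) to extract the surname. It returns a list of two elements, the first of which is the surname. Then split the second element using rsplit(' ', 1). That again returns a list of two elements: the first holds the given names and the department, the second is the GPA. Split the first element once more and take its last item to get the department.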
def read_student(ifile):
    d = {}
    with open(ifile) as fp:
        for line in fp:
            fname, data = line.strip().split(' ', 1)
            data, gpa = data.rsplit(' ', 1)
            dept = data.split()[-1]
            d.setdefault(dept, []).append((fname, gpa))
    return d
print(read_student('student.txt'))
Output:
{'ECON': [('Alpaydin', '1.2'),
          ('Gündogdu', '4.0'),
          ('Koca', '2.5'),
          ('Var', '2.9')],
 'IR': [('Atil', '2.1'),
        ('Hammoud', '1.7'),
        ('Ince', '2.0'),
        ('Kaptan', '3.5'),
        ('Kestir', '3.8'),
        ('Kolayli', '2.8'),
        ('Naghiyeva', '3.8'),
        ('Ok', '3.2')],
 'POLS': [('Aksel', '2.78'),
          ('Erserçe', '3.0'),
          ('Gülle', '2.7'),
          ('Gungor', '3.1'),
          ('Has', '1.97')],
 'PSYC': [('Akçam', '3.9'),
          ('Deveci', '2.9'),
          ('Kumman', '2.9'),
          ('Madenoglu', '3.1'),
          ('Yeltekin', '1.2')]}
This solution uses itemgetter to simplify extracting the variables: surname, department, and GPA.
from operator import itemgetter
d = dict()
with open('f0.txt', 'r') as f:
    for line in f:
        name, dept, gpa = itemgetter(0, -2, -1)(line.split())
        d.setdefault(dept, []).append((name, gpa))
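As a quick illustration of what itemgetter does here, the sketch below applies it to a single line taken from student.txt. The variable name pick is introduced only for this example.

```python
from operator import itemgetter

# itemgetter(0, -2, -1) builds a callable that returns a tuple of those positions.
pick = itemgetter(0, -2, -1)

line = "Gungor Muhammed Yasin POLS 3.1"  # one line from student.txt
name, dept, gpa = pick(line.split())
print(name, dept, gpa)  # Gungor POLS 3.1
```

Note that gpa is still a string at this point; wrap it in float() if numeric values are needed.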