Python generator which yields lines based on defined separator
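I'm trying to write a generator function that reads a file one line at a time and, based on a defined separator, yields each line's items as separate elements of a list. So for the input: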
ID|Name|Major
1234|Jane Heng|History
2334|Nandini Khola|Computer Science
6345|Ben Johnson|Data Science
The ideal output would be:
[1234, Jane Heng, History]
[2334, Nandini Khola, Computer Science]
[6345, Ben Johnson, Data Science]
This is the code I have so far:
def file_reader(path, fields, sep, header):
    with open(path, "r") as file:
        if not os.path.isfile(path):
            raise FileNotFoundError(
                errno.ENOENT, os.strerror(errno.ENOENT), path)
        for line in file:
            count = 0  # Initialize line counter
            while True:
                i = line.find(sep)
                count += 1
                if i == -1:
                    break
                fieldlist = [x for x in (line.rstrip(sep) for line in file) if x]
                # if header is True:
                #     if len(fieldlist) == fields:
                #         count = 1  # Start from the second line if there is header
                #         continue
                #     else:
                #         raise ValueError(
                #             f'{path} has {len(fieldlist)} fields in header but expected {fields} fields!')
                if len(fieldlist) != fields:
                    raise ValueError(f'{path} has {len(fieldlist)} fields on line {count} but expected {fields} fields!')
                yield fieldlist
But when I test it with:
gen = file_reader('/path/to/file.txt', 3, sep='|', header=True)
print(next(gen))
I get:
['1234|Jane Heng|History\n', '2334|Nandini Khola|Computer Science\n', '6345|Ben Johnson|Data Science']
If I try something like:
for ID, Name, Major in file_reader('/path/to/file.txt', 3, sep='|', header=True):
    print(f"id: {ID} name: {Name} major: {Major}")
I get the following output:
cwid: 1234|Jane Heng|History
name: 2334|Nandini Khola|Computer Science
major: 6345|Ben Johnson|Data Science
ValueError: /path/to/file.txt has 0 fields on line 2 but expected 3 fields!
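Apparently the \n is causing everything to be read as one line, hence the ValueError.
The header block is currently commented out, but the idea is to proceed only if the header has the expected number of fields. So if the header has only 2 fields, a ValueError should be raised. When the block is uncommented, I get: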
ValueError: /path/to/file.txt has 0 fields in header but expected 3 fields!
Any suggestions on how to get the desired output?
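A small sketch of what seems to be happening in the code above, using an in-memory io.StringIO as a stand-in for the file (an assumption made only so the snippet is self-contained). It suggests the newline is not the real culprit: str.rstrip(sep) only strips trailing separator characters and never splits, and the inner generator expression iterates over the rest of the file, so the first fieldlist holds whole lines and the next one is empty.

```python
import io

# In-memory stand-in for the file from the question (assumption for illustration).
sample = io.StringIO(
    "ID|Name|Major\n"
    "1234|Jane Heng|History\n"
    "2334|Nandini Khola|Computer Science\n"
    "6345|Ben Johnson|Data Science"
)

sep = "|"
header_line = next(sample)  # the outer `for line in file` consumes the header

# Same expression as in the question: it loops over the REST of the file,
# and rstrip(sep) only removes trailing "|" characters (there are none here).
whole_lines = [x for x in (line.rstrip(sep) for line in sample) if x]

# The file is exhausted now, so evaluating the expression again gives nothing.
second_pass = [x for x in (line.rstrip(sep) for line in sample) if x]

# Splitting a single line is what produces the individual fields.
fields_of_header = header_line.rstrip("\n").split(sep)

print(whole_lines)
print(second_pass)
print(fields_of_header)
```

The empty second list would explain the "0 fields on line 2" message: the while loop runs a second time on the same line after the file has already been consumed.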
Using split('|') seems to do the job:
import errno
import os

def file_reader(path):
    with open(path, 'r') as file:
        if not os.path.isfile(path):
            raise FileNotFoundError(
                errno.ENOENT, os.strerror(errno.ENOENT), path)
        result = []
        header_length = 0
        for i, line in enumerate(file):
            if i == 0:
                # The first line is the header; remember how many fields it has
                header_length = len(line.strip().split('|'))
            else:
                contents = line.strip().split('|')
                if len(contents) != header_length:
                    raise ValueError()  # your desired error message here
                else:
                    result.append(contents)
        return result

result = file_reader(path)
for r in result:
    print(r)
Output:
['1234', 'Jane Heng', 'History']
['2334', 'Nandini Khola', 'Computer Science']
['6345', 'Ben Johnson', 'Data Science']
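The function above builds and returns a full list, whereas the question asked for a generator with fields, sep and header parameters. A minimal sketch of a generator variant follows; read_fields is a hypothetical helper introduced here so the logic can be exercised on any iterable of lines, and the error message wording is only illustrative. It assumes the header, when present, should be validated against fields and then skipped.

```python
import io


def read_fields(lines, fields, sep="|", header=False):
    """Yield each line as a list of fields; raise ValueError on a bad field count."""
    for line_number, line in enumerate(lines, start=1):
        fieldlist = line.rstrip("\n").split(sep)
        if len(fieldlist) != fields:
            raise ValueError(
                f"found {len(fieldlist)} fields on line {line_number} "
                f"but expected {fields} fields"
            )
        if header and line_number == 1:
            continue  # the header was validated above and is then skipped
        yield fieldlist


def file_reader(path, fields, sep="|", header=False):
    """Open `path` and lazily yield its rows; open() raises FileNotFoundError itself."""
    with open(path, "r") as file:
        yield from read_fields(file, fields, sep, header)


# Usage with an in-memory stand-in for the file (assumption for illustration).
sample = io.StringIO(
    "ID|Name|Major\n"
    "1234|Jane Heng|History\n"
    "2334|Nandini Khola|Computer Science\n"
    "6345|Ben Johnson|Data Science"
)
for ID, Name, Major in read_fields(sample, 3, sep="|", header=True):
    print(f"id: {ID} name: {Name} major: {Major}")
```

Because open() already raises FileNotFoundError for a missing path, this sketch leaves out the explicit os.path.isfile check. Rows are produced one at a time, so a malformed line only raises once the consumer reaches it.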