Python generator which yields lines based on defined separator

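I'm trying to write a generator function that reads a file one line at a time and, based on a defined separator, yields each line as a list with one element per field. So for the input: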

ID|Name|Major
1234|Jane Heng|History
2334|Nandini Khola|Computer Science
6345|Ben Johnson|Data Science

the ideal output would be:

[1234, Jane Heng, History]
[2334, Nandini Khola, Computer Science]
[6345, Ben Johnson, Data Science]

Here is my code so far:

def file_reader(path, fields, sep, header):
    with open(path, "r") as file:
        if not os.path.isfile(path):
            raise FileNotFoundError(
                errno.ENOENT, os.strerror(errno.ENOENT), path)
        for line in file:
            count = 0 #Initialize line counter

            while True:
                i = line.find(sep)
                count += 1
                if i == -1:
                    break
                fieldlist = [x for x in (line.rstrip(sep) for line in file) if x]
                # if header is True:
                #     if len(fieldlist) == fields:
                #         count = 1  # Start from the second line if there is header
                #         continue
                #     else:
                #         raise ValueError(
                #             f'{path} has {len(fieldlist)} fields in header but expected {fields} fields!')
                if len(fieldlist) != fields:
                    raise ValueError(f'{path} has {len(fieldlist)} fields on line {count} but expected {fields} fields!')
                yield fieldlist

But when I test it with:

gen = file_reader('/path/to/file.txt', 3, sep='|', header=True)
print(next(gen))

I get:

['1234|Jane Heng|History\n', '2334|Nandini Khola|Computer Science\n', '6345|Ben Johnson|Data Science']

If I try something like

for ID, Name, Major in file_reader('/path/to/file.txt', 3, sep='|', header=True):
    print(f"id: {ID} name: {Name} major: {Major}")

I get the following output:

id: 1234|Jane Heng|History
 name: 2334|Nandini Khola|Computer Science
 major: 6345|Ben Johnson|Data Science

ValueError: /path/to/file.txt has 0 fields on line 2 but expected 3 fields!

Apparently the \n is causing everything to be read as one line, hence the ValueError exception.

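The header block is currently commented out, but the idea is to proceed only if the header has the expected number of fields. So if the header has only 2 fields, a ValueError should be raised. When the block is uncommented, I get: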

ValueError: /path/to/file.txt has 0 fields in header but expected 3 fields!

Any suggestions on how to get the desired output?

Using split('|') seems to do the job:

import errno
import os


def file_reader(path):
    # Check before opening: open() would raise first and make a later check unreachable
    if not os.path.isfile(path):
        raise FileNotFoundError(
            errno.ENOENT, os.strerror(errno.ENOENT), path)

    result = []
    header_length = 0

    with open(path, 'r') as file:
        for i, line in enumerate(file):
            contents = line.strip().split('|')
            if i == 0:
                # The header row sets the expected field count and is not returned
                header_length = len(contents)
            elif len(contents) != header_length:
                raise ValueError(
                    f'{path} has {len(contents)} fields on line {i + 1} '
                    f'but expected {header_length} fields!')
            else:
                result.append(contents)

    return result

result = file_reader('/path/to/file.txt')
for r in result:
    print(r)

Output:

['1234', 'Jane Heng', 'History']
['2334', 'Nandini Khola', 'Computer Science']
['6345', 'Ben Johnson', 'Data Science']
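
The function above returns a list rather than yielding rows. If a generator with the question's original signature is wanted, a minimal sketch might look like the following. The default values for sep and header, and the temporary file used for the demo, are assumptions made for illustration. In the question's code, the inner expression (line.rstrip(sep) for line in file) iterates over all remaining lines of the file, and rstrip(sep) only removes trailing separator characters without splitting anything, which is why whole lines ended up as list elements.

```python
import os
import tempfile


def file_reader(path, fields, sep=',', header=False):
    """Yield each line of `path` as a list of fields split on `sep`."""
    # open() raises FileNotFoundError on its own, so no explicit check is needed
    with open(path, 'r') as file:
        for line_number, line in enumerate(file, start=1):
            # Strip only the trailing newline, then split the current line
            values = line.rstrip('\n').split(sep)
            if len(values) != fields:
                raise ValueError(
                    f'{path} has {len(values)} fields on line {line_number} '
                    f'but expected {fields} fields!')
            if header and line_number == 1:
                continue  # the header was validated above; do not yield it
            yield values


# Demo: write the sample data from the question to a temporary file (assumed setup)
sample = ('ID|Name|Major\n'
          '1234|Jane Heng|History\n'
          '2334|Nandini Khola|Computer Science\n'
          '6345|Ben Johnson|Data Science')
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(sample)

rows = list(file_reader(tmp.name, 3, sep='|', header=True))
os.remove(tmp.name)
for row in rows:
    print(row)
```

Because the function yields one row at a time, the file is never held in memory as a whole, and the for ID, Name, Major in file_reader(...) loop from the question should unpack each row as intended. Note that in this sketch a blank line counts as a single empty field and therefore raises ValueError when fields is not 1.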