Using csv module to read ascii delimited text?

Question

You may or may not be aware of ASCII delimited text, which has the nice benefit of using non-keyboard characters to separate fields and lines.

Writing it out is easy:

import csv

with open('ascii_delim.adt', 'w') as f:
    writer = csv.writer(f, delimiter=chr(31), lineterminator=chr(30))
    writer.writerow(('Sir Lancelot of Camelot', 'To seek the Holy Grail', 'blue'))
    writer.writerow(('Sir Galahad of Camelot', 'I seek the Grail', 'blue... no yellow!'))

And, sure enough, the data gets written out. When reading, however, lineterminator does nothing, and if I try this:

open('ascii_delim.adt', newline=chr(30))

it throws ValueError: illegal newline value:

So how do I read my ASCII delimited file? Am I relegated to doing line.split(chr(30))?

Answer 1

The documentation says:

The reader is hard-coded to recognise either '\r' or '\n' as end-of-line, and ignores lineterminator. This behavior may change in the future.

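So the csv module cannot, on its own, read a file that uses a custom line terminator: the lineterminator setting only affects lines produced by csv.writer.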
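A small in-memory sketch shows what that note means in practice (io.StringIO stands in for the real file, an assumption made only to keep the example self-contained). csv.reader accepts lineterminator=chr(30) without complaint but ignores it, so the record separator ends up glued inside a field and both records come back as a single row.

```python
import csv
import io

US, RS = chr(31), chr(30)  # ASCII unit separator (fields) and record separator (rows)

# Write the two records from the question into an in-memory buffer.
buf = io.StringIO()
writer = csv.writer(buf, delimiter=US, lineterminator=RS)
writer.writerow(('Sir Lancelot of Camelot', 'To seek the Holy Grail', 'blue'))
writer.writerow(('Sir Galahad of Camelot', 'I seek the Grail', 'blue... no yellow!'))
data = buf.getvalue()

# The reader accepts lineterminator=RS but ignores it: it only ends a line
# on '\r' or '\n', and this data contains neither.
rows = list(csv.reader(io.StringIO(data), delimiter=US, lineterminator=RS))
print(len(rows))            # 1
print(repr(rows[0][2]))     # 'blue\x1eSir Galahad of Camelot'
```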

Answer 2

According to the docs for open:

newline controls how universal newlines mode works (it only applies to text mode). It can be None, '', '\n', '\r', and '\r\n'.

So open won't handle your file. Per the csv docs:

Note The reader is hard-coded to recognise either '\r' or '\n' as end-of-line, and ignores lineterminator.

So that won't work either. I also looked at whether str.splitlines is configurable, but it uses a fixed set of line boundaries.
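Worth noting, though: the fixed set of boundaries listed in the str.splitlines documentation happens to include \x1e (chr(30), the record separator), although not \x1f. A minimal sketch, assuming the whole file fits in memory and that no field contains any other character splitlines treats as a boundary (\n, \r, \x0b, \x0c, \x1c, \x1d, \x85, \u2028, \u2029):

```python
import csv

US, RS = chr(31), chr(30)

# ASCII-delimited text as the question's writer produces it. With a real file:
#     with open('ascii_delim.adt', newline='') as f:
#         data = f.read()
data = ('Sir Lancelot of Camelot' + US + 'To seek the Holy Grail' + US + 'blue' + RS
        + 'Sir Galahad of Camelot' + US + 'I seek the Grail' + US + 'blue... no yellow!' + RS)

# str.splitlines() treats '\x1e' (RS) as a line boundary, so it returns one
# string per record; a trailing boundary does not add an empty item.
records = data.splitlines()
rows = list(csv.reader(records, delimiter=US))
for row in rows:
    print(row)
```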

Am I relegated to doing line.split(chr(30))?

Looks like it, sorry!
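If it does come to splitting by hand, the pieces can still be handed to csv.reader, which accepts any iterable of strings. This is a sketch under a few assumptions: the whole file fits in memory, every record (including the last) ends with chr(30), and no field contains '\r' or '\n'. read_adt is a hypothetical helper name, not part of the csv module.

```python
import csv
import io

US, RS = chr(31), chr(30)

def read_adt(f):
    """Hypothetical helper: read ASCII-delimited text from text file f."""
    records = f.read().split(RS)
    if records and records[-1] == '':
        records.pop()  # the final RS leaves an empty string at the end
    # csv.reader accepts any iterable of strings, one record per item.
    return list(csv.reader(records, delimiter=US))

# In-memory stand-in for open('ascii_delim.adt', newline='').
sample = io.StringIO('a' + US + 'b' + RS + 'c' + US + 'd' + RS)
print(read_adt(sample))  # [['a', 'b'], ['c', 'd']]
```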

Answer 3

You can do it by effectively translating the end-of-line characters in the file into the newline characters csv.reader is hard-coded to recognise:

import csv

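with open('ascii_delim.adt', 'w', newline='') as f:
    writer = csv.writer(f, delimiter=chr(31), lineterminator=chr(30))
    writer.writerow(('Sir Lancelot of Camelot', 'To seek the Holy Grail', 'blue'))
    writer.writerow(('Sir Galahad of Camelot', 'I seek the Grail', 'blue... no yellow!'))

def readlines(f, newline='\n'):
    """Yield the lines of text file f, treating `newline` as the line
    terminator and replacing it with '\n', which csv.reader recognises."""
    while True:
        line = []
        while True:
            ch = f.read(1)
            if ch == '':  # end of file?
                if line:  # don't lose a last line that has no terminator
                    yield ''.join(line)
                return
            elif ch == newline:  # end of line?
                line.append('\n')
                break
            line.append(ch)
        yield ''.join(line)

# Open in text mode so f.read(1) returns str, matching the '' and chr(30)
# comparisons in readlines(); newline='' disables newline translation.
with open('ascii_delim.adt', newline='') as f:
    reader = csv.reader(readlines(f, newline=chr(30)), delimiter=chr(31))
    for row in reader:
        print(row)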

Output:

['Sir Lancelot of Camelot', 'To seek the Holy Grail', 'blue']
['Sir Galahad of Camelot', 'I seek the Grail', 'blue... no yellow!']
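Reading one character at a time with f.read(1) is simple but slow on large files. A variant that reads fixed-size chunks and splits them on the separator can feed csv.reader the same way. This is a sketch: iter_records and chunk_size are hypothetical names, not part of the csv module. It drops the separator instead of replacing it with '\n', which csv.reader accepts equally well.

```python
import csv
import io

def iter_records(f, sep=chr(30), chunk_size=65536):
    """Yield sep-terminated records from text file f, reading in chunks.

    The separator is dropped; a last record with no trailing separator is
    still yielded as long as it is non-empty.
    """
    pending = ''
    while True:
        chunk = f.read(chunk_size)
        if not chunk:  # end of file
            break
        pending += chunk
        # Every piece except the last is a complete record; the last one
        # may be cut off mid-record, so keep it for the next chunk.
        *complete, pending = pending.split(sep)
        yield from complete
    if pending:
        yield pending

US, RS = chr(31), chr(30)
data = 'Sir Lancelot' + US + 'blue' + RS + 'Sir Galahad' + US + 'yellow' + RS
# A tiny chunk_size forces records to straddle chunk boundaries.
rows = list(csv.reader(iter_records(io.StringIO(data), chunk_size=5), delimiter=US))
print(rows)  # [['Sir Lancelot', 'blue'], ['Sir Galahad', 'yellow']]
```

With the file from the question, the same generator would be used as csv.reader(iter_records(f), delimiter=chr(31)) inside a with open('ascii_delim.adt', newline='') as f: block.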

Answer 4

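Hey, I've been struggling with a similar problem all day. I wrote a function heavily inspired by @martineau's answer that should work for you. It is slower, but it can parse files delimited by any kind of string. Hope this helps!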

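def custom_CSV_reader(csv_file, field_delimiter, record_delimiter):
    """Parse csv_file into a list of rows. Fields are separated by the string
    field_delimiter and rows are terminated by the string record_delimiter;
    both may be any non-empty string. Reads one character at a time, so it
    is slow on large files."""
    result = []
    row = []
    field = ''

    # newline='' keeps '\r' and '\n' in the data exactly as written.
    with open(csv_file, newline='') as f:
        while True:
            ch = f.read(1)
            if ch == '':  # end of file?
                break
            field += ch

            # Test the record delimiter first, in case the field delimiter
            # is a suffix of it (e.g. '\n' vs '\r\n').
            if field.endswith(record_delimiter):
                row.append(field[:-len(record_delimiter)])
                result.append(row)
                row = []
                field = ''
            elif field.endswith(field_delimiter):
                row.append(field[:-len(field_delimiter)])
                field = ''

    # Keep a final row that has no trailing record_delimiter.
    if field or row:
        row.append(field)
        result.append(row)
    return result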
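When the file fits in memory and the data never needs quoting or escaping, multi-character delimiters can also be handled with plain str.split, without the csv module and without a character-by-character loop. A sketch; split_delimited and the '||' / '##' delimiters are made up for illustration:

```python
def split_delimited(text, field_delimiter, record_delimiter):
    """Split text into rows using arbitrary (possibly multi-character)
    delimiter strings. No quoting or escaping is supported."""
    records = text.split(record_delimiter)
    if records and records[-1] == '':
        records.pop()  # drop the empty piece after a trailing record delimiter
    return [record.split(field_delimiter) for record in records]

text = 'a||b##c||d##'
print(split_delimited(text, '||', '##'))  # [['a', 'b'], ['c', 'd']]
```

For the ASCII delimited file in the question, the call would be split_delimited(data, chr(31), chr(30)).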
