Using the csv module to read ASCII delimited text?
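You may or may not be aware of ASCII delimited text, which has the nice advantage of using non-keyboard characters to separate fields and lines.
Writing it out is pretty easy: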
    import csv

    with open('ascii_delim.adt', 'w') as f:
        writer = csv.writer(f, delimiter=chr(31), lineterminator=chr(30))
        writer.writerow(('Sir Lancelot of Camelot', 'To seek the Holy Grail', 'blue'))
        writer.writerow(('Sir Galahad of Camelot', 'I seek the Grail', 'blue... no yellow!'))
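And, sure enough, you get things dumped out properly. However, on reading, lineterminator does nothing, and if I try to do:

    open('ascii_delim.adt', newline=chr(30))

it throws a ValueError: illegal newline value:

So how can I read in my ASCII delimited file? Am I relegated to doing line.split(chr(30))?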
The reader is hard-coded to recognise either '\r' or '\n' as end-of-line, and ignores lineterminator. This behavior may change in the future.
So the csv module cannot read CSV files written with custom line terminators.
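To see what that means in practice, here is a small sketch that hands csv.reader an in-memory string with chr(30) as the record separator. The sample data is made up for illustration. The lineterminator argument is accepted but has no effect on reading, so everything comes back as a single row:

```python
import csv
import io

# Two records, 'a|b' and 'c|d', using chr(31) between fields
# and chr(30) after each record (illustrative data only).
data = 'a' + chr(31) + 'b' + chr(30) + 'c' + chr(31) + 'd' + chr(30)

reader = csv.reader(io.StringIO(data, newline=''),
                    delimiter=chr(31), lineterminator=chr(30))
rows = list(reader)

# chr(30) is treated as ordinary field content, not as an end-of-line.
print(rows)  # [['a', 'b\x1ec', 'd\x1e']]
```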
newline controls how universal newlines mode works (it only applies to text mode). It can be None, '', '\n', '\r', and '\r\n'.

So open won't handle your file. Per the csv docs:

Note: The reader is hard-coded to recognise either '\r' or '\n' as end-of-line, and ignores lineterminator.
So that won't do it either. I also looked into whether str.splitlines was configurable, but it uses a fixed set of line boundaries.

Am I relegated to doing line.split(chr(30))?

Looks that way, sorry!
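For what it's worth, the split-based fallback is short when the file fits in memory. This is a minimal sketch, assuming the file was written as in the question and that no field needed csv quoting (quoted fields are not unwrapped here). The helper name read_adt is made up for this example:

```python
def read_adt(path, field_sep=chr(31), record_sep=chr(30)):
    """Read an ASCII delimited text file into a list of rows (lists of str)."""
    # newline='' turns off newline translation so the data comes back untouched.
    with open(path, newline='') as f:
        data = f.read()
    # The writer ends every record with record_sep, so splitting leaves an
    # empty trailing piece. Empty pieces are skipped, which also means that
    # genuinely empty records are dropped.
    return [record.split(field_sep)
            for record in data.split(record_sep)
            if record]
```

Calling read_adt('ascii_delim.adt') on the file from the question should give one list of three strings per knight.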
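You can do it by effectively translating the end-of-line characters in the file into the newline characters that csv.reader is hard-coded to recognize: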
    import csv

    # newline='' stops Python from translating any '\n' inside the data.
    with open('ascii_delim.adt', 'w', newline='') as f:
        writer = csv.writer(f, delimiter=chr(31), lineterminator=chr(30))
        writer.writerow(('Sir Lancelot of Camelot', 'To seek the Holy Grail', 'blue'))
        writer.writerow(('Sir Galahad of Camelot', 'I seek the Grail', 'blue... no yellow!'))

    def readlines(f, newline='\n'):
        # Yield one line per record, swapping the custom terminator for '\n'.
        while True:
            line = []
            while True:
                ch = f.read(1)
                if ch == '':  # end of file?
                    return
                elif ch == newline:  # end of line?
                    line.append('\n')
                    break
                line.append(ch)
            yield ''.join(line)

    # Text mode, so that read(1) returns str and '' at end of file (Python 3).
    with open('ascii_delim.adt', 'r', newline='') as f:
        reader = csv.reader(readlines(f, newline=chr(30)), delimiter=chr(31))
        for row in reader:
            print(row)
Output:

    ['Sir Lancelot of Camelot', 'To seek the Holy Grail', 'blue']
    ['Sir Galahad of Camelot', 'I seek the Grail', 'blue... no yellow!']
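Reading one character at a time can get slow on large files. One possible variation, offered only as a sketch: read fixed-size chunks, split them on the record separator, and yield newline-terminated lines for csv.reader. The names iter_records and read_adt_csv and the chunk size are illustrative choices, not part of the answer above, and the sketch assumes no field contains a literal newline.

```python
import csv

def iter_records(f, record_sep=chr(30), chunk_size=8192):
    """Yield records from text file f, each ending in a newline for csv.reader."""
    buffer = ''
    while True:
        chunk = f.read(chunk_size)
        if not chunk:  # end of file
            break
        buffer += chunk
        # Everything before the last separator is complete; keep the remainder
        # in the buffer until more data (or end of file) arrives.
        *complete, buffer = buffer.split(record_sep)
        for record in complete:
            yield record + '\n'
    if buffer:  # final record that had no trailing separator
        yield buffer + '\n'

def read_adt_csv(path, field_sep=chr(31), record_sep=chr(30)):
    """Parse an ASCII delimited file with csv.reader and return all rows."""
    with open(path, newline='') as f:
        return list(csv.reader(iter_records(f, record_sep), delimiter=field_sep))
```

Because csv.reader still does the field parsing, quoting rules applied by csv.writer are honored, which the plain split approach does not do.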
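Hey, I was struggling with a similar problem all day. I wrote a function heavily inspired by @martineau that should solve it for you. My function is slower, but it can parse files delimited by any kind of string. Hope it helps!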
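    def custom_CSV_reader(csv_file, row_delimiter, col_delimiter):
        # Note: as written, row_delimiter separates the fields inside a row,
        # and col_delimiter marks the end of a row.
        # Text mode, so that read(1) returns str and '' at end of file (Python 3).
        with open(csv_file, 'r', newline='') as f:
            row = []
            result = []
            temp_row = ''
            temp_col = ''
            line = ''
            go = 1

            while go == 1:
                while go == 1:
                    ch = f.read(1)

                    if ch == '':  # end of file?
                        go = 0

                    # Newlines, tabs and commas are dropped from the data.
                    if ch != '\n' and ch != '\t' and ch != ',':
                        temp_row = temp_row + ch
                        temp_col = temp_col + ch
                        line = line + ch

                    if row_delimiter in temp_row:
                        # End of a field: strip the delimiter and store it.
                        line = line[:-len(row_delimiter)]
                        row.append(line)
                        temp_row = ''
                        line = ''
                        break

                    elif col_delimiter in temp_col:
                        # End of a row: store the last field and the row.
                        line = line[:-len(col_delimiter)]
                        row.append(line)
                        result.append(row)
                        row = []
                        temp_col = ''
                        line = ''
                        break
        return result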