CSV reader incorrectly parses tabs after quotation marks
I am using the CSV reader to read a TSV file in Python. The code is:
import csv

f = csv.reader(open('sample.csv'), delimiter='\t')
for chunk in f:
    print(chunk)
A row of the tab-separated CSV file looks like this (the CSV is hosted here):
doc | unit1_toks | unit2_toks | unit1_txt1 | unit2_txt2 | s1_toks | s2_toks | unit1_sent | unit2_sent | dir |
---|---|---|---|---|---|---|---|---|---|
GUM_bio_galois | 156-160 | 161-170 | " We zouden dan voorstellen | dat de auteur al zijn werk zou moeten publiceren | 107-182 | 107-182 | Poisson declared Galois ' work " incomprehensible " , declaring that " [ Galois ' ] argument is not sufficient . " [ 16 ] | Poisson declared Galois ' work " incomprehensible " , declaring that " [ Galois ' ] argument would then suggest that the author should publish the opinion . " [ 16 ] | 1>2 |
I get the following output (the CSV reader drops some of the tab separators):
['GUM_bio_galois',
'156-160',
'161-170',
' We zouden dan voorstellen\tdat de auteur al zijn werk zou moeten publiceren\t107-182\t107-182\tPoisson declared Galois \' work incomprehensible " , declaring that " [ Galois \' ] argument is not sufficient . " [ 16 ]',
'Poisson declared Galois \' work " incomprehensible " , declaring that " [ Galois \' ] argument would then suggest that the author should publish the opinion . " [ 16 ]',
'1>2']
I would like it to look like this:
['GUM_bio_galois',
'156-160',
'161-170',
'" We zouden dan voorstellen',
'dat de auteur al zijn werk zou moeten publiceren',
'107-182',
'107-182',
'Poisson declared Galois \' work incomprehensible " , declaring that " [ Galois \' ] argument is not sufficient . " [ 16 ]',
'Poisson declared Galois \' work " incomprehensible " , declaring that " [ Galois \' ] argument would then suggest that the author should publish the opinion . " [ 16 ]',
'1>2']
How can I get the CSV reader to handle unbalanced quotes and keep them in my output?
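Pass quoting=csv.QUOTE_NONE so that the reader does not give quote characters any special treatment:
import csv

with open('sample.csv', newline='') as f:
    # QUOTE_NONE makes the reader treat '"' as an ordinary character
    rdr = csv.reader(f, quoting=csv.QUOTE_NONE, delimiter='\t')
    header = next(rdr)  # consume the header row
    for line in rdr:
        print(line)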
Or use csv.DictReader:
import csv

with open('sample.csv', newline='') as f:
    # DictReader forwards the extra keyword arguments to the underlying reader
    rdr = csv.DictReader(f, quoting=csv.QUOTE_NONE, delimiter='\t')
    for line in rdr:
        print(line)
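To see the difference without the original file, here is a small sketch that parses an in-memory sample both ways. The three-column `SAMPLE` string and the `parse` helper are made up for illustration and are not taken from the real data. The sample only mimics the problem row: one field opens a double quote that is not closed until a later field.

```python
import csv
import io

# Hypothetical sample: the "text" field opens a quote that is only closed
# inside the "note" field, similar to the problem row in the question.
SAMPLE = 'id\ttext\tnote\nrow1\t" open quote\tclosed " later\n'


def parse(text, **kwargs):
    """Parse tab-separated text held in memory and return all rows."""
    return list(csv.reader(io.StringIO(text), delimiter='\t', **kwargs))


# Default quoting: the tab between the quotes is kept as field content,
# so two columns are merged into one.
default_rows = parse(SAMPLE)

# QUOTE_NONE: '"' is an ordinary character, so every tab splits a field.
raw_rows = parse(SAMPLE, quoting=csv.QUOTE_NONE)

print(len(default_rows[1]), default_rows[1])
print(len(raw_rows[1]), raw_rows[1])

# The same keyword arguments work with DictReader.
dict_rows = list(
    csv.DictReader(io.StringIO(SAMPLE), delimiter='\t', quoting=csv.QUOTE_NONE)
)
print(dict_rows[0]['text'])
```

With the default settings the data row comes back with two fields instead of three. With `QUOTE_NONE` it comes back with three fields and the quote characters are left in place, which is the behaviour asked for in the question.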