pd.read_csv for a "one-column" import: which sep avoids splitting the column? "ParserError: Error tokenizing data. C error: Expected 10 fields in line 4, saw 16"

For a CSV with only one column, when I run

pd.read_csv('/MYPATH/MYFILE.csv')

I get

ParserError: Error tokenizing data. C error: Expected 10 fields in line 4, saw 16

Or, as the long output:

/usr/local/lib/python3.7/dist-packages/pandas/io/parsers.py in read(self, nrows)
   2155     def read(self, nrows=None):
   2156         try:
-> 2157             data = self._reader.read(nrows)
   2158         except StopIteration:
   2159             if self._first_chunk:

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.read()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()

pandas/_libs/parsers.pyx in pandas._libs.parsers.raise_parser_error()

ParserError: Error tokenizing data. C error: Expected 10 fields in line 4, saw 16

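Apparently it does not read the one-column CSV as a single column; it looks as if the default separator is splitting the lines into columns. So I set the separator to None, but running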

pd.read_csv('/MYPATH/MYFILE.csv', sep=None)

throws

/usr/local/lib/python3.7/dist-packages/pandas/io/parsers.py in _alert_malformed(self, msg, row_num)
   2996         """
   2997         if self.error_bad_lines:
-> 2998             raise ParserError(msg)
   2999         elif self.warn_bad_lines:
   3000             base = f"Skipping line {row_num}: "

ParserError: Expected 68 fields in line 26, saw 147

Which delimiter (separator) does not split the columns at all?

You need to use a separator that never appears in the data. The separator only splits the input into columns, not rows, so we can do this:

pd.read_csv('/MYPATH/MYFILE.csv', sep="§§§")

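or any other character(s) that definitely do not occur in the CSV. The file is then read as a single column, because the separator never finds anything to split on.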
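As a minimal sketch of this idea (the sample data and the name `raw` are made up for illustration and stand in for the real file): a single ASCII character that is absent from the data, such as the unit separator "\x1f", stays on the default C engine, while a multi-character string such as "§§§" is interpreted by pandas as a regular expression and forces the Python engine, so it is requested explicitly here to avoid a ParserWarning.

```python
import io

import pandas as pd

# Made-up sample standing in for the real file: one value per line,
# with a different number of commas on each line.
raw = "text\nalpha\nbeta, gamma\ndelta, epsilon, zeta, eta\n"

# Option 1: a single ASCII character that never occurs in the data.
df_char = pd.read_csv(io.StringIO(raw), sep="\x1f")

# Option 2: a multi-character separator; pandas treats it as a regular
# expression, so the Python engine is requested explicitly.
df_multi = pd.read_csv(io.StringIO(raw), sep="§§§", engine="python")

print(df_char.shape)  # (3, 1)
print(df_multi["text"].tolist())
```

Both calls keep every line intact in a single column named after the first line of the input. If the file has no header line, passing header=None should keep the first line as data.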

Without this, the separator defaults to sep=",", which will obviously find commas somewhere in the lines of the "one-column" CSV.
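The failure with the default separator can be reproduced on a small made-up sample (the data and the name `raw` are hypothetical, not taken from the question's file). This is only a sketch of the mechanism: once a data line contains more commas than the lines before it, the parser sees more fields than it expects and raises ParserError.

```python
import io

import pandas as pd

# Made-up sample: a single logical column whose lines contain commas.
raw = "text\nalpha\nbeta\ngamma, delta, epsilon\n"

message = None
try:
    # Default sep=",": the last line splits into three fields,
    # while the earlier lines had only one.
    pd.read_csv(io.StringIO(raw))
except pd.errors.ParserError as exc:
    message = str(exc)

print(message)
```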
