Memory error reading a big CSV in pandas
My laptop has 8 GB of RAM. I am trying to read and process a large CSV file and ran into memory problems. I found a solution that uses chunksize to process the file chunk by chunk, but apparently when using chunksize the result becomes a TextFileReader object instead of a DataFrame, and the code I use for ordinary CSVs no longer works. This is the code I am trying, to count the number of sentences in the CSV file:
wdata = pd.read_csv(fileinput, nrows=0).columns[0]
skip = int(wdata.count(' ') == 0)
wdata = pd.read_csv(fileinput, names=['sentences'], skiprows=skip, chunksize=1000)
data = wdata.count()
print(data)
The error I get is:
Traceback (most recent call last):
File "table.py", line 24, in <module>
data = wdata.count()
AttributeError: 'TextFileReader' object has no attribute 'count'
I also tried another approach, running this code:
TextFileReader = pd.read_csv(fileinput, chunksize=1000) # the number of rows per chunk
dfList = []
for df in TextFileReader:
dfList.append(df)
df = pd.concat(dfList, sort=False)
print(df)
It gives this error:
data = self._reader.read(nrows)
File "pandas/_libs/parsers.pyx", line 881, in pandas._libs.parsers.TextReader.read
File "pandas/_libs/parsers.pyx", line 908, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas/_libs/parsers.pyx", line 950, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 937, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 2132, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 2 fields in line 3, saw 4
You have to iterate over the chunks:
csv_length = 0
for chunk in pd.read_csv(fileinput, names=['sentences'], skiprows=skip, chunksize=10000):
csv_length += chunk.count()
print(csv_length)
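To illustrate, here is a minimal self-contained sketch of the same chunked counting, using a small in-memory CSV (a hypothetical stand-in for your large file, with one sentence per line):

```python
import io

import pandas as pd

# Hypothetical small CSV standing in for the large file.
csv_text = "sentences\nfirst sentence\nsecond sentence\nthird sentence\n"

total = 0
# With chunksize set, read_csv returns an iterator of DataFrames
# instead of a single DataFrame, so the file is never fully in memory.
for chunk in pd.read_csv(io.StringIO(csv_text), names=["sentences"],
                         skiprows=1, chunksize=2):
    total += len(chunk)  # len() counts the rows in this chunk

print(total)  # 3
```

Note that `len(chunk)` gives the row count directly as an integer, whereas `chunk.count()` returns a per-column Series of non-NA counts, which is why the running total above stays a plain number.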