How can I load a CSV file that is too big in IPython?
How can I load a CSV file that is too big in IPython? It seems the whole file can't be loaded into memory at once.
You can use the following code to read the file in chunks; it also distributes the chunks across multiple worker processes.
import pandas as pd
import multiprocessing as mp

LARGE_FILE = "yourfile.csv"
CHUNKSIZE = 100000  # process 100,000 rows at a time

def process_frame(df):
    # process a single chunk; here we just count its rows
    return len(df)

if __name__ == '__main__':
    reader = pd.read_csv(LARGE_FILE, chunksize=CHUNKSIZE)
    pool = mp.Pool(4)  # use 4 worker processes

    funclist = []
    for df in reader:
        # submit each chunk to the pool asynchronously
        f = pool.apply_async(process_frame, [df])
        funclist.append(f)

    result = 0
    for f in funclist:
        result += f.get(timeout=10)  # wait up to 10 seconds per chunk

    pool.close()
    pool.join()
    print(result)
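If you don't actually need multiple processes, pandas alone can iterate over the file chunk by chunk. Here is a minimal single-process sketch under the same assumptions (the file name "yourfile.csv" and the chunk size are placeholders); each chunk is reduced to a small summary so the full file never has to fit in memory at once.

import pandas as pd

CHUNKSIZE = 100000  # rows per chunk; tune to your available memory

total_rows = 0
for chunk in pd.read_csv("yourfile.csv", chunksize=CHUNKSIZE):
    # replace this line with your own per-chunk processing
    total_rows += len(chunk)

print(total_rows)

This variant also avoids the overhead of pickling each DataFrame to send it to a worker process, which can dominate the runtime when the per-chunk work is cheap.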