实现for循环的并行处理
Implement parallel processing of for loop
希望使以下代码并行 - 它以一种大型 9gb 专有格式读取数据,并根据 30 列数据生成 30 个单独的 csv 文件。目前,在 30 分钟的数据集上写入每个 csv 需要 9 分钟。 Python中并行库的解决方案space有点铺天盖地。你能告诉我任何好的 tutorials/sample 代码吗?我找不到任何有用的信息。
for i in range(0, NumColumns):
aa = datetime.datetime.now()
allData = [TimeStamp]
ColumnData = allColumns[i].data # Get the data within this one Column
Samples = ColumnData.size # Find the number of elements in Column data
print('Formatting Column {0}'.format(i+1))
truncColumnData = [] # Initialize truncColumnData array each time for loop runs
if ColumnScale[i+1] == 'Scale: '+ tempScaleName: # If it's temperature, format every value to 5 characters
for j in range(Samples):
truncValue = '{:.1f}'.format((ColumnData[j]))
truncColumnData.append(truncValue) # Appends formatted value to truncColumnData array
allData.append(truncColumnData) #append the formatted Column data to the all data array
zipObject = zip(*allData)
zipList = list(zipObject)
csvFileColumn = 'Column_' + str('{0:02d}'.format(i+1)) + '.csv'
# Write the information to .csv file
with open(csvFileColumn, 'wb') as csvFile:
print('Writing to .csv file')
writer = csv.writer(csvFile)
counter = 0
for z in zipList:
counter = counter + 1
timeString = '{:.26},'.format(z[0])
zList = list(z)
columnVals = zList[1:]
columnValStrs = list(map(str, columnVals))
formattedStr = ','.join(columnValStrs)
csvFile.write(timeString + formattedStr + '\n') # Writes the time stamps and channel data by columns
一个可能的解决方案可能是使用 Dask http://dask.pydata.org/en/latest/
最近有同事推荐给我,所以我才想到它。
希望使以下代码并行 - 它以一种大型 9gb 专有格式读取数据,并根据 30 列数据生成 30 个单独的 csv 文件。目前,在 30 分钟的数据集上写入每个 csv 需要 9 分钟。 Python中并行库的解决方案space有点铺天盖地。你能告诉我任何好的 tutorials/sample 代码吗?我找不到任何有用的信息。
for i in range(0, NumColumns):
aa = datetime.datetime.now()
allData = [TimeStamp]
ColumnData = allColumns[i].data # Get the data within this one Column
Samples = ColumnData.size # Find the number of elements in Column data
print('Formatting Column {0}'.format(i+1))
truncColumnData = [] # Initialize truncColumnData array each time for loop runs
if ColumnScale[i+1] == 'Scale: '+ tempScaleName: # If it's temperature, format every value to 5 characters
for j in range(Samples):
truncValue = '{:.1f}'.format((ColumnData[j]))
truncColumnData.append(truncValue) # Appends formatted value to truncColumnData array
allData.append(truncColumnData) #append the formatted Column data to the all data array
zipObject = zip(*allData)
zipList = list(zipObject)
csvFileColumn = 'Column_' + str('{0:02d}'.format(i+1)) + '.csv'
# Write the information to .csv file
with open(csvFileColumn, 'wb') as csvFile:
print('Writing to .csv file')
writer = csv.writer(csvFile)
counter = 0
for z in zipList:
counter = counter + 1
timeString = '{:.26},'.format(z[0])
zList = list(z)
columnVals = zList[1:]
columnValStrs = list(map(str, columnVals))
formattedStr = ','.join(columnValStrs)
csvFile.write(timeString + formattedStr + '\n') # Writes the time stamps and channel data by columns
一个可能的解决方案可能是使用 Dask http://dask.pydata.org/en/latest/ 最近有同事推荐给我,所以我才想到它。