Speeding up constrained shuffling: GPU (Tesla K40m) and CPU parallel computation in MATLAB
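I have 100 lamps, and they blink. I observed them for some time, and for each lamp I calculated the mean, standard deviation, and autocorrelation of the intervals between blinks.
Now I need to resample the observed data and keep only those permutations in which all parameters (mean, std, autocorrelation) fall within a given range. My code works, but each round of the experiment takes a very long time (about a week). I run it on a compute server with 12 cores and 2 Tesla K40m GPUs (details at the end).
My code: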
close all
clear all
clc
% open a parpool; skip the error if one is already open
try parpool(24); end
% Sample input. It is faked, just for demo.
% Number of "lamps" and number of "blinks" are similar to real.
NLamps = 10^2;
NBlinks = 2*10^2;
Events = cumsum([randg(9,NLamps,NBlinks)],2); % each row - different "lamp"
DurationOfExperiment=Events(:,end).*1.01;
%% MAIN
% Define parameters
nLags=2; % I need to keep autocorrelation with lags 1-2
alpha=[0.01,0.1]; % range of allowed relative deviation from observed
% parameters should be > 0 to avoid generating original
% sequence
nPermutations=10^2; % In original code 10^5
% Processing of experimental data
DurationOfExperiment=num2cell(DurationOfExperiment);
Events=num2cell(Events,2);
Intervals=cellfun(@(x) diff(x),Events,'UniformOutput',false);
observedParams=cellfun(@(x) fGetParameters(x,nLags),Intervals,'UniformOutput',false);
observedParams=cell2mat(observedParams);
% Constrained shuffling. EXPENSIVE PART!!!
while true
    parfor iPermutation=1:nPermutations
        % Shuffle intervals
        shuffledIntervals=cellfun(@(x,y) fPermute(x,y),Intervals,DurationOfExperiment,'UniformOutput',false);
        % get parameters of shuffled intervals
        shuffledParameters=cellfun(@(x) fGetParameters(x,nLags),shuffledIntervals,'UniformOutput',false);
        shuffledParameters=cell2mat(shuffledParameters);
        % get relative deviation
        delta=abs((shuffledParameters-observedParams)./observedParams);
        % find shuffled lamps that are inside the alpha range
        MaximumDeviation=max(delta,[] ,2);
        MinimumDeviation=min(delta,[] ,2);
        LampID=find(and(MaximumDeviation<alpha(2),MinimumDeviation>alpha(1)));
        % if shuffling of ANY lamp was successful, save these Intervals
        if ~isempty(LampID)
            shuffledIntervals=shuffledIntervals(LampID);
            shuffledParameters=shuffledParameters(LampID,:);
            parsave( LampID,shuffledIntervals,shuffledParameters);
            'DONE'
        end
    end
end
%% FUNCTIONS
function [ params ] = fGetParameters( intervals,nLags )
    % Calculate [mean, std, autocorrelations with lags from 1 to nLags]
    R=nan(1,nLags);
    for lag=1:nLags
        R(lag) = corr(intervals(1:end-lag)',intervals((1+lag):end)','type','Spearman');
    end
    params = [mean(intervals),std(intervals),R];
end
%--------------------------------------------------------------------------
function [ Intervals ] = fPermute( Intervals,Duration )
    % Create long shuffled time-series
    Time=cumsum([0,datasample(Intervals,numel(Intervals)*3)]);
    % Keep the same duration
    Time(Time>Duration)=[];
    % Calculate Intervals
    Intervals=diff(Time);
end
%--------------------------------------------------------------------------
function parsave( LampID,Intervals,params)
    % Save accepted shuffles under a random file name
    save([num2str(randi(10^9)),'.mat'],'LampID','Intervals','params')
end
Server specs:
>>gpuDevice()
CUDADevice with properties:
Name: 'Tesla K40m'
Index: 1
ComputeCapability: '3.5'
SupportsDouble: 1
DriverVersion: 8
ToolkitVersion: 8
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 1.1979e+10
AvailableMemory: 1.1846e+10
MultiprocessorCount: 15
ClockRateKHz: 745000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 0
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
>> feature('numcores')
MATLAB detected: 12 physical cores.
MATLAB detected: 24 logical cores.
MATLAB was assigned: 24 logical cores by the OS.
MATLAB is using: 12 logical cores.
MATLAB is not using all logical cores because hyper-threading is enabled.
>> system('for /f "tokens=2 delims==" %A in (''wmic cpu get name /value'') do @(echo %A)')
Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz
Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz
>> memory
Maximum possible array: 496890 MB (5.210e+11 bytes) *
Memory available for all arrays: 496890 MB (5.210e+11 bytes) *
Memory used by MATLAB: 18534 MB (1.943e+10 bytes)
Physical Memory (RAM): 262109 MB (2.748e+11 bytes)
* Limited by System Memory (physical + swap file) available.
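Question:
Is it possible to speed up my calculation? I am considering CPU+GPU computation, but I do not understand how to do it (I have no experience with gpuArrays). Moreover, I am not sure it is a good idea. Sometimes algorithmic optimization gives a bigger gain than parallel computing.
P.S. The saving step is not the bottleneck: at best it happens once every 10-30 minutes.
GPU-based processing is only available for certain functions and with the right graphics card (if I remember correctly).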
As for the GPU part of your question, MATLAB has a list of available functions that you can run on the GPU. Unfortunately, the most expensive part of your code, the function corr, is not on that list.
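Since corr is the expensive call, one option worth profiling is to compute the Spearman coefficient directly: it is the Pearson correlation of the tied ranks. The script below is a minimal sketch of that idea, not a drop-in from the original code. The sample vector and the helper name `pearson` are made up for illustration, and `tiedrank` is assumed to be available (it ships with the same Statistics toolbox that provides corr). Whether this is actually faster than corr on your data should be checked with the profiler.

```matlab
% Sketch: lagged Spearman autocorrelation without calling corr.
% Assumption: tiedrank (Statistics and Machine Learning Toolbox) is available.
intervals = [3 1 4 1 5 9 2 6 5 3 5];   % hypothetical sample, with ties
nLags = 2;

% Pearson correlation of two column vectors (hypothetical helper)
pearson = @(x, y) sum((x - mean(x)) .* (y - mean(y))) / ...
    sqrt(sum((x - mean(x)).^2) * sum((y - mean(y)).^2));

R = nan(1, nLags);
for lag = 1:nLags
    % Rank each lagged segment separately; ties get averaged ranks
    x = tiedrank(intervals(1:end-lag));
    y = tiedrank(intervals((1+lag):end));
    % Spearman = Pearson correlation of the ranks
    R(lag) = pearson(x(:), y(:));
end
disp(R)
```

The loop body could replace the corr call inside fGetParameters if the timings justify it.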
If the profiler does not highlight the bottleneck, something odd is going on... so I ran some tests on your code above:
nPermutations = 10^0 iteration takes ~0.13 seconds
nPermutations = 10^1 iteration takes ~1.3 seconds
nPermutations = 10^3 iteration takes ~130 seconds
nPermutations = 10^4 probably takes ~1300 seconds
nPermutations = 10^5 probably takes ~13000 seconds
That is nowhere near a week...
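The timings above scale roughly linearly with nPermutations, so the larger values are extrapolations rather than measurements. A quick back-of-the-envelope check, assuming the per-permutation cost of about 0.13 seconds holds at larger sizes (the variable names here are only for illustration):

```matlab
% Extrapolate run time from the measured per-permutation cost.
% Assumption: cost stays linear in nPermutations.
tPerPermutation = 0.13;    % seconds, from the 10^0 measurement above
nPermutations   = 10^5;    % value used in the original code

tSeconds = tPerPermutation * nPermutations;
tHours   = tSeconds / 3600;
fprintf('Estimated time for one pass: %.0f s (about %.1f h)\n', tSeconds, tHours);
```

That works out to about 3.6 hours for one full pass of the parfor loop.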
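Did I mention that I added a break to your while statement? I cannot see anywhere in your code where a "break" exits the while loop. I hope, for your sake, that this is not the reason your function runs forever...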
while true
    parfor iPermutation=1:nPermutations
        % Shuffle intervals
        shuffledIntervals=cellfun(@(x,y) fPermute(x,y),Intervals,DurationOfExperiment,'UniformOutput',false);
        % get parameters of shuffled intervals
        shuffledParameters=cellfun(@(x) fGetParameters(x,nLags),shuffledIntervals,'UniformOutput',false);
        shuffledParameters=cell2mat(shuffledParameters);
        % get relative deviation
        delta=abs((shuffledParameters-observedParams)./observedParams);
        % find shuffled lamps that are inside the alpha range
        MaximumDeviation=max(delta,[] ,2);
        MinimumDeviation=min(delta,[] ,2);
        LampID=find(and(MaximumDeviation<alpha(2),MinimumDeviation>alpha(1)));
        % if shuffling of ANY lamp was successful, save these Intervals
        if ~isempty(LampID)
            shuffledIntervals=shuffledIntervals(LampID);
            shuffledParameters=shuffledParameters(LampID,:);
            parsave( LampID,shuffledIntervals,shuffledParameters);
            'DONE'
        end
    end
    break % You need to break out of the loop at some point
          % otherwise it would run forever....
end
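An unconditional break runs the parfor block exactly once. If the intent is to keep shuffling until enough accepted shuffles exist, an explicit stopping condition may be clearer. The script below is only a sketch of that pattern: `targetPerLamp`, `maxRounds`, and `pAccept` are hypothetical, and the random draw stands in for the real constraint check so that the example is self-contained.

```matlab
% Sketch: stop when every lamp has enough accepted shuffles, or after a cap.
nLamps        = 5;      % hypothetical small example
nPermutations = 20;
targetPerLamp = 3;      % hypothetical: accepted shuffles wanted per lamp
maxRounds     = 1000;   % hypothetical safety cap on outer iterations
pAccept       = 0.2;    % placeholder acceptance rate, not a measured value

accepted = zeros(nLamps, 1);
iRound   = 0;
while any(accepted < targetPerLamp) && iRound < maxRounds
    iRound = iRound + 1;
    roundCount = zeros(nLamps, 1);
    parfor iPermutation = 1:nPermutations
        % Placeholder for the shuffle + deviation test of the real code
        hit = rand(nLamps, 1) < pAccept;
        roundCount = roundCount + hit;   % parfor reduction variable
    end
    accepted = accepted + roundCount;
end
fprintf('Stopped after %d round(s)\n', iRound);
```

Counting through a reduction variable keeps the stopping test outside the parfor body, where break is not allowed.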