MATLAB: Can SOM and kmeans be applied to binarize time series data?
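I found a similar question here: Determining cluster membership in SOM (Self Organizing Map) for time series data
I want to learn how to apply a self-organizing map to binarize data, or to assign more than 2 kinds of symbols to it.
For example, let data = rand(100,1)
Usually I would do data_quantized = 2*(data>=0.5)-1
to get a binary-valued transformed series, where the threshold 0.5 is assumed and fixed. It should also be possible to quantize the data using more than 2 symbols. Can kmeans or SOM be applied to this task? If I quantize the data with a SOM, what should the input and output be?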
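X = {x_i(t)}, for i = 1:N and t = 1:T,
is a set of time series, where N is the number of components/variables and T is the number of time points. The quantized value of any vector x_i is the value of the nearest BMU (best matching unit). The quantization error is the Euclidean norm of the difference between the input vector and the best-matching model. A new time series is then compared/matched using the symbolic representation of the time series. Is the BMU a scalar value or a vector of floating-point numbers? It is hard to picture what the SOM is doing.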
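As a rough illustration of the quantization step just described, the sketch below assigns each sample of a scalar series to its nearest prototype and computes the quantization error. The codebook values are made up for the example (an assumption, not learned by any SOM), and the variable names are hypothetical. For N = 1 each prototype is a scalar; for N > 1 it would be a vector with N entries.

```matlab
% Nearest-prototype quantization of a scalar series (base MATLAB only).
rng(0);
T = 1000;
x = rand(T, 1);                    % one scalar sample per row
codebook = [0.2; 0.5; 0.8];        % ASSUMED prototypes, one per symbol

% Distance from every sample to every prototype (T-by-K matrix).
% For scalars the Euclidean norm reduces to the absolute difference.
d = abs(bsxfun(@minus, x, codebook'));

% symbols(t) is the index of the nearest prototype (the BMU),
% qerr(t) is the corresponding quantization error.
[qerr, symbols] = min(d, [], 2);

x_hat = codebook(symbols);         % quantized value = weight of the BMU
meanQuantErr = mean(qerr);         % average quantization error
```

Under these assumptions the symbol is an integer index, while the BMU weight it refers to has the same dimension as one input sample.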
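Matlab implementation: https://www.mathworks.com/matlabcentral/fileexchange/39930-self-organizing-map-simple-demonstration
I don't understand how to handle time series in quantization. Suppose N = 1, i.e. a one-dimensional array/vector of elements drawn from a white noise process. How can I quantize/partition this data using a self-organizing map?
The example at http://www.mathworks.com/help/nnet/ug/cluster-with-self-organizing-map-neural-network.html
is provided by Matlab, but it works on N-dimensional data, whereas I have one-dimensional data with 1000 data points (t = 1,...,1000).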
A toy example explaining how to quantize a time series into multiple levels would be a great help. Let
T = 1000;
N = 1;
x_i = rand(T,N);
trainingData = x_i;
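How can I apply the SOM code below so that the numeric data can be represented by symbols such as 1, 2, 3, i.e. clustered using 3 symbols? Each data point (a scalar value) would be represented by symbol 1, 2, or 3.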
function som = SOMSimple(nfeatures, ndim, nepochs, ntrainingvectors, eta0, etadecay, sgm0, sgmdecay, showMode)
%SOMSimple Simple demonstration of a Self-Organizing Map that was proposed by Kohonen.
% sommap = SOMSimple(nfeatures, ndim, nepochs, ntrainingvectors, eta0, neta, sgm0, nsgm, showMode)
% trains a self-organizing map with the following parameters
% nfeatures - dimension size of the training feature vectors
% ndim - width of a square SOM map
% nepochs - number of epochs used for training
% ntrainingvectors - number of training vectors that are randomly generated
% eta0 - initial learning rate
% etadecay - exponential decay rate of the learning rate
% sgm0 - initial variance of a Gaussian function that
% is used to determine the neighbours of the best
% matching unit (BMU)
% sgmdecay - exponential decay rate of the Gaussian variance
% showMode - 0: do not show output,
% 1: show the initially randomly generated SOM map
% and the trained SOM map,
% 2: show the trained SOM map after each update
%
% For example: A demonstration of an SOM map that is trained by RGB values
%
% som = SOMSimple(1,60,10,100,0.1,0.05,20,0.05,2);
% % It uses:
% % 1 : dimensions for training vectors
% % 60x60: neurons
% % 10 : epochs
% % 100 : training vectors
% % 0.1 : initial learning rate
% % 0.05 : exponential decay rate of the learning rate
% % 20 : initial Gaussian variance
% % 0.05 : exponential decay rate of the Gaussian variance
% % 2 : Display the som map after every update
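nrows = ndim;
ncols = ndim;
nfeatures = 1; % overrides the input argument: the data here are scalar
som = rand(nrows,ncols,nfeatures);
% Use the supplied training data instead of generating random vectors.
% NOTE: trainingData is not an input argument of this function as posted;
% it has to be added to the signature (or defined here) before this runs.
x_i = trainingData;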
% Generate coordinate system
[x, y] = meshgrid(1:ncols,1:nrows);
for t = 1:nepochs
    % Compute the learning rate for the current epoch
    eta = eta0 * exp(-t*etadecay);
    % Compute the variance of the Gaussian (Neighbourhood) function for the current epoch
    sgm = sgm0 * exp(-t*sgmdecay);
    % Consider the width of the Gaussian function as 3 sigma
    width = ceil(sgm*3);
    for ntraining = 1:ntrainingvectors
        % Get current training vector
        trainingVector = trainingData(ntraining,:);
        % Compute the Euclidean distance between the training vector and
        % each neuron in the SOM map
        dist = getEuclideanDistance(trainingVector, som, nrows, ncols, nfeatures);
        % Find the best matching unit (bmu)
        [~, bmuindex] = min(dist);
        % transform the bmu index into 2D
        [bmurow, bmucol] = ind2sub([nrows ncols],bmuindex);
        % Generate a Gaussian function centered on the location of the bmu
        g = exp(-(((x - bmucol).^2) + ((y - bmurow).^2)) / (2*sgm*sgm));
        % Determine the boundary of the local neighbourhood
        fromrow = max(1,bmurow - width);
        torow = min(bmurow + width,nrows);
        fromcol = max(1,bmucol - width);
        tocol = min(bmucol + width,ncols);
        % Get the neighbouring neurons and determine the size of the neighbourhood
        neighbourNeurons = som(fromrow:torow,fromcol:tocol,:);
        sz = size(neighbourNeurons);
        % Transform the training vector and the Gaussian function into
        % multi-dimensional to facilitate the computation of the neuron weights update
        T = reshape(repmat(trainingVector,sz(1)*sz(2),1),sz(1),sz(2),nfeatures);
        G = repmat(g(fromrow:torow,fromcol:tocol),[1 1 nfeatures]);
        % Update the weights of the neurons that are in the neighbourhood of the bmu
        neighbourNeurons = neighbourNeurons + eta .* G .* (T - neighbourNeurons);
        % Put the new weights of the BMU neighbouring neurons back to the
        % entire SOM map
        som(fromrow:torow,fromcol:tocol,:) = neighbourNeurons;
    end
end

function ed = getEuclideanDistance(trainingVector, sommap, nrows, ncols, nfeatures)
% Transform the 3D representation of neurons into 2D
neuronList = reshape(sommap,nrows*ncols,nfeatures);
% Initialize Euclidean Distance
ed = 0;
for n = 1:size(neuronList,2)
    ed = ed + (trainingVector(n)-neuronList(:,n)).^2;
end
ed = sqrt(ed);
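I may have misunderstood your question, but as far as I can tell this is quite simple, both with kmeans and with Matlab's own selforgmap. I can't really comment on the SOMSimple implementation you posted.
Let's take your initial example: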
rng(1337);
T = 1000;
x_i = rand(1,T); %rowvector for convenience
Assuming you want to quantize to three symbols, your manual version might be:
nsyms = 3;
symsthresh = [1:-1/nsyms:1/nsyms];
x_i_q = zeros(size(x_i));
for i=1:nsyms
x_i_q(x_i<=symsthresh(i)) = i;
end
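For comparison, here is a loop-free sketch of the same fixed-threshold idea using discretize. It assumes the data lie in [0, 1] and that equal-width bins are wanted; the names edges, sym_asc and sym_desc are introduced only for this example. Note that the loop above gives symbol 1 to the highest interval, whereas discretize numbers bins from the lowest edge upward, so the labels are flipped to match.

```matlab
% Equal-width quantization of data in [0, 1] without a loop.
rng(1337);
T = 1000;
x_i = rand(1, T);
nsyms = 3;

edges = linspace(0, 1, nsyms + 1);   % [0 1/3 2/3 1]
sym_asc = discretize(x_i, edges);    % 1 = lowest interval, nsyms = highest

% Flip the labels so that 1 = highest interval, as in the loop version.
% The two should agree except possibly for values exactly on a threshold.
sym_desc = nsyms + 1 - sym_asc;
```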
Using Matlab's built-in selforgmap, you can get a similar result:
net = selforgmap(nsyms);
net.trainParam.showWindow = false;
net = train(net,x_i);
y = net(x_i); % one-hot output: one row per neuron, one column per sample
classes = vec2ind(y);
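To see what the trained map has learned, one can read out the neuron weights and map each symbol back to a quantized value. The sketch below repeats the training so that it is self-contained; it assumes the Deep Learning Toolbox is available, and the names w, x_hat and qerr are introduced only for this example. With scalar input each neuron's weight is a scalar; with N-dimensional input it would be a row of N values.

```matlab
% Inspect the SOM prototypes and reconstruct the quantized series.
rng(1337);
T = 1000;
x_i = rand(1, T);                      % one scalar sample per column
nsyms = 3;

net = selforgmap(nsyms);               % 1-D map with nsyms neurons
net.trainParam.showWindow = false;
net = train(net, x_i);

classes = vec2ind(net(x_i));           % index of the winning neuron per sample
w = net.IW{1};                         % neuron weights, nsyms-by-1 here
x_hat = reshape(w(classes), 1, []);    % quantized value for each sample
qerr = abs(x_i - x_hat);               % per-sample quantization error
```

The symbol numbering follows the neuron positions on the map, so the weights may come out in ascending or descending order of value.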
Finally, the same thing can be done directly with kmeans:
clusters = kmeans(x_i',nsyms)';
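One caveat: kmeans numbers its clusters arbitrarily, so symbol 1 need not be the lowest level. A minimal sketch of relabelling the clusters by centroid value follows. It assumes the Statistics and Machine Learning Toolbox, and the names newLabel, symbols and x_hat are introduced only for this example.

```matlab
% kmeans quantization with symbols ordered by centroid value.
rng(1337);
T = 1000;
x_i = rand(1, T);
nsyms = 3;

[idx, C] = kmeans(x_i', nsyms);        % idx: T-by-1 labels, C: nsyms-by-1 centroids

[~, order] = sort(C);                  % order(k) = label of the k-th smallest centroid
newLabel = zeros(nsyms, 1);
newLabel(order) = 1:nsyms;             % newLabel(oldLabel) = rank of its centroid

symbols = reshape(newLabel(idx), 1, []);   % 1 = lowest level, nsyms = highest
x_hat = reshape(C(idx), 1, []);            % quantized value for each sample
```

After relabelling, comparing two symbol sequences is meaningful across separate kmeans runs, since the numbering no longer depends on the random initialization.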