dyconnmap.cluster package¶
Submodules¶
dyconnmap.cluster.cluster module¶
Base class for clustring algorithms
- class dyconnmap.cluster.cluster.BaseCluster[source]¶
Bases:
object
Base class for clustering alorithms.
- encode(data, metric='euclidean', sort=True)[source]¶
Employ a nearest-neighbor rule to encode the given
data
using the codebook.- Parameters
data (real array-like, shape(n_samples, n_features)) – Data matrix, each row represents a sample.
metric (string or None) –
One of the following valid options as defined for function http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances.html.
Valid options include:
euclidean
cityblock
l1
cosine
If None is passed, the matric used for learning the data will be used.
sort (boolean) – Whether or not to sort the symbols using MDS first. Default True
- Returns
encoded_data (real array-like, shape(n_samples, n_features)) –
data
, as represented by the prototypes in codebook.ts_symbols (list, shape(n_samples, 1)) – A discrete symbolic time series
dyconnmap.cluster.gng module¶
Growing NeuralGas
Growing Neural Gas (GNG) [Fritzke1995] is a dynamic neural network (as in adaptive) that learns topologies. Compared to Neural Gas, GNG provides the functionality of adding or purging the constructed graph of nodes and edges when certain criterion are met.
To do so, each node on the network stores a number of secondary information and statistics, such as, its learning vector, a local error, etc. Edge edge is assigned with a counter related to its age; so as older edges are pruned.
The convergence of the algorithm depends either on the maximum number of nodes of the graph, or an upper limit of elapsed iterations.
Briefly, the algorithm works as following:
Create two nodes, with weights drawn randomly from the original distibution; connect these two nodes. Set the edge’s age to zero.
Draw randomly a sample (\(\overrightarrow{x}\)) from the distibution.
For each node (\(n\)) in the graph with associated weights \(\overrightarrow{w}\), we compute the euclidean distance from \(\overrightarrow{x}\): \(||\overrightarrow{n}_w - \overrightarrow{x}||^2\). Next, we find the two nodes closest \(\overrightarrow{x}\) with distances \(d_s\) and \(d_t\).
The best matching unit (\(s\)) adjusts:
its weights: \(\overrightarrow{s}_w \leftarrow \overrightarrow{s}_w + [e_w * (\overrightarrow{x} - \overrightarrow{s}_w)]\).
its local error: \(s_{error} \leftarrow s_{error} + d_s\).
Next, the nodes (\(N\)) adjacent to \(s\):
update their weights: \(\overrightarrow{N}_w \leftarrow \overrightarrow{N}_w + [e_n * (\overrightarrow{x} - \overrightarrow{N}_w)]\).
increase the age of the connecting edges by one.
If the best and second mathing units (\(s\) and \(t\)) are connected, we reset the age of the connecting edge. Otherwise, we connect them.
Regarding the pruning of the network. First we remove the edges with older than \(a_{max}\). In the seond pass, we remove any disconnected nodes.
We check the iteration (\(iter\)), whether is a multiple of \(\lambda\) and if the maximum number of iteration has been reached; then we add a new node (\(q\)) in the graph:
Let \(u\) denote the node with the highest error on the graph, and \(v\) its neighbor with the highest error.
we disconnect \(u\) and \(v\)
\(q\) is added between \(u\) and \(v\): \(\overrightarrow{q}_w \leftarrow \frac{ \overrightarrow{u}_w + \overrightarrow{v}_w }{2}\).
connect \(q\) to \(u\), and \(q\) to \(v\)
reduce the local errors of both \(u\) and \(v\): \(u_{error} \leftarrow \alpha * u_{error}\) and \(v_{error} \leftarrow \alpha * v_{error}\)
define the local error \(q\): \(q_{error} \leftarrow u_{error}\)
Adjust the error of each node (\(n\)) on the graph: \(n_{error} \leftarrow n_{error} - \beta * n_{error}\)
Finally, increate the iteration (\(iter\)) and if any of the criterion is not satisfied, repeat from step #2.
- Fritzke1995
Fritzke, B. (1995). A growing neural gas network learns topologies. In Advances in neural information processing systems (pp. 625-632).
- class dyconnmap.cluster.gng.GrowingNeuralGas(n_max_protos=30, l=300, a_max=88, a=0.5, b=0.0005, iterations=10000, lrate=None, n_jobs=1, rng=None)[source]¶
Bases:
dyconnmap.cluster.cluster.BaseCluster
Growing Neural Gas
- Parameters
n_max_protos (int) – Maximum number of nodes.
l (int) – Every iteration is checked if it is a multiple of this value.
a_max (int) – Maximum age of edges.
a (float) – Weights the local error of the nodes when adding a new node.
b (float) – Weights the local error of all the nodes on the graph.
iterations (int) – Total number of iterations.
lrate (list of length 2) – The learning rates of the best matching unit and its neighbors.
n_jobs (int) – Number of parallel jobs (will be passed to scikit-learn)).
rng (object or None) – An object of type numpy.random.RandomState.
- protos¶
The prototypical vectors
- Type
array-like, shape(n_protos, n_features)
Notes
dyconnmap.cluster.mng module¶
Merge Neural Gas
Merge Neural Gas (MNG) [Strickert2003] is similar to the original Neural Gas algorithm, but each node, has an additional context vector (\(c\)) associated; and the best matching unit is determined by a linear combination of both the weight and context vector (thus the merge), from the previous iteration.
Similar to Neural Gas, \(N\) nodes have their weights (\(w\)) and context vectors (\(c\)) randomly initialized. When the network is presented with a new input sequence \(v\), the distance and the corresponding rank of each node \(n\) is estimated.
The distance is computed by: \(d_i(n) = (1 - \alpha) * ||v-w_i||^2 + ||c(n)-c_i||^2\) and the context by: \(c(n) = (1 - \beta) * w_{I_{n}-1} + \beta * c_{I_{n}-1}\).
\(\alpha\) is the balancing factor between \(w\) and \(c\)
\(\beta\) is the merging degree between two subsequent iterations
\(I_{n}-1\) denotes the previous iteration
Each node is adapted with:
\({\Delta}w_i = e_w(k) * h_{\lambda(k)} (r(d_i,d)) * (v-w_i)\) and its context by
\({\Delta}c_i = e_w(k) * h_{\lambda(k)} (r(d_i,d)) * (c(n)-c_i)\)
\(r(d_i, d)\) denotes the rank of the \(i\)-th node
As with Neural Gas, \(h_{\lambda(k)}(r(d_i, d)) = exp(\frac{-r(d_i, d)}{\lambda(k)})\) represents the neighborhood ranking function. \(\lambda(k)\) denotes the influence of the neighbood: \({\lambda_0(\frac{\lambda_T}{\lambda_0})}^{(\frac{t}{T_{max}})}\).
Notes
- Strickert2003
Strickert, M., & Hammer, B. (2003, September). Neural gas for sequences. In Proceedings of the Workshop on Self-Organizing Maps (WSOM’03) (pp. 53-57).
- class dyconnmap.cluster.mng.MergeNeuralGas(n_protos=10, iterations=1024, merge_coeffs=None, epsilon=None, lrate=None, n_jobs=1, rng=None)[source]¶
Bases:
dyconnmap.cluster.cluster.BaseCluster
Merge Neural Gas
- Parameters
n_protos (int) – The number of prototypes
iterations (int) – The maximum iterations
merge_coeffs (list of length 2) – The merging coefficients
epsilon (list of length 2) – The initial and final training rates.
lrate (list of length 2) – The initial and final rearning rates.
n_jobs (int) – Number of parallel jobs (will be passed to scikit-learn)).
rng (object) – An object of type numpy.random.RandomState.
- protos¶
The prototypical vectors
- Type
array-like, shape(n_protos, n_features)
- distortion¶
The normalized distortion error
- Type
float
- encode(data, metric='euclidean')[source]¶
Employ a nearest-neighbor rule to encode the given
data
using the codebook.- Parameters
data (real array-like, shape(n_samples, n_features)) – Data matrix, each row represents a sample.
metric (string or None) –
One of the following valid options as defined for function http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances.html.
Valid options include:
euclidean
cityblock
l1
cosine
If None is passed, the matric used for learning the data will be used.
sort (boolean) – Whether or not to sort the symbols using MDS first. Default True
- Returns
encoded_data (real array-like, shape(n_samples, n_features)) –
data
, as represented by the prototypes in codebook.ts_symbols (list, shape(n_samples, 1)) – A discrete symbolic time series
dyconnmap.cluster.ng module¶
NeuralGas
Inspired by Self Organizing Maps (SOMs), Neural Gas (NG), an unsupervised adaptive algorithm coined by [Martinetz1991]. Neural Gas does not assume a preconstructed lattice thus the adaptation cannot be based on the distances between the neighbor neurons (like in SOMs) because be definition there are no neighbors.
The adaptation-convergence is driven by a stochastic gradient function with a soft-max adaptation rule that minimizes the average distortion error.
First, we construct a number of neurons ( \(N_\vec{w}\) ) with random weights ( \(\vec{w}\) ). Then we train the model by feeding it feature vectors sequentially drawn from the distribution \(P(t)\). When a new feature vector is presented to the model, we sort all neurons’ weights (\(N_\vec{w})\) based on their Euclidean distance from \(\vec{x}\). Then, the adaptation if done by:
where,
The parameter \(\lambda\), governs the initial and final learning rate, while the parameter \(e\) the training respectively.
After the presentation of a feature vector, increase the itaration counter \(t\) and repeat until all desired criteria are met, or \(t = T_{max}\).
With these prototypes, we can represent all the input feature vectors \(\vec{x}\) using a Nearest Neighbor rule. The quality of this encoding can measured by the normalized distortion error:
where
\(T\) is the number of reference prototypes; in \(X\) the input patterns are stored; \(X^\ast\) contains the approximated patterns as produced by the Nearest Neighbor rule.
Notes
For faster convergence, we can also draw random weights from the given probability distribution \(P(t)\)
- Martinetz1991
Martinetz, T., Schulten, K., et al. A “neural-gas” network learns topologies. University of Illinois at Urbana-Champaign, 1991.
- Laskaris2004
Laskaris, N. A., Fotopoulos, S., & Ioannides, A. A. (2004). Mining information from event-related recordings. Signal Processing Magazine, IEEE, 21(3), 66-77.
- class dyconnmap.cluster.ng.NeuralGas(n_protos=10, iterations=1024, epsilon=None, lrate=None, n_jobs=1, metric='euclidean', rng=None)[source]¶
Bases:
dyconnmap.cluster.cluster.BaseCluster
Neural Gas
- Parameters
n_protos (int) – The number of prototypes
iterations (int) – The maximum iterations
epsilon (list of length 2) – The initial and final training rates
lrate (list of length 2) – The initial and final rearning rates
n_jobs (int) – Number of parallel jobs (will be passed to scikit-learn))
metric (string) –
One of the following valid options as defined for function http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances.html.
Valid options include:
euclidean
cityblock
l1
cosine
rng (object or None) – An object of type numpy.random.RandomState
- protos¶
The prototypical vectors
- Type
array-like, shape(n_protos, n_features)
- distortion¶
The normalized distortion error
- Type
float
Notes
Slightly based on http://webloria.loria.fr/~rougier/downloads/ng.py
dyconnmap.cluster.rng module¶
Relational Neural Gas
Relational Neural Gas (RNG) [Hammer2007] is a variant of Neural Gas, that allows clustering and mining data from a pairwise similarity or dissimilarity matrix.
- Hammer2007
Hammer, B., & Hasenfuss, A. (2007, September). Relational neural gas. In Annual Conference on Artificial Intelligence (pp. 190-204). Springer, Berlin, Heidelberg.
- Hasenfuss2008
Hasenfuss, A., Hammer, B., & Rossi, F. (2008, July). Patch Relational Neural Gas–Clustering of Huge Dissimilarity Datasets. In IAPR Workshop on Artificial Neural Networks in Pattern Recognition (pp. 1-12). Springer, Berlin, Heidelberg.
- class dyconnmap.cluster.rng.RelationalNeuralGas(n_protos=10, iterations=100, lrate=None, metric='euclidean', rng=None)[source]¶
Bases:
dyconnmap.cluster.cluster.BaseCluster
Relational Neural Gas
- Parameters
n_protos (int) – The number of prototypes
iterations (int) – The maximum iterations
lrate (list of length 2) – The initial and final rearning rates
n_jobs (int) – Number of parallel jobs (will be passed to scikit-learn))
metric (string) –
One of the following valid options as defined for function http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances.html.
Valid options include:
euclidean
cityblock
l1
cosine
rng (object or None) – An object of type numpy.random.RandomState
- protos¶
The prototypical vectors
- Type
array-like, shape(n_protos, n_features)
- fit(data)[source]¶
Fit
- Parameters
data (real array-like, shape(n_samples, n_features)) – Data matrix, each row represents a sample.
metric (string or None) –
One of the following valid options as defined for function http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances.html.
Valid options include:
euclidean
cityblock
l1
cosine
If None is passed, the matric used for learning the data will be used
dyconnmap.cluster.som module¶
Self Organizing Map
\(T\) is the number of reference prototypes; in \(X\) the input patterns are stored; \(X^\ast\) contains the approximated patterns as produced by the Nearest Neighbor rule.
Notes
For faster convergence, we can also draw random weights from the given probability distribution \(P(t)\)
- Martinetz1991
Martinetz, T., Schulten, K., et al. A “neural-gas” network learns topologies. University of Illinois at Urbana-Champaign, 1991.
- class dyconnmap.cluster.som.SOM(grid=(10, 10), iterations=1024, lrate=0.1, n_jobs=1, rng=None)[source]¶
Bases:
dyconnmap.cluster.cluster.BaseCluster
Self Organizing Map
- Parameters
grid (list of length 2) – The X and Y sizes of the grid
iterations (int) – The maximum iterations
lrate (float) – The initial rearning rate
n_jobs (int) – Number of parallel jobs (will be passed to scikit-learn))
rng (object or None) – An object of type numpy.random.RandomState
- protos¶
The prototypical vectors
- Type
array-like, shape(n_protos, n_features)
dyconnmap.cluster.umatrix module¶
UMatrix Visualization
dyconnmap.cluster.validity module¶
- RayTuri1999
Ray, S., & Turi, R. H. (1999, December). Determination of number of clusters in k-means clustering and application in colour image segmentation. In Proceedings of the 4th international conference on advances in pattern recognition and digital techniques (pp. 137-143).
- Davies1979
Davies, D. L., & Bouldin, D. W. (1979). A cluster separation measure. IEEE transactions on pattern analysis and machine intelligence, (2), 224-227.
Module contents¶
- class dyconnmap.cluster.GrowingNeuralGas(n_max_protos=30, l=300, a_max=88, a=0.5, b=0.0005, iterations=10000, lrate=None, n_jobs=1, rng=None)[source]¶
Bases:
dyconnmap.cluster.cluster.BaseCluster
Growing Neural Gas
- Parameters
n_max_protos (int) – Maximum number of nodes.
l (int) – Every iteration is checked if it is a multiple of this value.
a_max (int) – Maximum age of edges.
a (float) – Weights the local error of the nodes when adding a new node.
b (float) – Weights the local error of all the nodes on the graph.
iterations (int) – Total number of iterations.
lrate (list of length 2) – The learning rates of the best matching unit and its neighbors.
n_jobs (int) – Number of parallel jobs (will be passed to scikit-learn)).
rng (object or None) – An object of type numpy.random.RandomState.
- protos¶
The prototypical vectors
- Type
array-like, shape(n_protos, n_features)
Notes
- class dyconnmap.cluster.MergeNeuralGas(n_protos=10, iterations=1024, merge_coeffs=None, epsilon=None, lrate=None, n_jobs=1, rng=None)[source]¶
Bases:
dyconnmap.cluster.cluster.BaseCluster
Merge Neural Gas
- Parameters
n_protos (int) – The number of prototypes
iterations (int) – The maximum iterations
merge_coeffs (list of length 2) – The merging coefficients
epsilon (list of length 2) – The initial and final training rates.
lrate (list of length 2) – The initial and final rearning rates.
n_jobs (int) – Number of parallel jobs (will be passed to scikit-learn)).
rng (object) – An object of type numpy.random.RandomState.
- protos¶
The prototypical vectors
- Type
array-like, shape(n_protos, n_features)
- distortion¶
The normalized distortion error
- Type
float
- encode(data, metric='euclidean')[source]¶
Employ a nearest-neighbor rule to encode the given
data
using the codebook.- Parameters
data (real array-like, shape(n_samples, n_features)) – Data matrix, each row represents a sample.
metric (string or None) –
One of the following valid options as defined for function http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances.html.
Valid options include:
euclidean
cityblock
l1
cosine
If None is passed, the matric used for learning the data will be used.
sort (boolean) – Whether or not to sort the symbols using MDS first. Default True
- Returns
encoded_data (real array-like, shape(n_samples, n_features)) –
data
, as represented by the prototypes in codebook.ts_symbols (list, shape(n_samples, 1)) – A discrete symbolic time series
- class dyconnmap.cluster.NeuralGas(n_protos=10, iterations=1024, epsilon=None, lrate=None, n_jobs=1, metric='euclidean', rng=None)[source]¶
Bases:
dyconnmap.cluster.cluster.BaseCluster
Neural Gas
- Parameters
n_protos (int) – The number of prototypes
iterations (int) – The maximum iterations
epsilon (list of length 2) – The initial and final training rates
lrate (list of length 2) – The initial and final rearning rates
n_jobs (int) – Number of parallel jobs (will be passed to scikit-learn))
metric (string) –
One of the following valid options as defined for function http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances.html.
Valid options include:
euclidean
cityblock
l1
cosine
rng (object or None) – An object of type numpy.random.RandomState
- protos¶
The prototypical vectors
- Type
array-like, shape(n_protos, n_features)
- distortion¶
The normalized distortion error
- Type
float
Notes
Slightly based on http://webloria.loria.fr/~rougier/downloads/ng.py
- class dyconnmap.cluster.RelationalNeuralGas(n_protos=10, iterations=100, lrate=None, metric='euclidean', rng=None)[source]¶
Bases:
dyconnmap.cluster.cluster.BaseCluster
Relational Neural Gas
- Parameters
n_protos (int) – The number of prototypes
iterations (int) – The maximum iterations
lrate (list of length 2) – The initial and final rearning rates
n_jobs (int) – Number of parallel jobs (will be passed to scikit-learn))
metric (string) –
One of the following valid options as defined for function http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances.html.
Valid options include:
euclidean
cityblock
l1
cosine
rng (object or None) – An object of type numpy.random.RandomState
- protos¶
The prototypical vectors
- Type
array-like, shape(n_protos, n_features)
- fit(data)[source]¶
Fit
- Parameters
data (real array-like, shape(n_samples, n_features)) – Data matrix, each row represents a sample.
metric (string or None) –
One of the following valid options as defined for function http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances.html.
Valid options include:
euclidean
cityblock
l1
cosine
If None is passed, the matric used for learning the data will be used
- class dyconnmap.cluster.SOM(grid=(10, 10), iterations=1024, lrate=0.1, n_jobs=1, rng=None)[source]¶
Bases:
dyconnmap.cluster.cluster.BaseCluster
Self Organizing Map
- Parameters
grid (list of length 2) – The X and Y sizes of the grid
iterations (int) – The maximum iterations
lrate (float) – The initial rearning rate
n_jobs (int) – Number of parallel jobs (will be passed to scikit-learn))
rng (object or None) – An object of type numpy.random.RandomState
- protos¶
The prototypical vectors
- Type
array-like, shape(n_protos, n_features)
- dyconnmap.cluster.davies_bouldin(data, labels)[source]¶
Davies-Bouldin Index
- Parameters
data (array-like, shape(n_ts, n_samples)) – Input time series
labels (array-like, shape(n_ts)) – Cluster assignements (labels) per time serie.
- Returns
index
- Return type
float