dyconnmap.cluster package¶

Submodules¶

dyconnmap.cluster.cluster module¶

Base class for clustring algorithms

class dyconnmap.cluster.cluster.BaseCluster[source]¶

Bases: object

Base class for clustering alorithms.

encode(data, metric='euclidean', sort=True)[source]¶

Employ a nearest-neighbor rule to encode the given data using the codebook.

Parameters

data (real array-like, shape(n_samples, n_features)) – Data matrix, each row represents a sample.
metric (string or None) –
One of the following valid options as defined for function http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances.html.

Valid options include:
- euclidean
- cityblock
- l1
- cosine
If None is passed, the matric used for learning the data will be used.
sort (boolean) – Whether or not to sort the symbols using MDS first. Default True

Returns

encoded_data (real array-like, shape(n_samples, n_features)) – data, as represented by the prototypes in codebook.
ts_symbols (list, shape(n_samples, 1)) – A discrete symbolic time series

dyconnmap.cluster.gng module¶

Growing NeuralGas

Growing Neural Gas (GNG) [Fritzke1995] is a dynamic neural network (as in adaptive) that learns topologies. Compared to Neural Gas, GNG provides the functionality of adding or purging the constructed graph of nodes and edges when certain criterion are met.

To do so, each node on the network stores a number of secondary information and statistics, such as, its learning vector, a local error, etc. Edge edge is assigned with a counter related to its age; so as older edges are pruned.

The convergence of the algorithm depends either on the maximum number of nodes of the graph, or an upper limit of elapsed iterations.

Briefly, the algorithm works as following:

Create two nodes, with weights drawn randomly from the original distibution; connect these two nodes. Set the edge’s age to zero.
Draw randomly a sample (\(\overrightarrow{x}\)) from the distibution.
For each node (\(n\)) in the graph with associated weights \(\overrightarrow{w}\), we compute the euclidean distance from \(\overrightarrow{x}\): \(||\overrightarrow{n}_w - \overrightarrow{x}||^2\). Next, we find the two nodes closest \(\overrightarrow{x}\) with distances \(d_s\) and \(d_t\).
The best matching unit (\(s\)) adjusts:

its weights: \(\overrightarrow{s}_w \leftarrow \overrightarrow{s}_w + [e_w * (\overrightarrow{x} - \overrightarrow{s}_w)]\).

its local error: \(s_{error} \leftarrow s_{error} + d_s\).

Next, the nodes (\(N\)) adjacent to \(s\):

update their weights: \(\overrightarrow{N}_w \leftarrow \overrightarrow{N}_w + [e_n * (\overrightarrow{x} - \overrightarrow{N}_w)]\).

increase the age of the connecting edges by one.

If the best and second mathing units (\(s\) and \(t\)) are connected, we reset the age of the connecting edge. Otherwise, we connect them.
Regarding the pruning of the network. First we remove the edges with older than \(a_{max}\). In the seond pass, we remove any disconnected nodes.
We check the iteration (\(iter\)), whether is a multiple of \(\lambda\) and if the maximum number of iteration has been reached; then we add a new node (\(q\)) in the graph:

Let \(u\) denote the node with the highest error on the graph, and \(v\) its neighbor with the highest error.

we disconnect \(u\) and \(v\)

\(q\) is added between \(u\) and \(v\): \(\overrightarrow{q}_w \leftarrow \frac{ \overrightarrow{u}_w + \overrightarrow{v}_w }{2}\).

connect \(q\) to \(u\), and \(q\) to \(v\)

reduce the local errors of both \(u\) and \(v\): \(u_{error} \leftarrow \alpha * u_{error}\) and \(v_{error} \leftarrow \alpha * v_{error}\)

define the local error \(q\): \(q_{error} \leftarrow u_{error}\)

Adjust the error of each node (\(n\)) on the graph: \(n_{error} \leftarrow n_{error} - \beta * n_{error}\)
Finally, increate the iteration (\(iter\)) and if any of the criterion is not satisfied, repeat from step #2.

Fritzke1995: Fritzke, B. (1995). A growing neural gas network learns topologies. In Advances in neural information processing systems (pp. 625-632).

class dyconnmap.cluster.gng.GrowingNeuralGas(n_max_protos=30, l=300, a_max=88, a=0.5, b=0.0005, iterations=10000, lrate=None, n_jobs=1, rng=None)[source]¶

Bases: dyconnmap.cluster.cluster.BaseCluster

Growing Neural Gas

Parameters

n_max_protos (int) – Maximum number of nodes.
l (int) – Every iteration is checked if it is a multiple of this value.
a_max (int) – Maximum age of edges.
a (float) – Weights the local error of the nodes when adding a new node.
b (float) – Weights the local error of all the nodes on the graph.
iterations (int) – Total number of iterations.
lrate (list of length 2) – The learning rates of the best matching unit and its neighbors.
n_jobs (int) – Number of parallel jobs (will be passed to scikit-learn)).
rng (object or None) – An object of type numpy.random.RandomState.

protos¶

The prototypical vectors

Type: array-like, shape(n_protos, n_features)

Notes

fit(data)[source]¶

Learn data, and construct a vector codebook.

Parameters: data (real array-like, shape(n_samples, n_features)) – Data matrix, each row represents a sample.
Returns: self – The instance itself
Return type: object

dyconnmap.cluster.mng module¶

Merge Neural Gas

Merge Neural Gas (MNG) [Strickert2003] is similar to the original Neural Gas algorithm, but each node, has an additional context vector (\(c\)) associated; and the best matching unit is determined by a linear combination of both the weight and context vector (thus the merge), from the previous iteration.

Similar to Neural Gas, \(N\) nodes have their weights (\(w\)) and context vectors (\(c\)) randomly initialized. When the network is presented with a new input sequence \(v\), the distance and the corresponding rank of each node \(n\) is estimated.

The distance is computed by: \(d_i(n) = (1 - \alpha) * ||v-w_i||^2 + ||c(n)-c_i||^2\) and the context by: \(c(n) = (1 - \beta) * w_{I_{n}-1} + \beta * c_{I_{n}-1}\).

\(\alpha\) is the balancing factor between \(w\) and \(c\)

\(\beta\) is the merging degree between two subsequent iterations

\(I_{n}-1\) denotes the previous iteration

Each node is adapted with:

\({\Delta}w_i = e_w(k) * h_{\lambda(k)} (r(d_i,d)) * (v-w_i)\) and its context by

\({\Delta}c_i = e_w(k) * h_{\lambda(k)} (r(d_i,d)) * (c(n)-c_i)\)

\(r(d_i, d)\) denotes the rank of the \(i\)-th node

As with Neural Gas, \(h_{\lambda(k)}(r(d_i, d)) = exp(\frac{-r(d_i, d)}{\lambda(k)})\) represents the neighborhood ranking function. \(\lambda(k)\) denotes the influence of the neighbood: \({\lambda_0(\frac{\lambda_T}{\lambda_0})}^{(\frac{t}{T_{max}})}\).

Notes

Strickert2003: Strickert, M., & Hammer, B. (2003, September). Neural gas for sequences. In Proceedings of the Workshop on Self-Organizing Maps (WSOM’03) (pp. 53-57).

class dyconnmap.cluster.mng.MergeNeuralGas(n_protos=10, iterations=1024, merge_coeffs=None, epsilon=None, lrate=None, n_jobs=1, rng=None)[source]¶

Bases: dyconnmap.cluster.cluster.BaseCluster

Merge Neural Gas

Parameters

n_protos (int) – The number of prototypes
iterations (int) – The maximum iterations
merge_coeffs (list of length 2) – The merging coefficients
epsilon (list of length 2) – The initial and final training rates.
lrate (list of length 2) – The initial and final rearning rates.
n_jobs (int) – Number of parallel jobs (will be passed to scikit-learn)).
rng (object) – An object of type numpy.random.RandomState.

protos¶

The prototypical vectors

Type: array-like, shape(n_protos, n_features)

distortion¶

The normalized distortion error

Type: float

encode(data, metric='euclidean')[source]¶

Employ a nearest-neighbor rule to encode the given data using the codebook.

Parameters

data (real array-like, shape(n_samples, n_features)) – Data matrix, each row represents a sample.
metric (string or None) –
One of the following valid options as defined for function http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances.html.

Valid options include:
- euclidean
- cityblock
- l1
- cosine
If None is passed, the matric used for learning the data will be used.
sort (boolean) – Whether or not to sort the symbols using MDS first. Default True

Returns

encoded_data (real array-like, shape(n_samples, n_features)) – data, as represented by the prototypes in codebook.
ts_symbols (list, shape(n_samples, 1)) – A discrete symbolic time series

fit(data)[source]¶

Parameters: data –
Returns

dyconnmap.cluster.ng module¶

NeuralGas

Inspired by Self Organizing Maps (SOMs), Neural Gas (NG), an unsupervised adaptive algorithm coined by [Martinetz1991]. Neural Gas does not assume a preconstructed lattice thus the adaptation cannot be based on the distances between the neighbor neurons (like in SOMs) because be definition there are no neighbors.

The adaptation-convergence is driven by a stochastic gradient function with a soft-max adaptation rule that minimizes the average distortion error.

First, we construct a number of neurons ( \(N_\vec{w}\) ) with random weights ( \(\vec{w}\) ). Then we train the model by feeding it feature vectors sequentially drawn from the distribution \(P(t)\). When a new feature vector is presented to the model, we sort all neurons’ weights (\(N_\vec{w})\) based on their Euclidean distance from \(\vec{x}\). Then, the adaptation if done by:

\[\vec{w} \leftarrow \vec{w} + [ N_{\vec{w}} \cdot e(t)) \cdot h(k)) \cdot (\vec{x} - \vec{w}) ], \forall \vec{w} \in N_{\vec{w}}\]

where,

\[ \begin{align}\begin{aligned}h(t)=exp{ \frac{-k}{\sigma^2}(t) }\\\sigma^2 = {\lambda_i(\frac{\lambda_T}{\lambda_0})}^{(\frac{t}{T_{max}})}\\e(t) = {e_i(\frac{e_T}{e_0})}^{(\frac{t}{T_{max}})}\end{aligned}\end{align} \]

The parameter \(\lambda\), governs the initial and final learning rate, while the parameter \(e\) the training respectively.

After the presentation of a feature vector, increase the itaration counter \(t\) and repeat until all desired criteria are met, or \(t = T_{max}\).

With these prototypes, we can represent all the input feature vectors \(\vec{x}\) using a Nearest Neighbor rule. The quality of this encoding can measured by the normalized distortion error:

\[\frac{ \sum_{t=1}^T \left | \left | X(t) - X^\ast(t)) \right | \right |^2 }{ \sum_{t=1}^T \left | \left | X(t) - \overline{X}) \right | \right |^2 }\]

where

\[\overline{X}` = \frac{1}{T} \sum_{t=1}^T{X(t)}\]

\(T\) is the number of reference prototypes; in \(X\) the input patterns are stored; \(X^\ast\) contains the approximated patterns as produced by the Nearest Neighbor rule.

Notes

For faster convergence, we can also draw random weights from the given probability distribution \(P(t)\)

Martinetz1991: Martinetz, T., Schulten, K., et al. A “neural-gas” network learns topologies. University of Illinois at Urbana-Champaign, 1991.
Laskaris2004: Laskaris, N. A., Fotopoulos, S., & Ioannides, A. A. (2004). Mining information from event-related recordings. Signal Processing Magazine, IEEE, 21(3), 66-77.

class dyconnmap.cluster.ng.NeuralGas(n_protos=10, iterations=1024, epsilon=None, lrate=None, n_jobs=1, metric='euclidean', rng=None)[source]¶

Bases: dyconnmap.cluster.cluster.BaseCluster

Neural Gas

Parameters

n_protos (int) – The number of prototypes
iterations (int) – The maximum iterations
epsilon (list of length 2) – The initial and final training rates
lrate (list of length 2) – The initial and final rearning rates
n_jobs (int) – Number of parallel jobs (will be passed to scikit-learn))
metric (string) –
One of the following valid options as defined for function http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances.html.

Valid options include:
- euclidean
- cityblock
- l1
- cosine
rng (object or None) – An object of type numpy.random.RandomState

protos¶

The prototypical vectors

Type: array-like, shape(n_protos, n_features)

distortion¶

The normalized distortion error

Type: float

Notes

Slightly based on http://webloria.loria.fr/~rougier/downloads/ng.py

fit(data)[source]¶

Learn data, and construct a vector codebook.

Parameters: data (real array-like, shape(n_samples, n_features)) – Data matrix, each row represents a sample.
Returns: self – The instance itself
Return type: object

dyconnmap.cluster.rng module¶

Relational Neural Gas

Relational Neural Gas (RNG) [Hammer2007] is a variant of Neural Gas, that allows clustering and mining data from a pairwise similarity or dissimilarity matrix.

Hammer2007: Hammer, B., & Hasenfuss, A. (2007, September). Relational neural gas. In Annual Conference on Artificial Intelligence (pp. 190-204). Springer, Berlin, Heidelberg.
Hasenfuss2008: Hasenfuss, A., Hammer, B., & Rossi, F. (2008, July). Patch Relational Neural Gas–Clustering of Huge Dissimilarity Datasets. In IAPR Workshop on Artificial Neural Networks in Pattern Recognition (pp. 1-12). Springer, Berlin, Heidelberg.

class dyconnmap.cluster.rng.RelationalNeuralGas(n_protos=10, iterations=100, lrate=None, metric='euclidean', rng=None)[source]¶

Bases: dyconnmap.cluster.cluster.BaseCluster

Relational Neural Gas

Parameters

n_protos (int) – The number of prototypes
iterations (int) – The maximum iterations
lrate (list of length 2) – The initial and final rearning rates
n_jobs (int) – Number of parallel jobs (will be passed to scikit-learn))
metric (string) –
One of the following valid options as defined for function http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances.html.

Valid options include:
- euclidean
- cityblock
- l1
- cosine
rng (object or None) – An object of type numpy.random.RandomState

protos¶

The prototypical vectors

Type: array-like, shape(n_protos, n_features)

fit(data)[source]¶

Fit

Parameters

data (real array-like, shape(n_samples, n_features)) – Data matrix, each row represents a sample.
metric (string or None) –
One of the following valid options as defined for function http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances.html.

Valid options include:
- euclidean
- cityblock
- l1
- cosine
If None is passed, the matric used for learning the data will be used

dyconnmap.cluster.som module¶

Self Organizing Map

\(T\) is the number of reference prototypes; in \(X\) the input patterns are stored; \(X^\ast\) contains the approximated patterns as produced by the Nearest Neighbor rule.

Notes

For faster convergence, we can also draw random weights from the given probability distribution \(P(t)\)

Martinetz1991: Martinetz, T., Schulten, K., et al. A “neural-gas” network learns topologies. University of Illinois at Urbana-Champaign, 1991.

class dyconnmap.cluster.som.SOM(grid=(10, 10), iterations=1024, lrate=0.1, n_jobs=1, rng=None)[source]¶

Bases: dyconnmap.cluster.cluster.BaseCluster

Self Organizing Map

Parameters

grid (list of length 2) – The X and Y sizes of the grid
iterations (int) – The maximum iterations
lrate (float) – The initial rearning rate
n_jobs (int) – Number of parallel jobs (will be passed to scikit-learn))
rng (object or None) – An object of type numpy.random.RandomState

protos¶

The prototypical vectors

Type: array-like, shape(n_protos, n_features)

classmethod findBMU(x, y)[source]¶

fit(data)[source]¶

dyconnmap.cluster.umatrix module¶

UMatrix Visualization

dyconnmap.cluster.umatrix.umatrix(M)[source]¶

dyconnmap.cluster.validity module¶

RayTuri1999: Ray, S., & Turi, R. H. (1999, December). Determination of number of clusters in k-means clustering and application in colour image segmentation. In Proceedings of the 4th international conference on advances in pattern recognition and digital techniques (pp. 137-143).
Davies1979: Davies, D. L., & Bouldin, D. W. (1979). A cluster separation measure. IEEE transactions on pattern analysis and machine intelligence, (2), 224-227.

dyconnmap.cluster.validity.davies_bouldin(data, labels)[source]¶

Davies-Bouldin Index

Parameters

data (array-like, shape(n_ts, n_samples)) – Input time series
labels (array-like, shape(n_ts)) – Cluster assignements (labels) per time serie.

Returns

index

Return type

float

dyconnmap.cluster.validity.ray_turi(data, labels)[source]¶

Ray-Turi Index

Parameters

data (array-like, shape(n_ts, n_samples)) – Input time series
labels (array-like, shape(n_ts)) – Cluster assignements (labels) per time serie.

Returns

index

Return type

float

Module contents¶

class dyconnmap.cluster.GrowingNeuralGas(n_max_protos=30, l=300, a_max=88, a=0.5, b=0.0005, iterations=10000, lrate=None, n_jobs=1, rng=None)[source]¶

Bases: dyconnmap.cluster.cluster.BaseCluster

Growing Neural Gas

Parameters

n_max_protos (int) – Maximum number of nodes.
l (int) – Every iteration is checked if it is a multiple of this value.
a_max (int) – Maximum age of edges.
a (float) – Weights the local error of the nodes when adding a new node.
b (float) – Weights the local error of all the nodes on the graph.
iterations (int) – Total number of iterations.
lrate (list of length 2) – The learning rates of the best matching unit and its neighbors.
n_jobs (int) – Number of parallel jobs (will be passed to scikit-learn)).
rng (object or None) – An object of type numpy.random.RandomState.

protos¶

The prototypical vectors

Type: array-like, shape(n_protos, n_features)

Notes

fit(data)[source]¶

Learn data, and construct a vector codebook.

Parameters: data (real array-like, shape(n_samples, n_features)) – Data matrix, each row represents a sample.
Returns: self – The instance itself
Return type: object

class dyconnmap.cluster.MergeNeuralGas(n_protos=10, iterations=1024, merge_coeffs=None, epsilon=None, lrate=None, n_jobs=1, rng=None)[source]¶

Bases: dyconnmap.cluster.cluster.BaseCluster

Merge Neural Gas

Parameters

n_protos (int) – The number of prototypes
iterations (int) – The maximum iterations
merge_coeffs (list of length 2) – The merging coefficients
epsilon (list of length 2) – The initial and final training rates.
lrate (list of length 2) – The initial and final rearning rates.
n_jobs (int) – Number of parallel jobs (will be passed to scikit-learn)).
rng (object) – An object of type numpy.random.RandomState.

protos¶

The prototypical vectors

Type: array-like, shape(n_protos, n_features)

distortion¶

The normalized distortion error

Type: float

encode(data, metric='euclidean')[source]¶

Employ a nearest-neighbor rule to encode the given data using the codebook.

Parameters

data (real array-like, shape(n_samples, n_features)) – Data matrix, each row represents a sample.
metric (string or None) –
One of the following valid options as defined for function http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances.html.

Valid options include:
- euclidean
- cityblock
- l1
- cosine
If None is passed, the matric used for learning the data will be used.
sort (boolean) – Whether or not to sort the symbols using MDS first. Default True

Returns

encoded_data (real array-like, shape(n_samples, n_features)) – data, as represented by the prototypes in codebook.
ts_symbols (list, shape(n_samples, 1)) – A discrete symbolic time series

fit(data)[source]¶

Parameters: data –
Returns

class dyconnmap.cluster.NeuralGas(n_protos=10, iterations=1024, epsilon=None, lrate=None, n_jobs=1, metric='euclidean', rng=None)[source]¶

Bases: dyconnmap.cluster.cluster.BaseCluster

Neural Gas

Parameters

n_protos (int) – The number of prototypes
iterations (int) – The maximum iterations
epsilon (list of length 2) – The initial and final training rates
lrate (list of length 2) – The initial and final rearning rates
n_jobs (int) – Number of parallel jobs (will be passed to scikit-learn))
metric (string) –
One of the following valid options as defined for function http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances.html.

Valid options include:
- euclidean
- cityblock
- l1
- cosine
rng (object or None) – An object of type numpy.random.RandomState

protos¶

The prototypical vectors

Type: array-like, shape(n_protos, n_features)

distortion¶

The normalized distortion error

Type: float

Notes

Slightly based on http://webloria.loria.fr/~rougier/downloads/ng.py

fit(data)[source]¶

Learn data, and construct a vector codebook.

Parameters: data (real array-like, shape(n_samples, n_features)) – Data matrix, each row represents a sample.
Returns: self – The instance itself
Return type: object

class dyconnmap.cluster.RelationalNeuralGas(n_protos=10, iterations=100, lrate=None, metric='euclidean', rng=None)[source]¶

Bases: dyconnmap.cluster.cluster.BaseCluster

Relational Neural Gas

Parameters

n_protos (int) – The number of prototypes
iterations (int) – The maximum iterations
lrate (list of length 2) – The initial and final rearning rates
n_jobs (int) – Number of parallel jobs (will be passed to scikit-learn))
metric (string) –
One of the following valid options as defined for function http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances.html.

Valid options include:
- euclidean
- cityblock
- l1
- cosine
rng (object or None) – An object of type numpy.random.RandomState

protos¶

The prototypical vectors

Type: array-like, shape(n_protos, n_features)

fit(data)[source]¶

Fit

Parameters

data (real array-like, shape(n_samples, n_features)) – Data matrix, each row represents a sample.
metric (string or None) –
One of the following valid options as defined for function http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances.html.

Valid options include:
- euclidean
- cityblock
- l1
- cosine
If None is passed, the matric used for learning the data will be used

class dyconnmap.cluster.SOM(grid=(10, 10), iterations=1024, lrate=0.1, n_jobs=1, rng=None)[source]¶

Bases: dyconnmap.cluster.cluster.BaseCluster

Self Organizing Map

Parameters

grid (list of length 2) – The X and Y sizes of the grid
iterations (int) – The maximum iterations
lrate (float) – The initial rearning rate
n_jobs (int) – Number of parallel jobs (will be passed to scikit-learn))
rng (object or None) – An object of type numpy.random.RandomState

protos¶

The prototypical vectors

Type: array-like, shape(n_protos, n_features)

classmethod findBMU(x, y)[source]¶

fit(data)[source]¶

dyconnmap.cluster.davies_bouldin(data, labels)[source]¶

Davies-Bouldin Index

Parameters

data (array-like, shape(n_ts, n_samples)) – Input time series
labels (array-like, shape(n_ts)) – Cluster assignements (labels) per time serie.

Returns

index

Return type

float

dyconnmap.cluster.ray_turi(data, labels)[source]¶

Ray-Turi Index

Parameters

data (array-like, shape(n_ts, n_samples)) – Input time series
labels (array-like, shape(n_ts)) – Cluster assignements (labels) per time serie.

Returns

index

Return type

float

dyconnmap.cluster.umatrix(M)[source]¶