heat.cluster.spectral
Module for Spectral Clustering, a graph-based machine learning algorithm
Module Contents
- class Spectral(n_clusters: int = None, gamma: float = 1.0, metric: str = 'rbf', laplacian: str = 'fully_connected', threshold: float = 1.0, boundary: str = 'upper', n_lanczos: int = 300, assign_labels: str = 'kmeans', **params)
Bases: heat.ClusteringMixin, heat.BaseEstimator
Spectral clustering
- Variables:
n_clusters (int) – Number of clusters to fit
gamma (float) – Kernel coefficient (sigma) for metric='rbf'; ignored for metric='euclidean'
metric (str) –
How to construct the similarity matrix.
'rbf' : construct the similarity matrix using a radial basis function (RBF) kernel.
'euclidean' : construct the similarity matrix from plain Euclidean distances.
laplacian (str) – How to calculate the graph Laplacian (affinity). Currently supported: 'fully_connected', 'eNeighbour'
threshold (float) – Threshold for the affinity matrix if laplacian='eNeighbour'. Ignored for laplacian='fully_connected'
boundary (str) – How to interpret threshold: 'upper' or 'lower'. Ignored for laplacian='fully_connected'
n_lanczos (int) – Number of Lanczos iterations for the eigenvalue decomposition
assign_labels (str) – The strategy to use to assign labels in the embedding space.
**params (dict) – Parameter dictionary for the assign_labels estimator
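The interplay of gamma, threshold and boundary can be illustrated with a small plain-Python sketch. It is not heat's implementation: the kernel scaling exp(-gamma * d^2), the reading of 'upper' as "keep entries below the threshold", the removal of self-loops, and the helper names rbf_similarity and epsilon_neighbourhood are all assumptions made for illustration.

```python
import math


def rbf_similarity(points, gamma=1.0):
    """Pairwise similarities exp(-gamma * ||a - b||^2); the scaling is an assumption."""
    n = len(points)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            sim[i][j] = math.exp(-gamma * sq_dist)
    return sim


def epsilon_neighbourhood(sim, threshold=1.0, boundary="upper"):
    """Binary adjacency matrix derived from a similarity matrix.

    Assumed reading: 'upper' keeps entries below the threshold,
    'lower' keeps entries above it. Self-loops are always dropped.
    """
    if boundary not in ("upper", "lower"):
        raise ValueError("boundary must be 'upper' or 'lower'")
    n = len(sim)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no edge from a sample to itself
            if boundary == "upper":
                keep = sim[i][j] < threshold
            else:
                keep = sim[i][j] > threshold
            adj[i][j] = 1 if keep else 0
    return adj


# Two nearby points and one far away point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
similarity = rbf_similarity(points, gamma=1.0)
adjacency = epsilon_neighbourhood(similarity, threshold=0.1, boundary="lower")
```

With boundary="lower" only the two nearby points end up connected, because their similarity is the only off-diagonal value above the threshold.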
- _spectral_embedding(x: heat.core.dndarray.DNDarray) → Tuple[heat.core.dndarray.DNDarray, heat.core.dndarray.DNDarray]
Helper function that embeds the dataset x. Returns a tuple (eigenvalues, eigenvectors) of the graph's Laplacian matrix.
- Parameters:
x (DNDarray) – Sample Matrix for which the embedding should be calculated
Notes
The imaginary parts of the eigenvalues found during this computation are discarded.
- fit(x: heat.core.dndarray.DNDarray)
Clusters the dataset x via spectral embedding. Computes a low-dimensional representation from the eigenspectrum (eigenvalues and eigenvectors) of the graph Laplacian of the similarity matrix, then fits the eigenvectors corresponding to the k lowest eigenvalues with a separate clustering algorithm (currently only kmeans is supported). Similarity metrics for the adjacency calculation are supported via spatial.distance. The eigenvalues and eigenvectors are computed by reducing the Laplacian via Lanczos iterations and applying the torch eigenvalue solver to the resulting smaller matrix. If other eigenvalue decomposition methods are supported, this will be expanded.
- Parameters:
x (DNDarray) – Training instances to cluster. Shape = (n_samples, n_features)
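The Lanczos reduction mentioned for fit can be sketched in plain Python. This is a minimal illustration of the general technique, not heat's distributed implementation: it uses full reorthogonalisation, an unnormalised Laplacian of a three-node path graph as input, and the hypothetical helper names matvec, dot and lanczos.

```python
import math


def matvec(A, v):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def lanczos(A, v0, m):
    """Run up to m Lanczos steps on a symmetric matrix A.

    Returns (V, alpha, beta): V holds the orthonormal basis vectors as rows,
    alpha is the diagonal and beta the off-diagonal of the small
    tridiagonal matrix whose eigenvalues approximate those of A.
    """
    norm = math.sqrt(dot(v0, v0))
    V = [[x / norm for x in v0]]
    alpha, beta = [], []
    for j in range(m):
        w = matvec(A, V[j])
        alpha.append(dot(w, V[j]))
        # Full reorthogonalisation against every basis vector found so far.
        for v in V:
            c = dot(w, v)
            w = [wi - c * vi for wi, vi in zip(w, v)]
        if j == m - 1:
            break
        b = math.sqrt(dot(w, w))
        if b < 1e-12:
            break  # invariant subspace reached, no further directions
        beta.append(b)
        V.append([wi / b for wi in w])
    return V, alpha, beta


# Unnormalised Laplacian (degree minus adjacency) of the path graph 0 - 1 - 2.
L = [[1.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 1.0]]
V, alpha, beta = lanczos(L, [1.0, 0.0, 0.0], m=3)
```

In a real run the number of steps (n_lanczos) would typically be much smaller than the number of samples, so that the dense eigenvalue solver only has to handle the small tridiagonal matrix.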
- predict(x: heat.core.dndarray.DNDarray) → heat.core.dndarray.DNDarray
Returns the label each sample in x belongs to. x is transformed to the low-dimensional representation by computing the eigenspectrum (eigenvalues and eigenvectors) of the graph Laplacian of the similarity matrix. Labels are inferred by finding the closest centroid, in the space of the n_clusters eigenvectors, from the previously fitted clustering algorithm (kmeans).
- Parameters:
x (DNDarray) – New data to predict. Shape = (n_samples, n_features)
Warning
Caution: Calculation of the low-dim representation requires some time!
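The closest-centroid step used by predict can be sketched as follows. This is a plain-Python illustration under the assumption that squared Euclidean distance is used in the embedding space; the function name nearest_centroid_labels and the sample values are hypothetical.

```python
def nearest_centroid_labels(embedding, centroids):
    """Assign each embedded sample the index of its closest centroid."""
    labels = []
    for row in embedding:
        # Squared Euclidean distance from this sample to every centroid.
        sq_dists = [sum((a - b) ** 2 for a, b in zip(row, c)) for c in centroids]
        labels.append(sq_dists.index(min(sq_dists)))
    return labels


# Rows are samples in the embedding space, one column per eigenvector.
embedding = [[0.1, 0.0], [0.9, 1.0], [0.0, 0.2]]
centroids = [[0.0, 0.0], [1.0, 1.0]]
labels = nearest_centroid_labels(embedding, centroids)
```

Ties are resolved in favour of the centroid with the lower index, which is a side effect of list.index and not something the source specifies.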