heat.cluster._kcluster
Base module for k-clustering algorithms
Module Contents
- class _KCluster(metric: Callable, n_clusters: int, init: str | heat.core.dndarray.DNDarray, max_iter: int, tol: float, random_state: int)
Bases: heat.ClusteringMixin, heat.BaseEstimator
Base class for k-statistics clustering algorithms (kmeans, kmedians, kmedoids). The clusters are represented by centroids c_i (the kmeans term is used for all algorithms for simplicity).
- Parameters:
metric (function) – One of the distance metrics in ht.spatial.distance. It must be passed as a lambda function that takes only two arrays as input.
n_clusters (int) – The number of clusters to form as well as the number of centroids to generate.
init (str or DNDarray, default: ‘random’) –
Method for initialization:
‘probability_based’: selects the initial cluster centers in a smart way to speed up convergence (k-means++).
‘random’: chooses k observations (rows) at random from the data as the initial centroids.
‘batchparallel’: uses the batch parallel algorithm to initialize the centroids; only available for split=0 and KMeans or KMedians.
DNDarray: gives the initial centers directly; should have Shape = (n_clusters, n_features).
max_iter (int) – Maximum number of iterations for a single run.
tol (float, default: 1e-4) – Relative tolerance with regard to inertia to declare convergence.
random_state (int) – Determines random number generation for centroid initialization.
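The requirement that metric accept exactly two arrays can be met by binding any extra arguments in a lambda. The following is a minimal sketch using plain Python lists instead of DNDarrays; minkowski is a hypothetical stand-in for a distance function with additional options, not a function from ht.spatial.distance.

```python
def minkowski(a, b, p=2):
    """Hypothetical helper: pairwise Minkowski distances between rows of a and rows of b."""
    return [
        [sum(abs(u - v) ** p for u, v in zip(row_a, row_b)) ** (1.0 / p) for row_b in b]
        for row_a in a
    ]

# Bind the extra argument p so that the resulting callable takes only two arrays,
# which is the calling convention the metric parameter asks for.
manhattan = lambda x, y: minkowski(x, y, p=1)

points = [[0.0, 0.0], [3.0, 4.0]]
centers = [[0.0, 0.0]]
dist = manhattan(points, centers)  # one row per point, one column per center
```

The same pattern should apply when wrapping a real distance function that takes keyword options: the lambda fixes the options once, and the estimator only ever calls it with two arrays.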
- _initialize_cluster_centers(x: heat.core.dndarray.DNDarray)
Initializes the K-Means centroids.
- Parameters:
x (DNDarray) – The data to initialize the clusters for. Shape = (n_samples, n_features)
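As an illustration of the ‘probability_based’ (k-means++) idea, the sketch below picks the first center uniformly at random and each further center with probability proportional to its squared distance from the nearest center already chosen. It uses plain Python lists and the standard library; the function name kmeanspp_init and its seed argument are assumptions for this example and do not reflect how heat implements the initialization on distributed data.

```python
import random

def kmeanspp_init(data, k, seed=0):
    """Illustrative k-means++ seeding on a list of points (not heat's implementation)."""
    rng = random.Random(seed)
    centers = [list(rng.choice(data))]  # first center: uniform at random
    while len(centers) < k:
        # Squared distance from each point to its nearest already-chosen center
        d2 = [
            min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
            for p in data
        ]
        # Sample the next center with probability proportional to d2.
        # Assumes data holds at least k distinct points; otherwise all
        # weights become zero and random.choices raises ValueError.
        centers.append(list(rng.choices(data, weights=d2, k=1)[0]))
    return centers

data = [[0.0, 0.0], [0.0, 0.1], [10.0, 10.0], [10.0, 10.1]]
init_centers = kmeanspp_init(data, k=2)
```

Points that are already centers have weight zero, so the same observation is not selected twice.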
- _assign_to_cluster(x: heat.core.dndarray.DNDarray, eval_functional_value: bool = False)
Assigns the passed data points to the centroids based on the respective metric.
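Conceptually, the assignment step evaluates the metric between all points and all centroids and takes the index of the smallest distance in each row. A minimal sketch on plain Python lists follows; euclidean and assign_to_cluster are names chosen for this example, and the distributed details of the actual method are not shown.

```python
import math

def euclidean(x, y):
    """Pairwise Euclidean distances: one row per point in x, one column per row of y."""
    return [[math.dist(p, c) for c in y] for p in x]

def assign_to_cluster(x, centroids, metric):
    """Return, for each point, the index of its closest centroid under metric."""
    distances = metric(x, centroids)
    return [row.index(min(row)) for row in distances]

x = [[0.0, 0.0], [1.0, 1.0], [9.0, 9.0]]
centroids = [[0.0, 0.0], [10.0, 10.0]]
labels = assign_to_cluster(x, centroids, euclidean)
```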
- _update_centroids(x: heat.core.dndarray.DNDarray, matching_centroids: heat.core.dndarray.DNDarray)
The update strategy is algorithm-specific (e.g. the mean of the assigned points for kmeans, the median for kmedians, etc.).
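The difference between the update strategies can be sketched by making the reduction a parameter: a column-wise mean gives a kmeans-style update, a column-wise median a kmedians-style one. The code below is an illustration on plain Python lists; the function name, the reduce argument, and the choice to leave the centroid of an empty cluster unchanged are assumptions of this example.

```python
from statistics import mean, median

def update_centroids(points, labels, centroids, reduce=mean):
    """Recompute each centroid from its assigned points using the given reduction."""
    updated = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            # Assumption: an empty cluster keeps its previous centroid
            updated.append(list(old))
            continue
        # Reduce feature by feature (column-wise) over the cluster members
        updated.append([reduce(column) for column in zip(*members)])
    return updated

points = [[0.0, 0.0], [1.0, 0.0], [8.0, 0.0], [10.0, 10.0]]
labels = [0, 0, 0, 1]
centroids = [[0.0, 0.0], [9.0, 9.0]]

by_mean = update_centroids(points, labels, centroids, reduce=mean)
by_median = update_centroids(points, labels, centroids, reduce=median)
```

In this toy data the outlying point [8.0, 0.0] pulls the mean-based centroid of cluster 0 to [3.0, 0.0], while the median-based centroid stays at [1.0, 0.0].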
- fit(x: heat.core.dndarray.DNDarray)
Computes the centroids of the clustering algorithm to fit the data x. The full pipeline is algorithm-specific.
- Parameters:
x (DNDarray) – Training instances to cluster. Shape = (n_samples, n_features)
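How max_iter and tol interact in a fit loop can be sketched with a kmeans-style iteration: alternate assignment and update until the inertia (sum of squared distances to the assigned centroids) changes by at most a relative tol, or until max_iter iterations have run. This is a simplified illustration on plain Python lists; the exact convergence test used by the concrete subclasses is not stated here, so the relative-change criterion below is an assumption.

```python
def sq_dist(p, c):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, c))

def fit(points, centroids, max_iter=300, tol=1e-4):
    """Illustrative kmeans-style loop; returns centroids, labels, iterations used."""
    centroids = [list(c) for c in centroids]
    labels, prev_inertia, n_iter = [], None, 0
    for n_iter in range(1, max_iter + 1):
        # Assignment step: index of the nearest centroid for every point
        labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        inertia = sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))
        # Assumed stopping rule: relative change in inertia at most tol
        if prev_inertia is not None and abs(prev_inertia - inertia) <= tol * max(prev_inertia, 1e-12):
            break
        prev_inertia = inertia
        # Update step: mean of the assigned points (empty clusters are left as is)
        for k in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:
                centroids[k] = [sum(col) / len(col) for col in zip(*members)]
    return centroids, labels, n_iter

points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
centers, labels, n_iter = fit(points, centroids=[[0.0, 0.0], [10.0, 0.0]])
```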
- predict(x: heat.core.dndarray.DNDarray)
Predicts the closest cluster each sample in x belongs to. In the vector quantization literature, cluster_centers_() is called the code book, and each value returned by predict is the index of the closest code in the code book.
- Parameters:
x (DNDarray) – New data to predict. Shape = (n_samples, n_features)
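The code book view can be made concrete with a small sketch: predicting yields one index per sample, and looking those indices up in the code book yields the quantized samples. The example uses plain Python lists and squared Euclidean distance; the function and variable names are chosen for illustration only.

```python
def predict(points, code_book):
    """Return the index of the closest code (centroid) for each point."""
    def sq_dist(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))
    return [
        min(range(len(code_book)), key=lambda k: sq_dist(p, code_book[k]))
        for p in points
    ]

code_book = [[0.0, 0.0], [5.0, 5.0]]           # fitted cluster centers
new_points = [[0.2, -0.1], [4.0, 6.0], [2.4, 2.4]]

codes = predict(new_points, code_book)          # indices into the code book
quantized = [code_book[i] for i in codes]       # each point replaced by its code
```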