`heat.cluster._kcluster`

Base-module for k-clustering algorithms

Module Contents

class _KCluster(metric: Callable, n_clusters: int, init: str | heat.core.dndarray.DNDarray, max_iter: int, tol: float, random_state: int)[source]

Bases: heat.ClusteringMixin, heat.BaseEstimator

Base class for k-statistics clustering algorithms (kmeans, kmedians, kmedoids). The clusters are represented by centroids c_i (the kmeans term is used throughout for simplicity).

Parameters:

metric (function) – One of the distance metrics in ht.spatial.distance. It must be passed as a lambda function that takes exactly two arrays as input.
n_clusters (int) – The number of clusters to form as well as the number of centroids to generate.
init (str or DNDarray, default: ‘random’) –
Method for initialization:
- ‘probability_based’: selects the initial cluster centers using distance-based probabilities (k-means++) to speed up convergence
- ‘random’: chooses k observations (rows) at random from the data as the initial centroids
- ‘batchparallel’: uses the batch parallel algorithm to initialize the centroids; only available for split=0 and KMeans or KMedians
- DNDarray: gives the initial centers; should be of Shape = (n_clusters, n_features)
max_iter (int) – Maximum number of iterations for a single run.
tol (float, default: 1e-4) – Relative tolerance with regard to inertia to declare convergence.
random_state (int) – Determines random number generation for centroid initialization.
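
The two-argument requirement on metric can be illustrated without Heat. The sketch below is plain Python: `minkowski_cdist` and its `p` argument are hypothetical stand-ins, not functions from ht.spatial.distance. It only shows how a lambda fixes the extra argument so that the resulting callable takes two arrays.

```python
def minkowski_cdist(xs, ys, p):
    """Hypothetical pairwise distance: every row of xs against every row of ys."""
    return [
        [sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p) for y in ys]
        for x in xs
    ]

# Fix the extra argument so that the callable accepts exactly two arrays.
euclidean = lambda xs, ys: minkowski_cdist(xs, ys, p=2)
manhattan = lambda xs, ys: minkowski_cdist(xs, ys, p=1)

points = [[0.0, 0.0], [3.0, 4.0]]
centers = [[0.0, 0.0]]
dist_l2 = euclidean(points, centers)  # one row per point, one column per center
dist_l1 = manhattan(points, centers)
```

With Heat, the same pattern would wrap a function from ht.spatial.distance instead of the hypothetical helper.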

n_clusters

init

max_iter

tol

random_state

_metric

_cluster_centers = None

_functional_value = None

_labels = None

_inertia = None

_n_iter = None

_p = None

_initialize_cluster_centers(x: heat.core.dndarray.DNDarray, oversampling: float, iter_multiplier: float)[source]

Initializes the K-Means centroids.

Parameters:

x (DNDarray) – The data to initialize the clusters for. Shape = (n_samples, n_features)
oversampling (float) – oversampling factor used in the k-means|| initialization of centroids
iter_multiplier (float) – factor that increases the number of iterations used in the initialization of centroids

_centroid_sampling_helper(x: heat.core.dndarray.DNDarray, centroids: heat.core.dndarray.DNDarray, oversampling: float, num_iters: int)[source]

Helper function for the k-means|| initialization of centroids. Samples new centroids based on a probability distribution derived from the distance of data points to the current set of centroids.

Parameters:

x (DNDarray) – The data to initialize the clusters for. Shape = (n_samples, n_features)
centroids (DNDarray) – The initial set of centroids
oversampling (float) – oversampling factor used in the k-means|| initialization of centroids
num_iters (int) – number of iterations used in the initialization of centroids
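
The sampling idea described above can be sketched in plain Python. This is not Heat's implementation: the function names, the use of squared Euclidean distance, and the rule "keep each point independently with probability min(1, oversampling * d² / sum of d²)" are assumptions that follow the usual description of k-means||.

```python
import random

def sq_dist_to_nearest(point, centroids):
    """Squared Euclidean distance from point to its closest centroid."""
    return min(sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids)

def sample_candidates(data, centroids, oversampling, num_iters, seed=0):
    """Grow the centroid set by distance-weighted sampling (illustrative only)."""
    rng = random.Random(seed)
    centroids = [list(c) for c in centroids]
    for _ in range(num_iters):
        d2 = [sq_dist_to_nearest(p, centroids) for p in data]
        total = sum(d2)
        if total == 0:
            break  # every point already coincides with a centroid
        # Points far from all current centroids are more likely to be kept.
        new = [
            list(p)
            for p, d in zip(data, d2)
            if rng.random() < min(1.0, oversampling * d / total)
        ]
        centroids.extend(new)
    return centroids
```

The result typically holds more than n_clusters candidates, so a further reduction step to exactly n_clusters centroids is still needed; that step is left out of this sketch.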

_assign_to_cluster(x: heat.core.dndarray.DNDarray, eval_functional_value: bool = False)[source]

Assigns the passed data points to the centroids based on the respective metric

Parameters:

x (DNDarray) – Data points, Shape = (n_samples, n_features)
eval_functional_value (bool, default: False) – If True, the current K-Clustering functional value of the clustering algorithm is evaluated
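
A minimal plain-Python sketch of the assignment step follows. It is not Heat's code: the helper names are hypothetical, and the functional value is assumed here to be the sum over all points of the distance to the closest centroid raised to a power `p`.

```python
def pairwise_euclidean(xs, ys):
    """Distance matrix with one row per point and one column per centroid."""
    return [
        [sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5 for y in ys]
        for x in xs
    ]

def assign_to_cluster(data, centroids, metric, eval_functional_value=False, p=2):
    """Return the closest-centroid index per point and, optionally, the functional value."""
    distances = metric(data, centroids)
    labels = [row.index(min(row)) for row in distances]
    functional_value = None
    if eval_functional_value:
        # Assumed definition: sum of (distance to the closest centroid) ** p.
        functional_value = sum(min(row) ** p for row in distances)
    return labels, functional_value

data = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0]]
centroids = [[0.0, 0.0], [10.0, 10.0]]
labels, value = assign_to_cluster(
    data, centroids, pairwise_euclidean, eval_functional_value=True
)
```

Ties are resolved in favor of the lowest centroid index, which is a choice made for this sketch only.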

_update_centroids(x: heat.core.dndarray.DNDarray, matching_centroids: heat.core.dndarray.DNDarray)[source]

The update strategy is algorithm-specific (e.g. the mean of the assigned points for kmeans, the median for kmedians, etc.).

Parameters:

x (DNDarray) – Input Data
matching_centroids (DNDarray) – Index array of assigned centroids
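
To show how the update strategy varies between algorithms, the sketch below computes new centroids with either the mean or the median of the assigned points. It uses plain Python lists instead of DNDarrays; the function name, the `reduce` parameter, and the handling of empty clusters (the old centroid is kept) are assumptions of this example.

```python
from statistics import mean, median

def update_centroids(data, labels, centroids, reduce=mean):
    """Recompute each centroid feature-wise from the points assigned to it."""
    updated = []
    for k, old in enumerate(centroids):
        members = [p for p, lab in zip(data, labels) if lab == k]
        if not members:
            updated.append(list(old))  # assumption: an empty cluster keeps its centroid
            continue
        updated.append([reduce(col) for col in zip(*members)])
    return updated

data = [[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [5.0, 5.0]]
labels = [0, 0, 0, 1]
old = [[0.0, 0.0], [5.0, 5.0], [9.0, 9.0]]
by_mean = update_centroids(data, labels, old, reduce=mean)      # kmeans-style update
by_median = update_centroids(data, labels, old, reduce=median)  # kmedians-style update
```

The outlier at x = 10 pulls the mean-based centroid further than the median-based one, which is the practical difference between the two strategies.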

fit(x: heat.core.dndarray.DNDarray)[source]

Computes the centroids of the clustering algorithm to fit the data x. The full pipeline is algorithm-specific.

Parameters:

x (DNDarray) – Training instances to cluster. Shape = (n_samples, n_features)
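
Because the pipeline is algorithm-specific, the following is only one possible shape of a fit loop: a kmeans-style sketch in plain Python. The function name, the squared-centroid-shift convergence test against `tol`, and the empty-cluster handling are assumptions and do not describe a specific Heat subclass.

```python
def fit_kmeans_sketch(data, centroids, max_iter=300, tol=1e-4):
    """Alternate assignment and mean update until the centroids stop moving."""
    centroids = [list(c) for c in centroids]
    labels, n_iter = [], 0
    for n_iter in range(1, max_iter + 1):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(
                range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])),
            )
            for p in data
        ]
        # Update step: feature-wise mean of each cluster's members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(data, labels) if lab == k]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(list(old))  # assumption: keep empty clusters in place
        # Convergence measure (assumed): total squared shift of all centroids.
        shift = sum(
            (a - b) ** 2
            for old, new in zip(centroids, new_centroids)
            for a, b in zip(old, new)
        )
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids, labels, n_iter
```

The loop stops either when the shift falls to `tol` or below, or after `max_iter` iterations, whichever comes first.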

predict(x: heat.core.dndarray.DNDarray)[source]

Predict the closest cluster each sample in x belongs to.

In the vector quantization literature, cluster_centers_() is called the code book and each value returned by predict is the index of the closest code in the code book.

Parameters:

x (DNDarray) – New data to predict. Shape = (n_samples, n_features)
