heat.cluster.kmedoids

Module Implementing the Kmedoids Algorithm

Module Contents

class KMedoids(n_clusters: int = 8, init: str | heat.core.dndarray.DNDarray = 'random', max_iter: int = 300, random_state: int = None)

Bases: heat.cluster._kcluster._KCluster

This is not the original k-medoids implementation based on PAM as proposed in [1]. It is k-medoids with the Manhattan distance as a fixed metric: each new cluster center is computed as the median of the points assigned to the cluster, and the centroid is then snapped to the nearest data point.

Parameters:
  • n_clusters (int, optional, default: 8) – The number of clusters to form as well as the number of centroids to generate.

  • init (str or DNDarray, default: ‘random’) –

    Method for initialization:

    • ‘k-medoids++’ : selects initial cluster centers for the clustering in a smart way to speed up convergence [2].

    • ‘random’: choose k observations (rows) at random from data for the initial centroids.

    • DNDarray: gives the initial centers, should be of Shape = (n_clusters, n_features)

  • max_iter (int, default: 300) – Maximum number of iterations of the algorithm for a single run.

  • random_state (int) – Determines random number generation for centroid initialization.
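
To illustrate the ‘random’ initialization and the role of random_state, here is a minimal pure-Python sketch. It is not Heat's implementation: the helper name init_random_centers is hypothetical, and the data is a plain list of rows rather than a DNDarray. It only shows the idea of drawing k observations as initial centroids with a seeded generator.

```python
import random


def init_random_centers(data, n_clusters, random_state=None):
    """Pick n_clusters distinct rows of data as initial centroids (hypothetical helper)."""
    # A generator seeded with random_state makes the selection reproducible.
    rng = random.Random(random_state)
    indices = rng.sample(range(len(data)), n_clusters)
    # Copy the rows so later updates do not modify the input data.
    return [list(data[i]) for i in indices]


data = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0], [11.0, 10.0], [5.0, 5.0]]
centers = init_random_centers(data, n_clusters=2, random_state=42)
```

Calling the helper again with the same random_state returns the same centers, which is the behavior a fixed seed is meant to provide.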

References

[1] Kaufman, L. and Rousseeuw, P.J. (1987), Clustering by means of Medoids, in Statistical Data Analysis Based on the L1 Norm and Related Methods, edited by Y. Dodge, North-Holland, 405–416.

_update_centroids(x: heat.core.dndarray.DNDarray, matching_centroids: heat.core.dndarray.DNDarray)

Compute the new centroid ci as the closest sample to the median of the data points in x that are assigned to ci.

Parameters:
  • x (DNDarray) – Input data

  • matching_centroids (DNDarray) – Array filled with indices i indicating to which cluster ci each sample point in x is assigned
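
The median-then-snap update described above can be sketched in pure Python as follows. This is an illustration under stated assumptions, not Heat's code: the names manhattan and update_centroids are hypothetical, the data is a list of rows, and keeping the previous centroid for an empty cluster is a choice made for this sketch only.

```python
from statistics import median


def manhattan(a, b):
    """Manhattan (L1) distance between two points."""
    return sum(abs(p - q) for p, q in zip(a, b))


def update_centroids(data, labels, centers):
    """Return new centroids: per-cluster median, snapped to the nearest sample."""
    new_centers = []
    for i, old_center in enumerate(centers):
        members = [row for row, lab in zip(data, labels) if lab == i]
        if not members:
            # Assumption of this sketch: an empty cluster keeps its old centroid.
            new_centers.append(list(old_center))
            continue
        # Feature-wise median of the points assigned to cluster i.
        med = [median(col) for col in zip(*members)]
        # Snap the median to the closest actual sample in the data.
        new_centers.append(list(min(data, key=lambda row: manhattan(row, med))))
    return new_centers


data = [[0.0, 0.0], [4.0, 1.0], [1.0, 5.0],
        [10.0, 10.0], [11.0, 10.0], [10.0, 12.0]]
labels = [0, 0, 0, 1, 1, 1]
new_centers = update_centroids(data, labels, [[4.0, 1.0], [11.0, 10.0]])
```

For the first cluster the feature-wise median is (1, 1), which is not a sample; the snapping step replaces it with the closest sample, (0, 0), so every centroid remains an actual data point.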

fit(x: heat.core.dndarray.DNDarray)

Computes the centroids of a k-medoids clustering.

Parameters:

x (DNDarray) – Training instances to cluster. Shape = (n_samples, n_features)

_initialize_cluster_centers(x: heat.core.dndarray.DNDarray)

Initializes the cluster centroids.

Parameters:

x (DNDarray) – The data to initialize the clusters for. Shape = (n_samples, n_features)

_assign_to_cluster(x: heat.core.dndarray.DNDarray, eval_functional_value: bool = False)

Assigns the passed data points to their nearest centroids according to the respective metric.

Parameters:
  • x (DNDarray) – Data points, Shape = (n_samples, n_features)

  • eval_functional_value (bool, default: False) – If True, the current K-Clustering functional value of the clustering algorithm is evaluated
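
As a rough illustration of the assignment step, the pure-Python sketch below assigns each point to its nearest centroid under the Manhattan distance. It rests on assumptions: the names manhattan and assign_to_cluster are hypothetical, the functional value is taken to be the sum of each point's distance to its assigned centroid, and returning it alongside the labels is a convenience of this sketch rather than the behavior of the Heat method.

```python
def manhattan(a, b):
    """Manhattan (L1) distance between two points."""
    return sum(abs(p - q) for p, q in zip(a, b))


def assign_to_cluster(data, centers, eval_functional_value=False):
    """Return (labels, functional_value); the value is None unless requested."""
    labels = []
    total = 0.0
    for row in data:
        dists = [manhattan(row, c) for c in centers]
        # Index of the nearest centroid; ties go to the lowest index.
        best = min(range(len(centers)), key=dists.__getitem__)
        labels.append(best)
        total += dists[best]
    return labels, (total if eval_functional_value else None)


data = [[0.0, 0.0], [1.0, 0.0], [9.0, 9.0], [10.0, 10.0]]
centers = [[0.0, 0.0], [10.0, 10.0]]
labels, value = assign_to_cluster(data, centers, eval_functional_value=True)
```

Under the assumed definition, a lower functional value means the samples lie closer to their assigned centroids.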

predict(x: heat.core.dndarray.DNDarray)

Predict the closest cluster each sample in x belongs to.

In the vector quantization literature, cluster_centers_() is called the code book and each value returned by predict is the index of the closest code in the code book.

Parameters:

x (DNDarray) – New data to predict. Shape = (n_samples, n_features)