heat.cluster.kmedians

Module Implementing the Kmedians Algorithm

Module Contents

class KMedians(n_clusters: int = 8, init: str | heat.core.dndarray.DNDarray = 'random', max_iter: int = 300, tol: float = 0.0001, random_state: int = None)

Bases: heat.cluster._kcluster._KCluster

K-Medians clustering algorithm [1]. Uses the Manhattan (city-block, \(L_1\)) metric for distance calculations.

Parameters:
  • n_clusters (int, optional, default: 8) – The number of clusters to form as well as the number of centroids to generate.

  • init (str or DNDarray, default: ‘random’) –

    Method for initialization:

    • ‘k-medians++’ : selects initial cluster centers for the clustering in a smart way to speed up convergence [2].

    • ‘random’: choose k observations (rows) at random from data for the initial centroids.

    • ‘batchparallel’: initialize by using the batch parallel algorithm (see BatchParallelKMedians for more information).

    • DNDarray: gives the initial centers, should be of Shape = (n_clusters, n_features)

  • max_iter (int, default: 300) – Maximum number of iterations of the k-medians algorithm for a single run.

  • tol (float, default: 1e-4) – Relative tolerance with regard to inertia to declare convergence.

  • random_state (int, optional, default: None) – Seed for the random number generation used in centroid initialization.
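The ‘random’ option and the role of random_state can be pictured with a small pure-Python sketch. It is not Heat's implementation, which operates on distributed DNDarrays; the helper name init_random and the choice to sample rows without replacement are assumptions made for illustration.

```python
import random


def init_random(points, n_clusters, random_state=None):
    """Pick n_clusters rows of `points` as initial centroids (hypothetical helper)."""
    # A dedicated generator keeps the draw reproducible for a fixed seed.
    rng = random.Random(random_state)
    # Assumption: rows are drawn without replacement.
    chosen = rng.sample(range(len(points)), n_clusters)
    # Copy the rows so later centroid updates do not modify the data.
    return [list(points[i]) for i in chosen]


points = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 5.0], [9.0, 9.0]]
first = init_random(points, 2, random_state=42)
second = init_random(points, 2, random_state=42)
```

With the same seed, both calls return the same two rows, which is the behaviour one would expect from fixing random_state.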

References

[1] Hakimi, S., and O. Kariv. “An algorithmic approach to network location problems II: The p-medians.” SIAM Journal on Applied Mathematics 37.3 (1979): 539-560.

_update_centroids(x: heat.core.dndarray.DNDarray, matching_centroids: heat.core.dndarray.DNDarray)

Compute coordinates of new centroid as median of the data points in x that are assigned to it

Parameters:
  • x (DNDarray) – Input data

  • matching_centroids (DNDarray) – Array of indices i indicating the cluster \(c_i\) to which each sample point in x is assigned
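The update step described here, taking the median of the assigned points feature by feature, can be sketched in plain Python as follows. The function name, the list-based data layout, and the handling of empty clusters are assumptions for illustration and do not reflect Heat's internals.

```python
from statistics import median


def update_centroids(points, labels, old_centroids):
    """Return new centroids as the coordinate-wise median of each cluster's points."""
    new_centroids = []
    for i, old in enumerate(old_centroids):
        members = [p for p, lab in zip(points, labels) if lab == i]
        if not members:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(list(old))
            continue
        # zip(*members) regroups the member points by feature (column).
        new_centroids.append([median(col) for col in zip(*members)])
    return new_centroids


points = [[1.0, 2.0], [2.0, 4.0], [9.0, 0.0], [10.0, 10.0], [12.0, 11.0]]
labels = [0, 0, 0, 1, 1]
centroids = update_centroids(points, labels, [[0.0, 0.0], [5.0, 5.0]])
# centroids == [[2.0, 2.0], [11.0, 10.5]]
```

The outlying point [9.0, 0.0] in cluster 0 leaves the median at [2.0, 2.0], whereas a mean-based update would move the centroid to [4.0, 2.0].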

fit(x: heat.core.dndarray.DNDarray)

Computes the centroids of a k-medians clustering.

Parameters:

x (DNDarray) – Training instances to cluster. Shape = (n_samples, n_features)

_initialize_cluster_centers(x: heat.core.dndarray.DNDarray)

Initializes the K-Medians centroids.

Parameters:

x (DNDarray) – The data to initialize the clusters for. Shape = (n_samples, n_features)

_assign_to_cluster(x: heat.core.dndarray.DNDarray, eval_functional_value: bool = False)

Assigns the passed data points to the centroids based on the respective metric

Parameters:
  • x (DNDarray) – Data points, Shape = (n_samples, n_features)

  • eval_functional_value (bool, default: False) – If True, the current K-Clustering functional value of the clustering algorithm is evaluated
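For KMedians the respective metric is the Manhattan distance, so the assignment step amounts to a nearest-centroid search under \(L_1\). The following is a minimal pure-Python sketch. The function names are hypothetical, and the functional value is assumed here to be the sum of each point's distance to its nearest centroid; the sketch returns that value for convenience, which need not match how Heat exposes it.

```python
def manhattan(a, b):
    """L1 (city-block) distance between two points of equal length."""
    return sum(abs(u - v) for u, v in zip(a, b))


def assign_to_cluster(points, centroids, eval_functional_value=False):
    """Return the index of the nearest centroid for every point."""
    labels = []
    total = 0.0
    for p in points:
        dists = [manhattan(p, c) for c in centroids]
        # Index of the smallest distance; ties go to the lowest index.
        nearest = min(range(len(centroids)), key=dists.__getitem__)
        labels.append(nearest)
        total += dists[nearest]
    # Assumption: the functional value is the summed nearest-centroid distance.
    return (labels, total) if eval_functional_value else labels


centroids = [[0.0, 0.0], [10.0, 10.0]]
points = [[1.0, 1.0], [9.0, 8.0], [4.0, 5.0]]
labels, value = assign_to_cluster(points, centroids, eval_functional_value=True)
# labels == [0, 1, 0], value == 14.0
```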

predict(x: heat.core.dndarray.DNDarray)

Predict the closest cluster each sample in x belongs to.

In the vector quantization literature, cluster_centers_ is called the code book, and each value returned by predict is the index of the closest code in the code book.

Parameters:

x (DNDarray) – New data to predict. Shape = (n_samples, n_features)