heat.cluster.kmedians
Module Implementing the Kmedians Algorithm
Module Contents
- class KMedians(n_clusters: int = 8, init: str | heat.core.dndarray.DNDarray = 'random', max_iter: int = 300, tol: float = 0.0001, random_state: int = None)
Bases: heat.cluster._kcluster._KCluster
K-Medians clustering algorithm [1]. Uses the Manhattan (city-block, \(L_1\)) metric for distance calculations.
- Parameters:
n_clusters (int, optional, default: 8) – The number of clusters to form as well as the number of centroids to generate.
init (str or DNDarray, default: ‘random’) –
Method for initialization:
‘k-medians++’: selects initial cluster centers for the clustering in a smart way to speed up convergence [2].
‘random’: chooses k observations (rows) at random from the data as the initial centroids.
‘batchparallel’: initializes by using the batch parallel algorithm (see BatchParallelKMedians for more information).
DNDarray: gives the initial centers, should be of Shape = (n_clusters, n_features)
max_iter (int, default: 300) – Maximum number of iterations of the k-medians algorithm for a single run.
tol (float, default: 1e-4) – Relative tolerance with regard to inertia to declare convergence.
random_state (int) – Determines random number generation for centroid initialization.
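To make the ‘random’ option and the role of random_state concrete, here is a minimal NumPy sketch. It is an illustration only, not Heat's implementation: the helper name random_init is hypothetical, and it assumes that the k rows are drawn without replacement.

```python
import numpy as np

def random_init(x, n_clusters, random_state=None):
    """Pick n_clusters distinct rows of x as initial centroids (hypothetical helper)."""
    # Seeding the generator makes the choice reproducible across runs.
    rng = np.random.default_rng(random_state)
    # Assumption: rows are sampled without replacement, so no centroid is duplicated.
    idx = rng.choice(x.shape[0], size=n_clusters, replace=False)
    return x[idx]

x = np.arange(20, dtype=float).reshape(10, 2)  # 10 samples, 2 features
centers = random_init(x, n_clusters=3, random_state=0)
centers_again = random_init(x, n_clusters=3, random_state=0)
```

With the same random_state, both calls return the same three rows, which is the behaviour the parameter is meant to control.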
References
[1] Hakimi, S., and O. Kariv. “An algorithmic approach to network location problems II: The p-medians.” SIAM Journal on Applied Mathematics 37.3 (1979): 539-560.
- _update_centroids(x: heat.core.dndarray.DNDarray, matching_centroids: heat.core.dndarray.DNDarray)
Compute coordinates of new centroid as median of the data points in x that are assigned to it.
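The median update can be sketched in plain NumPy as follows. This is not Heat's distributed code: update_centroids and labels are hypothetical names (labels plays the role of matching_centroids), and keeping the old centroid for an empty cluster is an assumption made for the sketch.

```python
import numpy as np

def update_centroids(x, labels, centroids):
    """Return new centroids as the feature-wise median of each cluster's members."""
    new = centroids.copy()
    for k in range(centroids.shape[0]):
        members = x[labels == k]
        # Assumption: an empty cluster keeps its previous centroid.
        if members.shape[0] > 0:
            new[k] = np.median(members, axis=0)
    return new

x = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 100.0], [10.0, 10.0], [12.0, 14.0]])
labels = np.array([0, 0, 0, 1, 1])   # cluster index assigned to each row of x
centroids = np.zeros((2, 2))         # previous centroids
new_centroids = update_centroids(x, labels, centroids)
```

Note that the outlying value 100.0 in the first cluster does not drag its centroid away, since the median of (0, 2, 100) is 2. This robustness is the usual motivation for medians over means.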
- fit(x: heat.core.dndarray.DNDarray)
Computes the centroids of a k-medians clustering.
- Parameters:
x (DNDarray) – Training instances to cluster. Shape = (n_samples, n_features)
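As a rough picture of what fitting involves, the loop below alternates \(L_1\) assignment and median updates until the inertia stabilises. It is a single-process NumPy sketch under stated assumptions, not Heat's implementation: kmedians_fit is a hypothetical name, inertia is taken to be the sum of \(L_1\) distances to the assigned centroids, and the stopping rule (relative change in inertia at most tol) is one plausible reading of the tol parameter.

```python
import numpy as np

def kmedians_fit(x, init_centers, max_iter=300, tol=1e-4):
    """Alternate L1 assignment and median update (illustrative sketch)."""
    centers = np.asarray(init_centers, dtype=float).copy()
    prev_inertia = None
    for n_iter in range(1, max_iter + 1):
        # Manhattan distance of every sample to every centroid: (n_samples, n_clusters)
        dist = np.abs(x[:, None, :] - centers[None, :, :]).sum(axis=2)
        labels = dist.argmin(axis=1)
        inertia = dist[np.arange(x.shape[0]), labels].sum()
        # Assumed stopping rule: relative change in inertia no larger than tol.
        if prev_inertia is not None and abs(prev_inertia - inertia) <= tol * max(prev_inertia, 1e-12):
            break
        prev_inertia = inertia
        # Move each non-empty cluster's centroid to the feature-wise median.
        for k in range(centers.shape[0]):
            members = x[labels == k]
            if len(members) > 0:
                centers[k] = np.median(members, axis=0)
    return centers, labels, n_iter

x = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centers, labels, n_iter = kmedians_fit(x, init_centers=x[[0, 3]])
```

The convergence check runs before the update, so the returned labels always correspond to the returned centers.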
- _initialize_cluster_centers(x: heat.core.dndarray.DNDarray)
Initializes the K-Medians centroids.
- Parameters:
x (DNDarray) – The data to initialize the clusters for. Shape = (n_samples, n_features)
- _assign_to_cluster(x: heat.core.dndarray.DNDarray, eval_functional_value: bool = False)
Assigns the passed data points to the centroids based on the respective metric.
- predict(x: heat.core.dndarray.DNDarray)
Predict the closest cluster each sample in x belongs to. In the vector quantization literature, cluster_centers_() is called the code book and each value returned by predict is the index of the closest code in the code book.
- Parameters:
x (DNDarray) – New data to predict. Shape = (n_samples, n_features)
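The code-book view of prediction can be illustrated with a few lines of NumPy. This is a sketch rather than Heat's code: the function and variable names are hypothetical, and the \(L_1\) metric is assumed because it is the one this class is documented to use.

```python
import numpy as np

def predict(x, code_book):
    """Return, for each row of x, the index of the nearest code under the L1 metric."""
    dist = np.abs(x[:, None, :] - code_book[None, :, :]).sum(axis=2)
    return dist.argmin(axis=1)

code_book = np.array([[0.0, 0.0], [10.0, 10.0]])   # stands in for fitted cluster centers
new_points = np.array([[1.0, -1.0], [9.0, 12.0], [4.0, 4.0]])
codes = predict(new_points, code_book)  # array([0, 1, 0])
```

Each entry of codes indexes a row of code_book, so code_book[codes] gives the quantized version of new_points.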