:mod:`heat.cluster._kcluster`
=============================

.. py:module:: heat.cluster._kcluster

.. autoapi-nested-parse::

   Base module for k-clustering algorithms



Module Contents
---------------

.. py:class:: _KCluster(metric: Callable, n_clusters: int, init: Union[str, heat.core.dndarray.DNDarray], max_iter: int, tol: float, random_state: int)

   Bases: :class:`heat.ClusteringMixin`, :class:`heat.BaseEstimator`

   Base class for k-statistics clustering algorithms (kmeans, kmedians, kmedoids).
   The clusters are represented by centroids :math:`c_i` (we use the term from kmeans for simplicity).

   :param metric: One of the distance metrics in ``ht.spatial.distance``. Must be passed as a lambda function
                  that takes exactly two arrays as input.
   :type metric: function
   :param n_clusters: The number of clusters to form as well as the number of centroids to generate.
   :type n_clusters: int
   :param init: Method for initialization:

      - ``'probability_based'``: selects the initial cluster centers in a way that speeds up convergence (k-means++)
      - ``'random'``: chooses k observations (rows) at random from the data as the initial centroids
      - ``'batchparallel'``: uses the batch parallel algorithm to initialize the centroids; only available for ``split=0`` and KMeans or KMedians
      - ``DNDarray``: gives the initial centers; should be of Shape = (n_clusters, n_features)
   :type init: str or DNDarray, default: 'random'
   :param max_iter: Maximum number of iterations for a single run.
   :type max_iter: int
   :param tol: Relative tolerance with regard to inertia to declare convergence.
   :type tol: float, default: 1e-4
   :param random_state: Determines random number generation for centroid initialization.
   :type random_state: int

   .. attribute:: n_clusters

   .. attribute:: init

   .. attribute:: max_iter

   .. attribute:: tol

   .. attribute:: random_state

   .. attribute:: _metric

   .. attribute:: _cluster_centers
      :annotation: = None

   .. attribute:: _functional_value
      :annotation: = None

   .. attribute:: _labels
      :annotation: = None

   .. attribute:: _inertia
      :annotation: = None

   .. attribute:: _n_iter
      :annotation: = None

   .. attribute:: _p
      :annotation: = None

   .. role:: raw-html(raw)
      :format: html

   .. method:: _initialize_cluster_centers(x: heat.core.dndarray.DNDarray, oversampling: float, iter_multiplier: float)

      Initializes the K-Means centroids.

      :param x: The data to initialize the clusters for. Shape = (n_samples, n_features)
      :type x: DNDarray
      :param oversampling: Oversampling factor used in the k-means|| initialization of centroids.
      :type oversampling: float
      :param iter_multiplier: Factor that increases the number of iterations used in the initialization of centroids.
      :type iter_multiplier: float

   .. method:: _centroid_sampling_helper(x: heat.core.dndarray.DNDarray, centroids: heat.core.dndarray.DNDarray, oversampling: float, num_iters: int)

      Helper function for the k-means|| initialization of centroids. Samples new centroids based on a
      probability distribution derived from the distance of the data points to the current set of centroids.

      :param x: The data to initialize the clusters for. Shape = (n_samples, n_features)
      :type x: DNDarray
      :param centroids: The initial set of centroids.
      :type centroids: DNDarray
      :param oversampling: Oversampling factor used in the k-means|| initialization of centroids.
      :type oversampling: float
      :param num_iters: Number of iterations used in the initialization of centroids.
      :type num_iters: int

   .. method:: _assign_to_cluster(x: heat.core.dndarray.DNDarray, eval_functional_value: bool = False)

      Assigns the passed data points to the centroids based on the respective metric.

      :param x: Data points. Shape = (n_samples, n_features)
      :type x: DNDarray
      :param eval_functional_value: If True, the current K-Clustering functional value of the clustering algorithm is evaluated.
      :type eval_functional_value: bool, default: False

   .. method:: _update_centroids(x: heat.core.dndarray.DNDarray, matching_centroids: heat.core.dndarray.DNDarray)

      The update strategy is algorithm specific (e.g. the mean of the assigned points for kmeans,
      the median for kmedians).

      :param x: Input data.
      :type x: DNDarray
      :param matching_centroids: Index array of assigned centroids.
      :type matching_centroids: DNDarray

   .. method:: fit(x: heat.core.dndarray.DNDarray)

      Computes the centroids of the clustering algorithm to fit the data ``x``. The full pipeline is algorithm specific.

      :param x: Training instances to cluster. Shape = (n_samples, n_features)
      :type x: DNDarray

   .. method:: predict(x: heat.core.dndarray.DNDarray)

      Predict the closest cluster each sample in ``x`` belongs to.

      In the vector quantization literature, :func:`cluster_centers_` is called the code book and each value
      returned by predict is the index of the closest code in the code book.

      :param x: New data to predict. Shape = (n_samples, n_features)
      :type x: DNDarray
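
The interplay of the assign step, the update step, and the ``tol`` / ``max_iter`` stopping rule described above can be illustrated with a minimal pure-Python sketch. This is not Heat's implementation: it runs on plain lists instead of ``DNDarray``, and the names ``pairwise_sq_euclidean``, ``assign_to_cluster``, ``update_centroids_mean`` and ``fit_sketch`` are hypothetical. It assumes ``'random'`` initialization, a k-means style mean update, and a stopping rule based on the relative change of the functional value; the criterion Heat actually uses may differ.

```python
import random


def pairwise_sq_euclidean(a, b):
    """Two-argument metric: len(a) x len(b) matrix of squared Euclidean distances."""
    return [[sum((p - q) ** 2 for p, q in zip(ra, rb)) for rb in b] for ra in a]


def assign_to_cluster(x, centers, metric):
    """Assign each sample to its nearest center; also return the functional value."""
    distances = metric(x, centers)
    labels = [min(range(len(centers)), key=row.__getitem__) for row in distances]
    value = sum(row[label] for row, label in zip(distances, labels))
    return labels, value


def update_centroids_mean(x, labels, centers):
    """k-means style update: each center becomes the mean of its assigned samples."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [row for row, label in zip(x, labels) if label == k]
        if not members:
            new_centers.append(list(old))  # assumption: an empty cluster keeps its center
            continue
        new_centers.append([sum(col) / len(members) for col in zip(*members)])
    return new_centers


def fit_sketch(x, n_clusters, metric, max_iter=300, tol=1e-4, random_state=None):
    """Alternate assign and update steps until the functional value stabilizes."""
    rng = random.Random(random_state)
    # 'random' initialization: pick n_clusters distinct rows as initial centers.
    centers = [list(row) for row in rng.sample(x, n_clusters)]
    previous = None
    for n_iter in range(1, max_iter + 1):
        labels, value = assign_to_cluster(x, centers, metric)
        centers = update_centroids_mean(x, labels, centers)
        # Assumed stopping rule: relative change of the functional value <= tol.
        if previous is not None and abs(previous - value) <= tol * max(previous, 1e-12):
            break
        previous = value
    labels, value = assign_to_cluster(x, centers, metric)
    return centers, labels, value, n_iter


# Usage: the metric is passed as a lambda that takes exactly two arrays.
data = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centers, labels, value, n_iter = fit_sketch(
    data, n_clusters=2, metric=lambda a, b: pairwise_sq_euclidean(a, b), random_state=0
)
```

Swapping ``update_centroids_mean`` for a per-feature median would give a kmedians style variant under the same loop, which is the kind of algorithm-specific update that ``_update_centroids`` leaves to subclasses.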