:mod:`heat.cluster` =================== .. py:module:: heat.cluster .. autoapi-nested-parse:: Add the clustering functions to the ht.cluster namespace Submodules ---------- .. toctree:: :titlesonly: :maxdepth: 1 _kcluster/index.rst batchparallelclustering/index.rst kmeans/index.rst kmedians/index.rst kmedoids/index.rst metrics/index.rst spectral/index.rst Package Contents ---------------- .. function:: _kmex(X, p, n_clusters, init, max_iter, tol, random_state=None, weights: torch.tensor = 1.0) Auxiliary function: single-process k-means and k-medians in PyTorch. ``p`` is the norm used for computing distances: ``p=2`` implies k-means, ``p=1`` implies k-medians. ``p`` should be 1 or 2; for any other choice of ``p``, the function proceeds as for ``p=2`` without any guarantee on the result. (Note: kmex stands for kmeans and kmedians.) .. py:class:: DNDarray(array: torch.Tensor, gshape: Tuple[int, Ellipsis], dtype: heat.core.types.datatype, split: Union[int, None], device: heat.core.devices.Device, comm: Communication, balanced: bool) Distributed N-Dimensional array. The core element of HeAT. It is composed of PyTorch tensors local to each process. :param array: Local array elements :type array: torch.Tensor :param gshape: The global shape of the array :type gshape: Tuple[int,...] :param dtype: The datatype of the array :type dtype: datatype :param split: The axis on which the array is divided between processes :type split: int or None :param device: The device on which the local arrays reside (cpu or gpu) :type device: Device :param comm: The communications object for sending and receiving data :type comm: Communication :param balanced: Describes whether the data are evenly distributed across processes. If this information is not available (``self.balanced is None``), it can be gathered via the :func:`is_balanced()` method (requires communication). :type balanced: bool or None .. attribute:: __array .. attribute:: __gshape .. attribute:: __dtype .. attribute:: __split .. attribute:: __device ..
attribute:: __comm .. attribute:: __balanced .. attribute:: __ishalo :annotation: = False .. attribute:: __halo_next :annotation: = None .. attribute:: __halo_prev :annotation: = None .. attribute:: __partitions_dict__ :annotation: = None .. attribute:: __lshape_map :annotation: = None .. role:: raw-html(raw) :format: html .. method:: __prephalo(start, end) -> torch.Tensor Extracts the halo indexed by start, end from ``self.array`` in the direction of ``self.split`` :param start: Start index of the halo extracted from ``self.array`` :type start: int :param end: End index of the halo extracted from ``self.array`` :type end: int .. method:: get_halo(halo_size: int, prev: bool = True, next: bool = True) Fetch halos of size ``halo_size`` from neighboring ranks and save them in ``self.halo_next/self.halo_prev``. :param halo_size: Size of the halo. :type halo_size: int :param prev: If True, fetch the halo from the previous rank. Default: True. :type prev: bool, optional :param next: If True, fetch the halo from the next rank. Default: True. :type next: bool, optional .. method:: __cat_halo() -> torch.Tensor Return local array concatenated to halos if they are available. .. method:: __array__() -> numpy.ndarray Returns a view of the process-local slice of the :class:`DNDarray` as a numpy ndarray, if the ``DNDarray`` resides on CPU. Otherwise, it returns a copy, on CPU, of the process-local slice of ``DNDarray`` as numpy ndarray. .. method:: __array_ufunc__(ufunc, method, *inputs, **kwargs) Override NumPy's universal functions. .. method:: __array_function__(func, types, args, kwargs) Augments NumPy's functions. .. method:: astype(dtype, copy=True) -> DNDarray Returns a casted version of this array. The casted array is a new array of the same shape but with the given type. If ``copy`` is ``False``, the same array is returned instead. :param dtype: Heat type to which the array is cast :type dtype: datatype :param copy: By default the operation returns a copy of this array.
If copy is set to ``False`` the cast is performed in-place and this array is returned :type copy: bool, optional .. method:: balance_() -> DNDarray Function for balancing a :class:`DNDarray` between all nodes. To determine if this is needed use the :func:`is_balanced()` function. If the ``DNDarray`` is already balanced this function will do nothing. This function modifies the ``DNDarray`` itself and will not return anything. .. rubric:: Examples >>> a = ht.zeros((10, 2), split=0) >>> a[:, 0] = ht.arange(10) >>> b = a[3:] >>> b [0/2] tensor([[3., 0.]]) [1/2] tensor([[4., 0.], [5., 0.], [6., 0.]]) [2/2] tensor([[7., 0.], [8., 0.], [9., 0.]]) >>> print(b.gshape, b.lshape) [0/2] (7, 2) (1, 2) [1/2] (7, 2) (3, 2) [2/2] (7, 2) (3, 2) >>> b.balance_() >>> b [0/2] tensor([[3., 0.], [4., 0.], [5., 0.]]) [1/2] tensor([[6., 0.], [7., 0.]]) [2/2] tensor([[8., 0.], [9., 0.]]) >>> print(b.gshape, b.lshape) [0/2] (7, 2) (3, 2) [1/2] (7, 2) (2, 2) [2/2] (7, 2) (2, 2) .. method:: __bool__() -> bool Boolean scalar casting. .. method:: __cast(cast_function) -> Union[float, int] Implements a generic cast function for ``DNDarray`` objects. :param cast_function: The actual cast function, e.g. ``float`` or ``int`` :type cast_function: function :raises TypeError: If the ``DNDarray`` object cannot be converted into a scalar. .. method:: collect_(target_rank: Optional[int] = 0) -> None Collects a distributed ``DNDarray`` on one MPI rank, chosen by the ``target_rank`` argument. It is a specific case of the ``redistribute_`` method. :param target_rank: The rank to which the DNDarray will be collected. Default: 0. :type target_rank: int, optional :raises TypeError: If the target rank is not an integer. :raises ValueError: If the target rank is out of bounds. ..
rubric:: Examples >>> st = ht.ones((50, 81, 67), split=2) >>> print(st.lshape) [0/2] (50, 81, 23) [1/2] (50, 81, 22) [2/2] (50, 81, 22) >>> st.collect_() >>> print(st.lshape) [0/2] (50, 81, 67) [1/2] (50, 81, 0) [2/2] (50, 81, 0) >>> st.collect_(1) >>> print(st.lshape) [0/2] (50, 81, 0) [1/2] (50, 81, 67) [2/2] (50, 81, 0) .. method:: __complex__() -> DNDarray Complex scalar casting. .. method:: counts_displs() -> Tuple[Tuple[int], Tuple[int]] Returns actual counts (number of items per process) and displacements (offsets) of the DNDarray. Does not assume load balance. .. method:: cpu() -> DNDarray Returns a copy of this object in main memory. If this object is already in main memory, then no copy is performed and the original object is returned. .. method:: create_lshape_map(force_check: bool = False) -> torch.Tensor Generate a 'map' of the lshapes of the data on all processes. Units are ``(process rank, lshape)`` :param force_check: If False (default) and the lshape map has already been created, use the previous result. Otherwise, create the lshape_map. :type force_check: bool, optional .. method:: create_partition_interface() Create a partition interface in line with the DPPY proposal. This is subject to change. The intention is to facilitate the use of a general format for referencing distributed datasets. An example of the output and shape is shown below.
__partitioned__ = { 'shape': (27, 3, 2), 'partition_tiling': (4, 1, 1), 'partitions': { (0, 0, 0): { 'start': (0, 0, 0), 'shape': (7, 3, 2), 'data': tensor([...], dtype=torch.int32), 'location': [0], 'dtype': torch.int32, 'device': 'cpu' }, (1, 0, 0): { 'start': (7, 0, 0), 'shape': (7, 3, 2), 'data': None, 'location': [1], 'dtype': torch.int32, 'device': 'cpu' }, (2, 0, 0): { 'start': (14, 0, 0), 'shape': (7, 3, 2), 'data': None, 'location': [2], 'dtype': torch.int32, 'device': 'cpu' }, (3, 0, 0): { 'start': (21, 0, 0), 'shape': (6, 3, 2), 'data': None, 'location': [3], 'dtype': torch.int32, 'device': 'cpu' } }, 'locals': [(rank, 0, 0)], 'get': lambda x: x, } :rtype: dictionary containing the partition interface as shown above. .. method:: __float__() -> DNDarray Float scalar casting. .. seealso:: :func:`~heat.core.manipulations.flatten` .. method:: fill_diagonal(value: float) -> DNDarray Fill the main diagonal of a 2D :class:`DNDarray`. This function modifies the input tensor in-place, and returns the input array. :param value: The value to be placed in the main diagonal of the ``DNDarray`` :type value: float .. method:: __getitem__(key: Union[int, Tuple[int, Ellipsis], List[int, Ellipsis]]) -> DNDarray Global getter function for DNDarrays. Returns a new DNDarray composed of the elements of the original tensor selected by the indices given. This does *NOT* redistribute or rebalance the resulting tensor. If the selection of values is unbalanced then the resultant tensor is also unbalanced! To redistribute the ``DNDarray``, use :func:`balance_()` (issue #187) :param key: Indices to get from the tensor. :type key: int, slice, Tuple[int,...], List[int,...] ..
rubric:: Examples >>> a = ht.arange(10, split=0) (1/2) >>> tensor([0, 1, 2, 3, 4], dtype=torch.int32) (2/2) >>> tensor([5, 6, 7, 8, 9], dtype=torch.int32) >>> a[1:6] (1/2) >>> tensor([1, 2, 3, 4], dtype=torch.int32) (2/2) >>> tensor([5], dtype=torch.int32) >>> a = ht.zeros((4, 5), split=0) (1/2) >>> tensor([[0., 0., 0., 0., 0.], [0., 0., 0., 0., 0.]]) (2/2) >>> tensor([[0., 0., 0., 0., 0.], [0., 0., 0., 0., 0.]]) >>> a[1:4, 1] (1/2) >>> tensor([0.]) (2/2) >>> tensor([0., 0.]) .. method:: gpu() -> DNDarray Returns a copy of this object in GPU memory. If this object is already in GPU memory, then no copy is performed and the original object is returned. .. method:: __int__() -> DNDarray Integer scalar casting. .. method:: is_balanced(force_check: bool = False) -> bool Determine whether the data of ``self`` is distributed evenly (or as evenly as possible) across all processes. This is equivalent to returning ``self.balanced``. If no information is available (``self.balanced is None``), the balanced status will be assessed via collective communication. :param force_check: If True, the balanced status of the ``DNDarray`` will be assessed via collective communication in any case. :type force_check: bool, optional .. method:: is_distributed() -> bool Determines whether the data of this ``DNDarray`` is distributed across multiple processes. .. method:: __key_is_singular(key: any, axis: int, self_proxy: torch.Tensor) -> bool .. method:: __key_adds_dimension(key: any, axis: int, self_proxy: torch.Tensor) -> bool .. method:: item() Returns the only element of a 1-element :class:`DNDarray`. Mirror of the PyTorch command of the same name. If the ``DNDarray`` has more than one element, a ``ValueError`` is raised (by PyTorch). .. rubric:: Examples >>> import heat as ht >>> x = ht.zeros((1)) >>> x.item() 0.0 .. method:: __len__() -> int The length of the ``DNDarray``, i.e. the number of items in the first dimension. ..
method:: numpy() -> numpy.array Returns a copy of the :class:`DNDarray` as numpy ndarray. If the ``DNDarray`` resides on the GPU, the underlying data will be copied to the CPU first. If the ``DNDarray`` is distributed, an MPI Allgather operation will be performed before converting to np.ndarray, i.e. each MPI process will end up holding a copy of the entire array in memory. Make sure process memory is sufficient! .. rubric:: Examples >>> import heat as ht >>> T1 = ht.random.randn((10,8)) >>> T1.numpy() .. method:: _repr_pretty_(p, cycle) Pretty print for IPython. .. method:: __repr__() -> str Returns a printable representation of the passed DNDarray, targeting developers. .. method:: ravel() Flattens the ``DNDarray``. .. seealso:: :func:`~heat.core.manipulations.ravel` .. rubric:: Examples >>> a = ht.ones((2, 3), split=0) >>> b = a.ravel() >>> a[0, 0] = 4 >>> b DNDarray([4., 1., 1., 1., 1., 1.], dtype=ht.float32, device=cpu:0, split=0) .. method:: redistribute_(lshape_map: Optional[torch.Tensor] = None, target_map: Optional[torch.Tensor] = None) Redistributes the data of the :class:`DNDarray` *along the split axis* to match the given target map. This function does not modify the non-split dimensions of the ``DNDarray``. This is an abstraction and extension of the balance function. :param lshape_map: The current lshape of processes. Units are ``[rank, lshape]``. :type lshape_map: torch.Tensor, optional :param target_map: The desired distribution across the processes. Units are ``[rank, target lshape]``. Note: the only important parts of the target map are the values along the split axis, values which are not along this axis are there to mimic the shape of the ``lshape_map``. :type target_map: torch.Tensor, optional ..
rubric:: Examples >>> st = ht.ones((50, 81, 67), split=2) >>> target_map = torch.zeros((st.comm.size, 3), dtype=torch.int64) >>> target_map[0, 2] = 67 >>> print(target_map) [0/2] tensor([[ 0, 0, 67], [0/2] [ 0, 0, 0], [0/2] [ 0, 0, 0]], dtype=torch.int32) [1/2] tensor([[ 0, 0, 67], [1/2] [ 0, 0, 0], [1/2] [ 0, 0, 0]], dtype=torch.int32) [2/2] tensor([[ 0, 0, 67], [2/2] [ 0, 0, 0], [2/2] [ 0, 0, 0]], dtype=torch.int32) >>> print(st.lshape) [0/2] (50, 81, 23) [1/2] (50, 81, 22) [2/2] (50, 81, 22) >>> st.redistribute_(target_map=target_map) >>> print(st.lshape) [0/2] (50, 81, 67) [1/2] (50, 81, 0) [2/2] (50, 81, 0) .. method:: __redistribute_shuffle(snd_pr: Union[int, torch.Tensor], send_amt: Union[int, torch.Tensor], rcv_pr: Union[int, torch.Tensor], snd_dtype: torch.dtype) Helper used during redistribution to shuffle data between processes along the split axis :param snd_pr: Sending process :type snd_pr: int or torch.Tensor :param send_amt: Amount of data to be sent by the sending process :type send_amt: int or torch.Tensor :param rcv_pr: Receiving process :type rcv_pr: int or torch.Tensor :param snd_dtype: Torch type of the data in question :type snd_dtype: torch.dtype .. method:: resplit_(axis: int = None) In-place option for resplitting a :class:`DNDarray`. :param axis: The new split axis, ``None`` denotes gathering, an int will set the new split axis :type axis: int or None .. rubric:: Examples >>> a = ht.zeros((4, 5), split=0) >>> a.lshape (0/2) (2, 5) (1/2) (2, 5) >>> a.resplit_(None) >>> a.split None >>> a.lshape (0/2) (4, 5) (1/2) (4, 5) >>> a = ht.zeros((4, 5), split=0) >>> a.lshape (0/2) (2, 5) (1/2) (2, 5) >>> a.resplit_(1) >>> a.split 1 >>> a.lshape (0/2) (4, 3) (1/2) (4, 2) ..
method:: __setitem__(key: Union[int, Tuple[int, Ellipsis], List[int, Ellipsis]], value: Union[float, DNDarray, torch.Tensor]) Global item setter :param key: Index/indices to be set :type key: Union[int, Tuple[int,...], List[int,...]] :param value: Value to be set to the specified positions in the DNDarray (self) :type value: Union[float, DNDarray,torch.Tensor] .. rubric:: Notes If a ``DNDarray`` is given as the value to be set, then the split axes are assumed to be equal. If they are not, PyTorch will raise an error when the values are set on the local array. .. rubric:: Examples >>> a = ht.zeros((4, 5), split=0) (1/2) >>> tensor([[0., 0., 0., 0., 0.], [0., 0., 0., 0., 0.]]) (2/2) >>> tensor([[0., 0., 0., 0., 0.], [0., 0., 0., 0., 0.]]) >>> a[1:4, 1] = 1 >>> a (1/2) >>> tensor([[0., 0., 0., 0., 0.], [0., 1., 0., 0., 0.]]) (2/2) >>> tensor([[0., 1., 0., 0., 0.], [0., 1., 0., 0., 0.]]) .. method:: __setter(key: Union[int, Tuple[int, Ellipsis], List[int, Ellipsis]], value: Union[float, DNDarray, torch.Tensor]) Utility function for checking ``value`` and forwarding to :func:`__setitem__` :raises NotImplementedError: If the type of ``value`` is not supported .. method:: __str__() -> str Computes a string representation of the passed ``DNDarray``. .. method:: tolist(keepsplit: bool = False) -> List Return a copy of the local array data as a (nested) Python list. For scalars, a standard Python number is returned. :param keepsplit: Whether the list should be returned locally or globally. :type keepsplit: bool .. rubric:: Examples >>> a = ht.array([[0, 1], [2, 3]]) >>> a.tolist() [[0, 1], [2, 3]] >>> a = ht.array([[0, 1], [2, 3]], split=0) >>> a.tolist() [[0, 1], [2, 3]] >>> a = ht.array([[0, 1], [2, 3]], split=1) >>> a.tolist(keepsplit=True) (1/2) [[0], [2]] (2/2) [[1], [3]] .. method:: __torch_function__(func, types, args=(), kwargs=None) Supports PyTorch's dispatch mechanism. ..
method:: __torch_proxy__() -> torch.Tensor Return a 1-element `torch.Tensor` strided as the global `self` shape. Used internally for sanitation purposes. .. method:: __xitem_get_key_start_stop(rank: int, actives: list, key_st: int, key_sp: int, step: int, ends: torch.Tensor, og_key_st: int) -> Tuple[int, int] .. py:class:: _KCluster(metric: Callable, n_clusters: int, init: Union[str, heat.core.dndarray.DNDarray], max_iter: int, tol: float, random_state: int) Bases: :class:`heat.ClusteringMixin`, :class:`heat.BaseEstimator` Base class for k-statistics clustering algorithms (kmeans, kmedians, kmedoids). The clusters are represented by centroids ci (we use the term from kmeans for simplicity) :param metric: One of the distance metrics in ht.spatial.distance. Needs to be passed as lambda function to take only two arrays as input :type metric: function :param n_clusters: The number of clusters to form as well as the number of centroids to generate. :type n_clusters: int :param init: Method for initialization: - ‘probability_based’ : selects initial cluster centers for the clustering in a smart way to speed up convergence (k-means++) - ‘random’: choose k observations (rows) at random from data for the initial centroids. - 'batchparallel': use the batch parallel algorithm to initialize the centroids, only available for split=0 and KMeans or KMedians - ``DNDarray``: gives the initial centers, should be of Shape = (n_clusters, n_features) :type init: str or DNDarray, default: ‘random’ :param max_iter: Maximum number of iterations for a single run. :type max_iter: int :param tol: Relative tolerance with regards to inertia to declare convergence. :type tol: float, default: 1e-4 :param random_state: Determines random number generation for centroid initialization. :type random_state: int .. attribute:: n_clusters .. attribute:: init .. attribute:: max_iter .. attribute:: tol .. attribute:: random_state .. attribute:: _metric .. 
attribute:: _cluster_centers :annotation: = None .. attribute:: _functional_value :annotation: = None .. attribute:: _labels :annotation: = None .. attribute:: _inertia :annotation: = None .. attribute:: _n_iter :annotation: = None .. attribute:: _p :annotation: = None .. role:: raw-html(raw) :format: html .. method:: _initialize_cluster_centers(x: heat.core.dndarray.DNDarray, oversampling: float, iter_multiplier: float) Initializes the K-Means centroids. :param x: The data to initialize the clusters for. Shape = (n_samples, n_features) :type x: DNDarray :param oversampling: Oversampling factor used in the k-means|| initialization of centroids :type oversampling: float :param iter_multiplier: Factor that increases the number of iterations used in the initialization of centroids :type iter_multiplier: float .. method:: _centroid_sampling_helper(x: heat.core.dndarray.DNDarray, centroids: heat.core.dndarray.DNDarray, oversampling: float, num_iters: int) Helper function for the k-means|| initialization of centroids. Samples new centroids based on a probability distribution derived from the distance of data points to the current set of centroids. :param x: The data to initialize the clusters for. Shape = (n_samples, n_features) :type x: DNDarray :param centroids: The initial set of centroids :type centroids: DNDarray :param oversampling: Oversampling factor used in the k-means|| initialization of centroids :type oversampling: float :param num_iters: Number of iterations used in the initialization of centroids :type num_iters: int .. method:: _assign_to_cluster(x: heat.core.dndarray.DNDarray, eval_functional_value: bool = False) Assigns the passed data points to the centroids based on the respective metric :param x: Data points, Shape = (n_samples, n_features) :type x: DNDarray :param eval_functional_value: If True, the current K-Clustering functional value of the clustering algorithm is evaluated :type eval_functional_value: bool, default: False ..
method:: _update_centroids(x: heat.core.dndarray.DNDarray, matching_centroids: heat.core.dndarray.DNDarray) The Update strategy is algorithm specific (e.g. calculate mean of assigned points for kmeans, median for kmedians, etc.) :param x: Input Data :type x: DNDarray :param matching_centroids: Index array of assigned centroids :type matching_centroids: DNDarray .. method:: fit(x: heat.core.dndarray.DNDarray) Computes the centroid of the clustering algorithm to fit the data ``x``. The full pipeline is algorithm specific. :param x: Training instances to cluster. Shape = (n_samples, n_features) :type x: DNDarray .. method:: predict(x: heat.core.dndarray.DNDarray) Predict the closest cluster each sample in ``x`` belongs to. In the vector quantization literature, :func:`cluster_centers_` is called the code book and each value returned by predict is the index of the closest code in the code book. :param x: New data to predict. Shape = (n_samples, n_features) :type x: DNDarray .. py:class:: _KCluster(metric: Callable, n_clusters: int, init: Union[str, heat.core.dndarray.DNDarray], max_iter: int, tol: float, random_state: int) Bases: :class:`heat.ClusteringMixin`, :class:`heat.BaseEstimator` Base class for k-statistics clustering algorithms (kmeans, kmedians, kmedoids). The clusters are represented by centroids ci (we use the term from kmeans for simplicity) :param metric: One of the distance metrics in ht.spatial.distance. Needs to be passed as lambda function to take only two arrays as input :type metric: function :param n_clusters: The number of clusters to form as well as the number of centroids to generate. :type n_clusters: int :param init: Method for initialization: - ‘probability_based’ : selects initial cluster centers for the clustering in a smart way to speed up convergence (k-means++) - ‘random’: choose k observations (rows) at random from data for the initial centroids. 
- 'batchparallel': use the batch parallel algorithm to initialize the centroids, only available for split=0 and KMeans or KMedians - ``DNDarray``: gives the initial centers, should be of Shape = (n_clusters, n_features) :type init: str or DNDarray, default: ‘random’ :param max_iter: Maximum number of iterations for a single run. :type max_iter: int :param tol: Relative tolerance with regards to inertia to declare convergence. :type tol: float, default: 1e-4 :param random_state: Determines random number generation for centroid initialization. :type random_state: int .. attribute:: n_clusters .. attribute:: init .. attribute:: max_iter .. attribute:: tol .. attribute:: random_state .. attribute:: _metric .. attribute:: _cluster_centers :annotation: = None .. attribute:: _functional_value :annotation: = None .. attribute:: _labels :annotation: = None .. attribute:: _inertia :annotation: = None .. attribute:: _n_iter :annotation: = None .. attribute:: _p :annotation: = None .. role:: raw-html(raw) :format: html .. method:: _initialize_cluster_centers(x: heat.core.dndarray.DNDarray, oversampling: float, iter_multiplier: float) Initializes the K-Means centroids. :param x: The data to initialize the clusters for. Shape = (n_samples, n_features) :type x: DNDarray :param oversampling: oversampling factor used in the k-means|| initializiation of centroids :type oversampling: float :param iter_multiplier: factor that increases the number of iterations used in the initialization of centroids :type iter_multiplier: float .. method:: _centroid_sampling_helper(x: heat.core.dndarray.DNDarray, centroids: heat.core.dndarray.DNDarray, oversampling: float, num_iters: int) Helper function for the k-means|| initialization of centroids. Samples new centroids based on a probability distribution derived from the distance of data points to the current set of centroids. :param x: The data to initialize the clusters for. 
Shape = (n_samples, n_features) :type x: DNDarray :param centroids: The initial set of centroids :type centroids: DNDarray :param oversampling: oversampling factor used in the k-means|| initializiation of centroids :type oversampling: float :param num_iters: number of iterations used in the initialization of centroids :type num_iters: float .. method:: _assign_to_cluster(x: heat.core.dndarray.DNDarray, eval_functional_value: bool = False) Assigns the passed data points to the centroids based on the respective metric :param x: Data points, Shape = (n_samples, n_features) :type x: DNDarray :param eval_functional_value: If True, the current K-Clustering functional value of the clustering algorithm is evaluated :type eval_functional_value: bool, default: False .. method:: _update_centroids(x: heat.core.dndarray.DNDarray, matching_centroids: heat.core.dndarray.DNDarray) The Update strategy is algorithm specific (e.g. calculate mean of assigned points for kmeans, median for kmedians, etc.) :param x: Input Data :type x: DNDarray :param matching_centroids: Index array of assigned centroids :type matching_centroids: DNDarray .. method:: fit(x: heat.core.dndarray.DNDarray) Computes the centroid of the clustering algorithm to fit the data ``x``. The full pipeline is algorithm specific. :param x: Training instances to cluster. Shape = (n_samples, n_features) :type x: DNDarray .. method:: predict(x: heat.core.dndarray.DNDarray) Predict the closest cluster each sample in ``x`` belongs to. In the vector quantization literature, :func:`cluster_centers_` is called the code book and each value returned by predict is the index of the closest code in the code book. :param x: New data to predict. Shape = (n_samples, n_features) :type x: DNDarray .. py:class:: DNDarray(array: torch.Tensor, gshape: Tuple[int, Ellipsis], dtype: heat.core.types.datatype, split: Union[int, None], device: heat.core.devices.Device, comm: Communication, balanced: bool) Distributed N-Dimensional array. 
The core element of HeAT. It is composed of PyTorch tensors local to each process. :param array: Local array elements :type array: torch.Tensor :param gshape: The global shape of the array :type gshape: Tuple[int,...] :param dtype: The datatype of the array :type dtype: datatype :param split: The axis on which the array is divided between processes :type split: int or None :param device: The device on which the local arrays are using (cpu or gpu) :type device: Device :param comm: The communications object for sending and receiving data :type comm: Communication :param balanced: Describes whether the data are evenly distributed across processes. If this information is not available (``self.balanced is None``), it can be gathered via the :func:`is_balanced()` method (requires communication). :type balanced: bool or None .. attribute:: __array .. attribute:: __gshape .. attribute:: __dtype .. attribute:: __split .. attribute:: __device .. attribute:: __comm .. attribute:: __balanced .. attribute:: __ishalo :annotation: = False .. attribute:: __halo_next :annotation: = None .. attribute:: __halo_prev :annotation: = None .. attribute:: __partitions_dict__ :annotation: = None .. attribute:: __lshape_map :annotation: = None .. role:: raw-html(raw) :format: html .. method:: __prephalo(start, end) -> torch.Tensor Extracts the halo indexed by start, end from ``self.array`` in the direction of ``self.split`` :param start: Start index of the halo extracted from ``self.array`` :type start: int :param end: End index of the halo extracted from ``self.array`` :type end: int .. method:: get_halo(halo_size: int, prev: bool = True, next: bool = True) Fetch halos of size ``halo_size`` from neighboring ranks and save them in ``self.halo_next/self.halo_prev``. :param halo_size: Size of the halo. :type halo_size: int :param prev: If True, fetch the halo from the previous rank. Default: True. :type prev: bool, optional :param next: If True, fetch the halo from the next rank. 
Default: True. :type next: bool, optional .. method:: __cat_halo() -> torch.Tensor Return local array concatenated to halos if they are available. .. method:: __array__() -> numpy.ndarray Returns a view of the process-local slice of the :class:`DNDarray` as a numpy ndarray, if the ``DNDarray`` resides on CPU. Otherwise, it returns a copy, on CPU, of the process-local slice of ``DNDarray`` as numpy ndarray. .. method:: __array_ufunc__(ufunc, method, *inputs, **kwargs) Override NumPy's universal functions. .. method:: __array_function__(func, types, args, kwargs) Augments NumPy's functions. .. method:: astype(dtype, copy=True) -> DNDarray Returns a casted version of this array. Casted array is a new array of the same shape but with given type of this array. If copy is ``True``, the same array is returned instead. :param dtype: Heat type to which the array is cast :type dtype: datatype :param copy: By default the operation returns a copy of this array. If copy is set to ``False`` the cast is performed in-place and this array is returned :type copy: bool, optional .. method:: balance_() -> DNDarray Function for balancing a :class:`DNDarray` between all nodes. To determine if this is needed use the :func:`is_balanced()` function. If the ``DNDarray`` is already balanced this function will do nothing. This function modifies the ``DNDarray`` itself and will not return anything. .. rubric:: Examples >>> a = ht.zeros((10, 2), split=0) >>> a[:, 0] = ht.arange(10) >>> b = a[3:] [0/2] tensor([[3., 0.], [1/2] tensor([[4., 0.], [5., 0.], [6., 0.]]) [2/2] tensor([[7., 0.], [8., 0.], [9., 0.]]) >>> b.balance_() >>> print(b.gshape, b.lshape) [0/2] (7, 2) (1, 2) [1/2] (7, 2) (3, 2) [2/2] (7, 2) (3, 2) >>> b [0/2] tensor([[3., 0.], [4., 0.], [5., 0.]]) [1/2] tensor([[6., 0.], [7., 0.]]) [2/2] tensor([[8., 0.], [9., 0.]]) >>> print(b.gshape, b.lshape) [0/2] (7, 2) (3, 2) [1/2] (7, 2) (2, 2) [2/2] (7, 2) (2, 2) .. method:: __bool__() -> bool Boolean scalar casting. .. 
method:: __cast(cast_function) -> Union[float, int] Implements a generic cast function for ``DNDarray`` objects. :param cast_function: The actual cast function, e.g. ``float`` or ``int`` :type cast_function: function :raises TypeError: If the ``DNDarray`` object cannot be converted into a scalar. .. method:: collect_(target_rank: Optional[int] = 0) -> None A method collecting a distributed DNDarray to one MPI rank, chosen by the `target_rank` variable. It is a specific case of the ``redistribute_`` method. :param target_rank: The rank to which the DNDarray will be collected. Default: 0. :type target_rank: int, optional :raises TypeError: If the target rank is not an integer. :raises ValueError: If the target rank is out of bounds. .. rubric:: Examples >>> st = ht.ones((50, 81, 67), split=2) >>> print(st.lshape) [0/2] (50, 81, 23) [1/2] (50, 81, 22) [2/2] (50, 81, 22) >>> st.collect_() >>> print(st.lshape) [0/2] (50, 81, 67) [1/2] (50, 81, 0) [2/2] (50, 81, 0) >>> st.collect_(1) >>> print(st.lshape) [0/2] (50, 81, 0) [1/2] (50, 81, 67) [2/2] (50, 81, 0) .. method:: __complex__() -> DNDarray Complex scalar casting. .. method:: counts_displs() -> Tuple[Tuple[int], Tuple[int]] Returns actual counts (number of items per process) and displacements (offsets) of the DNDarray. Does not assume load balance. .. method:: cpu() -> DNDarray Returns a copy of this object in main memory. If this object is already in main memory, then no copy is performed and the original object is returned. .. method:: create_lshape_map(force_check: bool = False) -> torch.Tensor Generate a 'map' of the lshapes of the data on all processes. Units are ``(process rank, lshape)`` :param force_check: if False (default) and the lshape map has already been created, use the previous result. Otherwise, create the lshape_map :type force_check: bool, optional .. method:: create_partition_interface() Create a partition interface in line with the DPPY proposal. This is subject to change. 
The intention of this to facilitate the usage of a general format for the referencing of distributed datasets. An example of the output and shape is shown below. __partitioned__ = { 'shape': (27, 3, 2), 'partition_tiling': (4, 1, 1), 'partitions': { (0, 0, 0): { 'start': (0, 0, 0), 'shape': (7, 3, 2), 'data': tensor([...], dtype=torch.int32), 'location': [0], 'dtype': torch.int32, 'device': 'cpu' }, (1, 0, 0): { 'start': (7, 0, 0), 'shape': (7, 3, 2), 'data': None, 'location': [1], 'dtype': torch.int32, 'device': 'cpu' }, (2, 0, 0): { 'start': (14, 0, 0), 'shape': (7, 3, 2), 'data': None, 'location': [2], 'dtype': torch.int32, 'device': 'cpu' }, (3, 0, 0): { 'start': (21, 0, 0), 'shape': (6, 3, 2), 'data': None, 'location': [3], 'dtype': torch.int32, 'device': 'cpu' } }, 'locals': [(rank, 0, 0)], 'get': lambda x: x, } :rtype: dictionary containing the partition interface as shown above. .. method:: __float__() -> DNDarray Float scalar casting. .. seealso:: :func:`~heat.core.manipulations.flatten` .. method:: fill_diagonal(value: float) -> DNDarray Fill the main diagonal of a 2D :class:`DNDarray`. This function modifies the input tensor in-place, and returns the input array. :param value: The value to be placed in the ``DNDarrays`` main diagonal :type value: float .. method:: __getitem__(key: Union[int, Tuple[int, Ellipsis], List[int, Ellipsis]]) -> DNDarray Global getter function for DNDarrays. Returns a new DNDarray composed of the elements of the original tensor selected by the indices given. This does *NOT* redistribute or rebalance the resulting tensor. If the selection of values is unbalanced then the resultant tensor is also unbalanced! To redistributed the ``DNDarray`` use :func:`balance()` (issue #187) :param key: Indices to get from the tensor. :type key: int, slice, Tuple[int,...], List[int,...] .. 
rubric:: Examples >>> a = ht.arange(10, split=0) (1/2) tensor([0, 1, 2, 3, 4], dtype=torch.int32) (2/2) tensor([5, 6, 7, 8, 9], dtype=torch.int32) >>> a[1:6] (1/2) tensor([1, 2, 3, 4], dtype=torch.int32) (2/2) tensor([5], dtype=torch.int32) >>> a = ht.zeros((4, 5), split=0) (1/2) tensor([[0., 0., 0., 0., 0.], [0., 0., 0., 0., 0.]]) (2/2) tensor([[0., 0., 0., 0., 0.], [0., 0., 0., 0., 0.]]) >>> a[1:4, 1] (1/2) tensor([0.]) (2/2) tensor([0., 0.]) .. method:: gpu() -> DNDarray Returns a copy of this object in GPU memory. If this object is already in GPU memory, then no copy is performed and the original object is returned. .. method:: __int__() -> DNDarray Integer scalar casting. .. method:: is_balanced(force_check: bool = False) -> bool Determine if ``self`` is distributed evenly (or as evenly as possible) across all processes. This is equivalent to returning ``self.balanced``. If no information is available (``self.balanced is None``), the balanced status will be assessed via collective communication. :param force_check: If True, the balanced status of the ``DNDarray`` will be assessed via collective communication in any case. :type force_check: bool, optional .. method:: is_distributed() -> bool Determines whether the data of this ``DNDarray`` is distributed across multiple processes. .. method:: __key_is_singular(key: any, axis: int, self_proxy: torch.Tensor) -> bool .. method:: __key_adds_dimension(key: any, axis: int, self_proxy: torch.Tensor) -> bool .. method:: item() Returns the only element of a 1-element :class:`DNDarray`. Mirror of the PyTorch command by the same name. If the ``DNDarray`` holds more than one element, a ``ValueError`` is raised (by PyTorch). .. rubric:: Examples >>> import heat as ht >>> x = ht.zeros((1)) >>> x.item() 0.0 .. method:: __len__() -> int The length of the ``DNDarray``, i.e. the number of items in the first dimension. .. 
method:: numpy() -> numpy.array Returns a copy of the :class:`DNDarray` as a NumPy ndarray. If the ``DNDarray`` resides on the GPU, the underlying data will be copied to the CPU first. If the ``DNDarray`` is distributed, an MPI Allgather operation will be performed before converting to np.ndarray, i.e. each MPI process will end up holding a copy of the entire array in memory. Make sure process memory is sufficient! .. rubric:: Examples >>> import heat as ht >>> T1 = ht.random.randn((10,8)) >>> T1.numpy() .. method:: _repr_pretty_(p, cycle) Pretty print for IPython. .. method:: __repr__() -> str Returns a printable representation of the passed DNDarray, targeting developers. .. method:: ravel() Flattens the ``DNDarray``. .. seealso:: :func:`~heat.core.manipulations.ravel` .. rubric:: Examples >>> a = ht.ones((2, 3), split=0) >>> b = a.ravel() >>> a[0, 0] = 4 >>> b DNDarray([4., 1., 1., 1., 1., 1.], dtype=ht.float32, device=cpu:0, split=0) .. method:: redistribute_(lshape_map: Optional[torch.Tensor] = None, target_map: Optional[torch.Tensor] = None) Redistributes the data of the :class:`DNDarray` *along the split axis* to match the given target map. This function does not modify the non-split dimensions of the ``DNDarray``. This is an abstraction and extension of the balance function. :param lshape_map: The current lshape of processes. Units are ``[rank, lshape]``. :type lshape_map: torch.Tensor, optional :param target_map: The desired distribution across the processes. Units are ``[rank, target lshape]``. Note: only the values along the split axis matter; values which are not along this axis are there only to mimic the shape of the ``lshape_map``. :type target_map: torch.Tensor, optional .. 
rubric:: Examples >>> st = ht.ones((50, 81, 67), split=2) >>> target_map = torch.zeros((st.comm.size, 3), dtype=torch.int64) >>> target_map[0, 2] = 67 >>> print(target_map) [0/2] tensor([[ 0, 0, 67], [0/2] [ 0, 0, 0], [0/2] [ 0, 0, 0]]) [1/2] tensor([[ 0, 0, 67], [1/2] [ 0, 0, 0], [1/2] [ 0, 0, 0]]) [2/2] tensor([[ 0, 0, 67], [2/2] [ 0, 0, 0], [2/2] [ 0, 0, 0]]) >>> print(st.lshape) [0/2] (50, 81, 23) [1/2] (50, 81, 22) [2/2] (50, 81, 22) >>> st.redistribute_(target_map=target_map) >>> print(st.lshape) [0/2] (50, 81, 67) [1/2] (50, 81, 0) [2/2] (50, 81, 0) .. method:: __redistribute_shuffle(snd_pr: Union[int, torch.Tensor], send_amt: Union[int, torch.Tensor], rcv_pr: Union[int, torch.Tensor], snd_dtype: torch.dtype) Abstracts the function used during redistribution to shuffle data between processes along the split axis. :param snd_pr: Sending process :type snd_pr: int or torch.Tensor :param send_amt: Amount of data to be sent by the sending process :type send_amt: int or torch.Tensor :param rcv_pr: Receiving process :type rcv_pr: int or torch.Tensor :param snd_dtype: Torch type of the data in question :type snd_dtype: torch.dtype .. method:: resplit_(axis: int = None) In-place option for resplitting a :class:`DNDarray`. :param axis: The new split axis. ``None`` denotes gathering, an int sets the new split axis :type axis: int or None .. rubric:: Examples >>> a = ht.zeros((4, 5), split=0) >>> a.lshape (0/2) (2, 5) (1/2) (2, 5) >>> a.resplit_(None) >>> a.split None >>> a.lshape (0/2) (4, 5) (1/2) (4, 5) >>> a = ht.zeros((4, 5), split=0) >>> a.lshape (0/2) (2, 5) (1/2) (2, 5) >>> a.resplit_(1) >>> a.split 1 >>> a.lshape (0/2) (4, 3) (1/2) (4, 2) .. 
method:: __setitem__(key: Union[int, Tuple[int, Ellipsis], List[int, Ellipsis]], value: Union[float, DNDarray, torch.Tensor]) Global item setter :param key: Index/indices to be set :type key: Union[int, Tuple[int,...], List[int,...]] :param value: Value to be set to the specified positions in the DNDarray (self) :type value: Union[float, DNDarray, torch.Tensor] .. rubric:: Notes If a ``DNDarray`` is given as the value to be set then the split axes are assumed to be equal. If they are not, PyTorch will raise an error when the values are set on the local array. .. rubric:: Examples >>> a = ht.zeros((4, 5), split=0) (1/2) tensor([[0., 0., 0., 0., 0.], [0., 0., 0., 0., 0.]]) (2/2) tensor([[0., 0., 0., 0., 0.], [0., 0., 0., 0., 0.]]) >>> a[1:4, 1] = 1 >>> a (1/2) tensor([[0., 0., 0., 0., 0.], [0., 1., 0., 0., 0.]]) (2/2) tensor([[0., 1., 0., 0., 0.], [0., 1., 0., 0., 0.]]) .. method:: __setter(key: Union[int, Tuple[int, Ellipsis], List[int, Ellipsis]], value: Union[float, DNDarray, torch.Tensor]) Utility function for checking ``value`` and forwarding to :func:`__setitem__` :raises NotImplementedError: If the type of ``value`` is not supported .. method:: __str__() -> str Computes a string representation of the passed ``DNDarray``. .. method:: tolist(keepsplit: bool = False) -> List Return a copy of the local array data as a (nested) Python list. For scalars, a standard Python number is returned. :param keepsplit: Whether the list should be returned locally or globally. :type keepsplit: bool .. rubric:: Examples >>> a = ht.array([[0, 1], [2, 3]]) >>> a.tolist() [[0, 1], [2, 3]] >>> a = ht.array([[0, 1], [2, 3]], split=0) >>> a.tolist() [[0, 1], [2, 3]] >>> a = ht.array([[0, 1], [2, 3]], split=1) >>> a.tolist(keepsplit=True) (1/2) [[0], [2]] (2/2) [[1], [3]] .. method:: __torch_function__(func, types, args=(), kwargs=None) Supports PyTorch's dispatch mechanism. .. 
method:: __torch_proxy__() -> torch.Tensor Return a 1-element ``torch.Tensor`` strided as the global ``self`` shape. Used internally for sanitation purposes. .. method:: __xitem_get_key_start_stop(rank: int, actives: list, key_st: int, key_sp: int, step: int, ends: torch.Tensor, og_key_st: int) -> Tuple[int, int] .. py:class:: KMeans(n_clusters: int = 8, init: Union[str, heat.core.dndarray.DNDarray] = 'random', max_iter: int = 300, tol: float = 0.0001, random_state: Optional[int] = None) Bases: :class:`heat.cluster._kcluster._KCluster` K-Means clustering algorithm. An implementation of Lloyd's algorithm [1]. :ivar n_clusters: The number of clusters to form as well as the number of centroids to generate. :vartype n_clusters: int :ivar init: Method for initialization: - ‘k-means++’ : selects initial cluster centers for the clustering in a smart way to speed up convergence [2]. - ‘random’: choose k observations (rows) at random from data for the initial centroids. - 'batchparallel': initialize by using the batch parallel algorithm (see BatchParallelKMeans for more information). - DNDarray: it should be of shape (n_clusters, n_features) and gives the initial centers. :vartype init: str or DNDarray :ivar max_iter: Maximum number of iterations of the k-means algorithm for a single run. :vartype max_iter: int :ivar tol: Relative tolerance with regard to inertia to declare convergence. :vartype tol: float :ivar random_state: Determines random number generation for centroid initialization. :vartype random_state: int .. rubric:: Notes The average complexity is given by :math:`O(k \cdot n \cdot T)`, where :math:`k` is the number of clusters, :math:`n` is the number of samples and :math:`T` is the number of iterations. In practice, the k-means algorithm is very fast, but it may fall into local minima. That is why it can be useful to restart it several times. If the algorithm stops before fully converging (because of ``tol`` or ``max_iter``), ``labels_`` and ``cluster_centers_`` will not be consistent, i.e.
the ``cluster_centers_`` will not be the means of the points in each cluster. Also, the estimator will reassign ``labels_`` after the last iteration to make ``labels_`` consistent with ``predict`` on the training set. .. rubric:: References [1] Lloyd, Stuart P., "Least squares quantization in PCM", IEEE Transactions on Information Theory, 28 (2), pp. 129–137, 1982. [2] Arthur, D., Vassilvitskii, S., "k-means++: The Advantages of Careful Seeding", Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA. pp. 1027–1035, 2007. .. attribute:: _p :annotation: = 2 .. role:: raw-html(raw) :format: html .. method:: _update_centroids(x: heat.core.dndarray.DNDarray, matching_centroids: heat.core.dndarray.DNDarray) Compute coordinates of new centroid as mean of the data points in ``x`` that are assigned to this centroid. :param x: Input data :type x: DNDarray :param matching_centroids: Array filled with indices ``i`` indicating to which cluster ``ci`` each sample point in ``x`` is assigned :type matching_centroids: DNDarray .. method:: fit(x: heat.core.dndarray.DNDarray, oversampling: float = 2, iter_multiplier: float = 1) -> KMeans.fit.self Computes the centroids of a k-means clustering. Reduce the values of the parameters ``oversampling`` and ``iter_multiplier`` to speed up the computation, if necessary. However, if the values are too low, the initialization of cluster centers might fail and raise a corresponding ValueError. :param x: Training instances to cluster. Shape = (n_samples, n_features) :type x: DNDarray :param oversampling: oversampling factor used for the k-means|| initialization of centroids :type oversampling: float :param iter_multiplier: factor that increases the number of iterations used in the initialization of centroids :type iter_multiplier: float .. 
py:class:: _KCluster(metric: Callable, n_clusters: int, init: Union[str, heat.core.dndarray.DNDarray], max_iter: int, tol: float, random_state: int) Bases: :class:`heat.ClusteringMixin`, :class:`heat.BaseEstimator` Base class for k-statistics clustering algorithms (kmeans, kmedians, kmedoids). The clusters are represented by centroids ci (we use the term from kmeans for simplicity) :param metric: One of the distance metrics in ht.spatial.distance. Needs to be passed as lambda function to take only two arrays as input :type metric: function :param n_clusters: The number of clusters to form as well as the number of centroids to generate. :type n_clusters: int :param init: Method for initialization: - ‘probability_based’ : selects initial cluster centers for the clustering in a smart way to speed up convergence (k-means++) - ‘random’: choose k observations (rows) at random from data for the initial centroids. - 'batchparallel': use the batch parallel algorithm to initialize the centroids, only available for split=0 and KMeans or KMedians - ``DNDarray``: gives the initial centers, should be of Shape = (n_clusters, n_features) :type init: str or DNDarray, default: ‘random’ :param max_iter: Maximum number of iterations for a single run. :type max_iter: int :param tol: Relative tolerance with regards to inertia to declare convergence. :type tol: float, default: 1e-4 :param random_state: Determines random number generation for centroid initialization. :type random_state: int .. attribute:: n_clusters .. attribute:: init .. attribute:: max_iter .. attribute:: tol .. attribute:: random_state .. attribute:: _metric .. attribute:: _cluster_centers :annotation: = None .. attribute:: _functional_value :annotation: = None .. attribute:: _labels :annotation: = None .. attribute:: _inertia :annotation: = None .. attribute:: _n_iter :annotation: = None .. attribute:: _p :annotation: = None .. role:: raw-html(raw) :format: html .. 
method:: _initialize_cluster_centers(x: heat.core.dndarray.DNDarray, oversampling: float, iter_multiplier: float) Initializes the K-Means centroids. :param x: The data to initialize the clusters for. Shape = (n_samples, n_features) :type x: DNDarray :param oversampling: oversampling factor used in the k-means|| initialization of centroids :type oversampling: float :param iter_multiplier: factor that increases the number of iterations used in the initialization of centroids :type iter_multiplier: float .. method:: _centroid_sampling_helper(x: heat.core.dndarray.DNDarray, centroids: heat.core.dndarray.DNDarray, oversampling: float, num_iters: int) Helper function for the k-means|| initialization of centroids. Samples new centroids based on a probability distribution derived from the distance of data points to the current set of centroids. :param x: The data to initialize the clusters for. Shape = (n_samples, n_features) :type x: DNDarray :param centroids: The initial set of centroids :type centroids: DNDarray :param oversampling: oversampling factor used in the k-means|| initialization of centroids :type oversampling: float :param num_iters: number of iterations used in the initialization of centroids :type num_iters: int .. method:: _assign_to_cluster(x: heat.core.dndarray.DNDarray, eval_functional_value: bool = False) Assigns the passed data points to the centroids based on the respective metric :param x: Data points, Shape = (n_samples, n_features) :type x: DNDarray :param eval_functional_value: If True, the current K-Clustering functional value of the clustering algorithm is evaluated :type eval_functional_value: bool, default: False .. method:: _update_centroids(x: heat.core.dndarray.DNDarray, matching_centroids: heat.core.dndarray.DNDarray) The update strategy is algorithm specific (e.g. calculate mean of assigned points for kmeans, median for kmedians, etc.)
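The algorithm-specific update step can be illustrated with a minimal single-process sketch in plain Python. This is not Heat's implementation: the helper name ``update_centroids``, its argument layout, and the handling of empty clusters are assumptions made for illustration only. It mirrors the ``_p`` convention used by the subclasses, where 2 selects the mean (k-means) and 1 selects the median (k-medians).

```python
from statistics import mean, median


def update_centroids(points, labels, n_clusters, p):
    """Recompute each centroid from the points currently assigned to it.

    Hypothetical helper for illustration, not part of the Heat API.
    p == 2 uses the coordinate-wise mean (k-means);
    any other p uses the coordinate-wise median (k-medians).
    """
    reduce = mean if p == 2 else median
    centroids = []
    for c in range(n_clusters):
        # Collect the sample points whose label matches this cluster index.
        members = [pt for pt, lab in zip(points, labels) if lab == c]
        if not members:
            # Assumption: an empty cluster is left for the caller to handle.
            centroids.append(None)
            continue
        # zip(*members) iterates over the feature columns of the members.
        centroids.append([reduce(column) for column in zip(*members)])
    return centroids


points = [[0.0, 0.0], [1.0, 0.0], [8.0, 0.0], [10.0, 10.0]]
labels = [0, 0, 0, 1]
kmeans_centroids = update_centroids(points, labels, n_clusters=2, p=2)
kmedians_centroids = update_centroids(points, labels, n_clusters=2, p=1)
```

With the outlier at ``8.0`` in cluster 0, the mean-based centroid is pulled further from the bulk of the points than the median-based one, which is the usual motivation for preferring k-medians on data with outliers.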
:param x: Input Data :type x: DNDarray :param matching_centroids: Index array of assigned centroids :type matching_centroids: DNDarray .. method:: fit(x: heat.core.dndarray.DNDarray) Computes the centroid of the clustering algorithm to fit the data ``x``. The full pipeline is algorithm specific. :param x: Training instances to cluster. Shape = (n_samples, n_features) :type x: DNDarray .. method:: predict(x: heat.core.dndarray.DNDarray) Predict the closest cluster each sample in ``x`` belongs to. In the vector quantization literature, :func:`cluster_centers_` is called the code book and each value returned by predict is the index of the closest code in the code book. :param x: New data to predict. Shape = (n_samples, n_features) :type x: DNDarray .. 
py:class:: KMedians(n_clusters: int = 8, init: Union[str, heat.core.dndarray.DNDarray] = 'random', max_iter: int = 300, tol: float = 0.0001, random_state: int = None) Bases: :class:`heat.cluster._kcluster._KCluster` K-Medians clustering algorithm [1]. Uses the Manhattan (City-block, :math:`L_1`) metric for distance calculations :param n_clusters: The number of clusters to form as well as the number of centroids to generate. :type n_clusters: int, optional, default: 8 :param init: Method for initialization: - ‘k-medians++’ : selects initial cluster centers for the clustering in a smart way to speed up convergence [2]. - ‘random’: choose k observations (rows) at random from data for the initial centroids. - 'batchparallel': initialize by using the batch parallel algorithm (see BatchParallelKMedians for more information). - DNDarray: gives the initial centers, should be of Shape = (n_clusters, n_features) :type init: str or DNDarray, default: ‘random’ :param max_iter: Maximum number of iterations of the k-means algorithm for a single run. :type max_iter: int, default: 300 :param tol: Relative tolerance with regards to inertia to declare convergence. :type tol: float, default: 1e-4 :param random_state: Determines random number generation for centroid initialization. :type random_state: int .. rubric:: References [1] Hakimi, S., and O. Kariv. "An algorithmic approach to network location problems II: The p-medians." SIAM Journal on Applied Mathematics 37.3 (1979): 539-560. .. attribute:: _p :annotation: = 1 .. role:: raw-html(raw) :format: html .. 
method:: _update_centroids(x: heat.core.dndarray.DNDarray, matching_centroids: heat.core.dndarray.DNDarray) Compute coordinates of new centroid as median of the data points in ``x`` that are assigned to it. :param x: Input data :type x: DNDarray :param matching_centroids: Array filled with indices ``i`` indicating to which cluster ``ci`` each sample point in ``x`` is assigned :type matching_centroids: DNDarray .. method:: fit(x: heat.core.dndarray.DNDarray, oversampling: float = 2, iter_multiplier: float = 1) Computes the centroids of a k-medians clustering. :param x: Training instances to cluster. Shape = (n_samples, n_features) :type x: DNDarray :param oversampling: oversampling factor used in the k-means|| initialization of centroids :type oversampling: float :param iter_multiplier: factor that increases the number of iterations used in the initialization of centroids :type iter_multiplier: float .. py:class:: _KCluster(metric: Callable, n_clusters: int, init: Union[str, heat.core.dndarray.DNDarray], max_iter: int, tol: float, random_state: int) Bases: :class:`heat.ClusteringMixin`, :class:`heat.BaseEstimator` Base class for k-statistics clustering algorithms (kmeans, kmedians, kmedoids). The clusters are represented by centroids ci (we use the term from kmeans for simplicity). :param metric: One of the distance metrics in ht.spatial.distance. Needs to be passed as a lambda function that takes only two arrays as input :type metric: function :param n_clusters: The number of clusters to form as well as the number of centroids to generate. :type n_clusters: int :param init: Method for initialization: - ‘probability_based’ : selects initial cluster centers for the clustering in a smart way to speed up convergence (k-means++) - ‘random’: choose k observations (rows) at random from data for the initial centroids.
- 'batchparallel': use the batch parallel algorithm to initialize the centroids, only available for split=0 and KMeans or KMedians - ``DNDarray``: gives the initial centers, should be of Shape = (n_clusters, n_features) :type init: str or DNDarray, default: ‘random’ :param max_iter: Maximum number of iterations for a single run. :type max_iter: int :param tol: Relative tolerance with regard to inertia to declare convergence. :type tol: float, default: 1e-4 :param random_state: Determines random number generation for centroid initialization. :type random_state: int .. attribute:: n_clusters .. attribute:: init .. attribute:: max_iter .. attribute:: tol .. attribute:: random_state .. attribute:: _metric .. attribute:: _cluster_centers :annotation: = None .. attribute:: _functional_value :annotation: = None .. attribute:: _labels :annotation: = None .. attribute:: _inertia :annotation: = None .. attribute:: _n_iter :annotation: = None .. attribute:: _p :annotation: = None .. role:: raw-html(raw) :format: html .. method:: _initialize_cluster_centers(x: heat.core.dndarray.DNDarray, oversampling: float, iter_multiplier: float) Initializes the K-Means centroids. :param x: The data to initialize the clusters for. Shape = (n_samples, n_features) :type x: DNDarray :param oversampling: oversampling factor used in the k-means|| initialization of centroids :type oversampling: float :param iter_multiplier: factor that increases the number of iterations used in the initialization of centroids :type iter_multiplier: float .. method:: _centroid_sampling_helper(x: heat.core.dndarray.DNDarray, centroids: heat.core.dndarray.DNDarray, oversampling: float, num_iters: int) Helper function for the k-means|| initialization of centroids. Samples new centroids based on a probability distribution derived from the distance of data points to the current set of centroids. :param x: The data to initialize the clusters for.
Shape = (n_samples, n_features) :type x: DNDarray :param centroids: The initial set of centroids :type centroids: DNDarray :param oversampling: oversampling factor used in the k-means|| initialization of centroids :type oversampling: float :param num_iters: number of iterations used in the initialization of centroids :type num_iters: int .. method:: _assign_to_cluster(x: heat.core.dndarray.DNDarray, eval_functional_value: bool = False) Assigns the passed data points to the centroids based on the respective metric. :param x: Data points, Shape = (n_samples, n_features) :type x: DNDarray :param eval_functional_value: If True, the current K-Clustering functional value of the clustering algorithm is evaluated :type eval_functional_value: bool, default: False .. method:: _update_centroids(x: heat.core.dndarray.DNDarray, matching_centroids: heat.core.dndarray.DNDarray) The update strategy is algorithm specific (e.g. calculate mean of assigned points for kmeans, median for kmedians, etc.) :param x: Input Data :type x: DNDarray :param matching_centroids: Index array of assigned centroids :type matching_centroids: DNDarray .. method:: fit(x: heat.core.dndarray.DNDarray) Computes the centroids of the clustering algorithm to fit the data ``x``. The full pipeline is algorithm specific. :param x: Training instances to cluster. Shape = (n_samples, n_features) :type x: DNDarray .. method:: predict(x: heat.core.dndarray.DNDarray) Predict the closest cluster each sample in ``x`` belongs to. In the vector quantization literature, :func:`cluster_centers_` is called the code book and each value returned by predict is the index of the closest code in the code book. :param x: New data to predict. Shape = (n_samples, n_features) :type x: DNDarray .. py:class:: DNDarray(array: torch.Tensor, gshape: Tuple[int, Ellipsis], dtype: heat.core.types.datatype, split: Union[int, None], device: heat.core.devices.Device, comm: Communication, balanced: bool) Distributed N-Dimensional array.
The core element of HeAT. It is composed of PyTorch tensors local to each process. :param array: Local array elements :type array: torch.Tensor :param gshape: The global shape of the array :type gshape: Tuple[int,...] :param dtype: The datatype of the array :type dtype: datatype :param split: The axis on which the array is divided between processes :type split: int or None :param device: The device on which the local arrays reside (cpu or gpu) :type device: Device :param comm: The communications object for sending and receiving data :type comm: Communication :param balanced: Describes whether the data are evenly distributed across processes. If this information is not available (``self.balanced is None``), it can be gathered via the :func:`is_balanced()` method (requires communication). :type balanced: bool or None .. attribute:: __array .. attribute:: __gshape .. attribute:: __dtype .. attribute:: __split .. attribute:: __device .. attribute:: __comm .. attribute:: __balanced .. attribute:: __ishalo :annotation: = False .. attribute:: __halo_next :annotation: = None .. attribute:: __halo_prev :annotation: = None .. attribute:: __partitions_dict__ :annotation: = None .. attribute:: __lshape_map :annotation: = None .. role:: raw-html(raw) :format: html .. method:: __prephalo(start, end) -> torch.Tensor Extracts the halo indexed by start, end from ``self.array`` in the direction of ``self.split`` :param start: Start index of the halo extracted from ``self.array`` :type start: int :param end: End index of the halo extracted from ``self.array`` :type end: int .. method:: get_halo(halo_size: int, prev: bool = True, next: bool = True) Fetch halos of size ``halo_size`` from neighboring ranks and save them in ``self.halo_next/self.halo_prev``. :param halo_size: Size of the halo. :type halo_size: int :param prev: If True, fetch the halo from the previous rank. Default: True. :type prev: bool, optional :param next: If True, fetch the halo from the next rank.
Default: True. :type next: bool, optional .. method:: __cat_halo() -> torch.Tensor Return local array concatenated to halos if they are available. .. method:: __array__() -> numpy.ndarray Returns a view of the process-local slice of the :class:`DNDarray` as a numpy ndarray, if the ``DNDarray`` resides on CPU. Otherwise, it returns a copy, on CPU, of the process-local slice of ``DNDarray`` as numpy ndarray. .. method:: __array_ufunc__(ufunc, method, *inputs, **kwargs) Override NumPy's universal functions. .. method:: __array_function__(func, types, args, kwargs) Augments NumPy's functions. .. method:: astype(dtype, copy=True) -> DNDarray Returns a cast version of this array. The cast array is a new array of the same shape but with the given type. If ``copy`` is ``False``, the same array is returned instead. :param dtype: Heat type to which the array is cast :type dtype: datatype :param copy: By default the operation returns a copy of this array. If copy is set to ``False`` the cast is performed in-place and this array is returned :type copy: bool, optional .. method:: balance_() -> DNDarray Function for balancing a :class:`DNDarray` between all nodes. To determine if this is needed use the :func:`is_balanced()` function. If the ``DNDarray`` is already balanced this function will do nothing. This function modifies the ``DNDarray`` itself and will not return anything. .. rubric:: Examples >>> a = ht.zeros((10, 2), split=0) >>> a[:, 0] = ht.arange(10) >>> b = a[3:] [0/2] tensor([[3., 0.], [1/2] tensor([[4., 0.], [5., 0.], [6., 0.]]) [2/2] tensor([[7., 0.], [8., 0.], [9., 0.]]) >>> b.balance_() >>> print(b.gshape, b.lshape) [0/2] (7, 2) (1, 2) [1/2] (7, 2) (3, 2) [2/2] (7, 2) (3, 2) >>> b [0/2] tensor([[3., 0.], [4., 0.], [5., 0.]]) [1/2] tensor([[6., 0.], [7., 0.]]) [2/2] tensor([[8., 0.], [9., 0.]]) >>> print(b.gshape, b.lshape) [0/2] (7, 2) (3, 2) [1/2] (7, 2) (2, 2) [2/2] (7, 2) (2, 2) .. method:: __bool__() -> bool Boolean scalar casting. ..
method:: __cast(cast_function) -> Union[float, int] Implements a generic cast function for ``DNDarray`` objects. :param cast_function: The actual cast function, e.g. ``float`` or ``int`` :type cast_function: function :raises TypeError: If the ``DNDarray`` object cannot be converted into a scalar. .. method:: collect_(target_rank: Optional[int] = 0) -> None A method collecting a distributed DNDarray to one MPI rank, chosen by the `target_rank` variable. It is a specific case of the ``redistribute_`` method. :param target_rank: The rank to which the DNDarray will be collected. Default: 0. :type target_rank: int, optional :raises TypeError: If the target rank is not an integer. :raises ValueError: If the target rank is out of bounds. .. rubric:: Examples >>> st = ht.ones((50, 81, 67), split=2) >>> print(st.lshape) [0/2] (50, 81, 23) [1/2] (50, 81, 22) [2/2] (50, 81, 22) >>> st.collect_() >>> print(st.lshape) [0/2] (50, 81, 67) [1/2] (50, 81, 0) [2/2] (50, 81, 0) >>> st.collect_(1) >>> print(st.lshape) [0/2] (50, 81, 0) [1/2] (50, 81, 67) [2/2] (50, 81, 0) .. method:: __complex__() -> DNDarray Complex scalar casting. .. method:: counts_displs() -> Tuple[Tuple[int], Tuple[int]] Returns actual counts (number of items per process) and displacements (offsets) of the DNDarray. Does not assume load balance. .. method:: cpu() -> DNDarray Returns a copy of this object in main memory. If this object is already in main memory, then no copy is performed and the original object is returned. .. method:: create_lshape_map(force_check: bool = False) -> torch.Tensor Generate a 'map' of the lshapes of the data on all processes. Units are ``(process rank, lshape)`` :param force_check: if False (default) and the lshape map has already been created, use the previous result. Otherwise, create the lshape_map :type force_check: bool, optional .. method:: create_partition_interface() Create a partition interface in line with the DPPY proposal. This is subject to change. 
The intention of this is to facilitate the usage of a general format for the referencing of distributed datasets. An example of the output and shape is shown below. __partitioned__ = { 'shape': (27, 3, 2), 'partition_tiling': (4, 1, 1), 'partitions': { (0, 0, 0): { 'start': (0, 0, 0), 'shape': (7, 3, 2), 'data': tensor([...], dtype=torch.int32), 'location': [0], 'dtype': torch.int32, 'device': 'cpu' }, (1, 0, 0): { 'start': (7, 0, 0), 'shape': (7, 3, 2), 'data': None, 'location': [1], 'dtype': torch.int32, 'device': 'cpu' }, (2, 0, 0): { 'start': (14, 0, 0), 'shape': (7, 3, 2), 'data': None, 'location': [2], 'dtype': torch.int32, 'device': 'cpu' }, (3, 0, 0): { 'start': (21, 0, 0), 'shape': (6, 3, 2), 'data': None, 'location': [3], 'dtype': torch.int32, 'device': 'cpu' } }, 'locals': [(rank, 0, 0)], 'get': lambda x: x, } :rtype: dictionary containing the partition interface as shown above. .. method:: __float__() -> DNDarray Float scalar casting. .. seealso:: :func:`~heat.core.manipulations.flatten` .. method:: fill_diagonal(value: float) -> DNDarray Fill the main diagonal of a 2D :class:`DNDarray`. This function modifies the input tensor in-place, and returns the input array. :param value: The value to be placed in the ``DNDarray``'s main diagonal :type value: float .. method:: __getitem__(key: Union[int, Tuple[int, Ellipsis], List[int, Ellipsis]]) -> DNDarray Global getter function for DNDarrays. Returns a new DNDarray composed of the elements of the original tensor selected by the indices given. This does *NOT* redistribute or rebalance the resulting tensor. If the selection of values is unbalanced then the resultant tensor is also unbalanced! To redistribute the ``DNDarray``, use :func:`balance()` (issue #187). :param key: Indices to get from the tensor. :type key: int, slice, Tuple[int,...], List[int,...] ..
rubric:: Examples >>> a = ht.arange(10, split=0) (1/2) >>> tensor([0, 1, 2, 3, 4], dtype=torch.int32) (2/2) >>> tensor([5, 6, 7, 8, 9], dtype=torch.int32) >>> a[1:6] (1/2) >>> tensor([1, 2, 3, 4], dtype=torch.int32) (2/2) >>> tensor([5], dtype=torch.int32) >>> a = ht.zeros((4, 5), split=0) (1/2) >>> tensor([[0., 0., 0., 0., 0.], [0., 0., 0., 0., 0.]]) (2/2) >>> tensor([[0., 0., 0., 0., 0.], [0., 0., 0., 0., 0.]]) >>> a[1:4, 1] (1/2) >>> tensor([0.]) (2/2) >>> tensor([0., 0.]) .. method:: gpu() -> DNDarray Returns a copy of this object in GPU memory. If this object is already in GPU memory, then no copy is performed and the original object is returned. .. method:: __int__() -> DNDarray Integer scalar casting. .. method:: is_balanced(force_check: bool = False) -> bool Determine if ``self`` is distributed evenly (or as evenly as possible) across all processes. This is equivalent to returning ``self.balanced``. If no information is available (``self.balanced is None``), the balanced status will be assessed via collective communication. :param force_check: If True, the balanced status of the ``DNDarray`` will be assessed via collective communication in any case. :type force_check: bool, optional .. method:: is_distributed() -> bool Determines whether the data of this ``DNDarray`` is distributed across multiple processes. .. method:: __key_is_singular(key: any, axis: int, self_proxy: torch.Tensor) -> bool .. method:: __key_adds_dimension(key: any, axis: int, self_proxy: torch.Tensor) -> bool .. method:: item() Returns the only element of a 1-element :class:`DNDarray`. Mirror of the PyTorch command by the same name. If the ``DNDarray`` holds more than one element, a ``ValueError`` is raised (by PyTorch). .. rubric:: Examples >>> import heat as ht >>> x = ht.zeros((1)) >>> x.item() 0.0 .. method:: __len__() -> int The length of the ``DNDarray``, i.e. the number of items in the first dimension. ..
method:: numpy() -> numpy.array Returns a copy of the :class:`DNDarray` as numpy ndarray. If the ``DNDarray`` resides on the GPU, the underlying data will be copied to the CPU first. If the ``DNDarray`` is distributed, an MPI Allgather operation will be performed before converting to np.ndarray, i.e. each MPI process will end up holding a copy of the entire array in memory. Make sure process memory is sufficient! .. rubric:: Examples >>> import heat as ht >>> T1 = ht.random.randn((10,8)) >>> T1.numpy() .. method:: _repr_pretty_(p, cycle) Pretty print for IPython. .. method:: __repr__() -> str Returns a printable representation of the passed DNDarray, targeting developers. .. method:: ravel() Flattens the ``DNDarray``. .. seealso:: :func:`~heat.core.manipulations.ravel` .. rubric:: Examples >>> a = ht.ones((2, 3), split=0) >>> b = a.ravel() >>> a[0, 0] = 4 >>> b DNDarray([4., 1., 1., 1., 1., 1.], dtype=ht.float32, device=cpu:0, split=0) .. method:: redistribute_(lshape_map: Optional[torch.Tensor] = None, target_map: Optional[torch.Tensor] = None) Redistributes the data of the :class:`DNDarray` *along the split axis* to match the given target map. This function does not modify the non-split dimensions of the ``DNDarray``. This is an abstraction and extension of the balance function. :param lshape_map: The current lshape of processes. Units are ``[rank, lshape]``. :type lshape_map: torch.Tensor, optional :param target_map: The desired distribution across the processes. Units are ``[rank, target lshape]``. Note: the only important parts of the target map are the values along the split axis, values which are not along this axis are there to mimic the shape of the ``lshape_map``. :type target_map: torch.Tensor, optional ..
rubric:: Examples >>> st = ht.ones((50, 81, 67), split=2) >>> target_map = torch.zeros((st.comm.size, 3), dtype=torch.int64) >>> target_map[0, 2] = 67 >>> print(target_map) [0/2] tensor([[ 0, 0, 67], [0/2] [ 0, 0, 0], [0/2] [ 0, 0, 0]], dtype=torch.int32) [1/2] tensor([[ 0, 0, 67], [1/2] [ 0, 0, 0], [1/2] [ 0, 0, 0]], dtype=torch.int32) [2/2] tensor([[ 0, 0, 67], [2/2] [ 0, 0, 0], [2/2] [ 0, 0, 0]], dtype=torch.int32) >>> print(st.lshape) [0/2] (50, 81, 23) [1/2] (50, 81, 22) [2/2] (50, 81, 22) >>> st.redistribute_(target_map=target_map) >>> print(st.lshape) [0/2] (50, 81, 67) [1/2] (50, 81, 0) [2/2] (50, 81, 0) .. method:: __redistribute_shuffle(snd_pr: Union[int, torch.Tensor], send_amt: Union[int, torch.Tensor], rcv_pr: Union[int, torch.Tensor], snd_dtype: torch.dtype) Function to abstract the function used during redistribute for shuffling data between processes along the split axis :param snd_pr: Sending process :type snd_pr: int or torch.Tensor :param send_amt: Amount of data to be sent by the sending process :type send_amt: int or torch.Tensor :param rcv_pr: Receiving process :type rcv_pr: int or torch.Tensor :param snd_dtype: Torch type of the data in question :type snd_dtype: torch.dtype .. method:: resplit_(axis: int = None) In-place option for resplitting a :class:`DNDarray`. :param axis: The new split axis, ``None`` denotes gathering, an int will set the new split axis :type axis: int .. rubric:: Examples >>> a = ht.zeros( ... ( ... 4, ... 5, ... ), ... split=0, ... ) >>> a.lshape (0/2) (2, 5) (1/2) (2, 5) >>> ht.resplit_(a, None) >>> a.split None >>> a.lshape (0/2) (4, 5) (1/2) (4, 5) >>> a = ht.zeros( ... ( ... 4, ... 5, ... ), ... split=0, ... ) >>> a.lshape (0/2) (2, 5) (1/2) (2, 5) >>> ht.resplit_(a, 1) >>> a.split 1 >>> a.lshape (0/2) (4, 3) (1/2) (4, 2) .. 
method:: __setitem__(key: Union[int, Tuple[int, Ellipsis], List[int, Ellipsis]], value: Union[float, DNDarray, torch.Tensor]) Global item setter. :param key: Index/indices to be set :type key: Union[int, Tuple[int,...], List[int,...]] :param value: Value to be set to the specified positions in the DNDarray (self) :type value: Union[float, DNDarray,torch.Tensor] .. rubric:: Notes If a ``DNDarray`` is given as the value to be set, then the split axes are assumed to be equal. If they are not, PyTorch will raise an error when the values are set on the local array. .. rubric:: Examples >>> a = ht.zeros((4, 5), split=0) (1/2) >>> tensor([[0., 0., 0., 0., 0.], [0., 0., 0., 0., 0.]]) (2/2) >>> tensor([[0., 0., 0., 0., 0.], [0., 0., 0., 0., 0.]]) >>> a[1:4, 1] = 1 >>> a (1/2) >>> tensor([[0., 0., 0., 0., 0.], [0., 1., 0., 0., 0.]]) (2/2) >>> tensor([[0., 1., 0., 0., 0.], [0., 1., 0., 0., 0.]]) .. method:: __setter(key: Union[int, Tuple[int, Ellipsis], List[int, Ellipsis]], value: Union[float, DNDarray, torch.Tensor]) Utility function for checking ``value`` and forwarding to :func:`__setitem__` :raises NotImplementedError: If the type of ``value`` is not supported .. method:: __str__() -> str Computes a string representation of the passed ``DNDarray``. .. method:: tolist(keepsplit: bool = False) -> List Return a copy of the local array data as a (nested) Python list. For scalars, a standard Python number is returned. :param keepsplit: Whether the list should be returned locally or globally. :type keepsplit: bool .. rubric:: Examples >>> a = ht.array([[0, 1], [2, 3]]) >>> a.tolist() [[0, 1], [2, 3]] >>> a = ht.array([[0, 1], [2, 3]], split=0) >>> a.tolist() [[0, 1], [2, 3]] >>> a = ht.array([[0, 1], [2, 3]], split=1) >>> a.tolist(keepsplit=True) (1/2) [[0], [2]] (2/2) [[1], [3]] .. method:: __torch_function__(func, types, args=(), kwargs=None) Supports PyTorch's dispatch mechanism. ..
method:: __torch_proxy__() -> torch.Tensor Return a 1-element `torch.Tensor` strided as the global `self` shape. Used internally for sanitation purposes. .. method:: __xitem_get_key_start_stop(rank: int, actives: list, key_st: int, key_sp: int, step: int, ends: torch.Tensor, og_key_st: int) -> Tuple[int, int] .. py:class:: KMedoids(n_clusters: int = 8, init: Union[str, heat.core.dndarray.DNDarray] = 'random', max_iter: int = 300, random_state: int = None) Bases: :class:`heat.cluster._kcluster._KCluster` Kmedoids with the Manhattan distance as fixed metric, calculating the median of the assigned cluster points as new cluster center and snapping the centroid to the nearest datapoint afterwards. This is not the original implementation of k-medoids using PAM as originally proposed in [1]. :param n_clusters: The number of clusters to form as well as the number of centroids to generate. :type n_clusters: int, optional, default: 8 :param init: Method for initialization: - ‘k-medoids++’ : selects initial cluster centers for the clustering in a smart way to speed up convergence [2]. - ‘random’: choose k observations (rows) at random from data for the initial centroids. - DNDarray: gives the initial centers, should be of Shape = (n_clusters, n_features) :type init: str or DNDarray, default: ‘random’ :param max_iter: Maximum number of iterations of the algorithm for a single run. :type max_iter: int, default: 300 :param random_state: Determines random number generation for centroid initialization. :type random_state: int .. rubric:: References [1] Kaufman, L. and Rousseeuw, P.J. (1987), Clustering by means of Medoids, in Statistical Data Analysis Based on the L1 Norm and Related Methods, edited by Y. Dodge, North-Holland, 405-416. .. role:: raw-html(raw) :format: html ..
method:: _update_centroids(x: heat.core.dndarray.DNDarray, matching_centroids: heat.core.dndarray.DNDarray) Compute new centroid ``ci`` as closest sample to the median of the data points in ``x`` that are assigned to ``ci``. :param x: Input data :type x: DNDarray :param matching_centroids: Array filled with indices ``i`` indicating to which cluster ``ci`` each sample point in ``x`` is assigned :type matching_centroids: DNDarray .. method:: fit(x: heat.core.dndarray.DNDarray, oversampling: float = 2, iter_multiplier: float = 1) Computes the centroids of a k-medoids clustering. :param x: Training instances to cluster. Shape = (n_samples, n_features) :type x: DNDarray :param oversampling: oversampling factor used in the k-means|| initialization of centroids :type oversampling: float :param iter_multiplier: factor that increases the number of iterations used in the initialization of centroids :type iter_multiplier: float .. py:class:: DNDarray(array: torch.Tensor, gshape: Tuple[int, Ellipsis], dtype: heat.core.types.datatype, split: Union[int, None], device: heat.core.devices.Device, comm: Communication, balanced: bool) Distributed N-Dimensional array. The core element of HeAT. It is composed of PyTorch tensors local to each process. :param array: Local array elements :type array: torch.Tensor :param gshape: The global shape of the array :type gshape: Tuple[int,...] :param dtype: The datatype of the array :type dtype: datatype :param split: The axis on which the array is divided between processes :type split: int or None :param device: The device on which the local arrays reside (cpu or gpu) :type device: Device :param comm: The communications object for sending and receiving data :type comm: Communication :param balanced: Describes whether the data are evenly distributed across processes. If this information is not available (``self.balanced is None``), it can be gathered via the :func:`is_balanced()` method (requires communication). :type balanced: bool or None ..
attribute:: __array .. attribute:: __gshape .. attribute:: __dtype .. attribute:: __split .. attribute:: __device .. attribute:: __comm .. attribute:: __balanced .. attribute:: __ishalo :annotation: = False .. attribute:: __halo_next :annotation: = None .. attribute:: __halo_prev :annotation: = None .. attribute:: __partitions_dict__ :annotation: = None .. attribute:: __lshape_map :annotation: = None .. role:: raw-html(raw) :format: html .. method:: __prephalo(start, end) -> torch.Tensor Extracts the halo indexed by start, end from ``self.array`` in the direction of ``self.split`` :param start: Start index of the halo extracted from ``self.array`` :type start: int :param end: End index of the halo extracted from ``self.array`` :type end: int .. method:: get_halo(halo_size: int, prev: bool = True, next: bool = True) Fetch halos of size ``halo_size`` from neighboring ranks and save them in ``self.halo_next/self.halo_prev``. :param halo_size: Size of the halo. :type halo_size: int :param prev: If True, fetch the halo from the previous rank. Default: True. :type prev: bool, optional :param next: If True, fetch the halo from the next rank. Default: True. :type next: bool, optional .. method:: __cat_halo() -> torch.Tensor Return local array concatenated to halos if they are available. .. method:: __array__() -> numpy.ndarray Returns a view of the process-local slice of the :class:`DNDarray` as a numpy ndarray, if the ``DNDarray`` resides on CPU. Otherwise, it returns a copy, on CPU, of the process-local slice of ``DNDarray`` as numpy ndarray. .. method:: __array_ufunc__(ufunc, method, *inputs, **kwargs) Override NumPy's universal functions. .. method:: __array_function__(func, types, args, kwargs) Augments NumPy's functions. .. method:: astype(dtype, copy=True) -> DNDarray Returns a cast version of this array. The cast array is a new array of the same shape but with the given type. If ``copy`` is ``False``, the same array is returned instead.
:param dtype: Heat type to which the array is cast :type dtype: datatype :param copy: By default the operation returns a copy of this array. If copy is set to ``False`` the cast is performed in-place and this array is returned :type copy: bool, optional .. method:: balance_() -> DNDarray Function for balancing a :class:`DNDarray` between all nodes. To determine if this is needed use the :func:`is_balanced()` function. If the ``DNDarray`` is already balanced this function will do nothing. This function modifies the ``DNDarray`` itself and will not return anything. .. rubric:: Examples >>> a = ht.zeros((10, 2), split=0) >>> a[:, 0] = ht.arange(10) >>> b = a[3:] [0/2] tensor([[3., 0.], [1/2] tensor([[4., 0.], [5., 0.], [6., 0.]]) [2/2] tensor([[7., 0.], [8., 0.], [9., 0.]]) >>> b.balance_() >>> print(b.gshape, b.lshape) [0/2] (7, 2) (1, 2) [1/2] (7, 2) (3, 2) [2/2] (7, 2) (3, 2) >>> b [0/2] tensor([[3., 0.], [4., 0.], [5., 0.]]) [1/2] tensor([[6., 0.], [7., 0.]]) [2/2] tensor([[8., 0.], [9., 0.]]) >>> print(b.gshape, b.lshape) [0/2] (7, 2) (3, 2) [1/2] (7, 2) (2, 2) [2/2] (7, 2) (2, 2) .. method:: __bool__() -> bool Boolean scalar casting. .. method:: __cast(cast_function) -> Union[float, int] Implements a generic cast function for ``DNDarray`` objects. :param cast_function: The actual cast function, e.g. ``float`` or ``int`` :type cast_function: function :raises TypeError: If the ``DNDarray`` object cannot be converted into a scalar. .. method:: collect_(target_rank: Optional[int] = 0) -> None A method collecting a distributed DNDarray to one MPI rank, chosen by the `target_rank` variable. It is a specific case of the ``redistribute_`` method. :param target_rank: The rank to which the DNDarray will be collected. Default: 0. :type target_rank: int, optional :raises TypeError: If the target rank is not an integer. :raises ValueError: If the target rank is out of bounds. .. 
rubric:: Examples >>> st = ht.ones((50, 81, 67), split=2) >>> print(st.lshape) [0/2] (50, 81, 23) [1/2] (50, 81, 22) [2/2] (50, 81, 22) >>> st.collect_() >>> print(st.lshape) [0/2] (50, 81, 67) [1/2] (50, 81, 0) [2/2] (50, 81, 0) >>> st.collect_(1) >>> print(st.lshape) [0/2] (50, 81, 0) [1/2] (50, 81, 67) [2/2] (50, 81, 0) .. method:: __complex__() -> DNDarray Complex scalar casting. .. method:: counts_displs() -> Tuple[Tuple[int], Tuple[int]] Returns actual counts (number of items per process) and displacements (offsets) of the DNDarray. Does not assume load balance. .. method:: cpu() -> DNDarray Returns a copy of this object in main memory. If this object is already in main memory, then no copy is performed and the original object is returned. .. method:: create_lshape_map(force_check: bool = False) -> torch.Tensor Generate a 'map' of the lshapes of the data on all processes. Units are ``(process rank, lshape)`` :param force_check: if False (default) and the lshape map has already been created, use the previous result. Otherwise, create the lshape_map :type force_check: bool, optional .. method:: create_partition_interface() Create a partition interface in line with the DPPY proposal. This is subject to change. The intention of this is to facilitate the usage of a general format for the referencing of distributed datasets. An example of the output and shape is shown below.
__partitioned__ = { 'shape': (27, 3, 2), 'partition_tiling': (4, 1, 1), 'partitions': { (0, 0, 0): { 'start': (0, 0, 0), 'shape': (7, 3, 2), 'data': tensor([...], dtype=torch.int32), 'location': [0], 'dtype': torch.int32, 'device': 'cpu' }, (1, 0, 0): { 'start': (7, 0, 0), 'shape': (7, 3, 2), 'data': None, 'location': [1], 'dtype': torch.int32, 'device': 'cpu' }, (2, 0, 0): { 'start': (14, 0, 0), 'shape': (7, 3, 2), 'data': None, 'location': [2], 'dtype': torch.int32, 'device': 'cpu' }, (3, 0, 0): { 'start': (21, 0, 0), 'shape': (6, 3, 2), 'data': None, 'location': [3], 'dtype': torch.int32, 'device': 'cpu' } }, 'locals': [(rank, 0, 0)], 'get': lambda x: x, } :rtype: dictionary containing the partition interface as shown above. .. method:: __float__() -> DNDarray Float scalar casting. .. seealso:: :func:`~heat.core.manipulations.flatten` .. method:: fill_diagonal(value: float) -> DNDarray Fill the main diagonal of a 2D :class:`DNDarray`. This function modifies the input tensor in-place, and returns the input array. :param value: The value to be placed in the ``DNDarray``'s main diagonal :type value: float .. method:: __getitem__(key: Union[int, Tuple[int, Ellipsis], List[int, Ellipsis]]) -> DNDarray Global getter function for DNDarrays. Returns a new DNDarray composed of the elements of the original tensor selected by the indices given. This does *NOT* redistribute or rebalance the resulting tensor. If the selection of values is unbalanced then the resultant tensor is also unbalanced! To redistribute the ``DNDarray``, use :func:`balance()` (issue #187). :param key: Indices to get from the tensor. :type key: int, slice, Tuple[int,...], List[int,...] ..
rubric:: Examples >>> a = ht.arange(10, split=0) (1/2) >>> tensor([0, 1, 2, 3, 4], dtype=torch.int32) (2/2) >>> tensor([5, 6, 7, 8, 9], dtype=torch.int32) >>> a[1:6] (1/2) >>> tensor([1, 2, 3, 4], dtype=torch.int32) (2/2) >>> tensor([5], dtype=torch.int32) >>> a = ht.zeros((4, 5), split=0) (1/2) >>> tensor([[0., 0., 0., 0., 0.], [0., 0., 0., 0., 0.]]) (2/2) >>> tensor([[0., 0., 0., 0., 0.], [0., 0., 0., 0., 0.]]) >>> a[1:4, 1] (1/2) >>> tensor([0.]) (2/2) >>> tensor([0., 0.]) .. method:: gpu() -> DNDarray Returns a copy of this object in GPU memory. If this object is already in GPU memory, then no copy is performed and the original object is returned. .. method:: __int__() -> DNDarray Integer scalar casting. .. method:: is_balanced(force_check: bool = False) -> bool Determine if ``self`` is distributed evenly (or as evenly as possible) across all processes. This is equivalent to returning ``self.balanced``. If no information is available (``self.balanced is None``), the balanced status will be assessed via collective communication. :param force_check: If True, the balanced status of the ``DNDarray`` will be assessed via collective communication in any case. :type force_check: bool, optional .. method:: is_distributed() -> bool Determines whether the data of this ``DNDarray`` is distributed across multiple processes. .. method:: __key_is_singular(key: any, axis: int, self_proxy: torch.Tensor) -> bool .. method:: __key_adds_dimension(key: any, axis: int, self_proxy: torch.Tensor) -> bool .. method:: item() Returns the only element of a 1-element :class:`DNDarray`. Mirror of the PyTorch command by the same name. If the ``DNDarray`` holds more than one element, a ``ValueError`` is raised (by PyTorch). .. rubric:: Examples >>> import heat as ht >>> x = ht.zeros((1)) >>> x.item() 0.0 .. method:: __len__() -> int The length of the ``DNDarray``, i.e. the number of items in the first dimension. ..
method:: numpy() -> numpy.array Returns a copy of the :class:`DNDarray` as numpy ndarray. If the ``DNDarray`` resides on the GPU, the underlying data will be copied to the CPU first. If the ``DNDarray`` is distributed, an MPI Allgather operation will be performed before converting to np.ndarray, i.e. each MPI process will end up holding a copy of the entire array in memory. Make sure process memory is sufficient! .. rubric:: Examples >>> import heat as ht >>> T1 = ht.random.randn(10, 8) >>> T1.numpy() .. method:: _repr_pretty_(p, cycle) Pretty print for IPython. .. method:: __repr__() -> str Returns a printable representation of the ``DNDarray``, targeting developers. .. method:: ravel() Flattens the ``DNDarray``. .. seealso:: :func:`~heat.core.manipulations.ravel` .. rubric:: Examples >>> a = ht.ones((2, 3), split=0) >>> b = a.ravel() >>> a[0, 0] = 4 >>> b DNDarray([4., 1., 1., 1., 1., 1.], dtype=ht.float32, device=cpu:0, split=0) .. method:: redistribute_(lshape_map: Optional[torch.Tensor] = None, target_map: Optional[torch.Tensor] = None) Redistributes the data of the :class:`DNDarray` *along the split axis* to match the given target map. This function does not modify the non-split dimensions of the ``DNDarray``. This is an abstraction and extension of the balance function. :param lshape_map: The current lshape of processes. Units are ``[rank, lshape]``. :type lshape_map: torch.Tensor, optional :param target_map: The desired distribution across the processes. Units are ``[rank, target lshape]``. Note: only the values along the split axis matter in the target map; values that are not along this axis are there to mimic the shape of the ``lshape_map``. :type target_map: torch.Tensor, optional .. 
rubric:: Examples >>> st = ht.ones((50, 81, 67), split=2) >>> target_map = torch.zeros((st.comm.size, 3), dtype=torch.int64) >>> target_map[0, 2] = 67 >>> print(target_map) [0/2] tensor([[ 0, 0, 67], [0/2] [ 0, 0, 0], [0/2] [ 0, 0, 0]], dtype=torch.int32) [1/2] tensor([[ 0, 0, 67], [1/2] [ 0, 0, 0], [1/2] [ 0, 0, 0]], dtype=torch.int32) [2/2] tensor([[ 0, 0, 67], [2/2] [ 0, 0, 0], [2/2] [ 0, 0, 0]], dtype=torch.int32) >>> print(st.lshape) [0/2] (50, 81, 23) [1/2] (50, 81, 22) [2/2] (50, 81, 22) >>> st.redistribute_(target_map=target_map) >>> print(st.lshape) [0/2] (50, 81, 67) [1/2] (50, 81, 0) [2/2] (50, 81, 0) .. method:: __redistribute_shuffle(snd_pr: Union[int, torch.Tensor], send_amt: Union[int, torch.Tensor], rcv_pr: Union[int, torch.Tensor], snd_dtype: torch.dtype) Function to abstract the function used during redistribute for shuffling data between processes along the split axis :param snd_pr: Sending process :type snd_pr: int or torch.Tensor :param send_amt: Amount of data to be sent by the sending process :type send_amt: int or torch.Tensor :param rcv_pr: Receiving process :type rcv_pr: int or torch.Tensor :param snd_dtype: Torch type of the data in question :type snd_dtype: torch.dtype .. method:: resplit_(axis: int = None) In-place option for resplitting a :class:`DNDarray`. :param axis: The new split axis, ``None`` denotes gathering, an int will set the new split axis :type axis: int .. rubric:: Examples >>> a = ht.zeros( ... ( ... 4, ... 5, ... ), ... split=0, ... ) >>> a.lshape (0/2) (2, 5) (1/2) (2, 5) >>> ht.resplit_(a, None) >>> a.split None >>> a.lshape (0/2) (4, 5) (1/2) (4, 5) >>> a = ht.zeros( ... ( ... 4, ... 5, ... ), ... split=0, ... ) >>> a.lshape (0/2) (2, 5) (1/2) (2, 5) >>> ht.resplit_(a, 1) >>> a.split 1 >>> a.lshape (0/2) (4, 3) (1/2) (4, 2) .. 
method:: __setitem__(key: Union[int, Tuple[int, Ellipsis], List[int, Ellipsis]], value: Union[float, DNDarray, torch.Tensor]) Global item setter. :param key: Index/indices to be set :type key: Union[int, Tuple[int,...], List[int,...]] :param value: Value to be set to the specified positions in the DNDarray (self) :type value: Union[float, DNDarray,torch.Tensor] .. rubric:: Notes If a ``DNDarray`` is given as the value to be set, then the split axes are assumed to be equal. If they are not, PyTorch will raise an error when the values are set on the local array. .. rubric:: Examples >>> a = ht.zeros((4, 5), split=0) (1/2) >>> tensor([[0., 0., 0., 0., 0.], [0., 0., 0., 0., 0.]]) (2/2) >>> tensor([[0., 0., 0., 0., 0.], [0., 0., 0., 0., 0.]]) >>> a[1:4, 1] = 1 >>> a (1/2) >>> tensor([[0., 0., 0., 0., 0.], [0., 1., 0., 0., 0.]]) (2/2) >>> tensor([[0., 1., 0., 0., 0.], [0., 1., 0., 0., 0.]]) .. method:: __setter(key: Union[int, Tuple[int, Ellipsis], List[int, Ellipsis]], value: Union[float, DNDarray, torch.Tensor]) Utility function for checking ``value`` and forwarding to :func:`__setitem__` :raises NotImplementedError: If the type of ``value`` is not supported .. method:: __str__() -> str Computes a string representation of the ``DNDarray``. .. method:: tolist(keepsplit: bool = False) -> List Return a copy of the local array data as a (nested) Python list. For scalars, a standard Python number is returned. :param keepsplit: Whether the list should be returned locally or globally. :type keepsplit: bool .. rubric:: Examples >>> a = ht.array([[0, 1], [2, 3]]) >>> a.tolist() [[0, 1], [2, 3]] >>> a = ht.array([[0, 1], [2, 3]], split=0) >>> a.tolist() [[0, 1], [2, 3]] >>> a = ht.array([[0, 1], [2, 3]], split=1) >>> a.tolist(keepsplit=True) (1/2) [[0], [2]] (2/2) [[1], [3]] .. method:: __torch_function__(func, types, args=(), kwargs=None) Supports PyTorch's dispatch mechanism. .. 
method:: __torch_proxy__() -> torch.Tensor Return a 1-element `torch.Tensor` strided as the global `self` shape. Used internally for sanitation purposes. .. method:: __xitem_get_key_start_stop(rank: int, actives: list, key_st: int, key_sp: int, step: int, ends: torch.Tensor, og_key_st: int) -> Tuple[int, int] .. py:class:: Spectral(n_clusters: int = None, gamma: float = 1.0, metric: str = 'rbf', laplacian: str = 'fully_connected', threshold: float = 1.0, boundary: str = 'upper', eigen_solver: str = 'randomized', reigh_rank: int = 100, reigh_n_oversamples: int = 10, reigh_power_iter: int = 0, lanczos_n_iter: int = 300, assign_labels: str = 'kmeans', **params) Bases: :class:`heat.ClusteringMixin`, :class:`heat.BaseEstimator` Spectral clustering :ivar n_clusters: Number of clusters to fit :vartype n_clusters: int :ivar gamma: Kernel coefficient sigma for 'rbf', ignored for metric='euclidean' :vartype gamma: float :ivar metric: How to construct the similarity matrix. - 'rbf' : construct the similarity matrix using a radial basis function (RBF) kernel. - 'euclidean' : construct the similarity matrix from the Euclidean distance only. :vartype metric: string :ivar laplacian: How to calculate the graph Laplacian (affinity). Currently supported: 'fully_connected', 'eNeighbour' :vartype laplacian: str :ivar threshold: Threshold for the affinity matrix if laplacian='eNeighbour'. Ignored for laplacian='fully_connected' :vartype threshold: float :ivar boundary: How to interpret threshold: 'upper', 'lower'. Ignored for laplacian='fully_connected' :vartype boundary: str :ivar eigen_solver: The eigenvalue decomposition strategy to use. - 'lanczos' : Use Lanczos iterations to reduce the Laplacian matrix size before applying the torch eigenvalue solver. - 'randomized' : Use a randomized algorithm to compute the approximate eigenvalues and eigenvectors. :vartype eigen_solver: str :ivar reigh_rank: number of samples for randomized eigenvalue decomposition.
Only used if eigen_solver='randomized'. It must hold reigh_rank >= n_clusters. If n_clusters is None (automatic selection of number of clusters), reigh_rank gives an upper bound on the number of clusters that can be found. Therefore, reigh_rank should be set high enough to capture the expected number of clusters in that case. :vartype reigh_rank: int :ivar reigh_n_oversamples: number of oversamples for randomized eigenvalue decomposition. Only used if eigen_solver='randomized'. Default is 10. :vartype reigh_n_oversamples: int :ivar reigh_power_iter: number of power iterations for randomized eigenvalue decomposition. Only used if eigen_solver='randomized'. Default is 0. Consider increasing this value if the eigen-spectrum of the Laplacian decays slowly. :vartype reigh_power_iter: int :ivar lanczos_n_iter: number of Lanczos iterations for Eigenvalue decomposition. Only used if eigen_solver='lanczos'. Default is 300. :vartype lanczos_n_iter: int :ivar assign_labels: The strategy to use to assign labels in the embedding space. :vartype assign_labels: str :ivar \*\*params: Parameter dictionary for the assign_labels estimator :vartype \*\*params: dict .. attribute:: n_clusters :annotation: = None .. attribute:: gamma :annotation: = 1.0 .. attribute:: metric :annotation: = 'rbf' .. attribute:: laplacian :annotation: = 'fully_connected' .. attribute:: threshold :annotation: = 1.0 .. attribute:: boundary :annotation: = 'upper' .. attribute:: lanczos_n_iter :annotation: = 300 .. attribute:: assign_labels :annotation: = 'kmeans' .. attribute:: eigen_solver :annotation: = 'randomized' .. attribute:: reigh_n_oversamples :annotation: = 10 .. attribute:: reigh_power_iter :annotation: = 0 .. attribute:: reigh_rank :annotation: = 100 .. attribute:: _labels :annotation: = None .. role:: raw-html(raw) :format: html .. method:: _spectral_embedding(x: heat.core.dndarray.DNDarray) -> Tuple[heat.core.dndarray.DNDarray, heat.core.dndarray.DNDarray] Helper function for dataset x embedding. 
Returns Tuple(Eigenvalues, Eigenvectors) of the graph's Laplacian matrix. :param x: Sample Matrix for which the embedding should be calculated :type x: DNDarray .. rubric:: Notes This discards the imaginary part of the computed eigenvalues. .. method:: fit(x: heat.core.dndarray.DNDarray) Clusters dataset X via spectral embedding. Computes the low-dim representation by calculation of eigenspectrum (eigenvalues and eigenvectors) of the graph Laplacian from the similarity matrix and fits the eigenvectors that correspond to the k lowest eigenvalues with a separate clustering algorithm (currently only kmeans is supported). Similarity metrics for adjacency calculations are supported via spatial.distance. Depending on ``eigen_solver``, the eigenvalues and eigenvectors are computed either by reducing the Laplacian via Lanczos iterations and applying the torch eigenvalue solver to this smaller matrix, or by a randomized algorithm. If other eigenvalue decomposition methods are supported, this will be expanded. :param x: Training instances to cluster. Shape = (n_samples, n_features) :type x: DNDarray .. method:: predict(x: heat.core.dndarray.DNDarray) -> heat.core.dndarray.DNDarray Return the label each sample in X belongs to. X is transformed to the low-dim representation by calculation of eigenspectrum (eigenvalues and eigenvectors) of the graph Laplacian from the similarity matrix. Labels are inferred by extracting the closest centroid of the n_clusters eigenvectors from the previously fitted clustering algorithm (kmeans). :param x: New data to predict. Shape = (n_samples, n_features) :type x: DNDarray .. warning:: Caution: Calculation of the low-dim representation requires some time!
.. data:: self Auxiliary single-process functions and base class for batch-parallel k-clustering .. function:: _initialize_plus_plus(X, n_clusters, p, random_state=None, weights: torch.tensor = 1, max_samples=2**24 - 1) Auxiliary function: single-process k-means++/k-medians++ initialization in PyTorch. p is the norm used for computing distances. weights adds weights to the distribution function, so that data points with higher weights are preferred; note that weights must have the same dimension as X[0]. The value max_samples=2**24 - 1 is necessary because PyTorch's multinomial currently only supports this number of different categories. .. function:: _kmex(X, p, n_clusters, init, max_iter, tol, random_state=None, weights: torch.tensor = 1.0) Auxiliary function: single-process k-means and k-medians in PyTorch. p is the norm used for computing distances: p=2 implies k-means, p=1 implies k-medians. p should be 1 (k-medians) or 2 (k-means). For any other choice of p, the function proceeds as for p=2, with no guarantee on the result. (note: kmex stands for kmeans and kmedians) .. function:: _parallel_batched_kmex_predict(X, centers, p) Auxiliary function: predict labels for parallel_batched_kmex .. py:class:: _BatchParallelKCluster(p: int, n_clusters: int, init: str, max_iter: int, tol: float, random_state: Union[int, None], n_procs_to_merge: Union[int, None]) Bases: :class:`heat.ClusteringMixin`, :class:`heat.BaseEstimator` Base class for batch parallel k-clustering .. attribute:: n_clusters .. attribute:: _init .. attribute:: max_iter .. attribute:: tol .. attribute:: random_state .. attribute:: n_procs_to_merge .. attribute:: _p .. attribute:: _cluster_centers :annotation: = None .. 
attribute:: _n_iter :annotation: = None .. attribute:: _functional_value :annotation: = None .. role:: raw-html(raw) :format: html .. method:: fit(x: heat.core.dndarray.DNDarray) Computes the centroid of the clustering algorithm to fit the data ``x``. :param x: Training instances to cluster. Shape = (n_samples, n_features). It must hold x.split=0. :type x: DNDarray :param weights: Add weights to the distribution function used in the clustering algorithm in kmex :type weights: torch.tensor .. method:: predict(x: heat.core.dndarray.DNDarray) Predict the closest cluster each sample in ``x`` belongs to. In the vector quantization literature, :func:`cluster_centers_` is called the code book and each value returned by predict is the index of the closest code in the code book. :param x: New data to predict. Shape = (n_samples, n_features) :type x: DNDarray .. py:class:: BatchParallelKMeans(n_clusters: int = 8, init: str = 'k-means++', max_iter: int = 300, tol: float = 0.0001, random_state: int = None, n_procs_to_merge: int = None) Bases: :class:`_BatchParallelKCluster` Batch-parallel K-Means clustering algorithm from Ref. [1]. The input must be a ``DNDarray`` of shape `(n_samples, n_features)`, with split=0 (i.e. split along the sample axis). This method performs K-Means clustering on each batch (i.e. on each process-local chunk) of data individually and in parallel. After that, all centroids from the local K-Means are gathered and another instance of K-means is performed on them in order to determine the final centroids. To improve scalability of this approach also on a large number of processes, this procedure can be applied in a hierarchical manner using the parameter `n_procs_to_merge`. :ivar n_clusters: The number of clusters to form as well as the number of centroids to generate. 
:vartype n_clusters: int :ivar init: Method for initialization for local and global k-means: - ‘k-means++’ : selects initial cluster centers for the clustering in a smart way to speed up convergence [2]. - ‘random’: choose k observations (rows) at random from data for the initial centroids. (Not implemented yet) :vartype init: str :ivar max_iter: Maximum number of iterations of the local/global k-means algorithms. :vartype max_iter: int :ivar tol: Relative tolerance with regards to inertia to declare convergence, both for local and global k-means. :vartype tol: float :ivar random_state: Determines random number generation for centroid initialization. :vartype random_state: int :ivar n_procs_to_merge: Number of processes to merge after each iteration of the local k-means. If None, all processes are merged after each iteration. :vartype n_procs_to_merge: int .. rubric:: References [1] Rasim M. Alguliyev, Ramiz M. Aliguliyev, Lyudmila V. Sukhostat, Parallel batch k-means for Big data clustering, Computers & Industrial Engineering, Volume 152 (2021). https://doi.org/10.1016/j.cie.2020.107023. .. attribute:: init :annotation: = 'k-means++' .. role:: raw-html(raw) :format: html .. py:class:: BatchParallelKMedians(n_clusters: int = 8, init: str = 'k-medians++', max_iter: int = 300, tol: float = 0.0001, random_state: int = None, n_procs_to_merge: int = None) Bases: :class:`_BatchParallelKCluster` Batch-parallel K-Medians clustering algorithm, in analogy to the K-means algorithm from Ref. [1]. This requires data to be given as DNDarray of shape (n_samples, n_features) with split=0 (i.e. split along the sample axis). The idea of the method is to perform the classical K-Medians on each batch of data (i.e. on each process-local chunk of data) individually and in parallel. After that, all centroids from the local K-Medians are gathered and another instance of K-Medians is performed on them in order to determine the final centroids. 
To improve scalability of this approach also on a large number of processes, this procedure can be applied in a hierarchical manner using the parameter n_procs_to_merge. :ivar n_clusters: The number of clusters to form as well as the number of centroids to generate. :vartype n_clusters: int :ivar init: Method for initialization for local and global k-medians: - ‘k-medians++’ : selects initial cluster centers for the clustering in a smart way to speed up convergence [2]. - ‘random’: choose k observations (rows) at random from data for the initial centroids. (Not implemented yet) :vartype init: str :ivar max_iter: Maximum number of iterations of the local/global k-Medians algorithms. :vartype max_iter: int :ivar tol: Relative tolerance with regards to inertia to declare convergence, both for local and global k-Medians. :vartype tol: float :ivar random_state: Determines random number generation for centroid initialization. :vartype random_state: int :ivar n_procs_to_merge: Number of processes to merge after each iteration of the local k-Medians. If None, all processes are merged after each iteration. :vartype n_procs_to_merge: int .. rubric:: References [1] Rasim M. Alguliyev, Ramiz M. Aliguliyev, Lyudmila V. Sukhostat, Parallel batch k-means for Big data clustering, Computers & Industrial Engineering, Volume 152 (2021). https://doi.org/10.1016/j.cie.2020.107023. .. attribute:: init :annotation: = 'k-medians++' .. role:: raw-html(raw) :format: html .. function:: _validate_input(X, labels, metric='euclidean') Input validation for clustering metrics. Converts input to DNDarray if needed. :param X: Input data. :type X: {DNDarray, list} :param labels: Labels. :type labels: {DNDarray, list} :param metric: The metric to use for validation. Default is "euclidean". :type metric: str, optional :returns: * **X** (*DNDarray*) -- The converted and validated X. * **labels** (*DNDarray*) -- The converted and validated labels. ..
rubric:: Examples >>> import heat as ht >>> X = ht.array([[1, 2], [3, 4]], dtype=ht.float) >>> labels = ht.array([0, 1]) >>> _validate_input(X, labels) (DNDarray([[1., 2.], [3., 4.]], dtype=ht.float32, device=cpu:0, split=None), DNDarray([0, 1], dtype=ht.int64, device=cpu:0, split=None)) .. function:: silhouette_samples(X, labels, *, metric='euclidean') Compute the Silhouette Coefficient for each sample. The Silhouette Coefficient is a measure of how close an object is to its own cluster (cohesion) compared to other clusters (separation). The best value is 1 and the worst value is -1. Values near 0 indicate overlapping clusters. * The score is 0 for clusters with only a single sample. * The calculation involves computing the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each sample. Parameters ---------- X : DNDarray An array of pairwise distances between samples, or a feature array. If `metric='precomputed'`, X is assumed to be a distance matrix and a feature array otherwise. labels : DNDarray Labels for each sample. metric : str, optional The metric to use when calculating distance between instances in a feature array. If metric is "precomputed", X is assumed to be a distance matrix. Default is "euclidean". Returns ------- DNDarray Silhouette value of all individual samples in the clustering Notes ----- The Silhouette Coefficient $s(i)$ for a single sample is defined as: $$s(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))}$$ where $a(i)$ is the mean distance to other samples in the same cluster and $b(i)$ is the mean distance to samples in the nearest neighbor cluster. Raises ------ ValueError If `metric='precomputed'` and the diagonal contains non-zero elements. See Also -------- silhouette_score : Average silhouette coefficient over all samples.
Examples -------- >>> import heat as ht >>> X = ht.array([[1, 2], [1, 1], [4, 4], [4, 5]], split=0) >>> labels = ht.array([0, 0, 1, 1], split=0) >>> ht.cluster.silhouette_samples(X, labels) DNDarray([0.7452, 0.7836, 0.7452, 0.7836], dtype=ht.float64, device=cpu:0, split=0) .. function:: silhouette_score(X, labels, *, metric='euclidean', sample_size=None, random_state=None, **kwargs) Compute the mean Silhouette Coefficient of all samples. The Silhouette Coefficient is calculated using the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each sample. The Silhouette Coefficient for a sample is $(b - a) / \max(a, b)$. * This function returns the average of `silhouette_samples`. * To clarify, $b$ is the distance between a sample and the nearest cluster that the sample is not a part of. :param X: An array of pairwise distances between samples, or a feature array. :type X: DNDarray :param labels: Labels for each sample. :type labels: DNDarray :param metric: The metric to use when calculating distance between instances in a feature array. If metric is "precomputed", X is assumed to be a distance matrix. Default is "euclidean". :type metric: str, optional :param sample_size: The size of the sample to use when computing the Silhouette Coefficient on a random subset of the data. If ``sample_size is None``, no sampling is used. :type sample_size: int, optional :param random_state: Determines random number generation for selecting a subset of samples. Used when `sample_size` is not `None`. :type random_state: int, optional :param \*\*kwargs: Additional keyword arguments passed to `silhouette_samples`. :type \*\*kwargs: optional :returns: Silhouette score of the clustering :rtype: float .. rubric:: Notes The best value is 1 and the worst value is -1. Values near 0 indicate overlapping clusters. .. seealso:: :py:obj:`silhouette_samples` Silhouette Coefficient for each individual sample. .. 
rubric:: Examples >>> import heat as ht >>> X = ht.array([[1, 2], [1, 1], [4, 4], [4, 5]], split=0) >>> labels = ht.array([0, 0, 1, 1], split=0) >>> ht.cluster.silhouette_score(X, labels) 0.76439
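The definition $s = (b - a) / \max(a, b)$ used by ``silhouette_samples`` and ``silhouette_score`` can be checked by hand on the four-point example above. The following is a minimal single-process sketch in plain NumPy, assuming Euclidean distances; the function name ``silhouette_samples_np`` is hypothetical and is not part of Heat, and the code does not reflect how Heat computes the coefficient on distributed ``DNDarray`` objects.

```python
import numpy as np


def silhouette_samples_np(X, labels):
    """Hypothetical NumPy-only reference for s(i) = (b - a) / max(a, b)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = X.shape[0]

    # Dense pairwise Euclidean distance matrix (fine for small inputs only).
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))

    a = np.zeros(n)              # mean intra-cluster distance
    b = np.full(n, np.inf)       # mean distance to the nearest other cluster
    singleton = np.zeros(n, dtype=bool)

    for k in np.unique(labels):
        mask = labels == k
        size = mask.sum()
        sums = D[:, mask].sum(axis=1)   # summed distance of every sample to cluster k
        if size > 1:
            # The self-distance is 0, so divide by size - 1 to average over the others.
            a[mask] = sums[mask] / (size - 1)
        else:
            singleton[mask] = True
        # For samples outside cluster k, keep the smallest mean distance seen so far.
        b[~mask] = np.minimum(b[~mask], sums[~mask] / size)

    denom = np.maximum(a, b)
    # Samples in single-sample clusters keep a score of 0, as stated above.
    valid = ~singleton & np.isfinite(b) & (denom > 0)
    s = np.zeros(n)
    s[valid] = (b[valid] - a[valid]) / denom[valid]
    return s


X = [[1, 2], [1, 1], [4, 4], [4, 5]]
labels = [0, 0, 1, 1]
scores = silhouette_samples_np(X, labels)
mean_score = float(scores.mean())
print(np.round(scores, 4), round(mean_score, 5))
```

For the first sample, $a = 1$ (its distance to ``[1, 1]``) and $b \approx 3.9241$ (the mean of its distances to ``[4, 4]`` and ``[4, 5]``), giving $s \approx 0.7452$, which agrees with the per-sample values and the mean of 0.76439 shown in the examples above.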