heat.statistics

Distributed statistical operations.

Module Contents

argmax(x: heat.core.dndarray.DNDarray, axis: int | None = None, out: heat.core.dndarray.DNDarray | None = None, **kwargs: object) heat.core.dndarray.DNDarray

Returns an array of the indices of the maximum values along an axis. It has the same shape as x.shape with the dimension along axis removed.

Parameters:
  • x (DNDarray) – Input array.

  • axis (int, optional) – By default, the index is into the flattened array, otherwise along the specified axis.

  • out (DNDarray, optional.) – If provided, the result will be inserted into this array. It should be of the appropriate shape and dtype.

Examples

>>> a = ht.random.randn(3, 3)
>>> a
DNDarray([[ 1.0661,  0.7036, -2.0908],
          [-0.7534, -0.4986, -0.7751],
          [-0.4815,  1.9436,  0.6400]], dtype=ht.float32, device=cpu:0, split=None)
>>> ht.argmax(a)
DNDarray([7], dtype=ht.int64, device=cpu:0, split=None)
>>> ht.argmax(a, axis=0)
DNDarray([0, 2, 2], dtype=ht.int64, device=cpu:0, split=None)
>>> ht.argmax(a, axis=1)
DNDarray([0, 1, 1], dtype=ht.int64, device=cpu:0, split=None)
argmin(x: heat.core.dndarray.DNDarray, axis: int | None = None, out: heat.core.dndarray.DNDarray | None = None, **kwargs: object) heat.core.dndarray.DNDarray

Returns an array of the indices of the minimum values along an axis. It has the same shape as x.shape with the dimension along axis removed.

Parameters:
  • x (DNDarray) – Input array.

  • axis (int, optional) – By default, the index is into the flattened array, otherwise along the specified axis.

  • out (DNDarray, optional) – Issue #100 If provided, the result will be inserted into this array. It should be of the appropriate shape and dtype.

Examples

>>> a = ht.random.randn(3, 3)
>>> a
DNDarray([[ 1.0661,  0.7036, -2.0908],
          [-0.7534, -0.4986, -0.7751],
          [-0.4815,  1.9436,  0.6400]], dtype=ht.float32, device=cpu:0, split=None)
>>> ht.argmin(a)
DNDarray([2], dtype=ht.int64, device=cpu:0, split=None)
>>> ht.argmin(a, axis=0)
DNDarray([1, 1, 0], dtype=ht.int64, device=cpu:0, split=None)
>>> ht.argmin(a, axis=1)
DNDarray([2, 2, 0], dtype=ht.int64, device=cpu:0, split=None)
average(x: heat.core.dndarray.DNDarray, axis: int | Tuple[int, Ellipsis] | None = None, weights: heat.core.dndarray.DNDarray | None = None, returned: bool = False) heat.core.dndarray.DNDarray | Tuple[heat.core.dndarray.DNDarray, Ellipsis]

Compute the weighted average along the specified axis.

If returned=True, return a tuple with the average as the first element and the sum of the weights as the second element. sum_of_weights is of the same type as average.

Parameters:
  • x (DNDarray) – Array containing data to be averaged.

  • axis (None or int or Tuple[int,...], optional) – Axis or axes along which to average x. The default, axis=None, will average over all of the elements of the input array. If axis is negative it counts from the last to the first axis. #TODO Issue #351: If axis is a tuple of ints, averaging is performed on all of the axes specified in the tuple instead of a single axis or all the axes as before.

  • weights (DNDarray, optional) – An array of weights associated with the values in x. Each value in x contributes to the average according to its associated weight. The weights array can either be 1D (in which case its length must be the size of x along the given axis) or of the same shape as x. If weights=None, then all data in x are assumed to have a weight equal to one, the result is equivalent to mean().

  • returned (bool, optional) – If True, the tuple (average, sum_of_weights) is returned, otherwise only the average is returned. If weights=None, sum_of_weights is equivalent to the number of elements over which the average is taken.

Raises:
  • ZeroDivisionError – When all weights along axis are zero.

  • TypeError – When the length of 1D weights is not the same as the shape of x along axis.

Examples

>>> data = ht.arange(1,5, dtype=float)
>>> data
DNDarray([1., 2., 3., 4.], dtype=ht.float32, device=cpu:0, split=None)
>>> ht.average(data)
DNDarray(2.5000, dtype=ht.float32, device=cpu:0, split=None)
>>> ht.average(ht.arange(1,11, dtype=float), weights=ht.arange(10,0,-1))
DNDarray([4.], dtype=ht.float64, device=cpu:0, split=None)
>>> data = ht.array([[0, 1],
                     [2, 3],
                    [4, 5]], dtype=float, split=1)
>>> weights = ht.array([1./4, 3./4])
>>> ht.average(data, axis=1, weights=weights)
DNDarray([0.7500, 2.7500, 4.7500], dtype=ht.float32, device=cpu:0, split=None)
>>> ht.average(data, weights=weights)
Traceback (most recent call last):
    ...
TypeError: Axis must be specified when shapes of x and weights differ.
bincount(x: heat.core.dndarray.DNDarray, weights: heat.core.dndarray.DNDarray | None = None, minlength: int = 0) heat.core.dndarray.DNDarray

Count number of occurrences of each value in array of non-negative ints. Return a non-distributed DNDarray of length max(x) + 1 if input is non-empty, else 0.

The number of bins (size 1) is one larger than the largest value in x unless x is empty, in which case the result is a tensor of size 0. If minlength is specified, the number of bins is at least minlength and if x is empty, then the result is tensor of size minlength filled with zeros. If n is the value at position i, out[n] += weights[i] if weights is specified else out[n] += 1.

Parameters:
  • x (DNDarray) – 1-dimensional, non-negative ints

  • weights (DNDarray, optional) – Weight for each value in the input tensor. Array of the same shape as x. Same split as x.

  • minlength (int, non-negative, optional) – Minimum number of bins

Raises:

ValueError – If x and weights don’t have the same distribution.

Examples

>>> ht.bincount(ht.arange(5))
DNDarray([1, 1, 1, 1, 1], dtype=ht.int64, device=cpu:0, split=None)
>>> ht.bincount(ht.array([0, 1, 3, 2, 1]), weights=ht.array([0, 0.5, 1, 1.5, 2]))
DNDarray([0.0000, 2.5000, 1.5000, 1.0000], dtype=ht.float32, device=cpu:0, split=None)
bucketize(input: heat.core.dndarray.DNDarray, boundaries: heat.core.dndarray.DNDarray | torch.Tensor, out_int32: bool = False, right: bool = False, out: heat.core.dndarray.DNDarray = None) heat.core.dndarray.DNDarray

Returns the indices of the buckets to which each value in the input belongs, where the boundaries of the buckets are set by boundaries.

Parameters:
  • input (DNDarray) – The input array.

  • boundaries (DNDarray or torch.Tensor) – monotonically increasing sequence defining the bucket boundaries, 1-dimensional, not distributed

  • out_int32 (bool, optional) – set the dtype of the output to ht.int64 (False) or ht.int32 (True)

  • right (bool, optional) – indicate whether the buckets include the right (False) or left (True) boundaries, see Notes.

  • out (DNDarray, optional) – The output array, must be the shame shape and split as the input array.

Notes

This function uses the PyTorch’s setting for right:

right

returned index i satisfies

False

boundaries[i-1] < x <= boundaries[i]

True

boundaries[i-1] <= x < boundaries[i]

Raises:

RuntimeError – If boundaries is distributed.

See also

digitize

NumPy-like version of this function.

Examples

>>> boundaries = ht.array([1, 3, 5, 7, 9])
>>> v = ht.array([[3, 6, 9], [3, 6, 9]])
>>> ht.bucketize(v, boundaries)
DNDarray([[1, 3, 4],
          [1, 3, 4]], dtype=ht.int64, device=cpu:0, split=None)
>>> ht.bucketize(v, boundaries, right=True)
DNDarray([[2, 3, 5],
          [2, 3, 5]], dtype=ht.int64, device=cpu:0, split=None)
cov(m: heat.core.dndarray.DNDarray, y: heat.core.dndarray.DNDarray | None = None, rowvar: bool = True, bias: bool = False, ddof: int | None = None) heat.core.dndarray.DNDarray

Estimate the covariance matrix of some data, m. For more imformation on the algorithm please see the numpy function of the same name

Parameters:
  • m (DNDarray) – A 1-D or 2-D array containing multiple variables and observations. Each row of m represents a variable, and each column a single observation of all those variables.

  • y (DNDarray, optional) – An additional set of variables and observations. y has the same form as that of m.

  • rowvar (bool, optional) – If True (default), then each row represents a variable, with observations in the columns. Otherwise, the relationship is transposed: each column represents a variable, while the rows contain observations.

  • bias (bool, optional) – Default normalization (False) is by (N - 1), where N is the number of observations given (unbiased estimate). If True, then normalization is by N. These values can be overridden by using the keyword ddof in numpy versions >= 1.5.

  • ddof (int, optional) – If not None the default value implied by bias is overridden. Note that ddof=1 will return the unbiased estimate and ddof=0 will return the simple average.

digitize(x: heat.core.dndarray.DNDarray, bins: heat.core.dndarray.DNDarray | torch.Tensor, right: bool = False) heat.core.dndarray.DNDarray

Return the indices of the bins to which each value in the input array x belongs. If values in x are beyond the bounds of bins, 0 or len(bins) is returned as appropriate.

Parameters:
  • x (DNDarray) – The input array

  • bins (DNDarray or torch.Tensor) – A 1-dimensional array containing a monotonic sequence describing the bin boundaries, not distributed.

  • right (bool, optional) – Indicating whether the intervals include the right or the left bin edge, see Notes.

Notes

This function uses NumPy’s setting for right:

right

order of bins

returned index i satisfies

False

increasing

bins[i-1] <= x < bins[i]

True

increasing

bins[i-1] < x <= bins[i]

False

decreasing

bins[i-1] > x >= bins[i]

True

decreasing

bins[i-1] >= x > bins[i]

Raises:

RuntimeError – If bins is distributed.

See also

bucketize

PyTorch-like version of this function.

Examples

>>> x = ht.array([1.2, 10.0, 12.4, 15.5, 20.])
>>> bins = ht.array([0, 5, 10, 15, 20])
>>> ht.digitize(x,bins,right=True)
DNDarray([1, 2, 3, 4, 4], dtype=ht.int64, device=cpu:0, split=None)
>>> ht.digitize(x,bins,right=False)
DNDarray([1, 3, 3, 4, 5], dtype=ht.int64, device=cpu:0, split=None)
histc(input: heat.core.dndarray.DNDarray, bins: int = 100, min: int = 0, max: int = 0, out: heat.core.dndarray.DNDarray | None = None) heat.core.dndarray.DNDarray

Return the histogram of a DNDarray.

The elements are sorted into equal width bins between min and max. If min and max are both equal, the minimum and maximum values of the data are used. Elements lower than min and higher than max are ignored.

Parameters:
  • input (DNDarray) – the input array, must be of float type

  • bins (int, optional) – number of histogram bins

  • min (int, optional) – lower end of the range (inclusive)

  • max (int, optional) – upper end of the range (inclusive)

  • out (DNDarray, optional) – the output tensor, same dtype as input

Examples

>>> ht.histc(ht.array([1., 2, 1]), bins=4, min=0, max=3)
DNDarray([0., 2., 1., 0.], dtype=ht.float32, device=cpu:0, split=None)
>>> ht.histc(ht.arange(10, dtype=ht.float64, split=0), bins=10)
DNDarray([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], dtype=ht.float64, device=cpu:0, split=None)
histogram(a: heat.core.dndarray.DNDarray, bins: int = 10, range: Tuple[int, int] = (0, 0), normed: bool | None = None, weights: heat.core.dndarray.DNDarray | None = None, density: bool | None = None) heat.core.dndarray.DNDarray

Compute the histogram of a DNDarray.

Parameters:
  • a (DNDarray) – the input array, must be of float type

  • bins (int, optional) – number of histogram bins

  • range (Tuple[int,int], optional) – lower and upper end of the bins. If not provided, range is simply (a.min(), a.max()).

  • normed (bool, optional) – Deprecated since NumPy version 1.6. TODO: remove.

  • weights (DNDarray, optional) – array of weights. Not implemented yet.

  • density (bool, optional) – Not implemented yet.

Notes

This is a wrapper function of histc() for some basic compatibility with the NumPy API.

See also

histc()

kurtosis(x: heat.core.dndarray.DNDarray, axis: int | None = None, unbiased: bool = True, Fischer: bool = True) heat.core.dndarray.DNDarray

Compute the kurtosis (Fisher or Pearson) of a dataset. Kurtosis is the fourth central moment divided by the square of the variance. If Fisher’s definition is used, then 3.0 is subtracted from the result to give 0.0 for a normal distribution.

If unbiased is True (defualt) then the kurtosis is calculated using k statistics to eliminate bias coming from biased moment estimators

Parameters:
  • x (ht.DNDarray) – Input array

  • axis (NoneType or Int) – Axis along which skewness is calculated, Default is to compute over the whole array x

  • unbiased (Bool) – if True (default) the calculations are corrected for bias

  • Fischer (bool) – Whether use Fischer’s definition or not. If true 3. is subtracted from the result.

Warning

UserWarning: Dependent on the axis given and the split configuration, a UserWarning may be thrown during this function as data is transferred between processes.

max(x: heat.core.dndarray.DNDarray, axis: int | Tuple[int, Ellipsis] | None = None, out: heat.core.dndarray.DNDarray | None = None, keepdims: bool | None = None) heat.core.dndarray.DNDarray

Return the maximum along a given axis.

Parameters:
  • x (DNDarray) – Input array.

  • axis (None or int or Tuple[int,...], optional) – Axis or axes along which to operate. By default, flattened input is used. If this is a tuple of ints, the maximum is selected over multiple axes, instead of a single axis or all the axes as before.

  • out (DNDarray, optional) – Tuple of two output arrays (max, max_indices). Must be of the same shape and buffer length as the expected output. The minimum value of an output element. Must be present to allow computation on empty slice.

  • keepdims (bool, optional) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original array.

Examples

>>> a = ht.float32([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12]
])
>>> ht.max(a)
DNDarray([12.], dtype=ht.float32, device=cpu:0, split=None)
>>> ht.max(a, axis=0)
DNDarray([10., 11., 12.], dtype=ht.float32, device=cpu:0, split=None)
>>> ht.max(a, axis=1)
DNDarray([ 3.,  6.,  9., 12.], dtype=ht.float32, device=cpu:0, split=None)
maximum(x1: heat.core.dndarray.DNDarray, x2: heat.core.dndarray.DNDarray, out: heat.core.dndarray.DNDarray | None = None) heat.core.dndarray.DNDarray

Compares two DNDarrays and returns a new DNDarray containing the element-wise maxima. The DNDarrays must have the same shape, or shapes that can be broadcast to a single shape. For broadcasting semantics, see: https://pytorch.org/docs/stable/notes/broadcasting.html If one of the elements being compared is NaN, then that element is returned. TODO: Check this: If both elements are NaNs then the first is returned. The latter distinction is important for complex NaNs, which are defined as at least one of the real or imaginary parts being NaN. The net effect is that NaNs are propagated.

Parameters:
  • x1 (DNDarray) – The first array containing the elements to be compared.

  • x2 (DNDarray) – The second array containing the elements to be compared.

  • out (DNDarray, optional) – A location into which the result is stored. If provided, it must have a shape that the inputs broadcast to. If not provided or None, a freshly-allocated array is returned.

Examples

>>> import heat as ht
>>> a = ht.random.randn(3, 4)
>>> a
DNDarray([[ 0.2701, -0.6993,  1.2197,  0.0579],
          [ 0.6815,  0.4722, -0.3947, -0.3030],
          [ 1.0101, -1.2460, -1.3953, -0.6879]], dtype=ht.float32, device=cpu:0, split=None)
>>> b = ht.random.randn(3, 4)
>>> b
DNDarray([[ 0.9664,  0.6159, -0.8555,  0.8204],
          [-1.2200, -0.0759,  0.0437,  0.4700],
          [ 1.2271,  1.0530,  0.1095,  0.8386]], dtype=ht.float32, device=cpu:0, split=None)
>>> ht.maximum(a, b)
DNDarray([[0.9664, 0.6159, 1.2197, 0.8204],
          [0.6815, 0.4722, 0.0437, 0.4700],
          [1.2271, 1.0530, 0.1095, 0.8386]], dtype=ht.float32, device=cpu:0, split=None)
>>> c = ht.random.randn(1, 4)
>>> c
DNDarray([[-0.5363, -0.9765,  0.4099,  0.3520]], dtype=ht.float32, device=cpu:0, split=None)
>>> ht.maximum(a, c)
DNDarray([[ 0.2701, -0.6993,  1.2197,  0.3520],
          [ 0.6815,  0.4722,  0.4099,  0.3520],
          [ 1.0101, -0.9765,  0.4099,  0.3520]], dtype=ht.float32, device=cpu:0, split=None)
>>> d = ht.random.randn(3, 4, 5)
>>> ht.maximum(a, d)
ValueError: operands could not be broadcast, input shapes (3, 4) (3, 4, 5)
mean(x: heat.core.dndarray.DNDarray, axis: int | Tuple[int, Ellipsis] | None = None) heat.core.dndarray.DNDarray

Calculates and returns the mean of a DNDarray. If an axis is given, the mean will be taken in that direction.

Parameters:
  • x (DNDarray) – Values for which the mean is calculated for. The dtype of x must be a float

  • axis (None or int or iterable) – Axis which the mean is taken in. Default None calculates mean of all data items.

Notes

Split semantics when axis is an integer:

  • if axis==x.split, then mean(x).split=None

  • if axis>split, then mean(x).split=x.split

  • if axis<split, then mean(x).split=x.split-1

Examples

>>> a = ht.random.randn(1,3)
>>> a
DNDarray([[-0.1164,  1.0446, -0.4093]], dtype=ht.float32, device=cpu:0, split=None)
>>> ht.mean(a)
DNDarray(0.1730, dtype=ht.float32, device=cpu:0, split=None)
>>> a = ht.random.randn(4,4)
>>> a
DNDarray([[-1.0585,  0.7541, -1.1011,  0.5009],
          [-1.3575,  0.3344,  0.4506,  0.7379],
          [-0.4337, -0.6516, -1.3690, -0.8772],
          [ 0.6929, -1.0989, -0.9961,  0.3547]], dtype=ht.float32, device=cpu:0, split=None)
>>> ht.mean(a, 1)
DNDarray([-0.2262,  0.0413, -0.8328, -0.2619], dtype=ht.float32, device=cpu:0, split=None)
>>> ht.mean(a, 0)
DNDarray([-0.5392, -0.1655, -0.7539,  0.1791], dtype=ht.float32, device=cpu:0, split=None)
>>> a = ht.random.randn(4,4)
>>> a
DNDarray([[-0.1441,  0.5016,  0.8907,  0.6318],
          [-1.1690, -1.2657,  1.4840, -0.1014],
          [ 0.4133,  1.4168,  1.3499,  1.0340],
          [-0.9236, -0.7535, -0.2466, -0.9703]], dtype=ht.float32, device=cpu:0, split=None)
>>> ht.mean(a, (0,1))
DNDarray(0.1342, dtype=ht.float32, device=cpu:0, split=None)
median(x: heat.core.dndarray.DNDarray, axis: int | None = None, keepdims: bool = False) heat.core.dndarray.DNDarray

Compute the median of the data along the specified axis. Returns the median of the DNDarray elements.

Parameters:
  • x (DNDarray) – Input tensor

  • axis (int, or None, optional) – Axis along which the median is computed. Default is None, i.e., the median is computed along a flattened version of the DNDarray.

  • keepdims (bool, optional) – If True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result can broadcast correctly against the original array a.

min(x: heat.core.dndarray.DNDarray, axis: int | Tuple[int, Ellipsis] | None = None, out: heat.core.dndarray.DNDarray | None = None, keepdims: bool | None = None) heat.core.dndarray.DNDarray

Return the minimum along a given axis.

Parameters:
  • x (DNDarray) – Input array.

  • axis (None or int or Tuple[int,...]) – Axis or axes along which to operate. By default, flattened input is used. If this is a tuple of ints, the minimum is selected over multiple axes, instead of a single axis or all the axes as before.

  • out (Tuple[DNDarray,DNDarray], optional) – Tuple of two output arrays (min, min_indices). Must be of the same shape and buffer length as the expected output. The maximum value of an output element. Must be present to allow computation on empty slice.

  • keepdims (bool, optional) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original array.

Examples

>>> a = ht.float32([
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9],
        [10, 11, 12]
    ])
>>> ht.min(a)
DNDarray([1.], dtype=ht.float32, device=cpu:0, split=None)
>>> ht.min(a, axis=0)
DNDarray([1., 2., 3.], dtype=ht.float32, device=cpu:0, split=None)
>>> ht.min(a, axis=1)
DNDarray([ 1.,  4.,  7., 10.], dtype=ht.float32, device=cpu:0, split=None)
minimum(x1: heat.core.dndarray.DNDarray, x2: heat.core.dndarray.DNDarray, out: heat.core.dndarray.DNDarray | None = None) heat.core.dndarray.DNDarray

Compares two DNDarrays and returns a new DNDarray containing the element-wise minima. If one of the elements being compared is NaN, then that element is returned. They must have the same shape, or shapes that can be broadcast to a single shape. For broadcasting semantics, see: https://pytorch.org/docs/stable/notes/broadcasting.html TODO: Check this: If both elements are NaNs then the first is returned. The latter distinction is important for complex NaNs, which are defined as at least one of the real or imaginary parts being NaN. The net effect is that NaNs are propagated.

Parameters:
  • x1 (DNDarray) – The first array containing the elements to be compared.

  • x2 (DNDarray) – The second array containing the elements to be compared.

  • out (DNDarray, optional) – A location into which the result is stored. If provided, it must have a shape that the inputs broadcast to. If not provided or None, a freshly-allocated array is returned.

Examples

>>> import heat as ht
>>> a = ht.random.randn(3,4)
>>> a
DNDarray([[-0.5462,  0.0079,  1.2828,  1.4980],
          [ 0.6503, -1.1069,  1.2131,  1.4003],
          [-0.3203, -0.2318,  1.0388,  0.4439]], dtype=ht.float32, device=cpu:0, split=None)
>>> b = ht.random.randn(3,4)
>>> b
DNDarray([[ 1.8505,  2.3055, -0.2825, -1.4718],
          [-0.3684,  1.6866, -0.8570, -0.4779],
          [ 1.0532,  0.3775, -0.8669, -1.7275]], dtype=ht.float32, device=cpu:0, split=None)
>>> ht.minimum(a,b)
DNDarray([[-0.5462,  0.0079, -0.2825, -1.4718],
          [-0.3684, -1.1069, -0.8570, -0.4779],
          [-0.3203, -0.2318, -0.8669, -1.7275]], dtype=ht.float32, device=cpu:0, split=None)
>>> c = ht.random.randn(1,4)
>>> c
DNDarray([[-1.4358,  1.2914, -0.6042, -1.4009]], dtype=ht.float32, device=cpu:0, split=None)
>>> ht.minimum(a,c)
DNDarray([[-1.4358,  0.0079, -0.6042, -1.4009],
          [-1.4358, -1.1069, -0.6042, -1.4009],
          [-1.4358, -0.2318, -0.6042, -1.4009]], dtype=ht.float32, device=cpu:0, split=None)
>>> d = ht.random.randn(3,4,5)
>>> ht.minimum(a,d)
ValueError: operands could not be broadcast, input shapes (3, 4) (3, 4, 5)
percentile(x: heat.core.dndarray.DNDarray, q: heat.core.dndarray.DNDarray | int | float | Tuple | List, axis: int | None = None, out: heat.core.dndarray.DNDarray | None = None, interpolation: str = 'linear', keepdims: bool = False) heat.core.dndarray.DNDarray

Compute the q-th percentile of the data along the specified axis. Returns the q-th percentile(s) of the tensor elements.

Parameters:
  • x (DNDarray) – Input tensor

  • q (DNDarray, scalar, or list of scalars) – Percentile or sequence of percentiles to compute. Must belong to the interval [0, 100].

  • axis (int, or None, optional) – Axis along which the percentiles are computed. Default is None.

  • out (DNDarray, optional.) – Output buffer.

  • interpolation (str, optional) –

    Interpolation method to use when the desired percentile lies between two data points \(i < j\). Can be one of:

    • ‘linear’: \(i + (j - i) \cdot fraction\), where fraction is the fractional part of the index surrounded by i and j.

    • ‘lower’: i.

    • ‘higher’: j.

    • ‘nearest’: i or j, whichever is nearest.

    • ‘midpoint’: \((i + j) / 2\).

  • keepdims (bool, optional) – If True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result can broadcast correctly against the original array x.

skew(x: heat.core.dndarray.DNDarray, axis: int = None, unbiased: bool = True) heat.core.dndarray.DNDarray

Compute the sample skewness of a data set.

Parameters:
  • x (ht.DNDarray) – Input array

  • axis (NoneType or Int) – Axis along which skewness is calculated, Default is to compute over the whole array x

  • unbiased (Bool) – if True (default) the calculations are corrected for bias

Warning

UserWarning: Dependent on the axis given and the split configuration, a UserWarning may be thrown during this function as data is transferred between processes.

std(x: heat.core.dndarray.DNDarray, axis: int | Tuple[int] | List[int] = None, ddof: int = 0, **kwargs: object) heat.core.dndarray.DNDarray

Calculates the standard deviation of a DNDarray with the bessel correction. If an axis is given, the variance will be taken in that direction.

Parameters:
  • x (DNDarray) – array for which the std is calculated for. The datatype of x must be a float

  • axis (None or int or iterable) – Axis which the std is taken in. Default None calculates std of all data items.

  • ddof (int, optional) – Delta Degrees of Freedom: the denominator implicitely used in the calculation is N - ddof, where N represents the number of elements. If ddof=1, the Bessel correction will be applied. Setting ddof>1 raises a NotImplementedError.

Examples

>>> a = ht.random.randn(1, 3)
>>> a
DNDarray([[ 0.5714,  0.0048, -0.2942]], dtype=ht.float32, device=cpu:0, split=None)
>>> ht.std(a)
DNDarray(0.3590, dtype=ht.float32, device=cpu:0, split=None)
>>> a = ht.random.randn(4,4)
>>> a
DNDarray([[ 0.8488,  1.2225,  1.2498, -1.4592],
          [-0.5820, -0.3928,  0.1509, -0.0174],
          [ 0.6426, -1.8149,  0.1369,  0.0042],
          [-0.6043, -0.0523, -1.6653,  0.6631]], dtype=ht.float32, device=cpu:0, split=None)
>>> ht.std(a, 1, ddof=1)
DNDarray([1.2961, 0.3362, 1.0739, 0.9820], dtype=ht.float32, device=cpu:0, split=None)
>>> ht.std(a, 1)
DNDarray([1.2961, 0.3362, 1.0739, 0.9820], dtype=ht.float32, device=cpu:0, split=None)
var(x: heat.core.dndarray.DNDarray, axis: int | Tuple[int] | List[int] = None, ddof: int = 0, **kwargs: object) heat.core.dndarray.DNDarray

Calculates and returns the variance of a DNDarray. If an axis is given, the variance will be taken in that direction.

Parameters:
  • x (DNDarray) – Array for which the variance is calculated for. The datatype of x must be a float

  • axis (None or int or iterable) – Axis which the std is taken in. Default None calculates std of all data items.

  • ddof (int, optional) – Delta Degrees of Freedom: the denominator implicitely used in the calculation is N - ddof, where N represents the number of elements. If ddof=1, the Bessel correction will be applied. Setting ddof>1 raises a NotImplementedError.

Notes

Split semantics when axis is an integer:

  • if axis=x.split, then var(x).split=None

  • if axis>split, then var(x).split = x.split

  • if axis<split, then var(x).split=x.split - 1

The variance is the average of the squared deviations from the mean, i.e., var=mean(abs(x - x.mean())**2). The mean is normally calculated as x.sum()/N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of a hypothetical infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables.

Examples

>>> a = ht.random.randn(1,3)
>>> a
DNDarray([[-2.3589, -0.2073,  0.8806]], dtype=ht.float32, device=cpu:0, split=None)
>>> ht.var(a)
DNDarray(1.8119, dtype=ht.float32, device=cpu:0, split=None)
>>> ht.var(a, ddof=1)
DNDarray(2.7179, dtype=ht.float32, device=cpu:0, split=None)
>>> a = ht.random.randn(4,4)
>>> a
DNDarray([[-0.8523, -1.4982, -0.5848, -0.2554],
          [ 0.8458, -0.3125, -0.2430,  1.9016],
          [-0.6778, -0.3584, -1.5112,  0.6545],
          [-0.9161,  0.0168,  0.0462,  0.5964]], dtype=ht.float32, device=cpu:0, split=None)
>>> ht.var(a, 1)
DNDarray([0.2777, 1.0957, 0.8015, 0.3936], dtype=ht.float32, device=cpu:0, split=None)
>>> ht.var(a, 0)
DNDarray([0.7001, 0.4376, 0.4576, 0.7890], dtype=ht.float32, device=cpu:0, split=None)
>>> ht.var(a, 0, ddof=1)
DNDarray([0.7001, 0.4376, 0.4576, 0.7890], dtype=ht.float32, device=cpu:0, split=None)
>>> ht.var(a, 0, ddof=0)
DNDarray([0.7001, 0.4376, 0.4576, 0.7890], dtype=ht.float32, device=cpu:0, split=None)