heat.statistics
Distributed statistical operations.
Module Contents
- argmax(x: heat.core.dndarray.DNDarray, axis: int | None = None, out: heat.core.dndarray.DNDarray | None = None, **kwargs: object) heat.core.dndarray.DNDarray[source]
Returns an array of the indices of the maximum values along an axis. It has the same shape as
x.shapewith the dimension along axis removed.- Parameters:
x (DNDarray) – Input array.
axis (int, optional) – By default, the index is into the flattened array, otherwise along the specified axis.
out (DNDarray, optional.) – If provided, the result will be inserted into this array. It should be of the appropriate shape and dtype.
**kwargs – Extra keyword arguments
Examples
>>> a = ht.random.randn(3, 3) >>> a DNDarray([[ 1.0661, 0.7036, -2.0908], [-0.7534, -0.4986, -0.7751], [-0.4815, 1.9436, 0.6400]], dtype=ht.float32, device=cpu:0, split=None) >>> ht.argmax(a) DNDarray([7], dtype=ht.int64, device=cpu:0, split=None) >>> ht.argmax(a, axis=0) DNDarray([0, 2, 2], dtype=ht.int64, device=cpu:0, split=None) >>> ht.argmax(a, axis=1) DNDarray([0, 1, 1], dtype=ht.int64, device=cpu:0, split=None)
- argmin(x: heat.core.dndarray.DNDarray, axis: int | None = None, out: heat.core.dndarray.DNDarray | None = None, **kwargs: object) heat.core.dndarray.DNDarray[source]
Returns an array of the indices of the minimum values along an axis. It has the same shape as
x.shapewith the dimension along axis removed.- Parameters:
x (DNDarray) – Input array.
axis (int, optional) – By default, the index is into the flattened array, otherwise along the specified axis.
out (DNDarray, optional) – Issue #100 If provided, the result will be inserted into this array. It should be of the appropriate shape and dtype.
**kwargs – Extra keyword arguments
Examples
>>> a = ht.random.randn(3, 3) >>> a DNDarray([[ 1.0661, 0.7036, -2.0908], [-0.7534, -0.4986, -0.7751], [-0.4815, 1.9436, 0.6400]], dtype=ht.float32, device=cpu:0, split=None) >>> ht.argmin(a) DNDarray([2], dtype=ht.int64, device=cpu:0, split=None) >>> ht.argmin(a, axis=0) DNDarray([1, 1, 0], dtype=ht.int64, device=cpu:0, split=None) >>> ht.argmin(a, axis=1) DNDarray([2, 2, 0], dtype=ht.int64, device=cpu:0, split=None)
- average(x: heat.core.dndarray.DNDarray, axis: int | Tuple[int, Ellipsis] | None = None, weights: heat.core.dndarray.DNDarray | None = None, returned: bool = False) heat.core.dndarray.DNDarray | Tuple[heat.core.dndarray.DNDarray, Ellipsis][source]
Compute the weighted average along the specified axis.
If
returned=True, return a tuple with the average as the first element and the sum of the weights as the second element.sum_of_weightsis of the same type asaverage.- Parameters:
x (DNDarray) – Array containing data to be averaged.
axis (None or int or Tuple[int,...], optional) – Axis or axes along which to average
x. The default,axis=None, will average over all of the elements of the input array. If axis is negative it counts from the last to the first axis. #TODO Issue #351: If axis is a tuple of ints, averaging is performed on all of the axes specified in the tuple instead of a single axis or all the axes as before.weights (DNDarray, optional) – An array of weights associated with the values in
x. Each value inxcontributes to the average according to its associated weight. The weights array can either be 1D (in which case its length must be the size ofxalong the given axis) or of the same shape asx. Ifweights=None, then all data inxare assumed to have a weight equal to one, the result is equivalent tomean().returned (bool, optional) – If
True, the tuple(average, sum_of_weights)is returned, otherwise only the average is returned. Ifweights=None,sum_of_weightsis equivalent to the number of elements over which the average is taken.
- Raises:
ZeroDivisionError – When all weights along axis are zero.
TypeError – When the length of 1D weights is not the same as the shape of
xalong axis.
Examples
>>> data = ht.arange(1, 5, dtype=float) >>> data DNDarray([1., 2., 3., 4.], dtype=ht.float32, device=cpu:0, split=None) >>> ht.average(data) DNDarray(2.5000, dtype=ht.float32, device=cpu:0, split=None) >>> ht.average(ht.arange(1, 11, dtype=float), weights=ht.arange(10, 0, -1)) DNDarray([4.], dtype=ht.float64, device=cpu:0, split=None) >>> data = ht.array([[0, 1], [2, 3], [4, 5]], dtype=float, split=1) >>> weights = ht.array([1.0 / 4, 3.0 / 4]) >>> ht.average(data, axis=1, weights=weights) DNDarray([0.7500, 2.7500, 4.7500], dtype=ht.float32, device=cpu:0, split=None) >>> ht.average(data, weights=weights) Traceback (most recent call last): ... TypeError: Axis must be specified when shapes of x and weights differ.
- bincount(x: heat.core.dndarray.DNDarray, weights: heat.core.dndarray.DNDarray | None = None, minlength: int = 0) heat.core.dndarray.DNDarray[source]
Count number of occurrences of each value in array of non-negative ints. Return a non-distributed
DNDarrayof length max(x) + 1 if input is non-empty, else 0.The number of bins (size 1) is one larger than the largest value in x unless x is empty, in which case the result is a tensor of size 0. If minlength is specified, the number of bins is at least minlength and if x is empty, then the result is tensor of size minlength filled with zeros. If n is the value at position i, out[n] += weights[i] if weights is specified else out[n] += 1.
- Parameters:
- Raises:
ValueError – If x and weights don’t have the same distribution.
Examples
>>> ht.bincount(ht.arange(5)) DNDarray([1, 1, 1, 1, 1], dtype=ht.int64, device=cpu:0, split=None) >>> ht.bincount(ht.array([0, 1, 3, 2, 1]), weights=ht.array([0, 0.5, 1, 1.5, 2])) DNDarray([0.0000, 2.5000, 1.5000, 1.0000], dtype=ht.float32, device=cpu:0, split=None)
- bucketize(input: heat.core.dndarray.DNDarray, boundaries: heat.core.dndarray.DNDarray | torch.Tensor, out_int32: bool = False, right: bool = False, out: heat.core.dndarray.DNDarray = None) heat.core.dndarray.DNDarray[source]
Returns the indices of the buckets to which each value in the input belongs, where the boundaries of the buckets are set by boundaries.
- Parameters:
input (DNDarray) – The input array.
boundaries (DNDarray or torch.Tensor) – monotonically increasing sequence defining the bucket boundaries, 1-dimensional, not distributed
out_int32 (bool, optional) – set the dtype of the output to
ht.int64(False) orht.int32(True)right (bool, optional) – indicate whether the buckets include the right (False) or left (True) boundaries, see Notes.
out (DNDarray, optional) – The output array, must be the shame shape and split as the input array.
Notes
This function uses the PyTorch’s setting for
right:right
returned index i satisfies
False
boundaries[i-1] < x <= boundaries[i]
True
boundaries[i-1] <= x < boundaries[i]
- Raises:
RuntimeError – If boundaries is distributed.
See also
digitizeNumPy-like version of this function.
Examples
>>> boundaries = ht.array([1, 3, 5, 7, 9]) >>> v = ht.array([[3, 6, 9], [3, 6, 9]]) >>> ht.bucketize(v, boundaries) DNDarray([[1, 3, 4], [1, 3, 4]], dtype=ht.int64, device=cpu:0, split=None) >>> ht.bucketize(v, boundaries, right=True) DNDarray([[2, 3, 5], [2, 3, 5]], dtype=ht.int64, device=cpu:0, split=None)
- cov(m: heat.core.dndarray.DNDarray, y: heat.core.dndarray.DNDarray | None = None, rowvar: bool = True, bias: bool = False, ddof: int | None = None) heat.core.dndarray.DNDarray[source]
Estimate the covariance matrix of some data, m. For more imformation on the algorithm please see the numpy function of the same name
- Parameters:
m (DNDarray) – A 1-D or 2-D array containing multiple variables and observations. Each row of
mrepresents a variable, and each column a single observation of all those variables.y (DNDarray, optional) – An additional set of variables and observations.
yhas the same form as that ofm.rowvar (bool, optional) – If
True(default), then each row represents a variable, with observations in the columns. Otherwise, the relationship is transposed: each column represents a variable, while the rows contain observations.bias (bool, optional) – Default normalization (
False) is by (N - 1), where N is the number of observations given (unbiased estimate). IfTrue, then normalization is by N. These values can be overridden by using the keywordddofin numpy versions >= 1.5.ddof (int, optional) – If not
Nonethe default value implied bybiasis overridden. Note thatddof=1will return the unbiased estimate andddof=0will return the simple average.
- digitize(x: heat.core.dndarray.DNDarray, bins: heat.core.dndarray.DNDarray | torch.Tensor, right: bool = False) heat.core.dndarray.DNDarray[source]
Return the indices of the bins to which each value in the input array x belongs. If values in x are beyond the bounds of bins, 0 or len(bins) is returned as appropriate.
- Parameters:
Notes
This function uses NumPy’s setting for
right:right
order of bins
returned index i satisfies
False
increasing
bins[i-1] <= x < bins[i]
True
increasing
bins[i-1] < x <= bins[i]
False
decreasing
bins[i-1] > x >= bins[i]
True
decreasing
bins[i-1] >= x > bins[i]
- Raises:
RuntimeError – If bins is distributed.
See also
bucketizePyTorch-like version of this function.
Examples
>>> x = ht.array([1.2, 10.0, 12.4, 15.5, 20.0]) >>> bins = ht.array([0, 5, 10, 15, 20]) >>> ht.digitize(x, bins, right=True) DNDarray([1, 2, 3, 4, 4], dtype=ht.int64, device=cpu:0, split=None) >>> ht.digitize(x, bins, right=False) DNDarray([1, 3, 3, 4, 5], dtype=ht.int64, device=cpu:0, split=None)
- histc(input: heat.core.dndarray.DNDarray, bins: int = 100, min: int = 0, max: int = 0, out: heat.core.dndarray.DNDarray | None = None) heat.core.dndarray.DNDarray[source]
Return the histogram of a DNDarray.
The elements are sorted into equal width bins between min and max. If min and max are both equal, the minimum and maximum values of the data are used. Elements lower than min and higher than max are ignored.
- Parameters:
Examples
>>> ht.histc(ht.array([1.0, 2, 1]), bins=4, min=0, max=3) DNDarray([0., 2., 1., 0.], dtype=ht.float32, device=cpu:0, split=None) >>> ht.histc(ht.arange(10, dtype=ht.float64, split=0), bins=10) DNDarray([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], dtype=ht.float64, device=cpu:0, split=None)
- histogram(a: heat.core.dndarray.DNDarray, bins: int = 10, range: Tuple[int, int] = (0, 0), normed: bool | None = None, weights: heat.core.dndarray.DNDarray | None = None, density: bool | None = None) heat.core.dndarray.DNDarray[source]
Compute the histogram of a DNDarray.
- Parameters:
a (DNDarray) – the input array, must be of float type
bins (int, optional) – number of histogram bins
range (Tuple[int,int], optional) – lower and upper end of the bins. If not provided, range is simply (a.min(), a.max()).
normed (bool, optional) – Deprecated since NumPy version 1.6. TODO: remove.
weights (DNDarray, optional) – array of weights. Not implemented yet.
density (bool, optional) – Not implemented yet.
Notes
This is a wrapper function of
histc()for some basic compatibility with the NumPy API.See also
- kurtosis(x: heat.core.dndarray.DNDarray, axis: int | None = None, unbiased: bool = True, Fischer: bool = True) heat.core.dndarray.DNDarray[source]
Compute the kurtosis (Fisher or Pearson) of a dataset. Kurtosis is the fourth central moment divided by the square of the variance. If Fisher’s definition is used, then 3.0 is subtracted from the result to give 0.0 for a normal distribution.
If unbiased is True (defualt) then the kurtosis is calculated using k statistics to eliminate bias coming from biased moment estimators
- Parameters:
x (ht.DNDarray) – Input array
axis (NoneType or Int) – Axis along which skewness is calculated, Default is to compute over the whole array x
unbiased (Bool) – if True (default) the calculations are corrected for bias
Fischer (bool) – Whether use Fischer’s definition or not. If true 3. is subtracted from the result.
Warning
UserWarning: Dependent on the axis given and the split configuration, a UserWarning may be thrown during this function as data is transferred between processes.
- max(x: heat.core.dndarray.DNDarray, axis: int | Tuple[int, Ellipsis] | None = None, out: heat.core.dndarray.DNDarray | None = None, keepdims: bool | None = None) heat.core.dndarray.DNDarray[source]
Return the maximum along a given axis.
- Parameters:
x (DNDarray) – Input array.
axis (None or int or Tuple[int,...], optional) – Axis or axes along which to operate. By default, flattened input is used. If this is a tuple of ints, the maximum is selected over multiple axes, instead of a single axis or all the axes as before.
out (DNDarray, optional) – Tuple of two output arrays
(max, max_indices). Must be of the same shape and buffer length as the expected output. The minimum value of an output element. Must be present to allow computation on empty slice.keepdims (bool, optional) – If this is set to
True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original array.
Examples
>>> a = ht.float32([ [1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12] ]) >>> ht.max(a) DNDarray([12.], dtype=ht.float32, device=cpu:0, split=None) >>> ht.max(a, axis=0) DNDarray([10., 11., 12.], dtype=ht.float32, device=cpu:0, split=None) >>> ht.max(a, axis=1) DNDarray([ 3., 6., 9., 12.], dtype=ht.float32, device=cpu:0, split=None)
- maximum(x1: heat.core.dndarray.DNDarray, x2: heat.core.dndarray.DNDarray, out: heat.core.dndarray.DNDarray | None = None) heat.core.dndarray.DNDarray[source]
Compares two
DNDarraysand returns a newDNDarraycontaining the element-wise maxima. TheDNDarraysmust have the same shape, or shapes that can be broadcast to a single shape. For broadcasting semantics, see: https://pytorch.org/docs/stable/notes/broadcasting.html If one of the elements being compared isNaN, then that element is returned. TODO: Check this: If both elements are NaNs then the first is returned. The latter distinction is important for complex NaNs, which are defined as at least one of the real or imaginary parts beingNaN. The net effect is that NaNs are propagated.- Parameters:
x1 (DNDarray) – The first array containing the elements to be compared.
x2 (DNDarray) – The second array containing the elements to be compared.
out (DNDarray, optional) – A location into which the result is stored. If provided, it must have a shape that the inputs broadcast to. If not provided or
None, a freshly-allocated array is returned.
Examples
>>> import heat as ht >>> a = ht.random.randn(3, 4) >>> a DNDarray([[ 0.2701, -0.6993, 1.2197, 0.0579], [ 0.6815, 0.4722, -0.3947, -0.3030], [ 1.0101, -1.2460, -1.3953, -0.6879]], dtype=ht.float32, device=cpu:0, split=None) >>> b = ht.random.randn(3, 4) >>> b DNDarray([[ 0.9664, 0.6159, -0.8555, 0.8204], [-1.2200, -0.0759, 0.0437, 0.4700], [ 1.2271, 1.0530, 0.1095, 0.8386]], dtype=ht.float32, device=cpu:0, split=None) >>> ht.maximum(a, b) DNDarray([[0.9664, 0.6159, 1.2197, 0.8204], [0.6815, 0.4722, 0.0437, 0.4700], [1.2271, 1.0530, 0.1095, 0.8386]], dtype=ht.float32, device=cpu:0, split=None) >>> c = ht.random.randn(1, 4) >>> c DNDarray([[-0.5363, -0.9765, 0.4099, 0.3520]], dtype=ht.float32, device=cpu:0, split=None) >>> ht.maximum(a, c) DNDarray([[ 0.2701, -0.6993, 1.2197, 0.3520], [ 0.6815, 0.4722, 0.4099, 0.3520], [ 1.0101, -0.9765, 0.4099, 0.3520]], dtype=ht.float32, device=cpu:0, split=None) >>> d = ht.random.randn(3, 4, 5) >>> ht.maximum(a, d) ValueError: operands could not be broadcast, input shapes (3, 4) (3, 4, 5)
- mean(x: heat.core.dndarray.DNDarray, axis: int | Tuple[int, Ellipsis] | None = None) heat.core.dndarray.DNDarray[source]
Calculates and returns the mean of a
DNDarray. If an axis is given, the mean will be taken in that direction.- Parameters:
x (DNDarray) – Values for which the mean is calculated for. The dtype of
xmust be a floataxis (None or int or iterable) – Axis which the mean is taken in. Default
Nonecalculates mean of all data items.
Notes
Split semantics when axis is an integer:
if
axis==x.split, thenmean(x).split=Noneif
axis>split, thenmean(x).split=x.splitif
axis<split, thenmean(x).split=x.split-1
Examples
>>> a = ht.random.randn(1, 3) >>> a DNDarray([[-0.1164, 1.0446, -0.4093]], dtype=ht.float32, device=cpu:0, split=None) >>> ht.mean(a) DNDarray(0.1730, dtype=ht.float32, device=cpu:0, split=None) >>> a = ht.random.randn(4, 4) >>> a DNDarray([[-1.0585, 0.7541, -1.1011, 0.5009], [-1.3575, 0.3344, 0.4506, 0.7379], [-0.4337, -0.6516, -1.3690, -0.8772], [ 0.6929, -1.0989, -0.9961, 0.3547]], dtype=ht.float32, device=cpu:0, split=None) >>> ht.mean(a, 1) DNDarray([-0.2262, 0.0413, -0.8328, -0.2619], dtype=ht.float32, device=cpu:0, split=None) >>> ht.mean(a, 0) DNDarray([-0.5392, -0.1655, -0.7539, 0.1791], dtype=ht.float32, device=cpu:0, split=None) >>> a = ht.random.randn(4, 4) >>> a DNDarray([[-0.1441, 0.5016, 0.8907, 0.6318], [-1.1690, -1.2657, 1.4840, -0.1014], [ 0.4133, 1.4168, 1.3499, 1.0340], [-0.9236, -0.7535, -0.2466, -0.9703]], dtype=ht.float32, device=cpu:0, split=None) >>> ht.mean(a, (0, 1)) DNDarray(0.1342, dtype=ht.float32, device=cpu:0, split=None)
- median(x: heat.core.dndarray.DNDarray, axis: int | None = None, keepdims: bool = False, sketched: bool = False, sketch_size: float | None = 1.0 / MPI.COMM_WORLD.size) heat.core.dndarray.DNDarray[source]
Compute the median of the data along the specified axis. Returns the median of the
DNDarrayelements. Per default, the “true” median of the entire data set is computed; however, the argument sketched allows to switch to a faster but less accurate version that computes the median only on behalf of a random subset of the data set (“sketch”).- Parameters:
x (DNDarray) – Input tensor
axis (int, or None, optional) – Axis along which the median is computed. Default is
None, i.e., the median is computed along a flattened version of theDNDarray.keepdims (bool, optional) – If True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result can broadcast correctly against the original array
a.sketched (bool, optional) – If True, the median is computed on a random subset of the data set (“sketch”). This is faster but less accurate. Default is False. The size of the sketch is controlled by the argument sketch_size.
sketch_size (float, optional) – The size of the sketch as a fraction of the data set size. Default is 1./n_proc where n_proc is the number of MPI processes, e.g. n_proc = MPI.COMM_WORLD.size. Must be in the range (0, 1). Ignored for sketched = False.
- min(x: heat.core.dndarray.DNDarray, axis: int | Tuple[int, Ellipsis] | None = None, out: heat.core.dndarray.DNDarray | None = None, keepdims: bool | None = None) heat.core.dndarray.DNDarray[source]
Return the minimum along a given axis.
- Parameters:
x (DNDarray) – Input array.
axis (None or int or Tuple[int,...]) – Axis or axes along which to operate. By default, flattened input is used. If this is a tuple of ints, the minimum is selected over multiple axes, instead of a single axis or all the axes as before.
out (Tuple[DNDarray,DNDarray], optional) – Tuple of two output arrays
(min, min_indices). Must be of the same shape and buffer length as the expected output. The maximum value of an output element. Must be present to allow computation on empty slice.keepdims (bool, optional) – If this is set to
True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original array.
Examples
>>> a = ht.float32([ [1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12] ]) >>> ht.min(a) DNDarray([1.], dtype=ht.float32, device=cpu:0, split=None) >>> ht.min(a, axis=0) DNDarray([1., 2., 3.], dtype=ht.float32, device=cpu:0, split=None) >>> ht.min(a, axis=1) DNDarray([ 1., 4., 7., 10.], dtype=ht.float32, device=cpu:0, split=None)
- minimum(x1: heat.core.dndarray.DNDarray, x2: heat.core.dndarray.DNDarray, out: heat.core.dndarray.DNDarray | None = None) heat.core.dndarray.DNDarray[source]
Compares two
DNDarraysand returns a newDNDarraycontaining the element-wise minima. If one of the elements being compared isNaN, then that element is returned. They must have the same shape, or shapes that can be broadcast to a single shape. For broadcasting semantics, see: https://pytorch.org/docs/stable/notes/broadcasting.html TODO: Check this: If both elements are NaNs then the first is returned. The latter distinction is important for complex NaNs, which are defined as at least one of the real or imaginary parts beingNaN. The net effect is that NaNs are propagated.- Parameters:
x1 (DNDarray) – The first array containing the elements to be compared.
x2 (DNDarray) – The second array containing the elements to be compared.
out (DNDarray, optional) – A location into which the result is stored. If provided, it must have a shape that the inputs broadcast to. If not provided or
None, a freshly-allocated array is returned.
Examples
>>> import heat as ht >>> a = ht.random.randn(3, 4) >>> a DNDarray([[-0.5462, 0.0079, 1.2828, 1.4980], [ 0.6503, -1.1069, 1.2131, 1.4003], [-0.3203, -0.2318, 1.0388, 0.4439]], dtype=ht.float32, device=cpu:0, split=None) >>> b = ht.random.randn(3, 4) >>> b DNDarray([[ 1.8505, 2.3055, -0.2825, -1.4718], [-0.3684, 1.6866, -0.8570, -0.4779], [ 1.0532, 0.3775, -0.8669, -1.7275]], dtype=ht.float32, device=cpu:0, split=None) >>> ht.minimum(a, b) DNDarray([[-0.5462, 0.0079, -0.2825, -1.4718], [-0.3684, -1.1069, -0.8570, -0.4779], [-0.3203, -0.2318, -0.8669, -1.7275]], dtype=ht.float32, device=cpu:0, split=None) >>> c = ht.random.randn(1, 4) >>> c DNDarray([[-1.4358, 1.2914, -0.6042, -1.4009]], dtype=ht.float32, device=cpu:0, split=None) >>> ht.minimum(a, c) DNDarray([[-1.4358, 0.0079, -0.6042, -1.4009], [-1.4358, -1.1069, -0.6042, -1.4009], [-1.4358, -0.2318, -0.6042, -1.4009]], dtype=ht.float32, device=cpu:0, split=None) >>> d = ht.random.randn(3, 4, 5) >>> ht.minimum(a, d) ValueError: operands could not be broadcast, input shapes (3, 4) (3, 4, 5)
- percentile(x: heat.core.dndarray.DNDarray, q: heat.core.dndarray.DNDarray | int | float | Tuple | List, axis: int | Tuple[int, Ellipsis] | None = None, out: heat.core.dndarray.DNDarray | None = None, interpolation: str = 'linear', keepdims: bool = False, sketched: bool = False, sketch_size: float | None = 1.0 / MPI.COMM_WORLD.size) heat.core.dndarray.DNDarray[source]
Compute the q-th percentile of the data along the specified axis/axes. Returns the q-th percentile(s) of the
DNDarrayelements. Per default, the “true” percentile(s) of the entire data set are computed; however, the argument sketched allows to switch to a faster but inaccurate version that computes the percentile only on behalf of a random subset of the data set (“sketch”).- Parameters:
x (DNDarray) – Input tensor
q (DNDarray, scalar, or list of scalars) – Percentile or sequence of percentiles to compute. Must belong to the interval [0, 100].
axis (int, tuple of ints, or None, optional) – Axis (if int) or axes (if tuple) along which the percentiles are computed. Default is None, corresponds to calculating the percentile over the flattened array.
out (DNDarray, optional.) – Output buffer.
interpolation (str, optional) –
Interpolation method to use when the desired percentile lies between two data points \(i < j\). Can be one of:
‘linear’: \(i + (j - i) \cdot fraction\), where fraction is the fractional part of the index surrounded by i and j.
‘lower’: i.
‘higher’: j.
‘nearest’: i or j, whichever is nearest.
‘midpoint’: \((i + j) / 2\).
keepdims (bool, optional) – If True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result can broadcast correctly against the original array x.
sketched (bool, optional) – If False (default), the entire data is used and no sketching is performed. If True, a fraction of the data to use for estimating the percentile. The fraction is determined by sketch_size.
sketch_size (float, optional) – The fraction of the data to use for estimating the percentile; needs to be strictly between 0 and 1. The default is 1/nprocs, where nprocs is the number of MPI processes involved in the calculation, i.e., roughly the portion of the data that is anyway processed on a single process. Ignored for sketched = False.
- ptp(x: heat.core.dndarray.DNDarray, axis: int | Tuple[int, Ellipsis] | None = None, out: heat.core.dndarray.DNDarray | None = None, keepdims: bool = False) heat.core.dndarray.DNDarray[source]
Range of values (maximum - minimum) along a given axis.
- Parameters:
x (ht.DNDarray) – Input array.
axis (None or int or Tuple[int,...], optional) – Axis or axes along which to operate. By default, flattened input is used. If this is a tuple of ints, the ptp is selected over multiple axes, instead of a single axis or all the axes as before.
out (DNDarray, optional) – Output array to place the result.
keepdims (bool, optional) – If this is set to
True, the axes which are reduced are left in the result as dimensions with size one.
- Returns:
ptp – An array with the same shape as x, with the specified axis removed. If keepdims is True, the reduced axes are left in the result as dimensions with size one.
- Return type:
ht.DNDarray
Examples
>>> a = ht.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]], dtype=ht.float32) >>> ht.ptp(a) DNDarray([11.], dtype=ht.float32, device=cpu:0, split=None) >>> ht.ptp(a, axis=0) DNDarray([9., 9., 9.], dtype=ht.float32, device=cpu:0, split=None) >>> ht.ptp(a, axis=1) DNDarray([2., 2., 2., 2.], dtype=ht.float32, device=cpu:0, split=None)
- skew(x: heat.core.dndarray.DNDarray, axis: int = None, unbiased: bool = True) heat.core.dndarray.DNDarray[source]
Compute the sample skewness of a data set.
- Parameters:
x (ht.DNDarray) – Input array
axis (NoneType or Int) – Axis along which skewness is calculated, Default is to compute over the whole array x
unbiased (Bool) – if True (default) the calculations are corrected for bias
Warning
UserWarning: Dependent on the axis given and the split configuration, a UserWarning may be thrown during this function as data is transferred between processes.
- std(x: heat.core.dndarray.DNDarray, axis: int | Tuple[int] | List[int] = None, ddof: int = 0, **kwargs: object) heat.core.dndarray.DNDarray[source]
Calculates the standard deviation of a
DNDarraywith the bessel correction. If an axis is given, the variance will be taken in that direction.- Parameters:
x (DNDarray) – array for which the std is calculated for. The datatype of
xmust be a floataxis (None or int or iterable) – Axis which the std is taken in. Default
Nonecalculates std of all data items.ddof (int, optional) – Delta Degrees of Freedom: the denominator implicitely used in the calculation is N - ddof, where N represents the number of elements. If
ddof=1, the Bessel correction will be applied. Settingddof>1raises aNotImplementedError.**kwargs – Extra keyword arguments
Examples
>>> a = ht.random.randn(1, 3) >>> a DNDarray([[ 0.5714, 0.0048, -0.2942]], dtype=ht.float32, device=cpu:0, split=None) >>> ht.std(a) DNDarray(0.3590, dtype=ht.float32, device=cpu:0, split=None) >>> a = ht.random.randn(4, 4) >>> a DNDarray([[ 0.8488, 1.2225, 1.2498, -1.4592], [-0.5820, -0.3928, 0.1509, -0.0174], [ 0.6426, -1.8149, 0.1369, 0.0042], [-0.6043, -0.0523, -1.6653, 0.6631]], dtype=ht.float32, device=cpu:0, split=None) >>> ht.std(a, 1, ddof=1) DNDarray([1.2961, 0.3362, 1.0739, 0.9820], dtype=ht.float32, device=cpu:0, split=None) >>> ht.std(a, 1) DNDarray([1.2961, 0.3362, 1.0739, 0.9820], dtype=ht.float32, device=cpu:0, split=None)
- var(x: heat.core.dndarray.DNDarray, axis: int | Tuple[int] | List[int] = None, ddof: int = 0, **kwargs: object) heat.core.dndarray.DNDarray[source]
Calculates and returns the variance of a
DNDarray. If an axis is given, the variance will be taken in that direction.- Parameters:
x (DNDarray) – Array for which the variance is calculated for. The datatype of
xmust be a floataxis (None or int or iterable) – Axis which the std is taken in. Default
Nonecalculates std of all data items.ddof (int, optional) – Delta Degrees of Freedom: the denominator implicitely used in the calculation is N - ddof, where N represents the number of elements. If
ddof=1, the Bessel correction will be applied. Settingddof>1raises aNotImplementedError.**kwargs – Extra keyword arguments
Notes
Split semantics when axis is an integer:
if
axis=x.split, thenvar(x).split=Noneif
axis>split, thenvar(x).split = x.splitif
axis<split, thenvar(x).split=x.split - 1
The variance is the average of the squared deviations from the mean, i.e.,
var=mean(abs(x - x.mean())**2). The mean is normally calculated asx.sum()/N, whereN = len(x). If, however,ddofis specified, the divisorN - ddofis used instead. In standard statistical practice,ddof=1provides an unbiased estimator of the variance of a hypothetical infinite population.ddof=0provides a maximum likelihood estimate of the variance for normally distributed variables.Examples
>>> a = ht.random.randn(1, 3) >>> a DNDarray([[-2.3589, -0.2073, 0.8806]], dtype=ht.float32, device=cpu:0, split=None) >>> ht.var(a) DNDarray(1.8119, dtype=ht.float32, device=cpu:0, split=None) >>> ht.var(a, ddof=1) DNDarray(2.7179, dtype=ht.float32, device=cpu:0, split=None) >>> a = ht.random.randn(4, 4) >>> a DNDarray([[-0.8523, -1.4982, -0.5848, -0.2554], [ 0.8458, -0.3125, -0.2430, 1.9016], [-0.6778, -0.3584, -1.5112, 0.6545], [-0.9161, 0.0168, 0.0462, 0.5964]], dtype=ht.float32, device=cpu:0, split=None) >>> ht.var(a, 1) DNDarray([0.2777, 1.0957, 0.8015, 0.3936], dtype=ht.float32, device=cpu:0, split=None) >>> ht.var(a, 0) DNDarray([0.7001, 0.4376, 0.4576, 0.7890], dtype=ht.float32, device=cpu:0, split=None) >>> ht.var(a, 0, ddof=1) DNDarray([0.7001, 0.4376, 0.4576, 0.7890], dtype=ht.float32, device=cpu:0, split=None) >>> ht.var(a, 0, ddof=0) DNDarray([0.7001, 0.4376, 0.4576, 0.7890], dtype=ht.float32, device=cpu:0, split=None)