heat.statistics
Distributed statistical operations.
Module Contents
- argmax(x: heat.core.dndarray.DNDarray, axis: int | None = None, out: heat.core.dndarray.DNDarray | None = None, **kwargs: object) heat.core.dndarray.DNDarray
Returns an array of the indices of the maximum values along an axis. It has the same shape as
x.shape
with the dimension along axis removed.- Parameters:
Examples
>>> a = ht.random.randn(3, 3) >>> a DNDarray([[ 1.0661, 0.7036, -2.0908], [-0.7534, -0.4986, -0.7751], [-0.4815, 1.9436, 0.6400]], dtype=ht.float32, device=cpu:0, split=None) >>> ht.argmax(a) DNDarray([7], dtype=ht.int64, device=cpu:0, split=None) >>> ht.argmax(a, axis=0) DNDarray([0, 2, 2], dtype=ht.int64, device=cpu:0, split=None) >>> ht.argmax(a, axis=1) DNDarray([0, 1, 1], dtype=ht.int64, device=cpu:0, split=None)
- argmin(x: heat.core.dndarray.DNDarray, axis: int | None = None, out: heat.core.dndarray.DNDarray | None = None, **kwargs: object) heat.core.dndarray.DNDarray
Returns an array of the indices of the minimum values along an axis. It has the same shape as
x.shape
with the dimension along axis removed.- Parameters:
Examples
>>> a = ht.random.randn(3, 3) >>> a DNDarray([[ 1.0661, 0.7036, -2.0908], [-0.7534, -0.4986, -0.7751], [-0.4815, 1.9436, 0.6400]], dtype=ht.float32, device=cpu:0, split=None) >>> ht.argmin(a) DNDarray([2], dtype=ht.int64, device=cpu:0, split=None) >>> ht.argmin(a, axis=0) DNDarray([1, 1, 0], dtype=ht.int64, device=cpu:0, split=None) >>> ht.argmin(a, axis=1) DNDarray([2, 2, 0], dtype=ht.int64, device=cpu:0, split=None)
- average(x: heat.core.dndarray.DNDarray, axis: int | Tuple[int, Ellipsis] | None = None, weights: heat.core.dndarray.DNDarray | None = None, returned: bool = False) heat.core.dndarray.DNDarray | Tuple[heat.core.dndarray.DNDarray, Ellipsis]
Compute the weighted average along the specified axis.
If
returned=True
, return a tuple with the average as the first element and the sum of the weights as the second element.sum_of_weights
is of the same type asaverage
.- Parameters:
x (DNDarray) – Array containing data to be averaged.
axis (None or int or Tuple[int,...], optional) – Axis or axes along which to average
x
. The default,axis=None
, will average over all of the elements of the input array. If axis is negative it counts from the last to the first axis. #TODO Issue #351: If axis is a tuple of ints, averaging is performed on all of the axes specified in the tuple instead of a single axis or all the axes as before.weights (DNDarray, optional) – An array of weights associated with the values in
x
. Each value inx
contributes to the average according to its associated weight. The weights array can either be 1D (in which case its length must be the size ofx
along the given axis) or of the same shape asx
. Ifweights=None
, then all data inx
are assumed to have a weight equal to one, the result is equivalent tomean()
.returned (bool, optional) – If
True
, the tuple(average, sum_of_weights)
is returned, otherwise only the average is returned. Ifweights=None
,sum_of_weights
is equivalent to the number of elements over which the average is taken.
- Raises:
ZeroDivisionError – When all weights along axis are zero.
TypeError – When the length of 1D weights is not the same as the shape of
x
along axis.
Examples
>>> data = ht.arange(1,5, dtype=float) >>> data DNDarray([1., 2., 3., 4.], dtype=ht.float32, device=cpu:0, split=None) >>> ht.average(data) DNDarray(2.5000, dtype=ht.float32, device=cpu:0, split=None) >>> ht.average(ht.arange(1,11, dtype=float), weights=ht.arange(10,0,-1)) DNDarray([4.], dtype=ht.float64, device=cpu:0, split=None) >>> data = ht.array([[0, 1], [2, 3], [4, 5]], dtype=float, split=1) >>> weights = ht.array([1./4, 3./4]) >>> ht.average(data, axis=1, weights=weights) DNDarray([0.7500, 2.7500, 4.7500], dtype=ht.float32, device=cpu:0, split=None) >>> ht.average(data, weights=weights) Traceback (most recent call last): ... TypeError: Axis must be specified when shapes of x and weights differ.
- bincount(x: heat.core.dndarray.DNDarray, weights: heat.core.dndarray.DNDarray | None = None, minlength: int = 0) heat.core.dndarray.DNDarray
Count number of occurrences of each value in array of non-negative ints. Return a non-distributed
DNDarray
of length max(x) + 1 if input is non-empty, else 0.The number of bins (size 1) is one larger than the largest value in x unless x is empty, in which case the result is a tensor of size 0. If minlength is specified, the number of bins is at least minlength and if x is empty, then the result is tensor of size minlength filled with zeros. If n is the value at position i, out[n] += weights[i] if weights is specified else out[n] += 1.
- Parameters:
- Raises:
ValueError – If x and weights don’t have the same distribution.
Examples
>>> ht.bincount(ht.arange(5)) DNDarray([1, 1, 1, 1, 1], dtype=ht.int64, device=cpu:0, split=None) >>> ht.bincount(ht.array([0, 1, 3, 2, 1]), weights=ht.array([0, 0.5, 1, 1.5, 2])) DNDarray([0.0000, 2.5000, 1.5000, 1.0000], dtype=ht.float32, device=cpu:0, split=None)
- bucketize(input: heat.core.dndarray.DNDarray, boundaries: heat.core.dndarray.DNDarray | torch.Tensor, out_int32: bool = False, right: bool = False, out: heat.core.dndarray.DNDarray = None) heat.core.dndarray.DNDarray
Returns the indices of the buckets to which each value in the input belongs, where the boundaries of the buckets are set by boundaries.
- Parameters:
input (DNDarray) – The input array.
boundaries (DNDarray or torch.Tensor) – monotonically increasing sequence defining the bucket boundaries, 1-dimensional, not distributed
out_int32 (bool, optional) – set the dtype of the output to
ht.int64
(False) orht.int32
(True)right (bool, optional) – indicate whether the buckets include the right (False) or left (True) boundaries, see Notes.
out (DNDarray, optional) – The output array, must be the shame shape and split as the input array.
Notes
This function uses the PyTorch’s setting for
right
:right
returned index i satisfies
False
boundaries[i-1] < x <= boundaries[i]
True
boundaries[i-1] <= x < boundaries[i]
- Raises:
RuntimeError – If boundaries is distributed.
See also
digitize
NumPy-like version of this function.
Examples
>>> boundaries = ht.array([1, 3, 5, 7, 9]) >>> v = ht.array([[3, 6, 9], [3, 6, 9]]) >>> ht.bucketize(v, boundaries) DNDarray([[1, 3, 4], [1, 3, 4]], dtype=ht.int64, device=cpu:0, split=None) >>> ht.bucketize(v, boundaries, right=True) DNDarray([[2, 3, 5], [2, 3, 5]], dtype=ht.int64, device=cpu:0, split=None)
- cov(m: heat.core.dndarray.DNDarray, y: heat.core.dndarray.DNDarray | None = None, rowvar: bool = True, bias: bool = False, ddof: int | None = None) heat.core.dndarray.DNDarray
Estimate the covariance matrix of some data, m. For more imformation on the algorithm please see the numpy function of the same name
- Parameters:
m (DNDarray) – A 1-D or 2-D array containing multiple variables and observations. Each row of
m
represents a variable, and each column a single observation of all those variables.y (DNDarray, optional) – An additional set of variables and observations.
y
has the same form as that ofm
.rowvar (bool, optional) – If
True
(default), then each row represents a variable, with observations in the columns. Otherwise, the relationship is transposed: each column represents a variable, while the rows contain observations.bias (bool, optional) – Default normalization (
False
) is by (N - 1), where N is the number of observations given (unbiased estimate). IfTrue
, then normalization is by N. These values can be overridden by using the keywordddof
in numpy versions >= 1.5.ddof (int, optional) – If not
None
the default value implied bybias
is overridden. Note thatddof=1
will return the unbiased estimate andddof=0
will return the simple average.
- digitize(x: heat.core.dndarray.DNDarray, bins: heat.core.dndarray.DNDarray | torch.Tensor, right: bool = False) heat.core.dndarray.DNDarray
Return the indices of the bins to which each value in the input array x belongs. If values in x are beyond the bounds of bins, 0 or len(bins) is returned as appropriate.
- Parameters:
Notes
This function uses NumPy’s setting for
right
:right
order of bins
returned index i satisfies
False
increasing
bins[i-1] <= x < bins[i]
True
increasing
bins[i-1] < x <= bins[i]
False
decreasing
bins[i-1] > x >= bins[i]
True
decreasing
bins[i-1] >= x > bins[i]
- Raises:
RuntimeError – If bins is distributed.
See also
bucketize
PyTorch-like version of this function.
Examples
>>> x = ht.array([1.2, 10.0, 12.4, 15.5, 20.]) >>> bins = ht.array([0, 5, 10, 15, 20]) >>> ht.digitize(x,bins,right=True) DNDarray([1, 2, 3, 4, 4], dtype=ht.int64, device=cpu:0, split=None) >>> ht.digitize(x,bins,right=False) DNDarray([1, 3, 3, 4, 5], dtype=ht.int64, device=cpu:0, split=None)
- histc(input: heat.core.dndarray.DNDarray, bins: int = 100, min: int = 0, max: int = 0, out: heat.core.dndarray.DNDarray | None = None) heat.core.dndarray.DNDarray
Return the histogram of a DNDarray.
The elements are sorted into equal width bins between min and max. If min and max are both equal, the minimum and maximum values of the data are used. Elements lower than min and higher than max are ignored.
- Parameters:
Examples
>>> ht.histc(ht.array([1., 2, 1]), bins=4, min=0, max=3) DNDarray([0., 2., 1., 0.], dtype=ht.float32, device=cpu:0, split=None) >>> ht.histc(ht.arange(10, dtype=ht.float64, split=0), bins=10) DNDarray([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], dtype=ht.float64, device=cpu:0, split=None)
- histogram(a: heat.core.dndarray.DNDarray, bins: int = 10, range: Tuple[int, int] = (0, 0), normed: bool | None = None, weights: heat.core.dndarray.DNDarray | None = None, density: bool | None = None) heat.core.dndarray.DNDarray
Compute the histogram of a DNDarray.
- Parameters:
a (DNDarray) – the input array, must be of float type
bins (int, optional) – number of histogram bins
range (Tuple[int,int], optional) – lower and upper end of the bins. If not provided, range is simply (a.min(), a.max()).
normed (bool, optional) – Deprecated since NumPy version 1.6. TODO: remove.
weights (DNDarray, optional) – array of weights. Not implemented yet.
density (bool, optional) – Not implemented yet.
Notes
This is a wrapper function of
histc()
for some basic compatibility with the NumPy API.See also
- kurtosis(x: heat.core.dndarray.DNDarray, axis: int | None = None, unbiased: bool = True, Fischer: bool = True) heat.core.dndarray.DNDarray
Compute the kurtosis (Fisher or Pearson) of a dataset. Kurtosis is the fourth central moment divided by the square of the variance. If Fisher’s definition is used, then 3.0 is subtracted from the result to give 0.0 for a normal distribution.
If unbiased is True (defualt) then the kurtosis is calculated using k statistics to eliminate bias coming from biased moment estimators
- Parameters:
x (ht.DNDarray) – Input array
axis (NoneType or Int) – Axis along which skewness is calculated, Default is to compute over the whole array x
unbiased (Bool) – if True (default) the calculations are corrected for bias
Fischer (bool) – Whether use Fischer’s definition or not. If true 3. is subtracted from the result.
Warning
UserWarning: Dependent on the axis given and the split configuration, a UserWarning may be thrown during this function as data is transferred between processes.
- max(x: heat.core.dndarray.DNDarray, axis: int | Tuple[int, Ellipsis] | None = None, out: heat.core.dndarray.DNDarray | None = None, keepdims: bool | None = None) heat.core.dndarray.DNDarray
Return the maximum along a given axis.
- Parameters:
x (DNDarray) – Input array.
axis (None or int or Tuple[int,...], optional) – Axis or axes along which to operate. By default, flattened input is used. If this is a tuple of ints, the maximum is selected over multiple axes, instead of a single axis or all the axes as before.
out (DNDarray, optional) – Tuple of two output arrays
(max, max_indices)
. Must be of the same shape and buffer length as the expected output. The minimum value of an output element. Must be present to allow computation on empty slice.keepdims (bool, optional) – If this is set to
True
, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original array.
Examples
>>> a = ht.float32([ [1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12] ]) >>> ht.max(a) DNDarray([12.], dtype=ht.float32, device=cpu:0, split=None) >>> ht.max(a, axis=0) DNDarray([10., 11., 12.], dtype=ht.float32, device=cpu:0, split=None) >>> ht.max(a, axis=1) DNDarray([ 3., 6., 9., 12.], dtype=ht.float32, device=cpu:0, split=None)
- maximum(x1: heat.core.dndarray.DNDarray, x2: heat.core.dndarray.DNDarray, out: heat.core.dndarray.DNDarray | None = None) heat.core.dndarray.DNDarray
Compares two
DNDarrays
and returns a newDNDarray
containing the element-wise maxima. TheDNDarrays
must have the same shape, or shapes that can be broadcast to a single shape. For broadcasting semantics, see: https://pytorch.org/docs/stable/notes/broadcasting.html If one of the elements being compared isNaN
, then that element is returned. TODO: Check this: If both elements are NaNs then the first is returned. The latter distinction is important for complex NaNs, which are defined as at least one of the real or imaginary parts beingNaN
. The net effect is that NaNs are propagated.- Parameters:
x1 (DNDarray) – The first array containing the elements to be compared.
x2 (DNDarray) – The second array containing the elements to be compared.
out (DNDarray, optional) – A location into which the result is stored. If provided, it must have a shape that the inputs broadcast to. If not provided or
None
, a freshly-allocated array is returned.
Examples
>>> import heat as ht >>> a = ht.random.randn(3, 4) >>> a DNDarray([[ 0.2701, -0.6993, 1.2197, 0.0579], [ 0.6815, 0.4722, -0.3947, -0.3030], [ 1.0101, -1.2460, -1.3953, -0.6879]], dtype=ht.float32, device=cpu:0, split=None) >>> b = ht.random.randn(3, 4) >>> b DNDarray([[ 0.9664, 0.6159, -0.8555, 0.8204], [-1.2200, -0.0759, 0.0437, 0.4700], [ 1.2271, 1.0530, 0.1095, 0.8386]], dtype=ht.float32, device=cpu:0, split=None) >>> ht.maximum(a, b) DNDarray([[0.9664, 0.6159, 1.2197, 0.8204], [0.6815, 0.4722, 0.0437, 0.4700], [1.2271, 1.0530, 0.1095, 0.8386]], dtype=ht.float32, device=cpu:0, split=None) >>> c = ht.random.randn(1, 4) >>> c DNDarray([[-0.5363, -0.9765, 0.4099, 0.3520]], dtype=ht.float32, device=cpu:0, split=None) >>> ht.maximum(a, c) DNDarray([[ 0.2701, -0.6993, 1.2197, 0.3520], [ 0.6815, 0.4722, 0.4099, 0.3520], [ 1.0101, -0.9765, 0.4099, 0.3520]], dtype=ht.float32, device=cpu:0, split=None) >>> d = ht.random.randn(3, 4, 5) >>> ht.maximum(a, d) ValueError: operands could not be broadcast, input shapes (3, 4) (3, 4, 5)
- mean(x: heat.core.dndarray.DNDarray, axis: int | Tuple[int, Ellipsis] | None = None) heat.core.dndarray.DNDarray
Calculates and returns the mean of a
DNDarray
. If an axis is given, the mean will be taken in that direction.- Parameters:
x (DNDarray) – Values for which the mean is calculated for. The dtype of
x
must be a floataxis (None or int or iterable) – Axis which the mean is taken in. Default
None
calculates mean of all data items.
Notes
Split semantics when axis is an integer:
if
axis==x.split
, thenmean(x).split=None
if
axis>split
, thenmean(x).split=x.split
if
axis<split
, thenmean(x).split=x.split-1
Examples
>>> a = ht.random.randn(1,3) >>> a DNDarray([[-0.1164, 1.0446, -0.4093]], dtype=ht.float32, device=cpu:0, split=None) >>> ht.mean(a) DNDarray(0.1730, dtype=ht.float32, device=cpu:0, split=None) >>> a = ht.random.randn(4,4) >>> a DNDarray([[-1.0585, 0.7541, -1.1011, 0.5009], [-1.3575, 0.3344, 0.4506, 0.7379], [-0.4337, -0.6516, -1.3690, -0.8772], [ 0.6929, -1.0989, -0.9961, 0.3547]], dtype=ht.float32, device=cpu:0, split=None) >>> ht.mean(a, 1) DNDarray([-0.2262, 0.0413, -0.8328, -0.2619], dtype=ht.float32, device=cpu:0, split=None) >>> ht.mean(a, 0) DNDarray([-0.5392, -0.1655, -0.7539, 0.1791], dtype=ht.float32, device=cpu:0, split=None) >>> a = ht.random.randn(4,4) >>> a DNDarray([[-0.1441, 0.5016, 0.8907, 0.6318], [-1.1690, -1.2657, 1.4840, -0.1014], [ 0.4133, 1.4168, 1.3499, 1.0340], [-0.9236, -0.7535, -0.2466, -0.9703]], dtype=ht.float32, device=cpu:0, split=None) >>> ht.mean(a, (0,1)) DNDarray(0.1342, dtype=ht.float32, device=cpu:0, split=None)
- median(x: heat.core.dndarray.DNDarray, axis: int | None = None, keepdims: bool = False) heat.core.dndarray.DNDarray
Compute the median of the data along the specified axis. Returns the median of the
DNDarray
elements.- Parameters:
x (DNDarray) – Input tensor
axis (int, or None, optional) – Axis along which the median is computed. Default is
None
, i.e., the median is computed along a flattened version of theDNDarray
.keepdims (bool, optional) – If True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result can broadcast correctly against the original array
a
.
- min(x: heat.core.dndarray.DNDarray, axis: int | Tuple[int, Ellipsis] | None = None, out: heat.core.dndarray.DNDarray | None = None, keepdims: bool | None = None) heat.core.dndarray.DNDarray
Return the minimum along a given axis.
- Parameters:
x (DNDarray) – Input array.
axis (None or int or Tuple[int,...]) – Axis or axes along which to operate. By default, flattened input is used. If this is a tuple of ints, the minimum is selected over multiple axes, instead of a single axis or all the axes as before.
out (Tuple[DNDarray,DNDarray], optional) – Tuple of two output arrays
(min, min_indices)
. Must be of the same shape and buffer length as the expected output. The maximum value of an output element. Must be present to allow computation on empty slice.keepdims (bool, optional) – If this is set to
True
, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original array.
Examples
>>> a = ht.float32([ [1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12] ]) >>> ht.min(a) DNDarray([1.], dtype=ht.float32, device=cpu:0, split=None) >>> ht.min(a, axis=0) DNDarray([1., 2., 3.], dtype=ht.float32, device=cpu:0, split=None) >>> ht.min(a, axis=1) DNDarray([ 1., 4., 7., 10.], dtype=ht.float32, device=cpu:0, split=None)
- minimum(x1: heat.core.dndarray.DNDarray, x2: heat.core.dndarray.DNDarray, out: heat.core.dndarray.DNDarray | None = None) heat.core.dndarray.DNDarray
Compares two
DNDarrays
and returns a newDNDarray
containing the element-wise minima. If one of the elements being compared isNaN
, then that element is returned. They must have the same shape, or shapes that can be broadcast to a single shape. For broadcasting semantics, see: https://pytorch.org/docs/stable/notes/broadcasting.html TODO: Check this: If both elements are NaNs then the first is returned. The latter distinction is important for complex NaNs, which are defined as at least one of the real or imaginary parts beingNaN
. The net effect is that NaNs are propagated.- Parameters:
x1 (DNDarray) – The first array containing the elements to be compared.
x2 (DNDarray) – The second array containing the elements to be compared.
out (DNDarray, optional) – A location into which the result is stored. If provided, it must have a shape that the inputs broadcast to. If not provided or
None
, a freshly-allocated array is returned.
Examples
>>> import heat as ht >>> a = ht.random.randn(3,4) >>> a DNDarray([[-0.5462, 0.0079, 1.2828, 1.4980], [ 0.6503, -1.1069, 1.2131, 1.4003], [-0.3203, -0.2318, 1.0388, 0.4439]], dtype=ht.float32, device=cpu:0, split=None) >>> b = ht.random.randn(3,4) >>> b DNDarray([[ 1.8505, 2.3055, -0.2825, -1.4718], [-0.3684, 1.6866, -0.8570, -0.4779], [ 1.0532, 0.3775, -0.8669, -1.7275]], dtype=ht.float32, device=cpu:0, split=None) >>> ht.minimum(a,b) DNDarray([[-0.5462, 0.0079, -0.2825, -1.4718], [-0.3684, -1.1069, -0.8570, -0.4779], [-0.3203, -0.2318, -0.8669, -1.7275]], dtype=ht.float32, device=cpu:0, split=None) >>> c = ht.random.randn(1,4) >>> c DNDarray([[-1.4358, 1.2914, -0.6042, -1.4009]], dtype=ht.float32, device=cpu:0, split=None) >>> ht.minimum(a,c) DNDarray([[-1.4358, 0.0079, -0.6042, -1.4009], [-1.4358, -1.1069, -0.6042, -1.4009], [-1.4358, -0.2318, -0.6042, -1.4009]], dtype=ht.float32, device=cpu:0, split=None) >>> d = ht.random.randn(3,4,5) >>> ht.minimum(a,d) ValueError: operands could not be broadcast, input shapes (3, 4) (3, 4, 5)
- percentile(x: heat.core.dndarray.DNDarray, q: heat.core.dndarray.DNDarray | int | float | Tuple | List, axis: int | None = None, out: heat.core.dndarray.DNDarray | None = None, interpolation: str = 'linear', keepdims: bool = False) heat.core.dndarray.DNDarray
Compute the q-th percentile of the data along the specified axis. Returns the q-th percentile(s) of the tensor elements.
- Parameters:
x (DNDarray) – Input tensor
q (DNDarray, scalar, or list of scalars) – Percentile or sequence of percentiles to compute. Must belong to the interval [0, 100].
axis (int, or None, optional) – Axis along which the percentiles are computed. Default is None.
out (DNDarray, optional.) – Output buffer.
interpolation (str, optional) –
Interpolation method to use when the desired percentile lies between two data points \(i < j\). Can be one of:
‘linear’: \(i + (j - i) \cdot fraction\), where fraction is the fractional part of the index surrounded by i and j.
‘lower’: i.
‘higher’: j.
‘nearest’: i or j, whichever is nearest.
‘midpoint’: \((i + j) / 2\).
keepdims (bool, optional) – If True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result can broadcast correctly against the original array x.
- skew(x: heat.core.dndarray.DNDarray, axis: int = None, unbiased: bool = True) heat.core.dndarray.DNDarray
Compute the sample skewness of a data set.
- Parameters:
x (ht.DNDarray) – Input array
axis (NoneType or Int) – Axis along which skewness is calculated, Default is to compute over the whole array x
unbiased (Bool) – if True (default) the calculations are corrected for bias
Warning
UserWarning: Dependent on the axis given and the split configuration, a UserWarning may be thrown during this function as data is transferred between processes.
- std(x: heat.core.dndarray.DNDarray, axis: int | Tuple[int] | List[int] = None, ddof: int = 0, **kwargs: object) heat.core.dndarray.DNDarray
Calculates the standard deviation of a
DNDarray
with the bessel correction. If an axis is given, the variance will be taken in that direction.- Parameters:
x (DNDarray) – array for which the std is calculated for. The datatype of
x
must be a floataxis (None or int or iterable) – Axis which the std is taken in. Default
None
calculates std of all data items.ddof (int, optional) – Delta Degrees of Freedom: the denominator implicitely used in the calculation is N - ddof, where N represents the number of elements. If
ddof=1
, the Bessel correction will be applied. Settingddof>1
raises aNotImplementedError
.
Examples
>>> a = ht.random.randn(1, 3) >>> a DNDarray([[ 0.5714, 0.0048, -0.2942]], dtype=ht.float32, device=cpu:0, split=None) >>> ht.std(a) DNDarray(0.3590, dtype=ht.float32, device=cpu:0, split=None) >>> a = ht.random.randn(4,4) >>> a DNDarray([[ 0.8488, 1.2225, 1.2498, -1.4592], [-0.5820, -0.3928, 0.1509, -0.0174], [ 0.6426, -1.8149, 0.1369, 0.0042], [-0.6043, -0.0523, -1.6653, 0.6631]], dtype=ht.float32, device=cpu:0, split=None) >>> ht.std(a, 1, ddof=1) DNDarray([1.2961, 0.3362, 1.0739, 0.9820], dtype=ht.float32, device=cpu:0, split=None) >>> ht.std(a, 1) DNDarray([1.2961, 0.3362, 1.0739, 0.9820], dtype=ht.float32, device=cpu:0, split=None)
- var(x: heat.core.dndarray.DNDarray, axis: int | Tuple[int] | List[int] = None, ddof: int = 0, **kwargs: object) heat.core.dndarray.DNDarray
Calculates and returns the variance of a
DNDarray
. If an axis is given, the variance will be taken in that direction.- Parameters:
x (DNDarray) – Array for which the variance is calculated for. The datatype of
x
must be a floataxis (None or int or iterable) – Axis which the std is taken in. Default
None
calculates std of all data items.ddof (int, optional) – Delta Degrees of Freedom: the denominator implicitely used in the calculation is N - ddof, where N represents the number of elements. If
ddof=1
, the Bessel correction will be applied. Settingddof>1
raises aNotImplementedError
.
Notes
Split semantics when axis is an integer:
if
axis=x.split
, thenvar(x).split=None
if
axis>split
, thenvar(x).split = x.split
if
axis<split
, thenvar(x).split=x.split - 1
The variance is the average of the squared deviations from the mean, i.e.,
var=mean(abs(x - x.mean())**2)
. The mean is normally calculated asx.sum()/N
, whereN = len(x)
. If, however,ddof
is specified, the divisorN - ddof
is used instead. In standard statistical practice,ddof=1
provides an unbiased estimator of the variance of a hypothetical infinite population.ddof=0
provides a maximum likelihood estimate of the variance for normally distributed variables.Examples
>>> a = ht.random.randn(1,3) >>> a DNDarray([[-2.3589, -0.2073, 0.8806]], dtype=ht.float32, device=cpu:0, split=None) >>> ht.var(a) DNDarray(1.8119, dtype=ht.float32, device=cpu:0, split=None) >>> ht.var(a, ddof=1) DNDarray(2.7179, dtype=ht.float32, device=cpu:0, split=None) >>> a = ht.random.randn(4,4) >>> a DNDarray([[-0.8523, -1.4982, -0.5848, -0.2554], [ 0.8458, -0.3125, -0.2430, 1.9016], [-0.6778, -0.3584, -1.5112, 0.6545], [-0.9161, 0.0168, 0.0462, 0.5964]], dtype=ht.float32, device=cpu:0, split=None) >>> ht.var(a, 1) DNDarray([0.2777, 1.0957, 0.8015, 0.3936], dtype=ht.float32, device=cpu:0, split=None) >>> ht.var(a, 0) DNDarray([0.7001, 0.4376, 0.4576, 0.7890], dtype=ht.float32, device=cpu:0, split=None) >>> ht.var(a, 0, ddof=1) DNDarray([0.7001, 0.4376, 0.4576, 0.7890], dtype=ht.float32, device=cpu:0, split=None) >>> ht.var(a, 0, ddof=0) DNDarray([0.7001, 0.4376, 0.4576, 0.7890], dtype=ht.float32, device=cpu:0, split=None)