heat.preprocessing
Add the preprocessing functions to the ht.preprocessing namespace
Submodules
Package Contents
- _is_2D_float_DNDarray(input)
- _has_n_features(param, inputdata)
- _tol_wrt_dtype(inputdata)
- class StandardScaler(*, copy: bool = True, with_mean: bool = True, with_std: bool = True)
Bases: heat.TransformMixin, heat.BaseEstimator
Standardization of features to mean 0 and variance 1 by affine linear transformation; similar to sklearn.preprocessing.StandardScaler. The data set to be scaled must be stored as 2D-DNDarray of shape (n_datapoints, n_features). Shifting to mean 0 and scaling to variance 1 is applied to each feature independently.
- Parameters:
- Variables:
scale (DNDarray of shape (n_features,) or None) – Per-feature relative scaling of the data to achieve unit variance. Set to None (no variance scaling applied) if var = None or var is below machine precision.
mean (DNDarray of shape (n_features,) or None) – The mean value for each feature. Equal to None when with_mean=False.
var (DNDarray of shape (n_features,) or None) – Feature-wise variance of the given data. Equal to None when with_std=False.
- fit(X: heat.DNDarray, sample_weight: heat.DNDarray | None = None) Self
Fit StandardScaler to the given data X, i.e. compute the mean and standard deviation of X to be used for later scaling.
- Parameters:
X (DNDarray of shape (n_datapoints, n_features)) – Data used to compute the mean and standard deviation for later feature-wise scaling.
sample_weight (Not yet supported.) – Raises NotImplementedError.
- transform(X: heat.DNDarray) Self | heat.DNDarray
Apply standardization to the input data X by centering and scaling with respect to the mean and standard deviation previously computed by fit and stored in the StandardScaler.
- inverse_transform(Y: heat.DNDarray) Self | heat.DNDarray
Scale the data back to the original representation, i.e. apply the inverse of transform to the input Y.
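The fit / transform / inverse_transform cycle described above can be sketched in plain Python. This is only an illustration of the arithmetic on nested lists, not Heat's implementation: the helper names (fit_standard, transform_standard, inverse_transform_standard), the tolerance value, and the use of the population variance (divisor n) are assumptions of this sketch.

```python
import math


def fit_standard(rows, tol=1e-12):
    """Return per-feature (means, scales) for a list of equal-length rows."""
    n = len(rows)
    n_features = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(n_features)]
    # Population variance (divisor n): an assumption of this sketch.
    variances = [
        sum((r[j] - means[j]) ** 2 for r in rows) / n for j in range(n_features)
    ]
    # Features with variance below the tolerance are left unscaled; the sketch
    # uses a scale of 1.0 where the documentation above speaks of scale = None.
    scales = [math.sqrt(v) if v > tol else 1.0 for v in variances]
    return means, scales


def transform_standard(rows, means, scales):
    """Center each feature and divide by its scale."""
    return [[(x - m) / s for x, m, s in zip(r, means, scales)] for r in rows]


def inverse_transform_standard(rows, means, scales):
    """Undo transform_standard."""
    return [[y * s + m for y, m, s in zip(r, means, scales)] for r in rows]


data = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]  # second feature is constant
means, scales = fit_standard(data)
scaled = transform_standard(data, means, scales)
restored = inverse_transform_standard(scaled, means, scales)
```

After the transform, the first feature has mean 0 and variance 1, while the constant second feature is only centered.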
- class MinMaxScaler(feature_range: Tuple[float, float] = (0.0, 1.0), *, copy: bool = True, clip: bool = False)
Bases: heat.TransformMixin, heat.BaseEstimator
Min-Max-Scaler: transforms the features by scaling each feature (affine) linearly to the prescribed range; similar to sklearn.preprocessing.MinMaxScaler. The data set to be scaled must be stored as 2D-DNDarray of shape (n_datapoints, n_features).
Each feature is scaled and translated individually such that it is in the given range on the input data set, e.g. between zero and one (default).
- Parameters:
feature_range (tuple (min, max), default=(0, 1)) – Desired range of transformed features.
copy (bool, default=True) – copy = False means in-place transformations whenever possible.
clip (Not yet supported.) – Raises NotImplementedError.
- Variables:
min (DNDarray of shape (n_features,)) – translation required per feature
scale (DNDarray of shape (n_features,)) – scaling required per feature
data_min (DNDarray of shape (n_features,)) – minimum per feature in the input data set
data_max (DNDarray of shape (n_features,)) – maximum per feature in the input data set
data_range (DNDarray of shape (n_features,)) – range per feature in the input data set
- fit(X: heat.DNDarray) Self
Fit the MinMaxScaler: i.e. compute the parameters required for later scaling.
- Parameters:
X (DNDarray of shape (n_datapoints, n_features)) – data set to which scaler shall be fitted.
- transform(X: heat.DNDarray) Self | heat.DNDarray
Transform input data with MinMaxScaler, i.e. scale the features of X according to feature_range.
- Parameters:
X (DNDarray of shape (n_datapoints, n_features)) – Data set to be transformed.
- inverse_transform(Y: heat.DNDarray) Self | heat.DNDarray
Apply the inverse of transform.
- Parameters:
Y (DNDarray of shape (n_datapoints, n_features)) – Data set to be transformed back.
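As a rough illustration of how the per-feature scaling and translation listed under Variables could be obtained, the following plain-Python sketch follows the convention used by sklearn.preprocessing.MinMaxScaler (scaled = X * scale + min). The helper names and the handling of constant features are assumptions of this sketch, not Heat's implementation.

```python
def fit_minmax(rows, feature_range=(0.0, 1.0)):
    """Return per-feature (scale, min_) mapping the data onto feature_range."""
    lo, hi = feature_range
    n_features = len(rows[0])
    data_min = [min(r[j] for r in rows) for j in range(n_features)]
    data_max = [max(r[j] for r in rows) for j in range(n_features)]
    data_range = [mx - mn for mn, mx in zip(data_min, data_max)]
    # Constant features have zero range; dividing by 1.0 instead is a choice
    # made in this sketch to avoid a division by zero.
    scale = [(hi - lo) / (rng if rng != 0.0 else 1.0) for rng in data_range]
    # Per-feature translation (the "min" entry in the variables list).
    min_ = [lo - mn * s for mn, s in zip(data_min, scale)]
    return scale, min_


def transform_minmax(rows, scale, min_):
    """Apply the affine map x -> x * scale + min_ feature by feature."""
    return [[x * s + t for x, s, t in zip(r, scale, min_)] for r in rows]


def inverse_transform_minmax(rows, scale, min_):
    """Undo transform_minmax."""
    return [[(y - t) / s for y, s, t in zip(r, scale, min_)] for r in rows]


data = [[1.0, -4.0], [3.0, 0.0], [5.0, 4.0]]
scale, min_ = fit_minmax(data, feature_range=(0.0, 1.0))
scaled = transform_minmax(data, scale, min_)
# scaled == [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
restored = inverse_transform_minmax(scaled, scale, min_)
```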
- class Normalizer(norm: str = 'l2', *, copy: bool = True)
Bases: heat.TransformMixin, heat.BaseEstimator
Normalizer: each data point of a data set is scaled to unit norm independently. The data set to be scaled must be stored as 2D-DNDarray of shape (n_datapoints, n_features); therefore the Normalizer scales each row to unit norm. This object is similar to sklearn.preprocessing.Normalizer.
- Parameters:
norm ({'l1', 'l2', 'max'}, default='l2') – The norm to use to normalize the data points. norm='max' refers to the \(\ell^\infty\)-norm.
copy (bool, default=True) – copy=False enables in-place normalization.
- Variables:
None
Notes
Normalizer is stateless and, consequently, fit is only a dummy that does not need to be called before transform. Since transform is not bijective, there is no back-transformation inverse_transform.
- fit(X: heat.DNDarray) Self
Since Normalizer is stateless, this function is only a dummy.
- transform(X: heat.DNDarray) Self | heat.DNDarray
Apply the Normalizer transformation: scale each data point of the input data set X to unit norm (with respect to norm).
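The row-wise normalization can be sketched in plain Python as follows. The function name normalize_rows and the decision to leave all-zero rows unchanged are assumptions of this sketch; it illustrates the three norms named above and is not Heat's implementation.

```python
import math


def normalize_rows(rows, norm="l2"):
    """Scale every row (data point) to unit norm; no fitting is required."""

    def row_norm(r):
        if norm == "l1":
            return sum(abs(x) for x in r)
        if norm == "l2":
            return math.sqrt(sum(x * x for x in r))
        if norm == "max":
            return max(abs(x) for x in r)
        raise ValueError(f"unsupported norm: {norm!r}")

    out = []
    for r in rows:
        n = row_norm(r)
        # All-zero rows cannot be scaled to unit norm; this sketch keeps them.
        out.append([x / n for x in r] if n > 0.0 else list(r))
    return out


data = [[3.0, 4.0], [0.0, -2.0]]
l2 = normalize_rows(data, "l2")
l1 = normalize_rows(data, "l1")
mx = normalize_rows(data, "max")

# Rows that differ only by a positive factor map to the same result, which is
# why the transformation has no inverse.
same_direction = normalize_rows([[6.0, 8.0]], "l2")
```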
- class MaxAbsScaler(*, copy: bool = True)
Bases: heat.TransformMixin, heat.BaseEstimator
MaxAbsScaler: scale each feature of a given data set linearly by its maximum absolute value. The underlying data set to be scaled is assumed to be stored as a 2D-DNDarray of shape (n_datapoints, n_features); this routine is similar to sklearn.preprocessing.MaxAbsScaler.
Each feature is scaled individually such that the maximal absolute value of each feature after transformation will be 1.0. No shifting/centering is applied.
- Parameters:
copy (bool, default=True) – copy=False enables in-place transformation.
- Variables:
- fit(X: heat.DNDarray) Self
Fit MaxAbsScaler to input data X: compute the parameters to be used for later scaling.
- Parameters:
X (DNDarray of shape (n_datapoints, n_features)) – The data set to which the scaler shall be fitted.
- transform(X: heat.DNDarray) Self | heat.DNDarray
Scale the data with the MaxAbsScaler.
- Parameters:
X (DNDarray of shape (n_datapoints, n_features)) – The data set to be scaled.
- inverse_transform(Y: heat.DNDarray) Self | heat.DNDarray
Apply the inverse of transform, i.e. scale the input data Y back to the original representation.
- Parameters:
Y (DNDarray of shape (n_datapoints, n_features)) – The data set to be transformed back.
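A minimal plain-Python sketch of scaling by the per-feature maximum absolute value is given below. The helper names and the treatment of all-zero features are assumptions made for illustration only; this is not Heat's implementation.

```python
def fit_maxabs(rows):
    """Return the per-feature maximum absolute value used as scale."""
    n_features = len(rows[0])
    max_abs = [max(abs(r[j]) for r in rows) for j in range(n_features)]
    # An all-zero feature would give scale 0; this sketch uses 1.0 instead.
    return [m if m != 0.0 else 1.0 for m in max_abs]


def transform_maxabs(rows, scale):
    """Divide each feature by its scale; no shifting or centering."""
    return [[x / s for x, s in zip(r, scale)] for r in rows]


def inverse_transform_maxabs(rows, scale):
    """Undo transform_maxabs."""
    return [[y * s for y, s in zip(r, scale)] for r in rows]


data = [[-4.0, 1.0], [2.0, 0.5], [1.0, -0.25]]
scale = fit_maxabs(data)
scaled = transform_maxabs(data, scale)
# scaled == [[-1.0, 1.0], [0.5, 0.5], [0.25, -0.25]]
restored = inverse_transform_maxabs(scaled, scale)
```

Because no centering is applied, zero entries stay zero and the sign of every entry is preserved.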
- class RobustScaler(*, with_centering: bool = True, with_scaling: bool = True, quantile_range: Tuple[float, float] = (25.0, 75.0), copy: bool = True, unit_variance: bool = False)
Bases: heat.TransformMixin, heat.BaseEstimator
This scaler transforms the features of a given data set using statistics that are robust to outliers: it removes the median and scales the data according to the quantile range (defaults to IQR: Interquartile Range); this routine is similar to sklearn.preprocessing.RobustScaler.
The underlying data set to be scaled must be stored as a 2D-DNDarray of shape (n_datapoints, n_features). Each feature is centered and scaled independently.
- Parameters:
with_centering (bool, default=True) – If True, data are centered before scaling.
with_scaling (bool, default=True) – If True, scale the data to the prescribed interquantile range.
quantile_range (tuple (q_min, q_max), 0.0 <= q_min < q_max <= 100.0, default=(25.0, 75.0)) – Quantile range used to calculate scale_; the default is the interquartile range (IQR), given by q_min=25 and q_max=75.
copy (bool, default=True) – copy=False enables in-place transformations.
unit_variance (Not yet supported.) – Raises NotImplementedError.
- Variables:
- fit(X: heat.DNDarray) Self
Fit RobustScaler to given data set, i.e. compute the parameters required for transformation.
- Parameters:
X (DNDarray of shape (n_datapoints, n_features)) – Data to which the Scaler should be fitted.
- transform(X: heat.DNDarray) Self | heat.DNDarray
Transform the given data with the RobustScaler.
- Parameters:
X (DNDarray of shape (n_datapoints, n_features)) – Data set to be transformed.
- inverse_transform(Y: heat.DNDarray) Self | heat.DNDarray
Apply the inverse of transform.
- Parameters:
Y (DNDarray of shape (n_datapoints, n_features)) – Data to be back-transformed.
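To illustrate why median and quantile range are robust to outliers, the following plain-Python sketch centers each feature by its median and scales it by the range between two percentiles. The helper names, the linear-interpolation percentile rule, and the handling of a zero quantile range are assumptions of this sketch; Heat's implementation may compute quantiles differently, and the with_centering and with_scaling switches are omitted for brevity.

```python
def percentile(sorted_vals, q):
    """Percentile q in [0, 100], linear interpolation between closest ranks."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lower = int(pos)  # floor, since pos is non-negative
    upper = min(lower + 1, len(sorted_vals) - 1)
    frac = pos - lower
    return sorted_vals[lower] + frac * (sorted_vals[upper] - sorted_vals[lower])


def fit_robust(rows, quantile_range=(25.0, 75.0)):
    """Return per-feature (centers, scales): median and quantile range."""
    q_min, q_max = quantile_range
    centers, scales = [], []
    for j in range(len(rows[0])):
        col = sorted(r[j] for r in rows)
        centers.append(percentile(col, 50.0))
        spread = percentile(col, q_max) - percentile(col, q_min)
        # A zero quantile range is replaced by 1.0 in this sketch.
        scales.append(spread if spread != 0.0 else 1.0)
    return centers, scales


def transform_robust(rows, centers, scales):
    """Subtract the median and divide by the quantile range."""
    return [[(x - c) / s for x, c, s in zip(r, centers, scales)] for r in rows]


def inverse_transform_robust(rows, centers, scales):
    """Undo transform_robust."""
    return [[y * s + c for y, c, s in zip(r, centers, scales)] for r in rows]


data = [[1.0], [2.0], [3.0], [4.0], [100.0]]  # 100.0 is an outlier
centers, scales = fit_robust(data)
scaled = transform_robust(data, centers, scales)
# scaled == [[-1.0], [-0.5], [0.0], [0.5], [48.5]]
restored = inverse_transform_robust(scaled, centers, scales)
```

The outlier 100.0 affects neither the median (3.0) nor the interquartile range (2.0), so the remaining points keep a sensible scale.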