heat.preprocessing
Add the preprocessing functions to the ht.preprocessing namespace
Submodules
Package Contents
- _is_2D_float_DNDarray(input)
- _has_n_features(param, inputdata)
- _tol_wrt_dtype(inputdata)
- class StandardScaler(*, copy: bool = True, with_mean: bool = True, with_std: bool = True)
Bases: heat.TransformMixin, heat.BaseEstimator
Standardization of features to mean 0 and variance 1 by an affine linear transformation; similar to sklearn.preprocessing.StandardScaler. The data set to be scaled must be stored as a 2D DNDarray of shape (n_datapoints, n_features). Shifting to mean 0 and scaling to variance 1 are applied to each feature independently.
- Parameters:
- Variables:
scale (DNDarray of shape (n_features,) or None) – Per-feature relative scaling of the data to achieve unit variance. Set to None (no variance scaling applied) if var = None or var is below machine precision.
mean (DNDarray of shape (n_features,) or None) – The mean value for each feature. Equal to None when with_mean=False.
var (DNDarray of shape (n_features,) or None) – Feature-wise variance of the given data. Equal to None when with_std=False.
- with_mean = True
- with_std = True
- copy = True
- fit(X: heat.DNDarray, sample_weight: heat.DNDarray | None = None) -> Self
Fit StandardScaler to the given data X, i.e. compute the mean and standard deviation of X to be used for later scaling.
- Parameters:
X (DNDarray of shape (n_datapoints, n_features)) – Data used to compute the mean and standard deviation for later feature-wise scaling.
sample_weight (not yet supported) – Raises NotImplementedError.
- transform(X: heat.DNDarray) -> Self | heat.DNDarray
Applies standardization to the input data X by centering and scaling w.r.t. the mean and standard deviation previously computed by fit and stored in the StandardScaler.
- inverse_transform(Y: heat.DNDarray) -> Self | heat.DNDarray
Scale the data back to the original representation, i.e. apply the inverse of transform to the input Y.
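The arithmetic behind fit, transform and inverse_transform can be illustrated with a minimal plain-Python sketch. It does not use Heat and is not Heat's implementation: the function names are hypothetical, the data is a small list of rows rather than a DNDarray, and the population variance (division by n, as in scikit-learn) is an assumption. Features with near-zero variance, for which scale is set to None, are not handled here.

```python
import math
from statistics import fmean, pvariance


def fit_standard(rows):
    """Return the per-feature mean and population variance of `rows`."""
    columns = list(zip(*rows))  # one tuple per feature
    mean = [fmean(col) for col in columns]
    var = [pvariance(col, mu) for col, mu in zip(columns, mean)]
    return mean, var


def transform_standard(rows, mean, var):
    """Center each feature by its mean and divide by its standard deviation."""
    std = [math.sqrt(v) for v in var]
    return [[(x - m) / s for x, m, s in zip(row, mean, std)] for row in rows]


def inverse_transform_standard(rows, mean, var):
    """Undo transform_standard: multiply by the std, then add the mean back."""
    std = [math.sqrt(v) for v in var]
    return [[y * s + m for y, m, s in zip(row, mean, std)] for row in rows]


# Hypothetical data: 3 data points, 2 features with non-zero variance.
data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
mean, var = fit_standard(data)
scaled = transform_standard(data, mean, var)
restored = inverse_transform_standard(scaled, mean, var)
```

After the transformation every column of scaled has mean 0 and variance 1 up to rounding, and restored agrees with data up to rounding.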
- class MinMaxScaler(feature_range: Tuple[float, float] = (0.0, 1.0), *, copy: bool = True, clip: bool = False)
Bases: heat.TransformMixin, heat.BaseEstimator
Min-Max-Scaler: transforms the features by scaling each feature (affine) linearly to the prescribed range; similar to sklearn.preprocessing.MinMaxScaler. The data set to be scaled must be stored as a 2D DNDarray of shape (n_datapoints, n_features).
Each feature is scaled and translated individually such that it lies in the given range on the input data set, e.g. between zero and one (default).
- Parameters:
- Variables:
min (DNDarray of shape (n_features,)) – translation required per feature
scale (DNDarray of shape (n_features,)) – scaling required per feature
data_min (DNDarray of shape (n_features,)) – minimum per feature in the input data set
data_max (DNDarray of shape (n_features,)) – maximum per feature in the input data set
data_range (DNDarray of shape (n_features,)) – range per feature in the input data set
- copy = True
- feature_range = (0.0, 1.0)
- clip = False
- fit(X: heat.DNDarray) -> Self
Fit the MinMaxScaler, i.e. compute the parameters required for later scaling.
- Parameters:
X (DNDarray of shape (n_datapoints, n_features)) – data set to which scaler shall be fitted.
- transform(X: heat.DNDarray) -> Self | heat.DNDarray
Transform input data with MinMaxScaler, i.e. scale the features of X according to feature_range.
- Parameters:
X (DNDarray of shape (n_datapoints, n_features)) – Data set to be transformed.
- inverse_transform(Y: heat.DNDarray) -> Self | heat.DNDarray
Apply the inverse of transform.
- Parameters:
Y (DNDarray of shape (n_datapoints, n_features)) – Data set to be transformed back.
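How the fitted variables relate to each other can be shown with a small plain-Python sketch. It follows the convention used by scikit-learn's MinMaxScaler (scaled = X * scale + min); that Heat computes min and scale in exactly this way is an assumption, and the function names and the dictionary layout are hypothetical. Constant features (data_range equal to 0) are not handled.

```python
def fit_min_max(rows, feature_range=(0.0, 1.0)):
    """Compute per-feature parameters so that each feature maps onto feature_range."""
    lo, hi = feature_range
    columns = list(zip(*rows))  # one tuple per feature
    data_min = [min(col) for col in columns]
    data_max = [max(col) for col in columns]
    data_range = [mx - mn for mn, mx in zip(data_min, data_max)]
    scale = [(hi - lo) / r for r in data_range]
    # Translation chosen such that data_min is mapped exactly onto `lo`.
    min_ = [lo - mn * s for mn, s in zip(data_min, scale)]
    return {"min": min_, "scale": scale, "data_min": data_min,
            "data_max": data_max, "data_range": data_range,
            "feature_range": feature_range}


def transform_min_max(rows, params, clip=False):
    """Scale and translate each feature; optionally clamp to feature_range."""
    lo, hi = params["feature_range"]
    out = []
    for row in rows:
        new_row = [x * s + m for x, s, m in zip(row, params["scale"], params["min"])]
        if clip:
            new_row = [min(max(v, lo), hi) for v in new_row]
        out.append(new_row)
    return out


def inverse_transform_min_max(rows, params):
    """Undo transform_min_max (without clipping)."""
    return [[(y - m) / s for y, s, m in zip(row, params["scale"], params["min"])]
            for row in rows]


# Hypothetical data: 3 data points, 2 features.
data = [[1.0, -2.0], [3.0, 0.0], [5.0, 2.0]]
params = fit_min_max(data)
scaled = transform_min_max(data, params)
unseen = [[7.0, 0.0]]  # lies outside the fitted range of the first feature
unclipped = transform_min_max(unseen, params)
clipped = transform_min_max(unseen, params, clip=True)
restored = inverse_transform_min_max(scaled, params)
```

In this sketch clip only matters for data that was not part of the fit: the unseen point maps to 1.5 in the first feature without clipping and to 1.0 with clipping.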
- class Normalizer(norm: str = 'l2', *, copy: bool = True)
Bases: heat.TransformMixin, heat.BaseEstimator
Normalizer: each data point of a data set is scaled to unit norm independently. The data set to be scaled must be stored as a 2D DNDarray of shape (n_datapoints, n_features); therefore the Normalizer scales each row to unit norm. This object is similar to sklearn.preprocessing.Normalizer.
- Parameters:
norm ({'l1', 'l2', 'max'}, default='l2') – The norm to use to normalize the data points. norm='max' refers to the \(\ell^\infty\)-norm.
copy (bool, default=True) – copy=False enables in-place normalization.
- Variables:
None
Notes
Normalizer is stateless and, consequently, fit is only a dummy that does not need to be called before transform. Since transform is not bijective, there is no back-transformation inverse_transform.
- norm_ = 'l2'
- copy = True
- fit(X: heat.DNDarray) -> Self
Since Normalizer is stateless, this function is only a dummy.
- transform(X: heat.DNDarray) -> Self | heat.DNDarray
Apply the Normalizer transformation: scales each data point of the input data set X to unit norm (w.r.t. norm).
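The row-wise normalization, and the reason why no inverse exists, can be illustrated with a short plain-Python sketch. It is not Heat's implementation; the helper names are hypothetical and rows with zero norm are assumed not to occur.

```python
import math


def row_norm(row, norm="l2"):
    """Return the l1, l2 or max (l-infinity) norm of a single data point."""
    if norm == "l1":
        return sum(abs(x) for x in row)
    if norm == "l2":
        return math.sqrt(sum(x * x for x in row))
    if norm == "max":
        return max(abs(x) for x in row)
    raise ValueError(f"unsupported norm: {norm!r}")


def normalize_rows(rows, norm="l2"):
    """Scale every row independently to unit norm; no fitted state is needed."""
    result = []
    for row in rows:
        n = row_norm(row, norm)  # assumed to be non-zero
        result.append([x / n for x in row])
    return result


# Two different data points that point in the same direction.
data = [[3.0, 4.0], [6.0, 8.0]]
l2_rows = normalize_rows(data, "l2")
l1_rows = normalize_rows(data, "l1")
max_rows = normalize_rows(data, "max")
```

Both rows of data are mapped to the same unit vector, so the original lengths cannot be recovered from the output. This is what "not bijective" means in the note above.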
- class MaxAbsScaler(*, copy: bool = True)
Bases: heat.TransformMixin, heat.BaseEstimator
MaxAbsScaler: scale each feature of a given data set linearly by its maximum absolute value. The underlying data set to be scaled is assumed to be stored as a 2D DNDarray of shape (n_datapoints, n_features); this routine is similar to sklearn.preprocessing.MaxAbsScaler.
Each feature is scaled individually such that the maximal absolute value of each feature after transformation will be 1.0. No shifting/centering is applied.
- Parameters:
copy (bool, default=True) – copy=False enables in-place transformation.
- Variables:
- copy = True
- fit(X: heat.DNDarray) -> Self
Fit MaxAbsScaler to input data X: compute the parameters to be used for later scaling.
- Parameters:
X (DNDarray of shape (n_datapoints, n_features)) – The data set to which the scaler shall be fitted.
- transform(X: heat.DNDarray) -> Self | heat.DNDarray
Scale the data with the MaxAbsScaler.
- Parameters:
X (DNDarray of shape (n_datapoints, n_features)) – The data set to be scaled.
- inverse_transform(Y: heat.DNDarray) -> Self | heat.DNDarray
Apply the inverse of transform, i.e. scale the input data Y back to the original representation.
- Parameters:
Y (DNDarray of shape (n_datapoints, n_features)) – The data set to be transformed back.
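Scaling by the maximum absolute value can be sketched in a few lines of plain Python. The function names are hypothetical, the sketch does not use Heat, and features that are identically zero are assumed not to occur.

```python
def fit_max_abs(rows):
    """Return the maximum absolute value of each feature."""
    return [max(abs(x) for x in col) for col in zip(*rows)]


def transform_max_abs(rows, max_abs):
    """Divide each feature by its maximum absolute value (no centering)."""
    return [[x / m for x, m in zip(row, max_abs)] for row in rows]


def inverse_transform_max_abs(rows, max_abs):
    """Multiply each feature by its maximum absolute value again."""
    return [[y * m for y, m in zip(row, max_abs)] for row in rows]


# Hypothetical data: 3 data points, 2 features.
data = [[1.0, -4.0], [-2.0, 2.0], [0.5, 1.0]]
max_abs = fit_max_abs(data)
scaled = transform_max_abs(data, max_abs)
restored = inverse_transform_max_abs(scaled, max_abs)
```

Because no shift is applied, zeros stay zeros and signs are preserved; every scaled feature lies in [-1, 1] and attains absolute value 1 at least once.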
- class RobustScaler(*, with_centering: bool = True, with_scaling: bool = True, quantile_range: Tuple[float, float] = (25.0, 75.0), copy: bool = True, unit_variance: bool = False, sketched: bool = False, sketch_size: float | None = 1.0 / ht.MPI_WORLD.size)
Bases: heat.TransformMixin, heat.BaseEstimator
Scales the features of a given data set using statistics that are robust to outliers: it removes the median and scales the data according to the quantile range (defaults to the IQR, the interquartile range); this routine is similar to sklearn.preprocessing.RobustScaler.
By default, the “true” median and IQR of the entire data set are computed; however, the argument sketched allows switching to a faster but less accurate version that computes the median and IQR from a random subset of the data set (“sketch”) whose relative size is given by sketch_size.
The underlying data set to be scaled must be stored as a 2D DNDarray of shape (n_datapoints, n_features). Each feature is centered and scaled independently.
- Parameters:
with_centering (bool, default=True) – If True, data are centered before scaling.
with_scaling (bool, default=True) – If True, scale the data to the prescribed interquantile range.
quantile_range (tuple (q_min, q_max), 0.0 <= q_min < q_max <= 100.0, default=(25.0, 75.0)) – Quantile range used to calculate scale_; the default is the so-called IQR given by q_min=25 and q_max=75.
copy (bool, default=True) – copy=False enables in-place transformations.
unit_variance (not yet supported) – Raises NotImplementedError.
sketched (bool, default=False) – If True, use a sketch of the data set to compute the median and IQR. This is faster but less accurate. The size of the sketch is determined by the argument sketch_size.
sketch_size (float, default=1./ht.MPI_WORLD.size) – Fraction of the data set to be used for the sketch if sketched=True. The default value is 1/N, where N is the number of MPI processes. Ignored if sketched=False.
- Variables:
- with_centering = True
- with_scaling = True
- quantile_range = (25.0, 75.0)
- copy = True
- sketched = False
- sketch_size
- fit(X: heat.DNDarray) -> Self
Fit RobustScaler to given data set, i.e. compute the parameters required for transformation.
- Parameters:
X (DNDarray of shape (n_datapoints, n_features)) – Data to which the Scaler should be fitted.
- transform(X: heat.DNDarray) -> Self | heat.DNDarray
Transform given data with RobustScaler.
- Parameters:
X (DNDarray of shape (n_datapoints, n_features)) – Data set to be transformed.
- inverse_transform(Y: heat.DNDarray) -> Self | heat.DNDarray
Apply the inverse of transform.
- Parameters:
Y (DNDarray of shape (n_datapoints, n_features)) – Data to be back-transformed.
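The idea of centering by the median and scaling by a quantile range, optionally estimated on a random subset, can be sketched in plain Python. This is an illustration under stated assumptions, not Heat's implementation: the function names, the seed parameter, the linearly interpolated percentile and the uniform sampling of rows for the sketch are all hypothetical choices made for this example.

```python
import random
from statistics import median


def percentile(sorted_values, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100) of sorted data."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])


def fit_robust(rows, quantile_range=(25.0, 75.0), sketched=False,
               sketch_size=0.5, seed=0):
    """Return the per-feature median and quantile range of `rows`.

    With sketched=True both are estimated on a random fraction
    `sketch_size` of the rows (assumed sampling scheme).
    """
    if sketched:
        k = max(1, round(sketch_size * len(rows)))
        rows = random.Random(seed).sample(rows, k)
    columns = [sorted(col) for col in zip(*rows)]
    q_min, q_max = quantile_range
    center = [median(col) for col in columns]
    scale = [percentile(col, q_max) - percentile(col, q_min) for col in columns]
    return center, scale


def transform_robust(rows, center, scale):
    """Subtract the median and divide by the quantile range, per feature."""
    return [[(x - c) / s for x, c, s in zip(row, center, scale)] for row in rows]


# Hypothetical data: one feature with a single large outlier.
data = [[1.0], [2.0], [3.0], [4.0], [100.0]]
center, scale = fit_robust(data)
scaled = transform_robust(data, center, scale)
sketch_center, sketch_scale = fit_robust(data, sketched=True, sketch_size=0.6)
```

The outlier 100.0 affects neither center nor scale here, which is the robustness property described above. The sketched fit only looks at three of the five rows, so its result depends on which rows are drawn.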