:mod:`heat.preprocessing`
=========================

.. py:module:: heat.preprocessing

.. autoapi-nested-parse::

   Add the preprocessing functions to the ht.preprocessing namespace


Submodules
----------

.. toctree::
   :titlesonly:
   :maxdepth: 1

   preprocessing/index.rst


Package Contents
----------------

.. function:: _is_2D_float_DNDarray(input)


.. function:: _has_n_features(param, inputdata)


.. function:: _tol_wrt_dtype(inputdata)


.. py:class:: StandardScaler(*, copy: bool = True, with_mean: bool = True, with_std: bool = True)

   Bases: :class:`heat.TransformMixin`, :class:`heat.BaseEstimator`

   Standardization of features to mean 0 and variance 1 by affine linear transformation; similar to `sklearn.preprocessing.StandardScaler`.
   The data set to be scaled must be stored as 2D-`DNDarray` of shape (n_datapoints, n_features).
   Shifting to mean 0 and scaling to variance 1 is applied to each feature independently.

   :param copy: If False, try to avoid a copy and do in-place scaling instead.
   :type copy: bool, default=True
   :param with_mean: If True, center the data (i.e. mean = 0) before scaling.
   :type with_mean: bool, default=True
   :param with_std: If True, scale the data to variance = 1.
   :type with_std: bool, default=True

   :ivar scale_: Per-feature relative scaling of the data to achieve unit variance. Set to ``None`` (no variance scaling applied) if ``var = None`` or ``var`` is below machine precision.
   :vartype scale_: DNDarray of shape (n_features,) or None
   :ivar mean_: The mean value for each feature. Equal to ``None`` when ``with_mean=False``.
   :vartype mean_: DNDarray of shape (n_features,) or None
   :ivar var_: Feature-wise variance of the given data. Equal to ``None`` when ``with_std=False``.
   :vartype var_: DNDarray of shape (n_features,) or None

   .. attribute:: with_mean
      :annotation: = True

   .. attribute:: with_std
      :annotation: = True

   .. attribute:: copy
      :annotation: = True

   .. role:: raw-html(raw)
      :format: html

   .. method:: fit(X: heat.DNDarray, sample_weight: Optional[heat.DNDarray] = None) -> Self

      Fit ``StandardScaler`` to the given data ``X``, i.e. compute the mean and standard deviation of ``X`` to be used for later scaling.

      :param X: Data used to compute the mean and standard deviation used for later feature-wise scaling.
      :type X: DNDarray of shape (n_datapoints, n_features)
      :param sample_weight: Raises ``NotImplementedError``.
      :type sample_weight: Not yet supported.

   .. method:: transform(X: heat.DNDarray) -> Union[Self, heat.DNDarray]

      Apply standardization to the input data ``X`` by centering and scaling w.r.t. the mean and standard deviation previously computed and saved in ``StandardScaler`` with :meth:`fit`.

      :param X: The data set to be standardized.
      :type X: DNDarray of shape (n_datapoints, n_features)
      :param copy: Copy the input ``X`` or not.
      :type copy: bool, default=None

   .. method:: inverse_transform(Y: heat.DNDarray) -> Union[Self, heat.DNDarray]

      Scale the data back to the original representation, i.e. apply the inverse of :meth:`transform` to the input ``Y``.

      :param Y: Data to be scaled back.
      :type Y: DNDarray of shape (n_datapoints, n_features)
      :param copy: Copy the input ``Y`` or not.
      :type copy: bool, default=None


.. py:class:: MinMaxScaler(feature_range: Tuple[float, float] = (0.0, 1.0), *, copy: bool = True, clip: bool = False)

   Bases: :class:`heat.TransformMixin`, :class:`heat.BaseEstimator`

   Min-Max-Scaler: transforms the features by scaling each feature (affine) linearly to the prescribed range; similar to `sklearn.preprocessing.MinMaxScaler`.
   The data set to be scaled must be stored as 2D-`DNDarray` of shape (n_datapoints, n_features).
   Each feature is scaled and translated individually such that it lies in the given range on the input data set, e.g. between zero and one (default).

   :param feature_range: Desired range of transformed features.
   :type feature_range: tuple (min, max), default=(0, 1)
   :param copy: ``copy=False`` means in-place transformations whenever possible.
   :type copy: bool, default=True
   :param clip: Raises ``NotImplementedError``.
   :type clip: Not yet supported.

   :ivar min_: Translation required per feature.
   :vartype min_: DNDarray of shape (n_features,)
   :ivar scale_: Scaling required per feature.
   :vartype scale_: DNDarray of shape (n_features,)
   :ivar data_min_: Minimum per feature in the input data set.
   :vartype data_min_: DNDarray of shape (n_features,)
   :ivar data_max_: Maximum per feature in the input data set.
   :vartype data_max_: DNDarray of shape (n_features,)
   :ivar data_range_: Range per feature in the input data set.
   :vartype data_range_: DNDarray of shape (n_features,)

   .. attribute:: copy
      :annotation: = True

   .. attribute:: feature_range
      :annotation: = (0.0, 1.0)

   .. attribute:: clip
      :annotation: = False

   .. role:: raw-html(raw)
      :format: html

   .. method:: fit(X: heat.DNDarray) -> Self

      Fit the MinMaxScaler, i.e. compute the parameters required for later scaling.

      :param X: Data set to which the scaler shall be fitted.
      :type X: DNDarray of shape (n_datapoints, n_features)

   .. method:: transform(X: heat.DNDarray) -> Union[Self, heat.DNDarray]

      Transform input data with the MinMaxScaler, i.e. scale the features of ``X`` according to ``feature_range``.

      :param X: Data set to be transformed.
      :type X: DNDarray of shape (n_datapoints, n_features)

   .. method:: inverse_transform(Y: heat.DNDarray) -> Union[Self, heat.DNDarray]

      Apply the inverse of :meth:`transform`.

      :param Y: Data set to be transformed back.
      :type Y: DNDarray of shape (n_datapoints, n_features)


.. py:class:: Normalizer(norm: str = 'l2', *, copy: bool = True)

   Bases: :class:`heat.TransformMixin`, :class:`heat.BaseEstimator`

   Normalizer: each data point of a data set is scaled to unit norm independently.
   The data set to be scaled must be stored as 2D-`DNDarray` of shape (n_datapoints, n_features); therefore the Normalizer scales each row to unit norm.
   This object is similar to `sklearn.preprocessing.Normalizer`.

   :param norm: The norm to use to normalize the data points. ``norm='max'`` refers to the :math:`\ell^\infty`-norm.
   :type norm: {'l1', 'l2', 'max'}, default='l2'
   :param copy: ``copy=False`` enables in-place normalization.
   :type copy: bool, default=True

   :ivar None:

   .. rubric:: Notes

   Normalizer is :term:`stateless` and, consequently, :meth:`fit` is only a dummy that does not need to be called before :meth:`transform`.
   Since :meth:`transform` is not bijective, there is no back-transformation :meth:`inverse_transform`.

   .. attribute:: norm_
      :annotation: = 'l2'

   .. attribute:: copy
      :annotation: = True

   .. role:: raw-html(raw)
      :format: html

   .. method:: fit(X: heat.DNDarray) -> Self

      Since :class:`Normalizer` is stateless, this function is only a dummy.

   .. method:: transform(X: heat.DNDarray) -> Union[Self, heat.DNDarray]

      Apply the Normalizer transformation: scale each data point of the input data set ``X`` to unit norm (w.r.t. ``norm``).

      :param X: The data set to be normalized.
      :type X: DNDarray of shape (n_datapoints, n_features)
      :param copy: ``copy=False`` enables in-place transformation.
      :type copy: bool, default=None


.. py:class:: MaxAbsScaler(*, copy: bool = True)

   Bases: :class:`heat.TransformMixin`, :class:`heat.BaseEstimator`

   MaxAbsScaler: scale each feature of a given data set linearly by its maximum absolute value.
   The underlying data set to be scaled is assumed to be stored as a 2D-`DNDarray` of shape (n_datapoints, n_features); this routine is similar to `sklearn.preprocessing.MaxAbsScaler`.
   Each feature is scaled individually such that the maximal absolute value of each feature after transformation will be 1.0. No shifting/centering is applied.

   :param copy: ``copy=False`` enables in-place transformation.
   :type copy: bool, default=True

   :ivar scale_: Per-feature relative scaling of the data.
   :vartype scale_: DNDarray of shape (n_features,)
   :ivar max_abs_: Per-feature maximum absolute value of the input data.
   :vartype max_abs_: DNDarray of shape (n_features,)

   .. attribute:: copy
      :annotation: = True

   .. role:: raw-html(raw)
      :format: html

   .. method:: fit(X: heat.DNDarray) -> Self

      Fit MaxAbsScaler to input data ``X``: compute the parameters to be used for later scaling.

      :param X: The data set to which the scaler shall be fitted.
      :type X: DNDarray of shape (n_datapoints, n_features)

   .. method:: transform(X: heat.DNDarray) -> Union[Self, heat.DNDarray]

      Scale the data with the MaxAbsScaler.

      :param X: The data set to be scaled.
      :type X: DNDarray of shape (n_datapoints, n_features)

   .. method:: inverse_transform(Y: heat.DNDarray) -> Union[Self, heat.DNDarray]

      Apply the inverse of :meth:`transform`, i.e. scale the input data ``Y`` back to the original representation.

      :param Y: The data set to be transformed back.
      :type Y: DNDarray of shape (n_datapoints, n_features)


.. py:class:: RobustScaler(*, with_centering: bool = True, with_scaling: bool = True, quantile_range: Tuple[float, float] = (25.0, 75.0), copy: bool = True, unit_variance: bool = False, sketched: bool = False, sketch_size: Optional[float] = 1.0 / ht.MPI_WORLD.size)

   Bases: :class:`heat.TransformMixin`, :class:`heat.BaseEstimator`

   Scales the features of a given data set using statistics that are robust to outliers: it removes the median and scales the data according to the quantile range (defaults to the IQR: interquartile range); this routine is similar to ``sklearn.preprocessing.RobustScaler``.
   By default, the "true" median and IQR of the entire data set are computed; however, the argument `sketched` allows switching to a faster but less accurate version that computes the median and IQR only on a random subset of the data set (a "sketch") of size `sketch_size`.
   The underlying data set to be scaled must be stored as a 2D-`DNDarray` of shape (n_datapoints, n_features).
   Each feature is centered and scaled independently.

   :param with_centering: If `True`, the data are centered before scaling.
   :type with_centering: bool, default=True
   :param with_scaling: If `True`, scale the data to the prescribed interquantile range.
   :type with_scaling: bool, default=True
   :param quantile_range: Quantile range used to calculate `scale_`; the default is the so-called IQR given by ``q_min=25`` and ``q_max=75``.
   :type quantile_range: tuple (q_min, q_max), 0.0 <= q_min < q_max <= 100.0, default=(25.0, 75.0)
   :param copy: ``copy=False`` enables in-place transformations.
   :type copy: bool, default=True
   :param unit_variance: Raises ``NotImplementedError``.
   :type unit_variance: Not yet supported.
   :param sketched: If `True`, use a sketch of the data set to compute the median and IQR. This is faster but less accurate. The size of the sketch is determined by the argument `sketch_size`.
   :type sketched: bool, default=False
   :param sketch_size: Fraction of the data set to be used for the sketch if `sketched=True`. The default value is 1/N, where N is the number of MPI processes. Ignored if `sketched=False`.
   :type sketch_size: float, default=1./ht.MPI_WORLD.size

   :ivar center_: Feature-wise median value of the given data set.
   :vartype center_: DNDarray of shape (n_features,)
   :ivar iqr_: Length of the interquantile range for each feature.
   :vartype iqr_: DNDarray of shape (n_features,)
   :ivar scale_: Feature-wise inverse of ``iqr_``.
   :vartype scale_: array of floats

   .. attribute:: with_centering
      :annotation: = True

   .. attribute:: with_scaling
      :annotation: = True

   .. attribute:: quantile_range
      :annotation: = (25.0, 75.0)

   .. attribute:: copy
      :annotation: = True

   .. attribute:: sketched
      :annotation: = False

   .. attribute:: sketch_size

   .. role:: raw-html(raw)
      :format: html

   .. method:: fit(X: heat.DNDarray) -> Self

      Fit RobustScaler to the given data set, i.e. compute the parameters required for the transformation.

      :param X: Data to which the scaler should be fitted.
      :type X: DNDarray of shape (n_datapoints, n_features)

   .. method:: transform(X: heat.DNDarray) -> Union[Self, heat.DNDarray]

      Transform the given data with RobustScaler.

      :param X: Data set to be transformed.
      :type X: DNDarray of shape (n_datapoints, n_features)

   .. method:: inverse_transform(Y: heat.DNDarray) -> Union[Self, heat.DNDarray]

      Apply the inverse of :meth:`transform`.

      :param Y: Data to be back-transformed.
      :type Y: DNDarray of shape (n_datapoints, n_features)
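
The per-feature (and, for the Normalizer, per-row) arithmetic described for the classes above can be sketched in plain Python. This is only an illustration under stated assumptions, not Heat's implementation: it works on nested lists rather than a distributed ``DNDarray``, all helper names (``columns``, ``scale_columns``, ``standard_scale``, ``min_max_scale``, ``max_abs_scale``, ``robust_scale``, ``normalize_rows``) are hypothetical, the population standard deviation and the "inclusive" quartile interpolation are assumptions (the documentation above does not state which estimators Heat uses), ``robust_scale`` hard-codes the default quantile range (25, 75), and guards against zero variance or zero range are omitted.

```python
import math
import statistics

# Toy data set: 3 data points (rows) x 2 features (columns).
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]


def columns(rows):
    """Return the features (columns) of a row-major data set as lists."""
    return [list(col) for col in zip(*rows)]


def scale_columns(rows, shifts, factors):
    """Apply (value - shift) / factor to every entry, feature by feature."""
    return [
        [(v - s) / f for v, s, f in zip(row, shifts, factors)]
        for row in rows
    ]


def standard_scale(rows):
    """Shift each feature to mean 0 and scale it to variance 1."""
    cols = columns(rows)
    means = [statistics.fmean(c) for c in cols]
    # Assumption: population standard deviation (divides by n, not n - 1).
    stds = [statistics.pstdev(c) for c in cols]
    return scale_columns(rows, means, stds)


def min_max_scale(rows, feature_range=(0.0, 1.0)):
    """Map each feature linearly onto feature_range = (low, high)."""
    low, high = feature_range
    cols = columns(rows)
    mins = [min(c) for c in cols]
    ranges = [max(c) - min(c) for c in cols]
    unit = scale_columns(rows, mins, ranges)  # every feature now in [0, 1]
    return [[low + u * (high - low) for u in row] for row in unit]


def max_abs_scale(rows):
    """Divide each feature by its maximum absolute value; no centering."""
    max_abs = [max(abs(v) for v in c) for c in columns(rows)]
    return scale_columns(rows, [0.0] * len(max_abs), max_abs)


def robust_scale(rows):
    """Subtract the median and divide by the interquartile range (IQR)."""
    cols = columns(rows)
    medians = [statistics.median(c) for c in cols]
    iqrs = []
    for c in cols:
        # Assumption: quartiles with linear ("inclusive") interpolation.
        q1, _, q3 = statistics.quantiles(c, n=4, method="inclusive")
        iqrs.append(q3 - q1)
    return scale_columns(rows, medians, iqrs)


def normalize_rows(rows, norm="l2"):
    """Scale every data point (row) to unit norm; 'max' is the l-infinity norm."""
    norms = {
        "l1": lambda r: sum(abs(v) for v in r),
        "l2": lambda r: math.sqrt(sum(v * v for v in r)),
        "max": lambda r: max(abs(v) for v in r),
    }
    length = norms[norm]
    return [[v / length(row) for v in row] for row in rows]


print(min_max_scale(X))  # [[0.0, 0.0], [0.5, 0.2], [1.0, 1.0]]
print(robust_scale(X))   # [[-1.0, -0.4], [0.0, 0.0], [1.0, 1.6]]
```

With Heat itself, the signatures documented above (``fit`` returning ``Self``, followed by ``transform``) suggest a call pattern such as ``ht.preprocessing.StandardScaler().fit(X).transform(X)`` on a 2D float ``DNDarray``; that pattern is inferred from the documented signatures and is not exercised in this sketch.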