:mod:`heat.preprocessing`
=========================
.. py:module:: heat.preprocessing

.. autoapi-nested-parse::

   Add the preprocessing functions to the ht.preprocessing namespace


Submodules
----------
.. toctree::
   :titlesonly:
   :maxdepth: 1

   preprocessing/index.rst


Package Contents
----------------


.. function:: _is_2D_float_DNDarray(input)


.. function:: _has_n_features(param, inputdata)


.. function:: _tol_wrt_dtype(inputdata)


.. py:class:: StandardScaler(*, copy: bool = True, with_mean: bool = True, with_std: bool = True)

   Bases: :class:`heat.TransformMixin`, :class:`heat.BaseEstimator`

   Standardization of features to mean 0 and variance 1 by affine linear transformation; similar to `sklearn.preprocessing.StandardScaler`.
   The data set to be scaled must be stored as 2D-`DNDarray` of shape (n_datapoints, n_features).
   Shifting to mean 0 and scaling to variance 1 is applied to each feature independently.

   :param copy: If False, try to avoid a copy and do inplace scaling instead.
   :type copy: bool, default=True
   :param with_mean: If True, center the data (i.e. mean = 0) before scaling.
   :type with_mean: bool, default=True
   :param with_std: If True, scale the data to variance = 1.
   :type with_std: bool, default=True

   :ivar scale_: Per-feature relative scaling of the data to achieve unit
                 variance. Set to ``None`` (no variance scaling applied) if the variance is ``None`` or below machine precision.

   :vartype scale_: DNDarray of shape (n_features,) or None
   :ivar mean_: The mean value for each feature. Equal to ``None`` when ``with_mean=False``.

   :vartype mean_: DNDarray of shape (n_features,) or None
   :ivar var_: Featurewise variance of the given data. Equal to ``None`` when ``with_std=False``.

   :vartype var_: DNDarray of shape (n_features,) or None
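
   The affine map described above can be illustrated with a minimal pure-Python sketch. It is not Heat's implementation: the helper names ``fit_standard``, ``transform_standard`` and ``inverse_standard`` are hypothetical, plain nested lists stand in for a ``DNDarray``, and the use of the population variance (divisor ``n``) is an assumption.

   .. code-block:: python

      from statistics import fmean, pvariance


      def fit_standard(rows):
          """Return per-feature mean and population variance of row-major data."""
          columns = list(zip(*rows))
          mean = [fmean(col) for col in columns]
          var = [pvariance(col) for col in columns]  # assumption: divisor n, not n - 1
          return mean, var


      def transform_standard(rows, mean, var):
          """Shift each feature to mean 0 and scale it to variance 1."""
          std = [v ** 0.5 for v in var]  # assumes no constant (zero-variance) feature
          return [[(x - m) / s for x, m, s in zip(row, mean, std)] for row in rows]


      def inverse_standard(rows, mean, var):
          """Undo transform_standard: scale back, then shift back."""
          std = [v ** 0.5 for v in var]
          return [[y * s + m for y, m, s in zip(row, mean, std)] for row in rows]


      X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
      mean, var = fit_standard(X)
      Y = transform_standard(X, mean, var)
      X_back = inverse_standard(Y, mean, var)

   With the class documented here, the same round trip would presumably be ``fit(X)``, then ``transform(X)``, then ``inverse_transform(Y)``.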


   .. attribute:: with_mean
      :annotation: = True

      
   .. attribute:: with_std
      :annotation: = True

      
   .. attribute:: copy
      :annotation: = True

      
   .. role:: raw-html(raw)
      :format: html
   .. method:: fit(X: heat.DNDarray, sample_weight: Optional[heat.DNDarray] = None) -> Self

      Fit ``StandardScaler`` to the given data ``X``, i.e. compute mean and standard deviation of ``X`` to be used for later scaling.

      :param X: Data used to compute the mean and standard deviation used for later featurewise scaling.
      :type X: DNDarray of shape (n_datapoints, n_features).
      :param sample_weight: Raises ``NotImplementedError``.
      :type sample_weight: Not yet supported.


   .. method:: transform(X: heat.DNDarray) -> Union[Self, heat.DNDarray]

      Apply standardization to the input data ``X`` by centering and scaling with respect to the mean and standard deviation previously computed by :meth:`fit` and stored in the ``StandardScaler``.

      :param X: The data set to be standardized.
      :type X: DNDarray (n_datapoints, n_features)
      :param copy: Copy the input ``X`` or not.
      :type copy: bool, default=None


   .. method:: inverse_transform(Y: heat.DNDarray) -> Union[Self, heat.DNDarray]

      Scale back the data to the original representation, i.e. apply the inverse of :meth:`transform` to the input ``Y``.

      :param Y: Data to be scaled back.
      :type Y: DNDarray of shape (n_datapoints, n_features)
      :param copy: Copy the input ``Y`` or not.
      :type copy: bool, default=None


.. py:class:: MinMaxScaler(feature_range: Tuple[float, float] = (0.0, 1.0), *, copy: bool = True, clip: bool = False)

   Bases: :class:`heat.TransformMixin`, :class:`heat.BaseEstimator`

   Min-Max-Scaler: transforms the features by scaling each feature (affine) linearly to the prescribed range;
   similar to `sklearn.preprocessing.MinMaxScaler`.
   The data set to be scaled must be stored as 2D-`DNDarray` of shape (n_datapoints, n_features).

   Each feature is scaled and translated individually such that it is in the given range on the input data set,
   e.g. between zero and one (default).

   :param feature_range: Desired range of transformed features.
   :type feature_range: tuple (min, max), default=(0, 1)
   :param copy: ``copy = False`` means in-place transformations whenever possible.
   :type copy: bool, default=True
   :param clip: raises ``NotImplementedError``.
   :type clip: Not yet supported.

   :ivar min_: translation required per feature

   :vartype min_: DNDarray of shape (n_features,)
   :ivar scale_: scaling required per feature

   :vartype scale_: DNDarray of shape (n_features,)
   :ivar data_min_: minimum per feature in the input data set

   :vartype data_min_: DNDarray of shape (n_features,)
   :ivar data_max_: maximum per feature in the input data set

   :vartype data_max_: DNDarray of shape (n_features,)
   :ivar data_range_: range per feature in the input data set

   :vartype data_range_: DNDarray of shape (n_features,)
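
   The relationship between the fitted attributes can be sketched in pure Python. This is an illustration under assumptions, not Heat's code: the formulas for ``scale_`` and ``min_`` follow the convention of `sklearn.preprocessing.MinMaxScaler`, and the helper names ``fit_minmax``, ``transform_minmax`` and ``inverse_minmax`` are hypothetical.

   .. code-block:: python

      def fit_minmax(rows, feature_range=(0.0, 1.0)):
          """Compute per-feature scaling parameters for the target range."""
          low, high = feature_range
          columns = list(zip(*rows))
          data_min = [min(col) for col in columns]
          data_max = [max(col) for col in columns]
          data_range = [b - a for a, b in zip(data_min, data_max)]
          # Assumes no constant feature, i.e. every entry of data_range is non-zero.
          scale = [(high - low) / r for r in data_range]
          shift = [low - a * s for a, s in zip(data_min, scale)]  # plays the role of min_
          return scale, shift


      def transform_minmax(rows, scale, shift):
          """Apply the affine map x -> x * scale + shift to every feature."""
          return [[x * s + t for x, s, t in zip(row, scale, shift)] for row in rows]


      def inverse_minmax(rows, scale, shift):
          """Invert transform_minmax."""
          return [[(y - t) / s for y, s, t in zip(row, scale, shift)] for row in rows]


      X = [[1.0, -2.0], [3.0, 0.0], [5.0, 2.0]]
      scale, shift = fit_minmax(X)
      Y = transform_minmax(X, scale, shift)
      X_back = inverse_minmax(Y, scale, shift)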


   .. attribute:: copy
      :annotation: = True

      
   .. attribute:: feature_range
      :annotation: = (0.0, 1.0)

      
   .. attribute:: clip
      :annotation: = False

      
   .. method:: fit(X: heat.DNDarray) -> Self

      Fit the MinMaxScaler: i.e. compute the parameters required for later scaling.

      :param X: data set to which scaler shall be fitted.
      :type X: DNDarray of shape (n_datapoints, n_features)


   .. method:: transform(X: heat.DNDarray) -> Union[Self, heat.DNDarray]

      Transform input data with MinMaxScaler: i.e. scale features of ``X`` according to feature_range.

      :param X: Data set to be transformed.
      :type X: DNDarray of shape (n_datapoints, n_features)


   .. method:: inverse_transform(Y: heat.DNDarray) -> Union[Self, heat.DNDarray]

      Apply the inverse of :meth:`transform`.

      :param Y: Data set to be transformed back.
      :type Y: DNDarray of shape (n_datapoints, n_features)


.. py:class:: Normalizer(norm: str = 'l2', *, copy: bool = True)

   Bases: :class:`heat.TransformMixin`, :class:`heat.BaseEstimator`

   Normalizer: each data point of a data set is scaled to unit norm independently.
   The data set to be scaled must be stored as 2D-`DNDarray` of shape (n_datapoints, n_features); therefore
   the Normalizer scales each row to unit norm. This object is similar to `sklearn.preprocessing.Normalizer`.

   :param norm: The norm to use to normalize the data points. ``norm='max'`` refers to the :math:`\ell^\infty`-norm.
   :type norm: {'l1', 'l2', 'max'}, default='l2'
   :param copy: ``copy=False`` enables in-place normalization.
   :type copy: bool, default=True

   :ivar None:

   .. rubric:: Notes

   Normalizer is :term:`stateless` and, consequently, :meth:`fit` is only a dummy that does not need to be called before :meth:`transform`.
   Since :meth:`transform` is not bijective, there is no back-transformation :meth:`inverse_transform`.
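
   Both points in these notes can be seen in a minimal pure-Python sketch of row-wise normalization. The helper ``normalize_rows`` is hypothetical and operates on nested lists rather than a ``DNDarray``; it only illustrates the arithmetic.

   .. code-block:: python

      def normalize_rows(rows, norm="l2"):
          """Scale every row (data point) to unit length in the chosen norm."""
          lengths = {
              "l1": lambda row: sum(abs(x) for x in row),
              "l2": lambda row: sum(x * x for x in row) ** 0.5,
              "max": lambda row: max(abs(x) for x in row),
          }
          length = lengths[norm]
          result = []
          for row in rows:
              n = length(row)  # assumes the row is not all zeros
              result.append([x / n for x in row])
          return result


      # No fitting step is needed: the result depends only on each row itself.
      X = [[3.0, 4.0], [6.0, 8.0]]
      Y_l2 = normalize_rows(X, "l2")
      Y_max = normalize_rows(X, "max")

   The two input rows differ by a factor of two but map to the same output row, so the original scale cannot be recovered from the result.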


   .. attribute:: norm_
      :annotation: = 'l2'

      
   .. attribute:: copy
      :annotation: = True

      
   .. method:: fit(X: heat.DNDarray) -> Self

      Since ``Normalizer`` is stateless, this method is only a dummy.


   .. method:: transform(X: heat.DNDarray) -> Union[Self, heat.DNDarray]

      Apply the Normalizer transformation: scale each data point of the input data set ``X`` to unit norm (w.r.t. ``norm``).

      :param X: The data set to be normalized.
      :type X: DNDarray of shape (n_datapoints, n_features)
      :param copy: ``copy=False`` enables in-place transformation.
      :type copy: bool, default=None


.. py:class:: MaxAbsScaler(*, copy: bool = True)

   Bases: :class:`heat.TransformMixin`, :class:`heat.BaseEstimator`

   MaxAbsScaler: scale each feature of a given data set linearly by its maximum absolute value. The underlying data set to be scaled is
   assumed to be stored as a 2D-`DNDarray` of shape (n_datapoints, n_features); this routine is similar to
   `sklearn.preprocessing.MaxAbsScaler`.

   Each feature is scaled individually such that the maximal absolute value of each feature after transformation will be 1.0.
   No shifting/centering is applied.

   :param copy: ``copy=False`` enables in-place transformation.
   :type copy: bool, default=True

   :ivar scale_: Per feature relative scaling of the data.

   :vartype scale_: DNDarray of shape (n_features,)
   :ivar max_abs_: Per feature maximum absolute value of the input data.

   :vartype max_abs_: DNDarray of shape (n_features,)
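
   The scaling can be sketched in a few lines of pure Python. The helper names ``fit_maxabs``, ``transform_maxabs`` and ``inverse_maxabs`` are hypothetical, nested lists stand in for a ``DNDarray``, and the sketch only illustrates the arithmetic rather than Heat's implementation.

   .. code-block:: python

      def fit_maxabs(rows):
          """Return the maximum absolute value of every feature (column)."""
          return [max(abs(x) for x in col) for col in zip(*rows)]


      def transform_maxabs(rows, max_abs):
          """Divide every feature by its maximum absolute value; no centering."""
          # Assumes no feature is identically zero.
          return [[x / m for x, m in zip(row, max_abs)] for row in rows]


      def inverse_maxabs(rows, max_abs):
          """Multiply every feature by its maximum absolute value again."""
          return [[y * m for y, m in zip(row, max_abs)] for row in rows]


      X = [[1.0, -4.0], [-2.0, 2.0], [0.5, 1.0]]
      max_abs = fit_maxabs(X)
      Y = transform_maxabs(X, max_abs)
      X_back = inverse_maxabs(Y, max_abs)

   Signs are preserved and zero stays zero, because the data are only rescaled and never shifted.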


   .. attribute:: copy
      :annotation: = True

      
   .. method:: fit(X: heat.DNDarray) -> Self

      Fit MaxAbsScaler to input data ``X``: compute the parameters to be used for later scaling.

      :param X: The data set to which the scaler shall be fitted.
      :type X: DNDarray of shape (n_datapoints, n_features)


   .. method:: transform(X: heat.DNDarray) -> Union[Self, heat.DNDarray]

      Scale the data with the MaxAbsScaler.

      :param X: The data set to be scaled.
      :type X: DNDarray of shape (n_datapoints, n_features)


   .. method:: inverse_transform(Y: heat.DNDarray) -> Union[Self, heat.DNDarray]

      Apply the inverse of :meth:`transform`, i.e. scale the input data ``Y`` back to the original representation.

      :param Y: The data set to be transformed back.
      :type Y: DNDarray of shape (n_datapoints, n_features)


.. py:class:: RobustScaler(*, with_centering: bool = True, with_scaling: bool = True, quantile_range: Tuple[float, float] = (25.0, 75.0), copy: bool = True, unit_variance: bool = False, sketched: bool = False, sketch_size: Optional[float] = 1.0 / ht.MPI_WORLD.size)

   Bases: :class:`heat.TransformMixin`, :class:`heat.BaseEstimator`

   Scales the features of a given data set making use of statistics
   that are robust to outliers: it removes the median and scales the data according to
   the quantile range (defaults to IQR: Interquartile Range); this routine is similar
   to ``sklearn.preprocessing.RobustScaler``.

   By default, the "true" median and IQR of the entire data set are computed; however, the argument
   `sketched` switches to a faster but less accurate version that computes
   median and IQR only from a random subset of the data set ("sketch") of size `sketch_size`.

   The underlying data set to be scaled must be stored as a 2D-`DNDarray` of shape (n_datapoints, n_features).
   Each feature is centered and scaled independently.
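
   The idea behind the sketched variant can be illustrated in pure Python for a single feature. How Heat actually draws its sketch is not stated here; uniform sampling without replacement, the helper name ``sketched_median`` and the ``seed`` argument are assumptions made for this illustration.

   .. code-block:: python

      import random
      from statistics import median


      def sketched_median(values, sketch_size, seed=0):
          """Estimate the median from a random fraction ``sketch_size`` of the data."""
          rng = random.Random(seed)
          k = max(1, round(sketch_size * len(values)))  # number of sampled points
          return median(rng.sample(values, k))


      values = list(range(1001))
      exact = median(values)
      approx = sketched_median(values, sketch_size=0.25)
      full = sketched_median(values, sketch_size=1.0)

   With ``sketch_size=1.0`` the sketch contains every data point, so the estimate coincides with the exact median; smaller fractions trade accuracy for less work.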

   :param with_centering: If `True`, data are centered before scaling.
   :type with_centering: bool, default=True
   :param with_scaling: If `True`, scale the data to prescribed interquantile range.
   :type with_scaling: bool, default=True
   :param quantile_range: Quantile range used to calculate `scale_`; the default is the
                          IQR given by ``q_min=25`` and ``q_max=75``.
   :type quantile_range: tuple (q_min, q_max), 0.0 <= q_min < q_max <= 100.0, default=(25.0, 75.0)
   :param copy: ``copy=False`` enables in-place transformations.
   :type copy: bool, default=True
   :param unit_variance: raises ``NotImplementedError``
   :type unit_variance: not yet supported.
   :param sketched: If `True`, use a sketch of the data set to compute the median and IQR.
                    This is faster but less accurate. The size of the sketch is determined by the argument `sketch_size`.
   :type sketched: bool, default=False
   :param sketch_size: Fraction of the data set to be used for the sketch if `sketched=True`. The default value is 1/N, where N is the number of MPI processes.
                       Ignored if `sketched=False`.
   :type sketch_size: float, default=1./ht.MPI_WORLD.size

   :ivar center_: Feature-wise median value of the given data set.

   :vartype center_: DNDarray of shape (n_features,)
   :ivar iqr_: length of the interquantile range for each feature.

   :vartype iqr_: DNDarray of shape (n_features,)
   :ivar scale_: feature-wise inverse of ``iqr_``.

   :vartype scale_: array of floats
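
   How ``center_``, ``iqr_`` and ``scale_`` fit together can be sketched in pure Python. This is not Heat's implementation: the helpers ``percentile``, ``fit_robust`` and ``transform_robust`` are hypothetical, and linear interpolation between order statistics is an assumed percentile rule.

   .. code-block:: python

      def percentile(values, q):
          """Percentile with linear interpolation between neighbouring order statistics."""
          data = sorted(values)
          pos = (len(data) - 1) * q / 100.0
          lower = int(pos)
          upper = min(lower + 1, len(data) - 1)
          return data[lower] + (data[upper] - data[lower]) * (pos - lower)


      def fit_robust(rows, quantile_range=(25.0, 75.0)):
          """Return per-feature median, quantile range and its inverse."""
          q_min, q_max = quantile_range
          columns = list(zip(*rows))
          center = [percentile(col, 50.0) for col in columns]
          iqr = [percentile(col, q_max) - percentile(col, q_min) for col in columns]
          scale = [1.0 / r for r in iqr]  # assumes no feature with zero quantile range
          return center, iqr, scale


      def transform_robust(rows, center, scale):
          """Remove the median, then scale by the inverse quantile range."""
          return [[(x - c) * s for x, c, s in zip(row, center, scale)] for row in rows]


      # The outlier 100.0 in the first feature barely influences median and IQR.
      X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [100.0, 50.0]]
      center, iqr, scale = fit_robust(X)
      Y = transform_robust(X, center, scale)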


   .. attribute:: with_centering
      :annotation: = True

      
   .. attribute:: with_scaling
      :annotation: = True

      
   .. attribute:: quantile_range
      :annotation: = (25.0, 75.0)

      
   .. attribute:: copy
      :annotation: = True

      
   .. attribute:: sketched
      :annotation: = False

      
   .. attribute:: sketch_size
      

   .. method:: fit(X: heat.DNDarray) -> Self

      Fit RobustScaler to given data set, i.e. compute the parameters required for transformation.

      :param X: Data to which the Scaler should be fitted.
      :type X: DNDarray of shape (n_datapoints, n_features)


   .. method:: transform(X: heat.DNDarray) -> Union[Self, heat.DNDarray]

      Transform the given data with RobustScaler.

      :param X: Data set to be transformed.
      :type X: DNDarray of shape (n_datapoints, n_features)


   .. method:: inverse_transform(Y: heat.DNDarray) -> Union[Self, heat.DNDarray]

      Apply the inverse of :meth:`transform`.

      :param Y: Data to be back-transformed
      :type Y: DNDarray of shape (n_datapoints, n_features)