:mod:`heat.naive_bayes.gaussianNB`
==================================
.. py:module:: heat.naive_bayes.gaussianNB

.. autoapi-nested-parse::

   Distributed Gaussian Naive-Bayes classifier.


Module Contents
---------------


.. py:class:: GaussianNB(priors=None, var_smoothing=1e-09)

   Bases: :class:`heat.ClassificationMixin`, :class:`heat.BaseEstimator`

   Gaussian Naive Bayes (GaussianNB), based on `scikit-learn.naive_bayes.GaussianNB <https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html>`_.
   Model parameters can be updated online via :meth:`partial_fit`.
   For details on the algorithm used to update the feature means and variances online,
   see Chan, Golub, and LeVeque 1983 [1].

   :param priors: Prior probabilities of the classes, with shape ``(n_classes,)``. If specified, the priors are not
                  adjusted according to the data.
   :type priors: DNDarray, optional
   :param var_smoothing: Portion of the largest variance of all features that is added to
                         the variances for numerical stability.
   :type var_smoothing: float, optional

   :ivar class_count_: Number of training samples observed in each class. Shape = ``(n_classes,)``
   :vartype class_count_: DNDarray
   :ivar class_prior_: Probability of each class. Shape = ``(n_classes,)``
   :vartype class_prior_: DNDarray
   :ivar classes_: Class labels known to the classifier. Shape = ``(n_classes,)``
   :vartype classes_: DNDarray
   :ivar epsilon_: Absolute value added to the variances (``var_smoothing`` times the largest feature variance).
   :vartype epsilon_: float
   :ivar sigma_: Variance of each feature per class. Shape = ``(n_classes, n_features)``
   :vartype sigma_: DNDarray
   :ivar theta_: Mean of each feature per class. Shape = ``(n_classes, n_features)``
   :vartype theta_: DNDarray

   .. rubric:: References

   [1] Chan, Tony F., Golub, Gene H., and LeVeque, Randall J., "Algorithms for Computing the Sample Variance: Analysis
   and Recommendations", The American Statistician, 37:3, pp. 242-247, 1983.

   .. rubric:: Examples

   >>> import heat as ht
   >>> X = ht.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=ht.float32)
   >>> Y = ht.array([1, 1, 1, 2, 2, 2])
   >>> from heat.naive_bayes import GaussianNB
   >>> clf = GaussianNB()
   >>> clf.fit(X, Y)
   <heat.naive_bayes.gaussianNB.GaussianNB object at 0x1a249f6dd8>
   >>> print(clf.predict(ht.array([[-0.8, -1]])))
   tensor([1])
   >>> clf_pf = GaussianNB()
   >>> clf_pf.partial_fit(X, Y, ht.unique(Y, sorted=True))
   <heat.naive_bayes.gaussianNB.GaussianNB object at 0x1a249fbe10>
   >>> print(clf_pf.predict(ht.array([[-0.8, -1]])))
   tensor([1])


   .. attribute:: priors
      :annotation: = None

      Prior probabilities of the classes, as passed to the constructor.

   .. attribute:: var_smoothing
      :annotation: = 1e-09

      Portion of the largest feature variance added to the variances, as passed to the constructor.

   .. method:: fit(x: heat.core.dndarray.DNDarray, y: heat.core.dndarray.DNDarray, sample_weight: Optional[heat.core.dndarray.DNDarray] = None)

      Fit Gaussian Naive Bayes according to ``x`` and ``y``.

      :param x: Training set, where ``n_samples`` is the number of samples
                and ``n_features`` is the number of features. Shape = (n_samples, n_features)
      :type x: DNDarray
      :param y: Labels for training set. Shape = (n_samples, )
      :type y: DNDarray
      :param sample_weight: Weights applied to individual samples (1. for unweighted). Shape = (n_samples, )
      :type sample_weight: DNDarray, optional
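
      The statistics a full-batch fit produces can be sketched with plain NumPy. This is an
      illustration of the technique rather than HeAT's distributed code: it assumes
      ``priors=None`` (so the class priors are the relative class frequencies), uses NumPy
      arrays in place of ``DNDarray``, and the helper ``gaussian_nb_fit_sketch`` is hypothetical.

      .. code-block:: python

         import numpy as np


         def gaussian_nb_fit_sketch(x, y, var_smoothing=1e-9):
             """Hypothetical helper: per-class statistics of a Gaussian NB fit (priors=None)."""
             classes = np.unique(y)  # sorted labels, analogous to ``classes_``
             # Analogous to ``epsilon_``: a fraction of the largest variance of all features.
             epsilon = var_smoothing * np.var(x, axis=0).max()
             theta = np.stack([x[y == c].mean(axis=0) for c in classes])  # like ``theta_``
             sigma = np.stack([x[y == c].var(axis=0) for c in classes]) + epsilon  # like ``sigma_``
             class_count = np.array([np.sum(y == c) for c in classes], dtype=np.float64)
             class_prior = class_count / class_count.sum()  # relative class frequencies
             return classes, theta, sigma, class_count, class_prior, epsilon


         # Same toy data as in the class-level example.
         x = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=np.float64)
         y = np.array([1, 1, 1, 2, 2, 2])
         classes, theta, sigma, class_count, class_prior, epsilon = gaussian_nb_fit_sketch(x, y)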


   .. method:: __check_partial_fit_first_call(classes: Optional[heat.core.dndarray.DNDarray] = None) -> bool

      Adapted to HeAT from scikit-learn.

      This function returns ``True`` if it detects that this was the first call to
      :meth:`partial_fit` on :class:`GaussianNB`. In that case the :attr:`classes_` attribute is also
      set on :class:`GaussianNB`.
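
      The first-call bookkeeping can be sketched as follows, modeled on scikit-learn's equivalent
      helper. The stand-in class ``_Estimator``, the function name ``check_partial_fit_first_call``,
      and the error messages are hypothetical, and NumPy arrays take the place of ``DNDarray``.

      .. code-block:: python

         import numpy as np


         class _Estimator:
             """Hypothetical stand-in estimator that has not seen any classes yet."""

             classes_ = None


         def check_partial_fit_first_call(clf, classes=None):
             """Return True on the first partial_fit call and record ``clf.classes_``."""
             if clf.classes_ is None and classes is None:
                 # Nothing known yet and nothing supplied: the labels are undefined.
                 raise ValueError("classes must be passed on the first call to partial_fit")
             if classes is not None:
                 if clf.classes_ is not None:
                     # Later calls may repeat ``classes`` but must not change them.
                     if not np.array_equal(clf.classes_, np.unique(classes)):
                         raise ValueError("classes differ from the previous call to partial_fit")
                 else:
                     # First call: store the sorted unique labels.
                     clf.classes_ = np.unique(classes)
                     return True
             return False


         clf = _Estimator()
         first = check_partial_fit_first_call(clf, np.array([2, 1]))  # True; classes_ becomes [1, 2]
         later = check_partial_fit_first_call(clf)  # False; classes_ is unchanged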


   .. method:: __update_mean_variance(n_past: int, mu: heat.core.dndarray.DNDarray, var: heat.core.dndarray.DNDarray, x: heat.core.dndarray.DNDarray, sample_weight: Optional[heat.core.dndarray.DNDarray] = None) -> Tuple[heat.core.dndarray.DNDarray, heat.core.dndarray.DNDarray]

      Adapted to HeAT from scikit-learn.
      Compute the online update of a Gaussian mean and variance.
      Given the starting sample count, mean, and variance, a new set of
      points ``x``, and optional sample weights, return the updated mean and
      variance. Each dimension (column) of ``x`` is treated as independent, so
      the result is a per-feature variance, not a covariance.
      Accepts a scalar mean and variance, or vectors of means and variances to
      update several independent Gaussians simultaneously.
      See Chan, Golub, and LeVeque 1983 [1].

      :param n_past: Number of samples represented in old mean and variance. If sample
                     weights were given, this should contain the sum of sample
                     weights represented in old mean and variance.
      :type n_past: int
      :param mu: Means for Gaussians in original set. Shape = (number of Gaussians,)
      :type mu: DNDarray
      :param var: Variances for Gaussians in original set. Shape = (number of Gaussians,)
      :type var: DNDarray
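      :param x: Input data. Shape = (n_samples, number of Gaussians)
      :type x: DNDarray
      :param sample_weight: Weights applied to individual samples (1. for unweighted). Shape = (n_samples,)
      :type sample_weight: DNDarray, optional
      :returns: Updated mean and variance of each Gaussian over the combined set, each with
                shape (number of Gaussians,).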
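
      The update can be sketched in NumPy with the pairwise combination of Chan, Golub, and
      LeVeque: the combined sum of squared deviations is the sum of both parts plus a correction
      for the distance between the two means. This is an illustration, not HeAT's distributed
      implementation; ``update_mean_variance`` is a hypothetical name and NumPy arrays stand in
      for ``DNDarray``.

      .. code-block:: python

         import numpy as np


         def update_mean_variance(n_past, mu, var, x, sample_weight=None):
             """Hypothetical helper: merge (n_past, mu, var) with the column statistics of ``x``."""
             if x.shape[0] == 0:
                 return mu, var
             if sample_weight is not None:
                 n_new = float(sample_weight.sum())
                 new_mu = np.average(x, axis=0, weights=sample_weight)
                 new_var = np.average((x - new_mu) ** 2, axis=0, weights=sample_weight)
             else:
                 n_new = x.shape[0]
                 new_mu = x.mean(axis=0)
                 new_var = x.var(axis=0)
             if n_past == 0:
                 return new_mu, new_var
             n_total = float(n_past + n_new)
             total_mu = (n_new * new_mu + n_past * mu) / n_total
             # Squared deviations of both parts plus the between-means correction term.
             total_ssd = (n_past * var + n_new * new_var
                          + (n_new * n_past / n_total) * (mu - new_mu) ** 2)
             return total_mu, total_ssd / n_total


         # Feeding the data in two chunks reproduces the full-batch statistics.
         rng = np.random.default_rng(0)
         data = rng.normal(size=(10, 3))
         mu, var = update_mean_variance(0, np.zeros(3), np.zeros(3), data[:4])
         mu, var = update_mean_variance(4, mu, var, data[4:])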

      .. rubric:: References

      [1] Chan, Tony F., Golub, Gene H., and LeVeque, Randall J., "Algorithms for Computing the Sample Variance: Analysis
      and Recommendations", The American Statistician, 37:3, pp. 242-247, 1983.


   .. method:: partial_fit(x: heat.core.dndarray.DNDarray, y: heat.core.dndarray.DNDarray, classes: Optional[heat.core.dndarray.DNDarray] = None, sample_weight: Optional[heat.core.dndarray.DNDarray] = None)

      Adapted to HeAT from scikit-learn.
      Incremental fit on a batch of samples.
      This method is expected to be called several times consecutively
      on different chunks of a dataset so as to implement out-of-core
      or online learning.
      This is especially useful when the whole dataset is too big to fit in
      memory at once.
      This method has some performance and numerical-stability overhead,
      so it is better to call :meth:`partial_fit` on chunks of data that are
      as large as possible (as long as they fit in the memory budget) to
      hide the overhead.

      :param x: Training set, where ``n_samples`` is the number of samples and
                ``n_features`` is the number of features. Shape = (n_samples, n_features)
      :type x: DNDarray
      :param y: Labels for training set. Shape = (n_samples,)
      :type y: DNDarray
      :param classes: List of all the classes that can possibly appear in the ``y`` vector.
                      Must be provided at the first call to :meth:`partial_fit` and can be omitted
                      in subsequent calls. Shape = ``(n_classes,)``
      :type classes: DNDarray, optional
      :param sample_weight: Weights applied to individual samples (1. for unweighted). Shape = (n_samples,)
      :type sample_weight: DNDarray, optional
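
      The out-of-core pattern can be sketched in NumPy: the class labels are fixed before the
      first chunk (as ``classes`` must be on the first call), and every chunk updates per-class
      counts, means, and variances with the Chan, Golub, and LeVeque merge. The helper ``merge``
      and all variable names are hypothetical, and the ``var_smoothing`` term is left out for brevity.

      .. code-block:: python

         import numpy as np


         def merge(n_a, mu_a, var_a, x):
             """Hypothetical helper: combine running (count, mean, var) with the rows of ``x``."""
             n_b = x.shape[0]
             if n_b == 0:
                 return n_a, mu_a, var_a  # this chunk holds no samples of the class
             mu_b, var_b = x.mean(axis=0), x.var(axis=0)
             n = n_a + n_b
             mu = (n_a * mu_a + n_b * mu_b) / n
             var = (n_a * var_a + n_b * var_b + n_a * n_b / n * (mu_a - mu_b) ** 2) / n
             return n, mu, var


         rng = np.random.default_rng(42)
         x = rng.normal(size=(100, 2))
         y = rng.integers(0, 2, size=100)
         classes = np.array([0, 1])  # all labels, known before the first chunk arrives

         counts = np.zeros(len(classes))
         theta = np.zeros((len(classes), x.shape[1]))
         sigma = np.zeros((len(classes), x.shape[1]))

         for start in range(0, len(x), 25):  # four chunks of 25 samples each
             xc, yc = x[start:start + 25], y[start:start + 25]
             for i, c in enumerate(classes):
                 counts[i], theta[i], sigma[i] = merge(counts[i], theta[i], sigma[i], xc[yc == c])

         class_prior = counts / counts.sum()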


   .. method:: __partial_fit(x: heat.core.dndarray.DNDarray, y: heat.core.dndarray.DNDarray, classes: Optional[heat.core.dndarray.DNDarray] = None, _refit: bool = False, sample_weight: Optional[heat.core.dndarray.DNDarray] = None)

      Actual implementation of Gaussian NB fitting. Adapted to HeAT from scikit-learn.

      :param x: Training set, where ``n_samples`` is the number of samples and
                ``n_features`` is the number of features. Shape = (n_samples, n_features)
      :type x: DNDarray
      :param y: Labels for training set. Shape = (n_samples,)
      :type y: DNDarray
      :param classes: List of all the classes that can possibly appear in the ``y`` vector.
                      Must be provided at the first call to :meth:`partial_fit` and can be omitted
                      in subsequent calls. Shape = ``(n_classes,)``
      :type classes: DNDarray, optional
      :param _refit: If ``True``, act as though this were the first call to :meth:`__partial_fit`
                     (i.e., discard any previous fit and start over).
      :type _refit: bool, optional
      :param sample_weight: Weights applied to individual samples (1. for unweighted). Shape = (n_samples,)
      :type sample_weight: DNDarray, optional


   .. method:: __joint_log_likelihood(x: heat.core.dndarray.DNDarray) -> heat.core.dndarray.DNDarray

      Adapted to HeAT from scikit-learn.
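      Calculate the joint log-likelihood of each of the ``n_samples`` samples in ``x`` for each class.

      :param x: Input data. Shape = (n_samples, n_features)
      :type x: DNDarray
      :returns: Joint log-likelihood of every sample for every class. Shape = (n_samples, n_classes)
      :rtype: DNDarray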
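
      With Gaussian class-conditional densities, the joint log-likelihood of a sample for class
      ``c`` is ``log P(c)`` plus the sum over features of the Gaussian log-densities. The NumPy
      sketch below writes out that standard formula; ``theta``, ``sigma``, and ``class_prior``
      stand in for ``theta_``, ``sigma_``, and ``class_prior_``, and the function name is hypothetical.

      .. code-block:: python

         import numpy as np


         def joint_log_likelihood(x, theta, sigma, class_prior):
             """Hypothetical helper: log P(c) + sum_j log N(x_j; theta[c, j], sigma[c, j])."""
             columns = []
             for i in range(theta.shape[0]):
                 log_prior = np.log(class_prior[i])
                 log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * sigma[i]))
                 log_kernel = -0.5 * np.sum((x - theta[i]) ** 2 / sigma[i], axis=1)
                 columns.append(log_prior + log_norm + log_kernel)
             return np.stack(columns, axis=1)  # shape (n_samples, n_classes)


         # Per-class statistics of the toy data from the class-level example.
         theta = np.array([[-2.0, -4.0 / 3.0], [2.0, 4.0 / 3.0]])
         sigma = np.array([[2.0 / 3.0, 2.0 / 9.0], [2.0 / 3.0, 2.0 / 9.0]])
         class_prior = np.array([0.5, 0.5])
         jll = joint_log_likelihood(np.array([[-0.8, -1.0]]), theta, sigma, class_prior)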


   .. method:: logsumexp(a: heat.core.dndarray.DNDarray, axis: Optional[Union[int, Tuple[int, Ellipsis]]] = None, b: Optional[heat.core.dndarray.DNDarray] = None, keepdims: bool = False, return_sign: bool = False) -> heat.core.dndarray.DNDarray

      Adapted to HeAT from scikit-learn.
      Compute the log of the sum of exponentials of input elements, i.e. ``np.log(np.sum(np.exp(a)))``,
      in a numerically more stable way. If ``b`` is given, ``np.log(np.sum(b*np.exp(a)))``
      is returned instead.

      :param a: Input array.
      :type a: DNDarray
      :param axis: Axis or axes over which the sum is taken. By default ``axis`` is ``None``,
                   and all elements are summed.
      :type axis: None or int or Tuple[int, ...], optional
      :param keepdims: If this is set to ``True``, the axes which are reduced are left in the
                       result as dimensions with size one. With this option, the result
                       will broadcast correctly against the original array.
      :type keepdims: bool, optional
      :param b: Scaling factor for ``exp(a)``; must be of the same shape as ``a`` or
                broadcastable to ``a``. These values may be negative in order to
                implement subtraction.
      :type b: DNDarray, optional
      :param return_sign: If this is set to ``True``, the result will be a pair containing sign
                          information; if ``False``, results that are negative will be returned
                          as ``NaN``. Not implemented yet: passing ``True`` raises an error.
      :type return_sign: bool, optional
      :returns: The log of the sum of exponentials. The sign array ``sgn`` (+1, 0, or -1 for each
                element of the result) that ``return_sign=True`` is meant to add is not implemented
                yet, so only one result is returned.
      :rtype: DNDarray
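
      The max-shift trick behind a stable ``logsumexp`` is sketched below in NumPy, in the spirit
      of SciPy's ``scipy.special.logsumexp``: subtracting the maximum before exponentiating keeps
      ``exp`` from overflowing. It is an illustration rather than HeAT's code, and ``return_sign``
      is left out, matching its unimplemented status here.

      .. code-block:: python

         import numpy as np


         def logsumexp(a, axis=None, b=None, keepdims=False):
             """Sketch: log(sum(b * exp(a))) along ``axis``, computed with a max shift."""
             a_max = np.max(a, axis=axis, keepdims=True)
             # An all -inf (or inf) slice would make ``a - a_max`` NaN; shift by 0 instead.
             a_max = np.where(np.isfinite(a_max), a_max, 0.0)
             tmp = np.exp(a - a_max)  # exponents are <= 0 for finite input, so no overflow
             if b is not None:
                 tmp = b * tmp
             with np.errstate(divide="ignore", invalid="ignore"):
                 # log(0) gives -inf; a negative sum (possible with negative b) gives NaN.
                 out = np.log(np.sum(tmp, axis=axis, keepdims=True)) + a_max
             if not keepdims:
                 out = np.squeeze(out, axis=axis)
             return out


         # The naive np.log(np.sum(np.exp(a))) overflows here; the shifted form does not.
         big = logsumexp(np.array([1000.0, 1000.0]))  # 1000 + log(2)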


   .. method:: predict(x: heat.core.dndarray.DNDarray) -> heat.core.dndarray.DNDarray

      Adapted to HeAT from scikit-learn.
      Perform classification on an array of test data ``x``.

      :param x: Input data. Shape = (n_samples, n_features)
      :type x: DNDarray
      :returns: Predicted class label of each sample. Shape = (n_samples,)
      :rtype: DNDarray
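
      As in scikit-learn's ``GaussianNB``, prediction picks, for each sample, the class with the
      largest joint log-likelihood. A NumPy sketch with hypothetical log-likelihood values:

      .. code-block:: python

         import numpy as np

         classes = np.array([1, 2])  # sorted labels, as stored in ``classes_``
         # Hypothetical joint log-likelihoods, shape (n_samples, n_classes).
         jll = np.array([[-1.2, -9.7],
                         [-8.1, -0.4]])
         predicted = classes[np.argmax(jll, axis=1)]  # one label per sample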


   .. method:: predict_log_proba(x: heat.core.dndarray.DNDarray) -> heat.core.dndarray.DNDarray

      Adapted to HeAT from scikit-learn.
      Return log-probability estimates of the samples for each class in
      the model. The columns correspond to the classes in sorted
      order, as they appear in the attribute ``classes_``.

      :param x: Input data. Shape = (n_samples, n_features).
      :type x: DNDarray
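
      The log-probabilities follow from the joint log-likelihood by subtracting, row by row, the
      log of the summed exponentials over the classes, as scikit-learn's ``GaussianNB`` does. The
      NumPy sketch below uses ``np.logaddexp.reduce`` for that reduction on hypothetical values.

      .. code-block:: python

         import numpy as np

         # Hypothetical joint log-likelihoods, shape (n_samples, n_classes).
         jll = np.array([[-1.2, -9.7],
                         [-8.1, -0.4]])
         # Subtract each row's logsumexp over the classes so that exp(row) sums to one.
         log_proba = jll - np.logaddexp.reduce(jll, axis=1, keepdims=True)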


   .. method:: predict_proba(x: heat.core.dndarray.DNDarray) -> heat.core.dndarray.DNDarray

      Adapted to HeAT from scikit-learn.
      Return probability estimates of the test samples ``x`` for each class in
      the model. The columns correspond to the classes in sorted
      order, as they appear in the attribute ``classes_``.

      :param x: Input data. Shape = (n_samples, n_features).
      :type x: DNDarray
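
      Exponentiating the log-probabilities gives the probabilities, as in scikit-learn's
      ``GaussianNB``; normalizing in log space first avoids the underflow that exponentiating very
      negative joint log-likelihoods directly would cause. A NumPy sketch with hypothetical values:

      .. code-block:: python

         import numpy as np

         # Hypothetical joint log-likelihoods so negative that np.exp(jll) underflows to 0.
         jll = np.array([[-1200.0, -1210.0]])
         naive = np.exp(jll)  # [[0., 0.]]: normalizing this directly would divide 0 by 0
         # Normalizing in log space keeps the ratio between the classes intact.
         log_proba = jll - np.logaddexp.reduce(jll, axis=1, keepdims=True)
         proba = np.exp(log_proba)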