:mod:`heat.linalg.qr`
==========================
.. py:module:: heat.core.linalg.qr

.. autoapi-nested-parse::

   QR decomposition of ``DNDarray`` objects.


Module Contents
---------------


.. function:: qr(A: heat.core.dndarray.DNDarray, mode: str = 'reduced', procs_to_merge: int = 2) -> Tuple[heat.core.dndarray.DNDarray, heat.core.dndarray.DNDarray]

   Calculates the QR decomposition of a 2D ``DNDarray``.
   Factors the matrix ``A`` as *QR*, where ``Q`` has orthonormal columns and ``R`` is upper triangular.
   If ``mode = "reduced"``, the function returns ``QR(Q=Q, R=R)``; if ``mode = "r"``, it returns ``QR(Q=None, R=R)``.

   This function also works for batches of matrices; in this case, the last two dimensions of the input array are considered as the matrix dimensions.
   The output arrays have the same leading batch dimensions as the input array.
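
   The shape conventions for the two modes and for batches can be checked on a single process with ``numpy.linalg.qr()``, which uses the same mode names. The sketch below is a NumPy stand-in for illustration, not a call to Heat; the array sizes are arbitrary, and it assumes a NumPy version whose ``qr`` accepts stacked matrices.

   .. code-block:: python

      import numpy as np

      rng = np.random.default_rng(seed=0)
      # Batch of 4 matrices; the last two dimensions (M=6, N=3) are the matrix dimensions.
      A = rng.standard_normal((4, 6, 3))

      # "reduced": Q has shape (..., M, min(M, N)), R has shape (..., min(M, N), N).
      Q, R = np.linalg.qr(A, mode="reduced")

      # "r": only the triangular factor is computed.
      R_only = np.linalg.qr(A, mode="r")

      # The leading batch dimension (4) is carried through unchanged.
      print(Q.shape, R.shape, R_only.shape)

   In Heat, the corresponding call would be ``ht.linalg.qr(A, mode=...)`` on a ``DNDarray``, with ``Q=None`` in the returned tuple for ``mode="r"`` as described above.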

   :param A: Array which will be decomposed.
   :type A: DNDarray of shape (M, N), or of shape (..., M, N) in the batched case
   :param mode: ``"reduced"`` (default) returns ``Q`` and ``R`` with shapes (M, min(M,N)) and (min(M,N), N), respectively. Any leading batch dimensions are left unchanged.
                ``"r"`` returns only ``R``, with shape (min(M,N), N).
   :type mode: str, optional
   :param procs_to_merge: Only relevant for ``split=0`` (``split=-2`` in the batched case). Sets the number of processes that are merged in one step of the TS-QR algorithm.
                          The default is 2. Larger values might be faster but will probably increase memory consumption. 0 corresponds to merging all processes at once.
                          We recommend changing this parameter only if you are familiar with the TS-QR algorithm (see the references below).
   :type procs_to_merge: int, optional
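
   To get a feeling for the effect of ``procs_to_merge``, the following sketch counts the merge steps of an idealized reduction tree. The helper ``merge_rounds`` is hypothetical and not part of Heat; it assumes that each step merges groups of ``procs_to_merge`` processes until one remains, and it rejects values for which such a tree would make no progress.

   .. code-block:: python

      import math


      def merge_rounds(nprocs: int, procs_to_merge: int = 2) -> int:
          """Count merge steps of an idealized reduction tree (hypothetical helper)."""
          if nprocs < 1:
              raise ValueError("nprocs must be at least 1")
          if procs_to_merge == 0:
              # 0 means: merge all processes at once.
              group = nprocs
          elif procs_to_merge < 2:
              # Assumption of this sketch: merging fewer than 2 makes no progress.
              raise ValueError("procs_to_merge must be 0 or at least 2")
          else:
              group = procs_to_merge

          remaining, rounds = nprocs, 0
          while remaining > 1:
              # Each step replaces every group of `group` processes by one.
              remaining = math.ceil(remaining / group)
              rounds += 1
          return rounds


      for k in (2, 4, 0):
          print(k, merge_rounds(8, k))

   Fewer steps mean fewer communication rounds, but each step then combines more intermediate factors on one process, which is in line with the memory remark above.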

   .. rubric:: Notes

   The distribution schemes of ``Q`` and ``R`` depend on that of the input ``A``.

       - If ``A`` is distributed along the columns (``A.split = 1``), so are ``Q`` and ``R``.

       - If ``A`` is distributed along the rows (``A.split = 0``), ``Q`` also has ``split = 0``. If ``A`` is tall-skinny, i.e., if
         the largest local chunk of data of ``A`` has at least as many rows as columns, ``R`` is not distributed (``R.split = None``). Otherwise, ``R`` is distributed along the rows as well (``R.split = 0``).
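
   The two rules above can be written down as a small pure-Python helper. ``expected_qr_splits`` is a hypothetical name used only for illustration; the sketch assumes a non-batched matrix whose rows are distributed as evenly as possible, so that the largest local chunk has ``ceil(M / nprocs)`` rows.

   .. code-block:: python

      import math


      def expected_qr_splits(n_rows: int, n_cols: int, split: int, nprocs: int):
          """Return (Q.split, R.split) according to the notes above (hypothetical helper)."""
          if split == 1:
              # Column-distributed input: Q and R are column-distributed as well.
              return 1, 1
          if split == 0:
              # Assumption: even row distribution, so the largest chunk has ceil(M/p) rows.
              largest_chunk_rows = math.ceil(n_rows / nprocs)
              tall_skinny = largest_chunk_rows >= n_cols
              return 0, (None if tall_skinny else 0)
          raise ValueError("only split=0 and split=1 are covered by this sketch")


      print(expected_qr_splits(1000, 10, split=0, nprocs=4))
      print(expected_qr_splits(16, 10, split=0, nprocs=4))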

   Note that the argument ``calc_q`` allowed in earlier Heat versions is no longer supported; ``calc_q = False`` is equivalent to ``mode = "r"``.
   Unlike ``numpy.linalg.qr()``, ``ht.linalg.qr`` currently supports only ``mode="reduced"`` and ``mode="r"``, since ``"complete"`` may result in heavy memory usage.

   Heat's QR function is built on top of PyTorch's QR function, ``torch.linalg.qr()``, which uses LAPACK (CPU) and MAGMA (CUDA) as
   backends. Both cases, ``split=0`` and ``split=1``, build on a column-block-wise version of stabilized Gram-Schmidt orthogonalization.
   For ``split=1`` (``split=-1`` in the batched case), this is applied directly to the local arrays of the input array.
   For ``split=0``, a tall-skinny QR (TS-QR) is implemented for tall-skinny matrices (i.e., the largest local chunk of data has at least as many rows as columns)
   and extended to non-tall-skinny matrices by applying a block-wise version of stabilized Gram-Schmidt orthogonalization.
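
   The basic idea of TS-QR can be sketched on a single process with NumPy, where each row block stands in for the local chunk of one process. This is a minimal one-level illustration under the tall-skinny assumption, not Heat's implementation; the function name ``tsqr`` and the block count are chosen for this example only.

   .. code-block:: python

      import numpy as np


      def tsqr(A, n_blocks):
          """One-level TS-QR sketch; every row block needs at least as many rows as columns."""
          blocks = np.array_split(A, n_blocks, axis=0)

          # Step 1: independent reduced QR of every row block ("local" work).
          local = [np.linalg.qr(block, mode="reduced") for block in blocks]

          # Step 2: stack the small local R factors and factor the stack once more.
          R_stack = np.vstack([r_loc for _, r_loc in local])
          Q2, R = np.linalg.qr(R_stack, mode="reduced")

          # Step 3: multiply each local Q by its matching slice of Q2.
          Q_parts, offset = [], 0
          for q_loc, r_loc in local:
              k = r_loc.shape[0]
              Q_parts.append(q_loc @ Q2[offset:offset + k])
              offset += k
          return np.vstack(Q_parts), R


      rng = np.random.default_rng(seed=0)
      A = rng.standard_normal((40, 5))   # tall-skinny: 4 blocks of 10 rows, 5 columns
      Q, R = tsqr(A, n_blocks=4)

      print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(5)))

   In a distributed setting, only the small triangular factors from step 1 would have to be exchanged between processes, which is what makes the scheme attractive for row-distributed tall-skinny matrices.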

   .. rubric:: References

   Basic information about QR factorization/decomposition can be found in, e.g.:

       - https://en.wikipedia.org/wiki/QR_factorization


       - Gene H. Golub and Charles F. Van Loan. 1996. Matrix Computations (3rd Ed.).

   For an extensive overview of TS-QR and its variants, we refer to, e.g.:

       - Demmel, James, et al. “Communication-Optimal Parallel and Sequential QR and LU Factorizations.” SIAM Journal on Scientific Computing, vol. 34, no. 1, 2 Feb. 2012, pp. A206–A239, doi:10.1137/080731992.