heat.linalg.qr

QR decomposition of (distributed) 2-D ``DNDarray``s.

Module Contents

qr(A: heat.core.dndarray.DNDarray, mode: str = 'reduced', procs_to_merge: int = 2) → Tuple[heat.core.dndarray.DNDarray, heat.core.dndarray.DNDarray]

Calculates the QR decomposition of a 2D DNDarray. The matrix A is factored as A = QR, where Q has orthonormal columns and R is upper triangular. If mode = "reduced", the function returns QR(Q=Q, R=R); if mode = "r", it returns QR(Q=None, R=R).

Parameters:
  • A (DNDarray of shape (M, N)) – Array which will be decomposed. So far, only 2D arrays with datatype float32 or float64 are supported. For split=0, the matrix must be tall and skinny, i.e. the local chunks of data must have at least as many rows as columns.

  • mode (str, optional) – default “reduced” returns Q and R with dimensions (M, min(M,N)) and (min(M,N), N), respectively. “r” returns only R, with dimensions (min(M,N), N).

  • procs_to_merge (int, optional) – This parameter is only relevant for split=0 and determines the number of processes to be merged in a single step of the so-called TS-QR algorithm. The default is 2. Larger values may be faster but will likely result in higher memory consumption; 0 corresponds to merging all processes at once. Modifying this parameter is only recommended if you are familiar with the TS-QR algorithm (see the references below).
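The following is a minimal usage sketch; the random test matrix, its shape, and the tolerance are illustrative and not part of the API:

    import heat as ht

    # Tall-skinny test matrix, distributed along the rows (split=0)
    A = ht.random.randn(1000, 50, split=0, dtype=ht.float32)

    # Reduced QR: Q has shape (1000, 50), R has shape (50, 50)
    Q, R = ht.linalg.qr(A)

    # Only the triangular factor (analogous to the former calc_q=False)
    R_only = ht.linalg.qr(A, mode="r").R

    # A is reconstructed by Q @ R (up to floating-point error)
    print(ht.allclose(Q @ R, A, atol=1e-4))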

Notes

The distribution schemes of Q and R depend on that of the input A.

  • If A is distributed along the columns (A.split = 1), so will be Q and R.

  • If A is distributed along the rows (A.split = 0), Q will also have split=0, but R will not be distributed, i.e. R.split = None, and a full copy of R is stored on each process.
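For illustration (shapes and dtypes here are arbitrary), assuming the script is launched on several MPI processes:

    import heat as ht

    # split=1: A, Q and R are all distributed along the columns
    A1 = ht.random.randn(200, 200, split=1, dtype=ht.float64)
    Q1, R1 = ht.linalg.qr(A1)
    print(Q1.split, R1.split)  # 1 1

    # split=0 (tall-skinny): Q keeps split=0, R is replicated on every process
    A0 = ht.random.randn(10000, 64, split=0, dtype=ht.float64)
    Q0, R0 = ht.linalg.qr(A0)
    print(Q0.split, R0.split)  # 0 None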

Note that the argument calc_q allowed in earlier Heat versions is no longer supported; calc_q=False is equivalent to mode="r". Unlike numpy.linalg.qr(), ht.linalg.qr currently only supports mode="reduced" and mode="r", since "complete" may result in heavy memory usage.

Heat's QR function is built on top of PyTorch's QR function, torch.linalg.qr(), which uses LAPACK (CPU) and MAGMA (CUDA) in the backend. For split=0, tall-skinny QR (TS-QR) is implemented, while for split=1 a block-wise version of stabilized Gram-Schmidt orthogonalization is used.
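To illustrate the idea behind TS-QR, the following single-node sketch in plain PyTorch factors row blocks independently and then merges the small R factors with one more QR. This is a didactic simplification, not Heat's actual MPI implementation, where each block lives on a different process and the merge is performed in groups of procs_to_merge processes:

    import torch

    def tsqr_sketch(A: torch.Tensor, n_blocks: int = 4):
        """Single-node illustration of the TS-QR idea: factor row blocks
        independently, then combine the stacked R factors with one more QR."""
        blocks = torch.chunk(A, n_blocks, dim=0)           # row blocks ~ process-local chunks
        local = [torch.linalg.qr(b, mode="reduced") for b in blocks]
        Q_loc = [q for q, _ in local]
        R_stack = torch.cat([r for _, r in local], dim=0)  # stack the small R factors
        Q_merge, R = torch.linalg.qr(R_stack, mode="reduced")
        # Propagate the merge step back into the local Q factors
        n = A.shape[1]
        Q = torch.cat(
            [Q_loc[i] @ Q_merge[i * n:(i + 1) * n, :] for i in range(len(Q_loc))],
            dim=0,
        )
        return Q, R

    A = torch.randn(4000, 32, dtype=torch.float64)
    Q, R = tsqr_sketch(A)
    print(torch.allclose(Q @ R, A, atol=1e-10))            # True
    print(torch.allclose(Q.T @ Q, torch.eye(32, dtype=torch.float64), atol=1e-10))  # True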

References

Basic information about QR factorization/decomposition can be found at, e.g.:

For an extensive overview of TS-QR and its variants, we refer to, e.g.:

  • Demmel, James, et al. “Communication-Optimal Parallel and Sequential QR and LU Factorizations.” SIAM Journal on Scientific Computing, vol. 34, no. 1, 2 Feb. 2012, pp. A206–A239., doi:10.1137/080731992.