heat.linalg.qr
QR decomposition of (distributed) 2-D ``DNDarray``s.
Module Contents
- qr(A: heat.core.dndarray.DNDarray, mode: str = 'reduced', procs_to_merge: int = 2) → Tuple[heat.core.dndarray.DNDarray, heat.core.dndarray.DNDarray]
Calculates the QR decomposition of a 2-D DNDarray. Factors the matrix A as QR, where Q is orthonormal and R is upper-triangular. If mode = "reduced", the function returns QR(Q=Q, R=R); if mode = "r", it returns QR(Q=None, R=R). A short usage sketch follows the parameter list below.
- Parameters:
A (DNDarray of shape (M, N)) – Array which will be decomposed. So far only 2D arrays with datatype float32 or float64 are supported. For split=0, the matrix must be tall-skinny, i.e. the local chunks of data must have at least as many rows as columns.
mode (str, optional) – The default, “reduced”, returns Q and R with shapes (M, min(M,N)) and (min(M,N), N), respectively. “r” returns only R, with shape (min(M,N), N).
procs_to_merge (int, optional) – This parameter is only relevant for split=0 and determines the number of processes to be merged at each step of the so-called TS-QR algorithm. The default is 2. Higher values might be faster but will probably result in higher memory consumption; 0 corresponds to merging all processes at once. We recommend modifying this parameter only if you are familiar with the TS-QR algorithm (see the references below).
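A minimal usage sketch, assuming a standard Heat installation (the array shape, the use of ht.random.randn, and the tolerance are purely illustrative):

# Minimal usage sketch for ht.linalg.qr. Run under MPI for an actual
# distributed factorization, e.g. `mpirun -n 4 python qr_example.py`.
import heat as ht

# Tall-skinny input, distributed along the rows (split=0).
A = ht.random.randn(1000, 20, split=0, dtype=ht.float32)

# mode="reduced" (default): returns both factors as a named tuple QR(Q, R).
Q, R = ht.linalg.qr(A)
print(Q.shape, R.shape)                 # (1000, 20) and (20, 20)
print(ht.allclose(Q @ R, A, atol=1e-4)) # reconstruction check

# mode="r": only R is computed, Q is returned as None.
qr_r = ht.linalg.qr(A, mode="r")
print(qr_r.Q, qr_r.R.shape)             # None and (20, 20)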
Notes
The distribution schemes of Q and R depend on that of the input A:
If A is distributed along the columns (A.split = 1), so are Q and R.
If A is distributed along the rows (A.split = 0), Q will also have split=0, but R will not be distributed, i.e. R.split = None, and a full copy of R is stored on each process.
Both cases are illustrated in the sketch below.
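An illustrative sketch of the resulting split attributes, assuming the script is run on more than one MPI process so that the arrays are actually distributed (shapes are arbitrary):

# Distribution of the factors depends on the split of the input.
import heat as ht

# split=0: Q keeps split=0, R is fully replicated (split=None).
A0 = ht.random.randn(1000, 20, split=0)
Q0, R0 = ht.linalg.qr(A0)
print(Q0.split, R0.split)   # expected: 0 None

# split=1: both Q and R stay distributed along the columns.
A1 = ht.random.randn(50, 50, split=1)
Q1, R1 = ht.linalg.qr(A1)
print(Q1.split, R1.split)   # expected: 1 1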
Note that the argument calc_q allowed in earlier Heat versions is no longer supported; calc_q = False is equivalent to mode = "r". Unlike numpy.linalg.qr(), ht.linalg.qr only supports mode="reduced" or mode="r" for the moment, since "complete" may result in heavy memory usage.
Heat's QR function is built on top of PyTorch's QR function, torch.linalg.qr(), which uses LAPACK (CPU) and MAGMA (CUDA) on the backend. For split=0, tall-skinny QR (TS-QR) is implemented, while for split=1 a block-wise version of stabilized Gram-Schmidt orthogonalization is used; a conceptual sketch of a TS-QR merge step is given below.
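The following non-distributed sketch illustrates the idea behind one TS-QR merge step (procs_to_merge = 2) using plain torch.linalg.qr; the block sizes and variable names are purely illustrative and do not reflect Heat's internal implementation:

# Conceptual TS-QR merge of two local blocks of a tall-skinny matrix.
import torch

torch.manual_seed(0)
A0 = torch.randn(500, 4)   # local rows held by "process 0"
A1 = torch.randn(500, 4)   # local rows held by "process 1"

# Step 1: each process factorizes its local block independently.
Q0, R0 = torch.linalg.qr(A0, mode="reduced")
Q1, R1 = torch.linalg.qr(A1, mode="reduced")

# Step 2: merge the small R factors with another (cheap) QR.
Q01, R = torch.linalg.qr(torch.cat([R0, R1]), mode="reduced")

# R is the R factor of the stacked matrix [A0; A1]; the global Q is
# recovered by applying the pieces of Q01 back to the local Q factors.
Q = torch.cat([Q0 @ Q01[:4], Q1 @ Q01[4:]])
print(torch.allclose(Q @ R, torch.cat([A0, A1]), atol=1e-4))  # True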
References
Basic information about QR factorization/decomposition can be found at, e.g.:
Gene H. Golub and Charles F. Van Loan. 1996. Matrix Computations (3rd Ed.).
For an extensive overview of TS-QR and its variants, we refer to, e.g.:
Demmel, James, et al. “Communication-Optimal Parallel and Sequential QR and LU Factorizations.” SIAM Journal on Scientific Computing, vol. 34, no. 1, 2 Feb. 2012, pp. A206–A239, doi:10.1137/080731992.