The non-symmetric s-step Lanczos algorithm: Derivation of efficient recurrences and synchronization-reducing variants of BiCG and QMR.




The Lanczos algorithm is among the most frequently used iterative techniques for computing a few dominant eigenvalues of a large sparse non-symmetric matrix. At the same time, it serves as a building block within the Biconjugate Gradient (BiCG) and the Quasi-Minimal Residual (QMR) methods for solving large sparse non-symmetric systems of linear equations. It is well known that, when implemented on distributed-memory computers with a huge number of processes, the synchronization time spent in computing dot products increasingly limits the parallel scalability. Therefore, we propose synchronization-reducing variants of the Lanczos, as well as BiCG and QMR methods, in an attempt to mitigate these negative performance effects. These so-called s-step algorithms are based on grouping dot products for joint execution and replacing time-consuming matrix operations by efficient vector recurrences. The purpose of this paper is to provide a rigorous derivation of the recurrences for the s-step Lanczos algorithm, introduce s-step BiCG and QMR variants, and compare the parallel performance of these new s-step variants with previous algorithms.