A Comparison of QMR, CGS and TFQMR on a Distributed Memory Machine

For the solution of systems of linear equations with general non-Hermitian, nonsingular coefficient matrices, implementations of three algorithms, QMR, CGS and TFQMR, on a parallel machine with distributed memory are proposed. In each of the three algorithms, two matrix-vector products per iteration dominate the execution time. In CGS and TFQMR these two products are dependent: the vector multiplied in the second product is computed from the result of the first. In QMR, which requires one product with the coefficient matrix and one with its transpose, the two products are independent and can therefore be computed simultaneously. This paper shows how exploiting this property increases the performance of a parallel implementation. Timing results for all three algorithms on an Intel PARAGON XPS 10 system are presented.
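The dependency argument can be made concrete with a small sketch. It is a shared-memory illustration under stated assumptions, not the distributed-memory implementation described here: NumPy, the thread pool, and all function names (including the hypothetical helper `lanczos_matvec_pair`) are assumptions chosen for illustration. The unpreconditioned CGS loop shows that the second product with A cannot start until alpha has been formed from the first. The helper issues the two QMR-style products A v and A^T w concurrently, because both input vectors are already known at the start of a step. QMR itself (its Lanczos process and quasi-minimization) is not implemented.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def cgs(A, b, tol=1e-10, maxiter=200):
    """Unpreconditioned Conjugate Gradient Squared (sketch).

    Each iteration performs two products with A; the second one
    depends on the first through alpha, so they must run in sequence.
    """
    x = np.zeros_like(b)
    r = b - A @ x
    r_tilde = r.copy()          # fixed shadow residual
    u = r.copy()
    p = r.copy()
    rho = r_tilde @ r
    bnorm = np.linalg.norm(b)
    for k in range(maxiter):
        v = A @ p                        # matvec 1
        alpha = rho / (r_tilde @ v)      # needs the result of matvec 1
        q = u - alpha * v                # needs alpha
        uq = u + q
        w = A @ uq                       # matvec 2: input depends on matvec 1
        x = x + alpha * uq
        r = r - alpha * w
        if np.linalg.norm(r) <= tol * bnorm:
            break
        rho_new = r_tilde @ r
        beta = rho_new / rho
        rho = rho_new
        u = r + beta * q
        p = u + beta * (q + beta * p)
    return x, k + 1


def lanczos_matvec_pair(A, v, w, pool):
    """Hypothetical helper for a QMR-style step.

    The right vector v and the left vector w are both available at the
    start of the step, so A @ v and A.T @ w have no data dependency and
    can be submitted at the same time.
    """
    f_right = pool.submit(np.dot, A, v)
    f_left = pool.submit(np.dot, A.T, w)
    return f_right.result(), f_left.result()
```

NumPy typically releases the GIL inside its matrix products, so the two submitted products may overlap in time on a multicore machine. The distributed-memory setting of this paper, where both products also involve communication, is not modeled by this sketch.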