# Publications » Top 10

- A hybrid geometric + algebraic multigrid method with semi-iterative smoothers
- A comparison of the Extrapolated Successive Overrelaxation and the Preconditioned Simultaneous Displacement methods for augmented linear systems
- A comparison of CPU and GPU implementations for solving the Convection Diffusion equation using the local Modified SOR method
- Is modified PSD equivalent to modified SOR for two-cyclic matrices?
- The impact of eigenvalue locality on the convergence behavior of the PSD method for two-cyclic matrices
- Convergence of the Diffusion method for weighted torus graphs using Fourier analysis
- Load balancing for the numerical solution of the Navier-Stokes equations
- The parallelisation of the Princeton Ocean model
- Convergence theory of extrapolated iterative methods for a certain class of non-symmetric linear systems
- On the convergence of some generalized preconditioned iterative methods

Num. Linear Algebra Applic., Vol. 21, Issue 2, pp. 221-238, 2014

**ABSTRACT:**We propose a multigrid method for solving large-scale sparse linear systems arising from discretizations of partial differential equations, such as those from finite element and generalized finite difference methods. Read more →

Our proposed method has the following two characteristics. First, we introduce a hybrid geometric + algebraic multigrid method, or HyGA, to leverage the rigor, accuracy, and efficiency of geometric multigrid (GMG) for hierarchical unstructured meshes, with the flexibility of algebraic multigrid (AMG). Second, we introduce efficient smoothers based on the Chebyshev–Jacobi method for both symmetric and asymmetric matrices. We also introduce a unified derivation of restriction and prolongation operators for weighted-residual formulations over unstructured hierarchical meshes and apply it to both finite element and generalized finite difference methods. We present numerical results of our method for Poisson equations in both 2-D and 3-D and compare our method against the classical GMG and AMG as well as Krylov subspace methods.
← Read less

Numerische Mathematik, 2014, (under revision)

**ABSTRACT:**In this paper we study the impact of two types of preconditioning on the numerical solution of large sparse augmented linear systems. Read more →

The first preconditioning matrix is the lower triangular part whereas the second is the product of the lower triangular part with the upper triangular part of the augmented system's coefficient matrix. For the first preconditioning matrix we form the Generalized Modified Extrapolated Successive Overrelaxation (GMESOR) method, whereas the second preconditioning matrix yields the Generalized Modified Preconditioned Simultaneous Displacement (GMPSD) method, which is an extrapolated form of the Symmetric Successive Overrelaxation method. We find sufficient conditions for each aforementioned iterative method to converge. In addition, we develop a geometric approach, for determining the optimum values of their parameters and corresponding spectral radii. It is shown that both iterative methods studied (GMESOR and GMPSD) attain the same rate of convergence. Numerical results confirm our theoretical expectations.
← Read less

Parallel Computing, http://dx.doi.org/10.1016/j.parco.2014.02.002, 2014

**ABSTRACT:**In this paper we study a parallel form of the SOR method for the numerical solution of the Convection Diffusion equation suitable for GPUs using CUDA. To exploit the parallelism offered by GPUs we consider the ﬁne grain parallelism model. Read more →

This is achieved by considering the local relaxation version of SOR. In particular, we use SOR with red-black ordering using two sets of parameters ω

_{1ij}and ω_{2ij}for the 5 point stencil. The parameter ω_{1ij}is associated with each red (i + j even) grid point (i, j), whereas the parameter ω_{2ij}is associated with each black (i + j odd) grid point (i, j). The use of a parameter for each grid point avoids the global communication required in the adaptive determination of the best value of ω and also increases the convergence rate of the SOR method (Varga, 1962) [38] and (Young, 1971) [41]. We present our strategy and the results of our effort to exploit the computational capabilities of GPUs under the CUDA environment. Additionally, two parallel CPU programs utilizing manual SSE2 (Streaming SIMD Extensions 2) and AVX (Advanced Vector Extensions) vectorization were developed as performance references. The optimizations applied on the GPU version were also considered for the CPU version. Signiﬁcant performance improvement was achieved with all three developed GPU kernels differentiated by the degree of recomputations thus affecting the ﬂops per element access ratio. ← Read lessLin. Alg. And its Appl., Vol. 432, Issue 11, pp. 2798-2815, 2010.

**ABSTRACT:**In this paper we study the convergence analysis of the Modiﬁed Preconditioned Simultaneous Displacement (MPSD) method when A is a two-cyclic matrix. Convergence conditions and optimum values of the parameters are determined in case the eigenvalues of the associated Jacobi iteration matrix are either all real or all imaginary.

Lin. Alg. And its Appl., Vol. 430, pp. 1929–1944, 2009

**ABSTRACT:**In this paper, we analyse the convergence of the preconditioned simultaneous displacement (PSD) method applied to linear systems of the form Au = b where A is a two-cyclic matrix. Read more →

Convergence conditions and optimum values of the parameters of the method are determined in the cases where the eigenvalues of the associated Jacobi iteration matrix are either all real or all imaginary. It is shown that the convergence behavior of the PSD method is greatly affected by the locality of the eigenvalues of the associated Jacobi iteration matrix. In particular, it is shown that when these eigenvalues are real the PSD method degenerates into the extrapolated Gauss–Seidel method whereas when they are imaginary its convergence is increased by an order of magnitude and becomes equivalent to the extrapolated SOR method. Finally, a comparison with the SSOR method reveals that the PSD method possesses a better convergence behavior in all cases.
← Read less

Theoretical Computer Science, Vol. 401, Issues 1-3, pp. 1-16, July 2008

**ABSTRACT:**This paper studies the Diffusion method for the load balancing problem in case of weighted torus graphs. Closed form formulae for the optimum values of the edge weights are determined using local Fourier analysis. It is shown that an extrapolated version of Diffusion can become twice as fast for the stretched torus graphs.

PARA06, Workshop on state-of-art in scientific and parallel computing, 761-773, Sweden, June 18-21, 2006

**ABSTRACT:**In this paper, we simulate the performance of a load balancing scheme. In particular, we study the application of the Extrapolated Diffusion(EDF) method for the efficient parallelization of a simple atmospheric model. Read more →

It involves the numerical solution of the steady state Navier-Stokes(NS) equations in the horizontal plane and random load values, corresponding to the physics computations, in the vertical plane. For the numerical solution of NS equations, we use the local Modified Successive Overrelaxation (LMSOR) method with local parameters thus avoiding the additional cost caused by the global communication of the involved parameter ω in the classical SOR method. We have implemented an efficient domain decomposition technique by using a larger number of processors in the areas of the domain with heavier work load. With our balancing scheme, a gain of as much as approximately 45% in execution time is achieved, in certain cases.
← Read less

Europar '99 Parallel Processing, 5th Intern. Euro-Par Conference, Lecture Notes in Computer Science 1685, 1395-1402

**ABSTRACT:**In this paper we present the parallel implementation of the Princeton Ocean Model (POM) using message passing. Read more →

Domain decomposition techniques are used for the horizontal discretization whereas in the vertical direction each column per grid point is computed. The required interprocessor communication was implemented on the PVM message passing library. Three different data partitioning schemes are studied. It is found that the checkerboard partitioning produces the best results when the number of processors becomes larger than 15, otherwise row block striped partitioning is to be preferred.
← Read less

Numeriche Mathematik 45, 447-458, 1984

**ABSTRACT:**A variety of iterative methods are applied to linear algebraic systems of the form Au=b, where the matrix A is consistently ordered and the iteration matrix of the Jacobi method is skew-symmetric. Read more →

The related theory of convergence is developed and the optimum values of the involved parameters for each considerer scheme are determined. It reveals that under the aforementioned assumptions the Extrapolated Successive Underrelaxation method attains a rate of convergence which is clearly superior over the Successive Underrelaxation method when the Jacobi iteration matrix is non-singular.
← Read less

SIAM J. Num. Anal. 18, No. 4, 591-596, 1981

**ABSTRACT:**This paper generalizes the preconditioning techniques introduced by Evans for the numerical solution of the linear system Au=b and defines two extrapolated versions of the Gauss-Seidel method as well as an extrapolated version of the successive overrelaxation method. Read more →

Finally, it establishes some convergence theorems for the considered iterative schemes when A has particular properties such as being consistently ordered, positive definite having weak diagonal dominance.
← Read less