CFD Solvers with Minimal Memory Access

Rainald Löhner, PhD, DSc

Many state-of-the-art CFD codes that exhibit low computational intensity (flops per RAM access) saturate the memory bandwidth of modern chips after only a few cores, which limits any benefit from using a higher number of available cores. This bottleneck is expected to become even more pronounced on future manycore systems.

This has led to the quest for CFD solvers with minimal memory access.
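The saturation effect can be illustrated with a simple roofline-style estimate. The sketch below is not taken from the solvers discussed here. The function name, the chip parameters (20 GFlop/s per core, 200 GB/s of memory bandwidth) and the assumption of 8 bytes per RAM access are all hypothetical and chosen only to show the trend.

```python
def cores_to_saturate(flops_per_core, bandwidth_bytes_per_s,
                      flops_per_access, bytes_per_access=8):
    """Estimate the core count at which memory bandwidth becomes the limit.

    Roofline-style reasoning: the memory system can sustain at most
    (bandwidth / bytes_per_access) accesses per second, and each access
    feeds flops_per_access floating point operations.
    """
    sustainable_flops = bandwidth_bytes_per_s / bytes_per_access * flops_per_access
    return sustainable_flops / flops_per_core


# Hypothetical chip, for illustration only (not measured data).
FLOPS_PER_CORE = 20e9   # assumed peak flops per core
BANDWIDTH = 200e9       # assumed memory bandwidth in bytes per second

# Low computational intensity: 2 flops per RAM access.
low_intensity_cores = cores_to_saturate(FLOPS_PER_CORE, BANDWIDTH, 2)

# Higher computational intensity: 32 flops per RAM access.
high_intensity_cores = cores_to_saturate(FLOPS_PER_CORE, BANDWIDTH, 32)

print(low_intensity_cores, high_intensity_cores)
```

Under these assumed numbers the low-intensity loop is bandwidth-bound after only a few cores, while the loop that does more work per RAM access keeps many more cores busy before hitting the same limit.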

We report on recent developments and results for Finite Difference and Edge-Based Finite Element solvers.

The best of these implementations compute one residual with only 6 fetches and 4 stores, regardless of the size of the stencil (and therefore of the discretization order). In terms of memory access, they are thus competitive even with finite difference stencils as low as 2 (typical of CFD codes with 2nd order spatial discretization of fluxes and 4th order damping).

Timings for a low Mach number finite difference code using a 6th order spatial discretization are competitive with those of conventional loops. This bodes well for future HPC architectures.