Field Precision

Parallel processing optimization

The Professional versions of the 3D field-solution programs Aether, HiPhi, Magnum, HeatWave and RFE3, as well as the field updates in OmniTrak, use OpenMP parallel-processing routines to achieve significant reductions in run-time on multi-core computers. Some users have reported that the programs sometimes fail to implement parallel processing. This article addresses two topics:

  • The limitations of parallel processing and how our programs avoid memory conflicts.
  • Modifications to the programs in the most recent version to ensure that they utilize the full parallel-processing capabilities of the computer.

Consider the application of parallel processing in a boundary-value field solution (e.g., HiPhi, Magnum,...). The discrete form of the partial differential equations is solved by an iterative method in which values of the primary function (e.g., electrostatic potential, reduced potential,...) are corrected to be consistent with values at neighboring mesh points. An optimal method alternately corrects odd and even mesh points. It is not necessary to proceed in sequence; different parts of the mesh may be corrected simultaneously, hence the appeal of parallel processing. The constraint is that two processes may not change the same mesh memory location at the same time. Because our programs use structured conformal meshes, an easy and efficient solution is to assign processes to different layers of the mesh along the z direction (index K). Each process works from the top to the bottom of its own layer, so there is no danger of overlap.
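As an illustration of the layer-partitioning idea (a sketch only, not the Field Precision source), the following C/OpenMP fragment performs one odd/even relaxation pass in which the outer loop over the z index is divided into contiguous blocks of layers, one per thread. The array name phi, the mesh dimensions and the six-point averaging correction are placeholder assumptions.

    /* Sketch of a layer-partitioned odd/even relaxation pass.
       Hypothetical names and mesh sizes, not the actual program source. */
    #include <omp.h>

    #define NI 64
    #define NJ 64
    #define NK 40

    static double phi[NK][NJ][NI];   /* primary function on the structured mesh */

    void relax_pass(int parity)
    {
        /* schedule(static) gives each thread a contiguous block of K layers,
           so no two threads ever write the same memory location. */
        #pragma omp parallel for schedule(static)
        for (int k = 1; k < NK - 1; k++) {
            for (int j = 1; j < NJ - 1; j++) {
                for (int i = 1; i < NI - 1; i++) {
                    /* Correct only points of the requested parity; the
                       neighbors read here have the opposite parity and are
                       not written during this pass. */
                    if ((i + j + k) % 2 != parity) continue;
                    phi[k][j][i] = (phi[k][j][i-1] + phi[k][j][i+1]
                                  + phi[k][j-1][i] + phi[k][j+1][i]
                                  + phi[k-1][j][i] + phi[k+1][j][i]) / 6.0;
                }
            }
        }
    }

A full iteration would call relax_pass(0) followed by relax_pass(1), repeating until the corrections fall below a convergence tolerance.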

There are two reasons why we need to avoid thin processing layers:

  • There is overhead associated with OpenMP organization, so increasing the number of processors may give diminishing returns. For example, little speed advantage is gained by assigning three processors to layers two indices thick versus one processor to a layer with ΔK = 6.
  • A safety factor is necessary to ensure that there is never a memory overlap.

For these reasons, we set the minimum processor layer thickness to ΔK = 5. In previous versions, the programs simply skipped a parallel calculation if

ΔK = Kmax/NProc ≤ 5.

(Here, NProc is the number of processors requested in the PARALLEL command.) In this case, parallel processing would appear to fail if a solution had large dimensions in x-y, a small dimension in z and the user specified a large value of NProc. For example, with Kmax = 40 and NProc = 10, ΔK = 4 and the programs reverted to a single process, even though up to eight processes could have run safely.

To eliminate the sometimes mysterious behavior, the current programs use the following logic.

  1. If NProc = 1, the program performs a non-parallel calculation.
  2. If NProc > 1, the program calculates the quantity NProcMax = Kmax/5. If NProc ≤ NProcMax, the program opens NProc processes. Otherwise, the number of processes equals NProcMax.

In this way, the program uses the maximum number of processors consistent with avoiding memory overlap.
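A minimal sketch of this logic in C follows, assuming integer inputs KMax and NProc; the function name and the constant MIN_LAYER are illustrative, not taken from the program source.

    /* Sketch of the processor-count selection described above.
       Hypothetical names, not the actual program source. */
    #include <omp.h>

    #define MIN_LAYER 5                    /* minimum layer thickness, DeltaK = 5 */

    int choose_num_threads(int KMax, int NProc)
    {
        if (NProc <= 1)
            return 1;                      /* non-parallel calculation */

        int NProcMax = KMax / MIN_LAYER;   /* largest count that keeps DeltaK >= 5 */
        if (NProcMax < 1)
            NProcMax = 1;

        /* Use the requested number if it is safe, otherwise clamp to NProcMax. */
        return (NProc <= NProcMax) ? NProc : NProcMax;
    }

    /* Example usage: omp_set_num_threads(choose_num_threads(KMax, NProc)); */

With this choice, a request for more processes than the z dimension can support is silently reduced rather than rejected, so the calculation always runs in parallel when it is safe to do so.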
