
Magnum in parallel

After converting Magnum, our 3D magnetic field code, to parallel operation, we made test calculations that give a good sense of how much improvement to expect. An ideal parallel program would give a 50% reduction in run time on a dual-processor machine. In reality, some sections of a program cannot (or should not) be run in parallel:

  • Disk operations (particularly with sequential files).
  • Small blocks where it's not worth the overhead to launch multiple threads.
  • Calculations that use shared resources.

In Magnum, the major disk operations are mesh input, solution file output and generation of finite-element coefficients. Because the generation of coefficients uses a large, random-access file, only a single thread can address the file at any time. We found there was no benefit in running this operation in parallel. One shared resource that would be unwieldy to replicate is the unit for least-squares-fit field interpolations. Therefore, the adjustment of μr values in nonlinear solutions is performed by a single thread. Fortunately, it is straightforward to implement multiple threads in the matrix inversion that occupies most of the run time.

As a benchmark test, I used the SHIELD_SATURATION example included in the Magnum application library. The example models a magnetic shield with highly saturated regions. The solution involved 15 cycles of permeability adjustment. The calculation was performed on an HP xw62000 computer with two Xeon processors at 3.2 GHz. The run time was 1424 s for the serial program and 885 s for the parallel version. The run time reduction factor for the dual-processor system was therefore Rf = 0.622.

To see the implication of the result, let fs be the fraction of operations in Magnum performed by a single thread and fp the fraction performed by N threads. Ignoring the overhead time for initiating threads, the run time reduction factor is given by

Rf = fs + fp/N.

Taking fs + fp = 1 and solving for fp, we find that

fp = (1 - Rf)/(1 - 1/N).

For N = 2 and Rf = 0.622, the fraction of parallel operations is fp = 0.76. This number is consistent with an inspection of the trace generated by the Windows task manager. Substituting this value of fp back into the expression for Rf, the reduction factor for a quad-core system (N = 4) is predicted to be 0.432 (a speed increase of 2.31X).
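The arithmetic above can be reproduced with a few lines of Python. This is only an illustrative sketch and not part of Magnum: the function names parallel_fraction and reduction_factor are hypothetical, and the only inputs are the two run times quoted in the benchmark. Because it starts from the rounded run times, the last digit of a result may differ slightly from the figures quoted in the text.

```python
def parallel_fraction(rf, n):
    """Solve Rf = fs + fp/N with fs + fp = 1 for the parallel fraction fp."""
    return (1.0 - rf) / (1.0 - 1.0 / n)


def reduction_factor(fp, n):
    """Predicted run time reduction factor Rf = fs + fp/N for N threads."""
    fs = 1.0 - fp          # fraction of operations done by a single thread
    return fs + fp / n


# Run times quoted for the SHIELD_SATURATION benchmark (seconds)
t_serial = 1424.0
t_parallel = 885.0

rf2 = t_parallel / t_serial        # measured reduction factor, N = 2
fp = parallel_fraction(rf2, 2)     # implied parallel fraction
rf4 = reduction_factor(fp, 4)      # prediction for a quad-core system

print(f"Rf (N = 2): {rf2:.3f}")
print(f"fp:         {fp:.3f}")
print(f"Rf (N = 4): {rf4:.3f}")
print(f"Speed-up:   {1.0 / rf4:.2f}X")
```

Calling reduction_factor(fp, n) with larger n shows the predicted factor approaching fs = 1 - fp, the floor set by the single-thread sections. Like the formula in the text, the estimate ignores the overhead of initiating threads.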
