Benchmarks

Profiling tools

MFEM-MGIS provides a simple tool for profiling your application, consisting mainly in generating a time table of timed sections respecting the call stack (tree). The master or root timer of the call stack is initialized during the general initialization of mfem-mgis:

mfem_mgis::initialize(argc, argv);

Displaying the time table in your terminal is done by adding the following function call:

mfem_mgis::Profiler::timers::print_timers();

This is an example of an output:

MPI feature activated, rank 0:
|-- start timetable -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|    name                                                                  |    number Of Calls |             min(s) |            mean(s) |             max(s) |            part(%) |             imb(%) |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| > root                                                                   |                  1 |           0.061155 |           0.061313 |           0.061512 |        100.000000% |          0.003254% |
| |--> Solve                                                               |                  1 |           0.056359 |           0.056493 |           0.056539 |         91.914400% |          0.000812% |
|    |--> NonLinearEvolutionProblem::solve                                 |                  1 |           0.056359 |           0.056492 |           0.056538 |         91.913194% |          0.000812% |
|       |--> NonLinearEvolutionProblemImplementationBase::solve            |                  1 |           0.056357 |           0.056490 |           0.056536 |         91.909814% |          0.000810% |
|          |--> NewtonSolver::Mult                                         |                  1 |           0.056351 |           0.056483 |           0.056528 |         91.897665% |          0.000798% |
|             |--> NewtonSolver::computeNewtonCorrection                   |                 10 |           0.048653 |           0.048706 |           0.048851 |         79.416040% |          0.002969% |
|                |--> NewtonSolver::getJacobian                            |                 10 |           0.020212 |           0.020215 |           0.020220 |         32.870789% |          0.000239% |
|          |--> NonLinearEvolutionProblemImplementationBase::setup         |                  1 |           0.000004 |           0.000005 |           0.000006 |          0.009562% |          0.148211% |
| |--> FiniteElementDiscretization::constructor                            |                  1 |           0.002696 |           0.002717 |           0.002760 |          4.487192% |          0.015873% |
|    |--> loadMesh                                                         |                  1 |           0.000944 |           0.000965 |           0.001002 |          1.629139% |          0.038049% |
| |--> common::post_processing_step                                        |                  1 |           0.000856 |           0.000974 |           0.001084 |          1.762192% |          0.113317% |
| |--> set_mgis_stuff                                                      |                  1 |           0.000687 |           0.000824 |           0.000970 |          1.577426% |          0.177846% |
| |--> set_linear_solver                                                   |                  1 |           0.000139 |           0.000169 |           0.000204 |          0.331618% |          0.205352% |
|    |--> NonLinearEvolutionProblemImplementationBase::updateLinearSolver  |                  1 |           0.000001 |           0.000001 |           0.000001 |          0.001975% |          0.078562% |
| |--> PeriodicNonLinEvPB::constructor_with_bct                            |                  1 |           0.000045 |           0.000047 |           0.000050 |          0.081340% |          0.053553% |
|-- end timetable ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|

To write the time table in a file is done by adding the following function call:

mfem_mgis::Profiler::timers::write_timers();

Each line is composed of the section name, number of calls, time in seconds, and ratio to total time. This is an example of an output:

root 1 3676.43 100
   FiniteElementDiscretization::constructor 1 132.422 3.60192
      loadMesh 1 117.7 3.20148
   PeriodicNonLinEvPB::constructor_with_bct 1 0.276666 0.0075254
   set_mgis_stuff 1 0.104737 0.00284886
   set_linear_solver 1 0.000827939 2.25202e-05
      NonLinearEvolutionProblemImplementationBase::updateLinearSolver 1 1.5063e-05 4.09718e-07
   Solve 40 3480.88 94.6808
      NonLinearEvolutionProblem::solve 40 3480.88 94.6808
         NonLinearEvolutionProblemImplementationBase::solve 40 3480.88 94.6808
            NonLinearEvolutionProblemImplementationBase::setup 40 0.000191316 5.20385e-06
            NewtonSolver::Mult 40 3480.88 94.6808
               NewtonSolver::computeNewtonCorrection 109 3322.7 90.3783
                  NewtonSolver::getJacobian 109 349.572 9.50845
   common::post_processing_step 40 95.899 2.60848

Note

You can specify the name of your output file by using mfem_mgis::Profiler::OutputManager::writeFile("OutputTimerFile.perf").

Note

Both functions can be called directly by this function print_and_write_timers()

You can instrument each of your functions using section timers. It’s important to remember that adding chronos can add extra cost, negligible for “big” functions but very costly for functions called in each element.

  • To time a function, place this instruction at the start of your function, the timer will stop at the end of the scope (the second time point is hidden in the timer destructor).

CatchTimeSection("NameOfYourFunction");
  • To add a second timer to the same scope, you can use:

CatchNestedTimeSection("NameOfYourNestedSection");

Benchmarks RVE MOX

In this section, we propose some scaling curves for different test cases. For the first benchmark, we use the MOX RVE example with 643 inclusions and a viscoplastic behavior law for the matrix, and an elastic behavior law for the inclusions. Calculations are performed on the TOPAZE supercalculator at CCRT. Each node in the cluster is built on 64-core AMD EPYC Milan 7763 dual-socket processors running at 2.45 GHz and equipped with 256 GB RAM. Benchmarks are in pure MPI.

Regarding the specificities of the simulations, we use the HyprePCG solver with a HypreBoomerAMG preconditioner. Mesh reading is performed using a mesh pre-cut into small msh files to limit the impact on the memory footprint.

Illustration of a RVE with 634 spheres after 5 seconds.

We performed tests on 3 problem sizes: 80M ddl, 190M ddl and 664M ddl.

Note

Un test de faisabilité de cette simulation a aussi était réalisé sûr plus de 5.3 milliards de ddl.

80M ddl

Time, Memory footprint and speedup of a MOX RVE with 80M ddl.

190M ddl

Time, Memory footprint and speedup of a MOX RVE with 190M ddl.

664M ddl

Time, Memory footprint and speedup of a MOX REV with 664M ddl.

MFEM/MGIS versus MFEM

This benchmark aims to evaluate the time overhead between MFEM and MFEM/MGIS for a simple test case involving an elastic behavior law.

Details:

  • Solver: HyprePCG

  • Preconditionner: HypreDiagScale (Jacobi)

  • Tolerance: 1e-14

  • Boundary conditions: we impose an uniform dirichlet condition U = (0,0,0) at left and U = (0,0,1) at right.

../_images/bench-perf.png

Refinement

MPI

DDL

MFEM-MGIS Time

MFEM-MGIS Iteration

MFEM Time

MFEM Iteration

Overhead

4

1

111843

8.34

1592

6.21

1565

34.4 %

5

1

839619

122.54

3167

94.53

3102

29.6 %

5

16

839619

51.34

3167

41.90

3102

22.5 %

6

16

6502275

1057.24

6273

877.92

6124

20.0 %

Note

Note that due to the small size of the input mesh, domain decomposition is limited to 16 subdomains (MFEM-MGIS only performs parallel refinement). To improve this benchmark, consider using sequential refinement or a finer mesh.

Run this example on Topaze supercomputer:

ccc_mprun -n 16 -c 1 -m work -T 600 -p milan -Q test ./MFEMLinearElasticityBenchmark --mesh ../beam-tet.mesh -r 5
ccc_mprun -n 16 -c 1 -m work -T 600 -p milan -Q test ./MFEMMGISLinearElasticityBenchmark --mesh ../beam-tet.mesh -r 5