# The generic-parallel interface for material properties

This page describes the generic-parallel interface.
The generic-parallel interface has been designed to be
compatible with current parallel programming models (see the
Backends section).
The generic-parallel interface for material properties
generates two functions. Those functions can be retrieved using
`dlopen`/`dlsym` on Unix systems or
`LoadLibrary`/`GetProcAddress` on Windows.

In addition, the generic-parallel interface also exports additional
symbols, in particular those related to the parameters of the material
property (see the `parameters_as_static_variables` DSL option).

The first generated function matches the following prototype:
```cpp
void (*)(mfront_gmp_OutputStatus* const,       // output status
         mfront_gmp_real* const,               // output values
         const mfront_gmp_size_type,           // output stride
         const mfront_gmp_real* const,         // values of the arguments
         const mfront_gmp_size_type* const,    // strides of the arguments
         const mfront_gmp_size_type,           // number of arguments
         const mfront_gmp_size_type,           // number of points
         const mfront_gmp_OutOfBoundsPolicy);  // out of bounds policy
```

The `mfront_gmp_OutputStatus` structure and the
`mfront_gmp_OutOfBoundsPolicy` enumeration type are described
in the page dedicated to the generic interface for material
properties. The `mfront_gmp_real` type is by default an alias
to double precision floating point numbers.
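For instance, retrieving the first generated function on a Unix system may be sketched as follows. This is an illustrative snippet, not generated code: the type aliases below are placeholders for the definitions shipped with the generic interface headers (the size type in particular is an assumption here).

```cpp
#include <cassert>
#include <dlfcn.h>  // dlopen/dlsym (Unix); use LoadLibrary/GetProcAddress on Windows

// Placeholder declarations: the real definitions come from the headers of
// the generic interface.
using mfront_gmp_real = double;
using mfront_gmp_size_type = unsigned long long;  // assumption, see the generated headers
struct mfront_gmp_OutputStatus;             // opaque here, see the generic interface page
enum mfront_gmp_OutOfBoundsPolicy : int;    // likewise

// function pointer type matching the first generated prototype
using mfront_gmp_ptr = void (*)(mfront_gmp_OutputStatus* const,
                                mfront_gmp_real* const,
                                const mfront_gmp_size_type,
                                const mfront_gmp_real* const,
                                const mfront_gmp_size_type* const,
                                const mfront_gmp_size_type,
                                const mfront_gmp_size_type,
                                const mfront_gmp_OutOfBoundsPolicy);

// returns a pointer to the generated function, or nullptr if either the
// library or the symbol cannot be found
mfront_gmp_ptr load_material_property(const char* library, const char* function) {
  void* lib = dlopen(library, RTLD_NOW);
  if (lib == nullptr) {
    return nullptr;
  }
  return reinterpret_cast<mfront_gmp_ptr>(dlsym(lib, function));
}
```

For example, `load_material_property("./libGenericParallelUO2-cuda.so", "UO2_ShearModulus")` would retrieve the first function generated from `UO2_ShearModulus.mfront` below.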
If the number of points is zero, no computation is performed.
A zero stride means that the associated argument is uniform. The
stride of the output can be zero only if the strides of all the
arguments are zero; otherwise, an error is reported.
Most backends treat the case where all strides are equal to one as a
special case and use an optimized implementation.
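To make these stride conventions concrete, here is a small self-contained sketch (a toy property, not code generated by MFront) gathering strided arguments the way a backend might, assuming each argument is passed through its own pointer as in the examples below:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Toy stand-in for a generated material property: value = T * (1 - f).
// Argument j at point p is read at args[j][p * strides[j]]: a stride of 1
// walks through a contiguous array, while a stride of 0 re-reads the same
// value, i.e. the argument is uniform.
void evaluate(double* out, std::size_t out_stride,
              const double* const* args, const std::size_t* strides,
              std::size_t n_points) {
  for (std::size_t p = 0; p != n_points; ++p) {
    const double T = args[0][p * strides[0]];
    const double f = args[1][p * strides[1]];
    out[p * out_stride] = T * (1 - f);
  }
}
```

With temperatures `{300, 500}` (stride 1) and a uniform porosity `{0.1}` (stride 0), the outputs are `270` and `450`.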
The second generated function matches the following prototype:

```cpp
void (*)(mfront_gmp_OutputStatus* const,       // output status
         mfront_gmp_real* const,               // output values
         const mfront_gmp_real* const,         // values of the arguments
         const mfront_gmp_size_type,           // number of arguments
         const mfront_gmp_size_type,           // number of points
         const mfront_gmp_OutOfBoundsPolicy);  // out of bounds policy
```

This function assumes that the values of the arguments and the values
of the output are stored contiguously in memory. In other words, this
prototype is equivalent to the first prototype when all strides are
equal to one.
# Backends

Backends are associated with parallel programming models.

## The CUDA backend

Let `UO2_ShearModulus.mfront` be an implementation of a
material property computing the shear modulus of uranium dioxide, which
depends on the temperature and on the porosity. This file can be
compiled as follows:
```shell
$ mfront --obuild --configuration-file=config-cuda.json \
         --interface=generic-parallel UO2_ShearModulus.mfront
The following library has been built :
- libGenericParallelUO2-cuda.so : UO2_ShearModulus UO2_ShearModulus2
```

where the configuration file `config-cuda.json` provides
the required information to call a CUDA compiler
(nvcc or clang++) with the appropriate flags.
This file may also contain the options associated with the
CUDA backend, see below.
UO2_ShearModulus implements the first prototype, while
UO2_ShearModulus2 implements the second.
The following code shows how to call the function
UO2_ShearModulus. In this example, the porosity is assumed
to be uniform and thus has a stride equal to zero. For the sake of
simplicity, memory allocations on the host and on the device are
handled by the Thrust library.
```cpp
#include <array>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>

thrust::host_vector<double> G;
thrust::host_vector<double> T = {300, 500, 300, 800};
thrust::host_vector<double> f = {0.1};
thrust::device_vector<double> d_T(T);
thrust::device_vector<double> d_f(f);
thrust::device_vector<double> d_G(T.size());
auto output = mfront_gmp_OutputStatus{};
const auto policy = GENERIC_MATERIALPROPERTY_NONE_POLICY;
const auto args = std::array<double *, 2u>{thrust::raw_pointer_cast(d_T.data()),
                                           thrust::raw_pointer_cast(d_f.data())};
const auto args_stride = std::array<mfront_gmp_size_type, 2u>{1, 0};
UO2_ShearModulus(&output, thrust::raw_pointer_cast(d_G.data()), 1,
                 args.data(), args_stride.data(), 2, 4, policy);
G = d_G;
```

The following remarks can be made:

- All data must be allocated on the device before calling the
  `UO2_ShearModulus` function, including uniform values. Managed
  memory can also be used.
- The Thrust library is not required by the CUDA backend. It is only
  used in the example for simplicity.
- The CUDA backend and the presented example can be compiled with
  both `nvcc` and `clang++`.

### `nvcc`

The following configuration file exemplifies how to use the
`nvcc` compiler to build the source code generated by the
CUDA backend:
```
compilation_options : {
  cuda : {
    compiler: "/usr/local/cuda-12.8/bin/nvcc",
    compilation_flags: {"-O2 -std=c++20 -diag-suppress 20012",
                        "--expt-relaxed-constexpr", "-Xcompiler -fPIC"}
  }
}
```

### `clang++`

The following configuration file exemplifies how to use the
`clang++` compiler to build the source code generated by the
CUDA backend:
```
compilation_options : {
  cuda : {
    compiler: "clang++",
    compilation_flags: {"-O3 -std=c++20 -march=native",
                        "-x cuda --cuda-compile-host-device --cuda-gpu-arch=sm_86",
                        "--cuda-path=/usr/local/cuda-12.8/ -fPIC -DPIC"}
  }
},
linking_options: {
  linker_flags: "-L/usr/local/cuda-12.8/lib64 -lcudart -ldl -lrt -pthread"
}
```

### Options of the CUDA backend

The only option available is
`number_of_threads_per_block`, the number of threads per
block. By default, the value used is \(64\). This number is generally
chosen as a multiple of the number of threads per warp, which is
typically \(32\) or \(64\). Typical values are thus \(64\), \(128\)
and \(256\).
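Assuming this option is declared like the backend options shown for the stdpar backend elsewhere on this page (the exact nesting is an assumption; check the reference documentation of the interface), a configuration file could select \(128\) threads per block as follows:

```
interfaces_options: {
  generic-parallel: {
    backend: {cuda: {number_of_threads_per_block: "128"}}
  }
}
```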
When required, the strides are stored in constant memory on the device.

If the material property exposes parameters, those are stored in a
global variable on the host (the CPU). For efficiency, parameters are
copied into constant memory on the device when evaluating the material
property. This mechanism preserves the flexibility provided by
parameters, which is required to perform sensitivity analyses,
uncertainty propagation studies, or new identifications. Note that
parameters' handling can still be disabled by setting the
`parameters_as_static_variables` DSL option to `true`.
Errors are handled by allocating a managed array of integers, with one
entry per possible kind of error. This array is only accessed
if an error occurs, minimizing the cost of error handling. On output,
the user knows if an error occurred, but not where it occurred.
For instance, the `UO2_ShearModulus` function may report that
one temperature passed on input was negative at one evaluation point.
For material properties, errors are currently associated with
violations of standard bounds and physical bounds. Those tests are
disabled by setting the `disable_runtime_checks` DSL option to
`true`. Note that this option also disables other checks performed on
the host (notably regarding the number of arguments passed).
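This error-reporting scheme can be sketched with a self-contained toy (the error kind and the property below are illustrative, not the generated code):

```cpp
#include <cassert>
#include <cstddef>

// illustrative error kinds; the real ones relate to bounds violations
enum error_kind : std::size_t {
  negative_temperature,
  number_of_error_kinds
};

// toy evaluation: one flag per kind of error, written only when an error
// occurs, so the error-free path never touches the flags array; on output,
// the caller learns *that* a negative temperature was met, not *where*
void evaluate(double* out, const double* T, const double* f,
              std::size_t n_points, int* error_flags) {
  for (std::size_t p = 0; p != n_points; ++p) {
    if (T[p] < 0) {
      error_flags[negative_temperature] = 1;
      continue;
    }
    out[p] = T[p] * (1 - f[p]);  // toy property: value = T * (1 - f)
  }
}
```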
## The C++ standard parallel algorithms backend (stdpar)

Let `UO2_ShearModulus.mfront` be an implementation of a
material property computing the shear modulus of uranium dioxide, which
depends on the temperature and on the porosity. This file can be
compiled as follows:
```shell
$ mfront --obuild --configuration-file=config-stdpar.json \
         --interface=generic-parallel UO2_ShearModulus.mfront
The following library has been built :
- libGenericParallelUO2-stdpar.so : UO2_ShearModulus UO2_ShearModulus2
```

where the configuration file `config-stdpar.json` provides
the required information to compile the generated source files and also
selects the execution policy to be used (see below).
`UO2_ShearModulus` implements the first prototype, while
`UO2_ShearModulus2` implements the second.
The following code shows how to call the function
`UO2_ShearModulus`:

```cpp
auto G = std::vector<double>(4);
const auto T = std::vector<double>{300, 500, 300, 800};
const auto f = std::vector<double>{0.1};
auto output = mfront_gmp_OutputStatus{};
const auto policy = GENERIC_MATERIALPROPERTY_NONE_POLICY;
const auto args = std::array<const double *, 2u>{T.data(), f.data()};
const auto args_strides = std::array<mfront_gmp_size_type, 2u>{1, 0};
UO2_ShearModulus(&output, G.data(), 1, args.data(), args_strides.data(), 2, 4,
                 policy);
```

### `g++`

The following configuration file exemplifies how to use the
`g++` compiler to build the source code generated by the
stdpar backend:
```
interfaces_options: {
  generic-parallel: {
    backend: {stdpar: {execution_policy: "parallel_unsequenced_policy"}}
  }
},
compilation_options : {
  cxx : {
    compiler: "g++",
    compilation_flags: "-O2 -std=c++20 -march=native"
  }
},
linking_options : {
  linker_flags : "-ltbb"
}
```

### `nvhpc`

The following configuration file exemplifies how to use the
`nvc++` compiler from the NVIDIA HPC SDK (`nvhpc`) to build the source
code generated by the stdpar backend and execute the computation of the
material property on the device:
```
interfaces_options: {
  generic-parallel: {
    backend: {stdpar: {execution_policy: "parallel_unsequenced_policy"}}
  }
},
compilation_options : {
  cxx : {
    compiler: "nvc++",
    compilation_flags: "-O2 -stdpar=gpu -std=c++20 -march=native -gpu=sm_89"
  }
},
linking_options : {
  linker_flags: "-stdpar=gpu"
}
```

### Options of the stdpar backend

The only option available is `execution_policy`, which can
have one of the following values:

- `sequenced_policy` (or `seq`)
- `unsequenced_policy` (or `unseq`)
- `parallel_policy` (or `par`)
- `parallel_unsequenced_policy` (or `par_unseq`)

The exact meaning of those policies is implementation-defined and may
depend on the compiler flags used. See this page for details.
- `sequenced_policy` denotes a standard sequential loop.
- `unsequenced_policy` is generally associated with vectorization.
- `parallel_policy` and `parallel_unsequenced_policy` generally rely
  on a multithreading approach. This note applies to the current
  implementations provided by the Visual Studio, `gcc`,
  `clang++` and `icpx` compilers and the standard libraries they
  rely on.
- The `-stdpar=gpu` flag provided by the `nvhpc` compiler has two
  effects: the parallel algorithms are executed on the device (the
  GPU), and memory allocations (e.g. by `std::vector`) are managed:
  any required memory transfers between the host (the CPU) and the
  device (the GPU) are done transparently.
- `libstdc++` implements the parallel STL algorithms on top of the
  Threading Building Blocks library and requires an explicit link to
  this library (hence the `-ltbb` flag above).