The dapl async_event error could be caused by 1) a bug or incompatibility in Intel MPI, 2) a wrongly configured Intel MPI, or 3) a wrongly configured network infrastructure on the cluster.
If you can run other MPI software on the cluster, we can disregard (3).
QuantumATK 2019.12 ships Intel MPI 2018 update 1. For (2), please read the Intel MPI user guide and documentation, which you can find here. I suggest reaching out to your cluster administrator to get advice on how to configure Intel MPI to use the cluster network infrastructure.
If you believe that Intel MPI 2018 update 1 is incompatible with your cluster, it is possible to use a newer version when running QuantumATK. If your cluster already has a newer version of Intel MPI installed in a module system, you can simply load it with the corresponding module load command in your submission script.
If Intel MPI is not installed, you can install it yourself by downloading the oneAPI installer from Intel's website. Then in your submission script put:
source /path/to/intel/oneapi/mpi/latest/env/mpivars.sh
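The name of the environment script can differ between Intel MPI releases. As far as I know, oneAPI installations usually provide env/vars.sh, while older standalone releases used mpivars.sh, so treat both file names as assumptions and check your own installation. A minimal sketch that sources whichever one exists (INTEL_MPI_ROOT and find_intel_mpi_env_script are hypothetical names used only for illustration):

```shell
# Hypothetical install root; replace with your actual Intel MPI location.
INTEL_MPI_ROOT="${INTEL_MPI_ROOT:-/path/to/intel/oneapi/mpi/latest}"

# Print the first environment script found under the given root.
# The candidate file names are assumptions, not a guaranteed layout.
find_intel_mpi_env_script() {
    local root="$1" candidate
    for candidate in "$root/env/vars.sh" "$root/env/mpivars.sh"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Source at top level so the script sees the submission script's own arguments.
if env_script=$(find_intel_mpi_env_script "$INTEL_MPI_ROOT"); then
    . "$env_script"
else
    echo "No Intel MPI environment script found under $INTEL_MPI_ROOT" >&2
fi
```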
You can verify that the correct version is found by checking which mpiexec executable your shell resolves and which version it reports.
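One way to do this check, as a minimal sketch: the helper below prints where a launcher resolves on PATH and the first line of its version output. The function name report_launcher is hypothetical, and it assumes the launcher accepts a --version option, which you should confirm for your Intel MPI release.

```shell
# Report where a launcher resolves on PATH and the first line of its version output.
# Assumes the launcher understands --version (verify this for your release).
report_launcher() {
    local name="${1:-mpiexec}" resolved
    if resolved=$(command -v "$name"); then
        echo "$name resolves to: $resolved"
        "$name" --version 2>&1 | head -n 1
    else
        echo "$name not found on PATH" >&2
        return 1
    fi
}

report_launcher mpiexec || true
```

If the reported path still points into the QuantumATK installation, the new Intel MPI environment has not been picked up.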
Be sure that your submission script uses the mpiexec/mpirun executable of the new version of Intel MPI (which should now be in your PATH) and not the hardcoded path to the one in QuantumATK/libexec/mpiexec.hydra. If you use a job scheduler you may have to use the MPI launcher it provides, e.g. for SLURM use srun.
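As a rough sketch of the launcher choice, a submission script could pick srun when it runs inside a SLURM job and fall back to the mpiexec on PATH otherwise. The function name choose_launcher is hypothetical, and whether srun is the right launcher on your cluster is something your administrator should confirm.

```shell
# Pick srun inside a SLURM job (SLURM_JOB_ID is set by SLURM) if srun is available,
# otherwise fall back to whichever mpiexec is first on PATH.
choose_launcher() {
    if [ -n "${SLURM_JOB_ID:-}" ] && command -v srun >/dev/null 2>&1; then
        echo srun
    else
        echo mpiexec
    fi
}

LAUNCHER=$(choose_launcher)
echo "Launcher to use: $LAUNCHER"
```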
Now, this will still not work out of the box. The reason is that the QuantumATK/bin/atkpython file is actually not an executable but a launcher script, which sets up environment variables so that third-party libraries like Intel MPI can be found. You can open it in a text editor if you are curious. What matters in this case is that it prepends the path to the QuantumATK/lib directory to LD_LIBRARY_PATH. When the program launches, it looks for the MPI library (libmpi.so) in the directories in LD_LIBRARY_PATH in the order they appear, so it will always find the 2018.1 version in QuantumATK/lib first. To force it to use the newer version of Intel MPI, you therefore have to delete or rename all files starting with "libmpi" in the QuantumATK/lib directory. That way it will instead find the newer version in e.g. /path/to/intel/oneapi/mpi/latest/lib.
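Renaming is easier to undo than deleting. A minimal sketch, assuming you rename by appending a ".disabled" suffix (the suffix, the function name disable_bundled_mpi, and the QUANTUMATK_LIB variable are all hypothetical choices for illustration):

```shell
# Rename every file starting with "libmpi" in the given directory by appending
# ".disabled", so the dynamic loader no longer finds it under its original name.
disable_bundled_mpi() {
    local libdir="$1" f
    for f in "$libdir"/libmpi*; do
        # Skip when the glob matched nothing (also handles dangling symlinks).
        [ -e "$f" ] || [ -L "$f" ] || continue
        # Skip files already renamed by a previous run.
        case "$f" in *.disabled) continue ;; esac
        mv -- "$f" "$f.disabled"
    done
}

# Hypothetical location; replace with the lib directory of your installation.
QUANTUMATK_LIB="${QUANTUMATK_LIB:-/path/to/QuantumATK/lib}"
if [ -d "$QUANTUMATK_LIB" ]; then
    disable_bundled_mpi "$QUANTUMATK_LIB"
fi
```

To restore the bundled library later, strip the suffix from the renamed files again.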
You can verify that an atkpython run uses the correct Intel MPI version by enabling Intel MPI's debug output in your submission script. When the program starts, it should output debug log messages from Intel MPI, including the version of the library used.
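For example, Intel MPI reads its debug level from the I_MPI_DEBUG environment variable. The level 5 used below is an assumption on my part (it is a commonly used value); consult the Intel MPI documentation for the levels your version supports.

```shell
# Ask Intel MPI to print debug information at startup.
# The level (5) is an assumed value; adjust it according to the Intel MPI documentation.
export I_MPI_DEBUG=5
```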
Starting with QuantumATK 2022.12, we ship Intel MPI in a separate folder and append, instead of prepend, that directory to LD_LIBRARY_PATH. This makes it easier for users to use their own installation of Intel MPI without having to delete or rename files.
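To illustrate why the order matters, here is a simplified model of the lookup: walk a colon-separated list from left to right and report the first directory containing the library. It is only a sketch (the real dynamic loader also considers other sources), and first_match and the temporary directories are hypothetical names made up for the demonstration.

```shell
# Print the first directory in a colon-separated list that contains the named file.
first_match() {
    local search_path="$1" name="$2" dir
    local IFS=:
    for dir in $search_path; do
        [ -n "$dir" ] || continue
        if [ -e "$dir/$name" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}

# Demonstration with two throwaway directories that both contain libmpi.so.
demo=$(mktemp -d)
mkdir -p "$demo/bundled" "$demo/own"
touch "$demo/bundled/libmpi.so" "$demo/own/libmpi.so"
first_match "$demo/bundled:$demo/own" libmpi.so   # bundled dir first: bundled copy wins
first_match "$demo/own:$demo/bundled" libmpi.so   # bundled dir last: your own copy wins
rm -rf "$demo"
```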