QuantumATK Forum
QuantumATK => Scripts, Tutorials and Applications => Topic started by: hagelberg on January 6, 2026, 23:20
-
Hi - I'm trying to get QuantumATK-X-2025-06-SP1 running on the ORNL cluster CADES.
While I was able to install the program, I have not yet been able to run parallel jobs.
Here is an example of an MPI submission script that worked extremely well on CADES with QuantumATK-U-2022.12:
#!/bin/bash
#SBATCH --nodes 2
##SBATCH --ntasks-per-node 4
#SBATCH --exclusive
#SBATCH --mem=0
#SBATCH -p high_mem
#SBATCH -A cnms
#SBATCH -t 24:00:00
#SBATCH -o output_%j.out
#SBATCH -e output_%j.err
module purge
module load gcc/8
export SNPSLMD_LICENSE_FILE="[email protected]"
srun -N2 --ntasks-per-node=4 -v --mpi=pmi2 /home/fhagelberg/QuantumATK-2/QuantumATK-U-2022.12-SP1/atkpython/bin/atkpython /lustre/or-scratch/cades-birthright/fhagelberg/Olaf/zWS2+Hbridge-AFM-SGGA-Dojo-2-electrode.py
When I use the same script for QuantumATK-X-2025-06-SP1, cryptic error messages appear (see below). PMI2 does not seem to work any longer, and pmix is unavailable on CADES. Is there any other way I could set up MPI to run the program in a SLURM environment, leveraging srun, which has turned out to be very effective?
Error message:
srun: defined options
srun: -------------------- --------------------
srun: (null) : or-condo-c[317,364]
srun: jobid : 4348670
srun: job-name : atksub
srun: mpi : pmi2
srun: nodes : 2
srun: ntasks-per-node : 4
srun: oom-kill-step : 0
srun: verbose : 1
srun: -------------------- --------------------
srun: end of defined options
srun: jobid 4348670: nodes(2):`or-condo-c[317,364]', cpu counts: 36(x2)
srun: CpuBindType=(null type)
srun: launching StepId=4348670.0 on host or-condo-c317, 4 tasks: [0-3]
srun: launching StepId=4348670.0 on host or-condo-c364, 4 tasks: [4-7]
srun: topology/default: init: topology Default plugin loaded
srun: Node or-condo-c317, 4 tasks started
srun: Node or-condo-c364, 4 tasks started
slurmstepd: error: mpi/pmi2: value not properly terminated in client request
slurmstepd: error: mpi/pmi2: request not begin with 'cmd='
slurmstepd: error: mpi/pmi2: full request is: 00000000000000000000000000000000000000000000000
-
The difference between v. U-2022.12 and X-2025.06 is probably that the former ships Intel MPI 2018.1 while the latter ships Intel MPI 2021.15. I am not sure what has changed between those versions with respect to PMI2, but according to Intel's documentation (https://www.intel.com/content/www/us/en/docs/mpi-library/developer-guide-linux/2021-6/job-schedulers-support.html) it should still be supported. You may need to manually point Intel MPI to the PMI2 library via the environment variable I_MPI_PMI_LIBRARY. You can also try srun with its default settings (no --mpi argument) and see if it works out of the box.
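For illustration, a minimal sketch of how that could look in your batch script. The library location /usr/lib64/libpmi2.so and the QuantumATK install path below are placeholders I cannot verify for CADES, so adjust both:
# Point Intel MPI 2021.x at SLURM's PMI2 library.
# NOTE: /usr/lib64/libpmi2.so is only an example path; check where libpmi2.so
# actually lives on CADES (your cluster admins can confirm).
export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi2.so
# Option 1: keep the explicit PMI2 plugin (placeholder install path and input file).
srun -N2 --ntasks-per-node=4 --mpi=pmi2 /path/to/QuantumATK-X-2025.06-SP1/atkpython/bin/atkpython input.py
# Option 2: let srun use its default MPI plugin; `srun --mpi=list` shows what is available.
# srun -N2 --ntasks-per-node=4 /path/to/QuantumATK-X-2025.06-SP1/atkpython/bin/atkpython input.py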
If none of this helps, I suggest you ask Intel directly, as we neither control nor have detailed knowledge of the inner workings of their MPI implementation. They have a user support forum here: https://community.intel.com/t5/Intel-MPI-Library/bd-p/oneapi-hpc-toolkit