Author Topic: MPI for QuantumATK-X-2025-06-SP1

hagelberg (Regular QuantumATK user)
MPI for QuantumATK-X-2025-06-SP1
« on: January 6, 2026, 23:20 »
Hi - I'm trying to make QuantumATK-X-2025-06-SP1 functional on the ORNL cluster CADES.
While I was able to install the program, I haven't been able to run parallel jobs so far.
Here is an example of an MPI script that worked extremely well on CADES for QuantumATK-U-2022.12:

#!/bin/bash
#SBATCH --nodes 2
##SBATCH --ntasks-per-node 4
#SBATCH --exclusive
#SBATCH --mem=0
#SBATCH -p high_mem
#SBATCH -A cnms
#SBATCH -t 24:00:00
#SBATCH -o output_%j.out
#SBATCH -e output_%j.err

module purge
module load gcc/8

export SNPSLMD_LICENSE_FILE="[email protected]"

srun -N2 --ntasks-per-node=4 -v --mpi=pmi2 \
    /home/fhagelberg/QuantumATK-2/QuantumATK-U-2022.12-SP1/atkpython/bin/atkpython \
    /lustre/or-scratch/cades-birthright/fhagelberg/Olaf/zWS2+Hbridge-AFM-SGGA-Dojo-2-electrode.py

When I use the same script for QuantumATK-X-2025-06-SP1, cryptic error messages appear (see below). PMI2 no longer seems to work (PMIx is unavailable on CADES). Is there any other way I could set up MPI to run the program in a SLURM environment, leveraging srun, which has turned out to be very effective?


Error message:

srun: defined options
srun: -------------------- --------------------
srun: (null)              : or-condo-c[317,364]
srun: jobid               : 4348670
srun: job-name            : atksub
srun: mpi                 : pmi2
srun: nodes               : 2
srun: ntasks-per-node     : 4
srun: oom-kill-step       : 0
srun: verbose             : 1
srun: -------------------- --------------------
srun: end of defined options
srun: jobid 4348670: nodes(2):`or-condo-c[317,364]', cpu counts: 36(x2)
srun: CpuBindType=(null type)
srun: launching StepId=4348670.0 on host or-condo-c317, 4 tasks: [0-3]
srun: launching StepId=4348670.0 on host or-condo-c364, 4 tasks: [4-7]
srun: topology/default: init: topology Default plugin loaded
srun: Node or-condo-c317, 4 tasks started
srun: Node or-condo-c364, 4 tasks started
slurmstepd: error: mpi/pmi2: value not properly terminated in client request
slurmstepd: error: mpi/pmi2: request not begin with 'cmd='
slurmstepd: error: mpi/pmi2: full request is: 00000000000000000000000000000000000000000000000

filipr (QuantumATK Staff, QuantumATK developer)
Re: MPI for QuantumATK-X-2025-06-SP1
« Reply #1 on: January 14, 2026, 11:44 »
The difference between v. U-2022.12 and X-2025.06 is probably that the former ships Intel MPI 2018.1 while the latter ships Intel MPI 2021.15. I am not sure what has changed between those versions with respect to PMI2, but according to Intel's documentation (https://www.intel.com/content/www/us/en/docs/mpi-library/developer-guide-linux/2021-6/job-schedulers-support.html) it should still be supported. Maybe you need to manually set the path to the PMI library in the environment variable I_MPI_PMI_LIBRARY. You can also try srun with default/no MPI arguments and see if it works out of the box.
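
For example, the relevant lines of your submission script could look something like this (just a sketch: the libpmi2.so path below is a guess, and the atkpython and input-file paths are placeholders, so adjust all of them to the actual locations on CADES):

# Point Intel MPI at SLURM's PMI2 library (assumed location; verify with e.g. `ldconfig -p | grep pmi2`)
export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi2.so

srun -N2 --ntasks-per-node=4 -v --mpi=pmi2 \
    /path/to/QuantumATK-X-2025.06-SP1/atkpython/bin/atkpython input.py

# or, to test the site's default MPI plugin instead of forcing pmi2:
srun -N2 --ntasks-per-node=4 -v \
    /path/to/QuantumATK-X-2025.06-SP1/atkpython/bin/atkpython input.py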

If none of this helps, I suggest you ask Intel directly, as we're not really in control of, nor knowledgeable about, the inner workings of their MPI implementation. They have a user support forum here: https://community.intel.com/t5/Intel-MPI-Library/bd-p/oneapi-hpc-toolkit