Author Topic: Question regarding parallelization calculation using mpi  (Read 3368 times)


Offline pshinyeong

  • Heavy QuantumATK user
  • ***
  • Posts: 37
  • Country: kr
  • Reputation: 0
Hello, I am trying to test MPI. For some reason mpirun works perfectly with the QuantumATK 2020.09 and 2022.12 versions, but it doesn't work with 2023.03 and 2023.12.

(Command I used)
mpirun -f hf78 -ppn 15 /home/edr_05/QuantumATK/QuantumATK-U-2022.12-SP1/bin/atkpython mpi_test.py > mpi_test.log &

(output shown when I type "ps -ef | grep mpirun")
/bin/sh /opt/intel/oneapi/mpi/2021.10.0//bin/mpirun -f hf78 -ppn 15 /home/edr_05/QuantumATK/QuantumATK-U-2022.12-SP1/bin/atkpython mpi_test.py

This command works perfectly using nodes 7 and 8, 15 cores each, and writes the log as it is supposed to.

However, if I use the same command with atkpython from the bin folder of QuantumATK 2023.12 or 2023.03, it doesn't write any log file and doesn't run any calculation. Similarly, if I use mpiexec.hydra or mpiexec from the mpi folder of the new versions, it also doesn't write any log. mpirun works well with the 2023.03 and 2023.12 atkpython only when assigning one node, but fails when assigning more than one node.

Can anyone help me solve this issue? Thank you.

Offline filipr

  • QuantumATK Staff
  • Heavy QuantumATK user
  • *****
  • Posts: 81
  • Country: dk
  • Reputation: 6
  • QuantumATK developer
Re: Question regarding parallelization calculation using mpi
« Reply #1 on: December 21, 2023, 15:39 »
Yes, this is a recurring problem that requires a little Christmas story. First of all, it pays to know that atkpython is actually a launcher script - a bash script that sets several important environment variables before the program is run. In particular, it sets the PATH and LD_LIBRARY_PATH environment variables, which control where the OS loader looks for shared libraries such as libmpi.so.

Before v. 2023.03 we set LD_LIBRARY_PATH so that the loader would also load the MPI library we ship with QuantumATK (yes, we ship Intel MPI - no need to install it yourself, though it may still be preferable). This, however, caused problems on some clusters that couldn't use the Intel MPI we shipped, because it was an older version. To allow users to use their own MPI installation, we changed the launcher to append the path to the shipped MPI library, which means that if the system already has a libmpi.so in LD_LIBRARY_PATH, it will be preferred. This also caused problems, because many people run it on systems where they don't realize they already have an Intel MPI in their paths.

So in v. 2023.09 we now ship two launchers: atkpython, which will ALWAYS use the shipped libmpi.so, and atkpython_custom-mpi, which assumes that you are using a system-installed Intel MPI.
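To make the difference concrete, here is a minimal sketch of the two behaviours (just the idea, not the actual contents of the launcher script; /path/to/QuantumATK/lib is a placeholder):

# Pre-2023.03 behaviour: the shipped library directory is searched first, so the shipped libmpi.so wins
export LD_LIBRARY_PATH="/path/to/QuantumATK/lib:$LD_LIBRARY_PATH"

# 2023.03 behaviour: the shipped directory is appended, so any libmpi.so already on LD_LIBRARY_PATH wins
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/path/to/QuantumATK/lib"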

Here's a guide to using them:

If you want to use the shipped MPI, you should execute the atkpython launcher, but for safety also use the shipped MPI launcher in path/to/QuantumATK/mpi/bin/mpiexec, and further ensure that you do not have unnecessary I_MPI_* environment variables set, especially I_MPI_ROOT. Be sure not to source any mpivars.sh or similar scripts, and don't use the mpirun script that ships with Intel MPI.
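For example, adapting the command from the first post to the shipped MPI would look roughly like this (the host file hf78 and core count are taken from that post; the QuantumATK installation path is a placeholder - adjust to your own setup):

unset I_MPI_ROOT    # avoid accidentally picking up a system Intel MPI
/path/to/QuantumATK/mpi/bin/mpiexec -f hf78 -ppn 15 \
    /path/to/QuantumATK/bin/atkpython mpi_test.py > mpi_test.log &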

If you want to use the system Intel MPI, you have to set up the Intel MPI environment first, either by sourcing mpivars.sh or via module load intel-mpi if the cluster has a module for it. Then you should run atkpython_custom-mpi instead of atkpython, and everything should work if you have configured MPI correctly (but that is also now your own responsibility).
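A rough example with a system-installed Intel MPI (the source path matches the oneAPI installation mentioned in the first post and may differ on your cluster; the QuantumATK path is a placeholder):

source /opt/intel/oneapi/mpi/2021.10.0/env/vars.sh    # or: module load intel-mpi, or source mpivars.sh
mpirun -f hf78 -ppn 15 \
    /path/to/QuantumATK/bin/atkpython_custom-mpi mpi_test.py > mpi_test.log &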

Offline pshinyeong

  • Heavy QuantumATK user
  • ***
  • Posts: 37
  • Country: kr
  • Reputation: 0
Re: Question regarding parallelization calculation using mpi
« Reply #2 on: December 21, 2023, 16:25 »
Thank you for your quick response. Using atkpython_system-mpi does the job.