Topic: mpirun hostfile rank allocate error

Lim changmin

  • Heavy QuantumATK user
« on: December 11, 2023, 03:20 »
Dear Expert,
Hello, I am currently having some problems when running with MPI.

My cluster nodes are dirac1, dirac2, ..., dirac8, and each node can run at most 20 MPI processes.

I named the hostfiles after the nodes they contain: hf1 for dirac1, hf2 for dirac2, and hf34 for dirac3 + dirac4.

It works well when I use a hostfile with just one node; the calculation runs properly.

However, when I use a hostfile with more than one node (for example, hf34 = dirac3 + dirac4), the log file stays blank, although the terminal shows that the processes are running properly.

Also, to allocate the MPI processes, I create the hostfile in the terminal:

vi hf34
dirac3:15
dirac4:15
:wq
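The same machine file can also be written without opening vi, which makes it easy to regenerate from a script. A minimal sketch, assuming the node names and the 15-processes-per-node split described in this post; adjust both for your own cluster:

```shell
#!/bin/sh
# Write the machine file hf34 non-interactively: one "host:count" line per node.
# Node names and counts are taken from the example in this post; change as needed.
printf '%s\n' 'dirac3:15' 'dirac4:15' > hf34

# Print the file to confirm its contents.
cat hf34
```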

Then I run the calculation with:
mpirun -n hf34 -p 30 home/edrl_04/Desktop/quantum23ver/quantumatk/V-2023.09/bin/atkpython opt_device_conf_110_5e20.py > opt_device_conf_110_5e20.log &

the processes are allocated as dirac3:20 and dirac4:10, although I set 15 processes for each node. And the log file still shows nothing.

I am using QuantumATK version 2023.09 and Intel MPI version 2021.10.

Is there anything I can do to fix these two major problems?
Thank you for reading my questions.

filipr

  • QuantumATK Staff
  • Heavy QuantumATK user
  • QuantumATK developer
« Reply #1 on: December 11, 2023, 10:02 »
First of all: if your computer cluster has a scheduling system, you should probably use it instead of assigning the process allocation manually. When you specify the number of processes per host in the file, it is not a hostfile but a machine file. Also, the -n option specifies the total number of processes, not the hostfile; to pass the file you should use the -f or -machinefile option instead. And if you specify the number of processes per host in the machine file, you should not use the -p or -ppn option. I would assume that the correct command would be:
Code
mpirun -machinefile hf34 home/edrl_04/Desktop/quantum23ver/quantumatk/V-2023.09/bin/atkpython opt_device_conf_110_5e20.py > opt_device_conf_110_5e20.log &
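Before launching, it can help to check that the machine file adds up to the number of ranks you expect. The sketch below sums the per-host counts in a `host:count` machine file and warns if any host is asked for more processes than a given maximum (20 per node in the question). The function name `check_machinefile` is hypothetical, treating a bare host name as 1 process is an assumption, and passing the total to `-n` alongside `-machinefile` is also an assumption to verify with `mpirun --help` for your Intel MPI version:

```shell
#!/bin/sh
# Hypothetical helper: sum the "host:count" lines of a machine file and
# warn (on stderr) about any host assigned more than MAX_PER_HOST processes.
# Usage: check_machinefile FILE MAX_PER_HOST
check_machinefile() {
    awk -F: -v max="$2" '
        NF == 0 { next }                      # skip blank lines
        {
            n = (NF >= 2) ? $2 : 1            # bare host name assumed to mean 1 process
            total += n
            if (n > max) {
                printf("warning: %s requests %d processes (max %d)\n", $1, n, max) > "/dev/stderr"
            }
        }
        END { print total + 0 }               # print the total rank count
    ' "$1"
}

# Example: the hf34 file from the question should give 30 ranks.
printf '%s\n' 'dirac3:15' 'dirac4:15' > hf34
total=$(check_machinefile hf34 20)
echo "total ranks: $total"

# Possible launch line (assumption: -n set to the machine-file total):
# mpirun -machinefile hf34 -n "$total" home/edrl_04/Desktop/quantum23ver/quantumatk/V-2023.09/bin/atkpython opt_device_conf_110_5e20.py > opt_device_conf_110_5e20.log &
```

Warnings go to stderr so that the captured total stays a plain number.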
To see how to correctly execute the mpirun/mpiexec command, try running `mpirun --help`. For further information on how to control process allocation, see:
https://www.intel.com/content/www/us/en/developer/articles/technical/controlling-process-placement-with-the-intel-mpi-library.html
https://www.intel.com/content/www/us/en/docs/mpi-library/developer-reference-linux/2021-8/global-hydra-options.html

If you try the above and follow the documentation and it still doesn't work, I suggest you ask your cluster admin/support or create a post on the Intel MPI support forum, as your problem does not seem to be related to QuantumATK specifically.
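One quick way to see where mpirun actually places the ranks, without starting a QuantumATK calculation, is to launch a trivial command such as `hostname` with the same options and count the output lines per host. A minimal sketch; the helper name `ranks_per_host` is hypothetical, and the exact options on the commented mpirun line are an assumption based on this thread:

```shell
#!/bin/sh
# Hypothetical helper: read one hostname per line (one line per MPI rank)
# and print "host count" pairs, sorted by host name.
ranks_per_host() {
    sort | uniq -c | awk '{ print $2, $1 }'
}

# Real usage (requires Intel MPI and the hf34 machine file):
#   mpirun -machinefile hf34 -n 30 hostname | ranks_per_host
# For the intended allocation this should report 15 ranks on each node.

# Offline demonstration with simulated output from 30 ranks:
{
    i=0; while [ "$i" -lt 15 ]; do echo dirac3; i=$((i + 1)); done
    i=0; while [ "$i" -lt 15 ]; do echo dirac4; i=$((i + 1)); done
} | ranks_per_host
```

If the real run reports something like dirac3 20 and dirac4 10, the placement options are not being picked up as intended.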