Author Topic: Parallelization with MPICH2  (Read 5727 times)


Offline atk_user

  • Heavy QuantumATK user
  • ***
  • Posts: 48
  • Country: 00
  • Reputation: 0
Parallelization with MPICH2
« on: February 10, 2014, 01:31 »
Dear ATK staff members,

I'm using the demo version of ATK (13.8.0), installed on Linux (CentOS 5.8).

I want to run a calculation on several nodes: 4 nodes in total, with 8 CPUs per node.

In the manual it says that to run ATK on 4 nodes:

mpiexec -n 4 $ATK_BIN_DIR/atkpython script.py > script.log

With this command, however, the job runs only on a single node (the processes are not distributed equally):

node1 (4 processes running)
node2 (not running)
node3 (not running)
node4 (not running)

I found the Guide to Running in Parallel in the manual.

It says that in order to distribute the MPI processes equally among the nodes, one should use the -npernode option. However, that option is not supported by MPICH2; I checked several versions (1.0.5p4, 1.2.1p1, 1.4, 1.5).

Does the demo version not support parallelization?

« Last Edit: February 10, 2014, 02:10 by atk_user »

Offline atk_user

  • Heavy QuantumATK user
  • ***
  • Posts: 48
  • Country: 00
  • Reputation: 0
Re: Parallelization with MPICH2
« Reply #1 on: February 10, 2014, 02:47 »
I solved this problem by configuring the queueing system.
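
In case it helps others: the details are cluster-specific, but the general idea is to let the queue system allocate the nodes and write the machine list, and then hand that list to mpiexec. A rough sketch for an SGE-type setup (the parallel environment name "mpi" and the machinefile path are placeholders, and how the slots are spread over the nodes is controlled by the PE's allocation rule):

#$ -pe mpi 4
# $TMPDIR/machines is the machine list generated by the SGE parallel environment
# (with the Hydra process manager the flag is -f instead of -machinefile)
mpiexec -machinefile $TMPDIR/machines -n 4 $ATK_BIN_DIR/atkpython script.py > script.log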

Offline Anders Blom

  • QuantumATK Staff
  • Supreme QuantumATK Wizard
  • *****
  • Posts: 5578
  • Country: dk
  • Reputation: 96
Re: Parallelization with MPICH2
« Reply #2 on: February 10, 2014, 12:36 »
I see you solved it, but for others browsing the forum it is worth pointing out that the resource allocation is handled by the queue system and not by ATK, so any issues with it can be quite cluster-specific.

"-npernode" is a bit special and not always available; however, on newer installations, and in particular if you use HYDRA as the process manager, you should use "-ppn" instead.

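For example, with a Hydra-based MPICH2 something along these lines should place one process on each of the 4 nodes (hosts.txt is just an example name for a file listing the node names, one per line):

mpiexec -f hosts.txt -ppn 1 -n 4 $ATK_BIN_DIR/atkpython script.py > script.log
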
Another important point is to disable threading when running MPI-parallel (this is a new discovery and we're still working on understanding it fully). So, in your submit script, include

export OMP_NUM_THREADS=1
export OMP_DYNAMIC=FALSE

And then you can even try "-n 32" when running on the 4 nodes, to use all cores for MPI.
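
Putting it together, the relevant part of the submit script could look roughly like this (a sketch only; the hostfile handling depends on your queue system, and -ppn assumes the Hydra process manager):

export OMP_NUM_THREADS=1
export OMP_DYNAMIC=FALSE
# 32 MPI processes in total, 8 per node, spread over the 4 nodes
mpiexec -f hosts.txt -ppn 8 -n 32 $ATK_BIN_DIR/atkpython script.py > script.log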

Offline atk_user

  • Heavy QuantumATK user
  • ***
  • Posts: 48
  • Country: 00
  • Reputation: 0
Re: Parallelization with MPICH2
« Reply #3 on: February 13, 2014, 06:20 »
Thank you for your suggestion. The calculation is now successfully parallelized. Thank you again.

Offline atk_user

  • Heavy QuantumATK user
  • ***
  • Posts: 48
  • Country: 00
  • Reputation: 0
Re: Parallelization with MPICH2
« Reply #4 on: February 25, 2014, 06:10 »
Dear Dr. Anders Blom

I have a further question about parallelization.
On our cluster (8 CPUs per node, with the nodes connected by InfiniBand), the Mesa library is not officially supported, and mpiexec (and mpdboot) is not available; only the mpirun and mpirun_rsh commands are supported, through the Sun Grid Engine (SGE) queueing system.

If I run ATK with OpenMP like

#$ -pe openmp 8
export OMP_NUM_THREADS=8

atkpython input.py

the calculation runs fine, with no error messages about the Mesa library. But if I run ATK with mpirun or mpirun_rsh in the MVAPICH2 environment, a Mesa library error occurs: ImportError: libGLU.so.1: cannot open shared object file: No such file or directory

Is there any way to run ATK with mpirun or mpirun_rsh under an MVAPICH2 or OpenMPI environment?

Offline Anders Blom

  • QuantumATK Staff
  • Supreme QuantumATK Wizard
  • *****
  • Posts: 5578
  • Country: dk
  • Reputation: 96
Re: Parallelization with MPICH2
« Reply #5 on: February 25, 2014, 11:32 »
These things are quite cluster-dependent, so I don't actually know what your "mpirun_rsh" is. However, ATK is not compatible with OpenMPI; it's a different technology.

The libGLU error message is probably related to this: http://www.quantumwise.com/support/faq/116
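
In short, one of the components loaded by atkpython needs libGLU.so.1. If the Mesa GLU library is not installed system-wide on the compute nodes, installing it, or pointing the environment at a copy of the library in the submit script, normally resolves the import error. The path below is only a placeholder for wherever the library actually sits on your system:

export LD_LIBRARY_PATH=/path/to/directory/containing/libGLU:$LD_LIBRARY_PATH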