Author Topic: the setting for the parallel computing of ATK  (Read 4836 times)

Offline luke419

  • Regular QuantumATK user
  • **
  • Posts: 15
  • Country: kr
  • Reputation: 0
    • View Profile
the setting for the parallel computing of ATK
« on: August 22, 2012, 06:24 »
Hello,

I'd like to run ATK in parallel with MPI on a system of 4 nodes, each with 8 cores (32 cores in total).
I've made an input file (Python) for ATK and submitted it through the queue system in the same way as I run VASP.

My questions are as follows.

1. Despite running in parallel, the speed is not good.
The queue status showed that the job was assigned to 4 nodes. However, when I logged in to those nodes, no running ATK program was shown by the "top" or "ps" commands.
Only one node showed one running "atkexec" job.
What is the problem, and what should I do about it?

Should I add commands for the number of nodes or cores to the Python input file for it to run correctly?
Or is it working correctly even though the processes are not shown?
Please answer in as much detail as possible.

2. Incidentally, the job stopped after 4 days without completing.
Can I restart it? If so, how?


Best regards,

  Young


Offline Anders Blom

  • QuantumATK Staff
  • Supreme QuantumATK Wizard
  • *****
  • Posts: 5429
  • Country: dk
  • Reputation: 89
    • View Profile
    • QuantumATK at Synopsys
Re: the setting for the parallel computing of ATK
« Reply #1 on: August 22, 2012, 10:40 »
1. What was your command line for starting ATK? You may need to provide a machinefile to make sure the jobs are spread out over the nodes. Remember not to use more than 1-2 MPI processes per socket, so assuming your nodes are dual quad-core machines, I would recommend running mpiexec -n 8. If your mpiexec supports it, you can use the argument -npernode 2; otherwise you will need a machinefile. To test whether you get the desired MPI allocation, try a small test script first (without any real calculations), containing just
Code: python
import socket
if processIsMaster():
    print "Master node",
else:
    print "Slave node",
print socket.gethostname()
If you run -n 8 you want this to write each hostname twice.

2. It depends a lot on where and why it stopped, but there is a checkpoint file that you could try. See http://quantumwise.com/publications/tutorials/mini-tutorials/142
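
For point 1, the output of the test script can be tallied with a few lines of ordinary Python. This is only a minimal sketch: it assumes each process printed one line of the form "Master node <host>" or "Slave node <host>", it is meant to be run with a plain Python 3 interpreter outside ATK, and the function name count_hosts and the sample host names are made up for illustration.

```python
from collections import Counter


def count_hosts(lines):
    """Count how many MPI processes reported each host name."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        # Keep only lines such as "Master node node45" or "Slave node node46";
        # banner lines and blank lines are skipped.
        if len(parts) == 3 and parts[0] in ("Master", "Slave") and parts[1] == "node":
            counts[parts[2]] += 1
    return counts


# Hypothetical output of a run with -n 8 on four nodes (two processes each).
sample = [
    "Master node node45", "Slave node node45",
    "Slave node node46", "Slave node node46",
    "Slave node node47", "Slave node node47",
    "Slave node node48", "Slave node node48",
]

for host, n in sorted(count_hosts(sample).items()):
    print(host, n)
```

If only one host shows up, or one host carries all the processes, the MPI allocation is not what you asked for.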

Offline luke419

Re: the setting for the parallel computing of ATK
« Reply #2 on: August 23, 2012, 03:40 »
I didn't compile it; I used the binary provided by QuantumWise.
The command I ran was as follows.

mpirun -np 16 -machinefile nodefile /opt/QuantumWise/atk-12.2.2/atkpython/bin/atkpython test.py > out &

nodefile contains three nodes.

I've got the following result using the test script.

+------------------------------------------------------------------------------+
|                                                                              |
| Atomistix ToolKit 12.2.2 [Build 144eba5]                                     |
|                                                                              |
+------------------------------------------------------------------------------+
Master node node45

No slave nodes or their host names were shown, so it seems that something is wrong.

What should I do about it?
Can I compile the ATK source files on my system?

Best regards,

  Young


Offline Anders Blom

Re: the setting for the parallel computing of ATK
« Reply #3 on: August 23, 2012, 23:07 »
No, you cannot obtain the source, and it wouldn't solve the problem anyway. The problem can normally be solved easily by making sure the MPI environment is properly configured.

One question that needs to be answered: which MPI library are you using? It must be MPICH2-compatible.

Also, test the simplest thing in the world:

Quote
mpirun -np 16 -machinefile nodefile echo $HOSTNAME

and see what that prints.
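
Since the shell that starts mpirun typically expands $HOSTNAME before the processes are launched, a complementary check is to ask for the host name inside each process. The following is a minimal sketch, assuming a plain Python 3 interpreter is available on every node; the file name hostcheck.py and the function name describe_process are hypothetical.

```python
import os
import socket


def describe_process():
    """Return the host name and process ID of the current process."""
    return "{} pid={}".format(socket.gethostname(), os.getpid())


if __name__ == "__main__":
    # Each MPI-launched copy of this script prints its own host and PID.
    print(describe_process())
```

Saved as hostcheck.py, it could be started in place of the echo command, with the same -np and -machinefile arguments as above. If every line shows the same host, the processes are not being distributed.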

Offline luke419

Re: the setting for the parallel computing of ATK
« Reply #4 on: August 24, 2012, 08:02 »
Thanks for your answers, Anders Blom.
I asked the system administrator about the MPI setup, and he said MPICH2 is installed. mpirun generally works well on this system (e.g., with VASP), so I think something needs to be modified for ATK, but I don't know what.
I am not an expert on the system details, so I will ask the administrator to post his questions using my ID and discuss this point with you.

If possible, please let me know your email address (by sending it to my email) so that he can discuss this with you directly.
By the way, if it turns out to be hard to solve, may I ask you to connect to our computer system and check the details?

Many thanks again for all your help.

Best regards,

  Young

 

Offline Anders Blom

Re: the setting for the parallel computing of ATK
« Reply #5 on: August 24, 2012, 08:22 »
What was the output of the simple command? My email address can be seen in my Forum profile.

Offline luke419

Re: the setting for the parallel computing of ATK
« Reply #6 on: August 27, 2012, 05:20 »
1. The output is as follows.
+------------------------------------------------------------------------------+
|                                                                              |
| Atomistix ToolKit 12.2.2 [Build 144eba5]                                     |
|                                                                              |
+------------------------------------------------------------------------------+
Master node master

In the previous test, it was node45 instead of master, but it didn't show any slave nodes either.

2. The email address is "hidden" in your forum profile.
Could you please send your email address to luke419@google.com?

3. Which version of MPICH2 was used to compile ATK 12.2.2?


Best regards,

  Young

« Last Edit: August 27, 2012, 07:04 by luke419 »

Offline Anders Blom

Re: the setting for the parallel computing of ATK
« Reply #7 on: August 27, 2012, 09:34 »
Please run

mpirun -n 4 echo $HOSTNAME

with or without the machinefile argument.

We built ATK with MPICH2 1.3.2p1 but you can use any higher version.
« Last Edit: August 27, 2012, 09:40 by Anders Blom »

Offline luke419

Re: the setting for the parallel computing of ATK
« Reply #8 on: August 27, 2012, 13:38 »
1. I've tested the script with and without the machinefile list, and the output is the same, as follows.

master -p4pg /home/yijhon/ATK/test/PI9492 -p4wd /home/yijhon/ATK/test

The PI9492 part changes (PIxxxx) between runs; it is just the job number.

2. If possible, please let me know your email address by sending it to luke419@google.com so that the system administrator can contact you.

Best regards,

  Young

Offline Anders Blom

Re: the setting for the parallel computing of ATK
« Reply #9 on: August 27, 2012, 13:54 »
If you can't run "echo $HOSTNAME" in parallel it's a general problem with your parallel configuration which has nothing to do with ATK, and which your sysadmin needs to help you solve.

My email is not hidden in my profile.
« Last Edit: August 27, 2012, 13:56 by Anders Blom »

Offline luke419

Re: the setting for the parallel computing of ATK
« Reply #10 on: September 14, 2012, 18:58 »
First, many thanks for all your previous help, Anders Blom.
The hostname test was done again, and it runs well on the computer system.
It seems that MPICH2 is the best choice for ATK, but I'd like to use "mpich intel" if possible, since there is a problem with MPICH2 on our system.
It is written that "mpich intel" should be all right for running ATK.
However, when I run it, only one process (one core) runs on the node where the job was submitted, and it is not distributed to the other cores or the other machines.
If you know about this issue, would you let me know?
Incidentally, "mpich intel" works well on our system for many other programs.

Best

  Young


Offline Anders Blom

Re: the setting for the parallel computing of ATK
« Reply #11 on: September 15, 2012, 00:52 »
In our experience Intel MPI should work; we have other customer systems where it's running fine.

I'm not experienced with it personally, however, since we don't use it in-house. It sounds to me as if it's a matter of giving the correct parameters to Intel MPI and making sure it's configured correctly. Once that is taken care of, ATK shouldn't have any problem running the way you want. But you really should get the administrator of the cluster to provide the relevant parameters for running on multiple nodes. There are no special parameters to give to ATK for this, and the symptom you mention, that the processes only run on one node, is related to the MPI (and possibly the queue system), i.e. it's outside our control.

Offline Anders Blom

Re: the setting for the parallel computing of ATK
« Reply #12 on: September 15, 2012, 00:54 »
Moreover, troubleshooting or giving advice on issues like this is impossible without at least a hint of the PBS script and the output you get (like "cat $PBS_NODEFILE" executed inside the submission script, etc.).
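
As a rough illustration of the kind of information that helps, the sketch below summarises the node file that the queue system hands to the job. It assumes a PBS-style scheduler that sets the PBS_NODEFILE environment variable to a file with one host name per allocated slot, and a plain Python 3 interpreter; the function name summarize_nodefile is hypothetical.

```python
import os
from collections import Counter


def summarize_nodefile(path):
    """Return a Counter mapping each host name to its number of entries."""
    with open(path) as handle:
        # One host name per line; blank lines are ignored.
        hosts = [line.strip() for line in handle if line.strip()]
    return Counter(hosts)


if __name__ == "__main__":
    nodefile = os.environ.get("PBS_NODEFILE")
    if nodefile is None:
        print("PBS_NODEFILE is not set; run this inside a PBS job.")
    else:
        for host, slots in sorted(summarize_nodefile(nodefile).items()):
            print(host, slots)
```

Running this from inside the submission script and posting the result, together with the script itself, shows which nodes and how many slots the job actually received.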