The first thing to check is whether the 3rd machine gets an atkpython process at all. If it does, but the process uses very little CPU, it may be that there are not enough independent parts of the calculation to keep all 6 MPI processes busy. This would be the case, for instance, for a bulk system with only a few k-points.
More likely, however, your 3rd node doesn't participate in the run at all (again, check whether an atkpython process is running on it). If there isn't one, you may need to pass a "-machinefile" argument to mpiexec (see the MPICH2 documentation). It could also be that mpd is not running on the 3rd machine, or that it is running but the machine from which you launch the job doesn't know about it. Make sure your mpd ring is properly configured.
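As a rough sketch of what that might look like with MPICH2 (the hostnames and file names below are placeholders, so adapt them to your cluster):

```shell
# Hypothetical hostnames; replace with your own machines.
cat > machines.txt <<'EOF'
node1
node2
node3
EOF

# Check which hosts are currently in the mpd ring:
mpdtrace

# If a host is missing, (re)start the ring from the head node,
# reading the host list from the file above:
mpdboot -n 3 -f machines.txt

# Launch the job across all hosts listed in the machinefile,
# 6 MPI processes in total:
mpiexec -machinefile machines.txt -n 6 atkpython script.py
```

If mpdtrace lists only one or two hosts, the missing machine will never receive a process, no matter what you pass to mpiexec.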
When testing this, you don't need to run a large ATK script and check each machine. It's enough to run a script containing
import socket
if processIsMaster():
    print 'Master node:',
else:
    print 'Slave node:',
print socket.gethostname()
Each process prints the hostname of the machine it runs on, so the output immediately shows whether all machines participate in the run.