The two problems are related, and together they illustrate nicely why it is important to understand how MPI parallelization actually takes place.
The short story is that you really shouldn't put more than 1 MPI process per socket. Depending on your hardware, you have 2 or 3 sockets on each node (depending on whether the CPUs are 4-core or 6-core). Therefore, the maximum recommended MPI parallelization is -n 4 or -n 6 (2 nodes x 2 or 3 sockets). Anything above this is likely to be slower.
And this is what you are seeing in parallel (CUI) vs. serial (GUI). You probably over-parallelize so much that the cores spend most of their time fighting for cache and RAM access, and communicating with each other, rather than actually doing calculations.
The beauty of multicore is that different tasks can run at the same time without disturbing each other as much as they would on a single core; for instance, you can still use your internet browser while a calculation is running. On an old single-core computer, each time the browser needed the CPU it would kick out the calculation for a while, and vice versa, but with a dual-core they can run in tandem.
But it's a myth that cores are independent compute nodes. They share a lot of infrastructure (like the L2 cache), and loading heavy independent processes on all 4 cores of a quad-core will make each of them run slower than it would on its own.
The really proper way to utilize a multi-node/socket/core environment is hybrid parallelization: MPI across the nodes (or sockets), and OpenMP threading across the cores. This way each socket still only solves "one problem" (in the case of ATK, it diagonalizes one k-point), but it can use all of its cores to do so, provided they are not busy with something else at that moment. If, however, you make each socket solve 4 problems simultaneously, then first of all those processes cannot thread, so you lose that advantage, and second, as mentioned above, the cores start to suffer from insufficient memory and network access.
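To make this concrete, a hybrid launch could look roughly like the sketch below. It is only an illustration, not the one right way to do it: it assumes Open MPI-style mpiexec flags (--map-by ppr:1:socket to place one process per socket), that OMP_NUM_THREADS controls the threading in your ATK version (some setups use MKL_NUM_THREADS), and my_calculation.py is just a placeholder script name; adjust the flags to your MPI implementation and queue system.

```python
# Hedged sketch of a hybrid MPI + OpenMP launch: one MPI process per socket,
# OpenMP threads over the cores of that socket.
# Assumes Open MPI-style mpiexec options; "my_calculation.py" is a placeholder.
import os
import subprocess

nodes = 2             # nodes in the job
sockets_per_node = 2  # one MPI process per socket
cores_per_socket = 4  # threads available to each process

# Let each MPI process thread over its own cores
# (use MKL_NUM_THREADS instead if that is what your setup honors).
env = dict(os.environ, OMP_NUM_THREADS=str(cores_per_socket))

cmd = [
    "mpiexec",
    "-n", str(nodes * sockets_per_node),   # 4 MPI processes in total
    "--map-by", "ppr:1:socket",            # place 1 process per socket (Open MPI syntax)
    "atkpython", "my_calculation.py",
]
subprocess.run(cmd, env=env, check=True)
```

With 2 nodes and 2 sockets per node this gives exactly the -n 4 mentioned above, and each process threads over its 4 cores instead of competing with 11 neighbours for them.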
The second important point is that each MPI process is an (almost) complete replica of the calculation. Therefore, if you load 12 MPI processes on a machine with 12 GB of RAM, each process effectively has just 1 GB to work with. So most likely what happens in your case 1 is that a serial version of the calculation would need perhaps 1.5 GB (I'm just using dummy numbers to show the principle), but in parallel with -n 24 over 2 nodes it would need 1.5*12 = 18 GB per node, and this is probably more than your available RAM.
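Here is a minimal sketch of that memory arithmetic, using the same dummy numbers (the 12 GB per node and 1.5 GB serial footprint are placeholders, not measurements of your system):

```python
# Minimal sketch of the memory argument above, with dummy numbers.
# Assumption: every MPI process holds an (almost) complete replica of the calculation.

ram_per_node_gb = 12.0      # physical RAM on one node (placeholder)
processes_per_node = 12     # e.g. -n 24 spread over 2 nodes
serial_footprint_gb = 1.5   # what one replica of the calculation needs (placeholder)

budget_per_process = ram_per_node_gb / processes_per_node    # 1.0 GB per process
needed_per_node = serial_footprint_gb * processes_per_node   # 18 GB per node

print(f"Budget per MPI process : {budget_per_process:.1f} GB")
print(f"Needed per node        : {needed_per_node:.1f} GB of {ram_per_node_gb:.0f} GB available")
print("Fits in RAM?           :", needed_per_node <= ram_per_node_gb)  # False -> swapping or crash
```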
ATK does utilize a hybrid scheme for some parts of the computations, and will do so even more in the future. So, limit your MPI parallelization to 1 process per node (or 1 per socket if you have enough RAM) and let ATK thread over the cores instead.