« on: October 30, 2017, 03:30 »
Dear Sirs
Last weekend I ran into three issues with ATK's MPI settings.

1. When I ran the script below with 36 MPI processes, each loop converged quickly, in about 37 steps. But after increasing the count from 36 to 44 processes, the calculation no longer converged even after more than 100 steps. Why does the same script give such different results depending only on the number of processes?

2. My cluster consists of node A (44 cores / 256 GB RAM) and node B (14 cores / 128 GB RAM). When I turn off B and run the web example mpi_test.py with 58 MPI processes on A alone, it still reports 1 master + 57 slaves (completely ignoring that A has only 44 physical cores) and keeps running. Why? Does ATK somehow virtualize the CPUs/processes on A? What is the correct relationship between MPI processes and physical CPU count in ATK?

Could you hint at where I should improve? Much appreciated, and looking forward to your replies.
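(On the 58-processes-on-44-cores question: as far as I understand, MPI launchers will happily oversubscribe a node, starting more processes than there are physical cores and time-sharing them. A sketch of how the slot counts could be capped per node, assuming an OpenMPI-style mpiexec and hypothetical hostnames nodeA/nodeB:)

```shell
# hosts.txt -- cap MPI slots at each node's physical core count
# (hypothetical hostnames; OpenMPI-style machinefile syntax assumed)
cat > hosts.txt <<'EOF'
nodeA slots=44
nodeB slots=14
EOF

# Launch 58 processes spread across both nodes instead of
# oversubscribing node A:
mpiexec -n 58 -machinefile hosts.txt atkpython mpi_test.py
```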
--------------
The third bizarre thing: node B has 14 physical CPUs, but when I run the program across the full cluster with 58 MPI processes, the processes on B always show 0% CPU usage. Is something wrong with my QW path settings or the folders on the cluster?
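(To see whether any ranks are actually being placed on node B, a per-rank hostname report can help. This is a hypothetical helper, not part of ATK; it assumes mpi4py is available when launched under mpiexec, and falls back to a single serial "rank" otherwise:)

```python
# rank_report.py -- report which host each MPI rank runs on.
# Hypothetical helper script, not part of ATK.
import socket

def rank_report():
    """Return a 'rank i of n on host h' line for this process."""
    try:
        from mpi4py import MPI  # optional; only meaningful under mpiexec
        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()
    except ImportError:
        rank, size = 0, 1  # serial fallback when mpi4py is absent
    return "rank %d of %d on host %s" % (rank, size, socket.gethostname())

if __name__ == "__main__":
    print(rank_report())
```

Launched as e.g. `mpiexec -n 58 -machinefile hosts.txt python rank_report.py`: if no line reports a host on node B, all ranks are being placed on A, which would explain B sitting at 0%.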