QuantumATK Forum
QuantumATK => General Questions and Answers => Topic started by: Kaspar on May 23, 2013, 04:48
-
Dear all,
I recently tried my luck with running parallel calculations. Using 2 full 8-core nodes with one process per node (as suggested by the Parallel Guide), I got a massive speedup: from more than a week on my local computer to less than 8 hours!
Now my question: it seems that the output files, both the log file and the .nc file, get twice the output written to them when I run the calculation as described. My hunch is that the output is written n times for n processes. This is not a major problem, but it is a minor nuisance, since the log file gets messier and the .nc file, which is already sizeable, gets even larger.
Is there a way to avoid this or is it an inherent feature of parallel calculations?
Thank you all!
Kaspar
-
When you use the “print” command in a parallel run under MPICH2, you should add “if processIsMaster():” before the print command in the input file. In a serial run you would write
print total_energy.evaluate().inUnitsOf(eV)
and in a parallel run this becomes
if processIsMaster(): print total_energy.evaluate().inUnitsOf(eV)
This should clean up your log file.
See also
http://www.quantumwise.com/documents/manuals/latest/ReferenceManual/index.html/ref.processismaster.html
I did not understand the .nc file issue, but I suspect you ran the calculation twice, which would write the data to the .nc file twice.
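If a script has many print statements, the processIsMaster() guard suggested above can be wrapped in one small helper so it does not have to be repeated on every line. The following is only a sketch: master_print is a hypothetical helper name, and the fallback definition of processIsMaster() is a stand-in added so the snippet also runs in a plain Python interpreter. Inside an ATK script the real processIsMaster() is assumed to be available already, as in the lines above.

```python
from __future__ import print_function  # harmless on Python 3, needed on Python 2

try:
    # Assumption: when the script is run by ATK, this name already exists.
    processIsMaster
except NameError:
    # Stand-in for plain Python only: a single process is always the master.
    def processIsMaster():
        return True


def master_print(*args):
    """Print only on the master process; return True if something was printed."""
    if processIsMaster():
        print(*args)
        return True
    return False


# Placeholder message, not real ATK output.
master_print("Calculation finished")
```

Any value you would normally pass to print, such as the total energy from the post above, can be passed to master_print in the same way.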
-
Or use "nlprint" instead of print; it is MPI-safe. So,
nlprint("My string")
will only be printed once, even if you run in parallel.
-
By the way, if you really see double output from ATK itself (not from your own print statements), it indicates that the parallelization is not working properly. If you have, for instance, multiple lines with "dE = ...", repeated coordinates, or even repeated "Started" messages, it means that every process thinks it is the master process, and your calculation is essentially a "multiple serial" run. In that case there is no parallel speedup either. This problem can be caused by using OpenMPI instead of MPICH2, or by using smpd instead of mpd as the process manager (but use hydra anyway!).
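A quick way to look for this symptom is to scan the log for marker lines that occur more than once. The sketch below is plain Python and makes several assumptions: repeated_lines is a hypothetical helper, the sample log text is made up for illustration and is not real ATK output, and the markers "Started" and "dE =" are simply the ones mentioned above. An identical line could in principle repeat in a healthy run, so treat a hit as a hint to inspect the log, not as proof.

```python
from collections import Counter


def repeated_lines(log_text, markers=("Started", "dE =")):
    """Return {line: count} for marker lines that appear more than once."""
    counts = Counter(
        line.strip()
        for line in log_text.splitlines()
        if any(marker in line for marker in markers)
    )
    return {line: n for line, n in counts.items() if n > 1}


# Made-up log excerpt imitating a run where two processes both act as master.
sample_log = "\n".join([
    "Started calculation",
    "Started calculation",
    "0 dE = 1.0e+00",
    "0 dE = 1.0e+00",
    "1 dE = 2.0e-03",
])

duplicates = repeated_lines(sample_log)
for line, n in sorted(duplicates.items()):
    print("%dx  %s" % (n, line))
```

For a real run, read the log file into a string and pass it to repeated_lines; an empty result means none of the marker lines were duplicated.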
-
Thanks for all the replies!
It was exactly as you described in your last post, Anders. Everything was printed twice, including all the "Started" messages, the atom positions, etc.
I was using OpenMPI, so I'm now trying with MPICH2. According to the log file, so far the output is written only once.
I'll get back once the calculation is over.
-
Hello again,
I just wanted to let you know that switching to MPICH2 did indeed solve my problem.
The simulations now run in parallel and scale very well with the number of nodes I have tried (up to 8 nodes). The output is printed to the log file just once, as it should be.
Thank you for your help!