By the way, if you really see duplicated output from ATK itself (not from your own print statements), that indicates the parallelization is not working properly. If, for instance, lines with "dE = ...", the coordinates, or even the "Started" messages appear multiple times, it means all processes think that they are the master process, and your calculation is essentially a "multiple serial" run: every process does the whole job on its own, so there is no parallel speedup either. This problem can be caused by using OpenMPI instead of MPICH2, or by using smpd instead of mpd as the process manager (but use hydra anyway!).
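If you want a quick way to check a log for this symptom, something along these lines might help. It is only a rough sketch: the function name duplicated_lines and the sample log text are made up for illustration (this is not real ATK output or part of ATK), and it assumes that in a healthy run the watched lines do not repeat verbatim.

```python
from collections import Counter


def duplicated_lines(log_text, markers=("dE =", "Started")):
    """Return {line: count} for watched lines that occur more than once.

    A line is "watched" if it contains any of the marker substrings.
    The default markers are only a guess based on the symptoms described above.
    """
    lines = [ln.strip() for ln in log_text.splitlines()]
    watched = [ln for ln in lines if any(m in ln for m in markers)]
    counts = Counter(watched)
    return {ln: n for ln, n in counts.items() if n > 1}


# Made-up log excerpt for illustration only (not real ATK output).
sample = """\
Started
Started
dE = 1.0e-01
dE = 1.0e-01
dE = 2.0e-03
dE = 2.0e-03
"""

dups = duplicated_lines(sample)
if dups:
    print("Possible 'multiple serial' run; repeated lines:")
    for line, n in dups.items():
        print(f"  {n}x  {line}")
```

If every watched line shows up the same number of times, and that number matches the number of processes you started, it is a strong hint that each process is running the whole calculation by itself. Identical lines can also repeat in a healthy run (a stalled iteration, for example), so treat a hit as a reason to look closer rather than as proof.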