First of all, just to be sure, did you verify that your MPI setup is correct, such that ATK really can run in parallel properly? That is, when you run (test_mpi.py)
from ATK.MPI import *
if processIsMaster():
    print '# Master node'
else:
    print '# Slave node'
using
mpiexec -n 2 atk test_mpi.py
it should print "# Master node" once and "# Slave node" once.
In general, the two-probe part should be even more parallel than the bulk part. However, some parts of the code run in serial (it is simply not possible to parallelize 100% of the code), so if you check the CPU usage at exactly such a point, it will look like the job is not running in parallel. So, it matters which method you use for checking the parallelism (how did you do it?).
Also, we need to be careful about what we mean by "parallel": whether we refer to OpenMP or MPI...
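One ATK-independent way to check this is to sample the accumulated CPU time of each process from /proc over a short interval, rather than eyeballing top at a single moment. A rough, Linux-only sketch (the PID list and the sampling interval are up to you; this is not part of ATK):

```python
import os
import time

def cpu_jiffies(pid):
    # utime + stime for a PID, read from /proc/<pid>/stat (Linux only).
    # Split after the last ')' so a process name containing spaces
    # does not shift the field indices.
    with open('/proc/%d/stat' % pid) as f:
        fields = f.read().rsplit(')', 1)[1].split()
    return int(fields[11]) + int(fields[12])  # utime, stime

def busy_fraction(pids, interval=1.0):
    """Approximate fraction of one core each PID used over `interval` seconds."""
    before = dict((p, cpu_jiffies(p)) for p in pids)
    time.sleep(interval)
    hz = os.sysconf('SC_CLK_TCK')  # jiffies per second
    return dict((p, (cpu_jiffies(p) - before[p]) / float(hz * interval))
                for p in pids)
```

During a genuinely parallel phase all four atk PIDs should report a fraction near 1.0; during a serial phase only one of them will.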
Yes, the MPI setup is correct and ATK can run in parallel using the test_mpi.py script. I've also verified that the license types (master and slave) are correctly checked out. When I check for parallelism, I simply watch the load on my machines. I have a job running on 4 processors. While I can see that there are 4 instances of ATK running, only one of them is active while the program is running the SCF for the Fermi energy of the scattering region. Otherwise, all 4 of them are active. I'm looking at the output file to determine which part of the calculation the program is running (runlevel 6). This is more noticeable because my scattering region contains over 1200 atoms.
Serial calculation at:
Equivalent Bulk Calculation (Initial Density for TwoProbe)
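Since I'm checking the phase by reading the runlevel-6 output by hand, a tiny script could do the scan instead. A minimal sketch (only the "Equivalent Bulk Calculation" marker is taken from my actual log; any other marker strings would have to be adjusted to whatever headers your output file really prints):

```python
def last_phase(log_text, markers):
    """Return the last phase marker found in the log text, or None.

    `markers` is a list of substrings that identify phase headers in
    the runlevel-6 output; the scan keeps the most recent match.
    """
    phase = None
    for line in log_text.splitlines():
        for m in markers:
            if m in line:
                phase = m
    return phase
```

Running this periodically against the output file (e.g. from a cron job) would show which phase a 200+ hour run is currently in without reading the whole log.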
My system is installed with MPICH2 1.0.7. I have the job scheduler/load-balancing software Torque/Maui installed (with the appropriate mpiexec script), but I don't think that is the cause of the issue, since I observe the same behavior when I'm not using the job scheduler. The OS is Fedora 7 and the ATK version is 2008.02.1.
I'm tempted to run without the initial bulk SCF loop. But as the job has already been running for 200+ hrs, I'm not inclined to stop it and try again. I tried that once before and found that although there is a .nc file, it cannot be used to restart/continue the calculation.