The degree of parallelization also depends to some extent on the system and the parameters used. For instance, with 1x1x100 k-point sampling you will see good parallel scaling at first, during the electrode calculations. Once the two-probe calculation starts, however, there isn't much for ATK to parallelize over: the 100 k-points along the transport direction are only used for the electrodes, while the central region is sampled with just the 1x1 transverse grid.
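To make the arithmetic concrete, here is a minimal sketch, assuming the only thing being distributed is k-points and that they are dealt out round-robin to the MPI processes. This is an illustration of the idea, not how ATK actually schedules work, and it ignores any other levels of parallelism; the function names are hypothetical.

```python
def distribute_kpoints(n_kpoints, n_processes):
    """Hypothetical round-robin assignment of k-point indices to MPI ranks."""
    return [list(range(rank, n_kpoints, n_processes)) for rank in range(n_processes)]


def busy_ranks(n_kpoints, n_processes):
    """Count how many ranks receive at least one k-point to work on."""
    return sum(1 for ks in distribute_kpoints(n_kpoints, n_processes) if ks)


# Electrode calculation: 1x1x100 sampling gives 100 k-points to share out.
electrode_kpoints = 1 * 1 * 100
# Two-probe (central region) calculation: only the 1x1 transverse grid is sampled.
device_kpoints = 1 * 1

for n in (2, 4, 8):
    print(n, busy_ranks(electrode_kpoints, n), busy_ranks(device_kpoints, n))
```

With 8 processes, all 8 have electrode k-points to work on, but only one of them has anything to do in the two-probe step under this simple model.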
By the way, are you running this calculation in parallel on a single machine (your use of "top" suggests so)? In that case it's probably not a good idea to use more than 2-3 parallel processes anyway, since they will compete for memory bandwidth and cache. If you have a multi-core CPU, you're probably better off relying on threading, perhaps combined with 2 MPI processes.