ATK can take advantage of both MPI and, to a lesser extent, OpenMP, but for your calculations I think all the benefit will come from MPI. As a rule of thumb, the self-consistent part at zero bias scales well up to roughly the number of k-points, NA x NB / 2, whereas at finite bias you still benefit up to 30-50 nodes because of the integration in the complex plane. The speed-up is not linear, however, and you also have to weigh the risk of waiting a very long time in the queue if you request too many nodes. For analysis, such as computing the LDOS or T(E), the scaling can easily be linear up to 100 nodes, the limit being, for instance, the number of energy points in T(E).
I would recommend running on 16 MPI nodes, and trying 32 for some of the analysis.
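
If it helps, the rule of thumb above can be written down as a small stand-alone Python helper. This is only a rough sketch: the function names are made up for illustration and are not part of ATK, the rounding of NA x NB / 2 down to an integer is my own assumption, and the finite-bias value simply takes the upper end of the 30-50 range mentioned above.

```python
def scf_node_limit(na, nb, finite_bias=False):
    """Rough upper bound on useful MPI nodes for the self-consistent part.

    Hypothetical helper, not an ATK function.
    na, nb: k-point sampling along the A and B directions.
    """
    if finite_bias:
        # Assumption: use the upper end of the quoted 30-50 range.
        return 50
    # Zero bias: roughly the number of k-points, NA x NB / 2.
    # Assumption: round down, but never suggest fewer than one node.
    return max(1, (na * nb) // 2)


def analysis_node_limit(n_energy_points):
    """Rough upper bound for analysis such as T(E).

    Hypothetical helper: scaling is limited by the number of energy points.
    """
    return max(1, n_energy_points)


if __name__ == "__main__":
    # Example: 8 x 8 k-point sampling, 101 energy points in T(E).
    print("SCF, zero bias:  ", scf_node_limit(8, 8))
    print("SCF, finite bias:", scf_node_limit(8, 8, finite_bias=True))
    print("Analysis:        ", analysis_node_limit(101))
```

These are upper bounds only. Since the speed-up is not linear and the queue time grows with the request, it usually makes sense to ask for fewer nodes than the limit.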