Thank you for these questions, and for stating them so explicitly. Even so, they are not entirely simple to answer conclusively, but I will try.
The MPI benefit is generally larger than the pure OpenMP parallel scaling, except for problems that are strongly dominated by diagonalization (large bulk cells or molecules). ATK can parallelize in MPI over k-points and energy points. From a general perspective, your best option is to combine the two methods by using 2 workstations, each with two quad-core CPUs. This will allow you to run at least 4 MPICH2 processes, each of which can thread on 4 cores, for maximum performance. This will require 1 master and 3 slave licenses.
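To make the arithmetic behind this recommendation concrete, here is a minimal Python sketch. It is not part of ATK; the function names are made up for illustration, and the license rule (1 master plus one slave for each additional MPI process) is only generalized from the 4-process example above, so please treat it as an assumption.

```python
# Hypothetical helpers (not part of ATK) for sizing a hybrid MPI + OpenMP run.

def hybrid_layout(nodes, sockets_per_node, cores_per_socket, threads_per_process):
    """Split the available cores into MPI processes that each run several threads."""
    total_cores = nodes * sockets_per_node * cores_per_socket
    if total_cores % threads_per_process != 0:
        raise ValueError("threads_per_process must divide the total core count")
    mpi_processes = total_cores // threads_per_process
    return {
        "total_cores": total_cores,
        "mpi_processes": mpi_processes,
        # Assumption: 1 master license plus 1 slave license per extra process,
        # generalized from the "1 master and 3 slave" example for 4 processes.
        "master_licenses": 1,
        "slave_licenses": mpi_processes - 1,
    }


def max_points_per_process(n_points, mpi_processes):
    """Largest number of k-points (or energy points) any one process handles,
    assuming the points are spread as evenly as possible."""
    return -(-n_points // mpi_processes)  # ceiling division


# 2 workstations, each with two quad-core CPUs, 4 threads per MPI process
layout = hybrid_layout(nodes=2, sockets_per_node=2, cores_per_socket=4,
                       threads_per_process=4)
print(layout)
print(max_points_per_process(10, layout["mpi_processes"]))
```

For the setup above this gives 16 cores, 4 MPI processes, and 1 master plus 3 slave licenses. The second helper shows why the MPI gain depends on the system: with only a few k-points or energy points, extra MPI processes have little or nothing to work on.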
The added benefit of 2 more cores per socket (hexa-core CPUs) is extremely small, while the extra cost may be considerable. So the performance gain per unit of money is low.
So, to summarize in relation to your points:
1. Not fully, no, except in a few cases.
2. Yes, absolutely. At least 4; in some cases (the small carbon system, transmission spectra, etc.) perhaps even 8 or more.
3. No, the combination of OpenMP and MPI is best (and the advantage of hexa-core over quad-core is negligible).