This apparent slowness can be caused by competition between OpenMP and MPI. Unless you explicitly turn off threading, each of your 12 MPI processes will try to run threads on all cores of the machine, so the cores are heavily oversubscribed and the calculation as a whole runs very slowly. You should set
export OMP_NUM_THREADS=1
in the shell before running "mpiexec ... atkpython" to avoid this.
In ATK 2017, this will no longer be necessary, but it is for 2016 and any older version.
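To make the oversubscription concrete, here is a rough sketch of the arithmetic. The core count below is purely an assumption for illustration (your machine may differ), and NPROCS and NCORES are just names I made up for this example:

```shell
# Limit each MPI process to a single OpenMP thread.
export OMP_NUM_THREADS=1

# Hypothetical numbers for illustration only.
NPROCS=12   # MPI processes, as in your run
NCORES=12   # ASSUMED number of cores; replace with your machine's value

# Without the limit, every process may start one thread per core.
echo "threads if unrestricted: $((NPROCS * NCORES))"

# With the limit, there is one thread per MPI process.
echo "threads with OMP_NUM_THREADS=1: $((NPROCS * OMP_NUM_THREADS))"

# Then start the job in the same shell as before:
# mpiexec ... atkpython
```

The export has to happen in the same shell session (or job script) that launches mpiexec, otherwise the processes will not see it.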
Also, as both the "atklog" file and Jess point out, 12 is not a good number of MPI processes for this system, which also contributes to the poor performance.