You can find many details in the Parallel Guide.
If this is a dual quad-core machine, however, I would not use more than 2 MPI processes; otherwise the MPI parallelization will consume too much memory and cancel out any advantage from threading. MPI parallelization is primarily intended for clusters made up of multiple, separate machines.
Threading across multiple cores, on the other hand, is activated automatically in ATK.
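
To make the trade-off concrete, here is a rough sketch of the arithmetic, not anything taken from ATK itself. The function name `threads_per_process` is made up for illustration, and it assumes the cores on a node are simply divided evenly among the MPI processes running there.

```python
def threads_per_process(cores_per_node, mpi_processes_per_node):
    """Cores left for threading inside each MPI process on one node.

    Hypothetical helper for illustration only; assumes an even split.
    """
    if cores_per_node < 1 or mpi_processes_per_node < 1:
        raise ValueError("counts must be positive")
    if mpi_processes_per_node > cores_per_node:
        raise ValueError("more MPI processes than cores")
    # Integer division: any remainder cores would stay idle.
    return cores_per_node // mpi_processes_per_node


# Dual quad-core node: 2 sockets x 4 cores = 8 cores.
cores = 2 * 4
for n_mpi in (1, 2, 8):
    n_threads = threads_per_process(cores, n_mpi)
    print(n_mpi, "MPI processes ->", n_threads, "threads each")
```

With 2 MPI processes, each one still has 4 cores to thread over. With 8 MPI processes, each is left with a single core, so threading gains nothing, while memory use grows with the number of processes.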