60 cores per node is a slightly unusual number, but possible of course.
In general there is no need to specify the number of threads; the software will figure out the best number by itself. So leave NUM_THREADS empty unless you have very, very special reasons.
Few problems in QuantumATK scale well to 200+ MPI processes, at least when using LCAO (plane-wave basis sets and NEGF can go higher). For memory reasons, and to conserve your hardware resources, if you have 60-core machines I would start by running on just one node, perhaps with 10 or 20 MPI processes, to test speed and memory consumption.
The point is to combine MPI and threads, not to pick only one or the other. So, again: a single node, a smallish number of MPI processes (chosen for memory, and weighed against the number of k-points), and let the software handle the threading by itself. Of course you still need to reserve the full node in your queue submission.
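
To make the arithmetic concrete, here is a rough sketch in plain Python of which MPI/thread splits fill a 60-core node exactly. This is not a QuantumATK API; the function name `hybrid_layouts` and the `max_processes` cap are made up for illustration, and in practice the software picks the thread count on its own, so you would only use something like this to decide how many MPI processes to request.

```python
def hybrid_layouts(cores_per_node, max_processes=None):
    """Return (mpi_processes, threads_per_process) pairs that fill one node exactly.

    Hypothetical helper for illustration only; it just lists the divisors
    of the core count, optionally capped at max_processes.
    """
    layouts = []
    for n_mpi in range(1, cores_per_node + 1):
        # Only keep splits where every core is used and none is oversubscribed.
        if cores_per_node % n_mpi == 0:
            if max_processes is None or n_mpi <= max_processes:
                layouts.append((n_mpi, cores_per_node // n_mpi))
    return layouts


# Candidate splits for one 60-core node, with at most 20 MPI processes.
for n_mpi, n_threads in hybrid_layouts(60, max_processes=20):
    print(f"{n_mpi:2d} MPI processes x {n_threads:2d} threads each")
```

With 60 cores, the suggested starting points of 10 or 20 MPI processes would leave 6 or 3 cores per process for threading. Which split is fastest depends on memory per process and the number of k-points, so it is worth timing a couple of them.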