Total memory required per k-point: 4.56 GB
By default, most algorithms parallelize over k-points, so if you use N MPI processes per node this requires at minimum N x 4.56 GB per node. On top of that there are some quantities that are not distributed across processes. Unfortunately, the log report is not fully up to date with all the quantities that use large amounts of memory, so it cannot be used to give an accurate estimate of the total memory requirement.
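To make the arithmetic concrete, here is a back-of-the-envelope sketch in plain Python (the 4.56 GB figure is the one reported above; the process count per node is just an example):

    # Lower bound on per-node memory when each MPI process holds its
    # own k-point data; non-distributed quantities come on top of this.
    memory_per_kpoint_gb = 4.56   # from the log report quoted above
    mpi_processes_per_node = 10   # example value; use your actual layout

    minimum_gb = mpi_processes_per_node * memory_per_kpoint_gb
    print(f"At least {minimum_gb:.1f} GB per node")  # -> At least 45.6 GB per node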
That said, here are two suggestions that can reduce the memory consumption drastically.
Use OpenMP threads for shared-memory parallelization. If you have nodes with 40 cores, you can for example run 10 MPI processes with 4 OpenMP threads each. The more threads per process, the lower the memory usage, but depending on the system size there is an upper limit to the parallel efficiency of the threading.
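To see the trade-off at a glance, this small plain-Python sketch (the node size is a hypothetical example) lists the ways to split one node's cores between MPI processes and OpenMP threads; the per-node memory footprint roughly scales with the number of processes:

    cores_per_node = 40  # example; use your node's actual core count

    # Every divisor of the core count gives a valid MPI x OpenMP split.
    # Fewer processes (more threads) means fewer replicated data sets
    # per node, and hence lower memory use.
    for mpi_processes in range(1, cores_per_node + 1):
        if cores_per_node % mpi_processes == 0:
            omp_threads = cores_per_node // mpi_processes
            print(f"{mpi_processes:2d} MPI processes x {omp_threads:2d} OpenMP threads")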
Parallelize the diagonalization at each k-point: set "processes_per_kpoint" under Parameters for the "Eigenvalue solver" in the Calculator Settings. You want N_local_kpoints x N_processes_per_kpoint x N_openmp_threads to equal the number of cores on each node, so it takes a little planning to get the best load balancing.
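In a script, the same setting can be passed to the eigenvalue solver. Here is a minimal sketch, assuming the processes_per_kpoint argument of DiagonalizationSolver is the scripting counterpart of the GUI setting (run under atkpython, where these names are predefined; please check the reference manual for your version):

    # Example balance for a 40-core node running 10 MPI processes with
    # 4 OpenMP threads each: processes_per_kpoint=5 means 2 k-points
    # are diagonalized concurrently per node (2 x 5 x 4 = 40 cores).
    algorithm_parameters = AlgorithmParameters(
        density_matrix_method=DiagonalizationSolver(processes_per_kpoint=5),
    )
    calculator = LCAOCalculator(
        algorithm_parameters=algorithm_parameters,
    )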
Also be sure to use a recent version of QuantumATK (>= 2020.12, possibly even >= 2021.06 - I don't remember exactly), as we recently improved the OpenMP parallelization and also introduced MPI shared memory for certain quantities.