Topic: Quantifying Memory Requirements for a Parallelized Job

apnichol

I would like to quantify the actual memory requirements for a parallelized job on a cluster, since some jobs have run out of memory in the past because of insufficient RAM on each node. How can we calculate the memory requirements for an entire calculation? Specifically, how can we determine them quantitatively from the *.log file, which reports the memory per k-point, per dense matrix, and per real-space grid? A snippet of the memory report from a previous *.log file is shown below:

+------------------------------------------------------------------------------+
| K-point grid: 4 x 4 x 4                                                      |
| Number of irreducible k-points: 32                                           |
+------------------------------------------------------------------------------+
+------------------------------------------------------------------------------+
| Real space grid sampling is (209, 209, 209) in a, b, and c directions.       |
+------------------------------------------------------------------------------+
+------------------------------------------------------------------------------+
| Memory requirements for the calculation                                      |
+------------------------------------------------------------------------------+
| Dense matrices: 1.52 GB per matrix [Matrix dimensions 9984 x 9984]           |
| Total memory required per k-point: 4.56 GB                                   |
|                                                                              |
| Storage of real-space orbitals: Enabled                                      |
| Storage requires 306 MB                                                      |
|                                                                              |
| Total memory required per real-space grid: 139 MB                            |
+------------------------------------------------------------------------------+
+------------------------------------------------------------------------------+
| SCF History                                                                  |
+------------------------------------------------------------------------------+
| Memory required to store SCF history: 10.02 GB                               |
| Number of history steps: 20                                                  |
+------------------------------------------------------------------------------+

filipr

  • QuantumATK Staff
  • QuantumATK developer
Re: Quantifying Memory Requirements for a Parallelized Job
« Reply #1 on: June 20, 2022, 10:37 »
Quote
Total memory required per k-point: 4.56 GB

By default, most algorithms parallelize over k-points, so if you use N processes per node, this will require at minimum N x 4.56 GB per node. Besides this, there are some other quantities that aren't distributed across processes. Unfortunately, the log report isn't fully up to date with all the quantities that use large amounts of memory, so it can't be used to give an accurate estimate of the total memory requirements.
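To make the N x 4.56 GB rule of thumb concrete, here is a minimal Python sketch that reads the per-k-point figure from the log text and multiplies it by the number of processes per node. The function names are made up for illustration and are not part of QuantumATK, and the MB-to-GB conversion factor is an assumption. The result is only the lower bound described above, not an estimate of the total memory.

```python
import re

# Matches the log line, e.g. "| Total memory required per k-point: 4.56 GB |".
_PER_KPOINT = re.compile(
    r"Total memory required per k-point:\s*([\d.]+)\s*(MB|GB)"
)


def per_kpoint_memory_gb(log_text):
    """Return the logged per-k-point memory in GB, or None if the line is absent."""
    match = _PER_KPOINT.search(log_text)
    if match is None:
        return None
    value, unit = float(match.group(1)), match.group(2)
    # Assumption: 1 GB = 1024 MB; the log does not say which convention it uses.
    return value / 1024.0 if unit == "MB" else value


def min_memory_per_node_gb(per_kpoint_gb, processes_per_node):
    """Lower bound only: N processes per node times the per-k-point memory."""
    return per_kpoint_gb * processes_per_node


log_line = "| Total memory required per k-point: 4.56 GB |"
per_kpoint = per_kpoint_memory_gb(log_line)
# With 10 processes per node this gives roughly 45.6 GB as a floor.
print(f"{min_memory_per_node_gb(per_kpoint, 10):.2f} GB minimum per node")
```

Anything the log does not report, or that is not distributed across processes, comes on top of this number.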

However, here are two suggestions that can reduce the memory consumption drastically:

1. Use OpenMP threads for shared-memory parallelization. For example, if you have nodes with 40 cores, you can use 10 MPI processes with 4 OpenMP threads each. The more threads, the lower the memory usage, but depending on the system size there is an upper limit on parallel efficiency.

2. Parallelize the diagonalization at each k-point: set "processes_per_kpoint" under Parameters for "Eigenvalue solver" in the Calculator Settings. You want N_local_kpoints x N_processes_per_kpoint x N_openmp_threads to equal the number of cores on each node, so it takes a little planning to get the best load balancing.

Also be sure to use a recent version of QuantumATK (>= 2020.12, maybe even >= 2021.06, I can't remember exactly), as we recently improved the OpenMP parallelization and also introduced MPI shared memory for certain quantities.

First, be sure to use a recent version of QuantumATK, as we have improved memory usage and distribution in the more recent versions: