Topic: Out-of-memory error when allocating 8000 GB of memory within a DFTB calculation

Alireza (New ATK user)

Dear experts,

I am trying to calculate the electronic transport properties of a large system with 2604 atoms in the unit cell, using DFTB.
First I tried these job settings:
--nodes=1
--ntasks-per-node=24
--cpus-per-task=1
--mem-per-cpu=2540
and the job failed with an out-of-memory error.
Then I increased the number of nodes to 20, giving 480 cores and 1,219,200 MB of total memory. Unfortunately, the calculation failed with the same error. The attached figure shows the memory utilization of that job. At around 37 minutes the semi-empirical calculation finishes and the transmission calculation starts; this is the point at which the job fails.

UPDATE on 01.11.2021:
I also tried a different partition with more memory, allocating 8 TB. The same error occurred. Can someone tell me how transport calculations in ATK scale, i.e., how the memory requirement depends on the number of atoms? Is this a bug or a technical problem?

I would appreciate any comments or suggestions.

Cheers, A
« Last Edit: November 1, 2021, 12:49 by Alireza »

Petr Khomyakov (QuantumATK Staff)

Could you try using threading instead of increasing the number of nodes/cores? For example, reduce the number of tasks per node to 12 and set cpus per task to 2, or use even fewer tasks per node if 12 does not work. I think the issue is not the total memory, but rather that there is not enough memory on a single compute node.

The ultimate test would be to set the number of tasks per node to 1 and cpus per task to 24. With 20 fast-communicating nodes, that would still give you 20 MPI processes for this job. I assume your nodes have an InfiniBand-type interconnect; otherwise, using multiple nodes might not be advantageous, and could even increase the compute time because of slow communication.
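
To make the suggestion concrete, here is a rough Python sketch of how much memory each MPI process gets under the three layouts discussed in this thread. It assumes the scheduler grants each task cpus-per-task × mem-per-cpu, which is how Slurm accounts for `--mem-per-cpu`, and it takes the 24 cores per node and 2540 MB per CPU from the original post. The helper names `memory_per_task_mb` and `layout` are hypothetical and only serve this illustration.

```python
# Sketch: memory available to each MPI process for different task/thread layouts.
# Assumption: each task receives cpus_per_task * mem_per_cpu (Slurm --mem-per-cpu accounting).

MEM_PER_CPU_MB = 2540   # --mem-per-cpu from the original post
CORES_PER_NODE = 24     # cores used per node in the original post


def memory_per_task_mb(cpus_per_task, mem_per_cpu_mb=MEM_PER_CPU_MB):
    """Memory (MB) granted to a single MPI process (task)."""
    return cpus_per_task * mem_per_cpu_mb


def layout(ntasks_per_node, cores_per_node=CORES_PER_NODE):
    """Describe a layout that keeps all cores of a node busy."""
    cpus_per_task = cores_per_node // ntasks_per_node
    per_task = memory_per_task_mb(cpus_per_task)
    return {
        "ntasks_per_node": ntasks_per_node,
        "cpus_per_task": cpus_per_task,
        "mem_per_task_mb": per_task,
        # The node total is the same in every layout; only its split changes.
        "mem_per_node_mb": per_task * ntasks_per_node,
    }


for ntasks in (24, 12, 1):
    info = layout(ntasks)
    print(
        f"{info['ntasks_per_node']:>2} tasks/node x {info['cpus_per_task']:>2} cpus/task: "
        f"{info['mem_per_task_mb']:>6} MB per MPI process, "
        f"{info['mem_per_node_mb']} MB per node"
    )
```

Adding nodes increases the total memory of the job but leaves the memory per MPI process unchanged. Using fewer tasks per node with more threads each is what raises the per-process limit, which is the quantity that matters if the transmission step needs a large amount of memory in every process.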