Author Topic: Back engine exception - memory  (Read 2235 times)

Offline hagelberg

  • Regular QuantumATK user
  • **
  • Posts: 15
  • Country: us
  • Reputation: 0
Back engine exception - memory
« on: May 12, 2020, 17:57 »
Hello,

I struggle with memory restrictions when running device jobs. Two questions in this context:

(1) When trying to run a spin-orbit calculation involving a device (SOGGA, OMX - please see attached .py file), I ran into the machine's memory limits. I consulted online advice on how to reduce the memory demands of my job (e.g. https://docs.quantumatk.com/technicalnotes/advanced_performance/advanced_performance.html) and implemented what I found (see the Device Algorithm Settings block in the attached .py file). With these settings, I received an error message that I couldn't decipher:

"Calculating Density Matrix :
** Back Engine Exception : info is nonzero
** Location of Exception : greensfunctioncalculator.cpp:843"
(see the attached output file)

This probably has to do with some erroneous algorithm settings. Could you give me some advice on this?

(2) Is there some way to determine the overall memory required by an ATK job? This would be extremely helpful. It happens very often that I start a job successfully, just to find it terminated with a memory error after many hours of running time.

Thanks,
Frank Hagelberg

Offline Anders Blom

  • QuantumATK Staff
  • Supreme QuantumATK Wizard
  • *****
  • Posts: 5405
  • Country: dk
  • Reputation: 89
    • QuantumATK at Synopsys
Re: Back engine exception - memory
« Reply #1 on: May 13, 2020, 22:23 »
Hi Frank,

there is a very handy way to estimate the memory before running a calculation. In the Script Generator, just open the section Memory Usage and click the button (see screenshot)! This will print a report and you can assess where most of the memory is used and how to design a parallelization strategy around that.

This can also be scripted using the MemoryUsage() class (https://docs.quantumatk.com/manual/Types/MemoryUsage/)

For your case, the report is shown in the picture. Note that I switched to the PseudoDojo basis sets, which I prefer over OpenMX because they are softer and you can use a lower cut-off. We see that the calculation needs about 8-9 GB and that about half of that memory is used by the SCF mixing history. That part is shared in MPI, but some other terms are not and will be duplicated. So if you are trying to run this on a machine with just 16-32 GB, and using so many MPI processes on the same machine, I can see why you might run out of memory.

A quick fix would be to experiment with two things. First, reduce the number of history steps in the SCF; a safe number will probably be 10-12. Second, use fewer MPI processes on each machine. It will run slower, but at least it will run!
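
To make the trade-off above concrete, here is a rough back-of-the-envelope sketch in plain Python. It does not call QuantumATK. The function name `estimate_node_memory_gb` and its parameters are hypothetical, and it rests on three assumptions: the report total is taken as 8.5 GB, split evenly between the SCF mixing history and everything else; the baseline is 20 history steps; and, as a worst case, all non-history memory is duplicated in every MPI process on the machine (the report only says that "some" terms are duplicated).

```python
def estimate_node_memory_gb(history_gb, other_gb, n_processes,
                            history_steps=20, baseline_history_steps=20):
    """Rough upper bound on the memory used on one machine.

    history_gb  -- SCF mixing history from the memory report; counted once
                   per machine, and assumed to scale linearly with the
                   number of history steps
    other_gb    -- everything else in the report; assumed (worst case) to
                   be duplicated in every MPI process
    n_processes -- MPI processes running on this machine
    """
    if n_processes < 1 or history_steps < 1 or baseline_history_steps < 1:
        raise ValueError("process and step counts must be at least 1")
    scaled_history = history_gb * history_steps / baseline_history_steps
    return scaled_history + n_processes * other_gb


# Assumed split of an ~8.5 GB report: half mixing history, half other terms.
HISTORY_GB = 4.25
OTHER_GB = 4.25

for n in (1, 2, 4, 8):
    full = estimate_node_memory_gb(HISTORY_GB, OTHER_GB, n)
    reduced = estimate_node_memory_gb(HISTORY_GB, OTHER_GB, n, history_steps=10)
    print(f"{n} MPI processes: {full:.1f} GB with 20 history steps, "
          f"{reduced:.1f} GB with 10")
```

Under these assumptions the duplicated terms dominate as soon as several processes share one machine, which is why using fewer MPI processes helps more than trimming the history alone. The real numbers should come from the memory report for the actual job.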

Offline hagelberg

  • Regular QuantumATK user
  • **
  • Posts: 15
  • Country: us
  • Reputation: 0
Re: Back engine exception - memory
« Reply #2 on: May 14, 2020, 19:00 »
Thank you, Anders - it looks like the answer to my second question was right in front of my eyes. I'm now experimenting with the memory-usage estimator, which indeed seems to be a very direct way to get an initial memory estimate.
As to my attempted spin-orbit calculation: if I understand correctly, the PseudoDojo potential is not available for spin-orbit jobs in the ATK/VNL 2018.6 release, the version that I'm using right now. Comparing the available options (SG15 versus OMX) in terms of memory, I find that OMX makes substantially lower demands than SG15.

Thanks again,
Frank