Topic: Reducing memory consumption for MTP training

pshinyeong
« on: January 10, 2024, 08:35 »
I am trying to train an MTP (moment tensor potential) for bulk silicon with a single boron dopant, as a test.
I created the script using the defect training template in the workflow.
The pristine configuration is primitive bulk silicon (2 atoms); the generated defect configuration uses a 3x3x3 supercell repetition.
I used a 2x2x2 k-point density and the PseudoDojo Medium basis set for silicon and boron, reduced the basis size to 400, and used all available processes per task.
Most of the other parameters were left at their default settings.
I ran the script using Intel MPI and SLURM on 3 nodes with 18 cores each. Each node has 64 GB of memory.
The calculation file, the log file, and a screenshot of the error are attached below.

I also tried increasing the number of nodes, reducing the number of cores, increasing the number of processes per task, and reducing the basis size, among other things, but the same error appeared (I assume it is a memory error).

My main goal is to do active-learning MTP training, but if the calculation above already runs into a memory error, I am fairly sure active learning would fail the same way.
Could anyone suggest further ways to reduce memory consumption during MTP training?

Thank you

Anders Blom (QuantumATK Staff)
« Reply #1 on: January 18, 2024, 23:44 »
It's not necessarily an out-of-memory problem. Can you check the other log files for errors?
In particular the last one, mtp_training_defectconfiguration_0_bundle_2.log.

pshinyeong
« Reply #2 on: January 20, 2024, 14:47 »
Hello, I do not see any error message in mtp_training_defectconfiguration_0_bundle_2.log.
I have attached the file for reference.

Julian Schneider (QuantumATK Staff)
« Reply #3 on: January 24, 2024, 12:00 »
I tried running your script, and in principle it runs fine. So it may simply be running out of memory on your machine, and this appears to happen during the DFT calculations of the training data.
It looks like you have already tried most of the options for reducing memory by distributing the calculation across more nodes.
Apart from that, it might be possible to reduce memory usage a bit by deleting the calculator state of the optimized configurations, as shown in the attached script.
Also, the ParallelCG solver you use for the Poisson equation seems to use a bit more memory than the default FFT solver, so switching solvers could also help a bit, unless you have a specific reason for using that one.
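
As a rough sanity check on the numbers from the first post, the sketch below estimates how much memory each MPI process gets on a 64 GB node for a few process counts, and how many atoms the 3x3x3 defect supercell contains. It assumes one MPI process per core and an even split of node memory, which a real DFT run will not follow exactly. The function name `memory_per_process_gb` and the alternative process counts (9, 6, 3) are only illustrative, not settings taken from the thread.

```python
# Back-of-the-envelope memory budget for the hardware described in the first post.
# Assumptions (not from the thread): one MPI process per core, and node memory
# shared evenly between processes. Actual per-process usage will be uneven.

NODE_MEMORY_GB = 64        # memory per node, from the first post
CORES_PER_NODE = 18        # cores per node, from the first post
PRIMITIVE_CELL_ATOMS = 2   # primitive bulk silicon cell
REPETITION = 3             # 3x3x3 supercell repetition


def memory_per_process_gb(node_memory_gb, processes_per_node):
    """Return the average memory available to each process on one node."""
    if processes_per_node < 1:
        raise ValueError("processes_per_node must be at least 1")
    return node_memory_gb / processes_per_node


# Number of atoms in the defect supercell (the dopant replaces one Si atom,
# so the total count is unchanged).
supercell_atoms = PRIMITIVE_CELL_ATOMS * REPETITION ** 3

if __name__ == "__main__":
    print(f"Atoms in the defect supercell: {supercell_atoms}")
    # Illustrative process counts: the full node, then progressively fewer.
    for processes in (CORES_PER_NODE, 9, 6, 3):
        share = memory_per_process_gb(NODE_MEMORY_GB, processes)
        print(f"{processes:2d} processes per node: about {share:.1f} GB each")
```

With all 18 cores in use, each process has only about 3.6 GB on average, so running fewer processes per node is one way to give each DFT task more headroom, at the cost of longer run times.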