Author Topic: Error message - mem error running in parallel  (Read 2803 times)

Offline esp

  • Supreme QuantumATK Wizard
  • *****
  • Posts: 318
  • Country: us
  • Reputation: 3
    • University of Minnesota
Error message - mem error running in parallel
« on: January 20, 2012, 21:05 »
I am running ATK in parallel on a machine with the equivalent of 24 cores ...

I got the following after an hour:


  File "./zipdir/NL/Calculators/DeviceCalculatorInterface.py", line 287, in _update
  File "./zipdir/NL/Calculators/DeviceCalculatorInterface.py", line 287, in _update
  File "./zipdir/NL/Calculators/LCAOCalculator/DeviceLCAOCalculator.py", line 1632, in scfLoopDevice
  File "./zipdir/NL/Calculators/LCAOCalculator/DeviceLCAOCalculator.py", line 588, in scfLoopDeviceHamiltonian
  File "./zipdir/NL/Calculators/CommonBuilder/DeviceBuilder.py", line 461, in createElectrostaticCalculator
  File "./zipdir/NL/Calculators/LCAOCalculator/DeviceLCAOCalculator.py", line 1632, in scfLoopDevice
  File "./zipdir/NL/Calculators/LCAOCalculator/DeviceLCAOCalculator.py", line 588, in scfLoopDeviceHamiltonian
  File "./zipdir/NL/Calculators/CommonBuilder/DeviceBuilder.py", line 461, in createElectrostaticCalculator
  File "./zipdir/NL/CommonConcepts/PoissonSolvers/MultigridSolver.py", line 55, in calculateFunctionOnGrid
MemoryError
  File "./zipdir/NL/CommonConcepts/PoissonSolvers/MultigridSolver.py", line 55, in calculateFunctionOnGrid
MemoryError



Many processes are still running, so I am not sure: will the results be valid, or should I kill the processes and start over after getting this error?


Offline Anders Blom

  • QuantumATK Staff
  • Supreme QuantumATK Wizard
  • *****
  • Posts: 5418
  • Country: dk
  • Reputation: 89
    • QuantumATK at Synopsys
Re: Error message - mem error running in parallel
« Reply #1 on: January 20, 2012, 22:32 »
That looks like a traceback, which is not just a warning. So it might be best to interrupt it.

Note that when you run in MPI parallel, each MPI process holds an (almost) entire copy of the whole calculation. So running 24 processes on one machine means you use 24x the amount of memory the same calculation uses in serial. It's generally best to limit the number of MPI processes per node to the number of sockets, or at least a reasonable number compared to the amount of RAM on the node.
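As a rough illustration of that rule of thumb, the sketch below estimates how many MPI processes fit on one node if each process needs about as much memory as the serial run. The function name, the reserve for the operating system, and the example numbers are all hypothetical; they are not taken from this thread or from ATK.

```python
def max_mpi_processes(node_ram_gb, serial_job_gb, cores, reserve_gb=2.0):
    """Largest process count whose combined memory fits in RAM, capped at the core count.

    Assumes every MPI process uses roughly the memory of the serial calculation.
    """
    usable_gb = node_ram_gb - reserve_gb  # leave some room for the OS (assumed value)
    if serial_job_gb <= 0 or usable_gb <= 0:
        return 0
    return max(0, min(cores, int(usable_gb // serial_job_gb)))


# Hypothetical example: 48 GB node, 5 GB serial job, 24 cores.
print(max_mpi_processes(48, 5, 24))
```

With these made-up numbers only 9 of the 24 cores could be used before the node runs out of memory. The memory of the serial run has to be measured for the actual calculation.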

Offline esp

  • Supreme QuantumATK Wizard
  • *****
  • Posts: 318
  • Country: us
  • Reputation: 3
    • University of Minnesota
Re: Error message - mem error running in parallel
« Reply #2 on: January 20, 2012, 22:36 »
Good tips. I will check on that and try again. Thank you.

Offline esp

  • Supreme QuantumATK Wizard
  • *****
  • Posts: 318
  • Country: us
  • Reputation: 3
    • University of Minnesota
running in parallel question
« Reply #3 on: January 24, 2012, 03:41 »
Maybe this is a dumb question, but what I am doing now is running a script with steps such as:

1) make cfg
2) setup calcs
3) run calcs
4) make plots

If I run in parallel, do I need to separate these? For example, I noticed that a print message I put in the routine that makes the cfg was printed 8 times, meaning it was run by each of the parallel threads. Is that bad? Should I only run the calculations in parallel? How do I know exactly what to separate? I was assuming the program knew how to do that, but since my make-config routine seems to have run 8 times, that seems strange: it should not have to make the configuration objects more than once. Please advise.


Offline Nordland

  • QuantumATK Staff
  • Supreme QuantumATK Wizard
  • *****
  • Posts: 812
  • Reputation: 18
Re: Error message - mem error running in parallel
« Reply #4 on: January 24, 2012, 18:43 »
Your approach is correct, but you should make one minor modification. First let me explain what happens. The same script starts on all nodes, so every process sets up the structures to calculate, and if you write print, all processes will print the message. Once the calculation starts, each process is assigned a different part of it. For example, when it comes to finding the eigenvectors for all the k-points, the first node takes the first part of the k-points, the second node the second part, and so on. Once done, they all continue through the script again until they reach another part that can benefit from MPI.

If you want tidier output from your scripts, you can use nlprint(....). It is an MPI-safe print, meaning that only the master node will print the message.

When it comes to plotting, you generally do not want all the nodes to make the same graph and overwrite the same output file, as that is bound to cause problems. Referring to http://www.quantumwise.com/documents/tutorials/latest/ParallelGuide/ParallelGuide.pdf you can see there is a function called processIsMaster(). So if you do something like this in your script:
Code: python
if processIsMaster():
    # Do some plotting (replace this placeholder with your plotting code)
    pass
then it is only the master node that will do plotting.
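To make the behaviour described above concrete, here is a small plain-Python simulation of the four-step script from the question, with no MPI or ATK involved. The names simulate_rank and simulate_job are hypothetical, and treating rank 0 as the master is an assumption made for the sketch; in a real ATK script the check would be processIsMaster().

```python
def simulate_rank(rank, log):
    """Record which script steps one simulated process executes."""
    is_master = (rank == 0)  # assumption: rank 0 plays the master role
    # Steps 1-3 run on every process; the real calculation is split up internally.
    log.append((rank, "make cfg"))
    log.append((rank, "setup calcs"))
    log.append((rank, "run calcs"))
    # Step 4 is guarded so that only one process writes the plot file.
    if is_master:
        log.append((rank, "make plots"))


def simulate_job(n_processes):
    """Run the same script once per simulated process, as a parallel launch would."""
    log = []
    for rank in range(n_processes):
        simulate_rank(rank, log)
    return log


log = simulate_job(8)
cfgs = [entry for entry in log if entry[1] == "make cfg"]
plots = [entry for entry in log if entry[1] == "make plots"]
print(len(cfgs), "processes built the configuration;", len(plots), "made plots")
```

The configuration step showing up 8 times is expected and harmless; only steps that write files or produce output need the master guard.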
« Last Edit: January 24, 2012, 20:42 by Nordland »

Offline esp

  • Supreme QuantumATK Wizard
  • *****
  • Posts: 318
  • Country: us
  • Reputation: 3
    • University of Minnesota
Re: Error message - mem error running in parallel
« Reply #5 on: January 24, 2012, 19:32 »
Thank you, that is very helpful.