Author Topic: It's memory's question  (Read 5227 times)

Offline wring

  • Regular QuantumATK user
  • **
  • Posts: 24
  • Reputation: 0
It's memory's question
« on: December 22, 2009, 01:22 »
     rank 15 in job 1  cu108-ib_38172   caused collective abort of all ranks
     Is this problem caused by too little memory? Each CPU has about 3 GB of memory.

Offline zh

  • QuantumATK Support
  • Supreme QuantumATK Wizard
  • *****
  • Posts: 1141
  • Reputation: 24
Re: It's memory's question
« Reply #1 on: December 22, 2009, 10:03 »
The reason and solution can be found in the following thread:
http://quantumwise.com/forum/index.php?topic=199.0

Offline wring

  • Regular QuantumATK user
  • **
  • Posts: 24
  • Reputation: 0
Re: It's memory's question
« Reply #2 on: December 23, 2009, 01:35 »
rank 0 in job 1  cu107-ib_48955   caused collective abort of all ranks
  exit status of rank 0: return code 137


We use Intel MPI in our cluster. Could this be causing the problem?
« Last Edit: December 23, 2009, 01:44 by wring »

Offline zh

  • QuantumATK Support
  • Supreme QuantumATK Wizard
  • *****
  • Posts: 1141
  • Reputation: 24
Re: It's memory's question
« Reply #3 on: December 23, 2009, 07:12 »
Quote
We use Intel mpi in our cluster.

This just means that the MPI installed on your cluster was built with Intel's C and Fortran compilers. Please check which MPI implementation and version you have.

Offline Anders Blom

  • QuantumATK Staff
  • Supreme QuantumATK Wizard
  • *****
  • Posts: 5411
  • Country: dk
  • Reputation: 89
Re: It's memory's question
« Reply #4 on: December 23, 2009, 11:43 »
ATK only functions with MPICH2 (and, quite possibly, MVAPICH).

Typical errors with other MPI implementations show up as all processes running as masters, so you don't actually get any parallel performance benefit, and collisions can occur in the I/O (when writing NetCDF files, for instance).

Offline wring

  • Regular QuantumATK user
  • **
  • Posts: 24
  • Reputation: 0
    • View Profile
Re: It's memory's question
« Reply #5 on: December 24, 2009, 01:34 »
But my senior colleague's jobs run fine, and when I run only one job on the cluster, it runs fine, too. When I put a second or third job in the computer, the problem occurs.
    Thanks a lot.

Offline Anders Blom

  • QuantumATK Staff
  • Supreme QuantumATK Wizard
  • *****
  • Posts: 5411
  • Country: dk
  • Reputation: 89
Re: It's memory's question
« Reply #6 on: December 24, 2009, 22:39 »
Not sure exactly what you mean by "put ... in the computer". If this means you run in parallel over more than one node, compared to just running on one node, then the error is to be expected. But, again, in any case, ATK is only supported with MPICH2; anything else is experimental and up to the user.

If you don't have a queue system that controls allocation to individual nodes, then certainly if you run several calculations simultaneously you can run out of memory.
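The return code 137 reported earlier in this thread would be consistent with that. By the usual Unix shell convention, an exit status above 128 means the process was terminated by signal number (status - 128), and 137 - 128 = 9 is SIGKILL, which is what the Linux out-of-memory killer sends. Other things can send SIGKILL too, so treat this as a hint rather than proof. A minimal sketch for decoding such a status (the helper name `describe_exit_status` is made up for illustration and is not part of ATK or any MPI):

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status (hypothetical helper for illustration)."""
    if status > 128:
        # Convention: 128 + N means "terminated by signal N".
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with code {status}"


# The status from the error message quoted earlier in the thread.
print(describe_exit_status(137))
```

If the status decodes to SIGKILL, the kernel log on the affected node is the next place to look for out-of-memory messages.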

Finally, note that to run several calculations simultaneously you need more than one master license. If this is the problem, you'll see an error message in your "stderr" file (if you have a queue).

Offline wring

  • Regular QuantumATK user
  • **
  • Posts: 24
  • Reputation: 0
It's memory's question
« Reply #7 on: January 7, 2010, 03:26 »
  Our new cluster has 8 CPUs per node, and the total memory of each node is 24 GB. How many atoms we calculate per node are? I ask because recently our calculations always crash the computer.

Offline zh

  • QuantumATK Support
  • Supreme QuantumATK Wizard
  • *****
  • Posts: 1141
  • Reputation: 24
Re: It's memory's question
« Reply #8 on: January 7, 2010, 07:41 »
Quote
"How many atoms we calculate per node are?"
Taking your question literally, it seems you can answer it yourself by looking at your input file.

How many atoms in a system can be calculated on each node?
Usually there is no definite answer, because the required computing resources also depend on the choice of other parameters, not only on the total number of atoms.

Offline Anders Blom

  • QuantumATK Staff
  • Supreme QuantumATK Wizard
  • *****
  • Posts: 5411
  • Country: dk
  • Reputation: 89
Re: It's memory's question
« Reply #9 on: January 7, 2010, 11:29 »
As zh writes, the total memory usage is a complicated function of many parameters, not just the number of atoms. Each element has a different number of valence electrons, and that matters more than the number of atoms. We also have to consider, at a minimum, the basis set size (which is what really determines the matrix sizes), the mesh cut-off, and the k-point sampling.

However, I believe another point is most crucial in your case. If your 8 CPUs share this 24 GB of memory and you run 8 MPI processes on the machine, the memory available to each process is effectively only 3 GB (24 GB / 8), because each MPI process uses the same amount of RAM. So, to test how large a system you can run, start by using only one CPU and monitor the memory usage. If, for instance, the job takes 10 GB, then you can only use 2 MPI processes.
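The last step can be written as a small calculation. This is only a rough sketch: it assumes, as described above, that every MPI process needs the same amount of memory as the single-process test run, and it ignores memory used by the operating system and other jobs. The function name `max_mpi_processes` and its parameters are made up for illustration.

```python
def max_mpi_processes(node_memory_gb: float,
                      per_process_gb: float,
                      cores_per_node: int) -> int:
    """Estimate how many MPI processes fit on one node.

    Assumes each process uses `per_process_gb`, as measured in a
    single-process test run (an assumption, not a guarantee).
    """
    if per_process_gb <= 0:
        raise ValueError("per_process_gb must be positive")
    # How many copies of the job fit into the node's memory.
    fit_in_memory = int(node_memory_gb // per_process_gb)
    # Never start more processes than there are cores.
    return min(cores_per_node, fit_in_memory)


# The example from the post: a 24 GB node with 8 cores, job needs 10 GB.
print(max_mpi_processes(24, 10, 8))  # 2
```

In practice it is probably wise to leave some headroom below the node's total memory, since the measured usage can grow during the calculation.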