QuantumATK Forum

QuantumATK => General Questions and Answers => Topic started by: naash on April 4, 2011, 06:23

Title: logfile meaning
Post by: naash on April 4, 2011, 06:23
Numbers of Processors:  10
---------------------------
started mpd Number: 5
---------------------------
/share/apps/ATK/atkpython/bin/atkpython: line 3:  7097 Killed                  PSEUDOPOTENTIALS_PATH=$EXEC_DIR/../share/pseudopotentials GPAW_SETUP_PATH=$EXEC_DIR/../share/gpaw-setups/ PYTHONHOME=$EXEC_DIR/.. PYTHONPATH= LD_LIBRARY_PATH=$EXEC_DIR/../lib $EXEC_DIR/atkpython_exec $*


rank 8 in job 1  node9_42312   caused collective abort of all ranks
  exit status of rank 8: return code 137

What does the above message mean?

Title: Re: logfile meaning
Post by: Anders Blom on April 4, 2011, 10:37
Usually it means you have run out of memory. Your first lines seem to indicate that you are running 10 MPI processes on 5 nodes. That means each node runs two processes, which doubles the memory requirement per node (each process uses about the same amount of memory as a serial run). The rule of thumb is one MPI process per physical node, unless the calculation is small compared to the available RAM.

Also, if this is ATK 10.8, you can consider upgrading to 11.2, which uses less memory for two-probe calculations.
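
For readers who want to check the numbers in the log themselves, here is a minimal Python sketch. It rests on two assumptions that are not stated in the thread: that "return code 137" follows the usual POSIX shell convention of 128 plus the signal number (so 137 would be signal 9, SIGKILL, which matches the "Killed" line and is what the Linux out-of-memory killer typically sends), and that memory per node scales linearly with the number of MPI processes on it. The function names and the 4.0 GB serial footprint are hypothetical and only serve as illustration.

```python
import signal


def decode_exit_status(code):
    """Return the signal implied by a shell-style exit status, or None.

    Assumes the POSIX shell convention: status = 128 + signal number.
    """
    if code <= 128:
        return None
    try:
        return signal.Signals(code - 128)
    except ValueError:
        # Not a known signal number on this platform.
        return None


def processes_per_node(total_processes, nodes):
    """Largest number of MPI processes any node gets with an even spread."""
    return -(-total_processes // nodes)  # ceiling division


def estimated_memory_per_node(total_processes, nodes, serial_memory_gb):
    """Rough per-node estimate, assuming each MPI process needs about as
    much memory as a serial run (the rule of thumb from the reply above)."""
    return processes_per_node(total_processes, nodes) * serial_memory_gb


if __name__ == "__main__":
    sig = decode_exit_status(137)             # return code from the log
    print(sig.name if sig else "no signal")
    print(processes_per_node(10, 5))          # 10 processes, 5 mpd nodes
    # 4.0 GB is a made-up serial footprint, not a value from the thread.
    print(estimated_memory_per_node(10, 5, 4.0))
```

If the estimate exceeds the RAM installed on a node, reducing the job to one MPI process per node is the first thing to try.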