Topic: very strange restart calculation


Offline postnikov

  • Regular QuantumATK user
  • **
  • Posts: 22
  • Reputation: 0
    • View Profile
very strange restart calculation
« on: June 30, 2010, 05:44 »
I am running calculations with ATK. The charge in the scattering region varied quite smoothly, as follows.

# sc  0 : q =  430.00000 e
# sc  1 : q =  415.65637 e  dRho =  9.8695E-01
# sc  2 : q =  620.18219 e  dRho =  5.9300E+01
# sc  3 : q =  442.87604 e  dRho =  5.8095E+01
# sc  4 : q =  439.45299 e  dRho =  9.4812E-01
# sc  5 : q =  437.16236 e  dRho =  3.9194E-01
.......
# sc 36 : q =  432.10083 e  dRho =  5.4732E-01
# sc 37 : q =  431.76483 e  dRho =  6.5820E-02
# sc 38 : q =  431.26704 e  dRho =  1.7698E-01
# sc 39 : q =  430.68879 e  dRho =  3.4002E-01
# sc 40 : q =  431.47270 e  dRho =  3.1151E-01
# sc 41 : q =  431.60416 e  dRho =  1.8738E-02
# sc 42 : q =  431.88605 e  dRho =  4.7449E-02


Unfortunately, my calculation was interrupted by a computer shutdown. When I restarted it,
the charge in the scattering region looked very strange.

# sc  0 : q =  432.55992 e
# sc  0 : q =  432.55992 e
# sc  0 : q =  432.55992 e
# sc  1 : q =   -4.00223 e  dRho =  3.6860E+01
# sc  1 : q =   -4.00223 e  dRho =  3.6860E+01
# sc  1 : q =   -4.00223 e  dRho =  3.6860E+01



Offline Anders Blom

  • QuantumATK Staff
  • Supreme QuantumATK Wizard
  • *****
  • Posts: 5428
  • Country: dk
  • Reputation: 89
    • View Profile
    • QuantumATK at Synopsys
Re: very strange restart calculation
« Reply #1 on: June 30, 2010, 11:18 »
Apart from the negative charge, which could recover in step 2, the second calculation is somehow not running properly in parallel. It seems all nodes think they are master nodes... How did you restart (script and command line)?
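One way to spot this symptom in a log is to count how often each SCF step is printed. The following is only a rough sketch in plain Python (not part of ATK); the function name repeated_scf_steps and the regular expression are my own assumptions, based on the log format pasted in the first post.

```python
import re
from collections import Counter

# Matches SCF log lines such as "# sc  1 : q =   -4.00223 e  dRho =  3.6860E+01".
# The pattern is an assumption derived from the output pasted in this thread.
SC_LINE = re.compile(r"^#\s*sc\s+(\d+)\s*:\s*q\s*=\s*(-?\d+\.\d+)\s*e")


def repeated_scf_steps(log_text):
    """Return {step: count} for every SCF step that is printed more than once."""
    counts = Counter()
    for line in log_text.splitlines():
        match = SC_LINE.match(line.strip())
        if match:
            counts[int(match.group(1))] += 1
    return {step: n for step, n in counts.items() if n > 1}


# The restart output from the first post: every step appears three times.
restart_log = """\
# sc  0 : q =  432.55992 e
# sc  0 : q =  432.55992 e
# sc  0 : q =  432.55992 e
# sc  1 : q =   -4.00223 e  dRho =  3.6860E+01
# sc  1 : q =   -4.00223 e  dRho =  3.6860E+01
# sc  1 : q =   -4.00223 e  dRho =  3.6860E+01
"""

print(repeated_scf_steps(restart_log))  # prints {0: 3, 1: 3}
```

An empty result means each step was printed once, as expected when only the master node writes the log. Repeated steps suggest that several processes each believe they are the master.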

Offline postnikov

Re: very strange restart calculation
« Reply #2 on: June 30, 2010, 17:07 »
This is the relevant part of my initial script:

.
.
.
.
.
runtime_parameters = runtimeParameters(
    verbosity_level = 10,
    checkpoint_filename = 'twoprobe.nc'
)
# Perform self-consistent field calculation
scf = executeSelfConsistentCalculation(
    twoprobe_configuration,
    two_probe_method,
    runtime_parameters = runtime_parameters,
    initial_calculation = scf
)
.
.
.
.
.

My input file for the restart calculation looks like this:

.
.
.
.
.
runtime_parameters = runtimeParameters(
    verbosity_level = 10,
    checkpoint_filename = 'newtwoprobe.nc'
)
# Restore the interrupted calculation from the old checkpoint file
scf = restoreSelfConsistentCalculation("twoprobe.nc")
# Perform self-consistent field calculation
scf = executeSelfConsistentCalculation(
    twoprobe_configuration,
    two_probe_method,
    runtime_parameters = runtime_parameters,
    initial_calculation = scf
)

.
.
.
.
.
 

Offline Anders Blom

Re: very strange restart calculation
« Reply #3 on: June 30, 2010, 18:16 »
And exactly the same command for mpiexec?

Offline postnikov

Re: very strange restart calculation
« Reply #4 on: July 1, 2010, 09:08 »
Yes, both runs were submitted with the following command:

mpirun -np 8 /home/postnikov/atk-2008-10/bin/atk test.py </dev/null | tee  out&

Running "which mpirun" shows:

/opt/intel/mpich2-1.0.7rc1/bin/mpirun


Offline Anders Blom

Re: very strange restart calculation
« Reply #5 on: July 1, 2010, 09:13 »
It seems you are not running with the proper MPICH2 libraries. You must use that MPI; ATK does not support Intel MPI, even if it is supposed to be "compatible". The effect you see, with all nodes thinking they are masters, is a typical symptom. It also means you are not actually getting proper parallel performance.

To run ATK in parallel, you should use "mpiexec" from MPICH2 (the one from Argonne!).

Unless your mpirun is some kind of alias for that...?

Offline postnikov

Re: very strange restart calculation
« Reply #6 on: July 1, 2010, 09:34 »
Thanks!

I don't think my mpirun is Intel MPI.

This MPICH2 was simply built with the ifort compiler, not the PGI compiler.
It is installed in my directory /opt/intel/mpich2-1.0.7rc1/bin/

There is also a file mpiexec there.

Do you mean that I must use mpiexec instead of mpirun, even though both belong to MPICH2?

By the way, can OpenMPI be used with ATK?

Offline Anders Blom

Re: very strange restart calculation
« Reply #7 on: July 1, 2010, 10:06 »
No, OpenMPI cannot be used; it is a completely different MPI architecture compared to MPICH(2).

In MPICH1 the command was "mpirun"; in MPICH2 they changed it to "mpiexec", but there is usually a symbolic link from mpirun to mpiexec for compatibility, so that probably makes no real difference.
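To check whether the two launcher names really point at the same binary, something like the following could be used. It is a minimal sketch using only the Python standard library; the helper name resolve_launcher and its search_path parameter are hypothetical, introduced here for illustration.

```python
import os
import shutil


def resolve_launcher(name, search_path=None):
    """Return (found_path, real_path) for an executable on PATH, or None if absent.

    real_path has all symbolic links resolved, so two launcher names that
    are links to the same binary end up with the same real_path.
    """
    found = shutil.which(name, path=search_path)
    if found is None:
        return None
    return found, os.path.realpath(found)


# Compare what the two launcher names resolve to on the current PATH.
for launcher in ("mpirun", "mpiexec"):
    print(launcher, "->", resolve_launcher(launcher))
```

If both names report the same real path, mpirun is just a link to mpiexec. If they differ, the two commands may belong to different MPI installations.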

In any case, ATK only officially supports MPICH2 from Argonne. 1.0.7rc1 is also a very old version; 1.2.1 is available now. I suggest you install that.