Author Topic: Issue with parallel two probe calculations (Read 6267 times)

lamkt · « **on:** January 28, 2009, 04:26 »

Hi all,

Has anyone else noticed that in a parallel two-probe calculation, the SCF loop for the scattering region (the one right after the calculation of the Fermi energy of the electrodes and before the calculation of the charge of the system) runs on a single processor?

I'm wondering if there is any way to make that part of the calculation run on multiple processors to speed things up.

Any suggestions will be much appreciated.

Cheers!
Dick LAM

Anders Blom · « **Reply #1 on:** January 28, 2009, 10:32 »

First of all, just to be sure, did you verify that your MPI setup is correct, such that ATK really can run in parallel properly? That is, when you run (test_mpi.py)

Code

from ATK.MPI import *

# Each MPI process reports its role: expect one master and the rest slaves.
if processIsMaster():
    print '# Master node'
else:
    print '# Slave node'

using mpiexec -n 2 atk test_mpi.py, it prints "Master node" once and "Slave node" once.

In general, the two-probe part should be more parallel than even the bulk part. However, there are parts of the code which run in serial (it's simply not possible to parallelize 100% of the code), so if you check the CPU usage exactly at such a point, it will look like it doesn't run in parallel. So it matters which method you use for checking the parallelism (how did you do it?). We also need to be careful about what we mean by parallel: OpenMP or MPI...
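To illustrate the point about serial parts, here is a minimal sketch of Amdahl's law, which gives the best possible speedup when only a fraction of the work runs in parallel. This is not ATK code, and the 0.90 parallel fraction is an arbitrary assumption chosen for illustration, not a measurement of ATK.

```python
# Amdahl's law: upper bound on speedup when only part of the work is parallel.
# The 0.90 parallel fraction used below is an arbitrary illustration,
# not a measured property of ATK.

def amdahl_speedup(parallel_fraction, n_processes):
    """Ideal speedup if parallel_fraction of the runtime scales perfectly."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processes)

for n in (1, 2, 4, 8):
    print("%d processes -> speedup %.2f" % (n, amdahl_speedup(0.90, n)))
```

While the serial fraction is executing, only one process shows any load, which is what a load monitor would report at that moment.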

lamkt · « **Reply #2 on:** January 28, 2009, 11:22 »

Yes, the MPI setup is correct and ATK can run in parallel using the test_mpi.py script. I've also verified that the right types of license (master and slaves) are checked out.

To check for parallelism, I simply look at the load on my machines. I have a job with 4 processors. While I can see that there are 4 instances of ATK running, only one of them is active when the program is running the SCF for the Fermi energy of the scattering region. Otherwise, all 4 of them are active. I'm looking at the output file (runlevel 6) to determine which part of the calculation the program is running. This is more prominent as my scattering region is over 1200 atoms. Serial calculation at:

Code

Equivalent Bulk Calculation (Initial Density for TwoProbe)

My system is installed with MPICH2 1.0.7. I have the job scheduler/load balancing software Torque/Maui installed (with the appropriate mpiexec script), but I don't think that is the cause of the issue, since I observe the same behavior when I'm not using the job scheduler. The OS is Fedora 7 and the ATK version is 2008.02.1. I'm tempted to run without the initial bulk SCF loop, but as the job has already been running for 200+ hrs I'm not inclined to stop it and try again. I tried that once before and found that, although there is a .nc file, it cannot be used to restart/continue the calculation.

Anders Blom · « **Reply #3 on:** January 28, 2009, 11:40 »

Ok, it seems your MPI is not the issue.

I think maybe there is nothing wrong here, actually...

I assume you have (1,1) k-points in the XY directions? The equivalent bulk run is carried out with (Nkx,Nky,1) k-points where (Nkx,Nky) are those given to the electrodes. Nkz given to the electrodes is ignored, since the cell is always long enough in Z not to need any significant k-point sampling, and the equivalent bulk run is anyway just an approximation.

In that case, there isn't really much for ATK to parallelize over. Parallelization in ATK is primarily implemented in 3 places:

k-point sampling, but if you have (1,1,1) this means only one node is active
energy sampling, but this only kicks in in the real two-probe calculation part (with open boundary conditions)
generation of matrix elements, but beyond 400 atoms this is the smaller part of the SCF iteration
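The k-point argument above can be sketched as follows. This is only an illustration: the round-robin assignment and the function names are assumptions (ATK's actual distribution scheme is not described in this thread), symmetry reduction of k-points is ignored, and the electrode sampling (1, 1, 100) is a hypothetical value.

```python
# Sketch: how many MPI processes get work if the only parallelization
# available is over k-points. Round-robin assignment is an assumption made
# for illustration; ATK's real scheme may differ.

def equivalent_bulk_kpoints(electrode_kpoints):
    """Electrode (Nkx, Nky, Nkz) -> (Nkx, Nky, 1); Nkz is ignored."""
    nkx, nky, _nkz = electrode_kpoints
    return (nkx, nky, 1)

def assign_kpoints(kpoint_grid, n_processes):
    """Deal k-point indices out to processes one at a time."""
    nkx, nky, nkz = kpoint_grid
    work = [[] for _ in range(n_processes)]
    for k in range(nkx * nky * nkz):
        work[k % n_processes].append(k)
    return work

def active_processes(kpoint_grid, n_processes):
    """Number of processes that received at least one k-point."""
    return sum(1 for w in assign_kpoints(kpoint_grid, n_processes) if w)

grid = equivalent_bulk_kpoints((1, 1, 100))  # hypothetical electrode sampling
print("grid %s -> %d of 4 processes active" % (grid, active_processes(grid, 4)))
```

With a single k-point there is nothing to hand to the other processes, whereas a (2, 2, 1) grid would keep all four busy in this simple model.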

There is one thing that could help you in this situation. While the matrix diagonalization does not parallelize over MPI, it does thread on multi-core CPUs using OpenMP. Thus, if you enable threading in ATK (see the manual appendix), and run such that each MPI node is a multi-core CPU, you should get "double parallelization": MPI parallelization over k-points, energy points, and matrix elements, and OpenMP threading to speed up the matrix diagonalization.

Anders Blom · « **Reply #4 on:** January 28, 2009, 11:45 »

Another thing to note is that the version you run, 2008.02, is significantly slower than the latest release, 2008.10. The difference can be as large as 2-5 times, and the difference is especially noticeable for the equivalent bulk run when using (1,1) k-point sampling, since the newer version uses real matrices instead of complex, which not only reduces calculation time by 4 times or so, but also memory usage.

The equivalent bulk part is generally the most memory-intensive part of a two-probe run, and thus it is the part that sets the limit on how large a calculation you can run. The latest release goes a long way to reduce this memory allocation and thus allows larger systems to be simulated.
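As a rough back-of-the-envelope illustration of why real matrices help: a double-precision real number takes 8 bytes and a complex one 16, and a naive complex multiplication costs 4 real multiplications. The sketch below assumes dense square matrices and a hypothetical 9 orbitals per atom; neither is a statement about how ATK actually stores its matrices or which basis set is in use.

```python
# Rough memory estimate for one dense N x N matrix, real vs complex.
# Dense storage and 9 orbitals per atom are assumptions for illustration only.

BYTES_REAL = 8       # one IEEE double
BYTES_COMPLEX = 16   # two IEEE doubles (real and imaginary part)

def dense_matrix_gib(n_basis, bytes_per_element):
    """Size in GiB of a dense n_basis x n_basis matrix."""
    return n_basis * n_basis * bytes_per_element / float(1024 ** 3)

n_atoms = 1200           # system size mentioned in the thread
orbitals_per_atom = 9    # hypothetical; depends on the basis set
n_basis = n_atoms * orbitals_per_atom

print("real:    %.2f GiB per matrix" % dense_matrix_gib(n_basis, BYTES_REAL))
print("complex: %.2f GiB per matrix" % dense_matrix_gib(n_basis, BYTES_COMPLEX))
```

Under these assumptions, every matrix that can be kept real needs half the memory of its complex counterpart.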

Nordland · « **Reply #5 on:** January 28, 2009, 11:47 »

Hey lamkt.

I have two things you can try:

Skip the "Equivalent Bulk Calculation": If your system is well-conditioned, you can disable it by using twoProbeAlgorithmParameters(initial_density_type=InitialDensityType.NeutralAtom). It will then run a lot faster, but there is a small risk of convergence trouble if the system is ill-conditioned.

Upgrade to ATK 2008.10: Since you say there is no speedup at all, I am guessing that it is a 1D system, or in other words, that the "Equivalent Bulk" part of the calculation uses only the Gamma point. In that case you can gain a factor of 4 in speed by upgrading to ATK 2008.10, since according to the changelog it has improved algorithms for this.

Best regards,
Nordland

lamkt · « **Reply #6 on:** January 28, 2009, 11:51 »

A big thank you to both Dr. Blom and Nordland. I'll try the upgrade once the current calculations end and compare the speed and results for the same system.

QuantumATK Forum
