Topic: One error occurred when calculating the DOS

Offline John

  • Heavy QuantumATK user
One error occurred when calculating the DOS
« on: March 12, 2011, 02:02 »
Dear Sir,
    An error occurred when I calculated the Device DOS of my two-probe system, which has 270 atoms (carbon and hydrogen), using 2x2 CPUs. The error is shown below; can you give any suggestions? I am running the newest version, 11.2. Thanks.


+------------------------------------------------------------------------------+
|                                                                              |
| Transmission Spectrum Analysis                                               |
|                                                                              |
+------------------------------------------------------------------------------+

                            |--------------------------------------------------|
Calculating Transmission   : ==================================================

+----------------------------------------------------------+
| Transmission Spectrum Report                             |
| -------------------------------------------------------- |
| Left electrode Fermi level  = -3.993602e+00 eV           |
| Right electrode Fermi level = -3.993602e+00 eV           |
| Energy zero                 = -3.993602e+00 eV           |
+----------------------------------------------------------+
   energy       T(up)
     eV
 -2.000000e+00   8.812919e-01
 -1.990000e+00   8.925038e-01
 -1.980000e+00   9.001541e-01

...........................................

  1.980000e+00   9.576673e-01
  1.990000e+00   9.638053e-01
  2.000000e+00   9.703618e-01
+----------------------------------------------------------+
+------------------------------------------------------------------------------+
|                                                                              |
| Device Density of States                                                     |
|                                                                              |
+------------------------------------------------------------------------------+

                            |--------------------------------------------------|
Calculating DOS            : =================================================Fatal error in MPI_Allreduce: Message truncated, error stack:
MPI_Allreduce(773).......: MPI_Allreduce(sbuf=0x95abc30, rbuf=0x1c820e0, count=4, MPI_INT, MPI_SUM, MPI_COMM_WORLD) failed
MPIR_Reduce(764).........:
MPIR_Reduce_binomial(172):
do_cts(490)..............: Message truncated; 1405104 bytes received but buffer size is 16
rank 0 in job 17  node18_42298   caused collective abort of all ranks
  exit status of rank 0: killed by signal 9
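
The numbers in the error stack above can be checked with a little arithmetic. The following is a minimal sketch in Python, assuming that MPI_INT is a 4-byte integer on this cluster (the log does not state this, although the 16-byte buffer for count=4 is consistent with it). The function names are hypothetical and are not part of ATK or MPICH2.

```python
# Figures copied from the error stack above.
COUNT = 4                 # count argument passed to MPI_Allreduce
BUFFER_BYTES = 16         # "buffer size is 16"
RECEIVED_BYTES = 1405104  # "1405104 bytes received"

# Assumption: MPI_INT is a 4-byte integer on this cluster.
SIZEOF_MPI_INT = 4


def expected_bytes(count, item_size):
    """Size in bytes of a buffer holding `count` items."""
    return count * item_size


def items_in(n_bytes, item_size):
    """Number of whole items that fit in `n_bytes`."""
    return n_bytes // item_size


# The receive buffer matches the count=4 reduction: 4 * 4 = 16 bytes.
print(expected_bytes(COUNT, SIZEOF_MPI_INT) == BUFFER_BYTES)

# The incoming message is far larger than that reduction expects.
print(items_in(RECEIVED_BYTES, SIZEOF_MPI_INT))
```

Under that assumption, the incoming message holds 351276 integers where 4 were expected. One possible reading is that the ranks were not all in the same collective call at that point; this is an interpretation of the log, not a confirmed diagnosis.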

Offline ajaramil

  • New QuantumATK user
Re: One error occurred when calculating the DOS
« Reply #1 on: March 12, 2011, 03:39 »
I had the same problem when running an LDOS calculation on a two-probe system on multiple processors. 

This is most likely an MPI coding problem in the LDOS/DOS routines in ATK 11.2.0, so as a quick workaround I just ran the same calculation on a single processor and it worked fine.

Cheers

Offline zh

  • Supreme QuantumATK Wizard
Re: One error occurred when calculating the DOS
« Reply #2 on: March 12, 2011, 10:30 »
Which version of MPI are you using? It seems that heavy communication between nodes caused the problem. Please try MPICH2, and also use more nodes.
« Last Edit: March 12, 2011, 13:34 by zh »

Offline Anders Blom

  • QuantumATK Staff
Re: One error occurred when calculating the DOS
« Reply #3 on: March 12, 2011, 13:24 »
We cannot rule out that ATK is to blame, but in almost all cases this error is related to the hardware on the cluster. Different codes require different performance and features from the MPI environment, so it may be that some software packages do not trigger the problem while ATK does.

We'll try to run some tests on our own cluster + an external one next week. If you don't mind posting your offending scripts (all details), it will make it easier for us.

(Btw, don't try to use OpenMPI with ATK, it does not work at all.)

Offline John

  • Heavy QuantumATK user
Re: One error occurred when calculating the DOS
« Reply #4 on: March 13, 2011, 03:27 »
Dear Sir,
    My MPI is MPICH2 1.0.8.
    I have tested the simple Li-H2-Li system. Almost no errors occurred in parallel calculations for any of the analyses, including the LDOS and DOS calculations.
    Maybe, as ajaramil said, this is most likely an MPI coding problem in ATK 11.2.0 when calculating the LDOS, DOS, and transmission pathways of a large system (such as my 200 atoms) in parallel (2x2 nodes). The same calculations (LDOS, DOS, and transmission pathways) worked fine on a single processor.

Offline John

  • Heavy QuantumATK user
Re: One error occurred when calculating the DOS
« Reply #5 on: March 24, 2011, 03:22 »
Dear Sir,
     Fortunately, the problem with the parallel LDOS calculation is solved, but I still have a problem when calculating the DOS in parallel on 2x2 nodes with the newest version, 11.2.1: the DOS output prints only "nan", as shown below.
PS:  My MPI is MPICH2 1.0.8.   The command is
"/home/user1/bin/mpich2/bin/mpiexec -machinefile node -n 4 /home/user1/bin/atk112/atkpython/bin/atkpython test.py >& test.txt"
Can you give any advice?  Thank you.

output:
                            |--------------------------------------------------|
Calculating DOS            : ==================================================

+----------------------------------------------------------+
| Density of States Report                                 |
| -------------------------------------------------------- |
| Left electrode Fermi level  = -2.244959e+00              |
| Right electrode Fermi level = -2.244959e+00              |
| Energy zero                 = -2.244959e+00              |
| Units = eV    1/eV                                       |
+----------------------------------------------------------+
 -2.000000e+00            nan
 -1.990000e+00            nan
 -1.980000e+00            nan
 -1.970000e+00            nan
 -1.960000e+00            nan
 -1.950000e+00            nan
.......................................
  1.950000e+00            nan
  1.960000e+00            nan
  1.970000e+00            nan
  1.980000e+00            nan
  1.990000e+00            nan
  2.000000e+00            nan

Offline Anders Blom

  • QuantumATK Staff
Re: One error occurred when calculating the DOS
« Reply #6 on: March 24, 2011, 11:45 »
First of all, let us rule out the possibility that this is because you run it in parallel. Thus: do you get the same NaNs if you run in serial? Beyond that, we would have to see the script in order to troubleshoot.
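
A minimal sketch of that check, assuming the output of each run was redirected to a text file in the report format shown above. The Python snippet below counts how many DOS values in a report are NaN, so that a serial and a parallel run can be compared. The helper names are hypothetical and are not part of ATK; the parser relies only on the "Density of States Report" header and the two-column rows that follow it.

```python
import math

REPORT_HEADER = "Density of States Report"


def parse_dos_report(text):
    """Return (energy, dos) pairs found after the DOS report header."""
    rows = []
    in_report = False
    for line in text.splitlines():
        if REPORT_HEADER in line:
            in_report = True
            continue
        if not in_report:
            continue  # skip earlier sections such as the transmission table
        parts = line.split()
        if len(parts) != 2:
            continue  # box borders, unit lines, elision dots
        try:
            rows.append((float(parts[0]), float(parts[1])))
        except ValueError:
            continue  # two tokens, but not two numbers
    return rows


def count_nan(rows):
    """Count rows whose DOS value is NaN."""
    return sum(1 for _, dos in rows if math.isnan(dos))


# Excerpt in the format shown in this thread.
SAMPLE = """\
 -2.000000e+00   8.812919e-01
| Density of States Report                                 |
| Units = eV    1/eV                                       |
 -2.000000e+00            nan
 -1.990000e+00            nan
  2.000000e+00            nan
"""

rows = parse_dos_report(SAMPLE)
print(f"{count_nan(rows)} of {len(rows)} DOS values are NaN")
```

To compare runs, read each output file (for example the test.txt from the command above and a hypothetical serial.txt) and pass its contents to parse_dos_report.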

Offline John

  • Heavy QuantumATK user
Re: One error occurred when calculating the DOS
« Reply #7 on: March 25, 2011, 03:15 »
I still cannot get the DOS in a serial calculation for my three-dimensional Au-molecule-Au system.

Offline Anders Blom

  • QuantumATK Staff
Re: One error occurred when calculating the DOS
« Reply #8 on: March 25, 2011, 11:11 »
We are working on a solution for this problem, which has been reported by other users as well.

Offline Anders Blom

  • QuantumATK Staff
Re: One error occurred when calculating the DOS
« Reply #9 on: April 7, 2011, 10:07 »
This issue is fixed in ATK 11.2.2.