Topic: Possible Bugs: Call QuantumWise Staffs for Help

anyuezhiji
« on: July 27, 2014, 11:10 »


Possible Bug 1:
Applying nothing but a coordinate translation leads to different results (see attachments for details).
This bug appears to remain in the latest version, Atomistix ToolKit 2014.b1 [Build 6d212c8].
Is something wrong with how the neighbor list is built?


Possible Bug 2:
Does multiprocessing use too much memory?
Are there n copies of the data when running a job on n cores (all cores in one computer)?
If so, it would use n times the memory.
However, with VASP 5.3.5, an n-core job uses only a little more memory than a serial one.
Is multiprocessing done with an unsuitable Python module? Or is something wrong in how the program handles parallel processes?

zh
« Reply #1 on: July 28, 2014, 10:14 »
For your 1st question:
The lattice vectors defined in your script file correspond to those of a hexagonal supercell of graphene. But the atomic coordinates defined in your script file are wrong, because they include some atoms in a neighboring image of that supercell. This causes some atoms to overlap in the supercell calculation, although no error message was reported while the job was running. So both of your calculations are wrong.

For your 2nd question:
Please provide more detailed information rather than a rough claim. For a typical calculation run both serially and in parallel with multiple processes, you can use the "top" command to check the memory usage. It would also help if you described how you ran the job in parallel.

Please check the manual for running ATK in parallel:
http://www.quantumwise.com/documents/tutorials/latest/ParallelGuide/ParallelGuide.pdf


anyuezhiji
« Reply #2 on: July 28, 2014, 14:53 »
For the 1st question:
1. I believe that no atoms overlap (see attachments).
2. Even if there were overlapping atoms, a coordinate translation should give the same result in a correct program.
3. The difference caused by the coordinate translation appears in the DFT and Slater-Koster modules, but disappears in the Extended Hückel module.

For the 2nd question:
I found this stated here: http://quantumwise.com/support/faq/91-how-many-atoms-can-be-computed-using-atk?catid=21%3Atechnical
Quote
An important thing to note is that each MPI node uses the same amount of RAM. So if one assigns more than one MPI processes to the same node, the memory requirement goes up quickly on that node.
I hope shared memory can be used in ATK's Python multiprocessing to save RAM.

Anders Blom (QuantumATK Staff)
« Reply #3 on: July 29, 2014, 20:20 »
1. There are definitely atoms overlapping.
2. No. If two atoms are in the same position (modulo a lattice translation vector) you will get wrong results (in any code).
3. The design of MPI - the way it is used in ATK - is to duplicate the memory per MPI process. If you instead do a multithreaded calculation, this is not the case. MPI was originally designed as a way to distribute calculations across separate nodes in a cluster environment. Using MPI on a multicore machine can provide some performance improvement, but it's in the nature of the problem that you need enough memory to use it in this way. Note however that you will most likely not get a good speed-up by putting, say, 4 MPI processes on a single quad-core machine - there will be too much competition between the processes for RAM and cache access.

anyuezhiji
« Reply #4 on: July 30, 2014, 05:29 »
Well, but which atoms are overlapping?

I used the lattice vectors in Graphene1.uc to repeat Graphene1.xyz 2×2×1 along the A, B and C axes,
then exported the 2×2×1 model as Graphene1(2.2.1).xyz,
imported it into Graphene1(2.2.1).xlsx and sorted the coordinates (x as the primary sort field, y as the secondary sort field),
then calculated the distances between the coordinates in adjacent rows.
None of the distances is less than 2.46 (if atoms overlapped, there would have to be a distance close to zero).
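
The sort-and-compare check above only looks at neighbouring rows of the spreadsheet. A minimal Python sketch of a more direct check follows: it computes the smallest distance between any two atoms, including copies shifted by lattice vectors. The function name `min_periodic_distance` and the toy cubic cell are illustrative assumptions, not taken from the attached scripts. The sketch assumes Cartesian coordinates and atoms lying within about one cell of each other, so that shifts of -1, 0 and +1 along each lattice vector are enough.

```python
import itertools
import math


def min_periodic_distance(positions, lattice, images=(-1, 0, 1)):
    """Smallest distance between two different atoms, periodic images included.

    positions: list of (x, y, z) Cartesian coordinates.
    lattice:   three lattice vectors (a, b, c), each an (x, y, z) tuple.
    """
    # Every shift i*a + j*b + k*c for the requested image indices.
    shifts = [
        tuple(i * a + j * b + k * c for a, b, c in zip(*lattice))
        for i, j, k in itertools.product(images, repeat=3)
    ]
    best = math.inf
    for p, q in itertools.combinations(positions, 2):
        for s in shifts:
            shifted_q = tuple(qc + sc for qc, sc in zip(q, s))
            best = min(best, math.dist(p, shifted_q))
    return best


# Toy cubic cell (hypothetical): the second atom is a periodic image of the first.
cubic = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
overlapping = min_periodic_distance([(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)], cubic)
separated = min_periodic_distance([(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)], cubic)
print(overlapping, separated)
```

A result close to zero flags two atoms that coincide modulo a lattice translation; a result near the expected bond length means no overlap was found.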

« Last Edit: July 30, 2014, 05:58 by anyuezhiji »

Anders Blom (QuantumATK Staff)
« Reply #5 on: July 30, 2014, 07:57 »
I apologize, I was not looking carefully enough. You are probably right, and the easiest way to verify it is to use the "Close neighbors" tool in the VNL Builder!

So in that case we are back to your first question, which I don't quite understand. Your scripts are very advanced, which makes them difficult to understand for anyone else. For a bug report, you will need to prepare two simple scripts that show the issue. The scripts need to be without "code" - just a simple version without options or loops or variables, and without custom plotting too - because after all there is a small possibility the error is in your code and not in ATK, and it will take us too long to troubleshoot your code.

Maybe, simply, your translation is not correct...

anyuezhiji
« Reply #6 on: July 30, 2014, 14:14 »
Here are simple scripts.

The “Select By Bond Length” tool shows that there are definitely no overlapping atoms.
Graphene1.py → Graphene2.py
Only a coordinate translation was applied: x → x-0.64, y → y+0.38
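
The argument that a rigid translation should not change the physics can be illustrated with a short Python sketch: shifting every atom by the same vector leaves all interatomic distances unchanged. The coordinates below are hypothetical and are not those of Graphene1.py; only the shift (-0.64, +0.38) comes from the post. The helper names `translate` and `pair_distances` are introduced here for illustration.

```python
import math


def translate(positions, dx, dy, dz=0.0):
    """Shift every atom by the same Cartesian vector."""
    return [(x + dx, y + dy, z + dz) for x, y, z in positions]


def pair_distances(positions):
    """Distances for all atom pairs, in a fixed pair order."""
    return [
        math.dist(p, q)
        for i, p in enumerate(positions)
        for q in positions[i + 1:]
    ]


# Hypothetical coordinates, for illustration only.
atoms = [(0.0, 0.0, 0.0), (1.23, 0.7101, 0.0), (2.46, 0.0, 0.0)]
moved = translate(atoms, -0.64, 0.38)

# Largest change in any pair distance; expected to be at rounding level.
max_change = max(
    abs(a - b) for a, b in zip(pair_distances(atoms), pair_distances(moved))
)
print(max_change)
```

This only checks the geometry in free space. In a periodic calculation the translated atoms may also cross a cell boundary, which is exactly where a code has to treat images consistently.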

And for Graphene1.py, if vector_b = [8.544, 14.7986420999, 0.0]*Angstrom is set, Band1.png is obtained.
Only 3 k-points are used in Graphene1.py and Graphene2.py for testing, but they show results consistent with the more detailed results from Band.py and plotBand.py in “script for bug test.7z”.

By the way, every time the model is read from a finished self-consistent .nc file, it shows "Calculating Nonlocal Part and Kinetic Matrix". Isn't that information stored in the .nc files? If it is, recalculating it is a waste of time.
« Last Edit: July 30, 2014, 15:19 by anyuezhiji »

Anders Blom (QuantumATK Staff)
« Reply #7 on: July 30, 2014, 19:15 »
If you also only use 3 k-points for the self-consistent loop, this may be the reason - try 9x9 for the main calculation instead. I don't have time to look at the details but I agree something needs to be checked.

Right, those parts are not stored in the NC file because they are fast to recalculate, even for a large system.

anyuezhiji
« Reply #8 on: July 30, 2014, 20:21 »
Dear Dr. Blom,

The 3 k-points are used only for the Bandstructure() test; the self-consistent loop uses 9x9x1.

Quote
numerical_accuracy_parameters = NumericalAccuracyParameters(
    k_point_sampling=(9, 9, 1),
    )

Quote
bandstructure = Bandstructure(
    configuration=bulk_configuration,
    kpoints=[[0.,0.,0.],[1./3,2./3,0.],[2./3,1./3,0.]],
    bands_above_fermi_level=All
    )

Quote
grep "E =" Graphene1.log
|   0 E = -209.947 dE =  1.000000e+00 dH =  1.262869e-01                       |
|   1 E = -184.423 dE =  2.552412e+01 dH =  1.085114e-01                       |
|   2 E = -191.404 dE =  6.980547e+00 dH =  2.408157e-03                       |
|   3 E = -185.399 dE =  6.004406e+00 dH =  2.498466e-02                       |
|   4 E = -185.413 dE =  1.391652e-02 dH =  8.107498e-05                       |
|   5 E = -185.383 dE =  2.998789e-02 dH =  1.288588e-04                       |
|   6 E = -185.385 dE =  2.034424e-03 dH =  1.304958e-05                       |
|   7 E = -185.385 dE =  2.128451e-05 dH =  1.905362e-06                       |

Quote
grep "E =" Graphene2.log
|   0 E = -209.947 dE =  1.000000e+00 dH =  1.262857e-01                       |
|   1 E = -184.423 dE =  2.552401e+01 dH =  1.085104e-01                       |
|   2 E = -191.403 dE =  6.980123e+00 dH =  2.408149e-03                       |
|   3 E = -185.399 dE =  6.004017e+00 dH =  2.498361e-02                       |
|   4 E = -185.413 dE =  1.390191e-02 dH =  8.113542e-05                       |
|   5 E = -185.383 dE =  3.011765e-02 dH =  1.296259e-04                       |
|   6 E = -185.385 dE =  2.236028e-03 dH =  1.403214e-05                       |
|   7 E = -185.385 dE =  3.237228e-05 dH =  1.753595e-06                       |


In the self-consistent loop, both models end with E = -185.385 and Fermi level = -4.011141 eV (-4.011147 eV), but the total energies given by TotalEnergy(bulk_configuration).evaluate().inUnitsOf(eV) are -1.487783794e+04 and -1.511937382e+04, respectively.

Maybe nothing is wrong in the self-consistent loop, but something goes wrong in the subsequent calculations (e.g. Bandstructure(), TotalEnergy()).
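
The comparison above can be reproduced with a small Python sketch that parses the "E =" lines shown in the grep output and contrasts the converged SCF energies with the reported total energies. The regular expression is an assumption based only on the log lines quoted in this thread, and `parse_scf` is a hypothetical helper, not part of ATK.

```python
import re

# Matches lines such as: |   7 E = -185.385 dE =  2.128451e-05 dH =  1.905362e-06 |
SCF_LINE = re.compile(
    r"\|\s*(\d+)\s+E\s*=\s*(\S+)\s+dE\s*=\s*(\S+)\s+dH\s*=\s*(\S+)"
)


def parse_scf(lines):
    """Return (step, E, dE, dH) tuples for every SCF line found."""
    steps = []
    for line in lines:
        match = SCF_LINE.search(line)
        if match:
            step, energy, d_e, d_h = match.groups()
            steps.append((int(step), float(energy), float(d_e), float(d_h)))
    return steps


# Final SCF lines as quoted from Graphene1.log and Graphene2.log.
log1 = ["|   7 E = -185.385 dE =  2.128451e-05 dH =  1.905362e-06       |"]
log2 = ["|   7 E = -185.385 dE =  3.237228e-05 dH =  1.753595e-06       |"]

final1 = parse_scf(log1)[-1]
final2 = parse_scf(log2)[-1]
same_scf_energy = final1[1] == final2[1]

# Total energies in eV as reported by TotalEnergy(...) in the post.
total_energy_gap = abs(-1.487783794e+04 - (-1.511937382e+04))
print(same_scf_energy, round(total_energy_gap, 5))
```

With the numbers quoted here, the last SCF energies agree while the two total energies differ by roughly 241.5 eV, which is the discrepancy being reported.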


Maybe I'm being obsessive, but I still think those parts should also be stored in the NC file. I dislike repeated calculations; they waste time and memory.


Thank you for your help and hard work during the summer holidays!
 
« Last Edit: July 30, 2014, 20:56 by anyuezhiji »

Anders Blom (QuantumATK Staff)
« Reply #9 on: July 31, 2014, 01:52 »
First of all, thanks a lot for providing simple scripts for debugging. It makes a world of difference for us.

Now, you wouldn't save any memory by storing this information, because you would still have to read it back into memory from the NC file. And doing so takes about as much time as recomputing it. So if anything we do save space doing it the way we do - the NC file becomes smaller.

So, graphene is tricky. I re-ran your Graphene1.py script with 27x27 k-points and I then get a perfect match to the expected result. Probably you can get away with fewer but I wanted to be sure.

anyuezhiji
« Reply #10 on: July 31, 2014, 03:45 »
Dear Dr. Blom,

I tried using 27x27 k-points for the test, but it does not seem to work.
The DFT tests are still running (they are relatively time-consuming); here are the DFTB results:

anyuezhiji
« Reply #11 on: July 31, 2014, 04:09 »

Use GrapheneDFTB.py, but set vector_b = [8.544, 14.799, 0.0]*Angstrom.

anyuezhiji
« Reply #12 on: July 31, 2014, 06:13 »
Here is the DFT result with 27x27 k-points, which is consistent with the 9x9 k-point result.
The total energy is -1.487783852e+04 eV (-1.487783794e+04 eV for 9x9; -1.511937382e+04 eV is the right value).
« Last Edit: July 31, 2014, 06:21 by anyuezhiji »

anyuezhiji
« Reply #13 on: July 31, 2014, 06:25 »
Quote
I re-ran your Graphene1.py script with 27x27 k-points and I then get a perfect match to the expected result.

Dear Dr. Blom,

Could you provide this script as an attachment?
Thanks a lot.

Anders Blom (QuantumATK Staff)
« Reply #14 on: August 2, 2014, 08:11 »
I really just changed 1 line - the k-points, compared to yours.
Attached is also the corresponding band structure plot.
I didn't evaluate the total energy.