Author Topic: How to find checkpoint file and restart calculation in linux redhat system  (Read 5604 times)

Offline Sukhbir

  • Regular QuantumATK user
  • **
  • Posts: 12
  • Country: in
  • Reputation: 0
    • View Profile
Hello,
I was running a calculation on a graphene FET as a parallel job with mpiexec, but the calculation stopped because of a power cut, and I am unable to find the checkpoint file to restart it. I am currently using ATK-VNL version 12.8.2 on a Linux operating system (Red Hat).
So, can anyone tell me:
1. How to find the appropriate checkpoint file
2. How to restart the calculation (a video would help, if anyone can provide one)
3. Whether a restart can provide all the analysis results
 

Offline zh

  • Supreme QuantumATK Wizard
  • *****
  • Posts: 1141
  • Reputation: 24
    • View Profile
Please use a newer version; support for this very old version is limited.
1. It may be stored in '/tmp'. The storage information (name and path of the checkpoint file) may be written to the log file of your job.
2. See: https://www.quantumwise.com/publications/tutorials/item/502-restarting-stopped-calculations
3. It depends on how much information has been stored in the checkpoint file.
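
When '/tmp' holds many files, sorting them by modification time can narrow the search to those written around the time the job stopped. The following is only a minimal sketch in plain Python: the helper name `list_recent_files` and the `*` pattern are illustrative assumptions, since the naming scheme of the checkpoint files is not stated in this thread.

```python
import glob
import os
import time


def list_recent_files(directory, pattern="*", limit=10):
    """Return up to `limit` regular files in `directory` matching `pattern`, newest first."""
    paths = [
        path
        for path in glob.glob(os.path.join(directory, pattern))
        if os.path.isfile(path)
    ]
    # Sort by last-modification time so the most recently written files come first.
    paths.sort(key=os.path.getmtime, reverse=True)
    return paths[:limit]


if __name__ == "__main__":
    # '/tmp' is the location suggested above; the "*" pattern is a placeholder,
    # because the actual checkpoint file names are not known here.
    for path in list_recent_files("/tmp"):
        stamp = time.strftime("%Y-%m-%d %H:%M", time.localtime(os.path.getmtime(path)))
        print(stamp, os.path.getsize(path), path)
```

Comparing the printed timestamps with the time of the power cut should point to the files last written by the interrupted job.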

Offline Sukhbir

Thanks for your reply,
But I am confused, because the tmp folder contains many checkpoint files. I have opened them all in an editor, but none of them contains the correct input information. Is there any other folder where the checkpoint file is stored by default, and how should I analyse it? Secondly, I am unable to find the log file.
Please guide me.

Offline Jess Wellendorff

  • QuantumATK Staff
  • Supreme QuantumATK Wizard
  • *****
  • Posts: 933
  • Country: dk
  • Reputation: 29
    • View Profile
If you cannot find the log file of the job (example: mpiexec -n 4 myscript.py > mylog.log), it is going to be hard for you to restart this job. Why not simply redo the calculation from scratch? That is a pretty common consequence of power failures on a supercomputer.
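
If the output was redirected to a log file as in the example above, a short script can pull out any lines that mention a checkpoint. This is a sketch only: the helper `find_mentions` is hypothetical, and searching for the word "checkpoint" is an assumption, since the exact wording ATK uses in its log is not shown in this thread.

```python
def find_mentions(log_path, keyword="checkpoint"):
    """Return (line_number, text) pairs for log lines containing `keyword` (case-insensitive)."""
    hits = []
    needle = keyword.lower()
    # errors="replace" keeps the scan going even if the log contains odd bytes.
    with open(log_path, "r", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if needle in line.lower():
                hits.append((number, line.rstrip("\n")))
    return hits
```

For example, `find_mentions("mylog.log")` returns the matching lines together with their line numbers, or an empty list if the keyword never appears.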

Offline Sukhbir

Thanks for the reply,

I have found the checkpoint file in the tmp folder. Now I would like to know whether I should keep the name of the output file (analysis.nc) the same as in the previous script, because half of the calculation is already complete. Will the calculation start from where it stopped, and can I expect all the results?
   

Offline zh

It is better to keep it.

The restarted calculation does not continue from exactly the point where the last calculation stopped, because the stored information is written only after specific steps or points. For example, during the self-consistent field (SCF) calculation of a bulk configuration, the charge density may be written to the checkpoint file at every SCF step. If the calculation stops during the i-th step, the restarted job continues from the saved charge density of the (i-1)-th step.
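
The behaviour described above can be illustrated with a toy loop in plain Python. This is not ATK code and does not use the ATK checkpoint format: `run_scf`, the JSON file, and the "density" update are all made-up stand-ins, chosen only to show that an interruption during step i leaves the state of step i-1 on disk.

```python
import json
import os
import tempfile


def run_scf(checkpoint_path, n_steps, fail_during=None):
    """Toy iteration: resume from the checkpoint if present, save after each completed step."""
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as handle:
            state = json.load(handle)
    else:
        state = {"step": 0, "density": 1.0}

    for step in range(state["step"] + 1, n_steps + 1):
        if step == fail_during:
            # Simulated power cut: this step never completes, so nothing is saved for it.
            raise RuntimeError("interrupted during step %d" % step)
        # Stand-in for one SCF update of the charge density.
        state = {"step": step, "density": 0.5 * state["density"]}
        with open(checkpoint_path, "w") as handle:
            json.dump(state, handle)
    return state


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "toy_checkpoint.json")
    try:
        run_scf(path, n_steps=5, fail_during=4)
    except RuntimeError as error:
        print(error)
    # The second call reads the state saved after step 3 and redoes step 4.
    print(run_scf(path, n_steps=5))
```

The second call does not repeat steps 1 to 3, but it does have to redo the step that was in progress when the run stopped.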

Offline Sukhbir

Thanks for your kind reply,

My calculation has completed, but it shows an error that I am unable to understand. I am attaching a screenshot of it.
Please guide me as to where I am making a mistake.

Offline Jess Wellendorff

As the error message clearly says, your script has called the function "nlsave" with too few arguments. If you take a look at the ATK reference manual ( http://www.quantumwise.com/documents/manuals/latest/ReferenceManual/index.html/ref.nlsave.html ) you will see that the correct syntax is:
Code
nlsave('file.nc', configuration)
In your script, you have only specified the NetCDF filename, but not the configuration that should be saved. Fix this, and it will work.

Offline Sukhbir

Thanks a lot for your kind reply.