Author Topic: How to set parallel calculation for ATK 2015? (Read 7670 times)

lknife · « **on:** May 5, 2017, 16:33 »

Dear all,

As mentioned in the tutorial “Spin-orbit transport calculations: Bi2Se3 topological insulator thin-film device”, DFT device calculations including spin-orbit coupling can be computationally heavy and may require a lot of memory. MPI is therefore a good solution for these calculations.

I am using ATK2016 on my local computer for my calculations. However, on our university's computer cluster only ATK2015-MPI is available for MPI calculations. In ATK2016, one can specify parallel parameters such as “processes per NEB image”, “processes per individual”, “processes per bias point” and “processes per saddle search” directly in the Python script. In ATK2015, no such parameters are available in the script; MPI calculations can only be set up through the “Job manager”.

The default “calculator” script blocks related to MPI settings in ATK2015 and ATK2016 are as follows:

ATK2015:
    equilibrium_method = GreensFunction(
        processes_per_contour_point=1,
    )
    non_equilibrium_method = GreensFunction(
        processes_per_contour_point=1,
    )

ATK2016:
    equilibrium_method = GreensFunction(
        processes_per_contour_point=1,
    )
    non_equilibrium_method = GreensFunction(
        processes_per_contour_point=1,
    )
    parallel_parameters = ParallelParameters(
        processes_per_bias_point=None,
        processes_per_neb_image=None,
        processes_per_saddle_search=1,
        processes_per_individual=None,
    )

My first question is: how can I set “processes_per_bias_point” in ATK2015? I think it would save a lot of time in I-V calculations when enough processes are available.

For the tutorial mentioned above, the MPI settings involved are the following:
(1)   “4 MPI processes per k-point”
(2)   “evaluation of the 48 contour points for the Green’s function method is done using 3 MPI processes per contour point”
(3)   “if your cores have only little memory, you may need to allocate 2 cores or more per MPI process”

The guide for running MPI calculations on our computer cluster gives the following job script settings:

___________________________________________________________________________________
## N is the number of nodes you want your MPI processes to span
#SBATCH --nodes N
## Set the number of processes per node, this must not exceed the node core count
#SBATCH --ntasks-per-node n
## Request for memory per task, default is 2G so request more if needed. Specify in MB
#SBATCH --mem-per-cpu 4098
____________________________________________________________________________________

My question about (1): can I set “4 MPI processes per k-point” in the Python script using ATK2015 before submitting the .py file to the cluster? If not, how should I set the parameters above for this calculation? Do I just need to set “ntasks-per-node” to 4?
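The bookkeeping behind this question can be sketched in plain Python. This is only an illustration of the arithmetic, not ATK or Slurm API: the function name `mpi_layout` and its arguments are hypothetical, and the sketch assumes that the total number of MPI processes is nodes × ntasks-per-node and that the processes are divided into equal groups, one group per k-point. Whether ATK 2015 groups processes this way is not confirmed anywhere in this thread.

```python
def mpi_layout(nodes, ntasks_per_node, processes_per_kpoint):
    """Return (total MPI processes, k-points that can be treated at once).

    Hypothetical helper: assumes processes are split into equal groups
    of `processes_per_kpoint`, which is an assumption, not ATK behavior.
    """
    if min(nodes, ntasks_per_node, processes_per_kpoint) < 1:
        raise ValueError("all arguments must be positive integers")
    total = nodes * ntasks_per_node  # what "--nodes N" and "--ntasks-per-node n" request
    if total % processes_per_kpoint != 0:
        raise ValueError(
            f"{total} processes cannot be split evenly into groups of "
            f"{processes_per_kpoint}"
        )
    return total, total // processes_per_kpoint


# Example: 2 nodes with 16 tasks each, 4 processes per k-point
total, concurrent_kpoints = mpi_layout(nodes=2, ntasks_per_node=16,
                                       processes_per_kpoint=4)
print(total, concurrent_kpoints)
```

Under these assumptions, setting “ntasks-per-node” to 4 on a single node only fixes the total at 4 processes; it does not by itself say how those processes are shared among k-points.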

I am looking forward to your kind help! Thank you very much!

lknife · « **Reply #1 on:** May 5, 2017, 16:48 »

By the way, in ATK2015 no parameters can be set for "DiagonalizationSolver", whereas in ATK2016 it has two: "bands_above_fermi_level" and "processes_per_kpoint".

Anders Blom · « **Reply #2 on:** May 6, 2017, 19:52 »

Correct. This was introduced in 2016.

lknife · « **Reply #3 on:** May 18, 2017, 17:29 »

Dear Anders Blom / other QuantumWise staff,

Thank you for your reply to my forum question "How to set parallel calculation for ATK 2015?".

These days I am struggling with the tutorial “Spin-orbit transport calculations: Bi2Se3 topological insulator thin-film device”. As the tutorial says, DFT device calculations including spin-orbit coupling can be computationally heavy.

I tried to follow the steps of the tutorial and reproduce its results. However, I am stuck at running the script "electrod1.py". It has been running on my local computer for more than 3 days and is still not finished. Although the tutorial pays a lot of attention to explaining how memory is distributed through MPI, the script can be run on my local computer. I estimated the peak memory requirement through the VNL: it is only about 7G, which my local computer can provide.
Here is a small question, because there are two lines of code that cannot be handled through the VNL:
------------------------------------------------------------
left_electrode.setCalculator(calculator)
MemoryUsage(left_electrode)
------------------------------------------------------------
I am not sure whether the memory estimated through the "Calculator" block in the VNL is the same as that obtained by “MemoryUsage(left_electrode)”. If the peak memory requirement is only about 7G, I think many computers can meet it. Thus, in my opinion, we don’t need to set these parameters as in the tutorial:
--------------------------------------------------------------------------------
4 MPI processes per k-point.
Processes per contour point = 3
-------------------------------------------------------------------------------
Instead, we can just set “1 MPI process per k-point” and “processes per contour point = 1” to get the fastest speed.
However, I am not sure whether this is correct. Could you please help me with that?

Another question:
Since the calculation takes so long on my local computer, I want to run it on our university's computer cluster. Several nodes are available on the cluster: at least 2 nodes with 16 cores per node and 16G memory per core.
I tried several times to run the script on the cluster. However, because of my limited knowledge of MPI calculations in ATK, the calculation ran for more than 5 days. It then exceeded the time limit of the cluster and was killed, so no results were obtained.

I would like to ask for your help in finding an MPI strategy for running it on our cluster so that it finishes within 5 days and I can get the results.

Thank you very much for your time and kind help!

Best regards,
Yours Sincerely
lknife

Anders Blom · « **Reply #4 on:** May 18, 2017, 18:51 »

Q1: The best thing in 2016 and newer versions is to not set anything and just let the software figure out the optimal parameters. It does a very good job for standard problems, as long as you are not limited in memory.

Q2: It is definitely a good idea to run on the cluster. Start by carefully reading http://docs.quantumwise.com/guides/mpi_atk/mpi_atk.html#parallel-execution-of-atk and the section after that, i.e. ensure you limit threading. Test by running a small example in parallel first, before trying the big system, and ensure that performance scales with the number of processes.
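One way to check the scaling mentioned above is to time the same small example with different numbers of MPI processes and compare the results. The following is a minimal sketch in plain Python; the helper name `scaling_report` and the wall times in the example are hypothetical and are not taken from any actual ATK run.

```python
def scaling_report(timings):
    """Compute speedup and parallel efficiency from measured wall times.

    timings: dict mapping number of MPI processes -> wall time in seconds.
    Returns a list of (n_processes, speedup, efficiency) tuples, relative
    to the run with the fewest processes.
    """
    base_n = min(timings)
    base_t = timings[base_n]
    report = []
    for n in sorted(timings):
        speedup = base_t / timings[n]
        # Efficiency of 1.0 means the time dropped in proportion to n.
        efficiency = speedup * base_n / n
        report.append((n, speedup, efficiency))
    return report


# Hypothetical wall times (seconds) for a small test system
example = {1: 400.0, 2: 210.0, 4: 120.0, 8: 90.0}
for n, speedup, efficiency in scaling_report(example):
    print(f"{n:3d} processes: speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```

If the efficiency drops sharply when more processes are added, the extra processes are probably not being used well for that system, and it is worth fixing that before submitting the large calculation.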

lknife · « **Reply #5 on:** May 19, 2017, 19:03 »

Thank you very much for your kind help!
