QuantumATK Forum
QuantumATK => General Questions and Answers => Topic started by: yasheng on September 3, 2016, 21:53
-
Hi All,
I am using ATK 2016. In one of my calculations, the number of symmetry-reduced k-points is 50, and I used 24 CPUs for the calculation.
And I got this warning in my output file:
+------------------------------------------------------------------------------+
| DiagonalizationSolver parallelization report. |
+------------------------------------------------------------------------------+
| Total number of processes: 24 |
| Total number of k-points: 50 |
| Processes per k-point: 1 |
| Number of process groups: 24 |
+------------------------------------------------------------------------------+
+------------------------------------------------------------------------------+
| WARNING: Sub-optimal distribution of processes over k-points. |
| For optimal performance make sure the number of processes is |
| a multiple of the number of k-points times processes_per_kpoint. |
| 22 process(es) have fewer k-points assigned than others |
| and will be partially idle. |
+------------------------------------------------------------------------------+
Does this mean that I am only using 2 CPUs, and the other 22 are idle and wasted?
Should I use 50 CPUs for this calculation, so that there is 1 CPU per k-point? What if I don't have that many CPUs available?
Can you explain in detail how to set this up without wasting time and computational resources?
Thank you,
Yasheng
-
Dear Yasheng,
The report from DiagonalizationSolver indicates that you are running on 24 CPUs with 1 process per k-point, so the 50 k-points are distributed over 24 process groups and each k-point is solved by a single process.
You can change the number of processes assigned to each k-point by adding the flag "processes_per_kpoint=N_processes_per_kpoint" in the DiagonalizationSolver entry in your Python input file. As the warning indicates, for optimal performance the number of CPUs (N_cpus), the number of processes per k-point (N_processes_per_kpoint) and the number of k-points (N_kpoints) should match up, so that the k-points divide evenly among the N_cpus / N_processes_per_kpoint process groups.
Example 1:
N_cpus = 12
N_kpoints = 6
N_processes_per_kpoint = 2
In this case, each of the 6 k-points will be solved by 2 processes, and all k-points will be solved simultaneously.
Example 2:
N_cpus = 12
N_kpoints = 12
N_processes_per_kpoint = 2
In this case, the first half of the k-points (6 k-points) will be solved simultaneously, each by 2 processes, and then the second half (6 k-points) will be solved in the same way.
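The two examples can be checked with a small stand-alone Python sketch. This is not ATK code: the function name kpoint_rounds is hypothetical, and it assumes a simple model in which the processes are split into groups of N_processes_per_kpoint and each group solves one k-point per round.

```python
def kpoint_rounds(n_cpus, n_kpoints, processes_per_kpoint):
    """Return (groups, rounds) for a simple round-based model.

    Hypothetical helper, not part of ATK. Assumes each group of
    `processes_per_kpoint` processes solves one k-point per round.
    """
    if n_cpus % processes_per_kpoint != 0:
        raise ValueError("n_cpus must be divisible by processes_per_kpoint")
    groups = n_cpus // processes_per_kpoint
    # Integer ceiling division: rounds needed to cover all k-points.
    rounds = (n_kpoints + groups - 1) // groups
    return groups, rounds


# Example 1: 6 groups, all 6 k-points solved in a single round.
print(kpoint_rounds(12, 6, 2))
# Example 2: 6 groups, 12 k-points solved in two rounds of 6.
print(kpoint_rounds(12, 12, 2))
```

In both examples the number of k-points is a whole multiple of the number of groups, so no group is left without work in any round.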
Regards,
Daniele.
-
In the 2016 version the "processes per k-point" setting should be automatic, so perhaps you have actively turned it off in your script (or rather, set it to 1)?
-
Just to clarify the meaning of your original message: The program will first solve 24 k-points using one process each. It will then solve another 24 k-points using one process each. Finally, it will solve the remaining two k-points using one process each, and during this time 22 processes will be idle. So 22 processes will be idle for only approximately 1/3 of the time. This is still a sub-optimal distribution, but the program does not leave 22 CPUs idle throughout. That is why the message says "partially idle".
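The round-by-round picture described above can be reproduced with a short stand-alone Python sketch. It is only an illustration under the assumption that every k-point takes about the same time to solve. The names round_breakdown and balanced_process_counts are hypothetical and are not ATK functions.

```python
def round_breakdown(n_processes, n_kpoints, processes_per_kpoint=1):
    """List (k-points solved, idle processes) for each round.

    Hypothetical helper, not part of ATK. Assumes equal cost per k-point
    and one k-point per process group per round.
    """
    groups = n_processes // processes_per_kpoint
    rounds = []
    remaining = n_kpoints
    while remaining > 0:
        busy_groups = min(groups, remaining)
        idle = (groups - busy_groups) * processes_per_kpoint
        rounds.append((busy_groups, idle))
        remaining -= busy_groups
    return rounds


def balanced_process_counts(n_kpoints, max_processes, processes_per_kpoint=1):
    """Process counts up to max_processes for which no group idles in any round."""
    max_groups = max_processes // processes_per_kpoint
    return [g * processes_per_kpoint
            for g in range(1, max_groups + 1)
            if n_kpoints % g == 0]


# The case from the original question: 24 processes, 50 k-points.
print(round_breakdown(24, 50))          # [(24, 0), (24, 0), (2, 22)]
# Process counts (at most 24) that split 50 k-points evenly.
print(balanced_process_counts(50, 24))  # [1, 2, 5, 10]
```

Under this model, 10 processes would keep every process busy but need 5 rounds, while 24 processes finish in 3 rounds with some idle time in the last one. So the balanced choice uses the CPUs more efficiently, but it is not necessarily the faster one in wall-clock time.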
-
That is more clear now. Thank you all.