QuantumATK Forum
QuantumATK => General Questions and Answers => Topic started by: jchang on September 12, 2017, 01:56
-
Hi,
I am trying to run a device simulation to obtain the transmission.
It is a very simple structure: just a monolayer 2D material with no gate, only about 7 nm long.
But I get an error message whenever the calculation enters the device SCF loop.
I have tested various numbers of cores and tried different numbers of transverse modes and different Poisson solvers, but all attempts failed.
I also tested it on another cluster, which showed the same error.
I have attached the input file and the log.
Please give me your advice.
Thanks.
jiwon
-
It looks like you are running out of memory. To be sure, I suggest the following:
1. Use 96 cores, so that exactly one contour point is assigned to each.
2. If that does not work, try without the Grimme correction (see the sketch below). If it runs fine without it, you probably need more memory per core to use the Grimme correction.
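As a minimal sketch of point 2 (assuming your script enables the correction via the correction_extension argument of DeviceLCAOCalculator; adapt the other arguments to your own script), testing without Grimme just means dropping that argument:
calculator = DeviceLCAOCalculator(
    basis_set=basis_set,
    exchange_correlation=exchange_correlation,
    numerical_accuracy_parameters=numerical_accuracy_parameters,
    # correction_extension=GrimmeDFTD2(),  # comment out to test without the correction
)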
-
Hi,
The log file does not provide sufficient information to understand what caused the crash, but I suspect it is due to insufficient memory from incorrect load balancing. Since you have 96 contour points but only 70 processes, the first 26 processes carry roughly twice the load, and hence memory, of the others.
Could you try to re-run the simulation using:
equilibrium_method = GreensFunction(
processes_per_contour_point=2,
)
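For context, here is roughly where that snippet plugs in (a sketch based on the 2017 ATK scripting API; the other calculator arguments are placeholders from your own script):
device_algorithm_parameters = DeviceAlgorithmParameters(
    equilibrium_method=GreensFunction(
        processes_per_contour_point=2,
    ),
)
calculator = DeviceLCAOCalculator(
    basis_set=basis_set,
    exchange_correlation=exchange_correlation,
    device_algorithm_parameters=device_algorithm_parameters,
)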
Regards,
Daniele.
-
Thanks for your response.
To follow your suggestion,
equilibrium_method = GreensFunction(
processes_per_contour_point=2,
)
since I have 96 contour points in my case, would I need 192 cores?
My license is restricted to 100 cores. If so, is there any way to reduce the number of contour points?
Here is the additional error message that I got in the terminal (it does not appear in the log file):
[jchang@dirac0 Device5nmSbCh]$ [proxy:0:3@dirac0] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:885): assert (!closed) failed
[proxy:0:3@dirac0] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:3@dirac0] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec@dirac0] HYDT_bscu_wait_for_completion (tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
[mpiexec@dirac0] HYDT_bsci_wait_for_completion (tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec@dirac0] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:218): launcher returned error waiting for completion
[mpiexec@dirac0] main (ui/mpich/mpiexec.c:344): process manager error waiting for completion
Please take a look.
Thanks.
jiwon
-
No, if you go to 96 cores you do not need to also use 2 processes per contour point. You can also try saving the self-energies to disk as described here: http://docs.quantumwise.com/manuals/Types/StoreOnDisk/StoreOnDisk.html
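As a sketch of that approach (following the StoreOnDisk manual page; the energy range and device_configuration are placeholders), you pass a self-energy calculator with the StoreOnDisk storage strategy, so the self-energies are kept on disk instead of in memory:
self_energy_calculator = RecursionSelfEnergy(
    storage_strategy=StoreOnDisk(),
)
transmission_spectrum = TransmissionSpectrum(
    configuration=device_configuration,
    energies=numpy.linspace(-2.0, 2.0, 101)*eV,
    self_energy_calculator=self_energy_calculator,
)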
-
Hi,
I tried your suggestions, but they all failed.
I have attached my input file and log file again.
I thought a 5 nm monolayer should not have any memory problem, but it seems that is not the case.
Can you check whether it is really a memory problem, and give me any other advice?
Thanks.
jiwon
-
From the error messages, it certainly looks like a memory problem. Try searching for "std::bad_alloc" on the internet.
I will have a look at it, but first please confirm whether or not you have tested without the Grimme correction.
-
Thanks Ulrik.
I have tested it without the Grimme correction, but it didn't make a difference.
jiwon
-
Okay, thanks for clarifying - I am looking into it, and will get back to you.
-
Thanks!!
-
I can confirm that the calculation is running out of memory. The main cause is the high mesh cutoff, which is the default because of the semi-core states of the SG15 pseudopotential for Sb. I have the following suggestions, depending on your specific needs (the script-level options are illustrated in the sketch after the list):
- You can try to reduce the mesh cutoff. The default is chosen to give very accurate energies and forces, so if you do not need those, a lower mesh cutoff may still be fine for your purposes.
- Another option is to use a different pseudopotential. Specifically, the FHI pseudopotentials do not contain semi-core states and consequently have much lower default mesh cutoffs. You will sacrifice some accuracy, but depending on your needs, that may not matter.
- LDA uses less memory than GGA, so you can also consider that.
If none of these options work, you need more memory per process, either by adding memory to the nodes or by running fewer MPI processes. If you use the MPI version that we ship with ATK 2017, you can combine MPI processes with threading, so that fewer processes still keep all cores busy.
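As a rough sketch of the script-level options above (names follow the ATK 2017 scripting API; the values are illustrative, not tuned recommendations):
# Option 1: lower the density mesh cutoff.
numerical_accuracy_parameters = NumericalAccuracyParameters(
    density_mesh_cutoff=75.0*Hartree,
)
# Options 2 and 3: FHI pseudopotentials (no semi-core states) with LDA.
calculator = DeviceLCAOCalculator(
    basis_set=LDABasis.DoubleZetaPolarized,
    exchange_correlation=LDA.PZ,
    numerical_accuracy_parameters=numerical_accuracy_parameters,
)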
-
I see. I will try your suggestions.
Appreciate it.
jiwon