QuantumATK Forum
QuantumATK => General Questions and Answers => Topic started by: AsifShah on February 2, 2024, 13:51
-
Dear Admin,
Any idea what is causing this error? The QuantumATK version is the latest one, released in December 2023 (V12).
I do not face any issue with the previous version of QATK on the same cluster.
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 181758 RUNNING AT node10
= KILLED BY SIGNAL: 4 (Illegal instruction)
===================================================================================
-
In almost all cases it means the calculation ran out of memory. There are many techniques to reduce memory usage, including different parallelization strategies; see https://docs.quantumatk.com/manual/technicalnotes/advanced_performance/advanced_performance.html
-
I'm not convinced this is a memory issue. More likely an incompatibility between the software and hardware.
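As a quick sanity check on the signal number in the banner, here is a minimal sketch, assuming a Linux node (signal numbering is platform-dependent). Signal 4 is the illegal-instruction signal, whereas the Linux out-of-memory killer terminates processes with signal 9. The helper name `describe` is only illustrative.

```python
import signal


def describe(signum):
    """Return the symbolic name of a signal number on this platform."""
    return signal.Signals(signum).name


# Signal reported in the "BAD TERMINATION" banner.
print(describe(4))
# Signal the Linux OOM killer sends when a process exhausts memory.
print(describe(9))
```

If the job had been killed for running out of memory, one would typically expect the second signal rather than the first.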
Some questions we need answers to in order to investigate this issue:
- What are you trying to run? Please send the script if you can.
- How far does it get? Please send the log output.
On top of that, it would be very valuable if you could rerun the script with the environment variable I_MPI_DEBUG=5 set in your submission script before atkpython is executed. Intel MPI will then print various diagnostics to the output log; please send that to us. Also, please send the output of 'cat /proc/cpuinfo' from the node that runs the script (you can add this command to the top of the submission script), as well as the name and version of the operating system.
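For anyone who wants to inspect the cpuinfo output themselves, the following is a rough sketch of how the CPU feature flags could be extracted from it. The function names and the list of flags in `wanted` are purely illustrative assumptions; this thread does not state which instruction sets QuantumATK actually requires.

```python
import os


def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # All cores normally report the same flags; the first entry suffices.
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, wanted=("avx", "avx2", "avx512f")):
    """List the flags in `wanted` (an assumed, illustrative list) that are absent."""
    flags = cpu_flags(cpuinfo_text)
    return [flag for flag in wanted if flag not in flags]


# Only read the real file where it exists (Linux).
if os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as fh:
        print("missing flags:", missing_flags(fh.read()))
```

A binary built for an instruction set that is missing on the compute node is one common way to end up with an "Illegal instruction" termination, which is why the cpuinfo output is useful here.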
-
Dear Filipr,
Thanks for responding.
GPU Response:
1. When I run on a single GPU core, it runs very well and gives the output nicely. (See the attached Au_MoS2.py file.)
2. When I run on multiple GPU cores, it shows an error. (See the attached file Error.txt.)
3. Also see the attached Au_MoS2.log file.
4. Also see the attached cpuinfoo.txt.
CPU Response:
1. When I run on a single or on multiple CPU cores, the output file shows this:
[0] MPI startup(): Intel(R) MPI Library, Version 2021.8 Build 20221129 (id: 339ec755a1)
[0] MPI startup(): Copyright (C) 2003-2022 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): libfabric version: 1.13.2rc1-impi
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 805 RUNNING AT ssdnode2
= KILLED BY SIGNAL: 4 (Illegal instruction)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 1 PID 806 RUNNING AT ssdnode2
= KILLED BY SIGNAL: 4 (Illegal instruction)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 2 PID 807 RUNNING AT ssdnode2
= KILLED BY SIGNAL: 4 (Illegal instruction)
===================================================================================
-
The M3GNet implementation in QuantumATK currently only supports running on a single GPU with 1 MPI process (the case that works nicely for you).
When running on CPU only, you should set device='cpu'. We found an issue: when running with device='cuda' on a node that does not support CUDA, the automatic fallback to CPU does not work. We will fix that in the upcoming service pack.
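Until the fix is available, the device string can be chosen explicitly in the script instead of relying on the fallback. The following is only a sketch under stated assumptions: the helper `choose_device` is hypothetical, and the presence of the `nvidia-smi` tool on the PATH is used as a rough heuristic for a CUDA-capable node, not as a guarantee that CUDA works.

```python
import shutil


def choose_device(which=shutil.which):
    """Return 'cuda' if the NVIDIA driver tool is found on PATH, else 'cpu'.

    `which` is injectable so the heuristic can be tested without a GPU.
    """
    return "cuda" if which("nvidia-smi") else "cpu"


# Pass this value as the device setting in your own script,
# in place of a hard-coded device='cuda'.
device = choose_device()
print("selected device:", device)
```

On a CPU-only node this selects 'cpu', which avoids the non-working automatic fallback described above.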