Dear Julian,
I am trying to run M3GNet on Windows 11 with a Core i5 CPU (4 cores, 8 logical threads) and an NVIDIA GTX 1650 graphics card (1024 CUDA cores, 8 GB memory, 1560 MHz boost clock).
Running atkpython input.py > output.log from CMD with 'device=cuda', an MD simulation of 600 atoms (3 ps) ran about 2x faster on the GPU: the wall time dropped from 1 hour to 30 minutes.
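For reference, the arithmetic behind these numbers, plus a naive projection to the final system of about 5,000 atoms, is sketched below. The linear-scaling assumption (cost proportional to atom count for the same cutoff and the same 3 ps trajectory) is my own and has not been measured; GPU throughput in particular may scale differently as the system grows.

```python
# Wall times observed for the 600-atom, 3 ps MD run
cpu_minutes = 60.0
gpu_minutes = 30.0
n_atoms_test = 600
n_atoms_target = 5000  # approximate size of the final system

speedup = cpu_minutes / gpu_minutes          # how many times faster the GPU run was
time_saved = 1.0 - gpu_minutes / cpu_minutes  # fraction of wall time saved

# ASSUMPTION: cost grows linearly with the number of atoms (not measured)
scale = n_atoms_target / n_atoms_test
est_cpu_hours = cpu_minutes * scale / 60
est_gpu_hours = gpu_minutes * scale / 60

print(f"speedup: {speedup:.1f}x, wall time reduced by {time_saved:.0%}")
print(f"naive estimate for {n_atoms_target} atoms, 3 ps: "
      f"CPU ~{est_cpu_hours:.1f} h, GPU ~{est_gpu_hours:.1f} h")
```

This prints a 2.0x speedup (50% less wall time) and, under the linear assumption, roughly 8.3 h on the CPU versus 4.2 h on the GPU for 3 ps of the 5,000-atom system.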
However, the output log shows the following before the MD run starts:
| Enabled: False |
| Number of Domains: 1 |
| Decomposition Pattern: 1x1x1 |
| CPU Information: |
| Process ID 0 at ASH (4 threads) |
Is it possible to increase the number of threads beyond 4, or otherwise make better use of the CUDA cores to get a larger speedup? My final system has about 5,000 atoms.
I am also unsure about the thread count: if the calculation runs on the CUDA cores, shouldn't the number of threads equal the number of CUDA cores (1024)? Instead, the log reports only 4 threads, which matches the number of CPU cores. Furthermore, GPU utilization showed 0% while the simulation was running, even though the run was 2x faster.
Thanks