Author Topic: M3GNET  (Read 9333 times)

Offline AsifShah

  • QuantumATK Guru
  • ****
  • Posts: 173
  • Country: in
  • Reputation: 2
    • View Profile
Re: M3GNET
« Reply #15 on: February 16, 2024, 10:29 »
Dear Julian,

I was trying to run M3GNet on Windows 11 on a Core i5 (4 cores, 8 logical threads) with an NVIDIA GTX 1650 graphics card (1024 CUDA cores, 8 GB memory, 1560 MHz boost clock).

When running atkpython input.py > output.log in CMD with 'device=cuda', I observed a 2x speedup (run time down from 1 hour to 30 minutes) for an MD simulation of 600 atoms over 3 ps on the GPU.
However, the output file shows the following before the MD run starts:
Enabled: False                                                               |
| Number of Domains: 1                                                         |
| Decomposition Pattern: 1x1x1                                                 |
| CPU Information:                                                             |
|  Process ID 0 at ASH (4 threads)


Is it possible to increase the number of threads beyond 4, or otherwise make proper use of the CUDA cores to get a larger speedup? My final system has 5K atoms.
I am also unsure about one thing: if I am running on CUDA cores, shouldn't the number of threads equal the number of CUDA cores (1024)? Instead, the log file shows only 4 threads, which matches the number of CPU cores. Furthermore, GPU utilization showed 0% while the simulation was running, even though there was a 2x speedup.

Thanks
« Last Edit: February 18, 2024, 04:45 by AsifShah »

Offline AsifShah

Re: M3GNET
« Reply #16 on: February 20, 2024, 05:40 »
Dear Julian,

Can you kindly help in this regard?

Thanks

Offline Julian Schneider

  • QuantumATK Staff
  • QuantumATK Guru
  • *****
  • Posts: 164
  • Country: dk
  • Reputation: 25
    • View Profile
Re: M3GNET
« Reply #17 on: February 28, 2024, 22:26 »
The log shows only the CPU processes and threads, not the CUDA cores.
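Since the log reports only the CPU side, GPU activity has to be checked with a separate tool. Below is a minimal sketch that samples it through the nvidia-smi command-line utility shipped with the NVIDIA driver. The helper names (parse_gpu_line, sample_gpu) are made up for illustration and are not part of QuantumATK, and the sketch assumes nvidia-smi is on the PATH.

```python
import subprocess

# nvidia-smi query for GPU utilization (%) and used memory (MiB),
# printed as one plain CSV line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]

def parse_gpu_line(line):
    """Parse one CSV line such as '37, 1024' into (utilization, memory_mib).

    Hypothetical helper; it assumes both fields are plain integers,
    which may not hold if the driver reports a field as unsupported.
    """
    util, mem = (field.strip() for field in line.split(","))
    return int(util), int(mem)

def sample_gpu():
    """Run nvidia-smi once and return one (utilization, memory_mib) pair per GPU."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return [parse_gpu_line(line) for line in out.splitlines() if line.strip()]
```

Calling sample_gpu() from a second terminal while the MD run is in progress should give a rough picture of whether the GPU is actually busy.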

If you have more CPU cores, you can increase OMP_NUM_THREADS to use more threads. This will normally speed up the CPU part of the simulation, but it does not necessarily speed up the GPU part.
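As a rough sketch of how the thread count could be set, the small launcher below puts OMP_NUM_THREADS into the environment and then starts the same atkpython command used earlier in this thread. The helper names (build_env, run_job) are hypothetical, and whether using all logical threads rather than only the physical cores helps is something to measure, not assume.

```python
import os
import subprocess

def build_env(num_threads=None):
    """Return a copy of the current environment with OMP_NUM_THREADS set.

    If num_threads is None, fall back to the logical CPU count reported
    by the OS (an assumption; fewer threads may turn out to be faster).
    """
    env = dict(os.environ)
    if num_threads is None:
        num_threads = os.cpu_count() or 1
    env["OMP_NUM_THREADS"] = str(num_threads)
    return env

def run_job(script="input.py", log="output.log", num_threads=None):
    """Launch 'atkpython <script>' with stdout redirected to <log>."""
    env = build_env(num_threads)
    with open(log, "w") as fh:
        result = subprocess.run(["atkpython", script], stdout=fh, env=env)
    return result.returncode

# Example (not executed here): run_job("input.py", "output.log", num_threads=8)
```

In a plain CMD session the equivalent is to run set OMP_NUM_THREADS=8 before the atkpython command.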