Topic: QuantumATK-W-2024.09 CUDA requirements

karolina2
« on: September 23, 2024, 14:52 »
Hi, I would like to ask about the specific hardware/software requirements for installing QuantumATK-W-2024.09 so that the TorchX potentials can be used. I was trying to use the TorchX interatomic potentials implemented in version W-2024.09 on my laptop, but QuantumATK crashed with the following message:
Code
terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: no kernel image is available for execution on the device
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Exception raised from c10_cuda_check_implementation at /slowfs/qatkdev2/users/qatktest/de02vlbamboo17/exlibs/.conan/data/torch/2.3.0.dev2/quantumatk/qatk2024.09/build/98ec6f6a829f6910030c9440491421bd8b3c4e2e/c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xa9 (0x7f2003d112c9 in /home/karolcia/QuantumATK/QuantumATK-W-2024.09/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xc2 (0x7f2003cc0280 in /home/karolcia/QuantumATK/QuantumATK-W-2024.09/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x3c6 (0x7f2003c3e996 in /home/karolcia/QuantumATK/QuantumATK-W-2024.09/lib/libc10_cuda.so)
frame #3: <unknown function> + 0x17a5b71 (0x7f1fdc7d4b71 in /home/karolcia/QuantumATK/QuantumATK-W-2024.09/lib/libtorch_cuda.so)
frame #4: at::native::copy_device_to_device(at::TensorIterator&, bool, bool) + 0xce5 (0x7f1fdc7f6b55 in /home/karolcia/QuantumATK/QuantumATK-W-2024.09/lib/libtorch_cuda.so)
...
It seems that there is some software incompatibility. I tested it again with atkpython to get more information:
Code
>: ./atkpython
                            QuantumATK®                             

    Version: W-2024.09 for Windows and Linux [Build fe68a9810a2]    

                Copyright © 2004-2024 Synopsys, Inc.                

This software and the associated documentation are proprietary to Synopsys,
Inc. This software may only be used in accordance with the terms and
conditions of a written license agreement with Synopsys Inc. All other use,
reproduction, modification, or distribution of this software is strictly
prohibited. Licensed Products communicate with Synopsys servers for the
purpose of providing software updates, detecting software piracy and
verifying that customers are using Licensed products in conformity with the
applicable License Key for such Licensed Products. Synopsys will use
information gathered in connection with this process to deliver software
        updates and pursue software pirates and infringers.         

Inclusivity & Diversity - Visit SolvNetPlus to read the "Synopsys Statement
  on Inclusivity and Diversity" (Refer to the article 000036315 at  
                 https://solvnetplus.synopsys.com)                  

In [1]: import torch

In [2]: torch.cuda.is_available()
Out[2]: True

In [3]: device=torch.device("cuda:0")

In [4]: torch.randn(64,100,device=device)
/home/karolcia/QuantumATK/QuantumATK-W-2024.09/atkpython/lib/python3.11/site-packages/torch/cuda/__init__.py:184: UserWarning: 
    Found GPU0 Quadro P2000 which is of cuda capability 6.1.
    PyTorch no longer supports this GPU because it is too old.
    The minimum cuda capability supported by this library is 7.0.
    
  warnings.warn(
/home/karolcia/QuantumATK/QuantumATK-W-2024.09/atkpython/lib/python3.11/site-packages/torch/cuda/__init__.py:209: UserWarning: 
Quadro P2000 with CUDA capability sm_61 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_70 sm_80 sm_90.
If you want to use the Quadro P2000 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-4-ba9783c479eb> in <module>
----> 1 torch.randn(64,100,device=device)

RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
I have the following configuration:
OS: debian bookworm
kernel: 6.1.0-22-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.94-1 (2024-06-21) x86_64 GNU/Linux
GPU: NVIDIA Corporation GP107GLM [Quadro P2000 Mobile] (rev a1)
system GPU driver: nvidia-driver/stable-backports,now 535.183.06-1~bpo12+1 amd64 [installed]
system CUDA toolkit: nvidia-cuda-dev/stable,now 11.8.89~11.8.0-5~deb12u1 amd64 [installed]

I have compiled a local version of torch (2.6.0a0+gitea737e4) in a conda environment, and there I can run the following commands without issues:
Code
>: python3
Python 3.12.4 | packaged by Anaconda, Inc. | (main, Jun 18 2024, 15:12:24) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>> device=torch.device("cuda:0")
>>> torch.randn(64,100,device=device)
tensor([[-0.2413, -0.2472,  1.6616,  ...,  0.9969, -2.3384, -0.2548],
        [ 0.2143,  0.3940, -1.8022,  ...,  0.4503,  0.1832,  0.1646],
        [ 0.0362,  0.2684, -0.5221,  ...,  0.2252,  0.5317,  1.2204],
        ...,
        [ 0.9273,  0.5185, -1.5020,  ...,  0.3420, -0.7115,  1.6105],
        [-1.3727, -1.3622, -0.9894,  ...,  0.6896,  0.4132, -0.8442],
        [-0.1157,  0.1780,  1.8506,  ...,  0.2973, -1.2583,  0.2122]],
       device='cuda:0')
>>> 
Is it possible to use a custom version of the torch library in QuantumATK/atkpython? If so, where could I find some information on how to set it up? If not, would it be possible to get a patch for QuantumATK that also supports older CUDA capabilities?

Anders Blom, QuantumATK Staff
« Reply #1 on: September 23, 2024, 23:37 »
I don't know what exactly is missing in this processor technically, but the GPU support we have included for torch-based force fields and MTP fitting is only designed for Nvidia A100, H100 and V100. I also think that is only where you would see a serious performance benefit.
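
For reference, V100, A100 and H100 have compute capabilities 7.0, 8.0 and 9.0, which matches the "sm_70 sm_80 sm_90" list in the warning quoted in the first post. The check below is a minimal sketch of how one might compare a GPU's capability against the architectures a Torch build was compiled for. The helper names `parse_sm` and `is_supported` are made up for this illustration, and the rule used (same major version, build minor not newer than the device minor) is a simplification that ignores any PTX ("compute_XX") entries. Only `torch.cuda.get_device_capability` and `torch.cuda.get_arch_list` are real Torch calls.

```python
import re


def parse_sm(arch):
    """Return (major, minor) for an entry such as 'sm_70', or None otherwise."""
    match = re.fullmatch(r"sm_(\d+)[a-z]?", arch)
    if match is None:
        return None  # e.g. 'compute_90' (PTX), not handled in this sketch
    sm = int(match.group(1))
    return sm // 10, sm % 10


def is_supported(capability, arch_list):
    """Hypothetical helper: True if some sm_XY entry has the same major
    version as the device and a minor version not newer than the device's."""
    cap_major, cap_minor = capability
    for arch in arch_list:
        parsed = parse_sm(arch)
        if parsed is None:
            continue
        major, minor = parsed
        if major == cap_major and minor <= cap_minor:
            return True
    return False


if __name__ == "__main__":
    # Values copied from the warning in the first post (Quadro P2000, sm_61).
    print(is_supported((6, 1), ["sm_70", "sm_80", "sm_90"]))  # prints False

    # Optional live check; skipped when torch or a CUDA device is missing.
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        archs = torch.cuda.get_arch_list()
        print(cap, archs, is_supported(cap, archs))
```

Run inside atkpython, the live part should report whether the bundled Torch build has kernels for the installed card before any TorchX calculation is started.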

As for running a separate Torch version, that might be hard. Back when we didn't ship Torch with the software, one could pip install it in a virtual environment, but I am not sure whether you can downgrade that way. For more info on venvs, see https://docs.quantumatk.com/manual/Python.html#customize-the-environment-python-venvs
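
If someone does try a venv, one way to see which copy of a package the interpreter would pick up is sketched below. It uses only the Python standard library; the function name `locate_package` is invented for this example, and whether a venv copy of torch actually takes precedence over the bundled one in atkpython is not confirmed here.

```python
import importlib.metadata
import importlib.util


def locate_package(name):
    """Hypothetical helper: report where `name` would be imported from
    (without importing it) and its installed version, if pip knows one."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # not importable from the current interpreter
    try:
        version = importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        version = None  # e.g. standard-library modules have no distribution
    return {"origin": spec.origin, "version": version}


if __name__ == "__main__":
    # 'torch' is the package of interest in this thread.
    print(locate_package("torch"))
```

If the reported path points into the QuantumATK installation directory rather than into the venv, the bundled Torch is still the one being used.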
« Last Edit: September 23, 2024, 23:48 by Anders Blom »