Hi,
I would like to ask about the specific hardware/software requirements for installing QuantumATK-W-2024.09 so that the TorchX potentials can be used.
I tried to use the TorchX interatomic potentials implemented in QuantumATK W-2024.09, but on my laptop QuantumATK crashed with the following message:
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: no kernel image is available for execution on the device
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at /slowfs/qatkdev2/users/qatktest/de02vlbamboo17/exlibs/.conan/data/torch/2.3.0.dev2/quantumatk/qatk2024.09/build/98ec6f6a829f6910030c9440491421bd8b3c4e2e/c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xa9 (0x7f2003d112c9 in /home/karolcia/QuantumATK/QuantumATK-W-2024.09/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xc2 (0x7f2003cc0280 in /home/karolcia/QuantumATK/QuantumATK-W-2024.09/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x3c6 (0x7f2003c3e996 in /home/karolcia/QuantumATK/QuantumATK-W-2024.09/lib/libc10_cuda.so)
frame #3: <unknown function> + 0x17a5b71 (0x7f1fdc7d4b71 in /home/karolcia/QuantumATK/QuantumATK-W-2024.09/lib/libtorch_cuda.so)
frame #4: at::native::copy_device_to_device(at::TensorIterator&, bool, bool) + 0xce5 (0x7f1fdc7f6b55 in /home/karolcia/QuantumATK/QuantumATK-W-2024.09/lib/libtorch_cuda.so)
...
It looks like a software incompatibility, so I tested again with atkpython to get more information:
>: ./atkpython
QuantumATK®
Version: W-2024.09 for Windows and Linux [Build fe68a9810a2]
Copyright © 2004-2024 Synopsys, Inc.
In [1]: import torch
In [2]: torch.cuda.is_available()
Out[2]: True
In [3]: device=torch.device("cuda:0")
In [4]: torch.randn(64,100,device=device)
/home/karolcia/QuantumATK/QuantumATK-W-2024.09/atkpython/lib/python3.11/site-packages/torch/cuda/__init__.py:184: UserWarning:
Found GPU0 Quadro P2000 which is of cuda capability 6.1.
PyTorch no longer supports this GPU because it is too old.
The minimum cuda capability supported by this library is 7.0.
warnings.warn(
/home/karolcia/QuantumATK/QuantumATK-W-2024.09/atkpython/lib/python3.11/site-packages/torch/cuda/__init__.py:209: UserWarning:
Quadro P2000 with CUDA capability sm_61 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_70 sm_80 sm_90.
If you want to use the Quadro P2000 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
warnings.warn(
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-4-ba9783c479eb> in <module>
----> 1 torch.randn(64,100,device=device)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
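The two warnings reduce to a compute-capability mismatch: the Quadro P2000 is sm_61, while the bundled torch build lists only sm_70, sm_80 and sm_90. Below is a minimal Python sketch of that check, assuming the usual CUDA binary-compatibility rules (a cubin built for sm_XY runs on GPUs with the same major version X and a minor version of at least Y; PTX for compute_XY can be JIT-compiled for any GPU of capability X.Y or newer). The helpers parse_arch and build_supports are hypothetical names of my own; only torch.cuda.get_device_capability and torch.cuda.get_arch_list are actual PyTorch calls, and PyTorch's internal check may differ in detail.

```python
import re


def parse_arch(entry):
    """Hypothetical helper: turn 'sm_70' / 'compute_90' into (kind, major, minor).

    Returns None for entries that do not look like a CUDA architecture tag.
    A trailing letter (e.g. 'sm_90a') is accepted and ignored.
    """
    match = re.fullmatch(r"(sm|compute)_(\d+)[a-z]?", entry)
    if match is None:
        return None
    kind, digits = match.group(1), int(match.group(2))
    return kind, digits // 10, digits % 10


def build_supports(capability, arch_list):
    """Hypothetical helper: can any entry in arch_list run on this GPU?

    Assumed rules (standard CUDA compatibility):
    - cubin 'sm_XY' runs on devices with major == X and minor >= Y;
    - PTX 'compute_XY' can be JIT-compiled for devices with capability >= X.Y.
    """
    major, minor = capability
    for entry in arch_list:
        parsed = parse_arch(entry)
        if parsed is None:
            continue
        kind, arch_major, arch_minor = parsed
        if kind == "sm" and arch_major == major and arch_minor <= minor:
            return True
        if kind == "compute" and (arch_major, arch_minor) <= (major, minor):
            return True
    return False


if __name__ == "__main__":
    # Values taken from the warnings above: Quadro P2000 = 6.1,
    # bundled torch supports sm_70, sm_80, sm_90.
    print(build_supports((6, 1), ["sm_70", "sm_80", "sm_90"]))  # False

    # Optionally query the live values from whichever torch is installed.
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        archs = torch.cuda.get_arch_list()
        print(cap, archs, build_supports(cap, archs))
```

Running the live part in both atkpython and the conda environment should show the same GPU capability but different architecture lists.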
I have the following configuration:
OS: Debian bookworm
kernel: 6.1.0-22-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.94-1 (2024-06-21) x86_64 GNU/Linux
GPU: NVIDIA Corporation GP107GLM [Quadro P2000 Mobile] (rev a1)
system GPU driver: nvidia-driver/stable-backports,now 535.183.06-1~bpo12+1 amd64 [installed]
system CUDA toolkit: nvidia-cuda-dev/stable,now 11.8.89~11.8.0-5~deb12u1 amd64 [installed]
I have compiled a local version of torch (2.6.0a0+gitea737e4) in a conda environment, and with it I can run the following commands without issues:
>: python3
Python 3.12.4 | packaged by Anaconda, Inc. | (main, Jun 18 2024, 15:12:24) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>> device=torch.device("cuda:0")
>>> torch.randn(64,100,device=device)
tensor([[-0.2413, -0.2472, 1.6616, ..., 0.9969, -2.3384, -0.2548],
[ 0.2143, 0.3940, -1.8022, ..., 0.4503, 0.1832, 0.1646],
[ 0.0362, 0.2684, -0.5221, ..., 0.2252, 0.5317, 1.2204],
...,
[ 0.9273, 0.5185, -1.5020, ..., 0.3420, -0.7115, 1.6105],
[-1.3727, -1.3622, -0.9894, ..., 0.6896, 0.4132, -0.8442],
[-0.1157, 0.1780, 1.8506, ..., 0.2973, -1.2583, 0.2122]],
device='cuda:0')
>>>
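Since the warnings show atkpython importing torch from its own atkpython/lib/python3.11/site-packages, one way to compare the two environments is to ask each interpreter which torch it would load, without importing it. A minimal sketch using only the standard library; describe_torch is a hypothetical helper, and it only reports what is installed, it does not show how QuantumATK could be pointed at a different build.

```python
import importlib.metadata
import importlib.util
import sys


def describe_torch():
    """Hypothetical helper: report which torch this interpreter would load."""
    spec = importlib.util.find_spec("torch")  # locates the package, no import
    if spec is None:
        return {"python": sys.executable, "torch": None, "version": None}
    try:
        version = importlib.metadata.version("torch")
    except importlib.metadata.PackageNotFoundError:
        version = "unknown"  # e.g. a source build without installed metadata
    return {"python": sys.executable, "torch": spec.origin, "version": version}


if __name__ == "__main__":
    for key, value in describe_torch().items():
        print(f"{key}: {value}")
```

Running it once with ./atkpython and once with the conda python3 shows which torch path and version each interpreter resolves.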
Is it possible to use a custom version of the torch library in QuantumATK/atkpython?
If so, where can I find information on how to set it up?
If not, would it be possible to get a patch for QuantumATK that also supports older CUDA compute capabilities (such as sm_61 of the Quadro P2000)?