Author Topic: Questions Regarding GPU Acceleration (Read 18383 times)

dmicje12 · « **on:** November 19, 2025, 16:07 »

Hello everyone, I would like to use my RTX 5080 for GPU acceleration.
However, after running I get the following message:

**Back Engine Exception : PETSc error 76 of type 0 in MatSeqAIJCUSPARSECopyToGPU:2488: This program was not compiled for SM 120
: cudaErrorInvalidDevice: invalid device ordinal
** Location of Exception : petsc_impl.h:90
May I ask if the RTX 5080 is currently not supported? Or is there any other way to enable GPU acceleration with the RTX 5080?

Thank you, everyone!

filipr · « **Reply #1 on:** November 20, 2025, 18:55 »

QuantumATK relies on the PETSc library for some sparse matrix operations. It appears that PETSc uses one of a few specialized CUDA functions that are not compatible with newer compute capabilities (your RTX 5080 has compute capability 12.0 = SM 120 as per this chart). Basically, your GPU is too new for the PETSc library shipped with QuantumATK. The compatible compute capabilities have to be chosen when compiling the PETSc source code, so they are baked into the binary. So for now you can't use your RTX GPU for calculations that involve PETSc operations.
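To illustrate the mismatch, here is a minimal sketch of how one could compare a GPU's compute capability against the targets a library was compiled for. It assumes that `nvidia-smi` on the machine supports the `compute_cap` query field (recent drivers do). The `BUILT_FOR` set is purely hypothetical and is not the actual build configuration of the PETSc shipped with QuantumATK. The exact-match test is also a simplification of how CUDA binary compatibility works.

```python
import subprocess

# Hypothetical set of SM targets a library was compiled for (illustration only;
# NOT the actual build configuration of the PETSc shipped with QuantumATK).
BUILT_FOR = {"70", "80", "86", "89", "90"}


def sm_from_compute_cap(compute_cap):
    """Convert a compute capability string such as '12.0' to the SM number '120'."""
    major, minor = compute_cap.strip().split(".")
    return f"{int(major)}{int(minor)}"


def is_supported(compute_cap, built_for=BUILT_FOR):
    """Return True if the GPU's SM number is among the compiled targets.

    Simplification: an exact match is required here; real CUDA binaries may
    also run on other architectures depending on how they were built.
    """
    return sm_from_compute_cap(compute_cap) in built_for


def query_compute_caps():
    """Ask nvidia-smi for the compute capability of each visible GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


if __name__ == "__main__":
    for cap in query_compute_caps():
        status = "supported" if is_supported(cap) else "not supported"
        print(f"compute capability {cap} (SM {sm_from_compute_cap(cap)}): {status}")
```

With the hypothetical target set above, a card reporting compute capability 12.0 would be flagged as not supported, which matches the "not compiled for SM 120" message in the original post.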

However, even if it did work, it likely wouldn't be faster than running on the CPU. The RTX 5080 is a GPU primarily designed for graphics processing and it only has 384 FP64 floating point units (2 per streaming multiprocessor), so its FP64 throughput is reportedly only 1/64 of its FP32 throughput. Scientific calculations mostly rely on 64-bit floating point numbers for accuracy in the algorithms. For scientific computations one should use GPUs with many FP64 units, as is the case for the A, H and B series data center GPUs.

The RTX 5080 sounds like a blast for gaming, though.

dmicje12 · « **Reply #2 on:** November 21, 2025, 03:01 »

Thank you for your response!
I would like to know how your team evaluated the acceleration performance of the V100. Was the improvement significant?
I understand that the V100 is certainly not as powerful as the A100 or H100, but how does the V100 compare to a server-grade CPU (such as the EPYC 7F72 shown in the manual)?
Thanks!

filipr · « **Reply #3 on:** November 24, 2025, 14:01 »

I am not aware of any benchmarks on Volta series GPUs. We mainly have access to A-series hardware. Depending on the calculations you do, the bottleneck of the program will often be in CUDA-X libraries like cuBLAS and cuSOLVER, so you can search for benchmarks of these.

dmicje12 · « **Reply #4 on:** November 25, 2025, 08:17 »

Thank you for your reply!

AsifShah · « **Reply #5 on:** November 26, 2025, 04:34 »

On the same note,

I am using an RTX 6000 Ada but there is no speedup. The CPU runs faster than the GPU.
Any idea why? Will these issues be resolved in the next version of QATK?

filipr · « **Reply #6 on:** November 26, 2025, 14:12 »

As noted above, the reason for this is that RTX cards do not have many FP64 units per streaming multiprocessor. From this table you can see that the RTX 6000 has a single precision throughput of 91 TeraFLOPS, but only 1.4 TeraFLOPS for double precision, i.e. a ratio of 1/64, as I also explained earlier.
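The arithmetic behind those two figures can be checked in a couple of lines. This is only a sketch of the 1/64 ratio quoted above; the function name is made up for illustration.

```python
def fp64_from_fp32(fp32_tflops, ratio=1 / 64):
    """Estimate double precision throughput from single precision throughput.

    `ratio` is the FP64:FP32 rate; 1/64 is the value quoted in this thread.
    """
    return fp32_tflops * ratio


if __name__ == "__main__":
    # 91 TeraFLOPS single precision, as quoted for the RTX 6000 above
    print(f"{fp64_from_fp32(91.0):.2f} TFLOPS")  # prints "1.42 TFLOPS"
```

The result, about 1.42 TeraFLOPS, agrees with the 1.4 TeraFLOPS double precision figure.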

Besides that, GPUs are only good for tasks that can be heavily parallelized, and not all algorithms scale equally well. On top of that, transferring data from host RAM to GPU memory is very slow, especially on consumer cards, so one needs to go to quite large systems before the compute speedup compensates for the overhead of copying data to the GPU.
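The trade-off between copy overhead and compute speedup can be sketched with a toy cost model. Everything here is an assumption for illustration: the workload is a dense FP64 matrix product of two n x n matrices (2n^3 floating point operations, three matrices of 8n^2 bytes moved over the bus), and the throughput and bandwidth numbers are hypothetical, not measurements of QuantumATK or of any particular card.

```python
def cpu_time(n, cpu_flops):
    """Seconds for an n x n FP64 matrix product on the CPU (2*n^3 operations)."""
    return 2 * n**3 / cpu_flops


def gpu_time(n, gpu_flops, bandwidth):
    """Seconds on the GPU: copy A and B in and C out, plus the compute time."""
    bytes_moved = 3 * 8 * n * n  # three n x n matrices of 8-byte doubles
    return bytes_moved / bandwidth + 2 * n**3 / gpu_flops


def break_even_n(cpu_flops, gpu_flops, bandwidth):
    """Matrix size above which the GPU wins in this model, or None if never.

    Solves 2n^3/cpu = 24n^2/bandwidth + 2n^3/gpu for n.
    """
    gain_per_flop = 1 / cpu_flops - 1 / gpu_flops
    if gain_per_flop <= 0:
        return None  # GPU compute is not faster, so copying never pays off
    return 12 / (bandwidth * gain_per_flop)


if __name__ == "__main__":
    # Hypothetical numbers: 1 TFLOPS CPU, 1.4 TFLOPS FP64 GPU, 16 GB/s bus
    n = break_even_n(cpu_flops=1.0e12, gpu_flops=1.4e12, bandwidth=16e9)
    print(f"GPU only pays off for n above roughly {n:.0f}")
```

In this model the break-even size grows quickly as the GPU's FP64 throughput approaches the CPU's, and there is no break-even at all once the GPU is slower than the CPU in double precision.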

These issues are inherent to scientific computing, not just QuantumATK, and are not really "fixable". If you want a worthwhile GPU speedup, you need to get your hands on GPUs designed for scientific workloads.

filipr · « **Reply #7 on:** November 28, 2025, 11:18 »

We have recently been made aware that Nvidia has implemented algorithms that can do linear algebra operations using emulated floating point operations on tensor cores on GPUs that do not necessarily have many native FP64 units such as RTX 6000: https://developer.nvidia.com/blog/unlocking-tensor-core-performance-with-floating-point-emulation-in-cublas/

This requires using the newest CUDA version (and probably also newest drivers). We have not had time to do any experiments with this, but feel free to try yourself. Note that QuantumATK ships with CUDA 12.2, so you will have to modify the launcher script (bin/atkpython is just a Bash script that sets LD_LIBRARY_PATH and other relevant environment variables) so that the program picks up the right CUDA libraries.
