Recent Posts

21
General Questions and Answers / Re: Questions Regarding GPU Acceleration
« Last post by filipr on November 28, 2025, 11:18 »
We have recently become aware that NVIDIA has implemented algorithms in cuBLAS that perform linear algebra operations by emulating double-precision (FP64) floating point arithmetic on tensor cores. This can help on GPUs that have few native FP64 units, such as the RTX 6000: https://developer.nvidia.com/blog/unlocking-tensor-core-performance-with-floating-point-emulation-in-cublas/

This requires the newest CUDA version (and probably also the newest drivers). We have not had time to experiment with this, but feel free to try it yourself. Note that QuantumATK ships with CUDA 12.2, so you will have to modify the launcher script (bin/atkpython is just a Bash script that sets LD_LIBRARY_PATH and other relevant environment variables) so that the program picks up the newer CUDA libraries.
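What matters when editing the launcher is search order: the dynamic linker uses the first directory on LD_LIBRARY_PATH that contains a matching library, so the newer CUDA library directory must come before the bundled CUDA 12.2 directories. Below is a minimal Python sketch of that ordering rule. The function name and the directory path are hypothetical placeholders, not part of QuantumATK.

```python
import os


def env_with_cuda_libs(cuda_lib_dir, base_env=None):
    """Return a copy of an environment with cuda_lib_dir first on LD_LIBRARY_PATH.

    Hypothetical helper: the dynamic linker searches LD_LIBRARY_PATH entries in
    order, so the first entry wins over later ones (e.g. bundled CUDA 12.2 libs).
    The input mapping is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    # Keep the remaining entries in their original order, dropping empties and duplicates of the new dir.
    rest = [p for p in existing.split(os.pathsep) if p and p != cuda_lib_dir]
    env["LD_LIBRARY_PATH"] = os.pathsep.join([cuda_lib_dir] + rest)
    return env


if __name__ == "__main__":
    # Placeholder path: point this at wherever the newer CUDA toolkit's libraries are installed.
    print(env_with_cuda_libs("/path/to/newer/cuda/lib64")["LD_LIBRARY_PATH"])
```

Because bin/atkpython sets LD_LIBRARY_PATH itself, a value exported before launching it may be overwritten or shadowed. The equivalent prepend therefore belongs inside the script, after its own assignment.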
22
General Questions and Answers / Re: Questions Regarding GPU Acceleration
« Last post by filipr on November 26, 2025, 14:12 »
As noted above, the reason is that RTX cards have few FP64 units per streaming multiprocessor. From this table you can see that the RTX 6000 has a single-precision (FP32) throughput of 91 TeraFLOPS but only 1.4 TeraFLOPS in double precision (FP64), i.e. about 1/64, as I explained earlier.
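The arithmetic behind the 1/64 figure can be checked directly from the two numbers quoted above. The function names in this short sketch are illustrative only. It shows that 91/64 gives the quoted FP64 peak and that, at peak, the same FLOP count takes roughly 65 times longer in FP64 than in FP32.

```python
# Peak throughput for the RTX 6000 as quoted in the post, in TFLOPS.
FP32_TFLOPS = 91.0
FP64_TFLOPS = 1.4


def fp64_peak_from_ratio(fp32_tflops, ratio=64):
    """Expected FP64 peak if FP64 runs at 1/ratio of the FP32 rate."""
    return fp32_tflops / ratio


def slowdown_vs_fp32(fp32_tflops, fp64_tflops):
    """How many times longer the same FLOP count takes in FP64 than in FP32, at peak."""
    return fp32_tflops / fp64_tflops


print(f"91 / 64 = {fp64_peak_from_ratio(FP32_TFLOPS):.2f} TFLOPS")          # 1.42
print(f"FP64 slowdown ~ {slowdown_vs_fp32(FP32_TFLOPS, FP64_TFLOPS):.0f}x")  # 65
```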

Besides that, GPUs only pay off for tasks that can be heavily parallelized, and not all algorithms scale equally well. On top of that, transferring data from host RAM to GPU memory is very slow, especially on consumer cards, so the system has to be quite large before the compute speedup compensates for the overhead of copying data to the GPU.
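The "quite large systems" point can be made concrete with a toy cost model. This sketch assumes a dense n x n double-precision matrix multiply (2n^3 FLOPs), copying both inputs to the GPU and the result back (3n^2 elements), no overlap of transfer and compute, no launch latency, and both devices running at peak. The throughput and bandwidth numbers in the example are hypothetical round figures, not measurements.

```python
import math


def break_even_n(cpu_flops, gpu_flops, transfer_bytes_per_s, bytes_per_elem=8):
    """Smallest matrix size n for which the GPU matrix multiply beats the CPU.

    Toy model (assumption): t_cpu = 2n^3 / cpu_flops and
    t_gpu = 3n^2 * bytes_per_elem / transfer_bytes_per_s + 2n^3 / gpu_flops.
    Returns None if the GPU is never faster under this model.
    """
    per_flop_gain = 1.0 / cpu_flops - 1.0 / gpu_flops
    if per_flop_gain <= 0:
        return None  # GPU is not faster per FLOP, so the transfer cost is never recovered
    # GPU wins when 3*n^2*bytes/BW < 2*n^3*gain, i.e. n > 3*bytes / (2*BW*gain).
    n_star = 3 * bytes_per_elem / (2 * transfer_bytes_per_s * per_flop_gain)
    return math.floor(n_star) + 1


# Hypothetical: 1 TFLOPS CPU, 10 TFLOPS FP64 GPU, 25 GB/s host-to-device link.
print(break_even_n(1e12, 1e13, 25e9))  # 534
print(break_even_n(1e12, 1e12, 25e9))  # None
```

Matrix multiply is the friendliest case, because its work grows as n^3 while the data grows as n^2. For operations whose work grows at the same rate as their data, the ratio in this model does not improve with size, so such operations either always win or never do.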

These issues are inherent to scientific computing on GPUs in general, not specific to QuantumATK, and are not really "fixable". If you want a worthwhile GPU speedup, you need GPUs designed for scientific workloads.
23
Thank you for the advice. I'll try it.
24
General Questions and Answers / Re: Issue on running MTP training simulation
« Last post by AsifShah on November 26, 2025, 04:36 »
Hi,

Just a small suggestion: instead of MTP, I would suggest fine-tuning a MACE model, which is more accurate than MTP. The fine-tuning is also very simple; you can go through this tutorial:

https://docs.quantumatk.com/tutorials/mace-training-c-am-TiSi/mace-training-c-am-TiSi.html
25
General Questions and Answers / Re: Questions Regarding GPU Acceleration
« Last post by AsifShah on November 26, 2025, 04:34 »
On the same note,

I am using an RTX 6000 Ada, but there is no speedup; the CPU runs faster than the GPU.
Any idea why? Will this be resolved in the next version of QATK?
26
General Questions and Answers / Re: Questions Regarding GPU Acceleration
« Last post by dmicje12 on November 25, 2025, 08:17 »
Thank you for your reply!
27
General Questions and Answers / Re: Questions Regarding GPU Acceleration
« Last post by filipr on November 24, 2025, 14:01 »
I am not aware of any benchmarks on Volta-series GPUs such as the V100; we mainly have access to A-series hardware. Depending on the calculations you do, the bottleneck of the program will often be in CUDA-X libraries like cuBLAS and cuSOLVER, so you can search for benchmarks of these.
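If no published numbers fit your hardware, a quick double-precision matrix multiply (DGEMM) timing gives a rough first comparison of a CPU and a GPU on the kind of cuBLAS operation mentioned above. This is a minimal sketch under assumptions: it measures only dense DGEMM, not a QuantumATK calculation, and the function name is hypothetical. On the GPU side it assumes CuPy, whose `@` operator dispatches to cuBLAS.

```python
import time

import numpy as np


def dgemm_gflops(xp=np, n=1024, repeats=5, sync=lambda: None):
    """Time an n x n float64 matrix multiply and return the achieved GFLOP/s.

    xp is an array module with a NumPy-like API (NumPy on CPU, e.g. CuPy on GPU).
    sync must block until queued device work is done, because GPU launches are
    asynchronous and would otherwise be under-timed.
    """
    rng = np.random.default_rng(0)
    a = xp.asarray(rng.standard_normal((n, n)))  # float64 by default
    b = xp.asarray(rng.standard_normal((n, n)))
    a @ b  # warm-up: library initialisation, memory pools
    sync()
    start = time.perf_counter()
    for _ in range(repeats):
        a @ b
    sync()
    elapsed = max(time.perf_counter() - start, 1e-9)
    return 2.0 * n**3 * repeats / elapsed / 1e9  # one n x n GEMM = 2n^3 FLOPs


if __name__ == "__main__":
    print(f"CPU DGEMM: {dgemm_gflops(np, n=1024):.1f} GFLOP/s")
```

With CuPy installed, the GPU side would be something like `dgemm_gflops(cupy, n=4096, sync=cupy.cuda.Device().synchronize)`. Larger n reduces the relative weight of launch and timing overhead.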
28
General Questions and Answers / Re: Questions Regarding GPU Acceleration
« Last post by dmicje12 on November 21, 2025, 03:01 »
Thank you for your response!
I would like to know how your team evaluated the acceleration performance of the V100. Was the improvement significant?
I understand that the V100 is certainly not as powerful as the A100 or H100, but how does the V100 compare to a server-grade CPU (such as the EPYC 7F72 shown in the manual)?
Thanks!
29
Hi all,

Could anyone clarify the physical meaning of electrode_extension_lengths in the device configuration, and how it influences the calculation results? The manual only states that it is “the desired equivalent electrode extension length of each electrode,” which is not very informative.

In a related forum discussion (https://forum.quantumatk.com/index.php?topic=10735.0), it was mentioned that the calculation results depend on this parameter. What is the underlying theory for this dependence?

We already have the parameter equivalent_electrode_lengths, which defines how much of the central region should be equivalent to the electrodes; that one is clear. But how is electrode_extension_lengths fundamentally different?

In older versions of the manual and ATK (e.g., https://docs.quantumatk.com/tutorials/atk_transport_calculations/atk_transport_calculations.html), this parameter did not exist.
30
General Questions and Answers / Re: Questions Regarding GPU Acceleration
« Last post by filipr on November 20, 2025, 18:55 »
QuantumATK relies on the PETSc library for some sparse matrix operations. It appears that PETSc uses one of a few specialized CUDA functions that are not compatible with newer compute capabilities (your RTX 5080 has compute capability 12.0, i.e. SM120, as per this chart). Basically, your GPU is too new for the PETSc library shipped with QuantumATK. The supported compute capabilities have to be chosen when compiling the PETSc source code, so they are baked into the binary. For now, you therefore can't use your RTX GPU for calculations that involve PETSc operations.
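The mismatch described above comes down to whether the GPU's architecture is in the list a library was compiled for. This sketch is deliberately simplified: it treats the build list as exact matches only, while real CUDA compatibility is more nuanced (for example, PTX embedded in a binary can be JIT-compiled for newer GPUs). The function names and the example build list are hypothetical and do not reflect the actual PETSc build in QuantumATK.

```python
def sm_arch(compute_capability):
    """Turn a compute capability string such as '12.0' into an arch tag like 'sm_120'."""
    major, minor = compute_capability.strip().split(".")
    return f"sm_{int(major)}{int(minor)}"


def is_compiled_for(compute_capability, compiled_archs):
    """Simplified check (assumption): exact architecture match against the build list."""
    return sm_arch(compute_capability) in set(compiled_archs)


# Hypothetical build list, for illustration only.
example_build = ["sm_70", "sm_80", "sm_86", "sm_90"]
print(sm_arch("12.0"))                          # sm_120
print(is_compiled_for("12.0", example_build))   # False
print(is_compiled_for("8.0", example_build))    # True
```

On recent NVIDIA drivers, the compute capability of an installed GPU can be queried with `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`.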

However, even if it did work, it likely wouldn't be faster than running on the CPU. The RTX 5080 is a GPU primarily designed for graphics processing, and it has only 384 FP64 units, 2 per streaming multiprocessor, so you reportedly get only 1/64 of the FP32 FLOP throughput in double precision. Scientific calculations mostly rely on 64-bit floating point numbers for accuracy. For scientific computations, one should use GPUs with many FP64 units, as is the case for the A-, H- and B-series data center GPUs.
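A small illustration of why 64-bit numbers matter for accuracy: FP32 carries roughly 7 significant decimal digits and FP64 roughly 16. In this NumPy sketch, a relative change of 1e-8, which is below half of float32's machine epsilon, is rounded away in single precision but preserved in double precision.

```python
import numpy as np

# Machine epsilon: the gap between 1.0 and the next representable number.
eps32 = np.finfo(np.float32).eps  # 2**-23, about 1.2e-07
eps64 = np.finfo(np.float64).eps  # 2**-52, about 2.2e-16

# Adding 1e-8 to 1.0: lost in float32, kept in float64.
lost_in_fp32 = np.float32(1.0) + np.float32(1e-8) == np.float32(1.0)
kept_in_fp64 = np.float64(1.0) + np.float64(1e-8) != np.float64(1.0)

print(eps32, eps64, lost_in_fp32, kept_in_fp64)
```

Errors like this accumulate over the many operations in iterative solvers and self-consistent loops, which is one reason scientific codes default to FP64.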

The RTX 5080 sounds like a blast for gaming, though  8)