Hello, we are trying to use the NVIDIA GPUs (V100, A100) in our cluster to accelerate MTP training. Unfortunately, I am encountering the following error:
CUDA Error: cusolverDnDgesvd failed with status 3
NL.ComputerScienceUtilities.ParallelTools.DynamicTaskScheduler.TaskExecutionError: An exception was raised while executing task "013b829c660c11ef992218c04dbe52d0".
Traceback (most recent call last):
File "zipdir/NL/ComputerScienceUtilities/ParallelTools/DynamicTaskScheduler.py", line 940, in __runNextTaskOnDelegatorProcess
File "zipdir/NL/ComputerScienceUtilities/Workflow/Workflow.py", line 1193, in _runTask
File "zipdir/NL/ComputerScienceUtilities/Workflow/Workflow.py", line 708, in run
File "zipdir/NL/Study/MomentTensorPotential/FitMomentTensorPotential.py", line 599, in _execute
File "build/atkpython/lib/python3.11/site-packages/scaitools/moment_tensor_potentials/training.py", line 909, in fitMTPPotential
File "build/atkpython/lib/python3.11/site-packages/scaitools/moment_tensor_potentials/training.py", line 630, in fit_mtp_potential
File "build/atkpython/lib/python3.11/site-packages/scaitools/moment_tensor_potentials/training.py", line 421, in solve_least_squares_problem
File "build/atkpython/lib/python3.11/site-packages/scaitools/moment_tensor_potentials/linear_learning.py", line 55, in fit
File "build/atkpython/lib/python3.11/site-packages/scaitools/moment_tensor_potentials/mathutil.py", line 145, in svd_least_squares
RuntimeError: CUDA error
The documentation states that it should work with CUDA 11.8; the cluster has CUDA 11.7 and 12.2 installed, and it does not work with either of these versions. The driver version is 555. According to the NVIDIA documentation (https://docs.nvidia.com/cuda/cusolver/index.html#cusolverdn-t-gesvd), the status message is "CUSOLVER_STATUS_ARCH_MISMATCH: The device only supports compute capability 5.0 and above."
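For reference, this is roughly how one can check what the driver actually reports for the cards on a compute node. It is only a minimal sketch using ctypes against the CUDA driver API; the attribute IDs 75/76 are CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR/MINOR from cuda.h:

import ctypes

# Load the CUDA driver library and initialise it.
cuda = ctypes.CDLL("libcuda.so.1")
assert cuda.cuInit(0) == 0, "cuInit failed"

# Driver API version, e.g. 12050 for a 12.5-capable driver.
version = ctypes.c_int()
cuda.cuDriverGetVersion(ctypes.byref(version))
print("CUDA driver API version:", version.value)

# Compute capability of each device; 75/76 are the MAJOR/MINOR attribute IDs.
count = ctypes.c_int()
cuda.cuDeviceGetCount(ctypes.byref(count))
major, minor = ctypes.c_int(), ctypes.c_int()
for dev in range(count.value):
    name = ctypes.create_string_buffer(256)
    cuda.cuDeviceGetName(name, 256, dev)
    cuda.cuDeviceGetAttribute(ctypes.byref(major), 75, dev)
    cuda.cuDeviceGetAttribute(ctypes.byref(minor), 76, dev)
    print("GPU %d: %s, compute capability %d.%d"
          % (dev, name.value.decode(), major.value, minor.value))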
I find the ARCH_MISMATCH message a bit odd: the V100 and A100 have compute capability 7.0 and 8.0, respectively, so both are well above 5.0. Does QuantumATK use cuSOLVER functions that are not available on these GPUs despite their higher compute capability? Or maybe I am misinterpreting the status message.
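For what it's worth, my next step would be to exercise the GPU SVD path outside of QuantumATK with a small standalone script, for example with CuPy (assuming a CuPy build matching the cluster's CUDA toolkit is available; CuPy is not part of QuantumATK, so this is only a sketch):

import cupy as cp

# Tall double-precision matrix (m >= n), similar in shape to an overdetermined
# least-squares system; the legacy cusolverDnDgesvd routine requires m >= n.
a = cp.random.rand(2000, 500, dtype=cp.float64)

# cupy.linalg.svd runs the decomposition on the GPU via cuSOLVER.
u, s, vt = cp.linalg.svd(a, full_matrices=False)
cp.cuda.Stream.null.synchronize()
print("GPU SVD succeeded; largest singular value:", float(s[0]))

If this also fails on the same nodes, the problem is presumably in the CUDA/cuSOLVER installation rather than in QuantumATK itself.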
Thanks a lot!
Best regards,
Nils