Show Posts

This section allows you to view all posts made by this member. Note that you can only see posts made in areas you currently have access to.


Messages - filipr

Pages: [1] 2 3 ... 7
1
We have recently been made aware that Nvidia has implemented algorithms that can do linear algebra operations using emulated floating point operations on tensor cores on GPUs that do not necessarily have many native FP64 units such as RTX 6000: https://developer.nvidia.com/blog/unlocking-tensor-core-performance-with-floating-point-emulation-in-cublas/

This requires using the newest CUDA version (and probably also newest drivers). We have not had time to do any experiments with this, but feel free to try yourself. Note that QuantumATK ships with CUDA 12.2, so you will have to modify the launcher script (bin/atkpython is just a Bash script that sets LD_LIBRARY_PATH and other relevant environment variables) so that the program picks up the right CUDA libraries.
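To give a rough idea of the kind of tweak involved (the paths below are examples, not the actual QuantumATK layout), you could wrap the launcher in a small script so that a newer CUDA toolkit's libraries are found before the bundled CUDA 12.2 ones:

```shell
#!/bin/bash
# Hypothetical wrapper around the atkpython launcher.
# Both paths are placeholders - adjust them to your actual installations.
NEW_CUDA_ROOT=/usr/local/cuda
export LD_LIBRARY_PATH="$NEW_CUDA_ROOT/lib64:$LD_LIBRARY_PATH"
exec /path/to/quantumatk/bin/atkpython "$@"
```

Whether the program actually picks up the new libraries depends on how the bundled launcher sets LD_LIBRARY_PATH internally, so you may instead need to edit bin/atkpython itself.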

2
As noted above, the reason for this is that RTX cards do not have many FP64 units per streaming multiprocessor. From this table you can see that the RTX 6000 has a single-precision throughput of 91 TeraFLOPS, but only 1.4 TeraFLOPS in double precision, i.e. a ratio of roughly 1/64, as I also explained earlier.
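The 1/64 figure follows directly from those two throughput numbers:

```python
fp32_tflops = 91.0   # single-precision throughput from the table
fp64_tflops = 1.4    # double-precision throughput from the table

# FP32 throughput divided by FP64 throughput gives the slowdown factor.
slowdown = fp32_tflops / fp64_tflops
print(round(slowdown))  # 65, i.e. roughly the 1/64 ratio
```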

Besides that, GPUs are only good for tasks that can be heavily parallelized, and not all algorithms scale equally well. On top of that, transferring data from host RAM to GPU memory is very slow, especially on consumer cards, so you need to go to quite large systems before the compute speedup compensates for the overhead of copying data to the GPU.

These issues are inherent to scientific computing, not just QuantumATK, and are not really "fixable". If you want a worthwhile GPU speedup you need to get your hands on the GPUs designed for scientific workloads.

3
I am not aware of any benchmarks on Volta series GPUs. We mainly have access to A-series hardware. Depending on the calculations you do, the bottleneck of the program will often be in CUDA-X libraries like cuBLAS and cuSOLVER, so you can search for benchmarks of these.

4
QuantumATK relies on the PETSc library for some sparse matrix operations. It appears that PETSc uses one of a few specialized CUDA functions that are not compatible with newer compute capabilities (your RTX 5080 has compute capability 12.0 = SM120 as per this chart). Basically, your GPU is too new for the PETSc build shipped with QuantumATK. The compatible compute capabilities have to be chosen when compiling the PETSc source code, so they are baked in. So for now you can't use your RTX GPU for calculations that involve PETSc operations.

However, even if it did work it likely wouldn't be faster than running on CPU. The RTX 5080 is a GPU primarily designed for graphics processing, and it only has 384 FP64 floating-point units (2 per streaming multiprocessor), so reportedly you only get 1/64 of the FP32 throughput. Scientific calculations mostly rely on 64-bit floating point numbers for accuracy in the algorithms. For scientific computations one should use GPUs with many FP64 units, as is the case for the A-, H- and B-series data center GPUs.

The RTX 5080 sounds like a blast for gaming, though  8)

5
It's hard to help with this problem as there is not really enough information (which machines, which OS, how SLURM was configured, and how the job was submitted). I suggest that you start by contacting your cluster admin and see if they can look into the issue, as it is most likely not a problem with QuantumATK but rather with how the machines and SLURM were configured. The IT admin will typically be able to log into the specific nodes, see whether the process is running, and check if there is actually any output.

6
Most QuantumATK operations do collective calls and cannot be manually MPI-parallelized by e.g. just splitting a list of k-points among MPI processes.

What you can do instead is first compute the DFT ground state in a separate calculation and store the configuration to an HDF5 file. Then you can make a separate post-processing script that calculates OrbitalMoment() for a single k-point at a time, looping over a subset of all the k-points you are interested in, where the subset is determined from an input argument or environment variable (e.g. a job array index).

Here's an unfinished example to give you the idea:
Code
import sys
import pickle
import numpy

# Which slice of the k-points this job handles, e.g. taken from
# command-line arguments or a job array index.
my_process_index = int(sys.argv[1])
total_num_procs = int(sys.argv[2])

# Fill in the fractional k-point coordinates you want to analyze.
all_kpoints = numpy.array([[...], [...], ..., [...]])

# Round-robin distribution: process i takes k-points i, i + N, i + 2N, ...
my_kpoints = all_kpoints[my_process_index::total_num_procs]

configuration = nlread("dft_ground_state_calc.hdf5", BulkConfiguration)[-1]

orbital_moments = []
for kpoint in my_kpoints:
    orb_moment_analysis = OrbitalMoment(configuration, kpoints=MonkhorstPackGrid(1, 1, 1, k_point_shift=kpoint))
    orbital_moments.append(orb_moment_analysis.atomResolvedOrbitalMoment())

# Save 'orbital_moments' to file, e.g. with pickle:
with open("orbital_moments_%d.pkl" % my_process_index, "wb") as f:
    pickle.dump(orbital_moments, f)

Then you can run this script either in a Bash loop in a single script, or submit it as multiple jobs or as a job array. Afterwards you can collect the result files from each job and gather them into a single array.
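For the gathering step, here is a minimal sketch (the file naming pattern is an assumption; adjust it to however you saved the per-job results):

```python
import glob
import pickle

def gather_results(pattern="orbital_moments_*.pkl"):
    # Collect the per-job pickle files in a deterministic order and
    # concatenate their contents into one flat list.
    results = []
    for path in sorted(glob.glob(pattern)):
        with open(path, "rb") as f:
            results.extend(pickle.load(f))
    return results
```

Sorting the file names keeps the ordering reproducible across runs, which matters if you later want to match each result back to its k-point.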

7
Hi Pshinyeong, QuantumATK ships its own version of Intel MPI. The version shipped with QuantumATK X-2025.06 is Intel MPI 2021.15, which comes with Intel oneAPI 2024.2 (I believe), and is newer than the one you load in your job script. Newer versions of Intel MPI are not necessarily compatible with older versions, so that could be why it fails.

I suggest that you do not use a custom Intel MPI version unless you really, REALLY know what you're doing. So instead of loading oneAPI or OpenMPI modules as you would maybe normally do for other academic software, you simply don't. QuantumATK is a self-contained plug-and-play solution that works as-is without any installed libraries or tools.

So I recommend you remove the module loads and simply have this line to launch your job:

Code
srun /path/to/atkpython "$PYTHON_SCRIPT" > "$LOG_FILE"

If in doubt, use the built-in job manager GUI to set up submission scripts for Slurm.

8
General Questions and Answers / Re: MPI error
« on: May 15, 2025, 14:05 »
This appears to be a problem with either the cluster configuration or Intel MPI. Contact your cluster admin and show them this error, and/or submit a support question on the Intel oneAPI support forum: https://community.intel.com/t5/Intel-MPI-Library/bd-p/oneapi-hpc-toolkit

In both cases it will help them if you set the environment variable I_MPI_DEBUG=5 in your submission script, rerun, and include the debug output from Intel MPI when asking for help elsewhere.
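In a Slurm submission script that could look something like this (the atkpython path, script name and resource numbers are placeholders):

```shell
#!/bin/bash
#SBATCH --ntasks=16

# Make Intel MPI print verbose startup and fabric-selection information.
export I_MPI_DEBUG=5

srun /path/to/atkpython my_script.py > my_log.log 2>&1
```

The debug lines appear at the top of the output, before any QuantumATK output.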

9
Vasilipe, if you search the internet for the error message "cannot execute: required file not found" you'll see that most of the time people report this problem it is due to errors in the shebang of scripts: either the program the shebang points to doesn't exist, or there is a typo or stray whitespace/newline characters. The 'atkpython' launcher script uses the same shebang as the 'quantumatk' launcher script, so if 'atkpython' works it would be weird if 'quantumatk' doesn't. Please check that the shebang (first line) of '/eng/tools/synopsys/atk/W-2024.09-SP1/atkpython/bin/quantumatk' is the same as in '/eng/tools/synopsys/atk/W-2024.09-SP1/atkpython/bin/atkpython'. Also check that you can run the '/eng/tools/synopsys/atk/W-2024.09-SP1/atkpython/bin/python3.11' executable.

If you're still having problems, it could be that the installation was somehow corrupted or that your system is not supported, see https://docs.quantumatk.com/intro/installation/technical_hardware.html#linux for system requirements. I recommend contacting your system administrator as the problem is most likely due to the installation or system configuration and not the software.

10
Is it possible that the installation folder was moved after the program was installed?

Try to open the file '/eng/tools/synopsys/atk/W-2024.09-SP1/atkpython/bin/quantumatk' in a simple text editor (it's a plain text file). The first line is a "shebang" (which tells the OS which program to run the file with) and should be the full path to the actual Python executable: '/eng/tools/synopsys/atk/W-2024.09-SP1/atkpython/bin/python3.11'. If it points to a file that does not exist, it means the installation directory was moved, and the launcher script will not work. You (or your sysadmin) shouldn't move the installation folder, but you can try to fix the shebang to point to a valid path.
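If you prefer to check this programmatically rather than by eye, here is a small sketch (the helper name is mine, not part of QuantumATK):

```python
import os

def shebang_interpreter(script_path):
    # Return the interpreter path from the script's shebang line,
    # or None if the file has no shebang.
    with open(script_path, "r") as f:
        first_line = f.readline().rstrip("\n")
    if not first_line.startswith("#!"):
        return None
    return first_line[2:].strip().split()[0]

# Usage idea (path from the post above):
# interpreter = shebang_interpreter("/eng/tools/synopsys/atk/W-2024.09-SP1/atkpython/bin/quantumatk")
# print(interpreter, "exists:", os.path.exists(interpreter))
```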

11
Can you share what kind of CPU, Operating System and network interface (ethernet/infiniband) these nodes have?

On another note: this level of parallelism is way overkill for this system and will probably make it run slower, not faster. You're trying to use 18 processes to find the eigenvalues of a 1004x1004 matrix, which is honestly not that big and can be done pretty quickly even on a laptop CPU.

I suggest that you run on a single 48-core node with 16 MPI processes and 3 OpenMP threads each. This will ensure that each MPI process gets one k-point and three CPU cores working on the matrices for that k-point.
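In Slurm terms, that layout could be requested like this (a sketch; the atkpython path and script name are placeholders):

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=16
#SBATCH --cpus-per-task=3

# Three OpenMP threads per MPI process: 16 x 3 = 48 cores in total.
export OMP_NUM_THREADS=3

srun /path/to/atkpython my_script.py > my_log.log
```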

12
The function returns the plasma frequency, i.e. the square root of the expression in the documentation.

13
We have not looked into this yet, and it might take quite some time (months+) before it will happen, so if you need an urgent solution I recommend that you look for alternative ways to run QuantumATK. I have filed a ticket in our bug reporting system, so that the issue is not forgotten.

14
You should use the PAW method instead of norm-conserving pseudopotentials. See also the documentation and notes in the manual: https://docs.quantumatk.com/manual/Types/BaderCharges/BaderCharges.html

15
Hmm, it seems to be something other than just missing system libraries, probably some deeper problem related to how these containers work. I don't have a solution for you right now; we will have to try to get our hands on our own Singularity container, and that can take a while.

I suggest that you install QuantumATK on bare metal or in a VM. Do note that QuantumATK ships all needed dependencies and does not require anything installed on the cluster, as long as it meets our stated requirements, most notably a Linux distro with glibc version 2.28 or newer (no RHEL/CentOS 7). For that reason there is little utility in using containers for QuantumATK.
