Show Posts

This section allows you to view all posts made by this member. Note that you can only see posts made in areas you currently have access to.


Messages - filipr

Pages: [1] 2 3 ... 7
1
QuantumATK relies on the PETSc library for some sparse matrix operations. It appears that PETSc uses one of a few specialized CUDA functions that are not compatible with newer compute capabilities (your RTX 5080 has compute capability 12.0 = SM120 as per this chart). Basically, your GPU is too new for the PETSc build shipped with QuantumATK. The compatible compute capabilities have to be chosen when compiling the PETSc source code, so they are baked in. So for now you can't use your RTX GPU for calculations that involve PETSc operations.
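If you want to check the compute capability of your own card, recent NVIDIA drivers expose it directly via nvidia-smi (the 'compute_cap' query field may not exist on older driver versions):

```shell
# Print each GPU's name and compute capability, e.g. "NVIDIA GeForce RTX 5080, 12.0"
nvidia-smi --query-gpu=name,compute_cap --format=csv
```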

However, even if it did work, it likely wouldn't be faster than running on the CPU. The RTX 5080 is a GPU primarily designed for graphics processing and only has 384 FP64 floating-point units (2 per streaming multiprocessor), so reportedly you get only 1/64 of the FP32 FLOP throughput. Scientific calculations mostly rely on 64-bit floating-point numbers for accuracy in the algorithms. For scientific computations one should use GPUs with many FP64 units, as is the case for the A-, H- and B-series data center GPUs.

The RTX 5080 sounds like a blast for gaming, though  8)

2
It's hard to help with this problem as there is not really enough information (what machines, which OS, how SLURM was configured, and how the job was submitted). I suggest that you start by contacting your cluster admin and see if they can look into the issue, as it is most likely not a problem with QuantumATK but rather with how the machines and SLURM were configured. The IT admin will typically be able to log into the specific nodes, see whether the process is running, and check if there is actually any output.

3
Most QuantumATK operations do collective calls and cannot be manually MPI-parallelized by e.g. just splitting a list of k-points among MPI processes.

What you can do instead is first do the DFT ground state in a separate calculation and store the configuration to an HDF5 file. Then you can make a separate post-processing script that calculates OrbitalMoment() for a single k-point at a time in a loop over a subset of all the k-points you are interested in, where the subset is determined from an input argument or environment variable (e.g. a job array index).

Here's an unfinished example to give you the idea:
Code
import pickle
import sys

import numpy

# Which slice of the k-points this job handles, e.g. from a job array index:
my_process_index = int(sys.argv[1])
total_num_procs = int(sys.argv[2])

all_kpoints = numpy.array([[...], [...], ..., [...]])  # fill in your k-points

# Round-robin split of the k-points among the jobs:
my_kpoints = all_kpoints[my_process_index::total_num_procs]

configuration = nlread("dft_ground_state_calc.hdf5", BulkConfiguration)[-1]

orbital_moments = []
for kpoint in my_kpoints:
    orb_moment_analysis = OrbitalMoment(configuration, kpoints=MonkhorstPackGrid(1, 1, 1, k_point_shift=kpoint))
    orbital_moments.append(orb_moment_analysis.atomResolvedOrbitalMoment())

# Save 'orbital_moments' to a file unique to this job, e.g. with pickle:
with open("orbital_moments_%d.pkl" % my_process_index, "wb") as f:
    pickle.dump(orbital_moments, f)

Then you can run this script either in a bash loop in a single job script, or submit it as multiple jobs or as a job array. Afterwards you can collect the result files from each job and gather them into a single array.
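The gather step could look something like this sketch (the filename pattern 'orbital_moments_*.pkl' is an assumption; adjust it to however your jobs saved their pickle files):

```python
# Collect the per-job pickle files into one array.
import glob
import pickle

import numpy

gathered = []
# Each job is assumed to have written one pickled list of results:
for path in sorted(glob.glob("orbital_moments_*.pkl")):
    with open(path, "rb") as f:
        gathered.extend(pickle.load(f))

all_orbital_moments = numpy.array(gathered)
print("Gathered results for %d k-points" % len(gathered))
```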

4
Hi Pshinyeong, QuantumATK ships its own version of Intel MPI. The version shipped with QuantumATK X-2025.06 is Intel MPI 2021.15, which comes with Intel oneAPI 2024.2 (I believe), and which is newer than the one you load in your job script. Newer versions of Intel MPI are not necessarily compatible with older versions, so that could be why it fails.

I suggest that you do not use a custom Intel MPI version unless you really, REALLY know what you're doing. So instead of loading oneAPI or OpenMPI, as you would maybe normally do for other academic software, you simply don't. QuantumATK is a self-contained plug-and-play solution that works as-is without any externally installed libraries or tools.

So I recommend you remove the module loads and simply have this line to launch your job:

Code
srun /path/to/atkpython $PYTHON_SCRIPT > $LOG_FILE

If in doubt, use the built-in job manager GUI to set up submission scripts for Slurm.
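For reference, a minimal Slurm submission script might look like the sketch below (job name, task counts, and the path/variable names are placeholders for your setup) - the key point is that there are no 'module load' lines at all:

```shell
#!/bin/bash
#SBATCH --job-name=qatk
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16

# No module loads: QuantumATK ships its own MPI and libraries.
srun /path/to/atkpython $PYTHON_SCRIPT > $LOG_FILE
```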

5
General Questions and Answers / Re: MPI error
« on: May 15, 2025, 14:05 »
This appears to be a problem with either the cluster configuration or Intel MPI. Contact your cluster admin and show them this error, and/or submit a support question on the Intel oneAPI support forum: https://community.intel.com/t5/Intel-MPI-Library/bd-p/oneapi-hpc-toolkit

In both cases it will help them if you set the environment variable I_MPI_DEBUG=5 in your submission script, rerun, and include the debug output from Intel MPI when asking for help.
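Concretely, that means adding one line before the launch command in your submission script:

```shell
# Make Intel MPI print verbose startup/fabric information to the job log:
export I_MPI_DEBUG=5
```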

6
Vasilipe, if you search the internet for the error message "cannot execute: required file not found" you'll see that most of the time this problem is due to errors in the shebang of scripts: either the program the shebang points to doesn't exist, or there is a typo or stray whitespace/newline characters. The 'atkpython' launcher script uses the same shebang as the 'quantumatk' launcher script, so if 'atkpython' works it would be strange for 'quantumatk' to fail. Please check that the shebang (first line) of '/eng/tools/synopsys/atk/W-2024.09-SP1/atkpython/bin/quantumatk' is the same as in '/eng/tools/synopsys/atk/W-2024.09-SP1/atkpython/bin/atkpython'. Also check that you can run the '/eng/tools/synopsys/atk/W-2024.09-SP1/atkpython/bin/python3.11' executable.

If you're still having problems, it could be that the installation was somehow corrupted or that your system is not supported, see https://docs.quantumatk.com/intro/installation/technical_hardware.html#linux for system requirements. I recommend contacting your system administrator as the problem is most likely due to the installation or system configuration and not the software.

7
Is it possible that the installation folder was moved after the program was installed?

Try to open the file '/eng/tools/synopsys/atk/W-2024.09-SP1/atkpython/bin/quantumatk' in a simple text editor (it's a plain text file). The first line has a "shebang" (which tells the OS which program to run the file with) and should be the full path to the actual Python executable: '/eng/tools/synopsys/atk/W-2024.09-SP1/atkpython/bin/python3.11'. If it points to a file that does not exist, it means the installation directory was moved, and the launcher script will not work. You (or your sysadmin) shouldn't move the installation folder, but you can try to fix the shebang to point to a valid path.
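You can also do the check from a terminal; the two shebangs should be identical and the Python binary should exist:

```shell
# Show the shebang (first line) of both launcher scripts:
head -n 1 /eng/tools/synopsys/atk/W-2024.09-SP1/atkpython/bin/quantumatk
head -n 1 /eng/tools/synopsys/atk/W-2024.09-SP1/atkpython/bin/atkpython
# Verify the interpreter the shebang should point to actually exists:
ls -l /eng/tools/synopsys/atk/W-2024.09-SP1/atkpython/bin/python3.11
```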

8
Can you share what kind of CPU, operating system, and network interface (Ethernet/InfiniBand) these nodes have?

On another note: this level of parallelism is way overkill for this system and will probably make it run slower, not faster. You're trying to use 18 processes to find the eigenvalues of a 1004x1004 matrix, which is honestly not that big and can be done quite quickly even on a laptop CPU.

I suggest that you run on a single 48-core node with 16 MPI processes and 3 OpenMP threads. This ensures that each MPI process gets one k-point and three CPU cores work on the matrices for that k-point.
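A sketch of what that launch could look like under Slurm (exact flags depend on how your cluster is configured; the script path is a placeholder):

```shell
# 16 MPI ranks x 3 OpenMP threads = 48 cores on one node:
export OMP_NUM_THREADS=3
srun --nodes=1 --ntasks=16 --cpus-per-task=3 /path/to/atkpython script.py
```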

9
The function returns the plasma frequency, i.e. the square root of the expression in the documentation.
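As an illustration of the relation (not the QuantumATK implementation - see the documentation for the exact expression), here is the textbook free-electron form, where the function-style return value is the square root of the documented expression:

```python
# omega_p = sqrt(n e^2 / (eps0 m_e)): the plasma frequency is the square
# root of the expression, not the expression itself.
import math

n = 1.0e28               # example electron density in m^-3
e = 1.602176634e-19      # elementary charge in C
eps0 = 8.8541878128e-12  # vacuum permittivity in F/m
m_e = 9.1093837015e-31   # electron mass in kg

omega_p_squared = n * e**2 / (eps0 * m_e)  # the documented expression
omega_p = math.sqrt(omega_p_squared)       # what the function returns
```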

10
We have not looked into this yet, and it might take quite some time (months+) before it will happen, so if you need an urgent solution I recommend that you look for alternative ways to run QuantumATK. I have filed a ticket in our bug reporting system, so that the issue is not forgotten.

11
You should use the PAW method instead of norm-conserving pseudopotentials. See also the documentation and notes in the manual: https://docs.quantumatk.com/manual/Types/BaderCharges/BaderCharges.html

12
Hmm, it seems to be something other than just missing system libraries - probably some deeper problem related to how these containers work. I don't have a solution for you right now; we will have to try to get our hands on our own Singularity container, and that can take a while.

I suggest that you install QuantumATK on bare metal or in a VM. Do note that QuantumATK ships all needed dependencies and does not require that the cluster has anything installed as long as it lives up to our stated requirements, most notably a Linux distro that has glibc version 2.28 or newer (no RHEL/CentOS 7). For that reason there is little utility in using containers for QuantumATK.
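You can check the glibc version of a node with one command (the version string is usually printed on the first line):

```shell
# Must report 2.28 or newer for QuantumATK to be supported:
ldd --version | head -n 1
```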

13
Is the file 'libQt5Core.so.5' in the directory '/opt/ohpc/pub/apps/qatk/2024.09/lib'? I.e. does the symlink '/opt/ohpc/pub/apps/qatk/2024.09/lib/libQt5Core.so.5' exist and point to the file '/opt/ohpc/pub/apps/qatk/2024.09/lib/libQt5Core.so.5.15.14'?

If it exists, what is the output of 'ldd /opt/ohpc/pub/apps/qatk/2024.09/lib/libQt5Core.so.5.15.14'?

14
Hmm, try to run

Code
/opt/ohpc/pub/apps/qatk/2024.09/atkpython/bin/python -c "from PyQt5 import QtCore"

15
Ok, as you can see the error comes from within the loading of the 'QtPy' Python package in the file (which you can open yourself to see): "/opt/ohpc/pub/apps/qatk/2024.09/atkpython/lib/python3.11/site-packages/qtpy/__init__.py" on line 259. It appears that QtPy tries to load PySide but fails. QuantumATK uses PyQt5, not PySide. So something causes it to try to load PySide instead. This can either be because an environment variable 'QT_API' is set or because PyQt5 cannot be loaded for some reason (most likely missing system libraries).

So first check if there's an environment variable 'QT_API' set before running 'quantumatk'? If there is, be sure to unset this before running quantumatk: 'unset QT_API'.
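To check and clear it in one go:

```shell
# Show whether QT_API is set (prints '<not set>' if it isn't), then clear it:
echo "QT_API=${QT_API-<not set>}"
unset QT_API
```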

If there's no such variable, try to run '/opt/ohpc/pub/apps/qatk/2024.09/bin/atkpython -c "from PyQt5 import QtCore"' and see what comes out. If it fails but doesn't mention any reason, try to run:

Code
LD_DEBUG=files /opt/ohpc/pub/apps/qatk/2024.09/bin/atkpython -c "from PyQt5 import QtCore" > pyqt5_ld_debug.out 2>&1

and then grep for errors in the generated file 'pyqt5_ld_debug.out'.
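For example (the keyword list is just a suggestion; loader failures usually mention one of these):

```shell
# Pull likely failure lines out of the dynamic-loader trace:
grep -inE "error|cannot|not found" pyqt5_ld_debug.out
```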
