Recent Posts

Pages: [1] 2 3 ... 10
1
Hi, a few days ago I posted some questions about an MTP error (https://forum.quantumatk.com/index.php?topic=13336.0). Following the reply, I tried training a MACE potential, but it didn't work out because I don't have a GPU. So I returned to MTP training, but an error occurred that I have never encountered before.

This time the job terminates with many repeated warnings about the Study HDF5 file not existing, and then crashes with an HDF5 “truncated file” error leading to MPI_Abort.
UserWarning: The original file of the Study object 'GeTeCN_amor_train_gga.hdf5' no longer exists.
This means no task results will be saved to the new file.

During MTP training update / dataset construction, the run fails while reading an HDF5 file:

  File "zipdir/sergio/HDF5/HDF5.py", line 111, in __init__
  File "/home/synopsys/quantumatk/X-2025.06/atkpython/lib/python3.11/site-packages/h5py/_hl/files.py", line 567, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, fcpl, swmr=swmr)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/synopsys/quantumatk/X-2025.06/atkpython/lib/python3.11/site-packages/h5py/_hl/files.py", line 231, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
OSError: Unable to open file (truncated file: eof = 338986776, sblock->base_addr = 0, stored_eoa = 338986929)
I have attached the rest of the error log below.

So here are my questions:
1. What exactly triggers the repeated warning:
“The original file of the Study object … no longer exists”?
Is this typically caused by launching the run from a temporary working directory (e.g., scratch/zipdir) where the original Study HDF5 is not available?
Is there a recommended way to set an absolute/persistent output path for the Study/Workflow files in Active Learning?

2. Regarding the fatal error:
HDF5 truncated file
Is this usually due to interrupted I/O (walltime kill, quota/full filesystem, network filesystem instability), or can concurrent MPI access to the same HDF5 also corrupt/truncate it?
In Active Learning MTP, which specific HDF5 file is being read at this stage (the Study file, a workflow state file, training dataset file, or something else)? Any tips to identify it deterministically?

3. What is the recommended restart/recovery procedure after an HDF5 truncation?
Should I delete/rename the corrupted HDF5 and restart from the last valid iteration?
Is there an official method to validate/repair the HDF5 (or is rollback the only safe option)?
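As a self-check on question 3, this is a small sketch I could run to see whether a given HDF5 file's recorded end-of-file address matches its actual size (my own code, not an official QuantumATK tool; it only handles superblock versions 0, 2 and 3 with 8-byte offsets):

```python
import os
import struct

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # first 8 bytes of every HDF5 file

def check_hdf5_truncation(path):
    """Return (stored_eoa, actual_size) for an HDF5 file.

    The file is truncated when actual_size < stored_eoa, which is exactly
    the condition behind the "truncated file" OSError above.
    """
    actual_size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(64)
    if head[:8] != HDF5_SIGNATURE:
        raise ValueError("%s is not an HDF5 file" % path)
    version = head[8]
    if version == 0:
        # v0 superblock: with 8-byte offsets the EOF address sits at byte 40
        if head[13] != 8:
            raise ValueError("unsupported offset size")
        stored_eoa = struct.unpack_from("<Q", head, 40)[0]
    elif version in (2, 3):
        # v2/v3 superblock: the EOF address sits at byte 28
        if head[9] != 8:
            raise ValueError("unsupported offset size")
        stored_eoa = struct.unpack_from("<Q", head, 28)[0]
    else:
        raise ValueError("unsupported superblock version %d" % version)
    return stored_eoa, actual_size
```

On my file this should reproduce the mismatch from the log (eof = 338986776 vs. stored_eoa = 338986929), and running it over the work directory would at least tell me deterministically which file is damaged.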

I have also attached the SLURM script and the Python file that I used.

Thank you
2
General Questions and Answers / The GPU is not supported
« Last post by Roc2019 on December 14, 2025, 03:01 »
Dear Sir,

Recently, I tested the GPU performance of the ATK 2025.06 version on my GPU node with Tesla P100-PCIE-16GB cards.

However, it prints the following error:
###
The GPU is not supported. Verify that the GPU has CUDA compute capability 7 or
higher and is in compute mode.

Address this issue or run without CUDA acceleration.
####

So, does this mean the code cannot work with the Tesla P100-PCIE-16GB cards?

In addition, it does work on my other node with an NVIDIA GeForce RTX 4090.
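For reference, this is how I queried the compute capability on both nodes (the compute_cap query field requires a reasonably recent driver); as far as I can find, the Tesla P100 reports 6.0, below the required 7.0, while the RTX 4090 reports 8.9:

```shell
# Print the name and CUDA compute capability of each installed GPU
nvidia-smi --query-gpu=name,compute_cap --format=csv
```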

Thanks so much.

Roc
3
Would an expert be able to answer this question?
4
I have a question about electron-phonon coupling simulation in Quantum ATK.
I was following this guide https://docs.quantumatk.com/tutorials/mobility/mobility.html to set up a model to calculate the e-ph coupling rate of graphene.
I also referred to https://spdocs.synopsys.com/dow_retrieve/qsc-x/seg/quantumatk/X-2025.06/manual/Types/ElectronPhononCoupling/ElectronPhononCoupling.html to understand how the function works.

Here comes the problem: I think the e-ph coupling rate is a function of graphene's Fermi level.
However, I cannot tune it either in the interface window or in the Python code.
For example, in the code below there is no input parameter such as 'fermi_shift':

electron_phonon_coupling = ElectronPhononCoupling(
    configuration=optimized_configuration,
    hamiltonian_derivatives=hamiltonian_derivatives,
    dynamical_matrix=dynamical_matrix,
    kpoints_fractional=kpoints_fractional,
    qpoints_fractional=qpoints_fractional,
    electron_bands=All,
    energy_tolerance=0.01 * eV,
    initial_state_energy_range=[-0.4, 0.4] * eV,
)

I did find a 'fermi_shift' parameter in the definition of inverseLifeTime:

inverse_life_time = electron_phonon_coupling.inverseLifeTime(
    phonon_modes=All,
    electron_bands=All,
    temperature=300*Kelvin,
    fermi_shift=0.0*eV,
    integration_method=GaussianBroadening(),
    refinement=1,
)

But it cannot be plotted properly.
So, could anyone take a look and suggest how to get the e-ph coupling rate as a function of the Fermi shift?
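For context, the kind of scan I am after is just looping the fermi_shift argument of inverseLifeTime over a range of values. The helper below is my own sketch (the epc argument stands for the ElectronPhononCoupling object above, and the keyword names follow the single call I quoted), in case it clarifies what I am trying to do:

```python
def lifetimes_vs_fermi_shift(epc, fermi_shifts, **kwargs):
    """Call epc.inverseLifeTime once per Fermi shift and collect the results.

    epc          : the ElectronPhononCoupling analysis object
    fermi_shifts : iterable of Fermi-shift energies
    kwargs       : extra arguments forwarded to inverseLifeTime
                   (phonon_modes, electron_bands, temperature, ...)
    """
    results = []
    for shift in fermi_shifts:
        results.append((shift, epc.inverseLifeTime(fermi_shift=shift, **kwargs)))
    return results
```

I imagine calling it with something like fermi_shifts=[-0.4, -0.2, 0.0, 0.2, 0.4]*eV and plotting each returned spectrum against the shift, if that is a sensible way to use the API.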
Thanks in advance.
5
General Questions and Answers / Re: Questions Regarding GPU Acceleration
« Last post by filipr on November 28, 2025, 11:18 »
We have recently been made aware that Nvidia has implemented algorithms that can do linear algebra operations using emulated floating point operations on tensor cores on GPUs that do not necessarily have many native FP64 units such as RTX 6000: https://developer.nvidia.com/blog/unlocking-tensor-core-performance-with-floating-point-emulation-in-cublas/

This requires using the newest CUDA version (and probably also newest drivers). We have not had time to do any experiments with this, but feel free to try yourself. Note that QuantumATK ships with CUDA 12.2, so you will have to modify the launcher script (bin/atkpython is just a Bash script that sets LD_LIBRARY_PATH and other relevant environment variables) so that the program picks up the right CUDA libraries.
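As an untested sketch of that environment tweak (the CUDA path and install directory below are examples, not verified settings; point them at wherever your newer toolkit and QuantumATK actually live):

```shell
# Prepend a newer CUDA runtime so it is found before the bundled CUDA 12.2 libs.
# /usr/local/cuda-13.0 and QATK_INSTALL are example paths; adjust to your setup.
export LD_LIBRARY_PATH=/usr/local/cuda-13.0/lib64:$LD_LIBRARY_PATH
"$QATK_INSTALL"/bin/atkpython my_script.py
```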
6
General Questions and Answers / Re: Questions Regarding GPU Acceleration
« Last post by filipr on November 26, 2025, 14:12 »
As noted above, the reason for this is that RTX cards do not have many FP64 units per streaming multiprocessor. From Nvidia's specification table you can see that the RTX 6000 has a single-precision throughput of 91 TeraFLOPS but only 1.4 TeraFLOPS in double precision, i.e. roughly the 1/64 fraction I explained earlier.

Besides that, GPUs are only good for tasks that can be heavily parallelized, and not all algorithms scale equally well. On top of that, transferring data from host RAM to GPU memory is very slow, especially on consumer cards, so one needs quite large systems before the compute speedup compensates for the overhead of copying data to the GPU.

These issues are inherent to scientific computing, not just QuantumATK and are not really "fixable". If you want a worthwhile GPU speedup you need to get your hands on the GPUs designed for scientific workloads.
7
Thank you for the advice. I'll try it.
8
General Questions and Answers / Re: Issue on running MTP training simulation
« Last post by AsifShah on November 26, 2025, 04:36 »
Hi,

Just a small suggestion: instead of going with MTP, I would suggest fine-tuning a MACE model, which is more accurate than MTP. The fine-tuning is also very simple. You can go through this tutorial:

https://docs.quantumatk.com/tutorials/mace-training-c-am-TiSi/mace-training-c-am-TiSi.html
9
General Questions and Answers / Re: Questions Regarding GPU Acceleration
« Last post by AsifShah on November 26, 2025, 04:34 »
On the same note,

I am using an RTX 6000 Ada, but there is no speedup: the CPU runs faster than the GPU.
Any idea why? Will these issues be resolved in the next version of QATK?
10
General Questions and Answers / Re: Questions Regarding GPU Acceleration
« Last post by dmicje12 on November 25, 2025, 08:17 »
Thank you for your reply!