General Questions and Answers / Regarding questions of MTP error problem
« Last post by Lim changmin on Yesterday at 17:53 »
Hi, a few days ago I posted some questions about an MTP error (https://forum.quantumatk.com/index.php?topic=13336.0). Following the reply, I tried training a MACE potential, but that did not work out because I do not have a GPU. So I returned to MTP training, and now I am hitting an error I have never encountered before.
This time the job terminates with many repeated warnings about the Study HDF5 file not existing, and then crashes with an HDF5 “truncated file” error leading to MPI_Abort.
UserWarning: The original file of the Study object 'GeTeCN_amor_train_gga.hdf5' no longer exists.
This means no task results will be saved to the new file.
During MTP training update / dataset construction, the run fails while reading an HDF5 file:
  File "zipdir/sergio/HDF5/HDF5.py", line 111, in __init__
  File "/home/synopsys/quantumatk/X-2025.06/atkpython/lib/python3.11/site-packages/h5py/_hl/files.py", line 567, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, fcpl, swmr=swmr)
  File "/home/synopsys/quantumatk/X-2025.06/atkpython/lib/python3.11/site-packages/h5py/_hl/files.py", line 231, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
OSError: Unable to open file (truncated file: eof = 338986776, sblock->base_addr = 0, stored_eoa = 338986929)
The rest of the error log is attached as a file below.

Here are my questions:
1. What exactly triggers the repeated warning “The original file of the Study object … no longer exists”?
Is this typically caused by launching the run from a temporary working directory (e.g., scratch/zipdir) where the original Study HDF5 is not available?
Is there a recommended way to set an absolute/persistent output path for the Study/Workflow files in Active Learning?
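For context on question 1, this is the kind of generic staging I could do in the SLURM script to keep the Study HDF5 on persistent storage while running from node-local scratch; the directory names are placeholders of mine, not QuantumATK options, and I do not know whether Active Learning expects something different:

```shell
#!/bin/bash
# Hypothetical staging sketch: keep the Study file on a persistent
# filesystem, copy it to scratch for the run, and copy results back.
PROJECT_DIR=/home/user/mtp_project      # persistent storage (placeholder)
SCRATCH_DIR="$SLURM_TMPDIR/run"         # node-local scratch (placeholder)

mkdir -p "$SCRATCH_DIR"
cp "$PROJECT_DIR/GeTeCN_amor_train_gga.hdf5" "$SCRATCH_DIR/"
cd "$SCRATCH_DIR"

# ... run atkpython on the input script here ...

# copy all HDF5 results back to persistent storage before the job ends
cp -f "$SCRATCH_DIR"/*.hdf5 "$PROJECT_DIR/"
```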
2. Regarding the fatal error (HDF5 “truncated file”):
Is this usually due to interrupted I/O (walltime kill, quota/full filesystem, network filesystem instability), or can concurrent MPI access to the same HDF5 also corrupt/truncate it?
In Active Learning MTP, which specific HDF5 file is being read at this stage (the Study file, a workflow state file, training dataset file, or something else)? Any tips to identify it deterministically?
3. What is the recommended restart/recovery procedure after an HDF5 truncation?
Should I delete/rename the corrupted HDF5 and restart from the last valid iteration?
Is there an official method to validate/repair the HDF5 (or is rollback the only safe option)?
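On question 3: the error message itself quantifies the damage (stored_eoa 338986929 minus eof 338986776 = 153 missing bytes). As a quick first check before a full open, I can at least verify the 8-byte HDF5 superblock signature with plain Python; this is only a minimal sketch of mine, not an official repair or validation tool:

```python
import os

# 8-byte signature that every valid HDF5 file starts with
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"

def quick_hdf5_check(path):
    """Cheap sanity check: does the file start with the HDF5 signature?

    A passing check does NOT prove the file is intact -- a truncated tail
    (eof < stored_eoa, as in the error above) still passes. For a deeper
    check, opening the file with h5py or walking it with the HDF5
    command-line tools (h5ls, h5dump) is needed.
    """
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as f:
        return f.read(8) == HDF5_SIGNATURE
```

If even this signature check fails, or an h5py open raises the same “truncated file” error, I assume renaming the corrupted file aside and restarting from the last valid iteration is the fallback, but I would like confirmation.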
I have also attached the SLURM script and the Python input file that I used.
Thank you