Author Topic: Reading HDF5 files obtained from another computer/clusters  (Read 316 times)

Offline ml1019

  • New ATK user
  • *
  • Posts: 3
  • Country: gb
  • Reputation: 0
    • View Profile
Dear all,

I obtained an HDF5 file from a calculation that ran on an HPC cluster and tried to analyse it using the GUI on my laptop. The problem is that the working directory stored in the HDF5 file is a temporary directory on the cluster, which does not exist on my laptop. Please find the error below.

Traceback (most recent call last):
  File "test.py", line 6, in <module>
    configuration = nlread(path, object_id='DeviceConfiguration_0')[0]
  File "zipdir/NL/IO/NLSaveUtilities.py", line 898, in nlread
  File "zipdir/NL/IO/HDF5.py", line 558, in readHDF5
  File "zipdir/NL/IO/HDF5.py", line 639, in readHDF5Group
  File "zipdir/NL/IO/HDF5.py", line 600, in readHDF5Dict
  File "zipdir/NL/IO/HDF5.py", line 660, in readHDF5Group
  File "zipdir/NL/IO/Serializable.py", line 331, in _fromVersionedData
  File "zipdir/NL/CommonConcepts/Calculator.py", line 67, in _createObject
  File "zipdir/NL/QuantumATK/ScopeExecuter.py", line 244, in scope_execute
NL.ComputerScienceUtilities.Exceptions.NLScopeExecutionError: The parameter 'working_directory' must be a string to an existing directory.
The directory /var/tmp/pbs.6846470.pbs was not found.

I would appreciate any suggestions. Thank you.

Offline Anders Blom

  • QuantumATK Staff
  • Supreme ATK Wizard
  • *****
  • Posts: 5221
  • Country: dk
  • Reputation: 87
    • View Profile
    • QuantumATK at Synopsys
Re: Reading HDF5 files obtained from another computer/clusters
« Reply #1 on: February 28, 2023, 07:36 »
The only case I can think of where this might occur is if you requested the self energies to be stored on disk. Can you confirm whether that is the case? If so, it looks like a "logical bug", but still something we might need to look into; if not, please attach your input file (actually, if you can, please do that anyway).

Also, please confirm which version of the code was used.

Offline ml1019
« Reply #2 on: March 3, 2023, 13:45 »
Dear Anders,

Thank you for your reply. 

Yes, exactly. The self energies were stored on disk. The version I used is 2021.06-SP2.

Do you mean that, in order to analyze the HDF5 file, I need to store the self energies in memory instead, which obviously requires a lot more memory than before?

Offline Anders Blom
« Reply #3 on: March 7, 2023, 07:31 »
Well, that *should* not be needed, but because of this bug, currently yes. Alternatively, you could choose not to store the self energies at all (NoStorage), which of course lowers performance.

Could you get access to a newer version and try that? We don't fix bugs in old versions anyway, and it may already have been fixed: in the latest version I was able to save a DeviceConfiguration with a StoreOnDisk that pointed to a non-existent directory and read it back, although it was not a converged calculation.

As a workaround, which I think will work, my first suggestion is to re-save the quantities you want to analyze in the GUI into a separate new HDF5 file, leaving out the geometry. You can always add the geometry later, using your original script. You are not going to perform any post-processing calculations on the laptop anyway!

(Btw, if the laptop also runs Linux, you could simply create the directory /var/tmp/pbs.6846470.pbs on it. The error occurs only because that directory doesn't exist; NanoLab is not trying to read any data from it.)

So, if your laptop runs Windows, try this on the HPC machine:
Code: python
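from NL.IO.NLSaveUtilities import nlinspect

# Run with atkpython; nlread, nlsave and DeviceConfiguration are assumed to be
# available there (add "from QuantumATK import *" if your version requires it).
current_file = "current.hdf5"
new_file = "new.hdf5"

# List the objects stored in the original file.
file_contents = nlinspect(current_file)
for x in file_contents:
    # Skip the configuration: reading it triggers the working_directory error.
    if x.cls == DeviceConfiguration:
        continue
    # Copy every other object (the analysis results) into the new file.
    data = nlread(current_file, object_id=x.object_id)[0]
    nlsave(new_file, data)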
The new file should be possible to open on the laptop, I hope! You could also (still on the HPC machine) make a copy of your original script, and before the calculator is attached, add
Code: python
nlsave("new.hdf5", device_configuration)
and then stop execution of the script when it starts the calculation (or insert a bogus statement like "ffffff" right after this nlsave line). That way, the new.hdf5 file will have the geometry in it, albeit with a different fingerprint than the analysis data. We could fix that with a few more lines of code, if needed; let me know if you have problems in the further analysis.
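For the Linux alternative mentioned in the parenthesis above (creating the missing directory on the laptop), a minimal Python sketch could look like the following. The helper name `ensure_directory` and the constant `MISSING_DIR` are made up for illustration and are not part of QuantumATK; the path is the one from the error message, and only the Python standard library is used.

```python
import os

# Path taken from the error message in this thread; adjust it to whatever
# directory your own traceback reports as missing.
MISSING_DIR = "/var/tmp/pbs.6846470.pbs"


def ensure_directory(path):
    """Create `path` (including parent directories) if it does not exist.

    Hypothetical helper for illustration only. Returns True if the
    directory exists once the call has finished.
    """
    # exist_ok=True makes the call a no-op when the directory is already there.
    os.makedirs(path, exist_ok=True)
    return os.path.isdir(path)
```

Calling `ensure_directory(MISSING_DIR)` once on the laptop, before opening the file, should be enough if the missing directory is really the only problem. The directory is left empty, since according to the explanation above no data is read from it.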

Offline ml1019
« Reply #4 on: March 18, 2023, 02:52 »
Dear Anders, Thank you for your solution! I can analyse the new hdf5 file now.