Author Topic: Error after installing ATK-12.2.0  (Read 4003 times)

Offline ziand

  • Heavy QuantumATK user
  • ***
  • Posts: 78
  • Country: de
  • Reputation: 5
    • View Profile
Error after installing ATK-12.2.0
« on: May 29, 2012, 19:11 »
After ATK-12.2.0 was installed on a cluster, the following error appears. The script is a minimal one: it contains just print "Test"
(which is printed; any additional ATK code throws exceptions).

Quote
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "./zipdir/NL/__init__.py", line 4, in <module>
  File "./zipdir/NLEngine.py", line 35, in <module>
  File "./zipdir/NLEngine.py", line 17, in swig_import_helper
ImportError: libpng12.so.0: cannot open shared object file: No such file or directory
Test

Before that, version 11.2.3 worked fine. I'm not aware of any changes on that cluster. ATK was installed by an admin, not by me, and I do not have root access. I read the QuantumWise FAQ, but the "missing library" is not missing on that cluster: I found it in /usr/lib/.

I can, however, start an atkpython shell (tested on the head node). This works, and I can run serious calculations from inside that shell. The error appears when I submit the job via qsub, no matter how many nodes or MPI processes I request.

We are using a PBS job scheduling system here, together with mvapich2.
The command "mpich2version" gives:

Quote
MPICH2 Version:         1.8a1
MPICH2 Release date:    Mon Nov 14 18:25:45 EST 2011
MPICH2 Device:          ch3:mrail
MPICH2 configure:       --prefix=/lustrefs/mpi/gcc-4.6.2/mvapich2-1.8a1p1 --with-rdma=gen2 --with-ib-include=/usr/local/ofed/include --with-ib-libpath=/usr/local/ofed/lib64 --enable-romio --with-file-system=lustre --enable-shared --enable-g=dbg --enable-debuginfo --enable-totalview --without-mpe
MPICH2 CC:      gcc    -g -DNDEBUG -DNVALGRIND -O2
MPICH2 CXX:     c++   -g -DNDEBUG -DNVALGRIND -O2
MPICH2 F77:     gfortran   -g -O2 -L/usr/local/ofed/lib64
MPICH2 FC:      gfortran   -g -O2

Any suggestions?
(I can send you my PBS script if necessary, but as I said, it worked with ATK-11.2.3.)
(And I will inform the cluster admin.)

Offline Anders Blom

  • QuantumATK Staff
  • Supreme QuantumATK Wizard
  • *****
  • Posts: 5423
  • Country: dk
  • Reputation: 89
    • View Profile
    • QuantumATK at Synopsys
Re: Error after installing ATK-12.2.0
« Reply #1 on: May 29, 2012, 19:32 »
How did you verify that libpng12.so.0 exists on the compute nodes? Can you log onto the individual nodes? It may be that it exists only on the head node. If you can't log onto the compute nodes, try submitting a PBS job where you don't actually run ATK, but instead give a command like
Code
locate libpng12.so.0
in the PBS script - this gets executed on the nodes where ATK would run. That is, remove the "mpiexec ..." line and enter the command above instead.
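
A minimal sketch of such a test job, assuming a POSIX shell on the compute nodes. The helper name `find_lib`, the job name, and the list of directories are illustrative assumptions, not part of ATK or PBS. It scans the directories directly, so it also works on nodes where `locate` is not installed.

```shell
#!/bin/sh
#PBS -N libcheck
#PBS -l nodes=1

# find_lib NAME DIR...: print the full path of every copy of NAME that
# sits directly inside one of the given directories (hypothetical helper).
find_lib() {
    lib_name="$1"
    shift
    for lib_dir in "$@"; do
        if [ -e "$lib_dir/$lib_name" ]; then
            echo "$lib_dir/$lib_name"
        fi
    done
}

# Report which node ran the job and where (if anywhere) the library lives.
echo "Node: $(uname -n)"
found=$(find_lib libpng12.so.0 /lib /lib64 /usr/lib /usr/lib64)
echo "${found:-libpng12.so.0 not found in the directories checked}"
```

The job's output file then shows the result as seen from the compute node rather than from the head node.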

Offline ziand

Re: Error after installing ATK-12.2.0
« Reply #2 on: May 29, 2012, 19:39 »
Wow, that was a quick reply  8)

Yes, I can log onto individual nodes. But no, I did not verify that libpng12.so.0 exists on the nodes, and indeed it does not! (I had to look manually. Even "locate" does not exist on the nodes.)

So does that necessarily mean that something changed on that cluster recently? Or did something change within ATK?

Offline Anders Blom

Re: Error after installing ATK-12.2.0
« Reply #3 on: May 29, 2012, 20:29 »
The change is within ATK.

So, you need the admin to install this lib on the nodes. Should be easy enough if it's on the head node and they run the same distro.

Offline ziand

Re: Error after installing ATK-12.2.0
« Reply #4 on: May 30, 2012, 12:10 »
The admin copied the missing lib into the ATK/lib-folder.
This works, but now "libjpeg.so.62" is missing.

Do you have a list of everything that should be copied?
Otherwise this would be tedious work, because I have the feeling that more "picture libraries" and the like could be missing.

The cluster admin wants to keep the compute node software environment as small as possible to save RAM, because the compute nodes run diskless (everything is in RAM).
He said that such libs are almost never needed on compute nodes.

Offline Nordland

  • QuantumATK Staff
  • Supreme QuantumATK Wizard
  • *****
  • Posts: 812
  • Reputation: 18
    • View Profile
Re: Error after installing ATK-12.2.0
« Reply #5 on: May 30, 2012, 14:36 »
Hey ziand.

We don't have the complete list right now, but you can generate it yourself using the following recipe:

Code
ldd path/to/atkpython/lib/python2.7/_NLEngine.so

This resolves all dynamically linked libraries, and anything else that is missing will show up in the list as "not found".

Offline ziand

Re: Error after installing ATK-12.2.0
« Reply #6 on: May 30, 2012, 16:30 »
Thanks, now it works.

There were numerous "missing libs" (see below, everything that starts with "/usr" or "/lib64").

Quote
ldd _NLEngine.so
ldd: warning: you do not have execution permission for `./_NLEngine.so'
        libintlc.so.5 => not found
        libsocorro.so => not found
        libmkl_intel_lp64.so => not found
        libmkl_intel_thread.so => not found
        libmkl_lapack.so => not found
        libmkl_core.so => not found
        libiomp5.so => not found
        libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x0000002a96a52000)
        libRocketFuel2.so => not found
        librt.so.1 => /lib64/tls/librt.so.1 (0x0000002a96b68000)
        libpng12.so.0 => /usr/lib64/libpng12.so.0 (0x0000002a96c82000)
        libz.so.1 => /usr/lib64/libz.so.1 (0x0000002a96da9000)
        libjpeg.so.62 => /usr/lib64/libjpeg.so.62 (0x0000002a96ebd000)
        libGLU.so.1 => /usr/X11R6/lib64/libGLU.so.1 (0x0000002a96fde000)
        libGL.so.1 => /usr/X11R6/lib64/libGL.so.1 (0x0000002a97161000)
        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x0000002a9730c000)
        libm.so.6 => /lib64/tls/libm.so.6 (0x0000002a974fc000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000002a97682000)
        libc.so.6 => /lib64/tls/libc.so.6 (0x0000002a97790000)
        /lib64/ld-linux-x86-64.so.2 (0x000000552aaaa000)
        libXext.so.6 => /usr/X11R6/lib64/libXext.so.6 (0x0000002a979c4000)
        libX11.so.6 => /usr/X11R6/lib64/libX11.so.6 (0x0000002a97ad5000)
        libXxf86vm.so.1 => /usr/X11R6/lib64/libXxf86vm.so.1 (0x0000002a97ccf000)
        libdl.so.2 => /lib64/libdl.so.2 (0x0000002a97dd4000)
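
The entries resolved under /usr or /lib64 can be pulled out of such a listing mechanically. A minimal sketch, assuming the usual `name => path (address)` layout of ldd output; the helper name `system_libs` is illustrative. Which of these paths are actually absent on the compute nodes still has to be checked there.

```shell
# system_libs: read ldd output on stdin and print the resolved paths
# that live under /usr or /lib64 (hypothetical helper).
system_libs() {
    awk '$2 == "=>" && ($3 ~ /^\/usr\// || $3 ~ /^\/lib64\//) { print $3 }'
}

# Example usage (run where the libraries are installed, e.g. the head node):
#   ldd _NLEngine.so | system_libs
```

Lines without a `=>` (such as the dynamic loader itself) and "not found" entries are skipped.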

Offline Nordland

Re: Error after installing ATK-12.2.0
« Reply #7 on: May 30, 2012, 17:02 »
Okay, I forgot one important point about ldd, thanks for pointing that out :) You should run:
Code
LD_LIBRARY_PATH=path/to/atkpython/lib/ ldd path/to/atkpython/lib/python2.7/_NLEngine.so
If you do this, the remaining libraries will also be resolved: the ones reported as "not found" in the list you shared are the ones we ship with ATK.
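
With the ATK lib directory on the search path, anything still unresolved should be a system library that is really missing. A minimal sketch for listing only those entries, assuming ldd's usual `name => not found` wording; the helper name `unresolved_libs` is illustrative, and the paths are the same placeholders as in the command above.

```shell
# unresolved_libs: read ldd output on stdin and print the names of the
# libraries reported as "not found" (hypothetical helper).
unresolved_libs() {
    awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}

# Example usage, run on a compute node:
#   LD_LIBRARY_PATH=path/to/atkpython/lib/ ldd path/to/atkpython/lib/python2.7/_NLEngine.so | unresolved_libs
```

An empty result means every dependency was found on that node.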

Offline Anders Blom

Re: Error after installing ATK-12.2.0
« Reply #8 on: May 30, 2012, 22:48 »
To put it simply: from the list you have, it looks like all system libraries are resolved. The "not found" entries you see are ATK libraries, which we ship, and ATK will find them.