Topic: Problem running atk 11.2.b2 in parallel in the background

Derek Stewart
Hi everyone,

I have been testing the new version of ATK with some parallel runs, and I have run into a problem when I try to run even the simple mpi_test in the background with mpiexec. Everything works properly as long as the job runs in the foreground, whether the output prints to the screen or is redirected to a file. I am using MPICH2 version 1.3.2, and the calculations run on a Red Hat Enterprise Linux 5 machine with Xeon processors.

For example, these commands work fine:

 mpiexec  -n 2 -hosts d1,d2 /opt/QuantumWise/atk-11.2.b2/atkpython/bin/atkpython /home/derek/atk_mpi_test/test_mpi.py

 mpiexec  -n 2 -hosts d1,d2 /opt/QuantumWise/atk-11.2.b2/atkpython/bin/atkpython /home/derek/atk_mpi_test/atk_mpi_test > out.run

However, when I try to run it in the background by adding & at the end:

 mpiexec -n 2 -hosts d1,d2 /opt/QuantumWise/atk-11.2.b2/atkpython/bin/atkpython /home/derek/atk_mpi_test/test_mpi.py &

I get the following error:
[mpiexec@d1.cnf.cornell.edu] HYDU_sock_read (./utils/sock/sock.c:222): read errno (Input/output error)
[mpiexec@d1.cnf.cornell.edu] control_cb (./pm/pmiserv/pmiserv_cb.c:249): assert (!closed) failed
[mpiexec@d1.cnf.cornell.edu] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@d1.cnf.cornell.edu] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:206): error waiting for event
[mpiexec@d1.cnf.cornell.edu] main (./ui/mpich/mpiexec.c:404): process manager error waiting for completion


After searching through some discussion groups on mpiexec with the Hydra process manager, I found the following workaround for running things in the background: redirect standard input from /dev/null.

 mpiexec -n 2 -hosts d1,d2 /opt/QuantumWise/atk-11.2.b2/atkpython/bin/atkpython /home/derek/atk_mpi_test/test_mpi.py < /dev/null &
 
With this redirection, you can also put nohup at the beginning of the command.
   
The following link discusses this issue in more detail:
http://lists.mcs.anl.gov/pipermail/mpich-discuss/2010-October/008239.html

Best Regards,

Derek

Derek Stewart
Re: Problem running atk 11.2.b2 in parallel in the background
« Reply #1 on: February 14, 2011, 16:36 »
Hi everyone,

I just noticed that the final command line I listed had a typo. It should also have included the redirect to the output file, as follows:


mpiexec -n 2 -hosts d1,d2 /opt/QuantumWise/atk-11.2.b2/atkpython/bin/atkpython /home/derek/atk_mpi_test/test_mpi.py > out.run < /dev/null &
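
For anyone who wants to reuse this, here is a minimal sketch of a small shell helper that wraps the same idea: nohup in front, output to a file, and standard input taken from /dev/null. The function name run_detached, the variable detached_pid, and the 2>&1 (sending error messages to the same log) are my own additions for illustration and are not part of ATK or MPICH2. My understanding, which I have not verified in the Hydra source, is that mpiexec tries to read the terminal while it sits in the background, and that is what produces the "Input/output error" above. The demo at the end uses a stand-in command so it runs without any MPI installation.

```shell
#!/bin/sh
# Hypothetical helper (not part of ATK or MPICH2).
# Usage: run_detached LOGFILE COMMAND [ARGS...]
run_detached() {
    log_file=$1
    shift
    # nohup:       keep the job alive after the shell that started it exits
    # > log 2>&1:  send both output and error messages to the log file
    # < /dev/null: give the job an empty stdin instead of the terminal
    nohup "$@" > "$log_file" 2>&1 < /dev/null &
    detached_pid=$!   # PID of the background job, so the caller can wait on it
}

# Stand-in demo: cat sees end-of-file immediately because its stdin is
# /dev/null, so the job finishes on its own instead of waiting for input.
demo_log=$(mktemp)
run_detached "$demo_log" sh -c 'cat; echo finished'
wait "$detached_pid"
cat "$demo_log"   # prints: finished
```

With the helper defined, the command from this post would become:

 run_detached out.run mpiexec -n 2 -hosts d1,d2 /opt/QuantumWise/atk-11.2.b2/atkpython/bin/atkpython /home/derek/atk_mpi_test/test_mpi.py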

Anders Blom, QuantumATK Staff
Re: Problem running atk 11.2.b2 in parallel in the background
« Reply #2 on: February 14, 2011, 17:10 »
Thanks for sharing this trick for solving this little snag. It also appears with mpd, but in that case, unlike with hydra, the program is allowed to run; it just generates tons of garbage output in the log file. We'll make it a "tip" in the parallel guide to make sure people are aware of it!