Author Topic: running atk with mpich2 over multiple machines with ssh  (Read 10303 times)


Offline esp

  • Supreme QuantumATK Wizard
  • *****
  • Posts: 318
  • Country: us
  • Reputation: 3
    • View Profile
    • University of Minnesota
Hello,

I can run ATK using MPICH with multiple processes, but not over multiple machines. The other machines can only be accessed via ssh with a password.

There is a line in the MPICH documentation that says "
    ssh othermachine date
If you cannot get this to work without entering a password, you will need to configure ssh or rsh so that this can be done. "

Any help on how to do this? I have no idea how to configure ssh.



Offline Nordland

  • QuantumATK Staff
  • Supreme QuantumATK Wizard
  • *****
  • Posts: 812
  • Reputation: 18
    • View Profile
Re: running atk with mpich2 over multiple machines with ssh
« Reply #1 on: January 30, 2012, 10:11 »
The way I personally do it is to generate an ssh key without a passphrase. Add this key to the authorized keys on each of your machines, and then there is no password prompt anymore.
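
A minimal sketch of the key-based setup described above, assuming OpenSSH on all machines. The function name and the host names in the usage line are placeholders, not something from this thread. Keep in mind that a key with an empty passphrase lets anyone who can read the key file log in, which is the usual reason admins object to it.

```shell
#!/bin/sh
# Sketch: create a passphrase-less key and install it on each remote host.
# KEY defaults to OpenSSH's standard RSA identity file; override it if needed.
KEY="${KEY:-$HOME/.ssh/id_rsa}"

setup_passwordless_ssh() {
    # Generate the key pair only if it does not exist yet (-N "" = empty passphrase).
    [ -f "$KEY" ] || ssh-keygen -t rsa -b 4096 -N "" -f "$KEY" || return 1
    for host in "$@"; do
        # Appends the public key to ~/.ssh/authorized_keys on the remote host;
        # this step still asks for your password one last time.
        ssh-copy-id -i "$KEY.pub" "$host" || return 1
        # BatchMode=yes makes ssh fail instead of prompting, so this is the
        # same check as "ssh othermachine date" from the MPICH documentation.
        ssh -i "$KEY" -o BatchMode=yes "$host" date || return 1
    done
}

# Usage (host names are placeholders):
#   setup_passwordless_ssh node2 node3 node4
```

If the key already exists and is protected by a passphrase, the last check will fail unless the key has been loaded into an ssh-agent first.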

Offline esp

Re: running atk with mpich2 over multiple machines with ssh
« Reply #2 on: January 30, 2012, 10:15 »
OK, thank you. I will try this (that is, send it to the IT people who understand it) :)


Ed

Offline esp

Re: running atk with mpich2 over multiple machines with ssh
« Reply #3 on: January 31, 2012, 23:41 »
OK, so the IT people said this would be too insecure, so I can't do it.

But I found that if you build MPICH2 with the configure option --with-pm=smpd,
you can use some extensions that let you set the user and password.

I run it like this:
$MPICH2_BIN_DIR/mpiexec -hosts 4 <host1> 2 <host2> 2 -pwdfile pwd.txt -phrase <password> -p 3396 $ATK_BIN_DIR/atkpython $SRC_DIR/$SCRIPT2 > $LOG_DIR/$SCRIPT2.log.txt

First it said I needed to provide a password, so I added the -phrase option above. Now it says:

op_read error on left context: Error = -1
unable to read the challenge string, Error = -1


Any ideas here?



Offline esp

Re: running atk with mpich2 over multiple machines with ssh
« Reply #4 on: February 1, 2012, 05:56 »
I posted this question on an MPICH forum, and those users said (as you said) that you have to set up ssh without a password. The IT people at my university said this cannot be done. So this means that either I cannot use parallelization, or I am stuck with running separate jobs on multiple machines. This is not really a great solution. Any ideas?

Offline Nordland

Re: running atk with mpich2 over multiple machines with ssh
« Reply #5 on: February 1, 2012, 21:50 »
Is your machine running Windows or Linux? And what about the machines you are trying to log in to?

Offline Nordland

Re: running atk with mpich2 over multiple machines with ssh
« Reply #6 on: February 1, 2012, 21:58 »
If you are on Linux, you can use this evil trick to get around your evil admins:

Start up a terminal, go to .ssh and type
ssh-add your_password_key

It will then remember that password for the length of this terminal session. You can then fire off ATK and it should not prompt for a password.


http://linux.die.net/man/1/ssh-add
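
For reference, ssh-add talks to a running ssh-agent, so the trick above only works if an agent is reachable from the terminal. A hedged sketch, assuming OpenSSH; the function name and the default key path are placeholders:

```shell
#!/bin/sh
# Sketch: start an agent for this shell (if none is reachable) and load one key.
load_key_for_session() {
    key="${1:-$HOME/.ssh/id_rsa}"   # placeholder default; pass your own key file
    # ssh-add -l exits with status 2 when it cannot contact an agent.
    ssh-add -l >/dev/null 2>&1
    if [ $? -eq 2 ]; then
        # ssh-agent -s prints sh commands that export SSH_AUTH_SOCK and SSH_AGENT_PID.
        eval "$(ssh-agent -s)" >/dev/null || return 1
    fi
    # Prompts for the key's passphrase once; the agent then answers for later
    # ssh connections started from this shell.
    ssh-add "$key"
}

# Usage:
#   load_key_for_session ~/.ssh/id_rsa
#   ssh othermachine date     # should no longer prompt
```

The function has to be called in the same shell that later starts mpiexec, since the agent is found through environment variables.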

Offline esp

Re: running atk with mpich2 over multiple machines with ssh
« Reply #7 on: February 1, 2012, 22:07 »
Oooh, that sounds good. Thank you, I will try it.

Offline Anders Blom

  • QuantumATK Staff
  • Supreme QuantumATK Wizard
  • *****
  • Posts: 5575
  • Country: dk
  • Reputation: 96
    • View Profile
    • QuantumATK at Synopsys
Re: running atk with mpich2 over multiple machines with ssh
« Reply #8 on: February 2, 2012, 08:29 »
Just as a general note, ATK does not support the "smpd" process manager. It may appear to work (it runs), but if you look carefully, each node will actually act as master, so you will not get any parallelization, and you will have problems with I/O since all nodes will attempt to write the NetCDF files at the same time. It will be obvious from the log file too that something is wrong, since all statements will appear multiple times.
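
A quick way to look for that symptom is to count log lines that occur more than once. This is only a heuristic sketch (a healthy log can also contain some repeated lines), and the function name is made up for illustration:

```shell
#!/bin/sh
# Heuristic: count the distinct non-empty lines that appear more than once
# in a log file. A large number suggests every process logged as master.
count_repeated_lines() {
    grep -v '^[[:space:]]*$' "$1" | sort | uniq -d | wc -l | tr -d ' '
}

# Usage (path is a placeholder):
#   count_repeated_lines /path/to/job.log.txt
```

Comparing the result for a single-process run and a multi-process run of the same script gives a rough baseline.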

Offline esp

Re: running atk with mpich2 over multiple machines with ssh
« Reply #9 on: February 2, 2012, 10:45 »
Thank you. I only tried that because I saw there were options to supply a password, but it didn't work anyway.

Offline esp

Re: running atk with mpich2 over multiple machines with ssh
« Reply #10 on: February 9, 2012, 18:35 »
I followed this tutorial and I am now able to ssh without passwords. It might be of help to others:

http://www.mtu.net/~engstrom/ssh-agent.php

ed

Offline esp

Re: running atk with mpich2 over multiple machines with ssh
« Reply #11 on: February 10, 2012, 00:12 »
OK, well, I have tried and tried, and it seems I cannot figure out how to get MPICH working across multiple machines. It works fine on one machine with multiple processes, but over multiple machines, not so good. If you guys ever provide a detailed guide, that would be very helpful.

Offline Anders Blom

Re: running atk with mpich2 over multiple machines with ssh
« Reply #12 on: February 10, 2012, 10:16 »
Is this Linux or Windows? It would help to know in more detail what goes wrong. Then maybe we can work out a guide as a result of our discussions in this thread.

I take it you have gone through the basic stuff, installing MPICH2 and ATK on the 2 machines, and set up MPD?
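
For anyone following along, a rough sketch of what "setting up MPD" can look like, assuming the MPICH2 build that uses the MPD process manager (mpdboot, mpdtrace and mpdallexit ship with it). The host names, the mpd.hosts location and the helper function names are placeholders, and passwordless ssh between the machines is assumed to work already.

```shell
#!/bin/sh
# Placeholder location for the hosts file; one host name per line.
HOSTS_FILE="${HOSTS_FILE:-./mpd.hosts}"

# Write the given host names to the hosts file, one per line.
write_mpd_hosts() {
    printf '%s\n' "$@" > "$HOSTS_FILE"
}

# Start the mpd ring and list its members.
# mpdboot always starts one mpd on the local machine and counts it in -n,
# so the number of lines in the file is a safe value when the file lists
# every machine including the local one.
start_mpd_ring() {
    n=$(wc -l < "$HOSTS_FILE" | tr -d ' ')
    mpdboot -n "$n" -f "$HOSTS_FILE" || return 1
    mpdtrace
}

# Usage (host names are placeholders):
#   write_mpd_hosts node1 node2 node3 node4
#   start_mpd_ring
#   mpiexec -n 4 hostname
#   mpdallexit
```

If mpdtrace does not list all machines, the problem is in the ssh or MPD setup rather than in ATK.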

Offline esp

Re: running atk with mpich2 over multiple machines with ssh
« Reply #13 on: February 10, 2012, 18:02 »
Yes, I have 4 Linux machines with Ubuntu. All have ATK installed in exactly the same way, with the same paths and the same options. They also all have 2 installations of MPICH in different paths, one with the smpd option and one without.

All of them work fine running ATK on their own, even parallelized on one machine with an MPICH -n type run.

I just cannot get it to work over multiple machines starting from one job. One reason is that all the machines require ssh passwords for access. I recently posted about ssh-agent, which I successfully figured out and which saves your password for a session. That solved one problem, but then when I try to run the job I get all sorts of weird errors, not from ATK but from MPICH, I think. I will try again and post those errors. My comment is just that in general this is not a simple thing to do, and a tutorial would help (Linux machines, ssh with passwords, running ATK over multiple machines).

Offline Anders Blom

Re: running atk with mpich2 over multiple machines with ssh
« Reply #14 on: February 10, 2012, 20:20 »
I agree it's not an entirely trivial thing to do. However, it may be that you have a working MPI environment already. You mention smpd; that's a no-no with ATK. But OK, you have another installation, and that's the only one which will work. Yes, post your error messages; that will be the only way to proceed. Also, you can try to execute something simpler than ATK, with simple tests like
Code
mpiexec -n 3 echo $HOSTNAME
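
One caveat with echo $HOSTNAME: the shell that launches mpiexec normally expands $HOSTNAME before any process starts, so every process may print the launching machine's name even when the job is spread over several machines. A variant that makes each process run hostname itself and then tallies the results (the wrapper function name is only for illustration):

```shell
#!/bin/sh
# Run "hostname" in N MPI processes and count how many ran on each machine.
run_host_check() {
    # hostname is executed by every process on the node where it runs,
    # so the tally shows how the processes were distributed.
    mpiexec -n "${1:-3}" hostname | sort | uniq -c
}

# Usage:
#   run_host_check 4
```

If only one machine name shows up, the job is not being distributed across the machines.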