QuantumATK Forum

QuantumATK => General Questions and Answers => Topic started by: esp on January 29, 2012, 04:32

Title: running atk with mpich2 over multiple machines with ssh
Post by: esp on January 29, 2012, 04:32
hello,

I can run ATK using MPICH with multiple processes on one machine, but not across multiple machines. The other machines can only be accessed via ssh with a password.

There is a line in the mpich doc that says "
    ssh othermachine date
If you cannot get this to work without entering a password, you will need to configure ssh or rsh so that this can be done. "

Any help on how to do this? I have no idea how to configure ssh


Title: Re: running atk with mpich2 over multiple machines with ssh
Post by: Nordland on January 30, 2012, 10:11
The way I personally do it is to generate an ssh key without a passphrase. Add the public key to the authorized keys (~/.ssh/authorized_keys) on each of your machines, and then there is no password prompt anymore.
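
As a rough illustration of the approach described above, here is a minimal sketch using the standard OpenSSH tools. The remote address and key path are placeholders (assumptions, not values from this thread), and the function name is made up for the example.

```shell
#!/bin/sh
# Sketch only: REMOTE and KEY are placeholders, not values from this thread.
REMOTE="${REMOTE:-user@othermachine}"
KEY="${KEY:-$HOME/.ssh/id_rsa}"

setup_key_login() {
    # Create a key pair with an empty passphrase unless the key already exists.
    [ -f "$KEY" ] || ssh-keygen -t rsa -N "" -f "$KEY" || return 1
    # Append the public key to ~/.ssh/authorized_keys on the remote machine.
    # This asks for the remote password one last time.
    ssh-copy-id -i "$KEY.pub" "$REMOTE" || return 1
    # The check quoted from the MPICH documentation: no prompt should appear.
    ssh "$REMOTE" date
}
```

Set REMOTE and call setup_key_login once for each machine that should take part in the parallel run.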
Title: Re: running atk with mpich2 over multiple machines with ssh
Post by: esp on January 30, 2012, 10:15
OK, thank you, I will try this (that is, send it to the IT people who understand it) :)


Ed
Title: Re: running atk with mpich2 over multiple machines with ssh
Post by: esp on January 31, 2012, 23:41
OK, so the IT people said this would be too insecure, so I can't do this.

But I found that if you build MPICH2 with the configure option --with-pm=smpd,
then you can use some extensions which allow you to set the user and password.

I run like this:
$MPICH2_BIN_DIR/mpiexec -hosts 4 <host1> 2 <host2> 2 -pwdfile pwd.txt -phrase <password> -p 3396 $ATK_BIN_DIR/atkpython $SRC_DIR/$SCRIPT2 > $LOG_DIR/$SCRIPT2.log.txt

First it said I needed to provide a password, so I added the -phrase option above. Now it says:

op_read error on left context: Error = -1
unable to read the challenge string, Error = -1


Any ideas here?


Title: Re: running atk with mpich2 over multiple machines with ssh
Post by: esp on February 1, 2012, 05:56
I posted this question on an MPICH forum and the users there said (as you said) that you have to set up ssh without a password. The IT people at my university said this cannot be done. So this means that either I cannot use parallelization, or I am stuck with running separate jobs on multiple machines. This is not really a great solution. Any ideas?
Title: Re: running atk with mpich2 over multiple machines with ssh
Post by: Nordland on February 1, 2012, 21:50
Is your machine running Windows or Linux? And what about the machines you are trying to log in to?
Title: Re: running atk with mpich2 over multiple machines with ssh
Post by: Nordland on February 1, 2012, 21:58
If you are on Linux, you can do this evil trick to trick your evil admins:

Start up the terminal, go to .ssh and write
ssh-add your_password_key

Then ssh-agent (which must be running for ssh-add to work) will remember that passphrase for the length of this terminal session. You can then fire off ATK and it should not prompt for a password.


http://linux.die.net/man/1/ssh-add
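
A minimal sketch of the ssh-add trick, assuming a passphrase-protected key whose public half is already in ~/.ssh/authorized_keys on the remote machine. The key path, the remote address and the function name are placeholders chosen for the example.

```shell
#!/bin/sh
# Sketch only: KEY and REMOTE are placeholders, not values from this thread.
KEY="${KEY:-$HOME/.ssh/id_rsa}"
REMOTE="${REMOTE:-user@othermachine}"

unlock_key_for_session() {
    # Start an agent for this shell if none is reachable. ssh-agent -s prints
    # Bourne-shell commands that export SSH_AUTH_SOCK and SSH_AGENT_PID.
    if [ -z "${SSH_AUTH_SOCK:-}" ]; then
        eval "$(ssh-agent -s)" || return 1
    fi
    # Prompts once for the passphrase and keeps the unlocked key in the agent.
    ssh-add "$KEY" || return 1
    # Should now print the remote date without any prompt.
    ssh "$REMOTE" date
}
```

Programs started from the same shell afterwards inherit SSH_AUTH_SOCK, so ssh connections they open can use the unlocked key as well.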
Title: Re: running atk with mpich2 over multiple machines with ssh
Post by: esp on February 1, 2012, 22:07
Oooh, that sounds good. Thank you, I will try it.
Title: Re: running atk with mpich2 over multiple machines with ssh
Post by: Anders Blom on February 2, 2012, 08:29
Just as a general note, ATK does not support the "smpd" process manager. It may appear to work (it runs), but if you look carefully each node will actually act as master, so you will not get any parallelization, and you will have problems with I/O since all nodes will attempt to write the NetCDF files at the same time. It will be obvious that something is wrong from the log file too, since all statements will appear multiple times.
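
One quick way to look for the symptom described above (every statement repeated in the log) is to count repeated lines. This is only a heuristic sketch, not an ATK tool; the default log file name and the function name are assumptions.

```shell
#!/bin/sh
# Sketch only: a heuristic check. The default log path is a placeholder.
LOG="${1:-atk.log.txt}"

# Print the five most frequently repeated non-empty lines with their counts.
# If nearly every statement shows up once per MPI process, each process is
# probably acting as master.
most_repeated_lines() {
    grep -v '^[[:space:]]*$' "$1" | sort | uniq -c | sort -rn | head -n 5
}

if [ -f "$LOG" ]; then
    most_repeated_lines "$LOG"
fi
```

Some repetition is normal in any log, so the signal to look for is counts that match the number of processes across most lines.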
Title: Re: running atk with mpich2 over multiple machines with ssh
Post by: esp on February 2, 2012, 10:45
Thank you. I only tried that because I saw there were options to supply a password, but it didn't work anyway.
Title: Re: running atk with mpich2 over multiple machines with ssh
Post by: esp on February 9, 2012, 18:35
I followed this tutorial and I am now able to ssh without passwords. It might be helpful to others:

http://www.mtu.net/~engstrom/ssh-agent.php

ed
Title: Re: running atk with mpich2 over multiple machines with ssh
Post by: esp on February 10, 2012, 00:12
OK, well, I have tried and tried, but it seems I cannot figure out how to get MPICH working across multiple machines. It works fine for 1 machine and multiple processes, but over multiple machines, not so good. If you guys ever provide a detailed guide, that would be very helpful.
Title: Re: running atk with mpich2 over multiple machines with ssh
Post by: Anders Blom on February 10, 2012, 10:16
Is this Linux or Windows? It would help to know in more detail what goes wrong. Then maybe we can work out a guide as a result of our discussions in this thread.

I take it you have gone through the basic stuff, installing MPICH2 and ATK on the 2 machines, and set up MPD?
Title: Re: running atk with mpich2 over multiple machines with ssh
Post by: esp on February 10, 2012, 18:02
Yes, I have 4 Linux machines with Ubuntu. All have ATK installed in exactly the same way, with the same paths and the same options. They also all have 2 installations of MPICH in different paths, one with the smpd option and one without.

All of them work fine running ATK on their own, even parallelized, but only on one machine with an mpiexec -n type run.

I just cannot get it to work over multiple machines starting from one job. One reason is that all the machines require ssh passwords for access. I recently posted about ssh-agent, which I successfully figured out and which saves your password for a session. That solved one problem, but when I try to run the job I get all sorts of weird errors, not from ATK but from MPICH, I think. I will try again and post those errors. My comment is just that in general this is not a simple thing to do and a tutorial would help (Linux machines, ssh with passwords, running ATK over multiple machines).
Title: Re: running atk with mpich2 over multiple machines with ssh
Post by: Anders Blom on February 10, 2012, 20:20
I agree it's not an entirely trivial thing to do. However, it may be that you have a working MPI environment already.

You mention smpd - that's a no-no with ATK. But OK, you have another installation without it, and that is the only one which will work.

Yes, post your error messages, that will be the only way to proceed.

Also, you can try to execute something simpler than ATK, for example a simple test like

Code
# hostname runs once per process; $HOSTNAME would be expanded by the launching shell
mpiexec -n 3 hostname
Title: Re: running atk with mpich2 over multiple machines with ssh
Post by: esp on February 10, 2012, 21:12
What you posted works fine, but that is only for one machine. I do this every day, no problem. But I need to be able to run on multiple machines.
Title: Re: running atk with mpich2 over multiple machines with ssh
Post by: Anders Blom on February 10, 2012, 21:15
Yes, OK. Next, make sure you have mpd configured to know which other machines are in the ring, and if necessary specify a -machinefile argument when running.
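
For reference, a minimal sketch of what setting up the ring and running a test across machines might look like with the MPD tools shipped with MPICH2. The host names, process count, file name and function name are placeholders. It assumes passwordless ssh between the machines already works and that each machine has the ~/.mpd.conf file that mpd expects (containing a secretword line and readable only by you).

```shell
#!/bin/sh
# Sketch only: host names, process count and file name are placeholders.
HOSTS="${HOSTS:-node1 node2}"
MACHINEFILE="${MACHINEFILE:-mpd.hosts}"
NPROCS="${NPROCS:-4}"

start_ring_and_test() {
    # One host name per line; $HOSTS is left unquoted on purpose so that it
    # splits into words. The file serves both mpdboot and mpiexec.
    printf '%s\n' $HOSTS > "$MACHINEFILE"
    nhosts=$(( $(wc -l < "$MACHINEFILE") ))
    # Start one mpd daemon per listed host (reaches the others through ssh).
    mpdboot -n "$nhosts" -f "$MACHINEFILE" || return 1
    # List the hosts that actually joined the ring.
    mpdtrace || return 1
    # Each process prints the name of the machine it runs on.
    mpiexec -machinefile "$MACHINEFILE" -n "$NPROCS" hostname
}
```

If the output of the last command lists more than one machine name, the MPI environment itself works across machines, and the same mpiexec options can be tried with atkpython in place of hostname.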
Title: Re: running atk with mpich2 over multiple machines with ssh
Post by: Anders Blom on February 10, 2012, 21:23
https://help.ubuntu.com/community/MpichCluster is pretty succinct and easy to follow. You can skip most things, like installing gcc etc, but 7 and 11 are very relevant for you I guess. Also, in case you haven't done so already, you must have a shared folder where (ideally) ATK and (absolutely necessary) the scripts you run reside.
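
A small sketch for checking the shared-folder requirement: it asks each machine over ssh whether the script is readable at the same path. The host names, the example path and the function name are placeholders, and passwordless ssh is assumed to work already.

```shell
#!/bin/sh
# Sketch only: HOSTS is a placeholder list of machine names.
HOSTS="${HOSTS:-node1 node2}"

check_script_visible() {
    # $1: absolute path of the script, expected at the same path on all hosts.
    status=0
    for host in $HOSTS; do
        if ssh "$host" test -r "$1"; then
            echo "$host: ok"
        else
            echo "$host: cannot read $1"
            status=1
        fi
    done
    return $status
}
```

For example, check_script_visible /shared/myscript.py (a hypothetical path) returns a non-zero status if any machine cannot read the file.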
Title: Re: running atk with mpich2 over multiple machines with ssh
Post by: esp on February 10, 2012, 23:15
Thank you, that does look like a good tutorial. I will try it.