Nimrod on Nectar – Barrine

Notes on configuring and running Nimrod on a Nectar VM, distributing jobs to the Barrine cluster through its PBS queuing system.

Nimrod requires a PBS client to submit jobs. Barrine uses PBS Pro, which is a commercial tool, so I am not sure whether I can get hold of its client.

An ssh wrapper may be a way around this.
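A rough sketch of what such an ssh wrapper could look like: a small stand-in that forwards `qsub` arguments to the real `qsub` on a cluster login node. The host name `barrine.example.org` is a placeholder, not the real login node. Passwordless ssh is assumed, and so is the job script already being visible on the cluster side. This is only an illustration of the idea, not what Nimrod itself does.

```python
import shlex
import subprocess

# Placeholder host name (assumption); replace with a real login node.
LOGIN_HOST = "barrine.example.org"


def build_ssh_command(args, host=LOGIN_HOST):
    """Return the argv list that runs `qsub <args>` on the remote host."""
    # Quote every argument so spaces survive the remote shell.
    remote = " ".join(shlex.quote(a) for a in ["qsub"] + list(args))
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes", host, remote]


def remote_qsub(args, host=LOGIN_HOST):
    """Run qsub remotely and return its stdout (normally the job id)."""
    out = subprocess.check_output(build_ssh_command(args, host))
    return out.decode().strip()
```

Calling `remote_qsub(["-q", "queueName", "my_job"])` would then behave roughly like a local `qsub`, provided the ssh login works non-interactively.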

Update

  • Got the PBS client (rpm files) from Martin and installed it on the local system. It now needs to be configured to point to the PBS server.
  • From the manual:

Specifying Queue and/or Server
The “-q destination” option to qsub allows you to specify a particular destination to which you want the job submitted. The destination names a queue, a Server, or a queue at a Server. The qsub command will submit the script to the Server defined by the destination argument. If the destination is a routing queue, the job may be routed by the Server to a new destination. If the -q option is not specified, the qsub command will submit the script to the default queue at the default Server. (See also the discussion of PBS_DEFAULT in “Environment Variables” on page 17.) The destination specification takes the following form:
-q [queue[@host]]
Examples:

qsub -q queueName@serverName my_job

qsub -q queueName@serverName.domain.com my_job
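To make the destination forms from the manual excerpt concrete, here is a small sketch that splits a `-q` destination into its queue and server parts and builds the matching `qsub` command line. The helper names are made up for illustration. When no destination is given, `qsub` falls back to the default queue at the default server, as the excerpt describes.

```python
def split_destination(dest):
    """Split 'queue', '@server' or 'queue@server' into (queue, server)."""
    queue, sep, server = dest.partition("@")
    # Missing parts are reported as None.
    return (queue or None, (server or None) if sep else None)


def qsub_argv(script, dest=None):
    """Build the qsub argv; omit -q to use the default queue and server."""
    argv = ["qsub"]
    if dest:
        argv += ["-q", dest]
    argv.append(script)
    return argv
```

For example, `qsub_argv("my_job", "queueName@serverName")` gives the same command as the first example above.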

config file:

PBS_EXEC=/usr/pbs
PBS_HOME=/var/spool/PBS
PBS_START_SERVER=0
PBS_START_MOM=1
PBS_START_SCHED=0
PBS_SERVER=paroo3.hpcu.uq.edu.au
PBS_SCP=/usr/bin/scp
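A quick way to double-check which server the client will talk to is to read the config file back. The sketch below parses the `KEY=VALUE` lines. It assumes the file lives at `/etc/pbs.conf`, the usual location for PBS Pro, which may differ on other installs.

```python
def parse_pbs_conf(text):
    """Parse pbs.conf style KEY=VALUE lines into a dict."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        conf[key.strip()] = value.strip()
    return conf


def pbs_server_from_file(path="/etc/pbs.conf"):
    """Return the PBS_SERVER value from the config file, or None."""
    with open(path) as handle:
        return parse_pbs_conf(handle.read()).get("PBS_SERVER")
```

With the config above, `pbs_server_from_file()` should report `paroo3.hpcu.uq.edu.au`.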

Looked at the hosts file on one of the Barrine ssh servers.

x.x.x.x paroon3…..

and then another entry, paroom3, which points to the actual PBS server

-> I don't think I can access this node from outside.

Asked Martin; it looks like the public paroo3 can also accept jobs.

Will email him …
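While waiting, a plain TCP check from the Nectar VM shows whether the PBS server is reachable at all. This is a minimal sketch. It assumes the server listens on 15001, the default `pbs_server` port; the site may use a different port or block it at the firewall.

```python
import socket

# Default pbs_server port (assumption that the site has not changed it).
PBS_SERVER_PORT = 15001


def can_connect(host, port=PBS_SERVER_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False
```

`can_connect("paroo3.hpcu.uq.edu.au")` returning False would point to a network or firewall problem rather than a client config problem.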

http://etutorials.org/Linux+systems/cluster+computing+with+linux/Part+III+Managing+Clusters/Chapter+17+PBS+Portable+Batch+System/17.6+Troubleshooting/

http://shanuwebpage.blogspot.com.au/2013/06/pbsiff-file-not-setuid-root.html

Done. Can submit jobs now.

Nimrod stagein -> sharedhome=yes

-W stagein=/tmp/1379669878.69-vm-130-102-154-19.qld.nectar.org.au@vm-130-102-154-19.qld.nectar.org.au:/home/s4327550/.nimrod/s4327550vm-130-102-154-19.qld.nectar.org.aux86_a64linux2 /tmp/s4327550/1379669878.69

Nimrod processes running on one of the nodes:

s4327550 4602 0.0 0.0 12604 1584 ? S 19:38 0:00 /bin/bash /tmp/1379669878.69-vm-130-102-154-19.qld.nectar.org.au --dbase tcp%vm-130-102-154-19.qld.nectar.org.au,130.102.154.19%40001%% --ident 4705 --user s4327550
s4327550 4613 0.4 0.0 64460 12620 ? Sl 19:38 0:01 ./remote --dbase tcp%vm-130-102-154-19.qld.nectar.org.au,130.102.154.19%40001%% --ident 4705 --user s4327550

From NimrodGridRun -> with sharedhome=false:

stagein:

stagein=1379683546.56-vm-130-102-154-19.qld.nectar.org.au@vm-130-102-154-19.qld.nectar.org.au:/home/s4327550/.nimrod/s4327550vm-130-102-154-19.qld.nectar.org.aux86_a64linux2,
.nimservers4327550vm-130-102-154-19.qld.nectar.org.au.pub@vm-130-102-154-19.qld.nectar.org.au:/home/s4327550/.nimrod/.nimserver.pub,
.nimports4327550vm-130-102-154-19.qld.nectar.org.au@vm-130-102-154-19.qld.nectar.org.au:/home/s4327550/.nimrod/.nimports4327550vm-130-102-154-19.qld.nectar.org.au /tmp/s4327550/1379683546.57
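Each stagein entry follows the PBS form `local_file@hostname:remote_file`, with entries separated by commas. A small sketch for pulling such a value apart, useful for listing which files should appear on the compute node (the function name is made up for illustration):

```python
def parse_stagein(spec):
    """Split a stagein value into (local_name, host, remote_path) tuples."""
    entries = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        # The first '@' ends the local name; the first ':' after it ends the host.
        local_name, _, rest = item.partition("@")
        host, _, remote_path = rest.partition(":")
        entries.append((local_name, host, remote_path))
    return entries
```

Applied to the value above (without the trailing scratch directory argument), this yields three entries: the agent archive, the `.nimserver` public key and the `.nimports` file. The last one is the file named in the IOError further down, so checking whether it actually arrived on the node may help narrow that error.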

 

4707 | 5 | 5 | /home/s4327550/.nimrod/s4327550vm-130-102-154-19.qld.nectar.org.aux86_a64linux2 | --dbase tcp%vm-130-102-154-19.qld.nectar.org.au,130.102.154.19%40001%% --ident 4707 --user s4327550 | A | start | active | 2 | ('vm-130-102-154-19.qld.nectar.org.au', '680780.paroo3@paroo3.hpcu.uq.edu.au') | | | 2013-09-20 20:08:41.714754 | 2013-09-20 20:08:41.630722 | 2 | F | F | F | | | 0 | 600 | | | 2013-09-20 20:06:13.557608 | | 2013-09-20 20:06:39.688356 | | | 0
4708 | 5 | 5 | /home/s4327550/.nimrod/s4327550vm-130-102-154-19.qld.nectar.org.aux86_a64linux2 | --dbase tcp%vm-130-102-154-19.qld.nectar.org.au,130.102.154.19%40001%% --ident 4708 --user s4327550 | A | start | active | 2 | ('vm-130-102-154-19.qld.nectar.org.au', '680781.paroo3@paroo3.hpcu.uq.edu.au') | | | 2013-09-20 20:08:41.804353 | 2013-09-20 20:08:41.71735 | 2 | F | F | F | | | 0 | 600 | | | 2013-09-20 20:06:18.61342 | | 2013-09-20 20:06:39.778896 | | | 0

Same error:

nimrod.agent (tid140737353881344) 2013-09-20 20:14:25,106 INFO Agent ident:4707 initialising.
nimrod.agent (tid140737353881344) 2013-09-20 20:14:25,107 INFO Agent scratch dir is /usr/tmp/s4327550/1379672065.11
Traceback (most recent call last):
  File "/home/ngdev/install/nimrod-old-db/share/nimrod/t5dbase/NimrodClient.py", line 190, in open
  File "/home/ngdev/install/nimrod-old-db/share/nimrod/t5dbase/NimrodClient.py", line 164, in get_contact
IOError: [Errno 2] No such file or directory: '/tmp/s4327550/agentarchive.ztqu12h/.nimports4327550vm-130-102-154-19.qld.nectar.org.au'


About slump

Dr Slump ... :D.
