
How to build a linux Cluster - Part II

Filed under Cluster, How-to.

Viewed 3386 times.

 

 

Series table of contents:

  1. How to build a linux Cluster - Part I
  2. How to build a linux Cluster - Part II
  3. How to build a linux Cluster - Part III

This post series documents how I built a powerful and scalable Linux cluster using only free software and off-the-shelf components. To build our cluster we are going to use three pieces of software:

In the first part of the series, I showed how to install DRBL on your server machine. Now I explain how to install Condor on your DRBL cluster so you can easily submit and manage your computing jobs.

2. Condor Installation

To make the most of our newly created Linux cluster, we will install Condor, a batch system developed at the University of Wisconsin. After downloading the latest tarball and uncompressing it, just run:

 

[root@underdark condor-6.7.12]$ ./condor_install

The installation script is well conceived and explains what each option means in plenty of detail, so you shouldn’t have many problems setting it up. You must, however, keep in mind that everything is shared between the server and the clients, and answer accordingly when it asks about the file systems, accounts and password files.

If you want Java support, you need to have Sun’s Java virtual machine installed before installing Condor (if you install it afterwards, you can either edit Condor’s configuration files by hand or redo the Condor installation). MPI support requires an MPICH installation (LAM’s implementation does not play nicely with Condor) and possibly some tinkering with the configuration files after the installation.

You can see the full log of a Condor installation on a DRBL system here. For further details on Condor installation and its many other features, refer to the online Condor Manual and to the very active mailing list that you are invited to subscribe to when you download the software.

After the installation you can start Condor by running:

 

[root@underdark condor-6.7.12]$ /usr/local/condor/bin/condor_master

This command should then spawn all the other processes that Condor requires. In particular, on the server you should see five different condor_ processes running:

 

[root@underdark condor-6.7.12]$ ps -ef | grep condor
condor   16418     1  0 14:18 ?        00:00:00 condor_master -f
condor   16477 16418  0 14:20 ?        00:00:00 condor_collector -f
condor   16478 16418  0 14:20 ?        00:00:00 condor_negotiator -f
condor   16479 16418  0 14:20 ?        00:00:00 condor_schedd -f
condor   16480 16418  0 14:20 ?        00:00:00 condor_startd -f
root     16503 16382  0 14:20 pts/4    00:00:00 grep condor
[root@underdark condor-6.7.12]$
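Checking the listing by eye gets tedious once there are many machines. The following is a minimal sketch of a helper that compares a process listing against the daemon names shown above. It is not part of Condor: the function name missing_condor_daemons and the EXPECTED_DAEMONS variable are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Condor): reads a `ps -ef` style
# listing on stdin and prints every expected daemon that is absent.
# The default list mirrors the five processes shown for the server.
EXPECTED_DAEMONS="condor_master condor_collector condor_negotiator condor_schedd condor_startd"

missing_condor_daemons() {
    listing=$(cat)
    # An optional first argument overrides the list of names to look for.
    for daemon in ${1:-$EXPECTED_DAEMONS}; do
        # -w matches whole words only, so the "grep condor" line in the
        # listing never counts as a daemon.
        printf '%s\n' "$listing" | grep -qw "$daemon" || printf '%s\n' "$daemon"
    done
}
```

On the server this could be run as `ps -ef | missing_condor_daemons`. On a client, where the collector and negotiator are not expected, the shorter list can be passed explicitly: `ps -ef | missing_condor_daemons "condor_master condor_schedd condor_startd"`. No output means nothing is missing.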

After you start Condor on the clients, you should see all of the above except condor_collector and condor_negotiator. Sometimes the installation program gets confused by the fact that the server has several hostnames (the usual hostname for eth0, plus nfsserver_eth1 and nfsserver_eth2), and running condor_master on the server then starts only the services used by a client machine. We can easily fix this by using condor_configure to force the current machine (with all its hostnames) to be the server. This changes the relevant configuration files, after which Condor must be told to reread them. This is done with two simple commands:

 

[root@underdark condor-6.7.12]$ ./condor_configure --type=submit,execute,manager
[root@underdark condor-6.7.12]$ /usr/local/condor/sbin/condor_reconfig
Sent "Reconfig" command to local master
[root@underdark condor-6.7.12]$
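To confirm which role the configuration now assigns to the machine, one option is to look at the DAEMON_LIST setting, which condor_config_val can print. The sketch below only inspects the text of that list and does not query the running daemons. The helper name is_central_manager is hypothetical, and the example value in the comment is an assumption about what a server configured as above would report.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when a DAEMON_LIST value (for example
# "MASTER, COLLECTOR, NEGOTIATOR, SCHEDD, STARTD") names both daemons
# that only the server is expected to run.
is_central_manager() {
    # Put one daemon name per line: commas and spaces both become newlines.
    daemons=$(printf '%s\n' "$1" | tr ', ' '\n\n')
    # -x requires the whole line to match, -i ignores case.
    printf '%s\n' "$daemons" | grep -qix 'COLLECTOR' &&
        printf '%s\n' "$daemons" | grep -qix 'NEGOTIATOR'
}

# Usage on a machine with Condor installed:
#   is_central_manager "$(condor_config_val DAEMON_LIST)" && echo "configured as server"
```

If the check fails on the server even after the two commands above, the configuration files are the first place to look.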

After this, you should see the server machine as part of your Condor cluster when you run condor_status.

 

[bgoncalves@underdark ~]$ condor_status
 
Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime
 
underdark     LINUX       INTEL  Unclaimed  Idle       0.240  1518  0+00:20:04
 
                     Machines Owner Claimed Unclaimed Matched Preempting
 
         INTEL/LINUX        1     0       0         1       0          0
 
               Total        1     0       0         1       0          0
[bgoncalves@underdark ~]$


If you now run condor_master on the clients, you should see them being added to this list within a few minutes (usually around 5-10 minutes, but the more machines you have, the longer it takes for all of them to finish handshaking with the server). There are a number of tutorials available online (not to mention the official manual) that teach the basics of using Condor (I wrote one for my users that is available here), but the mechanics are fairly simple, especially if you’ve used other batch systems before.
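While waiting for the clients to register, the machine count can be polled instead of checked by hand. The sketch below assumes the summary layout shown in the condor_status output earlier in this post, where the Total row carries the machine count in its second column. The function names, the default interval and attempt count, and the CONDOR_STATUS_CMD variable are all illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helpers built around the summary table printed by
# condor_status (see the sample output earlier in this post).

# Reads condor_status output on stdin and prints the Machines count from
# the "Total" row, or 0 when no such row is present.
count_machines() {
    awk '$1 == "Total" { n = $2 } END { print n + 0 }'
}

# Polls until at least $1 machines are listed, checking every $2 seconds
# (default 60) for at most $3 attempts (default 30).
# CONDOR_STATUS_CMD can be overridden, e.g. to try this without Condor.
wait_for_machines() {
    wanted=$1
    interval=${2:-60}
    attempts=${3:-30}
    seen=0
    while [ "$attempts" -gt 0 ]; do
        seen=$(${CONDOR_STATUS_CMD:-condor_status} | count_machines)
        if [ "$seen" -ge "$wanted" ]; then
            echo "machines registered: $seen"
            return 0
        fi
        attempts=$((attempts - 1))
        sleep "$interval"
    done
    echo "gave up: $seen of $wanted machines registered" >&2
    return 1
}
```

For a cluster with, say, 20 clients plus the server, `wait_for_machines 21` would check once a minute for up to half an hour, which comfortably covers the usual handshaking time.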


In the final part of the series, I’ll give you a basic introduction to the mechanics of submitting and managing jobs on your new scalable cluster.

© Copyright 2004 Bruno Goncalves - All rights reserved
