
How to build a linux Cluster - Part I

Filed under Cluster, How-to.


Series table of contents:

  1. How to build a linux Cluster - Part I
  2. How to build a linux Cluster - Part II
  3. How to build a linux Cluster - Part III

This post series documents how I built a powerful and scalable Linux cluster using only free software and off-the-shelf components. To build our cluster we are going to use three pieces of software: Fedora Core 3, DRBL and Condor.

Fedora Core 3 is our base distribution since it’s freely available and well supported by both DRBL and Condor. For the purposes of this tutorial I’ll assume that you already have FC3 installed on the machine that is to become our server. The server machine will be responsible for storing and serving all the files the client machines need in order to work, and for acting as a firewall between the cluster and the outside world. All the networking and services configuration is done by the installation scripts of our terminal server of choice, DRBL.

After DRBL is successfully set up we will install the Condor software to act as a job manager that allows us to simply submit jobs (both serial and parallel) on the server machine without having to worry about where they will run. Condor will figure out when and where the jobs run and will send you an email when it’s all done so you can pick up your results on the server. In the remaining sections I’ll give a brief overview of the configuration of these two software packages and of the care that must be taken to make them work nicely together.


Using just DRBL, we would have a set of machines that could be used as thin clients in, say, a classroom setting. By adding well-known clustering software (Condor) we turn this set of machines into a computing cluster that can perform high-throughput scientific computation on a large scale for a small price tag. Computing jobs (either serial or parallel) are submitted on the server, and Condor then takes care of distributing them among the machines that are currently idle, if any, or of queuing them until the required resources are available. Condor is also capable of taking periodic checkpoints of the jobs and restarting them if something (like a power failure) causes the machine they are running on to die or reboot.


The end product is very stable and allows compute nodes to be added to and removed from the system without reconfiguring anything, up to a maximum number of machines that is set at DRBL’s and Condor’s install time. This number can be made arbitrarily large at the small cost of the hard drive space required to store some configuration files (config files for 72 machines use less than 3 GB of hard drive space on the server).
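As a rough sanity check on the figure above, the sketch below scales it to other cluster sizes. It assumes that disk usage grows linearly with the number of clients and treats 3 GB as 3072 MB; both are assumptions made for illustration, and `estimate_mb` is a hypothetical helper, not part of DRBL.

```shell
#!/bin/sh
# Upper-bound estimate (in MB) of the server disk space used by the
# per-client configuration files.
# ASSUMPTION: usage scales linearly from the single data point in the
# post (72 clients use less than 3 GB, taken here as 3072 MB).
estimate_mb() {
    clients="$1"
    echo $(( clients * 3072 / 72 ))
}

# Example: the 48 clients (2 x 24) used later in this post.
estimate_mb 48
```

Treat the result as a ceiling rather than a measurement, since the post only gives an upper bound for one cluster size.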

1. DRBL Installation

In this particular example we will assume that our server machine has three network interfaces: eth0 to connect to the outside world, and eth1 and eth2 to connect to the client machines. I chose to use two different subnets for the client machines in order to spread the network traffic over two different routers and to illustrate that the clients don’t all need to be in the same subnet, although that is probably the most common case. Our client machines are cheap boxes with nothing but a motherboard, CPU, RAM and network card (possibly embedded in the motherboard). Before you start installing DRBL you should configure eth1 and eth2 with private IP addresses (192.168.) and set the clients’ BIOS to boot via the network card.
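One way to give eth1 and eth2 their private addresses on a Fedora-style system is through the ifcfg files in /etc/sysconfig/network-scripts. The sketch below only prints draft stanzas so you can review them first. The addresses are the ones that appear in the layout shown later in this post, the 255.255.255.0 netmask is an assumption, and `make_ifcfg` is a hypothetical helper name.

```shell
#!/bin/sh
# Print a static ifcfg-style stanza for one private interface.
# Arguments: device name, IP address, netmask.
make_ifcfg() {
    dev="$1"; ip="$2"; mask="$3"
    printf 'DEVICE=%s\n' "$dev"
    printf 'BOOTPROTO=static\n'
    printf 'IPADDR=%s\n' "$ip"
    printf 'NETMASK=%s\n' "$mask"
    printf 'ONBOOT=yes\n'
}

# Drafts go to stdout only; nothing on the system is modified.
# ASSUMPTION: a /24 netmask for both client subnets.
make_ifcfg eth1 192.168.110.254 255.255.255.0
echo
make_ifcfg eth2 192.168.120.254 255.255.255.0
```

Once a stanza looks right, it can be saved as root to /etc/sysconfig/network-scripts/ifcfg-eth1 (or ifcfg-eth2) and the interface brought up before running the DRBL scripts.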


DRBL is available as a small RPM package that is easily installed. To set it up we are going to use two installation scripts: drblsrv to configure the server and drblpush to configure the clients. We must start with drblsrv by running:

/opt/drbl/setup/drblsrv -i

This Perl script will simply install the packages that DRBL requires (dhcp, tftp, etc., if they’re not already installed) and ask you if you want to go ahead and update the system. This is straightforward since the default options are the correct ones in most cases. Just remember that if you downloaded the RPM from the testing or unstable directories, you must select Yes when it asks if you want to use the testing or unstable packages. You can take a look at what one run of drblsrv looks like in the drblsrv.txt attachment.


After drblsrv finishes (which can take a fair amount of time, especially if you have a slow internet connection and you asked to update the system) it’s time to run drblpush by typing:

/opt/drbl/setup/drblpush -i

This second Perl script is the workhorse of the whole system: it configures and starts all the services required to make everything work. It will automatically detect the network interfaces that have private IPs assigned to them and ask you how many clients you want to set up on each of them. For added security, you can bind the booting process (via DHCP) to the MAC addresses of your clients, at the cost of flexibility. This feature is useful if you are setting up your system in the open (in a classroom, for instance) where anyone can come and plug in a new machine without you knowing about it. Since our system is meant only for computing and is assumed to be behind closed doors, accessible to users only by ssh-ing to the server, we choose N for this option and simply tell DRBL how many clients we want on each network interface. They all get IP addresses on a first-come, first-served basis at boot time. At this time DRBL will present you with a nice little graphical view of your setup:

          NIC    NIC IP                    Clients
+-----------------------------+
|         DRBL SERVER         |
|                             |
|    +-- [eth0] xxx.xxx.xxx.xxx +- to WAN
|                             |
|    +-- [eth1] 192.168.110.254 +- to clients group 1 [ 24 clients, their IP
|                             |            from 192.168.110.1 - 192.168.110.24]
|    +-- [eth2] 192.168.120.254 +- to clients group 2 [ 24 clients, their IP
|                             |            from 192.168.120.1 - 192.168.120.24]
+-----------------------------+

to make sure everything is the way you intend it to be. In this particular example we have made room for 24 machines on each private network interface. If this doesn’t look the way you intend it, you can just press Ctrl-C and restart from scratch; nothing has been changed on your system yet.

After this point, the only option you might want to change from its default is the one referring to the boot mode (the second one after the ASCII “art”). Since these machines will only be used for number crunching, you want to set that to “2” for text mode so you don’t waste resources starting an X server that will never be used. All the other options can safely be left at their default values or modified to serve your specific goals.

You should also note that if you change something on the server and you want those changes to propagate to the clients, you need to rerun drblpush. In this case you can choose to keep the old client settings and just export the changes, or to redo everything from scratch.

After DRBL finishes successfully, you can boot your client machines and, if everything went according to plan, you should now have a basic cluster system up and running. You can see what a full run of drblpush looks like in the drblpush.txt attachment. If you run into any problems, you can use the support forum that is available through SourceForge. My experience has been that you typically get an answer back from one of the developers within 24 hours or so.
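Once the clients have booted, a quick way to see which of them came up is to walk the address ranges from the layout above and ping each one. This is only a sketch: `client_ips` and `check_clients` are hypothetical helper names, the sketch assumes clients are numbered from .1 upward as in the diagram, and the `-c` and `-W` options assume the Linux iputils version of ping.

```shell
#!/bin/sh
# List the client addresses for one private subnet.
# Arguments: the first three octets of the subnet, the number of clients.
client_ips() {
    prefix="$1"; count="$2"
    i=1
    while [ "$i" -le "$count" ]; do
        echo "$prefix.$i"
        i=$(( i + 1 ))
    done
}

# Report which clients in a subnet answer a single ping.
# "-c 1" sends one packet, "-W 1" waits at most one second for a reply.
check_clients() {
    for ip in $(client_ips "$1" "$2"); do
        if ping -c 1 -W 1 "$ip" > /dev/null 2>&1; then
            echo "$ip up"
        else
            echo "$ip down"
        fi
    done
}
```

For the setup in this post you would run `check_clients 192.168.110 24` and `check_clients 192.168.120 24` on the server. Because addresses are handed out first come, first served, a "down" entry may simply mean that fewer than 24 machines are plugged into that subnet.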


In the next part of this series I’ll show you how to configure the cluster management software, Condor, and the third and final part will give you a basic introduction to how you can take full advantage of your brand new cluster.

In the meantime, please feel free to leave any questions or suggestions in the comment area.


© Copyright 2004 Bruno Goncalves - All rights reserved
