
Condor Primer

Bruno Miguel Tavares Gonçalves

 

High Throughput Computing (HTC)

The Condor Project was created to meet a pressing need in state-of-the-art computational research: processing large amounts of information efficiently. It does so by implementing what the literature calls a High Throughput Computing environment, one that delivers large amounts of computational power over long periods of time (footnote 1).

Condor makes the unused CPU cycles of many computers available for general use in an efficient and transparent manner, dramatically improving the use of existing computational resources without affecting their normal use. It also allows many independent jobs to be scheduled and run concurrently, as fast as the available resources allow.

 

SCRIPT file

Before you can submit a job to Condor you need to write a submit script that informs Condor of what is required to complete the computational task. A simple submit script can look something like this:

Executable = hello 

Universe = vanilla 

Output = hello.out.$(PROCESS)

Input = hello.in.$(PROCESS)

Error = hello.err.$(PROCESS)

Transfer_files = ALWAYS

Log = hello.log 

Queue 3

The only necessary lines are the first two and the last one. The meaning of each line is described below:

Executable = hello

  • This line tells Condor which binary or script to run. Arguments should not be written on this line; pass them with a separate Arguments line instead.

Transfer_files = ALWAYS

  • This tells Condor to transfer the executable and input files to the remote machine, so your program can run on machines that do not share a file system with the machine you submit from. You should always include this line, even though it is not mandatory.

Universe = Vanilla

  • This is where you specify the universe (execution environment) the job should run in. There are several possible choices, but the most commonly used is the vanilla universe. More details can be found in the Condor manual [1].

Input = hello.in.$(PROCESS)

  • The contents of this file will be used as <stdin> for this process. The macro $(PROCESS) is replaced by the process number, starting at 0. As such, process number 0 will read hello.in.0, process number 1 will read hello.in.1, and so on.

Output = hello.out.$(PROCESS)

  • Anything the program writes to standard output will be redirected to this file. As before, $(PROCESS) stands for the process number.

Error = hello.err.$(PROCESS)

  • Any error messages the process writes to standard error will be redirected to this file. As before, $(PROCESS) identifies the process number.

Log = hello.log

  • Condor stores the job log in this file. Every action taken during the execution of this job will be listed here.

Queue N

  • This line tells Condor how many processes to queue. If the numerical argument is omitted, only one process is added to the execution queue. The processes thus generated will run on as many machines as are available, for as long as required to complete their execution.

 

All the files mentioned above are expected to be in the same directory as the submit script and the executable binary. The commands used to submit and control jobs are described in the next section.

 

Basic Commands

 

condor_status

If you want to know how many nodes and CPUs are known to Condor, you can run:

condor_status

This command lists all of the nodes in the Condor pool that your machine belongs to. It also provides basic information about each node, such as its architecture, its operating system, and whether any jobs are currently running on it.

 

condor_compile COMMAND

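To run a job in the standard universe, Condor requires that your binary be relinked with Condor-specific libraries, which provide features such as checkpointing and remote system calls. Jobs in the vanilla universe, like the one in the example script above, can use an ordinary binary. Relinking is achieved by calling: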

condor_compile COMMAND
where COMMAND stands for the command you would normally use to compile your code. So, if you would normally compile your source code using:

gcc foo.c -o foo.x
you should now use:

condor_compile gcc foo.c -o foo.x

condor_submit SCRIPT

After you have built your binary (relinking it with condor_compile if needed) and written the submit script, you need to tell Condor to use the script to create a new job. All you have to do is type:

condor_submit SCRIPT
If you want to see what the submit command is doing for debugging purposes, you can simply type:

condor_submit -v SCRIPT

After this command returns, all you need to do is wait for your job to finish running. If everything goes according to plan, the output of your runs will be in the Output files listed in the submit script, and any errors that have occurred will be in the Error files. Please note that the Error files will always be created, even if no errors occur.

 

condor_q

You can check the status of your job by typing:

condor_q
The output of this command will tell you how many jobs are in Condor's queue, which jobs are currently Running and which are Idle. Jobs can be idle simply because they haven't gotten their turn yet, or because they encountered some sort of problem.

You can check if something went wrong by using:

condor_q -analyze ID

The output should give you an idea of what's going on.

 

condor_rm ID

To remove a process from Condor's job queue, type:

condor_rm ID

where ID is the job's ID number, which you can get from a previous call to condor_q.

 

Further resources

There are several other commands, and much more to be said about Condor. This tutorial is meant to be only a very basic introduction to the way Condor operates; after reading it you should understand enough of Condor to start using it right away. Further information, very detailed documentation [1] and FAQs can be found on Condor's website: http://www.cs.wisc.edu/condor/

 

Bibliography

1
Team Condor.
Condor Version 6.4 Manual.
University of Wisconsin-Madison, August 2002.



Footnotes

1
As opposed to a High Performance Computing (HPC) environment, which delivers very high performance over short periods of time.




 

© Copyright 2004 Bruno Goncalves - All rights reserved
