In this post I’ll give an overview of the GridFactory software suite (including GridPilot) and provide the minimum information to get started as well as pointers to more thorough documentation.
Contents
- 1 Introduction
- 2 User guides
- 3 Advanced topics
Introduction
Overview
GridFactory is a software suite composed of the following programs:
GridFactory server
- The GridFactory server is the software running on the server to which jobs are submitted. It maintains a queue of computing jobs that GridWorkers download (pull) and run.
GridWorker
- GridWorker is the application that runs on so-called worker nodes. It picks up computing jobs from the GridFactory server, executes them and transfers the results back to the GridFactory server. The application can run both in GUI mode (suitable for desktops) and in command-line mode (suitable for servers).
GridFactory command-line tools
- The GridFactory command-line tools are a software package for submitting jobs, subsequently querying their status and stopping them if necessary.
GridPilot
- GridPilot is a GUI application designed to manage large numbers of jobs. It streamlines job preparation, validates that jobs have finished correctly and registers output files in file catalogs. It is more general than GridFactory in the sense that it can also run jobs on computing systems other than GridFactory, in particular on legacy grid systems and directly on the Amazon EC2 cloud.
Certificates
The GridFactory software assumes that you have installed SSL credentials (X.509 certificate and RSA key) in a default place. If none are found, you are asked to locate them. If you don’t have any, default credentials are used.
If you have worked with traditional grid software, you will most likely already have working credentials installed.
The GridFactory software will only trust and use these credentials if they were issued by a known certification authority (CA). To browse GridFactory services without warnings from your web browser, you should install the default CA certificate of GridFactory. You may also want to install the default client test certificate.
For more details, see “User guides” and “Advanced topics”.
Virtual organizations
With GridFactory, people and computing resources are identified by X.509 certificates. Every X.509 certificate has a unique subject or distinguished name string (DN). To GridFactory, a virtual organization (VO) is simply a set of DNs. More specifically, a VO is defined by a text file served by a web server, containing a newline-separated list of DNs. Thus, creating a virtual organization is simple: just create a text file containing a list of distinguished names and put it on a web server, so that it can be accessed as e.g. some.server.com/my_vo.txt. A default installation of a GridFactory server will serve such a text file at my.server.org/vos/default.txt.
To become a member of a VO, you’ll therefore have to get your DN into such a text file. Typically, there will be a web form to fill out or just an email to send.
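Since a VO is nothing more than a newline-separated list of DNs, checking membership boils down to an exact line match. Here is a minimal sketch, assuming you already have a local copy of the VO file. The function name `dn_in_vo` and the example DN are made up for illustration and are not part of GridFactory.

```shell
# Hypothetical helper (not part of GridFactory):
# succeed if the given DN is listed in a local copy of a VO file.
# Usage: dn_in_vo "<DN>" <vo-file>
dn_in_vo() {
    # -F: treat the DN as a fixed string, not a regular expression
    # -x: the whole line must match, so a partial DN does not count
    # -q: print nothing, only report the result via the exit status
    grep -Fxq -e "$1" "$2"
}

# Example (placeholder DN and file name):
# dn_in_vo "/O=Example/CN=Alice" my_vo.txt && echo "member"
```

Matching whole lines matters here: a DN that is merely a prefix of another member's DN should not be accepted.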
To find out what your DN is: in GridPilot, select
“Help” → “Credentials info”
in GridWorker, select
“Help” → “Show my distinguished name”
Allowing a virtual organization on my resources
To tell GridWorker that members of a given VO, defined by, say, some.server.com/my_vo.txt, are allowed to run jobs on your computer, you must start GridWorker with the flag -v, like
gridworker -v some.server.com/my_vo.txt
To allow multiple VOs to run jobs on your computer, use the -v flag several times.
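As an illustration, the following sketch composes such a command line with one -v flag per VO. The helper `gridworker_vo_cmd` and the second VO URL are hypothetical. The helper only prints the command, it does not start GridWorker.

```shell
# Hypothetical helper: print a gridworker command line
# with one -v flag for each VO URL given as an argument.
gridworker_vo_cmd() {
    cmd="gridworker"
    for vo in "$@"; do
        cmd="$cmd -v $vo"
    done
    printf '%s\n' "$cmd"
}

gridworker_vo_cmd some.server.com/my_vo.txt other.server.org/other_vo.txt
# prints: gridworker -v some.server.com/my_vo.txt -v other.server.org/other_vo.txt
```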
Applications, executables and runtime environments
In GridPilot terminology, an application of an executable consists of running it a given number of times with different parameters and/or input files.
To get you started, a number of applications are described on the blog. You can import these into GridPilot simply by selecting
“File” → “Import application(s)”
The idea is that you can customize such an application by editing the various parameters that define the application.
One such parameter is which executable to run. The executable is defined by an executable record, which can also be edited. The parameters of an executable record include the actual file (script or binary) to execute, input files and which runtime environment to use.
Runtime environments are provided by the computing systems (GridFactory) and are thus, in contrast to executables, not shipped with the job. A GridFactory server typically downloads and caches runtime environments from another GridFactory server.
User guides
- GridPilot quick start
- A short guide to getting started with running jobs from the GridPilot GUI.
- GridPilot user’s guide
- An older, CERN/ATLAS-biased, comprehensive guide to using GridPilot, including various concrete but older examples.
- GridFactory quick start
- A short guide to getting started with deploying a GridFactory server with attached worker nodes and running jobs from the command line.
Advanced topics
The design of GridFactory
If you’re interested in the design considerations and architecture of GridFactory, see the original technical design report. A more up-to-date technical description of the GridFactory web services is also available.
Certificates again
GridFactory uses SSL/X.509 certificates for identifying computing resources. Running instances of GridFactory server, GridWorker and GridPilot are all identified by a signed X.509 certificate and associated RSA key. The certificate must be signed by a trusted certificate authority (CA). Currently, GridFactory does not provide any functionality for generating, requesting or manipulating keys and certificates, so it’s up to you to get a certificate/key and convert it to X.509/RSA format (PEM) if necessary.
By default, GridFactory trusts a selected set of CAs. This set consists of the EUGridPMA distribution plus the certificate of our partner Cabo. Each of the components of GridFactory keeps these certificates in a certain directory:
GridWorker: “~/.GridWorker/certificates”
GridPilot: “~/.GridPilot/certificates”
GridFactory server: “/var/www/grid/certificates”
GridFactory command-line tools: “~/.globus/certificates”
These directories are default locations and can be modified by command-line options and/or configuration file settings.
If the certificate of your CA is not among the default certificates of GridFactory you should add it to the relevant directories.
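A minimal sketch of adding a CA certificate to several of these directories in one go is shown below. The helper name `install_ca_cert` and the file name my_ca.pem are made up for illustration. It is also an assumption that simply copying the PEM file into the directory is enough, since this post does not cover whether GridFactory expects a particular file naming there.

```shell
# Hypothetical helper: copy a CA certificate into every directory given.
# Usage: install_ca_cert <ca-cert.pem> <dir>...
install_ca_cert() {
    ca_cert=$1
    shift
    for dir in "$@"; do
        # Create the directory if it is missing, then copy the certificate into it.
        mkdir -p "$dir" && cp -- "$ca_cert" "$dir"/ || return 1
    done
}

# Example with the per-user default directories listed above
# (my_ca.pem is a placeholder file name):
# install_ca_cert my_ca.pem ~/.GridWorker/certificates ~/.GridPilot/certificates ~/.globus/certificates
```

The server directory /var/www/grid/certificates would normally require root privileges, so it is left out of the example call.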
All this said, you can run GridWorker and GridPilot without a personal certificate/key. In that case you will simply be using the default certificate/key and you will be able to run jobs on our test clusters, but your permissions on other resources are likely to be very limited. To run a GridFactory server you always need a certificate with a distinguished name that matches the DNS name of the server.
Using Apache httpd as a secure file server
GridFactory uses Apache httpd for its RESTful web service. This is done via the standard Apache modules mod_ssl and mod_dav, plus two extra Apache modules, mod_gacl and mod_gridfactory, that are part of the GridFactory server distribution.
The httpd+mod_ssl+mod_dav+mod_gacl combo can also be used to serve files other than the input and output files of compute jobs, i.e. it can be used to set up an HTTPS file server with GACL authorization.
For more information, see the paper on authorization and virtual organizations.
Using MySQL as a secure file catalog
GridPilot can keep its job information on a MySQL server and can also use MySQL servers as file catalogs. For this to work, you need to be authorized to read and write over the internet to a MySQL database on a server somewhere. The administrator of the server can authorize you either via a traditional user name and password or via your X.509 certificate.
GridWorker deployment strategies
Deployment of GridFactory consists of setting up one or several servers, instructing the “clients” with computing needs to install GridPilot and then deploying GridWorkers on a possibly large number of computers. One of the trickiest parts is likely to be the large-scale deployment of GridWorkers. The common scenarios fall into two categories:
Replacement of a traditional batch system
This means allowing a group of users to use a farm of dedicated computers in a server room and being able to give them some amount of fair share. On such computers, GridWorker should be started in command-line (non-GUI) mode, i.e. with the flag -n. For a simple deployment example, see the document “Preliminary tests of GridFactory”. For larger deployments you will probably want to automate things via shell scripts.
Cycle scavenging
This means using a group of desktop computers as a computing resource. The deployment is simple: just tell your desktop users to get and install GridWorker and start it up whenever they are not using their desktop. When they return to their desktop, they should feel free to either shut down GridWorker completely or pause all running jobs.
The obvious potential problem here is lost jobs. The GridFactory server deals with this in the following way: a GridWorker that has not given any sign of life for a given amount of time is considered dead and its jobs are reassigned, i.e. started all over again on another desktop.
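The timeout rule can be illustrated with a small sketch. This is not the server's actual implementation: the function name `dead_workers`, the input format (one worker ID and last-seen timestamp per line) and the numbers are all assumptions made for the example.

```shell
# Hypothetical sketch of the timeout rule, not GridFactory code:
# print the IDs of workers whose last sign of life is older than the timeout.
# Input on stdin: one "worker_id last_seen_epoch_seconds" pair per line.
# Usage: dead_workers <now_epoch_seconds> <timeout_seconds>
dead_workers() {
    awk -v now="$1" -v timeout="$2" '(now - $2) > timeout { print $1 }'
}

# worker-a was last seen 1000 s ago, worker-b 100 s ago; the timeout is 600 s.
printf '%s\n' 'worker-a 1000' 'worker-b 1900' | dead_workers 2000 600
# prints: worker-a
```

The jobs of every worker printed this way would be the ones to put back in the queue.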
Creating your own software catalog
A GridFactory software catalog is an XML file in a specific format, listing software packages and their dependencies. These packages are tarballs in a specific format (directory layout and some mandatory files). To set up your own catalog and repository, the easiest approach is probably to download an existing catalog, like gridfactory.nbi.dk/rtes/rtes.xml, and strip out all but a few packages. Then put the file on a web server, download the packages, put them on the web server as well and change the catalog to reflect their new locations. To use this catalog, configure your GridFactory server by setting
RTE_URLS = my.new.location/my_catalog.xml
in /etc/gridfactory.conf. GridFactory will then download and republish this catalog under my.gridfactory.server/gridfactory/rtes/rte_catalog.xml. For more details, see the paper on the GridFactory software repository.
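If you want to script the RTE_URLS change shown above, a sketch along the following lines could do it. The helper `set_rte_urls` is hypothetical, and it assumes the configuration file consists of simple "KEY = value" lines like the one shown above.

```shell
# Hypothetical helper: point a GridFactory-style config file at a new catalog.
# Assumes "KEY = value" lines, as in the RTE_URLS example above.
# Usage: set_rte_urls <catalog-url> <config-file>
set_rte_urls() {
    url=$1
    conf=$2
    [ -f "$conf" ] || return 1
    tmp=$(mktemp) || return 1
    # Keep every line except an existing RTE_URLS setting
    # (grep exits non-zero when no lines remain, which is fine here).
    grep -v '^[[:space:]]*RTE_URLS[[:space:]]*=' "$conf" > "$tmp" || true
    # Append the new setting, then write back in place to keep the file's permissions.
    printf 'RTE_URLS = %s\n' "$url" >> "$tmp"
    cat "$tmp" > "$conf" || { rm -f "$tmp"; return 1; }
    rm -f "$tmp"
}

# Example (needs write access to the file):
# set_rte_urls my.new.location/my_catalog.xml /etc/gridfactory.conf
```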
Using GridFactory as a back-end for traditional grids
Since GridFactory can replace – and provides command-line tools that are very similar to those of – a traditional batch system, it is straightforward to use GridFactory as a back-end for traditional grid systems like gLite and ARC, provided by EGEE and NorduGrid respectively. In fact, this has already been done for NorduGrid/ARC – see svn.nordugrid.org.
Submitting to localhost
If you’re submitting jobs from the GridFactory server itself, you may do so without using X.509 authentication, but at the price of reduced security. To be more precise: all users on your system will be able to modify each other’s jobs and the running GridFactory server processes. In other words, all users must be absolutely trusted. Here are the necessary steps:
- add the line
umask u=rwx,g=rwx,o=r
at the beginning of the init scripts spoolmanager, queuemanager, pullmanager and httpd/apache2 as well as at the beginning of the psub and pclean scripts
- add the local users you want to be able to submit to the apache group (apache or www-data)
- modify the CLI scripts by appending some options to the invocation of the Java main method:
- for pcat: -b localhost -s root -f ""
- for pclean: -b localhost -s `whoami`
- for pkill: -b localhost -f "" -s root -g `whoami`
- for pstat: -b localhost -s `whoami` -f ""
- for psub: -b localhost
- append the following lines to the psub script:
chgrp -R apache /var/spool/gridfactory/jobs/* >& /dev/null
chmod -R g+rw /var/spool/gridfactory/jobs/* >& /dev/null