In this post I’ll try to explain the vision behind a software suite I wrote at the Niels Bohr Institute, University of Copenhagen, 2009-11.
The ideas for the software arose from my dual involvement with the high energy physics and the e-science groups. Some of the code originated in previous projects, including distributed computing efforts of the ATLAS experiment at CERN, the LCG project and the KnowARC project. Moreover, various general open-source libraries are used.
GridFactory was a research project in distributed computing with the aim of helping to push the envelope in scientific collaboration – using the Internet not just for communication or donation of CPU-power, but for concrete collaborative work through the sharing of computational resources.
In particular by making it simple to:
- form and dissolve collaborations
- provision compute clusters of heterogeneous, distributed resources
- use such clusters in an intuitive and collaborative way for scientific research
The main results of the project were:
- a handful of papers,
- a pilot implementation of the ideas exposed in the papers and below,
- actually helping scientists carry out heavy-duty computations,
- and, of course, contributing to the education of a few individuals into experts in data processing.
The project took a fresh look at the then decade-old grid computing concept and incorporated ideas from utility and cloud computing, focusing on improving the user experience.
Below follows a more detailed explanation of these ideas.
Batch systems allow the sharing of a single large computing facility by multiple users. A central concept is that of “jobs”, which users can submit to job queues. Users are typically offered a command-line interface for this. Commonly used batch systems included: LSF, PBS, SGE, LoadLeveler and Condor.
These batch systems had one thing in common: once a job is submitted, the server pushes the files making up the job to a worker node. This means that all worker nodes must run a network service listening for jobs. Moreover, most of the systems rely on a shared file system and on user accounts existing on both the server and the worker nodes. All this effectively means that the server and the worker nodes must be on the same local area network.
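The push model above can be sketched as a toy dispatcher (class and method names are invented for illustration): the server keeps a FIFO queue and actively sends each job to a worker's listening endpoint, which is why every worker must be reachable from the server.

```python
from collections import deque

class Worker:
    """Stands in for a node running a network service that accepts pushed jobs."""
    def __init__(self, name):
        self.name = name
        self.received = []

    def accept(self, job):
        # In a real batch system this would be the listening service
        # into which the server pushes the job's files.
        self.received.append(job)

class PushServer:
    """FIFO job queue; the server decides where each job runs and pushes it."""
    def __init__(self, workers):
        self.queue = deque()
        self.workers = workers

    def submit(self, job):
        self.queue.append(job)

    def dispatch(self):
        # Round-robin push: requires a direct network path to every worker,
        # effectively tying server and workers to the same LAN.
        i = 0
        while self.queue:
            self.workers[i % len(self.workers)].accept(self.queue.popleft())
            i += 1

workers = [Worker("wn1"), Worker("wn2")]
server = PushServer(workers)
for j in ("job-a", "job-b", "job-c"):
    server.submit(j)
server.dispatch()
print(workers[0].received)  # ['job-a', 'job-c']
```

The point of the sketch is the direction of the arrows: the server initiates every connection, so the workers must listen — the inverse of the pull paradigm discussed further down.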
Most of the batch systems listed above offered a common programming interface for job submission: DRMAA. The DRMAA specification (v1.0) does not support transferring input and output files from/to remote locations, nor does it specify how to authenticate with a remote system. To allow submitting jobs to remote sites, these sites must therefore run a service in front of the batch system. In the grid community, a widely supported web service interface for remote job submission was BES (the OGF Basic Execution Service). Examples of middleware projects that implemented BES were EGEE, NorduGrid and Unicore. There also existed a more stand-alone BES implementation based on gSOAP. The many grid computing projects of the past decade aimed at enabling global, distributed computing, but met with limited success outside of the production use by CERN/LHC.
Server rental offerings of varying degrees of automation have been around for some time. Examples included RackSpace, ServePath and ThePlanet. Rental of virtual machines is offered by Amazon, Google, GoGrid and others. These machines are controlled by the user either via a web interface or through a RESTful web service interface. Such offerings are usually referred to as cloud computing. Some vendors, notably OpenStack, OpenNebula, CloudStack, Enomaly and Eucalyptus, offer software – in principle allowing anyone to set up a cloud.
The idea behind GridFactory was to extend a traditional job-oriented batch system interface with file staging capabilities plus a catalog of software packages. The system shares many characteristics with traditional grid systems, but in contrast to these it is not built on top of batch systems: it is a batch system extended with WAN capabilities. The system also shares many characteristics with cloud systems, but in contrast to these it is not machine oriented but job oriented: a virtual machine is seen as just another piece of software that a given job may need in order to run. GridFactory also shares some characteristics with utility computing systems like BOINC, but in contrast to BOINC it supports virtualization, is a general-purpose batch system – allowing any executable to be run – and moreover allows any user to build up a virtual machine and software catalog.
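To make the job-oriented view concrete, a job description in this model bundles an executable with staging directives and software requirements, where a VM image is just another catalog entry. The sketch below is illustrative only – the field names, URL and catalog entries are invented, not GridFactory's actual schema:

```python
# Illustrative only: field names, URL and catalog entries are invented,
# not GridFactory's actual job schema.
job = {
    "executable": "analyze.sh",
    "arguments": ["--events", "10000"],
    "inputFiles": ["https://data.example.org/run42/hits.root"],  # staged in before the job starts
    "outputFiles": ["histograms.root"],                          # staged out when it finishes
    "runtimeEnvironments": ["ROOT-5.26"],   # resolved against the software catalog
    "virtualMachine": "sl5-base",           # a VM image is just another software package
}

def requirements_met(job, catalog):
    """A resource should only take on a job whose software requirements
    (including any required VM image) it can satisfy from the catalog."""
    needed = set(job["runtimeEnvironments"]) | {job["virtualMachine"]}
    return needed <= catalog

print(requirements_met(job, {"ROOT-5.26", "sl5-base", "python-2.6"}))  # True
print(requirements_met(job, {"sl5-base"}))                             # False
```

Treating the VM image as one more requirement alongside ordinary packages is what makes the system job oriented rather than machine oriented: the user never asks for a machine, only for what the job needs.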
GridFactory was designed for running large numbers of independent compute jobs, i.e. jobs that don’t communicate with each other. While multi-CPU provisioning and MPI support could be added, this was not a priority at the time. MapReduce was likewise outside the scope of GridFactory.
As such, potential users included:
- high-performance computing users who want:
  - on-the-fly provisioning of virtual machines and software
  - easy installation and configuration – at the price of fewer scheduling features than traditional batch systems offer
  - an intuitive GUI for managing large numbers of independent compute jobs
  - scaling across multiple sites
- companies and academic institutions looking for a simple way to analyze large amounts of scientific data, process log files, render 3D images or convert media files
- companies and academic institutions looking for a way to streamline data processing workflows – keeping track of input and output files as well as operating system and software requirements.
What distinguishes GridFactory from traditional batch computing systems is the following:
- WAN support: There is no dependence on shared file systems, ssh login or other LAN artifacts. Remote job submission is inherently supported.
- Network security: All peers in a grid are (optionally) identified by an X.509 certificate and all communication is secure.
- Ease of use: To make network configuration as simple as possible, a pull paradigm is employed for the communication between a server and its worker nodes.
- Host security: Worker nodes can be protected from rogue computing jobs by configuring them to run all jobs inside virtual machines.
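The pull paradigm can be sketched as a worker-side loop: the node polls its server and only ever makes outbound connections, so no service has to listen on the worker and it can sit behind NAT or a firewall. In the sketch below the HTTPS calls are replaced by in-memory stand-ins, and the endpoint mentioned in the comments is hypothetical:

```python
import time

def pull_loop(fetch_job, run_job, report, poll_interval=10, max_iterations=None):
    """Worker-side pull loop: outbound requests only.
    fetch_job() stands in for an HTTPS GET against the server
    (hypothetical endpoint, e.g. /gridfactory/jobs?status=ready);
    report() stands in for uploading status and output files."""
    n = 0
    while max_iterations is None or n < max_iterations:
        job = fetch_job()            # returns None when no job is available
        if job is not None:
            result = run_job(job)    # optionally executed inside a virtual machine
            report(job, result)
        else:
            time.sleep(poll_interval)  # back off when the queue is empty
        n += 1

# Toy demo with in-memory stand-ins for the HTTPS calls:
queue = ["job-1", "job-2"]
done = []
pull_loop(lambda: queue.pop(0) if queue else None,
          lambda j: f"{j}: ok",
          lambda j, r: done.append(r),
          poll_interval=0, max_iterations=3)
print(done)  # ['job-1: ok', 'job-2: ok']
```

Because the worker initiates every connection, network configuration reduces to letting the worker reach the server over HTTPS – there is nothing to open up or forward on the worker side.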
What distinguishes GridFactory from traditional grid computing systems is the following:
- Fewer abstraction layers, less bloat: The server part of GridFactory is implemented directly as Apache modules, exhibiting a straightforward RESTful interface. No application servers, no SOAP. Since wide area networks are inherently supported, the complication of interfacing a grid middleware stack with batch systems is eliminated. Moreover, no attempt is made to support transport mechanisms other than HTTPS.
- Standards compliance: Only standard security libraries are used. Only plain and standard HTTPS is used. Only proven open-source technology is used: Apache, MySQL, KVM, VirtualBox, Java. For job submission, neither BES nor WSRF is used, but instead a simple RESTful interface.
- Platform support: All common platforms are supported: most of the GridFactory code is 100% pure Java – the non-Java code is restricted to the server and consists of two Apache modules written in C, which compile on all major Linux distributions (binaries are provided).
- No central bottlenecks: The hierarchical pull architecture leaves the decision of which jobs to run to the resources running them and therefore avoids central bottlenecks and single points of failure like resource brokers and central information systems.
- Scalability: The hierarchical pull architecture supports multiple parents and the submission clients support multiple submission target hosts. Therefore, load balancing is built in and performance scales linearly with the number of servers.
- Virtual organizations can easily be created and destroyed.
- Software catalogs can easily be created, published and used.
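To illustrate what "plain and standard HTTPS, no SOAP" amounts to on the client side, job submission in such a design is just an authenticated HTTPS request with the job description as the body. The sketch below only builds the request object without sending it; the path and payload are invented for illustration and do not reproduce the actual GridFactory REST API:

```python
import json
from urllib.request import Request

def make_submit_request(server, job):
    """Build an HTTPS job-submission request. The path and payload are
    illustrative, not the actual GridFactory REST API. In practice the
    client would additionally present its X.509 certificate to the server
    when opening the TLS connection."""
    body = json.dumps(job).encode("utf-8")
    return Request(f"https://{server}/gridfactory/db/jobs",  # hypothetical path
                   data=body,
                   headers={"Content-Type": "application/json"},
                   method="PUT")

req = make_submit_request("gf.example.org",
                          {"executable": "analyze.sh",
                           "outputFiles": ["out.root"]})
print(req.get_method(), req.full_url)
```

Since the interface is nothing more than HTTPS verbs on resource URLs, and the submission clients can be pointed at multiple target hosts, load balancing falls out naturally: a client simply rotates the `server` argument over the available servers.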