In this post, I’ll take a look at some runs of the POV-Ray application available in the GridPilot app store. To import the app, just choose “File → Import application”, navigate to the relevant folder and click “OK”.
This application is a bit more sophisticated than the one used for the simple benchmarking described in a previous post: each job renders a slice of the final image, and all slices are combined at the end. The image was 768×2024 pixels in size and was split into 11 slices. The amount of data involved is small compared to the running time of each job, which makes the app ideal for running on a remote resource like Amazon’s EC2.
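The slicing scheme can be sketched roughly as follows. This is not the GridPilot application’s actual code, just a minimal illustration assuming POV-Ray’s `+SR`/`+ER` (start/end row) and `+O` (output file) options; the scene file name and output names are made up.

```python
# Hedged sketch: split a render into horizontal slices, one POV-Ray
# command per slice. Each job would render only its own band of rows;
# the partial images are combined afterwards (not shown here).

def slice_rows(height, n_slices):
    """Return (start_row, end_row) pairs (1-based, inclusive) covering the image."""
    base, extra = divmod(height, n_slices)
    rows = []
    start = 1
    for i in range(n_slices):
        size = base + (1 if i < extra else 0)
        rows.append((start, start + size - 1))
        start += size
    return rows

def povray_commands(scene="scene.pov", width=768, height=2024, n_slices=11):
    """One command line per slice job (file names are hypothetical)."""
    return [
        f"povray +I{scene} +W{width} +H{height} "
        f"+SR{sr} +ER{er} +Oslice_{i:02d}.png"
        for i, (sr, er) in enumerate(slice_rows(height, n_slices))
    ]
```

With the dimensions used here, 2024 rows divide evenly into 11 bands of 184 rows each, so every job gets the same amount of work.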
My GridFactory cluster on EC2 with voluntary worker nodes, described in a previous post, had grown to 11 instances. I decided to use this application to see what kind of performance I could get from such an ad-hoc cluster. Note that the cluster was composed of contributed worker nodes I had no control over, and judging from the running times, most of the instances were of the smallest possible kind.
For comparison, I also ran on a non-virtualized GridFactory cluster with 4 worker nodes, each with 4 Intel i7 cores and each running at most 4 jobs in parallel. The results are summarized below.
| | EC2 voluntary grid | Dedicated hardware (no virtualization) |
|---|---|---|
| Average submission time per job (s) | 1.0 | 0.82 |
| Summed running time (s) | 16106 | 2226 |
| User real waiting time (submission, processing and data transfer) (s) | 4164 | 565 |
Notes: It’s pretty clear that for this kind of very small run, one is very vulnerable to the “tail” problem: the last few jobs that never seem to finish because they landed on very slow nodes. This problem is present on infrastructures without guaranteed quality of service, like public/academic grid infrastructures and the ad-hoc EC2 cluster used above. The worker nodes of this cluster were probably doing other things apart from running the occasional GridFactory job, and the variance of the execution times was consequently rather dramatic: from 98 to 4036 seconds. With larger productions, the tail problem should become less pronounced: slow resources will typically just snatch a few jobs and chew along on those, while bigger resources process the bulk of the jobs, so the slow nodes won’t hurt the overall processing time much.
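The tail effect can be illustrated with a toy scheduler: assign each job to the worker that becomes free first, and compare a small run to a large one. The per-node speeds below are made up; they only echo the observed 98–4036 s spread, and the scheduler is a plain greedy one, not GridFactory’s actual logic.

```python
# Hedged illustration of the "tail" problem on a heterogeneous cluster.
import heapq

def makespan(n_jobs, worker_speeds, job_cost=100.0):
    """Time until the last job finishes; speed 1.0 means job_cost seconds per job."""
    # Min-heap of (time the worker becomes free, worker speed).
    heap = [(0.0, s) for s in worker_speeds]
    heapq.heapify(heap)
    finish = 0.0
    for _ in range(n_jobs):
        t, s = heapq.heappop(heap)      # worker that frees up first
        t += job_cost / s               # it runs one more job
        finish = max(finish, t)
        heapq.heappush(heap, (t, s))
    return finish

speeds = [1.0] * 10 + [0.05]   # ten fast nodes, one 20x slower node
small = makespan(11, speeds)   # one job per node: the slow node sets the makespan
large = makespan(1100, speeds) # fast nodes absorb the bulk; far better time per job
```

In the small run every node gets a job, so the single slow node dominates the total time; in the large run the fast nodes process almost everything and the time per job drops by an order of magnitude.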
One way to avoid the tail problem is to run on dedicated hardware, which is what I did in the last run. This run demonstrated that there is no free lunch on EC2: on dedicated hardware, the production ran about 7 times faster, and in fact the numbers show that I could have run the production almost twice as fast on a single core of my desktop than on the 11 nodes on EC2.
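Both claims follow directly from the table above. A single dedicated core working through all the slices serially would need roughly the summed running time, 2226 s, which is still well under the 4164 s the EC2 run took end to end:

```python
# Sanity check of the speedup claims, using the numbers from the table.
ec2_wait, dedicated_wait = 4164, 565  # user real waiting time (s)
dedicated_sum = 2226                  # summed running time, dedicated cluster (s)

cluster_speedup = ec2_wait / dedicated_wait      # ~7.4: "7 times faster"
single_core_speedup = ec2_wait / dedicated_sum   # ~1.9: "almost twice as fast"
```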