Super linearity


Introduction

There's been quite a bit of debate over the past few years about whether or not it is possible for a parallel machine to "find the answer" to a problem more than n times faster than an "equivalent" sequential machine. The gloomy prospects for parallel speedups being (as some authors have argued) around O(log(n)) or even worse have dissuaded many from looking into massive parallelism.

The following small simulation is a "showth by construction" that there are some problems that can be solved on a parallel machine faster than in "linear time".

Scenario

In this simulation a machine -- be it sequential or parallel -- must execute at least MAX_JOBS independent tasks. Such tasks are typical in "hard problems" where, e.g., an independent set of "probes" into a search space must be made and the best of the results selected as a "near optimal" answer to the problem. We further require that no job may be aborted before the results from MAX_JOBS are obtained.

In the present scenario a "job" is known to have a rather contrived distribution of execution time. With probability P (> 1/2) it will execute for 1 unit of time, but with probability 1-P it will take up to MAX_TIME units of time.

Sequential machine

The sequential machine can only execute MAX_JOBS jobs, one after the other. In this present scenario we do not allow any machine to abort any job before it reaches its quota, if it seems that job will run for more than 1 unit of time.

Parallel machine

The parallel machine has a number n of independent processing elements (PE's) and their memories. A scheduling mechanism that has no overhead can allocate an in-coming job to any idle PE. The scheduler is FCFS -- the next in-coming job is started on the next available PE. After MAX_JOBS tasks have been completed the parallel simulation ends "instantly". I.e. no terminating overhead is assumed.

Toy simulation

C program

Output

Table 1

Discussion

The columns show, in order, the number of PE's in the simulated parallel machine, the simulated sequential time, the simulated parallel time, the speedup (== sequential time/parallel time) and the "efficiency" (== speedup/number of PE's in parallel machine).

There are a number of factors that contribute to the "super linear" effect. For example, after the parallel machine has executed its job quota is can simply abort any still-running PE's. Such aborted jobs are likely to be those that run for MAX_TIME time units. Our rules said that no jobs were allowed to be aborted before the quota was completed -- hence the sequential computer just had to executed every "long" job it came across (i.e. P -- around 10% -- of all the jobs it ran).

This "aborting" effect is seen to increase the efficiency by a small amount as the number of PE's approaches around 500. (The careful reader will also note the efficiency falls off toward 512 but then increases dramatically toward 1000 PE's and then appears to approach an asymptote).

But the larger effect -- and the one I concentrate on here -- is due to what I'll call "priority PE's".

The "num prio PE's" column, above, shows the average number (over MAX_IT samples) of PE's of the simulated parallel machine that -- by chance alone -- found they ran only "short" jobs of 1 time unit for the entire duration of the simulation. The "num jobs on prio PE's" column shows the number of jobs that were executed by such "priority PE's".

As the number of PE's exceeds around 500 we find the majority of PE's are running short jobs only. Hence the parallel machine does not have to execute very many of the "long jobs" in order to complete its quota of MAX_JOBS (== 5000) jobs.

When the number of PE's reaches around 1000 the "priority PE's" are responsible for MORE than the required quota of 5000 jobs. Hence the parallel time tends to drop to \ceiling(5000*1/n) while the expected sequential time remains at 5000*(.9*1+.1*1000) == 504500.

I.e. the speedup is tending to around 100*n and is moderately "super linear".

Small generalisations

The super-linearity doesn't "go away" if some forms of overhead are introduced to the parallel machine. For example, we can allow a fixed overhead for scheduling each job. Allowing 1 time unit of scheduling overhead per job we obtain:

Table 2

Allowing an additional lg(n) units of time for termination of the parallel machine gives us:

Table 3

With the addition of these modest overheads -- equivalent to the runtime of additional "short" jobs -- the super-linear effect is still seen.

Other things to see

Performance Models for Vector and Parallel Machines
[The basics, from UKY's Computer Architecture notes].

Evaluation of parallel performance
[By U. Ruede].

Theory of Parallel Algorithms Parallel Speedup
[SDSU CS662 page. Includes Superlinear Speedup in Theory ].

Any pointers about superlinear speedup?
On a conference I attended someone mentioned cases of superlinear speedup by parallelisation. Apart from cases where certain threshold values (e. g. cache sizes) play an important part, he told me of an example where several linear lists are searched by so many processors, which turn out to be capable to do this in less than 1/n of the time needed by one processor (which has to do a lot of context switching).
[From <http://www.ntua.gr:80/parallel/internet/usenet/comp.parallel/articles/1994/09-Sep/000088>].

Scalability & Speed-up
Scalability concerns with how the performance of a given partitioning changes when additional computers are made available. It is seen in general that scalability increases as the number of processors employed to solve the problem increases. However a stage is reached beyond which any further increase in the number of processors does not increase the speed of execution.


Kym Horsell /
Kym@KymHorsell.COM

ADVISORY: Email to these sites is filtered. Unsolicited email may be automajically re-directed to the relevant postmaster.