Super linearity
Introduction
There's been quite a bit of debate over the past few years about
whether or not it is possible for a parallel machine to
"find the answer" to a problem more than n times
faster than an "equivalent" sequential machine. The gloomy prospects
for parallel speedups being (as some authors have argued) around
O(log(n)) or even worse have dissuaded many from looking into
massive parallelism.
The following small simulation is a "showth by construction" that
there are some problems that can be solved on a parallel machine
faster than in "linear time".
Scenario
In this simulation a machine -- be it sequential or parallel -- must
execute at least MAX_JOBS independent tasks. Such tasks are
typical in "hard problems" where, e.g., an independent set of
"probes" into a search space must be made and the best of the results
selected as a "near optimal" answer to the problem.
We further require that no job may be aborted before the results from
MAX_JOBS are obtained.
In the present scenario a "job" is known to have a rather contrived
distribution of execution time. With probability P (> 1/2) it will execute
for 1 unit of time, but with probability 1-P it will take up
to MAX_TIME units of time.
Sequential machine
The sequential machine can only execute MAX_JOBS jobs, one after
the other. In this present scenario we do not allow any machine to
abort any job before it reaches its quota,
if it seems that job will run for more than 1 unit of time.
Parallel machine
The parallel machine has a number n of independent processing
elements (PE's) and their memories.
A scheduling mechanism that has no overhead can allocate an in-coming
job to any idle PE. The scheduler is FCFS --
the next in-coming job is started on the next available PE.
After MAX_JOBS tasks have been completed the parallel simulation
ends "instantly". I.e. no terminating overhead is assumed.
Toy simulation
Output
Discussion
The columns show, in order, the number of PE's in the simulated parallel
machine, the simulated sequential time, the simulated parallel time,
the speedup (== sequential time/parallel time) and the "efficiency"
(== speedup/number of PE's in parallel machine).
There are a number of factors that contribute to the "super linear"
effect. For example, after the parallel machine has executed its
job quota is can simply abort any still-running PE's. Such aborted
jobs are likely to be those that run for MAX_TIME time units.
Our rules said that no jobs were allowed to be aborted before the
quota was completed -- hence the sequential computer just had to
executed every "long" job it came across (i.e. P -- around 10% -- of
all the jobs it ran).
This "aborting" effect is seen to increase the efficiency by a
small amount as the number of PE's approaches around 500.
(The careful reader will also note the efficiency falls off
toward 512 but then increases dramatically toward 1000 PE's and
then appears to approach an asymptote).
But the larger effect -- and the one I concentrate on here --
is due to what I'll call "priority PE's".
The "num prio PE's" column, above, shows the average number (over
MAX_IT samples) of PE's of the simulated parallel machine that --
by chance alone -- found they ran only "short" jobs of 1 time unit
for the entire duration of the simulation.
The "num jobs on prio PE's" column shows the number of jobs that
were executed by such "priority PE's".
As the number of PE's exceeds around 500 we find
the majority of PE's are running short jobs only.
Hence the parallel machine does not
have to execute very many of the "long jobs"
in order to complete its quota of MAX_JOBS (== 5000) jobs.
When the number of PE's reaches around 1000 the
"priority PE's" are responsible for MORE than the required quota of
5000 jobs.
Hence the parallel time tends to drop to \ceiling(5000*1/n) while the
expected sequential time remains at 5000*(.9*1+.1*1000) == 504500.
I.e. the speedup is tending to around 100*n and is moderately "super linear".
Small generalisations
The super-linearity doesn't "go away" if some forms of
overhead are introduced to the parallel machine.
For example, we can allow a fixed overhead for
scheduling each job. Allowing 1 time unit of scheduling overhead
per job we obtain:
Table 2
Allowing an additional lg(n) units of time for termination
of the parallel machine gives us:
Table 3
With the addition of these modest overheads -- equivalent to the runtime of
additional "short" jobs -- the super-linear effect is still seen.
Other things to see
-
Performance Models for Vector and
Parallel Machines
-
[The basics, from UKY's
Computer Architecture notes].
-
Evaluation of parallel performance
-
[By U. Ruede].
-
Theory of Parallel Algorithms Parallel Speedup
-
[SDSU CS662 page. Includes
Superlinear Speedup in Theory
].
-
Any pointers about superlinear speedup?
-
On a conference I attended someone mentioned cases of superlinear
speedup by parallelisation. Apart from cases where certain
threshold values (e. g. cache sizes) play an important part,
he told me of an example where several linear lists are searched
by so many processors, which turn out to be capable to do this
in less than 1/n of the time needed by one processor (which has to
do a lot of context switching).
[From <http://www.ntua.gr:80/parallel/internet/usenet/comp.parallel/articles/1994/09-Sep/000088>].
-
Scalability & Speed-up
-
Scalability concerns with how the performance of a given partitioning changes when additional computers are
made available. It is seen in general that scalability increases as the number of processors employed to solve the
problem increases. However a stage is reached beyond which any further increase in the number of processors does
not increase the speed of execution.
Kym Horsell /
Kym@KymHorsell.COM
ADVISORY:
Email to these sites is filtered. Unsolicited email may
be automajically re-directed to the relevant postmaster.