Background
The recent development of parallel and distributed computing software has
introduced a variety of software tools. These tools can be classified by the
communication paradigm used to implement them, such as remote procedure call,
shared memory, or message passing. They also support several
parallel/distributed computing paradigms, such as data parallel, functional
parallel, and object model. This variety of tools, programming paradigms, and
languages makes selecting the best tool to run a given class of applications
on a parallel or distributed system a non-trivial task that requires some
research. We expect tool evaluation to receive more attention as the
deployment and usage of distributed systems increase.
There has been little research addressing this problem because most of the
available software tools are new and not yet fully developed.
The main objective of this research is to remedy this problem by developing a
multi-level evaluation methodology with the following objectives:
- Provide the capability to compare tools from different perspectives,
such as tool performance, software development, and application performance.
- Provide the capability to select the best tool and computer system
architecture for a given class of applications.
- Serve as a valuable tool to improve existing software tools by
identifying their deficiencies and their bottlenecks.
At the tool performance level, we focus on the performance of the tool
primitives when they run on different platforms (e.g., IBM-SP1, Alpha cluster,
SUN workstations) interconnected by different computer networks (e.g.,
Ethernet, FDDI, ATM). At the application performance level, we evaluate the
performance of four applications that are classified into four classes
(Numerical Applications, Signal and Image Processing, Simulation, and
Utilities).
At the software development level, we evaluate each tool from the software
development point of view (e.g., ease of programming, language support).
The tools considered in this study are Express, p4, and PVM.
However, the same techniques can be applied to evaluate other tools.
Experimental Results
Tool Performance Level
At this level, we compare the tools in terms of the performance of their
communication primitives, namely the snd/rcv (point-to-point) and broadcast
primitives. We also introduce a parameter, Relative Overhead with respect to
Socket (ROS), to measure the overhead incurred when a tool is implemented on
top of socket primitives.
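A snd/rcv measurement of this kind is commonly structured as a ping-pong test:
one side sends a message of a given size, the other echoes it back, and the
mean round-trip time is recorded. The sketch below is not the benchmark code
used in this study. It is a minimal Python illustration over a local socket
pair, and the names recv_exact, echo_peer, and ping_pong_time are hypothetical.

```python
import socket
import threading
import time


def recv_exact(sock, nbytes):
    """Read exactly nbytes from sock (a single recv may return fewer)."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = sock.recv(min(remaining, 65536))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def echo_peer(sock, nbytes, rounds):
    """Receive each message and send it straight back."""
    for _ in range(rounds):
        sock.sendall(recv_exact(sock, nbytes))


def ping_pong_time(nbytes, rounds=100):
    """Return the mean round-trip time in seconds for nbytes-sized messages."""
    a, b = socket.socketpair()  # local stand-in for two networked hosts
    peer = threading.Thread(target=echo_peer, args=(b, nbytes, rounds))
    peer.start()
    payload = b"x" * nbytes
    start = time.perf_counter()
    for _ in range(rounds):
        a.sendall(payload)
        recv_exact(a, nbytes)
    elapsed = time.perf_counter() - start
    peer.join()
    a.close()
    b.close()
    return elapsed / rounds


if __name__ == "__main__":
    for size in (1024, 16384, 65536):
        print(size, ping_pong_time(size))
```

Running the same loop once over raw sockets and once over a tool's send and
receive calls, for a range of message sizes, would give the pair of timings
that a socket-relative overhead measure needs.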
To compare the performance of tools running on different platforms and
networks, one needs a unitless quantity. The raw execution time of a
primitive cannot be used as a basis for comparison because not every tool is
supported on every platform. We therefore take the execution time of the
corresponding socket primitives as the ideal execution time for a tool
primitive, since the tool primitives are implemented on top of the BSD socket
communication library.
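The text does not give a closed-form definition of ROS. One plausible reading,
assumed here purely for illustration, is the extra time a tool primitive takes
beyond the equivalent socket primitive, normalized by the socket time. This
yields a unitless value that is 0 when the tool adds no overhead. The function
name relative_overhead and the timings in the example are hypothetical.

```python
def relative_overhead(tool_time, socket_time):
    """Unitless overhead of a tool primitive relative to the socket baseline.

    ASSUMPTION: ROS = (tool_time - socket_time) / socket_time. The source
    describes ROS only informally, so this formula is illustrative.
    """
    if socket_time <= 0:
        raise ValueError("socket_time must be positive")
    return (tool_time - socket_time) / socket_time


# Placeholder timings in milliseconds, not measured values.
print(relative_overhead(tool_time=1.5, socket_time=1.0))  # prints 0.5
```

Because both timings are taken on the same platform and network, the ratio
can be compared across environments even when the absolute times differ.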
Ethernet-based SUN Workstation Cluster
In this computing environment, the p4 implementation of point-to-point
communication performs best when compared with the PVM and Express
implementations. For the broadcast (group communication) primitive, Express
has the best performance while PVM has the worst. It is worth noting that the
tool with the best snd/rcv performance does not necessarily have the best
performance for broadcast/multicast primitives.
ATM-based SUN Workstation Cluster
We were not able to benchmark the broadcast primitive because the current
implementation of the ATM switch supports only two SUN IPX workstations.
The benchmark results for this environment are consistent with those of the
previous environment.
IBM-SP1
PVM has the best performance while Express has the worst performance in
point-to-point communication. Also, PVM has the best broadcast performance on
IBM-SP1. However, for small message sizes, all tools perform equally well.
FDDI-based ALPHA Cluster
p4 has the best snd/rcv performance whereas Express has the best broadcast
performance.
Application Performance Level
The applications chosen to benchmark the tools at this level include a
two-dimensional FFT (Fast Fourier Transform), JPEG (Joint Photographic Experts
Group) simulation, Monte Carlo Integration, and Parallel Sorting by Sampling.
We have benchmarked these applications on all platforms (IBM-SP1, Alpha
cluster, and SUN workstations). We also compare the results with those obtained
from running the same applications on the CM5 parallel computer.
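To show why an application such as Monte Carlo integration suits this kind of
benchmark, the sketch below splits the sampling into independent per-worker
partial sums that are combined at the end. It is not the benchmark code from
this study: the workers run sequentially here, whereas with a message-passing
tool each partial sum would be computed by a separate process and sent to a
master. The names partial_sum and monte_carlo_integrate are hypothetical.

```python
import random


def partial_sum(f, a, b, samples, seed):
    """Sum f at `samples` uniform random points in [a, b] (one worker's share)."""
    rng = random.Random(seed)
    return sum(f(rng.uniform(a, b)) for _ in range(samples))


def monte_carlo_integrate(f, a, b, samples_per_worker, workers):
    """Estimate the integral of f over [a, b] from per-worker partial sums."""
    # Each worker gets its own seed so the random streams differ.
    partials = [
        partial_sum(f, a, b, samples_per_worker, seed=w)
        for w in range(workers)
    ]
    total_samples = samples_per_worker * workers
    return (b - a) * sum(partials) / total_samples


if __name__ == "__main__":
    # Integral of x^2 over [0, 1]; the exact value is 1/3.
    print(monte_carlo_integrate(lambda x: x * x, 0.0, 1.0, 20000, 4))
```

Only one number per worker has to be communicated, so this workload stresses
computation far more than the communication primitives.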
On the ALPHA cluster, the p4 implementations of JPEG simulation and 2D-FFT
performed best, whereas the PVM and Express implementations were best for
sorting and Monte Carlo integration, respectively.
On the IBM-SP1, the results are consistent with those obtained on the ALPHA
cluster. However, the execution times are significantly higher on the IBM-SP1
than on the ALPHA cluster.
Comparing the results with those of the CM5 machine, the ALPHA cluster
outperforms the other two platforms for the benchmarked applications.
Furthermore, this shows that a distributed system built around
high-performance workstations and a high-speed network can provide
high-performance computing comparable to that offered by parallel computers
and/or supercomputers.
Application Development Level
At this level, we quantify the application development properties of each
tool according to the following criteria: Language Support, Interface and
Integratability, Ease of Programming, Debugging Capability, Tailoring the
Software, Error Handling, Programming Models Supported, Run-time Support for
Parallel I/O, and Portability.
Among the tools studied, Express is the best with respect to application
development facilities.
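The text does not state how the individual criteria are combined into an
overall ranking. One common approach, assumed here only as an illustration,
is a weighted mean of per-criterion ratings. The function development_score,
the rating scale, and the weights are all hypothetical, and no ratings from
the study are reproduced.

```python
CRITERIA = [
    "language support",
    "interface and integratability",
    "ease of programming",
    "debugging capability",
    "tailoring the software",
    "error handling",
    "programming models supported",
    "run-time support for parallel I/O",
    "portability",
]


def development_score(ratings, weights=None):
    """Weighted mean of per-criterion ratings; equal weights by default.

    ASSUMPTION: a weighted mean is used; the source does not specify
    how the criteria are aggregated.
    """
    if weights is None:
        weights = {c: 1.0 for c in CRITERIA}
    missing = [c for c in CRITERIA if c not in ratings]
    if missing:
        raise KeyError(f"missing ratings for: {missing}")
    total_weight = sum(weights[c] for c in CRITERIA)
    return sum(weights[c] * ratings[c] for c in CRITERIA) / total_weight


# Placeholder ratings for an unnamed tool, not values from the study.
placeholder = {c: 3 for c in CRITERIA}
print(development_score(placeholder))  # prints 3.0
```

Making the weights explicit lets an evaluator emphasize the criteria that
matter most for a given class of applications.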