Background

The recent development of parallel and distributed computing software has produced a variety of software tools. These tools can be classified by the communication paradigm used to implement them, such as remote procedure call, shared memory, or message passing. They also support several parallel/distributed computing paradigms, such as the data-parallel, functional-parallel, and object models. Given this variety of tools, programming paradigms, and languages, selecting the best tool to run a given class of applications on a parallel or distributed system is a non-trivial task that requires some research. We expect tool evaluation to receive more attention as the deployment and usage of distributed systems increase.

There has been little research addressing this problem because most of the available software tools are new and not yet fully developed. The main objective of this research is to remedy this by developing a multi-level evaluation methodology that assesses tools at the three levels described below.

At the tool performance level, we focus on the performance of the tool primitives when they run on different platforms (e.g., IBM-SP1, ALPHA cluster, SUN workstations) interconnected by different computer networks (e.g., Ethernet, FDDI, ATM). At the application performance level, we evaluate the performance of four applications drawn from four classes (Numerical Applications, Signal and Image Processing, Simulation, and Utilities). At the application development level, we evaluate each tool from the software development point of view (e.g., ease of programming, language support). The tools considered in this study are Express, p4, and PVM. However, the same techniques can be applied to evaluate other tools.

Experimental Results

Tool Performance Level

At this level, we compare the tools in terms of the performance of their communication primitives, namely the send/receive (snd/rcv) and broadcast primitives. Comparing tools that run on different platforms and networks requires a unitless quantity: the raw execution time of a primitive cannot serve as the basis for comparison, since not every tool is necessarily supported on every platform. We therefore introduce a parameter, Relative Overhead with respect to Socket (ROS), which measures the overhead a tool incurs when its primitives are implemented using socket primitives. Because the tool primitives are implemented on top of the socket communication library (BSD socket library), we take the execution time of the corresponding socket primitives as the ideal execution time.
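The text above does not give a formula for ROS, so the following is only a minimal sketch under one plausible assumption: that ROS is the extra time a tool primitive takes over the equivalent raw socket primitive, divided by the socket time. The function names `relative_overhead` and `rank_by_overhead` are hypothetical, and any timings passed to them are placeholders rather than measurements from this study.

```python
def relative_overhead(tool_time, socket_time):
    """Return the assumed ROS: (tool_time - socket_time) / socket_time.

    Both arguments must use the same time unit, so the result is unitless.
    This formula is an assumption for illustration; the source does not
    state the exact definition of ROS.
    """
    if socket_time <= 0:
        raise ValueError("socket_time must be positive")
    return (tool_time - socket_time) / socket_time


def rank_by_overhead(tool_times, socket_time):
    """Return (tool, ROS) pairs sorted from lowest to highest assumed ROS."""
    scored = [
        (tool, relative_overhead(t, socket_time))
        for tool, t in tool_times.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])
```

Under this assumed definition, a tool primitive that takes 1.5 time units where the raw socket primitive takes 1.0 has an ROS of 0.5, that is, 50% overhead. Because the value is unitless, figures obtained on different platforms and networks can be placed side by side.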

Ethernet-based SUN Workstation Cluster

In this computing environment, the p4 implementation of point-to-point communication performs best compared with the PVM and Express implementations. For the broadcast (group communication) primitive, Express has the best performance and PVM the worst. It is worth noting that the tool with the best snd/rcv performance does not necessarily have the best performance for broadcast/multicast primitives.

ATM-based SUN Workstation Cluster

We were unable to benchmark the broadcast primitive because the current implementation of the ATM switch supports only two SUN IPX workstations. The results for this environment are consistent with those for the Ethernet-based cluster.

IBM-SP1

In point-to-point communication, PVM has the best performance and Express the worst. PVM also has the best broadcast performance on the IBM-SP1. For small message sizes, however, all tools perform equally well.

FDDI-based ALPHA Cluster

p4 has the best snd/rcv performance whereas Express has the best broadcast performance.

Application Performance Level

The applications chosen to benchmark the tools at this level are the two-dimensional FFT (Fast Fourier Transform), JPEG (Joint Photographic Experts Group) simulation, Monte Carlo integration, and Parallel Sorting by Sampling. We benchmarked these applications on all platforms: the IBM-SP1, the ALPHA cluster, and the SUN workstations. We also compare the results with those obtained from running the same applications on the CM5 parallel computer.
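To illustrate how one of these benchmarks divides its work, here is a minimal sketch of Monte Carlo integration with the samples split among workers. It is not the benchmark code from this study, whose implementation is not described here: the function names, the per-worker seeding scheme, and the sequential loop standing in for remote workers are all assumptions made for illustration.

```python
import random


def partial_sum(f, a, b, n_samples, seed):
    """Sum f(x) over n_samples uniform random points in [a, b].

    This is one worker's share of the computation.
    """
    rng = random.Random(seed)
    return sum(f(rng.uniform(a, b)) for _ in range(n_samples))


def monte_carlo_integrate(f, a, b, n_samples, n_workers):
    """Estimate the integral of f over [a, b], splitting samples among workers.

    The workers run one after another here. With a message-passing tool,
    each partial_sum call would run on its own node and send its result
    to a master process, which would combine them as done below.
    """
    base, extra = divmod(n_samples, n_workers)
    counts = [base + (1 if i < extra else 0) for i in range(n_workers)]
    # A distinct seed per worker keeps the random streams from being identical.
    total = sum(partial_sum(f, a, b, c, seed=i) for i, c in enumerate(counts))
    return (b - a) * total / n_samples
```

Each worker needs only the interval, its sample count, and a seed, and it returns a single number. Communication is therefore small compared with computation, in contrast to applications such as the 2D FFT or sorting, which exchange larger amounts of data between nodes.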

On the ALPHA cluster, the p4 implementations of JPEG simulation and 2D-FFT performed best, whereas the PVM and Express implementations were best for sorting and Monte Carlo integration, respectively.

On the IBM-SP1, the results are consistent with those obtained on the ALPHA cluster. However, the execution times are significantly higher on the IBM-SP1 than on the ALPHA cluster.

When these results are compared with those from the CM5, the ALPHA cluster outperforms the other two platforms for the benchmarked applications. This shows that a distributed system built around high-performance workstations and a high-speed network can provide computing performance comparable to that offered by parallel computers and supercomputers.

Application Development Level

At this level, we quantify the application development properties of each tool according to the following criteria: language support, interface and integratability, ease of programming, debugging capability, tailoring the software, error handling, programming models supported, run-time support for parallel I/O, and portability. Among the tools studied, Express offers the best application development facilities.