Computer Science 402
High Performance Computing

Denison

Project Specification and Proposal:

Proposal Due: Wednesday, March 9, 2011, 1:30pm
Final Project Delivery Due: Monday, May 9, 2011, 9am

Specification

Learning Objective:

The project uses a parallel program, written in any language and targeting any of the parallel architectures we have at our disposal, as a means of applying ideas from class about parallel algorithms and parallel programming/computation. The problem domain, computation, implementation, and evaluation are not constrained, so you can explore and be creative. Focus on what interests you, and try hard to put the parallelism ideas into practice.

Project Goals:

Your goals in the project are (a) to challenge yourself in designing and implementing a parallel computation, and (b) to reveal to me (mostly in the concluding report) much of what you have learned in class. Remember that we have no final in this class, and so the project and report must serve for me to assess your overall learning and mastery of the material.

Tasks:

  1. Select a computation that interests you and that would benefit from parallelism. A few topics are listed below, and there are many similar topics to be found on the Web, but the best topic is one that interests you and about which you may have some special knowledge. You can use some of the ideas in Chapter 11 of the book on Capstone Project Ideas as well.
  2. Select a language to write the program in. At this point, we have covered basic material on PThreads and MPI. We will have units on OpenMP and CUDA/GPGPU programming after Spring Break. You can venture in another direction, but we may have to work to get the resources needed.
  3. Write an initial program for the problem; call it P1. The purpose of P1 is to give you a solution from which you can revise and improve the computation. Do not be overly ambitious; get a solution working quickly for the core computation, and accept a possibly naïve parallel solution. (A sequential solution is unacceptable, except for unimportant parts of the computation such as initialization.) Avoid embellishments and fancy I/O, and accept constraints on the solution ("n is a power of 2").
  4. Using the CTA performance model presented in the book, your understanding of parallel computers, your knowledge of parallel algorithms, and your general CS smarts, critique the P1 program. That is, identify the places where it is inefficient. Improve P1 to create P2, or, for some projects, create a competitive P2.
  5. Gather evidence about the performance of P1 and P2 to test your understanding of whether the "improvement" actually improved the program. Generally, this evidence will involve running your program on the cluster machine or other parallel processor.
  6. Write a report describing what you did, how you analyzed your program (Task 4), how you improved it and why, and what the experimental evidence was. Include a listing of your commented program. This should be complete and well written and include a bibliography and supporting performance evaluation. Your grade will come primarily from the report, although if you have some demonstration, I will consider that as well.

Proposal

Your proposal should be roughly two pages long (not including the bibliography), and should include the following information:

Possible Topic Areas

Your Topic Here

The best project topic is one that interests you. If you have a topic you like, think about how a project might go, then send me an email outline of what you'd like to try. Look at the list of titles for Table 11.1 in the book to see if one of them sparks your interest.

Find an Application from one of the Sciences

The professors in Physics, Chemistry, and Biology often have problems of significant computational size that could benefit from a parallel solution. They may already have a sequential solution that you can work to parallelize, or they may have a parallel solution that you can evaluate and then improve.

Commonly Cited Parallel Applications

The online literature is filled with examples that are generally considered good candidates for parallel solution: MPEG compression, the Smith-Waterman sequence-alignment computation, many-body (gravitational) simulation, and so on. These examples usually involve large amounts of data, large amounts of computation, or both. The experiments needed to assess P1 versus P2 do not have to be large, only large enough to demonstrate whatever point is being made.

Game Searches

Because board games have succinct state descriptions, game-tree search is a common example of the work-queue approach; moreover, searching is a task that parallelism often speeds up. If you have an interest in games, implement a search for a board configuration with a certain property.

Graph Computations

The All Pairs Shortest Path problem was an easy computation in ZPL. Find a computation on a graph and develop a ZPL solution. For example, finding the closest pair of points (in Euclidean distance) is a computation that often uses a k-d tree partitioning of the point space. A regular k-d tree is a structure that can easily be imposed on a linear array of points. Once the points are partitioned, they can be moved to individual processors with remap, the closest-pair computation performed locally, and points near the partition boundaries checked against neighboring processors.

Compete Against a Benchmark

There are a variety of parallel benchmark suites to be found on the Web, such as the NAS Parallel Benchmarks (NPB), SPEC HPC2002, Cray’s Application Kernel Matrix (AKM), or Stanford's SPLASH (including a Barnes-Hut N-Body). Some of these computations can be large, but one approach is to formulate your parallel solution using the principles from class, and lift the scalar code from the benchmark (assuming a compatible base language). In creating your P1 and P2 programs, you need to apply a significant, new idea that is not part of published examples of the benchmark. In addition to comparing your P1 and P2 performance, compare your result to a solution from the suite’s site.