MyOOPS OpenCourseWare

Open courseware from the world's leading universities, translated into Chinese by thousands of volunteers worldwide. Enjoy it free of charge!

Course source: MIT

This course has not yet been translated. The English original, as extracted by the system, follows.



18.337J / 6.338J Applied Parallel Computing (SMA 5505)

Spring 2005

The photo shows an example of parallel computing: a Beowulf cluster consisting of one front end and nine compute nodes, delivering high computing performance. (Photo by Alan Edelman.)

Course Highlights

This course features lecture notes and student projects.

Course Description

Applied Parallel Computing is an advanced interdisciplinary introduction to applied parallel computing on modern supercomputers.

Technical Requirements

Special software is required to use some of the files in this course: .gz, .tar.





Syllabus

Description

Applied Parallel Computing is an advanced interdisciplinary introduction to applied parallel computing on modern supercomputers. Numerical topics include dense and sparse linear algebra, N-body problems, multigrid, fast multipole, wavelets and Fourier transforms. Geometrical topics include partitioning and mesh generation. Other topics include applications-oriented architecture, software systems, MPI, Cilk, data parallel systems, Parallel MATLAB®, caches and vector processors, with hands-on emphasis on understanding the realities and myths of what is possible on the world's fastest machines. One emphasis of this course is VHLLs, or very high level languages, for parallel computing. Are we there yet?



Prerequisite

Linear Algebra (18.06)



Assignments and Projects

There are 3 homework assignments. The first assignment is to pick a topic of interest for your final project. You will hand in 2 progress reports and a final project report, and give two presentations on the project: one before spring break, when you hand in your first progress report, and one at the end of the semester. You must also keep a Web site on your project. All reports and the Web site will be made public and archived.



Grading


ACTIVITIES	PERCENTAGES
Scribing (Commenting on Notes for One Lecture)	5%
Homework	45%
Project (Roughly Half Semester)	50%





Discussion

Every week there will be an optional discussion session. This is a great place to get research and project ideas.





Calendar

LEC #	TOPICS	KEY DATES
1	Introduction
2	MPI, OpenMP, MATLAB®*P
3	Parallel Prefix
4	Dense Linear Algebra I
5	Dense Linear Algebra II
6	Sparse Linear Algebra I
7	Parallel Computer Architecture	Homework 0 due two days before Lec #7
8	STAR-P Demo	Project proposal due
9	Sparse Linear Algebra II
10	FFT
11	Domain Decomposition I
12-13	Project Proposal and Progress Presentation	Homework 1 due on Lec #13
14	Sparse Linear Algebra III
15	Fast Multipole I
17	Fast Multipole II
18	Fast Multipole III
19	Domain Decomposition II
20	Partitioning I	Progress report due; Homework 2 due
21	Partitioning II
22	Mesh Generation
23	Package Design of the Connection Machine
24	SVM and SVD
25	FFTW
26	Student Project Presentations I
27	Student Project Presentations II
28	Student Project Presentations III	Final projects due




Lecture Notes

This section contains documents that could not be made accessible to screen reader software. A "#" symbol is used to denote such documents.

Selected Lecture Notes from Spring 2003 Class

The course notes below are a work in progress. The lecture numbers do not correspond to the class session numbers. The notes are available as a single file (PDF - 4.3 MB)# or as separate chapters below.



Chapter 1: Introduction (PDF - 1.2 MB)

1.1. The Machines
1.2. The Software
1.3. The Reality of High Performance Computing
1.4. Modern Algorithms
1.5. Compilers
1.6. Scientific Algorithms
1.7. History, State-of-Art, and Perspective

  • 1.7.1. Things that are not Traditional Supercomputers

1.8. Analyzing the Top500 List Using Excel

  • 1.8.1. Importing the XML File
  • 1.8.2. Filtering
  • 1.8.3. Pivot Tables

1.9. Parallel Computing: An Example
1.10. Exercises



Chapter 2: MPI, OpenMP, MATLAB®*P (PDF)

2.1. Programming Style
2.2. Message Passing

  • 2.2.1. Who am I?
  • 2.2.2. Sending and Receiving
  • 2.2.3. Tags and Communicators
  • 2.2.4. Performance, and Tolerance
  • 2.2.5. Who's got the floor?

2.3. More on Message Passing

  • 2.3.1. Nomenclature
  • 2.3.2. The Development of Message Passing
  • 2.3.3. Machine Characteristics
  • 2.3.4. Active Messages

2.4. OpenMP for Shared Memory Parallel Programming
2.5. STAR-P
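The send/receive model of sections 2.2-2.3 can be illustrated without any MPI installation. Below is a minimal sketch, assuming nothing from the course software: Python threads stand in for MPI ranks, and per-rank queues stand in for blocking MPI_Send/MPI_Recv, passing a token once around a ring (a classic first message-passing exercise). Real code would use C MPI or mpi4py.

```python
import threading
from queue import Queue

# Toy message passing: one blocking queue per rank plays the role of the
# rank's mailbox. (Illustration only; not how MPI is implemented.)
class ToyComm:
    def __init__(self, size):
        self.size = size
        self.box = [Queue() for _ in range(size)]   # one mailbox per rank
    def send(self, msg, dest):
        self.box[dest].put(msg)                     # like MPI_Send
    def recv(self, rank):
        return self.box[rank].get()                 # blocks, like MPI_Recv

def ring(comm, rank, result):
    """Pass a token once around the ring; each rank adds its rank number."""
    if rank == 0:
        comm.send(0, 1 % comm.size)
        result.append(comm.recv(0))   # token returns after a full circuit
    else:
        total = comm.recv(rank)
        comm.send(total + rank, (rank + 1) % comm.size)

size = 4
comm, result = ToyComm(size), []
threads = [threading.Thread(target=ring, args=(comm, r, result)) for r in range(size)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(result[0])  # 0 + 1 + 2 + 3 = 6
```

The same pattern, with queues replaced by network transport and threads by processes, is what "Sending and Receiving" (2.2.2) is about.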



Chapter 3: Parallel Prefix (PDF)

3.1. Parallel Prefix
3.2. The "Myth" of lg n
3.3. Applications of Parallel Prefix

  • 3.3.1. Segmented Scan
  • 3.3.2. Csanky's Matrix Inversion
  • 3.3.3. Babbage and Carry Look-Ahead Addition

3.4. Parallel Prefix in MPI
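The lg n-depth scan algorithm of Chapter 3 can be sketched in a few lines. This is a serial illustration of the recursive structure (combine adjacent pairs, scan the half-size problem, interleave back); each step marked "parallel" is one all processors would perform at once.

```python
def parallel_prefix(x, op=lambda a, b: a + b):
    """Recursive scan: O(lg n) depth given n processors (run serially here)."""
    n = len(x)
    if n == 1:
        return list(x)
    # Parallel step: combine adjacent pairs.
    pairs = [op(x[2 * i], x[2 * i + 1]) for i in range(n // 2)]
    # Recurse on the half-length problem.
    sub = parallel_prefix(pairs, op)
    # Parallel step: interleave partial results back.
    out = []
    for i in range(n // 2):
        left = x[0] if i == 0 else op(sub[i - 1], x[2 * i])
        out.extend([left, sub[i]])
    if n % 2:                        # odd length: fold in the last element
        out.append(op(out[-1], x[-1]))
    return out

print(parallel_prefix([1, 2, 3, 4, 5]))  # [1, 3, 6, 10, 15]
```

Any associative op works, which is what makes segmented scan (3.3.1) and carry look-ahead addition (3.3.3) instances of the same primitive.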



Chapter 4: Dense Linear Algebra (PDF)

4.1. Dense Matrices
4.2. Applications

  • 4.2.1. Uncovering the Structure from Seemingly Unstructured Problems

4.3. Records
4.4. Algorithms, and Mapping Matrices to Processors
4.5. The Memory Hierarchy
4.6. Single Processor Considerations for Dense Linear Algebra

  • 4.6.1. LAPACK and the BLAS
  • 4.6.2. Reinventing Dense Linear Algebra Optimization

4.7. Parallel Computing Considerations for Dense Linear Algebra
4.8. Better Load Balancing

  • 4.8.1. Problems
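The memory-hierarchy discussion of sections 4.5-4.6 comes down to one idea: operate on tiles small enough to stay in fast memory while they are reused. Below is a minimal sketch of cache-blocked matrix multiply, the core idea behind BLAS-3 GEMM performance; the tile size b is illustrative, not tuned.

```python
def blocked_matmul(A, B, b=2):
    """Cache-blocked matrix multiply: work on b-by-b tiles so each tile is
    reused many times while it (ideally) sits in fast memory."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, b):
        for jj in range(0, n, b):
            for kk in range(0, n, b):
                # Multiply tile A[ii:ii+b, kk:kk+b] by B[kk:kk+b, jj:jj+b].
                for i in range(ii, min(ii + b, n)):
                    for k in range(kk, min(kk + b, n)):
                        aik = A[i][k]
                        for j in range(jj, min(jj + b, n)):
                            C[i][j] += aik * B[k][j]
    return C

print(blocked_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19.0, 22.0], [43.0, 50.0]]
```

Each tile of A and B is touched n/b times but loaded once per use, so a well-chosen b cuts memory traffic by roughly a factor of b, which is what LAPACK delegates to the BLAS (4.6.1).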


Chapter 5: Sparse Linear Algebra (PDF)

5.1. Cyclic Reduction for Structured Sparse Linear Systems
5.2. Sparse Direct Methods

  • 5.2.1. LU Decomposition and Gaussian Elimination
  • 5.2.2. Parallel Factorization: The Multifrontal Algorithm

5.3. Basic Iterative Methods

  • 5.3.1. SuperLU-dist
  • 5.3.2. Jacobi Method
  • 5.3.3. Gauss-Seidel Method
  • 5.3.4. Splitting Matrix Method
  • 5.3.5. Weighted Splitting Matrix Method

5.4. Red-Black Ordering for Parallel Implementation
5.5. Conjugate Gradient Method

  • 5.5.1. Parallel Conjugate Gradient

5.6. Preconditioning
5.7. Symmetric Supernodes

  • 5.7.1. Unsymmetric Supernodes
  • 5.7.2. The Column Elimination Tree
  • 5.7.3. Relaxed Supernodes
  • 5.7.4. Supernodal Numeric Factorization

5.8. Efficient Sparse Matrix Algorithms

  • 5.8.1. Scalable Algorithms
  • 5.8.2. Cholesky Factorization
  • 5.8.3. Distributed Sparse Cholesky and the Model Problem
  • 5.8.4. Parallel Block-Oriented Sparse Cholesky Factorization

5.9. Load Balance with Cyclic Mapping

5.10. Heuristic Remapping
5.11. Scheduling Local Computations
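Of the iterative methods in section 5.3, Jacobi is the naturally parallel one: every component update reads only the previous iterate, so all n updates can proceed at once (Gauss-Seidel, which consumes updated values immediately, cannot). A serial sketch on a small made-up diagonally dominant system:

```python
# Jacobi iteration for Ax = b:
#   x_new[i] = (b[i] - sum_{j != i} A[i][j] * x[j]) / A[i][i]
# Every component reads only the previous iterate x, so the n updates are
# independent -- the property that makes Jacobi trivially parallel.
def jacobi(A, b, iters=100):
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x

# Diagonally dominant 2x2 system, so the iteration converges.
x = jacobi([[4.0, 1.0], [1.0, 3.0]], [9.0, 7.0])
print([round(v, 6) for v in x])  # [1.818182, 1.727273]
```

The red-black ordering of section 5.4 is what recovers this same independence for Gauss-Seidel on grid problems.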



Chapter 6: Parallel Machines (PDF)#



Chapter 7: FFT (PDF)

7.1. FFT

  • 7.1.1. Data Motion
  • 7.1.2. FFT on Parallel Machines
  • 7.1.3. Exercises

7.2. Matrix Multiplication
7.3. Basic Data Communication Operations
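The divide-and-conquer structure behind Chapter 7 is easy to sketch: a radix-2 Cooley-Tukey FFT splits into independent even/odd subproblems, which is where parallel FFTs get their concurrency. Illustrative only, with no attention to the data motion that 7.1.1 is about:

```python
import cmath

def fft(x):
    """Radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])   # independent subproblem
    odd = fft(x[1::2])    # independent subproblem
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # twiddle factor
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

# The DFT of a unit impulse is flat: all ones.
print([round(abs(v)) for v in fft([1, 0, 0, 0])])  # [1, 1, 1, 1]
```

On a parallel machine the recursion bottoms out locally; the expensive part is the transpose-like exchange between the even/odd halves, hence 7.3's focus on communication operations.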



Chapter 8: Domain Decomposition (PDF)#

8.1. Geometric Issues

  • 8.1.1. Overlapping vs. Non-overlapping Regions
  • 8.1.2. Geometric Discretization

8.2. Algorithmic Issues

  • 8.2.1. Classical Iterations and their Block Equivalents
  • 8.2.2. Schwarz Approaches: Additive vs. Multiplicative
  • 8.2.3. Substructuring Approaches
  • 8.2.4. Accelerants

8.3. Theoretical Issues
8.4. A Domain Decomposition Assignment: Decomposing MIT



Chapter 9: Particle Methods (PDF)#

9.1. Reduce and Broadcast: A Function Viewpoint
9.2. Particle Methods: An Application
9.3. Outline
9.4. What is N-Body Simulation?
9.5. Examples
9.6. The Basic Algorithm

  • 9.6.1. Finite Difference and the Euler Method

9.7. Methods for Force Calculation

  • 9.7.1. Direct Force Calculation
  • 9.7.2. Potential Based Calculation
  • 9.7.3. Poisson Methods
  • 9.7.4. Hierarchical Methods

9.8. Quadtree (2D) and Octree (3D): Data Structures for Canonical Clustering
9.9. Barnes-Hut Method (1986)

  • 9.9.1. Approximating Potentials

9.10. Outline
9.11. Introduction
9.12. Multipole Algorithm: An Overview
9.13. Multipole Expansion
9.14. Taylor Expansion
9.15. Operation No.1 - SHIFT
9.16. Operation No.2 - FLIP
9.17. Application on Quad Tree
9.18. Expansion from 2-D to 3-D
9.19. Parallel Implementation
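The baseline that Barnes-Hut and the fast multipole method accelerate is the direct O(n²) force sum of section 9.7.1. A 2-D sketch with G set to 1 (a unit chosen purely for illustration):

```python
# Direct pairwise gravitational forces in 2-D: O(n^2) work, the baseline
# that tree codes reduce to O(n log n) and multipole methods to O(n). G = 1.
def direct_forces(pos, mass):
    n = len(pos)
    forces = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            r2 = dx * dx + dy * dy
            r = r2 ** 0.5
            f = mass[i] * mass[j] / r2       # |F| = m_i * m_j / r^2
            forces[i][0] += f * dx / r       # project onto the unit vector
            forces[i][1] += f * dy / r
    return forces

# Two unit masses one unit apart attract with equal and opposite unit force.
f = direct_forces([[0.0, 0.0], [1.0, 0.0]], [1.0, 1.0])
print(f)  # [[1.0, 0.0], [-1.0, 0.0]]
```

Barnes-Hut replaces the inner loop over distant particles with a single interaction against a quadtree/octree cell's aggregate mass, which is where the quadtree of 9.8 comes in.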



Chapter 10: Partitioning and Load Balancing (PDF - 1.1 MB)

10.1. Motivation from the Parallel Sparse Matrix Vector Multiplication
10.2. Separators
10.3. Spectral Partitioning - One way to Slice a Problem in Half

  • 10.3.1. Electrical Networks
  • 10.3.2. Laplacian of a Graph
  • 10.3.3. Spectral Partitioning

10.4. Geometric Methods

  • 10.4.1. Geometric Graphs
  • 10.4.2. Geometric Partitioning: Algorithm and Geometric Modeling
  • 10.4.3. Other Graphs with Small Separators
  • 10.4.4. Other Geometric Methods
  • 10.4.5. Partitioning Software

10.5. Load-Balancing N-body Simulation for Non-uniform Particles

  • 10.5.1. Hierarchical Methods of Non-uniformly Distributed Particles
  • 10.5.2. The Communication Graph for N-body Simulations
  • 10.5.3. Near-field Graphs
  • 10.5.4. N-body Communication Graphs
  • 10.5.5. Geometric Modeling of N-body Graphs
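Section 10.3 can be made concrete in a few lines: build the graph Laplacian L = D - A, take the eigenvector for its second-smallest eigenvalue (the Fiedler vector), and split vertices by sign. The pure-Python power iteration below stands in for a real eigensolver, and the test graph (two triangles joined by one edge) is a made-up example:

```python
# Spectral partitioning sketch. Test graph: two triangles {0,1,2} and
# {3,4,5} joined by the single edge (2, 3).
n = 6
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0.0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = A[j][i] = 1.0
deg = [sum(row) for row in A]
L = [[(deg[i] if i == j else 0.0) - A[i][j] for j in range(n)] for i in range(n)]

def fiedler_vector(L, iters=1000):
    """Power iteration on c*I - L, deflating the all-ones eigenvector each
    step, converges to the eigenvector of L's second-smallest eigenvalue."""
    n = len(L)
    c = 2 * max(L[i][i] for i in range(n)) + 1.0   # c > lambda_max(L)
    v = [float((-1) ** i) for i in range(n)]       # non-constant start vector
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]                  # project out the ones vector
        w = [c * v[i] - sum(L[i][j] * v[j] for j in range(n)) for i in range(n)]
        nrm = sum(x * x for x in w) ** 0.5
        v = [x / nrm for x in w]
    return v

v = fiedler_vector(L)
side = sorted(i for i in range(n) if v[i] > 0)
print(side)  # one triangle: [0, 1, 2] or [3, 4, 5]
```

The sign split recovers exactly the two triangles, cutting only the bridge edge, which is the behavior 10.3.2-10.3.3 motivate via the electrical-network analogy.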


Chapter 11: Mesh Generation (PDF)

11.1. How to Describe a Domain?
11.2. Types of Meshes
11.3. Refinement Methods

  • 11.3.1. Hierarchical Refinement
  • 11.3.2. Delaunay Triangulation
  • 11.3.3. Delaunay Refinement

11.4. Working With Meshes
11.5. Unsolved Problems



Chapter 12: Support Vector Machines and Singular Value Decomposition (PDF)

12.1. Support Vector Machines

  • 12.1.1. Learning Models
  • 12.1.2. Developing SVMs
  • 12.1.3. Applications

12.2. Singular Value Decomposition
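The link between the SVD of section 12.2 and eigenproblems can be sketched with power iteration: iterating with AᵀA yields the top right singular vector, and ‖Av‖ the top singular value. This is the 1-D core of the truncated SVDs used for low-rank approximation; the diagonal test matrix below is a toy example whose answer is known in advance.

```python
# Power iteration for the top singular triple of A, written without any
# linear-algebra library. v converges to the dominant eigenvector of A^T A,
# i.e. the top right singular vector; sigma = ||A v|| is the singular value.
def top_singular(A, iters=200):
    m, n = len(A), len(A[0])
    v = [1.0] * n
    for _ in range(iters):
        u = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]   # A v
        w = [sum(A[i][j] * u[i] for i in range(m)) for j in range(n)]   # A^T (A v)
        nrm = sum(x * x for x in w) ** 0.5
        v = [x / nrm for x in w]
    u = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
    sigma = sum(x * x for x in u) ** 0.5
    return sigma, v

# diag(3, 1): top singular value 3, right singular vector e1 (up to sign).
sigma, v = top_singular([[3.0, 0.0], [0.0, 1.0]])
print(round(sigma, 6))  # 3.0
```

Deflating and repeating gives further singular triples, one route to the low-rank approximations behind both SVD applications and SVM preprocessing.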



Bibliography (PDF)

Selected Lecture Notes from Spring 2003 Class


LEC #	TOPICS
1	Introduction (PDF - 1.3 MB)
2	MPI, MATLAB®*P (TXT)
4	Parallel Prefix (PDF)
5	Parallel Computer Architecture I (PDF)
6	Parallel Computer Architecture II (PDF)
7	Dense Linear Algebra I
	(PDF 1) (Courtesy of James Demmel. Used with permission.)
	(PDF 2)
	(PDF 3 - 1.1 MB)
	(PDF 4) (Courtesy of Jack Dongarra, University of Tennessee. Used with permission.)
9	FFT (PDF)
11	Graph Partitioning (PDF)
14	N-Body Problem - Barnes-Hut (PDF)#
18	Domain Decomposition (PDF)#
23	Support Vector Machines (PDF 1) (PDF 2)# (Courtesy of Tong Wen. Used with permission.)




Assignments

Special software is required to use some of the files in this section: .gz, .tar.

There are 3 homework assignments. Associated files are provided for assignments 0 and 1.

Homework 0: Buffon-Laplace Needle Problem (PDF)
hw0.tar.gz (GZ)

Homework 1: Parallel Matrix Power (via Prefix) (PDF)
hw1.tar.gz (GZ)

Homework 2: Parallel Sparse Conjugate Gradients (PDF)
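Homework 0's needle problem is a textbook embarrassingly parallel computation: every trial is independent, so trials split evenly across ranks and only the hit counts need reducing. Below is a serial sketch of the classic Buffon estimate of π (the Laplace variant adds a second grid of lines); the parameters are the standard ones, not taken from the assignment handout.

```python
import math, random

# Serial sketch of the Buffon needle estimate of pi. With needle length
# l <= line spacing d, P(cross) = 2l / (pi * d). Trials are independent,
# so on a parallel machine each rank runs its share and the hit counts
# are summed with a single reduction.
def estimate_pi(trials, l=1.0, d=1.0, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        y = rng.uniform(0.0, d / 2)            # center's distance to nearest line
        theta = rng.uniform(0.0, math.pi / 2)  # needle angle against the lines
        if y <= (l / 2) * math.sin(theta):     # needle crosses a line
            hits += 1
    return 2 * l * trials / (d * hits)         # invert P(cross) to estimate pi

print(estimate_pi(200_000))  # prints a value near pi
```

The Monte Carlo error shrinks only as 1/sqrt(trials), which is exactly why throwing more processors at the trial loop is the natural first parallel program.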





Projects

The project counts toward 50% of the course grade. Students are required to hand in 2 progress reports, and a final project report. Below are samples of final student reports from the 2003 class.

All work is that of the students named and is used with permission.


PROJECT TITLES	AUTHORS
Comparison of Programming and Synchronization Techniques (PDF)	Sean Lie
FFTW and MATLAB®*P (PDF)	Richard Hu
Parallel Implementation of a Multi-Length Scale Finite Element Method (PDF)	Trevor Tippetts
Java™ MPI in MATLAB®*P (PDF)	Max Goldman and Da Guo
A Parallel Hierarchical Solver for the Poisson Equation (PDF)	R. Sudarshan and Seung Lee
Sparse Matrix Implementation on MATLAB®*P (PDF)	Stu Blair
Parallelizing Incremental Bayesian Segmentation (IBS)	Joseph Hastings and Siddhartha Sen





Some Ideas

  • Parallel Jacobi Computation/Simulation
  • Running MATLAB®*P across Multiple Clusters via IMPI
  • 'Automatically' parallelize for-loops with mm mode?
  • Use Video Cards to do Interesting Parallel Computation
  • Parallel Filters
  • Write a new front end for *P - Octave? R? Maxima? Write an interpreter?
  • Try MATLAB® Optimization Stuff in *P
  • Computer Graphics: Ray Tracing
  • Computer Graphics: Simulate Cloth and Movement of Cloth
  • Computer Graphics: Any Sort of Animation
  • Simulate a Musical Instrument (Wind, String, Percussion, Other?)
  • Simulate a Stellar Collision (Scientific American, Nov 2002)
  • Parallel Linear Programming (Perhaps take Advantage of Star*P)
  • HTML Renderer
  • Integer Factorization by Quadratic or Numberfield Sieve
  • Replicate a known result: 2^20,996,011-1 is prime. 4-color map theorem. 4x4x4 3D tic-tac-toe is a first-player (?) win. Kepler conjecture.
  • Create a game playing program for some game other than Chess. Or chess, if you really want.
  • Generate Some Fractal Images
  • Program perturbation. Investigate the effects of typos for a particular programming language/compiler. Start with a working program and introduce perturbations (typos). What percentage cause a compile-time error?
  • Computational biology - motif discovery: given one set of strings labeled positive and another labeled negative, find the word (substring) present in every positive string but in none of the negatives, with the complication that there is noise: some strings are misclassified into the wrong set.
  • Computational biology - duplicate removal: a genome typically contains long chunks of DNA that are identical to chunks elsewhere. Remove these duplicated chunks. Keep in mind that the human genome will not fit in RAM.
  • Theorem prover. Either parallelize a theorem prover, or use one to prove something interesting.
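For the "replicate a known result" idea above, the Mersenne-prime entry has a compact serial core: the Lucas-Lehmer test. The record exponent quoted needs FFT-based big-integer multiplication and serious hardware; the sketch below checks only small cases.

```python
# Lucas-Lehmer test for Mersenne numbers M_p = 2^p - 1 (p an odd prime):
#   s_0 = 4,  s_{i+1} = s_i^2 - 2 (mod M_p);  M_p is prime iff s_{p-2} == 0.
# The squaring mod M_p dominates; at record sizes it is done with FFT-based
# multiplication, which is where the parallelism would go.
def lucas_lehmer(p):
    if p == 2:
        return True          # M_2 = 3 is prime; the recurrence needs p > 2
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0

print([p for p in [2, 3, 5, 7, 11, 13, 17, 19] if lucas_lehmer(p)])
# [2, 3, 5, 7, 13, 17, 19]  (M_11 = 2047 = 23 * 89 is composite)
```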




Related Resources

Here is a list of links to interesting readings related to this course.

Q: Is Supercomputing dead? (Oct 1994)
A: There is a Future for High-performance Computing (Sep 98)

Is this the supercomputing model for the future? Beowulf
Or this? Blue Gene - Petaflop Supercomputer



Parallel MATLAB®

Documents Relating to STAR-P and Predecessors

Parallel MATLAB® Survey



Grid Computing

Grid Computing Info Center - All you Need to Know about Grid Computing

Globus - One of the Major Computational Grid Projects

World Record for Largest Dense Linear System Solved (Though Really a Structured Application of Multipole)

Held by UIUC



Parallel Architecture

MPI - Message Passing Interface

SETI@home - Distributed Computing over the Internet



Parallel Software

Netlib - Collection of Mathematical Software, Many of them Parallel

The ACTS Toolkit - A Collection of Scientific Computing Software, Much of Which will be Discussed in the Course

ParMETIS - Parallel Graph Partitioning Software



Fast Computers

TOP500 Supercomputers

ASC - Home of the Fastest Computer in the World



Miscellaneous

Parascope - A Listing of Parallel Computing Sites

Pictures of Supercomputers

A Timeline of Supercomputing History





All works on this site are licensed under a Creative Commons license.