PARALLELIZATION FRAMEWORK FOR SCIENTIFIC APPLICATION KERNELS ON MULTI-CORE/MANY-CORE PLATFORMS by Liu Peng A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (COMPUTER SCIENCE) August 2011 Copyright 2011 Liu Peng
Object Description
Title | Parallelization framework for scientific application kernels on multi-core/many-core platforms |
Author | Peng, Liu |
Author email | liupeng@usc.edu |
Degree | Doctor of Philosophy |
Document type | Dissertation |
Degree program | Computer Science |
School | Viterbi School of Engineering |
Date defended/completed | 2011-06-17 |
Date submitted | 2011-07-11 |
Date approved | 2011-07-13 |
Restricted until | 2011-07-13 |
Date published | 2011-07-13 |
Advisor (committee chair) | Nakano, Aiichiro |
Advisor (committee member) | Prasanna, Viktor K.; Shing, Katherine S. |
Abstract | The advent of the multi-core/many-core paradigm has provided unprecedented computing power, and it is of great significance to develop a parallelization framework that lets various scientific applications harvest this power. However, designing an efficient parallelization framework that continues to scale on future architectures is a great challenge, owing to the complexity of real-world applications and the variety of multi-core/many-core platforms. ❧ To address this challenge, I propose a hierarchical optimization framework that maps applications to hardware by exploiting multiple levels of parallelization: (1) inter-node level parallelism via spatial decomposition; (2) inter-core level parallelism via cellular decomposition; and (3) single-instruction multiple-data (SIMD) parallelization. The framework includes application-based SIMD analysis and optimization, which allows application scientists to determine whether their applications are viable for SIMDization and provides various code transformation techniques to enhance SIMD efficiency, as well as simple recipes for when compiler auto-vectorization fails. I also propose a suite of optimization strategies to achieve ideal on-chip inter-core strong scalability on emerging many-core architectures: (1) a divide-and-conquer algorithm adaptive to local memory; (2) a novel data layout to improve data locality; (3) on-chip locality-aware parallel algorithms to enhance data reuse; and (4) a pipeline algorithm that uses a data transfer agent to orchestrate computation and memory operations to hide the latency of access to shared memory.
❧ I have applied the framework to three scientific applications, which represent most of the numerical classes in the seven dwarfs (which are known to cover most high performance computing applications): (1) stencil computation, specifically the lattice Boltzmann method (LBM) for fluid flow simulation; (2) molecular dynamics (MD) simulation; and (3) molecular fragment analysis via connected component detection. ❧ I have achieved high inter-node, inter-core (multithreading), and SIMD efficiency on various computing platforms: (1) for LBM, inter-node parallel efficiency of 0.978 on 131,072 BlueGene/P processors, multithreading efficiency of 0.882 on 6 cores of a Cell BE, and SIMD efficiency of 0.780 using 4-element vector registers of a Cell BE; (2) for MD simulation, inter-node parallel efficiency of 0.985 on 106,496 BlueGene/L processors, and inter-core multithreading parallel efficiency of 0.99 on the 64-core Godson-T many-core architecture; (3) for molecular fragment analysis, nearly linear inter-node strong scalability for a molecular graph of up to 50 million vertices on 32 computing nodes, and over 13-fold inter-core speedup on 16 cores. In addition, a simple performance model based on hierarchical parallelization is derived, which suggests that the optimization scheme is likely to scale well toward exascale. Furthermore, I have analyzed the impact of architectural features on application performance and found that certain architectural features are essential for these optimizations. ❧ This research not only suggests viable optimization techniques for broad scientific applications on future many-core parallel supercomputing platforms, but also provides guidance on effective architectural design of future supercomputing systems. |
Keyword | multi/many core; parallel computing; scientific simulation |
Language | English |
Part of collection | University of Southern California dissertations and theses |
Publisher (of the original version) | University of Southern California |
Place of publication (of the original version) | Los Angeles, California |
Publisher (of the digital version) | University of Southern California. Libraries |
Provenance | Electronically uploaded by the author |
Type | texts |
Legacy record ID | usctheses-m |
Contributing entity | University of Southern California |
Rights | Peng, Liu |
Physical access | The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given. |
Repository name | University of Southern California Digital Library |
Repository address | USC Digital Library, University of Southern California, University Park Campus MC 7002, 106 University Village, Los Angeles, California 90089-7002, USA |
Repository email | cisadmin@lib.usc.edu |
Archival file | uscthesesreloadpub_Volume71/etd-PengLiu-77.pdf |