METASCALABLE HYBRID MESSAGE-PASSING AND MULTITHREADING ALGORITHMS FOR N-TUPLE COMPUTATION by Manaschai Kunaseth A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (COMPUTER SCIENCE) August 13th, 2013 © 2013 Manaschai Kunaseth
Object Description
Title | Metascalable hybrid message-passing and multithreading algorithms for n-tuple computation |
Author | Kunaseth, Manaschai |
Author email | kunaseth@usc.edu;manaschai.kunaseth@gmail.com |
Degree | Doctor of Philosophy |
Document type | Dissertation |
Degree program | Computer Science |
School | Viterbi School of Engineering |
Date defended/completed | 2013-06-18 |
Date submitted | 2013-07-31 |
Date approved | 2013-07-31 |
Restricted until | 2013-07-31 |
Date published | 2013-07-31 |
Advisor (committee chair) | Nakano, Aiichiro |
Advisor (committee member) | Lucas, Robert F.; Shing, Katherine |
Abstract | The emergence of the multicore era has granted unprecedented computing capabilities. Widely available multicore clusters have made hybrid message-passing and multithreading parallel algorithms a standard parallelization approach for modern clusters. However, portably scalable hybrid parallel applications on emerging high-end multicore clusters consisting of millions of cores have yet to be achieved. Achieving scalability on emerging multicore platforms is an enormous challenge, since we do not even know the architecture of future platforms, with new hardware features such as hardware transactional memory (HTM) constantly being deployed. Scalable implementation of molecular dynamics (MD) simulations on massively parallel computers has been one of the major driving forces of supercomputing technologies. In particular, recent advancements in reactive MD simulations based on many-body interatomic potentials have necessitated efficient dynamic n-tuple computation. Hence, it is of great significance now to develop scalable hybrid n-tuple computation algorithms to provide a viable foundation for high-performance parallel-computing software on forthcoming architectures. ❧ This dissertation research develops a scalable hybrid message-passing and multithreading algorithm for n-tuple MD simulation, which will continue to scale on future architectures (i.e., achieving metascalability). The two major goals of this dissertation research are: (1) to design a scalable hybrid message-passing and multithreading parallel algorithmic framework on multicore architectures and evaluate it on the most advanced parallel architectures; and (2) to develop a computation-pattern algebraic framework to design scalable algorithms for general n-tuple computation and prove its optimality in a systematic and mathematically rigorous manner.
❧ To achieve the first goal, we have developed and thoroughly analyzed algorithms for hybrid message passing interface (MPI) + open multiprocessing (OpenMP) parallelization of n-tuple MD simulation, which are scalable on large multicore clusters. Two data-privatization thread scheduling algorithms via nucleation-growth allocation have been designed: (1) compact-volume allocation scheduling (CVAS); and (2) breadth-first allocation scheduling (BFAS). These two algorithms combine fine-grain dynamic load balancing and minimal memory-footprint threading. Theoretical study has revealed decent asymptotic memory efficiency for both algorithms, reducing memory consumption by 75% compared to a naïve-threading algorithm. Furthermore, performance benchmarks have confirmed higher performance of the hybrid MD algorithm over a traditional algorithm on large multicore clusters, where a 2.58-fold speedup of the hybrid algorithm over the traditional algorithm was observed on 32,768 nodes of IBM BlueGene/P. ❧ We have also investigated the performance characteristics of HTM on the IBM BlueGene/Q computer in comparison with conventional concurrency control mechanisms, using an MD application as an example. Benchmark tests, along with overhead-cost and scalability analysis, have quantified relative performance advantages of HTM over other mechanisms. We found that the bookkeeping cost of HTM is high but that the rollback cost is low. We have proposed transaction fusion and spatially compact scheduling techniques to reduce the overhead of HTM with minimal programming. A strong scalability benchmark has shown that the fused HTM has the shortest runtime among various concurrency control mechanisms without extra memory. Based on the performance characterization, we have derived a decision tree in the concurrency-control design space for general multithreading applications.
❧ To achieve the second goal, we have developed a computation-pattern algebraic framework to mathematically formulate general n-tuple computation. Based on translation/reflection-invariant properties of computation patterns within this framework, we have designed a shift-collapse (SC) algorithm for cell-based parallel MD. Theoretical analysis has quantified the compact n-tuple search space and small communication cost of SC-MD for arbitrary n, which reduce to those of the best pair-computation approaches (e.g., the eighth-shell method) for n = 2. Benchmark tests have shown that SC-MD outperforms our production MD code at the finest grain, with 9.7- and 5.1-fold speedups on Intel-Xeon and BlueGene/Q clusters. SC-MD has also exhibited excellent strong scalability. ❧ In addition, we have analyzed the computational and data-access patterns of MD, which led to the development of a performance prediction model for short-range pair-wise force computations in MD simulations. The analysis and performance model provide fundamental understanding of computation patterns and optimality of certain parameters in MD simulations, thus allowing scientists to determine the optimal cell dimension in a linked-list cell method. The model has accurately estimated the number of operations during the simulations, with a maximum error of 10.6% compared to actual measurements. Analysis and benchmarking of the model have revealed that the optimal cell dimension minimizing the computation time is determined by a trade-off between decreasing search space and increasing linked-list cell access for smaller cells. ❧ One difficulty with MD is that it is a dynamic irregular application, which often suffers considerable performance deterioration during execution. To address this problem, an optimal data-reordering schedule has been developed for runtime memory-access optimization of MD simulations on parallel computers.
Analysis of the memory-access penalty during MD simulations has shown that the performance improvement from computation and data reordering degrades gradually as data translation lookaside buffer misses increase. We have also found correlations of the performance degradation with physical properties such as the simulated temperature, as well as with computational parameters such as the spatial-decomposition granularity. Based on a performance model and pre-profiling of data fragmentation behaviors, we have developed an optimal runtime data-reordering schedule, thereby achieving speedups of 1.35, 1.36 and 1.28, respectively, for MD simulations of silica at temperatures of 300 K, 3,000 K and 6,000 K. ❧ The main contributions of this dissertation research are twofold: a metascalable hybrid message-passing and multithreading parallel algorithmic framework for emerging multicore parallel clusters, and a novel computation-pattern algebraic framework to design scalable algorithms for general n-tuple computation and prove their optimality in a mathematically rigorous manner. We expect that the proposed hybrid algorithms and mathematical approaches will provide a generic framework for a broad range of applications on future extreme-scale computing platforms. |
Keyword | n-tuple computation; algorithm; performance modeling; high performance computing; scientific computing; molecular dynamics |
Language | English |
Format (imt) | application/pdf |
Part of collection | University of Southern California dissertations and theses |
Publisher (of the original version) | University of Southern California |
Place of publication (of the original version) | Los Angeles, California |
Publisher (of the digital version) | University of Southern California. Libraries |
Provenance | Electronically uploaded by the author |
Type | texts |
Legacy record ID | usctheses-m |
Contributing entity | University of Southern California |
Rights | Kunaseth, Manaschai |
Physical access | The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given. |
Repository name | University of Southern California Digital Library |
Repository address | USC Digital Library, University of Southern California, University Park Campus MC 7002, 106 University Village, Los Angeles, California 90089-7002, USA |
Repository email | cisadmin@lib.usc.edu |
Filename | etd-KunasethMa-1911.pdf |
Archival file | uscthesesreloadpub_Volume7/etd-KunasethMa-1911.pdf |
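Illustrative note: the linked-list cell method named in the abstract can be sketched roughly as follows. This is a minimal Python sketch written for this record, not code from the dissertation. It assumes a cubic periodic box and a cell edge no smaller than the cutoff, and the function names (build_cells, min_image_dist2, pairs_linked_cells) are hypothetical.

```python
import itertools


def build_cells(positions, box, n_cells):
    """Bin atoms into an n_cells^3 grid using head/next linked lists."""
    head = {}                      # cell index -> most recently added atom
    nxt = [-1] * len(positions)    # atom -> next atom in the same cell
    size = box / n_cells
    for i, pos in enumerate(positions):
        c = tuple(int(x / size) % n_cells for x in pos)
        nxt[i] = head.get(c, -1)
        head[c] = i
    return head, nxt


def min_image_dist2(a, b, box):
    """Squared distance under the minimum-image convention."""
    d2 = 0.0
    for u, v in zip(a, b):
        d = u - v
        d -= box * round(d / box)
        d2 += d * d
    return d2


def pairs_linked_cells(positions, box, rc, n_cells):
    """Return the set of pairs (i, j), i < j, closer than the cutoff rc.

    Assumes box / n_cells >= rc, so every partner of an atom lies in
    its own cell or in one of the 26 surrounding cells.
    """
    head, nxt = build_cells(positions, box, n_cells)
    pairs = set()
    for c in head:
        for off in itertools.product((-1, 0, 1), repeat=3):
            nb = tuple((c[k] + off[k]) % n_cells for k in range(3))
            i = head[c]
            while i != -1:
                j = head.get(nb, -1)
                while j != -1:
                    if i < j and min_image_dist2(positions[i], positions[j], box) < rc * rc:
                        pairs.add((i, j))
                    j = nxt[j]
                i = nxt[i]
    return pairs
```

In this sketch, a larger n_cells (smaller cells, still no smaller than the cutoff) shrinks the set of candidate pairs that are distance-checked but increases the number of cells visited, which is the kind of trade-off the abstract describes for the optimal cell dimension.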