A BOLTZMANN MODEL FOR TRACKING AEROSOLS IN TURBULENT FLOWS

by

Sewoong Jung

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(AEROSPACE AND MECHANICAL ENGINEERING)

May 2013

Copyright 2013 Sewoong Jung

This document is dedicated to my family.

Table of Contents

List of Tables
List of Figures
Abstract
Chapter 1: Introduction
  1.1 Particle transport in turbulence
  1.2 Lattice Boltzmann methods using GPU processors
  1.3 Simulation of fluid and particle tracking
    1.3.1 Lagrangian simulations
    1.3.2 Eulerian models
    1.3.3 Particle deposition in wall bounded turbulent flows
  1.4 Wakes behind a square cylinder in a channel
  1.5 Turbulent flow in a square duct
  1.6 Turbophoresis and particle deposition
Chapter 2: Fluids Modeling
  2.1 Lattice Boltzmann method
  2.2 The Boltzmann and lattice Boltzmann equations
  2.3 Equilibrium distribution function
  2.4 Forcing term
  2.5 Boundary conditions
  2.6 Sub-grid scale turbulence model
Chapter 3: Boltzmann-based model for tracking aerosol particles
  3.1 Equations of motion
Chapter 4: Numerical Simulation
  4.1 Implementation of the fluids model
  4.2 Parallelization of fluids model
  4.3 Simulation setup
    4.3.1 A square cylinder placed in a channel flow
    4.3.2 Square duct flow
Chapter 5: GPU Parallelization
  5.1 Introduction
  5.2 GPU parallelization and CUDA
  5.3 Implementation of the lattice Boltzmann model using CUDA
  5.4 Performance
Chapter 6: Results
  6.1 Wakes behind a square cylinder in a channel
    6.1.1 Flow validation
    6.1.2 Particle tracking
  6.2 Transport in a turbulent square duct
    6.2.1 Flow validation
    6.2.2 Particle tracking in a static probability field
      6.2.2.1 Visualization of preferential concentration
      6.2.2.2 Deposition rate
  6.3 GPU Boltzmann model simulation
    6.3.1 Square duct flow simulation comparison
    6.3.2 GPU Performance
Chapter 7: Conclusions
Bibliography
Appendices
  Appendix A: LB equation to Navier-Stokes equation
  Appendix B: Program code
    B.1 A square cylinder placed in 2D channel (Fortran)
    B.2 3D square duct flow (Fortran)
    B.3 3D square duct flow (CUDA)

List of Tables

Table 6.1 Comparison of Boltzmann model relaxation factor and relaxation time.
Table 6.2 Execution time per 1000 time steps and speedup for CPU and GPU implementations of particle tracking with static and evolving flow.

List of Figures

Figure 1.1 Mean secondary flows in a square duct. Note that the y-z plane is the cross-section and the axes are normalized by the width of the duct.
Figure 1.2 Particle deposition velocities from fully-developed turbulent pipe flow (Young and Leeming, 1997).
Figure 2.1 Hexagonal lattice model.
Figure 2.2 Lattice velocity directions of (a) D2Q9 and (b) D3Q19.
Figure 2.3 Schematic of the half-way bounce-back algorithm for the D2Q9 model.
Figure 3.1 Interpolation of probability distribution function from neighboring nodes.
Figure 4.1 Flow chart of the implementation of LBM.
Figure 4.2 Computational domain of a rectangular channel and a square cylinder.
Figure 4.3 Computational domain of the straight square duct with associated coordinate system.
Figure 5.1 A comparison of the growth of the computational power of GPUs and CPUs, adapted from Fig. 1 of the NVIDIA CUDA Programming Guide.
Figure 5.2 Schematic assembly of a CPU and a GPU.
Figure 5.3 GPU parallelized domain for square duct flow simulation.
Figure 5.4 Performance results of LBM calculation for various domain sizes. (a) Computational speed per time step. (b) Speedup of GPU versus CPU.
Figure 6.1 Instantaneous plot of a square cylinder placed in a channel flow at Re = 1000: (a) velocity vector plot, (b) streamline plot. Note that the x and y axes are normalized by the length of the square cylinder, B.
Figure 6.2 Inertial particle distribution behind a cylinder in a 2D channel.
Figure 6.3 Particle deposition efficiency with Stokes number.
Figure 6.4 Secondary flow vectors normalized by W in a square duct: (a) Contours of the mean streamwise velocity field in the y-z cross section. (b) Mean secondary velocity in the lower left quadrant of the duct.
Figure 6.5 Mean streamwise velocity profile at the centerline of the duct compared with previous DNS results and the law of the wall.
Figure 6.6 Root-mean-square (RMS) velocity fluctuations normalized by the friction velocity for fully developed turbulent duct flow compared to previous DNS duct and DNS channel results.
Figure 6.7 Mean streamwise velocity profile at the centerline of the duct for a higher resolution of 576 x 97 x 97 compared with a resolution of 448 x 75 x 75, DNS, and the law of the wall.
Figure 6.8 RMS velocity fluctuations for a higher resolution of 576 x 97 x 97 compared to a resolution of 448 x 75 x 75, DNS duct, and DNS channel results.
Figure 6.9 Instantaneous particle positions visualized in the x-z plane for $\tau_p^+ = 15$. Particles shown are within 30 wall units from the bottom wall.
Figure 6.10 Instantaneous particle positions visualized in the y-z plane for low inertia ($\tau_p^+ = 1$) at $t^+ =$ (a) 135, (b) 270, (c) 675, for an initial release height of $y_0^+ = 3$. Particles are visualized within 75 wall units from the center of the square duct.
Figure 6.11 Particle deposition rate plotted versus time for various values of particle inertia.
Figure 6.12 Deposition velocity versus particle inertia.
The dashed line represents the experimental data envelope for circular pipes compiled by Young and Leeming (1997). The open squares are the DNS results reported by Phares and Sharma (2006).
Figure 6.13 Particle deposition patterns averaged over four walls and along the stream-wise direction of the square duct.
Figure 6.14 Comparison of deposition velocities for the static and evolving fluids models.

Abstract

The transport and deposition of inertial particles in unsteady flows were investigated using a novel, computationally inexpensive particle tracking model. The flows are simulated using lattice Boltzmann methods, which evolve until computationally steady flows are obtained. Initially, a snapshot of the flow is stored, and the trajectories of particles are computed through the flow domain under the influence of this static probability field. Although the flow does not evolve computationally during the particle tracking simulation, the local velocity is obtained stochastically from the local probability function, thus allowing the dynamics of the turbulent flow to be resolved from the point of view of the suspended particles. Particle inertia is modeled using a relaxation parameter, based on the particle Stokes number, that allows a particle velocity history to be incorporated during each time step.

Computational simulation of particle-laden flow over a square cylinder placed in a 2D channel at a Reynolds number of Re = 1000 was performed. Particles having Stokes numbers greater than unity were not affected by the flow in the wake region, while particles with smaller Stokes numbers (St < 1) interact strongly with the wake vortices. The computed particle deposition on the front side of the cylinder compared well with previous numerical results. The deposition efficiency decreases with decreasing Stokes number, typical of an inertial impaction process.

The static probability model was also applied to track inertial particles through a turbulent square duct flow having a friction Reynolds number of Re = 300. Wall deposition rates and deposition patterns were obtained and exhibited a high level of agreement with previously obtained DNS computational results and experimental results for a wide range of particle inertia.

This Boltzmann-based model, which computes inertial particle motion from the local fluid probability distribution function, was parallelized efficiently on a GPU architecture, allowing for a turbulent square duct simulation in which the fluid probability field evolves during the particle tracking simulation. The computation of inertial particle tracking in the evolving three-dimensional turbulent duct flow is 134 times faster than sequential computation on a CPU. Extremely good agreement was found between the static probability field simulation and the evolving probability field simulation. These results suggest that accurate particle tracking through complex turbulent flows may be feasible given a suitable probability field, such as one obtained from a lattice Boltzmann simulation. This in turn presents a new paradigm for the rapid acquisition of particle transport statistics without the need for concurrent computations of fluid flow evolution.
Chapter 1: Introduction

1.1 Particle transport in turbulence

Inertial particle transport in turbulent flows is important to a wide range of processes, including cloud formation and evolution, powder processing, particle dispersion in confined flows, laser diagnostics of fluid flows, fuel injection, and turbulent coagulation. However, Lagrangian tracking of particles through such flows in a computation is complicated by the continuous range of lengthscales and timescales inherent to the flow. Accurate tracking of inertial particles in a turbulent flow simulation requires that the instantaneous fluid velocity components be known at every particle tracking time node. Even for the case of one-way coupling, a direct numerical simulation (DNS) having spatial resolution finer than the Kolmogorov scale is necessary for the simulation to provide sufficiently accurate velocity components. This limits accurate particle tracking to the moderate Reynolds numbers for which flow DNS may be applied.

Any sub-grid scale (SGS) model discards the local instantaneous fluid velocity components, combining them into a mean local energy or viscosity parameter. In such simulations, a particle tracking model may be applied on top of the local dissipation model. One common approach is to take the local fluid velocity to be equal to some velocity disturbance added to the mean fluid velocity (Wang and Squires, 1996; Salmanzadeh et al., 2010). The disturbance velocity is randomly selected from a distribution that depends upon the local turbulent viscosity or energy parameter.

Particle inertia determines the smallest size of turbulent eddy that affects a particle's transport. For example, passive tracers follow eddy streamlines even at the smallest scales. By contrast, the motion of very high inertia particles may be insensitive to all but the mean flow structures. Consequently, a Lagrangian particle tracking model requires increasingly more flow detail as particle inertia is reduced. However, Eulerian dispersion models may be applied for low-inertia particles, generally removing the need for an accurate turbulence model in many cases. Therefore, the transitional inertia range that falls between the regime dominated by inertia and the regime dominated by turbulent dispersion remains the most difficult to track accurately. For the specific case of a wall-bounded particle-laden turbulent flow, this transitional regime corresponds to particles whose transport is dominated by the combination of inertia and dispersion in the process known as turbophoresis, whereby particles are transported down the gradient of turbulence intensity (Young and Leeming, 1997).

The present study explores a novel, computationally inexpensive model that allows inertial particles to be tracked through a probability density field, where the equilibrium probability distribution is a Boltzmann distribution. Although the probability density field may evolve, such as in a lattice Boltzmann simulation, this study aims to explore the range of particle inertia that is relatively insensitive to the streaming and time evolution of the probability field. Note that this is different from a static velocity field, which would be represented by a mean flow field. The main goal is to develop a model that describes the motion of inertial particles through complex turbulent flows without the need for instantaneous velocity components.
An approximate discrete velocity distribution function is first solved using a lattice Boltzmann model for a steady turbulent flow, and then the particles are tracked as they move under the influence of the fluid velocity distribution.

1.2 Lattice Boltzmann methods using GPU processors

A primary benefit of lattice Boltzmann methods (LBM) is that the computation relies on a local system of equations, requiring calculation of only local and neighboring properties. This makes the algorithm very suitable for parallelized computation, especially on multi-core platforms. Liu et al. (2008) showed the LBM to be an attractive method for accelerating computation using the central processing unit (CPU) and the graphics processing unit (GPU).

The GPU was originally designed for computer-generated graphics, which require highly intensive and rapid computation. Since their introduction, the use of computer graphics hardware for general-purpose computation has been an active research area. The first machine to use a general-purpose GPU was the Ikonas (England, 1978), a programmable raster display system for cockpit instrumentation. Within the field of graphics applications, Rhoades et al. (1992) used a programmable GPU for procedural texturing and shading, and Cabral et al. (1994) used one for medical imaging with accelerated volume rendering. More recently, GPUs have also been used to accelerate non-graphical computations because of their tremendous computational speed at low cost and small size. Bolz et al. (2003) implemented a conjugate gradient solver for sparse and unstructured matrices and a multi-grid solver for regular grids on the GPU. They compared computational performance against a CPU and showed that the GPU implementation was almost two times faster. Buck et al. (2005) described a system for general-purpose computation on programmable graphics hardware. It includes a simple data-parallel construct so that the GPU can be used as a streaming processor. They applied the system to a variety of applications, including image segmentation, FFT, and ray tracing. Their system performed up to seven times faster than the CPU counterpart. In 2007, NVIDIA released an extended C programming model, called Compute Unified Device Architecture (CUDA), for ease of parallel programming on GPUs. CUDA facilitates parallelization of a massive number of computations using built-in libraries and toolkits with familiar programming languages like C, C++, and Fortran.

Many flow simulations through complex geometries or turbulent regions using LBM have been successfully implemented using CPU-based parallelization. Kandhai et al. (1998) demonstrated fluid flow in porous media using LBM. In that study, the code was parallelized based on the decomposition of the computational grid into equal sub-volumes. The computational cost of LBM simulations increased linearly with the number of fluid nodes and showed a 12% to 60% increase in computational speed. Desplat et al. (2001) described a versatile program for simulating complex fluid models using parallelized lattice Boltzmann code. The code allows for the simulation of a broad class of multicomponent, complex, or multiphase fluids with or without solid surfaces.

Because of the computational power, programmability, price, and size of GPUs, LBM computations have been parallelized on single GPU processors. Qiu et al. (2004) implemented a dispersion simulation and visualization for urban security based on a multiple relaxation time lattice Boltzmann model.
The computation was accelerated on the GPU, which also efficiently rendered buildings with small textures. Li et al. (2005) further accelerated the LBM on the GPU by grouping distribution packets from the Boltzmann equation into 2D textures. Other studies of LBM implementation on a GPU include the simulation of soap bubbles (Wei et al., 2004), real-time ink dispersion in absorbent paper (Chu and Tai, 2005), melting and flowing in a multiphase environment (Zhao et al., 2006), and simulation of miscible binary mixtures (Zhu et al., 2006). Fan et al. (2004) simulated the dispersion of airborne contaminants in the Times Square area of New York City as an example application. The computation time was 0.31 second/step for a computational size of 480 x 400 x 80 using 30 GPU nodes, and the execution time was 4.6 times faster than their CPU cluster implementation. Julien and Inanc (2009) described a Navier-Stokes solver for incompressible fluid flow using CUDA. They reported that a quad-GPU platform performed two orders of magnitude faster than a serial CPU implementation. Two-dimensional and three-dimensional LBM simulations have also been accelerated on GPUs using CUDA by Astorino et al. (2011) and Tolke and Krafczyk (2008). Their results demonstrated that desktops with GPUs can serve as cost-effective small parallel computing platforms that substantially accelerate LBM simulations.

1.3 Simulation of fluid and particle tracking

The most accurate computational method for evaluating instantaneous turbulent velocities is direct numerical simulation (DNS), which for a Newtonian fluid involves solving the Navier-Stokes equations,

$$\frac{\partial U_i}{\partial x_i} = 0 \qquad (1.1)$$

$$\frac{\partial U_i}{\partial t} + U_j \frac{\partial U_i}{\partial x_j} = -\frac{1}{\rho}\frac{\partial P}{\partial x_i} + \nu\,\frac{\partial^2 U_i}{\partial x_j \partial x_j} \qquad (1.2)$$

A sufficiently large number of nodes is required to resolve the Kolmogorov scales of the turbulent flow. DNS is a good tool for supplying information that cannot be extrapolated from experiments, but it is limited to relatively low Reynolds numbers due to the computational expense associated with having to resolve the smallest lengthscales.

High Reynolds number flows may be modeled using the more computationally inexpensive large eddy simulation (LES). LES computes the large, energy-containing eddies, while sub-grid scale (SGS) models describe the effects of the smallest scales. A filtering operation is introduced to separate the large and small scales, decomposing the velocity into the sum of a filtered velocity $\bar{U}_i$ and an SGS velocity $U_i'$:

$$U_i = \bar{U}_i + U_i' \qquad (1.3)$$

$$\bar{U}_i(x) = \int G(x, y)\, U_i(y)\, dy \qquad (1.4)$$

where $G$ is a filter function. The most commonly used filters are the top-hat filter, the Fourier cutoff filter, and the Gaussian filter. The incompressible LES-filtered Navier-Stokes equations are

$$\frac{\partial \bar{U}_i}{\partial x_i} = 0 \qquad (1.5)$$

$$\frac{\partial \bar{U}_i}{\partial t} + \bar{U}_j \frac{\partial \bar{U}_i}{\partial x_j} = -\frac{1}{\rho}\frac{\partial \bar{P}}{\partial x_i} + \nu\,\frac{\partial^2 \bar{U}_i}{\partial x_j \partial x_j} - \frac{\partial \tau_{ij}}{\partial x_j} \qquad (1.6)$$

Overbars denote application of the filtering operation, and $\tau_{ij}$ is the SGS stress tensor. Typically, the filtered fluid velocity is applied in the particle's equation of motion.

The description of the particle's motion accounts for inertia due to the density difference between the particles and the fluid. The particle equation of motion can be expressed as

$$\frac{dX_i}{dt} = U_i^P \qquad (1.7)$$

where $U_i^P$ is the particle velocity. Considering a particle as a rigid sphere, the particle motion is governed by the force balance equation (Maxey and Riley, 1983). The forces include Stokes drag, pressure drag, virtual or added mass, Basset history, lift, and gravity. Most of these terms can be neglected depending on the particle inertia, and in the simplest case only the Stokes drag is considered:

$$\frac{dU_i^P}{dt} = \frac{U_i - U_i^P}{\tau_p} \qquad (1.8)$$

The particle relaxation time $\tau_p$ can be written as

$$\tau_p = \frac{(\rho_p/\rho)\, d_p^2}{18\,\nu} \qquad (1.9)$$

where $d_p$ is the particle diameter, $\nu$ is the fluid kinematic viscosity, and $\rho_p/\rho$ is the ratio of particle to fluid density.
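As a concrete illustration of Eq. 1.8, the sketch below advances one particle velocity through a single time step using the exact exponential solution of the Stokes drag equation, assuming the fluid velocity is constant over the step. This is a minimal sketch written for this text, not the dissertation's code from Appendix B; the variable names and the choice of the exact (exponential) update rather than a finite-difference one are assumptions made here for clarity.

    ! Minimal sketch: one Stokes-drag update of a particle velocity over a
    ! time step dt, assuming the local fluid velocity uf is constant over dt.
    ! Uses the exact solution of Eq. 1.8; names are illustrative.
    module stokes_drag
      implicit none
    contains
      pure function update_particle_velocity(up_old, uf, dt, tau_p) result(up_new)
        real(8), intent(in) :: up_old(3)  ! particle velocity at previous step
        real(8), intent(in) :: uf(3)      ! local fluid velocity
        real(8), intent(in) :: dt         ! time step
        real(8), intent(in) :: tau_p      ! particle relaxation time, Eq. 1.9
        real(8) :: up_new(3)
        real(8) :: decay
        decay = exp(-dt / tau_p)
        ! The particle velocity relaxes exponentially toward the fluid velocity.
        up_new = uf + (up_old - uf) * decay
      end function update_particle_velocity
    end module stokes_drag

The particle position then follows from Eq. 1.7 by integrating the particle velocity over the step.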
Particles may be tracked through a fluid computation using the Lagrangian or Eulerian frame of reference. In general, the Lagrangian approach captures more of the fundamental physics involved in particle trajectories with less conceptual modeling. However, it carries a much higher computational cost than Eulerian modeling. Most studies assume rigid, spherical particle motion and deposition from fully developed turbulent flow. Gravity, thermal gradients, electrical fields, and interactions between particles are commonly ignored. Particles are also assumed not to affect the turbulent flow, and not to bounce, detach, or re-entrain once they contact a wall (Sippola and Nazaroff, 2002).

1.3.1 Lagrangian simulations

The Lagrangian approach focuses on the motion of a single particle in the turbulent flow. The computation involves summing all forces acting on each particle. Determination of individual particle motion can provide more direct information about particle trajectories, including how deposition occurs and how particles interact with turbulent eddies.

A number of studies that employ a Lagrangian model to track particles in wall-bounded turbulent flow have been reported. Pedinotti et al. (1992) applied DNS to examine vortical structures and the resulting particle transport in wall-bounded turbulent flows. That study showed that particles can become preferentially concentrated in low-speed streaks in the near-wall region. McLaughlin (1989) reported the first Lagrangian simulation focusing on particle deposition in wall-bounded turbulent flows by DNS. He demonstrated particle accumulation in the near-wall region prior to deposition. His computed particle deposition velocities agree well with experimental data, and he provided information on particle Reynolds numbers, particle velocities, and near-wall particle concentrations that had not been obtained from previous experiments or simulations. Brooke et al. (1992) reported Lagrangian particle tracking, considering only the drag force, in a vertical channel flow generated by DNS. That study also noted accumulation of particles near the wall and transport into the low-speed streak regions. It was also concluded that near-wall vortices directly cause particle deposition, and that particles impact the wall at high velocity compared to the local mean fluid velocity. Chen and McLaughlin (1995) simulated a DNS-generated vertical channel flow using a particle equation of motion that included a wall-corrected drag and the lift force. The lift force was corrected for high Reynolds numbers and for the presence of the wall. Because of the decreased lift and increased drag force near the wall, their deposition rates were lower than other numerical simulation results.

A number of researchers have studied particle transport in turbulent channel flows using a sub-grid scale (SGS) model to overcome the computational limitations of DNS, which is generally restricted to lower Reynolds numbers and shorter simulation times as the Kolmogorov scale becomes excessively small.
Wang and Squires (1996) suggested using random numbers sampled from a Gaussian distribution to simulate the local fluctuating velocities. The local velocity experienced by a particle is taken to be the sum of the mean fluid velocity and the SGS fluctuation velocity. Their results agree well with previous experimental and DNS results. However, results for smaller particles deviated most from the DNS results, likely due to the sensitivity of smaller inertial particle trajectories to even the smallest-scale eddies in the flow. The smallest eddies are modeled only approximately in LES, while they are in principle completely resolved in a DNS with suitable resolution. Uijttewaal and Oliemans (1996) studied larger particles (in the inertia-moderated regime) in vertical tube flows using LES and compared their results to DNS results. In that study, the Stokes drag force in the particle equation of motion was modified to account for large particle Reynolds numbers. Particle deposition results showed good agreement with experimental data from Liu and Agarwal (1974). The study resolved the trend of decreasing deposition velocity with increasing particle size, and of increasing deposition velocity with Reynolds number. Larger particles are not significantly influenced by turbulent fluctuations near the wall, but rather by the larger, more energetic eddies near the turbulent core, which throw higher-inertia particles through the boundary layer and into the wall. More recently, Salmanzadeh et al. (2010) showed that the inclusion of an SGS disturbance significantly affects the deposition rate of small particles compared with the corresponding mean-flow simulations. Kuerten (2005) also modeled the turbophoresis phenomenon, and Winkler et al. (2003) studied preferential concentration of particles in a fully developed turbulent square duct flow using an SGS model. The latter study confirmed that particles accumulate in regions of high strain rate, low swirling strength, and low vorticity in the square duct geometry.

1.3.2 Eulerian models

The Eulerian method treats the particles and the flow as separate continuous phases. Instead of computing the individual motions of particles, the overall behavior of an ensemble of particles is calculated. The Eulerian approach computes particle concentration profiles by solving conservation equations for the particle phase and the volume-averaged concentration (Sippola and Nazaroff, 2002). Particle dynamics in a turbulent flow have been modeled using an Eulerian approach particularly in Gaussian plume models (Turner, 1994), which continue to play a large role in urban airshed modeling of pollutant transport (Heinsohn and Kabel, 1999). Such a model simulates the dispersion of a process gas or aerosol stream exiting an exhaust stack of some height and diameter, driven by its momentum and buoyancy. Gauss's equation is used to describe the behavior of plumes and their dispersion in the vertical and horizontal directions. Dispersion in the vertical direction depends on the atmospheric conditions (i.e., the relative stability of the surrounding air); by contrast, horizontal dispersion largely depends on molecular and eddy diffusion. Though computationally efficient, Eulerian approaches do not resolve individual particle trajectories, and this makes studying particle phenomena such as deposition and impaction difficult.
The widespread usage of Eulerian models for atmospheric analyses underscores the need for better and more computationally efficient Lagrangian models that may be applied to large-scale atmospheric processes.

1.3.3 Particle deposition in wall bounded turbulent flows

One of the earliest analyses of particle deposition in a turbulent flow was reported by Friedlander and Johnston (1957). Their so-called free-flight model describes deposition of inertial particles as a penetration through the near-wall region initiated by flow structures farther from the wall. This replaced the more conventional thinking at the time that deposition was a purely diffusive process. Friedlander and Johnston equated a particle's stopping distance to its distance of free flight to the wall. The particle stopping distance is defined as the total distance that a particle will travel in a stagnant fluid given some initial velocity; under pure Stokes drag (Eq. 1.8 with zero fluid velocity), this distance is $s = \tau_p V_0$ for an initial velocity $V_0$.

Many researchers have modified this free-flight model. For example, instead of a constant value for the initial free-flight velocity, Davies (1966) recommended the local root-mean-square of the wall-normal velocity fluctuation of the turbulent flow. He also included the particle radius in the capture distance and the effects of Brownian diffusion for smaller particles. Liu and Ilori (1974) suggested that the particle eddy diffusivity and the fluid eddy viscosity are not the same, as had previously been assumed; for larger particles, they took the particle eddy diffusivity to be greater than the eddy viscosity. This modification, with Brownian diffusion neglected, showed better agreement than previous free-flight models.

Caporaloni et al. (1975) first recognized the effect of turbophoresis on particle transport near boundaries. They included turbophoresis along with Brownian and turbulent diffusion in a solution of the particle mass conservation equation, to account for the particle flux due to a turbophoretic velocity. Young and Leeming (1997) and Guha (1997) described particle motion in turbulent flows using particle mass and momentum conservation equations to compute the particle convective velocity. The two models are essentially the same but use different coordinate systems: Young and Leeming used radial coordinates, while Guha used Cartesian coordinates. These are the only reported Eulerian models that simulate the accumulation of particles near the wall region.

1.4 Wakes behind a square cylinder in a channel

Tracking aerosols around a bluff body is important for gaining a fundamental understanding of particle impaction and deposition phenomena. A square cylinder in a channel flow, with fixed separation points, is commonly used as a simple model for simulating a liquid-fuel spray combustor (Jacobs and Armstrong, 2009). The main parameters that affect the flow and particle motion are the Reynolds number, the ratio of cylinder width to channel height, and the particle Stokes number. The presence of the square cylinder creates flow separation and an unsteady wake behind the body. The flow contains an asymmetric vortex structure behind the cylinder, a stagnation region in front of the cylinder, and regions of flow acceleration and deceleration (Jafari et al., 2010).

Longmire and Eaton (1992) experimentally investigated the transport of particles in vortex structures. They visualized an axisymmetric round jet laden with 55 μm glass beads using laser-Doppler velocimetry, and found that dispersion is dominated by convection rather than diffusion.
Preferential concentration of particles is observed near the vortex ring structures, and the particles tend to accumulate in the regions between vortices. Lazaro and Lasheras (1992) experimentally observed fast entrainment of small droplets and slower entrainment of larger droplets because of Stokes number effects. They also concluded that the particle concentration field is related to the streaky nature of the large-scale structures. Uthuppan et al. (1994) reported a numerical simulation of the dynamics and dispersion of particles in the near field of a high-velocity transitional axisymmetric jet. They showed that small particles are trapped in the vortical structures, large particles pass through them, and intermediate-sized particles tend to accumulate between them. Chang and Kailasanath (1996) studied particle dynamics in a confined high-speed shear flow. They used a flux-corrected transport algorithm and a Lagrangian model for particle tracking to show that dispersion was affected by the shedding frequency even at downstream locations. They also showed that high concentrations of particles are associated with high vorticity for small particles, and with low vorticity for intermediate to large particles. Crowe et al. (1988) numerically simulated particles in the wake of a bluff body and confirmed that particles tend to accumulate along the edges of vortex structures. They studied Stokes numbers of 0.01, 1.0, and 100 and determined that particles with a Stokes number of unity exhibit the strongest preferential concentration. Brandon and Aggarwal (2001) numerically investigated particle deposition on a square cylinder placed in a rectangular channel in unsteady vertical flow. They analyzed particle transport for a range of Stokes numbers and showed that the particle deposition efficiency increased with Stokes number up to St = 3. The Reynolds number did not affect particle deposition, but it did affect particle dispersion. Each of these studies showed that particle motion depends on the Stokes number, which directly governs the interaction of particles with vortex structures.

Recently, Rowghani et al. (2010) and Perumal et al. (2012) modeled viscous flow over a square cylinder using the lattice Boltzmann method. They demonstrated the capability of the lattice Boltzmann method to capture vortex shedding phenomena and the effects of Reynolds number, blockage ratio, and channel length. Jafari et al. (2010) combined an LBM simulation with Lagrangian particle tracking in a channel with a square cylinder obstruction. They studied the influence of vortex shedding on particle deposition, dispersion, and local concentration in the channel. They concluded that particle deposition on the front side of the obstruction decreases as the Stokes number decreases, and that particles with a Stokes number of unity exhibit the strongest preferential concentration, while particles with a Stokes number of 10 do not interact strongly with the wake vortices.

1.5 Turbulent flow in a square duct

A square duct is an interesting geometry in which to track inertial particles due to the combination of secondary flows and turbophoresis. Prandtl (1926) was the first to classify the turbulence-driven secondary flows created by turbulent velocity fluctuations.
Even though secondary flows are only 1% to 3% of the streamwise bulk velocity in magnitude, they significantly affect the mean flow and turbulent structures, including the wall shear stress distribution, the heat transfer rate, and the transport of passive tracers (Huser and Biringen, 1993). Brundrett and Baines (1964) experimentally measured the velocity field and Reynolds stresses in a square duct. They indicated that secondary flows are caused by gradients in the Reynolds stresses. The secondary flows penetrate the corners and approach the walls as the Reynolds number increases; however, the general secondary flow pattern is independent of the Reynolds number. Demuren and Rodi (1984) performed calculations of flow in straight ducts and developed algebraic expressions for the Reynolds stresses by simplifying previous models and retaining the gradients of the secondary velocities. Predictions of mean flow and turbulence quantities were accurate, but the secondary flow velocities were lower than expected. Kajishima and Miyake (1992) applied an eddy viscosity model to a square duct flow. The secondary flows were well recovered due to the imbalance between the gradient of the turbulence stress and the corresponding pressure in the near-corner regions.

Simulations of turbulent flows in a square duct have also been reported using DNS and LES methods. One of the first studies to report LES of a turbulent square duct flow was Madabhushi and Vanka (1991). They used a friction Reynolds number of 360 and a 65 x 65 x 32 grid in the x, y, and z directions. It was observed that the production of mean streamwise vorticity is caused by both Reynolds shear and normal stresses. Xu and Pollard (2001) used an LES model to simulate the turbulent flow in a straight square annular duct at a friction Reynolds number of 200. They developed a relationship between the averaged streamwise velocity and the distance from a concave corner along the corner bisector using a curve-fitting method, based on LES results in a square annular duct and DNS results in a square duct. The mean secondary flow appears symmetrically around the convex corner as a chain of strong, counter-rotating vortex pairs; a weak vortex pair is symmetrically located around the concave corner.

Figure 1.1 Mean secondary flows in a square duct. Note that the y-z plane is the cross-section and the axes are normalized by the width of the duct.

Gavrilakis (1993) used DNS to compute the mean flow and turbulent statistics of a fully developed turbulent flow in a square duct. The Reynolds number based on the friction velocity and hydraulic diameter was 300, and 16,100,000 grid points were used for the computation. According to Gavrilakis, secondary flow velocity vectors constructed from the averaged spanwise velocity move from the center of the duct to each corner along the corner bisector, as shown in Fig. 1.1. At the duct corners, vortices are established by the interaction of the vertical and horizontal boundary layers. Since weak shear stresses exist close to the corners while strong shear stresses exist in the center of the duct, fluid momentum is transferred from the core of the duct toward the corners.

Huser and Biringen (1993) performed a DNS of turbulent flow in a square duct at Re = 600, based on the mean friction velocity and duct width. Turbulence statistics along the wall bisector from the fully developed turbulent field compared well with experimental and numerical results.
Strong turbulence production occurred near the wall bisector, with weak production at the corner bisector. This produces positive and negative convection of the mean streamwise velocity at these respective locations. A mean secondary flow from the core of the duct to the corner is created by dominant turbulence structures in the flow. Dominant ejection structures are produced during a bursting event and are composed of two streamwise counter-rotating vortices. The corner bisector has reduced mean shear, which prohibits ejections from occurring.

1.6 Turbophoresis and particle deposition

Turbophoresis is the tendency of particles to move in the direction of decreasing turbulence intensity. Turbophoresis causes particles to concentrate near walls, where turbulence decreases to zero. This process is inertia-dependent and directly results in an increase in deposition rate with particle inertia for the particles that are most influenced by turbophoresis (i.e., the diffusion-impaction regime in Fig. 1.2).

Fig. 1.2 depicts experimentally measured particle deposition velocities as a function of particle inertia for turbulent flows in a circular pipe. The dimensionless deposition velocity is equal to the particle flux to the walls divided by the particle number concentration in the pipe.

Figure 1.2 Particle deposition velocities from fully-developed turbulent pipe flow (Young and Leeming, 1997).

Particle deposition velocities in Fig. 1.2 are divided into three regimes: the diffusional deposition regime, the diffusion-impaction regime, and the inertia-moderated regime. Smaller particles are less likely to be driven inertially through the boundary layer or to accumulate along the wall by turbophoresis; thus they exhibit a low and constant deposition rate. By contrast, the deposition velocity increases by four orders of magnitude in the diffusion-impaction regime. Particles move from the core to the wall by turbophoresis and are deposited by turbulent fluid fluctuations that penetrate the boundary layer. For the larger, higher-inertia particles that comprise the inertia-moderated regime, the deposition rate is high and less dependent upon particle inertia. In this regime, particles acquire enough momentum from turbulent fluctuations in the duct core to completely penetrate the boundary layer and deposit on the walls (Young and Leeming, 1997). Phares and Sharma (2006) applied a DNS of turbulent square duct flow to show that a combination of secondary flow and turbophoresis causes inertial particles to deposit on the walls of a square duct.

In this study, a computational model is developed using the lattice Boltzmann method for the fluid simulation, and particle trajectories are computed using a model that allows particles in the computation to respond to the probability distribution function that drives the lattice Boltzmann model. This computationally inexpensive model will be validated by comparing particle deposition results with experiments and other numerical simulations.

Chapter 2: Fluids Modeling

2.1 Lattice Boltzmann method

In this study, the turbulent flows are numerically solved using the lattice Boltzmann method (LBM). The LBM was originally created to model flows governed by the Navier-Stokes equations (Gunstensen et al. 1991; Chen et al. 1992; Alexander et al. 1993; Inamuro et al. 1995; He et al. 1997; Chen and Doolen 1998; Lallemand and Luo 2000; Wolf-Gladrow 2000).
More recently, the LBM has developed into a promising alternative among computational fluid dynamics (CFD) techniques (Dawson et al. 1993; Martinez et al. 1994; He et al. 1998; Kang et al. 2002; Lammers, 2006). Over the last decade, it has proven to be a reliable and efficient method for the numerical simulation of single and multiphase fluids. The method is based on statistical physics and models the fluid flow by tracking the evolution of the distribution functions of the fluid particles in phase space; it can therefore be considered a kinetic theory approach. One major advantage of the lattice Boltzmann method over other fluid dynamics simulation techniques is its ability to incorporate interaction terms directly into the equations of motion. This makes it possible to deal with complex geometries and fluid mixtures, and to incorporate turbulence models.

2.2 The Boltzmann and lattice Boltzmann equations

The lattice Boltzmann method evolved from the lattice gas cellular automata (LGCA), which were developed to simulate gas flows using discrete dynamics, with conservation laws applied at microscopic levels. Frisch et al. (1986) applied a hexagonal lattice model to simulate a hydrodynamic system. In that scheme, a particle at a lattice node can move to a neighboring node in one of 6 lattice directions, as shown in Fig. 2.1.

Figure 2.1 Hexagonal lattice model.

The particle occupation is defined by Boolean variables $n_i(x, t)$, where $i$ ranges from zero to the number of discrete velocity directions (see Fig. 2.1) at each node. The configuration of particles evolves by streaming and collision at each time step. Each particle moves to the nearest node in its velocity direction. If particles collide at a single node, their velocity directions are changed. The evolution of a particle having velocity $c_i$ at position $x$ and time $t$, moving to $x + c_i$ over the next time step $t + 1$, can be expressed as

$$n_i(x + c_i, t + 1) = n_i(x, t) + C_i \qquad (2.1)$$

where $C_i$ is a collision function. The left-hand side of Eq. 2.1 can be expanded as

$$n_i(x + c_i, t + 1) = n_i(x, t) + \frac{\partial n_i}{\partial t} + c_i \cdot \nabla n_i + O(\Delta t^2) \qquad (2.2)$$

Substituting Eq. 2.2 into Eq. 2.1 and neglecting higher-order terms, one obtains

$$\frac{\partial n_i}{\partial t} + c_i \cdot \nabla n_i = Q_i \qquad (2.3)$$

A major disadvantage of this approach is the statistical noise associated with the Boolean occupation number. In order to reduce the statistical noise through statistical averaging of the dynamics, the Boolean occupation number $n_i$ in a lattice direction is replaced with a corresponding ensemble-averaged population, $f_i$. Each component $f_i$ is a discretized probability distribution function and represents the probability of a particle moving in the $i$th direction at position $x$. Now, Eq. 2.3 becomes the Boltzmann equation,

$$\frac{\partial f_i}{\partial t} + u_i \cdot \nabla f_i = \Omega_i \qquad (2.4)$$

where the collision operator is $\Omega_i = C_i / \Delta t$. The Boltzmann equation, Eq. 2.4, describes the evolution of the particle distribution function $f(x, u, t)$, which is the probability of a particle (physically corresponding to a molecule) having a given position and momentum. The collision operator, $\Omega$, describes the interaction of molecules due to collisions. The BGK (Bhatnagar-Gross-Krook) model simplifies the collision operator by introducing a single relaxation time $\tau$ toward the local equilibrium. The simplest form of the BGK model can be written as

$$\frac{\partial f}{\partial t} + u \cdot \nabla f = \Omega_{BGK} = -\frac{1}{\tau}\left(f - f^{eq}\right) \qquad (2.5)$$

where $f^{eq}$ is the equilibrium distribution function (Bhatnagar et al., 1954). The discretized Boltzmann equation along with the BGK approximation is the lattice Boltzmann equation.
A finite set of velocities replaces the continuous spectrum of velocities, and the particle distribution function is discretized in this velocity space. The discretized equation is given by

$$f_i(x + \Delta x_i, t + \Delta t) = f_i(x, t) - \frac{\Delta t}{\tau}\left(f_i(x, t) - f_i^{eq}(x, t)\right) \qquad (2.6)$$

The discretization of space is represented by the lattice or grid, and the relation $\Delta x_i = u_i \Delta t$ establishes the link with the discrete velocity set $\{u_i\}$. The lattice Boltzmann equation simplifies to

$$f_i(x + u_i \Delta t, t + \Delta t) = f_i(x, t) - \frac{1}{\tau}\left(f_i(x, t) - f_i^{eq}(x, t)\right), \quad i = 0, 1, \ldots, M \qquad (2.7)$$

The particle movement is restricted to a limited number of directions. For instance, three-dimensional models with 19 velocity vectors (18 associated with a moving particle and 1 associated with a particle at rest) are commonly denoted D3Q19, as shown in Fig. 2.2(b). Alternatives are models with 15 or 27 velocities (D3Q15 or D3Q27). However, the D3Q27 model has no clear advantages over the D3Q19 model, while the D3Q15 model exhibits decreased stability (Chen and Doolen, 1998).

Figure 2.2 Lattice velocity directions of (a) D2Q9 and (b) D3Q19.

The 9 and 19 dimensionless lattice velocities $e_i = u_i / c$ in two- and three-dimensional space, respectively, can be written as

$$e_i = \begin{cases} (0, 0) & i = 1 \\ (\pm 1, 0),\ (0, \pm 1) & i = 2 \sim 5 \\ (\pm 1, \pm 1) & i = 6 \sim 9 \end{cases} \quad \text{for D2Q9}$$

$$e_i = \begin{cases} (0, 0, 0) & i = 1 \\ (\pm 1, 0, 0),\ (0, \pm 1, 0),\ (0, 0, \pm 1) & i = 2 \sim 7 \\ (\pm 1, \pm 1, 0),\ (\pm 1, 0, \pm 1),\ (0, \pm 1, \pm 1) & i = 8 \sim 19 \end{cases} \quad \text{for D3Q19}$$

where $c = \Delta x / \Delta t = 1$ is the lattice speed. Typically, the basic unit for lattice spacing, $\Delta x$, and the discrete time unit, $\Delta t$, are defined in terms of the physical speed of sound. However, using the true speed of sound can lead to unacceptably short time steps for small-scale flows. The speed of sound in the lattice Boltzmann method is instead calculated from a lattice Mach number that is much larger than the real Mach number, compensating for this by raising the viscosity in order to preserve the Reynolds number (Succi, 2001). After replacing the finite set of velocities $\{u_i\}$ with the lattice velocities $\{e_i\}$, the lattice Boltzmann equation finally becomes

$$f_i(x + e_i \Delta t, t + \Delta t) = f_i(x, t) - \frac{1}{\tau}\left(f_i(x, t) - f_i^{eq}(x, t)\right) \qquad (2.8)$$

2.3 Equilibrium distribution function

The local equilibrium distribution can be expressed from the Boltzmann velocity distribution function:

$$f^{eq} = \frac{\rho}{(2\pi RT)^{D/2}} \exp\left(-\frac{(v - u)^2}{2RT}\right) \approx \frac{\rho}{(2\pi RT)^{D/2}} \exp\left(-\frac{v^2}{2RT}\right)\left[1 + \frac{v \cdot u}{RT} + \frac{(v \cdot u)^2}{2(RT)^2} - \frac{u^2}{2RT}\right] \qquad (2.9)$$

where $R$ is the universal gas constant, $T$ is the absolute temperature, and $D$ is the dimension of the space. Introducing $c = \sqrt{3RT}$, the equation yields the final form

$$f_i^{eq} = \rho\, w_i \left[1 + \frac{3\,(e_i \cdot u)}{c^2} + \frac{9\,(e_i \cdot u)^2}{2c^4} - \frac{3\,u^2}{2c^2}\right] \qquad (2.10)$$

in which the constant weight coefficients $w_i$ are

$$w_1 = \tfrac{4}{9};\quad w_{2 \sim 5} = \tfrac{1}{9};\quad w_{6 \sim 9} = \tfrac{1}{36} \quad \text{for D2Q9}$$

$$w_1 = \tfrac{1}{3};\quad w_{2 \sim 7} = \tfrac{1}{18};\quad w_{8 \sim 19} = \tfrac{1}{36} \quad \text{for D3Q19}$$

Using these weight coefficients, the macroscopic fluid density $\rho$ and velocity $u$ can be computed from summations of the discrete particle distribution functions as follows:

$$\rho = \sum_i f_i; \qquad u = \frac{1}{\rho}\sum_i f_i\, e_i \qquad (2.11)$$
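To make Eqs. 2.10 and 2.11 concrete, the following sketch computes the D2Q9 equilibrium distributions at one node from the local density and velocity, with lattice speed c = 1. It is a minimal illustration written for this text, not the code of Appendix B; the lattice ordering and names are assumptions.

    ! Minimal D2Q9 sketch: equilibrium distributions (Eq. 2.10) at one node,
    ! with lattice speed c = 1. Illustrative only; not the appendix code.
    module d2q9_equilibrium
      implicit none
      ! D2Q9 lattice velocities e_i and weights w_i (i = 1 is the rest particle)
      integer, parameter :: ex(9) = [0, 1, 0, -1, 0, 1, -1, -1, 1]
      integer, parameter :: ey(9) = [0, 0, 1, 0, -1, 1, 1, -1, -1]
      real(8), parameter :: w(9) = [4d0/9d0, 1d0/9d0, 1d0/9d0, 1d0/9d0, 1d0/9d0, &
                                    1d0/36d0, 1d0/36d0, 1d0/36d0, 1d0/36d0]
    contains
      pure function feq(rho, ux, uy) result(f)
        real(8), intent(in) :: rho, ux, uy   ! local density and velocity
        real(8) :: f(9), eu, usq
        integer :: i
        usq = ux*ux + uy*uy
        do i = 1, 9
          eu = ex(i)*ux + ey(i)*uy           ! e_i . u
          f(i) = rho * w(i) * (1d0 + 3d0*eu + 4.5d0*eu*eu - 1.5d0*usq)
        end do
      end function feq
    end module d2q9_equilibrium

The moments of Eq. 2.11 invert this construction: summing f over the nine directions recovers the density, and summing f(i) times e_i recovers the momentum.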
The basic computational algorithm consists of a streaming step and a collision step on the lattice grid. The streaming step describes the movement along the particle directions, while the collision step represents the relaxation to the local equilibrium values. The streaming step is implemented by propagating the distribution functions associated with each node to its neighboring nodes $x + e_i \Delta t$. The collision step can then be computed by evaluating the right-hand side of Eq. 2.8, with the equilibrium distribution obtained from Eqs. 2.9 and 2.10. These simple steps, with the use of a regular grid, make the LBM computation easy to apply and very efficient.

2.4 Forcing term

The lattice Boltzmann equation, as written in Eq. 2.8, lacks a forcing term. In general, many applications must consider interactions between particles, an external force field, or an electric field. The turbulent flows considered in this thesis are driven by a constant pressure gradient:

$$F = -\frac{dp}{dx} \qquad (2.12)$$

In the presence of the external force $F$, the lattice Boltzmann equation is written

$$f_i(x + e_i \Delta t, t + \Delta t) = f_i(x, t) - \frac{1}{\tau}\left(f_i(x, t) - f_i^{eq}(\rho, v)\right) + \Delta t\, G_i \qquad (2.13)$$

Compared to Eq. 2.8, a new equilibrium velocity, $v$, is used instead of the macroscopic flow velocity, $u$, when calculating the equilibrium distribution function. Also, an additional forcing term $\Delta t\, G_i$ is added to the collision function. These changes can be implemented by one of two approaches: kinetic theory at the continuous Boltzmann equation level (Martys et al., 1998; Shan et al., 2006), or a Chapman-Enskog expansion that matches the corresponding macroscopic equation to the Navier-Stokes equation (Guo et al., 2002). A number of methods exist that specify the forcing term and the associated numerical stability. In this study, we use the method presented by Ginzbourg and Adler (1994), where

$$v = u + \frac{\tau F}{\rho} \qquad (2.14)$$

$$G_i = \frac{w_i}{c_s^2}\, F \cdot e_i \qquad (2.15)$$

Here, $u$ is defined as before: $u = \frac{1}{\rho}\sum_i f_i\, e_i$. The fluid momentum is calculated from the average of the momentum before the collision and the momentum after the collision:

$$\rho u = \sum_i f_i\, e_i + \frac{F \Delta t}{2} \qquad (2.16)$$

2.5 Boundary conditions

The selection of boundary conditions in lattice Boltzmann methods affects the numerical accuracy and stability of the computation. Over the last two decades, many boundary conditions have been proposed and investigated for a variety of simulations (see, for example, Chen et al., 1996; He et al., 1997; Maier et al., 1996; Noble et al., 1995; Skordos, 1993; Ziegler, 1993; Zou and He, 1997). The simplest boundaries are periodic, in which edges are treated as if they are attached to the opposite edges. Periodic boundaries are usually applied at the open ends of a physical channel or duct. A suitable boundary condition at a solid boundary is the no-slip wall, or bounce-back, boundary condition. As illustrated in Fig. 2.3, an incoming fluid particle from a fluid node is reflected at the solid node back along the direction from which it came.

Figure 2.3 Schematic of the half-way bounce-back algorithm for the D2Q9 model.

This bounce-back scheme results in no tangential velocity along the fluid-wall interface, and can be imagined as reflecting the particle distribution functions at the boundary. There are two kinds of bounce-back boundary conditions. The first is called full bounce-back, which assumes that the wall is located at the solid nodes. The second is half-way bounce-back, where the wall is located half-way between the fluid and solid nodes. According to Ziegler (1993), the half-way bounce-back condition has second-order accuracy, while full bounce-back provides first-order accuracy.
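As an illustration of the bounce-back rule, the sketch below applies it at one fluid node adjacent to a wall for the D2Q9 lattice: each post-collision distribution that would stream into a solid node is instead returned to the same node along the opposite direction. This is a schematic fragment written for this text, using the lattice ordering assumed in the previous sketch, and is not the dissertation's implementation.

    ! Bounce-back sketch for D2Q9: post-collision distributions that would
    ! stream into a solid neighbor are returned along the opposite link.
    ! opp(i) gives the direction opposite to i in the assumed ordering.
    subroutine bounce_back(f_post, is_solid_neighbor)
      implicit none
      real(8), intent(inout) :: f_post(9)            ! distributions at one node
      logical, intent(in)    :: is_solid_neighbor(9) ! neighbor in direction i solid?
      integer, parameter :: opp(9) = [1, 4, 5, 2, 3, 8, 9, 6, 7]
      real(8) :: fin(9)
      integer :: i
      fin = f_post
      do i = 2, 9
        if (is_solid_neighbor(i)) f_post(opp(i)) = fin(i)
      end do
    end subroutine bounce_back

In a full code, the streaming step must then skip links that cross into solid nodes, so that the reflected values set here are not overwritten.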
2.6 Sub-grid scale turbulence model

Turbulence is simulated in the lattice Boltzmann model in a way that is similar to a conventional sub-grid model. Analogous to the varying viscosity in those models, a varying relaxation parameter (i.e., $\tau$ in Eq. 2.8) is used in turbulent LBM models. In the present study, we use the Smagorinsky sub-grid model, in which the Reynolds stress tensor depends on the local strain rate through the stress tensor. The stress tensor is used to compute the eddy viscosity, $\nu_{edd}$. The total viscosity is the sum of the physical and eddy viscosities:

$$\nu_{total} = \nu + \nu_{edd} \qquad (2.17)$$

According to Hou et al. (1996), the eddy viscosity and total viscosity can be expressed as

$$\nu_{edd} = (C_s \Delta)^2\, |S| \qquad (2.18)$$

$$\nu_{total} = \nu + (C_s \Delta)^2\, |S| \qquad (2.19)$$

where $|S|$ is the intensity of the local strain tensor, computed as

$$|S| = \frac{\sqrt{\nu^2 + 18\,(C_s \Delta)^2\sqrt{\Pi_{\alpha\beta}\Pi_{\alpha\beta}}}\; - \nu}{6\,(C_s \Delta)^2} \qquad (2.20)$$

We employ a uniform mesh with a lattice grid spacing of $\Delta = 1$ and a Smagorinsky constant $C_s = 0.03$ (see Yu et al., 2005). The stress tensor, $\Pi_{\alpha\beta}$, can be evaluated locally for each cell from the non-equilibrium density distributions as

$$\Pi_{\alpha\beta} = \sum_{i=1}^{19} e_{i\alpha}\, e_{i\beta} \left(f_i - f_i^{eq}\right) \qquad (2.21)$$

where $i$ denotes the velocity direction in the D3Q19 model. Finally, the total relaxation time is

$$\tau_{total} = 3\left(\nu + (C_s \Delta)^2\, |S|\right) + \frac{1}{2} \qquad (2.22)$$

The value of $|S|$ is always positive, and therefore the total relaxation time increases with $|S|$. Since the relaxation time thus remains bounded above 0.5, the numerical simulation remains stable (Chen and Doolen, 1998). This turbulence model is integrated into the laminar LB model (i.e., constant relaxation time) by calculating the total relaxation time at each time step.
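The following fragment sketches Eqs. 2.20 to 2.22 at one D3Q19 node: the non-equilibrium stress gives the strain intensity, which in turn sets a local total relaxation time. The lattice ordering and all names are assumptions made for this illustration, and the reconstruction of Eq. 2.20 above should be checked against Hou et al. (1996) before reuse.

    ! Sketch of the Smagorinsky closure (Eqs. 2.20-2.22) at one D3Q19 node.
    ! Illustrative fragment; lattice ordering and names are assumed here.
    function total_relaxation_time(f, feq, nu, cs, delta) result(tau_total)
      implicit none
      real(8), intent(in) :: f(19), feq(19)  ! distributions and equilibria
      real(8), intent(in) :: nu              ! physical (molecular) viscosity
      real(8), intent(in) :: cs, delta       ! Smagorinsky constant, grid spacing
      real(8) :: tau_total, pi_ab(3,3), qnorm, smag
      integer :: i, a, b
      integer, parameter :: e(3,19) = reshape( [ &
         0,0,0,  1,0,0, -1,0,0,  0,1,0,  0,-1,0,  0,0,1,  0,0,-1, &
         1,1,0, -1,-1,0, 1,-1,0, -1,1,0, 1,0,1, -1,0,-1, 1,0,-1, -1,0,1, &
         0,1,1,  0,-1,-1, 0,1,-1, 0,-1,1 ], [3,19] )
      ! Non-equilibrium momentum flux, Eq. 2.21
      pi_ab = 0d0
      do i = 1, 19
        do a = 1, 3
          do b = 1, 3
            pi_ab(a,b) = pi_ab(a,b) + e(a,i)*e(b,i)*(f(i) - feq(i))
          end do
        end do
      end do
      qnorm = sqrt(sum(pi_ab*pi_ab))
      ! Strain intensity |S| (Eq. 2.20), then tau from the total viscosity (Eq. 2.22)
      smag = (sqrt(nu*nu + 18d0*(cs*delta)**2 * qnorm) - nu) / (6d0*(cs*delta)**2)
      tau_total = 3d0*(nu + (cs*delta)**2 * smag) + 0.5d0
    end function total_relaxation_time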
Chapter 3: Boltzmann-based model for tracking aerosol particles

This chapter describes a model for tracking inertial particles through the probability density field obtained from the lattice Boltzmann method, in order to compute their trajectories through the flow. Note that the term "particle" here is distinct from the "fluid particles" described in the previous chapter. The local velocity of the flow is computed stochastically using the local distribution function. After the flow is fully developed and stored, the particles are tracked as they move under the influence of the fluid velocity distribution. The stochastic nature of the particle motion is captured by treating the fluid velocity that influences the particles as a random variable whose particular realization is chosen according to the local probability distribution function of the fluid velocity calculated from the lattice Boltzmann model.

3.1 Equations of motion

The behavior of particles in turbulence depends on the particle Stokes number, defined as

$St = \frac{\tau_p U_0}{L}$   (3.1)

where $U_0$ is the characteristic fluid velocity of the flow and $L$ is the characteristic length. For smaller Stokes numbers the fluid more easily carries the particles, while for larger Stokes numbers particle inertia dominates and the particles tend to cross fluid streamlines. As is common for the simulation of micron-sized particles in subsonic flows, only the Stokes drag is considered in this study. Although lift and gravity are known to have some influence on particle motion near the wall and in horizontal ducts, we have chosen to apply only the Stokes drag so that direct comparisons may be made with previous DNS results. Using the Stokes number, Eq. 1.8 can be written in dimensionless form as

$\frac{dV_p}{dt^*} = \frac{V_f - V_p}{St}$   (3.2)

where $V_f$ and $V_p$ are the dimensionless fluid and particle velocities, respectively. Let us now examine the behavior of an inertial particle during one simulation time step. Assuming $V_f$ is constant over this time, the dimensionless particle velocity, $V_p$, may be solved to be

$V_p = V_f + \left( V_p(0) - V_f \right) e^{-\Delta t^*/St}$   (3.3)

where $\Delta t^*$ is the dimensionless simulation time step. Replacing the initial particle velocity, $V_p(0)$, with the particle velocity at the previous time step, $V_p^{old}$, and $V_p$ with the new particle velocity, $V_p^{new}$, Eq. 3.3 may be rearranged as

$V_p^{new} = V_f \left( 1 - e^{-\Delta t^*/St} \right) + V_p^{old} \, e^{-\Delta t^*/St}$   (3.4)

To model the inertial particle velocity evolution from the Boltzmann model, we introduce an inertial particle relaxation factor, $\eta$, based on the particle Stokes number:

$V_p^{new} = \eta V_f + (1 - \eta) V_p^{old}$   (3.5)

This factor quantifies the extent of velocity history experienced by the particle. As $\eta$ approaches zero, the particles retain only the velocity of the previous time step, and thus have high inertia and are unaffected by the fluid motion. When $\eta$ is close to 1, the particles immediately conform entirely to the local fluid velocity at the new time step, and thus act more like passive tracers. The relaxation parameter may be computed from the local Stokes number by combining Eqs. 3.4 and 3.5, yielding

$\eta = 1 - e^{-\Delta t^*/St}$   (3.6)

For a turbulent flow simulation, it is customary to use wall units for scaling, rather than the duct dimensions and bulk velocity. Wall units are based on a non-dimensional wall distance and friction velocity for a wall-bounded turbulent flow. Thus, an equivalent relation may be written with the parameters expressed in wall units:

$\eta = 1 - e^{-\Delta t^+/\tau_p^+}$   (3.7)

The particle motion is then determined from the probability distribution function as follows. When a particle enters the lattice domain, the fluid velocity around the particle is calculated using the weighted-average Boltzmann distribution function, as shown in Fig. 3.1. The weighted-average distribution is calculated from the distances, $d_i$, between the particle and the neighboring nodes:

$f_p = \frac{d_4 f^{(1)} + d_3 f^{(2)} + d_2 f^{(3)} + d_1 f^{(4)}}{\sum_i d_i}$

Figure 3.1 Interpolation of the probability distribution function from neighboring nodes.

As shown in Fig. 3.1, the fluid velocity is determined from the local distribution function, $f_p$, which is constructed from the weighted distributions of the neighboring nodes. The mean fluid velocity at the particle position can then be calculated by

$V_f = \frac{1}{\rho_p} \sum_j e_j f_{p,j}, \qquad \rho_p = \sum_j f_{p,j}$   (3.8)

Similarly, the local instantaneous fluid velocity at the particle position is obtained by statistically selecting the fluid velocity from the discrete probability distribution functions at each neighboring node. These are then combined in a weighted manner that reflects the relative distance of the particle from each node. In essence, this inertial particle tracking model combines conventional particle trajectory modeling with a stochastic velocity implementation scheme that allows application to the lattice Boltzmann domain.
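A minimal sketch of one particle time step implied by Eqs. 3.5-3.8 is given below; the array shapes are illustrative assumptions. The stochastic draw selects one discrete velocity $e_j$ with probability $f_{p,j}/\rho_p$, which is one straightforward way to realize the statistical selection described above.

    ! One particle step: interpolate f_p, draw a fluid velocity, relax (sketch).
    ! fp(9) is the distance-weighted distribution at the particle position.
    real :: fp(9), rhop, r, csum, vf(2), vpnew(2), vpold(2), eta
    rhop = sum(fp)
    call random_number(r)                 ! uniform deviate in [0,1)
    r = r * rhop
    csum = 0.0
    do j = 1, 9                           ! pick direction j with prob. fp(j)/rhop
       csum = csum + fp(j)
       if (csum >= r) exit
    enddo
    j = min(j, 9)                         ! guard against round-off at the end
    vf(1) = ex(j)                         ! realized local fluid velocity
    vf(2) = ey(j)
    vpnew = eta*vf + (1.0 - eta)*vpold    ! Eq. 3.5, with eta from Eq. 3.6 or 3.7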
Chapter 4: Numerical Simulation

4.1 Implementation of the fluids model

For simulation of the fluid, the lattice Boltzmann computation starts by applying an initial equilibrium function. The initial equilibrium function can be calculated at all nodes from an initial velocity profile and density. The subsequent collision and streaming steps are applied as follows:

Collision step:   $f_i(x, t + \Delta t) = f_i(x, t) - \frac{1}{\tau}\left[ f_i(x, t) - f_i^{eq}(x, t) \right]$

Streaming step:   $f_i(x + e_i \Delta t, t + \Delta t) = f_i(x, t + \Delta t)$

During the collision step, a new probability distribution function and a collision operator are calculated for the next time step, and the bounce-back boundary condition is applied at the solid nodes. During the streaming step, the distribution functions are propagated to the neighboring nodes. Then new densities and velocities at all of the nodes are computed for the next time step. This procedure continues until the mean flow field reaches a steady state, as summarized in the flow chart in Fig. 4.1.

Figure 4.1 Flow chart of the implementation of LBM: the initial equilibrium functions are computed from the assigned density and velocity; each time step then consists of the streaming step, the collision step (with the bounce-back boundary condition), the density and velocity calculation, and the evaluation of a new equilibrium distribution function, repeated until a steady state is reached.

4.2 Parallelization of fluids model

There are essentially no spatial dependencies among variables in the collision step. Only nearest neighbors have dependencies, and this is favorable for parallelization. By contrast, the streaming step is more difficult to compute due to its spatial dependencies. The algorithm is parallelized by dividing the entire computational domain into several sub-domains in the streamwise direction, with each sub-domain assigned to a single processor. This is known as domain decomposition. However, the LBM requires communication between neighboring nodes, which might be located in adjacent sub-domains during the streaming step. Additional dummy variables are required to hold copies of the information from the nearest neighboring sub-domains. Thus, at the end of each iteration, partial results from the bordering planes are exchanged between adjacent sub-domains, as sketched below.
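A minimal sketch of that exchange, assuming Fortran 90 with MPI (as used for the duct simulation in Ch. 4.3.2) and a hypothetical distribution array f(ny,nz,0:nxloc+1,19) whose planes 0 and nxloc+1 hold the ghost copies:

    ! Exchange bordering y-z planes of f with the two streamwise neighbors.
    ! (use mpi; left/right neighbor ranks, status(MPI_STATUS_SIZE), and ierr
    !  are assumed declared elsewhere; nplane = ny*nz*19 values per plane.)
    call MPI_Sendrecv(f(:,:,nxloc,:), nplane, MPI_REAL, right, 0,  &
                      f(:,:,0,:),     nplane, MPI_REAL, left,  0,  &
                      MPI_COMM_WORLD, status, ierr)
    call MPI_Sendrecv(f(:,:,1,:),       nplane, MPI_REAL, left,  1, &
                      f(:,:,nxloc+1,:), nplane, MPI_REAL, right, 1, &
                      MPI_COMM_WORLD, status, ierr)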
4.3 Simulation setup

In this study, we have applied a two-dimensional and a three-dimensional simulation of particle motion through turbulence. The two-dimensional domain considers a particle-laden channel flow impinging on a square cylinder placed within the channel. This computation illustrates the transport and deposition of aerosol particles flowing over a bluff body, in which the fluid flow is characterized by a recirculation region and large-scale vortex structures in the wake of the body. The three-dimensional computation is applied to a turbulent straight square duct flow. In addition to serving as a scientifically interesting geometry characterized by secondary flows and highly inhomogeneous turbulence, the aerosol motion in a square duct is relevant to the transport and deposition of particles in ventilation ducts.

4.3.1 A square cylinder placed in a channel flow

Fig. 4.2 illustrates the computational domain of the two-dimensional channel flow over the square obstruction. The total length is L = 24B and the width of the channel is H = 4B, where B is the width of the square. The square cylinder is centrally located at a distance of 8.5B from the inlet. A parabolic velocity profile is applied at the channel inlet, representing a fully developed channel flow. A constant pressure is prescribed at the channel exit in terms of the equilibrium distribution function. In addition, a no-slip boundary condition is applied at the top and bottom walls of the channel. The Reynolds number of the flow is based on the square width and the bulk velocity at the channel entrance, Re = V₀B/υ. In this simulation, Re = 1,000 and a grid size of 360 x 61 are used. These parameters were selected to allow for direct comparison with the literature (Brandon and Aggarwal, 2001). The kinematic viscosity υ is calculated by applying V₀ = 0.032. The flow is developed using a D2Q9 lattice Boltzmann model.

Figure 4.2 Computational domain of the rectangular channel and the square cylinder (L = 24B, H = 4B, inlet velocity V₀).

4.3.2 Square duct flow

A schematic of the square duct flow domain, having dimensions W x W x 6W, is shown in Fig. 4.3. In this simulation, the domain is bounded by four no-slip walls normal to the wall-normal direction, Y, and the span-wise direction, Z. Two open boundaries are assumed to be periodic in the stream-wise direction, X.

Figure 4.3 Computational domain of the straight square duct (W x W x 6W) with the associated coordinate system.

The characteristic length scale is the length of each side wall, W, and the friction velocity u_τ is used to form the characteristic time scale, W/u_τ. The friction Reynolds number, Re_τ = u_τW/υ₀, is equal to 300, where υ₀ is the fluid kinematic viscosity. Considering a uniform grid, the computational domain is discretized into 448 x 75 x 75 nodes in the stream-wise, span-wise, and wall-normal directions, respectively. This corresponds to a grid size of Δ = 1 in lattice units with a constant time step Δt = 1, or equivalently Δ⁺ = 4 and Δt⁺ = 5.4 x 10⁻² in wall units. This grid size should ideally not exceed the local dissipative or Kolmogorov length scale, which is approximately equal to 2 wall units for this flow (Gavrilakis, 1992). In general, the smallest scales of a turbulent flow occur near the wall. Due to the use of the half-way bounce-back scheme for the implementation of the wall boundary condition, the first grid point is located at a distance of Δ⁺/2 = 2. A steady-state turbulent channel flow at Re_τ = 180 was applied as the initial velocity profile to provide perturbations. The computation then evolved over 9 x 10⁵ time steps to obtain a fully developed turbulent flow with a molecular kinematic viscosity of 0.001 and a density of 1 in lattice units. The program for the simulation was parallelized over 64 processors with MPI (Message Passing Interface) calls in Fortran 90. The velocity profiles of the resulting steady turbulent flow were averaged over another 100,000 time steps. Using the time-averaged velocity profile and the Boltzmann distribution function from the fully developed turbulent flow, particle tracking was implemented for a duration of approximately 13,500 simulation time units in lattice units. The static velocity profile and distribution function are fixed and applied at each time step.
Chapter 5: GPU Parallelization

5.1 Introduction

Due to its computational advantages, the lattice Boltzmann method is a very suitable algorithm for parallelization on CPUs (Central Processing Units) as well as GPUs (Graphics Processing Units). The GPU was originally designed for computer graphics, which requires highly intensive computation. Since 2003, GPUs have been used to accelerate non-graphical computations by performing general floating-point operations (Rohm, 2011). Modern GPUs contain hundreds of arithmetic units, which provide tremendous computational acceleration at a comparable price and size. Traditional CPU parallelization for numerically intensive scientific applications required expensive and gigantic computational clusters. As shown in Fig. 5.1, the computational power of GPUs has continued to increase drastically, while increases in CPU computational performance stagnated during the period from 2003 to 2010. During this period, GPU computational power increased nearly 300 times, while CPU computational power increased only about 30 times. The growth of computational power in GPUs has already exceeded that of CPUs by more than one order of magnitude.

Figure 5.1 A comparison of the growth of the computational power of GPUs and CPUs, adapted from Fig. 1 of the NVIDIA CUDA Programming Guide.

The GPU is the processor on the video card of a personal computer, developed by the video gaming industry for vivid 3D scene rendering. Performing a massive number of floating-point computations per video frame in parallel requires a much larger number of cores for data processing. GPUs are optimized to devote more transistors to data processing rather than to data caching and flow control. By contrast, CPUs are designed to possess large cache memories and control units for executing rapid sequential code, as schematically illustrated in Fig. 5.2 (NVIDIA, 2010). Therefore, a combination of CPUs for sequential computation and GPUs for parallelization of massive numerical calculations is used for high performance computing.

Figure 5.2 Schematic assembly of a CPU and a GPU.

5.2 GPU parallelization and CUDA

In recent years, general-purpose computation using GPUs (GPGPU) has become an active research area due to the programmability and computational power available at a comparable price and size. Many applications have demonstrated GPU acceleration, such as sparse matrix solvers for partial differential equations used to solve the Navier-Stokes equation (Bolz et al., 2003), linear algebra operators and a conjugate gradient solver (Kruger and Westermann, 2003), simulation of cloud dynamics using Jacobi and Gauss-Seidel solvers (Harris et al., 2003), and ray tracing (Carr et al., 2002; Purcell et al., 2002). Implementing the LBM on GPU-based architectures has been studied by many researchers (Li et al., 2003; Tolke, 2008; Zhao, 2007). Kaufman et al. (2009) implemented the LBM with GPU parallelization in many applications, including simulating fire, smoke, lightweight objects in the wind, jellyfish swimming in water, and heat shimmering and mirage. For large-scale LBM simulation and high-performance computing, for example the simulation of real-time plume dispersion in complex urban environments and of thermal fluid dynamics in a pressurized water reactor, GPU clusters have been used successfully to accelerate the computations (Qiu et al., 2004; Fan et al., 2004).

CUDA (Compute Unified Device Architecture) is a massively parallel architecture and programming language produced by NVIDIA. It supports most high-level programming languages, such as MATLAB, C, C++, and FORTRAN. Using these languages, CUDA can address individual GPU cores and accelerate communication between many processors. Tubbs and Tsai (2011) implemented the LBM to solve the shallow water equations and the advection-dispersion equation on GPU-based architectures using both C and MATLAB. They concluded that the parallel GPU performance increased with the domain size due to the increasing computational intensity and decreasing communication time between sub-domains. Additionally, they showed that the speedup of C with GPU parallelization was an order of magnitude higher than that of MATLAB with GPU parallelization.

The CUDA parallelization system consists of a CPU, referred to as the host, and one or more GPUs, referred to as devices. All of the computational values must first be created in CPU memory and then copied into the GPU memory, because the GPU cannot access anything in the CPU memory during device processes. After executing the code on the GPU, the results must be copied back from the device to the host memory for host processes, as illustrated in the sketch below.
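As a minimal illustration of this host-device workflow, the sketch below shows the copy-in, kernel-launch, and copy-back pattern; it is written in CUDA Fortran for consistency with the rest of the code in this thesis, although the actual implementation was written in CUDA C, and the kernel name collide_stream and the array sizes are hypothetical.

    ! Host-device data movement and kernel launch: a CUDA Fortran sketch.
    use cudafor
    real, allocatable         :: f_h(:,:)   ! distributions in host memory
    real, allocatable, device :: f_d(:,:)   ! distributions in device memory
    allocate(f_h(nnodes,19), f_d(nnodes,19))
    f_d = f_h                               ! copy host -> device
    do ts = 1, nsteps
       call collide_stream<<<nblocks, nthreads>>>(f_d)   ! runs on the GPU
    enddo
    f_h = f_d                               ! copy device -> host for output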
The GPUs from NVIDIA contain a number of multiprocessors, and each multiprocessor has a group of stream processors, known as cores. Each core can execute a sequential thread in parallel with the others. Parallel portions of a program (known as kernels) are executed on the device, and their work is partitioned across the cores as a grid of thread blocks. These threads cooperate through shared memory and can be synchronized. In order to control and track each of the threads and blocks, CUDA allows definition of a unique identity using the built-in variables "threadIdx.x" and "blockIdx.x".

In this study, the GPU-parallelized simulations were conducted on a single workstation running the Ubuntu Linux operating system. The computations were accelerated with an NVIDIA GeForce GTX 550Ti graphics card, using 192 CUDA cores with 2 GB of RAM and a memory bandwidth of 98.5 GB/s. The host contained an AMD Phenom II X6 1100T with 8 GB of RAM.

5.3 Implementation of the lattice Boltzmann model using CUDA

The LBM has a very simple implementation, which only requires calculation of local and nearest-neighbor properties. Thus, all of the LBM computations can be efficiently implemented through GPU parallelization. To compute the LBM equations on the GPU, the computational domain is divided into threads and blocks. As seen in Fig. 5.3, each thread represents a lattice node in the X-direction for the square duct flow simulation. The cross-sections of the domain are divided into blocks; the numbers in front of each row and column in the figure are the indices of each block in the Y-Z section. The local calculation at each grid point of the computational domain can be executed by its own thread at the same time. This makes the GPU-parallelized simulation faster than a sequential CPU program.

Figure 5.3 GPU-parallelized domain for the square duct flow simulation, showing threads along the X (flow) direction and blocks b0-b15 tiling the Y-Z cross-section.

To track each thread, the specific array index must be calculated from the thread's unique identity. One possible implementation in CUDA is

index = X + Y * dX + Z * dX * dY

where X, Y, and Z are the indices of a grid point in the domain, dX is the total number of grid points in the X direction, and dY is the total number of grid points in the Y direction. For example, the value of the density, rho, at (X, Y, Z) can be accessed by

i = threadIdx.x;
j = blockIdx.x;
k = blockIdx.y;
rho(index(i, j, k));

In a similar manner, every particle can be dedicated to a unique thread and block for GPU parallelization of the inertial particle tracking. As far as we know, the present study is the first to compute inertial particle trajectories in a turbulent flow using the Boltzmann model and parallelized on a GPU. Moreover, this study demonstrates a computationally inexpensive model for tracking inertial particles through a turbulent flow that may be executed on a personal computer with GPU parallelization.
5.4 Performance

The performance of the D3Q19 LBM calculation was evaluated for domain sizes ranging from 5,000 to 2,560,000 nodes using a GeForce GTX 550Ti GPU and an Intel Core i7 CPU. Initially the computation was performed with a GeForce GTX 295; however, due to its limited RAM, it was not able to handle domains of more than 1,500,000 nodes. Fig. 5.4(a) compares the computational time, in seconds, of a single time step for the sequential FORTRAN code and the parallel CUDA code. Note that both the X and Y axes are logarithmic in scale. Fig. 5.4(b) shows the speedup factor of the GPU computational time over the CPU computational time.

As clearly seen in Fig. 5.4(b), the GPU code is up to 190 times faster than the CPU code. However, the GPU is less efficient for smaller domain sizes because of the time needed to transfer data between the host and the device. As the domain size increases, the data transfer time is amortized over the longer simulation running time, which results in a larger speedup. For domain sizes greater than 135,000 nodes, the computational time becomes nearly proportional to the domain size, so the speedup factor becomes relatively constant at its maximum value.

Implementation of the LBM with GPU parallelization clearly demonstrates the advantage of the GPU. It is simple to program and parallelize using CUDA due to the inherent parallelism of the LBM, and the much faster computational time makes for an efficient and computationally inexpensive model. This also shows the promise of the Boltzmann model for tracking inertial particles through a turbulent flow with GPU parallelization. Validation of the model with GPU parallelization will be presented in Chapter 6.3 through comparison with sequential CPU code results, as well as an evaluation of computational performance.

Figure 5.4 Performance results of the LBM calculation for various domain sizes: (a) computational time per time step; (b) speedup of GPU versus CPU.

Chapter 6: Results

6.1 Wakes behind a square cylinder in a channel

The Boltzmann model for tracking inertial particles through a turbulent flow was tested using the two-dimensional channel flow geometry. The flow over a square obstruction in the channel was established using a D2Q9 lattice Boltzmann flow model. Inertial particles in the wake of the square cylinder were tracked, and particle deposition on the square cylinder was calculated.

6.1.1 Flow validation

The flow around a square cylinder placed within a channel is mainly characterized by flow separation, periodicity, and vortex street formation. For higher Reynolds numbers, the flow separates from the leading edge of the cylinder, instead of the trailing edge as seen at lower Reynolds numbers. The unsteady wake structure is directly affected by the flow separation and develops into periodically repeating vortices. This asymmetric vortex shedding is known as a von Kármán vortex street (Davis et al., 1984). Figures 6.1(a) and (b) show an instantaneous velocity vector field and the corresponding streamlines of the flow around the square cylinder at Re = 1000. Regions of stagnation in front of the cylinder, flow separation along the top and bottom of the cylinder, and asymmetric vortex shedding in the cylinder wake are shown.

Figure 6.1 Instantaneous plots of the flow around a square cylinder placed in a channel at Re = 1000: (a) velocity vector plot, (b) streamline plot. Note that the X and Y axes are normalized by the side length of the square cylinder, B.
6.1.2 Particle tracking

In order to investigate the motion of inertial particles in the recirculation region of the wake and in the large-scale vortex structures of the post-wake region, inertial particles having a prescribed Stokes number were released at a downstream distance of 6.5B from the channel inlet. Every 75 time steps (in lattice units, which are given in terms of the speed of sound and the lattice spacing as described in Ch. 2.2), twenty particles are injected, and the particle trajectories are computed using Eqs. 3.5 and 3.8. Snapshots of the particles are shown in Fig. 6.2. Particles with St = 0.5 behave more like passive tracers, responding to the fluid relatively quickly. Note, however, that they are not entirely entrained into the shedding vortices, as would be expected for perfectly passive tracers. In fact, the current model cannot perfectly recover passive tracers because of the finite time step used in the simulation: the small time steps associated with a lattice Boltzmann simulation are set by the speed of sound, and each time step uses a new local probability distribution. Most particles are distributed around the vortex periphery and become increasingly dispersed with increasing distance from the cylinder. At St = 1 and 5, particles are not entrained in the near wake because they are essentially unaffected by the vortex shedding. Many experimental and numerical studies (Longmire and Eaton, 1992; Lazaro and Lasheras, 1992; Chang and Kailasanath, 1996; Brandon and Aggarwal, 2001) have reported similar results regarding particle distribution and dispersion in the presence of large vortex structures.

Figure 6.2 Inertial particle distributions behind the cylinder in the 2D channel.

The deposition efficiencies on the front side of the square cylinder were computed to quantitatively compare the present particle tracking model with previous studies. We define the deposition efficiency as the ratio of the number of particles deposited on the surface to the number of particles entering the projected area of the surface; a minimal bookkeeping sketch is given at the end of this section. As shown in Fig. 6.3, the particle deposition efficiency increases rapidly when the Stokes number is less than 1. For higher Stokes numbers, the particle deposition efficiency becomes independent of Stokes number. This is in agreement with other results by Langmuir and Blodgett (as reported by Fuchs, 1964), Vincent and Humphries (1978), Brandon and Aggarwal (2001), and Jafari et al. (2010). This is also typical behavior for inertial impactors, which exhibit a cutoff Stokes number for impaction on the order of unity.

Figure 6.3 Particle deposition efficiency versus Stokes number.
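The sketch below shows one way this efficiency can be accumulated during the simulation; the counter names and the face-geometry variables (yface_lo, yface_hi, xface) are illustrative assumptions.

    ! Deposition-efficiency bookkeeping for the front face of the cylinder (sketch).
    ! n_in counts particles entering the projected area of the face; n_dep
    ! counts those that actually reach the solid surface.
    if (Py(n) >= yface_lo .and. Py(n) <= yface_hi) then
       if (.not. counted(n)) then
          n_in = n_in + 1            ! particle crosses the projected area
          counted(n) = .true.
       endif
       if (Px(n) >= xface) then
          n_dep = n_dep + 1          ! particle reaches the front face: deposited
          deposited(n) = .true.
       endif
    endif
    efficiency = real(n_dep) / real(n_in)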
6.2 Transport in a turbulent square duct

A three-dimensional turbulent flow in a square duct was used for tracking inertial particles with our Boltzmann-based model. The turbulent flow was developed using the lattice Boltzmann method based on the D3Q19 model. A fully developed channel flow provided a sufficient initial condition from which the flow field was allowed to develop.

6.2.1 Flow validation

The results from the LBM simulation using an SGS model in a square duct are obtained by averaging the velocity fields. Fig. 6.4 shows the computed contours and velocity vector field of the mean streamwise velocity in the y-z cross-sectional plane of the square duct. As seen in Fig. 6.4, the mean velocity vectors move along the diagonals towards all four corners. The faster-moving fluid in the low-shear region is transported from the center of the duct toward the corners along the corner bisectors. The slower-moving fluid in the high-shear region is moved from near the wall toward the center along the duct walls (Gavrilakis, 1992). Fig. 6.4(b) shows that the LBM can predict these secondary flows, which feature two counter-rotating vortices in each duct quadrant. These vortices transfer fluid momentum from the center of the duct along the corner bisector to the center of the walls. The center of the lower vortex in this LBM simulation is located at (0.23W, 0.10W) in y-z coordinates. This is in excellent agreement with previously reported values: the high-resolution DNS of Gavrilakis (1992) places it at (0.25W, 0.10W); Huser and Biringen (1993) predicted the center at (0.20W, 0.10W) at a higher friction Reynolds number; and Madabhushi and Vanka (1991) computed values of (0.27W, 0.13W) using a large eddy simulation.

Figure 6.4 Secondary flow, with coordinates normalized by W, in the square duct: (a) contours of the mean streamwise velocity field in the y-z cross-section; (b) mean secondary velocity in the lower left quadrant of the duct.

Figure 6.5 Mean streamwise velocity profile at the centerline of the duct compared with a previous DNS result and the law of the wall.

Fig. 6.5 shows a plot of the mean streamwise velocity profile, normalized by the friction velocity, as a function of the wall-normal distance from the wall given in wall units. The mean velocity profile is compared with the law of the wall, which can be expressed as

$u^+ = \frac{1}{\kappa} \ln y^+ + b$   (6.1)

where $\kappa = 0.32$ is the von Kármán constant and $b = 3.9$ (Gavrilakis, 1992). The LBM simulation shows good agreement with the logarithmic law for y⁺ > 30.

The turbulence statistics in the near-wall region are displayed in Fig. 6.6. The turbulence intensities, or root-mean-square (rms) streamwise, spanwise, and wall-normal velocity fluctuations, are normalized by the friction velocity. The computed results show reasonably good agreement with channel flow data from the viscous sublayer to half of the wall length. As Gavrilakis (1992) reported, the turbulence intensities are slightly lower for duct flow than for channel flow in all directions. A sketch of the averaging used to form these statistics is given after the figures below.

Figure 6.6 Root-mean-square (RMS) velocity fluctuations normalized by the friction velocity for fully developed turbulent duct flow, compared to previous DNS duct and DNS channel results.

Insensitivity to grid resolution was verified using a higher-resolution computational grid having dimensions 576 x 97 x 97, which corresponds to a first grid point at 1.875 wall units. As shown in Figs. 6.7 and 6.8, the velocity statistics are within 1% of the lower-resolution results. Thus, all subsequent computations employed a resolution of 448 x 75 x 75.

Figure 6.7 Mean streamwise velocity profile at the centerline of the duct for the higher resolution of 576 x 97 x 97, compared with the 448 x 75 x 75 resolution, DNS, and the law of the wall.

Figure 6.8 RMS velocity fluctuations for the higher resolution of 576 x 97 x 97 compared to the 448 x 75 x 75 resolution, DNS duct, and DNS channel results.
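A minimal sketch of that time and streamwise averaging, with illustrative accumulator names, is:

    ! Accumulate mean and rms of the streamwise velocity over time and x (sketch).
    do i = 1, lx
       do k = 1, lz
          do j = 1, ly
             usum(j,k)  = usum(j,k)  + u_x(j,k,i)
             usum2(j,k) = usum2(j,k) + u_x(j,k,i)**2
          enddo
       enddo
    enddo
    nsamp = nsamp + lx                        ! lx streamwise samples per snapshot
    ! after the averaging window:
    umean = usum / real(nsamp)
    urms  = sqrt(usum2/real(nsamp) - umean**2)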
6.2.2 Particle tracking in a static probability field

The computational results presented in this section correspond to those obtained in a static probability field. Once the lattice Boltzmann fluid simulation stabilized to a steady-state turbulent flow, a snapshot of the probability field was used to generate the velocity field needed to track the particles. Note that this velocity field is not static, because new velocity vectors are obtained stochastically from the local probability distributions at each time step. We hypothesize that this computationally inexpensive approach will work increasingly well for increasing particle inertia.

The particle tracking simulation uses two different cases of initial spatial particle distribution. Case A features 61,064 particles uniformly distributed in x-z planes located within the viscous sublayer, with the initial particle velocities set equal to the local fluid velocity. Case B features 100,000 particles randomly distributed throughout the entire flow domain, with the initial particle velocities set to zero. The dimensionless particle relaxation time in wall units, $\tau_p^+$, is based on the friction velocity and the length of each side wall. When a particle reaches the wall, the particle is considered to be deposited on the wall, and the effect of rebound is neglected.

6.2.2.1 Visualization of preferential concentration

Inertial particles in a turbulent fluid have a tendency to preferentially concentrate in regions of low vorticity (Eaton and Fessler, 1994). The particle concentration fields near the wall are inhomogeneous, with high densities occurring in the low-speed streaks (Rouson and Eaton, 1994). Case A was simulated to visualize this preferential concentration of particles in the turbulent square duct flow. Fig. 6.9 shows top-view snapshots of the particle positions from the early evolution of the uniform distribution to the late evolution, where the particles are more randomly dispersed. In that figure, uniformly distributed particles were released from an initial location of $y_0^+ = 3$. The computational relaxation factor (which is based on the particle Stokes number and allows a history of the particle velocity to be incorporated at each time step), $\eta = 0.0036$, corresponds to a dimensionless particle relaxation time of $\tau_p^+ = 15$. The view displays particles within 30 wall units of the wall.

Figure 6.9 Instantaneous particle positions visualized in the x-z plane for τₚ⁺ = 15. The particles shown are within 30 wall units of the bottom wall.

According to Maxey (1987), particles tend to accumulate in regions of high strain rate, or low vorticity, while empty regions are due to high vorticity. This is a purely inertial effect, wherein inertial particles are centrifuged out of regions of high vorticity. The noticeable accumulation of particles near the wall, as seen in Fig. 6.9, indicates the turbophoretic effect of driving inertial particles down the turbulence intensity gradient.

Fig. 6.10 is a snapshot of particles in the y-z plane. The displayed particles are located within 150 wall units of the center of the square duct. The particles were initially placed near the wall, but then moved toward the interior of the duct because of the off-axis secondary flows. Eventually, the particles were distributed throughout most of the domain at the latest time shown in Fig. 6.10(c).

Figure 6.10 Instantaneous particle positions visualized in the y-z plane for low inertia (τₚ⁺ = 1) at t⁺ = (a) 135, (b) 270, (c) 675, for an initial release height of y₀⁺ = 3. Particles are visualized within 75 wall units of the center of the square duct.

6.2.2.2 Deposition rate

Case B was used to quantify particle deposition rates. The deposition rate is presented as the particle deposition velocity, $k_d$:

$k_d = \frac{N_d}{A \, \Delta t_d} \cdot \frac{V}{N_p} = \frac{1}{4} \frac{N_d \, W}{N_p \, \Delta t_d}$   (6.2)

where $N_d$ is the number of deposited particles during a time interval $\Delta t_d$, and $N_p$ is the number of suspended particles at the beginning of each time interval. The total wall surface area is A, and V is the volume of the duct.
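A sketch of how Eq. 6.2 can be evaluated during the run, with illustrative counter names and using the duct geometry of Ch. 4.3.2 (for which V/A = W/4):

    ! Deposition velocity over one counting interval dt_d (sketch), Eq. 6.2.
    np_start = count(.not. deposited)    ! suspended particles at interval start
    ! (advance the particles over dt_d here)
    nd = count(deposited) - nd_prev      ! newly deposited during the interval
    kd = 0.25 * W * real(nd) / (real(np_start) * dt_d)
    nd_prev = nd_prev + nd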
Table 6.1 summarizes the six different values of particle inertia, converted from the relaxation factor using Eq. 3.7.

Table 6.1 Comparison of the Boltzmann model relaxation factor and the relaxation time.

Boltzmann model constant, η:          0.052   0.011   0.0036   0.0027   0.0018   0.0010
Dimensionless relaxation time, τₚ⁺:   1       5       15       20       30       50

For example, with the time step Δt⁺ = 5.4 x 10⁻² of Ch. 4.3.2, Eq. 3.7 gives η = 1 − exp(−0.054/15) ≈ 0.0036 for τₚ⁺ = 15.

The deposition fraction, N_d/N_p, is computed for every time interval of Δt_d⁺ = 5.4. Fig. 6.11 displays the deposited fractions over the total simulation of 400 wall time units for the six different relaxation times. The deposition rates are computed every 100 time intervals. At the beginning of the simulation, the deposition rates increase slowly due to the initial release conditions, which in this case feature particles released with zero velocity. After this initial transient period, the fractions are stable and remain constant. For lower values of particle inertia, the deposition rates are constant but exhibit increased fluctuations during the simulation time. This may be attributed to decreased statistical averaging, because fewer particles deposit as inertia is decreased.

Figure 6.11 Particle deposition rate plotted versus time for various values of particle inertia.

Fig. 6.12 shows the deposition velocity computed using Eq. 6.2 and averaged over the last 420 wall time units of the particle deposition record. The computed deposition velocities are compared with the circular pipe experimental deposition results compiled by Young and Leeming (1997), which are shown as the dashed line. The Boltzmann model results lie within the envelope of the previous experimental data, particularly in the diffusion-impaction regime, despite the difference in cross-sectional shape.

Figure 6.12 Deposition velocity versus particle inertia. The dashed line represents the experimental data envelope for circular pipes compiled by Young and Leeming (1997). The open squares are the DNS results reported by Phares and Sharma (2006).

The main difference between a square duct and a circular pipe is that a circular pipe exhibits uniform deposition around its entire circumference, whereas a square duct exhibits distinct deposition patterns. Nevertheless, both exhibit comparable particle deposition rates and trends with particle inertia. The particle deposition patterns in a square duct depend on local effects caused by the secondary flow. Fig. 6.13 plots the particle deposition patterns for the same six values of particle inertia. The number of particles deposited along the four walls is calculated by averaging along the total stream-wise distance. The x-axis is normalized by the length of the wall.

Figure 6.13 Particle deposition patterns averaged over the four walls and along the stream-wise direction of the square duct, for the six values of particle inertia.

Fig. 6.13 shows that, for particle inertia values of τₚ⁺ = 15 or smaller, particles tend to deposit more toward the center of the wall and exhibit lower deposition rates. Higher-inertia particles (τₚ⁺ > 15) deposit more near the corners, with relatively high deposition rates.
This indicates that higher-inertia particles follow the local secondary flow toward the duct corners, where they deposit more efficiently than lower-inertia particles. A combination of the secondary flow and the turbophoretic force can cause this phenomenon.

6.3 GPU Boltzmann model simulation

The previous particle trajectories were computed using a static probability distribution field, allowing for high computational efficiency. As parallelization is increased using GPUs, the distribution profiles may be updated at every time step during the particle tracking simulation, resulting in a more physically realistic simulation. All of the code was converted from Fortran 90 into CUDA, and GPU parallelization was applied as described in Ch. 5.3. All particles were assigned to node blocks for the particle dynamics computation, using the updated distribution functions from the lattice Boltzmann simulation. According to Tubbs and Tsai (2011), GPU parallelization performance increases with the total resolution because of the increasing computational intensity and the decreasing need for communication between sub-domains. Thus, the three-dimensional square duct flow simulation was used for these computations, and the computational performance was compared.

6.3.1 Square duct flow simulation comparison

The Boltzmann-based model is implemented by applying updated distribution functions from the evolving simulation at each computational time step; the structure of this coupled loop is sketched below. The deposition velocity was computed in a square duct flow for comparison with the static probability method and for validation of the model. Note that these computations are all performed on a desktop computer with a single GPU graphics card. The limited GPU RAM available in this configuration restricted the simulation to a total of 64,000 particles, randomly distributed throughout the entire square duct domain. Initial particle velocities were set to the fluid velocity, and the simulation evolved until the effects of the initial conditions were eliminated. The results are shown in Fig. 6.14, which reveals excellent agreement between the static probability model and the evolving probability model. As expected, all of the results also lie within the deposition envelope of the circular pipe experiments compiled by Young and Leeming (1997). Figure 6.14 confirms our initial hypothesis that the computationally efficient static probability model performs extremely well for particle relaxation times greater than 1. This is significant because this range comprises most of the diffusion-impaction region of the plot, which is characterized by a rapid rise in deposition rate. From a practical standpoint, this is an important result because many industrial processes involve supermicron-sized aerosols that lie in this region. We conclude that the static probability model could be readily and rapidly implemented for many practical cases, eliminating the need for DNS, LES, or even an evolving LBM simulation.

Figure 6.14 Comparison of deposition velocities for the static and evolving fluids models.
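A minimal sketch of the evolving-field coupling, with hypothetical subroutine names standing in for the GPU kernels of Ch. 5.3:

    ! Coupled fluid-particle loop for the evolving probability field (sketch).
    do ts = 1, nsteps
       call lbm_collide_stream(f)        ! advance the flow one LBM step (GPU kernel)
       call interpolate_fp(f, xp, fp)    ! build f_p at each particle (Fig. 3.1)
       call update_particles(fp, vp, xp) ! stochastic draw + Eq. 3.5 relaxation
       call count_deposits(xp, ndep)     ! wall test; deposited particles removed
    enddo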
It can also parallelize both velocity computation and particle tracking computation, so that the probability field can evolve at every time step. Table 6.2 shows the computation time needed to execute 1000 time steps of each program along with a comparison of CPU and GPU methods. Using a static probability field, a GPU computation was 16 times faster than a CPU computation by parallelizing only the particle computation. When the fluid flow evolution computation was parallelized and combined with parallelized particle computation, the GPU parallelization executed 134 times faster than a single CPU run. Table 6.2 Execution time per 1000 time steps and speedup for CPU and GPU implementation of particle tracking with static and evolving flow Execution Time (sec) Speedup CPU (Fortran) GPU (CUDA) CPU/GPU 82 Static Flow 389.7 24.4 16.0 Evolving Flow 23300.2 173.4 134.4 These results further underscore the benefits of parallelized GPU computing, and bodes well for the future of GPU/LBM computations for inertial particle tracking in turbulence. 83 Chapter 7: Conclusions Particle tracking using the Boltzmann model was successfully implemented in unsteady flows through a turbulent square duct and a square obstruction in a channel. The turbulent flows were developed by a lattice Boltzmann method and particles were initially introduced into a static, non-evolving probability field derived from the flow simulation. The effect of particle inertia was modeled using a relaxation factor at each particle evolution time step that simulated inertial effects of particles in motion. The relaxation factor was derived from a locally-applied Stokes drag model. The local fluid velocity was obtained stochastically from the local probability distributions, which was interpolated from the distributions at all neighboring node points. The particle transport snapshots and deposition rates were computed and the results compared well with previous experimental and computational data for the wide range of particle inertia values considered. These results demonstrate that this Boltzmann-based model may be an attractive computational method for turbulent particle tracking due to its accuracy, simplicity, and decreased computational expense. 84 An unsteady vortical flow over a square cylinder placed in a channel was developed and particle dynamics were investigated. The LBM simulation was able to compute the unsteady wake structures and the periodic asymmetric shedding of vortices in the wake of an obstructing square cylinder in a 2D channel flow. Particles having Stokes number greater than unity were not affected by the flow in the wake region while particles with smaller Stokes numbers (St < 1) interact strongly with the wake vortices. In addition, particle deposition on the front side of the cylinder decreases with Stokes number, and showed good agreement with previous computational results. The turbulent square duct flow was fully developed and velocity statistics of the mean streamwise velocity, secondary flow, and velocity fluctuations compared well with previous DNS results. Particle tracking was compared with experimental pipe flow data and computational square duct DNS results. The particle deposition rate was calculated and the results agree quantitatively in the diffusion-impact and inertia-moderated regimes. The unique inertia-dependent deposition patterns previously reported for a square duct were recovered by the present model. 
For higher inertia particles, more deposition occurs around the corners of the walls, and for lower inertial particles, more particles are deposited near the centers of the walls. Both of these results may be 85 attributed to the unique combination of the secondary flows and the turbulence intensity gradient. The model was accelerated using a GPU and the resulting execution times were compared to the CPU computations. For the three-dimensional turbulent flow computations having a total of 2.5 million node points, the GPU program executed 190 times faster than the corresponding CPU computation. Regarding particle tracking, the GPU was 16 times faster than the CPU when applying the static probability field. This speedup capability makes the model more physically realistic by allowing for application of an evolving probability field during the particle tracking simulation. If all of the programs, including the fluid evolution and the particle tracking computations, were parallelized on a GPU, it would run 134 times faster than on a CPU. The GPU performance indicates that the model can be significantly accelerated using a GPU for feasible implementation on a single personal computer. We conclude that the Boltzmann-based model particle tracking model presented in this thesis represents a computationally inexpensive model for tracking inertial particles in turbulent flow, and has great potential for feasible implementation for more complex flow geometries. However, for near passive tracer motion, the model is not as 86 accurate as a DNS simulation. It is worth noting here that the model may be further refined to extend the range of applicability towards passive tracers. For example, in the present model, each particle evolution time step features a new local fluid velocity extracted from the local probability distributions. Small scale eddies, which may more significantly influence the motion of extremely light particles, may be further resolved by including a velocity history, wherein the local fluid velocity from the previous time step is remembered in a colored noise scheme. 87 Bibliography Alexander, F.J., Chen, S., Sterling, J.D. (1993). Lattice Boltzmann thermohydrodynamics. Phys. Rev. E 47:R2249 –52 Astorino, M., Becerra, S. J., Quarteroni, A. (2012). A modular lattice Boltzmann GPU based solver. SeMA Journal, Vol. 59, 53-78. Bhatnagar, P.L., and Gross, E.P. Kook, M. (1954). A model for collision Process in gases. I. Small amplitude processes in charged and neutral one-component system, Phys. Rev., 94,511-525. Bolz, J., Farmer, I., Grinspun, E., Schroder, P. (2003). Sparse matrix solvers on the GPU: conjugate gradients and multigrid, ACM Transactions Graphics 22(3):917 –924. Brandon, D.J., Aggarwal, S.K., (2001). A numerical investigation of particle deposition on a square cylinder placed in a channel flow. Aerosol Science and Technology 34: 340–352. Brooke, J.W., Hanratty, T.J., and McLaughlin, J.B. (1994). Free-flight mixing and deposition of aerosols, Phys. Fluids 6:3404–3415. Buck, I. (2005). Stream Computing on Graphics Hardware. Stanford University Doctoral Dissertation Cabral, B., Cam, N., and Foran, J. (1994). Accelerated volume rendering and tomographic reconstruction using texture mapping hardware. ACM Symposium on Volume Visualization 94, 91–98. Caporaloni, M., Tampieri, F., Trombetti, F. and Vittori, O. (1975) Transfer of particles in nonisotropic air turbulence. Journal of the Atmospheric Sciences, 32: 565-568. Carr, N. A., Hall, J. D., Hart, J. C. (2002). The Ray Engine. 
Chang, E., and Kailasanath, K. (1996). Simulation of dynamics in a confined shear flow. AIAA J. 34:1160-1166.
Chen, H. (1993). Discrete Boltzmann systems and fluid flows. Comput. Phys. 7:632-37.
Chen, H., Chen, S., and Matthaeus, W.H. (1992). Recovery of the Navier-Stokes equations using a lattice-gas Boltzmann method. Phys. Rev. A 45:R5339-42.
Chen, S., Dawson, S.P., Doolen, G.D., Janecky, D.R., and Lawniczak, A. (1995). Lattice methods and their applications to reacting systems. Comput. Chem. Eng. 19:617-46.
Chen, S., and Doolen, G.D. (1998). Lattice Boltzmann method for fluid flows. Annu. Rev. Fluid Mech. 30:329-364.
Chen, S., Martinez, D., and Mei, R. (1996). On boundary conditions in lattice Boltzmann methods. Phys. Fluids 8:2527-36.
Chen, M., and McLaughlin, J.B. (1995). A new correlation for the aerosol deposition rate in vertical ducts. Journal of Colloid and Interface Science 169:437-455.
Chu, N.S., and Tai, C. (2005). MoXi: real-time ink dispersion in absorbent paper. ACM Transactions on Graphics 24(3):504-511.
Crowe, C.T., Chung, J.N., and Trout, T.R. (1988). Particle mixing in free shear flows. Progress in Energy and Combustion Science 14:171-194.
Davies, C.N. (1965). The rate of deposition of aerosol particles from turbulent flow through ducts. Annals of Occupational Hygiene 8:239-245.
Davis, R.W., Moore, E.F., and Purtell, L.P. (1984). A numerical-experimental study of confined flow around rectangular cylinders. Phys. Fluids 27:46-59.
Dawson, S.P., Chen, S., and Doolen, G.D. (1993). Lattice Boltzmann computations for reaction-diffusion equations. J. Chem. Phys. 98:1514-23.
Demuren, A.O., and Rodi, W. (1984). Calculation of turbulence-driven secondary motion in noncircular ducts. J. Fluid Mech. 140:189.
Desplat, J.C., Pagonabarraga, I., and Bladon, P. (2001). LUDWIG: a parallel lattice-Boltzmann code for complex fluids. Computer Physics Communications 134:273-290.
Eaton, J.K., and Fessler, J.R. (1994). Preferential concentration of particles by turbulence. Int. J. Multiphase Flow 20:169-209.
Eggels, J.G.M. (1996). Direct and large-eddy simulation of turbulent fluid flow using the lattice Boltzmann scheme. Int. J. Heat Fluid Flow 17:307-23.
Eggels, J.G.M., and Somers, J.A. (1995). Numerical simulation of free convective flow using the lattice-Boltzmann scheme. Int. J. Heat Fluid Flow 16:357-64.
England, J.N. (1978). A system for interactive modeling of physical curved surface objects. In Proceedings of SIGGRAPH '78, 336-340.
Fan, Z., Qiu, F., Kaufman, A., and Yoakum-Stover, S. (2004). GPU cluster for high performance computing. Proceedings of the 2004 ACM/IEEE Conference on Supercomputing, p. 47.
Fessler, J.R., Kulick, J.D., and Eaton, J.K. (1994). Preferential concentration of heavy particles in a turbulent channel flow. Phys. Fluids 6:3742-3749.
Fuchs, N.A. (1964). The Mechanics of Aerosols. New York: Pergamon Press.
Friedlander, S.K., and Johnstone, H.F. (1957). Deposition of suspended particles from turbulent gas streams. Ind. Eng. Chem. 49:1151-1156.
Frisch, U., Hasslacher, B., and Pomeau, Y. (1986). Lattice-gas automata for the Navier-Stokes equation. Phys. Rev. Lett.
Gavrilakis, S. (1992). Numerical simulation of low-Reynolds-number turbulent flow through a straight square duct. J. Fluid Mech. 244:101-129.
Ginzbourg, I., and Adler, P.M. (1994). Boundary flow condition analysis for the three-dimensional lattice Boltzmann model. Journal de Physique II France 4:191-214.
Guha, A. (1997). A unified Eulerian theory of turbulent deposition to smooth and rough surfaces. Journal of Aerosol Science 28:1517-1537.
Gunstensen, A.K., and Rothman, D.H. (1993). Lattice Boltzmann studies of immiscible two-phase flow through porous media. J. Geophys. Res. 98:6431-41.
Gunstensen, A.K., Rothman, D.H., Zaleski, S., and Zanetti, G. (1991). Lattice Boltzmann model of immiscible fluids. Phys. Rev. A 43:4320-27.
Guo, Z., Zheng, C., and Shi, B. (2002). Discrete lattice effects on the forcing term in the lattice Boltzmann method. Physical Review E 65:046308.
Halliday, I., Care, C.M., Thompson, S., and White, D. (1996). Induced burst of fluid drops in a two-component lattice Bhatnagar-Gross-Krook fluid. Phys. Rev. E 54:2573-2576.
Harris, M., Baxter, W., Scheuermann, T., and Lastra, A. (2003). Simulation of cloud dynamics on graphics hardware. In Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, 92-101.
He, X., Chen, S., and Doolen, G.D. (1998). A novel thermal model for the lattice Boltzmann method in the incompressible limit. J. Comput. Phys. 146(1):282-300.
He, X., and Luo, L. (1997). Lattice Boltzmann model for the incompressible Navier-Stokes equation. J. Stat. Phys. 88:927-944.
He, X., Zou, Q., Luo, L.S., and Dembo, M. (1997). Analytic solutions and analysis on non-slip boundary condition for the lattice Boltzmann BGK model. Journal of Statistical Physics 87:115-136.
Heinsohn, R.J., and Kabel, R.L. (1999). Sources and Control of Air Pollution. Prentice Hall, New Jersey, 409-425.
Hou, S., Sterling, J., Chen, S., and Doolen, G.D. (1996). A lattice Boltzmann subgrid model for high Reynolds number flows. Fields Inst. Commun. 6:151-6.
Huser, A., and Biringen, S. (1993). Direct numerical simulation of turbulent flow in a square duct. J. Fluid Mech. 257:65-95.
Inamuro, T., Yoshino, M., and Ogino, F. (1995). A nonslip boundary condition for lattice Boltzmann simulations. Phys. Fluids 7:2928-30.
Jacobs, G.B., and Armstrong, K. (2009). Inertial particle dispersion in the Lagrangian wake of a square cylinder. AIAA Paper 2009-1026.
Jafari, S., Salmanzadeh, M., Rahnama, M., and Ahmadi, G. (2010). Investigation of particle dispersion and deposition in a channel with a square cylinder obstruction using the lattice Boltzmann method. Journal of Aerosol Science 41:198-206.
Thibault, J.C., and Senocak, I. (2009). CUDA implementation of a Navier-Stokes solver on multi-GPU desktop platforms for incompressible flows. 47th AIAA Aerospace Sciences Meeting, AIAA 2009-758.
Junk, M., and Yang, Z. (2008). Outflow boundary conditions for the lattice Boltzmann method. Progress in Computational Fluid Dynamics 8:38-48.
Kajishima, T., and Miyake, Y. (1992). A discussion on eddy viscosity models on the basis of the large eddy simulation of turbulent flow in a square duct. Comput. Fluids 21(2):151-161.
Kandhai, D., Koponen, A., Hoekstra, A.G., Kataja, M., Timonen, J., and Sloot, P.M.A. (1998). Lattice Boltzmann hydrodynamics on parallel systems. Computer Physics Communications 111(1-3):14-26.
Kang, Q., Zhang, D., and Chen, S. (2002). Unified lattice Boltzmann method for flow in multiscale porous media. Phys. Rev. E 66(5):056307.
Kato, T., Kono, K., Seta, K., Martinez, D., and Chen, S. (1997). Boiling two-phase flows by the lattice Boltzmann method. Int. J. Mod. Phys., in press.
Kaufman, A., Fan, Z., and Petkov, K. (2009). Implementing the lattice Boltzmann model on commodity graphics hardware. J. Stat. Mech. P06016.
Kruger, J., and Westermann, R. (2003). Linear algebra operators for GPU implementation of numerical algorithms. ACM Transactions on Graphics 22(3):908-916.
Kuerten, J.G.M. (2005). Subgrid modeling in particle-laden channel flow. Phys. Fluids 18:45-82.
Lallemand, P., and Luo, L.S. (2003). Theory of the lattice Boltzmann method: acoustic and thermal properties in two and three dimensions. Phys. Rev. E 68:036706.
Lammers, P., Beronov, K.N., Volkert, R., Brenner, G., and Durst, F. (2006). Lattice BGK direct numerical simulation of fully developed turbulence in incompressible plane channel flow. Computers and Fluids 35:1137-1153.
Lazaro, B.J., and Lasheras, J.C. (1992). Particle dispersion in the developing free shear layer, Part 1 - Unforced flow. J. Fluid Mech. 235:143-178.
Li, W., Fan, Z., Wei, X., and Kaufman, A. (2005). Flow simulation with complex boundaries. GPU Gems 2, chap. 47, 747-764. Addison-Wesley, Boston, MA.
Li, W., Wei, X., and Kaufman, A. (2003). Implementing lattice Boltzmann computation on graphics hardware. Visual Computer 19(7-8):444-456.
Liu, B.Y.H., and Agarwal, J.K. (1974). Experimental observation of aerosol deposition in turbulent flow. J. Aerosol Sci. 5:145-155.
Liu, B.Y.H., and Ilori, T.A. (1974). Aerosol deposition in turbulent pipe flow. Environmental Science and Technology 8:351-356.
Liu, P.L.F., Yeh, H., and Synolakis, C. (2008). Advanced Numerical Models for Simulating Tsunami Waves and Runup. World Scientific Publishing, Volume 10.
Longmire, E.K., and Eaton, J.K. (1992). Structure of a particle-laden round jet. J. Fluid Mech. 236:217-257.
Madabhushi, R.K., and Vanka, S.P. (1991). Large eddy simulation of turbulence-driven secondary flow in a square duct. Phys. Fluids A 3:2734-2745.
Maier, R.S., Bernard, R.S., and Grunau, D.W. (1996). Boundary conditions for the lattice Boltzmann method. Phys. Fluids 8:1788-1801.
Martinez, D.O., Chen, S., and Matthaeus, W.H. (1994). Lattice Boltzmann magnetohydrodynamics. Phys. Plasmas 1:1850-67.
Martys, N.S., Shan, X., and Chen, H. (1998). Evaluation of the external force term in the discrete Boltzmann equation. Physical Review E 58:6855-6857.
Maxey, M.R. (1987). The gravitational settling of aerosol particles in homogeneous turbulence and random flow fields. J. Fluid Mech. 174:441-465.
Maxey, M.R., and Riley, J.J. (1983). Equation of motion for a small rigid sphere in a nonuniform flow. Phys. Fluids 26:883-889.
Mendoza, M., and Munoz, J.D. (2008). Three-dimensional lattice Boltzmann model for magnetic reconnection. Phys. Rev. E 77:026713.
McLaughlin, J.B. (1989). Aerosol particle deposition in numerically simulated channel flow. Phys. Fluids A 1:1211-1224.
Michaelides, E.E. (2006). Particles, Bubbles, and Drops: Their Motion, Heat, and Mass Transfer. World Scientific, New Jersey, 256-260.
Noble, D.R., Chen, S., Georgiadis, J.G., and Buckius, R.O. (1995). A consistent hydrodynamic boundary condition for the lattice Boltzmann method. Phys. Fluids 7:203-9.
NVIDIA (2010). NVIDIA CUDA Programming Guide 4.2. http://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf
Pattison, M.J., Premnath, K.N., and Banerjee, S. (2009). Computation of turbulent flow and secondary motions in a square duct using a forced generalized lattice Boltzmann equation. Phys. Rev. E 79:026704.
Pedinotti, S., Mariotti, G., and Banerjee, S. (1992). Direct numerical simulation of particle behavior in the wall region of turbulent flows in horizontal channels. Int. J. Multiphase Flow 18:927.
Perumal, D.A., Kumar, G.V.S., and Dass, A.K. (2012). Numerical simulation of viscous flow over a square cylinder using lattice Boltzmann method. ISRN Math. Phys. 2012:630801.
Phares, D.J., and Sharma, G. (2006). A DNS study of aerosol deposition in a turbulent square duct flow. Aerosol Sci. and Technol. 40:1016-1024.
Premnath, K.N., Pattison, M.J., and Banerjee, S. (2009). Generalized lattice Boltzmann equation with forcing term for computation of wall-bounded turbulent flows. Phys. Rev. E 79:026703.
Purcell, T.J., Buck, I., Mark, W.R., and Hanrahan, P. (2002). Ray tracing on programmable graphics hardware. ACM Transactions on Graphics 21(3):703-712.
Qiu, F., Zhao, Y., Fan, Z., Wei, X., Lorenz, H., Wang, J., Yoakum-Stover, S., Kaufman, A., and Mueller, K. (2004). Dispersion simulation and visualization for urban security. IEEE Visualization Proceedings, 553-560.
Rhoades, J., Turk, G., Bell, A., State, A., Neumann, U., and Varshney, A. (1992). Real-time procedural textures. Symposium on Interactive 3D Graphics 1992, ACM Press, 95-100.
Rohm, D. (2011). Lattice Boltzmann Simulations on GPUs. Doctoral dissertation, Stuttgart University.
Rouson, D.W.I., and Eaton, J.K. (1994). Direct numerical simulation of turbulent channel flow with immersed particles. ASME/FED Numerical Methods in Multiphase Flows, 185.
Rowghani, S., Mirezaei, M., and Kamali, R. (2010). Numerical simulation of fluid flow past a square cylinder using a lattice Boltzmann method. J. Aero. Sci. and Tech. 7(1):9-17.
Salmanzadeh, M., Rahnama, M., and Ahmadi, G. (2010). Effect of sub-grid scales on large eddy simulation of particle deposition in a turbulent channel flow. Aerosol Sci. and Technol. 44:796-806.
Sanders, J., and Kandrot, E. (2011). CUDA by Example: An Introduction to General-Purpose GPU Programming. Addison-Wesley.
Sankaranarayanan, K., Shan, X., Kevrekidis, I.G., and Sundaresan, S. (2002). Analysis of drag and virtual mass forces in bubbly suspensions using an implicit formulation of the lattice Boltzmann method. J. Fluid Mech. 452:61-96.
Shan, X., and Chen, H. (1993). Lattice Boltzmann model for simulating flows with multiple phases and components. Physical Review E 47:1815-1819.
Shan, X., Yuan, X.-F., and Chen, H. (2006). Kinetic theory representation of hydrodynamics: a way beyond the Navier-Stokes equation. J. Fluid Mech. 550:413-441.
Sharma, G., and Phares, D.J. (2006). Turbulent transport of particles in a straight square duct. Int. J. Multiphase Flow 32:823-837.
Sippola, M.R., and Nazaroff, W.W. (2002). Modeling particle deposition in ventilation ducts. Proceedings of the Indoor Air 2002 Conference, 1:515-520.
Skordos, P.A. (1993). Initial and boundary conditions for the lattice Boltzmann method. Phys. Rev. E 48:4823-42.
Spasov, M., Rempfer, D., and Mokhasi, P. (2009). Simulation of a turbulent channel flow with an entropic lattice Boltzmann method. Int. J. Numer. Meth. Fluids 60:1241-1258.
Succi, S. (2001). The Lattice Boltzmann Equation for Fluid Dynamics and Beyond. Clarendon Press.
Succi, S., Benzi, R., and Higuera, F. (1991). The lattice Boltzmann equation - a new tool for computational fluid dynamics. Physica D 47:219-30.
Suh, Y.J., and Kim, S.S. (1996). Effect of obstructions on the particle collection efficiency in a two-stage electrostatic precipitator. J. Aerosol Sci. 27:61-74.
Sukop, M., and Thorne, D. (2005). Lattice Boltzmann Modeling: An Introduction for Geoscientists and Engineers. Berlin: Springer-Verlag.
Tolke, J., and Krafczyk, M. (2008). TeraFLOP computing on a desktop PC with GPUs for 3D CFD. International Journal of Computational Fluid Dynamics 22(7):443-456.
Tubbs, K.R., and Tsai, F.T.-C. (2011). GPU accelerated Boltzmann model for shallow water flow and mass transport. Int. J. Numer. Meth. Engng 86:316-334.
Turner, D.B. (1994). Workbook of Atmospheric Dispersion Estimates: An Introduction to Dispersion Modeling, 2nd ed. Lewis Publishers, Boca Raton, FL.
Uijttewaal, W.S.J., and Oliemans, R.V.A. (1996). Particle dispersion and deposition in direct numerical and large eddy simulations of vertical pipe flows. Physics of Fluids 8:2590-2604.
Uthuppan, J., Aggarwal, S.K., Grinstein, F.F., and Kailasanath, K. (1994). Particle dispersion in a transitional axisymmetric jet: A numerical simulation. AIAA J. 32:2004-2014.
Vincent, J.H., and Humphries, W. (1978). The collection of airborne dusts by bluff bodies. Chemical Engineering Science 33:1147-1155.
Wang, Q., and Squires, K.D. (1996). Large eddy simulation of particle-laden turbulent channel flow. Phys. Fluids 8:1207-1223.
Wei, X., Li, W., Mueller, K., and Kaufman, A. (2004). The lattice-Boltzmann method for gaseous phenomena. IEEE Transactions on Visualization and Computer Graphics 10(2):164-176.
Wellein, G., Zeiser, T., Donath, S., and Hager, G. (2006). On the single processor performance of simple lattice Boltzmann kernels. Computers & Fluids 35:910-919.
Winkler, C.M., Rani, S.L., and Vanka, S.P. (2003). Preferential concentration of particles in a fully developed turbulent square duct flow. Int. J. Multiphase Flow 30:27-50.
Wolf-Gladrow, D. (2000). Lattice-Gas Cellular Automata and Lattice Boltzmann Models. Berlin: Springer.
Xu, H., and Pollard, A. (2001). Large eddy simulation of turbulent flow in a square annular duct. Phys. of Fluids 13:3321-3337.
Yeung, P.K. (2002). Lagrangian investigations of turbulence. Annu. Rev. Fluid Mech. 34:115-142.
Yeung, P.K., and Pope, S.B. (1989). Lagrangian statistics from direct numerical simulation of isotropic turbulence. J. Fluid Mech. 207:531-586.
Young, J., and Leeming, A. (1997). A theory of particle deposition in turbulent pipe flow. J. Fluid Mech. 340:129-159.
Yu, H., Girimaji, S.S., and Luo, L.S. (2005). Lattice Boltzmann simulations of decaying homogeneous isotropic turbulence. Phys. Rev. E 71.
Zhang, H., and Ahmadi, G. (2000). Aerosol particle transport and deposition in vertical and horizontal turbulent duct flow. J. Fluid Mech. 406:55-80.
Zhao, Y., Han, Y., Fan, Z., Qiu, F., Kuo, Y.C., Kaufman, A., and Mueller, K. (2007). Visual simulation of heat shimmering and mirage. IEEE Trans. Vis. Comput. Graph. 13(1):179-189.
Zhu, H., Liu, X., Liu, Y., and Wu, E. (2006). Simulation of miscible binary mixtures based on lattice Boltzmann method. Journal of Visualization and Computer Animation 17(3-4):403-410.
Ziegler, D.P. (1993). Boundary conditions for lattice Boltzmann simulations. J. Stat. Phys. 71:1171-1177.
Zou, Q., and He, X. (1997). On pressure and velocity boundary conditions for the lattice Boltzmann BGK model. Phys. Fluids 9:1591-1598.

Appendices

Appendix A: LB equation to Navier-Stokes equation

The incompressible Navier-Stokes equations can be derived from the lattice Boltzmann equation through a Taylor series expansion combined with a Chapman-Enskog expansion. Expanding the left-hand side of the lattice Boltzmann equation as a Taylor series up to the second-order term gives

\[
f_i(\mathbf{x}+\mathbf{e}_i\Delta t,\,t+\Delta t)-f_i(\mathbf{x},t)
=\Delta t\left(\frac{\partial}{\partial t}+\mathbf{e}_i\cdot\nabla\right)f_i
+\frac{\Delta t^2}{2}\left(\frac{\partial^2}{\partial t^2}
+2\,\mathbf{e}_i\cdot\nabla\,\frac{\partial}{\partial t}
+\mathbf{e}_i\mathbf{e}_i:\nabla\nabla\right)f_i
\tag{A.1}
\]

where the colon (:) denotes the double-dot product between dyads. The Chapman-Enskog analysis requires expansion of the distribution function in the small parameter $\epsilon$, the Knudsen number, defined as the ratio of the mean free path length to the characteristic length scale of the flow:

\[
f_i=f_i^{(0)}+\epsilon f_i^{(1)}+\epsilon^2 f_i^{(2)},\qquad
\frac{\partial}{\partial t}=\epsilon\frac{\partial}{\partial t_0}+\epsilon^2\frac{\partial}{\partial t_1},\qquad
\nabla=\epsilon\nabla_1
\tag{A.2}
\]

where the subscript 0 denotes the local equilibrium, $f_i^{(0)}=f_i^{eq}$. Macroscopic flow variables are defined by moments of the distribution function:

\[
\rho=\sum_i f_i^{(0)},\qquad
\rho\mathbf{u}=\sum_i \mathbf{e}_i f_i^{(0)},\qquad
\sum_i f_i^{(n)}=0,\quad
\sum_i \mathbf{e}_i f_i^{(n)}=0\quad(n\ge 1)
\tag{A.3}
\]

The lattice Boltzmann equation can be rewritten by substituting the expanded distribution function and separating the result into two equations of different order in the Knudsen number. The first-order terms in $\epsilon$ yield

\[
\left(\frac{\partial}{\partial t_0}+\mathbf{e}_i\cdot\nabla_1\right)f_i^{(0)}
=-\frac{1}{\tau}\,f_i^{(1)}
\tag{A.4}
\]

and applying the moments of Eq. A.3 to Eq. A.4 gives the continuity equation,

\[
\frac{\partial\rho}{\partial t}+\nabla\cdot(\rho\mathbf{u})=0
\tag{A.5}
\]

Similarly, the second-order terms in $\epsilon^2$ become

\[
\frac{\partial f_i^{(0)}}{\partial t_1}
+\left(\frac{\partial}{\partial t_0}+\mathbf{e}_i\cdot\nabla_1\right)f_i^{(1)}
+\frac{1}{2}\left(\frac{\partial}{\partial t_0}+\mathbf{e}_i\cdot\nabla_1\right)^2 f_i^{(0)}
=-\frac{1}{\tau}\,f_i^{(2)}
\tag{A.6}
\]

which, after eliminating the repeated derivative of $f_i^{(0)}$ using Eq. A.4, can be simplified to

\[
\frac{\partial f_i^{(0)}}{\partial t_1}
+\left(1-\frac{1}{2\tau}\right)\left(\frac{\partial}{\partial t_0}+\mathbf{e}_i\cdot\nabla_1\right)f_i^{(1)}
=-\frac{1}{\tau}\,f_i^{(2)}
\tag{A.7}
\]

This can be converted into the momentum equation by taking its first moment with Eq. A.3 and introducing the momentum flux tensor $\boldsymbol{\Pi}$:

\[
\frac{\partial(\rho\mathbf{u})}{\partial t}+\nabla\cdot\boldsymbol{\Pi}=0
\tag{A.8}
\]

where the momentum flux tensor can be expressed as

\[
\boldsymbol{\Pi}=\sum_i \mathbf{e}_i\mathbf{e}_i
\left[f_i^{(0)}+\left(1-\frac{1}{2\tau}\right)f_i^{(1)}\right]
\tag{A.9}
\]

Assuming a low Mach number, or small velocity, the equilibrium distribution function, Eq. 2.9, can be substituted back into the flux tensor to obtain

\[
\boldsymbol{\Pi}=p\,\mathbf{I}+\rho\mathbf{u}\mathbf{u}
-\rho\nu\left(\nabla\mathbf{u}+(\nabla\mathbf{u})^{T}\right)
\tag{A.10}
\]

Substituting Eq. A.10 into the momentum equation, Eq. A.8, the Navier-Stokes equation is recovered:

\[
\frac{\partial(\rho\mathbf{u})}{\partial t}+\nabla\cdot(\rho\mathbf{u}\mathbf{u})
=-\nabla p+\nabla\cdot\left[\rho\nu\left(\nabla\mathbf{u}+(\nabla\mathbf{u})^{T}\right)\right]
\tag{A.11}
\]
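The moment identities of Eq. A.3 can be verified numerically for the discrete equilibrium of Eq. 2.9. The short Fortran program below is an illustrative sketch rather than part of the dissertation's production code; it assumes the standard D2Q9 weights and the same velocity ordering used in the listing of Appendix B.1, builds the equilibrium distribution for an arbitrary (rho, u), and checks that its zeroth and first moments return rho and rho*u to round-off.

! Minimal check of the D2Q9 equilibrium moments (Eqs. 2.9 and A.3).
program check_moments
  implicit none
  real :: w(9), ex(9), ey(9), feq(9)
  real :: rho, ux, uy, eu, usq, r, rux, ruy
  integer :: k
  ! Weights and lattice velocities in the ordering of Appendix B.1
  w  = (/ 4./9., 1./9., 1./9., 1./9., 1./9., 1./36., 1./36., 1./36., 1./36. /)
  ex = (/ 0., 1., 0., -1., 0., 1., -1., -1., 1. /)
  ey = (/ 0., 0., 1., 0., -1., 1., 1., -1., -1. /)
  rho = 1.0
  ux  = 0.05
  uy  = -0.02
  usq = ux*ux + uy*uy
  r = 0.; rux = 0.; ruy = 0.
  do k = 1, 9
    eu = ex(k)*ux + ey(k)*uy
    ! Low-Mach equilibrium, Eq. 2.9
    feq(k) = w(k)*rho*(1. + 3.*eu + 4.5*eu*eu - 1.5*usq)
    r   = r   + feq(k)          ! zeroth moment -> rho
    rux = rux + ex(k)*feq(k)    ! first moment  -> rho*ux
    ruy = ruy + ey(k)*feq(k)    ! first moment  -> rho*uy
  enddo
  write(*,*) 'rho    :', r,   ' (expect', rho,    ')'
  write(*,*) 'rho*ux :', rux, ' (expect', rho*ux, ')'
  write(*,*) 'rho*uy :', ruy, ' (expect', rho*uy, ')'
end program check_moments

Because the odd moments of the weights vanish, the quadratic terms drop out of the first moment and the sums reproduce rho and rho*u exactly, which is the discrete statement of Eq. A.3.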
Appendix B: Program code

B.1 A square cylinder placed in a 2D channel (Fortran)

PROGRAM 2Dchannel
  parameter(ly=61,lx=60*6)
  parameter(nP=20)
  integer is_solid_node(ly,lx),ts,tsf,tcount,Nde,tcountp,tcountd
  real rho(ly,lx),f(ly,lx,9),ftemp(ly,lx,9),ex(9),ey(9),&
       u_x(ly,lx),u_y(ly,lx),x(ly,lx),y(ly,lx),feq(ly,lx,9),&
       Px(nP),Py(nP),rp(1,nP*2),tau(ly,lx),cnde(500)
  real nu,L,num,lynp,pyct,Upxfi(nP),Upyfi(nP)
  real etime, elapsed(2),tota

  call initialCons(eta,pi,Re,uMax,L,B,nu,tau,c)

! Set initial particle positions
! 1. random locations
  !call rand(rp,nP*2)
  !do i = 1,nP
  !  Px(i) = rp(1,i)*10
  !  Py(i) = rp(1,i+nP)*19 + 22
  !enddo
! 2. linearly spaced positions
  pyct = 22.5
  do j=1,nP
    Px(j) = 53
    Py(j) = pyct
    pyct = pyct + 1
  enddo

! Set solid nodes at walls on top and bottom
  is_solid_node = 0
  do i=1,lx
    is_solid_node(1,i)  = 1
    is_solid_node(ly,i) = 1
  enddo
! Solid nodes of the square cylinder
  do j=23,38
    do i=128,143
      is_solid_node(j,i) = 1
    enddo
  enddo

! Velocity field computation using LBM
! Set initial density and distribution functions
  rho = 1.
  do j=1,ly
    do i=1,lx
      f(j,i,1) = (4./9. )*rho(j,i)
      f(j,i,2) = (1./9. )*rho(j,i)
      f(j,i,3) = (1./9. )*rho(j,i)
      f(j,i,4) = (1./9. )*rho(j,i)
      f(j,i,5) = (1./9. )*rho(j,i)
      f(j,i,6) = (1./36.)*rho(j,i)
      f(j,i,7) = (1./36.)*rho(j,i)
      f(j,i,8) = (1./36.)*rho(j,i)
      f(j,i,9) = (1./36.)*rho(j,i)
    enddo
  enddo

! Define lattice velocity vectors
  ex(0+1)= 0
  ey(0+1)= 0
  ex(1+1)= 1
  ey(1+1)= 0
  ex(2+1)= 0
  ey(2+1)= 1
  ex(3+1)=-1
  ey(3+1)= 0
  ex(4+1)= 0
  ey(4+1)=-1
  ex(5+1)= 1
  ey(5+1)= 1
  ex(6+1)=-1
  ey(6+1)= 1
  ex(7+1)=-1
  ey(7+1)=-1
  ex(8+1)= 1
  ey(8+1)=-1

  f1 = 3.
  f2 = 9./2.
  f3 = 3./2.

! Time loop
  tcount  = 1
  tcountd = 1
  tcountp = 1
  do ts=1,tsf

! Computing macroscopic density, rho, and velocity, u=(ux,uy).
  do j=1,ly
    do i=1,lx
      u_x(j,i) = 0.0
      u_y(j,i) = 0.0
      rho(j,i) = 0.0
      if(is_solid_node(j,i).eq.0) then
        do k=1,9
          rho(j,i) = rho(j,i) + f(j,i,k)
          u_x(j,i) = u_x(j,i) + ex(k)*f(j,i,k)
          u_y(j,i) = u_y(j,i) + ey(k)*f(j,i,k)
        enddo
        u_x(j,i) = u_x(j,i)/rho(j,i)
        u_y(j,i) = u_y(j,i)/rho(j,i)
      endif
    enddo
  enddo
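! Boundary conditions: the inlet is driven by a parabolic (Poiseuille)
! velocity profile and the outlet is held at constant density (constant
! pressure). At each plane the distribution functions pointing into the
! domain are unknown and are reconstructed from the known populations
! and the prescribed macroscopic values, following the scheme of
! Zou and He (1997).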
! MACROSCOPIC (DIRICHLET) BOUNDARY CONDITIONS
! Inlet: Poiseuille profile
  do j = 2,ly-1
    y_phys = j-1.5
    u_x(j,1) = 4.*uMax/(L*L) * (y_phys*L - y_phys*y_phys)
    !u_x(j,1) = uMax
    u_y(j,1) = 0
    fsum1 = f(j,1,1) + f(j,1,3) + f(j,1,5)
    fsum2 = f(j,1,4) + f(j,1,7) + f(j,1,8)
    rho(j,1) = 1./(1-u_x(j,1)) * (fsum1 + 2.*fsum2)
! Outlet: constant pressure
    rho(j,lx) = 1.
    fsum3 = f(j,lx,1) + f(j,lx,3) + f(j,lx,5)
    fsum4 = f(j,lx,2) + f(j,lx,6) + f(j,lx,9)
    u_x(j,lx) = -1 + 1./(rho(j,lx)) * (fsum3 + 2.*fsum4)
    u_y(j,lx) = 0
  enddo

! MICROSCOPIC BOUNDARY CONDITIONS: INLET (Zou/He BC)
  do j = 2,ly-1
    f(j,1,2) = f(j,1,4) + 2./3.*rho(j,1)*u_x(j,1)
    f(j,1,6) = f(j,1,8) + 1./2.*(f(j,1,5)-f(j,1,3)) + 1./2.*rho(j,1)*u_y(j,1)&
             + 1./6.*rho(j,1)*u_x(j,1)
    f(j,1,9) = f(j,1,7) + 1./2.*(f(j,1,3)-f(j,1,5)) - 1./2.*rho(j,1)*u_y(j,1)&
             + 1./6.*rho(j,1)*u_x(j,1)
! MICROSCOPIC BOUNDARY CONDITIONS: OUTLET (Zou/He BC)
    f(j,lx,4) = f(j,lx,2) - 2./3.*rho(j,lx)*u_x(j,lx)
    f(j,lx,8) = f(j,lx,6) + 1./2.*(f(j,lx,3)-f(j,lx,5)) - 1./2.*rho(j,lx)*u_y(j,lx)&
              - 1./6.*rho(j,lx)*u_x(j,lx)
    f(j,lx,7) = f(j,lx,9) + 1./2.*(f(j,lx,5)-f(j,lx,3)) + 1./2.*rho(j,lx)*u_y(j,lx)&
              - 1./6.*rho(j,lx)*u_x(j,lx)
  enddo

! Compute the equilibrium distribution function, feq.
  do j=1,ly
    do i=1,lx
      if(is_solid_node(j,i).eq.0) then
        rt0 = (4./9. )*rho(j,i)
        rt1 = (1./9. )*rho(j,i)
        rt2 = (1./36.)*rho(j,i)
        ueqxij = u_x(j,i)
        ueqyij = u_y(j,i)
        uxsq = ueqxij*ueqxij
        uysq = ueqyij*ueqyij
        uxuy5 =  ueqxij + ueqyij
        uxuy6 = -ueqxij + ueqyij
        uxuy7 = -ueqxij - ueqyij
        uxuy8 =  ueqxij - ueqyij
        usq = uxsq + uysq
        feq(j,i,0+1) = rt0*(1. - f3*usq)
        feq(j,i,1+1) = rt1*(1. + f1*ueqxij + f2*uxsq - f3*usq)
        feq(j,i,2+1) = rt1*(1. + f1*ueqyij + f2*uysq - f3*usq)
        feq(j,i,3+1) = rt1*(1. - f1*ueqxij + f2*uxsq - f3*usq)
        feq(j,i,4+1) = rt1*(1. - f1*ueqyij + f2*uysq - f3*usq)
        feq(j,i,5+1) = rt2*(1. + f1*uxuy5 + f2*uxuy5*uxuy5 - f3*usq)
        feq(j,i,6+1) = rt2*(1. + f1*uxuy6 + f2*uxuy6*uxuy6 - f3*usq)
        feq(j,i,7+1) = rt2*(1. + f1*uxuy7 + f2*uxuy7*uxuy7 - f3*usq)
        feq(j,i,8+1) = rt2*(1. + f1*uxuy8 + f2*uxuy8*uxuy8 - f3*usq)
      endif
    enddo
  enddo

! Collision step.
  do j=1,ly
    do i=1,lx
      if(is_solid_node(j,i).eq.1) then
! Standard bounceback
        temp = f(j,i,1+1)
        f(j,i,1+1) = f(j,i,3+1)
        f(j,i,3+1) = temp
        temp = f(j,i,2+1)
        f(j,i,2+1) = f(j,i,4+1)
        f(j,i,4+1) = temp
        temp = f(j,i,5+1)
        f(j,i,5+1) = f(j,i,7+1)
        f(j,i,7+1) = temp
        temp = f(j,i,6+1)
        f(j,i,6+1) = f(j,i,8+1)
        f(j,i,8+1) = temp
      else
! Regular collision
        do k=1,5
          f(j,i,k) = f(j,i,k) - (f(j,i,k) - feq(j,i,k))/tau(j,i)! &
                   !+ 1./3.*ex(k)*g
        enddo
        do k=6,9
          f(j,i,k) = f(j,i,k) - (f(j,i,k) - feq(j,i,k))/tau(j,i)! &
                   !+ 1./12.*ex(k)*g
        enddo
      endif
    enddo
  enddo

! Streaming step; subtle changes to periodicity here due to indexing
  do j=1,ly
    if (j.gt.1) then
      jn = j-1
    else
      jn = ly
    endif
    if (j.lt.ly) then
      jp = j+1
    else
      jp = 1
    endif
    do i=1,lx
      if (i.gt.1) then
        in = i-1
      else
        in = lx
      endif
      if (i.lt.lx) then
        ip = i+1
      else
        ip = 1
      endif
      ftemp(j,i,0+1)   = f(j,i,0+1)
      ftemp(j,ip,1+1)  = f(j,i,1+1)
      ftemp(jp,i,2+1)  = f(j,i,2+1)
      ftemp(j,in,3+1)  = f(j,i,3+1)
      ftemp(jn,i,4+1)  = f(j,i,4+1)
      ftemp(jp,ip,5+1) = f(j,i,5+1)
      ftemp(jp,in,6+1) = f(j,i,6+1)
      ftemp(jn,in,7+1) = f(j,i,7+1)
      ftemp(jn,ip,8+1) = f(j,i,8+1)
    enddo
  enddo
  f = ftemp

! Calculation of the large eddy viscosity
  do j=1,ly
    do i=1,lx
      pixx = 0
      piyy = 0
      pixy = 0
      piyx = 0
      do k = 1,9
        pixx = pixx + ex(k)*ex(k)*(f(j,i,k)-feq(j,i,k))
        piyy = piyy + ey(k)*ey(k)*(f(j,i,k)-feq(j,i,k))
        pixy = pixy + ex(k)*ey(k)*(f(j,i,k)-feq(j,i,k))
        piyx = piyx + ey(k)*ex(k)*(f(j,i,k)-feq(j,i,k))
      enddo
      stsq = pixx*pixx + piyy*piyy + pixy*pixy + piyx*piyx
      s1 = 1./6./csq
      s2 = 18*csq*sqrt(stsq)
      s3 = sqrt(nusq + s2)
      s4 = s1*(s3 - nu)
      tau(j,i) = 3.*(nu + csq*s4) + 1./2.
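! The eddy viscosity above follows the LBM-Smagorinsky closure of
! Section 2.6: the non-equilibrium momentum flux,
! Pi_ab = sum_k e_a e_b (f_k - feq_k), estimates the filtered
! strain-rate magnitude, and the resulting turbulent viscosity csq*s4
! (with csq = Cs**2) is absorbed into a locally variable relaxation
! time tau(j,i).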
    enddo
  enddo

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! Particle tracking through the stored probability field
  call part2dcy(nP,lx,ly,ex,ey,f,uxp,uyp,uxpc_n,uypc_n,xpc_n,ypc_n,St)

! End time loop
  enddo

  pause
END PROGRAM

SUBROUTINE part2dcy(nP,lx,ly,ex,ey,f,uxp,uyp,uxpc_n,uypc_n,xpc_n,ypc_n,St)
  real xpc_n(nP),ypc_n(nP),f(ly,lx,9),ex(9),ey(9),dx,dy,dt
  real uxpc_n(nP),uypc_n(nP),xpos(lx),ypos(ly),eta,St
  real u_x(ly,lx),u_y(ly,lx),uxp(ly,lx),uyp(ly,lx)
  real rn(1,nP*4),r,prn(nP*4),Upxc,Upyc
  real uxout(ly,lx),uyout(ly,lx),uxpc(nP),uypc(nP),xin,yin
  integer Nnx,Npx,Nny,Npy,rcount,tc,X1,X2,X3,X4

  Nde = 0.
  call rand(rn,nP*4)
  do i = 1,nP*4
    prn(i) = rn(1,i)
  end do

  dx = 6./360
  dy = 1./60
  dt = dx*dx
  eta = 1 - exp(-dt/St)

  do j=1,ly
    ypos(j) = dy*(j-1)
  enddo
  do i=1,lx
    xpos(i) = dx*(i-1)
  enddo
  do j=1,ly
    do i = 1,lx
      u_x(j,i) = 1/dx*uxp(j,i)
      u_y(j,i) = 1/dy*uyp(j,i)
    enddo
  enddo

  rcount = 1
  do k = 1,nP
    xin = xpc_n(k)
    yin = ypc_n(k)
    if (yin .le. 0 .or. yin .ge. 1) then
      Upyc = 0
      Upxc = 0
    else
      call searind(indi1,indj1,indi2,indj2,xin,yin,xpos,ypos,lx,ly)
! Distances between the particle and the 4 surrounding nodes
      Dp1 = sqrt((xin/dx-indi1)**2 + (yin/dy-indj1)**2)
      Dp2 = sqrt((xin/dx-indi1)**2 + (yin/dy-indj2)**2)
      Dp3 = sqrt((xin/dx-indi2)**2 + (yin/dy-indj1)**2)
      Dp4 = sqrt((xin/dx-indi2)**2 + (yin/dy-indj2)**2)
      TD = Dp1 + Dp2 + Dp3 + Dp4
      do j=1,ly
        do i=1,lx
          uxout(j,i) = u_x(j,i)
          uyout(j,i) = u_y(j,i)
        enddo
      enddo
      call intpvel2d3(ux_yn,uy_yn,uxout,uyout,xin,yin,dx,dy,&
                      indi1,indj1,indi2,indj2,xpos,ypos,lx,ly)
      Upxc = ux_yn
      Upyc = uy_yn
    endif

    if (xin .ge. 127*dx .and. xin .le. 142*dx .and. &
        yin .ge. 22*dy .and. yin .le. 37*dy) then
! Particle deposited on the cylinder
      uxpc_n(k) = 0
      uypc_n(k) = 0
      Nde = Nde + 1
    elseif (xin .gt. 359*dx .or. yin .gt. 59*dy .or. yin .lt. 0) then
      uxpc_n(k) = 0
      uypc_n(k) = 0
    elseif (xin .le. 122*dx) then
      uxpc_n(k) = Upxc
      uypc_n(k) = Upyc
    else
! Calculating distances between a particle and the 4 nodes
      r = prn(rcount)
      call prob2d(indi1,indj1,ex,ey,f,lx,ly,nP,r,X1)
      r = prn(rcount+1)
      call prob2d(indi1,indj2,ex,ey,f,lx,ly,nP,r,X2)
      r = prn(rcount+2)
      call prob2d(indi2,indj1,ex,ey,f,lx,ly,nP,r,X3)
      r = prn(rcount+3)
      call prob2d(indi2,indj2,ex,ey,f,lx,ly,nP,r,X4)
      rcount = rcount + 4
      f1 = f(indi1,indj1,X1)
      f2 = f(indi1,indj2,X2)
      f3 = f(indi2,indj1,X3)
      f4 = f(indi2,indj2,X4)
      rhop = (f1 + f2 + f3 + f4)
! Velocity of the particle
      epxn1 = ex(X1)
      epxn2 = ex(X2)
      epxn3 = ex(X3)
      epxn4 = ex(X4)
      epyn1 = ey(X1)
      epyn2 = ey(X2)
      epyn3 = ey(X3)
      epyn4 = ey(X4)
      Upx = (1/dx)*(1/TD)*(epxn1*f1*Dp4 + epxn2*f2*Dp3 + &
            epxn3*f3*Dp2 + epxn4*f4*Dp1) / rhop
      Upy = (1/dy)*(1/TD)*(epyn1*f1*Dp4 + epyn2*f2*Dp3 + &
            epyn3*f3*Dp2 + epyn4*f4*Dp1) / rhop
      uxpc_n(k) = eta*Upx + (1-eta)*uxpc_n(k)
      uypc_n(k) = eta*Upy + (1-eta)*uypc_n(k)
    endif

! New position of the particle
    Pnx = xpc_n(k) + uxpc_n(k)*dt
    Pny = ypc_n(k) + uypc_n(k)*dt
    xpc_n(k) = Pnx
    ypc_n(k) = Pny

! Finding Nd_Np
    !if (Upxi(i) .eq. 0) then
    !  Nde = Nde + 1
    !end if
    !enddo
  enddo
END SUBROUTINE
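! part2dcy advances the particles one step using the stored probability
! field: at each of the four nodes surrounding a particle, prob2d
! (whose listing is omitted here) uses a random number to draw one
! lattice direction from the local distribution function, and the drawn
! lattice velocities are inverse-distance weighted to the particle
! position. Inertia enters through eta = 1 - exp(-dt/St), the exact
! one-step integrator of the Stokes drag law du_p/dt = (u_f - u_p)/St,
! so that u_p(new) = eta*u_sampled + (1 - eta)*u_p(old).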
subroutine searind(indi1,indj1,indi2,indj2, &
                   xin,yin,xposi,yposi,lx,ly)
  integer indi1, indj1, indi2, indj2
  real xin, yin, xposi(lx), yposi(ly)
  do i=1,lx
    if(xposi(i) .ge. xin) then
      indi1 = i-1
      indi2 = i
      goto 100
    endif
  enddo
100 continue
  do j=1,ly
    if(yposi(j) .ge. yin) then
      indj1 = j-1
      indj2 = j
      goto 101
    endif
  enddo
101 continue
end subroutine

subroutine intpvel2d3(ux_p,uy_p,ux,uy,xin,yin,dx,dy,&
                      indi1,indj1,indi2,indj2,xpos,ypos,lx,ly)
  real xin, yin, dx, dy, ux_p, uy_p, r
  real ux(ly,lx), uy(ly,lx), xpos(lx), ypos(ly)
  real uxf(4), uyf(4), xpf(4), ypf(4) !,hx(4),hy(4)
  integer indi1, indj1, indi2, indj2
  uxf(1) = ux(indj1,indi1)
  xpf(1) = xpos(indi1)
  xpf(2) = xpos(indi2)
  uxf(2) = ux(indj2,indi1)
  ypf(1) = ypos(indj1)
  ypf(2) = ypos(indj2)
  uxf(3) = ux(indj1,indi2)
  uxf(4) = ux(indj2,indi2)
  uyf(1) = uy(indj1,indi1)
  uyf(2) = uy(indj2,indi1)
  uyf(3) = uy(indj1,indi2)
  uyf(4) = uy(indj2,indi2)
  rx = (xin - xpf(1))/dx
  ry = (yin - ypf(1))/dy
  ux1 = (1-rx)*uxf(1) + rx*uxf(3)
  ux2 = (1-rx)*uxf(2) + rx*uxf(4)
  uy1 = (1-rx)*uyf(1) + rx*uyf(3)
  uy2 = (1-rx)*uyf(2) + rx*uyf(4)
  ux_p = (1-ry)*ux1 + ry*ux2
  uy_p = (1-ry)*uy1 + ry*uy2
end subroutine

B.2 3D square duct flow (Fortran)

!---------------------------
      Program LBM3D
!---------------------------
!---------------------------------------------------------------------
      character (len=60) :: fin, fin1, fout, fout1, fil2, fint
      character (len=7)  :: fmt1
      character (len=8)  :: ctime
!--------------------------------------------------
      include 'mpif.h'
      include 'dim.h'
!--------------------------------------------------
!-irelax: Perform relaxation process.
!-irelax: Flag indicating whether relaxation process has reached end.
!-------------------------------------------
      include 'comgridxyzt.h'
      character (len=2)  :: f11
      character (len=16) :: fil
      character (len=9)  :: output_dir
      character (len=9)  :: fins
      character (len=18) :: finsf
!
!-Variables used for timing
      real :: timestart, timeend
!- Miscellaneous variables
      real :: hc, nusq, stsq, s2, usnum, csq, umax, G, tau_i
      real :: fac1, fac2, fac3, f1, f2, f3
      integer :: ts, tsf, rcount
      real :: temp, coftemp
!-------------
!-MPI variables
!-------------
      integer :: myid, ierr, numprocs, comm
      integer :: nprocs
      integer :: twoslice, subslice
      integer, dimension(nproch*nprocv,2) :: id2d
      integer, dimension(nproch,nprocv) :: id1d
      integer, dimension(nproch) :: group, group_rank
      integer, dimension(2) :: group_h, group_rank_h
!-------------------
!- Define 3-D arrays
!-------------------
      real, dimension(:,:,:), allocatable :: ux,uy,uz, rho, tau
      real, dimension(:,:,:), allocatable :: uxp,uyp,uzp
      real, dimension(:,:,:,:), allocatable :: f, ftemp, fdiff, feq
      real, dimension(:), allocatable :: ex, ey, ez
      real, dimension(:,:), allocatable :: umean, urmsm
      integer, dimension(:,:,:), allocatable :: is_solid_node
!-Integer variable that raises a red flag on allocation errors
      integer AllocateStatus
!---------------------
!- Allocate 3-D arrays
!---------------------
      allocate (ux(nz,ny,nxpl),uy(nz,ny,nxpl),uz(nz,ny,nxpl),
     >          rho(nz,ny,nxpl), tau(nz,ny,nxpl),
     >          is_solid_node(nz,ny,nxpl),
     >          uxp(nz,ny,nxpl),uyp(nz,ny,nxpl),uzp(nz,ny,nxpl),
     >          stat=AllocateStatus)
      if (AllocateStatus /= 0) then
        stop "**Not Enough Memory - Phys. Space 3D Arrays in MAIN **"
      end if
      allocate(f(nz,ny,nxpl,19),ftemp(nz,ny,nxpl,19),
     >         fdiff(nz,ny,nxpl,19),feq(nz,ny,nxpl,19),
     >         stat=AllocateStatus)
      if (AllocateStatus /= 0) then
        stop "**Not Enough Memory - Distribution function in MAIN **"
      end if
      allocate(ex(19), ey(19), ez(19), umean(nz,ny), urmsm(nz,ny),
     >         stat=AllocateStatus)
      if (AllocateStatus /= 0) then
        stop "**Not Enough Memory - Lattice speed in MAIN **"
      end if
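!- Note on the decomposition: dim.h fixes the 2-D processor grid
!  (nproch x nprocv), and each rank allocates only nxpl of the nx
!  streamwise planes, so all local field arrays are dimensioned
!  (nz,ny,nxpl) rather than (nz,ny,nx).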
!*******************
!------------------
!-MPI initialization
!------------------
!*******************
      comm = mpi_comm_world
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(comm,myid,ierr)
      call MPI_COMM_SIZE(comm,numprocs,ierr)
      if( myid == 0 ) then
        nprocs = nproch*nprocv
        write(*,*) ' MPI initialized with ',numprocs,
     >             ' live processors specified in run_mpi'
        write(*,*) ' Number of horizontal processors', nproch
        write(*,*) ' Number of vertical processors', nprocv
        if( nprocs .ne. numprocs ) then
          write(*,*) ' Number of live processors: ', numprocs,
     >               ' does not match'
          write(*,*) ' the number specified in dim.h: ', nprocs
          write(*,*) ' Check the run_mpi script and the dim.h file.'
          goto 999 ! shutdown MPI and exit gracefully
          ! call MPI_ABORT(comm,ierr) ! MPI_ABORT causes problems
        endif
      endif
!---------------------------------------------------------
! Create specialized data structure for MPI data swaps
!---------------------------------------------------------
      comm = mpi_comm_world
      call create_MPI_data_type(nx,ny,nz,nxpl,nypl,nzpl, &
                                twoslice,subslice, &
                                myid,comm)
      call MPI_BARRIER(comm,ierr)
!---------------------------------------------------------
! Set up 2-D MPI processor grid
!---------------------------------------------------------
      call mpi_grid_setup(myid,id1d,id2d)
!-----------------------------------------------------------
! Set up all MPI processor groups and intracommunicators
!-----------------------------------------------------------
      iflag = 0
      call mpi_group_setup(group,group_rank, myid,id1d,id2d,comm, iflag)
      call MPI_BARRIER(comm,ierr)
!-------------------------------------------------------------------
      namelist/inputs/ istart,dt,xlen,ylen,zlen,CS,grav,umax,
     >                 Retau,nstatavg
      itout = 0
!-Begin counting time for full simulation here
      timestart0 = MPI_WTIME()
!-Set the constant pi here
      pi2 = 8.*atan(1.0)
!********************************************
!--------------------------------------------
!-Read in data from input file on processor 0
!--------------------------------------------
!********************************************
      readin0: if (myid==0) then
        open(7,file='input.dat',status='old')
        read(7,inputs)
!-Output all parameters as a reminder
        write(*,*) '*-------------------------------*'
        write(10,inputs)
        write(*,*) '*-------------------------------*'
        print *,'xlen, ylen, zlen: ',xlen,ylen,zlen
        print *,'umax, Re_tau    : ',umax,Retau
        print *,' dt :',dt
!- Compute input parameters
        hc = (ny-1)
        xnu = 0.001
        tau_i = 3.*xnu + 0.5
        u_star = xnu*Retau/hc
        umax = u_star*(2.5*log(Retau)+5.5)
        Re = hc*umax/xnu
        nusq = xnu*xnu
        G = 4.*(u_star**2)/hc
        csq = CS*CS
        nsteps = 50000
      endif readin0
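!- Only rank 0 reads input.dat, so every parameter derived above must
!  be broadcast before it is used: the friction velocity follows from
!  u_star = xnu*Retau/hc, the bulk maximum velocity from the log-law
!  estimate umax = u_star*(2.5*log(Retau) + 5.5), and the driving body
!  force from G = 4*u_star**2/hc.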
!--------------------------------------------------------
!-Now BROADCAST all necessary constants to all processors
!--------------------------------------------------------
!-Inputs
      call MPI_BCAST(istart,1,MPI_INTEGER,0,comm,ierr)
      call MPI_BCAST(dt,1,MPI_REAL8,0,comm,ierr)
      call MPI_BCAST(xlen,1,MPI_REAL8,0,comm,ierr)
      call MPI_BCAST(ylen,1,MPI_REAL8,0,comm,ierr)
      call MPI_BCAST(zlen,1,MPI_REAL8,0,comm,ierr)
      call MPI_BCAST(grav,1,MPI_REAL8,0,comm,ierr)
      call MPI_BCAST(Retau,1,MPI_REAL8,0,comm,ierr)
      call MPI_BCAST(nstatavg,1,MPI_REAL8,0,comm,ierr)
!- Computed parameters
      call MPI_BCAST(hc,1,MPI_REAL8,0,comm,ierr)
      call MPI_BCAST(umax,1,MPI_REAL8,0,comm,ierr)
      call MPI_BCAST(Re,1,MPI_REAL8,0,comm,ierr)
      call MPI_BCAST(xnu,1,MPI_REAL8,0,comm,ierr)
      call MPI_BCAST(tau_i,1,MPI_REAL8,0,comm,ierr)
      call MPI_BCAST(G,1,MPI_REAL8,0,comm,ierr)
      call MPI_BCAST(CS,1,MPI_REAL8,0,comm,ierr)
      call MPI_BCAST(csq,1,MPI_REAL8,0,comm,ierr)
      call MPI_BCAST(nsteps,1,MPI_INTEGER,0,comm,ierr)

      do k=1,nz
        do j=1,ny
          do i=1,nxpl
            tau(k,j,i) = tau_i
          enddo !i
        enddo !j
      enddo !k

      dxp = xlen/(nx-1)
      dyp = ylen/(ny-1)
      dzp = zlen/(nz-1)
      dtp = 1.*dxp*dxp
      if(myid==0) write(*,*) ' dxp, dyp, dzp :',dxp,dyp,dzp

!*******************************************************
!-Setup/Initialization
!---------------------
      ux = 0.
      uy = 0.
      uz = 0.
      umean = 0.
      f = 0.0
      ftemp = 0.0

!- Set up solid nodes on boundaries
      is_solid_node = 0
      do k=1,nz
        do i=1,nxpl
          is_solid_node(k,1,i)  = 1
          is_solid_node(k,ny,i) = 1
        enddo !i
      enddo !k
      do j=1,ny
        do i=1,nxpl
          is_solid_node(1,j,i)  = 1
          is_solid_node(nz,j,i) = 1
        enddo
      enddo

!- Define lattice velocity vectors
      ex(0+1)=0
      ey(0+1)=0
      ez(0+1)=0
      ex(1+1)=0
      ey(1+1)=1
      ez(1+1)=0
      ex(2+1)=0
      ey(2+1)=-1
      ez(2+1)=0
      ex(3+1)=1
      ey(3+1)=0
      ez(3+1)=0
      ex(4+1)=-1
      ey(4+1)=0
      ez(4+1)=0
      ex(5+1)=0
      ey(5+1)=0
      ez(5+1)=1
      ex(6+1)=0
      ey(6+1)=0
      ez(6+1)=-1
      ex(7+1)=1
      ey(7+1)=1
      ez(7+1)=0
      ex(8+1)=-1
      ey(8+1)=1
      ez(8+1)=0
      ex(9+1)=1
      ey(9+1)=-1
      ez(9+1)=0
      ex(10+1)=-1
      ey(10+1)=-1
      ez(10+1)=0
      ex(11+1)=0
      ey(11+1)=1
      ez(11+1)=1
      ex(12+1)=0
      ey(12+1)=1
      ez(12+1)=-1
      ex(13+1)=0
      ey(13+1)=-1
      ez(13+1)=1
      ex(14+1)=0
      ey(14+1)=-1
      ez(14+1)=-1
      ex(15+1)=1
      ey(15+1)=0
      ez(15+1)=1
      ex(16+1)=1
      ey(16+1)=0
      ez(16+1)=-1
      ex(17+1)=-1
      ey(17+1)=0
      ez(17+1)=1
      ex(18+1)=-1
      ey(18+1)=0
      ez(18+1)=-1
!**********************
      f1 = 3.
      f2 = 9./2.
      f3 = 3./2.
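!- D3Q19 lattice: one rest velocity (weight 1/3), six face-neighbor
!  velocities (weight 1/18), and twelve edge-neighbor velocities
!  (weight 1/36); the weights appear below as fac1, fac2, and fac3
!  when the distribution functions are initialized.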
!**********************
!-STARTING FROM SCRATCH
!**********************
      if(istart.eq.0) then
        timestart = MPI_WTIME()
!- Initial density and distribution functions
        fac1 = (1./3.)
        fac2 = (1./18.)
        fac3 = (1./36.)
        do k=1,nz
          do j=1,ny
            do i=1,nxpl
              rho(k,j,i) = 1.
              f(k,j,i,1)  = fac1*rho(k,j,i)
              f(k,j,i,2)  = fac2*rho(k,j,i)
              f(k,j,i,3)  = fac2*rho(k,j,i)
              f(k,j,i,4)  = fac2*rho(k,j,i)
              f(k,j,i,5)  = fac2*rho(k,j,i)
              f(k,j,i,6)  = fac2*rho(k,j,i)
              f(k,j,i,7)  = fac2*rho(k,j,i)
              f(k,j,i,8)  = fac3*rho(k,j,i)
              f(k,j,i,9)  = fac3*rho(k,j,i)
              f(k,j,i,10) = fac3*rho(k,j,i)
              f(k,j,i,11) = fac3*rho(k,j,i)
              f(k,j,i,12) = fac3*rho(k,j,i)
              f(k,j,i,13) = fac3*rho(k,j,i)
              f(k,j,i,14) = fac3*rho(k,j,i)
              f(k,j,i,15) = fac3*rho(k,j,i)
              f(k,j,i,16) = fac3*rho(k,j,i)
              f(k,j,i,17) = fac3*rho(k,j,i)
              f(k,j,i,18) = fac3*rho(k,j,i)
              f(k,j,i,19) = fac3*rho(k,j,i)
            enddo !i
          enddo !j
        enddo !k
        call initP(myid,numprocs,comm,ux,uy,uz,id1d,id2d)
        do k=1,nz
          do j=1,ny
            do i=1,nxpl
              ux(k,j,i) = ux(k,j,i)*u_star
              uy(k,j,i) = uy(k,j,i)*u_star
              uz(k,j,i) = uz(k,j,i)*u_star
            enddo !i
          enddo !j
        enddo !k
        call Compute_feq(ux,uy,uz,rho,is_solid_node, ex,ey,ez,
     >                   f1,f2,f3,f,myid,comm,numprocs)
        do k=1,nz
          do j=1,ny
            do i=1,nxpl
              ux(k,j,i) = 0.0
              uy(k,j,i) = 0.0
              uz(k,j,i) = 0.0
              rho(k,j,i) = 0.
              if(is_solid_node(k,j,i)==0) then
                do l=1,19
                  rho(k,j,i) = rho(k,j,i) + f(k,j,i,l)
                  ux(k,j,i) = ux(k,j,i) + ex(l)*f(k,j,i,l)
                  uy(k,j,i) = uy(k,j,i) + ey(l)*f(k,j,i,l)
                  uz(k,j,i) = uz(k,j,i) + ez(l)*f(k,j,i,l)
                enddo !l
                ux(k,j,i) = (ux(k,j,i)+0.5*G)/rho(k,j,i)
                uy(k,j,i) = uy(k,j,i)/rho(k,j,i)
                uz(k,j,i) = uz(k,j,i)/rho(k,j,i)
              endif ! is_solid_node
            enddo !i
          enddo !j
        enddo !k
        timeend = MPI_WTIME()
        if (myid == 0) write(*,*) 'Time for INITIALIZATION',
     >                            timeend-timestart
        t = 0.
      else
!************
!-RESTART RUN
!************
        if (myid == 0) write(*,*) 'Reading in RESTART file'
        fin  = 'start.dat'
        fin1 = 'Ffile.dat'
        fin  = output_dir//fin
        fin1 = output_dir//fin1
        call restart(t,fin,fin1,myid,id1d,id2d,comm,ux,uy,uz,f,feq)
        call Convertunit(ux,uy,uz,1,myid,numprocs,comm)
!- Computing macroscopic density
        do k=1,nz
          do j=1,ny
            do i=1,nxpl
              ux(k,j,i) = 0.
              uy(k,j,i) = 0.
              uz(k,j,i) = 0.
              rho(k,j,i) = 0.
              if(is_solid_node(k,j,i)==0) then
                do l=1,19
                  rho(k,j,i) = rho(k,j,i) + f(k,j,i,l)
                  ux(k,j,i) = ux(k,j,i) + ex(l)*f(k,j,i,l)
                  uy(k,j,i) = uy(k,j,i) + ey(l)*f(k,j,i,l)
                  uz(k,j,i) = uz(k,j,i) + ez(l)*f(k,j,i,l)
                enddo !l
                ux(k,j,i) = (ux(k,j,i)+0.5*G)/rho(k,j,i)
                uy(k,j,i) = uy(k,j,i)/rho(k,j,i)
                uz(k,j,i) = uz(k,j,i)/rho(k,j,i)
              endif ! is_solid_node
            enddo !i
          enddo !j
        enddo !k
        write(*,*) 'TIME :',t
!-Record starting time
        timestart = MPI_WTIME()
        timeend = MPI_WTIME()
        if (myid == 0) write(*,*) 'Time for SET-UP',timeend-timestart
      endif

!***********************
!-----------------------
!-BEGIN MAIN LOOP HERE-|
!-----------------------
!***********************
      rcount = 1
      do 1000 istep = 1,nsteps
        call Compute_feq(ux,uy,uz,rho,is_solid_node, ex,ey,ez,
     >                   f1,f2,f3,feq,myid,comm,numprocs)
!- Collision step
        do k=1,nz
          do j=1,ny
            do i=1,nxpl
!- Standard bounce back
              if(is_solid_node(k,j,i)==1) then
                temp = f(k,j,i,1+1)
                f(k,j,i,1+1) = f(k,j,i,2+1)
                f(k,j,i,2+1) = temp
                temp = f(k,j,i,5+1)
                f(k,j,i,5+1) = f(k,j,i,6+1)
                f(k,j,i,6+1) = temp
                temp = f(k,j,i,7+1)
                f(k,j,i,7+1) = f(k,j,i,10+1)
                f(k,j,i,10+1) = temp
                temp = f(k,j,i,8+1)
                f(k,j,i,8+1) = f(k,j,i,9+1)
                f(k,j,i,9+1) = temp
                temp = f(k,j,i,11+1)
                f(k,j,i,11+1) = f(k,j,i,14+1)
                f(k,j,i,14+1) = temp
                temp = f(k,j,i,12+1)
                f(k,j,i,12+1) = f(k,j,i,13+1)
                f(k,j,i,13+1) = temp
                temp = f(k,j,i,15+1)
                f(k,j,i,15+1) = f(k,j,i,18+1)
                f(k,j,i,18+1) = temp
                temp = f(k,j,i,16+1)
                f(k,j,i,16+1) = f(k,j,i,17+1)
                f(k,j,i,17+1) = temp
              else
!- Regular collision
                do l=1,19
                  fdiff(k,j,i,l) = f(k,j,i,l) - feq(k,j,i,l)
                  f(k,j,i,l) = f(k,j,i,l)
     >                       - ( (f(k,j,i,l)-feq(k,j,i,l)) / tau(k,j,i) )
                enddo !l
!- Forcing term
                do l=1,19
                  f(k,j,i,l) = f(k,j,i,l)
     >                       + (1.5)*((2*tau(k,j,i)-1)/tau(k,j,i))*G
     >                       * (ex(l)-ux(k,j,i))*feq(k,j,i,l)
                enddo !l
              endif ! bounce back or regular collision
            enddo !i
          enddo !j
        enddo !k
!- Streaming step
        if(myid==0) write(*,*) '--- Streaming step ---'
        call streaming_p(f,ftemp,myid,comm,numprocs)
        do k=1,nz
          do j=1,ny
            do i=1,nxpl
              do l=1,19
                f(k,j,i,l) = ftemp(k,j,i,l)
              enddo !l
            enddo !i
          enddo !j
        enddo !k
        if(ifSGS==1) then
          if(myid==0) write(*,*) ' === SGS model is on === '
          if(myid==0) write(*,*) ' -- CS, tau_i, tau(5,5,5):',
     >                           CS,tau_i,tau(5,5,5)
          call SGSmodel(tau,ex,ey,ez,f,feq,myid,numprocs,comm,
     >                  tau_i,CS,xnu)
          if(myid==0) write(*,*) ' Updated -- CS, tau_i, tau(5,5,5):',
     >                           CS,tau_i,tau(5,5,5)
        endif !if SGS
!- Computing macroscopic density
        do k=1,nz
          do j=1,ny
            do i=1,nxpl
              ux(k,j,i) = 0.
              uy(k,j,i) = 0.
              uz(k,j,i) = 0.
              rho(k,j,i) = 0.
              if(is_solid_node(k,j,i)==0) then
                do l=1,19
                  rho(k,j,i) = rho(k,j,i) + f(k,j,i,l)
                  ux(k,j,i) = ux(k,j,i) + ex(l)*f(k,j,i,l)
                  uy(k,j,i) = uy(k,j,i) + ey(l)*f(k,j,i,l)
                  uz(k,j,i) = uz(k,j,i) + ez(l)*f(k,j,i,l)
                enddo !l
                ux(k,j,i) = (ux(k,j,i)+0.5*G)/rho(k,j,i)
                uy(k,j,i) = uy(k,j,i)/rho(k,j,i)
                uz(k,j,i) = uz(k,j,i)/rho(k,j,i)
              endif ! is_solid_node
            enddo !i
          enddo !j
        enddo !k
        t = t + dt
        if(mod(istep,100)==0) then
          do j=1,ny
            do k=1,nz
              temp = 0.0
              do i=1,nxpl
                temp = temp + ux(k,j,i)
              enddo !i
              umean(k,j) = temp/float(nxpl)
            enddo !k
          enddo !j
          do k=1,nz
            do j=1,ny
              temp = 0.
              call MPI_ALLREDUCE(umean(k,j),temp,1,MPI_REAL8,
     >                           MPI_SUM,comm,ierr)
              umean(k,j) = temp/float(numprocs)
            enddo !j
          enddo !k
        endif
        if(mod(istep,nstatavg)==0 .and. ifstatout==1) then
          fins = 'SVdat_bin'
          finsf = output_dir//fins
          call sendoutdatafile_loop2(finsf,myid,id1d,id2d,comm,
     >                               ux,uy,uz,'0',istep,nsteps,nstatavg)
        endif
 1000 continue
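!- Each pass through the main loop performs: equilibrium computation,
!  collision with the body-force source term
!  1.5*((2*tau-1)/tau)*G*(ex(l)-ux)*feq, MPI-aware streaming via
!  streaming_p, the optional SGS update of tau, and recomputation of
!  the macroscopic moments. Every 100 steps the mean streamwise profile
!  umean(k,j) is accumulated over the local planes and reduced across
!  ranks with MPI_ALLREDUCE.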
!- Final unit conversion
      call Convertunit(ux,uy,uz,-1,myid,numprocs,comm)
!-----------------------
!-MAIN LOOP ENDS HERE |
!-----------------------
!---------------------------------------------------------------
!-Disassemble all MPI processor subgroups and intracommunicators
!---------------------------------------------------------------
      iflag = 1
      call mpi_group_setup(group,group_rank, myid,id1d,id2d,comm, iflag)
 999  if (myid == 0) then
        write(*,*) 'PROGRAM ENDS !'
      end if
      timeend0 = MPI_WTIME()
      if (myid == 0) write(*,*) 'Time required for full simulation',
     >                          timeend0-timestart0
!----------------------
!-Deallocate 3-D arrays
!----------------------
      deallocate (ux,uy,uz,rho,tau,f,feq,ftemp,fdiff,ex,ey,ez,
     >            is_solid_node,umean,uxp,uyp,uzp,urmsm,
     >            stat=AllocateStatus)
      if (AllocateStatus /= 0) then
        stop "**Error Deallocating - 3D Arrays in MAIN **"
      end if
!-End MPI
      call MPI_FINALIZE(ierr)
 15   format(a2,e10.4)
 25   format(a8,e10.4)
      end program LBM3D

B.3 3D square duct flow (CUDA)

#ifndef LBM
#define LBM
#include <stdio.h>
#include <iostream>
#include <cuda.h>
#include <cuda_runtime.h>
#include <cutil_inline.h>
#include <cutil_math.h>
#include <curand_kernel.h>
#include <cstdlib>
#include <ctime>
#include <fstream>
#include <sstream>
#include <string>
#include "lbm_part.cuh"
#include "lbmcuh_part.cuh"
using namespace std;

////////////////////////////////////////////////////////////////////////////////
int main(){
  // Get available memory on the graphics card before allocation
  size_t freeMemory_before;
  size_t totalMemory_before;
  cudaMemGetInfo(&freeMemory_before, &totalMemory_before);

  // Initialise memory for the LBM model
  initFields();
  uint numCells = m_dimensions.x * m_dimensions.y * m_dimensions.z;

  initPos(Part_p_host, randnum);
  outparti(Part_p_host);
  cutilSafeCall(cudaMemcpy(Part_p,Part_p_host,sizeof(float3)*nP,
                           cudaMemcpyHostToDevice));
  cutilSafeCall(cudaThreadSynchronize());
  cutilCheckMsg("Synchronisation failed");

  // Apply initial velocity profile
  readdata(m_dU_host);
  cutilSafeCall(cudaMemcpy(m_dU,m_dU_host,sizeof(float4)*numCells,
                           cudaMemcpyHostToDevice));
  cutilSafeCall(cudaThreadSynchronize());
  cutilCheckMsg("Synchronisation failed");

  dim3 blocks = make_uint3(m_dimensions.y, m_dimensions.z, 1);
  uint threads = m_dimensions.x;
  initF<<<blocks, threads>>>(m_dF,m_dU,rhonull);
  intiPartvel<<<nP,1>>>(Part_dui, Part_du, Part_p, m_dU);

  // Time loop
  for (int i=0; i<ttr; ++i){
    dim3 blocks = make_uint3(m_dimensions.y, m_dimensions.z, 1);
    uint threads = m_dimensions.x;
    // Collide and stream step
    collideStream<<<blocks, threads>>>(m_dF,feq,m_dU,g,solid,tau);
    cutilCheckMsg("collideStreamD kernel execution failed");
    cutilSafeCall(cudaThreadSynchronize());
    cutilCheckMsg("Synchronisation failed");
    apply_BCs_stream<<<blocks, threads>>>(m_dF,f_temp);
    cutilCheckMsg("apply_BCs_stream kernel execution failed");
    cutilSafeCall(cudaThreadSynchronize());
    cutilCheckMsg("Synchronisation failed");
    update_f<<<blocks, threads>>>(m_dF, f_temp);
    cutilSafeCall(cudaThreadSynchronize());
    cutilCheckMsg("Synchronisation failed");
    SGSmodel<<<blocks, threads>>>(m_dF, feq, csq,nu,tau,solid);
    cutilSafeCall(cudaThreadSynchronize());
    cutilCheckMsg("Synchronisation failed");
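    // One flow step is four kernel launches over a (ny, nz) grid of
    // blocks with nx threads per block: collideStream fuses the moment,
    // equilibrium, and collision computations; apply_BCs_stream
    // propagates the post-collision populations into f_temp; update_f
    // copies them back; and SGSmodel refreshes the local relaxation
    // time. The explicit synchronizations exist for error checking, as
    // launches on the default stream are already serialized.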
    // Particle phase
    getdisrand(addrand_host, disrand);
    cutilSafeCall(cudaMemcpy(addrand,addrand_host,sizeof(float3)*nP,
                             cudaMemcpyHostToDevice));
    cutilSafeCall(cudaThreadSynchronize());
    cutilCheckMsg("Synchronisation failed");
    part_tracking<<<nP,1>>>(Part_p, Part_dui, Part_du, m_dU, eta,
                            addrand, nde);
    cutilSafeCall(cudaThreadSynchronize());
    cutilCheckMsg("Synchronisation failed");
    cutilSafeCall(cudaMemcpy(nde_host,nde,sizeof(int)*nP,
                             cudaMemcpyDeviceToHost));
    calnde(nde_host, calNde);
    icurrent = icurrent + 1;
  } // Particle time loop end

  cutilSafeCall(cudaMemcpy(m_dU_host,m_dU,sizeof(float4)*numCells,
                           cudaMemcpyDeviceToHost));
  cutilSafeCall(cudaMemcpy(Part_p_host,Part_p,sizeof(float3)*nP,
                           cudaMemcpyDeviceToHost));
} // main

////////////////////////////////////////////////////////////////////////////////
void initFields(void){
  cutilSafeCall(cudaMalloc(&Part_p,   sizeof(float3)*nP));
  cutilSafeCall(cudaMalloc(&Part_dui, sizeof(float3)*nP));
  cutilSafeCall(cudaMalloc(&Part_du,  sizeof(float3)*nP));
  cutilSafeCall(cudaMalloc(&addrand,  sizeof(float3)*nP));
  cutilSafeCall(cudaMalloc(&nde,      sizeof(int)*nP));
  Part_p_host  = (float3*)malloc(sizeof(float3)*nP);
  randnum      = (float*)malloc(sizeof(float)*nP*3);
  addrand_host = (float3*)malloc(sizeof(float3)*nP);
  disrand      = (float*)malloc(sizeof(float)*nP*3);
  nde_host     = (int*)malloc(sizeof(int)*nP);
  calNde       = (int*)malloc(sizeof(int)*240);
  for (int i=0; i<240; ++i){ calNde[i] = 0; }

  // Calculate parameters
  m_dimensions.x = m_dx;
  m_dimensions.y = m_dy;
  m_dimensions.z = m_dz;
  numnodes = m_dimensions.x*m_dimensions.y*m_dimensions.z;
  Params *params_temp = (Params*)malloc(sizeof(Params));
  m_dU_host = (float4*)malloc(sizeof(float4)*numnodes);
  params_temp->dx = m_dimensions.x;
  params_temp->dy = m_dimensions.y;
  params_temp->dz = m_dimensions.z;
  params_temp->tau = m_tau;
  cutilSafeCall(cudaMalloc(&m_dU,   sizeof(float4)*numnodes));
  cutilSafeCall(cudaMalloc(&tau,    sizeof(float)*numnodes));
  cutilSafeCall(cudaMalloc(&m_dF,   sizeof(float)*19*numnodes));
  cutilSafeCall(cudaMalloc(&feq,    sizeof(float)*19*numnodes));
  cutilSafeCall(cudaMalloc(&f_temp, sizeof(float)*19*numnodes));
  cutilSafeCall(cudaMalloc(&solid,  sizeof(int)*numnodes));
  // Set initial value of solid to 0
  cutilSafeCall(cudaMemset(solid,0,sizeof(int)*numnodes));
  cutilSafeCall(cudaMemcpyToSymbol(params,params_temp,sizeof(Params)));
  free(params_temp);
  printf("\nFields and Params copied to device\n");

  // Fill fields with initial data
  dim3 blocks = make_uint3(m_dimensions.y, m_dimensions.z, 1);
  uint threads = m_dimensions.x;
  initGridD<<<blocks, threads>>>(m_dF,feq,f_temp,tau,m_tau,rhonull);
  cutilCheckMsg("initGridD kernel execution failed");
  initGridU<<<blocks, threads>>>(m_dU, rhonull);
  cutilCheckMsg("initGridU kernel execution failed");
  initSolid<<<blocks, threads>>>(solid);
  cutilCheckMsg("initSolid kernel execution failed");
  printf("initGridD done with fields\n");
} // initFields

////////////////////////////////////////////////////////////////////////////////
uint grid2indexhost(uint x, uint y, uint z, uint i=0){
  return x+((y+z*m_dy)*m_dx)+(m_dx*m_dy*m_dz*i);
}

////////////////////////////////////////////////////////////////////////////////
__device__ __host__ uint grid2index(uint x, uint y, uint z, uint i=0){
  return x+((y+z*params.dy)*params.dx)+(params.dx*params.dy*params.dz*i);
}
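// Memory layout: the 19 distribution functions are stored as a
// structure of arrays, f[cell + numnodes*i], with the cell index
// linearized as x + (y + z*dy)*dx. Threads within a block differ only
// in x, so neighboring threads touch consecutive addresses and global
// memory accesses coalesce.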
////////////////////////////////////////////////////////////////////////////////
__global__ void initF(float *f, float4 *m_dU, float rho){
  uint x = threadIdx.x, y = blockIdx.x, z = blockIdx.y;
  float3 vel;
  vel.x = m_dU[grid2index(x,y,z)].x;
  vel.y = m_dU[grid2index(x,y,z)].y;
  vel.z = m_dU[grid2index(x,y,z)].z;
  float uxsq = vel.x*vel.x, uysq = vel.y*vel.y, uzsq = vel.z*vel.z,
        uxuy1 =  vel.x + vel.y,
        uxuy2 = -vel.x + vel.y,
        uxuy3 =  vel.x - vel.y,
        uxuy4 = -vel.x - vel.y,
        uyuz1 =  vel.y + vel.z,
        uyuz2 =  vel.y - vel.z,
        uyuz3 = -vel.y + vel.z,
        uyuz4 = -vel.y - vel.z,
        uxuz1 =  vel.x + vel.z,
        uxuz2 =  vel.x - vel.z,
        uxuz3 = -vel.x + vel.z,
        uxuz4 = -vel.x - vel.z;
  float usq = uxsq + uysq + uzsq;
  float f0eq, f1eq, f2eq, f3eq, f4eq, f5eq, f6eq, f7eq, f8eq, f9eq,
        f10eq, f11eq, f12eq, f13eq, f14eq, f15eq, f16eq, f17eq, f18eq;
  float rt0 = 1.f/3.f * rho, rt1 = 1.f/18.f * rho, rt2 = 1.f/36.f * rho;
  f0eq  = rt0*(1.f - 1.5f*usq);
  f1eq  = rt1*(1.f + 3.f*vel.x + 4.5f*uxsq - 1.5f*usq);
  f2eq  = rt1*(1.f - 3.f*vel.x + 4.5f*uxsq - 1.5f*usq);
  f3eq  = rt1*(1.f + 3.f*vel.y + 4.5f*uysq - 1.5f*usq);
  f4eq  = rt1*(1.f - 3.f*vel.y + 4.5f*uysq - 1.5f*usq);
  f5eq  = rt1*(1.f + 3.f*vel.z + 4.5f*uzsq - 1.5f*usq);
  f6eq  = rt1*(1.f - 3.f*vel.z + 4.5f*uzsq - 1.5f*usq);
  f7eq  = rt2*(1.f + 3.f*uxuy1 + 4.5f*uxuy1*uxuy1 - 1.5f*usq);
  f8eq  = rt2*(1.f + 3.f*uxuy4 + 4.5f*uxuy4*uxuy4 - 1.5f*usq);
  f9eq  = rt2*(1.f + 3.f*uxuy3 + 4.5f*uxuy3*uxuy3 - 1.5f*usq);
  f10eq = rt2*(1.f + 3.f*uxuy2 + 4.5f*uxuy2*uxuy2 - 1.5f*usq);
  f11eq = rt2*(1.f + 3.f*uxuz1 + 4.5f*uxuz1*uxuz1 - 1.5f*usq);
  f12eq = rt2*(1.f + 3.f*uxuz4 + 4.5f*uxuz4*uxuz4 - 1.5f*usq);
  f13eq = rt2*(1.f + 3.f*uxuz2 + 4.5f*uxuz2*uxuz2 - 1.5f*usq);
  f14eq = rt2*(1.f + 3.f*uxuz3 + 4.5f*uxuz3*uxuz3 - 1.5f*usq);
  f15eq = rt2*(1.f + 3.f*uyuz1 + 4.5f*uyuz1*uyuz1 - 1.5f*usq);
  f16eq = rt2*(1.f + 3.f*uyuz4 + 4.5f*uyuz4*uyuz4 - 1.5f*usq);
  f17eq = rt2*(1.f + 3.f*uyuz2 + 4.5f*uyuz2*uyuz2 - 1.5f*usq);
  f18eq = rt2*(1.f + 3.f*uyuz3 + 4.5f*uyuz3*uyuz3 - 1.5f*usq);
  f[grid2index(x,y,z,0)]  = f0eq;
  f[grid2index(x,y,z,1)]  = f1eq;
  f[grid2index(x,y,z,2)]  = f2eq;
  f[grid2index(x,y,z,3)]  = f3eq;
  f[grid2index(x,y,z,4)]  = f4eq;
  f[grid2index(x,y,z,5)]  = f5eq;
  f[grid2index(x,y,z,6)]  = f6eq;
  f[grid2index(x,y,z,7)]  = f7eq;
  f[grid2index(x,y,z,8)]  = f8eq;
  f[grid2index(x,y,z,9)]  = f9eq;
  f[grid2index(x,y,z,10)] = f10eq;
  f[grid2index(x,y,z,11)] = f11eq;
  f[grid2index(x,y,z,12)] = f12eq;
  f[grid2index(x,y,z,13)] = f13eq;
  f[grid2index(x,y,z,14)] = f14eq;
  f[grid2index(x,y,z,15)] = f15eq;
  f[grid2index(x,y,z,16)] = f16eq;
  f[grid2index(x,y,z,17)] = f17eq;
  f[grid2index(x,y,z,18)] = f18eq;
}
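// initF evaluates the same D3Q19 equilibrium distribution as the
// Fortran code (Eq. 2.9),
//   f_i^eq = w_i*rho*(1 + 3(e_i.u) + 4.5(e_i.u)^2 - 1.5(u.u)),
// written out term by term for the 19 directions.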
////////////////////////////////////////////////////////////////////////////////
__global__ void initGridD(float *destF, float *feq, float *f_temp,
                          float *tau, float m_tau, float rho){
  uint x = threadIdx.x, y = blockIdx.x, z = blockIdx.y;
  tau[grid2index(x,y,z)] = m_tau;
  destF[grid2index(x,y,z,0)]  = (1.f/3.f)*rho;
  destF[grid2index(x,y,z,1)]  = (1.f/18.f)*rho;
  destF[grid2index(x,y,z,2)]  = (1.f/18.f)*rho;
  destF[grid2index(x,y,z,3)]  = (1.f/18.f)*rho;
  destF[grid2index(x,y,z,4)]  = (1.f/18.f)*rho;
  destF[grid2index(x,y,z,5)]  = (1.f/18.f)*rho;
  destF[grid2index(x,y,z,6)]  = (1.f/18.f)*rho;
  destF[grid2index(x,y,z,7)]  = (1.f/36.f)*rho;
  destF[grid2index(x,y,z,8)]  = (1.f/36.f)*rho;
  destF[grid2index(x,y,z,9)]  = (1.f/36.f)*rho;
  destF[grid2index(x,y,z,10)] = (1.f/36.f)*rho;
  destF[grid2index(x,y,z,11)] = (1.f/36.f)*rho;
  destF[grid2index(x,y,z,12)] = (1.f/36.f)*rho;
  destF[grid2index(x,y,z,13)] = (1.f/36.f)*rho;
  destF[grid2index(x,y,z,14)] = (1.f/36.f)*rho;
  destF[grid2index(x,y,z,15)] = (1.f/36.f)*rho;
  destF[grid2index(x,y,z,16)] = (1.f/36.f)*rho;
  destF[grid2index(x,y,z,17)] = (1.f/36.f)*rho;
  destF[grid2index(x,y,z,18)] = (1.f/36.f)*rho;
  for(int i=0; i<19; i++){
    f_temp[grid2index(x,y,z,i)] = destF[grid2index(x,y,z,i)];
    feq[grid2index(x,y,z,i)] = 0; //destF[grid2index(x,y,z,i)];
  }
}

////////////////////////////////////////////////////////////////////////////////
__global__ void initGridU(float4 *m_dU, float rho){
  uint x = threadIdx.x, y = blockIdx.x, z = blockIdx.y;
  // Use if starting from zero velocity
  m_dU[grid2index(x,y,z)].x = 0.0f;
  m_dU[grid2index(x,y,z)].y = 0.0f;
  m_dU[grid2index(x,y,z)].z = 0.0f;
  m_dU[grid2index(x,y,z)].w = rho;
}

////////////////////////////////////////////////////////////////////////////////
__global__ void initSolid(int *solid){
  uint x = threadIdx.x, y = blockIdx.x, z = blockIdx.y;
  if(z==0 || z==params.dz-1 || y==0 || y==params.dy-1){
    solid[grid2index(x,y,z)] = 1;
  } // if
} // initSolid

////////////////////////////////////////////////////////////////////////////////
__global__ void collideStream(float *f, float *feq, float4 *m_dU,
                              float g, int *solid, float *tau){
  uint x=threadIdx.x, y=blockIdx.x, z=blockIdx.y;
  float f0  = f[grid2index(x,y,z,0)],
        f1  = f[grid2index(x,y,z,1)],
        f2  = f[grid2index(x,y,z,2)],
        f3  = f[grid2index(x,y,z,3)],
        f4  = f[grid2index(x,y,z,4)],
        f5  = f[grid2index(x,y,z,5)],
        f6  = f[grid2index(x,y,z,6)],
        f7  = f[grid2index(x,y,z,7)],
        f8  = f[grid2index(x,y,z,8)],
        f9  = f[grid2index(x,y,z,9)],
        f10 = f[grid2index(x,y,z,10)],
        f11 = f[grid2index(x,y,z,11)],
        f12 = f[grid2index(x,y,z,12)],
        f13 = f[grid2index(x,y,z,13)],
        f14 = f[grid2index(x,y,z,14)],
        f15 = f[grid2index(x,y,z,15)],
        f16 = f[grid2index(x,y,z,16)],
        f17 = f[grid2index(x,y,z,17)],
        f18 = f[grid2index(x,y,z,18)];
  float tauc = tau[grid2index(x,y,z)];
  float rho = f0+f1+f2+f3+f4+f5+f6+f7+f8+f9+f10+f11+f12+f13+f14+f15+f16+f17+f18;
  float3 vel;
  if (solid[grid2index(x,y,z)] == 0) {
    vel.x = ((f1-f2+f7-f8+f9-f10+f11-f12+f13-f14)+0.5f*g)/rho;
    vel.y = (f3-f4+f7-f8-f9+f10+f15-f16+f17-f18)/rho;
    vel.z = (f5-f6+f11-f12-f13+f14+f15-f16-f17+f18)/rho;
  }
  else {
    vel.x = 0;
    vel.y = 0;
    vel.z = 0;
    rho = 0;
  }
  m_dU[grid2index(x,y,z)] = make_float4(vel.x,vel.y,vel.z,rho);
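  // The constant streamwise body force g contributes 0.5*g to the
  // momentum before the velocity is formed, matching the half-force
  // shift used in the Fortran version; solid nodes are assigned zero
  // velocity and density here and receive full bounce-back below.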
  // Calculate the equilibrium DF (feq)
  float uxsq = vel.x*vel.x, uysq = vel.y*vel.y, uzsq = vel.z*vel.z,
        uxuy1 =  vel.x + vel.y,
        uxuy2 = -vel.x + vel.y,
        uxuy3 =  vel.x - vel.y,
        uxuy4 = -vel.x - vel.y,
        uyuz1 =  vel.y + vel.z,
        uyuz2 =  vel.y - vel.z,
        uyuz3 = -vel.y + vel.z,
        uyuz4 = -vel.y - vel.z,
        uxuz1 =  vel.x + vel.z,
        uxuz2 =  vel.x - vel.z,
        uxuz3 = -vel.x + vel.z,
        uxuz4 = -vel.x - vel.z;
  float usq = uxsq + uysq + uzsq;
  float f0eq, f1eq, f2eq, f3eq, f4eq, f5eq, f6eq, f7eq, f8eq, f9eq,
        f10eq, f11eq, f12eq, f13eq, f14eq, f15eq, f16eq, f17eq, f18eq;
  float f0old, f1old, f2old, f3old, f4old, f5old, f6old, f7old, f8old,
        f9old, f10old, f11old, f12old, f13old, f14old, f15old, f16old,
        f17old, f18old;
  float rt0 = 1.f/3.f * rho, rt1 = 1.f/18.f * rho, rt2 = 1.f/36.f * rho;
  if (solid[grid2index(x,y,z)] == 0) {
    f0eq  = rt0*(1.f - 1.5f*usq);
    f1eq  = rt1*(1.f + 3.f*vel.x + 4.5f*uxsq - 1.5f*usq);
    f2eq  = rt1*(1.f - 3.f*vel.x + 4.5f*uxsq - 1.5f*usq);
    f3eq  = rt1*(1.f + 3.f*vel.y + 4.5f*uysq - 1.5f*usq);
    f4eq  = rt1*(1.f - 3.f*vel.y + 4.5f*uysq - 1.5f*usq);
    f5eq  = rt1*(1.f + 3.f*vel.z + 4.5f*uzsq - 1.5f*usq);
    f6eq  = rt1*(1.f - 3.f*vel.z + 4.5f*uzsq - 1.5f*usq);
    f7eq  = rt2*(1.f + 3.f*uxuy1 + 4.5f*uxuy1*uxuy1 - 1.5f*usq);
    f8eq  = rt2*(1.f + 3.f*uxuy4 + 4.5f*uxuy4*uxuy4 - 1.5f*usq);
    f9eq  = rt2*(1.f + 3.f*uxuy3 + 4.5f*uxuy3*uxuy3 - 1.5f*usq);
    f10eq = rt2*(1.f + 3.f*uxuy2 + 4.5f*uxuy2*uxuy2 - 1.5f*usq);
    f11eq = rt2*(1.f + 3.f*uxuz1 + 4.5f*uxuz1*uxuz1 - 1.5f*usq);
    f12eq = rt2*(1.f + 3.f*uxuz4 + 4.5f*uxuz4*uxuz4 - 1.5f*usq);
    f13eq = rt2*(1.f + 3.f*uxuz2 + 4.5f*uxuz2*uxuz2 - 1.5f*usq);
    f14eq = rt2*(1.f + 3.f*uxuz3 + 4.5f*uxuz3*uxuz3 - 1.5f*usq);
    f15eq = rt2*(1.f + 3.f*uyuz1 + 4.5f*uyuz1*uyuz1 - 1.5f*usq);
    f16eq = rt2*(1.f + 3.f*uyuz4 + 4.5f*uyuz4*uyuz4 - 1.5f*usq);
    f17eq = rt2*(1.f + 3.f*uyuz2 + 4.5f*uyuz2*uyuz2 - 1.5f*usq);
    f18eq = rt2*(1.f + 3.f*uyuz3 + 4.5f*uyuz3*uyuz3 - 1.5f*usq);
    feq[grid2index(x,y,z,0)]  = f0eq;
    feq[grid2index(x,y,z,1)]  = f1eq;
    feq[grid2index(x,y,z,2)]  = f2eq;
    feq[grid2index(x,y,z,3)]  = f3eq;
    feq[grid2index(x,y,z,4)]  = f4eq;
    feq[grid2index(x,y,z,5)]  = f5eq;
    feq[grid2index(x,y,z,6)]  = f6eq;
    feq[grid2index(x,y,z,7)]  = f7eq;
    feq[grid2index(x,y,z,8)]  = f8eq;
    feq[grid2index(x,y,z,9)]  = f9eq;
    feq[grid2index(x,y,z,10)] = f10eq;
    feq[grid2index(x,y,z,11)] = f11eq;
    feq[grid2index(x,y,z,12)] = f12eq;
    feq[grid2index(x,y,z,13)] = f13eq;
    feq[grid2index(x,y,z,14)] = f14eq;
    feq[grid2index(x,y,z,15)] = f15eq;
    feq[grid2index(x,y,z,16)] = f16eq;
    feq[grid2index(x,y,z,17)] = f17eq;
    feq[grid2index(x,y,z,18)] = f18eq;
    // Do collisions (adding the forcing term g)
    f[grid2index(x,y,z,0)]  = (f0 - (f0-f0eq)/tauc)   + ((1.5f*(2.f*tauc-1.f)/tauc)*g*( 0.f-vel.x)*f0eq);
    f[grid2index(x,y,z,1)]  = (f1 - (f1-f1eq)/tauc)   + ((1.5f*(2.f*tauc-1.f)/tauc)*g*( 1.f-vel.x)*f1eq);
    f[grid2index(x,y,z,2)]  = (f2 - (f2-f2eq)/tauc)   + ((1.5f*(2.f*tauc-1.f)/tauc)*g*(-1.f-vel.x)*f2eq);
    f[grid2index(x,y,z,3)]  = (f3 - (f3-f3eq)/tauc)   + ((1.5f*(2.f*tauc-1.f)/tauc)*g*( 0.f-vel.x)*f3eq);
    f[grid2index(x,y,z,4)]  = (f4 - (f4-f4eq)/tauc)   + ((1.5f*(2.f*tauc-1.f)/tauc)*g*( 0.f-vel.x)*f4eq);
    f[grid2index(x,y,z,5)]  = (f5 - (f5-f5eq)/tauc)   + ((1.5f*(2.f*tauc-1.f)/tauc)*g*( 0.f-vel.x)*f5eq);
    f[grid2index(x,y,z,6)]  = (f6 - (f6-f6eq)/tauc)   + ((1.5f*(2.f*tauc-1.f)/tauc)*g*( 0.f-vel.x)*f6eq);
    f[grid2index(x,y,z,7)]  = (f7 - (f7-f7eq)/tauc)   + ((1.5f*(2.f*tauc-1.f)/tauc)*g*( 1.f-vel.x)*f7eq);
    f[grid2index(x,y,z,8)]  = (f8 - (f8-f8eq)/tauc)   + ((1.5f*(2.f*tauc-1.f)/tauc)*g*(-1.f-vel.x)*f8eq);
    f[grid2index(x,y,z,9)]  = (f9 - (f9-f9eq)/tauc)   + ((1.5f*(2.f*tauc-1.f)/tauc)*g*( 1.f-vel.x)*f9eq);
    f[grid2index(x,y,z,10)] = (f10 - (f10-f10eq)/tauc)+ ((1.5f*(2.f*tauc-1.f)/tauc)*g*(-1.f-vel.x)*f10eq);
    f[grid2index(x,y,z,11)] = (f11 - (f11-f11eq)/tauc)+ ((1.5f*(2.f*tauc-1.f)/tauc)*g*( 1.f-vel.x)*f11eq);
    f[grid2index(x,y,z,12)] = (f12 - (f12-f12eq)/tauc)+ ((1.5f*(2.f*tauc-1.f)/tauc)*g*(-1.f-vel.x)*f12eq);
    f[grid2index(x,y,z,13)] = (f13 - (f13-f13eq)/tauc)+ ((1.5f*(2.f*tauc-1.f)/tauc)*g*( 1.f-vel.x)*f13eq);
    f[grid2index(x,y,z,14)] = (f14 - (f14-f14eq)/tauc)+ ((1.5f*(2.f*tauc-1.f)/tauc)*g*(-1.f-vel.x)*f14eq);
    f[grid2index(x,y,z,15)] = (f15 - (f15-f15eq)/tauc)+ ((1.5f*(2.f*tauc-1.f)/tauc)*g*( 0.f-vel.x)*f15eq);
    f[grid2index(x,y,z,16)] = (f16 - (f16-f16eq)/tauc)+ ((1.5f*(2.f*tauc-1.f)/tauc)*g*( 0.f-vel.x)*f16eq);
    f[grid2index(x,y,z,17)] = (f17 - (f17-f17eq)/tauc)+ ((1.5f*(2.f*tauc-1.f)/tauc)*g*( 0.f-vel.x)*f17eq);
    f[grid2index(x,y,z,18)] = (f18 - (f18-f18eq)/tauc)+ ((1.5f*(2.f*tauc-1.f)/tauc)*g*( 0.f-vel.x)*f18eq);
  }
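  // Collision with forcing: every population relaxes toward its
  // equilibrium over the local time scale tauc and receives the source
  // term 1.5*((2*tauc-1)/tauc)*g*(e_x,i - u_x)*feq_i, the same discrete
  // forcing used in the Fortran duct code.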
  else {
    // Solid node: full bounce-back, swapping opposite directions
    f0old  = f[grid2index(x,y,z,0)];
    f1old  = f[grid2index(x,y,z,1)];
    f2old  = f[grid2index(x,y,z,2)];
    f3old  = f[grid2index(x,y,z,3)];
    f4old  = f[grid2index(x,y,z,4)];
    f5old  = f[grid2index(x,y,z,5)];
    f6old  = f[grid2index(x,y,z,6)];
    f7old  = f[grid2index(x,y,z,7)];
    f8old  = f[grid2index(x,y,z,8)];
    f9old  = f[grid2index(x,y,z,9)];
    f10old = f[grid2index(x,y,z,10)];
    f11old = f[grid2index(x,y,z,11)];
    f12old = f[grid2index(x,y,z,12)];
    f13old = f[grid2index(x,y,z,13)];
    f14old = f[grid2index(x,y,z,14)];
    f15old = f[grid2index(x,y,z,15)];
    f16old = f[grid2index(x,y,z,16)];
    f17old = f[grid2index(x,y,z,17)];
    f18old = f[grid2index(x,y,z,18)];
    f[grid2index(x,y,z,0)]  = f0old;
    f[grid2index(x,y,z,1)]  = f2old;
    f[grid2index(x,y,z,2)]  = f1old;
    f[grid2index(x,y,z,3)]  = f4old;
    f[grid2index(x,y,z,4)]  = f3old;
    f[grid2index(x,y,z,5)]  = f6old;
    f[grid2index(x,y,z,6)]  = f5old;
    f[grid2index(x,y,z,7)]  = f8old;
    f[grid2index(x,y,z,8)]  = f7old;
    f[grid2index(x,y,z,9)]  = f10old;
    f[grid2index(x,y,z,10)] = f9old;
    f[grid2index(x,y,z,11)] = f12old;
    f[grid2index(x,y,z,12)] = f11old;
    f[grid2index(x,y,z,13)] = f14old;
    f[grid2index(x,y,z,14)] = f13old;
    f[grid2index(x,y,z,15)] = f16old;
    f[grid2index(x,y,z,16)] = f15old;
    f[grid2index(x,y,z,17)] = f18old;
    f[grid2index(x,y,z,18)] = f17old;
  }
}

////////////////////////////////////////////////////////////////////////////////
__global__ void SGSmodel(float *f, float *feq, float csq, float nu,
                         float *tau, int *solid){
  uint x=threadIdx.x, y=blockIdx.x, z=blockIdx.y;
  if (solid[grid2index(x,y,z)]==0) {
    float f0  = f[grid2index(x,y,z,0)],
          f1  = f[grid2index(x,y,z,1)],
          f2  = f[grid2index(x,y,z,2)],
          f3  = f[grid2index(x,y,z,3)],
          f4  = f[grid2index(x,y,z,4)],
          f5  = f[grid2index(x,y,z,5)],
          f6  = f[grid2index(x,y,z,6)],
          f7  = f[grid2index(x,y,z,7)],
          f8  = f[grid2index(x,y,z,8)],
          f9  = f[grid2index(x,y,z,9)],
          f10 = f[grid2index(x,y,z,10)],
          f11 = f[grid2index(x,y,z,11)],
          f12 = f[grid2index(x,y,z,12)],
          f13 = f[grid2index(x,y,z,13)],
          f14 = f[grid2index(x,y,z,14)],
          f15 = f[grid2index(x,y,z,15)],
          f16 = f[grid2index(x,y,z,16)],
          f17 = f[grid2index(x,y,z,17)],
          f18 = f[grid2index(x,y,z,18)];
    float f0eq  = feq[grid2index(x,y,z,0)],
          f1eq  = feq[grid2index(x,y,z,1)],
          f2eq  = feq[grid2index(x,y,z,2)],
          f3eq  = feq[grid2index(x,y,z,3)],
          f4eq  = feq[grid2index(x,y,z,4)],
          f5eq  = feq[grid2index(x,y,z,5)],
          f6eq  = feq[grid2index(x,y,z,6)],
          f7eq  = feq[grid2index(x,y,z,7)],
          f8eq  = feq[grid2index(x,y,z,8)],
          f9eq  = feq[grid2index(x,y,z,9)],
          f10eq = feq[grid2index(x,y,z,10)],
          f11eq = feq[grid2index(x,y,z,11)],
          f12eq = feq[grid2index(x,y,z,12)],
          f13eq = feq[grid2index(x,y,z,13)],
          f14eq = feq[grid2index(x,y,z,14)],
          f15eq = feq[grid2index(x,y,z,15)],
          f16eq = feq[grid2index(x,y,z,16)],
          f17eq = feq[grid2index(x,y,z,17)],
          f18eq = feq[grid2index(x,y,z,18)];
    float fdiff1,fdiff2,fdiff3,fdiff4,fdiff5,fdiff6,fdiff7,fdiff8,fdiff9,
          fdiff10,fdiff11,fdiff12,fdiff13,fdiff14,fdiff15,fdiff16,
          fdiff17,fdiff18;
    float pixx, piyy, pizz, pixy, pixz, piyx, piyz, pizx, pizy,
          stsq, s1, s2, s3, s4;
    fdiff1  = (f1-f1eq);
    fdiff2  = f2-f2eq;
    fdiff3  = f3-f3eq;
    fdiff4  = f4-f4eq;
    fdiff5  = f5-f5eq;
    fdiff6  = f6-f6eq;
    fdiff7  = f7-f7eq;
    fdiff8  = f8-f8eq;
    fdiff9  = f9-f9eq;
    fdiff10 = f10-f10eq;
    fdiff11 = f11-f11eq;
    fdiff12 = f12-f12eq;
    fdiff13 = f13-f13eq;
    fdiff14 = f14-f14eq;
    fdiff15 = f15-f15eq;
    fdiff16 = f16-f16eq;
    fdiff17 = f17-f17eq;
    fdiff18 = f18-f18eq;
    pixx = fdiff1+fdiff2+fdiff7+fdiff8+fdiff9+fdiff10+fdiff11+fdiff12+fdiff13+fdiff14;
    piyy = fdiff3+fdiff4+fdiff7+fdiff8+fdiff9+fdiff10+fdiff15+fdiff16+fdiff17+fdiff18;
    pizz = fdiff5+fdiff6+fdiff11+fdiff12+fdiff13+fdiff14+fdiff15+fdiff16+fdiff17+fdiff18;
    pixy = fdiff7+fdiff8-fdiff9-fdiff10;
    pixz = fdiff11+fdiff12-fdiff13-fdiff14;
    piyx = fdiff7+fdiff8-fdiff9-fdiff10;
    piyz = fdiff15+fdiff16-fdiff17-fdiff18;
    pizx = fdiff11+fdiff12-fdiff13-fdiff14;
    pizy = fdiff15+fdiff16-fdiff17-fdiff18;
    stsq = pixx*pixx + piyy*piyy + pizz*pizz + pixy*pixy + pixz*pixz
         + piyx*piyx + piyz*piyz + pizx*pizx + pizy*pizy;
    s1 = 1.f/6.f/csq;
    s2 = 18.f*csq*sqrt(stsq);
    s3 = sqrt(nu*nu + s2);
    s4 = s1*(s3 - nu);
    tau[grid2index(x,y,z)] = 3.f*(nu + csq*s4) + 1.f/2.f;
  }
}
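// SGSmodel mirrors the Fortran eddy-viscosity update: the
// non-equilibrium momentum flux Pi_ab = sum_i e_ia*e_ib*(f_i - feq_i)
// sets the filtered strain-rate magnitude, and the Smagorinsky
// viscosity csq*s4 is folded into the local relaxation time tau.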
////////////////////////////////////////////////////////////////////////////////
// CUDA kernel for all BCs apart from periodic boundaries:
__global__ void apply_BCs_kernel(float *f, int *solid){
  float f0old, f1old, f2old, f3old, f4old, f5old, f6old, f7old, f8old,
        f9old, f10old, f11old, f12old, f13old, f14old, f15old, f16old,
        f17old, f18old;
  uint x=threadIdx.x, y=blockIdx.x, z=blockIdx.y;
  // Solid BC: "bounce-back"
  if (solid[grid2index(x,y,z)] == 1) {
    f0old  = f[grid2index(x,y,z,0)];
    f1old  = f[grid2index(x,y,z,1)];
    f2old  = f[grid2index(x,y,z,2)];
    f3old  = f[grid2index(x,y,z,3)];
    f4old  = f[grid2index(x,y,z,4)];
    f5old  = f[grid2index(x,y,z,5)];
    f6old  = f[grid2index(x,y,z,6)];
    f7old  = f[grid2index(x,y,z,7)];
    f8old  = f[grid2index(x,y,z,8)];
    f9old  = f[grid2index(x,y,z,9)];
    f10old = f[grid2index(x,y,z,10)];
    f11old = f[grid2index(x,y,z,11)];
    f12old = f[grid2index(x,y,z,12)];
    f13old = f[grid2index(x,y,z,13)];
    f14old = f[grid2index(x,y,z,14)];
    f15old = f[grid2index(x,y,z,15)];
    f16old = f[grid2index(x,y,z,16)];
    f17old = f[grid2index(x,y,z,17)];
    f18old = f[grid2index(x,y,z,18)];
    f[grid2index(x,y,z,0)]  = f0old;
    f[grid2index(x,y,z,1)]  = f2old;
    f[grid2index(x,y,z,2)]  = f1old;
    f[grid2index(x,y,z,3)]  = f4old;
    f[grid2index(x,y,z,4)]  = f3old;
    f[grid2index(x,y,z,5)]  = f6old;
    f[grid2index(x,y,z,6)]  = f5old;
    f[grid2index(x,y,z,7)]  = f8old;
    f[grid2index(x,y,z,8)]  = f7old;
    f[grid2index(x,y,z,9)]  = f10old;
    f[grid2index(x,y,z,10)] = f9old;
    f[grid2index(x,y,z,11)] = f12old;
    f[grid2index(x,y,z,12)] = f11old;
    f[grid2index(x,y,z,13)] = f14old;
    f[grid2index(x,y,z,14)] = f13old;
    f[grid2index(x,y,z,15)] = f16old;
    f[grid2index(x,y,z,16)] = f15old;
    f[grid2index(x,y,z,17)] = f18old;
    f[grid2index(x,y,z,18)] = f17old;
  } // if
}

////////////////////////////////////////////////////////////////////////////////
__global__ void apply_BCs_stream(float *f, float *f_temp){
  float f1old, f2old, f3old, f4old, f5old, f6old, f7old, f8old, f9old,
        f10old, f11old, f12old, f13old, f14old, f15old, f16old, f17old,
        f18old;
  uint x=threadIdx.x, y=blockIdx.x, z=blockIdx.y;
  int ip,in,jp,jn,kp,kn;
  f1old  = f[grid2index(x,y,z,1)];
  f2old  = f[grid2index(x,y,z,2)];
  f3old  = f[grid2index(x,y,z,3)];
  f4old  = f[grid2index(x,y,z,4)];
  f5old  = f[grid2index(x,y,z,5)];
  f6old  = f[grid2index(x,y,z,6)];
  f7old  = f[grid2index(x,y,z,7)];
  f8old  = f[grid2index(x,y,z,8)];
  f9old  = f[grid2index(x,y,z,9)];
  f10old = f[grid2index(x,y,z,10)];
  f11old = f[grid2index(x,y,z,11)];
  f12old = f[grid2index(x,y,z,12)];
  f13old = f[grid2index(x,y,z,13)];
  f14old = f[grid2index(x,y,z,14)];
  f15old = f[grid2index(x,y,z,15)];
  f16old = f[grid2index(x,y,z,16)];
  f17old = f[grid2index(x,y,z,17)];
  f18old = f[grid2index(x,y,z,18)];
  if (z>0)             {kn = z-1;} else {kn = params.dz-1;}
  if (z<params.dz-1)   {kp = z+1;} else {kp = 0;}
  if (y>0)             {jn = y-1;} else {jn = params.dy-1;}
  if (y<params.dy-1)   {jp = y+1;} else {jp = 0;}
  if (x>0)             {in = x-1;} else {in = params.dx-1;}
  if (x<params.dx-1)   {ip = x+1;} else {ip = 0;}
  // The rest population (i = 0) does not stream; copy it in place so
  // that update_f does not pick up a stale value.
  f_temp[grid2index(x,y,z,0)]   = f[grid2index(x,y,z,0)];
  f_temp[grid2index(ip,y,z,1)]  = f1old;
  f_temp[grid2index(in,y,z,2)]  = f2old;
  f_temp[grid2index(x,jp,z,3)]  = f3old;
  f_temp[grid2index(x,jn,z,4)]  = f4old;
  f_temp[grid2index(x,y,kp,5)]  = f5old;
  f_temp[grid2index(x,y,kn,6)]  = f6old;
  f_temp[grid2index(ip,jp,z,7)] = f7old;
  f_temp[grid2index(in,jn,z,8)] = f8old;
  f_temp[grid2index(ip,jn,z,9)] = f9old;
  f_temp[grid2index(in,jp,z,10)]= f10old;
  f_temp[grid2index(ip,y,kp,11)]= f11old;
  f_temp[grid2index(in,y,kn,12)]= f12old;
  f_temp[grid2index(ip,y,kn,13)]= f13old;
  f_temp[grid2index(in,y,kp,14)]= f14old;
  f_temp[grid2index(x,jp,kp,15)]= f15old;
  f_temp[grid2index(x,jn,kn,16)]= f16old;
  f_temp[grid2index(x,jp,kn,17)]= f17old;
  f_temp[grid2index(x,jn,kp,18)]= f18old;
}
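// apply_BCs_stream is a push-style streaming step into f_temp with
// periodic index wrap-around in all three directions; the x-direction
// periodicity is physical (streamwise), while populations crossing the
// solid duct walls are corrected by the bounce-back applied to solid
// nodes during collision.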
////////////////////////////////////////////////////////////////////////////////
__global__ void update_f(float *f, float *f_temp){
  uint x=threadIdx.x, y=blockIdx.x, z=blockIdx.y;
  for(int i=0; i<19; i++)
    f[grid2index(x,y,z,i)] = f_temp[grid2index(x,y,z,i)];
}
#endif
Abstract
The transport and deposition of inertial particles in unsteady flows were investigated using a novel, computationally inexpensive particle tracking model. The flows are simulated using lattice Boltzmann methods, which are evolved until a computationally steady flow is obtained. A snapshot of the flow is then stored, and the trajectories of particles are computed through the flow domain under the influence of this static probability field. Although the flow field is not evolving during the particle tracking simulation, the local velocity is obtained stochastically from the local probability function, allowing the dynamics of the turbulent flow to be resolved from the point of view of the suspended particles. Particle inertia is modeled using a relaxation parameter based on the particle Stokes number, which incorporates the particle velocity history at each time step. A computational simulation of particle-laden flow past a square cylinder placed in a 2D channel at a Reynolds number of Re = 1000 was performed. Particles having Stokes numbers greater than unity were not affected by the flow in the wake region, while particles with smaller Stokes numbers (St < 1) interacted strongly with the wake vortices. The computed particle deposition on the front side of the cylinder compared well with previous numerical results, and the deposition efficiency decreases with decreasing Stokes number, typical of an inertial impaction process. The static probability model was also applied to the tracking of inertial particles through a turbulent square duct flow at a friction Reynolds number of Reτ = 300. Wall deposition rates and deposition patterns were obtained and exhibited a high level of agreement with previously obtained DNS results and with experimental results over a wide range of particle inertia. This Boltzmann-based model, which computes inertial particle motion from the local fluid probability distribution function, was parallelized efficiently on a GPU architecture, allowing for a turbulent square duct simulation in which the fluid probability field evolves during the particle tracking. The computation of inertial particle tracking in the evolving three-dimensional turbulent duct flow is 134 times faster than sequential computation on a CPU. Extremely good agreement was found between the static and evolving probability field simulations. These results suggest that accurate particle tracking through complex turbulent flows may be feasible given a suitable probability field, such as one obtained from a lattice Boltzmann simulation. This in turn presents a new paradigm for the rapid acquisition of particle transport statistics without the need for concurrent computation of the fluid flow evolution.