Close
About
FAQ
Home
Collections
Login
USC Login
Register
0
Selected
Invert selection
Deselect all
Deselect all
Click here to refresh results
Click here to refresh results
USC
/
Digital Library
/
University of Southern California Dissertations and Theses
/
Nanostructure electronic band modeling for solar cell and lighting applications
(USC Thesis Other)
Nanostructure electronic band modeling for solar cell and lighting applications
PDF
Download
Share
Open document
Flip pages
Contact Us
Contact Us
Copy asset link
Request this asset
Transcript (if available)
Content
University of Southern California Viterbi School of Engineering 2013 Nanostructure Electronic Band Modeling for Solar Cell and Lighting Applications Ph.D. Dissertation Submitted to the USC Graduate School by: Suzana Sburlan Dr. P. Daniel Dapkus, Advisor Department of Electrical Engineering - Electrophysics ii iii Table of Contents Acknowledgments ........................................................................................................................................ v 1. Introduction .......................................................................................................................................... 1 2. Methodology ......................................................................................................................................... 8 2.1 An 8-Band Model for Electronic Structure Calculation ................................................................ 8 2.1.1 Hamiltonian for Cubic Systems ................................................................................................. 9 2.1.2 Hamiltonian for Hexagonal Systems ....................................................................................... 11 2.1.3 Lanczos Algorithm ................................................................................................................... 13 2.1.3.1 Numerical Stability .................................................................................................................. 18 2.1.3.2 Operator Symmetrization ....................................................................................................... 21 2.1.3.3 Matrix Operation ..................................................................................................................... 23 2.2 Conjugate gradient method applied to strain calculation .................................................................. 26 2.2.1 Strain Hamiltonian for Cubic and Hexagonal Crystals ............................................................. 26 2.2.2 Strain Calculation Using Continuum Elasticity Theory ............................................................ 28 2.2.2.1 Free Energy of Cubic Crystals .................................................................................................. 30 2.2.2.2 Free Energy of Hexagonal Crystals .......................................................................................... 36 2.2.3 Implementation ....................................................................................................................... 39 2.2.3.1 Matrix-Vector Operation for Cubic System ............................................................................. 39 2.2.3.2 Matrix-Vector Operation for Hexagonal System ..................................................................... 43 2.2.3.3 Conjugate Gradient Algorithm ................................................................................................ 47 2.2.3.4 Calculation of Strain Components ........................................................................................... 49 2.3 Piezoelectric Potential Calculation ..................................................................................................... 51 2.3.5 Generalized Minimum Residual Algorithm (GMRES) for Non-Symmetric Linear Systems ..... 53 2.3.6 Non-unique Solutions .............................................................................................................. 55 2.3.7 Piezoelectric Effect on Electronic Structure ............................................................................ 56 2.4 Material Parameters ........................................................................................................................... 57 2.5 Parallel Implementation ..................................................................................................................... 60 2.5.3 MPI (Message Passing Interface) Concepts ............................................................................ 60 2.5.4 MPI Subroutines ...................................................................................................................... 62 2.5.5 Spatial Decomposition ............................................................................................................ 65 iv 2.5.6 Parallelizing a Sequential Program .......................................................................................... 65 2.5.7 Sharing Boundary Data ............................................................................................................ 68 2.5.8 Implementation Issues ............................................................................................................ 70 3. Results and Analysis ............................................................................................................................ 75 3.1 Properties of InAs Nanostructures Grown on (001) GaAs Substrates ................................................ 75 3.1.1 Strain–Dominated Quantum Dot Growth ............................................................................... 75 3.1.2 Critical Height Prediction for Highly Strained Nanowires Using Continuum Elasticity ........... 83 3.1.3 Strain-Releasing Nanowire Configurations ............................................................................. 95 3.1.4 Effect of Strain, Polarization, Shape, and Size on Band Gaps of Embedded Quantum Dots 103 3.1.5 Electronic Structure of Single and Multi-Body QD Systems .................................................. 110 3.2 Properties of WZ Crystal Nanostructures Grown on (0001) Substrates ........................................... 129 3.2.1 An 8-Band Study of GaN/AlN Quantum Dot Interactions ............................................. 129 4. Further Work..................................................................................................................................... 143 4.1 Novel Geometries and Configurations .............................................................................................. 143 4.2 Second Order Piezoelectric Effects on Electronic Structure of Quantum Dots ................................ 143 4.3 Absorption Spectra Calculation for High Efficiency Solar Cells Using QD Superlattices ................... 144 4.4 Application to Intermediate Band Solar Cells ................................................................................... 145 Appendix A. User’s Manual ............................................................................................................... 149 A.1 Electronic Structure Calculation with Strain ..................................................................................... 149 A.2 Strain and Polarization Only Calculation ........................................................................................... 160 5. References ........................................................................................................................................ 162 v Acknowledgments So many people and organizations have been essential to the successful completion of my thesis. To begin, I wish to thank Professor Manuel Nieto-Vesperinas who oversaw my year- long Fulbright tenure at the Instituto de Ciencia de Materiales in Madrid, Spain. His warm hospitality and guidance during my stay gave me the confidence I needed to take on the challenge of graduate school upon my return to Los Angeles. The fellowships I received from USC’s Viterbi School of Engineering and the National Defense Science and Engineering Graduate Program enabled me to attend courses in computer science where I acquired the programming skills required in my research. I am very grateful for the flexibility to undertake a modeling project that complemented my group’s experimental goals. To this end, I wish to thank my advisor, Dr. Paul Daniel Dapkus, for supporting me as I cultivated a solid understanding of all the underlying concepts and methods necessary for the development of this theoretical tool. Throughout this process, he provided a broad perspective on the most relevant questions being pursued within our field, which allowed me to formulate a practical direction for my studies as well as my post-doctoral career. I have also been extremely fortunate to have been mentored by Dr. Aiichiro Nakano, who always treated me as one of his own students. He made countless resources available to me, offered feedback on my work, and prepared me for each formal milestone of the Ph.D. In 2011, I participated in the National Science Foundation’s East Asian and Pacific Summer Institutes, spending two months in China. My very gracious host, Dr. Zhiwei Shan, and his vi students welcomed me at the Center for Advancing Material Performance from the Nanoscale (Xi’an Jiaoto ng University), and provided a uniquely enriching experience both professionally and culturally. I would also like to acknowledge the tuition reimbursement support I received from my employer, the Jet Propulsion Laboratory, while I completed my dissertation. I am thankful to all my friends and colleagues there who supported me in the pursuit of my degree alongside my work. In particular, my good friend John Tran, a former colleague from USC, was instrumental in getting me the interview that led to the start of my JPL employment in 2009. My parents, Aura and Mihail, have been the most influential role models for me, teaching me the importance of hard work and perseverance. They always inspired me to seize every opportunity that life presented, and to keep going even when confronted with challenges. My grandparents, Paul and Eugenia, have been so caring and generous with me from the time I was little, and their love has always been a source of strength. My brother, Tudor, often delivers my grandmother’s home-cooked meals to my house, and shares his creative take on science and technology, which helps me maintain a fresh and optimistic outlook on my work and its role in society. Finally, I would like to express my appreciation to my wonderful husband, David. No one other than he could have shown so much genuine support throughout my multi-year endeavor. Besides shouldering the brunt of the household chores, he also contributed to drawing figures for my thesis. He always reminded me to stay focused on my goal and lifted my spirits in moments of doubt. Because of him, I feel that I have lived a full and balanced life, vii even while juggling the demands of work, school, and family. I am in constant awe of his incredible devotion and patience. David, I love you. You truly are the wind beneath my wings. …I will get ri ght to that pile of laundry now. 1 1. Introduction Semiconductor materials have provided unparalleled opportunities for development of novel electronic and photonic devices throughout the 20 th century. Semiconductor lasers 4-5 , light-emitting diodes, and solar cells have been created by exploiting the absorption and emission characteristics of these crystalline materials having unique band gaps. Combining low band gap materials with higher band gap ones has provided modern-day transistor 6-9 circuits as well as optical waveguides for communications. The continuous advancement of precise wafer patterning techniques 10-11 has enabled researchers to reduce feature dimensions down to the nanometer scale, offering the potential for additional improvements in device performance. High quality nanoscale devices can reduce the amounts of material used in manufacturing. In addition, they can consume less power by operating reliably using few particle transitions. Thus, nanotechnology has become a promising field with important bearings on the effort to conserve energy. Precisely controlling the growth of nanoscale features grown on a substrate is of great interest to researchers for these reasons. Semiconductor nanostructures are currently investigated for novel solar cell applications. Commercial thin film solar cells made of silicon currently provide efficiency no greater than 20%. In addition, their cost has not decreased sufficiently to encourage widespread adoption in private housing. Thus, the drive to harness the solar spectrum and shift global energy consumption toward renewable resources must be carried out in the context of a realistic efficiency to cost comparison. One approach to achieve higher efficiency is to capture as much of the solar spectrum as possible. At present, individual solar cells can only absorb a 2 small percentage of the solar spectrum which depends on the band gap of the semiconductor materials 12 . Efforts to integrate multiple materials in a solar cell have provided greater efficiencies, but are limited by the formation of defects within the crystal structure which degrade performance. Because each material has a different lattice constant, multiple thin films grown together will be strained – their lattices will either be stretched or compressed from their natural length. The drive to achieve greater efficiency in solar cells has focused the attention of many researchers on nanoscale semiconductor structures as possible alternatives to the conventional thin film (2-dimensional) devices. It has been found that 1-dimensional nanowires (NW) and 0- dimensional “quantum dots” (QDs) provide a strong confinement of cha rge carriers exhibiting a discreet electronic structure similar to that of individual atoms. These energy levels are expected to depend strongly on the exact size, shape, and composition of the nanostructure. Thus, the ability to fabricate such “artificial atoms” and wires as solar cells may enable absorption of a much wider range of the solar spectrum than is presently achievable with thin films. Quantum dots and nanowires are still subject to the same mechanism of defect generation which limits the efficiency of thin film devices, namely, strain due to lattice mismatch. Because each semiconductor material has a different lattice constant, films deposited on a different material substrate can grow free of defects only up to a critical thickness 13-14 typically only a few monolayers (~ 2 nm) in height. Beyond this, the strains accumulated in the crystal become too large inducing a relaxation that causes bonds to break 3 and dislocations to form. In the presence of these dislocations, electrons in the valence band which transition to the conduction band by photon absorption are likely to lose energy through collisions and recombine with holes. Thus, the efficient collection of electrons as photocurrent is hindered by this process and conversion of light is significantly limited. Nevertheless, using QDs rather than thin films may allow development of devices which are not strain limited. This is because QDs have the potential to relax 15-16 in the lateral direction during growth. In this way, they can form more defect-free layers which would prevent recombination of electrons. Taking into account that different materials also possess characteristic absorption coefficients which determine the number of photons absorbed over a certain length, the opportunity to increase the critical thickness in QDs is a further motivating factor for inclusion in solar cell applications. Along with QDs, freestanding nanowires have also been proposed for solar cells. Nanowires are tall columns that are formed on top of substrates through various growth methods discussed below. It has been suggested that these structures can also benefit from lateral relaxation and can be grown to long lengths without inducing any defects. Nanowires offer yet another opportunity to combine different band gap materials that form multiple junctions in the vertical direction. Heterojunctions can also be formed in the radial direction by fabricating core-shell nanowires 17 . Nanowires with different cross-sections and center-to- center spacing can be grown. Densely packed arrays of nanowires could trap more light and efficient absorption through the sidewalls would greatly enhance the performance of solar cells. It is apparent that these structures have provided innumerable opportunities for scientists to advance in the development of innovative material technologies. 4 The methods of fabricating QDs and NWs include the Stranski-Krastanov method as well as selective area 10,18-19 and vapor-liquid-solid (VLS) phase epitaxy 20-22 . The Stranski-Krastanov method requires deposition of a wetting layer - a low band gap material on a higher band gap substrate in a manner similar to thin film growth. It was discovered that at higher temperatures, the strained wetting layer relaxes by rearranging itself into individual QD structures on the substrate surface. This stochastic process yields QDs with relatively small dimensions, especially in the vertical direction with typical heights being ~4 nm. On the other hand, lithographic methods (e.g. e-beam and X-ray) enable selective area growth (SAG) 19 and greater control over the shape and height of QDs depending on the deposition time. For some applications, uniformity in shape is desirable because size variations lead to broadening of the QDs’ photoluminescence spectra. Finally, VLS methods use metal particles deposited on the substrate surface. During growth, these particles liquefy and facilitate incorporation of the gas phase reactants into solid crystal structures of nanowires and QDs. Patterned deposition of metal also enables selective area growth. The increasingly sophisticated research in QD fabrication and the numerous possible applications – including communications, quantum computing 23 , solar energy, biological detectors, single-photon sources, and semiconductor lighting – has also prompted tremendous efforts to develop theoretical models that can predict the important characteristics QDs based on their shape, size, and composition, including their energy gaps, band edge wavefunctions, transition dipole elements, and absorption spectra. It has been shown that the presence of strain significantly alters the shapes of bulk bands and must be carefully modeled 24-25 . In this work, we model the strain of both cubic and hexagonal crystal materials, also referred to as zinc 5 blended (ZB) and wurtzite (WZ) respectively. Group III-V materials commonly used in photoelectric devices including GaAs, InAs, InP, InGaN, AlN, and GaN are formed in either one or both of these crystal structures. Each of these alloys has composition-dependent strain coefficients for each crystal axis 26 . In addition, straining a crystal can also induce separation of charges along certain crystal directions known as the piezoelectric effect 27 . This creates an electric field in the strained crystal which is negligible in some materials, but quite significant in others. Furthermore, growth of QDs in the vertical direction can be oriented along different crystal axes, depending on the relative alignment of the substrate. In strongly polar materials, the cations or anions present on the terminating surfaces induce an additional spontaneous polarization causing an electric field along the growth direction. These complex effects have been modeled using atomistic as well as continuum approaches 15,28 , both of which are computationally intensive simulations and have benefited from the advent of parallel computing. In this work, we implement a finite difference model based on 8-band theory 29 . This approach solves for the electron and hole states near = 0 using a basis of 8 bands as a perturbative expansion that approximates the semiconductor crystal nanostructure. The key step is to diagonalize a Hamiltonian which correctly describes coupling between two conduction bands, two heavy-hole, two light hole, and two split off bands. In order to explain experimental observations as accurately as possible, researchers are actively seeking improvements to the current model. As a result, a commercial application has not been developed for widespread use but rather, researchers have individually implemented 6 their own models for theoretical studies. Nevertheless, a tool which allows experimentalists to calculate trends in energy gaps and strain profiles can prove to be invaluable in identifying fruitful research directions. Understanding how each property 30 (e.g. size, shape, composition, etc.) will individually affect electronic structure will give researchers the necessary knowledge to precisely engineering the band gap of QD structures. This work seeks to answer the following main questions: (1) How is the coherent growth of free-standing nanowires limited by strain? (2) How do piezoelectric effects alter the energy states of QDs? (3) How are strain and electronic properties altered in multi-body QD systems? To this end, this work shall focus on implementing an 8-band model which includes strain and piezoelectric effects for calculating the important characteristics of QDs and NWs of arbitrary shape and composition. This model has been successfully used for several decades to compute the band structure of individual QDs embedded in a substrate, particularly pyramidal-shaped dots which typically form during Stranski-Krastanov growth 31-32 . Nevertheless, novel shapes and formations are still being vigorously pursued, and freestanding nanowires surrounded by vacuum must also be modeled. The relatively long dimensions of nanowires require large-scale simulations, as does the accurate modeling of multiple QD systems. To enable computation on realistic scales, we implement parallel algorithms on multiple processors. In this way, the required computing time is also greatly reduced, and memory allocation can be shared among different nodes. It should also be noted that scientific 7 applications such as solving Schrodinger’s equati on pose a non-trivial problem with respect to memory limitations. The matrix being solved is too large to store entirely in memory, and thus conventional routines for matrix operations (e.g. LAPACK or ScaLAPACK) cannot be applied. This work will first discuss the theoretical basis of the method, as well as continuum elasticity and piezoelectricity. The algorithms implemented to solve each of these linear systems will be presented, and we will discuss specific issues related to the implementation of each in FORTRAN with MPI for parallel computing. Then, new findings will be presented which are obtained through application of the model to novel QD and NW systems in the context of high efficiency solar cells applications. 8 2. Methodology 2.1 An 8-Band Model for Electronic Structure Calculation The 8-band Hamiltonians used in this thesis are implemented from previous works. Here, we first provide a brief overview of this approach based on perturbation theory used in quantum mechanics. Bloch’s Theorem 33 states that the wavefunctions of an electron in a crystal with periodic potential can be written as the product where is the wave vector, is the position vector, and is a periodic envelope function for an electronic state denoted by the integer n. Then the Schrödinger equation becomes: where the momentum operator , is Planck’s constant, is the mass of free electrons, and is the eigenenergy of the electronic state. Upon substitution of the wavefunction, the derivative of acting on the plane wave gives and this term then cancels out leaving an equation for the periodic envelope alone: The set of functions having , , form a complete set, and it is also possible to show that 9 Therefore, we can use this basis to approximate for small values of using second order perturbation theory where the operator is taken as the perturbation such that: To solve this equation, the summation must be truncated for a finite number of envelope functions or bands. An 8-band model has been used successfully for several decades to calculate the band structure of QDs of arbitrary shapes, by expanding the equation above using the set of 8 basis functions near the zone center represented by , , , , , , , . These functions employ the model which characterizes the conduction band in a bulk material to be similar to the s-orbital of a single atom, while the valence bands are assumed to resemble the , , and orbitals. The and states represent electrons with spin up or spin down. Thus, the 8-band Hamiltonian includes interaction terms that take into account the coupling between these 4 orbitals, along with the 2 electron spin states. 2.1.1 Hamiltonian for Cubic Systems The Luttinger-Kohn 34-35 Hamiltonian which has been developed to compute the electronic levels of cubic crystal nanostructures of arbitrary shapes using a system of 8 basis functions is shown below 36 . The Hamiltonian has two conduction bands, while the 6 valence bands become separated in energy yielding two heavy-hole, two light hole, and two split off bands. 10 where: 11 Here is Planck’s constant; are modified Luttinger parameters; is the optical dipole matrix element that describes the coupling between the valence and conduction bands and is the corresponding coupling energy; and are the unstrained conduction- and valence-band energies respectively; and is the spin-orbit splitting energy. 2.1.2 Hamiltonian for Hexagonal Systems The wurtzite (WZ) crystal structure consists of two interpenetrating hexagonal closely packed sublattices, offset along the c axis by 5/8 of the height c. It is possible to leverage the similarities between this and the cubic crystal structure to derive an analogous Hamiltonian for the hexagonal system as shown in Ref. 37. This is the cubic approximation 38 which involves mapping the c axis along the (111) direction and the x and y directions along the [11 ] and [ 10] directions respectively. The valence bands are close together in energy, therefore, a second-order perturbation method provided by Lowdin 39 is applied to break the degeneracy in the Hamiltonian. The Hamiltonian is given by: 12 13 Here, the parameters and are the electron effective masses in the z (growth) and transpose directions respectively. The terms are analogous to the Luttinger parameters from the cubic system; similarly, the energies are analogous to the spin-orbit coupling term . For the cubic approximation, the following relations hold: 2.1.3 Lanczos Algorithm To solve for the eigenvalues and eigenvectors of theses Hamiltonians, we have implemented the Lanczos algorithm. The Lanczos algorithm 40 is an iterative method developed in the 1950s by Cornelius Lanczos that is an adaptation of power methods used to find eigenvalues and eigenvectors of a square matrix. It can be viewed as a simplified Arnoldi algorithm 41 in that it applies to hermitian matrices and is effective for finding extreme eigenvalues of large sparse Hamiltonians. The algorithm takes as input an NxN matrix H and an integer NWF - the maximum number of iterations. The following simple Lanczos algorithm has been shown to be most numerically stable: 14 The coefficients α s and β s are used to construct a tridiagonal and symmetric matrix T NWF such that T NWF = Q NWF † HQ NWF where Q NWF is a matrix whose column vectors are q(1:N,1)…q(1:N,NWF) after NWF iterations. The diagonal elements will be α s while β s will appear on the off-diagonal of T NWF such that: q(1:N,1) ⇐ random vector with norm 1 q(1:N,0) ⇐ 0 β 1 ⇐ 0 Ψ ⇐ 0 for s = 1:NWF v ⇐ H q(1:N,s) – β s q(1:N,s-1) α s ⇐ v T ∙ q(1:N,s) v ⇐ v – α s q(1:N,s) if (s < NWF) β s+1 ⇐ || v || q(1:N,s +1) ⇐ v / β s+1 end end call tridiag(NWF) for m = 1:NWF for s = 1:NWF Ψ(1:N,m) = Ψ(1:N,m) + q(1:N,s)QS(s,m) end end 15 Such a matrix is easily diagonalizable with commonly available routines (e.g. Numerical Recipes). We employ the subroutine tridiag which returns the eigenvalues of T NWF and the eigenvectors as column vectors of a matrix QS. It can be shown that the eigenvalues of T NWF are approximate eigenvalues of the original matrix H. Furthermore, we can compute the approximate eigenvectors from Ψ m = Q NWF QS(1:N,m). This is accomplished in the last step of the algorithm given above. The Lanczos algorithm is expected to yield fast convergence of extreme eigenvalues. For Hamiltonians describing systems with periodic potentials such as crystals, Lanczos algorithm also provides fast convergence of band edge states. Therefore, it is quite useful in practice for calculating the band gaps of semiconductor material systems. For Hamiltonians with dimensions - , Lanczos algorithm can provide a good approximation of the band gap after just a few hundred iterations 42 . A tridiagonal matrix of this size can be diagonalized in just a fraction of a second. Indeed, the limitation typically encountered when using Lanczos algorithm is memory rather than computation time. To see this, we note that the algorithm can only obtain the eigenvectors Ψ m of the Hamiltonian H after computing all the column vectors of the matrix Q NWF . Thus, to calculate the entire spectrum after NWF iterations, we require storing two structures of size NxNWF. For the dimensions given above, this requirement can 16 overflow the memory. In practice, we have seen that this can translate into extremely slow simulation completion times because of long wait times for memory retrieval. This problem can be remedied by first limiting the number of eigenvectors we compute. We expect that only the eigenvectors around the band gap will be well converged after NWF iterations. These will represent our band edge wavefunctions in the valence and conduction bands. Eigenvectors far away from the band gap will not carry any physical meaning, and thus need not be computed. Thus, we would like to first obtain the eigenvalues after NWF iterations and identify the band gap, then compute only a relatively small number of band edge wavefunctions. We note that the eigenvalue calculation only needs to store three vectors for each iteration – q(1:N,s), q(1:N,s-1), and v. Thus, we can eliminate the matrix Q NWF and store only these vectors. We can then carry out the Lanczos iteration twice. The first time, we compute the eigenvalues and identify the band gap while the second time, we recompute each of the q(1:N,s) vectors and add its contribution to the eigenvectors Ψ m using only those column vectors of the QS matrix which correspond to the desired eigenvectors. The new algorithm is given below: 17 The subroutine calculate_wavefunction takes as input the sorted vector of eigenvalues computed by tridiag. We call the subroutine findgap to find the largest energy difference between two consecutive eigenvalues in eval and to return the higher index of the gap. We then specify the number of wavefunctions we wish to plot around this index with NPLOT. For NPLOT = 20, the routine will compute the first 10 wavefunctions around the conduction band subroutine calculate_wavefunction(eval) q(1:N,1) ⇐ random vector with norm 1 q(1:N,0) ⇐ 0 Ψ ⇐ 0 call findgap(gap_index, eval) for s = 1, NWF n = mod(s,2) k = mod(s+1,2) v ⇐ H q(1:N,n) – β s q(1:N,k) v ⇐ v – α s q(1:N,n) for i = 0:NPLOT – 1 m = gap – NPLOT/2 + i Ψ(1:N,i+1) = Ψ(:,i+1) + q(1:N,n)QS(s,m) end if (s < NWF) q(1:N,k) = v/ β s+1 end end q(1:N,1) ⇐ random vector with norm 1 q(1:N,0) ⇐ 0 β 1 ⇐ 0 for s = 1,NWF n = mod(s,2) k = mod(s+1,2) v ⇐ H q(1:N,n) – β s q(1:N,k) α s ⇐ v T ∙ q(1:N,n) v ⇐ v – α s q(1:N,n) if (s < NWF) β s+1 ⇐ || v || q(1:N,k) ⇐ v / β s+1 end call tridiag(s) end call calculate_wavefunction(eval) end 18 edge and the first 10 around the valence band edge. We also note in the above that we do not call tridiag(NWF) once as in the initial algorithm. Rather, we are calling tridiag(s) to compute the eigenvalues on each iteration s < NWF. This can serve an important purpose which we shall discuss below. 2.1.3.1 Numerical Stability A complete treatment of the Lanczos algorithm will typically include a discussion of its limitations. The version given above is considered to be a simple Lanczos algorithm. More complex methods have been designed, such as the “block” Lanczos and an entire class of variations based on restarting 43 the algorithm after a certain number of iterations. These restarted algorithms (for example, “Thick -Restart 44 ” Lanczos method) seek to addre ss the problem of loss of orthogonality among the q(:,s) vectors. Mathematically, these vectors form an orthonormal basis, but in practice, roundoff errors introduced in digital computation can accumulate and lead to linear dependence among the q(1:N,s) vectors. As a result, some of the eigenvalues of the resultant tridiagonal matrix can be “spurious”. That is, they may not be approximations to the original matrix. Because of this, practical implementations have developed three kinds of remedies to address loss of orthogonality. The first is to prevent the loss of orthogonality; the second is to recover the orthogonality after the basis is generated; and the third is to identify and remove the “spurious” eigenvalues from the true ones. The simplest method to recover orthogonality is to perform a Gram-Schmidt reorthogonalization. During each Lanczos iteration, we compute a v vector which we add to the 19 basis of q vectors. Before doing so, however, we must subtract out all the projections of the existing q vectors to ensure that the new vector will be orthogonal. This is accomplished by including the following steps after the line v ⇐ v – α s q(:,n) above: for m = 1:s b = q(:,m) T ∙ v v = v – b q(:,m) end It is also recommended that this loop be carried out twice in order to reduce the projections to orders . While this is the most direct method to prevent loss of orthogonality, it is also the most expensive in terms of time. Furthermore, when this step was included in practice, we found that it did not to have any effect on the resulting eigenvalues and eigenvectors. We concluded therefore that our particular application is not sensitive to loss of orthogonalization and excluded this step from our implementation. We found that the eigenvalues around the band gap generally converge after about 100 iterations, but the wavefunctions require longer to become smooth. A satisfactory number of iterations for the Lanczos algorithm was found to be NWF = 500. As an example of the output we expect from the Lanczos iteration, we introduce Fig. 1. The system is an InAs quantum dot surrounded by GaAs on all sides. The figure plots the eigenvalues computed by the subroutine tridiag after each iteration. Because the subspace used to approximate the eigenvalues grows with the number of iterations, the eigenvalues also grow in number reaching 200 after 200 iterations. It is apparent in this figure that the FIG. 1. Example output of the approximate eigenvalues from the subroutine tridiag for 1-200 iterations. The system is an InAs quantum dot embedded in GaAs. Lanczos algorithm is used to solve for the eigenvalues of this Hamiltonian. The valence and conduction edge eigenvalues are well converged after 200 Lanczos iterations and the band gap can be identified by inspection ignoring the spurious eigenstates. Valence edge Conduction edge Spurious eigenvalues 20 eigenvalues around the valence and conduction edges are converging fastest. We can easily obtain the band gap by subtracting the conduction edge energy from the valence band energy. However, we also note that there are spurious eigenvalues inside the band gap. Although they persist throughout the calculation, these spurious states are not present on every iteration, but rather they appear every 6 – 7 iterations as can be seen in Fig. 1. It is possible that our final iteration will contain a spurious eigenvalue inside the band gap. Therefore, in order to properly identify the band gap, we must examine several iterations. In practice we choose to write the output of the tridiag value to a file for the last 20 iterations (e.g. 181 – 200). Then, the band gap can be correctly identified by inspecting this file and ignoring the spurious eigenvalues. The above discussion on reorthogonalization provides evidence that the Lanczos algorithm itself is not the likely source of these “spurious” states. Indeed, spurious solutions could also be the cause of the discretization or the approximation in the Hamiltonian employing a finite number of bands. There is evidence that the latter effect is indeed a likely source of these spurious states 45 . It is also possible to identify a spurious eigenvalue by plotting the resulting wavefunctions from the Lanczos iteration. If a spurious eigenvalue appears in our last iteration, the subroutine findgap will include it in the output wavefunctions. In Fig. 2, we plot the wavefunctions of another QD system - a truncated square pyramid with large base dimension than top dimension and height of 1.79 nm. It is apparent that the spurious state is not as smooth as the other states. We expect that the wavefunctions farther away from the band gap will be less well converged on any given iteration, and indeed, we observe that the higher energy states become blurrier. However, Fig. 2 shows a distorted state inside the band gap which is not present on previous iterations. In general, we note that the spurious states 21 will have this rough appearance making them easily identifiable amid the true band edge states. Nevertheless, the spurious state will bear some resemblance to a physical state – in this case, the highest (hole) energy valence state shown (0.186 eV). We can infer from this observation that the spurious eigenvalues result from approximations in the model rather than numerical limitations of the Lanczos algorithm. These states are artifacts of the finite band approximation which cannot exactly satisfy the periodic conditions of the crystal structure for all vectors 46 . 2.1.3.2 Operator Symmetrization So far, we have described how Lanczos algorithm can be used to solve for the eigenvectors and eigenvalues of a hermitian matrix . In the next two sections, we explain how the 8-band Hamiltonians for cubic and hexagonal crystals are adapted for purpses of diagonalization. We note from Sections 2.1.1 and 2.1.2 that these matrices consist of derivative operators. Although these operators appear in a familiar form, (e.g. and ), FIG. 2. Wavefunctions around the band gap of an InAs QD embedded in GaAs. The system is a truncated square pyramid with dimension of 21.48 nm at the base and 14.13 nm at the top. The height is 1.79 nm. The wavefunctions output after 500 iterations contain a spurious state which is identifiable graphically because it has a distorted appearance compared to the rest of the eigenstates. Energy (eV) 0.231 0.256 0.307 1.264 1.452 1.520 1.530 valence edge spurious state conduction edge y x z 0.187 tr_pyrme2spredo -> where are the eigenvalues for this? not in 041110 22 they cannot be implemented as individual matrices and multiplied together. The reason is due to the material dependent parameters mentioned above. Because the derivative and position operators do not commute (i.e. ), a new symmetrized form of this combined operator must be introduced. This operator must satisfy the commutation properties of the components and it must ensure that the Hamiltonian retains its hermiticity. We note that it is not necessary that each individual operator comprising be symmetric. It is only required that the terms on the diagonal be hermitian. The off-diagonal terms need not be hermitian as long as their hermitian conjugate appears in the mirrored position across the diagonal. To satisfy the requirements above, we must introduce the forward and backward derivatives and . In one-dimension we have: ; ; A new single derivative form will be a combination of and and will incorporate any material dependent parameter such that: Because , we can verify that . The second derivative will take the form: 23 which is a hermitian operator. Finally, a mixed derivative must take the form: which is also a hermitian operator. It can be verified that when where is a constant and is the identity matrix, all the above operators reduce to the standard difference quotients without preference to the backward or forward direction. Therefore, the operators must be divided by the equivalent finite difference element: 2.1.3.3 Matrix Operation While libraries exist in FORTRAN to enable direct matrix multiplication operations, our application does not lend itself to the use of such tools. This is because the Hamiltonian is comprised of derivatives which are sparse position independent matrices. Storing such a matrix in full form would quickly overrun the memory. Nevertheless, because of the cyclical nature of these operators, it is possible to implement matrix multiplication in a subroutine employing for 24 loops. Such a subroutine takes an input vector q and outputs only the resultant vector of the operation hq = H*q. The subroutine hamiltonian_op(q,hq) accomplishes this operation. The input vector q(2,8,N x ,N y ,N z ) is a 16N x N y N z complex vector. The factor of 16 is due to the real and imaginary parts as well as to the 8 segments corresponding to the number of bands. The resulting vector will be the product of a row of operators in the Hamiltonian applied to the corresponding segments of the vector q. For example, considering only the matrix : To formulate this as a subroutine, each operator must be reduced to a point by point description. This involves finding a general form for each operator. For example, the fifth term above applies to which includes the operator and material matrix. If we let be an arbitrary diagonal matrix and carry out the explicit multiplication to obtain the matrix we find that: The term represents all other multiplicative factors in front of this operator including the term . The process of explicitly writing out each row of the resultant vector and calculating it inside for loops becomes apparent from this discussion. Each row of will be 25 written as a sum of contributions from all terms operating on . Finally, we offer a word of caution regarding the hermitian conjugates of these derivative operators. Taking the hermitian conjugate introduces a minus sign for the derivative operator such that . However, the operator is hermitian. Each operator must appear with its correct hermitian conjugate across the diagonal. If this is not the case, the entire Hamiltonian will not be hermitian and the computed eigenvalues will be complex and will not correspond to the physical system. Our algorithm first computes the vector Then, a separate subroutine is called to compute whose contribution is added to . This subroutine is called strainhamphys for cubic systems and strainhamphyshex for hexagonal systems. Finally, the piezoelectric potential is added to the diagonal terms of the Hamiltonian by adding where . Calculating the components of the strain Hamiltonian and are described in the next sections. 26 2.2 Conjugate gradient method applied to strain calculation 2.2.1 Strain Hamiltonian for Cubic and Hexagonal Crystals Strain effects are introduced into our simulation through the strain Hamiltonian H s . For the cubic system, this operator takes the form: with: 27 The terms , , and are material dependent parameters which are taken from Ref. 36. For the hexagonal system, the strain Hamiltonian is given by: where: 28 The terms and represent the strain deformation potential for the conduction band in the parallel and perpendicular growth directions respectively. The are strain deformation potentials for the valence bands. In the cubic approximation 37-38 , the following is also true: It is important to note that it is difficult to measure the parameters independently, therefore a great deal of discrepancy still exists in experimental data. In many cases, the values are obtained by fitting to measurements and the results do not always agree in the literature. It has been found that more than one combination of terms can be used, all of which satisfy the cubic approximation in the equations above. We have chosen values which are recommended as the most representative of the majority of cases as explained in Ref. 26. The next sections will describe how the strain components are derived from continuum elasticity theory and how we compute them in this work using the conjugate gradient (CG) algorithm. 2.2.2 Strain Calculation Using Continuum Elasticity Theory From the theory of elasticity 47 , a body undergoing a deformation will have a displacement vector u defined at every point. At any given point, the displacement components are given by: For two points which are very close together, the radius joining them after the deformation will be . The new distances between the points before and after the deformation 29 will be and . We can get an expression for by combining the above equations and substituting to obtain: The second term on the right hand side can be written as ) and the last term becomes . Then the above equation can be written as: where is the strain tensor. In linear elasticity, it is assumed that the third term in will be at least one order of magnitude smaller than the first two terms, and therefore we can neglect it such that: From thermodynamics, we can define a stress tensor such that, at constant temperature: 30 where is the Helmholtz free energy. To derive an expression for as a function of strain tensor, we expand in powers of for small deformations. The linear term will contain . But when , the stresses will also be zero. It follows that the expansion of does not contain a linear term. Consequently, a first order approximation for will be a quadratic function of the strain tensor known as Hooke’s Law. The change in free energy of crystals in isothermal compression takes the form: where is a tensor of rank four called the elastic modulus tensor. 2.2.2.1 Free Energy of Cubic Crystals Cubic crystals will have only three independent terms in such that: where we have made the notation change , , and . Our current system also requires that we introduce a method to account for the lattice mismatch between materials with different lattice constants. The lattice mismatch is the sole source of internal strain, since no other forces are externally applied. Nanostructure material grown on a substrate with different lattice constant will be forced to adapt to the substrate’s crystal structure and will be strained. Evidently, the definition of displacement u which gives rise to strain will differ depending on the reference material. For our calculation, we choose to measure displacement relative to the substrate’s coordinate system. We treat the 31 nanostructure material as expanded substrate material with different elastic constants (values of ). However, inside the nanostructure, we must adjust the energy to account for its larger lattice constant. This is accomplished by the modification 48 – α where: α where and are the lattice constants of the nanostructure and substrate respectively. This modification can be understood by analyzing a simple example in one dimension. Figure 3 shows that this modification essentially shifts the energy minimum from to a nonzero value inside the nanostructure. Therefore, the calculated strain inside the nanostructure material will not be the physical strain. This is because we compute our strain relative to the substrate’s lattice constant. To see this, we differentiate the modified expression for above with respect to to obtain: FIG. 3. To account for lattice mismatch, the term shifts the energy minimum inside the nanostructure material from to . Energy 1 2 2 1 2 2 ′ 32 α such that for = 0, α . For , the nonzero stress is fictitious since no external forces are acting on the material. We must therefore introduce a correction term to obtain the physical strain that gives The corrected strain is still computed in the coordinates of the substrate material and must be converted to the nanostructure coordinates through multiplication by . Therefore, physical strain inside the QD is given by: where is the strain that minimizes the modified expression for . FIG. 4. A nanostructure with lattice constant grown on a lattice mismatched substrate with lattice constant . The displacement u relative to its natural lattice spacing is nonzero inside the nanostructure when the nanostructure adjusts to the substrate’s lattice spacing . = 0 0 33 To find the minimum energy configuration, it is necessary to discretize the physical space. We will use the same grid as for our electronic calculation and similarly, we find that it is necessary to use forward and backward derivative operators 48 . where is the lattice vector in the direction . Symmetric differences given by: must be avoided because they produce unphysical results at low energies which oscillate with periods and mix with the physical solutions. For instance, a displacement ) has when using symmetric differences. Nevertheless, using nonsymmetric differences introduces an intrinsic bias for the forward or backward direction. To solve this, we compute using all possible combinations of the three difference operators and take an average such that: where each + and – represents the forward or backward derivative used for the operator in the , , and direction respectively. 34 In order to apply conjugate gradient (CG) method to minimize the energy , we must formulate the expression as a quadratic system . Since the strains are derivatives of the displacement field, we deduce that That is, the energy is a quadratic function of the strain field. Thus the modification due to lattice mismatch provides a linear term, and The energy is quadratic in and must be minimized before being included in the electronic calculation. This requires solving for the displacement field which minimizes the energy over the 3D volume consisting of 3N degrees of freedom where N ≡ N x N y N z . Because our CG algorithm applies the operator to successive vectors on each iteration and also uses the vector , we must derive the forms of these two structures. In addition, we anticipate that the matrix will be too large to store in memory, and a subroutine must be supplied which operates explicitly on vector terms to conduct the equivalent multiplication. As with the 8- band Hamiltonian, is a sparse matrix consisting of derivative operators, thus facilitating a direct implementation of the required subroutine employing for loops. For the moment, we can ignore the nonsymmetric derivatives, and derive an explicit expression for given by A term such as can be written using operator notation as . If we construct the displacement vector as u = [ , then will be a 35 matrix of size 3N whose terms have the form given above. We can then construct in matrix form as: We note that the off-diagonal terms with , can be written as in the mth row and nth column of the matrix, or as in the nth row and mth column. This is because . Because of this equivalence, these terms have been split into two terms and divided among the two complementary positions in the matrix. For example, produces in the third row and first column and in the first row, third column. This ensures that our matrix is symmetric, which is a crucial requirement for the CG algorithm. Likewise, the term due to lattice mismatch can be written as: α α By comparing with the quadratic form, we deduce that α α 36 Thus is a 3N x 1 vector computed by applying the derivative operators in each direction to the N x 1 vector . 2.2.2.2 Free Energy of Hexagonal Crystals Hexagonal crystals will have five independent terms in such that 47 : where we have made the notation change , , , , and . As with cubic crystals, we introduce a linear strain term to account for the lattice mismatch. In this case we have 49 : where: α 37 such that and are the lattice constant in the nanostructure and substrate respectively for and directions, respectively, while and apply to the (growth) axis. Note that the term is an additive constant (independent of strain), heretofore not present in the cubic system. This term does not affect the result of the conjugate gradient algorithm which finds the strain that minimizes total energy. However, this term does affect the subsequent calculation of the total energy. We have noted that some texts do not take this term into account 49 , while others include it 50 . Writing the terms explicitly as derivatives of the displacement vector components, we obtain: The above can be represented as a symmetric matrix having the form: As with the calculation of cubic strain, the physical strain must be computed by subtracting out fictitious strain terms from the outputs of the conjugate gradient algorithm. 38 Because the hexagonal system has 2 lattice constants, the physical strain expression is more complex than the in the cubic strain case. The physical strain is calculated by first multiplying each strain component by or . For example, and . The shear components involving a differentiation in the direction must include both terms. For example, Then, the fictitious strain term is subtracted from and and subtracted from , while the shear terms do not have this correction. The above process for obtaining the physical strain is summarized in the expression below: Here . We deduced this method by comparing with the cubic strain calculation and making the most reasonable choice for the shear components, taking into account that the hexagonal system has two lattice constants. Finally, the linear term in the quadratic equation is also more complex than for the cubic system due to the unique terms α and . The linear term is then: α α α α 39 2.2.3 Implementation 2.2.3.1 Matrix-Vector Operation for Cubic System It is apparent that all terms in the matrix from section 2.2.2 above have the same form given by . Also, the rows and columns follow a cyclical pattern. In formulating a subroutine for the matrix operation, this repetition enables us to create general subroutines for differentiation, which will be called subsequently in loops. This provides much greater readability and ease in testing and debugging of the code. However, we must now return to the consideration of the forward and backward derivatives and examine their implications on the program’s efficiency. Nonsymmetric derivatives introduce an added layer of complication. Instead of one matrix, we have 8 versions containing all possible permutations of the backward and forward derivative operators. As an example, the term is Since each term contains only two derivatives, the combined matrix will contain several occurrences of a given term. For where , each permutation repeats twice over all terms in , and for , each permutation appears 4 times. Therefore, care must be taken when designing an algorithm to avoid redundant calculations. 40 A subroutine which computes the resultant vector where x is the input vector will need to apply different derivatives to different parts of . This is another source of possible redundancy because different matrix elements require the same derivative to be computed. The solution is to begin by choosing one section of the vector, one direction for differentiation, and a backward/forward derivative. One subroutine will be used to differentiate the indicated vector section in the specified direction using a backward or forward derivative. Then, we will compute all terms in the matrix that require the particular differentiated vector by calling a second subroutine which will apply different elastic constant matrices, as well as the operator . Thus, the inputs to the second subroutine will be a second choice for differentiation direction and backward/forward derivative. In addition, this subroutine will specify where the new term should be added in the resulting vector . This will depend on the indeces of the terms being computed in the matrix . Figure 5 illustrates the calculation. The algorithm used can be summarized as follows: subroutine calculate_A(input_vector, output_vector) for vector_section = 1:3 for direction1 = 1:3 for back_or_for1 = 0:1 call subroutine1 (vector_section, direction1, back_or_for1) if vector_section = direction1 call subroutine2 (4,1, direction1, back_or_for1,vector_section) for back_or_for2 = 0:1 direction2 = mod(direction1+1, 3)+1 call subroutine2(2, 2, direction2, back_or_for2, direction2) direction2 = mod(direction1, 3)+1 call subroutine2(2, 2, direction2, back_or_for2, direction2) end for else call subroutine2(4, 3, direction1, back_or_for1, vector_section) 41 for back_or_for2 = 0:1 call subroutine2(2,3,vector_section,back_or_for2,direction1) end for end if end for end for end for subroutine subroutine1(input_section, direction, back_or_for) Differentiate the input_section along direction using back_or_for derivatives and store in temporary vector. end subroutine subroutine subroutine2(factor, elastic_modulus, direction, back_or_for,output_section) Multiply the terms in the temporary vector by factor and elastic_modulus matrix, then differentiate in direction using back_or_for derivatives, and store in the output_section. end subroutine end subroutine calculate_A 42 FIG. 5. Sequence flow in the operation of matrix on a vector for cubic systems. 𝑡 1 𝑡 2 𝑡 3 temp = 0 direction1 back_or_for1 back_or_for2 direction2 = mod(direction1,3)+1 back_or_for2 direction2 = mod(direction1+1,3)+1 direction 𝑡 𝑒 𝑡 𝑜 _ 𝑒 𝑡 𝑜 vector_section 𝑒 𝑡 𝑜 1 _𝑜 _ 𝑜 1 ( 𝑒 𝑡 𝑜 1 _𝑜 _ 𝑜 1 ) † 11 ( 𝑒 𝑡 𝑜 2 _𝑜 _ 𝑜 2 ) † 12 ( 𝑒 𝑡 𝑜 2 _𝑜 _ 𝑜 2 ) † 12 ( 𝑒 𝑡 𝑜 1 _𝑜 _ 𝑜 1 ) † 44 ( 𝑒 𝑡 𝑜 _ 𝑒 𝑡 𝑜 _𝑜 _ 𝑜 2 ) † 44 x 4 x 2 x 4 x 2 x 2 + [output vector_section ] + [output direction2 ] + [output direction2 ] + [output vector_section ] + [output direction1 ] back_or_for2 direction1 = vector_section direction1 ≠ vector_section 43 2.2.3.2 Matrix-Vector Operation for Hexagonal System We have applied the same principles to the hexagonal system as used in the cubic system to avoid redundant calculations. The subroutine used is longer than for the cubic system due to lower symmetry for the hexagonal system. It is summarized below: subroutine calculate_A(input_vector, output_vector) for vector_section = 1:2 for back_or_for1 = 0:1 temp_vector = 0 call subroutine1 (vector_section, vector_section, back_or_for1) call periodic_bc(temp_vector) call apply_boundcond(temp_vector) direction2 = vector_section call subroutine2(4, 1, direction2, back_or_for1, vector_section) direction2 = 2 for back_or_for2 = 0:1 call subroutine2(2, 2, direction2, back_or_for2, mod(vector_section,2)+1) end for direction2 = 3 for back_or_for2 = 0:1 call subroutine2(2, 2, direction2, back_or_for2, direction2) end for temp_vector = 0 direction2 = mod(col, 2) + 1 call subroutine1 (vector_section, direction2, back_or_for1) call periodic_bc(temp_vector) call apply_boundcond(temp_vector) call subroutine2( 2, 1, direction2, back_or_for1, vector_section) 44 call subroutine2(-2, 2, direction2, back_or_for1, vector_section) for back_or_for2 = 0:1 call subroutine2( 1, 1, vector_section, back_or_for2, direction2) call subroutine2(-1, 2, vector_section, back_or_for2, direction2) end for temp_vector = 0 call subroutine1 (vector_section, 3, back_or_for1) call periodic_bc(temp_vector) call apply_boundcond(temp_vector) call subroutine2(4, 5, 3, back_or_for1, vector_section) for back_or_for2 = 0:1 call subroutine2(2, 5, vector_section, back_or_for2, 3) end for end for end for vector_section = 3; for back_or_for1 = 0:1 for direction2 = 1:2 temp_vector = 0 call subroutine1 (vector_section, direction2, back_or_for1) call periodic_bc(temp_vector) call apply_boundcond(temp_vector) call subroutine2(4, 5, direction2, back_or_for1, 3) for back_or_for2 = 0:1 call subroutine2(2, 5, 3, back_or_for2, direction2) end for end for temp_vector = 0 call subroutine1 (vector_section, 3, back_or_for1) 45 call periodic_bc(temp_vector) call apply_boundcond(temp_vector) call subroutine2(4, 4, 3, back_or_for1, 3) for direction2 = 1:2 for back_or_for2 = 0:1 call subroutine2(2, 3, direction2, back_or_for2, direction2) end for end for end for subroutine subroutine1(input_section, direction, back_or_for) Differentiate the input_section along direction using back_or_for derivatives and store in temp_vector. end subroutine subroutine subroutine2(factor, elastic_modulus, direction, back_or_for,output_section) Multiply the terms in the temp_vector by factor and elastic_modulus matrix, then differentiate in direction using back_or_for derivatives, and store in the output_section. end subroutine end subroutine calculate_A 46 FIG. 6. Sequence flow in the operation of matrix on a vector for hexagonal systems. 𝑡 1 𝑡 2 𝑡 3 𝑡 𝑒 𝑡 𝑜 _ 𝑒 vector_sec 𝑒 𝑡 𝑜 2 _𝑜 _ 𝑜 1 vector_sec < 3 vector_sec = 3 back_or_for1 back_or_for1 temp = 0 direction2 < 3 temp = 0 3 _𝑜 _ 𝑜 1 𝑒 𝑡 𝑜 _ 𝑒 _𝑜 _ 𝑜 1 3 _𝑜 _ 𝑜 1 𝑒 𝑡 𝑜 2 _𝑜 _ 𝑜 1 temp = 0 temp = 0 direction2 = mod(vector_sec,2) + 1 temp = 0 back_or_for2 direction2 = mod(vector_sec,2) + 1 back_or_for2 back_or_for2 back_or_for2 direction2 < 3 2( 𝑒 𝑡 𝑜 2 _𝑜 _ 𝑜 2 ) † 12 +[𝑜 𝑡 𝑡 𝑒 𝑡 𝑜 2 ] + [𝑜 𝑡 𝑡 𝑒 𝑡 𝑜 _ 𝑒 ] + [𝑜 𝑡 𝑡 𝑒 𝑡 𝑜 _ 𝑒 ] + [𝑜 𝑡 𝑡 𝑒 𝑡 𝑜 _ 𝑒 ] 4( 𝑒 𝑡 𝑜 _ 𝑒 _𝑜 _ 𝑜 1 ) † 11 2( 3 _𝑜 _ 𝑜 2 ) † 13 +[𝑜 𝑡 𝑡 3 ] + [𝑜 𝑡 𝑡 𝑒 𝑡 𝑜 _ 𝑒 ] + [𝑜 𝑡 𝑡 𝑒 𝑡 𝑜 _ 𝑒 ] 4( 𝑒 𝑡 𝑜 2 _𝑜 _ 𝑜 1 ) † 44 +[𝑜 𝑡 𝑡 3 ] + [𝑜 𝑡 𝑡 𝑒 𝑡 𝑜 _ 𝑒 ] + [𝑜 𝑡 𝑡 𝑒 𝑡 𝑜 _ 𝑒 ] 2( 3 _𝑜 _ 𝑜 2 ) † 44 +[𝑜 𝑡 𝑡 𝑒 𝑡 𝑜 2 ] + [𝑜 𝑡 𝑡 𝑒 𝑡 𝑜 _ 𝑒 ] + [𝑜 𝑡 𝑡 𝑒 𝑡 𝑜 _ 𝑒 ] 4( 3 _𝑜 _ 𝑜 1 ) † 33 +[𝑜 𝑡 𝑡 3 ] + [𝑜 𝑡 𝑡 𝑒 𝑡 𝑜 _ 𝑒 ] + [𝑜 𝑡 𝑡 𝑒 𝑡 𝑜 _ 𝑒 ] 2( 𝑒 𝑡 𝑜 2 _𝑜 _ 𝑜 2 ) † 13 +[𝑜 𝑡 𝑡 𝑒 𝑡 𝑜 2 ] + [𝑜 𝑡 𝑡 𝑒 𝑡 𝑜 _ 𝑒 ] + [𝑜 𝑡 𝑡 𝑒 𝑡 𝑜 _ 𝑒 ] 2( 𝑒 𝑡 𝑜 2 _𝑜 _ 𝑜 1 ) † 11 +[𝑜 𝑡 𝑡 𝑒 𝑡 𝑜 _ 𝑒 ] + [𝑜 𝑡 𝑡 𝑒 𝑡 𝑜 _ 𝑒 ] + [𝑜 𝑡 𝑡 𝑒 𝑡 𝑜 _ 𝑒 ] 2( 𝑒 𝑡 𝑜 2 _𝑜 _ 𝑜 1 ) † 12 +[𝑜 𝑡 𝑡 𝑒 𝑡 𝑜 _ 𝑒 ] + [𝑜 𝑡 𝑡 𝑒 𝑡 𝑜 _ 𝑒 ] + [𝑜 𝑡 𝑡 𝑒 𝑡 𝑜 _ 𝑒 ] ( 𝑒 𝑡 𝑜 _ 𝑒 _𝑜 _ 𝑜 2 ) † 11 +[𝑜 𝑡 𝑡 𝑒 𝑡 𝑜 2 ] + [𝑜 𝑡 𝑡 𝑒 𝑡 𝑜 _ 𝑒 ] + [𝑜 𝑡 𝑡 𝑒 𝑡 𝑜 _ 𝑒 ] ( 𝑒 𝑡 𝑜 _ 𝑒 _𝑜 _ 𝑜 2 ) † 12 +[𝑜 𝑡 𝑡 𝑒 𝑡 𝑜 2 ] + [𝑜 𝑡 𝑡 𝑒 𝑡 𝑜 _ 𝑒 ] + [𝑜 𝑡 𝑡 𝑒 𝑡 𝑜 _ 𝑒 ] back_or_for2 back_or_for2 4( 3 _𝑜 _ 𝑜 1 ) † 44 +[𝑜 𝑡 𝑡 𝑒 𝑡 𝑜 _ 𝑒 ] 2( 𝑒 𝑡 𝑜 _ 𝑒 _𝑜 _ 𝑜 2 ) † 44 +[𝑜 𝑡 𝑡 3 ] back_or_for2 47 2.2.3.3 Conjugate Gradient Algorithm The conjugate gradient (CG) 51 method is one of the most popular iterative algorithms for solving large systems of linear equations , where b is a known vector, and is a known square, symmetric, positive-definite (or positive-indefinite) matrix. CG, along with other iterative methods, is a fast, memory-efficient technique for computing the unknown vector when is a sparse matrix 42 . In addition, CG can be applied to minimize (or maximize) a quadratic function having the form The gradient of a quadratic form is defined to be For a given point , the gradient points in the direction of greatest increase of . To find the critical points of , we set the gradient field equal to zero. Applying the gradient formula to , we can derive 48 Setting the above equal to zero, we obtain or when is symmetric. Given inputs , , a starting value , a maximum number of iterations , and an error tolerance ε, the conjugate gradient algorithm is: We note that the algorithm will terminate when the error δ new is smaller than the initial error δ 0 multiplied by the user-defined tolerance ε 2 . However, δ 0 is derived using the initial x ⇐ random vector with norm 1 i ⇐ 0 r ⇐ b – Ax d ⇐ r δ new ⇐ r T r δ 0 ⇐ δ new While i < i max and δ new > ε 2 δ 0 q ⇐ Ad α ⇐ x ⇐ x + αd If i is divisible by 50 r ⇐ b – Ax else r ⇐ b – αq δ old ⇐ δ new δ new ⇐ r T r β ⇐ d ⇐ r + βd i ⇐ i + 1 49 random guess for the solution . This means that the quadratic function will be smaller than the initial value but not necessarily minimized when the algorithm terminates. Therefore, to ensure minimization, it is necessary to apply the algorithm twice using two different tolerance values ε (e.g. ε 2 = 10 -5 and ε 1 = 10 -6 ) and compare the final values of to confirm that they do not change significantly. In our application, experience has shown that the actual physical system determines how small an ε is required for minimization of . Specifically, nanostructures embedded in a substrate require smaller ε than free-standing systems 52 . This phenomenon is described in greater detail in Section 3.1.2. Additional instructions on how to verifying the minimization of are provided in Appendix A. 2.2.3.4 Calculation of Strain Components When the CG algorithm computes, the displacement vector u will be determined. To include the strain in the Hamiltonian, the displacement vector must be differentiated to obtain the 6 strain tensor components. Rather than storing these 6 matrices in memory, we choose instead to calculate the components on the fly from the displacement vector u when computing the vector as explained on p. 23. For example, at the location with indices , , : We note that using symmetric derivatives is not problematic in this case and where is the length of the simulation window in the th direction, and is the number of grid points in that direction. 50 One further complication is the conversion that must be made to obtain the physical strain for the points lying inside the nanostructure. To do this, we must check each position to see if it contains nanostructure material. One easy way to deduce this is to check the vector which will be zero at positions containing substrate material. In this case, the strain is computed normally as above. Otherwise, the physical strain must be adjusted as described sections 2.2.2.1 and 2.2.2.2 by subtracting the fictitious portion. In addition, the terms v and w for the cubic system in section 2.2.1 will be more complicated because they require a derivative operation on the vector q as described on p. 23 along with differentiation to obtain the required strain components. In that section, we saw that differentiation at the point with indices , , along , for example, requires the values of the material parameters at ( +1, , ), ( , , ), and ( -1, , ). Therefore, we must find the strains at these three positions, which will depend on , , , , and . All three positions must also be queried to see if they lie inside the nanostructure and adjusted accordingly. Finally, because points along the boundary (e.g. (1, , )) will require displacement terms up to two positions away for differentiation (e.g. and ), the displacement vector must carry a second edge frame. That is, its dimensions will be (-1:N X +1,-1,N Y +1,-1:N Z +1). 51 2.3 Piezoelectric Potential Calculation 2.3.3 Cubic System The piezoelectric effect refers to the localization of charge induced by changes in a strained crystal lattice which can alter the electronic structure of semiconductor devices 53 . For the cubic system, the first order polarization is derived from the strain fields 29,54 : 𝑒 where 𝑒 is the only independent piezoelectric coefficient that remains due to zinc blende (ZB) symmetry. The charge density is then obtained and used to solve Poisson’s equation for the corresponding electric potential, taking into account the material dependence of the static dielectric constant . The second equation below represents the linear system which must be solved for . The first term on the right hand side above refers to the true three-dimensional charge density, while the second is the contribution of polarization interface charge densities due to a discontinuous across interfaces. If all terms containing are moved to the left side, the equation above takes the form . This is a linear system with 52 . We note that the latter term in involves only a single derivative which operates on . This term causes our matrix to be non-symmetric and we cannot apply our previously developed CG algorithm to solve it. A more general algorithm is necessary to solve a non-symmetric linear system, which will be described in section 2.3.5. 2.3.4 Hexagonal System The first order polarization due to strain in hexagonal crystals is given by 49 : 𝑒 𝑒 𝑒 𝑒 Here, represents the spontaneous polarization present in hexagonal systems when growth occurs along the axis. The three distinct piezoelectric coefficients are 𝑒 , 𝑒 , and 𝑒 . As can be seen for the Wurtzite (WZ) system, both diagonal and off-diagonal strain terms contribute to the built-in electric field. As for a cubic crystal, the charge density is obtained by differentiating the polarization vector such that: However, in a hexagonal crystal, the dielectric is a tensor, rather than a constant so that: where 53 (In the cubic crystal, such that may be factored out.) For the hexagonal system, the equation becomes: More explicitly, the following equation must be solved for . As for cubic crystals, this is a non-symmetric system and can be solved using the same generalized algorithm described in the next section. 2.3.5 Generalized Minimum Residual Algorithm (GMRES) for Non- Symmetric Linear Systems The GMRES method was developed by Yousef Saad and Martin H. Schultz in 1986 42 and approximates the solution by a vector in the Krylov subspace similar to our Lanczos algorithm. In fact, GMRES uses the Arnoldi algorithm - a generalized form of the Lanczos method - to find the approximate solution. Because of memory considerations similar to the ones we discussed in the section 2.1.3.3, we implement an iterative form of the algorithm which allows us to restart the algorithm every steps. This lets us avoid having to store all the previously computed vectors in memory. The algorithm is outlined below: 54 V ⇐ 0 r ⇐ b – A V vh(1,:) ⇐ r/|| r || δ 0 ⇐ || r || β(1) ⇐ || r || While i < i max and || r || > εδ 0 do Hmat ⇐ 0 vh(2:m+1,:) ⇐ 0 do j = 1:m Avj ⇐ A vh(j,:) do i = 1:j Hmat(i,j) = <Avj, vh(i,:)> vh(j+1,:) = vh(j+1,:) – Hmat vh(i,:) end vh(j+1,:) = vh(j+1,:) + Avj Hmat(j+1,j) = ||vh(j+1,:)|| vh(j+1,:) = vh(j+1,:)/Hmat(j+1,j)) call lsf(β, m+1, y, m, Hmat, vmat, w, m+1, m) do k = 1, m V = V + vh(k,:) y(m) end r = b – A V vh(1,:) = r/|| r || β(1) ⇐ || r || i = i + 1 end 55 The subroutine lsf_eig.f which we use from Numerical Recipes returns a vector which minimizes the quantity – 𝑡 . This amounts to solving a least squares problem which can be accomplished using several different methods including QR decomposition and Givens rotations. The lsf_eig.f subroutine employs chi-square minimization based on Numerical Recipes. We are interested in the vector needed to form the approximate solution to on each iteration. However, care must be taken since lsf_eig.f will return the singular value decomposition into the matrices 𝑡 , 𝑡 and . Because 𝑡 is overwritten in this way, we must reset it to zero on each iteration before computing its new values to avoid errors. 2.3.6 Non-unique Solutions The non-symmetric nature of the above system introduces some difficulty in finding the physical solution to the piezoelectric potential which we seek. We know that the piezoelectric potential will have an antisymmetric shape, but the GMRES algorithm may converge to a symmetric solution or even a constant one. While these solutions satisfy the small residual condition required to end the iteration, upon inspection we observe that they do not have the expected spatial dependence which defines the physical potential. To ensure that the algorithm converges to the physical solution, we must use the Dirichlet (or fixed) boundary condition when solving for . In the GMRES algorithm above, the subroutine apply_boundcondV is called each time the matrix is applied to a vector. This subroutine sets all the vector’s terms on the edge of the simulation window to 0. Physically, this means that the electric potential must vanish at the edges. Furthermore, in solving the linear system 56 above, there are 3 other quantities which are differentiated and must have boundary conditions defined. The quantities are , , and the displacement vector u from which all the strain elements are calculated. Our experience shows that u may take on either fixed or periodic boundary conditions without affecting the outcome of the GMRES iteration. As for and , these quantities must be continuous at the boundaries because a discontinuity would introduce a boundary charge at the edge of the simulation window. Therefore, the periodic boundary condition is hard-wired in the program for both of these quantities. Our experience has shown that even after defining the boundary conditions in this way based on physical arguments, it is still possible that the algorithm will not yield the desired physical solution. We have observed that a constant solution is commonly encountered and it will be dependent on the initial value defined for V. Because our other algorithms use a randomly generated initial guess vector to begin the iteration, we initially chose a random vector for V as well. However, with this vector we observed that the output of GMRES was not only a non-physical solution, but also varied slightly each time. On the other hand, we noted that the iteration consistently yielded the physical solution when V was initially set to 0. For this reason, our GMRES algorithm begins by setting V ⇐ 0 as stated above. 2.3.7 Piezoelectric Effect on Electronic Structure Once the displacement vector u is computed, our program can also calculate the piezoelectric potential . This potential is a constant operator at each grid point and can be included in the electronic structure calculation in a straightforward fashion. The subroutine 57 piezham adds the piezoelectric contribution to each band in such that . The total Hamiltonian is then given by . Our program has options built in to allow the inclusion or exclusion of different components in the Hamiltonian. We can gain important insight into the contribution of each effect by carrying out simulations setting or and comparing the results to the total Hamiltonian. This approach is demonstrated in Chapter 3. 2.4 Material Parameters The material parameters used in this work are taken from Refs. 26, 36, 49 and 37 and are summarized in the Table 1 on the next page. The material parameters described therein are implemented in atomic units in our programs. This was done in order to avoid finite machine sized number issues for small values. Atomic units are obtained by converting energy quantities from eV to Hartrees, length dimensions in nanometers to Bohrs, and charges in Coulombs to electron charges. The conversion factors are listed below: 1 Hartree = 27.2116 eV 1 electron charge = 1.6 x 10 -19 Coulombs 1 Bohr = 0.0529 nm 58 Table 1. Material parameters of GaN/AlN and InAs/GaAs Parameters GaN (WZ) AlN (WZ) Parameters InAs (ZB) GaAs (ZB) a (nm) 0.3189 0.3112 a (nm) 0.60584 0.56533 c (nm) 0.5185 0.4982 E g (eV) 0.418 1.519 E g (eV) 3.44 6.28 Δ (eV) 0.38 0.33 Δ 1 (eV) 0.016 -0.0585 γ 1 19.67 6.85 Δ 2 (eV) 0.004 0.0068 γ 2 8.37 2.1 Δ 3 (eV) 0.004 0.0068 γ 3 9.29 2.9 0.20 0.28 E p (eV) 22.2 25.7 0.18 0.32 VBO (eV) 0.21 0 A 1 -6.56 -3.95 a c (eV) -6.66 -9.30 A 2 -0.91 -0.27 a v (eV) 0.66 0.7 A 3 5.65 3.68 b (eV) -1.8 -2.0 A 4 -2.83 -1.84 d (eV) -3.6 -5.4 A 5 -3.13 -1.95 C 11 (eV/nm 3 ) 5,206 7,569 A 6 -4.86 -2.91 C 12 (eV/nm 3 ) 2,829 3,425 VBO (eV) 0 -0.8 C 44 (eV/nm 3 ) 2,474 3,775 (eV) -9.50 -12.00 e 14 (C/nm 2 ) 1.2593 x 10 -22 4.4495 x 10 -22 (eV) -8.20 -5.40 15.15 12.9 D 1 (eV) -3.00 -3.00 D 2 (eV) 3.60 3.60 D 3 (eV) 8.82 9.60 D 4 (eV) -4.41 -4.80 D 5 (eV) -4.00 -4.00 D 6 (eV) -5.10 -5.10 C 11 (eV/nm 3 ) 2,438 2,475 C 12 (eV/nm 3 ) 906 856 C 13 (eV/nm 3 ) 663 675 C 33 (eV/nm 3 ) 2,488 2,331 C 44 (eV/nm 3 ) 656 725 e 15 (C/nm 2 ) -0.490 x 10 -18 -0.600 x 10 -18 e 31 (C/nm 2 ) -0.490 x 10 -18 -0.600 x 10 -18 e 33 (C/nm 2 ) 0.730 x 10 -18 1.460 x 10 -18 P sp (C/nm 2 ) -0.029 x 10 -18 -0.081 x 10 -18 10.01 8.57 9.28 8.67 59 Our implementation currently allows us to model configurations consisting of up to three materials. The cubic system includes parameters for InP as well. Multiple materials can be modeled with a straightforward modification of our programs. In this case, structures with compositional gradients (e.g. InxGa1-xAs) can also be modeled by interpolating the material parameter to represent the realistic stoichiometry. This is described in more detail in Chapter 4 and Appendix A. The strain calculation employs the elastic constants of the bulk materials. In addition, we have shown that the strains of uncovered QDs and freestanding NWs can be successfully calculated by setting the elastic constants of vacuum to very small values. This approach is described in full detail in Chapter 3. We note here that we have also attempted a similar approach in order to compute the electronic structure of freestanding NWs and uncovered QDs. We chose different values of and to represent the conduction and valence band energies of vacuum. In one case, we let = and set equal to a value much higher than the conduction band energy of GaAs. We also considered setting these values to equal the ionization energy of the material. However, the results were not satisfactory because the wavefunctions did not have the expected structure. We therefore focused our efforts on the strain calculation of freestanding structures. In the case of embedded systems, our model allows us to compute both the strain and the electronic states of the nanostructure. 60 2.5 Parallel Implementation 2.5.3 MPI (Message Passing Interface) Concepts Parallel computing is the process by which the workload required to complete a program is divided among different processors. If the workload consists mainly of independent sequences of operations, this division of labor among processors provides drastic reductions in completion times over sequential programs. Scientific computing is experiencing an ever- growing need for parallelization to enable realistic modeling of physical systems on a large scale. We have implemented a parallel program to solve for the electronic levels including strain effects of lattice mismatched semiconductor nanostructures. The software uses spatial decomposition to divide the simulation window among a number of processors as shown in Fig. 7. Because our algorithms consist of for loops which apply derivative matrices to vectors, computation at one point in space only requires knowledge of the vector values and material parameters at neighboring points. Therefore, each processor can perform the required vector operations respective to its own region. However, the processors must share data located on the edges of its own simulation window with the processors containing neighboring regions to enable accurate representation of the continuous space. This is accomplished through the Message Passing Interface (MPI). The program calls special subroutines throughout execution to exchange the necessary data among processors. Vectors on each processor have extra terms 61 which are copied from the other processors. The vector operations can then be completed normally by each individual processor. MPI is a communication interface used in parallel computing clusters that allows processors to share information among them. An MPI program must include the required header file (e.g. mpif.h for FORTRAN). MPI uses a special compiler and execution command mpiexec. Once the executable file is created, a batch job is submitted to the cluster’s scheduling system. This is accomplished by submitting a PBS (portable batch system) file which specifies the executable and requests the desired number of processors for the job. The job is scheduled and queued depending on the availability of resources. When the required nodes are free, the system executes the same file on all requested processors. Because our program consists mainly of independent vector operations over the entire simulation space, it is highly scalable and provides a nearly perfect speedup where is the execution time of the sequential program and is the execution time of the parallel program on p processors. The parallel program provides the added benefit of allowing memory requirements to be shared among many nodes, which improves performance by reducing the time required for data retrieval. The High Performance Computing Center (HPCC) at USC provides 2683 nodes – 963 dual core (4 processors) and 1534 quad core (8 processors). The memory is shared among the processors within one node, therefore, it is possible to increase the memory available to each processor by using fewer processors per node. These capabilities enable the simulation of realistic configurations consisting of multi-body nanostructure 62 systems. For example, a cubic simulation window of 150 nm on each side can be modeled on 216 processors and requires approximately ½ hour to complete. 2.5.4 MPI Subroutines Our software employs only a few basic MPI subroutines: MPI_INIT – initiates the MPI environment MPI_COMM_RANK – returns a processor’s uniquely assigned ID MPI_ALLREDUCE – collects data from all processors and distributes it to all processors MPI_IRECV – non-blocking receive call MPI_SEND – blocking send call MPI_WAIT – delays execution until all previous non-blocking calls have completed MPI_FINALIZE – terminates the MPI environment The MPI syntax in FORTRAN is nearly identical to that in C with the exception that FORTRAN treats the MPI functions as subroutines requiring a “call” and an additional argument for ierror is needed in all calls. The main program must define three integer values used by the MPI subroutines: integer status(MPI_STATUS_SIZE), ierror, request To initialize the MPI environment, the main program first calls MPI_INIT which requires only one argument: call MPI_INIT(ierror) 63 To obtain each processor’s rank, the program then calls MPI_COMM_RANK: call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror) Here, rank is an integer which will store the processor’s uniquely assigned ID. The argument MPI_COMM_WORLD signifies that the rank is assigned within the group consisting of all processors. It is possible to create subgroups of processors within the MPI environment with different specified tasks. However, because no such subdivision is carried out in our program, MPI_COMM_WORLD is always the default argument. (Another subroutine: call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierror), will return the total number of processors in the integer nprocs. However, our program does not need to know nprocs directly, and therefore, this subroutine is unnecessary.) The subroutine MPI_ALLREDUCE is called to collect data from a group of processors and combine it using a specified operation. Specifically, our program calls this subroutine when a dot product operation is required. Because each processor i contains only a portion of a given vector q and d given by q i and d i , the total dot product will be . The MPI_ALLREDUCE call allows us to collect all values, combine them via a sum operation, and distribute the total to all processors. The arguments are: call MPI_ALLREDUCE(local, global, 1, MPI_REAL8, MPI_SUM, MPI_COMM_WORLD, ierror) Here, the variable local stores the value to be sent (e.g. ) which will be combined with the local value of other processors. The global value will store the received value ( ) in each processor’s memory. The number 1 represents 1 value being sent and MPI_REAL8 means 64 that it is an 8 byte number. Finally, MPI_SUM specifies that the local values are to be added in this case. The subroutine MPI_IRECV, MPI_SEND, and MPI_WAIT are always called together in our program to carry out the application of periodic boundary conditions and are described in greater detail in the next section. Here, we will define the arguments to these subroutines: call MPI_IRECV(rec_buf, size, MPI_REAL8, from_rank, tag, MPI_COMM_WORLD, request, ierror) The variable rec_buf specifies where the received data is to be copied and size specifies how many numbers are being transferred of type MPI_REAL8. The variable from_rank indicates the processor rank that the rec_buf comes from, and tag is used as a label to identify this message. Here, request is used to signal the completion of the non-blocking call MPI_IRECV (see section 2.5.8). Likewise, call MPI_SEND(sen_buf, size, MPI_REAL8, to_rank, tag, MPI_COMM_WORLD, ierror) now transfers the size numbers of type MPI_REAL8 stored in sen_buf to the processors whose ID is to_rank and uses the tag as a label for this message. The subroutine MPI_WAIT must be used in tandem with the non-blocking MPI_IRECV call. It monitors the status of the request in MPI_IRECV and does not allow the process to continue until MPI_IRECV has completed. call MPI_WAIT(request, status, ierror) At the end of the main program, the MPI environment is finalized with the call: call MPI_FINALIZE(ierror) 65 2.5.5 Spatial Decomposition Once the MPI environment is initialized, calling MPI_COMM_RANK allows each process to access its unique rank assigned by the system at the onset of execution. For a job with P processes, each process is assigned an integer rank between 0 and P – 1. To achieve spatial decomposition, the program must then arrange the processors in a 3-dimensional grid which will be used to map the physical space. This is accomplished by using the process’s rank to calculate three vector indices describing its coordinates in the 3D grid. In order to do this, the user must first define the number of processors along each direction in the header file. For example the vector vproc(3) = (2,2,2) describes a simple cubic grid with 2 processes along each physical dimension , , and . Then, each process can define its vector indices as: vid(1) = rank/((vproc(2)*vproc(3)) vid(2) = mod(rank/vproc(3), vproc(2)) vid(3) = mod(rank, vproc(3)) For the process with rank = 0, vid = (0, 0, 0) and for rank = 1, vid = (0, 0, 1) and so on. To ensure correct execution, the user defined variable vproc must correspond to the same number of processors requested in the PBS file. 2.5.6 Parallelizing a Sequential Program A sequential program can be parallelized in a straightforward fashion by making specific modifications and inserting calls to MPI subroutines to share information among processors 66 when necessary. Any spatially dependent vector and matrix involved in derivative operations will need data from the processors assigned to the neighboring region of space. Because of this, such variables are defined in the header file to be larger than the processors’ spatial dimensions. For example, if N x = N y = N z = 5, each processor is assigned a (5 x 5 x 5) grid. A material parameter such as will be defined as - a (7 x 7 x 7) matrix. As we have mentioned in the previous chapter, the displacement vector u is an exception because it must carry two edge frames. Therefore, u(-1:N X +1,-1,N Y +1,-1:N Z +1) will be a (9 x 9 x 9) matrix in the example above. At the onset of the program, the subroutine setup defines the material matrices according to the specified geometry. This routine identifies the material present at each point in the simulation space according to the user defined geometry for the system, then assigns each material parameter at that point. Once each processor computes the different material matrix in its own region of space, it must copy the boundary information from adjacent processors. This is accomplished by calling the subroutine periodic_bcmat described in the next section. Here, this subroutine does not only apply periodic boundary conditions to the simulation space, but rather, it “stitches” the entire space together by transferring information between all adjacent processors. Applying this subroutine to all material matrices during setup gives all processors the necessary boundary material information to accurately complete their matrix operations. In the case when the user defined boundary conditions are not periodic (but rather fixed or transparent), the edge information of some processors must be modified. Our program 67 allows us to specify boundary conditions in each direction , , and independently, as well as to define two sets of boundary conditions – one for the strain calculation and one for the electronic structure calculation. Because of this, the subroutines which apply the corresponding boundary conditions are called after periodic_bcmat during setup. These routines only modify the boundary values of the processors containing the region of space on the edge of the entire window. For example, for fixed boundary conditions in the direction, the boundary values of the material parameters for the processors with vid(0, :, :) and vid(vproc(1)-1, :, :) are set to 0. Another modification required during setup is in the calculation of the finite difference elements , , and . In a sequential program, they are computed as dx = L x /N x , dy = L y /N y , and dz = L z /N z where L x , L y , and L z are the physical lengths of the entire simulation window. They must be changed to dx = L x /(N x *vproc(1)), dy = L y /(N y *vproc(1)), and dz = L z /(N z *vproc(1)). Furthermore, the geometry subroutines which identify the material present at each point , , must modify their physical length calculation – e.g. l x = i*dx + L x /vproc(1)*vid(1). Throughout execution, the boundary information must be shared among processors before a derivative operation is applied to a vector. If a dot product is calculated, the MPI_ALLREDUCE subroutine is used as described above. If a subroutine that correctly transfers boundary data is defined and introduced at the proper steps in the program, a sequential program can easily turn into a parallel version. The next section describes in more detail how our program defines such a subroutine. 68 2.5.7 Sharing Boundary Data To obtain the boundary values from its neighbors, the processors call the subroutine periodic_bcmat which takes a specified variable as input and copies the edge data from the neighboring processors into that variable’s boundary locations. This subroutine also sends its own edge data to its neighbors in a “rou nd-robin” fashion and serves as the basis for our MPI implementation. To begin, the processor must first identify the ranks of all its neighbors – 2 in each direction , , and . This is accomplished by computing pux, pdx, puy, pdy, puz, pdy: sen = mod(vid(1) + 1, vproc(1)) pux = vproc(3)*vproc(2)*sen + vproc(3)*vid(2) + vid(3) sen = mod(vid(1) – 1 + vproc(1), vproc(1)) pdx = vproc(3)*vproc(2)*sen + vproc(3)*vid(2) + vid(3) sen = mod(vid(2) + 1, vproc(2)) puy = vproc(3)*vproc(2)*vid(1) + vproc(3)*sen + vid(3) sen = mod(vid(2) – 1 + vproc(2), vproc(2)) pdy = vproc(3)*vproc(2)*vid(1) + vproc(3)*sen + vid(3) sen = mod(vid(3) + 1, vproc(3)) puz = vproc(3)*vproc(2)*vid(1) + vproc(3)*vid(2) + sen sen = mod(vid(3) – 1 + vproc(3), vproc(3)) pdz = vproc(3)*vproc(2)*vid(1) + vproc(3)*vid(2) + sen Transferring the shared boundary information is carried out in 6 steps. To illustrate, we use the material parameter : 69 1. First Transfer in a. Receive γ(0, :, :) from pdx b. Send γ(N x , :, :) to pux 2. Second Transfer in a. Receive γ(N x + 1, :, :) from pux b. Send γ(1, :, :) to pdx 3. First Transfer in a. Receive γ( :, 0, :) from pdy b. Send γ( :, N y , :) to puy 4. Second Transfer in a. Receive γ( :, N y + 1, :) to puy b. Send γ( :, 1, :) to pdy 5. First Transfer in a. Receive γ( :, :, 0) from pdz b. Send γ( :, :, N z ) to puz 6. Second Transfer in a. Receive γ( :, :, 1) from puz b. Send γ( :, :, N z + 1) to pdz Because the displacement vector u must transfer data for two edge frames, we define a separate subroutine for it – periodic_bcxv which also uses 6 steps, but each step is longer. For example, the first two steps require: 1. First Transfer in x a. Receive u(0, :, :) from pdx b. Send u(N x , :, :) to pux c. Receive u(-1, :, :) from pdx d. Send u(N x - 1, :, :) to pux 2. Second Transfer in x a. Receive u(N x + 1, :, :) from pux b. Send u(1, :, :) to pdx c. Receive u(N x + 2, :, :) from pux d. Send u(2, :, :) to pdx 70 2.5.8 Implementation Issues In this section, we will take a closer look at our “round robin” data exchange implementation, highlighting timing considerations and pitfalls to avoid. We begin by examining the steps outlined above in greater detail. All of the transfers actually begin by first copying a variable’s edge data tha t is to be sent into a buffer – sen_buf. The data that is received is also initially copied into a different buffer – rec_buf and then copied into the target variable’s edge position. For a variable we have: First Transfer in x: do i = 0, N y + 1 do j = 0, N z + 1 sec_buf((N z +2)*i + j + 1) = γ(N x , i, j) end do end do call MPI_IRECV(rec_buf, numx, MPI_REAL8, pdx, 10, MPI_COMM_WORLD, request, ierror) call MPI_SEND(sen_buf, numx, MPI_REAL8, pux, 10, MPI_COMM_WORLD, ierror) call MPI_WAIT(request, status, ierror) do i = 0, N y + 1 do j = 0, N z + 1 γ(0, i, j) = rec_buf((N z +2)*i + j + 1) end do end do It is safer to first buffer the sent and received data to avoid run time errors. The code above achieves both a transfer of data from as well as receiving data into . However, using as a direct argument to the MPI_SEND and MPI_IRECV functions could lead to errors. 71 Run time errors can manifest themselves differently depending on the compiler used. Some compilers may change the order of instructions if no data dependencies are found. Our experience found that this can be a source of errors with the code above because some FORTRAN compilers will not detect the data dependency of the sec_buf in the MPI_SEND call on the copy operation before it. As a result, they may reorder the instructions to copy data into sen_buf after the MPI_SEND call to transfer its contents. This problem stems from the fact that MPI was originally developed as a C interface and some FORTRAN compilers do not have the proper checks for data dependencies implemented for MPI subroutines. One way to avoid this pitfall is to make any variable that is an argument to an MPI call a global variable. In this case, the compiler will not reorder instructions which modify these variables. Therefore, our program defines all MPI arguments (e. g. rec_buf, sen_buf, numx, pux, pdx, request, ierror, status) in the header file or in the main program. We also note that while the shared data comes from a multi-dimensional array , the intermediate send and receive buffers are one-dimensional. This was implemented for more efficient use of memory allocation. The one-dimensional buffers can be used to copy data from arrays of any size and dimension, and therefore, these buffers can be used for all transfers as long as they are defined to have the maximum possible size for transferred data. This maximum size is based on the displacement vector u which must transfer 2 extra frames. In general, we will also have a different number of grid points along each dimension , , and . Therefore, the maximum size is taken to be dim = max(N x , N y , N z ) and the buffers are defined as sen_buf((dim + 4)*(dim + 4)) and rec_buf((dim + 4)*(dim + 4)). 72 Although this method provides efficient reuse of memory, great care must be taken when computing the values of numx, numy, and numz which define how many data points will be transferred. Our experience has shown that the compiler will not catch out of bound errors in indexes and arrays and it is possible that data being received and copied locally can overwrite other adjacent data if the intended array is overflowed. This will introduce error in the results at run time which are very difficult to trace. For this reason, the starting indexes of copied data to be transferred must also be carefully verified. In particular, we note that transferring a given frame (e.g. γ(N x , :, :)) includes all boundary terms (e.g. γ(N x , 0, 0) and γ(N x , N x +1, N x +1)). The sequence of transfers outlined above will result in a variable containing data for its entire boundary, including corner terms like γ(0, 0, 0) and γ(N x +1, N x +1, N x +1). If we only transferred γ(N x , 1: N x , 1: N x ) for example, these corner terms would remain 0. The corner terms are necessary for only one operator, namely the mixed derivative which requires knowledge of the terms displaced in two simultaneous directions, e.g. . Another common problem in all parallel implementations is the issue of deadlock. In certain instances, none of the processors are able to advance because they simultaneously attempt to send data to one another, but none of them has issued a receive call. This can happen when blocking calls are used. For example, our data transfer could have been implemented with just two blocking calls: call MPI_SEND(sen_buf, numx, MPI_REAL8, pux, 10, MPI_COMM_WORLD, ierror) call MPI_RECV(rec_buf, numx, MPI_REAL8, pdx, 10, MPI_COMM_WORLD, request, ierror) In this case, each processor first issues a call to MPI_SEND. Because it is a blocking call, it does not return control to the main program until after it completes the send operation. The 73 operation is complete when the process has transferred all the data to be sent into a buffer controlled by the kernel. The process then proceeds to the blocking receive call which issues a message that it is open receive data with the specified tag. This operation is complete once the kernel delivers the appropriate message buffer. In most cases, this sequence of calls will not lead to deadlock. However, we must consider a case where the amount of data to be transferred is large and will not all fit into the kernel’s allocated memory buffer. In this case, the send operation will copy enough data to fill the kernel’s buffer, but it will not return control to the main pr ogram. The kernel will attempt to deliver the filled buffer to the corresponding process, but it will fail to do so, since none of the processors are able to issue their receive call. This problem is not encountered often in modern operating systems where the buffer size is normally large enough for most applications. Nevertheless, it can be difficult to verify the exact buffer size allocated by the operating system for such purposes, and our applications can require large amounts of data to be transferred in realistic simulations. Therefore, we have chosen a non-blocking implementation to avoid deadlock. In our case, the non-blocking receive call is issued first. This call allows the process to advance even if the operation has not completed. The send call can then transfer its data even if the kernel must fill its buffer multiple times. Finally, the wait call allows execution to complete only after the receive call completes. 74 FIG. 7. A 2D illustration of data sharing of boundary values among processors (P 0 – P 3 ). The vectors on each processor store boundary data copied from its neighbors in the gray areas depicted above. The arrows indicate how boundary terms are copied from neighboring processors in each direction and . Parallel Implementation P 0 P 3 P 2 P 1 • Parallel program for strain and electronic structure has been implemented. • Spatial decomposition using message passing interface (MPI) enables scaling on thousands of processors. • Nearly perfect speedup, reduced memory requirement per node • 150 nm x 150 nm x 150 nm simulation window on 216 processors in ½ hour. • High Performance Computing Center (USC) offers 2683 nodes ~ 16,000 processors. 75 3. Results and Analysis 3.1 Properties of InAs Nanostructures Grown on (001) GaAs Substrates 3.1.1 Strain –Dominated Quantum Dot Growth In this section, we study uncovered QDs to understand the role played by strain in the growth and formation of these structures. We use dimensions for the QDs which have been previously observed in Ref. 1. In the Stranski-Krastanov growth mechanism, material deposited on a lattice-mismatched substrate will form into small clusters called QDs. This configuration allows the material to reduce its strain energy compared to a thin film. For solar cell applications, it is desirable to deposit more than a few monolayers (ML) in order to maximize absorption. As more material is deposited however, accumulated strain will eventually lead to the formation of dislocations which significantly reduce the efficient conversion of absorbed light to photocurrent. The growth of thin films deposited on a lattice mismatched substrate is FIG. 8. Vertically stacked InAs QDs grown on (001) GaAs substrate from Ref. 1. We use the dimensions observed in that work to model the QDs in this study. 76 known to be limited to a certain critical thickness beyond which the defect density becomes too significant for practical applications. In a system which consists of InAs deposited on a GaAs substrate with relatively high lattice mismatch of 7%, experiments have shown that the critical thickness is only 1-2 monolayers (ML). Nevertheless, it is postulated that quantum dots can release strain by relaxing in the lateral direction, allowing growth of more defect-free material. It has been observed through experiments that QDs formed during Stranski-Krastanov growth appear to reach critical dimensions in both the lateral and vertical dimensions. In order to quantify how QD size and shape is driven by strain, we apply our linear elastic model to calculate the strain energy in QDs with varying dimensions. FIG. 9. Strain energy density along the direction for single layer InAs QD with diameters 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, and 60 (blue, red, green, pink, orange, cyan, black, dashed blue, dashed red, dashed green, dashed pink, and dashed orange curves respectively) grown on GaAs substrate. 77 Our grid spacing is equal to , the lattice constant of GaAs. We apply periodic boundary conditions on the displacement u at the edges of the simulation window. Quantum dots initially form on a surface surrounded by vacuum and we model this by setting the elastic constants in the empty region to very low values. The vacuum is otherwise treated as substrate material with the same grid spacing. Our results show that this approach produces the desired effect of introducing lateral relaxation in QDs with uncovered surfaces. We first compute the strain energy density of single layer QDs with circular base as their diameter increases from 5 to 60 nm. If our simulation window has bounds of we plot the strain energy density as a function of , when , and , where is the position of the QD layer. Figure 9 plots the results for 10 QDs with different diameters. It is apparent that the strain energy increases asymptotically towards a maximum as the diameter increases. There is little difference in the energy density for QDs with diameters 25 nm and above, while QDs with diameters of 5-20 nm have energy densities of 60-70% of this maximum. For thin films, the dependence of the strain energy density on height is derived from linear elasticity theory to be: (1) Here, is the shear modulus, is the Poisson ratio, and is the lattice mismatch. We note that the asymptotic energy in Fig. 9 is approximately 0.57 𝑒 which is what we obtain from the above equation with Thus, our results indicate that as the lateral dimension increases, the strain energy approaches that of an infinite thin film. 78 We next calculate the strain energy density of QDs composed of 2 layers. The bottom layer has diameter of 30 nm, while the second top layer has varying diameter from 5 – 30 nm. Figure 10 plots the strain energy density per volume through both top and bottom layers for this series of QDs. We observe that depositing a second layer leads to a decrease in the strain energy density in the bottom layer. This suggests that the second layer helps align the first layer more closely with its natural lattice constant. The presence of a second layer may also induce a transfer of strain energy into the surface substrate layer. We examine this effect in more detail in section 3.1.2. As Fig. 10 shows, the QDs with a smaller top layer have lower energy density. The energy density per area is higher for QDs with nearly equal top and bottom layers. This is another effect of lateral relaxation, and helps explain the shape of QDs formed through Stranski-Krastanov growth which have truncated shapes with low aspect ratios. We next calculate the strain energy density for single layer QDs in a configuration having a truncated disk-shaped QD embedded within the substrate. We again base our calculations on the QD stacks fabricated in Ref. 1 taking the base diameter of the embedded QD to be 20 nm FIG. 10. Strain energy density for 2-layer uncovered QD. (a) The bottom layer (dashed lines) has diameter of 30 nm and top layer diameter (solid lines) varies between 5 – 30 nm is 5 nm increments. (b) Total strain energy (top + bottom layers) of QD in (a). Red, blue, green, pink, orange, and cyan curves correspond to top layer diameter of 5, 10, 15, 20, 25, and 30 nm respectively. (a) (b) 79 and the top diameter to be 14 nm. The height of the QD is 3 nm and the distance from the top to the substrate surface is approximately 6 nm, equivalent to 11 ML. For the single layer QD formed on the substrate surface, we again vary the diameter between 5 and 60 nm. Figure 11 (a) shows the resulting strain energy densities per volume along the direction through these QDs. It can be seen that the energy density is reduced by 20-30% over a region spanning 20 nm through the QD centers. However, beyond this region, the strain energies begin to increase towards a slightly higher level than the maximum attained in Fig. 9. Nevertheless, this increase occurs over an additional 10 nm on either side, so that QDs with diameters of 15 – 35 nm experience a lower energy density throughout compared to those in Fig. 9. This result explains the experimental observation that embedded QD layers allow larger ones to form on top due purely to the effects of strain. It is remarkable that although the amount of QD material is very small compared to the substrate, it can significantly alter the strain fields in the surrounding medium and layers grown on top. To solidify our understanding of this phenomenon, we carry out two more calculations. In the first one Fig. 11 (b), we enlarge the base diameter of the embedded QD to 30 nm and the top to 24 nm. In the second case Fig. 11 (c), the dimensions are the same as in Fig. 11 (a), but we decrease the distance from the QD top to the surface to only 1.5 nm. This leaves = 3 ML between the top of the QD and the substrate surface. Figure 11 (b) shows the results for the first instance of a QD with wider diameter. The results for small QDs match closely with those in Fig. 9. The wider base QDs reveal a region of constant strain energy which is wider than in Fig. 9. We observe that this single layer QD also approaches a maximum in the center similar to Fig. 9, but at a level lower than that 0.57 𝑒 . This lower energy level is about 0.4 𝑒 80 which corresponds to an equivalent “lattice mismatch” of 6%. In Fig. 11 (c) we present the results from the second instance of the embedded QD very close to the substrate surface. The results show an even greater reduction than Fig. 9 of 30- 45% in strain energy density in the 20 nm region over the embedded QD. The maximum energy level in this region drops further to 0.3 𝑒 with an equivalent lattice mismatch of 5.2%. We note however, that the energy on the QD edges rise above 0.6 𝑒 - higher than in Fig. 9. In 3D space, the strain energy forms a valley surrounded by a ring of more highly FIG. 11. Strain energy density of single QD layers with diameters D = 5 – 60 nm grown over an embedded QD below the substrate surface as shown in (d). (a) B1 = 20 nm, B2 = 14 nm, H = 3, y 0 = 6 nm (11 ML). (b) B1 = 30 nm, B2 = 24 nm, H = 3, y 0 = 3 nm. (c) B1 = 20 nm, B2 = 14 nm, H = 3 nm, y 0 = 1.5 nm (3 ML). Solid blue, red, green, pink, orange, cyan, and black curves represent diameters of 5, 10, 15, 20, 25, and 30 nm respectively. Dashed blue, red, green, pink, and orange curves represent diameters of 35, 40, 45, 50, 55, and 60 nm respectively. y 0 (a) (b) (c) (d) 81 strained material. Because the energy is reduced by a much larger amount in the center of the QD, the overall QD energy is smaller for this configuration compared to Fig. 9. Our results confirm that nanostructures embedded within lattice mismatched substrates exert forces on the surrounding crystal structure which can be modeled using continuum elasticity theory. Because in a realistic fabrication process, it can be difficult to control the placement of QD layers relative to the QD configuration below, we also seek to understand how misalignment of QDs relative to the embedded ones can affect strain energy. We model two surface QDs both with 5 nm diameter, while the embedded QD has a tapered disk shape with 20 nm base and 14 nm top layer. Its height is about 3 nm and it is embedded 6 nm away from the surface. This is the same as the QD in Fig. 11 (a). The QDs on the surface have varying center-to-center spacing given by d i = 7 nm, 11 nm, 15 nm, 19 nm, 23 nm, and 27 nm. As this distance increases, the QDs move beyond the edges of the embedded QD, such that for 27 nm, they are located beyond the edges of the embedded QD. The strain energy is plotted in Fig. 12 for the six inter-dot distances d i . It is apparent that the embedded QD serves to greatly reduce the strain when the QDs are directly on top of it. As they move toward the edges of the QD, the strain energy increases. This is consistent with previous observations and with the strain energy profile of the embedded QD which showed that the strain energy was greater at the edges.Comparing with Fig. 11 (a), the strain energy for a single 5 nm diameter QD on top of an embedded QD (solid blue curve) was 0.2273 𝑒 , while in the absence of an embedded QD (Fig. 9), the energy was 0.3375 𝑒 (solid blue curve). In the current case, the energy of the QDs with d i = 7 nm which are entirely 82 aligned with the embedded QD is 0.2451 𝑒 , while for d i = 27 nm, the energy is 0.3469 𝑒 . Apparently, the two-QD configuration yields slightly higher strain energies. This may be due to the interaction between the strain fields of the two QDs in close proximity. Our results indicate that we can expect the strain energies for clustered QDs to be up to 10% higher than for isolated QDs. In this section, we have shows the effects of lateral relaxation and strain fields from embedded nanostructures on the strain energies of QDs with one or two layers. In the next section, we turn our attention to freestanding nanowires (NW) with many layers grown in the direction. We determine the expected critical dimensions of the NW which favor defect-free growth versus dislocation generation. FIG. 12. Two uncovered single-layer QDs with diameters of 5 nm and inter-dot distance d i varying between 7, 11, 15, 19, 23, and 27 nm (red, blue, green, pink, cyan, and black curves, respectively). The QDs are grown on top of a truncated QD as in Fig. 11 (d) with B1 = 20 nm, B2 = 14 nm, H = 3, y 0 = 6 nm (11 ML). d i 83 3.1.2 Critical Height Prediction for Highly Strained Nanowires Using Continuum Elasticity Free-standing semiconductor nanowires (NW) fabricated with novel crystal growth techniques have become of interest to researchers as potential innovative devices 4-9,55-57 and most recently as possible components of high efficiency solar cells. 58-61 Thin pillars of a low band-gap material such as InAs or InP can be grown on higher band-gap substrates such as GaAs or Si using selective area growth 18,62-72 or vapor-liquid-solid (VLS) phase epitaxy. 20-22,73-74 A major technological challenge in fabricating such nanoscale heterostructures is the generation of lattice defects such as misfit dislocations caused by the mismatch of lattice constants between different semiconductors. 17,75-78 This in turn introduces recombination mechanisms for charge carriers and consequently degrades the device efficiency. For example, the growth of coherent thin films is limited to a few monolayers (ML) of InAs on GaAs due to the accumulated strain. 79 It is postulated that the inherent strain due to lattice mismatch between the two materials will be lower than for thin films, 80 because the NWs are free to relax in the lateral direction. 81 Experiments have indicated that NWs with many more layers than thin films can be grown without inducing such defects. 82 Because strain plays a critical role in the formation of NWs, 83 theoretical methods 84 have been developed to understand its effects on these structures. Analytical methods have been used to model NWs with simplified geometries, such as those grown on rigid substrates which are assumed to take on no strain 85 or lattice mismatched NW layers on a semi-infinite NW. 86 However, substrate relaxation is expected to play an important role on NW formation, and it is desirable to develop a method of calculating strain energy for free-standing NW systems of 84 arbitrary geometries. In this paper, we implement a numerical method that allows us to calculate strain and strain energy in NW of realistic geometries on massively parallel computers. As a specific example, we calculate the strain energy of a free-standing InAs NW formed on a GaAs substrate as a function of rod diameter and height, and compared it to the formation energy of misfit dislocations. The results demonstrate a critical role of substrate relaxation on defect-free growth of InAs NWs on GaAs. Only in the presence of substrate relaxation can the NW grow to infinite length without emitting dislocations below a critical diameter. 87 We use continuum elasticity theory to calculate spatial distribution of strain and strain energy for free-standing NWs grown on a substrate as a function of NW height. Figure 13 shows a schematic of the simulated NWs with disk-shaped cross-sections having diameter d and FIG. 13. Simulated NW monolayers grown on a substrate. The dots indicate the center point of each layer in the plane where the strain energy per area is calculated. The Cartesian axes are also shown. The substrate thickness is denoted as d sub , and is fixed at 18 ML. We have verified that the total strain energy in the NW agrees to within < 1% for systems with d sub = 18 ML and d sub = 36 ML. h d d sub y x z 85 heights of 1-h ML. The three-dimensional space is discretized with a cubic grid of spacing a GaAs , the lattice constant of the substrate. 48 The NW material is treated as expanded substrate material with different elastic constants given in Ref. 26. Above the substrate, the NW is surrounded by vacuum, numerically represented by very small elastic constants. As a reference system, a completely rigid substrate is also considered for which the elastic constants are chosen to be 3-4 orders of magnitude greater than those of GaAs. The NW’s physical strain is calculated by converting the computed strain from the substrate’s reference frame to that of the InAs crystal structure. 48 The substrate thickness, d sub , is fixed as 18 ML. We have verified that the total strain energy in the NW changes only < 1% when d sub is increased to 36 ML. We have designed scalable algorithms to solve the linear elasticity problem with the CG method on parallel computers based on spatial decomposition and implemented them using message passing. 42,88 The CG iteration terminates when the value of the residual vector norm (where is the current estimate after iterations for the solution is reduced by a specified factor after iterations such that . When comparing the resulting (minimized) strain energy for different values of , we find that smaller values of (more iterations) are required for convergence of the total strain energy when modeling free-standing NWs compared to structures that are completely embedded in GaAs. To test the numerical convergence of the strain energy as the strain constants to represent vacuum, , are decreased, we carry out several calculations of strain energy on a NW system using different values for and . In Fig. 14 (a), the total strain energy in an InAs NW with 5 nm diameter and h = 100 ML is plotted as a function of vacuum strain constant, all three strain constants being set to the same value shown on the axis. The blue, red, and 86 green curves represent values of 10 -5 , 10 -6 , and 10 -7 , respectively. The strain energy converges within 0.6% for strain constants of and 10 -7 . In Fig. 2 (b), we compare the decrease in residual norm vector with the number of iterations for the NW in (a) along with a 200 ML nanorod of the same diameter and material system. We note that for the 200 ML NW (red curve), the residual norm decreases at a slower rate, such that more iterations are required to converge to within = 10 -6 . For greater values, the difference in the required number of iterations grows relative to the h = 100 ML NW (blue curve). FIG. 14 (a) Total strain energy in NW with d = 5 nm and h = 100 ML computed for different values of and = 1x10 -5 , 1x10 -6 , 1x10 -7 (blue, red, and green curves respectively). (b) Residual vector norm vs. CG iterations NW with d = 5 and h = 100 ML (blue curve) and h = 200 ML (red curve). (c) Total strain energy in NW with d = 5 nm and h = 20 ML grown on rigid substrate vs. values with and (blue), and (green) and and (red). (d) Residual vector norm vs. CG iterations for NW with d = 5 and h = 20 ML grown on GaAs and rigid substrates (blue and red curves respectively). 87 In Fig. 14 (c), we plot the convergence of strain energy for a 5 nm diameter NW with h = 20 ML growing on a rigid substrate, as a function of vacuum strain constant value, termination criterion of iterations ( ), and the rigid substrate strain constant value denoted by . For example, when , each strain constant in the rigid substrate system is 4 orders of magnitude greater than the corresponding value for GaAs. We observe little difference in the strain energy when the vacuum strain constant is increased from to (blue curve). In this case, decreasing the value to 10 -7 also has little effect (red). Nevertheless, when the rigid substrate value changes from 3 to 4, an increase of less than 10% is observed in the total strain energy (green). This is expected since a more rigid substrate induces greater strain in a nanorod. We will explain the mechanism of this phenomenon later in this section. Figure 14 (c) indicates that for the rigid substrate material, a value of is required for convergence within less than 10%. However, since we will show that the strain energy obtained with is already too large for coherent growth, we carry out our analysis based on this value. Figure 14 (d) compares the convergence rates of a rigid substrate system (red curve) with those of a GaAs substrate system (blue curve) by plotting the decreasing residual norm with the number of iterations for a NW with d = 5 nm and h = 20 ML. It is apparent that a rigid substrate system converges at a much slower rate than the GaAs case and requires smaller values of , consistent with our previous observation for highly disparate elastic constants. However, Fig. 14 (a) suggests that total strain energy convergence is ensured for an absolute value of residual norm which is approximately 10 -1 – 10 -2 . The higher required values of for the rigid substrate case are explained by the higher initial value of the residual vector 88 norm — = 10 7 for the rigid substrate, compared to = 10 4 (Fig. 14 (d)). To account for this 3 orders-of-magnitude difference in the starting points of the two iterations, the user defined value must be set accordingly. The above analysis provides a guideline for achieving convergence by choosing proper values for , , and . We have confirmed the convergence of the strain energy as we set the strain constants to smaller values to represent vacuum. When considering free- standing NWs growing on a rigid substrate, still smaller values of are required for convergence. When modeling a rigid substrate, we also observe the convergence of the strain energy with greater values for the rigid substrate strain constants represented by . Thus, the more the material constants in the system differ from one another, the more iterations will be required to ensure convergence. 89 The increase in strain energy density (energy/area) with height for thin films has been previously derived as 90 : , where is the shear modulus, is Poisson’s ratio, and is the lattice mismatch. The critical thickness h c for certain thin films has been predicted by equating with the dislocation energy density for a thin film. 90 While this analysis yielded accurate solutions for films with lattice mismatch below 4.2%, systems with greater mismatch could not be studied with this method because the predicted strain energy was always greater than the dislocation energy for all values of . Nevertheless, our calculations show that InAs/GaAs structures (~7.2% lattice mismatch) with finite lateral dimension have significantly lower energy due to relaxation in the outward direction, allowing the estimation of the using this approach. As we have seen in 89 Fig. 9, NWs with diameters of 5 – 10 nm have strain energy density that is 60 – 70% that of a thin film, which suggests that more layers can be grown coherently as long as the total energy density remains below the formation energy of dislocations. Figure 9 also suggests that a critical diameter also exists for which the NW’s strain energy will be greater than the dislocation energy for all heights. To determine the critical height and diameter , we apply our numerical calculation to compute the total strain energy of multiple-layer NWs with diameters between 5 – 7 nm, and heights of h = 1 – 6 ML, and compare to the expected dislocation energies of these structures. The dislocation energy is calculated as in Ref. 91 to account for the finite diameter of the NW using Matthew’s model for two edge dislocations such that , (3) where is the radius of the nanorod, and h is the nearest distance between the dislocation and a free surface. 92 Based on Ovidko’s met hod 16 and Glas’ extension, 86 is defined as (4) where is used to account for the circular cross-section of the nanorod. Following Ref. 91, the strain energy is defined as , (5) where is the residual strain energy present in the system after a dislocation is introduced, which is computed using continuum elasticity assuming a smaller residual lattice mismatch. We have computed the residual strain energy, but found that it is negligible compared to and does not affect our determination of critical nanorod diameter. Therefore, we let . 90 Figure 15 demonstrates that the total strain energy in a NW with diameter ≤ 6 nm reaches an asymptote with height, remaining below the dislocation energy. In this regime, defect-free growth is favored. Nevertheless, for NWs with diameters greater than 6 nm, the strain energy immediately exceeds the dislocation energy at the onset of growth. This observation is consistent with previous works demonstrating the existence of a critical diameter marking the transition from infinite to finite critical height. For our particular material system with large lattice mismatch of 7.2%, the sharp transition of the critical height from infinite to 1 ML, as well as the critical diameter below 10 nm, are consistent with the results reported in Refs. 86 and 93. We also note from Fig. 15 that a NW with = 5 nm grown on a rigid substrate (dot-dash black curve) has much higher strain energy that exceeds the dislocation energy at the onset of growth. This observation suggests that substrate relaxation plays a crucial role in defect-free NW formation. To gain insight into the role of substrate relaxation, we compare the individual hydrostatic strain components for several geometries of InAs structures growing on both relaxed (GaAs) and rigid substrate in Fig. 16 (a) – (f). The physical strain components xx and zz are plotted relative to each material’s lattice constant described in Ref. 48 along the axis through the center point of the plane as depicted in Fig. 13. (As expected due to the symmetry in the plane, the yy component was identical to xx in our calculations.) We plot the strains through the top substrate layers that interface with the NW layers. In Fig. 16 (a), we present the results of our simulations for a single layer of InAs with (i.e. film) deposited on top of a GaAs substrate. The InAs material at 1 ML from the interface has negative strain components and indicating compression with magnitude equal to . This 91 corresponds to the maximum lattice mismatch between the two materials as computed in the InAs frame of reference: . This confirms the expected result that an infinite thin film will have strain equal to the lattice mismatch between the two materials. The zz component is positive, indicating tension as expected from the Poisson effect, and its magnitude is approximately 0.03. As we have seen in Fig. 9, a thin film with finite lateral dimensions will approach the same strain energy as a thin film for large enough diameters. Figure 16 (b) plots the strain components of the 60 nm diameter single layer NW, which is expected to closely resemble a FIG. 15. Total strain energy vs. height for InAs nanorods with 5, 6, and 7 nm diameter (solid red, green, and blue curves respectively). Dashed curves depict corresponding dislocation energies . Dot-dash black curve depicts the total strain energy for 5 nm diameter nanorod grown on a completely rigid substrate. Because the strain energy exceeds the dislocation energy for the 7 nm diameter nanorods, the critical diameter is 6 nm. 92 thin film. Indeed, the strain components confirm this by comparison with Fig. 16 (a). Although the magnitudes of the strain components are almost the same as in Fig. 16 (a), all three of them are slightly smaller for the finite dimension NW. As we expect, this is due to the NW’s ability to relax in the radial direction in order to minimize its total strain energy. In Fig. 16 (a) and (b), we also observe that the strain components in the substrate are non-zero. The xx component is slightly positive, indicating a small amount of tension in the substrate to accommodate the InAs monolayer’s larger lattice constant. In addition, the zz component is significantly greater than zero in the top substrate layer, with magnitude that is nearly equal to the zz component in the InAs monolayer. This indicates that the top substrate layer is also under tension in the direction in the same manner as the adjacent lattice mismatched material. To understand why this configuration serves to minimize the total overall strain energy, we recall that strain is defined as the derivative of a displacement vector u. Thus a displacement in the direction in the NW layer must induce a similar displacement in the adjacent substrate layer to ensure a small overall displacement change. For the lattice mismatched InAs monolayer, the expansion along serves to offset the compressive strains in and . As seen from the free energy equation in section 2.2.2.1, the opposite signs of the transverse and longitudinal strain components serve to reduce the strain energy as predicted by the Poisson effect. With this understanding of the relationship among the strain components, we turn our attention to the rigid substrate system. Figure 16 (c) and (d) present the results for a 5 nm diameter single layer NW formed on a rigid substrate (c) compared to GaAs (d). We note that the xx for the rigid substrate in the NW again has a value of as in Fig. 16 (a). However, the zz component is nearly zero 93 as in the substrate layer where the strain is strictly zero. The implication for the InAs monolayer is significant in that it cannot expand in the direction to offset the compression in and . Thus, the significant increase in energy for a rigid substrate compared to a relaxed substrate observed in Fig. 15 originates because the NW on the rigid substrate is unable to reduce its energy through expansion in the transverse direction. FIG. 16. Hydrostatic strain components and (blue and green curves respectively) along direction through center of plane (see Fig. 13) for InAs structures on GaAs ((a), (b), (d)) and rigid substrates ((c), (e), (f)). Nanostructure dimensions are in (a), (60 nm, 1 ML) in (b), (5 nm, 1 ML) in (c)-(d), and (5 nm, 5 ML) in (e)-(f). Shear strain components , , and are negligible. 94 Finally, in Fig. 16 (e) and (f), we examine the changes in the strain components as more layers are added to the NW systems of Fig. 16 (c) and (d). We plot the strain components of the 5 nm diameter NW having = 5 ML in both cases. As expected, there is little change in the NW strain as more layers are added, when the substrate is rigid. We note that the xx component reaches a maximum in the first NW layer, while the zz component peaks in the second layers. All components decay to zero at a height of 5 ML. For the system in Fig. 16 (d), the strain is shared among both NW and substrate. The NW is able to maintain smaller strains overall, and exhibits a cross-over from compressive (tensile) strain in xx ( zz ) to tensile (compressive) strain between the fourth and fifth monolayer. Thus our results demonstrate that distributed strain among both NW and substrate surface layers is required for coherent structure formation. A completely rigid substrate would not be conducive to defect free growth. In summary, we have used continuum elasticity theory to estimate the critical diameter for defect-free growth of NWs with finite lateral size. We have found that lateral relaxation allows highly strained GaAs/InAs NWs to grow coherently provided that the diameter is 6 nm or below. Beyond this base size, the relaxation is negligible and the NWs behave like thin films. In addition, we have shown that the GaAs layers at the substrate surface play an important role in the existence of coherent free growth of the NW. Our model indicates that a completely rigid substrate would induce a much higher strain in a NW, eliminating the existence of a critical radius. Finally, we have demonstrated a methodology that enables strain calculation of free- standing nanostructures systems having arbitrary geometries using continuum elasticity theory. 95 3.1.3 Strain-Releasing Nanowire Configurations Free-standing semiconductor nanowires (NWs) 62,68,70 are being considered as the main components of the next generation of solar cells. 58,61 It is postulated that lateral relaxation allows the coherent growth of such pillars on top of lattice mismatched substrates which would greatly increase the absorption length compared to thin film solar cells. Recent electromagnetic simulations showed that the broadband absorption efficiency is a sensitive function of the NW diameter relative to the inter-NW distance and ranges from 10 to 25% 94 . Previously, 52 we used continuum elasticity 48 to predict the critical diameters and heights 85 ( and , respectively) of InAs NWs grown on a GaAs substrate by comparing the strain energy with the energy of dislocation formation 16,86,91,93 as a function of NW diameter and height . The critical diameter is therefore defined from As the NW height increases, the strain energy initially increases linearly similar to FIG. 17. Geometrical dimensions of NW grown on pillar (a) and NW grown on an embedded QD (b). The QD is embedded a distance below the substrate surface. (a) (b) 96 thin film formation. Due to the NW’s finite diameter, ho wever, the energy increases at a slower rate than for a thin film, and reaches an asymptote after several monolayers (ML). Any layers added to the top of the NW will be fully relaxed and form with their natural unstrained lattice constant. If for a given height, coherent growth is not favored and dislocations are highly likely. The critical height is therefore defined as For our InAs NWs grown on GaAs substrate having high lattice mismatch of ~7%, we found that the strain energy exceeds the dislocation energy for all when > 6 nm. Therefore, our analysis predicted that NWs with > 6 nm have critical height ~ 1 ML. For ≤ 6 nm, we found that for all indicating that . That is, NWs with diameters below = 6 nm could be grown without inducing any defects. However, electron beam lithographic techniques 11 used to pattern substrates for growth of nanostructures have resolution close to 10 nm, which poses a challenge in forming such thin NWs. Therefore, the ability to grow coherent NWs with > 6 nm is highly desirable. In this work, we apply the same approach as in Ref. 52 to compare and for NWs formed in several growth configurations with the potential to allow greater strain relaxation than a simple substrate. In this way, we quantify the expected increase in critical diameter that can be achieved by growing InAs NWs on several strain-releasing GaAs geometries. The increased NW diameter due to the proposed configurations is expected to increase the photo-absorption efficiency for photovoltaic applications. Recently, both theoretical and experimental evidence has been reported demonstrating strain-releasing mechanisms induced in certain geometrical configurations. In Ref. 95, the growth of a quantum dot (QD) 96-97 on a square mesa was modeled by molecular dynamics 97 (MD), suggesting that such a structure could allow coherent formation of a 12 ML QD via lateral relaxation. Based on this concept, we calculate the strain energy of an InAs NW growing on a pillar up to 20 ML tall (Fig. 17 (a)) and compare the critical diameter of this structure with that of a NW grown directly on a bulk substrate. Heterostructures such as these can be grown via the vapor liquid solid (VLS) mechanism or selective area epitaxy (SAE) 19 . Furthermore, experiments have shown that QDs in stacked layers grow increasingly larger as more layers are added. 1 It has been widely accepted that embedded QDs help align the local lattice constant of the crystal to that of the natural QD material. 98-99 Embedded QDs thus serve as nucleation sites which lower strain energy for the subsequent layers formed on top, enabling the fabrication of vertically aligned QD arrays 100 . Here, we apply this observation to quantify the expected increase in critical height for NWs growing on top of a single QD layer compared to a simple substrate (Fig. 17 (b)). It is conceivable to use VLS or SAE to grow an array of NWs on a substrate containing an embedded QD layer. Lastly, we examine the change in strain energy when these NW are shifted off-axis with respect to the embedded QDs, as may occur during fabrication processes without precise alignment control of NW position relative to the embedded QDs. In Fig. 18 (a), we plot the total strain energy (solid curves) versus NW height for InAs NWs growing on top of GaAs pillars of 20 ML in height formed on a GaAs substrate. The pillars have the same diameter as the NW of 8, 10, and 9 nm (solid curves with square, circle, and no markers respectively). The dashed curves represent the energy of two edge dislocations formed in the NW with different heights. As in Ref. 91, the dislocation energy calculation incorporates the finite NW diameter yielding three curves for = 8, 10, and 9 nm NWs (dashed 98 curves with square, circle, and no markers respectively). Similarly to Ref. 52, the strain energy increases rapidly with , such that the NW with = 10 nm has strain energy (solid curve, circle markers) exceeding the dislocation energy at = 1 ML. This indicates that coherent growth is not favorable for this NW. By contrast, the NW with = 8 nm has strain energy (solid curve with square markers) that is always less than its corresponding dislocation energy. This indicates that it is possible to grow many coherent layers with = 8 nm without triggering dislocations. Thus, the critical height of the NWs drops from virtually infinite to zero for small increases of 1-2 nm in diameter. These sharp transitions have been previously predicted and are characteristic of highly lattice mismatched materials. 93 The critical diameter appears to be FIG. 18. Strain and dislocation energy (solid and dashed curves respectively) for InAs NWs grown on GaAs pillars with = 20 ML and the same diameter as NW - = 8, 10, and 9 nm (square, circle, and no markers respectively). Star and diamond symbols depict strain energy for 9 nm diameter NW on pillar with = 5 and 3 ML respectively. 99 9 nm, which is larger than the 6 nm we previously calculated for a NW grown on a simple substrate. The corresponding increase in volume is 225%. The increase in critical diameter occurs due to the lateral relaxation of the pillar-NW structure. In Fig. 18, we also plot strain energy for NWs grown on pillars of 5 and 3 ML (stars and diamonds respectively). As expected, as the pillar length decreases the NW strain energy increases because the pillar cannot provide enough lateral relaxation to offset the lattice mismatch close to the substrate surface. A pillar with = 3 ML shows a noticeable difference, while the = 5 ML and = 20 ML pillars have negligible differences. We can conclude that any pillar with height beyond 5 ML will provide the same reduction in NW strain energy. In Ref. 52, our calculations showed that the FIG. 19. Strain energy ( )and dislocation ( ) energy (solid and dashed curves respectively) for InAs NWs grown on GaAs substrate with InAs QD embedded at = 1, 2, 3, and 5 ML below the surface (diamond, square, circle, and no markers respectively). Red, blue, green, and black curves depict NWs with = 8, 9, 10, and 11 nm respectively. The QD has truncated cone shape with = 20 nm base diameter, =14 nm top diameter, and = 7 ML. 100 strains generated at the material interfaces decreased to zero over ~5 ML in the substrate material. Our results here are consistent with that observation and establish an interaction length between the two materials, which is a function of their material parameters. Next, we calculate the strain energy in an InAs NW grown on a GaAs substrate with an InAs QD embedded in the substrate. The QD has a truncated cone shape with 20 nm base diameter ( ), 14 nm top diameter ( ), and height of 7 nm ( ) (see Fig. 17). These dimensions are observed for QDs in superlattice structures fabricated in Ref. 1. The QD distance from the surface varies as = 1 ML, 2 ML, 3 ML, and 5 ML depicted in Fig. 19. Solid, FIG. 20. Maximum strain energy for NWs with ≤ 5 ML, grown on embedded QD as in Fig. 19 with distance from surface vs. NW off-center displacement ( ) relative to QD central axis. Dashed, solid, dot-dash, and dotted curves have = 7, 8, 9, and 10 nm respectively. Diamond, square, circle markers depict = 1, 2, and 3 ML respectively. 101 dot-dash, and dotted curves represent strain energies for NW diameters of = 8-11 nm. Dashed lines represent the corresponding dislocation energies. As expected, the results show the strain energy decreasing with , since a smaller QD distance from the surface facilitates greater lattice mismatch accommodation between the NW and substrate. Our results indicate that a critical diameter of 10 nm can be achieved provided = 1 ML, and 9 nm diameter for = 3 ML. Finally, in Fig. 20, we present the results for this configuration, with NWs shifted off axis with respect to the QD by a distance . We calculate the strain energies for each NW height up to = 5 ML as in Fig. 18 and Fig. 19. In Fig. 20, we present the maximum strain energy over these NW heights for different diameters and QD distance from surface ( ). As the off-center distance increases, we expect the strain energy to increase, since the NW is moving towards a region of pure substrate material. We observe that the strain energy does indeed begin to increase for > 10 nm. This is not surprising since the lower edge of the embedded QD occurs at = 10 nm (that is, = 20 nm). This result suggests that both top and bottom dimensions of the embedded QD are significant for reducing the NW strain energy. For = 5 - 10 nm, however, our results show a decrease in strain energy compared to = 0 occurring for all NW configurations. 58-59 At = 5 nm, the base edge of the NWs have moved beyond the top edge of the embedded QD at = 7 nm (since = 14 nm), but are still within the QD’s bottom edge at = 10 nm. For = 10 nm, the NW base edges have moved beyond the bottom edge, yet the strain energy continues to decrease, except for the NWs with = 10 nm, and = 9 nm, = 1 ML. In addition, we also observe that the strain energies for NWs with larger can exceed those for NWs with the same diameter and smaller when ≥ 10 nm, as can be seen from 102 the solid curves ( = 8 nm) and dot-dash curves ( = 9 nm) in Fig. 20. We attribute these phenomena to the shape of the embedded QD inducing complex strain fields in the surrounding substrate. This demonstrates the importance of such strain calculations in guiding experiments towards the most desirable results. Our findings suggest that the minimum strain energy may occur for off-center NW positions. More in-depth analysis of the interacting strain fields in QD array systems is necessary. In conclusion, we have shown that specific growth configurations allow greater critical diameter for highly lattice mismatched NWs, which provides up to 275% volume increase of coherently grown material. This can lead to significant increase in photon absorption potential of NW solar cell structures. Further analysis of the complex induced strain fields for specific QD shapes and NW positions may reveal further potential gains in critical diameter. 103 3.1.4 Effect of Strain, Polarization, Shape, and Size on Band Gaps of Embedded Quantum Dots Having understood the role of strain on the formation of QDs, we now apply our full model in order to compute electronic structure. As explained in Chapter 1, the QD size, as well as the strain and resulting polarization are expected to significantly alter the band edge energies compared to the bulk material. An important advantage of our model is the ability to calculate the band gap while including only some, but not all of these factors. This allows us to observe the shift in band edge energies from each individual component, and in this way quantify the value of each. The ability to separate the observed shifts into contributions from multiple sources serves can provide important insights to the fabrication process. Experimentalists are typically able to observe only the overall emission wavelengths which in reality are a result of the convolution of multiple phenomena. Here, we carry out different simulations setting and in order to understand the contributions from each component in our total Hamiltonian . We choose QD systems with dimensions taken from Fig. 8. In that work, the authors described the shape of the QDs to be “box islands” and observed that they increased in size in each subsequently grown layer. Here, we also wish to understand how the band gaps of these different QDs might differ as a function of size and shape. To this end, we choose 4 sizes and 3 possible shapes for these QDs. The shapes are chosen to be a lens, a truncated pyramid with square base, and a truncated disk as shown in Fig. 22. We choose 4 dimensions giving increasing sizes for the truncated disk shape shown in Fig. 22 (d)-(g). We then model the truncated pyramid and lens shaped QD with the 104 same base and height dimensions as the second largest truncated disk shape (Fig. 22 (e)) for comparison of shape variations. To understand the appearance of typical strain profiles, we first present Fig. 21 with the strain component curves of the truncated pyramid (Fig. 22 (a)) along the direction through an plane crossing through the QD center in . As we expect the hydrostatic and (red and blue curves respectively) components are negative indicating that the QD is compressed. We note that the substrate is also tensed (positive strain) immediately outside the QD. The component (green curve) also shows the expected spatial variation. The shear components are very small in the , , and , and they only take significant values at the edges of the QD. We note that the shear strains possess an antisymmetric shape which determines our piezoelectric potential. We observe that the strain profiles like the ones shown in Fig. 21 are very similar for a variety of QD shapes and sizes analyzed in this work. In this example, the hydrostatic strain is smaller than the maximum of . With larger diameter, the QDs become fully strained as was observed in the case of uncovered QDs. FIG. 21. Strain components of QD shown in Fig. 22 (a) through the plane that crosses the center in the direction. Red, blue, green, pink, cyan, and black curves represent , , , , , and respectively. 105 We next plot the piezoelectric potential for various shapes in Fig. 22. (Since InAs and GaAs materials are not strongly polar, they have negligible spontaneous polarization .) The first row gives 2 truncated pyramids, the second row contains one lens-shaped QD, and the bottom row compares 4 truncated disk shapes of increasing size. We note from the energy scales that the piezoelectric potential varies between 18 and 34 meV. The antisymmetric shape is expected FIG. 22. Piezoelectric potential for truncated pyramid (a)-(b), lens (c), and truncated disk (d)-(g) shaped QDs with varying base, top, and height dimensions. (c) H = 1.79 nm B1 = 21.48 nm y x z (b) (a) H = 1.79 nm B1 = 21.48 nm B2 = 14.13 nm Energy (meV) 25.4 13.9 2.4 3.3 14.8 26.3 Energy (meV) 23.4 12.9 2.4 2.8 13.3 23.7 H = 1.79 nm B1 = 10.74 nm B2 = 7.07 nm Energy (meV) 18.3 10.0 1.8 2.3 10.5 18.8 (g) (f) (e) (d) H = 2.83 nm B1 = 28.27 nm B2 = 15.82 nm H = 1.70 nm B1 = 19.79 nm B2 = 14.13 nm H = 1.79 nm B1 = 21.48 nm B2 = 14.13 nm Energy (meV) 21.4 11.8 2.1 2.7 12.4 22.0 Energy (meV) 23.0 12.6 2.2 3.0 13.4 23.7 Energy (meV) 29.0 15.8 2.6 4.0 17.2 30.4 Energy (meV) 33.4 18.5 3.6 3.8 18.7 33.6 H = 3.96 nm B1 = 35.62 nm B2 = 18.09 nm 106 from the shape of the shear strain components. Comparing the 3 different shapes with the same dimensions in the second column ((b), (c), and (e)), we note that the truncated disk and pyramid have higher peak piezoelectric potential than the lens. This can be explained by the fact that the first two shapes are modeled with more sharp edges than the lens. We can expect that the shear strains will be highest along these abrupt surfaces. The bottom row of truncated pyramids shows the peak piezoelectric potential increasing with size. In this case, both the base and height are increasing for each QD. However, in the top row which compares two pyramids with the same height but different base size, we observe that the peak piezoelectric field is higher in the smaller pyramid. The pyramid has the sharpest edges, and we expect that this shape will have the highest piezoelectric field. The QD in Fig. 22 (a) also has the steepest sloped edges among all the QDs studied. We can infer that both sharp and steeply sloped edges contribute to increase the effect of the piezoelectricity. We next analyze how the electronic levels differ in the shapes above. We calculate the band gaps for the QDs in Fig. 22 (b) – (g) in 3 different scenarios – , , and . In the first case, no strain or piezoelectric effects are included and the system does not take the lattice mismatch into account. This system is very similar to the “particle in a box” problem. The GaAs substrate has a higher band gap which acts as a barrier that traps an electron in the InAs region. We expect that the energies will decrease due to weaker confinement as the QD size increases. In the second case, we include the strain Hamiltonian but not the piezoelectric effect in order to compare the effect each one of these operators has on the band gap. Figure 23 presents our results for the band gaps plotted as a function of the QD volume. The lens shape has the smallest volume for the same base and height, and we note that this system has 107 the largest band gap. The pyramid shape has slightly larger volume than the disk, and thus shows a lower band gap. The band gap decreases as the 4 disk shaped QDs increase in size. We observe a dramatic increase in the band gap when the strain effect is included. This is expected since we are introducing more energy into the system in addition to the confinement energy. Finally, we note that the band gaps are almost identical when the piezoelectric effect is included along with the strain. Evidently, the piezoelectric effect does not play a significant role in our particular material system because InAs and GaAs are not strongly polar materials. However, their lattice mismatch of 7% introduces significant shifts in the energy bands indicating large displacements in the crystal structure. FIG. 23. Band gaps for lens, truncated pyramid, and truncated disk QDs (see Fig. 22) (green, blue, and red dots respectively. Dashed and solid lines are aides for the eye. The label “ & ” indicates that the band gaps for these two cases are nearly the same. This is because the piezoelectric potential present in the total Hamiltonian makes a negligible difference in band gaps of these materials. disk H disk H k pyramid H ks & H disk H ks pyramid H k lens H ks & H lensH k 0.90 0.95 1.00 1.05 1.15 1.20 1.25 1.30 0 500 1000 1500 2000 2500 QD Volume (nm 3 ) Bandgap Energy (eV) 108 To understand how the conduction and valence bands are individually shifted by each component, we present Table 2 which lists the results for the shapes in Fig. 22 (b)-(g) under the 3 scenarios where and . Comparing the 3 shapes, we note that the conduction band is shifted downward by 8-9 meV as the QD volume decreases (lens – disk – pyramid) and the valence band shifts upward by 6-7 meV. The net result is a decrease in the band gap of about 15 meV due to confinement effects. Comparing the and case, we observe a dramatic increase in the conduction band of about 0.39 meV for all shapes. The valence band also increases by 0.15 meV. Thus, the strain causes both bands to shift upward, but affects the conduction band more strongly such that the overall effect is an increase in the band gap. We have shown that the hydrostatic strain components are the most significant, and in section 2.2, we showed that these components define the diagonal terms of . 109 Table 2. Conduction and valence band energies of QDs Lens Disk Pyramid (d) 1.464 0.298 (c) 1.081 0.140 (e) 1.072 0.142 (b) 1.066 0.154 1.470 0.294 1.461 0.301 1.450 0.307 1.469 0.294 1.460 0.300 1.452 0.307 (f) 1.340 0.369 (g) 1.328 0.397 110 3.1.5 Electronic Structure of Single and Multi-Body QD Systems The results presented in section 0 revealed that confinement effects push the bulk valence and conduction band edges away from one another, while strain increases both for a higher overall band gap in lattice mismatched QD systems. We have also shown that strain decays to zero in the substrate and we expect the band structure in this region to be similar to that of the bulk. Another important parameter is the relative band offset between the QD and substrate (see Fig. 24). In our simulations, we set the valence band edge of the substrate to correspond to . We use (GaAS) = 1.519 eV and (InAs) = 0.418 eV. However, the band offset between the valence bands of the two materials is chosen to be = 0.21 eV from experiments. Therefore, we have (InAs) = 0.21 eV and (InAs) = 0.628 eV. In Fig. 1, we represent the unperturbed energy levels of the bulk material and illustrate the shifts we expect as a result of confinement and strain. For a very small QDs, we have observed that the strain is lower than in larger QDs. In this case, the confinement effect will dominate and the band edges shift away from one another. However, as Fig. 1 shows, the energy offset relative to the substrate material decreases and the bands become flatter. An electron in such a shallow potential well is expected to have only a few bound states at low energies and scattering states at higher energies. Solving our Hamiltonian using Lanczos algorithm allows us to identify the energy crossover between the bound and scattering states by examining the resulting wavefunctions. FIG. 24. Conduction and valence band offset energy (left) and expected energy shifts from confinement ( ) and strain ( in small and large QDs (center and right respectively). ΔE c bulk small QD large QD ΔE s ΔE c ΔE s ΔE c ΔE s ΔE c ΔE s E BO 111 To demonstrate this, we plot the wavefunctions around the band gap of some embedded QD shapes introduced in sections 3.1.1. Figures 26-29 present four single-layer disk- shaped QDs with diameters of 10, 15, 20, and 25 nm. For each shape, we plot 20 wavefunctions around the band gap, usually resulting in the 10 top valence band states, and the 10 bottom conduction band states. Green dotted lines denote the boundaries between valence and conduction band wavefunctions and each graph is labeled with its computed energy in eV. When present, spurious eigenenergies (see section 2.1.3.1) inside the band gap are denoted by a red label. The band gaps for the QDs with increasing size are given by 1.276, 1.199, 1.160, and 1.138 eV. The states around the band gaps appear to be localized inside the QDs and exhibit the expected nodal structure similar to a particle in an infinite potential well. Because the potential is finite, however, we observe a transition from the bound states to non- localized scattering states in the 10 and 15 nm diameter QDs. We expect this transition to occur in all QDs above 1.519 eV and below 0, which are the energies of the CB and VB edges in the substrate. The relative position of the CB and VB of the QD material will determine the number of bound states present. For the 10 nm diameter QD, we observe 7 bound states in the valence band above 0.153 eV and 2 bound states in the conduction band below 1.519 eV. These energies can be expressed as 58 meV below the VB edge (0.211 eV) and 32 meV above the CB edge (1.487 eV). This suggests that the conduction band is shallower than the valence band relative to the substrate material. Electrons in the conduction band must overcome a smaller potential difference in order to tunnel into the substrate material than do holes in the valence band. 112 In Fig. 27, we observe all bound states in the valence band and 5 bound states in the conduction band for a 15 nm diameter QD. This is an indication that both bands are deepening due to confinement effects as depicted in Fig. 24. Indeed, while the onset of the scattering state in the CB occurs at 1.519 eV as in the 10 nm QD from Fig. 26 the CB edge shifts downward to 1.446 eV such that the relative difference increases to 73 meV compared to the 10 nm diameter QD. The CB has more bound states and electrons must overcome a greater potential difference to tunnel into the substrate material. Thus, we expect that as QDs increase in size, the confinement effect will cause both CB and VB to deepen to various degrees, leading to higher number of bound states. Nevertheless, the presence of strain introduces a further subtlety for the CB. We have shown that large QDs have higher strain than small QDs. When comparing the CB energy shifts in a large QD to those of a small QD as in Fig. 24, we observe that the larger shift due to strain energy works in opposition to the shift in confinement for larger QDs. This implies that strain can provide a relatively shallow potential for electrons in the CB even when confinement is smaller due to large size. Such a configuration would have a smaller band gap, but electrons would need less additional energy to escape into the substrate material. We observe this effect in the next set of results plotted in Figs. 29-32. The geometries correspond to three truncated disk shapes with dimensions as in Fig. 22 (e)-(g). The band gaps are given with increasing size as 1.159, 1.015, and 0.931 eV. These QDs have been observed to form via strain relaxation in the Stranski-Krastanov growth mode and their strains approach that of a thin film when they are embedded inside the substrate. Figure 30 plots the 113 wavefunctions for the smallest QD revealing only 2 bound states in the CB above 1.520. The CB edge is 60 meV below this (1.460 eV). The volume of this QD is 480 nm 3 , but its conduction band structure is similar to the 10 nm diameter single-layer QD with volume of 44 nm 3 . Similarly, Fig. 31 plots the wavefunctions for a larger QD with volume of 1108 nm 3 having 6 bound states similar to the 15 nm diameter QD with volume of 100 nm 3 . In this case, the first scattering state is 0.138 eV above the CB edge (1.384 eV). Thus, with increasing QD size, the CB is becoming deeper allowing for more bound states due to weaker confinement. Figure 32 further supports this by showing that all plotted wavefunctions are bound in both VB and CB. Nevertheless, this example reveals that QDs with volumes that differ by an order of magnitude relative to smaller QDs can still exhibit shallow CB structures with few bound states due to the interplay between strain and confinement effects. This is an important insight which can help to precisely engineer the band gap and the relative band offset of nanostructure systems. (1) With increasing QD size, the number of VB bound states increases. (2) With increasing QD size, the number of CB bound states may increase or decrease due to the interplay between strain and confinement effects. Finally, we have applied our model to the QD stack of 4 layers modeled after the experimental result in section 3.1.1, Fig. 8. We vary the thickness of each layer as shown in Fig. 25. The number label denotes the thickness of each layer in nanometers. We present the results of these 3 configurations in Figs. 33-38. Each figure plots 20 valence or conduction band wavefunctions around the band gap. The band gap is below 1 eV for all cases which is close to the band gap of the largest QD from section 3.1.1, Fig. 8. We note the presence of 17 – 19 114 bound states in the CB and the onset of scattering states in above 1.519 eV in all three cases. Most of the states in both the VB and CB are confined to the top largest QD. However, we observe transitions occurring between high order eigenstates of the top QD, to lower order eigenstates of the smaller QDs below. This is very similar to the way an electron occupies orbitals with increasing energy in an atom (e.g. 1 , 2 , 2 , 3 , 3 , 3 , etc.). Indeed, we observe that a transition to a smaller QD is usually followed by a transition back to the top QD. This also resembles transitions of an electron in a large atom that may fill 4 orbital before filling the 3 orbital. With increasing layer thickness, the overall band gap of the system is 997, 975, and 964 meV. The valence and conduction bands shift towards each other with values of 363, 371, and 378 meV and 1.360, 1.346, and 1.342 meV respectively. We also note that the first transition from the top QD to the next largest QD occurs at energies closer to the band edge when the layer thickness is increased. That is, the first transition in the VB occurs at 306, 319, and 329 meV with increasing layer thickness, and the first transition in the CB occurs at 1.444, 1.427, FIG. 25. Stack of 4 QDs with truncated disk shape having dimensions given in Fig. 22 (e)-(g). The number label denotes the thickness of each layer in nanometers. 5.653 7.915 10.176 (a) (b) (c) 115 and 1.411 eV. Relative to the band edge energy, we have a difference of 57, 52, and 49 meV for the VB and 84, 81, and 69 meV for the CB with increasing layer thickness. These results suggest that it becomes easier to transition between QDs although the layer thickness increases. This would indicate that QD layers which are close may induce confinement and strain effects in the substrate layers which cause an increase in the band gap of this material. This effect is expected to introduce a higher effective barrier for electrons to transition between QDs. Thus, the band structure of such stacked nanostructure systems merits closer investigation. Our model enables the study of complex multiple-QD configurations in greater detail. 116 FIG. 26. Valence and conduction band wavefunctions with corresponding energies (eV) for single- layer disk shaped InAs QD with 10 nm diameter embedded in GaAs. Green dashed lines denote the band gap. 0.147 0.151 0.153 0.159 E = E = E = E = 0.159 0.164 0.177 0.187 0.211 0.211 1.487 1.487 E = E = E = E = 0.159 0.164 0.177 0.187 0.211 0.211 1.487 1.487 E = E = E = E = E = E = E = E = E = 1.519 1.523 1.523 1.524 1.527 1.529 1.531 1.532 E = E = E = E = E = E = E = E = E = 1.519 1.523 1.523 1.524 1.527 1.529 1.531 1.532 117 FIG. 27. Valence and conduction band wavefunctions with corresponding energies (eV) for single- layer disk shaped InAs QD with 15 nm diameter embedded in GaAs. Green dashed lines denote the band gap. 0.176 0.184 0.187 0.199 0.200 0.208 0.221 0.222 0.176 0.184 0.187 0.199 0.200 0.208 0.221 0.222 0.247 0.247 1.232 1.446 1.446 1.446 1.503 1.503 0.247 0.247 1.232 1.446 1.446 1.446 1.503 1.503 1.519 1.523 1.524 1.526 118 FIG. 28. Valence and conduction band wavefunctions with corresponding energies (eV) for single-layer disk shaped InAs QD with 20 nm diameter embedded in GaAs. Green dashed lines denote the band gap. 0.207 0.207 0.209 0.220 0.223 0.235 0.243 0.265 0.207 0.207 0.209 0.220 0.223 0.235 0.243 0.265 0.265 1.425 1.425 1.425 1.425 1.455 1.467 1.467 0.265 1.425 1.425 1.425 1.425 1.455 1.467 1.467 1.513 1.514 1.519 1.520 119 FIG. 29. Valence and conduction band wavefunctions with corresponding energies (eV) for single- layer disk shaped InAs QD with 25 nm diameter embedded in GaAs. Green dashed line denotes the band gap. 0.215 0.222 0.226 0.230 0.236 0.240 0.253 0.257 0.215 0.222 0.226 0.230 0.236 0.240 0.253 0.257 0.276 0.276 1.414 1.414 1.443 1.443 1.444 1.444 0.276 0.276 1.414 1.414 1.443 1.443 1.444 1.444 1.479 1.479 1.487 1.496 120 FIG. 30. Valence and conduction band wavefunctions with corresponding energies (eV) for truncated disk-shaped InAs QD with dimensions as in Fig. 22 (e). Green dashed lines denote the band gap. Red text indicates a spurious eigenvalue. 0.167 0.185 0.206 0.221 0.237 0.248 0.252 0.271 0.167 0.185 0.206 0.221 0.237 0.248 0.252 0.271 0.274 0.301 0.978 1.460 1.507 1.520 1.525 1.526 0.274 0.301 0.978 1.460 1.507 1.520 1.525 1.526 1.531 1.543 1.550 1.563 121 FIG. 31. Valence and conduction band wavefunctions with corresponding energies (eV) for truncated disk-shaped InAs QD with dimensions as in Fig. 22 (f). Green dashed line denotes the band gap. 0.238 0.263 0.273 0.292 0.308 0.320 0.323 0.341 0.238 0.263 0.273 0.292 0.308 0.320 0.323 0.341 0.345 0.369 1.384 1.429 1.431 1.475 1.477 1.480 0.345 0.369 1.384 1.429 1.431 1.475 1.477 1.480 1.522 1.527 1.535 1.549 122 FIG. 32. Valence and conduction band wavefunctions with corresponding energies (eV) for truncated disk-shaped InAs QD with dimensions as in Fig. 22 (g). Green dashed line denotes the band gap. 0.263 0.285 0.307 0.324 0.333 0.352 0.359 0.372 0.263 0.285 0.307 0.324 0.333 0.352 0.359 0.372 0.376 0.397 1.328 1.366 1.370 1.406 1.408 1.413 0.376 0.397 1.328 1.366 1.370 1.406 1.408 1.413 1.448 1.480 1.488 1.495 123 FIG. 33. Valence band wavefunctions with corresponding energies (eV) for a 4-layer stack of truncated disk-shaped InAs QDs embedded in GaAs as shown in Fig. 25 (a). Green dashed line denotes the band gap. Red text indicates a spurious eigenvalue. 0.235 0.249 0.252 0.267 0.270 0.282 0.288 0.295 0.235 0.249 0.252 0.267 0.270 0.282 0.288 0.295 0.306 0.307 0.314 0.318 0.323 0.327 0.333 0.341 0.306 0.307 0.314 0.318 0.323 0.327 0.333 0.341 0.342 0.355 0.363 0.608 1.360 1.360 1.387 1.401 124 FIG. 34. Conduction band wavefunctions with corresponding energies (eV) for a 4-layer stack of truncated disk-shaped InAs QDs embedded in GaAs as shown in Fig. 25 (a). Green dashed line denotes the band gap. 0.342 0.355 0.363 0.608 1.360 1.360 1.387 1.401 1.416 1.423 1.427 1.433 1.455 1.459 1.463 1.467 1.416 1.423 1.427 1.433 1.455 1.459 1.463 1.467 1.471 1.480 1.482 1.492 1.501 1.511 1.514 1.524 1.471 1.480 1.482 1.492 1.501 1.511 1.514 1.524 125 FIG. 35. Valence band wavefunctions with corresponding energies (eV) for a 4-layer stack of truncated disk-shaped InAs QDs embedded in GaAs as shown in Fig. 25 (b). A green dashed line denotes the band gap. 0.231 0.242 0.252 0.261 0.269 0.278 0.285 0.293 0.231 0.242 0.252 0.261 0.269 0.278 0.285 0.293 0.301 0.306 0.314 0.319 0.321 0.328 0.331 0.336 0.301 0.306 0.314 0.319 0.321 0.328 0.331 0.336 0.343 0.349 0.359 0.371 1.346 1.346 1.379 1.389 126 FIG. 36. Conduction band wavefunctions with corresponding energies (eV) for a 4-layer stack of truncated disk-shaped InAs QDs embedded in GaAs as shown in Fig. 25 (b). 0.343 0.349 0.359 0.371 1.346 1.346 1.379 1.389 1.422 1.431 1.438 1.444 1.445 1.458 1.465 1.475 1.422 1.431 1.438 1.444 1.445 1.458 1.465 1.475 1.483 1.485 1.493 1.501 1.506 1.518 1.523 1.528 1.483 1.485 1.493 1.501 1.506 1.518 1.523 1.528 127 FIG. 37. Valence band wavefunctions with corresponding energies (eV) for a 4-layer stack of truncated disk-shaped InAs QDs embedded in GaAs as shown in Fig. 25 (c). Green dashed line denotes the band gap. 0.233 0.243 0.253 0.264 0.273 0.281 0.288 0.299 0.233 0.243 0.253 0.264 0.273 0.281 0.288 0.299 0.301 0.310 0.316 0.320 0.325 0.329 0.337 0.340 0.301 0.310 0.316 0.320 0.325 0.329 0.337 0.340 0.345 0.356 0.362 0.378 1.269 1.342 1.342 1.378 128 FIG. 38. Conduction band wavefunctions with corresponding energies (eV) for a 4-layer stack of truncated disk-shaped InAs QDs embedded in GaAs as shown in Fig. 25 (c). Green dashed line denotes the band gap. Red text indicates a spurious eigenvalue. 0.345 0.356 0.362 0.378 1.269 1.342 1.342 1.378 1.384 1.411 1.416 1.421 1.427 1.452 1.455 1.456 1.384 1.411 1.416 1.421 1.427 1.452 1.455 1.456 1.463 1.467 1.469 1.493 1.499 1.501 1.510 1.525 1.463 1.467 1.469 1.493 1.499 1.501 1.510 1.525 129 3.2 Properties of WZ Crystal Nanostructures Grown on (0001) Substrates 3.2.1 An 8-Band Study of GaN/AlN Quantum Dot Interactions Quantum dots (QDs) 101-102 fabricated using the Stranski-Krastanov or selective area growth (SAG) techniques have become of interest to researchers as potential optoelectronic devices. In particular, gallium nitride (GaN) QDs are promising candidates for light emitting diodes (LEDs) in the short wavelength visible spectrum range (350 – 500 nm) 103-105 . QDs behave like “artificial atoms” with discrete emission wavelengths which are highly sensitive to the QD size and shape. The smallest QDs have band gaps 0.5 – 1 eV higher than the bulk material, due to charge-carrier confinement in 3 dimensions. Furthermore, QDs grown on lattice- mismatched substrates experience additional shifts in their conduction and valence band edges from strain-induced deformation potentials 27 . The presence of mechanical strain also causes charges to shift within a crystal, giving rise to a built-in piezoelectric polarization ( ). Finally, those binary materials having inter-atomic bonds with asymmetric charge distributions will also exhibit polar surfaces due to the terminating cations or anions. This in turn provides an additional spontaneous polarization ( ). Here we study GaN QDs grown in the wurtzite (WZ) crystal structure which exhibit a spontaneous polarization along the growth direction (0001). Such QDs have been experimentally observed to emit blue light with a transition energy of 2.95 eV 106 attributable to large internal polarizations. The desire to systematically engineer devices with well-controlled and predictable optical properties has driven the development of theoretical methods that model the expected 130 effects of the above factors on the band structure of strained nanoscale QDs. Here we implement an 8-band method to describe the dispersion, strain, piezoelectric and spontaneous polarization of WZ GaN QDs embedded in an AlN matrix. This method allows us to calculate the band edge energies, and the corresponding electron and hole states near the band extrema. This method has been successfully applied to systems of isolated QDs. However, as fabrication techniques have evolved, QDs are now being grown in 2-D arrays as well as 3-D vertically correlated “superlattices”. Because these structures are lattic e- mismatched, the strain field of each QD is likely to affect its neighbors. In addition, we have previously seen that strain is shared between the substrate and a lattice-mismatched nanowire (NW), which allows the NW to grow coherently without dislocations 52 . Thus, the strain induced in the substrate also affects the strain inside the QD. In previous studies of isolated QDs, the substrate strain was either assumed to be zero, or would decay to zero far away from the QD 85- 86 . However, realistic superlattices often have several layers of QDs separated by a few monolayers (ML) of confining material. Because the strain is not expected to decay in such a configuration as for an isolated QD system, it is important to examine the strain fields of stacked geometries to understand their expected contribution to the shifts in electronic energy levels. When nanostructures are grown on substrates having a smaller lattice constant, the nanostructure’s lattice will be compressed relative to its natural spacing. Such a compressed system will cause the conduction and valence bands to shift away from each other in energy, thus widening the band gap 29 . The opposite is true when the substrate has larger lattice constant than the nanostructure material. In our case, the lattice constant of GaN is larger than 131 that of AlN, therefore, the resulting strain is expected to cause a larger band gap. However, the strain-induced piezoelectric field, while negligible in some materials such as zinc blende (ZB) InAs and GaAs, is significant in the GaN WZ crystal 53 . This phenomenon, along with the spontaneous polarization field caused by polar surfaces acts in the same way as an applied electric field tilting the energy bands to create a potential difference across the material (see Fig. 43). When an electron is promoted to the conduction band, leaving a hole in the valence band, the presence of the electric field acts to push them apart. Because of the tilt in energy bands, the spatial separation of the electron and hole is also accompanied by a lower difference in their energies which is referred to as the quantum confined Stark effect (QCSE). This phenomenon is illustrated in Fig. 43 (a). In such materials, QCSE allows electronic transitions to occur at energies below the bulk band gap 50 . Thus, for the material system studied here, the strain and polarization will produce opposite effects in the electronic transition energy. GaN has bulk band gap energy of 3.5 eV (~350 nm), and QD devices have been investigated as candidates for green LEDs with emission desired around 500 nm. Nonetheless, an additional important aspect is the carrier lifetime of electrons and holes. Once an electron is promoted from valence to conduction band, it decays back down with a characteristic radiative recombination lifetime τ. The lifetime depends on the electric field present, as well as the spatial overlap of the electron and hole states. It has been predicted that the permanent electric field of isolated GaN QDs pushes electrons and holes apart, minimizing spatial overlap leading to long radiative recombination times 49 . This increases the likelihood that the electron will dissipate its energy through nonradiative processes rather than photon emission, and directly impacts the efficiency of the device. If the electron and hole remain strongly confined 132 within the same spatial region to maintain significant overlap, the redshift in the band gap caused by polarization may be beneficial to achieving green light emission. Strong confinement is provided by the high barrier energy of AlN, as well as by small QD size. However, for 2-D and 3-D systems, proximity among the QDs may lead to shielding of the polarization fields. Because of this, it is important to characterize the effect of polarization on the energies and electronic states in 2-D and 3-D structures of QDs having interacting electric fields and compare them to the case of a single QD system. We have calculated the band gap energy and electronic states of a single row of 3 QDs, as well as 2-D sheets containing 9 QDs in a square configuration (Fig. 39). Experiments have shown that vertical superlattices can be grown consisting of QDs with a truncated pyramid shape and hexagonal base of highly uniform size 107 . As in Ref. 49 the QDs have side length of = 7.5 nm at the bottom, while the height and top side lengths are both taken to be 𝑡 = = 3 nm. We vary the center-to-center inter-QD distance in 2 nm increments between 16 – 22 nm as shown in Fig. 39. We expect that the strain will be higher as the QDs are brought closer together. In Fig. 40, we plot the profiles of the 6 strain components through the bottom layers of the QDs in the row configuration (Fig. 39 (b)) along the -direction at = 31 nm as indicated by the dashed line in Fig. 39. The dash, dot, dot-dash, and solid lines represent the strain profiles for the rows with inter-QD distance = 16, 18, 20, and 22 nm respectively. In addition, the strain profiles of a single QD are plotted with star markers. The hydrostatic components , , and are negative inside the QDs, indicating compressive strain as expected. It is also apparent that the compressive strain in the substrate in regions between the QDs increases as decreases. This is also expected as the substrate is not allowed to relax 133 as it does around the isolated QD. It is possible that the substrate strain also contributes to shifts in the band gap of multi-body systems. The strain profiles through the middle row of a 2- D sheet layer (not shown) appear identical to Fig. 40. Nevertheless, we expect that 2-D sheets will have additional strained substrate regions compared to the isolated row. We next compare the eigenvalues of different systems to understand the contributions arising from each effect. We have computed the electron and hole energy levels of the row and sheet system shown in Fig. 39 and compared to the energy of a single QD under different scenarios. In the first case, we exclude the polarization potential (from both spontaneous and piezoelectric polarization) from the Hamiltonian and calculate the energy states in the presence of strain alone. Figure 41 (a) shows these results vs. with the conduction and band edge energies in the left plot and the band gap energy on the right. As stated previously, we expect the compressive strain to widen the band gap in the QD compared to the bulk (unstrained) material, and we note that this is indeed the case. Compared to the bulk band gap of 3.44 eV for GaN, we observe that the conduction band has shifted to ~4.07 eV and the valence bands have also shifted up to from 0.0 eV to 0.28 eV. The overall effect is a wider band gap of ~3.79 eV. Compared to the single QD energy levels (dashed and dot-dash curves) the conduction bands for the row and sheet are shifted upwards by about < 1 meV and 4-5 meV respectively. The valence band values are within 1 meV of the single QD valence energy. Therefore, we note that the energy gaps are higher by ~1 meV and ~5 meV for the row and sheet geometries respectively (Fig. 41 (a)). To understand whether this difference occurs due to the greater strain in the substrate for multi-body systems, we recalculate the energies of these geometries and set the substrate deformation potentials to zero. The results are presented in Fig. 41 (b). 134 Here, the only noticeable difference is an increase of the single QD band gap of 1 meV in the absence of substrate strain compared to the single QD with substrate strain. This observation can be explained as follows: for the single QD, the substrate is tensed at the QD interface and decays to zero far away from the QD. Tensile strain has the opposite effect as the compressive strain inside the QD, therefore, when it is excluded, we observe the energy shifting due to compressive strain alone. For the multi-body row and sheet configurations, the QD is surrounded by both tensed substrate along the QD top and bottom, and compressive strain in the regions between the QDs, which does not decay to zero. The opposing strain may cancel one another’s effects on the band gap energy. Therefore, when the substrate strain is removed in the row and sheet geometries, the energy levels change by less than 1 meV. The overall effect when compared to a single QD is a blueshift of up to 5 meV for the row and sheet geometries due to the compressed strain surrounding the QDs. Next, in Fig. 41 (c) and (d) we compare the energies of the sheet and row systems with the single QD when the piezoelectric polarization alone is included and when the total polarization including the spontaneous component along the growth axis is added. In Fig. 41 (c) we note that the piezoelectric field has the effect of reducing the transition energy by 65 meV due to QCSE for an effective band gap of 3.72 eV compared to the strain only case in Fig. 41 (a). This is due primarily to the downward shift in the electron ground state energy of 74 meV offset by the downward shift in the hole energy of 9 meV. Compared to the single QD, the band gaps of the row and sheet are smaller by 2-9 meV. Since we have seen the contribution from substrate strain is a 1 meV blueshift, we can attribute the observed redshift in transition energy to the effect of the interacting piezoelectric field in the multibody system. Finally, in Fig. 41 (d), 135 we present the results with the spontaneous polarization. The curves follow a similar trend as Fig. 41 (c). However, now the band edge states shift even closer together with the hole energy increasing by roughly 200 meV and the conduction band falling by 250 meV. This introduces a significant redshift in the transition energy of 450 meV. It is apparent then, that the spontaneous polarization is the dominant factor that determines the transition energy (or “effective band gap”) of this material system. We also note that some of the curves in Fig. 41 (a)-(d) do not strictly increase or decrease with , rather they exhibit both increasing and decreasing trends. This is most noticeable for the and curves for sheets in Fig. 41 (a)-(d). The appearance of the same trend over multiple calculations is evidence that this is a physical effect, rather than a numerical one. We have observed this phenomenon previously in strained nanowires 108 , indicating that certain spatial configurations of QDs may be more strained than others in a manner which is not strictly determined by the inter-QD distance . Because we have seen that spontaneous polarization has the dominant effect on the effective energy band gaps, we turn our attention to the 3-D stack geometries shown in Fig. 39 (d). We expect to see a strong dependence of the band gaps on the inter-layer distance since the polarization vector is directed along the z-axis. In Fig. 41 (e) we present the energy results for the stacked row and layer configurations with = 1, 3, and 5 ML over a range of . The results show the electron (hole) ground state energy decreasing (increasing) with . It is evident that the distance between layers is the dominant factor in the transition energy as expected. We see a 200 meV redshift in the row and layer band gaps between 5 and 3 ML, and an additional 300 meV redshift between 3 and 1 ML. Most notably, we observe that the transition energies are now close to 2.5 eV, corresponding to a wavelength of 500 nm. Thus the 136 stacked configuration consisting of QDs with interacting polarization may provide electronic transition energies suitable for green LED devices. Nevertheless, it has been predicted that the polarization field will separate electron and hole states along the growth direction, minimizing their spatial overlap. This is expected to increase the recombination lifetime of the electron. To gain insight into this effect, we present the band edge electron and hole states for a stacked row geometry in Fig. 42 with = 1 nm, and = 1 ML (a) and 5 ML (b). The hole state is located inside the center QD of the bottom row, and appears confined to the bottom of the QD. Most electron states on the other hand, are distributed along the top row in the stacked structure. The states appear to oscillate between the center QD and the QDs along the edges of our geometry. In Fig. 42 (a), we observe that the electron ground state (3.405 eV) is located at the top of the center QD. The first and second excited states with energies 3.438 eV are each located in one of the edge QDs. Thus, the electron’s energy along the edges is greater by 25 meV. In fact, the higher order states follow the same pattern. It is reasonable for this to occur since the electron experiences a lower electric field in the center of the QD due to shielding from the outer QDs. Shielding is expected to be especially strong along the direction, since this is the orientation of . In Fig. 43 (b) we plot the potential energy due to internal polarization of a stacked sheet along the direction with = = 31 nm having = 1 nm and = 1 – 5 ML. Portions with negative (positive) slopes represent the potential energy inside the QD (substrate). It is apparent that the structure with = 1 ML (dot-dash curve) provides the most shielding to the center QD. This QD has the lowest internal electric field. We have also analyzed these curves for higher values of , as well as for the stacked row geometry and found them to be very similar to Fig. 137 43 (b). As increased, the extreme values of those curves decreased by up to 150 meV, with the stacked rows showing slightly less shielding than the stacked sheets. Thus, this demonstrates the strong shielding effect in the direction when QD layers are grown within 1 ML of each other and confirms the dramatic effect of high internal polarization on the transition energies of QD superlattices. Our results are consistent with the findings in Ref. 106 where transition energies of 2.95 eV were observed. Our results predict that decreasing further would allow green light emission. The electron-hole overlap, however, is expected to be small since electrons and holes are pushed toward the top and bottom of the QD stack, minimizing the probability of a radiative transition. Nonetheless, in Fig. 42 (b), we plot the electron and hole states for the stacked row with = 5 ML. Here we observe some of the excited states occurring in the bottom and middle rows, which may provide transitions with shorter recombination lifetimes. This result is somewhat surprising, because shielding in the center row is not as strong as in Fig. 42 (a), yet the electron and holes are in closer proximity. However, the potential energy in this case is more uniform than in Fig. 42 (a). Additional observations of the spatial distribution of electron and hole states in other geometries are required to understand this behavior. In summary, we have demonstrated the successful application of 8-band theory to characterize the hole and electron energy levels of different configurations of multiple GaN (WZ) QDs. We have quantified contributions from the strained substrate to the transition energy (< 5 meV), piezoelectric polarization (~75 meV), and spontaneous polarization (~450 meV). With spontaneous polarization along the growth direction being the dominant factor, we find that the electronic transition energies of QD layers stacked within 1 ML are redshifted 138 by a total of 1 eV compared to the bulk material (3.44 eV), indicating that emission in the green wavelength range (500 nm) for LED applications is possible. We observe that spatial overlap of hole and electron states appears small due to the built-in spontaneous electric field. Nonetheless, we propose to quantify the overlap and corresponding device efficiency, as well as to study additional QD configurations whose electronic states may benefit from spatial alignment due to the strong internal field. The polarization is an additional parameter for tuning the optical response of such nanostructures. Our approach is in agreement with previously observed theoretical and experimental observations for optical transition energies and can be used to characterize more complex geometries that approach realistic 2-D and 3-D superlattices. 139 FIG. 39. Geometries of QD configurations investigated in this section – row (a), sheet (b), stacked sheet (c) and stacked row (d). The Cartesian axes are also shown. d 1 d 1 d 1 d 2 h b t (a) (b) (c) (d) x y z x z 31 nm 31 nm 31 nm FIG. 40. Strain profiles for row of 3 QDs along the direction and = 31 nm (Fig. 39 (a)) Strain profiles for row of 3 QDs along the direction and = 31 nm ( (a)) 140 FIG. 41. (a) – (d) Conduction ( ), valence ( ) and band gap ( ) energies for 3 x 1 rows (Fig. 39 (a), square markers) and 3 x 3 sheets (Fig. 39 (b), diamond markers) with varying . Circle markers represent energies of single QD. (a) = 0, (b) Deformation potential = 0 for AlN and = 0, (c) = 0; (d) full Hamiltonian. (e) , , and for stacked rows and layers (Fig. 39 (c)-(d)) vs. . Markers represent row or sheet stack and value: = 5 ML row (sheet) => diamond (cross); = 3 ML row (sheet) => circle (triangle); = 1 ML row (sheet) => square (star). (a) (b) (c) (d) (e) 141 FIG. 42. Isosurfaces of probability density for the top valence band hole state and conduction band electron states for a 3 stacked row structure (Fig. 39 (d)) having = 1 ML and = 1 ML (a) and 5ML (b). 0.802 eV 3.405 eV 3.438 eV 3.438 eV 3.519 eV 3.553 eV 3.580 eV 3.612 eV 3.622 eV 3.657 eV 3.692 eV 3.717 eV (a) 0.532 eV 3.623 eV 3.638 eV 3.638 eV 3.744 eV 3.760 eV 3.809 eV 3.815 eV 3.826 eV 3.850 eV 3.867 eV 3.905 eV (b) 142 FIG. 43. (a) Band-tilting in the presence of internal polarization in the QD and substrate ( and ) respectively. Bound charges accumulate at the QD interfaces which attract the electron and hole to opposite sides. This leads to lower transition energy on the right side (QCSE) than on the left side. (b) Calculated potential energy ( ) for stacked sheets with = 1 – 5 ML and = 1 nm along the direction with = = 31 nm. ∆E ∆E Energy Position (z) QD QD P QD P sb P sb + + + - - - + + + - - - - + - + QCSE (a) (b) 143 4. Further Work 4.1 Novel Geometries and Configurations It was demonstrated that QDs can undergo a higher temperature transition to form self- assembled quantum rings as illustrated in Fig. 44 2,109-115 . Using our model, we will also analyze the dependence of critical size and electronic structure on various parameters such as inner and outer diameter ( and ) as well as different configuration of elliptically-shaped QDs with major and minor axes ( and ). 4.2 Second Order Piezoelectric Effects on Electronic Structure of Quantum Dots Our current work has shown that the piezoelectric effect does not significantly alter the shape of the bands in our InAs/GaAs system. Nevertheless, this effect plays a significant role in FIG. 44. Quantum ring (QR) geometries showing relevant parameters for inner and outer radii as well as major and minor axes for elliptical shapes. Photo taken from Ref. 2. Application to Quantum Rings Quantum Rings (QR) B. Lee, C. Lee, Nanotech. 15 848 (2004). a 1 a 2 a 2 a 1 b 1 b 2 • What are the critical dimensions (inner/outer diameter) of QRs? 144 more polar materials such as GaN. Our model can be expanded to study the second order contribution of the polarization 30 by calculating : where , and are elastic parameters along different crystal axes. We note that the second order polarization will depend on the product of shear strain with hydrostatic strain components. The second order polarization can be computed in a straightforward fashion from our displacement vector u, and subsequently added to the first order polarization to solve for the piezoelectric potential . 4.3 Absorption Spectra Calculation for High Efficiency Solar Cells Using QD Superlattices Using our model, it is possible to explore the design space as a function of dot size, shape, composition, inter-dot spacing, QD configuration, and polarization with the goal of identifying QD systems that can serve as highly efficient absorption media in solar cells. Our model allows us to calculate the energy levels and wavefunctions of different QDs, and these will be used to derive the transition dipole moments 3 as: 145 From these, we will compute the absorption coefficients which provide insight into the probability of transitions between different electronic states and ultimately determine the absorption efficiency at specific wavelengths. 4.4 Application to Intermediate Band Solar Cells Recently, a new application of quantum dots for enhanced efficiency of solar cells has emerged – the intermediate band solar cell (IBSC) 12,116-122 . It was proposed that QD layers embedded inside the absorbing medium of semiconductor cells can provide an additional energy band between the valence and conduction bands, which can enable photons with energies below the band gap to be effectively converted to current. The IB should be only half- filled to allow absorption of valence band electrons. Theoretically it is predicted that such a device could increase the limiting efficiency of a single gap cell from 40.7% to 63.2%. To achieve such high efficiencies, the intermediate band must also meet several other criteria 3 , namely: (1) Transitions from VB → IB and IB → CB must be optically allowed and strong, thus, the IB medium must have large dipole matrix elements. (2) In order to achieve maximal gains in quantum efficiency, the ideal VB → IB and IB → CB absorption spectra should not have any overlap, such that each transition requires a photon with unique wavelength (“photon sorting”). (3) From calculations of concentrated light, the ideal position of the IB should be 𝑒 and 𝑒 . 146 FIG. 45. (a) Electronic energy levels of an intermediate band solar cell (IBSC); (b) Desired electronic transitions for high efficiency; (c) Deleterious transitions to be minimized. Photo taken from Ref. 3. In addition, several undesirable effects must be minimized: 1. the capture of CB electrons or VB holes by the IB, 2. the thermal escape of either IB electrons to the CB, or of IB holes to the VB, 3. the nonradiative recombination of IB electrons with VB holes, and 4. the recombination of CB electrons with VB holes. Quantum well layers with impurity levels have been considered as possible candidates capable of providing an intermediate band with the desired characteristics. However, radiative recombination is expected to dominate when the wave functions of electrons in the IB are delocalized, as they are in the VB and CB. For this reason, the strong confinement effect of QDs has elevated them to the forefront of research in this area. The zero dimensional (0D) nature of the QDs favors the isolation of the IB from the CB because the sharp density of states prevents 147 fast recombination of electrons from the CB to the narrow energies in the IB known as the phonon bottleneck. Our model allows us to explore a design space to evaluate how well different QD systems meet the desired criteria for intermediate band structures outlined above. If we assume the IB forms from the lowest electron level in the QD, our model allows us to calculate the corresponding absorption coefficients 3 from: Recently, an attempt was made to design an IBSC with the desired criteria for high efficiency gains using In 0.47 Ga 0.53 As QDs with lens-like shapes, diameter = 40 nm, and height = 4 nm, separated by 12.4 nm grown on a GaAs substrate. However, the researchers concluded that the QD layers do not provide the necessary electronic structure and absorption spectra to make them suitable for this application and alternative QD systems must be studied for IBSC applications. The model developed as part of this thesis work will enable an in-depth understanding of the individual effects from size, shape, polarization, and composition on electronic structure of QDs which will help identify materials with the desired criteria. Knowledge of the decoupled mechanisms that affect band structure will also allow us to develop strain engineering techniques 24-25,97,99,123 to precisely control absorption characteristics 148 of QD systems. With this capability we will be in a position to design QD systems with the necessary energy levels to provide a new highly efficient technology for solar cell applications. 149 Appendix A. User’s Manual A.1 Electronic Structure Calculation with Strain Suzana Sburlan suzana.sburlan@gmail.com This manual will describe the various steps and procedures needed to successfully calculate the strain, piezoelectric potential, eigenvalues, and eigenvectors for systems of cubic or hexagonal crystals using the FORTRAN program partest5.f90 or partest5hex.f90 which use the MPI interface to compute strain on multiple computing nodes at the High Performance Computing Center (HPCC) of University of Southern California (USC). The conjugate gradient (CG) algorithm 51 is implemented to minimize the strain energy of a cubic lattice 48 or hexagonal lattice 49 . The general minimized residual method (gmres) is implemented to solve for the first order 30 polarization potential induced in the cubic crystal 54 or hexagonal crystal 49 due to strain. The Hamiltonians from Ref. 54 and Ref. 37 are implemented cubic and hexagonal crystal respectively. The program calculates the strain of 2-material systems induced from their lattice mismatch, or systems with up to 2 materials where some regions have free surfaces (i.e. are exposed to vacuum). 1) Log in to your hpcc account and open the folder where parstrain1.f90 is saved. Partest5.f90 requires the following additional files in the same folder: a) partest5_mod.f90 b) geometries.f90 c) parstrain.f90 d) gmrespiezVA.f90 e) piezham.f90 f) strainhamphys.f90 g) lsf_eig.f Partest5hex.f90 requires the following additional files in the same folder: h) parest5hex_mod.f90 i) geometries.f90 j) hexstrain.f90 k) gmreshexVA.f90 l) piezham.f90 m) strainhamphyshex.f90 n) lsf_eig.f 2) Open the parameter module file partest5_mod.f90 or parest5hex_mod.f90 with a text editor. For example, using vi type at the command prompt: vi partest5_mod.f90 150 This file defines all global variables for parstrain1.f90. a) SIMULATION WINDOW PARAMETERS i) Set the control parameters: (1) calcstrain = 1 to compute the strain, otherwise only confinement effects are included in the Hamiltonian (2) out_geo = 1 to output the geometry files (3) calc_piez = 1 to calculate piezoelectric and spontaneous polarization, this part is excluded from the Hamiltonian (4) out_piez = 1 to output the polarization potential result in output files (5) out_wf = 1 to output wavefunctions in output files ii) Set NX, NY, NZ as the number of grid points in each direction per processor. The total number of grid points will depend on the number of processors assigned to each dimension (see step 1), a), iii)). iii) The vproc(3)=(px, py, pz) array indicates the number of processors assigned in the x, y, and z direction of the simulation window. If one dimension is longer, choose more processors in that direction. iv) Set the physical dimensions of the simulations window as LX, LY, and LZ (input lengths in nanometers, then divide by BOHR = the Bohr radius of an atom). This will define a grid spacing such that (dx, dy, dz) = (Lx/(px*Nx), (Ly/(py*Ny), Lz/(pz*Nz)). It is recommended that the grid spacing be set equal to the lattice constant of the reference material (e.g. GaAs), although multiples of this lattice constant are also allowed (see step 2), b), ii)). v) The parameters NBAND, NWF, and NPLOT define the size of other vectors in the program, but only the last 2 may be modified. (1) NBAND = 8 should not be changed (2) NWF defines the number of iterations carried out by the Lanczos algorithm and may be changed. We have observed that the eigenvalues converge after NWF defines the number of iterations carried out by the Lanczos algorithm and may be changed. We have, but to achieve smooth wavefunctions, it is necessary for NWF ≥ 500. (3) NPLOT is the number of wavefunctions to output to file. The program automatically finds the band gap from the eigenvalues, then computes the NPLOT/2 top valence band wavefunctions and NPLOT/2 bottom conduction band wavefunctions. vi) Set sbcx, sbcy, sbcz as boundary conditions of the strain variables xv, Cmatrix, and lattice in x, y, and z. For partest5hex.f90, these are also applied to e15, e33, e31, and for partest5.f90 these are applied to e14. (1) ‘pbc’ = periodic boundary conditions – edge values on one side of simulation window are the same as edge values on the opposite side (2) ‘fbc’ = fixed boundary conditions – edge values are zero on both sides of simulation window (3) ‘tbc’ = transpare nt boundary conditions – edge values are equal to the values of their neighbors on each side of the simulation window 151 vii) Set bcx, bcy, bcz as boundary conditions of the material parameters in the Hamiltonian in x, y, and z. For partest5hex.f90 these are the matrices A1, A2, A3, A4, A5, A6, P1, P2, and for partest5.f90 these are the matrices gam1, gam2, gam3, P0. (1) ‘pbc’ = periodic boundary conditions – edge values on one side of simulation window are the same as edge values on the opposite side (2) ‘fbc’ = fixed boundary conditions – edge values are zero on both sides of simulation window (3) ‘tbc’ = transparent boundary conditions – edge values are equal to the values of their neighbors on each side of the simulation window b) GEOMETRY DEFINITION i) Enter the name of the geometry subroutine as a string (see geometries.f90). ii) Enter the names of mat1, mat2, and mat3 as strings according to the definition in geometries.f90. Let mat2 or mat3 correspond to the reference material (e.g. GaAs). The material mat1 must be set to correspond to the non-reference material (e.g. InAs). The strain for mat1 is then calculated relative to this grid spacing (defined from the reference material’s lattice constant). At the end of the simulation, the strain in mat1 is adjusted to get the physical strain 48 See Suzana Sburlan’s Ph.D. Thesis for further discussion. iii) Variables L1X – tpw8 are variables that appear in the geometry subroutines contained in the file geometries.f90. Once a geometry has been chosen by the user as in step 2), b), i), the user must specify the dimensions of all variables required in the subroutine. For example, the subroutine pped in geometries.f90 models a quantum dot (QD) with rectangular shape, which uses L1X, L1Y, and L1Z to set the lengths of each side. When simulating this geometry, the user must specify the desired values for L1X, L1Y, and L1Z in the parameter module file. All other variables will not be used for this simulation. c) SPECIFY OUTPUT FILE NAMES i) Enter strings for the file names containing output results. It is recommended that the user choose output files with the same name, except for the extension, such that the name distinguishes this simulation from others. For example, to calculate strain for a rectangular shaped QD with side lengths of 5, 6, and 7 nm, we can name the output files ‘QDcube_x5_y6_z7.str’, ‘QDcube_ x5_y6_z7.geo’, and ‘QDcube_ x5_y6_z7.pez’, ‘QDcube_ x5_y6_z7.eig’, ‘QDcube_ x5_y6_z7.wav’. ii) The strain will be output in 6 different files per processor, each containing the one strain component of the cubic lattice - , , over the region of space assigned to that processor. The strain components are unitless. iii) The piezoelectric potential will be output in a single file per processor containing the value at each grid point in the simulation. Units are in eV. iv) The geometry file will output by each processor as the value of the global variable Cmatrix(1,:,:,:). This represents the elastic strain parameter C 11 at each grid point in the simulation window. This data allows us to determine the material at each grid point. After running the simulation, we can write scripts using this file to calculate other quantities of 152 interest through planes or smaller regions in the simulation window. This allows us to easily extract information from the output files on the Unix cluster. v) The eigenvalues in eV from the last 20 iterations will be output as a single sequential output in the *.eig file for each run. For example, if NWF = 500, the eigenvalues from iterations 481 – 500 will be output in sequence. The *.eig file will have a total of 9810 lines. vi) The wav files will be output as NPLOT files per processor containing the probability density at each grid point. d) NUMERICAL ITERATION PARAMETERS 1) Corrit is the number of iterations before the CG algorithm corrects for roundoff error by recomputing the exact residual vector. This value does not usually need to be changed. For matrices of order ~n, corrit = is an appropriate value. For further explanation, see “ An Introduction to the Conjugate Gradient Method Without the Agonizing Pain” by Jonathan Richard Shewchuk (http://www.cs.cmu.edu/~quake-papers/painless- conjugate-gradient.pdf) p. 49. 2) Eps is the factor by which the error in the initial guess for the unknown vector must be reduced to complete the CG algorithm. The smaller the eps value, the longer the CG algorithm will have to run. The CG algorithm must be run with two values of eps to ensure that the strain energy has been minimized (instructions below). 3) Imax are the number of iterations after which the CG algorithm automatically stops. After completing a strain calculation, the user should check the .out file (see below). If the algorithm has completed imax iterations without reaching the desired reduction in error (determined by the eps value), the algorithm will exit with the message: “WARNING: maximum iteration depth reached: conjugate gradient”. In this case, the value of imax should be increased and the calculation repeated. 4) Restart is the number of iterations after which the gmres algorithm restarts. This does not need to be changed. For further explanation, see Suzana Sburlan’s Ph.D. thesis. 5) Epsgmres is the same as the eps variable, but for the gmres algorithm. 6) Imaxgmres is the same as the imax variable, but for gmres. After completing a strain calculation, the user should check the .out file (see below). If the algorithm has completed imaxgmres iterations without reaching the desired reduction in error (determined by the epsgmres value), the algorithm will exit with the message: “WARNING: maximum iteration depth reached: gmrespiez”. In this case, the value of imaxgmres should be increased and the calculation repeated. e) MATERIAL PARAMETER DEFINITIONS The user should define the material parameters of materials other than InAs, GaAs, and vacuum here. i) C11InAs, C12InAS, and C44InAs are the elastic strain constants of InAs taken from 26 and converted to eV/BOHR 3 . ii) The variable lcInAs is the lattice constant of InAs in nm/BOHR. iii) The variable e14InAs is the piezoelectric coefficient for InAs with units BOHR -2 . (The SI units are C/m 2 . To convert from SI to BOHR -2 , divide by the electron charge and multiply by (Bohr radius) 2 . 153 iv) The same parameters for InAs described in (1)-(3) above are defined analogously for GaAs. The user can include other material parameters in this section and modify the parstrain1.f90 file accordingly (see below). v) The elastic strain constants for vacuum are defined as C11vacu, C12vacu, and C44vacu to be several orders of magnitude smaller than those of InAs and GaAs. They are used when modeling structures with free surfaces. In this case, the convergence of the strain calculation should be verified by using two values of C11vacu, C12vacu, and C44vacu. For example, C11vacu = C12vacu = C44vacu = 1x10 -7 and C11vacu = C12vacu = C44vacu = 1x10 -8 . The lattice constant of vacuum is set to be the same as that of the reference material. f) Save the changes to patest5hex_mod.f90 or partest5_mod.f90 to a different file. It is recommended the same name be given to this new file as set for the output files in parstrain_mod.f90. For our example in step 2), c), i), the parameter module would be saved as ‘QDcube_ x5_y6_z7.f90’. It is important to save a copy of the module file for each simulation for later reference to the parameters used. 3) Compile the program with the new mod file. For example: mpif90 –o executable module_file partest5hex.f90 lsf_eig.f90 It is recommended that the executable have the same name as all other files with no extension. For example: mpif90 -o QDcube_x5_y6_z7 QDcube_x5_y6_z7.f90 partest5hex.f90 lsf_eig.f90 NOTE: Changes to software environment may be required to enable the compiler by editing the .cshrc file in the user’s root directory. (See Dr. Nakano for more information.) 4) The job must be submitted to the cluster as a batch job using the qsub command. A script file can be written to hold all commands that would be otherwise input at the command prompt when submitting a job. The user can write a script, or use submit_job.pbs as an example. a) On line 3 set the number of nodes and ppn (processors per node) values. The product of these two values is the total number of processors used. This value must be equal or greater than the total processors defined in step 2), a), iii) for the vproc array. If vproc = (2, 4, 8), the total number of processors is 2x4x8 = 64. Therefore, the user may request nodes=8:ppn=8. The processors per node on hpcc vary between 2 and 16, with 8 being most common (although this changes with each upgrade). See Dr. Nakano for more information on the allowed values of ppn. b) On line 4, set the walltime=hours:minutes:seconds as the amount of time your simulation will need to complete. The system will kill your job if it has not finished in this amount of time. The maximum setting is 24 hours. However, requesting this walltime every time will likely increase the amount of time your jobs spend waiting in the queue until they are assigned to nodes. It is recommended to request a shorter time, if known. The walltime will appear at the end of the .out when the simulation completes. Therefore, the user can use the walltime to estimate the required time for subsequent simulations. For simulations needing more than 24 hours, permission must be granted by the hpcc administrators to use hpcc’s long queue. 154 c) On lines 5, 7, and 11, input names for the .out file, the error file, and executable file. It is recommended that these all have the same name as the parameter module, e.g. QDcube_x5_y6_z7.out, QDcube_x5_y6_z7_run, and QDcube_x5_y6_z7 respectively. The .out file will capture information about the nodes carrying out the calculation, the walltime to complete the iteration, as well as any other output messages from the program itself (partest5hex.f90 or partest5.f90). An error file will also be output for each run with an .e###### extension (e.g. QDcube_x5_y6_z7_run.e7382357). This file captures any run time errors output by the processors. If the simulation does not complete normally, checking the error file and .out file will reveal the problem. d) On line 8, set WORK_HOME as the location of your executable. This will also be where all output files get written. If the job cannot find the executable in the specified folder, it will kill the job and print an error message to the error file. 5) Save the .pbs file with a name that allows easy tracking later. It is recommended that the same file name be used as the other files, e.g. QDcube_x5_y6_z7.pbs. 6) Submit the job by typing at the command prompt: qsub pbs_file For example: qsub QDcube_x5_y6_z7.pbs 7) How to write a new geometry subroutine: a) The file geometries.f90 includes all the geometries currently in use. To define a new geometry, write a new subroutine at the bottom of this file. Follow the examples of the other subroutines to define the materials that correspond to each region in your simulation window. The program currently has several variables L1X – tpw8 defined in the module files (partest5_mod.f90 and partest5hex_mod.f90) which the user may use to define the parameters in the new system – e.g. diameter, height, angle, etc. The user can choose any of these variables when writing the new geometry subroutine. The user can also define new variables, but must declare them in the module files. Care must be taken to avoid name conflict with other variables. b) The subroutine setup must be modified in partest5.f90 or partest5hex.f90 to include the name of the new geometry. In the case structure for geometry (SELECT_CASE (geometry(1:nlen))), add the call to the new subroutine following the examples of the other geometries. 8) How to concatenate output files: For the electronic band structure calculation, the output files will correspond to 6 strain components as described in step 2), c), ii), one geometry file, one polarization potential file, and NPLOT wavefunction files. Each processor that participates in the simulation will output its own file corresponding to its region of space. Each file will contain 4 columns corresponding to the x, y, and z grid points (integers) relative to the entire simulation window. The names of the output files will be numbered with the processor number in the extension. For example: QDcube_x5_y6_z7.str000031 contains the strain values calculated by processor 0, while QDcube_x5_y6_z7.str000131 contains the strain values calculated by processor 1. Similarly, QDcube_x5_y6_z7.geo0000 contains the geometry information for the region of space assigned to 155 processor 0; QDcube_x5_y6_z7.pez0000 contains the polarization potential for the region of space assigned to processor 0. Since there are multiple wavefunctions output per run, each file will be tagged with the number that corresponds to the eigenvalue of that wavefunction before the processor number. For example QDcube_x5_y6_z7.wav32600 is processor 0’s wavefunction corresponding to the 326th eigenvalue output on the last iteration of the Lanczos algorithm. For each strain component, geometry, polarization, and wavefunction file, the files from all processors must be concatenated into a single file and sorted by x, y, and z coordinates. The user may write a script to accomplish this, or use the file catstr.pbs, catgeo.pbs, catpez.pbs, and catwav_100_1.pbs. The files must be edited according to the names of the files to concatenate, and the number of files for each component (corresponding to the number of processors). The files may be sorted in any order desired – e.g. first by x, then by y, then, by z, or z, then by y, then, by x. The latter is useful for looking in a file along the growth direction z. For example, if we wanted to verify how many monolayers a QD in our specified geometry has, it is straightforward to open the geometry file sorted by z first, then scroll along the growth direction through the center of the QD and verify its height. However, as we shall see for visualization purposes, we must sort by x first to make the file compatible with our visualization software Visit (Lawrence Livermore National Lab). In the script files catstr.pbs, catgeo.pbs, catpez.pbs, and catwav_100_1.pbs, the sorting is accomplished by the line: sort –n –k 3 –k 2 –k 1 “$i””$j””$k”.wav”$wav” –o temp8 This corresponds to x first sorting while: sort –n –k 1 –k 2 –k 3 “$i””$j””$k”.wav”$wav” –o temp8 a) STRAIN FILES i) Open catstr.pbs ii) On line 2, enter the file path of the files to concatenate. iii) On lines 4-11, enter the name of the files to concatenate in a nested for loop. For example: for i in QDcube_x5 do for j in _y6 do for k in z7 do This is useful when concatenating the results of many simulations that have similar names due to similar parameters. For example, if 3 simulations of a rectangular QD were carried out while varying the side length in z to be 7, 8, and 9 nm, the script could be used to concatenate all the results sequentially using the for loop by writing: 156 for i in QDcube_x5 do for j in _y6 do for k in z7 z8 z9 do iv) If the number of processors is 9 or fewer, enter this value on line 16. For example, for 8 processors enter: for ((myid = 1; myid <=8; myid++)) Comment out the next 2 for loops (lines 25-41) by adding a # at the beginning of each line. If the number of processors is 99 or fewer, enter this value on line 25. For example, for 64 processors enter: for ((myid = 1; myid <=64; myid++)) Comment out the next for loop (lines 34-41) by adding a # at the beginning of each line. Otherwise, enter the number of processors on line 34 and leave all 3 for loops uncommented. Make sure the first two loops are number 1-9 and 1-99. v) To run catstr.pbs, it must be submitted as a job requesting a single node to hpcc. Type the following command at the prompt: qsub –A account_name –l walltime = hours:00:00 –l nodes=1 –o output_file –N error_file pbsfile For example: qsub –A lc_an –l walltime = 05:00:00 –l nodes=1 –o catstr.out –N catstr_run catstr.pbs b) GEOMETRY FILES i) Open catgeo.pbs ii) On line 2, enter the file path of the files to concatenate. iii) On lines 4-9, enter the name of the files to concatenate in a nested for loop as in step 9), c). iv) If the number of processors is 9 or fewer, enter this value on line 14. For example, for 8 processors enter: for ((myid = 1; myid <=8; myid++)) Comment out the next 2 for loops (lines 23-39) by adding a # at the beginning of each line. If the number of processors is 99 or fewer, enter this value on line 23. 157 Comment out the next for loop (lines 32-39) by adding a # at the beginning of each line. Otherwise, enter the number of processors on line 32 and leave all 3 for loops uncommented. Make sure the first two loops are number 1-9 and 1-99. v) To run catgeo.pbs, it must be submitted as a job requesting a single node to hpcc. Type the following command at the prompt: qsub –A account_name –l walltime = hours:00:00 –l nodes=1 –o output_file –N error_file pbsfile For example: qsub –A lc_an –l walltime = 05:00:00 –l nodes=1 –o catgeo.out –N catgeo_run catgeo.pbs. c) PIEZOELECTRIC POTENTIAL FILES To concatenate the pez files open catpez.pbs and follow the analogous steps for the geometry file listed in b) above. d) WAVEFUNCTION FILES i) Open catwav_100_1.pbs ii) On line 3, enter the file path of the files to concatenate. iii) On lines 5-10, enter the name of the files to concatenate in a nested for loop. For example: for i in QDcube_x5 do for j in _y6 do for k in z7 do iv) On line 12, enter the tag numbers of the wav files. For example: for ((wav = 326; wav <= 345; wav++)) We note here these tags will vary for each geometry calculated, because each system will have its own energy levels. Because of this, it may not be possible to concatenate the wav files from different systems in the same script file. For example, two different geometry outputs from QDcube_x5_y6_z7.f90 and QDcube_x5_y6_z10.f90 may have wavefunctions corresponding to eigenvalues 326 – 345 and 319 – 338 respectively. The user should check these numbers before concatenating, and if necessary write different scripts to concatenate each one separately. This can be accomplished by making different versions of catwav_100_1.pbs, but care must be taken to also change the name of the temporary file 158 (e.g. temp8) used in each version. This will enable running the different script versions simultaneously, otherwise the scripts will overwrite each other’s temporary file s. v) Follow the analogous steps in part a) to enable or disable the loops depending on the number of processors used. vi) To run catwav_100_1.pbs, it must be submitted as a job requesting a single node to hpcc. Type the following command at the prompt: qsub –A account_name –l walltime = hours:00:00 –l nodes=1 –o output_file –N error_file pbsfile For example: qsub –A lc_an –l walltime = 05:00:00 –l nodes=1 –o catwav_100_1.out –N catgeo_run catgeo.pbs. vii) If the wavefunction tags are < 100, for example QDcube_x5_y6_z7.wav036000, a slightly different version of the catwav_100_1.pbs should be used. An example is found in catwav_10.pbs. 9) How to interpret output files: a) EXTRACTING DATA SUBSETS Once all output files are concatenated and sorted, the user will have 6 strain component files, 1 geometry file, 1 polarization file, 1 eigenvalue file, and NPLOT wavefunction files for each geometry. The user can write additional program to extract the strain components through a plane, or calculate other quantities of interest. Examples of this are the program files strain_all.f90 and strain_allhex.f90. These programs calculate the total strain energy through a region of the simulation window having a square column shape as defined by the user at the top. Note that these particular programs assume the file to be sorted first by x, then y, then z. They can be compiled and run locally and use the module files the desired geometries. An example of how to compile and run would be: pgf90 –o executable module_file strain_all.f90 ./executable For example: pgf90 -o QDcube_x5_y6_z7_sa QDcube_x5_y6_z7.f90 strain_all.f90 ./ QDcube_x5_y6_z7_sa Individual strain components in a desired region can also be output to a file using the program getstrain.f90. In addition, getpez.f90 prints the polarization potential in a user specified region of space. 159 These smaller files containing subsets of data can then be imported directly into Matlab for plotting. b) VISUALIZATION IN 3-D We have used the free software Visit (https://wci.llnl.gov/code/visit) for 3-D rendering of our data throughout this work. It has been useful to plot 3-D isosurfaces for the wavefunctions and polarization potentials, as well as for plotting the geometry data in order to verify correctness. This has been accomplished using the “Contour Plot” feature. Visit takes as input a file in “silo” format. This means all output text files to be plotted must first be converted to “silo”. We use the utility txt2silo for this purpose. This program takes as input the desired concatenated and sorted text file such as QDcube_x5_y6_z7.wav326 and outputs a silo file with the same name as the input but having “qm -“ in the front. For example: qm - QDcube_x5_y6_z7.wav326. Txt2silo additionally expects the file to contain 2 lines at the top that indicate the number of grid points in each direction. If we want to automatically convert to silo, we must concatenate these two lines at the beginning of each data file. Text to silo conversion has been included in all of the concatenation scripts mentioned above – catgeo.pbs, catpez.pbs, catwav_100_1.pbs, and catstr.pbs. This is accomplished after each file is concatenated and sorted. The 3 lines that carry out the conversion are, for example: cat dns.dat “$i””$j””$k”.wav”$wav” > temp8 mv temp8 “$i””$j””$k”.wav”$wav” /home/rcf-40/sburlan/txt2silo “$i””$j””$k”.wav”$wav” The first line above adds the contents of the dns.dat file to the top of the data file. The dns.dat file should contain 2 lines specifying the number of grid points as well as starting and ending values in the x, y, and z directions in the following format: 1 Nx Ny Nz 1 start_x start_y start_z end_x end_y end_z dns tmp For example: 1 200 200 32 1 1.000 1.000 1.000 62.24 62.24 15.94 dns tmp The numbers 1.000 1.000 1.000 62.24 62.24, and 15.94 will be applied by Visit as units along the plot axes. Here we have specified them in nanometer. The “qm -“ file output by txt2silo can be directly opened by Visit. 10) How to check for convergence: It is highly recommended that the convergence of the CG algorithm be checked. The user defined eps value determines how many iterations the program will carry out to reduce the initial error by 160 the specified amount. However, it does not guarantee that the strain energy is minimized to within a certain order of magnitude. Therefore, it is recommended that each simulation be repeated with at least 2 different values of eps, e.g. eps = 1x10 --3 and eps = 1x10 —4 . For each simulation, the total strain energy in the system can be computed based on the output strain components and the resulting values can be compared for agreement. It has been observed that systems of embedded QDs require fewer iterations to converge (and therefore larger allowed values for eps), while systems with free surfaces or completely rigid materials require more iterations. In the former case, the amount of QD material is relatively small compared to the surrounding substrate. The convergence should be carefully monitored when the number system contains nearly equal amounts of each material. The variable epsgmres should also be varied in this way to ensure convergence of the gmres iteration for piezoelectric potential. 11) How to include other material parameters: The required material parameters for InAs and GaAs are currently included in parstrain_mod.f90. To use additional materials, the user must define these in parstrain_mod.f90. Additionally, the user must edit the subroutine setup in parstrain1.f90. There are 3 SELECT CASE structures where the user must include a case for the new material: SELECT_CASE (mat1) SELECT_CASE(mat2) SELECT_CASE(mat) Include the reference to the new material following the examples of the materials already present. A.2 Strain and Polarization Only Calculation In addition to partest5hex.f90 and partest5.f90, we have implemented two additional versions of the code that compute only strain and polarization effects. These are hexstrain1.f90 and parstrain1.f90. These standalone programs only require definition of the material parameters needed for strain and polarization and exclude the additional ones used in the Hamiltonian diagonalization. This allows us to save memory. 12) The program parstrain1.f90 can be used as follows: (a) The program requires the following additional files in the same folder: (i) parstrain1_mod.f90 (ii) geometries.f90 (iii) parstrain.f90 (iv) gmrespiezVA.f90 (v) lsf_eig.f 161 (b) Follow the same steps outlined in the previous section in the parstrain1_mod.f90 file to define the parameters that apply to strain and polarization. 13) Compile the program using: mpif90 –o executable module_file parstrain1.f90 lsf_eig.f90 14) Run the program using the a pbs file and the qsub command as outlined in the previous section. 15) The program hexstrain1.f90 can be used as follows: (c) The program requires the following additional files in the same folder: (i) hexstrain1_mod.f90 (ii) geometries.f90 (iii) hexstrain.f90 (iv) gmreshexVA.f90 (v) lsf_eig.f 16) This program is compiled and run analogously to parstrain1.f90 above. 162 5. References 1 Q. H. Xie, A. Madhukar, P. Chen, and N. P. Kobayashi, Physical Review Letters 75, 2542 (1995). 2 B. C. Lee and C. P. Lee, Nanotechnology 15, 848 (2004). 3 V. Popescu, G. Bester, M. C. Hanna, A. G. Norman, and A. Zunger, Physical Review B 78, 205321 (2008). 4 R. Agarwal, C. J. Barrelet, and C. M. Lieber, Nano. Letters 5, 917 (2005). 5 M. H. Huang, S. Mao, H. Feick, H. Q. Yan, Y. Y. Wu, H. Kind, E. Weber, R. Russo, and P. D. Yang, Science 292, 1897 (2001). 6 T. Bryllert, W. Lars-Erik, F. E. Linus, and L. Samuelson, IEEE Electron Device Letters 27, 323 (2006). 7 P. Nguyen, N. H. T., T. Yamada, M. K. Smith, J. Li, J. Han, and M. Meyyappan, Nano Letters 4, 651 (2004). 8 F. Patolsky, B. P. Timko, G. H. Yu, Y. Fang, A. B. Greytak, G. F. Zheng, and C. M. Lieber, Science 313, 1128640 (2006). 9 J. Xiang, W. Lu, Y. Hu, Y. Wu, H. Yan, and C. M. Lieber, Nature Letters 441, 490 (2006). 10 V. C. Elarde, T. S. Yeoh, R. Rangarajan, and J. J. Coleman, Journal of Crystal Growth 272, 148 (2004). 11 V. B. Verma and J. J. Coleman, Applied Physics Letters 93, 111117 (2008). 12 G. D. Wei, K. T. Shiu, N. C. Giebink, and S. R. Forrest, Applied Physics Letters 91, 223507 (2007). 13 M. Ichimura, Phys. Stat. Sol. (a) 153, 431 (1996). 14 A. Sasaki, Thin Solid Films 267, 24 (1995). 15 D. A. Faux, G. Jones, and E. P. Oreilly, Modelling and Simulation in Materials Science and Engineering 2, 9 (1994). 16 Ovid'ko, Phys. Rev. Lett. 88, 046103 (2002). 17 B. Nikoobakht and A. Herzing, ACS Nano 4, 5877 (2010). 18 H. J. Chu, T. W. Yeh, L. Stewart, and P. D. Dapkus, Phys. Status Solidi C 7, 2494 (2010). 19 K. Tomioka, Journal of Materials Research 26, 2127 (2011). 20 V. G. Dubrovskii, N. V. Sibirev, G. E. Cirlin, J. C. Harmand, and V. M. Ustinov, Phys. Rev. E 73, 021603 (2006). 21 J. C. Harmand, G. Patriarche, N. Pere-Laperne, M.-N. Merat-Combes, L. Travers, and F. Glas, Applied Physics Letters 87, 3101 (2005). 22 Y. Wu and Y. P., J. Am. Chem. Soc. 123, 3165 (2001). 23 L. D. Contreras-Pulido and R. Aguado, Physical Review B 77, 155420 (2008). 24 L. Seravalli, P. Frigeri, M. Minelli, P. Allegri, V. Avanzini, and S. Franchi, Applied Physics Letters 87, 063101 (2005). 25 L. Seravalli, M. Minelli, P. Frigeri, S. Franchi, G. Guizzetti, M. Patrini, T. Ciabattoni, and M. Geddo, Journal of Applied Physics 101, 024313 (2007). 26 I. Vurgaftman, J. R. Meyer, and L. R. Ram-Mohan, Journal of Applied Physics 89, 5815 (2001). 27 V. Fonoberov, E. P. Pokatilove, and A. Balandin, J. Nanosci. Nanotech. 3, 253 (2003). 28 Q. Peng, X. Zhang, L. Hung, E. A. Carter, and G. Lu, Physical Review B 78, 054118 (2008). 29 O. Stier, M. Grundmann, and D. Bimberg, Physical Review B 59, 5688 (1999). 30 A. Schliwa, M. Winkelnkemper, and D. Bimberg, Phys. Rev. B 76, 205324 (2007). 31 N. Vukmirovc and S. Tomic, Journal of Applied Physics 103, 103718 (2008). 163 32 N. Vukmirovic, D. Indjin, V. D. Jovanovic, Z. Ikonic, and P. Harrison, Physical Review B 72, 075356 (2005). 33 J. Davies, The Physics of Low-Dimensional Semiconductors : An Introduction (Cambridge University Press, 1999). 34 E. O. Kane, J. Phys. Chem. Solids 1, 249 (1957). 35 J. M. Luttinger, Kohn, W., Physical Review 97 (1955). 36 C. Pryor, Phys. Rev. B 57 (1998). 37 S. L. Chuang and C. S. Chang, Phys. Rev. B 54, 2491 (1996). 38 M. Suzuki and S. Uenoyama, 1995 52, 8132 (1995). 39 P. Lowdin, Journal of Chemical Physics 19 (1951). 40 J. K. Cullum and R. A. Willoughby, Lanczos Algorithms for Large Symmetric Eigenvalue Computations, Vol. 1, 1 ed. (SIAM: Society for Industrial and Applied Mathematics, 2002). 41 W. E. Arnoldi, Q. Appl. Math. 9 (1951). 42 Y. Saad, Iterative Methods for Sparse Linear Systems, 2nd ed. (Society for Industrial and Applied Mathematics, Philadelphia, 2003). 43 D. Calvetti, L. Reichel, and D. C. Sorensen, Electronic Transactions on Numerical Analysis 2 (1994). 44 K. Wu and H. D. Simon, in Thick-Restart Lanczos Method for Symmetric Eigenvalue Problems, 1998. 45 B. A. Foreman, Phys. Rev. B 56 (1997). 46 K. I. Kolokolov, J. Li, and C. Z. Ning, Phys. Rev. B 68 (2003). 47 L. D. Landau and E. M. Lifshitz, Theory of Elasticity, 3 ed. (Butterworth-Heinemann Ltd., 1986). 48 C. Pryor, J. Kim, L. W. Wang, A. J. Williamson, and A. Zunger, Journal of Applied Physics 83, 2548 (1998). 49 V. Fonoberov and A. Balandin, J. Appl. Phys. 94, 7178 (2003). 50 Y. M. Liu, Z. Y. Yu, X. M. Ren, and Z. H. Xu, Chinese Physics B 17, 3471 (2008). 51 J. R. Shewchuk, An Introduction to Conjugate Gradient Method Without the Agonizing Pain (Carnegie Mellon University, Pittsburgh, 1994). 52 S. Sburlan, P. D. Dapkus, and A. Nakano, Journal of Applied Physics 111, 054907 (2012). 53 C. Wood and J. Debdeep, Polarization Effects in Semiconductors: From Ab Initio Theory to Device Applications (Springer, New York, 2008). 54 C. Pryor, Phys. Rev. B 57 (1998). 55 F. Qian, S. Gradecak, Y. Li, C. Y. Wen, and C. M. Lieber, Nano Letters 5, 2287 (2005). 56 Y. Qin, X. Wang, and Z. L. Wang, Nature Letters 451, 809 (2008). 57 M. T. Bjork, B. J. Ohlsson, T. Sass, A. I. Persson, C. Thelander, M. H. Magnusson, K. Deppert, L. R. Wallenberg, and L. Samuelson, Nano Letters 2, 87 (2002). 58 W. Wei, X. Y. Bao, C. Soci, D. Y., W. Z. L., and W. D., Nano Letters 9, 2926 (2009). 59 M. Gratzel, Nature 414, 338 (2001). 60 M. Law, L. E. Greene, J. C. Johnson, R. Saykally, and P. D. Yang, Nature Materials Letters 4, 455 (2005). 61 B. Nikoobakht, Chem. Mater. 19, 5279 (2007). 62 K. Tomioka, Y. Kobayashi, J. Motohisa, S. Hara, and T. Fukui, Nanotechnology 20, 145302 (2009). 63 M. S. Gudiksen, J. Wang, and C. M. Lieber, J. Phys. Chem. B 105, 4062 (2001). 64 K. Hiruma, M. Yazawa, T. Katsuyama, K. Ogawa, K. Haraguchi, M. Koguchi, and H. Kakibayashi, Applied Physics Reviews 77, 447 (1995). 65 L. E. Jensen, M. T. Bjork, S. Jeppesen, A. I. Persson, B. J. Ohlsson, and L. Samuelson, Nano Letters 4, 1961 (2004). 164 66 T. Kuykendall, P. J. Pauzauskie, Y. Zhang, J. Goldberger, D. Sirbuly, J. Denlinger, and P. D. Yang, Nature Materials Letters 3, 524 (2004). 67 I. Levin, A. Davydov, B. Nikoobakht, and N. Sanford, Applied Physics Letters 87, 103110 (2005). 68 T. Martensson, C. Patrik, T. Svensson, B. A. Wacaser, M. W. Larsson, W. Seifert, K. Deppert, A. Gustafsson, L. R. Wallenberg, and L. Samuelson, Nano Letters 4, 1987 (2004). 69 W. I. Park and G. C. Yi, Adv. Mater. 16, 87 (2004). 70 P. J. Poole, J. Lefebvre, and J. Fraser, Applied Physics Letters 83, 2055 (2003). 71 C. P. T. Svensson, W. Seifert, M. W. Larsson, L. R. Wallenberg, J. Stangl, G. Bauer, and L. Samuelson, Nanotechnology 16, 936 (2005). 72 M. A. Verheijen, G. Immink, T. de Smet, M. T. Borgstrom, and E. P. A. M. Bakkers, J. Am. Chem. Soc. 128, 1353 (2006). 73 J. B. Hannon, S. Kodambaka, F. M. Ross, and R. M. Tromp, Nature Letters 440, 69 (2006). 74 M. T. Borgstrom, G. Immink, B. Ketelaars, R. Algra, and E. P. A. M. Bakkers, Nat. Nanotechnol. 2, 541 (2007). 75 H. J. Fan, F. Bertram, A. Dadgar, J. Christen, A. Krost, and M. Zacharias, Nanotechnology 15, 1401 (2004). 76 B. Nikoobakht, S. Eustis, and A. Herzing, J. Phys. Chem. C 113, 7031 (2009). 77 B. J. Spencer and J. Tersoff, Applied Physics Letters 77, 2533 (2000). 78 Z. Zhong and Q. P. Sun, International Journal of Solids and Structures 39, 5753 (2002). 79 D. Hull and D. J. Bacon, Introduction to Dislocations (Butterworth-Heinemann, 2001). 80 K. Shiraishi, N. Oyama, K. Okajima, N. Miyagishima, K. Takeda, H. Yamaguchi, I. Tomonori, and O. Takahisa, Journal of Crystal Growth 237-239, 206 (2002). 81 E. Ertekin, P. A. Greaney, and D. C. Chrzan, Journal of Applied Physics 97, 114325 (2005). 82 K. Hiruma, M. Yazawa, T. Katsuyama, K. Ogawa, K. Haraguchi, and M. Koguchi, Applied Physics Reviews 77, 447 (1995). 83 S. Raychaudhuri and E. T. Yu, J. Vac. Sci. Technol. B. 24, 2053 (2006). 84 T. Akiyama, K. Sano, K. Nakamura, and T. Ito, Japanese Journal of Applied Physics 45, L275 (2006). 85 A. Braun, K. M. Briggs, and P. Boni, Journal of Crystal Growth 241, 231 (2002). 86 F. Glas, Physical Review B 74, 121302 (2006). 87 S. Raychaudhuri and E. T. Yu, J. Appl. Phys. 99, 114308 (2006). 88 B. Smith, P. Bjorstad, and W. Gropp, Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations (Cambridge University Press Cambridge, UK, 2004). 89 A. Kaw and M. Hess, Adequacy of Solutions (University of South Florida, Tampa, 2001). 90 R. People and J. C. Bean, Applied Physics Letters 47, 322 (1985). 91 J. Shi and X. Wang, J. Phys. Chem. C 114, 2082 (2010). 92 aaa 93 K. L. Kavanagh, Semiconductor Science and Technology 25, 024006 (2010). 94 N. Huang, C. Lin, and M. Povinelli, Journal of Optics 14, 024004 (2012). 95 X. Su, R. K. Kalia, A. Nakano, P. Vashista, and A. Madhukar, Applied Physics Letters 79, 4577 (2001). 96 F. Guffarth, R. Heitz, A. Schliwa, O. Stier, N. N. Ledentsov, A. R. Kovsh, V. M. Ustinov, and D. Bimberg, Physical Review B 6408, 085305 (2001). 97 B. C. Lee, S. D. Lin, C. P. Lee, H. M. Lee, J. C. Wu, and K. W. Sun, Applied Physics Letters 80, 326 (2002). 98 S. Tomic, P. Howe, N. M. Harrison, and T. S. Jones, Journal of Applied Physics 99, 093522 (2006). 99 K. M. Kim, Y. J. Park, S. H. Son, S. H. Lee, J. I. Lee, J. H. Park, and S. K. Park, Physica E-Low- Dimensional Systems & Nanostructures 24, 148 (2004). 165 100 M. Makeev and A. Madhukar, Physical Review Letters 86, 5542 (2001). 101 D. Bimberg, M. Grundmann, and N. N. Ledenstov, Quantum Dot Heterostructures (Wiley, West Sussex, 1999). 102 D. Bimberg, Semiconductor Nanostructures (Springer, Berlin, 2008). 103 P. Bhattacharya, M. Zhang, and J. Hinckley, 2010 97, 251107 (2010). 104 M. Zhang, A. Banerjee, C. S. Lee, J. Hinckley, and P. Bhattacharya, Applied Physics Letters 98, 221104 (2011). 105 M. Zhang, P. Bhattacharya, and W. Guo, Applied Physics Letters 97, 011103 (2010). 106 F. Widmann, J. Simon, B. Daudin, G. Feuillet, J. L. Rouviere, and N. T. Pelekanos, Phys. Rev. B 58 (1998). 107 F. Widmann, B. Daudin, G. Feuillet, Y. Samson, and J. L. Rouviere, J. Appl. Phys. 83 (1998). 108 S. Sburlan, P. D. Dapkus, A. Nakano, Applied Physics Letters 100, 163108 (2012). 109 B. Alen, J. Martinez-Pastor, D. Granados, and J. M. Garcia, Physical Review B 72, 155331 (2005). 110 J. A. Barker, R. J. Warburton, and E. P. O'Reilly, Physical Review B 69, 035327 (2004). 111 I. Filikhin, E. Deyneka, H. Melikyan, and B. Vlahovic, Molecular Simulation 31, 779 (2005). 112 B. Y. Jia, Z. Y. Yu, and Y. M. Liu, Modelling and Simulation in Materials Science and Engineering 17, 035004 (2009). 113 A. Lorke, R. J. Luyken, J. M. Garcia, and P. M. Petroff, Japanese Journal of Applied Physics Part 1- Regular Papers Short Notes & Review Papers 40, 1857 (2001). 114 F. Martins, B. Hackens, M. G. Pala, T. Ouisse, H. Sellier, X. Wallart, S. Bollaert, A. Cappy, J. Chevrier, V. Bayot, and S. Huant, Physical Review Letters 99, 136807 (2007). 115 J. Sormunen, J. Riikonen, T. Hakkarainen, M. Sopanen, and H. Lipsanen, Japanese Journal of Applied Physics Part 2-Letters & Express Letters 44, L1323 (2005). 116 L. Caudra, A. Marti, A. Luque, C. R. Stanley, and A. McKee, 17th European Photovoltaic Solar Energy Conference 22, 98 (2001). 117 N. Lopez, A. Marti, A. Luque, C. Stanley, C. Farmer, and P. Diaz, Journal of Solar Energy Engineering-Transactions of the Asme 129, 319 (2007). 118 A. Luque and A. Marti, Physical Review Letters 78, 5014 (1997). 119 A. Luque and A. Marti, Electronics Letters 44, 943 (2008). 120 A. Luque, A. Marti, N. Lopez, E. Antolin, E. Canovas, C. Stanley, C. Farmer, L. J. Caballero, L. Cuadra, and J. L. Balenzategui, Applied Physics Letters 87, 083505 (2005). 121 A. Luque, A. Marti, N. Lopez, E. Antolin, E. Canovas, C. Stanley, C. Farmer, and P. Diaz, Journal of Applied Physics 99, 094503 (2006). 122 G. D. Wei and S. R. Forrest, Nano Letters 7, 218 (2007). 123 L. Seravalli, P. Frigeri, P. Allegri, V. Avanzini, and S. Franchi, Materials Science & Engineering C- Biomimetic and Supramolecular Systems 27, 1046 (2007).
Abstract (if available)
Abstract
Quantum dots (QDs) and nanowires (NWs) are being investigated as potential alternatives to conventional thin film optoelectronic devices. These nanostructures have the potential to partially relieve the strain caused by lattice mismatch between two materials. The accumulated strain at interfaces leads to defect generation which degrades device performance. QDs and NWs have electronic transition levels which differ significantly from that of the bulk material, therefore, a model is necessary to understand the effects of confinement, strain, and polarization effects on these energy levels. Here we implement and validate a model to compute transition energies of nanostructures based on k.p theory with continuum elasticity for strain. We apply our model to study the impact of strain on the formation and electronic structure of QDs and NWs in the context of novel solar cells or light emitting devices. Our model enables us to understand the impact of geometry, configuration, and composition of QDs and NWs on the transition wavelengths in order to predict which characteristics will provide improvements in device efficiency.
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
PDF
Light management in nanostructures: nanowire solar cells and optical epitaxial growth
PDF
Growth, characterization of gallium arsenide based nanowires and application in photovoltaic cells
PDF
Synthesis and properties study of Q1D semiconductor nanostructures
PDF
Optical, mechanical, and electrical properties of nano-structured materials
PDF
Nanophotonic light management in thin film silicon photovoltaics
PDF
Quantum molecular dynamics simulations of hydrogen production and solar cells
PDF
Organic solar cells: molecular electronic processes and device development
PDF
Temperature-dependent photoionization and electron pairing in metal nanoclusters
PDF
One-dimensional nanomaterials for electronic and sensing applications
PDF
GaAs nanowire optoelectronic and carbon nanotube electronic device applications
PDF
Non-radiative energy transfer for photovoltaic solar energy conversion: lead sulfide quantum dots on silicon nanopillars
PDF
Molecular dynamics simulations of nanostructures
PDF
GaN nanostructures grown by selective area growth for solid-state lighting
PDF
Organic photovoltaics: from materials development to device application
PDF
Efficiency droop in indium gallium nitride light emitters: an introduction to photon quenching processes
PDF
Optical and electrical characterization of one-dimensional (1D) and two-dimensional (2D) nanostructures
PDF
Zero-dimensional and one-dimensional nanostructured materials for application in photovoltaic cells
PDF
Modeling and mitigation of radiation-induced charge sharing effects in advanced electronics
PDF
Nanorod-based InGaN/GaN core-shell nanoLEDs
PDF
One-dimensional nanostructures for chemical sensing, transparent electronics, and energy conversion and storage devices
Asset Metadata
Creator
Sburlan, Suzana E.
(author)
Core Title
Nanostructure electronic band modeling for solar cell and lighting applications
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Electrical Engineering
Publication Date
04/18/2013
Defense Date
01/31/2013
Publisher
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
band gap,electronic structure,LEDs,nanowires,OAI-PMH Harvest,quantum dots,solar cells,strain
Format
application/pdf
(imt)
Language
English
Contributor
Electronically uploaded by the author
(provenance)
Advisor
Dapkus, Paul Daniel (
committee chair
), Nakano, Aiichiro (
committee member
), Steier, William Henry (
committee member
)
Creator Email
suzana.sburlan@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c3-239732
Unique identifier
UC11293975
Identifier
etd-SburlanSuz-1564.pdf (filename),usctheses-c3-239732 (legacy record id)
Legacy Identifier
etd-SburlanSuz-1564.pdf
Dmrecord
239732
Document Type
Dissertation
Format
application/pdf (imt)
Rights
Sburlan, Suzana E.
Type
texts
Source
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Tags
band gap
electronic structure
LEDs
nanowires
quantum dots
solar cells
strain