MODELS AND INFORMATION RATES FOR CHANNELS WITH NONLINEARITY AND PHASE NOISE

by

Hassan Ghozlan

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)

May 2015

Copyright 2015 Hassan Ghozlan

Dedication

To my family.

Acknowledgements

I express my deep gratitude to my advisor, Prof. Gerhard Kramer, for his guidance and support over the past five years. Among the things I learned from him is the importance of quality, clarity and conciseness. I am grateful to Prof. Urbashi Mitra and Prof. Jason Fulman for having been on my dissertation committee as well as my qualifying exam committee. I also would like to thank Prof. Alan Willner and Prof. Giuseppe Caire for having been on my qualifying exam committee. I would like to thank René-Jean Essiambre and Chongjin Xie for making my visit to Bell Labs on Crawford Hill a valuable experience. I thank my colleagues at the Communication Sciences Institute and would like to mention among them Ansuman Adhikary, Song-Nam Hong, Arash Saber Tehrani, Ji Mingyue, Oluwaseun (Seun) Sangodoyin and Ryan Rogalin. I want to thank all the Ph.D. students and the postdocs at the Communication Engineering Institute (LNT) at the Technical University of Munich for making me feel a part of the group during my visit, with special thanks to Jie (Jacky) Hou, Tobias Lutz and Yingkan Chen. I thank the staff at the Electrical Engineering department, especially Diane Demetras, Gerrielyn Ramos, Tim Boston and Susan Wiedem, for their help in sorting out the administrative issues. Finally, I thank my parents and my brothers for their endless support, patience and love.

Table of Contents

Dedication
Acknowledgements
List of Figures

Part 1  Optical Fiber Channels with Nonlinearity and Dispersion
1.1 Introduction
1.2 Information Theory
1.3 Fiber Models
1.4 Overview of Related Work
1.5 Zero Group Velocity Mismatch
  1.5.1 Discrete-Time Two-User Model
  1.5.2 Amplitude Modulation
  1.5.3 Phase Modulation
  1.5.4 Interference Focusing
    1.5.4.1 Phase Contribution
    1.5.4.2 Amplitude Contribution
  1.5.5 Discrete-Time K-User Model
    1.5.5.1 Interference Focusing
1.6 Non-Zero Group Velocity Mismatch
  1.6.1 Continuous-Time Model
  1.6.2 Discrete-Time Model
  1.6.3 Inner Bound: Phase Modulation
  1.6.4 Outer Bound
  1.6.5 Interference Focusing
1.7 Conclusion
1.8 Appendix
  1.8.1 Ring Modulation in AWGN Channels
  1.8.2 Upper Bound on the Modified Bessel Function of the First Kind
  1.8.3 Expected Value of the Logarithm of a Rician R.V.
  1.8.4 Minimum-Distance Estimator
  1.8.5 Orthogonality of Impulse Responses
  1.8.6 Sum of Sinc Squared
  1.8.7 Volterra Series
  1.8.8 A Perturbative Approach
  1.8.9 Non-Homogeneous First-Order Linear PDE
  1.8.10 Solution to Two Coupled Partial Differential Equations

Part 2  Wiener Phase Noise Channels
2.1 Introduction
2.2 Waveform Phase Noise Channel
2.3 Discrete-time Models
  2.3.1 Matched Filter Receiver
  2.3.2 Multi-sample Receiver
2.4 High SNR
  2.4.1 Amplitude Modulation
2.5 Rate Computation at Finite SNR
  2.5.1 Computing The Conditional Probability
  2.5.2 Computing The Marginal Probability
2.6 Numerical Simulations
  2.6.1 Excessively Large Linewidth
  2.6.2 Large Linewidth
  2.6.3 Comparison With Other Models
2.7 High SNR Revisited
  2.7.1 Phase Modulation in AWGN Channels
  2.7.2 Phase Modulation in Phase Noise Channels
2.8 Summary and Open Problems
2.9 Conclusion
2.10 Appendix
  2.10.1 Details of Discrete-Time Model of Multi-sample Receiver
  2.10.2 Details of Double Filtering Receiver
  2.10.3 Statistical Quantities of Filtered Wiener Phase Noise
  2.10.4 Statistical Quantities of G
  2.10.5 Conditional PDF of Output Phase Given Input Phase in AWGN Channels
  2.10.6 Supporting Lemmas for Lower Bound on Rate of Phase Modulation in AWGN Channels
  2.10.7 Some Approximations of Circular Distributions

References

List of Figures

1.1 Interference Focusing Modulation.
1.2 Frequency responses of receiver filters.
2.1 Waveform phase noise channel.
2.2 Transmitter: pulse-shaping filter.
2.3 Matched filter receiver with symbol rate sampling.
2.4 Multi-sample receiver.
2.5 Double filtering receiver.
2.6 Bayesian network for $X^n, S^n, \Theta^n$ for $n = 9$.
2.7 Bayesian network for $X^n, S^n, \Theta^n$ for $n = 9$ and $L = 3$.
2.8 Lower bounds on rates for 16-QAM, square transmit-pulse and multi-sample receiver at $f_{\mathrm{HWHM}} T_s = 0.125$.
2.9 Lower bounds on rates for 16-QAM, cosine-squared transmit-pulse and multi-sample receiver at $f_{\mathrm{HWHM}} T_s = 0.125$.
2.10 Lower bounds on rates for 16-PSK, square transmit-pulse and multi-sample receiver at $f_{\mathrm{HWHM}} T_s = 0.0125$.
2.11 Comparison of information rates for different models.
2.12 Two-stage decoding in a multi-sample receiver.

Part 1

Optical Fiber Channels with Nonlinearity and Dispersion

Abstract

Discrete-time interference channel models are developed for information transmission over optical fiber using wavelength-division multiplexing. A set of coupled non-linear Schrödinger equations (derived from Maxwell's equations) is the cornerstone of the models. The first model is a memoryless model that captures the non-linear phenomenon of cross-phase modulation but ignores dispersion. The main characteristic of the model is that amplitude variations on one carrier wave are converted to phase variations on another carrier wave, i.e., the carriers interfere with each other through amplitude-to-phase conversion. For the case of two carriers, a new technique called interference focusing is proposed where each carrier achieves the capacity pre-log 1, thereby doubling the pre-log of 1/2 achieved by using conventional methods. The technique requires neither channel time variations nor global channel state information. For more than two carriers, interference focusing is also useful under certain conditions. The second model captures the non-linear phenomenon of cross-phase modulation in addition to dispersion. The dispersion is included by taking into account the group velocity mismatch but ignoring all higher-order effects of dispersion. Moreover, the model captures the effect of filtering at the receivers. In a 3-user system, it is shown that all users can achieve the maximum pre-log factor 1 simultaneously by using interference focusing, a time-limited pulse and a bank of filters at the receivers, thus exploiting all the available amplitude and phase degrees of freedom.

1.1 Introduction

Data networks, most notably the internet, play a crucial role in connecting the world today, whether in the business sector, the government sector or among residential users. With the increase of the processing power of computers and the innovation in telecommunication technologies, services provided over these networks have evolved from simple services like email to more bandwidth-hungry services like video streaming. Moreover, the growth of the number of personal computers connected to the internet and the increasing popularity of smart phones and other mobile devices have fueled an exponential growth of traffic. There are several technologies for access networks, ranging from wireless, such as 3G cellular technology, to wired, such as DSL or optical fiber.
The majority of traffic in the core network is carried by optical fiber. Understanding the ultimate limits of transporting information over optical fiber networks is thus of great importance and would help to provide guidelines for designing such networks to meet the rapidly growing traffic demand. The fundamental limits, also referred to as the capacity, can be studied using tools of information theory. An appealing property of optical fiber is that it has low power attenuation over a large window of frequencies, which allows the transmission of signals of large bandwidth over long distances. Optical amplifiers compensate the power loss but, unfortunately, they also add noise. Moreover, a signal propagating in optical fiber experiences distortions due to chromatic dispersion and Kerr non-linearity. The optical fiber channel thus suffers from three main impairments of different nature: noise, dispersion, and Kerr non-linearity. The interaction between these three phenomena makes the problem of estimating the capacity challenging.

This chapter is organized as follows. We review in Sec. 1.2 basic definitions and relations in information theory. In Sec. 1.3, we describe the wave propagation equation in optical fiber and the impairments that arise in transmission. We provide an overview of related work in Sec. 1.4. We study the case of zero group velocity mismatch (zero dispersion) in Sec. 1.5. We extend this model to non-zero group velocity mismatch in Sec. 1.6. We develop discrete-time interference channel models for both cases and show that a pre-log of 1 is achievable for all users, despite the cross-phase modulation that arises due to the fiber non-linearity.

1.2 Information Theory

In this section, we introduce basic information-theoretic quantities and inequalities. For more details, the reader is referred to [22].

Discrete Random Variables. Let $X$, $Y$ and $Z$ be discrete random variables. Let $p_{X,Y}$ be the joint probability distribution¹ of $X$ and $Y$. The marginal probability distribution $p_X$ of $X$ is
$$p_X(x) = \sum_y p_{X,Y}(x,y). \qquad (1.1)$$
The conditional probability distribution $p_{Y|X}$ of $Y$ given $X$ is
$$p_{Y|X}(y|x) = \frac{p_{X,Y}(x,y)}{p_X(x)} \qquad (1.2)$$
for $p_X(x) > 0$. The entropy of $X$ is given by
$$H(X) = \mathrm{E}[-\log p_X(X)] = -\sum_{x:\, p_X(x)>0} p_X(x) \log p_X(x). \qquad (1.3)$$

¹ We use the term "probability distribution" to mean the non-cumulative probability distribution of a random variable.

The joint entropy of $(X,Y)$ is
$$H(X,Y) = \mathrm{E}[-\log p_{X,Y}(X,Y)] = -\sum_{x,y:\, p_{X,Y}(x,y)>0} p_{X,Y}(x,y) \log p_{X,Y}(x,y). \qquad (1.4)$$
The conditional entropy of $Y$ given $X$ is
$$H(Y|X) = \mathrm{E}[-\log p_{Y|X}(Y|X)] = -\sum_{x,y:\, p_{X,Y}(x,y)>0} p_{X,Y}(x,y) \log p_{Y|X}(y|x) \qquad (1.5)$$
$$= \sum_{x:\, p_X(x)>0} p_X(x)\, H(Y|X=x) \qquad (1.6)$$
where for $p_X(x) > 0$ we define
$$H(Y|X=x) = \mathrm{E}[-\log p_{Y|X}(Y|X)\,|\,X=x] = -\sum_{y:\, p_{Y|X}(y|x)>0} p_{Y|X}(y|x) \log p_{Y|X}(y|x). \qquad (1.7)$$
The mutual information between $X$ and $Y$ is
$$I(X;Y) = H(Y) - H(Y|X) \qquad (1.8)$$
$$= H(X) - H(X|Y) = I(Y;X) \qquad (1.9)$$
and the conditional mutual information between $X$ and $Y$ given $Z$ is
$$I(X;Y|Z) = H(X|Z) - H(X|Y,Z) = \sum_{z:\, p_Z(z)>0} p_Z(z)\, I(X;Y|Z=z) \qquad (1.10)$$
where we define
$$I(X;Y|Z=z) = H(X|Z=z) - H(X|Y,Z=z) \qquad (1.11)$$
for $p_Z(z) > 0$.
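As a concrete illustration of these definitions (an addition to the text, not part of the original development), the following short Python script computes $H(X)$, $H(Y|X)$ and $I(X;Y)$ for a small, arbitrarily chosen joint distribution and checks the identity (1.8):

```python
import numpy as np

# Toy joint pmf p_{X,Y} on a 2x3 alphabet (values chosen arbitrarily).
p_xy = np.array([[0.20, 0.15, 0.05],
                 [0.10, 0.10, 0.40]])

p_x = p_xy.sum(axis=1)                    # marginal of X, Eq. (1.1)
p_y = p_xy.sum(axis=0)                    # marginal of Y

def entropy(p):
    p = p[p > 0]                          # sum over positive-probability points
    return -np.sum(p * np.log2(p))

H_X, H_Y, H_XY = entropy(p_x), entropy(p_y), entropy(p_xy.ravel())
H_Y_given_X = H_XY - H_X                  # chain rule for entropy, Eq. (1.14)
I_XY = H_Y - H_Y_given_X                  # definition (1.8)
print(f"H(X) = {H_X:.4f} bits, H(Y|X) = {H_Y_given_X:.4f} bits")
print(f"I(X;Y) = {I_XY:.4f} bits")
```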
We now introduce basic results for the information-theoretic quantities defined above.

Non-negativity of mutual information:
$$I(X;Y) \ge 0. \qquad (1.12)$$
Conditioning does not increase entropy:
$$H(Y|X) \le H(Y). \qquad (1.13)$$
The chain rule for entropy:
$$H(X,Y) = H(X) + H(Y|X) \qquad (1.14)$$
$$= H(Y) + H(X|Y). \qquad (1.15)$$
A more general form is
$$H(X_1,\ldots,X_n) = \sum_{k=1}^{n} H(X_k \,|\, X_1,\ldots,X_{k-1}). \qquad (1.16)$$
The chain rule for mutual information:
$$I(X; Y,Z) = I(X;Y) + I(X;Z|Y). \qquad (1.17)$$
A more general form is
$$I(X; Y_1,\ldots,Y_n) = \sum_{k=1}^{n} I(X; Y_k \,|\, Y_1,\ldots,Y_{k-1}). \qquad (1.18)$$
Data processing inequality:
$$I(X;Y) \ge I(X;Z) \qquad (1.19)$$
if $X - Y - Z$ forms a Markov chain, i.e., $p_{Y,Z|X} = p_{Y|X}\, p_{Z|Y}$. As a corollary, we have
$$I(X;Y) \ge I(X; g(Y)) \qquad (1.20)$$
with equality if the function $g(\cdot)$ is a one-to-one mapping.

Continuous Random Variables. Suppose that $X$ and $Y$ are real continuous random variables with a joint probability density function (pdf) $p_{X,Y}$. Let $p_X$ and $p_{Y|X}$ be the marginal pdf of $X$ and the conditional pdf of $Y$ given $X$, respectively. The differential entropy $h(X)$ of $X$ is
$$h(X) = \mathrm{E}[-\log p_X(X)] = -\int p_X(x) \log p_X(x)\, dx \qquad (1.21)$$
while the joint differential entropy $h(X,Y)$ and the conditional differential entropy $h(Y|X)$ are
$$h(X,Y) = \mathrm{E}[-\log p_{X,Y}(X,Y)] = -\iint p_{X,Y}(x,y) \log p_{X,Y}(x,y)\, dy\, dx \qquad (1.22)$$
and
$$h(Y|X) = \mathrm{E}[-\log p_{Y|X}(Y|X)] = -\iint p_{X,Y}(x,y) \log p_{Y|X}(y|x)\, dy\, dx. \qquad (1.23)$$

Complex Random Variables. Suppose that $X$ is a complex continuous random variable. The notation $h(X)$ is to be understood as
$$h(X) = h(\Re\{X\}, \Im\{X\}) \qquad (1.24)$$
where $\Re\{\cdot\}$ and $\Im\{\cdot\}$ are the real and imaginary parts of a complex number, respectively. In other words, $h(X)$ denotes the (joint) entropy of the Cartesian coordinates of $X$. A complex number $x$ can also be represented in polar coordinates, i.e., by using the amplitude $|x|$ and the phase $\arg x$. It is important to point out that the relation between the joint entropies of the two representations is (e.g., see [55, Lemma 6.16])
$$h(|X|, \arg X) = h(\Re\{X\}, \Im\{X\}) - \mathrm{E}[\log |X|]. \qquad (1.25)$$
It is also worth noting that
$$I(\Re\{X\}, \Im\{X\}; Y) = I(|X|, \arg X; Y) \qquad (1.26)$$
for any $X$ and $Y$.

Channel Capacity. The channel between the inputs $(X_1,\ldots,X_n)$ and the outputs $(Y_1,\ldots,Y_n)$ is defined by a conditional probability of the outputs given the inputs. For any input distribution, the information rate
$$R = \lim_{n\to\infty} \frac{1}{n}\, I(X_1,\ldots,X_n; Y_1,\ldots,Y_n) \qquad (1.27)$$
is achievable, i.e., there exist codes such that the probability of decoding error approaches zero as $n$ tends to infinity. The (information) capacity of a channel is the maximum achievable rate and is given by
$$C = \lim_{n\to\infty} \frac{1}{n} \sup I(X_1,\ldots,X_n; Y_1,\ldots,Y_n) \qquad (1.28)$$
where the supremum is over all possible joint input distributions satisfying desired constraints, e.g., a power constraint. For a memoryless channel, the unconstrained expressions simplify to
$$R = I(X;Y) \qquad (1.29)$$
and
$$C = \max_{p_X} I(X;Y). \qquad (1.30)$$

Gaussian Channel. The discrete-time memoryless complex Gaussian channel is
$$Y = X + Z \qquad (1.31)$$
where the output $Y$ and the input $X$ are complex and $Z$ is a circularly-symmetric complex Gaussian random variable with mean 0 and variance $\mathrm{E}[|Z|^2] = N$. For the second-moment constraint $\mathrm{E}[|X|^2] \le P$, the capacity is [71] [22]
$$C = \log\left(1 + \frac{P}{N}\right) \qquad (1.32)$$
and is achieved by a Gaussian input distribution.
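As a quick sanity check of (1.32) (an added illustration), one can estimate $I(X;Y) = h(Y) - h(Z)$ by Monte Carlo for a Gaussian input, using the fact that $Y$ is then exactly circularly-symmetric Gaussian with variance $P+N$:

```python
import numpy as np

rng = np.random.default_rng(0)
P, N = 4.0, 1.0                    # signal power and noise variance (assumed values)
n = 200_000
x = np.sqrt(P / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
z = np.sqrt(N / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
y = x + z                          # the channel (1.31)

# For a Gaussian input, Y is circularly-symmetric Gaussian with variance P + N,
# so h(Y) can be estimated as the sample mean of -log p_Y(Y).
h_y = np.mean(np.abs(y) ** 2 / (P + N) + np.log(np.pi * (P + N)))
h_z = np.log(np.pi * np.e * N)     # differential entropy of the noise
print(f"Monte Carlo I(X;Y) = {h_y - h_z:.4f} nats")
print(f"log(1 + P/N)       = {np.log(1 + P / N):.4f} nats")
```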
1.3 Fiber Models

In this section, we discuss noise, chromatic dispersion and Kerr non-linearity in optical fiber. Lumped optical amplifiers (also referred to as discrete amplifiers), such as erbium-doped optical amplifiers (EDFA), or distributed amplifiers, such as Raman amplifiers, add noise to the signal due to amplified spontaneous emission (ASE). The noise is typically modeled as a white Gaussian process.

In lumped amplification, $N_s$ amplifiers are inserted periodically over a fiber link of total length $L$, which creates $N_s$ spans², each of length $L_s = L/N_s$. In distributed amplification, the signal is amplified continuously as it propagates through the fiber. The power spectral density (PSD) of the ASE noise is
$$N_{\mathrm{ASE}}^{\mathrm{lumped}} = N_s \left(e^{\alpha L_s} - 1\right) h\nu\, n_{sp} \qquad (1.33)$$
for lumped amplification and
$$N_{\mathrm{ASE}}^{\mathrm{distrib}} = \alpha L\, h\nu\, n_{sp} \qquad (1.34)$$
for (ideal) distributed amplification. In the above expressions, $h\nu$ is the photon energy, $n_{sp}$ is the spontaneous emission factor and $\alpha$ is the power attenuation coefficient [35]. Note that
$$N_{\mathrm{ASE}}^{\mathrm{lumped}} \to N_{\mathrm{ASE}}^{\mathrm{distrib}} \quad \text{as } L_s \to 0 \qquad (1.35)$$
because
$$N_s \left(e^{\alpha L_s} - 1\right) = \alpha L\, \frac{e^{\alpha L_s} - 1}{\alpha L_s} \to \alpha L \quad \text{as } L_s \to 0. \qquad (1.36)$$

² In practice, a span is often composed of standard single-mode fiber (SSMF) followed by dispersion-compensated fiber (DCF). However, we consider fiber links made of one type of fiber for simplicity.

Dispersion arises because the medium absorbs energy through the oscillations of bound electrons, causing a frequency dependence of the material refractive index [2, p. 7]. The Kerr effect is caused by the anharmonic motion of bound electrons in the presence of an intense electromagnetic field, causing an intensity dependence of the material refractive index [2, p. 17, 165].

Let $A(z,t)$ be a complex number representing the slowly-varying component (or envelope) of a linearly-polarized electric field at position $z$ and time $t$ in single-mode fiber. The equation governing the evolution of $A(z,t)$ as the wave propagates through the fiber is [2, p. 44]
$$\frac{\partial A}{\partial z} + \beta_1 \frac{\partial A}{\partial t} + i\frac{\beta_2}{2} \frac{\partial^2 A}{\partial t^2} = -\frac{\alpha}{2} A + i\gamma |A|^2 A \qquad (1.37)$$
where $i = \sqrt{-1}$, $\alpha$ is the power attenuation coefficient, $\beta_1$ is the reciprocal of the group velocity, $\beta_2$ is the group velocity dispersion (GVD) coefficient, $\gamma = n_2 \omega_0/(c A_{\mathrm{eff}})$, $n_2$ is the non-linear refractive index, $\omega_0$ is the carrier frequency, $c$ is the speed of light in free space, and $A_{\mathrm{eff}}$ is the effective cross-section area of the fiber. It is common to specify GVD through the dispersion parameter $D$, which is related to $\beta_2$ by [2, p. 11]
$$D = -\frac{2\pi c}{\lambda^2}\, \beta_2 \qquad (1.38)$$
where $\lambda$ is the wavelength of the wave in free space, i.e., $\lambda = 2\pi c/\omega_0$. By defining a retarded-time reference frame with $T = t - \beta_1 z$, we have [2, p. 50]
$$\frac{\partial A}{\partial z} + i\frac{\beta_2}{2} \frac{\partial^2 A}{\partial T^2} = -\frac{\alpha}{2} A + i\gamma |A|^2 A. \qquad (1.39)$$
Moreover, by defining $U = A e^{\alpha z/2}$, the differential equation becomes
$$\frac{\partial U}{\partial z} + i\frac{\beta_2}{2} \frac{\partial^2 U}{\partial T^2} = i\gamma e^{-\alpha z} |U|^2 U \qquad (1.40)$$
and thus the term $-(\alpha/2)A$ is eliminated from the equation. We will set $\alpha = 0$ in the rest of this section for simplicity. Therefore, we have
$$i\frac{\partial A}{\partial z} - \frac{\beta_2}{2} \frac{\partial^2 A}{\partial T^2} + \gamma |A|^2 A = 0 \qquad (1.41)$$
which is referred to as the non-linear Schrödinger (NLS) equation [2, p. 50] because of its similarity to the Schrödinger equation with a non-linear potential term when the roles of time and distance are exchanged. When $\beta_2 = 0$, Equation (1.41) has the exact solution [2, p. 98]
$$A(L,T) = A(0,T)\, e^{i\gamma L |A(0,T)|^2} \qquad (1.42)$$
where $L$ is the fiber length. In other words, Kerr non-linearity leaves the pulse shape unchanged but causes an intensity-dependent phase shift. The phase shift phenomenon is called self-phase modulation (SPM).
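To make (1.42) concrete (an added illustration with arbitrary parameter values), the following sketch applies the zero-dispersion SPM solution to a Gaussian pulse and verifies that the intensity profile is unchanged while the phase becomes intensity-dependent:

```python
import numpy as np

# Zero-dispersion SPM solution (1.42): A(L,T) = A(0,T) exp(i*gamma*L*|A(0,T)|^2).
gamma = 1.3e-3                                   # non-linear parameter (1/(W m)), assumed
L = 80e3                                         # fiber length (m), assumed
T = np.linspace(-50e-12, 50e-12, 1001)           # time grid (s)
A0 = np.sqrt(5e-3) * np.exp(-(T / 20e-12) ** 2)  # Gaussian input envelope (sqrt(W))

AL = A0 * np.exp(1j * gamma * L * np.abs(A0) ** 2)

print("max | |A(L,T)| - |A(0,T)| | :", np.max(np.abs(np.abs(AL) - np.abs(A0))))
print("peak SPM phase shift (rad)  :", gamma * L * np.max(np.abs(A0)) ** 2)
```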
Suppose two optical fields at different carrier frequencies $\omega_1$ and $\omega_2$ are launched at the same location and propagate simultaneously inside the fiber. The fields interact with each other through the Kerr effect [2, Ch. 7]. Specifically, neglecting fiber losses, the propagation is governed by the coupled non-linear Schrödinger (NLS) equations [2, p. 264, 274]:
$$i\frac{\partial A_1}{\partial z} - \frac{\beta_{21}}{2} \frac{\partial^2 A_1}{\partial T^2} + \gamma_1 \left(|A_1|^2 + 2|A_2|^2\right) A_1 = 0 \qquad (1.43)$$
$$i\frac{\partial A_2}{\partial z} - \frac{\beta_{22}}{2} \frac{\partial^2 A_2}{\partial T^2} + \gamma_2 \left(|A_2|^2 + 2|A_1|^2\right) A_2 + i d\frac{\partial A_2}{\partial T} = 0 \qquad (1.44)$$
where $A_k(z,T)$ is the time-retarded, slowly varying component of field $k$, $k = 1, 2$, the $\beta_{2k}$ are group velocity dispersion (GVD) coefficients, the $\gamma_k$ are non-linear parameters, and $d = \beta_{12} - \beta_{11}$ where the $\beta_{1k}$ are the reciprocals of the group velocities. Observe that the group velocities $1/\beta_{11}$ and $1/\beta_{12}$ now enter the equations via $d$, since there is no common reference frame that removes both. The parameter $d$ is a measure of the group velocity mismatch (GVM). To simplify the model, we assume that the second-order dispersion effects are negligible, i.e., $\beta_{21} = \beta_{22} = 0$. If $d = 0$, the coupled NLS equations (1.43)-(1.44) have the exact solutions [2, p. 275]
$$A_1(L,T) = A_1(0,T)\, e^{i\gamma_1 L \left(|A_1(0,T)|^2 + 2|A_2(0,T)|^2\right)} \qquad (1.45)$$
$$A_2(L,T) = A_2(0,T)\, e^{i\gamma_2 L \left(|A_2(0,T)|^2 + 2|A_1(0,T)|^2\right)} \qquad (1.46)$$
where $z = 0$ is the point at which both fields are launched. Kerr non-linearity again leaves the pulse shapes unchanged but causes interference through intensity-dependent phase shifts. The interference phenomenon is called cross-phase modulation (XPM). XPM is an important impairment in optical networks using wavelength-division multiplexing (WDM), see [35].

Kerr non-linearity leads to a third effect, called four-wave mixing (FWM), when more than two carriers co-propagate in the fiber. In four-wave mixing, four fields propagating at frequencies $\omega_1$, $\omega_2$, $\omega_3$ and $\omega_4 = \omega_1 + \omega_2 - \omega_3$ interact with each other. The propagation is governed by the following set of coupled equations [2, p. 393]:
$$\frac{\partial A_1}{\partial z} = i\gamma_1 \left[ \left(|A_1|^2 + 2\sum_{k\ne 1} |A_k|^2\right) A_1 + 2 A_3 A_4 A_2^*\, e^{+i\Delta\beta z} \right] \qquad (1.47)$$
$$\frac{\partial A_2}{\partial z} = i\gamma_2 \left[ \left(|A_2|^2 + 2\sum_{k\ne 2} |A_k|^2\right) A_2 + 2 A_3 A_4 A_1^*\, e^{+i\Delta\beta z} \right] \qquad (1.48)$$
$$\frac{\partial A_3}{\partial z} = i\gamma_3 \left[ \left(|A_3|^2 + 2\sum_{k\ne 3} |A_k|^2\right) A_3 + 2 A_1 A_2 A_4^*\, e^{-i\Delta\beta z} \right] \qquad (1.49)$$
$$\frac{\partial A_4}{\partial z} = i\gamma_4 \left[ \left(|A_4|^2 + 2\sum_{k\ne 4} |A_k|^2\right) A_4 + 2 A_1 A_2 A_3^*\, e^{-i\Delta\beta z} \right] \qquad (1.50)$$
where
$$\gamma_k = \frac{n_2 \omega_k}{c A_{\mathrm{eff}}} \qquad (1.51)$$
and the wave-vector mismatch is given by
$$\Delta\beta = \beta_4 + \beta_3 - \beta_2 - \beta_1 \qquad (1.52)$$
where $\beta_1$, $\beta_2$, $\beta_3$ and $\beta_4$ are the propagation constants of the four waves. We ignore the FWM effect in our models.

1.4 Overview of Related Work

We provide an overview of the related work in this section.

Mitra and Stark [61] studied a WDM system in which XPM is the only non-linear effect considered (ignoring FWM and assuming that SPM can be fully corrected). The propagation equation of the $k$th channel is thus described by
$$\frac{\partial A_k}{\partial z} = -i\frac{\beta_2}{2} \frac{\partial^2 A_k}{\partial t^2} - i V_k(z,t) A_k \qquad (1.53)$$
where
$$V_k(z,t) = -2\gamma \sum_{\ell\ne k} |A_\ell(z,t)|^2. \qquad (1.54)$$
A key simplification in [61] is approximating $V_k(z,t)$ by a Gaussian random process, leading to the solution³
$$A(L,t) = \int U(t,t';L)\, A(0,t')\, dt' + n(t) \qquad (1.55)$$
where $U(t,t';L)$ is called the propagator⁴ and $n(t)$ is a white circularly-symmetric complex Gaussian process with zero mean and one-sided power spectral density $N_{\mathrm{ASE}}^{\mathrm{lumped}}$. The following capacity (per WDM channel) lower bound was derived for Gaussian inputs:
$$C_{LB} = B \log_2\left(1 + \frac{e^{-(P/P^*)^2} P}{P_{\mathrm{ASE}} + \left(1 - e^{-(P/P^*)^2}\right) P}\right) \ \text{(bits/sec)} \qquad (1.56)$$
where $P_{\mathrm{ASE}} = N_{\mathrm{ASE}} B$ is the total amplification noise from all spans and
$$P^* = \sqrt{\frac{B\, D\, \Delta\lambda}{2\gamma^2 L_e \ln(N_c/2)}} \qquad (1.57)$$
where $N_c$ is the number of WDM channels, $B$ is the channel bandwidth, $\Delta\lambda$ is the channel spacing in wavelength and $L_e$ is the effective length defined as
$$L_e = N_s\, \frac{1 - \exp(-\alpha L_s)}{\alpha}. \qquad (1.58)$$

³ We focus on one channel and drop the index $k$.
⁴ There was no explicit expression given in [61] for the propagator.

The computation of the lower bound above requires only the input-output covariance matrix.
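To visualize the behavior of (1.56), the following sketch evaluates the bound over a sweep of launch powers; all parameter values are illustrative placeholders rather than values from [61]:

```python
import numpy as np

# Evaluate the capacity lower bound (1.56) for a range of launch powers.
B = 40e9            # channel bandwidth (Hz), assumed value
P_ASE = 1e-6        # total ASE noise power (W), assumed value
P_star = 1e-3       # critical power P* from (1.57), assumed value
P = np.logspace(-6, -2, 9)                 # launch power sweep (W)

frac = np.exp(-(P / P_star) ** 2)
C_LB = B * np.log2(1 + frac * P / (P_ASE + (1 - frac) * P))
for p, c in zip(P, C_LB):
    print(f"P = {p:8.2e} W   C_LB = {c / 1e9:7.2f} Gbit/s")
```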
The conclusion of [61] was that capacity has a finite peak and does not increase indefinitely with the input power.

Tang [75] studied WDM transmission over a single-span dispersion-free fiber link. In this case, the propagation equation can be solved analytically in closed form. A lower bound on capacity was obtained for Gaussian inputs by computing the power spectral density of the input (the sum of all WDM channels), the power spectral density of the output (the overall WDM signal after propagation) and the cross-spectral density of the input and output. Tang extended the results of [75] to a multi-span dispersion-free fiber link in [74] and then to a multi-span dispersive fiber link in [76]. In [76], a truncated Volterra series [66] (see Appendix 1.8.7) is used to approximate the solution to the NLS equation, assuming that the effect of non-linearity is small. The capacity lower bounds obtained in [74] and [76] are derived by computing spectral densities as done in [75]. The lower bounds in [75], [74] and [76] have a peak value versus input power. We remark that it is implicit in the analysis of [74-76] that the receiver of each user has access to all the WDM channels, and not just its own channel, which means it may be possible to achieve information rates that are higher than the rates achieved in the case of having access to a single channel only.

Narimanov and Mitra [63] studied single-channel transmission over a multi-span dispersive fiber link. A perturbation technique (see Appendix 1.8.8) is used to approximate the solution to the NLS equation, assuming that the non-linear term is small. The following capacity expression is found:
$$C = B \log_2\left(1 + \frac{P}{P_{\mathrm{ASE}}}\right) - C_1 - C_2 + O(\gamma^4) \ \text{(bits/sec)} \qquad (1.59)$$
where the correction terms $C_1$ and $C_2$ are both proportional to $\gamma^2 N_s^2 B^2 P^2$ (see [63, Eq(24)-(25)] for exact expressions). Xiang and Zhang [83] extended some of the results of [63].

Ho and Khan [47] studied WDM transmission over a multi-span dispersive fiber link. It is argued that under constant-envelope (also called constant-intensity) modulation with uniform phase⁵, SPM and XPM cause only a time-invariant phase shift and hence the phase distortion is eliminated. By modeling FWM as additive Gaussian noise, they obtain the following estimate of the information rate achieved by constant-envelope modulation:
$$C_{LB} \approx \frac{1}{2} \log_2\left(\frac{P}{P_{\mathrm{ASE}} + P_{\mathrm{FWM}}}\right) + 1.1 \ \text{(bits/symbol)} \qquad (1.60)$$
where $P_{\mathrm{ASE}}$ is the total amplification noise and $P_{\mathrm{FWM}}$ is the total FWM power from all spans, which is given by
$$P_{\mathrm{FWM}} = N_s \sum_{\substack{p\ne 0,\, q\ne 0:\\ |p+q| \le (N_c-1)/2}} \gamma^2 P_p P_q P_{p+q}\, \frac{(D_{pq}/3)^2}{\alpha^2 + k_{pq}^2} \qquad (1.61)$$
where $D_{pq} = 3$ if $p = q$ (degenerate FWM) and $D_{pq} = 6$ if $p \ne q$ (non-degenerate FWM), and $k_{pq} = 2\pi\lambda^2 D\, \Delta f^2\, pq/c = \beta_2 (\Delta\omega)^2 pq$, where $\Delta f$ and $\Delta\omega$ are the channel spacings in frequency (Hz) and angular frequency (radians), respectively. It is assumed in (1.61) that the FWM components from individual fiber spans combine incoherently.

⁵ We refer to constant-envelope modulation with uniform phase as ring modulation.

Mecozzi [59] models the propagation of a single channel in a dispersionless fiber link, in which the fiber loss is compensated by equally spaced (lumped) amplifiers, as
$$\frac{\partial}{\partial z} A(z,t) = i\gamma |A(z,t)|^2 A(z,t) + n(z,t) \qquad (1.62)$$
where $n(z,t)$ is a complex circularly-symmetric Gaussian process accounting for the amplified spontaneous emission noise with
$$\mathrm{E}[n(z,t)] = 0 \qquad (1.63)$$
$$\mathrm{E}[n(z,t)\, n(z',t')] = 0 \qquad (1.64)$$
$$\mathrm{E}[n(z,t)\, n^*(z',t')] = \frac{N_{\mathrm{ASE}}^{\mathrm{lumped}}}{L_s}\, \delta_D(t-t')\, \delta_D(z-z'). \qquad (1.65)$$
Dispersion is neglected in this model to all orders.
Without any filtering mechanism, the bandwidth of $n(z,t)$ is infinite. To work around this problem, it is assumed in [59] that ideal square (in the frequency domain) filters with bandwidth $B$ (equal to the bandwidth of the pulse) are inserted along the fiber link, and hence the correlation function of $n(z,t)$ becomes
$$\mathrm{E}[n(z,t)\, n^*(z',t')] = \frac{N_{\mathrm{ASE}}^{\mathrm{lumped}} B}{L_s}\, \delta_D(z-z'). \qquad (1.66)$$
The solution is then found to be
$$A(z,t) = \left( A(0,t) + \int_0^z n(z',t)\, dz' \right) \exp\left( i\gamma \int_0^z \left| A(0,t) + \int_0^{z'} n(z'',t)\, dz'' \right|^2 dz' \right). \qquad (1.67)$$
It is worth mentioning that the effect of the square filters is incorporated in this solution by assuming that the noise is bandlimited (to the bandwidth of the signal), while ignoring that filtering changes the shape of the signal. It is also assumed that the signal bandwidth at the output is the same as the input signal bandwidth, ignoring the spectral broadening due to non-linearity. Mecozzi also derived an expression for the conditional distribution of $Y = A(t,z)$ given $X = A(t,0)$ (see [59, Eq.(70)]) by computing arbitrary (conditional) moments $\mathrm{E}[Y^n (Y^*)^m \,|\, X = x]$. We remark that (1.62) is a more accurate description of wave propagation under distributed amplification than lumped amplification. For distributed amplification, the correlation function of $n(z,t)$ is
$$\mathrm{E}[n(z,t)\, n^*(z',t')] = \frac{N_{\mathrm{ASE}}^{\mathrm{distrib}} B}{L}\, \delta_D(z-z') \qquad (1.68)$$
assuming the noise is bandlimited to bandwidth $B$.

Turitsyn et al [78] also studied single-channel transmission over a zero-dispersion fiber link. They obtained the conditional distribution of $Y = A(L,t)$ given $X = A(0,t)$ using techniques from quantum mechanics. For Gaussian input $X$ (i.e., $|X|$ is Rayleigh and $|X|^2$ has a central chi-square distribution with two degrees of freedom) and direct detection (decoding using only $|Y|$), a lower bound was derived:
$$C \ge \log_2(1 + \mathrm{SNR}) - \frac{2\,\mathrm{SNR}}{\ln 2} + F_1(\mathrm{SNR}) \ \text{(bits/symbol)} \qquad (1.69)$$
where $\mathrm{SNR} = P/P_{\mathrm{ASE}}$ and
$$F_1(s) = s^{-1} \int_0^\infty x\, K_0\!\left(x\sqrt{1 + s^{-1}}\right) I_0(x) \log_2\left(I_0(x)\right) dx \qquad (1.70)$$
where $I_0$ and $K_0$ are the zeroth-order modified Bessel functions of the first and second kind, respectively. At high SNR, the approximate lower bound
$$C \ge \frac{1}{2} \log_2(1 + \mathrm{SNR}) + O(1) \ \text{(bits/symbol)} \qquad (1.71)$$
is obtained by using the asymptotic approximation of the modified Bessel functions for large arguments. In [84-86], Yousefi and Kschischang derived the conditional probability of $Y$ given $X$ using two different approaches: a sum-product approach and a Fokker-Planck differential equation approach. They also argue that the (per-sample) capacity is [86, Sec. IV]
$$C = \max I(X;Y) \approx \frac{1}{2} \log_2(1 + \mathrm{SNR}) - \frac{1}{2} \ \text{(bits/symbol)} \qquad (1.72)$$
in the high-power regime and is achieved by an input amplitude $|X|$ with a half-Gaussian (also called half-normal or positive normal) distribution.⁶ We remark that this result can be deduced from the solution of the differential equation by Mecozzi [59] by noting that $|Y| = |X + N|$ and using [14, Row III in Table 1] or [53, Eq(23)]:
$$\max I(X; |X+N|) = \max I(|X|; |X+N|) \approx \frac{1}{2} \log\left(\frac{\mathrm{SNR}}{2}\right) \qquad (1.73)$$
at high SNR.

⁶ A half-normal distribution of $|X|$ implies that $|X|^2$ has a central chi-square distribution with one degree of freedom.

Wei and Plant [81] made some useful comments on the results of [78], [61] and [75].
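Before turning to dispersive models, here is a small Monte Carlo sketch (added for illustration, with arbitrary parameter values, not taken from [59] or [78]) of the dispersion-free per-sample channel (1.62): the fiber is split into short segments, each adding a Gaussian noise increment and applying a Kerr phase rotation, which reproduces the non-linear phase-noise behavior that the results above quantify:

```python
import numpy as np

# Segmented simulation of the dispersion-free channel (1.62).
rng = np.random.default_rng(1)
gamma = 1.3e-3             # non-linear parameter, 1/(W m), assumed
L = 100e3                  # link length (m), assumed
N_tot = 1e-5               # total accumulated noise power (W), assumed
n_seg, n_mc = 500, 50_000  # number of fiber segments and Monte Carlo samples
dz = L / n_seg

a = np.full(n_mc, np.sqrt(1e-3), dtype=complex)   # fixed input sample X
for _ in range(n_seg):
    a += np.sqrt(N_tot / (2 * n_seg)) * (rng.standard_normal(n_mc)
                                         + 1j * rng.standard_normal(n_mc))
    a *= np.exp(1j * gamma * np.abs(a) ** 2 * dz)  # Kerr phase rotation per segment

mean_rot = np.angle(np.mean(a))                    # average nonlinear phase shift
print("phase noise std (rad):", np.std(np.angle(a * np.exp(-1j * mean_rot))))
print("amplitude std        :", np.std(np.abs(a)))
```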
Wegener et al [80] studied WDM transmission over a multi-span dispersive fiber link. The propagation equation of the signal in channel $k$ is described by
$$\frac{\partial A_k}{\partial z} + \beta_{1k} \frac{\partial A_k}{\partial t} + i\frac{\beta_{2k}}{2} \frac{\partial^2 A_k}{\partial t^2} = \underbrace{i\gamma |A_k|^2 A_k}_{\text{SPM}} + \underbrace{i V_k(z,t) A_k}_{\text{XPM}} + \underbrace{i F_k(z,t)}_{\text{FWM}} \qquad (1.74)$$
with
$$V_k(z,t) = 2\gamma \sum_{\ell\ne k} |A_\ell|^2\, e^{-\alpha z} \qquad (1.75)$$
$$F_k(z,t) = \gamma \sum_{m\ne k,\, n\ne k} A_m A_n A_{m+n-k}^*\, e^{-(\alpha + i\Delta\beta_{kmn}) z} \qquad (1.76)$$
where $\beta_{1k}$ and $\beta_{2k}$ are the coefficients of the Taylor expansion of the propagation constant $\beta(\omega)$ around $\omega_k$,
$$\beta(\omega) = \beta_{0k} + \beta_{1k}(\omega - \omega_k) + \beta_{2k}\frac{(\omega - \omega_k)^2}{2} + \ldots \qquad (1.77)$$
and $\Delta\beta_{kmn} = \beta_2 (\Delta\omega)^2 (m-k)(n-k)$. The form of FWM given above is valid when the channel spacing is more than twice the channel bandwidth. To simplify the solution analytically, $V_k(z,t)$ and $F_k(z,t)$ are replaced with Gaussian random processes. The solution to the new differential equation is given by
$$A(L,t) = \int G(L, t-t')\, A(0,t')\, dt' + n(t) \qquad (1.78)$$
where $G(\cdot,\cdot)$ is Green's function [80, Eq(13)] and $n(t)$ is a white circularly-symmetric complex Gaussian process with zero mean and one-sided power spectral density $N_{\mathrm{ASE}}^{\mathrm{lumped}}$. The following capacity lower bound (per channel) is derived for Gaussian inputs:
$$C_{LB} = B \log_2\left(1 + \frac{e^{-(P/P^*)^2} P}{P_{\mathrm{ASE}} + P_{\mathrm{FWM}} + \left(1 - e^{-(P/P^*)^2}\right) P}\right) \ \text{(bits/sec)} \qquad (1.79)$$
where [80, Eq(19)]
$$P^* = \sqrt{\frac{B\, D\, \Delta\lambda}{8\gamma^2 L_e \sum_{n=1}^{N_c/2} n^{-1}}} \qquad (1.80)$$
and [80, Eq(23)]
$$P_{\mathrm{FWM}} = \frac{12\, \gamma^2 P^3 L_e}{\sqrt{B\, D\, \Delta\lambda}}\, g\!\left(N_c, \frac{\Delta f}{B}\right) \qquad (1.81)$$
where $g(\cdot,\cdot)$ is given by [80, Eq(24)]. Similar to [61], the evaluation of the lower bound requires only the input-output covariance matrix, where the output is taken to be $Y = A(L,t)$ and the input to be $X = A(0,t)$.

Ivakovic et al [49] studied a single-channel long-haul transmission system that uses on-off keying (OOK) and return-to-zero (RZ) pulses. They estimate numerically the achievable information rate for independent uniformly distributed inputs when the effects of the Kerr non-linearity, chromatic dispersion and amplified spontaneous emission are taken into account. This is done by modeling the channel as a finite-state machine, obtaining an approximate expression for the conditional probability density function given a specific state (defined to be the configuration of neighboring bits), and then using a variant of the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm to estimate the information rate. We remark that the information rate is upper bounded by 1 bit/symbol because OOK is used, and therefore the computed rate is not a good estimate for the capacity.

Taghavi et al [73] studied WDM transmission over a single-span dispersive fiber link. They use a (truncated) Volterra series (see Appendix 1.8.7) solution to the propagation equation
$$\frac{\partial A}{\partial z} = -\frac{\alpha}{2} A + i\frac{\beta_2}{2} \frac{\partial^2 A}{\partial T^2} - i\gamma |A|^2 A \qquad (1.82)$$
where $T = t - \beta_1 z$. Here, $A(0,t)$ is the overall WDM signal. Each receiver uses a linear filter to compensate dispersion, followed by a matched filter (matched to the transmitted pulse) whose output is sampled at the symbol rate. Assuming that dispersion is weak (so that inter-symbol interference can be neglected), a discrete-time memoryless model is obtained:
$$Y_i = X_i + \underbrace{\zeta^{(i)}_{i,i,i} |X_i|^2 X_i}_{\text{SPM}} + \underbrace{\sum_{k\ne i} \left(\zeta^{(i)}_{k,k,i} + \zeta^{(i)}_{i,k,k}\right) |X_k|^2 X_i}_{\text{XPM}} + \underbrace{\sum_{\substack{k-l+m=i:\\ k\ne i,\, m\ne i}} \zeta^{(i)}_{k,l,m}\, X_k X_l^* X_m}_{\text{FWM}} + Z_i \qquad (1.83)$$
where $Y_i$ is the output of receiver $i$, $X_i$ is the input of the $i$th user, $Z_i$ is a circularly-symmetric complex Gaussian random variable with mean zero and variance $P_{\mathrm{ASE}}$, and $\zeta^{(i)}_{k,l,m}$ is given by [73, Eq(93)]. It is assumed for this model that the channel spacing is large enough with respect to the channel bandwidth so that the spectrum of each channel does not spill into other channels after non-linear mixing. The authors consider the case in which each receiver has access to the received signals of all channels, which allows joint decoding with high reliability. This case was treated as a multiple-access channel. It is found that non-linearity does not affect the capacity of a channel to the first-order approximation in $\gamma$, and the capacity is achieved by performing an interference cancellation step,
$$\tilde{Y}_i = Y_i - \sum_{\substack{k-l+m=i:\\ k\ne i,\, m\ne i}} \zeta^{(i)}_{k,l,m}\, Y_k Y_l^* Y_m \qquad (1.84)$$
and then using $\tilde{Y}_i$ to decode $X_i$.
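The cancellation step (1.84) is easy to prototype. The sketch below (an added illustration) uses random cubic coefficients as placeholders for [73, Eq(93)] and a noiseless toy setting, and shows that subtracting the reconstructed FWM term from $Y_i$ reduces the residual interference to higher order in the coefficients:

```python
import numpy as np

# Toy demonstration of the interference-cancellation step (1.84).
rng = np.random.default_rng(2)
n_ch, n_sym = 5, 10_000
i = 2                                            # channel of interest
X = rng.standard_normal((n_ch, n_sym)) + 1j * rng.standard_normal((n_ch, n_sym))
triples = [(k, l, m) for k in range(n_ch) for l in range(n_ch) for m in range(n_ch)
           if k - l + m == i and k != i and m != i]
zeta = {t: 0.01 * (rng.standard_normal() + 1j * rng.standard_normal())
        for t in triples}                        # placeholder coefficients

fwm = sum(zeta[(k, l, m)] * X[k] * np.conj(X[l]) * X[m] for (k, l, m) in triples)
Y = X + 0.0                                      # copies of the channel outputs
Y[i] = X[i] + fwm                                # received signal with FWM on channel i
Y_tilde = Y[i] - sum(zeta[(k, l, m)] * Y[k] * np.conj(Y[l]) * Y[m]
                     for (k, l, m) in triples)   # cancellation step (1.84)

print("residual before:", np.mean(np.abs(Y[i] - X[i]) ** 2))
print("residual after :", np.mean(np.abs(Y_tilde - X[i]) ** 2))
```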
Moreover, single-channel detection (i.e., the decoder for the $i$th user has access to $Y_i$ only) was considered in two regimes: the XPM-dominated and the FWM-dominated regimes. For example, in the FWM-dominated regime, the maximum information rate for channel $i$ is
$$I(X_i; Y_i) = \log_2\left(1 + \frac{P_i}{P_{\mathrm{ASE}}}\right) - \log_2\left(1 + \sum_{k-l+m=i} \left|\zeta^{(i)}_{k,l,m}\right|^2 \frac{P_k P_l P_m}{P_{\mathrm{ASE}}}\right) \ \text{(bits/symbol)} \qquad (1.85)$$
where $P_i \le P$ is the signal power of channel $i$. One can optimize over the channel powers to maximize the sum rate or the symmetric rate (the rate that is achievable by all users). It is concluded that the capacity for single-channel detection is significantly reduced compared to the multiple-access channel capacity.

Chen and Shieh [19] studied densely-spaced⁷ coherent optical orthogonal frequency-division multiplexing (OFDM) transmission over a multi-span dispersive fiber link. In this system, each wavelength channel is OFDM-modulated with subcarrier frequency spacing $\Delta f$. The authors treat SPM and XPM as FWM and compute the power of the interference on the $i$th subcarrier to be [19, Eq(9)]
$$P_{\mathrm{NLI}} = \sum_{j=-N/2}^{N/2} \sum_{m=-N/2-j}^{N/2-j} \frac{\gamma^2 P^3}{\alpha^2 + \left(\beta_2 (\Delta\omega)^2 jm\right)^2}\, \rho_1'(j,m) \qquad (1.86)$$
where $N$ is the total number of subcarriers in the overall WDM signal, i.e., $N$ is equal to $N_c$ times the number of subcarriers per channel, and $\rho_1'(j,m)$ is given by
$$\rho_1'(j,m) = \frac{\sin\left(N_s\, jm\, \beta_2 L_s (\Delta\omega)^2/2\right)}{\sin\left(jm\, \beta_2 L_s (\Delta\omega)^2/2\right)} \qquad (1.87)$$
where $\Delta\omega$ here is the (angular) frequency spacing of the subcarriers. The factor $\rho_1'$ captures the phase-array effect, accounting for the interference of FWM components among multiple spans. The authors follow the approach of [61] by considering the non-linear effect as multiplicative noise. A lower bound on the spectral efficiency is given:
$$S \ge \log_2\left(1 + \frac{e^{-(P/P^*)^2} P}{P_{\mathrm{ASE}} + \left(1 - e^{-(P/P^*)^2}\right) P}\right) \ \text{(bits/sec/Hz)} \qquad (1.88)$$
where $P_{\mathrm{ASE}}$ is the ASE noise accumulated from all spans over one subcarrier and $P^*$ is defined by the relation
$$P_{\mathrm{NLI}} = \left(\frac{P}{P^*}\right)^2 P. \qquad (1.89)$$
We remark that there is no justification in [19] for modeling the non-linear interference as multiplicative noise. In [61] and [80], treating XPM as multiplicative noise, and in [80] treating FWM as additive noise, was justified based on the way the XPM and FWM appear in the propagation equation. It is worth noting that, for small $P/P^*$, we have
$$\frac{e^{-(P/P^*)^2} P}{P_{\mathrm{ASE}} + \left(1 - e^{-(P/P^*)^2}\right) P} \approx \frac{P}{P_{\mathrm{ASE}} + (P/P^*)^2 P} = \frac{P}{P_{\mathrm{ASE}} + P_{\mathrm{NLI}}}. \qquad (1.90)$$

⁷ Densely-spaced is used to mean that the guard band between channels is very small compared to the channel bandwidth.

Essiambre et al [35] reviewed fundamental concepts of digital communications, information theory and the physical phenomena present in transmission over optical fiber networks. They also evaluated by numerical simulations a capacity limit estimate for the optical fiber channel using multi-ring constellations in various scenarios, e.g., different constellation shapings and different fiber dispersion maps. For WDM transmission, non-linear compensation through backpropagation (of individual channels) was used. The trend in the various scenarios is that the capacity has a peak value versus the launch power.

Bosco et al [15, 16] studied (ultra-dense) WDM transmission over uncompensated optical fiber links with both distributed and lumped amplification. They argue that, after digital signal processing (DSP) at the receiver, the distribution of each of the received constellation points is approximately Gaussian with independent components, even in the absence of additive ASE noise. Hence, they adopt a model, called the Gaussian noise (GN) model, in which the impact of non-linear propagation is approximated by excess additive Gaussian noise, which they call non-linear interference (NLI). The developed channel model is essentially the discrete-time memoryless channel
$$Y_j = X_j + Z_{\mathrm{ASE},j} + Z_{\mathrm{NLI},j} \qquad (1.91)$$
where $Y_j$ is the $j$th output and $X_j$ is the $j$th input, while $\{Z_{\mathrm{ASE},j}\}$ and $\{Z_{\mathrm{NLI},j}\}$ are two i.i.d. zero-mean circularly-symmetric Gaussian random processes with variances $P_{\mathrm{ASE}}$ and $P_{\mathrm{NLI}}$, respectively. The processes $\{Z_{\mathrm{ASE},j}\}$ and $\{Z_{\mathrm{NLI},j}\}$ are independent. $P_{\mathrm{ASE}}$ is the total ASE noise over the entire fiber link and $P_{\mathrm{NLI}} \propto P^3$. For example, suppose there are $N_c$ WDM channels, each with symbol rate $R_s$. We have⁸
$$P_{\mathrm{NLI}} = P^3\, \frac{16}{27}\, \frac{\gamma^2 L_e}{R_s\, \Delta\omega\, |\beta_2|}\, \ln\!\left(\frac{|\beta_2|\, L_e\, N_c^2\, \Delta\omega^2}{4}\right) N_s \qquad (1.92)$$
and
$$P_{\mathrm{NLI}} = P^3\, \frac{32}{27}\, \frac{\gamma^2 L}{R_s\, \Delta\omega\, |\beta_2|}\, \ln\!\left(\frac{|\beta_2|\, L\, N_c^2\, \Delta\omega^2}{6}\right) \qquad (1.93)$$
for distributed amplification [16] and for lumped amplification [15], respectively. Using the GN model, capacity estimates are derived. In [68], Poggiolini discusses the GN model in depth.

⁸ The expressions are for polarization-multiplexed transmission.
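Under the GN model (1.91) with $P_{\mathrm{NLI}} = \eta P^3$, the rate $\log_2\left(1 + P/(P_{\mathrm{ASE}} + \eta P^3)\right)$ peaks at $P_{\mathrm{opt}} = \left(P_{\mathrm{ASE}}/(2\eta)\right)^{1/3}$, since $\frac{d}{dP}\left[P/(P_{\mathrm{ASE}} + \eta P^3)\right] = 0$ gives $P_{\mathrm{ASE}} = 2\eta P^3$. The sketch below (illustrative constants only, an addition to the text) verifies this numerically:

```python
import numpy as np

# GN-model rate log2(1 + P / (P_ASE + eta * P^3)) and its analytic optimum.
P_ASE = 1e-6          # total ASE power (W), illustrative
eta = 1e3             # NLI coefficient (1/W^2), illustrative
P = np.logspace(-5, -1, 401)
rate = np.log2(1 + P / (P_ASE + eta * P ** 3))

P_opt = (P_ASE / (2 * eta)) ** (1 / 3)   # launch power maximizing the rate
print(f"numerical optimum: P = {P[np.argmax(rate)]:.3e} W, "
      f"rate = {rate.max():.2f} bits/symbol")
print(f"analytic  optimum: P = {P_opt:.3e} W")
```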
Mecozzi and Essiambre [60] studied multi-channel transmission over a dispersive fiber link with distributed amplification. They developed a general first-order perturbation theory of the signal propagation and simplified the expressions for highly dispersive, or pseudo-linear, transmission. The signal is linearly modulated⁹ at the transmitter, and the detection apparatus at the receiver is made of an optical filter (to separate the channel), mixing with a local oscillator and subsequent sampling at the symbol rate. By concentrating on inter-channel non-linearity (in particular XPM), they derive a capacity estimate per channel; e.g., the capacity of channel 0 is [60, Sec. XII]
$$C = \log_2\left(1 + \frac{\mathrm{E}\left[|a_0|^2\right]}{\mathrm{E}\left[|\Delta a_0|^2\right]}\right) \ \text{(bits/symbol)} \qquad (1.94)$$
with
$$\frac{\mathrm{E}\left[|\Delta a_0|^2\right]}{\mathrm{E}\left[|a_0|^2\right]} = \frac{P_{\mathrm{ASE}}}{P_0} + \frac{4\gamma^2 T_s^2 L}{|\beta_2|} \sum_{k\ne 0} (\kappa_k - 1)\, \frac{P_k^2}{T_s\, \Delta\omega\, |k|} \qquad (1.95)$$
where $T_s$ is the symbol interval, $P_{\mathrm{ASE}} = N_{\mathrm{ASE}}^{\mathrm{distrib}}/T_s$, $a_k$ is the input symbol of channel $k$ and $P_k$ is the average power of channel $k$, which is given by
$$P_k = \mathrm{E}\left[|a_k|^2\right] \frac{1}{T_s} \qquad (1.96)$$
assuming a unit-energy transmit pulse. The parameter $\kappa_k$ is the kurtosis of the constellation of channel $k$, defined as
$$\kappa_k = \frac{\mathrm{E}\left[|a_k|^4\right]}{\left(\mathrm{E}\left[|a_k|^2\right]\right)^2}. \qquad (1.97)$$
The capacity above is achieved when the input $a_0$ is Gaussian. For identical channels, where the average power is $P$ and the constellation is the same for all channels, the inverse of the SNR is
$$\frac{\mathrm{E}\left[|\Delta a_0|^2\right]}{\mathrm{E}\left[|a_0|^2\right]} \approx \frac{P_{\mathrm{ASE}}}{P} + P^2\, \frac{4\gamma^2 L}{R_s\, \Delta\omega\, |\beta_2|}\, 2(\kappa - 1)\left(\ln\frac{N_c}{2} + 0.577\right). \qquad (1.98)$$
For circularly-symmetric complex Gaussian random variables, the kurtosis is $\kappa = 2$. An important observation is that the constellation of the interfering channels is important in determining the system impairments. For example, if the transmitted signals are purely phase-modulated (kurtosis $\kappa = 1$), the impact of XPM is absent, which indicates that the dominant contribution to XPM impairments is caused by the amplitude modulation of the interfering channels.

⁹ The signal is the sum of modulated pulses.
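The role of the kurtosis in (1.98) is easy to check numerically. The snippet below (an added illustration) computes $\kappa$ for a Gaussian, a 16-QAM and a ring (constant-envelope) constellation, confirming that the XPM factor $\kappa - 1$ vanishes only in the constant-envelope case:

```python
import numpy as np

rng = np.random.default_rng(3)

def kurtosis(a):
    """Constellation kurtosis (1.97): E|a|^4 / (E|a|^2)^2."""
    return np.mean(np.abs(a) ** 4) / np.mean(np.abs(a) ** 2) ** 2

n = 100_000
gauss = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
qam16_pts = np.array([x + 1j * y for x in (-3, -1, 1, 3) for y in (-3, -1, 1, 3)])
qam16 = rng.choice(qam16_pts, n)
ring = np.exp(2j * np.pi * rng.random(n))        # constant envelope, uniform phase

for name, a in [("Gaussian", gauss), ("16-QAM", qam16), ("ring", ring)]:
    k = kurtosis(a)
    print(f"{name:8s}: kurtosis = {k:.3f}, XPM factor (kappa - 1) = {k - 1:.3f}")
```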
Secondini et al [70] studied WDM transmission over a dispersive fiber link. By neglecting FWM, the equation governing the signal propagation of channel $k$ is
$$\frac{\partial A_k}{\partial z} = -i\frac{\beta_2}{2} \frac{\partial^2 A_k}{\partial t^2} + i\gamma\, V_{\mathrm{SPM},k}(z,t) A_k + i 2\gamma\, V_{\mathrm{XPM},k}(z,t) A_k \qquad (1.99)$$
with
$$V_{\mathrm{SPM},k}(z,t) = |A_k(z,t)|^2 \qquad (1.100)$$
$$V_{\mathrm{XPM},k}(z,t) = \sum_{\ell\ne k} |A_\ell(z,t)|^2. \qquad (1.101)$$
The key simplification is replacing the unknown intensities appearing in the equation with those corresponding to linear propagation, i.e., setting $V_{\mathrm{SPM},k}(z,t) = |\hat{A}_k(z,t)|^2$ and $V_{\mathrm{XPM},k}(z,t) = \sum_{\ell\ne k} |\hat{A}_\ell(z,t)|^2$, where $\hat{A}_k$ is the solution to the linear equation
$$\frac{\partial \hat{A}_k}{\partial z} = -i\frac{\beta_2}{2} \frac{\partial^2 \hat{A}_k}{\partial t^2}. \qquad (1.102)$$
Although this does not solve the problem of finding an analytical solution, they derive a first-order approximation to the solution based on a frequency-resolved logarithmic perturbation. The approximate solution is used to develop a discrete-time channel model for the channel of user $k$, which is composed of the optical fiber link followed by a backpropagation block (and thus it is assumed that SPM is fully compensated), a matched filter, and sampling at the symbol rate. In the discrete-time model obtained, the $j$th output is¹⁰
$$Y_j = \sum_{m=-\infty}^{\infty} H_{j,m}\, X_{j-m} + Z_j \qquad (1.103)$$
where the $H_{j,m}$ are the coefficients of the time-varying impulse response of the channel, $X_j$ is the $j$th input symbol and $\{Z_j\}$ is an i.i.d. circularly-symmetric complex Gaussian process with mean zero and variance $P_{\mathrm{ASE}}$. A closed-form approximation of the effective XPM variance $\sigma^2$ was derived for the case of identical interfering channels, symmetrically distributed around the observed channel with no guard bands, i.e., the channel spacing is equal to the channel bandwidth. In particular, considering ideal sinc pulses and a dispersion-unmanaged link of length $L$ with ideal distributed amplification, the effective XPM variance is
$$\sigma^2 = P^2\, \frac{4\gamma^2 L}{\pi R_s^2 |\beta_2|}\, F(\kappa, N_c) \qquad (1.104)$$
where
$$F(\kappa, N_c) = (\kappa-1)\ln\frac{N_c}{2} + \left(\frac{N_c}{2}+1\right)\ln\frac{N_c+2}{N_c} - (2-\kappa)\,\frac{N_c}{2}\left[\left(\frac{N_c}{2}+1\right)\ln\frac{N_c+2}{N_c} - 1\right] \qquad (1.105)$$
in which $\kappa$ is the kurtosis of the transmitted symbols, defined as
$$\kappa = \frac{\mathrm{E}\left[|X|^4\right]}{\left(\mathrm{E}\left[|X|^2\right]\right)^2}. \qquad (1.106)$$
For large $N_c$, $F$ can be approximated as
$$F(\kappa, N_c) \approx (\kappa-1)\left(\ln\frac{N_c}{2} + \frac{1}{2}\right) + \frac{1}{2} \qquad (1.107)$$
and hence
$$\sigma^2 \approx P^2\, \frac{4\gamma^2 L}{R_s\, \Delta\omega\, |\beta_2|}\left[2(\kappa-1)\left(\ln\frac{N_c}{2} + \frac{1}{2}\right) + 1\right] \qquad (1.108)$$
where $\Delta\omega = 2\pi R_s$. By using the theory of mismatched decoding, they compute the information rate achieved by i.i.d. Gaussian input symbols and a maximum-likelihood symbol-by-symbol detector designed for a memoryless AWGN auxiliary channel with the same covariance matrix as the true channel to be
$$I_G(X;Y) = \log_2\left(1 + \frac{P |H_0|^2}{P_{\mathrm{ASE}} + \left(1 - |H_0|^2\right) P}\right) \ \text{(bits/symbol)} \qquad (1.109)$$
where $P_{\mathrm{ASE}} = N_{\mathrm{ASE}}/T_s$ and
$$|H_0|^2 \approx e^{-\sigma^2}. \qquad (1.110)$$
They also evaluate the information rate achieved by a maximum-likelihood sequence decoder designed for an auxiliary AWGN channel with inter-symbol interference, with the same input-output covariance matrix as the real channel [70, Eq(37)]. For constant-envelope (CE) modulation with uniform phase and symbol-by-symbol detection, the achievable information rate at large SNR is
$$I_{CE}(X;Y) \approx \frac{1}{2} \log_2\left(\frac{4\pi}{e}\, \frac{P |H_0|^2}{P_{\mathrm{ASE}} + P\left(1 - |H_0|^2\right)}\right) \ \text{(bits/symbol)}. \qquad (1.111)$$
We remark that $\sigma^2$ in (1.108) bears some resemblance to the second term in (1.98) and to the quantity $(P/P^*)^2$ with $P^*$ in (1.57):
$$\frac{P^2}{P^{*2}} = P^2\, \frac{2\gamma^2 L_e \ln(N_c/2)}{B\, \beta_2\, \Delta\omega} \qquad (1.112)$$
where we used $D\, \Delta\lambda = \beta_2\, \Delta\omega$.

¹⁰ We drop the index $k$ of the user.
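Since (1.105) and its large-$N_c$ approximation (1.107) were reconstructed here from a garbled source, a quick numerical comparison (an addition, not from [70]) is a useful consistency check:

```python
import numpy as np

def F_exact(kappa, Nc):
    """Effective-XPM factor (1.105), as reconstructed above."""
    b = (Nc / 2 + 1) * np.log((Nc + 2) / Nc)
    return (kappa - 1) * np.log(Nc / 2) + b - (2 - kappa) * (Nc / 2) * (b - 1)

def F_approx(kappa, Nc):
    """Large-Nc approximation (1.107)."""
    return (kappa - 1) * (np.log(Nc / 2) + 0.5) + 0.5

for kappa in (1.0, 1.32, 2.0):        # ring, 16-QAM and Gaussian kurtosis
    for Nc in (8, 32, 128):
        print(f"kappa={kappa:4.2f} Nc={Nc:3d}: "
              f"exact={F_exact(kappa, Nc):7.4f} approx={F_approx(kappa, Nc):7.4f}")
```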
Dar et al [25] proposed a block-memoryless discrete-time channel model for WDM transmission in the pseudo-linear regime in which XPM is the dominant non-linear effect. The discrete-time model is
$$Y_j = X_j\, e^{i\theta_j} + Z_{\mathrm{ASE},j} \qquad (1.113)$$
where $Y_j$ is the $j$th output, $X_j$ is the $j$th input, $\{Z_{\mathrm{ASE},j}\}$ is an i.i.d. circularly-symmetric complex Gaussian process with mean zero and variance $P_{\mathrm{ASE}}$, and $\{\theta_j\}$ is a random process that models XPM and is assumed to be a block-independent process, i.e., it remains unchanged within a block (of length $M$) but changes independently between blocks. It is assumed that $\theta_j$ is (real) Gaussian with zero mean and variance
$$\sigma^2 \propto \mathrm{E}\left[|X|^4\right] - \left(\mathrm{E}\left[|X|^2\right]\right)^2, \qquad (1.114)$$
i.e., the variance depends on the type of modulation. For the proposed model, two lower bounds were developed: the first is tight in the low-power regime and is based on i.i.d. Gaussian inputs, while the second is tighter at high power and is based on independent, identically and isotropically distributed inputs $(X_1, X_2, \ldots, X_M)$ where $\sum_{j=1}^{M} |X_j|^2$ has a chi-square distribution with $2M-1$ degrees of freedom. At high SNR, the second capacity lower bound is
$$C_{LB} \approx \left(1 - \frac{1}{2M}\right) \log_2\left(\frac{P}{P_{\mathrm{ASE}}}\right) \ \text{(bits/symbol)}. \qquad (1.115)$$
In [24, 26], Dar et al added an extra term to the previous model to give the discrete-time model
$$Y_j = X_j\, e^{i\theta_j} + Z_{\mathrm{ASE},j} + Z_{\mathrm{NL},j} \qquad (1.116)$$
where $\{Z_{\mathrm{NL},j}\}$ is an i.i.d. circularly-symmetric complex Gaussian process with mean zero and variance $P_{\mathrm{NL}}$ that is statistically independent of $\{Z_{\mathrm{ASE},j}\}$. The purpose of the extra term is to capture non-linear effects that do not manifest themselves as phase noise.

Agrell et al [3] proposed, for coherent long-haul fiber-optic links without dispersion compensation, a discrete-time model in which the $j$th output $Y_j$ is
$$Y_j = X_j + \tilde{Z}_j \sqrt{P_{\mathrm{ASE}} + \eta\left(\frac{1}{2M+1} \sum_{i=j-M}^{j+M} |X_i|^2\right)^3} \qquad (1.117)$$
where $\{\tilde{Z}_j\}$ are i.i.d. zero-mean unit-variance circularly-symmetric complex Gaussian random variables and $M$ is a positive integer that represents the length of the (one-sided) channel memory. The constant $P_{\mathrm{ASE}}$ represents the total ASE noise, while $\eta$ is a constant (independent of $P$) that quantifies the non-linear interference. They call this model the finite-memory GN model. If $\{X_j\}$ is an i.i.d. process, then
$$\lim_{M\to\infty} \frac{1}{2M+1} \sum_{i=k-M}^{k+M} |X_i|^2 = P \qquad (1.118)$$
for any given $k$, which shows that the proposed model converges to the regular GN model, provided that the inputs are i.i.d. and that the channel memory $M$ is sufficiently large. Using the finite-memory GN model, they derive semi-analytic lower bounds for non-i.i.d. inputs. It was shown through numerical simulations that the information rates of the finite-memory GN model (for different values of $M$) are higher than the rate of the regular GN model. We remark that there is little justification for the proposed discrete-time model, as it was not derived from a continuous-time description of the system.
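The finite-memory model (1.117) is straightforward to simulate. The sketch below (illustrative constants, not values from [3]) generates outputs whose noise variance is modulated by the sliding-window average power of the inputs:

```python
import numpy as np

# Simulation of the finite-memory GN model (1.117).
rng = np.random.default_rng(4)
P_ASE, eta, M, n = 1e-3, 0.5, 4, 10_000
qam = np.array([x + 1j * y for x in (-3, -1, 1, 3)
                for y in (-3, -1, 1, 3)]) / np.sqrt(10)   # unit-power 16-QAM
X = rng.choice(qam, n)

# Sliding-window average power over 2M+1 symbols (circularly padded).
win = np.ones(2 * M + 1) / (2 * M + 1)
avg_pow = np.convolve(np.abs(np.pad(X, M, mode="wrap")) ** 2, win, mode="valid")

Z = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
Y = X + Z * np.sqrt(P_ASE + eta * avg_pow ** 3)

print("effective noise variance :", np.mean(np.abs(Y - X) ** 2))
print("regular GN-model variance:", P_ASE + eta * 1.0 ** 3)
```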
Yousefi and Kschischang [87-91] discuss the non-linear Fourier transform (NFT), a method for solving a broad class of non-linear differential equations, and in particular for solving the non-linear Schrödinger equation that governs the (noiseless) propagation of a signal in optical fiber. The NFT plays for such systems the same diagonalization role that the ordinary Fourier transform plays for linear systems. They propose a scheme for information transmission, called non-linear frequency-division multiplexing (NFDM), which can be viewed as a non-linear analogue of orthogonal frequency-division multiplexing. In NFDM, information is encoded in the non-linear Fourier transform of the signal, consisting of two components: a discrete and a continuous spectral function. By modulating non-interacting degrees of freedom of a signal, deterministic crosstalk between signal components due to dispersion and non-linearity is eliminated, i.e., inter-symbol and inter-channel interference are zero.

We develop discrete-time interference channel models for WDM transmission over a single span of both dispersionless and dispersive fiber. The models are based on coupled differential equations that capture SPM, XPM and GVM (GVM is included in the dispersive case only). Transmitters send linearly-modulated pulses while receivers use matched filters with symbol-rate sampling (for dispersionless transmission) or banks of filters (for dispersive transmission). Rather than using Gaussian codebooks, we design codebooks based on a new technique called interference focusing. We show that all users achieve a pre-log of 1 simultaneously by using interference focusing. Our results were partially presented in [39] and [40].

1.5 Zero Group Velocity Mismatch

In this section, we investigate the case of zero group velocity mismatch, $d = 0$, i.e., there is no dispersion. We develop a discrete-time two-user channel model based on sampling of electric fields in Sec. 1.5.1 and show that a pre-log of 1/2 is achievable using phase modulation only in Sec. 1.5.3. In Sec. 1.5.4, we introduce our new technique, interference focusing, and show that it achieves a pre-log of 1 for all users, and therefore no degrees of freedom are lost. An extension of the discrete-time model to $K$ users is presented in Sec. 1.5.5.

1.5.1 Discrete-Time Two-User Model

Transmitter $k$ sends a string of symbols $X_k^n = (X_k[1], X_k[2], \ldots, X_k[n])$ using rectangular pulses (in the time domain). The transmitted signals $A_1(0,T)$ and $A_2(0,T)$ propagate through a dispersionless optical fiber of length $L$. At the end of the fiber ($z = L$), the signals $A_1(L,T)$ and $A_2(L,T)$ are amplified by discrete amplifiers that introduce amplified spontaneous emission noise that is modeled as additive, white and Gaussian. Receiver $k$ obtains $Y_k^n = (Y_k[1], Y_k[2], \ldots, Y_k[n])$ by matched-filtering the noisy amplified signal and sampling the filter output at the symbol rate. In this case, (1.45)-(1.46) imply that the channel is memoryless. Hence, we drop the time indices and write the input-output relationships as
$$Y_1 = X_1 \exp\left(i h_{11} |X_1|^2 + i h_{12} |X_2|^2\right) + Z_1 \qquad (1.119)$$
$$Y_2 = X_2 \exp\left(i h_{21} |X_1|^2 + i h_{22} |X_2|^2\right) + Z_2 \qquad (1.120)$$
where $Z_k$ is circularly-symmetric complex Gaussian noise with variance $N$. We assume that the noise random variables at the receivers are independent. The term $\exp(i h_{kk} |X_k|^2)$ models SPM and the term $\exp(i h_{k\ell} |X_\ell|^2)$, $k \ne \ell$, models XPM. We regard the $h_{k\ell}$ as channel coefficients that are time-invariant. These coefficients are known at the transmitters as well as the receivers. We use the power constraints
$$\frac{1}{n} \sum_{j=1}^{n} \mathrm{E}\left[|X_k[j]|^2\right] \le P, \quad k = 1, 2. \qquad (1.121)$$
A scheme is a collection $\{(\mathcal{C}_1(P,N), \mathcal{C}_2(P,N))\}$ of pairs of codes such that at $(P,N)$, user $k$ uses the code $\mathcal{C}_k(P,N)$ that satisfies the power constraint and achieves an information rate $R_k(P,N)$, where $k = 1, 2$.
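The memoryless model (1.119)-(1.120) is simple to simulate. The following sketch (an added illustration with arbitrary coefficients) shows how amplitude variations of the interferer turn into phase noise on the desired carrier, which is exactly the effect interference focusing will remove:

```python
import numpy as np

# Simulation of the two-user model (1.119)-(1.120) with arbitrary coefficients.
rng = np.random.default_rng(5)
h = np.array([[0.3, 0.7],
              [0.5, 0.2]])                       # channel coefficients h_kl (illustrative)
P, N, n = 1.0, 1e-3, 5
X1 = np.sqrt(P) * np.exp(2j * np.pi * rng.random(n))                  # ring modulation
X2 = np.sqrt(P * rng.random(n)) * np.exp(2j * np.pi * rng.random(n))  # varying power

Z1 = np.sqrt(N / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
Y1 = X1 * np.exp(1j * (h[0, 0] * np.abs(X1) ** 2
                       + h[0, 1] * np.abs(X2) ** 2)) + Z1

# User 1's received phase is rotated by h_12 |X_2|^2: amplitude-to-phase conversion.
print("XPM phase on user 1 (rad):", h[0, 1] * np.abs(X2) ** 2)
```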
We distinguish between two limiting cases: 1) fixed noise with growing powers, and 2) fixed powers with vanishing noise.

Definition 1. The high-power pre-log pair $(r_1, r_2)$ is achieved by a scheme if the rates satisfy
$$r_k(N) = \lim_{P\to\infty} \frac{R_k(P,N)}{\log(P/N)} \quad \text{for } k = 1, 2. \qquad (1.122)$$

Definition 2. The low-noise pre-log pair $(r_1, r_2)$ is achieved by a scheme if the rates satisfy
$$r_k(P) = \lim_{N\to 0} \frac{R_k(P,N)}{\log(P/N)} \quad \text{for } k = 1, 2. \qquad (1.123)$$

The (high-power or low-noise) pre-log pair $(1/2, 1/2)$ can be achieved if both users use amplitude modulation only or phase modulation only, as shown in Sec. 1.5.2 and Sec. 1.5.3, respectively. We show in Sec. 1.5.4 that the high-power pre-log pair $(1, 1)$ can be achieved through interference focusing.

1.5.2 Amplitude Modulation

First, we introduce a result by Lapidoth [53, Sec. IV].

Lemma 3 (Lapidoth). Let $Y = X + Z$ where $Z$ is a circularly-symmetric complex Gaussian random variable with mean 0 and variance $N$. Define $S \triangleq |X|^2/P$. Suppose $S$ is distributed as
$$p_S(s) = \frac{e^{-s/2}}{\sqrt{2\pi s}}, \quad s \ge 0. \qquad (1.124)$$
In other words, $|X|^2$ follows a Gamma distribution (or a chi-squared distribution) with one degree of freedom and has mean $P$. Then we have
$$I(|X|^2; |Y|^2) \ge \frac{1}{2} \log\left(1 + \frac{P}{2N}\right) + o(1) \qquad (1.125)$$
where $o(1)$ tends to zero as $P/N$ tends to infinity.

If $|X_1|^2/P$ and $|X_2|^2/P$ are distributed according to $p_S$ in (1.124), then we have for $k = 1, 2$
$$I(X_k; Y_k) \ge I(|X_k|; |Y_k|) = I(|X_k|^2; |Y_k|^2) \ge \frac{1}{2} \log\left(1 + \frac{P}{2N}\right) + o(1). \qquad (1.126)$$
It follows that the high-power and low-noise pre-log pair $(1/2, 1/2)$ can be achieved when both users use amplitude modulation.

1.5.3 Phase Modulation

Suppose the transmitters use phase modulation with $|X_1| = \sqrt{P}$ and $|X_2| = \sqrt{P}$. The input-output equations (1.119)-(1.120) become
$$Y_1 = X_1\, e^{i h_{11} P + i h_{12} P} + Z_1 \qquad (1.127)$$
$$Y_2 = X_2\, e^{i h_{21} P + i h_{22} P} + Z_2. \qquad (1.128)$$
Therefore, each receiver sees a constant phase shift, which allows us to treat each transmitter-receiver pair separately as an AWGN channel. We will show that the pre-log pair $(r_1, r_2) = (1/2, 1/2)$ can be achieved by using phase modulation only.

Theorem 4 (One-Ring Modulation). Fix $P > 0$. Let $Y = X + Z$ where $Z$ is a circularly-symmetric complex Gaussian random variable with mean 0 and variance $N$, and $X = \sqrt{P}\, e^{i\Theta_X}$ where $\Theta_X$ is a real random variable uniformly distributed on $[0, 2\pi)$. Then we have
$$I(X;Y) \ge \frac{1}{2} \log\left(\frac{P}{N}\right) - 1 \ \text{(nats)}. \qquad (1.129)$$

Proof. We have
$$I(X;Y) = h(Y) - h(Y|X) = h(Y) - h(Z) = \mathrm{E}[-\log p_Y(Y)] - \log(\pi e N). \qquad (1.130)$$
The probability density function (pdf) $p_Y$ of $Y$ can be shown to be [35, p. 688]
$$p_Y(y) = \frac{1}{\pi N}\, e^{-(y_A^2 + P)/N}\, I_0\!\left(\frac{2 y_A \sqrt{P}}{N}\right) \qquad (1.131)$$
where $I_0(\cdot)$ is the modified Bessel function of the first kind of order zero and $y_A = |y|$. Therefore, we have
$$h(Y) = \mathrm{E}\left[-\log\left(\frac{1}{\pi N}\, e^{-(Y_A^2+P)/N}\, I_0\!\left(\frac{2 Y_A\sqrt{P}}{N}\right)\right)\right] \overset{(a)}{\ge} \mathrm{E}\left[-\log\left(\frac{1}{\pi N}\, \frac{e^{-(Y_A-\sqrt{P})^2/N}}{\sqrt{2 Y_A \sqrt{P}/N}}\right)\right] \overset{(b)}{\ge} \mathrm{E}\left[-\log\left(\frac{1}{\pi N \sqrt{2 Y_A \sqrt{P}/N}}\right)\right] = \frac{1}{4}\log\left(\frac{2P}{N}\right) + \log(\pi N) + \frac{1}{2}\, \mathrm{E}\left[\log\left(Y_A\sqrt{\frac{2}{N}}\right)\right] \qquad (1.132)$$
where (a) follows by using (see Lemma 9 in Appendix 1.8.2)
$$I_0(z) \le \sqrt{\frac{2}{\pi}}\, \frac{e^z}{\sqrt{z}} \le \frac{e^z}{\sqrt{z}}, \quad z > 0 \qquad (1.133)$$
and (b) holds because $(Y_A - \sqrt{P})^2 \ge 0$. The pdf of $Y_A$ is given by
$$p_{Y_A}(y_A) = \int_0^{2\pi} p_Y(y)\, y_A\, d\theta_y = \frac{2 y_A}{N}\, e^{-(y_A^2 + P)/N}\, I_0\!\left(\frac{2 y_A \sqrt{P}}{N}\right). \qquad (1.134)$$
The last expectation in (1.132) is
$$\mathrm{E}\left[\log\left(Y_A\sqrt{\frac{2}{N}}\right)\right] = \int_0^\infty p_{Y_A}(y_A) \log\left(y_A\sqrt{\frac{2}{N}}\right) dy_A \overset{(a)}{=} \int_0^\infty z\, e^{-(z^2+\rho^2)/2}\, I_0(\rho z) \log(z)\, dz \overset{(b)}{=} \frac{1}{2}\left(\Gamma\!\left(0, \frac{P}{N}\right) + \log\frac{2P}{N}\right) \overset{(c)}{\ge} \frac{1}{2} \log\frac{2P}{N} \qquad (1.135)$$
where $\Gamma(a,x)$ is the upper incomplete Gamma function defined in (1.267). Step (a) follows by setting $\rho^2 = 2P/N$ and $z = y_A\sqrt{2/N}$, (b) follows from Lemma 12 (see Appendix 1.8.3) and (c) holds because $\Gamma(0,x) \ge 0$ for $x \ge 0$.¹¹ Combining (1.130), (1.132) and (1.135) gives
$$I(X;Y) \ge \frac{1}{2} \log\left(\frac{2P}{N}\right) - \log(e) \qquad (1.136)$$
which is at least $\frac{1}{2}\log(P/N) - 1$ and concludes the proof.

¹¹ Note that $\lim_{x\to\infty} \Gamma(0,x) = 0$.
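As a numerical check of the amplitude density (1.134) used in the proof (an added illustration), one can compare it against a histogram of $|Y|$ from simulation:

```python
import numpy as np
from scipy.special import i0e

# Monte Carlo check of the Rician amplitude pdf (1.134).
rng = np.random.default_rng(6)
P, N, n = 4.0, 1.0, 500_000
x = np.sqrt(P) * np.exp(2j * np.pi * rng.random(n))
z = np.sqrt(N / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
y_a = np.abs(x + z)

# pdf (1.134); i0e(t) = exp(-|t|) I_0(t) avoids numerical overflow.
grid = np.linspace(0.01, 2 * np.sqrt(P), 8)
pdf = (2 * grid / N) * np.exp(-(grid - np.sqrt(P)) ** 2 / N) \
      * i0e(2 * grid * np.sqrt(P) / N)

hist, edges = np.histogram(y_a, bins=200, density=True)
centers = 0.5 * (edges[1:] + edges[:-1])
for g, p in zip(grid, pdf):
    print(f"y_A={g:5.2f}: pdf={p:8.5f}  hist={np.interp(g, centers, hist):8.5f}")
```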
Now, suppose that $X_k = \sqrt{P}\, e^{i\Theta_{X,k}}$ for $k = 1, 2$, where $\Theta_{X,1}$ and $\Theta_{X,2}$ are statistically independent and uniformly distributed on $[0, 2\pi)$. It follows from (1.127), (1.128) and Theorem 4 that the high-power and low-noise pre-log pair $(1/2, 1/2)$ can be achieved when both users use phase modulation.

1.5.4 Interference Focusing

We propose an interference focusing technique in which the transmitters focus their phase interference on one point by constraining their transmitted signals to satisfy
$$h_{21} |X_1|^2 = 2\pi \tilde{n}_1, \quad \tilde{n}_1 = 1, 2, 3, \ldots \qquad (1.137)$$
$$h_{12} |X_2|^2 = 2\pi \tilde{n}_2, \quad \tilde{n}_2 = 1, 2, 3, \ldots \qquad (1.138)$$
In other words, the transmitters use multi-ring modulation with specified spacings between the rings.¹² We thereby remove the XPM interference and (1.119)-(1.120) reduce to
$$Y_k = X_k\, e^{i h_{kk} |X_k|^2} + Z_k, \quad k = 1, 2. \qquad (1.139)$$
This channel is effectively an AWGN channel, since $h_{kk}$ is known by receiver $k$ and the SPM phase shift is determined by the desired signal $X_k$. We will show that the high-power pre-log pair $(1, 1)$ is achieved under the constraints (1.137)-(1.138).

¹² Multi-ring modulation was used in [33-35] for symmetry and computational reasons. We here find that it is useful for improving rate.

Theorem 5 (Multi-Ring Modulation). Let $Y = X + Z$ where $Z$ is a circularly-symmetric complex Gaussian random variable with mean 0 and variance $N$. Suppose $\mathrm{E}[|X|^2] \le P$ and $|X|^2$ is allowed to take on values that are multiples of a fixed real number $\hat{p} > 0$, i.e., $|X|^2 = m\hat{p}$ where $m \in \mathbb{N}$. Then there exists a probability distribution $p_X$ of $X$ such that
$$\lim_{P\to\infty} \frac{I(X;Y)}{\log(P/N)} \ge 1. \qquad (1.140)$$

Proof. Define $X_A = |X|$ and $\Theta_X = \arg X$. Consider multi-ring modulation, i.e., $X_A$ and $\Theta_X$ are statistically independent, $\Theta_X$ is uniformly distributed on the interval $[0, 2\pi)$ and $X_A \in \{\sqrt{P_j} : j = 1, \ldots, J\}$, where $J$ is the number of rings. We choose the rings to be spaced uniformly in amplitude as
$$P_j = a j^2 \hat{p} \qquad (1.141)$$
where $a$ is a positive integer. We further use a uniform frequency of occupation of the rings, with $P_{X_A}(\sqrt{P_j}) = 1/J$, $j = 1, 2, \ldots, J$. The power constraint is therefore
$$\frac{1}{J} \sum_{j=1}^{J} a j^2 \hat{p} \le P. \qquad (1.142)$$
For (1.142), we compute
$$\frac{1}{J} \sum_{j=1}^{J} a \hat{p}\, j^2 = a\hat{p}\, \frac{(J+1)(2J+1)}{6} \qquad (1.143)$$
and to satisfy the power constraint we choose¹³
$$J = \frac{-3 + \sqrt{1 + 48 P/(a\hat{p})}}{4}. \qquad (1.144)$$
Moreover, we choose $a$ to scale as $N \log(P/N)$, so $J$ scales as $\sqrt{(P/N)/\log(P/N)}$.

¹³ The solution for $J$ should be positive and rounded down to the nearest integer, but we ignore these issues for notational simplicity.

We have
$$I(X;Y) = I(X_A, \Theta_X; Y) = I(X_A; Y) + I(\Theta_X; Y \,|\, X_A). \qquad (1.145)$$
The term $I(X_A; Y)$ can be viewed as the amplitude contribution while the term $I(\Theta_X; Y \,|\, X_A)$ is the phase contribution.
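The ring construction in the proof is concrete enough to code up. The sketch below (an added illustration) builds the ring powers (1.141) for given $P$, $N$ and $\hat{p}$, with $a$ scaling as $N\log(P/N)$ as chosen above:

```python
import numpy as np

def focusing_rings(P, N, p_hat):
    """Ring powers P_j = a * j^2 * p_hat per (1.141), with J from (1.144)."""
    a = max(1, int(np.ceil(N * np.log(P / N))))            # a scales as N log(P/N)
    J = int((-3 + np.sqrt(1 + 48 * P / (a * p_hat))) / 4)  # satisfies (1.142)
    return a, J, a * np.arange(1, J + 1) ** 2 * p_hat

for P in (1e2, 1e4, 1e6):
    a, J, rings = focusing_rings(P, N=1.0, p_hat=0.5)
    print(f"P={P:8.0e}: a={a:3d}, J={J:5d}, mean ring power={rings.mean():10.1f}")
```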
Lemma 6 A non-decreasing functionf(x) inx satises, for integersa andb withab, Z b x=a1 f(x)dx b X i=a f(i) Z b+1 x=a f(x)dx: (1.146) We thus have I( X ;YjX A ) (a) = J X j=1 1 J I( X ;YjX A = p P j ) (b) J X j=1 1 J 1 2 log P j N 1 (c) = 1 2J J X j=1 log aj 2 ^ p N 1 (d) 1 2J Z J x=0 log ax 2 ^ p N dx 1 (e) = 1 2 log aJ 2 ^ p Ne 2 1 (1.147) 49 where (a) follows from the uniform occupation of rings, (b) follows from Theorem 4, (c) holds by choosing the rings according to (1.141), (d) follows from Lemma 6 since the logarithm is an increasing function and (e) follows by using log(ax 2 ^ p=N) = log(a^ p=N) + 2 log(x) and Z log(x)dx =x log (x=e): (1.148) We can therefore write lim P!1 I( X ;YjX A ) log(P=N) lim P!1 1 2 log(aJ 2 ^ p=N) log(P=N) = 1 2 (1.149) where (1.149) follows becausea scales asN log(P=N),J 2 scales as (P=N)= log(P=N), and ^ p is independent of P and N. The pre-log of the phase contribution is therefore at least 1=2. 1.5.4.2 Amplitude Contribution We show that amplitude modulation contributes 1=2 to the pre-log. We have I(X A ;Y ) =H(X A )H(X A jY ) (1.150) where H(X A ) = log(J). We showed previously that J scales as p (P=N)= log(P=N) if a scales as N log(P=N). We bound H(X A jY ) using Fano's inequality as H(X A jY )H(X A j ^ X A ) H(P e ) +P e log(J 1) (1.151) 50 where ^ X A is any estimate of X A given Y , P e = Pr[ ^ X A 6= X A ] and H(P e ) is the binary entropy function with a general logarithm base. Suppose we use the minimum distance estimator ^ X A = arg min x A 2X A jY A x A j (1.152) where Y A =jYj andX A =f p P j : j = 1;:::;Jg. The probability of error P e is upper bounded by (see Lemma 13) P e 2 J J X j=2 exp 2 j 4 ! (1.153) where j = ( p P j p P j1 )= p N. For the power levels (1.141), we have j = p a^ p=N for all j, and hence P e 2(J 1) J exp a^ p 4N 2 exp a^ p 4N : (1.154) We see from (1.154) that lim P!1 P e = 0 if a scales as N log(P=N) (recall that J scales as p (P=N)= log(P=N) ). We thus have lim P!1 H(X A jY ) = 0 by using (1.151). Conse- quently, we have lim P!1 I(X A ;Y ) log(P=N) = lim P!1 log(J) log(P=N) = 1 2 : (1.155) Finally, combining (1.145), (1.149), and (1.155) gives (1.140). 51 We conclude that interference focusing achieves the largest-possible high-power pre- log of 1. Each user can therefore exploit all the phase and amplitude degrees of freedom simultaneously. 1.5.5 Discrete-Time K-User Model Equations (1.45)-(1.46) generalize to K frequencies. This motivates the following mem- oryless interference network model based on sampling the eldsA k (z;T ),k = 1; 2;:::;K, atz = 0 andz =L. Transmitterk sends a string of symbolsX n k = (X k [1];X k [2]; ;X k [n]) while receiver k sees Y n k = (Y k [1];Y k [2]; ;Y k [n]). We model the input-output relation- ship at each time instant j as Y k [j] =X k [j] exp i K X `=1 h k` jX ` [j]j 2 ! +Z k [j] (1.156) fork = 1; 2;:::;K whereZ k [j] is circularly-symmetric complex Gaussian noise with vari- anceN. All noise random variables at dierent receivers and dierent times are taken to be independent. The terms exp(ih kk jX k [j]j 2 ) model SPM and the terms exp(ih k` jX ` [j]j 2 ), k6= `, model XPM. The h k` are again channel coecients that are time invariant and are known at the transmitters as well as the receivers. The power constraints are 1 n n X j=1 E jX k [j]j 2 P; k = 1; 2;:::;K: (1.157) 52 1.5.5.1 Interference Focusing We outline how to apply interference focusing to problems with K > 2. 
Dene the interference phase vector = [ 1 ; 2 ;:::; K ] T (1.158) where k = P K `=1 h k` jX ` j 2 and the instantaneous power vector = jX 1 j 2 ;:::;jX K j 2 T : (1.159) The relation between the and in matrix form is =H SP +H XP (1.160) whereH SP is a diagonal matrix that accounts for SPM andH XP is a zero-diagonal matrix that accounts for XPM. We outline the argument forK = 3. Suppose the XPM matrix for a 3-user interference network is H XP = 2 6 6 6 6 6 6 4 0 1=2 3=5 3=4 0 2=3 5=6 1=5 0 3 7 7 7 7 7 7 5 : (1.161) 53 Suppose that each transmitter knows the channel coecients between itself and all the receiving nodes. The transmitters can thus use power levels of the form = 2 [ lcm(4; 6)m 1 ; lcm(2; 5)m 2 ; lcm(5; 3)m 3 ] = 2 [ 12m 1 ; 10m 2 ; 15m 3 ] (1.162) where lcm(a;b) is the least common multiple of a and b, and m 1 ;m 2 ;m 3 are positive integers. We thus have H XP = 2 2 6 6 6 6 6 6 4 0 5 9 9 0 10 10 2 0 3 7 7 7 7 7 7 5 2 6 6 6 6 6 6 4 m 1 m 2 m 3 3 7 7 7 7 7 7 5 (1.163) which implies that the phase interference has been eliminated. The above example combined with an analysis similar to Section 1.5.4 shows that in- terference focusing will give each user a pre-log of 1 even forK-user interference networks. However, the XPM coecientsh k` must be rationals. Modifying interference focusing for real-valued XPM coecients is an interesting problem. It is clear from the example that interference focusing does not require global channel state information. 54 1.6 Non-Zero Group Velocity Mismatch For simplicity, we ignore polarization eects and ber losses in this section. The evo- lution of the slowly-varying components of three optical elds at three dierent center frequencies inside the ber is governed by i @A 1 @z +i 11 @A 1 @t 21 2 @ 2 A 1 @t 2 + 1 (jA 1 j 2 + 2jA 2 j 2 + 2jA 3 j 2 )A 1 = 0 (1.164) i @A 2 @z +i 12 @A 2 @t 22 2 @ 2 A 2 @t 2 + 2 (jA 2 j 2 + 2jA 1 j 2 + 2jA 3 j 2 )A 2 = 0 (1.165) i @A 3 @z +i 13 @A 3 @t 23 2 @ 2 A 3 @t 2 + 3 (jA 3 j 2 + 2jA 1 j 2 + 2jA 2 j 2 )A 3 = 0: (1.166) Suppose 21 = 0, 22 = 0 and 23 = 0, i.e., we have no group velocity dispersion. The solution at z =L, where L is the ber length, is given by A k (L;t) =A k (0;t k1 L) exp (i k (L;t k1 L)) (1.167) where k = 1; 2; 3 and the time-dependent nonlinear phase shifts k (L;t) are 1 (L;t) = Z L 0 1 (jA 1 (0;t)j 2 + 2jA 2 (0;t + ( 11 12 ))j 2 + 2jA 3 (0;t + ( 11 13 ))j 2 )d (1.168) 2 (L;t) = Z L 0 2 (jA 2 (0;t)j 2 + 2jA 1 (0;t + ( 12 11 ))j 2 + 2jA 3 (0;t + ( 12 13 ))j 2 )d (1.169) 3 (L;t) = Z L 0 3 (jA 3 (0;t)j 2 + 2jA 1 (0;t + ( 13 11 ))j 2 + 2jA 2 (0;t + ( 13 12 ))j 2 )d (1.170) 55 The solution to the three coupled equations follows from steps similar to the steps outlined in Appendix 1.8.10 for two coupled equations. Dene d kj = 1k 1j : (1.171) 1.6.1 Continuous-Time Model Consider the case of 13 6= 12 6= 11 . Without loss of generality, suppose that 13 > 12 > 11 . Letfx k [m]g n1 m=0 be the codeword sent by transmitter k. Suppose that the transmitters use a pulse of shape p(t) where p(t) = 0 for t = 2 [0;T s ] and Z Ts 0 jp()j 2 d =E s : (1.172) The signal sent by transmitter k is A k (0;t) = n1 X m=0 x k [m] p(tmT s ): (1.173) The signal observed by receiver k is r k (t) =A k (L;t) +z k (t) (1.174) where z k (t) is circularly-symmetric complex Gaussian white noise with E[z k (t)] = 0, and E[z k (t)z k (t +)] = N(). The processes z 1 (t), z 2 (t) and z 3 (t) are statistically independent. 
56 The signal seen by receiver k is fed to a bank of linear time-invariant (LTI) lters with impulse responsesfh f (t)g f2F k , whereF k Z =f:::;1; 0; 1;:::g and h f (t) =p (t) exp(i2fK(t)) (1.175) where K(t) is dened as K(t) = 1 E s Z t 0 jp()j 2 d: (1.176) The choice of the setF k is specied in Sec. 1.6.5. The impulse responses of the lters are orthogonal, i.e., if f 1 6=f 2 , then (see Appendix 1.8.5) Z 1 1 h f 1 ()h f 2 ()d = 0: (1.177) The analysis is similar for all receivers, hence we present only the analysis for receiver 1 for brevity. The output of the lter with index f is y 1;f (t) =r 1 (t)?h f (t) where ? denotes convolution. The noiseless part ~ y 1;f (t) of the output of the lter with index f is given by ~ y 1;f (t + 11 L) =A 1 (L;t + 11 L)?h f (t) = A 1 (0;t)e i 1 (L;t) ?h f (t) 57 = n1 X m=0 x 1 [m] p(tmT s )e i 1 (L;t) ! ?h f (t) = n1 X m=0 x 1 [m] Z 1 1 p(mT s )p (t)e i 1 (L;)i2fK(t) d: (1.178) Sampling the output signaly 1;f (t + 11 L) at the time instantst =jT s , forj = 1; 2;:::;n, yields ~ y 1;f (jT s + 11 L) =x 1 [j] Z jTs+Ts jTs jp(jT s )j 2 e i 1 (L;)i2fK(jTs) d (1.179) where we used p(t) = 0 for t = 2 [0;T s ]. We write 1 (L;) as 1 (L;) = 11 (L;) + 12 (L;) + 13 (L;) (1.180) where we have dened 11 (L;t) = 1 LjA 1 (0;t)j 2 (1.181) 12 (L;t) = 2 1 L 12 1 T s Z t tLd 21 jA 2 (0;)j 2 d (1.182) 13 (L;t) = 2 1 L 13 1 T s Z t tLd 31 jA 3 (0;)j 2 d (1.183) and where L 1k =T s =jd 1k j for k6= 1. Since p(t) = 0 for t = 2 [0;T s ], we have 12 (L;t) = 2 1 L 12 T s Z t tLd 21 n1 X m=0 jx 2 [m]j 2 jp(mT s )j 2 d = 2 1 L 12 T s n1 X m=0 jx 2 [m]j 2 Z t tLd 21 jp(mT s )j 2 d 58 = 2 1 L 12 E s T s n1 X m=0 jx 2 [m]j 2 (tmT s ;d 21 ) (1.184) where (t;d) is dened as (t;d) = 1 E s Z t tLd jp()j 2 d: (1.185) If LdT s , then (t;d) = 8 > > > > > > > > > > < > > > > > > > > > > : K(t); 0t<T s 1; T s t<Ld ~ K(t;d); Ldt<Ld +T s 0; otherwise (1.186) where K(t) is dened by (1.176) and ~ K(t;d) is given by ~ K(t;d) = 1 E s Z Ts tLd jp()j 2 d = 1 E s Z Ts 0 jp()j 2 d 1 E s Z tLd 0 jp()j 2 d = 1K(tLd): (1.187) One can express 13 (L;t) in a similar manner. Suppose that Ld k1 = M k1 T s for some positive integer M k1 for k = 2; 3. Hence, for 2 [jT s ;jT s +T s ], we have 14 11 (L;) = 1 L E s T s jx 1 [j]j 2 (1.188) 14 We use the convention of setting the quantities that involve a negative time index to zero. 59 12 (L;) = 2 1 L 12 E s T s M 21 X r=1 jx 2 [jr]j 2 ! + jx 2 [j]j 2 jx 2 [jM 21 ]j 2 K(tjT s ) (1.189) 13 (L;) = 2 1 L 13 E s T s M 31 X r=1 jx 3 [jr]j 2 ! + jx 3 [j]j 2 jx 3 [jM 31 ]j 2 K(tjT s ) : (1.190) By substituting (1.188){(1.190) in (1.180), we get 1 (L;) = 1 [j] + 2v 1 [j]K(jT s ) (1.191) where 1 [j] =h 11 jx 1 [j]j 2 +h 12 M 21 X r=1 jx 2 [jr]j 2 +h 13 M 31 X r=1 jx 3 [jr]j 2 (1.192) v 1 [j] =h 12 jx 2 [j]j 2 jx 2 [jM 21 ]j 2 =2 +h 13 jx 3 [j]j 2 jx 3 [jM 31 ]j 2 =2 (1.193) h 11 = 1 L E s T s ; h 12 = 2 1 L 12 E s T s ; h 13 = 2 1 L 13 E s T s : (1.194) Then by substituting in (1.179), we have ~ y 1;f (jT s + 11 L) =x 1 [j] E s e i 1 [j] Z Ts 0 jp()j 2 E s e i2(v 1 [j]f)K() d: (1.195) The integral in (1.195) can be evaluated using (see Lemma 14 in Appendix 1.8.5) Z Ts 0 jp()j 2 E s e BK() d = 8 > > < > > : (e B 1)=B; if B6= 0; 1; if B = 0 (1.196) 60 where B is a complex number. 
Therefore, the noiseless part ~ y 1;f [j] of the output of the lter with index f at time j is given by ~ y 1;f [j] =x 1 [j]E s e i 1 [j] u 1;f [j] (1.197) where 1 [j] is dened (1.192) and u 1;f [j] = 8 > > < > > : exp (i2(v 1 [j]f)) 1 i2(v 1 [j]f) ; if v 1 [j]6=f 1; otherwise (1.198) where v 1 [j] is dened in (1.193). The output of the lter with index f at time j is y 1;f [j] =y 1;f (jT s + 11 L) = ~ y 1;f [j] +z 1;f [j] (1.199) where z 1;f [j] = z 1 (t)?h f (t)j t=jTs+ 11 L : (1.200) The variable z 1;f [j] is Gaussian with mean 0 and variance NE s . Moreover, due to the orthogonality of the lter bank impulse responses, we have E(z 1;f 1 [j]z 1;f 2 [j]) = 0 for all f 1 6=f 2 , which implies that the random variablesfz 1;f [j]g f2F 1 are independent. 1.6.2 Discrete-Time Model Transmitterk sends a codewordX n k = (X k [1];X k [2]; ;X k [n]) while receiverk observes Y n k = (Y k [1]; Y k [2]; ; Y k [n]). The inputX k [j] of transmitterk to the channel at time 61 j is a scalar, whereas the channel output Y k [j] at receiver k at time j is a vector whose components are Y k;f [j], f2F k . The input-output relations are: Y k;f [j] =X k [j] e i k [j] U k;f [j] +Z k;f [j] (1.201) with 1 [j] =h 11 jX 1 [j]j 2 +h 12 M 21 X r=1 jX 2 [jr]j 2 +h 13 M 31 X r=1 jX 3 [jr]j 2 (1.202) 2 [j] =h 21 M 21 X r=1 jX 1 [j +M 21 r]j 2 +h 22 jX 2 [j]j 2 +h 23 M 32 X r=1 jX 3 [jr]j 2 (1.203) 3 [j] =h 31 M 31 X r=1 jX 1 [j +M 31 r]j 2 +h 32 M 32 X r=1 jX 2 [j +M 32 r]j 2 +h 33 jX 3 [j]j 2 (1.204) where M 21 , M 31 and M 32 are positive integers and U k;f [j] = 8 > > < > > : exp (i2(V k [j]f)) 1 i2(V k [j]f) ; if V k [j]6=f 1; otherwise (1.205) where we dene V 1 [j] =h 12 (jX 2 [j]j 2 jX 2 [jM 21 ]j 2 )=2 +h 13 (jX 3 [j]j 2 jX 3 [jM 31 ]j 2 )=2 (1.206) V 2 [j] =h 21 (jX 1 [j +M 21 ]j 2 jX 1 [j]j 2 )=2 +h 23 (jX 3 [j]j 2 jX 3 [jM 32 ]j 2 )=2 (1.207) V 3 [j] =h 31 (jX 1 [j +M 31 ]j 2 jX 1 [j]j 2 )=2 +h 32 (jX 2 [j +M 32 ]j 2 jX 2 [j]j 2 )=2: (1.208) Z k;f [j] models the noise at lterf of receiverk at timej, the random variablesfZ k;f [j]g k;f;j are independent circularly-symmetric complex Gaussian random variables with mean 0 62 and variance N. We regard the h k` as channel coecients that are time invariant and known globally. The following power constraints are imposed: 1 n n X j=1 E jX k [j]j 2 P; k = 1; 2; 3: (1.209) A scheme is a collectionf(C 1 (P;N);C 2 (P;N);C 3 (P;N))g of triples of codes such that at (P;N), user k uses the codeC k (P;N) that satises the power constraint and achieves an information rate R k (P;N) for k = 1; 2; 3 where R k (P;N) =I(X k ; Y k ) lim n!1 1 n I(X n k ; Y n k ): (1.210) We extend the denitions of pre-logs made in Denitions 1 and 2. Denition 7 The high-power pre-log triple (r 1 ;r 2 ;r 3 ) is achieved by a scheme if the rates satisfy r k (N) = lim P!1 R k (P;N) log(P=N) for k = 1; 2; 3: (1.211) Denition 8 The low-noise pre-log triple (r 1 ;r 2 ;r 3 ) is achieved by a scheme if the rates satisfy r k (P ) = lim N!0 R k (P;N) log(P=N) for k = 1; 2; 3: (1.212) The (high-power or low-noise) pre-log triple (1=2; 1=2; 1=2) can be achieved if all users use phase modulation only. It is not obvious whether (1=2; 1=2; 1=2) is achievable by using amplitude modulation only. In this work, we show that the high-power pre-log triple (1; 1; 1) can be achieved for any positive N through interference focusing. 63 1.6.3 Inner Bound: Phase Modulation Suppose we use only the lter with index f = 0. Suppose further that the inputs X n k of userk are independent and identically distributed (i.i.d.) 
with a constant amplitude p P and a uniformly random phase (a ring), i.e., we have X k [j] = p Pe i X;k [j] (1.213) where X;k [j] is uniform on [;) for j = 1; 2;:::;n. Therefore, the outputs become Y k;0 [j] =X k [j] e i k [j] U k;0 [j] +Z k;0 [j] (1.214) with 1 [j] = [h 11 +h 12 M 21 +h 13 M 31 ]P 2 [j] = [h 21 M 21 +h 22 +h 23 M 32 ]P 3 [j] = [h 31 M 31 +h 32 M 32 +h 33 ]P; (1.215) i.e., the phase k [j] is constant for all j = 1;:::;n. Moreover, we have U 1;0 [j] = 1; maxfM 21 ;M 31 g<jn U 2;0 [j] = 1; M 32 <j <nM 21 U 3;0 [j] = 1; 1j <n maxfM 31 ;M 32 g: (1.216) 64 Thus, the users are decoupled under constant amplitude modulation (except near the beginning and the end of transmission). We have 1 n I(X n 1 ; Y n 1 ) (a) 1 n I(X n 1 ;Y n 1;0 ) (b) 1 n n X j=1 I(X 1 [j];Y 1;0 [j]) (c) 1 n n X j=maxfM 21 ;M 31 g+1 I(X 1 [j];Y 1;0 [j]) (d) n maxfM 21 ;M 31 g n 1 2 log P N 1 (1.217) where (a) follows from the chain rule and the non-negativity of mutual information, (b) follows becauseX 1 ;:::;X n are i.i.d. and because conditioning does not increase entropy, (c) follows from the non-negativity of mutual information and (d) holds because (see Theorem 4) I(X 1 [j];Y 1;0 [j]) 1 2 log P N 1: (1.218) As n!1, we have R ring 1 (P;N) 1 2 log P N 1: (1.219) By using similar steps for users 2 and 3, we have R ring k (P;N) 1 2 log P N 1 (1.220) 65 fork = 1; 2; 3 which implies that the pre-log triple is (1=2; 1=2; 1=2) is achieved simply by using one receiver lter and phase modulation. 1.6.4 Outer Bound Dene W n k = (W k [1];W k [2]; ;W k [n]) where W k [j] = P f2F k Y k;f [j] U k;f [j] G k [j] e ih kk jX k [j]j 2 i k [j] (1.221) and G k [j] = s X f2F k jU k;f [j]j 2 : (1.222) We remark that jG k [j]j 2 = X f2F k jU k;f [j]j 2 = X f2F k (sinc(fV k;f [j])) 2 (1.223) where sinc(x) = 8 > > < > > : sin(x) x ; if x6= 0; 1; otherwise. (1.224) By using (1.221) and (1.201), we rewrite W k [j] as W k [j] =X k [j] e ih kk jX k [j]j 2 G k [j] +Z k [j] (1.225) 66 where Z k [j] = P f2F k Z k;f [j] U k;f [j] G k [j] e ih kk jX k [j]j 2 i k [j] : (1.226) The outer bound is obtained using a genie-aided argument. The genie gives each receiver the messages it is not interested it. For example, the genie gives X n 2 and X n 3 to receiver 1 so we have I(X n 1 ; Y n 1 ) (a) I(X n 1 ; Y n 1 ;X n 2 ;X n 3 ) (b) = I(X n 1 ; Y n 1 jX n 2 ;X n 3 ) (c) = I(X n 1 ;W n 1 jX n 2 ;X n 3 ) =h (W n 1 jX n 2 ;X n 3 )h (W n 1 jX n 1 ;X n 2 ;X n 3 ) =h (W n 1 jX n 2 ;X n 3 )h (Z n 1 jX n 2 ;X n 3 ) (d) = h (W n 1 jX n 2 ;X n 3 )n log(eN) (e) n X j=1 h(W 1 [j]jX n 2 ;X n 3 )n log(eN) (f) n X j=1 E log 1 + E[jX 1 [j]j 2 ]jG 1 [j]j 2 N X n 2 X n 3 (g) n X j=1 log 1 + E[jX 1 [j]j 2 ] N (h) n log 1 + 1 n n X j=1 E[jX 1 [j]j 2 ] N ! n log 1 + P N : (1.227) 67 Step (a) follows from the chain rule and the non-negativity of mutual information; (b) holds because X n 1 is independent of X n 2 and X n 3 ; (c) holds because conditioned on X n 2 and X n 3 , W n 1 is a sucient statistic for X n 1 , i.e., I(X n 1 ; Y n 1 jW n 1 ;X n 2 ;X n 3 ) = 0; (1.228) (d) follows because, conditioned on X n 2 and X n 3 , Z n 1 is i.i.d. 
Gaussian with (conditional) mean = 0 and E jZ 1 [j]j 2 X n 2 ;X n 3 =N; (1.229) (e) follows by using the chain rule of the dierential entropy and by using the fact that conditioning does not increase entropy; (f) holds because the Gaussian distribution max- imizes the entropy (under second moment constraint) and E jW 1 [j]j 2 X n 2 ;X n 3 =E[jX 1 [j]j 2 ]jG 1 [j]j 2 +N; (1.230) (g) holds because jG k [j]j 2 1 X f=1 (sinc(fV k;f [j])) 2 = 1 (1.231) 68 which can be shown using Fourier transform techniques (see Appendix 1.8.6); and (h) follows from Jensen's inequality. By using a similar argument for receiver 2 and receiver 3, we eventually have R k log 1 + P N (1.232) for k = 1; 2; 3 which implies that the maximal pre-log triple is (1; 1; 1). 1.6.5 Interference Focusing We use interference focusing, i.e., we focus the phase interference on one point by im- posing the following constraints on the transmitted symbols: h 21 jX 1 [j]j 2 = 2~ n 21 ; h 31 jX 1 [j]j 2 = 2~ n 31 ; (1.233) h 12 jX 2 [j]j 2 = 2~ n 12 ; h 32 jX 2 [j]j 2 = 2~ n 32 ; (1.234) h 13 jX 3 [j]j 2 = 2~ n 13 ; h 23 jX 3 [j]j 2 = 2~ n 23 ; (1.235) where ~ n 21 , ~ n 31 , ~ n 12 , ~ n 32 , ~ n 13 and ~ n 23 2 N, which ensures that the XPM interference is eliminated. Suppose that h 21 , h 31 , h 12 , h 32 , h 13 and h 23 are rational. Then the interference focusing constraints become jX k [j]j 2 = 2^ p k ~ n k (1.236) 69 where ^ p 1 = lcm(den(h 21 ); den(h 31 )); (1.237) ^ p 2 = lcm(den(h 12 ); den(h 32 )); (1.238) ^ p 3 = lcm(den(h 13 ); den(h 23 )): (1.239) where den(x) is the denominator of a rational number x. Because of other constraints, e.g., power constraint, only a subset of the allowed rings is actually used, i.e., we choose jX k [j]j 2 2P k =f2^ p k ~ n; ~ n2N k g (1.240) whereN k N =f1; 2; 3;:::g. In this case, V k [j]2V k , for k = 1; 2; 3, where V k = 8 < : X j6=k d j :d j 2D j ;j2f1; 2; 3g 9 = ; (1.241) and D k =fm 1 m 2 :m 1 2N k ;m 2 2N k g (1.242) which leads us to choose the sets of \normalized frequencies"F k of the lter banks at the receivers asF k =V k . 70 Thus, under interference focusing, the output at receiverk at timej is a vector Y k [j], whose components arefY k;f [j]g f2V k , where Y k;f [j] =X k [j] exp ih kk jX k [j]j 2 U k;f [j] +Z k;f [j]; (1.243) and where U k;f [j] = 8 > > < > > : 1; if V k [j] =f; 0; otherwise. (1.244) This means that exactly one lter (the lter with indexV k [j]) output among all the lters contains the signal corrupted by noise, while all other lters put out noise at any time j = 1;:::;n. Therefore, we have 1 n I(X n k ; Y n k ) (a) 1 n n X j=1 I(X k [j]; Y k [j]) (b) 1 n n X j=1 I X k [j];Y k;V k [j] [j] (c) = I X k [1];X k [1]e ih kk jX k [1]j 2 +Z k;V k [1] [1] (1.245) where (a) follows because X n 1 are i.i.d. and because conditioning does not increase en- tropy; (b) follows from the chain rule and the non-negativity of mutual information; and (c) holds because X n 1 are i.i.d. and the channel becomes a memoryless time-invariant 71 channel under interference focusing. It follows from Theorem 5 that by using interference focusing, we have lim P!1 I X k [1];X k [1]e ih kk jX k [1]j 2 +Z k;V k [1] [1] log(P=N) 1 (1.246) which implies that r k 1 for k = 1; 2; 3. Hence, the high-power pre-log triple (1; 1; 1) is achievable. We remark that the question of whether all users can simultaneously achieve a low-noise pre-log of 1 is open for both models with and without GVM. 
The following toy example illustrates our receiver structure, and the role that inter- ference focusing plays in choosing its parameters. Example: Consider 2 transmitters that use a rectangular pulse, i.e., p(t) = 8 > > < > > : p E s =T s ; 0t<T s 0; otherwise: (1.247) Suppose that h 12 = 5, h 21 = 4, P 1 = 8, and P 2 = 7. Suppose that the users choose the sets of ringsN 1 =f1; 4; 9g andN 2 =f2; 8g (see Fig. 1.1), i.e., the power levels are P 1 =f0:5; 2; 4:5g andP 2 =f0:8; 3:2g. These choices satisfy the power constraints and eliminate the interference. The parameters of the lter bank areF 1 =V 1 =f6; 0; 6g andF 2 =V 2 =f8;5;3; 0; 3; 5; 8g. In other words, receiver 1 has 3 lters whose frequency responses are sinc functions centered at f 1 6=T s , f 1 , and f 1 + 6=T s , whereas receiver 2 has 7 lters whose frequency responses are sinc functions centered at 7 dierent frequencies (see Fig. 1.2). This shows that, because of the non-linearity, the receivers 72 XI XQ XI XQ Figure 1.1: Ring modulation used by transmitter 1 (left) and transmitter 2 (right). The thin lines are the rings allowed by interference focusing, and the thick blue lines are the rings selected for transmission. need to extract information from a \bandwidth" larger than the \bandwidth" of the transmitted signal. 1.7 Conclusion We introduced 2 discrete-time interference channel models based on a simplied optical ber model. We used coupled non-linear Schr odinger (NLS) equations to develop our models. In the rst model there was no dispersion. The non-linear nature of the ber- optic medium causes the users to suer from amplitude-dependent phase interference. We introduced a new technique called interference focusing that lets the users take full advantage of all the available amplitude and phase degrees of freedom at high transmission powers. In the second model, the second-order dispersion is negligible. However, we included non-zero group velocity mismatch as well as non-linearity. We showed that our discrete-time model is justied by using square pulse shaping at the transmitters and a bank of frequency-shifted matched lters at the receivers. We proved that both users 73 f 1 f 1 +6/T s f 1 −6/T s f 2 f 2 +8/T s f 2 −8/T s Figure 1.2: Frequency responses of the lters at receivers 1 (top) and 2 (bottom). can achieve a high-power pre-log of 1 simultaneously by using interference focusing, thus exploiting all the available amplitude and phase degrees of freedom. 74 1.8 Appendix 1.8.1 Ring Modulation in AWGN Channels We derive the probability density function (pdf) of the output Y for a ring input to an AWGN channel. The derivation is already available in the literature, e.g., see [35], but we include it here for completeness. Consider the AWGN channel Y =X +Z (1.248) where X the input and Z is a circularly-symmetric complex Gaussian with mean 0 and variance N. The conditional distribution p YjX of Y given X (in Cartesian coordinates) is p YjX (yjx) =p Y R ;Y I jX R ;X I (y R ;y I jx R ;x I ) = 1 N exp (y R x R ) 2 + (y I x I ) 2 N (1.249) where Y R =<[Y ], Y I ==[Y ], Y R =<[X] and Y I ==[X]. The probability distribution p Y of Y (in Cartesian coordinates) is p Y (y)p Y R ;Y I (y R ;y I ) = Z p X (x)p YjX (yjx)dx (1.250) 75 Suppose theX A =R whereR is xed and X is uniformly distributed on [;). 
Then we have p X (x)p X R ;X I (x R ;x I ) = 1 2 D (x 2 R +x 2 I R 2 ) R (1.251) where D () is the Dirac delta function dened by Z I D (x)dx = 8 > > < > > : 1; if 02I 0; otherwise (1.252) and p Y (y) = Z 1 1 Z 1 1 1 2 D (x 2 R +x 2 I R 2 ) R 1 N exp (y R x R ) 2 + (y I x I ) 2 N dx R dx I : (1.253) By setting x R =R cos X ; x I =R sin X (1.254) y R =y A cos Y ; y I =y A sin Y (1.255) we have p Y (y) = Z 2 =0 Z 1 x A =0 1 2 D (x 2 A R 2 ) R 76 1 N e (y 2 A +R 2 )=N exp 2 y A R cos( Y X ) N x A dx A d X = 1 N e (y 2 A +R 2 )=N Z 2 =0 1 2 e 2y A R cos( Y X )=N d X = 1 N e (y 2 A +R 2 )=N I 0 2y A R N (1.256) where I 0 () is the modied Bessel function of the rst kind of order zero. The distribution of Y A is p Y A (y A ) = Z p Y A ; Y (y A ; y )d y (a) = Z p Y R ;Y I (y A cos y ;y A sin y )y A d y (b) = 2y A N e (y 2 A +R 2 )=N I 0 2y A R N : (1.257) where (a) holds becausep Y A ; Y (y A ; y ) =p Y R ;Y I (y A cos y ;y A sin y )y A (see [65, Sec 6.3]) while (b) follows by using (1.256). 1.8.2 Upper Bound on the Modied Bessel Function of the First Kind Lemma 9 We have I 0 (z) p 2 e z p z ; z> 0: (1.258) Proof. We have cos 1 4 2 = 2 ; 0=2 (1.259) cos 0; =2 (1.260) 77 where (1.259) follows by using the innite product form for cosine cosx = 1 Y n=1 1 4x 2 2 (2n 1) 2 : (1.261) We thus have I 0 (z) = 1 Z 0 e z cos d (a) 1 " Z =2 0 e z(14 2 = 2 ) d + Z =2 d # = p 4 e z p z 1 2Q( p 2z) + 1 2 (b) p 2 e z p z (1.262) where Q(z) = Z 1 z 1 p 2 e x 2 =2 dx: (1.263) Step (a) follows because the exponential function is a monotonic increasing function and (1.259){(1.260) while (b) holds because Q(z) 0 and p 4 e z p z p 4 min z0 e z p z = p 2e 4 1 2 : (1.264) 1.8.3 Expected Value of the Logarithm of a Rician R.V. Consider the following functions. 78 Gamma function (z) [1, 6.1.1] (z) = Z 1 0 t z1 e t dt (1.265) Psi (Digamma) function (z) [1, 6.3.1] (z) = d dz ln (z) = 0 (z) (z) (1.266) Upper incomplete Gamma function (a;x) [1, 6.5.3] (a;x) = Z 1 x t a1 e t dt; a> 0 (1.267) We derive several useful lemmas concerning these functions. Lemma 10 Z 1 0 e x 2 2 x 2k+1 ln(x)dx = 2 k1 (k + 1) ln(2) + 0 (k + 1) (1.268) Proof. Consider I k = Z 1 x=0 e x 2 2 x 2k+1 ln(x)dx (a) = Z 1 u=0 e u ( p 2u) 2k+1 ln( p 2u) 1 p 2u du = Z 1 u=0 e u (2u) k 1 2 (ln(2) + ln(u))du = 2 k1 ln(2) Z 1 u=0 e u u k du + 2 k1 Z 1 u=0 e u u k ln(u)du 79 (b) = 2 k1 (k + 1) ln(2) + 0 (k + 1) (1.269) where (a) follows from the transformation of variables u = x 2 =2 and (b) holds because the integral in the rst term of the previous equation can be expressed in terms of the Gamma function (see (1.265)) as Z 1 u=0 e u u k du = (k + 1) (1.270) and the integral in the second term is given by [46, 4.352 (4)] Z 1 u=0 e u u k ln(u)du = 0 (k + 1): (1.271) Lemma 11 1 X k=0 t k k! (k + 1) =e t ((0;t) + ln(t)) (1.272) Proof. We use the following formula [17, 6.2.1 (60)] 1 X k=1 t k k! (k +a) = t a e t a (a) 1e t t + 2 F 2 (1; 1;a + 1; 2;t) (1.273) 80 where 2 F 2 (a 1 ;a 2 ;b 1 ;b 2 ;x) is the generalized hypergeometric function dened as [46, 9.14 (1)], [17, p. 674] 2 F 2 (a 1 ;a 2 ;b 1 ;b 2 ;x) = 1 X k=0 (a 1 ) k (a 2 ) k (b 1 ) k (b 2 ) k x k k! (1.274) where (f) k =f(f + 1)::: (f +k 1). Setting a = 1 in (1.273) gives 1 X k=1 t k k! (k + 1) =te t (1) 1e t t + 2 F 2 (1; 1; 2; 2;t) =e t F (t) + (1e t ) (1) (1.275) in which we dened F (t) as F (t) =t 2 F 2 (1; 1; 2; 2;t) =t 1 X k=0 1 (k + 1) 2 (t) k k! = 1 X m=1 1 m (t) m m! 
: (1.276) From the Fundamental Theorem of Calculus, we have Z z 0 F 0 (t) dt =F (z)F (0) (1.277) where F 0 (t) = dF (t) dt = 1 X m=1 (t) m1 m! = 1e t t (1.278) 81 and therefore the left-hand side of (1.277) is Z z 0 1e t t dt = Z 1 z e t t dt + Z z 1 1 t dt + Z 1 0 1e t t dt Z 1 1 e t t dt = (0;z) + ln(z) + (1.279) where we used the denition of the upper incomplete Gamma function and the following integral form for Euler's constant [46, 8.367 (12)] = Z 1 0 1e t t dt Z 1 1 e t t dt: (1.280) Since F (0) = 0 and (1) = [46, 8.367 (1)], we have F (z) = (0;z) + ln(z) (1): (1.281) It follows from (1.275) and (1.281) that 1 X k=0 t k k! (k + 1) = (1) + 1 X k=1 t k k! (k + 1) =e t ((0;t) + ln(t)): (1.282) 82 Lemma 12 Z 1 0 xe x 2 + 2 2 I 0 (x) ln(x)dx = 1 2 0; 2 2 + ln( 2 ) (1.283) Proof. We compute Z 1 0 xe x 2 + 2 2 I 0 (x) ln(x)dx (a) = Z 1 0 xe x 2 + 2 2 1 X k=0 (x 2 2 =4) k (k!) 2 ! ln(x)dx = 1 X k=0 1 4 k (k!) 2 Z 1 0 xe x 2 + 2 2 (x) 2k ln(x)dx (b) = 1 X k=0 1 4 k (k!) 2 2 k1 e 2 2 2k ((k + 1) ln(2) + 0 (k + 1)) =e 2 2 " ln(2) 2 1 X k=0 ( 2 =2) k k! (k!) 2 + 1 2 1 X k=0 ( 2 =2) k k! 0 (k + 1) (k + 1) # (c) = e 2 2 " ln(2) 2 1 X k=0 ( 2 =2) k k! + 1 2 1 X k=0 ( 2 =2) k k! (k + 1) # (d) = e 2 2 ln(2) 2 e 2 2 + 1 2 e 2 2 0; 2 2 + ln 2 2 = 1 2 0; 2 2 + ln( 2 ) (1.284) where in (a) we used the series representation of I 0 () [1, 9.6.10] I 0 (x) = 1 X k=0 (x 2 =4) k (k!) 2 : (1.285) Step (b) follows from Lemma 10, (c) follows because [1, 6.1.6] (k + 1) =k! (1.286) 83 and (k + 1) = 0 (k + 1)=(k + 1) and (d) follows because the sum in the rst term in the previous step is the series representation of the exponential function and the sum in the second term is given by Lemma 11. 1.8.4 Minimum-Distance Estimator Lemma 13 LetY =X +Z whereZ is a circularly-symmetric complex Gaussian random variable with mean 0 and variance N. SupposejXj2X A =f p P j :j = 1;:::;Jg where 0< P 1 < P 2 <:::< P J . For a minimum-distance estimator ^ X A dened as ^ X A = arg min x A 2X A jY A x A j (1.287) where Y A =jYj, the probability of error P e is P e = Pr[ ^ X A 6=X A ] 2 J J X j=2 exp 2 j 4 ! (1.288) where j = ( p P j p P j1 )= p N. Proof. Dene X A =jXj. Let P e;j be the error probability when X A = p P j . We have P e = P J j=1 (1=J)P e;j and P e;j = 8 > > > > > > > > > > > < > > > > > > > > > > > : Pr Y A p P 1 + p P 2 2 ; j = 1 Pr Y A p P K1 + p P K 2 ; j =J Pr Y A p P j1 + p P j 2 + Pr Y A p P j + p P j+1 2 ; otherwise: (1.289) 84 Conditioned onX A = p P j ,Y A is a Ricean random variable, and hence we compute [69, p. 50] Pr Y A p P j + p P j+1 2 ! =Q p P j p N=2 ; p P j + p P j+1 2 p N=2 ! (1.290) where Q(a;b) is the Marcum Q-function [21]. Consider the following bounds. Upper bound for b>a [21, UB1MG] Q(a;b) exp (ba) 2 2 : (1.291) Lower bound for b<a [21, LB2aS] Q(a;b) 1 1 2 exp (ab) 2 2 exp (a +b) 2 2 : (1.292) The bound (1.292) implies 1Q(a;b) exp (ab) 2 2 : (1.293) We use (1.290) and (1.291) to write Pr Y A p P j + p P j+1 2 ! exp 2 j+1 4 ! : (1.294) 85 where j = ( p P j p P j1 )= p N. Similarly, we use inequality (1.293) to write Pr Y A p P j1 + p P j 2 ! exp 2 j 4 ! : (1.295) Collecting our results, we have P e 1 J 2 4 exp 2 2 4 + J1 X j=2 exp 2 j 4 ! + J1 X j=2 exp 2 j+1 4 ! + exp 2 J 4 3 5 = 2 J J X j=2 exp 2 j 4 ! : (1.296) 1.8.5 Orthogonality of Impulse Responses First, we introduce a useful lemma. Lemma 14 For any complex number B, we have Z Ts 0 jp(t)j 2 E s e BK() d = 8 > > < > > : (e B 1)=B; if B6= 0; 1; if B = 0 (1.297) where K() is dened in (1.176). Proof. 
For B = 0, we have Z Ts 0 jp(t)j 2 E s e BK() d = Z Ts 0 jp(t)j 2 E s d 86 = 1 (1.298) where the last equality follows from (1.172). For B6= 0, we have Z Ts 0 jp(t)j 2 E s e BK() d (a) = 1 B Z Ts 0 BK 0 ()e BK() d (b) = e BK(Ts) e BK(0) B (c) = e B 1 B (1.299) where (a) follows by applying Leibniz rule for dierentiation under the integral sign [1, 3.3.7]: K 0 (t) = 1 E s d dt Z t 0 jp()j 2 d = 1 E s jp(t)j 2 ; (1.300) (b) is obtained through integration by substitution and (c) holds becauseK(0) = 0 and K(T s ) = 1. Next we show that the impulse responsesfh f (t)g f2F 1 are orthogonal (cf. (1.177)). For f 1 6=f 2 , we have Z 1 1 h f 1 ()h f 2 ()d (a) = Z 1 1 jp()j 2 exp(i2(f 1 f 2 )K()) d (b) = Z 1 1 jp(t)j 2 exp(i2(f 1 f 2 )K(t)) dt 87 (c) = Z Ts 0 jp(t)j 2 exp(i2(f 1 f 2 )K(t)) dt (d) = 0 (1.301) where (a) follows from K () = K(), (b) follows from the transformation of variables t =, (c) holds because p(t) = 0 for t = 2 [0;T s ] and (d) follows from Lemma 14 with B =i2(f 1 f 2 )6= 0. 1.8.6 Sum of Sinc Squared We show that 1 X n=1 (sinc(n)) 2 = 1 (1.302) for any real . Dene the Fourier transform X(f) of x(t) as X(f) =Ffx(t)g = Z 1 1 x(t)e j2ft dt: (1.303) Let x(t) =e i2t rect(t) where rect(t) = 8 > > < > > : 1; ifjtj 1=2; 0; otherwise. (1.304) The Fourier transform of x(t) is X(f) = sinc(f) (1.305) 88 We have 1 X n=1 (sinc(n)) 2 (a) = 1 X n=1 jX(n)j 2 (b) = Z 1 1 jX(f)j 2 df (c) = Z 1 1 jx(t)j 2 dt (d) = 1 (1.306) where (a) follows from (1.305) (b) holds because x(t) is time-limited (see Theorem 8.4.3 in [56]) (c) follows from Parseval's theorem (see [64, Sec. 4.3.7]) and (d) follows by direct integration ofjx(t)j 2 =jrect(t)j 2 . 1.8.7 Volterra Series In this section, we introduce the Volterra series method for solving the propagation equa- tion. We solve up to the third kernel, see [66] for more details and for the solution up to the fth kernel. In the time domain, the propagation equation is @A(z;t) @z = 2 A(z;t) 1 @A(z;t) @t i 2 2 @ 2 A(z;t) @t +i jA(z;t)j 2 A(z;t): (1.307) Let ~ A(z;!) be the Fourier transform of ~ A(z;t): ~ A(z;!) = Z 1 1 A(t;z)e i!t dt: (1.308) 89 The dierential equation (1.307) can be written in the frequency domain as @ ~ A(z;!) @z = 2 ~ A(z;!) 1 (i!) ~ A(z;!)i 2 2 (i!) 2 ~ A(z;!) +i ZZ ~ A(z;! 1 ) ~ A (z;! 2 ) ~ A(z;!! 1 +! 2 ) d! 1 2 d! 2 2 =G 1 (!) ~ A(z;!) +G 3 ZZ ~ A(z;! 1 ) ~ A (z;! 2 ) ~ A(z;!! 1 +! 2 )d! 1 d! 2 (1.309) in which we have dened G 1 (!) = 2 +i 1 ! +i 2 2 ! 2 (1.310) and G 3 = i (2) 2 : (1.311) The Volterra series transfer function (VSTF) is obtained in the frequency domain as a relationship between the Fourier transformsX(!) andY (!) of the input and the output: Y (!) = 1 X n=1 Z Z H n (! 1 ; ;! n1 ;!! 1 ! n1 ) X(! 1 )X(! n1 )X(!! 1 :::! n1 )d! 1 :::d! n1 (1.312) where H n (! 1 ; ;! n1 ;! n ) is the nth order Volterra kernel. 90 Let A(!) denote ~ A(0;!). The third-order VTSF for an optical ber, in which the propagation is described by (1.307), is ~ A(z;!) =H 1 (!;z)A(!) + ZZ H 3 (! 1 ;! 2 ;!! 1 +! 2 ;z)A(! 1 )A (! 2 )A(!! 1 +! 2 )d! 1 d! 2 : (1.313) Even-order kernels are set to zero due to the absence of even order non-linearities in an op- tical ber. The initial conditions for the kernels areH 1 (!; 0) = 1 andH n (! 1 ;:::;! n ; 0) = 0 for n> 1. Substituting (1.313) in the left hand side of (1.309) gives L.H.S. = @ @z H 1 (!;z)A(!) + ZZ H 3 (! 1 ;! 2 ;!! 1 +! 2 ;z)A(! 1 )A (! 2 )A(!! 1 +! 2 )d! 1 d! 2 = @H 1 (!;z) @z A(!) + ZZ @H 3 (! 1 ;! 2 ;!! 1 +! 2 ;z) @z A(! 1 )A (! 2 )A(!! 1 +! 2 )d! 1 d! 
2 (1.314) while substituting (1.313) in the right hand side of (1.309) gives R.H.S. =G 1 (!) H 1 (!;z)A(!) + ZZ H 3 (! 1 ;! 2 ;!! 1 +! 2 ;z)A(! 1 )A (! 2 )A(!! 1 +! 2 )d! 1 d! 2 +G 3 ZZ H 1 (! 1 ;z)A(! 1 ) + ZZ ::: H 1 (! 2 ;z)A(! 2 ) + ZZ ::: H 1 (!! 1 +! 2 ;z)A(!! 1 +! 2 ) + ZZ ::: d! 1 d! 2 =G 1 (!)H 1 (!;z)A(!) 91 + ZZ h G 1 (!)H 3 (! 1 ;! 2 ;!! 1 +! 2 ;z) +G 3 H 1 (! 1 ;z)H 1 (! 2 ;z)H 1 (!! 1 +! 2 ;z) i A(! 1 )A (! 2 )A(!! 1 +! 2 )d! 1 d! 2 + other terms. (1.315) By comparing (1.314) and (1.315), we obtain the two following dierential equations 15 @H 1 (!;z) @z =G 1 (!)H 1 (!;z) (1.316) and @H 3 (! 1 ;! 2 ;!! 1 +! 2 ;z) @z =G 1 (!)H 3 (! 1 ;! 2 ;!! 1 +! 2 ;z) +G 3 H 1 (! 1 ;z)H 1 (! 2 ;z)H 1 (!! 1 +! 2 ;z): (1.317) The solution to the rst dierential equation is H 1 (!;z) =e zG 1 (!) : (1.318) Set ! 3 =!! 1 +! 2 and write @H 3 (! 1 ;! 2 ;! 3 ;z) @z =G 1 (! 1 ! 2 +! 3 )H 3 (! 1 ;! 2 ;! 3 ;z) +G 3 H 1 (! 1 ;z)H 1 (! 2 ;z)H 1 (! 3 ;z) =G 1 (! 1 ! 2 +! 3 )H 3 (! 1 ;! 2 ;! 3 ;z) +G 3 e z(G 1 (! 1 )+G 1 (! 2 )+G 1 (! 3 )) : (1.319) 15 We ignore the higher order terms because we are solving up to the third kernel only. 92 By multiplying both sides bye zG 1 (! 1 ! 2 +! 3 ) , the dierential equation can be written as @ @z h e zG 1 (! 1 ! 2 +! 3 ) H 3 (! 1 ;! 2 ;! 3 ;z) i =G 3 e z(G 1 (! 1 )+G 1 (! 2 )+G 1 (! 3 )G 1 (! 1 ! 2 +! 3 )) : (1.320) Integrate both sides to get e zG 1 (! 1 ! 2 +! 3 ) H 3 (! 1 ;! 2 ;! 3 ;z)H 3 (! 1 ;! 2 ;! 3 ; 0) = Z z 0 G 3 e (G 1 (! 1 )+G 1 (! 2 )+G 1 (! 3 )G 1 (! 1 ! 2 +! 3 )) d: (1.321) Therefore, we have H 3 (! 1 ;! 2 ;! 3 ;z) =e zG 1 (! 1 ! 2 +! 3 ) h H 3 (! 1 ;! 2 ;! 3 ; 0) + Z z 0 G 3 e ( ~ G 1 (! 1 ;! 2 ;! 3 )) d i (1.322) where ~ G 1 (! 1 ;! 2 ;! 3 ) is dened as ~ G 1 (! 1 ;! 2 ;! 3 ) =G 1 (! 1 ) +G 1 (! 2 ) +G 1 (! 3 )G 1 (! 1 ! 2 +! 3 ): (1.323) Since H 3 (! 1 ;! 2 ;! 3 ; 0) = 0, we have H 3 (! 1 ;! 2 ;! 3 ;z) =e zG 1 (! 1 ! 2 +! 3 ) G 3 e z ~ G 1 (! 1 ;! 2 ;! 3 ) 1 ~ G 1 (! 1 ;! 2 ;! 3 ) : (1.324) 93 By substituting (1.310) in (1.323), we have ~ G 1 (! 1 ;! 2 ;! 3 ) = +i 1 [! 1 ! 2 +! 3 (! 1 ! 2 +! 3 )] +i 2 2 [! 2 1 ! 2 2 +! 2 3 (! 1 ! 2 +! 3 ) 2 ] = +i 2 [! 1 ! 2 ! 2 2 +! 2 ! 3 ! 1 ! 3 ] = +i 2 (! 1 ! 2 )(! 2 ! 3 ): (1.325) In summary, we have H 1 (!;z) =e zG 1 (!) (1.326) and H 3 (! 1 ;! 2 ;!! 1 +! 2 ;z) =H 1 (!;z)G 3 exp z ~ G 1 (! 1 ;! 2 ;!! 1 +! 2 ) 1 ~ G 1 (! 1 ;! 2 ;!! 1 +! 2 ) (1.327) where ~ G 1 (! 1 ;! 2 ;!! 1 +! 2 ) = +i 2 (! 1 ! 2 )(! 1 !): (1.328) It is convenient to dene H 0 3 (! 1 ;! 2 ;!! 1 +! 2 ;z) =G 3 exp z ~ G 1 (! 1 ;! 2 ;!! 1 +! 2 ) 1 ~ G 1 (! 1 ;! 2 ;!! 1 +! 2 ) : (1.329) 94 1.8.8 A Perturbative Approach In this section, we introduce a solution to the propagation equation (1.307) using the perturbative approach of [63]. Expand ~ A(z;!) as a power series in ~ A(z;!) =e G 1 (!)z 1 X `=0 ` F ` (!;z) (1.330) where G 1 is dened in (1.310). Substituting (1.330) in the left hand side of (1.309) gives L.H.S. =e G 1 (!)z 1 X `=0 ` @F ` (!;z) @z +G 1 (!)e G 1 (!)z 1 X `=0 ` F ` (!;z) (1.331) while substituting (1.330) in the right hand side of (1.309) gives R.H.S. =G 1 (!)e G 1 (!)z 1 X `=0 ` F ` (!;z) +i ZZ e [G 1 (! 1 )+G 1 (! 2 )+G 1 (!! 1 +! 2 )]z 1 X ` 1 ;` 2 ;` 3 =0 ` 1 +` 2 +` 3 F ` 1 (! 1 ;z)F ` 2 (! 2 ;z)F ` 3 (!! 1 +! 2 ;z) d! 1 2 d! 2 2 : (1.332) By comparing (1.331) and (1.332), we obtain @F 0 (!;z) @z = 0 (1.333) and for `6= 0 we have @F ` (!;z) @z =i ZZ e ~ G 1 (! 1 ;! 2 ;!! 1 +! 2 )z 95 X ` 1 ;` 2 ;` 3 : ` 1 +` 2 +` 3 + 1 =` F ` 1 (! 1 ;z)F ` 2 (! 2 ;z)F ` 3 (!! 1 +! 2 ;z) d! 1 2 d! 2 2 (1.334) where ~ G 1 is dened in (1.323). 
We have the initial conditionsF 0 (!; 0) =A(!) ~ A(0;!) andF ` (!; 0) = 0 for `6= 0. The solution to (1.333) is F 0 (!;z) =A(!): (1.335) For ` = 1, we have @F 1 (!;z) @z =i ZZ e ~ G 1 (! 1 ;! 2 ;!! 1 +! 2 )z F 0 (! 1 ;z)F 0 (! 2 ;z)F 0 (!! 1 +! 2 ;z) d! 1 2 d! 2 2 =i ZZ e ~ G 1 (! 1 ;! 2 ;!! 1 +! 2 )z A(! 1 )A (! 2 )A(!! 1 +! 2 ) d! 1 2 d! 2 2 (1.336) whose solution is F 1 (!;z) = ZZ i (2) 2 " exp( ~ G 1 (! 1 ;! 2 ;!! 1 +! 2 )z) 1 ~ G 1 (! 1 ;! 2 ;!! 1 +! 2 ) # A(! 1 )A (! 2 )A(!! 1 +! 2 )d! 1 d! 2 : (1.337) For ` = 2, we have @F 2 (!;z) @z 96 =i ZZ e ~ G 1 (! 1 ;! 2 ;!! 1 +! 2 )z [F 0 (! 1 ;z)F 0 (! 2 ;z)F 1 (!! 1 +! 2 ;z) +F 0 (! 1 ;z)F 1 (! 2 ;z)F 0 (!! 1 +! 2 ;z) +F 1 (! 1 ;z)F 0 (! 2 ;z)F 0 (!! 1 +! 2 ;z)] d! 1 2 d! 2 2 =i ZZ e ~ G 1 (! 1 ;! 2 ;!! 1 +! 2 )z [A(! 1 )A (! 2 )F 1 (!! 1 +! 2 ;z) +A(! 1 )F 1 (! 2 ;z)A(!! 1 +! 2 ) +F 1 (! 1 ;z)A (! 2 )A(!! 1 +! 2 )] d! 1 2 d! 2 2 : (1.338) We leave out the solution ofF ` (!;z) for` 2. See Appendix A of [63] for an expression ofF 2 (!;z). Note that the Volterra series technique and the perturbative technique give expressions of the output that agree for the rst two non-zero terms. ~ A(z;!) =e G 1 (!)z [F 0 (!;z) + F 1 (!;z)] =H 1 (!;z)A(!) + ZZ H 3 (! 1 ;! 2 ;!! 1 ! 2 )A(! 1 )A (! 2 )A(!! 1 +! 2 )d! 1 d! 2 =e G 1 (!)z " A(!) + ZZ i (2) 2 exp( ~ G 1 (! 1 ;! 2 ;!! 1 +! 2 )z) 1 ~ G 1 (! 1 ;! 2 ;!! 1 +! 2 ) A(! 1 )A (! 2 )A(!! 1 +! 2 )d! 1 d! 2 # : (1.339) 1.8.9 Non-Homogeneous First-Order Linear PDE We want to solve the non-homogeneous rst-order linear partial dierential equation (PDE) @y(z;t) @z +c @y(z;t) @t =f(z;t) (1.340) 97 where c is a constant. Let =tcz and y(z;t) = (z;tcz) = (z;). Then we have @y(z;t) @z = @ (z;) @z + @ (z;) @ @ @z = @ @z c @ @ (1.341) and @y(z;t) @t = @ (z;) @ @ @t = @ @ : (1.342) By substituting in (1.340), we get @ @z c @ @ +c @ @t =f(z; +cz) (1.343) which simplies to @ @z =f(z; +cz): (1.344) The solution to the previous dierential equation is (z;) = 0 () + Z z 0 f(; +c)d: (1.345) Since =tcz, we have (z;tcz) = 0 (tcz) + Z z 0 f(;tcz +c)d: (1.346) 98 But (z;tcz) =y(z;t) and hence y(z;t) = 0 (tcz) + Z z 0 f(;tcz +c)d: (1.347) By setting z = 0, we nd that y(0;t) = 0 (t) (1.348) and therefore, the solution to (1.340) is y(z;t) =y(0;tcz) + Z z 0 f(;tcz +c)d: (1.349) 1.8.10 Solution to Two Coupled Partial Dierential Equations We want to solve the coupled equations i @A 1 @z +i 11 @A 1 @t + 1 (jA 1 j 2 + 2jA 2 j 2 )A 1 = 0 (1.350) i @A 2 @z +i 12 @A 2 @t + 2 (jA 2 j 2 + 2jA 1 j 2 )A 2 = 0: (1.351) Let A j (z;t) =V j (z;t) exp (i j (z;t)), j = 1; 2, where V j (z;t) =jA j (z;t)j. 
Then, we have @A j @z = @V j @z e i j +V j e i j i @ j @z = @V j @z +iV j @ j @z e i j (1.352) 99 and @A j @t = @V j @t +iV j @ j @t e i j (1.353) Substituting in the coupled equations yields i @V 1 @z + 11 @V 1 @t @ 1 @z + 11 @ 1 @t 1 (V 2 1 + 2V 2 2 ) V 1 = 0 (1.354) i @V 2 @z + 12 @V 2 @t @ 2 @z + 12 @ 2 @t 2 (V 2 2 + 2V 2 1 ) V 2 = 0 (1.355) By separating the real and imaginary parts, we have @V 1 @z + 11 @V 1 @t = 0 (1.356) @ 1 @z + 11 @ 1 @t = 1 (V 2 1 + 2V 2 2 ) (1.357) @V 2 @z + 12 @V 2 @t = 0 (1.358) @ 2 @z + 12 @ 2 @t = 2 (V 2 2 + 2V 2 1 ) (1.359) Equations (1.356) and (1.358) are homogeneous rst-order PDEs whose solutions are (see Appendix 1.8.9) V 1 (z;t) =V 1 (0;t 11 z) (1.360) V 2 (z;t) =V 2 (0;t 12 z) (1.361) 100 Equations (1.357) and (1.359) are non-homogeneous rst-order PDEs whose solutions are 1 (z;t) = 1 (0;t 11 z) + Z z 0 1 V 2 1 (;t 11 z + 11 ) + 2V 2 2 (;t 11 z + 11 ) d = 1 (0;t 11 z) + Z z 0 1 V 2 1 (0;t 11 z) + 2V 2 2 (0;t 11 z + ( 11 12 )) d (1.362) 2 (z;t) = 2 (0;t 12 z) + Z z 0 2 (V 2 2 (;t 12 z + 12 ) + 2V 2 1 (;t 12 z + 12 ))d = 2 (0;t 12 z) + Z z 0 2 (V 2 2 (0;t 12 z) + 2V 2 1 (0;t 12 z + ( 12 11 )))d (1.363) Therefore, the solution to the original coupled equations is A 1 (z;t + 11 z) =A 1 (0;t) exp Z z 0 i 1 (jA 1 (0;t)j 2 + 2jA 2 (0;t + ( 11 12 ))j 2 )d (1.364) A 2 (z;t + 12 z) =A 2 (0;t) exp Z z 0 i 2 (jA 2 (0;t)j 2 + 2jA 1 (0;t + ( 12 11 ))j 2 )d (1.365) 101 Part 2 Wiener Phase Noise Channels 102 Abstract A waveform channel is considered where the transmitted signal is corrupted by Wiener phase noise and additive white Gaussian noise. A discrete-time channel model that takes into account the eect of ltering on the phase noise is developed. The model is based on a multi-sample receiver, i.e., an integrate-and-dump lter whose output is sampled at a rate higher than the signaling rate. It is shown that, at high Signal-to-Noise Ratio (SNR), the multi-sample receiver achieves a rate that grows logarithmically with the SNR if the number of samples per symbol (also called the oversampling factor) grows with the square-root of the SNR. Moreover, the pre-log factor is at least 1/2 in this case, which is achieved by amplitude modulation. The logarithmic behavior of the multi-sample receiver is a signicant improvement over the double-logarithmic (log log) behavior of a matched lter receiver. Numerical simulations are used to compute tight lower bounds on the information rates achieved by the multi-sample receiver. The numerical simulations show that oversampling at the receiver is benecial for both strong and weak phase noise at high SNR. The results are compared with results obtained when using other discrete-time models. Finally, it is shown for an approximate discrete-time model of the multi-sample receiver that the capacity pre-log at high SNR is at least 3/4 if the number of samples per symbol grows with the square root of the SNR. The analysis shows that phase modulation achieves a pre-log of at least 1/4 while amplitude modulation achieves a pre-log of 1/2. This is strictly greater than the capacity pre-log of the (approximate) discrete-time Wiener phase noise channel with only one sample per symbol, which is 1/2. 103 2.1 Introduction Phase noise arises due to the instability of oscillators [28] in communication systems such as satellite communication [18], microwave links [32] or optical ber communication [77]. The statistical characterization of the phase noise depends on the application. 
In systems with phase-locked loops (PLL), the residual phase noise follows a Tikhonov distribution [79]. In Digital Video Broadcasting DVB-S2, an example of a satellite communication system, the phase noise process is modeled by the sum of the outputs of two innite- impulse response lters driven by the same white Gaussian noise process [18]. In optical ber communication, the phase noise in laser oscillators is modeled by a Wiener process [77]. A commonly studied discrete-time channel model is Y k =X k e j k +Z k (2.1) wherefY k g are the output symbols, fX k g are the input symbols, f k g is the phase noise process andfZ k g is additive white Gaussian noise (AWGN). For example, Katz and Shamai [50] studied the model (2.1) when f k g is independent and identically distributed (i.i.d.) according to p (), when is uniformly distributed (called a non- coherent AWGN channel) and when has a Tikhonov (or von Mises) distribution (called a partially-coherent AWGN channel). The i.i.d. Tikhonov phase noise models the resid- ual phase error in systems with phase-tracking devices, e.g., phase-locked loops (PLL) and ideal interleavers/deinterleavers. Katz and Shamai [50] characterized some properties 104 of the capacity-achieving distribution. Tight lower bounds on the capacities of memo- ryless non-coherent and partially coherent AWGN channels were computed by solving an optimization problem numerically in [50] and [48], respectively. Lapidoth studied in [53] a phase noise channel (2.1) at high SNR. He considered both memoryless phase noise and phase noise with memory. He showed that the capacity grows logarithmically with the SNR with a pre-log factor 1/2, where the pre-log is due to amplitude modula- tion only. The phase modulation contributes a bounded number of bits only. Dauwels and Loeliger [27] proposed a particle ltering method to compute information rates for discrete-time continuous-state channels with memory and applied the method to (2.1) for Wiener phase noise and auto-regressive-moving-average (ARMA) phase noise. Barletta, Magarini and Spalvieri [13] computed lower bounds on information rates for (2.1) with Wiener phase noise by using the auxiliary channel technique proposed in [5] and they computed upper bounds in [12]. They also developed a lower bound based on Kalman ltering in [11]. Barbieri and Colavolpe [6] computed lower bounds with an auxiliary channel slightly dierent from [13]. Capacity bounds are developed for a multi-input multi-output (MIMO) extension of the discrete-time model (2.1) in [30,31]. A less-studied model is the continuous-time model for phase noise channels, also called a waveform phase noise channel, see Fig. 2.1. The received waveform r(t) is r(t) =x(t) e j(t) +n(t); for t2R (2.2) where x(t) is the transmitted waveform while n(t) and (t) are additive noise and phase noise, respectively. Continuous-time white phase noise is considered in [45, Section IV.C] 105 Transmitter + Receiver n(t) e j(t) x(t) r(t) Figure 2.1: Waveform phase noise channel. and [8, 9]. We study the waveform channel (2.2) with Wiener phase noise. A detailed description of the channel is given in Sec. 2.2. This model is reasonable, for example, for optical ber communication with low to intermediate power and laser phase noise, see [36{38]. Since the sampling of a continuous-time Wiener process yields a discrete-time Wiener process (Gaussian random walk), it is tempting to use the model (2.1) withfg as a discrete-time Wiener process. However, this ignores the eect of ltering prior to sampling. 
It was pointed out in [37] that \even coherent systems relying on amplitude modulation (phase noise is obviously a problem in systems employing phase modulation) will suer some degradation due to the presence of phase noise". This is because the ltering converts phase uctuations to amplitude variations. It is worth mentioning that ltering is necessary before sampling to limit the variance of the noise samples. The model (2.1) thus does not t the channel (2.2) and it is not obvious whether a pre-log 1/2 is achievable. The model that takes the eect of (matched) ltering into account is Y k =X k H k +N k (2.3) 106 wherefH k g is a fading process. The model (2.3) falls in the class of non-coherent fading channels, i.e., the transmitter and receiver have knowledge of the distribution of the fading processfH k g, but have no knowledge of its realization. For such channels, Lapidoth and Moser showed in [55] that, at high SNR, the capacity grows double-logarithmically with the SNR, when the processfH k g is stationary, ergodic, and regular. Rather than using a matched lter and sampling its output at the symbol rate, we use a multi-sample receiver, i.e., a lter whose output is sampled many times per symbol. We show that this receiver achieves a rate that grows logarithmically with the SNR if the number of samples per symbol grows with the square-root of the SNR. Furthermore, we show that a pre-log of 1/2 is achievable through amplitude modulation. We also show for an approximate multi-sample model that phase modulation contributes 1/4 to the pre-log factor at high SNR, if the oversampling factor grows with the square root of SNR. We corroborate the results of the high-SNR analysis with numerical simulations at nite SNR. Our results were partially presented in [41{43]. Benets of oversampling were reported for other problems. For example, in some digital storage systems, it was shown by numerical simulations that oversampling can increase the information rate [67]. In a low-pass-lter-and-limiter (nonlinear) channel, it was found that oversampling (with a factor of 2) oers a higher information rate [44]. It was demonstrated in [52] that doubling the sampling rate recovers some of the loss in capacity incurred on the bandlimited Gaussian channel with a one-bit output quantizer. For non-coherent block fading channels, it was shown in [29] that by doubling the sampling rate the information rate grows logarithmically at high SNR with a pre-log 11=N rather than 1Q=N which is the pre-log achieved by sampling at symbol rate (N is the length 107 of the fading block andQ is the rank of the covariance matrix of the discrete-time channel gains within each block). This chapter is organized as follows. The continuous-time model is described in Sec. 2.2 and the discrete-time models of the matched lter receiver and the multi-sample receiver are described in Sec. 2.3. We derive a lower bound on the capacity in Sec. 2.4 based on amplitude modulation and show that the pre-log factor is at least 1/2 if the oversampling factor grows with (the square root of) the SNR. We develop algorithms to compute tight lower bounds on the information rates of a multi-sample receiver in Sec. 2.5. In Sec. 2.6, we report the results of numerical simulations which demonstrate the importance of increasing the oversampling factor with the SNR to achieve higher rates. We derive a lower bound on the rate achieved by phase modulation for an approximate multi-sample discrete-time model in Sec. 
2.7 and show that phase modulation contributes at least 1/4 to the pre-log factor, resulting in an overall pre-log factor of 3/4 when both amplitude and phase modulation are used. In Sec. 2.8, we summarize our results and mention some open problems. Finally, we conclude with Sec. 2.9. 2.2 Waveform Phase Noise Channel We use the following notation: j = p 1 , denotes the complex conjugate, D is the Dirac delta function,de is the ceiling operator. We useX k as a shorthand for (X 1 ;X 2 ;:::;X k ). Suppose the transmit-waveform is x(t) and the receiver observes r(t) =x(t) e j(t) +n(t) (2.4) 108 wheren(t) is a realization of a white circularly-symmetric complex Gaussian processN(t) with E [N(t)] = 0 E [N(t 1 )N (t 2 )] = 2 N D (t 2 t 1 ): (2.5) The phase (t) is a realization of a Wiener process (t): (t) = (0) + Z t 0 W ()d (2.6) where (0) is uniform on [;) and W (t) is a real Gaussian process with E [W (t)] = 0 E [W (t 1 )W (t 2 )] = 2 D (t 2 t 1 ): (2.7) The processesN(t) and (t) are independent of each other and independent of the input. N 0 = 2 2 N is the single-sided power spectral density of the additive noise. We dene U(t) exp(j(t)). The auto-correlation function of U(t) is R U (t 1 ;t 2 ) =E [U(t 1 )U (t 2 )] = exp (jt 2 t 1 j) (2.8) and the power spectral density of U(t) is S U (f) = Z 1 1 R U (t;t +) e j2f d = 1 =2 (=2) 2 +f 2 : (2.9) 109 g(t) x(t) fx m g Figure 2.2: Transmitter: pulse-shaping lter. The spectrum is said to have a Lorentzian shape. It is easy to show that =f FWHM = 2f HWHM where f FWHM is the full-width at half-maximum and f HWHM is the half-width at half-maximum. Let T be the transmission interval, then the transmitted waveforms must satisfy the power constraint E 1 T Z T 0 jX(t)j 2 dt P (2.10) where X(t) is a random process whose realization is x(t). 2.3 Discrete-time Models Transmitter Let (x 1 ;x 2 ;:::;x b ) be the codeword sent by the transmitter. Suppose the transmitter uses a unit-energy pulseg(t) whose time support is [0;T s ] whereT s is the symbol interval (see Fig. 2.2). The waveform sent by the transmitter is x(t) = b X m=1 x m g(t (m 1)T s ): (2.11) 2.3.1 Matched Filter Receiver The received signal r(t) is fed to a matched lter and the output y(t) of the lter is sampled at symbol rate (also called Baud rate) as illustrated in Fig. 2.3. The k-th 110 g (Tst) decoder y(t) kT s y k ^ xm r(t) Figure 2.3: Matched lter receiver with symbol rate sampling. output sample is y k =r(t)?g (T s t)j t=kTs = b X m=1 x m Z 1 1 e j() g( (m 1)T s ) g ( (k 1)T s )d +n k =x k Z 1 1 e j() jg( (k 1)T s )j 2 d +n k (2.12) where k = 1;:::;b, ? denotes convolution and n k is the part of the output due to the additive noise. Therefore, we have the discrete-time model Y k =X k H k +N k (2.13) where H k Z 1 1 e j() jg( (k 1)T s )j 2 d: (2.14) The transmitter and receiver know the distribution of the fading processfH k g, but have no knowledge of its realization. For this model, Lapidoth and Moser showed in [55] that, at high SNR, the capacity grows double-logarithmically with the SNR when the process fH k g is stationary, ergodic, and regular. 111 R decoder y(t) kT s =L y k integrate and dump ^ xm r(t) Figure 2.4: Multi-sample receiver. 
Approximation A commonly-used approximation is Z 1 1 e j() jg( (k 1)T s )j 2 de j(kTs) Z 1 1 jg( (k 1)T s )j 2 d which yields the approximate discrete-time model Y k =X k e j(kTs) +N k : (2.15) 2.3.2 Multi-sample Receiver Let L be the number of samples per symbol (L 1) and dene the sample interval as = T s L : (2.16) The received waveform r(t) is ltered using an integrator over a sample interval to give the output signal y(t) = Z t t r() d: (2.17) The signal y(t) is a realization of a random process Y (t) that is sampled at t = k, k = 1;:::;n = bL (see Fig. 2.4). Hence, we have the discrete-time model in which the 112 k-th output sample is Y k =X dk=Le e j k F k +N k (2.18) where Y k Y (k), k ((k 1)), F k 1 Z k (k1) g k L 1 T s e j(() k ) d (2.19) and N k Z k (k1) N() d: (2.20) The processfN k g is an i.i.d. circularly-symmetric complex Gaussian process with mean 0 andE[jN k j 2 ] = 2 N (see Appendix 2.10.1) while the processf k g is the discrete-time Wiener process: k = k1 +W k (2.21) for k = 2;:::;n, where 1 is uniform on [;) andfW k g is an i.i.d. real Gaussian process with mean 0 andE[jW k j 2 ] = 2, i.e., the probability density function (pdf) of W k is p W k (w) =G(w; 0; 2 W ) where G(w;; 2 ) = 1 p 2 2 exp (w) 2 2 2 (2.22) 113 and 2 W = 2. The random variable (W k mod 2) is a wrapped Gaussian and its pdf is p W (w; 2 W ) where p W (w; 2 ) = 1 X i=1 G(w 2i; 0; 2 ): (2.23) Moreover,fF k g andfW k g are independent offN k g but not independent of each other. Equations (2.10) and (2.11) imply the power constraint 1 b b X m=1 E[jX m j 2 ]PPT s : (2.24) The signal-to-noise ratio SNR is dened as SNR P 2 N T s = P 2 N : (2.25) 2.4 High SNR In this section, we study rectangular pulses, i.e., we consider g(t) 8 > > < > > : p 1=T s ; 0t<T s ; 0; otherwise: (2.26) It follows that F k in (2.19) becomes F k 1 Z k (k1) e j(() k ) d: (2.27) 114 The processfF k g is i.i.d. (see Appendix 2.10.1). For the kth input symbol, we have L outputs, so it is convenient to group the L samples per symbol in one vector and dene Y k (Y (k1)L+1 ;Y (k1)L+2 ;:::;Y (k1)L+L ): (2.28) We also use Y k as a shorthand for (Y 1 ; Y 2 ;:::; Y k ). The capacity of a point-to-point channel is given by C(SNR) = lim b!1 1 b supI(X b ; Y b ) (2.29) where the supremum is over all of possible joint distributions of the input symbols satis- fying the power constraint. For a given input distribution, the achievable rate R is given by R(SNR) =I(X; Y) lim b!1 1 b I(X b ; Y b ): (2.30) We dene X A jXj and X \X where\ denotes the phase angle (also called the argument) of a complex number, then decompose the mutual information using the chain rule into two parts: I(X b ; Y b ) =I(X b A ; Y b ) +I( b X ; Y b jX b A ): (2.31) 115 Dene I(X A ; Y) lim b!1 1 b I(X b A ; Y b ) (2.32) and I( X ; YjX A ) lim b!1 1 b I( b X ; Y b jX b A ): (2.33) It follows from (2.30){(2.33) that I(X; Y) =I(X A ; Y) +I( X ; YjX A ): (2.34) The rst term represents the contribution of the amplitude modulation while the second term represents the contribution of the phase modulation. 2.4.1 Amplitude Modulation We focus on the contribution of amplitude modulation in this section. Suppose that X b A is i.i.d. 
so that I(X b A ; Y b ) (a) = b X k=1 I(X A;k ; Y b jX k1 A ) (b) = b X k=1 H(X A;k )H(X A;k jY b X k1 A ) (c) b X k=1 I(X A;k ; Y k ) (d) b X k=1 I(X A;k ;V k ) (2.35) 116 where V k = L X `=1 jY (k1)L+` j 2 : (2.36) Step (a) follows from the chain rule of mutual information, (b) follows from the indepen- dence of X A;1 ;X A;2 ;:::;X A;n , (c) holds because conditioning does not increase entropy, and (d) follows from the data processing inequality. Since X b A is identically distributed, then V b is also identically distributed and we have, for k 2, I(X A;k ;V k ) =I(X A;1 ;V 1 ): (2.37) In the rest of this section, we consider only one symbol (k = 1) and drop the time index. Moreover, we set T s = 1 for simplicity. By combining (2.36) and (2.18), we have V = L X `=1 X 2 A 2 jF ` j 2 + 2X A <[e j X e j ` F ` N ` ] +jN ` j 2 =X 2 A G + 2X A Z 1 +Z 0 (2.38) where G, Z 1 and Z 0 are dened as G 1 L L X `=1 jF ` j 2 (2.39) Z 1 L X `=1 <[e j X e j ` F ` N ` ] (2.40) Z 0 L X `=1 jN ` j 2 : (2.41) 117 The second-order statistics of Z 1 and Z 0 are (see Appendix 2.10.2) E[Z 1 ] = 0 Var[Z 1 ] =E[G] 2 N =2 E[Z 0 ] = 2 N Var [Z 0 ] = 4 N E [Z 1 (Z 0 E[Z 0 ])] = 0: (2.42) By using the Auxiliary-Channel Lower Bound Theorem in [5, Sec. VI], we have I(X A ;V )E[ logq V (V )]E[ logq VjX A (VjX A )] (2.43) where q VjX A (vjx A ) is an arbitrary auxiliary channel and q V (v) = Z p X A (x A )q VjX A (vjx A )dx A (2.44) where p X A () is the true input density, i.e., q V () is the output density obtained by connecting the true input source to the auxiliary channel. E[] is the expectation according to the true distribution of X A and V . We point out that any auxiliary channel gives a valid lower bound and the \closer" the auxiliary channel is to the true channel, the tighter the lower bound. We choose the auxiliary channel V auxiliary =X 2 A ~ G + 2X A ~ Z 1 + ~ Z 0 (2.45) where ~ G, ~ Z 1 and ~ Z 0 are dened as ~ G 1 (2.46) 118 ~ Z 1 L X `=1 <[e j X e j ` N ` ] (2.47) ~ Z 0 2 N : (2.48) We make a few remarks to motivate our choice of the auxiliary channel: G converges to 1 (in mean square) as L!1 (see (2.230) in Appendix 2.10.4). SinceN ` is circularly-symmetric complex Gaussian, thene j X e j ` N ` is also circularly- symmetric complex Gaussian with the same variance of N ` . It follows that ~ Z 1 is Gaussian since it is a linear combination of Gaussian random variables. More specif- ically, the random variable ~ Z 1 is Gaussian with mean zero and variance 2 N =2. In addition, Z 1 converges to ~ Z 1 (in mean square) as L!1. This can be shown by computing (using steps similar to (2.197) in Appendix 2.10.2) E h jZ 1 ~ Z 1 j 2 i = 1 2 2 N LE[jF 1 1j 2 ] (2.49) and using (2.226) in Appendix 2.10.3 to get lim L!1 E h jZ 1 ~ Z 1 j 2 i = 1 2 2 N lim a!1 E[jF 1 1j 2 ] = 0: (2.50) Z 0 converges to 2 N (in mean square) as L!1. 
This follows directly by taking the limit of Var(Z 0 ) given in (2.42): lim L!1 E (Z 0 2 N ) 2 = lim !0 4 N = 0: (2.51) 119 In summary, the conditional distribution q VjX A of the auxiliary channel is q VjX A (vjx A ) = 1 q 4x 2 A 2 2 N exp (vx 2 A 2 N ) 2 4x 2 A 2 2 N : (2.52) It follows that E[ log(q VjX A (VjX A ))] =E (VX 2 A 2 N ) 2 4X 2 A 2 2 N + log + 1 2 log(4 2 N ) + 1 2 E[log(X 2 A )]: (2.53) By using (2.38), we have VX 2 A 2 N 2 = X 2 A (G 1) + 2X A Z 1 + (Z 0 2 N ) 2 =X 4 A 2 (G 1) 2 + 4X 2 A 2 Z 2 1 + (Z 0 2 N ) 2 + 4X 3 A 2 (G 1)Z 1 + 2X 2 A (G 1)(Z 0 2 N ) + 4X A Z 1 (Z 0 2 N ) (2.54) and hence, by using the second-order statistics (2.42), we have E (VX 2 A 2 N ) 2 4X 2 A 2 2 N = 1 4 2 N E[X 2 A ]E (G 1) 2 + 1 2 N E[Z 2 1 ] + 1 4 2 2 N E 1 X 2 A E (Z 0 2 N ) 2 + 1 2 N E[X A ]E [(G 1)Z 1 ] + 1 2 2 N E[G 1]E[Z 0 2 N ] + 1 2 N E 1 X A E Z 1 (Z 0 2 N ) 120 = 1 4 2 N P E (G 1) 2 + 1 2 E[G] + 2 N 4 E 1 X 2 A (2.55) where we also used (see Appendix 2.10.2) E [(G 1)Z 1 ] = 0: (2.56) Substituting (2.55) into (2.53) and using E[G] 1 yield E[ log(Q VjX A (VjX A ))] log + 1 2 log(4 2 N ) + 1 2 E[log(X 2 A )] + P 4 2 N E (G 1) 2 + 1 2 + 2 N 4 E 1 X 2 A : (2.57) It is convenient to dene X P X 2 A . We choose the input distribution p X P (x P ) = 8 > > < > > : 1 exp x P P min ; x P P min 0; otherwise (2.58) where 0<P min <P and =PP min , so that E[X P ] =E[X 2 A ] =P: (2.59) It follows from (2.44) and (2.58) that q V (v) = Z 1 P min 1 exp x P P min q VjX P (vjx P ) dx P exp(P min =) f V (v) (2.60) 121 where q VjX P (vjx P ) =q VjX A (vj p x P ) (2.61) and f V (v) Z 1 0 1 exp x P q VjX P (vjx P )dx P : (2.62) The inequality (2:60) follows from the non-negativity of the integrand. By combining (2.52), (2.61), (2.62) and making the change of variables x =x P , we have f V (v) = Z 1 0 e x=() 1 q 4x 2 N exp (vx 2 N ) 2 4x 2 N dx = 1 q ( + 4 2 N ) exp 2 4 2 N " v 2 N jv 2 N j r 1 + 4 2 N #! (2.63) where we used equation (140) in Appendix A of [62]: Z 1 0 1 a exp x a 1 p bx exp (ux) 2 bx dx = 1 p a(a +b) exp 2 b " ujuj r 1 + b a #! : (2.64) Therefore, we have E[ log(f V (V ))] = 1 2 log( 2 ( 2 + 4 2 N )) 122 1 2 2 N " E[V 2 N ]E[jV 2 N j] r 1 + 4 2 N # (a) log() + 1 2 2 N E[V 2 N ] "r 1 + 4 2 N 1 # (b) log() (2.65) where (a) holds because the logarithmic function is monotonic andE[jj]E[], and (b) holds because E[V 2 N ] =E[X 2 A ]E[G] + 2E[X A ]E[Z 1 ] +E[Z 0 ] 2 N =P E[G] 0: (2.66) The monotonicity of the logarithmic function and (2.60) yield E[ log(q V (V ))]E h log e P min = f V (V ) i log + log P min (2.67) where the last inequality follows from (2.65). It follows from (2.43), (2.57) and (2.67) that I(X A ;V ) log P min 1 2 log(4 2 N ) 1 2 E[log(X 2 A )] P 4 2 N E (G 1) 2 1 2 2 N 4 E 1 X 2 A : (2.68) 123 If P min =P=2, then =PP min =P=2 and we have E 1 X P 1 P min = 2 P (2.69) and E [log (X P )] = Z 1 1 e (x)= log(x)dx (a) = log + Z 1 1 e (u1) log(u)du (b) log + 1 (2.70) where (a) follows by the change of variablesu =x=, and (b) holds because log(u)u1 for all u> 0. 
Substituting into (2.68), we obtain I(X A ;V ) 1 2 logSNR2 1 2 log(8) 1 2SNR 1 4 SNRE (G 1) 2 : (2.71) By combining (2.32), (2.35), (2.37) and (2.71), we have I(X A ; Y) 1 2 logSNR2 1 2 log(8) 1 2SNR 1 4 SNRE (G 1) 2 : (2.72) Suppose L grows with SNR such that L = l p SNR m : (2.73) 124 Since = 1=L, then we have lim SNR!1 SNR =1 and lim SNR!1 SNR 2 = 1 2 (2.74) which implies lim SNR!1 I(X A ; Y) 1 2 logSNR 2 1 2 log(8) 2 36 (2.75) because lim SNR!1 SNRE (G 1) 2 = lim SNR!1 SNR 2 lim !0 E[(G 1) 2 ] 2 = 1 2 () 2 9 = 2 9 : (2.76) This can be shown by writing E[(G 1) 2 ] =E[(GE[G]) 2 ] + (E[G] 1) 2 (2.77) then computing lim !0 E[(G 1) 2 ] 2 = lim !0 Var(G) 2 + lim !0 (E[G] 1) 2 2 = () 2 9 : (2.78) by using (see (2.231) and (2.232) in Appendix 2.10.4) lim !0 Var(G) 2 = 0 and lim !0 (E[G] 1) 2 2 = () 2 9 : 125 R () 2 y(t) kT s =L y k integrate and dump digital LTI lter L# downsample Amplitude decoder r(t) vm ^ x A;m Figure 2.5: Double ltering receiver. The main result of this section is summarized in following theorem. Theorem 15 If L =d p SNRe and the input X n A is i.i.d. such that (jX k j 2 P=2) is exponentially distributed with mean P=2, i.e.,jX k j 2 is distributed according to p X P where p X P (jxj 2 ) = 8 > > < > > : 2 P exp 1 2jxj 2 P ; jxj 2 P=2 0; otherwise (2.79) then lim SNR!1 I(X A ;Y) 1 2 logSNR2 1 2 log(8) 2 36 : (2.80) The lower bound is achieved by the double-ltering 1 receiver structure shown in Fig. 2.5. As a result, the capacity pre-log satises lim SNR!1 C(SNR) logSNR 1 2 (2.81) which follows from Theorem 15 and C(SNR)I(X; Y)I(X A ; Y). The capacity thus grows logarithmically at high SNR with a pre-log factor of at least 1/2. 1 We borrow the term \double-ltering" from [37]. 126 Remarks There is a wide literature on the design of receivers for the channel model (2.1) with discrete-time Wiener phase noise, e.g., see [7], [72], [6] and references therein. One may want to use these designs, which raises the following question: \when is it justied to approximate the non-coherent fading model (2.3) with the discrete- time phase noise model (2.1)?" Our result suggests that this approximation may be justied when the phase variation is small over one symbol interval (i.e., when the phase noise linewidth is small compared to the symbol rate) and also the SNR is low to moderate. Note that the SNR at which the high-SNR asymptotics start to manifest themselves depends on the application. The authors of [36] treated on-o keying transmission in the presence of Wiener phase noise by using a double-ltering receiver, which is composed of an intermedi- ate frequency (IF) lter, followed by an envelope detector (square-law device) and then a post-detection lter. They showed that by optimizing the IF receiver band- width the double-ltering receiver outperforms the single-ltering (matched lter) receiver. Furthermore, they showed via computer simulation that the optimum IF bandwidth increases with the SNR. This is similar to our result in the sense that we require the number of samples per symbol to increase with the SNR in order to achieve a rate that grows logarithmically with the SNR. One can achieve a pre-log of 1=2 withL less thand p SNRe. For instance, suppose we use the auxiliary channel (2.45) with the same ~ Z 1 and ~ Z 0 dened in (2.47) and 127 (2.48), respectively, but set ~ G = E[G] rather than ~ G = 1 as in (2.46). 
By using steps similar to the steps used to derive (2.72), we obtain I(X A ; Y) 1 2 logSNR2 1 2 log(8) 1 2SNR 1 4 SNRE (GE[G]) 2 : (2.82) Now choose L = l 3 p 2 SNR m : (2.83) Since = 1=L, it follows that lim SNR!1 SNR =1 and lim SNR!1 SNR 3 = 1 2 : (2.84) Moreover, we have (see Appendix 2.10.3) lim SNR!1 SNR Var(G) = lim SNR!1 SNR 3 lim !0 Var(G) 3 = 1 2 4() 2 45 = 4 2 45 : (2.85) Hence, we have lim SNR!1 I(X A ; Y) 1 2 logSNR2 1 2 log(8) 2 45 : (2.86) This means that a pre-log of 1=2 is achieved if the oversampling factorL grows with the cubic root of SNR. 128 2.5 Rate Computation at Finite SNR In this section, we develop algorithms to compute the information rate numerically. We change the notation in this section as follows: we denote the input symbols with a sub- script \symb", i.e., the input codeword is (X symb;1 ;X symb;2 ;:::;X symb;b ) and we use X k to denote the kth input sample which is dened by X k =X symb;dk=Le g ((k mod L)): (2.87) We write (2.18) as Y k =X symb;dk=Le e j k F k +N k : (2.88) The information rate I(X;Y ) is dened as I(X;Y ) lim b!1 1 b I(X b symb ; Y b ) = lim b!1 1 b I(X n ;Y n ) (2.89) where the last equality follows from the denitions (2.87) and (2.28). One diculty in evaluating (2.89) is that the joint distribution offF k g andfW k g is not available in closed form. Even the distribution ofF k is not available in closed form (there is an approximation for small linewidth, see (16) in [36]). However, we can numerically compute tight lower 129 bounds on I(X;Y ) by using the auxiliary-channel technique described in (2.43) and (2.44). That is, we have I(X;Y )I(X;Y ) =E log q YjX (YjX) q Y (Y ) (2.90) where q YjX (j) is an arbitrary auxiliary channel and q Y (y) = X ~ x p X (~ x)q YjX (yj~ x) (2.91) where p X is the true distribution of X. The distribution q Y () is the output distribution obtained by connecting the true input source to the auxiliary channel. Using this result, we can compute a lower bound on I(X;Y ) by using the following algorithm [5]: 1. Sample a long sequence (x n ;y n ) according to the true joint distribution of X n and Y n . 2. Compute q Y n jX n(y n jx n ) and q Y n(y n ) = X ~ x n p X n(~ x n )q Y n jX n(y n j~ x n ) (2.92) where p X n is the true distribution of X n . 3. Estimate I(X;Y ) using I(X;Y ) 1 b log q Y n jX n(y n jx n ) q Y n(y n ) : (2.93) 130 Auxiliary Channel I Consider the auxiliary channel k =X k e j k +N k (2.94) wheref k g andfN k g are dened in Sec. 2.3.2 andX k is dened by (2.87). The channel is the same as Y in (2.88) except that F k is replaced by 1 or more generally with g ((k mod L)). The channel is described by the conditional distribution p n jX n(y n jx n ) = Z n p n ; n jX n( n ;y n jx n ) d n (2.95) where p n ; n jX n( n ;y n jx n ) = n Y k=1 p k j k1 ( k j k1 ) p jX; (y k jx k ; k ) (2.96) with p k j k1 (j ~ ) = 8 > > < > > : p W ( ~ ; 2 W ); k 2 1=(2); k = 1 (2.97) and p jX; (yjx;) = 1 2 N exp yx e j 2 2 N ! : (2.98) The channel p n jX n has continuous states n , which makes step 2 of the algorithm com- putationally infeasible. 131 Auxiliary Channel II We use the following auxiliary channel for the numerical sim- ulations: k =X k e jS k +N k (2.99) which has the conditional probability p n jX n(y n jx n ) = X s n 2S n p S n ; n jX n(s n ;y n jx n ) (2.100) whereS is a nite set and p S n ; n jX n(s n ;y n jx n ) = n Y k=1 p S k jS k1 (s k js k1 ) p jX; (y k jx k ;s k ) (2.101) where p S k jS k1 (sj~ s) = 8 > > < > > : Q(sj~ s); k 2 1=jSj; k = 1: (2.102) Next, we describe our choice ofS and Q(j). 
We partition [;) into S intervals with equal lengths and pick the mid points of these intervals to be the elements ofS, i.e., we have S =f^ s i :i = 1;:::;Sg where ^ s i =i 2 S S : (2.103) 132 S 1 S 2 S 3 S 4 S 5 S 6 S 7 S 8 S 9 1 2 3 4 5 6 7 8 9 X 1 X 2 X 3 X 4 X 5 X 6 X 7 X 8 X 9 Figure 2.6: Bayesian network for X n ;S n ; n for n = 9. The state transition probability Q(j) is chosen similar to [13] and [12]: Q(sj~ s) = 2 S Z (; ~ )2R(s)R(~ s) p W ( ~ ; 2 W ) dd ~ (2.104) whereR(s) = [s=S;s+=S), i.e.,R(s) is the interval whose midpoint iss. The larger S and L are, the better the auxiliary channel (2.99) approximates the actual channel (2.88). We remark that the auxiliary channel gives a valid lower bound on I(X;Y ) even for small S and L. 2.5.1 Computing The Conditional Probability Suppose the input X n has the distribution p X n. A Bayesian network for X n ;S n ; n is shown in Fig. 2.6. The probability p n jX n(y n jx n ) can be computed using p n jX n(y n jx n ) = X s2S n (s) (2.105) 133 where we recursively compute k (s)p S k ; k jX n(s;y k jx n ) (2.106) (a) = X ~ s2S p S k1 ;S k ; k jX n(~ s;s;y k jx n ) (b) = X ~ s2S k1 (~ s) p S k ; k jS k1 ; k1 ;X n(s;y k j~ s;y k1 ;x n ) = X ~ s2S k1 (~ s) Q(sj~ s) p jX; (y k jx k ;s) (2.107) with the initial value 0 (s) = 1=jSj. Step (a) is a marginalization, (b) follows from Bayes' rule and the denition of k in (2.106), while (2.107) follows from the structure of Fig. 2.6. We remark that (2.107) is the same as with independent X 1 ;:::;X n , e.g., see equation (9) in [58, Sec. IV]. 2.5.2 Computing The Marginal Probability Dene X m (X (m1)L+1 ;X (m1)L+2 ;:::;X (m1)L+L ). Suppose the input symbols are i.i.d. and X m 2X whereX is a nite set. Therefore, p X n has the form p X n(x n ) = b Y m=1 p X (x m ): (2.108) A Bayesian network forX n ;S n ; n is shown in Fig. 2.7. The probability p n(y n ) can be computed using p n(y n ) = X s2S b (s) (2.109) 134 S 1 S 2 S 3 S 4 S 5 S 6 S 7 S 8 S 9 1 2 3 4 5 6 7 8 9 X 1 X 2 X 3 X 4 X 5 X 6 X 7 X 8 X 9 X symb;1 X symb;2 X symb;3 Figure 2.7: Bayesian network for X n ;S n ; n for n = 9 and L = 3. where m (s)p S mL ;Y m (s; y m ) which can be computed using the recursion: m (s) (2.110) = X ~ x2X L p X (~ x) X ~ s2S m1 (~ s) p S mL ;YmjS (m1)L ;Xm (s; y m j~ s; ~ x) with the initial value 0 (s) = 1=jSj. The setX L is X L =fx (g();g(2);:::;g(L)) :x2Xg: (2.111) We remark thatjX L j =jXj and notjXj L . Next, we dene m;L (s; ~ s; ~ x)p S mL ;YmjS (m1)L ;Xm (s; y m j~ s; ~ x) (2.112) 135 fors; ~ s2S and ~ x2X L . Computing m;L (s; ~ s; ~ x) is similar to computing n (see (2.107)). Intuitively, this is because a block ofL samples in Fig. 2.7 has a structure similar to Fig. 2.6. More precisely, m;L (s; ~ s; ~ x) can be computed recursively by using m;` (s; ~ s; ~ x) (2.113) = X &2S m;`1 (&; ~ s; ~ x) Q (sj&) p jX; y (m1)L+` j~ x ` ;s with the initial value m;0 (s; ~ s; ~ x) = 8 > > < > > : 1; s = ~ s 0; otherwise: (2.114) Therefore, computing p n(y n ) involves two levels of recursion: 1) recursion over the symbols as described by (2.110) and 2) recursion over the samples within a symbol as described by (2.113). 2.6 Numerical Simulations We use two pulses with a symbol-interval time support: A unit-energy square pulse g 1 (t) = 1 p T s rect t T s (2.115) 136 where rect(t) 8 > > < > > : 1; jtj 1=2; 0; otherwise: (2.116) A unit-energy cosine-squared pulse g 2 (t) = 1 p T s =2 cos 2 t T s rect t T s : (2.117) The rst step of the algorithm is to sample a long sequence according to the true joint distribution of X n and Y n . 
To generate samples according to the original channel (2.88), we must accurately represent digitally the continuous-time waveform (2.4). We use a simulation oversampling rate L sim = 1024 samples/symbol. After the lter (2.17), the receiver has L samples/symbol distributed according to (2.88). Next, to choose a proper sequence length, we follow the approach suggested in [5]: for a candidate length, run the algorithm about 10 times (each with a new random seed) and check whether all estimates of the information rate agree up to the desired accuracy. We used b = 10 4 unless otherwise stated. For ecient implementation of (2.107), p jX; (j;) can be factored out of the sum- mation to yield: k (s) =p jX; (y k jx k ;s) 0 k (s) z }| { X ~ s2S k1 (~ s) Q(sj~ s) (2.118) 137 Moreover, since Q(j) can be represented by a circulant matrix due to symmetry, 0 k () can be computed eciently using the Fast Fourier Transform (FFT). Similarly, the com- putation of (2.113) can be done eciently by factoring out p jX; (j;) and by using the FFT. 2.6.1 Excessively Large Linewidth −5 0 5 10 15 20 25 0 0.5 1 1.5 2 2.5 3 3.5 4 SNR (dB) I(X;Y ) (bits/symbol) AWGN L=16, S=64 L=16, S=32 L=16, S=16 L= 8, S=64 L= 8, S=32 L= 8, S=16 L= 4, S=64 L= 4, S=32 L= 4, S=16 Figure 2.8: Lower bounds on rates for 16-QAM, square transmit-pulse and multi-sample receiver at f HWHM T s = 0:125. Suppose f HWHM T s = 0:125 and the input symbols are independently and uniformly distributed (i.u.d.) 16-QAM. Fig. 2.8 shows an estimate ofI(X;Y ) for a square transmit- pulse, i.e., g(t) = g 1 (tT s =2) and an L-sample receiver with L = 4; 8; 16 and S = 16; 32; 64. The curves with S = 64 are indistinguishable from the curves with S = 32 over the entire SNR range for all values of L, and hence S = 32 is adequate up to 25 138 dB. Even S = 16 is adequate up to 20 dB. The important trend in Fig. 2.8 is that higher oversampling rateL is needed at high SNR to extract all the information from the received signal. For example, L = 4 suces up to SNR 10 dB, L = 8 suces up to SNR 15 dB butL 16 is needed beyond that. It was pointed out in [5] that the lower bounds can be interpreted as the information rates achieved by mismatched decoding. For example,I(X;Y ) forL = 8 andS 32 in Fig. 2.8 is essentially the information rate achieved by a multi-sample (8-sample) receiver that uses maximum-likelihood decoding for the simplied channel (2.94) when it is operated in the original channel (2.88). −5 0 5 10 15 20 25 0 0.5 1 1.5 2 2.5 3 3.5 4 SNR (dB) I(X;Y ) (bits/symbol) AWGN L=16, S=64 L=16, S=32 L=16, S=16 L= 8, S=64 L= 8, S=32 L= 8, S=16 L= 4, S=64 L= 4, S=32 L= 4, S=16 Figure 2.9: Lower bounds on rates for 16-QAM, cosine-squared transmit-pulse and multi- sample receiver at f HWHM T s = 0:125. Fig. 2.9 shows an estimate ofI(X;Y ) for a cosine-squared transmit-pulse, i.e.,g(t) = g 2 (tT s =2) and an L-sample receiver at L = 4; 8; 16 and S = 16; 32; 64. We nd that S = 32 suces up to 25 dB. We see in Fig. 2.9 the same trend in Fig. 2.8: higher L is 139 needed at higher SNR. Comparing Fig. 2.8 with Fig. 2.9 indicates that the square pulse is better than the cosine-squared pulse for the same oversampling rate L. 2.6.2 Large Linewidth 10 12 14 16 18 20 22 24 26 28 30 0 0.5 1 1.5 2 2.5 3 3.5 4 SNR (dB) I(X;Y ) (bits/symbol) AWGN L=16 L= 8 L= 4 L= 2 L= 1 Figure 2.10: Lower bounds on rates for 16-PSK, square transmit-pulse and multi-sample receiver at f HWHM T s = 0:0125. 
As the linewidth decreases, the benet of oversampling at the receiver becomes appar- ent only at higher SNR. For example, for f HWHM T s = 0:0125 and i.u.d. 16-PSK input, Fig. 2.10 shows an estimate of I(X;Y ) for a square transmit-pulse and an L-sample receiver at L = 1; 2; 4; 8; 16 and S = 64. We see that L = 4 suces up to SNR 19 dB, L = 8 suces up to SNR 24 dB and only beyond that L 16 is necessary. We conclude from Fig. 2.8{2.10 that the required L depends on 1) the linewidth f FWHM of the phase noise; 2) the pulse g(t); and 3) the SNR. 140 2.6.3 Comparison With Other Models We compare the discrete-time model of the multi-sample receiver with other discrete- time models. The simulation parameters for our model (GK) are b = 10 4 , L = 16 (with L sim = 1024) and S = 64 for 16-QAM (S = 128 was too computationally intensive) and S = 128 for QPSK. −5 0 5 10 15 20 25 0 0.5 1 1.5 2 2.5 3 3.5 4 SNR (dB) Information Rate (bits/symbol) 16−QAM, AWGN 16−QAM, γ = 0.4, GK 16−QAM, γ = 0.4, MTR 16−QAM, γ = 0.4, Baud QPSK, AWGN QPSK, γ = 0.5, GK QPSK, γ = 0.5, MTR QPSK, γ = 0.5, Baud Figure 2.11: Comparison of information rates for dierent models. In Fig. 2.11, we show curves for the Baud-rate model used in [6] and [27]{ [11]. The model is (2.1) where the phase noise is a Wiener process whose noise increments have variance 2 . We set 2 = 2T s . The simulation parameters for the Baud-rate model are b = 10 5 and S = 128. 141 We also show curves for the Martal o-Tripodi-Raheli (MTR) model [58] in Fig. 2.11. For the sake of comparison, we adapt the model in [58] from a square-root raised-cosine pulse to a square pulse and write the \matched" lter outputfV m g as V m = L X `=1 (m1)L+` (2.119) where m = 1;:::;b and k is dened in (2.94). The auxiliary channel is Y m =X symb;m e jm +Z m ; m 1 (2.120) where the processfZ m g is an i.i.d. circularly-symmetric complex Gaussian process with mean 0 and E[jZ m j 2 ] = 2 N T s while the processf m g is a rst-order Markov process (not a Wiener process) with a time-invariant transition probability, i.e., for k 2 and all k ; k1 2 [;), we have p k j k1 ( k j k1 ) = p 2 j 1 ( k j k1 ). Furthermore, the phase space is quantized to a nite number S of states and the transition probabilities are estimated by means of simulation. The probabilities are then used to compute a lower bound on the information rate. The simulation parameters for the MTR model are b = 10 5 , L = 16 and S = 128. We see that the Baud-rate and MTR models saturate at a rate well below the rate achieved by the multi-sample receiver. Moreover, the multi-sample receiver achieves the full 4 bits/symbol and 2 bits/symbol of 16-QAM and QPSK, respectively, at high SNR. 142 2.7 High SNR Revisited In this section, we focus on the contribution of phase modulation to capacity. However, solving the problem for the model given by (2.18) is dicult. The model we adopt in this section does not include the eect of ltering modeled byfF k g. More specically, the kth output of the simplied approximate model is Y k =X dk=Le e j k +N k (2.121) whereX k iskth input symbol andf k g andfN k g are the same processes dened in Sec. 2.3.2. The rest of this section is organized as follows. First, we show in Sec. 2.7.1 that phase modulation achieves an information rate with a pre-log of 1/2 in an AWGN channel. Even though this result is already known (e.g., see [14], [82] or [45]), the technique used in the proof is useful for proving other results as we will see. Then we show in Sec. 
2.7.2 that the contribution of phase modulation to the pre-log is at least 1/4 for the approximate model in (2.121). 2.7.1 Phase Modulation in AWGN Channels We develop a lower bound on the information rate achieved by phase modulation in an AWGN channel which shows that phase modulation achieves a pre-log of 1/2 at high SNR. Let R be a xed positive real number. Consider the channel whose output Y is Y =Re j X +Z (2.122) 143 where X 2 [;) is the input and Z is a circularly-symmetric complex Gaussian ran- dom variable with mean 0 and variance 1. Dene Y \Y . The conditional probability p Y j X is given by [4] (see Appendix 2.10.5 for details) p Y j X ( y j x ) =p ( y x ) (2.123) with p () = 1 2 e R 2 + 1 p 4 R cos e R 2 sin 2 erfc (R cos) (2.124) where erfc() is the complementary error function which is dened as erfc(z) 2 p Z 1 z e t 2 dt: (2.125) Lemma 16 Fix R> 0. For the channel Y described by (2.122), we have E [cos( Y X )] =E [cos()] 1 1 R 2 (2.126) E [sin( Y X )] =E [sin()] = 0 (2.127) where is a random variable with the probability density function p given by (2.124). Proof. From symmetry (see Lemma 29), we have E [cos( Y X )] =E [cos()] (2.128) 144 where E [cos()] Z cos() p () d (a) = 2 Z 0 cos() p () d (b) = Z 0 cos() 1 p R cos e R 2 sin 2 erfc (R cos) d = Z =2 0 cos() 1 p R cos e R 2 sin 2 erfc (R cos) d + Z =2 cos() 1 p R cos e R 2 sin 2 erfc (R cos) d (c) Z =2 0 R p cos 2 e R 2 sin 2 erfc (R cos)d (d) Z =2 0 R p cos 2 e R 2 sin 2 2e R 2 cos 2 d = Z =2 0 2R p cos 2 e R 2 sin 2 d R p e R 2 Z =2 0 cos 2 d (e) = Z =2 0 2R p cos 2 e R 2 sin 2 d p 4 Re R 2 (2.129) where (a) follows because both p () and cos() are even functions, (b) follows from (2.124) and R 0 cos d = 0, (c) follows because the integrand is non-negative over the interval 2 [=2;], (d) holds because erfc (R cos) 2e R 2 cos 2 (see Lemma 30) and cos 2 e R 2 sin 2 0 and nally (e) follows by direct integration. We bound the integral in (2.129) as follows Z =2 0 2R p cos 2 e R 2 sin 2 d (a) 2R p Z =2 0 (1 2 ) e R 2 2 d 145 = 2R p Z =2 0 e R 2 2 d 2R p Z =2 0 2 e R 2 2 d (b) = erf 2 R 1 2R 2 erf 2 R + 1 2 p R e 2 R 2 =4 (c) erf 2 R 1 2R 2 (2.130) where erf() is the error function dened as erf(z) 2 p Z z 0 e t 2 dt: (2.131) Step (a) follows from cos 2 e R 2 sin 2 (1 2 )e R 2 2 (see Lemma 33), (b) follows by using the denition of erf() and [46, 3.321(5)]: Z b 0 t 2 e a 2 t 2 dt = 1 2a 3 p 2 erf (ab)ab e a 2 b 2 (2.132) and (c) holds because erf() 1 and the last term is non-negative. Substituting back in (2.129) yields E [cos()] erf 2 R 1 2R 2 p 4 R e R 2 (2.133) 1e 2 R 2 =4 1 2R 2 p 4 R e R 2 (2.134) where the last inequality follows from the relations erf (R=2) = 1 erfc (R=2) and erfc (R=2)e 2 R 2 =4 (see [20, eq(5)]). Therefore, we have R 2 E [cos()]R 2 4 2 e 1 2 p 4 3 2e 3=2 (2.135) 146 becauseR 3 e R 2 (3=2e) 3=2 andR 2 e 2 R 2 =4 4= 2 e (see Lemma 34). We conclude that E [cos()] 1 0:83 R 2 1 1 R 2 : (2.136) We remark that E [sin()] Z sin() p () d = 0 (2.137) because the integrand is odd. Therefore, it follows from the symmetry (see Lemma 29) and (2.137) that E [sin( Y X )] = 0: (2.138) Lemma 17 Fix R> 0. Let Y be the channel described by (2.122). If the channel input X is uniformly distributed over [;), then we have I( X ;Y )I( X ; Y ) 1 2 logR 2 1: (2.139) Proof. Dene q Y j X ( y j x ) exp( cos( y x )) 2I 0 () (2.140) 147 whereI 0 () is the zeroth-order modied Bessel function of the rst kind and > 0. This distribution is known as Tikhonov (or von Mises) distribution [57]. 
Dene q Y ( y ) Z p X ( x ) q Y j X ( y j x ) d x (2.141) = 1 2 : (2.142) The mutual information between the input phase X and the output phase Y is I( X ; Y ) (a) E log q Y j X ( Y j X ) E [log (q Y ( Y ))] (b) = log(2) log(2I 0 ()) +E [cos( Y X )] = log(I 0 ()) +E [cos( Y X )] (c) 1 2 log +E [cos( Y X )] (2.143) where (a) follows from the auxiliary-channel lower bound theorem in [5, Sec. VI], (b) follows from (2.142) and (2.140) and (c) follows from [39, Lemma 2] I 0 (z) p 2 e z p z e z p z : (2.144) By combining (2.143) and (2.126), we have I( X ; Y ) 1 2 log R 2 : (2.145) 148 Therefore, setting =R 2 yields I( X ; Y ) 1 2 logR 2 1: (2.146) We remark that our motivation for choosing a Tikhonov distribution as an auxiliary channel is that the channel distribution in (2.124) is approximately Tikhonov for large R and both distributions are approximately Gaussian (see Appendix 2.10.7). However, the Tikhonov distribution is more convenient for the analysis than the (wrapped) Gaussian distribution. 2.7.2 Phase Modulation in Phase Noise Channels In this section, we study the contribution of the phase modulation to the capacity of the channel (2.121) at high SNR. We assume thatT s = 1 and 2 N = 1 for simplicity. By using the chain rule, we have I( n X ; Y n jX n A ) = n X k=1 I( X;k ; Y n jX n A ; k1 X ) (a) n X k=2 I( X;k ; Y n jX n A ; k1 X ) (b) n X k=2 I( X;k ; ~ Y k jX n A ; k1 X ) (2.147) where ~ Y k is a deterministic function of (Y n ;X n A ; k1 X ). Inequality (a) follows from the non-negativity of mutual information and (b) follows from the data processing inequality. 149 At high SNR, we use some intuition to choose a reasonable processing of (Y n ;X n A ; k1 X ) for decoding X;k : 1. Since only the past inputsX k1 are available, the future outputs Y n k+1 are not very useful for estimating (k1)L . 2. Sincef k g is a rst-order Markov process, the most recent past input symbol X k1 and the most recent output sampleY (k1)L are the most useful for estimating (k1)L . A simple estimator is e j b (k1)L Y (k1)L X k1 =e j (k1)L + N (k1)L X k1 =e j (k1)L 1 + ~ Z k1 (2.148) where ~ Z k N kL e j kL X k : (2.149) 3. Given the current input amplitudejX k j and the estimate of (k1)L , the rst sample Y (k1)L+1 in Y k is the most useful for decoding X;k because the following samples become increasingly corrupted by the phase noise. We scaleY (k1)L+1 to normalize the variance of the additive noise and write Y (k1)L+1 p = jX k j p e j X;k + ~ N k e j (k1)L+1 (2.150) 150 where ~ N k N (k1)L+1 e j (k1)L+1 p : (2.151) To summarize, we choose ~ Y k = Y (k1)L+1 p Y (k1)L X k1 : (2.152) It follows from (2.152), (2.148) and (2.150) that ~ Y k = jX k j p e j X;k + ~ N k 1 + ~ Z k1 e jW (k1)L+1 (2.153) where ~ N k and ~ Z k1 are statistically independent and ~ N k N C (0; 1) (2.154) ~ Z k1 fjX k1 j =jx k1 jgN C 0; 1 jx k1 j 2 (2.155) which means that, conditioned onfjX k1 j =jx k1 jg, ~ Z k1 is a Gaussian random variable with mean 0 and variance 1=(jx k1 j 2 ). Moreover,W (k1)L+1 is statistically independent of ~ N k and ~ Z k1 . The choice of ~ Y k in (2.152) implies that I( X;k ; ~ Y k jX n A ;X k1 ) =I( X;k ; ~ Y k jX A;k ;X k1 ): (2.156) 151 Dene ~ Y;k \ ~ Y k and Q ~ Y j X y x exp( cos( y x )) 2I 0 () : (2.157) Furthermore, dene Q ~ Y;k jX A;k ;X k1 y jx k j;x k1 Z p X;k jX A;k ;X k1 x jx k j;x k1 Q ~ Y j X ( y j x )d x = 1 2 : (2.158) The last equality holds because X 1 ;:::;X n are statistically independent and X;k is independent of X A;k with a uniform distribution on [;). 
We have I( X;k ; ~ Y k jX A;k ;X k1 ) (a) I( X;k ; ~ Y;k jX A;k ;X k1 ) (b) E h logQ ~ Y j X ( ~ Y;k j X;k ) i E h logQ ~ Y;k jX A;k ;X k1 ~ Y;k jX k j;X k1 i (c) = log(2) log(2I 0 ()) +E h cos( ~ Y;k X;k ) i = log(I 0 ()) +E h cos( ~ Y;k X;k ) i (d) 1 2 log +E h cos( ~ Y;k X;k ) i 1 2 log 2 W 2 4 SNR (2.159) 152 where (a) follows from the data processing inequality, (b) follows by extending the result of the auxiliary-channel lower bound theorem in [5, Sec. VI], (c) follows from (2.157) and (2.158), (d) follows from (2.144) and the last inequality holds because E h cos( ~ Y;k X;k ) i 1 2 W 2 4 SNR (2.160) for SNR> 2, as we will show. Dene S k 1 + ~ Z k (2.161) ^ Y k jX k j p e j X;k + ~ N k : (2.162) Therefore, we have E h cos( ~ Y;k X;k ) i (a) = E h cos(W (k1)L+1 + S;k1 + ^ Y;k X;k ) i (b) = E cos(W (k1)L+1 + S;k1 ) E h cos( ^ Y;k X;k ) i E sin(W (k1)L+1 + S;k1 ) E h sin( ^ Y;k X;k ) i (2.163) where S;k \S k and ^ Y;k \ ^ Y k . Step (a) follows from (2.153) while step (b) follows from the trigonometric relation cos(A +B) = cos(A) cos(B) sin(A) sin(B) and that W (k1)L+1 + S;k1 is independent of ^ Y;k and X;k . 153 GivenX A;k =jx k j, ^ Y k is statistically the same as Y in (2.122) withR =jx k j p , and therefore we have from Lemma 16 that E cos( ^ Y;k X;k ) X A;k =jx k j 1 1 jx k j 2 (2.164) E sin( ^ Y;k X;k ) X A;k =jx k j = 0: (2.165) Equations (2.163) and (2.165) imply that we do not need to computeE sin(W (k1)L+1 + S;k1 ) . We have E cos(W (k1)L+1 + S;k1 ) =E cos(W (k1)L+1 ) E [cos( S;k1 )]E sin(W (k1)L+1 ) E [sin( S;k1 )] (2.166) becauseW (k1)L+1 and S;k1 are independent. SinceW (k1)L+1 is Gaussian with mean 0 and variance 2 W = 2, the characteristic function of W (k1)L+1 is E e jW (k1)L+1 =e 2 W =2 (2.167) and by using the linearity of expectation, we have E cos(W (k1)L+1 ) =< E e jW (k1)L+1 =e 2 W =2 (2.168) E sin(W (k1)L+1 ) == E e jW (k1)L+1 = 0 (2.169) where<[] and=[] denote the real and imaginary parts, respectively. 154 Given X k1 = x k1 , S k1 is distributed as Y in (2.122) with R =jx k1 j p . By using Lemma 16, we have E cos( S;k1 ) X k1 =x k1 1 1 jx k1 j 2 : (2.170) It follows from (2.166), (2.168), (2.169) and (2.170) that E cos(W (k1)L+1 + S;k1 ) e 2 W =2 1E 1 jX k1 j 2 1 : (2.171) By combining (2.163), (2.164), (2.165) (2.171), we have (for P > 2) E h cos( ~ Y;k X;k ) i e 2 W =2 1E 1 jX k1 j 2 1 1E 1 jX k j 2 1 (a) e 2 W =2 1 2 P 2 (b) e 2 W =2 4 P (c) 1 2 W 2 4 P (2.172) where (a) holds becausejX k j 2 P=2 andjX k1 j 2 P=2, while (b) follows frome 2 W =2 1 and the non-negativity of 4=(P ) 2 and (c) follows from e x 1x. It follows from (2.147), (2.156) and (2.159) that 1 b I( b X ; Y b jX b A ) b 1 b 1 2 log 4 SNR : (2.173) 155 Hence, we have I( X ; YjX A ) = lim b!1 1 b I( b X ; Y b jX b A ) 1 2 log 4 SNR : (2.174) Setting = SNR (the signal-to-noise ratio in one sample) gives I( X ; YjX A ) 1 2 logSNRSNR 2 4: (2.175) Since = 1=L, we have lim SNR!1 p SNR = 1 and lim SNR!1 SNR 2 = 1 2 : (2.176) Therefore, by taking the limit of SNR tending to innity, we have lim SNR!1 I( X ; YjX A ) 1 4 logSNR log 1 4: (2.177) The last equation implies that the phase modulation contributes 1=4 to the pre-log of the information rate when oversampling is employed. 
We remark that, since the lower bound (2.174) is valid for any> 0, one can optimize over to get the tightest lower bound as follows I( X ; YjX A ) max >0 1 2 log + 4 SNR = 1 2 log 1 2 + 8 SNR 1 2 (2.178) 156 = 1 2 log SNR 2SNR 2 + 8 1 2 : (2.179) For L = l p SNR m , we have from (2.176) and (2.179) lim SNR!1 I( X ; YjX A ) 1 4 logSNR log 1 log (2 + 8) 1 2 : (2.180) We nd that the optimum (the that gives the tightest lower bound) does not yield a pre-log better than that of = SNR. The main result of this section is summarized in Theorem 18. Theorem 18 Consider the discrete-time model in (2.121). If L =d p SNRe and the input X n is i.i.d. with arg(X k ) independent ofjX k j for k = 1;:::;n such that arg(X k ) is uniformly distributed over [;) and (jX k j 2 P=2) is exponentially distributed with mean P=2, then lim SNR!1 I(X A ;Y) 1 2 logSNR2 1 2 log(8) (2.181) and lim SNR!1 I( X ;YjX A ) 1 4 logSNR log 1 4: (2.182) Moreover, (2.182) and (2.181) are achieved by the receiver structure shown in Fig. 2.12. 157 () 2 y k digital LTI lter L# downsample Amplitude decoder Phase decoder Sample delay Symbol delay ^ x A e j ^ X ^ x A;m vm y k1 ^ x m1 e j ^ k1 ^ X;m ^ xm Figure 2.12: Two-stage decoding in a multi-sample receiver. For a given k, m =dk=Le. The upper branch is part of the double ltering receiver for amplitude decoding. The lower branch is a phase-noise tracker which outputs an estimate ^ k1 of the phase noise where ^ k1 = arg(y k1 =^ x m1 ) (cf. (2.148)). The phase decoder uses the estimate of the phase noise ^ k1 and the estimated amplitude ^ x A;m to produce an estimate ^ X;m of the phase of x m . The proof of (2.181) is similar to the proof in Sec. 2.4.1 for the full model and hence the proof is omitted. 2 As a corollary of Theorem 18, the capacity pre-log for the approximate model (2.121) satises lim SNR!1 C(SNR) logSNR 3 4 : (2.183) The corollary follows from Theorem 18 by using C(SNR) I(X; Y) = I(X A ; Y) + I( X ; YjX A ). This shows that the capacity grows logarithmically at high SNR with a pre-log factor of at least 3/4. It is worth pointing out that the phase modulation pre-log of 1/4 requires only 2 samples per symbol for which the time resolution, 1=, grows as the square root of the SNR. It is interesting to contemplate whether another receiver, e.g., a non-coherent 2 The inequality (2.72) is valid for the approximate model except that G is set to 1, becauseF k = 1 in the approximate model. Hence, the term 2 =36 in (2.80) does not appear in (2.181). 158 receiver, can achieve the maximum amplitude modulation pre-log of 1/2 but requires only 1 sample per symbol. If so, one would need only 3 samples per symbol to achieve a pre-log of 3/4. 2.8 Summary and Open Problems The known results are summarized in Tables 2.1 and 2.2. The conjecture in Table 2.1 is based on our belief thatfH k g in (2.14) is stationary, ergodic and regular, i.e., its "present" cannot be predicted precisely from its "past" [54]. If this is true, it follows that the capacity grows double-logarithmically with SNR according to Lapidoth and Moser [55]. Here are some open problems (at high SNR) when oversampling is allowed 3 : Model without ltering: Can phase modulation with oversampling achieve 1 2 logSNR? Model with ltering: Can phase modulation with oversampling achieve 1 4 logSNR as in the model without ltering? { If yes: can it achieve 1 2 logSNR? It makes sense to try to answer this question for the model without ltering rst. { If no: can it achieve a positive pre-log? 
3 The number of samples L per symbol is allowed to tend to innity. 159 Table 2.1: Asymptotic Capacity Model Baud Rate Sampling (L = 1) C(SNR) Without Filtering (2.15) 1 2 logSNR [53] With Filtering (2.13) log logSNR (conjecture) Table 2.2: Asymptotic Achievable Rate Model Oversampling Amplitude Modulation Phase Modulation I(X A ; Y) I( X ; YjX A ) Without Filtering (2.121) 1 2 logSNR [42] 1 4 logSNR [43] With Filtering (2.18) 1 2 logSNR [42] { It is worth mentioning that the results in [42] and [43] were obtained by letting the number of samples L per symbols grow with SNR: L = l p SNR m : (2.184) However, this is not necessarily the optimum (minimum) oversampling factor to achieve the aforementioned pre-logs. For the model with ltering, we indeed showed in Sec. 2.4 that a pre-log of 1/2 can be achieved using L = l 3 p SNR m (2.185) but it is not clear whether this is optimal. For the model without ltering, Barletta and Kramer [10] showed that if L =dSNR e (2.186) 160 where 0 < 1, then the capacity pre-log is at most (1 +)=2 which implies that L/ p SNR is optimal for achieving a pre-log of 3/4. The impact of oversampling on the capacity of non-Wiener phase noise channels, or multi-input multi-output (MIMO) phase noise channels, are also interesting open problems. 2.9 Conclusion We studied a waveform channel impaired by Wiener phase noise and AWGN. A discrete- time channel model based on ltering and oversampling is considered. The model accounts for the ltering eects on the phase noise. It is shown that, at high SNR, the multi-sample receiver achieves rates that grow logarithmically with SNR with at least a 1/2 pre-log factor, if the number of samples per symbol grows with the square-root of the SNR. In addition, we computed via numerical simulations tight lower bounds on the information rates achieved by the multi-sample receiver. We observed that the required oversampling rate depends on the linewidth of the phase noise, the shape of the transmit-pulse and the signal-to-noise ratio. The results demonstrate that multi-sample receivers increase the information rate for both strong and weak phase noise at high SNR. We compared our results with the results obtained by using other discrete-time models. Finally, we showed for an approximate discrete-time model of the multi-sample receiver that phase modulation achieves a pre-log of at least 1/4 while amplitude modulation achieves a pre- log of 1/2, if the number of samples per symbol grows with the square root of the SNR. Thus, we demonstrated that the overall capacity pre-log is 3=4 which is greater than the capacity pre-log of the (approximate) discrete-time Wiener phase noise channel with only one sample per symbol. 161 2.10 Appendix 2.10.1 Details of Discrete-Time Model of Multi-sample Receiver Characterization offN k g Linear ltering (integration) of the Gaussian process N(t) yields another Gaus- sian process. Therefore, second-order statistics completely characterize the process fN k g. The mean is E[N k ] = Z k (k1) E[N(t)] dt = 0 (2.187) which follows from the linearity of expectation and (2.5). The covariance is E[N k 1 N k 2 ] (a) = Z k 1 (k 1 1) Z k 2 (k 2 1) E[N(t 1 )N (t 2 )] dt 2 dt 1 (b) = Z k 1 (k 1 1) Z k 2 (k 2 1) 2 N D (t 2 t 1 ) dt 2 dt 1 = 2 N K [k 1 k 2 ] Z k 1 (k 1 1) dt 1 = 2 N K [k 1 k 2 ] (2.188) where K [k] is the Kronecker delta K [k] = 8 > > < > > : 1; k = 1 0; otherwise: (2.189) 162 Step (a) follows from the linearity of expectation, and (b) follows from (2.5). 
The pseudo-covariance is E[N k 1 N k 2 ] = Z k 1 (k 1 1) Z k 2 (k 2 1) E[N(t 1 )N(t 2 )] dt 2 dt 1 = 0 (2.190) which holds becauseE [N(t 1 )N(t 2 )] = 0, i.e.,N(t) is a circularly-symmetric complex process. In summary, the processfN k g is a Gaussian process with the second-order statistics E[N ` ] = 0 (2.191) E[N ` N k ] = 2 N K [`k] (2.192) E[N ` N k ] = 0 (2.193) Remark 19 The processfN k g is an i.i.d. Gaussian process withN k N C (0; 2 N ). Proof. The variables N 1 ;:::;N n are uncorrelated which implies that they are statistically independent because they are Gaussian. The identical distribution property follows from the invariance of the second-order statistics of the Gaussian random variable N k with the index k. Wiener process: 163 Another characterization of the (continuous-time) Wiener process: the Wiener pro- cess (t) (0) is a Gaussian process with the second-order statistics E [(t) (0)] = 0 (2.194) E [((t 1 ) (0))((t 2 ) (0))] = 2 minft 1 ;t 2 g: Property 20 The Wiener process has uncorrelated increments, i.e., increments over disjoint time intervals are uncorrelated. Property 21 The Wiener process has independent increments, i.e., increments over disjoint time intervals are independent. Property 22 An increment (t 2 ) (t 1 ) is Gaussian with Var ((t 2 ) (t 1 )) = 2jt 2 t 1 j (2.195) The processfF k g Remark 23 The processfF k g is an i.i.d. process (if the pulse is rectangular). Proof. The independence follows from property 21 and because functions of inde- pendent random variables are also independent. The identical distribution property follows from the rectangular pulse shape 4 and that the integrations are over incre- ments of a Wiener process with equal length. 4 If the pulse is not rectangular, the processfF k g is not identically distributed because the phase noise e j(t) is weighted in a given sample interval according to the pulse shape. 164 2.10.2 Details of Double Filtering Receiver Moments of Gaussian random variables: Remark 24 If XN R (0; 2 ), then E[X 4 ] = 3 4 . Remark 25 If ZN C (0; 2 ), then E[jZj 4 ] = 2 4 . Remark 26 If ZN C (0; 2 ), then E ZjZj 2 = 0. Remark 24 follows from [51, eq (4.54)] while remarks 25 and 26 follow from [51, eq (8.117)]. E[Z 1 ] = 0 E[Z 1 ] (a) = E " L X `=1 <[e j X e j ` F ` N ` ] # (b) = L X `=1 < E[e j X e j ` F ` N ` ] (c) = L X `=1 < E[e j X e j ` F ` ]E[N ` ] (d) = 0 (2.196) Step (a) follows from the denition (2.40), (b) follows from the linearity of the expectation, (c) follows becausefN k g is independent of X ,f k g andfF k g, and (d) follows from (2.191). 165 E[Z 2 1 ] =E[G] 2 N L=2 E[Z 2 1 ] (a) = E L X `=1 L X k=1 <[e j X e j ` F ` N ` ]<[e j X e j k F k N k ] (b) = E L X `=1 L X k=1 1 2 <[e j2 X e j ` j k F ` F k N ` N k + e j ` j k F ` F k N ` N k ] (c) = L X `=1 L X k=1 1 2 < h E[e j2 X e j ` j k F ` F k N ` N k ]+ E[e j ` j k F ` F k N ` N k ] i (d) = L X `=1 L X k=1 1 2 < h E[e j2 X e j ` j k F ` F k ]E[N ` N k ]+ E[e j ` j k F ` F k ]E[N ` N k ] i (e) = 1 2 2 N L X `=1 E[jF ` j 2 ] (f) = 1 2 2 N LE[G] (2.197) Step (a) follows from the denition (2.40), (b) follows by using the relation<[A]<[B] = 1 2 <[AB+AB ] = 1 2 <[A B +A B], (c) follows from the linearity of the expectation, (d) follows becausefN k g is independent of X ,f k g andfF k g, (e) follows from (2.192) and (2.193), and (f) follows from (2.39). 
E[Z 0 ] = 2 N L E[Z 0 ] (a) = E " L X `=1 jN ` j 2 # (b) = L X `=1 E jN ` j 2 166 (c) = 2 N L (2.198) Step (a) follows from the denition (2.41), (b) follows from the linearity of the expectation, (c) follows from (2.192). E (Z 0 E[Z 0 ]) 2 = 4 N 2 L E (Z 0 E[Z 0 ]) 2 = Var(Z 0 ) (a) = LVar(jN 1 j 2 ) =L(E[jN 1 j 4 ]E 2 [jN 1 j 2 ]) (b) = L 2 4 N 2 4 N 2 = 4 N 2 L (2.199) where (a) holds becausefN k g is i.i.d. and (b) follows from Remark 25. E [Z 1 Z 0 ] = 0 E [Z 1 Z 0 ] (a) = E "" L X `=1 <[e j X e j ` F ` N ` ] #" L X k=1 jN k j 2 ## =E " L X `=1 L X k=1 <[e j X e j ` F ` N ` jN k j 2 ] # (b) = L X `=1 L X k=1 < E[e j X e j ` F ` N ` jN k j 2 ] (c) = L X `=1 L X k=1 < E[e j X e j ` F ` ]E[N ` jN k j 2 ] (d) = L X `=1 < E[e j X e j ` F ` ]E[N ` jN ` j 2 ] 167 (e) = 0 (2.200) Step (a) follows from the denitions (2.40) and (2.41), (b) follows from the linear- ity of the expectation, (c) follows becausefN k g is independent of X ,f k g and fF k g, (d) follows from the independence of N ` and N k for k6= ` and (2.191), i.e., E[N ` jN k j 2 ] =E[N ` ]E[jN k j 2 ] = 0 for k6=`, (e) follows by using Remark 26. E [Z 1 (Z 0 E[Z 0 ])] = 0 Follows from the linearity of expectation and by using (2.200) and (2.196). E [GZ 1 ] = 0 E [GZ 1 ] (a) = E "" L X `=1 <[e j X e j ` F ` N ` ] #" 1 L L X k=1 jF k j 2 ## =E " 1 L L X `=1 L X k=1 <[e j X e j ` F ` jF k j 2 N ` ] # (b) = 1 L L X `=1 L X k=1 < E[e j X e j ` F ` jF k j 2 N ` ] (c) = 1 L L X `=1 L X k=1 < E[e j X e j ` F ` jF k j 2 ]E[N ` ] (d) = 0 (2.201) Step (a) follows from the denitions (2.39) and (2.40), (b) follows from the linearity of the expectation, (c) follows becausefN k g is independent of X ,f k g andfF k g, (d) follows from (2.191). E [(G 1)Z 1 ] = 0 168 Follows from the linearity of expectation and by using (2.201) and (2.196). 2.10.3 Statistical Quantities of Filtered Wiener Phase Noise Outline for computing moments 5 of F 1 : LetM be a positive integer, c = (c 1 ;:::;c M ) T be a constant vector, t = (t 1 ;:::;t M ) T be a non-negative real vector and (t) = ((t 1 ) (0);:::; (t M ) (0)) T where (t) is dened in (2.6). We have E 1 M Z Z 0 exp(jc T (t))dt (a) = 1 M Z Z 0 E exp(jc T (t)) dt (b) = 1 M Z Z 0 exp 1 2 c T (t)c dt (c) = Z Z 1 0 exp 1 2 c T (u)c du (d) = Z Z 1 0 exp 2 c T (u)c du (2.202) where dt =dt M :::dt 1 and (t) is the covariance matrix of (t) whose entries are given by ij (t) = 2 minft i ;t j g; for i;j = 1;:::;M: (2.203) Step (a) follows from the linearity of expectation, (b) follows by using the charac- teristic function of a Gaussian random vector, (c) follows from the transformation 5 We remark that a formula for computing moments of ltered Wiener phase noise (specically,E[jF1j 2k ] where k is a positive integer) is found in [23]. However, we derived the moments here independently. 169 of variables t = u and (d) holds because (u) = (u) which follows from (2.203). We dene ae : (2.204) Details for computingE[jF 1 j 2 ] Using the denition in (2.27), we have E[jF 1 j 2 ] =E 1 2 Z 0 Z 0 e j((t 2 )(t 1 )) dt 2 dt 1 = 1 2 Z 0 Z 0 e jt 2 t 1 j dt 2 dt 1 = Z 1 0 Z 1 0 a ju 2 u 1 j du 2 du 1 : (2.205) Here, using M = 2, c = (1; 1) T and (t) = 2 2 6 6 4 t 1 minft 1 ;t 2 g minft 2 ;t 1 g t 2 3 7 7 5 (2.206) yields c T (t)c = 2(t 1 +t 2 2 minft 1 ;t 2 g) = 2jt 2 t 1 j: (2.207) 170 Finally, we have E[jF 1 j 2 ] = 2 a 1 loga (loga) 2 (2.208) by using (2.205) and the following lemma. Lemma 27 Z 1 0 Z 1 0 a jtsj ds dt = 2 a 1 loga (loga) 2 ; for a> 0: Proof. 
Z 1 0 Z 1 0 a jtsj ds dt = Z 1 0 Z t 0 a ts ds + Z 1 t a st ds dt = Z 1 0 a t 1a t loga +a t aa t loga dt = 1 loga Z 1 0 a t +a 1t 2 dt = 1 loga a 1 loga + a 1 loga 2 = 2 a 1 loga (loga) 2 Details for computingE[jF 1 j 4 ] LetT =ft : 0t i ;i = 1; 2; 3; 4g and (t) = (t 4 ) (t 3 ) + (t 2 ) (t 1 ): (2.209) 171 The variable (t) is Gaussian. Using the denition (2.27), we have E[jF 1 j 4 ] =E 1 4 Z t2T e j (t) dt = 1 4 Z t2T e Var( (t))=2 dt (2.210) where the last equality follows by using the characteristic function of a Gaussian random variable. The setT can be partitioned into 24 subsets based on the order of t 1 ,t 2 ,t 3 andt 4 . However, because of the symmetry, we have E[jF 1 j 4 ] = 1 4 16 Z t2T 1 +8 Z t2T 2 e Var( (t))=2 dt (2.211) where the setsT 1 andT 2 are T 1 =ft : 0t 1 t 2 t 3 t 4 g T 2 =ft : 0t 1 t 3 t 2 t 4 g: (2.212) In other words, we need to compute the integral over two subsets only rather than 24 subsets. For t2T 1 , we have Var ( (t)) (a) = Var ((t 4 ) (t 3 )) +Var ((t 2 ) (t 1 )) (b) = 2(t 4 t 3 +t 2 t 1 ) (2.213) 172 where (a) follows because (t 4 )(t 3 ) and (t 2 )(t 1 ) are increments of a Wiener process over disjoint time intervals and hence they are independent (see Property 21) while (b) follows from Property 22. We compute the rst integral K 1 = 1 4 Z t2T 1 e Var( (t))=2 dt = Z 1 0 Z 1 u 1 Z 1 u 2 Z 1 u 3 a u 4 u 3 +u 2 u 1 du 4 du 3 du 2 du 1 = 6 6a + 4 loga + 2a loga + (loga) 2 2(loga) 4 : (2.214) For t2T 2 , we can rewrite (t) as (t) = ((t 4 ) (t 2 )) + 2((t 2 ) (t 3 )) + ((t 3 ) (t 1 )) and hence, we have Var ( (t)) (a) = Var ((t 4 ) (t 2 )) +Var (2((t 2 ) (t 3 ))) +Var ((t 3 ) (t 1 )) (b) = 2(t 4 3t 3 + 3t 2 t 1 ): (2.215) Step (a) follows because (t 4 )(t 2 ), (t 2 )(t 3 ) and (t 3 )(t 1 ) are increments of a Wiener process over disjoint time intervals and hence they are independent 173 (see Property 21) while (b) follows from Property 22. Next, we compute the second integral K 2 = 1 4 Z t2T 2 e Var( (t))=2 dt = Z 1 0 Z 1 u 1 Z 1 u 3 Z 1 u 2 a u 4 3u 3 +3u 2 u 1 du 4 du 2 du 3 du 1 = 81 + 80a +a 4 36 loga 48a loga 144(loga) 4 : (2.216) By combining (2.211), (2.214) and (2.216), we obtain E[jF 1 j 4 ] = 783 784a +a 4 + 540 loga + 240a loga + 144(loga) 2 18(loga) 4 : (2.217) Of course, we could computeE[jF 1 j 4 ] by using M = 4, c = (1; 1;1; 1) T and (t) = 2 2 6 6 6 6 6 6 6 6 6 6 4 t 1 minft 1 ;t 2 g minft 1 ;t 3 g minft 1 ;t 4 g minft 2 ;t 1 g t 2 minft 2 ;t 3 g minft 2 ;t 4 g minft 3 ;t 1 g minft 3 ;t 2 g t 3 minft 3 ;t 4 g minft 4 ;t 1 g minft 4 ;t 2 g minft 4 ;t 3 g t 4 3 7 7 7 7 7 7 7 7 7 7 5 (2.218) to compute c T (t)c = 2 h t 1 +t 2 +t 3 +t 4 2 minft 1 ;t 2 g + 2 minft 1 ;t 3 g 2 minft 1 ;t 4 g 2 minft 2 ;t 3 g + 2 minft 2 ;t 4 g 2 minft 3 ;t 4 g i (2.219) then substitute in (2.202). 174 Computing Var(jF 1 j 2 ) Var(jF 1 j 2 )E (jF 1 j 2 E[jF 1 j 2 ]) 2 =E[jF 1 j 4 ] (E[jF 1 j 2 ]) 2 = 711 640a 72a 2 +a 4 + 396 loga + 384a loga + 72(loga) 2 18(loga) 4 : (2.220) Limits We have lim a!1 Var(jF 1 j 2 ) = 0; lim a!1 Var(jF 1 j 2 ) loga = 0; lim a!1 Var(jF 1 j 2 ) (loga) 2 = 4 45 ; lim a!1 Var(jF 1 j 2 ) (loga) 3 =1 (2.221) and lim a!1 E[jF 1 j 2 ] 1 2 loga = 0; lim a!1 E[jF 1 j 2 ] 1 2 (loga) 2 = 1 9 ; lim a!1 E[jF 1 j 2 ] 1 2 (loga) 3 =1: (2.222) ComputingE[F 1 ] E[F 1 ] = 1 Z 0 E[e j(()(0)) ] d = 1 Z 0 e d = Z 1 0 a u du = a 1 loga : (2.223) Computing Var(F 1 ) 175 We have Var(F 1 )E jF 1 E[F 1 ]j 2 =E[jF 1 j 2 ]jE[F 1 ]j 2 = 3 4a +a 2 + 2 loga (loga) 2 (2.224) which follows by using (2.208) and (2.223). 
More limits It follows from (2.223) and (2.224) that lim a!0 E[F 1 ] = 1 and lim a!0 Var(F 1 ) = 0 (2.225) which further imply that lim a!1 E[jF 1 1j 2 ] = lim a!1 Var(F 1 ) +jE[F 1 ] 1j 2 = lim a!1 Var(F 1 ) + lim a!1 E[F 1 ] 1 2 = 0: (2.226) 2.10.4 Statistical Quantities of G Moments of G We have E[G] =E " 1 L L X `=1 jF ` j 2 # = 1 L L X `=1 E[jF ` j 2 ] =E[jF 1 j 2 ] (2.227) 176 where we used the denition of G in (2.39) and that F 1 ;:::;F L are identically distributed (see Remark 23). Moreover, we have Var(G) = Var 1 L L X `=1 jF ` j 2 ! = 1 L Var(jF 1 j 2 ) (2.228) where we used the denition of G in (2.39) and that F 1 ;:::;F L are independent and identically distributed (see Remark 23). G converges to 1 in mean square as L!1 We have E[(G 1) 2 ] =E[(GE[G]) 2 ] + (E[G] 1) 2 = 1 L Var(jF 1 j 2 ) + E[jF 1 j 2 ] 1 2 (2.229) which follows from (2.228) and (2.227). By using (2.220) and (2.208), we get lim L!1 E[(G 1) 2 ] = lim L!1 1 L lim a!1 Var(jF 1 j 2 ) + lim a!1 E[jF 1 j 2 ] 1 2 = 0: (2.230) Limits Recall from (2.16) withT s = 1 and from (2.204) thatL = 1= and =(loga)=. Therefore, (2.228) and (2.221) imply lim !0 Var(G) = 0; lim !0 Var(G) 2 = 0; lim !0 Var(G) 3 = 4 45 () 2 ; lim !0 Var(G) 4 =1 (2.231) 177 while (2.227) and (2.222) imply lim !0 (E[G] 1) 2 = 0; lim !0 (E[G] 1) 2 2 = 1 9 () 2 ; lim !0 (E[G] 1) 2 3 =1: (2.232) 2.10.5 Conditional PDF of Output Phase Given Input Phase in AWGN Channels We derive the conditional distribution of the output phase given the input phase in an AWGN channel when the input amplitude is xed. The conditional distribution was reported in [4] without the derivation. Consider the channel Y =Re j X +Z (2.233) where X 2 [;) is the input and Z is a circularly-symmetric complex Gaussian with mean 0 and variance 1. The conditional distribution p Yj X is given by p Yj X (yj x ) = 1 exp yRe jx 2 (2.234) Let Y R =<[Y ], Y I ==[Y ], Y A =jYj and Y =\Y = argY . More precisely, the conditional distribution of Y in rectangular (Cartesian) coordinates given X is p Y R ;Y I j X (y R ;y I j x ) = 1 exp (y R +jy I )Re jx 2 (2.235) 178 The conditional distribution of Y in polar coordinates given X is p Y A ; Y j X (y A ; y j x ) =p Y R ;Y I j X (y A cos y ;y A sin y j x ) y A = 1 exp y A e jy Re jx 2 y A = 1 exp y 2 A +R 2 2y A R cos( y x ) y A (2.236) By marginalization, we have p Y j X ( y j x ) = Z 1 0 p Y A ; Y j X (y A ; y j x ) dy A = Z 1 0 1 e (y 2 A +R 2 2y A R cos(yx)) y A dy A = 1 e R 2 Z 1 0 e (y 2 A 2y A R cos(yx)) y A dy A (a) = 1 e R 2 1 2 + p 2 R cos( y x )e R 2 cos 2 (yx) erfc (R cos( y x )) (b) = 1 2 e R 2 + 1 p 4 R cos( y x )e R 2 sin 2 (yx) erfc (R cos( y x )) (2.237) where (a) follows from Lemma 28 and (b) follows from the trigonometric identity cos 2 + sin 2 = 1. Lemma 28 Z 1 0 x e (x 2 2bx) dx = 1 2 + p 2 b e b 2 erfc(b): (2.238) 179 Proof. Z 1 0 x e (x 2 2bx) dx (a) = Z 1 b (u +b) e b 2 u 2 du =e b 2 Z 1 b u e u 2 du +b Z 1 b e u 2 du (b) = 1 2 +b e b 2 p 2 erfc(b) (2.239) where (a) follows from the transformation u =xb while (b) holds because Z 1 b u e u 2 du = 1 2 e u 2 u=1 u=b = 1 2 e b 2 (2.240) and Z 1 b e u 2 du = p 2 erfc(b) (2.241) which follows directly from the denition of erfc in (2.125). It is worth mentioning that p Y A j X (y A j x ) = Z p Y A ; Y j X (y A ; y j x )d y = 2y A e (y 2 A +R 2 ) I 0 (2y A R) (2.242) where I 0 () is the modied Bessel function of the rst kind of order zero. 
180 2.10.6 Supporting Lemmas for Lower Bound on Rate of Phase Modulation in AWGN Channels Lemma 29 Suppose X and Y are real-valued circular 6 random variables [57]. Suppose p Y j X (j) is a conditional probability density such that p Y j X ( y j x ) = p ( y x ) and p () is periodic with a period 2. Then, for any bounded function f() that is periodic with a period 2, we have E [f( Y X )] = Z p () f() d: (2.243) This implies that E [f( Y X )] does not depend on the distribution of X . Proof. First, we remark that the pdf of a circular random variable is dened on the whole real line and is normalized over any interval of length 2 [57, Ch.3], e.g., Z x+2 x p Y j X ( y j x ) d y = 1; 1<x<1: (2.244) We proceed next to the proof. E [f( Y X )] = Z Z p X ; Y ( x ; y ) f( y x ) d y d x (a) = Z Z p X ( x ) p Y j X ( y j x ) f( y x ) d y d x (b) = Z p X ( x ) Z p ( y x ) f( y x ) d y d x 6 Circular random variables are random variables dened on a circle, i.e., random angles and should not be confused with circularly-symmetric random variables. 181 (c) = Z p X ( x ) Z x x p () f() dd x (d) = Z p X ( x ) Z p () f() dd x = Z p X ( x )d x Z p Y () f() d = Z p ()f() d (2.245) where (a) follows from the chain rule of probability, (b) holds becausep Y j X ( y j x ) = p ( y x ), (c) follows from the transformation of variables = y x and (d) holds because p ()f() is periodic with period 2. Lemma 30 For z 0, erfc(z) 2e z 2 . Proof. Since for any real number z erf(z) =erf(z) (2.246) and erfc(z) = 1 erf(z) (2.247) therefore, we have erfc(z) = 1 erf(z) = 1 + erf(z) = 2 erfc(z): (2.248) 182 To complete the proof, we use [20, eq(5)] erfc(z)e z 2 for z 0: (2.249) Lemma 31 sintt for t 0. Proof. Since 1 cosz 0, then for t 0 we have Z t 0 (1 cosz) dz 0 (2.250) which implies that (for t 0) t sint 0: (2.251) Lemma 32 sint (2=)t for t2 [0;=2]. Proof. The proof follows from the convexity of the sin function over [0;=2] and Jensen's inequality. Lemma 33 Let a 0. Then for t 0, we have cos 2 t e a sin 2 t (1t 2 )e at 2 : (2.252) 183 Proof. The trigonometric identity cos 2 t + sin 2 t = 1 and sint t (Lemma 31) imply that cos 2 t 1t 2 : (2.253) Since sintt for t 0 and since the exponential function is monotone, we have e a sin 2 t e at 2 : (2.254) By combining the two inequalities, we obtain cos 2 t e a sin 2 t cos 2 t e at 2 (1t 2 )e at 2 : (2.255) Lemma 34 Let a> 0 and n be a positive integer. Then we have max x x n e ax 2 = n 2ae n=2 : (2.256) 2.10.7 Some Approximations of Circular Distributions For large R and small : e R 2 0, cos 1, sin and erfc (R cos) 1. Then, we have p () = e R 2 2 + R cos p 4 e R 2 sin 2 erfc (R cos) R p 4 e R 2 2 : 184 For large and small : I 0 ()e = p 2 and cos() 1 2 =2. Then, we have e cos() 2I 0 () e (1 2 =2) p 2 2e = r 2 e 2 =2 : For small 2 W : We have p W (w) = 1 X i=1 1 p 2 W e (w2i) 2 2 2 W 1 p 2 W e w 2 2 2 W : 185 References [1] M. Abramowitz and I. A. Stegun. Handbook of Mathematical Functions with For- mulas, Graphs, and Mathematical Tables. New York, 1972. [2] G. P. Agrawal. Nonlinear Fiber Optics. Academic Press, 3rd edition, 2001. [3] E. Agrell, A Alvarado, G. Durisi, and M. Karlsson. Capacity of a nonlinear optical channel with nite memory. J. Lightwave Tech., vol. 32(no. 16):pp. 2862{2876, Aug. 2014. [4] J.P. Aldis and A.G. Burr. The channel capacity of discrete time phase modulation in AWGN. IEEE Trans. Inf. Theory, vol. 39(no. 1):pp. 184{185, Jan. 1993. [5] D.M. Arnold, H.-A. Loeliger, P.O. Vontobel, A. Kavcic, and Wei Zeng. 
2.10.6 Supporting Lemmas for Lower Bound on Rate of Phase Modulation in AWGN Channels

Lemma 29 Suppose $\Theta_X$ and $\Theta_Y$ are real-valued circular random variables [57]. (Circular random variables are random variables defined on a circle, i.e., random angles, and should not be confused with circularly-symmetric random variables.) Suppose $p_{\Theta_Y|\Theta_X}(\cdot|\cdot)$ is a conditional probability density such that $p_{\Theta_Y|\Theta_X}(\theta_y|\theta_x) = p_\Theta(\theta_y - \theta_x)$ and $p_\Theta(\cdot)$ is periodic with a period $2\pi$. Then, for any bounded function $f(\cdot)$ that is periodic with a period $2\pi$, we have

$$E\big[f(\Theta_Y - \Theta_X)\big] = \int_{-\pi}^{\pi} p_\Theta(\theta)\, f(\theta) \, d\theta. \qquad (2.243)$$

This implies that $E[f(\Theta_Y - \Theta_X)]$ does not depend on the distribution of $\Theta_X$.

Proof. First, we remark that the pdf of a circular random variable is defined on the whole real line and is normalized over any interval of length $2\pi$ [57, Ch. 3], e.g.,

$$\int_x^{x+2\pi} p_{\Theta_Y|\Theta_X}(\theta_y|\theta_x) \, d\theta_y = 1, \quad -\infty < x < \infty. \qquad (2.244)$$

We proceed next to the proof.

$$
\begin{aligned}
E\big[f(\Theta_Y - \Theta_X)\big] &= \iint p_{\Theta_X, \Theta_Y}(\theta_x, \theta_y)\, f(\theta_y - \theta_x) \, d\theta_y \, d\theta_x \\
&\overset{(a)}{=} \iint p_{\Theta_X}(\theta_x)\, p_{\Theta_Y|\Theta_X}(\theta_y|\theta_x)\, f(\theta_y - \theta_x) \, d\theta_y \, d\theta_x \\
&\overset{(b)}{=} \int p_{\Theta_X}(\theta_x) \int p_\Theta(\theta_y - \theta_x)\, f(\theta_y - \theta_x) \, d\theta_y \, d\theta_x \\
&\overset{(c)}{=} \int p_{\Theta_X}(\theta_x) \int_{-\pi - \theta_x}^{\pi - \theta_x} p_\Theta(\theta)\, f(\theta) \, d\theta \, d\theta_x \\
&\overset{(d)}{=} \int p_{\Theta_X}(\theta_x) \int_{-\pi}^{\pi} p_\Theta(\theta)\, f(\theta) \, d\theta \, d\theta_x \\
&= \left( \int p_{\Theta_X}(\theta_x) \, d\theta_x \right) \int_{-\pi}^{\pi} p_\Theta(\theta)\, f(\theta) \, d\theta = \int_{-\pi}^{\pi} p_\Theta(\theta)\, f(\theta) \, d\theta
\end{aligned}
\qquad (2.245)
$$

where (a) follows from the chain rule of probability, (b) holds because $p_{\Theta_Y|\Theta_X}(\theta_y|\theta_x) = p_\Theta(\theta_y - \theta_x)$, (c) follows from the transformation of variables $\theta = \theta_y - \theta_x$ and (d) holds because $p_\Theta(\theta) f(\theta)$ is periodic with period $2\pi$.

Lemma 30 For $z \geq 0$, $\operatorname{erfc}(-z) \geq 2 - e^{-z^2}$.

Proof. Since for any real number $z$

$$\operatorname{erf}(-z) = -\operatorname{erf}(z) \qquad (2.246)$$

and

$$\operatorname{erfc}(z) = 1 - \operatorname{erf}(z) \qquad (2.247)$$

therefore, we have

$$\operatorname{erfc}(-z) = 1 - \operatorname{erf}(-z) = 1 + \operatorname{erf}(z) = 2 - \operatorname{erfc}(z). \qquad (2.248)$$

To complete the proof, we use [20, eq. (5)]

$$\operatorname{erfc}(z) \leq e^{-z^2} \quad \text{for } z \geq 0. \qquad (2.249)$$

Lemma 31 $\sin t \leq t$ for $t \geq 0$.

Proof. Since $1 - \cos z \geq 0$, then for $t \geq 0$ we have

$$\int_0^t (1 - \cos z) \, dz \geq 0 \qquad (2.250)$$

which implies that (for $t \geq 0$)

$$t - \sin t \geq 0. \qquad (2.251)$$

Lemma 32 $\sin t \geq (2/\pi)\, t$ for $t \in [0, \pi/2]$.

Proof. The proof follows from the concavity of the sine function over $[0, \pi/2]$ and Jensen's inequality.

Lemma 33 Let $a \geq 0$. Then for $t \geq 0$, we have

$$\cos^2 t \; e^{-a \sin^2 t} \geq (1 - t^2)\, e^{-a t^2}. \qquad (2.252)$$

Proof. The trigonometric identity $\cos^2 t + \sin^2 t = 1$ and $\sin t \leq t$ (Lemma 31) imply that

$$\cos^2 t \geq 1 - t^2. \qquad (2.253)$$

Since $\sin t \leq t$ for $t \geq 0$ and since the exponential function is monotone, we have

$$e^{-a \sin^2 t} \geq e^{-a t^2}. \qquad (2.254)$$

By combining the two inequalities, we obtain

$$\cos^2 t \; e^{-a \sin^2 t} \geq \cos^2 t \; e^{-a t^2} \geq (1 - t^2)\, e^{-a t^2}. \qquad (2.255)$$

Lemma 34 Let $a > 0$ and $n$ be a positive integer. Then we have

$$\max_x \; x^n e^{-a x^2} = \left( \frac{n}{2ae} \right)^{n/2}. \qquad (2.256)$$
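The invariance claimed by Lemma 29 is easy to test by simulation for the AWGN phase channel of Section 2.10.5, whose conditional phase density (2.237) depends only on the phase difference. The following short Python sketch (ours, not the thesis') compares $E[f(\Theta_Y - \Theta_X)]$ under two different input-phase laws; $f$, $R$, the input distributions and the sample size are illustrative choices.

    # Numerical check (ours, not the thesis') of Lemma 29: for the channel of
    # Section 2.10.5, E[f(Theta_Y - Theta_X)] should not depend on the input law.
    import numpy as np

    rng = np.random.default_rng(1)
    R, n = 1.5, 10**6
    f = lambda t: np.cos(t) ** 2  # any bounded 2*pi-periodic function

    def avg_f(theta_x):
        z = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
        theta_y = np.angle(R * np.exp(1j * theta_x) + z)
        return np.mean(f(theta_y - theta_x))

    uniform = rng.uniform(-np.pi, np.pi, n)              # continuous input phase
    qpsk = rng.choice([0, np.pi/2, np.pi, -np.pi/2], n)  # discrete input phase
    print(avg_f(uniform), avg_f(qpsk))  # agree up to Monte Carlo error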
2.10.7 Some Approximations of Circular Distributions

For large $R$ and small $\theta$: $e^{-R^2} \approx 0$, $\cos\theta \approx 1$, $\sin\theta \approx \theta$ and $\operatorname{erfc}(-R\cos\theta) \approx 2$. Then, we have

$$p_\Theta(\theta) = \frac{e^{-R^2}}{2\pi} + \frac{R \cos\theta}{\sqrt{4\pi}}\, e^{-R^2 \sin^2\theta} \operatorname{erfc}(-R\cos\theta) \approx \frac{R}{\sqrt{\pi}}\, e^{-R^2 \theta^2}.$$

For large $\kappa$ and small $\theta$: $I_0(\kappa) \approx e^{\kappa}/\sqrt{2\pi\kappa}$ and $\cos\theta \approx 1 - \theta^2/2$. Then, we have

$$\frac{e^{\kappa \cos\theta}}{2\pi I_0(\kappa)} \approx \frac{e^{\kappa(1 - \theta^2/2)}}{2\pi\, e^{\kappa}/\sqrt{2\pi\kappa}} = \sqrt{\frac{\kappa}{2\pi}}\; e^{-\kappa \theta^2 / 2}.$$

For small $\sigma_W^2$: We have

$$p_W(w) = \sum_{i=-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma_W}\, e^{-\frac{(w - 2\pi i)^2}{2\sigma_W^2}} \approx \frac{1}{\sqrt{2\pi}\,\sigma_W}\, e^{-\frac{w^2}{2\sigma_W^2}}.$$
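The second approximation says that, for large concentration, the von Mises density $e^{\kappa\cos\theta}/(2\pi I_0(\kappa))$ is close to a Gaussian of variance $1/\kappa$. A quick numerical look (ours, not the thesis') confirms this; the value of $\kappa$ and the grid below are illustrative.

    # Numerical check (ours, not the thesis') of the large-kappa approximation
    # in Section 2.10.7: von Mises density vs. its Gaussian approximation.
    import numpy as np
    from scipy.special import i0  # modified Bessel function of order zero

    kappa = 50.0
    theta = np.linspace(-0.5, 0.5, 1001)
    von_mises = np.exp(kappa * np.cos(theta)) / (2 * np.pi * i0(kappa))
    gauss = np.sqrt(kappa / (2 * np.pi)) * np.exp(-kappa * theta**2 / 2)
    print(np.max(np.abs(von_mises - gauss)))  # small for large kappa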
References

[1] M. Abramowitz and I. A. Stegun. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. New York, 1972.
[2] G. P. Agrawal. Nonlinear Fiber Optics. Academic Press, 3rd edition, 2001.
[3] E. Agrell, A. Alvarado, G. Durisi, and M. Karlsson. Capacity of a nonlinear optical channel with finite memory. J. Lightwave Tech., vol. 32, no. 16, pp. 2862–2876, Aug. 2014.
[4] J. P. Aldis and A. G. Burr. The channel capacity of discrete time phase modulation in AWGN. IEEE Trans. Inf. Theory, vol. 39, no. 1, pp. 184–185, Jan. 1993.
[5] D. M. Arnold, H.-A. Loeliger, P. O. Vontobel, A. Kavcic, and W. Zeng. Simulation-based computation of information rates for channels with memory. IEEE Trans. Inf. Theory, vol. 52, no. 8, pp. 3498–3508, Aug. 2006.
[6] A. Barbieri and G. Colavolpe. On the information rate and repeat-accumulate code design for phase noise channels. IEEE Trans. Commun., vol. 59, no. 12, pp. 3223–3228, Dec. 2011.
[7] A. Barbieri, G. Colavolpe, and G. Caire. Joint iterative detection and decoding in the presence of phase noise and frequency offset. IEEE Trans. Commun., vol. 55, no. 1, pp. 171–179, Jan. 2007.
[8] L. Barletta and G. Kramer. On continuous-time white phase noise channels. IEEE Int. Sym. Inf. Theory (ISIT), 2014.
[9] L. Barletta and G. Kramer. Signal-to-noise ratio penalties for continuous-time phase noise channels. Int. Conf. on Cog. Rad. Oriented Wireless Net. (CROWNCOM), 2014.
[10] L. Barletta and G. Kramer. Upper bound on the capacity of discrete-time Wiener phase noise channels. IEEE Inf. Theory Workshop (ITW), 2015 (pre-print available on arxiv.org).
[11] L. Barletta, M. Magarini, and A. Spalvieri. A new lower bound below the information rate of Wiener phase noise channel based on Kalman carrier recovery. Optics Express, vol. 20, no. 23, pp. 25471–25477, Nov. 2012.
[12] L. Barletta, M. Magarini, and A. Spalvieri. The information rate transferred through the discrete-time Wiener's phase noise channel. J. Lightwave Tech., vol. 30, no. 10, pp. 1480–1486, May 2012.
[13] L. Barletta, M. Magarini, and A. Spalvieri. Estimate of information rates of discrete-time first-order Markov phase noise channels. IEEE Photonics Tech. Letters, vol. 23, no. 21, pp. 1582–1584, Nov. 2011.
[14] N. M. Blachman. A comparison of the informational capacities of amplitude- and phase-modulation communication systems. Proc. IRE, vol. 41, no. 6, pp. 748–759, 1953.
[15] G. Bosco, P. Poggiolini, A. Carena, V. Curri, and F. Forghieri. Analytical results on channel capacity in uncompensated optical links with coherent detection. Optics Express, vol. 19, no. 26, pp. B440–B451, Dec. 2011.
[16] G. Bosco, P. Poggiolini, A. Carena, V. Curri, and F. Forghieri. Analytical results on channel capacity in uncompensated optical links with coherent detection: erratum. Optics Express, vol. 20, no. 17, pp. 19610–19611, Aug. 2012.
[17] Y. A. Brychkov. Handbook of Special Functions: Derivatives, Integrals, Series and Other Formulas. CRC Press, 2008.
[18] E. Casini, R. De Gaudenzi, and A. Ginesi. DVB-S2 modem algorithms design and performance over typical satellite channels. Int. J. on Sat. Comm. and Net., vol. 22, no. 3, pp. 281–318, 2004.
[19] X. Chen and W. Shieh. Closed-form expressions for nonlinear transmission performance of densely spaced coherent optical OFDM systems. Optics Express, vol. 18, no. 18, pp. 19039–19054, Aug. 2010.
[20] M. Chiani and D. Dardari. Improved exponential bounds and approximation for the Q-function with application to average error probability computation. IEEE Global Telecom. Conf. (GLOBECOM), vol. 2, pp. 1399–1402, Nov. 2002.
[21] G. E. Corazza and G. Ferrari. New bounds for the Marcum Q-function. IEEE Trans. Inf. Theory, vol. 48, no. 11, pp. 3003–3008, Nov. 2002.
[22] T. Cover and J. Thomas. Elements of Information Theory. Wiley, 2nd edition, 2006.
[23] Y. E. Dallal and S. Shamai. Power moment derivation for noisy phase lightwave systems. IEEE Trans. Inf. Theory, vol. 40, no. 6, pp. 2099–2103, Nov. 1994.
[24] R. Dar, M. Shtaif, and M. Feder. Improved bounds on the nonlinear fiber-channel capacity. Euro. Conf. on Opt. Comm. (ECOC), pp. 1–3, Sep. 2013.
[25] R. Dar, M. Shtaif, and M. Feder. Information rates in the optical nonlinear phase-noise channel. Allerton Conf., Oct. 2013.
[26] R. Dar, M. Shtaif, and M. Feder. New bounds on the capacity of fiber-optics communications. ArXiv e-prints, 2013.
[27] J. Dauwels and H.-A. Loeliger. Computation of information rates by particle methods. IEEE Trans. Inf. Theory, vol. 54, no. 1, pp. 406–409, Jan. 2008.
[28] A. Demir, A. Mehrotra, and J. Roychowdhury. Phase noise in oscillators: a unifying theory and numerical methods for characterization. IEEE Trans. on Circ. and Sys. I: Fund. Theory and App., vol. 47, no. 5, pp. 655–674, 2000.
[29] M. Dörpinghaus, G. Koliander, G. Durisi, E. Riegler, and H. Meyr. Oversampling increases the pre-log of noncoherent Rayleigh fading channels. IEEE Trans. Inf. Theory, vol. 60, no. 9, pp. 5673–5681, Sep. 2014.
[30] G. Durisi, A. Tarable, C. Camarda, R. Devassy, and G. Montorsi. Capacity bounds for MIMO microwave backhaul links affected by phase noise. IEEE Trans. Commun., vol. 62, no. 3, pp. 920–929, Mar. 2014.
[31] G. Durisi, A. Tarable, C. Camarda, and G. Montorsi. On the capacity of MIMO Wiener phase-noise channels. Inf. Theory and App. Workshop (ITA), pp. 1–7, Feb. 2013.
[32] G. Durisi, A. Tarable, and T. Koch. On the multiplexing gain of MIMO microwave backhaul links affected by phase noise. IEEE Int. Conf. on Comm. (ICC), Jun. 2013.
[33] R.-J. Essiambre, G. Foschini, G. Kramer, and P. Winzer. Capacity limits of information transport in fiber-optic networks. Phys. Rev. Letters, vol. 101, paper 163901, Oct. 2008.
[34] R.-J. Essiambre, G. Foschini, P. Winzer, and G. Kramer. Capacity limits of fiber-optic communication systems. Opt. Fiber Comm. Conf. (OFC), paper OThL1, Mar. 2009.
[35] R.-J. Essiambre, G. Kramer, P. J. Winzer, G. J. Foschini, and B. Goebel. Capacity limits of optical fiber networks. J. Lightwave Tech., vol. 28, no. 4, pp. 662–701, Feb. 2010.
[36] G. J. Foschini, L. J. Greenstein, and G. Vannucci. Noncoherent detection of coherent lightwave signals corrupted by phase noise. IEEE Trans. Commun., vol. 36, no. 3, pp. 306–314, Mar. 1988.
[37] G. J. Foschini and G. Vannucci. Characterizing filtered light waves corrupted by phase noise. IEEE Trans. Inf. Theory, vol. 34, no. 6, pp. 1437–1448, Nov. 1988.
[38] G. J. Foschini, G. Vannucci, and L. J. Greenstein. Envelope statistics for filtered optical signals corrupted by phase noise. IEEE Trans. Commun., vol. 37, no. 12, pp. 1293–1302, Dec. 1989.
[39] H. Ghozlan and G. Kramer. Interference focusing for mitigating cross-phase modulation in a simplified optical fiber model. IEEE Int. Sym. Inf. Theory (ISIT), pp. 2033–2037, Jun. 2010.
[40] H. Ghozlan and G. Kramer. Interference focusing for simplified optical fiber models with dispersion. IEEE Int. Sym. Inf. Theory (ISIT), pp. 376–379, Aug. 2011.
[41] H. Ghozlan and G. Kramer. Multi-sample receivers increase information rates for Wiener phase noise channels. IEEE Global Telecom. Conf. (GLOBECOM), pp. 1919–1924, 2013.
[42] H. Ghozlan and G. Kramer. On Wiener phase noise channels at high signal-to-noise ratio. IEEE Int. Sym. Inf. Theory (ISIT), pp. 2279–2283, 2013.
[43] H. Ghozlan and G. Kramer. Phase modulation in Wiener phase noise channels with oversampling at high SNR. IEEE Int. Sym. Inf. Theory (ISIT), 2014.
[44] E. N. Gilbert. Increased information rate by oversampling. IEEE Trans. Inf. Theory, vol. 39, no. 6, pp. 1973–1976, 1993.
[45] B. Goebel, R. Essiambre, G. Kramer, P. J. Winzer, and N. Hanik. Calculation of mutual information for partially coherent Gaussian channels with applications to fiber optics. IEEE Trans. Inf. Theory, vol. 57, no. 9, pp. 5720–5736, Sep. 2011.
[46] I. S. Gradshteyn and I. M. Ryzhik. Table of Integrals, Series and Products. Academic Press, 2007.
[47] K.-P. Ho and J. M. Kahn. Channel capacity of WDM systems using constant-intensity modulation formats. Opt. Fiber Comm. Conf. (OFC), pp. 731–733, Mar. 2002.
[48] P. Hou, B. J. Belzer, and T. R. Fischer. On the capacity of the partially coherent additive white Gaussian noise channel. IEEE Int. Sym. Inf. Theory (ISIT), p. 372, Jul. 2003.
[49] M. Ivkovic, I. Djordjevic, and B. Vasic. Calculation of achievable information rates of long-haul optical transmission systems using instanton approach. J. Lightwave Tech., vol. 25, no. 5, pp. 1163–1168, May 2007.
[50] M. Katz and S. Shamai. On the capacity-achieving distribution of the discrete-time noncoherent and partially coherent AWGN channels. IEEE Trans. Inf. Theory, vol. 50, no. 10, pp. 2257–2270, Oct. 2004.
[51] H. Kobayashi, B. L. Mark, and W. Turin. Probability, Random Processes, and Statistical Analysis: Applications to Communications, Signal Processing, Queueing Theory and Mathematical Finance. Cambridge University Press, 2012.
[52] T. Koch and A. Lapidoth. Increased capacity per unit-cost by oversampling. IEEE Convention of Electrical and Electronics Engineers in Israel (IEEEI), pp. 684–688, Nov. 2010.
[53] A. Lapidoth. Capacity bounds via duality: a phase noise example. Asian-European Workshop on Information Theory, pp. 58–61, Jun. 2002.
[54] A. Lapidoth. On the asymptotic capacity of stationary Gaussian fading channels. IEEE Trans. Inf. Theory, vol. 51, no. 2, pp. 437–446, Feb. 2005.
[55] A. Lapidoth and S. M. Moser. Capacity bounds via duality with applications to multiple-antenna systems on flat-fading channels. IEEE Trans. Inf. Theory, vol. 49, no. 10, pp. 2426–2467, Oct. 2003.
[56] A. Lapidoth. A Foundation in Digital Communications. Cambridge University Press, 2009.
[57] K. V. Mardia and P. E. Jupp. Directional Statistics. John Wiley and Sons Ltd., 2000.
[58] M. Martalò, C. Tripodi, and R. Raheli. On the information rate of phase noise-limited communications. Inf. Theory and App. Workshop (ITA), 2013.
[59] A. Mecozzi. Limits to long-haul coherent transmission set by the Kerr nonlinearity and noise of the in-line amplifiers. J. Lightwave Tech., vol. 12, no. 11, pp. 1993–2000, Nov. 1994.
[60] A. Mecozzi and R. Essiambre. Nonlinear Shannon limit in pseudolinear coherent systems. J. Lightwave Tech., vol. 30, no. 12, pp. 2011–2024, Jun. 2012.
[61] P. P. Mitra and J. B. Stark. Nonlinear limits to the information capacity of optical fibre communications. Nature, vol. 411, pp. 1027–1030, 2001.
[62] S. M. Moser. Capacity results of an optical intensity channel with input-dependent Gaussian noise. IEEE Trans. Inf. Theory, vol. 58, no. 1, pp. 207–223, Jan. 2012.
[63] E. Narimanov and P. Mitra. The channel capacity of a fiber optics communication system: perturbation theory. J. Lightwave Tech., vol. 20, no. 3, pp. 530–537, 2002.
[64] A. V. Oppenheim, A. S. Willsky, and S. H. Nawab. Signals & Systems. Prentice-Hall, 2nd edition, 1997.
[65] A. Papoulis and S. U. Pillai. Probability, Random Variables and Stochastic Processes. McGraw-Hill, 4th edition, 2002.
[66] K. V. Peddanarappagari and M. Brandt-Pearce. Volterra series transfer function of single-mode fibers. J. Lightwave Tech., vol. 15, no. 12, pp. 2232–2241, Dec. 1997.
[67] R. Pighi and R. Raheli. Information rates of oversampled detectors for transition-noise-limited digital storage systems. IEEE Int. Sym. Inf. Theory (ISIT), pp. 2536–2540, 2007.
[68] P. Poggiolini. The GN model of non-linear propagation in uncompensated coherent optical systems. J. Lightwave Tech., vol. 30, no. 24, pp. 3857–3879, Dec. 2012.
[69] J. G. Proakis and M. Salehi. Digital Communications. McGraw-Hill, 5th edition, 2008.
[70] M. Secondini, E. Forestieri, and G. Prati. Achievable information rate in nonlinear WDM fiber-optic systems with arbitrary modulation formats and dispersion maps. J. Lightwave Tech., vol. 31, no. 23, pp. 3839–3852, Dec. 2013.
[71] C. E. Shannon. A mathematical theory of communication. Bell Sys. Tech. J., vol. 27, pp. 379–423 and 623–656, Jul. and Oct. 1948.
[72] A. Spalvieri and L. Barletta. Pilot-aided carrier recovery in the presence of phase noise. IEEE Trans. Commun., vol. 59, no. 7, pp. 1966–1974, Jul. 2011.
[73] M. H. Taghavi, G. C. Papen, and P. H. Siegel. On the multiuser capacity of WDM in a nonlinear optical fiber: coherent communication. IEEE Trans. Inf. Theory, vol. 52, no. 11, pp. 5008–5022, 2006.
[74] J. Tang. The multispan effects of Kerr nonlinearity and amplifier noises on Shannon channel capacity of a dispersion-free nonlinear optical fiber. J. Lightwave Tech., vol. 19, no. 8, pp. 1110–1115, Aug. 2001.
[75] J. Tang. The Shannon channel capacity of dispersion-free nonlinear optical fiber transmission. J. Lightwave Tech., vol. 19, no. 8, pp. 1104–1109, 2001.
[76] J. Tang. The channel capacity of a multispan DWDM system employing dispersive nonlinear optical fibers and an ideal coherent optical receiver. J. Lightwave Tech., vol. 20, no. 7, pp. 1095–1101, Jul. 2002.
[77] R. W. Tkach and A. R. Chraplyvy. Phase noise and linewidth in an InGaAsP DFB laser. J. Lightwave Tech., vol. 4, no. 11, pp. 1711–1716, 1986.
[78] K. S. Turitsyn, S. A. Derevyanko, I. V. Yurkevich, and S. K. Turitsyn. Information capacity of optical fiber channels with zero average dispersion. Phys. Rev. Letters, vol. 91, paper 203901, Nov. 2003.
[79] A. J. Viterbi. Phase-locked loop dynamics in the presence of noise by Fokker-Planck techniques. Proc. IEEE, vol. 51, no. 12, pp. 1737–1753, 1963.
[80] L. G. L. Wegener, M. L. Povinelli, A. G. Green, P. P. Mitra, J. B. Stark, and P. B. Littlewood. The effect of propagation nonlinearities on the information capacity of WDM optical fiber systems: cross-phase modulation and four-wave mixing. Physica D: Nonlinear Phenomena, vol. 189, no. 1-2, pp. 81–99, 2004.
[81] H. Wei and D. Plant. Comment on 'Information capacity of optical fiber channels with zero average dispersion'. ArXiv Physics e-prints, 2006.
[82] A. D. Wyner. Bounds on communication with polyphase coding. Bell Sys. Tech. J., vol. 45, no. 4, pp. 523–559, Apr. 1966.
[83] L. Xiang and X. P. Zhang. The study of information capacity in multispan nonlinear optical fiber communication systems using a developed perturbation technique. J. Lightwave Tech., vol. 29, no. 3, pp. 260–264, Feb. 2011.
[84] M. I. Yousefi and F. R. Kschischang. A Fokker-Planck differential equation approach for the zero-dispersion optical fiber channel. IEEE Int. Sym. Inf. Theory (ISIT), pp. 206–210, Jun. 2010.
[85] M. I. Yousefi and F. R. Kschischang. A probabilistic model for optical fiber channels with zero dispersion. Biennial Symposium on Communications (QBSC), pp. 221–225, May 2010.
[86] M. I. Yousefi and F. R. Kschischang. On the per-sample capacity of nondispersive optical fibers. IEEE Trans. Inf. Theory, vol. 57, no. 11, pp. 7522–7541, Nov. 2011.
[87] M. I. Yousefi and F. R. Kschischang. Communication over fiber-optic channels using the nonlinear Fourier transform. IEEE Int. Sym. Inf. Theory (ISIT), pp. 1710–1714, Jul. 2013.
[88] M. I. Yousefi and F. R. Kschischang. Integrable communication channels and the nonlinear Fourier transform. IEEE Int. Sym. Inf. Theory (ISIT), pp. 1705–1709, Jul. 2013.
[89] M. I. Yousefi and F. R. Kschischang. Information transmission using the nonlinear Fourier transform, Part I: mathematical tools. IEEE Trans. Inf. Theory, vol. 60, no. 7, pp. 4312–4328, Jul. 2014.
[90] M. I. Yousefi and F. R. Kschischang. Information transmission using the nonlinear Fourier transform, Part II: numerical methods. IEEE Trans. Inf. Theory, vol. 60, no. 7, pp. 4329–4345, Jul. 2014.
[91] M. I. Yousefi and F. R. Kschischang. Information transmission using the nonlinear Fourier transform, Part III: spectrum modulation. IEEE Trans. Inf. Theory, vol. 60, no. 7, pp. 4346–4369, Jul. 2014.