NOISE BENEFITS IN NONLINEAR SIGNAL PROCESSING

by

Ashok Patel

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)

December 2009

Copyright 2009 Ashok Patel

Dedication

To my dear wife Amola for her unconditional love and support, and to our beloved daughter Anika

Acknowledgments

I thank my advisor Professor Bart Kosko for his guidance and support throughout my graduate work. He always challenged me to give more and demanded the best. His wisdom, philosophy, and rigorous scholarship have immensely influenced my academic growth. I also thank the other committee members Professor Remigijus Mikulevicius and Professor Edmond Jonckheere for their thoughtful comments and discussions.

I am so fortunate to have my beautiful and loving wife Amola who stood by me through thick and thin and made every sacrifice for me. I owe everything to her. Our beloved daughter Anika has always cheered me up with her charming smile. Words cannot describe my love and appreciation. This dissertation is dedicated to them.

Table of Contents

Dedication
Acknowledgments
List of Figures
Abstract
Chapter 1: Introduction
  1.1 Noise Benefits in Nonlinear Systems
  1.2 Performance Measures for Noise Benefits
  1.3 Screening Conditions to Find Noise Benefits
    1.3.1 Forbidden Interval Theorems
  1.4 Learning Algorithms for Finding Optimal Noise Benefits
  1.5 Contributions of this Dissertation
Chapter 2: Part One: Noise Benefits with Detection and Error Probabilities
Chapter 3: Optimal Noise Benefits in Neyman-Pearson and Inequality-Constrained Statistical Signal Detection
  3.1 Noise Benefits in Neyman-Pearson Signal Detection
  3.2 Optimal Noise Densities for Neyman-Pearson Signal Detection
  3.3 N-P SR Noise Finding Algorithm
  3.4 Noise Benefits in Inequality-Constrained Statistical Decision Problems
  3.5 Applications of the SR Noise-Finding Algorithm
    3.5.1 Near-Optimal SR Noise for a Suboptimal One-Sample Neyman-Pearson Hypothesis Test of Variance
    3.5.2 Near-Optimal Signal Power Randomization for a Power-Constrained Signal Transmitter
Chapter 4: Error-Probability Noise Benefits in Threshold Signal Detection
  4.1 Error-Probability Noise Benefits: Total and Partial SR
  4.2 Binary Signal Detection Based on Error Probability
  4.3 Noise Benefits in Threshold Detection of Discrete Binary Signals
  4.4 Closed-Form Optimal SR Noise
  4.5 Adaptive Noise Optimization
  4.6 Noise Benefits in Threshold Detection of Signals with Continuous Probability Densities
  4.7 Noise Benefits in Parallel Arrays of Threshold Neurons
Chapter 5: Noise Benefits in Quantizer-Array Correlation Detection and Decoding
  5.1 Noise Benefits in Array Signal Detection
  5.2 Neyman-Pearson Binary Signal Detection in α-Stable Noise
  5.3 Quantizer Noise Benefits in Nonlinear-Correlation-Detector-Based Neyman-Pearson Detection
  5.4 Maximum-Likelihood Binary Signal Detection in Symmetric α-Stable or Generalized-Gaussian Channel Noise
  5.5 Noise Benefits in Maximum-Likelihood Detection Using Quantizer-Array-Based Nonlinear Correlators
  5.6 Watermark Decoding Using the Array-Based Correlation Detectors
  Table 5.1 Comparison of Watermark-Decoding Performances
Chapter 6: Part Two: Noise Benefits with Mutual Information
Chapter 7: Stochastic Resonance in Spiking Retinal and Sensory Neuron Models
  7.1 Noise Benefits in Spiking Neuron Models
  7.2 Stochastic Resonance in Spiking Retinal Models
  7.3 Stochastic Resonance in Spiking Sensory Neuron Models
Chapter 8: Stochastic Resonance in Continuous and Spiking Neuron Models with Levy Noise
  8.1 Levy Noise Benefit in Neural Signal Detection
  8.2 Noisy Feedback Neuron Models
  8.3 Levy Processes and Stochastic Differential Equations
  8.4 Levy Noise Benefits in Continuous Neuron Models
  8.5 Levy Noise Benefits in Spiking Neuron Models
  8.6 Proofs of Lemmas
Chapter 9: Future Work
References

List of Figures

1.1 Stochastic resonance in the Kanisza square illusion with infinite-variance symmetric α-stable (α = 1.9) noise
1.2 Mutual information Levy noise benefits in the continuous bistable neuron (8.1)-(8.2) and (8.5)
3.1 SR noise benefits in suboptimal Neyman-Pearson signal detection
3.2 Finding near-optimal Neyman-Pearson SR noise
3.3 SR noise (signal-strength randomization) benefits in optimal anti-podal signal detection
4.1 Stochastic resonance (SR) noise benefits in threshold detection of binary discrete signals
4.2 Adaptive SR for infinite-variance α-stable noise
4.3 Partial stochastic resonance (SR) in threshold detection of a binary signal X with continuous pdf in the presence of zero-mean additive discrete bipolar scale-family noise
4.4 Total stochastic resonance (SR) in threshold detection of a binary signal X with continuous pdf in the presence of zero-mean additive symmetric scale-family noise
4.5 SR noise benefits based on inequality (5.93) for array signal detection
4.6 Collective noise benefits in parallel-array maximum-likelihood signal detection for different values of M (the number of threshold neurons in each of K = 36 arrays)
5.1 Noise-enhanced digital watermark decoding
5.2 Samples of symmetric α-stable probability densities and their white-noise realizations
5.3 SR noise benefits based on inequality (5.12) for dc signal detection in α-stable channel noise
5.4 Initial SR effects in a nonlinear correlation detector for dc signal detection in impulsive symmetric α-stable channel noise (α = 1.85)
5.5 Comparison of initial SR effects in the nonlinear correlation detector for dc signal detection in α-stable channel noise (α = 1.85) for different types of symmetric quantizer noise
5.6 Comparison of Neyman-Pearson detection performance of four nonlinear detectors for different values of (a) false-alarm probabilities P_FA and (b) SαS channel noise dispersion when the signal strength A = 1 and the number of samples K = 50
5.7 Neyman-Pearson detection performance of four nonlinear detectors for different values of the signal strength A when the tail-thickness parameter α of the SαS channel noise is (a) α = 1, (b) α = 1.3, (c) α = 1.6, and (d) α = 1.9
5.8 Initial SR effects in maximum-likelihood quantizer-array detection
5.9 Comparison of initial SR noise benefits in the maximum-likelihood quantizer-array detector for four different types of quantizer noise
5.10 Initial SR effects in the quantizer-array maximum-likelihood detection of a deterministic bipolar sequence s_k of unknown amplitude A in infinite-variance SαS channel noise with Laplacian quantizer noise
5.11 Comparison of initial SR effects in the quantizer-array maximum-likelihood detection of a deterministic bipolar sequence s_k of unknown amplitude A in infinite-variance SαS channel noise with different types of quantizer noise
5.12 SR effects in the quantizer-array maximum-likelihood detection of an antipodal signal sequence in symmetric bimodal Gaussian-mixture channel noise with Gaussian quantizer noise
5.13 Six different 512×512 host images watermarked with the 'yin-yang' image in Figure 5.1(a)
5.14 Watermark decoding performance of the three detectors (a) Cauchy detector, (b) limiting (Q → ∞) array detector, and (c) generalized-Gaussian detector
7.1 Stochastic resonance in a spiking retinal neuron
7.2 Stochastic resonance (SR) in the spiking retina model (7.1)-(7.3) with additive Gaussian white noise n_1 and n_2
7.3 Cross-section plots of the mutual-information surface of Figure 7.2
7.4 Approximation of the average firing rate
7.5 Stochastic resonance in the FHN spiking neuron model—a simulation instance of Theorem 7.2
7.6 Stochastic resonance in the integrate-and-fire spiking neuron model with subthreshold input signals and infinite-variance α-stable noise (α = 1.9)
7.7 Stochastic resonance in the FHN spiking neuron model with superthreshold input signals and additive Gaussian white noise
8.1 Stochastic resonance in the Kanisza square illusion with infinite-variance symmetric α-stable (α = 1.9) noise
8.2 Sample paths from one-dimensional Levy processes
8.3 Mutual information Levy noise benefits in the continuous bistable neuron (8.1)-(8.2) and (8.5)
8.4 Mutual information SR Levy noise benefits in the logistic continuous neuron (8.1)-(8.2) and (8.4)
8.5 Mutual information Levy noise benefits in the linear-threshold continuous neuron (8.1)-(8.2) and (8.6)
8.6 Mutual information Levy noise benefits in the Gaussian or 'radial basis' continuous neuron (8.1)-(8.2) and (8.8)
8.7 Mutual information Levy noise benefits in the leaky integrate-and-fire spiking neuron model (8.50)
8.8 Mutual information Levy noise benefits in the FHN spiking neuron (8.52)-(8.53)

Abstract

This dissertation shows how noise can benefit nonlinear signal processing. These "stochastic resonance" results include deriving necessary and sufficient conditions for noise benefits, optimal noise distributions, and algorithms that find the optimal or near-optimal noise. The results apply to broad classes of signal and noise distributions. Applications include Neyman-Pearson and maximum-likelihood signal detection in single detectors and in parallel arrays, digital watermark decoding, retinal signal detection, and signal detection in feedback neurons.

Chapter 1

Introduction

1.1 Noise Benefits in Nonlinear Systems

Noise can benefit many physical, chemical, biological, and engineering signal processing systems if the user judiciously adds noise or exploits the noise already present [10, 83, 137, 180, 263]. This noise benefit or stochastic resonance (SR) effect requires some form of nonlinear signal processing [37, 95, 176]. Small amounts of noise can improve the system performance while too much noise degrades it. Figure 1.1 shows an SR noise benefit in the Kanisza square illusion [123] with impulsive Levy noise [12]. The illusory square becomes more pronounced as the noise dispersion increases and then less pronounced as the dispersion increases further. Figure 1.2 shows the signature inverted-U curve of SR. This non-monotonic curve plots the system's performance measure versus the noise dispersion. The system performance rises to a maximum and then falls as the dispersion of the injected noise increases. Some nonlinear systems can have a multimodal performance curve and so show stochastic "multiresonance" [145, 258].

Figure 1.1: Stochastic resonance in the Kanisza square illusion with infinite-variance symmetric α-stable (α = 1.9) noise. The Kanisza square illusion improves as the noise dispersion increases from 0.047 to 0.3789 and then it degrades as the dispersion increases further. Each pixel represents the output of the noisy bistable potential neuron model (8.1)-(8.2) and (8.5) that uses the pixel values of the original Kanisza square image as subthreshold input signals. The additive α-stable noise dispersions are (a) 0.047, (b) 0.1015, (c) 0.3789, (d) 1, and (e) 3.7321.
Figure 1.2: Mutual information Levy noise benefits in the continuous bistable neuron (8.1)-(8.2) and (8.5). Additive white Levy noise dL_t increases the mutual information of the bistable potential neuron for the subthreshold input signals s_1 = 0.3 and s_2 = 0.4. The three panels plot the mutual information I(S,R) in bits against the scale κ of the additive white Levy noise. The types of Levy noise dL_t are (a) jump-diffusion Gaussian with uniformly distributed jumps, (b) pure-jump normal inverse Gaussian (NIG), and (c) symmetric α-stable noise with α = 1.9 (thick-tailed bell curve with infinite variance [192]). The dashed vertical lines show the total min-max deviations of the mutual information in 100 simulation trials.

A large and growing literature has documented SR applications in a wide range of nonlinear systems. SR occurs in physical systems such as ring lasers [182, 256], threshold hysteretic Schmitt triggers [170, 184], superconducting quantum interference devices (SQUIDs) [125, 254, 264], Josephson junctions [34, 107, 128], chemical systems [153, 157, 272], circuits [10, 102, 135], and quantum systems [66, 96, 164]. SR also occurs in biological systems such as the auditory system [118, 119], rat [59, 93], crayfish [264], cricket [154], paddlefish [80], and in many types of model neurons [35, 36, 57, 55, 202, 212, 274]. SR in neuron models generally involves some form of subthreshold and suprathreshold signal detection [35, 64, 100, 109, 156, 187, 188, 190, 203, 207, 234, 245, 248, 247, 261, 264]. Noise can also improve human tactile response [58, 167, 195], muscle contraction and coordination [74, 171, 221], and sensory enhancement [103, 189].

The study of SR emerged largely from physics [24, 23] and biology [182, 264]. But many engineering signal processing systems have also added dither-like noise to improve how humans perceive signals. These systems include audio compact discs [163], analog-to-digital devices [177, 262, 273], video images [237], visual perception schemes [5, 218, 243], and cochlear implants [49, 48, 247]. Additive noise can sometimes stabilize chaotic attractors [131, 173, 275].

Injecting small amounts of noise can also improve the performance of a wide range of signal detection and estimation algorithms [46, 44, 50, 203, 207, 208, 201, 230, 278, 279, 9]. This includes Neyman-Pearson detection, Bayesian detection, and Bayesian estimation. Noise benefits in binary signal detection occur only if the receiver or detector is suboptimal or if the injected noise is not independent of the concurrent received signal and the hypotheses [269, 22]. But SR noise benefits can occur even if the receiver is optimal when the noise depends on the received signal and when the average signal power constrains the signal transmission [210]. SR-based detectors can lead to improved detection of faint signals found in spread spectrum watermark decoding and communication systems [211, 201]. These systems detect faint signals in noisy backgrounds that often have non-Gaussian and even impulsive noise types.
Similar SR designs might also exploit the signal-based cross-talk noise in cellular systems, Ethernet packet flows, or Internet congestion. Noise can also stem from the mobility of communicating nodes in ad-hoc networks. Smart routing or coding systems can exploit these noise sources to improve message passing [29] or data queries in such networks [98].

Most SR effects or noise benefits are robust. Theorems and simulations show that SR effects usually occur for many types of finite-variance and even infinite-variance (impulsive) noise [139, 140, 141, 187, 203, 207, 208, 201] if a nonlinear system exhibits a noise benefit for any one type of noise. These noise types can model energetic disturbances that range from thermal jitter to unmodeled environmental effects to the random crosstalk of signals in large neural or communication networks. Chapters 4-5 and 7-8 show that many of our SR noise benefits are robust.

1.2 Performance Measures for Noise Benefits

Researchers have used several different performance measures to study the SR effect in different systems. These performance measures vary from system to system. Performance measures that work for one system may not be useful for other systems. There is still no consensus in the SR literature about which performance measures best measure the SR effect [83, 100, 137, 176].

A common SR performance measure is some form of signal-to-noise ratio (SNR) [223, 90, 181, 187, 271]. This is a popular performance measure because it is intuitive and because much research in SR has grown from the study of simple dynamical systems with external periodic forcing signals [75, 78, 85]. But there is no standard definition of system-level signal and noise in nonlinear systems. So researchers work with an SNR that is easy to compute and that depends on the standard spectral measures in signal processing. Signal-detection probability and average detection-error probability are also useful measures of SR in signal processing applications [210, 208]. Other popular SR performance measures include cross-correlation [56, 55, 151], residence-time distribution [37, 84], Fisher information [226, 44], and mutual information [39, 140, 188, 203, 207, 246] between the input and output of the system. We use detection and error probabilities and mutual information to measure the SR effects in this dissertation.

Adding noise in nonlinear signal detection can increase the detection probability of Neyman-Pearson detectors [50, 126, 206, 210] or decrease the error probability of threshold detectors [224, 222, 127, 208, 230]. Detection-probability SR noise benefits occur in Neyman-Pearson detectors [152] when adding noise to the received signal before making a decision increases the signal detection probability P_D while the false-alarm probability or Type-I error probability P_FA stays at or below a preset level. Error-probability noise benefits occur in threshold detectors when adding noise to the received signal before making a decision decreases the error probability (the probability-weighted sum of decision errors) P_e or increases the total variation divergence of the thresholded output's distributions [158, 220]. Both detection and error-probability noise benefits occur in single threshold detectors and also in parallel arrays of threshold detectors or quantizers.
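The error-probability benefit is easy to reproduce numerically for the simplest case of a single threshold detector. The Python sketch below uses hypothetical values of a subthreshold bipolar signal ±A, a threshold θ > A, and zero-mean Gaussian noise: the error probability P_e equals 0.5 for vanishing noise, dips below 0.5 at an intermediate noise scale, and then climbs back toward 0.5 — the inverted-U SR curve in error-probability form.

```python
import numpy as np
from scipy.stats import norm

A, theta = 0.5, 1.0                        # hypothetical subthreshold bipolar signal +/-A and threshold
sigmas = np.linspace(0.01, 5.0, 500)       # Gaussian noise standard deviations to sweep

# Detector: decide "+A was sent" when the noisy sample s + n exceeds theta.
# P(decide +A | s = +A) = Q((theta - A)/sigma) and P(decide +A | s = -A) = Q((theta + A)/sigma).
Q = lambda x: 1.0 - norm.cdf(x)
p_e = 0.5 * (1.0 - Q((theta - A) / sigmas)) + 0.5 * Q((theta + A) / sigmas)

print(f"error at smallest noise scale: {p_e[0]:.3f}")               # ~0.5: the detector almost never fires
print(f"minimum error {p_e.min():.3f} at sigma = {sigmas[np.argmin(p_e)]:.2f}")
```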
An SR noise benefit can take the form of an increase in Shannon's measure of mutual information or entropy-based bit count [61] between the input and output of a nonlinear system [39, 179, 180, 188, 203, 207, 246]. These systems can range from simple static threshold detectors to dynamical systems that obey a general stochastic differential equation. Mutual information is a useful detection-performance measure when the input signals are not deterministic. It applies to many neurophysiological and communications applications involving random aperiodic input signals for which the output SNR might be ill-defined, uninformative, or irrelevant. Mutual-information noise benefits are also more general than error-probability noise benefits because mutual-information noise benefits can occur even when error-probability noise benefits do not occur.

1.3 Screening Conditions to Find Noise Benefits

An important research question is when noise will benefit a system's performance. SR occurs in many suboptimal systems but the suboptimality of the system is not a sufficient condition for SR noise benefits. SR can also occur even in optimally designed systems. So successful SR applications will need some SR screening tests or conditions that can prevent a fruitless search for nonexistent noise benefits in many contexts. Such screening conditions will depend on the system's performance measure.

Each chapter of this dissertation gives a necessary or sufficient condition to find either detection or error-probability noise benefits [210, 211, 208, 201] or to find mutual-information noise benefits [203, 207]. These SR conditions can act as a type of screening procedure that predicts whether a noise benefit will occur in a given real system. Some of these SR conditions determine the existence of optimal SR noise for detection-probability benefits and then even lead to the exact form of the optimal noise in inequality-constrained statistical decision problems. One error-probability SR condition allows us to compute the optimal intensity for several closed-form noise types if the input signals are discrete binary random variables. A gradient-ascent learning algorithm can find this optimal intensity using this SR condition and the sample data when the noise pdf does not have a closed form as with many thick-tailed noise pdfs. All the necessary or sufficient conditions use signal or sample distributions, threshold values, or some other system parameters to predict whether a noise benefit will occur. Some detection or error-probability SR conditions reduce to a weighted-derivative comparison of signal probability densities at the detection threshold [210, 208] while other conditions use the mean and variance of the test statistics and the first derivatives of the mean and variance for array-based nonlinear correlation detection [208, 201]. The necessary or sufficient conditions have the form of a "forbidden interval" of parameter values for mutual-information noise benefits if the input signals are discrete binary random variables.

1.3.1 Forbidden Interval Theorems

Forbidden interval theorems can decide whether mutual-information noise benefits occur in many nonlinear systems. These systems include carbon nanotube detectors [151], quantum-optical communication [265], and neural signal processing [203, 207, 138]. Forbidden interval theorems state that the existence of an SR noise benefit depends on whether the noise mean or location value falls outside or inside an interval of signal-related and threshold-related values.
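A purely illustrative numerical sketch of this dichotomy for the simplest such theorem (stated below) appears here in Python; the signal amplitude A, threshold θ, and noise means are hypothetical choices, not values from the dissertation. A threshold detector processes an equiprobable subthreshold bipolar signal ±A with additive Gaussian noise. When the noise mean lies outside the interval of signal-and-threshold values the mutual information starts near zero and peaks at an intermediate noise scale (an SR benefit); when the mean lies inside that interval the mutual information is already maximal at vanishing noise and added noise only degrades it.

```python
import numpy as np
from scipy.stats import norm

def H2(p):
    # binary entropy in bits
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def mutual_info(A, theta, mu, sigma):
    # I(S;Y) for the threshold detector y = 1{s + n > theta} with equiprobable
    # s in {-A, +A} and Gaussian noise n ~ N(mu, sigma^2).
    p1 = 1.0 - norm.cdf((theta - A - mu) / sigma)    # P(y = 1 | s = +A)
    p0 = 1.0 - norm.cdf((theta + A - mu) / sigma)    # P(y = 1 | s = -A)
    return H2(0.5 * (p0 + p1)) - 0.5 * (H2(p0) + H2(p1))

A, theta = 0.5, 1.0                                  # subthreshold bipolar signal: -A < A < theta
sigmas = np.linspace(0.01, 3.0, 300)
for mu in (0.0, 1.0):                                # noise mean outside vs. inside (theta - A, theta + A)
    I = np.array([mutual_info(A, theta, mu, s) for s in sigmas])
    print(f"mu = {mu}: I at sigma -> 0 is {I[0]:.3f} bits, peak I over sigma is {I.max():.3f} bits")
```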
The theorems act as a type of screening device for a noise benefit because they give sufficient or necessary conditions for such a noise benefit based only on the values of the signal, the threshold, and the noise mean or location. The simplest forbidden interval theorem is for the SR effect in mutual-information-based threshold detection of discrete weak binary signals [140, 141]. This theorem is the strongest because it gives both a necessary and a sufficient condition: SR occurs if and only if the noise mean or location parameter μ lies outside the forbidden interval (θ − A, θ + A) for threshold θ where −A < A < θ for the bipolar subthreshold signal ±A. A similar forbidden-interval necessary condition [209, 208] in Chapter 4 shows that mutual-information noise benefits are more general than error-probability noise benefits. Chapters 7 and 8 show that forbidden interval theorems extend to a spiking retinal neuron model and to more complex stochastic neuron models with Brownian or even Levy (jump) noise [203, 205, 207].

But forbidden interval theorems themselves do not show how to find the optimal noise intensity that maximizes the mutual information. Nor do they indicate the magnitude of such a noise benefit if it exists for a given noise type. These theorems merely indicate whether a mutual-information noise benefit exists in theory for a given combination of system and noise parameters. But a stochastic gradient-ascent learning algorithm can often find the optimal noise intensity for many types of finite-variance and infinite-variance noise once a forbidden interval theorem predicts such a noise benefit [188]. So the forbidden interval theorems together with a robustified stochastic learning algorithm can help in many potential noise applications that use threshold detectors to process subthreshold input signals in a noisy environment.

1.4 Learning Algorithms for Finding Optimal Noise Benefits

Learning algorithms can often find the optimal SR noise when closed-form techniques are not available. The optimal noise has the simple closed form of a weighted (randomized) average of two noise realizations for single-inequality-constrained statistical decision problems such as Neyman-Pearson signal detection if the optimal noise exists. But computing such optimal noise is not simple and the existence of noise benefits does not itself imply the existence of optimal noise. Chapter 3 presents a deterministic algorithm that uses successive approximations to find a near-optimal SR noise from a finite set of noise realizations.

Optimal independent additive SR noise is just a constant if the noise is unconstrained [127]. So we can just add a proper constant shift to the subthreshold signals or to the threshold itself to optimize the signal processing. But signals or thresholds may not be available for optimization. An important example of this occurs in biological cellular signal detection [100, 148, 245, 261, 264] where signals are weak or subthreshold. We often cannot control the signal or threshold. But we can control the noise in the system's local environment by adjusting the mechanisms involved in its shaping such as temperature [117, 198]. But these systems are far too complex to model with closed-form techniques. This suggests that we may need to use intelligent or adaptive model-free techniques to learn or approximate the SR effects when we do not know the exact form of the dynamical systems.
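One minimal model-free approach, sketched below with hypothetical parameter values and without the robust noise suppressors of the learning laws in [187, 188], is a finite-difference stochastic gradient ascent on the noise scale: estimate the performance measure from samples at two nearby noise scales and step the scale toward the better estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
A, theta = 0.5, 1.0                    # hypothetical subthreshold bipolar signal and threshold

def success_rate(sigma, trials=20000):
    # Monte Carlo estimate of P(correct decision) for the threshold detector
    # y = 1{s + n > theta} with equiprobable s in {-A, +A} and N(0, sigma^2) noise n.
    s = rng.choice([-A, A], size=trials)
    y = (s + sigma * rng.standard_normal(trials)) > theta
    return np.mean(y == (s > 0))

sigma, step, delta = 0.1, 0.5, 0.05    # initial noise scale, learning rate, probe offset
for _ in range(200):
    grad = (success_rate(sigma + delta) - success_rate(max(sigma - delta, 1e-3))) / (2 * delta)
    sigma = max(sigma + step * grad, 1e-3)      # stochastic gradient ascent on the estimate
print("learned noise scale:", round(sigma, 2), "success rate:", round(success_rate(sigma), 3))
```

The learned scale wanders because every gradient estimate is itself a noisy sample average, which is one reason robust or annealed versions of such learning laws matter in practice.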
Stochastic gradient-ascent adaptive algorithms can learn the SR effect in many nonlinear systems for various performance measures such as SNR, mutual information, and detection error probability [187, 188, 211]. Mitaim and Kosko [187] first showed how adaptive systems can learn to add the optimal amount of noise to improve the SNR in popular nonlinear feedback systems. These adaptive systems learn the SR effect as the system performs a stochastic gradient ascent on the SNR. Robust noise suppressors can improve the learning process when the learning terms are highly impulsive. Statistically robust learning laws can also find the entropy-optimal noise level for a wide range of neuron models [188]. Biological neurons should experience less mutual information if they process weak or subthreshold stimuli and if they do not use their local noise. Such noise-based information maximization in threshold neurons is consistent with Linsker's principle of information maximization in neural networks [161, 162]. These results support the implicit SR conjecture that biological neurons have evolved to computationally exploit their noisy environments [38, 59, 58, 67, 154]. The internal noise intensity sometimes depends on the system size in many biological systems. Such systems may have adapted to achieve the optimal size to exploit the internal noise instead of reducing it to enhance their detection of subthreshold signals [156].

A stochastic gradient-ascent algorithm can sometimes learn the optimal noise intensity without using a robust suppressor on its learning terms even when the noise comes from an impulsive α-stable family. Such an algorithm uses a necessary and sufficient condition for error-probability noise benefits. Optimal noise-intensity formulas do exist for many closed-form scale-family noise types that minimize the error probability in simple threshold detectors when a user can control only the noise variance or dispersion. But the stochastic learning algorithm is useful when the noise probability density does not have a closed form, as with impulsive symmetric α-stable noise.

1.5 Contributions of this Dissertation

This dissertation presents several new theorems and applications of "stochastic resonance" (SR) or noise benefits in nonlinear signal processing. These results include deriving necessary or sufficient conditions for noise benefits, the optimal noise distributions, and algorithms to find the optimal or near-optimal noise. These SR results apply to broad classes of signal and noise distributions and have a wide range of applications such as Neyman-Pearson and maximum-likelihood signal detection in single detectors or in parallel arrays, spread-spectrum communication or watermark decoding, and signal detection in feedforward and feedback neurons.

Chapter 3 presents theorems and an algorithm to find optimal or near-optimal noise benefits in Neyman-Pearson and other more general inequality-constrained statistical decision problems, including optimal signal detection for a power-constrained transmitter. The optimal SR noise distribution is just the randomization of two noise realizations when the optimal noise exists for a single inequality constraint on the average cost. The theorems give necessary and sufficient conditions for the existence of such optimal SR noise in inequality-constrained signal detectors. A new deterministic algorithm can find near-optimal noise when the optimal noise does not exist or when computing the optimal SR noise is not simple.
Chapter 4 presents theorems and a stochastic learning algorithm to find error-probability noise benefits in threshold detectors when the noise comes from a scale-family distribution. It then extends such noise benefits to parallel arrays of threshold detectors or neurons. These results apply to broad classes of signal and noise distributions.

Chapter 5 shows that arrays of parallel connected threshold detectors or quantizers can benefit from quantizer noise or threshold fluctuations even when an individual threshold detector cannot itself show a noise benefit. Theorems give several conditions for the initial rate of noise benefits in quantizer-array-based nonlinear correlators for Neyman-Pearson and maximum-likelihood detection. They show that symmetric uniform noise gives the best initial SR effect among all symmetric scale-family noise types. A spread spectrum watermark decoding application illustrates these noise benefits.

Chapters 7 and 8 extend forbidden-interval SR conditions to feedforward and general feedback neuron models for mutual-information noise benefits in subthreshold signal detection. The neuron models include major mathematical models of spiking retinal and other sensory neurons that obey a general stochastic differential equation. Theorems and simulations show that mutual-information noise benefits are robust in the sense that they occur for many types of impulsive jump-noise processes such as the Levy processes found in real and model neurons as well as in models of finance and other random phenomena.

Chapter 2

Part One: Noise Benefits with Detection and Error Probabilities

The three chapters in Part One show how noise can benefit nonlinear statistical decision making in terms of increasing the detection probability or reducing the average detection error. Chapter 3 shows that the deliberate injection of noise can improve many instances of Neyman-Pearson signal detection by maximizing the detection probability while the false-alarm probability stays at or below a preset threshold. Necessary and sufficient conditions give the existence of such noise benefits and of the optimal noise that has the simple closed form of a weighted (randomized) average of two noise realizations. A new algorithm can find near-optimal noise when the optimal noise does not exist or when computing the optimal SR noise is not simple. These SR results extend directly to the more general case of maximizing the expected payoff in statistical decision making with an inequality constraint on the average cost.

Chapter 4 shows that noise can decrease the error probability of threshold detectors. It first shows error-probability noise benefits in simple threshold detectors. It then shows that error-probability noise benefits also occur in parallel arrays of threshold detectors or feedforward networks of threshold neurons. Necessary or sufficient conditions allow us to find such noise benefits in almost all suboptimal simple threshold detectors and also allow us to find the optimal noise intensity for several closed-form noise types when a threshold detector performs discrete binary signal detection in the presence of additive scale-family noise. A gradient-ascent learning algorithm can find this optimal intensity from sample data even when the noise pdf does not have a closed form as with many thick-tailed noise pdfs.

Chapter 5 shows how both detection and error-probability noise benefits can occur in arrays of threshold detectors.
The noisy quantizer-array-based nonlinear correlation detectors show a genuine collective noise benefit when an individual threshold detector does not itself produce a noise benefit. Such SR noise benefits occur for non-Gaussian channel noise even when it is impulsive infinite-variance symmetric alpha-stable. Chapter 5 also characterizes these SR effects in terms of detector parameters such as the number of quantizers Q, the modality of the channel noise pdf, and the type of quantizer noise. The array-SR effects have applications in discrete-cosine-transform-domain spread spectrum watermark extraction and many other communication and signal processing problems.

Chapter 3

Optimal Noise Benefits in Neyman-Pearson and Inequality-Constrained Statistical Signal Detection

This chapter presents four theorems and an algorithm to find optimal or near-optimal "stochastic resonance" (SR) noise benefits for Neyman-Pearson hypothesis testing and for more general inequality-constrained signal detection problems. The optimal SR noise distribution is just the randomization of two noise realizations when the optimal noise exists for a single inequality constraint on the average cost. The theorems give necessary and sufficient conditions for the existence of such optimal SR noise in inequality-constrained signal detectors. There exists a sequence of noise variables whose detection performance limit is optimal when such noise does not exist. Another theorem gives sufficient conditions for SR noise benefits in Neyman-Pearson and other signal detection problems with inequality cost constraints. An upper bound limits the number of iterations that the algorithm requires to find near-optimal noise.

3.1 Noise Benefits in Neyman-Pearson Signal Detection

We focus first on the special case of SR in signal detection that uses Neyman-Pearson (N-P) hypothesis testing [152] to decide between two simple alternatives. We define the noise as N-P SR noise if adding such noise to the received signal before making a decision increases the signal detection probability P_D while the false-alarm probability P_FA stays at or below a preset level for a given detection strategy. Figure 3.1 shows this type of noise benefit for a suboptimal receiver. It does not involve the typical inverted-U curve of SR (but would if it used uniform noise and we plotted the detection probability against the noise variance). An SR noise benefit does not occur in an optimal receiver if the noise is independent of the concurrent received signal and the hypotheses. This follows from the so-called irrelevance theorem of optimal detection [269, 22]. But Section 3.5.2 shows that SR noise benefits can occur even if the receiver is optimal when the noise depends on the received signal. Figure 3.3 shows such an SR noise benefit in optimal anti-podal signal detection when the average signal power constrains the signal transmission.

Sections 3.2 and 3.3 present three SR results for Neyman-Pearson signal detection. The first SR result gives necessary and sufficient conditions for the existence of optimal N-P SR noise. The existence of some N-P SR noise does not itself imply the existence of optimal noise. But there exists a sequence of noise variables whose detection performance limit is optimal when the optimal N-P SR noise does not exist. The second SR result is a sufficient condition for SR noise benefits in N-P signal detection. The third SR result is an algorithm that finds near-optimal N-P SR noise from a finite set Ñ of noise realizations.
This noise is nearly optimal if the detection and false-alarm probabilities on Ñ and on the actual noise space 𝒩 ⊇ Ñ are sufficiently close. An upper bound limits the number of iterations that the algorithm needs to find near-optimal noise. Section 3.4 extends these results to more general statistical decision problems that have one inequality constraint on the average cost.

Figure 3.1: SR noise benefits in suboptimal Neyman-Pearson signal detection. (a) The thin solid line shows the probability density function (pdf) f_0 of the signal X under the normal hypothesis H_0: X ~ N(0, 3) while the dashed line shows the pdf f_1 of X under the alternative normal hypothesis H_1: X ~ N(1, 1). The one-sample optimal detection scheme requires the two thresholds θ_1 = 1.342 and θ_2 = 1.658. It rejects H_0 if X ∈ [θ_1, θ_2] for the test significance or preset false-alarm probability α = 0.05. The thick vertical solid line shows the best single threshold θ = 2.85 if the detector cannot use two thresholds due to resource limits or some design constraint. The respective suboptimal detector rejects H_0 if X > θ. (b) The solid line shows the monotonic but nonconcave ROC curve U = {(p_FA(n), p_D(n)): n ∈ R} of the suboptimal detector for different realizations n of the additive noise N in X. Here p_FA(n) = 1 − Φ((θ − n)/√3) and p_D(n) = 1 − Φ(θ − 1 − n) for the standard normal cumulative distribution function Φ. The detector operates at point a = (p_FA(0), p_D(0)) = (0.05, 0.0322) on the ROC curve in the absence of noise. Nonconcavity of the ROC curve U between the points b = (p_FA(n_2), p_D(n_2)) and c = (0, 0) allows the N-P SR effect to occur. A convex or random combination of two operating points b and e = (p_FA(n_1), p_D(n_1)) gives a better detection performance (point g) than point a at the same false-alarm level p_FA(0) = α = 0.05. Such a random combination of operating points results from adding an independent discrete noise N with pdf f_N(n) = λ δ(n − n_1) + (1 − λ) δ(n − n_2) to the random sample X where λ = (p_FA(n_2) − α)/(p_FA(n_2) − p_FA(n_1)). Point d is on the upper boundary ∂V of the ROC curve's convex hull (dashed tangent line between b and c). So d is the supremum of the detection performances that the random or convex combination of operating points on the ROC can achieve such that P_FA stays at 0.05. Point d is the convex combination of b and c but it is not realizable by adding only noise to the sample X because point c = (0, 0) is not on the ROC curve since there is no noise realization n ∈ R such that 1 − Φ((θ − n)/√3) = 0 = 1 − Φ(θ − 1 − n). So the N-P SR noise exists but the optimal N-P SR noise does not exist in the noise space 𝒩 = R.

These SR results extend and correct prior work in "detector randomization" or adding noise to improve the performance of N-P signal detection.
Tsitsiklis [253] explored the mechanism of detection-strategy randomization for a finite set of detection strategies (operating points) in decentralized detection. He showed that there exists a randomized detection strategy that uses a convex or random combination of at most two existing detection strategies and that gives the optimal N-P detection performance. Such optimal detection strategies lie on the upper boundary of the convex hull of the receiver operating characteristic (ROC) curve points. Scott [239] later used the same optimization principle in classification systems while Appadwedula [11] used it for energy-efficient detection in sensor networks. Then Chen et al. [50] used a fixed detector structure: they injected independent noise in the received signal to obtain a proper random combination of operating points on the ROC curve for a given suboptimal detector. They showed that the optimal N-P SR noise for suboptimal detectors randomizes no more than two noise realizations.

But Chen et al. [50] assumed that the convex hull V of the set of ROC-curve operating points U ⊆ R² always contains its boundary ∂V and thus that the convex hull V is closed. This is not true in general. The topological problem is that the convex hull V need not be closed if U is not compact [108]: the convex hull of U is open if U itself is open [27]. Chen et al. argued correctly along the lines of the proof of Theorem 3 in [50] when they concluded that the "optimum pair can only exist on the boundary." But their later claim that "each z on the boundary can be expressed as the convex combination of only two elements of U" is not true in general because V may not include all of its boundary points. The optimal N-P SR noise need not exist at all in a fixed detector [206]. Figure 3.1 shows a case where the N-P SR noise exists but where the optimal N-P SR noise does not exist in the noise space 𝒩 = R. Section 3.5.1 further shows that the optimal SR noise does exist if we restrict the noise space to a compact interval such as [−3, 3]. The algorithm finds nearly optimal N-P SR noise realizations from a discretized set of noise realizations Ñ = [−3:0.0001:3] in 17 iterations. Section 3.5.2 shows that the detection performance of the maximum a posteriori (MAP) receiver can sometimes benefit from signal-power randomization in an average-power-constrained anti-podal signal transmitter if the channel noise pdf is multimodal. We assume that the transmitter transmits equiprobable anti-podal signals {−S, S} with S ∈ 𝒮 = [0.5, 3.75] and that the additive channel noise has a symmetric bimodal Gaussian-mixture probability density. Then the respective error probability of the optimal MAP receiver is nonconvex. So the transmitter can improve the detection performance by time-sharing or randomizing between two power levels for some values of the constraining maximum average power. The algorithm finds a near-optimal signal-power randomization from a discretized subset of signal-strength realizations S̃ = [0.5:0.0001:3.75]. The algorithm finds this signal-power distribution in just 13 iterations. The next four sections present and illustrate the formal SR results. All examples use single samples in statistical decision making but the results still hold for multiple samples.

3.2 Optimal Noise Densities for Neyman-Pearson Signal Detection

We now derive two theorems that fully characterize the optimal noise probability densities for Neyman-Pearson signal detection.
Then we give a sufficient condition for SR noise benefits in N-P signal detection.

Consider a binary hypothesis test where we decide between H_0: f_X(x; H_0) = f_0(x) and H_1: f_X(x; H_1) = f_1(x) using an m-dimensional noisy observation vector Y = X + N. Here X ∈ R^m is the received signal vector, N ∈ 𝒩 ⊆ R^m is a noise vector with pdf f_N, and 𝒩 is the noise space. The noise vector N can be random or a deterministic constant such as f_N(n) = δ(n − n_o). Here f_0 and f_1 are the pdfs of the signal X under the hypotheses H_0 and H_1. We need not know the prior probabilities P(H_i) of the hypotheses H_i. Then we want to determine when the optimal additive noise N_opt exists and gives the best achievable detection performance at the significance level α for the given detection strategy.

Define P_D(n) and P_FA(n) as the respective probabilities of correct detection and false detection (alarm) when the noise realization is n. Define P_D(f_N) = ∫_𝒩 P_D(n) f_N(n) dn and P_FA(f_N) = ∫_𝒩 P_FA(n) f_N(n) dn as the respective probabilities of detection and false alarm when the noise pdf is f_N. Let f_{N_opt} be the pdf of the optimal SR noise N_opt that we add to the received signal X to maximize the probability of detection P_D while keeping P_FA ≤ α. Let F denote the set of all probability density functions. So we need to find

  f_{N_opt} = arg max_{f_N ∈ F} ∫_𝒩 P_D(n) f_N(n) dn        (3.1)

such that

  f_{N_opt}(n) ≥ 0 for all n        (3.2)
  ∫_𝒩 f_{N_opt}(n) dn = 1, and        (3.3)
  P_FA(f_{N_opt}) = ∫_𝒩 P_FA(n) f_{N_opt}(n) dn ≤ α.        (3.4)

Conditions (3.2) and (3.3) are general pdf properties while (3.1) and (3.4) state the Neyman-Pearson criteria for the optimal SR noise pdf f_{N_opt}.

The conditional noise pdf f(n|x, H_i) obeys f(n|x, H_i) = f_N(n) if the noise random variable N is independent of the concurrent received signal random variable X and the hypotheses H_i. Then the irrelevance theorem [269, 22] implies that the optimal likelihood-ratio test based on the received signal realization x and the noise realization n is the same as the optimal likelihood-ratio test based on only x. So the optimal detector can always ignore the noise realization n without affecting its detection performance. This implies that the N-P SR noise benefit will not occur for the noise N if the receiver uses the optimal likelihood-ratio test based on x. But computing optimal likelihood-ratio thresholds is not simple for many non-Gaussian noise types [124, 142]. We may also need multiple thresholds to partition the test-statistic sample space into acceptance and rejection regions if the likelihood ratio is not a monotone function of the test statistic. So some detection systems use suboptimal tests if they have special hardware resource limits or if they constrain the number of detection thresholds [53, 89]. Then SR noise benefits may occur and so we may need to compute the optimal or near-optimal SR noise pdf.

The primal-dual method [166, 30] directly solves the above optimization problem (3.1)-(3.4) in the noise domain R^m. This approach gives both the conditions for the existence of optimal SR noise and the exact form of the optimal noise pdf. It also leads to an algorithm that can find near-optimal SR noise if the P_D on the noise space 𝒩 is sufficiently close to its restriction to the discrete noise realizations Ñ ⊂ 𝒩.
The Lagrangian of the inequality-constrained optimization problem (3.1)-(3.4) is

  L(f_N, k) = ∫_𝒩 P_D(n) f_N(n) dn − k ( ∫_𝒩 P_FA(n) f_N(n) dn − α )        (3.5)
            = ∫_𝒩 ( P_D(n) − k (P_FA(n) − α) ) f_N(n) dn.        (3.6)

Lagrange duality [166, 30] implies that

  sup_{f_N ∈ F} ∫_𝒩 P_D(n) f_N(n) dn = min_{k ≥ 0} sup_{f_N ∈ F} L(f_N, k).        (3.7)

So solving the optimization problem equates to finding k* ≥ 0 and the pdf f_{N_opt} such that

  min_{k ≥ 0} sup_{f_N ∈ F} L(f_N, k) = L(f_{N_opt}, k*).        (3.8)

The next two theorems give necessary and sufficient conditions for the existence of the optimal N-P SR noise and for the form of its pdf if it exists. Define first the sets

  D_+ = {n ∈ 𝒩 : P_FA(n) − α ≥ 0}  and        (3.9)
  D_- = {n ∈ 𝒩 : P_FA(n) − α ≤ 0}.        (3.10)

Assume D_- ≠ ∅ so that N_opt always exists. Let P_{D+}^{sup}, P_{D-}^{sup}, and P_D^{sup} be the respective suprema of P_D(n) over the sets D_+, D_-, and 𝒩:

  P_{D+}^{sup} = sup_n {P_D(n) : n ∈ D_+}        (3.11)
  P_{D-}^{sup} = sup_n {P_D(n) : n ∈ D_-}  and        (3.12)
  P_D^{sup} = sup_n {P_D(n) : n ∈ 𝒩}.        (3.13)

Define

  g(n, k) = P_D(n) − k (P_FA(n) − α)        (3.14)

and let d_+(k), d_-(k), and d(k) be the respective suprema of g(n, k) over the sets D_+, D_-, and 𝒩:

  d_+(k) = sup_n {g(n, k) : n ∈ D_+}        (3.15)
  d_-(k) = sup_n {g(n, k) : n ∈ D_-}  and        (3.16)
  d(k) = sup_n {g(n, k) : n ∈ 𝒩}.        (3.17)

Define

  G_+ = {n ∈ D_+ : P_D(n) = P_{D+}^{sup}}  and        (3.18)
  G_- = {n ∈ D_- : P_D(n) = P_{D-}^{sup}}.        (3.19)

Rewrite the Lagrangian (3.6) as

  L(f_N, k) = ∫_𝒩 g(n, k) f_N(n) dn.        (3.20)

Then (3.8) becomes

  min_{k ≥ 0} sup_{f_N ∈ F} L(f_N, k) = min_{k ≥ 0} sup_{f_N ∈ F} ∫_𝒩 g(n, k) f_N(n) dn        (3.21)
                                      = min_{k ≥ 0} d(k).        (3.22)

Equality (3.22) follows because

  sup_{f_N ∈ F} ∫_𝒩 g(n, k) f_N(n) dn ≤ sup_{f_N ∈ F} ∫_𝒩 d(k) f_N(n) dn        (3.23)
                                       = d(k)        (3.24)
                                       = sup_n {g(n, k) : n ∈ 𝒩}        (3.25)

and because strict inequality in (3.23) implies that there exists an n_1 ∈ 𝒩 such that

  sup_{f_N ∈ F} ∫_𝒩 g(n, k) f_N(n) dn < g(n_1, k)   because of (3.24)-(3.25)
                                       = ∫_𝒩 g(n, k) δ(n − n_1) dn.        (3.26)

But this is a contradiction because the supremum of a set of numbers cannot be less than any one of those numbers. The definition (3.24) implies that d(k) = max{d_-(k), d_+(k)}. So (3.21)-(3.22) reduces to

  min_{k ≥ 0} sup_{f_N ∈ F} L(f_N, k) = min_{k ≥ 0} max{d_-(k), d_+(k)}.        (3.27)

Theorem 3.1(a) below gives necessary and sufficient conditions for the existence of optimal SR noise. It also gives the exact form of the optimal N-P SR noise pdf f_{N_opt} if it exists when P_{D-}^{sup} ≥ P_{D+}^{sup}. Theorem 3.2 likewise gives necessary and sufficient conditions for the existence of f_{N_opt} when P_{D-}^{sup} < P_{D+}^{sup}.
So we need to find the pdff N opt such that L(f N opt ; 0) = Z N P D (n)f N opt (n)dn = P D sup : (3.34) The definition ofG implies thatP D (n o ) =P D sup =P Dsup for any n o 2 G . Choosef N (n) as the unit impulse atn o :f N (n) =(nn o ). Then Z N P D (n)f N (n)dn = P D (n o ) = P D sup (3.35) Z N P FA (n)f N (n)dn = P FA (n o ) : (3.36) 26 Sof N is the optimal SR noise pdf and hence f N opt = (nn o ) for anyn o 2G : (3.37) Suppose now thatG =;. ThenP D (n)<P Dsup =P D sup for alln2N . Suppose thatf N 0 is the optimal SR noise pdf. Then P D (f N 0) = Z N P D (n)f N 0(n)d(n) (3.38) < Z N P Dsup f N 0(n)d(n) becauseP D (n)<P Dsup and because (3.39) Z N P Dsup P D (n) f N 0(n)d(n) = 0 iff P Dsup P D (n) f N 0(n) = 0 almost everywhere onN = P Dsup : (3.40) G =; and the supremum definition ofP D sup further imply [227] that there exists a sequence of noise realizationsfn r g 1 r=1 inG such that lim r!1 P D (n r ) = P Dsup : (3.41) So there exists n s 2fn r g 1 r=1 G such that P D (n s ) > P D (f N 0) and P FA (n s ) . Define a sequence of noise pdfs f Nr (n) = (n n r ). Then f Ns contradicts the optimality of f N 0 while P FA (f Nr ) for allr and lim r!1 P D (f Nr ) = lim r!1 P D (n r ) = P Dsup : (3.42) 27 Part (b) Suppose thatP D sup < P D + sup and thatG + 6=;. Suppose also thatf N 0 is the optimal SR noise pdf such that P FA (f N 0) = v < and that P D (f N 0) P D (f N ) for any other noise pdf f N . The definition ofG + implies thatP FA (n 1 ) > v if n 1 2G + . ThenP D (n 1 ) =P Dsup > P D (f N 0) becauseP FA (f N 0) =v< andP D sup <P D + sup =P Dsup . Define f N (n) = v P FA (n 1 )v (nn 1 ) + P FA (n 1 ) P FA (n 1 )v f N 0(n): (3.43) Thenf N is a pdf becausef N (n) 0 for alln and R N f N (n)dn = 1. So P FA (f N ) = Z N P FA (n)f N (n)dn (3.44) = v P FA (n 1 )v P FA (n 1 ) + P FA (n 1 ) P FA (n 1 )v Z N P FA (n)f N 0(n)dn (3.45) = v P FA (n 1 )v P FA (n 1 ) + P FA (n 1 ) P FA (n 1 )v v (3.46) = (3.47) and P D (f N ) = Z N P D (n)f N (n)dn (3.48) = v P FA (n 1 )v P D (n 1 ) + P FA (n 1 ) P FA (n 1 )v Z N P D (n)f N 0(n)dn (3.49) = v P FA (n 1 )v P Dsup + P FA (n 1 ) P FA (n 1 )v P D (f N 0) (3.50) > P D (f N 0) (3.51) 28 becauseP Dsup >P D (f N 0). ButP D (f N )>P D (f N 0) contradicts the optimality off N 0. SoP FA (f N 0) = iff N 0 is the optimal SR noise pdf and ifP D sup <P D + sup . Suppose now thatP D sup < P D + sup butG + =;. The definitions ofP D + sup andG + imply that there exists an n 1 2D + such thatP D (n 1 )> P D (f N 0) becauseP FA (f N 0) = v < . Thenf N again contradicts the optimality off N 0 if we definef N as in (3.43). Theorem 3.2 Suppose thatP D sup <P D + sup . Then (a)-(d) hold: (a) There existsk 0 such thatd + (k ) =d (k ) =d(k ) and minfd + (k);d (k)gd(k ) maxfd + (k);d (k)g for anyk 0. (b) Suppose the noise pdf f N satisfies P D (f N ) = d(k ) > P D (0) and P FA (f N ) = . Then f N is a Neyman-Pearson optimal noise pdf. Sod(k ) is the optimal N-P SR detection probabilityP D opt . (c) Suppose that there exist n 1 2 D and n 2 2 D + such that g(n 1 ;k ) = d (k ) = d(k ) = g(n 2 ;k ) =d + (k ). Then f N opt (n) = (nn 1 ) + (1)(nn 2 ) (3.52) is the optimal Neyman-Pearson SR noise pdf ifd(k )>P D (0) and if = P FA (n 2 ) P FA (n 2 )P FA (n 1 ) : (3.53) 29 (d) Neyman-Pearson optimal SR noise does not exist if (c) does not hold. But there does exist a noise pdf sequenceff Nr g 1 r=1 of the form (3.52)-(3.53) such that lim r!1 P D (f Nr ) = d(k ): (3.54) The optimal noise pdf is not unique if more than one pair of noise realizations satisfy Theorem 3.2(c). 
Proof:

Part (a): Definitions (3.14)-(3.16) imply that d_+(k) and d_-(k) are continuous functions of k. d_+(k) is further an unbounded and decreasing function of k. But d_-(k) is a nondecreasing function of k because P_{D-}^{sup} < P_{D+}^{sup}. So there exists k* > 1 such that d_+(k*) = d_-(k*) if d_+(1) > d_-(1). There likewise exists 0 ≤ k* ≤ 1 such that d_+(k*) = d_-(k*) if d_+(1) < d_-(1) because d_+(0) = P_{D+}^{sup} > P_{D-}^{sup} = d_-(0). Thus there exists k* ≥ 0 such that d_+(k*) = d_-(k*) = d(k*) and min{d_+(k), d_-(k)} ≤ d(k*) ≤ max{d_+(k), d_-(k)} for any k ≥ 0.

Part (b): Part (a) above and (3.27) imply that

  min_{k ≥ 0} sup_{f_N ∈ F} L(f_N, k) = min_{k ≥ 0} max{d_-(k), d_+(k)} = d(k*).        (3.55)

Let f_N be a noise pdf such that P_D(f_N) = d(k*) > P_D(0) and P_FA(f_N) = α. Then

  L(f_N, k*) = ∫_𝒩 (P_D(n) − k* (P_FA(n) − α)) f_N(n) dn        (3.56)
             = ∫_𝒩 P_D(n) f_N(n) dn        (3.57)
             = P_D(f_N) = d(k*)        (3.58)
             = min_{k ≥ 0} sup_{f_N ∈ F} L(f_N, k)   by (3.55).        (3.59)

So f_N is the optimal SR noise pdf.

Part (c): Suppose that there exist n_1 ∈ D_- and n_2 ∈ D_+ such that g(n_1, k*) = d(k*) = g(n_2, k*). Define

  f_N(n) = λ δ(n − n_1) + (1 − λ) δ(n − n_2)        (3.60)

where

  λ = (P_FA(n_2) − α)/(P_FA(n_2) − P_FA(n_1)).        (3.61)

Then

  P_D(f_N) = ∫_𝒩 P_D(n) f_N(n) dn        (3.62)
           = ∫_𝒩 P_D(n) [λ δ(n − n_1) + (1 − λ) δ(n − n_2)] dn        (3.63)
           = λ P_D(n_1) + (1 − λ) P_D(n_2)        (3.64)
           = λ d(k*) + (1 − λ) d(k*) = d(k*)        (3.65)

and

  P_FA(f_N) = ∫_𝒩 P_FA(n) f_N(n) dn        (3.66)
            = ∫_𝒩 P_FA(n) [λ δ(n − n_1) + (1 − λ) δ(n − n_2)] dn        (3.67)
            = λ P_FA(n_1) + (1 − λ) P_FA(n_2)        (3.68)
            = [(P_FA(n_2) − α)/(P_FA(n_2) − P_FA(n_1))] P_FA(n_1) + [1 − (P_FA(n_2) − α)/(P_FA(n_2) − P_FA(n_1))] P_FA(n_2)        (3.69)
            = α.        (3.70)

Then (3.65), (3.70), and the result of Part (b) imply that f_N(n) is an optimal SR noise pdf. This optimal noise pdf is not unique if more than one pair of noise realizations satisfies the hypothesis of Theorem 3.2(c).

Part (d): Define

  H_+ = {n ∈ D_+ : g(n, k*) = d(k*)}  and        (3.71)
  H_- = {n ∈ D_- : g(n, k*) = d(k*)}.        (3.72)

Suppose that H_+ ≠ ∅ but H_- = ∅. So there exists n_1 ∈ D_+ such that g(n_1, k*) = d(k*) but there does not exist n ∈ D_- such that g(n, k*) = d(k*). Then g(n, k*) < d_-(k*) = d(k*) for all n ∈ D_- by the definition of d_-(k*). Suppose f_{N'} is the optimal SR noise pdf with P_FA(f_{N'}) = α. Then (with D_0 = D_+ ∩ D_-)

  P_D(f_{N'}) = ∫_𝒩 g(n, k*) f_{N'}(n) dn        (3.73)
              = ∫_{D_+} g(n, k*) f_{N'}(n) dn + ∫_{D_- \ D_0} g(n, k*) f_{N'}(n) dn        (3.74)
              ≤ ∫_{D_+} d(k*) f_{N'}(n) dn + ∫_{D_- \ D_0} g(n, k*) f_{N'}(n) dn        (3.75)
              < ∫_{D_+} d(k*) f_{N'}(n) dn + ∫_{D_- \ D_0} d(k*) f_{N'}(n) dn        (3.76)
                 (because g(n, k*) < d(k*) for all n ∈ D_- and because
                  ∫_{D_- \ D_0} [d(k*) − g(n, k*)] f_{N'}(n) dn = 0 iff [d(k*) − g(n, k*)] f_{N'}(n) = 0
                  almost everywhere on D_- \ D_0)
              = d(k*).        (3.77)

The supremum definition of d_-(k*) and d(k*) = d_-(k*) imply [227] that there exists a sequence of noise realizations {n_r} in D_- such that

  lim_{r→∞} g(n_r, k*) = d(k*).        (3.78)

Now define a sequence of pdfs

  f_{N_r}(n) = λ_r δ(n − n_1) + (1 − λ_r) δ(n − n_r)        (3.79)

where

  λ_r = (P_FA(n_r) − α)/(P_FA(n_r) − P_FA(n_1)).        (3.80)

Then P_FA(f_{N_r}) = α for all r and

  lim_{r→∞} P_D(f_{N_r}) = lim_{r→∞} ∫_𝒩 g(n, k*) f_{N_r}(n) dn        (3.81)
                          = lim_{r→∞} [λ_r g(n_1, k*) + (1 − λ_r) g(n_r, k*)]        (3.82)
                          = lim_{r→∞} [λ_r d(k*) + (1 − λ_r) g(n_r, k*)]        (3.83)
                          = d(k*) lim_{r→∞} λ_r + lim_{r→∞} g(n_r, k*) lim_{r→∞} (1 − λ_r)        (3.84)
                          = d(k*) lim_{r→∞} λ_r + d(k*) lim_{r→∞} (1 − λ_r)   by (3.78)        (3.85)
                          = d(k*).        (3.86)

Hence P_FA(f_{N_r}) = α and there exists a positive integer l such that P_D(f_{N_r}) > P_D(f_{N'}) for all r ≥ l. This contradicts the optimality of f_{N'}. So optimal SR noise does not exist if H_- = ∅ and H_+ ≠ ∅. Similar arguments also prove the nonexistence of optimal SR noise in the more general case when either H_+ = ∅ or H_- = ∅.
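The two-point randomization in (3.52)-(3.53) is easy to check numerically. The Python sketch below uses the suboptimal detector of Figure 3.1 and two arbitrarily chosen noise realizations n_1 ∈ D_- and n_2 ∈ D_+ — not the optimal pair — to verify that the weight λ of (3.53) pins the randomized false-alarm probability exactly at α while the randomized detection probability is the corresponding convex combination, which here already beats the noiseless detector.

```python
import numpy as np
from scipy.stats import norm

# Suboptimal detector of Figure 3.1: reject H0 when X + n > theta,
# with H0: X ~ N(0, 3) (variance 3), H1: X ~ N(1, 1), and alpha = 0.05.
theta, alpha = 2.85, 0.05
p_fa = lambda n: 1.0 - norm.cdf((theta - n) / np.sqrt(3.0))
p_d  = lambda n: 1.0 - norm.cdf(theta - 1.0 - n)

n1, n2 = -1.0, 1.5                                   # arbitrary picks with n1 in D- and n2 in D+
lam = (p_fa(n2) - alpha) / (p_fa(n2) - p_fa(n1))     # randomization weight from (3.53)

pfa_mix = lam * p_fa(n1) + (1 - lam) * p_fa(n2)      # false-alarm probability of the two-point noise
pd_mix  = lam * p_d(n1) + (1 - lam) * p_d(n2)        # detection probability of the two-point noise
print(f"lambda = {lam:.3f}, P_FA = {pfa_mix:.4f} (alpha = {alpha}), "
      f"P_D = {pd_mix:.4f} vs noiseless {p_d(0.0):.4f}")
```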
The optimal randomization in (3.52)-(3.53) of two noise realizations resembles the optimal “min- imax” randomization of pure strategies in finite zero-sum games [259, 165, 196]. But the noise result differs because the minimax optimization does not impose inequality constraints on the expected payoff. Theorem 3.2 also implies the following necessary conditions for the optimal N-P SR noise. 34 Corollary 3.1 Suppose thatP D sup <P D + sup and thatP D andP FA are differentiable in the interior of the noise spaceN . (a) Suppose thatf N opt is an optimal N-P SR noise pdf of the form (3.52)-(3.53) in Theorem 3.2(c) and thatn 1 andn 2 of (3.52)-(3.53) are the interior points ofN . Thenn 1 andn 2 satisfy P D (n 1 ) kP FA (n 1 ) = P D (n 2 ) kP FA (n 2 ) (3.87) rP D (n 1 ) krP FA (n 1 ) = 0 (3.88) rP D (n 2 ) krP FA (n 2 ) = 0 (3.89) for somek 0. (b) Suppose further that for eachk 0 at most one solution ofrP D (n) -krP FA (n) = 0 inR m is a global maximum ofP D (n)k(P FA (n)). Thenf N opt =(nn ) is the optimal N-P SR noise pdf if such a solution n exists inD 0 . There is otherwise no optimal N-P SR noise in the interior ofN . Proof: Part (a) Say that f N opt is an optimal SR noise pdf of the form (3.52)-(3.53) in Theorem 3.2(c). Then g(n 1 ;k ) =d(k ) =g(n 2 ;k ) for somek 0 and so (3.87) follows. The definition ofd(k ) im- plies thatn 1 andn 2 are maximal points ofg(n;k) fork =k . Son 1 andn 2 satisfy (3.88)-(3.89) fork =k ifP D andP FA are differentiable in the interior ofN and if n 1 and n 2 of (3.52)-(3.53) are the interior points ofN . 35 Part (b) Theorem 3.2(a) implies that there existsk 0 such thatd (k ) =d(k ) =d + (k ). Now either there exist n 1 2 D + and n 2 2 D such thatg(n 1 ;k ) =d + (k ) =d(k ) =g(n 2 ;k ) =d (k ) or else the optimal SR noise does not exist by Theorem 3.2(d). The former case implies that n 1 and n 2 are solutions ofrP D (n)k rP FA (n) = 0. But the hypothesis implies that at most one solution ofrP D (n)krP FA (n) = 0 inR m is a global maximum ofP D (n)k(P FA (n)) for eachk 0. Son 1 =n 2 = (say)n 2D 0 =D \D + . ThenP FA (n ) = 0 andf N opt =(nn ) is the optimal noise pdf by Theorem 3.2(c). Equalities (3.87)-(3.89) are necessary but not sufficient because the noise realizationsn 1 and n 2 that satisfy (3.88)-(3.89) need not be global maxima. They need not be inD + andD even if they are global maxima. So (3.87)-(3.89) may not help find the optimal SR noise. But Corollary 3.1(b) shows that these necessary conditions can suggest when optimal SR noise does not exist. Section 3.5.1 applies Corollary 3.1(b) to a hypothesis test between two Gaussian densities. Theorem 3.3 gives a sufficient condition to detect an N-P SR noise benefit in detectors that use a single noisy observationY2R to decide betweenH 0 andH 1 . 36 Theorem 3.3 Let the detection and false-alarm probabilitiesP D andP FA be real-valued functions that are dif- ferentiable in a neighborhood of 0. Suppose that P 00 D (0) and P 00 FA (0) exist and that P FA (0) = . Suppose also that P FA does not have a local minimum at 0 and that P D does not have a local maximum at 0. Then an N-P SR noise exists if P 00 D (0) P 0 FA (0) > P 00 FA (0) P 0 D (0) (3.90) or if sgn(P 0 FA (0)) sgn(P 0 D (0)) 0. Proof: The local version of Taylor’s theorem [134] gives P D (r)P D (0) = P 0 D (0)r +P 00 D (0) r 2 2 + 1 (r) (3.91) where 1 (r) r 2 ! 0 asr! 0. 
Likewise P D (0)P D (r) = P 0 D (0)rP 00 D (0) r 2 2 2 (r) (3.92) P FA (r)P FA (0) = P 0 FA (0)r +P 00 FA (0) r 2 2 + 3 (r) (3.93) P FA (0)P FA (r) = P 0 FA (0)rP 00 FA (0) r 2 2 4 (r) (3.94) where i (r) r 2 ! 0 asr! 0 fori = 2, 3, and 4. 37 Now define max (r) = maxf 1 (r); 2 (r); 3 (r); 4 (r)g: (3.95) Suppose first that bothP FA andP D increase at 0. Hence 0<P 00 D (0)P 0 FA (0)P 00 FA (0)P 0 D (0) because of (3.90). Then (3.91)-(3.95) imply that there exists> 0 such that for all 0<r< 2 max (r)[P 0 D (0) +P 0 FA (0)] r 2 < P 00 D (0)P 0 FA (0)P 00 FA (0)P 0 D (0) (3.96) and hence max (r)[P 0 D (0)r +P 0 FA (0)r] < P 00 D (0) r 2 2 P 0 FA (0)rP 00 FA (0) r 2 2 P 0 D (0)r: (3.97) Rewrite the above inequality as 2P 0 D (0)rP 00 FA (0) r 2 2 + 2P 0 D (0)r max (r) < 2P 0 FA (0)rP 00 D (0) r 2 2 2P 0 FA (0)r max (r): (3.98) This is equivalent to [P 0 FA (0)r +P 00 FA (0) r 2 2 + max (r)][P 0 D (0)rP 00 D (0) r 2 2 + max (r)] < [P 0 FA (0)rP 00 FA (0) r 2 2 max (r)][P 0 D (0)r +P 00 D (0) r 2 2 max (r)]: (3.99) Then [P FA (r)P FA (0)][P D (0)P D (r)] < [P FA (0)P FA (r)][P D (r)P D (0)] (3.100) 38 for all 0<r< because of (3.91)-(3.95). Rewrite (3.100) as P D (0) < [P FA (r)P FA (0)] [P FA (r)P FA (r)] P D (r) + [P FA (0)P FA (r)] [P FA (r)P FA (r)] P D (r): (3.101) This inequality implies that f N (n) = (n +r) + (1)(nr) with (3.102) = [P FA (r)P FA (0)] [P FA (r)P FA (r)] (3.103) is an SR noise pdf. Suppose now that bothP FA andP D decrease at 0. Hence 0<P 00 FA (0)P 0 D (0)P 00 D (0)P 0 FA (0) and there exists> 0 such that for all 0<r< -2 max (r)[P 0 D (0)+P 0 FA (0)] r 2 < P 00 FA (0)P 0 D (0)P 00 D (0)P 0 FA (0): (3.104) Then similar arguments show that (3.102)-(3.103) again give an SR noise pdf. All other cases obey either P D (0) < P D (r) and P FA (0) P FA (r) or P D (0) < P D (r) and P FA (0)P FA (r) becauseP FA does not have a local minimum at 0 and becauseP D does not have a local maximum at 0 by hypothesis. So eitherf N (n) =(n +r) orf N (n) =(nr) gives an SR noise pdf. 39 Theorem 3.3 implies the following corollary. It gives a sufficient condition for an SR noise benefit in N-P signal detectors if they partition the real lineR into acceptance and rejection in- tervals and if they use a single noisy observationY2R to decide betweenH 0 andH 1 . Corollary 3.2: Suppose that the thresholds =f 1 ;:::; k g partition the real lineR into accep- tance and rejection regions and thatP FA (0) =. Suppose also that the hypothesized pdfsf i are differentiable at all the thresholds in . Defines(j) = 1 if j is a left endpoint of any rejection interval. Else lets(j) =1. Then additive noise can improve the N-P detection of such a detector ifP FA does not have a local minimum at 0, ifP D does not have a local maximum at 0, and if P j s(j)f 0 0 ( j ) P j s(j)f 1 ( j ) > P j s(j)f 0 1 ( j ) P j s(j)f 0 ( j ) (3.105) or sgn( P j s(j)f 0 ( j ))sgn( P j s(j)f 1 ( j )) 0. The inequality (3.105) holds because the hypotheses of Corollary 3.2 imply that P 00 D (0)jP 0 FA (0)j = P j s(j)f 0 1 ( j ) P j s(j)f 0 ( j ) and P 00 FA (0)jP 0 D (0)j = P j s(j)f 0 0 ( j ) P j s(j)f 1 ( j ) in (3.90). 40 3.3 N-P SR Noise Finding Algorithm This section presents an algorithm for finding a near-optimal SR noise density. Theorems 3.1 and 3.2 give the exact form for the optimal SR noise pdf. But such a noise pdf may not be easy to find in a given noise spaceN . So we present an algorithm that uses Theorems 3.1 and 3.2 and succes- sive approximations to find a near-optimal SR noise from a finite set of noise realizations ~ NN . 
The algorithm takes as input, , ~ N in (3.9)-(3.19), and the respective detection and false alarm probabilitiesP D andP FA in ~ N . The algorithm first searches for a constant noise from the setG if the inequalityP D sup P D + sup holds. The algorithm otherwise finds a numberk(i) at each iterationi such thatjd (k(i))d(k )j 2 i+1 and thisk(i) givesjd + (k(i))d (k(i))j < in at mosti max =dlog 2 (2=)e+1 iterations. Then the algorithm defines the noise ~ N 0 as the random combination of ~ n 1 2D and ~ n 2 2D + so thatg(~ n 1 ;k(i max )) =d (k(i max )),g(~ n 2 ;k(i max )) =d + (k(i max )), andP FA (f ~ N 0) =. Theorem 4(a) below shows that for all > 0 the algorithm finds an SR noise ~ N 0 from ~ N in at most i max =dlog 2 (2=)e+1 iterations such that 0 P D (f ~ N opt )P D (f ~ N 0) . Here ~ N opt is the optimal N-P SR noise in ~ N and f ~ N opt is the pdf of ~ N opt . Theorem 4(b) shows that 0 P D (f N opt )P D (f ~ N 0) ( +) if for eachn2N there exists an ~ n2 ~ N so that jP D (n)P D (~ n)j and (3.106) P FA (~ n) P FA (n) (3.107) and if N opt is the optimal N-P SR noise inN with pdf f N opt . Thus the algorithm will find a near-optimal noise ~ N 0 for any small positive if we choose ~ N such that is sufficiently small. 41 SR Noise Finding Algorithm Let D + = f~ n2 ~ N : (P FA (~ n)) 0g and D = f~ n2 ~ N : (P FA (~ n)) 0g Let P D + sup = maxfP D (~ n) : ~ n2D + g and P D sup = maxfP D (~ n) : ~ n2D g Let G = f~ n2D :P D (~ n) =P D sup g If P D sup P D + sup f ~ N opt (n) = (n ~ n 0 ) for any ~ n 0 2G Else Let D 0 = f~ n2 ~ N : (P FA (~ n)) = 0g and k(0) = 1 Let d (k(0)) = maxfP D (~ n) (P FA (~ n)) : ~ n2D g Let d + (k(0)) = maxfP D (~ n) (P FA (~ n)) : ~ n2D + g Let ds(1) = d (k(0)) and df(1) = d + (k(0)) Let i = 1 and i stop = l log 2 2 m Whilejd (k(i))d + (k(i))j > and ii stop Let dr(i) = (ds(i) +df(i))=2 Let k(i) = minf(P D (~ n)dr(i))=(P FA (~ n)) : ~ n2D nD 0 g Let d + (k(i)) = maxfP D (~ n)k(i)(P FA (~ n)) : ~ n2D + g Let d (k(i)) = dr(i) and ds(i + 1) = dr(i) If d + (k(i)) > d (k(i)) Let df(i + 1) = min d + (k(i); maxfds(i);df(i)g Else Let df(i + 1) = max d + (k(i); minfds(i);df(i)g End If Leti =i + 1 End While Ifjd + (k(i 1))d (k(i 1))j > Let t = sgn[d + (k(i 1))d (k(i 1))] Let k(i) = maxf P D (~ n) d (k(i 1)) +t =(P FA (~ n))) : ~ n2D + nD 0 g Let d + (k(i)) = d (k(i 1)) +t Let d (k(i)) = maxfP D (~ n)k(i)(P FA (~ n)) : ~ n2D g Else Let k(i) =k(i 1) End If f ~ N 0(n) = (n ~ n 1 ) + (1)(n ~ n 2 ) () where ~ n 1 2D : P D (~ n 1 )k(i)(P FA (~ n 1 )) = d (k(i)), ~ n 2 2D + : P D (~ n 2 )k(i)(P FA (~ n 2 )) = d + (k(i)), and = P FA (~ n 2 ) P FA (~ n 2 )P FA (~ n 1 ) End If 42 Theorem 3.4 (a) Pick any> 0. Then the above algorithm finds an N-P SR noise ~ N 0 from ~ N in at mosti max =dlog 2 (2=)e+1 iterations so that P D (f ~ N opt ) P D (f ~ N 0) P D (f ~ N opt ) and (3.108) P FA (f ~ N 0) : (3.109) (b) The suboptimal detection performance with noise ~ N 0 is at most+ less than the optimal SR detection with noiseN opt if ~ N satisfies (3.106)-(3.107): P D (f N opt ) P D (f ~ N 0) P D (f N opt ) ( +): (3.110) Proof: Part (a) The setG is nonempty and finite because the noise realization vector ~ N is finite. Suppose that P D sup P D + sup . Then Theorem 3.1(a) implies that the optimal noise ~ N opt in ~ N has the form f ~ N opt (n) = (n ~ n 0 ) for any ~ n 0 2G . The algorithm finds such noise ifP D sup P D + sup . Suppose now thatP D sup <P D + sup . Then there existsk 0 such thatd + (k ) =d (k ) = d(k ) by Theorem 3.2(a). 
Also there exist ~ n 1 2D and ~ n 2 2D + such thatg(~ n 1 ;k ) =d (k ) =d(k ) =d + (k ) =g(~ n 2 ;k ) because ~ N is finite. Then Theorem 3.2(c) implies that the optimal 43 SR noise ~ N opt in ~ N has the pdff ~ N opt of the form (3.52). P D (f ~ N opt ) = by Theorem 3.1(b) while Theorem 3.2(b) implies thatP D (f ~ N opt ) =d(k ). The algorithm finds an SR noise ~ N 0 from ~ N and its pdff ~ N 0 is of the form () in the algorithm. This pdff ~ N 0 satisfies (3.109) with equality. So we need to show only thatf ~ N 0 satisfies (3.108). We first show that the SR noise pdf f ~ N 0 satisfies (3.108) ifjd (k(i)) d(k )j and jd + (k(i))d(k )j for somei. Theorem 3.2(a) implies thatd + (k(i)) d(k ) ifd (k(i)) d(k ) andd + (k(i))d(k ) ifd (k(i))d(k ). So suppose first thatd(k )d (k(i)) andd + (k(i))d(k ). Let ~ n 1 2D and ~ n 2 2D + such thatg(~ n 1 ;k(i) =d (k(i)) andg(~ n 2 ;k ) =d + (k(i)). Note thatP FA (f ~ N 0) = R ~ N P FA (n)f ~ N 0(n)d(n) =. Then P D (f ~ N 0) = Z ~ N (P D (n)k(P FA (n)))f ~ N 0(n)d(n) (3.111) = g(~ n 1 ;k(i)) + (1)g(~ n 2 ;k(i)) (3.112) = d (k(i)) + (1)d + (k(i)) (3.113) (d(k )) + (1)d(k ) (3.114) becaused(k )d (k(i)) and d + (k(i)) d(k ) = d(k ) d(k ) because 0 1 (3.115) = P D (f ~ N opt ): (3.116) Similar arguments show thatP D (f ~ N 0)d(k ) if d(k )d + (k(i)) andd (k(i))d(k ). So we need show only thatjd (k(i))d(k )j andjd + (k(i))d(k )j for somei. We now show thatjd (k(i))d(k )j 2 2 i for alli. This implies thatjd (k(i))d(k )j fori =i stop wherei stop = l log 2 2 m . We then prove thatjd + (k(i))d(k )j fori =i stop +1. 44 We use mathematical induction oni to prove thatjd (k(i))d(k )j 2 2 i for alli. Basis Step (i = 1): The definition ofdr(i) givesdr(1) = ds(1) +df(1) 2 whereds(1) =d (k(0)), df(1) = d + (k(0)), and k(0) = 1. Both ds(1) and df(1) are between 0 and 2 by hypothesis. Sojds(1)df(1)j 2. Thenjdr(1)ds(1)j =jdr(1)df(1)j = jds(1)df(1)j 2 2 2 . Theo- rem 3.2(a) implies that d(k ) is between d (k(0)) (= ds(1)) and d + (k(0)) (= df(1)). Then jdr(1)d(k )jjdr(1)ds(1)j 2 2 . Andd (k(1)) =dr(1) holds in the algorithm becausek(1) = minf(P D (~ n)dr(1))=(P FA (~ n)) : ~ n2D nD 0 g. Hencejd (k(1))d(k )j 2 2 . Induction Hypothesis (i =m): Suppose thatjds(m)df(m)j 2 2 m1 . Then the definition of dr(i) implies thatjdr(m)ds(m)j =jdr(m)df(m)j = jds(m)df(m)j 2 2 2 m . Suppose also thatd(k ) lies betweends(m) anddf(m). Thenjdr(m)d(k )j 2 2 m . Note that d (k(m)) = dr(m) wherek(m) = minf(P D (~ n)dr(m))=(P FA (~ n)) : ~ n2D nD 0 g in the algorithm. So jd (k(m))d(k )j 2 2 m . Induction Step (i =m + 1): d(k ) lies between d (k(m)) and d + (k(m)) by Theorem 3.2(a). Suppose thatd + (k(m))>d (k(m)).d(k ) also lies betweend (k(m)) and maxfds(m);df(m)g becaused (k(m)) =dr(m) = ds(m) +df(m) 2 by definition. Sod(k ) lies between minfd + (k(m)); maxfds(m);df(m)gg andd (k(m)) and whered (k(m)) minfd + (k(m)); maxfds(m);df(m)gg. The algorithm definesds(m+1) =d (k(m)) =dr(m) anddf(m+1) = minfd + (k(m)); maxfds(m);df(m)gg. Then dr(m) < df(m + 1) and d(k ) lies between ds(m + 1) and df(m + 1). 45 Writejds(m+1)df(m+1)j =jdr(m)df(m+1)jjdr(m)maxfds(m);df(m)gj 2 2 m . The last inequality follows from the induction hypothesis while the first inequality follows from the definition ofdf(m+1) and the fact thatdr(m)<df(m + 1). Thenjd (k(m+1))df(m+1)j =jdr(m+1) df(m+1)j =jdr(m+1) ds(m+1)j = jds(m+1)df(m+1)j 2 2 2 m+1 because d (k(m+1)) =dr(m+1) = ds(m+1) +df(m+1) 2 andjds(m+1)df(m+1)j 2 2 m . This proves the induction claimjd (k(m+1))d(k )j 2 2 m+1 because d(k ) lies between ds(m+1) and df(m+1). 
Similar arguments prove the induction claim whend + (k(m+1))<d (k(m+1)). We have so far proved thatjd (k(i))d(k )j fori =i stop . But this need not imply that jd + (k(i stop ))d (k(i stop ))j. The algorithm findsk(i stop +1) such thatd + (k(i stop +1)) =d (k(i stop )) +t wheret = sgn[d + (k(i stop ))d (k(i stop ))] ifjd + (k(i stop ))d (k(i stop ))j>. This implies that jd (k(i stop +1))d + (k(i stop +1))j becaused(k ) lies betweend (k(i stop +1)) andd + (k(i stop +1)) by Theorem 3.2 (a) and becausejd (k(i stop +1))d(k )j. Part (b) The optimal SR noise pdff N opt is of the form (3.52)-(3.53) and soP FA (f N opt ) =P FA (n 1 ) + (1)P FA (n 2 ) =. There exist ~ n 1 and ~ n 2 in ~ N that satisfy (3.106)-(3.107) by hypothesis. Define ~ N 00 as a noise restricted to ~ N with pdf f ~ N 00(n) = (n ~ n 1 ) + (1)(n ~ n 2 ): (3.117) Then = P FA (f N opt ) P FA (~ n 1 ) + (1)P FA (~ n 2 ) because of (3.107) (3.118) = P FA (f ~ N 00) because of (3.117) (3.119) 46 and P D (f N opt ) = P D (n 1 ) + (1)P D (n 2 ) (3.120) (P D (~ n 1 ) +) + (1)(P D (~ n 2 ) +) because of (3.106) (3.121) = P D (f ~ N 00) + because of (3.117): (3.122) Inequalities (3.119) and (3.122) and the fact that ~ N 00 is restricted to ~ N imply that P D (f ~ N opt ) P D (f ~ N 00) P D (f N opt ) (3.123) if ~ N opt is the optimal SR noise restricted to ~ N such thatP FA (f ~ N opt ). Then inequality (3.110) follows from (3.123) and the result of Part (a). We next show that the noise-finding algorithm is much faster than exhaustive search ifP D sup < P D + sup for large number of noise realizations. Suppose that the number of elements in the discretized setsD ,D + , andD 0 are respectivelyM,T , andV . Then Theorem 3.2(c) implies that exhaustive search needs to consider all MT noise realization pairs (~ n ; ~ n + )2 D D + and find the pair that corresponds to the maximal value ofP D (~ n ) + (1)P D (~ n + ) where = P FA (~ n + ) P FA (~ n + )P FA (~ n ) to find the optimal noise ~ N opt in ~ N . Thus exhaustive search needs to find the maximum from the set of M T elements. But the algorithm first finds the maximum from the smaller set of only M elements and from the smaller set of only T elements to find the respective values of d (k(0)) and d + (k(0)). The algorithm next finds the minimum from the setf (P D (~ n)dr(i)) (P FA (~ n)) : ~ n2 D nD 0 g of only MV elements and finds the maximum from the setfP D (~ n)k(i)(P FA (~ n)) : ~ n2 D + g of onlyT elements to get the respective values ofk(i) andd + (k(i)) at each iterationi = 1, ...,dlog 2 (2=)e. Then it finds the maximum from 47 the set of TV elements and from the set of M elements during the last iteration if needed. Theorem 3.4(a) shows that the algorithm finds the near-optimal noise ~ N 0 or randomization in just i max =dlog 2 (2=)e + 1 iterations such thatP D (f ~ N opt )P D (f ~ N 0). So the algorithm is faster than exhaustive search even for very small values of if bothM andT are large. The two applications in Section 3.5 specifically show that the noise-finding algorithm requires fewer iterations than the upper-bound number of iterationsi max . 3.4 Noise Benefits in Inequality-Constrained Statistical Decisions Problems We show that randomization and noise benefits extend beyond Neyman-Pearson signal detec- tion. 
Researchers have found randomization benefits in inequality-constrained communication problems to improve information rates [63, 233] and in inequality-constrained statistical deci- sions such as average-power-constrained signal transmission and jamming strategies [16] as well as in pricing and scheduling for a network access point that maximizes the time-average profit subject to an average-transmission-rate constraint [114]. Azizoglu [16] showed that the optimal channel switching strategy for an average-power-constrained transmitter time-shares between at most two channels and power levels to minimize the receiver-end error probability in multiple additive noise channels. Huang and Neely [114] studied a pricing and transmission scheduling problem for a network access point to maximize the time-average profit. They showed that ran- domization of at most two business-price tuples suffices for the access point to achieve its optimal time-average profit under the average-transmission-rate constraint. These examples are all spe- cial cases of noise benefits in expected payoff maximization in statistical decision problems that 48 have one inequality constraint on the expected cost. We now extend the previous results to a broad class of expected payoff maximization prob- lems subject to an inequality-constrained expected cost such that the payoff and cost are real- valued and bounded nonnegative Borel-measurable functions on the noise spaceNR m . Let h be the payoff function and c be the cost function. We want to maximize the average payoff E f N (h(N)) = R N h(n)f N (n)dn subject to the average cost constraintE f N (c(N)) = R N c(n)f N (n)dn . Define the noiseN as SR noise if its addition improves the expected payoffE f N (h(N)) while the expected costE f N (c(N)) stays at or below a preset maximum expected cost . Suppose that the respective cost and payoff in the absence of noise arec(0) andh(0). ThenE f N (h(N))> h(0) andE f N (c(N)) hold if N is SR noise. We want to find the optimal SR noise N opt such thatE f N (h(N))E f N opt (h(N opt )) for any other SR noiseN and such thatE f N opt (c(N opt )) . Theorems 3.1-3.4 hold if we replaceP D ,P FA , and in (3.9)-(3.110) with the respective real- valued bounded nonnegative payoff functionh, cost functionc, and the preset maximum expected cost . Theorem 3.4(a) holds withi max =dlog 2 (=)e+1 if=2 bounds bothh andc. So the SR noise-finding algorithm can find a near-optimal SR noise ~ N 0 that improves the expected payoff in statistical decisions with one inequality constraint on the expected cost if we choose a small enough and a set of noise realizations ~ N such that is sufficiently small in (3.106). We omit the proofs of the above statements because they are substantially the same as the proofs of The- orems 3.1-3.4 but with minor notational changes. The next section applies the algorithm to find a near-optimal signal power randomization for a power-constrained anti-podal signal transmitter that improves the MAP detection in Gaussian-mixture noise. 49 3.5 Applications of the SR Noise-Finding Algorithm This section presents two applications of the SR noise-finding algorithm. The first application finds a near-optimal SR noise for a suboptimal one-sample Neyman-Pearson hypothesis test be- tween two Gaussian distributions. The second application gives a near-optimal signal power randomization for a power-constrained anti-podal signal transmitter in additive Gaussian-mixture channel noise where the receiver uses optimal MAP signal detection. 
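Before turning to these two applications, the following toy sketch shows the payoff/cost version of the two-point randomization of Section 3.4 in its simplest form. The payoff h and cost c below are hypothetical functions chosen only to make the payoff-versus-cost curve nonconcave; the brute-force pair search stands in for the noise-finding algorithm and illustrates why time-sharing between two operating points can beat constant operation at the same average cost, as in the tangent-line construction of Figure 3.3.

```python
import numpy as np

# Toy payoff h and cost c over a grid of operating points (both hypothetical,
# chosen only so that the payoff-versus-cost curve is nonconcave).
s = np.linspace(0.0, 10.0, 101)
h = 1.0 / (1.0 + np.exp(-(s - 6.0)))     # bounded nonnegative payoff
c = s                                     # bounded nonnegative cost
Gamma = 4.0                               # maximum allowed expected cost

best = h[np.argmin(np.abs(s - Gamma))]    # payoff of the constant (no-randomization) choice
pair = None
for i in range(len(s)):                   # brute-force search over two-point randomizations
    for j in range(len(s)):
        if c[i] <= Gamma <= c[j] and c[j] > c[i]:
            lam = (c[j] - Gamma) / (c[j] - c[i])      # weight that meets E[c] = Gamma exactly
            val = lam * h[i] + (1.0 - lam) * h[j]     # resulting expected payoff
            if val > best:
                best, pair = val, (s[i], s[j], round(lam, 3))

print("best expected payoff:", round(best, 3), "via randomization", pair)
```

The randomized pair dominates the constant choice here precisely because the payoff is nonconcave in the cost, which is the same mechanism behind the power randomization of Section 3.5.2.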
3.5.1 Near-Optimal SR noise for a Suboptimal One-Sample Neyman-Pearson Hypothesis Test of Variance Consider a hypothesis test between two Gaussian densitiesH 0 : f 0 (x) = 1 p 6 e x 2 6 vs.H 1 : f 1 (x) = 1 p 2 e (x1) 2 2 where we want to decide betweenH 0 andH 1 using only a single observation ofX at the significance level = 0.05. Figure 1(a) shows bothf 0 andf 1 . Note that the likelihood ratio f 1 (x)=f 0 (x) = p 3e 2x 2 6x+3 6 is not monotonic. So the optimal Neyman-Pearson test function requires the two thresholds 1 = 1.342 and 2 = 1.658 for the optimal decision. It rejectsH 0 if X2 [ 1 ; 2 ] and if the test significance or preset false-alarm probability is = 0.05. The optimal N-P detection probability is 0.11. Suppose that the detection system can use only one threshold due to resource limits or some design constraint. Such a suboptimal detector must rejectH 0 whenX >. Then = 2.85 for =P FA (0) = 0.05. The noiseless detection probability isP D (0) = 0.0322. HereP D (n) = 1-(- 1-n) and P FA (n) = 1-( n p 3 ) for the standard normal cumulative distribution function (z) = 1 p 2 R z 1 e w 2 =2 dw. 50 Then inequality (3.105) of Corollary 3.2 becomes ( 1)>=3. So = 2.85 satisfies this sufficient condition. So there exists additive SR noise N that improves the detection perfor- mance of the suboptimal detector. But the existence of this SR noise N does not itself imply the existence of optimal SR noiseN opt if the noise space isR. Note thatP D (n) andP FA (n) are monotone increasing onR andP D sup =P D (0)<P D + sup = 1. Then Theorem 3.1(b) implies that the corresponding false-alarm probability is = 0.05 if the optimal N-P SR noise exists. Suppose that the optimal N-P SR noise exists. Then the necessary condition of Corollary 3.1(a) becomes P 0 D (n)kP 0 FA (n) =f 1 (n) -kf 0 (n) = 0 wheref 0 andf 1 are the pdfs defined above. Then there existsu> 0 such thatf 1 (n)>kf 0 (n) for alln2 (n i ;n i +u) andf 1 (n)<kf 0 (n) for all n2 (n i u;n i ) ifn i maximizesP D (n)k(P FA (n)). Then at most one solution ofP 0 D (n) kP 0 FA (n) = 0 is a global maximum ofP D (n)k(P FA (n)) for eachk 0. So Corollary 3.1(b) implies that optimal N-P SR noise does not exist if the noise space isR. But the hypothesis of Theorem 3.2(c) does hold if we restrict the noise space to a compact interval (say [-3,3]) because P D (n) andP FA (n) are continuous functions ofn. We next apply the algorithm to find near optimal noise inN = [3; 3] for = 2 20 . Consider the discretized set ~ N of noise realizations starting from -3 up to 3 with an increment of 0.0001: ~ N = [-3:0.0001:3]. ~ N satisfies (3.106)-(3.107) for = 0.00004 because 0.4 boundsf 0 andf 1 . Figure 3.2 plotsg(~ n;k(i)) =P D (~ n) -k(i)(P FA (~ n)-) before the first iteration (i = 0) wherek(0) = 1 and after the 17 th iteration (i = 17). The noise-finding algorithm finds the value k(17) = 1.8031 in just 17 (<i max = 22) iterations such thatjd + (k(17)) -d (k(17))j< = 2 20 . Note that g(~ n 1 ,k(17)) =d (k(17)) at ~ n 1 = -32D and thatg(~ n 2 ,k(17)) =d + (k(17)) at ~ n 2 = 2.14332 D + . Then 51 −3 −2 −1 0 1 2 3 −0.05 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 Noise realization Signal detection without noise P D (0) = 0.0322 d − (k(0)) d − (k(17)) d + (k(0)) d + (k(17)) ˜n 2 =2.1433 SR detection improvement 0.09 − 0.0322 = 0.0578 ˜n 1 =− 3 d − (k(17)) = d + ((17)) = 0.0894 before iterations (i=0) k(0) = 1 iteration i = 17 k(17) = 1.8031 Significance level α = 0.05 Plots of P D ( vs. 
noise realizations ˜n P D (˜n)− k(i)[P FA (˜n)− α ] P D (˜n)− k(i)[P FA (˜n)− α ] ˜n ˜n) − k(i)[P FA (˜n) − α] Figure 3.2: Finding near-optimal Neyman-Pearson SR noise The two curves plotg(~ n;k(i)) =P D (~ n) -k(i)(P FA (~ n)) before the first iteration (i = 0) and after the 17 th iteration (i = 9) wherek(0) = 1. The detection probability isP D (0) = 0.0322 in the absence of additive noise. The noise-finding algorithm finds a value ofk(17) = 1.8031 in 17 iterations such thatjd + (k(17)) d (k(17))j< = 2 20 . Note thatg(~ n 1 ;k(17)) =d (k(17)) at ~ n 1 = -32D andg(~ n 2 ;k(17)) =d + (k(17)) at ~ n 2 = 2.14332D + . Then (3.124)-(3.125) give the pdf of a near-optimal N-P SR noise ~ N 0 andP D (f ~ N 0) = 0.0894. So the N-P SR noise ~ N 0 increases the detection probabilityP D 177% from 0.0322 to 0.0894. f ~ N 0(n) = (n + 3) + (1)(n 2:1433) (3.124) with = (P FA (2:1433) 0:05) (P FA (2:1433)P FA (3)) = 0:8547 (3.125) is the pdf of a near-optimal N-P additive SR noise ~ N 0 becauseP D (f ~ N 0) = 0.0894 and because The- orem 3.4(b) implies that the detection probabilityP D (f N opt ) for the optimal N-P SR noiseN opt inN = [3; 3] will be at most 0.00004 + 2 20 more thanP D (f ~ N 0). So the algorithm finds a near-optimal 52 N-P SR noise that gives a 177% increase in the detection probability of the single-threshold sub- optimal test from 0.0322 to 0.0894. The noise-enhanced detection probability 0.0894 is fairly close to the optimal N-P detection probability 0.11. 3.5.2 Near-Optimal Signal Power Randomization for a Power-Constrained Signal Transmitter The detection performance of a MAP receiver can sometimes benefit from signal power ran- domization or time-sharing in an average-power-constrained anti-podal signal transmitter if the channel noise pdf is not multimodal. The noise-finding algorithm finds a near-optimal signal power distribution or randomization in an average-power-constrained transmitter that improves the MAP receiver’s detection performance. Consider a signal detection hypothesis test where the transmitter transmits anti-podal signals X2fS;Sg whereS2S = [0.5, 3.75] and both signal values are equally likely:H 0 :X =S vs. H 1 : X =S andP (H 0 ) =P (S) =P (S) =P (H 1 ). Suppose that the transmitter can use at most 4.75 units of expected powerE(S 2 ) and that the receiver decides betweenH 0 andH 1 using a single noisy observationY =X +N. HereN is an additive symmetric Gaussian-mixture channel noise where the signal probability density isf 0 (y) = 1 2 p 2 e (y2S) 2 2 + 1 2 p 2 e (y+2S) 2 2 at the receiver under the hypothesisH 0 andf 1 (y) = 1 2 p 2 e (y2+S) 2 2 + 1 2 p 2 e (y+2+S) 2 2 under the hypothesisH 1 . Such Gaussian-mixture channel noise can occur due to co-channel interference in communication systems [26, 94, 257]. The receiver is optimal and hence it uses MAP signal detection to maximize the probability of correct decision. Then the optimal receiver rejectsH 0 53 if the likelihood ratio obeys f 1 (y)=f 0 (y) > P (H 0 )=P (H 1 ) = 1. We assume that the transmit- ter can time-share or randomize between the signal power levels so that the receiver knows the signal power valueS 2 but does not know whether the signal is +S orS. Then the detection region of this optimal receiver is not fixed because it depends on the signal-power randomization information. The power-constrained detection problem takes the following form. Letf S be the probability density of the transmitter’s signal strengthS. 
DefineP CD (f S ) as the probability of correct decision and letE f S (S 2 ) be the average signal power when the signal-strength pdf isf S . Then we want to findf S opt such thatP CD (f S )P CD (f S opt ) for anyf S such thatE f S opt (S 2 ) 4.75. We can view P CD (f S ) as the average payoffE f S (h(S)) and viewE f S (S 2 ) as the average costE f S (c(S)) with the maximum average cost = 4.75. So we can apply the SR noise finding algorithm to find a near-optimal signal-power pdff~ S 0 from a discretized set ~ S of signal-power realizations. We use ~ S = [0.5:0.0001:3.75] and = 2 20 . Then ~ S satisfies conditions (3.106)-(3.107) for = 0.0002 because 0.2 boundsf 0 andf 1 . Bothh andc have upper bound=2 = 10/2. So the iteration upper bound isi max =dlog 2 (=)e+1 = 24. The generalization of Theorem 3.3 to this example gives the sufficient condition P 00 CD (s)2s > 2jP 0 CD (s)j (3.126) for the SR noise benefits if the maximal average signal power iss 2 . Figure 3.3 plots the correct-decision probability P CD versus the signal power S 2 . The ran- domized signal power is optimal because the graph is nonconcave. Azizoglu [16] proved that the plot ofP CD versusS 2 is concave if the channel noise has finite variance and if it has a unimodal 54 0 2 4 6 8 10 12 14 0.65 0.7 0.75 0.8 0.85 0.9 0.95 Signal power S 2 Probability of Correct Detection P CD S 1 2 = 1.4908 SR Detection improvement d c b a Maximum average signal power E(S 2 ) = 4.75 S 2 2 = 9.3697 P CD = 0.7855 using constant power signaling P CD = 0.8421 using randomized power signaling Figure 3.3: SR noise (signal-strength randomization) benefits in optimal anti-podal signal detec- tion Signal power randomization in an average-power-constrained anti-podal transmitter improves the detection performance of the optimal receiver. The transmitter transmits anti-podal signalsX2fS;Sg such that E(S 2 ) = 4.75 and such that both signals values are equally likely: H 0 : X =S vs. H 1 : X = S andP (H 0 ) =P (S) =P (S) =P (H 0 ). The receiver receives the noisy observationY =X +N. N is symmetric Gaussian-mixture channel noise. The signal probability density isf 0 (y) = 1 2 p 2 e (y2S) 2 2 + 1 2 p 2 e (y+2S) 2 2 at the receiver under the hypothesisH 0 and isf 1 (y) = 1 2 p 2 e (y2+S) 2 2 + 1 2 p 2 e (y+2+S) 2 2 under the hypothesisH 1 . The receiver uses the single noisy observationY and the optimal MAP decision rule to decide betweenH 0 andH 1 . The solid line shows the nonmonotonic and nonconcave plot of the probability of correct decisionP CD versus the signal powerS 2 . Nonconcavity of the plot between the points b andc allows the SR effect to occur. The probability of correct decisionP CD is 0.7855 (pointa) if the transmitter uses a constant powerS 2 = 4.75 (a constant signal strengthS = 2.1794). The dashed tangent line shows that we can achieve a better probability of correct decision (0.8421 at point d) at the same average signal powerE(S 2 ) = 4.75 if the transmitter time-shares or randomizes appropriately between the signal power levelsS 2 1 = 1.4908 (pointb) andS 2 2 = 9.3697 (pointc). pdf that is continuously differentiable at each point except the mode. The nonconcavity of the graph in Figure 3.3 arises from the bimodal Gaussian-mixture pdf even though the bimodality of 55 the channel noise pdf does not itself give a nonconcave plot ofP CD versusS 2 . The probability of correct decisionP CD is 0.7855 (pointa) if the transmitter uses a constant powerS 2 = 4.75 (a constant signal strengthS = 2.1794). 
The dashed tangent line shows that we can achieve a bet- ter detection performance using the same average signal powerE(S 2 ) = 4.75. The probability of correct decisionP CD is 0.8421 (pointd) if the transmitter time-shares or randomizes appropriately between the signal power levelsS 2 1 = 1.4908 (pointb) andS 2 2 = 9.3697 (pointc). The sufficient condition (3.126) does not hold fors 2 = 4.75 because the plot ofP CD versusS 2 is locally concave at the constant-power operating pointa:P 00 CD (s)< 0<P 0 CD (s) ats = p 4:75. So the noise benefit occurs even when the sufficient condition (3.126) does not hold. The algorithm finds the SR noise or signal-strength randomization ~ S 0 with pdf f ~ S 0(s) = (s 1:221) + (1)(s 3:061) (3.127) where = (3:061 2 4:74) (3:061 2 1:221 2 ) = 0:5876 (3.128) in just 13 (<i max = 24) iterations. So the transmitter should time-share or randomize between the anti-podal signalsf1:221; 1:221g andf3:061; 3:061g with respective probabilities = 0.5876 and 1- = 0.4124. This signal-strength randomization pdff ~ S 0 is nearly optimal because Theorem 3.3(b) implies thatP CD (f S opt ) will be at most 0.0002 + 2 20 more thanP CD (f ~ S 0) = 0.8421 for the optimal signal-strength randomization or for the optimal SR noiseS opt inS = [0.5, 3.75]. Thus the SR noise algorithm can find a near-optimal signal power randomization that improves the average probability of correct decision (from 0.7855 to 0.8421) over the constant power signal- ing. Chapeau-Blondeau and Rousseau showed related SR noise benefits in the optimal Bayesian detection of a periodic square-wave signal in bimodal Gaussian-mixture phase noise [45] and 56 of a constant signal in additive bimodal noise [47]. But they did not find either the optimal or near-optimal noise pdf as in (3.127)-(3.128) for inequality-constrained optimal signal detection. 57 Chapter 4 Error-Probability Noise Benefits in Threshold Signal Detection Noise can also benefit threshold signal detection by reducing the probability of detection er- ror. This chapter presents five new theorems and a stochastic learning algorithm to find such error-probability noise benefits in threshold detectors when the noise comes from a scale-family distribution. The first theorem gives a necessary and sufficient condition for such a noise benefit when a threshold neuron performs discrete binary signal detection in the presence of additive scale-family noise. The theorem allows the user to find the optimal noise probability density for several closed-form noise types that include generalized Gaussian noise. The second theorem gives a noise-benefit condition for more general threshold signal detection when the signals have continuous probability densities. The third and fourth theorems reduce this noise-benefit to a weighted-derivative comparison of signal probability densities at the detection threshold when the signal densities are continuously differentiable and when the noise is symmetric and comes from a scale family. The fifth theorem shows how collective noise benefits can occur in a parallel array of threshold neurons even when an individual threshold neuron does not itself produce a noise benefit. The stochastic gradient-ascent learning algorithm can find the optimal noise value for noise probability densities that do not have a closed form. 58 We focus on signal detection in additive scale-family noise where a user can control only the noise variance or dispersion [148, 198, 216] to decrease the error probability. 
The performance measure is the probability of correct decisionP CD = 1P e whereP e is the probability of error. We use threshold neuron model to illustrate these noise benefits but these results apply to any threshold detector. 4.1 Error-Probability Noise Benefits: Total and Partial SR We classify SR benefits as either total SR or partial SR. The SR effect is total if adding indepen- dent noise in the received signal reduces the error probability. Then the plot of detection probabil- ity versus noise intensity increases monotonically in some noise-intensity interval starting from zero. The SR effect is partial when the detection performance increases in some noise-intensity interval away from zero. Total SR ensures that adding small amounts of noise gives a better detection performance than not adding noise. Partial SR ensures only that there exists a noise intensity interval where the detection performance increases as the noise intensity increases. The same system can exhibit both total and partial SR. We derive conditions that screen for total or partial SR noise benefits in almost all suboptimal simple threshold detectors because the SR con- ditions apply to such a wide range of signal probability density functions (pdfs) and noise pdfs. Learnings laws can then search for the optimal noise intensity in systems that pass the screening conditions. Section 4.5 presents one such stochastic learning law. We have already proven necessary and sufficient “forbidden interval” conditions on the noise mean or location for total SR in mutual-information-based threshold detection of discrete weak binary signals [140, 141]: SR occurs if and only if the noise mean or location parameter obeys 59 = 2 (A;+A) for threshold whereA<A< for bipolar subthreshold signalA. More general forbidden interval theorems apply to many stochastic neuron models with Brownian or even Levy (jump) noise [203, 207]. Corollary 4.1 below gives a forbidden-interval necessary con- dition for SR in error-probability detection. But we did find necessary and sufficient conditions for both total and partial SR noise benefits in error-probability-based threshold signal detection when the noise has a scale-family distribution. Theorem 4.1 gives the a simple necessary and sufficient SR condition for a noise benefit in threshold detection of discrete binary signals. This result appears in Section 4.3. The condition also determines whether the SR effect is total or partial if the noise density belongs to a scale fam- ily. Scale-family densities include many common densities such as the normal and uniform but not the Poisson. The condition implies that SR occurs in simple threshold detection of discrete binary signals only if the mean or location of additive location-scale family noise does not fall in an open forbidden interval. The uniform, Gaussian, Laplacian, and generalized Gaussian (r = 1 2 ) noise in Figures 4.1(a) - 4.1(e) produce a noise benefit because they satisfy condition (4.5) of The- orem 4.1. But the Laplacian noise in Figure 4.1(f) violates this forbidden-interval condition of Corollary 4.1 and so there is no noise benefit. Section 4.4 shows that the SR condition of Theorem 4.1 also allows us to find the optimal noise dispersion that maximizes the detection probability for a given closed-form scale-family noise pdf. Section 4.5 shows that a gradient-ascent learning algorithm can find this optimal intensity from sample data even when the noise pdf does not have a closed form as with many thick-tailed noise pdfs. 
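The total-SR signature is easy to reproduce for the threshold detector of this chapter. The sketch below assumes the configuration used in the examples of Sections 4.3-4.4: equiprobable subthreshold signals s0 = 0 and s1 = 1, threshold θ = 1.5, and zero-mean Gaussian noise of standard deviation σ. The probability of correct decision then has a closed form, and the curve rises from P_CD = 0.5 at σ = 0 to a single interior peak near σ ≈ 0.95, the behavior that Figure 4.1(b) obtains by Monte Carlo simulation.

```python
import numpy as np
from scipy.stats import norm

# Assumed configuration from the examples of Sections 4.3-4.4.
s0, s1, theta, p0, p1 = 0.0, 1.0, 1.5, 0.5, 0.5

def P_CD(sigma):
    # Correct decisions: accept H0 when s0 + N <= theta, accept H1 when s1 + N > theta.
    return p0 * norm.cdf((theta - s0) / sigma) + p1 * (1.0 - norm.cdf((theta - s1) / sigma))

sigmas = np.linspace(0.05, 2.5, 200)
pcd = np.array([P_CD(sig) for sig in sigmas])
print(f"P_CD -> {P_CD(1e-6):.3f} as sigma -> 0")                               # 0.5 without noise
print(f"peak P_CD = {pcd.max():.3f} at sigma = {sigmas[np.argmax(pcd)]:.2f}")  # ~0.62 near 0.95
```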
60 0 0.5 1 1.5 2 2.5 0.5 0.52 0.54 0.56 0.58 0.6 0.62 0.64 0.66 0.68 Standard deviation σ of additive uniform noise Probability of Correct Detection P D Optimal Uniform Noise Standard Deviation σ o = 0.954 0 0.5 1 1.5 2 2.5 0.5 0.52 0.54 0.56 0.58 0.6 0.62 0.64 Standard deviation σ of the additive Gaussian noise Probability of Detection P D Optimal Gaussian Noise Standard Deviation σ o = 0.954 0 0.5 1 1.5 2 2.5 0.5 0.52 0.54 0.56 0.58 0.6 0.62 Standard deviation σ of additive Laplacian noise Probability of Correct Detection P D Optimal Laplacian Noise Standard Deviation σ 0 = 1.3493 (a) (b) (c) 0 1 2 3 4 5 0.5 0.52 0.54 0.56 0.58 0.6 Standard deviation σ of additive Generalized Gaussian (r =1/2) noise Probability of Correct Detection P D Optimal Generalized Gaussian (r = 1/2) Noise Standard Deviation σ 0 = 2.4319 0 0.5 1 1.5 2 0.5 0.51 0.52 0.53 0.54 0.55 0.56 0.57 0.58 0.59 Dispersion σ of the additive Cauchy noise Probability of Correct Detection P D Optimal Cauchy Noise Dispersion σ 0 = 0.866 0 0.5 1 1.5 2 2.5 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 Standard deviation σ of additive Laplacian noise Probability of Correct Detection P D (d) (e) (f) Figure 4.1: Stochastic resonance (SR) noise benefits in threshold detection of binary discrete signals The discrete signalX can take subthreshold valuess 0 = -1 ors 1 = 0 with equal probability (p 0 =P (s 0 ) =P (s 1 ) =p 1 ) and = 1.5 is the detection threshold. We decideX =s 1 if the observationW =X +N >. ElseX =s 0 . Noise benefits occur for five out of six types of additive noise. There is a noise benefit in (a)-(e) because the zero-mean uniform, Gaussian, Laplacian, generalized Gaussian (r = 1 2 ) noise, and zero-location Cauchy noise satisfy condition (4.5) of Theorem 4.1. The SR effect is partial in (a) because condition (4.6) does not hold while the SR effect in (b)-(e) is total in each case because condition (4.6) holds. The dashed vertical lines show that the maximum SR effect occurs at the theoretically predicted optimal noise intensities. There is no noise benefit in (f) for Laplacian noise because its mean lies in the forbidden interval in accord with Corollary 4.1: = 12 (0:5; 1:5) = (s 1 ;s 0 ). Total SR can never occur in an optimal threshold system if we add only independent noise in the received signal. Kay and coworkers showed that the optimal independent additive SR noise is just a constant that minimizes the detection error probability of a given detection scheme [127]. So total SR can never occur if the detection threshold location is optimal even when the overall 61 detection scheme is suboptimal. But we show that partial SR can still occur in a single-threshold suboptimal system even if the detection threshold is optimal. Rousseau and Chapeau-Blondeau found earlier that what we call partial SR occurs in some special cases of optimal threshold de- tection [225]. Figure 4.3 shows such a partial SR effect for the important but special case of an optimal threshold. Our result still holds in the general case of non-optimal thresholds. But the suboptimality of the signal detection remains only a necessary condition for SR noise benefits based on error probability. Theorem 4.2 in Section 4.6 presents a related necessary and sufficient condition for a noise benefit in a more general case of threshold detectors when the signals have continuous pdfs and when the additive independent noise has a pdf from a scale family. Then Theorem 4.3 gives a necessary and sufficient condition for total SR with zero-mean discrete bipolar noise. 
Corollary 4.2 gives a necessary and sufficient condition for partial SR with zero-mean discrete bipolar noise when there is no total SR in Theorem 4.3. Theorems 4.3 and 4.4 each gives a necessary and suffi- cient condition for total SR when the additive noise is either zero-mean discrete bipolar or when it comes from a finite-variance symmetric scale family. These two theorems compare weighted derivatives of continuously differentiable signal pdfs at the detection threshold to determine the total SR effect. Theorem 4.5 of Section 4.7 shows when noise produces a collective SR effect in parallel arrays of threshold neurons even when an individual threshold neuron does not produce an SR effect. The next section describes a general problem of threshold-based neural signal de- tection and defines the two SR effects based on error probability. 62 4.2 Binary Signal Detection Based on Error-Probability We now cast the problem of threshold-based neural signal detection as a statistical hypothesis test. So consider the binary hypothesis test where a neuron decides betweenH 0 : f X (x;H 0 ) = f 0 (x) andH 1 : f X (x;H 1 ) =f 1 (x) using a single noisy observation ofX +N and a detection threshold where noiseN is independent ofX. HereX2R is the original signal input to the threshold neuron andf i is its pdf under the hypothesisH i fori = 0 or 1. So we use the classical McCulloch-Pitts threshold neuron ([175]) where the neuron’s outputY has the form Y = 8 > > < > > : 1 (acceptH 1 ) if X +N > 0 (acceptH 0 ) else: (4.1) This simple threshold neuron model has numerous applications [15, 42, 81, 186, 20]). The probability of correct decision P CD () = 1P e () measures detection performance. Suppose thatp 0 =P (H 0 ) andp 1 =P (H 1 ) = 1p 0 are the prior probabilities of the respective hypothesesH 0 andH 1 . Let() and() be the respective Type-I and Type-II error probabili- ties when the intensity of the additive noiseN is : () = P (rejectH 0 jH 0 is true at noise intensity) (4.2) () = P (acceptH 0 jH 1 is true at noise intensity): (4.3) Then define the probability of error as the usual probability-weighted sum of decision errors [215] P e () = p 0 () + p 1 (): (4.4) 63 We assume that the additive noiseN is a scale-family noise with pdff N (;n) where is the noise intensity (standard deviation or dispersion):f N (;n) = 1 f( n ) wheref is the standard pdf for the family [41]. Then the noise cumulative distribution function (CDF) isF N (;n) =F ( n ) whereF is the standard CDF for the family. We next define SR effects in neural signal detection based on error probability. A binary signal detection or hypothesis testing system exhibits the SR effect in the noise intensity interval (a;b) for 0a<b<1 iffP CD ( 1 )<P CD ( 2 ) for any two noise intensities 1 and 2 such that a 1 < 2 b. The SR effect is total ifa = 0 and partial ifa6= 0. We say that the SR effect occurs at the noise intensity iff the SR effect occurs in some noise intensity interval (a;b) and 2 (a;b). 4.3 Noise Benefits in Threshold Detection of Discrete Binary Signals We first consider the binary signal detection problem where the signal X is a binary discrete random variable with the two valuess 0 ands 1 so thats 0 < s 1 and thatP (X = s 0 ) =p 0 and P (X =s 1 ) =p 1 . Then Theorem 4.1 gives a necessary and sufficient condition for an SR effect in the threshold neuron model (8.3) for discrete binary signal detection if the additive noise comes from an absolutely continuous scale-family distribution. 
64 Theorem 4.1 Suppose that the additive continuous noiseN has scale-family pdff N (;n) and that the threshold neuron model is (8.3). Suppose that signalX is a binary discrete random variable with the two valuess 0 ands 1 so thatP (X =s 0 ) =p 0 andP (X =s 1 ) =p 1 . Then the SR noise benefit occurs in a given noise intensity interval (a;b) if and only if p 0 (-s 0 )f N (,-s 0 ) < p 1 (-s 1 )f N (,-s 1 ) (4.5) for almost every noise intensity2 (a;b). The SR effect is total if lim #0 p 0 (-s 0 )f N (,-s 0 ) < lim #0 p 1 (-s 1 )f N (,-s 1 ) (4.6) Proof: The signalX is a binary discrete random variable with the two valuess 0 ands 1 . So the Type-I and Type-II error probabilities (4.2) and (4.3) become () = 1F N (;s 0 ) (4.7) () = F N (;s 1 ) (4.8) whereF N is the absolutely continuous CDF of the additive noise random variableN. Then the er- ror probabilityP e () =p 0 () +p 1 () is an absolutely continuous function of in any closed interval [c;d]R + wherec> 0. Then the above definition of SR effects and the fundamental theorem of calculus [77] imply that the SR effect occurs in the noise intensity interval (a;b) if 65 and only if dPe() d < 0 for almost all2 (a;b). So the SR effect occurs in the noise intensity interval (a;b) if and only if 0 < p 1 @F N (;-s 1 ) @ p 0 @[1F N (;-s 0 )] @ (4.9) for almost all2 (a;b). Rewrite (8.81) as 0 < p 1 @F ( s 1 ) @ p 0 @[1F ( s 0 )] @ (4.10) whereF is the standard scale-family CDF of the additive noiseN. Then (4.10) gives 0 < p 1 (-s 1 ) f( -s 1 ) p 0 (-s 0 ) f( -s 0 ) = p 1 (-s 1 )f N (;-s 1 ) p 0 (-s 0 )f N (;-s 0 ) (4.11) because the additive noiseN has scale-family pdff N (;n) = 1 f( n ) and because the noise scale is always positive. Inequality (4.5) now follows from (4.11). The definition of a limit implies that condition (4.5) holds for all2 (0;b) for someb> 0 if (4.6) holds. So (4.6) is a sufficient condi- tion for the total SR effect in the simple threshold detection of discrete binary random signals. Theorem 4.1 lets users screen for total or partial SR noise benefits in discrete binary signal detection for a wide range of noise pdfs. This screening test can prevent a fruitless search for nonexistent noise benefits in many signal-noise contexts. The inequality (4.5) leads to the simple 66 stochastic learning law in Section 4.5 that can find the optimal noise dispersion when a noise benefit exists. The learning algorithm does not require a closed-form noise pdf. Inequality (4.5) differs from similar inequalities in standard detection theory for likelihood ratio tests. It specifically resembles but differs from the maximum a posteriori (MAP) likelihood ratio test in detection theory [215]: 8 > > < > > : RejectH 0 if p 0 f N (,z-s 0 ) < p 1 f N (,z-s 1 ) Else acceptH 0 : (4.12) The MAP rule (4.12) minimizes the detection-error probability in optimal signal detection. But it requires the noisy observationz of the received signalZ =X +N whereas inequality (4.5) does not. Inequality (4.5) also contains the differences-s 0 and-s 1 . So Theorem 4.1 gives a general way to detect SR noise benefits in suboptimal detection. Theorem 4.1 implies a forbidden-interval necessary condition if the noise pdf comes from a location-scale family. Corollary 4.1 Suppose that the additive noise N has location-scale family pdf f N (;n). Then the SR noise benefit effect occurs only if the noise mean or location obeys the forbidden-interval condition = 2 (s 1 ;s 0 ). 
67 Proof: The equivalent signal detection problem isH 0 : s 0 + versusH 1 : s 1 + in the additive scale- family noise ~ N =N if we absorb in the signalX. Then inequality (4.5) becomes p 0 (-s 0 -)f ~ N (,-s 0 -) < p 1 (-s 1 -)f ~ N (,-s 1 -): (4.13) So the SR noise benefit does not occur if2 (s 1 ;s 0 ) because then the right-hand side of inequality (4.13) would be negative and the left-hand side would be positive. Corollary 4.1 differs from similar forbidden-interval conditions based on mutual information. The corollary shows that the interval condition = 2 (s 1 ;s 0 ) is only necessary for a noise benefit based on error probability. It is not sufficient. But the same interval condition is both necessary and sufficient for a noise benefit based on mutual information [140, 141]. Figures 4.1(a) - 4.1(e) show simulation instances of Theorem 4.1 for zero-mean uniform, Gaussian, Laplacian, generalized Gaussian (r = 1 2 ) noise, and for zero-location Cauchy noise whens 0 = 0,s 1 = 1,p 0 =p 1 , and = 1.5. There is a noise benefit in (a)-(e) because the zero-mean uniform, Gaussian, Laplacian, generalized Gaussian (r = 1 2 ) noise, and zero-location Cauchy noise satisfy condition (4.5) of Theorem 4.1. Uniform noise gives the largest SR effect but the detection performance degrades quickly as the noise intensity increases beyond its optimal value. Laplacian, generalized Gaussian (r = 1 2 ), and Cauchy noise give a more sustained SR effect but with less peak detection performance. The SR effect is partial in Figure 4.1(a) because condition (4.6) does not hold while the SR effects in Figures 4.1(b) - 4.1(e) are total because (4.6) holds for Gaussian, Laplacian, generalized Gaussian (r = 1 2 ), and Cauchy noise. 68 Figure 4.1(f) shows a simulation instance of Corollary 4.1 fors 0 = 0,s 1 = 1, and = 1.5 with additive Laplacian noiseN with mean = 1:N L(1, 2 ). Noise does not benefit the detection performance because the noise mean = 1 lies in the forbidden interval (-s 1 ;-s 0 ) = (0:5; 1:5). 4.4 Closed-Form Optimal SR Noise Theorem 4.1 permits the exact calculation of the optimal SR noise in some special but important cases of closed-form noise pdfs. The noise pdf here comes from a scale family and has zero mean or location. The binary signals are “weak” in the sense that they are subthreshold: s 0 <s 1 <. Then Theorem 4.1 implies that the optimal noise intensity (standard deviation or dispersion) o obeys p 0 (-s 0 )f N (,-s 0 ) = p 1 (-s 1 )f N (,-s 1 ) (4.14) if the noise density is unimodal. Equation (4.14) may be nonlinear in terms of and so may require a root-finding algorithm to compute the optimal noise intensity o . But we can still directly compute the optimal noise values for several common closed-form noise pdfs that include generalized Gaussian pdfs. The generalized Gaussian distribution [191] is a two-parameter family of symmetric contin- uous pdfs. The scale-family pdff(;n) of a generalized Gaussian noise has the form f(;n) = 1 f gg n = 1 r 2 [(3=r)] 1=2 [(1=r)] 3=2 e Bj n j r (4.15) 69 where f gg is the standard pdf of the family, r is a positive shape parameter, is the gamma function,B = h (3=r) (1=r) ir 2 , and is the scale parameter (standard deviation). This family of pdfs includes all normal (r = 2) and Laplace distributions (r = 1). It includes in the limit (r!1) all continuous uniform distributions on bounded real intervals. 
This family can also model sym- metric platykurtic densities whose tails are heavier than normal (r< 2) or symmetric leptokurtic densities whose tails are lighter than normal (r> 2). Applications include noise modeling in im- age, speech, and multimedia processing [19, 86, 146]. Putting (5.87) in (4.14) gives the optimal intensity of generalized-Gaussian noise as o = B(j 0 s 0 j r j 1 s 1 j r ) ln( 0 -s 0 )ln( 1 -s 1 ) +ln(p 0 )ln(p 1 ) 1 r : (4.16) We can now state the closed-form optimal noise dispersions for uniform, Gaussian, Lapla- cian, generalized Gaussian (r = 1 2 ), and Cauchy noise. Uniform noise: Let N be uniform noise in the interval [v;v] so that f N (,n) = 1 2v if n2 [v;v] andf N (,n) = 0 else. Then the noise standard deviation =v= p 3. So inequality (4.5) holds if and only if eithers 1 < v < s 0 ors 0 < v but p 1 p 0 > s 0 s 1 . So then the SR effect occurs if and only if2 (s 1 ) p 3 ; (s 0 ) p 3 whenp 0 =p 1 . Figure 4.1(a) shows that the SR effect is partial and the unimodal detection performance is maximal at the optimal noise standard deviation o = (s 0 ) p 3 = 0.866 whens 0 = 0,s 1 = 1, and = 1:5. 70 Gaussian noise (r = 2): Equation (4.16) implies that the unimodal detection performance for Gaussian noise is maximal at the noise standard deviation o = " (s 0 ) 2 (s 1 ) 2 2 [ln(-s 0 )ln(-s 1 ) +ln(p 0 )ln(p 1 )] #1 2 : (4.17) Figure 4.1(b) shows a Monte-Carlo simulation plot (10 6 runs) of detection performance when s 0 = 0,s 1 = 1,p 0 =p 1 , and = 1:5. The dashed vertical line shows that the maximal detection performance occurs at the predicted optimal noise intensity o = 0.954. Laplaciannoise (r = 1): Equation (4.16) gives the optimal standard deviation o of the Lapla- cian noise as o = 2(s 1 s 0 ) 2 [ln(-s 0 )ln(-s 1 ) +ln(p 0 )ln(p 1 )] 2 : (4.18) The dashed vertical line in Figure 4.1(c) shows that the maximum detection performance occurs at the predicted optimal noise scale o = 1.3493 fors 0 = 0,s 1 = 1, and = 1:5. Generalized Gaussian (r = 1 2 ): Equation (4.16) gives the optimal standard deviation o of generalized Gaussian noise withr = 1 2 as o = 2 4 (120) 1 4 h (s 0 ) 1 2 (s 1 ) 1 2 i ln(-s 0 )ln(-s 1 ) +ln(p 0 )ln(p 1 ) 3 5 2 : (4.19) 71 Figure 4.1(c) shows that the maximal detection performance occurs at the predicted optimal noise scale o = 2.4319 whens 0 = 0,s 1 = 1, and = 1:5 Figures 4.1(b)-4.1(e) also show that the peak SR effect decreases and the related optimal noise intensity increases as the shape parameterr of the generalized Gaussian pdf decreases. Simulations showed that the SR effect decayed slowly with increasing noise intensity far beyond o asr decreased. Cauchy noise: A zero-location infinite-variance Cauchy noise with scale parameter or disper- sion has pdff N (,n) = ( 2 +n 2 ) . Then equation (4.16) implies that o = p (s 0 )(s 1 ) (4.20) is the optimal Cauchy noise dispersion. The dashed vertical line in Figure 4.1(d) shows that the maximal detection performance occurs at the predicted optimal noise scale o = 0.866 fors 0 = 0, s 1 = 1, and = 1:5. 4.5 Adaptive Noise Optimization Noise adaptation can find the optimal noise variance or dispersion when a closed-form noise pdf is not available [141, 187, 188]. This applies to almost all zero-location symmetric-stable (SS) random variables because their bell-curve pdfs have no known closed form. These thick-tailed 72 bell curves often model impulsive noise in many environments [192]. AnSS random variable does have a characteristic function' with a known exponential form [97, 192]: '(!) = exp(j! 
j!j ) (4.21) where finite is the location parameter, = > 0 is the dispersion that controls the width of the bell curve, and2 (0; 2] controls the bell curve’s tail thickness. TheSS bell curve’s tail gets thicker as falls. The only known closed-formSS pdfs are the thick-tailed Cauchy with = 1 and the thin-tailed Gaussian with = 2. The Gaussian pdf alone amongSS pdfs has a finite variance and finite higher-order moments because the rth lower-order moments of an -stable pdf with< 2 exist if and only ifr<. The location parameter serves as the mean for 1< 2 and as only the median for 0< 1. An adaptive SR learning law updates the dispersion parameter based on how the parameter increases the system performance measure [187]. So using the probability of correct detection P CD gives the following gradient-ascent algorithm: k+1 = k + c k @P CD @ : (4.22) Herec k is an appropriate learning coefficient at iterationk. The chain rule of calculus implies that @P CD @ = @P CD @ @ @ = @P CD @ 1 . Here @P CD @ = p 1 (s 1 )f N (s 1 )p 0 (s 0 )f N (s 0 ) becauseP CD () = 1P e () and @Pe() @ =p 0 (s 0 )f N (s 0 )p 1 (s 1 )f N (s 1 ) from (4.11). Then the learning law (4.22) becomes 73 k+1 = k + c k [p 1 k (s 1 )f N k (s 1 ) p 0 k (s 0 )f N k (s 0 )] 1 k : (4.23) Figure 4.2 shows how (4.23) can find the optimal noise dispersion for zero-location symmetric -stable (SS) random variables that do not have closed-form densities. We need to estimate the signal probabilitiesp i k and noise densityf N k at each iterationk. So we generated 500 signal- noise random samplesfs l ;n l g for l = 1;:::; 500 at each k and then used them to estimate the signal probabilities and noise density with their respective histograms. Figure 4.2 shows the SR profiles and noise-dispersion learning paths for different-stable noise types. We used a constant learning rate c k = 0.1 and started the noise level from several initial conditions with different noise seeds. All the learning paths quickly converged to the optimal noise dispersion 0 . 4.6 Noise Benefits in Threshold Detection of Signals with Continuous Probability Densities Consider a more general binary hypothesis testH 0 :F 0 versusH 1 :F 1 whereF 0 andF 1 are the respective absolutely continuous signal CDFs under the hypothesisH 0 andH 1 . Assume again the threshold neuron model of (8.3). Then Theorem 4.2 below gives a necessary and sufficient condition for an SR effect in general threshold signal detection. The proof extends the proof of Theorem 4.1 and uses the theory of generalized functions. 74 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 Dispersion γ of additive α−stable noise (α = 1.1) Probability of Correct Decision P CD Optimal Noise Dispersion γ o = 0.1844 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0.5 0.55 0.6 0.65 0.7 0.75 Dispersion γ of additive α−stable noise (α = 1.5) Probability of Correct Decision P CD Optimal Noise Dispersion γ o = 0.1213 (a) (b) 0 500 1000 1500 2000 2500 3000 3500 4000 0 0.1 0.2 γ k 0 500 1000 1500 2000 2500 3000 3500 4000 0 0.5 1 γ k 0 500 1000 1500 2000 2500 3000 3500 4000 0 0.5 1 Iteration k γ k 0 500 1000 1500 2000 2500 3000 0 0.1 0.2 γ k 0 500 1000 1500 2000 2500 3000 0.1 0.2 0.3 0.4 γ k 0 500 1000 1500 2000 2500 3000 0 0.5 1 Iteration k γ k Figure 4.2: Adaptive SR for infinite-variance-stable noise A gradient-ascent learning algorithm learns the optimal noise dispersions for infinite-variance -stable noise. 
4.6 Noise Benefits in Threshold Detection of Signals with Continuous Probability Densities

Consider a more general binary hypothesis test H₀: F₀ versus H₁: F₁ where F₀ and F₁ are the respective absolutely continuous signal CDFs under the hypotheses H₀ and H₁. Assume again the threshold neuron model of (8.3). Then Theorem 4.2 below gives a necessary and sufficient condition for an SR effect in general threshold signal detection. The proof extends the proof of Theorem 4.1 and uses the theory of generalized functions.

Theorem 4.2 Suppose that the signal CDFs F₀ and F₁ are absolutely continuous and that the additive noise N has scale-family pdf f_N(σ; n). Then the SR noise benefit occurs in a given noise intensity interval (a, b) if and only if

$$p_0 \int_{\mathbb{R}} n\, f_0(\theta - n)\, f_N(\sigma, n)\, dn \;<\; p_1 \int_{\mathbb{R}} n\, f_1(\theta - n)\, f_N(\sigma, n)\, dn \qquad (4.24)$$

for almost all σ ∈ (a, b). The same condition holds for discrete noise if appropriate sums replace the integrals.

Proof: Write the respective Type-I and Type-II error probabilities (4.2) and (4.3) as

$$\alpha(\sigma) = \int_{\mathbb{R}} \left[1 - F_0(\theta - n)\right] f_N(\sigma; n)\, dn \qquad (4.25)$$

$$\beta(\sigma) = \int_{\mathbb{R}} F_1(\theta - n)\, f_N(\sigma; n)\, dn \qquad (4.26)$$

where appropriate sums replace the integrals if the noise is discrete. Then the error probability P_e(σ) = p₀α(σ) + p₁β(σ) is an absolutely continuous function of σ in any closed interval [c, d] ⊂ ℝ₊ where c > 0. Then the above definition of SR effects and the fundamental theorem of calculus [77] imply that the SR effect occurs in the noise intensity interval (a, b) if and only if dP_e(σ)/dσ < 0 for almost all σ ∈ (a, b). So the SR effect occurs in the noise intensity interval (a, b) if and only if

$$0 \;<\; p_1 \frac{\partial}{\partial \sigma}\int_{\mathbb{R}} \left[1 - F_1(\theta - n)\right]\frac{1}{\sigma}f\!\left(\frac{n}{\sigma}\right)dn \;+\; p_0 \frac{\partial}{\partial \sigma}\int_{\mathbb{R}} F_0(\theta - n)\,\frac{1}{\sigma}f\!\left(\frac{n}{\sigma}\right)dn \qquad (4.27)$$

$$\;=\; p_1 \frac{\partial}{\partial \sigma}\int_{\mathbb{R}} \left[1 - F_1(\theta - \sigma\tilde{n})\right]f(\tilde{n})\,d\tilde{n} \;+\; p_0 \frac{\partial}{\partial \sigma}\int_{\mathbb{R}} F_0(\theta - \sigma\tilde{n})\,f(\tilde{n})\,d\tilde{n} \qquad (4.28)$$

for almost all σ ∈ (a, b). The last equality follows from the change of variable from n to ñ = n/σ. We next use the theory of distributions or generalized functions to interchange the order of integration and differentiation. The error probabilities α(σ) and β(σ) are locally integrable [277] in the space ℝ₊ of σ because they are bounded. Then P_e(σ) is a generalized function of σ [277] and hence its distributional derivative always exists. The terms F₀(θ − σñ)f(ñ) and F₁(θ − σñ)f(ñ) in (4.28) are also generalized functions of σ for all ñ ∈ ℝ because they too are bounded. Then we can interchange the order of integration and distributional differentiation in (4.28) [120]. So the SR effect occurs in the noise intensity interval (a, b) if and only if

$$0 \;<\; p_1 \int_{\mathbb{R}} \frac{\partial\left[1 - F_1(\theta - \sigma\tilde{n})\right]}{\partial \sigma}\,f(\tilde{n})\,d\tilde{n} \;+\; p_0 \int_{\mathbb{R}} \frac{\partial F_0(\theta - \sigma\tilde{n})}{\partial \sigma}\,f(\tilde{n})\,d\tilde{n} \qquad (4.29)$$

$$\;=\; p_1 \int_{\mathbb{R}} \tilde{n}\,f_1(\theta - \sigma\tilde{n})\,f(\tilde{n})\,d\tilde{n} \;-\; p_0 \int_{\mathbb{R}} \tilde{n}\,f_0(\theta - \sigma\tilde{n})\,f(\tilde{n})\,d\tilde{n} \qquad (4.30)$$

for almost all σ ∈ (a, b). Then inequality (4.24) follows if we substitute back ñ = n/σ and f_N(σ, n) = (1/σ)f(n/σ) in (4.30).

Corollary 4.2 below gives a sufficient condition for the SR effect using a zero-mean bipolar discrete noise N.

Corollary 4.2 Suppose that the signal pdfs f₀ and f₁ are continuous and that there exist positive numbers r₁ and r₂ such that

$$p_0\left[f_0(\theta - r_2) - f_0(\theta + r_1)\right] \;<\; p_1\left[f_1(\theta - r_2) - f_1(\theta + r_1)\right] \qquad (4.31)$$

holds. Suppose also that N is additive scale-family noise with standard family pmf P(N₀ = √(r₂/r₁)) = r₁/(r₁ + r₂) and P(N₀ = −√(r₁/r₂)) = r₂/(r₁ + r₂) so that N = σN₀ is zero-mean bipolar noise with variance σ². Then an SR effect occurs at the noise standard deviation √(r₁r₂).
Proof: The noise N = σN₀ has pmf P(N = σ√(r₂/r₁)) = r₁/(r₁ + r₂) and P(N = −σ√(r₁/r₂)) = r₂/(r₁ + r₂). Then Theorem 4.2 implies that the SR effect occurs at the noise standard deviation √(r₁r₂) if and only if

$$p_0\left[f_0\!\left(\theta - \sigma\sqrt{\tfrac{r_2}{r_1}}\right) - f_0\!\left(\theta + \sigma\sqrt{\tfrac{r_1}{r_2}}\right)\right] \;<\; p_1\left[f_1\!\left(\theta - \sigma\sqrt{\tfrac{r_2}{r_1}}\right) - f_1\!\left(\theta + \sigma\sqrt{\tfrac{r_1}{r_2}}\right)\right] \qquad (4.32)$$

for almost all σ in some open interval that contains √(r₁r₂). The inequality (4.32) holds for σ = √(r₁r₂) from the hypothesis (4.31). Then (4.32) holds for all σ in some open interval that contains √(r₁r₂) because the signal pdfs f₀ and f₁ are continuous.

The next theorem gives a necessary and sufficient condition for the total SR effect when the signal pdfs are continuously differentiable and when the additive discrete noise N is zero-mean bipolar.

Theorem 4.3 Suppose that the signal pdfs f₀ and f₁ are continuously differentiable at θ. Suppose also that the additive discrete noise N is zero-mean bipolar. Then the total SR effect occurs if p₀f₀′(θ) > p₁f₁′(θ) and only if p₀f₀′(θ) ≥ p₁f₁′(θ).

Proof: Note that p₀f₀′(θ) > p₁f₁′(θ) if and only if

$$p_0 \lim_{\sigma \downarrow 0}\left[\frac{f_0\!\left(\theta + \sigma\sqrt{\tfrac{r_1}{r_2}}\right) - f_0\!\left(\theta - \sigma\sqrt{\tfrac{r_2}{r_1}}\right)}{\sigma\left(\sqrt{\tfrac{r_2}{r_1}} + \sqrt{\tfrac{r_1}{r_2}}\right)}\right] \;>\; p_1 \lim_{\sigma \downarrow 0}\left[\frac{f_1\!\left(\theta + \sigma\sqrt{\tfrac{r_1}{r_2}}\right) - f_1\!\left(\theta - \sigma\sqrt{\tfrac{r_2}{r_1}}\right)}{\sigma\left(\sqrt{\tfrac{r_2}{r_1}} + \sqrt{\tfrac{r_1}{r_2}}\right)}\right].$$

This implies inequality (4.32) for σ ∈ (0, b) for some b > 0.

Figure 4.3 shows a simulation instance of Corollary 4.2 and Theorem 4.3 when θ = 0.7379 is the optimal detection threshold in the absence of noise. The signal pdf is equally likely to be either a trimodal Gaussian mixture f₀(n) = (1/3)e^{−(n+3)²/2}/√(2π) + (1/3)e^{−n²/2}/√(2π) + (1/3)e^{−(n−3)²/2}/√(2π) or a bimodal Gaussian mixture f₁(n) = 0.4 e^{−(n+2)²/2}/√(2π) + 0.6 e^{−(n−2)²/2}/√(2π). These multimodal mixture densities arise in some neural systems [185, 267]. The optimal (minimum error-probability) detection in this case requires four thresholds to partition the signal space into acceptance and rejection regions.

Figure 4.3: Partial stochastic resonance (SR) in threshold detection of a binary signal X with continuous pdf in the presence of zero-mean additive discrete bipolar scale-family noise N. (a) The signal pdf is equally likely to be either H₀: f₀(n) = (1/3)e^{−(n+3)²/2}/√(2π) + (1/3)e^{−n²/2}/√(2π) + (1/3)e^{−(n−3)²/2}/√(2π) or H₁: f₁(n) = 0.4 e^{−(n+2)²/2}/√(2π) + 0.6 e^{−(n−2)²/2}/√(2π). The thick vertical line shows the optimal detection threshold θ = 0.7379 when there is no noise. We decide H₁ if the noisy observation X + N > θ. Else we decide H₀. N = σN₀ is additive scale-family noise with standard family pmf P(N₀ = √(r₂/r₁)) = r₁/(r₁ + r₂) and P(N₀ = −√(r₁/r₂)) = r₂/(r₁ + r₂). Thus N is zero-mean bipolar noise with variance σ². (b) f₀ and f₁ satisfy condition (4.31) of Corollary 4.2 for r₁ = 4 − θ = 3.2621 and r₂ = θ + 1.75 = 2.4879. Hence the additive noise N shows a partial SR effect at σ_N = √(r₁r₂) = 2.8488 (marked by the dashed vertical line). But the total SR effect does not occur because f₀ and f₁ do not satisfy the condition p₀f₀′(θ) > p₁f₁′(θ) of Theorem 4.3.

But the neurons have only one threshold for signal detection. The optimal location is at 0.7379 for such a single detection threshold. Figure 4.3(a) shows that f₀ and f₁ satisfy condition (4.31) of Corollary 4.2 for r₁ = 4 − θ = 3.2621 and r₂ = θ + 1.75 = 2.4879. Then the SR effect occurs at σ = √(r₁r₂) = 2.8488 because we choose the discrete noise N as in Corollary 4.2. The signal pdfs f₀ and f₁ do not satisfy the condition p₀f₀′(θ) > p₁f₁′(θ) of Theorem 4.3 at θ = 0.7379. Hence Figure 4.3(b) shows that the total SR effect does not occur but a partial SR effect does occur at σ = 2.8488.
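A brief numerical sketch (not from the dissertation; names are hypothetical) that checks condition (4.31) for these mixture pdfs and then estimates the detection probability at the predicted noise level σ = √(r₁r₂) with the bipolar noise of Corollary 4.2:

```python
import numpy as np

def f0(n):  # trimodal Gaussian mixture under H0
    g = lambda m: np.exp(-(n - m) ** 2 / 2) / np.sqrt(2 * np.pi)
    return (g(-3) + g(0) + g(3)) / 3

def f1(n):  # bimodal Gaussian mixture under H1
    g = lambda m: np.exp(-(n - m) ** 2 / 2) / np.sqrt(2 * np.pi)
    return 0.4 * g(-2) + 0.6 * g(2)

theta, p0, p1 = 0.7379, 0.5, 0.5
r1, r2 = 4 - theta, theta + 1.75

# Corollary 4.2 condition (4.31).
lhs = p0 * (f0(theta - r2) - f0(theta + r1))
rhs = p1 * (f1(theta - r2) - f1(theta + r1))
print(lhs < rhs, np.sqrt(r1 * r2))        # True, and the SR level sigma ~ 2.85

# Monte Carlo estimate of P_CD at that noise level.
rng = np.random.default_rng(0)
K, sigma = 200_000, np.sqrt(r1 * r2)
h = rng.random(K) < p1                                    # true hypothesis labels
x = np.where(h, rng.choice([-2, 2], K, p=[0.4, 0.6]),     # mixture component means
             rng.choice([-3, 0, 3], K)) + rng.standard_normal(K)
vals = np.array([sigma * np.sqrt(r2 / r1), -sigma * np.sqrt(r1 / r2)])
noise = rng.choice(vals, K, p=[r1 / (r1 + r2), r2 / (r1 + r2)])
print(np.mean((x + noise > theta) == h))                  # with bipolar noise
print(np.mean((x > theta) == h))                          # noiseless benchmark
```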
Theorem 4.4 below extends the necessary and sufficient total-SR-effect condition of Theorem 4.3 to any finite-mean symmetric scale-family additive noise N. This noise can be SαS with infinite variance so long as α > 1.

Theorem 4.4 Suppose that the additive noise N has a finite mean and has a symmetric scale-family pdf. Suppose the signal pdfs f₀ and f₁ are bounded and continuously differentiable at θ. Then the total SR effect occurs if

$$p_0 f_0'(\theta) \;>\; p_1 f_1'(\theta) \qquad (4.33)$$

and only if p₀f₀′(θ) ≥ p₁f₁′(θ).

Proof: Inequality (4.24) of Theorem 4.2 implies that the total SR effect occurs if and only if

$$p_0 \int_0^{\infty} n\left[f_0(\theta + n) - f_0(\theta - n)\right]\frac{1}{\sigma}f\!\left(\frac{n}{\sigma}\right)dn \;>\; p_1 \int_0^{\infty} n\left[f_1(\theta + n) - f_1(\theta - n)\right]\frac{1}{\sigma}f\!\left(\frac{n}{\sigma}\right)dn \qquad (4.34)$$

for almost all σ ∈ (0, b) for some b > 0 because f_N(σ; n) = (1/σ)f(n/σ) is a symmetric noise pdf. Putting n = σñ in (4.34) gives

$$p_0 \int_0^{\infty} \tilde{n}\left[f_0(\theta + \sigma\tilde{n}) - f_0(\theta - \sigma\tilde{n})\right]f(\tilde{n})\,d\tilde{n} \;>\; p_1 \int_0^{\infty} \tilde{n}\left[f_1(\theta + \sigma\tilde{n}) - f_1(\theta - \sigma\tilde{n})\right]f(\tilde{n})\,d\tilde{n} \qquad (4.35)$$

for almost all σ ∈ (0, b). Then the definition of a limit implies that (4.35) holds if

$$\lim_{\sigma \to 0} p_0 \int_0^{L} \tilde{n}\left[f_0(\theta + \sigma\tilde{n}) - f_0(\theta - \sigma\tilde{n})\right]f(\tilde{n})\,d\tilde{n} + \lim_{\sigma \to 0} p_0 \int_L^{\infty} \tilde{n}\left[f_0(\theta + \sigma\tilde{n}) - f_0(\theta - \sigma\tilde{n})\right]f(\tilde{n})\,d\tilde{n}$$
$$>\; \lim_{\sigma \to 0} p_1 \int_0^{L} \tilde{n}\left[f_1(\theta + \sigma\tilde{n}) - f_1(\theta - \sigma\tilde{n})\right]f(\tilde{n})\,d\tilde{n} + \lim_{\sigma \to 0} p_1 \int_L^{\infty} \tilde{n}\left[f_1(\theta + \sigma\tilde{n}) - f_1(\theta - \sigma\tilde{n})\right]f(\tilde{n})\,d\tilde{n} \qquad (4.36)$$

where |ñ[f_i(θ + σñ) − f_i(θ − σñ)]f(ñ)| ≤ |ñ Q f(ñ)| for some number Q because the pdfs f_i are bounded. So Lebesgue's dominated convergence theorem [77] implies that the limit of the second term on both sides of (4.36) is zero because the additive noise has a finite mean. Then inequality (4.36) holds if

$$\lim_{\sigma \to 0} p_0 \int_0^{L} \tilde{n}\,\frac{f_0(\theta + \sigma\tilde{n}) - f_0(\theta - \sigma\tilde{n})}{\sigma}\,f(\tilde{n})\,d\tilde{n} \;>\; \lim_{\sigma \to 0} p_1 \int_0^{L} \tilde{n}\,\frac{f_1(\theta + \sigma\tilde{n}) - f_1(\theta - \sigma\tilde{n})}{\sigma}\,f(\tilde{n})\,d\tilde{n}. \qquad (4.37)$$

The mean-value theorem [77] implies that for any ε > 0

$$\left|f_i(\theta + \sigma\tilde{n}) - f_i(\theta - \sigma\tilde{n})\right| \;\le\; 2\sigma\tilde{n}\sup_{|u - \theta| \le \varepsilon}\left|f_i'(u)\right| \qquad (4.38)$$

for all |σñ| ≤ ε. The right-hand side of (4.38) is bounded for a sufficiently small ε because the pdf derivatives f₀′ and f₁′ are continuous at θ. So Lebesgue's dominated convergence theorem applies to (4.37) and thus the limit passes under the integral:

$$p_0 \int_0^{L} 2\tilde{n}^2 \lim_{\sigma \to 0}\frac{f_0(\theta + \sigma\tilde{n}) - f_0(\theta - \sigma\tilde{n})}{2\sigma\tilde{n}}\,f(\tilde{n})\,d\tilde{n} \;>\; p_1 \int_0^{L} 2\tilde{n}^2 \lim_{\sigma \to 0}\frac{f_1(\theta + \sigma\tilde{n}) - f_1(\theta - \sigma\tilde{n})}{2\sigma\tilde{n}}\,f(\tilde{n})\,d\tilde{n}. \qquad (4.39)$$

Then L'Hospital's rule gives

$$p_0 \int_0^{L} 2\tilde{n}^2 f_0'(\theta)\,f(\tilde{n})\,d\tilde{n} \;>\; p_1 \int_0^{L} 2\tilde{n}^2 f_1'(\theta)\,f(\tilde{n})\,d\tilde{n} \qquad (4.40)$$

or

$$p_0 f_0'(\theta)\int_0^{L} 2\tilde{n}^2 f(\tilde{n})\,d\tilde{n} \;>\; p_1 f_1'(\theta)\int_0^{L} 2\tilde{n}^2 f(\tilde{n})\,d\tilde{n}. \qquad (4.41)$$

So (4.37) holds whenever p₀f₀′(θ) > p₁f₁′(θ) because the integrals in (4.41) are positive and finite.

Figure 4.4(a) plots the following two bimodal Gaussian-mixture signal pdfs

$$f_0(n) = \frac{1}{2}\,\frac{e^{-(n+0.7)^2/2(0.3)^2}}{\sqrt{2\pi}\,0.3} + \frac{1}{2}\,\frac{e^{-(n-0.7)^2/2(0.3)^2}}{\sqrt{2\pi}\,0.3} \qquad (4.42)$$

$$f_1(n) = \frac{1}{2}\,\frac{e^{-(n+0.3)^2/2(0.3)^2}}{\sqrt{2\pi}\,0.3} + \frac{1}{2}\,\frac{e^{-(n-1.7)^2/2(0.3)^2}}{\sqrt{2\pi}\,0.3} \qquad (4.43)$$

with detection threshold θ = 0.5. The figure also shows that these signal pdfs satisfy the inequality p₀f₀′(θ) > p₁f₁′(θ) of Theorem 4.4. Figure 4.4(b) shows the predicted total SR effect for zero-mean Gaussian noise. The total SR effect in Figure 4.4(c) exceeds the scope of Theorem 4.4 because the mean of the Cauchy noise does not exist. This suggests the conjecture that p₀f₀′(θ) > p₁f₁′(θ) is a sufficient condition for the total SR effect even when the symmetric scale-family noise has no expected value.

Figure 4.4: Total stochastic resonance (SR) in threshold detection of a binary signal X with continuous pdf in the presence of zero-mean additive symmetric scale-family noise. (a) The bimodal signal pdf is equally likely to be either H₀: f₀(n) in (4.42) or H₁: f₁(n) in (4.43). The thick vertical line indicates the detection threshold θ = 0.5. We decide H₁ if the noisy observation X + N > θ. Else we decide H₀. These signal pdfs satisfy the condition p₀f₀′(θ) > p₁f₁′(θ) of Theorem 4.4 at θ = 0.5. The detection performance curves in (b) and (c) show the respective predicted total SR effect for zero-mean Gaussian noise and the conjectured SR effect for zero-location Cauchy noise.
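A quick numerical check (hypothetical names, simple central differences; not from the dissertation) of the Theorem 4.4 condition (4.33) for the mixture pdfs (4.42)-(4.43) at θ = 0.5:

```python
import numpy as np

s = 0.3                                   # mixture component standard deviation
def f0(n):                                # bimodal pdf (4.42)
    g = lambda m: np.exp(-(n - m) ** 2 / (2 * s * s)) / (np.sqrt(2 * np.pi) * s)
    return 0.5 * g(-0.7) + 0.5 * g(0.7)
def f1(n):                                # bimodal pdf (4.43)
    g = lambda m: np.exp(-(n - m) ** 2 / (2 * s * s)) / (np.sqrt(2 * np.pi) * s)
    return 0.5 * g(-0.3) + 0.5 * g(1.7)

theta, p0, p1, h = 0.5, 0.5, 0.5, 1e-6
d0 = (f0(theta + h) - f0(theta - h)) / (2 * h)   # numerical derivative f0'(theta)
d1 = (f1(theta + h) - f1(theta - h)) / (2 * h)   # numerical derivative f1'(theta)
print(p0 * d0 > p1 * d1)                         # Theorem 4.4 condition (4.33)
```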
4.7 Noise Benefits in Parallel Arrays of Threshold Neurons

We show last how a simple calculus argument ensures a noise benefit for maximum-likelihood detection with a large number of parallel arrays or networks of threshold neurons. This SR noise benefit need not occur for a single neuron and so is a genuine collective noise benefit. This total SR result for arrays applies to maximum-likelihood detection of two alternative signals. So it resembles but differs from the array SR result in Patel & Kosko (2009b) that applies instead to the Neyman-Pearson detection of a constant signal in infinite-variance symmetric alpha-stable channel noise with a single array of noisy quantizers. We note that Stocks (2000) first showed that adding noise in an array of parallel-connected threshold elements improves the mutual information between the array's input and output. Then Rousseau and Chapeau-Blondeau [224, 222] used such a threshold array for signal detection. They first showed an SR noise benefit for Neyman-Pearson detection and for Bayesian detection. Researchers have also shown mutual-information noise benefits in arrays of threshold neurons [109, 180, 246, 248].

Suppose first that the signal pdfs f_i are equally likely (p₀ = p₁) and symmetric around m_i so that

$$f_i(m_i + x) = f_i(m_i - x) \quad \text{for all } x \qquad (4.44)$$

where m₀ < m₁. Then θ = (m₀ + m₁)/2 is a reasonable detection threshold for individual neurons because of this symmetry. Suppose there are K parallel arrays such that each parallel array has M noisy threshold neurons. Suppose that the k-th parallel array receives an independent sample X_k of the input signal X. The output of the m-th threshold neuron in the k-th array is

$$Y_{m,k}(\sigma_N) = \operatorname{sign}(X_k + N_m - \theta) \qquad (4.45)$$

where σ_N is the scale parameter of an additive signal-independent symmetric noise N_m in the m-th threshold neuron. Assume that the noise random variables N_m are i.i.d. for each neuron and for each array.
Suppose last that the output W_k of the k-th array sums all M threshold neuron outputs Y_{m,k}:

$$W_k(\sigma_N) = \sum_{m=1}^{M} Y_{m,k}(\sigma_N). \qquad (4.46)$$

A summing-threshold neuron combines the outputs W_k from each parallel array into Λ and then uses sign(Λ) for maximum-likelihood detection:

$$\Lambda(\sigma_N) = \sum_{k=1}^{K} W_k(\sigma_N) \;\underset{H_0}{\overset{H_1}{\gtrless}}\; 0 \qquad (4.47)$$

since the pdf of Λ is symmetric around zero because the signal and noise pdfs are symmetric. Now define μ_i(σ_N) and σ_i²(σ_N) as the respective mean and variance of Λ under the hypothesis H_i when σ_N is the neuron's noise intensity. Then μ₀(σ_N) = −μ₁(σ_N) and σ₀²(σ_N) = σ₁²(σ_N) for all σ_N also because all signal and noise pdfs are symmetric. The pdf of Λ is approximately Gaussian for either hypothesis because the central limit theorem applies to the sum (4.47) if the sample size K is large since the summands are i.i.d. Then Theorem 4.5 gives a necessary and sufficient condition for an SR effect (total or partial) in the parallel-array detector (4.45)-(4.47).

Theorem 4.5 Suppose that Λ(σ_N)|H₀ ~ N(μ₀(σ_N), σ₀²(σ_N)) and Λ(σ_N)|H₁ ~ N(μ₁(σ_N), σ₁²(σ_N)) where μ₀(σ_N) = −μ₁(σ_N) and σ₀²(σ_N) = σ₁²(σ_N) for the threshold-neuron array model (4.45)-(4.47). Then

$$\sigma_1(\sigma_N)\,\mu_1'(\sigma_N) \;>\; \mu_1(\sigma_N)\,\sigma_1'(\sigma_N) \qquad (4.48)$$

is necessary and sufficient for an SR effect at the noise intensity σ_N in the parallel-array maximum-likelihood detector.

Proof: The detection probability obeys

$$P_D = \tfrac{1}{2}\,P(\Lambda(\sigma_N) < 0 \mid H_0) + \tfrac{1}{2}\,P(\Lambda(\sigma_N) > 0 \mid H_1) \qquad (4.49)$$

$$= \Phi\!\left(\frac{\mu_1(\sigma_N)}{\sigma_1(\sigma_N)}\right) \qquad (4.50)$$

because μ₀(σ_N) = −μ₁(σ_N) and σ₀²(σ_N) = σ₁²(σ_N) as discussed above. Here Φ is the CDF of the standard normal random variable. Then the chain and quotient rules of differential calculus give

$$\frac{dP_D}{d\sigma_N} = \phi\!\left(\frac{\mu_1(\sigma_N)}{\sigma_1(\sigma_N)}\right)\frac{\sigma_1(\sigma_N)\mu_1'(\sigma_N) - \mu_1(\sigma_N)\sigma_1'(\sigma_N)}{\sigma_1^2(\sigma_N)}. \qquad (4.51)$$

So σ₁(σ_N)μ₁′(σ_N) > μ₁(σ_N)σ₁′(σ_N) is necessary and sufficient for the SR effect (dP_D/dσ_N > 0) at the noise intensity σ_N because φ is the pdf of the standard normal random variable.

Figure 4.5 shows a simulation instance of the SR condition in Theorem 4.5 for the parallel-array maximum-likelihood detection of Gaussian-mixture signals in the hypothesis test (4.52)-(4.53) below.

Figure 4.5: SR noise benefits based on inequality (4.48) for neural array signal detection with K = 36 parallel arrays of M = 16 threshold neurons each and p₀ = p₁. (a) The smoothed plot of σ₁(σ_N)μ₁′(σ_N) − μ₁(σ_N)σ₁′(σ_N) versus the standard deviation σ_N of additive Gaussian noise. The zero crossing occurs at the noise standard deviation σ_N = σ_opt. (b) The solid line and square markers show the respective plots of the detection probabilities P_D. Adding small amounts of Gaussian noise N to each threshold neuron improves the array detection probability P_D. This SR effect occurs so long as inequality (4.48) holds. So σ_opt maximizes the detection probability.
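A Monte Carlo sketch of the parallel-array detector (4.45)-(4.47) for the hypotheses (4.52)-(4.53) below. It is not from the dissertation: the function name is hypothetical and the mixture parameters and K = 36, M = 16 follow the figure setup.

```python
import numpy as np

def array_pd(sigma_n, K=36, M=16, trials=20_000, seed=0):
    """Monte Carlo estimate of P_D for the parallel-array ML detector
    on the Gaussian-mixture hypotheses (4.52)-(4.53)."""
    rng = np.random.default_rng(seed)
    theta = 0.5                                    # midpoint threshold (m0 + m1) / 2
    correct = 0
    for _ in range(trials):
        h1 = rng.random() < 0.5                    # equiprobable hypotheses
        means = np.array([-0.45, 0.45]) + (1.0 if h1 else 0.0)
        x = rng.choice(means, size=K) + 2.0 * rng.standard_normal(K)  # K samples, std 2
        n = sigma_n * rng.standard_normal((K, M))  # i.i.d. Gaussian neuron noise
        y = np.sign(x[:, None] + n - theta)        # threshold neurons (4.45)
        lam = y.sum()                              # summed array outputs (4.46)-(4.47)
        correct += (lam > 0) == h1
    return correct / trials

for s in (0.0, 0.5, 1.0, 2.0):
    print(s, array_pd(s))
```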
The hypothesis test is

$$H_0: f_0(x) = \tfrac{1}{2}\phi(-0.45, 2; x) + \tfrac{1}{2}\phi(0.45, 2; x) \qquad (4.52)$$

versus

$$H_1: f_1(x) = \tfrac{1}{2}\phi(1 - 0.45, 2; x) + \tfrac{1}{2}\phi(1 + 0.45, 2; x) \qquad (4.53)$$

where φ(v, d; x) is the pdf of the Gaussian random variable with mean v and variance d². We used K = 36 parallel arrays and so took K = 36 i.i.d. random samples of the input signal X. Each parallel array has M = 16 threshold neurons and each receives the same input signal sample. The threshold is θ = (m₀ + m₁)/2 = 0.5 because these signal pdfs f_i are symmetric around m₀ = 0 and m₁ = 1. The noise terms N in the neurons are i.i.d. Gaussian random variables. Figure 4.5(a) plots the smoothed σ₁(σ_N)μ₁′(σ_N) − μ₁(σ_N)σ₁′(σ_N) versus the standard deviation σ_N of the additive Gaussian noise. Adding small amounts of noise N before thresholding improves the array's overall detection probability P_D in Figure 4.5(b). This total SR effect occurs so long as inequality (4.48) holds in Figure 4.5(a).

Figure 4.6: Collective noise benefits in parallel-array maximum-likelihood signal detection for different values of M (the number of threshold neurons in each of K = 36 arrays). The solid lines show that the detection probability P_D improves initially as the noise intensity σ_N increases. The solid lines also show that the SR effect increases as the number M of threshold neurons increases. The dashed line shows that the SR effect does not occur if M = 1.

Figure 4.6 shows that the collective noise benefit increases as the number M of threshold neurons increases in each of K = 36 arrays. The dashed line shows that the SR effect does not occur for an individual threshold neuron (M = 1) because inequality (4.33) of Theorem 4.4 does not hold for the specific signal pdfs f_i in (4.52)-(4.53). But Figure 4.6 shows that the SR effect does occur if we use more than one threshold neuron (M > 1) in a large number (K > 30) of parallel arrays. And Figure 4.4 shows that the two bimodal Gaussian-mixture signal pdfs f₀ and f₁ in (4.42)-(4.43) satisfy inequality (4.33) of Theorem 4.4. So one threshold neuron can produce the SR effect.

Chapter 5

Noise Benefits in Quantizer-Array Correlation Detection and Watermark Decoding

This chapter demonstrates that quantizer noise can improve statistical signal detection in array-based nonlinear correlators in Neyman-Pearson and maximum-likelihood detection. This holds even for infinite-variance symmetric α-stable channel noise and for generalized-Gaussian channel noise. Noise-enhanced correlation detection leads to noise-enhanced watermark extraction based on such nonlinear detection at the pixel or bit level. The first of two new theorems gives a necessary and sufficient condition for such quantizer noise to increase the detection probability of a constant signal for a fixed false-alarm probability if the channel noise is symmetric and if the number of samples is large. The second theorem shows that the array must contain more than one quantizer for such a stochastic-resonance noise benefit if the symmetric channel noise is unimodal. It also shows that the noise-benefit rate improves in the small-quantizer-noise limit as the number of array quantizers increases.
The second theorem further shows that symmetric uniform quantizer noise gives the optimal rate of initial noise benefit among all finite-variance symmetric scale-family noise types. Two corollaries give similar results for the stochastic-resonance effect in maximum-likelihood detection of a deterministic bipolar spreading sequence with unknown 91 amplitude. Simple limiting-array correlation detectors perform better than many other nonlinear detectors for Neyman-Pearson detection of a constant signal in mid-to-low impulsive symmetric -stable noise. Or they can perform nearly optimally for the maximum-likelihood detection of an antipodal sequence in a generalized-Gaussian channel noise without using the signal amplitude. Watermark decoding illustrates these noise benefits. A simple soft-limiter-based correlator gives a performance similar to that of a more complex generalized-Gaussian detector and of a Cauchy detector even though these two detectors require knowledge of the watermark-signal amplitude. 5.1 Noise Benefits in Array Signal Detection Figure 5.1 shows an SR noise benefit in the maximum-likelihood watermark extraction of the ‘yin-yang’ image embedded in the discrete-cosine transform (DCT-2) coefficients of the ‘Lena’ image. The ‘yin-yang’ image of Figure 5.1(a) is the 6464 binary watermark message em- bedded in the mid-frequency DCT-2 coefficients of the 512512 gray-scale ‘Lena’ image using direct-sequence spread spectrum [106]. Figure 5.1(b) shows the result when the ‘yin-yang’ figure watermarks the ‘Lena’ image. Figure 5.1(c) shows that small amounts of additive uniform quan- tizer noise improve the watermark-extraction performance of the noisy quantizer-array-based ML detector while too much noise degrades the performance. Uniform quantizer noise with standard deviation = 1 reduces more than 33% of the pixel-detection errors in the extracted watermark image. Figures 5.1(d)-5.1(g) show simulation instances of the retrieved watermark images using the linear correlator and the ML noisy quanizer-array-based nonlinear correlation detector. Com- paring the retrieved watermark images and their respective pixel-detection errors confirms both that the nonlinear detector outperforms the linear correlator and that adding uniform noise in the 92 6464 Yin-yang watermark image Enlarged yin−yang Image 256 × 256 Lena Image Watermarked With YinYang Image Peak Signal−to−Noise Ratio PSNR = 46.6413dB 0 0.5 1 1.5 2 20 30 40 50 60 70 80 90 Standard Deviation σ of Uniform Quantizer Noise Pixel Errors in Watermark Extraction Watermark Extraction Errors Using Noisy Quatizer−Array−Based Detector Q = 30 Q → ∞ (a) (b) (c) 672 pixel errors Linear Correlation Detection 75 pixel errors σ = 0 48 pixel errors σ = 1 79 pixel errors σ = 3 (d) (e) (f) (g) Figure 5.1: Noise-enhanced digital watermark decoding SR noise benefits in quantizer-array-based maximum-likelihood (ML) watermark decoding. (a) Binary 6464 watermark ‘yin-yang’ image. (b) Watermarked 512512 ‘Lena’ image. Direct-sequence spread spectrum embeds each message bit of the yin-yang image in a set of mid-frequency discrete cosine trans- form (DCT-2) coefficients of the gray-scale ‘Lena’ image. (c) Nonmonotonic quantizer-noise-enhanced watermark-detection performance plot of the array-based ML detectors. The noisy array detector hadQ = 30 quantizers. Uniform quantizer noise decreased the pixel-detection error by more than 33 %. The solid U-shaped line shows the average pixel-detection errors of 200 simulation trials. 
The dashed vertical lines show the total min-max deviations of pixel-detection errors in these simulation trials. The dashed U-shaped line gives the lower bound on the pixel-detection error that any quantizer-array detector with symmetric uniform quantizer noise can achieve if it increases the number of quantizersQ in its array. (d) Retrieved ’yin-yang’ image using the ML linear correlation detector. (e) Retrieved ’yin-yang’ image using the ML noiseless quantizer-array-based detector. This nonlinear detector outperforms the linear correlation detector. (f) Retrieved ’yin-yang’ image using the ML noisy quantizer-array-based detector. The additive uniform quantizer noise improves the detection of the quantizer-array detector by more than 33% as the uniform quantizer noise standard deviation increases from = 0 to = 1. (g) Too much quantizer noise degrades the watermark detection. The SR effect is robust against the quantizer noise intensity because the pixel-detection error in (g) is still less than the pixel-detection errors in (d). 93 quantizer array gives a robust and nonmotonic performance improvement. This appears to be the first demonstration of suprathreshold SR noise benefit in quantizer-array-based DCT-domain watermark extraction. The quantizer-array-based detector consists of a nonlinear preprocessor that precedes a cor- relator and a likelihood-ratio test on the correlator’s output. This nonlinear detector takes K samples of a noise-corrupted signal and then sends each sample to the nonlinear preprocessor ar- ray ofQ noisy quantizers connected in parallel. Each quantizer in the array adds its independent quantizer noise to the noisy input sample and then quantizes this doubly noisy data sample into a binary value. The quantizer array output for each sample is just the sum of allQ quantizer out- puts. The correlator then correlates these preprocessedK samples with the signal. The detector’s final stage applies either the Neyman-Pearson likelihood-ratio test in Section 5.2 or the maximum likelihood-ratio test in Section 5.4. Section 5.3 presents two SR noise-benefit theorems that apply to broad classes of channel and quantizer noises for the quantizer-array-based NP and ML detectors. Theorem 5.1 gives a necessary and sufficient condition for an SR noise benefit in Neyman-Pearson detection of a dc signal. Corollary 5.1 in Section 5.5 gives the same condition for the ML detection of a deterministic bipolar spreading sequence of unknown amplitude. These results apply to all types of symmetric channel noise and to symmetric quantizer noise. They require only that the number of data samplesK be large. Theorem 5.2 contains three results. They require only that the quantizer noise come from a symmetric scale-family probability density function (pdf) with finite variance. The first result shows that Q > 1 is necessary for an initial SR effect if the symmetric channel noise is uni- modal. The second result is that the rate of the initial SR effect in the small quantizer noise 94 limit (lim N !0 dP D d N ) improves if the number of quantizersQ in the array increases. This result implies that we should replace the noisy quantizer-array nonlinearity with its deterministic limit (Q!1) to achieve the upper-bound detection performance if the respective quantizer-noise cu- mulative distribution function has a simple closed form. The final result is that symmetric uniform quantizer noise gives the optimal initial SR effect rate among all symmetric scale-family noise types. 
Corollary 5.2 in Section 5.5 extends Theorem 5.2 to the ML detection of a deterministic bipolar spreading sequence of unknown amplitude. These results hold for any symmetric uni- modal channel noise even though we focus onSS noise and symmetric generalized-Gaussian channel noise. Theorem 5.2 and Corollary 5.2 involve an initial SR effect that either increases the detection probability P D or decreases the error probability P e for small amounts of noise. We define the SR effect as an initial SR effect if there exists someb> 0 such thatP D ( N )>P D (0) orP e ( N )< P e (0) for all N 2 (0;b). HereP D ( N ) andP e ( N ) are the respective detection probability and the error probability when the quantizer noise intensity is N .P D (0) andP e (0) are these probabilities in the absence of quantizer noise. Array-based noise benefits have only a recent history. Stocks [246] first showed that adding quantizer noise in an array of parallel-connected quantizers improves the mutual information between the array’s input and output. Then Rousseau and Chapeau-Blondeau [224, 222] used such a quantizer array for signal detection. They first showed the SR effect for Neyman-Pearson detection of time-varying signals and for Bayesian detection of both constant and time-varying signals in different types of non-Gaussian but finite-variance channel noise. Here we characterize their observed SR effects in terms of detector parameters such as the number of quantizersQ and the type of quantizer noise. 95 Our SR array results prove that quantizer noise can improve quantizer-array-based Neyman- Pearson and ML detection even when the channel noise is impulsive infinite-variance symmetric -stable (SS) noise. We show how the quantizer-array detector or its limiting nonlinear corre- lation detector benefit the Neyman-Pearson detection of a constant signal when theSS channel noise does not have a closed-form pdf. We also show how these quantizer-array-based detectors benefit the ML detection of an antipodal sequence in generalized-Gaussian and SS channel noise when we do not know the signal amplitude and the noise parameters. We then apply these detectors for ML watermark extraction from a set of watermarked images and show that SR ef- fect occurs in the the ML watermark extraction. The simple limiting-array nonlinear correlation detector does not need the watermark-signal amplitude but it still gives a watermark-extraction performance similar to the more computationally complex generalized-Gaussian detector that does require such knowledge. 5.2 Neyman-Pearson Binary Signal Detection in-Stable Noise This section develops the Neyman-Pearson statistical framework for the noise-benefit theorems that follow. The problem is to detect a known deterministic signals k in additive white symmetric -stable (SS) channel noiseV k givenK observed samplesX 1 ;:::;X K : H 0 : X k =V k H 1 : X k =s k +V k : (5.1) TheV k are independent and identically distributed (i.i.d.) zero-locationSS random variables. We consider only constant (dc) signals so thats k =A for allk. The characteristic function' of 96 theSS noise random variableV k has the exponential form [97, 192] '(!) = exp(j! j!j ) (5.2) where real is the location parameter,2 (0; 2] is the characteristic exponent that controls the density’s tail thickness, = > 0 is the dispersion that controls the width of the bell curve, and is the scale parameter. The bell curve’s tails get thicker as falls from 2 to near zero. So energetic impulses become more frequent for smaller values of . 
Figure plots four SS bell curves and their corresponding white noise samples. SS pdfs can model heavy-tailed or impulsive noise in many applications such as underwater acoustic signals, telephone noise, clutter returns in radar, internet traffic, financial data, and trans- form domain image or audio signals [2, 21, 1, 31, 87, 192, 266]. The only known closed-form SS pdfs are the thick-tailed Cauchy with = 1 and the thin-tailed Gaussian with = 2 [194]. The Gaussian pdf alone amongSS pdfs has a finite variance and finite higher-order moments. Themth lower-order moments of an-stable pdf with< 2 exist if and only ifm<. The location parameter serves as the proxy for the mean if 1< 2 and as a proxy for the median if 0< 1. The uniformly most powerful detector for the hypotheses in (5.1) is a Neyman-Pearson log- likelihood ratio test [252, 280]: NP (X) = K X k=1 log(f (X k -s k ))- log(f (X k )) H 1 > < H 0 (5.3) 97 −5 0 5 0 0.1 0.2 0.3 0.4 0.5 0.6 ν f α (ν) SαS densities for different values of α and a fixed γ α = 2 α = 1.5 α = 1 α = 0.5 γ = 1 −6 −4 −2 0 2 4 6 0 0.1 0.2 0.3 0.4 0.5 0.6 SαS densities for different values of γ and a fixed α ν f α (ν) α = 1.5 γ = 1 γ = 0.5 γ = 2 γ = 1.5 (a) (b) 0 100 200 300 400 500 600 700 800 900 1000 −1 −0.5 0 0.5 x 10 6 0 100 200 300 400 500 600 700 800 900 1000 −1000 −500 0 0 100 200 300 400 500 600 700 800 900 1000 −60 −40 −20 0 20 0 100 200 300 400 500 600 700 800 900 1000 −10 −5 0 5 10 Sample k V k V k V k V k (α = 0.5, γ = 1) (α = 1, γ = 1) (α = 1.5, γ = 1) (α = 2, γ = 1) Realizations of SαS noise for different values of α and a fixed γ (c) Figure 5.2: Samples of symmetric -stable probability densities and their white-noise realiza- tions. (a) Probability density functions with zero-location ( = 0) and unit dispersion ( = 1) for = 2, 1.5, 1, and 0.5. The densities are bell curves that have thicker tails as increases. The case = 2 gives a Gaussian density with variance two (or unit dispersion). The parameter = 1 gives Cauchy density. (b) Probability density functions for = 1.5 and = 0 with dispersion = 0.5, 1, 1.5, and 2. (c) Samples of -stable random variables with zero location and unit dispersion. The plots show white-noise realizations when = 0.5, 1, 1.5, and 2. Note the scale differences ony axis. The-stable variablev becomes much more impulsive as the parameter falls. 98 because the observedK samples X =fX 1 ;:::;X K g are i.i.d. We choose so that it has a preset false-alarm probabilityP FA =. This Neyman-Pearson detector is difficult to implement because again the SS pdf f has no closed form except for = 1 and = 2. The NP detector does reduce to the simpler test L (X) = K X k=1 s k X k H 1 > < H 0 (5.4) if the additive channel noiseV k is Gaussian ( = 2) [252]. But even this linear correlation de- tector is suboptimal when the channel noise is non-Gaussian. Its detection performance degrades drastically as the channel noise pdf departs further from Gaussianity [8, 252]. An important special case is the NP detector for Cauchy ( = 1) channel noise. The zero- location Cauchy random variable ( = 1) has the closed-form pdf f V k (v k ) = 1 2 +v 2 k (5.5) for realv k and positive dispersion . The NP detector is nonlinear for such Cauchy channel noise and has the form C NL (X) = K X k=1 log 2 + (X k As k ) 2 2 + (X k +As k ) 2 H 1 > < H 0 : (5.6) So the NP Cauchy detector (5.6) does not have a correlation structure. It is also more compu- tationally complex than the NP linear correlation detector (5.4). 
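A small Monte Carlo sketch of this Neyman-Pearson setup (not from the dissertation; function names are hypothetical and it assumes SciPy's levy_stable sampler). The threshold comes from the empirical H₀ quantile at the preset false-alarm probability, and the Cauchy statistic is the log-likelihood ratio (5.3) evaluated with the closed-form Cauchy pdf (5.5):

```python
import numpy as np
from scipy.stats import levy_stable

def cauchy_llr(x, s, gamma):
    """Cauchy (alpha = 1) log-likelihood ratio: sum_k [log f(x_k - s_k) - log f(x_k)]."""
    return np.sum(np.log((gamma**2 + x**2) / (gamma**2 + (x - s)**2)), axis=-1)

def linear_corr(x, s, gamma=None):
    """Linear correlation statistic (5.4)."""
    return np.sum(s * x, axis=-1)

def detection_prob(stat, alpha, gamma, A=1.0, K=50, pfa=0.1, trials=20_000, seed=0):
    """Monte Carlo Neyman-Pearson evaluation of a correlation-type statistic."""
    rng = np.random.default_rng(seed)
    s = A * np.ones(K)
    scale = gamma ** (1.0 / alpha)               # dispersion gamma = scale**alpha
    v0 = levy_stable.rvs(alpha, 0.0, scale=scale, size=(trials, K), random_state=rng)
    v1 = levy_stable.rvs(alpha, 0.0, scale=scale, size=(trials, K), random_state=rng)
    t0 = stat(v0, s, gamma)                      # statistic under H0
    t1 = stat(v1 + s, s, gamma)                  # statistic under H1
    tau = np.quantile(t0, 1.0 - pfa)             # threshold for the preset P_FA
    return np.mean(t1 > tau)

print(detection_prob(cauchy_llr, alpha=1.0, gamma=2.0))
print(detection_prob(linear_corr, alpha=1.0, gamma=2.0))
```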
But the NP Cauchy detector 99 performs well for highly impulsiveSS noise cases. Section 5.3 shows that some simple nonlin- ear correlation detectors perform as well or even better than does the Cauchy detector when the SS noise is mildly impulsive (when 1.6). The locally optimal detector has the familiar correlation structure [192, 152] LO (X) = K X k=1 s k f 0 (X k ) f (X k ) = K X k=1 s k g LO (X k ) H 1 > < H 0 ~ (5.7) and coincides with the linear correlator (5.4) for the Gaussian noise ( = 2). The score function g LO is nonlinear for< 2. The locally optimal detector (5.7) performs well when the signal am- plitudeA is small. But this test is not practical whenf does not have a closed form becauseg LO requires bothf andf 0 . So researchers have suggested other suboptimal detectors that preserve the correlation structure but that replace g LO with different zero-memory nonlinear functions g [33, 32, 147, 252, 280]. These nonlinearities range from simple ad-hoc soft-limiters g SL (X k ) = 8 > > > > > > < > > > > > > : X k ifjX k jc 1 if X k >c 1 if X k <c (5.8) and hole-puncher functions g SL (X k ) = 8 > > < > > : X k ifjX k jc 0 else (5.9) 100 to more complex nonlinearities that better approximate g LO such as those based on an empiri- cal characteristic function, a scale-mixture approach, or an approximated pdf using discretized integrals [147, 115]. The next section presents two SR theorems for the simple nonlinear correlation detector that replaces the deterministic nonlinearity g LO with a noisy quantizer-array-based random nonlin- earityg NQ or its deterministic limitg N1 . We show that these detectors enjoy SR noise benefits. We then compare their detection performances with the Cauchy detector (5.6) and the nonlinear correlation detectors based on the simple soft-limiter and hole-puncher nonlinearities (5.8)-(5.9). 5.3 Quantizer Noise Benefits in Nonlinear-Correlation-Detector-Based Neyman-Pearson Detection This section presents the two main SR noise-benefit theorems for Neyman-Pearson detectors. The nonlinear correlation detector has the form NQ (X) = K X k=1 s k g NQ (X k ) H 1 > < H 0 ~ (5.10) where g NQ (X k ) = 1 Q Q X q=1 sign(X k +N q ): (5.11) Here is the quantization threshold andsign(X k +N q ) =1 forq = 1, ...,Q. We choose = A 2 because both the channel noiseV k and the quantizer noiseN q are symmetric noises. We assume that the additive quantizer noiseN q has a symmetric scale-family [41] noise pdf f N ( N ;n) = 1 N f ~ N ( n N ). Here N is the noise standard deviation andf ~ N is the standard pdf for 101 the family [41]. Then the noise cumulative distribution function (CDF) isF N ( N ;n) =F~ N ( n N ). Scale-family densities include many common densities such as the Gaussian and uniform but not the Poisson. We assume that the quantizer noise random variablesN q have finite variance and are independent and come from a symmetric scale family noise. The quantizer noise can arise from electronic noise such as thermal noise, shot noise, or avalanche noise from analog circuits [73, 105]. The noisy quantizer-array detector (5.10)-(5.11) is easy to use and requires only one bit to represent each quantizer’s output. This helps in applications of sensor networks and distributed systems that have limited energy or that involve limited data handling and storage [130, 199]. Define next i ( N ) and 2 i ( N ) as the respective mean and variance of NQ under the hypoth- esisH i when N is the quantizer noise intensity. 
Then 0 ( N ) = 1 ( N ) and 2 0 ( N ) = 2 1 ( N ) for all N because bothW andN are symmetric. The mean i and variance 2 i of the test statistic NQ depend on both the additive channel noiseW and on the quantizer noiseN. So i and 2 i depend on the noise intensities and N . But we write these two terms as i ( N ) and 2 i ( N ) because we control only the quantizer noise intensity N . The Appendix derives the complete form of i ( N ) and 2 i ( N ) in the respective equations (5.27) and (5.45). The additive structure of NQ in (5.10) gives rise to an important simplification. The pdf of NQ is approximately Gaussian for both hypotheses because the central limit theorem [41] applies to (5.10) if the sample sizeK is large since the random variables are i.i.d. and have finite vari- ance. Then Theorem 5.1 gives a necessary and sufficient inequality condition for the SR effect in the quantizer-array detector (5.10)-(5.11). This SR condition depends only on i ( N ) and 2 i ( N ) and on their first derivatives. 102 Theorem 5.1 Suppose that NQ jH 0 N( 0 ( N ); 2 0 ( N )) and NQ jH 1 N( 1 ( N ); 2 1 ( N )) where 0 ( N ) = 1 ( N ) and 2 0 ( N ) = 2 1 ( N ). Then 1 ( N ) 0 1 ( N ) > 1 ( N ) 0 1 ( N ) (5.12) is a necessary and sufficient for an SR noise-benefit in Neyman-Pearson detection based on the nonlinear test statistic NQ . Proof: The Neyman-Pearson detection rule based on NQ rejectsH 0 if NQ > because we choose such thatP ( NQ > jH 0 ) =. Standardizing NQ underH 0 gives P ( NQ > jH 0 ) = P NQ 0 ( N ) 0 ( N ) > 0 ( N ) 0 ( N ) jH 0 (5.13) = P Z > 0 ( N ) 0 ( N ) (5.14) = P (Z >z ) (5.15) where Z = NQ jH 0 0 ( N ) 0 ( N ) N(0,1) is a standard normal random variable and 1 (z ) = for the CDF of the standard normal random variable. Then the detection threshold is = z 0 ( N ) + 0 ( N ) becauseZ = NQ jH 0 0 ( N ) 0 ( N ) . Standardizing NQ underH 1 likewise gives P D = 1P ( NQ > jH 1 ) (5.16) = 1P ( NQ 1 ( N ) 1 ( N ) > 1 ( N ) 1 ( N ) jH 1 ) (5.17) = 1 z 2 1 ( N ) 1 ( N ) (5.18) 103 because 0 ( N ) = 1 ( N ) and 2 0 ( N ) = 2 1 ( N ). Then dP D d N = 2 z 2 1 ( N ) 1 ( N ) 1 ( N ) 0 1 ( N ) 1 ( N ) 1 0 ( N ) 2 1 ( N ) : (5.19) So 1 ( N ) 0 1 ( N )> 1 ( N ) 0 1 ( N ) is necessary and sufficient for the SR effect ( dP D d N > 0) because is the pdf of the standard normal random variableZ. Figure 5.3 shows a simulation instance of the SR condition in Theorem 5.1 for dc signal de- tection in an impulsive infinite-variance channel noise. The dc signal has magnitudeA = 0.5 and we set the false-alarm probabilityP FA toP FA = 0.1. The channel noise isSS with parameters = 1.85, = 1:7 1:85 = 2.67, and = 0. The detector preprocesses each of theK = 50 noisy samples X k withQ = 16 quantizers in the array. Each quantizer has quantization threshold =A=2 and adds the independent uniform quantizer noiseN to the noisy sampleX k before quantization. Fig- ure 3(a) shows the plot of 1 ( N ) 0 1 ( N ) - 1 ( N ) 0 1 ( N ) versus the standard deviation N of the additive uniform quantizer noise. We used 10 5 simulation trials to estimate 1 ( N ) and 1 ( N ) and then used the difference quotients 1 ( N j+1 ) 1 ( N j ) N j+1 N j+1 and 1 ( N j+1 ) 1 ( N j ) N j+1 N j+1 to estimate their first derivatives. Adding small amounts of quantizer noiseN improved the detection probability P D in Figure 3(b). This SR effect occurs until the inequality (5.12) holds in Figure 3(a). Figure 3(b) also shows the accuracy of the Gaussian (central limit theorem) approximation of the detec- tion statistic NQ ’s distribution. 
Circle marks show the detection probabilities computed from the 10 5 Monte-Carlo simulations. The solid line plots the respective approximate detection probabil- ity (5.18). 104 0 0.5 1 1.5 2 2.5 3 3.5 4 −0.2 0 0.2 0.4 0.6 0.8 1 Standard deviation σ N of the additive uniform quantizer noise N σ 1 (σ η )μ 1 ’(σ η ) − μ 1 (σ η )σ 1 ’(σ η ) SR Condition: SR occurs if and only if σ 1 (σ η )μ 1 ’(σ η ) − μ 1 (σ η )σ 1 ’(σ η ) > 0 No SR Region SR Region Optimal quantizer noise intensity σ N opt (a) 0 0.5 1 1.5 2 2.5 3 3.5 4 0.8 0.82 0.84 0.86 0.88 0.9 0.92 Standard deviation σ N of the additive uniform quantizer noise N Probability of Detection P D Gaussian Approximation Actual Optimal quantizer noise intensity σ N opt No SR Region SR Region V k ~ SαS(α = 1.85, γ =2.67, δ = 0) Number of samples K = 50 False−alarm probability P FA = 0.1 Signal strength A = 1 Quantizer noise N q ~ U(0,σ N ) Number of Quatizers Q = 16 (b) Figure 5.3: SR noise benefits based on inequality (5.12) for dc signal detection in-stable channel noise. (a) The plot of 1 ( N ) 0 1 ( N ) - 1 ( N ) 0 1 ( N ) versus the standard deviation N of additive uniform quan- tizer noise. The zero crossing occurs at the quantizer noise standard deviation N opt . (b) The solid line and circle markers show the respective plots of the detection probabilitiesP D with and without the Gaussian approximation of NQ ’s distribution. Adding small amounts of quantizer noiseN improved the detection probabilityP D by more than 7.5%. This SR effect occured until inequality (5.12) held. So N opt maximized the detection probability. 105 Theorem 5.1 helps characterize the SR noise benefit in terms of the numberQ of quantizers and the type of quantizer noise in Theorem 5.2 below. Theorem 5.2 states that it takes more than one quantizer to produce the initial SR effect and that the rate of initial SR effect increases as the numberQ of quantizers increases. It further states that uniform quantizer noise gives the maximal initial SR effect among all possible finite-variance symmetric scale-family quantizer noise. The- orem 5.2 follows from Theorem 5.1 if we substitute the expressions for 1 ( N ), 0 1 ( N ), 1 ( N ), and 0 1 ( N ) and then pass to the limit N ! 0. Theorem 5.2 (a) Q> 1 is necessary for the initial SR effect in the Neyman-Pearson detection of a dc signal in symmetric unimodal channel noiseV if the test statistic is the nonlinear test statistic NQ . (b) Suppose that the initial SR effect occurs in the quantizer-array detector (5.10)-(5.11) withQ 1 quantizers and with some symmetric quantizer noise. Then the rate of the initial SR effect in the quantizer-array detector (5.10)-(5.11) withQ 2 quantizers is larger than the rate of the initial SR effect withQ 1 quantizers ifQ 2 >Q 1 . (c) Zero-mean uniform noise is the optimal finite-variance symmetric scale-family quantizer noise in that it gives the maximal rate of the initial SR effect among all possible finite-variance quantizer noise in the Neyman-Pearson quantizer-array detector (5.10)-(5.11). 106 Proof: Part (a) The definition of the initial SR effect and Theorem 5.1 imply that the initial SR effect occurs if and only if there exists someb> 0 such that the condition (5.12) holds for all the quantizer-noise intensities N 2 (0;b). Then multiply both sides of (5.12) with 2 1 ( N ) to get 1 ( N ) 2 1 0 ( N ) < 2 2 1 ( N ) 0 1 ( N ) for all N 2 (0;b) (5.20) as a necessary condition for the initial SR effect. 
We will prove that inequality (5.20) does not hold for all quantizer-noise intensities N in some positive interval if Q = 1. We first derive the equations for 1 ( N ), 0 1 ( N ), 2 1 ( N ), and 2 1 0 ( N ). We then show that the right-hand side of (5.20) is negative in some noise-intensity interval (0;h) while the left-hand side of (5.20) is positive in the same interval. Define first the random variablesY k =g NQ (X k ) fork = 1, ...,K. Then NQ = P K k=1 Y k and the population mean of NQ jH 1 is 1 ( N ) = E( NQ jH 1 ) = K X k=1 E(Y k jH 1 ): (5.21) TheY k jH 1 are i.i.d. random variables with population mean E(Y k jH 1 ) = Z 1 1 E(Y k jX k =x;H 1 )f X k jH 1 (x)dx (5.22) = Z 1 1 E(Y k jV k =v;H 1 )f V (vA)dv: (5.23) 107 HereA is the dc signal andE(Y k jX k =x;H 1 ) is the conditional mean when the received signal X k = x and when H 1 is true. Then (5.26) follows because X k jH 1 = V k +A and V k are i.i.d. channel-noise random variables with common pdff V . WriteE(Y k jV k =v;H 1 ) =E(Y k jv;H 1 ). Define nextY k;q =sign(X k +N q ) where theN q are finite-variance i.i.d. scale-family quantizer-noise random variables with variance 2 N and CDFF N . Then E(Y k jv;H 1 ) = E(Y k;q jv;H 1 ) (5.24) = 1 2F N ( A 2 v) (5.25) = 1 2F~ N ( A 2 v N ) (5.26) whereF~ N is the standard CDF for the scale family of the quantizer noise. So (5.21) and (5.26) imply that 1 ( N ) = Z 1 1 K " 1 2F~ N ( A 2 v N ) # f V (vA)dv: (5.27) > 0 (5.28) because 1 2F~ N ( A 2 v N ) is a non-decreasing odd function around A=2 while f V (vA) is a symmetric unimodal density with modeA. We next derive the equations for 0 1 ( N ) and then find the limiting value lim N !0 0 1 ( N ) N . Note first that 0 1 ( N ) = @ @ N Z 1 1 K " 1 2F~ N ( A 2 v N ) # f V (vA)dv (5.29) = Z 1 1 ( A 2 v N ) K N f ~ N ( A 2 v N )f V (vA)dv (5.30) 108 because of the distributional derivatives [120] of 1 ( N ) and 1 2F~ N ( A 2 v N ) with respect to N . This allows us to interchange the order of integration and differentiation [120] in (5.29)-(5.30). Next substitute A 2 v N =n in (5.30) to get 0 1 ( N ) N = Z 1 1 nK f V (-n N - A 2 ) N f ~ N (n)dn (5.31) = Z 1 0 nK f V ( A 2 +n N )f V ( A 2 n N ) N f ~ N (n)dn (5.32) becausef ~ N is a symmetric pdf. Then lim N !0 0 1 ( N ) N = lim N !0 Z 1 0 nK f V (- A 2 +n N )f V (- A 2 -n N ) N f ~ N (n)dn = Z 1 0 nK lim N !0 f V (- A 2 +n N )f V (- A 2 -n N ) N f ~ N (n)dn because of Lebesgue’s dominated convergence theorem [77] = K Z 1 0 2n 2 f 0 V ( A 2 )f ~ N (n)dn (5.33) because of L’Hospital’s rule [227] = Kf 0 V ( A 2 ) Z 1 0 2n 2 f ~ N (n)dn (5.34) = 2Kf 0 V ( A 2 ) (5.35) becausef ~ N is the symmetric pdf of the zero-mean and unit-variance quantizer noise. The uni- modality of the symmetric channel noiseV implies that (5.35) is negative. Then 0 1 ( N ) is nega- tive for all noise intensities N in some interval (0;h). Then (5.35) also implies that lim N !0 0 1 ( N ) = 0: (5.36) 109 We now derive expressions for the population variance 2 1 ( N ) of NQ jH 1 and its distributional derivative 2 1 0 ( N ). TheY k jH 1 are i.i.d. random variables. So 2 1 ( N ) = Var ( NQ jH 1 ) (5.37) = KVar(Y k jH 1 ) (5.38) = K E(Y 2 k jH 1 )E 2 (Y k jH 1 ) : (5.39) where E(Y k jH 1 ) = 1 ( N ) K (5.40) and E(Y 2 k jH 1 ) = Z 1 1 E(Y 2 k jv;H 1 )f V (vA)dv: (5.41) But E(Y 2 k jv;H 1 ) = 1 Q E(Y 2 k;q jv;H 1 ) + Q 1 Q E 2 (Y k;q jv;H 1 ) (5.42) = 1 Q + Q 1 Q E 2 (Y k;q jv;H 1 ) (5.43) becauseE(Y 2 k;q jv;H 1 ) = 1 by the definition ofY k;q = 1 Q + Q 1 Q " 1 2F~ N ( A 2 v N ) # 2 (5.44) because of (5.26). 
Putting (5.44) in (5.41) and then putting (5.40) and (5.41) in (5.39) gives 110 2 1 ( N ) = Z 1 1 K Q f V (vA)dv + Z 1 1 K(Q 1) Q " 1 2F~ N ( A 2 v N ) # 2 f V (vA)dv K 2 1 ( N ): (5.45) Then the distributional derivative of 2 1 ( N ) with respect to N is 2 1 0 ( N ) = " Z 1 1 K(Q 1) Q 2[1 2F~ N ( A 2 v N )] ( A 2 v N ) 1 N f ~ N ( A 2 v N )f V (vA)dv # 2K 1 ( N ) 0 1 ( N ): (5.46) Equation (5.46) implies that 2 1 0 ( N ) is positive for all N 2 (0;h) ifQ = 1 because 0 1 ( N ) is negative in the quantizer-noise-intensity interval (0;h) and because of (5.28). So the left-hand side of (5.20) is positive and the right-hand side of (5.20) is negative for all N 2 (0;h) ifQ = 1. HenceQ> 1 is a necessary condition for the initial SR effect in the Neyman-Pearson detection of a dc signalA in symmetric unimodal channel noiseV using the nonlinear test statistic NQ . Part (b) Take the limit N ! 0 on both sides of (5.19) to find the rate of the initial SR effect near a zero quantizer-noise intensity. Then lim N !0 dP D d N = 2(z 2 1 (0) 1 (0) ) 1 (0) 0 1 (0)- 1 (0) lim N !0 1 0 ( N ) 2 1 (0) (5.47) = 2(z 2 1 (0) 1 (0) ) 1 (0) lim N !0 1 0 ( N ) 2 1 (0) : (5.48) 111 because of (5.36). We know that lim N !0 2 1 0 ( N ) = 2 1 (0) lim N !0 1 0 ( N ) (5.49) because 2 1 0 ( N ) = 2 1 ( N ) 1 0 ( N ). Equations (5.48) and (5.49) imply that the rate of the initial SR effect increases if lim N !0 2 1 0 ( N ) decreases. Further: lim N !0 2 1 0 ( N ) = lim N !0 Z 1 1 K(Q 1) Q 2[1 2F~ N (n)] nf ~ N (n)f V ( A 2 N n)dn (5.50) if we substitute A 2 v N =n in (5.46). Then lim N !0 2 1 0 ( N ) = 2K(Q 1)f V ( A 2 ) Q Z 1 1 [1 2F~ N (n)]nf ~ N (n)dn: (5.51) The integrand of (5.51) is negative because [1 2F~ N (n)]n is a non-positive function. Then lim N !0 2 1 0 ( N ) decreases as the number Q of quantizers increases because Q 2 1 Q 2 > Q 1 1 Q 1 if Q 2 >Q 1 . So the initial SR effect withQ 2 quantizers is larger than that of the detector withQ 1 quantizers ifQ 2 >Q 1 . Part (c) Fix the channel noiseV and the numberQ of quantizers and choose the symmetric scale-family quantizer noiseN. Equations (5.48) and (5.49) imply that the rate of decrease in the error prob- ability with respect to N is a maximum if lim N !0 2 1 0 ( N ) achieves its minimum value. We 112 want to find the symmetric standard pdff ~ N of the zero-mean unit-variance scale-family quantizer noise ~ N that minimizes Z 1 1 [1 2F~ N (n)]nf ~ N (n)dn: (5.52) Suppose first that the quantizer noise ~ N is a symmetric discrete random variable on sample space ~ N =fn ` 2 R : `2L Zg wheren 0 = 0. Suppose also thatn ` =n ` andn ` > 0 for all`2N =f1; 2; 3;:::g. LetP~ N (n ` ) =P ( ~ N =n ` ) = p ` 2 denote the probability density of this symmetric standard discrete quantizer noise ~ N whereP~ N (n ` ) = p ` 2 =P~ N (n ` ) for all`2N and P~ N (n 0 ) =p 0 . So we replace the integral (5.52) with the appropriate sum: 1 X `=1 [1 2F~ N (n ` )]n ` P~ N (n ` ): (5.53) Then we need to find the densityP~ N that minimizes (5.53). Then 1 X `=1 [1 2F~ N (n ` )]n ` P~ N (n) = 1 X `=1 n ` P~ N (n ` ) 2 1 X `=1 F~ N (n ` )n ` P~ N (n ` ) (5.54) = 2 1 X `=1 F~ N (n ` )n ` P~ N (n ` ) (5.55) = 2 1 X `=1 [F~ N (n ` )F~ N (n ` )]nP~ N (n) (5.56) 0: (5.57) Equality (5.55) follows because ~ N is zero-mean while (5.56) follows because the densityP~ N (n) is symmetric [ n ` =n ` and P~ N (n ` ) = P~ N (n ` ) ]. So we need to find the density P~ N that maximizes 2 P 1 `=1 [F~ N (n ` )F~ N (n ` )]nP~ N (n) becauseF~ N (n ` )F~ N (n ` ) is positive. 
Write 113 2 1 X `=1 [F~ N (n ` )F~ N (n ` )]nP~ N (n) = 2 1 X `=1 [F~ N (n ` )F~ N (n ` )] p P~ N (n)n p P~ N (n) (5.58) Then the Cauchy-Schwarz inequality [77] gives 2 1 X `=1 [F~ N (n ` )F~ N (n ` )]nP~ N (n ` ) 2 " 1 X `=1 [F~ N (n ` )F~ N (n ` )] 2 P~ N (n) #1 2 " 1 X `=1 n 2 ` P~ N (n ` ) #1 2 (5.59) = 2 " 1 X `=1 [F~ N (n ` )F~ N (n ` )] 2 P~ N (n) #1 2 1 p 2 (5.60) because ~ N has unit variance = p 2 (p 1 +2p 0 ) 2 4 p 1 2 + (p 2 +2p 1 +2p 0 ) 2 4 p 2 2 + (p 3 +2p 2 +2p 1 +2p 0 ) 2 4 p 3 2 +::: 1 2 (5.61) because ~ N is symmetric and soP~ N (n ` ) =P~ N (n ` ) =p ` =2 for all`2N p 2 " (p 0 +p 1 +p 2 +p 3 +:::) 3 + (p 0 +p 1 +p 2 +p 3 +:::) 3 3 8 # 1 2 (5.62) = p 2 " 1 + 1 3 8 #1 2 because P 1 `=0 p ` = 1 (5.63) = 1 p 3 : (5.64) So (5.64) gives the upper bound on (5.53) for any discrete symmetric standard quantizer noise ~ N. The symmetric unit-variance discrete bipolar quantizer noise ~ N [P~ N (1) =P~ N (1) = 1/2] does not achieve the upper bound (5.64). 114 Suppose now that the quantizer noise pdff N is continuous. Then we want to find the symmet- ric standard pdff ~ N of a zero-mean unit-variance scale-family quantizer noise ~ N that maximizes Z 1 1 j[1 2F~ N (n)]nf ~ N (n)jdn (5.65) because the integrand of (5.51) is non-positive. Write Z 1 1 j[1 2F~ N (n)]nf ~ N (n)jdn = Z 1 1 [1 2F~ N (n)] p f ~ N (n)n p f ~ N (n) dn (5.66) Again the Cauchy-Schwarz inequality [77] gives Z 1 1 j[1 2F~ N (n)]nf ~ N (n)jdn Z 1 1 [1 2F~ N (n)] 2 f ~ N (n)dn 1=2 Z 1 1 n 2 f ~ N (n)dn 1=2 (5.67) = Z 1 1 [1 2F~ N (n)] 2 f ~ N (n)dn 1=2 (5.68) = E 1 2F~ N ( ~ N) 2 1=2 (5.69) = E (1 2U) 2 1=2 (5.70) = 1= p 3: (5.71) Equations (5.70)-(5.71) follow becauseU =F~ N ( ~ N) is a uniform random variable in [0; 1] for any continuous quantizer noise ~ N [41]. Inequality (5.67) becomes an equality if and only if theF~ N satisfy [1 2F~ N (n)] 2 = n 2 (5.72) 115 for some constant on the support off ~ N [77]. But (5.72) implies that F~ N (n) = 1 2 + p n 2 for alln2 [ 1 p ; 1 p ] (5.73) for = 1/3 becauseF~ N is the CDF of a standard (zero-mean and unit-variance) quantizer noise. The same CDF in (5.73) implies that ~ N is uniformly distributed in [ p 3; p 3]. So the symmetric uniform quantizer noise achieves the upper bound (5.71) and hence it gives the maximal rate of the initial SR effect among all finite-variance continuous scale-family quantizer noise. The upper bound of (5.71) is the same as the upper bound of (5.64). So zero-mean continu- ous uniform noise is the optimal finite-variance symmetric scale-family quantizer noise because it gives the maximal rate of the initial SR effect among all possible finite-variance quantizer noise in the quantizer-array detector (5.10)-(5.11). Figure 5.4 shows simulation instances of Theorem 5.2(a)-(b). The thin dashed line in Figure 5.4 shows that the SR effect does not occur ifQ = 1 as Theorem 5.2(a) predicts. The solid lines show that the initial SR effect increases as the number Q of quantizers increases as Theorem 5.2(b) predicts. Q = 32 quantizers gave a 0.925 maximal detection probability and thus gave an 8% improvement over the noiseless detection probability 0.856. The thick dashed line in Figure 5.4 shows the upper bound on the detection probability that any noisy quantizer-array detector (5.10)-(5.11) with symmetric uniform quantizer noise can achieve if it increases the numberQ of quantizers in its array. 
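A Monte Carlo sketch of the noisy quantizer-array detector (5.10)-(5.11) with symmetric uniform quantizer noise (not from the dissertation; function names are hypothetical, the parameters follow the Figure 5.4 setup, and the channel noise again comes from SciPy's levy_stable sampler):

```python
import numpy as np
from scipy.stats import levy_stable

def g_NQ(x, theta, sigma_n, Q, rng):
    """Noisy quantizer-array preprocessor (5.11): average of Q sign() outputs,
    each with independent uniform quantizer noise of standard deviation sigma_n."""
    half_width = np.sqrt(3.0) * sigma_n
    n = rng.uniform(-half_width, half_width, size=x.shape + (Q,))
    return np.mean(np.sign(x[..., None] + n - theta), axis=-1)

def quantizer_array_pd(sigma_n, Q=16, A=1.0, alpha=1.85, gamma=2.67,
                       K=50, pfa=0.1, trials=5_000, seed=0):
    """Monte Carlo P_D of the quantizer-array Neyman-Pearson detector."""
    rng = np.random.default_rng(seed)
    theta = A / 2.0
    scale = gamma ** (1.0 / alpha)
    v0 = levy_stable.rvs(alpha, 0.0, scale=scale, size=(trials, K), random_state=rng)
    v1 = levy_stable.rvs(alpha, 0.0, scale=scale, size=(trials, K), random_state=rng)
    t0 = g_NQ(v0, theta, sigma_n, Q, rng).sum(axis=-1)       # statistic under H0
    t1 = g_NQ(v1 + A, theta, sigma_n, Q, rng).sum(axis=-1)   # statistic under H1
    tau = np.quantile(t0, 1.0 - pfa)                         # preset false-alarm rate
    return np.mean(t1 > tau)

for sn in (0.0, 0.5, 1.0, 2.0):
    print(sn, quantizer_array_pd(sn))
```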
116 0 0.5 1 1.5 2 2.5 3 3.5 4 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 Standard deviation σ N of the additive uniform quantizer noise N Probability of Detection P D Number of samples K = 50 False−alarm probability P FA = 0.1 Signal strength A = 1 Quantizer noise N q ~ U(0,σ N ) Number of Quatizers Q = 1,4,8,16,32, ∞ Channel noise V k ~ SαS(α = 1.85, γ =2.67, δ = 0) Optimal SαS detector Q = 1 Q = 4 Q = 16 Q = 8 Q = 32 Q → ∞ Figure 5.4: Initial SR effects in a nonlinear correlation detector for dc signal detection in impul- sive symmetric-stable channel noise ( = 1.85). The solid lines show that the detection probabilityP D improves at first as the quantizer noise intensity N increases. The dashed line shows that the SR effect does not occur ifQ = 1 as Theorem 5.2(a) predicts. The solid lines also show that the rate of the initial SR effect increases as the number Q of quantizers increases as Theorem 5.2(b) predicts. Theorem 5.2(b) implies that the limit lim Q!1 P D (g NQ ) gives the upper bound on the noisy quantizer-array detector’s detection probability. The right side of (5.11) is the conditional sample mean of the bipolar random variableY k;q =sign(X k +N q ) =sign(X k +N q A=2) given X k =x k because =A=2 and because the random variablesN q are i.i.d. Then the conditional expected value has the form E[Y k;q jX k =x k ] = 1 2F N ( N ;A=2x k ): (5.74) The CDFF N ( N ;) is the CDF of the symmetric scale-family quantizer noiseN q that has standard deviation N and that has CDFF~ N for the entire family. Then the strong law of large numbers [41] 117 implies that the sample meang NQ (X k ) in (5.11) converges with probability one to its population mean: lim Q!1 g NQ (X k ) = 1 2F N ( N ;A=2X k ) (5.75) = 1 2F~ N ( A=2X k N ) (5.76) = 2F~ N ( X k A=2 N ) 1 (5.77) Equality (5.77) follows because the quantizer noise is symmetric. So Theorem 5.2(b) implies that the detection probabilityP D (g N1 ) of the limiting nonlinear correlation detector N1 (X) = K X k=1 s k g N1 (X k ) H 1 > < H 0 ~ (5.78) with g N1 (X k ) = 2F~ N ( X k A=2 N ) (5.79) gives the upper bound on the detection performance of the quantizer-array detector when the quantizer noiseN has standard deviation N and when the scale-family CDF isF~ N . We can use this limiting non-noisy nonlinear correlation detector (5.78)-(5.79) if the quantizer noise CDF expressionF~ N ( A=2x k N ) has a simple closed form. Simulations show that the detection performance of the noisy quantizer-array detector quickly approaches the detection performance of the limiting quantizer-array (Q!1) detectors. So we can use a noisy quantizer-array de- tector (5.10)-(5.11) withQ near 100 to get a detection performance close to that of the limiting quantizer-array (Q!1) detector. 118 The limiting nonlinearityg N1 (X k ) is easy to use for symmetric uniform quantizer noise be- cause it is a shifted soft-limiter with shift =A=2: g N1 (X k ) = 8 > > > > > > < > > > > > > : X k A=2 ifjX k A=2jc 1 if (X k A=2)>c 1 if (X k A=2)<c (5.80) where c = p 3 N . Figure 4 shows that the limiting nonlinear correlation detector (5.78)-(5.79) with the shifted soft-limiter nonlinearity (5.80) gives almost the same detection performance as the optimalSS detector (5.3). We used the numerical method of [174] to compute theSS pdf f . The limiting-array nonlinearity (5.79) is monotonic non-decreasing while the asymptotic be- havior of the locally optimal nonlinearity in (5.7) is g LO (X k ) ( + 1)=X k . 
So if the signal strength A is small then quantizer-array-based detectors cannot perform better than nonlinear correlation detectors with nonmonotonic nonlinearities such as g LSO-P (X k ) = 8 > > < > > : (+1)X k c 2 ifjX k jc (+1) X k else (5.81) in [33, 32] or such as g(X k ) = aAs k X k 1 +bAs k X 2 k (5.82) in [280]. 119 0 0.5 1 1.5 2 2.5 3 0.78 0.8 0.82 0.84 0.86 0.88 0.9 0.92 Standard deviation σ N of the additive quantizer noise Probability of Detection P D Signal strength A = 1 Number of samples K = 50 Number of quantizers Q = 16 False−alarm probability P FA = 0.1 Symmetric α−stable Channel noise V k ~ SαS(α = 1.85, γ =2.67, δ = 0) Uniform Laplacian Gaussian Discrete bipolar Figure 5.5: Comparison of initial SR effects in the nonlinear correlation detector for dc signal detection in-stable channel noise ( = 1.85) for different types of symmetric quantizer noise. Symmetric uniform noise gives the maximal rate of the initial SR effect as Theorem 5.2(c) predicts. Sym- metric discrete bipolar noise gives the smallest SR effect and is the least robust. The SR noise benefit is most robust against Laplacian quantizer noise in these four cases. Figure 5.5 shows simulation instances of Theorem 5.2(c). It compares the initial SR noise benefits for different types of simple zero-mean symmetric quantizer noises such as Laplacian, Gaussian, uniform, and discrete bipolar noise when there areQ = 16 quantizers in the array. Sym- metric uniform noise gave the maximal rate of the initial SR effect as Theorem 5.2(c) predicts. It also gives the maximal SR effect (maximum increase in the detection probability) compared to Laplacian, Gaussian, and discrete bipolar noise. Theorem 5.2(c) guarantees only a maximal rate for the initial SR effect. It does not guarantee a maximal SR effect for symmetric uniform noise. So some other type of symmetric quantizer noise may give the maximal SR effect in other detec- tion problems. Figure 5.5 also shows that symmetric discrete bipolar noise gives the smallest SR effect and is the least robust. The SR effect was most robust against Laplacian quantizer noise. 120 We next compare the detection performance of the Cauchy detector (5.6) with that of other nonlinear correlators. These other nonlinear correlation detectors are the soft-limiter nonlinearity (5.8), the hole-puncher nonlinearity (5.9), and the simple limiting-array (Q!1) nonlinearity (5.80) that corresponds to symmetric uniform quantizer noise. Figure 5.6 shows that the limiting- array (Q!1) detector (5.80) outperforms the Cauchy detector and the nonlinear correlation detectors based on the soft-limiter and the hole-puncher nonlinearities. Figure 5.6(a) plots their detection probabilitiesP D versus the false-alarm probabilitiesP FA . Figure 5.6(b) plots their de- tection probabilityP D versus theSS channel noise dispersion . The respective zero-location SS channel noiseV has the characteristic exponent = 1.6 and signal strengthA = 1.5. All the detectors in the Figure 5.6 useK = 50 samples. Figure 5.6 shows that the simple shifted-softlimiter detector gives the best performance among all the detectors. Its performance advantage is greatest for small false-alarm probabilities and for large dispersion values of theSS channel noise. The detectors perform equally well for large false-alarm probabilities (near 0.1) and for small dispersion values (near 1) of theSS channel noise. 
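For reference, here is a sketch of the three simple correlator nonlinearities compared in Figures 5.6-5.7 together with the correlation statistic they feed. The exact scalings of the soft-limiter (5.8) and hole-puncher (5.9) are not shown in this excerpt, so the forms below are the usual textbook ones and should be checked against the definitions earlier in the chapter; the Student-t channel noise is only a stand-in so the example runs without a stable-noise sampler.

import numpy as np

def shifted_soft_limiter(x, c, shift):
    """Limiting-array (Q -> infinity) nonlinearity (5.80): a soft-limiter shifted by theta = A/2."""
    return np.clip(x - shift, -c, c)

def soft_limiter(x, c):
    return np.clip(x, -c, c)

def hole_puncher(x, c):
    return np.where(np.abs(x) <= c, x, 0.0)

def correlation_statistic(x, s, g, **kwargs):
    """Nonlinear correlation test statistic sum_k s_k g(x_k), compared to a threshold."""
    return np.sum(s * g(x, **kwargs))

# toy usage: K = 50 samples of a known dc signal in heavy-tailed (Student-t) channel noise
rng = np.random.default_rng(7)
K, A, c = 50, 1.0, 2.0
s = np.ones(K)                                   # known signal sequence
x = A * s + rng.standard_t(df=1.8, size=K)       # stand-in for impulsive channel noise
print(correlation_statistic(x, s, shifted_soft_limiter, c=c, shift=A / 2))
print(correlation_statistic(x, s, soft_limiter, c=c))
print(correlation_statistic(x, s, hole_puncher, c=c))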
The best breakpoint or saturation parameterc for the limiting quantizer-array detector, soft- limiter detector, and hole-puncher detector depends on the impulsiveness of the channel noise and on the signal-to-noise ratio. So we searched for the best parameterc in the interval [0; 30] with an increment of 0.2 for these detectors and computed their respective detection probabilities for the best parameterc in the discrete parameter space [0 : 0:2 : 30]. Figure 5.7 compares the detection performances of various nonlinear detectors for four dif- ferent values of theSS tail-thickness parameter. Figures 5.7(a)-5.7(d) plot these detection probabilities P D versus the signal strengthA for (a) = 1, (b) = 1.3, (c) = 1.6, and (d) = 1.9 for the false-alarm probability P FA = 0.005 and when the number of samples is K = 50. 121 10 −3 10 −2 10 −1 10 0 False−alarm Probability P FA Probability of Detection P D Quantizer−array−based (Q → ∞) nonlinearity Cauchy detector Softlimiter nonlinearity Holepuncher nonlinearity 0.6 SαS channel noise (α = 1.6, γ = 1, δ = 0) Signal Strength A = 1.5 Number of Samples K = 50 (a) 1 1.5 2 2.5 3 3.5 4 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 Dispersion γ of SαS channel noise Probability of Detection P D Quantizer−array−based (Q → ∞) nonlinearity Cauchy detector Softlimiter nonlinearity Holepuncher nonlinearity SαS channel noise (α = 1.6, δ = 0) Signal Strength A = 1.5 False−alarm probability P FA = 0.005 Number of samples K = 50 (b) Figure 5.6: Comparison of Neyman-Pearson detection performance of four nonlinear detectors for different values of (a) false-alarm probabilitiesP FA and (b)SS channel noise dispersion when the signal strengthA = 1 and the number of samplesK = 50. Both figures show that the limiting (Q!1) quantizer-array-based detector (5.80) performs better than the Cauchy detector and better than the nonlinear correlation detectors based on soft-limiter and hole- puncher nonlinearities. 122 0.6 0.8 1 1.2 1.4 1.6 1.8 2 2.2 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Signal Strength A Probability of Detection P D Comparison of detection probability P D vs. signal strength A for different detectors Cauchy detector Quantizer−array−based (Q → ∞) nonlinearity Softlimiter nonlinearity Holepuncher nonlinearity SαS channel noise (α = 1, γ = 2, δ = 0) False−alarm probability P FA = 0.005 0.6 0.8 1 1.2 1.4 1.6 1.8 2 2.2 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Signal Strength A Probability of Detection P D Comparison of detection probability P D vs. signal strength A for different detectors Cauchy detector Quantizer−array−based (Q → ∞) nonlinearity Softlimiter nonlinearity Holepuncher nonlinearity SαS channel noise (α = 1.3, γ = 2 1.3 , δ = 0) False−alarm probability P FA = 0.005 (a) (b) 0.6 0.8 1 1.2 1.4 1.6 1.8 2 2.2 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Signal Strength A Probability of Detection Comparison of Detection Performance P D vs. 
signal Strength A for different detectors Quantizer−array−based (Q → ∞) nonlinearity Cauchy detector Softlimiter nonlinearity Holepuncher nonlinearity SαS channel noise (α = 1.6, γ = 2 1.6 , δ = 0) False−alarm probability P FA = 0.005 0.6 0.8 1 1.2 1.4 1.6 1.8 2 2.2 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Signal Strength A Probability of Detection P D Comparison of detection probability P D vs.signal strength A for different detectors Quantizer−array−based (Q → ∞) nonlinearity Softlimiter nonlinearity Holepuncher nonlinearity Cauchy detector SαS channel noise (α = 1.9, γ = 2 1.9 , δ = 0) False−alarm probability P FA = 0.005 (c) (d) Figure 5.7: Neyman-Pearson detection performance of four nonlinear detectors for different val- ues of the signal strengthA when the tail-thickness parameter of theSS channel noise is (a) = 1, (b) = 1.3, (c) = 1.6, and (d) = 1.9. The Cauchy detector gives the best performance for the highly impulsive noise cases (a) and (b). The limiting-array (Q!1) detector (shifted soft-limiter) outperforms the other detectors for the medium-to- low impulsive noise cases (c) and (d). The performance difference between the shifted soft-limiter and the soft-limiter detector increases as falls and thus as theSS channel noise becomes more impulsive. The shifted soft-limiter detector, the soft-limiter detector, and the hole-puncher detectors perform similarly for mildly impulsiveSS channel noise (d). These simulation plots show that the limiting (Q!1) quantizer-array-based detector (shifted 123 soft-limiter) is the best simple nonlinear correlation detector for medium-to-low impulsiveSS noise cases (1.6 1.9). The Cauchy detector gave the best performance for the highly impulsive noise cases in Fig- ures 5.7(a)-5.7(b). This performance gain came at a higher cost of computational complexity. The Cauchy detector did not perform better than the shifted soft-limiter, the soft-limiter detector, or hole-puncher detector as increased and thus as the impulsiveness of theSS channel noise de- creased (case (d)). The shifted soft-limiter outperformed the other detectors in the medium-to-low impulsive noise cases (c) and (d). The performance difference between the shifted soft-limiter and the soft-limiter detector increased as theSS channel noise becomes more impulsive. The shifted soft-limiter detector, the soft-limiter detector, and the hole-puncher detectors had similar detection performances when = 1.9 (case (d)). The soft-limiter detector is more suitable for mild impulsiveSS noise (when> 1.9). Figures 5.7(a)-5.7(d) also show that the performance difference among all the detectors decreased when the signal strengthA became too low or too high. 5.4 Maximum-Likelihood Binary Signal Detection in Symmetric- Stable or Generalized-Gaussian Channel Noise We now consider the maximum-likelihood (ML) detection of a known deterministic bipolar sig- nal sequence s k of unknown amplitude A in either additive i.i.d. SS channel noise V k or generalized-Gaussian channel noise V k . We assume that the noise pdf parameters are also un- known. The ML detection usesK observed samplesX 1 , ...,X K : 124 H 0 : X k =As k +V k H 1 : X k =As k +V k : (5.83) Heres k is a known bipolar sequence that takes only the values1 for allk. The ML detector for (5.83) is a log-likelihood ratio test [252, 280]: ML (X) = K X k=1 log(f(X k -As k ))- log(f(X k +As k )) H 1 > < H 0 0 (5.84) Again the optimal ML detector (5.84) does not have a closed form when the channel noise isSS except for = 1 and = 2. 
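The alpha = 1 exception is the Cauchy density, whose closed form makes the log-likelihood ratio (5.84) explicit. The sketch below is an illustration of that special case and assumes the amplitude A and dispersion gamma are known.

import numpy as np

def cauchy_llr_detector(x, s, A, gamma):
    """Return +1 (decide H1) or -1 (decide H0) for the test (5.83) via the LLR (5.84)."""
    # log f(x - A s) - log f(x + A s) for the Cauchy pdf f(v) = gamma / (pi (gamma^2 + v^2))
    llr = np.log(gamma**2 + (x + A * s) ** 2) - np.log(gamma**2 + (x - A * s) ** 2)
    return 1 if np.sum(llr) > 0 else -1

rng = np.random.default_rng(0)
K, A, gamma = 50, 0.5, 1.0
s = rng.choice([-1.0, 1.0], size=K)              # known bipolar sequence
x = A * s + gamma * rng.standard_cauchy(K)       # H1 data: signal plus Cauchy channel noise
print("decision:", cauchy_llr_detector(x, s, A, gamma))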
Nor does the optimal ML detector have a correlation structure if< 2 and thus if theSS channel noiseV k is not Gaussian. The optimal ML detector for the hypothesis test (5.83) has the different form ML NL (X) = K X k=1 [jX k +As k j r jX k -As k j r ] H 1 > < H 0 0: (5.85) if the symmetric channel noise variablesV k are i.i.d. generalized Gaussian random variables with pdf f V k (v k ) = De Bjv k j r (5.86) 125 for V k . Here r is a positive shape parameter, B is an intensity parameter, and D is a normal- izing constant. The generalized Gaussian family [191] is a two-parameter family of symmetric continuous pdfs. The scale-family pdff(;n) of a generalized Gaussian noise has the form f(;n) = 1 f gg n = 1 r 2 [(3=r)] 1=2 [(1=r)] 3=2 e Bj n j r (5.87) wheref gg is the standard pdf of the family, is the standard deviation, and is the gamma func- tion. This family of pdfs includes all normal (r = 2) and Laplace (r = 1) pdfs. It includes in the limit (r!1) all continuous uniform pdfs on bounded real intervals. It can also model sym- metric platykurtic densities whose tails are heavier than normal (r< 2) or symmetric leptokurtic densities whose tails are lighter than normal (r > 2). Applications include noise modeling in image, speech, and multimedia processing [19, 65, 86, 146]. Generalized Gaussian noise can apply to watermark detection or extraction [51, 106, 193, 260]. The generalized-Gaussian ML detector (5.85) does not use the scale parameter. So (5.85) applies to watermark extraction in images when generalized Gaussian random variablesV k model mid-frequency discrete cosine transform (DCT) coefficients [28, 106] or subband discrete wavelet transform coefficients [268, 51]. But the mid-frequency DCT-coefficients of many images may have fatter tails than generalized Gaussian pdfs have. And using the generalized-Gaussian ML detector (5.85) may be difficult for non-Gaussian (r6= 2) noise because (5.85) requires joint estimation of the signal and noise parameters and because (5.85) also requires exponentiation with floating point numbers whenr6= 2. So Briassouli and Tskalides [31] have proposed using instead the Cauchy pdf to model the DCT coefficients. 126 The ML Cauchy detector has the form C NL (X) = K X k=1 log 2 + (X k As k ) 2 2 + (X k +As k ) 2 H 1 > < H 0 0: (5.88) It does not use exponentiation with floating point numbers. But the nonlinear detectors (5.85) and (5.88) require that we know the signal amplitudeA. And the ML estimation ofA is not easy for either detector when the bipolar sequences k varies withk. We next analyze a noisy quantizer-array correlation statistic g NQ and its limit g N1 . Neither uses the value ofA for the ML detection of (5.83). These nonlinearities are versions of (5.11) and (5.79) withA = 0. We show that the results of Theorems 5.1 and 5.2 also hold for the ML correlation detectors based ong NQ andg N1 . 5.5 Noise Benefits in Maximum-Likelihood Detection Using Quantizer- Array-Based Nonlinear Correlators The next two corollaries extend Theorems 5.1 and 5.2 to the ML detection problem in (5.83). 
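Before the corollaries, a brief sketch of the generalized-Gaussian benchmark detector (5.85) that the array detectors are later compared against (for example in Figure 5.8). The sampling step uses scipy.stats.gennorm, whose shape parameter plays the role of r in (5.86)-(5.87); the known amplitude A and known shape r are exactly the requirements the text flags for this detector.

import numpy as np
from scipy import stats
from scipy.special import gamma as Gamma

def gg_noise(r, sigma, size, rng):
    """Zero-mean generalized-Gaussian noise with shape r and standard deviation sigma."""
    scale = sigma * np.sqrt(Gamma(1.0 / r) / Gamma(3.0 / r))
    return stats.gennorm(beta=r, scale=scale).rvs(size=size, random_state=rng)

def gg_ml_statistic(x, s, A, r):
    """Generalized-Gaussian detector (5.85): decide H1 when the statistic is positive."""
    return np.sum(np.abs(x + A * s) ** r - np.abs(x - A * s) ** r)

rng = np.random.default_rng(3)
K, A, r, sigma = 75, 0.5, 1.2, 2.0               # parameters used in Figure 5.8
s = rng.choice([-1.0, 1.0], size=K)
x = A * s + gg_noise(r, sigma, K, rng)           # data generated under H1
print("decide H1" if gg_ml_statistic(x, s, A, r) > 0 else "decide H0")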
We first restate the noisy quantizer-array-based correlation statisticg NQ (5.11) and its limitg N1 (5.79) with = 0 for the maximum-likelihood detection of hypotheses (5.83): NQ (X) = K X k=1 s k g NQ (X k ) H 1 > < H 0 0 (5.89) where g NQ (X k ) = Q X q=1 sign(X k +N q ) (5.90) 127 and N1 (X) = K X k=1 s k g N1 (X k ) H 1 > < H 0 0 (5.91) where g N1 (X k ) = 2F~ N ( X k N ) 1: (5.92) We use = 0 because the pdfs of the random samplesX k are symmetric aboutAs k andAs k given the hypotheses H 0 and H 1 of (5.83) and because both hypotheses are equally likely (p 0 = p 1 ). So the two ML detectors do not require that we know the signal amplitude A. The generalized-Gaussian ML detector (5.85) and the Cauchy ML detector (5.88) do require such knowledge. Corollary 5.1 below requires that the mean and variance of the detection statistics NQ and N1 in (5.89) and (5.91) obey 0 ( N ) = 1 ( N ) and 2 0 ( N ) = 2 1 ( N ) for all N . These equali- ties hold because (5.89)-(5.92) imply that NQ jH 0 = NQ jH 1 and N1 jH 0 = N1 jH 1 . The pdf of NQ is again approximately Gaussian for either hypothesis because the central limit theorem [41] applies to (5.10) if the sample sizeK is large for i.i.d. random variables with finite variances. Then the SR noise-benefit conditions of Theorems 5.1 and 5.2 also hold for the quantizer-array ML detector (5.89)-(5.90) and for its limiting-array (Q!1) ML detector (5.92). The proof of Corollary 5.1 mirrors that of Theorem 5.1. We state it for completeness because it is brief. 128 Corollary 5.1 Suppose that NQ jH 0 N( 0 ( N ); 2 0 ( N )) and NQ jH 1 N( 1 ( N ); 2 1 ( N )) where 0 ( N ) = 1 ( N ) and 2 0 ( N ) = 2 1 ( N ). Then the inequality 1 ( N ) 0 1 ( N ) > 1 ( N ) 0 1 ( N ) (5.93) is necessary and sufficient for the SR effect in the ML detection of (5.83) using the quantizer- array-based detector (5.89)-(5.90). Proof: The ML detection rule for NQ rejectsH 0 if NQ > 0. Then the correct-decision probabilityP CD = 1 -P e has the form P CD ( N ) = 1 2 P ( NQ < 0jH 0 ) + 1 2 P ( NQ > 0jH 1 ) (5.94) = 1 ( N ) 0 ( N ) (5.95) because 0 ( N ) = 1 ( N ) and 2 0 ( N ) = 2 1 ( N ). This gives dP D ( N ) d N = 1 ( N ) 1 ( N ) 1 ( N ) 0 1 ( N ) 1 ( N ) 1 0 ( N ) 2 1 ( N ) : (5.96) So 1 ( N ) 0 1 ( N )> 1 ( N ) 0 1 ( N ) is necessary and sufficient for the SR effect ( dP D d N > 0) because is the pdf of the standard normal random variable. 129 The lengthy proof of Corollary 5.2 below is nearly the same as the proof of Theorem 5.2. It replaces the detection probability P D with the correct-decision probability P CD and uses a zero threshold. We omit it for reasons of space. Corollary 5.2 (a) Q> 1 is necessary for the initial SR effect in the quantizer-array detector (5.89)-(5.90) for the ML detection of (5.83) in any symmetric unimodal channel noise. (b) Suppose that the initial SR effect occurs withQ 1 quantizers and with some symmetric quan- tizer noise in the quantizer-array detector (5.89)-(5.90). Then the rate of the initial SR effect in the quantizer-array detector (5.89)-(5.90) withQ 2 quantizers is larger than the rate of initial SR effect withQ 1 quantizers ifQ 2 >Q 1 . (c) Zero-mean uniform noise is the optimal finite-variance symmetric scale-family quantizer noise in that it gives the maximal rate of the initial SR effect among all possible finite-variance quantizer noise in the ML quantizer-array detector (5.89)-(5.90). Figure 5.8 shows the predicted SR effects in the ML detection of (5.83) in generalized- Gaussian channel noise for quantizer-array detectors. 
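A compact sketch of the two ML array detectors just restated. Neither branch uses the amplitude A. The Gaussian channel noise in the usage lines is only a stand-in so the example runs without a stable-noise or generalized-Gaussian sampler.

import numpy as np
from scipy.stats import norm

def array_ml_decision(x, s, sigma_N, Q, rng, noise_cdf=None):
    """Decide the hypothesis in (5.83): return +1 for H1, -1 for H0."""
    if Q is None:
        # limiting (Q -> infinity) detector (5.91)-(5.92): g(x) = 2*F(x/sigma_N) - 1
        g = 2.0 * noise_cdf(x / sigma_N) - 1.0
    else:
        # finite array (5.89)-(5.90): sum of Q noisy sign quantizers per sample
        noise = sigma_N * rng.standard_normal((Q, x.size))   # Gaussian quantizer noise
        g = np.sign(x[None, :] + noise).sum(axis=0)
    return 1 if np.sum(s * g) > 0 else -1

rng = np.random.default_rng(11)
K, A, sigma_N, Q = 75, 0.5, 0.6, 16
s = rng.choice([-1.0, 1.0], size=K)
x = -A * s + rng.normal(scale=2.0, size=K)       # toy H0 data with Gaussian stand-in channel noise
print(array_ml_decision(x, s, sigma_N, Q, rng))                          # finite array
print(array_ml_decision(x, s, sigma_N, None, rng, noise_cdf=norm.cdf))   # limiting array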
These array-based detectors use Gaussian quantizer noise andK = 75 samples. The signal is a bipolar sequence with amplitudeA = 0.5. The channel noise is generalized-Gaussian with parametersr = 1.2 and = 2. The rate of the initial SR effect increases as the numberQ of quantizers increases as Corollary 5.2(b) predicts. 130 0.2 0.4 0.6 0.8 1 1.2 1.4 0.006 0.01 0.02 0.03 0.04 0.05 0.06 Standard deviation σ N of the Gaussian quantizer noise Probability of Error P e Q = ∞ Optimal Generalized−Gaussian Detector Quantizer noise N q ~ N(0,σ N ) Number of Quatizers Q = 1,3,6,16,32, ∞ Number of samples K = 75 Signal strength A = 0.5 Channel noise V k ~ GG(r = 1.2, σ = 2) Q = 32 Q = 16 Q = 6 Q = 3 Q = 1 Figure 5.8: Initial SR effects in maximum-likelihood quantizer-array detection. The signal is a bipolar sequence of amplitudeA = 0.5. The channel noise is generalized-Gaussian with parameters r = 1.2 and = 2. The quantizer noise is zero-mean Gaussian. The initial SR effect does not occur ifQ = 1 as Corollary 5.2(a) predicts. The rate of initial SR effect increases as the number of quantizersQ increases as Corollary 5.2(b) predicts. The thick dashed line shows the error probability of the respective limiting-array (Q!1) detector. This detection performance is nearly optimal compared to the optimal generalized-Gaussian detector (5.85) (thick horizontal dashed-dot line). The thick dashed line shows the error probability of the limiting-array (Q!1) ML correlation detector (5.91) with limiting-array Gaussian-quantizer-noise nonlinearity g N1 (X k ) = 2( X k N ) 1 (5.97) where we have replacedF~ N in (5.92) with the standard normal CDF . The thick horizontal dash- dot line shows the error probability of the optimal generalized-Gaussian ML detector (5.85). The limiting-array (Q!1) detector does not require that we know the signal amplitudeA. It still gave almost the same detection performance as the optimal generalized-Gaussian detector gave. 131 0.2 0.4 0.6 0.8 1 1.2 1.4 0.01 0.009 0.008 0.007 Standard deviation σ N of quantizer noise Probability of Error P e Channel noise V k ~ GG(r = 1.2, σ = 2) Discrete bipolar Uniform Laplacian Gaussian Signal strength A = 0.5 Number of samples K = 75 Number of quantizers Q = 16 Figure 5.9: Comparison of initial SR noise benefits in the maximum-likelihood quantizer-array detector for four different types of quantizer noise. Symmetric uniform noise gave the maximal rate of the initial SR effect as Corollary 5.2(c) predicts. But Gaussian noise gave the best peak SR effect because it had the maximum decrease in error probability. Laplacian quantizer noise gave the most robust SR effect and had almost the same peak SR effect as the Gaussian noise had. Symmetric discrete bipolar noise gave the smallest SR effect and was least robust. 132 The simulation results in Figure 5.9 show the initial SR-rate optimality of symmetric uniform quantizer noise. The array detector used Q = 16 quantizers and K = 75 samples. The signal and channel noise are the same as in Figure 5.8. The symmetric uniform quantizer noise gave the maximal rate of the initial SR effect as Corollary 5.2(c) predicts. Gaussian noise gave the best peak SR effect in the sense that it had the maximum decrease in error probability. Laplacian quantizer noise gave the most robust SR effect and had almost the same peak SR effect as the Gaussian noise had. Symmetric discrete bipolar noise gave the smallest SR effect and was least robust. Figures 5.10-5.11 show simulation instances of Corollary 5.2 for SS channel noise. 
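A rough Monte Carlo sketch of this kind of error-probability sweep, here in the generalized-Gaussian channel noise of Figure 5.8 (r = 1.2, sigma = 2) with Gaussian quantizer noise; the SalphaS setup of Figures 5.10-5.11 only changes the channel-noise sampler. This is not the dissertation's code, and the trial count is deliberately small, so the printed error estimates are noisy.

import numpy as np
from scipy import stats
from scipy.special import gamma as Gamma

def run_trial(sigma_N, Q, K=75, A=0.5, r=1.2, sigma=2.0, rng=None):
    s = rng.choice([-1.0, 1.0], size=K)
    b = rng.choice([-1.0, 1.0])                              # true hypothesis: -1 = H0, +1 = H1
    scale = sigma * np.sqrt(Gamma(1 / r) / Gamma(3 / r))
    v = stats.gennorm(beta=r, scale=scale).rvs(size=K, random_state=rng)
    x = b * A * s + v
    g = np.sign(x[None, :] + sigma_N * rng.standard_normal((Q, K))).sum(axis=0)
    decision = 1.0 if np.sum(s * g) > 0 else -1.0
    return decision != b

rng = np.random.default_rng(5)
trials = 2000                                                # increase for smoother curves
for Q in (1, 16):
    pe = [np.mean([run_trial(sn, Q, rng=rng) for _ in range(trials)])
          for sn in (0.05, 0.4, 0.8, 1.2)]
    print(f"Q = {Q:2d}: P_e at sigma_N = 0.05, 0.4, 0.8, 1.2 ->", np.round(pe, 3))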
The signal is again a bipolar sequence with amplitudeA = 0.5. The channel noise isSS with = 1.7 and = 0:5 1:7 . The noisy quantizer-array detector usedK = 75 samples and Laplacian quan- tizer noise. Figures 5.10 plots the error probability of the noisy quantizer-array detector versus the standard deviation of the Laplacian quantizer noise. The total SR effect did not occur ifQ = 1 just as Corollary 5.2(a) predicts. The rate of the initial SR effect increased as the numberQ of quantizers increased as Corollary 5.2(b) predicts. Figure 5.11 compares the initial SR effects for the simple symmetric quantizer using Lapla- cian, Gaussian, uniform, and discrete bipolar noise forQ = 15. Symmetric uniform noise again gave the maximal rate of the initial SR effect as Corollary 5.2(c) predicts. Symmetric uniform noise also gave the highest peak SR effect when compared with symmetric Laplacian, Gaussian, and discrete bipolar noise even though Corollary 5.2(c) does not guarantee such optimality for the peak SR effect. Gaussian noise gave almost the same peak SR effect as uniform noise did. 133 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 10 −6 10 −5 10 −4 10 −3 Standard deviation σ N of additive Laplacian quantizer noise Error Probability P e Q = 31 Q = 1 Q = 7 Q = 3 Q = 15 Channel noise V k ~ SαS(α = 1.7, γ = 0.5 1.7 , δ = 0) Signal amplitude A = 0.5 Number of samples K = 50 Number of quantizers Q = 1, 3, 7, 15, 31 Figure 5.10: Initial SR effects in the quantizer-array maximum-likelihood detection of a deter- ministic bipolar sequence s k of unknown amplitude A in infinite-variance SS channel noise with Laplacian quantizer noise. Initial SR effects in the quantizer-array maximum-likelihood detection of a deterministic bipolar sequence s k of unknown amplitudeA in infinite-varianceSS channel noise with Laplacian quantizer noise. The initial SR effect does not occur ifQ = 1 as Corollary 5.2(a) predicts. The rate of initial SR effect increases as the number of quantizersQ increases as Corollary 5.2(b) predicts. Gaussian noise was also more robust for the SR effect than uniform noise was. Figure 5.11 fur- ther shows that symmetric discrete bipolar noise gave the smallest SR effect and was the least robust. The SR effect was most robust against Laplacian quantizer noise. The symmetric channel noise must be unimodal in Theorem 5.2(a) and Corollary 5.2(a). But the SR effect can still occur even for a detector that has just one quantizer if the channel noise is multimodal. Figure 5.12 plots the error probability of the noisy quantizer-array detector versus the standard deviation of the Gaussian quantizer noise for the ML detection of a bipolar sequence 134 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 10 −6 10 −5 10 −4 Error Probability P e Standard deviation σ N of additive quantizer noise Discrete bipolar Uniform Gaussian Laplacian Signal amplitude A = 0.5 Number of samples K = 50 Number of quantizers Q = 15 Channel noise V k ~ SαS(α = 1.7, γ = 0.5 1.7 , δ = 0) Figure 5.11: Comparison of initial SR effects in the quantizer-array maximum-likelihood detec- tion of a deterministic bipolar sequence s k of unknown amplitude A in infinite-variance SS channel noise with different types of quantizer noise. The symmetric uniform noise gave the maximal rate of the initial SR effect as Corollary 5.2(c) predicts. Symmetric discrete bipolar noise gave the smallest SR effect and was least robust. Symmetric uniform noise had the peak SR effect in the sense that it had the maximum decrease in the error probability. 
Gaus- sian noise gave almost the same peak SR effect as uniform noise did. The SR effect was most robust against Laplacian quantizer noise. 135 0 0.2 0.4 0.6 0.8 1 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 Standard deviation σ N of additive Gaussian noise Probability of Error P e Q = 8 Q = 4 Q = 1 Q = 2 Bimodal Gaussian mixture Channel noise V k : f Vk (v) = 0.5φ(v,−0.7,0.3 2 ) + 0.5φ(v,0.7,0.3 2 ) Signal amplitude A = 0.5 Number of samples K = 20 Number of quantizers Q = 1, 2, 4, 8 Figure 5.12: SR effects in the quantizer-array maximum-likelihood detection of an antipodal signal sequence in symmetric bimodal Gaussian-mixture channel noise with Gaussian quantizer noise. The initial SR effect occurs even ifQ = 1 because the channel noise was multimodal. The rate of the initial SR effect still increased as the number of quantizersQ increased as Corollary 5.2(b) predicts. in bimodal Gaussian-mixture channel noise. Here the channel noise is a symmetric bimodal Gaussian mixture with pdf f 0 (n) = 1 2 e (n+0:7) 2 =2(0:3) 2 p 20:3 + 1 2 e (n0:7) 2 =2(0:3) 2 p 20:3 (5.98) with signal amplitudeA = 0.5. Figure 5.12 shows that the SR effect occurs forQ = 1 and that the rate of the initial SR effect increased as the numberQ of quantizers increased. 136 5.6 Watermark Decoding Using the Array-Based Correlation Detectors The above noisy nonlinear detectors and their limiting-array (Q ! 1) detectors can benefit digital watermark extraction. We will demonstrate this for blind watermark decoding based on direct-sequence spread spectrum. Digital watermarking helps protect copyrighted multimedia data because it embeds or hides a signal in the data without altering it too much [14, 82]. One can use watermarked data while it remains protected since it does not need decryption. Watermarking does not prevent copying but it lets one detect or trace some copyright violations. So good watermarking needs to be robust and secure as well as imperceptible. Watermark extraction or decoding is simple and accurate if the original host data is available. We just compare the unwatermarked data with the watermarked data to recover the embedded message. But the unwatermarked data may not be available in many multimedia applications. This includes tracking or video watermarking where the use of the original data is not practical because of its large volume. The applications may require blind watermark techniques so that the extraction processes do not use the original data but instead treat it as noise added in the hidden message [104]. Transform-domain watermarking techniques are more robust and tamper- proof compared to direct spatial watermarking [150, 82]. We will use the popular direct-sequence spread-spectrum watermarking in the DCT domain because it gives a robust but invisible water- mark and because it allows various types of detectors for blind watermark extraction [18, 62, 106]. DCT watermarking embeds or adds watermark signalsW [k] of anM 1 M 2 binary watermark imageB(n) in the DCT-2 coefficientsV [k] of anL 1 L 2 host imageH(n). Here k = (k 1 ;k 2 ) 137 and n = (n 1 ;n 2 ) are the respective 2-D indices in the transform domain and in the spatial domain. 
We apply an 8 8 block-wise DCT-2 transform [62] V (k 1 ;k 2 ) = 8 > > > > > > < > > > > > > : d( ~ k 1 )d( ~ k 2 ) P 7 ~ n 1 =0 P 7 ~ n 2 =0 H(8t + ~ n 1 ; 8u + ~ n 2 )cos h ~ k 1 (2~ n 1 +1) 16 i cos h ~ k 2 (2~ n 2 +1) 16 i for 0 ~ k 1 ; ~ k 2 7 0 otherwise (5.99) to the host imageH where ~ k 1 =k 1 mod 8, ~ k 2 =k 2 mod 8, t = largest integer smaller than or equal tok 1 =8,u = largest integer smaller than or equal tok 2 =8, and d( ~ k) = 8 > > < > > : 1= p 8 if ~ k = 0 1=2 if 1 ~ k 7 : (5.100) Using a block-wise DCT involves less computation than using a full-image DCT. The watermark imageB is anM 1 M 2 black-and-white image such that approximately half of its pixels are black and the rest are white. This watermark imageB gives a watermark message b = (b 1 ;:::;b M ) withM (=M 1 M 2 ) bipolar (1) bits such thatb i = 1 ifB(n 1 ;n 2 ) = 255 (white) andb i = -1 ifB(n 1 ;n 2 ) = 0 (black) wherei =n 1 + (n 2 1)M 1 . A secret key picks a pseudorandom assignment of disjoint subsetsD i of size or cardinalityK from the setD of mid-frequency DCT-coefficients for each message bitb i . DenoteI i as the set of two-dimensional indices of the DCT-coefficient setD i :I i =fk :V i [k]2D i g. ThenjD i j =jI i j = K. The secret key also gives the initial seed to a pseudorandom sequence generator that produces a bipolar spreading sequences i [k] of lengthK (k2I i ) for each message bitb i [106, 215]. Each s i is a pseudorandom sequence ofK i.i.d. random variables inf1;1g. The cross-correlation 138 between theseM spreading sequencesfs i g M i=1 is zero while their autocorrelation is the Kronecker delta function. We embed the information of bit b i in each DCT-coefficient V i [k] of the setD i using the respective spreading sequences i [k] of lengthK. Each bipolar message bitb i multiplies its biploar psuedorandom spreading sequence s i to “spread” the spectrum of the original message signal over many frequencies. Often a psychovisual DCT-domain perceptual mask gives the watermark embedding strengtha[k] to reduce the visibility of the watermark in the watermarked image [4]. This perceptual maska[k] also multiplies the psuedorandom sequences i [k] to give the watermark signalW i [k] =b i a[k]s i [k] for each k-pixel in the DCT domain. We used the constant perceptual maska[k] =A for all k and thus used a constant embedding strength. So the watermark signal W [k] at each k-pixel is eitherA orA. We then addW i [k] to the host-image DCT-2-coefficient V i [k]2D i . This gives the watermarked DCT-2 coefficientX i [k] =V i [k] +b i a[k]s i [k]. Then the inverse block-wise DCT-2 transform gives a watermarked imageH W [n]. Retrieving the hidden message b requires that we know the psuedorandom assignment of DCT-coefficientsfV i [k]j k2D i g to each message bitb i in b and that we also know the psuedo- random sequences i for eachb i . Then an attacker cannot extract the watermark without the secret key. Suppose that we do know the secret key. Then watermark decoding is equivalent to testing M binary hypotheses of the form H 0 (b i =1) : X i [k] =As i [k] +V i [k] H 1 (b i = +1) : X i [k] =As i [k] +V i [k] (5.101) 139 for all k2I i and fori = 1;:::;M. Heres i is the known bipolar signal sequence andA is the known or unknown signal amplitude. The optimal decision rule to decode the message bitb i is the ML rule W (X i ) = X k2I i log X[k]jH 1 X[k]jH 0 H 1 > < H 0 0 (5.102) because we assume that the DCT-2 coefficients V i [k] are i.i.d. random variables and that the message bitsb i are equally likely to be -1 or 1. 
The ML detection rule (5.102) becomes a simple linear correlator L (X i ) = X k2I i s i [k]X i [k] H 1 > < H 0 0 (5.103) if the DCT-2 coefficients are Gaussian random variables. The ML detection rule (5.102) becomes the generalized-Gaussian detector or decoder ML NL (X i ) = X k2I i [jX i [k]+As i [k]j r jX i [k]-As i [k]j r ] H 1 > < H 0 0: (5.104) or the Cauchy detector C NL (X i ) = X k2I i log 2 + (X i [k]As i [k]) 2 2 + (X i [k] +As i [k]) 2 H 1 > < H 0 0 (5.105) if the respective DCT-2 coefficients have a generalized Gaussian or a Cauchy pdf. But these optimal ML detectors require that we know the watermark-embedding strengthA and know their distribution parametersr or . Both the suboptimal quantizer-array detector 140 NQ (X i ) = X k2I i s i [k]g NQ (X i [k]) H 1 > < H 0 0 (5.106) where g NQ (X i [k]) = Q X q=1 sign(X i [k] +N q ) (5.107) and its limiting-array nonlinear correlation detector (5.91)-(5.92) N1 (X i ) = X k2I i s i [k]g N1 (X i [k]) H 1 > < H 0 0 (5.108) where g N1 (X i [k]) = 2F~ N ( X i [k] N ) 1: (5.109) require that we know the quantizer noise intensity N . They do not require that we know the embedding strengthA. The algorithm below summarizes the process of watermark embedding and watermark decoding. Figure 5.1 shows an SR noise benefit in the watermark-decoding performance of quantizer- array detector (5.106)-(5.107) and its limiting-array nonlinear correlation detector (5.108)-(5.109) when the quantizer noise is symmetric uniform noise. The simulation software was Matlab. Figure 5.1(a) shows the ‘yin-yang’ image. We use this binary (black and white) 6464 image as a hidden message to watermark the 512512 gray-scale ‘Lena’ image. Figure 1(b) shows the ‘Lena’ image watermarked with the ‘yin-yang’ image such that its peak signal-to-noise ratio (PSNR) is 46.6413 dB. We define the PSNR as the ratio of the maximum power of the original host image’s pixelsH[k] to the average power of the difference between the original imageH and the watermarked host imageH W : 141 Watermark Embedding 1. Compute the 8 8 block-wise DCT-2 transformV [k] of theL 1 L 2 host imageH[n] as in (5.99). 2. LetD be the set of mid-frequency DCT-2 coefficients of all 8 8 DCT blocks ofV [k]. 3. Convert theM 1 M 2 binary (black-and-white) watermark imageB(n) into anM 1 M 2 =M-bit bipolar watermark message b = (b 1 ;:::;b M ) such thatb i = 1 ifB(n 1 ;n 2 ) = 255 (white) elseb i = -1 ifB(n 1 ;n 2 ) = 0 (black) wherei =n 1 + (n 2 1)M 1 . 4. Use a secret key to psuedorandomly pickM disjoint subsetsD i of cardinalityK from the setD. 5. LetI i be the set of two-dimensional indices of the DCT-coefficient setD i . 6. Use the secret key to generateM psuedorandom bipolar spreading sequencess i [k] of lengthK wheres i [k] =1 for all k2I i , andi = 1, ...,M. 7. For each message bitb i : compute the watermark signalsW i [k] =b i a[k]s i [k] for all k2I i wherea[k] =A is the constant perceptual mask (constant embedding strength). 8. For each message bitb i : the watermark signalsW i [k] in each DCT-coefficientV i [k] of the setD i and get the watermarked DCT-2 coefficientsX i [k] =V i [k] +b i a[k]s i [k] for all k2I i . 9. Compute the inverse block-wise DCT-2 transform using the watermarked DCT-coefficientsX[k] to get a watermarked imageH W [n]. Watermark Decoding 1. Compute the 88 block-wise DCT-2 transform coefficientsX[k] of theL 1 L 2 watermarked host imageH W [n]. 2. Use the secret key to determine the index setsI i g fori = 1, ...,M. 3. 
ObtainM sets of watermarked DCT-coefficients X i =fX i [k]j k2I i g fori = 1, ...,M. 4. Use the secret key to reproduce the pseudo spreading sequencesfs i g M i=1 . 5. Find the decoded message bitsf ^ b i g M i=1 using any of the following ML decoders forM binary hypothesis tests in (5.101): Linear correlation detector (5.103) Generalized-Gaussian detector (5.104) Cauchy detector (5.105) Nonlinear noisy quantizer-array detector (5.106)-(5.107) Limiting-array (Q!1) correlation detector (5.108)-(5.109) 142 PSNR = 10log 10 " max k2f1;:::;L1gf1;:::;L 2 g H 2 [k] jH[k]H W [k]j 2 =(L 1 L 2 ) # : (5.110) Each DCT-coefficient setD i of the ‘Lena’ image in Figure 5.1(a) hides one message bit of the ‘yin-yang’ image using a psuedorandom bipolar spreading sequence s i . Matlab generated the pseudorandom sequence s i . The solid U-shaped line in Figure 5.1(c) shows the average pixel-detection errors of the ML noisy quantizer-array detector (5.106)-(5.107) for 200 randomly generated secret keys. The dashed vertical lines show the total min-max deviation of the pixel- detection errors in these simulation trials. The dashed U-shaped line shows the pixel-detection errors of the limiting-array (Q ! 1) correlation detector (5.108)-(5.109) where g N1 is the soft-limiter nonlinearity g SL of (5.8) because the quantizer noise is symmetric uniform noise. So the thick dashed line gives the lower bound on the pixel-detection error that any quantizer- array detector with symmetric uniform quantizer noise can achieve by increasing the number of quantizersQ in its array. Figures 5.1(d)-5.1(g) show the extracted ’yin-yang’ image using the ML linear correlation detector (5.103) and the ML noisy quantizer-array detector (5.106)-(5.107). The noisy quantizer- array nonlinear detector outperforms the linear correlation detector. Figures 5.1(e)-5.1(f) show that adding uniform quantizer noise improved the watermark detection. The pixel-detection errors decreased by more than 33% as the uniform quatizer noise standard deviation increased from = 0 to = 1. Figure 5.1(g) shows that too much quantizer noise degrades the watermark detection. But the SR effect is robust against the quantizer noise intensity because the pixel-detection error in (g) is still less than the pixel-detection errors in (d) and (e). 143 (a) Elaine (b) Goldhill (b) Pirate (d) Bird (e) Peppers (f) Tiffany Figure 5.13: Six different 512512 host images watermarked with the ‘yin-yang’ image in Figure 5.1(a). (a) Elaine, (b) Goldhill, (c) Pirate, (d) Bird, (e) Peppers, and (f) Tiffany. The peak signal-to-noise ratio (PSNR) of each watermarked image was between 45 dB and 47 dB. Figure 5.13 shows six other images watermarked with the ‘yin-yang’ image. We used the watermarking strengthA such that the PSNRs of all these watermarked images remained between 45 to 47 db. Figure 5.14 shows the average watermark-decoding performances of the ML Cauchy detector, the limiting-array (Q!1) detector for the symmetric uniform quantizer noise (soft- limiter correlation detector), and the generalized-Gaussian detector for various values of their respective pdf parameters ( , N , or r). We used over 200 simulation trials for each of these six images and for the ‘Lena’ image to compute the average pixel-detection errors in watermark decoding. A set of randomly generated 200 secret keys allocated the DCT-2-coefficients and produced the spreading sequences for the watermarking process of each host image. 
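For concreteness, a condensed sketch of the embedding steps listed above together with the PSNR definition (5.110). The 8 x 8 orthonormal DCT-2 in (5.99)-(5.100) coincides with scipy.fft.dctn/idctn with norm='ortho'; the particular mid-frequency mask, the toy host image, and the helper names are illustrative choices, not the dissertation's.

import numpy as np
from scipy.fft import dctn, idctn

def blockwise(img, fn, bs=8):
    out = np.empty_like(img, dtype=float)
    for i in range(0, img.shape[0], bs):
        for j in range(0, img.shape[1], bs):
            out[i:i+bs, j:j+bs] = fn(img[i:i+bs, j:j+bs])
    return out

def embed_watermark(host, bits, A, key, coeffs_per_bit):
    rng = np.random.default_rng(key)                         # plays the role of the secret key
    V = blockwise(host, lambda b: dctn(b, type=2, norm='ortho'))
    # illustrative mid-frequency mask: coefficients with index sum 5..9 inside each 8x8 block
    ii, jj = np.meshgrid(np.arange(8), np.arange(8), indexing='ij')
    mid = (ii + jj >= 5) & (ii + jj <= 9)
    cand = np.argwhere(np.kron(np.ones((host.shape[0] // 8, host.shape[1] // 8), bool), mid))
    picks = rng.permutation(len(cand))[:len(bits) * coeffs_per_bit]
    for m, b in enumerate(bits):                             # spread each bit over K coefficients
        idx = cand[picks[m * coeffs_per_bit:(m + 1) * coeffs_per_bit]]
        spread = rng.choice([-1.0, 1.0], size=coeffs_per_bit)
        V[idx[:, 0], idx[:, 1]] += b * A * spread            # constant embedding strength A
    return blockwise(V, lambda b: idctn(b, type=2, norm='ortho'))

def psnr(host, marked):
    """PSNR as defined in (5.110)."""
    mse = np.mean((host - marked) ** 2)
    return 10 * np.log10(host.max() ** 2 / mse)

host = np.random.default_rng(0).uniform(0, 255, size=(64, 64))   # toy stand-in "image"
marked = embed_watermark(host, bits=[1, -1, 1, 1], A=2.0, key=42, coeffs_per_bit=50)
print(f"PSNR = {psnr(host, marked):.1f} dB")

Decoding reverses these steps: regenerate the index sets and spreading sequences from the same key and apply any of the bit detectors in (5.103)-(5.109).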
We then applied all three detectors to each of the seven watermarked images to decode the watermark using each secret key.

[Figure 5.14 appears here: three panels of average pixel-error curves for the Elaine, Goldhill, Pirate, Fish, Peppers, Lena, and Tiffany images, plotted against (a) the dispersion γ of the Cauchy distribution, (b) the standard deviation σ_N of the symmetric uniform quantizer noise, and (c) the shape parameter r of the generalized Gaussian distribution.]

Figure 5.14: Watermark decoding performance of the three detectors: (a) Cauchy detector, (b) limiting (Q → ∞) array detector, and (c) generalized-Gaussian detector. These plots show the detectors' average pixel-detection errors in watermark decoding for sampled values of their respective pdf parameters γ, r, and σ_N.

Figure 5.14(a) plots the Cauchy detector's average pixel-detection errors versus the dispersion γ for each watermarked image. Figures 5.14(b)-5.14(c) plot the respective average pixel-detection errors for the soft-limiter correlation detector and the generalized-Gaussian detector against the standard deviation σ_N of the uniform quantizer noise and the shape parameter r of the generalized-Gaussian pdf. Table 5.1 shows the minimum value of these average pixel-detection-error plots for each host image and for each detector.

Table 5.1: Comparison of watermark-decoding performances. Minimum values of the average pixel errors in watermark extraction for the Cauchy detector, the limiting-array (Q → ∞) detector, and the generalized-Gaussian detector.

  Image     | Cauchy Detector | Limiting-Array (Q → ∞) Detector | Generalized-Gaussian Detector
  Elaine    |       241       |              241                |             241
  Goldhill  |       252       |              263                |             263
  Pirate    |       140       |              168                |             150
  Bird      |       157       |              165                |             172
  Peppers   |        76       |               88                |              91
  Lena      |        34       |               43                |              43
  Tiffany   |        21       |               28                |              28

The Cauchy detector had the best performance. The soft-limiter correlation detector had average errors similar to those of the generalized-Gaussian detector. But the soft-limiter did not need the watermark strength A. Both the Cauchy detector and the generalized-Gaussian detector needed A and had more complex implementations.

Noise can also benefit DCT-domain watermark decoding for some other types of nonlinear detectors. Researchers [250, 249, 68] have shown noise benefits in DCT-domain watermark decoding based on parameter-induced stochastic resonance [70, 69]. Their approaches differed from ours in two ways. They used a pulse-amplitude-modulated antipodal watermark signal but did not use pseudorandom bipolar spreading sequences to embed this watermark signal in the DCT coefficients. They further used nonlinear but dynamical detectors to process the watermarked DCT coefficients. Sun et al. [250] used a monostable system with selected parameters for a given watermark. Wu and Qiu [270] and Sun and Lei [249] used a single bistable detector while Duan et al. [68] used an array of bistable saturating detectors. These bistable detectors require tuning two system parameters besides finding the optimal additive noise type and its variance. These detectors are also subthreshold systems.
Their dynamical nature made the error probability in the decoding of one watermark bit depend on the value of the previous watermark bit. The noisy quantizer-array detectors in this paper exhibit suprathreshold SR noise benefits. And the detection error probabilities for all watermark bits are independent.

Chapter 6
Part Two: Noise Benefits with Mutual Information

The two chapters in Part Two show how noise can benefit both feedforward and feedback processing of subthreshold signals in terms of increasing the nonlinear signal system's mutual information or bit count. Chapter 7 shows that a spiking retinal neuron model can benefit from almost all noise types. It then shows that other Poisson spiking sensory neurons benefit from additive white Gaussian noise. Chapter 7 also extends the earlier "forbidden interval" theorems on mutual-information SR in threshold neurons to spiking retinal and Brownian sensory neuron models. The theorems act as a type of screening device for noise benefits because they give sufficient or necessary conditions for such benefits based only on the values of the signal, the threshold, and the noise mean or location.

Chapter 8 extends the forbidden-interval noise benefits to nonlinear stochastic differential equations in which the driving noise processes are Levy or jump processes. It shows that the mutual information of many such systems can benefit from quite general and mathematically complex Levy diffusions if they obey a general stochastic differential equation. Potential applications of Levy noise benefits range from low-light imaging and financial models of irregular or disruptive markets to mathematical models of retinal and sensory neurons and artificial neural interconnections. Chapter 8 gives the first demonstration of the Levy-noise SR effect for a wide range of general feedback continuous neurons and spiking neurons.

Chapter 7
Stochastic Resonance in Spiking Retinal and Sensory Neuron Models

This chapter presents a substantial new theoretical finding. It proves that the bit count or mutual information of the major mathematical models of Poisson spiking retinal neurons and other sensory neurons must benefit from the addition of some noise in terms of improving how the neurons detect faint (subthreshold) signals. Two new theorems show that small amounts of additive white noise can improve the bit count or mutual information of several popular models of spiking retinal neurons and spiking sensory neurons. The first theorem gives necessary and sufficient conditions for this noise benefit or stochastic resonance (SR) effect for subthreshold signals in a standard family of Poisson spiking models of retinal neurons. The result holds for all types of finite-variance noise and for all types of infinite-variance stable noise. The second theorem gives a similar forbidden-interval sufficient condition for the SR effect for several types of spiking sensory neurons that include the FitzHugh-Nagumo neuron, the leaky integrate-and-fire neuron, and the reduced Type I neuron model if the additive noise is Gaussian white noise. Simulations show that neither the forbidden-interval condition nor Gaussianity is necessary for the SR effect.

7.1 Noise Benefits in Spiking Neuron Models

Figure 7.1 shows an SR noise benefit in a spiking retinal neuron. The neuron should emit more spikes when the brightness contrast level is low rather than high. The right amount of Gaussian noise helps the neuron discriminate between two levels of brightness contrast.
The retinal neuron emits too few spikes if no noise corrupts the Bernoulli sequence of contrast levels. The neuron also emits too many spikes, and emits many of them at the wrong time, if too much noise corrupts the sequence. The next section presents the first of two new SR theorems for spiking neurons. The first theorem gives necessary and sufficient conditions for this noise benefit or stochastic resonance (SR) effect for subthreshold signals in standard models of Poisson spiking retinal neurons. The last section presents the second theorem that gives a sufficient condition for an SR noise benefit in standard models of spiking sensory neurons that include the FitzHugh-Nagumo neuron, the leaky integrate-and-fire neuron, and the reduced Type I neuron model if the additive noise is Gaussian white noise. The converse also holds for the leaky integrate-and-fire neuron but need not hold for other spiking sensory neurons, as simulations confirm. Lack of a converse broadens rather than lessens the potential scope of SR in spiking sensory neurons. Simulations also show that the SR effect can persist for other types of finite-variance and infinite-variance noise.

These new theorems extend the earlier results on SR in threshold neurons for subthreshold signals [140, 141]. The original forbidden interval theorem [140, 141] states that simple threshold neurons will have an SR noise benefit, in the sense that noise increases the neuron's mutual information or bit count, if and only if the noise mean or location parameter μ does not fall in a threshold-related interval: SR occurs if and only if μ ∉ (T − A, T + A) for threshold T where −A < A < T for the bipolar subthreshold signal A. The sufficient or if-part of the theorem first appeared in [140] while the converse only-if part first appeared in [141]. The result holds for all noise types that have finite variance and for all infinite-variance noise types from the broad family of stable distributions [241]. The proof technique assumes that the nonnegative mutual information is positive and then shows that it goes to zero as the noise variance or dispersion goes to zero, so the mutual information must increase as the noise dispersion increases from zero.

[Figure 7.1 appears here: eight stacked panels (a)-(h) showing S(t), S(t)+n1(t), f(t)*(S(t)+n1(t)), f(t)*(S(t)+n1(t))+n2(t), the spike rate r(t), and three spike trains, all plotted against time in seconds.]

Figure 7.1: Stochastic resonance in a spiking retinal neuron. The neuron should emit more spikes when the brightness contrast level is low rather than high. Noise improves the discrimination of subthreshold contrast stimuli in the retina model (7.1)-(7.3). (a) Bernoulli contrast signal S as a function of time t. (b) Contrast signal S plus Gaussian white noise n1 with variance σ1 = 0.03. (c) Signal in plot (b) filtered with f in (7.1). (d) Filtered noisy signal in (c) plus noise n2 (synaptic and ion-channel noise) with variance σ2 = 0.06. (e) Noisy spike rate r(t). (f) SR effect: output Poisson spikes that result from the noisy spike rate r(t). (g) Output spikes in the absence of noise. (h) Output spikes in the presence of too much noise.
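A self-contained toy simulation of the memoryless threshold neuron behind the original forbidden interval theorem helps fix ideas before the spiking models (it is not the retina model (7.1)-(7.3), and the parameter values are illustrative). With T = 1 and A = 0.4 the zero-mean Gaussian noise has mean 0 outside the forbidden interval (0.6, 1.4), so the plug-in mutual-information estimate first rises and then falls as the noise standard deviation grows.

import numpy as np

def mutual_information(s, y):
    """Plug-in estimate of I(S;Y) in bits for two binary sequences."""
    I = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((s == a) & (y == b))
            p_a, p_b = np.mean(s == a), np.mean(y == b)
            if p_ab > 0:
                I += p_ab * np.log2(p_ab / (p_a * p_b))
    return I

def threshold_neuron_mi(sigma, T=1.0, A=0.4, n_samples=200_000, seed=0):
    rng = np.random.default_rng(seed)
    s = rng.integers(0, 2, n_samples)                 # Bernoulli signal taking values -A or +A
    x = np.where(s == 1, A, -A) + sigma * rng.standard_normal(n_samples)
    return mutual_information(s, (x > T).astype(int)) # threshold neuron output

for sigma in (0.05, 0.2, 0.4, 0.8, 1.6):
    print(f"sigma = {sigma:4.2f}: I(S;Y) ~ {threshold_neuron_mi(sigma):.3f} bits")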
We now extend the above memoryless SR theorem to the more complex models of retinal and sensory neurons that produce spike trains. We prove that a general retinal model with two noise sources and a piecewise-linear sigmoidal function exhibits SR if and only if the sum of the two noise means does not lie in the forbidden interval ( 1 v 1 ; 2 v 2 ) that depends on the threshold values 1 and 2 and on the subthreshold signal valuesv 1 andv 2 . The only-if part holds in the sense that the system performs better without noise than with it when the interval condition fails. We then show that the SR effect holds for a general family of nonlinear sensory neural models if the additive noise is Gaussian white noise. These models include the popular FitzHugh-Nagumo (FHN) model [52, 55] and the integrate-and-fire model [55, 88], and the reduced Type I neuron model [160]. 7.2 Stochastic Resonance in Spiking Retinal Models Theorem 7.1 below characterizes SR in spiking retinal models. It states that standard spiking retinal models benefit from additive white noise if and only if a joint noise mean or location parameter does not fall in a forbidden interval of threshold-based values. Theorem 7.1 holds for all finite-variance noise and for all impulsive or infinite-variance stable noise [97, 139, 241]. The performance measure is the input-output Shannon mutual-information [61] bit countI(S;R) = H(R) – H(RjS) for input signal random variable S and output response random variable R. Figure 1 shows a simulation instance of Theorem 7.1 for Gaussian white noise that corrupts a random Bernoulli sequence of brightness contrast levels in a Poisson-spiking retinal neuron. 153 The retina model of Theorem 7.1 is a noisy version of a common Wiener-type cascade model [43, 129, 133, 219, 231]: r(t) = r 0 h Z 1 1 f(z)fS(tz) +n 1 (t)gdz +n 2 (t) (7.1) whereS is the input stimulus defined below,r is the instantaneous Poisson spike rate that gives the exponential interspike-interval density function asp(t) =r(t)exp[ R t 0 r()d],f is a band-pass linear filter function, andh is a memoryless monotone–nondecreasing function. Heren 1 denotes the combined stimulus and photoreceptor noise [6, 149, 219] andn 2 denotes the combined ion- channel noise [236, 255] and the synaptic noise [79, 155, 168]. The input stimulusS is Michelson’s visual contrast signal [40]:S = (L c L s )/(L c +L s ).L c is the amount of light that falls on the center of the ganglion cell’s receptive field. L s is the light that falls on its surround region. The sigmoid-shaped memoryless functionh approximates the spike threshold and saturation level. We defineh as a piecewise-linear approximation of a sigmoidal nonlinearity [276]: h(x) = 8 > > > > > > < > > > > > > : 2 1 ifx> 2 x 1 if 1 x 2 0 ifx< 1 (7.2) and so r(w(t)) = 8 > > > > > > < > > > > > > : r 0 ( 2 1 ) ifw(t)> 2 r 0 (w(t) 1 ) if 1 w(t) 2 0 ifw(t)< 1 . (7.3) 154 The Shannon mutual informationI(S;R) between the input contrast signalS and the output aver- age spiking rater measures the neuron’s bit count and allows us to detect the noise enhancement or SR effect. The subthreshold contrast signalS(t)2fA;Bg is a random Bernoulli sequence withP [S(t) = A] =p andP [S(t) =B] = 1p. The time duration of each signal valueA andB inS(t) is much larger than the time constant of the linear filterf(t). We definev(t) as the filtered output of the contrast signalS(t) without noisen 1 (t) and such that v(t)j S(t)=A = v 1 (7.4) and v(t)j S(t)=B = v 2 (7.5) in steady-state, wherev 1 >v 2 and max(v 1 ;v 2 )< 1 < 2 . 
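The piecewise-linear pieces (7.2)-(7.3) translate directly into code. The sketch below also draws Poisson spikes from the resulting rate with a simple Bernoulli-per-bin approximation; the toy drive w(t) is an illustrative stand-in for the filtered noisy contrast signal, and the threshold and rate values match those quoted for Figure 7.2.

import numpy as np

def h(x, theta1, theta2):
    """Piecewise-linear sigmoid (7.2)."""
    return np.clip(x - theta1, 0.0, theta2 - theta1)

def spike_rate(w, r0, theta1, theta2):
    """Instantaneous Poisson rate (7.3): r = r0 * h(w)."""
    return r0 * h(w, theta1, theta2)

def poisson_spikes(rate, dt, rng):
    """Bernoulli-per-bin approximation of an inhomogeneous Poisson spike train."""
    return rng.random(rate.shape) < rate * dt

rng = np.random.default_rng(0)
r0, theta1, theta2, dt = 100.0, 0.0, 0.3, 1e-3        # maximum rate 100 spikes/s, thresholds 0 and 0.3
w = 0.2 + 0.05 * rng.standard_normal(5000)            # toy noisy drive w(t) on a 5 s grid
spikes = poisson_spikes(spike_rate(w, r0, theta1, theta2), dt, rng)
print("spike count over 5 s:", int(spikes.sum()))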
So the input signalS(t) is subthresh- old. We measure the average spike rate for each symbol only when the corresponding value of v(t) is in steady-state. Then the filtered noisez isz(t) =f(t)n 1 (t) where ‘’ denotes convo- lution. Theorem 7.1 below gives necessary and sufficient conditions for an SR noise effect in the retina neuron model (7.1)-(7.3) for either noise sourcen 1 orn 2 . The theorem shows that some increase in such noise must increase the neuron’s mutual information I(S;R)—and thus must increase the neuron’s ability to discriminate subthreshold contrast signals—if the noise mean or location parameter obeys a simple interval constraint. This SR effect holds for all finite-variance probability density functions. The result is robust because it further holds for all infinite-variance 155 stable noise densities such as impulsive Cauchy or Levy noise [97, 139, 140, 141] and the un- countably many other stable densities that obey a generalized central limit theorem [241]. The proof follows the technique of [140, 141]. Theorem 7.1 Suppose that the noise sourcesn 1 andn 2 in the retina model (7.1)-(7.3) are white and have finite- variance (or finite-dispersion in the stable case) probability density functions p 1 (n) and p 2 (n) with corresponding variances (dispersions) 2 1 and 2 2 ( 1 and 2 ). Suppose that the input signal S is subthreshold (v 2 <v 1 < 1 < 2 ) and that there is some statistical dependence between the input contrast random variableS and the output random variableR so thatI(S;R)> 0. Then the retina model (7.1)-(7.3) exhibits the nonmonotone SR effect in the sense thatI(S;R)! 0 as 2 1 and 2 2 (or 1 and 2 ) decrease to zero if and only if the mean sumE(n 1 ) R f()d +E(n 2 ) (or the location parameter sum in the stable case) does not lie in the interval ( 1 v 1 ; 2 v 2 ). The only-if part holds in the sense that the system performs better without noise than with it when the interval condition fails. Proof: Assume 0<P S (s)< 1 to avoid triviality whenP S (s) = 0 or 1. A. If-part We show thatS andR are asymptotically independent: I( 1 ; 2 )! 0 as 1 ! 0 & 2 ! 0. This is equivalent toI()! 0 as! 0 where is the variance of the total noisen =z +n 2 . Independence ofn 1 andn 2 implies thatz andn 2 are independent and hence 2 = Var(z) + 2 2 , where Var(z) = 2 1 R f 2 (t)dt. Recall that I(S;R) = 0 if and only if S and R are statistically independent [61]. So we need to show only thatf SR (s;r) =P S (s)f R (r) orf RjS (rjs) =f R (r) 156 as! 0 for signal symbolss2fA;Bg andr2 [0;r 0 ( 2 1 )] wheref SR is a joint probability density function andf SjR is a conditional density function. This is equivalent toF RjS = F R as ! 0 whereF RjS is the conditional distribution function [72]. The well-known theorem on total probability and the two-symbol alphabet setfA;Bg give F R (r) = X s F RjS (rjs)P S (s) (7.6) = F RjS (rjA)P S (A) +F RjS (rjB)P S (B) (7.7) = F RjS (rjA)P S (A) +F RjS (rjB)(1P S (A)) (7.8) = (F RjS (rjA)F RjS (rjB))P S (A) +F RjS (rjB) (7.9) So we need to show thatF RjS (rjA) -F RjS (rjB)! 0 as! 0 for allr in the closed interval [0;r 0 ( 2 1 )]. This condition implies thatF R (r) =F RjS (rjB) andF R (r) =F RjS (rjA). Note thatF RjS (rjA) =F RjS (rjB) = 1 forr =r 0 ( 2 1 ) becauser 0 ( 2 1 ) is the maximum firing rate. So we need to show only that F RjS (rjA) - F RjS (rjB)! 0 as ! 0 for all r in the half-open interval [0;r 0 ( 2 1 )). 
Considers =A: Then 7.3 implies that F RjS (rjA) = Prfr 0 h(v +n)rgj S=A = Prfr 0 h(v 1 +n)rg by (4) = Prfh(v 1 +n)r=r 0 g becauser 0 > 0 = Prfv 1 +n sup[h 1 (r=r 0 )]g becauseh is monotonic nondecreasing 157 = Prfn sup[h 1 (r=r 0 )]v 1 g = Z sup[h 1 (r=r 0 )]v 1 1 p(n)dn wherep(n) is the probability density function of the total noisez +n 2 . A symmetric argument shows that F RjS (rjB) = Z sup[h 1 (r=r 0 )]v 2 1 p(n)dn: So we need to show that Z sup[h 1 (r=r 0 )]v 2 1 p(n)dn Z sup[h 1 (r=r 0 )]v 1 1 p(n)dn (7.10) = Z sup[h 1 (r=r 0 )]v 1 sup[h 1 (r=r 0 )]v 2 p(n)dn! 0 as ! 0 But Equation 7.2 implies that 1 sup[h 1 (r=r 0 )] < 2 . So Z sup[h 1 (r=r 0 )]v 2 sup[h 1 (r=r 0 )]v 1 p(n)dn Z 2 v 2 1 v 1 p(n)dn and so it is enough to show that Z 2 v 2 1 v 1 p(n)dn ! 0 as ! 0: (7.11) We first consider the case of finite variance noise. Let the mean of the total noisen =z +n 2 be m = E(z) +E(n 2 ). Suppose that m < 1 v 1 since m = 2 ( 1 v 1 ; 2 +v 2 ) where for 158 convenience only we ignore the measure-zero case ofm = 1 v 1 . Pick = 1 2 ( 1 v 1 m). So 1 v 1 = 1 v 1 +mm =m + ( 1 v 1 m) =m + 2 =m +. Then F RjS (rjA)F RjS (rjB) = Z 2 v 2 1 v 1 p(n)dn (7.12) Z 1 1 v 1 p(n)dn (7.13) Z 1 1 v 1 p(n)dn (7.14) Z 1 m+ p(n)dn (7.15) = Prfnm +g (7.16) = Prfnmg (7.17) Prfjnmjg (7.18) 2 2 by Chebychev’s inequality (7.19) ! 0 as 1 ! 0 & 2 ! 0 because 2 = 2 1 R f 2 (t)dt + 2 2 . A symmetric argument shows that form> 2 v 2 F RjS (rjA)F RjS (rjB) 2 2 ! 0 (7.20) as 1 ! 0 & 2 ! 0. We next consider the case of infinite variance noise. Note that ifn 1 andn 2 are alpha-stable noise then z = n 1 f and z +n 2 are also alpha-stable noise (Grigoriu, 1995). Let m be the location parameter of the total alpha-stable noisen =z +n 2 . The characteristic function(!) 159 of alpha-stable noise density p(n) reduces to a simple exponential in the zero dispersion limit (Kosko & Mitaim, 2003): lim !0 (!) = expfim!g (7.21) for all’s, skewness’s, and locationm’s because '(!) = exp n im! j!j 1 +isign(!) tan 2 o for6= 1 (7.22) and '(!) = expfim! j!j(1 2i lnj!jsign(!)=)g for = 1 (7.23) where sign(!) = 8 > > > > > > < > > > > > > : 1 if!> 0 0 if! = 0 1 if!< 0 (7.24) with i = p 1, 0 < 2,1 1, and > 0. So Fourier transformation gives the corresponding density function in the limiting case ( ! 0) as a translated delta function lim !0 p(n) = (nm) (7.25) 160 Then F RjS (rjA)F RjS (rjB) = Z 2 v 2 1 v 1 p(n)dn (7.26) ! Z 2 v 2 1 v 1 (nm)dn = 0 (7.27) becausem = 2 ( 1 v 1 ; 2 v 2 ). B. Only-if part Suppose thatm2 ( 1 v 1 ; 2 v 2 ) wherem is the mean or location parameter of the total noise n =z +n 2 . Then exactly one of the following four cases holds: Case (1):v 2 +m 1 <v 1 +m 2 Case (2): 1 <v 2 +m<v 1 +m 2 Case (3): 1 <v 2 +m< 2 <v 1 +m Case (4):v 2 +m 1 < 2 <v 1 +m Suppose that Case (1) or Case (4) holds. Then define a new random variableY =g(R) such that y =g(r) = 8 > > < > > : 0 if r = 0 1 if r> 0: (7.28) 161 Suppose next that Case (2) holds. Then define y =g(r) = 8 > > < > > : 0 ifrr 0 (v 2 +m +a) 1 ifr>r 0 (v 2 +m +a) (7.29) wherea = (v 1 v 2 )=2. Suppose last that Case (3) holds. Then define y =g(r) = 8 > > < > > : 0 ifrr 0 ( 2 1 ) 1 ifr =r 0 ( 2 1 ). (7.30) We show thatI(S;Y )! H(S) as! 0. Recall thatH(S)I(S;R) becauseI(S;R) = H(S) –H(SjR) andS is a discrete random variable, and thatI(S;R)I(S;Y = g(R)) by data processing inequality [61]. ThenI(S;R) converges to its maximum valueH(S) as! 0 and hence the SR effect does not exist in the sense that the system performs better without noise than with it when the interval condition fails. 
We first give the proof for Case (1) and Case (4). Note thatv 2 +m< 1 impliesm< 1 v 2 where for convenience only we ignore the measure-zero case ofm = 1 v 2 . Suppose thatm is the mean of the finite variance total noisez +n 2 . Pick = 1 2 d(m; 1 v 2 )> 0. Then 1 v 2 =m +. Write 162 P YjS (0jB) = Prfr 0 (n +v) = 0gj S=B (7.31) = Prfn +v 2 1 g by (3) and (5) (7.32) = Prfn 1 v 2 g (7.33) Prfn 1 v 2 g (7.34) = Prfnm +g (7.35) = 1Prfnm>g (7.36) 1Prfjnmj>g (7.37) 1 2 2 by Chebychev’s inequality (7.38) ! 1 as 2 ! 0 (7.39) SoP YjS (0jB) = 1. Similarly forP YjS (1jA): Note that 1 <v 1 +m) 1 v 1 <m. Now pick = 1 2 d( 1 v 1 ;m) > 0. Then 1 v 2 + =m. Write P YjS (1jA) = Prfr 0 (n +v)> 0gj S=A (7.40) = Prfn +v 1 1 g by (3) and (4) (7.41) = Prfn 1 v 1 g (7.42) Prfn 1 v 1 +g (7.43) = Prfnmg (7.44) 163 = 1Prfnm<g (7.45) 1Prfjnmj>g (7.46) 1 2 2 by Chebychev’s inequality (7.47) ! 1 as 2 ! 0 (7.48) SoP YjS (1jA) = 1. Suppose next thatm is the location parameter of the total alpha-stable noisez +n 2 . Then P YjS (0jB) = Prfn 1 v 2 g (7.49) = Z 1 v 2 1 p(n)dn (7.50) ! Z 1 v 2 1 (nm)dn = 1 as ! 0 (7.51) because m< 1 v 2 : (7.52) Similarly P YjS (1jA) = Prfn 1 v 1 g (7.53) = Z 1 1 v 1 p(n)dn (7.54) ! Z 1 1 v 1 (nm)dn = 1 as ! 0 (7.55) because m> 1 v 1 : (7.56) 164 The two conditional probabilities for both the finite-variance and infinite variance cases likewise imply thatP YjS (0jA) =P YjS (1jB) = 0 as! 0 or ! 0. These four probabilities further imply that H(YjS) = X s X y P SY (s;y)log 2 P YjS (yjs) (7.57) = X s P s (s) X y P YjS (yjs)log 2 P YjS (yjs) (7.58) ! 0 (7.59) where we have used the L’Hospital rule [227] that 0log 2 0 = 0. The unconditional entropyH(Y ) becomes H(Y ) = X y P Y (y)log 2 P Y (y) (7.60) ! X s P S (s)log 2 P S (s) (7.61) = H(S) (7.62) because P Y (y) = X s P YjS (yjs)P S (s) (7.63) = P YjS (yjA)P S (A) +P YjS (yjB)P S (B) (7.64) = P YjS (yjA)P S (A) +P YjS (yjB)(1P S (A)) (7.65) = (P YjS (yjA)P YjS (yjB))P S (A) +P YjS (yjB) (7.66) = (P YjS (yjB)P YjS (yjA))P S (B) +P YjS (yjA) (7.67) 165 ! 8 > > < > > : P S (A) if y = 1 P S (B) if y = 0 (7.68) ThusH(YjS)! 0 andH(Y )!H(S) as! 0 or ! 0. ThenI(S;Y )!H(S) as! 0 or ! 0 becauseI(S;Y ) =H(Y )H(YjS).H(S) is the maximum ofI(S;Y ) becauseI(S;Y ) =H(S) –H(SjY ) andH(SjY ) 0 [61]. SoI(S;R) converges to its maximum valueH(S) as ! 0 and hence the system performs better without noise than with it for Case (1) and Case (4). We next prove the claim for Case (2). We show only thatP YjS (0jB) =P YjS (1jA) = 1 as ! 0 because the rest of the proof proceeds as in Case (1). P YjS (0jB) = Prfr 0 (n +v 2 )r 0 (v 2 +m +a)g (7.69) = Prfn +v 2 v 2 +m +ag (7.70) = Prfnm +ag (7.71) Prfnm +g for = a 2 (7.72) = 1Prfnm>g (7.73) 1Prfjnmj>g (7.74) 1 2 2 by Chebychev’s inequality (7.75) ! 1 as 2 ! 0 (7.76) SoP YjS (0jB) = 1. 166 Similarly P YjS (1jA) = Prfr 0 (n +v 2 )>r 0 (v 2 +m +a)g (7.77) = Prfn +v 1 >v 2 +m +ag (7.78) becausea = v 1 v 2 2 (7.79) = Prfn +v 1 >v 1 +mag (7.80) = Prfn>mag (7.81) Prfn>mg (7.82) by picking = a 2 (7.83) = 1Prfnm<g (7.84) 1Prfjnmj>g (7.85) 1 2 2 by Chebychev’s inequality (7.86) ! 1 as 2 ! 0 (7.87) SoP YjS (1jA) = 1. The proof for Case (3) proceeds as in Case (1). Simulation results confirm this mathematical result that noise in retinal signal processing can help retinal neurons detect subthreshold contrast signals. Figures 7.2 and 7.3 show detailed simulation instances of the predicted SR effect in Theorem 7.1. 
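A condensed sketch of this kind of simulation follows. It replaces the full retina model (7.1)-(7.3) with a simple stand-in: the Poisson firing rate is r0 times the noisy potential clipped linearly between the two thresholds, a single Gaussian term stands in for the combined filtered photoreceptor noise and the synaptic and ion-channel noise, and the mutual information is estimated from the empirical joint distribution of the Bernoulli symbol and the 1-second spike count. The steady-state potentials vA and vB below are assumed illustrative values, not the dissertation's, so only the qualitative inverted-U shape of the printed sweep should be compared with Figures 7.2 and 7.3.

import numpy as np

rng = np.random.default_rng(0)
theta1, theta2, r0 = 0.0, 0.3, 100.0        # thresholds and maximum rate (spikes/s)
vA, vB = -0.15, -0.05                       # assumed subthreshold steady-state potentials

def mutual_info_counts(s, counts):
    """Plug-in estimate of I(S;N) in bits from paired samples (binary s, integer counts)."""
    info = 0.0
    for sym in (0, 1):
        p_s = np.mean(s == sym)
        for c in np.unique(counts):
            p_joint = np.mean((s == sym) & (counts == c))
            if p_joint > 0:
                info += p_joint*np.log2(p_joint/(p_s*np.mean(counts == c)))
    return info

def run(sigma, trials=20000):
    s = rng.integers(0, 2, trials)                          # Bernoulli contrast symbol
    v = np.where(s == 1, vA, vB) + sigma*rng.standard_normal(trials)
    rate = r0*np.clip((v - theta1)/(theta2 - theta1), 0.0, 1.0)
    counts = rng.poisson(rate)                              # spikes in a 1-second window
    return mutual_info_counts(s, counts)

for sigma in [0.001, 0.03, 0.1, 0.3, 1.0]:
    print(f"sigma = {sigma:5.3f}   I(S;N) ~ {run(sigma):.3f} bits")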
Figure 7.2 shows a 3-D plot of the Shannon mutual information versus the standard deviations of Gaussian white noise sources n 1 andn 2 in (1). Figure 7.3 shows their respective cross-section plots for the values 1 = 0.01 167 0 0.1 0.2 0 0.04 0.08 0.1 0 0.1 0.2 0.3 0.4 0.5 Mutual information I(S,R) in bits Standard deviation σ 2 of additive Gaussian white synaptic and ion−channel noise Standard deviation σ 1 of additive Gaussian white noise in the photoreceptors and contrast signal Figure 7.2: Stochastic resonance (SR) in the spiking retina model (7.1)-(7.3) with additive Gaus- sian white noisen 1 andn 2 . The noisy retina model has the spiking Poisson form (7.1)-(7.3) with thresholds 1 = 0 and 2 = 0.3. The maximum firing rate is 100 spikes/sec. The Bernoulli contrast signal takes the value of 0.2 with success probabilityp = 1=2 and takes the value of 0.4 otherwise. The graph shows the retina model’s smoothed input-output mutual information surface as a function of the noise standard deviations 1 and 2 . and 2 = 0.02. We computed the bit countI(S;R) using a discrete density ofR based on the number of spikes in 1-second intervals for each input symbol. Each plot shows the nonmonotonic signature of SR. 7.3 Stochastic Resonance in Spiking Sensory Neuron Models Theorem 7.2 below describes the SR noise benefit in a wide range of spiking sensory neuron models. It states its own forbidden-interval sufficient condition for SR in the special but ubiqui- tous case of additive Gaussian white noise. Proposition 7.1 shows that the converse also holds 168 0 0.05 0.1 0.15 0.2 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 Standard deviation σ 2 Mutual information I(S,R) in bits 0 0.02 0.04 0.06 0.08 0.1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 Standard deviation σ 1 Mutual information I(S,R) in bits (a) (b) Figure 7.3: Cross-sections plots of the mutual-information surface of Figure 7.2. Plots (a) and (b) show the respective cross-sections of the mutual-information surface of Figure 7.2 for 1 = 0.01 and 2 = 0.02. Each simulation trial produced 10,000 input-output samplesfs(t);r(t)g that estimated the Poisson spiking rate r(t) to obtain the mutual information. Thick lines show the average mutual information. Vertical lines show the total min-max deviations of the mutual information in 1000 trials. for the leaky integrate-and-fire neuron. Figure 7.6 further shows that the SR result persists in this case even for infinite-variance stable noise. Proposition 7.2 shows that the converse need not hold for the Fitzhugh-Nagamo neuron—SR can still occur inside the forbidden interval. Theorem 7.2 specifically shows that these and other spiking neuron models enjoy an SR noise benefit if the noise meanE(n) falls to the left of a bound and if their average firing rates depend on the Kramers rate solution [144] of the Fokker-Planck diffusion equation. Theorem 7.2 applies to popular spiking sensory neuron models such as the FitzHugh-Nagumo (FHN) model [52, 55], the leaky integrate-and-fire model [55, 88], and the reduced Type I neuron model [160]. Figure 7.5 shows that SR can still occur in the FHN neuron model even ifE(n) falls to the right of this bound. So the interval condition in Theorem 7.2 is not necessary. 169 The FHN neuron model has the form _ v = v(v 2 1 4 )w +A T d +n; (7.88) _ w = vw (7.89) wherev is the membrane voltage (fast) variable,w is a recovery (slow) variable,A T = -5/(12 p 3) is a threshold voltage,S is the input signal,d =BS,B is the constant signal-to-threshold dis- tance, andn is independent Gaussian white noise. 
The input signal is subthreshold whend > 0 and so thenS <B. Kramers rate formula gives the average firing rate of the FHN neuron model with subthreshold input signals (S(t)<<B) [55] E(r(t)) = B 2 p 3 exp " 2 p 3[B 3 3B 2 S(t)] 3 2 # : (7.90) The average spike rate model poorly estimated the average firing rates of the FHN model in simulations. So we instead fitted the equation E(r(t)) =a exp bB 3 +cB 2 S(t) 2 (7.91) to the simulation data. Nonlinear least-squares gave the parametersa,b, andc in (7.91). Figure 7.4 shows that the fitted model (7.91) closely estimates the average spike rates of the FHN neuron model because the coefficient of determination wasr 2 = .9976. 170 0 1 2 3 4 5 6 x 10 −3 0 0.2 0.4 0.6 0.8 1 1.2 1.4 Standard deviation σ of additive Gaussian white noise Average spike rate in spikes/second Figure 7.4: Approximation of the average firing rate. The estimated firing rate (solid line) closely approximates the average firing rate (dashed line) for the FHN neuron model in (9). The model parameters are A T = -5/(12 p 3), B = 0.07, and S = 0.01. Nonlinear least-squares fitted the parameters in (9) as a = 1.1718, b = 0.0187, and c = 0.0680 with coefficient of determinationr 2 = .9976. The leaky integrate-and-fire neuron model has the form [55] _ v = av +a +S +n (7.92) wherev is the membrane voltage,a and are constants,=a is the barrier height of the potential,S is an input signal, andn is independent Gaussian white noise. The input signalS is subthreshold whenS <. The neuron emits a spike when the membrane voltagev crosses the threshold value of 1 from below to above. The membrane voltagev resets to 1=a just after the neuron emits a spike. Then the ensemble-averaged spike rateE(r(t)) for subthreshold inputs (S 2 << ) has the form [55] E(r(t)) = p 2 exp 2 + 2S(t) 2 a (7.93) where 2 is the variance ofn. 171 Theorem 7.2 applies to the reduced Type I neuron model in (12) below. Type I neurons have a continuous firing-rate input current characteristic and respond with a wide range of firing rates near the threshold. Examples of Type I neurons include cortical neuron models such as the Traub neuron model [251]. Type II neurons have a narrow range of firing rates and have a discontinuous firing-rate input current characteristics. Examples of Type II neurons include the Hodgkin-Huxley neuron model [110] and the FHN neuron model [52, 55]. The two types of neuron models also produce different dynamical effects. Type I neurons produce saddle-node bifurcations whereas Type II neurons produce Hopf bifurcations. The reduction procedure in [99, 113] gives a simple one-dimensional normal form [160] of the multi-dimensional dynamics of Type I neuron models: _ v = +v 2 +n (7.94) wherev is the membrane potential, is the value of input signal, and is the standard deviation of Gaussian white noisen. The firing rate of the reduced model (7.94) for subthreshold or excitable regime (< 0) and weak noise ( 2 << 2jj 3=2 ) [160] is E(r(t)) = p jj exp " 8jj 3=2 3 2 # : (7.95) We can combine (7.91), (7.93), and (7.95) into the general form E(r(t)) =g(B;S(t);)exp h(B;S(t)) k 2 (7.96) 172 whereE(r(t)) is the average firing rate andk is a constant. The functionsg(B;S;) andh(B;S) depend on the potential barrierB, the subthreshold input signalS, and on the variance 2 of the additive Gaussian white noisen so thatE(r(t))! 0 as! 0. We note that the formula for the average Poisson spike rate in excitable cells due to the voltage-gated ion channels dynamics has a form similar to (7.96) [25]. 
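The fitting step behind (7.91) and Figure 7.4 is ordinary nonlinear least squares. The sketch below uses scipy.optimize.curve_fit; because the raw FHN simulation rates are not reproduced here, the "measured" data are synthetic, generated from (7.91) itself with the reported parameters a = 1.1718, b = 0.0187, and c = 0.0680 plus 5% multiplicative jitter, so the example only demonstrates the fitting mechanics.

import numpy as np
from scipy.optimize import curve_fit

B, S = 0.07, 0.01                              # barrier and subthreshold input from Figure 7.4
sigma = np.linspace(1e-3, 6e-3, 40)            # noise standard deviations

def rate_model(sig, a, b, c):                  # the fitted form (7.91)
    return a*np.exp(-(b*B**3 + c*B**2*S)/sig**2)

rng = np.random.default_rng(1)
a0, b0, c0 = 1.1718, 0.0187, 0.0680            # reported fit, reused here as synthetic ground truth
r_meas = rate_model(sigma, a0, b0, c0)*(1 + 0.05*rng.standard_normal(sigma.size))

popt, _ = curve_fit(rate_model, sigma, r_meas, p0=[1.0, 0.02, 0.07])
resid = r_meas - rate_model(sigma, *popt)
r2 = 1.0 - resid.var()/r_meas.var()            # coefficient of determination
print("fitted a, b, c =", np.round(popt, 4), "   r^2 =", round(r2, 4))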
We can now state Theorem 7.2. This theorem gives a sufficient condition for SR to occur in spiking sensory neuron models if their average output spike rates have the general form (7.96). The proof again follows the proof in [140, 141]. Theorem 7.2 Suppose that the average spike rate of a sensory neuron model has the form (7.96) and thatE(n) is the mean of the model’s additive Gaussian white noise n. Suppose that input signal S(t)2 fs 1 ;s 2 g is subthreshold: S(t)<B. Suppose that there is some statistical dependence between the input signal random variableS and the output average spike-rate random variableR so that I(S;R) > 0. Then the spiking sensory neuron exhibits the nonmonotone SR effect in the sense thatI(S;R)! 0 as the noise intensity! 0 ifE(n)<Bs 2 . Proof: Again we need to show only that F RjS (rjs 1 )F RjS (rjs 2 )! 0 iff F RjS (rjs 1 ) 1 + 1F RjS (rjs 2 )! 0 iff PrfRrjS =s 2 gPrfRrjS =s 1 g! 0 as ! 0 for allr in (0;r max ) (7.97) 173 wherer max is a finite number because the real biological neurons have absolute refractory time period. This implies that the distribution of spike-rates has a finite mean. We can write PrfRrjS =s 1 g E(RjS =s 1 ) r (7.98) by Markov’s inequality for allr and similarly PrfRrjS =s 2 g E(RjS =s 2 ) r (7.99) If the expression ofE(r(t)) has the form (7.96) then we need only show thatE(RjS = s 1 )! 0 and E(RjS = s 2 )! 0 as ! 0. We can absorb E(n) into the input signal S(t) because the noise n is additive in the model of spiking sensory neuron. Then the new input signal is S 0 (t) =S(t) +E(n) andS 0 (t) is subthreshold (S 0 (t)<B) becauseE(n)<B –s 2 wheres 2 = maxfs 1 ;s 2 g. ThusE(r(t)) has the form of (7.96). This proves (7.97) and hence the Theorem 7.2. Figure 7.5 shows a simulation instance of the SR effect in Theorem 7.2 for the special but important case of the FHN neuron model. The mutual-information plot shows the predicted nonmonotonic signature of SR. The leaky integrate-and-fire neuron model produces similar non- monotonic SR plots. Figure 7.6 goes beyond the scope of Theorem 7.2 and shows a simulation instance of the SR effect in the leaky integrate-and-fire neuron model with implusive infinite- variance-stable white noise. 174 0 1 2 3 4 5 x 10 −3 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 Standard deviation σ of additive Gaussian white noise Mutual information I(S,R) in bits Figure 7.5: Stochastic resonance in the FHN spiking neuron model—a simulation instance of Theorem 7.2. The model parameters areA T = -5/12( p 3),B = 0.07, andS =0.005. The solid curve shows the average mutual information. The dashed vertical lines show the total min-max deviations of mutual information in 100 simulation trials. Proposition 7.1 The converse of Theorem 7.2 holds for the leaky integrate-and-fire neuron model (8.50) in the sense that the system performs better without noise than with it when the interval conditionE(n) <Bs 2 fails. Proof: Suppose thatE(n)>Bs 2 . Then exactly one of the following two is true: Case (1):s 0 1 =s 1 +E(n) is subthreshold ands 0 2 =s 2 +E(n) is superthreshold. Case (2): Boths 0 1 ands 0 2 are superthreshold. 175 0 1 2 3 4 5 6 7 8 9 x 10 −5 0 0.1 0.2 0.3 0.4 0.5 0.6 Dispersion γ of additive α−stable noise (α = 1.9) Mutual information I(S,R) in bits Figure 7.6: Stochastic resonance in the integrate-and-fire spiking neuron model with subthreshold input signals and infinite-variance-stable noise ( = 1.9). The model parameters are a = 0.5, = 0.01, s 1 = 0.0025, and s 2 = 0.005. The solid curve shows the smoothed average mutual information. 
The dashed vertical lines show the total min-max deviations of mutual information in 100 simulation trials. Suppose that the input signal s 0 i is superthreshold. Then the interspike interval T i in the absence of additive noisen is [88] T i = m ln v 1 i v r v 1 i T h (7.100) wherev 1 i andv r are the respective values of the membrane potential at steady-state and at the reset, m is a time-constant of the membrane potential, andT h is a threshold for spike generation. The interspike interval has a Gaussian distribution in the presence of Gaussian white noisen in (8.50) [88]. The probability density of interspike interval i is f( i ) = v 0 i p exp v 02 ( i 0 ) 2 2 (7.101) 176 whereE() = 0 ,v 0 i = dv i (t) dt evaluated att =T i , and is the standard deviation of the additive white noise. Then Prfj i T i j>g 2 2v 0 for all> 0 ! 0 as 2 ! 0 (7.102) Thus ifs 0 i is superthreshold then i !T i in probability as! 0 by the definition of convergence in probability. Then the corresponding output spike raterj S 0 =s 0 i =r i ! 1 T i in probability because r i = 1 i . So Prfjr i 1 T i j>g ! 0 for all> 0 as 2 ! 0 (7.103) Suppose that Case (1) holds. Then define y =g(r) = 8 > > < > > : 0 ifr 1=T 2 2 1 ifr> 1=T 2 2 : (7.104) Suppose that Case (2) holds. Then define y =g(r) = 8 > > < > > : 0 ifr 1 T 1 +a 1 ifr> 1 T 2 a (7.105) wherea = ( 1 T 2 1 T 1 )=2. Note thata> 0 because 1 T 2 > 1 T 1 . We need to show only thatP YjS 0(0js 0 1 ) =P YjS 0(1js 0 2 ) = 1 as! 0 because the rest of the proof is similar to the only-if part of the proof of Theorem 7.1. 177 Suppose that Case (1) holds. Then P YjS (0js 0 1 ) = Prfr 1=T 2 2 jS 0 =s 0 1 g (7.106) = 1Prfr> 1=T 2 2 jS 0 =s 0 1 g (7.107) 1 E(RjS 0 =s 0 1 ) 1=T 2 2 by Markov’s inequality (7.108) = 1 as! 0 (7.109) becauses 0 1 is subthreshold andE(r(t))! 0 for (7.96). P YjS 0(1js 0 2 ) = Prfr> 1=T 2 2 jS 0 =s 0 2 g (7.110) = 1Prfr< 1=T 2 2 jS 0 =s 0 2 g (7.111) 1Prfjr 2 1 T 2 j> 1=T 2 2 becauserj S 0 =s 0 i =r i (7.112) ! 1 as ! 0 by (7:103): (7.113) Suppose now that Case (2) holds. Then P YjS (0js 0 1 ) = Prfr 1 T 1 +ajS 0 =s 0 1 g (7.114) = Prfr 1 1 T 1 +ag becauserj S 0 =s 0 i =r i (7.115) = 1Prfr 1 > 1 T 1 +ag (7.116) 1Prfjr 1 1 T 1 j>ag (7.117) ! 1 by (7:103) (7.118) and (7.119) 178 P YjS (1js 0 2 ) = Prfr> 1 T 2 ajS 0 =s 0 2 g (7.120) = Prfr 2 > 1 T 2 ag (7.121) = 1Prfr 2 < 1 T 2 ag (7.122) 1Prfjr 2 1 T 2 j>ag (7.123) ! 1 by (7:103): (7.124) 0 0.5 1 1.5 2 2.5 3 3.5 4 x 10 −3 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 Standard deviation σ of additive Gaussian white noise Mutual information I(S,R) in bits Figure 7.7: Stochastic resonance in the FHN spiking neuron model with superthreshold input signals and additive Gaussian white noise. The interval condition in Theorem 7.2 is not necessary. The model parameters areA T = -5/(12 p 3),B = 0.07,s 1 = 0.56, ands 2 = 0.565.E(n) = 0>B –s 2 = -0.495 implies thatE(n) does not satisfy the interval condition of Theorem 7.2. The solid curve shows the smoothed average mutual information. The dashed vertical lines show the total min-max deviations of mutual information in 100 simulation trials. Proposition 7.2 The converse of Theorem 7.2 does not hold for the FHN neuron model (7.88)-(7.89). Figure 7.7 confirms Proposition 7.2 because it shows that SR can still occur when the noise meanE(n) falls to the right ofB maxfs 1 ;s 2 g. 
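The kind of integrate-and-fire simulation behind Figure 7.6 can be sketched as follows, here with Gaussian rather than alpha-stable noise to keep it minimal. The constants a = 0.5, beta = 0.01, and S = 0.005 follow the Figure 7.6 caption, but the reading of (7.92) used below, with drift a(1 - v) - beta + S, spike threshold 1, and reset to 1 - beta/a, is an assumption because the extracted equation is ambiguous. The sketch only illustrates the mechanism behind Theorem 7.2: the average spike rate for a subthreshold input vanishes with the noise intensity.

import numpy as np

def lif_rate(S, sigma, a=0.5, beta=0.01, dt=1e-3, T=100.0, seed=0):
    """Average spike rate of an assumed leaky integrate-and-fire reading of (7.92):
    dv = (a*(1 - v) - beta + S) dt + sigma dW, spike at v >= 1, reset to 1 - beta/a."""
    rng = np.random.default_rng(seed)
    n_steps = int(T/dt)
    dW = np.sqrt(dt)*rng.standard_normal(n_steps)
    v, spikes = 1.0 - beta/a, 0
    for i in range(n_steps):
        v += (a*(1.0 - v) - beta + S)*dt + sigma*dW[i]   # Euler-Maruyama step
        if v >= 1.0:                                     # spike and reset
            spikes += 1
            v = 1.0 - beta/a
    return spikes/T

for sigma in [0.002, 0.005, 0.01, 0.02, 0.05]:
    print(f"sigma = {sigma:5.3f}   average rate ~ {lif_rate(S=0.005, sigma=sigma):6.2f} spikes/s")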
179 Chapter 8 Stochastic Resonance in Continuous and Spiking Neuron Models with Levy Noise This chapter presents the first formal proof that the main neural stochastic models can actually benefit from Levy (jump) diffusions that generalize the more common but less realistic Brownian (non-jump) diffusions. Levy noise extends standard Brownian noise to many types of impulsive jump-noise processes found in real and model neurons as well as in models of finance and other random phenomena. Two new theorems and the Itˆ o calculus show that white Levy noise will benefit subthreshold neuronal signal detection if the noise process’s scaled drift velocity falls inside an interval that depends on the threshold values. These results generalize earlier forbidden interval theorems of neuronal SR noise benefits. Applebaum [13] has recently expanded these results to more general Levy processes that do not require the finite second-moment restriction. 180 8.1 Levy Noise Benefit in Neural Signal Detection Levy noise can help neurons detect faint or subthreshold signals. Levy noise extends standard Brownian noise to many types of impulsive jump-noise processes found in real and model neu- rons as well as in models of finance and other random phenomena. Two new theorems and the Itˆ o calculus show that white Levy noise will benefit subthreshold neuronal signal detection if the noise process’s scaled drift velocity falls inside an interval that depends on the threshold values. These results generalize earlier forbidden-interval theorems of neuronal ‘stochastic resonance’ or noise-injection benefits [140, 141, 188, 203] to a wide range of general feedback continuous neu- rons and spiking neurons that benefit from a broad class of additive white Levy noise. Simulation results show that the same noise benefits still occur for some infinite-variance stable Levy noise processes even though the theorems themselves apply only to finite-variance Levy noise. This appears to be the first demonstration of the SR effect for neuron models subject to Levy noise perturbations. Figure 8.1 shows how impulsive Levy noise can enhance the Kanisza-square visual illusion in which four dark-corner figures give rise to an illusory bright interior square. Each pixel is the thresholded output of a noisy bistable neuron whose input signals are subthreshold and quantized pixel values of the original noise-free Kanisza image. The outputs of the bistable neurons do not depend on the input signals if there is no additive noise because the input signals are subthreshold. Figure 8.1(a) shows that adding infinite-variance Levy noise induces a slight correlation between the pixel input and output signals. More intense Levy noise increases this correlation in Figures 8.1(b)-(c). Still more intense Levy noise degrades the image and undermines the visual illusion in Figures 8.1(d)-(e). Figure 8.2 shows typical sample paths from different types of Levy noise. 181 (a) (b) (c) (d) (e) Figure 8.1: Stochastic resonance in the Kanisza square illusion with infinite-variance symmetric -stable ( = 1.9) noise. The Kanisza square illusion improves as the noise dispersion increases from 0.047 to 0.3789 and then it degrades as the dispersion increases further. Each pixel represents the output of the noisy bistable potential neuron model (8.1)-(8.2) and (8.5) that uses the pixel values of the original Kanisza square image as subthreshold input signals. The additive-stable noise dispersions are (a) = 0:047, (b) = 0:1015, (c) = 0:3789, (d) = 1, and (e) = 3:7321. 
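The impulsive symmetric alpha-stable noise in Figure 8.1 (alpha = 1.9, dispersion 0.047 in panel (a)), and the alpha = 1.9 noise of the Chapter 7 simulations, can be sampled with the Chambers-Mallows-Stuck method. A minimal sketch for the symmetric case follows; for alpha = 2 the formula reduces, up to a scale factor, to Gaussian sampling, while for alpha < 2 the variance is infinite even though the dispersion gamma is finite.

import numpy as np

def sym_alpha_stable(alpha, gamma, size, rng):
    """Symmetric alpha-stable samples with location 0 and dispersion gamma
    (characteristic function exp(-gamma*|w|**alpha)), via Chambers-Mallows-Stuck."""
    V = rng.uniform(-np.pi/2, np.pi/2, size)          # uniform phase
    W = rng.exponential(1.0, size)                    # unit-mean exponential
    X = (np.sin(alpha*V)/np.cos(V)**(1.0/alpha)
         * (np.cos(V - alpha*V)/W)**((1.0 - alpha)/alpha))
    return gamma**(1.0/alpha)*X                       # scale a standard variate by gamma^(1/alpha)

rng = np.random.default_rng(2)
x = sym_alpha_stable(alpha=1.9, gamma=0.047, size=100000, rng=rng)
print("median |x|  =", np.median(np.abs(x)))
print("largest |x| =", np.abs(x).max(), " (occasional very large jumps: heavy tails)")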
Figure 8.3 shows the characteristic inverted-U or non-monotonic signature of SR for white Levy noise that perturbs a continuous bistable neuron. We generalize the recent ‘forbidden interval’ theorems [140, 141, 188, 203, 204] for continu- ous and spiking neuron models to a broad class of finite-second-moment Levy noise that may even depend on the neuron’s membrane potential. The theorems below show that mutual-information SR noise benefit will occur if the additive white Levy noise process has a bounded scaled drift velocity that does not fall within a threshold-based interval. This holds for general feedback continuous neuron models that include common signal functions such as logistic sigmoids or Gaussians. It also holds for spiking neurons such as the FitzHugh-Nagumo, leaky integrate-and- fire, and reduced Type-I neuron models. We used the Itˆ o stochastic calculus to prove our results under the assumption that the Levy noise has a finite second moment. But Figure 8.1 and the (c) sub-figures of Figures 8.3-8.8 all show that the SR noise benefit still occurs in the more general infinite-variance case of some types of -stable Levy noise. So the SR effect is not limited to 182 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 t −0.2 −0.1 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 t 0 0.1 0.2 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 t −0.2 0.1 −0.1 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 t −0.2 −0.1 0 L t L t L t (a) Brownian diffusion (b) jump diffusion (c) normal inverse Gaussian process L t (d) infinite−variance α−stable process Figure 8.2: Sample paths from one-dimensional Levy processes. (a) Brownian motion with drift = 0:1 and variance = 0:15, (b) jump diffusion with = 0:1, = 0:225, Poisson jump rate = 3, and uniformly distributed jump magnitudes in the interval [-0.2,0.2] (and so with Levy measure (dy) = (3=0:4)dy for y 2 [0:2; 0:2] and zero else), (c) normal inverse Gaussian (NIG) process with parameters = 20, = 0, = 0.1, and = 0, (d) infinite-variance-stable process with = 1.9 and dispersion = 0.0272 ( = 0, = 0, and(dy) is of the form k jyj 1+ dy). finite-second-moment Levy noise. Levy noise has advantages over standard Gaussian noise in neuron models despite its in- creased mathematical complexity. A Levy noise model more accurately describes how the neu- ron’s membrane potential evolves than does a simpler diffusion model because the more general Levy model includes not only pure-diffusion and pure-jump models but jump-diffusion mod- els as well [111, 229]. Neuron models with additive Gaussian noise are pure-diffusion models. 183 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 Scale κ of additive white Jump−Diffusion Levy noise Mutual information I(S,R) in bits 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 Scale κ of additive white NIG Levy noise Mutual information I(S,R) in bits 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 Scale κ of additive white α−stable Levy noise (α = 1.9) Mutual information I(S,R) in bits (a) (b) (c) Figure 8.3: Mutual information Levy noise benefits in the continuous bistable neuron (8.1)-(8.2) and (8.5). Additive white Levy noisedL t increases the mutual information of the bistable potential neuron for the subthreshold input signalss 1 =0:3 ands 2 = 0:4. The types of Levy noisedL t are (a) Gaussian with uniformly distributed jumps, (b) pure-jump normal inverse Gaussian (NIG), and (c) symmetric-stable noise with = 1.9 (thick-tailed bell curve with infinite variance [192]). 
The dashed vertical lines show the total min-max deviations of the mutual information in 100 simulation trials. These neuron models rely on the classical central limit theorem for their Gaussian structure and thus they rely on special limiting-case assumptions of incoming Poisson spikes from other neu- rons. These assumptions require at least that the number of impinging synapses is large and that the synapses have small membrane effects due to the small coupling coefficient or the synaptic weights [88, 132]. The Gaussian noise assumption may be more appropriate for signal inputs from dendritic trees because of the sheer number of dendrites. But often fewer inputs come from synapses near the post-synaptic neuron’s trigger zone and these inputs produce impulses in noise amplitudes because of the higher concentration of voltage-sensitive sodium channels in the trig- ger zone [91, 92, 122, 197]. Engineering applications also favor the more general Levy model because physical devices may be limited in their number of model-neuron connections [183] and because real signals and noise can often be impulsive [97, 192, 232]. 184 8.2 Noisy Feedback Neuron Models We study Levy SR noise benefits in the noisy feedback neuron models of the general form _ x = x(t) +f(x(t)) +s(t) +n(t) (8.1) y(t) = g(x(t)) (8.2) with initial condition x(t 0 ) = x 0 . Here s(t) is the additive net excitatory or inhibitory input forcing signal—eithers 1 ors 2 . The additive noise termn(t) is Levy noise with mean or location and intensity scale (or dispersion for symmetric-stable noise where = with charac- teristic function(u) =e iu juj ). The neuron feeds its activation or membrane potential signal x(t) back to itself throughx(t)+f(x(t)) and emits the (observable) thresholded or spike signal y(t) as output. Hereg is a static transformation function. We use the threshold g(x) = 8 > > < > > : 1 ifx> 0 0 else (8.3) for continuous neuron models. We use a related threshold g in spiking neuron models where g determines the spike occurrence. The neuronal signal function f(x) of (8.1) can be of quite general form for continuous neuron models [204]: Logistic. The logistic signal function [136] is sigmoidal and strictly increasing f(x) = 1 1 +e cx (8.4) 185 for scaling constantc > 0. We usec = 8. This signal function gives a bistable additive neuron model. Hyperbolic Tangent. This signal function is also sigmoidal and gives a bistable additive neuron model [7, 54, 112, 136]: f(x) = 2 tanhx (8.5) Linear Threshold. This linear-threshold signal has the form [136]: f(x) = 8 > > > > > > < > > > > > > : cx jcxj< 1 1 cx> 1 1 cx<1 (8.6) for constantc> 0. We usec = 2. Exponential. This signal function is asymmetric and has the form [136] f(x) = 8 > > < > > : 1 expfcxg ifx> 0 0 else (8.7) for constantc> 0. We usec = 8. Gaussian. The Gaussian or ‘radial basis’ signal function [136] differs in form from the signal functions above because it is nonmonotonic: f(x) = expfcx 2 g (8.8) 186 for constantc> 0. We usec = 8. The above neuron models can have up to three fixed points depending on the input signal and the model parameters. The input signal is subthreshold in the sense that switching it from s 1 tos 2 or vice versa does not change the outputY t of (8.23). There exist 1 and 2 such that the inputS is subthreshold when 1 s 1 < s 2 2 . The values of 1 and 2 depend on the model parameters. Consider the linear threshold neuron model (8.1)-(8.2) and (8.6) withc = 2. 
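Its fixed points can be located numerically by scanning the noise-free drift -x + f(x) + s for sign changes. The sketch below first collects the signal functions (8.4)-(8.8) with the scaling constants reported above (c = 8 for the logistic, exponential, and Gaussian functions and c = 2 for the linear threshold) and then performs the scan; the grid and the particular value of s are illustrative.

import numpy as np

def logistic(x, c=8.0):    return 1.0/(1.0 + np.exp(-c*x))                    # (8.4)
def tanh2(x):              return 2.0*np.tanh(x)                              # (8.5)
def linthresh(x, c=2.0):   return np.clip(c*x, -1.0, 1.0)                     # (8.6)
def expsig(x, c=8.0):      return np.where(x > 0, 1.0 - np.exp(-c*x), 0.0)    # (8.7)
def gauss(x, c=8.0):       return np.exp(-c*x**2)                             # (8.8)

for name, f in [("logistic", logistic), ("tanh", tanh2), ("linear threshold", linthresh),
                ("exponential", expsig), ("Gaussian", gauss)]:
    print(f"{name:>16}: f(0.25) = {float(f(0.25)):.4f}")

def drift(x, s, f):
    return -x + f(x) + s            # right-hand side of (8.1) without the noise term

s = 0.2                                                # any value in (-0.5, 0.5)
x = np.linspace(-3.0, 3.0, 600000)
d = drift(x, s, linthresh)
roots = np.where(d[:-1]*d[1:] < 0)[0]                  # sign changes of the drift
for i in roots:
    x_star = 0.5*(x[i] + x[i+1])
    kind = "stable" if d[i] > d[i+1] else "unstable"   # decreasing drift means stable
    print(f"fixed point at x* = {x_star:+.3f}  ({kind})")
# expected: stable points near s - 1 and s + 1 with an unstable point near -s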
A simple calculation shows that if the input signalS t 2fs 1 ;s 2 g satisfies0:5<s 1 <s 2 < 0:5 then the linear threshold neuron has two stable fixed points (one positive and the other negative) and has one unstable fixed point between them. The Gaussian neuron model (8.1)-(8.2) and (8.8) has only one fixed point if 0<s 1 <s 2 . So the input is subthreshold because switching it froms 1 tos 2 or vice versa does not change the outputY t . Figure 8.3 shows the mutual information noise benefits in the bistable neuron model (8.1)-(8.2) and (8.5) for three different additive white Levy noise cases when the input signals are subthreshold. Note the signature nonmonotonic shape of all three SR noise-benefit curves in Figure 8.3. The membrane potential dynamics (8.1) are one-dimensional for all our neuron models except for the two-dimensional FHN spiking neuron model below. So next we briefly describe multi- dimensional Levy processes and set up a general multi-dimensional Levy stochastic differential equation framework for our feedback continuous and spiking neuron models. 187 8.3 Levy Processes and Stochastic Differential Equations Levy processes [217, 235] form a wide class of random processes that include Brownian motion, -stable processes, compound Poisson processes, generalized inverse Gaussian processes, and generalized hyperbolic processes. Figure 8.2 shows some typical scalar Levy sample paths. Levy processes can account for the impulsiveness or discreteness of both signals and noise. Researchers have used Levy processes to model diverse phenomena in economics [17, 238], physics [242], electrical engineering [3, 17, 192, 214], biology [240], and seismology [244]. A Levy process L t = (L 1 t ;:::;L m t ) 0 fort 0 in a given probability space ( ;F; (F t ) 0t1 ;P ) is a stochastic process taking values inR m with stationary and independent increments (we assume thatL 0 = 0 with probability 1). The Levy processL t obeys three properties: 1. L t L s is independent of sigma-algebraF s for 0s<t1 2. L t L s has the same distribution asL ts 3. L s !L t in probability ifs!t. The Levy-Khintchine formula gives the characteristic function ofL t as [12] (u) =E(e ihu;Lti ) =e t(u) fort 0 and u2R m (8.9) whereh;i is the Euclidean inner product (sojuj =hu;ui 1 2 ). The characteristic exponent or the so-called Levy exponent is (u) =ih;ui 1 2 hu;Kui + Z R m f0g [e ihu;yi - 1 -ihu;yi jyj<1 (y)](dy) (8.10) 188 for some 2 R m , a positive-definite symmetric mm matrix K, and measure on Borel subsets ofR m 0 =R m nf0g (or(f0g) = 0). Then is a Levy measure such that Z R m 0 minf1;jyj 2 g(dy) < 1: (8.11) A Levy processL t combines a drift component, a Brownian motion (Gaussian) component, and a jump component. The Levy-Khintchine triplet (;K;) completely determines these com- ponents. The Levy measure determines both the average number of jumps per unit time and the distribution of jump magnitudes in the jump component ofL t . Jumps of any size in a Borel setB form a compound Poisson process with rate R B (dy) and jump density(dy)= R B (dy) if the closureB does not contain0. gives the velocity of the drift component.K is the covari- ance matrix of the Gaussian component. IfK =0 and = 0 then (8.9) becomesE(e ihu;Lti ) = e ith;ui . ThenL t =t is a simplem-dimensional deterministic motion (drift) with velocity vector . 
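As a concrete example, the jump diffusion of Figure 8.2(b) has the finite Levy measure nu(dy) = (3/0.4)dy on [-0.2, 0.2], which trivially satisfies (8.11). A sketch that simulates it on a fixed time grid as Brownian increments with drift plus an independent compound Poisson sum of uniform jumps, using the caption's parameters:

import numpy as np

rng = np.random.default_rng(3)
T, n = 1.0, 1000
dt = T/n
mu, sigma = 0.1, 0.225                       # drift and Brownian scale
lam, h = 3.0, 0.2                            # Poisson jump rate and jump half-width

k = rng.poisson(lam*dt, n)                   # number of jumps in each small step
jump_sums = np.array([rng.uniform(-h, h, ki).sum() for ki in k])
increments = mu*dt + sigma*np.sqrt(dt)*rng.standard_normal(n) + jump_sums
L = np.concatenate(([0.0], np.cumsum(increments)))   # sample path with L_0 = 0
print(f"L_T = {L[-1]:+.4f}   jumps on [0, T]: {k.sum()}")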
IfK6= 0 and = 0 thenL t is am-dimensional Brownian motion with drift because (8.9) takes the formE(e ihu;Lti ) =e t[ih;ui 1 2 hu;Kui] and because this exponential is the characteristic function of a Gaussian random vector with mean vectort and covariance matrixtK. IfK6=0 and (R m ) <1 then L t is a jump-diffusion process whileK = 0 and (R m ) <1 give a compound Poisson process. IfK =0 and(R m ) =1 thenL t is a purely discontinuous jump process and has an infinite number of small jumps in any time interval of positive length. We consider only the Levy processes whose components L k t have finite second moments: E[(L k t )] 2 <1. This excludes the important family of infinite-variance-stable processes (in- cluding the = 0.5 Levy stable case) where2 (0,2] measures the tail thickness and where sym- metric-stable distributions have characteristic functions(u) =e iu juj [97, 140, 192, 232]. 189 But a finite-moment assumption does not itself imply that the Levy measure is finite: ((R) < 1). Normal inverse Gaussian NIG(;;;) distributions are examples of semi-thick-tailed pure-jump Levy processes that have infinite Levy measure and yet have finite moments of all order [101, 228]. They can model the risks of options hedging and of credit default in portfo- lios of risky debt obligations [121, 238]. They have characteristic functions of the form(u) = e [iu+( p 2 2 p 2 (+iu) 2 )] where 0jj< and> 0. LetL t (;K;) = (L 1 t ;:::;L m t ) 0 be a Levy process that takes values inR m whereL j t ( j ; j ; j ) are real-valued independent Levy processes forj = 1;:::;m. We denote the Levy-Itˆ o decompo- sition [12] ofL j t for eachj = 1;:::;m andt 0 as L j t = j t + j B j t + Z jy j j<1 y j ~ N j (t;dy j ) + Z jy j j1 y j N j (t;dy j ) (8.12) = j t + j B j t + Z t 0 Z jy j j<1 y j ~ N j (ds;dy j ) + Z t 0 Z jy j j1 y j N j (ds;dy j ): (8.13) Here determines the velocity of the deterministic drift process i t while theB j t are real-valued independent standard Brownian motions. Then = ( 1 ;:::; m ) 0 andK =diag[( 1 ) 2 ;:::; ( m ) 2 ]. The N j are independent Poisson random measures on R + R 0 with compensated (mean- subtracted) Poisson processes ~ N j and intensity/Levy measures j . Define the Poisson random measure as N j (t;B) = #fL s 2B for 0stg (8.14) 190 for each Borel setB inR 0 . The Poisson random measure gives the random number of jumps ofL t in the time interval [0;t] with jump size L t in the setB. N j (t;B) is a Poisson random variable with intensity j (B) if j (B)<1 and if we fixt andB. ButN j (t;)(!) is a measure if we fix !2 and t 0. This measure is not a martingale. But the compensated Poisson random measure ~ N j (t;B) =N j (t;B)t j (B) (8.15) is a martingale and gives the compensated Poisson integral (8.12) (the second term on the right- hand side of (8.12)) as Z jy j j<1 y j ~ N j (t;dy j ) = Z jy j j<1 y j N j (t;dy j ) t Z jy j j<1 y j j (dy j ) forj = 1;:::;m: (8.16) We assume again that each L j t has a finite second moment (EjL j t j 2 <1). But if L j t is a Levy process with triplet ( j ; j ; j ) thenL j t has a finitep th moment forp2 R + if and only if R jy j j>1 jy j j p j (dy j ) <1 [235]. The drift velocity j relates to the expected value of a Levy Process L j t by E(L j 1 ) = j + R jy j j>1 y j (dy j ) and E(L j t ) = tE(L j 1 ). So if L j t is a standard Brownian motion then j = 0,E(L j t ) = 0, and Var(L j t ) =t( j ) 2 . 
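The NIG processes mentioned above can be sampled through their normal variance-mean mixture representation X = mu + beta*Z + sqrt(Z)*N with Z inverse Gaussian. The sketch below reads the Figure 8.2(c) parameters as alpha = 20, beta = 0, delta = 0.1, mu = 0 and uses the mixing law Z ~ IG(mean = delta/gamma, shape = delta^2) with gamma = sqrt(alpha^2 - beta^2); this parameterization is an assumption about conventions, so treat the details as illustrative.

import numpy as np

def nig_increments(alpha, beta, delta, mu, dt, size, rng):
    """NIG(alpha, beta, delta*dt, mu*dt) increments via the inverse-Gaussian mixture
    X = mu*dt + beta*Z + sqrt(Z)*N,  Z ~ IG(mean = delta*dt/gamma, shape = (delta*dt)**2)."""
    gamma = np.sqrt(alpha**2 - beta**2)
    d = delta*dt
    Z = rng.wald(d/gamma, d**2, size)                 # inverse Gaussian mixing variable
    return mu*dt + beta*Z + np.sqrt(Z)*rng.standard_normal(size)

rng = np.random.default_rng(4)
dL = nig_increments(alpha=20.0, beta=0.0, delta=0.1, mu=0.0, dt=1e-3, size=100000, rng=rng)
print("sample variance of the increments:", dL.var())
print("theoretical value delta*dt/alpha :", 0.1*1e-3/20.0)   # variance for beta = 0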
The variance of the Levy process in (8.12) is Var(L j t ) = Var( j B j t ) + Var( Z jy j j<1 y j ~ N j (t;dy j )) + Var( Z jy j j1 y j N j (t;dy j )) (8.17) 191 because the underlying processes are independent. The variance terms on the right-hand side of (8.17) have the following form [12]: Var(L j t ) = t( j ) 2 (8.18) Var Z jy j j1 y j N j (t;dy j ) ! = t Z jy j j1 jy j j 2 j (dy j ) (8.19) Var Z jy j j<1 y j ~ N j (t;dy j ) ! E Z jy j j<1 y j ~ N j (t;dy j ) ! 2 (8.20) = t Z jy j j<1 jy j j 2 j (dy j ): (8.21) The last equality follows from the Itˆ o isometry identity (Proposition 8.8 in [60]). Then (8.17) and (8.18)-(8.21) imply that the Var(L j t )! 0 if and only if j ! 0 and j ! 0. We can rewrite (8.1)-(8.2) as a more general Itˆ o stochastic differential equation (SDE) [12] dX t = b(X t )dt +c(X t )dL t (8.22) Y t = g(X t ) (8.23) with initial conditionX 0 . Hereb(X t ) =X t +f(X t ) +S t is a Lipschitz continuous drift term,c(X t ) is a bounded Levy diffusion term, anddL t is a white Levy noise with noise scale. Our continuous neuron models are again one-dimensional but the spiking FHN neuron model is two-dimensional. So consider the general d-dimensional SDE in the matrix form with m- dimensional Levy noiseL t = (L 1 t ;:::;L m t ) 0 dX t =b(X t )dt +c(X t )dL t (8.24) 192 which is shorthand for the system of SDEs dX i t =b i (X t )dt + m X j=1 c i j (X t )dL j t fori = 1;::;d (8.25) with initial condition X i 0 . Here X t = (X 1 t ;:::;X d t ) 0 , b(X t ) = (b 1 (X t );:::;b d (X t )) 0 , and c is a dm matrix with rowsc i (X t ) = (c i 1 (X t );:::;c i m (X t )). The functionsb i : R d !R are locally or globally Lipschitz measurable functions. The functionsc i j : R d ! R are bounded globally Lipschitz measurable functions such thatjc i j j 2 H i j 2R + . TheL j t terms are independent Levy processes as in (8.13) with j = 0 forj = 1, ...,m. Then dX i t = b i (X t )dt + m X j=1 [c i j (X t ) j ]dt + m X j=1 [c i j (X t ) j ]dB j t + m X j=1 Z jy j j<1 [c i j (X t )y j ] ~ N j (dt;dy j ) + m X j=1 Z jy j j1 [c i j (X t )y j ]N j (dt;dy j ) = b i (X t )dt + m X j=1 [ i j (X t )]dt + m X j=1 i j (X t )dB j t + m X j=1 Z jy j j<1 F i j (X t ;y j ) ~ N j (dt;dy j ) + m X j=1 Z jy j j1 G i j (X t ;y j )N j (dt;dy j ): (8.26) where i j (X t ) =c i j (X t ) j = 0, i j (X t ) =c i j (X t ) j , F i j (X t ;y j ) =c i j (X t )y j , and G i j (X t ;y j ) =c i j (X t )y j are all globally Lipschitz functions. This equation has the integral form with initial conditionX i 0 : X i t = X i 0 + Z t 0 b i (X s )ds + m X j=1 Z t 0 i j (X s )dB j s + m X j=1 Z t 0 Z jy j j<1 F i j (X s ;y j ) ~ N j (ds;dy j ) + m X j=1 Z t 0 Z jy j j1 G i j (X s ;y j )N j (ds;dy j ): (8.27) 193 8.4 Levy Noise Benefits in Continuous Neuron Models We now prove that Levy noise can benefit the noisy continuous neurons (8.22)-(8.23) with signal functions (8.4)-(8.8) and subthreshold input signals. We assume that the neuron receives a con- stant subthreshold input signalS t 2fs 1 ;s 2 g for timeT . LetS denote the input signal and letY denote the output signalY t for a sufficiently large randomly chosen timetT . Noise researchers have used various system performance measures to detect SR noise benefits [39, 55, 126, 151, 178, 187, 188, 203, 222]. These include the output signal-to-noise ratio, cross- correlation, error probability, and Shannon mutual information between input and output signals. We use Shannon mutual information to measure the Levy noise benefits. 
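A quick Monte Carlo check of the variance decomposition (8.17)-(8.21) stated earlier in this section is possible for the finite-activity jump diffusion of Figure 8.2(b): with rate-3 compound Poisson jumps uniform on [-0.2, 0.2], the jump term contributes t times the integral of y^2 against the Levy measure, which equals t*lam*h^2/3, so Var(L_t) = t*(sigma^2 + lam*h^2/3). A sketch:

import numpy as np

rng = np.random.default_rng(5)
t, sigma, lam, h = 1.0, 0.225, 3.0, 0.2
n_paths = 200000

gauss_part = sigma*np.sqrt(t)*rng.standard_normal(n_paths)              # Brownian component at time t
counts = rng.poisson(lam*t, n_paths)                                    # jumps per path on [0, t]
all_jumps = rng.uniform(-h, h, counts.sum())
path_id = np.repeat(np.arange(n_paths), counts)
jump_part = np.bincount(path_id, weights=all_jumps, minlength=n_paths)  # compound Poisson sums
L_t = gauss_part + jump_part

print("empirical   Var(L_t) =", round(L_t.var(), 5))
print("theoretical Var(L_t) =", round(t*(sigma**2 + lam*h**2/3.0), 5))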
Mutual information measures the information that the neuron’s output conveys about the input signal. It is a common detection performance measure when the input signal is random [39, 116, 188, 246]. Define the Shannon mutual informationI(S;Y ) of the discrete input random variableS and the output random variableY as the difference between the output’s unconditional and conditional entropy [61]: I(S;Y ) = H(Y )H(YjS) (8.28) = X y P Y (y) logP Y (y) + X s X y P SY (s;y) logP YjS (yjs) (8.29) = X y P Y (y) logP Y (y) + X s P S (s) X y P YjS (yjs) logP YjS (yjs) (8.30) = X s;y P SY (s;y) log P SY (s;y) P S (s)P Y (y) : (8.31) 194 So the mutual information is the expectation of the random variable log P SY (s;y) P S (s)P Y (y) : I(S;Y ) = E h log P SY (s;y) P S (s)P Y (y) i : (8.32) HereP S (s) is the probability density of the inputS,P Y (y) is the probability density of the output Y ,P YjS (yjs) is the conditional density of the outputY given the inputS, andP SY (s;y) is the joint density of the inputS and the outputY . An SR noise benefit occurs in a system if and only if an increase in the input noise variance or dispersion increases the system’s mutual information (8.32). We need the following lemma to prove that noise improves the continuous neuron’s mutual information or bit count. The Appendix gives the proof of Lemma 8.1. Lemma 8.1 Letb i : R d ! R andc i j : R d ! R in (8.24)-(8.25) be measurable functions that satisfy the global Lipschitz conditions jb i (x 1 )b i (x 2 )j K 1 kx 1 x 2 k (8.33) jc i j (x 1 )c i j (x 2 )j K 2 kx 1 x 2 k (8.34) and jc i j (x 1 )j 2 H i j (8.35) for allx 1 ;x 2 2R d and fori = 1, ...,d andj = 1, ...,m. Suppose 195 dX t = b(X t )dt +c(X t )dL t (8.36) d ^ X t = b( ^ X t )dt (8.37) wheredL t is a Levy noise with =0 and finite second moments. Then for everyT2R + and for every"> 0: E[ sup 0tT kX t ^ X t k 2 ]! 0 as j ! 0 and j ! 0 (8.38) for allj = 1, ...,m, and hence P [ sup 0tT kX t ^ X t k 2 >"]! 0 as j ! 0 and j ! 0 (8.39) for allj = 1, ...,m since mean-square convergence implies convergence in probability. We prove the Levy SR theorem with the stochastic calculus and a special limiting argument. This avoids trying to solve for the processX t in (8.22). The proof strategy follows that of the ‘forbidden interval’ theorems [140, 141, 203]: what goes down must go up. Jensen’s inequality implies thatI(S;Y ) 0 [61]. Random variablesS andY are statistically independent if and only if I(S;Y ) = 0. Hence I(S;Y ) > 0 implies some degree of statistical dependence. So the system must exhibit the SR noise benefit if I(S;Y ) > 0 and if I(S;Y )! 0 when noise parameters! 0 and! 0. Theorem 8.1 uses Lemma 8.1 to show thatI(S;Y )! 0 when noise parameters! 0 and! 0. So some increase in the noise parameters must increase the mutual information. 196 Theorem 8.1 Suppose that the continuous neuron models (8.22)-(8.23) and (8.4)-(8.8) have a bounded globally Lipschitz Levy diffusion termc(X t )H and that the additive Levy noise has drift velocity. Suppose also that the input signalS(t)2fs 1 ;s 2 g is subthreshold: 1 s 1 < s 2 2 and that there is some statistical dependence between the input random variableS and the output spike- rate random variableR so thatI(S;R) > 0. Then the neuron models (8.22)-(8.23) with signal functions including (8.4)-(8.8) exhibit the nonmonotone SR effect in the sense thatI(S;Y )! 0 as the Levy noise parameters! 0 and! 0 if 1 s 1 H 2 s 2 . Proof: Let f k ; k g 1 k=1 be any decreasing sequence of Levy noise parameters such that k ! 0 and k ! 0 as k!1. 
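Before stating the theorem, the content of Lemma 8.1 can be illustrated numerically in the scalar case: for the bistable tanh neuron (8.1)-(8.2) with (8.5), the noisy trajectory tracks the noise-free trajectory, and the mean of the pathwise supremum of |X_t - Xhat_t|^2 shrinks as the noise scale kappa decreases. The sketch below drives the neuron with the jump-diffusion noise of Figure 8.2(b); the step size, horizon, kappa grid, and the at-most-one-jump-per-step approximation of the compound Poisson part are simplifications.

import numpy as np

rng = np.random.default_rng(6)
dt, n_steps, s, x0 = 1e-3, 5000, -0.3, 0.0
sig, lam, h = 0.225, 3.0, 0.2                    # jump-diffusion noise components

def drift(x):                                    # -x + f(x) + s with f(x) = 2 tanh(x)
    return -x + 2.0*np.tanh(x) + s

def mean_sup_sq_deviation(kappa, n_paths=2000):
    x = np.full(n_paths, x0)                     # noisy paths X_t
    xhat = x0                                    # noise-free path Xhat_t
    sup = np.zeros(n_paths)
    for _ in range(n_steps):
        dL = sig*np.sqrt(dt)*rng.standard_normal(n_paths)
        dL += np.where(rng.random(n_paths) < lam*dt,      # at most one jump per small step
                       rng.uniform(-h, h, n_paths), 0.0)
        x = x + drift(x)*dt + kappa*dL                    # Euler-Maruyama step of (8.36)
        xhat = xhat + drift(xhat)*dt                      # Euler step of (8.37)
        sup = np.maximum(sup, (x - xhat)**2)
    return sup.mean()

for kappa in [0.5, 0.2, 0.05, 0.01]:
    print(f"kappa = {kappa:4.2f}   E[sup_t |X_t - Xhat_t|^2] ~ {mean_sup_sq_deviation(kappa):.5f}")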
Define X(t) k and Y (t) k as solution processes of the continuous neuron models with Levy noise parameters k and k instead of and. Suppose that6= 0. We can absorb the driftc(X t ) into the input signalS because the Levy noiseL t is additive in the neuron models. Then the new input signalS 0 =S +c(X t ) and it does not affect the Lipschitz continuity ofb(X t ) in equation (8.22). Note thatS 0 is subthreshold ( 1 S 0 2 ) if 1 s 1 H 2 s 2 . So we lose no generality if we consider the noisedL t with = 0 and letS2fs 1 ;s 2 g be subthreshold in the continuous neuron models (8.22). This allows us to use Lemma 8.1. Let the symbol ‘0’ denote the input signalS =s 1 and the output signalY = 0. Let the symbol ‘1’ denote the input signalS =s 2 and the output signalY = 1. Assume that 0<P S (s)< 1 to avoid triviality whenP S (s) = 0 or 1. We show thatS andY are asymptotically independent by using the fact thatI(S;Y ) = 0 if and only ifS andY are statistically independent [61]. So we 197 need to show only thatP SY (s;y) = P S (s)P Y (y) orP YjS (yjs) = P Y (y) as k ! 0 and k ! 0 as k!1 for signal symbols s2 S and y2 Y . The theorem of total probability and the two-symbol alphabet setS give P Y (y) = X s P YjS (yjs)P S (s) = P YjS (yj0)P S (0) +P YjS (yj1)P S (1) = P YjS (yj0)P S (0) +P YjS (yj1)(1P S (0)) = (P YjS (yj0)P YjS (yj1))P S (0) +P YjS (yj1) So we need to show only thatP Y k jS (yj0)P Y k jS (yj1) = 0 as k ! 0 and k ! 0 fory2f0; 1g . We prove the case fory = 0 only: lim k!1 fP Y k jS (0j0)P Y k jS (0j1)g = 0 since the proof for y = 1 is similar. Then the desired limit goes to zero because lim k!1 fP Y k jS (0j0) P Y k jS (0j1)g = lim k!1 P Y k jS (0j0) lim k!1 P Y k jS (0j1) (8.40) = lim k!1 P [Y k = 0jS = 0] lim k!1 P [Y k = 0jS = 1] (8.41) = lim k!1 P [X(t) k < 0jS = 0] lim k!1 P [X(t) k < 0jS = 1] for larget = lim k!1 n P [X(t) k < 0; ^ X t < 0jS = 0] + P [X(t) k < 0; ^ X t > 0jS = 0] o (8.42) lim k!1 n P [X(t) k > 0; ^ X t < 0jS = 1] + P [X(t) k > 0; ^ X t > 0jS = 1] o for larget (8.43) 198 = lim k!1 n P [X(t) k < 0j ^ X t < 0;S = 0]P [ ^ X t < 0jS = 0] + P [X(t) k < 0j ^ X t > 0;S = 0]P [ ^ X t > 0jS = 0] o (8.44) lim k!1 n P [X(t) k > 0j ^ X t < 0;S = 1]P [ ^ X t < 0jS = 1] +P [X(t) k > 0j ^ X t > 0;S = 1]P [ ^ X t > 0jS = 1] o (8.45) for larget = f1 1 2 + 0 1 2 gf0 1 2 + 1 1 2 g (8.46) by Lemma 8.1 and the assumption that P [ ^ X t < 0jS =s i ] =P [ ^ X t > 0jS =s i ] = 1=2 (8.47) fori = 1; 2: = 0 (8.48) Sub-figures (a) and (b) of Figures 8.4-8.6 show simulation instances of Theorem 8.1 for finite- variance jump-diffusion and pure-jump additive white Levy noise in logistic, linear-threshold, and Gaussian neuron models. Small amounts of additive Levy noise in continuous neuron models pro- duce the SR effect by increasing the Shannon mutual informationI(S;Y ) between realizations of a random (Bernoulli) subthreshold input signalS and the neuron’s thresholded output random variableY . The SR effect in Figures 8.3(c) and 8.4-8.6(c) lie outside the scope of the theorem because it occurs for infinite-variance-stable noise. Thus the SR effect in continuous neurons is not limited to finite-second-moment Levy noise. 
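A condensed version of the experiment behind Figure 8.3 follows. It assumes the bistable tanh neuron (8.1)-(8.2) with (8.5), reads the caption's subthreshold inputs as s1 = -0.3 and s2 = 0.4, drives the neuron with the uniform jump-diffusion noise of Figure 8.2(b) scaled by kappa, and, as a simplification, presents each Bernoulli symbol for a fixed interval from a common initial state before reading the thresholded output Y = 1{x > 0}. The time step, presentation length, and kappa grid are illustrative choices; sweeping kappa is what exposes the nonmonotone SR signature.

import numpy as np

rng = np.random.default_rng(7)
s_vals = (-0.3, 0.4)                           # assumed reading of the subthreshold inputs
sig, lam, h = 0.225, 3.0, 0.2                  # jump-diffusion noise components

def mutual_info_binary(s, y):
    """Plug-in estimate of I(S;Y) in bits from two binary sample vectors."""
    info = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((s == a) & (y == b))
            if p_ab > 0:
                info += p_ab*np.log2(p_ab/(np.mean(s == a)*np.mean(y == b)))
    return info

def run(kappa, n_trials=3000, n_steps=300, dt=1e-2):
    s = rng.integers(0, 2, n_trials)                       # Bernoulli input symbols
    inp = np.where(s == 1, s_vals[1], s_vals[0])
    x = np.full(n_trials, -1.8)                            # common initial state in the lower well
    for _ in range(n_steps):
        dL = sig*np.sqrt(dt)*rng.standard_normal(n_trials)
        dL += np.where(rng.random(n_trials) < lam*dt,      # at most one jump per small step
                       rng.uniform(-h, h, n_trials), 0.0)
        x += (-x + 2.0*np.tanh(x) + inp)*dt + kappa*dL     # Euler-Maruyama step of (8.1)
    y = (x > 0).astype(int)                                # thresholded output (8.3)
    return mutual_info_binary(s, y)

for kappa in [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]:
    print(f"kappa = {kappa:3.1f}   I(S;Y) ~ {run(kappa):.3f} bits")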
199 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 Scale κ of additive white uniform jump−diffusion Levy noise Mutual information I(S,R) in bits 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 Scale κ of additive white NIG Levy noise Mutual information I(S,R) in bits 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 Scale κ of additive white α−stable Levy noise Mutual information I(S,R) in bits (a) (b) (c) Figure 8.4: Mutual information SR Levy noise benefits in the logistic continuous neuron (8.1)- (8.2) and (8.4). Additive white Levy noisedL t increases the mutual information of the logistic neuron for the subthreshold input signal s 1 =0:6, s 2 =0:4, and c = 8. The types of Levy noise dL t are (a) Gaussian with uniformly distributed jumps, (b) pure-jump normal inverse Gaussian (NIG), and (c) symmetric-stable noise with = 1.95 (thick-tailed bell curve with infinite variance [192]). The dashed vertical lines show the total min-max deviations of the mutual information in 100 simulation trials. 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 Scale κ of additive white uniform jump−diffusion Levy noise Mutual information I(S,R) in bits 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 Scale κ of additive white NIG Levy noise Mutual information I(S,R) in bits 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 Scale κ of additive white α−stable Levy noise Mutual information I(S,R) in bits (a) (b) (c) Figure 8.5: Mutual information Levy noise benefits in the linear-threshold continuous neuron (8.1)-(8.2) and (8.6). Additive white Levy noise dL t increases the mutual information of the linear-threshold neuron for the subthreshold input signals 1 =0:4,s 2 = 0:4, andc = 2. The types of Levy noisedL t are (a) Gaussian with uniformly distributed jumps, (b) pure-jump normal inverse Gaussian (NIG), and (c) symmetric - stable noise with = 1.95 (thick-tailed bell curve with infinite variance [192]). The dashed vertical lines show the total min-max deviations of the mutual information in 100 simulation trials. 200 0 0.5 1 1.5 2 2.5 3 3.5 4 0 0.1 0.2 0.3 0.4 0.5 0.6 Scale κ of additive white uniform jump−diffusion Levy noise Mutual information I(S,R) in bits 0 0.5 1 1.5 2 2.5 3 3.5 4 0 0.1 0.2 0.3 0.4 0.5 0.6 Scale κ of additive white NIG Levy noise Mutual information I(S,R) in bits 0 0.5 1 1.5 2 2.5 3 3.5 4 0 0.1 0.2 0.3 0.4 0.5 0.6 0.6 Scale κ of additive white α−stable Levy noise Mutual information I(S,R) in bits (a) (b) (c) Figure 8.6: Mutual information Levy noise benefits in the Gaussian or ‘radial basis’ continuous neuron (8.1)-(8.2) and (8.8). Mutual information Levy noise benefits in the Gaussian or ‘radial basis’ continuous neuron (8.1)-(8.2) and (8.8). Additive white Levy noise dL t increases the mutual information of the Gaussian neuron for the subthreshold input signals 1 =0:4,s 2 = 0:4, andc = 8. The types of Levy noisedL t are (a) Gaussian with uniformly distributed jumps, (b) pure-jump normal inverse Gaussian (NIG), and (c) symmetric - stable noise with = 1.95 (thick-tailed bell curve with infinite variance [192]). The dashed vertical lines show the total min-max deviations of the mutual information in 100 simulation trials. 
8.5 Levy Noise Benefits in Spiking Neuron Models We next demonstrate Levy SR noise benefits in three popular spiking neuron models: the leaky integrate-and-fire model [55, 88], the reduced Type I neuron model [160], and the FitzHugh- Nagumo (FHN) model [76, 56]. This requires the use of Lemma 8.2 as we discuss below. These neuron models have a one- or two-dimensional form of (8.1). A spike occurs when the membrane potentialx(t) crosses a threshold value from below. We measure the mutual informationI(S;R) between the input signal s(t) and the output spike-rate response R of theses spiking neuron models. We define the average output spike-rate responseR in the time interval (t 1 ;t 2 ] as R = N(t 1 ;t 2 ] t 2 t 1 (8.49) 201 whereN(t 1 ;t 2 ] is the number of spikes in the time interval (t 1 ;t 2 ]. The Leaky Integrate-and-Fire Neuron Model The leaky integrate-and-fire neuron model has the form [55] _ v = av +a +S +n (8.50) wherev is the membrane voltage,a and are constants,=a is the barrier height of the potential, S is an input signal, andn is independent Gaussian white noise in the neural literature but here is Levy white noise. The input signalS is subthreshold whenS < . The neuron emits a spike when the membrane voltagev crosses the threshold value of 1 from below to above. The mem- brane voltagev resets to 1=a just after the neuron emits a spike. The Reduced Type I Neuron Model The reduction procedure in [99, 113] gives a simple one-dimensional normal form [160] of the multi-dimensional dynamics of Type I neuron models: _ v = +v 2 +n (8.51) wherev is the membrane potential, is the value of input signal, and is the standard deviation of Gaussian white noise n in the neural literature but here is Levy white noise. This reduced model (8.51) operates in a subthreshold or excitable regime when the input< 0. 202 The FitzHugh-Nagumo (FHN) Neuron Model The FHN neuron model [56, 76, 88] is a two-dimensional simplification of the Hodgkin and Huxley neuron model [110]. It describes the response of a so-called Type II excitable system [88, 159] that undergoes a Hopf bifurcation. The system first resides in the stable rest state for subthreshold inputs as do multistable systems. Then the system leaves the stable state in response to a strong input but returns to it after passing through firing and refractory states in a manner that differs from the behavior of multistable systems. The FHN neuron model is a limit-cycle oscillator of the form _ v = v(v 2 1 4 )w +A +s(t) +n(t) (8.52) _ w = vw (8.53) wherev(t) is a fast (voltage) variable, w(t) is slow (recovery) variable, A is a constant (tonic) activation signal, and = 0:005. n(t) is a white Levy noise and s(t) is a subthreshold input signal—eithers 1 ors 2 . We measure the neuron’s response to the input signals(t) in terms of the transition (firing) rater(t). We can rewrite (8.52)-(8.53) as _ v =v(v 2 1 4 )w +A T (Bs(t)) +n(t) (8.54) _ w = vw (8.55) where B is a positive constant parameter that corresponds to the distance that the input signal s(t) must overcome to cross the threshold. Then Bs(t) is the signal-to-threshold distance 203 and so s(t) is subthreshold when Bs(t) > 0. Our simulations used B = 0:007 and hence A =(5=(12 p 3 + 0:007)). The deterministic FHN model (n(t) 0 in (8.54)) performs relaxation oscillations and has an action potentialv(t) that lies between -0.6 and 0.6. The system emits a spike whenv(t) crosses the threshold value = 0. We use a lowpass-filtered version ofv(t) to avoid false spike detections due to the additive noise. 
The lowpass filter is a 100-point moving-average smoother with a 0.001 second time-step. We rewrite equations (8.52)-(8.53) as _ x 1 = x 1 ((x 1 ) 2 1 4 ) x 2 + A + s(t) + n(t) (8.56) _ x 2 = x 1 x 2 : (8.57) Herex 1 =v andx 2 =w. The corresponding matrix Itˆ o stochastic differential equation is dX t = b(X t )dt +c(X t )dL t (8.58) whereX t = (X 1 t ;X 2 t ) T ,L t = (L 1 t ;L 2 t ) T , b(X t ) = 2 6 6 4 b 1 (X 1 t ; X 2 t ) b 2 (X 1 t ; X 2 t ) 3 7 7 5 = 2 6 6 4 X 1 t ((X 1 t ) 2 1 4 ) X 2 t + A + st X 1 t X 2 t 3 7 7 5 ; and c(X t ) = 2 6 6 4 0 3 7 7 5 . Thus all of the above spiking neuron models have the SDE form (8.24). Note that the drift term of the leaky integrate-and-fire neuron model is globally Lipschitz while the drift term of the 204 reduced Type I neuron model is locally Lipschitz. The Lipschitz condition is not easy to verify in the FHN model. We now show that the drift termb 1 (X t ) in the preceding equation does not satisfy the global Lipschitz condition. Note thatb 1 (X t ) is differentiable onR 2 because the partial derivatives of b 1 (X 1 t ;X 2 t ) exist and are continuous onR 2 . Suppose thatb 1 (X t ) satisfies the following global Lipschitz condition: There exists a constantK > 0 such that jb 1 (Z t )b 1 (Y t )j KjjZ t Y t jj for allZ t andY t 2R 2 andt2 [0;T ]. Then the mean-value theorem gives b 1 (Z t )b 1 (Y t ) = [ @b 1 () @X 1 t @b 1 () @X 2 t ] [Z t Y t ] (8.59) = @b 1 () @X 1 t (Z 1 t Y 1 t ) + @b 1 () @X 2 t (Z 2 t Y 2 t ) for some betweenZ t andY t inR 2 . Then jb 1 (Z t )b 1 (Y t )j j @b 1 () @X 1 t jjZ 1 t -Y 1 t jj @b 1 () @X 2 t jjZ 2 t -Y 2 t j = j @b 1 () @X 1 t jjZ 1 t Y 1 t j 1 jZ 2 t Y 2 t j because @b 1 @X 2 t = 1 = j @b 1 () @X 1 t jjZ 1 t Y 1 t j choosing Z t andY t such thatZ 2 t =Y 2 t = j @b 1 () @X 1 t jjjZ t Y t jj > KjjZ t Y t jj 205 for some Z t 2 R 2 and Y t 2 R 2 becausej @b 1 @X 1 t j =j 3(X 1 t ) 2 + 1 4 j is unbounded and continuous onR 2 and so there is a domainDR 2 such thatj @b 1 () @X 1 t j> K for all2D. Thusb 1 (X t ) is not globally Lipschitz. So we cannot use Lemma 8.1 to prove the sufficient condition for the SR effect in the FHN neuron model (8.54)-(8.55). Butb 1 (X t ) is locally Lipschitz. The partial derivatives ofb 1 (X 1 t ;X 2 t ) exist and are contin- uous onR 2 . So @b 1 @X 1 t and @b 1 @X 2 t achieve their respective maxima on the compact setf2 R 2 : jjjjng. Then (8.59) gives the required local Lipschitz condition: jb 1 (Z t )b 1 (Y t )j maxfsup j @b 1 () @X 1 t j; sup j @b 1 () @X 2 t jgjjZ t Y t jj = K 0 n jjZ t Y t jj for allZ t andY t 2 R 2 such thatjjZ t jj n,jjY t jj n, andjjjj n. Lemma 8.2 extends the conclusion of Lemma 8.1 to the locally Lipschitz drift termsb i (X t ). Theorem 8.2 below gives a ‘forbidden-interval’ sufficient condition for a Levy SR noise ben- efits in spiking neuron models such as the leaky integrate-and-fire model [55, 88], the reduced Type I neuron model [160], and the FitzHugh-Nagumo (FHN) model [76, 56]. It shows that these neuron models enjoy SR noise benefits if the noise mean falls to the left of a bound. Theorem 8.2 requires Lemma 8.2 to extend the conclusion of Lemma 8.1 to the locally Lipschitz drift terms b i (X t ). The Appendix gives the proof of Lemma 8.2. 
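Returning to the measurement pipeline, the sketch below combines the 100-point moving-average smoother just described with the spike-rate estimate (8.49), counting upward threshold crossings per unit time. The voltage trace is synthetic (stereotyped spike-like bumps on a subthreshold baseline plus additive noise) rather than an integrated FHN path, so the printed numbers only illustrate why the smoothing step is needed.

import numpy as np

rng = np.random.default_rng(8)
dt = 1e-3                                         # 0.001 s time step
t = np.arange(0.0, 10.0, dt)                      # 10 s of synthetic "membrane voltage"

v = -0.3*np.ones_like(t)                          # resting level below the threshold 0
for t_spike in (1.0, 2.5, 4.0, 6.2, 8.8):         # five stereotyped spike-like bumps
    v += 0.9*np.exp(-0.5*((t - t_spike)/0.02)**2)
v += 0.15*rng.standard_normal(t.size)             # additive noise on the raw trace

v_smooth = np.convolve(v, np.ones(100)/100.0, mode="same")   # 100-point moving average

def spike_rate(trace, threshold=0.0, dt=1e-3):
    """Average rate (8.49): upward threshold crossings divided by the record length."""
    above = trace > threshold
    n_spikes = np.count_nonzero(above[1:] & ~above[:-1])
    return n_spikes/(trace.size*dt)

print("rate from the raw trace     :", spike_rate(v), "spikes/s   (noise-induced false counts)")
print("rate from the smoothed trace:", spike_rate(v_smooth), "spikes/s")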
Lemma 8.2. Let bⁱ : ℝ^d → ℝ and cⁱ_j : ℝ^d → ℝ in (8.24)-(8.25) ((8.50)-(8.53) for spiking neuron models) be measurable functions that satisfy the respective local and global Lipschitz conditions

|bⁱ(z) − bⁱ(y)| ≤ C_n ‖z − y‖  when ‖z‖ ≤ n and ‖y‖ ≤ n,        (8.60)
|cⁱ_j(z) − cⁱ_j(y)| ≤ K₁ ‖z − y‖        (8.61)
|cⁱ_j(z)|² ≤ Hⁱ_j        (8.62)

for all z and y ∈ ℝ^d, and for i = 1, ..., d and j = 1, ..., m. Suppose

dX_t = b(X_t) dt + c(X_t) dL_t        (8.63)
dX̂_t = b(X̂_t) dt        (8.64)

where dL_t is a Levy noise with drift μ = 0 and finite second moments. Then for every T ∈ ℝ₊ and for every ε > 0:

E[ sup_{0≤t≤T} ‖X_t − X̂_t‖² ] → 0  as σ_j → 0 and κ_j → 0        (8.65)

for all j = 1, ..., m, and hence

P[ sup_{0≤t≤T} ‖X_t − X̂_t‖² > ε ] → 0  as σ_j → 0 and κ_j → 0        (8.66)

for all j = 1, ..., m since mean-square convergence implies convergence in probability.

Figure 8.7: Mutual information Levy noise benefits in the leaky integrate-and-fire spiking neuron model (8.50). Additive white Levy noise dL_t increases the mutual information of the IF neuron with parameter a = 0.5 and firing threshold 0.02 for the subthreshold input signals s₁ = 0.005 and s₂ = 0.012. Each panel plots the mutual information I(S,R) in bits against the scale κ of the additive white Levy noise. The types of Levy noise dL_t are (a) Gaussian, (b) Gaussian with uniformly distributed jumps, and (c) symmetric α-stable noise with α = 1.95 (thick-tailed bell curve with infinite variance [192]). The dashed vertical lines show the total min-max deviations of the mutual information in 100 simulation trials.

We can now state and prove Theorem 8.2.

Theorem 8.2. Suppose that the spiking neuron models (8.50)-(8.51) and (8.52)-(8.53) have the form of the Levy SDE (8.24) with a bounded globally Lipschitz Levy diffusion term c(X_t) ≤ H and that the additive Levy noise has drift velocity μ. Suppose that the input signal S(t) ∈ {s₁, s₂} is subthreshold: S(t) < B. Suppose there is some statistical dependence between the input random variable S and the output spike-rate random variable R so that I(S;R) > 0. Then the spiking neuron models (8.50)-(8.51) and (8.52)-(8.53) exhibit the SR effect in the sense that I(S;R) → 0 as the Levy noise parameters σ → 0 and κ → 0 if μH < B − s₂.

Proof: Let {κ_k, σ_k}_{k=1}^∞ be any decreasing sequence of Levy noise parameters such that κ_k → 0 and σ_k → 0 as k → ∞. Define X(t)_k and R_k as the respective solution process and spiking-rate process of the FHN spiking neuron model (8.58) with Levy noise parameters κ_k and σ_k instead of κ and σ. Suppose that μ ≠ 0. We can absorb the drift c(X_t)μ into the input signal S because the Levy noise L_t is additive in all the neuron models. Then the new input signal is S′ = S + c(X_t)μ and this does not affect the Lipschitz continuity of b(X_t) in (8.22). S′ is subthreshold (S′ < B) because c(X_t)μ ≤ μH < B − s₂ where s₂ = max{s₁, s₂}. So we lose no generality if we consider the noise dL_t with μ = 0 and let S ∈ {s₁, s₂} be subthreshold in the continuous neuron models (8.22). This allows us to use Lemma 8.2.

Recall that I(S;R) = 0 if and only if S and R are statistically independent [61]. So we need to show only that f_SR(s,r) = P_S(s) f_R(r) or f_{R|S}(r|s) = f_R(r) as σ → 0 and κ → 0 for signal symbols s ∈ {s₁, s₂} and for all r ≥ 0. Here f_SR is the joint probability density function and f_{R|S} is the conditional density function. This is logically equivalent to F_{R|S} = F_R as σ_k → 0 and κ_k → 0 as k → ∞ where F_{R|S} is the conditional distribution function [72]. Again the theorem of total probability and the two-symbol alphabet set {s₁, s₂} give

F_R(r) = Σ_s F_{R|S}(r|s) P_S(s)        (8.67)
      = F_{R|S}(r|s₁) P_S(s₁) + F_{R|S}(r|s₂) P_S(s₂)
      = F_{R|S}(r|s₁) P_S(s₁) + F_{R|S}(r|s₂) (1 − P_S(s₁))
      = (F_{R|S}(r|s₁) − F_{R|S}(r|s₂)) P_S(s₁) + F_{R|S}(r|s₂).        (8.68)

So we need to show that lim_{k→∞} F_{R_k|S}(r|s₁) − F_{R_k|S}(r|s₂) = 0 for all r ≥ 0. This holds if and only if

lim_{k→∞} P[R_k > r | S = s₁] − P[R_k > r | S = s₂] = 0.        (8.69)

We prove that lim_{k→∞} P[R_k > r | S = s_i] = 0 for i = 1 and i = 2. Note that if r > 0 for (8.58) then X₁(t)_k must cross the firing or spike threshold Θ. Then

P[R_k > r | S = s_i] ≤ P[ sup_{t₁≤t≤t₂} X₁(t)_k > Θ | S = s_i ].

Then Lemma 8.2 shows that the required limit goes to zero:

lim_{k→∞} P[R_k > r | S = s_i]        (8.70)
  ≤ lim_{k→∞} P[ sup_{t₁≤t≤t₂} X₁(t)_k > Θ | S = s_i ]        (8.71)
  = lim_{k→∞} P[ sup_{t₁≤t≤t₂} X₁(t)_k > Θ, X̂₁(t) < Θ | S = s_i ]        (8.72)
    because X̂₁(t) converges to the FHN fixed point Z_F^i < Θ for large t
  = 0 by Lemma 8.2.

Sub-figures (a) and (b) of Figures 8.7-8.8 show simulation instances of Theorem 8.2 for finite-variance diffusion and jump-diffusion white Levy noise in the leaky integrate-and-fire and the FHN neuron models. Small amounts of additive Levy noise in these spiking neuron models produce the SR effect in terms of the noise-enhanced Shannon mutual information I(S,Y) between realizations of a random (Bernoulli) subthreshold input signal S and the neuron's thresholded output random variable Y.

Figure 8.8: Mutual information Levy noise benefits in the FHN spiking neuron (8.52)-(8.53). Additive white Levy noise dL_t increases the mutual information of the FHN neuron for the subthreshold input signals s₁ = −0.0045 and s₂ = 0.0045. Each panel plots the mutual information I(S,R) in bits against the scale κ of the additive white Levy noise. The types of Levy noise dL_t are (a) Gaussian, (b) Gaussian with uniformly distributed jumps, and (c) symmetric α-stable noise with α = 1.9 (thick-tailed bell curve with infinite variance [192]). The dashed vertical lines show the total min-max deviations of the mutual information in 100 simulation trials.

The SR effects in Figures 8.7(c) and 8.8(c) again lie outside the scope of Theorem 8.2 because they occur for infinite-variance α-stable noise and because Theorem 8.2 requires noise with finite second moments. Thus the SR effect in spiking neurons is not limited to finite-second-moment Levy noise. Applebaum [13] recently proved it for more general Levy processes that do not require the finite second-moment restriction. This is preliminary evidence of the theoretical and applied significance of the SR forbidden-interval theorems of this chapter.
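The mutual-information curves in Figures 8.7 and 8.8 come from Monte Carlo simulation of the noisy neuron models. The sketch below is only a minimal illustration of that estimation procedure and is not the dissertation's simulation code: it assumes a hypothetical discretized leaky integrate-and-fire neuron (the simulate_spike_count function, with illustrative parameter values) and estimates I(S;R) in bits from the empirical joint histogram of the Bernoulli input S and the spike-count output R across a sweep of the noise scale κ.

```python
import numpy as np

def simulate_spike_count(s, kappa, n_steps=1000, dt=0.001, a=1.0, theta=0.02, rng=None):
    """Hypothetical Euler discretization of a leaky integrate-and-fire neuron
    dX = (-a*X + s) dt + kappa dB driven by a constant subthreshold input s.
    Returns the number of threshold crossings (spikes) with reset to 0."""
    rng = np.random.default_rng() if rng is None else rng
    x, spikes = 0.0, 0
    for _ in range(n_steps):
        x += (-a * x + s) * dt + kappa * np.sqrt(dt) * rng.standard_normal()
        if x > theta:          # spike and reset
            spikes += 1
            x = 0.0
    return spikes

def mutual_information_bits(samples_s, samples_r, n_bins=20):
    """Plug-in estimate of I(S;R) in bits from paired samples of the two-symbol
    input S and the spike-count output R."""
    s_vals = np.unique(samples_s)
    r_edges = np.histogram_bin_edges(samples_r, bins=n_bins)
    joint = np.zeros((len(s_vals), n_bins))
    for i, s in enumerate(s_vals):
        joint[i], _ = np.histogram(samples_r[samples_s == s], bins=r_edges)
    joint /= joint.sum()
    p_s = joint.sum(axis=1, keepdims=True)   # marginal P_S
    p_r = joint.sum(axis=0, keepdims=True)   # marginal P_R
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (p_s @ p_r)[mask])))

rng = np.random.default_rng(1)
s1, s2 = 0.005, 0.012                       # illustrative subthreshold symbols
for kappa in [0.002, 0.005, 0.01, 0.02]:    # sweep the noise scale
    S = rng.choice([s1, s2], size=300)
    R = np.array([simulate_spike_count(s, kappa, rng=rng) for s in S])
    print(f"kappa = {kappa:.3f}:  I(S;R) estimate = {mutual_information_bits(S, R):.3f} bits")
```

A curve of such estimates over κ traces the rise-then-fall signature of the SR effect when the input stays subthreshold, though the plug-in histogram estimator is biased upward for small sample sizes.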
Adding Levy noise to enhance faint signals could apply to a variety of signal and image processing problems that include low-light imaging in satellites and other sensor devices, night vision, artificial vision and olfaction, neural prosthetics, infrared imaging, impulsive signal detection, and some types of pattern recognition.

8.6 Proofs of Lemmas

The proof of Lemma 8.2 relies on the proof technique of Lemma 8.1 in which we bound a mean-squared term by four additive terms and then show that each of the four terms goes to zero in the limit.

Lemma 8.1: Let bⁱ : ℝ^d → ℝ and cⁱ_j : ℝ^d → ℝ in (8.24)-(8.25) be measurable functions that satisfy the global Lipschitz conditions

|bⁱ(x₁) − bⁱ(x₂)| ≤ K₁ ‖x₁ − x₂‖        (8.73)
|cⁱ_j(x₁) − cⁱ_j(x₂)| ≤ K₂ ‖x₁ − x₂‖        (8.74)

and

|cⁱ_j(x₁)|² ≤ Hⁱ_j        (8.75)

for all x₁, x₂ ∈ ℝ^d and for i = 1, ..., d and j = 1, ..., m. Suppose

dX_t = b(X_t) dt + c(X_t) dL_t        (8.76)
dX̂_t = b(X̂_t) dt        (8.77)

where dL_t is a Levy noise with drift μ = 0 and finite second moments. Then for every T ∈ ℝ₊ and for every ε > 0:

E[ sup_{0≤t≤T} ‖X_t − X̂_t‖² ] → 0  as σ_j → 0 and κ_j → 0        (8.78)

for all j = 1, ..., m, and hence

P[ sup_{0≤t≤T} ‖X_t − X̂_t‖² > ε ] → 0  as σ_j → 0 and κ_j → 0        (8.79)

for all j = 1, ..., m since mean-square convergence implies convergence in probability.

Proof: The Lipschitz conditions (8.73) and (8.74) ensure that the process X_t exists [12] for t ≥ 0 in (8.76). Then the proof commences with the inequality

sup_{0≤t≤T} ‖X_t − X̂_t‖² ≤ Σ_{i=1}^d sup_{0≤t≤T} (Xⁱ_t − X̂ⁱ_t)²        (8.80)

which implies that

E[ sup_{0≤t≤T} ‖X_t − X̂_t‖² ] ≤ Σ_{i=1}^d E[ sup_{0≤t≤T} (Xⁱ_t − X̂ⁱ_t)² ].        (8.81)

Equations (8.27) and (8.76)-(8.77) imply

Xⁱ_t − X̂ⁱ_t = ∫₀ᵗ [bⁱ(X_s) − bⁱ(X̂_s)] ds + Σ_{j=1}^m ∫₀ᵗ σⁱ_j(X_s) dB^j_s
  + Σ_{j=1}^m ∫₀ᵗ ∫_{|y_j|<1} Fⁱ_j(X_s, y_j) Ñ_j(ds, dy_j)
  + Σ_{j=1}^m ∫₀ᵗ ∫_{|y_j|≥1} Gⁱ_j(X_s, y_j) N_j(ds, dy_j).        (8.82)

This gives an upper bound on the squared difference as

(Xⁱ_t − X̂ⁱ_t)² ≤ (3m + 1) ( { ∫₀ᵗ [bⁱ(X_s) − bⁱ(X̂_s)] ds }² + Σ_{j=1}^m { ∫₀ᵗ σⁱ_j(X_s) dB^j_s }²
  + Σ_{j=1}^m { ∫₀ᵗ ∫_{|y_j|<1} Fⁱ_j(X_s, y_j) Ñ_j(ds, dy_j) }²
  + Σ_{j=1}^m { ∫₀ᵗ ∫_{|y_j|≥1} Gⁱ_j(X_s, y_j) N_j(ds, dy_j) }² )        (8.83)

because (u₁ + ... + u_n)² ≤ n(u₁² + ... + u_n²). The Cauchy-Schwarz inequality gives

{ ∫₀ᵗ [bⁱ(X_s) − bⁱ(X̂_s)] ds }² ≤ { ∫₀ᵗ ds } { ∫₀ᵗ [bⁱ(X_s) − bⁱ(X̂_s)]² ds }.        (8.84)

Now put (8.84) in the first term of (8.83) and then take expectations of the supremum on both sides to get four additive terms as an upper bound:

E[ sup_{0≤t≤T} (Xⁱ_t − X̂ⁱ_t)² ] ≤ (3m + 1) ( E[ sup_{0≤t≤T} t ∫₀ᵗ [bⁱ(X_s) − bⁱ(X̂_s)]² ds ]
  + Σ_{j=1}^m E[ sup_{0≤t≤T} { ∫₀ᵗ σⁱ_j(X_s) dB^j_s }² ]
  + Σ_{j=1}^m E[ sup_{0≤t≤T} { ∫₀ᵗ ∫_{|y_j|<1} Fⁱ_j(X_s, y_j) Ñ_j(ds, dy_j) }² ]
  + Σ_{j=1}^m E[ sup_{0≤t≤T} { ∫₀ᵗ ∫_{|y_j|≥1} Gⁱ_j(X_s, y_j) N_j(ds, dy_j) }² ] ).        (8.85)

We next show that each of the four terms goes to zero. Consider the first term on the right-hand side of (8.85):

E[ sup_{0≤t≤T} t ∫₀ᵗ [bⁱ(X_s) − bⁱ(X̂_s)]² ds ]
  ≤ T E[ sup_{0≤t≤T} ∫₀ᵗ [bⁱ(X_s) − bⁱ(X̂_s)]² ds ]
  ≤ T K₁² E[ sup_{0≤t≤T} ∫₀ᵗ ‖X_s − X̂_s‖² ds ]  by the Lipschitz condition (8.73)
  ≤ T K₁² ∫₀ᵀ E[ sup_{0≤u≤s} ‖X_u − X̂_u‖² ] ds.        (8.86)

The second term satisfies

E[ sup_{0≤t≤T} { ∫₀ᵗ σⁱ_j(X_s) dB^j_s }² ] ≤ 4 E[ { ∫₀ᵀ σⁱ_j(X_s) dB^j_s }² ]

because ∫₀ᵗ σⁱ_j(X_s) dB^j_s is a martingale and so we can apply Doob's Lᵖ inequality [169]: E[ sup_{a≤t≤b} |U_t|ᵖ ] ≤ (p/(p−1))ᵖ E|U_b|ᵖ if {U_t}_{t≥0} is a real-valued martingale, [a, b] is a bounded interval of ℝ₊, U_t ∈ Lᵖ(Ω; ℝ), and if p > 1 (p = 2 in our case).
But

4 E[ { ∫₀ᵀ σⁱ_j(X_s) dB^j_s }² ] = 4 ∫₀ᵀ E[ {σⁱ_j(X_s)}² ] ds        (8.87)

by Itô isometry [12]: E{ ∫₀ᵀ f(s,ω) dB_s }² = ∫₀ᵀ E(|f(s,ω)|²) ds if f ∈ H²([0,T]) where H²([0,T]) is the space of all real-valued measurable {F_t}-adapted processes such that E[ ∫₀ᵀ |f(s,ω)|² ds ] < ∞. Then

4 ∫₀ᵀ E[ {σⁱ_j(X_s)}² ] ds ≤ 4 (σ_j)² ∫₀ᵀ E[ {cⁱ_j(X_s)}² ] ds        (8.88)
    by definition of σⁱ_j(X_s)
  ≤ 4 (σ_j)² T Hⁱ_j  because |cⁱ_j|² ≤ Hⁱ_j.        (8.89)

Note that

E[ sup_{0≤t≤T} { ∫₀ᵗ ∫_{|y_j|<1} Fⁱ_j(X_s, y_j) Ñ_j(ds, dy_j) }² ]
  ≤ 4 E[ { ∫₀ᵀ ∫_{|y_j|<1} Fⁱ_j(X_s, y_j) Ñ_j(ds, dy_j) }² ]  by Doob's Lᵖ inequality
  = 4 E[ { ∫₀ᵀ ∫_{|y_j|<1} cⁱ_j(X_s) y_j Ñ_j(ds, dy_j) }² ]  by definition of Fⁱ_j(X_s, y_j)
  ≤ 4 Hⁱ_j E[ { ∫₀ᵀ ∫_{|y_j|<1} y_j Ñ_j(ds, dy_j) }² ]  because |cⁱ_j|² ≤ Hⁱ_j
  ≤ 4 Hⁱ_j E[ { ∫_{|y_j|<1} y_j Ñ_j(T, dy_j) }² ]
  = 4 Hⁱ_j T ∫_{|y_j|<1} |y_j|² ν_j(dy_j)        (8.90)

by Itô isometry and (8.21). Similar arguments and (8.19) give

E[ sup_{0≤t≤T} { ∫₀ᵗ ∫_{|y_j|≥1} Gⁱ_j(X_s, y_j) N_j(ds, dy_j) }² ] ≤ 4 Hⁱ_j T ∫_{|y_j|≥1} |y_j|² ν_j(dy_j).        (8.91)

Substituting the above estimates (8.86), (8.89), (8.90), and (8.91) in inequality (8.85) gives

E[ sup_{0≤t≤T} (Xⁱ_t − X̂ⁱ_t)² ] ≤ (3m + 1) ( T K₁² ∫₀ᵀ E[ sup_{0≤u≤s} ‖X_u − X̂_u‖² ] ds
  + Σ_{j=1}^m 4 T Hⁱ_j [ (σ_j)² + ∫_ℝ |y_j|² ν_j(dy_j) ] ).        (8.92)

Inequalities (8.81) and (8.92) imply that we can write

z(T) ≤ A + Q ∫₀ᵀ z(s) ds        (8.93)

where

z(T) = E[ sup_{0≤t≤T} ‖X_t − X̂_t‖² ],
A = Σ_{i=1}^d Σ_{j=1}^m (3m + 1) 4 T Hⁱ_j [ (σ_j)² + ∫_ℝ |y_j|² ν_j(dy_j) ],

and Q = (3m + 1) d T K₁². Then we get z(T) ≤ A e^{QT} by Gronwall's inequality [71]: φ(t) ≤ α e^{βt} for all t ∈ [0, T] and for real continuous φ(t) in [0, T] such that φ(t) ≤ α + β ∫₀ᵗ φ(τ) dτ where t ∈ [0, T] and β > 0. Note that A → 0 as σ_j → 0 and κ_j → 0. Hence

E[ sup_{0≤t≤T} ‖X_t − X̂_t‖² ] → 0  as σ_j → 0 and κ_j → 0        (8.94)

for each j = 1, ..., m. This implies the claim (8.79).

Lemma 8.2: Let bⁱ : ℝ^d → ℝ and cⁱ_j : ℝ^d → ℝ in (8.24)-(8.25) ((8.50)-(8.53) for spiking neuron models) be measurable functions that satisfy the respective local and global Lipschitz conditions

|bⁱ(z) − bⁱ(y)| ≤ C_n ‖z − y‖  when ‖z‖ ≤ n and ‖y‖ ≤ n,        (8.95)
|cⁱ_j(z) − cⁱ_j(y)| ≤ K₁ ‖z − y‖        (8.96)
|cⁱ_j(z)|² ≤ Hⁱ_j        (8.97)

for all z and y ∈ ℝ^d, and for i = 1, ..., d and j = 1, ..., m. Suppose

dX_t = b(X_t) dt + c(X_t) dL_t        (8.98)
dX̂_t = b(X̂_t) dt        (8.99)

where dL_t is a Levy noise with drift μ = 0 and finite second moments. Then for every T ∈ ℝ₊ and for every ε > 0:

E[ sup_{0≤t≤T} ‖X_t − X̂_t‖² ] → 0  as σ_j → 0 and κ_j → 0        (8.100)

for all j = 1, ..., m, and hence

P[ sup_{0≤t≤T} ‖X_t − X̂_t‖² > ε ] → 0  as σ_j → 0 and κ_j → 0        (8.101)

for all j = 1, ..., m since mean-square convergence implies convergence in probability.

Proof: First define the function b̃ⁱ_r such that

(i) b̃ⁱ_r(x) = bⁱ(x) for ‖x‖ ≤ r,
(ii) b̃ⁱ_r(x) = 0 for ‖x‖ ≥ 2r,
(iii) b̃ⁱ_r(x) = ((2r − ‖x‖)/r) bⁱ(rx/‖x‖) for r ≤ ‖x‖ ≤ 2r.

We then show that the function b̃ⁱ_r is globally Lipschitz: |b̃ⁱ_r(x) − b̃ⁱ_r(y)| ≤ C′_r ‖x − y‖ for all x, y ∈ ℝ^d. Consider the function b̃ⁱ_r(x). Write

b̃ⁱ_r(x) = bⁱ(x) if ‖x‖ ≤ r,  and  b̃ⁱ_r(x) = f(x) gⁱ(x) if r ≤ ‖x‖ ≤ 2r        (8.102)

where

f(x) = (2r − ‖x‖)/r and        (8.103)
gⁱ(x) = bⁱ(rx/‖x‖).

The definition of b̃_r implies that it is Lipschitz continuous on the region D₁ = {‖x‖ ≤ r}:

|b̃ⁱ_r(x) − b̃ⁱ_r(y)| ≤ C_r ‖x − y‖  for all x, y ∈ D₁.        (8.104)

We first show that b̃_r(x) is Lipschitz continuous on the region D₂ = {r ≤ ‖x‖ ≤ 2r}.
For x, y ∈ D₂ = {r ≤ ‖x‖ ≤ 2r}:

|f(x) − f(y)| = |‖y‖ − ‖x‖| / r  by definition of f        (8.105)
  ≤ ‖x − y‖ / r

and

|gⁱ(x) − gⁱ(y)| = |bⁱ(rx/‖x‖) − bⁱ(ry/‖y‖)|  by definition of gⁱ        (8.106)
  ≤ C_r ‖ rx/‖x‖ − ry/‖y‖ ‖        (8.107)
    because rs/‖s‖ ∈ D₁ for all s ∈ ℝ^d and bⁱ is Lipschitz continuous on D₁
  ≤ 2 C_r ‖x − y‖        (8.108)
    because r ≤ ‖x‖, ‖y‖ ≤ 2r.

Hence

|b̃ⁱ_r(x) − b̃ⁱ_r(y)| = |f(x)gⁱ(x) − f(y)gⁱ(y)|        (8.109)
  ≤ |f(x)gⁱ(x) − f(z)gⁱ(z)| + |f(z)gⁱ(z) − f(y)gⁱ(y)|        (8.110)
  = |f(x)gⁱ(x) − f(x)gⁱ(y)| + |f(x)gⁱ(y) − f(y)gⁱ(y)|        (8.111)
    by choosing z on the line segment between 0 and y such that ‖z‖ = ‖x‖
  = |f(x)| |gⁱ(x) − gⁱ(y)| + |gⁱ(y)| |f(x) − f(y)|        (8.112)
  ≤ ‖f‖_{∞,2} |gⁱ(x) − gⁱ(y)| + ‖g‖_{∞,2} |f(x) − f(y)|        (8.113)
    where we define ‖v‖_{∞,i} = sup{ ‖v(s)‖ : s ∈ D_i }
  ≤ 2 ‖f‖_{∞,2} C_r ‖x − y‖ + ‖g‖_{∞,2} ‖x − y‖ / r        (8.114)
  ≤ C′_r ‖x − y‖  where C′_r = 2 ‖f‖_{∞,2} C_r + ‖g‖_{∞,2} / r.        (8.115)

So b̃_r(x) is Lipschitz continuous on D₂. We next show that b̃_r(x) is Lipschitz continuous on D₁ and D₂. Choose x ∈ D₁, y ∈ D₂, and a point z of ∂D₁ on the line segment between x and y. Then

|b̃ⁱ_r(x) − b̃ⁱ_r(y)| ≤ |b̃ⁱ_r(x) − b̃ⁱ_r(z)| + |b̃ⁱ_r(z) − b̃ⁱ_r(y)|        (8.116)
  ≤ C_r ‖x − z‖ + C′_r ‖z − y‖        (8.117)
  ≤ C′_r ‖x − y‖        (8.118)

because C′_r ≥ C_r and ‖x − z‖ + ‖z − y‖ = ‖x − y‖. So b̃_r(x) is Lipschitz continuous with coefficient C′_r on ‖x‖ ≤ 2r. We now choose x ∈ (D₁ ∪ D₂), y ∈ (D₁ ∪ D₂)ᶜ, and a point z of ∂(D₁ ∪ D₂)ᶜ on the line segment between x and y. Then

|b̃ⁱ_r(x) − b̃ⁱ_r(y)| ≤ |b̃ⁱ_r(x) − b̃ⁱ_r(z)| + |b̃ⁱ_r(z) − b̃ⁱ_r(y)|        (8.119)
  ≤ C′_r ‖x − z‖ + 0        (8.120)
  ≤ C′_r ‖x − z‖ + C′_r ‖z − y‖        (8.121)
  = C′_r ‖x − y‖.        (8.122)

Then (8.104), (8.115), (8.118), and (8.122) show that b̃ⁱ_r(x) is Lipschitz continuous with coefficient C′_r on ℝ^d.

Consider next the SDE

dX̃_t = b̃_r(X̃_t) dt + c(X̃_t) dL_t.        (8.123)

Lemma 8.1 holds for (8.123) and so we can write

E[ sup_{0≤t≤T̃_r} ‖X̃_t − X̂_t‖² ] → 0  as σ_j → 0 and κ_j → 0        (8.124)

for all j = 1, ..., m where T̃_r = inf{ t ≥ 0 : ‖X̃_t‖ ≥ r } and we choose r such that ‖X̂_t‖ < r for all t. Now define T_r = inf{ t ≥ 0 : ‖X_t‖ ≥ r } and τ_r = inf{ t : ‖X_t‖ ≥ r or ‖X̃_t‖ ≥ r } = min{ T̃_r, T_r }. Then X_t and X̃_t satisfy (8.98) on [0, τ_r]. Note that T̃_r and T_r are stopping times and thus τ_r is also a stopping time. So arguments similar to those of the proof of Lemma 8.1 ((8.81)-(8.91) with appropriate modifications) give

E[ sup_{0≤u≤min{t,τ_r}} ‖X_u − X̃_u‖² ] ≤ Q′ ∫₀ᵗ E[ sup_{0≤u≤min{s,τ_r}} ‖X_u − X̃_u‖² ] ds.        (8.125)

Then

E[ sup_{0≤u≤min{t,τ_r}} ‖X_u − X̃_u‖² ] ≤ 0        (8.126)

by Gronwall's inequality. Hence X_t = X̃_t holds almost surely on [0, τ_r]. This result and (8.124) give

E[ sup_{0≤t≤τ_r} ‖X_t − X̂_t‖² ] → 0  as σ_j → 0 and κ_j → 0  for all j = 1, ..., m.        (8.127)

We need now show only that τ_r → ∞ almost surely as r → ∞ to prove (8.101). Let X_{min{T,τ_r}} be the value of the process X_t at time min{T, τ_r}. Note first that

‖X_{min{T,τ_r}}‖² = I_{{τ_r > T}} ‖X_T‖² + I_{{τ_r ≤ T}} ‖X_{τ_r}‖².

Then

E[ ‖X_{min{T,τ_r}}‖² ] = E[ I_{{τ_r > T}} ‖X_T‖² ] + E[ I_{{τ_r ≤ T}} ‖X_{τ_r}‖² ]
  = E[ I_{{τ_r > T}} ‖X_T‖² ] + P(τ_r ≤ T) r²  because ‖X_{τ_r}‖ = r.

Therefore

P(τ_r ≤ T) ≤ E[ ‖X_{min{T,τ_r}}‖² ] / r².        (8.128)

Applying Itô's lemma [12] to ‖X_{min{T,τ_r}}‖² gives

E[ ‖X_{min{T,τ_r}}‖² ] ≤ A″ + Q″ ∫₀ᵀ E[ ‖X_{min{s,τ_r}}‖² ] ds.        (8.129)

Thus

E[ ‖X_{min{T,τ_r}}‖² ] ≤ A″ e^{Q″T}        (8.130)

by Gronwall's inequality where A″ and Q″ do not depend on r because we do not use the Lipschitz condition in the derivation of (8.129). Then (8.128) and (8.130) imply that

P(τ_r ≤ T) ≤ E[ ‖X_{min{T,τ_r}}‖² ] / r² → 0  as r → ∞.        (8.131)

Thus τ_r → ∞ almost surely as r → ∞. This implies the claim (8.100). Q.E.D.
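A quick numerical check of the mean-square convergence in Lemmas 8.1 and 8.2 is possible for a scalar jump-diffusion. The sketch below is only an illustration under assumed dynamics (globally Lipschitz drift b(x) = −x, bounded diffusion coefficient c(x) = 1, and a compound-Poisson jump part standing in for the large-jump component of the Levy noise); it estimates E[ sup_{0≤t≤T} |X_t − X̂_t|² ] by Euler-Maruyama simulation as the Gaussian scale σ and the jump scale κ shrink together.

```python
import numpy as np

def mean_sup_sq_deviation(sigma, kappa, T=1.0, dt=1e-3, jump_rate=5.0,
                          n_paths=2000, seed=0):
    """Monte Carlo estimate of E[ sup_{0<=t<=T} |X_t - Xhat_t|^2 ] for
        dX_t    = -X_t dt + sigma dB_t + kappa dJ_t   (J_t: compound Poisson,
                                                        standard normal jumps)
        dXhat_t = -Xhat_t dt
    with X_0 = Xhat_0 = 1, using an Euler-Maruyama scheme."""
    rng = np.random.default_rng(seed)
    n_steps = int(T / dt)
    x = np.ones(n_paths)
    xhat = np.ones(n_paths)
    sup_sq = np.zeros(n_paths)
    for _ in range(n_steps):
        dB = np.sqrt(dt) * rng.standard_normal(n_paths)
        n_jumps = rng.poisson(jump_rate * dt, size=n_paths)
        dJ = np.sqrt(n_jumps) * rng.standard_normal(n_paths)  # sum of N(0,1) jumps
        x += -x * dt + sigma * dB + kappa * dJ
        xhat += -xhat * dt
        sup_sq = np.maximum(sup_sq, (x - xhat) ** 2)
    return sup_sq.mean()

for scale in [0.4, 0.2, 0.1, 0.05, 0.025]:
    est = mean_sup_sq_deviation(sigma=scale, kappa=scale)
    print(f"sigma = kappa = {scale:6.3f}:  E[sup |X - Xhat|^2] estimate = {est:.5f}")
```

The estimates shrink roughly like the squared noise scales, which mirrors the vanishing Gronwall bound A e^{QT} in (8.93)-(8.94) and the convergence that drives the forbidden-interval limit in Theorem 8.2.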
Chapter 9
Future Work

The results of this dissertation should extend in at least four major directions:

1. Extension of Forbidden Intervals and other SR conditions: The forbidden interval conditions should extend to signal detection systems that use multi-dimensional signals or multiple samples. The SR conditions of Chapter 4 should likewise extend to multiple-threshold detectors. More general forbidden intervals or SR conditions should exist for multiplicative noise and for other performance measures such as cross-correlation and for information-theoretic divergence measures [158, 200] other than mutual information. The present forbidden interval theorems hold only for binary discrete subthreshold signals. Some form of forbidden interval theorems should also hold for continuous binary suprathreshold signals. It is an open research question whether the SR results of Chapters 7 and 8 extend to directed information [143, 172, 213]. Directed information I(X^N → Y^N) is a direction-of-causality extension of mutual information between random sequences X^N and Y^N for channels with memory and feedback:

I(X^N → Y^N) = Σ_{n=1}^N I(X^n; Y_n | Y^{n−1})        (9.1)

where X^n = (X₁, ..., X_n). Some form of noise should also benefit nonlinear feedback communication and signal processing systems.

2. New Adaptation Laws: The SR condition of Chapter 5 should give simple stochastic learning laws that can learn the optimal quantizer-noise intensity for quantizer-array detectors and deterministic limiting-array detectors. This condition uses only the mean and variance of the test statistics and their first derivatives. So the learning laws should use fewer data samples and converge fairly quickly to the optimal noise intensity. The general optimization structure of Chapter 3 should lead to fast adaptive noise-finding algorithms. These algorithms might exploit the geometric properties of the payoff functions and constraints.

3. Optimal Noise: An open research question is how to find the optimal way to inject noise in a given system when it exhibits some noise benefit. Chapter 3 gives the form of optimal noise that applies to many types of noise-injection methods such as using additive or multiplicative noise in inequality-constrained optimization problems. But we do not yet know which noise-injection method gives the optimal performance for most systems. The optimal noise may also depend on the choice of performance measure.

4. Jump Processes: SR results with Levy noise should also extend to many other more general independent-increment processes. These processes have a wide range of applications in finance, physics, and biology such as pricing and hedging, quantum stochastic calculus, and epidemiology.

References

[1] A. Achim, A. Bezerianos, and P. Tsakalides. Novel bayesian multiscale method for speckle removal in medical ultrasound images. IEEE Transactions on Medical Imaging, 20:772–783, 2001.

[2] R. J. Adler, R. E. Feldman, and M. S. Taqqu, editors. A practical guide to heavy tails: statistical techniques and applications. Birkhauser Boston Inc., Cambridge, MA, USA, 1998.

[3] H. Ahn and R. E. Feldman. Optimal filtering of a gaussian signal in the presence of levy noise. SIAM Journal of Applied Mathematics, 60(1):359–369, November 1998.

[4] A. J. Ahumada and H. A. Peterson. Luminance-model-based DCT quantization for color image compression. In B. E. Rogowitz, editor, Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, volume 1666, pages 365–374, August 1992.

[5] T. Aihara, K.
Kitajo, D. Nozaki, and Y . Yamamoto. Internal noise determines external stochastic resonance in visual perception. Vision Research, 48(14):1569 – 1573, 2008. [6] P. Ala-Laurila, K. Donner, and A. Koskelainen. Thermal activation and photoactivation of visual pigments. Biophysical Journal, 86(6):3653–3662, June 2004. [7] S.-I. Amari. Neural theory of association and concept-formation. pages 269–281, 1988. [8] S. Ambike, J. Ilow, and D. Hatzinakos. Detection for binary transmission in a mixture of gaussian noise and impulsive noise modeled as an alpha-stable process. Signal Processing Letters, IEEE, 1(3):55–57, Mar 1994. [9] P.-O. Amblard, S. Zozor, M. D. McDonnell, and N. G. Stocks. Pooling networks for a dis- crimination task: noise-enhanced detection. In Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, volume 6602, 2007. [10] B. And` o and S. Graziani. Stochastic resonance: theory and applications. Kluwer Aca- demic Publishers, Norwell, MA, USA, 2000. [11] S. Appadwedula, V . V . Veeravalli, , and D. L. Jones. Energy-efficient detection in sensor networks. IEEE Journal on Selected Areas in Communication, 23(4):693–702, April 2005. [12] D. Applebaum. Levy Processes and Stochastic Calculus. Cambridge University Press, July 2004. 227 [13] D. Applebaum. Extending stochastic resonance for neuron models to general levy noise. IEEE Transactions on Neural Networks, to appear 2009. [14] M. Arnold, S. D. Wolthusen, and M. Schmucker. Techniques and Applications of Digital Watermarking and Content Protection. Artech House Publishers, July 2003. [15] P. Auer, H. Burgsteiner, and W. Maass. A learning rule for very simple universal approxi- mators consisting of a single layer of perceptrons. Neural Networks, 21(5):786–795, 2008. [16] M. Azizoglu. Convexity property in binary detection. IEEE Transaction on Information Theory, 42(4):1316–1321, July 1996. [17] O. E. Barndorff-Nielsen and N. Shephard. Non-gaussian ornstein-uhlenbeck-based models and some of their uses in financial economics. Journal Of The Royal Statistical Society Series B, 63(2):167–241, 2001. [18] M. Barni, F. Bartolini, V . Cappellini, and A. Piva. A DCT-domain system for robust image watermarking. Signal Processing, 66(3):357 – 372, 1998. [19] Y . Bazi, L. Bruzzone, and F. Melgani. Image thresholding based on the em algorithm and the generalized gaussian distribution. Pattern Recognition, 40(2):619–634, 2007. [20] V . Beiu, J. M. Quintana, and M. J. Avedillo. VLSI implementations of threshold logic - a comprehensive survey. IEEE Trans. Neural Networks, 14:1217–1243, 2003. [21] A. Benerjee, P. Burlina, and R. Chellappa. Adaptive target detection in foliage-penetrating sar images using alpha -stable models. IEEE Transactions on Image Processing, 8(12):1823–1831, December 1999. [22] N. Benvenuto and G. Cherubini. Algorithms for Communications Systems and their Ap- plications. p. 442, Wiley, 2002. [23] R. Benzi, G. Parisi, A. Sutera, and A. Vulpiani. Stochastic resonance in climatic change. Tellus, 34:10–+, 1982. [24] R. Benzi, A. Sutera, and A. Vulpiani. The mechanism of stochastic resonance,. Journal of Physics A, 14:L453, 1981. [25] S. M. Bezrukov and I. V odyanoy. Stochastic resonance in thermally activated reactions: Application to biological ion channels. Chaos: An Interdisciplinary Journal of Nonlinear Science, 8(3):557–566, 1998. [26] V . Bhatia and B. Mulgrew. Non-parametric likelihood based channel estimator for gaus- sian mixture noise. Signal Processing, 87:2569–2586, November 2007. [27] K. C. Border. 
Fixed Point Theorems with Applications to Economics and Game Theory. p. 10, Cambridge University Press, 1989. [28] A. C. Bovik. The Essential Guide to Image Processing. Academic Press, June 2009. 228 [29] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah. Randomized gossip algorithms. IEEE Transactions on Information Theory, 52(6):2508–2530, 2006. [30] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004. [31] A. Briassouli, P. Tsakalides, and A. Stouraitis. Hidden messages in heavy-tails: Dct- domain watermark detection using alpha-stable models. IEEE Transactions on Multime- dia, 7(4):700–715, 2005. [32] C. L. Brown. Score functions for locally suboptimum and locally suboptimum rank detec- tion in alpha-stable interference. Proceedings of the 11th IEEE Signal Processing Work- shop on Statistical Signal Processing, SSP2001, pages 58–61, 2001. [33] C. L. Brown and A. M. Zoubir. A nonparametric approach to signal detection in impulsive interference. IEEE Transactions on Signal Processing, 48(9):2665–2669, Sep 2000. [34] P. Bryant, K. Wiesenfeld, and B. McNamara. The nonlinear effects of noise on parametric amplification: An analysis of noise rise in josephson junctions and other systems. Journal of Applied Physics, 62(7):2898–2913, 1987. [35] A. Bulsara, E. W. Jacobs, T. Zhou, F. Moss, and L. Kiss. Stochastic resonance in a single neuron model: Theory and analog simulation. Journal of Theoretical Biology, 152:531– 555, 1991. [36] A. R. Bulsara, T. C. Elston, C. R. Doering, S. B. Lowen, and K. Lindenberg. Cooperative behavior in periodically driven noisy integrate-fire models of neuronal dynamics. Physical Review E, 53(4):3958–3969, Apr 1996. [37] A. R. Bulsara and L. Gammaitoni. Tuning in to noise. Physics Today, 49:39–47, March 1996. [38] A. R. Bulsara, A. J. Maren, and G. Schmera. Single effective neuron: Dendritic coupling effect and stochastic resonance. Biological Cybernatics, 70(2):145–156, December 1993. [39] A. R. Bulsara and A. Zador. Threshold detection of wideband signals: A noise-induced maximum in the mutual information. Physical Review E, 54(3):R2185–R2188, Sep 1996. [40] D. A. Burkhardt, J. Gottesman, D. Kersten, and G. E. Legge. Symmetry and constancy in the perception of negative and positive luminance contrast. J. Opt. Soc. Am. A, 1(3):309– 316, 1984. [41] G. Casella and R. Berger. Statistical Inference. Duxbury Resource Center, June 2001. [42] N. Caticha, J. E. Palo Tejada, D. Lancet, and E. Domany. Computational capacity of an odorant discriminator: the linear separability of curves. Neural Computation, 14(9):2201– 2220, 2002. [43] D. Chander and E. J. Chichilnisky. Adaptation to Temporal Contrast in Primate and Sala- mander Retina. J. Neurosci., 21(24):9904–9916, 2001. 229 [44] F. Chapeau-Blondeau, S. Blanchard, and D. Rousseau. Fisher information and noise- aided power estimation from one-bit quantizers. Digital Signal Processing, 18(3):434– 443, 2008. [45] F. Chapeau-Blondeau and D. Rousseau. Noise improvements in stochastic resonance: From signal amplification to optimal detection. Fluctuation and Noise Letters, 2(3):L221– L233, September 2002. [46] F. Chapeau-Blondeau and D. Rousseau. Noise-enhanced performance for an optimal bayesian estimator. IEEE Transactions on Signal Processing, 52(5):1327–1334, May 2004. [47] F. Chapeau-Blondeau and D. Rousseau. Constructive action of additive noise in optimal detection. International Journal of Bifurcation & Chaos, 15(9):2985–2994, September 2005. [48] M. Chatterjee and S. I. Oba. 
Noise improves modulation detection by cochlear implant listeners at moderate carrier levels. Journal of Acoustic society of America, 118(1):993– 1002, August 2005. [49] M. Chatterjee and M. E. Robert. Noise enhances modulation sensitivity in cochlear im- plant listeners: Stochastic resonance in a prosthetic sensory system? Journal of the Asso- ciation for Research in Otolaryngology, 2(2):159–171, June 2001. [50] H. Chen, P. K. Varshney, S. M. Kay, and J. H. Michels. Theory of stochastic resonance ef- fects in signal detection: Part I - fixed detectors. IEEE Transactions on Signal Processing, 55(7):3172–3184, July 2007. [51] Q. Cheng and T. S. Huang. An additive approach to transform-domain information hid- ing and optimum detection structure. IEEE Transactions on Multimedia, 3(3):273–284, September 2001. [52] D. R. Chialvo, A. Longtin, and J. M¨ uller-Gerking. Stochastic resonance in models of neuronal ensembles. Physical Review E, 55(2):1798–1808, Feb 1997. [53] Y . Ching and L. Kurz. Nonparametric detectors based on m-interval partitioning. IEEE Transactions on Information Theory, 18(2):251–257, March 1972. [54] A. M. Cohen and S. Grossberg. Absolute stability of global pattern formation and parallel memory storage by competitive neural networks. IEEE Transactions on System, Man, and Cybernetics, 13(5):815–826, September 1983. [55] J. J. Collins, C. C. Chow, A. C. Capela, and T. T. Imhoff. Aperiodic stochastic resonance. Physical Review E, 54(5):5575–5584, Nov 1996. [56] J. J. Collins, C. C. Chow, and T. T. Imhoff. Aperiodic stochastic resonance in excitable systems. Physical Review E, 52(4):R3321–R3324, Oct 1995. [57] J. J. Collins, C. C. Chow, and T. T. Imhoff. Stochastic resonance without tuning. Nature, 376(6537):236–238, July 1995. 230 [58] J. J. Collins and T. T. Imhoff. Noise-enhanced tactile sensation. Nature, 383:770–+, October 1996. [59] J. J. Collins, T. T. Imhoff, and P. Grigg. Noise-enhanced information transmission in rat SA1 cutaneous mechanoreceptors via aperiodic stochastic resonance. J Neurophysiol, 76(1):642–645, 1996. [60] R. Cont and P. Tankov. Financial modelling with jump processes. Chapman Hall/CRC, Boca Raton, Florida, 2003. [61] T. M. Cover and J. A. Thomas. Elements of Information Theory (Wiley Series in Telecom- munications and Signal Processing). Wiley-Interscience, 2006. [62] I. J. Cox, M. L. Miller, and J. A. Bloom. Digital Watermarking: Principles & Practice. Morgan Kaufmann, 2001. [63] P. Cuff. Communication requirements for generating correlated random variables. In Information Theory, 2008. ISIT 2008. IEEE International Symposium on, pages 1393– 1397, July 2008. [64] G. Deco and B. Sch¨ urmann. Stochastic resonance in the mutual information between input and output spike trains of noisy central neurons. Phys. D, 117(1-4):276–282, 1998. [65] M. N. Do and M. Vetterli. Wavelet-based texture retrieval using generalized gaussian den- sity and kullback-leibler distance. IEEE Transactions on Image Processing, 11(2):146– 158, Feb 2002. [66] K. Dong and N. Makri. Quantum stochastic resonance in the strong-field limit. Phys. Rev. A, 70(4):042101, Oct 2004. [67] J. K. Douglass, L. Wilkens, E. Pantazelou, and F. Moss. Noise enhancement of informa- tion transfer in crayfish mechanoreceptors by stochastic resonance. Nature, 365:337–340, September 1993. [68] F. Duan, D. Abbott, and F. Chapeau-Blondeau. The application of saturating detectors to a dct-domain watermarking scheme. Fluctuation and Noise Letters, 8(1):L65–L79, 2008. [69] F. Duan, D. Rousseau, and F. 
Chapeau-Blondeau. Residual aperiodic stochastic resonance in a bistable dynamic system transmitting a suprathreshold binary signal. Physical Review E, 69(1):011109, January 2004. [70] F. Duan and B. Xu. Parameter-induced stochastic resonance and baseband binary PAM signals transmission over an AWGN channel. I. J. Bifurcation and Chaos, 13(2):411–425, 2003. [71] R. Durrett. Stochastic Calculus: A Practical Introduction. CRC Press, 1996. [72] R. Durrett. Probability: Theory and Examples. Duxbury, 3 edition, 2004. 231 [73] M. Epstein, L. Hars, R. Krasinski, M. Rosner, and H. Zheng. Design and implementation of a true random number generator based on digital circuit artifacts. Lecture Notes in Computer Science, 2779:152–165, 2003. [74] J. B. Fallon, R. W. Carr, and D. L. Morgan. Stochastic resonance in muscle receptors. Journal of Neurophysiology, 91(6):2429–2436, Jun 2004. [75] S. Fauve and F. Heslot. Stochastic resonance in a bistable system. Physics Letters A, 97:5–7, August 1983. [76] R. Fitzhugh. Impulses and physiological states in theoretical models of nerve membrane. Biophysical Journal, 1(6):445–466, July 1961. [77] G. B. Folland. Real Analysis: Modern Techniques and Their Applications. Wiley- Interscience, second edition, 1999. [78] R. F. Fox. Stochastic resonance in a double well. Phys. Rev. A, 39(8):4148–4153, Apr 1989. [79] M. A. Freed. Rate of quantal excitation to a retinal ganglion cell evoked by sensory input. Journal of Neurophysiology, 83:2956–2966, 2000. [80] J. A. Freund, L. Schimansky-Geier, B. Beisner, A. Neiman, D. F. Russel, T. Yakusheva, and F. Moss. Behavioral stochastic resonance: How the noise from a daphnia swarm enhances individual prey capture by juvenile paddlefish. Journal of Theoretical Biology, 214(1):71 – 83, 2002. [81] Y . Freund and R. E. Schapire. Large margin classification using the perceptron algorithm. volume 37, pages 277–296, 1999. [82] B. Furht and D. Kirovski. Multimedia Security Handbook. Boca Raton, FL, 2005. [83] L. Gammaitoni, P. H¨ anggi, P. Jung, and F. Marchesoni. Stochastic resonance. Rev. Mod. Phys., 70(1):223–287, Jan 1998. [84] L. Gammaitoni, F. Marchesoni, E. Menichella-Saetta, and S. Santucci. Stochastic reso- nance in bistable systems. Physical Review Letters, 62(4):349–352, Jan 1989. [85] L. Gammaitoni, E. Menichella-Saetta, S. Santucci, F. Marchesoni, and C. Presilla. Period- ically time-modulated bistable systems: Stochastic resonance. Phys. Rev. A, 40(4):2114– 2119, Aug 1989. [86] S. Gazor and Wei Zhang. Speech probability distribution. Signal Processing Letters, IEEE, 10(7):204–207, July 2003. [87] P. G. Georgiou, P. Tsakalides, and C. Kyriakakis. Alpha-stable modeling of noise and robust time-delay estimation in the presence of impulsive noise. IEEE Transactions on Multimedia, 1:291–301, 1999. [88] W. Gerstner and W. M. Kistler. Spiking Neuron Models. Cambridge University Press, August 2002. 232 [89] J. D. Gibson and J. L. Melsa. Introduction to nonparametric detection with applications. IEEE Press, 1996. [90] Z. Gingl, R. Vajtai, and L. B. Kiss. Signal-to-noise ratio gain by stochastic resonance in a bistable system. Chaos, Solitons & Fractals, 11(12):1929 – 1932, 2000. [91] M. T. Giraudo and L. Sacerdote. Jump-diffusion processes as models for neuronal activity. Biosystems, 40(1-2):75 – 82, 1997. Neuronal Coding. [92] M. T. Giraudo, L. Sacerdote, and R. Sirovich. Effects of random jumps on a very simple neuronal diffusion model. Biosystems, 67(1-3):75 – 83, 2002. [93] B. J. Gluckman, T. I. Netoff, E. J. Neel, W. L. Ditto, M. 
L. Spano, and S. J. Schiff. Stochas- tic resonance in a neuronal network from mammalian brain. Physical Review Letters, 77(19):4098–4101, Nov 1996. [94] Y . Gong, Z. Ding, and C. F. N. Cowan. MMSE turbo equalizer for channels with cochan- nel interference. IEEE International Conference on Acoustics, Speech and Signal Process- ing,ICASSP 2006, 4:437–440, May 2006. [95] P. F. Gora. The logistic equation and a linear stochastic resonance. Acta Physica Polonica B, 35(5):1583–1595, 2004. [96] M. Grifoni and P. H¨ anggi. Coherent and incoherent quantum stochastic resonance. Physi- cal Review Letters, 76(10):1611–1614, Mar 1996. [97] M. Grigoriu. Applied Non-Gaussian Processes. Prentice Hall, 1995. [98] M. Grossglauser and D. Tse. Mobility increases the capacity of ad-hoc wireless networks. In INFOCOM 2001. Twentieth Annual Joint Conference of the IEEE Computer and Com- munications Societies. Proceedings. IEEE, volume 3, pages 1360–1369 vol.3, 2001. [99] B. S. Gutkin and G. B. Ermentrout. Dynamics of membrane excitability determine inter- spike interval variability: a link between spike generation mechanisms and cortical spike train statistics. Neural Computation, 10(5):1047–1065, 1998. [100] P. H¨ anggi. Stochastic resonance in biology: How noise can enhance detection of weak signals and help improve biological information processing. ChemPhysChem, 3:285–290, 2002. [101] A. Hanssen and T.A. Oigard. The normal inverse gaussian distribution: a versatile model for heavy-tailed stochastic processes. In Acoustics, Speech, and Signal Processing, 2001. Proceedings. (ICASSP ’01). 2001 IEEE International Conference on, volume 6, pages 3985–3988, 2001. [102] G. P. Harmer, B. R. Davis, and D. Abbott. A review of stochastic resonance: Circuits and measurement. IEEE Transactions on Instrumentation and Measurement, 51(2):299–309, April 2002. [103] J.D. Harry, J.B. Niemi, A.A. Priplata, and J.J. Collins. Balancing act [noise based sensory enhancement technology]. Spectrum, IEEE, 42(4):36–41, April 2005. 233 [104] F. Hartung and M. Kutter. Multimedia watermarking techniques. Proceedings of IEEE, 87(7):1079–1107, 1999. [105] B. Hayes. Randomness as a resource. American Scientist, 89(4):300304, 2001. [106] J. R. Hernandez, A. Member, F. Perez-Gonzalez, M. Amado, and O. Prez-gonzlez. Dct- domain watermarking techniques for still images: Detector performance analysis and a new structure. IEEE Trans. on Image Processing, 9:55–68, 2000. [107] A. D. Hibbs, A. L. Singsaas, E. W. Jacobs, A. R. Bulsara, J. J. Bekkedahl, and F. Moss. Stochastic resonance in a superconducting loop with a josephson junction. Journal of Applied Physics, 77(6):2582–2590, 1995. [108] J.-B. Hiriart-Urruty and C. Lemarchal. Fundamentals of Convex Analysis. Springer, 31, 2004. [109] T. Hoch, G. Wenning, and K. Obermayer. Optimal noise-aided signal transmission through populations of neurons. Phys Rev E, 68:011911 – 1–11, 2003. [110] A. L. Hodgkin and A. F. Huxley. A quantitative description of membrane current and its application to conduction and excitation in nerve. Journal of Physiology, 117(1):500–544, January 1952. [111] N. Hohn and A. N. Burkitt. Shot noise in the leaky integrate-and-fire neuron. Physical Review E, 63(3):031902, Feb 2001. [112] J. J. Hopfield. Neural networks with graded response have collective computational properties like those of two-state neurons. Proceeding of National Academy of Science, 81:3088–3092, May 1984. [113] F. C. Hoppensteadt and E. M. Izhikevich. Weakly Connected Neural Networks. Springer, 1997. [114] L. 
Huang and M. Neely. The optimality of two prices: Maximizing revenue in a stochas- tic network. Proc. of 45th Annual Allerton Conference on Communication, Control, and Computing, September 2007. [115] J. Ilow and D. Hatzinakos. Applications of the empirical characteristic function to estima- tion and detection problems. Signal Processing, 65(2):199–219, 1998. [116] M. E. Inchiosa, J. W. C. Robinson, and A. R. Bulsara. Information-theoretic stochastic resonance in noise-floor limited systems: The case for adding noise. Physical Review Letters, 85:3369–3372, October 2000. [117] G. A Jacobson, K. Diba, A. Yaron-Jakoubovitch, Y . Oz, C. Koch, I. Segev, and Y . Yarom. Subthreshold voltage noise of rat neocortical pyramidal neurones. The Journal of Physiol- ogy, 564:145–160, 2005. [118] F. Jaramillo and K. Wiesenfeld. Mechanoelectrical transduction assisted by brownian mo- tion: A role for noise in the auditory system. Nature Neuroscience, 1:384–388, 1998. 234 [119] F. Jaramillo and K. Wiesenfeld. Physiological noise level enhances mechanoelectrical transduction in hair cells. Chaos, Solitons & Fractals, 11(12):1869 – 1874, 2000. [120] D. S. Jones. The theory of generalized functions. Cambridge University Press, 2 edition, 1982. [121] A. Kalemanova, B. Schmid, and R. Werner. The normal inverse gaussian distribution for synthetic cdo pricing. Journal of Financial Derivatives, 14(3):80–93, September 2007. [122] E. R. Kandel, J. H. Schwartz, and T. M. Jessell. Principles of Neuroscience. McGraw-Hill, 4 edition, 2000. [123] G.K. Kanizsa. Subjective contours. In Scientific American, volume 234, 1976. [124] S. A. Kassam. Signal Detection in Non-Gaussian Noise. Springer-Verlag, 1988. [125] I. Kh. Kaufman, D. G. Luchinsky, P. V . E. McClintock, S. M. Soskin, and N. D. Stein. High-frequency stochastic resonance in squids. Physics Letters A, 220(4-5):219 – 223, 1996. [126] S. Kay. Can detectability be improved by adding noise? Signal Processing Letters, IEEE, 7(1):8–10, Jan 2000. [127] S. Kay, J.H. Michels, Hao Chen, and P.K. Varshney. Reducing probability of decision error using stochastic resonance. Signal Processing Letters, IEEE, 13(11):695–698, Nov. 2006. [128] B. J. Kim, M.-S. Choi, P. Minnhagen, G. S. Jeon, H. J. Kim, and M. Y . Choi. Spatiotempo- ral stochastic resonance in fully frustrated josephson ladders. Phys. Rev. B, 63(10):104506, Feb 2001. [129] K. J. Kim and F. Rieke. Temporal Contrast Adaptation in the Input and Output Signals of Salamander Retinal Ganglion Cells. J. Neurosci., 21(1):287–299, 2001. [130] N. Kimura and S. Latifi. A survey on data compression in wireless sensor networks. In ITCC ’05: Proceedings of the International Conference on Information Technology: Cod- ing and Computing (ITCC’05) - Volume II, pages 8–13, Washington, DC, USA, 2005. IEEE Computer Society. [131] J.-Y . Ko, K. Otsuka, and T. Kubota. Quantum-noise-induced order in lasers placed in chaotic oscillation by frequency-shifted feedback. Physical Review Letters, 86(18):4025– 4028, Apr 2001. [132] A. F. Kohn. Dendritic transformations on random synaptic inputs as measured from neu- ron’s spike train–modeling and simulation. IEEE Transactions on Biomedical Engineer- ing, 36(1):44–54, January 1989. [133] M. J. Korenberg and I. W Hunter. The identification of nonlinear biological systems: Lnl cascade models. Biological Cybernetics, 55(2-3):125–134, 1986. 235 [134] T. W. K¨ orner. A Companion to Analysis: A Second First and First Second Course in Analysis. pp. 135-138, American Mathematical Society, 2004. [135] W. 
Korneta, I. Gomes, C. R. Mirasso, and R. Toral. Experimental study of stochastic reso- nance in a chua’s circuit operating in a chaotic regime. Physica D: Nonlinear Phenomena, 219(1):93 – 100, 2006. [136] B. Kosko. Neural Networks and Fuzzy Systems: A Dynamical Systems Approach to Ma- chine Intelligence. Prentice-Hall, Englewood Cliffs, NJ, 1991. [137] B. Kosko. Noise. Viking/Penguin, 2006. [138] B. Kosko, I. Lee, S. Mitaim, A. Patel, and M. M. Wilde. Applications of forbidden in- terval theorems in stochastic resonance. In Applications of Nonlinear Dynamics, V . In, P . Longhini, A. Palacios (Eds.), pages 71–89, 2009. [139] B. Kosko and S. Mitaim. Robust stochastic resonance: Signal detection and adaptation in impulsive noise. Physical Review E, 64(5):051110, Oct 2001. [140] B. Kosko and S. Mitaim. Stochastic resonance in noisy threshold neurons. Neural Net- works, 16(5-6):755–761, 2003. [141] B. Kosko and S. Mitaim. Robust stochastic resonance for simple threshold neurons. Phys- ical Review E, 70(3):031911–1 – 031911–10, Sep 2004. [142] S. Kotz, T. Kozubowski, and K. Podgorsk. The Laplace Distribution and Generalizations. Birkh¨ auser Boston, 2001. [143] G. Kramer. Directed information for channels with feedback. Ph.D. Dissertation, Swiss Federal Institute of Technology (ETH) Zurich, 1998. [144] H. A. Kramers. Brownian motion in a field of force and the diffusion model of chemical reactions. Physica, 7:284–304, April 1940. [145] A. Krawiecki. Structural stochastic multiresonance in the Ising model on scale-free net- works. European Physical Journal B, 69:81–86, May 2009. [146] R. Krupi´ nski and J. Purczy´ nski. Approximated fast estimator for the shape parameter of generalized gaussian distribution. Signal Processing, 86(2):205–211, 2006. [147] E. E. Kuruoglu, W. J. Fitzgerald, and P. J. W. Rayner. Near optimal detection of signals in impulsive noise modeled with a symmetric-stable distribution. Communications Letters, IEEE, 2(10):282–284, October 1998. [148] L. L¨ aer, M. Kloppstech, C. Sch¨ ofl, T. J. Sejnowski, G. Brabant, and K. Prank. Noise enhanced hormonal signal transduction through intracellular calcium oscillations. Bio- physical Chemistry, 91:157–166, July 2001. [149] T. D. Lamb. Sources of noise in photoreceptor transduction. Journal of the Optical Society of America A, 4:2295–2300, December 1987. 236 [150] G. C. Langelaar, I. Setyawan, and R. L. Lagendijk. Watermarking digital image and video data. a state-of-the-art overview. Signal Processing Magazine, IEEE, 17(5):20–46, Sep 2000. [151] I. Y . Lee, X. Liu, C. Zhou, and B. Kosko. Noise-enhanced detection of subthreshold signals with carbon nanotubes. IEEE Transactions on Nanotechnology, 5(6):613–627, November 2006. [152] E. L. Lehmann and J. P. Romano. Testing Statistical Hypotheses. Springer, 3rd edition edition, 2008. [153] D. S. Leonard and L. E. Reichl. Stochastic resonance in a chemical reaction. Physical Review E, 49(2):1734–1737, February 1994. [154] J. E. Levin and J. P. Miller. Broadband neural encoding in the cricket cercal sensory system enhanced by stochastic resonance. Nature, 380:165–168, March 1996. [155] W. B. Levy and R. A. Baxter. Energy-efficient neuronal computation via quantal synaptic failures. Journal of Neuroscience, 22:4746–4755, June 2002. [156] H. Li, Z. Hou, and H. Xin. Internal noise enhanced detection of hormonal signal through intracellular calcium oscillations. Chemical Physics Letters, 402(4-6):444 – 449, 2005. [157] Q. S. Li and A. Lei. 
Selectivity of explicit internal signal stochastic resonance in a chemi- cal model. The Journal of Chemical Physics, 119:7050–7053, October 2003. [158] F. Liese and I. Vajda. On divergences and informations in statistics and information theory. IEEE Transactions on Information Theory, 52(10):4394–4412, Oct. 2006. [159] B. Lindner, J. Garc´ ıa-Ojalvo, A. Neiman, and L. Schimansky-Geier. Effects of noise in excitable systems. Physics Reports, 392:321–424, March 2004. [160] B. Lindner, A. Longtin, and A. Bulsara. Analytic expressions for rate and cv of a type i neuron driven by white gaussian noise. Neural Computation, 15(8):1761–1788, 2003. [161] R. Linsker. Self-organization in perceptual network. Computer, pages 105–117, March 1988. [162] R. Linsker. A local learning rule that enables information maximization for arbitrary input distributions. Neural Comput., 9(8):1661–1665, 1997. [163] S. P. Lipshitz, R. A. Wannamaker, and J. Vanderkooy. Quantization and dither: A theoret- ical survey. J. Audio Eng. Soc., 40(5):355–374, November 1992. [164] R. L¨ ofstedt and S. N. Coppersmith. Quantum stochastic resonance. Physical Review Letters, 72(13):1947–1950, Mar 1994. [165] R. D. Luce and H. Raiffa. Games and Decisions. Wiley, 1957. [166] D. Luenberger. Optimization by vector space methods. Wiley-Interscience, 224-225, 1969. 237 [167] E. Manjarrez, O. Diez-Martnez, I. Mndez, and A. Flores. Stochastic resonance in hu- man electroencephalographic activity elicited by mechanical tactile stimuli. Neuroscience Letters, 324(3):213 – 216, 2002. [168] A. Manwani and C. Koch. Detecting and estimating signals in noisy cable structures, i: neuronal noise sources. Neural Computation, 11(9):1797–1829, 1999. [169] X. Mao. Stochastic Differential Equations and their Applications. Horwood Publishing, 1997. [170] F. Marchesoni, F. Apostolico, and S. Santucci. Stochastic resonance in an asymmetric schmitt trigger. Physical Review E, 59(4):3958–3963, Apr 1999. [171] L. Martinez, T. Perez, C. R. Mirasso, and E. Manjarrez. Stochastic Resonance in the Motor System: Effects of Noise on the Monosynaptic Reflex Pathway of the Cat Spinal Cord. Journal of Neurophysiology, 97(6):4007–4016, 2007. [172] J. Massey. Causality, feedback and directed information. In Proc. 1990 Symp. Information Theory and Its Applications (ISITA-90), Waikiki, HI, pages 303–305, November 1990. [173] K. Matsumoto and I. Tsuda. Noise-induced order. Journal of Statistical Physics, 31(1):87–106, 1983. [174] J. H. McCulloch. Numerical approximation of the symmetric stable distribution and den- sity. In R. Adler, R. Feldman, and M. Taqqu, editors, A practical guide to heavy tails: sta- tistical techniques and applications, pages 489–499. Birkhauser Boston Inc., Cambridge, MA, USA, 1998. [175] W. S. McCulloch and W. Pitts. A logical calculus of ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, (5):115–133, 1943. Reprinted in Neurocomputing: Foundations of Research, ed. by J. A. Anderson and E Rosenfeld. MIT Press 1988. [176] M. D. McDonnell and D. Abbott. What is stochastic resonance? definitions, misconcep- tions, debates, and its relevance to biology. PLoS Computational Biology, 5(5):e1000348, 05 2009. [177] M. D. McDonnell, N. G. Stocks, C. E. M. Pearce, and D. Abbott. Analog-to-digital con- version using suprathreshold stochastic resonance. In S. F. 
Al-Sarawi, editor, Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, volume 5649 of So- ciety of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, pages 75–84, February 2005. [178] M. D. McDonnell, N. G. Stocks, C. E. M. Pearce, and D. Abbott. Quantization in the presence of large amplitude threshold noise. Fluctuation Noise Letters, 5(3):L457–L468, September 2005. [179] M. D. McDonnell, N. G. Stocks, C. E. M. Pearce, and D. Abbott. Optimal information transmission in nonlinear arrays through suprathreshold stochastic resonance. Physics Letters A, 352:183–189, 2006. 238 [180] M. D. McDonnell, N. G. Stocks, C. E. M. Pearce, and D. Abbott. Stochastic Resonance: From Suprathreshold Stochastic Resonance to Stochastic Signal Quantization. Cambridge University Press, 2008. [181] B. McNamara and K. Wiesenfeld. Theory of stochastic resonance. Physical Review A, 39:4854–4869, May 1989. [182] B. McNamara, K. Wiesenfeld, and R. Roy. Observation of stochastic resonance in a ring laser. Physical Review Letters, 60(25):2626–2629, Jun 1988. [183] C. Mead. Analog VLSI and Neural Systems. Addison Wesley, 1989. [184] V . I. Melnikov. Schmitt trigger: A solvable model of stochastic resonance. Physical Review E, 48(4):2481–2489, Oct 1993. [185] M. Y . Min and K. Appenteng. Multimodal distribution of amplitudes of miniature and spontaneous epsps recorded in rat trigeminal motoneurones. Journal of Physiology, 494:171–182, July 1996. [186] M. Minsky and S. Papert. Perceptrons: An Introduction to Computational Geometry. MIT Press, Cambridge, MA, expanded edition, 1988. [187] S. Mitaim and B. Kosko. Adaptive stochastic resonance. Proceedings of the IEEE, 86(11):2152–2183, 1998. [188] S. Mitaim and B. Kosko. Adaptive stochastic resonance in noisy neurons based on mutual information. IEEE Transactions on Neural Networks, 15:1526–1540, November 2004. [189] T. Mori and S. Kai. Noise-induced entrainment and stochastic resonance in human brain waves. Physical Review Letters, 88(21):218101, May 2002. [190] F. Moss, L. M. Ward, and W. G. Sannita. Stochastic resonance and sensory information processing: a tutorial and review of application. Clinical Neurophysiology, 115(2):267– 281, February 2004. [191] S. Nadarajah. A generalized normal distribution. Journal of Applied Statistics, 32(7):685– 694, September 2005. [192] C. L. Nikias and M. Shao. Signal processing with alpha-stable distributions and applica- tions. Wiley-Interscience, New York, NY , USA, 1995. [193] A. Nikolaidis and I. Pitas. Asymptotically optimal detection for additive watermarking in the dct and dwt domains. IEEE Transactions on Image Processing, 12(5):563–571, May 2003. [194] J. P. Nolan. Numerical calculation of stable densities and distribution functions. Commu- nications in Statistics - Stochastic Models, 13:759–774, 1997. [195] M. Ohka and S. Kondo. Stochastic resonance aided tactile sensing. Robotica, 27(4):633– 639, 2009. 239 [196] G. Owen. Game Theory. Academic Press, 2nd edition, 1982. [197] A. Pacut. Separation of deterministic and stochastic neurotransmission. In Neural Net- works, 2001. Proceedings. IJCNN ’01. International Joint Conference on, volume 1, pages 55–60, 2001. [198] E. Pantazelou, C. Dames, F. Moss, J. Douglass, and L. Wilkens. Temperature dependence and the role of internal noise in signal transduction efficiency of crayfish mechanorecep- tors. International Journal of Bifurcation and Chaos, 5:101108, 1995. [199] H. C. Papadopoulos, G. W. Wornell, and A. V . Oppenheim. 
Low-complexity digital en- coding strategies for wireless sensor networks. In Proc. ICASSP98, pages 3273–3276, 1998. [200] L. Pardo. Statistical inference based on divergence measures. Chapman & Hall/CRC, 2006. [201] A. Patel and B. Kosko. Noise benefits in quantizer-array correlation detection and water- mark decoding. submitted to IEEE Transaction on Signal Processing. [202] A. Patel and B. Kosko. Noise benefits in spiking retinal and sensory neuron models. In IEEE International Joint Conference on Neural Networks, 2005. IJCNN ’05., volume 1, pages 410–415, July-August 2005. [203] A. Patel and B. Kosko. Stochastic resonance in noisy spiking retinal and sensory neuron models. Neural Networks, 18(5-6):467–478, 2005. [204] A. Patel and B. Kosko. Mutual-information noise benefits in brownian models of con- tinuous and spiking neurons. In Neural Networks, 2006. IJCNN ’06. International Joint Conference on, pages 1368–1375, 2006. [205] A. Patel and B. Kosko. Levy noise benefits in neural signal detection. In IEEE Inter- national Conference on Acoustics, Speech and Signal Processing, 2007. ICASSP 2007, volume 3, pages III–1413–III–1416, April 2007. [206] A. Patel and B. Kosko. Optimal noise benefits in Neyman-Pearson signal detection. 33rd International Conference on Acoustics, Speech, and Signal Processing, ICASSP’08, 2008. [207] A. Patel and B. Kosko. Stochastic resonance in continuous and spiking neuron models with Levy noise. IEEE Transactions on Neural Networks, 19(12):1993–2008, December 2008. [208] A. Patel and B. Kosko. Error-probability noise benefits in threshold neural signal detection. Neural Networks, 22(5-6):697–706, July/August 2009. [209] A. Patel and B. Kosko. Neural signal-detection noise benefits based on error probability. Proceedings of the International Joint Conference on Neural Networks, IJCNN 2009, June 2009. 240 [210] A. Patel and B. Kosko. Optimal noise benefits in neyman-pearson and inequality- constrained statistical signal detection. IEEE Transactions on Signal Processing, 57:1655– 1669, May 2009. [211] A. Patel and B. Kosko. Quantizer noise benefits in nonlinear signal detection with alpha- stable channel noise. IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2009, pages 3269–3272, 2009. [212] X. Pei, K. Bachmann, and F. Moss. The detection threshold, noise and stochastic reso- nance in the Fitzhugh-Nagumo neuron model. Physics Letters A, 206:61–65, 1995. [213] H.H. Permuter, Young-Han Kim, and T. Weissman. On directed information and gam- bling. In Information Theory, 2008. ISIT 2008. IEEE International Symposium on, pages 1403–1407, July 2008. [214] O. V . Poliannikov, Y . Bao, and H. Krim. Levy processes for image modelling. In Proceed- ings of IEEE Signal Processing Workshop on Higher-Order Statistics, pages 204–207, 1999. [215] J. G. Proakis and M. Salehi. Digital Communications. McGraw Hill, 5th edition, 2008. [216] C. V . Rao, D. M. Wolf, and A. P. Arkin. Control, exploitation and tolerance of intracellular noise. Nature, 420(6912):231–237, November 2002. [217] S. I. Resnick. Heavy-Tail Phenomena: Probabilistic and Statistical Modeling. Springer, 2007. [218] M. Riani and E. Simonotto. Stochastic resonance in the perceptual interpretation of am- biguous figures: A neural network model. Physical Review Letters, 72(19):3120–3123, May 1994. [219] F. Rieke, D. Warland, de Ruyter, and W. Bialek. Spikes: Exploring the Neural Code. The MIT Press, 1999. [220] J. W. C. Robinson, D. E. Asraf, A. R. Bulsara, and M. E. Inchiosa. 
Abstract
This dissertation shows how noise can benefit nonlinear signal processing. These "stochastic resonance" results include deriving necessary and sufficient conditions for noise benefits, optimal noise distributions, and algorithms that find the optimal or near-optimal noise. The results apply to broad classes of signal and noise distributions. Applications include Neyman-Pearson and maximum-likelihood signal detection in single detectors and in parallel arrays, digital watermark decoding, retinal signal detection, and signal detection in feedback neurons.
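A minimal sketch of the kind of noise benefit the abstract describes: a fixed threshold detector sees a subthreshold constant signal under one of two equally likely hypotheses, and an intermediate level of added zero-mean Gaussian noise lowers the detection error probability before larger noise raises it again. The signal level s, threshold theta, Gaussian noise model, and the function names below are illustrative assumptions, not values or code from the dissertation.

# Illustrative sketch (not from the dissertation): error-probability stochastic
# resonance for a single threshold detector.
# Assumed setup: equally likely hypotheses H0: x = n and H1: x = s + n, with a
# subthreshold signal s < theta, zero-mean Gaussian noise n of standard
# deviation sigma, and the hard decision "choose H1 iff x > theta".
from math import erfc, sqrt

def q(x):
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * erfc(x / sqrt(2.0))

def error_probability(sigma, s=0.4, theta=1.0):
    """Equal-prior error probability of the threshold detector.

    P_FA = Q(theta/sigma), P_D = Q((theta - s)/sigma),
    P_e  = 0.5 * (P_FA + (1 - P_D)).
    """
    if sigma == 0.0:          # no noise: the subthreshold signal never crosses
        return 0.5            # the threshold, so the detector always says H0
    p_fa = q(theta / sigma)
    p_d = q((theta - s) / sigma)
    return 0.5 * (p_fa + (1.0 - p_d))

if __name__ == "__main__":
    sigmas = [0.05 * k for k in range(1, 61)]   # sweep noise levels 0.05 .. 3.0
    pe = [error_probability(sig) for sig in sigmas]
    best = min(range(len(sigmas)), key=lambda k: pe[k])
    print(f"P_e with no noise      : {error_probability(0.0):.3f}")
    print(f"Best noise level sigma : {sigmas[best]:.2f}")
    print(f"P_e at best sigma      : {pe[best]:.3f}")
    # The error probability dips below 0.5 at an intermediate sigma and rises
    # back toward 0.5 as sigma grows: the nonmonotonic signature of a noise benefit.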
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Optimizing statistical decisions by adding noise
Noise benefits in expectation-maximization algorithms
Noise benefits in Markov chain Monte Carlo computation
Noise-robust spectro-temporal acoustic signature recognition using nonlinear Hebbian learning
Reconfigurable high-speed processing and noise mitigation of optical data
Models and information rates for channels with nonlinearity and phase noise
Localization of multiple targets in multi-path environments
A biomimetic approach to non-linear signal processing in ultra low power analog circuits
Asset Metadata
Creator
Patel, Ashok
(author)
Core Title
Noise benefits in nonlinear signal processing
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Electrical Engineering
Publication Date
11/24/2009
Defense Date
10/26/2009
Publisher
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
alpha-stable noise,array-based signal detection,continuous neuron models,detection probability,digital watermark decoding,error probability,inequality-constrained statistical decisions,Levy noise,maximum-likelihood detection,mutual information,neural signal detection,Neyman-Pearson detection,noise benefits in optimal signal detection,noise-enhanced signal detection,nonlinear hypothesis testing,nonlinear correlation detection,OAI-PMH Harvest,optimal noise,quantizer noise,retinal neuron models,spiking neuron models,stochastic resonance,suprathreshold stochastic resonance,threshold signal detection
Language
English
Contributor
Electronically uploaded by the author
(provenance)
Advisor
Kosko, Bart (committee chair), Jonckheere, Edmond (committee member), Mikulevicius, Remigijus (committee member)
Creator Email
ashokpat@usc.edu,batspatel@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-m2762
Unique identifier
UC1149691
Identifier
etd-Patel-3377 (filename),usctheses-m40 (legacy collection record id),usctheses-c127-284990 (legacy record id),usctheses-m2762 (legacy record id)
Legacy Identifier
etd-Patel-3377.pdf
Dmrecord
284990
Document Type
Dissertation
Rights
Patel, Ashok
Type
texts
Source
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Repository Name
Libraries, University of Southern California
Repository Location
Los Angeles, California
Repository Email
cisadmin@lib.usc.edu
Tags
alpha-stable noise
array-based signal detection
continuous neuron models
detection probability
digital watermark decoding
error probability
inequality-constrained statistical decisions
Levy noise
maximum-likelihood detection
mutual information
neural signal detection
Neyman-Pearson detection
noise benefits in optimal signal detection
noise-enhanced signal detection
nonlinear hypothesis testing
nonlinear correlation detection
optimal noise
quantizer noise
retinal neuron models
spiking neuron models
stochastic resonance
suprathreshold stochastic resonance
threshold signal detection