EFFICIENT ACOUSTIC NOISE SUPPRESSION FOR AUDIO SIGNALS

by

Hesu Huang

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING)

August 2006

Copyright 2006 Hesu Huang

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

UMI Number: 3236509. UMI Microform 3236509, copyright 2006 by ProQuest Information and Learning Company, 300 North Zeeb Road, P.O. Box 1346, Ann Arbor, MI 48106-1346. All rights reserved.

DEDICATION

Dedicated with love to my husband Tao Xue, my son Jonathan Xue, and my parents Jinlong Huang and Qingfen Wang.

ACKNOWLEDGEMENTS

I would like to express my sincere gratitude and appreciation to my advisor, Dr. Chris Kyriakakis, for his continuous support, patience, and encouragement throughout my PhD studies. This dissertation could not have been completed without his expert guidance and invaluable comments. I learned a great deal from Dr. Kyriakakis, both from his extensive knowledge and from his gracious personality.
And I will continue to benefit from this throughout my life. My thanks also go to the other members of my dissertation committee and my qualifying exam committee: Dr. Narayanan, Dr. Zimmermann, Dr. Jenkins, and Dr. Kuo, for providing many insightful comments that improved the presentation and content of this dissertation. I wish to thank all the group members under Dr. Kyriakakis' supervision: thank you for the good research atmosphere you provided, for the help you offered, and for the good-spirited discussions we shared during my graduate studies. Finally, I would like to thank my family. I thank my husband Tao and my son Jonathan; their understanding and love enabled me to finish this PhD project. I thank my parents and parents-in-law for their lifelong love and emotional support throughout the process. My special thanks go to my sister, Yusu Wang, for sharing her experience in doing research and writing a dissertation, and for listening to my complaints and frustrations whenever I needed it.

TABLE OF CONTENTS

DEDICATION
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER 1 INTRODUCTION
  1.1 Motivations and Overview
  1.2 Previous Work
    1.2.1 Additive Background Noise Reduction
    1.2.2 Convolutive Noise Reduction
  1.3 Contributions of the Research
    1.3.1 Proposal of Delayless Subband Adaptive Noise Suppression
    1.3.2 Binaural Model Based Binaural Additive Noise Reduction
    1.3.3 Binaural Dereverberation Using Delayless Subband Least-Squares Algorithm
  1.4 Outline of the Proposal

CHAPTER 2 RESEARCH BACKGROUND
  2.1 Human Hearing and Auditory System
    2.1.1 The Human Ear
    2.1.2 Masking
    2.1.3 Critical Band
  2.2 Adaptive Noise Cancellation
    2.2.1 Adaptive Noise Cancellation
  2.3 Delayless Subband Filtering Structure
    2.3.1 Subband Adaptive Filtering (SAF)
    2.3.2 Delayless Subband Filtering Structure
  2.4 Spectral Subtraction
    2.4.1 Spectral Subtraction
    2.4.2 Generalized Spectral Subtraction
  2.5 Objective Measures
    2.5.1 Segmental Signal-to-Noise Ratio (S-SNR)
    2.5.2 Enhanced Itakura Distance Measure (E-ID)
    2.5.3 Weighted Spectral Slope Measure (WSS)

CHAPTER 3 DELAYLESS SUBBAND ADAPTIVE NOISE SUPPRESSION
  3.1 Overview
  3.2 Real-valued Delayless Subband Adaptive Filtering (RDSAF)
    3.2.1 Real-Valued Single-Sideband (SSB) Analysis Filter Banks
    3.2.2 Prototype Filter Design
    3.2.3 Weight Transformation
  3.3 Causal Real-valued Delayless Subband ANC
    3.3.1 Affine Projection Algorithm
    3.3.2 Weight Transforming Rules for Causal Real-valued Delayless Subband ANC
  3.4 Non-causal Real-valued Delayless Subband Blind Deconvolution
    3.4.1 Constant Modulus Algorithm (CMA)
    3.4.2 Weight Transforming Rules for Non-causal Delayless Subband CMA
  3.5 Analysis of Computational Complexity
  3.6 Simulation Results
    3.6.1 Complex-Valued DSAF vs.
      Real-valued DSAF
    3.6.2 Non-causal Delayless Subband CMA Simulations
    3.6.3 Results of Simulated Acoustic Noise Suppression
  3.7 Conclusion

CHAPTER 4 IMPROVEMENT ON MONAURAL BLIND DEREVERBERATION
  4.1 LP Residual
  4.2 Modified CMA Algorithm
  4.3 Simulation Results
  4.4 Conclusions

CHAPTER 5 BINAURAL ADDITIVE NOISE REDUCTION
  5.1 Problem Formulation
  5.2 Simplified Binaural Model Acting as VAD
    5.2.1 Pre-processing Stage
    5.2.2 Binaural Cue Extraction and Voice Activity Detection
  5.3 Spectral Subtraction Based Noise Reduction
    5.3.1 Noise Reduction Rule
    5.3.2 Noise Masking Threshold
    5.3.3 Computer Simulations
    5.3.4 Conclusions
  5.4 ANC Based Adaptive Binaural Noise Reduction
    5.4.1 Main Idea
    5.4.2 Subband Processing
    5.4.3 Intermittent ANC Module
    5.4.4 Intermittent ANC Module
    5.4.5 Computer Simulation
    5.4.6 Conclusions
  5.5 Discussions
CHAPTER 6 BINAURAL DEREVERBERATION
  6.1 Overview
  6.2 Main Idea
    6.2.1 Problem Formulation
    6.2.2 Principles
  6.3 Estimation of Channel Impulse Responses
    6.3.1 Unit-norm Constrained RLS Algorithm
    6.3.2 Component-norm Constrained LS Algorithm
    6.3.3 Comparison Between These Two Constrained LS Algorithms
  6.4 Inverse Filtering and Optimum Order Determination
    6.4.1 Adaptive Inverse Filtering
    6.4.2 Optimum Order Determination
  6.5 Combination with Real-valued Delayless Subband Processing
  6.6 Computer Simulations
  6.7 Conclusions

CHAPTER 7 SUMMARY AND OUTLOOK
  7.1 Summary
  7.2 Suggestions to Future Work
    7.2.1 Single Channel Blind Deconvolution Incorporating Speech Characteristics
    7.2.2 Application of Blind Signal Separation into ANS

REFERENCES

APPENDICES
  A. Affine Projection Algorithm (APA)
  B. Constant Modulus Algorithm (CMA)
  C. Efficient Subband Decomposition
    C.1 Polyphase Representation
    C.2 Implementation of Uniform DFT Filter Banks
    C.3 Implementation of Generalized DFT Filter Banks (GDFT)

LIST OF TABLES

Table 3.1: Objective Measurements for Monaural Delayless Subband Noise Suppression
Table 4.1: Objective Measurements for Speech Dereverberation Performance
Table 5.1: Objective Measurements Comparison between Simple Masking Threshold Estimation and MMSE Based Masking Threshold Estimation
Table 5.2: Objective Measurements Comparison between Auditory Filterbank Based Method and DSAF Based Method
Table 6.1: Unit-norm Constrained RLS Algorithm
Table 6.2: Linearly Constrained FLS Algorithm
Table 6.3: Robust Linearly Constrained FLS Algorithm
Table 6.4: Computational Complexity Comparison
Table 6.5: Procedures to Determine Optimum Order P

LIST OF FIGURES

Figure 1.1: Overview of an ANS System
Figure 2.1: Structure of Human Ear
Figure 2.2: The Absolute Threshold of Hearing
Figure 2.3: Illustration of Frequency Masking with a Tone Presented
Figure 2.4: The Effect of Temporary Masking
Figure 2.5: ANC System Model
Figure 2.6: Conventional Subband Adaptive Filtering Structure
Figure 2.7: Block Diagram of Delayless Subband Adaptive Filtering Structure
Figure 2.8: Block Diagram of Spectral Subtraction
Figure 3.1: Block Diagram for Delayless Subband Adaptive Noise Suppression
Figure 3.2: Illustration of Real-valued SSB Filter Bank Implementation
Figure 3.3: Frequency Response of a Prototype Filter Example
Figure 3.4: Diagram of FFT-2 Weight Transformation
Figure 3.5: Architecture of CMA
Figure 3.6: Block Diagram of Non-Causal Weight Transforming Process for Delayless Subband CMA
Figure 3.7: Convergence Curves
Figure 3.8: Simulation Results for Non-causal Delayless Subband CMA
Figure 3.9: Simulation Results for Single Channel Acoustic Noise Suppression
Figure 4.1: Block Diagram of Modified Monaural Dereverberation Method
Figure 4.2: Power Spectrum Density Plots
Figure 4.3: Kurtosis at Different Reverberation Levels
Figure 4.4: Simulation Results for Modified CMA Based Dereverberation
Figure 5.1: Binaural Noisy Signal Recording System
Figure 5.2: Simplified Binaural Model Structure Acting as VAD
Figure 5.3: Frequency Response of a 25-band Gammatone Filter
Figure 5.4: Frequency Response of All-pole Approximation for a 25-band Gammatone Filter
Figure 5.5: Speech/Noise Detection Results
Figure 5.6: Block Diagram for SS Based Binaural Noise Reduction
Figure 5.7: Spectral Floor and Over-subtraction Factor
Figure 5.8: Calculation Steps for Masking Threshold
Figure 5.9: Functions Used for Noise Masking Threshold Calculation
Figure 5.10: Example of Noise Masking Threshold Computed for a 16ms Section of Speech
Figure 5.11: Objective Measurements for Color Noise Reduction
Figure 5.12: Simulation Results for Left Channel Signals
Figure 5.13: Simulation Results for Right Channel Signals
Figure 5.14: Block Diagram for ANC Based Subband Binaural Noise Reduction
Figure 5.15: Flow Chart for Intermittent ANC Module
Figure 5.16: Simulation Results for Left Channel Signals
Figure 5.17: Simulation Results for Right Channel Signals
Figure 6.1: Block Diagram of Binaural Dereverberation
Figure 6.2: Problem Formulation and Error Signal Construction
Figure 6.3: Estimated RIR for Left and Right Channels
Figure 6.4: Error Signals Computation in Optimum Order Determination
Figure 6.5: Cost Function PE(p) vs. Order p
Figure 6.6: Simulation Results for Binaural Dereverberation
Figure 6.7: Results of Equalized Channel Impulse Responses
Figure C.1: N-to-1 Decimator
Figure C.2: Clockwise Commutator Model for the Polyphase Structure for an M-to-1 Decimator
Figure C.3: Complex Bandpass Filter and Modulator Interpretation of the Channel of the Uniform DFT Filter Bank Analyzer
Figure C.4: Polyphase Structure for the Filter Bank Analyzer for K = IM
Figure C.5: Polyphase Structure for the GDFT Filter Bank Analyzer for K = IM

ABSTRACT

Acoustic Noise Suppression (ANS) is crucial for a variety of applications, such as audio communication, hearing aids, and speech recognition. Although ANS using microphone arrays has been studied intensively, monaural and binaural ANS still present many challenges in real-world applications. This dissertation proposes several efficient monaural and binaural noise suppression schemes to deal with two common types of acoustic noise: additive noise and convolutive noise in the form of reverberation. For monaural ANS, we propose a two-stage approach that uses the APA algorithm and the CMA algorithm to suppress ambient noise and convolutive noise successively. The method introduces zero transmission delay through a novel design and application of the Real-valued Delayless Subband Adaptive Filter (RDSAF) structure. To further improve dereverberation performance, a modified CMA algorithm is investigated and operated in the LP residual domain. Simulation results demonstrate that our method is efficient and achieves high-quality noise suppression with no delay, making it attractive for real-time applications.
To reduce additive binaural noise, we propose a new framework that integrates the merits of binaural analysis with various additive noise reduction techniques. In particular, we consider two commonly used techniques: perceptually motivated spectral subtraction and subband intermittent Adaptive Noise Cancellation (ANC). In both methods, we replace the conventional VAD with a simplified binaural model to improve voice activity detection at low SNR. This, together with several other novel modifications that account for the non-uniform spectrum of most real-world noise, enables us to achieve enhanced performance in reducing highly colored binaural noise at low SNR. Furthermore, each method also presents some unique properties, making it appropriate for different types of applications. Finally, an adaptive binaural dereverberation strategy is proposed. The use of constrained Least-Squares algorithms enables it to blindly identify the left/right channel Impulse Response (IR) both efficiently and adaptively. The RDSAF structure is incorporated to further improve efficiency. Simulations show that for short channel IRs, our method can achieve almost perfect dereverberation, while for long IRs it achieves good dereverberation performance with only a slight transmission delay.

CHAPTER 1 INTRODUCTION

1.1 Motivations and Overview

In the 1870s, with Thomas Edison's invention of the cylinder phonograph and the birth of the first telephone by Alexander Graham Bell, the art of sound stepped into a new era. Since then, sound recording, coding, transmission, mixing, and reproduction have been constantly evolving. Among these fields, sound recording plays a major role in capturing the audio signals for subsequent processing.
However, it is inevitable that during the capture of audio signals, undesired noise or sound effects may be recorded simultaneously by the microphones. Therefore, how to effectively and efficiently suppress acoustic noise (referred to in this proposal as Acoustic Noise Suppression, ANS) has been an active area of research, and it is the topic of this proposal. By using ANS techniques, we expect to achieve near 'Compact Disc' quality sound and to suppress or reduce disturbances and distortions such as background noise, echoes, clipping, and other signal degradations to adequately low levels. The applications of Acoustic Noise Suppression include:

• Audio Communications: ANS can reduce the acoustic noise present in telephone, audio conferencing, and hands-free communication, and improve the intelligibility of the audio signals.

• Hearing Aids: In a noisy environment, people with hearing loss have much more difficulty achieving the same speech quality and intelligibility as people with normal hearing. ANS can be used to help them understand speech better.

• Audio Archive Restoration: ANS is used to recover the original high-quality audio recording or to improve the signal quality. Unlike the applications mentioned above, distortion caused by ANS itself must be avoided.

• Speech Recognition and Coding Systems: ANS can be employed as a preprocessor before the subsequent processing and coding of the speech signals. It may greatly improve the performance and efficiency of the system.

1.2 Previous Work

Real-world audio signals are usually disturbed by various noises, which can cause severe difficulties in audio communication and degrade signal quality. Previous studies on ANS have focused on two common types of noise: additive background noise and convolutive noise in the form of reverberation and channel distortion.
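These two degradations can be illustrated with a minimal numerical sketch. The room impulse response, noise level, and test signal below are all toy assumptions for illustration, not values from this dissertation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "clean" audio signal: one second of a 440 Hz tone at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
s = np.sin(2 * np.pi * 440 * t)

# Convolutive noise: a toy room impulse response with a direct path
# and two attenuated, delayed reflections.
h = np.zeros(256)
h[0] = 1.0      # direct path
h[80] = 0.5     # first reflection
h[200] = 0.25   # second reflection

# Additive noise: white ambient noise.
v = 0.05 * rng.standard_normal(len(s))

# Recorded signal = convolution with the room acoustics + additive noise.
y = np.convolve(s, h)[:len(s)] + v
```

Additive noise reduction tries to remove `v`, while dereverberation tries to undo the convolution with `h`.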
Assuming the propagation of acoustic signals through the air is linear, the recorded signal can be represented as the sum of additive ambient noise and the convolution of the clean audio signal with the impulse response of the environment acoustics, as shown in Figure 1.1.

[Figure 1.1: Overview of an ANS System — the desired signal (speech, music) reaches the microphone via the direct path and reflections, is mixed with undesired noise (fan, engine, etc.), and passes through Acoustic Noise Suppression before subsequent processing.]

1.2.1 Additive Background Noise Reduction

Ambient background noise is the most common noise present in real-world recordings. It can be either stationary (e.g., fan, computer) or non-stationary (e.g., radio, cocktail party). In particular, the presence of background noise can significantly degrade the performance of subsequent processing, especially speech coding and speech recognition, which are designed to work in noise-free conditions. Tremendous research on additive noise reduction has therefore been carried out over the past two decades. Methods for additive noise suppression can be categorized into three groups: monaural, binaural, and multi-channel techniques. Existing approaches for single-channel additive noise reduction include Spectral Subtraction [4], Adaptive Noise Cancellation (ANC) [55], Kalman Filtering based on speech modeling [42], and signal subspace decomposition methods [12]. When a noise reference is available, ANC is a good choice for suppressing additive noise due to its good performance and its capability of tracking environment variations. With only the noisy signal present, subtractive-type algorithms based on spectral subtraction constitute a family of popular approaches due to their implementation simplicity.
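The subtractive idea can be sketched in its most basic form: estimate the noise magnitude spectrum, subtract it from the noisy spectrum, and resynthesize with the noisy phase. This is a generic textbook-style sketch, not the generalized rule developed later in this dissertation; all signals and levels here are toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy frame: a "clean" tone buried in white noise.
N = 256
t = np.arange(N)
clean = np.sin(2 * np.pi * 16 * t / N)
noisy = clean + 0.3 * rng.standard_normal(N)

# Noise magnitude spectrum, estimated here from synthetic noise-only
# frames (in practice supplied by a VAD-driven noise estimator).
noise_mag = np.mean(
    [np.abs(np.fft.rfft(0.3 * rng.standard_normal(N))) for _ in range(50)],
    axis=0,
)

# Basic magnitude spectral subtraction with half-wave rectification:
# subtract the noise magnitude estimate, keep the noisy phase.
Y = np.fft.rfft(noisy)
mag = np.maximum(np.abs(Y) - noise_mag, 0.0)
enhanced = np.fft.irfft(mag * np.exp(1j * np.angle(Y)), n=N)
```

Because each bin's magnitude can only shrink, the enhanced frame has less energy than the noisy frame; the half-wave rectification is also the source of the "musical noise" artifacts that motivate the over-subtraction and spectral-floor refinements discussed in Chapter 5.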
For binaural noise reduction, Computational Auditory Scene Analysis (CASA) methods are able to separate distinct sound sources from the background noise [3] [54]. The well-known "cocktail-party processor" proposed by Bodden [3] is one representative approach, separating the desired sound source from environmental noise at different locations using a binaural signal processing model. In addition, when multiple microphones are available, beam-forming with a microphone array, fixed or adaptive [21] [49], has been shown to be a good way of suppressing additive noise by producing directivity with respect to a desired incident direction.
1.2.2 Convolutive Noise Reduction
In a reverberant environment, when the sound source is far away from the microphone, the recorded audio signal is subject to reflections from the room boundaries. The transmission between the sound source and the microphone can be characterized by a linear distortion of amplitude and phase called reverberation, and modeled by the Acoustic Impulse Response (AIR) of the acoustic environment. Long reverberation may severely decrease speech intelligibility and listening comfort, and cause difficulties in subsequent processing such as Automatic Speech Recognition (ASR). Therefore, dereverberation has attracted a great deal of interest in recent years, but has proven to be more difficult, especially when only the reverberant signals are available without a priori knowledge of the room acoustics. For many years, multi-microphone methods have been intensively studied; they can accomplish some degree of dereverberation by means of beam-forming techniques [5] and blind source separation (BSS) algorithms [39]. However, single-microphone dereverberation still poses a formidable challenge and is more common in real-world applications.
Previous work on single-microphone methods includes cepstrum-based algorithms [1] and algorithms based on speech signal properties [59]. In the former, the room impulse response is estimated from the cepstrum and equalized, while in the latter, signal intervals with small signal-to-reverberation ratios are detected and attenuated. Recently, with the development of blind deconvolution techniques, dereverberation approaches employing blind deconvolution algorithms have been studied under the assumption that the signals are statistically independent and identically distributed (i.i.d.) non-Gaussian sequences [29] [40].
1.3 Contributions of the Research
Compared to multi-microphone acoustic noise suppression schemes, monaural and binaural methods are more convenient and require less and smaller equipment. In this dissertation, we concentrate our study on monaural and binaural noise suppression approaches. Our goal is to develop ANS techniques that are efficient, effective, and suitable for real-time applications. Major contributions are summarized as:
1.3.1 Proposal of Delayless Subband Adaptive Noise Suppression
The AIR of a real-world acoustic environment typically has a large number of taps, resulting in a heavy computational burden and slow convergence rate. Delayless Subband Adaptive Filtering (DSAF) [37] can help reduce the computational cost and accelerate the convergence rate without introducing any transmission delay. Hence, an Affine Projection Algorithm (APA) based Real-valued Causal Delayless Subband ANC is proposed to remove the additive noise in the first stage, and a Constant Modulus Algorithm (CMA) based Real-valued Non-causal Delayless Subband Blind Deconvolution Processor is then exploited to dereverberate the reverberant audio signal in the second stage.
The corresponding weight transformations for the Causal and Non-causal Real-valued Delayless Subband Filter architectures are also derived. Specific contributions of the proposed monaural acoustic noise reduction strategy are listed below:
• Design of Real-valued Delayless Subband Adaptive Filter Architecture: The Delayless Subband Filtering architecture can eliminate the transmission delay inherent in conventional subband algorithms. However, the complex-valued subband signals generated by the conventional DSAF usually introduce additional computational complexity when combined with the APA algorithm, as implementing APA with complex-valued signals is more time-consuming than with real-valued signals. Therefore, by using Single Sideband Modulation (SSB) filter banks, we design a Real-valued Delayless Subband Adaptive Filter architecture and integrate it with APA and CMA in the two stages respectively.
• Study of Prototype Filter Design for Delayless Subband Architecture: The performance of the DSAF depends heavily on the design of the prototype filters. In our proposed approach, a simple but effective prototype filter design method utilizing constrained least-squares optimization is presented to achieve highly suppressed aliasing for the Real-valued DSAF.
• Proposal of APA based Causal Real-valued Delayless Subband ANC to eliminate the additive noise: As mentioned in Section 1.2.1, ANC is an effective and widely used approach for reducing additive noise when a noise reference is available, and is therefore employed in our approach. In particular, the Affine Projection Algorithm (APA) [41] is selected as the adaptive algorithm in our implementation of ANC, for it provides improved performance over the NLMS algorithm and lower computational complexity than the RLS algorithm.
Furthermore, to reduce the computational cost, we integrate it with the Causal Real-valued DSAF to achieve good performance in additive noise reduction.
• Proposal of CMA based Non-causal Real-valued Delayless Subband Blind Deconvolution Processor to Dereverberate the Signal: The Constant Modulus Algorithm [20], a blind deconvolution technique, is employed to dereverberate the reverberant audio signal. As in the first stage, we incorporate Real-valued Delayless Subband Filtering to avoid the transmission delay and save computational cost. Meanwhile, since the subband adaptive filters produced by CMA are non-causal, a new Non-causal Real-valued Delayless Subband Filtering structure as well as its corresponding weight transformation is developed.
• Computational Analysis of Real-valued Delayless Subband APA/CMA vs. Complex-valued Delayless Subband APA/CMA: To demonstrate the efficiency of our proposed Real-valued Delayless Subband Adaptive Filter architecture when combined with the APA/CMA algorithms, we analyze its computational complexity as well as its performance in this study and compare them with those of its complex-valued counterpart.
• Study of a Modified CMA Algorithm in Audio Signal Dereverberation Applications: One problem with the CMA algorithm is that its performance deteriorates greatly when working on super-Gaussian signals with positive kurtosis. This limits its application in dereverberating audio signals because most audio signals are super-Gaussian. A modified CMA with an additional norm factor introduced in the update equation is therefore investigated and utilized to blindly equalize the reverberant speech signals. Moreover, the dereverberation process is implemented in the LP residual domain instead of the time domain.
In this way, we expect better dereverberation performance because the Linear Prediction process pre-whitens the speech signals.
1.3.2 Binaural Model Based Binaural Additive Noise Reduction
In this proposal, we present a new framework that integrates the merits of binaural analysis with two different additive noise reduction techniques. In both methods, we replace the conventional Voice Activity Detector (VAD) with a simplified binaural model to improve the detection of voice activity even at low SNR. This, together with taking account of the non-uniform spectrum of most real-world noise, enables us to achieve enhanced performance when reducing highly colored binaural noise in low-SNR environments. In our first proposed additive binaural noise reduction scheme, the binaural analysis is integrated with a perceptually motivated spectral subtraction algorithm to yield efficient noise suppression. A simple, but effective, perceptually-weighted spectral subtraction algorithm is then applied to the left and right channels respectively. Taking into account that most real-world noise is colored with a non-uniform spectrum, band-specific over-subtraction factors and spectral floors determined by the SNR of each frame are also introduced into the subtraction rules to further reduce the musical noise and signal distortion. Specific contributions are as follows:
• Replacement of Conventional VAD with Simplified Binaural Model: The conventional VAD widely used in spectral subtraction is substituted by a simplified binaural model in our approach. Assuming that the noise source and signal source are located at different positions, this simplified binaural model [18] can identify the speech/noise dominant segments even in low-SNR environments by mimicking how the human auditory system works.
• Derivation of Psychoacoustically Motivated Subtraction Rule: Taking into account the masking phenomenon of the human auditory system, a modified perceptually-weighted spectral subtraction rule is derived. The utilization of the auditory masking threshold helps to suppress the signal distortion caused by spectral subtraction.
• Introduction of Segmental SNR Based Band-Specific Over-subtraction Factors and Spectral Floors: Since most real-world noise is neither white nor stationary, the noise does not affect the speech signal uniformly across the spectrum or over time. Hence, band-specific over-subtraction factors and spectral floors determined by the segmental SNR in each critical band are introduced to reduce the noise while still maintaining good signal quality. As a result, our proposed binaural noise reduction strategy is more effective for colored binaural noise reduction.
The second proposed additive binaural noise reduction scheme is built upon the combination of a perceptual binaural model with the Adaptive Noise Cancellation (ANC) technique. To reduce the binaural additive noise adaptively, an intermittent ANC module is applied in selected subbands during noise-only segments detected by the simplified binaural model. The novelty of this scheme is presented below:
• Replacement of Conventional VAD with Simplified Binaural Model: As in the binaural additive noise reduction scheme based on perceptual spectral subtraction, we also utilize the simplified binaural model in this binaural ANR scheme to detect speech pauses.
• Intermittent ANC Module: Considering that the noises contained in the left and right channels coming from the same noise source are correlated, we apply Adaptive Noise Cancellation (ANC) to the left/right channel signals during speech-silent segments.
When the adaptive filter converges, the output signal of the ANC is expected to be noise-free, containing the desired speech signal only. As such, the binaural additive noise is successfully eliminated by the use of intermittent ANC.
• Combination with Subband Processing: Subband decomposition is introduced to take account of the non-uniform spectrum of most real-world noise. Subband processing leads to faster adaptation and improved performance through the freedom of using different adaptive parameters in each subband. Furthermore, it helps reduce implementation complexity by applying intermittent ANC in selected subbands only. In our work, we employ two different types of subband processing techniques: the auditory filter bank implemented with Gammatone filterbanks, and the oversampled Delayless Subband Filters. The resulting binaural noise reduction performances are compared.
1.3.3 Binaural Dereverberation Using Delayless Subband Least-Squares Algorithm
With one additional channel signal available, binaural dereverberation is easier than single-channel dereverberation. Through careful observation and derivation, we formulate the problem of blindly estimating the left/right channel impulse responses as a quadratic optimization problem. Consequently, a binaural dereverberation approach using Constrained Least-Squares algorithms and adaptive inverse filtering is proposed. The distinctive features of our proposed adaptive binaural dereverberation method include:
• Constrained Least-Squares Algorithms Used for Blind System Identification: The binaural channel impulse response vector is in fact the eigenvector corresponding to the smallest eigenvalue of the autocorrelation matrix of the input signals.
Instead of using a time-consuming batch algorithm to compute this eigenvector directly, we employ Constrained Least-Squares (LS) minimization strategies to estimate the binaural impulse responses adaptively. Both the unit-norm constrained LS algorithm and the unit-component constrained Fast LS algorithm are studied and applied, and the results are compared.
• Adaptive Inverse Filtering to Recover the Source Signal Using the RLS Algorithm: After the left and right channel impulse responses are successfully estimated, adaptive inverse filtering is implemented to recover the original source signal. The core adaptive algorithm used for inverse filtering is the RLS algorithm.
• Combination with DSAF: In order to reduce the overall scheme's implementation complexity, we combine the previously proposed Real-valued DSAF structure with the binaural dereverberation algorithms. The utilization of the Real-valued DSAF not only saves the computational cost required for the adaptive Least-Squares algorithms, but also avoids the long signal path delay caused by conventional subband processing.
1.4 Outline of the Proposal
This proposal is organized as follows. Chapter 2 gives background information for this research, including the basics of the human auditory system, some psychoacoustic facts, the primary idea of ANC, the basic Spectral Subtraction algorithm, and the objective measures we use to evaluate speech signal quality in this study. The monaural Real-valued Delayless Subband Acoustic Noise Suppression scheme is presented in Chapter 3, while Chapter 4 modifies the single-channel dereverberation work of Chapter 3 by using a modified CMA algorithm. Chapter 5 is devoted to additive binaural colored noise reduction strategies. The binaural dereverberation approach using Delayless Subband Constrained Least-Squares algorithms is studied in Chapter 6.
Finally, a summary of all the results we obtain and some suggestions for future work are given in Chapter 7.
CHAPTER 2
RESEARCH BACKGROUND
Before going deeply into monaural/binaural noise suppression algorithms, we would like to give a brief introduction to some preliminary background for our research in this chapter. Some knowledge of human hearing, auditory masking and critical bands is presented in section 2.1. Section 2.2 explains how ANC works for additive noise reduction, section 2.3 introduces the Delayless Subband Filtering architecture, and section 2.4 presents the main principles of the spectral subtraction algorithm. Several objective measures used to evaluate speech signal quality are introduced in section 2.5.
2.1 Human Hearing and Auditory System
Research on human auditory properties is an ongoing process. Previous studies [60] show that by taking advantage of the properties of human ears, we can improve the performance of acoustic noise suppression algorithms in both monaural and binaural cases. Most research on human auditory properties in noise suppression falls into two areas: the study of human auditory models and the study of masking phenomena. This section reviews some fundamental knowledge regarding human hearing and the auditory system.
2.1.1 The Human Ear
The human auditory system can be divided into two main functional blocks: the periphery and the central auditory system. The periphery is a complex mechanism that converts sound waves into nerve impulses, which are then conveyed by the auditory nerves to the brain stem. It comprises the outer ear, the middle ear and the inner ear, as shown in Figure 2.1.
Figure 2.1: Structure of the Human Ear [20]
• The Outer Ear
The outer ear consists of the pinna (the visible part of the ear) and the ear canal, and terminates at the eardrum. The pinna collects sound and filters it in a way that depends on the direction the sound comes from. These filtering effects can be described by the Head-Related Transfer Function (HRTF). The ear canal is a tube that directs the sound to the eardrum, and can be seen as a cavity with one end open and the other closed by the eardrum. In this way, the ear canal acts as a resonant filter, with a peak around 5 kHz.
• The Middle Ear
The middle ear begins at the eardrum (tympanic membrane) and includes the three tiny bones of the middle ear, the ossicles: malleus, incus, and stapes. The eardrum is a thin membrane stretched across the inner end of the canal. It separates the outer ear from the middle ear and vibrates when sound hits it. The ossicles form the linkage between the tympanic membrane and the oval window that leads to the inner ear. When sound enters the ear and makes the eardrum vibrate, the vibrations pass from the eardrum along the ossicles to the inner ear.
• The Inner Ear
The inner ear is a bony structure comprising the semicircular canals of the vestibule and the cochlea. Usually, only the cochlea is considered to be the hearing part of the ear. It is also the most complex part of the ear, wherein the mechanical pressure waves are converted into electrical pulses. The cochlea is a fluid-filled spiral lined with thousands of tiny hair cells. When sound waves enter the fluid of the cochlea, they cause ripples in the fluid, which then bend the tiny hairs. The hair cells convert the fluid motion into electrical impulses and send them to the auditory nerve.
The auditory nerve passes these impulses up to the brain, which recognizes them as different sounds.
2.1.2 Masking
Human auditory masking occurs unconsciously at every moment for all of us. It is a highly complex process that is still only partially understood. The American Standards Association (ASA) defines masking as the process, or the amount (customarily measured in decibels), by which the threshold of audibility is raised by the presence of another (masking) sound. To understand masking well, we must consider the following two concepts:
• Threshold of Hearing
In order to be audible, sounds require a minimum pressure, which forms the threshold of hearing. The Absolute Threshold of Hearing (ATH) is the most common such threshold, describing how much sound intensity a pure tone requires to be perceptible in a noiseless environment. It differs from person to person and furthermore changes with a person's age. Due in part to the filtering of the outer and middle ear, this minimum pressure varies considerably with the frequency of the test sound, and typically shows a minimum at frequencies between 1 kHz and 5 kHz.
• Masking
Masking is an important property of hearing and is widely used in sound compression algorithms as well as in acoustic noise reduction methods. It occurs when the audibility of one sound is decreased by the presence of another, masking sound. Generally speaking, masking effects can be classified into two categories: simultaneous masking and temporal masking.
Figure 2.2: The Absolute Threshold of Hearing (the numbers indicate the age of the test subjects) [60]
Figure 2.2 shows a typical absolute threshold of hearing curve.
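The frequency dependence of the absolute threshold of hearing is often captured by Terhardt's analytical approximation (in dB SPL). This is a generic approximation from the psychoacoustics literature, sketched here for illustration; it is not the measured curves of Figure 2.2:

```python
import numpy as np

def ath_db(f_hz):
    """Terhardt's analytical approximation of the absolute threshold
    of hearing in dB SPL, with frequency f_hz in Hz. This is an
    approximation, not the measured curves of Figure 2.2."""
    f = np.asarray(f_hz, dtype=float) / 1000.0   # convert to kHz
    return (3.64 * f ** -0.8
            - 6.5 * np.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)
```

Evaluating this over the audible range reproduces the behavior described above: the threshold rises steeply toward low frequencies and reaches its minimum between 1 kHz and 5 kHz.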
However, the threshold of hearing for a particular tone can be raised by the presence of another noise or another tone; this is masking. In simultaneous masking, the masking sound and the masked sound are present at the same time. The presence of the masking sound (masker) raises the threshold for the detection of another, like jamming in radio. Figure 2.3 gives an example of simultaneous masking, in which the audible threshold is modified by the presence of a single tone. Also note that the threshold is raised for tones at higher frequencies and, to some extent, at lower frequencies. Hence, when a complex input spectrum, such as music, is presented, the threshold is raised at nearly all frequencies.
Figure 2.3: Illustration of Frequency Masking with a Tone Presented [6] [60]
Temporal masking, on the other hand, refers to masking effects that extend over a short time interval around the masker. Forward masking and backward masking occur when the masking sound continues to mask sounds at lower levels before and after the masking sound's actual duration. Figure 2.4 shows this concept.
Figure 2.4: The Effect of Temporal Masking [60]
Masking depends strongly on the frequency spectrum and the temporal separation of the sounds, as well as on the tonality of the masker. A broadband noise masker with little or no phase coherence can mask sounds with levels as little as 2-6 dB below the masker. Tone-like sounds need to be much louder, requiring as much as 18-24 dB higher amplitude to mask other tones or noise, partially due to phase distortion and the appearance of difference tones.
To measure the effect of masking quantitatively, the masked threshold is usually determined. The masked threshold is the sound pressure level of a test sound (usually a sinusoidal test tone) necessary for it to be just audible in the presence of a masker. It varies with the masking sound and lies above the absolute threshold in most cases. It is identical to the threshold in quiet when the frequencies of the masker and the test sound are very different. The amount of masking can therefore be measured as the difference between the masked threshold and the absolute threshold.
2.1.3 Critical Band
There are two membranes running along the cochlea: the Basilar Membrane (BM) and Reissner's Membrane. Covered in about 5000 outer hair cells and 25,000 inner hair cells, the BM is believed to perform a crucial part of sound perception. Early work by Fletcher observed that the BM acts like a mechanical frequency analyzer [13], so that each fibre of the auditory nerve is tuned to a different frequency in the audible range. However, any pure tone input to the BM causes not just a single hair cell to fire, but a large number. If two pure tones of similar frequency are played, the response curves overlap. Thus the BM can be modeled as a bank of overlapping band-pass filters called auditory filters.
Critical bands are analogous to a spectrum analyzer with variable center frequencies. A critical band is defined as the smallest band of frequencies that activates the same part of the BM. Although experiments show that critical bands are much narrower at low frequencies than at high frequencies, the exact width of the critical bands is still uncertain. According to Zwicker [20], the bandwidth of critical bands is relatively constant below 500 Hz, but above that it increases approximately in proportion with frequency.
But Moore's measurements in terms of the Equivalent Rectangular Band (ERB) indicated narrower bandwidths, and recommended the following equation for ERB calculation:

ERB(fc) = 24.7 · (4.37 · fc/1000 + 1)    (1)

where fc is the center frequency of the critical band in Hz. Also note that in Moore's measurements, the bandwidths of the critical bands below 500 Hz are no longer constant. The theory of critical bands is an important auditory concept because it shows that the ear discriminates between energy inside the band and energy outside the band. The frequency selectivity of masking effects can also be explained in terms of critical bands. In general, a critical band is the bandwidth around a center frequency that marks a sudden change in subjective response. For example, within a critical band a louder tone can make a softer tone inaudible, resulting in simultaneous masking. Therefore, masking models are usually described in critical-band units.
2.2 Adaptive Noise Cancellation
The Adaptive Noise Canceller (ANC) [55] is a powerful technique for background noise reduction wherein a separate noise reference measurement is made. Compared to other noise reduction methods, ANC places fewer restrictions on the nature of both the signal and noise sources, and can track the variation of the environment. We will give a brief introduction to ANC in this section.
2.2.1 Adaptive Noise Cancellation
A simplified system schematic of ANC is illustrated in Figure 2.5. Generally speaking, an ANC requires two input signals: the primary noise-corrupted audio signal x(n), and the reference noise signal n1(n), which is correlated with the noise n0(n) present in the primary signal x(n).
Based on the assumption that the reference noise n1(n) is correlated only with the noise n0(n) contained in the primary signal x(n), the reference noise n1(n) is processed by an adaptive filter W to create a replica of the noise n0(n). This replica is then subtracted from x(n) to obtain a cleaner signal. The result of this subtraction, ŝ(n), is then an estimate of the original clean audio signal s(n). The filter training process is designed to minimize the power of the error signal e(n), which is the difference between the primary input and the filter output and also serves as the system output ŝ(n). This can be explained mathematically as follows. Since:

e(n) = s(n) + n0(n) − n1(n)*w(n)    (2)

where w(n) is the impulse response of the adaptive filter W, the training process will minimize the expected squared error E[|e(n)|²]:

E[|e(n)|²] = E[|s(n) + n0(n) − n1(n)*w(n)|²]
           = E[s(n)²] + 2E[s(n)·n0(n)] − 2E[s(n)·(n1(n)*w(n))] + E[(n0(n) − n1(n)*w(n))²]    (3)

If the signal s(n) is uncorrelated with the noises n0(n) and n1(n), the expression for E[|e(n)|²] reduces to:

E[|e(n)|²] = E[s(n)²] + E[(n0(n) − n1(n)*w(n))²]    (4)

The adaptive filter W has no influence on the signal s(n), so in order to minimize the squared error, the adaptive filter W can only minimize the second term on the right side of Eq. (4). As a result, the adaptive filter W adapts to produce its output n1(n)*w(n) as a least-squares estimate of the noise component n0(n) contained in x(n).

Figure 2.5: ANC System Model

There are two key factors that affect the performance of ANC: the correlation between the noise n0(n) contained in x(n) and the reference noise n1(n), and the correlation between the noise and the signal s(n).
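The least-squares adaptation in Eqs. (2)-(4) can be sketched with a plain LMS update (the dissertation itself uses the APA algorithm; LMS is substituted here only to illustrate the principle). The path g from n1(n) to n0(n), the step size, and all lengths are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

N, L = 20000, 8                  # number of samples, adaptive filter length
s  = np.sin(2 * np.pi * 0.01 * np.arange(N))   # desired signal s(n)
n1 = rng.standard_normal(N)                    # reference noise n1(n)
g  = np.array([0.8, 0.3, 0.1])                 # hypothetical path n1 -> n0
n0 = np.convolve(n1, g)[:N]                    # noise in the primary input
x  = s + n0                                    # primary input x(n)

w  = np.zeros(L)                 # adaptive filter W
mu = 0.01                        # LMS step size
e  = np.zeros(N)
for n in range(L, N):
    u = n1[n - L + 1:n + 1][::-1]   # most recent reference samples
    y = w @ u                       # replica of n0(n)
    e[n] = x[n] - y                 # error = system output s_hat(n)
    w += mu * e[n] * u              # LMS weight update

# After convergence, e(n) should be close to s(n)
resid = np.mean((e[-2000:] - s[-2000:]) ** 2)
```

Because s(n) is uncorrelated with n1(n), the filter converges toward the path g, and the residual noise power `resid` ends up far below the original noise power E[n0(n)²], in line with Eq. (4).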
Since ANC is based on the assumption that signal and noise are uncorrelated, achieving good performance usually requires the correlation between the noise and the signal s(n) to be weak, whereas the correlation between the noises n0(n) and n1(n) must be high enough. Therefore, the microphone used to measure the reference noise n1(n) is usually placed very close to the primary microphone.
2.3 Delayless Subband Filtering Structure
In reality, noise source positions and the acoustic environment are usually time-varying. The utilization of ANC techniques can help track the variations of the surrounding environment. However, the Acoustic Impulse Response of the environment typically has a large number of taps, resulting in a heavy computational burden and slow convergence rate. Subband processing [9] is known to reduce the computational cost and accelerate the convergence rate. The problem with conventional Subband Adaptive Filtering (SAF), however, is that a transmission delay is inevitable when the signal passes through the analysis/synthesis filter banks. This is particularly undesirable in real-time applications. To solve this problem, Morgan and Thi [37] proposed a Delayless Subband Adaptive Filter (DSAF) architecture, which avoids the transmission delay usually associated with conventional SAF. This section gives a brief introduction to conventional SAF and DSAF respectively.
2.3.1 Subband Adaptive Filtering (SAF)
As stated above, a typical ANC used in acoustic noise reduction requires a large number of coefficients and suffers from formidable computation and low convergence speed. Subband Adaptive Filters (SAF) have been introduced to overcome this problem by splitting the fullband signals into a number of frequency subbands. In particular, subband signals are often decimated in SAF systems.
In this way, the subband decomposition and decimation greatly reduce the update rate and the length of the adaptive filters, resulting in much lower computational complexity. In addition, this also leads to a whitening of the input signals and improved convergence behavior. A general structure of a SAF is depicted in Figure 2.6. First, the fullband signals are decomposed into K subbands using analysis filters. Then each subband adaptive filter is adapted independently. The outputs of these subbands are finally combined using a synthesis filter bank to reconstruct the fullband output.

Figure 2.6: Conventional Subband Adaptive Filtering Structure

Non-ideal filters in the analysis/synthesis filter bank cause aliasing of the subband signals. This aliasing can be cancelled in the synthesis bank when certain conditions are met by the synthesis filters and the subband processing. However, in-band aliasing is still present in the SAF input signals and reduces the system's performance. In the case of critical sampling, where the decimation ratio N equals the number of uniform subbands K, the use of adaptive cross-filters between adjacent subbands or a gap filter bank [17] [59] can help suppress this aliasing, but also leads to reduced performance or significant signal distortion. Oversampled SAFs with N < K, on the other hand, offer a simplified structure that can reduce the aliasing level in the subbands without employing any cross-filters or gap filter banks. Our following discussion will therefore focus on the oversampled subband structure.
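The complexity saving from decimation can be seen with a rough operation count. The figures below are order-of-magnitude only and deliberately ignore the cost of the analysis/synthesis filter banks themselves (an assumption made for illustration):

```python
# Rough multiply-accumulate (MAC) count per fullband sample for an
# adaptive filter of M taps, run fullband vs. split into K subbands
# decimated by N (2x oversampled: N = K/2). Filter-bank cost ignored.
M, K = 1024, 32
N = K // 2                      # decimation factor

macs_fullband = 2 * M           # filtering + LMS-style update per sample
Ls = M // N                     # subband filter length after decimation
macs_subband = K * 2 * Ls / N   # K subbands, each updated every N samples

speedup = macs_fullband / macs_subband
```

For these (hypothetical) parameters the saving factor is N²/K, which is why oversampled subband structures pay off for the long adaptive filters needed to model room responses.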
2.3.2 Delayless Subband Filtering Structure

A problem with conventional SAF is that the analysis/synthesis filter banks also introduce transmission delays into the signal path. To overcome this problem, the Delayless Subband Adaptive Filter (DSAF) was introduced by Morgan [37]. In contrast to the conventional SAF structure, which requires both analysis and synthesis filters, the Delayless SAF avoids synthesis filter banks by employing a weight transformation technique to transform the subband filter weights into fullband filter weights. The transmission delay is thereby eliminated, because the signal for canceling the desired signal is computed by the fullband filter. Figure 2.7 shows a general block diagram of the Delayless SAF. Observe that there are two critical components required to construct such a Delayless SAF: the analysis filter banks and the weight transformation block. The delayless subband filter can operate either in an open-loop mode, in which the local error signal is used to update the subband adaptive weights, or in a closed-loop mode, where the fullband error signal is employed instead. Generally speaking, the open-loop scheme gives less noise suppression, as the algorithm works blindly with respect to the real error signal. The closed-loop scheme, on the other hand, can attenuate the noise to a larger extent, but it also exhibits a slower convergence rate, because a delay is introduced into the weight update path. This is a minor concern in ANC applications with slowly varying environments; but for applications like Acoustic Echo Cancellation (AEC), in which convergence speed is very important, the open-loop scheme is more suitable.
[Figure 2.7: Block Diagram of Delayless Subband Adaptive Filtering Structure. Position A: closed-loop scheme; Position B: open-loop scheme [37]]

2.4 Spectral Subtraction

Most single-channel noise reduction algorithms proposed in the past decades derive from the spectral subtraction algorithm [4], which, according to Malca et al. (1996), has become "almost standard in noise reduction". We call these spectral-subtraction-based algorithms "subtractive-type algorithms". This section illustrates how spectral subtraction and its derivatives work.

2.4.1 Spectral Subtraction

The spectral subtraction algorithm is based on the fact that the statistical properties of audio signals, especially speech, are only stationary over short periods of time, whereas the noise is assumed to be stationary over much longer periods and uncorrelated with the audio signal. Because of the short-time stationarity of speech, the processing has to be done on a frame-by-frame basis. In addition, spectral subtraction is usually performed in a transformed domain, e.g. the frequency domain. A common transform is the Fourier transform, which provides an equidistant frequency resolution. There are others, such as the wavelet transform with non-equidistant spectral resolution, but they are not considered here. In the simplest form of the spectral subtraction algorithm, a noisy signal is overlap-partitioned into short time frames of some milliseconds and subsequently transformed to the frequency domain. An estimated noise magnitude spectrum, detected in speech pauses, is then subtracted from each noisy magnitude spectrum. The noise-reduced spectra are finally transformed back to the time domain using the unchanged phase of the noisy signal and overlap-added to obtain the "cleaned" audio signal.
In this case, the phase of the audio signal is not processed because we assume that phase distortion is not easily perceived by the human ear. A basic system for spectral subtraction is presented in Figure 2.8.

[Figure 2.8: Block Diagram of Spectral Subtraction. The magnitude |X(ω)| of the frequency-domain transform of x(n) feeds the noise spectrum estimation and spectral subtraction blocks, while the phase arg[X(ω)] is reused by the inverse frequency-domain transformation.]

• Basic Idea

Consider an audio signal s(n) corrupted by an additive background noise n(n). The observation x(n) can be expressed by:

x(n) = s(n) + n(n)    (5)

By using short-time Fourier transforms, the corresponding spectrum of the noisy signal is denoted as:

X(ω) = S(ω) + N(ω)    (6)

Hence, if the noise magnitude spectrum |N(ω)| can be estimated as |N̂(ω)|, the magnitude spectrum estimate of the clean audio signal can be obtained by:

|Ŝ(ω)| = |X(ω)| − |N̂(ω)|    (7)

Once the subtraction is computed in the spectral domain with Eq.(7), the enhanced audio signal ŝ(n) is obtained as:

ŝ(n) = IFFT{ |Ŝ(ω)| · e^{j·arg(X(ω))} }    (8)

• Noise Estimation

Estimating the noise spectrum is one of the major tasks of a noise reduction system. Basically, there exist two ways of estimating the noise spectrum. One possible solution is to use a Voice Activity Detector (VAD): the average noise magnitude spectrum or power spectral density (PSD) can be estimated during speech pauses. In this case, the VAD plays a critical role in attaining a high level of noise reduction performance. But in environments with a high amount of noise power, it is very hard to detect the speech pauses, and a very robust detector is required to avoid estimation errors. The VAD also has problems detecting unvoiced phonemes contained in the speech signal.
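A minimal sketch of Eqs. (5)-(8) follows. The frame size, hop, window choice, and the half-wave rectification of negative magnitudes are illustrative choices of ours, not prescriptions from the text:

```python
import numpy as np

def spectral_subtract(x, noise_mag, frame=256, hop=128):
    # frame-wise FFT, subtract the estimated noise magnitude (Eq. (7)),
    # reuse the noisy phase (Eq. (8)), and overlap-add back to time domain
    win = np.hanning(frame)
    out = np.zeros(len(x))
    norm = np.zeros(len(x))
    for start in range(0, len(x) - frame + 1, hop):
        seg = x[start:start + frame] * win
        X = np.fft.rfft(seg)
        mag = np.maximum(np.abs(X) - noise_mag, 0.0)  # clip negative values
        S = mag * np.exp(1j * np.angle(X))            # keep noisy phase
        out[start:start + frame] += np.fft.irfft(S) * win
        norm[start:start + frame] += win ** 2
    return out / np.maximum(norm, 1e-12)
```

With `noise_mag` averaged over noise-only frames, this attenuates stationary broadband noise while leaving strong tonal components largely intact.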
The other solution considers the different temporal characteristics of the speech and noise PSDs: it is possible to obtain a noise magnitude or PSD estimate by tracking spectral minima in a sliding window covering several frames [33]. Accordingly, slow changes in the noise spectrum can be followed even while the speaker is active. However, the problem with these techniques is that they usually capture signal energy during speech periods, thus degrading the quality of the compensated speech signal. Besides, to get a reliable and accurate noise power estimate, the frame size, the window size and the smoothing parameter need to be chosen very carefully. For these reasons, it is usually better to use an efficient VAD in most noise suppression systems and applications.

2.4.2 Generalized Spectral Subtraction

Basic spectral subtraction shows limited performance in terms of speech quality. A serious disadvantage is the remaining unnatural-sounding residual noise called "musical noise", which sounds artificial and disturbing to the listener. These artifacts are due to randomly distributed spectral peaks in the residual noise. Furthermore, signal distortion is another disadvantage that the spectral subtraction algorithm suffers from. Therefore, various solutions have been developed to reduce these unpleasant effects, including the introduction of a spectral floor and over-subtraction of the noise.

The basic spectral subtraction rule in Eq.(7) may be interpreted as a spectral weighting of the noisy speech signal,

Ŝ(ω) = G(ω) · X(ω)    (9)

where G(ω) = |Ŝ(ω)| / |X(ω)| is the weighting function. This weighting function can be regarded as a kind of adaptive frequency-domain filter, and depends on the spectra of both the estimated noise and the noisy signal.
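The minima-tracking idea of [33] can be sketched as below. The first-order smoothing constant and the window length are illustrative values of ours; practical estimators also apply a bias compensation that is omitted here:

```python
import numpy as np

def track_noise_psd(frames_psd, alpha=0.85, win=8):
    # frames_psd: T x B array of per-frame periodograms. Smooth over time,
    # then take the minimum within a sliding window of past frames as the
    # noise PSD estimate for each bin.
    smoothed = np.zeros_like(frames_psd)
    noise = np.zeros_like(frames_psd)
    for t in range(len(frames_psd)):
        prev = smoothed[t - 1] if t > 0 else frames_psd[0]
        smoothed[t] = alpha * prev + (1 - alpha) * frames_psd[t]
        lo = max(0, t - win + 1)
        noise[t] = smoothed[lo:t + 1].min(axis=0)
    return noise
```

Because the minimum ignores short bursts of speech energy, the estimate follows slow changes of the noise floor even while the speaker is active, which is exactly the property described above.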
Hence, the noise reduction problem is converted to the problem of estimating the noise-free spectrum by applying real-valued, signal-dependent weights to the noisy-signal spectrum. To reduce the musical noise, we consider the following modified weighting function, into which we introduce an over-subtraction factor and a spectral floor:

G(ω) = [1 − α(|N̂(ω)| / |X(ω)|)^γ]^{1/γ}   if (|N̂(ω)| / |X(ω)|)^γ < 1/(α + β)
G(ω) = [β(|N̂(ω)| / |X(ω)|)^γ]^{1/γ}       otherwise    (10)

This is one of the most flexible forms of subtractive-type algorithm, called the generalized spectral subtraction algorithm [53]. Through the over-subtraction factor and the spectral floor, it seeks a tradeoff among the noise reduction level, musical noise and speech distortion:

• Over-subtraction factor α: By employing an over-subtraction factor α > 1, the short-time spectrum is attenuated more than necessary, which leads to a reduction of residual noise peaks but also to increased audible distortion.

• Spectral floor β: By introducing a spectral floor 0 < β ≪ 1, we can prevent negative values from occurring in the spectrum and mask the residual noise accordingly. This leads to a reduction of residual noise peaks but also to a decreased level of background noise reduction.

• Exponent γ = γ1 = 1/γ2: The exponent γ determines the sharpness of the transition from G(ω) = 1 to G(ω) = 0. γ = 1 yields the magnitude subtraction algorithm, while γ = 2 represents the power subtraction rule.

2.5 Objective Measures

Speech quality measurements provide a good way to evaluate the performance of various acoustic noise reduction techniques and speech enhancement algorithms. In general, the methods for assessment of speech quality fall into two classes: subjective and objective measures.
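The weighting function of Eq. (10) can be sketched directly; the default parameter values below are illustrative, not recommendations from the text:

```python
import numpy as np

def gss_gain(X_mag, N_mag, alpha=2.0, beta=0.01, gamma=2.0):
    # generalized spectral subtraction gain (Eq. (10)):
    # over-subtract by alpha, floor at beta, exponent gamma
    ratio = (N_mag / np.maximum(X_mag, 1e-12)) ** gamma
    over = 1.0 - alpha * ratio          # main subtraction branch
    floor = beta * ratio                # spectral-floor branch
    return np.where(ratio < 1.0 / (alpha + beta), over, floor) ** (1.0 / gamma)
```

Note how the floor branch guarantees a small positive gain where over-subtraction would drive the spectrum negative, which is precisely the musical-noise masking described above.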
Subjective measures involve humans listening to the clean/recovered signals and then assigning a rating, which makes them costly and time-consuming. Hence, in recent years there has been increasing interest in objective measurements that correlate well with subjective speech quality. Such real-time, accurate, and economical objective measurements open up a wide range of applications that cannot be supported with subjective listening tests. In this section, we give a brief introduction to several commonly used objective measurements.

2.5.1 Segmental Signal-to-Noise Ratio (S-SNR)

Since the correlation of plain SNR with subjective quality is very poor, it is of little interest in this study [44]. Frame-based segmental SNR, instead, is a more reasonable objective measurement widely used in assessing noise suppression performance; it compares the two waveforms in the time domain. It is defined as the average of the SNR values over short segments as below:

S_SNR = (1/L) Σ_{l=0}^{L−1} 10·log10 [ Σ_{n=0}^{N−1} s²(n + Nl) / Σ_{n=0}^{N−1} (s(n + Nl) − x(n + Nl))² ]    (11)

where L represents the number of frames in the signal and N the number of samples per frame; s(n) is the original audio signal and x(n) is the distorted audio signal. The frame length is typically set to 15 to 20 ms for speech signals. In most noise reduction applications, we are more interested in the segmental SNR improvement than in the segmental SNR itself. The SNR improvement can be defined as the difference between the segmental SNR of the output recovered signal and that of the input noisy signal, as shown in the following equation:

S_SNR Improvement = Seg_SNR(Recovered Signal) − Seg_SNR(Noisy Signal)    (12)
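Eqs. (11)-(12) translate directly into code; the default frame length of 160 samples (20 ms at 8 kHz) is an illustrative choice consistent with the range quoted above:

```python
import numpy as np

def seg_snr(s, x, N=160):
    # Eq. (11): average per-frame SNR in dB over L full frames
    L = len(s) // N
    vals = []
    for l in range(L):
        sig = np.sum(s[l * N:(l + 1) * N] ** 2)
        err = np.sum((s[l * N:(l + 1) * N] - x[l * N:(l + 1) * N]) ** 2)
        vals.append(10.0 * np.log10(sig / np.maximum(err, 1e-12)))
    return float(np.mean(vals))

def seg_snr_improvement(s, noisy, recovered, N=160):
    # Eq. (12)
    return seg_snr(s, recovered, N) - seg_snr(s, noisy, N)
```

Averaging the per-frame log ratios (rather than the ratios themselves) is what makes the segmental SNR less dominated by a few high-energy frames than the global SNR.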
Segmental SNR is simple to compute and easy to implement, but it has been criticized as a poor estimator of subjective audio quality, which limits its use in many real-world applications.

2.5.2 Enhanced Itakura Distance Measure (E-ID)

Unlike segmental SNR, which works in the time domain, the Itakura Distance measure (ID) works in the frequency domain, capturing the discrepancy between the power spectrum of the distorted signal X(ω) and that of the original audio signal S(ω). Based on the definition of the Itakura-Saito distortion measure in [24], the Itakura Distance measure is given by:

ID = log [ (1/2π) ∫_{−π}^{π} ( |A_x(ω)|² / |A_s(ω)|² ) dω ]    (13)

where

A_s(ω) = A_s(z)|_{z=e^{jω}} = 1 + Σ_{i=1}^{L} a_s(i) e^{−jωi}    (14)

and A_x(ω) is defined analogously; a_s(i) and a_x(i) are the i-th LPC prediction coefficients of the L-th order LPC models for the original signal s(n) and the distorted signal x(n) respectively [25].

However, the performance of the Itakura Distance measure usually deteriorates when the processed signals are heavily degraded, for example when the signal has a very low SNR. This led to a modification of the original Itakura Distance measurement: the Enhanced Itakura Distance [7]. The Enhanced Itakura measure, which incorporates the masking properties of the human auditory system, is known to offer a more consistent indication of the subjective quality of a speech signal. Similar to the way masking effects are incorporated in speech coding, noise spectral components below the noise masking threshold are excluded from the calculation of the E-Itakura Distance measure because they are inaudible.
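Before turning to the enhanced variant, the baseline of Eqs. (13)-(14) can be sketched as follows: LPC coefficients via the autocorrelation method (Levinson-Durbin), then the log of the mean ratio of the two inverse-filter spectra on a dense frequency grid. The model order and FFT grid size are illustrative choices of ours:

```python
import numpy as np

def lpc(x, order):
    # autocorrelation method + Levinson-Durbin recursion; returns [1, a(1..p)]
    r = np.array([np.dot(x[:len(x) - i], x[i:]) for i in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[1:i][::-1])
        k = -acc / err
        a_prev = a.copy()
        a[i] = k
        a[1:i] = a_prev[1:i] + k * a_prev[1:i][::-1]
        err *= (1.0 - k * k)
    return a

def itakura_distance(s, x, order=10, nfft=512):
    # Eq. (13): log mean of |A_x(w)|^2 / |A_s(w)|^2 over a frequency grid
    As = np.fft.rfft(lpc(s, order), nfft)
    Ax = np.fft.rfft(lpc(x, order), nfft)
    return float(np.log(np.mean(np.abs(Ax) ** 2 / np.abs(As) ** 2)))
```

Because A_s minimizes the ratio for its own spectrum, the distance is zero when the two signals share the same LPC model and grows as the distorted spectrum departs from it.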
As a result, the E-Itakura Distance measure on the p-th frame is given by the following equation:

E_ID = log [ (1/2π) ∫_{−π}^{π} M(ω,p) · ( |A_x(ω,p)|² / |A_s(ω,p)|² ) dω ]    (15)

where M(ω,p) is the masking matrix on the p-th frame, obtained from the noise masking threshold Th(ω,p) using the following rule:

If P_x(ω,p) < Th(ω,p), then M(ω,p) = 0
If P_x(ω,p) ≥ Th(ω,p), then M(ω,p) = 1    (16)

Here P_x(ω,p) is the short-term power spectrum of the noisy or distorted signal x(n) on the p-th frame. In [24], the Enhanced Itakura Distance measure is compared with the original Itakura Distance; the results show that its correlation with subjective evaluation ratings of speech quality is improved by 0.16.

2.5.3 Weighted Spectral Slope Measure (WSS)

The Weighted Spectral Slope measure works in the perceptual domain to evaluate signal quality. In [44], Quackenbush found that, compared to other measures based on aural models, Klatt's Weighted Spectral Slope measure (WSS) yields the best results in predicting subjective speech quality. WSS also shows a reasonably high correlation (ρ = 0.74) with subjective quality tests.

In the WSS measure, the signals are divided into 25 critical bands and the corresponding spectra are obtained to simulate the human auditory system [28]. Taking into account the fact that spectral variation, and especially the peak locations, plays an important role in human perception of audio quality, a metric based on weighted spectral slope differences near the peaks is computed and compared in each critical band. The spectral slopes in the i-th critical band are computed as:

SL_s(i) = S(i + 1) − S(i)
SL_x(i) = X(i + 1) − X(i)    (17)

where S(i) and X(i) are the spectra in decibels in the i-th critical band. We can see that SL_s(i) and SL_x(i) are in fact the first-order slopes of these spectra.
Next, a weight function W(i) for the slope difference observed in the i-th critical band is obtained as follows:

W(i) = (W_s(i) + W_x(i)) / 2    (18)

Here W_s(i) and W_x(i) are two weighting functions derived separately from the spectra of the two signals to be compared, s(n) and x(n). The weighting functions W_s(i) and W_x(i) require knowledge of the maximum output over all channels and of the output of the nearest peak to each channel. In [28], Klatt suggested an easy approach to compute them based on the information in S(i) and X(i). Finally, the WSS can be calculated using the following equation:

WSS = Σ_{i=1}^{25} W(i) · [SL_s(i) − SL_x(i)]²    (19)

CHAPTER 3
DELAYLESS SUBBAND ADAPTIVE NOISE SUPPRESSION

In this chapter, we propose an efficient two-stage adaptive noise reduction scheme to suppress additive noise and convolutive noise successively. In the first step, a simple but effective Adaptive Noise Canceller based on the Affine Projection Algorithm is employed to eliminate the additive noise. The resulting signal is then fed into the second stage, a blind deconvolution processor, which dereverberates the "wet" signal using a Higher Order Statistics (HOS) based blind deconvolution technique, the Constant Modulus Algorithm (CMA). The use of adaptive techniques in our approach makes it possible to track real-time variations of the surrounding environment.

The impulse response of the acoustic environment typically has a large number of taps, which causes a heavy computational burden and a slow convergence rate. Subband processing can help save computational cost and accelerate convergence. But the problem with conventional Subband Adaptive Filtering (SAF) is that a delay is inevitable after the signal goes through the analysis/synthesis filter banks. This is particularly undesirable in real-time applications.
To solve this problem, the Delayless Subband Adaptive Filter (DSAF) is employed, which not only avoids the transmission delay but also improves system performance compared to the conventional SAF. However, the classical Delayless SAF usually adopts DFT or GDFT based analysis filter banks, in which the resulting subband signals are "complex-valued". This introduces additional complexity when it is integrated with the APA/CMA algorithms, as implementing APA/CMA with complex-valued signals is more complicated than with real-valued signals. Therefore, a "Real-valued" Delayless SAF structure is designed and integrated with the APA and CMA algorithms in both steps to reduce the acoustic noise efficiently without any transmission delay. In particular, we design and implement a Real-valued Causal Delayless Subband APA to remove the additive background noise in the first stage and a Real-valued Non-causal Delayless Subband CMA to suppress the convolutive noise in the second step. The corresponding weight transformations are also derived respectively. The use of the Delayless SAF eliminates the inherent delay of conventional SAF systems while still maintaining good performance. Meanwhile, by applying Single Side Band (SSB) modulation to the complex GDFT filter banks, we can easily generate real-valued subband signals, which simplifies the implementation of APA/CMA on each subband and reduces the computational complexity of the whole system. Therefore, our system is more efficient and well suited for real-time applications.

The chapter is organized as follows. In Section 3.1, we present the overall system structure in brief. Section 3.2 outlines how to build the Real-valued DSAF architecture. Section 3.3 describes the APA-based Causal Real-valued Delayless Subband ANC, which is used to reduce the additive noise.
In Section 3.4 we present the implementation of the Non-causal Real-valued Delayless Subband CMA, which is employed to suppress the convolutive noise. Section 3.5 analyzes the complexity of our proposed method and compares it with that of the conventional complex-valued DSAF. The simulation results that demonstrate the efficiency of our proposed method are presented in Section 3.6. Finally, Section 3.7 gives a summary and draws conclusions.

3.1 Overview

As stated above, in a noisy environment the audio signal picked up by a microphone can be represented as x(n) = s(n)*g(n) + n0(n), where s(n) is the clean audio signal we expect, n0(n) is the ambient noise, and g(n) is the impulse response of the acoustic environment. To extract the "clean" signal s(n), we use two adaptive techniques: an Affine Projection Algorithm based ANC to reduce the additive noise n0(n), and a HOS-based blind deconvolution method, the Constant Modulus Algorithm, to dereverberate the room effect g(n). Furthermore, these two techniques are combined with Real-valued Delayless Subband Filtering in order to obtain improved performance with lower computational complexity. Figure 3.1 depicts the overall system structure.

[Figure 3.1: Block Diagram for Delayless Subband Adaptive Noise Suppression. The input u(n) ≈ s(n)*g(n), together with the reference noise, passes through real-valued analysis filter banks into the APA-based ANC stage, followed by the CMA stage with its nonlinear function.]

3.2 Real-valued Delayless Subband Adaptive Filtering (RDSAF)

To eliminate the transmission delay and to improve performance, the Delayless SAF is widely used in various applications. But the problem with the conventional Delayless SAF using DFT or GDFT filter banks is that the resulting subband signals are "complex-valued".
This introduces additional complexity when combined with the APA algorithm: implementing APA with complex-valued signals is more complicated than with real-valued signals. As for CMA, although a Delayless SAF with complex-valued subband signals requires approximately the same number of multiplications as with real-valued subband signals, the implementation with complex-valued signals is more complicated. To make the approach more efficient, we propose an Oversampled Real-valued Delayless Subband Adaptive Filtering structure, and integrate it with the APA and CMA respectively. To construct a Real-valued Delayless Subband Adaptive Filter, we must consider two critical components: the analysis filter banks used to decompose the signal into real-valued subbands, and the weight transformation used to transform the subband adaptive weights into equivalent fullband filter coefficients.

3.2.1 Real-Valued Single-Sideband (SSB) Analysis Filter Banks

To obtain real-valued oversampled filter banks, we can use either non-uniform filter banks [8] or Single-Sideband (SSB) modulated analysis filter banks [52]. The former scheme typically needs to deal with different subsampling rates. Therefore, in our approach we adopt the SSB filter banks.

SSB filter banks can be obtained by post-processing the complex-valued Generalized DFT (GDFT) filter banks. The GDFT transform pair is defined as:

Y_k^{GDFT} = Σ_{n=0}^{K−1} y(n) W_K^{−(k+k0)(n+n0)},  k = 0, 1, ..., K−1
y(n) = (1/K) Σ_{k=0}^{K−1} Y_k^{GDFT} W_K^{(k+k0)(n+n0)},  n = 0, 1, ..., K−1    (20)

where K is the total number of subbands, and W_K = e^{j(2π/K)}. The analysis filter bank equation based on the GDFT is then:

X_k^{GDFT}(m) = Σ_{n=−∞}^{∞} h(mN − n) x(n) W_K^{−(k+k0)(n+n0)},  k = 0, 1, ..., K−1    (21)

where h(n) is the lowpass prototype filter for the analysis filter banks, N is the decimation factor, and Lp is the length of the prototype filter h(n).
In this paper, we only discuss the case k0 = 1/2 and n0 = 0. The SSB signal X_k^{SSB}(m) can then be expressed in terms of the GDFT signal as:

X_k^{SSB}(m) = Re[ X_k^{GDFT}(m) · e^{j·ω_Δ·mN/2} ]    (22)

where ω_Δ = π/N is the bandwidth of the prototype filter h(n). Obviously, the decimation factor for real-valued SSB filter banks is N = π/ω_Δ. Compared to the decimation factor for complex-valued GDFT filter banks, N_c = 2π/ω_Δ, N is approximately half of N_c, i.e. N ≈ N_c/2, because of the real operation in SSB. The implementation of the SSB filter banks for the k-th channel is shown in Figure 3.2. As such, several efficient methods to implement GDFT filter banks can be applied directly to the implementation of SSB filter banks. In this paper, we adopt the efficient polyphase implementation of the GDFT filter banks proposed in [9], where the decimation is done before the subband filtering. To avoid aliasing, the filter banks are oversampled by a factor of half the decimation factor for complex-valued filter banks, i.e. the subband signals are decimated by a factor N = N_c/2.

[Figure 3.2: Illustration of Real-valued SSB Filter Bank Implementation. (a): Real-valued SSB filter bank analyzer based on the GDFT filter bank for the k-th channel; (b): spectra of the prototype filter, the k-th channel of the GDFT analysis filter bank, and the SSB analysis filter bank.]

3.2.2 Prototype Filter Design

From the above analysis, we can see that the design of the real-valued single-sideband filter banks eventually reduces to the design of the prototype filter h(n). The performance of the DSAF therefore depends highly on the properties of the prototype filter h(n).
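A single-channel sketch of the GDFT analysis equation and the SSB real-part operation of Eq. (22) follows. The prototype here is a plain windowed-sinc lowpass, and all sizes (K, N, filter length) are illustrative stand-ins, not the design developed below:

```python
import numpy as np

def ssb_subband(x, h, k, K, N):
    # GDFT analysis with k0 = 1/2, n0 = 0: demodulate channel k to baseband,
    # lowpass with the prototype h, decimate by N, then take the SSB real
    # part of Eq. (22).
    n = np.arange(len(x))
    demod = x * np.exp(-2j * np.pi * (k + 0.5) * n / K)
    v = np.convolve(demod, h)[: len(x)]
    sub = v[::N]
    m = np.arange(len(sub))
    w_delta = np.pi / N                        # prototype bandwidth
    return np.real(sub * np.exp(1j * w_delta * m * N / 2))

# illustrative prototype: windowed sinc with cutoff pi/N
K, N = 8, 4
taps = np.arange(-32, 32)
h = (np.sinc(taps / N) / N) * np.hanning(64)
```

Feeding a tone centered in one channel should produce a real-valued subband signal with most of its energy in that channel and little leakage elsewhere.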
During the prototype filter design for a conventional SAF, we usually need to overcome three kinds of distortion: aliasing, magnitude distortion and phase distortion. However, by avoiding the synthesis filter banks, the DSAF eliminates the magnitude distortion that occurs in traditional SAF. Consequently, the design of the prototype filter h(n) becomes less stringent, and the constraints are reduced to: (1) zero phase distortion, which can be achieved by a linear-phase prototype filter; (2) low aliasing, which can be realized by forcing the filter attenuation in the stopband, ω ∈ [π/N_c, π], to be sufficiently large. As a result, the desired prototype filter h(n) for a Real-valued DSAF is in fact a linear-phase lowpass filter with normalized passband cut-off frequency ω_p ≈ π/K and stopband cut-off frequency ω_s ≤ π/N_c. Since it is important for the prototype filter to have a flat passband response and a highly attenuated stopband, we present an efficient constrained least squares method to design a linear-phase FIR lowpass filter.

Recall that a Type II FIR filter H(e^{jω}) can be expressed through its amplitude response as follows:

H(e^{jω}) = e^{−jω(Lp−1)/2} · A(ω)    (23)

Here, the amplitude response is given by A(ω) = Σ_{n=0}^{Lp/2−1} 2·h(Lp/2 + n)·cos[(n + 0.5)ω]. To minimize the stopband mean energy of the prototype filter, which is closely associated with the aliasing effect among the subbands, we define the squared error as:

ε = (1/(π − ω_s)) ∫_{ω_s}^{π} |H(e^{jω})|² dω = (1/(π − ω_s)) ∫_{ω_s}^{π} (A(ω))² dω    (24)

Based on this definition, the prototype filter with minimum aliasing effect can be designed by minimizing ε in Eq.(24) with the constraint that the passband response is flat.
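The amplitude-response identity of Eq. (23) can be verified numerically. The helper below (names are ours) evaluates A(ω) directly from the taps of an even-length symmetric filter and checks that its magnitude agrees with the frequency response:

```python
import numpy as np

def type2_amplitude(h, w):
    # A(w) = sum_{n=0}^{Lp/2-1} 2 h(Lp/2 + n) cos[(n + 0.5) w]
    # for an even-length, symmetric (Type II) linear-phase filter h
    half = len(h) // 2
    b = h[half:]
    n = np.arange(half)
    return np.array([np.sum(2 * b * np.cos((n + 0.5) * wi)) for wi in w])

# check |H(e^{jw})| = |A(w)| for a random symmetric even-length filter
rng = np.random.default_rng(5)
b = rng.standard_normal(8)
h = np.concatenate([b[::-1], b])     # enforce Type II symmetry
w = np.linspace(0, np.pi, 50)
H = np.array([np.sum(h * np.exp(-1j * wi * np.arange(len(h)))) for wi in w])
```

Since H(e^{jω}) differs from A(ω) only by the linear-phase factor e^{−jω(Lp−1)/2}, the two magnitudes must coincide exactly.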
Thus, the constrained minimization problem can be formulated as:

minimize ε  subject to:  |H(e^{jω})| ≈ 1 for ω ∈ [0, ω_p]  ⇔  |A(ω)| ≈ 1 for ω ∈ [0, ω_p]    (25)

Consider two sets of frequencies, Ω_p = {ω_0^p, ω_1^p, ..., ω_{kp−1}^p} and Ω_s = {ω_0^s, ω_1^s, ..., ω_{ks−1}^s}, dense and uniform frequency grids on the passband [0, ω_p] and the stopband [ω_s, π] respectively. We can write the corresponding A(ω) on Ω_p and Ω_s in matrix form as:

A_p = [A(ω_0^p), A(ω_1^p), ..., A(ω_{kp−1}^p)]^T = C_p · b
A_s = [A(ω_0^s), A(ω_1^s), ..., A(ω_{ks−1}^s)]^T = C_s · b    (26)

Here C_p and C_s are k_p × (Lp/2) and k_s × (Lp/2) matrices, whose coefficients are:

C_p(i, j) = 2·cos[ω_i^p (j − 0.5)]  and  C_s(i, j) = 2·cos[ω_i^s (j − 0.5)]    (27)

The vector b consists of the filter coefficients of the desired prototype filter h(n): b = [h(Lp/2), h(Lp/2 + 1), ..., h(Lp − 1)]^T. Based on the above expressions, the evaluation of the stopband energy on Ω_s leads to the following quadratic function:

ε = (1/(π − ω_s)) ∫_{ω_s}^{π} |H(e^{jω})|² dω ≈ (1/k_s) · b^T (C_s^T C_s) b    (28)

while the constraint on a flat passband leads to the linear equation:

C_p · b = 1_{kp×1} = [1, 1, ..., 1]^T    (29)

As a result, the constrained minimization problem of Eq.(25) reduces to a linearly constrained minimization of a quadratic function:

minimize (1/2) b^T (C_s^T C_s) b  subject to  C_p · b = 1_{kp×1}    (30)

This linearly constrained least squares optimization problem can be solved by the Lagrangian multiplier method.
The Lagrangian function for our problem is:

L(b, λ) = (1/2) b^T Q_s b + λ^T (C_p b − 1_{kp×1}),  where Q_s = C_s^T · C_s    (31)

By setting the derivatives of L(b, λ) with respect to b and λ to zero, we find the solution to this minimization problem:

b = Q_s^{−1} C_p^T (C_p Q_s^{−1} C_p^T)^{−1} · 1_{kp×1}    (32)

On the other hand, since Q_s = C_s^T · C_s, the elements of the matrix Q_s can be expressed as:

Q_s(k, n) = ∫_{ω_s}^{π} 2·cos[(n − 1/2)ω]·cos[(k − 1/2)ω] dω = ∫_{ω_s}^{π} cos[(k − n)ω] dω + ∫_{ω_s}^{π} cos[(k + n − 1)ω] dω = q(k − n) + q(n + k − 1)    (33)

where q(k) = ∫_{ω_s}^{π} cos(kω) dω = π·sinc(k) − ω_s·sinc(kω_s/π).

Recognizing that the first term q(k − n) in Q_s(k, n) constructs a symmetric Toeplitz matrix while the second term q(n + k − 1) constructs a Hankel matrix, this special Toeplitz-plus-Hankel structure of Q_s gives rise to lower memory requirements for storing the matrix and to efficient algorithms for solving the problem.

Furthermore, Merchant and Parks [36] provided a method for solving this special Toeplitz-plus-Hankel structure with a complexity of O(N²). This makes our proposed prototype filter design method much more efficient. Figure 3.3 gives an example of the resulting prototype filter's frequency response for a 16-band real-valued Delayless SAF and compares it with the filter designed by the Parks-McClellan algorithm. We can see that the frequency magnitude response in the passband is quite flat, while the attenuation in the stopband is very high, more than 20 dB higher than that of the filter designed by the Parks-McClellan algorithm.

[Figure 3.3: Frequency Response of a Prototype Filter Example. (a): designed by the Remez function; (b): designed by our method. Both panels show magnitude response versus normalized frequency (Nyquist = 1).]

3.2.3 Weight Transformation

Weight transformation is another critical factor that affects the performance of the Delayless SAF.
Morgan suggested an FFT-stacking method to implement the weight transformation for complex-valued DSAF. Assume the fullband adaptive filter has L taps, the input/error signals are separated into K subbands, and each subband adaptive filter is M taps long. In the FFT-stacking procedure, an M-point FFT is first applied to each set of subband filter weights. The resulting DFT coefficients are then stacked into an L-element array according to certain stacking rules. Finally, the fullband filter impulse response is obtained by applying an IFFT to the L-element array. In [23], it is shown that this weight transformation can be interpreted as constructing the fullband filter impulse response by passing the subband filter coefficients through a synthesis filter bank as follows:

W(z) = Σ_{k=0}^{K−1} F_k(z) W_k(z^N)    (34)

where each synthesis filter F_k(z) can be expressed as a sum of frequency-shifted copies of an interpolation kernel f(z),

F_k(z) = Σ_{l∈B_k} f(z·W_L^{−l})    (35)

with W_L = e^{j2π/L} and B_k the set of roughly L/K DFT bins covered by subband k (for the interior subbands the range runs from l = (2k−1)L/(2K)+1 to l = (2k+1)L/(2K)−1, with the corresponding one-sided ranges at k = 0 and k = K/2); the complete case-by-case expressions for F_k(z) and f(z) are given in [23].

When the FFT-stacking method is interpreted as a synthesis filter bank, there are many deep nulls in the passbands of the synthesis filters, which deteriorates the system performance significantly. To correct this, Huo proposed a modified method that sets the stacked coefficients according to the frequency response of the proper subband filter, referred to as the FFT-2 weight transform [23]. Instead of an M-point FFT, a 2M-point FFT is computed first, followed by stacking rules that map those DFT coefficients into a fullband L-tap filter. The diagram of the FFT-2 weight transform is shown in Figure 3.4. The corresponding synthesis filters for FFT-2 are given as:
F_k(z) = Σ_{l∈B′_k} f′(z·W_{2L}^{−l})    (36)

where f′(z) is the interpolation kernel corresponding to the doubled (2M-point) FFT grid and B′_k collects the fullband bins covered by subband k on that grid; the exact bin ranges parallel those of Eq. (35) and are given in [23]. Compared to the original FFT-stacking weight transformation, the FFT-2 weight transform shows a significant improvement in performance.

In our approach, using the idea behind the FFT-2 weight transformation, we design new stacking rules for both the Real-valued Delayless Subband APA and the Real-valued Delayless Subband CMA. Note that the Real-valued Delayless Subband APA in the first stage employs causal FIR subband adaptive filters, whereas the Real-valued Delayless Subband CMA in the second step utilizes non-causal FIR subband adaptive filters. Therefore, two different stacking rules are designed for the Causal Real-valued DSAF and the Non-causal Real-valued DSAF separately. The following two sections give a brief introduction to these two cases respectively.

[Figure 3.4: Diagram of FFT-2 Weight Transformation. Each subband's weights undergo a 2L/N-point FFT; frequency stacking rules assemble the 2L/N fullband coefficients W, followed by an IFFT that discards the last L samples.]

3.3 Causal Real-valued Delayless Subband ANC

In our two-step adaptive noise reduction scheme, the Adaptive Noise Canceller used in the first step is a simple and efficient technique for reducing additive noise. To implement this adaptive noise canceller, the Affine Projection Algorithm (APA) is selected as the adaptive algorithm, because it provides improved performance over the NLMS algorithm and lower computational complexity than the RLS algorithm. Furthermore, since wideband adaptive noise cancellation often involves adaptive filtering with hundreds or even thousands of taps, we integrate it with Real-valued Delayless Subband Processing to achieve improved performance with zero delay. The real-valued subband signals also make the implementation of APA more efficient.
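Before detailing the algorithm, a minimal fullband sketch of APA-driven system identification may help fix ideas. The projection order, step size, regularizer and the toy system below are illustrative; this is the plain APA, not the fast variant or the subband version developed in this chapter:

```python
import numpy as np

def apa_update(w, X, d, mu=0.5, delta=1e-6):
    # X: P x M matrix whose rows are the last P input vectors,
    # d: the corresponding P desired samples; delta regularizes the inverse
    e = d - X @ w
    w = w + mu * X.T @ np.linalg.solve(X @ X.T + delta * np.eye(len(d)), e)
    return w, e

# toy use: identify a 4-tap FIR system driven by colored noise
rng = np.random.default_rng(1)
h_true = np.array([0.5, -0.3, 0.2, 0.1])
x = np.convolve(rng.standard_normal(2000), [1.0, 0.8, 0.4])[:2000]
d = np.convolve(x, h_true)[:2000]
M, P = 4, 4
w = np.zeros(M)
for t in range(M + P, 2000):
    X = np.array([x[t - p - M + 1:t - p + 1][::-1] for p in range(P)])
    w, _ = apa_update(w, X, d[t - P + 1:t + 1][::-1])
```

The projection onto the span of the last P input vectors is what lets APA keep an NLMS-like cost while converging much faster on colored inputs, the trade-off cited above.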
3.3.1 Affine Projection Algorithm

The Normalized Least Mean Square (NLMS) algorithm is a simple, stable, and low-complexity adaptive filtering technique. However, its performance deteriorates seriously when the input signals are colored. The Recursive Least Squares (RLS) algorithm can speed up the convergence rate considerably for colored signals, but with very high computational complexity [20], which limits its usage in practical applications. Compared to the NLMS and RLS algorithms, the Affine Projection Algorithm provides a good trade-off between convergence speed and computational cost. In our approach, we use an APA-based delayless subband ANC in the first step to suppress the ambient noise. The details of APA are described in Appendix A.

The computational requirement for a traditional APA is (2NP + C·P²) multiplications per sample, where C is a constant determined by the complexity of the inverse matrix computation required in (22). However, when Fast APA is used and the subband signals are real-valued, the computational requirement in this step can be reduced to (2M + 20P) real multiplications per sample [16]. Thus the computational cost is reduced.

3.3.2 Weight Transforming Rules for Causal Real-valued Delayless Subband ANC

We have shown in the second section that the FFT-2 weight transformation provides better performance in Delayless Subband Processing. We therefore design a new stacking rule based on the FFT-2 weight transformation for the causal RDSAF, which can be expressed as:

    W¹(l) = W¹_k((l)_{L/K}), with k = ⌊l·K/L⌋,    for l ∈ [0, L)
    W¹(l) = 0,                                     for l = L
    W¹(l) = W¹(2L − l)*,                           for l ∈ (L, 2L)        (37)

where L is the length of the fullband filter and is assumed to be divisible by the least common multiple of K and 2N. W¹(l) and W¹_k(l) stand for the lth FFT coefficient of the fullband filter and of the kth subband filter respectively. Here ⌊a⌋ denotes the closest integer smaller than a, and (a)_b means 'a modulus b'.

3.4 Non-causal Real-valued Delayless Subband Blind Deconvolution

After processing by the first-stage Real-valued Delayless Subband ANC, most additive noise can be eliminated without any transmission delay, but the resulting signal is still reverberant. To suppress this convolutive noise, we feed the reverberant signal into a delayless subband blind deconvolution processor in the second stage. The Constant Modulus Algorithm (CMA), a blind deconvolution technique, is utilized to extract the "dry" signal. As in the first step, we combine CMA with the real-valued delayless subband filtering architecture to avoid transmission delay and save computational cost. Also note that the spectrum of the signal becomes flatter on each subband, which results in reduced sensitivity to noise.

3.4.1 Constant Modulus Algorithm (CMA)

Figure 3.5: Architecture of CMA (the adaptive filter output is passed through a memoryless nonlinear function, whose output drives an LMS-type update of the filter)

CMA is a well-known algorithm for blind equalization of LTI systems. It exploits the Higher Order Statistics (HOS) of the received signal indirectly and is simple to implement. With reasonable computational cost, CMA can achieve good results for slowly time-varying systems. Figure 3.5 gives the architecture of the CMA algorithm, and Appendix B gives the details of its implementation.

Let u(n) be the reverberant audio signal at the output of the Delayless Subband ANC in the first step. Assuming perfect adaptation of the ANC, u(n) can be approximated as the clean signal s(n) convolved with the room impulse response g(n), i.e., u(n) ≈ g(n)*s(n). CMA is then applied on each subband signal u_k(n) decomposed by the SSB-modulated filter banks.
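To make the CMA loop of Figure 3.5 concrete, here is a minimal real-valued sketch of the update w(n+1) = w(n) + μ·(R₂·y(n) − y(n)³)·u(n); the tap count, step size, and unit-tap initialization are illustrative assumptions, and Appendix B describes the actual implementation.

```python
import numpy as np

def cma(u, taps=16, mu=1e-3, R2=1.0):
    """Sketch of a real-valued CMA equalizer: filter, then the memoryless
    nonlinearity error e = R2*y - y^3, then an LMS-style tap update."""
    w = np.zeros(taps)
    w[0] = 1.0                            # unit-tap initialization (assumption)
    y = np.zeros(len(u))
    for n in range(taps, len(u)):
        x = u[n - taps + 1:n + 1][::-1]   # regressor, newest sample first
        y[n] = w @ x
        e = R2 * y[n] - y[n] ** 3         # constant-modulus error term
        w += mu * e * x
    return w, y
```

On a sub-Gaussian source passed through a mild channel, such a loop drives the output modulus toward the constant R₂; for super-Gaussian audio, as discussed in Chapter 4, the plain update is far less reliable.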
Since the estimate ŝ(n) is assumed to exhibit Bussgang statistics, CMA is actually a member of the Bussgang family of algorithms. Hence it shares the common drawbacks of Bussgang algorithms.

3.4.2 Weight Transforming Rules for Non-causal Delayless Subband CMA

When combining the conventional CMA with the Real-valued DSAF, we need a weight transformation to convert the subband adaptive coefficients into fullband weights. However, unlike the causal APA adaptive filters used in the first stage, the adaptive filters of CMA are non-causal. Therefore, to overcome the non-causal effect, we first shift the subband adaptive filter coefficients to causal ones before applying the weight transformation, and then shift them back after the weight transformation. The diagram of the weight transformation for the Non-causal Real-valued DSAF is shown in Figure 3.6.

Figure 3.6: Block Diagram of the Non-causal Weight Transforming Process for Delayless Subband CMA (4M-point FFTs of the subband weight vectors are frequency-stacked into a K·M-point fullband array W; after the IFFT, only the first 2·KM/4 + 1 samples are retained)

The stacking rules can accordingly be designed as follows:

    W²(l) = W²_k((l)_M + M), with k = ⌊l/M⌋,    for l ∈ [0, K·M/2)
    W²(l) = W²_{K/2}(M + 1),                     for l = K·M/2
    W²(l) = W²(K·M − l)*,                        for l ∈ (K·M/2, K·M)        (38)

Therefore, the fullband filter has 2·KM/4 + 1 = KM/2 + 1 taps, while each subband adaptive filter has 2M+1 taps. Here, we assume the decimation factor N = K/4. W²(l) and W²_k(l) stand for the lth FFT coefficients of the fullband filter and of the kth subband filter respectively. Again, ⌊a⌋ denotes the closest integer smaller than a, and (a)_b means 'a modulus b'.
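The shift/transform/shift bookkeeping of Figure 3.6 can be sketched as follows. This is a hedged sketch: the array layout encodes the causal shift, the bin mapping stands in for the exact rule of Eq. (38), and the final circular shift approximates the "shift back" step.

```python
import numpy as np

def noncausal_weight_transform(subband_w):
    """Sketch of the non-causal weight transform of Figure 3.6.

    subband_w: (K, 2M+1) real subband weights with taps ordered [-M, ..., M].
    Treating array index 0 as time 0 is exactly the M-sample causal shift
    applied before the transform; the shift is undone at the end.
    Returns a fullband filter with K*M/2 + 1 taps.
    """
    K, taps = subband_w.shape
    M = (taps - 1) // 2
    Wk = np.fft.fft(subband_w, n=4 * M, axis=1)    # 4M-point FFT per subband
    KM = K * M
    full = np.zeros(KM, dtype=complex)
    for l in range(KM // 2):
        full[l] = Wk[l // M, (l % M) + M]          # illustrative stand-in for (38)
    full[KM // 2 + 1:] = np.conj(full[1:KM // 2][::-1])  # real-filter symmetry
    w_full = np.real(np.fft.ifft(full))[:KM // 2 + 1]    # keep first KM/2+1 taps
    return np.roll(w_full, -M)     # undo the causal shift (circular, for simplicity)
```

For K = 8 subbands of 2M+1 = 33 taps each, the result is a real fullband filter of KM/2 + 1 = 65 taps.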
3.5 Analysis of Computational Complexity

To show the efficiency of the proposed real-valued DSAF noise suppression approach, we perform a brief analysis of its computational complexity in this section. Since Morgan has given a detailed analysis of the computational requirements for complex-valued Delayless SAF and compared it with that of conventional SAF in [37], we focus only on the comparison between the conventional complex-valued DSAF and our proposed real-valued DSAF. Our analysis is based on the conclusion made in Section 3.1 that N ≈ N_c/2 holds approximately.

According to [37], the computational load required for a Delayless SAF, whether with real-valued or complex-valued signals, can be divided into four components: subband decomposition, subband adaptive filtering (APA and CMA in this work), weight transformation, and the signal-path convolution with the fullband filter W. Since the computational complexity of the last two components is the same for the real-valued and complex-valued DSAF, we concentrate our analysis on the first two parts.

• Subband decomposition: Let R1^c denote the computational requirement of complex-valued subband decomposition and R1^r denote that of real-valued delayless subband decomposition. For a GDFT analysis filter bank, if a polyphase FFT implementation is used, we need

    R1^c ≈ (L_p + K·log₂K + 4K)/N_c        (39)

real multiplications per input sample [23], where L_p represents the length of the prototype filter h(n). For the corresponding real-valued SSB filter banks, SSB modulation requires an additional K/N multiplications per sample. Since N ≈ N_c/2, we conclude that the real-valued SSB banks require approximately twice the computation of the GDFT filter banks, i.e., R1^r ≈ 2·R1^c.
However, compared with the computational savings in the second part (see the analysis below), especially the computations required for APA, this overhead is small and can be ignored.

• Subband adaptive filtering: Considering the APA used in the first step, if Fast APA is used and the subband signals are real-valued, the computational requirement in this step is (2M + 20P) real multiplications per sample [16]. Here M is the order of the adaptive filter, P is the projection order, and K is the number of subbands. As each complex multiplication requires four real multiplications, it is easy to obtain the computational requirements for the real-valued and complex-valued DSAF:

    R2,APA^r = (2L/N + 20P)·K/2        (40)

    R2,APA^c = (4L/N_c + 40P)·K/2      (41)

Replacing N_c with 2N, we get the ratio between them:

    R2,APA^c / R2,APA^r = [(2L/N + 40P)·K/2] / [(2L/N + 20P)·K/2]        (42)

In ANC applications, when a small projection order P is chosen, we have R2,APA^c ≈ R2,APA^r, which means that the processing complexity is approximately equal in the real-valued and complex-valued cases. On the other hand, when a larger projection order P, such as 8 or 16, is used, which is common in practical noise cancellation applications, the term 20P (or 40P) is much larger than 2L/N. In this case P becomes the dominant factor, and we have R2,APA^c ≈ 2·R2,APA^r.

As for the real-valued delayless subband CMA in the second step, it is easy to show that the computational complexity on each subband is of order O(2M+1) ≈ O(2M), where 2M+1 is the subband filter length of the real-valued DSAF. To achieve the same filtering performance, due to the doubled decimation factor N_c ≈ 2N, the complex-valued DSAF only needs M+1 filter taps, about half the subband filter length required for the real-valued DSAF.
Considering the fact that one complex multiplication requires four real multiplications, the computational complexity of the real- and complex-valued DSAF for CMA can be estimated as follows:

    R2,CMA^r ≈ (K/N)·O(2M)        (43)

    R2,CMA^c ≈ (K/N_c)·4·O(M+1) ≈ (K/N_c)·O(4M)        (44)

Replacing N_c with 2N, we get the ratio between them:

    R2,CMA^c / R2,CMA^r ≈ [(K/2N)·O(4M)] / [(K/N)·O(2M)] ≈ 1        (45)

In summary, the real-valued approach is more efficient when combined with delayless subband APA, because its computational cost can be highly independent of the adaptive filter length L when P is large. As for delayless subband CMA, the computational complexity is almost equal between the real-valued and complex-valued DSAF. Based on the above analysis, we conclude that our proposed real-valued delayless subband noise reduction scheme outperforms its complex-valued counterpart with respect to computational cost.

3.6 Simulation Results

To demonstrate the effectiveness of our proposed method, we simulated the following three scenarios.

3.6.1 Complex-Valued DSAF vs. Real-valued DSAF

In the first experiment, using a simple system identification simulation, we compared the performance of our proposed real-valued Delayless SAF with that of the conventional complex-valued Delayless SAF for both open-loop and closed-loop schemes. For further comparison, the corresponding classical SAFs are also applied to the same system identification problem. In our simulations, the unknown system is modeled by an impulse response of 256 taps, obtained from an exponentially decaying set of random values between ±1. The ambient additive noise to the system is colored air-conditioner noise. The signals are decomposed into 32 subbands, and the decimation factors for the complex-valued DSAF and real-valued DSAF are 16 and 8 respectively.
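The convergence metric used in these system identification experiments, the Normalized Mean Squared Error, can be computed as in the following sketch; the exact normalization used in the experiments is an assumption here.

```python
import numpy as np

def nmse_db(w_est, w_true):
    """NMSE in dB between an estimated and a true impulse response:
    10*log10(||w_true - w_est||^2 / ||w_true||^2) (assumed definition)."""
    err = w_true - w_est
    return 10.0 * np.log10(np.sum(err ** 2) / np.sum(w_true ** 2))
```

An all-zero estimate gives 0 dB, and an estimate whose coefficients are all within 10% of the true values gives -20 dB, so deeper (more negative) curves indicate better identification.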
The resulting convergence curves, showing the Normalized Mean Squared Error (NMSE) for the open-loop and closed-loop schemes, are plotted in Figure 3.7. Note that, in general, real-valued subband processing presents performance comparable to complex-valued subband processing for both the classical SAF and the Delayless SAF. Moreover, compared to the conventional SAF, the Delayless SAF clearly shows improved performance in terms of convergence rate and asymptotic residual error. In particular, the closed-loop delayless SAF can achieve a much lower asymptotic residual error than the open-loop scheme, because the feedback signal used in the closed-loop scheme is the wideband error, which permits the adaptive filters to partly compensate for aliasing of the subband signals. However, the delay introduced into the weight update path in the closed-loop scheme also reduces its convergence rate, making the convergence of the closed-loop Delayless SAF slower than that of the open-loop scheme, as shown in Figure 3.7. In our application, since the environment varies slowly, this is of minor concern.

Figure 3.7: Convergence Curves — (a) Complex-Valued SAF/DSAF, (b) Real-Valued SAF/DSAF (NMSE in dB versus samples, comparing the conventional SAF with the open-loop and closed-loop Delayless SAF in each case)

3.6.2 Non-causal Delayless Subband CMA Simulations

In this simulation, we use an artificially reverberated piece of music u₁(n) to test the performance of our proposed non-causal delayless subband CMA.
u₁(n) is generated by convolving a piece of anechoic music s₁(n) with an artificial Acoustic Impulse Response (AIR) g₁(n) (Figure 3.8(a) and Figure 3.8(b)). We then apply the Non-causal Delayless Subband CMA to dereverberate u₁(n). The number of fullband filter taps L is set to 1024. The adaptations are performed in K = 32 subbands with a maximal decimation rate N = 8. The final estimate of the clean signal ŝ₁(n) output by the subband CMA is illustrated in Figure 3.8(c).

Figure 3.8: Simulation Results for Non-causal Delayless Subband CMA — (a) Original Music and its Spectrogram, (b) Reverberant Music and its Spectrogram, (c) Recovered Music and its Spectrogram

3.6.3 Results of Simulated Acoustic Noise Suppression

In the third experiment, a studio environment is simulated with one signal source, a piece of real recorded anechoic speech s₂(n), and one noise source, a piece of real air-conditioner noise n₀(n). The original clean speech s₂(n) and the colored air-conditioner noise are shown in Figure 3.9(a) and Figure 3.9(b) respectively. We use a real room AIR g₂(n) in this experiment to create the reverberant speech signal u₂(n) = s₂(n)*g₂(n), and then add the air-conditioner noise to this reverberant speech. The resulting noisy and reverberant signal x(n) = s₂(n)*g₂(n) + n₀(n) is plotted in Figure 3.9(c).

• Results of delayless subband ANC: In the first step, we try to remove the additive air-conditioner noise n₀(n) using the real-valued delayless subband ANC. The number of fullband filter taps L₁ is chosen as 512, and the adaptive noise cancellation is performed in K₁ = 32 subbands with a decimation factor of N₁ = 8.
The SNR of the reverberant signal s₂(n)*g₂(n) to the air-conditioner noise n₀(n) is set to -10 dB. To trade off performance against computational cost, we selected a projection order of P = 4 for the APA algorithm. After applying our proposed real-valued delayless subband ANC to the noisy and reverberant signal x(n), we obtain the resulting signal û₂(n) shown in Figure 3.9(d). It is obvious that in û₂(n) most of the air-conditioner noise has been removed. The result demonstrates that our proposed real-valued delayless subband ANC works very well at canceling additive background noise, even when the additive air-conditioner noise is colored and non-stationary.

• Results of delayless subband CMA: In this step, the real-valued non-causal delayless subband CMA is applied to û₂(n) for dereverberation. The number of fullband filter taps L₂ is now chosen as 1024. The adaptations are again performed in K₂ = 32 subbands with a decimation rate N₂ = 8. The final estimate of the clean signal ŝ₂(n) is illustrated in Figure 3.9(e). CMA's performance on dereverberation is not as good as we had expected. We believe that the coloration and non-stationarity of the speech signal and the non-minimum-phase nature of the room impulse response are the two major reasons. Nevertheless, the performance of our proposed real-valued non-causal Delayless Subband CMA is comparable to that of the fullband CMA.

3.7 Conclusion

In this chapter, we proposed a two-stage Real-valued Delayless Subband Adaptive Noise Reduction scheme to suppress additive noise and convolutive noise successively. The system implementation details were presented and the computational complexity was analyzed.
The simulation results show that our proposed real-valued delayless subband ANC in the first step succeeds in reducing most of the additive background noise, even highly colored background noise. The non-causal real-valued delayless subband CMA in the second step presents performance comparable to the fullband CMA in deconvolving the reverberant signals. Moreover, compared to its complex-valued counterpart, the real-valued delayless SAF in our approach not only helps eliminate the transmission delay but also makes the overall system more efficient. Therefore, our proposed method is well suited for real-time acoustic noise reduction applications.

Figure 3.9: Simulation Results for Single-Channel Acoustic Noise Suppression — (a) Original Clean Speech Signal, (b) Air-Conditioner Noise, (c) Noisy Speech with Both Air-Conditioner Noise and Reverberation, (d) Output Signal of DSANC with Air-Conditioner Noise Reduced, (e) Final Output Signal of DSCMA with Both Air-Conditioner Noise and Convolutive Noise Suppressed

              Noisy and Reverberant   Reverberant Speech    Final Recovered Speech
              Speech                  after Subband ANC     after Dereverberation
    ID        0.336                   0.237                 0.355
    WSSD      80.84                   40.26                 57.05

Table 3.1: Objective Measurements for Monaural Delayless Subband Noise Suppression

CHAPTER 4
IMPROVEMENT ON MONAURAL BLIND DEREVERBERATION

In Chapter 3, we proposed a single-channel dereverberation strategy whose core is a blind deconvolution algorithm, the CMA algorithm. The computer simulation results, however, show that this method works unsatisfactorily when deconvolving audio signals.
Investigating the algorithm, we notice that two factors limit its performance when audio signals are involved.

• It is well known that CMA performs well for sub-Gaussian signals with negative kurtosis. However, most audio signals are not sub-Gaussian but super-Gaussian, with positive kurtosis.

• When applying the CMA algorithm, we assume that the input signal is white, i.e., an i.i.d. sequence with zero mean and unit variance. But audio signals are usually colored signals with internal temporal correlation.

To overcome these two restrictions, we investigate the modified CMA algorithm in this chapter and propose an enhanced single-channel dereverberation strategy based on it. In particular, the modified CMA algorithm, which is believed to be able to deconvolve signals with super-Gaussian distributions, operates on the LP residual of the reverberant speech signal. The resulting adaptive filter coefficients are then transferred to the time domain to dereverberate the speech signal. The block diagram of this modified monaural blind dereverberation approach is shown in Figure 4.1.

Figure 4.1: Block Diagram of the Modified Monaural Dereverberation Method (the reverberant input u(n) is pre-whitened by LP analysis; an adaptive filter w(n) is updated through a nonlinear function g(·) and error e(n), and a copy of w(n) is applied to the reverberant speech itself)

4.1 LP Residual

The LP residual of a speech signal can be obtained from standard Linear Prediction Analysis. Applying LP analysis, we can express the current speech sample as a linearly weighted sum of the past p samples, as denoted by Eq. (46):

    ŝ(n) = -Σ_{k=1}^{p} a_k·s(n-k)        (46)

where {a_k}, k = 1, 2, ..., p, are the LP coefficients. The LP residual is then given by:

    r(n) = s(n) - ŝ(n) = s(n) + Σ_{k=1}^{p} a_k·s(n-k) = Σ_{k=0}^{p} a_k·s(n-k),  with a₀ = 1        (47)
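The LP coefficients of Eq. (46) and the residual of Eq. (47) can be computed as in the following sketch, which uses the autocorrelation (Yule-Walker) method; the order p and the solver are illustrative choices.

```python
import numpy as np

def lp_residual(s, p=12):
    """LP analysis sketch: solve the Yule-Walker normal equations for the
    coefficients a_k of Eq. (46), then form the residual of Eq. (47)."""
    acf = np.correlate(s, s, mode='full')[len(s) - 1:]
    R = np.array([[acf[abs(i - j)] for j in range(p)] for i in range(p)])
    a_tail = np.linalg.solve(R, -acf[1:p + 1])    # a_1 .. a_p
    a = np.concatenate(([1.0], a_tail))           # a_0 = 1
    r = np.convolve(s, a)[:len(s)]                # r(n) = sum_k a_k * s(n-k)
    return a, r
```

Applied frame by frame to speech, `r` is the pre-whitened LP residual that is fed to the blind deconvolution stage.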
Compared to the time-domain speech signal, its LP residual has the following two features that make it more appropriate for our application.

• It is well known that Linear Prediction Analysis can decorrelate, or pre-whiten, the speech signal. Therefore, the resulting LP residual is whiter, with less internal correlation. Figure 4.2 compares the power spectrum of a speech segment and its LP residual; the LP residual is less "colored", with a flatter spectrum. Since Bussgang-family algorithms achieve their best performance when the input signal is an i.i.d. sequence, applying the modified CMA algorithm to the LP residual instead of the original time-domain signal helps improve its blind deconvolution performance.

Figure 4.2: Power Spectrum Density Plots — (a) PSD of the Speech Signal, (b) PSD of the Corresponding LP Residual

• In previous studies, researchers have found that the LP residual of a speech signal is farther from a Gaussian distribution than the clean speech is. In other words, the LP residual of a speech signal has a higher kurtosis than the speech signal itself, where the kurtosis of a signal is defined as:

    kurt(s) = E{s⁴} / (E{s²})² - 3        (48)

This fact can be verified by Figure 4.3, in which the kurtosis of a reverberant speech segment and that of its LP residual are compared at different reverberation levels. Since CMA-type algorithms fail to work for Gaussian signals, we expect the input signal to be as far as possible from a Gaussian distribution to yield better performance. As a result, the LP residual is more suitable as the input signal to CMA-type algorithms.
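The kurtosis contrast that motivates this choice is easy to reproduce. The sketch below uses the excess-kurtosis definition (zero for a Gaussian); the synthetic Laplacian and uniform distributions stand in for super- and sub-Gaussian signals and are not taken from the thesis experiments.

```python
import numpy as np

def excess_kurtosis(x):
    """kurt(x) = E{x^4} / (E{x^2})^2 - 3, zero for a Gaussian signal."""
    x = x - np.mean(x)
    return np.mean(x ** 4) / np.mean(x ** 2) ** 2 - 3.0

rng = np.random.default_rng(0)
gauss = rng.standard_normal(100_000)       # kurtosis near 0
laplace = rng.laplace(size=100_000)        # super-Gaussian: kurtosis near +3
uniform = rng.uniform(-1, 1, 100_000)      # sub-Gaussian: kurtosis near -1.2
```

Speech behaves like the super-Gaussian case, and its LP residual is spikier still, which is why the residual is the preferred input for CMA-type algorithms.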
Figure 4.3: Kurtosis at Different Reverberation Levels (kurtosis of the speech signal and of its LP residual, plotted against the distance to the microphone in meters)

Due to the above two reasons, we choose to operate the blind deconvolution on the LP residual of the speech signal instead of on the time-domain speech itself. We would then need an LP synthesis filter to synthesize the dereverberated LP residual back into a time-domain signal, which might cause LP reconstruction artifacts. Knowing that for small adaptation rates the CMA-type blind deconvolution system is linear, we can instead apply the blind deconvolution filter obtained from the LP residual directly to the reverberant speech signal, as shown in Figure 4.1.

4.2 Modified CMA Algorithm

It is well known that Bussgang-type adaptive algorithms, including the CMA algorithm, work well when deconvolving sub-Gaussian communication signals. However, for super-Gaussian signals, such as most audio signals, these methods often work poorly or even diverge. Heinz Mathis proposed a modified Bussgang blind deconvolution approach in [34][35], trying to enable it to deconvolve impulsive signals such as audio signals. In this chapter, we investigate this modified algorithm and combine it with the Real-valued DSAF structure to improve the performance of monaural blind dereverberation.

The update functions for Bussgang-type algorithms can be written as a difference of two polynomials:

    w(n+1) = w(n) + μ·(g₁(y) - g₂(y))·u(n)        (49)

with:

    g₁(y) = α·sign(y)·|y|^p        (50)

    g₂(y) = sign(y)·|y|^q        (51)

It is easy to see that the CMA algorithm is just a special case of the Bussgang-type algorithm, with p = 1 and q = 3, updated using the following equation:

    w(n+1) = w(n) + μ·(R₂·y(n) - y(n)³)·u(n)        (52)
However, for super-Gaussian signals, the Bussgang-type algorithms generally fail to provide local stability. Heinz Mathis therefore suggested a modification to stabilize the Bussgang-type algorithms in [35]. The goal of the modified Bussgang-type algorithms is to alter the coefficient updates so as to drive the length of the coefficient vector w to a constant value. Terms proportional to powers of the L₂-norm of the coefficient vector w are inserted, resulting in modified update equations of the form:

    w(n+1) = w(n) + μ·(||w(n)||^p̄·g₁(y(n)) - ||w(n)||^q̄·g₂(y(n)))·u(n)        (53)

where ||w||² = wᵀw and p̄ and q̄ are even-valued integers. To limit the computational overhead caused by the modification, we can alternatively compute a scalar-only update of ||w(n)||² using:

    ||w(n+1)||² = ||w(n)||² + 2μ·(||w(n)||^p̄·g₁(y(n)) - ||w(n)||^q̄·g₂(y(n)))·y(n)
                  + μ²·(||w(n)||^p̄·g₁(y(n)) - ||w(n)||^q̄·g₂(y(n)))²·r(n)        (54)

with r(n) updated as:

    r(n) = r(n-1) + u(n)² - u(n-L-1)²        (55)

Here L is the order of the adaptive filter. The design goal for this modified algorithm is therefore to choose p̄ and q̄ such that the corresponding algorithm is locally stable. A sufficient stability condition is given by:

    ||w_opt||² < (q̄ - p̄) / [ q̄·(1 - E{s^{q+1}}/E{|s|^{q+1}}) - p̄·(1 - E{s^{p+1}}/E{|s|^{p+1}}) ]        (56)

In our application, we are particularly interested in the modified CMA algorithm. By setting p = 1, q = 3, and q̄ = 0, we derive the update equation for the modified CMA algorithm as:

    w(n+1) = w(n) + μ·(||w(n)||^p̄·R₂·y(n) - |y(n)|²·y(n))·u(n)        (57)

According to [34], the modified algorithm allows for the deconvolution of super-Gaussian signals as long as the optimum coefficient vector is not overly long.
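A single step of the modified CMA update of Eq. (57) can be sketched as follows; the real-valued form, the step size, and the even norm exponent `p_bar` are assumptions for illustration.

```python
import numpy as np

def modified_cma_step(w, x, mu=1e-4, R2=1.0, p_bar=2):
    """One modified-CMA update (assumed real-valued form of Eq. (57)):
    the ||w||^p_bar factor on the R2*y term counteracts the shrinking of w
    that makes plain CMA unstable for super-Gaussian inputs."""
    y = w @ x
    norm_p = np.dot(w, w) ** (p_bar / 2.0)    # ||w||^p_bar, with p_bar even
    w_new = w + mu * (norm_p * R2 * y - y ** 3) * x
    return w_new, y
```

Note that when the output already sits on the modulus (||w|| = 1 and y² = R₂), the bracketed term vanishes and the weights are left unchanged, which is the fixed-point behavior the stabilization aims to preserve.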
Therefore, the non-causal real-valued DSAF we proposed in Chapter 3 is integrated with the modified CMA algorithm. In this way, we reduce not only the optimum filter order in each subband, but also the overall system's implementation complexity.

4.3 Simulation Results

To evaluate the performance of this approach, we simulate a reverberant speech signal by convolving a dry speech segment with a real recorded Room Impulse Response (RIR) truncated to 256 taps. The signal is split into 64 subbands. In each subband, the modified CMA algorithm is applied with update parameter μ = 0.001. The signal after processing by this Delayless Subband Modified CMA is plotted in Figure 4.4. For further comparison, we also applied the CMA-based monaural dereverberation approach previously proposed in Section 3.4 to this reverberant signal. The results in terms of the LP kurtosis, Enhanced ID, and WSSD objective measurements are compared in Table 4.1.
Since the conventional CMA algorithm usually works poorly when deconvolving the audio signals with super-Gaussian distribution, the modified CMA algorithm is employed instead, which includes a norm factor into the coefficient update equation to make it stable even for audio signals with super-Gaussian distribution. Meanwhile, in order to de-correlate the input audio signal and improve the performance, the modified CMA is applied to the LP residual of input speech, instead of the time domain speech signal. We further combine the modified CMA algorithm with the Non- causal Real-valued DSAF structure to reduce its computational complexity and improve its performance. Computer simulations and comparison results show that the employment of modified CMA outperforms the previous suggested CMA based approach to a limited degree. 71 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 10 (a): Original Dry Speech 2 3 4 5 6 x 104 (b):Reverberant Speech (c): Speech Signal after Dereverberation Figure 4.4: Simulation Results for Modified CMA based Dereverberation 72 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. LP Kurtosis Enhanced ID WSSD Wet Signal 7.8429 0.2918 56.8048 by original CMA 9.0115 0.2606 49.7540 by Modified CMA 9.5173 0.2816 44.7338 Table 4.1: Objective Measurements for Speech Dereverberation Performance Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. CHAPTER 5 BINAURAL ADDITIVE NOISE REDUCTION In this chapter, we propose two efficient binaural Additive Noise Reduction (ANR) schemes, both of which try to combine the merits of binaural analysis and some commonly-used noise reduction algorithms together to reduce the colored binaural noises. 
In both schemes, the conventional Voice Activity Detector (VAD) is replaced by a simplified binaural model, in order to obtain more accurate voice activity information even at low SNR. This, together with several other modifications that account for the non-uniform spectrum of most real-world noise, enables both approaches to achieve enhanced performance in reducing highly colored binaural noise in low-SNR environments.

In the first binaural ANR approach we propose, a simple but effective perceptually weighted spectral subtraction algorithm based on masking threshold computation is applied to the left and right channels respectively. Band-specific over-subtraction factors and spectral floors, determined by the SNR in each frame, are also introduced into the subtraction rules to further reduce musical noise and signal distortion. The second binaural ANR scheme is built upon the integration of Subband Adaptive Noise Cancellation (ANC) with the simplified binaural model. Two different subband processing methods are introduced and compared to reduce the implementation complexity and improve the noise reduction performance. Intermittent ANC is then applied to selected subbands to suppress the colored binaural noise adaptively.

The chapter is organized as follows: Section 5.1 describes the overall system structure. Section 5.2 describes how to implement the simplified binaural model to detect speech/noise segments. In Section 5.3, we explain the additive binaural noise reduction scheme using the spectral subtraction technique. Section 5.4 shows how to integrate ANC with the binaural model to suppress additive binaural noise. Finally, comparisons and discussion are presented in Section 5.5.
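For reference, the over-subtraction rule with a spectral floor that underlies the first scheme can be sketched as follows. In this sketch the over-subtraction factor `alpha` and floor `beta` are fixed scalars and the perceptual weighting is omitted, whereas the proposed scheme makes them band- and SNR-dependent.

```python
import numpy as np

def spectral_subtract(noisy, noise_psd, alpha=4.0, beta=0.01, nfft=256, hop=128):
    """Sketch of magnitude-squared spectral subtraction with overlap-add.
    noise_psd: per-bin noise power estimate (length nfft//2 + 1)."""
    win = np.hanning(nfft)
    out = np.zeros(len(noisy))
    for start in range(0, len(noisy) - nfft, hop):
        frame = noisy[start:start + nfft] * win
        spec = np.fft.rfft(frame)
        power = np.abs(spec) ** 2
        # over-subtract the noise estimate, but never go below the floor
        clean = np.maximum(power - alpha * noise_psd, beta * power)
        spec = np.sqrt(clean) * np.exp(1j * np.angle(spec))   # keep noisy phase
        out[start:start + nfft] += np.fft.irfft(spec, n=nfft) * win
    return out
```

In the binaural scheme, `noise_psd` would be updated only in the frames that the binaural model labels as noise-only, which is where the improved VAD pays off.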
5.1 Problem Formulation

Figure 5.1: Binaural Noisy Signal Recording System

In binaural systems, we have two-channel audio signals recorded in the left and right ear canals respectively. Based on the assumption that the signal source and the noise source are located at different spatial positions, a typical binaural noisy signal recording system can be illustrated as in Figure 5.1. To eliminate the effect of the noise source on the binaural recordings, in this work we provide two different approaches which combine two popular noise reduction methods with binaural processing to suppress binaural additive noise.

5.2 Simplified Binaural Model Acting as VAD

The VAD plays a crucial role in many speech processing applications, such as subtractive-type noise reduction algorithms. Traditional VAD algorithms are based on measurements of energy levels, zero crossing rates [45], pitch, cepstral features [19], and periodicity [51]. Unfortunately, these algorithms perform poorly in low-SNR situations (e.g., SNR < 5 dB), especially when the noise is non-stationary. It is well known that binaural cues can be used to analyze auditory scenes. For example, one of the most popular binaural auditory models is the Lindemann-Gaik processor [15][30]. The binaural excitation patterns it produces give information on both the spatial and temporal distribution of the sound sources. This implies that it is possible to use this information to detect the speech/silence segments when the speech and noise signals are spatially separated. However, conventional binaural models usually require a very large amount of computation to analyze the Interaural Time Difference (ITD) and Interaural Level Difference (ILD).
In our scheme, we employ a simplified binaural model [18] instead, which can generate most of the characteristics of binaural excitation patterns, but with greatly reduced computational complexity. Derived from the sophisticated Lindemann-Gaik model, the simplified model is also an interaural running short-time cross-correlator. The simplified model processes only the signal peaks with its running cross-correlator to decrease the amount of computation. The block diagram of the simplified model is depicted in Figure 5.2.

Figure 5.2: Simplified Binaural Model Structure Acting as VAD

5.2.1 Pre-processing Stage

First, the peripheral auditory processing, including the cochlea and the nerve fibres, is simulated in the pre-processing stage. As we stated in the previous section, cochlear filtering can be modeled by a bank of band-pass filters. In order to better model the cochlear frequency resolution and selectivity of the auditory periphery, many researchers use filter banks inspired by the auditory system, which have non-uniform bandwidths and non-uniform spacing of center frequencies. The auditory filter banks employed in this work are Gammatone filters, developed by Patterson et al. [43], as their implementations are efficient. A Gammatone filter is the product of a rising polynomial, a decaying exponential function, and a cosine wave. It can be described with the following formula:

gammatone(t) = a t^{n-1} e^{-2\pi b t} \cos(2\pi f_c t + \phi)    (58)

where n is the order of the filter, f_c is the center frequency of the critical band, and a, b \in \mathbb{R} are constants.
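Eq. (58) can be sampled directly to obtain a discrete impulse response. The snippet below is a minimal sketch; the bandwidth b = 125 Hz, amplitude a = 1, zero phase, order n = 4, sampling rate 16 kHz, and 25 ms duration are illustrative assumptions, not values fixed by the text.

```python
import numpy as np

def gammatone_ir(fc, fs=16000, n=4, b=125.0, a=1.0, dur=0.025):
    """Sampled version of Eq. (58):
    g(t) = a * t**(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t + phi), with phi = 0."""
    t = np.arange(int(dur * fs)) / fs
    return a * t**(n - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)

# Centre frequency of the 5th critical band used later in the text.
g = gammatone_ir(fc=640.0)
```

The t**(n-1) factor makes the response start from zero and rise before the exponential decay takes over, which is the characteristic gammatone envelope.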
A possible discrete fourth-order Gammatone filter is a simple 8th-order recursive digital filter with 4 zeros and 8 poles (these 8 poles consist of one conjugate pole pair raised to the fourth power), and can be seen as a cascade of four different filter stages. Each of these four stages is a second-order (biquadratic) digital filter. The Z-domain formula for such a fourth-order Gammatone filter is:

Gammatone(z) = \prod_{i=1}^{4} \frac{A_{i0} z^2 + A_{i1} z + A_{i2}}{z^2 + B_{i1} z + B_{i2}}    (59)

After the values for the center frequency f_c, the constant b and the sampling interval T_s are set appropriately, we can determine the poles and zeros for each of the four filter stages, and subsequently the corresponding filter coefficients A_{ij} and B_{ij}, according to Slaney [48]. Figure 5.3 shows the frequency response of a 25-band Gammatone filter bank. Also note that all the zeros in the Gammatone filter are on the real axis, which means that the zeros have only a small effect near the center frequency of each filter. By ignoring the zeros of the original filter, we cut the computational complexity nearly in half and obtain an all-pole Gammatone filter. Thus we have four second-order sections, each with the identical set of conjugate poles near the resonant frequency. Compared to the ordinary Gammatone filter, the all-pole Gammatone filter provides an improved time-domain match to Basilar Membrane mechanical impulse response measurements, a simpler parameterization, and a lower computational cost. Therefore, we choose the all-pole Gammatone filter as the auditory filter bank. However, the all-pole version may not be quite as sharp on the low-frequency side of the resonance. The frequency response of a 25-band all-pole Gammatone filter bank is plotted in Figure 5.4.
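The all-pole structure described above can be sketched as four identical all-pole biquads. The code below is a simplified illustration, not Slaney's exact design: the pole radius and angle are set from an assumed bandwidth parameter b_hz, and gain normalization is omitted.

```python
import numpy as np

def allpole_biquad(x, a1, a2):
    """Direct-form all-pole second-order section:
    y[n] = x[n] - a1*y[n-1] - a2*y[n-2]."""
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = x[n]
        if n >= 1:
            y[n] -= a1 * y[n - 1]
        if n >= 2:
            y[n] -= a2 * y[n - 2]
    return y

def allpole_gammatone(x, fc, b_hz, fs):
    """Cascade of four identical all-pole biquads whose conjugate pole pair
    sits at radius exp(-2*pi*b_hz/fs) and angle 2*pi*fc/fs."""
    r = np.exp(-2 * np.pi * b_hz / fs)
    theta = 2 * np.pi * fc / fs
    a1, a2 = -2 * r * np.cos(theta), r * r
    y = x
    for _ in range(4):
        y = allpole_biquad(y, a1, a2)
    return y

impulse = np.zeros(2000)
impulse[0] = 1.0
h = allpole_gammatone(impulse, fc=640.0, b_hz=125.0, fs=16000)
```

Because the pole radius is strictly below one, the cascade is stable and the impulse response decays, matching the gammatone envelope up to a scale factor.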
In our approach, the left/right channel signals are decomposed into M = 25 critical bands using the fourth-order Gammatone filter banks described above, which model the behavior of the cochlea. Then, each critical band signal is half-wave rectified to mimic the firing probabilities of the auditory nerve. In order to process the envelopes of the signals at higher frequencies, a first-order low-pass filter is applied. Finally, saturation effects are modeled by taking the square root of the signal.

Figure 5.3: Frequency Response of a 25-band Gammatone Filter

Figure 5.4: Frequency Response of the All-pole Approximation for a 25-band Gammatone Filter

5.2.2 Binaural Cue Extraction and Voice Activity Detection

After the pre-processing stages, peak detectors are used to extract the information on the peaks and their amplitudes. In each critical band, the peaks of the left/right channels are fed into the cross-correlator, moving in opposite directions along two delay lines, and are multiplied at each tap. To simulate the contralateral inhibition of the Lindemann-Gaik model, we insert a coincidence detector here. The coincidence detector measures the time delays between two corresponding peaks met on the delay lines, and computes the resulting energy of that coincidence. When the detector detects a coincidence based on some criteria, both peaks are deleted from the delay lines. In this way, the periodicity of the cross-correlation function can be suppressed.
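The running cross-correlation at the heart of the model can be illustrated on a single frame: the lag that maximizes the interaural cross-correlation is the frame's ITD estimate, which the correlation-to-azimuth mapping later turns into a source direction. This sketch operates on raw frames rather than detected peaks, so it is only a simplified stand-in for the peak-based correlator with coincidence detection.

```python
import numpy as np

def frame_itd(left, right, max_lag):
    """Return the lag (in samples) maximizing sum_n left[n+lag]*right[n];
    a positive result means the left channel lags the right one."""
    best_lag, best_c = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            c = np.dot(left[lag:len(right)], right[:len(right) - lag])
        else:
            c = np.dot(left[:lag], right[-lag:])
        if c > best_c:
            best_lag, best_c = lag, c
    return best_lag

rng = np.random.default_rng(2)
sig = rng.standard_normal(256)
left = np.roll(sig, 3)               # left channel delayed by 3 samples
itd = frame_itd(left, sig, max_lag=8)
```

For a broadband signal the correlation peak at the true delay dominates all other lags, so the estimate is robust per frame; the full model aggregates such estimates per critical band.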
Figure 5.5: Speech/Noise Detection Results (Speaker: 40° Azimuth; Noise: -60° Azimuth)

Now we have M running binaural activity patterns for the M critical bands. To identify the speech pauses in each critical band, a correlation-to-azimuth transformation is applied, mapping the correlation axis into azimuth. The transformation rule is determined in a supervised learning phase. The resulting binaural patterns are then segmented into frames. For each frame, we estimate the dominant source location by identifying the peak positions in the binaural excitation patterns. In this way we can easily determine the speech pauses for each critical band. Figure 5.5 shows an example of the speech/noise identification result produced by the simplified model for a noisy speech signal on the 5th critical band, with center frequency f_c = 640 Hz. Since the simplified binaural model distinguishes noise-dominant segments from speech-dominant ones mainly based on their spatial locations, it can work well even under low-SNR conditions, for both stationary and non-stationary noise.

5.3 Spectral Subtraction Based Noise Reduction

Figure 5.6 shows the overall structure of our proposed binaural noise reduction scheme. There are three critical components in this system: (i) a simplified binaural model acting as the VAD, (ii) a modified subtraction rule, and (iii) psychoacoustically motivated spectral subtraction. They are explained in the following sections.
Figure 5.6: Block Diagram for SS based Binaural Noise Reduction

5.3.1 Noise Reduction Rule

As we have stated, spectral subtraction is a popular noise reduction approach due to its simplicity. Many variations of it have been developed to suppress both the signal distortion and the musical noise, such as the Ephraim-Malah MMSE spectral amplitude estimator [11], the introduction of over-subtraction factors and spectral floors [27], and nonlinear spectral subtraction [32]. In our approach, we choose the variation of [26] for its simplicity and effectiveness in reducing colored noise. By using band-specific over-subtraction factors and spectral floors, we take into account the non-uniform spectrum of real-world colored noise and make the subtraction rule more suitable for colored noise reduction.

Assuming the speech and noise signals are uncorrelated and the noise power changes relatively slowly, the idea of spectral subtraction is to subtract the spectral magnitude of the noise from that of the noisy signal to obtain a cleaner signal. In this work, the noise spectrum is estimated during the silence periods detected by the simplified binaural model. Consider the noisy signal x(n),

x(n) = s(n) + n(n)    (60)

where s(n) is the clean speech signal, and n(n) is the noise signal. The noise reduction is performed in the frequency domain by applying the Discrete Fourier Transform (DFT) on a frame-by-frame basis.
The magnitude spectrum of the noise on the i-th critical band is obtained by exponential averaging during each speech pause as follows:

N_i(k,p) = \alpha N_i(k,p-1) + (1-\alpha) X_i(k,p)    (61)

where N_i(k,p) and X_i(k,p) denote the estimates of the p-th frame's spectral magnitude of the noise and of the noisy speech on the i-th critical band, respectively. Then, an estimate of the clean speech's magnitude spectrum can be obtained as

S1_i(k,p) = G_i(k,p) X_i(k,p)    (62)

with

G_i(k,p) = \lambda G_i(k,p-1) + (1-\lambda) \frac{\max[\beta_i(p) X_i(k,p),\; X_i(k,p) - \gamma_i(p) N_i(k,p)]}{X_i(k,p)}    (63)

G_i(k,p) is the spectral weighting filter and is exponentially averaged with a time constant \lambda to reduce the musical noise. \beta_i(p) is the spectral floor of the i-th critical band, which prevents the estimated speech magnitude spectrum from falling below zero and also helps to "mask" the neighboring residual noise components. \gamma_i(p) is the over-subtraction factor of the i-th critical band. Unlike the conventional spectral subtraction technique, which uses a fixed over-subtraction factor and spectral floor throughout the frequency range at all times, we set different values of \gamma_i(p) and \beta_i(p) for each critical band and each signal frame. In particular, \gamma_i(p) and \beta_i(p) are determined as functions of the segment SNR on the i-th critical band, which is defined as:

SNR_i(p) = 10 \log_{10} \frac{\sum_{k=s_i}^{e_i} X_i(k,p)^2}{\sum_{k=s_i}^{e_i} N_i(k,p)^2}    (64)

Here, s_i and e_i are the starting and ending frequency bins of the i-th critical band. The two functions are shown in Figure 5.7(a) and Figure 5.7(b) separately. Therefore, the different noise levels in each critical band can be minimized while maintaining a small signal distortion.

Figure 5.7: Spectral Floor and Over-subtraction Factor ((a) \beta_i(p) vs. SNR_i; (b) \gamma_i(p) vs. SNR_i)
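Eqs. (61) and (63) translate directly into a per-band gain computation. The sketch below uses fixed beta and gamma for brevity, whereas the text makes them functions of the band segment SNR; the values of lam, beta, gamma and eps are illustrative assumptions.

```python
import numpy as np

def update_noise(N_prev, X, alpha=0.9):
    """Eq. (61): exponential averaging of the noise magnitude during a speech pause."""
    return alpha * N_prev + (1.0 - alpha) * X

def ss_gain(X, N, G_prev, lam=0.6, beta=0.05, gamma=2.0, eps=1e-12):
    """Eq. (63): floored, over-subtracted gain, recursively smoothed over frames."""
    inst = np.maximum(beta * X, X - gamma * N) / (X + eps)
    return lam * G_prev + (1.0 - lam) * inst

X = np.array([1.0, 0.2, 3.0])      # noisy-speech magnitudes in one band
N = np.array([0.3, 0.3, 0.3])      # noise magnitude estimate for that band
G = ss_gain(X, N, G_prev=np.ones(3))
```

The max with beta*X is what keeps the gain (and thus the estimated speech magnitude) from going negative when the over-subtracted noise exceeds the observed magnitude.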
5.3.2 Noise Masking Threshold

When applying noise reduction techniques in hearing aids, we are dealing with audio signals intended for a human listener. Thus, the properties of the human auditory system should be considered. Simultaneous masking is a well-known psychoacoustic property of the auditory system whereby some sounds are inaudible in the presence of other sounds (maskers) with certain characteristics. Therefore, the masking properties are modeled by calculating the noise masking threshold in each critical band.

Figure 5.8: Calculation Steps for Masking Threshold [26] (critical band analysis; applying the spreading function; tonality weighting and threshold offset; renormalization; comparison with the absolute threshold)

5.3.2.1 Noise Masking Threshold Computation

Noise masking has been applied successfully in audio coding to mask the distortion introduced by the coding process. In this work, we calculate the masking threshold T_i(k,p) in each critical band using the method described in [26], based on the previously estimated clean-speech magnitude spectrum S1_i(k,p). The calculation steps are illustrated in Figure 5.8. Briefly, the critical band analysis is first performed on the FFT power spectrum of the signal by adding up the energies in each critical band. To take into account both in-band masking and inter-band masking, a spreading function, which describes each critical band's masking effect on the other critical bands, is convolved with the critical band power spectrum. The spreading function used in our work was proposed by Schroeder et al. in [47] and is shown in Figure 5.9(a). In the next step, a relative threshold offset is computed and subtracted from the previous results.
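The inter-band masking step just described amounts to a convolution of the critical-band power spectrum with a spreading kernel. The triangular kernel below is only illustrative; the text uses Schroeder's spreading function, which is asymmetric across bands.

```python
import numpy as np

def spread_critical_bands(band_power, kernel):
    """Convolve per-band (linear) power with a spreading kernel and keep the
    central len(band_power) samples, so each band also receives masking
    energy from its neighbours."""
    full = np.convolve(band_power, kernel)
    off = len(kernel) // 2
    return full[off:off + len(band_power)]

power = np.array([0.0, 0.0, 1.0, 0.0, 0.0])      # a single excited band
spread = spread_critical_bands(power, np.array([0.25, 1.0, 0.25]))
```

A single excited band thus leaks masking energy into its neighbours, which is exactly the inter-band masking effect the threshold computation needs to capture.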
This relative threshold offset can be obtained from a tonality measurement of the signal that determines its tone-like or noise-like nature. The resulting relative threshold offset values are shown in Figure 5.9(b). Finally, normalization is performed and the result is compared with the absolute threshold of hearing to get the noise masking threshold. An example of the noise masking threshold for a 16 ms piece of speech is shown in Figure 5.10. The sampling frequency in this example is 44100 Hz, and the total number of critical bands is K = 25.

Figure 5.9: Functions Used for Noise Masking Threshold Calculation [27] ((a) spreading function; (b) relative threshold offset)

5.3.2.2 Perceptually-weighted Subtraction Rule

After the masking threshold T_i(k,p) is estimated, it is utilized to modify the spectral weighting filter G_i(k,p). An idea similar to the perceptually motivated speech enhancement proposed by Tsoukalas [50] is employed here: the suppression of noise is not necessary at the frequencies where the noise components are masked by the audio signal. Thus, the spectral weighting function G*_i(k,p) is modified as:

G*_i(k,p) = 1,           if \gamma_i(p) N_i(k,p) \le T_i(k,p)
G*_i(k,p) = G_i(k,p),    otherwise    (65)

and the corresponding estimate of the clean speech's magnitude spectrum is obtained as:

S2_i(k,p) = G*_i(k,p) X_i(k,p)    (66)

The use of the masking threshold leads to a lower distortion of the speech, due to the fact that the speech remains unchanged at frequencies where the noise is masked. However, there still remains an unnatural-sounding residual noise, especially in segments where only weak or no speech is present. To overcome this problem, we further modify the spectral weighting function G*_i(k,p) by multiplying it with a factor \delta_i(p). This factor \delta_i(p) is determined by the following rule:

\delta_i(p) = 1,              if the p-th frame on the i-th critical band contains speech
\delta_i(p) = \delta_0 < 1,   if the p-th frame on the i-th critical band is noise only    (67)

Therefore, the final estimate of the clean signal spectrum is achieved by:

S_i(k,p) = \delta_i(p) \cdot G*_i(k,p) \cdot X_i(k,p)    (68)

Figure 5.10: Example of the Noise Masking Threshold Computed for a 16 ms Section of Speech (speech signal spectrum vs. noise masking threshold)

5.3.2.3 Improvement on Masking Threshold Estimation

As described above, to calculate the masking threshold, a simple power spectral subtraction method is employed to obtain an estimate of the clean speech's spectrum, which usually yields a coarse masking threshold estimate, especially in low-SNR conditions. The resulting denoised signals are accordingly disturbed by increased residual noise, which is very annoying. In order to get a more accurate estimate of the noise masking threshold, we consider Ephraim and Malah's MMSE spectral amplitude estimator when computing the masking threshold. Moreover, an efficient approximation to the MMSE spectral amplitude estimator, called the MMSE spectral power estimator, is utilized to estimate the clean speech's spectrum; it is known to be the most appropriate algorithm for calculating auditory masking thresholds for the purpose of perceptually motivated noise reduction with fewer computations. The MMSE method of estimating the spectral amplitude of speech [11] is a Bayesian estimation method employing a mean-squared error cost function.
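The decision-directed a priori SNR estimate and the simplified power-estimator gain developed in the following formulation are cheap to evaluate, which is the point of the approximation. A minimal per-bin sketch (scalar inputs; the smoothing constant alpha = 0.98 is an illustrative choice):

```python
import numpy as np

def decision_directed_xi(S_prev_sq, N_sq, gamma_k, alpha=0.98):
    """Decision-directed a priori SNR: previous frame's speech power estimate
    over the noise power, plus the thresholded instantaneous a posteriori SNR."""
    return alpha * S_prev_sq / N_sq + (1.0 - alpha) * max(0.0, gamma_k - 1.0)

def mmse_power_gain(xi, gamma_k):
    """MMSE spectral power estimator gain (Wolfe's approximation to the
    Ephraim-Malah amplitude rule), with nu = xi/(1+xi) * gamma_k."""
    nu = xi / (1.0 + xi) * gamma_k
    return np.sqrt(xi / (1.0 + xi) * (1.0 + nu) / gamma_k)

xi = decision_directed_xi(S_prev_sq=1.0, N_sq=1.0, gamma_k=2.0)
g = mmse_power_gain(xi, gamma_k=2.0)
```

Unlike the full amplitude estimator, no exponential or Bessel function evaluations are needed, which is what makes this variant attractive for the masking threshold computation.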
In the MMSE formulation provided by Ephraim and Malah [11], it is assumed that the Fourier expansion coefficients of the original signal and of the noise may be modeled as statistically independent, zero-mean, Gaussian random variables. This assumption leads to a Rayleigh probability density function for the magnitude spectrum of the clean speech and a uniform probability density function for its phase. The MMSE gain function is given by:

G_{MMSE}(\xi_k, \gamma_k) = \frac{\sqrt{\pi}}{2} \frac{\sqrt{\nu_k}}{\gamma_k} \exp\left(-\frac{\nu_k}{2}\right) \left[ (1+\nu_k) I_0\!\left(\frac{\nu_k}{2}\right) + \nu_k I_1\!\left(\frac{\nu_k}{2}\right) \right]    (69)

where I_0 and I_1 are the modified Bessel functions of order zero and one, and

\nu_k = \frac{\xi_k}{1+\xi_k} \gamma_k    (70)

We can see that the gain function is determined by two parameters: an a priori SNR \xi_k and an a posteriori SNR \gamma_k. These two SNRs on the p-th frame can be estimated using the "decision-directed" approach suggested by Ephraim and Malah in [11] as follows:

\gamma_k(p) = \frac{X^2(k,p)}{N^2(k,p)}    (71)

\hat{\xi}_k(p) = \alpha \frac{\hat{S}^2(k,p-1)}{N^2(k,p)} + (1-\alpha) \max[0,\; \gamma_k(p) - 1]    (72)

However, the MMSE spectral amplitude estimator given in Eq. (69) requires the computation of exponential and Bessel functions, making it complicated and time-consuming. Therefore, an efficient approximation to the Ephraim-Malah MMSE spectral amplitude estimator, provided by Patrick J. Wolfe, is utilized instead to simplify the implementation [57]. This approximation is actually an MMSE spectral power estimator, with the suppression rule simplified as:

G'(k) = \sqrt{\frac{\xi_k}{1+\xi_k} \cdot \frac{1+\nu_k}{\gamma_k}}    (73)

With greatly reduced computation, this alternative estimator achieves a good and consistent approximation to the Ephraim-Malah suppression rule. By using G'(k) instead of G*(k) to estimate the signal's spectral power, we get a more accurate masking threshold estimate. As a result, we can reduce the level of residual noise with less distortion.

5.3.3 Computer Simulations

To evaluate the performance of our proposed binaural noise reduction scheme, we carried out the following two experiments.

5.3.3.1 Our Proposed Perceptually Weighted Subtraction Rule vs. Conventional Spectral Subtraction for Colored Noise Reduction
In the first experiment, a piece of air-conditioner noise, which is highly colored, is added to a piece of clean speech to form a noisy speech signal. To demonstrate the effectiveness of our proposed psychoacoustically motivated spectral subtraction rule with band-specific, SNR-determined parameters, we apply it to this noisy speech to reduce the colored noise. The performance of our proposed perceptually weighted subtraction rule is also compared with that of the conventional magnitude spectral subtraction algorithm. Figure 5.11 gives the performance comparison between our proposed subtraction rule and the conventional magnitude spectral subtraction rule in terms of segmental SNR improvement, E-ID and WSSD. As can be seen, our proposed perceptually weighted subtraction rule suppresses more of the additive colored monaural noise. Informal listening tests also show that the signal estimated by our subtraction rule suffers less residual noise and distortion than the signal produced by the conventional magnitude spectral subtraction algorithm.

Figure 5.11: Objective Measurements for Colored Noise Reduction ((a) Enhanced Itakura Distance vs. input SNR)
Figure 5.11: Continued ((b) Weighted Spectral Slope vs. input SNR; (c) SNR Improvement vs. input SNR)

5.3.3.2 Binaural Noise Reduction

In the second experiment, one signal source (a male speaker) and one noise source (an air conditioner) are positioned at 15° azimuth and -40° azimuth respectively. The noise generated by the air conditioner is narrow-band, slowly varying and highly colored. The mixed signals are then obtained by convolving the clean speech/noise with their corresponding HRTFs. Thus, we obtain a pair of noisy binaural signals. Figure 5.12 and Figure 5.13 depict the spectrograms of the clean binaural speech and the noisy binaural speech for the right and left channels respectively. Both binaural noise reduction schemes, using the simple threshold estimation and the MMSE-based threshold estimation, are applied to the pair of noisy speech signals, and the results are compared.

• Binaural Noise Reduction Using Simple Threshold Estimation

We first feed the binaural noisy signals into our first proposed binaural noise reduction system, which uses the simple masking threshold estimation. We obtain a pair of "cleaner" binaural speech signals, whose spectrograms are plotted in Figure 5.12 and Figure 5.13 respectively. Evidently, the colored binaural air-conditioner noise is successfully suppressed. Moreover, informal listening demonstrates that the denoised binaural speech sounds natural, with little musical noise and distortion. For colored noise reduction, our approach results in lower signal distortion and less residual noise, especially for the channel which suffers the greatest noise impairment.
• Binaural Noise Reduction Based on Threshold Estimation Using the MMSE Spectral Estimator

Next, we apply the extended binaural noise reduction scheme with MMSE-based masking threshold estimation to the same noisy binaural speech. Its performance in terms of objective measurements is compared with that of the binaural ANR using the simple threshold estimation. The comparison results are listed in Table 5.1. It is easy to notice that the introduction of the MMSE algorithm into the masking threshold estimation improves the binaural noise reduction performance.

5.3.4 Conclusions

In this section, we proposed a simple but effective binaural noise reduction scheme, which consists of a simplified binaural model and psychoacoustically motivated spectral subtraction. The simplified binaural model is employed to detect the speech pauses, which overcomes the drawbacks of conventional VAD in low-SNR environments and results in more accurate noise estimation. A simple but effective perceptually motivated spectral subtraction based on the noise masking threshold is then carried out on the left/right channels independently to suppress the additive colored binaural noise. Band-specific over-subtraction factors and spectral floors determined by frame SNR estimates are also utilized in our proposed method to improve its noise reduction performance for colored noise. Moreover, the MMSE spectral power estimator based masking threshold estimation further enhances the estimation accuracy and consequently reduces the residual noise.
In summary, compared to conventional VAD-based spectral subtraction, our approach offers improved performance for binaural colored noise reduction under low-SNR conditions; compared to the sophisticated binaural-model-based cocktail-party processors, our proposed binaural noise reduction scheme is simpler and much more computationally efficient, which makes it suitable for real-time hearing aid applications.

Figure 5.12: Simulation Results for Left Channel Signals ((a) clean speech; (b) noisy speech, SNR = 2 dB; (c) recovered speech)

Figure 5.13: Simulation Results for Right Channel Signals ((a) clean speech; (b) noisy speech, SNR = 7 dB; (c) recovered speech)
(a) Left channel results:

                                          Enhanced ID   WSSD      S-SNR Improvement
Noisy Signal (Left)                       0.89443       60.3394   0
Simple Binaural Noise Reduction           0.63627       57.6235   2.0461
Binaural Noise Reduction Based on MMSE    0.48898       57.242    2.2081

(b) Right channel results:

                                          Enhanced ID   WSSD      S-SNR Improvement
Noisy Signal (Right)                      0.38565       50.0563   0
Simple Binaural Noise Reduction           0.21053       47.2949   -0.4183
Binaural Noise Reduction Based on MMSE    0.17541       44.2139   0.3265

Table 5.1: Objective Measurements Comparison between Simple Masking Threshold Estimation and MMSE Based Masking Threshold Estimation

5.4 ANC Based Adaptive Binaural Noise Reduction

In Section 5.3, we proposed a binaural noise reduction approach based on the combination of spectral subtraction and a perceptual binaural model, which is simple to implement and effective in reducing binaural colored noise. However, this spectral subtraction based algorithm cannot effectively suppress non-stationary noise or noise at very low SNR. Besides, unavoidable musical noise is still introduced due to the noise spectrum estimation error.

In this section, we propose an alternative binaural noise reduction strategy, built upon the integration of the Adaptive Noise Cancellation (ANC) technique [55] with the perceptual binaural model. Examining speech signals, we notice that they contain many silence segments, which can be detected by our previously proposed simplified binaural model. Under the assumption that the binaural noises in the left and right channels come from the same noise source and are correlated, but uncorrelated with the speech signal, we can then use an adaptive filter to cancel the noise during the speech silence periods [10][56].
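This cancellation idea can be checked numerically with scalar stand-ins for the acoustic paths: once the adaptive filter has converged to the ratio of the two noise paths, the noise terms in the primary channel cancel exactly. All values below are toy stand-ins, not measured HRTFs, and the speech path to the reference channel is ignored for clarity.

```python
import numpy as np

rng = np.random.default_rng(1)
n = rng.standard_normal(1000)              # common noise source
h_nL, h_nR = 0.8, 0.5                      # scalar noise "paths" (toy values)
s_R = np.sin(0.05 * np.arange(1000))       # speech component at the primary channel
x_L = h_nL * n                             # reference channel (noise only here)
x_R = s_R + h_nR * n                       # primary channel: speech plus noise
W = h_nR / h_nL                            # value the adaptive filter converges to
u = x_R - W * x_L                          # ANC output: noise cancels, speech remains
```

With real acoustic paths W becomes a filter rather than a scalar, and the output is a filtered version of the clean speech instead of the speech itself, as derived below.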
Meanwhile, taking into account the colored nature of most real-life noise, we incorporate subband processing into the approach and apply the ANC process only in selected frequency bands, as needed. This allows faster adaptation through the freedom to use different adaptive parameters in each subband. Objective speech quality measures show that the algorithm removes a large amount of the colored binaural noise with relatively little residual noise. The block diagram of the overall system for our proposed subband adaptive binaural noise reduction scheme is plotted in Figure 5.14.

Figure 5.14: Block Diagram for ANC based Subband Binaural Noise Reduction

5.4.1 Main Idea

Assume the binaural signals x_L(n) and x_R(n) consist of one speaker s(n) and one noise source n(n) located at different positions, as shown in Figure 5.1. Accordingly, x_L(n) and x_R(n) can be expressed as:

x_L(n) = s_L(n) + n_L(n) = h_{sL} * s(n) + h_{nL} * n(n)
x_R(n) = s_R(n) + n_R(n) = h_{sR} * s(n) + h_{nR} * n(n)    (75)

During the silence or non-speech-activity segments, the binaural signals x_L(n) and x_R(n) become noise-only signals, as shown below:

x_L(n) = h_{nL} * n(n)
x_R(n) = h_{nR} * n(n)    (76)

Now we apply adaptive noise cancellation to the current segment of the binaural signals, for example with x_L(n) as the reference signal and x_R(n) as the primary signal. When speech
As a result, when the adaptive filter W finally converges, the output signal of the ANC can be represented as: V = X k - ^ X l = ( H „ - ^ H SL)S (77) nL H n L It’s easy to see that when the algorithm converges, the output signal contains speech signal only with the undesired noise components successfully cancelled. In fact, the final signal u(n) is a filtered version of clean speech signal. 5.4.2 Subband Processing As we stated in Section 5.1, real world acoustic noise is usually not white and its effect on the speech signal’s frequency spectrum is not uniform. Accordingly, splitting signals into different subbands to process independently in each band becomes a good choice. Subband operation also gives faster adaptation through the freedom to use different adaptive parameters in each band. In our work, we employed two different types of subband processing techniques and compared their performances on adaptive binaural noise reduction. In both approaches, fullband binaural signals are first separated into subbands, intermittent ANC is then applied in each critical band independently to suppress the binaural noise. The output signals of each critical band ANC are finally synthesized together to obtain the fullband recovered speech signal. 100 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. • Auditory Filter Bank First, a set of filterbanks modeling the spectral analysis performed by human being’s auditory system is employed to separate the signals into critical bands. In our work, we use the well-known Gammatone filterbanks for this purpose, which we have introduced in Section 5.2. In this way, human being’s auditory system is simulated and the binaural noisy signals are decomposed into non-uniform critical bands to process separately. • Delay less Subband Filter Bank It’s well-known that adaptive filtering algorithm plays a veiy important role in ANC applications. 
The adaptive filter's performance, in terms of computational complexity and convergence behavior, mainly depends on the adaptive algorithm used to update the adaptive filter coefficients. However, the order of the adaptive filter in our adaptive binaural noise reduction problem is usually hundreds or even thousands of taps, resulting in very poor performance. The Delayless Subband Adaptive Filter (DSAF) is a promising technique for achieving faster convergence with reduced computation and zero transmission delay. Therefore, we choose the Real-valued DSAF we proposed in Chapter 3 as an alternative subband processing technique in our ANC based binaural noise reduction scheme to decompose the signals into different subbands.

5.4.3 Intermittent ANC Module

In an intermittent ANC, the adaptive filter coefficients are updated only during the noise-only segments. To identify those speech pauses or noise-only segments, we need some VAD to separate the speech-dominant segments from the noise-dominant ones. But just as we stated in Section 5.2, conventional VADs perform poorly in low-SNR or non-stationary noisy environments. The Simplified Binaural Model which we proposed in Section 5.2 is employed instead to detect the noise-only segments in each critical band of the binaural signals. The simplified binaural model uses spatial information to distinguish the noise-only segments, enabling it to achieve good speech activity detection performance even under low SNR conditions with non-stationary noise.

5.4.4 Intermittent ANC Module

Figure 5.15 depicts how an intermittent ANC module works in one subband.

[Figure 5.15 flow chart: at the i-th moment, the inputs x_R(i) and x_L,i = [x_L(i) x_L(i-1) ... x_L(i-N+1)]^T are fed to the simplified binaural model for silence segment detection; during speech pauses the noise power P_i is estimated, and if P_i exceeds the threshold the subband filter w_i is updated using the RLS algorithm with error e(i) = x_R(i-Δ) - x_L,i^T w_i, after which i is incremented.]
Figure 5.15: Flow Chart for Intermittent ANC Module

5.4.4.1 Adaptive Algorithm

The core of this module is the adaptive algorithm used to update the adaptive FIR filter's coefficients. The LMS algorithm is arguably the most popular algorithm because of its simplicity and reliability, but its convergence rate becomes very slow if there is a large spread among the eigenvalues of the input signal's autocorrelation matrix, which limits its use in audio signal processing. Another big problem in applying the LMS algorithm to our binaural noise reduction scheme is how to choose the step size in the different bands. To maximize the performance of the LMS algorithm, this step size should be chosen based on the subband signal's power. However, without a priori knowledge of the speech/noise signal's characteristics, it is hard to optimize the step size in each band to maximize the convergence rate. The RLS approach, on the other hand, is independent of a gradient step size, offering faster convergence and smaller MSE, though at the expense of more computation. When incorporating it with the subband processing technique, we can overcome the computational intensity of the RLS algorithm via the subband decimation. Therefore, we choose the RLS algorithm as the core algorithm of the intermittent ANC in this binaural ANR approach. In order to achieve a close approximation to the optimal non-causal filter, a delay Δ must be inserted in the primary input x_R(n).

5.4.4.2 Implementation Considerations

The adaptive filter's coefficients are updated with the RLS algorithm only during speech pauses, in which only noise exists, so our proposed ANC module in each band is in fact an intermittent ANC.
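To make this gating concrete, the following Python sketch freezes a standard RLS update whenever a speech-activity flag is raised. It is a toy illustration only: the three-tap noise path h, the simulated VAD flag pattern, and the parameter values are our own assumptions, not the dissertation's per-band configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def rls_step(w, P, u, d, lam=0.999):
    # One RLS iteration minimizing the exponentially weighted LS error d - w^T u
    Pu = P @ u
    g = Pu / (lam + u @ Pu)            # adaptation gain
    e = d - w @ u                      # a priori error
    w = w + g * e
    P = (P - np.outer(g, Pu)) / lam    # inverse-correlation update
    return w, P

h = np.array([0.5, -0.3, 0.2])         # hypothetical reference-to-primary noise path
N = len(h)
w, P = np.zeros(N), np.eye(N) * 1e3
ref = rng.standard_normal(4000)        # reference-channel noise
for n in range(N - 1, len(ref)):
    u = ref[n - N + 1:n + 1][::-1]     # reference tap vector
    d = h @ u                          # primary-channel (noise-only) sample
    speech_active = (n % 400) < 100    # stand-in for the binaural-model VAD flag
    if not speech_active:              # intermittent: adapt during speech pauses only
        w, P = rls_step(w, P, u, d)

print(np.round(w, 3))                  # converges toward h
```

In the actual scheme, the same loop would run per subband on decimated signals, with the delay Δ applied to the primary input.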
Furthermore, we know that most real-world noises, especially speech-shaped noises, are colored and band limited. Considering this fact, we also estimate the noise power in the primary channel during the noise-only segments in each band. If the noise power is under some threshold, we assume that the noise in that band is small enough to be ignored. The adaptive filter coefficients in that band are then set to zero, meaning no adaptive noise cancellation is necessary in that band. As such, by adding a small amount of extra computation to estimate the noise power, we can avoid a large amount of the computation required for ANC in some bands. The noise power estimate P_i in the i-th band can be implemented using a simple autoregressive estimator of the form of Eq. (78), where α is the estimator's forgetting factor.

P_i(n) = α P_i(n-1) + (1 - α) |n_i(n)|²    (78)

The iterative algorithm for updating the i-th subband adaptive filter w_i is shown in Figure 5.15.

5.4.5 Computer Simulation

Two simulations are carried out to test the performance of our proposed subband ANC based binaural noise reduction strategy. One incorporates the auditory filter banks to split the signals into 25 critical bands, while the other utilizes the Real-valued Delayless Subband Filtering technique to separate the signals into 64 uniform bands. In both simulations, one signal source (a male speaker) and one noise source are simulated at 15° azimuth and -40° azimuth respectively. The noise is highly colored, narrowband speech-shaped noise. The noisy binaural signals are obtained by convolving the speech/noise with the corresponding HRTFs. The SNR of the left channel is about 4 dB, while the SNR of the right channel is around 0 dB. In both approaches, the delay inserted into the fullband primary input x_R(n) is set to Δ = 60.
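The noise-power gate of Eq. (78) can be sketched as follows. This is a minimal illustration; the forgetting factor, the two signal levels, and the threshold are arbitrary choices for the example, not values from the simulations.

```python
import numpy as np

def noise_power(samples, alpha=0.95):
    # Eq. (78): P_i(n) = alpha * P_i(n-1) + (1 - alpha) * |n_i(n)|^2
    P = 0.0
    for x in samples:
        P = alpha * P + (1 - alpha) * abs(x) ** 2
    return P

rng = np.random.default_rng(1)
P_loud = noise_power(0.5 * rng.standard_normal(2000))    # band with strong noise
P_quiet = noise_power(0.01 * rng.standard_normal(2000))  # band with negligible noise

threshold = 1e-3
print(P_loud > threshold, P_quiet > threshold)   # -> True False: ANC runs in the first band only
```

At steady state the estimate tracks the short-term noise variance, so the threshold comparison decides per band whether the intermittent ANC is worth running.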
The noise reduction performance is measured by three objective measurements: Segmental SNR Improvement (S-SNR), Enhanced Itakura Distance (E-ID) and Weighted Spectral Slope (WSS), which we introduced in Section 2.2.

5.4.5.1 Auditory filter bank based approach

To separate the fullband binaural noisy signals into different bands, a 25-band FIR Gammatone filterbank is implemented, whose bandwidths are non-uniform, determined by the auditory Critical Band (CB) corresponding to each center frequency. In each critical band, a 256-tap adaptive filter is utilized to realize the intermittent RLS algorithm. The noise reduction performance using this auditory filter bank is summarized in Table 5.2.

5.4.5.2 DSAF based approach

In this approach, the fullband binaural signals are split into 64 bands using a DSAF. The subband adaptive filter's order is set to 16 taps in each band. The final noise reduction performance is summarized in Table 5.2. Figure 5.16 and Figure 5.17 also plot the spectrograms of the output binaural signals and compare them with the spectrograms of both the clean and the noisy binaural speech.

5.4.5.3 Comparison Between Auditory Filterbank Based and DSAF Based Approach

As can be seen, the DSAF based method outperformed the auditory filterbank based method in the following three aspects, though both methods can effectively suppress the binaural noises in a low SNR environment.

• A delay is introduced into the signal path by the Gammatone filter bank analysis/synthesis processing. This delay is of the same length as the Gammatone filter's order, i.e. 2048 samples in our simulation, which is more than 100 ms for systems sampled at 16000 Hz.
• The output signal after binaural noise reduction based on Gammatone filterbank processing suffers more distortion than that of the DSAF based method. Because the overall transfer function of the Gammatone analysis/synthesis filter banks is not a perfect delay (for 25 channels, it exhibits more than 1 dB of ripple), a small additional distortion is added to the processed signals. The comparison results in terms of the Enhanced-ID and WSSD measurements listed in Table 5.2 confirm this.

• The DSAF based method is much less time-consuming and has lower computational complexity because of the efficient decomposition implementation and the oversampled decimation in each subband. The resulting adaptive filter order in each subband is accordingly much shorter, not only improving the ANC's performance but also reducing the whole system's implementation complexity.

5.4.6 Conclusions

In this section, we presented an intermittent ANC based subband adaptive binaural noise reduction scheme. The employment of the simplified binaural model in place of a conventional VAD enables our proposed approach to work well even under low-SNR circumstances. Moreover, two subband processing techniques, the auditory filterbank and the DSAF, are incorporated and compared. Our simulation results showed that the use of the DSAF yields better binaural noise reduction performance than the auditory filterbank does.

Another possible application of this approach is binaural sound rendering (auditory scene reconstruction). Consider the case in which the two signals contained in the binaural recordings are both useful audio signals we are interested in, such as speech. In this case, we can use a similar method to separate the two audio signals.
With the help of the simplified binaural model identifying the two signals' locations, we can easily reconstruct the auditory scene by feeding the two separated signals into loudspeakers positioned at the identified locations. However, compared to the binaural noise reduction problem, the application to binaural sound reproduction is more complicated because ANC needs to be applied twice to extract the two signals successively.

(a) Left channel results

                       Enhanced ID    WSSD       S-SNR Improvement
Noisy Signal (Left)    0.89443        60.3394    0
Auditory Filterbank    0.13168        46.8992    0.4855
DSAF                   0.12881        37.1320    2.0401

(b) Right channel results

                       Enhanced ID    WSSD       S-SNR Improvement
Noisy Signal (Right)   0.38565        50.0563    0
Auditory Filterbank    0.17516        46.8992    -3.6182
DSAF                   0.087101       31.061     -1.4362

Table 5.2: Objective Measurements Comparison between Auditory Filterbank Based Method and DSAF Based Method

[Figure 5.16 shows spectrograms of (a) the original clean speech, (b) the noisy speech, and (c) the recovered speech for the left channel.]

Figure 5.16: Simulation Results for Left Channel Signals

[Figure 5.17 shows spectrograms of (a) the original clean speech, (b) the noisy speech, and (c) the recovered speech for the right channel.]

Figure 5.17: Simulation Results for Right Channel Signals

5.5 Discussions

In this chapter, we proposed two different binaural noise reduction methods, both based on the simplified binaural model. One is built upon the simple spectral subtraction algorithm; the other is built upon the subband adaptive noise cancellation technique.
In these two methods, a simplified binaural model simulating the auditory system is employed to identify speech activity, achieving good speech/silence detection results for noisy signals at low SNR. Simulation results demonstrate that both binaural ANR methods can effectively reduce highly colored binaural noises even in a low SNR environment. However, each of them also has its own unique properties. Comparing them, we can conclude:

• The spectral subtraction based algorithm is simple, making it easy to implement. The subband ANC based method, on the other hand, is more time-consuming and requires more computation.

• The subband ANC based method outperforms the spectral subtraction based method in both noise reduction level and recovered speech quality, especially for non-stationary binaural noise suppression in very low SNR environments. Furthermore, the subband ANC based method avoids the musical noise inherent in the spectral subtraction based method.

• The final recovered binaural speech signal obtained by the subband ANC based method is in fact a filtered version of the original clean speech. This adds some additional distortion to the recovered signal. But for speech enhancement, we care more about the naturalness of the recovered speech and its intelligibility. Hence, a small distortion can be ignored as long as it does not affect the speech's intelligibility.

• The output signals of the spectral subtraction based method are still binaural signals with all binaural effects preserved, whereas the output signal of the subband ANC based method is a monaural signal. To reconstruct the binaural signals, we would have to know the HRTFs corresponding to the speech source's location a priori. But in most real-world applications, it is hard to know the left/right channel HRTFs in advance.
From the above comparisons we can see that the spectral subtraction based method and the subband ANC based method each have their own performance features, making them appropriate for different types of real-world applications.

CHAPTER 6 BINAURAL DEREVERBERATION

In this chapter, we propose an efficient adaptive binaural dereverberation strategy using constrained least-squares algorithms. First, the blind identification of the two channels' impulse responses is formulated as a quadratic optimization problem. Then two constrained Least-Squares algorithms are examined and applied to estimate the two channels' impulse responses adaptively. Finally, RLS algorithm based inverse filtering is used to recover the unknown "dry" sound source signal. Real-valued Delayless Subband Adaptive Filtering is also incorporated to reduce the computational burden without signal path delay. Computer simulations show that for short impulse responses, our proposed method can achieve nearly perfect blind deconvolution. For real-world long room impulse responses with non-minimum phase, our proposed method can still estimate them efficiently and recover the source signal to a fairly good degree.

The chapter is organized as follows: Section 6.1 describes the overall system structure. The main idea behind the left/right channel impulse response estimation is discussed in Section 6.2. Section 6.3 introduces the unit-norm constrained Recursive Least-Squares algorithm and the robust linearly constrained fast least-squares algorithm. How to use the two-channel inverse algorithm to recover the "dry" audio signal is explained in Section 6.4. Section 6.5 presents the calculation steps to determine the optimum order. In Section 6.6, we describe how to combine the proposed algorithm with the DSAF.
Finally, the computer simulations and conclusions are presented in Section 6.7.

6.1 Overview

[Figure 6.1 shows the block diagram of binaural dereverberation: real-valued analysis filter banks decompose x_L(n) and x_R(n) into subbands; constrained LS modules estimate the subband impulse responses; a weight transform follows; and inverse filtering (H_inv) recovers s(n).]

Figure 6.1: Block Diagram of Binaural Dereverberation

As we know, we have two channels of reverberant audio signals in binaural systems: the left channel signal and the right channel signal. Dereverberation with two reverberant signals is much easier than the monaural dereverberation problem. Furuya proposed a similar strategy for two-channel blind deconvolution in [14]. However, his strategy has several shortcomings that limit its widespread application:

• His method involves direct matrix inversion and eigenvalue decomposition, which leads to a computationally intensive batch approach, as in [31][38]. So it is hard to implement his method in real-world applications which require the algorithm to be efficient and adaptive.

• In his method, the optimum order needs to be determined in each subband, making the method very complicated and time-consuming.

• Due to the conventional oversampled subband processing incorporated in his method, a long delay is also introduced into the recovered impulse response and the output dereverberated signal. This is very annoying in many real-time applications.

To overcome the above three drawbacks, we propose a Real-valued DSAF based adaptive binaural blind dereverberation strategy in this chapter. Our approach consists of three stages, as shown in Figure 6.1. First, the optimum order of the channel impulse response estimate is determined.
Then the impulse responses of both the right and left channel are identified using an adaptive constrained Least-Squares algorithm. Finally, the original "dry" audio signal is recovered with the help of adaptive inverse filters.

6.2 Main Idea

6.2.1 Problem Formulation

Consider a binaural system as presented in Figure 6.2, in which two microphones are used to capture the left and right channel sound respectively. Let s(n) represent the "dry" sound source signal; the left/right channel observations x_L(n)/x_R(n) are then the reverberant signals picked up by the left/right microphones. If we use h_L(n)/h_R(n) to denote the impulse responses of the left/right channel acoustic paths, x_L(n)/x_R(n) is actually the result of a linear convolution between the source signal s(n) and the corresponding channel response h_L(n)/h_R(n):

x_L(n) = h_L(n) * s(n)
x_R(n) = h_R(n) * s(n)    (79)

Therefore, if we can identify the channel impulse responses h_L(n)/h_R(n) from the observations x_L(n)/x_R(n), we can recover the source signal s(n) using the estimates of h_L(n)/h_R(n) accordingly. To guarantee that the channels are identifiable, we assume that the channel transfer functions H_L(z) and H_R(z) do not share any common zeros [22].

[Figure 6.2 illustrates the problem formulation and the construction of the error signal e(n) from the two channel outputs.]

Figure 6.2: Problem Formulation and Error Signal Construction

6.2.2 Principles

When we deal with binaural dereverberation problems, all we have are the right/left channel reverberant signals x_L(n)/x_R(n). Let us assume that the channel impulse responses h_L(n) and h_R(n) can be modeled using FIR filters of order P, which is already known to us. It is easy to formulate the following equation:

x_R(n) * h_L(n) = s(n) * h_R(n) * h_L(n) = x_L(n) * h_R(n)    (80)
The precondition for the above equation to hold is that no other noise is present. Using vector form, we can rewrite Eq. (80) as:

x_R^T(n) h_L = x_L^T(n) h_R    (81)

with

x_R(n) = [x_R(n) x_R(n-1) ... x_R(n-P+1)]^T    (82)

x_L(n) = [x_L(n) x_L(n-1) ... x_L(n-P+1)]^T    (83)

and

h_R = [h_R(0) h_R(1) ... h_R(P-1)]^T    (84)

h_L = [h_L(0) h_L(1) ... h_L(P-1)]^T    (85)

Let us define an error function e(n) based on the above equations:

e(n) = x_R(n) * h_L(n) - x_L(n) * h_R(n)
     = x_R^T(n) h_L - x_L^T(n) h_R    (86)
     = u^T(n) h

with input signal vector:

u(n) = [x_R(n) x_R(n-1) ... x_R(n-P+1) x_L(n) x_L(n-1) ... x_L(n-P+1)]^T    (87)

and estimated filter coefficient vector:

h = [h_L(0) h_L(1) ... h_L(P-1) -h_R(0) -h_R(1) ... -h_R(P-1)]^T    (88)

The construction procedure of the error e(n) is also illustrated in Figure 6.2. Notice that in the absence of other noise, the error e(n) = 0 for all n only when the estimates coincide with the true impulse responses h_R and h_L. But in real applications, e(n) cannot be exactly zero due to the presence of other noise and measurement errors. So our goal is to find the coefficient vector h that minimizes some cost function J(n), which is a function of the error e(n). Consider two popularly used choices of the cost function J(n):

• The cost function J(n) is set as the mean squared error of e(n):

J(n) = E{e²(n)} = E{h^T(n) u(n) u^T(n) h(n)} = h^T(n) R_uu(n) h(n)    (89)

where

R_uu(n) = E{u(n) u^T(n)}    (90)

In the presence of observation noise, the correlation matrix R_uu is positive definite rather than positive semi-definite. Therefore, the desired solution for h(n) is determined by minimizing the cost function J(n):

h = arg min(J(n)) = arg min(h^T(n) R_uu(n) h(n))    (91)

The most popular adaptive algorithm used to solve this optimization problem is the LMS algorithm.
• The cost function J(n) is defined in the weighted Least-Squares manner:

J(n) = Σ_{i=1}^{n} λ^{n-i} e²(i)
     = Σ_{i=1}^{n} λ^{n-i} h^T(n) u(i) u^T(i) h(n)
     = h^T(n) [Σ_{i=1}^{n} λ^{n-i} u(i) u^T(i)] h(n)
     = h^T(n) R_uu(n) h(n)    (92)

where λ is the weighting or forgetting factor, whose value is chosen in the interval [0,1]. Here, the input signal autocorrelation matrix is given by:

R_uu(n) = Σ_{i=1}^{n} λ^{n-i} u(i) u^T(i)    (93)

As such, h can be estimated by minimizing the above cost function in a weighted least-squares manner:

h = arg min(J(n)) = arg min(h^T(n) R_uu(n) h(n))    (94)

The most popular adaptive algorithm used to solve this optimization problem is the RLS algorithm.

Since adaptive Least-Squares algorithms are able to achieve better performance for highly colored audio signals, we choose the second cost function, defined in the weighted LS manner, to solve the problem in our work. Observing Eq. (94), we notice that this is in fact a quadratic optimization problem. It is easy to see that the vector h(n) minimizing the cost function J(n) is just the eigenvector corresponding to the smallest eigenvalue of R_uu. h(n) can be determined by direct matrix inversion or eigenvalue decomposition as in [14], but this usually leads to a computationally complicated batch approach. Alternatively, adaptive algorithms such as the RLS algorithm offer a more efficient way to solve this optimization problem. Hence we focus on adaptive Least-Squares algorithms to estimate the channel impulse responses in this dissertation.

6.3 Estimation of Channel Impulse Responses

When estimating the impulse response vector h(n) using a Least-Squares algorithm, we need to impose some constraint on h(n) to avoid the trivial all-zero estimate.
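As a batch sanity check of this eigenvector property, the sketch below builds R_uu from the cross-relation input vectors u(n) of Eqs. (87)-(88) for two synthetic order-4 channels and recovers [h_L, -h_R] (up to sign and scale) as the eigenvector of the smallest eigenvalue. The random channels and white source are illustrative stand-ins, not measured room responses.

```python
import numpy as np

rng = np.random.default_rng(2)
P = 4                                   # assumed known channel order
h_L = rng.standard_normal(P)            # synthetic left channel
h_R = rng.standard_normal(P)            # synthetic right channel
s = rng.standard_normal(3000)           # unknown "dry" source
x_L = np.convolve(s, h_L)               # reverberant observations
x_R = np.convolve(s, h_R)

# Accumulate R_uu from u(n) = [x_R(n)..x_R(n-P+1), x_L(n)..x_L(n-P+1)]^T
R = np.zeros((2 * P, 2 * P))
for n in range(P - 1, len(s)):
    u = np.r_[x_R[n - P + 1:n + 1][::-1], x_L[n - P + 1:n + 1][::-1]]
    R += np.outer(u, u)

v = np.linalg.eigh(R)[1][:, 0]          # eigenvector of the smallest eigenvalue
truth = np.r_[h_L, -h_R]
v *= np.sign(v @ truth) * np.linalg.norm(truth)   # align sign and scale
print(np.max(np.abs(v - truth)))        # close to zero in the noiseless case
```

Because u^T(n) [h_L, -h_R] = 0 holds exactly in the noiseless case, the smallest eigenvalue is (numerically) zero and the recovery is essentially exact.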
Two constraints are usually employed:

• The unit-normalization constraint, which requires the norm of h(n) to be one, i.e. ||h(n)|| = 1.

• The component-normalization constraint, which is defined by a linear system, i.e. c^T h(n) = 1, where c is a constant vector known in advance.

6.3.1 Unit-norm Constrained RLS Algorithm

When the unit-norm constraint is imposed, the estimation of the impulse response with a priori knowledge of the filter order P is formulated as the following optimization problem:

min Σ_{i=1}^{n} λ^{n-i} [h^T(n) u(i) u^T(i) h(n)]
subject to: ||h(n)|| = 1    (95)

As we know, the Recursive Least-Squares (RLS) algorithm is a widely used adaptive algorithm for solving unconstrained optimization problems. With the unit-norm constraint present, the most direct and simple way is to scale the coefficient vector h(n) to unit norm at each iteration. The justification for this simplification is that scaling coincides with an orthogonal projection onto the constraint surface (the unit sphere in this case). The detailed algorithm is listed in Table 6.1.

Analyzing this algorithm's computational complexity, we see that it requires 4L multiplications, L divisions and one square root at each iteration to optimize a coefficient vector of order L, excluding the computation required to calculate the adaptation gain g(n).

Unit-norm Constrained RLS Algorithm

• Input new data u(n+1) at time (n+1) and construct the input data vector u(n+1).
• Compute the adaptation gain g(n+1):
  1. π(n+1) = P(n) u(n+1)
  2. g(n+1) = π(n+1) / [λ + u^T(n+1) π(n+1)]
  3. P(n+1) = λ^{-1} P(n) - λ^{-1} g(n+1) u^T(n+1) P(n)
• Update the coefficient vector:
  1. e(n+1) = -h^T(n) u(n+1)
  2. h(n+1) = h(n) + g(n+1) e(n+1)
  3. h(n+1) = h(n+1) / ||h(n+1)||

Table 6.1: Unit-norm Constrained RLS Algorithm
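A toy run of the Table 6.1 procedure on synthetic cross-relation data can be sketched as follows. The random order-3 channels, the forgetting factor, and the initialization constants are illustrative choices, not the dissertation's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(3)
P_ord = 3                                  # assumed known channel order
h_L = rng.standard_normal(P_ord)
h_R = rng.standard_normal(P_ord)
s = rng.standard_normal(4000)
x_L = np.convolve(s, h_L)
x_R = np.convolve(s, h_R)

L = 2 * P_ord
h = np.ones(L) / np.sqrt(L)                # unit-norm initial guess
Pm = np.eye(L) * 100.0                     # inverse-correlation estimate
lam = 0.999
for n in range(P_ord - 1, len(s)):
    u = np.r_[x_R[n - P_ord + 1:n + 1][::-1], x_L[n - P_ord + 1:n + 1][::-1]]
    Pu = Pm @ u
    g = Pu / (lam + u @ Pu)                # adaptation gain
    e = -h @ u                             # a priori error (desired response is 0)
    h = h + g * e
    h /= np.linalg.norm(h)                 # project back onto the unit sphere
    Pm = (Pm - np.outer(g, Pu)) / lam

truth = np.r_[h_L, -h_R]
truth /= np.linalg.norm(truth)
print(np.round(h * np.sign(h @ truth) - truth, 4))
```

Since u^T(n) is orthogonal to the true (unit-norm) solution in the noiseless case, the updates shrink only the components of h in the signal subspace, and the renormalized estimate converges to the null-space direction [h_L, -h_R] / ||[h_L, -h_R]||.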
6.3.2 Component-norm Constrained LS Algorithm

When the component-norm constraint is imposed, we assume that the constraint vector c is already known to us. Accordingly, the blind identification of the left/right impulse responses can be formulated as the following linearly constrained optimization problem:

min Σ_{i=1}^{n} λ^{n-i} [h^T(n) u(i) u^T(i) h(n)]
subject to: c^T h(n) = 1    (96)

A straightforward way to implement such a constrained LS algorithm is to use the standard Recursive Least-Squares (RLS) algorithm and convert the constrained problem into an unconstrained form. As shown by Resende in [46], some intermediate steps can be avoided to make the solution simpler. With the help of the adaptation gain g(n), Resende proposed a more direct approach called the robust Constrained Fast Least-Squares (CFLS) algorithm in [46], which works both effectively and efficiently in solving linearly constrained LS problems adaptively, with only an extra computational complexity proportional to the number of coefficients.

6.3.2.1 General Linearly Constrained FLS Algorithm

Using Lagrange multipliers [2], the optimum coefficient vector h_opt(n) minimizing the cost function can be determined as:

h_opt(n) = R_uu^{-1}(n) c [c^T R_uu^{-1}(n) c]^{-1}    (97)

where the autocorrelation matrix of the input vector u(n) is:

R_uu(n) = Σ_{i=1}^{n} λ^{n-i} u(i) u^T(i)    (98)
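A direct (batch) evaluation of Eq. (97) can be sketched as follows; the positive definite matrix R and the constraint vector c are synthetic stand-ins for illustration only.

```python
import numpy as np

rng = np.random.default_rng(4)
L = 5
A = rng.standard_normal((40, L))
R = A.T @ A                          # stand-in for R_uu(n), positive definite
c = np.zeros(L); c[0] = 1.0          # constraint vector of the form [a, 0, ..., 0]^T

Rinv_c = np.linalg.solve(R, c)       # R^{-1} c without forming the inverse
h_opt = Rinv_c / (c @ Rinv_c)        # Eq. (97): R^{-1} c [c^T R^{-1} c]^{-1}

print(c @ h_opt)                     # -> 1.0, the constraint is met exactly
```

Any feasible perturbation d with c^T d = 0 satisfies d^T R h_opt = 0, so it can only increase the quadratic cost, confirming that Eq. (97) is the constrained minimizer.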
) = ^ [R uu ( n ) - g in + 1) / in + l)R J ( » ) ] (99) Here g(ri) is the adaptation gain widely used in FLS algorithms, which is defined by: g (n) = Ruu(nM n) (10°) Now let us define another variable: lin) = R ~ u lin)c ( 101) If we right-multiply the constraint vector c at both sides of Eq. (101), we can get: y(n + 1) = A~l[y(n )~ g (n + l)u r (n + \)y(n)\ (102) As we can see, the above Eq. (102) can be utilized to update y (n ) recursively. Consider the variable £ (n) which is defined by: £ (n ) = c T R ^ (n )c = c r £ (n ) (103) Bases on the Eq. (103), we can yield: <^(« + 1) = c T y(n + Y) = X~x[cr y ( n ) - c r g(n + Y)uT(n + \)y(n)\ (104) = A'1[ ^ '( « ) - c r g (« + l)wr (« + l)f(n )] 122 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. The inverse of £(n + 1) is derived using the classical matrix inversion lemma as: C \ n + V) = X [ C \ r i ) + 1 - c T g(n +1 )uT (n +1 )y(ri) (105) By defining a variable l(n+l) as: (106) T T T c y(n) - c g(n + \)u (n + \)y(n) we can rewrite Eq. (105) in a simpler way by: [cTK l i n + \)c Y x = X{[c y{n)\~x + l(n + l)u T(n + l)y(n)[cTKm)]-1} (107) Substituting Eq. (99) and (107) into Eq. (97), the channel impulse responses estimate at time (n+1) can be obtained as: h(n +1) = y(n + \)^~l (n +1) = [y{n) - g(n +1 )u (n +1 )y{n)\{\cr y («)]”' + /(« +1 )uT (n +1 )y(n)[c Km)]-1 } = h(ri) - g(n +1 )u T ( n + I ) ^ ( m ) + l(n + 1)[KW ) ~ S (n + l)“ r ( n + l)y(n))]H .T (n +1)h(n) = h(n) + g{n + 1)Km) - Kn + 1)KW + l)e(M ) Therefore, we can see Eq. (102) (106) (108) and (109) give us a recursive approach to find the optimum channel impulse response vector h opl(ri) . The complete general Linearly constrained FLS algorithm is summarized in Table 6.2. (108) where e{n +1) = — h (m)m(m +1) (109) 123 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 
At the initialization stage, we need to decide appropriate values for R_uu(0), y(0) and h(0). Considering the fact that the adaptation gain g(n+1) is computed using a Fast LS algorithm, we can initialize them using the following equations:

R_uu(0) = E_0 diag[1, λ^{-1}, λ^{-2}, ..., λ^{-(L-1)}]    (110)

y(0) = R_uu^{-1}(0) c    (111)

h(0) = y(0) [c^T y(0)]^{-1}    (112)

where E_0 is a positive value corresponding to the forward prediction error energy and L = 2P is the length of the coefficient vector h(n). The Constrained Fast LS algorithm in Table 6.2 requires approximately 6L+2 multiplications and one division for a coefficient vector of order L, excluding the computation required to calculate the adaptation gain g(n). Based on this, we can say that the Constrained Fast LS algorithm has an implementation complexity suitable for real-time applications.

Linearly Constrained FLS Algorithm

• Input new data u(n+1) at time (n+1) and construct the input data vector u(n+1).
• Compute the adaptation gain g(n+1) using a general Fast LS algorithm.
• Compute the parameters needed for the coefficient vector update:
  1. y(n+1) = λ^{-1} [y(n) - g(n+1) u^T(n+1) y(n)]
  2. l(n+1) = c^T g(n+1) / [c^T y(n) - c^T g(n+1) u^T(n+1) y(n)]
• Update the coefficient vector:
  1. e(n+1) = -h^T(n) u(n+1)
  2. h(n+1) = h(n) + g(n+1) e(n+1) - l(n+1) y(n+1) e(n+1)

Table 6.2: Linearly Constrained FLS Algorithm

6.3.2.2 Robust Constrained FLS Algorithm

One problem with this Linearly Constrained FLS algorithm is that the component-normalization constraint acts only in the initialization phase, through Eq. (112). The coefficient vector h(n) can easily diverge from the global minimum point after some iterations due to round-off errors. To overcome this, a Robust Constrained FLS algorithm is accordingly
studied in [46], and it is also employed in our work to solve the component-norm constrained LS problem.

When deriving the robust version of the Linearly Constrained FLS algorithm summarized in Table 6.3, we still use Eq. (97) as a guide in the following derivation. With the help of the new definition of P(n), denoted as follows:

P(n+1) = I_L - y(n+1) [c^T y(n+1)]^{-1} c^T    (113)

we can rewrite Eq. (108) in the following form:

h(n+1) = h(n) + P(n+1) g(n+1) e(n+1)    (114)

Here I_L is an L × L identity matrix. As we stated before, the coefficient vector h(n) will depart from the theoretical values after a few iterations, so a corrector term should be introduced into h(n). This corrector term should be proportional to the value of the constraint residual [1 - c^T h(n)] and vanish when the coefficient set satisfies the constraint. Accordingly, a corrector term is added to the right-hand side of Eq. (114), resulting in the modified update equation:

h(n+1) = h(n) + P(n+1) g(n+1) e(n+1) + y(n+1) [c^T y(n+1)]^{-1} [1 - c^T h(n)]    (115)

Eq. (115) can be expressed in the simpler form:

h(n+1) = R_uu^{-1}(n+1) c [c^T R_uu^{-1}(n+1) c]^{-1}
       = y(n+1) [c^T y(n+1)]^{-1}
       = q(n+1)    (116)

Here the vector q(n) is defined by:

q(n) = y(n) [c^T y(n)]^{-1}    (117)

In this way, the recursive update of the coefficient vector h(n) is in fact converted to the problem of how to update the vector q(n) recursively. As long as the recursion for q(n) is robust, the entire algorithm is robust as well. Based on Eq. (106) and Eq.
(116), the vector q(n) can easily be updated using the recursion:

q(n+1) = [q(n) - g(n+1) v(n+1)] · 1 / [1 - z(n+1) v(n+1)]    (118)

where the scalar variables z and v are defined as:

z(n+1) = c^T g(n+1)    (119)

and

v(n+1) = u^T(n+1) q(n)    (120)

Observing the definition of q(n), we can see that if no round-off error accumulation is present, in the ideal case the following equality holds:

c^T q(n) = 1    (121)

This gives rise to a correcting term, proportional to [1 - c^T q~(n+1)], introduced into the vector q(n); the resulting q(n+1) is obtained by:

q(n+1) = q~(n+1) + c [c^T c]^-1 [1 - c^T q~(n+1)]    (122)

with q~(n+1) being the vector computed as if without round-off errors:

q~(n+1) = [q(n) - g(n+1) v(n+1)] · 1 / [1 - z(n+1) v(n+1)]    (123)

As a summary, the complete Robust Linearly Constrained FLS algorithm is shown in Table 6.3. The initialization is similar to that used in the general Linearly Constrained FLS algorithm:

q(0) = h(0)    (124)

with

h(0) = γ(0) [c^T γ(0)]^-1    (125)

and

γ(0) = R_uu^-1(0) c    (126)

When estimating the computational complexity of this Robust Linearly Constrained FLS algorithm, we assume that the term c [c^T c]^-1 can be computed a priori because the constraint vector c is known. As a result, the operation count amounts to 6L+1 multiplications and one division for a coefficient vector of order L, excluding the computation required to calculate the adaptation gain g(n).

Robust Linearly Constrained FLS Algorithm

• Input new data u(n+1) at time (n+1) and construct the input data vector u(n+1)
• Compute the adaptation gain g(n+1) (a general Fast LS algorithm can be used to compute it)
• Compute the parameters needed for the coefficient vector update:
  1. z(n+1) = c^T g(n+1)
  2. v(n+1) = u^T(n+1) q(n)
  3. q~(n+1) = [q(n) - g(n+1) v(n+1)] · 1 / [1 - z(n+1) v(n+1)]
  4. q(n+1) = q~(n+1) + c [c^T c]^-1 [1 - c^T q~(n+1)]
• Update the coefficient vector:
  5. h(n+1) = q(n+1)

Table 6.3: Robust Linearly Constrained FLS Algorithm

6.3.3 Comparison between these two constrained LS algorithms

When comparing the unit-norm constrained algorithm with the component-norm constrained algorithm, we can focus on three aspects: the generality of the constraint, the computational complexity and the performance.

• For the unit-norm constraint, the only thing we need to know about the coefficient vector h(n) is that its norm should be unity, which is a very general assumption in many applications. However, when the component-norm constraint is considered, we must have a priori knowledge of the constraint vector c, which is actually difficult to determine in practice. From this point of view, the unit-norm constraint is more applicable in practice. But in our case, the coefficient vector h(n) is in fact an eigenvector of the auto-correlation matrix R_uu corresponding to a zero eigenvalue. Hence, in the ideal situation without any other noise present, the selection of c becomes robust: c can be any vector of the form c = [a, 0, ..., 0]^T, where a is an arbitrary non-zero constant.

• Now let us look at the computational complexity of the two algorithms. Since both algorithms involve the calculation of the common adaptation vector g(n), we exclude the computations required for g(n) from our comparison. Table 6.4 lists the computational complexity in terms of multiplications, divisions and square roots for both the unit-norm RLS algorithm and the Robust Linearly Constrained FLS algorithm.
Although the Unit-norm RLS needs 2L fewer multiplications than the Robust Linearly Constrained FLS algorithm in each update, it instead requires an additional L-1 divisions and a square-root calculation to compute the norm of the coefficient vector. In real-world applications, division and square-root computations are usually more complicated to implement and cost more MIPS and resources. Taking all of this into account, we can conclude that the Robust Linearly Constrained FLS algorithm is more efficient to implement.

                   Robust Linearly Constrained FLS    Unit-norm RLS
Multiplications    6L+1                               4L
Divisions          1                                  L
Square roots       0                                  1

Table 6.4: Computational Complexity Comparison

• In order to evaluate the performance of these two algorithms in estimating the two-channel impulse responses, we carried out the following experiment. A pair of reverberant binaural recordings was simulated by convolving a piece of clean anechoic speech with two simulated Room Impulse Responses h_R(n) and h_L(n). Both the Unit-norm RLS and the Robust Linearly Constrained FLS algorithm were applied to these two-channel reverberant signals. The estimated impulse responses for the left and right channels are plotted in Figure 6.3 for comparison. As can be seen from the plots, there is no difference between the estimated impulse responses obtained with the two algorithms, which means that both constrained LS algorithms perform well in blindly identifying the binaural channel responses.

Since the Unit-norm Constrained RLS algorithm and the Robust Linearly Constrained FLS algorithm offer comparable performance, in the following study we choose the robust CFLS algorithm to blindly estimate the channel impulse response h(n) due to its simplicity.
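As a concrete illustration, the per-sample recursion of Table 6.3 can be sketched in NumPy as below. This is a minimal sketch, not the thesis implementation: the adaptation gain g(n+1) is assumed to be supplied by a separate general fast LS routine, so it is simply passed in as an argument. A useful property of the projection in step 4 is that it restores the constraint c^T q = 1 at every iteration, regardless of round-off in the earlier steps.

```python
import numpy as np

def robust_lcfls_update(q, g, u, c):
    """One iteration of the Robust Linearly Constrained FLS recursion
    (Table 6.3). q is q(n), g is the adaptation gain g(n+1) from a
    separate fast LS algorithm, u is the input data vector u(n+1) and
    c is the constraint vector."""
    z = c @ g                               # step 1: z(n+1) = c^T g(n+1)
    v = u @ q                               # step 2: v(n+1) = u^T(n+1) q(n)
    q_tilde = (q - g * v) / (1.0 - z * v)   # step 3: raw update of q
    # Step 4: re-impose the constraint c^T q = 1 (round-off corrector)
    q_new = q_tilde + c * (1.0 - c @ q_tilde) / (c @ c)
    return q_new                            # coefficient vector: h(n+1) = q(n+1)
```

In the blind identification setting of this chapter, u(n+1) would be the stacked two-channel input vector, and the estimated channel coefficient vector is simply h(n+1) = q(n+1).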
Figure 6.3: Estimated RIR for Left and Right Channel ((a) left channel, (b) right channel; each panel shows the original IR, the estimated IR using the unit-norm constraint, and the estimated IR using the component-norm constraint, plotted against the sample index).

6.4 Inverse Filtering and Optimum Order Determination

6.4.1 Adaptive Inverse Filtering

After the left/right channel impulse responses h_L(n) / h_R(n) are estimated, standard adaptive filtering approaches can be employed to estimate their inverse filters. The block diagram of the adaptive inverse filtering process is depicted in Figure 6.4. Given the non-minimum-phase character of the acoustic path, a non-causal system has to be approximated by incorporating a proper delay Δ into the inverse filter model. As a result, the resulting inverse filters h_inv,L(n) / h_inv,R(n) should satisfy the following two equations upon convergence:

h_inv,L(n) * h_L(n) = δ(n - Δ)
h_inv,R(n) * h_R(n) = δ(n - Δ)    (127)

There is a trade-off in selecting the delay. The delay should be small enough to be imperceptible to the listener while still maintaining good inverse filtering performance. In our case, for an adaptive inverse filter of 2048 taps, a delay of 400 taps is inserted, which leads to about 10 ms of delay in the recovered signal for signals sampled at 44100 Hz. The adaptive algorithm used to update the inverse filter is the RLS algorithm. The recovered anechoic signal ŝ(n) is then expressed as:

ŝ(n) = [x_R(n) * h_inv,R(n) + x_L(n) * h_inv,L(n)] / 2    (128)
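The delayed-inverse relation of Eq. (127) can be illustrated with a small batch least-squares solve. The thesis obtains the inverse filters adaptively with RLS; the direct solve below is only a sketch of the target the adaptation converges toward, with the filter length and delay as free parameters.

```python
import numpy as np

def delayed_inverse_filter(h, n_inv, delay):
    """Least-squares approximation to Eq. (127): find h_inv of length
    n_inv such that (h_inv * h)(n) ~ delta(n - delay)."""
    n_out = len(h) + n_inv - 1
    # Convolution (Toeplitz) matrix: column k holds h shifted down by k samples
    C = np.zeros((n_out, n_inv))
    for k in range(n_inv):
        C[k:k + len(h), k] = h
    d = np.zeros(n_out)
    d[delay] = 1.0                     # desired delayed unit impulse
    h_inv, *_ = np.linalg.lstsq(C, d, rcond=None)
    return h_inv
```

For a toy minimum-phase channel such as h = [1, 0.5, 0.25], convolving the result with h closely approximates δ(n - Δ); for the measured non-minimum-phase RIRs considered in the thesis, a much longer filter (2048 taps) and a delay of 400 samples are required.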
Figure 6.4: Error Signals Computation in Optimum Order Determination.

6.4.2 Optimum Order Determination

So far, all our discussion has been based on the assumption that the length of the left/right channel impulse responses is known. In real applications, the orders of h_R(n) and h_L(n) are usually unknown, which means that we need to determine the channel impulse response order first. We can search for the optimum value of the filter order P by minimizing a cost function suggested by Furuya in [57]:

PE(p) = E[e_R^2(n, p)] / E[x_R^2(n)] + E[e_L^2(n, p)] / E[x_L^2(n)]    (129)

Here the error signals e_R(n, p) and e_L(n, p) are defined as:

e_R(n, p) = x_R(n) - ŝ(n + Δ) * h_R(n, p)
e_L(n, p) = x_L(n) - ŝ(n + Δ) * h_L(n, p)

Figure 6.4 shows how to compute e_R(n, p) and e_L(n, p). Only when e_R(n, p) = 0 and e_L(n, p) = 0 does the cost function reach PE(p) = 0. Under this condition, when the optimum order is searched over a range of candidate values, the order p that minimizes the cost function PE(p) is taken as the optimum order of the estimated impulse responses. A detailed procedure for determining the optimum order p is summarized in Table 6.5.

Determination of Optimum Order P

• Input binaural signals x_R(n) and x_L(n)
• For p = p1 : p2 do
  1. Estimate h_L(n, p) / h_R(n, p) using the constrained LS algorithm
  2. Derive the inverse filters h_inv,L(n, p) / h_inv,R(n, p) using adaptive inverse filtering
  3. Estimate the anechoic signal ŝ(n, p)
  4.
Compute the cost function PE(p)
• Compare the values of PE(p) and choose the p with the minimum PE(p) as the optimum order P:

P = arg min PE(p), p ∈ [p1, p2]

Table 6.5: Procedure to Determine the Optimum Order P

6.5 Combination with Real-valued Delayless Subband Processing

When the binaural blind deconvolution algorithm we have proposed is implemented in fullband, blind identification and inverse filtering of very long RIRs are usually involved, resulting in huge computational complexity and reduced performance. Real-valued Delayless Subband Processing, as presented in Chapter 3, can help solve this problem without introducing any additional delay. The decimation operation also greatly reduces the optimum order in each subband. This is very important because the search range for the optimum order in each subband becomes much narrower, leading to greatly reduced computational cost as well. Therefore, we incorporate the open-loop Real-valued DSAF structure into the binaural dereverberation scheme.

As for the optimum order decision in each subband, we can assume that the optimum order is the same in every subband; then we only need to perform the optimum order decision in one carefully selected subband. We choose the subband with the highest power to decide the subband optimum order, because that subband plays a more important role than the others. The optimum order decided in this subband is then used in all other subbands. In this way, a lot of computation can be saved.

6.6 Computer Simulations

In this section, we demonstrate the effectiveness of our proposed Real-valued Delayless Subband binaural dereverberation method using computer simulations. In our simulations, a pair of reverberant binaural speech signals is obtained by convolving a piece of clean anechoic speech with two Room Impulse Responses h_R(n) and h_L(n) measured in a small studio.
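The order search of Table 6.5 and the highest-power subband rule of Section 6.5 can be sketched as follows. Here `run_steps_1_to_4` is a hypothetical callback standing in for steps 1-4 of Table 6.5 (channel estimation, inverse filtering, signal recovery and cost evaluation); only the cost of Eq. (129) and the search loop itself are concrete.

```python
import numpy as np

def dominant_subband(subband_signals):
    """Index of the subband with the highest average power; the optimum
    order found there is reused in all other subbands (Section 6.5)."""
    powers = [np.mean(np.abs(s) ** 2) for s in subband_signals]
    return int(np.argmax(powers))

def pe_cost(e_r, e_l, x_r, x_l):
    """PE(p) of Eq. (129), with sample means replacing the expectations."""
    return (np.mean(e_r ** 2) / np.mean(x_r ** 2)
            + np.mean(e_l ** 2) / np.mean(x_l ** 2))

def optimum_order(p1, p2, run_steps_1_to_4):
    """Table 6.5 search: evaluate PE(p) for p = p1..p2, keep the minimizer.
    run_steps_1_to_4(p) must return PE(p) for the candidate order p."""
    costs = {p: run_steps_1_to_4(p) for p in range(p1, p2 + 1)}
    return min(costs, key=costs.get)
```

Because the full estimation pipeline runs once per candidate order, restricting the search to the single dominant subband (and to the decimated, much shorter subband orders) is what makes the search affordable.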
Figures 6.6(a) and (b) show the original anechoic speech and the reverberant left/right channel speech signals. The left/right channel reverberant signals are split into 64 subbands, and optimum order determination is applied in the 2nd subband, which has the highest subband power. The resulting PE(p) for different values of the order p is plotted in Figure 6.5, which shows that the optimum order for the 2nd-subband impulse responses is 64. Accordingly, we set 64 as the subband impulse response order in all other subbands.

Figure 6.5: Cost Function PE(p) vs. Order p (the cost reaches its minimum at p = 64).

The adaptive binaural blind dereverberation approach we proposed is then applied in each subband. The resulting subband inverse filters are stacked into a fullband filter, which is then used to filter the fullband reverberant signal and recover the source signal after dereverberation. The final estimate of the original dry source signal ŝ(n) is given in Figure 6.6(c). To further verify the effectiveness of our proposed method for binaural dereverberation, we also show the equalized impulse responses for both the left and right channels in Figure 6.7. With our method, the power of the non-impulsive part, which is the reverberant part, was decreased by about 15 dB for the left channel and 23 dB for the right channel. We can also notice that a delay of 400 samples is introduced into the recovered signal by the inverse filtering process. However, at a 44100 Hz sampling rate, 400 samples correspond to only about 10 ms, which is tolerable for human perception.

6.7 Conclusions

An adaptive blind binaural dereverberation method was presented in this chapter.
The left and right channel impulse responses are first estimated using the adaptive Constrained Least-Squares algorithms, and then the dry source signal is recovered by adaptive inverse filtering. For short channel impulse responses, our method can achieve almost perfect dereverberation performance. For long RIRs, we combine it with Real-valued Delayless Subband Filters to reduce its computational complexity. Computer simulations demonstrate that our proposed Delayless Subband binaural dereverberation method can effectively suppress the reverberation with short delay. Compared to the two-channel blind deconvolution method Furuya suggested, our approach yields better dereverberation performance with less computational burden and less delay. Besides, the adaptive-algorithm-based scheme also facilitates its application in real-time tasks.

Figure 6.6: Simulation Results for Binaural Dereverberation ((a) original anechoic signal, (b) reverberant binaural signal: left/right channel, (c) recovered signal after dereverberation; all plotted against the sample index).

Figure 6.7: Results of Equalized Channel Impulse Responses ((a) left channel IR, (b) equalized IR: left, (c) right channel IR, (d) equalized IR: right, plotted against the sample index).

CHAPTER 7

SUMMARY AND OUTLOOK

7.1 Summary

The aim of this dissertation is to efficiently suppress acoustic noise in both the monaural and binaural cases.
Two kinds of acoustic noise, additive ambient noise and convolutive noise in the form of reverberation, are particularly studied. The important results achieved in this dissertation are summarized below:

• Real-valued Delayless Subband Monaural Acoustic Noise Suppression

We propose a two-stage Real-valued Delayless Subband adaptive noise suppression scheme to suppress single-channel ambient noise and convolutive noise successively. In the first stage, the APA algorithm is implemented to adaptively cancel additive noise, while in the second stage, the CMA algorithm is employed to suppress the reverberation. The Delayless SAF architecture eliminates the inherent transmission delay of conventional subband algorithms and improves the overall system performance. The real-valued subband signals produced by Single Sideband modulated filterbanks further make the implementation of APA and CMA more efficient. Computational analysis and simulation results show that our proposed method achieves good noise suppression performance, especially for additive noise reduction. Moreover, it is very efficient, with no transmission delay, which makes it highly attractive in real-time applications.
Computer simulations show that this method outperforms the original CMA algorithm based dereverberation scheme in terms of objective measures. But the improvement is limited due to the inherent correlation existed in audio signals. • Binaural Additive Noise Reduction with the Help of Binaural Model To reduce additive binaural noise, we present two different binaural colored noise reduction strategies and compare their performances. Both of them employ a simplified binaural model as the conventional VAD to get more accurate voice activity detection even in a low SNR environment. In the first scheme, a simple but effective perceptually-weighted spectral subtraction algorithm guided by auditory masking threshold is developed. Band-specific over subtraction factors and spectral floors determined by the SNR on each frame are also introduced into the subtraction rules to further reduce the musical noise and signal distortion. To suppress the residual noise caused by inaccurate estimation of masking threshold, MMSE power spectral estimator is further employed during masking threshold’s computation. The denoised signals are still binaural signals with all binaural 142 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. effect preserved. Compared to the sophisticated “cocktail-party” processors used for binaural noise reduction, our method is simpler and computationally efficient especially in reducing high colored binaural noises at low SNR. The second scheme utilizes a different idea, ANC, to reduce binaural additive noise adaptively. Left and right channel noisy signals are fed into intermittent ANC as primary signal and reference signal respectively. Adaptive update is carried out only during the speech silence segments detected by the simplified binaural model. 
Moreover, the combination of the intermittent ANC with subband processing not only decreases its implementation complexity but also gives us more freedom to select which subbands need noise reduction. The denoised signal is a monaural signal with all binaural effects lost. Simulation results demonstrate that this method can efficiently reduce non-stationary colored binaural noise to a large extent, even for noisy inputs at very low SNR.

• Binaural Dereverberation Using Constrained Least-Squares Methods

A novel adaptive binaural dereverberation strategy is developed based on Constrained Least-Squares algorithms. Both the Unit-norm Constrained RLS algorithm and the Robust Linearly Constrained FLS algorithm are investigated, and both can successfully estimate the left/right channel impulse responses blindly. RLS-based adaptive inverse filtering is then used to recover the original "dry" source signal. In order to facilitate its application in real-world situations, we also present a well-formulated cost function to determine the order of the impulse response. For short channel impulse responses, our method can achieve almost perfect dereverberation. As for long RIRs, the
Further effort is still needed to address various issues inherent to additive and convolutive noise suppression algorithms and issues involved in how to implement them more efficiently in real-time: 7.2.1 Single Channel Blind Deconvolution Incorporating Speech Characteristics Compared to additive noise reduction, convolutive noise suppression or dereverberation process presents to be more difficult and still poses a formidable challenge for audio processing, especially for single channel dereverberation where only one observation is given. In our study, a single channel dereverberation algorithm is proposed based on a blind deconvolution algorithm popularly used in communications: CMA algorithm. However, our experiments show that its performance is not that good as we expected because speech signals have some inherent properties, such as periodicity and formant structure, making their sequences statistically dependent. Nowadays, further investigation on blind deconvolution techniques, such as infomax based approach, have been proposed and achieve some good results. Therefore, the exploitation of these blind deconvolution algorithms in audio signals dereverberation could be one future research direction. 144 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Moreover, speech has some specific characteristics, such as periodicity and formant structure, which can be utilized as a clue in guiding blind dereverberation. As such, we expect more work will be carried out on how to integrate them within blind deconvolution algorithms to achieve an improved dereverberation performance. 7.2.2 Application of Blind Signal Separation into ANS Blind Source Separation (BSS)is a relatively new signal processing strategy, which was introduced by Herault and Jutten (1986). Since then, Blind Source Separation has received quite a lot of attention. 
With BSS, one wants to separate the mixture of sound sources with little prior information of the original signal sources. If one of the sources is the original audio signal and the others are noise sources, estimation of the original source is in fact an acoustic denoising operation. Therefore, we can use BSS algorithms to solve acoustic noise suppression problems. Generally speaking, the BSS algorithms can be categorized into two classes according to the mixture types. In the simplest case, the signals are just added to each other, and so called instant mixture. For this case, well tested methods have been proposed and are good and straightforward. On the other hand, for the case that the signals are convolutively mixed, such as the audio signals recorded in a real room with reverberation, the existing BSS solutions perform poorly. Consider the binaural noise reduction case, assuming one sound source and one noise source are presented. The recorded noisy binaural signals in two ear canals can be represented as: 145 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. x L («) = s(n) * hs L (n) + n(ri) * hn I (n) xR («) = ■ *(») * ^ O ) + n(ri) * hn R (n) To extract the clean audio signal s(n) out, we need to eliminate the influences of additive noise n(n) and convolutive noise hsi(n) ( i= R or L). In this way, the binaural noise suppression problem is reduced to a BSS problem with two mixtures. However, compared to the general convolutive BSS problems, blind binaural noise suppression is more challenging and difficult to solve due to the following facts: • The audio signals are usually recorded in a noisy environment with additive noises. This requires the BSS algorithms to be noise insensitive. • The audio sounds are usually recorded in an environment with lots of reverberation. The reverberation might be long, which can be modeled as an impulse response of FIR filter with thousands of taps. 
This degrades the BSS performance.

• The audio signal and noise are usually temporally correlated (non-white); some BSS methods based on both temporal and spatial independence of the sources would whiten the output signals and make the output sound unnatural.

Therefore, finding a suitable BSS algorithm that meets the above challenges when applied to binaural noise suppression deserves more study in future work in the ANS field.

REFERENCES

[1] Bees, D., Blostein, M. and Kabal, P., "Reverberant speech enhancement using cepstral processing," Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 977-980, 1991.

[2] Bertsekas, D., Nonlinear Programming, Athena Scientific, Belmont, MA, 2nd edition, 1999.

[3] Bodden, M., "Modeling human sound-source localization and the cocktail-party-effect," Acta Acustica, volume 1, pp. 43-55, 1993.

[4] Boll, S. F., "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. on Acoust., Speech, Signal Process., volume 27, pp. 113-120, April 1979.

[5] Brandstein, M. S. and Ward, D. B. (eds.), Microphone Arrays: Signal Processing Techniques and Applications, New York, NY: Springer Verlag, 2001.

[6] Cavagnolo, B. and Bier, J., "Introduction to Digital Audio Compression," Berkeley Design Technology, Inc., Internet: http://3c.nii.org.tw/3c/silicon/embedded/2003-03/lntroductionToDigitalAudioCompression.pdf

[7] Chen, G., Koh, S. N. and Soon, I. Y., "Enhanced Itakura measure incorporating masking properties of human auditory system," Signal Processing, volume 83, number 7, pp. 1445-1456, 2003.

[8] Coulson, A. J., "A generalization of nonuniform bandpass sampling," IEEE Trans. on Signal Processing, volume 43, pp. 694-704, 1995.

[9] Crochiere, R. E. and Rabiner, L. R., Multirate Digital Signal Processing, Prentice-Hall, Englewood Cliffs, N.J., 1983.
[10] Darlington, D. J. and Campbell, D. R., "Sub-band adaptive filtering applied to hearing aids," Proc. ICSLP'96, pp. 921-924, Philadelphia, USA, 1996.

[11] Ephraim, Y. and Malah, D., "Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator," IEEE Trans. on Acoustics, Speech, and Signal Processing, volume ASSP-32, no. 6, pp. 1109-1121, 1984.

[12] Ephraim, Y. and Van Trees, H. L., "A signal subspace approach for speech enhancement," IEEE Trans. on Speech and Audio Processing, volume 3, pp. 251-266, 1995.

[13] Fletcher, H., "A Space-Time Pattern Theory of Hearing," J. Acoustical Soc. Amer., volume 1, pp. 311-343, 1930.

[14] Furuya, K. and Kaneda, Y., "Two-channel blind deconvolution for non-minimum phase impulse responses," Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP-97, pp. 1315-1328, 1997.

[15] Gaik, W., "Combined evaluation of interaural time and intensity differences: psychoacoustical results and computer modeling," J. Acoustical Soc. Amer., volume 94, pp. 98-110, 1993.

[16] Gay, S. L. and Tavathia, S., "The fast affine projection algorithm," Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP-95, volume 5, pp. 3023-3026, May 1995.

[17] Gilloire, A. and Vetterli, M., "Adaptive filtering in subbands with critical sampling: analysis, experiments and applications to acoustic echo cancellation," IEEE Trans. on Signal Processing, volume 40(8), pp. 1862-1875, 1992.

[18] Grabke, J. W. and Blauert, J., "Cocktail-party processors based on binaural models," Workshop on Computational Auditory Scene Analysis, International Joint Conference on Artificial Intelligence, 1995.

[19] Haigh, J. A. and Mason, J. S., "Robust voice activity detection using cepstral features," Proc. IEEE TENCON, pp. 321-324, China, 1993.
[20] Haykin, S., Adaptive Filter Theory, Prentice-Hall, Inc., New Jersey, 2001.

[21] Hoffman, M. W., Trine, T. D., Buckley, K. M. and Van Tasell, D. J., "Robust adaptive microphone array processing for hearing aids: realistic speech enhancement," J. Acoustical Soc. Amer., volume 96(2), pp. 759-770, 1994.

[22] Huang, Y., Benesty, J. and Elko, G. W., "Adaptive eigenvalue decomposition algorithm for real-time acoustic source localization system," Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP-99, volume 2, pp. 937-940, 1999.

[23] Huo, J., Nordholm, S. and Zang, Z., "New weight transform schemes for delayless subband adaptive filtering," Global Telecommunications Conference, GLOBECOM '01, IEEE, volume 1, pp. 197-201, 2001.

[24] Itakura, F. and Saito, S., "Analysis Synthesis Telephony Based on the Maximum Likelihood Method," Proceedings of the 6th International Conference on Acoustics, C17-20, Tokyo, Japan, 1968.

[25] Itakura, F., "Minimum prediction residual principle applied to speech recognition," IEEE Trans. on Acoust., Speech and Signal Processing, volume ASSP-23, no. 1, pp. 67-72, Feb. 1975.

[26] Johnston, J. D., "Transform Coding of Audio Signals Using Perceptual Noise Criteria," IEEE J. on Select. Areas Commun., volume 6, pp. 314-323, 1988.

[27] Kamath, S. D. and Loizou, P. C., "A multi-band spectral subtraction method for enhancing speech corrupted by colored noise," Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP-92, May 1992.

[28] Klatt, D. H., "Prediction of perceived phonetic distance from critical-band spectra: a first step," Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP-82, Paris, pp. 1278-1281, May 1982.

[29] Lee, J. H. and Lee, S.-Y., "Blind dereverberation of speech signals using independence transform matrix," Proc.
of the International Joint Conference on Neural Networks, volume 2, pp. 1453-1457, July 20-24, 2003.

[30] Lindemann, W., "Extension of a binaural cross-correlation model by contralateral inhibition. I. Simulation of lateralization for stationary signals," J. Acoustical Soc. Amer., volume 80(6), pp. 1608-1622, 1986.

[31] Liu, H., Xu, G. and Tong, L., "A deterministic approach to blind equalization," Proc. The 27th Asilomar Conference on Signals, Systems, and Computers, volume 1, pp. 751-755, 1993.

[32] Lockwood, P. and Boudy, J., "Experiments with a nonlinear spectral subtractor, hidden Markov models and projection, for robust recognition in cars," Speech Commun., volume 11, pp. 215-228, June 1992.

[33] Martin, R., "Spectral subtraction based on minimum statistics," EUSIPCO-94, pp. 1182-1185, Edinburgh, Scotland, September 1994.

[34] Mathis, H. and Douglas, S. C., "Bussgang blind deconvolution for impulsive signals," IEEE Trans. on Signal Processing, volume 51, no. 7, July 2003.

[35] Mathis, H., "Nonlinear Functions for Blind Separation and Equalization," Ph.D. thesis, ETH Zurich, Hartung-Gorre, Konstanz, 2001, ISBN 3-89649-728-6.

[36] Merchant, G. A. and Parks, T. W., "Efficient solution of a Toeplitz-plus-Hankel coefficient matrix system of equations," IEEE Trans. on Acoust., Speech, Signal Processing, volume ASSP-30, pp. 40-44, 1982.

[37] Morgan, D. R. and Thi, J. C., "A delayless subband adaptive filter architecture," IEEE Trans. on Signal Processing, volume 43, no. 8, pp. 1819-1830, 1995.

[38] Moulines, E., Duhamel, P., Cardoso, J. F. and Mayrargue, S., "Subspace methods for the blind identification of multichannel FIR filters," IEEE Trans. on Signal Processing, volume 43(2), pp. 516-525, 1995.

[39] Mukai, R., Araki, S. and Makino, S., "Separation and dereverberation performance of frequency domain blind source separation," Proceedings of the ICA 2001 Conference, pp. 230-235, San Diego, Dec. 2001.
Further reproduction prohibited without permission. [40] Nokas, G., Dermatas, E., and Kokkinakis, G., “Robust speech recognition in noisy reverberant rooms,” International Workshop SPECOM'98, 1998. [41] Ozeki, K. and Umeda, T., “An adaptive filtering algorithm using an orthogonal projection to an affine subspace and its properties,” Electronics and Communication in Japan, volume 67-A, No. 5, 1984. [42] Paliwal, K. and Basu, A., “A speech enhancement method based on Kalman filtering,” Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 12 , pp. 177 -180, April 1987. [43] Patterson, R.D., Holdsworth, J. and Allerhand M., "Auditory Models as preprocessors for speech recognition," The Auditory Processing of Speech: From the auditory periphery to words, Edited by M. E. H. Schouten, pp. 67-83, Mouton de Gruyter, Berlin, 1992. [44] Quackenbush, S. R., Barnwell, T. P., Clements, M. A., "Objective Measures of Speech Quality", Englewood Cliffs, NJ, Prentice-Hall, 1988. [45] Rabiner, L. R. and Sambur, M., “An algorithm for determining the endpoints of isolated utterances,” The Bell System Technical Journal, volume 54, No. 2, pp. 297- 315, Feb. 1975. [46] Resende, L. S., Romano, J. M. T. and Bellanger, M. G., “A fast Least-Squares algorithm for linearly constrained adaptive filtering”, IEEE Trans, on Signal Processing, 44(5), pp. 1168-1174, 1996. [47] Schroeder, M. R., Atal, B. S. and Hall, J. L., “Optimizing digital speech coders by exploiting masking properties of the human ear,” J. Acoustical Soc. Amer., volume 66, pp. 1647-1652, Dec, 1979. [48] Slaney, M., “An efficient implementation of the Patterson-Holdsworth auditory filterbank,” Apple Computer Technical Report #35, 1998. [49] Soede, W., Berkhout, A. J. and Bilsen, F. A., “Development of a directional hearing instrument based on array technology,” J. Acoustical Soc. Amer., 94(2), pp. 785-798, 1993. [50] Tsoukalas, D., Paraskevas, M. 
and Mourjopoulos, J., "Speech enhancement using psychoacoustic Criteria", Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP-93, pp. 359-362, April. 1993. [51] Tucker, R., “Voice activity detection using a periodicity measure,” Proc.Inst. Elect. Eng., volume 139, pp. 377-380, Aug. 1992. [52] Vaidyanathan, P., Multirate Systems and Filter Banks, Prentice Hall, Inc., 1993. 150 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. [53] Virag, N., “Single channel speech enhancement based on masking properties of the human auditory system,” IEEE Tran, on Speech and Audio Processing, volume 7(2), pp. 126-137, 1999. [54] Wang, D.L. and Brown, G.J., “Separation of speech from interfering sounds based on oscillatory correlation,” IEEE Trans, on Neural Networks, volume 10, no.3, pp. 684- 697, 1999. [55] Widrow, B., Glover, J. R., McCool, J. M., Kaunitz, J., Williams, C. S., Hearn, R. H., Zeidler, J. R., Dong, E. and Goodlin, R. C., “Adaptive noise cancelling: principles and applications,” Proceedings of IEEE, volume 63, pp. 1692-1716, 1975. [56] Widrow, B., Steams, S. D., “Adaptive Signal Processing,” Prentice-Hall, Englewood Cliffs, NJ, 1985. [57] Wolfe, P. J., and Godsill, S. J., “On Bayesian estimation of spectral components for broadband noise reduction in audio signals,” Technical Report CUED/F- INFENG/TR.404, Department of Engineering, University of Cambridge, 2001. http://www.eecs.harvard.edu/~patrick/publications/wolfe tr404 01 .pdf [58] Yamada, Y., Ochi, H. and Kiya, H., “A subband adaptive filter allowing maximally decimation”, IEEE J. Selected Areas Comms, volume 12(9), pp.1548-1552, 1994. [59] Yegnanarayana, B. and Murthy, P. S., "Enhancement of reverberant speech using LP residual signal," IEEE Trans, on Signal and Audio Processing, volume 8(3), pp. 267- 281,2000. [60] Zwicker, E., and Fasti, H., Psychacoustics, facts and models. Springer-Verlag, Berlin, 1990. 
APPENDICES

APPENDIX A: Affine Projection Algorithm (APA)

The core recursion of our real-valued APA can be described as follows. Given the reference noise input vector on the k-th subband at step n:

n_k(n) = [n_k(n), n_k(n-1), ..., n_k(n-M+1)]^T    (A.1)

and the desired response vector, which in our case is the primary input vector:

x_k(n) = [x_k(n), x_k(n-1), ..., x_k(n-P+1)]^T    (A.2)

where M is the length of the subband adaptive filter and P is the projection order, the P×M input signal matrix A_k(n) on the k-th subband is formed as:

A_k(n) = [n_k(n), n_k(n-1), ..., n_k(n-P+1)]^T    (A.3)

The error signal on the k-th subband is obtained as:

e^1_k(n) = x_k(n) - A_k(n) w^1_k(n)    (A.4)

Thus, we get the subband adaptive weight update:

w^1_k(n+1) = w^1_k(n) + μ A_k^T(n) [A_k(n) A_k^T(n) + δI]^{-1} e^1_k(n)    (A.5)

where μ is the adaptation step size limited to 0 < μ < 2, the scalar δ is a regularization parameter used to cope with ill-conditioning in the matrix inversion, and I is a P×P identity matrix.

APPENDIX B: Constant Modulus Algorithm (CMA)

Let u_k(n) be the input signal to the CMA, which is just the k-th subband signal of the reverberant signal u(n) = g(n)*s(n). First, we define the constant modulus R_k for each subband as follows:

R_k = E[|s_k(n)|^4] / E[|s_k(n)|^2]    (A.6)

where s_k(n) denotes the clean, non-reverberant signal of the k-th subband. The goal of the CMA is to minimize a non-convex cost function J_k(n) on each subband based on the a priori knowledge of the constant modulus R_k.
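As a concrete sketch of the APA recursion in Eqs. (A.1)-(A.5) — not the thesis implementation — the per-iteration update fits in a few lines of NumPy. The toy system-identification driver below (the signal names n_sig and d_sig are ours) checks that the weights converge to a known FIR response when the primary input is a filtered copy of the reference noise:

```python
import numpy as np

def apa_step(w, A, x, mu=0.5, delta=1e-6):
    """One affine-projection update, following Eqs. (A.3)-(A.5).

    w : (M,)  current subband adaptive weights
    A : (P,M) input matrix; row j is [n(i-j), ..., n(i-j-M+1)]
    x : (P,)  desired (primary-input) vector, Eq. (A.2)
    """
    e = x - A @ w                                  # a priori error, Eq. (A.4)
    P = A.shape[0]
    # w(n+1) = w(n) + mu * A^T (A A^T + delta*I)^{-1} e,  Eq. (A.5)
    w = w + mu * A.T @ np.linalg.solve(A @ A.T + delta * np.eye(P), e)
    return w, e

# Toy system-identification run: w should converge to h_true.
rng = np.random.default_rng(0)
M, P = 8, 4
h_true = rng.standard_normal(M)                    # unknown FIR response
n_sig = rng.standard_normal(4000)                  # reference noise
d_sig = np.convolve(n_sig, h_true)[:len(n_sig)]    # primary input
w = np.zeros(M)
for i in range(M + P, len(n_sig)):
    A = np.array([n_sig[i - j : i - j - M : -1] for j in range(P)])
    x = d_sig[i : i - P : -1]
    w, e = apa_step(w, A, x)
print(np.max(np.abs(w - h_true)))                  # small after convergence
```

With P = 1 the update collapses to the familiar regularized NLMS step, which is why the APA is often described as a higher-order generalization of NLMS.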
The cost function J_k(n) has the form:

J_k(n) = E[(R_k - |y_k(n)|^2)^2]    (A.7)

Correspondingly, the adaptive filter coefficients are updated as:

w^2_k(n+1) = w^2_k(n) + μ · y_k(n) · (R_k - |y_k(n)|^2) · u_k(n)    (A.8)

where w^2_k(n) represents the k-th subband adaptive filter coefficient vector and u_k(n) is the vector of data in the adaptive filter of the k-th subband. Thus, we obtain the estimate ŝ_k(n) of the clean subband signal s_k(n) using the zero-memory nonlinear function N_l(·):

ŝ_k(n) = N_l(y_k(n)) = y_k(n) · (1 + R_k - |y_k(n)|^2)    (A.9)

APPENDIX C: Efficient Subband Decomposition

Among the various subband structures, the polyphase structure helps make the implementation of analysis filter banks more efficient.

C.1 Polyphase Representation

The polyphase branch filters for a filter h(n) are defined as:

p_p(n) = h(nM + p),  p = 0, 1, ..., M-1    (A.10)

Now consider an M-to-1 decimator as shown in Figure C.1.

Figure C.1: M-to-1 decimator: x(n) → h(n) → ↓M → y(n)

Using the input-to-output relation of the decimator, we get:

y(n) = Σ_{k=-∞}^{∞} h(k) x(nM - k)    (A.11)

By substituting k = rM + p, y(n) can be written as the double summation:

y(n) = Σ_{p=0}^{M-1} Σ_{r=-∞}^{∞} h(rM + p) x((n-r)M - p)
     = Σ_{p=0}^{M-1} Σ_{r=-∞}^{∞} p_p(r) x_p(n-r) = Σ_{p=0}^{M-1} p_p(n) * x_p(n)    (A.12)

where x_p(n) = x(nM - p), p = 0, 1, ..., M-1, and * denotes convolution. In this way, the polyphase decimator structure can be realized practically with the aid of a commutator that distributes the input samples to the branches, rotating clockwise and starting at the p = 0 branch at time n = 0 (see Figure C.2).

Figure C.2: Clockwise commutator model for the polyphase structure of an M-to-1 decimator

C.2 Implementation of Uniform DFT Filter Banks

The uniform DFT filter bank is an important technique used to design a class of filter banks with equal passband widths.
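A minimal numerical sketch of the CMA recursion in Eqs. (A.6)-(A.9) — ours, not the thesis implementation: a two-tap channel driven by a ±1 source (for which R_k = 1) is equalized by the real-valued CM update, and the residual modulus error is printed at the end. The channel, step size, and filter length are illustrative choices:

```python
import numpy as np

def cma_step(w, u, mu, R):
    """One real-valued CM update, Eq. (A.8), plus the estimate of Eq. (A.9).

    w : (L,) equalizer taps (w^2_k(n) in the text)
    u : (L,) most recent input samples [u(n), ..., u(n-L+1)]
    """
    y = w @ u
    w = w + mu * y * (R - y * y) * u               # Eq. (A.8)
    s_hat = y * (1.0 + R - y * y)                  # zero-memory estimate, Eq. (A.9)
    return w, y, s_hat

# Toy run: equalize a mild FIR channel driven by a constant-modulus source.
rng = np.random.default_rng(1)
s = rng.choice([-1.0, 1.0], size=20000)            # clean source, R = 1
u = np.convolve(s, [1.0, 0.4])[:len(s)]            # reverberant input u = g * s
L = 8
w = np.zeros(L)
w[0] = 1.0                                         # leading-tap initialization
ys = []
for i in range(L, len(u)):
    w, y, s_hat = cma_step(w, u[i : i - L : -1], mu=2e-3, R=1.0)
    ys.append(y)
disp = np.mean(np.abs(np.abs(np.array(ys[-1000:])) - 1.0))
print(disp)                                        # residual modulus error, << 0.4
```

Before adaptation the output modulus deviates from 1 by 0.4 on every sample; after convergence the deviation collapses toward the step-size-limited misadjustment floor. Note that the CM cost is blind to sign and delay, so w converges to the channel inverse only up to those ambiguities.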
Consider a K-band uniform DFT analysis filter bank. Its filters are uniformly spaced with an even-type stacking arrangement, so the center frequencies of the channels are:

ω_k = 2πk/K,  k = 0, 1, ..., K-1    (A.13)

Defining W_K = e^{j2π/K}, the k-th channel signal can be expressed as:

X_k(m) = Σ_{n=-∞}^{∞} h(mM - n) x(n) W_K^{-kn},  k = 0, 1, ..., K-1    (A.14)

If we define a set of bandpass filters as:

h_k(n) = h(n) W_K^{kn} = h(n) e^{j2πkn/K}    (A.15)

then an alternative interpretation of the uniform DFT filter bank can be developed as follows:

X_k(m) = W_K^{-kmM} Σ_{n=-∞}^{∞} x(n) h_k(mM - n)    (A.16)

This interpretation is shown in Figure C.3.

Figure C.3: Complex bandpass filter h_k(n) = h(n)W_K^{kn} followed by the modulator W_K^{-kmM}: interpretation of the k-th channel of the uniform DFT filter bank analyzer

Now we consider a special case of the oversampled uniform DFT filter bank, in which the decimation factor M is related to the number of subbands K by K = IM, where I is a positive integer. (If I = 1 the bank is critically sampled; if I = 2 it is oversampled by a factor of two.) Under this condition, making the change of variables

n = rK + p,  p = 0, 1, ..., K-1    (A.17)

we can rewrite Eq. (A.14) as:

X_k(m) = Σ_{r=-∞}^{∞} Σ_{p=0}^{K-1} h(mM - rK - p) x(rK + p) W_K^{-kp}
       = Σ_{r=-∞}^{∞} Σ_{p=0}^{K-1} h((m - rI)M - p) x_p(r) W_K^{-kp}    (A.18)

with x_p(r) = x(rK + p), p = 0, 1, ..., K-1, r = 0, ±1, ... (the exponent reduces to -kp because W_K^{-krK} = 1). The form of Eq. (A.18) becomes clearer if we define a modified, or extended, set of polyphase filters:

P_p(m) = h(mM - p),  p = 0, 1, ..., K-1    (A.19)

Applying this definition to Eq. (A.18) then leads to the form:

X_k(m) = Σ_{p=0}^{K-1} W_K^{-kp} [ Σ_{r=-∞}^{∞} P_p(m - rI) x_p(r) ]    (A.20)

Eq. (A.20) shows that X_k(m) has the form of a Discrete Fourier Transform (DFT) of the branch signals Σ_r P_p(m - rI) x_p(r).
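The equivalence between the direct channel computation of Eq. (A.14) and the polyphase/FFT form of Eq. (A.20) can be checked numerically. The sketch below is ours (the helper names direct_bank and polyphase_bank, and the sizes K = 8, M = 4, are illustrative only); the polyphase branch sums feed a single K-point FFT, which is where the computational saving comes from:

```python
import numpy as np

def direct_bank(x, h, K, M, m):
    """K channel outputs at frame m, computed directly from Eq. (A.14)."""
    n = np.arange(len(x))
    idx = m * M - n                                # argument of h(mM - n)
    hv = np.where((idx >= 0) & (idx < len(h)), h[np.clip(idx, 0, len(h) - 1)], 0.0)
    return np.array([np.sum(hv * x * np.exp(-2j * np.pi * k * n / K))
                     for k in range(K)])

def polyphase_bank(x, h, K, M, m):
    """Same outputs via Eq. (A.20): K polyphase branches + one K-point FFT."""
    I = K // M                                     # oversampling factor, K = I*M
    v = np.zeros(K)                                # branch signals sum_r P_p(m-rI) x_p(r)
    for p in range(K):
        for r in range((len(x) - p - 1) // K + 1):
            t = (m - r * I) * M - p                # P_p(m - rI) = h((m - rI)M - p)
            if 0 <= t < len(h):
                v[p] += h[t] * x[r * K + p]        # x_p(r) = x(rK + p)
    return np.fft.fft(v)                           # X_k(m) = sum_p W_K^{-kp} v_p

rng = np.random.default_rng(2)
K, M = 8, 4                                        # K = I*M with I = 2
h = rng.standard_normal(32)                        # stand-in prototype filter
x = rng.standard_normal(64)
print(np.allclose(direct_bank(x, h, K, M, 5), polyphase_bank(x, h, K, M, 5)))
```

np.fft.fft computes exactly the Σ_p v_p W_K^{-kp} sum of Eq. (A.20), so the two routines agree to machine precision frame by frame.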
This DFT can be performed in O(K log2 K) operations using the FFT algorithm, which makes the overall polyphase filter bank structure more efficient. Meanwhile, note that the term in brackets in Eq. (A.20) simply defines an I-fold sample interpolator. We therefore obtain the polyphase structure for the uniform DFT filter bank analyzer with K = IM shown in Figure C.4.

Figure C.4: Polyphase structure for the filter bank analyzer for K = IM: K polyphase branches P_p(m), each fed by an I-fold interpolated x_p(r), followed by a K-point FFT

C.3 Implementation of Generalized DFT Filter Banks (GDFT)

Consider the generalized DFT (GDFT) filter bank defined in the previous chapter, which can be expressed by the following equation:

X_k^{GDFT}(m) = Σ_{n=-∞}^{∞} h(mM - n) x(n) W_K^{-(k+k_0)(n+n_0)}    (A.21)

We again consider only the case K = IM in this section. Applying the same definitions as in Eq. (A.13) and Eq. (A.15), but with the bandpass filters now modulated by the offset frequency index, h_k(n) = h(n) W_K^{(k+k_0)n}, we can rewrite Eq. (A.21) as:

X_k^{GDFT}(m) = W_K^{-(k+k_0)(mM+n_0)} Σ_{n=-∞}^{∞} h_k(n) x(mM - n)
             = W_K^{-(k+k_0)n_0 - k_0 mM} · W_K^{-kmM} Σ_{n=-∞}^{∞} h_k(mM - n) x(n)
             = W_K^{-(k+k_0)n_0 - k_0 mM} · X_k^{DFT}(m)    (A.22)

where X_k^{DFT}(m) denotes the output of the uniform DFT structure of Eq. (A.16) operating on the pre-modulated prototype h(n)W_K^{k_0 n}. Comparing this with Eq. (A.16), we obtain the polyphase structure for the GDFT filter bank analyzer shown in Figure C.5, built on the efficient implementation of the uniform DFT filter bank: each channel of the uniform DFT structure is simply followed by the multiplier W_K^{-(k+k_0)n_0 - k_0 mM}.

Figure C.5: Polyphase structure for the GDFT filter bank analyzer for K = IM: the polyphase uniform DFT filter bank of Figure C.4 followed by per-channel multipliers W_K^{-(k+k_0)n_0 - k_0 mM}
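Eq. (A.22) says that a GDFT analyzer is just a uniform DFT analyzer running on the pre-modulated prototype h(n)W_K^{k_0 n}, followed by fixed per-channel multipliers. The identity can be verified numerically; the sketch below is ours (the helper names and the offsets k_0 = 1/2, n_0 = 2 are illustrative choices, k_0 = 1/2 giving the usual odd-stacked bank):

```python
import numpy as np

K, M, k0, n0 = 8, 4, 0.5, 2.0                      # K = I*M with I = 2; GDFT offsets

def gdft_direct(x, h, m):
    """Eq. (A.21) evaluated channel by channel."""
    n = np.arange(len(x))
    idx = m * M - n
    hv = np.where((idx >= 0) & (idx < len(h)), h[np.clip(idx, 0, len(h) - 1)], 0.0)
    return np.array([np.sum(hv * x * np.exp(-2j * np.pi * (k + k0) * (n + n0) / K))
                     for k in range(K)])

def gdft_via_dft(x, h, m):
    """Eq. (A.22): uniform DFT bank on the pre-modulated prototype,
    followed by the per-channel multipliers W_K^{-(k+k0)n0 - k0 mM}."""
    g = h * np.exp(2j * np.pi * k0 * np.arange(len(h)) / K)   # h(n) W_K^{k0 n}
    n = np.arange(len(x))
    idx = m * M - n
    gv = np.where((idx >= 0) & (idx < len(g)), g[np.clip(idx, 0, len(g) - 1)], 0.0)
    X_dft = np.array([np.sum(gv * x * np.exp(-2j * np.pi * k * n / K))
                      for k in range(K)])
    k = np.arange(K)
    return np.exp(-2j * np.pi * ((k + k0) * n0 + k0 * m * M) / K) * X_dft

rng = np.random.default_rng(3)
h = rng.standard_normal(32)                        # stand-in prototype filter
x = rng.standard_normal(64)
print(np.allclose(gdft_direct(x, h, 5), gdft_via_dft(x, h, 5)))
```

Since the post-multipliers depend only on k and m, the GDFT bank inherits the full O(K log2 K) polyphase/FFT efficiency of the uniform DFT bank at the cost of K extra complex multiplies per frame.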