ERROR TOLERANCE APPROACH FOR SIMILARITY SEARCH PROBLEMS

by

Hye-Yeon Cheong

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)

May 2011

Copyright 2011 Hye-Yeon Cheong

Dedication

To my loving Father,
To my beloved husband, Alexis Tourapis,

Acknowledgements

Foremost, I owe a great debt of gratitude to my advisor, Prof. Antonio Ortega, for his patience, inspiration, continuous support and encouragement, and friendship throughout my PhD study. I feel very lucky to have him as my advisor and I would have been lost without him.

I would also like to express my sincere gratitude to the rest of my thesis committee: Prof. Sandeep Gupta, Prof. Cyrus Shahabi, Prof. Jay Kuo, and Prof. Melvin A. Breuer, for their insightful comments and hard questions.

I wish to thank my parents and sisters for their support and love. I would like to thank my mother-in-law, Litsa, for her patience, love, and invaluable advice.

Most importantly, words cannot describe how thankful I am to my dearest husband, Alexis, for his unconditional love and support.

Table of Contents

Dedication
Acknowledgements
List of Tables
List of Figures
Abstract

Chapter 1: Introduction
1.1 Relaxing the Requirement of Exact Solutions
1.1.1 Fault and Defect Tolerant System
1.1.2 Error Tolerant System
1.2 Proximity Problems
1.2.1 Nearest Neighbor Search (NNS) Problem
1.2.1.1 Definition of NNS problem
1.2.1.2 Central to a wide range of applications
1.2.1.3 Computational challenge
1.2.1.4 Tends to be tolerant to approximation
1.2.2 Motion Estimation Process
1.2.3 Vector Quantization for Data Compression
1.3 Contributions and Overview of the Thesis

Chapter 2: Approximation Algorithm for Nearest Neighbor Search Problem
2.1 Related Work
2.2 NNS Performance Evaluation Measure
2.2.1 ε-NNS Problem and Performance Measure
2.2.2 Proposed ε̄-NNS Problem Setting
2.2.3 Error Rate, Accumulation, Significance, and Variation
2.3 Interpretation of NNS Approximation Using ε̄ Measure
2.3.1 NNS Problem Modeling
2.3.2 NNS Algorithm Modeling
2.3.3 NNS Algorithm Performance
2.3.4 Simple Case Study & Motivation
2.3.4.1 Nonlinear scaling on metric computation resolution
2.3.4.2 Homogeneity of viewpoints
2.4 Query Adaptive Nearest Neighbor Preserving Metric Approximation Algorithm (QNNM)
2.4.1 Problem Formulation
2.4.2 Observations & Characteristics of Proposed QNNM
2.4.3 Basic Structure of Proposed QNNM
2.4.3.1 Quantization
2.4.3.2 Non-uniform scalar quantization within the QNNM metric
2.4.3.3 Quantization-based metric design conditions
2.4.3.4 Avoiding the overhead complexity
2.4.3.5 Q-space vs. target space
2.5 Finding the Optimal Q
2.5.1 Proposed Stochastic Optimization Algorithm to Find the Optimal Q
2.5.1.1 Modeling input data set distribution
2.5.1.2 Objective function formulation
2.5.1.3 Preprocessing based objective function evaluation
2.5.1.4 Preprocessing and search process co-design
2.6 Complexity-Performance Analysis
2.6.1 Complexity Analysis
2.6.2 Performance Analysis
2.7 Example Applications
2.7.1 Motion Estimation for Video Compression
2.7.2 Vector Quantization for Data Compression
2.8 Conclusions and Future Work

Chapter 3: Fault Effect Modeling for Nearest Neighbor Search Problem
3.1 Introduction
3.2 NNS Metric Computation Architecture & SA Fault Model
3.3 Quality Measure for Nearest Neighbor Search with Faulty Hardware
3.4 Multiple Fault Effect Modeling
3.4.1 Binary Adder Tree and Fault Characterization
3.4.2 Multiple Fault Effect Modeling for Nearest Neighbor Search Metric Computation
3.4.3 Multiple Faults Effect Modeling for Matching Process
3.5 Error Tolerant Algorithm Design Analysis for Nearest Neighbor Search
3.6 Error Tolerant Architecture Design Analysis
3.7 Conclusions and Future Work

Chapter 4: Conclusions and Research Directions

Bibliography

List of Tables

2.1 Generalized metric function structure of our interest (average/sum of each dimensional distance measurement) and corresponding metric function examples.

2.2 The average number of transitions (a measure of dynamic power consumption) and the number of gates (a measure of circuit size and static power consumption) for various types of adders (a) and multipliers (b) for different input bit sizes. [7]

List of Figures

1.1 Simple illustration of the error tolerance concept. Relaxing the requirement of an exact solution/output to a small range ε_tol of near-optimal/acceptable solutions may dramatically reduce the cost. Cost can be interpreted as computational/circuit complexity, power, manufacturing, verification, testing cost, and others. Curves or slopes of this trade-off relation can be controlled by different ET (approximation) algorithms or ET based design/synthesis/testing strategies. Curves shown in this figure are collected from different applications (motion estimation for video coding and vector quantization for image coding) when our proposed ET based algorithm (QNNM in Chapter 2) is applied.

1.2 Simplified illustration of a typical digital circuit design flow. The ET approach, i.e., relaxed constraints on performance, may be applied to each design and synthesis phase.

1.3 Motion estimation process. The current block, as a query vector, aims at selecting the motion vector that minimizes the distance metric.

1.4 Block diagram of a hybrid coding scheme with motion compensating prediction.

1.5 Vector quantization partitions a vector space R^D into a finite set of non-overlapping Voronoi regions. Each region is associated with a code vector and the set of all the code vectors is called a codebook. (a) illustrates an example design of a codebook for a 2D vector space. (b) illustrates how each input vector is encoded with and decoded from the index of its nearest code vector.

2.1 Illustration of the approximate NNS problem (ε-NNS), where q and p* represent a query and its exact NN, respectively.

2.2 Illustration of the error probability distribution over error-significance. Error-rate is defined as the sum of all non-zero error-significance probabilities. The ε̄ measure is equal to the mean value of a given distribution. An ET based acceptance curve would appear as a vertical line, determining whether a given distribution is classified as acceptable or unacceptable depending on its mean value (ε̄).

2.3 (a) Example illustration of a simulated distance distribution F(x) and its corresponding NN-distance distribution F_min(x) for |R| = 100. The shaded area represents the expected NN-distance E(d(q, NN(q))). (b) Illustration of both the distance distribution F(x) and its corresponding NN-distance distribution F_min(x) based on actual data collected from the motion estimation process for a video coding application, using a linear search with |R| approximately 4000. F_q(x) and F_min,q(x) are both averaged distributions. Five representatively different video sequences in terms of their distance distribution were chosen.

2.4 Graphical illustration of DE (= E(d(r_A,q)) − E(d(r*,q)) = solid shaded area minus shaded area with hatching) for several simple NNS algorithms. The three graphs illustrate the performance of three different algorithms which respectively (a) simply return a randomly chosen point from R as the approximate NN point, (b) randomly subsample R and perform a linear search to find the minimum distance point, and (c) discard some information in the distance metric computation (e.g., subsampling dimensionality), perform a linear search, and return the resulting NN point. Both (b) and (c) reduce complexity by half (reducing the number of search points N in (b) and the dimensionality D in (c)).

2.5 (a) Linear approximation of the NN-distance distribution F_min(x). (b) Linear approximation of the distance distribution F(x). (c) Nonlinear approximation of F_min(x). (d) Nonlinear approximation of F(x). These four figures illustrate the impact on NNS performance of reducing the metric computation resolution/precision (staircase functions: 16 representation levels in these example graphs) instead of blindly computing each distance metric to full precision (solid blue line: e.g., 65536 representation levels if a 16-bit depth is used).

2.6 Illustrates that most of the critical region in which distance must be measured accurately belongs to less than 1% of a given RDD distribution (red box).

2.7 Two dimensional example illustration of the proposed QNNM algorithm and the relation between ψ_q and q and their related spaces. F_R, F_V, and F_Q indicate the distributions of data set R represented in the original (U,d), viewpoint (U_V,d_V), and quantized viewpoint (U_Q,d_Q) spaces, respectively. This shows how ψ_q can be obtained from q such that the overhead computation of finding ψ_q prior to each query can be avoided.

2.8 Comparison of a conventional quantization scheme and our proposed quantization based metric function design.

2.9 Example illustration of hardware architectures of three metric computations. (a) The Minkowski metric d(q,r) = (Σ_j |q_j − r_j|^p)^{1/p} is shown as an example original metric d. (b) shows the proposed QNNM metric d_obj(q,r) = d_ψ(ψ_q(r)), which approximates d. ψ_qj is a query-dependent non-uniform scalar quantizer which compresses the input to fewer bits (typically 8 or 16 bits into 1 bit), replacing the |q_j − r_j|^p computation in (a). The blank circle represents an operator determined by d_ψ; e.g., it can be an adder if d_ψ(x) = Σ x_j, a comparator if d_ψ(x) = max(x_j), or the logical OR operator if d_ψ(x) = ∨x_j. (c) is the metric equivalent to d_obj of (b), represented with a query-independent quantizer Q_j. The Q_j minimizing ε̄ (2.17) is found via off-line optimization and used to determine ψ_qj, which equivalently minimizes the average NNS error ε̄.

2.10 Trade-off between complexity (cost) and NNS accuracy degradation (ε̄); e.g., coarser quantization causes higher NNS distortion ε̄ at lower complexity cost, while finer quantization leads to lower ε̄ at the expense of higher cost. Different applications (i.e., different F_V) result in different trade-off curves. The right-hand side of the curve represents the region of inefficient choices of Q and d_Q, while the left side represents infeasible choices of Q and d_Q design. Our goal is to design Q and d_Q, given F_V and ε̄_tol, such that the resulting complexity and ε̄ pair corresponds to the red points on the curves, achieving the lowest complexity with ε̄ ≤ ε̄_tol. (Left) QNNM performance: the trade-off curve between accuracy error (ΔPSNR in dB) and complexity. QNNM is applied to the vector quantization process for image coding and the motion estimation process for video coding. (Right) With the data set R ∼ N_D(0,Σ) with ρ_ij = 0.5, σ² = 100 for all i,j, this compares the performance of three approximate NNS methods: bit truncation (BT), dimension subsampling (DS), and QNNM.

2.11 (a) A simple example of a 2D viewpoint space partitioned by Q = {θ_1, θ_2, θ_3}. The P_z, U_z values at all six gray points need to be retrieved from F_V, H_V to compute f_obj. (b) 3D search space of Q; x_1, x_2, x_3 are three arbitrarily chosen candidate solutions for Q. (c) shows the preprocessed structures (F_V, H_V holding P_z, U_z values at all points in (c)) needed to compute f_obj for x_1, x_2, x_3. Moreover, f_obj1 for all gray points in (b) can be computed with the same F_V, H_V in (c).

2.12 An example illustration of the grid-based optimization algorithm with ω=2, γ=3 in a 2D search space. It searches for the optimal Q* (quantization parameters) for QNNM (2.15). It shows the iterations of either moving or 1/γ-scaling a grid G of size ω×ω. Note that this should not be confused with ME search or NNS search algorithms.

2.13 The proposed approach leads to complexity reductions in i) the summation process after quantization (significant circuit complexity reduction due to bit size compression), and ii) the set of dimension-distance computations prior to quantization (the dimension-distance computation becomes unnecessary).

2.14 Diagrams of arithmetic circuits for (a) an N-bit ripple carry adder and (b) a 4-bit array multiplier.

2.15 Complexity behavior comparison of conventional ℓ_1 and ℓ_2 norm metric computation vs. the proposed distance quantization based ℓ_p norm metric computation, with respect to input bit size and dimensionality.

2.16 QNNM performance when the NNS data set R is modeled with a multivariate normal distribution R ∼ N_D(0,Σ) with different dimensions D and covariance matrices Σ having identical marginal variances σ²_j ∀j and covariances ρ_ij ∀i,j across all dimensions. The straight blue line (no metric) represents the average distance per dimension from a query to any randomly chosen object, which can be considered the worst bound on NNS performance. The blue curves (average distance/D to the exact NN) represent the lower bound on NNS performance (original metric). (a) ℓ_1 as original metric; R ∼ N_D(0,Σ) with σ²=100 ∀j and ρ_ij = 0, 0.5, 0.8, 0.95 ∀i,j. (b) The same setting as (a) except σ²=400 ∀j. (c) The same setting as (a) except the original metric is ℓ_2 (Euclidean distance). (a) and (b) show that the degree of dispersion of the data set of interest does not affect QNNM performance in terms of NN approximation error ε̄. However, the stronger the correlation ρ and the higher the dimension, the more advantageous QNNM becomes.

2.17 Different representation of Fig. 2.16, showing QNNM performance with respect to the NNS approximation error ε̄ over different dimensionalities and covariances. All results use the same data set model settings as those of Fig. 2.16 with different original/benchmark metrics: (Left) ℓ_1 and (Right) ℓ_2.

2.18 Performance comparison of bit truncation (BT), dimensional subsampling (DS), and the proposed QNNM method for three different metrics: (a) ℓ_1 norm, (b) ℓ_2 norm, and (c) weighted ℓ_2 norm distance metric. The x-axis represents how much metric complexity is reduced, in percent, while the y-axis represents the average performance degradation ε̄ in finding the NN. The input distribution of the data set is R ∼ N_D(0,Σ) with σ²_j = 100, ρ_ij = 0.5 ∀i,j.

2.19 2D statistical distribution of motion estimation data. (a) shows the distribution of candidate blocks (in this example, two adjacent pixels) for motion searching, i.e., F in metric space. (b) represents the distribution of differences/distances between a query block and candidate blocks, i.e., F_V in viewpoint space. (c) shows the distribution of differences/distances between a query block and its best matching block (NN), i.e., F_NN in viewpoint space.

2.20 Rate-distortion curves for 9 different CIF test sequences. They compare ME performance with the original metric (SAD) and with the proposed 1-bit QNNM under three different ME settings and two different search algorithms. The ΔdB shown represents the average performance (PSNR) difference between the original metric and 1-bit QNNM.

2.21 Average QNNM performance over 9 different test sequences. QNNM is performed with full search (left) and with a representative fast search method, EPZS (right). Numbers in parentheses in the graph legend represent the number of metric computations performed per query macroblock. Three different ME settings from simple to complex were tested: (i) 16×16 block (D = 256) with forward prediction only and full-pel, (ii) same as (i) but allowing bi-directional prediction, and (iii) variable block sizes, quarter-pel accuracy, and bi-directional prediction. With a 1-bit quantizer per dimension/pixel used with full search, on average 0.02dB, 0.06dB, and 0.09dB performance loss is incurred for ME settings (i), (ii), and (iii), respectively. Similarly, a 1-bit quantizer used with EPZS results in average -0.01dB, 0.02dB, and 0.02dB performance loss for ME settings (i), (ii), and (iii), respectively.

2.22 (Left) comparison of our objective function f_obj1 simulated numerically with different input distribution settings. (Right) compares the objective function f_obj1 based on the collected ME data (dashed lines) with simulated experiments excluding intra and skip modes (solid lines). Both use 1-bit quantization, hence a single threshold (x-axis).

2.23 Illustration of the performance degradation in coding efficiency in relation to the 1-bit quantizer threshold value for various input sequences. Representatively different test sequences (low/high motion, texture) were selected to cover a wide range of variation in input sequences. The result shows very low sensitivity to input sequence variation in terms of the optimal threshold value. This confirms the high homogeneity of viewpoints towards nearest neighbors, the assumption on which our proposed algorithm is based. The threshold range is from 0 to 255, while only the 0 to 120 range is shown.

2.24 Comparison of video compression performance, in the rate distortion sense, for five different motion estimation scenarios: i) conventional full search and full metric computation, ii) reduced checking points, iii) reduced dimensionality, iv) uniform quantization (bit truncation), and v) the proposed 1-bit quantization based metric computation. Scenarios ii), iii), and iv) have the same complexity reduction.

2.25 Comparison of complexity-performance trade-offs for four different scenarios: each scenario reduces i) the size of the data set R, ii) the dimensionality of each data point r ∈ R, iii) the bit depth of each data dimension by truncating least significant bits (equivalent to uniform quantization of each data dimension), or iv) the resolution of each dimension-distance (the proposed distance quantization based metric computation). The x-axis represents complexity as a percentage of the original full computation. The y-axis represents the RD performance loss measured in dB; zero represents no performance degradation.

2.26 2D statistical distribution of vector quantization data. (a) shows the distribution of the input training image data used to generate a codebook. (b) shows the distribution of a set of code vectors (codebook of size 512) over which NNS is performed for every query vector to find its closest code vector (F in metric space). (c) represents the distribution of differences/distances from a query vector to every code vector (F_V in viewpoint space). (d) shows the distribution of distances from a query vector to its NN/best matching codeword (F_NN in viewpoint space).

2.27 Rate-distortion curves of VQ based image coding performance for 4 gray-scale 512x512 test images. Four different VQ scenarios are compared: (i) generalized Lloyd algorithm (GLA) based VQ codebook with ℓ_2 distance (standard VQ), (ii) tree-structured vector quantization (TSVQ), and (iii)(iv) GLA based VQ codebook with 1-bit QNNM and 2-bit QNNM. Performance was compared for dimensionalities 8, 16, and 32 and for different codebook sizes: 32, 64, 128, 256, 512, 1024. 2-bit QNNM outperforms the TSVQ method.

2.28 Average performance loss (averaged over different codebook sizes and test images, in PSNR) by QNNM decreases as the number of quantization levels increases. The right axis represents the average number of bits per dimension/pixel allowed for QNNM. Three different vector sizes/dimensionalities, 8, 16, and 32, are shown. This result, as well as that of the ME application, supports our previously drawn performance analysis that QNNM performance improves with dimensionality.

2.29 While VQ input data is identically distributed in each dimension or pixel, the optimal Q thresholds are not identical in all dimensions. (a) and (b) show 2D examples of having (a) identical and (b) different q thresholds for each dimension. The difference between optimal Q thresholds across dimensions increases, and performance improves, as correlation between dimensions becomes stronger. (c) shows how much the performance of VQ based image coding (goldhill with D=16) improves as the q threshold is optimized to exploit correlation information between dimensions. This also holds for the ME application and others involving data sets with high correlation.

3.1 Simple illustration of the single stuck-at fault model.

3.2 (Upper) Four examples of MMC architectures for NNS metric computation, represented as dependence graphs, where only processing elements (PEs) are shown for simplicity. AD denotes a processing element which is, for the ℓ_1 metric for instance, an absolute difference and addition. M denotes a minimum value computation. (Lower) Tree structured flow graph corresponding to the type-2 MMC architecture on the left. In this graph, the processing element (leaf nodes, shown here as AD) and addition (inner nodes) computations are separated; thus AD denotes only the distance computation for each dimension (e.g., absolute difference for the ℓ_1 metric). If the ℓ_2 metric is used as a cost metric, leaf nodes become PEs computing a squared difference.

3.3 Example error rate and significance due to an SSA fault within the MMCA for a motion estimation application.

3.4 (Left) Rate-distortion impact of faults. (Right) Temporal quality variations due to faults.

3.5 The rate control scheme distributes the fault impact over time. An SSA fault within the ME metric computation circuit is simulated.

3.6 Illustration of a single stuck-at fault and its impact on the output of the faulty interconnect (upper), and on the output of the final matching metric cost (lower).

3.7 Examples of multiple SA fault cases with different dependency relations between faults: (a) two serially placed faults in the same path of the circuit, (b) two parallel faults placed in separate paths, and (c) a simple combination of cases (a) and (b). These relations are illustrated more clearly in the fault dependency graphs shown below the binary trees. The metric computation (MC) error function Δ(D) can be formulated via functions of {H_i(D)}_{i=1}^{M} defined in (3.2), using two basic operators {⊙,+} (function addition and a variant of the function composition operator defined in (3.3)), according to the dependency relation of a given set of faults as shown above, where D denotes the final cost value (e.g., SAD).

3.8 Δ(D) computation for each of the three cases described in Figure 3.7. Faults f_1 and f_2 are SA1 faults with parameters {1,p_1,α_1} and {1,p_2,α_2} respectively, and f_3 is an SA0 fault with {0,p_3,α_3}. (a) A fault f_1 affects the input to the next fault position f_2. Thus, the total cost shift at D (the MC error function Δ(D)) is the sum of the cost shift due to f_1 at D, H_1(D), and that of f_2 at D shifted by H_1(D), which is H_2(H_1(D)+D). (b) Δ(D) is the simple linear sum of H_1(D) and H_2(D). (c) Only the dependency relation between faults {f_1,f_2} and f_3 is depicted, which is essentially the same process as (a).

3.9 Simple example of an MMC circuit with multiple faults and its corresponding MC error function.

3.10 Analysis of the matching process error due to multiple SA faults.

3.11 A comparison of two different sets of MV candidates with respect to the MP error due to a single SA fault.

3.12 Effect of ΔSAD, N, and the fault location parameter p with respect to their impacts on the error rate P_E and expected error significance Ē, based on our SA fault effect model (3.7),(3.8) and P_minSAD(D) obtained from the CIF foreman sequence.

3.13 Illustrates that the input sequence, fault parameters, and search algorithm all affect the matching process performance. The hardware architecture, on the other hand, affects the distribution of potential fault locations. An error tolerant search algorithm can reduce the impact of a given fault. An error tolerant MMC architecture includes a smaller percentage of critical/unacceptable fault locations.

3.14 (top) Comparison of three search algorithms, FS, TSS, and EPZS, with respect to ΔSAD and N. (left) ME algorithm comparison in coding efficiency loss for different SSA faults with parameters p = 0···15 from LSB to MSB, at α = 1 and 7/8. (right) RD impact of faults for different ME search algorithms.

3.15 Illustration of different error tolerance levels for different search algorithms. The impact of an SA fault within the metric computation on search performance is measured with error rate and error significance.

3.16 The average fault effect Ē on ME performance (top) and overall coding efficiency (bottom), shown for different α parameters (left) and for different average binary tree depths (D = N·E(α)) which describe different MMC architectures (right). Perfectly balanced trees (type-3 in Figure 3.2) show the minimum expected error introduced by a single SA fault.
3.17 Experimental results on an actual ET based decision example, under a single SA fault assumption, in the context of H.264/AVC, comparing different metric hardware architectures and search algorithms with respect to the percentage of unacceptable fault positions given different decision thresholds. With a perfectly balanced tree MMC architecture (type-3), less than 1% of fault positions result in more than 0.01dB quality degradation.

Abstract

As system complexity increases and VLSI circuits become more densely integrated towards the nano-scale, the requirement of 100% exact execution of designed operations and correctness for all transistors and interconnects is prohibitively expensive, if not impossible, to meet in practice. To deal with these problems, defect tolerance (DT) and fault tolerance (FT) techniques at the design and manufacturing stages have been widely studied and practiced. FT and DT techniques ideally try to mask all effects of faults and internal errors by exploiting and managing redundancies, which leads to a more complex and costly system that achieves ideal, or at least acceptable, output quality at the expense of additional complexity cost.

On the other hand, the recently introduced error tolerance (ET) approach is an exercise in designing and testing systems cost-effectively by exploiting the advantages of a controlled relaxation of the system-level output quality requirement. The basic theme of the ET approach is to allow erroneous outputs that cause an imperceptible degree of system-level quality degradation, in order to simplify and optimize circuit size and complexity, power consumption, and cost, as well as chip manufacturing yield rate. The motivation of the ET approach is two-fold. By exploiting a certain range of distortions/errors that have negligible impact on system-level performance, i) a significant portion of manufactured dies with such minor imperfections of physical origin can be saved, thus increasing the overall effective yield, and ii) considerable circuit simplification and high power efficiency are attainable by systematically and purposefully introducing such distortions/errors.

With a primary focus on the similarity search problem within the scope of this ET concept, this thesis presents several methodologies to deal with the problems of system complexity and high vulnerability to hardware defects and fabrication process variability, and consequently a lower yield rate.

Many real world similarity search problems present a serious computational burden and remain a long standing challenge, as they involve high dimensional search spaces, often largely varying databases, and increasingly dynamic and user-interactive search metrics. This makes complexity (response time, circuit/power complexity) an increasingly important criterion for a successful design. Thus, great potential benefit of the ET concept can be reaped in this area.

The first part of this thesis studies the similarity search problem and presents a novel methodology, called the quantization based nearest-neighbor-preserving metric approximation algorithm (QNNM), which leads to significant complexity reduction in search metric computation. The proposed algorithm exploits four observations: homogeneity of viewpoints, concentration of the extreme value distribution, an NN-preserving rather than distance-preserving criterion, and fixed query information during the search process.
Based on these, QNNM approximates the original/benchmark metric by applying query-dependent non-uniform quantization directly on the dataset, designed to minimize the average NNS error while achieving significantly lower complexity, e.g., typically a 1-bit quantizer. We show how the optimal query adaptive quantizers minimizing NNS error can be designed "off-line", without prior knowledge of the query information, to avoid on-line overhead complexity, and we present an efficient and specifically tailored off-line optimization algorithm to find such optimal quantizers.

Three distinguishing characteristics of QNNM are statistical modeling of the dataset, employing quantization within a metric, and a query-adaptive metric, all of which allow QNNM to improve the performance-complexity trade-off significantly and provide robust results especially when the problem involves a non-predefined or largely varying data set or metric function, owing to its intrinsic flexibility from query to query.

With the motion estimation (ME) application, QNNM with the coarsest, 1-bit quantizer per pixel (note that the quantizer for each pixel is different, to exploit the correlation of the input distribution) results in on average 0.01dB performance loss while reducing the metric computation cost by 70% to more than 98%.

In the second part of the thesis, we present a complete analysis of the effect of interconnect faults in an NNS metric computation circuit. We provide a model to capture the effect of any fault (or combination of multiple faults) on the matching metric. We then describe how these errors in metric computation lead to errors in the matching process. Our motivation is twofold. First, we use this analysis to predict the behavior of NNS algorithms and MMC architectures in the presence of faults. Second, this analysis is a required step towards developing efficient ET based acceptability decision strategies, and it can be used to simplify the fault space, separate acceptable from unacceptable faults, and consequently simplify testing. Based on this model, we investigate the error tolerance behavior of the NNS process in the presence of multiple hardware faults, from both the algorithmic and hardware architecture points of view, by defining the characteristics of the search algorithm and hardware architecture that lead to increased error tolerance.

With the ME application, our simulations show that search algorithms satisfying the error tolerant characteristics we define perform up to 2.5dB better than other search methods in the presence of faults, and they also exhibit significant complexity reduction (more than 99%) without compromising performance. Our simulations also show that if an error tolerant hardware architecture is used, the expected error due to a fault can be reduced by more than 95%, and more than 99.2% of fault locations within matching metric computation circuits result in less than 0.01dB performance degradation.

Throughout this thesis, the motion estimation process for video coding and vector quantization for image coding are used as example applications to numerically evaluate and verify our proposed algorithms and models in practical application settings.

Chapter 1

Introduction

1.1 Relaxing the Requirement of Exact Solutions

A computer is an electronic device (hardware) designed to manipulate data according to prescribed logical operations (software).
Computers, over half a century, have moved deeply towards becoming everyday commodities and are embedded in practically every aspect of our daily lives, from laptops and PDAs to mobile phones, digital cameras, toys, and all sorts of electronic appliances/devices. As their range of applications and our dependence on such devices continue to increase, the complexity of computer hardware and software has also increased.

The progress of semiconductor manufacturing technology towards deep sub-micron feature sizes, e.g., sub-100 nanometer technology, has created a growing impact of hardware defects and fabrication process variability, which leads to reductions in both initial and mature yield rates (yield is defined as the proportion of operational circuits to the total number fabricated) [34]. It is becoming economically more difficult to produce close to 100% correct circuits as they become more highly condensed and integrated towards the nanoscale. The profitability of manufacturing such digital systems heavily depends on the fabrication yield, since lower yield rates can increase chip manufacturing/verification/testing costs and delay time-to-market. Computer software has also become increasingly complex and more vulnerable to system operation/programming errors (bugs).

We first introduce some related terminology. A manufacturing defect is any unplanned imperfection on the wafer, such as scratches from wafer mishandling or the result of undesired chemical and airborne particles deposited on the chip during the manufacturing process [32]. Not all defects lead to faults affecting the circuit operation. Causes of faults include design errors, manufacturing defects, external disturbances, and system misuse. An error is a manifestation of a fault in a system, where an element produces a logical value different from its intended one. A fault can occur in a system without causing an error; such a fault is referred to as a latent fault. A failure occurs when a running system deviates from its designed behavior/function due to errors.

1.1.1 Fault and Defect Tolerant System

The requirement of 100% bug-free software and/or 100% correctness for transistors and interconnects (fault free) is prohibitively expensive, if not impossible, to meet in practice. To deal with these problems, defect tolerance (DT) and fault tolerance (FT) techniques at the design and manufacturing stages have been widely studied and practiced [32] [29].

Fault tolerance is the ability of a system to continue to provide correct service/operation of a given task even in the presence of internal faults (either hardware failures or software errors), while defect tolerance refers to any circuit design/implementation that provides higher yield than one without a defect tolerance technique applied [6]. These approaches aim at masking all effects of defects or faults such that no error occurs at any output of a faulty system. Correct operation typically implies no error at any system output, though certain types of errors might be acceptable depending on their severity. Although there exist many variations, the basic theme of all fault and defect tolerance techniques is to introduce and manage some form of redundancy/safeguard and extra functionality in terms of hardware, software, information, and/or time [33]. In other words, components that are susceptible to defects or faults are replicated, such that if a component fails, one or more of the non-failed replicas will continue to perform correct operations with no appreciable disruption.
Note that both fault and defect tolerance approaches assume the ideal/exact output requirement and try to perform the exact action and produce the precise output. While it is very hard to devise an absolutely foolproof, 100% reliable system, the more we strive to reduce the probability of error, the higher the cost. Put differently, FT and DT techniques lead to a more complex system in order to achieve the ideal, or at least acceptable, output, so that high reliability is attained at the expense of additional complexity cost.

For some applications, such as medical devices or financial applications, the exact, precise execution of designed operations without error is critical, even if it comes with extra costs to safeguard and guarantee the output quality. However, for the enormous domain of other applications, such as systems dealing with multimedia information, search engines, or power sensitive mobile devices, a controlled relaxation of output quality precision is desirable and advantageous. This is because many applications of these kinds share a key characteristic: they can tolerate certain types of errors at the system outputs, within certain levels of severity. Furthermore, many complex problems and tasks already entail approximation and estimation techniques at the algorithmic level to obtain the solution. Also note that most proposed approximation algorithms are not optimized in terms of circuit complexity (they are primarily optimized for time and space complexity with respect to the input size), and the measures they employ to specify what constitutes an exact or optimal solution often do not accurately capture the notion of user quality. Therefore the restriction to exact precision in solution quality is, realistically speaking, an over-idealization.

For such systems and applications, a new application-oriented paradigm, error tolerance, has recently been introduced.

Definition of error tolerance [6]: A circuit is error tolerant with respect to an application if (1) it contains defects that cause internal errors and might cause external errors, and (2) the system that incorporates this circuit produces acceptable results.

1.1.2 Error Tolerant System

In this thesis we particularly focus on this application-oriented error tolerance (ET) approach [6], which is fundamentally different from FT or DT. Within the scope of the error tolerance concept, we develop in the following chapters several methodologies which allow us to deal with the problems of system complexity and high vulnerability to internal errors and faults from a different perspective.

The error tolerance approach assumes that the performance requirement itself is relaxed from the exact solution to an acceptable range of solutions, and it tries to exploit and maximize the benefit of this relaxed performance constraint to significantly simplify and optimize circuit size and complexity, power consumption, and cost, as well as yield rate. Thus, the basic theme of the ET approach is to sacrifice exact output quality for an acceptable range of output quality (results might be erroneous, but by an inappreciable or imperceptible degree of quality degradation) in order to make the system "simpler" and cost-effective, whereas FT and DT techniques lead to a more complex and costly system to achieve the ideal, or at least acceptable, output. Fig. 1.1 shows a simplified illustration of the error tolerance approach.

Figure 1.1 (plot: performance degradation on the x-axis, 0 to 3, against complexity in percent on the y-axis, 0 to 100, with the tolerance ε_tol marked): Simple illustration of the error tolerance concept. Relaxing the requirement of an exact solution/output to a small range ε_tol of near-optimal/acceptable solutions may dramatically reduce the cost. Cost can be interpreted as computational/circuit complexity, power, manufacturing, verification, testing cost, and others. Curves or slopes of this trade-off relation can be controlled by different ET (approximation) algorithms or ET based design/synthesis/testing strategies. Curves shown in this figure are collected from different applications (motion estimation for video coding and vector quantization for image coding) when our proposed ET based algorithm (QNNM in Chapter 2) is applied.

The importance of the error tolerance approach also lies in its potential to significantly enhance the effective yield at the testing stage. Classical test techniques have certain deficiencies when applied to error-tolerant systems. First, they deal only with pass (perfect: fault free) versus fail (imperfect: faulty), whereas error tolerant testing makes use of the gray area between perfect and imperfect by further partitioning faulty dies into more categories. In addition, classical test techniques ignore error characteristics and, at most, relate errors to fault locations. The error tolerance technique, on the other hand, does not focus only on fault location but rather on identifying the quantitative and qualitative implications of a defect. Finally, classical tests are universal for a given part, whereas error tolerant testing might require application-specific tests [6].

Figure 1.2 (flow diagram: problem/task → system specification → functional/architecture design → algorithm design → logic/physical synthesis → circuit design → fabrication → testing/packaging → accept or discard): Simplified illustration of a typical digital circuit design flow. The ET approach, i.e., relaxed constraints on performance, may be applied to each design and synthesis phase.

Fig. 1.2 shows a typical digital circuit design flow consisting of a series of distinct activities that are typically (although not necessarily) performed in sequence. Error tolerance techniques can be applied to each phase, from high level synthesis to logic level synthesis as well as testing. Refer to [1] [6] for further details on error tolerant computing and its related research.

In this thesis, we study one of the most fundamental classes of problems, called proximity problems, in the context of the error tolerance approach, and we present several ET based algorithms from the algorithmic level to the hardware architecture and testing levels. We primarily focus on the nearest neighbor search (NNS) problem, for it is central to a vast range of applications. Many real world NNS problems present a serious computational burden and remain a long standing challenge, as they involve high dimensional search spaces, often largely varying datasets, and increasingly dynamic and user-interactive search metrics. We first present a novel ET based NNS algorithm, the quantization based nearest-neighbor-preserving metric approximation algorithm (QNNM), which significantly simplifies the metric computation process while preserving the fidelity of the minimum distance ranking, based on statistical modeling of the data set (Chapter 2). In Chapter 3, we provide a model that estimates the impact of manufacturing faults (single as well as multiple) within an NNS metric computation circuit on the search performance.
We further derive the characteristics of the search algorithms and hardware architectures that lead to an increased error tolerance property (i.e., reduced impact of a given fault on the output performance). The motion estimation process for video coding and vector quantization for data compression are used as example applications in our study to verify our results numerically.

The rest of this chapter provides a brief description of the NNS problem (Section 1.2.1), two example applications, motion estimation (Section 1.2.2) and vector quantization (Section 1.2.3), and the overview and goals of this thesis (Section 1.3).

1.2 Proximity Problems

The proximity problems are a class of geometric problems in computational geometry which involve the estimation of distances between geometric objects. Some example proximity problems on points include the closest pair problem, the diameter (furthest pair) problem, NNS, k-NNS, approximate NNS problems, minimum spanning tree, variants of clustering problems, range query, reverse query, batched query, and bi-chromatic problems. Although this thesis studies and develops techniques primarily focusing on the NNS problem, it is possible to apply and extend the same concepts of our work to other proximity problems.

1.2.1 Nearest Neighbor Search (NNS) Problem

1.2.1.1 Definition of NNS problem

The NNS problem we primarily consider in this thesis is simply defined as: given N points in a metric space, with preprocessing allowed, how quickly can a nearest neighbor of a new given query point q be found? That is, suppose U is a set and d is a distance measure (metric) on U such that d : U × U → [0,∞), taking pairs of objects in U and returning a nonnegative real number. Given a set R ⊂ U of N objects and a query object q ∈ U in a metric space M = (U,d), the NNS problem is to efficiently find the (approximately) nearest or most similar object in the set R to the query q, i.e., the object attaining min{d(q,s) | s ∈ R}. A minimal baseline implementation of this definition is sketched below.

We chose the NNS problem as a good example where the error tolerance property can be well exploited, for the reasons described below.
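For concreteness, the following sketch (ours, not from the thesis) implements the baseline exhaustive solution to this definition in Python: a linear scan evaluating d(q,s) for every s ∈ R. Its O(N·D) cost per query, for a D-dimensional ℓ_2 metric, is the reference point that the preprocessing and approximation techniques discussed below aim to beat.

```python
import math

def nearest_neighbor(q, R, d=None):
    """Exhaustive (linear-scan) NNS: return argmin over s in R of d(q, s).

    q : query object (here, a sequence of D coordinates)
    R : iterable of N data objects
    d : metric on pairs of objects; defaults to the Euclidean (l2) distance
    """
    if d is None:
        d = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(R, key=lambda s: d(q, s))

# Example: nearest_neighbor((1, 2), [(0, 0), (3, 4), (1, 1)]) returns (1, 1).
```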
1.2.1.2 Central to a wide range of applications

This is among the most fundamental problems and a basic component of most proximity problems, and it is of major importance to a wide range of applications. The NNS problem appears under different names (e.g., post office problem, proximity search, closest point search) in a diverse literature across computational geometry, statistics [23], data analysis/mining, biology [44], information retrieval [45] [22], pattern recognition/classification [20], data compression [24], computer graphics [39], and artificial intelligence [43]. Typically the features of each object of interest are represented as a point in a vector space R^D, and the metric d is used to measure similarity between objects. Most optimization problems that require finding a point/object minimizing or maximizing a given cost function can also be seen as special cases of the NNS problem.

1.2.1.3 Computational challenge

While it is of major importance to a wide range of applications, many real world search problems present a serious computational burden and remain challenging as they take place in very high dimensional search spaces (on the order of hundreds or thousands of dimensions). Furthermore, some applications involve a data set R that varies arbitrarily from query to query (e.g., motion estimation), or even a user controllable/interactive search metric d which may also change from query to query. There has been extensive research and a multitude of algorithms proposed to reduce the computational complexity of these problems in terms of query time, storage space, and preprocessing complexity. But very little work has been done with respect to the efficiency of metric evaluation and circuit complexity.

The NNS problem is also well suited to an ASIC (application-specific integrated circuit) customized chip rather than a general-purpose one, since it involves very heavy data access and regularized, repetitive computations in the distance evaluation for every pair of objects. This adds an extra advantage to applying the application-specific error tolerance approach to the NNS problem.

1.2.1.4 Tends to be tolerant to approximation

Pursuing the exact NNS solution is meaningful only if the specification of what constitutes an optimal solution, i.e., the metric measure in the NNS case, is accurate. A mathematically formulated metric function is itself often an approximation of some notion of dissimilarity, e.g., perceptual similarity between images, and does not accurately capture the notion of user quality. Strictly speaking, even the physical distance can be seen as an approximation since, for ease of computation, the physical distance is represented by a single number that aggregates a series of per-dimension distance measurements, e.g., the sum of the measurements over dimensions, while a more accurate description of proximity would be associated with the distribution of per-dimension distances. That is, even the nearest in physical distance may not be the true nearest.

Furthermore, the significance of the "nearest" neighbor decreases as dimensionality increases, while many real world problems are posed with high dimensionality [3]. This is because, as dimensionality increases, the distribution of random objects in terms of distance tends to concentrate at the same distance; a small simulation of this effect follows below. Therefore finding the exact nearest neighbor may not be as meaningful as it appears to be, and a small distance difference under a given metric seldom matters in practice. Also, over the years, researchers have shown that approximation allows simplification of algorithms in terms of computational complexity as well as hardware architecture, while performance is often as good as the exact solution. Especially in high dimensional search spaces (D > 10), as is the case in many practical applications, almost all NNS algorithms allow a certain degree of approximation due to the exponentially increasing time and space requirements of exact NNS computation.

Furthermore, note that the NN metric serves only to identify/preserve NN information, not the distance itself. Thus, as long as the nearest neighbor can be identified, not all metric computations have to be carried out blindly to full precision; one could, for instance, maintain fine precision only where it is needed.
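The following small Monte-Carlo experiment (our illustration, not from the thesis) makes the concentration effect concrete: for points drawn uniformly in [0,1]^D, the relative gap between the farthest and nearest of N random points from a query shrinks as D grows, so "nearest" carries less and less significance.

```python
import math, random

def relative_contrast(D, N=2000, seed=0):
    """(d_max - d_min) / d_min over N uniform random points in [0,1]^D."""
    rng = random.Random(seed)
    q = [rng.random() for _ in range(D)]
    dists = [math.dist(q, [rng.random() for _ in range(D)]) for _ in range(N)]
    return (max(dists) - min(dists)) / min(dists)

for D in (2, 10, 100, 1000):
    # the ratio decays toward 0 as D grows: distances concentrate
    print(f"D={D:4d}  relative contrast ~ {relative_contrast(D):.2f}")
```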
In the following sections, we introduce the motion estimation and compensation process for video coding and the vector quantization process for data compression as example applications of the NNS problem. These specific applications were chosen to evaluate our analytical results numerically on actual real world applications and to verify them experimentally.

1.2.2 Motion Estimation Process

One of the key factors in video compression efficiency is how well the temporal redundancy is exploited by motion compensated prediction. Motion compensation is a technique that represents a picture in terms of a transformation of a reference picture which was previously transmitted/stored, so that compression efficiency can be improved. Block motion compensation (BMC) is most commonly used, where the frames are partitioned into non-overlapping blocks of pixels (e.g., macroblocks of 16×16 pixels). Each block is predicted from a block of equal size in the reference pictures (previous and/or future frames), as illustrated in Fig. 1.3. The blocks are shifted to the position of the predicted block, where the shift is represented as a motion vector (MV). There are many variations on this simple form of motion estimation, including variable block sizes, subpel accurate MV search, weighted prediction, etc. But the basic theme of the motion estimation (ME) process is to determine motion vectors which maximize the efficiency of prediction, or the compression efficiency. Thus, the performance of the ME process is critical for optimizing coding in a rate distortion sense.

The ME process comprises a search strategy for the motion displacement offset (motion vector) for inter-picture predictive coding, and a matching metric computation. Practically, it is infeasible to use the ideal metric that computes the ultimate effect, including transform, quantization, and entropy coding, of all possible motion vectors (MVs) in order to select the best MV. Therefore, a simpler metric is used instead, which searches, prior to residual coding, for the MV that minimizes a certain distance metric over a certain set of candidate MVs. The search strategy of the ME process determines the set of candidate MVs and its refinement process, so that it tends to optimize the trade-off between complexity and performance.

Figure 1.3 (diagram: the current block in the current frame is matched against candidate blocks within a search area of ±A pixels around the corresponding position in the previous frame; the displacement to the best match is the motion vector): Motion estimation process. The current block, as a query vector, aims at selecting the motion vector that minimizes the distance metric.

The cost/matching metric function, also referred to as a distortion measure, captures the prediction error; commonly l_1 or l_2 norms are used, i.e., the Sum of Absolute Differences (SAD) or the Sum of Squared Differences (SSD). After the encoder selects the MV that minimizes the cost, it encodes the difference block (prediction residual) between the original and motion compensated blocks. Each residual block is transformed, quantized, and entropy coded. Fig. 1.4 shows a block diagram of a motion compensated hybrid video coding scheme.

The motion estimation process for video coding is a particularly interesting application in the context of error tolerance due to its computationally and memory intensive characteristics, the nature of compression itself (i.e., the lossy representation of uncompressed signals), and the limited capacity of human visual perception.

ME can be seen as an NNS problem, with the current block being the query object q and the set of candidate blocks indicated by candidate MVs from the reference pictures being the data set R. A difference with respect to the typical NNS problem is that the data set R is not fixed over queries; essentially, R changes with q. A full-search sketch of this matching process is given below.

Figure 1.4 (block diagram of an AVC encoder and decoder: input video → disparity estimation and compensation (intra or inter; motion and illumination changes) → transform → quantization → entropy coding (VLC) into the bitstream, with inverse quantization, inverse transform, loop filter, and reference picture buffer in the prediction loop; the decoder mirrors this loop to produce the output): Block diagram of a hybrid coding scheme with motion compensating prediction.
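As a concrete reference point, here is a minimal exhaustive (full-search) block-matching sketch in Python (ours, for illustration; the block size B and search range A are placeholder assumptions, not the thesis's experimental settings). It makes the NNS structure explicit: the current block is the query, each displacement in the search window contributes one candidate to R, and candidates are scored with the SAD metric.

```python
def sad(a, b):
    """Sum of Absolute Differences (l1 matching metric) between two pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b))

def block(frame, x, y, B):
    """Flatten the BxB block whose top-left corner is (x, y)."""
    return [frame[y + i][x + j] for i in range(B) for j in range(B)]

def full_search_me(cur, ref, bx, by, B=16, A=16):
    """Exhaustive ME: motion vector (dx, dy) in a +/-A window minimizing SAD.

    cur, ref : 2D lists of luma samples (current and reference frames, same size)
    (bx, by) : top-left corner of the current block being predicted
    """
    h, w = len(ref), len(ref[0])
    q = block(cur, bx, by, B)                        # query: the current block
    best_mv, best_cost = (0, 0), float("inf")
    for dy in range(-A, A + 1):
        for dx in range(-A, A + 1):
            x, y = bx + dx, by + dy
            if 0 <= x <= w - B and 0 <= y <= h - B:  # candidate must lie in frame
                cost = sad(q, block(ref, x, y, B))
                if cost < best_cost:
                    best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```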
For the simplest ME scheme using only 16×16 block size with a single reference frame and linear search with a search range ±32 could take more than 80% of the total encoding power of a video encoding system. Computational penalty of ME can be much worse if it includes other commonly used ME settings such as bi-predictive coding, multiple reference pictures, and variable block sizes. 1.2.3 Vector Quantization for Data Compression A vector quantizer encodes a multidimensional vector space into a finite set of values from a discrete subspace of lower dimension. A lower-space vector requires less storage space, so vector quantization (VQ) is often used for lossy compression of data such as image, video, audio or for voice recognition (statistical pattern recognition). VQ maps D-dimensional vectors in the vector space R D into a finite set of vectors Y ={y i :i = 1,2,,N}. Each vector y i is called a code vector and the set of all the code vectors, Y, is called a codebook. A vector space R D is partitioned into N code cells (clusters) C = {C i : i = 1,2,,N} of non-overlapping regions where each regionC i , called Voronoi region, is associated with each codeword/code vector, y i . Each Voronoi region C i is defined by: C i ={x∈R D :kx−y i k≤ x−y j ,∀j6=i} where VQ maps each input vector x∈C i in code cell C i to the code vector y i . 14 The Encoder The Decoder Input vectors Code vectors Voronoi Region (a) (b) Figure 1.5: Vector quantization partitions a vector space R D into a finite set of non-overlapping Voronoi regions. Each region is associated with a code vector and thesetofallthecodevectorsiscalledacodebook. (a)illustratesanexampledesign of a codebook for 2D vector space. (b) illustrates how each input vector is encoded with and decoded from the index of its nearest code vector. 15 VQ consists of two processes: (i) designing a codebook, i.e., generating the data set of NNS as a simple 2D codebook example illustrated in Fig. 1.5(a), and (ii) performing nearest neighbor/code vector search to a given input vector (a query of NNS) for encoding as illustrated in Fig. 1.5(b). There are several VQ algorithms whichhavedifferentcodebookdesignmethodbutallofthemperformexactnearest neighbor search. VQ performs NNS for each input vector (a query) to all code vectors (data set) and encodes it with the nearest code vector. A set of all code vectors can be seen as data set of NNS problem. 1.3 Contributions and Overview of the Thesis In this thesis, with a primary focus on NNS problem, we propose to exploit er- ror tolerance concept for proximity problems from algorithm level to hardware architecture design and testing level. The main theme throughout this thesis is to model and analyze the behavior of performance degradation with the range of faults/errors allowed or with the level of simplification made. Proposed modeling and algorithms throughout this thesis are numerically evaluated and verified by simulating using example applications: motion estimation process for video coding and vector quantization for data compression. Quantization based nearest-neighbor-preserving metric approxima- tion algorithm (QNNM) (Chapter 2) [11] [13] First part of the thesis studies similarity search problem and presents a novel methodology, called quantization based nearest-neighbor-preserving metric approx- imation (QNNM) algorithm, which leads to significant complexity reduction in search metric computation. 
1.3 Contributions and Overview of the Thesis

In this thesis, with a primary focus on the NNS problem, we propose to exploit the error tolerance concept for proximity problems, from the algorithm level to the hardware architecture design and testing level. The main theme throughout this thesis is to model and analyze the behavior of performance degradation as a function of the range of faults/errors allowed or the level of simplification made. The modeling and algorithms proposed throughout this thesis are numerically evaluated and verified by simulation using two example applications: the motion estimation process for video coding and vector quantization for data compression.

Quantization based nearest-neighbor-preserving metric approximation algorithm (QNNM) (Chapter 2) [11] [13]

The first part of the thesis studies the similarity search problem and presents a novel methodology, called the quantization based nearest-neighbor-preserving metric approximation (QNNM) algorithm, which leads to a significant complexity reduction in the search metric computation. The proposed algorithm exploits four observations: (i) the homogeneity of viewpoints property, (ii) the concentration of the extreme value distribution, (iii) an NN-preserving rather than distance-preserving criterion, and (iv) fixed query information during the search process. Based on these, QNNM approximates the original/benchmark metric by applying query-dependent non-uniform quantization directly on the dataset, designed to minimize the average NNS error while achieving significantly lower complexity, e.g., typically a 1-bit quantizer. It entails a nonlinear sensitivity to distance such that finer precision is maintained only where it is needed/important (the region of expected nearest neighbors), while regions unlikely to contain nearest neighbors are very coarsely represented. We show how the optimal query-adaptive quantizers minimizing the NNS error can be designed "off-line", without prior knowledge of the query information, to avoid on-line overhead complexity, and we present an efficient, specifically tailored off-line optimization algorithm to find such an optimal quantizer. Three distinguishing characteristics of QNNM are the statistical modeling of the dataset, the use of quantization within a metric, and the query-adaptive metric, all of which allow QNNM to improve the performance-complexity trade-off significantly and to provide robust results even when the problem involves a non-predefined or largely varying dataset or metric function. For the motion estimation application, QNNM with the coarsest 1-bit quantizer per pixel (note that the quantizer differs from pixel to pixel to exploit the correlation of the input distribution) results in, on average, 0.01dB performance loss while reducing the metric computation cost by 70% to 98%.

Manufacturing fault effect modeling and error tolerant designs for NNS problem (Chapter 3) [16] [9] [12]

In the second part of the thesis, we present a complete analysis of the effect of interconnect faults in the NNS metric computation circuit. We provide a model to capture the effect of any fault (or combination of multiple faults) on the matching metric. We then describe how these errors in metric computation lead to errors in the matching process. Our motivation is twofold. First, we use this analysis to predict the behavior of NNS algorithms and MMC architectures in the presence of faults. Second, this analysis is a required step towards developing efficient ET based acceptability decision strategies and can be used to simplify the fault space, separate acceptable/unacceptable faults, and consequently simplify testing. Based on this model, we investigate the error tolerance behavior of the NNS process in the presence of multiple hardware faults from both the algorithmic and hardware architecture points of view, by defining the characteristics of the search algorithm and hardware architecture that lead to increased error tolerance. For the ME application, our simulations show that search algorithms satisfying the error tolerant characteristics we define perform up to 2.5dB better than other search methods in the presence of faults, while also exhibiting a significant complexity reduction (more than 99%) without having to compromise performance. Our simulations also show that if an error tolerant hardware architecture is used, the expected error due to a fault can be reduced by more than 95%, and more than 99.2% of fault locations within matching metric computation circuits result in less than 0.01dB performance degradation.
Chapter 2

Approximation Algorithm for Nearest Neighbor Search Problem

In this chapter, we present a novel NNS metric approximation technique which maps the original metric space to a simpler one in such a way that the (approximate) nearest neighbor (NN) is preserved, while reducing the potential waste of resources in computing a high precision metric for unlikely solutions/points. This algorithm significantly reduces computational complexity while the penalty paid in performance is very small. The metric approximation is not fixed for all potential queries but adaptively adjusted at each query, exploiting the information of the query point and a statistical model of the data set to provide a better performance-complexity trade-off. The proposed method is thus intrinsically flexible from query to query and efficient with largely varying databases/metric functions.

The NNS problem has a long history, and we first summarize existing works and their approaches. We then provide an intuitive interpretation of the notion of approximation and offer a different perspective that can be taken into account to further reduce computational complexity. We then formulate the problem and present our proposed solution as well as experimental results.

2.1 Related Work

The NNS problem has been investigated in computer science for a long time, and a multitude of algorithms have been presented, all of which focus on how to "preprocess" a given set of points R so as to efficiently answer queries, assuming (i) that the data set R is fixed (or slightly varying) over queries, and (ii) that the metric d is also fixed.

Approaches to the NNS problem can, in general, be broadly grouped into two classes: (i) reducing the subset of data to be examined (the ability to discard a large portion of data points during the search process), and (ii) transforming the metric space.

The first class includes a variety of approaches that create efficient data structures by partitioning the metric space, together with query execution algorithms (e.g., variants of k-d trees [47], R-trees, metric trees [52], ball-trees [41], similarity hashing [28]). The second class includes metric/feature dimension reduction techniques, such as principal component analysis [26], latent semantic indexing [21], independent component analysis, multidimensional scaling, and singular value decomposition using linear transforms (e.g., KLT, DFT, DCT, DWT), as well as metric embedding techniques, which provide the ability to lower dimensions, transform a general metric space to a normed space, or map a complex metric space to a simpler one.

All these existing approaches focus on altering/preprocessing the data set R so as to minimize the volume (number of metric computation iterations) and size (e.g., dimensions) of the examined data, using, for example, data restructuring [46] [48], filtering [40], sorting, sampling [36] [15], transforming [37], bit truncating [25] [8], quantizing [35], etc., to ultimately reduce CPU and I/O costs for querying.

Although many of these algorithms provide a good reduction in search complexity in low dimensionality, the NNS problem with high dimensional and large collections of data, as is the case with many real-world problems, still remains a long standing challenge. Furthermore, since they are preprocessing-based techniques relying on a fixed dataset and metric function, they face serious challenges if the NNS problem involves a largely varying data set R, changing significantly from query to query, and/or a variable/user-controllable search metric d.
For applications requiring the solution of such problems, most existing methods simply use random or empirical sampling, or application-specific tailoring of the data set according to the nature of the data, which therefore is not generally applicable.

Beyond any improvement achieved by these existing approaches, further significant complexity reduction is attainable by increasing the efficiency of the metric computation itself. While most works have primarily focused on designing algorithms that scale well with the database size as well as with the dimension, our proposed work focuses on error tolerant hardware implementation of the metric computation for the NNS search process. Our work can be seen as a technique that, in a query-dependent manner, maps the original complex metric space to a significantly simpler metric space, designed to minimize the average NNS error while achieving significantly lower metric computation complexity. Furthermore, the proposed method assumes neither a fixed dataset nor a fixed metric function over queries, which enables efficient operation even with varying, user-adaptive metric functions based on user preference and with largely varying datasets.

The next section describes several conventional NNS performance evaluation measures and their pros and cons. We also introduce a different NNS performance measure and, using it, provide insight and intuition for why there may be more room for improvement in metric computation complexity.

Figure 2.1: Illustration of the approximate NNS problem (ǫ−NNS), where q and p* represent a query and its exact NN, respectively.

2.2 NNS Performance Evaluation Measure

The primary concerns of this problem are both the qualitative and quantitative aspects of the solution: accuracy (how close the returned object is to the optimal object/solution) and complexity (the amount of resources required for the algorithm execution). The complexity of an NNS algorithm typically refers to space complexity (memory requirements and preprocessing cost) and query time complexity (the number of metric evaluations needed to answer a query and the computational complexity of each metric evaluation). Complexity is in general compared as a function of the input size (size of the data set N and dimensionality D), typically represented by the worst case behavior. As to accuracy, an NNS problem can be either an exact NNS or an approximate NNS problem. Exact algorithms reach an exact/optimal solution, while approximation algorithms find an approximate solution that is close enough to the true solution.

2.2.1 ǫ−NNS Problem and Performance Measure

ǫ−NNS problem
The most common approximate NNS problem is the ǫ−NNS problem (also called the c−NNS problem, where c = 1+ǫ > 1 is the approximation factor), defined as follows: given an error bound ǫ > 0, we say that a point r ∈ R is an ǫ−approximate NN of a query q if d(r,q) ≤ (1+ǫ)·d(r^*,q), where r^* is the exact NN of q (Fig. 2.1). In other words, an ǫ−NNS algorithm returns points whose distance from the query is no more than (1+ǫ) times the distance of the true nearest neighbor, but without any information on the actual rank of the distance of the returned points and also without any guarantee of returning any such point. In general, as one decreases ǫ for better accuracy, the probability of returning an empty set increases.

The typical approach to evaluating/comparing ǫ−NNS algorithms is simply to compare complexity as a function of the input size and ǫ > 0, with ǫ serving as an approximation measure providing a guaranteed performance bound from a worst case point of view.
However, in order to more accurately assess the quality of the set returned by an ǫ−NNS algorithm, beyond the performance bound measure ǫ, one can consider two complementary measures: recall [38] and relative distance error (DE) [53], defined as:

Recall = \frac{|R^* \cap R_A|}{|R^*|}    (2.1)

DE = \frac{d(r_A,q) - d(r^*,q)}{d(r^*,q)} = \frac{d(r_A,q)}{d(r^*,q)} - 1    (2.2)

where R^* and R_A denote, respectively, the set of all relevant/qualifying points (all near neighbors within the distance range (1+ǫ)·d(r^*,q)) and the set of points retrieved by the approximation algorithm of interest, while |·| represents the cardinality of a set. Similarly, r^* and r_A are the exact and retrieved (by the approximation algorithm) nearest neighbors of q, respectively.

The recall measure represents the probability of the approximate algorithm returning a qualifying point. This measure considers all returned points as being of equal importance. The relative distance error DE, on the other hand, quantifies the quality of a returned set but provides no indication of the probability of failing to return any qualifying point. Note that the DE measure is input data dependent.

In order to more accurately assess the improvement in complexity reduction due to a given algorithm, the improvement in efficiency (IE) [53] measure can be used, which relates the costs of the exact and approximated searches:

IE = \frac{cost(r^*)}{cost(r_A)}    (2.3)

where cost(r) corresponds to the execution cost to retrieve r.

2.2.2 Proposed ǭ−NNS Problem Setting

The ǫ−NNS problem constrains the quality of a solution r to be within the worst-case performance bound (1+ǫ)·d(r^*,q), such that it only allows either having a solution (one that is close enough to the nearest neighbor based on ǫ) or having nothing. Thus, the performance of approximation algorithms has typically been analyzed from a worst case point of view. This worst case conditioning or worst case analysis provides guaranteed performance safety and is useful in characterizing or proving whether the approximate solution is provably close to the optimal one, independently of the input data distribution. However, it has several drawbacks as well.

Worst case performance analysis generally does not serve as a practical predictive tool and is seldom met in practice: the empirical verification of approximation algorithm performance often shows a large gap between the actual expected performance and the worst case analytical bound. Moreover, for many useful practical algorithms involving probabilistic/randomized methods or input data dependent processing, theoretical worst case analysis often becomes less meaningful, since the worst case can be very unlikely to happen in probability. Also, for typical ǫ−NNS algorithms, reducing the worst case bound on solution quality (increasing the probability of having a higher quality solution) leads to a decreased recall rate (a higher probability of returning an empty set), which is often input dependent.

Therefore, in this thesis we consider approximate NNS problems without a constraint on a specific performance safety boundary. In other words, an approximate algorithm is expected to always return its own best result, and its performance is evaluated in probabilistic terms as the expected/average solution quality (closeness to the optimal one). We refer to this as the ǭ−NNS problem. This is a more appropriate problem setting for our purpose, considering that our proposed work focuses on increasing the metric computation efficiency and not on creating a data structure and query process.
To evaluate algorithms of interest in this setting, we only need to consider one NNS accuracy measure, ǭ, measuring the average distance degradation introduced by the approximate NNS algorithm relative to that of the benchmark algorithm (exact NN). ǭ, the solution quality as an expected value, is defined as:

\bar{\epsilon} = \frac{E(d(r_A,q) - d(r^*,q))}{E(d(r^*,q))} = \frac{E(d(r_A,q))}{E(d(r^*,q))} - 1    (2.4)

where E(·) is the expectation operator over the input distribution. Similarly, complexity efficiency is measured as an expected value, IE:

IE = \frac{E(cost(r^*))}{E(cost(r_A))}    (2.5)

Although these measures provide numerical estimates closer to the actual practical-setting performance, reflecting the average typical behavior in performance and complexity efficiency of a given algorithm, they are intrinsically input data dependent. Therefore, to yield accurate results from a practical point of view, the input distribution needs to be as close as possible to the realistic distribution of the practical applications, and an analysis of the performance behavior for different input characteristics may be needed as well for a better evaluation.

The conventional ǫ−NNS setting and the ǭ−NNS problem setting we propose to use can be briefly summarized as follows. Given the input distribution, ǫ−NNS algorithm performance can be evaluated using Recall, relative distance error (DE), and the data independent worst case bound ǫ,

d(r_A,q) \le (1+\epsilon) \cdot d(r^*,q),    (2.6)

while the performance of the proposed ǭ−NNS algorithm can be evaluated with ǭ (2.4) alone:

E(d(r_A,q)) = (1+\bar{\epsilon}) \cdot E(d(r^*,q))    (2.7)

2.2.3 Error Rate, Accumulation, Significance, and Variation

Particularly for the error tolerance approach, it is critical to select a good measure that quantifies the notion of error. Often error rate, error accumulation, error significance (these three are also known as RAS), and error variation measures are used. They are all input distribution dependent measures and quantify certain qualities/behaviors of errors under a typical application specific environment. They describe the expected performance in a practical setting more accurately than measures that are input distribution independent; thus, such measures are consistent with the purpose of our error tolerance approach.

Figure 2.2: Illustration of the error probability distribution over error significance. Error rate is defined as the sum of all non-zero error-significance probabilities. The ǭ measure is equal to the mean value of a given distribution. An ET based acceptance curve would appear as a vertical line θ, determining whether a given distribution is classified as acceptable or unacceptable depending on its mean value (ǭ).

As defined in [5], error rate is the fraction of results that are erroneous in a long sequence of output patterns under normal operating conditions. Error significance deals with the degree to which a piece of data is in error. Error accumulation (retention) deals with the change in error rate or error significance over time, which can be seen as an 'error propagation' measure. For example, in a feed-forward pipeline architecture, a defect probably produces a fixed error rate, whereas many finite state machines, such as a linear feedback shift register, lead to errors being propagated or accumulated.

In the NNS problem setting, error rate refers to how often an inexact solution is returned (Pr(r^* ≠ r_A)), while error significance refers to how much additional quality degradation is introduced (d(r_A,q) − d(r^*,q)).
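As a practical illustration (not part of the thesis; the function and array names are illustrative), these quantities, together with ǭ from (2.4), can be estimated from paired Monte-Carlo runs of the benchmark and approximate searches over the same queries:

```python
import numpy as np

def nns_error_measures(d_exact, d_approx):
    """Empirical estimates of eps_bar, error rate, and mean significance.

    d_exact  : array of d(r*, q) over many queries (exact NN distances).
    d_approx : array of d(r_A, q) for the same queries (approximate NN).
    """
    d_exact = np.asarray(d_exact, dtype=float)
    d_approx = np.asarray(d_approx, dtype=float)
    sig = d_approx - d_exact                 # per-query error significance
    eps_bar = sig.mean() / d_exact.mean()    # average quality loss, eq. (2.4)
    error_rate = np.mean(sig > 0)            # fraction of inexact answers
    mean_significance = sig.mean()           # non-normalized eps_bar
    return eps_bar, error_rate, mean_significance
```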
In our ǭ−NNS performance analysis, error rate is not very meaningful, just as worst case analysis is not a very good measure in practice. In fact, the performance can be fully described by the error probability distribution over error significance, as in Fig. 2.2. The distributions in this figure represent the NNS performance with different faulty NNS metric computation circuits; however, each distribution can equally be seen as the performance of an NNS approximation algorithm. A perfect system (e.g., a fault-free chip, an exact NNS algorithm) must have 100% error probability at zero error significance. Error rate is defined as the sum of all non-zero error-significance probabilities. The expected error significance, which is in fact the non-normalized ǭ, is a good measure to describe such a distribution with a single number, and thus a good practical performance measure.

Note that if a system is sensitive to temporal performance variation, the variance of the error distribution can also be taken into account for the acceptance decision. For example, in motion estimation for video coding, if ME performance varies significantly from block to block, it can lead to unpleasant blocky artifacts. However, in this thesis we do not consider variance, since the acceptable range of ǭ is usually quite small (typically an acceptable ǭ should be constrained to an "imperceptible/indistinguishable" degree of error for application users), and a small ǭ also ensures a small variance in practice.

2.3 Interpretation of NNS Approximation Using ǭ Measure

The goal of this section is not to model all dependencies between different execution steps of all NNS algorithms, but to provide a general form for algorithms so that we can obtain an intuitive interpretation of the approximation error.

2.3.1 NNS Problem Modeling

For this purpose, we model the NNS problem in probabilistic terms using a distance distribution, instead of assuming a specific fixed data set R. More specifically, given a metric space (U,d), where d defines the distance between any pair of points in U and returns a nonnegative real number, d : U×U → [0,∞), we model a set of N points R ⊂ U, R = {r_1, r_2, ···, r_N}, as a set of N random samples from the distribution F_U over U. The relative distance distribution (RDD) of d(r,q) from a query q ∈ U with respect to any r ∈ U can be seen as a viewpoint taken from q towards the metric space U in terms of the distance d, and is defined as:

F_q(x) = \Pr(d(r,q) \le x)    (2.8)

Similarly, the viewpoint taken from q towards its nearest neighbors in terms of d in U can be represented as the relative nearest-neighbor distance distribution (RNNDD):

F_q^{min}(x) = \Pr(d(r^*,q) \le x) = 1 - (1 - F_q(x))^N    (2.9)

Fig. 2.3 illustrates examples of F_q(x) and F_q^{min}(x) for both simulated and actual data based cases.

2.3.2 NNS Algorithm Modeling

A linear search computes the distances between q and all points r ∈ R and selects the points in R corresponding to the minimum distance. We denote by NN(q) the set of minimum distance points with respect to q obtained using a linear search.

Figure 2.3: (a) Example illustration of a simulated distance distribution F(x) and its corresponding NN-distance distribution F^min(x) for |R| = 100. The shaded area represents the expected NN-distance E(d(q,NN(q))).
(b) Illustration of both the distance distribution F(x) and its corresponding NN-distance distribution F^min(x), based on actual data collected from the motion estimation process for a video coding application using a linear search, with |R| approximately 4000. F_q(x) and F_q^{min}(x) are both averaged distributions. Five video sequences (Akiyo, Coastguard, Container, Foreman, News), representatively different in terms of their distance distributions, were chosen.

NN(q) = \{r^* \in R \mid \forall r \in R \subset U, q \in U : d(r^*,q) \le d(r,q)\}    (2.10)

We can roughly characterize and model most NNS algorithms similarly. Any arbitrary NNS algorithm, which we refer to as A from here on, can be modeled, in terms of its average behavior, as a linear search over a different metric space (U_A, d_A) with a distribution F_{U_A}, and consequently a different R_A of size N_A and a different d_A, such that

NN_A(q) = \{r^* \in R_A \mid \forall r \in R_A \subset U_A, q_A \in U_A : d_A(r^*,q_A) \le d_A(r,q_A)\}    (2.11)

Similarly, we denote the RDD and RNNDD of algorithm A with respect to q by F_{q,A}(x) and F^{min}_{q,A}(x).

For example, efficient NNS algorithms are likely to have a distribution F_{U_A} concentrated near the given q and N_A < N, which means that algorithm A has efficiently discarded most points in R that are unlikely to be q's nearest neighbors and has selected R_A, a subset of R, for the actual comparison process.

2.3.3 NNS Algorithm Performance

The performance of NNS algorithm A can be measured by ǭ as in (2.4):

\bar{\epsilon} = \frac{E(d(r_A,q) - d(r^*,q))}{E(d(r^*,q))}

For convenience of description, we use the non-normalized ǭ, denoting it simply by DE:

DE = E(d(r_A,q)) - E(d(r^*,q))    (2.12)

DE can be further written as:

DE = \int_{\mathbb{R}^+} \mu(x) f_A(x)\,dx - \int_{\mathbb{R}^+} x f(x)\,dx,    (2.13)

where f(x) is the point probability or pdf (probability density function) form of F_q^{min}(x), and analogously f_A(x) is the pdf of F^{min}_{q,A}(x). μ(x) represents the expected distance (in terms of the original distance measure d) of q's neighbors that have a given distance x according to algorithm A's distance measure d_A:

\mu(x) = E(d(r,q) \mid d_A(r,q) = x, \forall r \in R_A)

Given the input distributions, since E(d(r^*,q)) is fixed (it is equal to the shaded area in Fig. 2.3(a)), the E(d(r_A,q)) term determines the performance of algorithm A in terms of solution quality. In other words, our goal is to find an algorithm which minimizes E(d(r_A,q)) while maximizing the complexity reduction. Note that ǭ and DE provide no information on the complexity efficiency, which has to be considered separately.

2.3.4 Simple Case Study & Motivation

Fig. 2.4 is a graphical illustration of DE (the difference between the solid shaded area and the shaded area with hatching) for several simple NNS algorithms: (a) algorithm A returns a randomly chosen point from R, (b) A randomly subsamples R, performs a linear search, and returns the minimum distance point, and (c) A discards some information for the distance metric computation (e.g., subsampling dimensionality) and performs a linear search. The solid shaded area and the shaded area with hatching represent E(d(r_A,q)) and
E(d(r^*,q)), respectively.

Figure 2.4: Graphical illustration of DE (= E(d(r_A,q)) − E(d(r^*,q)) = solid shaded area − shaded area with hatching) for several simple NNS algorithms. The three graphs illustrate the performance of three different algorithms which respectively (a) simply return a randomly chosen point from R as the approximate NN point, (b) randomly subsample R and perform a linear search to find the minimum distance point, and (c) discard some information for the distance metric computation (e.g., subsampling dimensionality), perform a linear search, and return the resulting NN point. Both (b) and (c) reduce the complexity by half (reducing the number of search points N in (b) and the dimensionality D in (c)).

Case (a) can be seen as a worst case bound. While both (b) and (c) reduce the complexity by half (randomly subsampling the number of search points N in (b) and the dimensionality D in (c)), the reduction in search points impacts performance less than the reduction in the dimensionality of the metric function/data. However, due to the curse of dimensionality, an efficient reduction in dimensionality usually helps reduce the number of search points exponentially. Most existing NNS algorithms approach the problem by efficiently reducing either the number of search points N (e.g., using certain data structures) or the dimensionality D (e.g., by performing a certain transform process), or by a combination thereof. Well designed algorithms based on this approach can achieve minimal performance loss. However, further significant reduction in complexity is attainable by reducing the metric computation resolution/precision instead of blindly computing each distance metric to full precision.

2.3.4.1 Nonlinear scaling on metric computation resolution

Fig. 2.5 provides an intuitive idea of why a significant resolution reduction may lead to an insignificant performance loss. This can be seen as the problem of achieving a targeted NNS performance while minimizing the amount of resources (bits, number of quantization bins) required, by maintaining fine precision only where it is needed/important. Typical quantization is designed to compress the data while maintaining the necessary fidelity of the source data. However, if we employ a quantization process within the metric, the goal is not to compress the data but to reduce the computational complexity while maintaining the fidelity of the minimum distance ranking, not the fidelity of the distance itself. Thus, our target metric needs to have a nonlinear sensitivity to distance, such that finer precision is assigned to the region of expected nearest neighbors and coarser resolution is assigned to the rest of the regions.

Figure 2.5: (a) Linear approximation of the NN-distance distribution F^min(x). (b) Linear approximation of the distance distribution F(x). (c) Nonlinear approximation of F^min(x). (d) Nonlinear approximation of F(x). These four figures illustrate the impact on NNS performance of reducing the metric computation resolution/precision (staircase functions: 16 representation levels in these examples) instead of blindly computing each distance metric to full precision (solid line: e.g., 65536 representation levels if a 16-bit depth is used).

Figure 2.6: Illustration that the most critical region, where we need to measure the distance accurately, is the region belonging to less than 1% of a given RDD distribution (shown for N = 100, 1000, and 2000).
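A toy numerical experiment, entirely illustrative (the distance population, threshold placement, and tie-breaking rule are assumptions, not from the thesis), suggests how spending the same number of quantization levels nonlinearly, matched to F^min(x), can preserve the NN far better than linear level placement:

```python
import numpy as np

rng = np.random.default_rng(0)
N, trials, levels = 100, 3000, 16
draw = lambda shape: rng.normal(100.0, 20.0, shape) ** 2  # toy distances

# "Offline" modeling of the distance distribution F(x) and the NN-distance
# distribution F_min(x) (empirical minima over N-point samples).
pop = draw(200_000)
mins = draw((2000, N)).min(axis=1)

# Same budget (16 levels); linear vs. F_min-matched threshold placement.
uniform_t = np.linspace(pop.min(), pop.max(), levels + 1)[1:-1]
nonlin_t = np.quantile(mins, np.linspace(1 / levels, 0.999, levels - 1))

def excess(d, t):
    """Extra distance incurred when comparisons see only quantized
    distances (ties inside the minimal bin are broken arbitrarily)."""
    bins = np.searchsorted(t, d)
    cand = np.flatnonzero(bins == bins.min())
    return d[rng.choice(cand)] - d.min()

num_u = num_n = den = 0.0
for _ in range(trials):
    d = draw(N)
    num_u += excess(d, uniform_t)
    num_n += excess(d, nonlin_t)
    den += d.min()
print(f"eps_bar, uniform levels:       {num_u / den:.3f}")
print(f"eps_bar, F_min-matched levels: {num_n / den:.3f}")
```

The uniform quantizer wastes most of its levels on distance regions that never contain a nearest neighbor, so near-minimal candidates collapse into the same bin; the F_min-matched quantizer keeps its resolution where minima actually occur.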
Fig. 2.5 (a) and (b) represent a technique which preserves the distance distribution F(x) well, as in Fig. 2.5(b), but results in a higher error in NNS performance, as in Fig. 2.5(a). Fig. 2.5 (c) and (d), on the other hand, are poor at preserving the distances F(x), as in Fig. 2.5(d), while entailing less error in terms of the quality of the approximate nearest neighbors (i.e., preserving F^min(x) better), as in Fig. 2.5(c).

2.3.4.2 Homogeneity of viewpoints

The discussion so far raises an obvious question concerning the statistical information that we are assumed to know for a fixed q. The relative distance distribution F_q(x) may vary for different viewpoints q, a phenomenon called viewpoint discrepancy. This problem of viewpoint discrepancies has been studied in [19], where it was shown that the viewpoint discrepancy is quite insignificant for bounded random metric spaces; this is referred to as the homogeneity of viewpoints property. Experiments performed with large text files also showed similar results. In other words, distance distributions measured with respect to different viewpoints q are very similar.

However, note that for our purpose, even this assumption of high homogeneity of viewpoints is not necessary. Fig. 2.6 illustrates that the most critical region, where we need to measure the distance accurately, is the region belonging to less than 1% of the whole RDD distribution. Considering that the homogeneity of viewpoints "towards nearest neighbors" is even higher, a technique developed based on the average statistics behaves quite well in general for different viewpoints q.

Fig. 2.3(b) also provides real application information on the homogeneity of viewpoints towards nearest neighbors for the motion estimation application. Five video sequences, representatively different in terms of their RDD, still produce a high homogeneity of viewpoints towards nearest neighbors. Fig. 2.3(b) also shows that, in general, there is a very limited region where a high precision distance measure is needed for high NNS performance.

The following sections describe how this nonlinear scaling of the metric computation precision can be exploited and how it can be designed to yield a significant complexity reduction.

2.4 Query Adaptive Nearest Neighbor Preserving Metric Approximation Algorithm (QNNM)

2.4.1 Problem Formulation

Our goal is to find a new metric function d_obj that approximates the original or benchmark metric d in terms of preserving the fidelity of NNS, while having a significantly lower computational complexity than that of d. Any metric approximation approach can be formulated as a mapping ψ : U → U_ψ from the original metric space (U,d) into a simpler metric space (U_ψ, d_ψ) in which the NN search is performed with the d_ψ metric. If ψ is the same for all queries, this metric space mapping can be seen as preprocessing (e.g., a space transformation to reduce dimensionality) aiming at simplifying the metric space while preserving the relative distances between objects. However, here our goal is to design a simple query-adaptive mapping ψ_q : U → U_ψ,

(U,d) \xrightarrow{\psi_q} (U_\psi, d_\psi)    (2.14)

that satisfies the following three conditions:

1. Use the information of the given query location q, which is constant over the search process, such that our objective metric d_obj can be designed as a function of only ψ_q(r):

d_{obj}(q,r) = d_\psi(\psi_q(r))    (2.15)

2. Reduce the complexity of d_obj by exploiting the statistical characteristics of the NN (the extreme value distribution of the sample minimum F^min, as in Fig.
2.3, where most of the statistical information of the NN is concentrated in a very narrow range), such that the resulting (U_ψ, d_ψ) is significantly simplified to preserve only the NN, and not the relative distances between objects.

3. Finding such a query-dependent mapping ψ_q prior to each query operation should impose insignificant overhead complexity, if any.

2.4.2 Observations & Characteristics of Proposed QNNM

Our proposed solution, QNNM, is based on the following four observations:

1. The homogeneity of viewpoints (HOV) property [19]: the distribution of the data set R with respect to the distance between two objects, or between the two nearest objects, tends to be homogeneous. In other words, if we statistically model a metric data set R in terms of its distances d(q,r) with respect to a given query q, using the distance distribution F(x) = Pr(d(q,r) ≤ x) or the NN-distance distribution F^min(x) = Pr(d(q,r^*) ≤ x), [19] shows that such distributions tend to be probabilistically very similar regardless of the query/viewpoint position.

2. As illustrated in Fig. 2.3, when performing NNS for different queries, the distances d(q,r^*) between a query vector and its best match (NN) tend to be concentrated in a very narrow range (the extreme value distribution of the sample minimum F^min(x)).

3. The only goal of an NNS metric is to identify the NN, i.e., to preserve the fidelity of the minimum distance ranking and not of the distance itself.

4. The query vector is fixed during the entire search process.

Our proposed approach, motivated by these observations, has three distinguishing characteristics: (i) statistical modeling of the data set R in terms of the distance d, (ii) employing scalar quantization within the metric, and (iii) making the metric query-dependent.

Typically, most NNS related studies are based on the assumption of fixed data sets and a fixed distance metric d, known in advance. However, many practical applications/situations, e.g., data streaming environments, involve largely varying data sets. For some applications, such as motion estimation, more than half of the data set changes from the current query to the following one. In such situations, preprocessing based algorithms lose their advantage, because such assumptions require preprocessing R, partially or entirely, all over again whenever there is a change in R or d.

The proposed algorithm, on the other hand, is based on the statistical characteristics of the data set R (typically modeled using a Gaussian distribution) in the metric space (U,d), such that even if the search involves a largely varying data set R, no extra processing/updating is required, thanks to the HOV property.

Furthermore, because of this statistical modeling, the proposed method can support not only predefined but also ad-hoc or online metric queries. For example, a user could control and select features of interest and their weights, assigning each feature/dimension a different role in evaluating similarity. Unfortunately, most existing approaches construct their entire preprocessed data structure based on fixed dimensions and a fixed metric function. Our proposed method, on the other hand, is based on a query adaptive metric which can maximize the use of the query location information to simplify the metric, as well as make the metric flexible and adaptable to query-to-query changes, even when there are variations of the metric function and/or dimensionality changes, without having to rebuild the whole data structure or perform transforms from scratch.
Therefore, even when the metric d changes (e.g., a change of p or of the weights w_j of the weighted Minkowski metric \sum_j w_j (q_j - r_j)^p, especially when user-controllable), the proposed method does not require computing the metric approximation from scratch again, but only a simple scaling of the quantization thresholds.

Scalar quantization is chosen since it is computationally efficient and flexible to adapt to a changed context (e.g., R, q, d), but also convenient for introducing different sensitivities to different regions and dimensions, so as to exploit observation 2 and reduce the potential waste of resources in computing a high precision metric for unlikely solutions/points. Observation 2 allows these quantizers to be very coarse (e.g., 1 bit per dimension), leading to very low computational complexity without much effect on the overall NNS performance. Our experimental results in Section 2.7.1, for instance, show negligible NNS performance degradation (average 0.01dB loss) when the proposed method with an optimized 1-bit/pixel quantizer is used in place of the ℓ1 metric for the motion estimation (ME) application in video coding.

The third characteristic of the proposed metric approximation algorithm is that the metric itself is not fixed but changes with the query data, which is fixed during the entire search process. More importantly, we show that the problem of finding the optimal query-dependent quantization parameters for the proposed metric can be formulated as an off-line optimization process, based on observation 1, such that only a trivial overhead complexity is required to change the metric for new query data prior to the search operation.

In addition, note that our proposed QNNM algorithm can be used independently, but also in parallel with most existing preprocessing based NNS algorithms. Most preprocessing based algorithms try to prune and eliminate data irrelevant to a given query in order to narrow down the pool of data, which eventually results in a different data set from query to query; this can itself be seen as a varying data set within which NNS is further performed. Also note that our approach (if optimally designed) automatically eliminates those dimensions whose contribution to finding the nearest neighbor is insignificant. However, our method cannot, for instance, transform the basis/coordinates of the data to achieve dimensionality reduction. Therefore, if the user prefers, the data can first be preprocessed into a more suitable/compact domain to achieve better performance, using, for example, principal component analysis [26], latent semantic indexing [21], independent component analysis, multidimensional scaling, or singular value decomposition using linear transforms (e.g., KLT, DFT, DCT, DWT).

Finally, our proposed approach can also be used for k-nearest neighbor search or orthogonal range search, where the quantization itself is a user specified input. If one insists on finding the exact nearest neighbor, this approach can also be used as a preliminary filtering step to discard unlikely candidates, followed by a refinement process performed on the remaining set.

The details of the proposed QNNM algorithm are described in the following sections.

2.4.3 Basic Structure of Proposed QNNM

There are three assumptions under which the proposed QNNM is developed:

1. A D-dimensional vector space U = R^D.

2. There is no cross-interference between dimensions in the original metric d. In other words, each dimension's dissimilarity is measured in an isolated fashion and the results are then averaged/combined together (refer to Table 2.1).
The general metric function structure d can be written as:

d(q,r) = \sum_{j=1}^{D} d_j(q_j, r_j), \quad r \in U    (2.16)

This metric computation structure comprises two basic processes: (i) the distance computation in each dimension (we refer to it as the dimension-distance d_j(q_j,r_j)), and (ii) the summation/combination of all such dimension-distances.

3. The performance of the NNS algorithm in terms of its accuracy is evaluated using the expected solution quality ǭ (the degree of average solution quality degradation introduced by the algorithm relative to that of the benchmark (full search) algorithm):

\bar{\epsilon} = E_q\left[\frac{d(q, r^*_\psi(q)) - d(q, r^*(q))}{d(q, r^*(q))}\right]    (2.17)

where

r^* = \{r' \in R \mid \forall r \in R \subset U, q \in U : d(q,r') \le d(q,r)\}
r^*_\psi = \{r' \in R \mid \forall r \in R \subset U, q \in U : d_{obj}(q,r') \le d_{obj}(q,r)\}

Table 2.1: Generalized metric function structure of interest (average/sum of per-dimension distance measurements) and corresponding metric function examples.

Generalized metric structure of interest:  d(q,r) = \sum_{j=1}^{D} d_j(q_j, r_j)
Manhattan:             d(q,r) = \sum_{j=1}^{D} |q_j - r_j|
Euclidean:             d(q,r) = \left(\sum_{j=1}^{D} |q_j - r_j|^2\right)^{1/2}
Minkowski:             d(q,r) = \left(\sum_{j=1}^{D} |q_j - r_j|^p\right)^{1/p},  p \in \mathbb{R}^+
Weighted Minkowski:    d(q,r) = \left(\sum_{j=1}^{D} w_j |q_j - r_j|^p\right)^{1/p}
Generalized Minkowski: d(q,r) = \left(\sum_{j=1}^{D} c_j |q_j - r_j|^p\right)^{1/p},  c_j > 0, \sum_{j=1}^{D} c_j = 1
Canberra:              d(q,r) = \sum_{j=1}^{D} \frac{|q_j - r_j|}{|q_j + r_j|}
Inner product (similarity): d(q,r) = \sum_{j=1}^{D} q_j \cdot r_j

Our proposed approach simplifies the metric computation by approximating a given metric using a quantization process. We first provide a brief description of quantization and then proceed to describe our solution.

2.4.3.1 Quantization

Quantization maps a sequence of continuous or discrete vectors into a digital sequence such that a lower bit-rate can be used to represent the information while maintaining whatever fidelity of the data is necessary. A scalar quantizer simply divides a 1-dimensional space into a set of non-overlapping intervals or cells S = {s_i; s_i = [θ_i, θ_{i+1})} that cover all possible values, where {θ_i} is a set of thresholds. A quantizer in general consists of two mappings: a forward quantizer partitions the input data space into disjoint and exhaustive cells and assigns to each cell s_i a mapping symbol b_i in some mapping symbol set B; an inverse quantizer assigns to each mapping symbol b_i a reproduction value r_i. Thus, any input that falls into a cell s_i is quantized to s_i's corresponding mapping symbol b_i, which is in turn inverse quantized to its reproduction value r_i. Converting the input data sequence into a sequence of mapping symbols compresses the input data into fewer bits, while the reproduction values reproduce the expected original vector from the mapping symbols.

The goal of a typical quantizer is to produce the best reproduction of the input data given a bit-rate budget. In other words, a quantizer is designed (i.e., the sets of thresholds, mapping symbols, and reproduction values are determined) typically to minimize the reproduction error/distortion (e.g., the mean square error between data points and their representatives).

A forward quantizer function Q, which compresses the input data x and returns the corresponding mapping symbol, can be expressed as:

Q(x) = \sum_i b_i \mathbf{1}_{s_i}(x)    (2.18)

while an inverse quantizer function Q^{-1}, which returns the reproduction value r_i from a mapping symbol b_i, can similarly be expressed as:

Q^{-1}(x) = \sum_i r_i \mathbf{1}_{b_i}(Q(x)) = b_x    (2.19)

where b_x represents the reproduction value of x.
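A minimal sketch of the forward/inverse pair of (2.18)-(2.19), with the mapping symbols taken as consecutive integers (the class and variable names are illustrative, not from the thesis):

```python
import numpy as np

class ScalarQuantizer:
    """Forward/inverse scalar quantizer over cells s_i = [theta_i, theta_{i+1})."""

    def __init__(self, thresholds, reproductions):
        # thresholds: interior cell boundaries {theta_i}, sorted ascending.
        # reproductions: one reproduction value r_i per cell (len(thresholds)+1).
        self.thresholds = np.asarray(thresholds, dtype=float)
        self.reproductions = np.asarray(reproductions, dtype=float)

    def forward(self, x):
        """Q(x): map x to the mapping symbol (here, the index) of its cell."""
        return np.searchsorted(self.thresholds, x, side="right")

    def inverse(self, b):
        """Q^{-1}(b): reproduction value r_i of mapping symbol b."""
        return self.reproductions[b]

# Example: 4 cells with midpoint-style reproduction values.
q = ScalarQuantizer([1.0, 2.0, 4.0], [0.5, 1.5, 3.0, 6.0])
x = np.array([0.2, 1.7, 9.0])
print(q.forward(x))             # cell indices, e.g. [0 1 3]
print(q.inverse(q.forward(x)))  # reproduction values
```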
2.4.3.2 Non-uniform scalar quantization within the QNNM metric

The proposed technique is illustrated in Fig. 2.7. Our proposed metric space mapping function ψ_q : U → Û is a vector of scalar quantization functions

\psi_q = (\psi_{q1}, \psi_{q2}, \cdots, \psi_{qD})    (2.20)

where each ψ_qj is applied independently on the j-th dimension r_j of r = (r_1, r_2, ···, r_D) ∈ R, such that ψ_q(r) = (ψ_{q1}(r_1), ψ_{q2}(r_2), ···, ψ_{qD}(r_D)), with r ∈ U and ψ_q(r) ∈ Û. Each ψ_qj is a non-uniform scalar quantizer chosen based on the query data and the dataset R. Each quantizer ψ_qj may have a different number of quantization levels and different threshold values for each j.

This mapping function ψ_q divides the metric space U into a set of disjoint and exhaustive hyper-rectangular cells, as shown in Fig. 2.7(b), and assigns each cell a cell index vector ψ_q(r) (a vector of integer quantization mapping values). All data points r ∈ R are mapped to one of these cells. Based on their corresponding cell index ψ_q(r), the new distance to the query, d_obj = d_ψ(ψ_q(r)) (e.g., Fig. 2.9(b)), is computed, and the point r^*_ψ nearest in terms of the new metric is returned.

Figure 2.7: Two dimensional example illustration of the proposed QNNM algorithm and the relation between ψ_q and Q and their related spaces: (a) the original metric space (U,d); (b) the target metric space (U_ψ,d_ψ), reached via the NN-preserving query adaptive quantizer ψ_q; (c) the viewpoint space (U_V,d_V), reached via the distance-preserving homogeneity of viewpoints property; and (d) the quantized viewpoint space (U_Q,d_Q), reached via the NN-preserving global quantizer Q, with the simple distance-preserving conversion Ψ_j(r_j) = Q_j(d_j(q_j,r_j)) relating Q_j to Ψ_j. F_R, F_V, and F_Q indicate the distributions of the data set R represented in the original, viewpoint, and quantized viewpoint spaces, respectively. This shows how ψ_q can be obtained from Q such that the overhead computation of finding ψ_q prior to each query can be avoided.

Note that our proposed quantization based mapping function differs in certain ways from typical quantization, as illustrated in Fig. 2.8. The goal of our quantization based mapping function is not to minimize the input data reproduction error given a fixed bit-rate budget, but to minimize ǭ, the average NNS performance loss, given a fixed complexity budget. Conventionally, rate is defined as the average number of bits per source vector required to describe its corresponding mapping symbol. We instead define rate as the cardinality of the set of mapping symbols B, since the complexity of the mapping function ψ_q increases proportionally with the size of B. Note that it is possible for multiple different cells to have the same mapping symbol.

Figure 2.8: Comparison of a conventional quantization scheme and our proposed quantization based metric function design. A conventional quantizer maps input vectors {r_i} to mapping symbols {b_i} and reconstructed inputs {r̂_i}, and is designed to compress {b_i} while minimizing the reconstruction error (e.g., MSE). Our quantizer ψ_q maps input vectors to mapping symbols feeding the nearest neighbor metric d_A(r,q), and is designed to preserve NN quality (minimize DE).

2.4.3.3 Quantization-based metric design conditions

The design of (U_ψ, d_ψ) consists of determining the d_ψ metric and the quantization based mapping function ψ_q (2.20).
ψ_q is a set of scalar quantizers {ψ_qj}_j, each of which includes three sets of parameters: the number of quantization levels, a set of quantization thresholds, and a set of mapping symbols. The number of quantization levels and the quantization thresholds determine how to partition the original metric space (U,d) into a set of hyper-rectangles, while the quantization mapping symbols and d_ψ determine the ranking of each hyper-rectangle in terms of the probability of containing the NN. All of these parameters should be chosen so as to maximize NNS performance in both search accuracy (ǭ) and computational complexity.

During the similarity search process, the query q is fixed and all metric computations measure distances between q and other data points. By exploiting this fact, d_ψ(q,r) can be made a function of only r. Due to the second assumption of no cross-interference among dimensions in the metric function d, ψ_q can be designed in such a way that d reduces to a d_ψ which is simply a sum of vector elements:

d_\psi(q,x) = \sum_{j=1}^{D} x_j    (2.21)

or d_ψ can be an even simpler bitwise OR of vector elements, especially when ψ_q is a 1-bit quantizer:

d_\psi(q,x) = \vee x_j    (2.22)

Note that the mapping symbols of ψ_q can be designed more intelligently than as simple consecutive numbers, so as to simplify d_ψ to bitwise operators.

If summation is used as d_ψ, as in (2.21), d_ψ can be implemented with a simple D-leaf binary adder tree. d_obj, which is our ultimate goal, can then be represented as:

d(q,r) \cong d_{obj}(q,r) = d_\psi(\psi_q(r)) = \sum_{j=1}^{D} \psi_q(r)_j = \sum_{j=1}^{D} \psi_{qj}(r_j)    (2.23)

In other words, any original/benchmark metric d satisfying the structure d(q,r) = \sum_{j=1}^{D} d_j(q_j,r_j) can be reduced to the metric d_{obj}(q,r) = \sum_{j=1}^{D} \psi_{qj}(r_j), provided the scalar quantizer set ψ_q is optimally designed in such a way that the nearest neighbor found with the d_obj metric is very close to that found with the original d metric (i.e., small ǭ).

Note that this metric d_ψ obviously violates the classical notion of a metric (the symmetry and triangle inequality conditions), since it only defines the distance from a point to the fixed q. It is thus only meaningful for similarity search purposes.

Since our goal is not to reproduce or approximate the input point r but to simplify the new metric space U_ψ and d_ψ, the mapping symbol set can simply be a set of consecutive integers beginning with 0 instead of, for example, the centroid of the corresponding quantization bin. The benefit of compressing r into fewer bits in each dimension is the reduced input bit width of the d_ψ computation, so that the binary adder tree of d_ψ (assuming d_ψ is the summation operator) can be simplified significantly. As mentioned earlier, however, the mapping symbols can also be designed to simplify d_ψ into a bitwise operator. Furthermore, they can be designed to improve NNS performance (reduce ǭ) by controlling the slope of the d_obj metric surface depending on the input distribution of the data set R.

2.4.3.4 Avoiding the overhead complexity

Given R and q, ψ_q and d_ψ need to be designed to minimize both the complexity and the average NNS error ǭ (2.17). However, finding the optimal query-dependent ψ_q satisfying such conditions prior to each query operation would impose significant overhead costs. This can be avoided based on the homogeneity of viewpoints property studied in [19], which allows us to consider a viewpoint space (U_V,d_V) (e.g., Fig. 2.7(c)), where we denote by v the vector of dimension-distances d_j(q_j,r_j) (2.16) between a query point and a search point:

v := \vec{d}(q,r) \in U_V    (2.24)
where we define the \vec{d}(\cdot) operator as:

\vec{d}(x,y) := (d_1(x_1,y_1), d_2(x_2,y_2), \cdots, d_D(x_D,y_D))    (2.25)

Then, under the assumption of viewpoint homogeneity (which makes the distance distribution on the viewpoint space query-independent), we can generate off-line statistics over multiple queries and model a dataset by the overall distance distribution F_V of v in U_V. Alternatively, if the distribution f_R of the data set R can be fitted with a standard distribution such as a multivariate normal, the distribution f_V can easily be computed from the cross-correlation of f_R and f_q (the distribution of queries), which is discussed in detail in Section 2.5. F_V of v in the viewpoint space is the distribution of \vec{d}(r_1,r_2), where r_1 and r_2 are i.i.d. with the data set distribution f_R:

F_V(x) = \Pr(v \le x) = \Pr(\vec{d}(r_1,r_2) \le x), \quad r_1, r_2 \sim f_R    (2.26)

If the distribution of queries f_q differs from that of the data set f_R, then

F_V(x) = \Pr(v \le x) = \Pr(\vec{d}(q,r) \le x), \quad q \sim f_q,\ r \sim f_R    (2.27)

where F_V represents the probability that there exist objects whose distance v to a given arbitrary query is smaller than x.

Given this query-independent model F_V, instead of directly finding the ψ_q : U → U_ψ minimizing ǭ (2.17) for every given query q, we can alternatively look for a query-independent mapping function Q : U_V → U_Q (e.g., Fig. 2.7(c)(d)) which minimizes ǭ and satisfies the following condition:

d_{obj}(q,r) = d_\psi(\psi_q(r)) = d_Q(Q(v)).    (2.28)

In other words, the problem of designing a low complexity d_ψ and finding the optimal ψ_q given q and R can be replaced by the problem of designing a low complexity d_Q and finding the optimal Q that minimizes ǭ given F_V. This is because Q is query-independent, which allows an off-line process to find the optimal Q, and also because the ǭ of ψ_q : U → U_ψ is identical to the ǭ of Q : U_V → U_Q.

To minimize the overhead cost of converting the optimal Q into the optimal ψ_q prior to each query process, we use the same metric for d_Q and d_ψ (d_Q = d_ψ) and design Q to be analogous to ψ_q (e.g., Fig. 2.9(c)), as follows:

Q(v) = (Q_1(v_1), Q_2(v_2), \cdots, Q_D(v_D)), \quad v \in U_V    (2.29)

where each Q_j is similarly a non-uniform scalar quantization function applied independently on the j-th dimension v_j of the U_V space. Thus, similarly to ψ_q, Q partitions the viewpoint space U_V into a set of hyper-rectangular cells U_Q (e.g., Fig. 2.7(d)), where each cell is represented by a vector of mapping symbols Q(v), which we denote by z := Q(v) ∈ U_Q. Once the optimal Q that minimizes ǭ and satisfies the above design conditions has been obtained 'off-line', then, given a query q prior to each query process, the optimal ψ_q can be obtained by the following simple equation:

\psi_{qj}(r_j) = Q_j(v_j) = Q_j(d_j(q_j,r_j)), \quad \forall j    (2.30)

For example, let the scalar quantizer Q_j divide the j-th dimension into a set of intervals S = {s_i; s_i = [θ_i, θ_{i+1})} covering all possible values, with a set of thresholds {θ_i}_i, and assign a mapping symbol m_i to each interval s_i.
Q_j(v_j) = \sum_i m_i \mathbf{1}_{s_i}(v_j)    (2.31)

If d is the ℓ_p norm, for instance, the scalar quantizer ψ_qj is determined according to (2.30) such that

\psi_{qj}(r_j) = \sum_i m_i \mathbf{1}_{\sigma_i}(r_j)    (2.32)

with a set of ψ_qj quantization thresholds \{q_j \pm \sqrt[p]{\theta_i}\}_i and a set of corresponding intervals

\Sigma = \{\sigma_i;\ \sigma_i = [q_j + \sqrt[p]{\theta_i},\ q_j + \sqrt[p]{\theta_{i+1}}) \cup [q_j - \sqrt[p]{\theta_{i+1}},\ q_j - \sqrt[p]{\theta_i})\}

In other words, we first find the set of thresholds {θ_i}_i for the optimal Q_j and store \{\sqrt[p]{\theta_i}\}_i off-line; then, for every new query q, a new set of thresholds \{q_j \pm \sqrt[p]{\theta_i}\}_i for ψ_qj is updated on the fly. Note that this computation for updating ψ_qj is done only once per query q, before computing d_obj for all data points to identify q's NN, and therefore costs only negligible overhead.

2.4.3.5 Q-space vs. target space

Even though the query adaptive quantization based NNS metric is our target metric d_obj, which maps the original metric space to the target space (U_ψ, d_ψ),

d_{obj}(q,r) = d_\psi(\psi_q(r))

one can similarly use a query-independent quantization based NNS metric which maps the original metric space to the Q-space:

d_{obj2}(q,r) = d_Q(Q(|q-r|))

Both have their pros and cons. Since the former metric d_obj uses threshold values that change depending on the query data, there is more restriction in optimizing the hardware circuit complexity. The latter metric d_obj2, by contrast, always uses fixed thresholds; the hardware circuit complexity can therefore be simplified even further, making it more appropriate for a dedicated chip. However, the absolute difference between the query and the objects then needs to be computed first.

Fig. 2.9 illustrates the metric computation hardware architectures for the Minkowski metric: Fig. 2.9(a) represents the original Minkowski metric d, Fig. 2.9(b) represents our target metric d_obj, where the query adaptive quantizer ψ_q is used, and Fig. 2.9(c) shows the equivalent metric of Fig. 2.9(b), but using the global quantizer Q_j. Figs. 2.9(b) and (c) show that the 8 or 16 bit-depth databus interconnect is quantized into 1 bit. Although this is just an example, almost all of our simulations are based on a 1-bit quantizer (per dimension), which we find sufficient.

Figure 2.9: Example illustration of the hardware architectures of three metric computations. (a) The Minkowski metric d(q,r) = (\sum_j |q_j - r_j|^p)^{1/p} is shown as an example original metric d. (b) shows the proposed QNNM metric d_obj(q,r) = d_ψ(ψ_q(r)), which approximates d; ψ_qj is a query-dependent non-uniform scalar quantizer which compresses the input to fewer bits (typically 8 or 16 bits into 1 bit), replacing the |q_j − r_j|^p computation of (a). A blank circle represents an operator determined by d_ψ: e.g., an adder if d_ψ(x) = \sum x_j, a comparator if d_ψ(x) = max(x_j), or the logical OR operator if d_ψ(x) = \vee x_j.
(c) is the equivalent metric d_obj of (b), represented with the query-independent quantizer Q_j. The Q_j minimizing ǭ (2.17) is found via off-line optimization and is used to determine ψ_qj, which equivalently minimizes the average NNS error ǭ.

Figure 2.10: Trade-off between complexity (cost) and NNS accuracy degradation (ǭ): coarser quantization causes higher NNS distortion ǭ at a lower complexity cost, while finer quantization leads to lower ǭ at the expense of a higher cost. Different applications (i.e., different F_V) result in different trade-off curves. The right-hand side of each curve represents the region of inefficient choices of Q and d_Q, while the left side represents infeasible choices of the Q and d_Q design. Our goal is to design Q and d_Q, given F_V and ǭ_tol, such that the resulting complexity and ǭ pair corresponds to the red points on the curves, achieving the lowest complexity with ǭ ≤ ǭ_tol. (Left) QNNM performance: the trade-off curve between accuracy loss (ΔPSNR in dB) and complexity, with QNNM applied to the vector quantization process for image coding (ℓ2; D = 8, 16, 32) and the motion estimation process for video coding (ℓ1; IPPP, IBBP, and VBS/IBBP/Q-pel full-search SAD configurations), including the 1-bit QNNM operating points. (Right) For a data set R ∼ N_D(0,Σ) with ρ_ij = 0.5 and σ² = 100 for all i,j, a comparison of the average NN approximation error ǭ of three approximate NNS methods: bit truncation (BT), dimension subsampling (DS), and QNNM (ℓ2; D = 20, 30).

In Section 2.5, we develop and describe the optimal quantizer design which minimizes ǭ given a fixed complexity budget, based on all the design conditions described above.

2.5 Finding the Optimal Q

The complexity reduction comes at the expense of some performance loss due to the quantization process. As expected, there is a trade-off between complexity and performance, such that coarser quantization leads to further complexity reductions while increasing the degradation in search performance. To maximize the performance for a fixed number of quantization levels or steps, it is critical to find the optimal quantizer.

As shown in Section 2.4, the problem of finding the optimal target metric space (U_ψ, d_ψ) for the proposed d_obj can be replaced by the problem of finding a quantized viewpoint space (U_Q, d_Q) which minimizes ǭ and the complexity of its mapping function Q and d_Q metric. The design of (U_Q, d_Q) consists of determining the d_Q metric and the quantization based mapping function Q (2.29). Q is a set of scalar quantizers {Q_j}_j, each of which includes three sets of parameters: the number of quantization levels b_j, a set of quantization thresholds {θ_ji}_i, and a set of mapping symbols {m_ji}_i. {b_j} and {θ_ji} determine how to partition the viewpoint space (U_V, d_V) into a set of hyper-rectangles, while {m_ji}_i and d_Q determine the ranking of each hyper-rectangle in terms of the probability of containing the NN. The whole design of d_Q, {b_j}, {θ_ji}, and {m_ji} should be chosen so as to maximize NNS performance in both search accuracy and computational complexity.
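To make the end-to-end query process concrete, the sketch below implements a QNNM-style search for the simplest configuration discussed above: ℓ1 dimension-distances, a 1-bit quantizer per dimension, and d_ψ taken as the sum (2.21), with the per-query threshold update following (2.30). The function name and the assumption of a single offline threshold per dimension are illustrative, not the thesis's exact design.

```python
import numpy as np

def qnnm_search(q, R, theta_roots):
    """QNNM-style 1-bit-per-dimension approximate NN search (toy sketch).

    q           : (D,) query vector.
    R           : (N, D) data set.
    theta_roots : (D,) offline-designed thresholds theta_j**(1/p) of the
                  global quantizers Q_j (one threshold per dimension,
                  i.e., a 1-bit quantizer; here p = 1).
    """
    # Per-query threshold update, eq. (2.30): psi_qj uses q_j +/- theta^(1/p).
    lo, hi = q - theta_roots, q + theta_roots
    # 1-bit quantizer output: symbol 0 inside [q_j - t, q_j + t), 1 outside.
    bits = ((R < lo) | (R >= hi)).astype(np.uint8)
    # d_psi as the sum of mapping symbols, eq. (2.21): with 1-bit symbols this
    # counts the dimensions falling outside the "likely NN" band around q.
    d_obj = bits.sum(axis=1)
    return int(np.argmin(d_obj))
```

Note that the only work done per query is the threshold shift (one addition and one subtraction per dimension); the thresholds theta_roots themselves come from the off-line optimization developed in this section.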
Finding such an optimal Q-space design depends strongly on the input data distribution F_V (2.27) and the degree of NNS error tolerance ¯ǫ_tol of the given application. For example, in some applications (e.g., motion estimation for video coding, Section 2.7.1), the coarsest 1-bit quantizer Q_j and a simple d_Q built from OR logic operators (d_Q(z) = ∨z_j) still result in negligible performance degradation, while other applications (e.g., vector quantization for image coding, Section 2.7.2) may require a 2-bit quantizer Q_j with d_Q(z) = Σ z_j to achieve similar performance.

Note that since the NNS error tolerance level ¯ǫ_tol is application specific, applications with very small ¯ǫ_tol may not benefit much from the proposed QNNM technique. A user can estimate the performance of QNNM within a given application of interest, and so decide whether to use it, without first having to find the optimal Q and d_Q functions (see Section 2.6.2 for how). Once a user decides to use QNNM, one can proceed to find the optimal Q function.

In this section, we present an optimization algorithm that finds the optimal mapping function Q* and metric d_Q* which minimize both the average NNS accuracy degradation ¯ǫ and the computational complexity cost of d_obj = d_Q(Q(v)) (2.28):

(Q*, d_Q*) = argmin_{(Q, d_Q)} [¯ǫ, cost]^T    (2.33)

Simultaneously optimizing the two conflicting objective functions, f_obj1 = ¯ǫ and f_obj2 = cost, produces a set of Pareto optimal solutions, as shown by the trade-off curves between ¯ǫ and cost in Fig. 2.10. Our design goal is to find a Pareto optimal solution whose resulting ¯ǫ is within the acceptable range (¯ǫ ≤ ¯ǫ_tol) while cost is minimized.

An optimization process in general consists of two phases: the search process (generating candidate solutions) and the evaluation process (evaluating solutions, i.e., computing f_obj). Note that in the context of this optimization, 'search' refers to the search for the optimal set of quantizer design parameters, which should not be confused with the search performed in NNS itself.

Problem (2.33) is a stochastic optimization (SO) problem [49], since both the input and the output of the objective function (f_obj1 = ¯ǫ in particular) involve stochastic behavior, and the aim is optimality on average. [Footnote 1: This contrasts with conventional deterministic optimization, where the values of the objective function are assumed to be exact. While SO algorithms are also often taken to mean randomized search methods, which incorporate randomness into the search algorithm, SO in this thesis refers to optimization methods for stochastic objective functions.] In other words, f_obj1 can only be estimated, typically through a Monte-Carlo simulation approach (e.g., training data samples are simulated to evaluate the average NNS performance f_obj1 = ¯ǫ). Thus, unlike in conventional deterministic optimization problems, reliable evaluation of the stochastic f_obj alone imposes a serious computational burden. Moreover, most of the properties useful for designing an efficient search process (smoothness, continuity, differentiability, linearity, convexity, unimodality, etc.) tend to be unavailable, which further complicates the design of a search process that finds the global minimum. Furthermore, this optimization problem often involves searching a high dimensional solution/search space. [Footnote 2: Note that the search space of (2.33) should not be confused with the metric space U or U_ψ (2.14) of the NNS problem. Each dimension of the search space of (2.33) represents a θ_ji, an m_ji, or d_Q.]
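To make this evaluation burden concrete, the sketch below (our own illustrative Python; the equicorrelated Gaussian model, the ℓ1 original metric, the single fixed threshold, and the relative-error form of (2.17) are example choices, not the thesis's exact setup) estimates f_obj1 = ¯ǫ for one candidate quantizer by brute-force Monte-Carlo simulation:

```python
import numpy as np

rng = np.random.default_rng(0)

def qnnm_metric(q, R, theta):
    """Example 1-bit QNNM cost: per dimension, flag |q_j - r_j| > theta and let
    d_Q be the sum of the flags (an OR-based d_Q would use .any(axis=1))."""
    return (np.abs(R - q) > theta).sum(axis=1)

def estimate_eps(D=16, N=256, T=2000, rho=0.5, theta=1.0):
    """Monte-Carlo estimate of f_obj1 = eps-bar for one candidate quantizer,
    assuming the relative-error form of (2.17), with T training queries drawn
    from an equicorrelated Gaussian model."""
    cov = np.full((D, D), rho) + (1.0 - rho) * np.eye(D)
    err_num = err_den = 0.0
    for _ in range(T):
        q = rng.multivariate_normal(np.zeros(D), cov)
        R = rng.multivariate_normal(np.zeros(D), cov, size=N)
        d = np.abs(R - q).sum(axis=1)                        # original l1 metric
        d_true = d.min()                                     # d(q, r*)
        d_hat = d[int(np.argmin(qnnm_metric(q, R, theta)))]  # d(q, r*_psi)
        err_num += d_hat - d_true
        err_den += d_true
    return err_num / err_den

print(estimate_eps())
```

Repeating such a T-sample evaluation independently for every candidate (Q, d_Q) visited by the search is exactly the O(TN_s) cost that the shared-preprocessing formulation below avoids.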
These three characteristics (an expensive evaluation process, a difficult search process, and a large search space) make this stochastic optimization problem extremely computationally expensive. Our goal is therefore to design a highly specialized optimization algorithm for this problem that minimizes the computational cost as much as possible.

2.5.1 Proposed Stochastic Optimization Algorithm to Find the Optimal Q

The proposed algorithm reduces complexity by formulating f_obj1 such that a large portion of the f_obj1 computation can be shared and computed only once, as a preprocessing step for a whole set of (quantizer) solution points, instead of computing f_obj1 for each solution point independently. This changes the total optimization complexity from O(TN_s) to O(T + c_1 + c_2 N_s), where T is the size of the training data, which needs to be sufficiently large, and N_s is the total number of candidate solutions evaluated during the search process; c_1 and c_2 are the preprocessing cost and the f_obj1 evaluation cost, respectively. This requires a joint design of the search and evaluation processes.

2.5.1.1 Modeling the input data set distribution

We first need to model the input data set of interest R and its corresponding viewpoint space distribution f_V. If we denote the distribution of the data set R by f_R, then f_V can be derived from the autocorrelation of f_R, or from the cross-correlation of f_R and f_q if the query distribution f_q differs from the data set distribution f_R. For example, if the dimensional distance d_j(q_j, r_j) of the original metric d(q,r) as in (2.16) has the form w_j |q_j − r_j|^p, the probability density function (pdf) of |q − r|, denoted f_{|q−r|}, is:

f_{|q−r|}(a) = Σ_{|x|=a} (f_R ⋆ f_R)(x),   q, r ∼ f_R,  a ≥ 0    (2.34)
F_{|q−r|}(a) = ∫_{−|a|}^{|a|} (f_R ⋆ f_R) dx,   q, r ∼ f_R,  a ≥ 0    (2.35)
f_{|q−r|}(a) = Σ_{|x|=a} (f_q ⋆ f_R)(x),   q ∼ f_q, r ∼ f_R,  a ≥ 0    (2.36)
F_{|q−r|}(a) = ∫_{−|a|}^{|a|} (f_q ⋆ f_R) dx,   q ∼ f_q, r ∼ f_R,  a ≥ 0    (2.37)

where ⋆ is the cross-correlation operator, defined as

(f ⋆ g)(a) = ∫_{−∞}^{∞} f*(x) g(a + x) dx.    (2.38)

Then f_V can be represented as

f_V(a) = f_{|q−r|}(g^{−1}(a))    (2.39)
F_V(a) = F_{|q−r|}(g^{−1}(a))    (2.40)

where g(x) = (w_1 x_1^p, w_2 x_2^p, ···, w_D x_D^p).

For example, if we model the data set R with a D-variate normal distribution, r, q ∼ N_D(μ, Σ) with mean vector μ and covariance matrix Σ, then (q − r) ∼ N_D(0, 2Σ). If d is the ℓ_1 norm,

f_V(a) = (2 / ((2π)^{D/2} |Σ|^{1/2})) exp(−a′Σ^{−1}a / 2),   a ≥ 0    (2.41)

If d is the ℓ_2 norm, so that g^{−1}(a) is the component-wise square root √a,

f_V(a) = (2 / ((2π)^{D/2} |Σ|^{1/2})) exp(−(√a)′Σ^{−1}(√a) / 2),   a ≥ 0    (2.42)

2.5.1.2 Objective function formulation

Given the input distribution F_V, our objective function f_obj1 = ¯ǫ can be formulated using F_V. f_obj1 = ¯ǫ is the NNS error measure defined in (2.17), and consists of two terms, d(q, r*(q)) and d(q, r*_ψ(q)). Finding an NNS algorithm which minimizes ¯ǫ is the same as finding one that minimizes d(q, r*_ψ(q)), since the E[d(q, r*(q))] term is constant given F_V (or the data set R), while E[d(q, r*_ψ(q))] changes with the Q parameters. Therefore, f_obj1 can be reduced to:

f_obj1 = E[d(q, r*_ψ(q))] = Σ_a μ_Q(a) f_Q^min(a)    (2.43)

where f_Q^min is the pdf corresponding to F_Q^min(a), and

F_Q^min(a) = Pr(d_obj(q, r*_ψ) ≤ a),    (2.44)
μ_Q(a) = E[d(q,r) | d_obj(q,r) = a, ∀q,r ∈ U].    (2.45)

To compute μ_Q and F_Q^min, we first assign three parameters to each cell c_z of the set of hyper-rectangular cells defined by Q (Fig. 2.7, lower right):
(i) a probability mass p_z, (ii) a non-normalized centroid u_z, and (iii) a distance d_z = Σ_j z_j:

p_z = ∫_{c_z} f_V(v) dv    (2.46)
u_z = ∫_{c_z} ⟨v, 1⟩ f_V(v) dv    (2.47)

Then F_Q^min and μ_Q(a) are computed as:

F_Q(a) = Σ_{d_z ≤ a} p_z    (2.48)
F_Q^min(a) = 1 − (1 − F_Q(a))^N    (2.49)
μ_Q(a) = (Σ_{d_z = a} u_z) / (Σ_{d_z = a} p_z)    (2.50)

2.5.1.3 Preprocessing based objective function evaluation

Computing f_obj1 is simple once p_z and u_z are known for all cells c_z, but obtaining p_z and u_z in the first place can be complex. However, if the following two data structures F_V and H_V are available or computed in a preprocessing stage:

F_V(x) = Pr(v ≤ x)    (2.51)
H_V(x) = Σ_{v ≤ x} ⟨v, 1⟩    (2.52)

then P_z = Σ_{z′≤z} p_z′ and U_z = Σ_{z′≤z} u_z′ can be easily computed for each cell c_z, so that all the necessary p_z, u_z values can be obtained with only c_2 = O(DN_C) cost, where D is the dimensionality and N_C is the total number of cells generated by Q: N_C = ∏_j (b_j + 1), where b_j denotes the number of thresholds assigned by Q on the j-th dimension of U_V.

Figure 2.11: (a) A simple example of a 2D viewpoint space partitioned by Q = {θ_1, θ_2, θ_3}. The P_z, U_z values at all six gray points need to be retrieved from F_V, H_V to compute f_obj. (b) The 3D search space of Q; x_1, x_2, x_3 are three arbitrarily chosen candidate solutions for Q. (c) The preprocessed structures (F_V, H_V holding the P_z, U_z values at all points shown) needed to compute f_obj for x_1, x_2, x_3. Notably, f_obj1 for all the gray points in (b) can be computed with the same F_V, H_V of (c).

However, the computational (c_1) and storage complexity of F_V and H_V grow exponentially (e.g., O(DW^D), assuming all dimensions are represented with the same resolution W). To reduce this complexity, it is very important to minimize the dimensionality D where possible [Footnote 3: D is reducible, depending on the input distribution F_V, if certain dimensions are independent or interchangeable/commutative. This is usually the case for real-world applications; e.g., in video coding, all pixels tend to be heavily correlated yet have interchangeable statistical characteristics, so the common 16×16 processing-unit image block (D = 256) can be reduced to D = 1.], but it is also important to note that only a small fraction of the F_V and H_V data, namely that relating to the candidate solutions on the search path, is used during the optimization process. We next propose a search algorithm that maximally reuses the F_V and H_V data, and show how F_V and H_V can be updated in conjunction with the search process in order to reduce the overall storage and computation.

2.5.1.4 Preprocessing and search process co-design

Given k arbitrary solution points in the search space, the preprocessing cost S_k of building F_V and H_V containing only the data necessary to compute f_obj1 at those k points is the same as that for computing f_obj1 at K different solution points G which form a grid, where:

K = ∏_j C((k+1)b_j, b_j),   S_k = ∏_j (k b_j + 1)    (2.53)

with C(·,·) denoting the binomial coefficient. In other words, if every adjacent pair of solution points in G is connected by a line of equal length Δ, all such regularly spaced lines form a D_S-dimensional grid in the D_S-dimensional search space. In that case, the evaluation of the solution points in G can maximally reuse data from F_V and H_V, leading to minimal preprocessing cost in both space and time complexity.
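The sketch below (our own minimal Python, for a 2D viewpoint space with a 1-bit quantizer per dimension; the toy pmf is hypothetical) shows this preprocessing based evaluation: cumulative structures standing in for (2.51) and (2.52) yield every p_z and u_z by inclusion-exclusion, after which (2.48)-(2.50) give f_obj1 directly:

```python
import numpy as np

def cum2d(A):
    """C[i, j] = sum of A over [0..i] x [0..j] (used for F_V and H_V)."""
    return A.cumsum(0).cumsum(1)

def rect_sum(C, i0, i1, j0, j1):
    """Inclusion-exclusion mass of the half-open cell [i0, i1) x [j0, j1)."""
    at = lambda i, j: C[i - 1, j - 1] if (i > 0 and j > 0) else 0.0
    return at(i1, j1) - at(i0, j1) - at(i1, j0) + at(i0, j0)

def f_obj1(fV, t1, t2, N):
    """Evaluate (2.43) on a W x W viewpoint pmf fV with one threshold per
    dimension (1-bit quantizers at grid indices t1, t2); N = data-set size."""
    W = fV.shape[0]
    v1, v2 = np.meshgrid(np.arange(W), np.arange(W), indexing="ij")
    FV = cum2d(fV)               # preprocessed structure (2.51)
    HV = cum2d(fV * (v1 + v2))   # preprocessed structure (2.52)
    p = np.zeros(3)              # mass of cells grouped by d_z = 0, 1, 2  (2.46)
    u = np.zeros(3)              # centroid sums, same grouping            (2.47)
    for zi, (i0, i1) in enumerate([(0, t1), (t1, W)]):
        for zj, (j0, j1) in enumerate([(0, t2), (t2, W)]):
            p[zi + zj] += rect_sum(FV, i0, i1, j0, j1)
            u[zi + zj] += rect_sum(HV, i0, i1, j0, j1)
    FQ = np.cumsum(p)                                    # (2.48)
    Fmin = 1.0 - (1.0 - FQ) ** N                         # (2.49)
    fmin = np.diff(np.concatenate(([0.0], Fmin)))
    mu = np.divide(u, p, out=np.zeros(3), where=p > 0)   # (2.50)
    return float((mu * fmin).sum())                      # (2.43)

# toy usage: a separable 64 x 64 viewpoint pmf and one candidate Q
g = np.exp(-0.02 * np.add.outer(np.arange(64) ** 2, np.arange(64) ** 2))
g /= g.sum()
print(f_obj1(g, 10, 10, N=256))
```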
Fig. 2.11 provides a simple example illustrating that the minimum required preprocessing cost to compute f_obj1 for a set of three arbitrary solution points (x_1, x_2, x_3 in Fig. 2.11 (b), k = 3) is the same as that for a set of 112 solution points forming a grid structure (all the gray points in Fig. 2.11 (b), K = 112).

Based on the above observation and the unimodality of the f_obj1 function over the search space [Footnote 4: We represent each quantization parameter not by its actual threshold value θ but by the marginal cumulative probability F_V(θ), so that the search space becomes [0,1]^{D_S}. This is not only for ease of f_obj computation; it also increases the slope and reduces the neutrality, ruggedness, and discontinuity of the f_obj function, leading to higher search speed-up and making f_obj unimodal. It also provides further indication of the sensitivity to performance.], we describe a grid based iterative search algorithm framework with guaranteed convergence to the optimal solution. [Footnote 5: Its convergence result is similar to those in [51], [31].]

The basic iteration of this algorithm consists of (i) generating a grid G_i, which equivalently designates the set of solution points corresponding to all grid points; (ii) building the minimum required preprocessed structures F_Vi and H_Vi for computing f_obj at all grid points of G_i; (iii) computing the set of f_obj1 values and finding its minimizer Q*_i over G_i; and (iv) generating the next grid G_{i+1} by either moving or scaling G_i based on the Q*_i information.

Figure 2.12: An example illustration of the grid based optimization algorithm with ω = 2, γ = 3 in a 2D search space, showing grids G_0 through G_4 with initial spacing Δ_0. It searches for the optimal Q* (quantization parameters) for QNNM (2.15) through iterations of either moving or 1/γ-scaling a grid G of size ω×ω. Note that this should not be confused with ME search or NNS search algorithms.

We model a grid G on the search space by its center/location C, grid spacing Δ, and size parameter ω, assuming equal spacing and size in all dimensions, so that the total number of solution points in G is ω^{D_S}. With initialization of the grid-size parameter ω, the grid scaling rate γ, the convergence tolerance Δ_tol > 0, the grid-spacing parameter Δ_0, and the initial grid G_0, for each iteration i = 0, 1, ...:

1. Preprocess: construct F_Vi and H_Vi to evaluate G_i
2. Search: seek a minimizer Q*_i among the points of G_i
3. Update: generate a new grid G_{i+1} based on Q*_i
   ◦ Move the center of the grid: C_{i+1} = Q*_i
   ◦ Grid-spacing update
      • Moving grid: if Q*_i is on the boundary of grid G_i: Δ_{i+1} = Δ_i
      • Scaling grid: if Q*_i is not on the boundary of grid G_i: Δ_{i+1} = Δ_i / γ
   ◦ Terminate: if Δ_{i+1} < Δ_tol
   ◦ Generate G_{i+1}: with parameters ω, Δ_{i+1}, and C_{i+1}

Given this algorithm model, our goal is to find the two integer parameter values, ω and γ, minimizing the overall computational complexity. The overall optimization complexity can be quantified as [Footnote 6: The overall complexity can be further reduced from O(L(T + c_1 + c_2 N_s)) to O(T + Lc_1 + Lc_2 N_s) by splitting and deleting portions of the training data set at each iteration, so that only the relevant data is examined for each update.]:

O(T + Lc_1 + Lc_2 N_s),    (2.54)
c_1 = O(Dω^{D_S}),    (2.55)
c_2 = O(DS_1),    (2.56)

where c_1 is both the time and space complexity of the phase-1 preprocessing and L denotes the total number of iterations. Note that c_2 is fixed regardless of ω and γ. N_s depends on the phase-2 grid search algorithm described below, but roughly varies from O(ωD_S) to O(ω^{D_S}). If we assume that the iteration continues until the grid is as fine as the resolution W, the total number of iterations is L ≈ (γ/ω) log_γ(W/ω). Therefore, the γ ≥ 1 minimizing γ log_γ W, together with the minimum possible integer ω ≥ 2, minimizes the overall complexity in both time and space: that is, γ = 3 and ω = 2 (Fig. 2.12).
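A compact sketch of this move-or-scale loop (our own Python, with ω = 2 and γ = 3; the grid center is also evaluated so that an interior minimizer can trigger the scaling step, which is one reasonable concrete reading of the boundary rule above; f_obj1 is any function handle, standing in for the preprocessed evaluation):

```python
import itertools
import numpy as np

def grid_minimize(f_obj1, dim, delta0=0.25, delta_tol=1e-3, gamma=3):
    """Move-or-scale grid search on the [0,1]^dim search space (thresholds
    encoded as marginal cumulative probabilities, per footnote 4), specialized
    to omega = 2: the grid is the 2^dim corners at spacing delta around the
    center c, plus the center itself."""
    corners = np.array(list(itertools.product((-0.5, 0.5), repeat=dim)))
    c, delta = np.full(dim, 0.5), delta0
    while delta >= delta_tol:
        pts = np.vstack([c[None, :], np.clip(c + delta * corners, 0.0, 1.0)])
        vals = [f_obj1(p) for p in pts]   # reuses F_Vi, H_Vi in the full scheme
        k = int(np.argmin(vals))
        if k == 0:
            delta /= gamma                # Q*_i interior: scale the grid
        else:
            c = pts[k]                    # Q*_i on the boundary: move the grid
    return c

# toy usage with a hypothetical unimodal objective
print(grid_minimize(lambda x: float(np.sum((x - 0.3) ** 2)), dim=2))
```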
Phase-2 Grid Search Algorithm

Given grid points with general k > 1 and Δ, a grid search algorithm can be represented in the form

Q_{n+1} = Π_Θ(Q_n − a_n ∇̂_n(Q_n)),

where a_n is the step size, Π_Θ is a projection of points outside Θ back into Θ, ∇̂_n is the search direction, and n is the iteration index. Note that in determining a_n and ∇̂_n for this problem, typical stochastic or gradient approximation methods are not useful in general: due to the relatively large Δ, gradient approximation is unreliable, and k is often small enough to perform a line search, so a_n = Δ. In determining ∇̂_n, especially as the dimensionality increases, a stochastic choice of ∇̂_n tends to be nearly orthogonal to the true gradient direction, which makes convergence slow. Furthermore, a randomized choice does not use the available information efficiently (e.g., cycling, or revisiting similar poor solutions). An efficient direct search is therefore more suitable.

2.6 Complexity-Performance Analysis

2.6.1 Complexity Analysis

Comparing the two metric computations, the original/benchmark one of Fig. 2.9 (a) and the proposed one of Fig. 2.9 (c), the proposed approach leads to complexity reductions in i) the summation process after quantization (above the dashed quantization line in Fig. 2.13) and ii) the set of dimension-distance computations prior to quantization (below that line in Fig. 2.13).

Fig. 2.14 and Table 2.2 provide useful insight into the computational complexity of the search at the circuit level [7]. Fig. 2.14 illustrates the structure of the arithmetic circuits for a representative N-bit adder (Fig. 2.14 (a)) and a 4-bit array multiplier (Fig. 2.14 (b)), whose sizes increase significantly with the input bit size.

Figure 2.13: The proposed approach leads to complexity reductions in i) the summation process after quantization (significant circuit complexity reduction due to bit-size compression) and ii) the set of dimension-distance computations prior to quantization (the dimension-distance computations become unnecessary).

Figure 2.14: Diagrams of arithmetic circuits for (a) an N-bit ripple-carry adder, built from full-adder (FA) blocks chained through their carries, and (b) a 4-bit array multiplier.

(a) Adders

Adder type       | # of gates (16/32/64 bits) | avg. # of logic transitions per add (16/32/64 bits)
Ripple Carry     |  90 / 182 /  366           | 144 / 288 /  576
Carry Lookahead  | 100 / 202 /  405           | 200 / 401 /  808
Carry Skip       | 108 / 220 /  437           | 170 / 350 /  695
Carry Select     | 161 / 344 /  711           | 284 / 597 / 1228
Conditional Sum  | 218 / 543 / 1323           | 368 / 857 / 1938
(b) Multipliers

Multiplier type  | # of gates (8/16/32 bits) | avg. # of logic transitions per multiply (8/16/32 bits)
Array            | 528 / 2336 /  9792        | 548 / 7191 / 99062
Modified Array   | 567 / 2405 /  9918        | 583 / 7348 / 99102
Wallace          | 573 / 2569 / 10417        | 573 / 3874 / 19548
Dadda            | 612 / 2477 / 10051        | 557 / 3389 / 16638

Table 2.2: The average number of logic transitions (a measure of dynamic power consumption) and the number of gates (a measure of circuit size and static power consumption) for various types of adders (a) and multipliers (b), for different input bit sizes [7].

A block FA (full adder) is a single-bit adder; a typical N-bit adder consists of N full adders linked together through their carry inputs and outputs, and a typical N-bit multiplier consists of an N-by-N array of cells, each comprising a full adder and an AND gate. Table 2.2 (a) and (b) essentially demonstrate that the computational complexity, circuit size, static and dynamic power consumption, and computation delay of the most basic arithmetic elements, including adders and multipliers, are all directly influenced by, and increase polynomially with, the input bit size.

Therefore, quantization applied to the dimension-distance term of each dimension, as shown in Fig. 2.13, leads to a significant simplification of the summation process (the binary adder tree). Typically, 8, 16, or 32 bit-depth input is quantized into 0, 1, or 2 bits, depending on the input distribution of each dimension (non-uniform bit allocation over dimensions). For example, for motion estimation in video coding, very coarse quantization (e.g., 1 bit per pixel) has been shown to be sufficient to leave video coding performance nearly unchanged (0.01dB average loss; see Section 2.7.1).

Figs. 2.9 (b) and 2.9 (c) are equivalent metrics. As our target metric implementation structure of Fig. 2.9 (b) shows, the entire set of dimension-distance computations {d_j(q_j, r_j)}_{j=1}^D, and its corresponding circuits, can be eliminated. Comparing the original/benchmark metric computation

d(q,r) = Σ_{j=1}^D d_j(q_j, r_j)

with our proposed metric computation architecture

d_obj(q,r) = Σ_{j=1}^D ψ_{q_j}(r_j),

the complexity savings depend on how complex the original metric is and on how large a complexity budget the user allows for designing the optimal ψ_{q_j} (e.g., the total number of quantization bins).

Fig. 2.15 illustrates the complexity as a function of the input bit size, the dimensionality, and the order p of the metric (when a p-norm metric is considered), for both the conventional and the proposed metric computations. We measure complexity in units of full-adder operations (the basic building blocks of arithmetic logic circuits), under the assumptions that n-bit addition, subtraction, and absolute value operations have the same complexity, and that a squaring operation has complexity equivalent to that of an n²-bit addition.
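Under exactly these accounting assumptions, a back-of-the-envelope cost model (our own sketch, not the thesis's exact counting) already reproduces the qualitative behavior plotted in Fig. 2.15: the conventional cost grows with bit depth and with p, while the proposed cost depends only on dimensionality:

```python
def full_adders_conventional(D, n, p):
    """Full-adder count for d(q,r) = sum_j |q_j - r_j|^p with n-bit inputs:
    one subtraction plus one absolute value per dimension (n bits each),
    (p - 1) multiplies each modeled as an n^2-bit addition, and an adder
    tree whose operand widths grow with the partial sums."""
    per_dim = 2 * n + (p - 1) * n * n
    tree, width, terms = 0, p * n, D
    while terms > 1:                       # binary adder tree over D terms
        tree += (terms // 2) * width
        terms, width = (terms + 1) // 2, width + 1
    return D * per_dim + tree

def full_adders_qnnm(D, bits=1):
    """Proposed metric: only a bits-wide quantized symbol per dimension enters
    the tree; threshold comparisons replace the |q_j - r_j|^p circuits."""
    tree, width, terms = 0, bits, D
    while terms > 1:
        tree += (terms // 2) * width
        terms, width = (terms + 1) // 2, width + 1
    return tree

for n in (8, 16):   # cost grows with bit depth only for the conventional metric
    print(n, full_adders_conventional(256, n, 2), full_adders_qnnm(256))
```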
Figure 2.15: Complexity behavior of the conventional ℓ1 and ℓ2 norm metric computations vs. the proposed distance-quantization based ℓp norm metric computation, as a function of the input bit size and the dimensionality.

For the motion estimation example, the dimensionality represents the number of pixels per matching block, while the input bit size represents the pixel bit-depth. Note that the complexity of the proposed method remains constant over different input bit sizes and over different original metrics, and increases only slowly with dimensionality compared to the conventional metric computation.

This approach obviously introduces extra complexity for the quantization process itself. However, the quantization implementation can be integrated with the adder block that follows it, so that its overhead cost remains negligible compared to the complexity reduction achieved elsewhere.

2.6.2 Performance Analysis

QNNM performance is optimized and evaluated based not on a worst-case error measure but on an average-case error measure. Its performance is therefore input-data dependent and application specific. In this section, we provide an analysis, identify the statistical characteristics of promising applications, and give a simple technique for estimating QNNM performance for any specific application of interest without having to perform any optimization process.

Note that even if the NN approximation error ¯ǫ is the same, the resulting system-level performance, as well as the application-specific error tolerance level ¯ǫ_tol, may vary from application to application. It is therefore important to estimate how much system-level performance degradation QNNM would introduce, as well as the complexity saving. To do this, one can first analyze the data set R in terms of its dimensionality, average correlation, original search metric, and average distance to the exact NN. Once these are obtained, one can refer to Fig. 2.16 and Fig. 2.17 to roughly estimate the QNNM error ¯ǫ, and then, by introducing an error of ¯ǫ into the NNS system and observing its impact on the system-level result, decide whether to adopt the QNNM method for the application of interest.

In Fig. 2.16 and Fig. 2.17, the data set is modeled by a multivariate normal distribution N_D(μ, Σ). There are three significant statistical characteristics that influence QNNM performance; these can be shown both analytically and numerically, as in Fig. 2.16 and Fig. 2.17.

• Any linear transform of the data set does not change QNNM performance in terms of ¯ǫ. More specifically, neither a scaling of Σ nor a change of μ affects the ¯ǫ of QNNM.

• The QNNM ¯ǫ decays exponentially as the dimensionality increases. [Footnote 7: The dimensionality D here refers to the dimensionality of the viewpoint space; it should not be confused with the dimensionality D_S of the optimization search space. The viewpoint space dimensionality equals that of the metric space where the NN is searched, while the search space dimensionality, as discussed earlier, is the total number of parameters/thresholds used to quantize the viewpoint space.]

• The more correlated (positively, negatively, or in any direction) the data set distribution is, the smaller the QNNM ¯ǫ becomes. [Footnote 8: The two subspaces separated by a hyper-surface of the original metric differ from the two subspaces separated by the hyperplane of a quantization threshold. The regions common to the two pairs of split subspaces cause no error, while the remaining regions introduce error. If the data distribution is correlated, the fraction of data falling into these error-introducing regions is reduced; hence, the stronger the correlation in any direction, the smaller ¯ǫ becomes.]
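These three effects can be checked numerically with the Monte-Carlo estimator estimate_eps sketched in Section 2.5 (our own code; the single threshold is held fixed rather than re-optimized per setting, so the trends are only qualitative):

```python
# Reusing estimate_eps(D, N, T, rho, theta) from the earlier Monte-Carlo sketch.
for rho in (0.0, 0.5, 0.95):       # third bullet: stronger correlation -> smaller eps
    for D in (8, 32, 64):          # second bullet: higher dimension -> smaller eps
        print(rho, D, round(estimate_eps(D=D, rho=rho, theta=1.0), 3))
# First bullet: rescaling the data (hence Sigma) together with the threshold
# leaves the relative error unchanged, since both metrics scale identically.
```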
Fig. 2.16 and Fig. 2.17 show simulated QNNM performance for four specific data set model settings (multivariate normal distributions with identical marginal distribution and covariance across dimensions, with correlation coefficients 0, 0.5, 0.8, and 0.95) over various dimensionalities, and with respect to two different original metrics, ℓ1 and ℓ2. The straight line (no metric) represents the average distance per dimension from a query to a randomly chosen object, which can be considered the worst bound of NNS performance. The blue curves (average distance/D to the exact NN) represent the lower bound of NNS performance. Note that the average distance to the exact NN per dimension is not constant but increases with dimensionality. This is due to the weak law of large numbers: the variance of the distance converges to 0 in probability as the dimensionality grows sufficiently large, so NNS loses its meaning, since all objects converge to the same distance from the query point. While this simulation covers the case of identical correlation across all dimensions, which may not hold for an actual data set of interest, a user can take the zero-correlation (uncorrelated) result as the worst-case bound of QNNM performance. One can compute the average correlation coefficient over all dimensions, interpolate roughly between the curves provided in Fig. 2.16 and Fig. 2.17, and apply the estimated error ¯ǫ to the system of interest [Footnote 9: Instead of searching for the exact NN, let the system choose an object at distance (1 + ¯ǫ)·d(NN, q).] to determine whether the end result is acceptable.

Figure 2.16: QNNM performance when the NNS data set R is modeled with a multivariate normal distribution R ∼ N_D(0, Σ), for different dimensions D and covariance matrices Σ having identical marginal variances σ_j² ∀j and covariances ρ_ij ∀i,j. The straight blue line (no metric) represents the average distance per dimension from a query to a randomly chosen object, i.e., the worst bound of NNS performance; the blue curves (average distance/D to the exact NN) represent the lower bound of NNS performance (original metric). (a) ℓ1 as the original metric; R ∼ N_D(0, Σ) with σ² = 100 ∀j and ρ_ij = 0, 0.5, 0.8, 0.95, ∀i,j. (b) The same setting as (a) except σ² = 400 ∀j. (c) The same setting as (a) except that the original metric is ℓ2 (Euclidean distance). (a) and (b) show that the degree of dispersion of the data set does not affect QNNM performance in terms of the NN approximation error ¯ǫ, while the stronger the correlation ρ and the higher the dimension, the more advantageous the setting is for QNNM.
Figure 2.17: A different representation of Fig. 2.16, showing QNNM performance in terms of the NNS approximation error ¯ǫ over different dimensionalities and covariances. All results use the same data set model settings as Fig. 2.16, with the original/benchmark metric being (Left) ℓ1 and (Right) ℓ2.

2.7 Example Applications

2.7.1 Motion Estimation for Video Compression

In this section, our proposed approach and its analytical study are applied to the motion estimation (ME) process used in video coding systems as an example application. Experimental results are provided to validate our study and to empirically evaluate the performance of the proposed approach.

The ME process is a good example application for the following reasons: i) its computational burden is very heavy; ii) it is inherently tolerant to search approximation error; iii) the dimensions/pixels of its data set are highly correlated (Fig. 2.19 (a): image blocks with highly correlated adjacent pixels; Fig. 2.19 (b): the consequently highly correlated viewpoint distance distribution F_V); iv) it exhibits the homogeneity of viewpoints property, which means that the NN distribution is concentrated in a very narrow range (Fig. 2.19 (c)); v) ME is typically performed on high dimensional data (from 4×4 to 16×16 blocks, i.e., vectors of dimension 16 to 256); and vi) because F and F_V are identically distributed for each dimension/pixel (Fig. 2.19 (a),(b)), the optimization process of finding the optimal Q (discussed in Section 2.5) can be significantly simplified.

Figure 2.18: Performance comparison of bit truncation (BT), dimensional subsampling (DS), and the proposed QNNM method for three different metrics: (a) ℓ1 norm, (b) ℓ2 norm, and (c) weighted ℓ2 norm distance metric. The x-axis represents how much the metric complexity is reduced (in percent), while the y-axis represents the average performance degradation ¯ǫ in finding the NN. The input distribution of the data set is R ∼ N_D(0, Σ) with σ_j² = 100 and ρ_ij = 0.5 ∀i,j.
Figure 2.19: 2D statistical distributions of motion estimation data. (a) The distribution of candidate blocks (in this example, pairs of adjacent pixels) for motion search, i.e., F in metric space. (b) The distribution of the difference/distance between a query block and the candidate blocks, i.e., F_V in viewpoint space. (c) The distribution of the difference/distance between a query block and its best matching block (NN), i.e., F_NN in viewpoint space.

In Fig. 2.20, 9 CIF (352×288) sequences were tested using an H.264/MPEG-4 AVC (JM17.1) encoder. Three ME settings, from simple to complex, were tested: (i) 16×16 blocks (D = 256) with full-pel accuracy and forward prediction only (IPPP); (ii) the same as (i) but allowing bi-directional prediction (IBBP); and (iii) variable block sizes, quarter-pel accuracy, and bi-directional prediction. Two search metrics were tested: the ℓ1 norm (sum of absolute differences, SAD) and the proposed 1-bit and 2-bit QNNM. With the ℓ1 norm, full search (with a search window of ±16) was used for motion estimation, while with the QNNM metric both full search and a representative fast search algorithm, EPZS, were tested. 1-bit QNNM with full search results on average in 0.02dB, 0.06dB, and 0.09dB performance loss for ME settings (i), (ii), and (iii), respectively; 1-bit QNNM with EPZS results on average in -0.01dB, 0.02dB, and 0.02dB performance loss for settings (i), (ii), and (iii), respectively.
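For concreteness, a minimal sketch of a 1-bit QNNM matching cost inside a full-search ME loop (our own Python; a single common threshold stands in for the optimized per-pixel quantizers used in the experiments):

```python
import numpy as np

def qnnm_1bit_cost(query_blk, cand_blk, thresh):
    """1-bit QNNM matching cost: per pixel, one threshold comparison replaces
    the |q - r| SAD term, and the adder tree then sums 1-bit flags."""
    return int((np.abs(query_blk.astype(int) - cand_blk.astype(int)) > thresh).sum())

def full_search(query_blk, ref_frame, center, radius=16, thresh=36):
    """Return the motion vector minimizing the quantized cost over a
    +/-radius search window around `center` (square B x B blocks assumed)."""
    H, W = ref_frame.shape
    B = query_blk.shape[0]
    cy, cx = center
    best = (0, 0, np.inf)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = cy + dy, cx + dx
            if 0 <= y <= H - B and 0 <= x <= W - B:
                c = qnnm_1bit_cost(query_blk, ref_frame[y:y+B, x:x+B], thresh)
                if c < best[2]:
                    best = (dy, dx, c)
    return best
```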
Figure 2.20: Rate-distortion curves for 9 CIF test sequences (Akiyo, Foreman, Bus, Mobile, Football, Stefan, News, Coastguard, and Container), comparing ME performance with the original metric (SAD with full search) and with the proposed 1-bit QNNM (with full search and with EPZS) under three ME settings (P only; IBBP; and IBBP with variable block sizes and quarter-pel accuracy). The ΔdB values shown represent the average performance (PSNR) difference between the original metric and 1-bit QNNM.

Fig. 2.21 shows, for the three ME settings described above and for the two search algorithms, how QNNM performance in ME changes with the average number of quantization levels per pixel. The reason that QNNM with EPZS in Fig. 2.21 (Right) can perform better than the original metric (i.e., a negative performance loss, where the curve goes below zero) is that the original metric, SAD, is itself an approximation of the optimal metric (transform + quantization + entropy coding combined) for optimal motion vector search.
Both Fig. 2.20 and Fig. 2.21 show that QNNM performance with a fast search algorithm is as good as or better than QNNM with the full search method. The two search algorithms can be viewed as operating on different data sets, in terms of both the distribution and the number of objects, and consequently on different viewpoint distributions as well. The reason QNNM works well with such different search algorithms is that the NN distance distributions of the two are relatively similar. Both figures also numerically support our QNNM performance analysis regarding dimensionality: smaller dimensionality (ME with variable block sizes uses partitioned image blocks as small as 4×4) introduces a higher performance loss.

Fig. 2.25 illustrates the trade-offs between complexity and performance for the proposed approach and three representative alternative scenarios. The other sequences tested showed similar results. The proposed approach provides a better trade-off and can also be combined with most existing algorithms to further improve the complexity reduction.

Fig. 2.22 (Left) compares our cost function ¯ǫ with the expected performance error collected from numerically simulated experiments for different input distribution settings f_y. As the number of experiments increases, the expected error converges to our cost function, confirming the accuracy of our ¯ǫ formulation. Fig. 2.22 (Right) compares our cost function based on the collected ME data with simulated experiments.

Figure 2.21: Average QNNM performance over the 9 test sequences, with full search (Left) and with a representative fast search method, EPZS (Right). The numbers in parentheses in the legends give the number of metric computations performed per query macroblock (1089, 1814, and 60403 for full search; 8, 18, and 1452 for EPZS). Three ME settings from simple to complex were tested: (i) 16×16 blocks (D = 256) with forward prediction only and full-pel accuracy; (ii) the same as (i) but allowing bi-directional prediction; and (iii) variable block sizes, quarter-pel accuracy, and bi-directional prediction. With a 1-bit quantizer per dimension/pixel and full search, on average 0.02dB, 0.06dB, and 0.09dB performance loss is incurred for ME settings (i), (ii), and (iii), respectively; a 1-bit quantizer with EPZS results in average -0.01dB, 0.02dB, and 0.02dB performance loss for settings (i), (ii), and (iii), respectively.

Figure 2.22: (Left) Comparison of our objective function f_obj1 with numerical simulation under different input distribution settings (uniform, Rayleigh, log-normal, and the assumed model). (Right) Comparison of the objective function f_obj1 based on the collected ME data (dashed lines) with simulated experiments excluding intra and skip modes (solid lines), for the Foreman, Mobile, and Stefan CIF sequences. Both use 1-bit quantization, hence a single threshold (x-axis).
Figure 2.23: Performance degradation in coding efficiency as a function of the 1-bit quantizer threshold value (shown over the range 0 to 120, out of 0 to 255) for various input sequences (Foreman, Mobile, Akiyo, Stefan, and Bus CIF). Representatively different test sequences (low/high motion and texture) were selected to cover a wide range of input variation. The results show very low sensitivity of the optimal threshold value to input sequence variation, confirming the high homogeneity of viewpoints towards nearest neighbors on which our proposed algorithm is based.

Fig. 2.23 provides some insight into the sensitivity of the optimal threshold to input variation. Despite the large variation in input source characteristics, the dimension-distances to which quantization is applied exhibit more consistent statistical behavior, leading to the overall robustness of our quantization method.

2.7.2 Vector Quantization for Data Compression

A vector quantizer encodes a multidimensional vector space into a finite set of values from a discrete subspace of lower dimension. A lower-space vector requires less storage, so vector quantization (VQ) is often used for lossy compression of data such as images, video, and audio, or for speech recognition (statistical pattern recognition).

Figure 2.24: Comparison of video compression performance, in the rate-distortion sense, for five different motion estimation scenarios on the Foreman, Mobile, and Akiyo CIF sequences: i) conventional full search with full metric computation (100%); ii) a reduced set of checking points (50%); iii) reduced dimensionality (50%); iv) uniform quantization, i.e., bit truncation (50%); and v) the proposed 1-bit quantization based metric computation (20%). Scenarios ii), iii), and iv) have the same complexity reduction.
Figure 2.25: Comparison of the complexity-performance trade-offs of four scenarios (shown for the Foreman, Stefan, and Bus CIF sequences), which respectively reduce: i) the size of the data set R; ii) the dimensionality of each data point r ∈ R; iii) the bit depth of each data dimension, by truncating least significant bits (equivalent to uniform quantization of each data dimension); and iv) the resolution of each dimension-distance (the proposed distance-quantization based metric computation). The x-axis represents complexity as a percentage of the original full computation; the y-axis represents the RD performance loss measured in dB, so zero means no performance degradation.

VQ maps D-dimensional vectors in the vector space R^D into a finite set of vectors Y = {y_i : i = 1, 2, …, N}. Each vector y_i is called a code vector, and the set of all code vectors, Y, is called a codebook. The vector space R^D is partitioned into N code cells (clusters) C = {C_i : i = 1, 2, …, N}, non-overlapping regions where each region C_i, called a Voronoi region, is associated with the codeword y_i:

C_i = {x ∈ R^D : ‖x − y_i‖ ≤ ‖x − y_j‖, ∀j ≠ i}

VQ maps each input vector x ∈ C_i in code cell C_i to the code vector y_i. Fig. 2.26 (a) and (b) illustrate this for a simple representation of a gray-scale image in a 2D vector space, formed by taking the values of adjacent pixels in pairs (a), along with the corresponding set of 512 code cells/Voronoi regions (b).

VQ consists of two processes: designing a codebook and performing the nearest code vector search. Several VQ algorithms exist with different codebook design methods, but all perform an exact nearest neighbor search. In this section, we use the well-known generalized Lloyd algorithm (GLA) to design the codebook and use the QNNM metric to search it.

VQ performs NNS from each input vector (a query) to all code vectors and encodes it with the nearest code vector; the set of all code vectors can thus be seen as the data set of an NNS problem (Fig. 2.26 (b)). Fig. 2.26 (c) and (d) illustrate 2D examples of the distance distribution to all code vectors, F_V, and of the distance distribution to the nearest code vector, F_NN, respectively. Fig. 2.27 and Fig. 2.28 show 1-bit and 2-bit QNNM performance when applied to the codebook search process of VQ based image coding. Fig. 2.27 compares four different VQ settings: (i) a VQ codebook obtained by the generalized Lloyd algorithm (GLA) with the ℓ2 distance search metric (standard VQ); (ii) tree-structured vector quantization (TSVQ); (iii) a GLA based VQ codebook searched with 1-bit QNNM; and (iv) a GLA based VQ codebook searched with 2-bit QNNM. Performance was compared for dimensionalities 8, 16, and 32, and for codebook sizes 32, 64, 128, 256, 512, and 1024.

Figure 2.26: 2D statistical distributions of vector quantization data. (a) The distribution of the input training image data used to generate a codebook. (b) The distribution of the set of code vectors (a codebook of size 512) over which NNS is performed for every query vector to find its closest code vector (F in metric space). (c) The distribution of the difference/distance from a query vector to every code vector (F_V in viewpoint space). (d) The distribution of the distance from a query vector to its NN/best matching codeword (F_NN in viewpoint space).
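A compact sketch of the encoder side (our own Python, not the experimental code of this thesis): GLA-style codebook design by alternating nearest-codeword assignment and centroid updates, with an exact ℓ2 nearest-codeword search in which a QNNM cost such as the one sketched earlier could be substituted to cut the metric complexity:

```python
import numpy as np

def nearest_codeword(x, Y):
    """Exact NNS over the codebook Y (the l2 Voronoi rule); a quantized QNNM
    cost can replace this exact computation during encoding."""
    return int(np.argmin(((Y - x) ** 2).sum(axis=1)))

def gla(train, N, iters=20, seed=0):
    """Generalized Lloyd algorithm: alternate nearest-codeword assignment and
    centroid update to design an N-codeword codebook from training vectors."""
    rng = np.random.default_rng(seed)
    Y = train[rng.choice(len(train), N, replace=False)].astype(float)
    for _ in range(iters):
        idx = np.array([nearest_codeword(x, Y) for x in train])
        for i in range(N):
            if np.any(idx == i):
                Y[i] = train[idx == i].mean(axis=0)   # centroid of Voronoi cell
    return Y

# toy usage: 4-dimensional training vectors, 32-codeword codebook
train = np.random.default_rng(1).normal(128, 40, size=(2000, 4))
Y = gla(train, N=32)
codes = [nearest_codeword(x, Y) for x in train[:5]]
```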
Figure 2.27: Rate-distortion curves of VQ based image coding performance for 4 gray-scale 512×512 test images (Elaine, Zelda, Baboon, and Goldhill). Four VQ scenarios are compared: (i) a generalized Lloyd algorithm (GLA) based VQ codebook with the ℓ2 distance (standard VQ); (ii) tree-structured vector quantization (TSVQ); (iii),(iv) a GLA based VQ codebook with 1-bit QNNM and with 2-bit QNNM. Performance is compared for dimensionalities 8, 16, and 32 and for codebook sizes 32, 64, 128, 256, 512, and 1024. 2-bit QNNM outperforms the TSVQ method.

Figure 2.28: The average performance loss introduced by QNNM (averaged over codebook sizes and test images, in PSNR) decreases as the number of quantization levels increases. The right axis represents the average number of bits per dimension/pixel allowed for QNNM; three vector sizes/dimensionalities (8, 16, and 32) are shown. This result, together with that of the ME application, supports our earlier performance analysis that QNNM performance improves with dimensionality.

Fig. 2.28 shows how the average performance of QNNM used for codebook search changes with the dimensionality and the average number of quantization levels per pixel. Both Fig. 2.27 and Fig. 2.28 numerically support our analysis of Section 2.6.2 showing that QNNM performance improves with dimensionality.

As in the ME application, the input distribution to VQ, the distribution of the codebook F, and the distance distribution to the code vectors F_V all share similar statistical characteristics of high correlation and identical marginal distributions, as shown in Fig. 2.26. This allows QNNM to exploit the correlation information to improve its performance further.

Figure 2.29: While VQ input data is identically distributed on each dimension or pixel, the optimal Q thresholds are not identical across dimensions. (a) and (b) show 2D examples with (a) an identical threshold and (b) a different threshold for each dimension. The difference between the optimal Q thresholds of different dimensions grows, and performance improves, as the correlation between dimensions becomes stronger. (c) shows how much the performance of VQ based image coding (Goldhill with D = 16) improves when the Q thresholds are optimized to exploit the correlation information between dimensions. The same holds for the ME application and others involving data sets with high correlation.
Fig. 2.29 (a) and (b) illustrate example quantizers using an identical quantizer for all pixels (a) and different quantizers for different pixels (b), and show why (b) partitions the viewpoint space better for approximating the NN. [Footnote 10: The threshold values shown in Fig. 2.29 (a),(b) do not represent the actual optimal threshold values, which are much smaller than those illustrated; the actual optimal thresholds are too small to be visible on the graphs.] Fig. 2.29 (c) shows how much performance can be improved by exploiting the correlation information: the thin QNNM curves represent QNNM with an identical quantizer for all pixels, while the thick curves represent optimized quantizers that take the statistical correlation information into account.

In general, QNNM applied to VQ image coding results in more significant performance degradation than in the ME application, so QNNM is not highly recommended for this VQ application. This is not because of the statistical characteristics of the input data to the VQ system, but because NN search errors in VQ have a direct impact on compression efficiency. VQ for image coding also generally uses small coding blocks (e.g., 2×2 to 4×4). Moreover, unlike in ME, the ℓ2 metric is not an approximation of a more complex optimal metric (e.g., transform, quantization, and entropy coding combined); it is itself optimal both for generating the codebook and for achieving the best compression given the codebook. In other words, even if the NN approximation error ¯ǫ is similar in the ME and VQ applications, its impact on system performance is more tolerable in ME and more severe in VQ. However, if QNNM is used in VQ for audio/voice data compression, which often requires high dimensionality (e.g., D > 500), the NN approximation error ¯ǫ could prove far more tolerable.

2.8 Conclusions and Future Work

This chapter introduced a novel methodology, the quantization based nearest-neighbor-preserving metric approximation algorithm (QNNM), which leads to significant complexity reduction in the search metric computation. The proposed algorithm exploits four observations: (i) the homogeneity of viewpoints property; (ii) the concentration of the extreme value distribution; (iii) an NN-preserving, rather than distance-preserving, criterion; and (iv) the fact that the query information is fixed during the search process. Based on these, QNNM approximates the original/benchmark metric by applying query-dependent non-uniform quantization directly to the data set, designed to minimize the average NNS error while achieving significantly lower complexity, e.g., typically with a 1-bit quantizer. It entails a nonlinear sensitivity to distance, such that finer precision is maintained only where it is needed and important, while regions unlikely to contain nearest neighbors are represented very coarsely. We showed how the optimal query-adaptive quantizers minimizing the NNS error can be designed off-line, without prior knowledge of the query information, to avoid on-line overhead complexity, and we presented an efficient, specifically tailored off-line optimization algorithm for finding such optimal quantizers.
Three distinguishing characteristics of QNNM are its statistical modeling of the data set, its use of quantization within a metric, and its query-adaptive metric, all of which allow QNNM to improve the performance-complexity trade-off significantly and to provide robust results even when the problem involves a non-predefined or widely varying data set or metric function. For the motion estimation application, QNNM with the coarsest, 1-bit quantizer per pixel (the quantizer differs from pixel to pixel in order to exploit the correlation of the input distribution) results in an average 0.01dB performance loss while eliminating 70% to 98% of the metric computation cost.

Our proposed approach can also be used for k-nearest neighbor search or orthogonal range search. To find the exact nearest neighbor, it can serve as a preliminary filtering step that discards unlikely candidates, with a refinement process then performed on the remaining set. The method can be applied on its own, but also in parallel with most existing preprocessing based algorithms.

More work is needed on the specific logic/circuit level design of an efficient hardware implementation of the quantizer. It would also be interesting to see how this method can be incorporated into parallel and distributed index structures. Furthermore, if possible, our proposed work could be generalized to handle more general metric functions, such as quadratic metrics, relaxing the constraint we imposed in this thesis on the metric structure (no cross-interference among dimensions in the metric function).

The concept behind our approach extends to numerous problems involving a search process, such as combinations of queries, batch queries, classification problems, and other proximity problems. And beyond similarity search, the potential benefit of the error tolerance concept can be reaped in a wide variety of application areas.

Chapter 3

Fault Effect Modeling for Nearest Neighbor Search Problem

In this chapter, we investigate the impact of hardware faults on the NNS problem, with a focus on the NNS metric computation process. More specifically, we provide an analytical formulation of the impact of single and multiple stuck-at faults within the NNS metric computation. We further present a model for estimating the system-level performance degradation due to such faults, which can be used in an error tolerance based decision strategy for accepting a given faulty chip. We also show how different faults and NN search algorithms compare in terms of error tolerance, and we define the characteristics of search algorithms that lead to increased error tolerance. Finally, we show that different hardware architectures performing the same metric computation have different error tolerance characteristics, and we present the optimal hardware architecture for NNS metric computation in terms of error tolerance. Our work can also be applied to systems (e.g., classifiers, matching pursuits, vector quantization) in which a selection is made among several alternatives (e.g., class label, basis function, quantization codeword) based on which choice minimizes an additive metric of interest.

3.1 Introduction

The error tolerance (ET) approach raises the threshold of the conventional perfect/imperfect classification to acceptable/unacceptable by analyzing the system-level impact of faults and accepting minor defects which result in slightly degraded performance within some application-specific range of acceptability.
However, a critical factor for this ap- proach to be successful is to be able to cost-efficiently test and accurately predict if a defective chip will provide acceptable system-level performance without having to perform application level testing. In previous work, the impact of hardware defects that lead to faults at circuit interconnects and soft errors produced by voltage scaling have been studied. Hard- ware faults such as those arising in a typical fabrication process can potentially lead to “hard” errors, since some of the functionality in the design is permanently impaired, whereas“soft”errorsmayarisewhenacircuitoperatesatavoltagelower than specified for the system. Previous work [17,18] showed that certain range of hardware defects within the motion estimation (ME) and discrete cosine transform (DCT) with quantization subsystems lead to acceptable quality degradation. It also proposed a novel ET based testing strategy for such systems. Other recent work [9] showed that the ME process exhibits significant error tolerance in both hard and soft errors and further proposed simple error models to provide insights into what features in NNS algorithms lead to increased error tolerance. In this chapter, based on the same ET concept, we provide an analytical for- mulation of the impact of multiple hardware faults on NNS metric computation process. This provides estimates of system-level performance degradation due to such faults, which can be used in deciding whether to accept a given faulty chip. Furthermore,basedonthismodel,weinvestigatetheerrortolerancebehaviorofthe 92 NNS process in the presence of multiple hardware faults from both an algorithmic and a hardware architecture point of view. In comparing different NNS algorithms we observe that high performance, low complexity search algorithms can in fact more error tolerant than other methods. As an example, in our experiments, en- hancedpredictivezonalsearch(EPZS)[14]algorithmformotionestimationprocess exhibitsminimumdegradationwithrespecttofullsearch(FS)inthefault-freecase (e.g.,0.01dB loss)butcanperformsignificantlybetterinthepresenceoffaults(e.g., upto2.5dB gaincomparedtoafaultyFSinsomecases). Whencomparingdifferent hardware architectures to perform the same metric computation, we also observe significant variations in error tolerance. We show that the optimal hardware ar- chitecture for NNS metric computation in terms of error tolerance is a perfectly balanced binary tree structure, which is also effective in terms of enabling parallel computation. Our simulations with motion estimation (ME) for video coding ap- plication show that, if the optimal structure is used, the expected error due to a fault can be reduced by up to 95%, as compared to other architectures, and that more than 99.2% of fault locations within metric computation circuits result in less than 0.01dB performance degradation. Most previous research has relied on the single stuck-at (SA) fault assumption, which has been well-studied due to its simplicity and high fault coverage and has worked fairly well in practice. However, with decreasing feature sizes and increas- ingly aggressive design styles, single SA fault has become a rather restrictive model since it allows the defect to influence only one net, while defects in modern devices tend to cluster and affect multiple lines in the failing chip. Moreover, recent ex- periments [27] confirm that more than 40% of defects found in failing chips cannot be diagnosed using the single SA fault model. 
Multiple SA fault model on the other hand covers a greater percentage of physical defects, and can also be used 93 Input Output Error: 0 1 0 0 0 Single Stuck At 0 (SSA0, p-th busline) Input Output Error: 2 p 1 0 1 0 Input Output 1 0 0 1 Single Stuck At 1 (SSA1, p-th busline) Input Output Error: 0 1 0 1 1 Error: 2 p Figure 3.1: Simple illustration of single stuck at fault model. to model defects of certain different fault types such as multiple bridging faults or multipletransitionfaults. However,themultipleSAfaultcasehasnotbeenstudied extensively due to the issues of complexity and functional error inter-dependency characteristics. In this chapter, we present an analytical model of the system-level fault effect for any arbitrary multiple SA faults in a metric computation circuit and study error tolerance properties of the NNS process at both algorithmic and hardware architecture level based on multiple SA fault assumption. The rest of this chapter is organized as follows. In Section 3.2, we begin with a brief description of several hardware architecture for metric computation pro- cess and SA fault model. In Section 3.3, we discuss appropriate measures for the impact of faulty hardware and provide simulation results using ME application. In Section 3.4, we formulate multiple SA fault effect on both metric computation level and matching process or search level. In Section 3.5, we analyze and define the characteristics of NNS algorithms that lead to increased error tolerance. In Section 3.6, we provide analysis on NNS metric hardware architecture pertaining to the error tolerance and show the optimal structure is perfectly balanced tree 94 structure. Furthermore, we present experimental results and discussion on actual ETbaseddecisionexampleformotionestimationprocessapplicationinthecontext of H.264/AVC, comparing different hardware architectures and search algorithms. Section 3.7 summarizes our conclusions and main results. 3.2 NNSMetricComputationArchitecture&SA Fault Model There are several types of hardware implementation architectures [42] for comput- ing a NNS metric, with different levels of parallelism. We will refer to them as matching metric computation (MMC) architectures. Figure 3.2 illustrates a few examples of MMC architectures [42] which can be viewed as arrays of cascaded adders and represented as binary tree graphs, where each inner node represents an adder, an edge connecting two inner nodes represents a data bus, and a leaf node corresponds to the processing element (PE) that computes distance for each dimension (e.g., the pixel-level prediction error for ME application). Figure 3.2 and tree structured model of MMC architectures will be revisited in more detail when we discuss the optimal structure of MMC architecture in Section 3.6. Note that our fault effect model in Section 3.4 and analysis on the search algorithms in Section 3.5 are independent of the MMC architecture and valid for any hardware implementation structure used. Throughout this chapter we consider only faults in the interconnect data bus that affect the data transfer between PEs within a metric computation hardware architecture. 
These interconnect faults are modeled with the stuck-at (SA) fault model, a well-known structural fault model that assumes that the design contains 95 AD AD AD AD AD AD AD AD AD AD AD AD AD AD AD AD AD AD AD AD AD AD AD AD AD AD + + + + + + + + + + + + + M + M M M N N N N type-1 type-4 type-3 type-2 AD AD AD AD AD AD AD AD AD AD AD AD AD AD AD AD N type-2 (x) (y) (z) AD AD AD AD AD AD AD AD AD AD AD AD AD AD AD Figure 3.2: (Upper) Four examples of MMC architectures for NNS metric compu- tation, represented as a dependence graph, where only processing elements (PEs) are shown for simplicity. AD denotes a processing element which is, for ℓ 1 metric forinstance, anabsolutedifferenceandaddition. Mdenotesaminimumvaluecom- putation. (Lower) Tree structured flow graph corresponding to the type-2 MMC architecture on the left. In this graph, the processing element (leaf nodes, shown here as AD) and addition (inner nodes) computations are separated, thus AD de- notesonlythedistancecomputationforeachdimension(e.g.,absolutedifferencefor ℓ 1 metric). If ℓ 2 metric is used as a cost metric, leaf nodes become PEs computing a square difference. 96 error rate - error significance plot for ME fault 0 10 20 30 40 50 60 70 80 0 50 100 150 200 250 300 error significance ( average extra bits per wrong match MB) error rate (%) 14 13 12 11 15 14 13 12 11 13 14 12 11 10 13 12 11 10 12 13 11 10 13 12 14 10 11 12 11 9 10 Full search ME acceptance curve α=1 α=7/8 α=3/4 α=5/8 α=1/2 α=3/8 Figure 3.3: Example Error Rate and Significance due to SSA fault within MMCA for motion estimation application. a fault that will cause a line in the circuit to behave as if it is permanently stuck at a logic value 0 (stuck-at-0 fault, abbreviated as SSA0) or 1 (stuck-at-1 fault, abbreviated as SSA1). The SA fault model covers 80-90% of the possible manufacturing defects in CMOS circuits [50], such as missing features, source-drain shorts, diffusion con- taminants, and metallization shorts, oxide pinholes, etc. 3.3 QualityMeasureforNearestNeighborSearch with Faulty Hardware Faults in a MMC architecture would imply that it is likely that the selected point r f may not be equal to the nearest neighbor point r ∗ which can be obtained with fault-freeMMCA.WedefineNNSErrorifr f 6=r ∗ duetoafault. Anerrordoesnot occur for all queries but only occurs if certain conditions are met. An error occurs 97 iff, a) if a fault is in the p-th data line, then the input to p must be 0 for r ∗ and 1 for r f , and b) 0<d(r f ,q)−d(r ∗ ,q)≤ 2 p . Therefore, our focus is on how often these errors occur (error rateP e =prob(r f 6=r ∗ )) and how much additional quality degradation is introduced (error significance S e = d(r f ,q)−d(r ∗ ,q), representing the level of inaccuracy of MCP). Error rate and significance depend highly on the fault position with a certain variation due to the input sequence characteristics. Figure 3.3 demonstrates clearly how error rate and significance values change with faults at different positions. Points connected with the same line are faults in the same interconnect data bus with a different data bit line. Points shown in outer linesindicatefaultsinthedatabuswhichhas32moreleafnodesconnectingtowards that data bus than the adjacent inner line. Also note that SSA0 and SSA1 faults at the same positions produce identical results in both error rate and significance. Proof of this and further analysis on this concept of error rate and significance of hardware fault are provided in [10]. 
Wefurtherdiscussfaulterrormeasuresusingmorespecificexampleapplication, comparing DCT process and ME process for video coding. While faults introduced within the DCT block within video coding tend to have a direct impact on visual quality degradation and the type of artifacts introduced, faults on ME metric have a more indirect impact on overall quality by impairing the accuracy of the ME engine. Thus, degradation of motion compensated (MC) prediction signal and its increased residual energy will be compensated with the rate penalty with no significant visual quality loss if Rate Control (RC) is not employed. However, for most cases, there are constraints imposed on the bit rate of the system where a certain local average bit rate needs to be maintained over time. Considering that most modern video encoders use RC schemes, additional bits required by increased residual of a block/frame, given certain bit-rate constraint, will be compensated by 98 TQV Increment due to Faults 0% 10% 20% 30% 40% 50% 60% 70% 0 1000 3000 5000 7000 Bitrate (kbps) Variation increment RD performance in the presence of faults 32 34 36 38 40 42 44 46 1000 3000 5000 7000 Bitrate (kbps) PSNR (dB) fault free fault A fault B fault C fault D fault free fault A fault B fault C fault D Figure 3.4: (Left) Rate-Distortion impact of faults. (Right) Temporal Quality Variations due to Faults. using fewer bits for other blocks/frames. For example, assuming that a block level RC model is used, the increase in bit-rate due to faulty ME could imply increase in thequantizationscalechosentoencodethesubsequentblocks. Inthiscase,artifacts would appear in the form of standard quantization artifacts, rather than, as was the case for DCT, error-specific artifacts. Therefore, the problem of measuring the visualqualitydegradationduetofaultintroductionwithintheMEcanbealsoseen astheproblemofmeasuringtheimpactonpicturequalityofacompressionprocess. Thus, any visual quality metric which captures the video compression impairment reliably can be also applied to this case. For example, most commonly accepted and widely used visual quality metric is MSE, or equivalently PSNR which simply computes distortion energy thus it lacks of correlating well with other important characteristics of human visual sys- tem. However, several experiments with human interference evaluated PSNR as a competitive metric for compression distortion assessment [2]. 99 PSNR Temporal Variation foreman (2Mbps RC) 30 32 34 36 38 40 42 1 31 61 91 121 151 181 211 241 271 frames PSNR (dB) faultfree p=11, a=1 p=13, a=1 Figure 3.5: Rate control scheme distributes the fault impact over time. a SSA fault within the ME metric computation circuit is simulated. Another problem of the average PSNR or MSE of all frames in a video sequence to represent quality degradation is that it can easily underestimate distortion vari- ation throughout the sequence. In reality [2], an end-user will evaluate an entire video sequence based upon the quality variations and minimum quality. Therefore, distortion variation can be separately considered into the spatial variation, a block level quality variation within a frame, and temporal variation, a frame level one throughout the sequence. This distortion variation measure can be particularly meaningful for the study of ME faults since we observed that intro- ducing a SSA fault in ME always increases both the temporal and spatial quality variation on the video output. Note that the level of variation increase depends on the RC scheme. 
Typically RC performs bit allocation by selecting the encoder’s quantization step size (QP) for the residual block/image. Note that most modern implementationsofRC,aconstraintisemployedontheincrement/decrementofthe quantizer,whichresultsindistributingdistortionthroughoutthepicture. Therefore errors occurring with certain rate are subdued and smoothly spread out over the 100 picture after the rate controlled quantization process. Consequently, distinguishing errors into two measures of error rate and significance becomesno longer necessary. Wehaveobservedthatspatialdistortionvariationstendtobeimperceptible,onthe other hand temporal variations can be significant. Spatial variation was measured by computing the variance of Qp values for each frame and by averaging them. To evaluate temporal variations within a video sequence we defined the measureTQV as: TQV = P N−2 i=0 MSE fault i −MSE fault i+1 P N−2 i=0 MSE no fault i −MSE no fault i+1 (3.1) whereMSE fault i andMSE no fault i are the frame MSE values of the decoded images with and without ME faults respectively, and N is the total number of frames considered. InFigure3.4(left)wepresenttheRateDistortionperformancefortheForeman sequence, at CIF resolution, in the presence of ME faults within an MPEG-2 en- coder. Similarly,the Temporal Quality Variation for the same faults are presented in Figure 3.4 (Right). Since error rate and significance measurements are no longer useful after RC Quantization for ME fault case, PSNR with additional measure of temporal quality variation would be able to represent well the quality degradation introduced by ME fault. However, we observed that temporal variation increase is relatively small compared to PSNR change and roughly proportional to the PSNR degradation. ThusitiswellcapturedbyPSNRmeasure. Figures3.4(Left)and3.4 (Right) illustrate well this point. Large quality variations in Figure 3.4 (Right) are also related to a rather significant drop in PSNR in Figure 3.4 (Left). Therefore, for the evaluation of ME faults, PSNR can still capture most of quality impairment sufficiently although temporal variation metric could potentially improve the accu- racy of visual quality assessment. In Figure 3.3, a PSNR based quality threshold 101 T (e.g. T = 0.1dB) was considered to classify faults. This threshold essentially de- finesanacceptancecurve,accordingtowhichfaultsbelowthiscurveareconsidered as acceptable, while faults above are considered as unacceptable. 3.4 Multiple Fault Effect Modeling In this section we present a complete analysis of the effect of interconnect faults in NNS process. We start by providing a model to capture the effect of any fault (or combination of multiple faults) on NNS metric. We then describe how these errors in metric computation lead to errors in the matching/searching process. Our motivation is twofold. First, we will use this analysis to predict the behavior of NNS algorithms and MMC architectures in the presence of faults (see Sections 3.5 and 3.6, respectively). Second, this analysis is a required step towards developing comprehensive testing techniques to identify acceptable error behavior. 
In particu- lar,itwillprovidetoolstotacklemultiplefaultscenarios,bydiscardingspecificsets of faults (e.g., when one of the multiple faults is by itself unacceptable), establish- ing equivalence classes across faults (e.g., cases when dependent and independent faults may have the same impact), or determining how the parameters of the faults involved affect overall behavior (e.g., multiple faults with similar parameters may lead to relatively worse errors than sets of faults with differing characteristics). FaultsinaMMCarchitecturecanleadtoerrorsinthecostmetriccomputation. We will refer to this as a metric computation (MC) error and denote its magnitude as Δ(D,{f i }) = ˆ D −D where ˆ D and D denote the computed costs with and without MC error and f i denotes a fault. In general Δ(D,{f i }) is a function of the input sequence and the characteristics of the faults (e.g., their types, locations, and their dependencies). However, to simplify the notation, from here on we omit 102 {f i }, a set of parameter vectors representing each fault’s characteristics. We will refer to this function Δ(D) as the MC error function. Note that MC errors do not necessarily lead to matching process (MP) errors. We say a block suffers a MP error if r f 6= r ∗ where r f and r ∗ represent the selected object/point and minimum object/point from given data set R, respectively . Since MP errors do not occur for all queries, we also define the MP error rate (P E = prob(r f 6= r ∗ )) which represents how often these MP errors occur, and the MP error significance (S E = d(r f ,q)−d(r ∗ ,q), where d(r f ,q) and d(r ∗ ,q) are the computed distance from a query point to r f and r ∗ , respectively). 3.4.1 Binary Adder Tree and Fault Characterization Each SA fault in any MMC architecture can be fully characterized by three at- tributes, a) the fault type t∈{0,1} (Stuck-At-0/SA0 or Stuck-At-1/SA1), b) the bit position of the faulty data linep∈{0,··· ,P} where 0 and P correspond to the LSBandMSBoffaultydatabusrespectively, dependingontheMMCarchitecture, and c) the position of that data bus. The last attribute can be parameterized as a ratio α ∈ (0,1] of the number of leaves in the subtree rooted at a faulty edge (data bus) to that of the entire tree. For example in Figure 3.2 (Lower), if a fault is positioned at (x), its corresponding α, say α (x) is 1. Similarly, α (y) = 1/2 and α (z) = 1/8andsoforth. Thereforeeachfaulticanberepresentedasf i = (t i ,p i ,α i ), wheret i ∈{0,1},p i ∈{0,··· ,P}andα i ∈ (0,1]. Notethatthesefaultparameters completely capture the fault effect, so that two faults with the same parameters in two different architectures have, on average, the same effect. 103 3.4.2 Multiple Fault Effect Modeling for Nearest Neighbor Search Metric Computation Consider an interconnect fault i affecting a data bus. If the input to the data bus isx then we can represent the effect of the fault as a functionh i (x), ˆ x−x, where ˆ x is the output of the faulty data bus. For fault parameters p i and t i the shift induced by SA fault i can then be described as h i (x) = ˆ x−x = (−1) t i +1 ·2 p i ·I w i t i (x), where w i 0 = [ k∈2N 0 +1 [k·2 p i ,(k+1)·2 p i ) and w i 1 = [ k∈2N 0 [k·2 p i ,(k+1)·2 p i ), whereN 0 denotes the set of non-negative integers. Note thath i (x) is a function of x, i.e., an intermediate term in the computation of the final matching metric cost D. Thus, modeling the final observed metric computation error would in principle require modeling the distribution of intermediate values x for a given D. 
Instead, as a simplification, we assume that, on average and for a given α i and D, we will tend to have x ∼ = αD, i.e., the intermediate metric value x at the fault position is proportional to the number of pixel-errors accumulated up to that position 1 . With this approximation we define the error at the output corresponding to fault i as H i (D) ∼ =h i (αD), which can be written as: H i (D), (−1) t i +1 ·2 p i ·I W i t i (D), (3.2) 1 This would hold if pixel-errors were iid with mean dependent on D. However, based on the generalized central limit theorem even if there exist some weak dependencies between pixel-errors, this still holds true if none of pixel-error variables exert a much larger influence than the others. 104 where W i 0 = [ k∈2N 0 +1 k·2 p i α i , (k+1)·2 p i α i and W i 1 = [ k∈2N 0 k·2 p i α i , (k+1)·2 p i α i H i (D) is a periodic piecewise constant square wave function ofD taking values in the set R i ={0, 2 p i ·(−1) t i +1 } with period T i = 2 p i +1 /α i . If there is a single SAfaultf i = (t i ,p i ,α i )inaMMCarchitecture, thenwiththeaboveapproximation the error at the output is Δ(D) ∼ =H i (D), which leads to D being shifted by ±2 p or remaining unchanged depending on D and f i . However,whenweconsidermultipleSAfaults,Δ(D)cannotbeaccuratelymod- eled unless extra information regarding the dependency relation between faults is also provided. In other words, two sets of faults with identical fault configurations, {f i } M i=1 with f i = (t i ,p i ,α i ) could produce different MC error function Δ(D) de- pending on the dependency relation between faults. Figure 3.7 illustrates simple examples of multiple faults with different dependencies. If faults are placed serially in the same computation path, the output of previous faults along the path could affect the input to the following fault position. We refer to such faults as dependent faults (Figure 3.7 (a)). Note that the relative position of the faults in the architec- ture matters, i.e., the computations closer to the root of the tree are affected by the errors introduced closer to the leaves, but not the other way around. On the other hand, if faults are placed in separate paths such that the output of each fault position is independent from one another, we refer to them as independent faults (Figure 3.7 (b)). Figure 3.7 (c) shows the case of a simple combination of both independent and dependent faults. For each of these three scenarios, Figure 3.7 provides an example of multiple faults locations in the MMC hardware architec- ture represented as a binary tree and a fault dependency graph illustrating their 105 2 P 2 P input output SA0 SA1 + + + + + + + + + + + + + + + dD dD dD dD dD dD dD dD dD dD dD dD dD dD dD dD Df p-th bit, SA fault 2 P 2 P /α SA0 SA1 Df D input output p-th bit SA fault Hi (D) 2 p 2 p /αi SA0 SA1 Figure 3.6: Illustration of single stuck at fault and its impact on output of that faulty interconnect (upper), and on output of final matching metric cost (lower). 106 + + + + + + + + + + + + + + + + + + + + + + + + + + + + f1 f1 f1 f2 f2 f2 f3 f1 f2 f1 f2 f1 f2 f3 Δ(D) = H 1 (D) + H 2 (D) Δ(D) = H 1 (D) H 2 (D) Δ(D) = (H 1 (D) + H 2 (D)) H 3 (D) (a) Dependent faults (b) Independent faults (c) Combined faults Figure3.7: ExamplesofmultipleSAfaultcaseswithdifferentdependencyrelations between faults. (a) two serially placed faults in the same path of circuit, (b) two parallel faults placed in separate paths, and (c) simple combination of (a) and (b) cases. 
These relations are illustrated more clearly in the fault dependency graphs shown below the binary trees. Metric computation (MC) error function Δ(D) can beformulatedviafunctionsof{H i (D)} M i=1 definedin(3.2)usingtwobasicoperators {⊙,+} (function addition and a variant of function composition operator defined in (3.3)), according to the dependency relation of a given set of faults as shown above, where D denotes the final cost value (e.g., SAD). 107 dependencyrelation. Similarly,anydependencyrelationofanarbitrarysetoffaults S ={f i } M i=1 in a MMC architecture, where f i = (t i ,p i ,α i ), can be represented as a combination of parallel and serial cascade of faults. In order to formulate Δ(D) for any given set of faults with arbitrary fault dependency relations, we first introduce (F, +,⊙), an algebraic structure where the elements ofF are periodic piecewise constant functions which are closed under two operations {+,⊙}. The operator + denotes linear function addition and ⊙ denotes a function composition operator defined as f(x)⊙g(x),g(f(x)+x)+f(x) (3.3) This operator⊙ combinesf(x) andg(x), where the input tog(x) depends on the output of f(x). These two operators {+,⊙} essentially combine two MC error functions for two different sets of faults into one function providing the total cost shift incurred by both sets of faults. If two sets of faults are independent, the overall error function is the sum of the two error functions corresponding to each set of faults, thus the + operator is used. Similarly, the ⊙ operator is applied when two sets of faults are dependent. Figure 3.7 provides simple examples of multiple faults and their corresponding Δ(D) representation using two operators {+,⊙}. Figure3.8illustratesmoregraphicallyforeachofthethreecasesdiscussed in Figure 3.7 how Δ(D) function is linked to the dependency relation of multiple faults. WhentwosetsoffaultsareindependentasinFigure3.8(b), Δ(D)function becomes linearly additive. On the other hand, when they are dependent, as in Figure 3.8 (a)(c), the error function will have to be computed by using the ⊙ operator. 108 -2 p3 D D H 1(D)+H 2(D) (c) Combined faults: f 1 , f 2 and f 3 Δ D 2 p3 /α 3 3·2 p3 /α 3 5·2 p3 /α 3 H 3(D) = H 3 (H 1 (D) + H 2 (D) + D) + H 1 (D) + H 2 (D) Δ(D) = (H 1 (D) + H 2 (D)) H 3 (D) 2 p1 2 p2 Δ 2 p1 +2 p2 2 p1 2 p2 D 2 p1 /α 1 3·2 p1 /α 1 5·2 p1 /α 1 D 2 p2 /α 2 3·2 p2 /α 2 (b) Independent faults: f 1 and f 2 Δ(D) = H 1 (D) + H 2 (D) D H 1(D) H 2(D) D H 1 (D) D Δ D 2 p1 2 p2 2 p1 +2 p2 2 p1 2 p1 /α 1 3·2 p1 /α 1 5·2 p1 /α 1 2 p2 /α 2 3·2 p2 /α 2 2 p2 (a) Dependent faults: f 1 and f 2 2·2 p2 /α 2 2 p1 H 2 (D) = H 2 (H 1 (D) + D) + H 1 (D) Δ(D) = H 1 (D) H 2 (D) Figure 3.8: Δ(D) computation for each of the three cases described in Figure 3.7. Faultsf 1 andf 2 areSA1faultswithparameters{1,p 1 ,α 1 },{1,p 2 ,α 2 }respectively and f 3 is SA0 fault with {0,p 3 ,α 3 }. (a) a fault f 1 affects the input to the next fault position f 2 . Thus, the total cost shift at D (MC error function Δ(D)) is the summation of the cost shift due to f 1 at D, (H 1 (D)) and that of f 2 at shifted D by H 1 (D), which is (H 2 (H 1 (D)+D)). (b)Δ(D) is the simple linear summation of H 1 (D) and H 2 (D). (c) only dependent relation between faults {f 1 ,f 2 } and f 3 is depicted which is essentially the same process as (a). 
109 More formally, the MC error function Δ(D) of an arbitrary set of faults S = {f i } M i=1 in any MMC architecture, wheref i = (t i ,p i ,α i ), with any dependency rela- tion can be estimated using operators{+,⊙} on the set of functions{H i (D)} M i=1 resultingalsoinaperiodicpiecewiseconstantfunctionofcostD,taking2 M different possible values in a finite set R S with period T S , where 2 R S = ( X i∈A 2 p i ·(−1) t i +1 ) A∈P({1,···,M}) T S =lcm T i = 2 p i +1 α i M i=1 (3.4) Since the ⊙ operator is non-commutative and non-distributive over addition, order is important in computing Δ(D). Based on its fault dependency graph (examples are shown in Figure 3.7 which can be also represented as a tree in which each node i correspond to a fault i), Δ(D) need to be updated iteratively at each node i of the tree from the leaves towards the root by combining each H i (D). The combination process at each node i is performed using + and ⊙ operators to combine siblings ({Δ j (D)} j∈J i ) and children ( P j∈J i Δ j (D)) with parent (H i (D)), respectively, where Δ i (D) represents the updated Δ(D) for a subtree rooted at node i andJ i is a set of all child nodes of node i. Δ i (D) = ( X j∈J i Δ j (D))⊙H i (D) Therefore for a given arbitrary set of multiple SA faults, we can obtain MC error function Δ(D) which indicates whether there is a MC error (Δ6= 0) and how much shift occurred due to faults (Δ = ˆ D−D) for any given final cost value D. Note that once we obtain MC error function (Δ(D)), the dependency relation does not have to be reconsidered afterwards. 2 P (·) and lcm(·) denote the power set and the least common multiple, respectively. 110 dD dD dD dD dD dD dD dD dD dD dD dD dD dD dD dD f2 f1 f3 f4 Df =: df (p, q) D 1 2 3 4 Figure3.9: SimpleexampleofMMCcircuitwithmultiplefaultsanditscorrespond- ing MC error function. τ P SAD D Δ(D) + D D P SAD minSAD minSAD Figure 3.10: Analysis of matching process error due to multiple SA faults. 111 D P SAD1 P SAD2 minSAD I f1 ΔSAD 1 ΔSAD2 Δ(D) 2 p 2·2 p /α 3·2 p /α 2 p /α I f2 minSAD τ 1 =ΔSAD 1 τ 2 = 2 p D P SAD1 P SAD2 minSAD ΔSAD 1 ΔSAD2 Δ(D) 2 p 2·2 p /α 3·2 p /α 2 p /α I f2 minSAD τ 1 =ΔSAD1 τ 2 = 2 p D P SAD1 P SAD2 minSAD ΔSAD 1 ΔSAD 2 Δ(D) 2 p 2·2 p /α 3·2 p /α 2 p /α τ 1 = τ 2 = 0 Figure 3.11: A comparison of two different sets of MV candidates with respect to the MP error due to a single SA fault. 112 3.4.3 MultipleFaultsEffectModelingforMatchingProcess SAfaultsalterthecomputedcostvalues ˆ D =D+Δ(D)accordingtotheMC error function which we have analytically formulated in the previous section. However, this does not necessarily lead to matching process (MP) errors, which is the case where the selected candidate r f based on the altered cost value ˆ D is not equal to the bestr ∗ . AMP error occurs if and only if,D(r f )>D(r ∗ ) and ˆ D(r f )< ˆ D(r ∗ ). In this section, based on the MC error function Δ(D) we model the MP error with two assumptions: i) minimum distance follows a distribution P minSAD which hasdifferentcharacteristicsfordifferentclassesofvideosequences(e.g., lowmotion vs. highmotionvideo),andii)thesetofcandidatesN tobetestedforthematching process can be modeled as N iid samples drawn from a distribution P SAD . Based on these two assumptions, for a given Δ(D), MP error rate P E and the expected MP error significance ¯ E are formulated as follows. 
SA faults shift all computed costs by Δ(D) which in turn alterP SAD into ˆ P SAD where ˆ P SAD ( ˆ D) = X ∀D:D+Δ(D)= ˆ D P SAD (D) (3.5) Figure 3.10 (left) illustrates an example of how a givenP SAD is mapped into ˆ P SAD using Δ(D) function. MP error occurs if the shifted minimum SAD (min [ SAD) is not the minimum of ˆ P SAD in ˆ D domain and if there exists at least one candidate that falls within the shaded region shown in Figure 3.10. We refer to this shaded area as error region which essentially indicates the range of SAD values satisfying SAD > minSAD and [ SAD < min [ SAD conditions and therefore determined by 113 Δ(D). This error region is bounded by error region bounding interval τ which is a function of minSAD value and defined as τ (D), Δ(D)−minR S (3.6) where minR S denotes the minimum of a set R S defined in (3.4). We denote ˆ F SAD as a cumulative distribution function of ˆ P SAD and define a new function ξ N (D), (1− ˆ F SAD ( ˆ D)) N thatgivestheprobabilitythatshiftedcostvaluesofallNcandidates fall higher than ˆ D. Then we can represent MP error rate for a given minimum distance value (denoted as D min ) as P E|D min = 1−ξ N (D min ). Therefore, P E = X D P minSAD (D)·P E|D = X D P minSAD (D)·(1−ξ N (D)) (3.7) Similarly, expected value of MP error significance for a given D min can be repre- sented as E D min (S E ) = X d∈X D min (d−D min )·ξ N−1 (d)·P SAD (d) whereX D min = n d|D min <d<D min +τ (d), ˆ d< ˆ D min o isasetofdcorrespond- ing to the blue shaded area of P SAD (d) in Figure 3.10 (Left). Similarly, the expected MP error can be expressed as ¯ E =E(S E ) = X D P minSAD (D)·E D (S E ) ¯ E =E(S E ) = X D P minSAD (D) X d∈X D (d−D)·ξ N−1 (d)·P SAD (d) (3.8) 114 0 10 20 30 40 50 60 70 80 90 0 200 400 600 800 1000 1200 1400 0 50 100 150 200 250 300 350 400 450 500 0 200 400 600 800 1000 1200 1400 matching process error rate (%) SAD SAD error significance (avr. SAD diff) error rate comparison for different N and SAD expected error comparison for different N and SAD 2 10 2 10 p = 8 p = 9 p = 10 p = 10 p = 9 p = 8 N = 5 N = 5 N = 10 N = 10 N = 20 N = 20 N = 40 N = 40 Figure 3.12: Effect of ΔSAD, N and fault location parameter p with respect to their impacts on the error rate P E and expected error significance ¯ E based on our SA fault effect model (3.7),(3.8) andP minSAD (D) obtained from the CIFforeman sequence. MP error Fault parameters Input sequences NNS algorithms MC error N P SAD ( ) D Δ MP error NNS matching process error. p f ≠ p* MC error Metric computation error. df(p, q) ≠ d(p, q) HW faults on NNS metric Ē Figure 3.13: Illustrates that input sequence, fault parameters, and search algo- rithms all affect the matching process performance. Hardware architecture on the other hand affect the distribution of potential fault locations. Error tolerant search algorithm can reduce the impact of a given fault. Error tolerant MMC architecture includes smaller percentage of critical/unacceptable fault locations. 
115 33 34 35 36 37 38 39 40 41 42 0 500 1000 1500 2000 2500 3000 FS fault free FS p =11, α = 0.75 FS p =13, α = 1 TSS fault free TSS p =11, α = 0.75 TSS p =13, α = 1 EPZS fault free EPZS p =11, α = 0.75 EPZS p =13, α = 1 PSNR (dB) bitrate (kbps) Foreman CIF RD performance in the presence of faults expected error in PSNR (dB) bit line position ‘p’ (LSB~MSB) 0.0 0.5 1.0 1.5 2.0 2.5 3.0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 performance comparison of ME algorithms in the presence of faults EPZS TSS FS α = 1 α = 7/8 SAD FS TSS EPZS 1 N FS SAD FS SAD TSS SAD EPZS N TSS = 41 N EPZS = 5 N FS = 4225 comparison of three ME search algorithms distance distribution (ME SAD distribution) probability EPZS TSS FS Figure 3.14: (top) comparison of three search algorithms FS, TSS, and EPZS with respect to ΔSAD andN. (left) ME algorithm comparison in coding efficiency loss for different SSA faults with parameters, p = 0···15 from LSB to MSB at α = 1 and 7/8. (right) RD impact of faults for different ME search algorithms. 116 error rate - error significance plot for ME fault 0 10 20 30 40 50 60 70 80 0 50 100 150 200 250 300 error significance ( average extra bits per wrong match MB) error rate (%) 14 13 12 11 15 14 13 12 11 13 14 12 11 10 13 12 11 10 12 13 13 12 14 10 11 12 11 10 Full search ME acceptance curve EPZS ME acceptance curve α=1 α=7/8 α=3/4 α=5/8 α=1/2 α=3/8 Figure 3.15: Illustration of different error tolerance level for different search algo- rithms. Impact of SA fault within the metric computation to search performance is measured with error rate and error significance. 3.5 ErrorTolerantAlgorithmDesignAnalysisfor Nearest Neighbor Search In Section 3.4, we introduced a function, Δ(D), to model the distortion in metric computation caused by a given set of multiple SA faults. Based on this model, we modeled the impact of faults on matching process by additionally considering the interaction between Δ(D) and the characteristics of a candidate set in terms of the number N and quality (distribution P SAD ). These attributes can be largely controlled by NNS search algorithm design while Δ(D) is determined solely by fault parameters i.e., locations, types, and dependencies. Therefore, in this section, basedonourpreviousstudiesofMPerrorwedefinethecharacteristicsofthesearch algorithmthatleadtoincreasederrortoleranceandshowhowfaultparametersand 117 design choices made for the search process are related in terms of error tolerance. Note that our interest is not in accurate modeling of NNS search algorithms but in characterizing their average behavior with respect to our parameters of interest that are related to the error tolerance property. The multiple faults effect model for the matching process in terms of P E and ¯ E in (3.6), (3.7) shows that error robustness depends on both the number N and quality P SAD of the candidates tested by NNS algorithm. Note that both P E and ¯ E are a function of ξ N (d) = (1− ˆ F SAD ( ˆ d)) N , which is the only term associated with N. Since ξ N (d) decays exponentially as a function of N, for large enough N the difference in performance between different N values is negligible. Thus, in practice, for values of N encountered in most practical algorithms (e.g., N > 10) differences in overall error behavior are mostly determined by differences in quality of the candidates, rather than their number. 
As to the quality of the candidates, P SAD in our model, we approximate it by the range of P SAD distribution (a coverage of SAD values) defined as ΔSAD = maxSAD−minSAD based on two reasons: i) Although it varies from block to block, our experiments show that average distribution given the same ΔSAD ex- hibits close to uniform distribution, and ii) error region bounding interval τ defined in (3.8) over which both P E and ¯ E expectation computations are performed is bounded by ΔSAD. In other words, both P E and ¯ E can be directly controlled by ΔSAD. Therefore, we approximate the quality of the candidates with one param- eter ΔSAD in this section for the purpose of studying error tolerance property of NNS algorithms since it generally captures the main feature of P SAD that is most related to the P E and ¯ E. 118 For given M multiple faults, there can be 2 M different τ values: τ i ∈T ={τ|τ =r−minR S ,r∈R S } If ΔSAD > τ i ∀i, both P E and ¯ E are determined dominantly by Δ(D) function but not by ΔSAD. In other words, if a set of faults has all small p i which decides R S and consequently T, the choice of NNS algorithm will not change the impact of faults and the impact itself of such faults is generally already insignificant due to their small p i . Figure 3.14 (left) shows that three NNS algorithms for motion estimationapplicationwithdifferentΔSADsresultinsimilar ¯ E forsmallp. Onthe other hand, if ΔSAD < τ i ∃i, then τ i = ΔSAD ∀i ∈{i|τ i > ΔSAD}, resulting the error region bounding interval τ i for P E and ¯ E computation to be bounded by ΔSAD for some region ofD. Therefore, the choice of NNS algorithm can influence the impact of faults considerably. If ΔSAD<τ i ,∀i, allτ i are completely bounded by ΔSAD and the error depends primarily on ΔSAD. Figure 3.10 (right) provides a simple comparison of two candidate sets with different ΔSADs in the presence of a single SA fault and show how ΔSAD is linked with the expected error. Figure 3.12 illustrates our conclusions drawn from our model for a single SA fault case. It shows that P E and ¯ E increase almost exponentially with ΔSAD and saturate when ΔSAD reaches τ = 2 p while the impact of N is minimal especially for large N. Therefore, if our goal is to choose an NNS algorithm that reduces the impact of faults, we should choose one such that typical sets of MV candidates are as close as possible to optimal value (i.e., small ΔSAD). In practice, search algorithms satisfying this characteristic also show significant complexity reduction (small N) without having to compromise with the performance. 119 ToevaluatetheimpactofNNSalgorithmonerrortolerance,threerepresentative NNSsearchalgorithms,fullsearch(FS),threestepsearch(TSS)[30],andenhanced predictivezonalsearch(EPZS)[14]aretestedastheyprovidedifferentcombinations of N and ΔSAD parameters. FS exhaustively searches all candidates within the search window, and thus has the largest number of candidatesN FS and SAD range ΔSAD FS . TSS successively evaluates sparsely distributed candidates and tries to followthedirectionofminimumdistortiontolocatethesmallestSAD.Althoughthe number of candidates N TSS is small, its SAD range ΔSAD TSS remains relatively large. On the other hand, EPZS, a state of the art ME algorithm, considers a combination of optimized predictors and refinement process to locate the minimum distortion location. Unlike TSS, both the number of candidates N EPZS and SAD range ΔSAD EPZS tend to be quite small. 
As an example, when a search window of±32isused,N FS = 4225,N TSS = 41, andN EPZS = 8.8onaverageforForeman CIF sequence. For the same sequence, we have ΔSAD FS = 1.5× ΔSAD TSS = 9.75× ΔSAD EPZS on average. These properties are illustrated in Figure 3.14 (top), which shows the distribution of the SADs for all candidates of a given block, sorted by magnitude. Withthesethreealgorithms, varioussequencesweretestedwithaseriesoffault parameters using a H.264 /MPEG-4 AVC baseline encoder. Only 16× 16 block partitions, a single reference, and only integer-pel search were used for ME. Note that all experimental results presented from this point forward were performed under these same constraints. Figure 3.14 (left) provides the comparison of three algorithms in terms of PSNR degradation 3 due to a single SA fault with different parameters α and p while Figure 3.14 (right) depicts their RD performance. The 3 BDPSNR (Bjontegaard Delta PSNR) [4] was used to calculate average PSNR difference be- tween RD curves. 120 CIF resolution Foreman sequence was used for both graphs and other sequences tested showed similar results. Our experimental results are consistent with our earlierconclusionsthatthefaultlocationparameterspandαmainlydeterminesthe fault impact on NNS performance while for a given fault location, NNS algorithm operating on smaller ΔSAD reduces the fault impact. 3.6 Error Tolerant Architecture Design Analysis InadditiontoanyerrortoleranceprovidedbyaNNSalgorithm,thedifferentMMC architectures can also significantly influence the degree of error tolerance. For example, ¯ E can be reduced by more than 95% if a type-3 MMC architecture is used instead of type-1 (shown in Figure 3.2), when 16×16 block size is used for ME. In this section we show that MMC architecture with a perfectly balanced binary tree 4 structure (type-3) provides the highest error tolerance to SA faults. AsshownpreviouslyinFigure3.2,thereareseveraltypesofMMCarchitectures with different levels of parallelism, which can be grouped into either whole-block based (type-1,2,3) and sub-block based (type-4) structures. While whole-block based architectures allow more parallelized form, sub-block based ones reuse the architecture multiples times to perform whole block error computation. These two types of structures provide different error tolerance behavior and will be discussed separately. From here on, we omit the term “whole-block based” unless required for clarity. Each MMC architecture can be uniquely represented as a binary tree with a given number of leaves N, inner nodes N −1, and edges 2N −2, where N cor- responds to the number of pixels of the motion compensation block. Each edge 4 a full binary tree in which all leaves are at depth n or n-1 for some n. 
121 α α expected error in S E the average depth of the tree type-1 expected error in PSNR the average depth of the tree type-1 type-3 type-2 type-3 type-2 expected error in PSNR expected error in S E fault effect on ME process for different MMC architectures fault effect on overall coding efficiency for different MMC architectures fault effect on ME process for different α parameters fault effect on overall coding efficiency for different α parameters 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 FS TSS EPZS 0 200 400 600 800 1000 1200 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 FS TSS EPZS FS TSS EPZS FS TSS EPZS Figure 3.16: Average fault effect ¯ E on ME performance (top) and overall coding efficiency (bottom) are shown for different α parameters (left) and for different average binary tree depths (D = N ·E(α)) which describe different MMC archi- tectures (right). Perfectly balanced trees (type-3 in Figure 3.2) show the minimum expected error introduced by a single SA fault. corresponds to each data bus connecting adders in the MMC architecture and it consists of multiple bit lines. Each edge can be parameterized with α and we as- sume that every MMC architecture of our consideration uses the same number of bit lines for data buses which have the same α parameter. Then, every MMC ar- chitecture can be uniquely described by only the distribution ofα parameter. (e.g., type-1structurewillgiveuniformdistributionofαoverallpossibleαvaluewhereas type-3 will give high concentration for lower α and exponentially decreasing as α increases to 1). 122 acceptance decision for different MMCA and ME algorithms acceptance decision threshold in PSNR (dB) acceptance decision threshold in PSNR (dB) 0 20 40 60 80 100 120 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 1.25% 0.5% 0.25% 0.75% 0 200 400 600 800 1000 1200 1400 1600 1800 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 20% 10% 5% 15% # of unacceptable fault positions acceptance decision for different MMCA and ME algorithms type-1 MMC architecture type-2,3 MMC architecture type-2 MMC architecture type-3 MMC architecture FS TSS EPZS Figure 3.17: Experimental results on actual ET based decision example with sin- gle SA fault assumption in the context of H.264/AVC comparing different metric hardware architectures and search algorithms with respect to the percentage of un- acceptable fault positions given different the decision thresholds. With a perfectly balanced tree MMC architecture (type-3), less than 1% of fault positions result in more than 0.01dB quality degradation. Based on the fact that all MMC architectures have the same number of data buses and bit lines and the general assumption that random defects are distributed with equal probability across the metric computation circuit, the probability of fault occurrence is equal for all architectures. Therefore the expected value of additional error energy introduced by a single SA fault,E ¯ E|SSAF indicates the error tolerance level of a given MMC architecture in the presence of a single SA fault. It can be represented as, E ¯ E|SSAF = X α i ∈A k X p j ∈Pα i ¯ E(α i ,p j )·q = X α i ∈A k ¯ E α i ·q = X α p(α)· ¯ E α where ¯ E α i = P p j ∈Pα i ¯ E(α i ,p j ) which is the same for all MMC architecture given α i . A k andP α i arethesetofalldatabusinMMCcircuitkandofallbitlinesbelong to data bus α i respectively. q is a probability of having a fault at certain position and it is equal probability over all fault locations. 
p(α) denotes the distribution of 123 α and it is determined by the MMC architecture. Based on our fault effect model studied in Section 3.4, ¯ E increases linearly with the fault parameter α with an assumption that both P SAD (D) and P minSAD (D) are uniform distributions over the entire range of D. Then, argmin p k (α) E ¯ E|SSAF = argmin p k (α) X α p k (α)· ¯ E α = argmin p k (α) D k whereD k =N· P α p k (α)·α =N·E(α) which is equivalent to the average depth ofthebinarytreethatcorrespondstotheMMCarchitecturek. Ifp k (α)determined by the MMC architecturek minimizesE ¯ E|SSAF , it also minimizes the average depth of the binary tree corresponding to that MMC architecturek. Therefore the MMC architecture corresponding to the binary tree with minimum average depth 5 (perfectlybalancedbinarytree)leadstothehighesterrortoleranceandfurthermore also maximize parallel computation. This conclusion holds more strongly when ¯ E increaseswithαsuperlinearlywhichisthecaseinpracticeastheaboveassumption isnotgenerallytrue(Figure3.16(left))mainlyduetotheP minSAD (D)distribution being more concentrated at the lower D such that minimum SAD becomes more likely to fall in the error region of Δ(D) function with higherα. More specifically, the relative reduction rate of ¯ E of the type-3 MMC structure compared to the others increases if the increasing rate of ¯ E vs. α is higher (e.g., exponential vs. linear increase) or if N increases 6 . Figure 3.16 illustrate how ¯ E increases with α (left) and with average binary tree depth describing different MMC architecture (right) for different ME algorithms in actual simulation. 5 sketch of proof: a binary tree with N leaves in which the difference between maximum and minimum depths D max −D min is greater than 1, can be reformed into the perfectly balanced tree by iteratively moving two leaves at D i max to D i min leaf. Each iteration i reduces average depth by D i max −D i min −1 /N. Therefore perfectly balanced trees have the minimum average depth. 6 for example, ¯ E of type-1, 2, and 3 increases in the order of O N 2 , O(N), and O(log 2 N) respectively. 124 The same conclusion holds true for the multiple SA faults case although a for- mal argument establishing this property is somewhat involved, but its validity is essentially a consequence of the fact that ¯ E is an increasing function with each α i of{f i } M i=1 . Figure3.17showshowthepercentageofunacceptableSSAfaultlocationsgiven a MMC structure and search algorithm varies with the acceptance decision thresh- olds. It also provides comparison for three representative MMC architectures and search algorithms for the same process. This simulation result shows that even if acceptance decision threshold is as small as 0.01dB 7 , still more than 99% of fault locations produce imperceptible degradation. In other words, more than 99% of defective metric computation circuits with SSA fault having type-3 architecture only produce less than 0.01dB degradation. This result confirms that the MMC architecture can affect the system level error tolerance property quite significantly. 
Similarly for sub-block based MMC architectures which reuses its structure L times to perform one whole block error computation, the expected error for a SA fault f i can be represented as ¯ E sub =P sub (f i )·E sub (S E |f i ) = 1 L P wh (f i )·(E wh (S E |f i )) L Since a single SA fault in sub-block based structure has the same effect as that of L multiple independent SA faults with the same fault parameters in whole block based one, its impact increases by a factor of L. Therefore the fault impact for the sub-block based architectures increases exponentially with L, resulting in reduced error tolerance compared to the whole block based ones. 7 In general, 0.1-0.2dB is considered to be an imperceptible quality difference in typical im- age/video coding applications. 125 3.7 Conclusions and Future Work Based on the system-level error tolerance concept, we present a complete analysis oftheeffectofinterconnectfaultsinNNSmetriccomputationcircuit. Weprovided a model to capture the effect of any fault (or combination of multiple faults) on the matching metric. We then describe how these errors in metric computation lead to errors in the matching process. Our motivation was twofold. First, we used this analysis to predict the behavior of NNS algorithms and MMC architectures in the presence of faults. Second, this analysis is a required step towards developing com- prehensive testing techniques to identify acceptable error behavior. In particular, it will provide tools to tackle multiple fault scenarios, by discarding specific sets of faults (e.g., when one of the multiple faults is by itself unacceptable), establish- ing equivalence classes across faults (e.g., cases when dependent and independent faults may have the same impact), or determining how the parameters of the faults involved affect overall behavior (e.g., multiple faults with similar parameters may lead to relatively worse errors than sets of faults with differing characteristics). Based on this model, we investigated the error tolerance behavior of nearest neighbor search (NNS) process in the presence of multiple hardware faults from both an algorithmic and a hardware architecture point of view by defining the characteristics of the search algorithm and hardware architecture that lead to in- creased error tolerance. More specifically, we investigate the relationship between fault locations and design choices made for the search process in terms of error tol- erance and define the characteristics of the search algorithm that lead to increased error tolerance. We showed that error robustnessdepends on thenumber and qual- ity of the candidates tested by NNS algorithm but the quality primarily influences ET level. We also showed that different hardware architectures performing the 126 same metric computation can also significantly influence the degree of error toler- ance and further showed that the optimal MMC hardware architecture in terms of error tolerance is perfectly balanced binary tree structures, which also allow the maximized parallel computing. Motion estimation process for video coding is tested as an example application forourstudytoverifyourmodelsandresultsinactualpracticalapplicationsetting. Our simulation showed that search algorithms satisfying such characteristics (hav- ing a candidate set with smaller set size and having a distribution closer to nearest neighbors) also exhibit significant complexity reduction, apart from increased ET, without having to compromise with the performance. 
For example, in our exper- iments, enhanced predictive zonal search (EPZS) [14] algorithm which has these characteristics showed 0.01dB lower and upto 2.5dB higher performance than that of full search (FS) in fault-free and faulty cases, respectively, while reducing more than 99% complexity. Our simulation also showed that if optimalstructure is used, the expected error due to a fault can be reduced by more than 95% and more than 99.2% of fault locations within matching metric computation circuits result in less than 0.01dB performance degradation. 127 Chapter 4 Conclusions and Research Directions The subject of this thesis was studying similarity search problem within the scope oferrortoleranceconcept. Thisthesispresentedseveralmethodologiestodealwith the problems of system complexity and high vulnerability to hardware defects and fabrication process variability and consequently a lower yield rate. Error Tolerance and Similarity Search Error tolerance (ET) approach [6] is an exercise of designing and testing systems cost-effectively by exploiting the advantages of a controlled relaxation of system level output quality precision requirement. The basic theme of ET approach is to allow erroneous output but by inappreciable/imperceptible degree of system level qualitydegradationinordertosimplifyandoptimizethecircuitsizeandcomplexity, power consumption, costs as well as chip manufacturing yield rate. Motivation of ET approach is two-fold: By exploiting certain range of distortions/errors which lead to negligible impact on system level performance, i) a significant portion of manufactured dies with such minor imperfection of physical origin can be saved, thus increasing overall effective yield, and ii) considerable circuit simplification and high power efficiency is attainable by systematically and purposefully introducing such distortions/errors. 128 Considering the growing needs for dealing with a large often distributed volume of data and increasing mobile computing/telecommunication, complexity criterion (response time, circuit/power complexity) has become an important criterion for a successfuldesign. Thus, thepotentialbenefitoferrortoleranceconceptinpractical point of view can be enormous. Similarity search problem is one of the examples which could benefit from error tolerance approach due to its data and computation intensive characteristics. Also this is one of the best examples where approximate solution (relaxation from the exact solution requirement) is desirable and advan- tageous. In fact, almost all current research focus on proximity problems is to efficiently find the good approximate solutions. QuantizationBasedNearest-Neighbor-PreservingMetricApproxima- tion Algorithm Along with the improvement achieved from existing approaches for NNS problem in terms of the number of examined data during query process through database preprocessingtechniques,ourproposedapproachintroducesfurthersignificantcom- plexity reduction by increasing the efficiency of metric computation. From the per- spective of error tolerance (ET) concept specifically for the applications requiring similarity search process, we developed an efficient algorithm, called query adaptive nearest-neighbor-preserving metric approximation algorithm, whichreducescompu- tational burden of searching process by primarily focusing on the metric compu- tation simplification. 
Proposed metric aims not at preserving measured distances but at preserving the fidelity of minimum distance ranking, while reducing po- tential wasting of resources in computing high precision metric for unlikely solu- tions/points. This metric approximation is not fixed for all potential queries but adaptivelyadjustedateachqueryprocessexploitingtheinformationofquerypoint 129 to provide better performance complexity trade-off. Thus proposed method is in- trinsically flexible from query to query and efficient with largely varying database /metric functions. Our proposed approach employs quantization process within the metric, which is applied directly to data set points without having to compute actual distance to the query point. It entails nonlinear sensitivity to distance such that finer precision is maintained only where it is needed/important (the region of expected nearest neighbors) while unlikely regions to be nearest neighbors are very coarsely repre- sented. Typically in our simulation with motion estimation, 1 bit quantizer per dimension/pixel is used and its resulting performance loss was negligible (on aver- age 0.01dB loss). We provide an analytical formulation of the search performance measure, based on which the optimal quantizer is designed to maximize the fidelity of the minimum distance ranking. Our proposed method is intrinsically more flexible and adaptable from query to query changes even with largely varying database or more dynamic environment (e.g., data streaming) or when there exist variation of metric functions and/or dimensionality change (e.g., user defined, controllable metric such as arbitrary se- lection of features/weightings), without having to rebuild the whole data structure or to perform transforms from scratch. Our proposed approach can be also used for k-nearest neighbor search or or- thogonal range search. To find the exact nearest-neighbor, this approach can be also used as a preliminary filtering step to filter out unlikely candidates and then refinement process can be performed within the remaining set. This method can beperformedindependentlybutalsoinparallelwithmostofexistingpreprocessing based algorithms. 130 More work needs to be done in terms of specific logic/circuit level design for efficient hardware implementation of quantizer. Additional future work is needed on more thorough analysis of performance factors such that given statistical infor- mation of certain application, one could accurately estimate how much complexity- performance trade-off can be expected. There is also a need to develop more con- crete methodology in collecting statistical information from data when it is applied to certain index structure method. Also it would be interesting to see how this method can be incorporated with parallel and distributed index structures. Furthermore, if possible, generalization of our proposed work can be developed to deal with more general metric function such as quadratic metric and to relax the constraint we posed in this thesis on metric structure (no cross-interference among dimensions in metric function). The concept of our proposed approach can be extended to numerous problems involving search process such as combinations of queries, batch queries, classifica- tion problems, other proximity problems etc. But not limited to similarity search problem, thepotentialbenefitoferrortoleranceconceptcanbereapedfromvariety of application areas. 
Hardware Fault Effect Modeling for Nearest Neighbor Search Problem

Based on the system-level error tolerance concept, we presented a complete analysis of the effect of interconnect faults in the NNS metric computation circuit. We provided a model to capture the effect of any fault (or combination of multiple faults) on the matching metric. We then described how these errors in the metric computation lead to errors in the matching process. Our motivation was twofold. First, we used this analysis to predict the behavior of NNS algorithms and MMC architectures in the presence of faults. Second, this analysis is a required step towards developing comprehensive testing techniques to identify acceptable error behavior.

Based on this model, we investigated the error tolerance behavior of the nearest neighbor search (NNS) process in the presence of multiple hardware faults from both an algorithmic and a hardware architecture point of view, by defining the characteristics of the search algorithm and hardware architecture that lead to increased error tolerance. More specifically, we investigated the relationship between fault locations and the design choices made for the search process in terms of error tolerance, and defined the characteristics of the search algorithm that lead to increased error tolerance. We showed that error robustness depends on the number and quality of the candidates tested by the NNS algorithm, but that the quality primarily influences the ET level. We also showed that different hardware architectures performing the same metric computation can also significantly influence the degree of error tolerance, and further showed that the optimal MMC hardware architecture in terms of error tolerance is the perfectly balanced binary tree structure, which also allows maximally parallel computation.

The motion estimation process for video coding was tested as an example application to verify our models and results in an actual practical application setting. Our simulations showed that search algorithms satisfying these characteristics (having a candidate set with a smaller size and a distribution closer to the nearest neighbors) also exhibit a significant complexity reduction, apart from increased ET, without having to compromise performance. For example, in our experiments, the enhanced predictive zonal search (EPZS) [14] algorithm, which has these characteristics, showed 0.01dB lower and up to 2.5dB higher performance than that of full search (FS) in the fault-free and faulty cases, respectively, while reducing complexity by more than 99%. Our simulations also showed that if the optimal structure is used, the expected error due to a fault can be reduced by more than 95%, and more than 99.2% of fault locations within matching metric computation circuits result in less than 0.01dB performance degradation.

Based on the fault effect model we presented, comprehensive testing techniques to identify acceptable error behavior still need to be developed. In particular, based on this model, we need to tackle multiple-fault scenarios by discarding specific sets of faults (e.g., when one of the multiple faults is by itself unacceptable), establishing equivalence classes across faults (e.g., cases when dependent and independent faults have the same impact), or determining how the parameters of the faults involved affect the overall behavior (e.g., multiple faults with similar parameters may lead to relatively worse errors than sets of faults with differing characteristics).
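The architectural claim above (that a balanced binary adder tree is the most error tolerant metric computation structure) can be illustrated with a small Monte Carlo sketch. The following Python code is a simplified stand-in for the thesis's fault effect model, not the model itself: the bit-widths, the uniform distribution of fault locations over interconnect lines, the stuck-at-1 injection, and all names (run, chain_order, tree_order, wires) are assumptions made here for illustration.

```python
import random

PIX_MAX = 255  # maximum per-pixel absolute difference in a SAD metric

def run(vals, order, fault=None):
    # Feed leaf values through adders listed in `order` (pairs of node
    # indices); `fault=(adder, bit)` forces one output line of that
    # adder to 1, modeling a stuck-at-1 fault on an interconnect.
    nodes = list(vals)
    for j, (a, b) in enumerate(order):
        s = nodes[a] + nodes[b]
        if fault is not None and fault[0] == j:
            s |= 1 << fault[1]
        nodes.append(s)
    return nodes[-1]

def chain_order(n):
    # ((v0+v1)+v2)+... : every adder output carries a wide partial sum.
    order = [(0, 1)]
    for i in range(2, n):
        order.append((n + len(order) - 1, i))
    return order

def tree_order(n):
    # Perfectly balanced binary tree (n a power of two): most adders
    # sit near the leaves and carry narrow, low-significance values.
    order, level, nxt = [], list(range(n)), n
    while len(level) > 1:
        new = []
        for i in range(0, len(level), 2):
            order.append((level[i], level[i + 1]))
            new.append(nxt)
            nxt += 1
        level = new
    return order

def wires(order, n):
    # Enumerate all (adder, bit) interconnect lines, sizing each adder
    # output to the maximum value it can carry.
    leaves, ws = [1] * n, []
    for j, (a, b) in enumerate(order):
        leaves.append(leaves[a] + leaves[b])
        for bit in range((PIX_MAX * leaves[-1]).bit_length()):
            ws.append((j, bit))
    return ws

random.seed(0)
n, trials = 16, 20000
for name, order in (("chain", chain_order(n)), ("tree ", tree_order(n))):
    wl, total = wires(order, n), 0
    for _ in range(trials):
        vals = [random.randrange(PIX_MAX + 1) for _ in range(n)]
        fault = random.choice(wl)  # uniform over all interconnect lines
        total += abs(run(vals, order, fault) - run(vals, order))
    print(name, "mean |metric error| per random single fault:",
          round(total / trials, 1))
```

In this toy model the balanced tree shows a several-fold smaller mean metric error than the serial chain, qualitatively consistent with the conclusion above; the precise figures reported in the thesis come from the full fault effect model, not from this simplification.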
Bibliography

[1] http://biron.usc.edu/wiki/index.php/error tolerant computing.

[2] ITU (International Telecommunication Union) Study Group VQEG (The Video Quality Experts' Group), ftp://ftp.its.bldrdoc.gov/dist/ituvidq.

[3] K. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft. When is 'nearest neighbor' meaningful? In Lecture Notes in Computer Science, volume 1540, pages 217–235, 1999.

[4] G. Bjontegaard. Calculation of average PSNR differences between RD-curves. In ITU-T Video Coding Experts Group, document VCEG-M33, 2001.

[5] M. A. Breuer. Determining error rate in error tolerant VLSI chips. In Proceedings of IEEE Int. Workshop on Electronic Design, Test and Applications, DELTA'04, 2004.

[6] M. A. Breuer, S. K. Gupta, and T. M. Mak. Defect and error tolerance in the presence of massive numbers of defects. IEEE Design & Test of Computers, 21:216–227, May–June 2004.

[7] T. K. Callaway and E. E. Swartzlander. Optimizing arithmetic elements for signal processing. In VLSI Signal Processing, V, pages 91–100, 1992.

[8] Y. Chan and S. Y. Kung. Multi-level pixel difference classification methods. In Proc. ICIP, pages 252–255, 1995.

[9] H. Y. Cheong, I. Chong, and A. Ortega. Computation error tolerance in motion estimation algorithms. In IEEE International Conference on Image Processing, ICIP'06, Oct. 2006.

[10] H. Y. Cheong and A. Ortega. System level fault tolerant motion estimation algorithms & techniques. Technical report, SIPI, Univ. of Southern California, 2006.

[11] H. Y. Cheong and A. Ortega. Distance quantization method for fast nearest neighbor search computations with applications to motion estimation. In Asilomar Conference on Signals, Systems and Computers, Nov. 2007.

[12] H. Y. Cheong and A. Ortega. Motion estimation performance models with application to hardware error tolerance. In Visual Communications and Image Processing, VCIP'07, Jan. 2007.

[13] H. Y. Cheong and A. Ortega. Quantization based nearest-neighbor-preserving metric approximation. In IEEE International Conference on Image Processing, ICIP'08, Oct. 2008.

[14] H. Y. Cheong and A. M. Tourapis. Fast motion estimation within the H.264 codec. In Proc. IEEE Int. Conf. on Multimedia and Expo (ICME-2003), July 2003.

[15] K. T. Choi, S. C. Chan, and T. S. Ng. A new fast motion estimation algorithm using hexagonal subsampling pattern and multiple candidates search. In Proc. ICIP, volume 2, pages 497–500, 1996.

[16] I. Chong, H. Y. Cheong, and A. Ortega. New quality metrics for multimedia compression using faulty hardware. In Proc. of International Workshop on Video Processing and Quality Metrics for Consumer Electronics, 2006.

[17] I. Chong and A. Ortega. Hardware testing for error tolerant multimedia compression based on linear transforms. In Proc. of IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, DFT'05, pages 523–534, 2005.

[18] H. Chung and A. Ortega. Analysis and testing for error tolerant motion estimation. In Proc. of IEEE Int. Symp. on Defect and Fault Tolerance in VLSI Syst., pages 514–522, 2005.

[19] P. Ciaccia, M. Patella, and P. Zezula. A cost model for similarity queries in metric spaces. In Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, pages 59–68, 1998.

[20] T. M. Cover and P. E. Hart. Nearest neighbor pattern classification. In IEEE Transactions on Information Theory, volume 13, pages 21–27, 1967.

[21] S. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman. Indexing by latent semantic analysis. In Journal of the Society for Information Sciences, volume 41, pages 391–407, 1990.

[22] S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman. Indexing by latent semantic analysis. In Journal of the American Society for Information Science, volume 41, 1990.

[23] L. Devroye and T. J. Wagner. Nearest neighbor methods in discrimination. Handbook of Statistics, 1982.

[24] A. Gersho and R. M. Gray. Vector Quantization and Signal Compression. Kluwer Academic, 1991.

[25] Z. He, C. Tsui, K. Chan, and M. Liou. Low-power VLSI design for motion estimation using adaptive pixel truncation. In IEEE Transactions on CSVT, volume 10, 2000.

[26] H. Hotelling. Analysis of a complex of statistical variables into principal components. In Journal of Educational Psychology, volume 27, pages 417–441, 1933.

[27] L. M. Huisman. Diagnosing arbitrary defects in logic designs using single location at a time (SLAT). IEEE TCAD, 23:91–101, 2004.

[28] P. Indyk and R. Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proceedings of the Symposium on Theory of Computing, 1998.

[29] R. Karri, K. Hogstedt, and A. Orailoglu. Computer-aided design of fault-tolerant VLSI systems. IEEE Design & Test of Computers, 13:88–96, 1996.

[30] T. Koga, K. Iinuma, A. Hirano, Y. Iijima, and T. Ishiguro. Motion-compensated interframe coding for video conferencing. In IEEE NTC, pages 531–534, 1981.

[31] T. Kolda, R. Lewis, and V. Torczon. Optimization by direct search: New perspectives on some classical and modern methods. SIAM Review, 45:385–482, 2003.

[32] I. Koren and Z. Koren. Defect tolerant VLSI circuits: Techniques and yield analysis. In Proceedings of the IEEE, volume 86, pages 1817–1836, Sept. 1998.

[33] I. Koren and C. M. Krishna. Fault Tolerant Systems. Morgan Kaufmann, United States, 2007.

[34] R. Leachman and C. Berglund. Systematic mechanisms-limited yield assessment survey. Competitive Semiconductor Manufacturing Program, UC Berkeley, 2003.

[35] S. Lee, J. M. Kim, and S. I. Chae. New motion estimation algorithm using adaptively quantized low bit-resolution image and its VLSI architecture for MPEG video encoding. In IEEE Trans. CSVT, volume 8, 1998.

[36] B. Liu and A. Zaccarin. New fast algorithm for motion estimation of block motion vectors. In IEEE Trans. CSVT, volume 3, pages 148–157, 1993.

[37] M. Loève. Probability theory II. In Graduate Texts in Mathematics, Springer-Verlag, volume 46, 1978.

[38] J. Makhoul, F. Kubala, R. Schwartz, and R. Weischedel. Performance measures for information extraction. In Proceedings of DARPA Broadcast News Workshop, 1999.

[39] J. Marks, B. Andalman, P. Beardsley, W. Freeman, S. Gibson, J. Hodgins, and T. Kang. Design galleries: A general approach to setting parameters for computer graphics and animation. In SIGGRAPH Conference Proceedings, pages 389–400, 1997.

[40] B. Natarajan, V. Bhaskaran, and K. Konstantinides. Low-complexity block-based motion estimation via one-bit transforms. In IEEE Trans. Circuits Syst. Video Technol., volume 7, pages 702–706, 1997.

[41] S. M. Omohundro. Efficient algorithms with neural network behaviour. In Journal of Complex Systems, volume 1, pages 273–347, 1987.

[42] P. Pirsch, N. Demassieux, and W. Gehrke. VLSI architectures for video compression - a survey. In Proc. IEEE, volume 83(2), pages 220–246, Feb. 1995.

[43] S. J. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice-Hall, 1995.

[44] A. A. Salamov and V. V. Solovyev. Prediction of protein secondary structure by combining nearest-neighbor algorithms and multiple sequence alignments. Journal of Molecular Biology, 247(1):11–15, 1995.

[45] G. Salton and M. J. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, New York, 1983.

[46] H. Samet. The Design and Analysis of Spatial Data Structures. Addison-Wesley, MA, 1980.

[47] H. Samet. Foundations of Multidimensional and Metric Data Structures. Morgan-Kaufmann, San Francisco, 2006.

[48] T. Sellis, N. Roussopoulos, and C. Faloutsos. Multidimensional access methods: Trees have grown everywhere. In Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB), 1997.

[49] J. C. Spall. Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control. Wiley, 2003.

[50] O. Stern and H. J. Wunderlich. Simulation results of an efficient defect analysis procedure. In Proc. IEEE Int. Test Conf. on TEST: The Next 25 Years, pages 729–738, Oct. 1994.

[51] V. Torczon. On the convergence of pattern search algorithms. In SIAM Journal on Optimization, volume 7, pages 1–25, 1997.

[52] J. K. Uhlmann. Satisfying general proximity/similarity queries with metric trees. In Information Processing Letters, volume 40, pages 175–179, 1991.

[53] P. Zezula, P. Savino, G. Amato, and F. Rabitti. Approximate similarity retrieval with M-trees. In Very Large Databases Journal, volume 7, pages 275–293, 1998.
Abstract
As system complexity increases and VLSI circuits become more highly condensed and integrated towards the nano-scale, the requirement of 100% exact execution of the designed operations and of correctness for all transistors and interconnects is prohibitively expensive, if not impossible, to meet in practice. To deal with these problems, defect tolerance (DT) and fault tolerance (FT) techniques at the design and manufacturing stages have been widely studied and practiced. FT and DT techniques ideally try to mask all effects of faults and internal errors by exploiting and managing redundancies, which leads to a more complex and costly system that achieves a hopefully ideal, or at least acceptable, output quality at the expense of the additional complexity cost.