ADVANCED TECHNIQUES FOR STEREOSCOPIC IMAGE RECTIFICATION AND QUALITY ASSESSMENT

by Hyunsuk Ko

A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, in Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING)

May 2015

Doctoral Committee: Professor C.-C. Jay Kuo (Chair), Professor Alexander A. Sawchuk, Professor Aiichiro Nakano

Copyright 2015 Hyunsuk Ko

Contents

List of Tables
List of Figures
Abstract

1 Introduction
  1.1 Motivation and Background
    1.1.1 Stereoscopic Image Quality Assessment
    1.1.2 Uncalibrated Stereo Rectification
  1.2 Related Work
    1.2.1 Stereoscopic Image Quality Assessment
    1.2.2 Uncalibrated Stereo Rectification
  1.3 Contribution of the Research
  1.4 Organization of the Proposal

2 Formula-based Stereoscopic Image Quality Index
  2.1 Structural Distortion Parameter (SDP)
    2.1.1 Asymmetric Distortion
    2.1.2 Binocular Perception in Stereovision
    2.1.3 Procedure for Calculating SDP
  2.2 The Proposed Quality Index Using the SDP-based Binocular Perception Model
  2.3 Experimental Results

3 Construction of Stereoscopic Image Database - MCL-3D
  3.1 Previous Databases
  3.2 Description of the MCL-3D Image Database
    3.2.1 Structure
    3.2.2 Image Source
    3.2.3 Distortion Types and Levels
    3.2.4 Subjective Test for the Database
    3.2.5 Mean Opinion Scores
  3.3 Perception Quality Analysis
    3.3.1 Statistical Analysis of Distortion Type
    3.3.2 Feedback from Conversation

4 A ParaBoost Stereoscopic Image Quality Assessment (PBSIQA) System
  4.1 Overview of the Proposed Quality Index
  4.2 MCL-3D, IVC-A and LIVE-A Databases
  4.3 Proposed PBSIQA System
    4.3.1 Scorer Design for Texture Distortions
    4.3.2 Scorer Design for Depth Distortions
    4.3.3 Learning-based Scorers and Fuser Design
    4.3.4 Training and Test Procedures
  4.4 Learning Procedure
  4.5 Performance Evaluation
    4.5.1 Performance Comparison of Stage I Scorers
    4.5.2 Performance Improvement via Fusion
    4.5.3 Performance Comparison with Other Quality Indices
5 Robust Uncalibrated Stereo Rectification with Constrained Geometric Distortions (USR-CGD)
  5.1 Mathematical Background
  5.2 Proposed USR-CGD Rectification Algorithm
    5.2.1 Generalized Homographies
    5.2.2 Geometric Distortions
    5.2.3 Iterative Optimization
  5.3 Databases of Uncalibrated Stereo Images
  5.4 Performance Evaluation

6 Conclusion

Reference List

List of Tables

2.1 The subjective test environment.
2.2 Overall performance.
2.3 Performance on the database with symmetric distortions only.
2.4 Performance on the database with asymmetric distortions only.
3.1 Summary of 3D image databases.
3.2 Recommendations for subjective test methods.
4.1 Summary of the MCL databases.
4.2 Features used for the scorers.
4.3 Results of the single-feature analysis: DS-A and DS-B represent Dataset A and Dataset B, respectively.
4.4 Performance comparison among six sub-databases in terms of the rank and PCC values (in parentheses) with respect to each of the eight learning-based scorers, where the top-ranked sub-database is marked in bold.
4.5 Cumulative performance improvement via fusion against Dataset C of the MCL-3D database.
4.6 Performance comparison over the MCL-3D database.
4.7 Performance of the proposed PBSIQA index over the IVC-A and LIVE-A databases.
4.8 Cross-database validation result: the models generated from MCL-3D are used to predict quality scores of stereo images in the IVC-A and LIVE-A databases.
5.1 The databases of uncalibrated stereo images.
5.2 Performance comparison of six rectification algorithms in terms of the rectification error (E_v) and the orthogonality (E_O) and skewness (E_Sk) geometric distortions, where the best result is shown in bold. For the MCL-SS database, the results of four test sets (Interior, Street, Military, and Vehicle) are averaged.
5.3 Performance comparison of six rectification algorithms in terms of the aspect-ratio (E_AR), rotation (E_R) and scale-variance (E_SR) geometric distortions, where the best result is shown in bold. For the MCL-SS database, the results of four test sets (Interior, Street, Military, and Vehicle) are averaged.
5.4 Performance comparison based on 100 and 300 correspondences as the input.

List of Figures

1.1 Basic framework of stereoscopic image quality assessment.
2.1 Examples of asymmetric distortions.
2.2 Illustration of two information flow types between stereoscopic image pairs with asymmetric distortion in the HVS.
2.3 Three input images and the corresponding edge magnitude maps.
2.4 Residual edge maps.
2.5 The SDP distribution for images with blur, JPEG-2000, JPEG and additive noise distortion types.
2.6 Block diagram of the proposed index for stereo image pairs.
2.7 MOS toward structural and non-structural distortions.
3.1 The system based on DIBR technology.
3.2 Structure of the experiment.
3.3 Reference texture images in the MCL-3D database.
3.4 The entire processing flow and the corresponding distorted part for each database.
3.5 Subjective test environment.
3.6 GUI of the subjective test software.
3.7 Age distribution of the assessors.
3.8 MOS of distorted images when the distortion comes from the texture part only. The horizontal axis is the image number; images with the same source are collected into one minor grid. Within each minor grid, there are 6 minor slots, corresponding to 6 distortion types. Within each minor slot, there are 4 images, corresponding to 4 levels. The vertical axis is the corresponding MOS.
3.9 Example image when JPEG distortion is applied to the texture part only. (a) is the JPEG-compressed texture image, with the second-strongest distortion level. (b) is the original depth map. (c) is the left view of the finally rendered stereoscopic image.
3.10 MOS with confidence intervals of stereoscopic images with additive noise and transmission loss on the depth map. The horizontal axis is the image number; images with the same source are collected into one minor grid. Within each minor grid, there are 2 minor slots, corresponding to 2 distortion types. Within each minor slot, there are 4 images, corresponding to 4 levels. The vertical axis is the corresponding MOS.
3.11 Example image when additive noise is applied to the depth part only. (a) is the original texture image. (b) is the depth map with additive noise at the strongest distortion level. (c) is the left view of the finally rendered stereoscopic image.
3.12 MOS with confidence intervals of stereoscopic images with Gaussian blur, JPEG-2000 compression, JPEG compression and sampling blur on the depth map. The horizontal axis is the image number; images with the same source are collected into one minor grid. Within each minor grid, there are 4 minor slots, corresponding to 4 distortion types. Within each minor slot, there are 4 images, corresponding to 4 levels. The vertical axis is the corresponding MOS.
3.13 Example image when sampling blur is applied to the depth part only. (a) is the original texture image. (b) is the depth map with sampling blur at the second-strongest distortion level. (c) is the left view of the finally rendered stereoscopic image.
3.14 MOS of stereoscopic images with additive noise and transmission loss on both the texture and the depth map. The horizontal axis is the image number; images with the same source are collected into one minor grid. Within each minor grid, there are 2 minor slots, corresponding to 2 distortion types. Within each minor slot, there are 4 images, corresponding to 4 levels. The vertical axis is the corresponding MOS.
3.15 MOS of stereoscopic images with Gaussian blur, JPEG-2000 compression, JPEG compression and sampling blur on both the texture and the depth map. The horizontal axis is the image number; images with the same source are collected into one minor grid. Within each minor grid, there are 4 minor slots, corresponding to 4 distortion types. Within each minor slot, there are 4 images, corresponding to 4 levels. The vertical axis is the corresponding MOS.
4.1 The conceptual block diagram of the PBSIQA scoring system.
4.2 Distortion generation in the MCL-3D database.
4.3 The impact of different distortion sources on a rendered image.
4.4 The MOS values as a function of the depth-map distortion level, where the horizontal axis is the image index. Images from the same source include two distortion types with four distortion levels, respectively.
4.5 The diagram of the proposed PBSIQA system.
4.6 The PCC and SROCC performance curves as a function of the number of fused scorers in the PBSIQA system.
4.7 Scatter plots for images in Dataset C of the MCL-3D database, where the x-axis is the subjective MOS value and the y-axis is the predicted MOS using various objective quality indices.
5.1 Illustration of a pinhole camera model.
5.2 The epipolar geometry of a pair of stereo images.
5.3 Illustration of image rectification.
5.4 The effect of introducing a parameter for the y-translation: the original and rectified stereo pairs are shown in the left and right of subfigures (a) and (b).
5.5 The effect of introducing a parameter for different zoom levels: the original and rectified stereo pairs are shown in the left and right of subfigures (a) and (b).
5.6 The original and rectified images used to define four new geometric measures.
5.7 The block diagram of the proposed iterative optimization procedure.
5.8 The block diagram of the proposed USR-CGD system.
5.9 The left images of the 4 MCL-SS reference image pairs.
5.10 The 8 test image pairs of "Interior" with different geometric distortions, where the left and right images are overlapped for display purposes.
5.11 The eight selected left images from the 20 MCL-RS image pairs.
5.12 Camera configuration used to acquire synthetic stereo images.
5.13 Subjective quality comparison of a rectified image pair using (a) the USR algorithm and (b) the USR-CGD algorithm.
5.14 The subjective quality and the rectification error (the value in parentheses) for Fountain2 of the MCL-RS database.
5.15 The subjective quality and the rectification error (the value in parentheses) for Fountain3 of the MCL-RS database.
5.16 The subjective quality and the rectification error (the value in parentheses) for Military of the MCL-SS database.
5.17 The subjective quality and the rectification error (the value in parentheses) for Tot of the SYNTIM database.
5.18 The subjective quality and the rectification error (the value in parentheses) for Root of the VSG database.
5.19 The subjective quality and the rectification error (the value in parentheses) for Yard of the VSG database.

Abstract

New frameworks for the objective quality evaluation and the rectification of stereoscopic image pairs are presented in this work.

First, quality assessment of stereoscopic image pairs is more complicated than that of 2D images, since it is a multi-dimensional problem in which quality is affected not only by the distortion types but also by the relation between the left and right views, such as different distortion types or levels in the two views. We first introduce a novel formula-based metric that provides better results than several existing methods. However, the formula-based metric still has its limitations. For further improvement, we propose a quality index based on a parallel boosting system. That is, we classify distortion types into groups and design a set of scorers to handle them separately. At stage 1, each scorer generates its own score for a specific distortion type. At stage 2, all intermediate scores are fused to predict the final quality index with nonlinear regression. Experimental results demonstrate that the proposed quality index outperforms most state-of-the-art quality assessment methods by a significant margin over different databases.

Second, a novel algorithm for uncalibrated stereo image-pair rectification under the constraint of geometric distortion, called USR-CGD, is presented. Although it is straightforward to define a rectifying transformation (or homography) given the epipolar geometry, many existing algorithms introduce unwanted geometric distortions as a side effect. To obtain rectified images with reduced geometric distortions while maintaining a small rectification error, we parameterize the homography by considering the influence of various kinds of geometric distortions. Next, we define several geometric measures and incorporate them into a new cost function for parameter optimization. Finally, we propose a constrained adaptive optimization scheme that balances the rectification error against the geometric error. Extensive experimental results demonstrate the superior performance of the proposed USR-CGD method, which outperforms existing algorithms by a significant margin.

Chapter 1 Introduction

1.1 Motivation and Background

1.1.1 Stereoscopic Image Quality Assessment

With the rapid development of three-dimensional (3D) technology, 3D visual media has extended its applications from entertainment, such as films and broadcasting, to the automobile industry, remote education, medical operations and games. 3D images and videos offer an immersive experience: a viewer perceives depth because each eye sees the scene from a different viewpoint.
With the efforts of standardization societies, many advanced systems for coding, transmitting, and storing 3D visual data have been proposed, such as stereoscopic 3D video [1], multiview video coding (MVC) [2], and the multiview video plus depth map (MVD) format [3, 4]. Such 3D systems typically deal with a tradeoff between bit resources and visual quality. To determine an optimized solution, we need a means to measure the quality of images or videos. Traditionally, the peak signal-to-noise ratio (PSNR) has been used even for 3D images and videos by simply averaging the PSNRs of multiple views. However, it is well known that the correlation between PSNR (or most existing 2D metrics) and human visual experience for 3D visual stimuli is untrustworthy [5, 6]. We hence need a better method to assess 3D quality. An indisputable approach is to ask the opinions of human observers. However, such subjective evaluation is not only time-consuming and expensive but also impossible in automatic systems. Therefore, it is of great importance to design an objective evaluation algorithm that is consistent with subjective human evaluation.

Whereas research on 2D image quality metrics that consider the human visual system (HVS) has made substantial progress in the last two decades, the study of 3D image quality metrics is still at an early stage. Compared to 2D quality assessment (QA), the QA of stereoscopic images is more difficult and complicated. While the quality of a 2D image is mainly affected by errors in pixel positions and values, more factors influence the perceptual quality of stereoscopic images, for example, mismatch between the left and right views and discomfort, such as dizziness or eye strain, caused by excessive depth perception. Asymmetric distortion, which is due to different qualities of the left and right views, is an example of mismatch distortion. In such a case, the HVS reacts differently according to the distortion type during the QA [7, 8, 9]. Thus, a thorough study of the HVS must precede the design of a reliable quality metric. With regard to distortion types, rendering distortion has become a critical factor since the 3D video coding (3DVC) standardization [10] adopted the MVD data format. The MVD format consists of a texture image/video and its corresponding depth map. Both texture and depth are compressed and transmitted to the decoder side. Then, several virtual views can be synthesized from the two input views by a depth-image-based rendering (DIBR) technique [11]. In this framework, texture errors trigger blurring or blocking artifacts in virtual views, while depth map errors cause horizontal geometric misalignment, since the pixel values in a depth map represent the disparity information for the rendering process. Therefore, all these factors should be taken into account in 3D QA.

Fig. 1.1 shows the basic framework of this research. There is an original left/right stereoscopic image pair serving as the reference, together with its distorted counterpart, and we assess the quality of the distorted stereo image pair. First, we conduct a subjective test to obtain human scores, the so-called Mean Opinion Scores (MOS). Next, we also obtain an objective score from a candidate objective metric. Finally, our goal is to develop a robust objective metric that is consistent with human visual experience.
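As a concrete point of reference for the evaluation framework just described, the sketch below shows the naive PSNR-averaging baseline mentioned above and how any objective score can be checked against subjective MOS values. It is a minimal illustration only, not code from this thesis, and the function names are ours.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def psnr(ref, dist, peak=255.0):
    """PSNR between a reference view and a distorted view (8-bit images)."""
    mse = np.mean((ref.astype(np.float64) - dist.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def stereo_psnr(ref_left, ref_right, dist_left, dist_right):
    """Naive 3D extension discussed above: average the PSNRs of the two views."""
    return 0.5 * (psnr(ref_left, dist_left) + psnr(ref_right, dist_right))

def agreement_with_mos(predicted_scores, mos):
    """Correlation (PCC, SROCC) between objective scores and subjective MOS values."""
    pcc, _ = pearsonr(predicted_scores, mos)
    srocc, _ = spearmanr(predicted_scores, mos)
    return pcc, srocc
```

The weak correlation of such PSNR-based scores with MOS for 3D stimuli is exactly the motivation, stated above, for the metrics developed in the following chapters.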
1.1.2 Uncalibrated Stereo Rectification

Stereo vision is used to reconstruct the 3D structure of a real-world scene from two images taken from slightly different viewpoints. Recently, stereoscopic visual content has been widely used in entertainment, broadcasting, gaming, tele-operation, and so on. The cornerstone of stereo analysis is the solution to the correspondence problem, also known as stereo matching. It can be defined as locating a pair of pixels in the left and right images that are projections of the same scene element. A pair of corresponding points should satisfy the so-called epipolar constraint; that is, for a given point in one view, its corresponding point in the other view must lie on the epipolar line. However, in practice, it is difficult to obtain a rectified stereo pair from a stereo rig, since the rig often suffers from mechanical inaccuracy. Besides, thermal dilation affects the extrinsic camera parameters and some intrinsic parameters; for example, the focal length can change during shooting at different zoom levels. For these reasons, the epipolar lines are neither parallel nor aligned with the image axes, and we must therefore search over a 2D window, which makes the matching process time-consuming.

The correspondence problem can, however, be simplified and performed efficiently if all epipolar lines in both images are parallel to the line connecting the centers of the two projected views, called the baseline. Since corresponding points then have the same vertical coordinate, the search range is reduced to a 1D scan line. This can be achieved by applying 2D projective transforms, or homographies, to each image.

Figure 1.1: Basic framework of stereoscopic image quality assessment.

This process is known as "image rectification," and most stereo matching algorithms assume that the left and right images are perfectly rectified. If the intrinsic/extrinsic camera parameters are known from a calibration process, the rectification is unique up to trivial transformations. However, such parameters are often not specified, and there exists a considerable amount of uncertainty due to faulty calibration. On the other hand, there are more degrees of freedom in estimating the homographies of uncalibrated cameras. The way these parameters are parameterized and optimized is critical to generating rectified images without unwanted warping distortions. Such a rectification method that does not require camera parameters is called uncalibrated rectification, which is our concern in this work.

1.2 Related Work

1.2.1 Stereoscopic Image Quality Assessment

Traditionally, objective quality metrics are classified into three categories according to their access to the information of a reference image, which is assumed to have "perfect" quality. A full-reference (FR) QA algorithm uses the reference image as well as the distorted output image to judge the quality of the output, while a no-reference (NR) algorithm can access the output only. A reduced-reference (RR) algorithm lies between these two extremes; it uses partial information about the reference image, such as embedded watermarks. Alternatively, we can categorize objective quality metrics into two groups: one takes an analytical (formula-based) approach, and the other a learning-based approach.

In the formula-based approach, one trend is to use existing 2D metrics, or a combination of them, to judge the quality of stereoscopic images.
Some studies attempt to incorporate disparity or depth information, which has a significant impact on perceived quality. In [12, 13], 2D metrics were extended to 3D QA by incorporating disparity map distortions. Specifically, a quality index is obtained by linearly combining the distortions of the depth map and the 2D images of both views. In [14], Hanhart et al. analyzed the correlation between PSNR-based metrics and the perceived quality of an asymmetric stereo pair composed of a decoded view and a synthesized view. Another trend is to extract useful features that influence human visual experience and combine those features to derive a metric. In [15, 16], Sazzad et al. assumed that the perceived distortion and depth of stereoscopic images depend strongly on local features, such as edges and planar regions, and proposed a stereoscopic metric for JPEG-compressed images based on segmented local features.

Many formula-based metrics also exploit the characteristics of the HVS. In [17], Gorley and Holliman proposed a stereoscopic metric based on the sensitivity of the HVS to contrast and luminance changes. Several quality metrics use binocular perception models. In [7, 8], Stelmach and Meegan reported that binocular perception is dominated by the higher-quality image in the case of lowpass filtering distortions, but by the average of both images in the case of quantization distortions. Seuntiens et al. [9] also observed that the perceived quality of a JPEG-coded stereo image pair is close to the average quality of the two individual views. Ryu et al. [18] proposed an extended version of the SSIM index based on a binocular model. Their metric uses a fixed set of parameters and is not adaptive to different asymmetric distortions.

The learning-based approach has been adopted by only a few stereoscopic QA metrics, while machine learning has been used in several 2D image QA methods. Liu et al. [20] presented a novel QA method for 2D images based on multi-metric fusion (MMF), assuming that no single metric yields the best performance in all situations. They determined the final quality score by combining multiple metrics nonlinearly with suitable weights obtained through a training process. Their MMF metric outperforms state-of-the-art quality metrics. As an extension of MMF, Jin et al. [21] proposed a block-based MMF (BMMF) scheme, which estimates image content and distortion types in a block. They observed that the performance of an image quality metric is highly influenced by visual context. For 3D image QA, Park et al. [22] used a set of universally relevant geometric stereo features for anaglyph images and built a regression model to capture the relationship between the features and the quality of stereo images. Cheng and Sumei [23] extracted a set of basis images using independent component analysis and then used a binary-tree support vector machine to predict the grades of distorted stereo images.

1.2.2 Uncalibrated Stereo Rectification

As pioneering researchers on image rectification, Ayache and Francis [24] and Fusiello et al. [25] proposed rectification algorithms based on known camera parameters. The necessity of knowing the calibration parameters is the major shortcoming of these early methods. To overcome it, several researchers proposed a technique called projective (or uncalibrated) rectification, which rectifies images without knowing the camera parameters by estimating homographies from the epipolar geometry under various constraints.
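To make the projective-rectification pipeline concrete before the survey continues: estimate the epipolar geometry from point correspondences, then derive a pair of rectifying homographies. This generic pipeline (in the spirit of Hartley's formulation, discussed next) is also exposed by OpenCV; the sketch below is an illustration under the stated assumptions (matched keypoints already available), not the method proposed in this thesis.

```python
import cv2

def uncalibrated_rectify(img_left, img_right, pts_left, pts_right):
    """Sketch of projective (uncalibrated) rectification from correspondences only.

    pts_left / pts_right: Nx2 arrays of matched pixel coordinates.
    Returns the two rectifying homographies and the warped images.
    """
    # 1) Estimate the fundamental matrix; inlier correspondences satisfy x_r^T F x_l = 0.
    F, inlier_mask = cv2.findFundamentalMat(pts_left, pts_right, cv2.FM_RANSAC, 1.0, 0.99)

    # 2) Derive rectifying homographies without any camera parameters.
    h, w = img_left.shape[:2]
    ok, H_left, H_right = cv2.stereoRectifyUncalibrated(
        pts_left[inlier_mask.ravel() == 1],
        pts_right[inlier_mask.ravel() == 1],
        F, (w, h))

    # 3) Warp both views; epipolar lines become horizontal scan lines.
    rect_left = cv2.warpPerspective(img_left, H_left, (w, h))
    rect_right = cv2.warpPerspective(img_right, H_right, (w, h))
    return H_left, H_right, rect_left, rect_right
```

How the two homographies are parameterized and constrained, so that the warps introduce as little geometric distortion as possible, is exactly where the algorithms reviewed below and the USR-CGD method of Chapter 5 differ.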
Hartley [26, 27] developed a theoretical foundation for this technique, deriving a condition for one of the two homographies to be close to a rigid transformation while estimating the other by minimizing the difference between the corresponding points. Loop and Zhang [28] estimated homographies by decomposing them into a projective transform and an affine transform. There are also rectification algorithms that take geometric distortions into account to prevent them from being introduced into the rectified images. For example, Pollefeys et al. [29] proposed a simple and efficient algorithm for a stereo image pair using a polar parametrization of the image around the epipole, while Gluckman and Nayar [30] presented a rectification method that minimizes the resampling effect, i.e., the loss or creation of pixels due to under- or over-sampling, respectively. A similar approach was developed by Mallon and Whelan in [31], where perspective distortions are reduced by applying the singular value decomposition (SVD) to a first-order approximation of an orthogonal-like transform. Isgro and Trucco [32] proposed an approach that estimates homographies without explicit computation of the fundamental matrix by minimizing the disparity, as done in [26]. However, such a constraint on the disparity sometimes generates significantly distorted images. For further improvement, Wu and Yu [33] combined this approach with a shearing transform to reduce the distortion.

More recently, Fusiello and Irsara [34] proposed a quasi-Euclidean epipolar rectification method that approximates the Euclidean (calibrated) case by enforcing the rectifying transformation to be a collineation induced by the plane at infinity, which does not demand a specific initialization step in the minimization process. Zilly et al. [35] proposed a technique to jointly estimate the fundamental matrix and the appropriate rectification parameters. However, their method is limited to almost parallel stereo rigs with a narrow baseline. Georgiev et al. [36] developed a practical rectification algorithm of lower computational cost, yet it is only applicable to common, near-parallel stereo setups.

1.3 Contribution of the Research

Several contributions are made in this research, which are described as follows.

The main contribution of Chapter 2 is a novel formula-based quality index that handles not only symmetric but also asymmetric distortions. Although most existing stereoscopic quality indices assume symmetric distortion, mismatches between the left and right views do occur in real applications, such as different distortion types and levels, color mismatch, and so on. To take such asymmetry into account, we first investigate the reaction of the human visual system to asymmetric distortion and show that a newly introduced structural distortion is a main factor behind the differences in human visual experience. To incorporate such a structural masking effect into a quality index, we introduce a structural distortion parameter (SDP) that varies according to the distortion type. Then, we use the SDP as a control parameter in a binocular perception model and apply it to the three components of SSIM to obtain an overall quality index. In the proposed framework, the binocular model accommodates both distortion types and distortion degrees and, therefore, offers robust quality assessment results for both symmetric and asymmetric distortions.

In Chapter 3, we describe our new stereoscopic image database, MCL-3D. Even though several 3D image databases already exist, none of them carefully considers rendering distortion. The most recent 3D video coding standard adopted a new video format, the multiview plus depth map (MVD) format, mainly because of its resource efficiency. Under the MVD framework, only a few compressed texture views and their corresponding depth maps are sent to the decoder. At the decoder side, several intermediate views can be generated using the depth-image-based rendering (DIBR) technique. During the view rendering process, distortions in the texture image and in the depth map produce different degradation effects on the rendered image, an observation that must be considered when developing a robust quality index. To this end, our MCL-3D database was built with a focus on rendering distortion. On top of that, we mimicked the whole processing flow of 3D video communication and considered all distortion types that are naturally imposed at each stage, including video capturing, compression, transmission, and rendering, so that the final rendered stereoscopic images are the same as in real systems. The database contains 693 high-resolution stereoscopic image pairs. We used pairwise comparison methods to conduct the subjective test and carried out 270 sets of experiments. Based on the statistical information about the assessors and the final Mean Opinion Scores (MOS), user experiences on the perceptual quality of the stereoscopic images were analyzed in detail, and the performance of benchmark metrics was evaluated.

In Chapter 4, we propose a quality index based on an ensemble fusion system. The underlying assumption is that, given the complex and diverse nature of general visual content and distortion types, it is difficult to rely solely on a single method; a single analytical (formula-based) measure tends to perform poorly when its underlying assumptions are not satisfied. To tackle this limitation, the proposed index adopts a learning-based algorithm. More importantly, we reveal the complex relationship among the different kinds of distortions in a given pair of stereoscopic images by introducing a novel ensemble-fusion-based framework. The major contributions can be summarized as follows:

1. A novel quality index based on an ensemble fusion system is proposed, which considers the impact of combinations of several distortion types on the perceptual quality of a stereoscopic image pair. We use a learning algorithm to generate the model for the predicted quality score, which learns an optimized relationship among the intermediate scores for individual distortion types. Our model is scalable and can easily be extended to consider new distortion types or to include state-of-the-art metrics in the future.

2. In order to cover a broad range of distortion types systematically, we classify the distortion types into several groups and provide a relevant feature extraction scheme for each group. In particular, we investigate the different impacts of texture-image and depth-map distortions on the rendered image.

3. We demonstrate the superiority and robustness of the proposed metric through extensive experiments on a database in the MVD data format as well as a database in the conventional stereo format with symmetric and asymmetric distortions. Besides, we show the consistent robustness of the proposed metric through cross-database validation. Furthermore, we compare the performance of the proposed model against more than 15 existing state-of-the-art 2D/3D quality indices.

In Chapter 5, we propose a novel rectification algorithm for uncalibrated stereo images, which demands no prior knowledge of camera parameters. Although quite a few methods have been proposed to reduce unwanted warping distortions in rectified images with different homography parameterization schemes, there is no clear winner among them. Additionally, only two geometric measures (namely, orthogonality and the aspect ratio) have been used as geometric distortion criteria, and they are not sufficient to characterize the subjective quality of all rectified images. Here, we analyze comprehensively the effect of various geometric distortions on the quality of rectified images and take them into account in the algorithmic design. The proposed USR-CGD algorithm minimizes the rectification error while keeping the errors of various geometric distortion types below a certain level, owing to the following contributions:

- An uncalibrated stereo rectification algorithm is proposed to minimize the rectification error with constrained geometric distortions. A variety of geometric distortions, such as aspect ratio, rotation, skewness and scale variance, are introduced and incorporated into a new cost function, which is then minimized by our novel optimization scheme.

- A parameterization scheme for the rectifying transformations is developed. The parameters include the focal length difference between the two cameras, the vertical displacement between the optical centers, etc. This new scheme helps reduce the rectification error by adding more degrees of freedom to the previous Euclidean model [34].

- We provide a synthetic database that contains six geometric distortion types frequently observed in uncalibrated stereo image pairs. It allows a systematic way to analyze the effect of geometric distortions and to parameterize the rectifying transformations accordingly. We also provide a real-world stereo database with various indoor and outdoor scenes at full-HD resolution. The performance of different algorithms can be easily evaluated with these two databases.

1.4 Organization of the Proposal

The rest of this thesis is organized as follows. A stereoscopic image quality index using the SDP-based binocular perception model is proposed in Chapter 2. Next, the newly built MCL-3D stereoscopic image database is described in detail in Chapter 3. In Chapter 4, we propose the learning-based parallel boosting system, with its scorer design and feature extraction scheme, to realize the image quality assessment system, while the proposed USR-CGD algorithm is elaborated in Chapter 5. The corresponding experimental results are reported in Chapters 2, 4 and 5, where extensive performance comparisons are made across multiple databases. Finally, concluding remarks are given in Chapter 6.

Chapter 2 Formula-based Stereoscopic Image Quality Index

In this chapter, we propose a novel formula-based stereoscopic image quality index. First, we introduce a parameter called the structural distortion parameter (SDP), which varies according to the distortion type. Then, we use the SDP as a control parameter in an improved binocular perception model and apply it to the three components of SSIM to obtain an overall quality index. In the proposed framework, the binocular model accommodates both distortion types and degrees and, therefore, offers robust QA results for both symmetric and asymmetric distortions.
The rest of this chapter is organized as follows. The structural distortion parameter is introduced first. Next, the new 3D image quality index is proposed and its performance is evaluated.

2.1 Structural Distortion Parameter (SDP)

2.1.1 Asymmetric Distortion

Five distortion types were considered in the IVC [37] and LIVE [38] 3D image databases: 1) blur, 2) JPEG coding, 3) JPEG-2000 coding, 4) fast fading and 5) additive noise. The distortion level is the same for the left and right views in these databases. In order to observe the effect of asymmetric distortions, we generated stereo image pairs that have different distortion levels in the left and right images while keeping the distortion type the same. For example, we allow three blur levels, where 1 is the strongest and 3 is the weakest, to occur independently in either of the two views and conduct the subjective visual test. Some examples of asymmetric distortion are shown in Fig. 2.1.

Figure 2.1: Examples of asymmetric distortions.

2.1.2 Binocular Perception in Stereovision

We observe from experiments that, in human-perceived 3D image quality, the interaction of the left and right views depends on the distortion type. For example, for the blur and JPEG-2000 coding distortions, the information lost in the low-quality image can be compensated by the high-quality image, as shown in Fig. 2.2 (a). Thus, the perceptual quality is closer to that of the high-quality image. On the other hand, for the additive noise and JPEG coding distortions, the high-quality image is negatively influenced by the low-quality image, as shown in Fig. 2.2 (b).

Figure 2.2: Illustration of two information flow types between stereoscopic image pairs with asymmetric distortion in the HVS: (a) information flowing from the high-quality to the low-quality image (the blurred view loses information that the higher-quality view supplements); (b) information flowing from the low-quality to the high-quality image (the JPEG-coded view adds annoying information, a "structural distortion").

One way to explain the above observation is to consider whether a new "structural distortion" is introduced in one of the images of the pair. The most obvious distortion in JPEG-2000 coded images is the ringing artifact in edge regions. Both blur and ringing are distortions occurring in the edge regions of the images; in this case, no new structural distortion is introduced. If one image has better quality than the other, the better view is well perceived by a human observer and compensates for the worse one. In contrast, the most obvious distortion in JPEG-coded images is the blocking artifact, which is apparent to human eyes when it appears in smooth regions. Additive noise is also visible in smooth regions. Suppose that one image has no distortion at all (highest quality) while the other image has these distortions. Since there is no way to hide these distortions using the high-quality image, the overall perceptual quality is either the average of the two inputs or closer to that of the low-quality one. For this reason, the blocking artifact and the noise are called new structural distortions.

Figure 2.3: Three input images and the corresponding edge magnitude maps.

2.1.3 Procedure for Calculating SDP

Motivated by the above discussion, we attempt to quantify the structural distortion of the distorted image with a parameter called the structural distortion parameter (SDP).
The process consists of three steps. First, we compute the edge magnitude map of each input image using the Sobel edge detector. Three exemplary images (i.e., the desk lamp image and the associated blurred and JPEG-coded images) are given in Fig. 2.3. Second, we compute the difference between the edge magnitude maps of the distorted and the original images, called the residual edge map:

ResEdgeMap(i, j) = Edge_Dist(i, j) - Edge_Org(i, j).

Two exemplary residual edge maps are shown in Fig. 2.4, where the left and right images are obtained from the blurred and the JPEG-coded images, respectively. Clearly, the residual edge map for the JPEG-coded image is more pronounced than that of the blurred image. Finally, we define the SDP as

SDP = \frac{\sum_{i=1}^{W} \sum_{j=1}^{H} |ResEdgeMap(i, j)|}{W \cdot H \cdot Max \cdot C},   (2.1)

where W and H are the width and the height of the input image, Max is the maximum intensity level, and C is a control parameter.

Figure 2.4: Residual edge maps (left: blurred image; right: JPEG-coded image).

To illustrate the usefulness of the SDP, we plot the distribution of SDP values for a set of stereo image pairs created in our laboratory with four image distortion types in Fig. 2.5, where we plot the mean and the 90% confidence interval. The images in USCMCL Database 1 and USCMCL Database 2 are basically the same as those in IVC and LIVE. However, the distortions in IVC and LIVE are symmetric, while they are generalized to asymmetric distortions in the USCMCL databases. We see clearly that the SDP value is distributed in different ranges for these four distortion types.

Figure 2.5: The SDP distribution for images with blur, JPEG-2000, JPEG and additive noise distortion types (USCMCL Database 1 and USCMCL Database 2).

2.2 The Proposed Quality Index Using the SDP-based Binocular Perception Model

The SSIM index was proposed in [47] for the quality assessment of a single 2D image. One first measures the luminance (L), contrast (C) and structural (S) similarities between images x and y via

L(x, y) = \frac{2\mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}, \quad C(x, y) = \frac{2\sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}, \quad S(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3},   (2.2)

where \mu, \sigma, and \sigma_{xy} are the mean, the standard deviation, and the covariance, respectively. Then, the SSIM index can be expressed as

SSIM(x, y) = [L(x, y)]^{\alpha} [C(x, y)]^{\beta} [S(x, y)]^{\gamma}.   (2.3)

In this section, we generalize the single-image SSIM index to a new 3D image-pair quality index by incorporating the binocular perception model in [8]. In this model, the perceived quality of the stereo image pair can be written as

Q_B = \{ w \, Q_{high}^{\,n} + (1 - w) \, Q_{low}^{\,n} \}^{1/n},   (2.4)

where Q_{high} and Q_{low} are the quality indices of the high- and low-quality images in the stereo pair, respectively, and the parameter w controls the contribution of the high-quality image to binocular perception. In the proposed index, the parameter w is determined by the SDP and by the luminance/contrast indices of SSIM. The overall block diagram of the proposed index is shown in Fig. 2.6. Basically, we apply the binocular model to the three components of SSIM and need to determine the weight w and the exponent n. Empirically, we find that the parameters

n_L = 1, \quad n_C = 1, \quad n_S = 4

work well in Eq. (2.4).
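As a small illustration of Eq. (2.4), the component-wise binocular combination with the exponents above can be written as follows. This is a sketch for clarity only, not the reference implementation.

```python
def binocular_combine(q_high, q_low, w, n):
    """Binocular perception model of Eq. (2.4) for a single SSIM component."""
    return (w * q_high ** n + (1.0 - w) * q_low ** n) ** (1.0 / n)

# Exponents reported above: n_L = 1 (luminance), n_C = 1 (contrast), n_S = 4 (structure).
N_L, N_C, N_S = 1, 1, 4
```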
We now determine the weight w for each SSIM component, starting with the structural component. Let SDP_L and SDP_R denote the SDP of the left and right images, respectively, and let

SDP_max = max(SDP_L, SDP_R),  SDP_min = min(SDP_L, SDP_R).

Figure 2.6: Block diagram of the proposed index for stereo image pairs. The original and distorted left/right images feed the SDP computation and the binocular perception models for luminance, contrast and similarity; the resulting indices L_B, C_B and S_B are combined into the quality index Q.

Then, we adopt the following weight assignment scheme in the binocular model for the structural component:

w_S = 0.5 - 0.5 \sqrt{\frac{SDP_{max} - SDP_{min}}{SDP_{max}}}, \quad \text{if } SDP_{max} \ge T,   (2.5)

w_S = 0.5 + 0.5 \sqrt{\frac{SDP_{max} - SDP_{min}}{SDP_{max}}}, \quad \text{if } SDP_{max} < T,   (2.6)

where T is a threshold, set to 0.3 in our experiments. In words, the high-quality image receives a higher weight if SDP_max is low enough, but a lower weight if SDP_max is high.

Following [46], we assign a higher weight to the image with more contrast for the contrast component. Mathematically, the weight is expressed as

w_C = Cont_max / (Cont_min + Cont_max),

where Cont_max = max{mean(CMap_L), mean(CMap_R)} and Cont_min = min{mean(CMap_L), mean(CMap_R)}, and CMap denotes the contrast map of the left or right image. For the luminance component, we observe that the image with weaker luminance similarity to the original has more impact on the perceptual quality in the subjective test. Thus, we adopt the following weight assignment:

w_L = Lum_min / (Lum_min + Lum_max),

where Lum_max = max{mean(LMap_L), mean(LMap_R)} and Lum_min = min{mean(LMap_L), mean(LMap_R)}, and LMap is the luminance map of the left or right image. Finally, the overall quality index for the 3D image pair can be written as

Q = (L_B)^{\alpha} (C_B)^{\beta} (S_B)^{\gamma},   (2.7)

where L_B, C_B, and S_B are the binocular perceptual indices of luminance, contrast, and structural similarity, respectively. We obtain the parameters in Eq. (2.7) experimentally as

\alpha = 18, \quad \beta = 0.3, \quad \gamma = 0.3.
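Putting Section 2.1.3 and this section together, the following sketch shows how the SDP of Eq. (2.1), the structural weight of Eqs. (2.5)-(2.6), and the overall index of Eq. (2.7) could be computed. It is illustrative only (OpenCV/NumPy, with our own function names); the control parameter C of Eq. (2.1) is left as an argument because its value is not stated in this excerpt.

```python
import cv2
import numpy as np

def edge_magnitude(img):
    """Sobel edge magnitude map of a grayscale or BGR image."""
    if img.ndim == 3:
        img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    gray = img.astype(np.float64)
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    return np.sqrt(gx ** 2 + gy ** 2)

def sdp(original, distorted, c=1.0, max_level=255.0):
    """Structural distortion parameter of Eq. (2.1)."""
    residual = edge_magnitude(distorted) - edge_magnitude(original)  # residual edge map
    h, w = residual.shape
    return np.sum(np.abs(residual)) / (w * h * max_level * c)

def structural_weight(sdp_left, sdp_right, threshold=0.3):
    """Weight w_S of Eqs. (2.5)-(2.6); T = 0.3 as reported above."""
    sdp_max, sdp_min = max(sdp_left, sdp_right), min(sdp_left, sdp_right)
    if sdp_max == 0:
        return 0.5
    term = 0.5 * np.sqrt((sdp_max - sdp_min) / sdp_max)
    # The high-quality view is weighted above 0.5 only when SDP_max is small.
    return 0.5 - term if sdp_max >= threshold else 0.5 + term

def overall_index(l_b, c_b, s_b, alpha=18.0, beta=0.3, gamma=0.3):
    """Overall quality index Q of Eq. (2.7) with the parameters reported above."""
    return (l_b ** alpha) * (c_b ** beta) * (s_b ** gamma)
```

The per-view component scores fed into the binocular combination (e.g., mean SSIM component maps per view) are assumed to be computed separately.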
Next, we compare the performance of four index methods (PSNR, SSIM, RKS [18] and the pro- posed one) in four measures, including Pearson Correlation Coefficient (PCC), Spear- man Rank-Order Correlation Coefficient (SROCC), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE) as recommended by VQEG [48]. Since PSNR and SSIM are only applicable to a single image, we use a simple average to yield the overall quality index. We show the results for test images containing both symmetric and asym- metric distortions, symmetric distortions only and asymmetric distortions only in Tables 2, 3, and 4, respectively. Each table lists performance scores for four distortion types - blur, JPEG-2000, JPEG and additive white noise as well as their average performance scores. The index that gives the best average performance score is highlighted in bold. 22 Table 2.1: The subjective test environment. Participants 20 people (man:19 / woman:1 / age: 2437) Methodology ACR (w/ 5 grading scales [1:Bad–5:Excellent]) Original Stereo- scopic Images Six from IVC database [37] Six from LIVE database [38] Distortion Type Blur / JPEG 2000 / JPEG / Additive Noise(WN) Asymmetirc distortion design Each test stereoscopic pair has six combinations of distor- tion per each distortion type as follows: [1,1], [1,2], [1,3], [2,2], [2,3] and [3,3], where [Dist Level left ]; [Dist Level right ]: [1]: The strongest- / [3]:The weakest distortion # of test imags (6 + 6) 6 4 = 288 0 5 10 15 20 25 30 35 40 1 1.5 2 2.5 3 3.5 4 4.5 5 MOS USC Database1 USC Database2 0 5 10 15 20 25 30 35 40 1 1.5 2 2.5 3 3.5 4 4.5 5 USC Database1 USC Database2 MOSs of JPEG distorted stereoscopic images MOSs of Blurred stereoscopic images Figure 2.7: MOS toward structural-/ non-structural distortion. We see that the proposed 3D image index method outperforms the other three indices in all cases. Let us focus on the performance comparison of RSK and the proposed index method. The performance gain of the proposed one over RSK is more significant in the asym- metric distortion case since the proposed index method takes the binocular perception model into account and adjust the weight according to the SDP of the underlying distor- tion. This adaptability against various distortion types makes the proposed index more robust so that it works well in both symmetric and asymmetric cases. 23 Table 2.2: Overall performance. PCC SORCC RMSE MAE PSNR SSIM RSK Prop. PSNR SSIM RSK Prop. PSNR SSIM RSK Prop. PSNR SSIM RSK Prop. USCMCL 1 BLUR 0.72 0.80 0.84 0.89 0.70 0.80 0.85 0.89 0.78 0.68 0.61 0.51 0.60 0.55 0.50 0.40 JP2K 0.82 0.86 0.89 0.90 0.83 0.87 0.90 0.90 0.62 0.55 0.48 0.46 0.49 0.45 0.40 0.37 JPEG 0.34 0.72 0.87 0.88 0.30 0.72 0.86 0.87 0.82 0.61 0.43 0.42 0.70 0.52 0.36 0.33 WN 0.80 0.92 0.93 0.95 0.77 0.90 0.92 0.94 0.54 0.35 0.33 0.28 0.43 0.26 0.26 0.20 A VR 0.67 0.82 0.88 0.91 0.65 0.82 0.88 0.90 0.69 0.55 0.46 0.42 0.55 0.44 0.38 0.33 USCMCL 2 BLUE 0.80 0.82 0.81 0.86 0.79 0.83 0.81 0.87 0.66 0.62 0.64 0.56 0.53 0.47 0.49 0.42 JP2K 0.83 0.88 0.85 0.87 0.82 0.88 0.84 0.87 0.52 0.44 0.49 0.45 0.40 0.34 0.38 0.35 JPEG 0.67 0.84 0.82 0.88 0.64 0.81 0.78 0.81 0.73 0.53 0.56 0.51 0.59 0.43 0.44 0.40 WN 0.62 0.76 0.88 0.96 0.53 0.77 0.83 0.94 0.52 0.44 0.32 0.19 0.39 0.34 0.27 0.15 A VR 0.73 0.83 0.84 0.89 0.69 0.82 0.82 0.87 0.61 0.51 0.50 0.43 0.48 0.40 0.39 0.33 Table 2.3: Performance of database with Symmetric distortion only. PCC SORCC RMSE MAE PSNR SSIM RSK Prop. PSNR SSIM RSK Prop. PSNR SSIM RSK Prop. PSNR SSIM RSK Prop. 
USCMCL 1 BLUR 0.78 0.90 0.96 0.97 0.78 0.90 0.93 0.94 0.80 0.55 0.35 0.32 0.62 0.46 0.28 0.25 JP2K 0.82 0.89 0.96 0.96 0.83 0.88 0.91 0.90 0.73 0.59 0.38 0.36 0.56 0.47 0.34 0.30 JPEG 0.52 0.83 0.94 0.95 0.52 0.78 0.91 0.93 0.91 0.59 0.37 0.33 0.78 0.49 0.31 0.27 WN 0.95 0.96 0.97 0.97 0.85 0.94 0.95 0.95 0.36 0.31 0.26 0.26 0.24 0.22 0.18 0.18 A VR 0.77 0.90 0.96 0.96 0.74 0.87 0.92 0.93 0.70 0.51 0.34 0.32 0.55 0.41 0.28 0.25 USCMCL 2 BLUE 0.95 0.95 0.95 0.96 0.81 0.87 0.89 0.90 0.41 0.40 0.39 0.37 0.32 0.32 0.33 0.31 JP2K 0.89 0.92 0.90 0.91 0.86 0.92 0.86 0.88 0.53 0.45 0.51 0.48 0.44 0.38 0.38 0.37 JPEG 0.82 0.90 0.89 0.92 0.72 0.88 0.82 0.85 0.71 0.53 0.56 0.50 0.60 0.41 0.41 0.37 WN 0.72 0.82 0.95 0.97 0.71 0.89 0.94 0.97 0.52 0.43 0.16 0.09 0.36 0.30 0.11 0.06 A VR 0.85 0.90 0.92 0.94 0.77 0.89 0.88 0.90 0.54 0.45 0.40 0.36 0.43 0.35 0.31 0.28 Table 2.4: Performance of database with Asymmetric distortion only. PCC SORCC RMSE MAE PSNR SSIM RSK Prop. PSNR SSIM RSK Prop. PSNR SSIM RSK Prop. PSNR SSIM RSK Prop. USCMCL 1 BLUR 0.71 0.73 0.80 0.85 0.62 0.69 0.80 0.84 0.62 0.59 0.52 0.45 0.43 0.47 0.43 0.36 JP2K 0.83 0.83 0.85 0.85 0.81 0.86 0.84 0.84 0.44 0.44 0.42 0.42 0.37 0.35 0.36 0.35 JPEG 0.01 0.59 0.77 0.79 0.01 0.58 0.70 0.72 0.62 0.50 0.40 0.38 0.51 0.40 0.34 0.32 WN 0.59 0.81 0.83 0.94 0.65 0.78 0.81 0.93 0.47 0.35 0.33 0.20 0.40 0.28 0.26 0.16 A VR 0.53 0.74 0.81 0.86 0.52 0.73 0.79 0.83 0.54 0.47 0.42 0.36 0.43 0.38 0.35 0.30 USCMCL 2 BLUE 0.71 0.70 0.67 0.75 0.54 0.63 0.60 0.69 0.56 0.56 0.58 0.52 0.40 0.39 0.43 0.37 JP2K 0.81 0.85 0.80 0.84 0.81 0.82 0.79 0.81 0.35 0.31 0.35 0.32 0.28 0.26 0.26 0.24 JPEG 0.39 0.65 0.71 0.83 0.31 0.56 0.50 0.64 0.55 0.45 0.42 0.33 0.46 0.36 0.30 0.23 WN 0.61 0.74 0.89 0.96 0.44 0.75 0.82 0.92 0.41 0.34 0.23 0.14 0.31 0.26 0.20 0.11 A VR 0.63 0.73 0.77 0.85 0.52 0.69 0.68 0.76 0.46 0.42 0.39 0.33 0.36 0.32 0.30 0.24 24 Chapter 3 Construction of Stereoscopic Image Database - MCL3D In this chapter, the newly built MCL-3D database will be introduced in detail which was specifically designed for stereoscopic image based on the DIBR technology. Images from three view points were extracted from a set of well-chosen 3DVC test sequences, and various types of distortions commonly met in actual capturing, compression, trans- mission and rendering were imposed. We used the finally rendered stereoscopic images to conduct the subjective test, and collected ample number of opinion scores. Based on which, the factors affecting the perceptual quality of stereoscopic images were analyzed, and the performance of benchmarks was demonstrated. 3.1 Previous Databases As a traditional and widely used fidelity measurement method, Signal to Noise Ratio (SNR) was utilized for several decades for signal processing and was borrowed for image and video processing as a quality assessment method for several tens of years. After that, several 2D image and video databases for quality assessment were released [49], and several new algorithms considering HVS were proposed. According to the latest performance [50], the metric for 2D image is almost ready to be put into practical use. However, few comparable effort was devoted to 3D images. 25 Table 3.1: Summary of 3D Image Database Table 3.1 is a summary of several well-known databases for stereoscopic images. For these databases, the LIVE Phase I [38] only has traditional noises symmetrically applied to the left and right image. 
To make the scenario more complete, asymmetric distortions were added to the images, forming the LIVE Phase II database [51]. The IVC 3D database [37] is similar to LIVE Phase I but with a different selection of source images. All three of these databases use low-resolution image pairs, which are not commonly used for 3D display. The EPFL database [52] used high-definition images and focused on the impact of different disparities on visual quality; its most valuable contribution is a coarse guideline for disparity selection. Similar but more refined work has been ongoing at Moscow State University [53], where disparity is analyzed on continuous video and metrics are designed specifically for the disparity selection of 3D films. The IVC DIBR database [55] tested the visual quality of stereoscopic pairs rendered by seven different rendering algorithms; in this database, the texture image and depth map from only one viewpoint were used to render both the left and right images. None of these databases takes the communication structure into account: the distortions were imposed on the binocular images directly, which limits their value as references for a specific broadcasting structure such as DIBR.

Figure 3.1: The system based on DIBR technology.

With DIBR technology, the processing flow of the video communication system is shown in Fig. 3.1. At the encoder side, the viewpoints can be sparsely sampled, and the texture and depth map images of these sampled views are compressed and transmitted separately. At the decoder side, all the texture images and depth maps from the different viewpoints are decoded, and the stereoscopic image pairs for any viewpoint can then be rendered. Motivated by the requirements of quality assessment under this structure, we built the MCL-3D database. In this database, we mimicked the whole processing flow of 3D video communication and considered all distortion types that are naturally imposed at each stage, including video capturing, compression, transmission, and rendering, so that the final rendered stereoscopic images are the same as in real systems.

3.2 Description of the MCL-3D Image Database

3.2.1 Structure

The structure used to build the MCL-3D database is illustrated in Fig. 3.1, and the corresponding image generation method is shown in Fig. 3.2. First, the textures and depth maps of three views were sampled from a set of 3DVC test sequences, i.e., O_T1 with
Based on this, 28 (a) Kendo (b) Balloons (c) Love bird1 (d) Poznan street (e) Poznan hall2 (f) Shark (g) Microworld (h) Gt fly (i) Undo dancer Figure 3.3: Reference texture images in MCL3D database we took original three texture images and depth maps views as input and rendered the R VL andR VR . These two views will be taken as the reference for further analysis. 3.2.2 Image Source The quality of a database is highly rely on the reference images. And the images selected should be representative and have a wide variety. For 3D video communication systems, the test sequences for 3DVC can be considered as good candidates. Lots of multi-view sequences with associated depth data were provided by group members of Joint Col- laborative Team on 3D video coding extension development. Within the candidates, we abandoned some of them with spatial resolution not commonly used, and some with 29 Figure 3.4: The entire precessing flow and the corresponding distorted part for each database camera calibration problems, and also some sequences with similar scene characteris- tics. Finally nine of them were preferred in MCL-3D database. And they are demon- strated in Fig. 3.3. 3.2.3 distortion types and levels For the communication system built on DIBR, distortions may come from image acqui- sition, compression, transmission, and rendering. At the image acquisition stage, gaus- sian blur and additive noise are considered to be a common distortion. Before com- pression, the image may be down-sampled. To make the transmission more efficient, all the images will be compressed beforehand, which cause blockiness and compression blur. And during transmission, there will be some bit error. Before displaying, some rendering algorithm must be applied to render all the views. For the aforementioned databases, each of them investigated some part of these distortion types, and they are shown in Fig. 3.4. In MCL-3D database, we added the distortions specifically consider- ing all the conceivable cases. In the recommendation drafts introduced by ITU [57] or VQEG [48] for subjective test, the image qualities of alternatives are divided by 5 levels for discrete scoring, or labeled with 5 levels for continuous scoring method. And five levels are taken as the 30 most proper choices for human perception. Hence, we set 5 levels for each image set in our database. The reference stereoscopic images are taken as “Excellent”, and we imposed four other distorted images with different distortion strength, corresponding to “good”, “fair”, “poor”, and “bad” respectively. Most of the real distortion types existing in the 3D transmission system are com- pounded, and we can only view the rendered image with compound distortion at the terminal end. However, to systematically examine how each kind of distortion affects the perceptual quality, we would better separate the distortions. Thus, like almost all other databases, we only impose one kind of distortion to the source images for syn- thesis in our experiment. In order to investigate the scale of degradation for the same distortion on different part, either texture or depth, we applied the same distortion type and level separately to texture part and depth part. And to further study the more real case, we added compound distortion for which the same symmetric distortion exists both in texture image and depth map. The distortion caused by imperfect rendering algorithm is unique from others. 
For traditional rendering algorithm, only the middle view was taken as input, and the stereo- scopic image pair are rendered with some hole filling techniques. And the key point of such kind of algorithm is how to fill the holes to beautify the perceptual quality. Thus, for distortion brought about by rendering, we tookO T2 andO D2 as input and generate the stereoscopic image pairs. Some detail on the distortion types and the corresponding adding methods are out- lined below. Gaussian Blur. Lots of parameters need to be calibrated [52] during the acquisition of high quality stereoscopic images, within which the focal length is one of the most important ones. Texture images from any view will be blurred due to improper focal length. 31 Depth maps could be acquired by equipments [59], estimated by depth estimation algorithms [60] or generated by computer graphics. It was declared by some researchers [55] that the visual experience can be improved by applying some blur to the depth map before rendering. But there is no experimental results supporting this announcement, and we would like to demonstrate its effectiveness. We used ‘GaussianBlur()’ function in OpenCV [61] lib to add the gaussian blur, the levels were controlled by the standard deviation of the kernel, and the kernel size used are 11, 21, 31, 41 for the four levels. Additive White Noise. With digital image capturing systems, CMOS or CCD sensors are placed at the end of optical lenses to capture the decomposed color light intensity. And the intensity was later transformed to voltage and quantized to digital pixel values. Interference is ubiquitous in electronic circuits, and these interference appears as additive white noise in the obtained texture or depth image. The ‘randn()’ func- tion in the OpenCV lib was utilized to generate the additive noise, the levels were controlled by the standard deviation parameter. Noisy images were generated according to the different levels and then added to the original one. The parame- ters used are 5, 17, 33, 53. Down-sampling Blur. The originally captured image is always with high spatial resolution. Restricted by transmission bandwidth or considering the scalable coding structure for broad- casting, the original image will be down-sampled before compression. Although the difference of the visual experience for images with down-sample blur and Gaussian blur is slight, we would like to acquire the ground-truth of such kind of experience. The resize() function with default resizing method in OpenCV is used 32 for down-sampling and up-sampling, different levels for down-sampling blur are concerning the sampling ratio and the sampling ratios used are 5, 8, 11 and 14 respectively for the four levels. Jpeg and Jpeg2000 Compression. Compression is necessary for image and video communication owing to its huge volume with high redundancy. The common effect for compression is blockiness and blur. We apply Jpeg and Jpeg2000 compression to mimic these two distortion types for various kinds of compression methods. For Jpeg compression, we uti- lized the ‘imencode()’ function in OpenCV and the the quality levels are 30, 12, 8 and 5 for the four levels. For Jpeg2000 compression, we utilized the Kakadu [62] package, and the compression parameters used are 200, 500, 900 and 1500 respec- tively for each level. Transmission Error. Transmission of the data via unreliable channel will cause packet loss and bit error. At the decoding side, the lost information will usually be amended by error concealment methods. 
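For the pixel-domain distortions described above (Gaussian blur, additive white noise, down-sampling blur, and JPEG compression), a minimal Python/OpenCV sketch of the distortion generation is given below. It is not the exact script used to build MCL-3D: the file names are hypothetical, the level index is zero-based for convenience, and deriving the blur strength from the listed kernel sizes is an assumption based on the description above.

```python
# Minimal sketch (not the exact MCL-3D build scripts) of how the blur, noise,
# down-sampling, and JPEG distortions described above can be generated with
# OpenCV.  Level parameters follow the values listed in the text.
import cv2
import numpy as np

def gaussian_blur(img, level):
    # Kernel sizes 11, 21, 31, 41 for the four levels; sigma is derived from the kernel.
    ksize = [11, 21, 31, 41][level]
    return cv2.GaussianBlur(img, (ksize, ksize), 0)

def additive_white_noise(img, level):
    # Zero-mean Gaussian noise with standard deviations 5, 17, 33, 53.
    sigma = [5, 17, 33, 53][level]
    noise = np.random.randn(*img.shape) * sigma
    return np.clip(img.astype(np.float64) + noise, 0, 255).astype(np.uint8)

def downsampling_blur(img, level):
    # Down-sample by a ratio of 5, 8, 11, or 14, then up-sample back to the original size.
    ratio = [5, 8, 11, 14][level]
    h, w = img.shape[:2]
    small = cv2.resize(img, (max(w // ratio, 1), max(h // ratio, 1)))
    return cv2.resize(small, (w, h))

def jpeg_compression(img, level):
    # JPEG quality factors 30, 12, 8, 5 for the four levels.
    quality = [30, 12, 8, 5][level]
    ok, buf = cv2.imencode(".jpg", img, [cv2.IMWRITE_JPEG_QUALITY, quality])
    return cv2.imdecode(buf, cv2.IMREAD_UNCHANGED)

texture = cv2.imread("O_T2.png")                       # hypothetical file names
depth = cv2.imread("O_D2.png", cv2.IMREAD_GRAYSCALE)
distorted_texture = gaussian_blur(texture, level=2)
distorted_depth = additive_white_noise(depth, level=3)
```

The distorted texture image and/or depth map produced in this way would then be passed to the view synthesis software to render the stereoscopic pair used in the test.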
The importance of such kind of error concealments varied for different types of media. In MCL-3D database, we used the OpenJPEG lib with JPWL mode to mimic the encoding with transmission. The images were first encoded by several tiles, unequal protection and error correction codes were used in JPWL mode. Then some bit errors were added to the compressed bitstreams. At decoder side, the errors were partly corrected. Since with the protection methods used there is no direct correlation between bit-error rate and final visual quality of the decoded image. For distortion on either part, we generated 80 seeds of images with error presence, and pick out 4 of them with different visual quality and form the 4 transmission error levels. 33 Rendering Distortion. With the DIBR technology, the final video to be displayed on the screen need to be rendered by the texture and depth images. Typical rendering errors like black hole [66] and boundary blur will appear with imperfect rendering techniques. There are lots of rendering algorithms [55], and we took some representative ones to do the subjective test. The selected rendering algorithms are DIBR without hole filling, DIBR with filter [68], DIBR with inpainting [69], and DIBR with hierarchical hole filling [70]. 3.2.4 Subjective Test for the database The subjective test is conducted by systematically designed procedure. A GUI program is developed for subjective test, with which the pair-wise comparison method is applied. Test environment is set up according to ITU recommendations[71]. And we verified and confirmed the testing results after the subjective test procedure. Subjective test method ITU and VQEG are the two foremost international groups working on the standardiza- tion of subjective test method. The methods could roughly be classified into four groups according to the scoring levels and stimulus numbers, as shown in Table 3.2. It is experimentally demonstrated that continuous scale number can not improve the precision of test results [72], and it makes the assessor spend more time hesitating the scoring number. And with ACR method, for different assessor the same score has different meaning. Even for one specific assessor, the rating criteria vary along the testing time no matter how elaborately the training process was prepared. So after some preliminary test, we only focus on the methods with double stimulus and discrete score. 34 Table 3.2: Recommendations for subjective test moehod. Beside the methods recommended by ITU and VQEG, another interesting pairwise comparison method were applied by some renowned database [73]. With these methods, two images are displayed simultaneously, and one of them will be preferred by the assessor. Each of the images is assigned a point score, the point score will accumulate during the pairwise competition, and the final point score is taken as the final opinion score. The testing procedure is explicit, however, there is no proof to convince the final score. Pairwise comparison method has been extensively used for resource ranking and recommendation systems. The method has some mathematical support and it could be brought into subjective tests. In our database, we tried to integrate the swiss competition rule and score conversion models, and acquire a more meaningful result. The pairs are selected based on the swiss competition rule with some minor adjustment. We set up 9 rounds of competition, for each round only images with the same point score is selected. 
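A minimal sketch of this round-based pairing and point accumulation is shown below. The grouping and shuffling details, as well as the handling of odd-sized groups, are illustrative assumptions; the actual test applied additional minor adjustments to the Swiss competition rule as noted above.

```python
# Illustrative sketch of Swiss-style pairwise comparison: in each of the rounds,
# images with the same accumulated point score are paired, and the image
# preferred by the assessor gains one point.  Details are assumptions.
import random
from collections import defaultdict

def run_swiss_rounds(image_ids, ask_preference, num_rounds=9):
    points = {img: 0 for img in image_ids}
    for _ in range(num_rounds):
        groups = defaultdict(list)
        for img in image_ids:
            groups[points[img]].append(img)            # group images by current score
        for same_score_images in groups.values():
            random.shuffle(same_score_images)
            # pair images within the same-score group
            for a, b in zip(same_score_images[::2], same_score_images[1::2]):
                winner = ask_preference(a, b)          # assessor picks the preferred image
                points[winner] += 1
    return points                                      # final point score per image
```

Here ask_preference stands for the GUI interaction in which the assessor chooses one of the two displayed stereoscopic pairs; the returned point totals correspond to the final opinion scores accumulated during the competition.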
35 Figure 3.5: Subjective test environment The pairs are finely selected to make the inference of a full matrix more reliable. We recorded the comparison result for each pair as well as the final point score. Subjective test procedure A special test environment is set up for the subjective test. The display equipment is 46.9” LG 47LW5600. Assessors are asked to take a polarized glass and seated around 2.5 times the height of the screen with some minor adjustment [58], as showed in Fig. 3.5. During the test, two stereoscopic image pairs are shown on the screen simul- taneously, with vertical manner. The images are resized to adapt the display, and the gap between two images is padded with grey levels specified by ITU documents [58]. With pair-wise comparison method, only the relative quality of the two pairs will be annotated by assessor. Hence the resize process will not affect the final result. The GUI software is used to control the whole process for each assessor. Before doing the test, the assessor is asked to input some personal information in the log inter- face shown in Fig. 3.6. And each assessor is asked to be with normal perception of the 36 Figure 3.6: GUI of Subjective test software stereoscopic images. During the subjective test, assessors are asked to prefer one from the two vertically positioned image pairs using the up and down key on the keyboard. For each image set, the testing time is 12 min. to 15 min. After the subjective test, we conducted a short conversation with the assessor. No questionnaire is designed to avoid misleading. The assessors are recruited mostly from students in University of Southern Califor- nia and Xidian University. Among the 270 assessors, 170 are male, account for 63%, 100 are female, account for 37%. In order to investigate the score difference between experts and non-experts using pairwise comparison method, we also asked assessors to leave their expertise property on stereoscopic image. Among them, 34 are expert, account for 13%, and 236 are non-expert, account for 87%. The age distribution of the assessors could be found in Fig .3.7. 3.2.5 Mean Opinion Scores Each assessor in one test session has carried out all the distorted images for only one reference image. And the test duration for each session is less than 15 minutes, which complied with the requirements of recommendations given in ITU-R Rec. BT.500 [57]. 37 Figure 3.7: Age distribution of the assessors By the subjective test, we collected 30 opinion scores for each image set, which is totally 270 opinion scores for each image set and 20790 opinion scores in total. The subjective test results were verified by screening process according to Annex 2 of Rec. BT.500 [57], rare scores were deleted. As a extra rigorous outlier check process, the highest 10% and the lowest 10% scores were abandoned for each image. And the final MOS was calculated by mean value of remaining opinion scores. Since we collected much more opinion scores than recommended, the remaining scores could still meet the requirements. 3.3 Perception Quality Analysis The human visual system is almost the same for most people, and there is some com- mon sensibility to specific type of distortion, which is important for image processing. 38 Based on the feedback information from the questionnaire and verified by the score dis- tribution, we hereby exhibit some prominent factors which impact the overall perceptual quality. 3.3.1 Statistical analysis of distortion type Two types of images were involved in DIBR structure. 
The one is 2D texture image, which has the same appearance and properties as traditional 2D images. And the other one is the depth map, which is fundamentally different from texture image. Pixel val- ues in depth map indicate the relative distance from the object to the camera, and the precision is not very important. For most depth map, they can be roughly segmented into several flat or gradient region. Nowadays, the depth map may record by sensors directly or estimated by some algorithms from different views of the same scene [60]. By different acquiring method, some specific type of distortions may exist either in tex- ture or depth only, or exists in both. Hence it is spontaneous to find out the common and different effects for different scenarios. Distortion only on texture part The MOS of distorted images with distortion on texture part only is drawn in Fig. 3.8. The MOS have monotonously decreasing property within each minor slot, which is similar to the preset levels of distorted texture image. The viewer can easily perceive the distortion whenever there is any kind of distortion on the original texture image. An example image is shown in Fig. 3.9. 39 Figure 3.8: MOS of distorted images when distortion comes from texture part only. The horizontal axis is the image number, images with the same source are collected into one minor grid. Within each minor grid, there are 6 minor slots, corresponding to 6 distortion types. Within each minor slot, there are 4 images, corresponding to 4 levels. The vertical axis is the corresponding MOS. Figure 3.9: Example image when jpeg distortion is on texture part only. (a) is the jpeg compressed texture image, with second strongest distortion level. (b) is the original depth map. (c) is the left view of finally rendered stereoscopic image. 40 Figure 3.10: MOS with confidence interval of stereoscopic image with additive noise and transmission loss on depth map. The horizontal axis is the image number, images with the same source are collected into one minor grid. Within each minor grid, there are 2 minor slots, corresponding to 2 distortion types. Within each minor slot, there are 4 images, corresponding to 4 levels. The vertical axis is the corresponding MOS. Distortion only on depth part For stereoscopic images with distortion only on the depth part, the result is a little bit different. We divide the distorted images into two groups, the first group have additive noise and transmission loss on depth map, and the second group have gaussian blur, sampling blur, jpeg compression and jp2k compression on depth map. For the first group, the MOS is shown in Fig. 3.10. The final score is decreasing along the preset decreasing levels, which is quite similar to distortion on texture case. In this figure, the 95% confidence interval is added to confirm the decreasing property. An example image is shown in Fig. 3.11, which have strongest additive noise on the depth map. And the final rendered image looks quite weird. 41 Figure 3.11: Example image when additive noise is on depth part only. (a) is the original texture image. (b) is the depth map with additive noise, strongest level distortion. (c) is the left view of finally rendered stereoscopic image. For the second group, the MOS is shown in Fig. 3.12, and an example image with sampling blur on depth map is shown in Fig. 3.13. For most of the images, the MOS are fluctuating around a high score, and the confidence interval is about the same, they do not have monotonously decreasing property. 
From the short interview with assessors, similar conclusion could also be drew that they can not clearly discriminate most of these distorted images. However, there are two exceptions from Fig. 3.12. The images with number from 49 to 65 are coming from ’shark’ image, the first slot is for gauss blurred images, and the last slot is for sampling blurred images. For these two cases, the MOS is decreasing with the decreasing preset levels. The other exception comes from images with number 81 to 97 which belongs to the ’microWorld’ image. The second and third slot is for jpeg 2000 and jpeg compression. And the corresponding MOS are also decreasing along the preset levels. By analyzing the depth map of these two exceptions, we can partly get to the bottom of this phenomenon. For the ’shark’ image, there is an obvious salient object, and the difference of depth between the foreground 42 Figure 3.12: MOS with confidence interval of stereoscopic image with gaussian blur, jp2k compression, jpeg compression and sampling blur on depth map. The horizontal axis is the image number, images with the same source are collected into one minor grid. Within each minor grid, there are 4 minor slots, corresponding to 4 distortion types. Within each minor slot, there are 4 images, corresponding to 4 levels. The vertical axis is the corresponding MOS. and background is large. Blur on the edges of the foreground object in depth map makes the boundary have some apparent sculpture effect. For the ’microWorld’ image, the depth map has complex pattern comparing to other images. Lots of small objects exists in the image, and neighbor objects have distinguishable depth map. The jpeg and jpeg 2000 compression on the depth map twisted the final rendered image, which makes the stereoscopical perception unnatural. Human visual system is sensitive to these two effects. Distortion on both Considering the distortion on both cases, we follow the same division method used before according to the distortion types. For the first group, only additive white noise and the transmission loss is considered. The final MOS is shown in Fig. 3.14. For the 43 Figure 3.13: Example image when sampling blur is on depth part only. (a) is the original texture image. (b) is the depth map with sampling blur, second strongest level distortion. (c) is the left view of finally rendered stereoscopic image. second group, gaussian blur, jpeg compression, jpeg 2000 compression, and sampling blur are considered. The final MOS is shown in Fig. 3.15. In these two figures, the MOS score of the corresponding distorted image with the same distortion level but have separate distortion on each part are also added. From the MOS, some additive effect could be observed. For the first group, the MOS is greatly determined by the worst part when distortion exists on both part. And for the second group, the image quality with distortion on both part is further deteriorated. 3.3.2 Feedback from conversation In addition to the standard subjective test procedure, we collected some extra infor- mation from assessors on the perceptual feeling of the stereoscopic images by short 44 Figure 3.14: MOS of stereoscopic image with additive noise and transmission loss on both texture and depth map. The horizontal axis is the image number, images with the same source are collected into one minor grid. Within each minor grid, there are 2 minor slots, corresponding to 2 distortion types. Within each minor slot, there are 4 images, corresponding to 4 levels. 
The vertical axis is the corresponding MOS. Figure 3.15: MOS of stereoscopic image with gaussian blur, jp2k compression, jpeg compression and sampling blur on both texture and depth map. The horizontal axis is the image number, images with the same source are collected into one minor grid. Within each minor grid, there are 4 minor slots, corresponding to 4 distortion types. Within each minor slot, there are 4 images, corresponding to 4 levels. The vertical axis is the corresponding MOS. 45 interviews. It was complained that, comfortable 3D perception should be based on per- fect 2D texture images. Whenever there is some perceptible distortion on the texture image, i.e., the distortion level is lower than ’good’, it discomforts the viewers. And they would rather view 2D image instead of 3D image. And there is some consensus on the distortion types. Blur distortion on texture For stereoscopic image, any kind of blur on the texture image is unacceptable. It is well-known that human visual system fits well with images have sharp edge and bright color [75]. Since a large portion of our assessors had also taken the subjective test of TID2013 [76] 2D image database, it was said that, comparing to the 2D blurred image, the 3D images with blur on texture are much more annoying and unacceptable. Shattering distortion types Beside the regular distortion types which can also be seen in other databases, some new effects special for DIBR structure need to be concerned. With DIBR structure, the depth map represents the relative distance from the camera to the object. And whenever there is some twisting on the contour, it will destroy the structure of the object, and finally reflect the rendered image. These distortion types include additive noise, transmission loss and severe jpeg compression. And such kind of shattering distortion types should be avoided before rendering process. 46 Chapter 4 A ParaBoost Stereoscopic Image Quality Assessment (PBSIQA) System Although many factors should be taken into account for the 3D QA, most previous quality measures work only for specific types of degradation. Such measures, e.g. the widely used SSIM index [47], are generally based on analytical formulae to predict quality scores. However, analytical measures tend to perform poorly under conditions when their underlying assumptions are not satisfied. In this chapter, we propose a machine learning based parallel boosting system to provide a robust measure for stereoscopic image QA, which works for various distortion conditions, including different distortion types as well as different levels of distortion in left and right views. Learning techniques help us to avoid any parameter tuning, which restricts the adaptability of the formula-based metrics in varying conditions. In addition, we reveal the complex relationship among different kinds of distortions in a given pair of stereoscopic images by introducing a novel ensemble fusion-based framework. 4.1 Overview of the Proposed Quality Index The conceptual block-diagram of the proposed PBSIQA system is depicted in Fig. 4.1, which consists of two stages. At Stage I, multiple learning-based scorers are designed, where each scorer handles a specific type of distortion such as the blocking artifact, bur- ring distortion, additive noise, and so on. 
The output of each of these scorers is a normalized objective score with a value between 0 and 1 that considers only the target distortion type.

Figure 4.1: The conceptual block-diagram of the PBSIQA scoring system.

At Stage II, the fuser takes the scores from all individual scorers to yield the final quality score. The prediction model in the fuser is also obtained through a learning process. The availability of multiple scorers at Stage I enables us to handle the complex factors that influence human perceptual quality systematically. In particular, we investigate the impact of various distortions in texture images and depth maps on the quality of rendered images and then take them into account in the design of the participating scorers. We demonstrate the superiority and robustness of the proposed PBSIQA system with extensive experimental results over several databases covering the MVD format as well as the conventional stereoscopic format with symmetric and asymmetric distortions. Along this line, we compare the performance of the proposed PBSIQA system against 16 other 2D/3D quality indices.

4.2 MCL-3D, IVC-A and LIVE-A Databases

The MVD data format is adopted in the 3DVC standard due to its several advantages. For example, it allows the texture image and its corresponding depth map at a few sampled viewpoints to be compressed and transmitted in a visual communication system. After decoding them at the receiver side, several intermediate views between any two input views can be rendered using the depth image based rendering (DIBR) technique. However, very few 3D image quality databases have been built on the MVD data format. This is one of the main motivations for constructing the MCL-3D database [74]. The MCL-3D database is designed to investigate the impact of texture and/or depth map distortions on the perceived quality of the rendered image so as to develop a robust stereoscopic image quality index. First, we carefully chose nine texture images and their associated depth maps from a set of 3DVC test sequences as references. Only the texture images are shown in Fig. 3.3. This reference set contains indoor/outdoor scenes, CG images, different depth perception, and so on. Fig. 4.2 shows the distortion design. There are three original views, and each view consists of a texture image and its associated depth map (denoted by O_T1/O_D1, O_T2/O_D2, and O_T3/O_D3). Distortions of different types and levels were applied to the original images and/or depth maps. Then, the distorted texture images and depth maps were input to the view synthesis reference software (VSRS 3.5 [56]) to render the two intermediate views, D_VL and D_VR, that an assessor sees in the subjective test. We applied six different distortions with four different levels to the three input views symmetrically: Gaussian blur, sampling blur, JPEG compression, JPEG-2000 compression, additive noise, and transmission error. Based on the recommendations of ITU [71], we consider five quality levels in subjective tests.
The original reference stereoscopic images have excellent quality while the other 4-level distorted images were controlled by parameters associated with different distortion types. The reader can find more detailed 49 Figure 4.2: Distortion generation in the MCL-3D database. information about the parameters in [74]. The whole MCL-3D database can be divided into the following three sub-databases. Dataset A contains texture distortions only. Dataset B contains depth map distortions only. Dataset C contains both texture and depth map distortions. The above design allows us to investigate the effect of txture distortions and depth map distortions independently. We conducted a subjective test among 270 people to obtain the mean opinion score (MOS). The test was performed in a controlled environment as recommended by VQEG [48]. The display equipment was 46.9” LG 47LW5600. The subjective test results were verified by a screening process according to Annex 2 of Recommendation BT.500 [77], and ourliers were removed. In our earlier work [78], we also constructed two new databases by expanding the IVC [37] and the LIVE [38] databases and call them IVC-A and LIVE-A, respectively. 50 Table 4.1: The summary of MCL databases MCL Databases MCL3D MCL1 MCL2 Distortion Reference Source Nine texture/depthmap images from 3DVC test sequences Six stereoscopic image pairs from IVC [37] database Six stereoscopic image pairs from LIVE Phase I [38] database # of test images 648 144 Distortion type 6 (gaussian blur / sampling blur / JPEG compression / JPEG2000 compression / additive noise / transmission error) 4 (gaussian blur / JPEG compression / JPEG2000 compression / additive noise) Distortion level 4 4 Data format MVD (Multi-View plus Depthmaps) Conventional stereoscopic image (only texure image) Distortion design Whole dataset can be divided into three sub-databases. - Dataset A: Distortions only in texture images - Dataset B: Distortions only in depthmap images - Dataset C: The same distortion in both texture & depthmap images Each test stereoscopic pair has six combinations of distortion per each distortion type as follows: -[1,1], [1,2], [1,3], [2,2], [2,3] and [3,3] where [l, r] denotes the distortion levels of left and right images & [1]: the strongest, [2]: moderate, and [3]: the weakest distortion. Remark Rendering distortion (two intermediate views are rendered via VSRS3.5) Contain both symmetric and asymmetric distortions Subjective Test methodology Pair-wise comparison ACR-HR (w/ 5 grading scales [1:Bad 5:Excellent]) participant 270 people (men: 170, women:100 / ages: 21 42) 20 people (men:19, women:1 / ages: 24 37) Score type MOS MOS We took 12 stereoscopic image pairs from IVC and LIVE as references. While the dis- tortions in IVC and LIVE are symmetric, we add asymmetric distortions in IVC-A and LIVE-A. That is, we add distortions of different levels but the same type to the left and right views of a stereo image pair. There are four distortion types. Each stereoscopic image pair has six combinations of distortion levels: three symmetric distortion levels ([1; 1]; [2; 2]; [3; 3]) and three asymmetric distortion levels ([1; 2]; [1; 3]; [2; 3]). We con- ducted a subjective test to obtain MOS using the absolute category rating (ACR) [71]. A summary of the three SIQA databases used in our test is given in Table 4.1. 4.3 Proposed PBSIQA System The impact of texture distortions and depth map distortions on rendering quality is dif- ferent through a view synthesis process. 
In this section, we first design scorers tailored to texture distortions and depth map distortions in Sec. 4.3.1 and Sec. 4.3.2, respec- tively, as shown in the first stage of Fig.4.1. The design of these scorers and the score 51 (a) Impact of texture distortion (b) Impact of depth map distortion Figure 4.3: The impact on a rendered image of different distortion sources. fuser is implemented by support vector regression as detailed in Sec. 4.3.3. Finally, the training and test procedures are described in Sec. 4.4. 4.3.1 Scorer Design for Texture Distortions In Dataset A of the MCL-3D database, distortions are applied to the texture image while the depth map is kept untouched. In this case, we see a similar distortion effect in the rendered stereo image. For example, the texture image in Fig. 4.3a is distorted by a transmission error, a similar distortion type is observed in the rendered view since pixel values of the input texture image are direct sources to the pixel intensity of the rendered view. Furthermore, it is reported in [78] that the interaction between left and right views in perceived 3D image quality depends on the distortion type. For example, for blurring 52 Table 4.2: Features used for the scorers Type Feature Ref Type Feature Ref Pixel difference MD (Maximum Difference) [80] Structural Similarity SSIM index [47] MAE (Mean Absolute Error) [81] SSIM Luminance [47] PSNR - SSIM Contrast [47] ABV (Average Block Variance) - SSIM Similarity [47] MIN (Modified Infinity Norm) [81] UQI (Universal Quality Index) [84] Blockiness - SVD related Singular Value [83] AADBIIS (Average Absolute Difference Between In-Block Samples) [20] Singular Vector [83] Edge related ES (Edge Stability) [81] Spectral Difference ZCR (Zero Crossing Rate) [20] AES (Average Edge Stability) [20] PC (Phase Congruency) [82] PRATT [81] Contrast measure GM (Gradient Magnitude) [82] Image Correlation MAS (Mean Angle Similarity) [81] HVS HVS-MSE [79] NCC (Normalized Cross Correlation) [81] View synthesis NDSE (Noticeable Depth Synthesis Error) [89] and JPEG-2000 coding distortions, the lost information of a low quality view tends to be compensated by a high quality view. Thus, the perceptual quality is closer to that of the high quality view. On the other hand, for additive noise and JPEG coding distortions, a high quality view is negatively influenced by a low quality view. For this reason, distor- tions should be carefully classified. We classify distortion types into multiple groups, and design good scorers for them. For each scorer at Stage I in Fig. 4.1, proper features are extracted from input texture images and trained by a learning algorithm. Then, the scorer outputs an intermediate score for the target distortion group. Based on previous studies on image quality assessment [79, 80, 81, 82, 83, 84, 90] and our own experience, we select 24 candidate features for further examination, and they are listed in Table 4.2. Then, we calculated the Pearson correlation coefficient 53 Table 4.3: The results of single feature analysis: DS-A and DS-B represent Dataset A and Dataset B, respectively. 
Feature PCC for whole Dataset PCC for sub-dataset with single distortion type WN Gauss JP2K JPEG Sample Trans DS-A DS-B DS-A DS-B DS-A DS-B DS-A DS-B DS-A DS-B DS-A DS-B DS-A DS-B ES 0.83 0.44 0.82 0.57 0.82 0.26 0.87 0.45 0.85 0.47 0.69 0.13 0.62 0.51 MAS 0.60 0.15 0.89 0.30 0.44 0.15 0.63 0.17 0.71 0.20 0.43 0.10 0.74 0.09 MD 0.73 0.63 0.92 0.67 0.77 0.82 0.82 0.68 0.75 0.31 0.62 0.80 0.73 -0.17 NCC 0.34 0.16 0.29 0.40 -0.41 -0.39 0.60 -0.11 0.49 -0.03 0.65 -0.56 0.25 0.43 NDSE 0.60 0.72 0.94 0.81 0.41 0.79 0.43 0.66 0.49 0.50 0.48 0.81 0.74 0.75 SSIM index 0.64 0.67 0.90 0.68 0.44 0.73 0.53 0.70 0.66 0.48 0.52 0.75 0.73 0.69 UQI 0.68 0.45 0.72 -0.06 0.70 0.72 0.59 0.44 0.72 0.42 0.73 0.71 0.52 0.33 AADBIIS 0.27 0.52 0.95 0.84 -0.59 0.42 0.43 0.51 -0.71 0.45 -0.47 0.67 -0.53 0.42 ABV -0.20 0.38 0.92 0.82 -0.55 0.34 -0.42 0.50 -0.43 -0.20 -0.49 0.78 0.80 0.46 AES 0.33 0.48 -0.61 0.48 0.58 0.48 0.68 0.67 -0.42 -0.11 0.60 0.43 0.49 0.83 BLOCKINESS 0.29 -0.03 0.18 0.49 -0.21 -0.10 0.77 0.46 0.81 0.42 0.30 -0.12 -0.51 -0.04 GM 0.78 0.69 0.93 0.87 0.74 0.37 0.80 0.69 0.82 0.59 0.73 0.43 0.60 0.72 HVS MSE 0.60 0.36 0.78 0.60 0.50 -0.27 0.66 0.31 0.79 0.56 0.76 -0.24 0.57 0.35 MAE 0.68 0.63 0.94 0.84 0.55 0.37 0.58 0.65 0.67 0.51 0.57 0.43 0.71 0.72 MIN 0.70 0.63 0.94 0.80 0.63 0.73 0.62 0.67 0.63 0.48 0.73 0.75 0.79 0.18 PC 0.78 0.32 0.87 0.90 0.74 -0.19 0.82 0.34 0.90 0.64 0.70 -0.14 0.62 0.68 PRATT 0.78 0.56 0.85 0.39 0.80 0.39 0.77 0.60 0.81 0.51 0.77 0.29 0.53 0.75 PSNR 0.77 0.68 0.96 0.84 0.73 0.79 0.73 0.68 0.70 0.52 0.75 0.81 0.81 0.58 SSIM CON 0.68 0.68 0.92 0.68 0.55 0.73 0.51 0.71 0.76 0.56 0.61 0.76 0.75 0.65 SSIM LUM 0.78 0.62 0.80 0.38 0.74 0.61 0.66 0.66 0.76 0.45 0.86 0.78 0.60 0.54 SSIM SIM 0.58 0.49 0.94 0.60 0.30 0.63 0.50 0.65 0.31 -0.08 0.37 0.46 0.77 0.40 ZCR 0.64 0.17 0.92 -0.44 0.76 0.14 0.67 0.24 0.86 -0.44 0.71 0.33 -0.35 0.48 SVD 0.78 N/A 0.76 N/A 0.76 N/A 0.85 N/A 0.88 N/A 0.80 N/A 0.62 N/A (PCC) to indicate the prediction accuracy between a feature value and a single-feature- based quality scorer (with the exception that the singular value and the singular vector are integrated into one feature vector for the SVD scorer) over the whole Dataset A and its six sub-databases that contain one distortion type. The PCC results of the 23 single-feature-based scorers are shown in one row of Table 4.3. Based on these results, we exclude five features (namely, NCC, AADBIIS, ABV , AES, and blockiness) whose corresponding scorers have low PCC values for the whole Dataset A. For the remaining ones, we investigate which features are suitable for which distortion type. For instance, ES are Pratt are useful features to be included in the learning-based scorer targeting at the blurring distortion. In a similar way, we assign candidate features to several scorers as described below. 54 Scorer#1forBlurringDistortion: Burring distortion is mostly caused by low pass filter- ing, down-sampling, and compression (e.g. JPEG2000) due to the loss of high frequen- cies. This kind of distortion is referred to as the information loss distortion (ILD) [18]. The perceptual quality of a blurred stereoscopic image pair is closer to that of the high quality view since the structural component of the high quality view is preserved against blurring of the low quality view. Blurring distortion can be observed around edges most obviously. Human are also sensitive to misalignments between the edges of left and right views. 
Thus, we use two edge-related features to measure blurriness in an image introduced below. The first one is the edge stability mean squre error (ESMSE) [81], which character- izes the consistency of edges that are evident across multiple scales between the original and distorted images. To compute ESMSE, we first obtain edge maps with five differ- ent standard deviation parameters using the Canny operator. The output at scale m is decided by thresholding as E(r;c; m ) = 8 < : 1 ifC m (r;c)>T m 0 otherwise (4.1) where C m (r;c) is the output of the derivative of the Gaussian operator at the mth scale. An edge stability map ES(r;c) is obrained by the longest sequence E(r;c; m ); ;E(r;c; m+l1 ) of edge maps such that ES(r;c) =l; (4.2) where l =arg max \ m k m+l1 E(r;c; k ) = 1: (4.3) 55 The edge stability indices for the original and the distorted images at pixel location (r;c) denoted byQ(r;c) and ^ Q(r;c), respectively. Then, the ESMSE value is calculated by summing the edge stability indices over all edge pixel positions,n d , of the edge pixels of the original image as ESMSE = 1 n d n d X r;c=0 [Q(r;c) ^ Q(r;c)] 2 : (4.4) The other one is Pratt’s measure [81] that considers the accuracy of detected edge locations, missing edges and false alarm edges. The quantity is defined as Pratt’s Measure = 1 maxfn d ;n t g n d X i=1 1 1 +ad 2 i ; (4.5) wheren d andn t are the number of the detected and the ground truth edge points, respec- tively, andd i is the distance to the closest edge for theith detected edge pixel. Scorer #2 for Blocking Distortion: Blocking distortion is one of the most common dis- tortions generated by block-based image/video coders, such as JPEG, H.264 and HEVC. When stereoscopic images with blocking distortion are shown, the average quality of both views or that of the lower quality is perceived [9, 78]. The high quality image is negatively influenced by the low quality image since blocking distortion introduces new visual artifacts that do not exist in the pristine original scene. Such artifacts are called the information additive distortion (IAD) [18]. As stated in [20, 79], there are two good features in detecting blockiness. They are the HVS-modified MSE (HVS MSE) and the zero crossing rate (ZCR). Mathematically, the HVS MSE [79] is given by HVS MSE = ( 1 RC R X r=1 C X c=1 jUfIgUf ^ Igj 2 ) 1 2 ; (4.6) 56 whereRC is the image size,I and ^ I denote original and distorted images, respectively, UfIg = DCT 1 fH( p u 2 +v 2 ) (u;v)g is the inverse discrete cosine transform (DCT) ofH( p u 2 +v 2 ) (u;v), and whereH(), = p u 2 +v 2 , is an HVS-based band-pass filter and (u;v) is the DCT of image I. The ZCR along the horizontal direction is given by ZCR h = 1 R(C 2) R X r=1 C2 X c=1 z h (r;c) where z h (r;c) =f 1; horizontal ZC atd h (r;c) 0; otherwise (4.7) and whered h (r;c) = ^ I(r;c + 1) ^ I(r;c), c2 [1;C 1]. The vertical zero crossing rate, ZCR v , can be computed similarly. Finally, the overall ZCR is given by ZCR = ZCR h + ZCR v 2 : (4.8) Scorer #3 for Additive Noise: Additive noise is often caused by thermal noise during a scene acquisition process. If either view is distorted by additive noise, the perceived quality is degraded by the low quality image. A pixel-based difference is a good index for additive noise. We select three such features to characterize additive noise: the peak-signal-to-noise ratio (PSNR), the maximum difference (MD) [80] and the modified infinity norm (MIN) [81]. 
They are defined as PSNR = 10 log 10 255 2 MSE ; (4.9) where MSE = 1 RC P R r=1 P C c=1 (I(r;c) ^ I(r;c)) 2 ; MD = maxjI(r;c) ^ I(r;c)j; (4.10) 57 and MIN = v u u t 1 r r X i=1 4 2 i (I ^ I); (4.11) where4 i (I ^ I) denotes the ith largest deviation among all pixels. In this work, we select the top 25% deviations. Scorer #4 for Global Structural Error: We consider structural errors since the HVS is highly sensitive to the structural information of the visual stimuli. Structural errors can be captured by three components of the SSIM index [47], which are luminance, contrast and structural similarities between imagesx andy. They are defined as L(x;y) = 2 x y +C 1 2 x 2 y +C 1 ; (4.12) C(x;y) = 2 x y +C 2 2 x 2 y +C 2 ; (4.13) S(x;y) = 2 xy +C 3 x 2 y +C 3 ; (4.14) where,, and xy denote the mean, the standard deviation and the covariance, respec- tively. Scorer #5 for Local Structural Error: The performance of an image quality scorer can be improved by applying spatially varying weights to structural errors. Salient low-level features such as edges provide important information to scene analysis. Two features were proposed in [82] at this end: phase congruency (PC) and gradient magnitude (GM). Physiological studies show that PC provides a good measure for the significance of a local structure. Readers are referred to [82] for its detailed definition and properties. While PC is invariant with respect to image contrast, GM does take local contrast into 58 account. Scorer #6 for Object Structure & Luminance Change: We utilize singular vectors and singular values as features. Given an imageI, we can decompose it into the product of two singular vector matricesU andV and a diagonal matrix in form of U = [u 1 u 2 :::u R ]; V = [v 1 v 2 :::v C ]; = diag( 1 ; 2 ;:::; l ); (4.15) where u i and v j are column vectors, k is a singular value and l = min(R;C). It was shown in [83] that the first several singular vectors offer a good set to represent the structural information of objects while subsequent vectors account for finer details. Furthermore, singular values are related to the luminance changes. They should be con- sidered since the luminance mismatch between left and right views results in annoying viewing experience. Scorer #7 for Transmission Error: Packet loss and bit errors occur during stereo image data transmission. It appreas in form of block errors since most image coding standards adopt the block-based approach. Based on the feature analysis, both the universal quality index (UQI) [84] and the mean angle similarity (MAS) [81] are useful in characterizing transmission error. The UQI is defined as UQI = xy x y 2 x y 2 x + 2 y 2 x y 2 x + 2 y ; (4.16) which is equal to the product of three terms representing the loss of correlation, the luminance distortion, and the contrast distortion betwenn two imagesX andY , respec- tively. MAS is a feature that measures the statistical correlation between pixel vectors of the original and the distorted images since similar colors will result in vectors pointing 59 to the same direction in the color space. The moments of the spectral chromatic vector are used to calculte the correlation as MAS = 1 1 N 2 N X r;c=1 2 cos 1 hC(r;c); ^ C(r;c)i kC(r;c)kk ^ C(r;c)k ; (4.17) where C(r;c) and ^ C(r;c) indicate the multispectral pixel vectors at position (r;c) for original and distorted images, respectively. For each scorer, three feature values fromV 1,V 2, andV 3 are fed into the learning- based module as shown in Fig. 2.6. 
We entrust the task of seeking an optimal relation among these features to a learning algorithm, where the impact of the quality difference between left and right views on the perceptual quality highly depends on the distortion type. 4.3.2 Scorer Design for Depth Distortions Dataset B of the MCL-3D database contains depth map distortions only. Research on the effect of the depth distortion on rendered stereo image quality has been conducted recently by quite a few researchers, e.g., [85, 86, 87, 88]. Generally speaking, the depth value is inversely proportional to the horizontal disparity of rendered left and right views [63, 64, 65, 66, 67] so that the horizontal disparity distortion appears in form of geometric errors [54, 55]. An example is given in Fig. 4.3b, where noise is added to the depth map and the geometric error can be easily observed around edge regions. Scorer #8 for Geometric (Horizontal Disparity) Error: Zhao et al. [89] proposed a Depth No-Synthesis-Error (D-NOSE) model by exploiting that the depth information is typically stored in 8-bit grayscale format while the disparity range for a visually com- fortable stereo pair is often far less than 256 levels. Thus, multiple depth values do 60 correspond to the same integer (or sub-pixel) disparity value in the view synthesis pro- cess. In other words, some depth distortion may not trigger geometrical changes in the rendered view. Specifically, if a pixel distortion of the depth map falls into the range defined by the D-NOSE profile, it does not affect the rendered image quality in terms of MSE. Being motivated by the D-NOSE model, we define a noticeable depth synthesis error (NDSE) feature for geometric errors as NDSE = X i jDfig ^ Dfigj; (4.18) whereD and ^ D represent the original and the distorted depth maps, respectively, andi is a pixel index out of the range of D-NOSE profile which is defined as D-NOSE(v) = [v + (v);v + + (v)]; (4.19) and where (v) = dD 1 ( d(D(v))N 1e N )ev; (4.20) + (v) = bD 1 ( d(D(v))Ne N )cv; (4.21) and wherev is a quantized depth value,D(v) is a disparity function with regard tov, andN represent an offset error for rounding operation and 1=N precision, respectrively, de andbc denote the ceiling and the floor operations. For more details on the D-NOSE profile, we refer to [89]. Since NDSE only considers pixel distortions that change the original disparity value, it has the highest prediction accuracy (with PCC=0.73) among all features in Table 4.3. However, as compared to those in Dataset A, most features have relatively low PCC 61 (a) MOS values and their confidence level for stereoscopic images caused by additive noise and transmission errors. (b) MOS values and their confidence level for stereoscopic images caused by Gaussian blur and JPEG compression. Figure 4.4: The MOS values as a function of the depth map distortion level, where the horizontal axis is the image index. Images from the same source include two distortion types with four distortion levels, respectively. values in Dataset B. To investigate further, we divide distorted images into two groups. The first group includes additive noise and transmission errors while the second group includes the Gaussian blur, sampling blur, JPEG compression, and JPEG 2000 com- pression. For the first group, as the distortion level becomes higher, the MOS decreases monotonically as shown in Fig. 4.4a. However, we cannot observe such a coherent MOS movement trend for the second group in Fig. 4.4b. 62 This phenomenon can be explained below. 
Differences among neighboring pixels caused by the first group of distortions such as additive noise or transmission errors are generally large, leading to scattered geometric distortions in the rendered image espe- cially around object boundaries. They tend to get worse as the distortion level increases. On the other hand, differences among neighboring pixels caused by the second group of distortions are often small and changing gradually. We may not be able to recognize them easily although geometric errors exist in the rendered image. Scorer#9-Formula-basedMetric: Due to the weaker correlation between the MOS level and the distortion level as shown in Fig. 4.4b, it is diffucult to obtain high prediction accuracy via a learning-based algorithm from features of the input depth map directly. To overcome this challenge, we exploit the auxiliary information from two rendered views. That is, we use the SDP index [78] that is computed based on left/right rendered views as a candidate scorer. It helps boost the prediction accuracy as presented in Section 4.5. 4.3.3 Learning-based Scorers and Fuser Design The proposed PBSIQA system composed by nine quality scorers is summarized in Fig. 4.5. To yield an intermediate score from scorers #1#8, we adopt the support vector regression (SVR) technique. Consider a set of training data (x n ;y n ), wherex n is a feature vector andy n is the target value, e.g. the subjective quality score of thenth image. In the"-SVR [92], [94], the objective is to find a mapping functionf(x n ) that has a deviation at most" from the target value, y n , for all training data. The mapping function is in form of f(x) =w T (x) +b; (4.22) 63 Fuser Predicted Quality Score S L1 S L2 S L3 S L4 S L5 S L6 S L7 S L8 Scorer L E/ Scorer L E) Scorer L Ex Scorer L EH Scorer L E_ Scorer L EZ Scorer L ER Scorer L EU S F1 S F2 Scorer F E/ Scorer F E) for Blocking Artifact for Blurring Distortion for Additive Noise for Global Structural Error for Local Structural Error for Luminance Change for Transmission Error for Geometric Error SDPpmertic Score MMpSSIM Score ThreeNInputNViewsNPV/NwNV)NwNVx- TextureNImagesNPT/NwNT)NwNTx- DepthmapNImagesNPD/NwND)NwNDx- STAGEN) PScoreNspace- STAGEN/ PFeatureNspace- HVS_MSE ZCR Pratt ES PSNR MD MIN SSIM_Lum SSIM_Cont SSIM_Sim PC GM SingularNVector SingularNValue UQI MAS SMPE PSNR Figure 4.5: The diagram of the proposed PBSIQA system. wherew is a weighting vector,() is a non-linear function, and b is a bias term. We should findw andb satisfying the following condition: jf(x n )y n j"; 8n = 1; 2;:::;N t ; (4.23) whereN t is the number of training data. In [92], it was proven that w = N S X i=1 ( i i )(x i ); 0 i ; i C; (4.24) where i and i are Lagrange multipliers, C is a penalty parameter, andN S is the number of support vectors. The support vectors are defined as those data points satisfying the inequality in (4.23). The corresponding i and i are equal to zero for the Karush-Kuhn- Tucker (KKT) conditions to be met [92]. Sincew is defined only by support vectors, the 64 complexity off() is determined by the number of the support vectors. 
By incorporating (4.24) into (4.22), we have f(x) =w T (x) +b = N S X i=1 ( i i )(x i ) T (x) +b = N S X i=1 ( i i )K(x i ;x) +b; (4.25) whereK(x i ;x j ) is a kernel function K(x i ;x j ) =(x i ) T (x j ): (4.26) Although there are many kernels such as linear, polynomial and sigmoid, we use the radial basis function (RBF) in form of K(x i ;x j ) = exp(kx i x j k 2 ); > 0 (4.27) where is the radius controlling parameter, since it provides good performance in appli- cations in general [93]. Since it is not easy to determine a proper" value, we use a different version of the regression algorithm called the -SVR [94], where 2 (0; 1) is a control parameter to adjust the number of support vectors and the accuracy level, so that " becomes a variable in an optimization problem. Then, we can obtain the samef(x) andw more conveniently. At Stage 2 of the PBSIQA system, we fuse all intermediate scores from the scorers at Stage 1 to determine the final quality score. We adopt the-SVR algorithm to implement the fuser. Suppose that there aren scorers withm training stereoscopic image pairs. For theith training pair, we compute the intermediate quality scores i;j , wherei = 1; 2;:::;m 65 is the stereoscopic image pair index and j = 1; 2;:::; 9 is the scorer index. Let s i = (s i;1 ;s i;2 ; ;s i;9 ) be the intermediate score vector for theith image pair. We train the fuser usings i with all image pairs in the training set and determine the weighting vector w and the bias parameterb accordingly. Finally, the PBSIQA-predicted quality index for a given stereoscopic image pair in the test set can be found via Q(s) =w T (s) +b: (4.28) 4.3.4 Training and Test Procedures The n-fold cross validation [95] is a common strategy to evaluate the performance of a learning-based algorithm to ensure reliable results and prevent over-fitting, where the data are split inton chunks, and one chunk is used as the test data while the remaining n 1 chunks are used as training data. The same experiment is repeated n times by employing each of the n chunks as the test data. Finally, the overall performance is determined by an average of the test results over then chunks. In the proposed frame- work, training data are used to generate a regression model of each scorer at Stage 1. Then, a regression model of the fuser is obtained by training all intermediate scores from Stage 1 as input features withn-fold cross validation at Stage 2, where the number of samples is the same as the number of stereoscopic image pairs in then1 chunks. In all reported experiment results, we use the 10-fold cross validation. In addition, a feature scaling operation is performed before the training and test processes. It is conducted to avoid features of a larger numeric range dominating those of a smaller numeric range. For example, the PSNR value has a larger range of values than the other two features in scorer #3. We scale the feature values of each scorer to the unit range of [0,1] at Stage 1. 66 At the training stage, our goal is to determine the optimal weighting vectorw and biasb that minimize the error between MOS and the predicted score, i.e., X i jMOS i Q(s i )j 2 : (4.29) Since we use RBF as a kernel function, two parametersC and should be optimized to achieve the best regression accuracy. We conduct parameter search onC and at the training stage using the cross validation scheme in [93]. Various pairs of (C; ) are tried, and the one with the best cross validation accuracy is selected. 
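As an illustration of this parameter search, the following sketch performs a cross-validated grid search over (C, γ) for an RBF-kernel ν-SVR scorer. The use of scikit-learn and the specific search grids are assumptions for illustration; the dissertation does not specify the SVR package or grid values used.

```python
# Illustrative grid search over (C, gamma) for an RBF-kernel nu-SVR scorer.
# The grids and the scikit-learn implementation are assumptions, not the
# dissertation's exact setup.
from sklearn.svm import NuSVR
from sklearn.model_selection import GridSearchCV

def train_scorer(features, mos):
    # features: (num_images, num_features), scaled to [0, 1]; mos: subjective scores
    param_grid = {
        "C": [2 ** k for k in range(-2, 9, 2)],
        "gamma": [2 ** k for k in range(-8, 3, 2)],
    }
    search = GridSearchCV(
        NuSVR(kernel="rbf", nu=0.5),
        param_grid,
        scoring="neg_mean_squared_error",
        cv=10,                      # 10-fold cross validation, as in the text
    )
    search.fit(features, mos)
    return search.best_estimator_   # refit on the whole training set with the best (C, gamma)
```

With the default refit behavior, the model returned for the best (C, γ) pair is retrained on the entire training set, mirroring the procedure described in the text.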
After the best (C; ) pair is found, the whole training set is used again to generate the final scorer. At the test stage, we use the intermediate score vectors i in Eq. (4.28) to determine the quality score. The test can be quickly done since all model parameters are decided at the training stage. 4.4 Learning procedure The n-fold cross validation [95] is a common strategy to evaluate the performance of a learning-based algorithm to ensure reliable results and prevent over-fitting. The data is split into n chunks, then one chunk is used as test data while the remaining n 1 chunks are used as training data. The experiments are repeatedn times by employing each of then chunks as test data. Finally, the overall performance is determined by an average of the tests over the n chunks. In the proposed framework, training data are used at stage 1 to generate a regression model of each learning-based scorer. At stage 2, a regression model of the fuser is obtained by training all the intermediate scores from stage 1 as input features withn-fold cross validation again, where the number of samples is the same as the number of stereoscopic image pairs in then 1 chunks. In all the experiments, we use 10-fold cross validation. 67 In addition, a feature scaling is performed before the SVR training stage. This pro- cess is necessary to avoid features in greater numeric ranges dominating those in smaller numeric range. For example, in scorer #3, PSNR has a wider and higher range of values than the other two features. Therefore. we scale the feature values of each scorer at stage1 to the unit range of [0,1]. At the training stage, our goal is to determine the optimal weighting vectorw and biasb that minimize the error between MOS and the predicted score, given by jjMOS i Q(s)jj; i = 1; 2;:::k (4.30) wherejjjj is the absolute difference. Since we use RBF as a kernel function, two parameters C and should be optimized to achieve the best regression accuracy. We perform the parameter search onC and during the training stage using the cross vali- dation scheme in [93]. Various pairs of (C; ) are tried, and the one with the best cross validation accuracy is selected. After the best (C; ) is decided, the whole training set is used again to generate the final scorer. At the testing stage, we use the intermediate score vectors i in Eq. (4.28) to deter- mine the quality score. The test can be done fast once we get the models from the training stage. 4.5 Performance Evaluation To evaluate the performance of the proposed PBSIQA index, we follow the suggestions of the VQEG report [96] and use three performance measures: the Pearson correlation coefficient (PCC), the Spearman rank-order correlation coefficient (SROCC), and the root mean squared error (RMSE). Before calculating PCC and SROCC, we apply the 68 monotonic logistic function to the predicted quality scores so as to fit the subjective quality scores (MOS) and remove any nonlinearity via f(s) = 1 2 1 +exp( s 3 j 4 j ) + 2 ; (4.31) wheres andf(s) are the predicted score and the fit predicted score, respectively, and k (k = 1; 2; 3; 4) are the parameters to minimize the mean squared error betweenf(s) and MOS. 4.5.1 Performance Comparison of Stage I Scorers First, we compare the performance of eight learning-based scorers (Scorers #1-8) and show that each scorer truly offer good performance for its target distortion type. 
Here, we consider the MLC3D database only, and divide it into six sub-databases so that each contains one distortion type as described in Section 4.2. The six distortion types are listed in the top row of Table 4.4. We compute the PCC value of each scorer against each sub-database and rank its effectiveness in the descending order of PCC values indi- vidually. Table 4.4 summarizes the results of scorers in Stage 1 only (namely, without fusion in Stage 2). The performance of scorers #1, #2, and #3 matches well with their target design. First, scorer #1 has the highest correlation on the blurred and JPEG2000 compression sub-databases. This is reasonable since, besides the ringing artifect, the main distortion of the JPEG2000 compression sub-database is blurring. Scorer #2 is designed for the blocking artifact, and the JPEG compression database has the top rank. Scorer #3 has the best performance on additive noise database as designed. Scorer #4 (for global structural error) and scorer #5 (for local structural error) share strong similarity in their rank orders. Specifically, they provide the first and second best 69 Table 4.4: Performance comparison among 6 sub-databases in terms of the rank and PCC values (inside the parathesis) with respect to each of the eight learning-based scor- ers, where the top-ranked sub-database is marked in bold. Scorer # Target Distortion A WN sub-database Gaussian sub-database JP2K sub-database JPEG sub-database Sample sub-database Trans. Loss sub-database 1 Blurring Distortion 3 (0.80) 1 (0.89) 2 (0.84) 4 (0.77) 5 (0.70) 6 (0.57) 2 Blocking Artifact 3 (0.81) 6 (0.67) 2 (0.82) 1 (0.88) 4 (0.80) 5 (0.70) 3 Additive Noise 1 (0.89) 5 (0.72) 2 (0.77) 4 0.75) 6 (0.66) 3 (0.76) 4 Global Structural Error 1 (0.90) 4 (0.78) 3 (0.79) 2 (0.84) 6 (0.65) 5 (0.66) 5 Local Structural Error 1 (0.91) 4 (0.79) 3 (0.81) 2 (0.82) 5 (0.72) 6 (0.66) 6 Luminance Change & Scene Structural Error 4 (0.73) 5 (0.7) 2 (0.84) 1 (0.87) 3 (0.76) 6 (0.66) 7 Transmission Error 2 (0.82) 5 (0.64) 6 (0.63) 3 (0.68) 3 (0.68) 1 (0.84) 8 Geomeric Error 1 (0.92) 6 (0.39) 4 (0.60) 2 (0.70) 5 (0.40) 2 (0.70) performance in additive noise and JPEG compressed databases. As observed in [78], the information additive distortion (IAD) is more obvious than the information loss dis- tortion (ILD) among all structural distortions. Additive noise and blockiness are typical exmaples of IAD. Our observation is consistent with that in [78]. Scorer #6 yields the best result for the JPEG compressed sub-database since we may see color/luminance chages if the compression ratio is too high, which can be captured by the singular value feature of Scorer #6. However, it has poor correlation with human subjective experience on the additive noise sub-database. This implies that, although scorers #4, #5 and #6 are designed for structural errors in common, they capture different aspects. Scorer #7 offers the best result for transmission errors to meet the target design. Last, scorer #8, designed to capture geometric distortions, provides the best result in databases with additve noise and transmission errors, which was already explained in Section 4.3.2. 70 Table 4.5: Cumulative performance improvement via fusion against Dataset C of the MCL-3D database. Accumulation of Scorers Base Performance (#8: Geometric Error) Blurring Distortion (#2) Additive Noise (#3) Blocking Artifact (#1) Transmission Error (#7) Structural Error (#4 #5) Luminance Change (#6) Perf. Inc. Perf. Inc. Perf. Inc. Perf. Inc. Perf. Inc. Perf. Inc. 
A WN sub-database PCC 0.92 0.78 -17.9% 0.90 13.3% 0.90 0.0% 0.90 0.0% 0.91 1.1% 0.92 1.1% SROCC 0.89 0.77 -15.6% 0.89 13.5% 0.89 0.0% 0.89 0.0% 0.90 1.1% 0.90 0.0% RMSE 0.90 1.65 45.5% 1.07 -54.2% 1.07 0.0% 1.04 -2.9% 1.09 4.6% 1.13 3.5% Gaussian Blur sub-database PCC 0.35 0.86 59.3% 0.89 3.4% 0.93 4.3% 0.93 0.0% 0.94 1.1% 0.95 1.1% SROCC 0.24 0.86 72.1% 0.86 0.0% 0.92 6.5% 0.92 0.0% 0.93 1.1% 0.94 1.1% RMSE 1.62 0.78 -107.7% 0.68 -14.7% 0.46 -47.8% 0.49 6.1% 0.44 -11.4% 0.42 -4.8% JP2K sub-database PCC 0.53 0.78 32.1% 0.87 10.3% 0.92 5.4% 0.92 0.0% 0.93 1.1% 0.95 2.1% SROCC 0.63 0.77 18.2% 0.86 10.5% 0.92 6.5% 0.92 0.0% 0.92 0.0% 0.95 3.2% RMSE 1.79 1.42 -26.1% 0.86 -65.1% 0.67 -28.4% 0.66 -1.5% 0.53 -24.5% 0.52 -1.9% JPEG sub-database PCC 0.81 0.79 -2.5% 0.83 4.8% 0.93 10.8% 0.94 1.1% 0.95 1.1% 0.97 2.1% SROCC 0.79 0.70 -12.9% 0.80 12.5% 0.90 11.1% 0.92 2.2% 0.94 2.1% 0.97 3.1% RMSE 1.00 1.03 2.9% 0.85 -21.2% 0.56 -51.8% 0.47 -19.1% 0.38 -23.7% 0.39 2.6% Sample Blur sub-database PCC 0.45 0.84 46.4% 0.78 -7.7% 0.87 10.3% 0.90 3.3% 0.91 1.1% 0.93 2.2% SROCC 0.59 0.81 27.2% 0.76 -6.6% 0.82 7.3% 0.89 7.9% 0.90 1.1% 0.92 2.2% RMSE 1.92 1.04 -84.6% 1.24 16.1% 0.92 -34.8% 0.74 -24.3% 0.85 12.9% 0.76 -11.8% Trans Loss sub-database PCC 0.66 0.84 21.4% 0.79 -6.3% 0.72 -9.7% 0.83 13.3% 0.87 4.6% 0.91 4.4% SROCC 0.61 0.79 22.8% 0.75 -5.3% 0.67 -11.9% 0.76 11.8% 0.85 10.6% 0.89 4.5% RMSE 1.26 0.83 -51.8% 0.98 15.3% 1.11 11.7% 1.04 -6.7% 0.88 -18.2% 0.57 -54.4% Whole Database PCC 0.58 0.74 21.6% 0.84 11.9% 0.89 5.6% 0.91 2.2% 0.92 1.1% 0.93 1.1% SROCC 0.56 0.70 20.0% 0.82 14.6% 0.86 4.7% 0.89 3.4% 0.91 2.2% 0.92 1.1% RMSE 1.55 1.27 -22.0% 1.01 -25.7% 0.89 -13.5% 0.81 -9.9% 0.77 -5.2% 0.75 -2.7% 4.5.2 Performance Improvement via Fusion We show how the PBSIQA system can improve QA performance by fusing the results from scorers at Stage 1 progressively in this section. To illustrate the design methodol- ogy, without loss of generality, we use the performance of scorer #8 for the geometric error as the base, add one scorer at a time to account for the most challenging distortion type in the remaining set, and show how the added scorer improves the performance for six sub-databases and the entire MCL-3D database. The results conducted against Dataset C of the MCL-3D database are shown in Table 4.5, where the base performance with scorer #8 is shown in the first column and the results from scorers #2, #3, #1, #7, #4, #5, and #6 are fused to its score one by one cumulatively. We see a substantial per- formance gain for each sub-database when its associated scorer is included in the fusion process. The whole database contains different distortion types so that it is beneficial to fuse all scorers to get robust and accurate predicted quality scores as shown in the last row of Table 4.5. 71 The design methodology (namely, the fusion order of scorers #2, #3, #1, #7, #4 and #5 and, finally, #6) is explained below. The performance gain for a given database is calculated in percentages as Performance Increase = V curr V prev V prev 100; (4.32) where V curr is the value of PCC, SROCC or RMSE after adding a new scorer, while V prev is the result of the previous stage. Since the error rate of base scorer #8 is high- est in sub-databases with Gaussian/sampling blur, we fuse scorer #2 to scorer #8 to reduce erroneous prediction caused by blurring distortion. 
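The cumulative fusion experiment summarized in Table 4.5 can be expressed compactly as a loop that adds one scorer at a time, retrains the fuser on the enlarged score vector, and records the relative gain of Eq. (4.32). The sketch below is illustrative only; train_and_score stands for any routine (such as the ν-SVR fuser sketched earlier) that trains on the selected scorer columns and returns a PCC on held-out data.

def performance_increase(v_curr, v_prev):
    # Eq. (4.32), expressed in percent
    return (v_curr - v_prev) / v_prev * 100.0

def progressive_fusion(S, mos, train_and_score, order=(8, 2, 3, 1, 7, 4, 5, 6)):
    """S[:, j-1] holds the scores of scorer #j; 'order' is the fusion order used in the text."""
    selected, prev, history = [], None, []
    for scorer_id in order:
        selected.append(scorer_id - 1)
        pcc = train_and_score(S[:, selected], mos)
        gain = None if prev is None else performance_increase(pcc, prev)
        history.append((scorer_id, pcc, gain))
        prev = pcc
    return history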
We show the performance improvement for each sub-database in the second column of Table 4.5 and see about 60% and 45% performance gains for the sub-databases with the Gaussian and the sampling blur, respectively. Although the performance drops in some other sub-databases, those losses are not as significant. As a result, we achieve a performance gain of 21.6% for the entire database. After the fusion of scorers #8 and #2, it is shown in Table 4.5 that the system does not perform well against additive noise. Thus, we fuse scorer #3 to the system in the next step. After the fusion of scorers #8, #2 and #3, we observe a significant performance boost (13%) for the additive noise sub-database, and a performance gain of 11.9% for the entire database. By following the same line of thought, we can fuse more scorers and obtain better performance for the entire database. Based on the above discussion, both PCC and SROCC values with respect to the whole MCL-3D database are plotted as a function of the number of fused scorers in Fig. 4.6. The performance of the proposed PBSIQA system may reach a saturation point if the number of fused scorers is sufficiently large. It should also be emphasized that the proposed PBSIQA system can be extended systematically. That is, if new distortion types are introduced, we can design scorers to tailor to them and fuse the new scorers into the system. In contrast, a traditional formula-based quality index does not have such flexibility.

Figure 4.6: The PCC and SROCC performance curves as a function of the number of fused scorers in the PBSIQA system.

4.5.3 Performance Comparison with Other Quality Indices

In this section, we compare the performance of the proposed PBSIQA index with other QA indices against Datasets A, B, and C of MCL-3D, IVC-A and LIVE-A. The latter two are more challenging than their sources, IVC [37] and LIVE [38], since they include asymmetric distortions. Furthermore, we conduct a cross-database learning procedure to demonstrate the robustness of the proposed PBSIQA index. That is, the PBSIQA trained with the MCL-3D database is used to predict the quality of stereo image pairs in IVC-A and LIVE-A. For performance benchmarking, we consider both 3D and 2D quality indices in the literature. The 3D indices include those denoted by Benoit [13], Campisi [12], RKS [18] and BQPNR [19]. The 2D indices include: the Signal to Noise Ratio (SNR), the Peak Signal to Noise Ratio (PSNR), the Mean Square Error (MSE), the Noise Quality Measure [97] (NQM), the Universal Quality Index [84] (UQI), the Structural Similarity Index [47] (SSIM), the pixel-based VIF [98] (VIFP), the visual
Table 4.6: Performance comparison over the MCL-3D database.
Dataset A Dataset B Dataset C Quality Indices PCC SROCC RMSE PCC SROCC RMSE PCC SROCC RMSE Proposed PBSIQA 0.921 0.913 0.703 0.848 0.775 0.580 0.933 0.920 0.741 3D indices BQPNR 0.591 0.533 2.241 0.675 0.512 1.181 0.617 0.522 2.339 RKS 0.710 0.679 1.578 0.829 0.748 0.894 0.731 0.731 1.594 Benoit 0.560 0.559 1.855 0.698 0.533 1.147 0.593 0.593 1.884 Campisi 0.752 0.755 1.422 0.815 0.676 0.926 0.824 0.824 1.323 2D indices C4 0.788 0.796 1.402 0.818 0.676 0.921 0.824 0.833 1.323 IFC 0.639 0.593 1.722 0.662 0.160 0.199 0.671 0.609 1.733 MSE 0.673 0.675 1.657 0.785 0.739 0.991 0.709 0.703 1.648 NQM 0.754 0.787 1.470 0.766 0.672 1.030 0.836 0.836 1.282 PSNR HVS 0.749 0.795 1.484 0.811 0.753 0.936 0.815 0.807 1.354 PSNR 0.720 0.678 1.555 0.798 0.737 0.965 0.725 0.705 1.609 SNR 0.705 0.699 1.589 0.806 0.738 0.947 0.743 0.721 1.566 SSIM 0.533 0.534 1.896 0.785 0.728 0.992 0.575 0.587 1.914 UQI 0.475 0.502 1.971 0.769 0.659 1.022 0.548 0.545 1.956 VIF 0.663 0.656 1.677 0.746 0.582 1.066 0.696 0.671 1.678 VIFP 0.604 0.622 1.784 0.743 0.602 1.072 0.665 0.646 1.744 VSNR 0.693 0.720 1.615 0.790 0.730 0.981 0.760 0.756 1.519 signal-to-noise ratio [99] (VSNR), the Peak Signal to Noise Ratio taking into account CSF [100] (PSNR-HVS), C4 [101], and the image fidelity criterion [102] (IFC). Since 2D indices are only applied to a single image, we use the average score of left and right rendered views to yield the overall quality index. We first focus on performance comparison against the MCL-3D database. The results are summarized in Table 4.6, where the best & the second best indices in each column of are shown in bold. We have the following observations. Dataset A of MCL-3D (texture distortion only) The PBSIQA index has the best performance in terms of PCC, SROCC and RMSE. It outperforms all other 2D and 3D indices by a significant margin. Among the 2D indices, those designed by considering HVS offer better perfor- mance such as C4, NQM, and PSNR HVS. 74 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 1.2 4 5 6 7 8 9 10 11 12 13 BQPNR MOS (a) BQPNR 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 2 3 4 5 6 7 8 9 10 11 12 RKS MOS (b) RKS 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 1.2 1.3 2 3 4 5 6 7 8 9 10 11 12 Benoit MOS (c) Benoit 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 2 3 4 5 6 7 8 9 10 11 12 Campisi MOS (d) Campisi 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 2 3 4 5 6 7 8 9 10 11 12 C4 MOS (e) C4 0 5 10 15 2 3 4 5 6 7 8 9 10 11 12 IFC MOS (f) IFC 0 500 1000 1500 2000 2 3 4 5 6 7 8 9 10 11 12 MSE MOS (g) MSE 10 15 20 25 30 35 40 2 3 4 5 6 7 8 9 10 11 12 NQM MOS (h) NQM 15 20 25 30 35 2 3 4 5 6 7 8 9 10 11 12 PSNR HVS MOS (i) PSNR HVS 15 20 25 30 35 40 2 3 4 5 6 7 8 9 10 11 12 PSNR MOS (j) PSNR 10 15 20 25 30 35 2 3 4 5 6 7 8 9 10 11 12 SNR MOS (k) SNR 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 2 3 4 5 6 7 8 9 10 11 12 SSIM MOS (l) SSIM 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 2 3 4 5 6 7 8 9 10 11 12 UQI MOS (m) UQI 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 2 3 4 5 6 7 8 9 10 11 12 VIF MOS (n) VIF 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 2 3 4 5 6 7 8 9 10 11 12 VIFP MOS (o) VIFP 5 10 15 20 25 30 35 40 2 3 4 5 6 7 8 9 10 11 12 VSNR MOS (p) VSNR (q) PBSIQA Figure 4.7: Scatter plots for images in the Dataset C of MCL-3D database, where the x-axis is the subjective MOS value and the y-axis is the predicted MOS using various objective quality indices. Dataset B of MCL-3D (depth map distortion only) The PBSIQA index gives the best performance in PCC, SROCC and RMSE, yet its performance gain over the second best one narrows. 
Note that there are only two scorers (scorers #8 & #9) in the PBSIQA system dedicated to the depth map distortion. 75 Table 4.7: Performance of the proposed PBSIQA index over IVC-A & LIVE-A databases. Database PCC SROCC RMSE IVC-A 0.952 0.923 0.271 LIVE-A 0.924 0.914 0.324 Table 4.8: Cross-data validation result: Use the models generated from MCL-3D to predict a quality score of stereo images in IVC-A & LIVE-A databases. Database PCC SROCC RMSE IVC-A 0.884 0.853 0.404 LIVE-A 0.893 0.844 0.433 Dataset C of MCL-3D (both texture and depth map distortions) The PCC and the SROCC values of the PBSIQA index are 0.933 and 0.920, respectively. It outperforms all benchmarking indices by a significant margin. The scatter plots of predicted scores vs. MOS along with their logistic fitting curves for various objective quality indices are shown in Fig. 4.7. These data are calculated based on Dataset C of MCL-3D. Each point on the plot represents one stereoscopic image pair. The horizontal axis gives the objective quality score while the vertical axis indicates the MOS value. The scatter plots confirm that the PBSIQA index exhibits the best correlation with human perception. Although the logistic fitting curves of NQM and PSNR HVS show trends similar to that PBSIQA, their variances are much larger. Next, we show the performance of the proposed PBSIQA index with respect to IVC- A and LIVE-A databases in Table 4.7. Since these two databases consist of the stereo- scopic data format (i.e., the left and the right views without the depth map), we only fuse scorers #1#7 for texture distortions in this PBSIQA system. As shown in the table, we see that all three performance metrics (PCC, SROCC and RMSE) are comparable with 76 those for MCL-3D. This might be on account of a fact that a distortion design in IVC- A and LIVE-A is more simpler than that of MCL-3D so that a learning algorithm can generate more realiable model for each scorer. Practically, it is not convenient to train learning-based quality assessment indices to tailor to a new database every time. It is desirable to develop a quality index that offers consistent performance when it is trained by one database yet tested by a different database. This is called cross-database vali- dation. To conduct this task, we train the PBSIQA system using the MCL-3D database and then use it to predict the quality score of stereoscopic images in IVC-A and LIVE- A. The results are shown in Table 4.8. Although its performance degrades slightly, the PBSIQA system still offers good results. As the SIQA database size becomes larger, its prediction accuracy goes higher. 77 Chapter 5 Robust Uncalibrated Stereo Rectification with Constrained Geometric Distortions (USR-CGD) 5.1 Mathematical Background We briefly review the mathematical background on perspective projection and epipolar geometry for uncalibrated rectification in this section. As shown in Fig. 5.1, the pinhole camera model consists of optical centerC, image planeR, object pointW , and image pointM that is the intersection ofR and the line containingC andW . The focal length is the distance betweenC andR, and the optical axis is the line that is orthogonal toR and containsC, where its intersection withR is the principal point. Letw andm be the coordinates ofW andM, respectively. 
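The perspective projection reviewed next, m ≃ Pw with P = K[R | t] in Eqs. (5.1)-(5.5), can be illustrated with a short numerical sketch. All values below (image size, focal length, pose, and the 3D point) are hypothetical and are not taken from the thesis.

# Project a 3D point onto the image plane with a simple pinhole model.
import numpy as np

w_img, h_img, alpha = 1920, 1080, 1200.0            # image size and focal length in pixels (assumed)
K = np.array([[alpha, 0.0, w_img / 2.0],
              [0.0, alpha, h_img / 2.0],
              [0.0, 0.0, 1.0]])                     # square-pixel, zero-skew intrinsic matrix (cf. Eq. (5.5))
R = np.eye(3)                                       # camera orientation
t = np.array([[0.1], [0.0], [0.0]])                 # camera displacement

P = K @ np.hstack([R, t])                           # 3x4 projection matrix (cf. Eq. (5.3))
W = np.array([0.5, -0.2, 4.0, 1.0])                 # homogeneous 3D point w
m = P @ W                                           # m ~ Pw (cf. Eqs. (5.1)-(5.2))
u, v = m[0] / m[2], m[1] / m[2]                     # pixel coordinates after removing the scale
print(round(u, 1), round(v, 1))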
They are related by a perspective projection matrix P of dimension 3×4 as

m = \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} \simeq \begin{bmatrix} p_{11} & p_{12} & p_{13} & p_{14} \\ p_{21} & p_{22} & p_{23} & p_{24} \\ p_{31} & p_{32} & p_{33} & p_{34} \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}    (5.1)
  = Pw,    (5.2)

where \simeq indicates equality up to scale.

Figure 5.1: Illustration of a pinhole camera model.

Matrix P can be decomposed into

P = K [R | t],    (5.3)

where K and [R | t] are called the camera intrinsic matrix and the camera extrinsic matrix, respectively. Matrix K is of the form

K = \begin{bmatrix} \alpha_u & \gamma & u_0 \\ 0 & \alpha_v & v_0 \\ 0 & 0 & 1 \end{bmatrix},    (5.4)

where \alpha_u = s_u f and \alpha_v = s_v f are the focal lengths along the u and v axes, respectively (f is the physical focal length of the camera in millimeters while s_u and s_v are the scale factors), (u_0, v_0) are the coordinates of the principal point, and \gamma is the skew factor used when the u and v axes of the model are not orthogonal. For simplicity, it is often assumed that the horizontal and vertical focal lengths are the same and that there is no skew between the u and v axes. Thus, we have

K = \begin{bmatrix} \alpha & 0 & \frac{w}{2} \\ 0 & \alpha & \frac{h}{2} \\ 0 & 0 & 1 \end{bmatrix},    (5.5)

where w and h are the width and the height of the image, respectively. The camera extrinsic matrix of dimension 3×4 describes the camera's position and orientation. It consists of two parts: a rotation matrix R of dimension 3×3 and a displacement vector t of dimension 3×1. The plane that contains the optical center C and is parallel to the image plane is the focal plane. According to [39], the Cartesian coordinates \tilde{c} of C are given by

\tilde{c} = -R^{-1} t.    (5.6)

Then, any optical ray that passes through M and C can be represented by the set of points w:

\tilde{w} = \tilde{c} + \lambda R^{-1} K^{-1} m,    (5.7)

where \lambda is a constant.

Next, we consider two stereo pinhole cameras as shown in Fig. 5.2. Let m_l and m_r be point correspondences that are the projections of the same 3D object point w on images I_l and I_r, respectively. e_l and e_r are called epipoles; they are the intersection points of the baseline with the left and right image planes. The plane containing the baseline and the object point w is called the epipolar plane, and the intersection lines between the epipolar plane and each of the two image planes are the epipolar lines.

Figure 5.2: The epipolar geometry of a pair of stereo images.

The intrinsic projective geometry between the corresponding points in the left and right images can be described by the epipolar constraint

m_l^T F m_r = m_r^T F^T m_l = 0,    (5.8)

where F is the fundamental matrix, a 3×3 matrix of rank 2, and 0 = [0\;0\;0]^T is a zero column vector. The epipole, which is the null space of F, satisfies the following condition:

F e_l = F^T e_r = 0.    (5.9)

The fundamental matrix F maps a point m_l in one image to the corresponding epipolar line, F m_l = l_r, in the other image, upon which the corresponding point m_r should lie. In general, the epipolar lines are not parallel to each other, and they all pass through the epipole (namely, l_l^T e_l = l_r^T e_r = 0).

Image rectification, as shown in Fig. 5.3, is the process of converting the epipolar geometry of a given stereo image pair into a canonical form that satisfies two conditions: 1) all epipolar lines are parallel to the baseline, and 2) there is no vertical disparity between the corresponding epipolar lines in both images. This can be done by applying a homography to each image plane or, equivalently, by mapping the epipoles to a point at infinity, e_\infty = [1\;0\;0]^T.

Figure 5.3: Illustration of image rectification.
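As a small numerical illustration of Eqs. (5.8) and (5.9) (hypothetical code, not part of the thesis), the epipolar residual m_l^T F m_r can be evaluated for a set of correspondences, and the left epipole can be recovered from the null space of F; in practice F would be estimated from the matched points.

import numpy as np

def epipolar_residuals(F, pts_l, pts_r):
    """pts_l, pts_r: (N, 2) pixel coordinates of corresponding points."""
    ml = np.hstack([pts_l, np.ones((len(pts_l), 1))])    # homogeneous m_l
    mr = np.hstack([pts_r, np.ones((len(pts_r), 1))])    # homogeneous m_r
    return np.einsum("ni,ij,nj->n", ml, F, mr)           # m_l^T F m_r for each pair, ideally 0

def left_epipole(F):
    # e_l spans the null space of F (Eq. (5.9)); take the right singular vector
    # associated with the smallest singular value.
    _, _, vt = np.linalg.svd(F)
    e = vt[-1]
    return e / e[2]                                      # normalize the homogeneous coordinate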
In particular, the fundamental matrix of a pair of rectified images can be expressed in the form

F_\infty = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{bmatrix}.    (5.10)

Let H_l and H_r be the two rectifying homographies of the left and right images, respectively, and let (\hat{m}_l, \hat{m}_r) be the corresponding points in the rectified images. Then, we have

\hat{m}_l = H_l m_l, \quad \hat{m}_r = H_r m_r.    (5.11)

According to Eq. (5.8),

\hat{m}_l^T F_\infty \hat{m}_r = 0.    (5.12)

By incorporating Eq. (5.11) into Eq. (5.12), we obtain

m_l^T H_l^T F_\infty H_r m_r = 0.    (5.13)

As a result, the fundamental matrix of the original stereo image pair can be specified as F = H_l^T F_\infty H_r. The fundamental matrix is used to calculate the rectification error in the process of parameter optimization. Thus, the way H_l and H_r are parameterized is critical to the generation of good rectified images.

5.2 Proposed USR-CGD Rectification Algorithm

5.2.1 Generalized Homographies

We begin with a brief review of the parameterization of the homography proposed by Fusiello et al. [25, 34]. The rectification procedure defines new virtual cameras P_nl and P_nr given old cameras P_ol and P_or. The new virtual cameras are obtained by rotating the old cameras around their optical centers until the two focal planes contain the baseline and become coplanar. The rectifying transformation is used to map the image plane of P_o onto the image plane of P_n. Without loss of generality, we use the left camera as the example in the following discussion. The same derivation applies to the right camera as well. For the left camera, we have

H_l = (P_{nl})_{1:3} (P_{ol})_{1:3}^{-1},    (5.14)

where the subscript 1:3 denotes a range of columns.

Figure 5.4: The effect of introducing a parameter for the y-translation: (a) without and (b) with the parameter for y-translation, where the original and rectified stereo pairs are shown in the left and right of each subfigure.

By Eq. (5.7), the optical rays of each image can be represented by

\hat{w}_l = \hat{c}_{ol} + \lambda_{ol} R_{ol}^{-1} K_{ol}^{-1} m_{ol},    (5.15)
\hat{w}_l = \hat{c}_{nl} + \lambda_{nl} R_{nl}^{-1} K_{nl}^{-1} m_{nl}.    (5.16)

Since it is assumed in [34] that the optical center does not move between the old and new cameras, namely \hat{c}_{ol} = \hat{c}_{nl}, the homography is expressed as

H_l = K_{nl} R_l K_{ol}^{-1},    (5.17)

where K_{ol} and K_{nl} are the intrinsic matrices of the old and the new cameras, respectively, and R_l is the rotation matrix that transforms the old camera into the new camera for rectification.

The homography model described above has its limitations since it only allows rotation around the camera's optical center during the rectification process. However, if there is only vertical disparity between two images, the camera would still be rotated to make the focal planes coplanar, which inevitably introduces a warping distortion into the rotated rectified image as shown in Fig. 5.4a. Better rectified images can be obtained by allowing an extended set of camera parameters, such as the displacement between optical centers and different focal lengths. This is elaborated below.

First, we consider a new homography that includes the translation of the optical center. Based on Eqs. (5.15) and (5.16), we get

m_{nl} = K_{nl} R_{nl} [c_{tl} + R_{ol}^{-1} K_{ol}^{-1} m_{ol}],    (5.18)

where c_{tl} = c_{ol} - c_{nl} denotes the movement between the optical centers of the old and the new cameras. Since the horizontal disparity does not affect the rectification performance, we focus on the case of the camera's vertical translation. The corresponding homography can be derived by simplifying Eq.
(5.18) as

H_l = K_{nl} T_l R_l K_{ol}^{-1},    (5.19)

where

T_l = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & t_{yl} \\ 0 & 0 & 1 \end{bmatrix}    (5.20)

denotes a vertical translation matrix, which compensates for the displacement as shown in Fig. 5.4b.

Figure 5.5: The effect of introducing a parameter for different zoom levels: (a) without and (b) with the parameter for z-translation, where the original and rectified stereo pairs are shown in the left and right of each subfigure.

Next, we examine another generalization by allowing different focal lengths in the two intrinsic matrices. In [34], the intrinsic matrices K_{ol} and K_{or} are set to be the same, i.e.,

K_{ol} = K_{or} = \begin{bmatrix} \alpha & 0 & \frac{w}{2} \\ 0 & \alpha & \frac{h}{2} \\ 0 & 0 & 1 \end{bmatrix},    (5.21)

where the same focal length is adopted. However, if there exists an initial difference in the zoom level between the two cameras, the rectification performance will degrade since the two cameras must be rotated about the Z-axis to make both image planes coplanar as shown in Fig. 5.5a, which results in rectified images that are up-scaled in the horizontal direction. To compensate for the initial zoom difference, the left and the right cameras are allowed to have different focal lengths. Thus, we have

K_{ol} = \begin{bmatrix} \alpha_l & 0 & \frac{w}{2} \\ 0 & \alpha_l & \frac{h}{2} \\ 0 & 0 & 1 \end{bmatrix}, \quad K_{or} = \begin{bmatrix} \alpha_r & 0 & \frac{w}{2} \\ 0 & \alpha_r & \frac{h}{2} \\ 0 & 0 & 1 \end{bmatrix}.    (5.22)

Geometrically, the two image planes are brought to a normalized image plane without loss of perpendicularity with respect to the object point, as shown in Fig. 5.5b, so that the viewing angle of each camera is not changed.

To summarize, we have derived in this section a generalized rectification homography model that consists of nine parameters: five rotational parameters, \theta_{yl}, \theta_{zl}, \theta_{xr}, \theta_{yr}, and \theta_{zr}; two y-translational parameters, t_{yl} and t_{yr}; and two focal length parameters, \alpha_l and \alpha_r. The x-axis rotational parameter of the left camera (\theta_{xl}), which affects the portion of the scene that is projected, is set to zero. Furthermore, since the images are projected onto the virtual camera plane, we can arbitrarily choose the new intrinsic matrices as long as both cameras have the same vertical focal length and the same vertical coordinate of the principal point so as to meet the epipolar constraint in Eq. (5.8). Thus, we require K_{nl} and K_{nr} to be the same as the old left one:

K_{nl} = K_{nr} = K_{ol}.    (5.23)

5.2.2 Geometric Distortions

The parameters of the rectification homography model are updated through an optimization process to generate a pair of rectified images. There are two measures of the quality of the rectified image: 1) the rectification error and 2) the errors of various geometric distortions.

The rectification error is the average vertical disparity in pixels between two corresponding points in the left and right images. It is an objective (or quantitative) measure, and it should be less than 0.5 pixels for a stereo matching algorithm to be performed along the 1-D scanline. In the literature, the Sampson error (E_s) [27, 40] is widely used. It is the approximated projection error defined by

E_s = \frac{1}{N} \sqrt{ \sum_{j=1}^{N} (E_s)_j^2 },    (5.24)

where N is the number of corresponding pairs and (E_s)_j is the jth component of the normalized vector E_s, which is computed via

(E_s)_j^2 = \frac{ (m_r^{jT} F m_l^j)^2 }{ (F m_l^j)_1^2 + (F m_l^j)_2^2 + (m_r^{jT} F)_1^2 + (m_r^{jT} F)_2^2 }.    (5.25)

In the image rectification process, we need to define a cost function that takes both the objective quality (as represented by the rectification error E_s) and the subjective quality (as represented by geometric distortions) into account.
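The rectification-error term of this cost function can be computed directly from the matched points. The following sketch is illustrative only (not the thesis implementation) and assumes the fundamental matrix F and the correspondences are given; it implements Eqs. (5.24)-(5.25).

import numpy as np

def sampson_error(F, pts_l, pts_r):
    """Sampson error E_s of Eqs. (5.24)-(5.25); pts_l, pts_r are (N, 2) matched points."""
    ml = np.hstack([pts_l, np.ones((len(pts_l), 1))])    # homogeneous m_l^j
    mr = np.hstack([pts_r, np.ones((len(pts_r), 1))])    # homogeneous m_r^j
    Fml = ml @ F.T                                       # rows are F m_l^j
    FTmr = mr @ F                                        # rows are m_r^{jT} F
    num = np.einsum("ni,ni->n", mr, Fml) ** 2            # (m_r^{jT} F m_l^j)^2
    den = Fml[:, 0] ** 2 + Fml[:, 1] ** 2 + FTmr[:, 0] ** 2 + FTmr[:, 1] ** 2
    Es_sq = num / den                                    # Eq. (5.25)
    return np.sqrt(Es_sq.sum()) / len(pts_l)             # Eq. (5.24)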
The latter is needed since we cannot avoid warping distortions in a perspective transformation. Here, the geo- metrical distortion is measured by the dissimilarity of a rectified image from its original unrectified one. Several rectification algorithms [28]-[33] have been proposed to control the perspective distortion in rectified images using auxiliary affine transforms (e.g., the Jacobian or SVD ofH, etc.), yet none of them is clearly superior to others in reaching well-balanced rectification quality. Mallon and Whelan [31] introduced two geometric measures for performance eval- uation: 1) orthogonality and 2) aspect ratio. Let a = ( w 2 ; 0; 1) T , b = (w; h 2 ; 1) T , 88 c = ( w 2 ;h; 1) T and d = (0; h 2 ; 1) T be four mid points of an image before the trans- form, where w and h are the image width and height. After the transform, the new positions ofa,b,c andd are denoted by ^ a, ^ b, ^ c and ^ d, respectively. We examine vectors ^ x = ^ b ^ d and ^ y = ^ c ^ a. The orthogonality measure is defined as the angle between ^ x and ^ y: E O = cos 1 ^ x^ y jxjjyj : (5.26) It is desired that the orthogonality measure is as close to 90 as possible. The aspect ratio is used to measure the ratio of image width to image height before and after the rectification transform. Let a = (0; 0; 1) T , b = (w; 0; 1) T , c = (w;h; 1) T and d = (0;h; 1) T be the four corner points. Again, we examine vectors ^ x = ^ b ^ d and ^ y = ^ c ^ a after the transform. The aspect ratio is defined as E A = ( ^ x T ^ x ^ y T ^ y ) 1=2 : (5.27) The aspect ratio should be as close to unity as possible. AlthoughE O andE A are useful, they are still not sufficient to characterize all possi- ble geometric distortions in rectified images. For example,E A is sometimes close to the unity in the presence of large skewness. For this reason, we propose four new geometric distortion measures, which includes a modified aspect ratio measure, and use them in the regularization term of the cost function in the optimization process. We define four corner points (a,b,c andd) and four mid points (e,f,g andh) in the original image as shown in Fig. 5.6. The points in the rectified image are represented by adding a prime to their symbols. The four new measures are given below. Modified aspect ratio (ideally 1): E AR = 1 2 ( a 0 o 0 c 0 o 0 + b 0 o 0 d 0 o 0 ): (5.28) 89 Figure 5.6: The original and rectified images used to define four new geometric mea- sures. It gives a more accurate measure of the aspect ratio change. Skewness (ideally 0 ): E Sk = 1 4 4 X i=1 (j90 6 CA i j): (5.29) It is used to measure how corner angles (denoted by 6 CA i ,i = 1; 2; 3; 4) deviate from 90 after rectification. Rotation angle (ideally 0 ): E R = cos 1 ( ofo 0 f 0 jofjjo 0 f 0 j ): (5.30) It is the angle betweenof and ~ o ~ f, which is used to measure the rotation degree after rectification. Size Ratio (ideally 1): E SR = Area rec Area orig : (5.31) 90 It is used to measure to the size change of the input rectangle after rectification. Generally speaking, the cost function can be written as C() =E s + AR E AR + Sk E Sk + R E R + SR E SR ; (5.32) where denotes a set of rectification parameters to be selected in the optimization pro- cess,E s is the Samson error to characterize the rectification error, and AR , Sk , R and SR are weights for the modified aspect ratio distortion, the Skewness distortion, the rotation angle distortion and the size ratio distortion, respectively. 5.2.3 Iterative Optimization To find an optimal solution to Eq. 
(5.32) is actually a very challenging task. First, there is a tradeoff between the rectification error and geometric errors. That is, when we attempt to reduce the rectification error by adjusting the parameters of generalized rectification homograhpyH, the geometric error tends to increase, and vice versa. As a result, the cost is a non-convex function that has many local minima. Furthermore, the choice of proper weights AR , Sk , R and SR is an open issue remaining for future research. To proceed, we simply adopt equal weight solution. That is, each weight takes either 0 or 0.25 two values since there is one rectification error term and four geometric error terms in Eq. (5.32), respectively. For a specific type of geometric distortion, if its error is less than a preset threshold, its weight is set to 0. Otherwise, it is set to 0.25. Mathematically, we have X = 8 < : 1; ifE X T X ; 0:25; ifE X <T X ; (5.33) whereX can beAR,Sk,R orSR andT X is the corresponding preset threshold. 91 Figure 5.7: The block diagram of the proposed iterative optimization procedure. Based on this design, the cost is a dynamically changing function that always con- tains the rectification error. A geometric distortion term will be included in the cost function (or ”being turned-on”) only when it is larger than a threshold. Otherwise, it is turned off. To solve this problem, we propose an iterative optimization procedure as shown in Fig. 5.7. It consists of the following steps. 1. We begin with a cost function that contains the Sampson error (E s ) only. The optimization procedure offers the initial set of rectification parameters, which is denoted by init . 2. We update four weight = ( AR ; Sk ; R ; SV ) from init , which is initially set to init = (0; 0; 0; 0). 3. Under the current set of rectification parameters,, and the current set of weights, , we solve the optimization problem with respect to Eq. (5.32). Then, both rectificatioin parameters and weights are updated accordingly. Step 3 is iterated until the the cost of the current round is greater than or equal to that of the previous 92 round. Mathematically, ifC( n ) C( n1 ), we choose n1 as the converged rectification parameters. When we compare the current cost C( n ) with the cost in the previous round C( n1 ) in Step 3, the number of geometric terms in Eq. (5.32) may change. If this occurs, we should compensate it for fair comparison. That is, the cost should be nor- malized by the sum of weight viaC normalized ( n ) =C( n )=(1 + P X X ). The choice of a proper threshold value for each geometric error is important in reach- ing balanced performance. In this work, we set threshold values of geometric errors to the following: 0:8E AR 1:2; E Sk 5 ; (5.34) 0:8E SR 1:2; jE R j 30 (5.35) Furthermore, we normalize four geometric errors by considering their value range. The normalizing factors are: N AR = 1:5; N Sk = 6:5; N R = 18:5; N SV = 2:5; (5.36) which can be absorbed in their respective weight; i.e. the new weight becomes X = 0:25N X when the term is on. Last, the minimization of the cost function is carried out using the nonlinear least square method, Trust Region [41], starting with all unknown variables set to zero. The block-diagram of the proposed USR-CGD system is shown in Fig. 5.8. This system is fully automatic since there is no need to estimate the fundamental matrix. To establish the point correspondence between the left and right images, we extract the SIFT feature [42] and find the initial putative matching points. 
We also apply 93 Figure 5.8: The block-diagram of the proposed USR-CGD system. RANSAC [43] to remove outliers. It is noteworthy that the number of the correspon- dences strongly affects the rectification performance because the homography is esti- mated based on their errors, and the optimal number varies with the image resolution. A special case of the USR-CGD algorithm is to turn off all geometric distortion terms, which is called the USR algorithm. 5.3 Databases of Uncalibrated Stereo Images The number of publicly available databases of uncalibrated stereo images is small. The SYNTIM database [44] consists of eight stereo images (seven indoor scenes and one out- door scene) of two resolutions, 512x512 and 768x576. The VSG database [31] consists of six stereo images (two indoor scenes and four outdoor scenes) of resolution 640x480. Although these two databases contain multiple unrectified stereo images taken by differ- ent camera poses, it is difficult to analyze the effect of a pose on the quality of rectified images. To allow in-depth analysis, we built two high quality databases of uncalibrated stereo images on our own in this research. First, we built a synthetic stereo image database consisting of 32 image pairs using the OpenGL software. The left images of 4 stereo pairs are shown in Fig. 5.9. The advantage of using a synthetic image is that it is easy to generate a specific geometric distortion. Given two cameras whose initial settings are shown in Fig. 5.12, we can 94 (a) Interior (b) Street (c) Military (d) Vehicle Figure 5.9: The left images of 4 MCL-SS reference image pairs. translate or rotate each of cameras along the X-/Y-/Z-axis so as to mimic real world camera configurations. We can also generate the effects of a wide baseline, vertical misalignment, and different zoom levels. For each of reference stereo image pairs, we generate 8 test stereo pairs, where 6 of them are obtained by applying a single geometric distortion while 2 are generated by applying all six geometric distortions together. For the latter, we move left and right cameras along the X and Y axes and the range of increased disparity is 0185 pixels. The ratio of object’s sizes between left and right images due to cameras’ translation on the Z-axis is about 11.9831.24%. Finally, the angle difference due to camera rotation is 10 on each axis. Eight such image pairs of image “Interior” are given in Fig. 5.10. More stereo image pairs can be found in [45]. Next, we built a database of unrectified stereo images by capturing various scenes from the real world environment. It consists of 5 indoor scenes and 15 outdoor scenes. The left images of 8 selected scenes are shown in Fig. 5.11. The scenes were taken with different zoom levels and various poses using two Point Grey Flea3 cameras with an 95 (a) Trans. Error on X-axis (b) Trans. Error on Y-axis (c) Trans. Error on Z-axis (d) Rot. Error on X-axis (e) Rot. Error on Y-axis (f) Rot. Error on Z-axis (g) Compound Error 1 (h) Compound Error 2 Figure 5.10: The 8 test image pairs of “Interior” with different geometric distortions, where left and right images are overlapped for the display purpose. 
Table 5.1: The databases of uncalibrated stereo images MCL databases Existing databases Name MCL-SS MCL-RS SYNTIM [44] VGS [31] # of Test Images 32 20 8 6 (8 different geometric distortions per each reference) Resolution 1920x1080 1920x1080 512x512 or 768x576 640x480 Remark - CG-images generated from OpenGL programming - 5 indoor scenes - 7 indoor scenes - 2 indoor scenes - 15 outdoor scenes - 1 outdoor scene - 4 outdoor scenes additional zoom lens. To meet the recommendation of the current broadcasting stan- dard, all images were captured in full-HD resolution (1920x1080). For the rest of this paper, we call our synthetic database MCL-SS (synthetic stereo) and the real database MCL-RS (real stereo), where MCL is the acronym of the Media Communication Lab at the University of Southern California. The comparison with existing databases is summarized in Table 5.1. The MCL databases can be downloaded at [45]. 96 (a) Doheny (b) Dog (c) Dolls (d) Drawer (e) Fountain (f) Keychain (g) Step (h) Tommy Figure 5.11: The eight selected left images from 20 MCL-RS image pairs. 5.4 Performance Evaluation To evaluate the performance of the proposed USR and USR-CGD algorithms, we com- pare them with four rectification algorithms over the four databases. The four bench- mark algorithms are Hartley [27], Mallon and Whelan [31], Wu and Yu [33] and Fusiello and Luca [34]. We adopt the following vertical disparity error E v = 1 N N X j=1 (E v ) j ; (5.37) where (E v ) j =j(H l m j l ) 2 (H r m j r ) 2 j; (5.38) 97 Figure 5.12: Camera configuration used to acquire synthetic stereo images. as the objective error measure while orthogonality (E O ), aspect-ratio (E AR ), skewness (E Sk ), rotation (E R ), and scale-variance (E SV ) are used as subjective error measures. For the number of correspondences, different numbers of matching points are adopted for different image resolutions; namely, 300 for the MCL-SS and the MCL-RS databases (1920x1080), 80 for the SYNTIM database (512x512 or 768x576), and 50 for the VSG database (640x480), respectively. Note that the number of correspondences may decrease after outlier removal. To obtain more reliable results, we conduct experiments on each test set with three random seeds for RANSAC, which results in different distributions of correspondences. Then, these results are averaged for the final performance measure. Each geometric error is the averaged value of left and right rectified images. Extensive experimental results are shown in Tables 5.2 and 5.3, where the best algorithm is marked in bold. Note that these two tables contain the same horizontal entries but different vertical entries. The rectification error, the orthogonality and skewness distortions are shown in Table 5.2 while the aspect ration, rotation and scale variance distortions are shown in Table 5.3. We first examine the advantages of the generalized homography model without con- sidering the geometric distortion constraints (namely, the USR algorithm). At the bot- tom of each test set in Table 5.2, we show the mean performance for that particular set. 98 Table 5.2: Performance comparison of six rectification algorithms in terms of the rec- tification error (E v ) and orthogonality (E O ) and Skewness (E Sk ) geometric distortions, where the best result is shown in bold. For the MCL-SS database, the results of four test sets (Interior, Street, Military, and Vehicle) are averaged. 
DATABASE TEST SET RECTIFICATION ERROR (Ev in pixels) GEOMETRIC DISTORTIONS (Ideal Value) Othogonality (90 ) Skewness (0 ) Hartley Mallon Wu Fusiello USR USR- Hartley Mallon Wu Fusiello USR USR- Hartley Mallon Wu Fusiello USR USR- CGD CGD CGD MCL-SS X-translation 0.09 0.13 144.83 0.10 0.09 0.09 91.44 89.96 66.47 90.01 89.95 89.95 1.63 0.36 55.97 0.05 0.34 0.34 Y-translation 0.09 0.17 1252.36 0.17 0.14 0.69 90.33 45.98 97.39 89.81 90.01 90.19 0.48 44.02 54.23 0.67 0.77 1.69 Z-translation 0.15 77.26 960.16 0.94 0.24 0.79 100.38 133.94 116.78 92.58 97.74 90.95 27.73 61.14 48.30 5.80 16.56 1.67 X-rotation 0.09 0.18 246.97 2.13 0.24 0.93 90.40 41.56 132.89 90.22 88.17 89.63 3.75 48.33 48.48 6.85 6.25 3.63 Y-rotation 0.11 0.20 966.68 0.11 0.08 0.10 91.87 89.17 83.31 90.02 89.97 89.90 4.17 4.14 35.04 3.74 3.75 3.74 Z-rotation 0.11 0.18 109.49 0.10 0.11 0.09 90.35 92.78 111.08 89.95 90.02 90.10 0.93 3.01 53.95 0.24 0.69 0.70 compound1 37.52 6443.91 662.73 3.49 0.10 0.55 99.26 110.23 100.38 88.73 97.10 89.98 39.76 31.13 61.06 10.48 18.18 1.45 compound2 0.10 4.10 101.37 1.32 0.12 0.94 88.36 118.46 95.07 87.47 88.06 89.23 22.32 44.67 52.01 11.97 10.39 2.61 Mean 4.78 815.76 555.58 1.04 0.25 0.50 92.80 90.26 100.42 89.85 91.39 89.99 12.47 29.60 51.13 4.98 7.11 2.18 MCL-RS Books 0.68 25.01 129.41 3.47 1.97 1.87 59.91 78.36 43.74 84.95 84.93 91.10 28.47 35.53 43.31 6.38 14.99 4.98 Dogs 0.21 0.44 2403.43 5.67 0.16 0.17 93.50 93.33 48.66 93.57 89.18 90.21 4.68 5.57 67.57 11.72 2.34 0.99 Doheny 0.10 0.42 1.75 0.09 0.09 0.08 83.80 80.84 83.45 90.02 88.01 88.96 15.22 11.85 68.85 0.38 4.98 2.56 Dolls 0.17 1.25 36.09 0.21 0.16 0.20 92.54 100.68 156.50 89.30 93.00 89.69 7.86 10.27 60.14 1.59 8.77 0.92 Drawer 0.21 0.03 18.50 0.47 0.45 0.20 104.35 134.93 83.80 88.83 92.99 90.04 34.33 43.71 76.14 3.08 27.02 0.14 Fountain 1.72 36.32 354.99 0.35 0.20 0.91 88.99 87.31 105.04 90.40 90.55 90.32 2.74 2.85 52.19 0.49 1.34 0.86 Fountain2 0.18 2.66 0.14 2.06 0.20 0.27 94.98 98.58 95.85 82.18 95.36 90.21 10.90 18.48 24.89 15.18 12.31 0.51 Fountain3 0.27 5.21 169.22 0.14 0.11 0.13 93.45 95.26 133.98 90.16 93.21 90.65 8.32 12.58 53.26 0.32 10.39 1.88 Fountain4 2.51 9.30 355.94 1.04 1.06 0.45 78.85 50.94 138.19 87.76 81.52 90.21 52.38 43.09 49.54 5.62 21.62 0.75 Keychain 0.25 1.63 391.80 0.48 0.21 0.42 83.90 87.56 129.47 89.25 84.64 89.45 15.12 12.67 67.26 1.86 2.65 1.65 Keychain2 0.38 0.57 0.63 12.36 0.23 0.31 88.83 98.50 70.75 77.29 88.90 89.00 2.55 11.76 70.16 13.67 4.17 2.30 Leavey 0.20 1.37 6.72 0.19 0.11 0.19 90.48 98.29 63.40 89.85 88.37 90.47 2.28 8.40 71.24 0.48 4.55 0.06 Mudhall 0.15 0.97 0.63 0.12 0.11 0.13 91.40 93.48 116.97 91.42 91.90 89.50 2.04 3.61 23.49 3.36 5.11 1.18 RTCC 3.27 15.39 0.72 0.27 0.23 0.36 89.62 103.91 103.32 89.31 90.64 89.92 2.76 14.38 58.07 5.75 16.40 1.06 Salvatory 1.53 7.82 1.30 0.60 0.20 0.56 87.73 87.36 103.62 87.97 85.65 90.47 11.20 9.44 72.15 6.18 5.26 1.11 Step 1.07 2.98 82.35 0.20 0.78 0.74 94.67 81.95 148.84 96.08 93.40 89.53 9.47 14.84 72.25 9.51 4.46 1.38 Tommy 0.17 1.41 452.77 0.13 0.11 0.12 89.22 96.86 90.07 89.92 88.03 90.06 1.97 8.66 0.00 0.28 7.65 1.07 Tommy2 0.14 5.39 346.31 0.13 0.10 0.11 91.39 93.90 31.32 90.34 90.94 90.15 5.07 6.54 51.28 3.22 9.17 0.20 Viterbi 0.12 0.33 0.59 0.12 0.37 0.11 93.23 86.30 64.80 89.98 86.17 90.15 17.93 27.96 40.49 0.39 0.56 1.64 VKC 0.20 0.62 2527.03 0.17 0.17 0.15 84.99 81.27 37.77 90.00 89.73 89.31 9.27 9.97 49.77 0.16 4.19 1.58 Mean 0.67 5.96 364.02 1.41 0.31 0.38 88.84 91.48 92.15 88.92 89.28 89.96 12.23 15.61 55.21 4.48 8.98 1.34 SYNTIM Aout 
0.33 2.67 11.43 4.32 0.53 1.44 99.68 110.98 88.99 86.70 92.94 91.09 21.47 26.12 65.47 14.92 11.49 7.41 BalMire 0.46 3.21 256.49 0.45 0.22 0.34 90.77 96.39 133.71 91.20 91.24 89.66 11.10 14.24 52.66 3.87 11.17 2.95 BalMouss 0.19 0.33 0.19 0.21 0.18 0.26 91.64 84.67 88.53 88.68 88.56 89.46 6.76 5.29 6.03 4.89 5.39 2.42 BatInria 0.15 0.47 0.15 0.14 0.13 0.13 90.69 87.31 89.99 90.00 89.93 89.98 3.47 3.35 6.18 1.26 4.09 2.35 Color 0.20 0.32 0.21 0.42 0.19 0.43 95.56 85.71 89.99 88.76 89.95 90.14 7.91 5.74 6.29 7.31 5.06 2.69 Rubik 0.14 1.24 1.42 0.41 0.13 0.38 87.26 108.58 58.40 89.84 93.97 91.17 14.24 17.51 56.94 0.43 11.74 0.42 Sport 0.13 0.94 0.56 0.06 0.07 0.08 73.58 79.25 135.17 90.84 87.50 90.00 49.88 27.26 64.71 3.18 9.56 0.10 Tot 0.14 0.80 0.61 0.24 0.15 0.73 83.87 74.82 63.86 86.20 86.47 88.06 16.52 14.99 54.44 15.10 13.81 10.78 Mean 0.22 1.25 33.88 0.78 0.20 0.47 89.06 90.96 93.58 89.03 90.07 89.82 16.42 14.31 39.09 6.37 9.04 3.64 VSG Arch 0.18 0.55 67.15 0.17 0.15 0.17 88.63 85.59 76.44 90.63 87.94 90.42 20.28 13.77 28.67 2.08 6.88 1.01 Drive 28.16 0.56 44.87 0.17 0.17 0.16 97.08 99.16 130.15 89.22 87.47 89.98 26.40 19.95 51.11 4.29 9.94 0.18 Lab 0.11 0.27 1.22 0.09 0.08 0.10 86.19 101.45 86.21 91.34 91.39 91.14 6.81 11.38 78.11 4.84 4.84 4.15 Roof 0.14 0.14 86.01 0.62 0.11 0.32 111.39 93.25 91.45 91.64 90.02 90.71 23.81 7.57 5.93 3.71 6.24 3.83 Slate 0.20 0.15 2.21 0.11 0.11 0.18 88.76 83.66 88.43 88.92 88.28 89.98 7.01 6.30 76.09 4.33 6.89 0.42 Yard 0.20 0.38 0.14 0.11 0.10 0.13 94.60 84.42 87.03 89.28 88.43 89.73 10.24 7.81 14.11 5.02 7.30 1.52 Mean 5.12 0.34 33.60 0.21 0.12 0.18 94.44 91.26 93.28 90.17 88.92 90.33 15.76 11.13 42.34 4.04 7.01 1.85 As far as the rectificatoin error is concerned, we see that the USR algorithm provides the smallest error for all four databases. The results of the MCL-SS database allow us to analyze the performance improve- ment. To give an example, the vertical disparity is often caused by translational distor- tion in the Y-axis, rotational distortion in the X-axis and different zoom levels in the Z-axis (emulated by translational distortion). By comparing our generalized homogra- phy model with that of Fuesillo and Luca [34], our model provides smaller errors in 99 Table 5.3: Performance comparison of six rectification algorithms in terms of aspect- ratio (E AR ). rotation (E R ) and scale-variance (E SR ) geometic distortions, where the best result is shown in bold. For the MCL-SS database, the results of four test sets (Interior, Street, Military, and Vehicle) are averaged. 
DATABASE TEST SET GEOMETRIC DISTORTIONs (Ideal Value) Aspect-Ratio (1) Rotation (0 ) Scale-Variance (1) Hartley Mallon Wu Fusiello USR USR- Hartley Mallon Wu Fusiello USR USR- Hartley Mallon Wu Fusiello USR USR- CGD CGD CGD MCL-SS X-translation 1.01 1.01 95.57 1.00 1.00 1.00 0.17 45.13 50.93 0.02 0.17 0.17 1.00 1.00 8.43 1.00 1.00 1.00 Y-translation 1.01 1.01 248.66 1.01 1.02 1.03 54.17 93.47 87.24 53.13 56.87 42.76 1.01 1.25 288.82 1.02 1.03 1.02 Z-translation 3.11 2.86 28.94 2.20 2.82 1.03 37.37 73.18 90.10 2.43 17.58 2.39 4.55 0.21 27.68 31.65 12.04 1.12 X-rotation 1.16 1.21 1231.26 1.23 2.82 1.15 56.35 79.69 71.88 54.72 60.26 51.57 1.03 1.46 2.17 1.21 1.01 1.05 Y-rotation 1.08 1.08 129.90 1.08 1.08 1.07 1.68 16.31 33.63 1.65 1.64 1.63 1.00 1.07 78.58 1.04 1.08 1.07 Z-rotation 1.01 1.01 417.89 1.00 1.00 1.01 5.19 5.22 100.49 5.18 5.19 5.17 1.01 0.99 0.40 1.00 1.00 1.00 compound1 5.12 2.63 400.24 1.38 1.58 1.03 64.19 57.39 87.66 16.91 23.01 15.87 1.36 5.98 219.74 1.28 2.47 1.06 compound2 3.43 2.57 426.15 1.34 1.32 1.08 48.80 65.84 67.65 39.71 50.57 46.19 4.60 1.20 18.05 1.09 1.44 0.98 Mean 2.11 1.67 372.33 1.28 1.38 1.05 33.49 54.53 73.70 22.47 26.91 21.12 1.94 1.65 80.48 4.91 2.01 1.03 MCL-RS Books 6.21 10.51 20.05 1.11 2.94 1.08 81.76 89.62 70.90 14.34 86.06 38.32 0.92 80.26 7.06 1.26 10.41 1.14 Dogs 1.05 1.05 3.32 2.14 1.04 1.02 8.99 7.51 90.79 4.72 8.89 8.88 1.19 0.97 0.93 2.13 0.83 0.84 Doheny 1.31 1.31 56.69 1.01 1.10 1.05 7.25 86.51 16.47 7.36 6.69 7.16 0.66 1.52 0.67 1.00 1.00 0.98 Dolls 1.19 1.19 4.40 1.03 1.24 1.03 8.32 9.43 148.80 7.22 8.21 3.26 1.34 0.75 0.43 1.01 1.57 0.95 Drawer 15.93 7.48 29.39 1.13 11.79 1.00 22.66 67.89 82.86 0.80 64.30 0.05 81.95 0.16 1.68 2.99 12.54 0.85 Fountain 1.05 1.06 63.33 1.04 1.03 1.03 4.83 34.04 73.16 61.09 99.08 20.80 0.94 1.09 4.24 1.03 1.04 1.02 Fountain2 1.34 1.34 1.85 1.75 1.75 1.02 7.26 87.23 13.88 7.53 6.95 2.35 1.30 0.81 1.75 4.11 7.57 1.12 Fountain3 1.30 1.29 896.51 1.01 2.26 1.04 5.44 60.61 46.86 6.32 8.73 5.66 1.44 0.80 3.93 1.00 11.96 1.07 Fountain4 20.33 3.62 18.19 1.21 2.78 1.01 43.01 163.92 104.06 4.39 33.87 4.24 8.14 4.46 9.01 2.88 9.34 0.80 Keychain 1.35 1.34 2589.48 1.04 1.33 1.09 4.78 4.34 115.30 7.53 4.52 7.42 0.81 1.63 101.40 0.99 1.32 0.98 Keychain2 1.05 1.05 138.23 1.17 1.05 1.04 9.68 36.37 81.10 2.13 9.68 9.62 0.77 1.24 0.84 4.30 1.41 1.41 Leavey 1.05 1.06 35.07 1.01 1.10 1.00 13.26 45.83 57.89 18.73 26.34 5.70 1.04 0.96 1.91 1.00 1.03 1.00 Mudhall 1.04 1.04 141.11 1.08 1.10 1.06 4.76 4.98 10.76 4.02 5.43 3.91 1.02 0.96 0.66 1.13 1.27 1.04 RTCC 1.05 1.06 352.77 1.13 1.13 1.07 16.40 38.97 87.48 27.42 29.73 25.09 1.02 0.96 0.00 1.09 1.24 0.98 Salvatory 1.27 1.27 135.96 1.20 1.60 1.05 6.98 35.86 97.01 11.34 8.60 13.00 0.85 1.41 5.03 1.19 1.56 1.00 Step 1.24 1.24 1311.56 1.31 1.12 1.03 12.60 84.80 133.00 37.01 38.13 11.94 1.36 0.75 419.14 1.43 1.33 1.01 Tommy 1.04 1.04 1.00 1.01 1.09 1.04 13.87 39.54 0.00 15.59 13.63 15.06 1.01 1.07 1.00 1.00 0.96 0.98 Tommy2 1.11 1.11 700.69 1.09 1.18 1.02 6.52 7.03 5.39 7.17 7.70 6.52 1.07 0.95 2.77 1.22 1.38 1.00 Viterbi 30.28 28.85 2.06 1.03 31.69 1.03 14.20 80.41 42.90 56.21 0.22 1.96 32.15 0.85 1.76 1.00 19.25 1.02 VKC 1.20 1.20 124.81 1.02 1.08 1.04 9.87 35.51 70.13 8.56 8.04 8.39 0.86 1.28 0.19 1.00 0.98 0.98 Mean 4.57 3.46 348.02 1.17 3.47 1.04 15.00 51.02 69.14 12.77 26.53 9.97 6.99 5.14 29.28 1.64 12.03 1.01 SYNTIM Aout 5.32 2.31 1114.30 3.64 2.19 1.65 27.80 86.77 125.85 22.89 33.56 25.99 2.54 1.20 0.02 1.25 2.01 1.08 BalMire 1.40 1.40 3.59 1.12 1.77 1.08 6.93 53.02 63.67 2.83 
8.02 3.24 1.12 0.99 1.50 1.37 4.97 1.08 BalMouss 1.16 1.16 1.17 1.15 1.16 1.06 5.67 48.26 4.88 4.16 4.40 3.03 0.95 1.23 1.09 1.12 1.07 0.93 BatInria 1.14 1.14 1.25 1.05 1.16 1.09 0.95 45.93 1.31 1.70 1.38 1.26 1.06 1.22 1.28 1.01 1.18 1.07 Color 1.24 1.24 1.26 1.42 1.24 1.10 5.15 47.16 5.05 3.84 4.75 4.85 1.09 1.37 1.23 1.57 1.31 1.01 Rubik 1.47 1.51 14.07 1.01 1.51 1.01 17.53 89.82 87.59 7.70 13.73 8.07 1.25 0.55 2.48 1.01 2.16 1.00 Sport 5.86 4.42 21.63 1.09 1.34 1.00 13.84 47.74 120.32 1.52 6.21 0.03 1.67 13.77 7.16 1.03 1.70 1.00 Tot 1.66 1.77 22.56 1.65 1.67 1.45 21.75 91.28 67.78 19.37 17.26 16.99 0.79 2.71 46.42 1.04 1.47 1.32 Mean 2.41 1.87 147.48 1.52 1.50 1.18 12.45 63.75 59.55 8.00 11.16 7.93 1.31 2.88 7.65 1.18 1.98 1.06 VSG Arch 10.88 3.70 1.52 1.06 1.24 1.03 6.62 32.15 57.21 3.52 2.87 3.46 3.08 13.50 1.07 1.02 1.30 1.02 Drive 11.66 1.60 34.84 1.16 1.45 1.00 32.70 46.59 93.17 3.86 5.85 1.62 3.01 1.53 7.18 1.42 2.72 1.00 Lab 1.13 1.14 134.37 1.14 1.14 1.11 16.57 19.62 86.08 16.86 16.71 16.44 1.11 0.86 26.94 1.06 1.16 1.14 Roof 1.20 1.20 1.18 1.12 1.19 1.15 1.27 89.40 1.01 1.09 1.16 1.38 1.15 0.75 0.85 1.60 1.31 1.14 Slate 1.20 1.20 25.26 1.12 1.21 1.01 10.86 36.13 89.51 10.12 6.12 5.69 0.89 1.32 2.28 0.99 1.16 0.98 Yard 1.27 1.27 1.42 1.26 1.03 1.04 2.72 30.94 3.28 2.30 2.42 2.49 0.92 1.46 1.46 1.09 1.40 1.01 Mean 4.56 1.68 33.10 1.25 1.06 1.05 11.76 42.47 55.04 6.29 6.51 5.18 1.69 3.24 6.64 1.20 1.51 1.05 all three cases since it includes additional degrees of freedom to compensate geomet- ric errors. Furthermore, two stereo paris with compound distortions are challenging to existing algorithms while the USR algorithm offers a stable and robust performance across all test pairs since the generalized homography model is able to cope with com- bined geometric distortions. On the other hand, the USR algorithm does not offer the best performance in geo- metric distortions since it does not take them into consideration. As shown in Tables 5.2 and 5.3, the USR-CGD algorithm has the lowest geometric distortions in most cases. 100 (a) The rectified image pair using the USR algorithm (E v =0.12) (b) The rectified image pair using the USR- CGD algorithm (E v =0.34) Figure 5.13: Subjective quality comparison of rectified image pair using (a) the USR algorithm and (b) the USR-CGD algorithm. For subjective quality comparison, we show a rectified image pair using the USR and USR-CGD algorithms in Fig. 5.13. Although the rectification error (E v ) of the USR- CGD algorithm increases slightly (from 0.12 to 0.34 pixel), its resulting image pair look much more natural since all geometric errors are reduced significantly. Generally speaking, the rectification error of the USR-CGD algorithm is comparable with that of the USR algorithm, and the increase of E v in the USR-CGD algorithm is around 0:06 0:27 pixels. Among the six benchmarking algorithms, only USR and USR-CGD achieve a mean rectification error (E v ) less than 0.5 pixel, which is the minimum requirement for stereo matching to be performed along the 1D scanline. For the subjective quality evaluation of rectified stereo image pairs of the proposed USR-CGD algorithm and four benchmarking algorithms, we select six representative images pairs and show their rectified results in Figs. 5.145.19. The vertical disparity errorE v is also listed for each subfigure. For the purpose of displayingE v , we choose ten sample matching points, which are evenly distributed along the vertical direction and draw corresponding epipolar lines. 
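The visualization just described can be reproduced with a few lines of code. The sketch below is illustrative only: it draws a horizontal line through each of the selected matched points of a rectified pair so that any residual vertical disparity is directly visible; img_l, img_r, pts_l, and pts_r are assumed inputs.

import cv2
import numpy as np

def draw_scanlines(img_l, img_r, pts_l, pts_r, n_lines=10):
    """Overlay horizontal lines through n_lines matched points of a rectified pair."""
    out_l, out_r = img_l.copy(), img_r.copy()
    order = np.argsort(pts_l[:, 1])                              # sort matches by row
    idx = order[np.linspace(0, len(order) - 1, n_lines).astype(int)]
    for i in idx:
        y_l, y_r = int(round(pts_l[i, 1])), int(round(pts_r[i, 1]))
        cv2.line(out_l, (0, y_l), (img_l.shape[1] - 1, y_l), (0, 255, 0), 1)
        cv2.line(out_r, (0, y_r), (img_r.shape[1] - 1, y_r), (0, 255, 0), 1)
    return np.hstack([out_l, out_r])                             # side-by-side comparison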
It is apparent that the proposed USR-CGD algorithm 101 Table 5.4: Performance comparison based on 100 and 300 correspondences as the input. Database MCL-SS MCL-RS Algorithm # of E v Geometric Errors E v Geometric Errors Correspondences E O E Sk E AR E R E SR E O E Sk E AR E R E SR Hartley 300 4.78 92.80 12.47 2.11 33.49 1.94 0.67 88.84 12.23 4.57 15.00 6.99 100 2202.55 93.12 15.24 8.90 38.42 18.59 110.13 86.55 16.75 3.07 24.96 3.24 Mallon 300 815.76 90.26 29.60 1.67 54.53 1.65 5.96 91.48 15.61 3.46 51.02 5.14 100 13266.94 87.72 31.39 2.30 59.87 1.94 19.72 88.98 19.97 2.27 51.77 2.23 Wu 300 555.58 100.42 51.13 372.33 73.70 80.48 364.02 92.15 55.21 348.02 69.14 29.28 100 1039.03 94.99 48.54 1070.51 82.34 9.47 1231.28 83.41 60.53 400.13 74.88 150.88 Fusiello 300 1.04 89.85 4.98 1.28 21.47 4.91 1.41 88.92 4.48 1.17 12.77 1.64 100 119.49 89.19 5.72 1.40 20.38 5.56 1.13 89.17 3.34 1.11 13.60 1.56 USR-CGD 300 0.50 89.99 2.18 1.05 24.12 1.03 0.38 89.96 1.34 1.04 9.97 1.01 100 0.52 90.01 2.88 1.01 23.28 1.04 0.26 90.00 1.36 1.04 10.17 1.03 have the best visual quality with minimal warping distortion at the cost of slightly higher rectification errors in some cases. There is another advantage of the USR-CGD algorithm. That is, it is able to provide robust performance when the number of correspondences is smaller. To demonstrate this, we conduct the same test with 100 correspondences for the MCL-SS and MCL- RS databases. The performance comparison of using 300 and 100 correspondences is shown in Table 5.4. The rectification errors of all four benchmarking algorithms increase a lot since their reliability in point matching is more sensitive to the number of corre- spondences. In contrast, the USR-CGD algorithm still maintains the rectification error at a similar level. This is because that the USR-CGD algorithm can adaptively optimize the homographies to meet pre-set conditions. A similar behavior is also observed in the degree of geometric distortions. Based on the above discussion, we conclude that the USR-CGD algorithm achieves the best overall performance in finding a proper balance between the rectification error and geometric distortions since it includes all of them in its objective function for opti- mization. 102 (a) Original unrectified image pair (b) Hartley (E v =0.146) (c) Mallon (E v =1.504) (d) Wu (E v =0.093) (e) Fuesillo (E v =2.109) (f) USR-CGD (E v =0.299) Figure 5.14: The subjective quality and the rectification error (the value in parenthesis) for Fountain2 of MCL-RS databases. (a) Original unrectified image pair (b) Hartley (E v =0.214) (c) Mallon (E v =5.484) (d) Wu (E v =4.520) (e) Fuesillo (E v =0.120) (f) USR-CGD (E v =0.101) Figure 5.15: The subjective quality and the rectification error (the value in parenthesis) for Fountain3 of MCL-RS databases. (a) Original unrectified image pair (b) Hartley (E v =0.061) (c) Mallon (E v =625.942) (d) Wu (E v =99.432) (e) Fuesillo (E v =0.133) (f) USR-CGD (E v =0.127) Figure 5.16: The subjective quality and the rectification error (the value in parenthesis) for Military of MCL-SS databases. 103 (a) Original unrectified image pair (b) Hartley (E v =0.141) (c) Mallon (E v =0.796) (d) Wu (E v =0.607) (e) Fuesillo (E v =0.240) (f) USR-CGD (E v =0.308) Figure 5.17: The subjective quality and the rectification error (the value in parenthesis) for Tot of SYNTIM databases. 
Figure 5.18: Subjective quality and rectification error (in parentheses) for Roof of the VSG database: (a) original unrectified image pair, (b) Hartley (E_v = 0.108), (c) Mallon (E_v = 0.127), (d) Wu (E_v = 0.091), (e) Fusiello (E_v = 1.068), (f) USR-CGD (E_v = 0.282).

Figure 5.19: Subjective quality and rectification error (in parentheses) for Yard of the VSG database: (a) original unrectified image pair, (b) Hartley (E_v = 0.098), (c) Mallon (E_v = 0.385), (d) Wu (E_v = 0.125), (e) Fusiello (E_v = 0.128), (f) USR-CGD (E_v = 0.119).

Chapter 6 Conclusion

In our first work, we proposed a learning-based ensemble fusion model to predict the quality score of a stereoscopic image pair that correlates highly with human opinion. Since quality prediction for a stereoscopic image pair is a complicated multi-dimensional problem involving various distortion types, an analytical quality metric based on a single formula cannot provide reliable performance in all situations. To overcome this limitation, we exploit the SVR algorithm to learn an optimized relationship between the MOS and the various distortions.

In the ensemble fusion model, we developed feature extraction schemes for the scorers to address different distortion types. Each scorer outputs a quality score for an individual distortion group at stage 1, and all the intermediate scores are fused by SVR at stage 2 to yield a final score (a minimal sketch of this two-stage fusion is given at the end of this chapter). The proposed model also has the advantage of scalability: it can be easily extended by adding new scorers according to the requirements of an application.

Experimental results over different databases demonstrated the superiority of the proposed quality index. For example, on the database with a conventional stereoscopic data format, we achieved a PCC of up to 0.95. In addition, even on the database with the MVD data format, the proposed index provides comparable prediction accuracy, while the performance of existing 2D/3D formula-based quality metrics drops considerably in the new format. This is meaningful since most emerging 3D video/image applications are likely to be based on the MVD format to take advantage of rendering technology for the sake of resource efficiency.

In the second work, we proposed a novel image rectification algorithm for uncalibrated stereo images. Based on a thorough analysis of various geometric distortions that are likely to occur in real applications, we first parameterized the homography to reduce the rectification error. Next, we proposed a constrained optimization scheme with a new cost function that adopts four newly designed geometric measures as regularization terms, which prevents severe perspective distortions from being introduced in the rectified images while maintaining a small rectification error (a sketch of this regularized cost is given below). Experimental results demonstrate that the proposed algorithm outperforms existing algorithms by a significant margin in terms of both objective and subjective quality.
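To make the structure of this cost function concrete, a minimal sketch is included below. It assumes the cost is the rectification error plus a weighted sum of geometric penalty terms; the parameterization of the homographies, the individual geometric measures, the weights, and the choice of optimizer are placeholders rather than the exact quantities defined in Chapter 5.

    import numpy as np
    from scipy.optimize import minimize

    def usr_cgd_cost(params, build_homographies, rectification_error,
                     geometric_measures, weights):
        """Regularized cost: rectification error plus weighted geometric penalties.

        params             : 1-D vector parameterizing the left/right homographies.
        build_homographies : callable mapping params -> (H_left, H_right).
        rectification_error: callable mapping (H_left, H_right) -> E_v in pixels.
        geometric_measures : list of callables, one per geometric distortion term.
        weights            : regularization weight for each geometric measure.
        """
        H_left, H_right = build_homographies(params)
        cost = rectification_error(H_left, H_right)
        for weight, measure in zip(weights, geometric_measures):
            cost += weight * measure(H_left, H_right)
        return cost

    # Illustrative call; the initial guess, measures, and weights are placeholders.
    # result = minimize(usr_cgd_cost, x0=np.zeros(num_params),
    #                   args=(build_homographies, rectification_error,
    #                         geometric_measures, weights))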
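For the first framework, the two-stage fusion summarized above can likewise be illustrated with a minimal sketch. It assumes each stage-1 scorer has already produced one score per image pair and that the stage-2 fuser is a support vector regressor; the SVR hyperparameters shown are illustrative defaults, not the tuned values used in our experiments.

    import numpy as np
    from sklearn.svm import SVR

    def train_fuser(stage1_scores, mos):
        """Fit the stage-2 fuser that maps intermediate scores to a final quality index.

        stage1_scores : (N, K) array; column k holds the score of the k-th stage-1 scorer.
        mos           : (N,) array of mean opinion scores from the subjective test.
        """
        fuser = SVR(kernel="rbf", C=1.0, epsilon=0.1)  # hyperparameters are illustrative
        fuser.fit(stage1_scores, mos)
        return fuser

    def predict_quality(fuser, stage1_scores):
        """Fuse the K intermediate scores of one or more test pairs into final scores."""
        return fuser.predict(np.atleast_2d(stage1_scores))

In this arrangement, adding a new scorer for a new distortion group simply appends one more column to the intermediate score matrix and requires retraining only the stage-2 fuser, which is what gives the model its scalability.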