University of Southern California
Ph.D. Dissertation
COMPRESSION OF SIGNAL ON GRAPHS WITH THE
APPLICATION TO IMAGE AND VIDEO CODING
Author:
Yung-Hsuan Chao
Supervisor:
Dr. Antonio Ortega
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
Doctor of Philosophy
(ELECTRICAL ENGINEERING)
December 2017
Copyright 2017 Yung-Hsuan Chao
Abstract
In this Ph.D. dissertation, we discuss several graph-based algorithms for transform coding
in image and video compression applications. Graphs are generic data structures that are
useful in representing signals in various applications. Unlike classic transforms such as
the Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform (DWT), graphs
can represent signals on irregular and high dimensional domains, e.g., social networks and
sensor networks. For regular signals such as images and videos, graphs can adapt to local
characteristics such as edges and therefore provide more flexibility than conventional
transforms. A frequency interpretation for signals on graphs can be derived using the Graph
Fourier Transform (GFT). By properly adjusting the graph structure, e.g., connectivity and
weights, based on signal characteristics, the GFT can provide compact representations even
for signals with discontinuities. However, the GFT has high implementation complexity,
making it less applicable to signals of large size, e.g., video sequences. In our work, we
develop a transform coding scheme based on a low complexity lifting transform on graphs.
More specifically, we focus on two important problems in the design of a lifting transform,
namely, the design of the bipartition and the bipartite graph approximation. The two parts
are optimized in terms of energy compaction for a Gaussian Markov Random Field (GMRF),
a model that has been widely used for the statistics of image data.
As applications, we consider two types of multimedia signals, covering both regular and
irregularly distributed signals. As the first type of signal, we consider the compression of
intra-predicted video residuals, which are regular, with pixels residing on the 2D grid.
However, these signals contain significant edge structures, which cannot be efficiently
represented by the transforms in existing coding standards. With the proposed graph lifting
transform based on local edges, we demonstrate significant gains compared to state-of-the-art
DCT based coding, with performance comparable to that achieved by the high complexity GFT.
We also discuss different types of edge models for video residuals and propose a new model
for ramp edges, which shows promising results in the GFT compared to the conventional step
edge model. As a second type of signal, we propose a coding scheme for non-demosaicked
light field images. Similar to a traditional digital camera, a light field camera captures
color information using a photo sensor embedded with a color filter array (CFA). On the
captured image, each pixel contains a single color component (out of R, G, and B), and the
components are distributed according to the Bayer pattern. However, after conversion to an
array of sub-aperture images, which is a representation commonly used for light field
processing and display, the Bayer pattern no longer holds and the pixels of each color
component are distributed irregularly in space. In order to compress such data, a
conventional scheme using the DCT requires demosaicking during conversion, which greatly
increases the amount of data to be coded. With a graph based approach, the original signal
can be efficiently encoded without any pre-processing step, avoiding the redundancies
introduced by demosaicking. We also discuss an intra-prediction algorithm and optimal graph
construction for irregularly spaced pixels. The results using the proposed scheme with the
graph based lifting transform show large gains in compression compared to DCT based coding
at high bit rates, which are critical for archival scenarios and instant camera storage.
To my family...
Acknowledgements
This dissertation would not have been possible without the help and support of my teachers,
collaborators, family, and friends. First and foremost, I would like to thank my advisor,
Professor Antonio Ortega, for his supervision, patience, and guidance. His training in
critical thinking and in formulating and solving problems has helped me evolve from a naive
student into a mature researcher. I wouldn’t say the process was easy. Actually, there were
some really tough moments when I felt lost and almost gave up on my Ph.D. studies, but his
considerable encouragement kept me going. I could not have asked for a better advisor and
mentor.
It was my pleasure to be given the opportunity to work with Professor Gene Cheung at the
National Institute of Informatics (NII) on the topic of light field image compression during
the last year of my Ph.D. The discussions with him have been insightful. The work would not
have been completed without his generous feedback and suggestions.
I would also like to thank Professor C.-C. Jay Kuo and Professor Ramesh Govindan for
serving as my dissertation committee members, as well as Professor Justin P. Haldar and
Professor Shrikanth S. Narayanan for being committee members and Professor David Taubman
from the University of New South Wales (UNSW) for being the guest member in my qualifying
exam. Their comments have been a great help in improving this dissertation.
I would like to express my gratitude to all my teachers at USC. I have greatly benefited
from their lectures. I must also acknowledge all the input from my collaborators and group
members, Dr. Hilmi E. Egilmez, Dr. Sunil K. Narang, Dr. Akshay Gadde, Aamir Anis,
Jiun-Yu (Joanne) Kao, Eduardo Pavez, Dr. Wei Hu, and Jin Zeng. I would like to give
special thanks to Dr. Yuwen He, the mentor of my internship at InterDigital. His training
really stimulated my interest in video coding, and I will never forget his encouragement in
my pursuit of the Ph.D.
Finally, I would like to take this opportunity to express my profound gratitude to my family
for their love and support. I would especially like to thank my brother, Wei-Lun (Harry)
Chao. Thank you for the advice in research. Thank you for always being there for me. Thank
you for being the role model for me in every way.
Contents
Abstract
Acknowledgements
List of Figures
List of Tables
List of Abbreviations
List of Symbols
1 Introduction
1.1 Related Work
1.2 Motivation
1.2.1 Low Complexity Graph Transform for Image/Video Coding
1.2.2 Application for Light Field Image Compression
1.2.3 Edge Models in the Graph based Transform
1.3 Contributions
1.3.1 Graph based Lifting Transform with Optimized Bipartition
1.3.2 Graph based Compression for Pre-demosaic Light Field Image
1.3.3 Ramp Model in the Graph based Transform
1.4 Outline of the thesis
2 Graph based Transforms
2.1 Notations
2.2 Graph Fourier Transform (GFT)
2.2.1 Signal Variation on Graphs
2.2.2 Optimality of GFT in Signal Compression
2.3 Lifting Transforms
2.3.1 Preliminaries
2.3.2 Localized Transform Design
2.4 Lifting Transform on Graphs [56]
2.4.1 Prediction Filter Design
2.4.2 Update Filter Design
2.4.3 Graph Reduction
2.4.4 Complexity of Lifting Scheme
2.5 Summary
3 Bipartition in Lifting Transforms
3.1 Related Work
3.1.1 MaxCut based Bipartition
3.1.2 Prediction Error Minimization
3.2 Optimized Bipartition for GMRF modeled signal
3.2.1 Relationship to Noise Model (NM) and Moving Average (MA) models
3.2.2 Gaussian Markov Random Field (GMRF) Model
3.2.3 Optimized Bipartition for GMRF
3.2.4 Analysis of Proposed Bipartition
3.3 Bipartite Graph Formulation
3.3.1 Kron Reduction based Reconnection
3.3.2 Iterative Kron Reduction
3.3.3 Probabilistic Interpretation
3.4 Experiments
3.5 Summary
4 Video Coding Application
4.1 Graph Construction
4.2 Bipartition Scheme
4.3 Transform Design
4.4 Overhead Signaling and Entropy Coding
4.5 Experimental Results
4.6 Summary
5 Application: Pre-demosaic Light Field Image Compression
5.1 Introduction
5.2 Notations and Background
5.2.1 Notations
5.2.2 Background: Compression after De-mosaicking
5.3 Proposed scheme: Compression before De-mosaicking
5.4 Proposed Intra-prediction Scheme
5.4.1 Gradient Estimation
5.4.2 Structure Tensor Estimation
5.4.3 Data-adaptive Kernel Regression
5.5 Proposed Transform Coding
5.5.1 Graph Construction Based on Geometric Distance
5.5.2 Graph Learning based on Statistics Modeling
5.5.3 Graph-based Lifting Transform
5.6 Experiments
5.6.1 Experimental Setting
5.6.2 Results
5.7 Summary
6 EA-GBT: Step/Ramp Edge Models for Video Compression
6.1 Introduction
6.2 Edge Models for Residual Signals
6.2.1 Ramp-Edge Model
6.2.2 Experimental Justification of the Edge Models
6.3 Ramp Coding and Graph Construction
6.3.1 Arithmetic Ramp Edge Coding (AREC)
6.3.2 Graph Construction from the Edge Map
6.4 Experimental Results
6.5 Summary
7 Conclusions and Future Work
7.1 Conclusion
7.2 Future Work
Bibliography
A Reconnection using Kron Reduction
List of Figures
1.1 RGB color components arranged by Bayer pattern GRBG
1.2 Transformation of Bayer patterned RGB color components into formats suitable for image/video codecs
2.1 1-level lifting scheme: forward and inverse transforms
2.2 The lifting scheme with multi-level decomposition
2.3 Example of localized lifting transform on a 1-dimensional signal
2.4 Example of graph downsampling by connecting 2-hop neighbors in the previous level
2.5 Example of graph downsampling using Kron reduction
2.6 Example of graph downsampling using Kron reduction and sparsification
3.1 Example of $\mathcal{U}$ node selection by the greedy algorithm of MAP error minimization
3.2 Example of a block with $\mathcal{P}$ nodes (blue nodes) with low connectivity to the update set (red nodes)
3.3 Example of iterative Kron reduction by removing 1 node at a time
3.4 Example of bipartite graph construction using two different schemes
3.5 Reconstruction error (MSE) after truncating coefficients in $\mathcal{P}$ for different bipartition rates
4.1 Example of a 4-connected grid graph
4.2 Boundary extension for pixels around (a) block boundaries and (b) edges
4.3 The comparison between the proposed lifting scheme, MaxCut based lifting, and MaxCut based lifting with the proposed re-connection technique
4.4 Encoder for intra-predicted videos with mode selection between DCT and the graph based lifting transform
5.1 Conceptual system of a lenselet-based plenoptic camera
5.2 Conventional encoder for light field images. The demosaicking and calibration processes are applied before compression
5.3 Proposed LF encoder, where compression is applied on the raw lenselet data without demosaicking
5.4 Sparsely distributed G components on one sub-aperture image (Figure Friends1 from the EPFL light field dataset)
5.5 Decoder in the proposed LF coding scheme, where demosaicking and calibration are applied to generate the full color sub-aperture image array
5.6 Proposed intra-prediction system for sparsely distributed pixels
5.7 4 decoded reference blocks and the vectors indicating their relative locations to the input block I and the estimated edge direction
5.8 Illustration of kernel size and shape in the smooth block (left) and the block with a strong edge (right)
5.9 A part of the graph constructed for irregularly placed R components. In (a), the graph using the 4 nearest neighbor method is shown. In (b), each pixel is connected to 2 neighbors in the horizontal and vertical orientations respectively
5.10 Graph structure optimized for classes corresponding to intra-prediction angle (1) $\theta = \frac{3\pi}{8}$ (nearly vertical) and (2) $\theta = 0$ (horizontal). The link color indicates the associated weight and the node color indicates the associated self-loop weight (darker: larger weight)
5.11 Average PSNR over R, G, and B components for test images (a) Friends1, (b) Bikes, (c) Flowers, and (d) Ankylosaurus & Diplodocus
6.1 (a) The step function and (b) ramp function for edge modeling
6.2 1-D line graph with weak link weights $w$ for the ramp spanned from $x_i$ to $x_{i+\ell}$
6.3 The optimal line graph for INTER predicted residuals
6.4 The optimal line graph for INTRA predicted residuals
6.5 (a) An example of a binary ramp map with pixels $p_1, p_2, p_5$ indicating the ramp centers, and (b) the chain code formed by traversing through the consecutive ramp pixels
6.6 (a) The 8 directions that can be taken by $c_i$. (b) The potential traversing directions from $p_i$ to $p_{i+1}$ given the direction from $p_{i-1}$ to $p_i$
6.7 (a) An example of AEC coding on step edges and (b) AREC on ramp edges
6.8 An example of a residual block and the corresponding 4-connected grid graph (red nodes indicate the detected ramp centers)
6.9 Examples of weak link assignment based on ramp positions
6.10 The video encoder with hybrid EA-GBT/DCT mode selection based on rate-distortion optimization (RDO)
List of Tables
4.1 PSNR-bitrate comparison with Bjontegaard metric. The negative value for rate indicates the average bitrate reduction against DCT, and the positive PSNR shows the average PSNR gain
6.1 Bitrate comparison between AEC and AREC. BPP indicates the bitrate gain of AREC over AEC
6.2 Bjontegaard Delta Criterion for INTER predicted videos: PSNR and bitrate gain of EA-GBT-step and EA-GBT-ramp over DCT
6.3 Bjontegaard Delta Criterion for INTRA predicted videos: PSNR and bitrate gain of EA-GBT-step and EA-GBT-ramp over DCT
List of Algorithms
1 Greedy solution for bipartition
2 Kron reduction based reconnection for $\mathcal{P}$
3 Boundary/Edge extension for sampling on graphs
4 Bipartition in multi-level lifting transform
5 Arithmetic Ramp Edge Coding (AREC)
List of Abbreviations
DCT       Discrete Cosine Transform
DWT       Discrete Wavelet Transform
GFT       Graph Fourier Transform
HEVC      High Efficiency Video Coding
JPEG      Joint Photographic Experts Group
CDF       Cohen-Daubechies-Feauveau
MST       Maximum Spanning Tree
NM        Noise Model
MA        Moving Average
EA-GBT    Edge Adaptive Graph Based Transform
GMRF      Gaussian Markov Random Field
AR        Auto Regressive
LF        Light Field
CFA       Color Filter Array
ML        Maximum Likelihood
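List of Symbols
General Rule
$i, j, N, \ldots$    Scalar (lower- and uppercase normal)
$\mathcal{S}, \mathcal{N}, \mathcal{P}, \ldots$    Set (uppercase calligraphic)
$\mathbf{f}, \mathbf{c}, \mathbf{s}, \ldots$    Vector (lowercase bold)
$f_i, c_i, s_i, \ldots$    Vector element at index $i$
$\mathbf{f}_{\mathcal{S}}, \mathbf{c}_{\mathcal{S}}, \mathbf{s}_{\mathcal{S}}, \ldots$    Sub-vector extracted from positions indexed by set $\mathcal{S}$
$\mathbf{A}, \mathbf{T}, \mathbf{P}, \ldots$    Matrix (uppercase bold)
$A_{i,j}, T_{i,j}, P_{i,j}, \ldots$    Matrix element at index $(i, j)$
$\mathbf{A}_{\mathcal{S},\mathcal{N}}, \mathbf{T}_{\mathcal{S},\mathcal{N}}, \mathbf{P}_{\mathcal{S},\mathcal{N}}, \ldots$    Sub-matrix from row/column positions indexed by sets $\mathcal{S}$/$\mathcal{N}$
Symbol List
$\mathcal{G}$ | $\mathcal{V}$ | $\mathcal{E}$    Graph | Node set | Link set
$\mathbf{A}$    Adjacency matrix
$\mathcal{N}(i)$    Set of neighbouring nodes connecting to node $v_i$
$\deg(i)$ | $\mathbf{D}$    Degree of node $v_i$ | Degree matrix
$\mathbf{L}$    Combinatorial Laplacian
$\mathbf{L}_{sym}$    Symmetric normalized Laplacian
$\mathbf{L}_{G}$    Generalized Laplacian
$\mathcal{P}$ | $\mathcal{U}$    Prediction set | Update set in lifting
$\mathbf{P}$ | $\mathbf{U}$    Prediction | Update filter in lifting
$\mathrm{Tr}(\cdot)$ | $\det(\cdot)$    Trace | Determinant
$\mathrm{diag}(\mathbf{v})$    Diagonal matrix with the elements of $\mathbf{v}$ on the main diagonal
$\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$    Multivariate Gaussian with mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Sigma}$
$\mathbf{Q}$    Precision (inverse covariance) matrix of Gaussian
$O(\cdot)$    Big O notation in complexity analysis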
Chapter 1
Introduction
A graph is a data structure that consists of a set of nodes (or vertices) connected by links
(or edges). Graphs provide natural representations for data in many modern fields such as
sensor networks and social networks, where data samples are distributed irregularly in high
dimensional space, and also for traditional regular signals such as audio and images. In an
image, each data point or data patch, e.g., a pixel or a pixel patch, can be denoted as a
node, and a graph signal is a function that assigns a value to each node, e.g., pixel
intensity. A link can be weighted with a value defined according to the similarity between
the two nodes it connects. The definition of similarity depends on the application. For
example, in social networks, two nodes (users) can be considered similar, and thus have a
large link weight between them, if the users reside in the same city or have mutual friends.
In images, two nodes (pixels) can be considered similar if the geometric distance between
them is small and/or if the pixels have similar intensity values. In the last decade, there
has been growing interest in generalizing the theories and algorithms that were first
developed for traditional signal processing to signals on graphs. Some of these techniques
include de-noising, filtering, clustering and compression. In this thesis, we discuss the
problem of compression for data defined on graphs. Specifically, we focus on applications to
image and video compression.
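The pixel-graph description above can be sketched as follows. This is a minimal illustration rather than the construction used later in the thesis: the function name `grid_graph_weights`, the Gaussian kernel on intensity difference, and the parameter `sigma` are assumptions made for the example; the text only states that similar pixels receive larger link weights.

```python
import numpy as np

def grid_graph_weights(img, sigma=10.0):
    """Weighted adjacency matrix of a 4-connected pixel grid.

    Assumption: weights follow a Gaussian kernel on intensity difference.
    """
    h, w = img.shape
    n = h * w
    W = np.zeros((n, n))
    for r in range(h):
        for c in range(w):
            i = r * w + c
            # Link to the right and bottom neighbours; symmetry fills the rest.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < h and cc < w:
                    j = rr * w + cc
                    diff = float(img[r, c]) - float(img[rr, cc])
                    W[i, j] = W[j, i] = np.exp(-diff**2 / (2 * sigma**2))
    return W

# 2x4 toy image with a vertical edge between columns 1 and 2
img = np.array([[10, 12, 200, 202],
                [11, 13, 201, 203]])
W = grid_graph_weights(img)
```

In this toy image the links that cross the intensity edge receive weights close to zero, while links inside each flat region stay close to one.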
Transform coding techniques, including those based on the Discrete Cosine Transform
(DCT) and the Discrete Wavelet Transform (DWT), are widely used for multimedia
compression. These methods exploit the redundancies within smooth signals, projecting
signals onto a frequency domain in which they have a compact representation. However, these
transforms are limited to signals on regular 1D and 2D grids. Moreover, the transforms
are designed under the assumption that the signal statistics are stationary. In order to
represent data with locally variable characteristics and data defined on irregular domains,
researchers have proposed the Graph Fourier Transform (GFT). Similar to the DCT, the GFT
provides a frequency interpretation for signals on graphs. By properly designing the link
weights based on signal characteristics, structures such as image edges, which are
represented by relatively high frequencies in conventional transforms, can be coded at a
fairly low bit rate.
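The effect of edge-adapted link weights can be illustrated numerically. The sketch below assumes a 1-D line graph, the combinatorial Laplacian, and a hypothetical weak-link weight of 1e-3; the helper names are illustrative and not taken from the thesis.

```python
import numpy as np

def line_graph_laplacian(weights):
    """Combinatorial Laplacian L = D - A of a path graph."""
    n = len(weights) + 1
    A = np.zeros((n, n))
    for i, w in enumerate(weights):
        A[i, i + 1] = A[i + 1, i] = w
    return np.diag(A.sum(axis=1)) - A

def gft(L, x):
    """Project x onto the Laplacian eigenvectors (ascending eigenvalues)."""
    _, U = np.linalg.eigh(L)
    return U.T @ x

def low_freq_energy(c, k=2):
    """Fraction of signal energy in the k lowest-frequency coefficients."""
    return np.sum(c[:k] ** 2) / np.sum(c ** 2)

# Piecewise-constant signal with a step between samples 3 and 4
x = np.array([1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0])

uniform = line_graph_laplacian(np.ones(7))
adapted_w = np.ones(7)
adapted_w[3] = 1e-3   # weak link across the step (hypothetical value)
adapted = line_graph_laplacian(adapted_w)

e_uniform = low_freq_energy(gft(uniform, x))
e_adapted = low_freq_energy(gft(adapted, x))
```

With uniform weights the basis behaves like a DCT and part of the step energy leaks into higher frequencies; with the weak link, almost all of the energy falls into the two lowest-frequency coefficients.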
However, the GFT is a global transform and has high implementation complexity.
Hence it is difficult to apply to large data or in applications such as video coding, where
complexity is a significant concern. In this dissertation, we develop a low complexity
graph based transform using the lifting scheme. The lifting scheme is a technique for
obtaining a multi-resolution signal representation and is composed of three building blocks,
namely, bipartition, prediction, and update. In bipartition, the nodes of the input graph
signal are split into an update set and a prediction set. Then, the signal in the prediction
set is predicted from the update set, resulting in prediction residuals stored as high
frequency coefficients. Next, the signal in the update set is updated using the filtered
prediction residuals, giving rise to a smooth approximation stored as low frequency
coefficients. In predicting (updating) the prediction (update) set, only the signal in the
opposite set is used, i.e., only the weights (similarities) on links connecting nodes in
different sets are exploited. Therefore, the prediction and update processes for a graph
signal can be seen as processes applied on an approximated bipartite graph, which contains
only links connecting a node in the prediction (update) set to a node in the update
(prediction) set. Transforms implemented with the lifting scheme are guaranteed to be
invertible regardless of the design of the three building blocks, and therefore provide
considerable flexibility in design. In order to address the complexity concern in the GFT,
in this thesis we design the prediction and update filters to be localized, and focus
on two problems:
1. How to design a proper bipartition in lifting;
2. How to construct a bipartite approximation for a graph given the vertex bipartition
in order to improve coding efficiency, given the use of localized prediction and update filters.
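The three building blocks described above can be sketched for a single lifting level. The filters here are assumptions chosen for illustration (a weighted average of opposite-set neighbours for prediction, and half of that average for update); they are not the filters designed later in the thesis, and the function names are hypothetical. The point of the sketch is that the inverse simply undoes the two steps in reverse order, so invertibility holds for any choice of bipartition and filters.

```python
import numpy as np

def row_normalize(M):
    """Scale each row to sum to 1; all-zero rows are left as zeros."""
    s = M.sum(axis=1, keepdims=True)
    return np.divide(M, s, out=np.zeros_like(M), where=s > 0)

def lifting_forward(x, W, upd, pred):
    # Only links between the two sets are used (bipartite approximation).
    P = row_normalize(W[np.ix_(pred, upd)])        # prediction filter (assumed)
    U = 0.5 * row_normalize(W[np.ix_(upd, pred)])  # update filter (assumed)
    d = x[pred] - P @ x[upd]   # prediction residuals: high-frequency coefficients
    s = x[upd] + U @ d         # updated samples: low-frequency coefficients
    return s, d, P, U

def lifting_inverse(s, d, P, U, upd, pred):
    x = np.empty(len(upd) + len(pred))
    x[upd] = s - U @ d         # undo the update step
    x[pred] = d + P @ x[upd]   # undo the prediction step
    return x

# Path graph with 6 nodes, unit weights, even nodes in the update set
n = 6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
upd, pred = [0, 2, 4], [1, 3, 5]
x = np.arange(n, dtype=float)   # linear ramp
s, d, P, U = lifting_forward(x, W, upd, pred)
x_rec = lifting_inverse(s, d, P, U, upd, pred)
```

For the linear ramp, the interior prediction nodes are predicted exactly from their two neighbours, so their residuals vanish; only the boundary node, which has a single neighbour, leaves a nonzero residual.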
As applications, we discuss the use of the proposed graph based lifting transform for both
regular and irregular multimedia signals. As an example of regular signals, we consider the
compression of intra-predicted video residuals, where graphs are designed based on the edge
structure within the data. We also describe a hybrid coding scheme that provides
optimized transform selection according to coding efficiency. As for irregular signals, we
discuss the compression of non-demosaicked light field (LF) images. We develop a coding
scheme for raw LF data that avoids pre-processing steps such as demosaicking, which
significantly increase redundancy within the data. Without demosaicking, the LF data, after
being converted to an array of sub-aperture images, contains sparsely distributed pixels.
In our scheme, the pixels are considered vertices of a graph and encoded using a graph-based
lifting transform. We also describe the intra-prediction algorithm and optimal graph
construction applied to pixels with irregular distribution. In the last part of the
dissertation, we consider the problem of edge modeling for different types of predicted
video residuals, namely the intra- and inter-predicted residuals, and the corresponding
optimization of the graph structure in the GFT to achieve improved coding efficiency. In the
next section, we review some related work in image and video compression, followed by a
detailed description of the research problems in this dissertation and our contributions.
1.1 Related Work
Transforms such as the Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform
(DWT) have been widely used in conventional multimedia signal compression. For example, in
image and video coding standards including JPEG, MPEG, H.264, and the most recent HEVC, the
DCT is used to obtain a sparse representation of signals in a frequency domain. However, for
signals containing high frequency structures, e.g., edges in images, these standard
transforms are likely to produce large coefficients for high frequency basis functions in
the transformed domain, which require a high bit rate to represent. Besides, the standard
transforms can only deal with conventional signals that are defined on regular domains such
as 1D space and 2D grids. In order to deal with signals having high frequency edges,
researchers have proposed directional transforms, which can incorporate the edge directions
in the basis functions. Some examples include the directional DCT [83], bandelets [42], and
curvelets [38]. Filtering in these works is performed along the edge directions, where
neighbouring pixels tend to have high correlation, thus avoiding filtering across edges.
However, many of these directional transforms require a high complexity pre-classification,
which divides signals into multiple regions of uniform geometric flow. Also, most of these
transforms are restricted to a given number of edge directions, which limits the adaptation
to signals with more complicated characteristics, e.g., corners.
Edge-Adaptive Graph based Transform
In [28], an edge adaptive transform based on graphs is proposed for depth image coding. The
use of graphs allows the incorporation of signal characteristics into the link weights, which
removes limitations on the specific edge directions that can be represented. Furthermore,
graphs provide natural representations for signals in irregular and high dimensional spaces,
and can therefore be used in applications such as social and sensor networks. For signals
defined on graphs, a frequency interpretation can be derived using the eigenvectors and
eigenvalues of the graph Laplacian, which models the variation of signals on graphs taking
into account their connectivity. The transform is called the Graph Fourier Transform (GFT).
A signal is considered smooth on a graph if the variation is small between any two nodes
connected by a link with large weight. Such signals have their energy mostly compacted in
the basis vectors associated with small eigenvalues. Therefore, one can obtain sparse
representations for signals with edges if the associated graph is designed so that the links
connecting nodes separated by image edges have small weights. The GFTs of graphs designed
based on edge structures are also called Edge-Adaptive Graph based Transforms (EA-GBTs), and
have been successfully applied in applications such as depth map compression [28, 76], image
compression [27], and video compression [32]. In [84], a theoretical analysis of the
optimality of the GFT in compression is provided from a probabilistic point of view. The
authors show that given a graph, there is a unique underlying Gaussian Markov Random Field
(GMRF) that satisfies the conditional independence assumptions defined by the graph
connectivity. The GFT associated with the graph is shown to be equivalent to the KLT of this
GMRF model, and is therefore optimal in terms of decorrelation.
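The equivalence between the GFT and the KLT of a GMRF can be checked numerically on a toy graph. The sketch below makes one explicit assumption that is not stated in the text: the precision matrix is taken as the combinatorial Laplacian plus a small multiple of the identity (a hypothetical self-loop weight `delta`), so that it is invertible. The random weights and variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
# Random symmetric positive weights on a complete graph (toy example)
A = np.triu(rng.uniform(0.1, 1.0, size=(n, n)), k=1)
A = A + A.T
L = np.diag(A.sum(axis=1)) - A       # combinatorial Laplacian

delta = 0.1                          # hypothetical self-loop weight
Q = L + delta * np.eye(n)            # assumed precision matrix of the GMRF
Sigma = np.linalg.inv(Q)             # covariance matrix of the GMRF

lam_L, U_gft = np.linalg.eigh(L)     # GFT basis
lam_S, U_klt = np.linalg.eigh(Sigma) # KLT basis

# Expressing the covariance in the GFT basis should diagonalize it
C = U_gft.T @ Sigma @ U_gft
off_diag = C - np.diag(np.diag(C))
```

Under this assumption the covariance eigenvalues are 1 / (lambda + delta), so the low graph frequencies (small lambda) carry the largest variance, which matches the energy compaction argument above.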
1.2 Motivation
1.2.1 Low Complexity Graph Transform for Image/Video Coding
Despite the advantages described in the previous section, a major challenge in using the
GFT is its high computational complexity. There are two sources of complexity. First,
the transform basis vectors are generated using eigen-decomposition, with complexity up to
$O(N^3)$. Second, applying the transform to a signal has complexity $O(N^2)$, because in
general the corresponding matrix multiplication does not have a fast algorithm. Although the
first source of complexity is more significant, this computation can be performed offline if
the graph connectivity does not change. Therefore, in a real application, the complexity can
be reduced using pre-computation. Examples include the work in [76] and [32]. In [76], the
$k$ most utilized graph structures are selected in advance and their eigen-decompositions
are pre-computed offline. In [32], on the other hand, graph templates are generated based
on statistical observation. Nevertheless, the second source of complexity, i.e., computing
the transform coefficients for each block, is unavoidable, and therefore limits the usage of
GFT based algorithms for large $N$ or in applications where complexity is a major concern.
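The split between offline and online cost can be sketched as follows. This is an illustration of the pre-computation idea only, not the method of [76] or [32]; the function names and the two graph templates are hypothetical.

```python
import numpy as np

def precompute_bases(laplacians):
    """Offline: one eigen-decomposition per graph template, O(N^3) each."""
    return [np.linalg.eigh(L)[1] for L in laplacians]

def gft_block(bases, template_id, x):
    """Online: dense matrix-vector product, O(N^2) per block."""
    return bases[template_id].T @ x

def path_laplacian(weights):
    A = np.diag(weights, 1) + np.diag(weights, -1)
    return np.diag(A.sum(axis=1)) - A

# Two hypothetical templates for N = 8: uniform links, and a weak middle link
templates = [path_laplacian(np.ones(7)),
             path_laplacian(np.array([1, 1, 1, 0.01, 1, 1, 1]))]
bases = precompute_bases(templates)

x = np.arange(8, dtype=float)
coeffs = gft_block(bases, 1, x)
x_rec = bases[1] @ coeffs   # inverse transform (the basis is orthonormal)
```

Even with the bases stored in advance, every block still needs a dense N x N product, which is the cost that remains unavoidable in the discussion above.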
To address the complexity issue in the GFT, we aim to design a transform with low complexity
in signal projection that still keeps the advantages of the GFT in signal adaptation.
We design such a transform using the lifting approach. The lifting transform for graph
signals has been applied successfully in signal compression for different applications,
including data gathering in Wireless Sensor Networks (WSNs) [67] and video coding [53, 54].
In these works, the transforms are designed to be localized in the sense that the filtering
of one node only takes information from the nodes that are directly connected to it. As a
result, for locally connected graphs, the computation of the transform coefficients requires
only $O(N \log N)$ complexity, while for the GFT the computation costs up to $O(N^2)$.
Moreover, the prediction and update functions are functions of the link weights, which model
the pair-wise similarities between nodes. Therefore, the transform avoids filtering across
dissimilar nodes, e.g., pixels across image edges, and thus few significant high frequency
coefficients are produced. Although the lifting transform has provided good compression
performance in many applications, only a limited amount of work has addressed its optimality
in compression. In this dissertation, we discuss the optimization of the lifting transform.
Specifically, we focus on optimizing the bipartition step in the lifting transform design in
order to ensure energy compaction in the transformed frequency domain.
1.2.2 Application for Light Field Image Compression
Light Field (LF) imaging separately captures light rays arriving from different directions at
each pixel in an image. With the additional information on ray directions within the light
field image, useful applications including multi-view rendering [7, 75], depth estimation
[6, 29] and re-focusing [5, 11] become possible. In a lenselet-based plenoptic light field
camera [63], which is the most commonly used light field camera nowadays, an array of
microlenses is placed in front of the image sensor in order to separately capture the
different directional rays arriving at an image pixel. Due to the structure of the
microlenses, each point on the focal plane is mapped onto a patch of pixels, in which each
pixel corresponds to a specific ray direction, instead of onto a single pixel position as in
a traditional digital camera. The captured raw image is commonly called a lenselet image,
and is typically converted into a series of 2D sub-aperture images before compression and
display. Each sub-aperture image collects pixels of the same ray direction.
Figure 1.1: RGB color components arranged by Bayer pattern GRBG
Figure 1.2: Transformation of Bayer patterned RGB color components into formats
suitable for image/video codecs
Similar to the traditional digital camera, a light field camera uses Bayer patterned color
filters on its sensor to capture color information, and on the resulting lenselet image, each
pixel position will contain only one color component out of R, G, and B, as shown in Fig.
1.1. In order to generate a full color RGB light field image, a lenselet image typically goes
through demosaicking before being converted to sub-aperture images. Each sub-aperture
image can be seen as one traditional 2D picture of a specific ray angle. Therefore, in the
existing compression schemes [3, 15, 21, 30, 48], existing video compression standards
have been adopted to encode the series of sub-aperture images. For example, in [3], sub-aperture
images are arranged, in raster and spiral orders, into pseudo-sequences that can
be input into video codecs. In [15], a new mode, called Self Similarity (SS), has been
proposed and included as an intra-prediction mode in HEVC standard, in order to exploit
the similarity between pixels from adjacent ray directions. However, the amount of data
is greatly increased by demosaicking, in which the missing color components at each pixel are
interpolated from the neighbouring pixels. Although in theory the same compression rate can be
achieved before and after interpolation, existing image and video coding techniques including
JPEG and HEVC usually do not exploit this knowledge in the encoding algorithm. Therefore,
in this thesis, we propose a novel coding scheme that directly encodes the raw lenselet image
before demosaicking, avoiding the redundancies introduced by the demosaicking process. A
similar idea has been applied for conventional image [43, 44] and video compression [13].
In these works, the algorithms directly encode the non-demosaicked (Bayer patterned) raw
image/video, and apply demosaicking on the decoder side to the decoded Bayer patterned image
and video. For encoding, the RGB components arranged according to the Bayer pattern will
first be transformed into (rectangular) formats suitable for image and video codecs, as shown
in Fig. 1.2, through simple down-sampling and rotation, before being compressed with the
corresponding standards such as JPEG and HEVC. However, unlike image and video, pixels in
light field data, after the conversion to sub-aperture images, will no longer be distributed based
on Bayer pattern. Instead, the distribution will be highly irregular within each sub-aperture
image, making it difficult to encode using conventional coding schemes. In our work,
we develop a novel coding scheme for non-demosaicked light field images, where sparsely
and irregularly distributed pixels within each sub-aperture image are connected as a graph
and coded with a graph based transform. We also extend intra-prediction to pixels that are
distributed sparsely, in order to exploit the correlation among nearby pixels within each sub-aperture
image.
1.2.3 Edge Models in the Graph based Transform
The Edge Adaptive Graph based Transform (EA-GBT) has shown promising results in compressing
images with edges. The edge model in most graph designs is based on the assumption
that all edges can be represented with ideal step functions [28, 76]. However, edges with sharp
step transitions rarely exist in natural images. Instead, most of the edges in images are ramps,
as pointed out in [61]. In this thesis, we present an alternative graph construction based on
a ramp edge model. As an application, we consider the compression of videos with different
types of prediction. In addition, the signaling of graph geometries for ramp edges is also
discussed.
1.3 Contributions
1.3.1 Graph based Lifting Transform with Optimized Bipartition
In our work, we introduce an optimization problem for the bipartition in the lifting scheme, based on
a generative GMRF model. The optimal bipartition maximizes the energy compaction in the
update set. We also provide a greedy solution for finding the right bipartition for different
sizes of prediction and update sets. The resulting bipartition has a nice physical interpretation
in that the regions with more variation in graph structure, which are more difficult to predict,
will have more nodes selected as predictors and included into the update set. An extension
of the bipartition algorithm to multi-level decompositions is also presented. In addition, we
propose a novel bipartite graph formulation using Kron reduction based reconnection for
each node in the prediction set. The method gives promising results in reducing the prediction
residue, which is costly to encode. A probabilistic interpretation is also provided
for the proposed reconnection technique. The experiments on intra-predicted video coding
show that the proposed lifting scheme outperforms the standard DCT based encoding, and
provides performance comparable to the high complexity GFT.
1.3.2 Graph based Compression for Pre-demosaic Light Field Image
We develop a coding system for non-demosaicked light field data with sparsely and irregularly
distributed pixels. Without demosaicking, we map the raw color data captured by a plenoptic
camera directly to sub-aperture image 2D grids, within which the color pixels are sparsely
distributed. A novel intra-prediction scheme is performed on the irregularly distributed pixels,
exploiting the correlation of each block with its decoded neighbouring reference blocks. A
gradient estimate is calculated within each block based on the structure tensor, and is
later used for edge direction estimation and directional prediction with an adaptive kernel.
For transform coding, sparsely distributed pixels within each block in a sub-aperture image are
connected as a graph and encoded with the proposed localized graph based lifting transform.
The optimal graph constructions are derived based on the Maximum Likelihood (ML) criterion
applied to a GMRF model.
1.3.3 Ramp Model in the Graph based Transform
We propose a 1D autoregressive (AR) model based on ramp edges, and estimate the optimal
parameters of the model using training sets of residual video sequences. The new model is
utilized in the graph construction and outperforms the one based on step edges on intra-predicted
video sequences. To signal the overhead information, a new
edge coding technique, called Arithmetic Ramp Edge Coding (AREC), is presented, which is
an extension of the AEC approach [18] utilized for step edge coding.
1.4 Outline of the thesis
In this chapter, we described the motivation and our contributions in this dissertation. In
Chapter 2, we will review the concepts of the Graph Fourier Transform, the lifting scheme,
and its extension to signal on graphs. In Chapter 3, the problem of finding the optimal
bipartition in lifting scheme in terms of energy compaction is defined, and a solution based
on greedy approximation is proposed. Besides, we also describe the proposed method for
bipartite graph formulation based on Kron reduction. The optimality of prediction filter
based on the bipartite formulation is discussed. In Chapter 4, the proposed lifting transform
is applied to the problem of video compression and we also discuss the graph construction
and coding scheme designed. In Chapter 5, we address the problem of light field image
coding and the improvement of coding efficiency using the graph based transform. Novel
intra-prediction and graph construction algorithms are described in detail. In Chapter 6,
we describe the design of the Edge Adaptive Graph based Transform (EA-GBT) based on step
and ramp edge models, and study the edge characteristics in different types of residual video
sequences. We conclude the dissertation in Chapter 7 with possible directions for future work.
Chapter 2
Graph based Transforms
In this chapter, we provide an introduction to the Graph Fourier Transform and the lifting
scheme, which are necessary backgrounds for this dissertation. We will start from some
notation definitions for graphs and the lifting scheme in Section 2.1. In Section 2.2, we will
introduce the concept of the Graph Fourier Transform (GFT), which defines the frequency
interpretation for signals on graphs. This will give us an idea of how to construct a suitable
graph in order to get a compact representation of the associated signal in the frequency
domain. We will also discuss the optimality of the GFT in compression in terms
of a probabilistic model. The same model will be adopted later in our design of the optimal
lifting transform. In Section 2.3, we describe the concept of the lifting transform, including its
invertibility, and the commonly used approaches for prediction and update filters. In Section
2.4, we will discuss the extension of the lifting scheme to graphs and the related research problems.
2.1 Notations
A graph $G = (\mathcal{V}, \mathcal{E})$ is a collection of nodes indexed by $v_i \in \mathcal{V} = \{v_1, v_2, v_3, \dots, v_N\}$, and
links $e_{i,j}$, connecting nodes $v_i$ and $v_j$. The number of nodes $N$ defines the size of the graph.
For a weighted graph, there is a non-negative weight $w_{i,j}$ associated to each link $e_{i,j}$, modeling
the similarity between the connected node pair. The connectivity of graph $G$ can be represented
by an $N \times N$ adjacency matrix $A$, which has zero diagonal elements and elements $A_{i,j} = w_{i,j}$
for off-diagonal terms. For undirected graphs, the adjacency matrix is symmetric, and the
degree $\deg(i)$ of node $v_i$ is defined as $\deg(i) = \sum_j w_{i,j}$. The degree matrix $D$ is an $N \times N$ diagonal
matrix with $D_{i,i} = \deg(i)$. The combinatorial Laplacian matrix is defined as $L = D - A$, which
is the most commonly used Laplacian matrix in the literature. In Chapter 6, we also consider the
generalized graph Laplacian $L_G$, where there are non-negative self loop weights associated to
each node. A graph signal $f \in \mathbb{R}^N$ can be represented as a vector, where each element $f_i$ is
the signal value associated to node $v_i$.
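As a small illustration of these definitions, the sketch below builds $A$, $D$ and $L = D - A$ for a toy undirected graph. The function name `graph_matrices`, the link-list input format and the 4-node example are assumptions made for this illustration only; NumPy is assumed to be available.

```python
import numpy as np

def graph_matrices(num_nodes, weighted_links):
    """Build adjacency A, degree D and combinatorial Laplacian L = D - A.

    weighted_links is a list of (i, j, w) tuples for an undirected graph
    (hypothetical input format chosen for this sketch).
    """
    A = np.zeros((num_nodes, num_nodes))
    for i, j, w in weighted_links:
        if i == j or w < 0:
            raise ValueError("links need distinct endpoints and non-negative weights")
        A[i, j] = A[j, i] = w          # undirected graph: symmetric adjacency
    D = np.diag(A.sum(axis=1))         # deg(i) = sum_j w_ij on the diagonal
    return A, D, D - A

# Hypothetical 4-node path graph with one weak link in the middle.
links = [(0, 1, 1.0), (1, 2, 0.1), (2, 3, 1.0)]
A, D, L = graph_matrices(4, links)
```

Each row of `L` sums to zero, which is why the constant signal has zero variation on any graph built this way.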
We focus only on linear transforms $T : \mathbb{R}^N \to \mathbb{R}^M$, where the operation on node $v_i$ is
defined as a linear combination of the signal value on node $v_i$ and its nearby nodes $v_j \in \mathcal{N}_i$,
i.e.
$$y_i = \langle T_{i,:}, f \rangle = T_{i,i} f_i + \sum_{j \in \mathcal{N}_i} T_{i,j} f_j, \tag{2.1}$$
where $T_{i,:}$ represents the $i$-th row of the transform matrix $T$, and $y_i$ is the output of the operation
at node $v_i$. In the lifting transform, the nodes in $\mathcal{V}$ are divided into two disjoint sets $\mathcal{P}$ and
$\mathcal{U}$, where $\mathcal{P}$ stands for the prediction set, and $\mathcal{U}$ is the update set. The corresponding signals
are denoted as $f_P \in \mathbb{R}^n$ and $f_U \in \mathbb{R}^m$, where $n$ and $m$ are the numbers of nodes in $\mathcal{P}$ and $\mathcal{U}$
respectively. The prediction filter is defined as a linear transform $P : \mathbb{R}^m \to \mathbb{R}^n$ that predicts
signal $f_P$ from $f_U$. The update filter is defined as a linear transform $U : \mathbb{R}^n \to \mathbb{R}^m$ that
updates $f_U$ using the prediction residual coefficients stored in $\mathcal{P}$. The bipartition, prediction,
and update processes can be repeated on the smooth coefficients stored in the update set for a
multi-level lifting transform. We use a superscript to represent signals and operations in
each level, e.g. $f^{\ell}$, $G^{\ell}$ and $P^{\ell}$ indicate the signal, the associated graph, and the prediction
transform in level $\ell$, respectively.
2.2 Graph Fourier Transform (GFT)
2.2.1 Signal Variation on Graphs
In order to obtain a frequency interpretation for signals represented by graphs, similar to the
DCT for one dimensional arrays, it is necessary to extend the concept of variation
from conventional signals to graph signals, considering both the connection and pair-wise
similarity between nodes. The most commonly used variation operator for graphs is the
combinatorial Laplacian matrix $L$. The variation of signal $f$ on the graph given the associated
Laplacian matrix $L$ is written as
$$\mathrm{var}(L, f) = f^T L f = \frac{1}{2} \sum_{i,j} w_{i,j} (f_i - f_j)^2. \tag{2.2}$$
The graph connectivity is taken into account since the difference between two nodes that
are strongly connected, i.e. with large link weight, will be emphasized in computing the
variation. An $N \times N$ Laplacian matrix is diagonalizable with non-negative eigenvalues
$\lambda_1 = 0, \lambda_2, \dots, \lambda_N$, where $\lambda_i \leq \lambda_j$ for $i \leq j$. The corresponding eigenvectors are denoted
as $U = \{u_1, u_2, \dots, u_N\}$. Note that for two eigenvectors $u_i$ and $u_j$ with $i \leq j$, the variation
of the vectors on the graph follows the order of eigenvalues, i.e., $\mathrm{var}(L, u_i) \leq \mathrm{var}(L, u_j)$.
Therefore, these eigenvectors provide a Fourier-like basis with frequency quantified by the
corresponding eigenvalues, thus $U$ is called the Graph Fourier Transform (GFT). The GFT
coefficients $\tilde{f}$ of signal $f$ are obtained as $\tilde{f} = U^{-1} f$. Note that for an undirected graph, the
combinatorial Laplacian matrix is symmetric and thus $U^{-1} = U^T$. There are other types of
Laplacian matrices that have been used in the literature, including the random walk Laplacian
$L_{rw} = D^{-1} L$ and the symmetric normalized Laplacian matrix $L_{sym} = D^{-1/2} L D^{-1/2}$, which is
commonly used in graph cut related works [70]. Both matrices have non-negative eigenvalues
$\lambda_i \in [0, 2]$. Note that the combinatorial Laplacian $L$ and the normalized Laplacian $L_{sym}$ are
both symmetric, with orthogonal GFT basis, while the GFT for the random walk Laplacian
$L_{rw}$ is non-orthogonal.
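The following sketch illustrates the GFT on a toy graph. It assumes NumPy's symmetric eigensolver `numpy.linalg.eigh` as the way to obtain the basis; the 4-node unit-weight path graph, the signal values and the helper names `gft_basis` and `variation` are hypothetical choices made for the illustration.

```python
import numpy as np

def gft_basis(L):
    """Eigen-decomposition of a symmetric Laplacian, eigenvalues in ascending order."""
    eigvals, U = np.linalg.eigh(L)
    return eigvals, U

def variation(L, f):
    """Signal variation f^T L f as in (2.2)."""
    return float(f @ L @ f)

# Hypothetical 4-node path graph with unit weights.
A = np.zeros((4, 4))
for i in range(3):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

f = np.array([1.0, 2.0, 4.0, 3.0])
eigvals, U = gft_basis(L)
f_hat = U.T @ f      # forward GFT, using U^{-1} = U^T for a symmetric Laplacian
f_rec = U @ f_hat    # inverse GFT

# Pairwise form of the variation, the right-hand side of (2.2).
pairwise = 0.5 * sum(A[i, j] * (f[i] - f[j]) ** 2
                     for i in range(4) for j in range(4))
```

Because `eigh` returns unit-norm eigenvectors sorted by increasing eigenvalue, the variation of the k-th basis vector equals the k-th eigenvalue, which is the frequency ordering described above.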
2.2.2 Optimality of GFT in Signal Compression
In order to achieve good performance in compression, it is important to have a transform that
can represent signals sparsely. In traditional signal processing applications, the DCT is useful
in compressing smooth signals, where signal variations are small between neighbouring data
points, since on average most of the signal energy is concentrated on the low frequency bases.
For the GFT, the signal variation is defined in (2.2) as a function of the signal and the graph structure. In
order to have a compact representation with most energy in the low frequency basis for a given
signal $f$, we aim to choose a graph that leads to small variation on the resulting Laplacian
matrix $L$, i.e. the value of $\mathrm{var}(L, f)$ is small. This can be easily achieved by assigning
large weights to links connecting nodes with similar signal values, and small weights to links
connecting nodes with large differences. The flexibility in graph construction provides many
advantages to the GFT for compression in applications such as image and video compression [27,
28, 32, 76, 77]. Natural images usually contain edges, where multiple neighboring pairs of
pixels have large intensity differences. Edges are not efficiently represented by conventional
transforms such as the DCT. We can express an image as a graph where each pixel is denoted
as a node, and a sparse representation in the GFT can be achieved by assigning small weights to
links across edges.
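The effect of a small weight across an edge can be checked numerically. The sketch below compares, for a step signal, the fraction of energy captured by the two lowest-frequency GFT coefficients on a uniform line graph and on a line graph whose middle link is weakened. The signal, the weight 0.01 and the helper names are assumptions chosen for illustration, not values from the thesis.

```python
import numpy as np

def path_laplacian(weights):
    """Combinatorial Laplacian of a line graph with the given link weights."""
    n = len(weights) + 1
    A = np.zeros((n, n))
    for i, w in enumerate(weights):
        A[i, i + 1] = A[i + 1, i] = w
    return np.diag(A.sum(axis=1)) - A

def low_frequency_energy(L, f, k):
    """Fraction of signal energy in the k lowest-frequency GFT coefficients."""
    _, U = np.linalg.eigh(L)          # eigenvectors sorted by increasing eigenvalue
    coeffs = U.T @ f
    return float(np.sum(coeffs[:k] ** 2) / np.sum(coeffs ** 2))

# Step signal with an edge between samples 3 and 4 (hypothetical example).
f = np.array([1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0])

uniform = path_laplacian([1.0] * 7)                               # ignores the edge
adaptive = path_laplacian([1.0, 1.0, 1.0, 0.01, 1.0, 1.0, 1.0])   # weak link across the edge

e_uniform = low_frequency_energy(uniform, f, 2)
e_adaptive = low_frequency_energy(adaptive, f, 2)
```

With the weak link, the second eigenvector is close to piecewise constant on the two sides of the edge, so the step is captured almost entirely by the first two coefficients.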
In [84, 85], optimality of GFT in signal compression is analyzed. For a random signal
with known correlation, described by its covariance matrix, the Karhunen–Loève Transform
(KLT) is optimal in terms of de-correlation. The KLT is an optimal orthogonal transform
in energy compaction in terms of mean squared error, i.e. the mean squared error of the
reconstructed signal using k transformed coefficients is minimized among all orthogonal
transforms. However, correlation models need to be estimated from training data for real
signals. In order to obtain a fixed transform without high complexity in computation, the
DCT is often used. The DCT is asymptotically equivalent to the KLT for signals that can be modeled as a
stationary first-order Markov sequence as the correlation coefficient approaches 1 [1, 14]. The optimality of the GFT
can be shown in a similar way. Given a graph $G$, the associated GFT is shown to be equivalent
to the KLT for an underlying Gaussian Markov Random Field (GMRF) defined based on the
graph connectivity, i.e. the inverse covariance matrix (precision matrix) of this GMRF is
equivalent to the graph Laplacian $L$. In fact, the GFT can outperform the KLT in practice, since
fewer parameters need to be estimated, leading to more robust signal modeling [34].
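The GFT and KLT equivalence can be illustrated numerically: if the precision matrix of a GMRF is built from the Laplacian, the eigenvectors of the Laplacian also diagonalize the covariance matrix. In the sketch below, the 5-node weighted path graph is hypothetical, and the small diagonal loading `delta` is an assumption added here only so that the precision matrix can be inverted.

```python
import numpy as np

# Hypothetical 5-node path graph with non-uniform weights.
weights = [1.0, 0.5, 2.0, 1.0]
n = len(weights) + 1
A = np.zeros((n, n))
for i, w in enumerate(weights):
    A[i, i + 1] = A[i + 1, i] = w
L = np.diag(A.sum(axis=1)) - A

delta = 1e-2                      # assumed diagonal loading so that Q is invertible
Q = L + delta * np.eye(n)         # precision matrix of the assumed GMRF
Sigma = np.linalg.inv(Q)          # covariance matrix of the assumed GMRF

eigvals, U = np.linalg.eigh(L)    # GFT basis of the graph
Sigma_in_gft = U.T @ Sigma @ U    # covariance expressed in the GFT basis
off_diagonal = Sigma_in_gft - np.diag(np.diag(Sigma_in_gft))
```

The off-diagonal part vanishes up to rounding, so the GFT coefficients of such a signal are uncorrelated, with variances `1 / (eigvals + delta)` that decrease with graph frequency.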
2.3 Lifting Transforms
In this section, we will introduce the lifting scheme, including the invertibility property, the
design of localized transforms and the extension to multi-level decomposition.
2.3.1 Preliminaries
A lifting scheme consists of three stages:
1. Bipartition: Data points are divided into two disjoint sets, called the prediction set ($\mathcal{P}$) and
the update set ($\mathcal{U}$).
2. Prediction: This step is used to remove redundancy in the signal. The signal $f_P$ in $\mathcal{P}$
is predicted using the signal $f_U$ in $\mathcal{U}$ with the prediction transform denoted as $P$. Then, the
prediction error, i.e. $d = f_P - P f_U$, is stored in set $\mathcal{P}$. If the signals in the two sets are
highly correlated with each other, the prediction error is expected to be small.
3. Update: The signal $f_U$ in $\mathcal{U}$ is updated using the prediction residue $d$ and the update
transform $U$. The process generates a smooth approximation $s$ to the original signal $f$,
and the result is stored in $\mathcal{U}$ as transformed coefficients.
A one level lifting scheme is summarized in Fig. 2.1. The transformed coefficients $c = [s^T, d^T]^T$
contain the prediction error $d = f_P - P f_U$ and the smooth coefficients $s = f_U + U d$.
In matrix form, the whole process can be written as
$$c = \begin{bmatrix} s \\ d \end{bmatrix} = \begin{bmatrix} I & U \\ 0 & I \end{bmatrix} \begin{bmatrix} I & 0 \\ -P & I \end{bmatrix} \begin{bmatrix} f_U \\ f_P \end{bmatrix}. \tag{2.3}$$
Figure 2.1: One-level lifting scheme: forward and inverse transforms
Figure 2.2: The lifting scheme with multi-level decomposition
The inverse transforms for both prediction and update processes can be immediately derived
by inverting the operations (addition replaced by subtraction and subtraction by addition) and
their order in the forward process, as shown in Fig. 2.1. In matrix form this is written as
$$\begin{bmatrix} I & 0 \\ -P & I \end{bmatrix}^{-1} = \begin{bmatrix} I & 0 \\ P & I \end{bmatrix}, \qquad \begin{bmatrix} I & U \\ 0 & I \end{bmatrix}^{-1} = \begin{bmatrix} I & -U \\ 0 & I \end{bmatrix}. \tag{2.4}$$
Note that the invertibility is guaranteed regardless of the selection of the prediction transform
$P$ and update transform $U$, thus providing a lot of flexibility in transform design.
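The invertibility claim is easy to check numerically. The sketch below applies one lifting level with arbitrary (here random) prediction and update matrices and then undoes it by reversing the steps. The function names, the set sizes and the random seed are assumptions made for this illustration.

```python
import numpy as np

def lifting_forward(f_U, f_P, P, U):
    d = f_P - P @ f_U      # prediction step: residual stored in the prediction set
    s = f_U + U @ d        # update step: smooth coefficients stored in the update set
    return s, d

def lifting_inverse(s, d, P, U):
    f_U = s - U @ d        # undo the update first
    f_P = d + P @ f_U      # then undo the prediction
    return f_U, f_P

rng = np.random.default_rng(0)
m, n = 3, 4                          # sizes of the update and prediction sets (arbitrary)
P = rng.standard_normal((n, m))      # arbitrary prediction transform, R^m -> R^n
U = rng.standard_normal((m, n))      # arbitrary update transform, R^n -> R^m
f_U = rng.standard_normal(m)
f_P = rng.standard_normal(n)

s, d = lifting_forward(f_U, f_P, P, U)
g_U, g_P = lifting_inverse(s, d, P, U)
```

Reconstruction is exact up to floating point rounding for any choice of `P` and `U`, because each step only adds a function of the other set, which the inverse subtracts again.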
If the predictor is properly designed such that the prediction error $d$ has low energy
on average, fewer bits will be needed for representing the coefficients stored in $\mathcal{P}$, thus reducing the overall cost in
compression. A multi-resolution representation for signals can be derived by repeatedly
applying the lifting scheme on the smooth coefficients $s$, as shown in Fig. 2.2. The smooth
coefficients $s^{\ell}$ serve as the input signal $f^{\ell+1}$ for the $(\ell+1)$-th level of lifting. Moreover, a more
compact representation can be obtained from the multi-level decomposition if the predictor
$P^{\ell}$ in each level $\ell$ is selected properly. The invertibility property still holds for multi-level
decomposition.
Figure 2.3: Example of localized lifting transform on a 1 dimensional signal
2.3.2 Localized Transform Design
The lifting scheme can be used to implement any invertible transform. One of the commonly
used transforms is the 5/3 biorthogonal filterbank of Cohen-Daubechies-Feauveau (CDF) [2],
which has been adopted in the JPEG2000 standard for lossless compression [9, 73]. Given a
one dimensional signal at the first level $f^1 = [f^1_1, f^1_2, \dots, f^1_N]^T$, the lifting scheme for CDF 5/3 can
be represented as shown in Fig. 2.3 for $N = 8$. For the bipartition, the data points at odd
locations are assigned to the update set ($\mathcal{U}^1$), and the even points are put into the prediction set
($\mathcal{P}^1$). Each data point in $\mathcal{P}^1$ is predicted using its adjacent points, resulting in a prediction
error $d^1 = [d^1_1, d^1_2, d^1_3, d^1_4]^T$. The prediction error will then be used to filter the adjacent data
points in $\mathcal{U}^1$, generating the smooth approximation $s^1 = [s^1_1, s^1_2, s^1_3, s^1_4]^T$. For multi-level
decomposition, the lifting scheme for the 2nd level will be applied on the smooth coefficients
$s^1$. In detail, the prediction for signal value $f_{P_i}$ is computed with the two adjacent points
$f_{U_i}$ and $f_{U_{i+1}}$ as $\hat{f}_{P_i} = \frac{1}{2}(f_{U_i} + f_{U_{i+1}})$. For the update stage, $f_{U_i}$ is updated with the
prediction errors from its two neighbours. The smooth coefficients for interior points are
computed as $s_i = f_{U_i} + \frac{1}{4}(d_{i-1} + d_i)$. Compared to other types of wavelet filterbanks such
as the Haar transform and CDF 9/7, the CDF 5/3 filterbank has several advantages:
1. Locality: The transform is highly localized and therefore has low computational
complexity.
2. Rational-valued: The transform coefficients are rational-valued, and therefore it is easier
to create a lossless transform in a real implementation. Also, for coefficients equal to $1/2^n$,
the transform can be implemented with simple shifts.
3. Symmetry: The filter response is symmetric, which enables a simple generalization
onto undirected graphs, where each node might have a different number of connected
neighbours.
Due to these nice properties, for the design of the prediction and update transforms throughout this
dissertation we will apply the generalization of CDF 5/3 filterbanks to graphs, which will be
described in detail in the next section. The main contribution of our work will focus on
the bipartition algorithm and the graph construction, given the use of CDF 5/3 like transforms
for prediction and update.
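A one-level CDF 5/3 lifting step on a 1 dimensional signal can be sketched as follows. With the odd/even split described above, the update sample with index i sits between the residuals with indices i-1 and i, which is the indexing used here. The mirror rule at the two signal boundaries is an assumption of this sketch (the text only specifies interior points), and floating point arithmetic is used rather than the integer version found in lossless codecs.

```python
def cdf53_forward(f):
    """One-level CDF 5/3 lifting on a list of even length.

    Update set: samples at odd positions 1, 3, 5, ... (list indices 0, 2, 4, ...).
    Prediction set: samples at even positions 2, 4, 6, ... (list indices 1, 3, 5, ...).
    """
    if len(f) % 2 != 0:
        raise ValueError("this sketch expects an even number of samples")
    f_U, f_P = f[0::2], f[1::2]
    n = len(f_P)
    d = []
    for i in range(n):
        right = f_U[i + 1] if i + 1 < n else f_U[i]   # assumed boundary rule: mirror
        d.append(f_P[i] - 0.5 * (f_U[i] + right))     # prediction residual
    s = []
    for i in range(n):
        left = d[i - 1] if i > 0 else d[i]            # assumed boundary rule: mirror
        s.append(f_U[i] + 0.25 * (left + d[i]))       # smooth coefficient
    return s, d

def cdf53_inverse(s, d):
    """Undo cdf53_forward by reversing the update and then the prediction."""
    n = len(d)
    f_U = []
    for i in range(n):
        left = d[i - 1] if i > 0 else d[i]
        f_U.append(s[i] - 0.25 * (left + d[i]))
    f = []
    for i in range(n):
        right = f_U[i + 1] if i + 1 < n else f_U[i]
        f.extend([f_U[i], d[i] + 0.5 * (f_U[i] + right)])
    return f

ramp = [2.0 * k for k in range(8)]     # linear signal, N = 8
s, d = cdf53_forward(ramp)
```

For a linear signal the interior residuals are exactly zero, since each predicted sample is the average of its two neighbours; only the last residual is non-zero because of the boundary rule.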
2.4 Lifting Transform on Graphs [56]
The generalization of the lifting scheme to signals on graphs is in general not trivial. Before we
go into the detail, we first describe the problems that will be encountered during the
generalization, and the criteria for the design of graph-based lifting.
1. Graph construction: In Section 2.2, we have discussed the criterion for designing a good
graph on which to define the GFT, where low weights are assigned to links connecting
nodes with large signal difference. The criterion also holds for graphs in the lifting
transform. Since we consider only linear transforms in the design of prediction and
update filters, in order to obtain good prediction, the filtering for each node in $\mathcal{P}$ should
select higher weights for the neighbours in $\mathcal{U}$ that are similar. Besides, since the
main goal in our work is to design low complexity and localized transforms, the graph
connectivity is required to be localized. The details of graph construction in the application
of video compression will be addressed in Chapter 4.
2. Design of prediction and update transforms ($P$ and $U$): As mentioned above, a useful
feature of CDF 5/3 is its locality, i.e., the transform of data at each position $x \in \mathcal{P}$
requires only the signal at $x$ and the values at the adjacent data points $x-1$ and $x+1$ in $\mathcal{U}$.
This property is preserved in the generalization to signals on graphs: the transform for
node $v_i \in \mathcal{P}$ takes information only from $f_i$ and $f_j$, $v_j \in \mathcal{N}(i) \cap \mathcal{U}$, where $\mathcal{N}(i)$ consists
of the nodes that are directly connected to $v_i$. The same property holds for $v_i \in \mathcal{U}$ such
that only neighbors from $\mathcal{P}$ are considered during the transform. Note that for general
graphs, $\mathcal{N}(i)$ for different $v_i$ may contain different numbers of nodes. Also, the weight
on the link connecting $v_i$ and $v_j \in \mathcal{N}(i)$ varies based on pair-wise similarities. In order
to have better prediction, the predictor for node $v_i$ should obtain more information from
those connected neighbors that are likely to be most similar to $v_i$. Besides exploiting
the correlation between data samples, it is also important to have transforms that are
orthogonal or nearly orthogonal, in order to reduce the distortion in reconstruction
caused by quantization. In [68], the authors propose a method to design update filters
that promote orthogonality. Later in this section, we will describe the generalization of
the CDF 5/3 predictor and the orthogonalized update filter in more detail.
3. Bipartition: In the lifting scheme, the transform (prediction or update) for node $v_i$ in $\mathcal{P}$
(respectively $\mathcal{U}$) is a function of only the signal values at $v_i$ and nodes in $\mathcal{U}$ (respectively
$\mathcal{P}$) in order to ensure invertibility. In using the CDF 5/3 filterbank, the transform
at each node acquires neighbouring information only from the connected nodes in the
opposite bipartite set. For graphs that are not bipartite, this means that the links that
connect nodes in the same set will not be utilized during the filtering. In other words,
applying generalized CDF 5/3 in graphs is equivalent to applying the transform on an
approximated bipartite graph, containing only links connecting nodes in $\mathcal{U}$ to nodes in
$\mathcal{P}$. The bipartition algorithm directly affects the bipartite approximation and the performance
of prediction and update filtering. In order to have better prediction, each node
$v_i$ in $\mathcal{P}$ should have enough highly correlated neighbouring nodes in $\mathcal{U}$ after the bipartite
approximation. In Chapter 3, we will review related work on bipartite approximation
in signal compression, and describe our approach to designing an optimized bipartition
in terms of energy compaction in the transformed domain.
4. Graph representation for signals in level $\ell > 1$: After the update stage, the signal will
be downsampled, keeping only the smooth signal in $\mathcal{U}$ for processing in the next level.
Those nodes in $\mathcal{U}$ might not be directly connected in the original graph. Therefore, a
graph needs to be constructed for the downsampled data in order to capture pair-wise
correlation. In Section 2.4.3, we will discuss some commonly used graph reduction methods
applied after downsampling.
2.4.1 Prediction Filter Design
Given the bipartite sets $\mathcal{P}$ and $\mathcal{U}$ and the approximated bipartite graph $G_{bpt}$ (with adjacency
matrix $A_{bpt}$), which contains only links connecting nodes in $\mathcal{P}$ to nodes in $\mathcal{U}$, the predictor
used in our work for node $v_i \in \mathcal{P}$ is defined as
$$\hat{f}_i = \frac{1}{\deg(i)} \sum_{v_j \in \mathcal{N}_{bpt}(i)} w_{i,j} f_j, \tag{2.5}$$
where $w_{i,j} = A_{bpt}(i, j)$ and $\mathcal{N}_{bpt}(i)$ is the set of neighboring nodes of $v_i$ in $G_{bpt}$. The transformed
coefficient at $v_i$ is computed as $d_i = f_i - \hat{f}_i$. Note that for a one dimensional signal, which can
be represented with a line graph with all link weights equal to 1, the predictor simplifies
to the predictor used in the traditional CDF 5/3 filterbank described in Section 2.3. Therefore, we
will call this transform the generalized CDF 5/3 predictor, which has been applied for graph
based lifting by several authors [53, 54, 56, 67]. Later in Section 5.5.3, we will also discuss
the generalized CDF 5/3 filterbank developed for generalized graphs with non-zero self loop
weights on nodes.
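The predictor in (2.5) can be sketched in a few lines. In the example below, the dictionary-based graph format, the function name and the signal values are hypothetical, and $\deg(i)$ is taken as the degree of $v_i$ within the bipartite graph, which is the reading under which the predictor reduces to the CDF 5/3 average on a unit-weight line graph.

```python
def generalized_cdf53_predict(f, bipartite_links):
    """Prediction residuals d_i = f_i - f_hat_i for nodes in the prediction set.

    bipartite_links maps each prediction node i to {update node j: weight w_ij}
    (hypothetical input format for this sketch).
    """
    residuals = {}
    for i, nbrs in bipartite_links.items():
        degree = sum(nbrs.values())        # degree of v_i in the bipartite graph (assumed)
        if degree == 0:
            residuals[i] = f[i]            # assumed fallback: isolated node, no prediction
            continue
        f_hat = sum(w * f[j] for j, w in nbrs.items()) / degree
        residuals[i] = f[i] - f_hat
    return residuals

# Signal with an edge between nodes 2 and 3; nodes 1 and 3 form the prediction set.
f = [1.0, 2.0, 3.0, 10.0, 11.0]
line_links = {1: {0: 1.0, 2: 1.0}, 3: {2: 1.0, 4: 1.0}}          # unit-weight line graph
edge_aware_links = {1: {0: 1.0, 2: 1.0}, 3: {2: 0.05, 4: 1.0}}   # weak link across the edge

d_line = generalized_cdf53_predict(f, line_links)
d_edge = generalized_cdf53_predict(f, edge_aware_links)
```

On the unit-weight line graph, node 3 is predicted by averaging across the edge and leaves a residual of 3; with the weak link, the prediction leans on node 4 and the residual magnitude drops below 1.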
2.4.2 Update Filter Design
For the design of the update filter, we can also generalize the update filters used in CDF 5/3
filterbanks. The transformed coefficient at $v_r \in \mathcal{U}$, written as $s_r$, stores the smooth signal
computed as
$$s_r = f_r + \frac{1}{2\deg(r)} \sum_{v_j \in \mathcal{N}(r)} w_{r,j} d_j, \tag{2.6}$$
which has been applied in the literature [54, 56, 67]. However, in [68], the authors show that for
the CDF 5/3 design applied to graphs, the transform's orthogonality will be reduced if each
node has more than 2 connected neighbours. Therefore, in that paper, the authors proposed an
orthogonalized update transform.
Figure 2.4: Example of graph downsampling by connecting 2-hop neighbors in the
previous level
Assume the signal in the $\ell$-th lifting level is $f = [f_U^T, f_P^T]^T$, then
the corresponding transform matrix $T$ in the lifting scheme can be written as
$$T = \begin{bmatrix} t_1^T \\ t_2^T \\ \vdots \\ t_N^T \end{bmatrix} = \begin{bmatrix} I & U \\ 0 & I \end{bmatrix} \begin{bmatrix} I & 0 \\ -P & I \end{bmatrix}, \tag{2.7}$$
where the first $m$ rows correspond to the filter responses of nodes in $\mathcal{U}$, and the last $n$ rows are
the filter responses for nodes in $\mathcal{P}$. The filter response for node $v_j$ in $\mathcal{P}$ can be written as
$$t_j^T = e_j^T \begin{bmatrix} I & 0 \\ -P & I \end{bmatrix}, \tag{2.8}$$
where $e_j$ is a column vector with element $e^j_j = 1$ and $e^j_k = 0$ for $k \neq j$. The orthogonalized
update filter $U$ is computed for each node $v_i \in \mathcal{U}$ such that the filter response $t_i$ is orthogonal
to the filter response $t_j$ of its neighbouring nodes $v_j \in \mathcal{P}$. The computation can be done
by solving a linear equation. In orthogonalizing $t_i$, only the responses of its neighbouring
nodes are considered. For localized $P$, the nodes in $\mathcal{P}$ that are not neighbors of $v_i$ have
filter responses that have no common support or small common support with $t_i$, and thus
have little effect on its orthogonality. In the lifting scheme used in our work, we will apply
the generalized CDF 5/3 predictor with the orthogonalized update filter, since the orthogonalized
update filter has been shown empirically to perform better than the one
without orthogonalization.
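One possible reading of the orthogonality condition is sketched below; it is derived here directly from the lifting matrix structure and is not claimed to be the exact procedure of [68]. Writing row $i$ of $U$ as $u_i$ (supported on the neighbours $S$ of $v_i$ in the prediction set), the requirement $t_i^T t_j = 0$ for all $j \in S$ leads to the small linear system $(I + (P P^T)_{S,S})\, u_{i,S} = P_{S,i}$. The function names and the 5-node line graph are assumptions for the illustration.

```python
import numpy as np

def orthogonalized_update(P):
    """Update matrix whose rows make t_i orthogonal to t_j for neighbouring j.

    P is the n x m prediction matrix (n prediction nodes, m update nodes).
    The linear system solved per node is a derivation made for this sketch.
    """
    n, m = P.shape
    G = P @ P.T
    U = np.zeros((m, n))
    for i in range(m):
        S = np.flatnonzero(P[:, i])          # prediction nodes that use update node i
        if S.size == 0:
            continue
        lhs = np.eye(S.size) + G[np.ix_(S, S)]
        U[i, S] = np.linalg.solve(lhs, P[S, i])
    return U

def lifting_matrix(P, U):
    """Full one-level transform for the ordering [f_U; f_P]."""
    n, m = P.shape
    update = np.block([[np.eye(m), U], [np.zeros((n, m)), np.eye(n)]])
    predict = np.block([[np.eye(m), np.zeros((m, n))], [-P, np.eye(n)]])
    return update @ predict

# Hypothetical line graph U-P-U-P-U: 3 update nodes, 2 prediction nodes, unit weights.
W = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
P = W / W.sum(axis=1, keepdims=True)         # row-normalized weights, as in (2.5)
U = orthogonalized_update(P)
T = lifting_matrix(P, U)
```

For a single update node connected to a single prediction node with unit weight, this system gives an update coefficient of 1/2, which yields a Haar-like pair of orthogonal rows.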
Figure 2.5: Example of graph downsampling using Kron reduction
2.4.3 Graph Reduction
To obtain a multi-resolution decomposition for signals, the lifting scheme is applied iteratively
to the downsampled, smooth signal from the update set in the previous lifting level. The
transform further exploits the similarity within the downsampled signal. For the 1 dimensional
signal discussed in Section 2.3.2, where the bipartition divides data into even and odd samples,
the CDF 5/3 filterbank exploits the correlation between pairs of data points that are 1 point away
in the first level. For the transform in the 2nd level, the de-correlation considers the similarity
between data samples that are 2 points away in the first level. Similarly for the $\ell$-th level, the
transform exploits the correlation between samples that are 2 points away in the $(\ell-1)$-th
level. The concept can be extended to signals on graphs [54, 57, 58], where for the graph
construction of the input signal in level $\ell$, the nodes that are 2 hops away in the previous level are
connected. However, the approach cannot maintain graph connectivity as the decomposition
goes to higher levels. In Fig. 2.4, we show an example of graph construction using such an
approach, where the update set in each level is chosen by random sampling. In this example,
multiple disconnected components are produced when the decomposition reaches level 3.
As a result, the transform will not be able to exploit the correlation between disconnected
components, losing the opportunity for further energy compaction.
In this work, we apply another graph reduction method, initially proposed in [20], and
designed from a probabilistic viewpoint. We first define a generative Gaussian Markov
Random Field (GMRF) model with inverse covariance (or precision) matrix $Q = L + \delta I$,
where $L$ is the graph Laplacian and $\delta$ is a small constant used to ensure matrix invertibility.
Given a random signal $f = [f_U^T, f_P^T]^T$ produced by such a GMRF model, where $f_U$ and $f_P$ are the
signals in the update and prediction sets respectively, the covariance matrix $\Sigma$ and the inverse
covariance matrix $Q$ can be written in block form as follows:
$$\Sigma = \begin{bmatrix} \Sigma_{U,U} & \Sigma_{U,P} \\ \Sigma_{P,U} & \Sigma_{P,P} \end{bmatrix}, \qquad Q = \begin{bmatrix} Q_{U,U} & Q_{U,P} \\ Q_{P,U} & Q_{P,P} \end{bmatrix}. \tag{2.9}$$
After removing the signal in $\mathcal{P}$, the downsampled signal $f_U$ is also a GMRF, with covariance
matrix $\Sigma_{U,U}$, and the inverse covariance matrix, denoted as $Q_d$, can be written as
$$Q_d = \Sigma_{U,U}^{-1} = Q_{U,U} - Q_{U,P} Q_{P,P}^{\dagger} Q_{P,U}. \tag{2.10}$$
We can derive the corresponding graph Laplacian of $f_U$ as $L_d \approx Q_d$. The graph connection
in the downsampled signal therefore is based on the partial correlation specified in $Q_d$. The
corresponding adjacency matrix $A_d$ is defined as
$$A_d(i,i) = 0, \qquad A_d(i,j) = -Q_d(i,j) \ \text{for } i \neq j. \tag{2.11}$$
This method is also known as the Kron reduction in the literature, and was first proposed for
application in electrical networks [22]. The graph derived from Kron reduction has several
desirable properties. The most relevant ones for our application are:
1. If the original graph is connected in the first level, the reduced graph will still be
connected.
2. Two nodes $v_i$ and $v_j$ are connected in level $\ell$ if there is a path between them through the removed
nodes in $\mathcal{P}^{\ell-1}$.
3. For two nodes that are connected through a path of large weighted links in level $\ell-1$,
the link connecting them in level $\ell$ will also have large weight.
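The reduction described above can be sketched as a Schur complement on the regularized Laplacian. In the example below, the function name `kron_reduce`, the 5-node unit-weight path graph and the value of the diagonal loading `delta` are assumptions made for illustration; the pseudo-inverse is computed with `numpy.linalg.pinv`.

```python
import numpy as np

def kron_reduce(L, keep, delta):
    """Kron reduction of Q = L + delta*I onto the nodes listed in keep.

    Returns the reduced precision matrix Q_d and the adjacency matrix A_d
    obtained by negating its off-diagonal entries.
    """
    n = L.shape[0]
    keep = np.asarray(keep)
    drop = np.setdiff1d(np.arange(n), keep)        # removed (prediction set) nodes
    Q = L + delta * np.eye(n)
    Q_UU = Q[np.ix_(keep, keep)]
    Q_UP = Q[np.ix_(keep, drop)]
    Q_PP = Q[np.ix_(drop, drop)]
    Q_d = Q_UU - Q_UP @ np.linalg.pinv(Q_PP) @ Q_UP.T
    A_d = -Q_d                                     # off-diagonal weights
    np.fill_diagonal(A_d, 0.0)                     # no self loops in A_d
    return Q_d, A_d

# Hypothetical path graph 0-1-2-3-4 with unit weights; nodes 1 and 3 are removed.
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

delta = 1e-2
keep = [0, 2, 4]
Q_d, A_d = kron_reduce(L, keep, delta)
```

In this example nodes 0 and 2 become connected through the removed node 1 with weight $1/(2+\delta)$, close to the value 1/2 expected for two unit links in series, while nodes 0 and 4 remain unconnected because every path between them passes through the kept node 2.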
One drawback of the Kron reduction is that the downsampled graph becomes dense as the
decomposition goes to higher levels. An example is shown in Fig. 2.5, where the downsampled
sets are the same as in Fig. 2.4. This loss of sparsity can increase the computational complexity
of the transform. Also, the transform will fail to capture the local connectivity information.
Therefore, we apply a simple sparsification after downsampling in each level by keeping the
largest $k$ links for each node. In Fig. 2.6, we show an example of sparsification in the reduced
graph, where at least 4 links are kept for each node.
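A minimal sketch of such a sparsification is given below. It assumes that a link is kept whenever it is among the $k$ largest links of either of its endpoints, so that the result stays symmetric and each node keeps at least $k$ links when it has that many; this symmetrization rule, the function name and the dense 5-node example are assumptions of the sketch.

```python
import numpy as np

def sparsify_top_k(A, k):
    """Keep, for every node, its k largest links; the result stays symmetric."""
    n = A.shape[0]
    keep = np.zeros_like(A, dtype=bool)
    for i in range(n):
        order = np.argsort(A[i])[::-1]             # neighbours by decreasing weight
        for j in order[:k]:
            if A[i, j] > 0:
                keep[i, j] = keep[j, i] = True     # keep the link for both endpoints
    return np.where(keep, A, 0.0)

# Hypothetical dense graph, as may result from Kron reduction at a high level.
A = np.array([[0.0, 5.0, 1.0, 2.0, 1.0],
              [5.0, 0.0, 4.0, 1.0, 1.0],
              [1.0, 4.0, 0.0, 3.0, 1.0],
              [2.0, 1.0, 3.0, 0.0, 6.0],
              [1.0, 1.0, 1.0, 6.0, 0.0]])
S = sparsify_top_k(A, 2)
```

Because a link survives if either endpoint selects it, some nodes may keep more than $k$ links, which is consistent with "at least" in the description of Fig. 2.6.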
Figure 2.6: Example of graph downsampling using Kron reduction and sparsification
2.4.4 Complexity of Lifting Scheme
As mentioned in Section 1.2.1, the GFT requires multiplying the input signal (written as a vector) with a dense matrix, which leads to $O(N^2)$ complexity. For lifting, on the other hand, the transform is highly localized. If half of the nodes are selected into the update set in each level, at most $\log N$ transform levels will be required. For the computation of each coefficient in level $\ell$, the number of operations is proportional to the degree $\deg(i)$ of node $v_i$, which is usually a constant for graphs of interest in image and video processing, and remains bounded in high lifting levels due to the sparsification after graph reduction. As a result, only $O(N \log N)$ complexity is needed for the lifting transform. Besides, the GFT coefficients are real valued, while in lifting based on the CDF 5/3 filterbank the coefficients are rational, which leads to an additional reduction in implementation cost. In our experiments in Chapter 4, we will show that with the lifting transform, the quality of reconstruction after compression is comparable to that of the GFT, so that we pay no penalty for these significant reductions in complexity.
2.5 Summary
In this chapter, we have introduced the concept of graphs, and a notion of frequency for graph signals using the Graph Fourier Transform (GFT). We have discussed the optimality of the GFT in signal energy compaction. In Sections 2.3 and 2.4, we gave an introduction to the lifting scheme and its generalization to signals on graphs. The design of the prediction and update transforms and the graph downsampling used in our work were also described. In the next chapter, we will focus on the optimization of the bipartition in the lifting transform in terms of energy compaction.
Chapter 3
Bipartition in Lifting Transforms
In Section 2.4, we have described the problems in applying the lifting scheme to signals on graphs and the algorithms for prediction, update and graph downsampling utilized in our experiments. In this chapter, we focus on the optimization of the bipartition in order to maximize energy compaction.
3.1 Related Work
3.1.1 MaxCut based Bipartition
The first bipartition technique proposed for graph based lifting was [56], where a denoising application was considered. In this work, the authors use a bipartition algorithm called conflict minimization, which aims to minimize the total weight of links connecting nodes in the same set ($\mathcal{P}$ or $\mathcal{U}$), i.e., those links that will not be used in the lifting transform. In [53, 54, 57], a similar idea is applied to the lifting scheme for low pass approximation and video compression, where the bipartition is done by maximizing the total weight of links connecting nodes in $\mathcal{P}$ to nodes in $\mathcal{U}$. In other words, the algorithm tries to utilize as many pair-wise similarities as possible after discarding conflicting links to achieve better de-correlation. This is equivalent to solving a maximum cut problem (MaxCut). The MaxCut problem is known to be NP-complete, and therefore some greedy approximations have been proposed in the literature. In [59], the authors propose a solution for the MaxCut problem based on the maximum spanning tree (MST), whose optimal solution can be found using methods such as Prim's algorithm. The optimality of the MaxCut approach is also analyzed in [59] for interpolation, where the authors show that by maximizing the cut value between $\mathcal{P}$ and $\mathcal{U}$, a lower bound of the $\ell_1$ norm error for cross-linear interpolation will be minimized assuming the samples are i.i.d. However, signals typically exhibit local correlation, and therefore the i.i.d. assumption does not hold in general. Taking images as an example, there exists high correlation between pixels that are adjacent to each other. Moreover, no justification was provided in [59] for optimality in terms of compression efficiency.
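3.1.2 Prediction Error Minimization
In [23, 52], Martínez-Enríquez et al. propose an optimal bipartition for the lifting transform in video compression. The objective function is based on minimizing the energy of the prediction error stored in $\mathcal{P}$, thus promoting energy compaction into the low frequency set $\mathcal{U}$. Defining $\hat{f}_{\mathcal{P}}$ to be the prediction of $f_{\mathcal{P}}$, the problem can be expressed as
\[
\begin{aligned}
(\mathcal{P},\mathcal{U})^{*} &= \arg\min_{(\mathcal{P},\mathcal{U})} E\big[(f_{\mathcal{P}} - \hat{f}_{\mathcal{P}})^{T} (f_{\mathcal{P}} - \hat{f}_{\mathcal{P}})\big] \\
&= \arg\min_{(\mathcal{P},\mathcal{U})} E\big[(f_{\mathcal{P}} - \mathbf{P} f_{\mathcal{U}})^{T} (f_{\mathcal{P}} - \mathbf{P} f_{\mathcal{U}})\big] \\
&= \arg\min_{(\mathcal{P},\mathcal{U})} \sum_{v_i \in \mathcal{P}} E\big[(f_i - \hat{f}_i)^{2}\big],
\end{aligned} \tag{3.1}
\]
where $\mathbf{P}$ denotes the prediction transform.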
In this work, the authors consider two signal models, namely the noise model (NM) and the moving average (MA) model. The prediction $\hat{f}_i$ for $v_i \in \mathcal{P}$ is computed as the normalized average of its neighbouring nodes in $\mathcal{U}$. A greedy solution for (3.1) is proposed. In general, the transform with the MA model performs better than the one with NM based bipartition in terms of minimizing the expected prediction errors in $\mathcal{P}$, showing the significance of considering local correlation in the design of the signal model. However, the MA model assumes marginal independence for samples that are not immediate neighbors, which is in general not true for image and video signals, where correlation usually also exists between pixels that are a few pixels apart. In the next section, we will discuss in detail how the two models in [23, 52] differ from our proposed signal model.
3.2 Optimized Bipartition for GMRF modeled signal
In this thesis, we propose a bipartition scheme that is optimal in terms of energy compaction under the assumption that an $N$ dimensional signal $f$ can be modeled as a GMRF: $f \sim \mathcal{N}(\mu, \Sigma)$. A random vector $f = [f_1, f_2, \ldots, f_N]^T$ can be modeled by a GMRF if its probability density function can be written as
\[
p(f) = (2\pi)^{-\frac{N}{2}} \det(Q)^{\frac{1}{2}} \exp\!\Big(-\tfrac{1}{2} (f - \mu)^{T} Q (f - \mu)\Big), \tag{3.2}
\]
where $\mu$ is its mean vector and $Q$ is the inverse covariance matrix $Q = \Sigma^{-1}$, also called the precision matrix. The precision matrix $Q$ defines the conditional pair-wise correlations between signal values, i.e.
\[
\mathrm{Corr}\big(f_i, f_j \mid f \setminus \{f_i, f_j\}\big) = -\frac{Q_{i,j}}{\sqrt{Q_{i,i}\, Q_{j,j}}}. \tag{3.3}
\]
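As a small numerical illustration of (3.3), the sketch below computes the partial correlations from a precision matrix on a 3-node path graph. The precision matrix $Q = L + \delta I$ with $\delta = 0.1$ and the function name are assumed here purely for the example.

```python
import numpy as np

def partial_correlation(Q):
    """Partial correlation matrix from a precision matrix, following (3.3)."""
    d = np.sqrt(np.diag(Q))
    R = -Q / np.outer(d, d)
    np.fill_diagonal(R, 1.0)
    return R

# Toy input: path graph 0 - 1 - 2 with unit weights (assumed example).
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
L = np.diag(A.sum(axis=1)) - A
Q = L + 0.1 * np.eye(3)      # small diagonal load keeps Q invertible

R = partial_correlation(Q)   # conditional (partial) correlations
Sigma = np.linalg.inv(Q)     # marginal covariances
```

Nodes 0 and 2 share no link, so their partial correlation is zero, yet their marginal covariance in `Sigma` is positive because they are connected through node 1.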
3.2.1 Relationship to Noise Model (NM) and Moving Average (MA) models
It can be shown that the NM and MA signal models considered in [23, 52] both correspond to specific GMRF models. Under the NM case, a signal is defined as
\[
f = c + \eta, \tag{3.4}
\]
where $c$ consists of constant values $[c_1, c_2, \ldots, c_N]^T$ and the vector $\eta$ contains i.i.d. zero-mean Gaussian noises $[\eta_1, \eta_2, \ldots, \eta_N]^T$ with variance $\sigma^2_{\eta,i}$ for the $i$-th element. The model can be expressed as a GMRF with mean $\mu_{NM} = c$ and covariance
\begin{align}
\Sigma_{NM} &= E[(f - c)(f - c)^T] \tag{3.5} \\
&= E[\eta \eta^T] \tag{3.6} \\
&= \mathrm{diag}\big([\sigma^2_{\eta,1}, \sigma^2_{\eta,2}, \ldots, \sigma^2_{\eta,N}]\big). \tag{3.7}
\end{align}
The covariance matrix contains non-zero values only for diagonal elements, i.e., the model assumes there is no pair-wise correlation between pixels. The assumption is usually not true for image and video signals, where neighboring pixels usually possess high similarity. For the MA model, on the other hand, the signal value for sample (node) $i$ is defined as
\[
f_i = \frac{1}{|\mathcal{N}_{[i]}|} \sum_{j \in \mathcal{N}_{[i]}} \epsilon_j + \alpha\, \eta_i, \tag{3.8}
\]
where $\mathcal{N}_{[i]}$ is the closed set of spatial neighbors of node $i$ ($\mathcal{N}_{[i]} = \mathcal{N}(i) \cup \{i\}$) and $\alpha$ is an adjustable parameter. $\epsilon_j$ and $\eta_i$ are both i.i.d. zero-mean Gaussian noise with variance $\sigma^2_{\epsilon,j}$ and $\sigma^2_{\eta,i}$ respectively. The equation can be expressed in matrix form using the adjacency matrix $A$ and degree matrix $D$ as
\[
f = (D + I)^{-1} (A + I)\, \epsilon + \alpha\, \eta, \tag{3.9}
\]
where the vectors $\epsilon$ and $\eta$ contain the i.i.d. Gaussian noises: $\epsilon = [\epsilon_1, \epsilon_2, \ldots, \epsilon_N]^T$ and $\eta = [\eta_1, \eta_2, \ldots, \eta_N]^T$. In [23, 52], the adjacency matrix for video signals contains only links between each pixel and its four immediate neighbors (pixels on the top, right, left, and bottom), i.e., pixels within 1 pixel width. Defining $\tilde{A} = (D + I)^{-1} (A + I)$, it can be shown that the model is also a specific GMRF, with zero mean and covariance matrix
\begin{align}
\Sigma_{MA} &= E[f f^T] \tag{3.10} \\
&= E[(\tilde{A} \epsilon + \alpha \eta)(\tilde{A} \epsilon + \alpha \eta)^T] \tag{3.11} \\
&= \tilde{A}\, E[\epsilon \epsilon^T]\, \tilde{A}^T + \alpha^2 E[\eta \eta^T] \tag{3.12} \\
&= \tilde{A}\, \mathrm{diag}\big([\sigma^2_{\epsilon,1}, \sigma^2_{\epsilon,2}, \ldots, \sigma^2_{\epsilon,N}]\big)\, \tilde{A}^T + \alpha^2\, \mathrm{diag}\big([\sigma^2_{\eta,1}, \sigma^2_{\eta,2}, \ldots, \sigma^2_{\eta,N}]\big). \tag{3.13}
\end{align}
In (3.13), the second term is a diagonal matrix consisting of noise variances, while the first term, which multiplies a diagonal matrix on the left and right by the normalized adjacency matrix $\tilde{A}$, contains non-zero elements only between each node and its neighbors within 2 hops (2 pixel width). In other words, the model assumes marginal independence between pixels that are more than two pixels away from each other. The assumption, although taking into account some local similarities between adjacent pixels, is also not realistic in general for images and videos, where similarities usually exist between pixels that are several pixels (more than 2 pixel width) away from each other.
3.2.2 Gaussian Markov Random Field (GMRF) Model
In our work, we define a generative GMRF model where the partial correlation, defined in the inverse covariance matrix (precision matrix) $Q$, is based on the graph connectivity, i.e. $Q_{i,j} = 0$ if there is no connection on the graph between nodes $v_i$ and $v_j$. In the applied GMRF, $Q$ is defined as
\[
Q = L + \delta I, \tag{3.14}
\]
where $\delta$ is a small positive constant used to ensure the invertibility of $Q$. Based on the chain rule in probability theory, the covariance matrix for the proposed GMRF has non-zero elements for any pair of nodes connected by a path within the graph, which provides a more realistic model than the NM and MA models considered in the related work. The described GMRF model has been adopted with great success in many applications in multimedia signal processing including [10, 84, 85].
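The difference between the three covariance structures can be checked numerically on a small path graph. The sketch below is illustrative only: unit noise variances, the weight $0.5$ on the independent noise term of the MA model, and $\delta = 0.05$ are assumed values, and `support_radius` is a hypothetical helper that reports the largest hop distance at which the covariance is non-zero.

```python
import numpy as np

# Toy input: 7-node path graph with unit weights.
N = 7
A = np.zeros((N, N))
for i in range(N - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
deg = A.sum(axis=1)
L = np.diag(deg) - A
I = np.eye(N)

# NM, eq. (3.7): diagonal covariance (unit variances assumed).
Sigma_nm = np.eye(N)

# MA, eq. (3.13): A_tilde = (D + I)^{-1} (A + I), unit variances assumed.
alpha = 0.5                                  # assumed parameter value
A_tilde = (A + I) / (deg + 1.0)[:, None]
Sigma_ma = A_tilde @ A_tilde.T + alpha**2 * I

# GMRF, eq. (3.14): Q = L + delta * I.
delta = 0.05                                 # assumed small constant
Sigma_gmrf = np.linalg.inv(L + delta * I)

# Hop distance between nodes i and j on a path graph is |i - j|.
hops = np.abs(np.subtract.outer(np.arange(N), np.arange(N)))

def support_radius(S, tol=1e-12):
    """Largest hop distance at which the covariance S is non-zero."""
    return int(hops[np.abs(S) > tol].max())

radii = {name: support_radius(S) for name, S in
         [("NM", Sigma_nm), ("MA", Sigma_ma), ("GMRF", Sigma_gmrf)]}
```

On this graph the NM covariance has radius 0, the MA covariance radius 2, and the GMRF covariance spans the whole graph, matching the discussion above.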
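3.2.3 Optimized Bipartition for GMRF
To find the optimal bipartition in terms of energy compaction, we apply the same objective function used in (3.1), which minimizes the magnitude of prediction residuals in $\mathcal{P}$. The bipartition $(\mathcal{P},\mathcal{U})$ and the prediction transform $\mathbf{P}$ are jointly optimized. For the GMRF model, given a chosen bipartition, the optimal predictor $\hat{f}_{\mathcal{P}}$ for $f_{\mathcal{P}}$ is the conditional expectation of $f_{\mathcal{P}}$ given $f_{\mathcal{U}}$, which is expressed as
\[
\hat{f}_{\mathcal{P}} = E[f_{\mathcal{P}} \mid f_{\mathcal{U}}] = \Sigma_{\mathcal{P},\mathcal{U}}\, \Sigma_{\mathcal{U},\mathcal{U}}^{-1}\, f_{\mathcal{U}} = -Q_{\mathcal{P},\mathcal{P}}^{-1}\, Q_{\mathcal{P},\mathcal{U}}\, f_{\mathcal{U}}, \tag{3.15}
\]
i.e., the maximum a posteriori (MAP) estimate. The prediction error in (3.1) with the optimal prediction
\[
\mathbf{P}^{*} = \Sigma_{\mathcal{P},\mathcal{U}}\, \Sigma_{\mathcal{U},\mathcal{U}}^{-1} = -Q_{\mathcal{P},\mathcal{P}}^{-1}\, Q_{\mathcal{P},\mathcal{U}} \tag{3.16}
\]
can therefore be rewritten as
\[
\begin{aligned}
(\mathcal{P},\mathcal{U})^{*} &= \arg\min_{(\mathcal{P},\mathcal{U})} \xi(\mathcal{P},\mathcal{U}) \\
&= \arg\min_{(\mathcal{P},\mathcal{U})} \mathrm{Tr}\big(E[(f_{\mathcal{P}} - \mathbf{P}^{*} f_{\mathcal{U}})(f_{\mathcal{P}} - \mathbf{P}^{*} f_{\mathcal{U}})^{T}]\big) \\
&= \arg\min_{(\mathcal{P},\mathcal{U})} \mathrm{Tr}\big(\Sigma_{\mathcal{P},\mathcal{P}} - \Sigma_{\mathcal{P},\mathcal{U}}\, \Sigma_{\mathcal{U},\mathcal{U}}^{-1}\, \Sigma_{\mathcal{U},\mathcal{P}}\big) \\
&= \arg\min_{(\mathcal{P},\mathcal{U})} \mathrm{Tr}\big(Q_{\mathcal{P},\mathcal{P}}^{-1}\big),
\end{aligned} \tag{3.17}
\]
where $\xi(\mathcal{P},\mathcal{U})$ denotes the prediction error and the step from the 3rd to the 4th line is based on the Schur complement [86]. This is an NP-hard problem, and therefore an approximation is required. In our work, we apply a greedy approximation for solving (3.17), which is summarized in Algorithm 1. In the algorithm, given a graph $G = (\mathcal{V}, \mathcal{E})$, the update set is initialized as the empty set $\mathcal{U}^0 = \emptyset$, and the prediction set $\mathcal{P}^0$, on the other hand, is initialized as the whole node set $\mathcal{V}$. The superscript denotes the iteration. In iteration $t$, the algorithm selects an optimal node $v^{*}$ from $\mathcal{P}^{t-1}$ that minimizes the prediction error $\xi(\mathcal{P}^{t-1} \setminus \{v\}, \mathcal{U}^{t-1} \cup \{v\})$ of the remaining prediction nodes given the signal in the update set with node $v$ included. The optimal node is then added to the update set: $\mathcal{U}^{t} = \mathcal{U}^{t-1} \cup \{v^{*}\}$. The process continues until the target size $|\mathcal{U}|$ is reached.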
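The two identities used above, the covariance and precision forms of the MAP predictor in (3.15) and (3.16) and the Schur complement step in (3.17), can be verified numerically. The following sketch assumes a zero-mean GMRF on a 3x3 grid graph with $Q = L + \delta I$ and $\delta = 0.05$. The grid, the choice of update set, and the helper `grid_laplacian` are illustrative assumptions.

```python
import numpy as np

def grid_laplacian(rows, cols):
    """Laplacian of an unweighted 4-connected grid graph."""
    n = rows * cols
    A = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                A[i, i + 1] = A[i + 1, i] = 1.0
            if r + 1 < rows:
                A[i, i + cols] = A[i + cols, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

Q = grid_laplacian(3, 3) + 0.05 * np.eye(9)   # precision matrix
Sigma = np.linalg.inv(Q)                      # covariance matrix

U = [0, 4, 8]                                 # assumed update set
P = [i for i in range(9) if i not in U]       # prediction set

S_pp, S_pu, S_uu = Sigma[np.ix_(P, P)], Sigma[np.ix_(P, U)], Sigma[np.ix_(U, U)]
Q_pp, Q_pu = Q[np.ix_(P, P)], Q[np.ix_(P, U)]

# MAP predictor in covariance form and in precision form.
P_from_cov = S_pu @ np.linalg.inv(S_uu)
P_from_prec = -np.linalg.solve(Q_pp, Q_pu)

# Expected prediction error in covariance form and in precision form.
err_from_cov = np.trace(S_pp - S_pu @ np.linalg.inv(S_uu) @ S_pu.T)
err_from_prec = np.trace(np.linalg.inv(Q_pp))
```

Both pairs agree up to floating point error, which is why the objective can be evaluated from $Q_{\mathcal{P},\mathcal{P}}$ alone without forming the covariance matrix.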
Algorithm 1 Greedy solution for bipartition
Input: Graph $G = (\mathcal{V}, \mathcal{E})$ and target $|\mathcal{U}| = m$
Output: Bipartition $(\mathcal{U}, \mathcal{P})$
1: Initialize $\mathcal{U}^0 = \emptyset$, $\mathcal{P}^0 = \mathcal{V}$
2: for $t = 1 : 1 : m$ do
3: Select $v^{*} = \arg\min_{v} \xi(\mathcal{P}^{t-1} \setminus \{v\}, \mathcal{U}^{t-1} \cup \{v\})$
4: $\mathcal{P}^{t} \leftarrow \mathcal{P}^{t-1} \setminus \{v^{*}\}$
5: $\mathcal{U}^{t} \leftarrow \mathcal{U}^{t-1} \cup \{v^{*}\}$
6: end for
Similar optimization approaches for partitioning have been used in applications such as dynamic networks [51] and active learning [39]. However, solving the problem in (3.17) directly requires computing the inverse $Q_{\mathcal{P},\mathcal{P}}^{-1}$ in every iteration $t$ for each candidate node $v$ in $\mathcal{P}^{t-1}$, which would be too complex in practice. Therefore, in [39], the authors propose an efficient sequential optimization scheme. In this algorithm, the eigen-decomposition of $Q$ is performed at the beginning. For the remaining iterations, only a matrix-vector multiplication for each candidate node in $\mathcal{P}^{t-1}$ is required. As a result, the computational complexity of the bipartition can be reduced to $O(N^3)$.
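For illustration, Algorithm 1 can be written directly from the objective $\mathrm{Tr}(Q_{\mathcal{P},\mathcal{P}}^{-1})$. The sketch below is the naive variant that recomputes the inverse for every candidate, not the efficient sequential scheme of [39]. The function name, the 5-node path graph, and $\delta = 0.1$ are assumptions made for the example.

```python
import numpy as np

def greedy_bipartition(Q, m):
    """Naive greedy selection of m update nodes minimizing Tr(Q_PP^{-1}).

    Recomputes the inverse for every candidate node, so it is only
    practical for small graphs.
    """
    N = Q.shape[0]
    assert 0 < m < N
    U, P = [], list(range(N))
    for _ in range(m):
        best_v, best_err = None, np.inf
        for v in P:
            rest = [p for p in P if p != v]
            err = np.trace(np.linalg.inv(Q[np.ix_(rest, rest)]))
            if err < best_err:
                best_v, best_err = v, err
        P.remove(best_v)
        U.append(best_v)
    return U, P

# Toy input: 5-node path graph with unit weights.
N = 5
A = np.zeros((N, N))
for i in range(N - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
Q = np.diag(A.sum(axis=1)) - A + 0.1 * np.eye(N)

U, P = greedy_bipartition(Q, m=2)
```

On this path graph the first selected node is the center node, since removing it leaves the two shortest remaining chains to predict.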
3.2.4 Analysis of Proposed Bipartition
We observe that the nodes selected for $\mathcal{U}$ using the proposed algorithm are usually not distributed evenly. In fact, the distribution varies according to the local correlation defined in the GMRF model, as shown in the toy example in Fig. 3.1. In the example, pixel intensities have higher variation on the left side of the block than on the right side. We use a 4-connected grid graph to represent the block (Fig. 3.1(b)). The link weights are decided based on the edges detected: links across edges are assigned a small weight $w < 1$, while other links have weight 1. Then, we apply the proposed algorithm discussed in the previous section and select $\frac{1}{4}$ of the nodes to be included in $\mathcal{U}$ (Fig. 3.1(c)). As a result, the density of nodes selected for $\mathcal{U}$ is higher in the high variance region (left) than in the smooth area. Intuitively, the pixels in the low variance region have similar intensities even though they may be several hops away from each other, and therefore selecting a large number of samples in these regions can be redundant. For pixels in high variance regions, on the other hand, a large number of update pixels is required in order to ensure good prediction quality.
Figure 3.1: Example of $\mathcal{U}$ node selection by the greedy algorithm for MAP error minimization. (a) Block with edge structure. (b) 4-connected grid graph constructed based on the edge structures in (a), with link weights 0.1 and 1. (c) Selected $\mathcal{U}$ nodes on the graph in (b).
Figure 3.2: Example of block with $\mathcal{P}$ nodes (blue nodes) with low connectivity to the update set (red nodes). Panels (a) and (b).
3.3 Bipartite Graph Formulation
In this section, we discuss the method for bipartite graph construction given the bipartition. In the previous work using lifting [53, 54, 56, 57], bipartite graphs were obtained by removing those links connecting two nodes that are both in $\mathcal{P}$ or both in $\mathcal{U}$, i.e., conflicting links. However, as mentioned in Section 3.2.4, using the proposed bipartition based on minimizing the MAP prediction error, the density of nodes in $\mathcal{U}$ varies depending on local correlation. As a result, nodes in $\mathcal{P}$ in the low variance areas, where the $\mathcal{U}$ nodes have low density, tend to have low connectivity to $\mathcal{U}$. An example is shown in Fig. 3.2(a), where we consider the same graph used in Fig. 3.1(b), on which half of the nodes are selected as $\mathcal{U}$ nodes. If we take a close look at the low variance area (marked with a red box), it can be observed that there exist nodes in $\mathcal{P}$ that do not have any direct connection to $\mathcal{U}$, while most of the $\mathcal{P}$ nodes have only one neighbouring node in $\mathcal{U}$. That is, the nodes in $\mathcal{P}$ will have a limited amount of information for prediction. As a result, high prediction error may occur in the smooth regions when using a localized prediction transform, e.g., the generalized CDF 5/3.
3.3.1 Kron Reduction based Reconnection
In order to ensure that every node in $\mathcal{P}$ has a sufficient number of $\mathcal{U}$ neighbors for prediction, we propose a Kron reduction based reconnection approach for generating the bipartite graph $G_{bpt}$ used in the transform. One important property of the Kron reduction, as described in (2.10), is that it maintains graph connectivity, i.e. after removing the nodes in $\mathcal{S}^c \subset \mathcal{V}$ ($\mathcal{V} = \mathcal{S} \cup \mathcal{S}^c$), two nodes in $\mathcal{S}$ that were connected by a path through the removed nodes $\mathcal{S}^c$ will remain connected in the reduced graph. An example of Kron reduction after removing two nodes sequentially is shown in Fig. 3.3. Thanks to this property, for a node $v \in \mathcal{P}$ with low connectivity to the update set in $G$, we can generate links connecting $v$ to $\mathcal{U}$ nodes that are more than 1 hop away by removing the other nodes in $\mathcal{P}$ using Kron reduction. The newly connected $\mathcal{U}$ nodes are those which are connected to $v$ in the original graph by a path through the removed $\mathcal{P}$ nodes. In our construction of the bipartite graph, we apply the same process for every node $v$ in $\mathcal{P}$. Then, a sparsification process, which keeps only the $k$ links with the largest weights connecting $v$ to $\mathcal{U}$, is applied to the resulting graph in order to reduce the complexity of the following transform stages. The details of the algorithm are summarized in Algorithm 2.
Figure 3.3: Example of iterative Kron reduction by removing 1 node at a time
Algorithm 2 Kron reduction based reconnection for $\mathcal{P}$
Input: Adjacency matrix $A$ of graph $G$ and bipartition $(\mathcal{U}, \mathcal{P})$
Output: Adjacency matrix $A_{bpt}$ of the bipartite graph $G_{bpt}$
1: Initialize $A_{bpt}$ as an empty $N \times N$ matrix
2: Compute the degree matrix $D$, where $D_{i,i} = \sum_j A_{i,j}$
3: Graph Laplacian $L = D - A$
4: for $v_i \in \mathcal{P}$ do
5: Define $\mathcal{U}^{+} = \mathcal{U} \cup \{v_i\}$, and $\mathcal{P}^{-} = \mathcal{P} \setminus \{v_i\}$
6: Compute the Kron reduction $\mathcal{L}_{\mathcal{U}^{+}} = L_{\mathcal{U}^{+},\mathcal{U}^{+}} - L_{\mathcal{U}^{+},\mathcal{P}^{-}}\, L_{\mathcal{P}^{-},\mathcal{P}^{-}}^{\dagger}\, L_{\mathcal{P}^{-},\mathcal{U}^{+}}$
7: Assign the weight of the link connecting $v_i$ and $v_j \in \mathcal{U}$ to be $A_{bpt}(i,j) = -\mathcal{L}_{\mathcal{U}^{+}}(a,b)$, where $a$ and $b$ are the indices of $v_i$ and $v_j$ in the reduced Laplacian $\mathcal{L}_{\mathcal{U}^{+}}$
8: Keep only the $k$ links between $v_i$ and $\mathcal{U}$ with the largest weights
9: end for
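A direct, unoptimized rendering of Algorithm 2 is sketched below. It recomputes a full Schur complement for each prediction node rather than using the iterative update, and the function name, the tolerance used to discard zero weights, and the toy path graph are assumptions for illustration.

```python
import numpy as np

def kron_reconnect(A, U, P, k):
    """Bipartite adjacency obtained by reconnecting each node of P to U.

    For every v_i in P, all other P nodes are eliminated by Kron
    reduction and the k strongest links from v_i to U are kept.
    """
    N = A.shape[0]
    L = np.diag(A.sum(axis=1)) - A
    A_bpt = np.zeros((N, N))
    for i in P:
        keep = list(U) + [i]                  # U+ with v_i placed last
        drop = [p for p in P if p != i]       # P- (nodes to eliminate)
        L_red = L[np.ix_(keep, keep)]
        if drop:
            L_kd = L[np.ix_(keep, drop)]
            L_dd = L[np.ix_(drop, drop)]
            L_red = L_red - L_kd @ np.linalg.pinv(L_dd) @ L_kd.T
        w = -L_red[-1, :-1]                   # weights from v_i to each U node
        for t in np.argsort(w)[::-1][:k]:
            if w[t] > 1e-12:
                A_bpt[i, U[t]] = A_bpt[U[t], i] = w[t]
    return A_bpt

# Toy input: path graph 0 - 1 - 2 - 3 - 4 with update nodes at both ends.
N = 5
A = np.zeros((N, N))
for i in range(N - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
U, P = [0, 4], [1, 2, 3]

A_bpt = kron_reconnect(A, U, P, k=2)
```

In the original graph node 2 has no neighbor in the update set. After reconnection it is linked to both update nodes with weight 0.5, while node 1 keeps its direct unit-weight link to node 0 and gains a weaker link (1/3) to node 4.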
3.3.2 Iterative Kron Reduction
Another useful property of the Kron reduction is that it can be computed iteratively, i.e. the Kron reduction applied after removing the nodes in the set $\mathcal{S}^c = \{v_1, v_2, \ldots, v_m\}$ leads to the same graph as removing the nodes $v_i \in \mathcal{S}^c$ one at a time over $m$ iterations and adapting the graph at each iteration. This avoids the computation of a matrix inverse (the matrix inversion becomes a simple division by a constant). Note that the order of node removal does not affect the result. If the reduction is done in a certain order $\{v_1, v_2, \ldots, v_m\}$, denoting by $\mathcal{S}^t$ the node set kept in the $t$-th iteration, then the Kron reduction in iteration $t$ is computed as
\[
\mathcal{L}^{t} = \mathcal{L}^{t-1}_{\bar{v}_t,\bar{v}_t} - \mathcal{L}^{t-1}_{\bar{v}_t,v_t} \big(\mathcal{L}^{t-1}_{v_t,v_t}\big)^{-1} \mathcal{L}^{t-1}_{v_t,\bar{v}_t}, \tag{3.18}
\]
where the index $\bar{v}_t$ corresponds to the nodes in $\mathcal{S}^{t-1} \setminus \{v_t\}$. In each iteration, the complexity depends on the number of nonzero elements in $\mathcal{L}^{t-1}_{\bar{v}_t,v_t}$, i.e. the number of links connecting to the eliminated node. Defining $c$ to be the maximum number of links connecting to the eliminated node over the iterations, eliminating one node takes $O(c^2)$ operations to perform the outer product $\mathcal{L}^{t-1}_{\bar{v}_t,v_t} \mathcal{L}^{t-1}_{v_t,\bar{v}_t}$, and the total complexity of the Kron reduction is $O(c^2 N)$. In [22], the authors show that a graph reduced from a sparsely connected graph will in general remain sparse through the Kron iterations, thus $c$ is usually much smaller than $N$. For bipartite graph construction using the proposed reconnection, the process needs to be done for all the nodes $v \in \mathcal{P}$. Note that during the Kron reduction at each node $v$, only the connection from $v$ to $\mathcal{U}$ is considered, i.e., only one row from (3.18) is calculated, so the cost of the iterative Kron reduction at each node is reduced to $O(cN)$. Therefore, the overall complexity of the re-connection for all the nodes in $\mathcal{P}$ is $O(cN^2)$ in one lifting level.
Figure 3.4: Example of bipartite graph construction using two different schemes. (a) Bipartite graph built by removing links connecting nodes in the same set ($\mathcal{U}$ or $\mathcal{P}$). (b) Bipartite graph built using Kron reduction and sparsification for nodes in $\mathcal{P}$.
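The one-node update in (3.18) and its equivalence to the batch reduction can be checked with a short sketch. The helper names and the 3x3 grid test graph are assumptions. The code uses dense matrices for clarity and therefore does not exhibit the $O(c^2)$ per-node cost, which relies on exploiting sparsity.

```python
import numpy as np

def kron_remove_one(L, v):
    """Eliminate the node at position v of L with a rank-one update, eq. (3.18)."""
    keep = np.array([i for i in range(L.shape[0]) if i != v])
    col = L[keep, v]
    row = L[v, keep]
    return L[np.ix_(keep, keep)] - np.outer(col, row) / L[v, v]

def kron_iterative(L, remove):
    """Eliminate the nodes listed in `remove`, one at a time."""
    labels = list(range(L.shape[0]))
    for v in remove:
        idx = labels.index(v)       # current position of node v
        L = kron_remove_one(L, idx)
        labels.pop(idx)
    return L, labels

def grid_laplacian(rows, cols):
    """Laplacian of an unweighted 4-connected grid graph."""
    n = rows * cols
    A = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                A[i, i + 1] = A[i + 1, i] = 1.0
            if r + 1 < rows:
                A[i, i + cols] = A[i + cols, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

L = grid_laplacian(3, 3)
remove = [1, 3, 5, 7]
keep = [0, 2, 4, 6, 8]

L_iter, kept = kron_iterative(L, remove)
L_rev, _ = kron_iterative(L, remove[::-1])

# Batch Schur complement for comparison.
L_kd = L[np.ix_(keep, remove)]
L_batch = L[np.ix_(keep, keep)] - L_kd @ np.linalg.inv(L[np.ix_(remove, remove)]) @ L_kd.T
```

Both removal orders reproduce the batch result, and no matrix inverse is needed in the iterative version beyond a scalar division.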
3.3.3 Probabilistic Interpretation
Using the Kron reduction based reconnection for every node in $\mathcal{P}$, we can ensure that every node has enough neighbors in $\mathcal{U}$ for prediction with the localized CDF 5/3 transform. Moreover, we can show that the generalized CDF 5/3 predictor $\mathbf{P}_{bpt}$ applied on the bipartite graph $G_{bpt}$ without sparsification, which is written as
\[
\mathbf{P}_{bpt} = \big(D^{bpt}_{\mathcal{P},\mathcal{P}}\big)^{-1} A^{bpt}_{\mathcal{P},\mathcal{U}}, \tag{3.19}
\]
where $D^{bpt}$ and $A^{bpt}$ denote the degree and adjacency matrices of $G_{bpt}$, is equivalent to the MAP estimator in (3.15) for the defined GMRF model. A proof is provided in Appendix A. Therefore, the prediction transform we propose (without bipartite graph sparsification) is optimal in terms of prediction error minimization. In addition, with the help of the iterative algorithm, the Kron reduction based reconnection has lower complexity than directly computing the MAP estimator, which requires a matrix inversion.
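The equivalence between (3.19) and the MAP estimator can be illustrated numerically. The sketch below makes one simplifying assumption that should be stated clearly: it takes $\delta \to 0$, i.e. it uses the Laplacian $L$ itself as the precision matrix (as Algorithm 2 does), so that each reduced matrix is again a Laplacian. The random link weights, the choice of update set, and the function name are also assumptions.

```python
import numpy as np

def predictor_from_kron(L, U, P):
    """Rows of D^{-1} A on the reconnected bipartite graph (no sparsification)."""
    rows = []
    for i in P:
        keep = list(U) + [i]
        drop = [p for p in P if p != i]
        L_kd = L[np.ix_(keep, drop)]
        L_dd = L[np.ix_(drop, drop)]
        L_red = L[np.ix_(keep, keep)] - L_kd @ np.linalg.inv(L_dd) @ L_kd.T
        w = -L_red[-1, :-1]          # reconnected link weights from v_i to U
        rows.append(w / w.sum())     # normalize by the bipartite degree
    return np.array(rows)

# Toy input: 3x3 grid graph with random positive link weights.
rng = np.random.default_rng(0)
rows_, cols_ = 3, 3
n = rows_ * cols_
A = np.zeros((n, n))
for r in range(rows_):
    for c in range(cols_):
        i = r * cols_ + c
        if c + 1 < cols_:
            A[i, i + 1] = A[i + 1, i] = rng.uniform(0.1, 1.0)
        if r + 1 < rows_:
            A[i, i + cols_] = A[i + cols_, i] = rng.uniform(0.1, 1.0)
L = np.diag(A.sum(axis=1)) - A

U = [0, 4, 8]
P = [i for i in range(n) if i not in U]

P_bpt = predictor_from_kron(L, U, P)
P_map = -np.linalg.solve(L[np.ix_(P, P)], L[np.ix_(P, U)])
```

Under this assumption the two predictors coincide row by row, and each row sums to one, so the prediction is a weighted average of the update samples.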
3.4 Experiments
In our experiments, we apply the lifting transform using the proposed bipartition and reconnection algorithms to test images and predicted residues from video data. The graph construction is based on the edge locations, which is the same approach used in our application to video compression in the next chapter. For the links across edges, a weight $w < 1$ is assigned, while the rest of the links are assigned weight 1. For baseline comparison, we consider the related approaches described in Section 3.1, namely the (1) maximum spanning tree (MST), (2) Noise Model (NM) and (3) Moving Average (MA) model based methods. For the first approach, since there is no assumption on the types of prediction and update transforms in the bipartition design, we apply the same reconnection and transforms used in our proposed scheme. For the bipartition based on the NM and MA, on the other hand, we apply the same predictor as assumed in the optimization described in [23, 52]: the predicted signal $\hat{f}_i$ on node $v_i \in \mathcal{P}$ is computed as
\[
\hat{f}_i = \frac{1}{m_i} \sum_{j \in \mathcal{N}(i) \cap \mathcal{U}} f_j, \tag{3.20}
\]
where $m_i = |\mathcal{N}(i) \cap \mathcal{U}|$. As in the implementation in [23, 52], the weak links across image edges are removed before the bipartition and prediction. The update filter is orthogonalized using the method in [68] in all of these cases.
The comparison is based on the energy compaction in the update set $\mathcal{U}$ in the transformed domain, i.e. we compare the mean squared error of reconstruction after truncating the coefficients in $\mathcal{P}$ in the transformed domain. The test set contains 4 images: man, airplane, lena, peppers, and 2 intra-predicted residual sequences: Cactus and Kimono, consisting of 5 frames in each sequence. The images or video frames are first divided into multiple non-overlapping $8 \times 8$ blocks, to which the lifting schemes are applied separately. In our results, we only consider blocks where some edges are present, and the mean squared errors are derived as the average over all the edge blocks. For MST, the number of nodes in set $\mathcal{U}$ is determined given the starting node in the implementation, while in NM, MA, and the proposed method, the size of $\mathcal{U}$ is decided by the user. In the experiments, we consider different bipartition rates $|\mathcal{U}|/N$ for the three methods. Note that for NM and MA model-based bipartition, $\mathcal{U}$ is initialized as the minimum Set Cover, and therefore the minimum bipartition rate is higher.
The results in Fig. 3.5 show that the proposed method using the GMRF model consistently outperforms the baseline approaches, especially for lower bipartition rates. This is because the proposed GMRF model captures more of the correlation between pixels that are further away from each other, which is usually high for pixels in smooth areas. Therefore, in the bipartition, the algorithm will select more nodes in high variation regions for $\mathcal{U}$, as described in Section 3.2.4, resulting in more efficient prediction compared to the baseline approaches. Note that the experiment only considers a 1 level lifting transform. The extension to multi-level can be done by applying the same process of bipartition and transform iteratively from low level to high level on the downsampled graphs as done in [53, 54, 57, 68]. However, the optimization of the bipartition considering multiple levels is still an open question. In our application to video compression discussed in the next chapter, we apply a different bipartition approach for multi-level lifting, which empirically gives better performance than the conventional method in [53, 54, 57, 68].
3.5 Summary
In this chapter, we defined the problem of optimal bipartition in the lifting transform in terms of energy compaction for the generative GMRF model. The model provides more accurate modeling for real signals. A greedy approximation was applied for selecting the nodes to be included in the update set in the experiment. In addition, we proposed a reconnection approach for bipartite graph construction in order to solve the problem of connectivity loss caused by the uneven distribution of nodes in the update set using the proposed bipartition. Our experimental results show that the proposed method outperforms the baseline bipartition approaches in terms of energy compaction.
Figure 3.5: Reconstruction error (MSE) after truncating the coefficients in $\mathcal{P}$ for different bipartition rates. (a) Man. (b) Airplane. (c) Lena. (d) Peppers. (e) Kimono: intra-predicted residue. (f) Cactus: intra-predicted residue.
Chapter 4
Video Coding Application
In Chapter 3, we discussed the optimization of graph based lifting in terms of energy compaction from a probabilistic perspective. Specifically, we proposed an optimized bipartition based on a generative GMRF model derived from the graph connectivity, in order to maximize the energy compacted in the update set ($\mathcal{U}$) in the transformed domain. Besides, since the problem is NP-hard, we provided a greedy solution, which has shown promising results in energy compaction for different $|\mathcal{U}|$ as compared to the baseline methods. In this chapter, we apply the proposed lifting scheme to intra-predicted video compression. In the design of image and video codecs, there are many components that need to be designed in addition to the transform scheme discussed in the previous chapter. These include 1) graph construction, 2) overhead signaling, 3) transformed coefficient scanning, and 4) rate distortion optimization.
The advantage of a graph representation lies in its adaptation to different signal characteristics. A better graph representation, whose connections and weight assignment capture the pair-wise similarity more accurately, can enable better prediction and energy compaction. However, a more complex representation also increases the overhead required to describe the graph structure, i.e., the information needed so that the decoder can construct the inverse transform. In this chapter, we will discuss a graph construction approach suitable for predicted video residues and its corresponding overhead signaling. Moreover, we will discuss the ordering of transformed coefficients and the associated entropy coding. Most of the work in this chapter was published in [80].
4.1 Graph Construction
We apply block based encoding, i.e. a video frame is divided into non-overlapping $m \times m$ blocks, on which transforms are applied. For graph construction in each block, we take a 4-connected grid graph, shown in Fig. 4.1, which has been widely adopted for image and video frame representation, as a starting point. The weight assignment is based on the edge structure. For two nodes with an edge detected in between, the pair is considered as having weak correlation and is assigned a nonzero weight $w < 1$ on the link connecting them, while other node pairs are considered as having strong correlation, with link weight equal to 1. This is similar to the method applied in [28] for depth maps. The only difference is that instead of fully disconnecting links across edges, we assign a nonzero weak weight. This is because the intensity difference of pixels across edges in intra-predicted video and other natural images is not as sharp as that encountered with depth maps. Also, edge structures are more complicated in intra-predicted residual sequences than in depth maps, and therefore fully disconnecting the links across edges may produce many disconnected graph components.
Figure 4.1: Example of 4-connected grid graph
The choice of $w$ is based on the minimization of the prediction error using the generalized CDF 5/3 transform for every node $v_i \in \mathcal{V}$ using the information from its one hop neighbors $v_j \in \mathcal{N}(i)$. The CDF 5/3 predictor of node $v_i$ is written as
\[
\phi(i) = \frac{1}{\deg(i)} \Big( \sum_{j \in \mathcal{N}(i)_w} w f_j + \sum_{j \in \mathcal{N}(i)_s} f_j \Big), \tag{4.1}
\]
where $\mathcal{N}(i)_w$ and $\mathcal{N}(i)_s$ indicate the sets of adjacent nodes of $v_i$ with weak correlation and strong correlation respectively, and $\deg(i)$ is the degree of $v_i$ computed as
\[
\deg(i) = w\, |\mathcal{N}(i)_w| + |\mathcal{N}(i)_s|. \tag{4.2}
\]
The optimal weight $w^{*}$ is found by solving the following minimization of the total prediction error:
\[
w^{*} = \arg\min_{w} \sum_{i} \big(f_i - \phi(i)\big)^2. \tag{4.3}
\]
The optimization is performed on the intra-predicted frames from a set of training video sequences. Note that the problem in (4.3) is not convex since the normalization factor in (4.2) contains the unknown variable $w$. In our experiments, we apply gradient descent with multiple initial points generated randomly, and select the $w^{*}$ with minimum prediction error. For the construction of downsampled graphs in lifting levels $\ell > 1$, we apply Kron reduction with sparsification, described in Section 2.4.3.
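A minimal sketch of the search in (4.3) is given below. It is not the training procedure used in the experiments: the finite-difference gradient, the step size, the clamping of $w$ to $[10^{-3}, 1]$, the number of restarts, and the toy step signal are all assumptions chosen to keep the example short.

```python
import numpy as np

def prediction_error(w, f, links, weak):
    """Total squared error of (4.3) for weak-link weight w, using (4.1) and (4.2)."""
    num = np.zeros_like(f)
    deg = np.zeros_like(f)
    for (i, j), is_weak in zip(links, weak):
        a = w if is_weak else 1.0
        num[i] += a * f[j]
        num[j] += a * f[i]
        deg[i] += a
        deg[j] += a
    return float(np.sum((f - num / deg) ** 2))

def fit_weak_weight(f, links, weak, n_starts=5, steps=200, lr=0.01, seed=0):
    """Gradient descent on w from several random starting points."""
    rng = np.random.default_rng(seed)
    h = 1e-4                                   # finite-difference step (assumed)
    best_w, best_err = None, np.inf
    for _ in range(n_starts):
        w = rng.uniform(0.05, 0.95)
        for _ in range(steps):
            g = (prediction_error(w + h, f, links, weak)
                 - prediction_error(w - h, f, links, weak)) / (2 * h)
            w = float(np.clip(w - lr * g, 1e-3, 1.0))
        err = prediction_error(w, f, links, weak)
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err

# Toy input: a step signal on a 6-node path, with one weak link across the step.
f = np.array([0.0, 0.0, 0.0, 10.0, 10.0, 10.0])
links = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
weak = [False, False, True, False, False]

best_w, best_err = fit_weak_weight(f, links, weak)
```

For this idealized step signal the error decreases monotonically as $w$ shrinks, so the search ends at the lower clamp. On real residual data the edges are less sharp, which is consistent with the choice in the text of a nonzero weak weight.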
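4.2 Bipartition Scheme
Algorithm 3 Boundary/Edge extension for sampling on graphs
Input: Graph $G = (\mathcal{V}, \mathcal{E})$, and target $\mathcal{U}$ size $m$
Output: $\mathcal{U}$ and $\mathcal{P}$ after sampling
1: Extend the adjacency matrix $A$ to include the extended nodes (called $A_{ext}$) around boundaries and edges.
2: Compute the Laplacian matrix as $D_{ext} - A_{ext} + \delta I$.
3: Initialize $\mathcal{U} = \emptyset$, $\mathcal{P} = \mathcal{V}$
4: for $t = 1 : 1 : m$ do
5: Choose the sample $y$ such that $y$, along with its extended nodes $\{y', y''\}$ and the set $\mathcal{U}^{t}$ from the previous iteration, minimizes the MAP error in set $\mathcal{P}$.
6: $\mathcal{U} = \mathcal{U} \cup \{y\}$, and $\mathcal{P} = \mathcal{P} \setminus \{y\}$.
7: end for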
For bipartition, we apply the optimized scheme based on MAP error minimization proposed in Section 3.2.3. Note that in the GMRF, the diagonal element $Q_{i,i}$ of the precision matrix $Q = L + \delta I$ can be interpreted as the inverse of the prediction error variance for node $v_i$ given $\mathcal{V} \setminus \{v_i\}$. Hence, the nodes around block boundaries and edges, which have lower degree, are considered to have large prediction error, and are therefore given higher priority in the selection for the update set. However, the pixels near boundaries tend to be further away from other pixels, so a high sampling density around boundaries reduces the efficiency of prediction. To address this issue, we make the number of links for each node equal by using a symmetric boundary extension as shown in Fig. 4.2. As a result, the graph used for bipartition is augmented. The approach is consistent with the filterbank used later, which also uses a boundary extension with degree normalization. If a node $v$ (e.g. node 11 in the example) is selected as a sample to be included in $\mathcal{U}$, its mirrored nodes (denoted $11'$) are also selected. Note that the weight between an extended node $v'$ and a boundary node $x$ (e.g., node 15) is equal to the weight between $v$ and $x$. The same idea is also applied for nodes around edges. This method of bipartition including the boundary and edge extension is summarized in Algorithm 3.
Note that in this work, we propose a novel bipartition strategy for multi-level decomposition. In [23, 52], the optimization for $\mathcal{U}_\ell$ in level $\ell$ considers only the minimization of the errors stored in $\mathcal{P}_\ell$. It ignores the fact that since $\mathcal{U}_{\ell-1} = \mathcal{U}_\ell \cup \mathcal{P}_\ell$, the selection of $\mathcal{U}_\ell$ will also affect the prediction for $\mathcal{P}$ in the lower levels, i.e. $\{\mathcal{P}_{\ell-1}, \mathcal{P}_{\ell-2}, \ldots, \mathcal{P}_2, \mathcal{P}_1\}$. Therefore, in our bipartition scheme for multi-level decomposition, the update set $\mathcal{U}_\ell$ in level $\ell$ is optimized such that the prediction error in $\mathcal{S} = \{\mathcal{P}_\ell, \mathcal{P}_{\ell-1}, \ldots, \mathcal{P}_2, \mathcal{P}_1\}$ is minimized. The objective function is written as
\[
\begin{aligned}
\mathcal{U}_\ell^{*} &= \arg\min_{\mathcal{U}_\ell} \xi(\mathcal{S}, \mathcal{U}_\ell) \\
&= \arg\min_{\mathcal{U}_\ell} E\big[\| f_{\mathcal{S}} - \mathbf{P} f_{\mathcal{U}_\ell} \|^2\big],
\end{aligned} \tag{4.4}
\]
where $f_{\mathcal{S}}$ and $f_{\mathcal{U}_\ell}$ correspond to the signals in $\mathcal{S}$ and $\mathcal{U}_\ell$, and $\mathbf{P}$ is the MAP estimator of $f_{\mathcal{S}}$ given $f_{\mathcal{U}_\ell}$. The greedy algorithm is summarized in Algorithm 4.
Algorithm 4 Bipartition in multi-level lifting transform
Input: Graph $G = (\mathcal{V}, \mathcal{E})$, maximum level $q$, and target sizes $\{|\mathcal{U}_m|\}_{m=1:q}$
Output: $\{\mathcal{U}_m\}_{m=1:q}$
1: Initialize $\mathcal{U} = \emptyset$ and $\mathcal{P} = \mathcal{V}$
2: for $m = q : -1 : 1$ do
3: Initialize $\mathcal{U}_m = \emptyset$
4: for $s = 1 : 1 : |\mathcal{U}_m|$ do
5: Select $v_i^{*} = \arg\min_{v_i} \xi(\mathcal{P} \setminus \{v_i\}, \mathcal{U} \cup \{v_i\})$
6: $\mathcal{P} = \mathcal{P} \setminus \{v_i^{*}\}$
7: $\mathcal{U} = \mathcal{U} \cup \{v_i^{*}\}$
8: end for
9: $\mathcal{U}_m = \mathcal{U}$
10: end for
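One way to read Algorithm 4 is as a single greedy pass that starts at the highest level and keeps growing the same update set as the level decreases, so that the update sets are nested. The sketch below follows that reading and grows the set until its cumulative size reaches the target for each level, which is an interpretation of the inner loop rather than something stated explicitly. The boundary extension of Algorithm 3 is omitted, and the function names, the path graph, and $\delta = 0.1$ are assumptions.

```python
import numpy as np

def map_error(Q, S):
    """Expected MAP prediction error Tr(Q_SS^{-1}) for the node list S."""
    if not S:
        return 0.0
    return float(np.trace(np.linalg.inv(Q[np.ix_(S, S)])))

def multilevel_bipartition(Q, sizes):
    """Greedy multi-level selection.

    sizes[m] is the target |U_m| for level m = 1..q, with
    sizes[q] < ... < sizes[1] < N (assumed).
    """
    N = Q.shape[0]
    q = max(sizes)
    U, P = [], list(range(N))
    levels = {}
    for m in range(q, 0, -1):                 # from the highest level down
        while len(U) < sizes[m]:
            best = min(P, key=lambda v: map_error(Q, [p for p in P if p != v]))
            P.remove(best)
            U.append(best)
        levels[m] = list(U)                   # snapshot of U for level m
    return levels

# Toy input: 8-node path graph, two lifting levels.
N = 8
A = np.zeros((N, N))
for i in range(N - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
Q = np.diag(A.sum(axis=1)) - A + 0.1 * np.eye(N)

levels = multilevel_bipartition(Q, sizes={1: 4, 2: 2})
```

Because the error is always evaluated over every node not yet in the update set, the nodes chosen for the highest level are selected with all lower-level prediction sets in mind, which is the point of the objective in (4.4).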
4.3 Transform Design
The bipartite graph approximation $G_{bpt}$ is constructed by reconnecting $\mathcal{P}$ nodes using the Kron reduction, as discussed in Section 3.3. As mentioned, the method solves the problem of connectivity loss in $\mathcal{P}$ due to the uneven distribution of samples in $\mathcal{U}$. The prediction and update transforms are based on the generalized CDF 5/3 filterbanks and orthogonalization. We compare the coding gain of the proposed bipartition and re-connection approach (reconn-GMRF) with the MaxCut based lifting in [54] (MaxCut). The test sequences consist of 7 frames from 7 video sequences. The result is shown in Fig. 4.3. We also include results of MaxCut based bipartition with re-connection using Kron reduction (reconn-MaxCut). Note that even with a simple bipartition scheme such as the MaxCut based method, using re-connection leads to performance comparable to reconn-GMRF, making further simplification of the bipartition a direction for future work.
Figure 4.2: Boundary extension for pixels around (a) block boundaries and (b) edges
Figure 4.3: The comparison between the proposed lifting scheme, MaxCut based lifting, and the MaxCut based lifting with the proposed re-connection technique. [Plot: average PSNR (vertical axis, 29 to 41) versus bpp (horizontal axis, 0 to 2); curves: MaxCut, MaxCut + reconn, GMRF based.]
4.4 OverheadSignaling and Entropy Coding
In a practical image/video codec, the transformed coefficients are usually scanned and entropy
coded in a specific order, such as the zigzag scanning in the conventional DCT, where the
coefficients are scanned from low frequency components to high frequency components. If the
signal energy is well compacted into the low frequency subband, the high frequency subband
will contain many zeros after quantization, leading to highly efficient encoding since the
number of nonzero components for scanning is small. In the proposed lifting scheme, we
optimize the energy compaction in the update set, and repeat the same process from the
1st level to the highest level of decomposition. As a result, the lifting coefficients with large
magnitude will tend to be concentrated in the high levels, while the small coefficients will most
likely be contained in low levels. Therefore, we adopt the same coefficient reordering method
proposed in [53]. Defining $\mathbf{d}_\ell$ and $\mathbf{s}_\ell$ as the detail and smooth coefficients stored in $\mathcal{P}$ and $\mathcal{U}$ in
the $\ell$-th level, the coefficients will be ordered as $[\mathbf{s}_q, \mathbf{d}_q, \mathbf{d}_{q-1}, \ldots, \mathbf{d}_2, \mathbf{d}_1]$ before scanning. The
coefficients within each level will be ordered based on their reliability, defined as the average
of weights on links connected to the node. In general, a node with low reliability has higher
prediction error, and therefore should be scanned earlier than a node with high reliability. For
entropy coding, we apply an amplitude group partition technique called AGP [66]. AGP can
learn and adapt to different coefficient distributions, thus providing fair comparison between
different transforms. Before quantization, the coefficients of the CDF 5/3 lifting transform are
first normalized based on [74] so as to compensate for the lack of orthogonality. For the
signaling of graph geometries, we use the Arithmetic Edge Coding (AEC) proposed in [18].
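The coefficient reordering described above can be sketched as follows. This is a minimal illustration and not the implementation of [53]: the function names, the dictionary layout keyed by level, and the choice to leave the smooth coefficients in their input order are all assumptions made for the example.

```python
def node_reliability(link_weights):
    """Reliability of one node: average weight of the links connected to it."""
    return sum(link_weights) / len(link_weights) if link_weights else 0.0


def order_coefficients(smooth_top, details_by_level, reliability_by_level):
    """Return coefficients ordered as [s_q, d_q, d_{q-1}, ..., d_2, d_1].

    smooth_top: smooth coefficients of the highest level q (kept in input order,
        an assumption of this sketch).
    details_by_level: dict mapping level -> list of detail coefficients.
    reliability_by_level: dict mapping level -> list of reliabilities,
        aligned index by index with the detail coefficients of that level.
    """
    ordered = list(smooth_top)
    for level in sorted(details_by_level, reverse=True):  # q, q-1, ..., 1
        details = details_by_level[level]
        reliab = reliability_by_level[level]
        # Low reliability first: those nodes are expected to carry larger
        # prediction errors. sorted() is stable, so ties keep input order.
        idx = sorted(range(len(details)), key=lambda k: reliab[k])
        ordered.extend(details[k] for k in idx)
    return ordered
```

With two levels, the level-2 details are emitted before the level-1 details, and within each level the node with the lowest reliability comes first.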
Figure 4.4: Encoder for intra-predicted videos with mode selection between DCT and
the graph based lifting transform.
4.5 Experimental Results
In our experiments, we generate the intra-predicted residual frames for test sequences Foreman,
Mobile, Silent, and Deadline using HEVC (HM-14) with transform unit size fixed as $8 \times 8$. The
encoder system is shown in Fig. 4.4. In order to deal with the trade-off between transform
quality and signaling overhead, for the transform coding, the encoder will select between
the proposed graph based lifting scheme, which requires sending edge locations, and the
conventional DCT. The selection is based on Rate Distortion Optimization (RDO). We define
SSE to be the sum of squared error of the reconstructed signal, and R as the bitrate. For
graph based lifting, both the bits for coefficient encoding and the edge information overhead
are considered for the bitrate. The RD cost is computed as
$RD_{lifting} = SSE_{lifting} + \lambda (R^{coeff}_{lifting} + R^{edge}_{lifting})$,
where $R^{coeff}$ and $R^{edge}$ correspond to the bitrate needed for encoding transformed
coefficients and the graph geometry based on edges. DCT is chosen if there is no edge
component in the block or if its RD cost, computed as $RD_{DCT} = SSE_{DCT} + \lambda R^{coeff}_{DCT}$, is smaller
than $RD_{lifting}$. The parameter $\lambda$ is chosen as $0.85 \cdot 2^{(QP-12)/5}$, and the QP values used in this
experiment are from 24 to 36 with step size 2. The flags for transform selection are signaled
using arithmetic coding. In order to further reduce the overhead cost, in each $8 \times 8$ block only
one contour is allowed for AEC.
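The per-block mode decision above can be illustrated with a short sketch. The function names, the argument layout, and the `has_edge` flag are hypothetical; the multiplier follows the expression stated in the text, $0.85 \cdot 2^{(QP-12)/5}$, and the SSE and rate values are assumed to be measured elsewhere in the encoder.

```python
def rd_lambda(qp):
    """Lagrange multiplier as stated in the text: 0.85 * 2^((QP - 12) / 5)."""
    return 0.85 * 2.0 ** ((qp - 12) / 5.0)


def select_transform(sse_dct, rate_dct, sse_lift, rate_coeff_lift,
                     rate_edge_lift, qp, has_edge=True):
    """Pick "DCT" or "lifting" for one block and return (mode, RD cost).

    Rates are in bits. rate_edge_lift is the overhead for signaling the
    graph geometry, which only the lifting transform has to pay.
    """
    lam = rd_lambda(qp)
    rd_dct = sse_dct + lam * rate_dct
    if not has_edge:
        # No edge component in the block: DCT is used directly.
        return "DCT", rd_dct
    rd_lift = sse_lift + lam * (rate_coeff_lift + rate_edge_lift)
    # DCT is chosen only when its cost is strictly smaller.
    if rd_dct < rd_lift:
        return "DCT", rd_dct
    return "lifting", rd_lift
```

For example, `select_transform(100.0, 50.0, 80.0, 40.0, 20.0, qp=12)` compares a DCT cost of about 142.5 with a lifting cost of about 131 and therefore selects the lifting transform, even though the lifting transform spends more bits in total.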
We compare the proposed lifting scheme (reconn-GMRF) with 4 baseline approaches,
including the DCT, the GFT, MaxCut, and reconn-MaxCut. The DCT coefficients are zig-zag
scanned, and the GFT coefficients are scanned from the smallest eigenvalue to the largest
eigenvalue. The average PSNR gain and bitrate reduction are presented in Table 4.1. For
videos with simple edge structures such as Foreman and Deadline, graph based lifting has
around 0.3 dB gain in PSNR, while in videos with complicated edge structure such as
Mobile, the gain is limited since the edge map dominates the cost. We also show that
reconn-MaxCut provides a good approximation to the high complexity reconn-GMRF and GFT.

            GFT              reconn-GMRF      reconn-MaxCut    MaxCut
Sequence    PSNR    rate     PSNR    rate     PSNR    rate     PSNR    rate
            (dB)    (%)      (dB)    (%)      (dB)    (%)      (dB)    (%)
Foreman     0.34    -7.28    0.29    -6.42    0.26    -5.77    0.17    -3.63
Mobile      0.17    -1.46    0.10    -0.97    0.10    -0.96    0.08    -0.51
Silent      0.22    -4.28    0.20    -3.88    0.18    -3.58    0.09    -1.66
Deadline    0.37    -4.97    0.31    -3.98    0.30    -3.90    0.24    -3.19

Table 4.1: PSNR-bitrate comparison with Bjontegaard metric. The negative value for
rate indicates the average bitrate reduction against DCT, and the positive PSNR
shows the average PSNR gain.
4.6 Summary
In this chapter, we described the application of our proposed transform coding using graph
based lifting transform in video coding. Novel approaches in graph weights assignment
and multi-level bipartition in lifting are proposed. We also discussed details of the design of
the encoding system, e.g., transform selection and signaling overhead. The results outperform
the conventional DCT based encoding, and their performance approximates that of the high
complexity GFT based encoding.
Chapter 5
Application: Pre-demosaic Light Field
Image Compression
5.1 Introduction
In this chapter, we introduce a new application of graph based transforms to light field (LF)
image coding¹. LF imaging separately captures light rays arriving from different directions
at each pixel in the image sensor. With acquired LF data, multi-view rendering and re-
focusing become possible post-capture. However, due to the additional information of light
ray directions, acquired LF data are large in volume compared to conventional color images
of the same resolution, and hence efficient compression of LF data is important for storage
and transmission.
In the last decade, many hardware designs have been developed for LF acquisition, including
multiple camera arrays, aperture cameras, and lenselet-based plenoptic cameras. Among
them, the lenselet-based plenoptic camera is the most popular, and has been made commercially
available by companies such as Lytro [50] and Raytrix [62]. In a plenoptic camera, a
microlens array is placed ahead of an otherwise conventional photo sensor embedded with
Bayer color filter, as shown in Fig. 5.1. The resulting raw image is called lenselet image.
Since light fields are commonly represented and processed as 4D functions [45, 63, 72], the
acquired lenselet image typically undergoes demosaicking (pixelwise RGB interpolation) and
conversion to multiple sub-aperture images on a 2D array. Each sub-aperture image can be
seen as a typical 2D photo, gathering pixels from a specific light direction.

¹ A part of the content in this chapter was published in [8]

Figure 5.1: Conceptual system of lenselet-based plenoptic camera
There exist two types of redundancies within the 4D LF representation: i) spatial redun-
dancy among neighboring pixels in a sub-aperture image, i.e. intra-view correlation, and ii)
angular redundancy among sub-aperture images of nearby directions, i.e. inter-view corre-
lation. The existing works in literature for lenselet-based LF compression can be roughly
grouped into two categories that make use of classic image and video coding techniques.
The first one encodes the entire 2D array of sub-aperture images as an image using an image
coding standard, e.g. JPEG or Main Still Picture Profile in HEVC [15, 16, 65]. The intra
and inter-view correlations are exploited using the regular intra-prediction modes (Angular,
Planar and DC modes) and a newly introduced Self Similarity (SS) mode, which is similar to
the Intra Block Copy (IBC) in HEVC screen content extension. The other type is the pseudo
sequence based approach [3, 21, 25, 30, 48, 69], where each sub-aperture image is treated as
a video frame in a pseudo video sequence. The intra and inter-view correlations are exploited
using the intra and inter prediction in the video coding standard, e.g. H.264 and HEVC.
Exploiting inter-view correlation in compression leads to high computation complexity
(due to motion / disparity prediction) and creates dependencies among coded sub-aperture
images, which is undesirable for random access. In particular, in an archiving scenario, a user
may desire to quickly browse through viewpoint images, each of which can be synthesized
in acceptably high quality using only a small subset of sub-aperture images. Thus, speedy
extraction of this image subset from the LF data compressed in high quality is important.
Furthermore, we note that most standard digital cameras use a low complexity codec (JPEG)
operating by default at very high PSNR. Similarly, in this paper we will consider an intra-view
only approach (which leads to faster encoding and better random access), operating at high
rates/PSNR.
Another problem in the aforementioned existing works is that the compression schemes
are applied on the full color sub-aperture images, where large redundancies are introduced by
demosaicking. Moreover, to incorporate standard codecs, an RGB sub-aperture image must
be converted to 4:2:0 YUV format, which induces distortions due to integer rounding and
color sub-sampling.
In this paper, we propose a new coding scheme, where compression is applied on the original
lenselet images captured by the photo sensor, without the aforementioned pre-processing
that increases data volume or distorts captured pixel values. Our work is inspired by schemes
proposed in [12, 13, 43, 44] for regular images, which also postpone the demosaicking step to
the decoder. Specifically, we first map the raw captured pixels directly onto sparse locations
in a series of sub-aperture images. Unlike the input images for compression in [12, 13,
43, 44], where R, G, and B pixels are regularly distributed based on the Bayer pattern, the
color components after the mapping to sub-aperture images are irregularly placed, making them
difficult to encode using conventional schemes, e.g., JPEG or All-Intra mode in HEVC.
In our proposed scheme, novel intra-prediction and transform coding are designed for
non-demosaicked LF data, consisting of sparsely distributed pixels. Specifically, for intra-
prediction, we estimate the local characteristics of each block based on the structure tensor
according to the available pixels. The information will then be used to adjust the shape of
kernel functions used for prediction. For transform coding, the irregularly distributed pixels
in a sub-aperture image will be connected as a graph, with the pixel values interpreted as a
graph-signal. The graph weights, which reflect similarities between connected sample pairs,
are optimized based on Gaussian Markov Random Field (GMRF) modeling of signal. The
problem of learning the optimal graph structure, i.e., graph connection and weights, based on
statistical modeling for image and video signal has been well studied in the last decade [24,
34, 36, 60]. However, the methods can only be applied on data with regular pixel placement.
In this work, we consider the graph learning problem based on sparsely distributed pixels,
namely blocks with missing observations on some pixel positions. Due to the large amount
of data, the graph-signals are encoded using the low complexity graph-based lifting
transform described in the previous few chapters. In our experiments, we apply the proposed
LF coding scheme on a dataset captured with Lytro Illum. Compared to the state of the art
HEVC-based coding, the results show noticeable gains at high PSNRs, which is useful for LF
rendering in an archival scenario.
The rest of the chapter is organized as follows. In Section 5.2, we present the notation used
throughout this chapter and a review of conventional approaches in lenselet-based LF image
compression. Our proposed coding scheme is described in Section 5.3. In Sections 5.4 and 5.5,
we describe the proposed intra-prediction and transform coding based on sparsely distributed
pixels. Experiments and conclusions are presented in Sections 5.6 and 5.7 respectively.
Figure 5.2: Conventional encoder for light field image. The demosaicking and calibration
processes are applied before compression.
5.2 Notations and Background
5.2.1 Notations
In this chapter, the elements of a 2-dimensional vector (e.g., $v = [v_1, v_2]^T$) in the Euclidean
space will represent the directions along row-axis and column-axis. The notation is commonly
used for image coordinate systems. Compared to the definition in Section 2.1, we consider
here a more general graph structure $G_G = (\mathcal{V}_G, \mathcal{E}_G)$, where for every node $v_i$ in $\mathcal{V}_G$, there is
an associated non-negative self loop weight $h_i$. The corresponding graph Laplacian matrix,
called generalized graph Laplacian, is defined as $L_G = D - A + H$, where $H$ is a diagonal
self loop matrix with element $H_{i,i} = h_i$. $D$ and $A$ are the degree matrix and adjacency matrix
defined in Section 2.1. Note that the commonly used combinatorial Laplacian $L$ is a special
case of $L_G$ with zero weight for all self loops.
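As a small illustration of the definition $L_G = D - A + H$, the sketch below assembles the generalized graph Laplacian from a list of weighted undirected links and a list of self loop weights. The function name and the dense list-of-lists representation are choices made for this example only.

```python
def generalized_laplacian(num_nodes, edges, self_loops):
    """Build L_G = D - A + H as a dense list-of-lists matrix.

    edges: iterable of (i, j, weight) for undirected links with i != j.
    self_loops: list with one non-negative self loop weight h_i per node.
    """
    L = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i, j, w in edges:
        L[i][j] -= w  # off-diagonal entries come from -A
        L[j][i] -= w
        L[i][i] += w  # diagonal entries accumulate the degree D
        L[j][j] += w
    for i, h in enumerate(self_loops):
        if h < 0:
            raise ValueError("self loop weights must be non-negative")
        L[i][i] += h  # H adds the self loop weight on the diagonal
    return L
```

Setting every self loop weight to zero gives the combinatorial Laplacian, whose rows sum to zero; a positive $h_i$ makes the corresponding row sum positive.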
5.2.2 Background: Compression after De-mosaicking
Fig. 5.2 depicts the conventional coding scheme for light field image compression, where
the captured Bayer patterned lenselet image is first converted into an array of full color
sub-aperture images, followed by compression using standard image/video codecs. In the
figure, the conversion process is based on the method proposed by Dansereau et al. [17, 47].
Through the Bayer filter embedded on the photo sensor, each pixel on the captured lenselet
image contains only one color component out of R, G, and B. In order to generate full color
images, the missing color components at each pixel have to be interpolated using the nearby
pixels where the target colors are available. The process is called demosaicking. In this work,
we apply the demosaicking approach proposed by Malvar et al. [31] to the lenselet image.
The number of pixels to be encoded will be increased threefold through the process regardless
of the demosaicking algorithms used.
Figure 5.3: Proposed LF encoder, where compression is applied on the raw lenselet
data without demosaicking
Projected from the microlens array in the plenoptic camera, a lenselet image consists
of multiple hexagonally arranged pixel patches, which are called macro-pixels (represented
with a dashed line in Fig. 5.2); each macro-pixel collects light for one image pixel arriving
from different directions. However, due to manufacturing defects, the arrangement of macro-pixels
is usually not aligned with the image coordinates, making it difficult to infer a pixel’s
corresponding position in the scene and the arriving light angle. In [17], the center of
each macro-pixel is located through the help of white images, which are lenselet images
taken through a white diffuser, or lenselet images of a white scene. The center locations
are estimated by finding the local maximum of brightness on the white images. Then, the
color lenselet image needs to be calibrated via rotation, translation and scaling, so that each
macro-pixel center (denoted with red point in the figure) falls onto an integer pixel location
and the arrangement of macro-pixels is aligned to the regular hexagonal grid. Through the
calibration, the amount of data will also be increased due to the interpolation involved in
scaling.
Each pixel in the calibrated image is indexed by its spatial and angular coordinates. The
spatial coordinate is given by the position of the associated macro-pixel and the angular
coordinate is the relative location within each macro-pixel. We then collect pixels of the same
angular coordinate into one sub-aperture image, where the pixels are arranged according to
their spatial coordinates. Each sub-aperture image can be viewed as a typical 2D picture,
where large spatial correlation exists between neighbouring pixels.
Figure 5.4: Sparsely distributed G components on one sub-aperture image (Figure
Friends1 from EPFL light field dataset)
5.3 Proposed scheme: Compression before De-mosaicking
In the pre-processing stage of the conventional coding scheme, the volume of LF data is
increased greatly during demosaicking and the scaling operation needed for calibration.
In order to avoid these redundancies, we propose a new coding scheme for LF in which
compression is performed on the data collected in the original lenselet image instead of
the pre-processed pixels in the full color sub-aperture images. The flow chart is shown in
Fig. 5.3. Without demosaicking, we map raw pixels onto the calibrated lenselet image
according to the transformation matrix applied in [17]. Pixels that fall onto non-integer
locations after transformation will be rounded to the nearest integer positions. Then, based
on the relative locations within the macro-pixels on the calibrated image, pixels are arranged
onto multiple sub-aperture images, where redundancies between spatial neighbours can be
exploited. Due to the placement of macro-pixels, the pixels in the sub-aperture images will
be placed hexagonally, which is different from the conventional pipeline, where sub-aperture
images are re-sampled into rectangular placement. Note that the pixels around boundaries
of each macro-pixel, which have large noise due to underexposure, are discarded in the
pipeline described in [17]. However, in the rearrangement process in our proposed scheme,
the boundary pixels will be kept. The mapping will not change the number of pixels nor the
intensity values of R, G, and B components from the raw data.
Since no interpolation is applied, some pixel locations are empty in the sub-aperture
images, as shown in Fig. 5.4. Depending on the camera manufacturing, i.e., the type of macro-pixel
misalignment and calibration algorithm adopted, the spatial and angular coordinates for
each pixel on the captured lenselet image may be different. Therefore, the pattern of pixel
distribution in sub-aperture images is not fixed and is also highly irregular. This is in contrast
with the input signal considered in the pre-demosaic image coding schemes proposed in [12,
13, 43, 44], where R, G, and B pixels are distributed regularly based on Bayer pattern. Due to
such irregularity of spatial distribution for LF data, existing coding techniques, e.g., discrete
cosine transform (DCT) and discrete wavelet transform (DWT), cannot be easily applied. This
motivates the use of graphs, which can represent both regular and irregular data points as long
as the pair-wise relations can be defined properly. In the next section, we will first describe
the proposed intra-prediction that can predict irregularly spaced pixels based on pixels in the
decoded neighbouring blocks. Then, in Section 5.5, the graph construction for graph-based
coding of sparsely distributed pixels will be discussed.
Figure 5.5: Decoder in the proposed LF coding scheme, where demosaicking and
calibration are applied to generate full color sub-aperture image array
At the decoder side, pixels in sub-aperture images are de-compressed and inverse-mapped
back to their original positions in the 2D lenselet image, as shown in Fig. 5.5. The image
will then be demosaicked and calibrated [17, 47] in order to generate full color 4D LF for
further processing, e.g., multi-view rendering and re-focusing. Note that our scheme does not
rely on a particular selection of demosaicking and calibration algorithms. In fact, multiple
works have been proposed over the last few years in LF image calibration and demosaicking.
For example, in [79] and [82], different methods are applied to locate macrolens centers by
examining the dark pixels and line features from the white image. Therefore, the spatial and
angular coordinates estimated are different from the ones in [17], which leads to different
types of sparse distribution within sub-aperture images. In [55, 79], a different strategy of
demosaicking is considered which performs color interpolation on each of the sub-aperture
images instead of the whole lenselet image. Our coding scheme can be easily adapted by
using different transform matrices for pixel rearrangement in the encoder side according to
the calibration algorithm applied. The demosaicking and calibration strategies at the decoder
side can also be adjusted accordingly.
Figure 5.6: Proposed intra-prediction system for sparsely distributed pixels
5.4 Proposed Intra-prediction Scheme
In order to facilitate random access for rendering, each sub-aperture image is encoded
independently. A sub-aperture image will be divided into non-overlapping $m \times m$ blocks, as
transform units. A coding scheme with variable block sizes, e.g., the quad-tree partition in
HEVC, is left as possible future work.
5.4.1 Gradient Estimation
In the latest video standards, e.g., H.264 and HEVC, an intra-prediction mode is selected
through exhaustively searching all the possible directions to find the one with the minimum rate
distortion (RD) cost. The process increases the implementation complexity significantly as the
number of directions allowed increases. In order to reduce the complexity, many methods have
been proposed in literature [4, 26, 37, 40], which exploit the local characteristics, e.g. edge
direction, to reduce the number of candidate modes searched. In these works, the information
of local gradients is used to determine the dominant edge direction within each block. The
intra-modes closely aligned with the estimated edge direction are of higher priority in mode
selection. However, all these methods are restricted to regular videos with a complete set of
pixels within each block. In this section, we propose a new intra-prediction algorithm based
on the structure tensor for intra-predicting the sparsely distributed pixels before demosaicking.
In Fig. 5.6, we show the flow chart of our proposed intra-prediction system. Pixels in
the decoded neighbouring blocks are used as references for each block in the sub-aperture
images. Based on the structure tensor, which provides information about local gradient in
the reference blocks, we estimate the structure tensor of the current block to be encoded.
Therefore, overhead, such as indices for intra-prediction mode, does not need to be signaled.
With structure tensor, edge direction and strength within a block can be estimated. This
information allows us to define the rotations and stretches of the adaptive kernels used for
intra-prediction. Besides the estimation of structure tensor, where we utilize the correlation
between different color channels, all other steps are applied on R, G, and B components
separately.
The structure tensor of a block $B$ is calculated as
\[
H_B = \sum_{i \in B} \nabla f(i)\, \nabla f(i)^T
= \sum_{i \in B}
\begin{bmatrix}
d_r(i)^2 & d_r(i)\, d_c(i) \\
d_c(i)\, d_r(i) & d_c(i)^2
\end{bmatrix}, \tag{5.1}
\]
where $d_r(i)$ and $d_c(i)$ denote the gradient of pixel intensities along row and column axis on
pixel $i$.
In our work, the gradient $\nabla f(i)$ for each pixel $i$ is estimated with linear regression. Using
Taylor's theorem, the pixel value at a given point $x$ can be expressed as
\[
f(x) = f(a) + \nabla f(a) \cdot (x - a) + O(\|x - a\|^2), \tag{5.2}
\]
where $\nabla f(a)$ is the gradient vector at point $a$. For $a$ sufficiently close to $x$, the pixel value
$f(x)$ can be well approximated with the first two terms
\[
f(x) \approx f(a) + \nabla f(a) \cdot (x - a). \tag{5.3}
\]
Based on the linear approximation, we estimate the gradient $\nabla f(a)$ at point $a$ by fitting a
hyperplane that best satisfies (5.3) for a number of pixels $\{x_1, x_2, \ldots, x_k\}$ close to $a$. The fitting
can be represented as an overdetermined system:
\begin{align}
F &= X \nabla f(a) \tag{5.4} \\
\begin{bmatrix}
f(x_1) - f(a) \\
f(x_2) - f(a) \\
\vdots \\
f(x_k) - f(a)
\end{bmatrix}
&=
\begin{bmatrix}
(x_1 - a)^T \\
(x_2 - a)^T \\
\vdots \\
(x_k - a)^T
\end{bmatrix}
\nabla f(a). \tag{5.5}
\end{align}
The optimal gradient $\nabla f(a)^{*}$ can be derived by solving the least squares problem
\[
\nabla f(a)^{*} = \arg\min_{\nabla f(a)} \| F - X \nabla f(a) \|_2^2, \tag{5.6}
\]
which has a closed form solution
\[
\nabla f(a)^{*} = (X^T X)^{\dagger} X^T F, \tag{5.7}
\]
where $\dagger$ denotes the pseudo-inverse.
For each pixel, we use 4 nearby pixels ($k = 4$) for hyperplane fitting in (5.5). The neighbouring
pixels are selected as the 2 closest neighbours in terms of Euclidean distance in horizontal and
vertical orientations, respectively. This choice is made in order to avoid the ill-conditioning
that may result if the pixels picked were all aligned on the same line.
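The least squares gradient estimate can be sketched in a few lines. The example below is a minimal illustration written with the standard library only: it accumulates the 2 x 2 matrix $X^T X$ and the vector $X^T F$, then applies the pseudo-inverse in closed form. The function name, the relative threshold used to detect a rank-deficient system, and the way neighbours are passed in are assumptions of this sketch.

```python
def estimate_gradient(a, f_a, neighbours):
    """Least squares gradient (d_r, d_c) at point a = (row, col).

    f_a: pixel intensity at a.
    neighbours: list of ((row, col), intensity) pairs close to a.
    Returns pinv(X^T X) X^T F for the linear model f(x) ~ f(a) + g . (x - a).
    """
    # Accumulate M = X^T X (symmetric 2x2) and b = X^T F.
    m00 = m01 = m11 = b0 = b1 = 0.0
    for (r, c), f_x in neighbours:
        dr, dc, df = r - a[0], c - a[1], f_x - f_a
        m00 += dr * dr
        m01 += dr * dc
        m11 += dc * dc
        b0 += dr * df
        b1 += dc * df
    trace = m00 + m11
    if trace == 0.0:
        return (0.0, 0.0)  # no displaced neighbours: nothing to fit
    det = m00 * m11 - m01 * m01
    if det > 1e-12 * trace * trace:
        # Full rank: the pseudo-inverse is the ordinary 2x2 inverse.
        return ((m11 * b0 - m01 * b1) / det, (m00 * b1 - m01 * b0) / det)
    # Rank one (all neighbours on one line through a): M = trace * u u^T,
    # so pinv(M) = M / trace^2, which gives the minimum-norm solution.
    t2 = trace * trace
    return ((m00 * b0 + m01 * b1) / t2, (m01 * b0 + m11 * b1) / t2)
```

When the four neighbours span both orientations, a signal that is exactly planar is recovered exactly. When all neighbours lie on one line, only the gradient component along that line can be recovered, which is the ill-conditioning that the neighbour selection rule is meant to avoid.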
After computing the structure tensor in (5.1) using the estimated gradients, we apply
eigen-decomposition on the $2 \times 2$ matrix $H_B$:
\begin{align}
H_B &=
\begin{bmatrix} e_1 & e_2 \end{bmatrix}
\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}
\begin{bmatrix} e_1^T \\ e_2^T \end{bmatrix} \tag{5.8} \\
&= \lambda_1 e_1 e_1^T + \lambda_2 e_2 e_2^T. \tag{5.9}
\end{align}
The resulting eigenvectors $e_1$ and $e_2$ along with their corresponding eigenvalues $\lambda_1$ and $\lambda_2$,
where $\lambda_1 < \lambda_2$, summarize the gradient distribution within block $B$. The eigenvector $e_2$
represents the direction maximally aligned with the gradient, while the orthogonal direction
$e_1$ roughly represents the edge direction.
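To make the two steps concrete, the sketch below sums the outer products of per-pixel gradients into a structure tensor and then computes the eigen-decomposition of the resulting symmetric 2 x 2 matrix in closed form. The function names and the tuple representation `(h_rr, h_rc, h_cc)` are choices made for this example; the eigenvalues are returned in increasing order, so the second eigenvector is the dominant gradient direction and the first approximates the edge direction.

```python
import math


def structure_tensor(gradients):
    """Sum of outer products g g^T over a block.

    gradients: list of (d_r, d_c) pairs, one per available pixel.
    Returns the symmetric 2x2 tensor as (h_rr, h_rc, h_cc).
    """
    h_rr = sum(dr * dr for dr, _ in gradients)
    h_rc = sum(dr * dc for dr, dc in gradients)
    h_cc = sum(dc * dc for _, dc in gradients)
    return (h_rr, h_rc, h_cc)


def eig_sym_2x2(h_rr, h_rc, h_cc):
    """Eigen-decomposition of [[h_rr, h_rc], [h_rc, h_cc]].

    Returns (lam1, e1, lam2, e2) with lam1 <= lam2 and unit eigenvectors
    given as (row component, column component).
    """
    mean = 0.5 * (h_rr + h_cc)
    radius = math.hypot(0.5 * (h_rr - h_cc), h_rc)
    lam1, lam2 = mean - radius, mean + radius
    # Orientation of the principal axis (eigenvector of the larger eigenvalue).
    phi = 0.5 * math.atan2(2.0 * h_rc, h_rr - h_cc)
    e2 = (math.cos(phi), math.sin(phi))
    e1 = (-math.sin(phi), math.cos(phi))
    return lam1, e1, lam2, e2
```

For a block whose gradients all point along the column axis (a vertical edge), the larger eigenvalue is attached to the column direction and the edge direction $e_1$ runs along the rows.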
5.4.2 Structure Tensor Estimation
Leveraging the local redundancy among neighbouring pixels in the sub-aperture image, we
can estimate the local gradient, and therefore the structure tensor, of each input block $I$ using
the information from its decoded neighbouring blocks $\{B_1, B_2, \ldots\}$. In our algorithm, the
estimate of the structure tensor $H_I$ of block $I$ is calculated as the weighted average of the
structure tensors from its neighbouring blocks:
\[
H_I = \frac{1}{c} \sum_i w_i \left( \frac{1}{n_i} H_{B_i} \right), \tag{5.10}
\]
where $n_i$ denotes the number of available pixels in reference block $B_i$, $w_i$ is the weight
associated to block $B_i$, and $c$ is the normalization constant $c = \sum_i w_i$.
Figure 5.7: 4 decoded reference blocks and the vectors indicating their relative
locations to the input block $I$ and the estimated edge direction

Figure 5.8: Illustration of kernel size and shape in the smooth block (left) and the
block with strong edge (right)

The weight value $w_i$ is a function of the edge orientation in $B_i$ and the reference block's
relative location from the input block $I$. This design is based on the observation that edges
in natural images are continuous contours, and can be approximated locally with straight
lines. In other words, the orientation of edges is mostly consistent locally. Therefore, if the
edge direction calculated on one reference block $B_i$ is consistent with its relative location
from $I$, e.g., if the reference block in the top-left corner has edge orientation from top-left
to bottom-right, the input block to be coded is more likely to have the same edge direction.
In estimating the gradient in the input block, blocks with consistent edge orientation will be
assigned larger weights. In our algorithm, we consider up to 4 decoded neighbouring blocks,
i.e., blocks on the top-left corner, top, top-right corner, and left, as the references for structure
tensor estimation and the following intra-prediction. The unit vectors $v_i$ from $I$ to each of the
$B_i$ reference blocks are $[-\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}]^T$, $[-1, 0]^T$, $[-\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}]^T$, $[0, -1]^T$, respectively. Denote $e_i$ to be the
edge direction calculated with the structure tensor in block $B_i$, and $\theta_i$ to be the angle between
$v_i$ and $e_i$, calculated as
\[
\theta_i = \arccos\left( \frac{v_i \cdot e_i}{\|v_i\|_2} \right) \in [0, \pi], \tag{5.11}
\]
depicted in Fig. 5.7. The weight $w_i$ in (5.10) is defined as
\[
w_i =
\begin{cases}
\exp(-\alpha\, \theta_i) & \text{if } \theta_i < \frac{\pi}{2} \\
\exp(-\alpha\, (\pi - \theta_i)) & \text{if } \theta_i > \frac{\pi}{2}
\end{cases}, \tag{5.12}
\]
where $\alpha$ is an adjustable parameter. In (5.12), larger weights are assigned to the blocks
with smaller $\theta_i$, i.e., the edge directions $e_i$ and $v_i$ are nearly consistent. The structure tensor
estimation described in (5.10) is done only on the G channel. For R and B channels, the
structure tensors are calculated based on the reconstructed G channel pixels in the same block,
since there exists high correlation between gradients in R, G, and B channels.
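The weighted averaging of reference tensors can be sketched as below. This is an illustration under stated assumptions rather than the exact rule of the scheme: the weight is assumed to decay exponentially with the angle between the edge direction and the direction to the reference block, with the angle folded into $[0, \pi/2]$ because an edge direction has no sign, and the decay parameter `alpha`, the function names, and the tuple layout of the references are all hypothetical.

```python
import math


def angle_between(v, e):
    """Angle in [0, pi] between v and a unit-length edge direction e."""
    cos_t = (v[0] * e[0] + v[1] * e[1]) / math.hypot(v[0], v[1])
    return math.acos(max(-1.0, min(1.0, cos_t)))  # clamp against rounding


def block_weight(theta, alpha=1.0):
    """Assumed weight: exponential decay in the folded angle min(theta, pi - theta)."""
    return math.exp(-alpha * min(theta, math.pi - theta))


def estimate_tensor(refs, alpha=1.0):
    """Weighted average of per-pixel-normalized reference tensors.

    refs: non-empty list of (H, n, v, e) where H = (h_rr, h_rc, h_cc) is the
        summed structure tensor of a reference block, n its number of
        available pixels, v the vector from the input block to the reference
        block, and e the unit edge direction estimated in that block.
    """
    total = [0.0, 0.0, 0.0]
    c = 0.0
    for H, n, v, e in refs:
        w = block_weight(angle_between(v, e), alpha)
        c += w
        for k in range(3):
            total[k] += w * H[k] / n
    return tuple(t / c for t in total)
```

A reference block whose edge direction points along the line joining it to the input block receives weight 1, while a block whose edge runs perpendicular to that line receives the smallest weight, $\exp(-\alpha \pi / 2)$ in this sketch.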
5.4.3 Data-adaptive Kernel Regression
For intra-prediction, we apply the same adaptive kernel regression on each pixel in a given
block, similar to what was done in [35]. In the case of zero-order estimation, the prediction of
the pixel intensity at $x$ can be calculated by taking the weighted average of its neighbouring
pixels $x_i$, written as
\[
\tilde{f}(x) = \frac{1}{z_x} \sum_{x_i \in \mathcal{R}_x} K(x_i - x)\, f(x_i), \tag{5.13}
\]
where $\mathcal{R}_x$ denotes the set of neighbouring pixels of $x$ in the decoded reference blocks, and $z_x$
is the normalization constant $z_x = \sum_{x_i \in \mathcal{R}_x} K(x_i - x)$. A common choice of the kernel function
$K(\cdot)$ is the Gaussian kernel
\[
K(x_i - x) = \frac{1}{2\pi\sigma^2} \exp\left( -\frac{(x_i - x)^T (x_i - x)}{2\sigma^2} \right), \tag{5.14}
\]
which gives higher weights to nearby pixels than to pixels that are far away. However, such
kernel selection assumes isotropy in image characteristics, ignoring local features such as
edges. For data-adaptive kernel regression used in this work, the Gaussian kernel is adapted
based on the edge orientation and strength derived from structure tensor as
\[
K(x_i - x) = \frac{\sqrt{\det(C)}}{2\pi\sigma^2} \exp\left( -\frac{(x_i - x)^T C\, (x_i - x)}{2\sigma^2} \right), \tag{5.15}
\]
where $C$ is the local gradient covariance based on the eigenvectors and eigenvalues of structure
tensor defined in (5.9), and can be decomposed as
\begin{align}
C &= \gamma
\begin{bmatrix} e_1 & e_2 \end{bmatrix}
\Lambda
\begin{bmatrix} e_1^T \\ e_2^T \end{bmatrix} \tag{5.16} \\
\Lambda &=
\begin{bmatrix} \frac{1}{\rho} & 0 \\ 0 & \rho \end{bmatrix},
\quad \rho = \frac{\lambda_2 + p_2}{\lambda_1 + p_2} \tag{5.17} \\
\gamma &= \frac{1}{n} \sqrt{\lambda_1 \lambda_2 + p_1}, \tag{5.18}
\end{align}
where $[e_1\ e_2]$ rotates the coordinates of Gaussian kernel along the edge direction and
dominant gradient. $\rho$ is the ratio of two eigenvalues, representing the relative strength of
gradient in $e_2$ from the perpendicular direction $e_1$. The kernel will be elongated for blocks
of strong edges, where $\lambda_2 \gg \lambda_1$, and will be near-circular for smooth blocks ($\lambda_2 \approx \lambda_1 \approx 0$),
as illustrated in Fig. 5.8. $\gamma$ determines the scaling of the kernel size, where $n$ is the number
of available pixels in the block. $p_1$ and $p_2$ are two positive scalars used to ensure numerical
stability.
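The zero-order adaptive kernel regression can be sketched as follows. This is a minimal illustration and not the exact predictor of the scheme: the symbol names `rho` (elongation), `gamma` (scaling) and `sigma` (kernel width), the placement of `p1` inside the square root, and the default values of `p1`, `p2` and `sigma` are all assumptions. The steering matrix is built so that the kernel is stretched along the edge direction `e1` and compressed along the gradient direction `e2`. The constant factor in front of the exponential is omitted because it cancels in the normalized average for a single pixel.

```python
import math


def kernel_matrix(lam1, e1, lam2, e2, n, p1=1.0, p2=0.01):
    """Steering matrix C = gamma * (e1 e1^T / rho + rho * e2 e2^T).

    lam1 <= lam2 are structure tensor eigenvalues with unit eigenvectors
    e1 (edge direction) and e2 (gradient direction); n is the number of
    available pixels in the block. Returns (c_rr, c_rc, c_cc).
    """
    rho = (lam2 + p2) / (lam1 + p2)        # elongation, >= 1
    gamma = math.sqrt(lam1 * lam2 + p1) / n  # overall scaling
    c_rr = gamma * (e1[0] * e1[0] / rho + rho * e2[0] * e2[0])
    c_rc = gamma * (e1[0] * e1[1] / rho + rho * e2[0] * e2[1])
    c_cc = gamma * (e1[1] * e1[1] / rho + rho * e2[1] * e2[1])
    return (c_rr, c_rc, c_cc)


def predict(x, refs, C, sigma=1.0):
    """Kernel-weighted average of reference pixels around x = (row, col).

    refs: non-empty list of ((row, col), value) taken from decoded blocks.
    """
    c_rr, c_rc, c_cc = C
    num = den = 0.0
    for (r, c), val in refs:
        dr, dc = r - x[0], c - x[1]
        quad = c_rr * dr * dr + 2.0 * c_rc * dr * dc + c_cc * dc * dc
        k = math.exp(-quad / (2.0 * sigma * sigma))
        num += k * val
        den += k
    return num / den
```

In a smooth block both eigenvalues are close to zero, so `rho` is 1 and two equally distant references are averaged equally. In a block with a strong vertical edge, the reference located along the edge dominates the prediction and the reference across the edge is effectively ignored.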
5.5 Proposed Transform Coding
As described in Section 5.3, the pixels rearranged onto the sub-aperture images before de-
mosaicking are distributed sparsely, which motivates the usage of graph-based transform in
exploring data redundancies. In this section, we discuss the graph construction, on which
the graph based transform will be applied. Each node on the graph represents one pixel and
each link connects distinct pixels within the same color component. Note that the graphs for
different blocks are constructed independently. For graph construction, we will discuss two
algorithms: 1) a simple heuristic based on geometric distance between pixels and 2) graph
learning from training data based on GMRF modeling.
Figure 5.9: A part of the graph constructed for irregularly placed R components. In (a),
the graph using the 4 nearest neighbor method is shown. In (b), each pixel is connected to 2
neighbors in horizontal and vertical orientations respectively
5.5.1 Graph Construction Based on Geometric Distance
In each sub-aperture image, similar to natural images, large local redundancies exist among
pixels that are close in distance. Hence, the most straightforward approach in exploiting
the pair-wise correlation is to connect each pixel with its k nearest neighbors in terms of
Euclidean distance. For complexity reduction in the graph-based lifting transform, where the
computation for each node depends on its connected neighbours, we consider mainly sparse
graphs, i.e., small $k$. However, the graph connection based on $k$-nearest neighbour with small
$k$ can be highly sensitive to the pixel arrangement. For example, in the cropped sub-aperture
image shown in Fig. 5.9, R components are mostly aligned horizontally. The resulting graph,
based on $k$-nearest neighbor ($k = 4$), thus consists of mostly horizontal links as shown in
Fig. 5.9(a), and is unable to capture local similarity in regions with vertical features, e.g.,
vertical edges.
In order to exploit similarity along different orientations, yet still keep connection sparse,
we instead connect each pixel to an equal number of neighbours in horizontal and vertical
directions, as shown in Fig. 5.9(b). The weight $w_{i,j}$ on the link connecting node $v_i$ and $v_j$ is
defined as
\[
w_{i,j} = \exp\left( -\frac{\mathrm{dist}(i, j)^2}{\sigma^2} \right), \tag{5.19}
\]
with the assumption that pixels that are closer in distance are more likely to be similar in pixel
intensities. The function $\mathrm{dist}(i, j)$ measures the Euclidean distance between node $v_i$ and $v_j$.
This approach has been adopted in our previous publication in [8].
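A possible realization of this heuristic is sketched below. Two points are assumptions of the example and are not specified in the text: a neighbour is treated as "horizontal" when its column offset is at least as large as its row offset (and "vertical" otherwise), and the link weight is taken as a Gaussian function of the Euclidean distance with a hypothetical width parameter `sigma`. The brute-force neighbour search is only meant for small blocks.

```python
import math


def build_graph(pixels, per_orientation=2, sigma=1.0):
    """Connect each pixel to its closest horizontal and vertical neighbours.

    pixels: list of (row, col) positions of one color component in a block.
    Returns a dict {(i, j): weight} with i < j for every undirected link.
    """
    edges = {}
    for i, (ri, ci) in enumerate(pixels):
        horiz, vert = [], []
        for j, (rj, cj) in enumerate(pixels):
            if j == i:
                continue
            dr, dc = rj - ri, cj - ci
            d = math.hypot(dr, dc)
            # Assumed rule: the dominant offset decides the orientation.
            (horiz if abs(dc) >= abs(dr) else vert).append((d, j))
        for group in (horiz, vert):
            for d, j in sorted(group)[:per_orientation]:
                key = (min(i, j), max(i, j))
                edges[key] = math.exp(-(d * d) / (sigma * sigma))
    return edges
```

On a regular 3 x 3 grid, the centre pixel is linked to its left, right, upper and lower neighbours, each with weight $\exp(-1)$ for `sigma = 1`. On horizontally dense data the vertical group still contributes links, which is the point of splitting the neighbours by orientation.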
5.5.2 Graph Learning Based on Statistics Modeling
Inadditiontothesimpleheuristicusinggeometricdistance,wealsoconsideroptimizedgraph
construction based on the statistics of residual blocks in LF images. Since the information
about the graph, i.e., connection, link weights, and self loops, can be uniquely and fully
representedusingagraphLaplacianmatrix L
G
,theproblemoffindingtheoptimalgraphcan
beseen asbeingequivalenttotheoptimizationof L
G
.
In the literature, many works have studied the problem of finding the optimal graph
Laplacian matrix based on observations in different applications. In [24, 33, 34, 60, 78], a
statistical assumption is made on the data, where each observation $f \in \mathbb{R}^N$ is modeled as
a realization of a GMRF, i.e., $f \sim \mathcal{N}(\mu, \Sigma = Q^{-1})$, and the problem of learning the graph
Laplacian matrix $L_G$ for this type of data modeling boils down to the maximum likelihood
estimation of the precision matrix $Q$, which defines the partial correlation between pairs of
variables in $f$. In [33, 34], Egilmez et al. proposed an algorithm specifically for finding a
precision matrix $Q$ with graph Laplacian structure, i.e.,
$$ \begin{cases} Q_{i,j} < 0 & \text{if } A_{i,j} > 0 \\ Q_{i,j} = 0 & \text{if } A_{i,j} = 0 \end{cases} , \tag{5.20} $$
with additional constraints on the graph connectivity. For a graph of $N$ nodes and $M$ links,
given $p$ i.i.d. observations $\{f_1, f_2, \ldots, f_p\}$ of a zero-mean GMRF, the precision matrix can be found
by solving the maximum likelihood (ML) problem:
$$ \begin{aligned} &\arg\max_{Q} \prod_{i=1:p} p(f_i \mid Q, \mu_i = 0) \\ ={} &\arg\min_{w,h} \; -\log\det(Q) + \mathrm{Tr}(QS) \\ &\text{subject to } Q = B\,\mathrm{diag}(w)\,B^T + \mathrm{diag}(h), \end{aligned} \tag{5.21} $$
where $S$ is the $N \times N$ sample covariance matrix, and $B$ is the $N \times M$ incidence matrix,
specifying the connectivity between node pairs. The $M \times 1$ vector $w$ contains the weights
associated with each link, and the $N \times 1$ vector $h$ contains the self loops associated with each node.
The aforementioned Laplacian learning algorithms are all based on the assumption that
each N dimensional observation f
i
contains a complete set of variables, i.e. a pixel value is
available at each index in f
i
. In a block of a non-demosaicked sub-aperture image, however,
pixels are distributed sparsely and each index in f
i
contains at most one color component out
of R, G, and B. Therefore some modifications are required for using the existing learning
algorithm. Without loss of generality, we can write the observed block as a column vector
with its elements ordered as
$$ f_i = \begin{bmatrix} f_{i\mathcal{O}} \\ f_{i\mathcal{M}} \end{bmatrix}, \tag{5.22} $$
where $f_{i\mathcal{O}}$ is an $r$-dimensional vector containing the observed pixel intensities, and $f_{i\mathcal{M}}$, of
dimension $N - r$, contains the missing pixels. Both $r$ and the indices specified by $\mathcal{O}$ and $\mathcal{M}$ are
block dependent. In order to optimize the graph for $f_{i\mathcal{O}}$, we assume $f_{i\mathcal{O}}$
to be a sub-sampled version of $f_i$, which is modeled as a GMRF. In the statistics literature,
many methods have been proposed for estimating the inverse covariance matrix in a GMRF model
based on observations with missing variables [41, 49, 71]. In the work by Kolar and Xing
[41], a simple plug-in algorithm is proposed. The method consists of two steps. First, the
sample covariance matrix is estimated using the incomplete observations. Define $r_i$ to be the
indicator vector of $f_i$, with elements
$$ r_{ia} = \begin{cases} 1 & \text{if } f_i^a \text{ is available} \\ 0 & \text{otherwise} \end{cases} , \tag{5.23} $$
where $f_i^a$ denotes the $a$-th element of $f_i$. The estimated sample covariance matrix $\tilde{S}$ is then calculated as
$$ \tilde{S}_{a,b} = \frac{\sum_{i=1:p} r_{ia} r_{ib} \,(f_i^a - \mu_i^a)(f_i^b - \mu_i^b)}{\sum_{i=1:p} r_{ia} r_{ib}} . \tag{5.24} $$
Readers are referred to [41] for more details on the theoretical justification of the estimation.
In the second step, $\tilde{S}$ is plugged into the objective function of the maximum likelihood
estimation of the precision matrix $Q$. In this work, we optimize $Q$ based on (5.21) with
$S$ replaced by $\tilde{S}$.
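The first step of the plug-in method, the pairwise-complete covariance estimate of (5.24), can be sketched as follows. The data layout (lists with `None` marking a missing pixel), the function name, and the zero-mean default are assumptions made for this illustration only.

```python
def plugin_covariance(observations, mean=None):
    """Pairwise-complete sample covariance in the spirit of (5.24).

    observations: list of p vectors of length N; a missing entry is None.
    mean: optional length-N list of means (zero mean is assumed if omitted).
    An entry (a, b) that is never observed jointly is returned as None.
    """
    n = len(observations[0])
    mu = mean if mean is not None else [0.0] * n
    S = [[None] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            num, count = 0.0, 0
            for f in observations:
                # r_ia * r_ib == 1 only when both entries are available
                if f[a] is not None and f[b] is not None:
                    num += (f[a] - mu[a]) * (f[b] - mu[b])
                    count += 1
            if count > 0:
                S[a][b] = num / count
    return S
```

Each entry is normalized by the number of blocks in which both pixels are present, so sparsely observed pairs are not biased toward zero. The resulting matrix is not guaranteed to be positive semidefinite, which is one reason it is fed to a constrained estimator rather than inverted directly.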
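We can write the covariance and precision matrices of the estimated GMRF model in
block form as
$$ \Sigma = Q^{-1} = \begin{bmatrix} \Sigma_{\mathcal{O},\mathcal{O}} & \Sigma_{\mathcal{O},\mathcal{M}} \\ \Sigma_{\mathcal{M},\mathcal{O}} & \Sigma_{\mathcal{M},\mathcal{M}} \end{bmatrix}, \qquad Q = \begin{bmatrix} Q_{\mathcal{O},\mathcal{O}} & Q_{\mathcal{O},\mathcal{M}} \\ Q_{\mathcal{M},\mathcal{O}} & Q_{\mathcal{M},\mathcal{M}} \end{bmatrix} . \tag{5.25} $$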
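Once the precision matrix $Q$, and therefore the graph Laplacian matrix, is derived for the
whole block, which includes the positions of both available and missing data, the graph
Laplacian $\mathcal{L}$ for $f_{i\mathcal{O}}$ can be calculated as the inverse of the sub-matrix $\Sigma_{\mathcal{O},\mathcal{O}}$
of $\Sigma$, which equals the Schur complement of $Q_{\mathcal{M},\mathcal{M}}$ in $Q$:
$$ \mathcal{L} = \Sigma_{\mathcal{O},\mathcal{O}}^{-1} = Q_{\mathcal{O},\mathcal{O}} - Q_{\mathcal{O},\mathcal{M}} \, Q_{\mathcal{M},\mathcal{M}}^{-1} \, Q_{\mathcal{M},\mathcal{O}} . \tag{5.26} $$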
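A small sketch of (5.26) is given below. Instead of inverting the whole block $Q_{\mathcal{M},\mathcal{M}}$, it removes the missing nodes one at a time, which gives the same result by the quotient property of Schur complements. The function name and the exact-fraction arithmetic are choices made for this illustration, not part of the described system.

```python
from fractions import Fraction

def eliminate_nodes(Q, missing):
    """Schur complement of Q with respect to the rows/columns listed in `missing`.

    Equivalent to Q_OO - Q_OM Q_MM^{-1} Q_MO in (5.26); nodes are removed one by one.
    Q is a square list of lists; the result keeps the remaining nodes in order.
    """
    n = len(Q)
    keep = [k for k in range(n) if k not in missing]
    Q = [[Fraction(v) for v in row] for row in Q]   # exact arithmetic for the demo
    for m in missing:
        pivot = Q[m][m]
        for a in range(n):
            if a == m:
                continue
            factor = Q[a][m] / pivot
            for b in range(n):
                # Q'_ab = Q_ab - Q_am * Q_mb / Q_mm
                Q[a][b] -= factor * Q[m][b]
    return [[Q[a][b] for b in keep] for a in keep]
```

For a three-node path with unit link weights and unit self loops, removing the middle node leaves a direct link of weight 1/3 between the two outer nodes: the reduced Laplacian connects observed pixels that were only indirectly related through a missing one.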
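In our experiment, light field images are divided into two groups: a training set and a test set.
Blocks of intra-predicted residuals from the training data are classified, based on the structure
tensor, into 8 directional modes:
$$ \mathcal{M} = \left\{ M_\theta \;\middle|\; \theta = -\tfrac{3\pi}{8}, -\tfrac{\pi}{4}, -\tfrac{\pi}{8}, 0, \tfrac{\pi}{8}, \tfrac{\pi}{4}, \tfrac{3\pi}{8}, \tfrac{\pi}{2} \right\} \tag{5.27} $$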
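and one DC mode, $M_{DC}$, used if there is no dominant edge direction. Given the edge angle $\hat{\theta}$
of a training block derived from its structure tensor, i.e., the angle between the eigenvector
of the smallest eigenvalue and the horizontal axis, and the eigenvalues $\lambda_1$ and $\lambda_2$ ($\lambda_1 \leq \lambda_2$), the associated
class is determined based on
$$ \begin{cases} M_{DC} & \text{if } \dfrac{\lambda_2}{\lambda_1} < T \\[2mm] M_\theta & \text{if } \hat{\theta} \in \left[\theta - \dfrac{\pi}{16},\; \theta + \dfrac{\pi}{16}\right) \end{cases} . \tag{5.28} $$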
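The classification rule can be illustrated with the sketch below. It assumes that the DC test compares the ratio of the larger to the smaller eigenvalue against $T$, that angles are taken modulo $\pi$ so that angles just above $-\pi/2$ fall into the $\pi/2$ class, and that a block with two zero eigenvalues is treated as DC; the function and constant names are hypothetical.

```python
import math

# The eight directional modes of (5.27): -3*pi/8, -pi/4, ..., 3*pi/8, pi/2
MODES = [k * math.pi / 8 for k in range(-3, 5)]

def classify_block(theta, lam_small, lam_large, T=1.5):
    """Return "DC" or the index into MODES of the directional class.

    theta: estimated edge angle in radians, in (-pi/2, pi/2].
    lam_small, lam_large: structure tensor eigenvalues, lam_small <= lam_large.
    """
    # No dominant direction: flat block, or eigenvalues of similar size
    if lam_large == 0 or (lam_small > 0 and lam_large / lam_small < T):
        return "DC"
    # Bin index k such that theta lies in [k*pi/8 - pi/16, k*pi/8 + pi/16)
    k = math.floor((theta + math.pi / 16) / (math.pi / 8))
    k = (k + 3) % 8 - 3          # wrap around: orientations are periodic in pi
    return k + 3
```

For example, `classify_block(0.1, 1.0, 10.0)` selects the horizontal mode (index 3 in `MODES`), while nearly equal eigenvalues give `"DC"`.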
Training blocks within a class are assumed to be samples from the same GMRF model.
Therefore, in total 9 graph Laplacian matrices $\{L_{G_i} \mid i = 1, 2, \ldots, 9\}$ are derived using the plug-in
algorithm for intra-predicted residuals. We use an 8-connected graph for pixels on the
hexagonal grid as the connectivity constraint in (5.21). Residual blocks in the test set
are classified into the 9 modes based on the edge angle derived from the estimated structure
tensor, whose calculation is described in Section 5.4. The graph for the pixels in each
block is obtained with the Schur complement from the Laplacian matrix $L_{G_i}$ of the corresponding
class.
5.5.3 Graph-based Lifting Transform
Due to the large data size of light field images, we apply the localized graph-based lifting
transform with the Max Cut bipartition and re-connection (reconn-MaxCut) described in
Chapter 4, which was shown to have comparable performance but lower complexity than
reconn-GMRF. As described in Chapter 3, in the context of compression, the main objective
in designing the lifting scheme is to compact most of the data energy into the low frequency
band, i.e., into the update set $\mathcal{U}$, or correspondingly to reduce the energy of the prediction residuals
$d = f_{\mathcal{P}} - P f_{\mathcal{U}}$. For a signal $f$ modeled as a GMRF $\mathcal{N}(\mu, Q^{-1})$, the minimum mean square error
estimator (also the MAP estimator) of the signal value $f_i$ on node $v_i \in \mathcal{P}$ is expressed as
$$ \begin{aligned} \hat{f}_i &= \mu_i - \sum_{j \neq i} \frac{Q_{i,j}}{Q_{i,i}} (f_j - \mu_j) \\ &= -\sum_{j \neq i} \frac{Q_{i,j}}{Q_{i,i}} f_j + \Big(\mu_i + \sum_{j \neq i} \frac{Q_{i,j}}{Q_{i,i}} \mu_j\Big). \end{aligned} \tag{5.29} $$
The matrix form of (5.29) for the case of zero mean is the same as in (3.15). If the estimated
precision matrix satisfies the graph Laplacian structure described in (5.20), and if
$\mu_i = \mu_j = \mu$, which is usually true for neighboring nodes of high correlation, the above equation
can be simplified as
$$ \begin{aligned} \hat{f}_i &= -\sum_{j \neq i} \frac{Q_{i,j}}{Q_{i,i}} f_j + \mu \Big(1 + \sum_{j \neq i} \frac{Q_{i,j}}{Q_{i,i}}\Big) \\ &= \frac{1}{D_{i,i} + H_{i,i}} \sum_{j} A_{i,j} f_j + \frac{H_{i,i}}{D_{i,i} + H_{i,i}} \mu . \end{aligned} \tag{5.30} $$
The estimator is simply a weighted average of the signal values $f_j$ on the neighbouring
nodes $v_j$ and the associated mean $\mu$ of $v_i$. The self loop weight $H_{i,i}$ can be interpreted as
a measure of similarity to the mean $\mu$. In our experiments, we assume $\mu = 0$, since the
sub-aperture images after intra-prediction usually have an average value close to 0. Note that if
the graph is bipartite, i.e., links only exist between nodes from opposite sets, the prediction
operation in (5.30) is equivalent to the low complexity CDF 5/3 filterbank
$$ \hat{f}_i = \frac{1}{D_{i,i}} \sum_{v_j \in \mathcal{U}} A_{i,j} f_j \tag{5.31} $$
described in Section 2.4, when the self loop weight $H_{i,i} = 0$. In this work, we consider a more
general CDF 5/3 filterbank with non-zero self loops:
$$ \hat{f}_i = \frac{1}{D_{i,i} + H_{i,i}} \sum_{v_j \in \mathcal{U}} A_{i,j} f_j + \frac{H_{i,i}}{D_{i,i} + H_{i,i}} \mu . \tag{5.32} $$
For bipartite graphs, the proposed filterbanks are optimal in terms of mean square error for
the GMRF model, as shown in (5.30).
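The predictor with self loops in (5.32) can be sketched as below for a single node of a bipartite graph. The dictionary-based storage of link weights, self loops and signal values, and the function name, are assumptions of this sketch rather than the data structures used in the actual codec.

```python
def predict(i, A, H, f, update_set, mu=0.0):
    """Predict f_i from its neighbours in the update set, following (5.32).

    A: {(a, b): weight} symmetric link weights, each link stored once
    H: {node: self-loop weight}
    f: {node: signal value} for the update nodes
    update_set: set of nodes in U; on a bipartite graph all neighbours of i lie in U
    """
    nbrs = {}
    for (a, b), w in A.items():
        if a == i and b in update_set:
            nbrs[b] = w
        elif b == i and a in update_set:
            nbrs[a] = w
    degree = sum(nbrs.values())      # D_ii
    h = H.get(i, 0.0)                # H_ii
    return (sum(w * f[j] for j, w in nbrs.items()) + h * mu) / (degree + h)
```

With `H[i] = 0` this reduces to the plain weighted average of (5.31). A positive self loop pulls the prediction toward the mean, so with $\mu = 0$ it simply shrinks the prediction.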
For graphs that are not bipartite, we use the re-connection algorithm with sparsification
described in Section 3.3, which re-connects each node $v_i \in \mathcal{P}$ to be predicted to nodes in $\mathcal{U}$,
and apply the CDF 5/3 predictor in (5.32) on the newly formed bipartite graph. As proven in
Appendix A, without the sparsification, the applied predictor is equivalent to the minimum
mean square error (MMSE) estimator for $f_{\mathcal{P}}$ given $f_{\mathcal{U}}$, and is therefore optimal in terms of
energy compaction.
For the Max Cut bipartition, we apply the algorithm proposed in [59] using the MST algorithm.
The update filter is calculated through orthogonalization, and the construction of graphs for
the lifting transform at levels $\geq 2$ is based on Kron reduction. For entropy coding, we apply the
Amplitude and Group Partitioning (AGP) algorithm [66]. Note that the graph-based lifting
transform applied is the same as in Section 4.5 (reconn-MaxCut), but with the graphs designed
in this chapter.
5.6 Experiments
5.6.1 Experimental Setting
For archival purposes, one should assess the quality of the reconstructed lenselet image in the
original RGB pattern. However, current state-of-the-art schemes using HEVC discard
underexposed pixels at the boundary of macro-pixels during the conversion to the sub-aperture
image array, so it is difficult to recover the lenselet image on the decoder side. Hence, for
evaluation, we compare performance on the reconstructed full color sub-aperture images.
The full color sub-aperture image before compression, generated from the raw lenselet data
using the demosaicking and calibration pipeline described in [17, 47], is taken as the ground
truth. As a baseline, we consider the HEVC (HM 16.9) encoding of sub-aperture images
in original 4:4:4 RGB (intraHEVC-RGB444) and 4:2:0 YUV (intraHEVC-YUV420) formats.
Since we consider a coding scheme that allows efficient random access, the configuration used
in HEVC is All-Intra coding, which allows intra-prediction but no inter-prediction. For a fair
comparison, the post filters in HEVC, including the deblocking filter and SAO, are disabled. In
our proposed scheme, the same demosaicking and calibration are applied on the decoder
side to the reconstructed lenselet image, in order to generate the reconstructed sub-aperture
images for evaluation. Images from the proposed and baseline schemes are compared in
RGB format without sub-sampling. For intraHEVC-YUV420, the reconstructed sub-aperture
images are converted back to 4:4:4 RGB format before evaluation. The up-sampling of the U
and V components is based on nearest neighbor interpolation.
In our method, each sub-aperture image is divided into non-overlapping 8 × 8 blocks. The
light field images we consider in the experiments are obtained from the EPFL database [46].

Figure 5.10: Graph structure optimized for classes corresponding to intra-prediction
angle (a) $\theta = 3\pi/8$ (nearly vertical) and (b) $\theta = 0$ (horizontal). The link color indicates
the associated weight and the node color indicates the associated self loop weight
(darker: larger weight).

We consider three different scenarios for the graph based coding scheme:
1. DGLT: geometric Distance based graph construction (Section 5.5.1) in Graph Lifting
Transform without intra-prediction, which was used in [8].
2. intraDGLT: geometric Distance based graph construction in Graph Lifting Transform
with the proposed intra-prediction (Section 5.4).
3. intraLGLT: graph Learning based graph construction (Section 5.5.2) in Graph Lifting
Transform with the proposed intra-prediction.
For graph learning in scenario 3, we select 4 sub-aperture images from each of the following
LF images from the EPFL dataset: Ankylosaurus_&_Stegosaurus, Ceiling_Light, ISO_Chart_16,
Perforated_Metal_1, Sophie_&_Vincent_3, and Yan_&_Krios_standing to form the training
set. The parameter in (5.12) is set to 0.9, and the three parameters in (5.15) and (5.17)
for the data-adaptive Gaussian kernels, the last two being $p_1$ and $p_2$, are chosen to be 1.6, 0.001
and 0.001, respectively. The threshold value
$T$ in (5.28) for training block classification is set to 1.5. We apply a 2-level lifting transform for
the transform coding of the pixels in each block.
In Fig. 5.10, two illustrative examples are shown of weighted graphs optimized for the sets
$M_{\theta = 3\pi/8}$ and $M_{\theta = 0}$. The colors of the links and nodes indicate the associated link weights and
self loop weights. It can be seen that the orientation of the links with strong weights (links with
darker color) matches the intra-prediction direction well, which is also the edge direction of
the block. The nodes with large self loop weights concentrate around the boundaries close
to the reference blocks, where prediction is better. Therefore, the associated residuals
on those pixels tend to have lower variance. This observation matches our interpretation in
Section 5.5 that self loop weights model the similarity of the observed values on a node to their
expected value.
The raw data in [46] are captured with a Lytro Illum camera [64]. Each test image is of size
5368 × 7728. In the baseline scheme, the raw data are converted into 15 × 15 full color
sub-aperture images, each of size 434 × 625. Therefore, for each test
image, there are a total of 91546875 = 15 × 15 × 434 × 625 × (1 + 1/4 + 1/4) pixels that need
to be encoded by HEVC when using the 4:2:0 YUV format. In our scheme, on the other hand,
the compression is applied to the original raw data without demosaicking, and therefore only
41483904 = 5368 × 7728 pixels are required, a saving of about 55% in input data size.
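The sample counts above can be checked with a few lines of arithmetic. The variable names are chosen for this illustration only.

```python
views = 15 * 15                      # sub-aperture images per light field
pixels_per_view = 434 * 625          # size of each sub-aperture image

# 4:2:0 YUV: one full-resolution luma plane plus two quarter-resolution chroma planes
yuv420_samples = views * pixels_per_view * 3 // 2

raw_samples = 5368 * 7728            # one colour sample per sensor pixel

saving = 1 - raw_samples / yuv420_samples
print(yuv420_samples, raw_samples, saving)
```

The ratio of the two counts is roughly 0.45, which is where the saving of about 55% comes from.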
5.6.2 Results
Fig. 5.11 shows the PSNR comparisons for the images Friends_1, Bikes, Flowers, and
Ankylosaurus_&_Diplodocus from the EPFL dataset. The considered QP values range from 4 to 36.
For applications such as archiving and instant storage on cameras, images are typically stored
in very high quality. Therefore, in the evaluation, we consider mainly the high bit rate region.
It can be seen that for higher bit rates (bpp > 2), graph based coding schemes significantly
outperform the conventional approach using HEVC. This is because as the bit rate increases,
indicating a smaller quantization step, more high frequency components are kept in the
transform domain after quantization. The conventional approach then incurs a
large cost to encode all the coefficients, while in the proposed method, only around half as
many coefficients are encoded. In the low bit rate region, since most of the high frequency
coefficients are discarded after quantization, the number of coefficients that need to be
encoded in HEVC is much smaller than the total number of coefficients. Therefore, the
advantage of having a smaller amount of data is not as significant in this case. With the
proposed intra-prediction, correlations between neighbouring blocks are utilized to reduce
redundancies before compression. As a result, performance is improved by about 5 dB over
our results without intra-prediction. Moreover, with graph learning, directional local
characteristics are exploited in selecting graph link weights, which provides more accurate modeling
of the similarity between pixels as compared to the link weight selection based solely on
Euclidean distance. In the results, using graph learning leads to around 0.5 dB gain over
graph construction based on distance. For the baseline method using the 4:2:0 YUV format, PSNR
mostly saturates near 43 dB, which is mainly caused by the color conversion. During
the conversion, some details are lost when rounding floating point values, and resolution is
reduced as down-sampling is performed. In the proposed method, since the compression is
performed on the raw RGB data, the performance is not degraded by distortion from color
conversion and downsampling.

Figure 5.11: Average PSNR over R, G, and B components for test images (a) Friends_1,
(b) Bikes, (c) Flowers, and (d) Ankylosaurus_&_Diplodocus.
5.7 Summary
In this chapter, we have described a novel coding scheme for light field images based on the graph
based lifting transform. The scheme is able to encode the original raw data without introducing
redundancies from demosaicking and calibration or distortion from color conversion and
downsampling. Moreover, we proposed an intra-prediction and graph learning algorithm for
pixels in sub-aperture images that are sparsely distributed. The pixels are then connected as
graphs and encoded with the low complexity graph based lifting transform. The coding results at
high bit rates using the proposed method outperform the widely applied HEVC based approach.
Chapter 6
Edge Adaptive Graph-Based Transforms
Step/Ramp Edge Models for Video Compression
6.1 Introduction
In this chapter, we study edge models for edge adaptive graph-based transforms (EA-GBTs)
in video compression. In particular, we consider step and ramp edge models to design the
graphs used for defining transforms, and compare their performance on coding intra and inter
predicted residual blocks. EA-GBTs are special types of Graph Fourier Transforms (GFTs),
described in Section 2.2, where the graphs are constructed based on the edge information.
The first example of an EA-GBT was proposed in [28] for depth-map coding. By adjusting the
graph weights based on the edge information, i.e., assigning smaller weights to links across
edges, an EA-GBT was defined for each block. Such adaptation provides better compression,
since the resulting transforms avoid filtering across the discontinuities, which could otherwise
create high frequency coefficients. Recently, Hu et al. [76] proposed an EA-GBT to capture
both strong and weak edges for depth-map coding. In our previous work [32, 80], we showed that
EA-GBTs can also be used to improve inter and intra predicted residual coding performance
over the DCT. However, prior work only considers the step edge model for the EA-GBT design,
implicitly assuming that all edges can be represented with ideal step functions, as shown in
Fig. 6.1(a). As pointed out in [61], this is usually not the case, especially for high resolution
natural images where edges are mostly ramps, as illustrated in Fig. 6.1(b). In this work,
we show that this property also holds for intra predicted residual blocks generated in video
coding and that better compression can be achieved by using the proposed ramp-edge model.
In contrast, for inter predicted coding, we obtain better compression performance using the
step edge model. Most of this work was published in [81].
In order to optimize EA-GBTs whose associated graphs are designed based on a ramp
edge model, we propose a new probabilistic model for ramp edges and estimate the parameters
by training on residual block data. The optimized model parameters are used to design EA-GBTs
by adjusting the graphs' weights according to the detected ramp edges within blocks.
For compression, we employ a block-adaptive coding scheme, i.e., different EA-GBTs are
designed for blocks with different edge positions, which requires sending edge information to
the decoder for reconstruction. For efficient edge coding, we propose a new method for ramp
edges, called arithmetic ramp edge coding (AREC), extending the AEC proposed in [19]. In our
experiments, we compare EA-GBTs with step and ramp models for inter and intra predicted
residual videos. The results show that the proposed ramp edge model performs better than the step
edge model for intra predicted residuals, and that both models outperform DCT-based encoding.

Figure 6.1: (a) The step function and (b) the ramp function for edge modeling.
6.2 Edge Models for Residual Signals
6.2.1 Ramp-Edge Model
The derivation of the optimal graph for EA-GBT in [76] is based on the assumption that
1-D signals with an edge can be modeled as auto-regressive (AR) processes with a step
transition. However, edges with a step transition rarely exist in natural images. Edges
mostly have a smoother (ramp-like) transition as a result of image capture and digitization,
which is particularly true for high resolution images. Therefore, in our work, we model the
1-D signals $[x_1, x_2, \ldots, x_N]^t$ with an edge in a block as AR processes with a sloped transition
from pixel $x_i$ to $x_{i+\ell}$, where $\ell$ denotes the ramp width, immersed in independent and
identically distributed (i.i.d.) Gaussian noise $e_k \sim \mathcal{N}(0, \sigma_e^2)$. The model is written as
$$ \begin{aligned} x_1 &= \rho\,(y + \delta) + e_1 \\ x_2 &= \rho\, x_1 + e_2 \\ &\;\;\vdots \\ x_i &= \rho\, x_{i-1} + e_i \\ x_{i+1} &= \rho\, x_i + e_{i+1} + t_1 \\ &\;\;\vdots \\ x_{i+\ell} &= \rho\, x_{i+(\ell-1)} + e_{i+\ell} + t_\ell \\ x_{i+(\ell+1)} &= \rho\, x_{i+\ell} + e_{i+(\ell+1)} \\ &\;\;\vdots \\ x_N &= \rho\, x_{N-1} + e_N , \end{aligned} \tag{6.1} $$
where $\rho$ is the AR coefficient, $y$ is the reference sample in the neighboring block, and $\delta \sim \mathcal{N}(0, \sigma^2)$ is the associated
distortion. A sloped transition is represented by the i.i.d. random gaps $t_p \sim \mathcal{N}(m, \sigma_t^2)$. Note that
the model is equivalent to the model in [76] for $\ell = 1$ and $y = 0$, and thus can be seen as
a generalization of that work. We can express (6.1) in matrix form as $Fx = b + \mathbf{y}$, where
$\mathbf{y} = [\rho y, 0, \ldots, 0]^t$ and
$$ F = \begin{bmatrix} 1 & 0 & \cdots & \cdots & 0 \\ -\rho & 1 & 0 & & \vdots \\ 0 & -\rho & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & -\rho & 1 \end{bmatrix}, \qquad b = \begin{bmatrix} e_1 + \rho\delta \\ e_2 \\ e_3 \\ \vdots \\ e_N \end{bmatrix} + \begin{bmatrix} 0 \\ \vdots \\ t_1 \\ \vdots \\ t_\ell \\ \vdots \\ 0 \end{bmatrix} . \tag{6.2} $$
Figure 6.2: 1-D line graph with weak link weights $w$ for the ramp spanning from $x_i$ to $x_{i+\ell}$.

Since $F$ is invertible, the signal can be written as $x = F^{-1} b + F^{-1} \mathbf{y}$, where $F^{-1} \mathbf{y}$ is the optimal
prediction for $x$, and $r = F^{-1} b$ is the resulting residual signal. The optimal transform for
compressing $r$ can be derived by computing the KLT. To do so, we first compute the
covariance matrix of $r$ as
$$ C = \sigma_e^2 \, F^{-1} \, \mathrm{diag}\big(1+\beta,\; 1, \ldots, 1,\; \underbrace{1+\alpha_t, \ldots, 1+\alpha_t}_{\ell \text{ entries, rows } i+1 \text{ to } i+\ell},\; 1, \ldots, 1\big) \, (F^{-1})^t , \tag{6.3} $$
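where $\alpha_t = \frac{\sigma_t^2}{\sigma_e^2}$ and $\beta = \frac{(\rho\sigma)^2}{\sigma_e^2}$. In this chapter, we only consider the case $\rho = 1$. Then,
the precision matrix, defined as $Q = C^{-1}$, can be written as
$$ Q = \frac{1}{\sigma_e^2} \begin{bmatrix}
1+\frac{1}{1+\beta} & -1 & & & & & & & \\
-1 & 2 & -1 & & & & & & \\
 & \ddots & \ddots & \ddots & & & & & \\
 & & -1 & 1+\frac{1}{1+\alpha_t} & \frac{-1}{1+\alpha_t} & & & & \\
 & & & \frac{-1}{1+\alpha_t} & \frac{2}{1+\alpha_t} & \frac{-1}{1+\alpha_t} & & & \\
 & & & & \ddots & \ddots & \ddots & & \\
 & & & & & \frac{-1}{1+\alpha_t} & 1+\frac{1}{1+\alpha_t} & -1 & \\
 & & & & & & \ddots & \ddots & \ddots \\
 & & & & & & & -1 & 1
\end{bmatrix} . \tag{6.4} $$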
Given the 1-D signal $f = [f_1, f_2, \ldots, f_N]^t$ of length $N$ with the sloped edge between locations
$x_i$ and $x_{i+\ell}$, we can represent the signal using the line graph shown in Fig. 6.2. It can be shown
that if we assign the weights $w$ across the ramp to be $\frac{1}{1+\alpha_t} = \frac{1}{1+\sigma_t^2/\sigma_e^2}$, while the others are set
to 1, the graph Laplacian $L$ is approximately equivalent to $Q$ in (6.4) if the distortion of the
reference sample is large, since the only remaining difference, the term $\frac{1}{1+\beta}$ in the first diagonal
entry, then vanishes. Since the precision matrix $Q \approx L$ and the covariance matrix $C$ share
the same eigenvector set, the EA-GBT basis $U$ defines the KLT. In our simulations, we assume
the noise variance $\sigma_e^2$ is 1. The parameter $\sigma_t^2$ can be estimated from the sample variance $\hat{\sigma}_t^2$
of the pixel gradients $\{|f_i - f_{i+1}|, |f_{i+1} - f_{i+2}|, \ldots, |f_{i+\ell-1} - f_{i+\ell}|\}$ extracted from the detected
ramps.
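The relation between the precision matrix of the ramp model and the Laplacian of the line graph can be checked numerically. The sketch below assumes an AR coefficient of 1 and $\sigma_e^2 = 1$, builds $Q = F^t D^{-1} F$ with $D$ the diagonal noise-variance matrix, and compares it with the Laplacian of a line graph whose ramp links have weight $1/(1+\alpha_t)$. Function names and the parameters `alpha_t` and `beta` are illustrative, and exact fractions are used only to make the comparison exact.

```python
from fractions import Fraction

def ramp_precision(N, i, ell, alpha_t, beta):
    """Precision matrix F^t D^{-1} F of the ramp model (AR coefficient 1, sigma_e^2 = 1).

    Samples are numbered 1..N as in the text; the ramp spans x_i to x_{i+ell}.
    """
    d = [Fraction(1)] * N
    d[0] = 1 + Fraction(beta)                  # reference-sample distortion enters x_1
    for k in range(i + 1, i + ell + 1):        # x_{i+1} .. x_{i+ell} carry a random gap
        d[k - 1] = 1 + Fraction(alpha_t)
    Q = [[Fraction(0)] * N for _ in range(N)]
    for k in range(N):                         # row k of F: +1 at column k, -1 at column k-1
        Q[k][k] += 1 / d[k]
        if k > 0:
            Q[k - 1][k - 1] += 1 / d[k]
            Q[k][k - 1] -= 1 / d[k]
            Q[k - 1][k] -= 1 / d[k]
    return Q

def line_graph_laplacian(N, i, ell, w):
    """Combinatorial Laplacian of a line graph with weak links of weight w on the ramp."""
    L = [[Fraction(0)] * N for _ in range(N)]
    for k in range(1, N):                      # link between x_k and x_{k+1} (1-based)
        wk = Fraction(w) if i <= k < i + ell else Fraction(1)
        a, b = k - 1, k
        L[a][a] += wk
        L[b][b] += wk
        L[a][b] -= wk
        L[b][a] -= wk
    return L
```

For `N = 8`, `i = 3`, `ell = 2`, the two matrices differ only in the first diagonal entry, by `1 / (1 + beta)`, which illustrates why the Laplacian approximates the precision matrix when the reference distortion is large.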
6.2.2 Experimental Justification of the Edge Models
We justify our proposed model experimentally by learning a graph from real inter/intra
predicted residuals. We employ one of the graph learning methods proposed in [24]. The
residual signal $f \in \mathbb{R}^N$ is first modeled as a Gaussian Markov Random Field (GMRF) defined
as follows:
$$ p(f \mid Q) = \frac{1}{(2\pi)^{N/2} \det(Q)^{-1/2}} \exp\left(-\frac{1}{2} f^t Q f\right), \tag{6.5} $$
where $Q$ is the precision matrix to be estimated. Note that the AR model described in the
previous subsection is one special case of a parametric GMRF model. The optimal precision
matrix in (6.5) is computed by solving the maximum likelihood problem:
$$ \hat{Q} = \arg\max_{Q \in \Gamma} \; \log\det(Q) - \mathrm{Tr}(QS), \tag{6.6} $$
where $S$ is the sample covariance of the residual signal $f$ and $\Gamma$ defines the matrix type and graph
connectivity constraint for $Q$. In our case, we constrain the matrix to be a combinatorial
Laplacian of a 2-connected line graph. In (6.6), the objective function is derived by taking
the natural logarithm of the likelihood term in (6.5).
In order to form the training set, we apply the Sobel detector to identify step edges, and the
edge detector proposed in [61] to identify ramp edges of width $\ell = 2$, on unquantized residuals
obtained from HEVC intra and inter prediction for 8 video sequences, with the block size fixed to
8 × 8. The training set is composed of row/column residual vectors $[x_1, x_2, \ldots, x_8]^t$ with a step
edge detected between pixels $x_4$ and $x_5$, and a ramp edge detected between pixels $x_3$ and $x_5$. By
solving for the graph Laplacian $L$, i.e., the constrained $Q$ in (6.6), on the training data, which includes residuals
where both step and ramp edges were identified, the resulting model gives an estimate of the
most efficient representation for the edge structure. The results are shown in Figs. 6.3 and 6.4,
where the maximum link weights are normalized to 1. It can be seen that the graph derived
for INTRA predicted residuals (Fig. 6.4) has a similar structure to the model in (6.1) with
$\ell = 2$, which offers some justification for the application of ramp edge models. On the other
hand, the step edge model, (6.1) with $\ell = 1$, is better for representing the INTER predicted
residuals (Fig. 6.3). This is because edges in inter predicted residuals tend to have sharper
transitions due to motion estimation mismatches for some pixels with respect to the reference
block.

Figure 6.3: The optimal line graph for INTER predicted residuals.
Figure 6.4: The optimal line graph for INTRA predicted residuals.
6.3 Ramp Coding and Graph Construction
6.3.1 Arithmetic Ramp Edge Coding (AREC)
The ramp edge detection for each transform block is based on the work in [61]. Similar
to the Canny edge detector, the algorithm can be implemented in two steps: pre-filtering and
differentiation. The optimal pre-filter coefficients are listed in [61] for ramps with different
widths $\ell$ ($\ell$ is restricted to be even). For differentiation, a pixel with gradient larger than a
threshold $T$ in the pre-filtered image is detected as a ramp center and stored in a binary map
$B$. The information in $B$ is used to define the graphs in EA-GBT, and thus needs to be
signaled to the decoder. We extend the idea of arithmetic edge coding (AEC) [19], which
was proposed for step edge coding, to encode the positions of ramp centers. The proposed
method is called arithmetic ramp edge coding (AREC).
Given the ramp positions $p_1, p_2, \ldots, p_n$ in $B$, as shown in Fig. 6.5, we create an ordered
chain code $\{c_{1,2}, c_{2,3}, \ldots, c_{n-1,n}\}$ by traversing the ramp pixels, where each element in
the chain code denotes a directed link connecting two adjacent ramp components. Note that
there are 8 possible directions that $c_{i-1,i}$ can take: {N, NE, E, SE, S, SW, W, NW}, as shown
in Fig. 6.6(a). Given the direction of the link $c_{i-1,i}$ from node $p_{i-1}$ to $p_i$, there are only 5 possible
directions for $c_{i,i+1}$ from node $p_i$ to $p_{i+1}$: {forward, slight right, right, slight left, left}, as
denoted in Fig. 6.6(b), assuming the detected ramp edge contains no sharp corners. As in
AEC, the chain code is encoded using arithmetic coding. However, the chain code in AEC
is composed of boundaries detected for step edges, and for each boundary only 4 directions,
{N, E, W, S}, can be taken, as shown in Fig. 6.7. AREC details are summarized in Algorithm
5, where the prediction $\tilde{c}_{i,i+1}$ is computed using linear regression on $k$ previously traversed
pixels $p_{i-k-1}, \ldots, p_{i-1}, p_i$; $k$ is set to 4 in our simulations. The topmost uncoded ramp center
is chosen as $p_1$, with S (south) as the initial traversing direction.

Figure 6.5: (a) An example of a binary ramp map with pixels $p_1, p_2, \ldots, p_5$ indicating
the ramp centers, and (b) the chain code formed by traversing through the consecutive
ramp pixels.
Figure 6.6: (a) The 8 directions that can be taken by $c_i$. (b) The potential traversing
directions from $p_i$ to $p_{i+1}$ given the direction from $p_{i-1}$ to $p_i$.
Figure 6.7: (a) An example of AEC coding on step edges and (b) AREC on ramp
edges.
In order to evaluate the coding efficiency of AREC, we compare AREC for ramp edge
coding and AEC for step edge coding on the same set of 8 × 8 residual blocks taken from 8
video sequences, without rate distortion optimization. The step edges are found using the Sobel
detector. The average bits per pixel (BPP) results are shown in Table 6.1. It can be seen that
AREC achieves a bitrate reduction of roughly 11% to 12% with respect to AEC for both inter and intra
predicted residuals.
                   AEC for Step model    AREC for Ramp model
                   BPP (×10^-2)          BPP (×10^-2)    ΔBPP (%)
INTER predicted    6.50                  5.71            -12.2
INTRA predicted    6.95                  6.17            -11.2
Table 6.1: Bitrate comparison between AEC and AREC. ΔBPP indicates the bitrate
gain of AREC over AEC.
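The probability assignment in steps 7 to 10 of Algorithm 5 can be sketched as follows. The sketch assumes a von Mises density centred on the predicted direction; the concentration parameter `kappa` and the function name are assumptions of this illustration, since its value is not stated here.

```python
import math

def direction_probabilities(angles, kappa=2.0):
    """Normalized von Mises weights for the candidate directions.

    angles: angle in radians between each candidate direction and the predicted one.
    kappa: concentration parameter (assumed value, for illustration only).
    """
    # The von Mises density is exp(kappa * cos(theta)) / (2 * pi * I0(kappa));
    # the constant in the denominator cancels when the weights are normalized.
    phi = [math.exp(kappa * math.cos(t)) for t in angles]
    total = sum(phi)
    return [p / total for p in phi]
```

Candidates closer to the predicted direction receive larger probabilities and therefore shorter arithmetic codewords; with `kappa = 0` the assignment falls back to the uniform value 1/5 used for the first pixels of a contour.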
Algorithm 5 Arithmetic Ramp Edge Coding (AREC)
Input: Binary map $B$ with one ramp contour $\{p_1, p_2, \ldots, p_n\}$
1: Initialize $p_1$ and the traversing direction $c_{0,1}$
2: for $i = 1 : n-1$ do
3:   Search for $p_{i+1}$ among the 5 possible directions $d_j$, with the priority ordered as
     {forward, slight right, slight left, right, left}
4:   if $i \leq k$ then
5:     Assign equal probability $\frac{1}{5}$ to each $d_j$.
6:   else
7:     Predict the direction of $c_{i,i+1}$ as $\tilde{c}_{i,i+1}$
8:     Compute the angle $\theta_j$ between each $d_j$ and $\tilde{c}_{i,i+1}$.
9:     Compute the von Mises distribution $\varphi(\theta_j)$ of the angle $\theta_j$
10:    Assign the probability for $d_j$ to be $\varphi(\theta_j) / \sum_{r=1}^{5} \varphi(\theta_r)$
11:  end if
12:  Encode $c_{i,i+1}$, with one of the 5 possible directions, using arithmetic coding with the
     assigned probability
13: end for

6.3.2 Graph Construction from the Edge Map
After applying the ramp detector, a graph can be constructed based on the positions of the detected
ramp centers. A residual block is first represented as a 4-connected grid graph, where each
link has weight 1, as shown in Fig. 6.8. Then, for each detected ramp center, the neighbors
along 4 directions, {top, bottom, left, right}, are inspected. If the neighbor along direction
$m$ is not a ramp component, the estimated weight $w < 1$ is assigned to the $\frac{\ell}{2}$ links along
direction $m$. An example is shown in Fig. 6.9(a), where the red nodes indicate the detected ramp
centers and the dashed red links denote the links with the small weight. For ramp
center $c$, the $\frac{\ell}{2}$ links along the right direction, where the neighbor $n_2$ is a non-ramp pixel, are
assigned to be weak. In Fig. 6.9(b), we show an example of graph construction for $\ell = 2$.

Figure 6.8: An example of a residual block and the corresponding 4-connected grid
graph (red nodes indicate the detected ramp centers).
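For the case $\ell = 2$ used in the experiments, where one link per direction is weakened, the construction can be sketched as follows. The function name, the list-of-rows layout of the ramp map, and the edge dictionary are assumptions of this illustration; wider ramps would need $\ell/2$ links per direction, which this sketch does not handle.

```python
def build_grid_graph(ramp_map, w):
    """4-connected grid graph with weak links next to ramp centres (ramp width 2).

    ramp_map: list of rows of 0/1 flags, 1 marking a detected ramp centre.
    w: weak-link weight, smaller than 1.
    Returns {((r, c), (r2, c2)): weight}, each link stored once.
    """
    rows, cols = len(ramp_map), len(ramp_map[0])
    edges = {}
    for r in range(rows):
        for c in range(cols):
            for r2, c2 in ((r, c + 1), (r + 1, c)):    # right and bottom neighbours
                if r2 < rows and c2 < cols:
                    # A link is weak when it joins a ramp centre to a non-ramp pixel,
                    # i.e. when exactly one of its endpoints is flagged.
                    weak = ramp_map[r][c] != ramp_map[r2][c2]
                    edges[((r, c), (r2, c2))] = w if weak else 1.0
    return edges
```

For a vertical ramp running down the middle column of a block, all horizontal links touching that column become weak, while the vertical links along the ramp keep weight 1.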
6.4 Experimental Results
We apply the proposed transform to 6 test sequences: BQMall, BasketballDrill, City, Crew,
Harbour, and Soccer. The transform block size is fixed as either 8 × 8 or 16 × 16. The encoder
flow chart is shown in Fig. 6.10. The unquantized residual blocks for both inter and intra
prediction are generated with HEVC (HM-14.0) at QP = 32. Each block is represented using
a 4-connected grid graph. For transform coding, we apply a hybrid scheme, EA-GBT and
DCT, in which the encoder compares the rate distortion cost, defined as Sum of Squared
Error + $\lambda \cdot$ bitrate. The transform with the lower cost is selected as the final approach
for the associated block. In our experiment, we consider two hybrid schemes: the
EA-GBT/DCT with the step edge model and the EA-GBT/DCT with the ramp edge model. The
parameter $\lambda$ is defined as $\lambda = 0.85 \cdot 2^{(QP-12)/3}$, where QP = 24, 26, 28, 30, 32, 34. For step
edge detection, we use the Sobel operator, while for ramp edge detection, the method proposed
in [61] is applied. For blocks using EA-GBT, the edge positions are encoded and signaled
as overhead. For the step model, we use the AEC proposed in [19]. For the ramp model, we use
AREC. In order to reduce the overhead, only one contour is allowed in each block. The
ramp width is fixed as 2, chosen empirically, and therefore no signaling is required. The
transformed coefficients for both EA-GBT and DCT are uniformly quantized and encoded
using arithmetic coding.

Figure 6.9: Examples of weak link assignment based on ramp positions.
Figure 6.10: The video encoder with hybrid EA-GBT/DCT mode selection based on
rate-distortion optimization (RDO).
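The per-block mode decision described above can be sketched as follows. The function names and the example distortion and bit counts are made up for illustration; for an EA-GBT candidate the bit count is assumed to include the edge-coding overhead.

```python
def rd_lambda(qp):
    """Lagrange multiplier 0.85 * 2^((QP - 12) / 3)."""
    return 0.85 * 2 ** ((qp - 12) / 3)

def select_transform(candidates, qp):
    """Pick the transform with the lowest cost SSE + lambda * bits.

    candidates: {name: (sse, bits)} for each candidate transform of one block.
    """
    lam = rd_lambda(qp)
    return min(candidates, key=lambda name: candidates[name][0] + lam * candidates[name][1])

# Hypothetical numbers: EA-GBT lowers the distortion but spends extra bits on the contour
block = {"DCT": (1000.0, 50), "EA-GBT": (800.0, 60)}
choice_low_qp = select_transform(block, 24)
choice_high_qp = select_transform(block, 36)
```

Because the multiplier grows with QP, the same signaling overhead weighs more heavily at coarse quantization, so in this toy example the decision flips from EA-GBT to DCT as QP increases.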
The average PSNR and bitrate gains for the inter and intra predicted residual videos over a
pure DCT based encoder are shown in Tables 6.2 and 6.3. For the intra predicted videos,
EA-GBT/DCT with the ramp edge model performs slightly better than EA-GBT/DCT with the step edge
model. For inter predicted residuals, even though AEC is not as efficient as AREC (shown in
Table 6.1), EA-GBT/DCT with the step model achieves better overall efficiency, indicating better
edge representation for this model. The results coincide with our justifications discussed in
Section 6.2.2. For both step and ramp models, EA-GBT/DCT outperforms the DCT based
encoder. Note that as the size of the transform block increases, the performance of EA-GBT
improves, since larger blocks are more likely to contain edges.
Sequence          Size        Ramp 8×8                 Step 8×8
                              PSNR (dB)   rate (%)     PSNR (dB)   rate (%)
BQMall            832×480     0.23        -4.04        0.24        -4.20
BasketballDrill   832×480     0.18        -3.82        0.18        -3.87
City              704×576     0.21        -3.37        0.23        -3.60
Crew              704×576     0.10        -2.04        0.11        -2.33
Harbour           704×576     0.21        -2.84        0.24        -3.26
Soccer            704×576     0.20        -2.88        0.24        -3.37
Average                       0.19        -3.17        0.21        -3.44

Sequence          Size        Ramp 16×16               Step 16×16
                              PSNR (dB)   rate (%)     PSNR (dB)   rate (%)
BQMall            832×480     0.35        -6.20        0.37        -6.65
BasketballDrill   832×480     0.18        -4.24        0.22        -5.11
City              704×576     0.25        -4.35        0.27        -4.91
Crew              704×576     0.14        -3.41        0.15        -3.64
Harbour           704×576     0.24        -3.46        0.24        -3.51
Soccer            704×576     0.21        -3.10        0.22        -3.34
Average                       0.23        -4.13        0.25        -4.53
Table 6.2: Bjontegaard Delta Criterion for INTER predicted videos: PSNR and bitrate
gain of EA-GBT-step and EA-GBT-ramp over DCT.
6.5 Summary
In this chapter, we proposed a new edge model in EA-GBT based on ramp edges, which is justified experimentally for intra-predicted residuals using graph learning. Arithmetic ramp edge coding (AREC) is proposed to encode the detected ramp positions. Experimental results for EA-GBT with both step and ramp models demonstrate improved performance over DCT-based video coding. Moreover, for intra-predicted residuals, EA-GBT with the new ramp edge model performs better than EA-GBT with the step edge model.
                             Ramp 8×8               Step 8×8
Sequence          Size       PSNR (dB)  rate (%)    PSNR (dB)  rate (%)
BQmall            832×480    0.29       -3.40       0.27       -3.16
BasketballDrill   832×480    0.16       -2.69       0.15       -2.58
City              704×576    0.20       -2.13       0.14       -1.48
Crew              704×576    0.08       -1.52       0.06       -1.26
Harbour           704×576    0.21       -2.21       0.15       -1.58
Soccer            704×576    0.20       -1.98       0.13       -1.32
Average                      0.19       -2.32       0.15       -1.90

                             Ramp 16×16             Step 16×16
Sequence          Size       PSNR (dB)  rate (%)    PSNR (dB)  rate (%)
BQmall            832×480    0.48       -5.43       0.40       -4.61
BasketballDrill   832×480    0.15       -2.57       0.13       -2.22
City              704×576    0.24       -2.55       0.23       -2.40
Crew              704×576    0.13       -2.80       0.11       -2.38
Harbour           704×576    0.29       -2.96       0.26       -2.63
Soccer            704×576    0.22       -2.25       0.21       -2.15
Average                      0.25       -3.09       0.22       -2.73

Table 6.3: Bjontegaard Delta Criterion for INTRA predicted videos: PSNR and bitrate gain of EA-GBT-step and EA-GBT-ramp over DCT
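The rate columns in Tables 6.2 and 6.3 are Bjontegaard delta figures, where a negative value means a bitrate saving at equal quality. The text does not say which implementation was used, so the sketch below assumes the commonly used variant: a cubic fit of log-rate against PSNR for each codec, averaged over the shared PSNR range. The function name and the rate/PSNR points are hypothetical.

```python
import numpy as np

def bd_rate(rate_ref, psnr_ref, rate_test, psnr_test):
    """Average bitrate difference (%) of the test codec vs. the reference
    at equal PSNR, using a cubic fit of log10(rate) as a function of PSNR."""
    psnr_ref = np.asarray(psnr_ref, dtype=float)
    psnr_test = np.asarray(psnr_test, dtype=float)
    log_ref = np.log10(np.asarray(rate_ref, dtype=float))
    log_test = np.log10(np.asarray(rate_test, dtype=float))

    # Cubic polynomial fit for each rate-distortion curve.
    p_ref = np.polyfit(psnr_ref, log_ref, 3)
    p_test = np.polyfit(psnr_test, log_test, 3)

    # Integrate both fits over the PSNR interval covered by both curves.
    lo = max(psnr_ref.min(), psnr_test.min())
    hi = min(psnr_ref.max(), psnr_test.max())
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)

    # Mean log-rate difference, converted back to a percentage.
    avg_diff = (int_test - int_ref) / (hi - lo)
    return (10.0 ** avg_diff - 1.0) * 100.0

# Made-up example: the test codec needs 10% fewer bits at every PSNR point.
psnr = [30.0, 33.0, 36.0, 39.0]
rate_dct = [1000.0, 2000.0, 4000.0, 8000.0]
rate_hybrid = [0.9 * r for r in rate_dct]
print(bd_rate(rate_dct, psnr, rate_hybrid, psnr))
```

With four rate points per curve, as is typical for this metric, the cubic fit interpolates the points exactly; with the six QP values used in the experiments it becomes a least-squares fit.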
Chapter 7
Conclusions and Future Work
7.1 Conclusion
In this dissertation, we proposed several graph-based algorithms for efficient compression of images and video. In our first work, with theory and application discussed in Chapters 3 and 4, a low-complexity graph-based lifting transform is proposed. The problem of optimal bipartition, which divides nodes into a Prediction set and an Update set, is defined in terms of energy compaction of the transform coefficients. We applied a greedy algorithm that selects the most representative nodes to be included in the Update set at each lifting level, assuming the signal can be well modelled as a GMRF. This statistical model has been widely used in the image and video processing literature. The proposed bipartition outperforms related work in terms of the mean square error of the high frequency coefficients stored in the Prediction set. Moreover, since in lifting transforms the graph links connecting nodes in the same set cannot be utilized for filtering, we propose a bipartite graph reconnection based on Kron reduction, which captures the similarity between the Prediction set and the Update set more accurately than conventional approaches. The results in intra-predicted video coding show significant gains compared to state-of-the-art DCT-based coding. In addition, performance comparable to the high-complexity GFT can be achieved using the proposed lifting scheme with optimized bipartition and bipartite graph approximation.
In the second work, in Chapter 5, we presented an application of graph-based transforms to light field image compression for random access. We proposed a novel coding scheme for light field images that encodes the original raw data without introducing redundancies from demosaicking and calibration. An intra-prediction algorithm is developed that exploits the correlation between the sparsely distributed pixels in each block and its decoded neighbouring blocks within each sub-aperture image. The residual pixels are then connected as graphs and encoded with a graph-based lifting transform. A learning algorithm for the graph structure is also proposed, using Maximum Likelihood (ML) estimation of GMRF model parameters based on observations with incomplete data. The results show very significant gains in coding efficiency over All-Intra HEVC at high bit rates, which are commonly considered in archival scenarios.
Finally, we discussed edge models for video residuals and the construction of graphs for the GFT based on different edge models. Each column and row of an image signal is modeled as an Autoregressive (AR) process with edges modeled as i.i.d. noise. We consider step edge and ramp edge models in our design, and derive the optimal GFTs for decorrelation. We presented a discussion and justification of the two edge models for different types of predicted video residuals, i.e. intra- and inter-predicted residuals. The experiment on intra-predicted residuals shows promising performance using the proposed ramp edge model. Moreover, we developed a novel signalling method for graph geometry based on ramp edges, called Arithmetic Ramp Edge Coding (AREC).
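To make the idea of an edge-adaptive GFT concrete, the following sketch builds a 1D line graph for one row of a block and compares the energy compaction of its GFT with and without adapting the graph to a step edge. It is an illustration only: representing the step edge by a single weak link, the weight value 1e-3, and all function names are assumptions for this example rather than the weights derived in Chapter 6.

```python
import numpy as np

def line_graph_laplacian(n, edge_pos=None, weak_weight=1e-3):
    """Combinatorial Laplacian of an n-node line graph.

    All links have weight 1, except the link between nodes edge_pos-1 and
    edge_pos, which gets `weak_weight` (a hypothetical value) to model a
    step edge at that position.
    """
    w = np.ones(n - 1)
    if edge_pos is not None:
        w[edge_pos - 1] = weak_weight
    L = np.zeros((n, n))
    for i, wi in enumerate(w):
        L[i, i] += wi
        L[i + 1, i + 1] += wi
        L[i, i + 1] -= wi
        L[i + 1, i] -= wi
    return L

def gft_basis(L):
    """GFT basis: Laplacian eigenvectors ordered by increasing eigenvalue."""
    _, eigvecs = np.linalg.eigh(L)
    return eigvecs

def energy_in_first_k(U, f, k):
    """Fraction of signal energy captured by the k lowest graph frequencies."""
    coeffs = U.T @ f
    return float(np.sum(coeffs[:k] ** 2) / np.sum(coeffs ** 2))

# Toy row with a step between samples 3 and 4 (made-up values).
signal = np.array([10.0, 10.0, 10.0, 10.0, 50.0, 50.0, 50.0, 50.0])
U_uniform = gft_basis(line_graph_laplacian(8))           # no edge information
U_edge = gft_basis(line_graph_laplacian(8, edge_pos=4))  # weak link at the step

print(energy_in_first_k(U_uniform, signal, 2))
print(energy_in_first_k(U_edge, signal, 2))
```

With unit weights the line-graph GFT coincides with the DCT, so the first number reflects how a DCT spreads the step over several coefficients, whereas the edge-adapted basis concentrates almost all of the energy in its two lowest-frequency coefficients.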
7.2 Future Work
There are several questions we would like to address in future work. For the lifting bipartition described in Chapter 3, we currently use a greedy algorithm for selecting nodes to be included in the Update set. However, this has high complexity because the variance at each node must be calculated and compared. It would be interesting to develop a fast heuristic with the desired properties of the current algorithm, which include:
1. More nodes in areas with high-variance texture should be included in the Update set.
2. Nodes with more local neighbours of large similarity, namely nodes that can provide better prediction for neighbouring nodes, should have a higher likelihood of being selected into the Update set.
3. Nodes having very few neighbours with large similarity, i.e. nodes that are nearly isolated on the graph, should have a high likelihood of being included in the Update set, since they cannot be predicted well by any other nodes.
In the current light field coding scheme described in Chapter 5, we target a situation requiring efficient random access and therefore ignore prediction and transforms across different sub-aperture images. In future work, we would like to consider a more general coding scheme that exploits both intra- and inter-view correlation, which may include a block matching algorithm for sparsely distributed pixels and possible edge connections between blocks in different sub-aperture images. In the last work, in Chapter 6, we currently consider only the 1D AR process for modelling the statistics of video residuals. Therefore the graph assignment for the 2D grid graph in our experiment is still an approximation. An extension to a 2D model could be helpful in the analysis. Also, it would be interesting to consider more edge models, e.g., line edges, in future work.
Bibliography
[1] Nasir Ahmed, T Natarajan, and Kamisetty R Rao. “Discrete cosine transform”. In: IEEE transactions on Computers 100.1 (1974), pp. 90–93.
[2] Albert Cohen, Ingrid Daubechies, and J-C Feauveau. “Biorthogonal bases of compactly supported wavelets”. In: Communications on pure and applied mathematics 45.5 (1992), pp. 485–560.
[3] Alexandre Vieira, Helder Duarte, Cristian Perra, Luis Tavora, and Pedro Assuncao. “Data formats for high efficiency coding of lytro-illum light fields”. In: Image Processing Theory, Tools and Applications (IPTA), 2015 International Conference on. IEEE. 2015, pp. 494–497.
[4] An-Chao Tsai, Anand Paul, Jia-Ching Wang, and Jhing-Fa Wang. “Intensity gradient technique for efficient intra-prediction in H.264/AVC”. In: IEEE Transactions on Circuits and Systems for Video Technology 18.5 (2008), pp. 694–698.
[5] Ashok Veeraraghavan, Ramesh Raskar, Amit Agrawal, Ankit Mohan, and Jack Tumblin. “Dappled photography: Mask enhanced cameras for heterodyned light fields and coded aperture refocusing”. In: ACM Trans. Graph. 26.3 (2007), p. 69.
[6] Tom E Bishop and Paolo Favaro. “Full-resolution depth map estimation from an aliased plenoptic light field”. In: Asian Conference on Computer Vision. Springer. 2010, pp. 186–200.
[7] Changil Kim, Henning Zimmer, Yael Pritch, Alexander Sorkine-Hornung, and Markus H Gross. “Scene reconstruction from high spatio-angular resolution light fields.” In: ACM Trans. Graph. 32.4 (2013), pp. 73–1.
[8] Yung-Hsuan Chao, Gene Cheung, and Antonio Ortega. “Pre-demosaic light field image compression using graph lifting transform”. In: Image Processing (ICIP), 2017 IEEE International Conference on. IEEE. 2017, forthcoming.
[9] Charilaos Christopoulos, Athanassios Skodras, and Touradj Ebrahimi. “The JPEG2000 still image coding system: an overview”. In: IEEE transactions on consumer electronics 46.4 (2000), pp. 1103–1127.
[10] Rama Chellappa and Shankar Chatterjee. “Classification of textures using Gaussian Markov random fields”. In: IEEE Transactions on Acoustics, Speech, and Signal Processing 33.4 (1985), pp. 959–963.
[11] Chih-Chieh Chen, Yi-Chang Lu, and Ming-Shing Su. “Light field based digital refocusing using a DSLR camera with a pinhole array mask”. In: Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on. IEEE. 2010, pp. 754–757.
[12] Chin Chye Koh, Jayanta Mukherjee, and Sanjit K Mitra. “New efficient methods of image compression in digital cameras with color filter array”. In: IEEE Transactions on Consumer Electronics 49.4 (2003), pp. 1448–1456.
[13] King-Hong Chung and Yuk-Hee Chan. “A lossless compression scheme for Bayer color filter array images”. In: IEEE Transactions on Image Processing 17.2 (2008), pp. 134–144.
[14] Roger J Clarke. “Transform coding of images”. In: Astrophysics 1 (1985).
[15] Caroline Conti, Paulo Nunes, and Luís Ducla Soares. “HEVC-based light field image coding with bi-predicted self-similarity compensation”. In: Multimedia & Expo Workshops (ICMEW), 2016 IEEE International Conference on. IEEE. 2016, pp. 1–4.
[16] Caroline Conti, Paulo Nunes, and Luis Ducla Soares. “New HEVC prediction modes for 3D holoscopic video coding”. In: Image Processing (ICIP), 2012 19th IEEE International Conference on. IEEE. 2012, pp. 1325–1328.
[17] Donald G Dansereau, Oscar Pizarro, and Stefan B Williams. “Decoding, calibration and rectification for lenselet-based plenoptic cameras”. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2013, pp. 1027–1034.
[18] Ismael Daribo, Gene Cheung, and Dinei Florencio. “Arithmetic edge coding for arbitrarily shaped sub-block motion prediction in depth video compression”. In: 2012 19th IEEE International Conference on Image Processing. IEEE. 2012, pp. 1541–1544.
[19] Ismael Daribo, Dinei Florencio, and Gene Cheung. “Arbitrarily shaped motion prediction for depth video compression using arithmetic edge coding”. In: Image Processing, IEEE Transactions on 23.11 (2014), pp. 4696–4708.
[20] David I Shuman, Mohammad Javad Faraji, and Pierre Vandergheynst. “A multiscale pyramid transform for graph signals”. In: IEEE Transactions on Signal Processing 64.8 (2016), pp. 2119–2134.
[21] Dong Liu, Lizhi Wang, Li Li, Zhiwei Xiong, Feng Wu, and Wenjun Zeng. “Pseudo-sequence-based light field image compression”. In: Multimedia & Expo Workshops (ICMEW), 2016 IEEE International Conference on. IEEE. 2016, pp. 1–4.
[22] Florian Dorfler and Francesco Bullo. “Kron reduction of graphs with applications to electrical networks”. In: IEEE Transactions on Circuits and Systems I: Regular Papers 60.1 (2013), pp. 150–163.
[23] Eduardo Martinez-Enriquez, Jesus Cid-Sueiro, Fernando Diaz-De-Maria, and Antonio Ortega. “Directional Transforms for Video Coding Based on Lifting on Graphs”. In: IEEE Transactions on Circuits and Systems for Video Technology (2016).
[24] Eduardo Pavez, Hilmi E Egilmez, Yongzhe Wang, and Antonio Ortega. “GTT: Graph template transforms with applications to image coding”. In: Picture Coding Symposium (PCS), 2015. IEEE. 2015, pp. 199–203.
[25] Feng Dai, Jun Zhang, Yike Ma, and Yongdong Zhang. “Lenselet image compression scheme based on subaperture images streaming”. In: Image Processing (ICIP), 2015 IEEE International Conference on. IEEE. 2015, pp. 4733–4737.
[26] Feng Pan, Xiao Lin, SUSANTO Rahardja, Keng Pang Lim, and ZG Li. “A directional field based fast intra mode decision algorithm for H.264 video coding”. In: Multimedia and Expo, 2004. ICME’04. 2004 IEEE International Conference on. Vol. 2. IEEE. 2004, pp. 1147–1150.
[27] Giulia Fracastoro and Enrico Magli. “Predictive graph construction for image compression”. In: Image Processing (ICIP), 2015 IEEE International Conference on. IEEE. 2015, pp. 2204–2208.
[28] Godwin Shen, Woo-Shik Kim, Sunil K Narang, Antonio Ortega, Jaejoon Lee, and Hocheon Wey. “Edge-adaptive transforms for efficient depth map coding”. In: Picture Coding Symposium (PCS), 2010. IEEE. 2010, pp. 566–569.
[29] Hae-Gon Jeon, Jaesik Park, Gyeongmin Choe, Jinsun Park, Yunsu Bok, Yu-Wing Tai, and In So Kweon. “Accurate depth map estimation from a lenslet light field camera”. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015, pp. 1547–1555.
[30] Harini Priyadarshini Hariharan, Tobias Lange, and Thorsten Herfet. “Low complexity light field compression based on pseudo-temporal circular sequencing”. In: Broadband Multimedia Systems and Broadcasting (BMSB), 2017 IEEE International Symposium on. IEEE. 2017, pp. 1–5.
[31] Henrique S Malvar, Li-Wei He, and Ross Cutler. “High-quality linear interpolation for demosaicing of Bayer-patterned color images”. In: Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP’04). IEEE International Conference on. Vol. 3. IEEE. 2004, pp. iii–485.
[32] Hilmi E Egilmez, Amir Said, Yung-Hsuan Chao, and Antonio Ortega. “Graph-based transforms for inter predicted video coding”. In: Image Processing (ICIP), 2015 IEEE International Conference on. IEEE. 2015, pp. 3992–3996.
[33] Hilmi E Egilmez, Eduardo Pavez, and Antonio Ortega. “Graph learning from data under structural and laplacian constraints”. In: arXiv preprint arXiv:1611.05181 (2016).
[34] Hilmi E Egilmez, Yung-Hsuan Chao, Antonio Ortega, Bumshik Lee, and Sehoon Yea. “GBST: Separable transforms based on line graphs for predictive video coding”. In: Image Processing (ICIP), 2016 IEEE International Conference on. IEEE. 2016, pp. 2375–2379.
[35] Hiroyuki Takeda, Sina Farsiu, and Peyman Milanfar. “Kernel regression for image processing and reconstruction”. In: IEEE Transactions on image processing 16.2 (2007), pp. 349–366.
[36] Wei Hu, Gene Cheung, and Antonio Ortega. “Intra-prediction and generalized graph Fourier transform for image coding”. In: IEEE Signal Processing Letters 22.11 (2015), pp. 1913–1917.
[37] Chiuan Hwang, Shin Shan Zhuang, and Shang-Hong Lai. “Efficient intra mode selection using image structure tensor for H.264/AVC”. In: Image Processing, 2007. ICIP 2007. IEEE International Conference on. Vol. 5. IEEE. 2007, pp. V–289.
[38] Jean-Luc Starck, Emmanuel J Candès, and David L Donoho. “The curvelet transform for image denoising”. In: IEEE Transactions on image processing 11.6 (2002), pp. 670–684.
[39] Ming Ji and Jiawei Han. “A Variance Minimization Criterion to Active Learning on Graphs.” In: AISTATS. 2012, pp. 556–564.
[40] Wei Jiang, Hanjie Ma, and Yaowu Chen. “Gradient based fast mode decision algorithm for intra prediction in HEVC”. In: Consumer Electronics, Communications and Networks (CECNet), 2012 2nd International Conference on. IEEE. 2012, pp. 1836–1840.
[41] Mladen Kolar and Eric P Xing. “Consistent covariance selection from data with missing values”. In: Proceedings of the 29th International Conference on Machine Learning (ICML-12). 2012, pp. 551–558.
[42] Erwan Le Pennec and Stephane Mallat. “Bandelet image approximation and compression”. In: Multiscale Modeling & Simulation 4.3 (2005), pp. 992–1039.
[43] Sang-Yong Lee and Antonio Ortega. “A Novel Approach for Compression of Images Captured using Bayer Color Filter Arrays”. In: arXiv preprint arXiv:0903.2272 (2009).
[44] Sang-Yong Lee and Antonio Ortega. “A novel approach of image compression in digital cameras with a Bayer color filter array”. In: Image Processing, 2001. Proceedings. 2001 International Conference on. Vol. 3. IEEE. 2001, pp. 482–485.
[45] Marc Levoy and Pat Hanrahan. “Light field rendering”. In: Proceedings of the 23rd annual conference on Computer graphics and interactive techniques. ACM. 1996, pp. 31–42.
[46] Light-Field Image Dataset. http://mmspg.epfl.ch/EPFL-light-field-image-dataset.
[47] Light Field Toolbox v0.4. https://www.mathworks.com/matlabcentral/fileexchange/49683-light-field-toolbox-v0-4.
[48] Li Li, Zhu Li, Bin Li, Dong Liu, and Houqiang Li. “Pseudo Sequence Based 2-D Hierarchical Coding Structure for Light-Field Image Compression”. In: Data Compression Conference (DCC), 2017. IEEE. 2017, pp. 131–140.
[49] Karim Lounici et al. “High-dimensional covariance matrix estimation with missing observations”. In: Bernoulli 20.3 (2014), pp. 1029–1058.
[50] Lytro Illum. https://illum.lytro.com/.
[51] Makan Fardad, Fu Lin, and Mihailo R Jovanović. “Algorithms for leader selection in large dynamical networks: Noise-free leaders”. In: 2011 50th IEEE Conference on Decision and Control and European Control Conference. IEEE. 2011, pp. 7188–7193.
[52] Eduardo Martínez Enríquez. “Lifting transforms on graphs and their application to video coding”. PhD dissertation. Universidad Carlos III de Madrid, 2013.
[53] Eduardo Martínez-Enríquez, Fernando Díaz-de María, and Antonio Ortega. “Video encoder based on lifting transforms on graphs”. In: 2011 18th IEEE International Conference on Image Processing. IEEE. 2011, pp. 3509–3512.
[54] Eduardo Martínez-Enríquez and Antonio Ortega. “Lifting transforms on graphs for video coding”. In: 2011 Data Compression Conference. IEEE. 2011, pp. 73–82.
[55] Mozhdeh Seifi, Neus Sabater, Valter Drazic, and Patrick Perez. “Disparity-guided demosaicking of light field images”. In: Image Processing (ICIP), 2014 IEEE International Conference on. IEEE. 2014, pp. 5482–5486.
[56] Sunil K Narang and Antonio Ortega. “Lifting based wavelet transforms on graphs”. In: Proceedings: APSIPA ASC 2009: Asia-Pacific Signal and Information Processing Association, 2009 Annual Summit and Conference. 2009, pp. 441–444.
[57] Sunil K Narang and Antonio Ortega. “Local two-channel critically sampled filter-banks on graphs”. In: 2010 IEEE International Conference on Image Processing. IEEE. 2010, pp. 333–336.
[58] Sunil K Narang and Antonio Ortega. “Perfect reconstruction two-channel wavelet filter banks for graph structured data”. In: IEEE Transactions on Signal Processing 60.6 (2012), pp. 2786–2799.
[59] Ha Q Nguyen and Minh N Do. “Downsampling of signals on graphs via maximum spanning trees”. In: IEEE Transactions on Signal Processing 63.1 (2015), pp. 182–191.
[60] Eduardo Pavez and Antonio Ortega. “Generalized Laplacian precision matrix estimation for graph signal processing”. In: Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on. IEEE. 2016, pp. 6350–6354.
[61] Maria Petrou and Josef Kittler. “Optimal edge detectors for ramp edges”. In: IEEE Transactions on Pattern Analysis & Machine Intelligence 5 (1991), pp. 483–491.
[62] Raytrix Camera. https://www.raytrix.de/.
[63] Ren Ng, Marc Levoy, Mathieu Brédif, Gene Duval, Mark Horowitz, and Pat Hanrahan. “Light field photography with a hand-held plenoptic camera”. In: Computer Science Technical Report CSTR 2.11 (2005), pp. 1–11.
[64] Martin Řeřábek and Touradj Ebrahimi. “New light field image dataset”. In: 8th International Conference on Quality of Multimedia Experience (QoMEX). EPFL-CONF-218363. 2016.
[65] Ricardo Monteiro, Luís Lucas, Caroline Conti, Paulo Nunes, Nuno Rodrigues, Sérgio Faria, Carla Pagliari, Eduardo da Silva, and Luís Soares. “Light field HEVC-based image coding using locally linear embedding and self-similarity compensated prediction”. In: Multimedia & Expo Workshops (ICMEW), 2016 IEEE International Conference on. IEEE. 2016, pp. 1–4.
[66] Amir Said and William A Pearlman. “Low-complexity waveform coding via alphabet and sample-set partitioning”. In: Electronic Imaging’97. International Society for Optics and Photonics. 1997, pp. 25–37.
[67] Godwin Shen and Antonio Ortega. “Optimized distributed 2D transforms for irregularly sampled sensor network grids using wavelet lifting”. In: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE. 2008, pp. 2513–2516.
[68] Godwin Shen and Antonio Ortega. “Tree-based wavelets for image coding: Orthogonalization and tree selection”. In: Picture Coding Symposium, 2009. PCS 2009. IEEE. 2009, pp. 1–4.
[69] Shengyang Zhao, Zhibo Chen, Kun Yang, and Hongru Huangi. “Light field image coding with hybrid scan order”. In: Visual Communications and Image Processing (VCIP), 2016. IEEE. 2016, pp. 1–4.
[70] Jianbo Shi and Jitendra Malik. “Normalized cuts and image segmentation”. In: IEEE Transactions on pattern analysis and machine intelligence 22.8 (2000), pp. 888–905.
[71] Nicolas Städler and Peter Bühlmann. “Missing values: sparse inverse covariance estimation and an extension to sparse regression”. In: Statistics and Computing 22.1 (2012), pp. 219–235.
[72] Steven J Gortler, Radek Grzeszczuk, Richard Szeliski, and Michael F Cohen. “The lumigraph”. In: Proceedings of the 23rd annual conference on Computer graphics and interactive techniques. ACM. 1996, pp. 43–54.
[73] David Taubman and Michael Marcellin. JPEG2000 image compression fundamentals, standards and practice: image compression fundamentals, standards and practice. Vol. 642. Springer Science & Business Media, 2012.
[74] Bryan Usevitch. “Optimal bit allocation for biorthogonal wavelet coding”. In: Data Compression Conference, 1996. DCC’96. Proceedings. IEEE. 1996, pp. 387–395.
[75] Wei-Chao Chen, Jean-Yves Bouguet, Michael H Chu, and Radek Grzeszczuk. “Light field mapping: efficient representation and hardware rendering of surface light fields”. In: ACM Transactions on Graphics (TOG) 21.3 (2002), pp. 447–456.
[76] Wei Hu, Gene Cheung, Antonio Ortega, and Oscar C Au. “Multiresolution graph fourier transform for compression of piecewise smooth images”. In: IEEE Transactions on Image Processing 24.1 (2015), pp. 419–433.
[77] Woo-Shik Kim, Sunil K Narang, and Antonio Ortega. “Graph based transforms for depth video coding”. In: 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE. 2012, pp. 813–816.
[78] Xiaowen Dong, Dorina Thanou, Pascal Frossard, and Pierre Vandergheynst. “Learning laplacian matrix in smooth graph signal representations”. In: IEEE Transactions on Signal Processing 64.23 (2016), pp. 6160–6173.
[79] Shan Xu, Zhi-Liang Zhou, and Nicholas Devaney. “Multi-view Image Restoration from Plenoptic Raw Images”. In: Asian Conference on Computer Vision. Springer. 2014, pp. 3–15.
[80] Yung-Hsuan Chao, Antonio Ortega, and Sehoon Yea. “Graph-based lifting transform for intra-predicted video coding”. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE. 2016, pp. 1140–1144.
[81] Yung-Hsuan Chao, Hilmi E Egilmez, Antonio Ortega, Sehoon Yea, and Bumshik Lee. “Edge adaptive graph-based transforms: Comparison of step/ramp edge models for video compression”. In: Image Processing (ICIP), 2016 IEEE International Conference on. IEEE. 2016, pp. 1539–1543.
[82] Yunsu Bok, Hae-Gon Jeon, and In So Kweon. “Geometric calibration of micro-lens-based light field cameras using line features”. In: IEEE transactions on pattern analysis and machine intelligence 39.2 (2017), pp. 287–300.
[83] Bing Zeng and Jingjing Fu. “Directional discrete cosine transforms—a new framework for image coding”. In: IEEE transactions on circuits and systems for video technology 18.3 (2008), pp. 305–313.
[84] Cha Zhang and Dinei Florêncio. “Analyzing the optimality of predictive transform coding using graph-based models”. In: IEEE Signal Processing Letters 20.1 (2013), pp. 106–109.
[85] Cha Zhang, Dinei Florêncio, and Philip A Chou. “Graph signal processing–a probabilistic framework”. In: Microsoft Res., Redmond, WA, USA, Tech. Rep. MSR-TR-2015-31 (2015).
[86] Fuzhen Zhang. The Schur complement and its applications. Vol. 4. Springer Science & Business Media, 2006.
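Appendix A
Reconnection using Kron Reduction
In this appendix we prove that the prediction of any node $v$ in the Prediction set ($\mathcal{P}$) using the proposed predictor scheme, namely applying the generalized CDF 5/3 filterbanks after graph reconnection, is equivalent to the maximum a posteriori (MAP) estimate of $v$ from $\mathcal{U}$, assuming the signal can be modeled as a GMRF defined by the graph structure. Define $m$ and $n$ as the sizes of $\mathcal{P}$ and $\mathcal{U}$, and the sets $\mathcal{P}^- = \mathcal{P} \setminus \{v\}$ and $\mathcal{U}^+ = \mathcal{U} \cup \{v\}$. Without loss of generality, the indices of nodes in $\mathcal{U}^+$ are ordered as $[v, \mathcal{U}]$, and the indices of nodes in $\mathcal{P}$ are ordered as $[v, \mathcal{P}^-]$. The MAP estimate of $\mathcal{P}$ given $\mathcal{U}$ is calculated as
\[
\mathbf{P}_{\mathcal{P}|\mathcal{U}}\, f_{\mathcal{U}} = -L_{\mathcal{P},\mathcal{P}}^{-1} L_{\mathcal{P},\mathcal{U}}\, f_{\mathcal{U}}. \tag{A.1}
\]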
We can rewrite $L_{\mathcal{P},\mathcal{P}}$ and $L_{\mathcal{P},\mathcal{U}}$ in block matrix form:
\[
L_{\mathcal{P},\mathcal{P}} =
\begin{bmatrix}
\deg(v) + h_v & L_{v,\mathcal{P}^-} \\
L_{\mathcal{P}^-,v} & L_{\mathcal{P}^-,\mathcal{P}^-}
\end{bmatrix} \tag{A.2}
\]
\[
L_{\mathcal{P},\mathcal{U}} =
\begin{bmatrix}
L_{v,\mathcal{U}} \\
L_{\mathcal{P}^-,\mathcal{U}}
\end{bmatrix}, \tag{A.3}
\]
where $\deg(v)$ and $h_v$ are the degree and self loop of node $v$. Defining $c = L_{\mathcal{P}^-,v}$, $b^T = L_{v,\mathcal{P}^-}$, and $s_v = \deg(v) + h_v$, the matrix in (A.2) can be rewritten as
\[
L_{\mathcal{P},\mathcal{P}} =
\begin{bmatrix}
s_v & b^T \\
c & L_{\mathcal{P}^-,\mathcal{P}^-}
\end{bmatrix}. \tag{A.4}
\]
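The inverse can be calculated using block-wise matrix inversion:
\[
L_{\mathcal{P},\mathcal{P}}^{-1} =
\begin{bmatrix}
(s_v - b^T L_{\mathcal{P}^-,\mathcal{P}^-}^{-1} c)^{-1} & -s_v^{-1} b^T (L_{\mathcal{P}^-,\mathcal{P}^-} - c\, s_v^{-1} b^T)^{-1} \\
-L_{\mathcal{P}^-,\mathcal{P}^-}^{-1} c\, (s_v - b^T L_{\mathcal{P}^-,\mathcal{P}^-}^{-1} c)^{-1} & (L_{\mathcal{P}^-,\mathcal{P}^-} - c\, s_v^{-1} b^T)^{-1}
\end{bmatrix}. \tag{A.5}
\]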
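The MAP estimate of $v$ given $\mathcal{U}$, which corresponds to the first row of $\mathbf{P}_{\mathcal{P}|\mathcal{U}}$, is therefore expressed as
\[
\begin{aligned}
\mathbf{P}_{\mathcal{P}|\mathcal{U}}(1,:) &= -L_{\mathcal{P},\mathcal{P}}^{-1}(1,:)\, L_{\mathcal{P},\mathcal{U}} \\
&= -\begin{bmatrix} (s_v - b^T L_{\mathcal{P}^-,\mathcal{P}^-}^{-1} c)^{-1} & -s_v^{-1} b^T (L_{\mathcal{P}^-,\mathcal{P}^-} - c\, s_v^{-1} b^T)^{-1} \end{bmatrix}
\begin{bmatrix} L_{v,\mathcal{U}} \\ L_{\mathcal{P}^-,\mathcal{U}} \end{bmatrix} \\
&= -(s_v - b^T L_{\mathcal{P}^-,\mathcal{P}^-}^{-1} c)^{-1} L_{v,\mathcal{U}} + s_v^{-1} b^T (L_{\mathcal{P}^-,\mathcal{P}^-} - c\, s_v^{-1} b^T)^{-1} L_{\mathcal{P}^-,\mathcal{U}}.
\end{aligned} \tag{A.6}
\]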
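Defining the constant $q$ as
\[
q = -(s_v - b^T L_{\mathcal{P}^-,\mathcal{P}^-}^{-1} c), \tag{A.7}
\]
(A.6) can be further simplified as
\[
\begin{aligned}
& q^{-1} L_{v,\mathcal{U}} + s_v^{-1} b^T (L_{\mathcal{P}^-,\mathcal{P}^-} - c\, s_v^{-1} b^T)^{-1} L_{\mathcal{P}^-,\mathcal{U}} \\
&= q^{-1} L_{v,\mathcal{U}} + q^{-1} q\, s_v^{-1} b^T (L_{\mathcal{P}^-,\mathcal{P}^-} - c\, s_v^{-1} b^T)^{-1} L_{\mathcal{P}^-,\mathcal{U}} \\
&= q^{-1} \big( L_{v,\mathcal{U}} + q\, s_v^{-1} b^T (L_{\mathcal{P}^-,\mathcal{P}^-} - c\, s_v^{-1} b^T)^{-1} L_{\mathcal{P}^-,\mathcal{U}} \big) \\
&= q^{-1} \big( L_{v,\mathcal{U}} - (s_v - b^T L_{\mathcal{P}^-,\mathcal{P}^-}^{-1} c)\, s_v^{-1} b^T (L_{\mathcal{P}^-,\mathcal{P}^-} - c\, s_v^{-1} b^T)^{-1} L_{\mathcal{P}^-,\mathcal{U}} \big) \\
&= q^{-1} \big( L_{v,\mathcal{U}} - (b^T - s_v^{-1} b^T L_{\mathcal{P}^-,\mathcal{P}^-}^{-1} c\, b^T) (L_{\mathcal{P}^-,\mathcal{P}^-} - c\, s_v^{-1} b^T)^{-1} L_{\mathcal{P}^-,\mathcal{U}} \big) \\
&= q^{-1} \big( L_{v,\mathcal{U}} - b^T (I - s_v^{-1} L_{\mathcal{P}^-,\mathcal{P}^-}^{-1} c\, b^T) (L_{\mathcal{P}^-,\mathcal{P}^-} - s_v^{-1} c\, b^T)^{-1} L_{\mathcal{P}^-,\mathcal{U}} \big) \\
&= q^{-1} \big( L_{v,\mathcal{U}} - b^T (I - s_v^{-1} L_{\mathcal{P}^-,\mathcal{P}^-}^{-1} c\, b^T) (L_{\mathcal{P}^-,\mathcal{P}^-} - L_{\mathcal{P}^-,\mathcal{P}^-} L_{\mathcal{P}^-,\mathcal{P}^-}^{-1} s_v^{-1} c\, b^T)^{-1} L_{\mathcal{P}^-,\mathcal{U}} \big) \\
&= q^{-1} \big( L_{v,\mathcal{U}} - b^T (I - s_v^{-1} L_{\mathcal{P}^-,\mathcal{P}^-}^{-1} c\, b^T) \big( L_{\mathcal{P}^-,\mathcal{P}^-} (I - s_v^{-1} L_{\mathcal{P}^-,\mathcal{P}^-}^{-1} c\, b^T) \big)^{-1} L_{\mathcal{P}^-,\mathcal{U}} \big) \\
&= q^{-1} \big( L_{v,\mathcal{U}} - b^T (I - s_v^{-1} L_{\mathcal{P}^-,\mathcal{P}^-}^{-1} c\, b^T) (I - s_v^{-1} L_{\mathcal{P}^-,\mathcal{P}^-}^{-1} c\, b^T)^{-1} L_{\mathcal{P}^-,\mathcal{P}^-}^{-1} L_{\mathcal{P}^-,\mathcal{U}} \big) \\
&= q^{-1} \big( L_{v,\mathcal{U}} - b^T L_{\mathcal{P}^-,\mathcal{P}^-}^{-1} L_{\mathcal{P}^-,\mathcal{U}} \big).
\end{aligned} \tag{A.8}
\]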
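In our design of the prediction transform, we apply the generalized CDF 5/3 after reconnecting node $v \in \mathcal{P}$ to $\mathcal{U} = [v_1, v_2, \cdots, v_n]$. The reconnection is derived with Kron reduction by removing the nodes in $\mathcal{P}^- = \mathcal{P} \setminus \{v\}$. We define the weights on the links between $v$ and $[v_1, v_2, \cdots, v_n]$ after reconnection as $[w_{v,v_1}, w_{v,v_2}, \cdots, w_{v,v_n}]$. The graph Laplacian $L_{kron}$ after removing the nodes in $\mathcal{P}^-$ can be written as
\[
L_{kron} = L_{\mathcal{U}^+,\mathcal{U}^+} - L_{\mathcal{U}^+,\mathcal{P}^-} L_{\mathcal{P}^-,\mathcal{P}^-}^{-1} L_{\mathcal{P}^-,\mathcal{U}^+}. \tag{A.9}
\]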
The first row of $L_{kron}$ provides the information for connecting $v$ and $\mathcal{U}$ and the associated weight on each link. The matrices $L_{\mathcal{U}^+,\mathcal{U}^+}$, $L_{\mathcal{U}^+,\mathcal{P}^-}$, and $L_{\mathcal{P}^-,\mathcal{U}^+}$ can be expressed in block matrix form as
\[
L_{\mathcal{U}^+,\mathcal{U}^+} =
\begin{bmatrix}
s_v & L_{v,\mathcal{U}} \\
L_{\mathcal{U},v} & L_{\mathcal{U},\mathcal{U}}
\end{bmatrix}, \tag{A.10}
\]
\[
L_{\mathcal{U}^+,\mathcal{P}^-} =
\begin{bmatrix}
L_{v,\mathcal{P}^-} \\
L_{\mathcal{U},\mathcal{P}^-}
\end{bmatrix}
=
\begin{bmatrix}
b^T \\
L_{\mathcal{U},\mathcal{P}^-}
\end{bmatrix}, \tag{A.11}
\]
and
\[
L_{\mathcal{P}^-,\mathcal{U}^+} =
\begin{bmatrix}
L_{\mathcal{P}^-,v} & L_{\mathcal{P}^-,\mathcal{U}}
\end{bmatrix}
=
\begin{bmatrix}
c & L_{\mathcal{P}^-,\mathcal{U}}
\end{bmatrix}. \tag{A.12}
\]
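The first row of $L_{kron}$ can therefore be written as
\[
\begin{aligned}
L_{kron}(1,:) &= \begin{bmatrix} s_v & L_{v,\mathcal{U}} \end{bmatrix} - b^T L_{\mathcal{P}^-,\mathcal{P}^-}^{-1} \begin{bmatrix} c & L_{\mathcal{P}^-,\mathcal{U}} \end{bmatrix} \\
&= \begin{bmatrix} s_v - b^T L_{\mathcal{P}^-,\mathcal{P}^-}^{-1} c & \;\; L_{v,\mathcal{U}} - b^T L_{\mathcal{P}^-,\mathcal{P}^-}^{-1} L_{\mathcal{P}^-,\mathcal{U}} \end{bmatrix}.
\end{aligned} \tag{A.13}
\]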
The first term, $s_v - b^T L_{\mathcal{P}^-,\mathcal{P}^-}^{-1} c$, is the sum of the degree and self loop of node $v$ after the reconnection, which is equal to the negative of the constant $q$ defined in (A.7). The negative of the $1 \times n$ vector $L_{v,\mathcal{U}} - b^T L_{\mathcal{P}^-,\mathcal{P}^-}^{-1} L_{\mathcal{P}^-,\mathcal{U}}$, on the other hand, stores the weights $[w_{v,v_1}, w_{v,v_2}, \cdots, w_{v,v_n}]$ on the links between $v$ and $\mathcal{U}$. As mentioned in Sections 2.4.1 and 5.5.3, the prediction of node $v$, denoted as $\hat{f}_v$, using CDF 5/3 is expressed as
\[
\hat{f}_v = \frac{1}{\deg(v) + h_v} \sum_{v_j \in \mathcal{U}} w_{v,v_j} f_{v_j}
= \frac{1}{\deg(v) + h_v} [w_{v,v_1}, w_{v,v_2}, \cdots, w_{v,v_n}]\, f_{\mathcal{U}}, \tag{A.14}
\]
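where $\deg(v)$, $h_v$, and $w_{v,v_j}$ are the degree, self loop, and weights of the links connecting $v$ in the graph after reconnection. Replacing these variables with the representation derived in (A.13), (A.14) can be written as
\[
\begin{aligned}
\hat{f}_v &= -(s_v - b^T L_{\mathcal{P}^-,\mathcal{P}^-}^{-1} c)^{-1} (L_{v,\mathcal{U}} - b^T L_{\mathcal{P}^-,\mathcal{P}^-}^{-1} L_{\mathcal{P}^-,\mathcal{U}})\, f_{\mathcal{U}} \\
&= q^{-1} (L_{v,\mathcal{U}} - b^T L_{\mathcal{P}^-,\mathcal{P}^-}^{-1} L_{\mathcal{P}^-,\mathcal{U}})\, f_{\mathcal{U}},
\end{aligned} \tag{A.15}
\]
which is the MAP estimator row in (A.8) applied to $f_{\mathcal{U}}$. We have therefore proved that the proposed prediction with the generalized CDF 5/3 after reconnection is equivalent to the MAP estimation of the underlying GMRF.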
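The equivalence proved above can also be checked numerically. The sketch below is a sanity check rather than part of the proof: it builds a small random graph with positive self loops (hypothetical test data, with an arbitrary choice of $\mathcal{P}$, $\mathcal{U}$ and $v$), computes the MAP row of (A.1) and the normalized Kron-reduced weights of (A.13) and (A.14), and compares them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random weighted graph with positive self loops (hypothetical test data).
n_nodes = 8
W = np.triu(rng.uniform(0.1, 1.0, size=(n_nodes, n_nodes)), 1)
W = W + W.T                                  # symmetric weights, zero diagonal
h = rng.uniform(0.1, 0.5, size=n_nodes)      # self-loop weights
L = np.diag(W.sum(axis=1) + h) - W           # generalized graph Laplacian

P = [0, 1, 2, 3]                             # Prediction set, with v = P[0]
U = [4, 5, 6, 7]                             # Update set
v, P_minus = P[0], P[1:]
U_plus = [v] + U

# MAP estimator of f_P given f_U, as in (A.1): -L_PP^{-1} L_PU.
map_matrix = -np.linalg.solve(L[np.ix_(P, P)], L[np.ix_(P, U)])
map_row_v = map_matrix[0]

# Kron reduction removing the nodes in P^-, as in (A.9).
L_kron = L[np.ix_(U_plus, U_plus)] - L[np.ix_(U_plus, P_minus)] @ np.linalg.solve(
    L[np.ix_(P_minus, P_minus)], L[np.ix_(P_minus, U_plus)])

# First row of L_kron, as in (A.13): [degree + self loop, -link weights].
s_kron = L_kron[0, 0]
weights = -L_kron[0, 1:]
pred_row_v = weights / s_kron                # predictor of (A.14)

f_U = rng.normal(size=len(U))
print(map_row_v @ f_U, pred_row_v @ f_U)
print(np.allclose(map_row_v, pred_row_v))
```

Using `np.linalg.solve` instead of forming explicit inverses is a numerical convenience only; the two printed predictions should agree up to floating-point error.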
Abstract
In this Ph.D. dissertation, we discuss several graph-based algorithms for transform coding in image and video compression applications. Graphs are generic data structures that are useful in representing signals in various applications. Different from the classic transforms such as Discrete Cosine Transform (DCT) and Discrete Wavelet Transform (DWT), graphs can represent signals on irregular and high dimensional domains, e.g. social networks, sensor networks. For regular signals such as images and videos, graphs can adapt to local characteristics such as edges and therefore provide more flexibility than conventional transforms. A frequency interpretation for signal on graphs can be derived using the Graph Fourier Transform (GFT). By properly adjusting the graph structure, e.g. connectivity and weights, based on signal characteristics, the GFT can provide compact representations even for signals with discontinuities. However, the GFT has high implementation complexity, making it less applicable in signals of large size, e.g. video sequences. In our work, we develop a transform coding scheme based on a low complexity lifting transform on graph. More specifically, we focus on two important problems in the design of a lifting transform, namely, the design of bipartition and the bipartite graph approximation. The two parts are optimized in terms of energy compaction for Gaussian Markov Random Field (GMRF), which has been widely utilized in modeling the statistics of image data. ❧ As application, we consider two types of multimedia signals, including both regular and irregularly distributed signals. Among the first type of signal, we consider the compression of intra-predicted video residuals, which is regular with pixels residing on the 2D grid. However, these signals contain significant edge structures, which cannot be efficiently represented with existing transform coding standards. 
With the proposed graph lifting transform based on local edges, we demonstrate significant gains as compared to the state of the art DCT based coding, with comparable performance to that achieved by the high complexity GFT. We also discuss different types of edge models for video residuals and propose a new model for ramp edges, which shows promising results in GFT, as compared to the conventional step edge model. As a second type of signal, we propose a coding scheme for non-demosaicked light field images. Similar to the traditional digital camera, a light field camera captures color information using a photo sensor embedded with a color filter array (CFA). On the captured image, each pixel contains one single color component (out of R,G, and B) which are distributed based on Bayer pattern. However, through the conversion to an array of sub-aperture images, which is a representation commonly used for light field processing and display, the distribution of Bayer pattern no longer holds and pixels of each color component are distributed irregularly in space. In order to compress such data, a conventional scheme using DCT requires demosaicking during conversion, which highly increases the amount of data for coding. With a graph based approach, the original signal can be efficiently encoded without any pre-processing step, avoiding the redundancies introduced by demosaicking. We also discuss an intra-prediction algorithm and optimal graph construction for irregularly spaced pixels. The results using the proposed scheme with graph based lifting transform show huge gains in compression as compared to DCT based coding in high bit rates, which are critical for archival scenario and instant camera storage.
Conceptually similar
Efficient transforms for graph signals with applications to video coding
Graph-based models and transforms for signal/data processing with applications to video coding
Lifting transforms on graphs: theory and applications
Efficient coding techniques for high definition video
Scalable sampling and reconstruction for graph signals
Sampling theory for graph signals with applications to semi-supervised learning
Distributed source coding for image and video applications
Human activity analysis with graph signal processing techniques
Estimation of graph Laplacian and covariance matrices
Human motion data analysis and compression using graph based techniques
Critically sampled wavelet filterbanks on graphs
Techniques for compressed visual data quality assessment and advanced video coding
Efficient graph learning: theory and performance evaluation
Application-driven compressed sensing
Advanced knowledge graph embedding techniques: theory and applications
Random access to compressed volumetric data
Advanced techniques for green image coding via hierarchical vector quantization
Novel algorithms for large scale supervised and one class learning
Advanced machine learning techniques for video, social and biomedical data analytics
Robust video transmission in erasure networks with network coding
Asset Metadata
Creator
Chao, Yung-Hsuan (author)
Core Title
Compression of signal on graphs with the application to image and video coding
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Electrical Engineering
Publication Date
11/10/2017
Defense Date
09/05/2017
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
graph signal processing, image processing, OAI-PMH Harvest, transform coding, video coding
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Ortega, Antonio (committee chair), Govindan, Ramesh (committee member), Kuo, C.-C. Jay (committee member)
Creator Email
shuya802@gmail.com,yunghsuc@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c40-455321
Unique identifier
UC11264076
Identifier
etd-ChaoYungHs-5893.pdf (filename), usctheses-c40-455321 (legacy record id)
Legacy Identifier
etd-ChaoYungHs-5893.pdf
Dmrecord
455321
Document Type
Dissertation
Rights
Chao, Yung-Hsuan
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA