ACTIVE STATE TRACKING IN HETEROGENEOUS SENSOR NETWORKS

by

Daphney–Stavroula Zois

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)

August 2014

Copyright 2014 Daphney–Stavroula Zois

Dedication

To my husband for his unconditional love, constant support and patience. To my parents and brother for their love and continuous encouragement.

Acknowledgements

Over the past years, I have received support, feedback and encouragement from various individuals. This dissertation would not have been possible without their help in so many ways, and I would like to take this opportunity to acknowledge their part.

First of all, I would like to express my deepest gratitude and sincere appreciation to my advisor, Dr. Urbashi Mitra, for her guidance, care and patience. On numerous occasions, she has given me opportunities for learning and independent thinking, and she has believed in me. Without her support, encouragement and consistent feedback, this dissertation would not have been possible. I would also like to thank my qualifying and dissertation committee members, Dr. Rahul Jain, Dr. Shrikanth Narayanan, Dr. Michael Neely, Dr. Shaddin Dughmi, and Dr. Milind Tambe, for their intriguing questions and insightful suggestions that helped me complete this thesis.

I would also like to thank the KNOWME network team members: Dr. Murali Annavaram, Dr. Urbashi Mitra, Dr. Shrikanth Narayanan, Dr. Donna Spruijt–Metz, Dr. Gaurav Sukhatme, Dr. Adar Emken, Dr. Gautam Thatte, Dr. Ming Li, Dr. Harshvardhan Vathsangam and Sangwon Lee, who have shared their excitement, expertise and experience with me on multiple occasions.

Besides the individuals mentioned above, I would like to acknowledge the contribution of the following people. I cannot express how indebted I am to Dr. George Moustakides. He has been an amazing mentor, an excellent teacher and one of the smartest people that I know.
I hope that I can be as knowledgeable, enthusiastic and energetic as he is. I have also gained a lot from my interactions with Dr. Giuseppe Caire, his classes on Random Processes and Error Correcting Codes, and his depth of knowledge and clarity of presentation. I was very lucky to interact with Dr. Robert Scholtz, first by being a teaching assistant in his Random Processes class and later on by being a student in his Advanced Random Processes class. I have also benefited from interactions with Dr. Bhaskar Krishnamachari, Dr. Tara Javidi and Dr. Ashutosh Nayyar. Finally, I would like to mention Dr. Marco Levorato, Dr. Geoff Hollinger, Dr. Nicolo Michelusi and Dr. Emrah Akyol for their critical feedback at several stages of my PhD.

Life at USC would have been much more difficult were it not for Diane Demetras, Gerrielyn Ramos, Janice Thompson, Anita Fung, Tim Boston, Corine Wong, and Susan Wiedem. They were always available to help and answer all my questions. I would also like to mention the support and help of Margery Berti and Tracy Charles during the first year of my studies. I sincerely appreciate and thank you all.

I cannot describe the hard work and dedication needed to complete a PhD degree. During this tough journey, full of surprises, I would not have made it without the constant support, encouragement and help of certain friends at USC. Many thanks go to Sunav, Ozgun, Srinivas, Hao, Chiru, Songze, Sajjad, Kuan–Wen, for all the discussions, arguments and fun that we had together. The same goes to the Greek gang at USC: Orestis, Theodora, Nikos, Pavlos, Christodoulos, Megas and Melina. Thank you for keeping me sane, listening to my complaints and always providing me with a way out. I will never forget the time we spent together.

And last, but most importantly, I would like to express my deepest gratitude to my family. I would like to thank my parents, Dr. Dimitrios Zois and Dr. Polyxeni Stathopoulou, and my brother, Vasileios Zois, for their continuous understanding and encouragement, their love and affection.
Many thanks also go to my parents–in–law and my sister–in–law for always finding ways to make me smile. In particular, I would like to thank my husband, Dr. Charalampos Chelmis, for holding my hand all these years, enduring this long process with me and always offering me his support and love. I am forever indebted to you for talking me right off the cliff. Last but not least, I would like to mention our cat, Mushroom, for the serenity, happiness and love he has offered me through the years.

Table of Contents

Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
  1.1 Related Work
  1.2 Motivating Application
  1.3 Thesis Outline
Chapter 2: Energy–Efficient Heterogeneous Sensor Selection in WBANs
  2.1 Contributions
  2.2 Related Work
  2.3 Problem Formulation
    2.3.1 Stochastic Model
    2.3.2 Performance Measures
    2.3.3 Partial Observability & Sufficient Statistics
    2.3.4 Optimization Problem
  2.4 Optimal Sensor Selection
  2.5 Approximation Schemes
    2.5.1 MIC–T3S Algorithm
    2.5.2 Constrained Approximations
  2.6 Case Study: The KNOWME Network
    2.6.1 Simulations Framework, Baseline and Metrics
    2.6.2 Numerical Results
    2.6.3 Comparison
    2.6.4 Effect of the transition matrix
    2.6.5 Individual State Detection Error
  2.7 Concluding Remarks
Chapter 3: Active Classification: a Unified Framework
  3.1 Related Work and Contributions
  3.2 Problem Statement
    3.2.1 System Model
    3.2.2 Innovations Representation of System Model
  3.3 System State Estimator
    3.3.1 Kalman–like Estimator
    3.3.2 Filter Performance
    3.3.3 Standard KF versus Markov chain Kalman–like filter
  3.4 Optimal Control Policy Design
    3.4.1 Perfect State Information Reformulation & DP Algorithm
    3.4.2 Sufficient Statistic & New DP Algorithm
  3.5 Greedy Fisher Information Sensor Selection
    3.5.1 Fisher Information
    3.5.2 Discrete Fisher Information
    3.5.3 GFIS2 Algorithm
  3.6 Smoothing Estimators
  3.7 Numerical Example
  3.8 Concluding Remarks
Chapter 4: Active Classification with Sensing Costs
  4.1 Related Work and Contributions
  4.2 Problem Definition
    4.2.1 Optimization Problem
  4.3 Optimal Sensing Strategy
  4.4 Main Results
    4.4.1 Structural Properties
    4.4.2 Passive versus Active Sensing
  4.5 Low Complexity Strategies
    4.5.1 Myopic Strategy
    4.5.2 CE–WWLB Strategy
  4.6 Numerical Results
  4.7 Concluding Remarks
Chapter 5: Conclusions and Future Work
  5.1 Summary
  5.2 Future Directions
References

List of Tables

2.1 Energy gains achieved by DP, MIC–T3S, E2MBADP and GME2PS2 for varying values of N when detection accuracy is set equal to EA's accuracy.
2.2 Individual state detection error for different weighting schemes.
3.1 Average detection accuracy for different control policies (A: ACC mean, 1 sample; B: ACC variance, 1 sample; Γ: ECG period, 1 sample).
3.2 Average filtering and smoothing detection accuracy under DP policy.
3.3 MSE and detection accuracy comparison between DP and GFIS2 algorithms.

List of Figures

1.1 Ambient intelligence schematic.
1.2 Learning process.
1.3 Active state tracking model.
1.4 Trade–off between quality and cost.
1.5 WBAN schematic.
2.1 Markov chain of four physical activities {Sit, Stand, Run, Walk} [114].
2.2 Gaussian distributions associated with each of the four activities for the ACC Mean, ACC Variance and ECG Period features for two different subjects.
2.3 Performance of EA, DP and MIC–T3S with respect to ATC in the case of N = 12 samples for varying values of the weight factor λ.
2.4 Trade–off between AUEC and ADE for DP and MIC–T3S in the case of N = 12 samples for two individuals. EA performance for varying values of N is also provided for reference.
2.5 Average allocation of samples per sensor for MIC–T3S in the case of N = 12 samples for varying values of the weight factor λ for two individuals.
2.6 Trade–off between AUEC and ADE for E2MBADP and GME2PS2 in the case of N = 12 samples for two individuals. EA performance for varying values of N is also provided for reference.
2.7 Average allocation of samples per sensor for E2MBADP and GME2PS2 in the case of N = 12 samples for varying values of the accuracy threshold τ for two individuals.
2.8 Average allocation of samples per sensor and activity state for MIC–T3S, E2MBADP and GME2PS2 when detection accuracy is set equal to EA's accuracy (upper plot: Subject 1, lower plot: Subject 2).
2.9 Belief state temporal evolution and corresponding number of received samples from each sensor for MIC–T3S in the case of N = 12 samples. More emphasis is on minimizing energy consumption. The underlying true state is also provided for reference. Time slots for which the underlying true state does not change are grouped together by an ellipsoid.
2.10 Belief state temporal evolution and corresponding number of received samples from each sensor for MIC–T3S in the case of N = 12 samples. More emphasis is on maximizing detection accuracy. The underlying true state is also provided for reference. Time slots for which the underlying true state does not change are grouped together by an ellipsoid.
2.11 Average allocation of samples for GME2PS2 in the case of N = 12 samples for varying values of the accuracy threshold τ and different Markov chain transition matrices.
3.1 Interconnection of system block diagram and MMSE estimator block diagram.
3.2 Average MSE performance of optimal MMSE estimator and Kalman–like estimator.
3.3 Tracking performance. Top: individual's true activity; middle: estimated activity (DP policy); bottom: estimated activity (Greedy MSE policy).
3.4 Exemplary effect of stage R on the smoothed state estimates (pmfs). The initial filtered estimate is also given for comparison.
3.5 Tracking performance comparison. Top: true activity; middle: estimated activity (DP Algorithm); bottom: estimated activity (GFIS2 Algorithm).
3.6 Average allocation of samples per sensor and state comparison. Top: DP Algorithm; bottom: GFIS2 Algorithm.
4.1 Graph showing the effect of two different control inputs on the observation kernel for the same set of states, e1 and e2. In (a), the two distributions overlap, leading to errors, while in (b), a different selection of control input leads to practically no overlap.
4.2 Optimal DP policy cost example for three control inputs and associated threshold sensing strategy rule.
4.3 Current costs for fixed variance σ²(u_i) = 2 and different a_12(u_i).
4.4 Current costs with different variances and a_12(u_i) = constant.
4.5 Trade–off curves for DP MSE–based, myopic and CE–WWLB strategies for N = 2 samples.
4.6 Trade–off curves for myopic (N = 12 samples), CE–WWLB (N = 12 samples) and equal allocation (N = 3, 6, 9, 12 samples) strategies: (a) AMSE versus AEC, (b) ADP versus AEC.
4.7 Samples allocation for different physical activity states for detection performance set to EA's performance (N = 12 samples).

Abstract

The proliferation of heterogeneous devices that integrate a wide range of sensing and networking features, in conjunction with advances in wireless communications, has, now more than ever, led to an environment where measurements are increasingly ubiquitous. At the same time, the measurement capabilities of any device (e.g., operation modes, reliability, range) are intertwined with usage costs (e.g., energy, time, computational complexity), which heavily depend on the deployed resources. In that sense, different qualitative views of the same phenomenon can be acquired by incurring different usage costs. As a result, an inherent trade–off between competing goals emerges. In this dissertation, we investigate the interactions between estimation/detection components and available resources. We specifically focus on systems where the state evolves as a discrete–time, finite–state Markov chain but is not directly observed. Instead, various heterogeneous observations are available. Our objective is to design efficient estimation and sensing strategies to guide the observation selection process and achieve cost–efficient and accurate inference. This is known as the active state tracking problem.

First, the problem of physical activity detection using heterogeneous sensors in a wireless body area network is considered. Since the energy–constrained nature of the fusion center imposes critical restrictions on the network lifetime, the number of samples allocated to each sensor is optimized to achieve the best trade–off between worst–case detection error probability and energy cost. To this end, a novel stochastic control formulation is proposed and the optimal sensor selection strategy is derived to drive the allocation of samples over time. Next, three near–optimal, low–complexity strategies are designed exploiting heuristics and key properties that are proved. Using experimental data from overweight adolescent subjects, significant energy gains (sometimes as high as 68%) are observed in comparison to a naive equal allocation algorithm for detection error probability on the order of 10^-4. The proposed strategies are also compared with respect to average allocation of samples and sensitivity to state transitions, while detection of ephemeral states is also considered.

In an effort to accommodate various applications, state tracking of a discrete–time, finite–state Markov chain with controlled conditionally Gaussian measurement vectors is then considered. Toward this goal, a unified framework of estimation and control is proposed. In particular, approximate minimum mean–squared error estimators (filters, smoothers) are developed and a sensing strategy is designed. A nonstandard recursion is derived to achieve the latter. To avoid the attendant computational complexity, a suboptimal, lower complexity strategy is proposed. The success of the proposed framework is illustrated on the physical activity detection problem in wireless body area networks.

Next, the aforementioned framework is extended to accommodate sensing usage costs. Specifically, the trade–off between tracking performance and sensing costs is considered. State tracking is achieved by employing the earlier proposed estimator, and a nonstandard recursion is developed to determine the optimal sensing strategy. Properties and sufficient conditions are also derived that characterize the structure of the optimal sensing strategy. To overcome the computational complexity of the latter solution, low complexity strategies are proposed. Application of the proposed framework to the physical activity detection problem in wireless body area networks suggests significant energy gains (as high as 60%) in comparison to a naive equal allocation algorithm for a 4% detection error.

Finally, future directions related to the work described in this thesis are discussed.

Chapter 1
Introduction

"Since 2008, there are more objects connected to the internet than persons in the world and this figure will hit 50 billion by 2020!" – [2].

In the not–so–distant future, everything in our lives will be interconnected. We will live in smart cities full of sensors, where street conditions and vehicle and pedestrian levels will be exploited to optimize driving and walking routes, street lighting will adapt to time and weather conditions, and real–time sound monitoring in central zones will enable control of noise pollution. Our homes will be equipped with intelligent sensing devices, which will enable us to remotely control appliances to avoid accidents, efficiently manage energy and water resources and provide us superior home entertainment. These devices will also interact with our own on–body sensing network, which will assist us in our daily routine by carefully considering our individual characteristics (e.g., history of diseases, age, preferences, emotions). At the same time, physical phenomena such as forest fires, floods, volcano eruptions, earthquakes and tsunamis will be detected on time to ensure effective emergency response. Furthermore, intelligent agriculture and animal farming monitoring will significantly enhance product quality.
In essence, intelligent surroundings will enable continuous data collection and control everywhere, from the environment, infrastructures, businesses and ourselves, with the ultimate goal of improving our own lifestyle as well as the environment we live in.

All these applications rely on a variety of sensors (e.g., motion detectors, cameras, GPS, microphones, biometric, infrared, thermal, light, temperature and radiation sensors) to materialize this ambient intelligence (see Fig. 1.1), and the resulting infrastructure gives rise to large–scale, heterogeneous and multi–modal communication networks.

Figure 1.1: Ambient intelligence schematic.

It is evident that the unhindered operation of these applications depends crucially on the efficient and holistic management of such networks. In particular, a task of utmost importance, common to all the above applications, is to infer some unknown (possibly time–varying) phenomenon (e.g., street conditions, home and resource conditions, physical and emotional state of an individual) by exploiting the available heterogeneous sensing modalities (e.g., sensor type and mode, number of samples, location) of the sensing system at hand as a function of the past information, while explicitly accounting for any resource constraints typical of wireless systems. Toward this end, this dissertation studies the active state tracking problem¹:

Active state tracking. How can we accurately and efficiently track a time–varying, unknown process by adaptively exploiting heterogeneous resources?

¹ This problem is also referred to as active classification or controlled sensing for inference.
Figure 1.2: Learning process.

Active state tracking is an inherently interdisciplinary problem, which lies at the intersection of signal processing, communications and control. However, previous research efforts have ignored this connection in the name of simplicity, and have yet to holistically solve this open and difficult problem. Typical instantiations of this approach are that:

i. Sensor usage costs and/or capabilities are ignored or assumed to be the same for all sensors [8].

ii. Simplified generic costs are adopted instead of application–specific performance objectives [42].

iii. Communication effects are ignored by adopting the "perfect sensing of state" assumption [121] or simplistic observation kernels [62].

In this thesis, we acknowledge the interdisciplinary nature of the active state tracking problem by striving to develop methods that integrate techniques from signal processing, communications and control to tackle the challenges presented by this very interesting problem. As the capabilities and complexity of the related systems increase, and to ensure the viability of the associated applications, it is imperative to exploit these connections.

Figure 1.3: Active state tracking model.

To devise intelligent methods, we acknowledge that learning is sequential and adaptive. Namely, as shown in Fig. 1.2, individuals ask questions and, based on the answers they receive, they appropriately refine their questions to achieve the particular inference task at hand. In this thesis, we capture the above characteristics by leveraging advances in stochastic modeling, estimation and optimization. In particular, we model the active state tracking problem as a partially observable Markov decision process (POMDP) [14] described by:

i. an underlying discrete–time dynamical system, which expresses the evolution of the system's state and related observations, under the influence of decisions made at discrete time steps, and

ii. a cost function that is additive over time, which expresses the cost of decision–making.

Figure 1.4: Trade–off between quality and cost.

Thus, the POMDP formulation successfully captures the active state tracking problem by holistically describing the concepts of estimation, control and utility.

Throughout this dissertation, we adopt the POMDP model that is shown pictorially in Fig. 1.3. Specifically, we consider systems where the state evolves as a discrete–time, finite–state Markov chain. The adoption of this specific model is motivated by a wide array of applications falling under the framework of active state tracking, including target tracking in sensor networks [42, 41, 8], physical activity tracking using on–body sensing technology [129], spectrum sensing [116], estimation of sparse signals [89], radar scheduling [61] and coding with feedback [55]. In each of these problems, the system state (e.g., target location, physical activity of an individual, channel state, message sent) evolves in a discrete Markovian way. At each time step, the exact value of the current system state is unknown, but a set of different sensing modalities (sensor type, number of samples, etc.) is available. Based on some metric of interest (e.g., detection error probability, energy consumption, mean–squared error), we select an appropriate modality by exercising the related control. As a result, we receive a set of noisy (discrete or continuous) observations, which is then used to estimate the system state. Even though the POMDP constitutes a standard tool for modeling decision–making problems under uncertainty, the unique characteristics of active state tracking require modification of the existing theory on multiple occasions.
In addition, we identify various challenges associated with this problem:

i. Nonstandard control problem: In contrast to traditional control systems, where control affects the system state evolution [14], in active state tracking applications, as verified by Fig. 1.3, the controller actively selects between the available observations but does not affect the plant. Thus, well–known results cannot be directly employed to solve this problem.

ii. Heterogeneity: Sensors generate large volumes of multi–dimensional data, which can be noisy or incomplete due to failures. Furthermore, different sensors yield different qualitative views of the same phenomenon (quality), while requiring different usage costs (sensing cost), e.g., see Fig. 1.4. This results in a complicated trade–off between competing goals.

iii. Unified sensing, estimation and control: Since sensors are heterogeneous, there is a never–ending cycle of sensing, estimation, and control. Namely, current decisions influence future estimates, which in turn affect future decisions, and so on. Thus, sensing and estimation tasks are tightly interconnected, revealing that careful design of the attendant strategies is necessary. For example, a control that looks promising in the short run may be inefficient in the long run.

iv. Measure of information: In his 1928 paper on the transmission of information, Hartley gave a very intuitive example of how information becomes more precise [47]: "For example, in the sentence, 'Apples are red,' the first word eliminates other kinds of fruit and all other objects in general. The second directs attention to some property or condition of apples, and the third eliminates other possible colors. It does not, however, eliminate possibilities regarding the size of apples, and this further information may be conveyed by subsequent selections." Later, in 1948, Shannon stated that "information exists only when there is a choice of possible messages" [110]. Both of them quantified the importance of information and suggested appropriate measures. In active state tracking problems, sensor heterogeneity necessitates the design and adoption of sensing strategies to achieve inference tasks. In turn, the resulting quality of inference drives the adaptation of the associated strategy. Thus, the choice of an appropriate measure of information is of paramount importance.

v. Scalability: State and control space explosion, in conjunction with the sheer volume of information for decision–making, significantly challenges the application of active state tracking frameworks in large–scale applications. This is the so–called curse of dimensionality. For instance, in a physical activity detection application, the state usually consists of physical activity states (e.g., sit, stand, run, walk), emotional states (e.g., happy, sad, angry), and contextual states such as location (e.g., home, office) and time (e.g., morning, afternoon, night). In a target tracking application, the control denotes which sensors will be activated in a large wireless sensor network. In both of the above applications, observations could consist of multi–dimensional measurement vectors from different sensors. It is evident that the performance of any strategy can be significantly degraded by the explosion of the state, control and observation spaces, hence the need for scalable solutions.

In this thesis, we consider various aspects related to these challenges and devise appropriate solutions. Specifically, in Chapter 2, we describe sensor heterogeneity by two performance measures: 1) the worst–case detection error probability and 2) the energy cost. In an effort to devise scalable solutions, we prove properties of the related cost functions and devise near–optimal, low–complexity adaptive strategies. In Chapter 3, we propose a holistic framework of estimation and control, where we capture sensor heterogeneity by the mean–squared error. We also propose a suboptimal, low–complexity strategy based on the Fisher information [39]. In Chapter 4, we extend the aforementioned framework to account for generic sensing usage costs.
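To make the role of an information measure (challenge iv) concrete, the sketch below implements one simple greedy rule: among the available modalities, pick the one whose expected observation most reduces the entropy of the posterior. This is only an illustration of the general idea, not the Fisher–information strategy of Chapter 3 or any algorithm from this thesis; the two–state, two–modality setup and all parameters are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: modality 0 separates the two states well (means 0 vs 4),
# modality 1 barely does (means 0 vs 0.2). Unit variances throughout.
means = np.array([[0.0, 4.0],    # means[u, x]: modality u, state x
                  [0.0, 0.2]])
stds = np.ones((2, 2))

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def posterior(belief, u, y):
    # Unnormalized Gaussian likelihoods; with equal unit stds the
    # normalization constants cancel in the ratio.
    lik = np.exp(-0.5 * (y - means[u]) ** 2)
    post = lik * belief
    return post / post.sum()

def expected_posterior_entropy(belief, u, n=4000):
    """Monte Carlo estimate of E[H(posterior)] if modality u is sensed:
    draw a state from the belief, an observation from that state's
    Gaussian, and average the entropy of the resulting posterior."""
    states = rng.choice(len(belief), size=n, p=belief)
    ys = rng.normal(means[u][states], stds[u][states])
    return np.mean([entropy(posterior(belief, u, y)) for y in ys])

belief = np.array([0.5, 0.5])
scores = [expected_posterior_entropy(belief, u) for u in (0, 1)]
best = int(np.argmin(scores))   # greedy: least expected residual uncertainty
print(best)
```

Run on this example, the rule prefers the discriminative modality (index 0), matching the intuition above that the value of a sensing action is measured by how much it sharpens the inference; in practice such a myopic rule ignores future costs and rewards, which is exactly why the long–horizon strategies of Chapters 2–4 are needed.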
We also prove properties of the related cost functions and devise a near–optimal, low–complexity strategy. Finally, we devise a second near–optimal, low–complexity strategy based on the Weiss–Weinstein lower bound [123].

1.1 Related Work

The active state tracking problem has received considerable research attention both in recent years and in the past. This is because it arises in different forms in a broad spectrum of applications, e.g., sensor management for object classification and tracking [124, 8, 50], coding with feedback [85], spectrum sensing [116], amplitude design for channel estimation [96], visual search [84], estimation of sparse signals [48, 122], radar scheduling [61], health care [129], context awareness [121], graph classification [69], generalized search [91], and text, image and video classification and retrieval.

Active state tracking can essentially be thought of as the time–varying version of active hypothesis testing [82, 90]. The latter generalizes the classical hypothesis testing problem [118] in the sense that the goal is now to decide among M hypotheses in a speedy and sequential manner, while accounting for the penalty of wrong decisions by exploiting K available different actions. In 1959, Chernoff was the first to consider the active binary composite hypothesis testing problem [26]. He proposed to determine the most likely hypothesis at each time step and select the action that can best discriminate between the latter and any other alternative hypothesis. Recent work in active hypothesis testing has made various contributions in the case of multiple hypotheses [82, 90].

A related problem in statistics is the optimal design of experiments (OED) [38]. The objective is to design experiments that are optimal with respect to some statistical criterion so as to infer an unknown parameterized system. In particular, in 1953, Blackwell was the first to introduce the comparison of experiments [16], where a decision maker can select one out of several experiments to collect a single observation and base her final decision on it.
Various extensive studies [16, 70, 21, 29, 43, 66, 115] have been con- ductedsincethen. Relatedproblemsinmachinelearning, stochasticcontrolandsensornetworksliter- ature are active learning [108] and scheduling of measurements or sensors [64, 7, 77, 103, 6, 51, 76, 92, 45, 53, 59, 62, 49, 54, 8, 74, 129, 60]. In active learning, the goal is to construct an accurate classifier by utilizing the minimum number of training sam- ples. This is usually achieved by intelligent adaptive querying, i.e. selecting the input patterns for the training process in a statistically optimal way. For a nice introduction to active learning and survey of the corresponding literature, the reader is referred to [108] and references therein. In stochastic control, scheduling of measurements has been studied for linear systems with Gaussian disturbances [77, 6, 51, 76, 45, 92, 53]. Under these assumptions and for the case of quadratic cost functions, the optimal mea- surement policy is independent of the measurements and can be determined a priori. More recently, the problem of sensor scheduling has been considered in various kinds ofsensornetworksassumingnon–linearsystemsorsystemsmodeledbydiscrete–time, finite–state Markov chains [59, 62, 49, 54, 8, 74, 129, 60]. For this latter case, most priorworkhasassumeddiscreteobservations[59,62,74,129,60],scalar[54,8,60]or w measurementsfromw sensorsunderindependenceassumptions[8]. 9 In comparison to the above, we focus on time–varying systems, i.e. we allow the hypothesis to change with time as a Markov chain. The adoption of the specific model resultsinamorecomplicatedsituation,sinceestimationandcontrolarecoupledtogether contrary to the standard linear case [77, 6, 51, 76, 45, 92, 53]. In contrast to active learning, where the true hypothesis is revealed if a cost is paid, we only have access to noisy observations. We also consider more complicated observation models, e.g. 
generic Gaussian measurement vectors, compared to prior art [59, 62, 74, 129, 60, 54, 8]. These characteristics make our problem more general and realistic, but also harder than those already considered in the literature.

1.2 Motivating Application

One of the most promising applications of the ambient intelligence paradigm, referred to as the Internet of Things [9], is human health monitoring. "eHealth", a term adopted for the first time back in 1999 [30] to encompass the range of services/systems at the intersection of medicine and technology, has been the center of attention of both research and commercial organizations. Google has just announced its design of smart contact lenses for glucose level tracking of diabetic individuals, as well as Calico, a new company focusing on aging and related diseases. IBM recognizes health care as one of the five innovations that will change our lives within five years, with a focus on DNA sequencing for cancer treatment. Qualcomm, Asthmapolis and Zephyr are currently researching asthma attack detection in children and teenagers using wearable technology. USC, Stanford, MIT and Berkeley focus on a wide variety of health-related problems, ranging from the development of Magnetic Resonance Imaging (MRI) techniques, new cancer treatment and heart monitoring technologies to techniques for helping blind people and automatic monitoring of induced comas. A driving force behind these research initiatives is the increasing sophistication of wearable and implantable medical devices along with their integration with wireless technology, which has led to an ever-expanding range of therapeutic and diagnostic applications. Applications include, but are not limited to, physical activity detection for obesity prevention [129], patient rehabilitation [56], diagnostics, social communication and interaction development in autism [87], fall detection in the elderly [17], and early detection, treatment and prevention of diseases [32].

Figure 1.5: WBAN schematic.
Wireless Body Area Networks (WBANs) [23] constitute a novel class of sensor networks that enable a wide range of innovative applications in healthcare, entertainment, lifestyle, sports, military and emergency situations. As shown in Fig. 1.5, a WBAN typically comprises a few heterogeneous, biometric sensors, e.g., accelerometers (ACCs) and electrocardiographs (ECGs), and a fusion center, which is usually an energy-constrained personal device, e.g., a mobile phone or PDA. Having the potential to revolutionize health care, WBANs need to be able to support data collection, efficient decision making and interaction with the individual, the cloud and the healthcare professionals toward improving each individual's lifestyle.

Practical realization of WBANs faces a number of unique challenges [23]. In particular, energy efficiency constitutes a significant factor in long-term deployment of WBANs. The problem of resource allocation for energy-efficient physical activity detection in WBANs constitutes a great paradigm of active state tracking in heterogeneous communication networks. Based on an actual implementation of a prototype WBAN, known as the KNOWME network [78], efficient state tracking (i.e., is the individual standing, running or walking?) proves to be intertwined with the adopted sensing strategy, since each sensor's discriminative capabilities vary per activity while incurring different energy costs. The selected strategy also depends on the characteristics of each individual, as we will later see in Chapter 2, indicating the need for personalization. Motivated by this application, in this thesis, we illustrate why tracking and control are intertwined in heterogeneous sensor networks. Furthermore, we propose novel formulations to model the interaction between system components, develop new state estimation techniques and design algorithms for accurate and efficient state tracking. Parts of this work are presented from the WBAN perspective, while others use the latter application to validate the performance of the proposed frameworks and strategies.
However, the related formulations and algorithms are general and can accommodate several classification and tracking tasks that appear in different applications.

1.3 Thesis Outline

In order to address some of the basic challenges discussed earlier, we start by considering the active state tracking example of energy-efficient physical activity detection in energy-constrained heterogeneous WBANs. According to observations of a real-world prototype WBAN [78], the energy-constrained nature of the fusion center imposes critical restrictions on system lifetime. To address this issue, we introduce a novel stochastic control framework, which considers both sensor heterogeneity and application requirements, for achieving the two-fold goal: energy savings with satisfactory detection performance. Exploiting this framework, we derive an optimal dynamic programming algorithm for the sensor selection problem. Next, we derive important properties of the cost functionals and use them to design three approximation algorithms, which offer near-optimal performance with significant complexity reduction. We evaluate the proposed framework on real data collected from the KNOWME network [78]. Specifically, we comment on the form of the optimal control policy and compare the proposed algorithms' performance with respect to energy, detection accuracy, average allocation of samples, per-state average allocation of samples and sensitivity to state transitions. We also provide a generalization of the proposed framework to accommodate detection of ephemeral states. Our observations indicate that energy gains as high as 68% in comparison to an equal allocation scheme can be achieved with a probability of detection error on the order of 10^{-4}.

Next, we consider the active state tracking problem for a system modeled by a discrete-time, finite-state Markov chain observed through conditionally Gaussian measurement vectors. The statistics of the measurement model are shaped by the underlying state and an exogenous control input, which influences the observations' quality.
To accurately track the time-evolving system state, we address the joint problem of determining recursive formulae for a structured minimum mean-squared error (MMSE) state estimator and designing a control strategy. Specifically, following an innovations approach, we derive a non-linear approximate MMSE estimator for the Markov chain system state. To obtain an appropriate control strategy, we use the associated mean-squared error as an optimization criterion in a partially observable Markov decision process formulation. To solve for the optimal solution, we propose a nonstandard stochastic dynamic programming algorithm. We also propose a suboptimal, low-complexity algorithm. Furthermore, we consider the problem of enhancing estimation performance by exploiting both past and future observations and control inputs. More precisely, we derive non-linear approximate MMSE smoothing estimators (fixed-point, fixed-interval, fixed-lag) to acquire improved state estimates and comment on their differences. We illustrate the performance of the proposed framework on the physical activity detection problem in WBANs.

Finally, we consider the aforementioned active state tracking problem in the case of sensing usage costs. The objective is to devise sensing strategies to optimize the trade-off between tracking performance and sensing cost. We employ our non-linear approximate MMSE estimator for state tracking and use the associated MSE in conjunction with the sensing cost metric in a partially observable Markov decision process formulation. To determine the optimal sensing strategy, we derive a nonstandard dynamic programming recursion. We also derive properties of the related cost functions and provide sufficient conditions regarding the structure of the optimal policy. In particular, we discuss when passive state tracking is optimal. To overcome the associated computational burden of the optimal sensing strategy, we propose two low-complexity strategies.
We illustrate the performance of the proposed strategies on the problem of energy-efficient physical activity tracking in WBANs, where we observe energy savings as high as 60% for a 4% detection error with respect to an equal allocation sensing strategy.

The rest of this thesis is organized as follows. Chapter 2 describes the problem of physical activity detection in energy-constrained heterogeneous WBANs. Chapter 3 describes the problem of designing a holistic framework of estimation and control for discrete-time, finite-state Markov chains observed via controlled conditionally Gaussian measurement vectors. Chapter 4 describes an extension of this framework to accommodate sensing usage costs. Finally, Chapter 5 concludes the thesis and discusses future research directions.

Throughout this thesis, we adopt the following notational conventions. Unless stated otherwise, all vectors are column vectors denoted by lowercase boldface symbols (e.g., v) and all matrices are denoted by uppercase boldface symbols (e.g., A). Sets are denoted by calligraphic symbols (e.g., X) and |X| denotes the cardinality of set X. 1 denotes a vector with all components equal to one and I the identity matrix with dimensions determined from the context. tr(·) denotes the trace operator, |A| the determinant of matrix A, ‖x‖ the L2-norm of vector x, diag(x) the diagonal matrix with elements the components of vector x, and blkdiag(A_1, ..., A_n) the block diagonal matrix with main diagonal blocks the matrices A_1, ..., A_n. Finally, for any event B, 1_B denotes the corresponding indicator function, i.e.,

1_B = \begin{cases} 1, & \text{if } B \text{ occurs}, \\ 0, & \text{otherwise}. \end{cases}

Chapter 2
Energy-Efficient Heterogeneous Sensor Selection in WBANs

In this chapter, we study the problem of optimizing the heterogeneous sensor selection process for a WBAN with an energy-constrained fusion center that employs the Bluetooth standard [1] for data communication, while optimizing physical activity detection. Our goal is two-fold: maximize the lifetime of such a unique sensor network while preserving detection performance.
Real measurements from our prototype system [114] reveal that the constrained energy budget of the fusion center, and not the sensors, limits the achievement of this goal. This is due to the fact that the reception of signals from the sensors is more energy-expensive than the transmission of such samples by the sensors. In fact, collecting samples from all WBAN sensors equally leads to undesirably high energy consumption. Namely, a fully charged fusion center battery is depleted in a couple of hours, while the WBAN dedicated sensors can run for a couple of weeks. Furthermore, the off-the-shelf sensors are not programmable. Thus, we focus on optimizing the listening schedule employed by the fusion center: which sensor to listen to and for how long.

The remainder of this chapter is organized as follows. In Section 2.1, we overview the unique characteristics of our problem and state our contributions. We summarize prior work on energy-efficient algorithms in sensor networks in Section 2.2. In Section 2.3, we introduce the problem of heterogeneous sensor selection in a WBAN and the corresponding POMDP model. We derive the optimal sensor selection strategy using dynamic programming in Section 2.4. In Section 2.5, we propose suboptimal solutions based on cost-to-go function properties and heuristics. In Section 2.6, we evaluate our proposed schemes using real data. We conclude the chapter in Section 2.7.

2.1 Contributions

In recent years, there has been an explosion of energy-saving frameworks and techniques for a plethora of sensor network applications. At the same time, much of the prior work has adopted somewhat restrictive assumptions. For example, sensor usage costs, capabilities or both are assumed to be the same for all sensors [121, 8, 41, 120, 109, 22, 24, 10, 12, 124]. Similarly, the "perfect sensing of state" assumption, i.e., that the underlying phenomenon of interest is fully known if data transmission is accomplished and successful, is usually adopted [121, 120, 8, 109].
Lastly and most importantly, it is typically assumed that the sensors are energy-constrained rather than the fusion center [8, 109, 41, 22, 24, 124, 34, 12, 119, 127, 62, 10].

In contrast to the above, our problem is characterized by the following unique features:

i. The sensors are heterogeneous in energy use (cost of receiving one sample from each sensor¹) as well as in detection capabilities for different physical activity states.

ii. In contrast to [114], the underlying activity is time-evolving and only known through noisy observations, since the sensors' communication and sensing modes introduce errors.

iii. To ensure compliance in wear, only a small number of heterogeneous sensors are on body.

iv. The fusion center is the energy bottleneck (instead of the sensors).

¹ The cost of processing data was not the focus of this work.

Thus, its unique characteristics necessitate the design of new formulations and algorithms. Our main contributions are as follows. We devise a novel formulation, based on stochastic control theory, that optimizes the trade-off between detection accuracy, defined as a function of the worst-case, pairwise error probability of misclassification between two activities, and energy consumption due to reception of samples from heterogeneous WBAN sensors. To the best of our knowledge, this is the first work on designing energy-efficient listening strategies for the energy-constrained fusion center of a heterogeneous WBAN, which tracks the time-evolving, unknown physical activity of an individual. The application of the Partially Observable Markov Decision Process (POMDP) framework to this problem is also novel. Further, we derive an optimal dynamic programming (DP) algorithm for our sensor selection problem and discuss its advantages and disadvantages.
We also devise three approximation schemes: (i) a Minimum Integrated Cost Time Sharing Sensor Selection (MIC-T3S) scheme, (ii) an Energy Efficient Maximal Belief Approximate Dynamic Programming (E²MBADP) scheme, and (iii) a Greedy Minimum Energy and Error Probability Sensor Selection (GME²PS²) scheme, by exploiting system features as well as important properties of the cost-to-go functions. In all cases, we employ a state detector based on maximum likelihood (ML) principles. Finally, we evaluate the performance of our schemes through simulation results, based on experimental data collected by our prototype system [114] and energy costs experimentally determined during its operation. We compare their performance with an equal allocation scheme, which always selects an equal number of samples from each sensor, and we observe significant energy gains. We underscore that [114] cannot be used as a baseline because it assumes a static system model. State-of-the-art approaches for energy-efficient operation of wireless sensor networks (WSNs) are also not applicable for comparison, since their methods work for fundamentally different systems and under distinctly different assumptions.

2.2 Related Work

Energy efficiency in WSNs has been a widely studied problem [4]. Since sensors are generally battery-powered devices, reducing node power consumption to extend network lifetime is very important. Many approaches have been devised to address this problem, from ad-hoc algorithms to more sophisticated ones, such as data-driven, mobility, and duty cycling / sleep scheduling schemes [4], with the latter gaining a lot of recent attention. The main idea behind sleep scheduling (also referred to as sensor selection) is to devise a schedule under which nodes alternate between active and sleep periods, depending on network activity. Several algorithms [34, 12, 22, 119, 127] and mathematical formulations [24, 124, 109, 121, 41, 8, 62, 120, 10] have enabled the design of sleeping policies for several applications.
In all of these cases, except [121, 120], the goal is to optimize energy consumption in the sensor nodes, in contrast to our work that focuses on the fusion center.

In hierarchical schemes [34], low-accuracy, energy-efficient sensor nodes continuously monitor the physical phenomenon of interest, and higher-accuracy sensors are activated when a critical event occurs. In node clustering schemes [12, 22], sensor nodes form groups and a representative node in each cluster, chosen periodically either by a coordinator or by the rest of the nodes, stays awake to handle communication. In contrast to these schemes, we select nodes based on their detection capabilities and energy consumption to track a time-evolving physical activity, and there are cases where no sensors are selected at all. Smart feature selection in conjunction with Bayesian statistics has also been used to support energy-efficient operation of WBANs [119, 127]. The main idea is to incrementally select sensors to accurately estimate the underlying non-evolving human state based on their informational capabilities with respect to the true state. Our work, on the other hand, builds on a robust optimization framework that guarantees optimality to determine the individual's time-evolving state at each time step by assessing both energy and accuracy requirements.

Recently, mathematical approaches based on stochastic control principles have been proposed. Markov Decision Process (MDP) frameworks have been applied in order to select between transmission rates or modes [109, 121, 120] and subsets of sensors [24]. For example, [121, 120] propose a constrained MDP formulation to determine the optimal sensor sampling policy, given a constrained energy budget and under missing observations, so as to efficiently track a Markovian / semi-Markovian long user-state evolving process. The main difference between the above schemes and our work is that we do not assume knowing the true system state; in fact, this is what we want the WBAN to detect.
POMDP frameworks [41, 62, 8, 10, 124] have also been devised to achieve energy-efficient operation of WSNs and WBANs. In contrast to [41, 62, 8, 124], where several system and cost models are examined and approximate schemes are devised based on simplifying assumptions, our system and cost model match the properties of a real-life WBAN system. Our work considers sensor nodes' individual characteristics and proposes schemes which can be deployed in realistic scenarios. The work potentially most similar to ours [10] considers energy-efficient classification in a WBAN. However, our goal is to optimize the listening strategy of the fusion center, not the sensors' energy consumption due to data transmission. In our problem formulation, we consider heterogeneous sensors, allow a variety of transmission rates in contrast to the activate/deactivate-all-sensors control signals introduced in [10], and we provide closed-form expressions for the energy and detection accuracy costs. Finally, instead of adopting existing approximation algorithms from the literature, as usually done by most prior work, we propose novel approximation schemes based on properties of the cost-to-go functions and heuristics that fit our problem's unique features.

2.3 Problem Formulation

We consider a WBAN consisting of K heterogeneous, commercial sensors, e.g., accelerometer (ACC), electrocardiograph (ECG), pulse oximeter, in a star topology, and an energy-constrained cellphone fusion center. The WBAN measures vital signs of an individual, who is alternating between a set of pre-specified activities, such as Sit, Run, and Walk. During this process, a set of biometric signals is generated by the sensors and communicated via Bluetooth to the fusion center, which in turn must determine the individual's current activity. In our prototype system [114], the cellphone fusion center's battery is taxed by receiving signals from the sensors. Our two-fold goal is to minimize the reception cost while accurately tracking the underlying time-evolving physical activity.
To this end, we must design a method that determines which sensors the fusion center listens to (if any), one at a time, and how many samples to receive from each sensor. We begin by introducing the stochastic model of our system.

2.3.1 Stochastic Model

We define as system state the current activity performed by the individual and denote its value at time k by x_k. The system state takes values from the set X = {1, ..., n}, i.e., x_k ∈ X, where n corresponds to the number of activities supported by the system. We model the temporal evolution of activities using a Markov chain, e.g., Figure 2.1. The corresponding statistics are described by an n × n probability transition matrix P, such that P_{j|i} is the probability of the individual being in state j at the next time step given that he is currently in state i. We assume that these transition probabilities do not change with time, hence the Markov chain is stationary. Although the transition probabilities of the Markov chain should in practice be time-dependent, this assumption allows us to study sensor selection design for energy-efficient activity detection in WBANs. Any principles developed in this work should extend to more complex activity alternation models. Estimating the transition matrix P was beyond the scope of this work.

At each time step, a set of raw biometric signals is generated by the off-the-shelf sensors in the WBAN. Feature extraction and selection techniques (e.g., [114, 68]) are then employed in a back-end server to produce a set of samples. A sample corresponds to an extracted feature value from the generated biometric signals, such as ACC mean, ECG period or magnitude of the Fast Fourier Transform of a window of biometric data [68]. The effect of different features on the activity recognition accuracy has been investigated in [114], where a filter-based feature selection method was proposed to determine an optimal feature set for the state detection problem.
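The Markov activity model of Section 2.3.1 is easy to exercise in simulation. The sketch below is illustrative only: the three-activity transition matrix is a hypothetical stand-in, not estimated from KNOWME data.

```python
import numpy as np

# Hypothetical 3-activity transition matrix (Sit, Walk, Run); row i holds
# the distribution P(next state = j | current state = i).
P = np.array([[0.90, 0.08, 0.02],
              [0.10, 0.80, 0.10],
              [0.05, 0.15, 0.80]])

def simulate_activities(P, x0, L, rng):
    """Draw a length-L activity trajectory x_0, ..., x_{L-1} from the chain."""
    traj = [x0]
    for _ in range(L - 1):
        traj.append(int(rng.choice(P.shape[0], p=P[traj[-1]])))
    return traj

rng = np.random.default_rng(0)
traj = simulate_activities(P, x0=0, L=50, rng=rng)
```

Such synthetic trajectories are a convenient test harness for the filters and selection policies developed later in the chapter.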
To achieve energy efficiency, the cellphone selects an appropriate control input, which is defined as the number of samples to receive from which sensor at a given time. Specifically, the control input at time k is denoted by u_k and consists of a K-tuple of the form [N_1^{u_k}, N_2^{u_k}, ..., N_K^{u_k}]^T, where N_l^{u_k} denotes the total number of samples received from sensor l when control input u_k is applied. We assume that the total number of samples received from all the selected sensors during the interval [k, k+1) satisfies the constraint \sum_{l=1}^{K} N_l^{u_k} ⩽ N, where N is fixed. Based on the control input definition and the above constraint, it is straightforward to see that there are α = \sum_{i=0}^{N} \binom{i+K-1}{i} available controls supported by the system, i.e., u_k takes values from the set U = {u^1, u^2, ..., u^α}. The number of available controls is exponential in N and K, leading to an integer programming optimization problem and necessitating the design of low-cost approximations. Note that the transitions between states do not depend on the selected control input.

The above description indicates that we have a discrete-time dynamical system with system state x_k and control input u_k. Unfortunately, the true system state is unknown, since sensors can only provide the fusion center with measurements, resulting in a discrete-time dynamical system with imperfect or partially observed state information [14], also known as a POMDP [3].

Based on the control input selected at time k−1, a measurement vector y_k consisting of the selected samples/features is sent to the cellphone fusion center. We adopt the temporally correlated Gaussian signal model for the biometric signals, introduced and justified in [114]. Specifically, the extracted features follow an AR(1)-correlated multivariate Gaussian model, since there exists temporal correlation for a single feature, while features from different sensors as well as from the same sensor are assumed to be uncorrelated, due to feature selection.
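The control count α = \sum_{i=0}^{N} \binom{i+K-1}{i} is just the number of K-tuples of per-sensor sample counts whose total is at most N, and can be cross-checked by brute-force enumeration. The values of K and N below are illustrative.

```python
from itertools import product
from math import comb

def enumerate_controls(K, N):
    """All K-tuples [N_1, ..., N_K] of nonnegative sample counts with sum <= N."""
    return [u for u in product(range(N + 1), repeat=K) if sum(u) <= N]

def alpha(K, N):
    """Closed-form control count: sum_{i=0}^{N} C(i + K - 1, i)."""
    return sum(comb(i + K - 1, i) for i in range(N + 1))

U = enumerate_controls(K=3, N=4)
assert len(U) == alpha(3, 4) == 35
```

Even for these small values the control space already has 35 elements, which illustrates the exponential growth in N and K noted above.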
These assumptions have been validated through extensive simulations in [114]. The above formulation results in the following n-ary generalized Gaussian hypothesis testing problem

H_i^{u_{k-1}} : y_k \sim N(m_i^{u_{k-1}}, Q_i^{u_{k-1}}), \quad i = 1, 2, ..., n,   (2.1)

where m_i^{u_{k-1}}, Q_i^{u_{k-1}}, i = 1, 2, ..., n, denote the mean vectors and covariance matrices of the measurement vector y_k under each hypothesis, given control u_{k-1}. The mean vectors and covariance matrices are

m_i^{u_{k-1}} = [\mu_{i,u_{k-1}}(S_1)^T, \mu_{i,u_{k-1}}(S_2)^T, ..., \mu_{i,u_{k-1}}(S_K)^T]^T,   (2.2)

Q_i^{u_{k-1}} = \mathrm{diag}(Q_{i,u_{k-1}}(S_1), Q_{i,u_{k-1}}(S_2), ..., Q_{i,u_{k-1}}(S_K)),   (2.3)

where \mu_{i,u_{k-1}}(S_l) is an N_l^{u_{k-1}} × 1 vector, Q_{i,u_{k-1}}(S_l) = \frac{\sigma^2_{S_l,i}}{1-\varphi^2} T + \sigma_z^2 I is an N_l^{u_{k-1}} × N_l^{u_{k-1}} matrix, T is a Toeplitz matrix whose first row/column is [1, \varphi, \varphi^2, ..., \varphi^{N_l^{u_{k-1}}-1}], I is the N_l^{u_{k-1}} × N_l^{u_{k-1}} identity matrix, \varphi is the parameter of the AR(1) model, and \sigma_z^2 accounts for sensing and communication noise. An important observation is that given two distinct controls u^{q_1} = [N_1^{u^{q_1}}, ..., N_K^{u^{q_1}}]^T and u^{q_2} = [N_1^{u^{q_2}}, ..., N_K^{u^{q_2}}]^T, q_1, q_2 ∈ {1, ..., α}, and a specific sensor l, Q_{i,u^{q_1}}(S_l) is of size N_l^{u^{q_1}} × N_l^{u^{q_1}}, while Q_{i,u^{q_2}}(S_l) is of size N_l^{u^{q_2}} × N_l^{u^{q_2}}, i.e., they are of different dimensions. Furthermore, the size of the covariance matrix Q_i^{u_{k-1}} depends on the total number of received samples at each time step and can be less than N × N.

At the fusion center, an instantaneous detector structure takes as input the measurement vector y_k and outputs an estimate of the true activity at time k, denoted by z_k. We assume that z_k ∈ X, i.e., the detector always outputs a valid estimate of the underlying activity. An exception to this rule is when the 0_K = [0, 0, ..., 0]^T control input is selected. In this case, no measurements are produced and thus no observation z_k is generated, which can be modeled as an erasure value ε ∉ X.
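The per-sensor covariance block Q_{i,u}(S_l) in (2.3) can be built directly from its definition. A minimal sketch, with placeholder parameter values (the feature variance, AR(1) parameter and noise variance below are not taken from the thesis's data):

```python
import numpy as np

def ar1_block(n_l, sigma2, phi, sigma2_z):
    """Q_{i,u}(S_l) = sigma2 / (1 - phi^2) * T + sigma2_z * I, where T is
    Toeplitz with first row/column [1, phi, phi^2, ...]."""
    idx = np.arange(n_l)
    T = phi ** np.abs(idx[:, None] - idx[None, :])   # Toeplitz structure
    return sigma2 / (1.0 - phi ** 2) * T + sigma2_z * np.eye(n_l)

Q = ar1_block(n_l=4, sigma2=1.0, phi=0.5, sigma2_z=0.1)
```

Because `n_l` is the per-sensor sample count of the selected control, blocks built for different controls indeed have different dimensions, exactly as noted above.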
For the n-ary problem defined in (2.1), there is no closed-form expression for the observation probabilities P(z_k | x_k, u_{k-1}) for control inputs u_{k-1} ≠ 0_K. Thus, we approximate them using the pairwise error probability upper bound introduced in [65]. In this case, the observation probabilities take the following form

r(x_{k-1}, u_{k-1}, x_k, z_k) ≐ P(z_k | x_k, x_{k-1}, u_{k-1}) ≅ \sqrt{\frac{P_{z_k|x_{k-1}}}{P_{x_k|x_{k-1}}}} \, \rho_{z_k,x_k}^{u_{k-1}},   (2.4)

i.e., there exists a dependence on the previous system state, and \rho_{z_k,x_k}^{u_{k-1}} is the Bhattacharyya coefficient [33], which for the case of multivariate Gaussians can be shown to be

\rho_{z_k,x_k}^{u_{k-1}} = \exp\left( -\frac{1}{8} (\Delta m_{z_k,x_k}^{u_{k-1}})^T (Q_h^{u_{k-1}})^{-1} \Delta m_{z_k,x_k}^{u_{k-1}} - \frac{1}{2} \log \frac{\det Q_h^{u_{k-1}}}{\sqrt{\det Q_{x_k}^{u_{k-1}} \cdot \det Q_{z_k}^{u_{k-1}}}} \right),   (2.5)

with \Delta m_{z_k,x_k}^{u_{k-1}} = (m_{z_k}^{u_{k-1}} - m_{x_k}^{u_{k-1}}) and 2Q_h^{u_{k-1}} = Q_{z_k}^{u_{k-1}} + Q_{x_k}^{u_{k-1}}. Evaluating the observation probabilities using (2.4) is of complexity O(n^3 \alpha N^3) and can be executed off-line. Note that the dependence on the previous system state does not invalidate any assumptions of the general POMDP framework. Still, it is imperative to re-derive any results to examine the impact of this dependence on their form.

The total information available for decision making at the fusion center at time k, also known as the information vector, is given by I_k = (z_0, ..., z_k, u_0, ..., u_{k-1}) [14], where I_0 = z_0 corresponds to the initial information vector. Thus, the control input at time k is a function of I_k, i.e., u_k = \eta_k(I_k), where \eta_k is the sensor selection policy at time k. Next, we introduce our performance measures.

2.3.2 Performance Measures

We capture the sensors' heterogeneity with respect to 1) energy consumption and 2) detection accuracy by introducing two kinds of performance measures. We define the unnormalized energy cost of control input u_k as e(u_k) ≐ u_k^T \delta, where \delta = [\delta_1, ..., \delta_K]^T and \delta_l ∈ (0, 1] is fixed and known for each sensor [114] and denotes the communication cost of receiving one sample from sensor l.
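For two multivariate Gaussians, the Bhattacharyya coefficient in (2.5) can be evaluated with standard linear algebra; the sketch below uses log-determinants for numerical stability. The test inputs are illustrative, not model values from the thesis.

```python
import numpy as np

def bhattacharyya_coeff(m1, Q1, m2, Q2):
    """Bhattacharyya coefficient between N(m1, Q1) and N(m2, Q2):
    rho = exp(-1/8 dm' Qh^{-1} dm - 1/2 ln(det Qh / sqrt(det Q1 det Q2))),
    with dm = m1 - m2 and Qh = (Q1 + Q2) / 2."""
    dm = m1 - m2
    Qh = 0.5 * (Q1 + Q2)
    quad = 0.125 * dm @ np.linalg.solve(Qh, dm)
    ld = lambda M: np.linalg.slogdet(M)[1]          # log-determinant
    return float(np.exp(-quad - 0.5 * (ld(Qh) - 0.5 * (ld(Q1) + ld(Q2)))))

I2 = np.eye(2)
rho_same = bhattacharyya_coeff(np.zeros(2), I2, np.zeros(2), I2)   # identical
rho_far = bhattacharyya_coeff(np.zeros(2), I2, 5.0 * np.ones(2), I2)
```

Identical distributions give a coefficient of 1, and well-separated means drive it towards 0, matching its role as a pairwise discriminability measure.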
Note that different sensors exhibit different communication costs due to their different data rates and locations (on-phone sensors are more energy-efficient). We denote the normalized energy cost, taking values in the interval [0, 1], by E(u_k). Both energy costs are independent of the system state, but depend on the per-sensor and total number of received samples.

We capture the discriminative capabilities of heterogeneous sensors by considering that different controls result in different activity detection accuracy, measured by the worst-case error probability p_e^W(x_k, u_k). Formally, we define this probability as follows

p_e^W(x_k, u_k) ≐ \max_{x_{k+1}, z_{k+1} : x_{k+1} \neq z_{k+1}} [r(x_k, u_k, x_{k+1}, z_{k+1})],   (2.6)

where 0 ⩽ p_e^W(x_k, u_k) ⩽ 1, ∀ x_k ∈ X, u_k ∈ U. When 0_K is selected, p_e^W(x_k, 0_K) = 1, ∀ x_k ∈ X.

We study the trade-off between detection accuracy and energy consumption by considering a unified objective function of these performance measures. We define the total cost g_\lambda(x_k, u_k) as follows

g_\lambda(x_k, u_k) ≐ (1-\lambda) p_e^W(x_k, u_k) + \lambda E(u_k),   (2.7)

where \lambda ∈ [0, 1] denotes the trade-off between the relative importance of the worst-case error probability and the normalized energy cost. Our total cost can be easily extended to accommodate other cost functions.

2.3.3 Partial Observability & Sufficient Statistics

In this section, we address the problem of reducing the data that are indeed necessary for control purposes. The information vector, I_k, at time step k is of expanding dimension. Instead, we use as a sufficient statistic the probability distribution of state x_k given I_k [14], which is also known as the belief state and is defined as

p_k = [p_k^1, p_k^2, ..., p_k^n]^T, where p_k^j = P(x_k = j | I_k), j ∈ X.   (2.8)

Let P denote the set of all belief states p_k such that

P = \{ p_k ∈ R^n : 1_n^T p_k = 1, 0 ⩽ p_k^j ⩽ 1, ∀ j ∈ X \}.   (2.9)

P is an (n−1)-dimensional simplex, known as the belief space [3]. The belief state evolves in time based on the update rule provided in Lemma 1.
Compared to standard POMDPs, the update rule involves the Hadamard product of two matrices, resulting in a more complex evolution of the belief state, further challenging the solution of the optimization problem.

Lemma 1. Let p_k denote the belief state at time step k. Assume that the control input u_k is selected and at time step k+1 the observation z_{k+1} is generated. Then, the belief state p_{k+1} is given by the following rule

p_{k+1} = \frac{[P \circ r(u_k, z_{k+1})]^T p_k}{1_n^T [P \circ r(u_k, z_{k+1})]^T p_k},   (2.10)

where \circ denotes the Hadamard product between the matrices of interest and r(u_k, z_{k+1}) denotes the n × n matrix of observation probabilities with elements r(i, u_k, j, z_{k+1}), i, j ∈ X.

Proof. We derive the belief update rule starting from the definition of the belief state in (2.8) and applying Bayes' theorem. Due to the definition of the observation probabilities, there is a dependence on the previous system state. Hence:

p_{k+1}^j = P(x_{k+1} = j | z_0, ..., z_k, z_{k+1}, u_0, ..., u_{k-1}, u_k)
= P(x_{k+1} = j | I_k, z_{k+1}, u_k) = \frac{P(x_{k+1} = j, z_{k+1} | I_k, u_k)}{P(z_{k+1} | I_k, u_k)}.   (2.11)

The numerator in (2.11) can be computed as follows

P(x_{k+1} = j, z_{k+1} | I_k, u_k) = \sum_{i=1}^{n} r(i, u_k, j, z_{k+1}) P_{j|i} p_k^i,   (2.12)

where we have used the law of total probability in conjunction with the following facts: (i) x_{k+1} depends only on x_k, (ii) x_k does not depend on u_k, and (iii) z_{k+1} depends on x_{k+1}, x_k and u_k. Similarly, the denominator can be calculated as follows

P(z_{k+1} | I_k, u_k) = \sum_{i=1}^{n} \sum_{l=1}^{n} r(l, u_k, i, z_{k+1}) P_{i|l} p_k^l.   (2.13)

Substituting back in (2.11), we get

p_{k+1}^j = \frac{\sum_{i=1}^{n} r(i, u_k, j, z_{k+1}) P_{j|i} p_k^i}{\sum_{i=1}^{n} \sum_{l=1}^{n} r(l, u_k, i, z_{k+1}) P_{i|l} p_k^l}, \quad j ∈ X,   (2.14)

which, written in vector form, results in (2.10).

It is possible to receive no samples from any sensor at any time step, when the 0_K control input is selected. In this case, the update rule described in (2.10) cannot be used; we instead exploit the underlying Markovian activity evolution using the update rule p_{k+1} =
Finally, we introduce an ML detector, which estimates the true system state by maximizingthecurrentbeliefstatei.e. ˆ x ML k =argmaxp k . 2.3.4 OptimizationProblem Theexpectedaccumulatedtotalcostforthesystemoverafinite–horizonisasfollows J λ =Eœ L−1 Q k=0 g λ (x k ,u k )¡, (2.15) where L represents the horizon length of interest. We use a finite–horizon formulation as a first step towards addressing the inherently time–varying nature of our problem resulting from the time–varying activities model that induces time evolving transition probabilities. We expect the finite–horizon solution to better fit our problem since it 28 canpotentiallyproduceanon–stationarypolicy,whichiswell–suitedinatime–varying framework. Our goal is to determine the optimal sensor schedule that minimizes the total accu- mulated cost J λ from time 1 to time L over the set of admissible control policies. We assume that the terminal cost g(x L ) is zero. Thus, we have the following finite hori- zon, partially observable stochastic control problem: min u 0 ,u 1 ,...,u L−1 J λ . Since both theworst–caseerrorprobabilityandthenormalizedenergycostareuniformlybounded from above and below, and the state and control spaces are finite sets, an optimal pol- icy exists [14]. The solution to this optimization problem for each value of λ yields an optimalsensorselectionpolicy. Forλ=1i.e. ourfocusisonlyenergyconsumption,we considertwoseparatecaseswhereweexaminedifferentconstraintoptimizationcriteria. Towardsfindingtheoptimalpolicy,weconvertthepartiallyobservedstochasticcon- trolproblemtoafullyobservedstochasticcontrolproblemdefinedintermsofthebelief state[14]. Werewritethecostfunctionin(2.15)intermsofthebeliefstatep k as J λ =Eœ L−1 Q k=0 p T k g λ (u k )¡=Eœ L−1 Q k=0 p T k ‰(1−λ)p W e (u k )+λE(u k )1 n Ž¡, (2.16) where g λ (u k ) = g λ (1,u k ),...,g λ (n,u k ) T and p W e (u k ) = p W e (1,u k ),..., p W e (n,u k ) T . 
2.4 Optimal Sensor Selection

In this section, we compute the optimal sensor selection policy using stochastic dynamic programming. In contrast to traditional POMDPs, our model presents two unique characteristics. First, instead of affecting the transition to the next system state, the control policy determines how many observations are to be received in the next time step, and if so, what their corresponding quality is going to be. Second, observation probabilities depend on the previous system state apart from the current state and previous control. Theorem 2 gives the DP algorithm for computing the optimal sensor selection policy in this context.

Theorem 2. For $k = L-2, \ldots, 0$, the cost-to-go function $J_k^\lambda(p_k)$ is related to $J_{k+1}^\lambda(p_{k+1})$ through the recursion

$$
J_k^\lambda(p_k) = \min_{u_k \in U} \big\{ p_k^T g^\lambda(u_k) + A(p_k, u_k) \big\},
\tag{2.17}
$$

where

$$
A(p_k, u_k) =
\begin{cases}
\displaystyle \sum_{\theta=1}^n \mathbf{1}_n^T [P \circ r(u_k, \theta)]^T p_k \; J_{k+1}^\lambda\!\left( \frac{[P \circ r(u_k, \theta)]^T p_k}{\mathbf{1}_n^T [P \circ r(u_k, \theta)]^T p_k} \right), & u_k \neq \mathbf{0}_K \\[2ex]
J_{k+1}^\lambda(P p_k), & u_k = \mathbf{0}_K.
\end{cases}
\tag{2.18}
$$

The cost-to-go function for $k = L-1$ is given by

$$
J_{L-1}^\lambda(p_{L-1}) = \min_{u_{L-1} \in U} \big\{ p_{L-1}^T g^\lambda(u_{L-1}) \big\}.
\tag{2.19}
$$

Proof. The DP algorithm [14] for a general, controlled Markov chain system with state $x_k$, control $u_k$, disturbance $w_k$, instantaneous cost $f_k(x_k, u_k, w_k)$ and belief state update rule $\Phi_k(p_k, u_k, z_{k+1})$ can be written as follows

$$
J_k(p_k) = \min_{u_k \in U} E_{x_k, w_k, z_{k+1}}\Big\{ f_k(x_k, u_k, w_k) + J_{k+1}\big( \Phi_k(p_k, u_k, z_{k+1}) \big) \,\Big|\, I_k, u_k \Big\}.
\tag{2.20}
$$

The first term, which represents the current cost of selecting control $u_k$, can be computed as follows

$$
E_{x_k, w_k, z_{k+1}}\big\{ f_k(x_k, u_k, w_k) \mid I_k, u_k \big\} = \sum_{i=1}^n g^\lambda(i, u_k)\, P(x_k = i \mid I_k) = p_k^T g^\lambda(u_k).
\tag{2.21}
$$

The second term, which represents the expected future cost of selecting control $u_k \neq \mathbf{0}_K$, can be determined as follows

$$
E_{x_k, w_k, z_{k+1}}\big\{ J_{k+1}\big( \Phi_k(p_k, u_k, z_{k+1}) \big) \mid I_k, u_k \big\} = \sum_{\theta=1}^n P(z_{k+1} = \theta \mid p_k, u_k)\, J_{k+1}^\lambda\big( \Phi_k(p_k, u_k, \theta) \big),
\tag{2.22}
$$

where

$$
P(z_{k+1} = \theta \mid p_k, u_k) = \sum_{i=1}^n p_k^i \sum_{j=1}^n P(x_{k+1} = j \mid x_k = i)\, P(z_{k+1} = \theta \mid x_k = i, x_{k+1} = j, u_k)
= \sum_{i=1}^n p_k^i \sum_{j=1}^n P_{j|i}\, r(i, u_k, j, \theta).
$$
(2.23)

For control input $\mathbf{0}_K$, the current cost is still determined by (2.21), while the expected future cost becomes $J_{k+1}^\lambda(P p_k)$ due to receiving no observations. Combining (2.20)–(2.23) and expressing them in vector form results in (2.17) and (2.18).

If $u_k^* = \eta_k^*(p_k)$ minimizes the right-hand side of the DP algorithm for each $k$, the optimal sensor selection policy will be $\eta^* = \{\eta_0^*, \eta_1^*, \ldots, \eta_{L-1}^*\}$. Determining the optimal policy using DP is computationally expensive due to an exponentially large control space $U$ and the dependence of the observation probabilities on the previous system state. In addition, as with traditional POMDPs, the belief space $\mathcal{P}$ is uncountably infinite. Specifically, for $n$ possible system states and quantization of the belief space $\mathcal{P}$ with resolution $d$, the number of belief states is $O((d+1)^n)$, resulting in complexity $O(n^3 (d+1)^n \alpha L)$ for determining the optimal sensor selection policy.

2.5 Approximation Schemes

In the previous section, we argued that optimal solutions using dynamic programming are computationally expensive. Herein, we determine low-cost suboptimal solutions that can be implemented and run in real time.

2.5.1 MIC–T3S Algorithm

We begin by proving a number of important properties of the function $J_k^\lambda(p_k)$, which we then utilize to devise a practically implementable scheme. Lemma 3 states that the function $J_k^\lambda(p_k)$ is positively homogeneous of degree 1. This can be shown by induction.

Lemma 3. The function $J_k^\lambda(p_k)$, $k = L-1, L-2, \ldots, 0$, is positively homogeneous of degree 1, i.e.,

$$
J_k^\lambda(\mu p_k) = \mu J_k^\lambda(p_k), \quad \forall \mu > 0.
\tag{2.24}
$$

Proof. At time step $L-1$, we have that

$$
J_{L-1}(\mu p_{L-1}) = \min_{u_{L-1} \in U} \big\{ \mu p_{L-1}^T g(u_{L-1}) \big\}
= \min_{u_{L-1} \in U} \Big\{ \sum_{i=1}^n \mu p_{L-1}^i g(i, u_{L-1}) \Big\}
= \mu \min_{u_{L-1} \in U} \Big\{ \sum_{i=1}^n p_{L-1}^i g(i, u_{L-1}) \Big\}
= \mu \min_{u_{L-1} \in U} \big\{ p_{L-1}^T g(u_{L-1}) \big\}
= \mu J_{L-1}(p_{L-1}).
$$

Next, we assume that at time step $k+1$, $J_{k+1}(\mu p_{k+1}) = \mu J_{k+1}(p_{k+1})$, and we will prove that $J_k(\mu p_k) = \mu J_k(p_k)$.
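To make the recursion (2.17)–(2.19) concrete, the sketch below evaluates the cost-to-go at a given belief by direct recursion over observation outcomes. This is illustrative only: it is exponential in the horizon, so it is viable only for small $L$, and the function signature, the `None` encoding of the $\mathbf{0}_K$ input, and the assumption that observations take $n$ values are our own choices, not the thesis implementation.

```python
import numpy as np

def cost_to_go(p, k, L, P, r, g, controls, n_obs):
    """Evaluate J_k(p) via the recursion (2.17)-(2.19).

    p        : (n,) belief state
    P        : (n, n) transition matrix, P[i, j] = P(x_{k+1}=j | x_k=i)
    r        : r(u, theta) -> (n, n) observation matrix for control u and
               observation value theta (hypothetical callback)
    g        : g(u) -> (n,) per-state cost vector g_lambda(., u)
    controls : list of controls; None plays the role of the 0_K input
    """
    costs = []
    for u in controls:
        c = p @ g(u)                       # current cost p^T g(u)
        if k < L - 1:                      # add expected future cost A(p, u)
            if u is None:                  # no sensors: deterministic prediction
                c += cost_to_go(P.T @ p, k + 1, L, P, r, g, controls, n_obs)
            else:
                for theta in range(n_obs):
                    unnorm = (P * r(u, theta)).T @ p   # [P ∘ r(u,θ)]^T p
                    w = unnorm.sum()                   # P(z_{k+1}=θ | p, u)
                    if w > 0:
                        c += w * cost_to_go(unnorm / w, k + 1, L,
                                            P, r, g, controls, n_obs)
        costs.append(c)
    return min(costs)
```

The inner loop weights each successor value $J_{k+1}$ by the observation probability, exactly as in (2.18); normalizing `unnorm` reproduces the belief update (2.10).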
Specifically, we work as follows:

$$
J_k(\mu p_k) = \min_{u_k \in U} \Big\{ \mu p_k^T g(u_k) + \sum_{\theta=1}^n \mathbf{1}_n^T [P \circ r(u_k, \theta)]^T \mu p_k \; J_{k+1}(p_{k+1} \mid p_k, u_k, \theta),\;
\mu p_k^T g^\lambda(\mathbf{0}_K) + J_{k+1}^\lambda(P \mu p_k) \Big\}
$$
$$
= \mu \min_{u_k \in U} \Big\{ p_k^T g(u_k) + \sum_{\theta=1}^n \mathbf{1}_n^T [P \circ r(u_k, \theta)]^T p_k \; J_{k+1}(p_{k+1} \mid p_k, u_k, \theta),\;
p_k^T g^\lambda(\mathbf{0}_K) + J_{k+1}^\lambda(P p_k) \Big\}
= \mu J_k(p_k),
\tag{2.25}
$$

and this step completes the proof.

For any control input $u_k \in U \setminus \mathbf{0}_K$, the term $\mathbf{1}_n^T [P \circ r(u_k, \theta)]^T p_k$ is a scalar. Therefore, by setting $\mu = \mathbf{1}_n^T [P \circ r(u_k, \theta)]^T p_k$ and taking advantage of Lemma 3, the term $A(p_k, u_k)$ given in (2.18) for all $u_k \in U \setminus \mathbf{0}_K$ can be simplified as shown in the following lemma.

Lemma 4. The function $A(p_k, u_k)$ can be written in the following simpler form

$$
A(p_k, u_k) =
\begin{cases}
\sum_{\theta=1}^n J_{k+1}^\lambda\big( [P \circ r(u_k, \theta)]^T p_k \big), & u_k \neq \mathbf{0}_K \\
J_{k+1}^\lambda(P p_k), & u_k = \mathbf{0}_K
\end{cases}
\tag{2.26}
$$

for $k = L-2, \ldots, 0$.

Using Lemma 4 and the DP algorithm given in (2.17)–(2.19), we prove Lemma 5 by induction.

Lemma 5. The function $J_k^\lambda(p_k)$, $k = L-1, \ldots, 0$, is concave and piecewise linear.

Proof. At time step $L-1$, we have that

$$
J_{L-1}^\lambda(p_{L-1}) = \min\big\{ p_{L-1}^T \zeta_{L-1}^1, \ldots, p_{L-1}^T \zeta_{L-1}^\alpha \big\},
\tag{2.27}
$$

where $\zeta_{L-1}^q = g^\lambda(u^q)$, $q \in \{1, \ldots, \alpha\}$. The term $p_{L-1}^T \zeta_{L-1}^q$ is linear with respect to $p_{L-1}$, and since the minimum of linear functions is a concave, piecewise linear function, we conclude that $J_{L-1}^\lambda(p_{L-1})$ is a concave, piecewise linear function. Next, we assume that $J_{k+1}^\lambda(p_{k+1}) = \min\{ p_{k+1}^T \zeta_{k+1}^1, \ldots, p_{k+1}^T \zeta_{k+1}^\alpha \}$. Then:

$$
J_k^\lambda(p_k) = \min_{u_k \in U} \Big\{ p_k^T g^\lambda(u_k) + \sum_{\theta=1}^n J_{k+1}^\lambda\big( [P \circ r(u_k, \theta)]^T p_k \big),\;
p_k^T g^\lambda(\mathbf{0}_K) + J_{k+1}^\lambda(P p_k) \Big\}
$$
$$
= \min_{u_k \in U} \Big\{ p_k^T \Big( g^\lambda(u_k) + \sum_{\theta=1}^n \min_{q \in \{1,\ldots,\alpha\}} [P \circ r(u_k, \theta)]\, \zeta_{k+1}^q \Big),\;
p_k^T \Big( g^\lambda(\mathbf{0}_K) + \min_{q \in \{1,\ldots,\alpha\}} P^T \zeta_{k+1}^q \Big) \Big\}
= \min\big\{ p_k^T \zeta_k^1, \ldots, p_k^T \zeta_k^\alpha \big\}.
\tag{2.28}
$$

Since $J_k^\lambda(p_k)$ can be written as the minimum of linear functions, we conclude that it is a concave, piecewise linear function.

Next, we state an important theorem that will be useful in developing our algorithmic scheme.

Theorem 6 (Rockafellar and Wets [102]). A function $\Gamma : \mathbb{R}^a \to \mathbb{R}^b$ is superlinear if and only if $\Gamma$ is positively homogeneous and concave.
Moreover, in that case, $\Gamma\big(\sum_{i=1}^n f_i\big) \geq \sum_{i=1}^n \Gamma(f_i)$.

According to Lemmas 3 and 5, $J_k^\lambda(p_k)$ is positively homogeneous and concave. Theorem 6 shows that $J_k^\lambda(p_k)$ is also superlinear. Thus, if we express any arbitrary belief state $p_k$ as

$$
p_k = p_k^1 s_1 + \ldots + p_k^n s_n,
\tag{2.29}
$$

where $s_i$ is the $n$-dimensional vector with 1 in the $i$-th position and zeros elsewhere, then the following inequality holds for the function $J_k^\lambda(p_k)$:

$$
J_k^\lambda(p_k) \geq p_k^1 J_k^\lambda(s_1) + \ldots + p_k^n J_k^\lambda(s_n).
\tag{2.30}
$$

Optimizing each individual term of the sum on the right-hand side of (2.30) gives us the optimal sensor selection policy for the states in the corners of the belief space $\mathcal{P}$, weighted by the probability of being in these states. Since each control input $u_k$ corresponds to a vector indicating the number of received samples from each sensor, according to (2.30), the control at an arbitrary belief state $p_k$ can be computed by time-sharing as $\hat{u}_k^{p_k} = p_k^1 \hat{u}_k^{s_1} + \ldots + p_k^n \hat{u}_k^{s_n}$. However, the result of the above operation may lead to non-integer values, and thus we need to apply hard decisions at the end. Specifically, we apply a minimum distance rule, where $\hat{u}_k^{p_k}$ is assigned the closest control. Ties are broken arbitrarily. Selecting the optimal method for making hard decisions was beyond the scope of this work.

Next, we substitute the right-hand side of (2.30) into the simplified version of the DP algorithm given by (2.17) and (2.26), obtaining the following approximation:

$$
\hat{J}_k^\lambda(p_k) \approx \min_{u_k \in U} \Big\{ p_k^T g^\lambda(u_k) + \sum_{\theta=1}^n \sum_{i=1}^n \big[ [P \circ r(u_k, \theta)]^T p_k \big]_i\, \hat{J}_{k+1}^\lambda(s_i),\;
p_k^T g^\lambda(\mathbf{0}_K) + \sum_{i=1}^n [P p_k]_i\, \hat{J}_{k+1}^\lambda(s_i) \Big\}
\tag{2.31}
$$

for $k = L-2, \ldots, 0$, where $[\,\cdot\,]_i$ denotes the $i$-th element. We propose the Minimum Integrated Cost Time Sharing Sensor Selection (MIC–T3S) algorithm, consisting of the following two parts:

i. Offline: Compute the control policy at the corners of the simplex $\mathcal{P}$ for every time step in the horizon of interest using Algorithm 1.
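The time-sharing step followed by the minimum-distance hard decision can be sketched as follows; this is a minimal illustration, and the function name and array layouts are our own assumptions.

```python
import numpy as np

def time_share_control(p, corner_controls, control_set):
    """Time-share the corner policies as in (2.30) and round to a valid control.

    p               : (n,) belief state
    corner_controls : (n, K) array; row i is the control chosen at corner s_i
    control_set     : (alpha, K) array of all admissible controls
    """
    u_mix = p @ corner_controls                      # p^1 u^{s_1} + ... + p^n u^{s_n}
    d = np.linalg.norm(control_set - u_mix, axis=1)  # minimum-distance hard decision
    return control_set[np.argmin(d)]                 # ties broken by first index
```

Here `np.argmin` returns the first minimizer, which realizes the "ties are broken arbitrarily" rule in a deterministic way.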
Determining the sensor selection policy is of time complexity $O(n^4 \alpha L)$ and space complexity $O(n^3 \alpha)$. This part can be implemented at a remote back-end server.

ii. Online: Use the time-sharing approach introduced above in conjunction with the update rule for $p_k$ given in (2.10) during the normal operation of the system. Executing the sensor selection policy is of time complexity $O(n^4)$ and space complexity $O(n^3 \alpha)$.

Algorithm 1 MIC–T3S (Offline part)
Input: transition matrix $P$, observation matrix $r(u_k, z_{k+1})$, costs $g^\lambda(u^q)$, $q = 1, \ldots, \alpha$, horizon length $L$
Output: cost and policy at the corners of the simplex $\mathcal{P}$
1: Use (2.19) to compute costs $\hat{J}_{L-1}^\lambda(s_1), \ldots, \hat{J}_{L-1}^\lambda(s_n)$ and controls $\hat{u}_{L-1}^{s_1}, \ldots, \hat{u}_{L-1}^{s_n}$ in belief states $s_1, \ldots, s_n$ for time step $L-1$
2: for $k = L-2 : 0$ do
3:   Use (2.31) to compute costs $\hat{J}_k^\lambda(s_1), \ldots, \hat{J}_k^\lambda(s_n)$ and controls $\hat{u}_k^{s_1}, \ldots, \hat{u}_k^{s_n}$ in belief states $s_1, \ldots, s_n$ at time step $k$
4: end for

2.5.2 Constrained Approximations

In this subsection, we focus on the case of $\lambda = 1$; hence, we consider the following cost function

$$
J^1 = E\Big\{ \sum_{k=0}^{L-1} g^1(x_k, u_k) \Big\} = E\Big\{ \sum_{k=0}^{L-1} E(u_k) \Big\}.
\tag{2.32}
$$

Minimizing $J^1$ without imposing any constraints results in the policy with the minimum energy consumption, while probably achieving poor detection performance. To avoid this pathological case, we introduce two instantaneous constraints, resulting in two approximation schemes.

E2MBADP Algorithm. In this case, our goal is to minimize (2.32) while enforcing the system to select controls that will lead to high-certainty beliefs. Formally, this is expressed as follows:

$$
\min_{u_0, \ldots, u_{L-1}} J^1 \quad \text{subject to} \quad \max(p_k) \geq \tau,
\tag{2.33}
$$

where $\tau \in [0, 1]$. We compute sensor selection policies using dynamic programming, pruning infeasible controls as we go based on the constraint in (2.33), as described in Algorithm 2. When the constraint is not satisfied, we select the control input that gives detection accuracy closest to the threshold $\tau$.
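The offline part of MIC–T3S evaluates the approximation (2.31) only at the simplex corners $s_1, \ldots, s_n$, which collapses the backward pass into a small table computation. The sketch below assumes a row-stochastic $P$ (`P[i, j]` $= P(x_{k+1}=j \mid x_k=i)$), a hypothetical callback `r(u, theta)`, and the convention that index 0 of the control list is the $\mathbf{0}_K$ input; none of these conventions come from the thesis itself.

```python
import numpy as np

def mic_t3s_offline(P, r, g, controls, n_obs, L):
    """Offline part of MIC-T3S (Algorithm 1): costs and controls at corners.

    P        : (n, n) transition matrix, P[i, j] = P(x_{k+1}=j | x_k=i)
    r        : r(u, theta) -> (n, n) observation matrix (hypothetical callback)
    g        : g(u) -> (n,) per-state cost vector
    controls : list of controls; index 0 is assumed to be the 0_K input
    Returns J (L, n) corner costs and pi (L, n) indices of chosen controls.
    """
    n = P.shape[0]
    J = np.zeros((L, n))
    pi = np.zeros((L, n), dtype=int)
    for k in range(L - 1, -1, -1):
        for i in range(n):                      # corner belief s_i
            best = (np.inf, 0)
            for a, u in enumerate(controls):
                c = g(u)[i]                     # current cost s_i^T g(u)
                if k < L - 1:                   # expected future corner costs
                    if a == 0:                  # 0_K: predict with row i of P
                        c += P[i] @ J[k + 1]
                    else:                       # weights [P ∘ r(u,θ)]^T s_i
                        for theta in range(n_obs):
                            c += (P[i] * r(u, theta)[i]) @ J[k + 1]
                if c < best[0]:
                    best = (c, a)
            J[k, i], pi[k, i] = best
    return J, pi
```

At a corner $s_i$, the weight of $\hat{J}_{k+1}^\lambda(s_j)$ in (2.31) reduces to $P_{j|i}\, r(i, u_k, j, \theta)$, which is the `P[i] * r(u, theta)[i]` product above; this is what makes the offline pass $O(\text{poly}(n))$ per stage rather than exponential in $n$.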
Algorithm 2 E2MBADP
Input: set of belief states $\{p_1, \ldots, p_{bf}\}$, transition matrix $P$, observation matrix $r(u_k, z_{k+1})$, costs $E(u^q)$, $q = 1, \ldots, \alpha$, horizon length $L$
Output: cost and policy for each belief state in $\{p_1, \ldots, p_{bf}\}$
1: Use (2.19) to compute costs $J_{L-1}^1(p_1), \ldots, J_{L-1}^1(p_{bf})$ and controls $u_{L-1}^{p_1}, \ldots, u_{L-1}^{p_{bf}}$ in belief states $p_1, \ldots, p_{bf}$ for time step $L-1$. Use (2.33) to decimate the set of available controls.
2: for $k = L-2 : 0$ do
3:   Use (2.17) to compute costs $J_k^1(p_1), \ldots, J_k^1(p_{bf})$ and controls $u_k^{p_1}, \ldots, u_k^{p_{bf}}$ in belief states $p_1, \ldots, p_{bf}$ at time step $k$. Use (2.33) to decimate the set of available controls.
4: end for

GME2PS2 Algorithm. In this case, our goal is to minimize (2.32) while enforcing the system to select controls that will give rise to low worst-case error probabilities in the next time step. Formally, this is expressed as follows:

$$
\min_{u_0, \ldots, u_{L-1}} J^1 \quad \text{subject to} \quad P_e^W(x_k, u_k) \leq 1 - \tau,
\tag{2.34}
$$

where $\tau$ is defined as before. We propose a greedy algorithm to solve (2.34). At each time step, given a belief state $p_k$, the algorithm estimates the underlying activity using the ML detector. Subsequently, the algorithm selects the control input $u_k$ that minimizes the instantaneous energy cost $E(u_k)$ subject to $P_e^W(\hat{x}_k^{ML}, u_k) \leq 1 - \tau$. If at any step the constraint in (2.34) is not satisfied, the algorithm selects the control input that gives detection accuracy closest to the threshold $1 - \tau$.

The goal of both E2MBADP and GME2PS2 is to minimize the energy cost. In both cases, if the constraint is not satisfied, the selected control is the "best of the worst" controls, the one that achieves accuracy closest to the desired threshold. The key difference between the two schemes lies in the constraint definition and the steps each algorithm performs to satisfy the corresponding constraint. E2MBADP selects controls on the basis that they will give rise to high-certainty beliefs while keeping energy consumption as low as possible.
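A single greedy GME2PS2 decision, as described above, can be sketched in a few lines. This is a minimal illustration, assuming the worst-case error probabilities $P_e^W(x, u)$ have been precomputed into a table; the function name and array layout are hypothetical.

```python
import numpy as np

def gme2ps2_step(p, energies, worst_err, tau):
    """One greedy GME2PS2 decision: the cheapest control whose worst-case
    error at the ML state estimate satisfies the constraint in (2.34).

    p         : (n,) belief state
    energies  : (alpha,) instantaneous energy cost E(u) per control
    worst_err : (n, alpha) table, worst_err[x, a] = P_e^W(x, u_a)
    tau       : accuracy threshold in [0, 1]
    Returns the index of the selected control.
    """
    x_ml = int(np.argmax(p))                     # ML state estimate from belief
    feasible = np.where(worst_err[x_ml] <= 1.0 - tau)[0]
    if feasible.size > 0:                        # cheapest feasible control
        return int(feasible[np.argmin(energies[feasible])])
    # constraint unsatisfiable: "best of the worst", closest to the threshold
    return int(np.argmin(np.abs(worst_err[x_ml] - (1.0 - tau))))
```

The fallback branch implements the "control closest to the threshold $1-\tau$" rule used when no control satisfies the constraint.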
On the other hand, GME2PS2 exploits system memory by using the belief state as input to the ML detector and then using the ML estimate as the basis for deciding between controls. We expect that for low values of $\tau$, GME2PS2 will achieve better detection accuracy than E2MBADP since, for the latter scheme, there are situations where, even though the constraint in (2.33) is satisfied, there is a non-zero belief for the rest of the states. If these states are "highly" confusable with the true state, more detection errors will occur. On the other hand, GME2PS2 cleverly selects between available controls to also bound the worst-case error probability, satisfying the constraint in (2.34) and thus decreasing the error probability. As a result, GME2PS2 will have slightly higher energy consumption.

2.6 Case Study: The KNOWME Network

In this section, we evaluate the performance of the schemes presented in Sections 2.4 and 2.5. Our simulations are driven by experimental data collected by a prototype WBAN, the KNOWME network [114], and energy costs experimentally determined during its operation.

The KNOWME network [114] consists of a Nokia N95 fusion center with a built-in tri-axial ACC that samples at 30 Hz, and a commercial Bluetooth-enabled ECG, which is also equipped with a tri-axial ACC. The ECG samples at 300 Hz, while its built-in ACC samples at 75 Hz. The external sensors simply transmit data to the fusion center via Bluetooth, while the mobile phone performs coordination, processing, computation and sample collection. The energy consumption of receiving data from each of the sensors has been experimentally determined [114] to be 0.063 W for the Nokia N95 internal ACC, 0.108 W for the ECG and 0.084 W for the external ACC. The difference in reception power for the last two sensors can be explained by their different data rates. Clearly, the use of internal sensors or no sensors at all is more energy efficient compared to Bluetooth communication of samples from the external sensors.
In addition, the number of samples selected from each sensor quantifies the energy spent by the cellphone, specifying the energy savings achieved if fewer samples are received. On the other hand, if either only the internal sensors or no sensors are used, the resulting quality of activity detection may be poor.

2.6.1 Simulations Framework, Baseline and Metrics

Data collection was conducted in the lab and consisted of three to four sessions, where twelve subjects were required to perform eight specific categories of physical activities.

Figure 2.1: Markov chain of four physical activities {Sit, Stand, Run, Walk} [114].

A detailed description of the data collection process, protocols and test subject characteristics can be found in [114]. For brevity, we report results for two individuals only. Here, we focus on distinguishing between four activities {Sit, Stand, Run, Walk}. We use three features extracted from the biometric signals: 1) the ACC Mean, the average of the acceleration signal means in each axis for an ACC data window, from the internal ACC, 2) the ACC Variance, the average of the acceleration signal variances in each axis for an ACC data window, and 3) the ECG Period, the ECG waveform inter-peak period, from the standalone WBAN sensor. These features belong to the optimal feature set for the state detection problem determined by the filter-based feature selection method proposed in [114]. The statistics of the Markov chain and the state distributions for the three features, for two individuals, are shown in Figs. 2.1 and 2.2, respectively. Based on our earlier system description, the costs of receiving one sample from each sensor are $\delta_{\text{ACC Mean}} = 0.58$, $\delta_{\text{ACC Variance}} = 0.776$ and $\delta_{\text{ECG Period}} = 1$, respectively [114].
All simulations for the DP-based algorithms were performed using a sliding window technique, where the control policy is derived for a fixed horizon length, the first control input is employed for the current time slot, the window is slid by one time slot, and the process is repeated for the whole length of the experiment. Unless otherwise stated, all simulations were performed for horizon length $L = 5$ and total number of samples $N = 12$, and were averaged over $10^6$ Monte Carlo runs. The value of the horizon length was empirically selected based on how negligible the impact of the control process was on future states after a certain horizon length.

Figure 2.2: Gaussian distributions associated with each of the four activities for the ACC Mean, ACC Variance and ECG Period features for two different subjects. (a) Subject 1, (b) Subject 2.

We compare our algorithms against an Equal Allocation (EA) scheme, which selects an equal number of samples from each sensor, irrespective of the system state. The total number of samples selected by EA is always equal to the total number of available samples. The metrics we employed are:

i. the Average Detection Error (ADE), defined as

$$
\bar{P}_{de} \doteq \frac{1}{MC} \sum_{k=1}^{MC} 1_{\{x_k \neq \hat{x}_k\}},
\tag{2.35}
$$

ii. the Average Unnormalized Energy Cost (AUEC), defined as

$$
\bar{E} \doteq \frac{1}{MC} \sum_{k=1}^{MC} e(u_k),
\tag{2.36}
$$

iii. the Average Total Cost (ATC), defined as

$$
\bar{g} \doteq (1-\lambda)\,\bar{P}_{de} + \lambda\,\bar{\mathcal{E}},
\tag{2.37}
$$

Figure 2.3: Performance of EA, DP and MIC–T3S with respect to ATC in the case of $N = 12$ samples for varying values of the weight factor $\lambda$.
where $MC$ denotes the number of time slots for which the system was simulated and $\bar{\mathcal{E}}$ is the average normalized energy cost, defined similarly to the AUEC. MIC–T3S, E2MBADP and GME2PS2 are also compared with respect to the average allocation of samples per sensor and the average allocation of samples per sensor, per activity state.

2.6.2 Numerical Results

In this section, we first briefly comment on the form of the optimal control policy computed by DP. We then illustrate simulation results for the proposed schemes and comment on their performance.

The optimal control determined by DP depends on the current belief state only, instead of the history of previous states and controls. The optimal control, however, does not depend on time, as an outcome of the Markov chain being homogeneous and the use of time-invariant performance measures. Our simulation results indicate that the number of unique controls selected in total is considerably smaller than the total number of available controls.

Figure 2.4: Trade-off between AUEC and ADE for DP and MIC–T3S in the case of $N = 12$ samples for two individuals. EA performance for varying values of $N$ is also provided for reference.

Fig. 2.2 shows that for both subjects, using one of the features/sensors allows us to effectively discriminate between a subset of the activities. Hence, a combination of samples from the appropriate sensors suffices to accurately determine the system state at each time step. The optimal combination of samples is then a function of the desired energy and accuracy level. An exhaustive search over the whole control space can be avoided using a heuristic that selects a proper subset of controls. Our simulation results also verify that the cost-to-go functions are indeed concave and piecewise linear.
This implies that we can determine each segment participating in the cost-to-go functions by its endpoints and thus define the optimal sensor selection policy as a threshold rule, reducing complexity.

Fig. 2.3 shows the ATC achieved by EA, DP and MIC–T3S with respect to the weight factor $\lambda$ for the statistics in Fig. 2.2a. Both DP and MIC–T3S optimize the average total cost, as expected. In fact, their ATC is bounded and does not increase linearly with $\lambda$, in contrast to EA. In addition, MIC–T3S achieves near-optimal performance since its ATC closely follows DP's. Similar trends are observed for the statistics in Fig. 2.2b.

Figure 2.5: Average allocation of samples per sensor for MIC–T3S in the case of $N = 12$ samples for varying values of the weight factor $\lambda$ for two individuals. (a) Subject 1, (b) Subject 2.

Fig. 2.4 demonstrates the trade-off between AUEC and ADE for DP and MIC–T3S for two different individuals. EA performance is also provided for reference for $N \in \{3, 6, 9, 12\}$. The trends that the schemes under question exhibit become apparent even though the number of samples seems few. For a fixed value of $N$, EA achieves a certain detection accuracy by spending a specific amount of energy, while DP's and MIC–T3S's accuracy varies as a function of energy consumption. In particular, with increasing energy consumption, the detection error decreases. In any case, DP and MIC–T3S exhibit energy gains ranging from 29% to 74%, for detection accuracy equal to EA's, as shown in Table 2.1. This implies that careful sensor and listening strategy selection can result in significant energy savings, as compared to receiving an equal number of samples from all sensors.
When maximum detection accuracy is desired, DP and MIC–T3S achieve lower detection error than EA since they employ a smarter sensor selection strategy, depending on the current system state. On the contrary, when the goal is minimum energy consumption, DP and MIC–T3S achieve 100% energy gains but perform worse than EA with respect to detection accuracy. Finally, different energy gains can be observed for different subjects, as expected, due to the variability of user statistics. Nevertheless, energy gains increase with $N$.

Figure 2.6: Trade-off between AUEC and ADE for E2MBADP and GME2PS2 in the case of $N = 12$ samples for two individuals. EA performance for varying values of $N$ is also provided for reference.

Fig. 2.5 presents the average allocation of samples per sensor for MIC–T3S as a function of the weight factor $\lambda$ for the two individuals. We observe that the allocation of samples differs significantly between individuals as a result of different user statistics. For example, for individual 1, no samples are selected from the ECG Period because of its energy-expensive nature and low resolution capabilities (see Fig. 2.2a), while for individual 2, most samples come from the ECG Period. In any case, when energy efficiency is desired, no samples are selected from any sensor, whereas when detection accuracy is crucial, all available samples are used.

Fig. 2.6 shows the trade-off between AUEC and ADE for E2MBADP and GME2PS2 for the two individuals.
EA performance is also provided for reference for $N \in \{3, 6, 9, 12\}$. Both schemes exhibit similar trade-off curves, with any differences resulting from the inherently different detection accuracy criteria they employ. Note that for a scheme to achieve better detection accuracy, it must relax the energy minimization criterion, resulting in higher energy consumption, and vice versa. For example, when minimal energy consumption is desired, E2MBADP achieves lower AUEC but higher detection error, while GME2PS2, due to its more stringent accuracy constraint, achieves lower detection error while incurring a higher energy cost. In general, both schemes' detection error monotonically decreases as a function of energy consumption. For detection accuracy equal to EA's, energy gains vary from 22% to 66% (68%) for E2MBADP (GME2PS2), as shown in Table 2.1, depending on the underlying user statistics. We can still achieve energy gains on the order of 100% for E2MBADP (94% for GME2PS2), but only if we sacrifice detection accuracy.

Figure 2.7: Average allocation of samples per sensor for E2MBADP and GME2PS2 in the case of $N = 12$ samples for varying values of the accuracy threshold $\tau$ for two individuals. (a) Subject 1, (b) Subject 2.

Fig. 2.7 illustrates the average allocation of samples per sensor for E2MBADP and GME2PS2 with respect to the threshold $\tau \in [0.95, 1]$ for the two individuals. Both schemes follow similar strategies for sensor and sample selection as DP and MIC–T3S.
Table 2.1: Energy gains achieved by DP, MIC–T3S, E2MBADP and GME2PS2 for varying values of N when detection accuracy is set equal to EA's accuracy.

       |           | DP/MIC–T3S | E2MBADP | GME2PS2
N = 3  | Subject 1 |    56%     |   50%   |   48%
       | Subject 2 |    29%     |   22%   |   22%
N = 6  | Subject 1 |    65%     |   62%   |   60%
       | Subject 2 |    46%     |   46%   |   46%
N = 9  | Subject 1 |    74%     |   63%   |   64%
       | Subject 2 |    44%     |   49%   |   46%
N = 12 | Subject 1 |    67%     |   66%   |   68%
       | Subject 2 |    53%     |   53%   |   51%

For values of $\tau < 0.95$ (not shown in Fig. 2.7), E2MBADP selects, on average, fewer samples, and sometimes no samples at all. In contrast, GME2PS2 always selects at least one sample, either from the most energy-efficient sensor or from the most informative one, since the worst-case error probability of selecting no samples at all is one and thus prohibitive. In general, for low values of $\tau$, both schemes receive mostly from the ACC Mean, since it is the most energy-efficient sensor. As $\tau$ increases, the number of samples selected by the two algorithms is comparable, and any differences are attributed to the detection accuracy criterion each employs, which also explains the corresponding ADE and AUEC values.

2.6.3 Comparison

In this section, we comment on the similarities and differences between our proposed schemes. We first note that inter-user variability significantly determines the resulting energy savings. This suggests that personalized training would greatly increase the battery life of the fusion center. Further, different sets of features could potentially lead to better detection accuracy performance (e.g., see [68]), which in turn can significantly affect the energy gains achieved by each scheme.

Figure 2.8: Average allocation of samples per sensor and activity state for MIC–T3S, E2MBADP and GME2PS2 when detection accuracy is set equal to EA's accuracy (upper plot: Subject 1, lower plot: Subject 2).
The ADE achieved by all schemes is comparable, with specific values depending on the weight factor $\lambda$ or the threshold $\tau$, and the total number of samples $N$. For low values of $\tau$, however, GME2PS2 achieves better detection accuracy than the rest of the schemes, while sacrificing energy consumption. Next, if we impose EA accuracy, we observe from Table 2.1 that for low values of $N$, there are differences between the energy gains achieved by each scheme. However, as $N$ increases, these differences practically disappear. In general, achieving both low ADE and low AUEC constitutes a conflicting goal. Depending on the values of $\lambda$ and $\tau$, each of these schemes can achieve a better AUEC by relaxing the constraint performance for ADE, and vice versa. Determining the optimal values of $\lambda$ and $\tau$ was out of the scope of this work.

Fig. 2.8 illustrates the average allocation of samples per sensor and activity state when we impose EA accuracy for MIC–T3S, E2MBADP and GME2PS2 for the two individuals. Sample allocation depends on the system state and the underlying user statistics, as expected. However, for a given individual and a specific state, all algorithms use the same set of sensors. We also notice that all schemes require, on average, much less than the total number of available samples to achieve EA accuracy.

Figure 2.9: Belief state temporal evolution and corresponding number of received samples from each sensor for MIC–T3S in the case of $N = 12$ samples. More emphasis is on minimizing energy consumption. The underlying true state is also provided for reference. Time slots for which the underlying true state does not change are grouped together by an ellipsoid. (a) Belief state temporal evolution, (b) Temporal allocation of samples per sensor.
Their smart selection strategy employs fewer but more informative samples, resulting in a reduction in sample usage as high as 69%, without sacrificing accuracy, while still achieving important energy gains. Of course, this reduction in sample usage is highly user and state dependent, as seen in Fig. 2.8. Last but not least, MIC–T3S and E2MBADP exhibit comparable allocations, in contrast to GME2PS2. In fact, the latter algorithm tends to select more samples in the case of the second individual, due to the resolution of the user statistics and its greedy nature, which focuses on short-term rewards and ignores long-term consequences.

Finally, Figs. 2.9 and 2.10 show the belief state evolution and the corresponding allocation of samples per sensor with respect to time when MIC–T3S is employed, for two distinct scenarios. These results are based on the user statistics shown in Fig. 2.2a. Similar results hold for the user statistics shown in Fig. 2.2b. Fig. 2.9 shows MIC–T3S's performance when more emphasis is placed on minimizing energy consumption, while Fig. 2.10 presents MIC–T3S's performance when maximizing detection accuracy is more desirable.

Figure 2.10: Belief state temporal evolution and corresponding number of received samples from each sensor for MIC–T3S in the case of $N = 12$ samples. More emphasis is on maximizing detection accuracy. The underlying true state is also provided for reference. Time slots for which the underlying true state does not change are grouped together by an ellipsoid. (a) Belief state temporal evolution, (b) Temporal allocation of samples per sensor.

We observe that the former scenario includes belief states that are more dispersed, in contrast to the latter, where belief states are predominantly delta-shaped functions.
Furthermore, in the latter case, sensors are selected based on their detection capabilities with respect to what is believed to be the true system state. Conversely, when the goal is to optimize energy consumption, the fusion center either receives only from the most energy-efficient sensor, i.e., the ACC Mean, or does not receive samples at all. In any case, we see that exploiting available information leads to a significant decrease in the selected number of samples. In general, MIC–T3S tracks the underlying system state satisfactorily even when energy consumption minimization is its primary goal. Similar results hold for the other two schemes, with the exception that in the first scenario, E2MBADP and GME2PS2 always select one sample from the ACC Mean and thus incur fewer detection errors. This behavior is attributed to the different decision structure of these schemes.

Figure 2.11: Average allocation of samples for GME2PS2 in the case of $N = 12$ samples for varying values of the accuracy threshold $\tau$ and different Markov chain transition matrices ($P_0$; $P_1$, in between; $P_2$, low uncertainty; $P_3$, highest uncertainty). As the evolution uncertainty increases, more samples are selected on average.

2.6.4 Effect of the transition matrix

In this section, we discuss how the form of the Markov chain transition matrix affects the ATC, ADE, AUEC and the average selected number of samples for each scheme. These results are based on the user statistics shown in Fig. 2.2a. Similar results hold for the user statistics shown in Fig. 2.2b. For Markov chains with certain transitions being more probable than others, our schemes select, on average, fewer than the total available samples. For example, when maximum detection accuracy is desired, only 6 samples are selected, leading to energy gains as high as 51%, with detection error considerably better than EA.
On the other hand, as the uncertainty in transitions increases, the total selected number of samples also increases and attains the maximum more quickly. The ATC, ADE and AUEC are also affected by the form of the transition matrix in the sense that higher state evolution uncertainty makes planning more challenging and results in higher costs. GME^2PS^2's behavior is slightly different in some cases. Specifically, when certain transitions are more probable than others and high accuracy requirements are imposed, GME^2PS^2 selects 12 samples, incurring high energy consumption while achieving the same detection accuracy as the rest of the schemes. As the uncertainty in state evolution increases, GME^2PS^2 selects, on average, more samples compared to the other algorithms, incurring slightly higher energy cost but achieving better detection accuracy. This is a result of GME^2PS^2's greedy nature, which focuses on short-term rewards and ignores long-term consequences. Fig. 2.11 shows the total number of samples selected by GME^2PS^2 for the following Markov chain transition matrices

P_0 = \begin{bmatrix} 0.6 & 0.2 & 0 & 0.4 \\ 0.1 & 0.4 & 0.1 & 0 \\ 0 & 0.1 & 0.3 & 0.3 \\ 0.3 & 0.3 & 0.6 & 0.3 \end{bmatrix}, \quad
P_1 = \begin{bmatrix} 0.2 & 0.2 & 0 & 0.1 \\ 0.4 & 0.4 & 0.3 & 0.3 \\ 0.3 & 0.3 & 0.3 & 0.4 \\ 0.1 & 0.1 & 0.4 & 0.2 \end{bmatrix},

P_2 = \begin{bmatrix} 0.1 & 0 & 0 & 0.9 \\ 0.9 & 0.1 & 0 & 0 \\ 0 & 0.9 & 0.1 & 0 \\ 0 & 0 & 0.9 & 0.1 \end{bmatrix}, \quad
P_3 = \tfrac{1}{4}\,\mathbf{1}_{4\times 4}, \qquad (2.38)

where each matrix is column-stochastic (each column sums to one) and \mathbf{1}_{4\times 4} denotes the all-ones matrix.

2.6.5 Individual State Detection Error

In this section, we show how to introduce weighting to emphasize certain states over others. We observe that the average per-state detection error differs across states and, in fact, for some states it is significant (see the first row of Table 2.2), even though the total ADE may be low. This is expected since the POMDP framework optimizes the average cost, which implies that unlikely states are filtered out, e.g., Stand, while more

Table 2.2: Individual state detection error for different weighting schemes.
Weighting scheme W   | Total AUEC | Average per-state detection error (average normalized energy cost per state, AECS, in parentheses)
                     |            | Sit              | Stand            | Run              | Walk
W = diag(1,1,1,1)    | 1.0314     | 0.0497 (0.3982)  | 0.1111 (0.0934)  | 0.0172 (0.1568)  | 0.0537 (0.3343)
W = diag(1,10,1,1)   | 1.1568     | 0.0461 (0.3847)  | 0.0514 (0.1330)  | 0.0039 (0.1488)  | 0.0450 (0.3337)
W = diag(10,1,1,1)   | 1.3740     | 0.0336 (0.45989) | 0.0988 (0.0956)  | 0.0186 (0.1148)  | 0.0474 (0.3314)
W = diag(1,1,1,10)   | 1.2940     | 0.0279 (0.4011)  | 0.0818 (0.0849)  | 0.0047 (0.1843)  | 0.0321 (0.3297)
W = diag(5,15,1,1)   | 1.6359     | 0.0276 (0.4225)  | 0.0234 (0.1411)  | 0.0014 (0.1133)  | 0.0348 (0.3231)

likely states are promoted, e.g., Sit. We can introduce a different weighting parameter for each state in our detection error metric, i.e., (2.6) now becomes

p_e^W(x_k, u_k) = w_{x_k} \max_{\substack{x_{k+1}, z_{k+1} \\ x_{k+1} \neq z_{k+1}}} [r(x_k, u_k, x_{k+1}, z_{k+1})], \qquad (2.39)

where w_{x_k} is the importance factor assigned to state x_k. To compensate for the introduction of weights in the detection accuracy metric, we need to appropriately weight the average energy cost; thus, the expected accumulated total cost (2.16) now becomes

J_\lambda = c\, \mathbb{E}\left\{ \sum_{k=0}^{L-1} p_k^T \left( \frac{1-\lambda}{c}\, W p_e^W(u_k) + \frac{\lambda}{c}\, E(u_k)\, \mathbf{1}_n \right) \right\}, \qquad (2.40)

where c = p_k^T W \mathbf{1}_n corresponds to the normalization constant, which ensures that \{w_i p_k^i\}_{i=1}^n remains a valid distribution, and W = \mathrm{diag}(w_1, ..., w_n) denotes the diagonal matrix of weights.

Table 2.2 summarizes the state detection error for each of the four states for different weighting schemes. The total AUEC and the average normalized energy cost per state (AECS) are also given. The Markov chain stationary distribution is [0.3953, 0.0930, 0.1628, 0.3488]^T. The results are based on the user statistics shown in Fig. 2.2b using DP for λ = 0.5. Similar trends hold for the user statistics shown in Fig. 2.2a and the other algorithms. We observe that assigning a large weight to a specific state gives better detection accuracy for this state, as expected.
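As a quick sanity check on the normalization in (2.39)-(2.40), the sketch below (Python; the stationary distribution and one weighting scheme are taken from the text, everything else is illustrative) verifies that scaling the belief by W = diag(w_1, ..., w_n) and dividing by c = p_k^T W 1_n yields a valid distribution:

```python
import numpy as np

# Hedged illustration of the normalization in (2.39)-(2.40): scaling the
# belief p_k by W = diag(w_1, ..., w_n) and dividing by c = p_k^T W 1_n
# leaves a valid probability distribution {w_i p_k^i / c}.
p_k = np.array([0.3953, 0.0930, 0.1628, 0.3488])  # stationary distribution from the text
W = np.diag([1.0, 10.0, 1.0, 1.0])                # emphasize Stand, as in Table 2.2

c = p_k @ W @ np.ones(4)      # normalization constant c = p_k^T W 1_n
p_weighted = (W @ p_k) / c    # reweighted belief

assert np.isclose(p_weighted.sum(), 1.0)
print(p_weighted)             # Stand's mass is boosted relative to the others
```

The renormalization matters because, without it, multiplying by W would leave the bracketed term in (2.40) weighted by something that is no longer a probability vector.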
At the same time, the detection error for the rest of the states can be better or worse, since certain sensors can better discriminate between specific sets of activities. Compared to the equal-weights case, the total AUEC is increased since more effort is put into detecting some states than others. In fact, the AECS for a state with a large weight is increased, while for the rest of the states it is decreased. Finally, when we assign a large weight to the Walk state, we observe that its AECS is slightly decreased. This is expected since Walk can be distinguished easily from the rest of the states using samples only from the most energy-efficient sensor (see Fig. 2.2b). At the same time, the AECS for the rest of the states is increased; to ensure that Walk is always detected, we need to ensure that states which lead to Walk are also detected. According to Fig. 2.2b, this requires receiving samples from the more expensive sensors, thus increasing the AECS for detecting the corresponding states.

2.7 Concluding Remarks

In this chapter, we addressed the problem of heterogeneous sensor selection in a WBAN with an energy-constrained fusion center, to achieve energy-efficient operation while optimizing physical activity detection. We proposed a novel stochastic control framework to capture the interplay between different system components, which also considered sensor heterogeneity with respect to energy and detection capabilities. We derived an optimal DP algorithm and proved important properties of the cost-to-go functions, based on which we proposed and examined three approximation algorithms circumventing the curse of dimensionality of the DP solution: (i) MIC–T3S, which is based on time-sharing and achieves low complexity, and (ii) E^2MBADP and (iii) GME^2PS^2, which exploit the problem's underlying structure, employing heuristics to decimate the set of available controls. Finally, we evaluated our schemes' performance on real data from a real-world WBAN and presented numerical results verifying high energy gains with good detection accuracy by utilizing less than the available resources.
Chapter 3
Active Classification: a Unified Framework

In this chapter, we investigate the active state tracking problem for a system modeled by a discrete-time, finite-state Markov chain. The "hidden" system state is observed through a conditionally Gaussian measurement vector that depends on the underlying system state and an exogenous control input, which shapes the observations' quality. To accurately track the time-evolving system state, we address the joint problem of determining recursive formulae for a structured minimum mean-squared error (MMSE) state estimator and designing a control strategy. Specifically, following an innovations approach, we derive a non-linear approximate MMSE estimator for the Markov chain system state that proves to be structurally similar to the Kalman filter (KF) [57]. To obtain a control strategy, we propose a partially observable Markov decision process (POMDP) formulation, where the filter's mean-squared error (MSE) performance serves as the optimization criterion. Since deriving the optimal solution via dynamic programming recursions is of high computational complexity, we propose a suboptimal, low-complexity algorithm based on the Fisher information measure [39]. We also consider the problem of enhancing system state estimates by exploiting both past and future observations and control inputs. More precisely, we derive non-linear approximate MMSE smoothing estimators (fixed-point, fixed-interval, fixed-lag) to acquire improved state estimates and comment on their differences. Finally, we illustrate the performance of the proposed framework using real data from the body sensing application of Chapter 2.

The remainder of this chapter is organized as follows. We start Section 3.1 by summarizing prior work and continue with stating our contributions. In Section 3.2, we provide our stochastic system model and its innovations representation. Next, in Section 3.3, we derive the Kalman-like estimator and give its MSE performance.
We also contrast our proposed estimator to the standard KF. In Section 3.4, we derive the optimal control policy that drives the estimator, while in Section 3.5, we devise a suboptimal, low-complexity control policy based on the Fisher information measure. Next, in Section 3.6, we derive smoothed estimators to acquire more refined system estimates. In Section 3.7, we consider the body sensing application example to illustrate the performance of our framework, and we conclude the chapter in Section 3.8.

3.1 Related Work and Contributions

The classical Kalman filter (KF) [57], along with the fixed-interval [98, 40], fixed-point and fixed-lag [75, 27, 79] smoothers, is suitable for estimating discrete-time, linear Gauss–Markov systems. Their extensions, i.e., the Extended and Unscented KF [25] and the Extended and Unscented Kalman smoothers, are suitable for general, non-linear, (non-)Gaussian systems; they usually adopt a Gaussian approximation for the state distribution, while their performance depends significantly on either some kind of linearization or the careful selection of sample points. In various applications, physical or cost constraints usually prevent the transmission of all available observations, and thus state estimation needs to be carried out using only a subset of them. To this end, and within the context of linear Gaussian systems, the resulting estimator proves to be a Kalman filter matched to the observation policy [6]. To reduce communication costs in sensor network (SN) applications, the problem of state estimation using quantized observations with/without availability of analog measurements has been addressed [100, 88]. For example, the proposed Sign-of-Innovation KF [100] and its extensions (see [88] and references therein) are based on quantized versions of the measurement innovation and/or real measurements for both Gaussian linear and non-linear dynamical systems.
Two well-known approaches for deriving recursive estimators are: 1) the innovations method [107], and 2) the reference probability method [36]. The former defines innovations sequences and exploits martingale calculus to determine the estimator's gain, while the latter introduces a probability measure change to cast the observations as independent and identically distributed so as to simplify calculations. MMSE and risk-sensitive¹ estimators have been derived via these methods for discrete-time, finite-state Markov chains observed via discrete observations [106, 37, 11] or observations corrupted by white Gaussian noise [63, 35], but without exerting control. The work in [95] proposed an approximate MMSE estimator starting from a maximum a posteriori (MAP) detector, while [31] derived risk-sensitive recursive estimators for discrete-time, discrete finite-state Markov chains with continuous-valued observations. In contrast to these works, our proposed estimators are approximate MMSE estimators that build upon the innovations method [107, 106, 11] for discrete-time, finite-state Markov chains observed via controlled (i.e., observations are actively selected by a controller) conditionally Gaussian measurements. We also underscore that our problem is fundamentally different from the problem of Kalman filtering with missing observations [86, 46, 111], where observations may be delayed or lost due to the nature of communication channels. In particular, our goal is to control which of the available measurements should be requested, assuming no transmission or reception errors.

¹ In contrast to risk-neutral (MMSE) estimation, risk-sensitive estimation penalizes higher-order moments of the estimation error.

We focus on state estimation for several reasons. First, we previously considered state detection in Chapter 2, where the detection error probability was adopted as the performance objective.
However, since the latter measure does not admit closed-form solutions, we were restricted to bounds that can be quite loose in medium- to low-Signal-to-Noise-Ratio (SNR) scenarios. In contrast, the MSE is computed in closed form and enables us to focus on true tracking performance. Second, contrary to the approach in Chapter 2, the current framework enables a natural joint consideration of estimation and control, since the control law affects the estimation process. In fact, the framework proposed in this chapter is much more general and realistic, and extends the corresponding framework of Chapter 2, which assumed discrete observations, performed Maximum Likelihood system state detection and employed a worst-case error probability bound as an optimization metric. Furthermore, the belief state, which corresponds to the conditional probability distribution associated with the chain states, is the MMSE system state estimate. As a result, through MSE minimization, we acquire better belief state estimates, which in turn give rise to high detection accuracy. Third, as we will later see in Chapter 4, the optimal cost-to-go function is a piecewise concave function of the predicted belief state, implying the potential use of efficient methods for computation. Finally, it is well known that finding the optimal solution of a POMDP is a computationally intractable problem and thus not suited for large-scale applications. However, the related computations can be significantly accelerated by considering the underlying structure of the related processes and exploiting sparse approximation methods similar to [72, 67]. Our framework constitutes the basis toward addressing such large-scale problems.

Our work differs from the system state estimation problem in discrete-time, jump Markov linear systems (JMLS) [28]. In these systems, the goal is to estimate the underlying system state given that the system operates in multiple modes, each of which is linear, and the switching between them renders the overall system non-linear.
The mode change is usually modeled by a discrete-time, finite-state Markov chain, which can be assumed either known or unknown, leading to different estimation techniques [28]. In the latter case, the Markov chain's true value is usually determined via minimization of the associated posterior detection error probability. In contrast to the above, we actively select measurements to estimate the system state, which is a Markov chain. Most prior work in JMLS consists of "passive" approaches, i.e., methods that attempt to do the best possible when no control over the observations is exerted. Nonetheless, the problem of designing control sequences to enable discrimination between the multiple modes subject to state and control constraints has also been studied (see, for example, [15] and references therein). The key differences between [15] and our work are: 1) the control in [15] affects both state and measurements linearly, contrary to our formulation, and 2) our focus is MSE, not a detection error probability upper bound [15].

In the context of the SN and active classification literature [82, 90, 59, 62, 49, 8, 74, 129, 60], our generic definition of control accommodates the fusion of multiple different types of samples from heterogeneous sensors and thus generalizes prior frameworks. In particular, most prior work assumes discrete observations [59, 45, 62, 74, 129, 60], scalar measurements [8, 60, 82] or w measurements from w sensors under independence assumptions [8, 74], which is not realistic for our problems of interest. In contrast, we focus on time-varying systems with Gaussian measurement vectors, which account for multi-modal observations. A variety of cost functions has been previously adopted as performance quality measures, such as generic costs [125, 60], general convex distance measures [62, 8], detection error probability and bounds [62, 129, 60, 90, 82], mean-squared error (MSE) [59, 62, 122, 128], information-theoretic measures [62], distance metrics [59, 8] and estimation bounds [45, 49, 74].
We focus on MSE performance since we can acquire closed-form formulae for the MSE, which enable us to explicitly focus on true estimation performance, versus other metrics that may not admit closed-form solutions and whose approximation can significantly affect the sensing strategy. Furthermore, in contrast to the case of linear systems with Gaussian disturbances [77, 6, 51, 76, 45, 92, 53], where the related filtering error covariance matrix is independent of the measurement sequence, the estimation performance of our estimator is affected by the selected control policy. To achieve good MSE performance, we use the trace of the conditional filtering error covariance matrix as the cost functional of a POMDP. In most cases, the associated POMDP is linear [125, 8, 129, 82, 60, 90] in the belief state, resulting in a standard formulation that is in general easier to characterize, since the relevant value function is known to be piecewise linear and convex [71]. In contrast, similar to [59, 62], our POMDP proves to be non-linear due to the non-linear dependence on the predicted belief state and is thus harder to characterize. Still, the optimal policy can be derived via stochastic dynamic programming, versus [74], where a suboptimal scheme is needed. In [59, 62], only one out of a finite number of sensors can be selected, and the MSE metric employed by the authors is scaled by a user-defined cost in an effort to capture the effect of different sensors. In contrast, our framework is more widely applicable since it allows the selection of multiple heterogeneous sensors, while their effect is directly captured by our MSE metric without the need for additional user-defined variables.

Our work also differs from [60], since we are interested in tracking all system states equally, not in determining when the Markov chain hits a specific target state and then terminating tracking. To do so, we choose from a variety of controls, versus [60], where only two types of controls are considered.
In contrast to [125], where the authors address existence and stability issues for linear quadratic Gaussian control and generic Markovian models, we provide a unified framework of estimation and control for systems modeled by discrete-time, finite-state Markov chains observed via controlled conditionally Gaussian observations.

It is usually the case that DP-based approaches suffer from the curse of dimensionality, yielding no efficiently computable solutions. In our problem formulation, this fact is exacerbated by adopting the MSE as the performance objective, which results in non-linear cost functions. As a first step toward the design of computationally efficient control strategies, we propose a sensor selection algorithm based on the Fisher information measure [39]. This measure is extremely important in estimation theory and statistics, since it 1) characterizes how well we can estimate a parameter based on a set of observations, and 2) is related to the concept of efficiency and the Kullback–Leibler divergence [44], which also constitutes a fundamental measure in information theory. Various scalar functions of the Fisher information matrix have been previously considered as optimization criteria for sensor selection and active parameter/state estimation [104, 50]. However, in all these cases, the differentiability of the associated likelihood function is implicitly assumed. In contrast, we consider multi-valued discrete parameters, for which this assumption fails.

Our contributions are as follows. We propose a framework for estimation and control for controlled sensing applications for a very important class of models: discrete-time, finite-state Markov chains observed via controlled conditionally Gaussian measurements. Specifically, we derive recursive formulae for the state estimator, which proves to be formally similar to the classical KF, as well as for the three fundamental types of smoothers (fixed-point, fixed-interval, fixed-lag).
In addition, we derive a dynamic programming algorithm to determine the optimal control policy, which optimizes the filter's MSE. We also devise a suboptimal, low-complexity control policy based on the Fisher information measure. To this end, we first generalize the latter measure to overcome the differentiability issue, and then derive a closed-form expression for our system model. Last but not least, we provide numerical results validating the performance of the proposed framework on real data from the body sensing application of Chapter 2.

3.2 Problem Statement

We consider a stochastic dynamical system with system state modeled by a discrete-time, finite-state Markov chain that evolves in time. The system state is hidden, i.e., it is observed through a measurement vector that depends both on the underlying state and on an exogenous control input selected by a controller. Our goal is to accurately infer the underlying time-evolving system state by shaping the quality of the observations. To this end, we consider the joint problem of determining formulae for the minimum mean-squared error (MMSE) system state estimate from the past observations and controls (MMSE filter equations) and the optimal control strategy that drives this estimator. We also consider the problem of acquiring more refined state estimates by exploiting future observations and controls (MMSE smoother equations). We begin by introducing the stochastic model of our system.

3.2.1 System Model

We consider a dynamical system where time is divided into discrete slots and k = 0, 1, ... denotes discrete time. The system state corresponds to a finite-state, first-order Markov chain with n states, i.e., X = {e_1, ..., e_n}, with e_i denoting the unit vector with 1 in the i-th position and zero everywhere else. We adopt the above embedding for two reasons: 1) the notational advantage of having the MMSE state estimate coincide with the belief state (cf. Section 3.3.1), and 2) to match the MMSE estimate and belief state structures, i.e.,
if perfect observations were received, the belief state would coincide with some e_i ∈ X. The Markov chain is defined on a given probability space (Ω, A, P) and is characterized by the transition probability matrix P with components P_{j|i} = P(x_{k+1} = e_j | x_k = e_i) for e_i, e_j ∈ X. We assume that these transition probabilities do not change with time; hence the Markov chain is stationary.

The system state x_k is hidden, and at each time step an associated measurement vector y_k is generated. Each such vector follows a multivariate conditionally Gaussian model of the form

y_k \mid e_i, u_{k-1} \sim f(y_k \mid e_i, u_{k-1}) = \mathcal{N}(m_i^{u_{k-1}}, Q_i^{u_{k-1}}), \quad \forall e_i \in X, \qquad (3.1)

with statistics depending on the underlying system state x_k and a control input u_{k-1}, selected by a controller at the end of time slot k-1. We denote the mean vector and covariance matrix of the measurement vector for system state e_i and control input u_{k-1} by m_i^{u_{k-1}} and Q_i^{u_{k-1}}, respectively. The control input u_{k-1} can be defined to affect the size of the measurement vector y_k (cf. adaptive estimation of sparse signals in [122]), its form, or both, and is selected by the controller based on the available information, i.e., the history of previous control inputs and measurement vectors. We assume that there is a finite number of controls supported by the system, i.e., u_k ∈ U = {u_1, u_2, ..., u_α}, and for the moment, we do not consider the case of missing observations.

The above formulation results in a discrete-time dynamical system with imperfect state information [14], also known as a POMDP. Next, we introduce the innovations representation of our system model, which is crucial for the derivation of the filtering and smoothing equations.

3.2.2 Innovations Representation of System Model

We introduce the source sequence of true states X_k = {x_0, x_1, ..., x_k}, the control sequence U_k = {u_0, u_1, ..., u_k} and the observations sequence Y_k = {y_0, y_1, ..., y_k}.
We also define the global history B_k = σ{X_k, Y_k, U_k}, the histories B_k^+ = σ{X_{k+1}, Y_k, U_k} and B_k^- = σ{X_k, Y_{k-1}, U_{k-1}}, and the observation–control history F_k = σ{Y_k, U_{k-1}}, where σ{z} denotes the σ-algebra generated by z, i.e., the set of all functionals of z. Each control input u_k is determined based on the observation–control history F_k, i.e., u_k = η_k(F_k).

The innovations sequence {w_k} related to {x_k} [107] with respect to B_k is defined as

w_{k+1} \doteq x_{k+1} - \mathbb{E}\{x_{k+1} \mid B_k\}, \qquad (3.2)

so that, due to the Markov property, \mathbb{E}\{x_{k+1} \mid B_k\} = \mathbb{E}\{x_{k+1} \mid x_k\} = P x_k. Note that the sequence {w_k} is a {B}-Martingale Difference (MD) sequence, i.e., it satisfies the following two properties

\mathbb{E}\{w_{k+1} \mid B_k\} = 0, \quad \forall k \geq 0 \quad \text{and} \quad w_{k+1} \in B_k^+, \quad \forall k \geq 0. \qquad (3.3)

The last condition implies that w_{k+1} is a function of B_k^+. Similarly, the innovations sequence {v_k} related to the process {y_k} [107] with respect to B_k^- is defined as

v_k \doteq y_k - \mathbb{E}\{y_k \mid B_k^-\}, \qquad (3.4)

where \mathbb{E}\{y_k \mid B_k^-\} = \mathbb{E}\{y_k \mid x_k, u_{k-1}\} = M(u_{k-1}) x_k, M(u_{k-1}) = [m_1^{u_{k-1}}, ..., m_n^{u_{k-1}}], and we have exploited the signal model in (3.1). Again, the sequence {v_k} is a {B^-}-MD sequence, i.e.,

\mathbb{E}\{v_k \mid B_k^-\} = 0, \quad \forall k \geq 0 \quad \text{and} \quad v_k \in B_k, \quad \forall k \geq 0. \qquad (3.5)

Therefore, the Doob–Meyer decompositions of {x_k} and {y_k} with respect to B_k and B_k^-, respectively, are

x_{k+1} = P x_k + w_{k+1}, \quad k \geq 0, \qquad (3.6)
y_k = M(u_{k-1}) x_k + v_k, \quad k \geq 1. \qquad (3.7)

3.3 System State Estimator

In this section, we develop a Kalman-like filter for estimating the discrete-time, finite-state Markov chain system state from past observations and controls, based on the theory introduced in [107, 106]. Specifically, we derive an approximate MMSE estimate for a point process observed via conditionally Gaussian measurement vectors with statistics non-linearly influenced by the system state and a non-deterministic control input. We also provide formulae for the filter performance and a comparison between our estimator and the standard KF.
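Before deriving the estimator, it may help to see the model of Section 3.2 generated forward in time. The short sketch below (Python, with small illustrative parameters rather than the thesis's body sensing data) simulates the Markov chain dynamics x_{k+1} = P x_k + w_{k+1} in index form and draws each controlled Gaussian observation according to (3.1):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (not the thesis's data): n = 3 states as unit
# vectors, a column-stochastic P, and per-(state, control) Gaussian
# observation statistics m_i^u, Q_i^u as in (3.1).
n, d = 3, 2                                  # number of states, observation dimension
P = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])              # P[j, i] = P(x_{k+1} = e_j | x_k = e_i)
means = {u: rng.normal(size=(n, d)) for u in (0, 1)}                    # m_i^u
covs = {u: [np.eye(d) * (0.5 + u) for _ in range(n)] for u in (0, 1)}   # Q_i^u

def step(i, u):
    """Advance the chain from state index i and observe under control u."""
    j = rng.choice(n, p=P[:, i])             # x_{k+1} drawn from column i of P
    y = rng.multivariate_normal(means[u][j], covs[u][j])
    return j, y

i = 0
for k in range(5):
    u = k % 2                                # an arbitrary control sequence
    i, y = step(i, u)
    print(k, i, y)
```

Note how the control u only changes which (mean, covariance) pair generates y, never the chain itself, which is exactly the setting the Kalman-like estimator below is designed for.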
3.3.1 Kalman-like Estimator

We begin by defining the a posteriori probability of x_k conditioned on the observation–control history, also known as the belief state in the POMDP literature [14], as

p_{k|k} \doteq [p_{k|k}^1, ..., p_{k|k}^n]^T \in \mathcal{P}, \qquad (3.8)

where p_{k|k}^i = P(x_k = e_i | F_k), e_i ∈ X, and \mathcal{P} = \{p_{k|k} \in \mathbb{R}^n : \mathbf{1}_n^T p_{k|k} = 1, 0 \leq p_{k|k}^i \leq 1, \forall e_i \in X\}. The expected value of x_k conditioned on the observation–control history F_k coincides with p_{k|k}, since

x_{k|k} = \mathbb{E}\{x_k \mid F_k\} = \sum_{i=1}^n e_i P(x_k = e_i \mid F_k) = p_{k|k}. \qquad (3.9)

From now on, we use p_{k|k} to denote the optimal MMSE state estimator.

To properly address the problem of optimal non-linear MMSE estimation, we begin by defining two special sequences: the estimate innovations sequence {μ_k} and the observation innovations sequence {λ_k} as follows

\mu_k \doteq p_{k|k} - p_{k|k-1} = \mathbb{E}\{x_k \mid F_k\} - \mathbb{E}\{x_k \mid F_{k-1}\}, \qquad (3.10)
\lambda_k \doteq y_k - y_{k|k-1} = y_k - \mathbb{E}\{y_k \mid F_{k-1}\}. \qquad (3.11)

We can easily prove that both sequences are {F}-MD sequences. We note that the innovations sequences in (3.10) and (3.11) try to capture the additional information contained in the observation and its impact on the estimate p_{k|k}, similarly to the case of the innovation sequence in the standard KF [25]. However, contrary to the standard KF, where the latter sequence is a white-noise sequence, herein the innovations sequences are {F}-MD sequences². Next, we state the MD representation theorem [107], which constitutes a powerful tool for developing recursive non-linear MMSE Kalman-like estimators by exploiting the innovations sequences in (3.10) and (3.11).

² Roughly speaking, the MD property can be seen as an "intermediate" property between independence and uncorrelation [107].

Theorem 7 (Segall [107, 106]). The estimate innovations sequence {μ_k} is an {F}-MD sequence and therefore, it may be represented as a transformation of the observation innovations sequence {λ_k} as μ_k = G_k λ_k, where {G_k} is an {F}-adapted sequence that can be computed as follows

G_k = \mathbb{E}\{\mu_k \lambda_k^T \mid F_k\} \left( \mathbb{E}\{\lambda_k \lambda_k^T \mid F_k\} \right)^{-1}. \qquad (3.12)

Theorem 7 states that the gain sequence {G_k} is an {F}-adapted sequence.
In general, this implies that the optimal non-linear MMSE estimator of the sequence {x_k} does not admit a recursive structure³, since the recursivity property can only be ensured by the predictability property [107]. To clarify the difference between adaptability and predictability, we state their respective definitions.

³ Recursiveness is a very desirable property that ensures implementability of estimation in real time and significant memory savings.

Definition 8 (Segall [107]). A sequence {b_k} is said to be {F}-adapted if b_k is measurable with respect to F_k, ∀k.

Definition 9 (Segall [107]). A sequence {b_k} is said to be {F}-predictable if b_k is measurable with respect to F_{k-1}, ∀k.

There exist some special cases [107, 106] where it has been successfully shown that the resulting estimator is finite-dimensional as a result of the predictability property being true. Specifically, the gain sequence {G_k} is {F}-predictable for

i. all linear cases in discrete time, including the classical Kalman filter,
ii. discrete-time non-linear point processes [107, 106, 11].

At this point, we wish to underscore that in the discrete-time non-linear case for point processes, the predictability of the gain sequence has been proven for the uncontrolled case. For the controlled case, we can follow the same arguments as in [11] and exploit the fact that the control input is measurable with respect to the observation–control
In this case, however, a more general theorem by Bremaud and Van Schuppenforverygeneralnon–lineardiscrete–timesystemsmustbeemployed[20,19]. The system model equations (3.6) and (3.7) do not fall into any of the categories above,wherethepredictabilityofthegainsequenceeitherholdsorfails. Alternatively, direct application of the theorem by Bremaud and Van Schuppen is impossible since theirrepresentationconstitutesageneralrepresentationofthefilterequationwithoutany explicitspecificationfortherelatedterms. Wehaveinsteadnumericallyestablishedthat thesequence{G k }cannotbe{F}–predictable(seeSection3.7). Thus,forourproblem of interest, the optimal non–linear Kalman–like MMSE estimator of the sequence{x k } isintrinsicallynon–recursive(i.e. theresultingestimatorisinfinite–dimensional). Since a recursive solution is desired within the family of Kalman–like estimators in our case, weimposerecursivityasadesignconstraintandusethefollowingapproximation 4 G k ≈E{μ k λ T k SF k−1 }‰E{λ k λ T k SF k−1 }Ž −1 . (3.13) This approximation along with the Doob–Meyer decompositions (3.6)–(3.7) and the definitionsin(3.10)–(3.11)allowustodetermineasuboptimalKalman–typenon–linear 4 Notethatifthegainsequencewaspredictable,(3.13)wouldbecomeanequality. 69 MMSE filtered estimator for the Markov chain system state. Namely, exploiting this approximation,wehavethat μ k =G k λ k , k⩾0, (3.14) whereG k is the time–varying gain given by (3.13). Note that for the set of recursive estimators with a Kalman–like structure, the proposed estimator is an optimal MMSE estimator. Theorem10statestherecursiveformulaefortheproposedestimatordenoted hereafterby ˆ p kSk . Theorem10. 
The Markov chain system estimate at time step k is recursively defined as

\hat{p}_{k|k} = \hat{p}_{k|k-1} + G_k (y_k - y_{k|k-1}), \quad k \geq 0, \qquad (3.15)

with

\hat{p}_{k|k-1} = P \hat{p}_{k-1|k-1}, \qquad (3.16)
y_{k|k-1} = M(u_{k-1}) \hat{p}_{k|k-1}, \qquad (3.17)
G_k = \Sigma_{k|k-1} M^T(u_{k-1}) \left( M(u_{k-1}) \Sigma_{k|k-1} M^T(u_{k-1}) + \tilde{Q}_k \right)^{-1}, \qquad (3.18)

where \hat{p}_{0|-1} = \pi, and \pi is the initial distribution over the system states, \Sigma_{k|k-1} is the conditional covariance matrix of the prediction error, and \tilde{Q}_k = \sum_{i=1}^n \hat{p}_{k|k-1}^i Q_i^{u_{k-1}}.

Proof. Having defined the estimate and observations innovations sequences as in (3.10) and (3.11), we apply (3.14) to get the desired recursive filter equation. To this end, we need to determine a recursive form that relates p_{k|k-1} to p_{k-1|k-1} and explicit formulae for y_{k|k-1} and G_k.

The expected value of x_k conditioned on the observation–control history F_{k-1} can be determined as follows

p_{k|k-1} = \mathbb{E}\{x_k \mid F_{k-1}\}
= P \mathbb{E}\{x_{k-1} \mid F_{k-1}\} + \mathbb{E}\{w_k \mid F_{k-1}\}
\overset{(a)}{=} P p_{k-1|k-1} + \mathbb{E}\{\mathbb{E}\{w_k \mid B_{k-1}\} \mid F_{k-1}\}
\overset{(b)}{=} P p_{k-1|k-1}, \qquad (3.19)

where we have exploited (a) the law of iterated expectations for σ-algebras and (b) the fact that w_k is a {B}-MD sequence. Similarly, the expected value of the process {y_k} conditioned on the observation–control history F_{k-1} can be determined as

y_{k|k-1} = \mathbb{E}\{y_k \mid F_{k-1}\}
= \mathbb{E}\{M(u_{k-1}) x_k + v_k \mid F_{k-1}\}
\overset{(a)}{=} M(u_{k-1}) p_{k|k-1} + \mathbb{E}\{v_k \mid F_{k-1}\}
\overset{(b)}{=} M(u_{k-1}) p_{k|k-1}, \qquad (3.20)

where we have exploited that (a) u_{k-1} = \eta_{k-1}(F_{k-1}) and (b) v_k is a {B^-}-MD sequence.

At this point, we can specify each of the terms that comprise the filter gain in (3.12). Specifically, for \mathbb{E}\{\lambda_k \lambda_k^T \mid F_{k-1}\}, we have

\mathbb{E}\{\lambda_k \lambda_k^T \mid F_{k-1}\} = \mathbb{E}\{(y_k - y_{k|k-1})(y_k - y_{k|k-1})^T \mid F_{k-1}\}
= \mathbb{E}\{y_k y_k^T \mid F_{k-1}\} - y_{k|k-1} y_{k|k-1}^T.
(3.21) 71 In order to determine the exact form ofE{y k y T k SF k−1 }, we first determine p(y k SF k−1 ) asfollows p(y k SF k−1 )=p(y k Sy 0 ,...,y k−1 ,u 0 ,...,u k−2 ) =p(y k Sy 0 ,...,y k−1 ,u 0 ,...,u k−1 ) = n Q i=1 P(x k =e i Sy 0 ,...,y k−1 ,u 0 ,...,u k−1 )p(y k Sx k =e i ,u k−1 ) = n Q i=1 P(x k =e i SF k−1 )f(y k Se i ,u k−1 ) = n Q i=1 p i kSk−1 f(y k Se i ,u k−1 ), (3.22) wherewehaveexploitedthatu k−1 =η k−1 (F k−1 ). Thelastresultimpliesthat E{y k y T k SF k−1 }= S yy T p(ySF k−1 )dy = n Q i=1 p i kSk−1S yy T f(ySe i ,u k−1 )dy = n Q i=1 p i kSk−1 ‰Q u k−1 i +m u k−1 i (m u k−1 i ) T Ž. (3.23) Substitutingbackto(3.21),andperformingsomemanipulations,weget E{λ k λ T k SF k−1 }= n Q i=1 p i kSk−1 Q u k−1 i +m u k−1 i (m u k−1 i ) T −y kSk−1 y T kSk−1 = n Q i=1 p i kSk−1 Q u k−1 i +M(u k−1 )diag(p kSk−1 )M T (u k−1 ) −M(u k−1 )p kSk−1 p T kSk−1 M T (u k−1 ) = ̃ Q k +M(u k−1 )(diag(p kSk−1 )−p kSk−1 p T kSk−1 )M T (u k−1 ) = ̃ Q k +M(u k−1 )Σ kSk−1 M T (u k−1 ), (3.24) 72 where ̃ Q k = ∑ n i=1 p i kSk−1 Q u k−1 i and we have also used the definition of the conditional prediction error covariance matrix in (3.40). Next, we derive the termE{μ k λ T k SF k−1 }. Specifically: E{μ k λ T k SF k−1 }=E{p kSk λ T k SF k−1 }−E{p kSk−1 λ T k SF k−1 }. (3.25) Thefirsttermof(3.25)isdeterminedasfollows E{p kSk λ T k SF k−1 }=E{E{x k SF k }λ T k SF k−1 } =E{E{x k λ T k SF k }SF k−1 } =E{x k (y k −y kSk−1 ) T SF k−1 } =E{x k y T k SF k−1 }−p kSk−1 y T kSk−1 , (3.26) wherewehaveexploitedtheMDpropertyofλ k . ThetermE{x k y T k SF k−1 }canbedeter- minedasfollows E{x k y T k SF k−1 }=E{x k x T k M T (u k−1 )SF k−1 }+E{x k v T k SF k−1 } = n Q i=1 e i e T i P(x k =e i SF k−1 )M T (u k−1 )+E{E{x k v T k SB − k }SF k−1 } =diag(p kSk−1 )M T (u k−1 )+E{x k E{v T k SB − k }SF k−1 } =diag(p kSk−1 )M T (u k−1 ), (3.27) wherewehaveusedthefactsthatu k−1 =η k−1 (F k−1 ),x k ∈B − k bydefinitionandtheMD propertyofv k . 
Thesecondtermof(3.25)isdeterminedasfollows E{p kSk−1 λ T k SF k−1 }=p kSk−1 E{λ T k SF k−1 }=0, (3.28) 73 where the last equality holds since λ k is a {F}–MD sequence. Combining the above results,wehave E{μ k λ T k SF k−1 }=diag(p kSk−1 )M T (u k−1 )−p kSk−1 y T kSk−1 =(diag(p kSk−1 )−p kSk−1 p T kSk−1 )M T (u k−1 ) =Σ kSk−1 M T (u k−1 ), (3.29) andthegainG k in(3.13)takesthefollowingform G k ≈Σ kSk−1 M T (u k−1 )‰M(u k−1 )Σ kSk−1 M T (u k−1 )+ ̃ Q k Ž −1 . (3.30) Therefore, using (3.10)–(3.11) andreplacing the optimal MMSEestimatep kSk with the approximateMMSEestimate ˆ p kSk toemphasizethatwehaveadoptedtheapproximation in(3.13),wegetfrom(3.14)that ˆ p kSk = ˆ p kSk−1 +G k [y k −y kSk−1 ], k⩾0. (3.31) The latter equation together with (3.19), (3.20) and (3.30) constitute a recursive algo- rithmfortheapproximatecomputationofthebeliefstatein(3.8). Finally,fortheinitial condition,wehave ˆ p 0S−1 =p 0S−1 ≐E{x 0 }=π, (3.32) whereπ istheinitialdistributionoverthesystemstates. Note that even though the proposed filter is formally similar to the classical KF, it isnotastandardKF.Infact,thegainG k dependsontheobservationsandtheresulting filter is non–linear in contrast to the classical KF, which constitutes a linear filter. Fur- thermore, since no constraint is imposed on the individual components of ˆ p kSk , there is 74 no guarantee that they lie on the [0,1] interval. To overcome this issue without incor- poratingadditionalconstraintsthatmaychallengethedeterminationofasolutiontoour problem, we need to apply a suitable memoryless (linear or non–linear) transformation to ˆ p kSk toensurefeasiblesolutionsaredetermined. Asafirststep,weprovethefollow- ingpropertyregardingthestructureoftheKalman–likeestimate. Proposition11. Considerthepredictedstateestimate ˆ p kSk−1 attimestepk. If1 T n ˆ p kSk−1 = 1,thenthestateestimate ˆ p kSk attimestepk hastheproperty1 T n ˆ p kSk =1. Proof. 
Westartfromthefilterequationin(3.15)andworkasfollows 1 T n ˆ p kSk =1 T n ˆ p kSk−1 +1 T n G k (y k −y kSk−1 ) =1+1 T n Σ kSk−1 M T (u k−1 ) ×‰M(u k−1 )Σ kSk−1 M T (u k−1 )+ ̃ Q k Ž −1 (y k −y kSk−1 ), (3.33) wherewehavesubstitutedforG k . Furthermore,wenoticethat 1 T n Σ kSk−1 (a) = 1 T n (diag(ˆ p kSk−1 )−ˆ p kSk−1 ˆ p T kSk−1 ) =1 T n diag(ˆ p kSk−1 )−ˆ p T kSk−1 =0, (3.34) where for (a), see Eq. (3.41). Substituting the last equation back to (3.33), completes theproof. Atthispoint,weunderscorethatstartingfromavalidpredictionattime0andapplying a suitable transformation at each step ensures that the condition of Proposition 11 is alwaystrue. 75 Next, we wish to devise a transformation T ∶ ˆ p kSk → ˆ p τ kSk , which converts the Kalman–likeestimate ˆ p kSk toanewestimate ˆ p τ kSk thatisavalidpmf. Herein,weconsider thecasewhere T(ˆ p kSk )=Aˆ p kSk (3.35) withA∈S n ={A∈R n×n SA=A T },andweformulatetheoptimizationproblem min A∈S n YAˆ p kSk −ˆ p kSk Y 2 subjectto 1 T n Aˆ p kSk =1 0⩽a i ˆ p kSk ⩽1,i=1,2,...,n, (3.36) where a i denotes the i–th row of matrix A. The objective function is con- vex with respect to the optimization variables (a 11 ,...,a nn ,a 12 ,...,a 1n ,a 23 ,...,a 2n , a 34 ,...,a 3n ,...,a n−1,n ), where a ij is the(i,j) element of matrixA, since the Hessian ispositivesemi–definite,i.e. H= ⎡ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣ 2(ˆ p 1 kSk ) 2 2(ˆ p 2 kSk ) 2 H u ⋱ 2(ˆ p n kSk ) 2 H ℓ 2‰(ˆ p 2 kSk ) 2 +(ˆ p 1 kSk ) 2 Ž ⋱ 2‰(ˆ p n kSk ) 2 +(ˆ p n−1 kSk ) 2 Ž ⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ (3.37) can be written as the sum of an upper and a lower triangular matrix with non–negative diagonal elements. Thus, the optimization problem in (3.36) is a convex optimization problem with linear constraints, where the solutions are the solutions of the Karush– Kuhn–Tucker(KKT)conditions[18]. 
Analyticallysolvingthisproblemforn=2states, 76 where we have considered the structure indicated by Proposition 11, results in the fol- lowingsetofsolutions: (a 11 ,a 12 ,a 22 )= ⎧ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩ Œ ˆ p 1 kSk −a 12 ˆ p 2 kSk ˆ p 1 kSk ,α 12 , ˆ p 2 kSk −a 12 ˆ p 1 kSk ˆ p 2 kSk ‘, ˆ p 1 kSk ,ˆ p 2 kSk ∈(0,1],ˆ p 1 kSk + ˆ p 2 kSk =1 Œ− a 12 ˆ p 2 kSk ˆ p 1 kSk ,α 12 , 1−a 12 ˆ p 1 kSk ˆ p 2 kSk ‘, ˆ p 1 kSk <0,ˆ p 2 kSk >0,ˆ p 1 kSk + ˆ p 2 kSk =1 Œ 1−a 12 ˆ p 2 kSk ˆ p 1 kSk ,α 12 ,− a 12 ˆ p 1 kSk ˆ p 2 kSk ‘, ˆ p 1 kSk >0,ˆ p 2 kSk <0,ˆ p 1 kSk + ˆ p 2 kSk =1 (a 11 ,0,1), ˆ p 1 kSk =0,ˆ p 2 kSk =1 (1,0,a 22 ), ˆ p 1 kSk =1,ˆ p 2 kSk =0 . (3.38) Thesolutionin(3.38)revealsthat i. Thetransformationisnon–linearanddata–dependent. ii. Thenegativeelementsaremappedtozero,whiletheremainingonesarerenormal- izedtoensurethattheirsumisequaltoone. Asaresult,weadoptthetransformation ˆ p τ kSk = Aˆ p kSk 1 T n Aˆ p kSk with a ii =1 {ˆ p i kSk ⩾0} anda ij =0,∀i≠j. (3.39) Thelattertransformationhasthesameeffectas(3.38)andthus,weconjecturethatitis also optimal forn> 2. Note that analytically determining a solution to (3.36) forn> 2 is computationally intensive, since there are n 2 +5n+2 2 KKT conditions and (2n) 2 cases mustbeexamined. 77 3.3.2 FilterPerformance The mean–squared error (MSE) performance of the filter in (3.15) is intertwined with theconditionalfilteringerrorcovariancematrix,whichcanbedirectlycomputedas Σ kSk ≐E{(x k −ˆ p kSk )(x k −ˆ p kSk ) T SF k }=diag(ˆ p kSk )−ˆ p kSk ˆ p T kSk . (3.40) Similarly,theMSEperformanceofthepredictorin(3.16)ischaracterizedbythecondi- tionalpredictionerrorcovariancematrix,whichcanbeagaincomputedas Σ kSk−1 ≐E{(x k −ˆ p kSk−1 )(x k −ˆ p kSk−1 ) T SF k−1 }=diag(ˆ p kSk−1 )−ˆ p kSk−1 ˆ p T kSk−1 . (3.41) Eq. (3.40) and (3.41) are directly obtained from their definitions and from the fact that thestatesoftheMarkovchainconstitutethestandardorthonormalbasis. 
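For concreteness, the measurement update of Theorem 10, the covariance expression (3.41), and the projection (3.39) can be sketched in a few lines of NumPy. This is an illustrative sketch only: the function names are our own, M is taken as the matrix whose i–th column is the conditional mean m_i for the chosen control, and the uniform fallback for a fully degenerate estimate is our own choice, not part of the dissertation.

```python
import numpy as np

def kalman_like_step(p_pred, y, M, Q_list):
    """One measurement update of the Kalman-like filter (Theorem 10).

    p_pred : (n,)  predicted belief p_{k|k-1}
    y      : (N,)  measurement vector y_k
    M      : (N,n) matrix whose i-th column is the conditional mean m_i
    Q_list : length-n list of (N,N) conditional covariances Q_i
    """
    # Conditional prediction-error covariance, Eq. (3.41).
    Sigma_pred = np.diag(p_pred) - np.outer(p_pred, p_pred)
    # Belief-weighted measurement noise covariance Q~_k.
    Q_tilde = sum(p * Q for p, Q in zip(p_pred, Q_list))
    # Gain, Eq. (3.18); it depends on the control through M and Q_list.
    S = M @ Sigma_pred @ M.T + Q_tilde
    G = Sigma_pred @ M.T @ np.linalg.inv(S)
    # Filtered estimate, Eqs. (3.15) and (3.17).
    p_filt = p_pred + G @ (y - M @ p_pred)
    return p_filt, G

def project_to_pmf(p_hat):
    """Transformation (3.39): zero out negative entries, then renormalise.
    The uniform fallback for an all-nonpositive estimate is our own choice."""
    q = np.where(p_hat >= 0, p_hat, 0.0)
    s = q.sum()
    return q / s if s > 0 else np.full_like(p_hat, 1.0 / p_hat.size)

def predict(p_filt, P):
    """Time update, Eq. (3.16); P maps p_{k-1|k-1} to p_{k|k-1}."""
    return P @ p_filt
```

Note that, in line with Proposition 11, the update leaves the sum of the entries of p̂_{k|k} equal to one; only the non–negativity requires the projection.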
Similar to [11], we have

Σ_{k|k} = Σ_{k|k−1} + diag(μ_k) − G_k λ_k λ_k^T G_k^T − 2 Sym(p̂_{k|k−1} λ_k^T G_k^T),   (3.42)

Σ_{k|k−1} = P Σ_{k−1|k−1} P^T + diag(p̂_{k|k−1}) − P diag(p̂_{k−1|k−1}) P^T,   (3.43)

where Sym(B) = (1/2)(B + B^T) and Σ_{1|0} = diag(π) − ππ^T. Both recursive equations are formally similar to the Riccati equation for the standard KF [14]. Furthermore, both equations reveal the filter gain's dependence on the observations.

3.3.3 Standard KF versus Markov chain Kalman–like filter

In this section, we comment on the similarities and differences of the mean–squared filtered estimator given in Theorem 10 and the standard KF [25]. Fig. 3.1 shows the system model, the corresponding filter, and their interconnection.

[Figure 3.1: Interconnection of system block diagram and MMSE estimator block diagram. (a) Interconnection of basic state–variable system model and standard KF. (b) Interconnection of proposed system model and Kalman–like filter.]

The formal structure of the block diagrams in Fig. 3.1a and 3.1b is similar, e.g., both filters contain within their structures a model of the plant, processing is done following the same sequence of steps, etc. The main difference between the two estimators lies in the underlying dynamical system they assume. The system model for the standard KF, shown in Fig. 3.1a, assumes that (i) the state and measurement equations are linear, (ii) {x_k} is a Gauss–Markov sequence since all related processes have Gaussian distributions, and (iii) the control input linearly influences the system state.
Incontrast, oursystemmodel,showninFig.3.1b,assumesthat(i)thestateandmeasurementequa- tions include non–linear terms, (ii) {x k } is a discrete–time, finite–state Markov chain and the associated measurements conditioned on the system state and the control input areGaussian,and(iii)thecontrolinputinfluencesmeasurementsnon–linearly. Further- more, in the standard KF, the control affects the system state evolution in contrast to 79 ourcase,whereitonlyaffectsthemeasurements’quality. Anotherimportantdifference between the two estimators relates to the filter gain in the sense that the KF gain does notdependonthemeasurementsasitisthecasewiththegainofourestimator. Adirect outcomeofthisdependence,inconjunctionwithoursystemmodel,isthatourproposed estimator constitutes a non–linear filter opposed to the standard KF, which is a linear filter. Finally,inthestandardKFsetting,theconditionaldistributionofthesystemstate proves to be Gaussian with the system state estimate being the conditional mean, and theconditionalfilteringerrorcovariancematrixbeingtheconditionalcovariancematrix. However,inoursetting,thisdistributioncoincideswiththesystemstateestimate. 3.4 OptimalControlPolicyDesign WeconsidertheactivestatetrackingproblemintroducedinSection3.2,wheretheinfor- mation available to the controller at time k consists of the observation–control history definedearlieras F k =σ{y 0 ,y 1 ,...,y k ,u 0 ,u 1 ,...,u k−1 }, k=1,2,...,L, F 0 =σ{y 0 }. (3.44) Weareinterestedindetermininganadmissiblecontrolpolicyγ ={η 0 ,η 1 ,...,η L−1 }[14] thatminimizesthecostfunction J γ =Eœ L Q k=1 tr‰Σ kSk (y k ,u k−1 )Ž¡ (3.45) subjecttothesystemequation(3.6)andtheobservationsequation(3.7),withLdenoting the horizon length, andu k = η k (F k ). The term tr‰Σ kSk (y k ,u k−1 )Ž denotes the trace of the conditional filtering error covariance matrix, which explicitly depends on the 80 measurement vectory k and the control inputu k−1 5 . 
Note that if our goal is to estimate different states with different estimation accuracies, we need to appropriately modify the MSE cost function defined above by weighting the related per–state errors differently. At this point, we have the following finite horizon, partially observable stochastic control problem

min_{u_0, u_1, ..., u_{L−1}} J_γ.   (3.46)

In contrast to standard problems of this type [14, 113], we note that our cost function is defined with respect to observations, not system states. This fact, along with the definition of our cost function, influences the form of the solution. To determine the optimal policy, we exploit the ideas in [14], i.e.,

i. We first reformulate our problem as a perfect state information problem using F_{k−1} as the new system state, and derive the corresponding dynamic programming (DP) algorithm (Section 3.4.1).

ii. Next, we determine a sufficient statistic for control purposes and derive a simplified DP algorithm, which solves for the optimal control policy (Section 3.4.2).

3.4.1 Perfect State Information Reformulation & DP Algorithm

In this section, we reduce our problem from imperfect to perfect state information and then derive the corresponding DP algorithm. From the observation–control history definition in (3.44), we observe that

F_k = (F_{k−1}, y_k, u_{k−1}),   k = 1, 2, ..., L−1,   F_0 = σ{y_0}.   (3.47)

⁵ We underscore that we have purposely defined the cost to be the MSE of the current untransformed estimate p̂_{k|k} to avoid challenging the determination of a closed–form solution.

The above equations describe the evolution of a system with system state F_{k−1}, control input u_{k−1}, and random "disturbance" y_k. Furthermore, we have that p(y_k | F_{k−1}, u_{k−1}) = p(y_k | F_{k−1}, u_{k−1}, y_0, y_1, ..., y_{k−1}), since by definition y_0, y_1, ..., y_{k−1} are part of F_{k−1}, and thus the probability distribution of y_k depends explicitly only on the state F_{k−1} and the control input u_{k−1}. In view of this, we define a new system with system state F_{k−1}, control input u_{k−1}, and random "disturbance" y_k, where the state is now perfectly observed.
Before we proceed to the derivation of the DP algorithm, we state two well–known importantresults,thefundamentallemmaofstochasticcontrolandtheprincipleofopti- mality. Lemma12(Speyer,Chung[113]). Supposethattheminimumto min u∈U g(x,u) (3.48) andU isaclassoffunctionsforwhichE{g(x,u)}exists. Then, min u(x)∈U E{g(x,u(x))}=E{ min u(x)∈U g(x,u(x)).} (3.49) PrincipleofOptimality(Bellman,1957). Anoptimalpolicyhasthepropertythatwhat- evertheinitialstateandinitialdecisionare,theremainingdecisionsmustconstitutean optimalpolicywithregardtothestateresultingfromthefirstdecision. Theorem13givestheDPrecursionforcomputingtheoptimalcontrolpolicyforthe newsystemwithstateF k−1 ,controlinputu k−1 ,andrandom“disturbance”y k . 82 Theorem 13. For k = L − 1,...,1, the cost–to–go function J k (F k−1 ) is related to J k+1 (F k )throughtherecursion J k (F k−1 )= min u k−1 ∈U E y k ™tr‰Σ kSk (y k ,u k−1 )Ž+J k+1 (F k−1 ,y k ,u k−1 )TF k−1 ,u k−1 ž , (3.50) Thecost–to–gofunctionfork=Lisgivenby J L (F L−1 )= min u L−1 ∈U E y L ™tr‰Σ LSL (y L ,u L−1 )ŽTF L−1 ,u L−1 ž . (3.51) Proof. We apply the property of iterated expectation and exploit the conditional inde- pendenceoftheobservation–controlhistorytorewritetheoptimalcostJ ∗ asfollows J ∗ = min u 0 ,u 1 ,...,u L−1 Eœ L Q k=1 tr‰Σ kSk (y k ,u k−1 )Ž¡ = min u 0 ,u 1 ,...,u L−1 EœEœtr‰Σ 1S1 (y 1 ,u 0 )Ž+Eœtr‰Σ 2S2 (y 2 ,u 1 )Ž+... +Eœtr‰Σ LSL (y L ,u L−1 )ŽWF L−1 ,u L−1 ¡W...WF 1 ,u 1 ¡WF 0 ,u 0 ¡¡. (3.52) We then use Lemma 12 to interchange the expectation and minimization operations as follows J ∗ =Eœmin u 0 Eœtr‰Σ 1S1 (y 1 ,u 0 )Ž+min u 1 Eœtr‰Σ 2S2 (y 2 ,u 1 )Ž +...+min u L−1 Eœtr‰Σ LSL (y L ,u L−1 )ŽWF L−1 ,u L−1 ¡W...WF 1 ,u 1 ¡WF 0 ,u 0 ¡¡. 
(3.53) Finally,employingtheprincipleofoptimality,weacquirethefollowingrecursions J L (F L−1 )= min u L−1 ∈U E y L ™tr‰Σ LSL (y L ,u L−1 )ŽTF L−1 ,u L−1 ž , J L−1 (F L−2 )= min u L−2 ∈U E y L−1 ™tr‰Σ LSL (y L−1 ,u L−2 )Ž+J L−1 (F L−2 ,y L−1 ,u L−2 )TF L−2 ,u L−2 ž , 83 ⋮ J 1 (F 0 )= min u 0 ∈U E y 1 ™tr‰Σ 1S1 (y 1 ,u 0 )Ž+J 2 (F 0 ,y 1 ,u 0 )TF 0 ,u 0 ž , wherethelaststepconcludestheproof. 3.4.2 SufficientStatistic&NewDPAlgorithm As typical with imperfect state information problems, the DP is carried outover a state space of expanding dimension since the dimension of the state F k−1 increases at each timestepk−1withtheadditionofanewobservation. Thus,weseekasufficientstatistic for control purposes (see e.g. [14, 113]). For our problem formulation, we can prove by induction [14] that an appropriate sufficient statistic is the conditional probability distribution ˆ p kSk−1 , which also corresponds to the one–step predicted estimate of the systemstate. Proposition 14. For our active state tracking problem, the conditional distribution ˆ p kSk−1 constitutesasufficientstatisticforcontrolpurposes. Proof. We first note that knowing ˆ p LSL−1 along with f(y L Se i ,u L−1 ),∀e i ∈ X, which is part of the signal model, is sufficient to perform the minimization in (3.51). Thus, the minimizationintheright–handsizeof(3.51)becomes J L (F L−1 )= min u L−1 ∈U H L (ˆ p LSL−1 ,u L−1 )=J L (ˆ p LSL−1 ) (3.54) forappropriatefunctionsH L andJ L . Next,weassumethat J k+1 (F k )=min u k ∈U H k+1 (ˆ p k+1Sk ,u k )=J k+1 (ˆ p k+1Sk ) (3.55) 84 forappropriatefunctionsH k+1 andJ k+1 ,andwewillshowthat J k (F k−1 )= min u k−1 ∈U H k (ˆ p kSk−1 ,u k−1 )=J k (ˆ p kSk−1 ) (3.56) forappropriatefunctionsH k andJ k . Weunderscorethat ˆ p k+1Sk canbegeneratedrecur- sivelybyanequationoftheform ˆ p k+1Sk =Φ k (ˆ p kSk−1 ,y k ,u k−1 ), (3.57) whereΦ k canberecursivelydeterminedfromtheproblem’sdata. 
Exploiting(3.57)and (3.55),(3.50)isrewrittenas J k (F k−1 )= min u k−1 ∈U E y k ™tr‰Σ kSk (y k ,u k−1 )Ž+J k+1 (Φ k (ˆ p kSk−1 ,y k ,u k−1 ))TF k−1 ,u k−1 ž . (3.58) At this point, we notice that knowing ˆ p kSk−1 along with p(y k SF k−1 ,u k−1 ) is suffi- cient to calculate the expression inside the minimization. Furthermore, the distribu- tionp(y k SF k−1 ,u k−1 ) can be expressed in terms of ˆ p kSk−1 andf(y k Se i ,u k−1 ),∀e i ∈ X. Therefore, (3.58) can be written as a function of ˆ p kSk−1 andu k−1 , which in turn allows ustorewrite(3.50)asfollows J k (F k−1 )= min u k−1 ∈U H k (ˆ p kSk−1 ,u k−1 ) (3.59) for a suitable function H k . This last step completes the induction process and thus, ˆ p kSk−1 provestobeasufficientstatistic. 85 In one time step, the sufficient statistic evolution follows Bayes’ rule, and is character- izedbythefollowingrecursiveformula ˆ p k+1Sk = Pr(y k ,u k−1 )ˆ p kSk−1 1 T n r(y k ,u k−1 )ˆ p kSk−1 , (3.60) where r(y k ,u k−1 ) = diag(f(y k Se 1 ,u k−1 ),...,f(y k Se n ,u k−1 )) is the n × n diagonal matrix of measurement vector probability density functions. Theorem 15 gives the DP algorithmintermsofthesufficientstatistic ˆ p kSk−1 . Theorem 15. For k = L − 1,...,1, the cost–to–go function J k (ˆ p kSk−1 ) is related to J k+1 (ˆ p k+1Sk )throughtherecursion J k (ˆ p kSk−1 )= min u k−1 ∈U ˆ p T kSk−1 h(ˆ p kSk−1 ,u k−1 ) + S 1 T n r(y,u k−1 )ˆ p kSk−1 J k+1 Œ Pr(y k ,u k−1 )ˆ p kSk−1 1 T n r(y k ,u k−1 )ˆ p kSk−1 ‘dy , (3.61) where h(ˆ p kSk−1 ,u k−1 )=[h(e 1 ,ˆ p kSk−1 ,u k−1 ),...,h(e n ,ˆ p kSk−1 ,u k−1 )] T (3.62) withcomponents h(e i ,ˆ p kSk−1 ,u k−1 )=1−tr‰G T k G k Q u k−1 i Ž−Yˆ p kSk−1 +G k (m u k−1 i −y kSk−1 )Y 2 . (3.63) Thecost–to–gofunctionfork=Lisgivenby J L (p LSL−1 )= min u L−1 ∈U ˆ p T LSL−1 h(ˆ p LSL−1 ,u L−1 ) . (3.64) 86 Proof. BeforeweproceedwiththeproofofTheorem15,westatethefollowinglemma, whichwillbeusedlaterintheproof. Lemma 16 (Petersen, Pedersen[94]). 
Assumex ∽ N(m,Σ) andb,A a vector and a matrixofappropriatedimensions,thenE™(x−b) T A(x−b)ž=(m−b) T A(m−b)+ tr‰AΣŽ. Next, starting from the DP algorithm given in (3.50), we separately determine each of the two terms inside the minimization. From the definition of the conditional filtering errorcovariancematrixin(3.40)andthefilterequationin(3.15),wehavethat tr‰Σ kSk (y k ,u k−1 )Ž=1−Yˆ p kSk−1 +G k (y−y kSk−1 )Y 2 . (3.65) Thus, thefirstterm, whichcorrespondstotheimmediatecostofselectingcontrolinput u k−1 ,canbecomputedasfollows E y k ™tr‰Σ kSk (y k ,u k−1 )ŽTF k−1 ,u k−1 ž= n Q i=1 ˆ p i kSk−1S f(ySe i ,u k−1 )tr‰Σ kSk (y,u k−1 )Ždy = n Q i=1 ˆ p i kSk−1 Œ1−E™Yˆ p kSk−1 +G k (y−y kSk−1 )Y 2 W x k =e i ,u k−1 ž‘. (3.66) TodeterminethetermE™Yˆ p kSk−1 +G k [y−y kSk−1 ]Y 2 Tx k =e i ,u k−1 ž,weworkasfollows E™Yˆ p kSk−1 +G k (y−y kSk−1 )Y 2 Tx k =e i ,u k−1 ž=2ˆ p T kSk−1 E™G k (y−y kSk−1 )Sx k =e i ,u k−1 ž +Yˆ p kSk−1 Y 2 +E™(y−y kSk−1 ) T G T k G k (y−y kSk−1 )Sx k =e i ,u k−1 ž. (3.67) NotethatG k andy kSk−1 dependbydefinitiononthecontrolinputu k−1 andthisimplies that E™G k (y−y kSk−1 )Sx k =e i ,u k−1 ž=G k (m u k−1 i −y kSk−1 ), (3.68) 87 where we have exploited the signal model in (3.1). To determine the E™(y − y kSk−1 ) T G T k G k (y−y kSk−1 )Sx k =e i ,u k−1 ž,weexploitLemma16andget E™(y−y kSk−1 ) T G T k G k (y−y kSk−1 )Sx k =e i ,u k−1 ž=tr‰G T k G k Σ u k−1 i Ž +(m u k−1 i −y kSk−1 ) T G T k G k (m u k−1 i −y kSk−1 ). (3.69) Substituting(3.68)–(3.69)into(3.67)andcombiningterms,weget E™Yˆ p kSk−1 +G k (y−y kSk−1 )Y 2 Tx k =e i ,u k−1 ž=Yˆ p kSk−1 +G k (m u k−1 i −y kSk−1 )Y 2 +tr‰G T k G k Σ u k−1 i Ž, (3.70) andtheimmediatecostofselectingcontrolu k−1 becomes n Q i=1 ˆ p i kSk−1 ‰1−tr‰G T k G k Σ u k−1 i Ž−Yˆ p kSk−1 +G k (m u k−1 i −y kSk−1 )Y 2 Ž. 
(3.71) The second term in (3.50) represents the expected future cost of selecting control input u k−1 andcanbedeterminedasfollows E y k ™J k+1 (F k−1 ,y k ,u k−1 )TF k−1 ,u k−1 ž= E y k ™J k+1 (Φ k (ˆ p kSk−1 ,y k ,u k−1 ))Tˆ p kSk−1 ,u k−1 ž = S p(ySˆ p kSk−1 ,u k−1 )J k+1 (Φ k (ˆ p kSk−1 ,y,u k−1 ))dy, (3.72) where we have used the facts that ˆ p kSk−1 is a sufficient statistic for F k−1 and u k−1 = η k−1 (F k−1 ), and we have denoted by Φ k the update rule governing the evolution of ˆ p k+1Sk . At this point, we only needto determine the termp(ySˆ p kSk−1 ,u k−1 )and this can bedoneasfollows p(ySˆ p kSk−1 ,u k−1 )= n Q i=1 P(x k =e i Sˆ p kSk−1 ,u k−1 )p(ySx k =e i ,u k−1 ) 88 = n Q i=1 ˆ p i kSk−1 f(ySe i ,u k−1 ). (3.73) Substitutingbackto(3.72),weget E y k ™J k+1 (Φ k (ˆ p kSk−1 ,y k ,u k−1 ))Tˆ p kSk−1 ,u k−1 ž= S 1 T n r(y,u k−1 )ˆ p kSk−1 ×J k+1 Œ Pr(y k ,u k−1 )ˆ p kSk−1 1 T n r(y k ,u k−1 )ˆ p kSk−1 ‘dy, (3.74) where we have used the update rule for ˆ p kSk−1 given in (3.60). Substituting (3.71) and (3.74)to(3.50),wegetthefinalformoftheDPalgorithmgivenin(3.61). Thecost–to– gofunctionfortimestepLsimplyconsistsoftheimmediatecostofselectingaparticular controlandhastheformgivenin(3.71). Determining the desired control policy via the recursions in Theorem 15 results in highcomputationalcomplexity. Specifically,aswithtraditionalPOMDPs,thepredicted belief state ˆ p kSk−1 is uncountably infinite [14]. Furthermore, the control input defini- tion suggests that the control space size can be exponentially large, while determin- ing the expected future cost is challenging since it requires, in the worst–case, an N– dimensional integration for a measurement vector of length N. Contrary to standard POMDP problems [14], the term ˆ p T kSk−1 h(ˆ p kSk−1 ,u k−1 ) is a non–linear function of the predictedbeliefstate ˆ p kSk−1 andthus,existingefficienttechniquessuchas[71]cannotbe directlyemployed. 
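The sufficient–statistic evolution (3.60) that drives these DP recursions is itself straightforward to implement. Below is a minimal NumPy sketch, assuming Gaussian per–state measurement densities as in the signal model; the function names are our own.

```python
import numpy as np

def gauss_pdf(y, m, Q):
    """Multivariate Gaussian density f(y | e_i, u_{k-1}), cf. (3.79)."""
    d = y - m
    return float(np.exp(-0.5 * d @ np.linalg.solve(Q, d))
                 / np.sqrt((2 * np.pi) ** d.size * np.linalg.det(Q)))

def belief_propagate(p_pred, y, means, covs, P):
    """One-step evolution (3.60) of the sufficient statistic:
    p_{k+1|k} = P r(y_k, u_{k-1}) p_{k|k-1} / (1_n^T r(y_k, u_{k-1}) p_{k|k-1}),
    where r(.) is the diagonal matrix of per-state measurement densities."""
    r = np.array([gauss_pdf(y, m, Q) for m, Q in zip(means, covs)])
    return (P @ (r * p_pred)) / (r @ p_pred)
```

Since the Bayes correction is normalised before the transition matrix is applied, the output is always a valid pmf, unlike the raw Kalman–like estimate.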
Still, for small problem sizes, an approximate solution via numerical computation is feasible and can reveal structural characteristics of the optimal solution. To determine suboptimal but less computationally intensive algorithms for deriving the desired control policy, one may extend techniques from [59], where non–linear cost functions were approximated by appropriate piecewise linear cost functions. However, the performance of the attendant strategies is a function of the accuracy of the adopted approximation. Another approach, which we will follow in Chapter 4, is to derive sufficient conditions, e.g., see [62, 81], for the structure of the optimal policy that enable efficient implementations.

3.5 Greedy Fisher Information Sensor Selection

In an effort to avoid the computational burden associated with dynamic programming, in this section we propose a myopic strategy based on the Fisher information measure [39]. We start by reviewing the definition of Fisher information and then move on to determine its exact form for our system model. We also comment on its structure and individual characteristics. Finally, we state the proposed algorithm and discuss its implementation.

3.5.1 Fisher Information

The Fisher information [39] constitutes a well–known information measure, which captures the amount of information that an observable random variable z ∈ R contains about an unknown parameter θ ∈ R. It is related to the concept of efficiency in estimation theory, since it provides a lower bound on the variance of estimators of a parameter, known as the Cramér–Rao lower bound (CRLB) [117]. To formally define the Fisher information, we begin with the following definition.

Definition 17 (Score function [105]). Let f(z|θ) be the conditional pdf of z given θ, which is also the likelihood function for θ. For the observation z to be informative about θ, the density must vary with θ. If f(z|θ) is smooth and differentiable, this change is quantified by the partial derivative with respect to θ of the natural logarithm of the likelihood function, i.e.
S(θ) = (∂/∂θ) ln f(z|θ),   (3.75)

which is called the score function. Under suitable regularity conditions (i.e., differentiation with respect to θ and integration with respect to z can be interchanged), it can be shown that the first moment of the score is zero, i.e.,

E{S(θ)} = ∫ (f′(z|θ)/f(z|θ)) f(z|θ) dz = (∂/∂θ) {∫ f(z|θ) dz} = 0.   (3.76)

Next, we give the formal definition of Fisher information.

Definition 18 (Fisher information [105]). The variance of the score function S(θ) is the expected Fisher information about θ, i.e.,

I(θ) = E{S²(θ)} = E{((∂/∂θ) ln f(z|θ))²},   (3.77)

where 0 ⩽ I(θ) < ∞.

We underscore that, since the expectation of the score is zero, the associated term has been dropped in Definition 18. Furthermore, we observe that the Fisher information characterizes the relative rate at which the pdf changes with respect to the unknown parameter θ. In other words, the greater the expected change is at a given value, the easier it is to distinguish this value from neighboring values, and hence the better the estimation performance we can achieve.

3.5.2 Discrete Fisher Information

We consider the dynamical system model in Section 3.2.1, where the unknown system state x_k is observed through a noisy measurement vector y_k that is shaped by a control input u_{k−1}. Since the system state x_k corresponds to a discrete–time, finite–state Markov chain with n states, we adopt hereafter the scalar notation x_k, where now X ≐ {1, 2, ..., n}. In our formulation, there are three key components: 1) the system state x_k, which corresponds to the unknown parameter of interest, 2) the measurement vector y_k, which refers to the observed random variable, and 3) the control input u_{k−1}. Therefore, we need to ensure that these components are taken into consideration during the derivation of the Fisher information measure. First, we observe that the discrete nature of the system state x_k prevents the direct application of Definitions 17 and 18.
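As a quick numerical sanity check of Definitions 17 and 18: for z ~ N(θ, σ²), the score is S(θ) = (z − θ)/σ² and the Fisher information takes the well–known value 1/σ². The snippet below, our own illustration, reproduces this by Monte Carlo.

```python
import numpy as np

def fisher_info_mc(theta, sigma, n_mc=200_000, seed=0):
    """Estimate I(theta) = E{S^2(theta)} for z ~ N(theta, sigma^2), whose
    score is S(theta) = d/dtheta ln f(z|theta) = (z - theta) / sigma^2."""
    rng = np.random.default_rng(seed)
    z = rng.normal(theta, sigma, size=n_mc)
    return np.mean(((z - theta) / sigma**2) ** 2)  # exact value: 1 / sigma^2
```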
Toovercome thisissue,wedefinethefollowinggeneralizedscorefunction S(x k ,x k +h k ,u k−1 )= 1 h k lnŒ f(y k Sx k +h k ,u k−1 ) f(y k Sx k ,u k−1 ) ‘, (3.78) where the dependence onu k−1 has been stated explicitly and h k denotes a “test point”. TheroleofthelatteristoavoidtheneedfordifferentiabilityimposedbyDefinition17, whilecapturinganychangesoftheparametervaluesandenablingthecomputationofa generalizedFisherinformationmeasure. Forcompleteness,werecallthatthedensityof amultivariateGaussianrandomvectorw=[w 1 ,...,w d ] T isgivenby f(w)= (2π) −d~2 SΣS 1~2 expŒ− 1 2 (w−μ) T Σ −1 (w−μ)‘, (3.79) whereμisthemeanvectorandΣisthecovariancematrix. Todetermine,theexactformofthegeneralizedscorefunctionforoursystemmodel, wesubstitute(3.79)in(3.78)andaftersomemanipulations,weget S(x k ,x k +h k ,u k−1 )= 1 h k lnŒ ¿ Á Á À SΣ u k−1 x k S SΣ u k−1 x k +h k S ‘ 92 − 1 2 Œy T k A u k−1 x k ,x k +h k y k −2y T k b u k−1 x k ,x k +h k +c u k−1 x k ,x k +h k ‘ , (3.80) where A u k−1 x k ,x k +h k ≐Σ u k−1 ,−1 x k +h k −Σ u k−1 ,−1 x k , (3.81) b u k−1 x k ,x k +h k ≐Σ u k−1 ,−1 x k +h k m u k−1 x k +h k −Σ u k−1 ,−1 x k m u k−1 x k , (3.82) c u k−1 x k ,x k +h k ≐m u k−1 ,T x k +h k Σ u k−1 ,−1 x k +h k m u k−1 x k +h k −m u k−1 ,T x k Σ u k−1 ,−1 x k m u k−1 x k . (3.83) Theexpectedvalueofthegeneralizedscorefunctionin(3.80)hasthefollowingform E{S(x k ,x k +h k ,u k−1 )}= 1 h k lnŒ ¿ Á Á À SΣ u k−1 x k S SΣ u k−1 x k +h k S ‘− 1 2 tr‰A u k−1 x k ,x k +h k Σ u k−1 x k Ž − 1 2 m u k−1 ,T x k A u k−1 x k ,x k +h k m u k−1 x k +m u k−1 ,T x k b u k−1 x k ,x k +h k − 1 2 c u k−1 x k ,x k +h k ≐ 1 h k μ u k−1 x k ,x k +h k , (3.84) wherewehaveexploitedthefollowingpropertyforw∼N(μ,Σ)[94] E{w T Aw}=tr(AΣ)+μ T Aμ. (3.85) Atthispoint,wedefinethefollowinggeneralizedFisherinformationmeasure I(x k ,x k +h k ,u k−1 )≐EœS(x k ,x k +h k ,u k−1 )− 1 h k μ u k−1 x k ,x k +h k 2 ¡, (3.86) 93 where once more the dependence onu k−1 has been stated explicitly. 
To determine the exact form of this measure for our system model, we exploit (3.85) along with the fol- lowingpropertiesforw∼N(μ,Σ)[94] E{(w T Aw) 2 }=tr(AΣ(A+A T )Σ)+μ T (A+A T )Σ(A+A T )μ +(tr(AΣ)+μ T Aμ) 2 , (3.87) E{w T Aww T b}=(μ T A+(Aμ) T )Σb+(tr(ΣA T )+μ T Aμ)b T μ. (3.88) Notethattheaboveexpressionscanbesimplifiedmoreinourcase,since(A u k−1 x k ,x k +h k ) T = A u k−1 x k ,x k +h k . Aftersomemanipulations,Eq. (3.86)becomes I(x k ,x k +h k ,u k−1 )= 1 2h 2 k trŒ‰A u k−1 x k ,x k +h k Σ u k−1 x k Ž 2 ‘+ 1 h 2 k b u k−1 ,T x k ,x k +h k Σ u k−1 x k b u k−1 x k ,x k +h k − 1 h 2 k m u k−1 ,T x k A u k−1 x k ,x k +h k −2lnŒ ¿ Á Á À SΣ u k−1 x k S SΣ u k−1 x k +h k S ‘m u k−1 x k +Σ u k−1 x k A u k−1 x k ,x k +h k m u k−1 x k −2Σ u k−1 x k b u k−1 x k ,x k +h k . (3.89) Wenoticethat,asexpected,theresultingmeasureconstitutesacomplicatedfunctionof the statistics of the underlying multivariate Gaussian model. At the same time, these statisticsaredrivenbytheselectedcontrolinputu k−1 . As already discussed, the generalized Fisher information measure in (3.86) avoids the need for differentiability of the associated likelihood function by using test points. Ingeneral,thesetestpointsareselectedsothattheresultingparameterspaceiscovered, yetensuringthatinvalidparametervaluesareignored. Forourproblem,thisimpliesthat test points should be selected to account for the discrete nature of the parameter space, i.e. testpointsshouldbestate–dependent: h t ∈A≐™h t (x t )∈RSx t +h t (x t )∈Xž. 
For example, if we assume X = {1, 2, 3, 4}, the valid test point values for state x_k = 2 are −1, 1 and 2.

Remark 1. The generalized Fisher information measure in (3.86) can also be employed for constrained parameters. In that case, the selection of test points is less restrictive than in our case, viz., if θ is defined in a constrained interval of the form [a, b], h_k is selected so that θ + h_k ∈ [a, b].

3.5.3 GFIS² Algorithm

Algorithm 3 Greedy Fisher Information Sensor Selection (GFIS²)
1: // INITIALIZATION
2: p̂_{0|−1} := π; x̂_0 = argmax p̂_{0|−1};
3: Determine φ(x̂_0, u_{−1}) using (3.89);
4: u^{GFIS²}_{−1} = argmax φ(x̂_0, u_{−1});
5: // MAIN LOOP
6: for k = 0 : L do
7:   Request measurement vector y_k based on u_{k−1};
8:   y_{k|k−1} = M(u_{k−1}) p̂_{k|k−1};
9:   G_k = Σ_{k|k−1} M^T(u_{k−1}) (M(u_{k−1}) Σ_{k|k−1} M^T(u_{k−1}) + Q̃_k)^{−1};
10:  p̂_{k|k} = p̂_{k|k−1} + G_k (y_k − y_{k|k−1});
11:  // SYSTEM STATE ESTIMATE AT TIME STEP k
12:  Declare system state as: x̂_k := argmax p̂_{k|k};
13:  p̂_{k+1|k} = P p̂_{k|k}; x̂_{k+1} = argmax p̂_{k+1|k};
14:  Determine φ(x̂_{k+1}, u_k) using (3.89);
15:  u^{GFIS²}_k = argmax φ(x̂_{k+1}, u_k);
16:  k := k + 1;
17: end for

As already discussed, Fisher information captures the amount of information that an observable random variable carries about an unknown parameter. Ideally, we would like to maximize this information so that we can infer with certainty the value of the unknown parameter of interest. In our formulation, we also have an extra degree of freedom, the control input, which we can exploit to maximize the amount of information we acquire with respect to the unknown parameter. To this end, we propose the following myopic sensor selection strategy that maximizes the generalized Fisher information measure (3.89) at each time step:

u^{GFIS²}_k = argmax φ(x_{k+1}, u_k),   (3.90)

where φ(x_{k+1}, u_k) ≐ max_{h_{k+1}} [I(x_{k+1}, x_{k+1} + h_{k+1}, u_k)]. We underscore that the Fisher information in (3.89) is maximized with respect to all possible test point values at each time step to ensure that the tightest Fisher information is computed.
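A hedged sketch of the selection rule (3.90): rather than transcribing the closed form (3.89), we estimate the generalized Fisher information directly from its definition (3.86), as the Monte Carlo variance of the generalized score (3.78) under f(y | x, u). States are 0–indexed here, a model per control is passed in explicitly, and all names are our own.

```python
import numpy as np

def log_gauss(y, m, Q):
    """Log-density of N(m, Q) evaluated at y."""
    d = y - m
    return -0.5 * (d @ np.linalg.solve(Q, d)
                   + d.size * np.log(2 * np.pi)
                   + np.log(np.linalg.det(Q)))

def gen_fisher_mc(x, h, means, covs, n_mc=4000, seed=0):
    """Monte Carlo estimate of the generalized Fisher information (3.86):
    the variance of the generalized score (3.78) under y ~ f(y | x, u)."""
    rng = np.random.default_rng(seed)
    ys = rng.multivariate_normal(means[x], covs[x], size=n_mc)
    scores = np.array([(log_gauss(y, means[x + h], covs[x + h])
                        - log_gauss(y, means[x], covs[x])) / h for y in ys])
    return scores.var()

def gfis2_control(x_hat, models):
    """Greedy rule (3.90): pick u maximising phi(x, u) = max_h I(x, x+h, u).
    models[u] = (means, covs); test points h keep x_hat + h a valid state."""
    n = len(models[0][0])
    best_u, best_phi = 0, -np.inf
    for u, (means, covs) in enumerate(models):
        phi = max(gen_fisher_mc(x_hat, h, means, covs)
                  for h in range(-x_hat, n - x_hat) if h != 0)
        if phi > best_phi:
            best_u, best_phi = u, phi
    return best_u
```

As expected, a control whose per–state densities coincide yields zero generalized Fisher information and is never selected over an informative one.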
Examining Eq. (3.90), we notice that the function φ(⋅) depends both on the system state x_{k+1} and the control input u_k. However, the former variable is unknown; in fact, this is precisely what we wish to infer. To overcome this impediment, we instead use an estimate of the system state, i.e.,

x̂_{k+1} = argmax p̂_{k+1|k},   (3.91)

where p̂_{k+1|k} is computed through our Kalman–like filter recursions of Theorem 10. Our proposed strategy, which we refer to as Greedy Fisher Information Sensor Selection (GFIS²), is shown in Algorithm 3. Note that the sensor selection part is intertwined with the Kalman–like filter recursions. In particular, at each time step, GFIS² determines the predicted belief state, which it then uses to determine the appropriate control input via (3.90).

The proposed algorithm presents several benefits. The most important among these is its myopic structure, i.e., no computation of an expected future cost is required any more. As a result, the proposed algorithm incurs much lower computational complexity compared to DP. At the same time, the computation of the values of the function φ(⋅), along with the optimization step (3.90), can be completed off–line. Consequently, the proposed strategy can be implemented as a look–up table, suggesting a very efficient implementation. The associated complexity is O(nαc_GFI), where c_GFI is the complexity of computing φ(⋅) for a pair (x_{k+1}, u_k), versus O((d+1)^n α c_{h,int}) for DP, where (d+1)^n is the number of predicted belief states⁶ and c_{h,int} is the complexity of computing the term inside the minimization of (3.61) for a pair (p̂_{k|k−1}, u_k).

3.6 Smoothing Estimators

In this section, we develop suboptimal MMSE smoother formulae for estimating the discrete–time, finite–state Markov chain at each time step. Our goal is to obtain more refined system state estimates given the availability of both past and future observations and control inputs. We seek recursive formulae for p_{k|s}, s > k.
Exploiting the theory introduced in [106, 107], we begin by defining two sequences, similar to the ones in (3.10) and (3.11),

$$\gamma_s \doteq p_{k|s} - p_{k|s-1} = E\{x_k | \mathcal{F}_s\} - E\{x_k | \mathcal{F}_{s-1}\}, \qquad (3.92)$$
$$\zeta_s \doteq y_s - \bar{y}_{s|s-1} = y_s - E\{y_s | \mathcal{F}_{s-1}\}, \qquad (3.93)$$

which we can easily show are $\{\mathcal{F}\}$-MD sequences. Therefore, the MD representation theorem allows us to write $\{\gamma\}$ in terms of the innovations $\{\zeta\}$ as $\gamma_s = C_s \zeta_s$, and $C_s$ can be determined as in (3.12) from

$$C_s = E\{\gamma_s \zeta_s^T | \mathcal{F}_s\} \left( E\{\zeta_s \zeta_s^T | \mathcal{F}_s\} \right)^{-1}. \qquad (3.94)$$

Once more, the gain sequence $\{C_s\}$ is not $\{\mathcal{F}\}$-predictable. To determine a recursive solution, we impose recursivity as a design constraint and use the following approximation:

$$C_s \approx E\{\gamma_s \zeta_s^T | \mathcal{F}_{s-1}\} \left( E\{\zeta_s \zeta_s^T | \mathcal{F}_{s-1}\} \right)^{-1}. \qquad (3.95)$$

Theorem 19 states the general, finite-dimensional expression for the proposed suboptimal MMSE smoother $\hat{p}_{k|R}$ for the Markov chain system state.

Theorem 19. The $R$-stage, approximate smoothed estimator of $x_k$, denoted by $\hat{p}_{k|R}$ with $R \geq k+1$, $k \geq 0$, is given by the expression

$$\hat{p}_{k|R} = \hat{p}_{k|k} + \sum_{s=k+1}^{R} C_s (y_s - \bar{y}_{s|s-1}) \qquad (3.96)$$

with

$$C_s = \left( \Theta_{k,s} - \hat{p}_{k|s-1} \hat{p}_{s|s-1}^T \right) M^T(u_{s-1}) \left( M(u_{s-1}) \Sigma_{s|s-1} M^T(u_{s-1}) + \tilde{Q}_s \right)^{-1}, \qquad (3.97)$$

where

$$\Theta_{k,s} = E\{x_k x_{s-1}^T | \mathcal{F}_{s-1}\}\, P^T, \qquad (3.98)$$
$$E\{x_k x_{s-1}^T | \mathcal{F}_{s-1}\} = \frac{\Theta_{k,s-1}\, r(y_{s-1}, u_{s-2})}{\mathbf{1}_n^T \left( \Theta_{k,s-1}\, r(y_{s-1}, u_{s-2}) \right) \mathbf{1}_n}, \qquad (3.99)$$

with $r(y_{s-1}, u_{s-2}) = \mathrm{diag}(f(y_{s-1}|e_1, u_{s-2}), \ldots, f(y_{s-1}|e_n, u_{s-2}))$ denoting the $n \times n$ diagonal matrix of measurement vector probability density functions, $E\{x_0 x_0^T | \mathcal{F}_0\} = \mathrm{diag}(\hat{p}_{0|0})$ and $\tilde{Q}_s = \sum_{i=1}^n \hat{p}^i_{s|s-1} Q^{u_{s-1}}_i$.

Proof. The smoothed estimator of $x_k$ can be derived by summing (3.92) from $s = k$ to $s = R$, substituting $\gamma_s$ from (3.92) and using the approximation in (3.95) for $C_s$, viz.

$$\sum_{s=k}^{R} \gamma_s = \sum_{s=k}^{R} (p_{k|s} - p_{k|s-1}) = p_{k|R} - p_{k|k-1} \;\Rightarrow\; \hat{p}_{k|R} = \hat{p}_{k|k} + \sum_{s=k+1}^{R} E\{\gamma_s \zeta_s^T | \mathcal{F}_{s-1}\} \left( E\{\zeta_s \zeta_s^T | \mathcal{F}_{s-1}\} \right)^{-1} \zeta_s, \qquad (3.100)$$

where the approximate MMSE estimate has been used in the last step.

⁶We quantize the predicted belief space with resolution $d$.
The computation of the terms $E\{\gamma_s \zeta_s^T | \mathcal{F}_{s-1}\}$ and $E\{\zeta_s \zeta_s^T | \mathcal{F}_{s-1}\}$ can be carried out similarly to the Kalman-like filter gain. The second term can be derived following the same principles as in the derivation of $E\{\lambda_k \lambda_k^T | \mathcal{F}_{k-1}\}$, and thus we have

$$E\{\zeta_s \zeta_s^T | \mathcal{F}_{s-1}\} = \tilde{Q}_s + M(u_{s-1}) \Sigma_{s|s-1} M^T(u_{s-1}), \qquad (3.101)$$

where $\tilde{Q}_s = \sum_{i=1}^n p^i_{s|s-1} Q^{u_{s-1}}_i$. For the first term, we have

$$E\{\gamma_s \zeta_s^T | \mathcal{F}_{s-1}\} = E\{p_{k|s} \zeta_s^T | \mathcal{F}_{s-1}\} - E\{p_{k|s-1} \zeta_s^T | \mathcal{F}_{s-1}\}$$
$$= E\{E\{x_k | \mathcal{F}_s\} \zeta_s^T | \mathcal{F}_{s-1}\} - p_{k|s-1} E\{\zeta_s^T | \mathcal{F}_{s-1}\}$$
$$\overset{(a)}{=} E\{E\{x_k \zeta_s^T | \mathcal{F}_s\} | \mathcal{F}_{s-1}\} = E\{x_k (y_s - \bar{y}_{s|s-1})^T | \mathcal{F}_{s-1}\}$$
$$\overset{(b)}{=} E\{x_k x_s^T | \mathcal{F}_{s-1}\} M^T(u_{s-1}) + E\{x_k v_s^T | \mathcal{F}_{s-1}\} - p_{k|s-1} \bar{y}_{s|s-1}^T, \qquad (3.102)$$

where we have exploited that (a) $\zeta_s$ is an $\{\mathcal{F}\}$-MD sequence and (b) $u_{s-1} = \eta_{s-1}(\mathcal{F}_{s-1})$. To derive a closed-form expression for the term $\Theta_{k,s} = E\{x_k x_s^T | \mathcal{F}_{s-1}\}$, we first observe that

$$\Theta_{k,s} = E\{x_k x_s^T | \mathcal{F}_{s-1}\} = E\{x_k x_{s-1}^T | \mathcal{F}_{s-1}\} P^T + E\{x_k w_s^T | \mathcal{F}_{s-1}\}$$
$$= E\{x_k x_{s-1}^T | \mathcal{F}_{s-1}\} P^T + E\{E\{x_k w_s^T | \mathcal{B}_{s-1}\} | \mathcal{F}_{s-1}\}$$
$$= E\{x_k x_{s-1}^T | \mathcal{F}_{s-1}\} P^T + E\{x_k E\{w_s^T | \mathcal{B}_{s-1}\} | \mathcal{F}_{s-1}\} = E\{x_k x_{s-1}^T | \mathcal{F}_{s-1}\} P^T. \qquad (3.103)$$

Then, for the term $E\{x_k x_{s-1}^T | \mathcal{F}_{s-1}\}$, we work as follows:

$$E\{x_k x_{s-1}^T | \mathcal{F}_{s-1}\} = \sum_{i=1}^n \sum_{j=1}^n e_i e_j^T\, P(x_k = e_i, x_{s-1} = e_j | \mathcal{F}_{s-1}) = \big[ P(x_k = e_i, x_{s-1} = e_j | \mathcal{F}_{s-1}) \big]_{i,j=1}^{n}$$
$$= \frac{E\{x_k x_{s-2}^T | \mathcal{F}_{s-2}\} P^T\, r(y_{s-1}, u_{s-2})}{p(y_{s-1} | \mathcal{F}_{s-2}, u_{s-2})} = \frac{\Theta_{k,s-1}\, r(y_{s-1}, u_{s-2})}{p(y_{s-1} | \mathcal{F}_{s-2}, u_{s-2})} = \frac{\Theta_{k,s-1}\, r(y_{s-1}, u_{s-2})}{\mathbf{1}_n^T \left( \Theta_{k,s-1}\, r(y_{s-1}, u_{s-2}) \right) \mathbf{1}_n}, \qquad (3.104)$$

where $r(y_{s-1}, u_{s-2}) = \mathrm{diag}(f(y_{s-1}|x_{s-1}=e_1, u_{s-2}), \ldots, f(y_{s-1}|x_{s-1}=e_n, u_{s-2}))$ is the $n \times n$ diagonal matrix of measurement vector probability density functions. The above recursive formula is initialized by $E\{x_0 x_0^T | \mathcal{F}_0\} = \mathrm{diag}(p_{0|0})$.
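The joint-moment recursion (3.98)-(3.99) is a small matrix update. A sketch, assuming the likelihood values $f(y_{s-1}|e_j, u_{s-2})$ arrive as a vector (the numbers used below are illustrative, not from the thesis):

```python
import numpy as np

def update_Theta(Theta_prev, lik, P):
    """One step of (3.98)-(3.99). lik[j] = f(y_{s-1} | e_j, u_{s-2}),
    i.e. the diagonal of r(y_{s-1}, u_{s-2}); P is column-stochastic."""
    M = Theta_prev * lik[None, :]   # Theta_{k,s-1} r(y,u): scale column j by lik[j]
    M = M / M.sum()                 # normalize by 1^T (Theta r) 1 -> a joint pmf
    return M @ P.T                  # Theta_{k,s} = E{x_k x_{s-1}^T | F_{s-1}} P^T
```

Because the normalization step produces a joint probability mass function over $(x_k, x_{s-1})$, the entries of the normalized matrix always sum to one, which is a useful runtime sanity check.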
For the term $E\{x_k v_s^T | \mathcal{F}_{s-1}\}$, we have

$$E\{x_k v_s^T | \mathcal{F}_{s-1}\} = E\{E\{x_k v_s^T | \mathcal{B}_s^-\} | \mathcal{F}_{s-1}\} = E\{x_k E\{v_s^T | \mathcal{B}_s^-\} | \mathcal{F}_{s-1}\} = 0, \qquad (3.105)$$

where we have exploited the MD property of $v_s$ along with the fact that $x_k \in \mathcal{B}_s^-$, $\forall s > k$. The above results for $\Theta_{k,s}$ and $E\{x_k v_s^T | \mathcal{F}_{s-1}\}$ allow us to rewrite (3.102) as

$$E\{\gamma_s \zeta_s^T | \mathcal{F}_{s-1}\} = \Theta_{k,s} M^T(u_{s-1}) - p_{k|s-1} \bar{y}_{s|s-1}^T = \left( \Theta_{k,s} - p_{k|s-1} p_{s|s-1}^T \right) M^T(u_{s-1}). \qquad (3.106)$$

Substituting (3.101) and (3.106) back into (3.100), where the notation of the approximate MMSE estimate has been adopted to emphasize the use of the approximation in (3.95), completes the proof.

The MSE performance of the smoother in (3.96) can be calculated similarly to the MSE performance of the filter and is characterized by the conditional smoothing error covariance matrix, defined as

$$\Sigma_{k|R} \doteq E\{(x_k - \hat{p}_{k|R})(x_k - \hat{p}_{k|R})^T | \mathcal{F}_R\} = \mathrm{diag}(\hat{p}_{k|R}) - \hat{p}_{k|R} \hat{p}_{k|R}^T, \quad R \geq k+1, \; k \geq 0. \qquad (3.107)$$

As evident from Theorem 19, the gain matrix $C_s$ depends non-linearly on the observations, as is the case with the Kalman-like filter. Comparing our smoothed estimator with the corresponding Kalman smoother [25], we observe that in both cases filtered estimates are required to obtain smoothed estimates, and the smoother gains do not depend on conditional smoothing error covariance matrices. Furthermore, as with the standard Kalman smoother, the Kalman-like filter's gain is a factor of the smoother gain, since

$$C_s = \left( \Theta_{k,s} - \hat{p}_{k|s-1} \hat{p}_{s|s-1}^T \right) \Sigma_{s|s-1}^{-1} G_s. \qquad (3.108)$$

This allows us to rewrite the smoother in (3.96) as

$$\hat{p}_{k|R} = \hat{p}_{k|k} + \sum_{s=k+1}^{R} \left( \Theta_{k,s} - \hat{p}_{k|s-1} \hat{p}_{s|s-1}^T \right) \Sigma_{s|s-1}^{-1} \left( \hat{p}_{s|s} - \hat{p}_{s|s-1} \right). \qquad (3.109)$$

There are three well-known types of smoothers in the literature, depending on the way observations are processed: fixed-point, fixed-interval and fixed-lag. The fixed-point smoother $p_{k|R}$, $R \geq k+1$, uses all available information up to and including time step $R$ to improve the estimate of the state at a specific time step.
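The smoothing error covariance (3.107) has the same diagonal-minus-outer-product form as the filtering covariance, so its trace (the MSE proxy) is easy to evaluate for any belief vector. A minimal sketch:

```python
import numpy as np

def smoothing_error_cov(p_smooth):
    """Conditional smoothing error covariance (3.107):
    Sigma_{k|R} = diag(p_hat) - p_hat p_hat^T."""
    return np.diag(p_smooth) - np.outer(p_smooth, p_smooth)

def smoothing_mse(p_smooth):
    """Trace of (3.107); smaller when the belief is more concentrated."""
    return float(np.trace(smoothing_error_cov(p_smooth)))
```

As expected from the text, a belief sharpened by future observations yields a smaller trace than a more diffuse one, which is why smoothing improves the MSE even when the MAP decision is unchanged.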
In contrast, the fixed-interval smoother $p_{k|L} \doteq E\{x_k | \mathcal{F}_L\}$, $k = 0,1,\ldots,L-1$, uses all available information, while the fixed-lag smoother $p_{k|k+\Delta} \doteq E\{x_k | \mathcal{F}_{k+\Delta}\}$, $k = 0,1,\ldots$, uses all information up to and including a fixed interval of time $\Delta$ from the time step of interest. Theorem 19 and Propositions 20 and 21 give expressions for the approximate MMSE fixed-point, fixed-interval and fixed-lag smoothed estimators, respectively.

Proposition 20. The fixed-interval approximate smoothed estimator of $x_k$, $\hat{p}_{k|L}$, $k = 1,2,\ldots,L-1$, is given by the expression

$$\hat{p}_{k|L} = P \hat{p}_{k-1|L} + (I_n - P) \sum_{s=k}^{L} C_s (y_s - \bar{y}_{s|s-1}), \qquad (3.110)$$

where $I_n$ is the $n \times n$ identity matrix, and it is initialized by

$$\hat{p}_{0|L} = \hat{p}_{0|0} + \sum_{s=1}^{L} C_s (y_s - \bar{y}_{s|s-1}), \qquad (3.111)$$

which is obtained from the fixed-point smoothed estimator for $k = 0$.

Proof. Our proposed fixed-point smoother is of the form

$$\hat{p}_{k|L} = \hat{p}_{k|k} + \sum_{s=k+1}^{L} C_s (y_s - \bar{y}_{s|s-1}), \qquad (3.112)$$

and replacing $k$ with $k-1$, we get

$$\hat{p}_{k-1|L} = \hat{p}_{k-1|k-1} + \sum_{s=k}^{L} C_s (y_s - \bar{y}_{s|s-1}). \qquad (3.113)$$

Multiplying (3.113) by $P$, subtracting it from (3.112) and rearranging terms results in

$$\hat{p}_{k|L} = P \hat{p}_{k-1|L} + \hat{p}_{k|k} - \hat{p}_{k|k-1} - P G_k (y_k - \bar{y}_{k|k-1}) + (I_n - P) \sum_{s=k+1}^{L} C_s (y_s - \bar{y}_{s|s-1}). \qquad (3.114)$$

At this point, we observe from the filter equation in (3.15) that

$$\hat{p}_{k|k} - \hat{p}_{k|k-1} - P G_k (y_k - \bar{y}_{k|k-1}) = (I_n - P) C_k (y_k - \bar{y}_{k|k-1}), \qquad (3.115)$$

where we have exploited that $G_k$ coincides with $C_k$. Substituting (3.115) back into (3.114) gives the final form of the fixed-interval smoother in (3.110).

Proposition 21. The fixed-lag approximate smoothed estimator of $x_k$, $\hat{p}_{k|k+\Delta}$, $k = 0,1,\ldots$, is given by the expression

$$\hat{p}_{k|k+\Delta} = P \hat{p}_{k-1|k+\Delta-1} + \Gamma(k,\Delta) + (I_n - P) \sum_{s=k+1}^{k+\Delta-1} C_s (y_s - \bar{y}_{s|s-1}), \qquad (3.116)$$

where $I_n$ is the $n \times n$ identity matrix,

$$\Gamma(k,\Delta) \doteq C_{k+\Delta} (y_{k+\Delta} - \bar{y}_{k+\Delta|k+\Delta-1}) - \hat{p}_{k+1|k} - \hat{p}_{k|k-1} + P \hat{p}_{k|k-1}, \qquad (3.117)$$

and the smoother is initialized by

$$\hat{p}_{0|\Delta} = \hat{p}_{0|0} + \sum_{s=1}^{\Delta} C_s (y_s - \bar{y}_{s|s-1}), \qquad (3.118)$$

which is obtained from the fixed-point smoothed estimator for $k = 0$.

Proof.
Setting $L = k + \Delta$ in (3.96), we get

$$\hat{p}_{k|k+\Delta} = \hat{p}_{k|k} + \sum_{s=k+1}^{k+\Delta} C_s (y_s - \bar{y}_{s|s-1}), \qquad (3.119)$$

and for $k = k-1$, (3.119) becomes

$$\hat{p}_{k-1|k+\Delta-1} = \hat{p}_{k-1|k-1} + \sum_{s=k}^{k+\Delta-1} C_s (y_s - \bar{y}_{s|s-1}). \qquad (3.120)$$

Multiplying (3.120) by $P$, subtracting it from (3.119) and rearranging terms gives us the following:

$$\hat{p}_{k|k+\Delta} = P \hat{p}_{k-1|k+\Delta-1} + \left[ C_{k+\Delta} (y_{k+\Delta} - \bar{y}_{k+\Delta|k+\Delta-1}) - P C_k (y_k - \bar{y}_{k|k-1}) - P \hat{p}_{k-1|k-1} \right] + (I_n - P) \sum_{s=k+1}^{k+\Delta-1} C_s (y_s - \bar{y}_{s|s-1}). \qquad (3.121)$$

We observe that

$$P C_k (y_k - \bar{y}_{k|k-1}) + P \hat{p}_{k-1|k-1} = \hat{p}_{k+1|k} + \hat{p}_{k|k-1} - P \hat{p}_{k|k-1}, \qquad (3.122)$$

and after setting

$$\Gamma(k,\Delta) = C_{k+\Delta} (y_{k+\Delta} - \bar{y}_{k+\Delta|k+\Delta-1}) - \hat{p}_{k+1|k} - \hat{p}_{k|k-1} + P \hat{p}_{k|k-1}, \qquad (3.123)$$

we obtain the final form of the fixed-lag smoother given in (3.116).

As in the case of the Kalman-like estimator, we can apply a suitable memoryless (linear or non-linear) transformation to the smoothed estimates above to obtain valid probability mass functions.

3.7 Numerical Example

In this section, we provide numerical results to illustrate the performance of our proposed framework for the body sensing application of Chapter 2. For the convenience of the reader, we begin by summarizing some key characteristics of this problem.

Our goal is to estimate the time-evolving physical activity state of an individual by using information from three biometric sensors: two accelerometers (ACCs) and an electrocardiograph (ECG). We focus on distinguishing between four physical states (Sit, Stand, Run, Walk) with transition probability matrix $P$ of the form

$$P = \begin{bmatrix} 0.6 & 0.2 & 0 & 0.4 \\ 0.1 & 0.4 & 0.1 & 0 \\ 0 & 0.1 & 0.3 & 0.3 \\ 0.3 & 0.3 & 0.6 & 0.3 \end{bmatrix}. \qquad (3.124)$$

The control input is defined as a tuple, with each element indicating the requested number of samples from the associated sensor at each time step, while the total requested number of samples does not exceed a budget of $N$ samples. Each sample corresponds to an extracted feature value from the associated biometric signal. Here, we focus on three features: 1) the ACC mean from the first ACC ($S_1$), 2) the ACC variance from the second ACC ($S_2$), and 3) the ECG period from the ECG ($S_3$).
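Note that the columns, not the rows, of $P$ in (3.124) sum to one, because the predicted belief evolves as $\hat{p}_{k+1|k} = P \hat{p}_{k|k}$. A quick sanity check of the matrix and one prediction step:

```python
import numpy as np

# Transition matrix from (3.124); column-stochastic since p_{k+1|k} = P p_{k|k}.
P = np.array([[0.6, 0.2, 0.0, 0.4],
              [0.1, 0.4, 0.1, 0.0],
              [0.0, 0.1, 0.3, 0.3],
              [0.3, 0.3, 0.6, 0.3]])

assert np.allclose(P.sum(axis=0), 1.0)   # each column is a valid pmf

# Example filtered belief (Sit most likely) and its one-step prediction.
p_filt = np.array([0.7, 0.1, 0.1, 0.1])
p_pred = P @ p_filt
```

Here `p_filt` is an arbitrary illustrative belief; the prediction step preserves the simplex, so `p_pred` again sums to one.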
Based on the problem characteristics and the control input definition, the signal model in (3.1) constitutes an AR(1)-correlated multivariate conditionally Gaussian model with statistics

$$m^{u_{k-1}}_i = [\mu_{i,u_{k-1}}(S_1)^T, \mu_{i,u_{k-1}}(S_2)^T, \mu_{i,u_{k-1}}(S_3)^T]^T, \qquad (3.125)$$
$$Q^{u_{k-1}}_i = \mathrm{diag}(Q_{i,u_{k-1}}(S_1), Q_{i,u_{k-1}}(S_2), Q_{i,u_{k-1}}(S_3)), \qquad (3.126)$$
$$Q_{i,u_{k-1}}(S_l) = \frac{\sigma^2_{S_l,i}}{1 - \phi^2}\, T + \sigma^2_z I, \qquad (3.127)$$

where $i$ indicates physical state $e_i$, $S_l$ denotes sensor $l$, $\mu_{i,u_{k-1}}(S_l)$ is of size $N^{u_{k-1}}_l \times 1$, $T$ is a Toeplitz matrix with first row/column $[1, \phi, \phi^2, \ldots, \phi^{N^{u_{k-1}}_l - 1}]$, $I$ is the $N^{u_{k-1}}_l \times N^{u_{k-1}}_l$ identity matrix, $N^{u_{k-1}}_l$ indicates the requested number of samples from sensor $S_l$, $\phi$ is the parameter of the AR(1) model, and $\sigma^2_z$ accounts for sensing and communication noise. We refer the reader to Chapter 2 for more details.

Our numerical simulations are based on the above model and driven by the real data collected by our prototype body sensing network. To showcase our framework's performance, we focus on distinguishing between four activities for a single individual, with signal model distributions shown in Fig. 2.2a. We have assumed $N = 2$ samples, $\phi = 0.25$ and $\sigma^2_z = 2$. We underscore that our methods are directly applicable to multiple sensors and physical states as well as larger sample budgets. Finally, we underscore that the DP policy has been numerically determined offline and used in conjunction with the transformation in (3.39).

We begin by numerically establishing the suboptimality of our proposed Kalman-like estimator. Specifically, we numerically compare its performance with that of the optimal MMSE estimator, which, for our system model of interest, can be recursively determined via Bayes' rule as follows:

$$p_{k|k} = \frac{r(y_k, u_{k-1})\, P\, p_{k-1|k-1}}{\mathbf{1}_n^T\, r(y_k, u_{k-1})\, P\, p_{k-1|k-1}}. \qquad (3.128)$$

In Fig. 3.2, the MSE performance (trace of the filtering error covariance matrix) of the Kalman-like estimator in (3.15) and the optimal MMSE estimator in (3.128) are shown.
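The per-sensor covariance (3.127) is straightforward to assemble. A sketch, with the state-and-sensor-specific variance $\sigma^2_{S_l,i}$ passed in as a plain argument (the test values below match the reported simulation parameters $\phi = 0.25$, $\sigma^2_z = 2$, but the unit signal variance is an assumption for illustration):

```python
import numpy as np

def ar1_feature_cov(sigma2, phi, N, sigma2_z):
    """Covariance (3.127): Q = sigma^2/(1 - phi^2) * T + sigma_z^2 * I,
    where T is Toeplitz with entries T[i, j] = phi^{|i - j|}."""
    idx = np.arange(N)
    T = phi ** np.abs(idx[:, None] - idx[None, :])   # Toeplitz via broadcasting
    return sigma2 / (1.0 - phi ** 2) * T + sigma2_z * np.eye(N)
```

For $N = 1$ this collapses to the scalar $\sigma^2/(1-\phi^2) + \sigma^2_z$, and for $\phi = 0$ the samples are uncorrelated apart from the noise floor.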
Figure 3.2: Average MSE performance of the optimal MMSE estimator and the Kalman-like estimator.

Comparing the MSE performance of the two estimators, we observe that the proposed Kalman-like estimator achieves higher MSE than the optimal MMSE estimator. This implies that the former estimator is suboptimal in the sense that it results in higher MSE on average. In addition, using a MAP rule on top of the two estimators results in 87% and 92% detection accuracy for the Kalman-like filter and the optimal MMSE estimator, respectively. This fact reinforces our belief that the proposed estimator must be suboptimal.

Next, in Fig. 3.3, we present the tracking performance of our proposed framework by illustrating the true and estimated state sequences. The output of our system is an estimate of the belief state, and we estimate the activity state via a MAP rule. We consider two policies: 1) the DP policy of Theorem 15, and 2) a greedy policy, which selects the control that minimizes the MSE at each time step $k$. We observe that under both policies, the proposed framework tracks the underlying, time-evolving activity state significantly well, even though the total number of samples used is small. The greedy policy seems to perform slightly worse; it is, however, characterized by lower computational complexity.

Table 3.1 summarizes the detection accuracy achieved by employing different control policies. These are: 1) always select one sample from ACC mean (strategy A), 2)

Figure 3.3: Tracking performance.
Top: individual's true activity; middle: estimated activity (DP policy); bottom: estimated activity (Greedy MSE policy).

Table 3.1: Average detection accuracy for different control policies (A: ACC mean, 1 sample; B: ACC variance, 1 sample; Γ: ECG period, 1 sample).

Control policy:      A    B    Γ    Optimal  Greedy MSE
Detection accuracy:  74%  77%  40%  87%      85%

always select one sample from ACC variance (strategy B), 3) always select one sample from ECG period (strategy Γ), 4) the optimal DP sensor selection policy, and 5) the greedy MSE sensor selection policy. We find that selecting a control strategy independently of the estimated belief (strategies A, B, Γ) does not benefit the detection accuracy. The same holds when using only one sample from one of the available sensors, unless the selected sensor can easily discriminate between all states. Furthermore, fusing samples from sensors of different capabilities, as done by the optimal and greedy MSE control policies, can boost detection performance significantly. We expect that for larger values of the total number $N$ of available samples, the detection accuracy would be even higher.

Table 3.2: Average filtering and smoothing detection accuracy under the DP policy.

Filtering:            87%
Smoothing, R = k+1:   88%
Smoothing, R = k+2:   89%
Smoothing, R = k+3:   89.2%
Smoothing, R = k+4:   89.4%

Finally, we observe that the greedy MSE policy achieves detection accuracy very close to the optimal one.

Next, we comment on the form of the optimal control policy. The optimal control policy consists of three types of control inputs: 1) ACC mean – 2 samples, 2) ACC mean – 1 sample and ACC variance – 1 sample, and 3) ACC mean – 2 samples. The first type of control input is selected for most of the predicted belief states, due to the fact that it can discriminate between the more likely states, i.e. Sit, Run, Walk. The second and third types of control input are primarily selected for detecting the least likely state, Stand.
Specifically, when the Sit state has low probability ($\leq 0.5$), the second control input is selected, since one sample from each of the informative sensors can help discriminate between Stand and the rest of the states. However, when the Run and Walk states have zero probability, samples from ACC mean are enough to detect Stand, as verified by Fig. 2.2a.

Table 3.2 summarizes the detection accuracy of the filtering and smoothing operations under the DP policy. As expected, smoothing enhances detection accuracy. However, as also expected, the smoothing performance saturates as the stage $R$ increases. We underscore that different Markov chains and/or signal model statistics would result in different smoothing performance improvements. In Fig. 3.4, we present an example of the effect of increasing the smoother's stage on the pmf over the underlying state. We observe that future information can enhance or overturn our belief with respect to the true system state, unveiling its true value. As $R$ increases, our belief stabilizes, which is also supported by the results in Table 3.2. Finally, even though the detection accuracy may not improve significantly by smoothing, the associated MSE does, as can be verified by Fig. 3.4.

Figure 3.4: Exemplary effect of stage $R$ on the smoothed state estimates (pmfs). The initial filtered estimate is also given for comparison.

Fig. 3.5 depicts the true system state sequence and the tracking performance of DP and GFIS$^2$. We note that both algorithms track the individual's time-varying activity state very well despite the small number of samples used. We also observe that GFIS$^2$ usually confuses the Walk and Run states, which has to do with how close the associated signal models are, in conjunction with the Markov chain transition probability values.
However, its greedy nature benefits detection of the Stand state versus DP, which optimizes the average cost and filters out states with low stationary probability.

Figure 3.5: Tracking performance comparison. Top: true activity; middle: estimated activity (DP algorithm); bottom: estimated activity (GFIS$^2$ algorithm).

Table 3.3: MSE and detection accuracy comparison between the DP and GFIS$^2$ algorithms.

Sensing strategy:    DP      GFIS$^2$
MSE:                 0.3791  0.3848
Detection accuracy:  87%     84%

Table 3.3 shows that the performance loss due to the adoption of GFIS$^2$ is small. Meanwhile, the associated reduction in complexity is significant, making the proposed algorithm attractive for controlled sensing applications. It is possible to achieve better MSE/detection accuracy by considering a Bayesian version [117] of the Fisher information measure and/or extensions to the dynamical case [117]. However, this may significantly increase the related complexity.

Finally, Fig. 3.6 illustrates the average number of samples per sensor and per state selected by DP and GFIS$^2$. We notice that both algorithms request no samples from the
ECG, as expected, since according to Fig. 2.2a the associated distributions are highly overlapping. On the other hand, both algorithms request a combination of samples from the two accelerometers, where the exact number depends on the underlying physical activity state and the adopted algorithm.

Figure 3.6: Average allocation of samples per sensor and state comparison. Top: DP algorithm; bottom: GFIS$^2$ algorithm.

An interesting observation is that GFIS$^2$ tends to select, on average, more samples from the second accelerometer, in contrast to DP, which requests on average the same number of samples from both ACCs. For the Stand state, however, the situation is reversed: GFIS$^2$ selects on average more samples from ACC 1 than from ACC 2, while DP requests samples only from ACC 2.

3.8 Concluding Remarks

In this chapter, we addressed the active state tracking problem for a discrete-time, finite-state Markov chain observed via conditionally Gaussian measurements. We proposed a unified framework that combines MMSE state estimation (prediction, filtering and smoothing) and control policy design. Following an innovations method, we derived a non-linear Kalman-like estimator for the Markov chain system state, which is formally similar to the classical KF. We also derived a stochastic dynamic programming algorithm to determine the optimal control policy, with the cost functional being the filter's MSE performance. Although the optimal solution is computationally intensive, it was possible to design a suboptimal, lower complexity algorithm based on the Fisher information measure. To this end, we generalized the Fisher information measure to account for multi-valued discrete parameters and control inputs. Furthermore, to enhance state estimation performance, we derived recursive formulae for the three fundamental smoothing types. Finally, we verified the effectiveness of our proposed framework on the body sensing application presented in Chapter 2. Our results differ from prior work in that we jointly consider time-varying systems, discrete states and active control over measurements. Our framework is widely applicable to a broad spectrum of active classification applications, including sensor management for object classification and control, radar scheduling and estimation of sparse signals.
While in this chapter an approximate optimal control policy was determined by numerically solving the DP equation and a suboptimal policy based on the Fisher information measure was proposed, it is also natural to consider the structural characterization of the optimal control policy to enable the design of computationally efficient control strategies. In addition, since the measurement capabilities of any device (e.g. resolution, reliability, etc.) are directly related to its usage cost (e.g. energy, computational complexity, etc.), it is very interesting to consider the trade-off between tracking performance and sensing usage cost. We formulate an extension that accommodates these issues in the next chapter.

Chapter 4
Active Classification with Sensing Costs

In this chapter, we extend the active state tracking problem formulation of Chapter 3 to account for sensing usage costs. Specifically, we focus on systems modeled by discrete-time, finite-state Markov chains, where we can dynamically select between available noisy Gaussian measurement vectors, but now sensing costs are incurred. Our goal is to devise sensing strategies that optimize the trade-off between tracking performance and sensing cost. Toward this end, we incorporate sensing usage costs into the partially observable Markov decision process (POMDP) formulation of Chapter 3 and adopt our earlier proposed approximate minimum mean-squared error (MMSE) estimator for state tracking. First, we derive the optimal sensing strategy for this new optimization problem via dynamic programming (DP). Next, we derive properties of the cost-to-go function and sufficient conditions for the structure of the optimal sensing strategy. We also propose two low complexity sensing strategies to circumvent the high computational complexity associated with the optimal sensing strategy. Finally, we illustrate the performance of the proposed framework using real data from the body sensing application of Chapter 2.

The remainder of this chapter is organized as follows. In Section 4.1, we summarize prior work and state our contributions.
Next, in Section 4.2, we present the optimization problem. In Section 4.3, we provide the DP recursion that optimally solves the sensing strategy selection problem, and in Section 4.4, we prove properties of the cost-to-go function and give sufficient conditions for the structure of the optimal control policy. In Section 4.5, we propose two strategies that have low computational complexity, and in Section 4.6, we illustrate the performance of all the proposed strategies on the body sensing application of Chapter 2. We conclude the chapter in Section 4.7.

4.1 Related Work and Contributions

Prior work on sensor selection and active classification has either ignored sensing usage costs for the sake of simplicity [49, 74, 90, 82, 128] or adopted them [59, 62, 8, 129, 122, 60]. Since sensing capabilities are directly related to usage costs, we follow the latter approach.

Sufficient conditions under which active sensing reduces to passive sensing and the optimal sensing policy has a threshold structure have been previously derived for linear POMDPs in [81] and for non-linear POMDPs in [62], respectively. In contrast to [81], for the two-state case with scalar measurements, we establish the concavity of the cost-to-go function for our non-linear POMDP and generalize the conditions of [81] in three ways: we consider 1) non-linear POMDPs, 2) time-varying system states, and 3) different sensing usage costs. We also illustrate cases where active sensing is unavoidable and provide the exact form of the threshold. Note that we do not impose any restrictive constraints on the effect of controls on the belief state evolution, unlike [62], where all but one of the controls impose a "quantized" evolution. We underscore that a broad spectrum of applications can be formulated as a two-state problem with scalar measurements, e.g. spectrum sensing for cognitive radio [116], collision prediction for intelligent transportation [5], user motion estimation for context awareness [121] and outlier detection [52].

The computational complexity of dynamic programming is prohibitive for large problem sizes.
We propose two low complexity sensing strategies with efficient implementations: a myopic strategy, and a strategy in which the MSE performance metric is replaced by the Weiss–Weinstein lower bound (WWLB). The WWLB constitutes an important tool in estimation theory since it 1) provides a theoretical performance limit for a Bayesian estimator, and 2) is essentially free from regularity conditions¹, versus other well-known bounds, e.g. the Cramér–Rao lower bound (CRLB) and the Bhattacharyya lower bound (BLB) [117]; thus, it is applicable to the estimation of discrete parameters. Sensor selection algorithms based on estimation bounds have previously appeared in [74, 49], where different versions of the Bayesian CRLB were used as the optimization cost. Herein, however, we optimize the trade-off between the WWLB and sensing cost. We derive closed-form formulae for the sequential WWLB [99, 97, 126] for our system model, accounting for discrete parameters and control inputs, versus [49], where numerical methods were employed to approximate key terms, and [74], where key posterior distributions were approximated to compute the Bayesian CRLB. We underscore that prior work on sequential WWLBs has focused on continuous parameters [99], discretized versions of continuous parameters [126] or two-valued discrete parameters with restrictive assumptions on the bound [97], without exerting control. Herein, we consider multi-valued discrete parameters. To the best of our knowledge, we are the first to design a sensing strategy based on the optimization of the WWLB for discrete parameters.

Our contributions are as follows. We propose a POMDP formulation with a cost functional defined as the trade-off between MSE and a sensing cost metric, and derive the optimal sensing strategy using DP. For the special case of two states and scalar measurements, we establish the concavity of the cost-to-go function and give sufficient conditions under which passive sensing is optimal.
We also discuss the case where active sensing is required and illustrate how decision making is accomplished (cf. threshold structure). Even though DP constitutes the standard way of determining the optimal sensing strategy, the curse of dimensionality [93] (i.e. one or all of the state, observation and control spaces being large) makes it impractical for large-scale applications. Furthermore, adopting MSE as our optimization criterion gives rise to non-linear POMDPs, further challenging control policy determination. To overcome the associated computational burden, we propose a myopic strategy² and a cost-efficient WWLB (CE-WWLB) strategy. For the latter, we first derive closed-form expressions for the sequential WWLB in the case of multi-valued discrete parameters and control inputs. Based on the formulae we obtain, we make connections between the bound and detection performance (i.e. the Bhattacharyya coefficient and the Chernoff bound [33]). Finally, we validate the performance of the proposed sensing strategies on real data from the body sensing application of Chapter 2 and observe cost savings as high as 60% with acceptable detection error.

4.2 Problem Definition

In this section, we describe the active state tracking problem with sensing costs and give the corresponding optimization problem in mathematical terms.

We adopt the stochastic system model of Chapter 3 to describe the interactions between the various system components. Our previously proposed Kalman-like estimator is employed for system state tracking. Recall that its MSE performance is characterized by the conditional filtering error covariance matrix defined as

$$\Sigma_{k|k} = E\{(x_k - \hat{p}_{k|k})(x_k - \hat{p}_{k|k})^T | \mathcal{F}_k\} = \mathrm{diag}(\hat{p}_{k|k}) - \hat{p}_{k|k} \hat{p}_{k|k}^T. \qquad (4.1)$$

¹The regularity conditions refer to the existence of derivatives of the joint pdf of the observations and the parameters.
²Due to the concavity of the cost-to-go function, the associated strategy has a very nice structure, known as a threshold structure, for the special case of two states and scalar measurements.
Figure 4.1: Effect of two different control inputs on the observation kernel for the same set of states, $e_1$ and $e_2$. (a) Control input $u_1$: the two distributions $f(y|e_1,u_1)$ and $f(y|e_2,u_1)$ overlap, leading to errors. (b) Control input $u_2$: a different selection of control input leads to practically no overlap.

We underscore that the pmf $\hat{p}_{k|k}$ constitutes an estimate of the unknown system state and is driven by the control input selection. Thus, intuitively, selecting the control sequence that minimizes the filter's MSE would result in good belief state estimates.

4.2.1 Optimization Problem

As shown in Fig. 4.1, the proper choice of control input plays a crucial role in unveiling the true system state. In fact, for $n > 2$ states, selecting the appropriate control input is even more complicated, since a control input that separates two states can bring any other two states closer. At the same time, control input selection entails a usage cost, which is usually related to the effort needed to acquire certain measurements, e.g. the power consumption spent communicating a certain number of samples from the sensors in a sensor network to the fusion center. Thus, we are interested in two metrics: the estimation accuracy and the sensing cost associated with a certain control input. We underscore that different observations can provide better or worse qualitative views of the same system state, while incurring higher or lower sensing cost. Since we are interested in estimating the unknown system state, we capture estimation accuracy by $\mathrm{tr}(\Sigma_{k|k}(y_k, u_{k-1})) \in [0,1]$, where the dependence of $\Sigma_{k|k}$ on $y_k$ and $u_{k-1}$ has been stated explicitly. Furthermore, for each control input $u_{k-1}$, there exists a sensing cost $c(u_{k-1})$, which is appropriately normalized so that $c(u_{k-1}) \in [0,1]$.
To study the trade-off between estimation accuracy and energy consumption, we define the following objective function:

$$g(y_k, u_{k-1}) \doteq (1-\lambda)\, \mathrm{tr}(\Sigma_{k|k}(y_k, u_{k-1})) + \lambda\, c(u_{k-1}), \qquad (4.2)$$

where $\lambda \in [0,1]$. Under the stochastic system model given in Section 3.2.1, the goal is to determine an admissible sensing strategy for the controller, i.e. a sequence of control inputs $u_0, u_1, \ldots, u_{L-1}$, which solves the following optimization problem:

$$\min_{u_0, u_1, \ldots, u_{L-1}} E\left\{ \sum_{k=1}^{L} g(y_k, u_{k-1}) \right\}, \qquad (4.3)$$

where $L < \infty$ is the horizon length and the expectation is taken with respect to the distribution of the measurement sequence.

4.3 Optimal Sensing Strategy

We next provide the optimal sensing strategy by deriving the corresponding dynamic programming (DP) recursion for the active state tracking problem. We also comment on the characteristics of this recursion.

The active state tracking problem introduced in Section 4.2.1 constitutes an instance of a POMDP. As is common with POMDPs, the information $\mathcal{F}_k$ for decision making at each time slot $k$ is of expanding dimension [14]. In contrast to standard POMDPs [14], in our case a memory-bounded sufficient statistic for decision making is the conditional distribution $\hat{p}_{k+1|k}$, which we refer to as the predicted belief state [128]. In one time step, the evolution of this sufficient statistic follows Bayes' rule and is given by the recursion

$$\hat{p}_{k+1|k} = \frac{P\, r(y_k, u_{k-1})\, \hat{p}_{k|k-1}}{\mathbf{1}_n^T\, r(y_k, u_{k-1})\, \hat{p}_{k|k-1}} \doteq \Phi(\hat{p}_{k|k-1}, u_{k-1}, y_k), \qquad (4.4)$$

where $r(y_k, u_{k-1}) = \mathrm{diag}(f(y_k|e_1, u_{k-1}), \ldots, f(y_k|e_n, u_{k-1}))$. The optimization problem formulated in (4.3) can be solved using the finite-horizon DP equations given in Theorem 22 in terms of the predicted belief state.

Theorem 22.
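The one-step recursion (4.4) is a Bayes update followed by a prediction through $P$. A minimal sketch, taking the diagonal of $r(y,u)$ as a precomputed likelihood vector:

```python
import numpy as np

def belief_update(p_pred, lik, P):
    """Predicted-belief recursion (4.4): Phi(p, u, y) = P r(y,u) p / (1^T r(y,u) p).
    lik[i] = f(y | e_i, u), i.e. the diagonal of r(y, u); P is column-stochastic."""
    w = lik * p_pred           # r(y, u) p (elementwise, since r is diagonal)
    return P @ (w / w.sum())   # normalize, then propagate one step through P
```

Because the normalization divides by $\mathbf{1}_n^T r(y,u) p$, the output stays on the probability simplex regardless of the (unnormalized) likelihood scale.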
For $k = L-1,\ldots,1$, the cost-to-go function $J_k(\hat p_{k|k-1})$ is related to $J_{k+1}(\hat p_{k+1|k})$ through the recursion

$J_k(\hat p_{k|k-1}) = \min_{u_{k-1}\in U}\Big[\ell(\hat p_{k|k-1},u_{k-1}) + \displaystyle\int \mathbf 1_n^T r(y,u_{k-1})\hat p_{k|k-1}\; J_{k+1}\Big(\dfrac{P\,r(y,u_{k-1})\hat p_{k|k-1}}{\mathbf 1_n^T r(y,u_{k-1})\hat p_{k|k-1}}\Big)dy\Big]$,  (4.5)

where

$\ell(\hat p_{k|k-1},u_{k-1}) = (1-\lambda)\,\hat p_{k|k-1}^T h(\hat p_{k|k-1},u_{k-1}) + \lambda c(u_{k-1})$,  (4.6)

and

$h(\hat p_{k|k-1},u_{k-1}) = [h(e_1,\hat p_{k|k-1},u_{k-1}),\ldots,h(e_n,\hat p_{k|k-1},u_{k-1})]^T$,  (4.7)

with

$h(e_i,\hat p_{k|k-1},u_{k-1}) = 1 - \mathrm{tr}\big(G_k^T G_k Q_i^{u_{k-1}}\big) - \big\|\hat p_{k|k-1} + G_k\big(m_i^{u_{k-1}} - y_{k|k-1}\big)\big\|^2$.  (4.8)

The cost-to-go function for $k=L$ is given by

$J_L(\hat p_{L|L-1}) = \min_{u_{L-1}\in U}\ell(\hat p_{L|L-1},u_{L-1})$.  (4.9)

Proof. The observation-control history $F_k = \sigma\{Y^k, U^{k-1}\}$ can be iteratively rewritten as

$F_k = (F_{k-1}, y_k, u_{k-1}),\quad k = 1,2,\ldots,L-1,\qquad F_0 = \sigma\{y_0\}$,  (4.10)

which implies that $y_k$ depends only on $F_{k-1}$ and $u_{k-1}$, since $p(y_k\,|\,F_{k-1},u_{k-1},y_0,y_1,\ldots,y_{k-1}) = p(y_k\,|\,F_{k-1},u_{k-1})$. Starting from the optimal cost $J^*$, we exploit the conditional independence of the observation-control history in conjunction with the iterated expectation property as follows

$J^* = \min\limits_{u_0,u_1,\ldots,u_{L-1}} E\Big\{\sum_{k=1}^{L} g(y_k,u_{k-1})\Big\} = \min\limits_{u_0,u_1,\ldots,u_{L-1}} E\Big\{E\big\{g(y_1,u_0) + E\big\{g(y_2,u_1) + \ldots + E\{g(y_L,u_{L-1})\,\big|\,F_{L-1},u_{L-1}\}\,\big|\,\ldots\,\big|\,F_1,u_1\big\}\,\big|\,F_0,u_0\big\}\Big\}$.  (4.11)

Then, we use the fundamental lemma of stochastic control [113], which enables us to interchange expectation and minimization, to get

$J^* = E\Big\{\min_{u_0} E\big\{g(y_1,u_0) + \min_{u_1} E\big\{g(y_2,u_1) + \ldots + \min_{u_{L-1}} E\{g(y_L,u_{L-1})\,\big|\,F_{L-1},u_{L-1}\}\,\big|\,\ldots\,\big|\,F_1,u_1\big\}\,\big|\,F_0,u_0\big\}\Big\}$,  (4.12)

and employing the principle of optimality [13], which applies to dynamic decision problems with sum cost functions such as (4.3), we get

$J_L(F_{L-1}) = \min_{u_{L-1}\in U} E_{y_L}\{g(y_L,u_{L-1})\,|\,F_{L-1},u_{L-1}\}$,
$J_{L-1}(F_{L-2}) = \min_{u_{L-2}\in U} E_{y_{L-1}}\{g(y_{L-1},u_{L-2}) + J_L(F_{L-2},y_{L-1},u_{L-2})\,|\,F_{L-2},u_{L-2}\}$,
$\vdots$
$J_2(F_1) = \min_{u_1\in U} E_{y_2}\{g(y_2,u_1) + J_3(F_1,y_2,u_1)\,|\,F_1,u_1\}$,
$J_1(F_0) = \min_{u_0\in U} E_{y_1}\{g(y_1,u_0) + J_2(F_0,y_1,u_0)\,|\,F_0,u_0\}$.
(4.13)

Since the dimension of $F_{k-1}$ increases at each time slot $k-1$ with the addition of a new observation and control, we use $\hat p_{k|k-1}$ as a sufficient statistic for control purposes [128]. Then, we rewrite (4.13) as a function of this sufficient statistic by computing separately each of the terms inside the minimization in (4.13). Specifically, for the first, which corresponds to the current cost of selecting control input $u_{k-1}$, we have

$E_{y_k}\{g(y_k,u_{k-1})\,|\,F_{k-1},u_{k-1}\} = (1-\lambda)E_{y_k}\{\mathrm{tr}(\Sigma_{k|k}(y_k,u_{k-1}))\,|\,F_{k-1},u_{k-1}\} + \lambda E_{y_k}\{c(u_{k-1})\,|\,F_{k-1},u_{k-1}\}$
$\overset{(a)}{=} (1-\lambda)\sum_{i=1}^{n}\hat p^i_{k|k-1}\big(1 - \mathrm{tr}(G_k^T G_k Q_i^{u_{k-1}}) - \|\hat p_{k|k-1} + G_k(m_i^{u_{k-1}} - y_{k|k-1})\|^2\big) + \lambda \displaystyle\int p(y\,|\,F_{k-1},u_{k-1})\,c(u_{k-1})\,dy$
$= (1-\lambda)\,\hat p_{k|k-1}^T h(\hat p_{k|k-1},u_{k-1}) + \lambda c(u_{k-1}) \doteq \ell(\hat p_{k|k-1},u_{k-1})$,  (4.14)

where in $(a)$ the first term has been derived in Theorem 15 and the second term follows from the definition of conditional expectation, and

$h(\hat p_{k|k-1},u_{k-1}) = [h(e_1,\hat p_{k|k-1},u_{k-1}),\ldots,h(e_n,\hat p_{k|k-1},u_{k-1})]^T$  (4.15)

is an $n$-dimensional vector with $h(e_i,\hat p_{k|k-1},u_{k-1}) = 1 - \mathrm{tr}(G_k^T G_k Q_i^{u_{k-1}}) - \|\hat p_{k|k-1} + G_k(m_i^{u_{k-1}} - y_{k|k-1})\|^2$. The second term in (4.13) constitutes the expected future cost associated with selecting control input $u_{k-1}$ and can be computed as

$E_{y_k}\{J_{k+1}(F_{k-1},y_k,u_{k-1})\,|\,F_{k-1},u_{k-1}\} = E_{y_k}\{J_{k+1}(\Phi_k(\hat p_{k|k-1},y_k,u_{k-1}))\,|\,\hat p_{k|k-1},u_{k-1}\}$
$= \displaystyle\int p(y\,|\,\hat p_{k|k-1},u_{k-1})\,J_{k+1}(\Phi_k(\hat p_{k|k-1},y,u_{k-1}))\,dy = \displaystyle\int \sum_{i=1}^{n}\hat p^i_{k|k-1}\,f(y|e_i,u_{k-1})\, J_{k+1}(\Phi_k(\hat p_{k|k-1},y,u_{k-1}))\,dy$
$= \displaystyle\int \mathbf 1_n^T r(y,u_{k-1})\hat p_{k|k-1}\; J_{k+1}\Big(\dfrac{P\,r(y,u_{k-1})\hat p_{k|k-1}}{\mathbf 1_n^T r(y,u_{k-1})\hat p_{k|k-1}}\Big)dy$,  (4.16)

where we have exploited the fact that $\hat p_{k|k-1}$ is a sufficient statistic of $F_{k-1}$ and $u_{k-1} = \eta_{k-1}(F_{k-1})$, as well as the update rule in (4.4).
Substituting (4.14) and (4.16) back into (4.13), we get the following set of equations

$J_L(\hat p_{L|L-1}) = \min_{u_{L-1}\in U}\ell(\hat p_{L|L-1},u_{L-1})$,
$J_{L-1}(\hat p_{L-1|L-2}) = \min_{u_{L-2}\in U}\Big[\ell(\hat p_{L-1|L-2},u_{L-2}) + \displaystyle\int \mathbf 1_n^T r(y,u_{L-2})\hat p_{L-1|L-2}\, J_L\Big(\dfrac{P\,r(y,u_{L-2})\hat p_{L-1|L-2}}{\mathbf 1_n^T r(y,u_{L-2})\hat p_{L-1|L-2}}\Big)dy\Big]$,
$\vdots$
$J_2(\hat p_{2|1}) = \min_{u_1\in U}\Big[\ell(\hat p_{2|1},u_1) + \displaystyle\int \mathbf 1_n^T r(y,u_1)\hat p_{2|1}\, J_3\Big(\dfrac{P\,r(y,u_1)\hat p_{2|1}}{\mathbf 1_n^T r(y,u_1)\hat p_{2|1}}\Big)dy\Big]$,
$J_1(\hat p_{1|0}) = \min_{u_0\in U}\Big[\ell(\hat p_{1|0},u_0) + \displaystyle\int \mathbf 1_n^T r(y,u_0)\hat p_{1|0}\, J_2\Big(\dfrac{P\,r(y,u_0)\hat p_{1|0}}{\mathbf 1_n^T r(y,u_0)\hat p_{1|0}}\Big)dy\Big]$,  (4.17)

which completes the proof.

Remark 2. The cost functions in (4.5) and (4.9) are non-linear functions of the predicted belief state; thus, the resulting POMDP is non-linear vis-à-vis standard POMDPs [14].

Solving the DP equations for a specific value of $\lambda$ results in the optimal sensing strategy for a given trade-off between estimation accuracy and sensing cost. However, the DP recursion does not directly translate to practical solutions due to the following issues:

1. The predicted belief state $\hat p_{k|k-1}$ is continuous-valued, which implies that at each iteration, the cost-to-go function $J_k(\hat p_{k|k-1})$ needs to be evaluated at each point of an uncountably infinite set.
2. Due to the kind of applications addressed by the active state tracking problem, the control space can be exponentially large.
3. The computation of the expected future cost requires, in the worst case, a multi-dimensional integration, which is challenging.
4. Due to the nonlinearity of the associated POMDP, the above DP equations cannot be directly solved using techniques such as [112] and [71].

Note that even though the above issues cast the problem as computationally intractable in general, we can still get an approximately optimal solution for small problem sizes by discretizing the space of predicted belief state estimates.

4.4 Main Results

We next discuss the structural properties of the cost-to-go function $J_k(\cdot)$ for the active state tracking problem.
We also exploit the notion of stochastic ordering of the observation kernels to characterize the form of the optimal sensing strategy in certain cases.

4.4.1 Structural Properties

As a first step, we begin by simplifying the current cost $\ell(\hat p_{k|k-1},u_{k-1})$ associated with selecting control input $u_{k-1}$ at predicted belief state $\hat p_{k|k-1}$, as shown in Lemma 23.

Lemma 23. The current cost $\ell(\hat p_{k|k-1},u_{k-1})$ can be equivalently written as follows

$\ell(\hat p_{k|k-1},u_{k-1}) = (1-\lambda)\,\mathrm{tr}\big((I - G_k M(u_{k-1}))\,\Sigma_{k|k-1}\big) + \lambda c(u_{k-1})$,  (4.18)

where $\Sigma_{k|k-1}$ is the conditional prediction error covariance matrix.

Proof. As already discussed, the current cost of selecting control input $u_{k-1}$ consists of two parts, the estimation error part and the sensing cost part, i.e.,

$\ell(\hat p_{k|k-1},u_{k-1}) = (1-\lambda)\,\hat p_{k|k-1}^T h(\hat p_{k|k-1},u_{k-1}) + \lambda c(u_{k-1})$.  (4.19)

To simplify the form of the current cost, we simplify the estimation error contribution as follows

$\hat p_{k|k-1}^T h(\hat p_{k|k-1},u_{k-1}) = \sum_{i=1}^{n}\hat p^i_{k|k-1} h(e_i,\hat p_{k|k-1},u_{k-1}) = \sum_{i=1}^{n}\hat p^i_{k|k-1} - \sum_{i=1}^{n}\hat p^i_{k|k-1}\,\mathrm{tr}(G_k^T G_k Q_i^{u_{k-1}}) - \sum_{i=1}^{n}\hat p^i_{k|k-1}\,\|\hat p_{k|k-1} + G_k(m_i^{u_{k-1}} - y_{k|k-1})\|^2$.  (4.20)

At this point, we compute each term in (4.20) separately. Clearly, the first term $\sum_{i=1}^{n}\hat p^i_{k|k-1}$ equals 1; for the second term, we exploit the linearity of the trace operator as follows

$\sum_{i=1}^{n}\hat p^i_{k|k-1}\,\mathrm{tr}(G_k^T G_k Q_i^{u_{k-1}}) = \mathrm{tr}\Big(G_k^T G_k \sum_{i=1}^{n}\hat p^i_{k|k-1} Q_i^{u_{k-1}}\Big) = \mathrm{tr}(G_k^T G_k \tilde Q_k)$,  (4.21)

where in the last step, we have used the definition of $\tilde Q_k$ provided in Theorem 10.
For the third term, we work as follows

$\sum_{i=1}^{n}\hat p^i_{k|k-1}\,\|\hat p_{k|k-1} + G_k(m_i^{u_{k-1}} - y_{k|k-1})\|^2 = \sum_{i=1}^{n}\hat p^i_{k|k-1}\,\mathrm{tr}\big((\hat p_{k|k-1} + G_k(m_i^{u_{k-1}} - y_{k|k-1}))(\hat p_{k|k-1} + G_k(m_i^{u_{k-1}} - y_{k|k-1}))^T\big)$
$= \mathrm{tr}\Big(\hat p_{k|k-1}\hat p_{k|k-1}^T + \sum_{i=1}^{n}\hat p^i_{k|k-1}\,\hat p_{k|k-1}(m_i^{u_{k-1}} - y_{k|k-1})^T G_k^T + \sum_{i=1}^{n}\hat p^i_{k|k-1}\, G_k(m_i^{u_{k-1}} - y_{k|k-1})\hat p_{k|k-1}^T + \sum_{i=1}^{n}\hat p^i_{k|k-1}\, G_k(m_i^{u_{k-1}} - y_{k|k-1})(m_i^{u_{k-1}} - y_{k|k-1})^T G_k^T\Big)$.  (4.22)

For the second term inside the trace operator above, we have that

$\sum_{i=1}^{n}\hat p^i_{k|k-1}\,\hat p_{k|k-1}(m_i^{u_{k-1}} - y_{k|k-1})^T G_k^T = \hat p_{k|k-1}\Big(\sum_{i=1}^{n}\hat p^i_{k|k-1}\, m_i^{u_{k-1},T} - \sum_{i=1}^{n}\hat p^i_{k|k-1}\, y_{k|k-1}^T\Big)G_k^T = \hat p_{k|k-1}\big((M(u_{k-1})\hat p_{k|k-1})^T - y_{k|k-1}^T\big)G_k^T = 0$.  (4.23)

Similarly, the third term inside the trace operator is equal to zero. Lastly, for the fourth term, after some manipulations, we get

$\sum_{i=1}^{n}\hat p^i_{k|k-1}\, G_k(m_i^{u_{k-1}} - y_{k|k-1})(m_i^{u_{k-1}} - y_{k|k-1})^T G_k^T = G_k\big(M(u_{k-1})\,\mathrm{diag}(\hat p_{k|k-1})\,M^T(u_{k-1}) - y_{k|k-1}\,y_{k|k-1}^T\big)G_k^T$.  (4.24)

Then, substituting (4.21) and (4.24) back into (4.20) and doing some manipulations, we get

$\ell(\hat p_{k|k-1},u_{k-1}) = 1 - \mathrm{tr}(G_k^T G_k \tilde Q_k) - \mathrm{tr}(\hat p_{k|k-1}\hat p_{k|k-1}^T) - \mathrm{tr}\big(G_k(M(u_{k-1})\,\mathrm{diag}(\hat p_{k|k-1})\,M^T(u_{k-1}) - y_{k|k-1}\,y_{k|k-1}^T)G_k^T\big)$
$= 1 - \mathrm{tr}\big(\hat p_{k|k-1}\hat p_{k|k-1}^T + \Sigma_{k|k-1} M^T(u_{k-1})G_k^T\big)$
$= \mathrm{tr}\big(\mathrm{diag}(\hat p_{k|k-1}) - \hat p_{k|k-1}\hat p_{k|k-1}^T - \Sigma_{k|k-1} M^T(u_{k-1})G_k^T\big)$
$= \mathrm{tr}\big(\Sigma_{k|k-1}(I - M^T(u_{k-1})G_k^T)\big) = \mathrm{tr}\big((I - G_k M(u_{k-1}))\Sigma_{k|k-1}\big)$,  (4.25)

where in the last step, we have exploited the fact that $\mathrm{tr}(A^T) = \mathrm{tr}(A)$. To get the final result, we substitute (4.25) back into (4.19), which concludes the proof.

Next, we state the following assumption, which is necessary for proving the remaining results in this section.

Assumption 1. We wish to distinguish between two system states, $e_1$ and $e_2$ (i.e., $X = \{e_1,e_2\}$), using scalar measurements (i.e., the control input affects only the form of the observation, not its size).
The simpler form of the current cost term along with Assumption 1 enables us to prove Lemma 24 and use it to prove Theorem 25.

Lemma 24. Under Assumption 1, the current cost $\ell(\hat p_{k|k-1},u_{k-1})$ is a concave function of the predicted belief state $\hat p_{k|k-1}$.

Proof. For clarity, we drop the dependence on time. Under Assumption 1, we focus on discriminating between two states, $e_1$ and $e_2$, which implies that the predicted belief state is of the form $\hat{\mathbf p} = [\hat p, 1-\hat p]^T$. Thus, after some manipulations, the current cost term becomes

$\ell(\hat p,u) = (1-\lambda)\Big(2f(\hat p) - 2f^2(\hat p)\,\mathrm{tr}\big(L M^T(u)\big(f(\hat p)M(u)L M^T(u) + \hat p\, Q_1^u + (1-\hat p)Q_2^u\big)^{-1} M(u)\big)\Big) + \lambda c(u)$,  (4.26)

where $f(\hat p) = \hat p(1-\hat p)$ and

$L = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}$.  (4.27)

Note that the function $f(\hat p)$ is a concave function of $\hat p$. Under Assumption 1, we focus on scalar measurements, and thus (4.26) becomes

$\ell(\hat p,u) = (1-\lambda)\Big(2f(\hat p) - 2f^2(\hat p)\,\mathrm{tr}\big(L[m_1^u, m_2^u]^T\big(f(\hat p)[m_1^u, m_2^u]L[m_1^u, m_2^u]^T + \hat p\,\sigma^2_{1,u} + (1-\hat p)\sigma^2_{2,u}\big)^{-1}[m_1^u, m_2^u]\big)\Big) + \lambda c(u)$
$= (1-\lambda)\Big(2f(\hat p) - \dfrac{2a_{12}(u)f^2(\hat p)}{a_{12}(u)f(\hat p) + \sigma^2_{1,u}\hat p + \sigma^2_{2,u}(1-\hat p)}\Big) + \lambda c(u)$,  (4.28)

where $a_{12}(u) = (m_1^u - m_2^u)^2 \ge 0$. In order to characterize the current cost given in (4.28), we need to distinguish between the following four cases:

i. Case I: $m_1^u = m_2^u$ and $\sigma^2_{1,u} = \sigma^2_{2,u}$, $u\in U$,
ii. Case II: $m_1^u = m_2^u$ and $\sigma^2_{1,u} \neq \sigma^2_{2,u}$, $u\in U$,
iii. Case III: $m_1^u \neq m_2^u$ and $\sigma^2_{1,u} = \sigma^2_{2,u}$, $u\in U$,
iv. Case IV: $m_1^u \neq m_2^u$ and $\sigma^2_{1,u} \neq \sigma^2_{2,u}$, $u\in U$.

For Cases I and II, $a_{12}(u) = 0$, and thus the current cost in (4.28) takes the following form

$\ell(\hat p,u) = 2(1-\lambda)f(\hat p) + \lambda c(u)$.  (4.29)

The latter expression is a concave function of $\hat p$ and depends on the control input $u$ only through the sensing cost $c(u)$. Under the assumptions of Case III, the current cost in (4.28) becomes

$\ell(\hat p,u) = (1-\lambda)\,\dfrac{2\sigma^2_u f(\hat p)}{a_{12}(u)f(\hat p) + \sigma^2_u} + \lambda c(u)$,  (4.30)

where $a_{12}(u) > 0$ and $\sigma^2_u = \sigma^2_{1,u} = \sigma^2_{2,u}$.
The second derivative with respect to $\hat p$ of the expression in (4.30) has the following form

$\ell''(\hat p,u) = -\dfrac{4\sigma^4_u\big(a_{12}(u)(3\hat p(\hat p-1)+1) + \sigma^2_u\big)}{\big(a_{12}(u)f(\hat p) + \sigma^2_u\big)^3} < 0$,  (4.31)

where the last inequality holds $\forall\hat p\in[0,1]$, since $a_{12}(u) > 0$, $f(\hat p) \ge 0$ and $3\hat p(\hat p-1)+1 > 0$, $\forall\hat p\in[0,1]$. This implies that the current cost in (4.30) is also a concave function of $\hat p$. Last but not least, under the assumptions of Case IV, the current cost in (4.28) takes the following form

$\ell(\hat p,u) = (1-\lambda)\,\dfrac{2f(\hat p)\big(\sigma^2_{1,u}\hat p + \sigma^2_{2,u}(1-\hat p)\big)}{a_{12}(u)f(\hat p) + \sigma^2_{1,u}\hat p + \sigma^2_{2,u}(1-\hat p)} + \lambda c(u)$,  (4.32)

where $a_{12}(u) > 0$. The second derivative with respect to $\hat p$ of the expression in (4.32) is

$\ell''(\hat p,u) = -\dfrac{\alpha\big(\hat p,a_{12}(u),\sigma^2_{1,u}\big) + \beta\big(\hat p,a_{12}(u),\sigma^2_{2,u}\big) + \gamma\big(\hat p,\sigma^2_{1,u},\sigma^2_{2,u}\big)}{\big(\sigma^2_{1,u}\hat p + \sigma^2_{2,u}(1-\hat p) + a_{12}(u)f(\hat p)\big)^3}$,  (4.33)

where

$\alpha\big(\hat p,a_{12}(u),\sigma^2_{1,u}\big) = \sigma^4_{1,u}\big(\sigma^2_{1,u} + a_{12}(u)\big)\hat p^3$,  (4.34)
$\beta\big(\hat p,a_{12}(u),\sigma^2_{2,u}\big) = \sigma^4_{2,u}\big(\sigma^2_{2,u} + a_{12}(u)\big)(1-\hat p)^3$,  (4.35)
$\gamma\big(\hat p,\sigma^2_{1,u},\sigma^2_{2,u}\big) = \sigma^2_{1,u}\sigma^2_{2,u}\, f(\hat p)\big(\sigma^2_{1,u}\hat p + 3\sigma^2_{2,u}(1-\hat p)\big)$.  (4.36)

Note that each of the terms in (4.34)–(4.36) is nonnegative, and they never all vanish simultaneously, so the numerator in (4.33) is positive. In addition, the denominator in (4.33) is also positive. Thus, the second derivative given in (4.33) is negative $\forall\hat p\in[0,1]$, and therefore the current cost in (4.32) constitutes a concave function of $\hat p$.

Remark 3. Our numerical simulations imply that this result holds for $n > 2$ states and multi-dimensional measurement vectors. However, due to the complicated expressions involved, we have yet to establish its validity analytically.

Theorem 25. Under Assumption 1, the cost-to-go function $J_k(\hat p_{k|k-1})$, $k = L, L-1, \ldots, 1$, is a concave function of the predicted belief state $\hat p_{k|k-1}$.

Proof. We prove the concavity of the cost-to-go function $J_k(\hat p_{k|k-1})$ by induction. At time step $L$, it is clear that $J_L(\hat p_{L|L-1})$ is a concave function, since, according to Lemma 24, for each $u_{L-1}\in U$, $\ell(\hat p_{L|L-1},u_{L-1})$ is a concave function and the pointwise minimum of concave functions is also concave.
Next, we assume that $J_{k+1}(\hat p_{k+1|k})$ is concave, and to prove the concavity of $J_k(\hat p_{k|k-1})$, we only need to show that $\int \mathbf 1_n^T r(y,u_{k-1})\hat p_{k|k-1}\, J_{k+1}(\Phi(\hat p_{k|k-1},y,u_{k-1}))\,dy$, where $\Phi(\cdot)$ denotes the associated update rule, is also a concave function for all $u_{k-1}\in U$. Let $v$ and $w$ be two predicted belief state vectors. For any $\alpha$, $0\le\alpha\le 1$, abbreviating $r = r(y,u_{k-1})$, we have

$\alpha\displaystyle\int \mathbf 1_n^T r v\, J_{k+1}(\Phi(v,y,u_{k-1}))\,dy + (1-\alpha)\displaystyle\int \mathbf 1_n^T r w\, J_{k+1}(\Phi(w,y,u_{k-1}))\,dy$
$= \displaystyle\int \big(\alpha\mathbf 1_n^T r v + (1-\alpha)\mathbf 1_n^T r w\big)\Big[\dfrac{\alpha\mathbf 1_n^T r v}{\alpha\mathbf 1_n^T r v + (1-\alpha)\mathbf 1_n^T r w}\, J_{k+1}(\Phi(v,y,u_{k-1})) + \dfrac{(1-\alpha)\mathbf 1_n^T r w}{\alpha\mathbf 1_n^T r v + (1-\alpha)\mathbf 1_n^T r w}\, J_{k+1}(\Phi(w,y,u_{k-1}))\Big]dy$
$\le \displaystyle\int \big(\alpha\mathbf 1_n^T r v + (1-\alpha)\mathbf 1_n^T r w\big)\, J_{k+1}\Big(\dfrac{\alpha\mathbf 1_n^T r v\,\Phi(v,y,u_{k-1}) + (1-\alpha)\mathbf 1_n^T r w\,\Phi(w,y,u_{k-1})}{\alpha\mathbf 1_n^T r v + (1-\alpha)\mathbf 1_n^T r w}\Big)dy$
$= \displaystyle\int \big(\alpha\mathbf 1_n^T r v + (1-\alpha)\mathbf 1_n^T r w\big)\, J_{k+1}\big(\Phi(\alpha v + (1-\alpha)w, y, u_{k-1})\big)\,dy$,  (4.37)

where the inequality comes from the induction hypothesis, and the last step implies that for all $u_{k-1}\in U$, the function $\int \mathbf 1_n^T r(y,u_{k-1})\hat p_{k|k-1}\, J_{k+1}(\Phi(\hat p_{k|k-1},y,u_{k-1}))\,dy$ is concave. Last but not least, $J_k(\hat p_{k|k-1})$ constitutes the minimum of concave functions and thus is also concave.

[Figure 4.2: Optimal DP policy cost example for three control inputs and the associated threshold sensing strategy rule.]

A direct consequence of Theorem 25 is that the optimal sensing strategy has a threshold structure, which implies a very efficient implementation. Consider, for example, the scenario in Fig. 4.2. Each curve corresponds to the term inside the minimization in (4.5) for a different control input. Since the cost-to-go function is the minimum of these terms at each predicted belief state value, the intersection points correspond to decision thresholds that specify the change between control inputs. As a result, by determining these thresholds, the optimal strategy reduces to testing in which interval the associated predicted belief state falls and adopting the associated control input.
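The threshold structure can be sketched numerically: evaluate each control's cost curve on a grid of belief values, take the pointwise minimizer, and read off the belief values where the minimizer switches. A minimal sketch with illustrative concave curves of the form $a\,\hat p(1-\hat p) + b$ (the shapes and constants below are chosen only to produce switching, and are not taken from the thesis):

```python
import numpy as np

def threshold_policy(cost_curves, grid):
    """Pointwise-minimize a family of per-control cost curves over a belief
    grid; return the minimizing control at each grid point plus the grid
    values where the minimizer changes (the decision thresholds)."""
    costs = np.array([curve(grid) for curve in cost_curves])
    best = np.argmin(costs, axis=0)                # optimal control index per belief
    switches = grid[1:][best[1:] != best[:-1]]     # thresholds between regimes
    return best, switches

# Three illustrative concave curves a*p*(1-p) + b: small b = cheap sensing,
# small a = informative sensing.
grid = np.linspace(0.0, 1.0, 1001)
curves = [lambda p: 1.00 * p * (1 - p) + 0.00,
          lambda p: 0.20 * p * (1 - p) + 0.08,
          lambda p: 0.05 * p * (1 - p) + 0.11]
best, thresholds = threshold_policy(curves, grid)
```

For these curves, the cheap-but-uninformative control is selected near the simplex corners (where the state is nearly known) and the informative ones in the middle, with four thresholds marking the regime changes.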
This result for non-linear POMDPs generalizes, in essence, the well-known fact that the optimal policy for linear POMDPs with two states has a threshold structure [14]. Note that, contrary to the non-linear POMDPs in [62], we do not impose any constraints on the cost functions, Markov chain, or observation probabilities to determine the optimality of the threshold structure. Finally, the concavity of the cost-to-go function enables us to characterize how informative a control input is, as we show in the sequel.

4.4.2 Passive versus Active Sensing

A question of key interest is when a static or passive sensing policy is optimal. Herein, we exploit the notion of stochastic ordering of the observation kernels to characterize the structure of the optimal sensing strategy in several cases. According to Theorem 25, for a fixed control input $u_{k-1}$, the cost-to-go function clearly depends on the observation kernel and the predicted belief state. Before we proceed, we state the following definition.

Definition 26 (Blackwell Ordering [16]). Given two conditional probability densities $f(y|x,u_a)$ and $f(y|x,u_b)$ from $X$ to $Y$, we say that $f(y|x,u_b)$ is less informative than $f(y|x,u_a)$ $\big(f(y|x,u_b) \le_B f(y|x,u_a)\big)$ if there exists a stochastic transformation $W$ from $Y$ to $Y$ such that

$f(y|x,u_b) = \displaystyle\int f(z|x,u_a)\,W(z;y)\,dz,\quad \forall x\in X$.  (4.38)

The following statement constitutes an important outcome of Blackwell ordering.

Fact 1 (see [29] ch. 14.17 and [101] Theorem 3.2). Let $f(y|x,u_a)$ and $f(y|x,u_b)$ be two observation kernels. If $f(y|x,u_b) \le_B f(y|x,u_a)$, then $(T_a g)(\hat p) \le (T_b g)(\hat p)$, $\forall\hat p\in P$ and for any concave function $g: P \to \mathbb R$, with $(T_a g)(\hat p) = E\{g(\Phi(\hat p,u_a,y))\}$, where the expectation is with respect to $f(y|x,u_a)$.

We again restrict our attention to cases that satisfy Assumption 1, and in order to provide a set of conditions to determine the optimal control strategy structure, we consider the following four cases:

i. Case I (same mean, same variance): $m_1^u = m_2^u$ and $\sigma^2_{1,u} = \sigma^2_{2,u}$, $u\in U$,
ii. Case II (same mean, different variance): $m_1^u = m_2^u$ and $\sigma^2_{1,u} \neq \sigma^2_{2,u}$, $u\in U$,
iii. Case III (different mean, same variance): $m_1^u \neq m_2^u$ and $\sigma^2_{1,u} = \sigma^2_{2,u}$, $u\in U$,
iv. Case IV (different mean, different variance): $m_1^u \neq m_2^u$ and $\sigma^2_{1,u} \neq \sigma^2_{2,u}$, $u\in U$.

Now, combining Fact 1 with Theorem 25 results in the following set of conditions.

Corollary 1. Under Assumption 1 and for the active state tracking problem in (4.3), if there exists a control input $u^*$ satisfying $f(y|x,u) \le_B f(y|x,u^*)$ and $\ell(\hat p,u) \ge \ell(\hat p,u^*)$, $\forall u\in U$, $\forall\hat p\in P$, it is always optimal to select control input $u^*$, irrespective of the predicted belief state $\hat p$.

Corollary 1 provides a set of sufficient conditions for reducing active state tracking to passive state tracking with no observation control. For Cases I and II, we note that the current cost depends on the control input only through the sensing cost, i.e.,

$\ell(\hat p,u) = 2(1-\lambda)\hat p(1-\hat p) + \lambda c(u)$.  (4.39)

If we were to order all controls with respect to the current cost only, then the following relationship holds

$\ell(\hat p,u_a) \le \ell(\hat p,u_b) \Leftrightarrow c(u_a) \le c(u_b),\quad \forall\hat p\in P$.  (4.40)

The above result implies that we need to consider both the sensing costs of the controls and the Blackwell ordering of the related observation kernels to determine the optimal control input.³ Furthermore, under Assumption 1 and for Case II, the Blackwell ordering coincides with the ordering of the associated variances [101], i.e.,

$\sigma^2_{1,u_b} \ge \sigma^2_{1,u_a} \Rightarrow f(y|x,u_b) \le_B f(y|x,u_a)$.  (4.41)

In Case III, the current cost has the following form

$\ell(\hat p,u) = (1-\lambda)\,\dfrac{2\sigma^2_u f(\hat p)}{a_{12}(u)f(\hat p) + \sigma^2_u} + \lambda c(u)$,  (4.42)

and for $\lambda = 0$, ordering the related costs can be achieved based on $a_{12}(u) = (m_1^u - m_2^u)^2$, as visually verified in Fig. 4.3.

[Figure 4.3: Current costs for fixed variance $\sigma^2_{u_i} = 2$ and different $a_{12}(u_i)$.]

Corollary 2 gives more general conditions under which this ordering can be achieved.
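The Case III ordering by $a_{12}(u)$ is easy to spot-check numerically; a small sketch of (4.42) with $\lambda = 0$ and a shared variance (the specific values mirror the Fig. 4.3 setup and are otherwise illustrative):

```python
import numpy as np

def case3_cost(p, a12, var):
    """Case III current cost (4.42) with lambda = 0 (estimation term only):
    2*var*f(p) / (a12*f(p) + var), where f(p) = p*(1-p)."""
    f = p * (1 - p)
    return 2 * var * f / (a12 * f + var)

p = np.linspace(0.0, 1.0, 101)
costs = {a12: case3_cost(p, a12, var=2.0) for a12 in (1, 3, 5, 10, 100)}
# A larger mean separation a12(u) yields a pointwise smaller cost for every
# belief value, reproducing the ordering behind Corollary 2 and Fig. 4.3.
```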
³ Note that Case I is a degenerate case, where all the observation kernels are the same, and thus the active state tracking problem reduces immediately to traditional state tracking without observation control.

Corollary 2. Under Assumption 1 and for control inputs $u_i, u_j \in U$, if either of the two conditions

C1) $c(u_i) = c(u_j)$ and $a_{12}(u_i) > a_{12}(u_j)$,
C2) $a_{12}(u_i) > a_{12}(u_j)$ and $c(u_i) < c(u_j)$,

is met, $u_i$ gives rise to the smaller current cost, irrespective of the predicted belief state $\hat p$.

Proof. We start from (4.42) and consider two cases: 1) $c(u_i) = c$, $\forall u_i\in U$, with $c$ constant; 2) $c(u_i) < c(u_j)$, $u_i,u_j\in U$ with $i\neq j$. For the first case, it is straightforward to see that

$\ell(\hat p,u_i) \ge \ell(\hat p,u_j) \Rightarrow (1-\lambda)\,\dfrac{2\sigma^2_u f(\hat p)}{a_{12}(u_i)f(\hat p) + \sigma^2_u} + \lambda c(u_i) \ge (1-\lambda)\,\dfrac{2\sigma^2_u f(\hat p)}{a_{12}(u_j)f(\hat p) + \sigma^2_u} + \lambda c(u_j) \Rightarrow \dfrac{1}{a_{12}(u_i)f(\hat p) + \sigma^2_u} \ge \dfrac{1}{a_{12}(u_j)f(\hat p) + \sigma^2_u} \Rightarrow a_{12}(u_i) \le a_{12}(u_j)$,  (4.43)

which implies that an ordering of the current costs associated with each control input can be achieved based on $a_{12}(u) = (m_1^u - m_2^u)^2$. For the second case, we assume that for controls $u_i, u_j\in U$, $i\neq j$, $a_{12}(u_i) > a_{12}(u_j)$ and $c(u_i) < c(u_j)$. Then, we have the following

$a_{12}(u_i) > a_{12}(u_j) \Rightarrow a_{12}(u_i)f(\hat p) + \sigma^2_u > a_{12}(u_j)f(\hat p) + \sigma^2_u \Rightarrow (1-\lambda)\,\dfrac{2\sigma^2_u f(\hat p)}{a_{12}(u_i)f(\hat p) + \sigma^2_u} < (1-\lambda)\,\dfrac{2\sigma^2_u f(\hat p)}{a_{12}(u_j)f(\hat p) + \sigma^2_u}$,  (4.44)

and

$c(u_i) < c(u_j) \Rightarrow \lambda c(u_i) < \lambda c(u_j)$.  (4.45)
Combining (4.44) and (4.45), we get

$(1-\lambda)\,\dfrac{2\sigma^2_u f(\hat p)}{a_{12}(u_i)f(\hat p) + \sigma^2_u} + \lambda c(u_i) < (1-\lambda)\,\dfrac{2\sigma^2_u f(\hat p)}{a_{12}(u_j)f(\hat p) + \sigma^2_u} + \lambda c(u_j) \Rightarrow \ell(\hat p,u_i) < \ell(\hat p,u_j)$,  (4.46)

$\forall\hat p\in[0,1]$. Since the last inequality holds for all $\hat p\in[0,1]$, we conclude that the ordering of the current costs associated with each control input can once more be achieved based on $a_{12}(u)$, independently of $\hat p$.

For the more general Case IV, selecting the optimal control input is again not as straightforward as in Cases I and II. In fact, the optimal choice of control input may depend on the value of the predicted belief state, as Corollary 3 reveals and Fig. 4.4 illustrates.

Corollary 3. Under Assumption 1 and assuming two control inputs $u_a$ and $u_b$ with $a_{12}(u_a) = a_{12}(u_b)$, $c(u_a) = c(u_b)$, $\sigma^2_{1,u_a} > \sigma^2_{1,u_b}$ and $\sigma^2_{2,u_a} < \sigma^2_{2,u_b}$, there exists $p^*\in P$ such that for $\hat p \le p^*$, $\ell(\hat p,u_a) \le \ell(\hat p,u_b)$, and for $\hat p \ge p^*$, $\ell(\hat p,u_a) \ge \ell(\hat p,u_b)$, with

$p^* = \dfrac{\sigma^2_{2,u_b} - \sigma^2_{2,u_a}}{\sigma^2_{1,u_a} - \sigma^2_{1,u_b} + \sigma^2_{2,u_b} - \sigma^2_{2,u_a}}$.  (4.47)

[Figure 4.4: Current costs with different variances and $a_{12}(u_i)$ constant.]

Proof. We start from (4.28) and simplify terms as follows

$\ell(\hat p,u_a) \ge \ell(\hat p,u_b) \Rightarrow$
$(1-\lambda)\Big(2f(\hat p) - \dfrac{2a_{12}(u_a)f^2(\hat p)}{a_{12}(u_a)f(\hat p) + \sigma^2_{1,u_a}\hat p + \sigma^2_{2,u_a}(1-\hat p)}\Big) + \lambda c(u_a) \ge (1-\lambda)\Big(2f(\hat p) - \dfrac{2a_{12}(u_b)f^2(\hat p)}{a_{12}(u_b)f(\hat p) + \sigma^2_{1,u_b}\hat p + \sigma^2_{2,u_b}(1-\hat p)}\Big) + \lambda c(u_b) \Rightarrow$
$\dfrac{2a_{12}(u_a)f^2(\hat p)}{a_{12}(u_a)f(\hat p) + \sigma^2_{1,u_a}\hat p + \sigma^2_{2,u_a}(1-\hat p)} \le \dfrac{2a_{12}(u_b)f^2(\hat p)}{a_{12}(u_b)f(\hat p) + \sigma^2_{1,u_b}\hat p + \sigma^2_{2,u_b}(1-\hat p)} \Rightarrow$
$2a_{12}(u_a)f^2(\hat p)\big(a_{12}(u_b)f(\hat p) + \sigma^2_{1,u_b}\hat p + \sigma^2_{2,u_b}(1-\hat p)\big) \le 2a_{12}(u_b)f^2(\hat p)\big(a_{12}(u_a)f(\hat p) + \sigma^2_{1,u_a}\hat p + \sigma^2_{2,u_a}(1-\hat p)\big) \Rightarrow$
$2a_{12}(u_a)f^2(\hat p)\big((\sigma^2_{2,u_b} - \sigma^2_{2,u_a}) + (\sigma^2_{1,u_b} - \sigma^2_{2,u_b} - \sigma^2_{1,u_a} + \sigma^2_{2,u_a})\hat p\big) \le 0 \Rightarrow$
$-2a_{12}(u_a)f^2(\hat p)\big((\sigma^2_{2,u_a} - \sigma^2_{2,u_b}) + (\sigma^2_{1,u_a} - \sigma^2_{1,u_b} + \sigma^2_{2,u_b} - \sigma^2_{2,u_a})\hat p\big) \le 0$,  (4.48)

where we have exploited the facts that $a_{12}(u_a) = a_{12}(u_b)$ and $c(u_a) = c(u_b)$. For the last inequality, we note that the term $-2a_{12}(u_a)f^2(\hat p) \le 0$. Therefore, the inequality is true if and only if

$(\sigma^2_{2,u_a} - \sigma^2_{2,u_b}) + (\sigma^2_{1,u_a} - \sigma^2_{1,u_b} + \sigma^2_{2,u_b} - \sigma^2_{2,u_a})\hat p \ge 0 \Rightarrow \hat p \ge \dfrac{\sigma^2_{2,u_b} - \sigma^2_{2,u_a}}{\sigma^2_{1,u_a} - \sigma^2_{1,u_b} + \sigma^2_{2,u_b} - \sigma^2_{2,u_a}} = p^*$,  (4.49)

where we have exploited the facts that $\sigma^2_{1,u_a} > \sigma^2_{1,u_b}$ and $\sigma^2_{2,u_a} < \sigma^2_{2,u_b}$.
On the other hand, the inequality is false if and only if

$\hat p \le \dfrac{\sigma^2_{2,u_b} - \sigma^2_{2,u_a}}{\sigma^2_{1,u_a} - \sigma^2_{1,u_b} + \sigma^2_{2,u_b} - \sigma^2_{2,u_a}} = p^*$.  (4.50)

Combining (4.49) and (4.50), we get the desired result.

Intuitively, fixing $a_{12}(u_i)$ and increasing the associated variances leads to a larger cost, as also verified by Fig. 4.4. Based on the above observations, for Case IV, active sensing is unavoidable, and the associated thresholds constitute a complicated function of the related means, variances and sensing costs.

4.5 Low Complexity Strategies

In this section, we propose two sensing strategies with lower complexity and discuss their implementation. Our motivation stems from the fact that DP's computational complexity proves to be prohibitive for large-scale applications, rendering DP impractical for many practical scenarios of interest.

4.5.1 Myopic Strategy

Starting from the DP recursion in (4.5), we propose a myopic algorithm that selects an appropriate control input by minimizing the one-step-ahead cost. Namely, the control input selected at each time slot is

$u_k^{\mathrm{myopic}} = \arg\min_{u_k\in U} \ell(\hat p_{k+1|k},u_k)$.  (4.51)
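The myopic rule (4.51) is a one-step lookahead: evaluate the current cost of every control at the predicted belief and take the argmin. A minimal sketch for the two-state scalar case using the closed-form cost (4.28); the control tuples and $\lambda$ values below are illustrative, not taken from the thesis:

```python
def current_cost(p, ctrl, lam):
    """Two-state current cost per (4.28); ctrl = (m1, m2, var1, var2, cost)."""
    m1, m2, v1, v2, c = ctrl
    f = p * (1 - p)
    a12 = (m1 - m2) ** 2
    err = 2 * f - 2 * a12 * f**2 / (a12 * f + v1 * p + v2 * (1 - p))
    return (1 - lam) * err + lam * c

def myopic_control(p, controls, lam):
    """Myopic rule (4.51): minimize the one-step cost at the predicted belief."""
    return min(range(len(controls)), key=lambda u: current_cost(p, controls[u], lam))

controls = [(-1.0, 1.0, 1.0, 1.0, 0.20),   # well-separated means, expensive
            ( 0.0, 0.1, 1.0, 1.0, 0.01)]   # nearly indistinguishable, cheap
```

At $\lambda = 0$ (pure accuracy), the informative-but-costly control is selected at an uncertain belief, while at $\lambda = 1$ (pure cost), the cheap control wins, which is the trade-off the objective (4.2) encodes.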
We note that the above solution avoids the computational burden associated with the computation of the expected future cost, which requires an intensive multi-dimensional integration. Still, the size of the associated control space and the non-linearity of $\ell(\hat p_{k+1|k},u_k)$ can be an issue. On the other hand, Lemma 24 implies an efficient implementation of the proposed algorithm in the case of two states and scalar measurements. Let us denote $q(\hat p_{k+1|k}) = \min_{u_k\in U}\ell(\hat p_{k+1|k},u_k)$. For each distinct $u_k$, the function $\ell(\hat p_{k+1|k},u_k)$ is a concave function of $\hat p_{k+1|k}$, which implies that $q(\hat p_{k+1|k})$ consists of segments of these concave functions. The last observation implies that, for the setting in Lemma 24, the myopic policy has a threshold structure of the form

$u_k^{\mathrm{myopic}} = \begin{cases} u_{i_1}, & 0 \le \hat p \le p^*_{i_1}, \\ u_{i_2}, & p^*_{i_1} < \hat p \le p^*_{i_2}, \\ \;\;\vdots & \\ u_{i_J}, & p^*_{i_\Xi} < \hat p \le p^*_{i_{\Xi+1}}, \end{cases}$  (4.52)

where $\Xi+1$ denotes the number of different thresholds. Note that it is possible for a function $\ell(\hat p_{k+1|k},u_k)$ not to participate at all in $q(\hat p_{k+1|k})$, and in practice only a few of them participate in $q(\hat p_{k+1|k})$. The threshold structure of the policy enables an efficient implementation of the associated sensing strategy rule: examine in which interval the predicted belief state falls and declare as the sensing decision the associated control input. As discussed earlier, this is also true for the optimal sensing strategy (see Fig. 4.2), where the term inside the minimization in (4.5) has been used instead. Lastly, Proposition 27 deals with the optimality of the myopic policy.

Proposition 27. Under Assumption 1 and for the case of two control inputs $u_a$ and $u_b$, if there exists a Blackwell ordering of the associated observation kernels, then the optimal sensing strategy is myopic.
Proof. Similar to the proof of statement 1 in Theorem 3 of [62], we need to show that the cost-to-go function $J_k(\hat p_{k|k-1})$, $k = L, L-1, \ldots, 1$, is a concave function of the predicted belief state $\hat p_{k|k-1}$.⁴ This has been established in Theorem 25, which, according to Theorem 3 in [62] and Corollary 5.5 in [101], results in the optimal sensing strategy being myopic.

⁴ In [62], the goal is to maximize a certain performance measure; thus, concavity is replaced with convexity and min operators with max operators.

4.5.2 CE–WWLB Strategy

As already discussed in Section 4.2, we are interested in optimizing the trade-off between estimation accuracy, captured by the trace of the filtering error covariance matrix of our approximate MMSE state estimator, and sensing usage cost. In this section, we propose an alternative sensing strategy by exploiting a lower bound on the MSE in an effort to acquire a computationally efficient algorithm.

Weiss–Weinstein Lower Bound

The Weiss–Weinstein lower bound (WWLB) [123, 117] is a Bayesian bound on the MSE, where the parameters of interest are random variables with known a priori distribution. Consider $\theta\in\mathbb R^\ell$ to be a random vector of parameters and $z\in\mathbb R^m$ an associated measurement vector. Then, for any estimator $\hat\theta(z)$, the error covariance matrix satisfies the inequality

$E\{(\theta - \hat\theta(z))(\theta - \hat\theta(z))^T\} \ge H G^{-1} H^T$,  (4.53)

where $H = [h_1, h_2, \ldots, h_\ell]\in\mathbb R^{\ell\times\ell}$ is a matrix with columns $h_i$, $i = 1,\ldots,\ell$, representing different "test point" vectors, the $(i,j)$ element of the matrix $G$ is given by

$[G]_{ij} = \dfrac{E\Big\{\big(L^{s_i}(z;\theta+h_i,\theta) - L^{1-s_i}(z;\theta-h_i,\theta)\big)\big(L^{s_j}(z;\theta+h_j,\theta) - L^{1-s_j}(z;\theta-h_j,\theta)\big)\Big\}}{E\{L^{s_i}(z;\theta+h_i,\theta)\}\; E\{L^{s_j}(z;\theta+h_j,\theta)\}}$  (4.54)

for any set of numbers $s_i\in(0,1)$, and the joint likelihood ratio is defined as

$L(z;\theta_1,\theta_2) = \dfrac{p(z,\theta_1)}{p(z,\theta_2)}$.  (4.55)

The definition in (4.54) indicates that the matrix $G$ is symmetric. Also, the matrix $H$ and the set of numbers $\{s_1,s_2,\ldots,s_\ell\}$ are arbitrary, i.e., (4.53) represents a family of estimation error bounds. The choice $s_i = \tfrac12$, $i = 1,2,\ldots,\ell$, usually maximizes the WWLB [123]. Furthermore, the role of the test point vectors is to avoid the restrictive regularity conditions that are usually imposed by other well-known bounds, such as the CRLB and the BLB [117]. As a result, the WWLB can be applied to a variety of situations where the traditional bounds cannot, and in particular, for our problem of interest, to the estimation of discrete parameters.

The sequential WWLB constitutes an extension of the WWLB to Markovian dynamical systems [99, 97, 126]. Specifically, let $H_k$ and $G_k$ be the matrices defined above, when calculated for the sequences $X^k$, $Y^k$ and $U^k$. To enable a sequential calculation of the WWLB, the matrix $H_k$ must be block-diagonal [99, 97, 126], viz.
$H_k = \mathrm{blkdiag}(H_{0,0}, H_{1,1}, \ldots, H_{k,k})$, where the submatrix $H_{r,r} = [h_r^1, h_r^2, \ldots, h_r^\ell]$ refers to the state vector $x_r$. We set $s_i = \tfrac12$, $i = 1,2,\ldots,\ell$. Then, the sequential WWLB at time step $k$ is [99, 97, 126]

$E\{(x_k - \hat p_{k|k})(x_k - \hat p_{k|k})^T\} \ge H_{k,k}\, J_k^{-1}\, H_{k,k}^T$.  (4.56)

The information submatrix $J_{k+1}$ is recursively updated as follows [99, 97, 126]

$A_{k+1} = G^{k+1}_{k,k} - G^{k}_{k,k-1} A_k^{-1} G^{k}_{k-1,k}$,  (4.57)
$J_{k+1} = G^{k+1}_{k+1,k+1} - G^{k+1}_{k+1,k} A_{k+1}^{-1} G^{k+1}_{k,k+1}$,  (4.58)

$\forall k = 0,1,\ldots$, where $G^{k+1}_{i,j}\in\mathbb R^{\ell\times\ell}$ and $G^{k}_{e,f}\in\mathbb R^{\ell\times\ell}$ are entries of the matrices $G^{k+1}$ and $G^{k}$, respectively. Due to the symmetry of $G^{k+1}$ and $G^{k}$, we have that

1. $G^{k+1}_{i,j} = G^{k+1}_{j,i}$,
2. $G^{k}_{e,f} = G^{k}_{f,e}$.

Furthermore, we have the following initial conditions

$A_0^{-1} \doteq 0,\qquad G^{0}_{0,-1} \doteq 0,\qquad G^{0}_{-1,0} \doteq 0$,  (4.59)

and $J_0^{-1}$ is the covariance matrix associated with $P(x_0)f(y_0|x_0,u_{-1})$, where $u_{-1}$ is assumed to be a fixed control input. Lemma 28 provides the exact form of the sequential WWLB for our system model.

Lemma 28. Consider the system model described in Section 3.2.1, where the system state is a discrete-time, finite-state Markov chain and the measurement vector is described by the multivariate Gaussian observation kernel of (3.1). Let $P(x_0)$ be the known a priori pmf of the initial state $x_0$.
Then, the sequential WWLB at each time step $k$ is determined by (4.57) and (4.58), where

$G^{k+1}_{k+1,k+1} = \dfrac{2\big(1 - \exp(\eta_k(h_{k+1}, -h_{k+1}))\big)}{\exp(2\eta_k(h_{k+1}, 0))}$,  (4.60)

$G^{k+1}_{k+1,k} = G^{k+1}_{k,k+1} = \dfrac{\exp(\zeta_k(h_k,h_{k+1})) - \exp(\zeta_k(-h_k,h_{k+1}))}{\exp(\eta_k(h_{k+1},0) + \rho_k(h_k,0))} + \dfrac{\exp(\zeta_k(-h_k,-h_{k+1})) - \exp(\zeta_k(h_k,-h_{k+1}))}{\exp(\eta_k(h_{k+1},0) + \rho_k(h_k,0))}$,  (4.61)

$G^{k+1}_{k,k} = \dfrac{2\big(1 - \exp(\rho_k(h_k, -h_k))\big)}{\exp(2\rho_k(h_k, 0))}$,  (4.62)

with

$\eta_k(h_a,h_b) = \ln\sum_{x_k} P(x_k)\sum_{x_{k+1}} \sqrt{P(x_{k+1}+h_a|x_k)}\,\sqrt{P(x_{k+1}+h_b|x_k)}\;\xi(x_{k+1}+h_a,\, x_{k+1}+h_b)$,  (4.63)

$\rho_k(h_a,h_b) = \ln\sum_{x_{k-1}} P(x_{k-1})\sum_{x_k} \sqrt{P(x_k+h_a|x_{k-1})}\,\sqrt{P(x_k+h_b|x_{k-1})}\sum_{x_{k+1}}\sqrt{P(x_{k+1}|x_k+h_a)}\,\sqrt{P(x_{k+1}|x_k+h_b)}\;\xi(x_k+h_a,\, x_k+h_b)$,  (4.64)

$\zeta_k(h_a,h_b) = \ln\sum_{x_{k-1}} P(x_{k-1})\sum_{x_k} \sqrt{P(x_k+h_a|x_{k-1})P(x_k|x_{k-1})}\sum_{x_{k+1}}\sqrt{P(x_{k+1}|x_k+h_a)}\,\sqrt{P(x_{k+1}+h_b|x_k)}\;\xi(x_k+h_a,\, x_k)\,\xi(x_{k+1}+h_b,\, x_{k+1})$,  (4.65)

and the function $\xi(\cdot,\cdot)$ corresponds to the Bhattacharyya coefficient, given by [33]

$\xi(x_k+h_a,\, x_k+h_b) = \exp\Big(-\tfrac{1}{8}\big(m^{u_{k-1}}_{x_k+h_a} - m^{u_{k-1}}_{x_k+h_b}\big)^T Q_h^{-1}\big(m^{u_{k-1}}_{x_k+h_a} - m^{u_{k-1}}_{x_k+h_b}\big) - \tfrac{1}{2}\log\dfrac{\det Q_h}{\sqrt{\det Q^{u_{k-1}}_{x_k+h_a}\cdot\det Q^{u_{k-1}}_{x_k+h_b}}}\Big)$,  (4.66)

where $2Q_h = Q^{u_{k-1}}_{x_k+h_a} + Q^{u_{k-1}}_{x_k+h_b}$. Furthermore, the information submatrix $J_0$ at time step 0 is

$J_0 = \dfrac{2\big(1 - \exp(\gamma(h_0,-h_0))\big)}{\exp(2\gamma(h_0,0))}$  (4.67)

with $\gamma(h_a,h_b) = \ln\sum_{x_0}\sqrt{P(x_0+h_a)P(x_0+h_b)}\;\xi(x_0+h_a,\, x_0+h_b)$.

Proof. To determine the exact form of $G^{k+1}_{k+1,k+1}$, $G^{k+1}_{k+1,k}$, $G^{k+1}_{k,k+1}$ and $G^{k+1}_{k,k}$, we start from their definitions given in Theorem 4.1 of [97]. We begin by letting

$L_\ell\big(y_\ell;\, x^{(1)}_\ell, x^{(2)}_\ell;\, x_{\ell-1};\, u_{\ell-1}\big) \doteq \dfrac{f(y_\ell|x^{(1)}_\ell,u_{\ell-1})\, P(x^{(1)}_\ell|x_{\ell-1})}{f(y_\ell|x^{(2)}_\ell,u_{\ell-1})\, P(x^{(2)}_\ell|x_{\ell-1})}$,  (4.68)

$K_\ell\big(x_{\ell+1};\, y_\ell;\, x^{(1)}_\ell, x^{(2)}_\ell;\, x_{\ell-1};\, u_{\ell-1}\big) \doteq \dfrac{P(x_{\ell+1}|x^{(1)}_\ell)}{P(x_{\ell+1}|x^{(2)}_\ell)}\, L_\ell\big(y_\ell;\, x^{(1)}_\ell, x^{(2)}_\ell;\, x_{\ell-1};\, u_{\ell-1}\big)$.
(4.69)

Then, for the term $G^{k+1}_{k+1,k+1}$, we have that

$G^{k+1}_{k+1,k+1} = \dfrac{E\Big\{\big(\sqrt{L^+_{k+1}(y_{k+1})} - \sqrt{L^-_{k+1}(y_{k+1})}\big)^2\Big\}}{E\big\{\sqrt{L^+_{k+1}(y_{k+1})}\big\}^2} = \dfrac{E\{L^+_{k+1}(y_{k+1})\} - 2E\big\{\sqrt{L^+_{k+1}(y_{k+1})L^-_{k+1}(y_{k+1})}\big\} + E\{L^-_{k+1}(y_{k+1})\}}{E\big\{\sqrt{L^+_{k+1}(y_{k+1})}\big\}^2}$,  (4.70)

where $L^+_{k+1}(y_{k+1}) \doteq L_{k+1}(y_{k+1};\, x_{k+1}+h_{k+1}, x_{k+1};\, x_k;\, u_k)$ and $L^-_{k+1}(y_{k+1}) \doteq L_{k+1}(y_{k+1};\, x_{k+1}-h_{k+1}, x_{k+1};\, x_k;\, u_k)$. We determine each term of (4.70) separately as follows

$\ln E\big\{\sqrt{L^+_{k+1}(y_{k+1})}\big\} = \ln E\Big\{\dfrac{\sqrt{f(y_{k+1}|x_{k+1}+h_{k+1},u_k)}\,\sqrt{P(x_{k+1}+h_{k+1}|x_k)}}{\sqrt{f(y_{k+1}|x_{k+1},u_k)}\,\sqrt{P(x_{k+1}|x_k)}}\Big\}$
$= \ln\sum_{X_{k+1}}\displaystyle\int p(X_{k+1},U_k,Y_{k+1})\,\dfrac{\sqrt{f(y_{k+1}|x_{k+1}+h_{k+1},u_k)}\,\sqrt{P(x_{k+1}+h_{k+1}|x_k)}}{\sqrt{f(y_{k+1}|x_{k+1},u_k)}\,\sqrt{P(x_{k+1}|x_k)}}\,dY_{k+1}$
$\overset{(a)}{=} \ln\sum_{x_k} P(x_k)\sum_{x_{k+1}}\sqrt{P(x_{k+1}+h_{k+1}|x_k)P(x_{k+1}|x_k)}\,\underbrace{\displaystyle\int\sqrt{f(y_{k+1}|x_{k+1}+h_{k+1},u_k)}\,\sqrt{f(y_{k+1}|x_{k+1},u_k)}\,dy_{k+1}}_{\doteq\,\xi(x_{k+1}+h_{k+1},\,x_{k+1})} \doteq \eta_k(h_{k+1},0)$,  (4.71)

where $(a)$ results from the Markovian nature of our system and the integral in (4.71) is [33]

$\xi(x_{k+1}+h_{k+1},\, x_{k+1}) = \displaystyle\int\sqrt{\mathcal N(m^{u_k}_{x_{k+1}+h_{k+1}},Q^{u_k}_{x_{k+1}+h_{k+1}})}\,\sqrt{\mathcal N(m^{u_k}_{x_{k+1}},Q^{u_k}_{x_{k+1}})}\,dy_{k+1}$
$= \exp\Big(-\tfrac{1}{8}\big(m^{u_k}_{x_{k+1}+h_{k+1}} - m^{u_k}_{x_{k+1}}\big)^T Q_h^{-1}\big(m^{u_k}_{x_{k+1}+h_{k+1}} - m^{u_k}_{x_{k+1}}\big) - \tfrac{1}{2}\log\dfrac{\det Q_h}{\sqrt{\det Q^{u_k}_{x_{k+1}+h_{k+1}}\cdot\det Q^{u_k}_{x_{k+1}}}}\Big)$,  (4.72)

viz. the Bhattacharyya coefficient [33].
Next, we have

$$\ln E\{L^+_{k+1}(y_{k+1})\} = \ln E\left\{\frac{f(y_{k+1} \,|\, x_{k+1}+h_{k+1}, u_k) \, P(x_{k+1}+h_{k+1} \,|\, x_k)}{f(y_{k+1} \,|\, x_{k+1}, u_k) \, P(x_{k+1} \,|\, x_k)}\right\} = \ln \sum_{X_{k+1}} \int p(X_{k+1}, U_k, Y_{k+1}) \, \frac{f(y_{k+1} \,|\, x_{k+1}+h_{k+1}, u_k)}{f(y_{k+1} \,|\, x_{k+1}, u_k)} \, \frac{P(x_{k+1}+h_{k+1} \,|\, x_k)}{P(x_{k+1} \,|\, x_k)} \, dY_{k+1} = \ln \sum_{x_k} P(x_k) \sum_{x_{k+1}} P(x_{k+1}+h_{k+1} \,|\, x_k) \int f(y_{k+1} \,|\, x_{k+1}+h_{k+1}, u_k) \, dy_{k+1} = 0, \tag{4.73}$$

and similar is the case for $\ln E\{L^-_{k+1}(y_{k+1})\}$. Finally, we have that

$$\ln E\big\{\sqrt{L^+_{k+1}(y_{k+1}) L^-_{k+1}(y_{k+1})}\big\} = \ln \sum_{x_k} P(x_k) \sum_{x_{k+1}} \sqrt{P(x_{k+1}+h_{k+1} \,|\, x_k) \, P(x_{k+1}-h_{k+1} \,|\, x_k)} \underbrace{\int \sqrt{f(y_{k+1} \,|\, x_{k+1}+h_{k+1}, u_k)} \sqrt{f(y_{k+1} \,|\, x_{k+1}-h_{k+1}, u_k)} \, dy_{k+1}}_{=\, \xi(x_{k+1}+h_{k+1},\, x_{k+1}-h_{k+1})} = \eta_k(h_{k+1}, -h_{k+1}). \tag{4.74}$$

Substituting (4.71), (4.73) and (4.74) back into (4.70) and exploiting the property $\exp(\ln\omega) = \omega$, we get (4.60). Next, for the term $G^{k+1}_{k+1,k}$, we work as follows

$$G^{k+1}_{k+1,k} = \frac{E\big\{\big(\sqrt{L^+_{k+1}(y_{k+1})} - \sqrt{L^-_{k+1}(y_{k+1})}\big)\big(\sqrt{K^+_k(y_k)} - \sqrt{K^-_k(y_k)}\big)\big\}}{E\big\{\sqrt{L^+_{k+1}(y_{k+1})}\big\} \, E\big\{\sqrt{K^+_k(y_k)}\big\}} = \frac{E\big\{\sqrt{L^+_{k+1}(y_{k+1}) K^+_k(y_k)}\big\} - E\big\{\sqrt{L^+_{k+1}(y_{k+1}) K^-_k(y_k)}\big\} - E\big\{\sqrt{L^-_{k+1}(y_{k+1}) K^+_k(y_k)}\big\} + E\big\{\sqrt{L^-_{k+1}(y_{k+1}) K^-_k(y_k)}\big\}}{E\big\{\sqrt{L^+_{k+1}(y_{k+1})}\big\} \, E\big\{\sqrt{K^+_k(y_k)}\big\}}, \tag{4.75}$$

where $K^+_k(y_k) \doteq K_k(x_{k+1}; y_k; x_k+h_k, x_k; x_{k-1}; u_{k-1})$ and $K^-_k(y_k) \doteq K_k(x_{k+1}; y_k; x_k-h_k, x_k; x_{k-1}; u_{k-1})$.
We only need to determine the four terms in the numerator and the term $E\{\sqrt{K^+_k(y_k)}\}$ in the denominator. So, we have

$$\ln E\big\{\sqrt{K^+_k(y_k)}\big\} = \ln \sum_{X_{k+1}} \int p(X_{k+1}, U_k, Y_{k+1}) \, \frac{\sqrt{P(x_{k+1} \,|\, x_k+h_k)} \sqrt{f(y_k \,|\, x_k+h_k, u_{k-1})} \sqrt{P(x_k+h_k \,|\, x_{k-1})}}{\sqrt{P(x_{k+1} \,|\, x_k)} \sqrt{f(y_k \,|\, x_k, u_{k-1})} \sqrt{P(x_k \,|\, x_{k-1})}} \, dY_{k+1} = \ln \sum_{x_{k-1}} P(x_{k-1}) \sum_{x_k} \sqrt{P(x_k+h_k \,|\, x_{k-1}) \, P(x_k \,|\, x_{k-1})} \sum_{x_{k+1}} \sqrt{P(x_{k+1} \,|\, x_k+h_k)} \sqrt{P(x_{k+1} \,|\, x_k)} \underbrace{\int \sqrt{f(y_k \,|\, x_k+h_k, u_{k-1})} \sqrt{f(y_k \,|\, x_k, u_{k-1})} \, dy_k}_{=\, \xi(x_k+h_k,\, x_k)} \doteq \rho_k(h_k, 0). \tag{4.76}$$

Next, we have that

$$\ln E\big\{\sqrt{L^+_{k+1}(y_{k+1}) K^+_k(y_k)}\big\} = \ln \sum_{X_{k+1}} \int p(X_{k+1}, U_k, Y_{k+1}) \, \frac{\sqrt{f(y_{k+1} \,|\, x_{k+1}+h_{k+1}, u_k)}}{\sqrt{f(y_{k+1} \,|\, x_{k+1}, u_k)}} \cdot \frac{\sqrt{P(x_{k+1}+h_{k+1} \,|\, x_k)} \sqrt{P(x_{k+1} \,|\, x_k+h_k)}}{P(x_{k+1} \,|\, x_k)} \cdot \frac{\sqrt{f(y_k \,|\, x_k+h_k, u_{k-1})} \sqrt{P(x_k+h_k \,|\, x_{k-1})}}{\sqrt{f(y_k \,|\, x_k, u_{k-1})} \sqrt{P(x_k \,|\, x_{k-1})}} \, dY_{k+1} = \ln \sum_{x_{k-1}} P(x_{k-1}) \sum_{x_k} \sqrt{P(x_k+h_k \,|\, x_{k-1}) \, P(x_k \,|\, x_{k-1})} \sum_{x_{k+1}} \sqrt{P(x_{k+1} \,|\, x_k+h_k) \, P(x_{k+1}+h_{k+1} \,|\, x_k)} \underbrace{\int \sqrt{f(y_k \,|\, x_k+h_k, u_{k-1})} \sqrt{f(y_k \,|\, x_k, u_{k-1})} \, dy_k}_{=\, \xi(x_k+h_k,\, x_k)} \underbrace{\int \sqrt{f(y_{k+1} \,|\, x_{k+1}+h_{k+1}, u_k)} \sqrt{f(y_{k+1} \,|\, x_{k+1}, u_k)} \, dy_{k+1}}_{=\, \xi(x_{k+1}+h_{k+1},\, x_{k+1})} \doteq \zeta_k(h_k, h_{k+1}). \tag{4.77}$$

Following the exact same procedure as above, we can derive the rest of the terms in the numerator of (4.75), i.e.

$$\ln E\big\{\sqrt{L^+_{k+1}(y_{k+1}) K^-_k(y_k)}\big\} = \zeta_k(-h_k, h_{k+1}), \tag{4.78}$$

$$\ln E\big\{\sqrt{L^-_{k+1}(y_{k+1}) K^+_k(y_k)}\big\} = \zeta_k(h_k, -h_{k+1}), \tag{4.79}$$

$$\ln E\big\{\sqrt{L^-_{k+1}(y_{k+1}) K^-_k(y_k)}\big\} = \zeta_k(-h_k, -h_{k+1}). \tag{4.80}$$

Substituting (4.71), (4.76) and (4.77)-(4.80) back into (4.75) and exploiting the property $\exp(\ln\omega) = \omega$, we get (4.61). By symmetry, $G^{k+1}_{k+1,k} = G^{k+1}_{k,k+1}$, and thus the above formula can be used to determine $G^{k+1}_{k,k+1}$ as well.

Last but not least, for the term $G^{k+1}_{k,k}$, we have that

$$G^{k+1}_{k,k} = \frac{E\big\{\big(\sqrt{K^+_k(y_k)} - \sqrt{K^-_k(y_k)}\big)^2\big\}}{E\big\{\sqrt{K^+_k(y_k)}\big\}^2} = \frac{E\{K^+_k(y_k)\} - 2E\big\{\sqrt{K^+_k(y_k) K^-_k(y_k)}\big\} + E\{K^-_k(y_k)\}}{E\big\{\sqrt{K^+_k(y_k)}\big\}^2}, \tag{4.81}$$

and to determine its exact form, we determine each term as follows

$$\ln E\{K^+_k(y_k)\} = \ln \sum_{X_{k+1}} \int p(X_{k+1}, U_k, Y_{k+1}) \, \frac{P(x_{k+1} \,|\, x_k+h_k) \, f(y_k \,|\, x_k+h_k, u_{k-1})}{P(x_{k+1} \,|\, x_k) \, f(y_k \,|\, x_k, u_{k-1})} \cdot \frac{P(x_k+h_k \,|\, x_{k-1})}{P(x_k \,|\, x_{k-1})} \, dY_{k+1} = \ln \sum_{x_{k-1}} P(x_{k-1}) \sum_{x_k} P(x_k+h_k \,|\, x_{k-1}) \sum_{x_{k+1}} P(x_{k+1} \,|\, x_k+h_k) \int f(y_k \,|\, x_k+h_k, u_{k-1}) \, dy_k = 0. \tag{4.82}$$

Similarly, $\ln E\{K^-_k(y_k)\} = 0$.
Finally, we have that

$$\ln E\big\{\sqrt{K^+_k(y_k) K^-_k(y_k)}\big\} = \ln \sum_{x_{k-1}} P(x_{k-1}) \sum_{x_k} \sqrt{P(x_k+h_k \,|\, x_{k-1}) \, P(x_k-h_k \,|\, x_{k-1})} \sum_{x_{k+1}} \sqrt{P(x_{k+1} \,|\, x_k+h_k) \, P(x_{k+1} \,|\, x_k-h_k)} \underbrace{\int \sqrt{f(y_k \,|\, x_k+h_k, u_{k-1}) \, f(y_k \,|\, x_k-h_k, u_{k-1})} \, dy_k}_{=\, \xi(x_k+h_k,\, x_k-h_k)} = \rho_k(h_k, -h_k). \tag{4.83}$$

Substituting (4.76), (4.82) and (4.83) back into (4.81) and exploiting the property $\exp(\ln\omega) = \omega$, we get (4.62).

To determine the information submatrix $J_0$, we work as follows

$$J_0 \doteq \frac{E\big\{\big(\sqrt{L^+_0(y_0)} - \sqrt{L^-_0(y_0)}\big)^2\big\}}{E\big\{\sqrt{L^+_0(y_0)}\big\}^2} = \frac{E\{L^+_0(y_0)\} - 2E\big\{\sqrt{L^+_0(y_0) L^-_0(y_0)}\big\} + E\{L^-_0(y_0)\}}{E\big\{\sqrt{L^+_0(y_0)}\big\}^2}, \tag{4.84}$$

where $L^+_0(y_0) \doteq L_0(y_0; x_0+h_0, x_0; u_{-1}) = \frac{p(y_0 \,|\, x_0+h_0, u_{-1}) \, P(x_0+h_0)}{p(y_0 \,|\, x_0, u_{-1}) \, P(x_0)}$ and $L^-_0(y_0) \doteq L_0(y_0; x_0-h_0, x_0; u_{-1}) = \frac{p(y_0 \,|\, x_0-h_0, u_{-1}) \, P(x_0-h_0)}{p(y_0 \,|\, x_0, u_{-1}) \, P(x_0)}$.
Based on these definitions, we have that

$$\ln E\big\{\sqrt{L^+_0(y_0)}\big\} = \ln \sum_{x_0} \sqrt{P(x_0+h_0) \, P(x_0)} \underbrace{\int \sqrt{f(y_0 \,|\, x_0+h_0, u_{-1})} \sqrt{f(y_0 \,|\, x_0, u_{-1})} \, dy_0}_{=\, \xi(x_0+h_0,\, x_0)} = \gamma(h_0, 0), \tag{4.85}$$

and

$$\ln E\{L^+_0(y_0)\} = \ln \sum_{x_0} P(x_0+h_0) \int f(y_0 \,|\, x_0+h_0, u_{-1}) \, dy_0 = 0, \tag{4.86}$$

$$\ln E\{L^-_0(y_0)\} = \ln \sum_{x_0} P(x_0-h_0) \int f(y_0 \,|\, x_0-h_0, u_{-1}) \, dy_0 = 0. \tag{4.87}$$

Finally, we have that

$$\ln E\big\{\sqrt{L^+_0(y_0) L^-_0(y_0)}\big\} = \ln \sum_{x_0} \sqrt{P(x_0+h_0) \, P(x_0-h_0)} \underbrace{\int \sqrt{f(y_0 \,|\, x_0+h_0, u_{-1})} \sqrt{f(y_0 \,|\, x_0-h_0, u_{-1})} \, dy_0}_{=\, \xi(x_0+h_0,\, x_0-h_0)} = \gamma(h_0, -h_0). \tag{4.88}$$

Substituting (4.85)-(4.88) back into (4.84) and exploiting the property $\exp(\ln\omega) = \omega$, we get (4.67), and this step completes the proof.

Remark 4. The variables in (4.57) and (4.58) are all scalars, since our parameter of interest is a discrete-time, finite-state Markov chain with $n$ states (see Footnote 5).

As already discussed, the WWLB avoids the need to satisfy any regularity conditions via the usage of test points. For our system model, this fact implies that we can determine the exact form of the sequential WWLB through Lemma 28. Nonetheless, the test points must be carefully selected to account for the fact that our parameter space is discrete. In other words, test points should be state-dependent, i.e.

$$h_t \in \mathcal{A} \doteq \big\{h_t(x_t) \in \mathbb{R} \,\big|\, x_t + h_t(x_t) \in \mathcal{X}\big\} \tag{4.89}$$

to ensure the validity and correctness of all the related formulae.
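As an illustration of the constraint (4.89), the admissible test-point values for an $n$-state chain can be enumerated directly. The helper below is a sketch (the name `valid_test_points` is ours); for $n = 4$ it reproduces the sets listed in the example that follows.

```python
def valid_test_points(n):
    # For a Markov chain with state space X = {1, ..., n}, return, for
    # each state x, the nonzero shifts h with x + h still inside X,
    # i.e. the state-dependent test-point set of (4.89).
    return {x: [h for h in range(1 - x, n - x + 1) if h != 0]
            for x in range(1, n + 1)}
```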
For instance, for $n = 4$ states $\{1,2,3,4\}$, the valid test point values for each state are: 1) $h_t(1) \in \{1,2,3\}$, 2) $h_t(2) \in \{-1,1,2\}$, 3) $h_t(3) \in \{-2,-1,1\}$, and 4) $h_t(4) \in \{-3,-2,-1\}$. In that sense, the test point at time slot $t$ can be represented as a vector of the form $h_t = [h_t(1), h_t(2), h_t(3), h_t(4)]^T$, where $h_t(i)$ is selected so that each state $i$ moves to a new state $j$ with $i \neq j$.

Footnote 5: We have adopted the scalar notation $x_k$ to represent the system state at time step $k$, where now $x_k \in \mathcal{X} \doteq \{1, \ldots, n\}$.

Remark 5. The WWLB computed above assumes one test point per parameter, and can be easily extended to accommodate multiple test points per parameter [117]. This significantly increases the associated computational complexity, but in some cases, multiple test points are required to obtain a tight bound.

Cost-Efficient WWLB (CE-WWLB)

Having derived the WWLB for our system model, and in an effort to bypass the high computational complexity associated with the DP solution of Section 4.3, we propose a myopic sensing strategy that optimizes the trade-off between the sequential WWLB and the sensing usage cost at each time step, i.e.

$$u^{\mathrm{CE-WWLB}}_k = \arg\min_{u_k} \, (1-\lambda) v(u_k) + \lambda c(u_k), \tag{4.90}$$

where $v(u_k) \doteq \max_{h_{k+1}} \big[J^{-1}_{k+1}(h_{k+1}, u_k)\big]$, and the dependence of $J_{k+1}(h_{k+1}, u_k)$ on $h_{k+1}$ and $u_k$ has been stated explicitly. The WWLB is maximized with respect to all possible test point combinations at each time step to ensure that the highest WWLB is computed. Since the WWLB constitutes a lower bound on the MSE of any Markov chain system state estimator and we are interested in strategies that optimize the trade-off between MSE and sensing cost, the proposed strategy in (4.90) is rather intuitive. Another agreeable characteristic is that the associated cost function $v(u_k)$ consists of functions of union-bound terms based on the Bhattacharyya detection error probability bound [33]. In fact, the terms in (4.63), (4.64) and (4.65) can be expressed as functions of these bounds, e.g.
$$\eta_k(h_a, h_b) = \ln \sum_{x_k} P(x_k) \, P^{\mathrm{Bh}}_{\mathrm{ub}}(x_k), \tag{4.91}$$

where $P^{\mathrm{Bh}}_{\mathrm{ub}}(x_k) = \sum_{x_{k+1}} \sqrt{P(x_{k+1}+h_a \,|\, x_k) \, P(x_{k+1}+h_b \,|\, x_k)} \, \xi(x_{k+1}+h_a, x_{k+1}+h_b)$. This last step builds a nice connection between MSE and detection error performance. Note that several sensing strategies, which have been empirically shown to perform well, have focused on the optimization of the Bhattacharyya coefficient and the detection error probability union bounds [129], since these are good measures of the confusability of different hypotheses. At this point, we underscore that the Bhattacharyya coefficient in (4.66) follows from setting $s = s_i = \frac{1}{2}$, $i = 1, 2, \ldots, \ell$. If we wish to also optimize the WWLB with respect to $s$, the resulting WWLB formulae (see Footnote 6) will instead depend on

$$\int f(y_k \,|\, x_k+h_a, u_{k-1})^s \, f(y_k \,|\, x_k+h_b, u_{k-1})^{1-s} \, dy_k = \exp(-\kappa(s)), \tag{4.92}$$

where

$$\kappa(s) \doteq \frac{s(1-s)}{2} \big(m^{u_{k-1}}_{x_k+h_b} - m^{u_{k-1}}_{x_k+h_a}\big)^T \big(s Q^{u_{k-1}}_{x_k+h_a} + (1-s) Q^{u_{k-1}}_{x_k+h_b}\big)^{-1} \big(m^{u_{k-1}}_{x_k+h_b} - m^{u_{k-1}}_{x_k+h_a}\big) + \frac{1}{2} \ln \frac{\big|s Q^{u_{k-1}}_{x_k+h_a} + (1-s) Q^{u_{k-1}}_{x_k+h_b}\big|}{\big|Q^{u_{k-1}}_{x_k+h_a}\big|^s \big|Q^{u_{k-1}}_{x_k+h_b}\big|^{1-s}}, \tag{4.93}$$

that is, the error exponent of the Chernoff bound [33]. In that case, the WWLB union-bound terms will be based on the Chernoff detection error probability bound [33]. Since the latter bound is tighter than the Bhattacharyya bound, the associated sensing strategy might lead to better trade-off curves than CE-WWLB, yet at the expense of increased computational complexity due to the optimization over $s$. To avoid such an issue, we adopted the computationally simpler but slightly less tight Bhattacharyya bound.

Footnote 6: Note that the square root terms will also be replaced by powers of functions of $s$.

The myopic structure of the proposed strategy in (4.90) also benefits computational complexity, since the computational burden of determining the expected future cost required by the DP algorithm in (4.5) is avoided.
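To make the role of $s$ concrete, here is a hedged scalar sketch of the Chernoff exponent (4.93): vector means and covariance matrices reduce to scalars, and the function names and grid search are ours. At $s = 1/2$ the exponent recovers the Bhattacharyya case, i.e. $\exp(-\kappa(1/2))$ equals the coefficient in (4.66).

```python
import math

def chernoff_exponent(s, m_a, q_a, m_b, q_b):
    # Scalar specialization of kappa(s) in (4.93): q_s plays the role
    # of the mixed covariance s*Q_a + (1-s)*Q_b.
    q_s = s * q_a + (1.0 - s) * q_b
    quad = 0.5 * s * (1.0 - s) * (m_b - m_a) ** 2 / q_s
    logdet = 0.5 * math.log(q_s / (q_a ** s * q_b ** (1.0 - s)))
    return quad + logdet

def tightest_chernoff(m_a, q_a, m_b, q_b, grid=1000):
    # Grid search over s in (0, 1): the largest exponent gives the
    # tightest Chernoff bound, at extra computational cost -- exactly
    # the trade-off against the fixed s = 1/2 Bhattacharyya choice.
    best_s = max((i / grid for i in range(1, grid)),
                 key=lambda s: chernoff_exponent(s, m_a, q_a, m_b, q_b))
    return best_s, chernoff_exponent(best_s, m_a, q_a, m_b, q_b)
```

Since the exponent is maximized over $s$, the optimized value is never smaller than $\kappa(1/2)$, mirroring the tightness comparison between the two bounds.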
Furthermore, there is no longer a need to consider every point of an uncountably infinite set, since the associated optimization function does not depend on the predicted belief state $\hat{p}_{k+1|k}$. Lastly, we observe that the WWLB constitutes an off-line performance bound, i.e. the related measurement information is averaged out. As a result, the off-line computation of this strategy is possible, which in turn facilitates an efficient implementation.

4.6 Numerical Results

In this section, we present numerical simulations to illustrate the performance of the proposed sensing strategies in the body sensing application of Chapter 2. Specifically, we consider the scenario described in Section 2.6, where we wish to track the physical activity of an individual, which is described by the discrete-time, finite-state Markov chain of Fig. 2.1, using three biometric sensors, i.e. two accelerometers (ACCs) and an electrocardiograph (ECG). Since continuously receiving samples from all the sensors limits the fusion center's battery life, sensing strategies (such as the ones presented in Sections 4.3 and 4.5) must be employed. Thus, in this case, the sensing usage cost captures the normalized energy cost of selecting control input $u_k$ and is defined as

$$c(u_k) \doteq \frac{1}{C} u_k^T \delta, \tag{4.94}$$

where $\delta = [\delta_{\mathrm{ACC1}}, \delta_{\mathrm{ACC2}}, \delta_{\mathrm{ECG}}]^T = [0.585, 0.776, 1]^T$ [129] corresponds to a vector that describes the mobile phone's reception cost for each of the biometric sensors, and $C$ is a normalizing factor. Based on these strategies, the mobile phone can decide to receive all (or any subset) of the generated samples by selecting the appropriate control input $u_k = [N^{u_k}_1, N^{u_k}_2, N^{u_k}_3]^T$, where $N^{u_k}_l$ denotes the total number of samples requested from sensor $S_l$ when control input $u_k$ is selected.
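A minimal sketch of how the per-step rule (4.90) combined with the energy cost (4.94) could be evaluated in this setting. The WWLB values $v(u)$ are stubbed out with a toy placeholder, since computing them requires the full model of Lemma 28; the function names and the toy bound are ours.

```python
from itertools import product

DELTA = (0.585, 0.776, 1.0)  # reception costs for [ACC1, ACC2, ECG], cf. (4.94)

def energy_cost(u, C=1.0):
    # c(u) = u^T delta / C: normalized energy cost of requesting
    # u[l] samples from sensor S_l.
    return sum(n_l * d_l for n_l, d_l in zip(u, DELTA)) / C

def myopic_choice(v, lam, N):
    # Rule (4.90): among all allocations with at most N samples in
    # total, minimize the trade-off (1 - lam)*v(u) + lam*c(u).
    candidates = [u for u in product(range(N + 1), repeat=3) if sum(u) <= N]
    return min(candidates,
               key=lambda u: (1.0 - lam) * v(u) + lam * energy_cost(u))

# Toy placeholder bound: more samples -> smaller (better) bound value.
toy_v = lambda u: 1.0 / (1.0 + sum(u))
```

With $\lambda = 0$ the rule spends the full budget, with $\lambda = 1$ it requests nothing, and intermediate values of $\lambda$ trace out the trade-off curve.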
(Figure 4.5: Trade-off curves for DP MSE-based, myopic and CE-WWLB strategies for N = 2 samples.)

We assume that during each time slot $k$, there exists a fixed budget of $N$ samples that we cannot exceed, i.e. $u_k^T \mathbf{1} \leqslant N$, and the mobile phone can select between $\alpha = \sum_{i=1}^N \binom{i+2}{i}$ available measurement vectors of the form in (3.1) with

$$m^{u_{k-1}}_i = \big[\mu_{i,u_{k-1}}(S_1)^T, \mu_{i,u_{k-1}}(S_2)^T, \mu_{i,u_{k-1}}(S_3)^T\big]^T, \tag{4.95}$$

$$Q^{u_{k-1}}_i = \mathrm{diag}\big(Q_{i,u_{k-1}}(S_1), Q_{i,u_{k-1}}(S_2), Q_{i,u_{k-1}}(S_3)\big), \tag{4.96}$$

where $\mu_{i,u_{k-1}}(S_l)$ is an $N^{u_{k-1}}_l \times 1$ vector, $Q_{i,u_{k-1}}(S_l) = \frac{\sigma^2_{S_l,i}}{1-\varphi^2} T + \sigma^2_z I$ is an $N^{u_{k-1}}_l \times N^{u_{k-1}}_l$ matrix, $T$ is a Toeplitz matrix whose first row/column is $[1, \varphi, \varphi^2, \ldots, \varphi^{N^{u_{k-1}}_l - 1}]^T$, $\varphi$ is the parameter of our model and $\sigma^2_z$ accounts for sensing and communication noise.

In the sequel, we compare the optimal sensing strategy of Theorem 22 (DP MSE-based strategy) with the myopic strategy of (4.51) and the CE-WWLB strategy of (4.90) with respect to:

i. the average MSE performance defined as

$$\mathrm{AMSE} \doteq \frac{1}{K} \sum_{k=1}^{K} \mathrm{tr}(\Sigma_{k|k}), \tag{4.97}$$

ii. the average detection performance defined as

$$\mathrm{ADP} \doteq \frac{1}{K} \sum_{k=1}^{K} \mathbf{1}_{\{x_k = \hat{x}_k\}}, \tag{4.98}$$

where $\hat{x}_k = \arg\max \hat{p}_{k|k}$, and

iii. the average energy cost defined as

$$\mathrm{AEC} \doteq \frac{1}{K} \sum_{k=1}^{K} u_k^T \delta, \tag{4.99}$$

where $K$ represents the number of Monte Carlo runs. Unless otherwise stated, the simulation parameters are as follows: $N = 12$ samples in total, $L = 5$ and $K = 10^6$. For the simulations, we use the signal model pdfs shown in Fig. 2.2a. For comparison, we also include the performance of the equal allocation (EA) strategy ($N = 3, 6, 9, 12$), where the same number of samples is requested from each sensor irrespective of the individual's physical activity state.

Fig. 4.5 shows the AEC-AMSE trade-off curves of the DP MSE-based, myopic and CE-WWLB strategies.
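Before turning to the comparisons, note that the per-sensor covariance in (4.96) has a simple AR(1)/Toeplitz structure, sketched below in pure Python (list-of-lists; the function name is ours).

```python
def ar1_covariance(n, sigma2, phi, sigma2_z):
    # Q = sigma2 / (1 - phi^2) * T + sigma2_z * I, where T is the
    # Toeplitz matrix with entries phi^{|i - j|}, cf. (4.96);
    # sigma2_z models sensing/communication noise on the diagonal.
    scale = sigma2 / (1.0 - phi ** 2)
    return [[scale * phi ** abs(i - j) + (sigma2_z if i == j else 0.0)
             for j in range(n)] for i in range(n)]
```

The first row is $[\sigma^2/(1-\varphi^2) + \sigma^2_z,\ \mathrm{scale}\cdot\varphi,\ \mathrm{scale}\cdot\varphi^2, \ldots]$, matching the stated Toeplitz first row/column.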
The total budget $N$ of requested samples was set to two, since for $N > 2$, the optimal POMDP solution requires an excessive amount of computation time. For these small problem sizes, the myopic and CE-WWLB strategies performed competitively with the DP MSE-based solution. In fact, the loss of performance due to adoption of the myopic policy is small, while CE-WWLB's performance is essentially indistinguishable from the performance of the DP MSE-based strategy. For the last observation, our intuition is that the WWLB successfully captures the detection nature of our active state tracking problem, which in turn justifies the suitability of functions of detection error probability bounds as performance objectives for this type of problems. Another agreeable characteristic of employing these strategies is the attendant complexity reduction, which is significant. Based on these findings, we increase the total number of samples $N$ to compare the lower complexity strategies, and remove the computationally intractable optimal method from further consideration.

(Figure 4.6: Trade-off curves for myopic (N = 12 samples), CE-WWLB (N = 12 samples) and equal allocation (N = 3, 6, 9, 12 samples) strategies: (a) AMSE versus AEC, (b) ADP versus AEC.)

Fig. 4.6 illustrates the trade-off curves achieved by the myopic and CE-WWLB strategies for $N = 12$ samples and EA for $N = 3, 6, 9, 12$. In particular, Fig. 4.6a shows the AEC-AMSE curve, while Fig. 4.6b the AEC-ADP curve. In both cases, we observe that by spending more energy, we achieve better MSE/detection accuracy performance.
Furthermore, compared to EA, the two sensing strategies exhibit the same detection accuracy but lower energy consumption. We notice that the energy reduction achieved by them is in general identical, excluding the case where detection accuracy is highly valued. In that case, CE-WWLB spends more energy to achieve similar detection performance with the myopic strategy, as verified by Fig. 4.6. This is due to the former strategy not using the belief state information to steer the sensor selection process, which in turn gives rise to a conservative selection to circumvent any worst-case scenarios. As a result, the myopic strategy exhibits 60% energy gains, while CE-WWLB only 7%, for detection performance equal to EA's performance ($N = 12$ samples).

(Figure 4.7: Samples allocation for different physical activity states for detection performance set to EA's performance (N = 12 samples): (a) Myopic strategy, (b) CE-WWLB strategy.)

Finally, Fig. 4.7 provides the average allocation of samples per sensor for the myopic (Fig. 4.7a) and CE-WWLB (Fig. 4.7b) strategies for the four physical activity states when their detection performance is set to EA's performance. As expected, no samples are requested from the ECG since, according to Fig. 2.2a, if the ECG is used, it is hard to distinguish between the four physical activity states for this particular individual. On the other hand, a combination of samples from the two ACCs is used. In the myopic strategy case, the exact number depends on the physical activity of interest and, on average, less than the total available samples are used. At the same time, preference is given to the first ACC. In contrast to the above, in the CE-WWLB strategy case, the exact number of samples is independent of the physical activity state since the belief state information is ignored, and preference is given to the second ACC, which is more energy-costly.
Finally, neglecting belief state information and accounting for worst-case scenarios result in using all available samples.

4.7 Concluding Remarks

In this chapter, we considered the problem of sensing strategy design to optimize the trade-off between estimation performance and sensing cost in active state tracking applications. Our previously proposed Kalman-like estimator was employed for state tracking and an optimal sensor selection strategy was derived. In an effort to better understand the optimal policy structure, structural properties of the cost-to-go function were studied in conjunction with the notion of stochastic ordering of observation kernels. In particular, the concavity of the cost-to-go function for non-linear POMDPs was established, which enabled us to show that the optimal policy has a threshold structure and characterize when passive sensing is optimal. Two sensing strategies with low complexity were also presented. Finally, the performance of the proposed strategies was illustrated using real data from the body sensing application of Chapter 2, where cost savings as high as 60% were attained without significantly impairing estimation performance.

Chapter 5

Conclusions and Future Work

In this thesis, we have considered the active state tracking problem for discrete-time, finite-state Markov chains. To capture the interactions between the key subsystems, we have adopted a partially observable Markov decision process formulation. Based on this formulation, we devised sensing and estimation strategies employing different performance quality measures and optimized competing goals wherever it was required.

The remainder of this chapter is organized as follows. In Section 5.1, we summarize the results of this thesis, while in Section 5.2, we outline various future directions.

5.1 Summary

We started this dissertation by addressing a specific active state tracking application.
In particular, in Chapter 2, we addressed the problem of resource allocation for energy-efficient physical activity detection in wireless body sensing networks. We formulated the aforementioned problem as a partially observable Markov decision process with cost function defined as the trade-off between detection accuracy and energy cost. This enabled us to derive a dynamic programming algorithm that solves for the optimal sensor selection strategy. To design low-complexity algorithms, we first proved important properties of the related cost functions, and then used these in conjunction with heuristics to propose three suboptimal schemes. Finally, we evaluated the performance of all algorithms on real data collected from the KNOWME network [78] and observed high energy gains compared to an equal allocation strategy.

Next, in Chapter 3, we studied active state tracking of discrete-time, finite-state Markov chains, where the measurement process is described by a controlled multivariate Gaussian observation model. In an effort to devise a generic controlled sensing framework, we addressed the joint problem of determining recursive formulae for an MMSE state estimator and designing an accompanying control policy. Specifically, we derived an approximate state estimator with Kalman structure and used its mean-squared error in a partially observable Markov decision process formulation. Based on the latter, we derived a nonstandard dynamic programming algorithm to solve for the optimal sensing strategy. We also derived approximate MMSE smoothing estimators to enhance estimation performance. We concluded the chapter by illustrating the performance of the proposed framework on the body sensing application of Chapter 2.

Finally, in Chapter 4, we extended the aforementioned framework to account for sensing usage costs.
In particular, we optimized the trade-off between estimation accuracy and sensing cost by adopting a partially observable Markov decision process formulation and deriving the optimal sensing strategy via a nonstandard dynamic programming algorithm. To overcome the computational complexity associated with determining the optimal solution, we showed the concavity of the cost-to-go function in the case of two states and scalar measurements. This suggested the threshold structure of the optimal sensing strategy, thus generalizing similar results for linear POMDPs in the literature [14]. We also characterized when state tracking with no observation control is optimal by exploiting stochastic ordering [80]. Exploiting the above results and appropriately generalizing the Weiss-Weinstein lower bound [123], we derived two low-complexity sensing strategies, which exhibited near-optimal performance. Finally, we evaluated the performance of all the proposed strategies on the body sensing application of Chapter 2 and showed high energy gains compared to a naive equal allocation strategy.

5.2 Future Directions

The active state tracking framework that we have considered in this thesis covers a wide range of applications, as discussed in earlier chapters. Still, there are various interesting future research directions and extensions that one can consider. We point out some of them in the sequel.

Specifically, as a first step, characterizing the theoretical performance (e.g. $\epsilon$-optimality) of our proposed low-complexity schemes and deriving respective guarantees constitute very interesting future directions. Successfully addressing these tasks can have a great impact on the theory of active state tracking due to the generality of our proposed framework. To this end, and in an effort to understand the structure of the optimal sensing strategy in the most general case, it is imperative to generalize the results of Chapter 4 to the case of multiple states and measurement vectors. This would also facilitate the development of good approximation algorithms with performance guarantees.
Ideally, the development of a principled framework for deriving near-optimal, low-complexity active state tracking strategies can have a huge impact on a broad spectrum of applications.

At this point, we underscore that most prior art on active state tracking has adopted linear POMDP formulations [125, 8, 129, 82, 60, 90]. However, as systems with ever-increasing complexities become ubiquitous, capturing non-linear effects is unavoidable in modeling, observation and control. In addition, the adoption of specialized performance metrics that capture the rich characteristics of related applications, e.g. sparse signal estimation, patient rehabilitation and emotion recognition for e-health, computer vision and robotics, intelligent transportation, is necessary to significantly enhance performance. To this end, non-linear and nonstandard POMDP formulations become indispensable, and thus devising appropriate techniques to efficiently solve these constitutes a very important task.

Throughout this dissertation, we have adopted a Bayesian approach to active state tracking, i.e. we assume that system state dynamics, observation distributions and key cost functions are known. However, in practical applications, this is not usually the case. Therefore, it is necessary to devise algorithms to jointly estimate these unknown quantities, e.g. the transition matrix and the costs, and devise active sensing strategies. As a first step, one could possibly extend results such as [63] to the active state tracking case.

Another important aspect of active state tracking problems is information characterization. This relates to the selection of appropriate performance objectives, but also to different types of ordering, such as the ones discussed in Chapter 4. In particular, since decision making is guided by appropriately defined performance objectives and exploits knowledge of previous information to adapt the sensing strategy, it is important to determine what is a good measure of information.
Selecting an appropriate information measure can greatly facilitate the understanding of the structure of the optimal strategy. We have explored various measures of information throughout this thesis. However, what is an appropriate measure is still an open problem. In an effort to design efficient active state tracking strategies, making connections and comparisons between well-known measures of information, as well as devising appropriate extensions (e.g. developing on-line forms of the WWLB can possibly lead to larger cost gains) and new measures, constitutes a very interesting research problem. Furthermore, as we have already discussed in Chapter 4, stochastic ordering of the observation kernels can possibly reveal if a specific control is more informative than other controls. This facilitates the determination of the structure of the optimal sensing strategy, i.e. the way the optimal control inputs partition the belief space. There are various ways of comparing stochastic models [80]. Within the context of active state tracking, selecting the appropriate type of comparison can significantly simplify the structural characterization of the optimal sensing strategy and is of paramount importance. In this thesis, Blackwell ordering has been adopted, while in [62], the authors have considered the monotone likelihood ratio ordering as a means of ensuring the threshold structure of the optimal policy. However, determining more appropriate types of ordering, tailored to the characteristics of the active state tracking problem, is yet to be addressed.

Characterizing the gains of active state tracking and the respective limits is also a very interesting research problem. How much can we gain by adopting a sequential strategy versus a non-sequential one? What are the benefits of adaptive versus non-adaptive solutions? Questions similar to the above have been studied in the context of active hypothesis testing in [83].
In Chapter 4, we numerically verified that the CE-WWLB strategy, which is non-adaptive, has performance similar to the myopic strategy, which is adaptive, except when we primarily shift our interest to detection accuracy. As a first step, one could possibly extend results from [83] to the time-varying case in an effort to characterize the benefits of active state tracking.

Scalable solutions will soon be more relevant than ever, especially due to the popularity of the Internet of Things [9] and cyber-physical systems [58]. Thus, as already discussed, it is of utmost importance that the proposed strategies can accommodate large state, control and observation spaces. This thesis has mostly focused on devising intelligent heuristics to address the scalability issue. As a next step, exploiting compressed sensing and sparse approximation theory tools to capture the hidden structure of the related spaces seems the natural approach to follow. As a first step, one could adapt approaches from [72] and [67] to sparsify POMDP models and determine concise representations of key stochastic processes, enabling efficient detection and control.

Last but not least, active state tracking is a versatile framework, which can accommodate a broad spectrum of applications, as we have already discussed. This suggests that addressing the above challenges will have a great impact on the efficient design of
Available at: https:// www.bluetooth.org/Technical/Specifications/adopted. htm,1994. [2] 50 Sensor Applications for a Smarter World. [Online]. Available at: http:// www.libelium.com/top_50_iot_sensor_applications_ ranking/.,2013. [3] D.Aberdeen. A(Revised)SurveyofApproximateMethodsforSolvingPartially Observable Markov Decision Processes. Technical report, National ICT Aus- tralia, 2003. [Online]. Available at: http://citeseerx.ist.psu.edu/ viewdoc/summary?doi=10.1.1.64.3376. [4] G.Anastasi,M.Conti,M.D.Francesco,andA.Passarella. EnergyConservation in Wireless Sensor Networks: A Survey. Ad Hoc Networks, 7(3):537–568, May 2009. [5] S.Atev,H.Arumugam,O.Masoud,R.Janardan,andN.P.Papanikolopoulos. A vision–basedapproachtocollisionpredictionattrafficintersections. IEEETrans- actionsonIntelligentTransportationSystems,6(4):416–423,December2005. [6] M. Athans. On the determination of optimal costly measurement strategies for linearstochasticsystems. Automatica,8(4):397–412,1972. [7] M. Athans and F. C. Schweppe. Optimal waveform design via control theoretic concepts. InformationControl,10:335–377,1967. [8] G.K.Atia,V.V.Veeravalli,andJ.A.Fuemmeler. SensorSchedulingforEnergy– EfficientTargetTrackinginSensorNetworks. IEEETransactionsonSignalPro- cessing,59(10):4923–4937,October2011. [9] L.Atzori,A.Iera,andJ.Morabito. TheInternetofThings: Asurvey. Computer Networks,54(15):2787–2805,2010. 166 [10] L.K.Au,A.A.T.Bui,M.A.Batalin,andW.J.Kaiser. Energy–EfficientContext Classification With Dynamic Sensor Control. IEEE Transactions on Biomedical CircuitsandSystems,6(2):167–178,April2012. [11] E. Baccarelli and R. Cusani. Recursive Kalman-type optimal estimation and detectionofhiddenMarkovchains. SignalProcessing,51(1):55–64,May1996. [12] S. Balasubramanian, I. Elangovan, S. K. Jayaweera, and K. R. Namuduri. Dis- tributed and Collaborative Tracking for Energy–Constrained Ad–hoc Wireless Sensor Networks. 
In Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC), pages 1732–1737, Atlanta, GA, March 2004.

[13] R. E. Bellman. Dynamic Programming. Princeton University Press, 1957.

[14] D. P. Bertsekas. Dynamic Programming and Optimal Control, volume 1. Athena Scientific, Belmont, MA, 3rd edition, 2005.

[15] L. Blackmore, S. Rajamanoharan, and B. C. Williams. Active Estimation for Jump Markov Linear Systems. IEEE Transactions on Automatic Control, 53(10):2223–2236, 2008.

[16] D. Blackwell. Equivalent Comparisons of Experiments. Annals of Mathematical Statistics, 24:265–272, 1953.

[17] A. K. Bourke, J. V. O’Brien, and G. M. Lyons. Evaluation of a threshold–based tri–axial accelerometer fall detection algorithm. Gait & Posture, 26(2):194–199, 2007.

[18] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, New York, NY, 1st edition, 2004.

[19] P. M. Bremaud and J. H. Van Schuppen. Discrete time processes: Part II: estimation theory. Technical Report SSM7603b, Washington University, June 1976.

[20] P. M. Bremaud and J. H. Van Schuppen. Discrete time stochastic systems: Part I: stochastic calculus and representations. Technical Report SSM7603a, Washington University, June 1976.

[21] L. Le Cam. Sufficiency and Approximate Sufficiency. Annals of Mathematical Statistics, 35(4):1419–1455, 1964.

[22] Q. Cao, T. Abdelzaher, T. He, and J. Stankovic. Towards Optimal Sleep Scheduling in Sensor Networks for Rare–Event Detection. In Proceedings of the 4th ACM/IEEE International Symposium on Information Processing in Sensor Networks (IPSN), pages 20–27, Los Angeles, CA, April 2005.

[23] M. Chen, S. Gonzalez, A. Vasilakos, H. Cao, and V. C. Leung. Body Area Networks: A Survey. Mobile Networks and Applications, 16(2):171–193, April 2011.

[24] Y. Chen, Q. Zhao, V. Krishnamurthy, and D. Djonin. Transmission Scheduling for Optimizing Sensor Network Lifetime: A Stochastic Shortest Path Approach. IEEE Transactions on Signal Processing, 55(5):2294–2309, May 2007.

[25] Z. Chen. Bayesian Filtering: From Kalman Filters to Particle Filters, and Beyond.
Technical report, McMaster University, 2003. [Online]. Available at: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.107.7415&rep=rep1&type=pdf.

[26] H. Chernoff. Sequential Design of Experiments. Annals of Mathematical Statistics, 30:755–770, 1959.

[27] C. Chirarattananon and B. D. O. Anderson. The fixed–lag smoother as a finite–dimensional linear system. Automatica, 7:657–665, 1971.

[28] O. L. V. Costa, M. D. Fragoso, and R. P. Marques. Discrete-Time Markov Jump Linear Systems. Springer Probability and its Applications Series, 2005.

[29] M. H. DeGroot. Optimal Statistical Decisions. McGraw-Hill, Inc., 1970.

[30] V. Della Mea. What is e–Health (2): The death of telemedicine? Journal of Medical Internet Research, 3(2), June 2001.

[31] S. Dey and J. B. Moore. Risk-sensitive filtering and smoothing for Hidden Markov Models. Systems and Control Letters, 25(5):361–366, August 1995.

[32] C. Di Natale, A. Macagnano, E. Martinelli, R. Paolesse, G. D’Arcangelo, C. Roscioni, A. Finazzi Agro, and A. D’Amico. Lung cancer identification by the analysis of breath by means of an array of non–selective gas sensors. Elsevier: Biosensors and Bioelectronics, 18(10):1209–1218, 2003.

[33] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Wiley–Interscience, New York, NY, 2nd edition, 2001.

[34] P. Dutta, M. Grimmer, A. Arora, S. Bibyk, and D. Culler. Design of a Wireless Sensor Network Platform for Detecting Rare, Random, and Ephemeral Events. In Proceedings of the 4th ACM/IEEE International Symposium on Information Processing in Sensor Networks (IPSN), pages 497–502, Los Angeles, CA, April 2005.

[35] R. J. Elliott. Exact adaptive filters for Markov Chains Observed in Gaussian noise. Automatica, 30(9):1399–1408, September 1994.

[36] R. J. Elliott, L. Aggoun, and J. B. Moore. Hidden Markov Models, Estimation and Control, volume 29 of Applications of Mathematics. Springer, 1995.

[37] R. J. Elliott and H. Yang. How To Count and Guess Well: Discrete Adaptive Filters. Applied Mathematics & Optimization, 30(1):51–78, July 1994.
[38] V. Fedorov. Theory of Optimal Experiments. New York: Academic, 1972.

[39] R. A. Fisher. On the Mathematical Foundations of Theoretical Statistics. Philosophical Transactions of the Royal Society of London, Series A, 222(594–604):309–368, 1922.

[40] D. Fraser and J. Potter. The optimum linear smoother as a combination of two optimum linear filters. IEEE Transactions on Automatic Control, 14(4):387–390, August 1969.

[41] J. A. Fuemmeler, G. K. Atia, and V. V. Veeravalli. Sleep Control for Tracking in Sensor Networks. IEEE Transactions on Signal Processing, 59(9):4354–4366, September 2011.

[42] J. A. Fuemmeler and V. V. Veeravalli. Smart Sleeping Policies for Energy Efficient Tracking in Sensor Networks. IEEE Transactions on Signal Processing, 56(5):2091–2101, May 2008.

[43] K. Goel and M. H. DeGroot. Comparison of experiments and information measures. Annals of Statistics, 7(5):1066–1077, 1979.

[44] C. Gourieroux and A. Monfort. Statistics and Econometric Models: Volume 1, General Concepts, Estimation, Prediction and Algorithms. Cambridge University Press, 1995.

[45] V. Gupta, T. H. Chung, B. Hassibi, and R. M. Murray. On a stochastic sensor selection algorithm with applications in sensor scheduling and sensor coverage. Automatica, 42(2):251–260, February 2006.

[46] M. Hadidi and S. Schwartz. Linear recursive state estimators under uncertain observations. IEEE Transactions on Automatic Control, 24(6):944–948, December 1979.

[47] R. V. L. Hartley. Transmission of Information. Bell System Technical Journal, 7(3):535–563, July 1928.

[48] J. Haupt, R. Baraniuk, R. Castro, and R. Nowak. Sequentially designed compressed sensing. In Proceedings of the IEEE Statistical Signal Processing Workshop (SSP), pages 401–404, Ann Arbor, MI, August 2012.

[49] M. L. Hernandez, T. Kirubarajan, and Y. Bar-Shalom. Multisensor resource deployment using posterior Cramér–Rao bounds. IEEE Transactions on Aerospace and Electronic Systems, 40(2):399–416, April 2004.

[50] A. O. Hero and D. Cochran.
Sensor Management: Past, Present, and Future. IEEE Sensors Journal, 11(12):3064–3075, 2011.

[51] K. Herring and J. Melsa. Optimum measurements for estimation. IEEE Transactions on Automatic Control, 19(3):264–266, June 1974.

[52] V. J. Hodge and J. Austin. A Survey of Outlier Detection Methodologies. Artificial Intelligence Review, 22(2):85–126, 2004.

[53] P. Hovareshti, V. Gupta, and J. S. Baras. Sensor Scheduling using Smart Sensors. In Proceedings of the 46th IEEE Conference on Decision and Control, pages 494–499, New Orleans, LA, December 2007.

[54] O. C. Imer and T. Basar. Optimal Estimation with Limited Measurements. International Journal of Systems, Control and Communications, 2:5–29, January 2010.

[55] T. Javidi and A. Goldsmith. Dynamic joint source–channel coding with feedback. In IEEE International Symposium on Information Theory Proceedings (ISIT), pages 16–20, Istanbul, Turkey, July 2013.

[56] E. Jovanov, A. Milenkovic, C. Otto, and P. de Groen. A wireless body area network of intelligent motion sensors for computer assisted physical rehabilitation. Journal of NeuroEngineering and Rehabilitation, 2(1), 2005.

[57] R. E. Kalman. A New Approach to Linear Filtering and Prediction Problems. Transactions of the ASME–Journal of Basic Engineering, 82(Series D):35–45, 1960.

[58] K.-D. Kim and P. R. Kumar. Cyber–Physical Systems: A Perspective at the Centennial. Proceedings of the IEEE, 100:1287–1308, May 2012.

[59] V. Krishnamurthy. Algorithms for optimal scheduling and management of hidden Markov model sensors. IEEE Transactions on Signal Processing, 50(6):1382–1397, June 2002.

[60] V. Krishnamurthy. How to Schedule Measurements of a Noisy Markov Chain in Decision Making? IEEE Transactions on Information Theory, 59(7):4440–4461, March 2013.

[61] V. Krishnamurthy, R. R. Bitmead, M. Gevers, and E. Miehling. Sequential Detection with Mutual Information Stopping Cost. IEEE Transactions on Signal Processing, 60(2):700–714, February 2012.

[62] V. Krishnamurthy and D. Djonin.
Structured Threshold Policies for Dynamic Sensor Scheduling – A Partially Observed Markov Decision Process Approach. IEEE Transactions on Signal Processing, 55(10):4938–4957, October 2007.

[63] V. Krishnamurthy and J. B. Moore. On–Line Estimation of Hidden Markov Model Parameters based on the Kullback–Leibler Information Measure. IEEE Transactions on Signal Processing, 41(8):2557–2573, August 1993.

[64] H. Kushner. On the optimum timing of observations for linear control systems with unknown initial state. IEEE Transactions on Automatic Control, 9(2):144–150, April 1964.

[65] D. Lainiotis. A Class of Upper Bounds on Probability of Error for Multihypotheses Pattern Recognition. IEEE Transactions on Information Theory, 15(6):730–731, November 1969.

[66] E. L. Lehmann. Comparing location experiments. Annals of Statistics, 16(2):521–533, 1988.

[67] M. Levorato, U. Mitra, and A. Goldsmith. Structure–based learning in wireless networks via sparse approximation. EURASIP Journal on Wireless Communications and Networking, 278(1):1–15, August 2012.

[68] M. Li, V. Rozgić, G. Thatte, S. Lee, B. A. Emken, M. Annavaram, U. Mitra, D. Spruijt-Metz, and S. Narayanan. Multimodal Physical Activity Recognition by Fusing Temporal and Cepstral Information. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 18(4):369–380, August 2010.

[69] J. Ligo, G. Atia, and V. V. Veeravalli. A Controlled Sensing Approach to Graph Classification. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 5573–5577, Vancouver, Canada, May 2013.

[70] D. V. Lindley. On a measure of the information provided by an experiment. Annals of Mathematical Statistics, 27(4):986–1005, 1956.

[71] W. S. Lovejoy. A survey of algorithmic methods for partially observed Markov decision processes. Annals of Operations Research, 28:47–66, April 1991.

[72] M. Maggioni and S. Mahadevan. A multiscale framework for Markov decision processes using diffusion wavelets. Technical report, University of Massachusetts, Amherst, 2006.
[Online]. Available at: http://people.cs.umass.edu/~mahadeva/papers/06-36.pdf.

[73] S. I. Marcus. Optimal Nonlinear Estimation for a Class of Discrete–Time Stochastic Systems. IEEE Transactions on Automatic Control, 24(2):297–302, 1979.

[74] E. Masazade, R. Niu, and P. K. Varshney. An approximate dynamic programming based non-myopic sensor selection method for target tracking. In Proceedings of the 46th Annual Conference on Information Sciences and Systems (CISS), pages 1–6, Princeton University, March 2012.

[75] J. S. Meditch. On optimal linear smoothing theory. Journal of Information and Control, 10:598–615, August 1967.

[76] R. K. Mehra. Optimization of measurement schedules and sensor designs for linear dynamic systems. IEEE Transactions on Automatic Control, 21(1):55–64, February 1976.

[77] L. Meier, J. Peschon, and R. M. Dressler. Optimal control of measurement subsystems. IEEE Transactions on Automatic Control, 12(5):528–536, October 1967.

[78] U. Mitra, B. A. Emken, S. Lee, M. Li, V. Rozgic, G. Thatte, H. Vathsangam, D.-S. Zois, M. Annavaram, S. Narayanan, M. Levorato, D. Spruijt-Metz, and G. Sukhatme. KNOWME: A Case Study in Wireless Body Area Sensor Network Design. IEEE Communications Magazine, 50(5):116–125, May 2012.

[79] J. B. Moore. Discrete–time fixed–lag smoothing algorithms. Automatica, 9(2):163–174, 1973.

[80] A. Muller and D. Stoyan. Comparison Methods for Stochastic Models and Risks. John Wiley & Sons, 2002.

[81] M. Naghshvar and T. Javidi. Active M–ary Sequential Hypothesis Testing. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), pages 1623–1627, Austin, TX, June 2010.

[82] M. Naghshvar and T. Javidi. Active Sequential Hypothesis Testing. The Annals of Statistics, 41(6):2703–2738, December 2013.

[83] M. Naghshvar and T. Javidi. Sequentiality and Adaptivity Gains in Active Hypothesis Testing. IEEE Journal of Selected Topics in Signal Processing, 7(5):768–782, October 2013.

[84] M. Naghshvar and T. Javidi. Two–Dimensional Visual Search.
In Proceedings of the IEEE International Symposium on Information Theory (ISIT), pages 1262–1266, Istanbul, Turkey, July 2013.

[85] M. Naghshvar, T. Javidi, and M. Wigger. Extrinsic Jensen–Shannon Divergence: Applications to Variable–Length Coding. ArXiv e-prints, June 2013. [Online]. Available at: http://arxiv.org/pdf/1307.0067v1.pdf.

[86] N. Nahi. Optimal recursive estimation with uncertain observation. IEEE Transactions on Information Theory, 15(4):457–462, July 1969.

[87] S. Narayanan and P. G. Georgiou. Behavioral signal processing: Deriving human behavioral informatics from speech and language. Proceedings of the IEEE, 101(5):1203–1233, 2013.

[88] E. D. Nerurkar and S. I. Roumeliotis. Resource-aware Hybrid Estimation Framework for Multi–robot Cooperative Localization. Technical Report 2012-001, University of Minnesota, 2012. [Online]. Available at: http://www-users.cs.umn.edu/~nerurkar/Nerurkar_multibithybridCL.pdf.

[89] G. E. Newstadt, D. L. Wei, and A. O. Hero, III. Resource-Constrained Adaptive Search and Tracking for Sparse Dynamic Targets. ArXiv e-prints, April 2014. [Online]. Available at: http://arxiv.org/pdf/1404.2201v1.pdf.

[90] S. Nitinawarat, G. K. Atia, and V. V. Veeravalli. Controlled Sensing for Multihypothesis Testing. IEEE Transactions on Automatic Control, 58(10):2451–2464, October 2013.

[91] R. D. Nowak. The geometry of generalized binary search. IEEE Transactions on Information Theory, 57(12):7893–7906, December 2011.

[92] Y. Oshman. Optimal sensor selection strategy for discrete–time state estimators. IEEE Transactions on Aerospace and Electronic Systems, 30(2):307–314, April 1994.

[93] C. H. Papadimitriou and J. N. Tsitsiklis. The complexity of Markov decision processes. Mathematics of Operations Research, 12(3):441–450, August 1987.

[94] K. B. Petersen and M. S. Pedersen. The Matrix Cookbook. [Online]. Available at: http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=3274, November 2008.

[95] N. Phamdo and N. Farvardin.
Optimal Detection of Discrete Markov Sources Over Discrete Memoryless Channels – Applications to Combined Source–Channel Coding. IEEE Transactions on Information Theory, 40(1):186–193, January 1994.

[96] R. Rangarajan, R. Raich, and A. O. Hero. Optimal Sequential Energy Allocation for Inverse Problems. IEEE Journal of Selected Topics in Signal Processing, 1(1):67–78, June 2007.

[97] I. Rapoport and Y. Oshman. Weiss–Weinstein Lower Bounds for Markovian Systems. Part 1: Theory. IEEE Transactions on Signal Processing, 55(5):2031–2042, May 2007.

[98] H. E. Rauch, F. Tung, and C. T. Striebel. Maximum likelihood estimates of linear dynamic systems. Journal of American Institute of Aeronautics and Astronautics, 3(8):1445–1450, August 1965.

[99] S. Reece and D. Nicholson. Tighter alternatives to the Cramér–Rao lower bound for discrete–time filtering. In Proceedings of the 8th International Conference on Information Fusion, Philadelphia, PA, July 2005.

[100] A. Ribeiro, G. B. Giannakis, and S. I. Roumeliotis. SOI–KF: Distributed Kalman Filtering With Low–Cost Communications Using the Sign of Innovations. IEEE Transactions on Signal Processing, 54(12):4782–4795, December 2006.

[101] U. Rieder. Structural results for partially observed control models. Methods and Models of Operations Research, 35(6):473–490, 1991.

[102] R. T. Rockafellar and R. J. B. Wets. Variational Analysis. Springer–Verlag, New York, NY, 1st edition, 1997.

[103] A. Sano and E. Terao. Measurement optimization in optimal process control. Automatica, 6:705–714, 1970.

[104] L. Scardovi. Information based Control for State and Parameter Estimation. PhD thesis, University of Genoa, Genoa, Italy, 2005.

[105] M. J. Schervish. Theory of Statistics. Springer, 1995.

[106] A. Segall. Recursive estimation from discrete-time point processes. IEEE Transactions on Information Theory, 22(4):422–431, September 1976.

[107] A. Segall. Stochastic processes in estimation theory. IEEE Transactions on Information Theory, 22(3):275–286, May 1976.

[108] B. Settles. Active Learning Literature Survey. Computer Sciences Technical Report 1648, University of Wisconsin–Madison, 2009.
[Online]. Available at: http://research.cs.wisc.edu/techreports/2009/TR1648.pdf.

[109] A. Seyedi and B. Sikdar. Energy Efficient Transmission Strategies for Body Sensor Networks with Energy Harvesting. IEEE Transactions on Communications, 58(7):2116–2126, July 2010.

[110] C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27:379–423, 623–656, July 1948.

[111] B. Sinopoli, L. Schenato, M. Franceschetti, K. Poolla, M. I. Jordan, and S. S. Sastry. Kalman filtering with intermittent observations. IEEE Transactions on Automatic Control, 49(9):1453–1464, September 2004.

[112] R. D. Smallwood and E. J. Sondik. Optimal control of partially observable Markov processes over a finite horizon. Operations Research, 21:1071–1088, 1973.

[113] J. L. Speyer and W. H. Chung. Stochastic Processes, Estimation and Control. SIAM, 2008.

[114] G. Thatte, M. Li, S. Lee, B. A. Emken, M. Annavaram, S. Narayanan, D. Spruijt-Metz, and U. Mitra. Optimal Time–Resource Allocation for Energy–Efficient Physical Activity Detection. IEEE Transactions on Signal Processing, 59(4):1843–1857, April 2011.

[115] E. Torgersen. Stochastic Orders and Comparison of Experiments. Hayward, CA: Institute of Mathematical Statistics, pages 334–271, 1991.

[116] J. Unnikrishnan and V. V. Veeravalli. Algorithms for Dynamic Spectrum Access with Learning for Cognitive Radio. IEEE Transactions on Signal Processing, 58(2):750–760, February 2010.

[117] H. L. Van Trees and K. L. Bell. Bayesian Bounds for Parameter Estimation and Nonlinear Filtering/Tracking. IEEE Press, Wiley-Interscience, 2007.

[118] A. Wald and J. Wolfowitz. Optimum character of the sequential probability ratio test. Annals of Mathematical Statistics, 19(3):326–339, 1948.

[119] H. Wang, H.-S. Choi, N. Agoulmine, M. J. Deen, and J. W. K. Hong. Information–Based Energy Efficient Sensor Selection in Wireless Body Area Networks. In Proceedings of the IEEE International Conference on Communications (ICC), pages 1–6, Kyoto, Japan, June 2011.
[120] Y. Wang, B. Krishnamachari, and M. Annavaram. Semi–Markov State Estimation and Policy Optimization for Energy Efficient Mobile Sensing. In Proceedings of the 9th Annual IEEE Conference on Sensor, Mesh and Ad Hoc Communications and Networks (SECON), pages 533–541, Seoul, Korea, June 2012.

[121] Y. Wang, B. Krishnamachari, Q. Zhao, and M. Annavaram. Markov–optimal Sensing Policy for User State Estimation in Mobile Devices. In Proceedings of the 9th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN), pages 268–278, Stockholm, Sweden, April 2010.

[122] D. Wei and A. O. Hero. Multistage Adaptive Estimation of Sparse Signals. IEEE Journal of Selected Topics in Signal Processing, 7(5):783–796, April 2013.

[123] E. Weinstein and A. J. Weiss. A general class of lower bounds in parameter estimation. IEEE Transactions on Information Theory, 34(2):338–342, March 1988.

[124] J. L. Williams, J. W. Fisher, and A. S. Willsky. Approximate Dynamic Programming for Communication–Constrained Sensor Network Management. IEEE Transactions on Signal Processing, 55(8):4300–4311, August 2007.

[125] W. Wu and A. Arapostathis. Optimal Sensor Querying: General Markovian and LQG Models with Controlled Observations. IEEE Transactions on Automatic Control, 53(6):1392–1405, July 2008.

[126] F. Xaver, P. Gerstoft, G. Matz, and C. F. Mecklenbrauker. Analytic Sequential Weiss–Weinstein Bounds. IEEE Transactions on Signal Processing, 61(20):5049–5062, October 2013.

[127] P. Zappi, C. Lombriser, T. Stiefmeier, E. Farella, D. Roggen, L. Benini, and G. Tröster. Activity Recognition from On–Body Sensors: Accuracy–Power Trade–Off by Dynamic Sensor Selection. In Proceedings of the 5th European Conference on Wireless Sensor Networks (EWSN), pages 17–33, Bologna, Italy, February 2008.

[128] D.-S. Zois, M. Levorato, and U. Mitra. Active Classification for POMDPs: a Kalman-like State Estimator. ArXiv e-prints, December 2013. [Online]. Available at: http://arxiv.org/pdf/1312.2039v1.pdf.

[129] D.-S. Zois, M. Levorato, and U. Mitra.
Energy–Efficient, Heterogeneous Sensor Selection for Physical Activity Detection in Wireless Body Area Networks. IEEE Transactions on Signal Processing, 61(7):1581–1594, April 2013.