Tactile Object Localization: Behavioral Correlates, Neural Representations, and a Deep Learning Hybrid Model to Classify Touch by Phillip Scott Maire A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (NEUROSCIENCE) August 2023 Copyright 2023 Phillip Scott Maire ii Acknowledgements First I thank my advisor, Dr. Andrew Hires, who made this possible by accepting me into his lab. Andrew, thank you for introducing me to an incredible array of new technologies and concepts, for challenging me along the way to be better, and for your guidance through the years. There was always plenty to get excited about in the Hires lab and I will miss the energy and vibrant culture that helped me grow. I would also like the extend my gratitude to all the members of the Hires lab across the years. Thank you Dr. Judith Hirsch for serving on my committee and greeting me so warmly when I first moved to Los Angeles. Thank you Dr. Huizhong Tao for serving on my committee and for your help through my rotation. I am greatly appreciative of Dr. Woody Petry whose lab I first joined at 19 years old, where I first gained my fascination with the brain. Thank you to Dr. Wenhao Dang who helped me record my first neuron, the first of many through countless all night recording sessions. Thank you to Dr. Martha Bickford for your guidance and enthusiasm, and for accepting me into your lab after graduating. A special thanks to Dr. Sean Masterson, Dr. Na Zhou, and Ark Slusarczyk for your help and for always being so kind. I have been irreversibly shaped for the better through the countless late nights, deep talks, and adventures I’ve had with my friends here. I am incredibly fortunate to be surrounded by so many remarkably intelligent and kindhearted people. A special thank you to all of you including Aida Bareghamyan, Dr. Nora Benavidez, Dr. Jonathan Cheung, Dr. Soyoung Choi, Dr. Andrew Erskine, Dr. Eric Hendricks, Dr. Jinho Kim, Samson King, Dr. Clarrisa Liu, Dr. Adam Lundquist, Zachary Murdock, Alicia Quihuis, Stef Walker, Dr. Rachel Yuan, Dr. Lily Zou and so many more. I want to extend my gratitude to my good friends David Casper and Ian Weber who helped me grow multitudes over the years through the many adventures, deep talks, and creative endeavors we shared together. Thank you to my family for all your support. A special thank you to my mother Ann Maire and sister Abby Maire. It is difficult to put into words how much you mean to me and how much your support has helped me get through every difficult moment in my life. I would like to dedicate my dissertation to both of you. I love you both so much. To my new family Sarah Ma and Winnie Ma, I love you so much and your support over the last two and a half years has meant the world to me. I am so excited to continue our adventures together. iii Table of Contents Acknowledgements ....................................................................................................................................... ii Table of Contents ......................................................................................................................................... iii List of Tables ................................................................................................................................................. 
v List of Figures ............................................................................................................................................... vi Introduction .................................................................................................................................................... 1 Chapter 1: The sensorimotor and neural underpinnings of whisker-guided object localization behavior in head-fixed mice .......................................................................................................................... 5 Introduction ................................................................................................................................................ 5 Results ........................................................................................................................................................ 7 Discussion ................................................................................................................................................ 19 Materials and Methods ............................................................................................................................ 22 Chapter 2: An object location code in whisker S1 ...................................................................................... 30 Introduction .............................................................................................................................................. 30 Results ...................................................................................................................................................... 30 Discussion ................................................................................................................................................ 40 Materials and Methods ............................................................................................................................ 43 Chapter 3: Multisensory response properties in secondary somatosensory cortex ..................................... 51 Introduction .............................................................................................................................................. 51 Results ...................................................................................................................................................... 52 Discussion ................................................................................................................................................ 61 iv Proposed analysis and details of additional dataset ................................................................................. 64 Materials and Methods ............................................................................................................................ 69 Chapter 4: Whisker Automatic Contact Classifier (WhACC) with Expert Human-Level Performance ................................................................................................................................................. 74 Abstract .................................................................................................................................................... 74 Introduction .............................................................................................................................................. 
75 Discussion ................................................................................................................................................ 93 Materials and Methods ............................................................................................................................ 98 Concluding remarks ................................................................................................................................... 106 References .................................................................................................................................................. 108 v List of Tables Table 2.1. Table comparing properties of non-touch (n=68), non-location touch units (n=12), and location touch units ...................................................................................................................................... 37 Table 4.1 Performance metrics across models after median smoothing ..................................................... 84 Table 4.2 Performance metrics across models without median smoothing. Same as Table 4.1 but without median smoothing .......................................................................................................................... 85 vi List of Figures Figure 1.1 Head-fixed task and performance ................................................................................................ 8 Figure 1.2 Motor strategy and its influence on patterns of touch ................................................................ 10 Figure 1.4 Mice discriminate location using more than touch count .......................................................... 13 Figure 1.5 Mice discriminate location using features correlated to azimuthal angle rather than radial distance .............................................................................................................................................. 14 Figure S1.1. Comparison of touch count + touch angle classifier performance versus touch count + each Hilbert component individually, Related to Figure 1.6 .................................................................... 18 Figure 2.1 Trial structure with spikes. Trial structure with example traces of recorded stimuli and spikes ........................................................................................................................................................... 32 Figure 2.2 L5B excitatory neurons encode a representation of self-motion during free-whisking ............. 33 Figure 2.3 L5B S1 excitatory units are tuned object location at touch ........................................................ 34 Figure 2.4 Lesioned mice can detect touches but not discriminate object location .................................... 35 Figure 2.5 Object location tuning does not require specialized training ..................................................... 36 Figure 2.6 Object location is decodable to <0.5 mm precision from touch-evoked spike counts ............... 37 Figure 2.7 Active touch unmasks a distinct population code for object position in Layer 5 of S1 ............. 38 Figure S2.1 Naïve vs trained animals comparison. ..................................................................................... 39 Figure 3.1 Recording S2 neurons during a head-fixed object localization with audio playback ................ 52 Figure 3.2 S2 neuron are sound responsive ................................................................................................. 
53 Figure 3.3 Touch and audio onset latency across depth. ............................................................................. 54 Figure 3.4 S2 neurons show complex and temporally dynamic responses to touch and sound .................. 55 vii Figure 3.5 Touch and audio responsive neurons are overlapping and concentrated in L5 and upper L6 ................................................................................................................................................................. 57 Figure S3.1 Touch response is modulated by time from pole up sound ...................................................... 65 Figure S3.2 Frequency balanced audio using iterative filter application .................................................... 70 Figure 4.1 Flow diagram of WhACC video pre-processing and design implementation ........................... 78 Figure 4.2 Touch frame scoring and variation in human curation .............................................................. 79 Figure 4.3 Data selection and model performance ...................................................................................... 82 Figure 4.4 Feature engineering and selection .............................................................................................. 88 Figure 4.5 – WhACC shows expert human level performance ................................................................... 90 Figure S4.1 WhACC curation GUI ............................................................................................................. 89 Figure 4.6 Retraining WhACC on a small sample of data can account for differences in datasets ............ 92 Introduction Working towards a deep understanding of how the brain executes computations and prioritizes information is of great value. This research can spark advancements in medical treatments, unveil overarching theories of information processing, and provide insights into how our minds form a persistent and coherent perception of our environment. Seminal research across numerous animal and human studies have laid a foundation for understanding how sensory and internal signals are represented, organized, and transformed through different stages of processing. Many of these studies were conducted in anesthetized and non-behaving awake animals. Despite their groundbreaking contributions, it has become clear that in order to attain a complete understanding of the brain, it is necessary to study neural representations during active behavior (Krakauer et al., 2017), since functional representations change based on the state of the animal (Niell and Stryker, 2010; Land et al., 2012; Haider et al., 2013; Vinck et al., 2015; McGinley et al., 2015; Shimaoka et al., 2018; Shumkova et al., 2021). However, studying animals during active behavior increases the complexity for multiple reasons. For one, behaving animals result in additional uncontrolled variables, which often need to be measured and quantified to meaningfully interpret neural responses. Additionally, these behavioral variables are not controlled by the experimenter, and so disentangling them can be difficult if they are correlated. Lastly, the complex brain-behavior relationship which is defined by the causal loop between the two; leads to further complications when determining if a given neural representation results from a given behavior (e.g. skilled motor movement) or if it is intrinsically computed independent of behavior (i.e. in a bottom up fashion). 
2 The rodent whisker system serves as an excellent model for studying functional properties of the brain during active behavior. Systems have been designed to film and quantify the position, mechanical forces, and contact times of whiskers (Maire et al., 2023). In the whisker primary sensory cortex (S1), each whisker is topographically represented, which can be precisely targeted using intrinsic signal imaging prior to collecting neural data. While we cannot control behavior directly, intricate behavioral paradigms have been designed, which grant experimenters some control based on the structure of the task. These same paradigms give some scope of the advanced whisker-mediated computations rodents can perform, many times using only a single whisker (Brecht et al., 1997; Mehta et al., 2007; O’Connor et al., 2010a; Helmchen et al., 2018; Cheung et al., 2019; Kim et al., 2020; Pacchiarini et al., 2020). These advantages combined with the rich literature on the whisker system, positions it as an effective model to study the brain- behavior relationship. An interesting whisker mediated behavior rodents employ is object localization, which has been studied behaviorally (Knutsen et al., 2006; Mehta et al., 2007; Curtis and Kleinfeld, 2009; O’Connor et al., 2010a; Cheung et al., 2019) and neurologically (O’Connor et al., 2010b; O’Connor et al., 2013; Cheung et al., 2019) in both free moving and head-fixed experimental conditions. Rodents’ ability to localize objects is even more interesting considering their lack of proprioception in their whisker follicles (Moore et al., 2015). Because of this, mice must solve single whisker mediated object localization tasks using an alternative strategy. Previous literature found that free moving rodents strategically orient their heads in addition to whisking when trained to differentiate the position of two objects (Knutsen et al., 2006). Despite this, mice can accurately locate an object when in head stable (Mehta et al., 2007) and head fixed (O’Connor et 3 al., 2010a; O’Connor et al., 2010b; Cheung et al., 2019; Cheung et al., 2020) conditions. How mice accurately localized objects using only one whisker is not known, but likely involves a combination of a behavioral motor strategy combined with somatosensation. Given this, single whisker object localization offers a great opportunity to study the brain-behavior relationship through the motor strategy and the somatosensory signals. In Chapter 1 we study the behavioral motor strategy mice employ to solve this task (Cheung et al., 2019). In Chapter 2, we examine how these neural signals are represented in S1 during this active behavior (Cheung et al., 2020). An essential aspect of forming a persistent and coherent perception of our environment, is integrating various streams of information and establishing their relationship to one another. One key example of this is multisensory integration. Many tasks designed in neuroscience involve multisensory integration even if the task itself wasn’t designed to study this specifically. For example, in whisker object localization, a pole is moved into reach of the rodents whiskers and in doing so a sound from either a motor or pneumatic valve provides a reliable cue that the task has started (Mehta et al., 2007; O’Connor et al., 2010a; O’Connor et al., 2010b; Cheung et al., 2019; Cheung et al., 2020). 
Similarly, we found that the motors used to move the pole in our task make distinct sounds, and without proper controls mice can even solve the task using the sound itself. Even the valve used to release a liquid reward makes a faint but distinct clicking sound within the hearing range of mice. In Chapter 3 we discuss multisensory interaction we observe in the mouse secondary somatosensory cortex (S2), which respond to touch as well as sound. Various software tools have offered a way to quantify whisker variables during active behavior. These tools have proven to be indispensable to advancing our understanding of both neurological 4 and behavioral aspects of the rodent whisker system. Tools to track whisker position and shape, allow researchers to derive velocity, angle, amplitude, phase, midpoints, bending and forces of the whisker (Knutsen et al., 2005; Voigts et al., 2008; Perkon et al., 2011;Lepora et al., 2011; Clack et al., 2012; Towal et al., 2011; Betting et al., 2020). However, one of the most important features to studying whisking behavior is the precise time of touch (Jadhav et al., 2009; Hires et al., 2015; Bale and Maravall, 2018). Despite this, no system has been designed to quantify this and as a result time of touch is a time consuming manual curation process that varies across different laboratories. In Chapter 4 we discuss a python package which addresses this called whisker automatic contact classifier (WhACC; Maire et al., 2023). WhACC offers a pre-trained deep learning hybrid model, which is customizable to new datasets, which can save thousands of hours of labor, as well as open the doors to collect and process arbitrarily large datasets. 5 Chapter 1: The sensorimotor and neural underpinnings of whisker-guided object localization behavior in head-fixed mice Introduction Active tactile perception is an essential animal behavior that integrates directed sensor movement against objects and mechanoreceptor signals from objects, to create mental representations of those objects. The rodent whisker system serves as an excellent model system for studying active tactile perception because whiskers are external, enabling scientists to determine contact times, contact based bending forces, position, and position derived variables (Clack et al., 2012). Mice actively move their whiskers (i.e., whisk), repeatedly contacting and sweeping over objects of interest to feel their environments, similar to how humans use their hands to navigate a dark room. The range and variety of whisker-based capabilities in mice and other rodents are extensive and cannot be exhaustively listed in this paper. However, a few notable examples include localizing objects in space (Cheung et al., 2019; Cheung et al., 2020), discriminating angles (Kim et al., 2020), quickly run while tracking a wall (Sofroniew et al., 2014), discriminate between different textures (Wu et al., 2013; Helmchen et al., 2018; Pacchiarini et al., 2020), discriminate shape (Brecht et al., 1997), facilitate social interactions (Rao et al., 2014; Soumiya et al., 2016; Ebbesen and Froemke, 2021) and more (Diamond et al., 2008). These abilities are supported by a proportionally large primary sensory cortex dedicated to the whisker pad which occupied approximately 40-50% of S1 (Hubatz et al., 2020). An interesting area of research in active sensing is object localization. 
A key concept in object localization is proprioception, which is based in stretch-based mechanoreceptors which relay 6 information about the position and movement of the body and limbs. Interestingly, the mouse whisker facial muscles lack proprioceptors (Moore et al., 2015). Despite this sensory deficit, and as mentioned above, rodents can discriminate objects in space precisely using their whiskers (Krupa et al., 2001), even while head-fixed (O’Connor et al., 2010a; Cheung et al., 2019). This begs the question as to how the brain and whisker sensory system localize objects despite this. To examine this behaviorally, a paradigm that offers precise experimental control and the capacity to measure whisker and behavioral variables is crucial. O’Connor et al. (2010a) designed an object localization paradigm to do exactly that; where mice are rewarded based on their ability to discriminate a thin pole in two different locations. This study revealed that mice can discriminate azimuthal localizations of about 6°, only one whisker is needed to differentiate between two positions, and lesions of S1 reduce performance to change levels. These experiments were a step forward in our understanding of object localization in mice, however much is still unknown. In our behavioral experiments we address some of these unknowns. Specifically, we investigate which sensorimotor features best predict mouse choice and which best predict actual object location. Additionally, we explore the types of motor strategies mice employ to solve the task. Also of great interest is how neural response properties of active touch localization are represented in the cortex. Using the same two position object localization task, O’Connor et al. (2010b) investigate the neural response properties in S1. They reveal that ~63% layer 4 and 79% of layer 5 neurons in S1 distinguish between locations. Further, they reveal that while most discriminating neurons were found in layer 4 and layer 5 but some were also found in layer 2/3. 7 This study emphasizes the advantage of studying neural activity in the context of active behavior as they each offer complementary information on one another. There are however some unanswered questions which we address in our experiments. We determine if location tuning is learning or innate. Next, we examine whether location neurons are tuned to specific locations that span all whisker angles or if they can only broadly differentiate between go and no-go positions. For instance, earlier studies identified location neurons using only two locations (O’Connor et al., 2010b). Therefore, the ability to distinguish between locations might be an indirect result of neurons being tuned to features correlated with location (e.g., the entire rewarded location, phase, midpoint etc.). Finally, we quantify how precisely the object's location can be decoded using location tuned neurons. Results Objection localization task and performance and strategy For our object localization task, we used a modified version of the go/no-go whisker-guided localization task in head-fixed mice mentioned above (O’Connor et al., 2010a). In our task instead of two discrete positions, a vertical pole was randomly presented in contiguous ranges of go (0-5mm) and no-go (5-10mm) positions along the anteroposterior axis of the animal, approximately 8mm distal from the whisker pad (Figure 1.1A). Mice were trimmed to a single whisker (C2) prior to training and water restricted so that a water reward motivated task engagement. 
Whisker position was automatically traced from 1000 fps video (Clack et al., 2012) and was used to derive velocity, bending forces, and times of contact (Figure 1.1B). 8 The trial structure started with the pole in the down position which was inaccessible to the mouse. Then the pole moved vertically into up position which was cued by the sound of a Figure 1.1 Head-fixed task and performance. A) Trained mice report the perceived location of a pole presented along the anteroposterior axis via licking (go) or not licking (nogo). B) Overhead view of tracked whisker for two trials. To eliminate variation from fur, azimuthal angle is determined at the intersection of mask and whisker trace. C) Trial structure with example imaging frames at top. Pole presentation is triggered 500ms from session start and takes ~200ms to come into reach. Azimuthal angle time-series for 15 consecutive trials are overlaid with the sampling period (750ms duration), answer period (1250ms duration) and licks. D) Possible trial outcomes based on pole presentation and mouse choice. E) The average reaction time for each individual mouse (gray circles) and the mean ± SEM for all mice (black circle). F) Learning rates for this task highlighting 7,000 trials before and 1000 trials after reaching expert (75% accuracy over 200 trials). Inset, number of trials required to reach expert for each mouse in gray and population in black (mean ± std 8194±1816 trials). G) Psychometric performance curves for individual mice (gray) and across the population (black) expert in the task (n=15 mice). Bars denote the mean number of pre decision touches prior to decision for go (blue) and nogo (red) trials. H) Performance between go-nogo pairs of bins with the max distance of 0.5, 1, 2, 3, 4 and 5 mm from the discrimination boundary. Circles denote individual mice. X denote mean ± SEM across the population. P- values comparing population hit trials to false alarm trials: 0.5mm p=9.2e-3, 1mm p=5.2e-13, 2mm p=1.5e-15, 3mm p=1.5e-15, 4mm p=1.5e-15, 5mm p=1.5e 15; 2-sample t-test. (t-stat, degrees of freedom: 0.5mm=2.6, 276; 1mm=7.4, 552; 2mm=20.0, 596; 3mm=21.8, 569; 4mm=26.4, 578; 5mm=31.8, 594) 9 pneumatic valve and initiating a sharp stereotyped whisking onset. Mice were given a 0.75s second sampling period, to explore the pole location where any lick responses were not considered. This was followed by a 1.25 second answering period where licking or withholding licking indicated the perceived pole location (Figure 1.1C). A total of four trials outcomes could occur, mice could lick during a go trial (hit) and collect a water reward, lick during a no-go trial (false alarm) and get a 2 second timeout punishment, withhold licking during a no-go trial (correct rejection), or without licking during a go trial (miss; Figure 1.1D). In all analyses, we only consider sensorimotor behavior (e.g., whisks and touches) that contributed to a decision by including only data before the decision lick. The decision lick is defined as the first lick in the answer period for lick trial and median of the decision lick time for no-lick trials. This cutoff excludes post-decision motor activity that is driven by rhythmic licking on hit and false alarm trials. The reaction time between the first touch in a trial and the decision lick was 736ms ± 240ms (741±249ms on hit trials and 690±243ms on false alarm trials; Figure 1.1E). 
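The pre-decision cutoff described above can be made concrete with a short sketch. This is an illustrative outline, not the lab's analysis code: the lick and touch times are hypothetical millisecond-resolution arrays, and the answer-period onset is taken from the trial structure described above.

```python
import numpy as np

ANSWER_START_MS = 1250  # answer period onset relative to trial start (see trial structure above)

def decision_times(lick_times_per_trial):
    """First lick in the answer period per trial; no-lick trials get the median decision time."""
    first_licks = []
    for licks in lick_times_per_trial:
        licks = np.asarray(licks, dtype=float)
        answer_licks = licks[licks >= ANSWER_START_MS]
        first_licks.append(answer_licks[0] if answer_licks.size else np.nan)
    first_licks = np.array(first_licks)
    median_decision = np.nanmedian(first_licks)          # decision point used on no-lick trials
    return np.where(np.isnan(first_licks), median_decision, first_licks)

def pre_decision_touches(touch_times, decision_time):
    """Keep only touches that could have contributed to the choice on this trial."""
    touch_times = np.asarray(touch_times, dtype=float)
    return touch_times[touch_times < decision_time]

# Reaction time on a trial is then the decision lick time minus the first pre-decision touch time.
```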
Mice take on average eight full training sessions or 8194±1816 trials to reach expert performance which we defined as more than 75% for the average of 200 continuous trials. Each Training session was 485±179 trials on average. Lick probability was statistically different than chance for go and no-go trials when the pole was presented ≤1mm(3.8±0.5° mean angle difference, 29%±11% lick difference, p=4.4e-5, 2-tail t- test) or ≤0.5mm (1.9±0.4° mean angle difference, 18%±20% lick difference, p=0.03, 2-tail t-test) from the decision boundary (Figure 1.1G, H). This shows that despite the lack of proprioceptors, mice can discriminate object location with submillimeter precision. 10 Upon the loud cue sound created by the pneumatic valve, mice initiate a large amplitude whisk that aligns with the cue (Figure 1.2A). This transition from a stable whisker position to a cue driven onset is low latency and shows little variation (60ms±16ms; Figure 1.2B). We divided whisks into cycles and then defined each cycle as a single whisk. We examined whisks in trials with touch and trials without touch, then compared the number of whisks before touch or before median Figure 1.2 Motor strategy and its influence on patterns of touch. A) Heat map of whisking amplitude for one mouse. Trials are sorted with first at the bottom and grouped by trial outcome. White dots are time points of first touch. Magenta circles show time points of first lick after onset of pole presentation. B) Whisking amplitude relative to time of pole onset for each mouse (gray) and average for all mice (black). Mean ± std of whisking onset from cue is 60ms±16ms C) Left, population distribution for the number of whisks before first touch. Right, population distribution of the number of whisks after first touch and before decision. For no-touch trials, the median first touch time for that mouse was used. Distribution difference is quantified using Kullback-Leibler divergence (see Methods). D) Mean ± std of the peak protraction relative to the discrimination boundary for each whisk in a go (blue) or nogo (red) trial before decision. E) Proportion of trials with touch for each mouse. Bars = SEM F) Trial type prediction performance of a logistic classifier based on touch presence compared to each mouse’s trial type discrimination performance. Bars = SEM. G) The proportion of go or nogo trials in which licking occurs conditioned on whether touch occurred on that trial. Bars = STD. 11 touch time respectively. Pre-touch whisks did not differ across touch and no-touch trials (Kld 0.04; Figure 1.2C), which shows that failure to touch the pole is not due to the failure to whisk. However, the number of post-touch whisks is significantly greater in touch trials (6.5±3.2) compared to non-touch trials (2.5±2.1, Kld 1.12; Figure 1.2C). This shows that mice adapt their whisking strategy based on if the pole is encountered, perhaps to gather more information on the location by sampling more. Whereas trials without touch are treated more like a detection task, trials with touch require more information to discriminate subtle differences in location. On average, mice direct their first whisk so the peak of the protraction stops just as it is about to leave the rewarded go position, followed by a second whisk directly at the decision boundary (Figure 1.2D). If this was consistent on a trial-by-trial basis, this would be an effective way to solve this task, essentially turning it into a detection task. 
However, this was not the case: the variance was large (10.9° mean stdev), suggesting that mice use a noisy motor strategy. This strategy led to mice touching the pole on 94.6±1.5% (SEM) of go trials and 54.9±6.1% (SEM) of no-go trials (Figure 1.2E). Using the presence of touch alone, a logistic classifier performed at 70.5%±9.5% accuracy, significantly less than mice (81.2±5.7%; Figure 1.2F). Finally, comparing only touch trials, mice were much more likely to lick on go trials, confirming that touch alone is not enough to explain mouse performance (Figure 1.2G). Taken together, this shows that mice must be using more than just the presence of touch (i.e., detection) to determine pole location and can discriminate between touch trials using additional sensorimotor features (e.g., touch count, touch angle, touch strength).

Sensorimotor features that best predict trial type and mouse choice

Next we built a logistic classifier to predict trial type from each sensorimotor feature. Features that fail to predict trial type are likely of no use to the mouse for identifying trial type: because our spatiotemporal features are high resolution with negligible noise, if these near-perfect measurements cannot predict trial type, then even if mice do have access to these features, they would not be useful for making a decision. The only caveat would be if features showed interactions. Conversely, if a feature accurately predicts trial type, this suggests the mouse could be using it, but by no means does it show that it is being used.

Figure 1.3 The distribution of sensorimotor features and their utility for predicting trial type and choice. A-F) Distribution of six sensorimotor features on go and nogo trials associated with six localization models for one example mouse. Lines show optimal logistic classifier for discriminating trial type from feature distribution. G) Trial type prediction performance of logistic classifiers for all mice based on each of the six features. Bars = SEM. Touch trials only. H) Choice prediction performance of logistic classifiers for all mice trained on pole position, each of the six features or all six features combined. Bars = SEM. Touch trials only. Significance based on Wilcoxon signed rank vs. shuffled models.

The results from an example mouse show that, as expected, radial distance and angle at touch predict trial type nearly perfectly because they are derived from the location of the pole, which itself defines trial type (Figure 1.3E, F). The other four features have only some predictive power (Figure 1.3A-D). When examining all mice, we use the Matthews correlation coefficient (MCC) to account for unbalanced data. These results show that radial distance (MCC 0.98+/-0.02; accuracy 98.9%+/-0.1%) and angle at touch (MCC 0.93+/-0.04; accuracy 97.0%+/-1.4%) still predict near perfectly, while touch count performs next best (MCC 0.43+/-0.07; accuracy 77.4+/-2.1%). Cue latency also performed statistically different from chance (MCC 0.33+/-0.15; accuracy 73.1%+/-6.9%), emphasizing how stereotyped whisking is from the cue sound up until touch onset.

Figure 1.4 Mice discriminate location using more than touch count. A) Population average of touch count distributions and associated lick probabilities for all mice in go (blue) and nogo (red) trials. (P-values for 0:5 touches = 0.64, 4.4e-4, 1.7e-3, 1.1e-4, 7.0e-4, 5.8e-3; 2-tailed pair t-test.
[t-stat, degrees of freedom: 0 touches=0.48, 13; 1 touch=4.9, 11; 2 touches=3.9, 13; 3 touches=5.3, 4; 4 touches=4.4, 13; 5 touches=3.5, 10]) B) Touch count influence on licking controlled for pole position. Number of touches normalized to mean number of touches for each pole position plotted against lick probabilities for go (blue) and nogo (red) trials. Lick probabilities are shown as mean ± 95% confidence intervals. 14 We further investigated the effect of touch count on mouse choice by comparing lick probability for go and no-go trials where the number of touches were the same (Figure 1.4A). We find that when touch count is between one and six, average mouse lick probability is different for go and no-go. On the contrary, as touches increase above six, mice tend towards always licking, and so the lick probability for go and no-go trials become indistinguishable. On one hand, this seems strange, as more touches should theoretically lead to more information and therefore better performance. On the other hand, no-go pole positions closer to the decision boundary are more likely to be touched more times, and no-go performance at these Figure 1.5 Mice discriminate location using features correlated to azimuthal angle rather than radial distance. A) Task design. After 120 trials of anteroposterior pole presentation, angle or distance trials were presented with 50% probability. B) The angle presentation positions (blue) held distance to the discrimination boundary constant while varying azimuthal angle across the anteroposterior task range. The distance presentation positions (cyan) held azimuthal angle fixed to the discrimination boundary angle while varying distance across the anteroposterior task range. Go positions spanned a range of 31+/-1.6° or 8-10mm distant while nogo positions spanned 19+/-3° or 10-13mm distant. C) Mean psychometric performance curves ± SEM for each class of trials across the population (n=5 mice, 15 sessions). D) Mean accuracy ± std across the population for each task. The mean accuracy for the angle trials was not significantly different from the anteroposterior (p = 0.26; one-way ANOVA). Distance trial performance was at chance, and significantly different from the anteroposterior and angle task (anteroposterior p=9.6e-10, angle p=9.5e-10; one-way ANOVA [F-value, degrees of freedom = 96.5, 36]). 15 locations is also worse because it is harder to distinguish smaller location differences (Figure 1.1G). To investigate this further, we compared the difference between actual and average number of touches for that pole location (Figure 1.4B). Both no-go trials with less location- controlled touches than average and go touches with more, show only modest increase in performance. On the contrary, no-go trials with more location-controlled touches than average, show a sharp increase in lick probability. Similarly, a steep decrease in lick probability is observed for go trials with less location-controlled touches on average. Taken together this suggests touch count is an integral component for how mice estimate object location while head fixed, but this cannot explain their performance entirely. Along with touch count, both radial distance and touch angle were the other two features with high importance for predicting mouse choice. These features were nearly perfectly correlated because they were dependent on the location of the pole. Because of this we could not effectively identify which was more important for choice using only analytical techniques. 
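To see why these two features resist a purely analytical dissociation, consider the task geometry: with the pole presented along a single anteroposterior range at a roughly fixed lateral offset from the whisker pad, both radial distance and azimuthal angle are monotonic functions of the same pole coordinate. The sketch below illustrates this; the follicle-at-origin layout and the 8 mm lateral offset are idealized assumptions for demonstration, not measured values.

```python
import numpy as np

lateral_mm = 8.0                       # assumed lateral offset of the pole from the follicle
ap_mm = np.linspace(0.0, 10.0, 200)    # anteroposterior pole positions across the task range

radial_distance = np.hypot(lateral_mm, ap_mm)           # follicle-to-pole distance per position
angle_deg = np.degrees(np.arctan2(ap_mm, lateral_mm))   # azimuthal angle of the pole per position

# Both quantities increase monotonically with the same coordinate, so their
# correlation is close to 1, and a classifier cannot attribute choice to one
# over the other; the angle/distance behavioral task below resolves this instead.
r = np.corrcoef(radial_distance, angle_deg)[0, 1]
```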
To address this, we designed a variation on the localization task in which the standard anteroposterior location was broken into its component parts. The first was angle, which presented the pole in an arc to keep the distance to the follicle the same. The second was distance, which moved along the range of possible distances from the original task but kept the angle of the pole stationary at the decision boundary angle (Figure 1.5A-B). Mice were trained on the original localization task until they reached expert performance. For three sessions following expert level, mice performed the original localization task for 120 trials to establish a consistent context. Following this, mice were randomly presented with either angle or distance trials with 50% probability. Psychometric performance curves for angle trials were indistinguishable from anteroposterior trials across all mice, but fell to chance level for distance trials (n=5 mice, 15 sessions; Figure 1.5C and 1.6D). This conclusively demonstrates that, in our localization task, mice use whisker angle at touch to discriminate the pole position, not the radial distance.

Considering that mice lack proprioception (Campagner et al., 2016), and that their primary sensory afferents code for phase but not for whisker angle (Fee et al., 1997; Campagner et al., 2016; Severson et al., 2017; Severson et al., 2019), what accounts for the improved performance beyond touch count, which is best predicted by whisker angle? One possibility is the Hilbert recomposition model (Kleinfeld and Deschênes, 2011), according to which S1 receives phase information via sensory afferents as well as midpoint and amplitude from M1 (Hill et al., 2011), which are combined to reconstruct the whisker angle at touch. This model utilizes the Hilbert transform, which can reconstruct angle through a linear combination of phase, amplitude, and midpoint (Figure 1.6A-B). We tested whether this model was consistent with mouse choice by training a classifier to predict choice using all three features (Figure 1.6C) and showed that it performed similarly (MCC 0.51±0.17) to the whisker angle model (MCC 0.48±0.19; Figure 1.6D). Note that because phase is a periodic variable, it would perform poorly if fed directly into a logistic classifier; for this reason we used only protraction touches to construct these models (89.9%+/-5.9% of the touch trials). To further dissect which of these components best predicted mouse choice, we constructed classifiers for each component alone. All these models performed similarly well but worse than

Figure 1.6 Choice can be best predicted by a combination of touch count and whisking midpoint at touch. A) Time varying azimuthal angle can be transformed to the Hilbert components amplitude, midpoint, and phase. Example exploration bout for go (blue) and nogo (red) trial. B) Average autocorrelation across all mice for angle, amplitude, midpoint and phase. C) Choice prediction space for one mouse using Hilbert features. D) Classifier performance measured using MCC between angle and Hilbert features (p=0.76; Wilcoxon signed-rank test). E) Performance (MCC) of classifiers trained with individual model components versus angle at touch. Significant differences: angle to phase (p=1.5e-2), amplitude (p=6.7e-3), and midpoint (p=1.2e-2). N.S. differences: phase to amplitude (p=0.23), phase to midpoint (p=0.80) and amplitude to midpoint (p=0.52). All compared using Wilcoxon signed-rank test.
F) Performance (MCC) of classifiers trained with individual model components plus touch count, versus angle at touch plus touch count. Significant differences: angle to phase (p=3.4e-3) and amplitude (p=2.0e-3). N.S. differences: phase to amplitude (p=0.64), phase to midpoint (p=0.19), amplitude to midpoint (p=0.12) and angle to midpoint (p=0.64). All compared using Wilcoxon signed-rank test. G) Heatmap of one sorted session task structure, sensorimotor inputs, classifier predictions, and mouse choice. Continuous variables (pole position, touch count, midpoint at touch, angle at touch, midpoint+touch count choice prediction, and angle+touch count choice prediction) are normalized from minimum (-1) to maximum (+1). NaN data is gray. Categorical variables (trial type, primary touch direction, mouse choice) are colored as in the legend. H) Comparison of midpoint+touch count and angle+touch count classifiers for all trials across individual mice (gray) and the population mean±std (turquoise). (p=0.64; Wilcoxon signed-rank test). Black arrow denotes mouse shown in example in G. I) Psychometric curves for optimal trial type discrimination performance using midpoint+counts and angle+counts compared against mouse choice for example mouse in G. J) Comparison of discrimination resolution between optimal trial type classifiers and mouse performance from Figure 1.1H. Shading denotes distance from discrimination boundary. 18 the angle model (Figure 1.6E). We then built models for each component combined with touch count. Adding the touch count increased performance for all three component models as well as for the angle model, indicating that the touch count provides complementary information about mouse choice (Figure 1.6F). Midpoint + touch count performed the best among the three and was the only one not statistically different from angle + touch count (MCC 0.59±0.04 SEM). To gain a comprehensive understanding of how features predict choice across all trials, we built a model which included all trials using a two-step process. First, for the no-touch trials, we used touch as the only predictor (i.e., always 0), since the sensorimotor features at touch were undefined for these trials. This model invariability and expectedly predicted ‘no lick’ for all no touch trials. On the remaining touch trials, we built a model using angle + touch count and another using midpoint + touch count which performed equally well on average (angle + count 0.72±0.04 SEM, accuracy 87.5±4.7%; midpoint + count MCC 0.71±0.03 SEM, accuracy 87.2%±3.9%; Figure 1.6G). These models also performed similarly to one another in individual mice (MCC r2 = 0.87; Figure 1.6H). Amplitude or phase with touch count also performed well, but significantly worse than angle with touch count (Figure S1.1). Figure S1.1. Comparison of touch count + touch angle classifier performance versus touch count + each Hilbert component individually, Related to Figure 1.6. 19 We wanted to further investigate the relationship between touch count, whisker angle at touch, and mouse performance. To do this we built a midpoint + touch count model and an angle + touch count model which were trained to predict trial type, but then evaluated how well they predicted mouse choice. This was effectively like using the model as a proxy for the mouse itself, since mice are attempting to do the same thing, albeit with an unknown level of access to these sensorimotor features. 
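The proxy analysis just described (fit on trial type, then ask how well the same model explains what the mouse actually did) can be sketched as follows. This is an illustrative scikit-learn outline rather than the lasso-regularized MATLAB classifier used in this work; the feature matrix X and the 0/1 label vectors y_type (go = 1) and y_choice (lick = 1) are assumed inputs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import matthews_corrcoef

def proxy_model_mcc(X, y_type, y_choice):
    """Fit on trial type (go/no-go), then score the same predictions against mouse choice.

    y_type and y_choice must use the same 0/1 coding (go = 1, lick = 1) so that
    predicted trial type can be compared directly to the animal's choice.
    """
    model = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
    model.fit(X, y_type)
    predicted = model.predict(X)
    return {
        "trial_type_mcc": matthews_corrcoef(y_type, predicted),   # how well the features define the task
        "choice_mcc": matthews_corrcoef(y_choice, predicted),     # how well they explain the mouse
    }

# e.g. proxy_model_mcc(np.column_stack([midpoint_at_touch, touch_count]), y_type, y_choice)
```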
Both models predict trial type well but the midpoint + touch model best explains psychometric curves in 14/15 mice (Figure 1.6I). The same was true for the discrimination resolution; midpoint + touch count better explained mouse performance compared to angle + touch count (Figure 1.6J). Taken together, we find that mice use a targeted, noisy, and adaptive whisking strategy to localize objects. Touch count is accessible to the mouse and is a strong predictor for mouse choice. Finally, midpoint + touch count best predicts mouse choice and are therefore the most likely candidate features mice use to solve head fixed localization. Discussion We analyzed mice performing an active object localization task and revealed they can discriminate between submillimeter locations above chance (≤0.5mm and <2°; Figure 1.1). They adapt a strategy that begins with a stereotyped cue-initiated whisking onset followed by a touch dependent adaptive strategy (Figure 1.2). Further, sensorimotor features can predict the trial type (Figure 1.3). Despite radial distance and angle at touch both being near perfect predictors, our angle and distance behavioral task proved that mice were only using angle at touch (Figure 1.5). Given that mice likely don't have direct access to angle at touch, investigated if touch count alone could explain lick probability and found that it could not (Figure 1.4). Next we demonstrated how the three Hilbert components could predict choice; both with and without 20 touch count, and showed that touch count provided complimentary information (Figure 1.6). Finally, we found that midpoint + touch count best matches animal psychometric curves which suggests that mice could use this method to solve the task. We note some limitations to our study. Torsional roll angle was calculated using only a single overhead view of the mouse whisker, and we also did not account for how it may be affected by inertial bending during whisking. Despite this, previous experiments in rats showed a tight linear relationship between overhead view whisker curvature and roll angle (Knutsen et al., 2008), which provides support for our methodology. We observed that mice had a higher false alarm rate when they made more touches (Figure 1.4), which seems to be at odds with evidence accumulation models, whereby more samples lead to more information which should result in a non-negative impact on performance. For example, rats performing a texture discrimination task show a positive relationship between the number of touches and correct choice (Zuo and Diamond, 2019). In this example though, rats were freely moving. This is important since another study found that rats trained to discriminate the closest of two poles (on opposite sides of their face), showed that they consistently orient their head towards the closer pole and did so in a way that is proportional to the distance between poles (Knutsen et al., 2006). This suggests that head position and head movement are important for detecting location in free moving conditions. Given this, one interpretation of our result is that evidence accumulation can decrease performance if the system collecting that evidence is not designed to measure that variable. In other words, mice are using multiple sources of information as a proxy for object location since they lack inherent whisker location information (i.e., 21 proprioception). This is consistent with mice employing a noisy whisking strategy where an approximate number of touches are expected for each pole location. 
By relying on this strategy too heavily, mice leave themselves open to incorrect assessments when their noisy whisking strategy delivers more touches in a no-go position, or less touches in a go position (Figure 1.4B). As we discussed thoroughly, midpoint combined with touch count best predict mouse choice among all the features examined. Despite this it is important that future analysis also consider a model containing all features which drive strong neural responses along the feedforward somatosensory pathway and their derivatives. For example, because the phase + touch count model still performs well (Figure S1.1), we should consider adding whisker force or bending as a feature. Force or bending responsive neurons exist in the sensory afferent (Campagner et al., 2016; Severson et al., 2017; Severson et al., 2019; Fee et al., 1997) and could provide complimentary information to phase and touch. Further, whisk latency was examined alone (Figure 1.3G-H) but could be examined as part of the aforementioned model to determine if it could provide complimentary information as well. Finally, S1 has access to touch onset and offset information and length of a touch can be derived from this information. Length of a touch could carry considerable information about object location given an appropriate whisking strategy. S1 has access to all these features and so it is important to consider these features as part of a complete model. Another interesting analysis which is related to Figure 1.6I-J, would be to train two models for each feature: one that predicts trial type (go vs no-go) for correct trials (Model A) and another that predicts trial type for only incorrect trials (Model B). If mice use a given feature to make a 22 choice, then it stands to reason that this feature will covary with performance, therefore by comparing the net increase in information from model B to model A, we can infer which features mice are using to make decisions. For example, we know that using whisker angle at touch can predict trial type almost perfectly, and so in this case, model A and model B would likely be nearly identical, resulting in a value near zero for this analysis. This is subtly but distinctly different from what we have done already, and this is exemplified in the previous sentence where whisker angle at touch would be assigned a value of zero, while in all other analyses we have done thus far, it was always an important feature. Further, in this analysis if somehow model A was less predictive of trial type than model B, it would be a very strong indicator that mice are not using this feature. Materials and Methods Experimental model and subject details Fifteen VGAT/ChR2/EYFP mice (JAX B6.Cg-Tg), both male and female, of at least 3 months of age were used for the following experiments. A complete description of head-plate installation, water restriction procedure and behavioral apparatus has been described in previous work (O’Connor et al., 2010a). Following head-plate installation, mice were housed with littermates and singly housed if fighting occurred. Mice were provided food ad libitum. 7 days prior to training, mice were water restricted to 1mL of water per day. During this period, a daily health and weight assessment was completed to ensure mice were healthy. All procedures were approved under USC IACUC protocols 20169 and 20731. 23 Object localization task Mice were trained in a whisker-based go/no-go localization task. 
Using a single whisker (C2), mice learned to identify a smooth 0.6mm diameter pole 7-12mm lateral from the whisker pad as either a posterior rewarded location (go) or anterior unrewarded location (no-go). Pole positions were presented across a continuous range of 10mm along the anteroposterior axis with a go/no- go discrimination boundary at the center of this range. The pole was positioned by a pair of stepper linear actuators with 99nm resolution, 25µm accuracy and <5µm repeatability (Zaber NA11B30-T4). To avoid potential pole motion duration clues to position, between trials the motors first moved to the discrimination boundary then to the presentation location. To avoid potential ultrasonic clues associated with stepper motor function, the pole location was randomly jittered 0-127 microsteps (0-25µm) on each trial. The pole was vertically lifted into reach by a pneumatic linear slider (Festo) which also provided a sound cue on pole presentation onset. The position of this slider and the valve, and thus the location and amplitude of this cue sound, is fixed for all trials, confirmed by audio recording with an Earthworks M50 ultrasonic microphone. Mice made their decisions by licking or withholding licking to an electrical port during stimulus presentation. 4 trial outcomes were available: hit and miss or false alarm and correct rejection by licking or not licking on a go or no-go trial. On hit trials, a water reward (4- 8µL) was dispensed. The total amount of water dispensed of the session was limited only by the number of trials the mice chose to perform. False alarm trials led to a 2 second timeout that reset upon each lick. Correct rejection and miss trials were unpunished. Each trial was 4000ms or longer. The pole was triggered to rise at 500ms from trial start and came into touch range within ~200ms. The sampling period was 0-750ms after pole onset. Licking within this time block had no effect. The answer period was 1250-2000ms. Licking 24 within this time block led to Hit or False Alarm outcome. Licking in this time also prolonged the period of pole presentation to provide the opportunity for additional sensory feedback to help learning. The extended presentation time does not affect any analyses since only pre-lick touches are considered in this work. The inter-trial interval was 2000ms. To quantify learning rates all sessions leading up to the expert session were used, excluding one to two rig acclimation sessions. Expert threshold was set at >75% accuracy smoothing across 200 trials. Training 15 mice were trained in the object localization task. In the first sessions, the farthest go position was set ~30 degrees anterior of the resting whisker position. Optimal learning was achieved by first setting a gap between go and no-go ranges and slowly reducing that gap as performance improved. The initial gap set between go and no-go ranges were 4mm. Once mice reached >75% accuracy over 200 trials, this gap was reduced in 1mm increments till the go and no-go ranges were contiguous, with their shared border defined as the discrimination boundary. Five expert mice in the object localization task were tested on the angle/distance task. Angles and distances were calculated from the estimated follicle position at the discrimination boundary to the full range of pole positions in the object localization task. During the angle/distance task, 120 trials of the object localization task were first presented to establish baseline performance levels. 
Next, angle trials or distance trials were presented at 50% chance levels for the remainder of the session. Whisker motion acquisition and analysis Whisker behavior was captured for 4 seconds spanning the period prior to pole onset to response window. Video frames were acquired at 1000fps using Basler acA200-340kmNIR camera and 25 Edmund Optics 0.18X ½" GoldTL™ Telecentric Lens (Model # 52-258) under 940nm illumination on Streampix 6 software. Whisker position was tracked using Janelia Whisker Tracker (https://www.janelia.org/open-science/whisk-whisker-tracking; (Clack et al., 2012)). A mask was traced from the edge of the fur and whisker follicle was estimated 1mm back from the mask. The whisker’s azimuthal angle was quantified at the point of intersection of the mask and whisker trace, to avoid tracking noise in the fur. Whisking midpoint, amplitude and phase was decomposed from this angle using the Hilbert transform. Hilbert decompositions were calculated from band-pass filtered (6-60Hz, Butterworth) whisker angle time-series. Whisking amplitude is defined as the magnitude of the Hilbert transform of the filtered whisker angle. Whisking midpoint is defined as the filtered (6-60Hz) difference between the raw whisker angle time-series and the band-pass filtered signal. Whisking phase is defined as the phase angles of the Hilbert transform of the filtered whisker angle time-series. Whisker curvature was measured at 3-5mm out from the mask. The precise millisecond of touch was determined through custom MATLAB software (https://github.com/hireslab/HLab_Whiskers) using distance to pole and change in whisker curvature, followed by manual curation of images of uncertain whisker and pole intersections. Quantification and statistical analysis In all analyses, we considered only whisker motion and touch before the decision lick, the first lick of the answer period. On trials without licking, the median decision lick time on lick trials was used as the decision point. Licks before the answer period were ignored. To minimize the effects of change internal states of motivation, attention, satiety or frustration, the set of the 200 highest performing contiguous trials in a single session per mouse was used for all analyses. 26 Trials (0-15) where the animal was grooming or the video dropped 1 or more frames were removed from this set of 200. Adaptive whisking analyses Pre-touch windows were defined as the time from stimulus onset to first touch. Post-touch windows were set as time of first touch to the first lick. If no first touch or first lick was present, the median first touch time or median first lick time of the session was used. A whisk is defined as the number of whisking peaks with a whisking amplitude of 5 or greater. The difference in distributions is quantified using Kullback-Leibler divergence from using kl_div from Mathworks (https://www.mathworks.com/matlabcentral/fileexchange/20688-kullback-leibler-divergence). Trial type and choice prediction Retraction and protraction touches occur with ~pi radian offset in phase, which makes phase difficult to express as a linear function. Therefore we excluded retraction touches and trials with exclusively retraction touches for the Hilbert transform decoders (Fig 7C-F). For all other analysis retraction touches were included. 
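The Hilbert decomposition defined in the whisker motion analysis above can be reproduced with standard signal-processing tools. The sketch below assumes a 1000 Hz whisker-angle trace and mirrors the stated definitions (6-60 Hz Butterworth band-pass; amplitude as the magnitude and phase as the angle of the analytic signal); treating the midpoint as the raw angle minus the band-passed component, and the filter order, are assumptions of this illustration, not a transcription of the lab's MATLAB pipeline.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

FS = 1000.0  # frames per second of the whisker video

def hilbert_decompose(angle_deg, low=6.0, high=60.0, order=2):
    """Decompose a whisker angle trace into whisking amplitude, midpoint, and phase."""
    b, a = butter(order, [low / (FS / 2), high / (FS / 2)], btype="bandpass")
    whisking = filtfilt(b, a, angle_deg)      # 6-60 Hz whisking component
    analytic = hilbert(whisking)              # analytic signal of the band-passed trace
    amplitude = np.abs(analytic)              # whisking amplitude
    phase = np.angle(analytic)                # whisking phase, -pi to pi
    midpoint = angle_deg - whisking           # slowly varying set point (approximation, see note above)
    return amplitude, midpoint, phase

# Synthetic example: a 12 Hz whisk riding on a slowly drifting set point.
t = np.arange(0, 2, 1 / FS)
angle = 10 * np.sin(2 * np.pi * 12 * t) + 5 * np.sin(2 * np.pi * 0.5 * t) + 70
amp, mid, phs = hilbert_decompose(angle)
```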
The list of features used to predict trial type (go/no-go) or choice (lick/no lick), and their descriptions, are:
• motor position (the horizontal motor position in microsteps for each trial)
• touch presence (the presence or absence of a touch pre-decision)
• touch counts (the number of touches pre-decision)
• roll angle (the mean whisker curvature 1ms prior to touch for each trial)
• whisk latency (the mean time in milliseconds from the nearest whisking trough prior to touch for each trial)
• cue latency (the time of first touch from cue onset in milliseconds)
• radial distance (the mean radial distance from the follicle at touch to the pole position for each trial)
• angle (the mean whisker angle at touch for each trial)
• phase (the mean phase of the whisker at touch for each trial)
• amplitude (the mean amplitude of the whisker at touch for each trial)
• midpoint (the mean midpoint of the whisker at touch for each trial)
• combined (curvature, cue latency, whisk latency, touch counts, radial distance and angle for each trial)
• Hilbert decomposition (phase, amplitude, and midpoint)

For features using multiple predictors, each predictor was mean normalized using the following equation:

x' = \frac{x - \mathrm{mean}(x)}{\max(x) - \min(x)}

The logistic classifier was adapted from Andrew Ng's Machine Learning Course (https://www.coursera.org/learn/machine-learning) and modified to include lasso regularization.

Sigmoid link function:

h_\theta(x) = g(\theta^{T} x), \qquad g(z) = \frac{1}{1 + e^{-z}}

Cost function:

\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}

J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left[ -y^{(i)} \log\!\left(h_\theta(x^{(i)})\right) - \left(1 - y^{(i)}\right) \log\!\left(1 - h_\theta(x^{(i)})\right) \right] + \text{regularization}

L1 (lasso) regularization term:

\lambda \sum_{j=1}^{N} |\theta_j|

where λ is the regularization parameter, θ are the partial regression coefficients of the model and N is the number of parameters.

Gradient (partial derivative of the cost function):

\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}

The cost function was minimized with the fmincg MATLAB function, whose inputs are the cost J(θ) and the gradient ∂J(θ)/∂θ_j.

Classifier model evaluation

For each set of features the optimal regularization parameter λ, classifier performance and partial regression coefficients θ were evaluated across 20 iterations with 5-fold stratified cross-validation. The optimal λ was chosen as the mean λ value between the peak λ and the first λ one SEM away from the peak. Classifier performance was calculated using Matthews correlation coefficient (MCC). MCC provides an unbiased metric of model performance for imbalanced datasets (Boughorbel et al., 2017). MCC values range from 1 to -1, with 1 meaning perfect model performance, 0 meaning chance, and -1 meaning all predictions are errors. The MCC was calculated using the following equation:

MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}

where TP are true positive, TN true negative, FN false negative and FP false positive predictions. To interpret the weights of the logistic classifier, partial regression coefficients were converted to odds ratios using the following equation:

\mathrm{Odds\ Ratio} = e^{\theta}

Odds ratios were normalized between 0 and 1, multiplied by their respective sign for each cross-validation step, and averaged to calculate the normalized weight of each feature in prediction.
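As a minimal MATLAB sketch of the classifier evaluation above (variable names are assumed; yTrue and yPred are binary labels and predictions from one cross-validation fold, and coeffs are the fitted partial regression coefficients; the sign used for the normalized weights is our reading of the Methods):

% Matthews correlation coefficient from one fold of predictions
TP = sum(yPred == 1 & yTrue == 1);
TN = sum(yPred == 0 & yTrue == 0);
FP = sum(yPred == 1 & yTrue == 0);
FN = sum(yPred == 0 & yTrue == 1);
mcc = (TP*TN - FP*FN) / sqrt((TP+FP) * (TP+FN) * (TN+FP) * (TN+FN));

% Convert coefficients to odds ratios and 0-1 normalized, signed weights
oddsRatio = exp(coeffs);
w = (oddsRatio - min(oddsRatio)) / (max(oddsRatio) - min(oddsRatio));
normWeight = w .* sign(coeffs);   % averaged across folds in the full analysis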
Chapter 2: An object location code in whisker S1

Introduction

Evaluating neural representations across the cortex provides valuable insight into how the brain organizes and prioritizes information to understand and operate in the environment. Historical studies investigating sensory pathways have provided accessible interpretations of the progression of increasingly complex and abstracted sensory representations. However, it is becoming more apparent than ever that these functional representations are modulated by anesthesia (Land et al., 2012; Haider et al., 2013; Shumkova et al., 2021) and behavioral state (Niell and Stryker, 2010; Vinck et al., 2015; McGinley et al., 2015; Shimaoka et al., 2018), emphasizing the importance of understanding neural representations during behavior. Here we evaluate the neural representation of the object location task described in Chapter 1. Specifically, we use single-unit juxtacellular recordings from L5B of the C2 barrel column of mouse whisker S1. More than half of all active neurons encode free-whisking angle and one third encode object location at touch. Surprisingly, these populations are mostly non-overlapping, and for neurons modulated by both, their tuning curves are uncorrelated. These neural representations exist independent of training. Finally, neurons can be pooled to decode object location at a level equal to or exceeding expert mouse performance.

Results

Experimental design

Water-restricted mice (n=16 Vgat-ChR2-EYFP) were trained on the behavioral task outlined in Chapters 1 and 2 (Figure 1.1A). High speed 1000 fps overhead video recorded whisker motion, which was traced (Figure 1.1B) to extract whisker motion, motion-derived variables and whisker touch. We recorded from single units using juxtacellular loose-seal patch pipettes targeted to L5 in the C2 barrel column of S1 (Figure 2.1). Optogenetic tagging identified whether neurons were excitatory or inhibitory, based on short latency spiking in response to illumination of S1 with 473nm light. We recorded from 156 single units; 20 units were silent and 14 were putative interneurons, leaving 122 active excitatory neurons. Using these neurons, we examine which sensorimotor features best describe the neural activity to gain a better understanding of how they are represented in S1-L5. Unless otherwise noted, all further analysis is performed on the 122 active excitatory neurons.

Representation of self-motion

We first investigated the neural correlates of self-motion during free-whisking, when no touch has occurred. A total of 96 of 122 neurons were significantly modulated by whisking (Chi-squared test). On average, neurons increased from 5.0 ± 5.6 spks/s during non-whisking to 6.0 ± 7.1 spks/s during whisking (mean ± SD; Figure 2.2A). Whisking was volitional and therefore varied across mice. Within the observed ranges of whisking, 60 neurons were significantly modulated with respect to whisker angle and 46 were tuned to phase (Figure 2.2B-D). We found no neurons tuned to phase alone, and most angle tuned neurons were also tuned to phase (46/60) (Figure 2.2E). Whisking phase neurons evenly tiled phase space, while angle tuned neurons skewed towards edge locations, especially posterior ones (Figure 2.2C-D). We compared the modulation depth of neurons tuned to at least one feature (Figure 2.2F) and found that the modulation depth of angle tuned neurons was significantly greater than that for phase (p = 8.9e-6, Figure 2.2F-G; Methods).
Of the 60 tuned neurons, the modulation depth of angle most strongly correlated with that of midpoint compared to phase, amplitude, or velocity (Figure 2.2H-I). Furthermore, the absolute modulation depths of angle and midpoint were also the most similar, although these distributions were largely overlapping (Figure 2.2J). These findings are consistent with the importance of midpoint in predicting choice (Cheung et al., 2019), and point to a potential role of midpoint in constructing angle-tuned representations. Further, we find that during free-whisking, L5B neurons are on average more strongly tuned to whisking angle than to whisking phase.

Figure 2.1 Trial structure with spikes. Trial structure with example traces of recorded stimuli and spikes. Phase is masked to periods of amplitude >5 degrees. Pole presentation is triggered 500 ms from trial start and takes approximately 200 ms to come into reach. Pole exits at varying times based on trial events.

Figure 2.2 L5B excitatory neurons encode a representation of self-motion during free-whisking. A) Firing rates for non-whisking (5.0 ± 5.6 Hz) and whisking periods (6.0 ± 7.1 Hz) (p=2.9e-2, t-stat 2.2, df 121, paired sample t-test). Data are represented as mean ± S.D. B) Three example units tuned to both whisking angle (blue) and phase (red). C) Population heat map for units tuned to whisker angle, sorted by peak angle response. D) Population heat map of phase tuned units, sorted by peak phase response. E) Pie chart of self-motion tuning across the L5B excitatory population. Phase (red, 0/122), angle (blue, 14/122), co-tuned (black, 46/122) and not tuned to either (gray, 62/122). F) Normalized positional preference for the 3 examples in B, phase (red), angle (blue). G) Absolute modulation depth (Methods) comparison between free-whisking phase and angle tuning. Red dot and error bars denote phase/angle mean ± SEM (4.2/7.1 ± 0.5/0.8 spks/s, p = 8.88e-6, t-stat = -4.87, df = 59; paired t-test). H) Modulation depth of angle, phase, amplitude, and midpoint for all angle tuned units. I) Contingency table of Pearson correlation coefficients for modulation depths across angle and motor variables. J) Difference in absolute modulation between angle and motor variables (motor – angle modulation). Phase to angle (mean ± SEM = 2.8 ± 0.6, p = 8.9e-6, t-stat = 4.9, df = 59). Amplitude to angle (mean ± SEM = 1.7 ± 0.6, p = 8.2e-3, t-stat = 2.7, df = 59). Midpoint to angle (mean ± SEM = 1.2 ± 0.5, p = 0.01, t-stat = 2.7, df = 59). Velocity to angle (mean ± SEM = 2.6 ± 0.6, p = 2.4e-5, t-stat = 4.6, df = 59). All compared using paired t-tests.

Representation of object location

Next we examined the same 122 excitatory neurons during active touch. A total of 54 neurons were excited by touch, and their responses were short latency and temporally sharp (Table 2.1). A subpopulation of 42 of these 54 neurons was modulated based on the anteroposterior touch location (Figure 2.3A). We found touch location tuned neurons across most of the depth we recorded from, but a larger proportion was found between 690-840µm from the pia (Figure 2.3B). The preferred object locations tile the space but heavily favor the edge locations (defined by how the mice whisked), especially anterior pole locations (Figure 2.3C-D). The mean half-max width was 1.8 mm, or approximately 9.2° of azimuth (Figure 2.3E). The location selectivity of these touch neurons underscores the importance of L5B in transforming whisker touch into a representation of object location.
Figure 2.3 L5B S1 excitatory units are tuned to object location at touch. A) Raster for an example neuron (top) tuned to far object locations (bottom). B) Average firing rate vs. depth from pia for active non-location (black), location (gold), and silent (gray) units. C) Touch peri-stimulus time histogram (left) and location tuning curves (right) for three example units tuned to far (top), middle (middle), and close (bottom) pole positions. Data are represented as mean ± SEM. D) Population heat map of object-location tuned units, sorted by preferred location. White spaces are insufficiently sampled pole locations. E) Shape of normalized tuning curves across all object-location tuned units. Data are represented as mean ± SEM. Mean half-max width was 1.8 mm (~9.2° of azimuth).

Next we wanted to determine whether S1 barrel cortex was required for our task. We trained 4 mice to expert level and then lesioned S1 by aspirating all cortical layers at and around the C2 barrel column (n=4). Performance outcomes were mixed: two mice were less affected by the lesions and continued to perform well, while the other two failed to perform the task post lesion. To determine whether mice could still discriminate between locations, we binned pole location across all trials in which at least one touch occurred (Figure 2.4). We find that mice are not able to discriminate location on touch trials. On the contrary, when we divide trials into those with touch in the go position and those without touch in the no-go position, mice show expert level performance (Figure 2.4). This suggests that mice do not need barrel cortex to detect the presence of touch, but do need it to discriminate location when touches occur in both locations.

Figure 2.4 Lesioned mice can detect touches but not discriminate object location. A) Psychometric performance curves for touch trials only, before (green) and after (red) lesions. B) Psychometric performance binned into go (left) and no-go (right) trials, before (green) and after (red) lesions (n=4).

Touch location tuning was examined in both naïve (n=10) and trained (n=6) mice. In the untrained group, water rewards were given randomly without regard to pole location (92 recording sessions). As described in Cheung et al., 2019, trained mice underwent discrimination training for go and no-go locations, with water only available in the posterior go range (30 recording sessions; Figure 2.5A-B). Trained mice exhibited increased touch frequency and reduced whisking time compared to naïve mice (Figure S2.1A). However, there was no significant difference in the proportion of touch-responsive units tuned to object location between naïve (n=26/35, 74.3%) and trained (n=14/19, 73.7%) mice (p = 1.0, Fisher's exact test; Figure 2.5C). Notably, a larger proportion of touch-responsive units was observed in trained animals (Figure S2.1B). The tuning width did not differ between the two groups (Figure 2.5D), and the preferred locations spanned the entire range of presented locations for both naïve and trained mice (Figure 2.5E-F).

Figure 2.5 Object location tuning does not require specialized training. A) Schematic of the two tasks. Mice were presented a pole randomly in a 10mm range, 7-12mm from the face.
Naïve task: reward was available on 50% of trials, regardless of pole position. Trained task: reward was exclusively available (100% of the time) in the 0-5mm proximal go range. B) Performance on naïve (49.3% ± 3.1%, mean ± SD) and trained (66.9% ± 8.1%, mean ± SD) recording sessions (p = 2.4e-34, t-stat = 17.2, df = 120, unpaired t-test). C) Proportion of touch units that are location tuned for naïve (left; 77.1%) vs trained (right; 78.9%) animals. D) Shape of normalized tuning curves for touch location units from naïve (gray) and trained (red/blue) mice. E) Population heatmap of touch location units from naïve (top 27 units) and trained (bottom 15 units) animals, sorted by preferred object location. Each row denotes a single location neuron. F) Histogram of positional preference of touch location units compared between naïve and trained animals (p=0.12, t-stat = -1.6, df = 40; two-sample t-test).

Figure S2.1 Naïve vs trained animal comparison. A) Comparison of the number of touches made per trial (left, p=3.0e-4) and proportion of time whisking (right, p = 5.5e-4) between naive (gray) and trained (red/blue hash) animals. Both compared using two-sample Kolmogorov-Smirnov tests. B) The distribution of non-touch units, touch location units, and touch non-location units compared between recordings from naïve (n = 92) and trained (n = 30) animals.

Table 2.1. Properties of non-touch (n=68), non-location touch (n=12), and location touch (n=42) units. Values are mean ± SD, median (min–max).
Whisking (Hz): non-touch 4.47 ± 5.81, 1.38 (0.01–23.86); non-location touch 4.14 ± 4.33, 2.67 (0.20–26.33); location touch 9.10 ± 10.20, 5.44 (0.06–41.70)
Quiet (Hz): non-touch 4.27 ± 5.26, 2.15 (0.04–43.89); non-location touch 3.79 ± 4.38, 1.53 (0.10–29.05); location touch 6.65 ± 6.37, 4.41 (0.04–28.81)
Proportion of spikes evoked by touch: non-touch 0.19 ± 0.17, 0.13 (0.00–0.72); non-location touch 0.33 ± 0.23, 0.26 (0.05–0.72); location touch 0.41 ± 0.23, 0.33 (0.09–0.99)
Proportion of spikes evoked by touch + whisking: non-touch 0.45 ± 0.19, 0.42 (0.10–0.88); non-location touch 0.52 ± 0.21, 0.56 (0.18–0.86); location touch 0.60 ± 0.20, 0.60 (0.16–0.99)
Touch onset latency (ms): non-location touch 12.33 ± 6.33, 11.00 (6.00–26.00); location touch 10.12 ± 4.39, 9.50 (4.00–22.00)
Touch response duration (ms): non-location touch 17.25 ± 8.84, 16.00 (4.00–34.00); location touch 18.52 ± 9.75, 17.50 (4.00–43.00)
Spikes in response window (#): non-location touch 0.40 ± 0.33, 0.37 (0.04–1.25); location touch 0.74 ± 0.80, 0.52 (0.07–3.60)
Probability of touch response: non-location touch 0.29 ± 0.19, 0.26 (0.03–0.64); location touch 0.43 ± 0.29, 0.43 (0.05–0.93)
Probability of response at peak bin: non-location touch 0.36 ± 0.19, 0.33 (0.07–0.84); location touch 0.55 ± 0.27, 0.59 (0.09–0.99)
Probability of response at trough bin: non-location touch 0.21 ± 0.23, 0.12 (0.00–0.65); location touch 0.26 ± 0.24, 0.18 (0.00–0.88)
Response at peak bin (Hz): non-location touch 24.97 ± 10.32, 22.44 (48.37–233.33); location touch 51.87 ± 35.49, 42.15 (45.45–200.00)
Response at trough bin (Hz): non-location touch 11.58 ± 10.47, 6.99 (45.88–200.00); location touch 19.97 ± 21.60, 13.79 (27.00–200.00)

For downstream neurons to accurately estimate the location of the pole using this population of location tuned neurons, they must sample enough of them to achieve performance approaching the mouse psychometric curves. The question remains: how many neurons are required, and how accurately can location be decoded? To investigate this, we constructed a multinomial generalized linear model (GLM) to predict pole location. Predictors were the number of spikes generated from simulated touches based on the tuning curves of individual neurons (Methods). A linear classifier pooling the touch-evoked spike counts from 25 of our location tuned neurons (the subset with >75 touches in >80% of binned pole positions) predicted the pole location to <0.5 mm from the actual location on 60.5% ± 1.3% (mean ± S.D.) of touches (Figure 2.6A-B). We constructed neurometric performance curves based on our model and compared them to expert mouse performance (Figure 2.6C). We find that randomly sampling from five or more location-tuned neurons yielded virtual performance that met or exceeded expert mice (Figure 2.6D; Methods). Given this result, it is possible for downstream neurons to decode touch location as well as expert mice if they receive input from at least five location-tuned neurons.

Figure 2.6 Object location is decodable to <0.5 mm precision from touch-evoked spike counts. A) Contingency table of pole location decoding performance from 25 pooled unique touch location units using a multinomial GLM. B) Performance as a function of pooled neuron count. C) Average psychometric curves from 15 expert mice (gray; Cheung et al., 2019) and neurometric curves from varying numbers of sampled location units. D) Performance from neurometric curves compared to expert mice. Data are represented as mean ± S.D. Solid black lines denote points significantly different (p < 0.05; two-sample t-test) from expert mouse performance.

Figure 2.7 Active touch unmasks a distinct population code for object position in Layer 5 of S1. A) Proportion of units tuned to whisker angle during free-whisking (blue, 36/122), at touch (gold, 19/122), co-tuned (black, 25/122), or not tuned (gray, 44/122). B) Absolute (top) and normalized (bottom) tuning curves for angle responses during free whisking (blue) and at touch (gold). C) Absolute modulation depth for angle tuning during free-whisking and touch for each class in A. D) Shape correlation between whisking and touch tuning curves for all units tuned to whisking and/or touch (blue and gold hash) compared to shuffled responses (gray). Kolmogorov-Smirnov p=0.18. E) Preferred angle during free-whisking vs. at touch. Single-tuned units on histograms, co-tuned units on the scatter plot. Distance from midline for co-tuned units: mean ± SD = 12.6 ± 10.9°, p = 9.7e-6, t-stat = 5.6, df = 24; one sample t-test.

Active touch location tuning is independent of whisking angle tuning

We wanted to test whether location-tuned neurons were simply free-whisking angle tuned neurons that were upregulated by touch. We find multiple lines of evidence suggesting they are separate from one another, implying that the tuning in each arises through different means. We find that only 20% of neurons are tuned to both free-whisking and touch, while 16% are tuned only to touch and 30% only to whisking (Figure 2.7A). If these distributions were independent, the expected overlap would be 18%, which suggests that these tuning properties are independent. These observations eliminate the necessity for touch location-tuned neurons to be tuned to whisking at all. We then explored whether the preferred angle of the co-tuned neurons aligned for whisking and touch (Figure 2.7B). We calculated the absolute modulation depth from these tuning curves and found that touch modulation was approximately 3.6 times greater (14.6 ± 1.7 Hz; mean ± SEM) than whisking modulation (4.04 ± 0.5 Hz; mean ± SEM; Figure 2.7C). Some whisking-only tuned neurons display a greater absolute modulation depth for touch; this is explained by the fact that touch has a larger signal but also larger noise, and as such requires a larger absolute modulation depth to reach significance.
Next we compared the shapes of the normalized tuning curves and found that whisking and touch tuning curves showed no difference in correlation compared to a randomly shuffled distribution (Figure 2.7D). Similarly, the preferred angles for touch and whisking in co-tuned neurons were uncorrelated (Figure 2.7E). Taken together, our data suggest that touch angle tuning and whisking angle tuning are functionally independent of one another.

Discussion

In this study we leveraged an object localization task while simultaneously recording single neurons and sensorimotor features (Figure 2.1) to reveal multiple novel findings in L5B excitatory neurons of mouse S1. During free whisking, the majority of whisking tuned neurons are more significantly modulated by angle than by phase, and no neurons were tuned to phase alone (Figure 2.2). Upon whisker contact with the pole, ~75% of touch neurons were significantly modulated by pole location; these neurons were found across L5B and had similarly shaped tuning curves (Figure 2.3 and Figure 2.5). Lesioned mice could perform the task to some degree, but only when differentiating between touch and non-touch trials, not pole location per se (Figure 2.4). Recordings in naive and trained mice revealed no difference in the ratio of tuned touch neurons or in tuning curve shape, showing that no training was required for tuning to exist (Figure 2.5). Object location could be decoded at expert mouse performance using only five neurons (Figure 2.6). Finally, we show that free-whisking tuning and touch location tuning are functionally independent of one another (Figure 2.7). Together this demonstrates the importance of L5B in transforming whisker touch into a touch-based representation of object location.

The touch object location tuning in L5B was robust and marks a significant advance in our understanding of somatosensory processing in the mouse cortex. This finding is somewhat surprising given the work of Curtis & Kleinfeld (2009), where phase, and not angle at touch, better explained neural responses in rat primary somatosensory cortex. Their study differs from ours in several ways, including the use of rats instead of mice, variations in the ratio of protraction to retraction touches, and the use of probe electrodes, which may introduce complications such as merged or missed spikes during recording. Most notably, however, Curtis & Kleinfeld employed a task where rats reached their whiskers across a gap to whisk against a touch sensor continuously until it was retracted, upon which they received a reward. They describe that rats make large head movements and crane their necks to access the sensor, which is impossible in our task. Additionally, a separate study in freely moving rats found that head movement and position were correlated with discriminating between two object locations presented simultaneously (Knutsen et al., 2006). An intriguing hypothesis could unify these findings: touch location tuning as we describe it here may be masked by neural signals present in freely moving rodents that are effectively controlled for during head-fixed paradigms (e.g., head movement, running, etc.). An alternative, but not mutually exclusive, hypothesis is that head-fixed mice adapt a natural investigative motor strategy based on their head-fixed 'handicap'. Because these representations exist in naïve mice, location tuning cannot arise from the necessity of mice to solve the task.
While naive mice were oblivious to the task we designed, they are not in any way naive to object investigation with their whiskers. Except for the randomized rewards, the task structure remained unchanged for naive mice. Given that trained mice acquired task proficiency, it is reasonable to assume that naive mice, also receiving rewards, strategically explored the pole in an effort to comprehend the task. Under this assumption, a natural investigative motor strategy adapted to head-fixation could be an essential component for generating these neural representations.

One limitation of our study is that we chose to restrict our recordings to the approximate borders of L5B; it is therefore not known whether other layers code for object location as well. Previous research using the same paradigm but with only two pole locations showed that L4 could in fact discriminate between two pole locations based on spike-triggered touch angle (O'Connor et al., 2013). When making comparisons, it is important to consider that in previous experiments pole positions were approximately 25° apart, while in our study pole positions ranged up to ~50° apart. To make a meaningful comparison, we would need to sample from two positions around 25° apart, which would likely decrease the number of tuned neurons detectable by this criterion. Additionally, while not a direct measure of object location, another study found that the number of spikes between stimulus delivery and reaction time (i.e., answer lick time) could predict between two pole positions (O'Connor et al., 2010b). Specifically, 63% of L4 and 79% of L5 neurons in S1 could discriminate between these trials. Given this evidence, it seems likely that touch location tuned neurons exist in both L4 and L5 but are perhaps more pronounced in L5.

Understanding the neural basis, as well as the necessary behavioral conditions, for this touch location representation presents a challenging yet uniquely valuable question in neuroscience. The intricate loop between neural activity and behavior adds complexity to unraveling how the brain constructs representations. In this case, studying whisker-based object location tuning in rodents provides an opportunity to gain unique insight into how cortical circuits achieve a functional representation without direct access to the sensory information that would normally define that representation (i.e., lacking proprioception). At the same time, the well-defined and contained nature of this problem offers the advantage of a controlled experimental setup to study the interdependent nature of behavior and neural activity.

Materials and Methods

Object localization task

Mice were trained in a whisker-based go/no-go object localization task. Using a single whisker (C2), water-restricted mice were motivated to whisk and identify the location of a smooth vertical pole (0.6mm diameter) 7-12mm lateral from the whisker pad. The pole was positioned along a 10 mm anteroposterior range using stepper linear actuators with 99 nm resolution, 25 μm accuracy and <5 μm repeatability (Zaber NA11B30-T4). To avoid potential ultrasonic cues associated with stepper motor movement, the pole was jittered 0-127 microsteps (0-25 μm) on each trial. A pneumatic linear slider (Festo) was used to raise the pole vertically into touch reach for each trial. The Festo also provided a sound cue at pole presentation onset.
Specific pole locations rewarded mice with water (4-8 μL), punished mice with a timeout (2 s), or had no effect, based on the mouse's decision to lick or withhold licking. In this go/no-go paradigm, four trial outcomes exist. In the minority of sessions in which the animals were trained, the close posterior 5 mm of pole positions (go) were rewarded with water upon licking (hit) or had no effect if mice withheld licking (miss). The far anterior 5 mm of pole positions (no-go) were punished with a timeout upon licking (false alarm) or had no effect if mice withheld licking (correct rejection). For the remaining sessions, rewards and punishments were given regardless of pole location; go trials and no-go trials had overlapping pole locations.

Behavior, videography, and electrophysiology

Animal behavior, videography and electrophysiology were synchronized and captured during task performance using EPHUS (https://www.janelia.org/open-science/ephus). A single computer running BControl (MATLAB 2007b) was used to initiate each trial of the object localization task and to synchronize video and electrophysiology recordings via a second computer running EPHUS. Trial onset triggered high-speed video capture of whisker motion (1000 fps) and electrophysiological recording of single unit activity (MultiClamp 700b). Whisker motion was captured from an overhead view for 4 seconds, spanning the period prior to pole onset through the response window. Video frames were acquired using a Basler acA200-340kmNIR camera and an Edmund Optics 0.18X 1/2'' GoldTL Telecentric Lens (Model # 52-258) under 940 nm illumination on Streampix 6 software. Whisker shape and position were traced and tracked using the Janelia Farm Whisker Tracker (https://www.janelia.org/open-science/whisk-whisker-tracking). A mask was traced around the edge of the fur to reduce tracking noise. Whisker angle is quantified at the intersection between the mask and the whisker. Whisker midpoint, phase and amplitude were decomposed from the band-pass filtered (6-60 Hz, Butterworth) whisker angle time series using the Hilbert transform (MATLAB 2018b: hilbert). Whisking amplitude and phase are defined as the magnitude and phase angle (radians), respectively, of the Hilbert transform of the whisker angle time series. Whisking midpoint is the filtered (6-60 Hz) difference between the whisker angle time series and the band-pass filtered signal. Whisker curvature is the amount of bending of the whisker, measured 3-5 mm lateral of the whisker mask. The precise millisecond of touch was determined through custom MATLAB software via distance to pole and change in whisker curvature, followed by manual curation of images of uncertain whisker and pole intersections.

In-vivo loose-seal juxtacellular recordings

All animals used in this study were adult male or female transgenic mice (VGAT-ChR2-EYFP) expressing channelrhodopsin in inhibitory units. Following head-plate surgery, whiskers were trimmed to a single whisker (C2) and intrinsic signal imaging was used to target the associated barrel column. A single whisker was maintained throughout training and recording. Prior to recording, animals were anesthetized (2% isoflurane) and a small craniotomy (200-300 μm) was made above the barrel column associated with the C2 whisker. On the first day of recording, animals were allowed to recover for 1 hour before recording. Recordings were repeated for 4.8 ± 1.5 sessions (mean ± SD) per animal.
To sample single unit spiking activity in a manner unbiased by firing rate, blind juxtacellular loose-seal patch recordings were targeted to L5 (600-950 μm from pia; Lefort et al., 2009) using patch pipettes (Warner Instruments; 5-8 MΩ) filled with 0.9% saline (Growcells). Electrical recordings (n=156 neurons) were acquired and amplified using a MultiClamp 700b and Headstage CV-7B. The pipette axis was aligned parallel to the C2 barrel column at 35°. To perform an unbiased sampling of L5B, we recorded from any isolated unit. An isolated unit was identified by an increase in resistance to 15-20 MΩ. Once a unit was isolated, 10 trials of the behavioral task were run to test for spikes during performance. If spikes were observed, the isolated unit was maintained for at least 100 trials (137 ± 57; mean ± SD). Upon recording completion, 10 trials of a 10 Hz pulse train of blue light (480nm, 10, 20ms at 15-20 milliwatts) were used to test whether the unit was an interneuron. Short latency spiking (or inhibition) to a 490nm, 10 Hz, 5 ms light indicated that the neuron was inhibitory (or excitatory). Fourteen units were inhibitory and were excluded from analysis. If an isolated unit did not spike after 10 trials, a current pulse (100 μs, 20 nanoamps) was injected to check whether a unit was indeed patched. If a burst of spikes was observed, we deemed that neuron a silent cell.

Lesions

We trained four mice to expert performance for at least 1 day, defined as 75% accuracy for at least 200 trials. S1-C2 was targeted using intrinsic signal imaging while stimulating the C2 whisker. Once mice reached expert level, they were deeply anesthetized using isoflurane and lesions were performed using a blunt-tipped syringe needle connected to a vacuum, aspirating cortex (Hong et al., 2018). A 2 mm craniotomy was made, and lesions were targeted to the C2 barrel and all surrounding barrels. All layers were aspirated, as determined by reaching the white myelinated fiber tracts ventral to layer 6.

Histology

DiI (ThermoFisher D282) was coated onto a patch pipette and inserted into the recording location on the final day of recording to identify the location of recordings. DiI-coated pipettes were inserted 1000 μm deep into the recording location and left in place for 5 minutes to ensure proper coating of the recording location. Two hours post dye, animals were deeply anesthetized with a ketamine (110 mg/kg) - xylazine (10 mg/kg) cocktail before perfusion with 0.1 M sodium phosphate buffer, followed by 4% paraformaldehyde (PFA, in 0.1 M sodium phosphate buffer). The fixed brain was then flattened along the axis perpendicular to the barrel column. The flattened brain was immersed in 4% PFA for 1 hour post-perfusion, transferred to 20% sucrose solution for 1 day, and then to 30% sucrose for 1 day. 100 μm slices were cut tangentially and cytochrome oxidase staining was performed to reveal the barrel columns. Fluorescence imaging was performed to recover the location of the DiI track. The recording location was determined by overlaying the fluorescent track on bright-field images of the barrel columns.

Defining the touch response window

A smoothed (Bayesian Adaptive Regression Splines [BARS]; Wallstrom et al., 2008) response from -50 ms to 50 ms around touch was used to evaluate the touch response window. The touch response window is defined as any time point from 5 to 50 ms post-touch at which the smoothed response exceeded the baseline (-50 to 0 ms pre-touch) mean ± its 95% confidence interval.
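As a rough illustration of this response-window test, the following is a minimal MATLAB sketch, not the analysis code: a moving-average smoother stands in for the BARS fit, and variable names are assumed. The two additional acceptance criteria described next are applied afterward.

% psth: trial-averaged firing rate (Hz) on a 1 ms grid from -50 to +49 ms around touch
t = -50:49;                                   % ms relative to touch
smoothed = movmean(psth, 5);                  % stand-in for the BARS fit
base = smoothed(t >= -50 & t < 0);            % baseline: -50 to 0 ms pre-touch
ci95 = 1.96 * std(base) / sqrt(numel(base));  % 95% CI of the baseline mean
thresh = mean(base) + ci95;
win = t >= 5 & t <= 50 & smoothed > thresh;   % post-touch bins exceeding baseline + CI
responseWindow = t(win);                      % candidate touch response window (ms)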
Two parameters were imposed to ensure an accurate response window was captured: 1) the mean firing rate of the touch response had to be >2 Hz; 2) the touch response window had to be longer than 4 milliseconds. A touch neuron is defined as any neuron that had a touch response window.

Tuning curves

For a single neuron, 5% of sampled touches or 5% of total whisking time points were used to define each point along the touch or whisking tuning curve. This method ensured 20 equally sampled bins. Stimulus values are defined as the median of each stimulus bin and response values as the mean of each response bin. For touch tuning, the response bins comprise the firing rates within the touch response window defined above. For whisking tuning, the same response window as touch was used; if a neuron was not tuned to touch, the median touch response window (10 to 28 milliseconds post-touch) was used to evaluate whisking tuning. Tuning curves were generated by smoothing the binned histograms using Bayesian Adaptive Regression Splines. Neurons with mean whisking responses below 2 Hz were not evaluated. We used a one-way analysis of variance (ANOVA) at an alpha level of 0.01 to quantify whether a neuron was tuned to a whisking or touch parameter. To further ensure that the observed tuning was not due to noise in neural responses, we shuffled touch/whisking responses 1000 times and evaluated F-values from a one-way ANOVA. If our observed F-value was above the 95th percentile of the shuffled distribution of F-values, we deemed the neuron tuned. Tuning preference is the location of the peak response of the tuning curve. To define the width of tuning, a multiple comparison test using the Tukey-Kramer critical value was used to identify the first bins in both directions that were significantly different from the peak value. If no bins were significant, no modulation width was defined. Maximum and minimum responses were calculated from BARS-fitted tuning curves.

Modulation

The absolute modulation depth and modulation depth for each tuning curve are calculated as:

absolute modulation depth = max response − min response

modulation depth = (max response − min response) / (max response + min response)

Neural decoding

We used multinomial logistic regression to decode pole location, implemented using glmnet (Hastie et al., 2013). Only touch units that sampled at least 80% of the pole position range were used for decoding. Each unit's tuning curve was interpolated to 40 bins to estimate location at 0.25 mm resolution. At each bin, 50 samples were drawn from a Poisson distribution with λ equal to the mean of the interpolated bin. Drawing from a Poisson distribution was justified because the number of spikes generated in the touch response window at touch had a Fano factor of 0.94 ± 0.22 (mean ± SD). For the design matrix, each row is a location bin, each column a single neuron, and each entry a sampled neural response for the associated neuron. The decoder was run for 10 iterations. During each iteration a random 70% of trials were allocated for training and the remaining 30% for testing. Lasso regularization (alpha parameter 0.95) was used to reduce over-fitting. To identify the number of units required, we sampled varying numbers of neurons with replacement, 500 times, from the units used to train the original model.
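As a concrete illustration of the simulated-touch design matrix described above, here is a minimal MATLAB sketch with assumed variable names (the unit-resampling step described next simply sub-selects the columns of this matrix and the corresponding learned coefficients):

% tuning: nUnits x nBins matrix of mean touch-evoked spike counts per unit,
% interpolated to 40 location bins (0.25 mm resolution)
[nUnits, nBins] = size(tuning);
nSamples = 50;                                % Poisson draws per location bin
X = zeros(nBins * nSamples, nUnits);          % rows = simulated touches
y = zeros(nBins * nSamples, 1);               % location bin labels
row = 0;
for b = 1:nBins
    for s = 1:nSamples
        row = row + 1;
        X(row, :) = poissrnd(tuning(:, b))';  % one Poisson draw per unit
        y(row) = b;
    end
end
% X and y are then split 70/30 into training and test sets and passed to a
% lasso-regularized multinomial fit (e.g., glmnet with alpha = 0.95).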
The indices of the selected neurons were used to create a new population design matrix and matrix of learned coefficients from the original design matrix and learned coefficients. Prediction probabilities of location were computed as:

h_\theta(x) = g(\theta^{T} x), \qquad g(z) = \frac{1}{1 + e^{-z}}

where h_θ(x) is the hypothesis function, θ are the learned coefficients, x is the input design matrix, and g(z) is the logistic link function used to calculate prediction probabilities. The predicted location was chosen as the location with the highest probability. Model accuracy and resolution were evaluated on the test set. Model accuracy is defined as the total number of correct predictions divided by the total number of predictions. A confusion matrix of true and predicted locations was normalized by the total number of true cases at each location and used to define the decoding resolution and neurometric curves. Decoding resolution is defined as the total number of predictions within n bins of the diagonal, where each bin is 0.25 mm. Neurometric curves, defined here as the choice to lick given neural activity, were computed as the sum of predictions along true values for the go predictions (left half of the confusion matrix). Simulated neurometric licks were defined as any lick probability exceeding 50%.

Chapter 3: Multisensory response properties in secondary somatosensory cortex

Introduction

To establish an uninterrupted and robust perception of our environment, we use multiple streams of information from within and across sensory modalities. While it was previously thought that interactions across sensory modalities largely remain separate until higher order cortical regions, both functional and anatomical evidence suggests that multisensory interactions are common even in the thalamus (Donishi et al., 2011; Kimura and Imbe, 2018). Despite this fundamental property of the brain, multisensory interactions in earlier cortical regions are not well understood. Here we report multisensory neurons in mouse S2 that display strong touch- and auditory-driven responses. Previous experiments in rat whisker S2 found similar multisensory responses, but to our knowledge this is the first such finding in mouse (Brett-Green et al., 2004). A 2-photon imaging experiment in S2 L2/3 showed some auditory interactions, but these were mostly inhibitory and distinctly different from what we observe here (Zhang et al., 2020; see Discussion). Rodent whisker S2 is more commonly associated with integrating multiple whiskers, averaging around 10, compared to S1 neurons, which respond to only one or two (Kwegyir-Afful and Keller, 2004). There is some evidence that L2/3 S2 neurons that project to S1 carry more information about hit and correct rejection trials (Chen et al., 2016). Unfortunately, detailed studies of the whisker-responsive region of S2 are limited, especially in deeper layers.

Results
For these experiments we trained mice on the same continuous object localization task outlined in Chapters 1 and 2. Briefly, mice were presented with a pole in a continuous range of go and no-go pole locations (Figure 3.1A, B). High-speed video of the whisker was tracked to extract various sensorimotor features (Figure 3.1C). Four trial outcomes were possible, as described in earlier chapters (Figure 3.1D). For the data reported here, the task structure was identical to the task in previous chapters (Figure 3.1E, left), but modifications were made for additional experiments whose data have not yet been processed (Figure 3.1E, right). We targeted S2 by first locating the C2 representation in S1 with intrinsic signal imaging while stimulating the C2 whisker, and then using coordinates relative to this position to locate S2 (Figure 3.1F; Methods).

Figure 3.1 Recording S2 neurons during head-fixed object localization with audio playback. A) Whisker-guided object localization task; example of a go (hit) trial (top) and a no-go (correct rejection) trial (bottom). B) A still frame from the 1000 fps video used to track whisker position, bending, and position-derived variables. C) Extracted whisker position for a no-go (left) and go (right) trial. D) Four trial outcomes from the task, contingent on trial pole location and mouse licking during the answer period. E) Modified audio task consisting of normal trials (left) and sound-only trials (right, bottom). For a subset (red) of trials a fake pole-down sound is played to allow a light-based trial start cue (right, middle) or a pole-up sound playback trial start cue (right, top). F) Intrinsic signal imaging of cortex while stimulating the C2 whisker; S1-C2 (centered) and S2 located from coordinates relative to S1-C2 (0.388 mm posterior, 1.284 mm lateral). G) Raster and trial type PSTH (bottom) of an example neuron. H) Event PSTHs for protraction touch, retraction touch, pole-up sound, and pole-down sound.

Our recordings revealed large amplitude neural responses that closely corresponded to the 'pole-up' sound, leading us to hypothesize that these were sound-induced responses (Figure 3.1G, H). By creating a PSTH of responses aligned to the pole trigger and excluding touch times, we were able to confirm that these responses were not touch-induced (Figure 3.4C). Whisker responses can also be triggered by mechanical stimulation from reafference, i.e., self-generated motion (Fee et al., 1997; Campagner et al., 2016; Cheung et al., 2020). To ascertain whether these responses could be caused by reafference, we identified all instances of large amplitude whisking induced by the pole-up sound.

Figure 3.2 S2 neurons are sound responsive. A) Large amplitude pole-up sound-initiated whisks (black) and corresponding PSTH (green). B) Audio trace from the pole-up sound trigger (black) and corresponding PSTH (red) for the same example neuron as in (A), with a red line indicating the 20 ms onset latency of the sound from the trigger. C) Four example neurons aligned to pole-sound-initiated large amplitude whisks (green) and to the pole-up trigger (red).
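As a minimal MATLAB sketch of the event-aligned PSTHs used to compare trigger-aligned and whisk-aligned responses (variable names are assumed; spike and event times are in ms for a single neuron):

% spikeTimes: spike times (ms); eventTimes: alignment events (pole-up trigger
% times or sound-evoked whisk onset times)
win = [-100 100];                            % window around each event (ms)
edges = win(1):1:win(2);                     % 1 ms bins
counts = zeros(1, numel(edges) - 1);
for e = 1:numel(eventTimes)
    rel = spikeTimes - eventTimes(e);        % spike times relative to this event
    counts = counts + histcounts(rel, edges);
end
psth = 1000 * counts / numel(eventTimes);    % spikes/s averaged across events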
We found that many neurons displayed notable increases in firing prior to these large amplitude whisking onsets (Figure 3.2A, C). Further comparisons between whisking-aligned and pole-up sound-aligned responses revealed that the latter were both larger and sharper. Taken together, this suggests that these responses are driven by sound and not somatosensation (Figure 3.2B, C).

Determining response latencies can shed light on the potential origins and functions of a neural response. We found a clear difference when we compared the latency of touch responses with that of auditory-driven responses. Touch responses showed lower latency, typically within an 8-11 ms range, comparable to findings in S1 (Hires et al., 2015). In contrast, auditory-driven responses demonstrated longer latencies, falling within a 13-19 ms range (Figure 3.3). When we compared the average latencies of the four shortest latency neurons for both touch (8 ms) and auditory (14 ms) responses, we found a 6 ms difference. This difference suggests that auditory responses likely involve additional synaptic connections compared to touch responses.

Figure 3.3 Touch and audio onset latency across depth. Response onset latency for touch (blue) and sound (red) responses. Sound latencies are determined based on audio waveform onset (Figure 3.2B).

Many neurons also showed a sustained response to both types of stimuli. To categorize different response types, we established windows relative to stimulus onset: early (0-25 ms), late (25-50 ms), and sustained (0-50 ms).

Figure 3.4 S2 neurons show complex and temporally dynamic responses to touch and sound. A) Z-scored touch responses for neurons classified as early (0-25 ms), late (25-50 ms), and sustained (0-50 ms) for protraction (top) and retraction (bottom) touches. The baseline period is outlined in red and the signal period in blue. Responses are smoothed with a 30 ms window and colored to distinguish each trace (paired t-test p<0.01; Methods). B) Z-scored peak responses for protraction and retraction touches. Responses capped at 5 standard deviations. C) Same as (A) but for pole-up and pole-down sounds, except signal windows are shifted forward 25 ms to account for the trigger-to-sound delay (20 ms) and increased latency (Methods). D) Same as (B) but for pole-up and pole-down sounds.
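As a minimal MATLAB sketch of this window-based classification (variable names are assumed; we use the -100 to -20 ms pre-stimulus baseline and a paired t-test per window, taking the most significant window as the neuron's category):

% trialRates: nTrials x nMs matrix of firing rates aligned to stimulus onset;
% tMs: time axis in ms
windows = struct('early', [0 25], 'late', [25 50], 'sustained', [0 50]);
baseIdx = tMs >= -100 & tMs <= -20;           % pre-stimulus baseline window
base = mean(trialRates(:, baseIdx), 2);       % per-trial baseline rate
names = fieldnames(windows);
p = nan(numel(names), 1);
for k = 1:numel(names)
    w = windows.(names{k});
    idx = tMs >= w(1) & tMs < w(2);
    resp = mean(trialRates(:, idx), 2);       % per-trial response rate
    [~, p(k)] = ttest(resp, base);            % paired t-test vs baseline
end
[~, best] = min(p);                           % most statistically different window
category = names{best};                       % 'early', 'late' or 'sustained'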
We found that touch neurons were roughly evenly distributed across these three categories for retraction touches, while protraction touch responses were more frequently classified into the early and late periods, with fewer falling into the sustained period (Figure 3.4A). Neurons that showed significant responses across multiple windows were classified by the window most statistically different from the baseline period of -100 to -20 ms before stimulus onset (Methods). We adjusted the response windows forward by 25 ms for auditory responses to account for latency differences (+6 ms) and the timing of event triggers relative to the actual auditory waveform (Figure 3.4C, blue bar; waveform in Figure 3.2B). Most pole-up auditory responses were categorized as late and sustained, with fewer classified as early. Most pole-down auditory responses were classified as early, probably because the pole-down audio waveform has two auditory components, at 4 ms and 22 ms, which we did not distinguish between in this study (Figure 3.4C). The remaining pole-down auditory responses were evenly distributed across the late and sustained periods. These results illustrate that S2 neurons in awake behaving mice exhibit strong modulation across various timescales for both tactile and auditory stimuli, confirming S2 as a multisensory area.

We also discovered several neurons showing opposite response polarities for protraction and retraction touches. To quantify this, we compared the Z-scored responses to each stimulus for each neuron (Figure 3.4B). This revealed that 7 out of 26 touch-responsive neurons either showed reversed polarity or were exclusively tuned to protraction or retraction touches. Most of the remaining touch neurons fell along the identity line, indicating that most S2 touch neurons have balanced response magnitudes for protraction and retraction. We applied a similar analysis to pole-up and pole-down auditory responsive neurons and found that 9 out of 29 auditory-responsive neurons exhibited reversed polarity or exclusive tuning (Figure 3.4D). Auditory-responsive neurons were generally more scattered from the identity line, showing that these neurons have a stronger preference for one sound over the other. These results underscore the complexity of neural responses in S2, highlighting its role in the integration of multisensory information: while response magnitude is relatively balanced across touch types, it can show a strong preference for one sound over another or even reverse polarity.

Next we examined the laminar distribution of tuned neurons (Figure 3.5A). We found a higher concentration of neurons modulated by both touch and sound in layers L5A and L5B. Breaking this down further, auditory responses were particularly dense in L5B, whereas touch-responsive neurons were distributed more uniformly across the entirety of layer 5.

Figure 3.5 Touch and audio responsive neurons are overlapping and concentrated in L5 and upper L6. A) Z-scored peak responses for sound, touch, and whisking, capped at ±2 standard deviations, for all neurons plotted against recording depth. Cortical layers are separated by yellow lines. B) Neurons with shared response properties for protraction and retraction touch (top), pole-up and pole-down sound (middle), and pole sound, touch and whisking (bottom), across all neurons (n=45).
Among the limited number of cells recorded in L6, about half were responsive to at least one type of stimulus. Interestingly, we observed a scarcity of neurons in L4 tuned to either stimulus, suggesting that the posterior medial nucleus (POm), which strongly drives L4 (Lee and Sherman, 2008), is not likely responsible for either tuning. We further explored the overlap in tuning across all responsive neurons (Figure 3.5B). We identified 26 neurons tuned to touch, 29 to pole sound, and 9 to whisking. Among the touch-responsive neurons, half were exclusively tuned to either protraction (10/26) or retraction (3/26), with the remaining half (13/26) responsive to both. Interestingly, only 2 out of 29 auditory-responsive neurons responded strictly to the pole-up sound and 7 only to the pole-down sound, while the majority (20/29) were modulated by both sounds. A total of 17 neurons were tuned to both sound and touch out of 37 that responded to at least one; within this subset, slightly more were tuned only to sound (12/37) than only to touch (8/37).

To deepen our understanding of auditory-driven responses, we devised a modified version of the object location task. Standard object location trials were no different from those described in Chapters 1 and 2, where mice localize an object and lick for a water reward at near pole positions (Figure 3.1A). Standard trials consisted of go and no-go trials in a continuous range of pole locations, and high-speed video data were collected and processed to calculate whisker sensorimotor features (Figure 3.1B, C). The sequence of a standard trial is shown in black in Figure 3.1E, always beginning with the pole in a down position unreachable to the mouse. Upon trigger activation, the pole is elevated into place by a loud pneumatic linear slider, initiating the sampling period. During this time, mice examine the pole using a single whisker to ascertain its position. This is followed by an answer period starting after 0.75 seconds, during which licking determines trial outcomes. A water reward is released upon licking on successful go trials. The progression follows the normal trial path (Figure 3.1E), lowering the pole and generating a softer, distinctive sound from the release of pressurized air, signifying trial conclusion.

For the other half of trials, we introduced trial variants that followed the end of a standard trial. Specifically, in 10% of trials we only played various sounds through a speaker, without the possibility of reward or the presence of the pole (Figure 3.1E, bottom right, black). In the remaining 40% of trials, we played back a fake pole-down sound but kept the pole in the up position to trick the mouse into thinking the pole was down (Figure 3.1E, outlined in red). We used this design to test how touch, sound, and light might interact in multisensory S2 neurons. Light cue trials made up 10% of all trials; on these, instead of the pole-up sound a light turned on for 120 ms, indicating that the pole was available and the sampling period had begun (Figure 3.1E, right middle). The remaining 30% of all trials were fake pole-up trials, where playback of a fake pole-up sound indicated that the sampling period had begun (Figure 3.1E, right top). Finally, in half of all fake pole-up trials (15% of all trials), a second pole-up sound was played 1.5 seconds after the first; this sound indicated nothing about the trial and occurred in the middle of the answer period.
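A minimal MATLAB sketch of how this trial-variant schedule could be drawn on each trial (an assumed implementation for illustration only; the actual BControl task code may differ):

% Probabilities follow the text: 50% standard, 10% sound-only, 10% light-cue,
% 30% fake pole-up; half of fake pole-up trials get a second pole-up playback.
secondSound = false;
r = rand;
if r < 0.50
    trialType = 'standard';
elseif r < 0.60
    trialType = 'sound_only';
elseif r < 0.70
    trialType = 'light_cue';          % 120 ms light replaces the pole-up sound
else
    trialType = 'fake_pole_up';       % pole-up sound playback starts sampling
    secondSound = rand < 0.5;         % extra pole-up sound mid answer period
end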
We examined both trained and naive mice in these additional experiments. We collected approximately 130 neurons, not including silent neurons; of these, approximately 90 were held for longer than 100 trials. No in-depth analysis has been conducted on this dataset at this time; however, general conclusions can be drawn from observations made during data collection. First, naive mice showed clear auditory-induced responses even on the first presentation of the sound. For each naive mouse, the first neuron patched was intentionally selected to be higher firing and likely auditory responsive, which was tested using light tapping on a hard surface. We did this for the first neuron in each mouse to demonstrate that even upon the first presentation of the pole sound, before it has taken on any meaning, auditory responsive neurons are already present. Second, the light cue was not universally effective at inducing each mouse to whisk, suggesting that only some mice interpreted its intended meaning. Regardless of whether mice whisked upon the light cue, no obvious neural responses to the light were observed, although a more detailed analysis is required to determine this conclusively. Third, most neurons that showed an obvious response to pole sounds also responded to alternative sounds played back via a speaker. These included beeping at various frequencies, forward and backward chirps, reward-valve clicking, moving motor sounds (used to position the pole), and recordings of both pole sounds. Fourth, arguing against a purely attentional account of these responses, we found that audio playback drove responses even during periods when the mouse was unengaged; these audio responses were clear even after the mouse had stopped engaging with the task for over 100 trials.

Discussion

In this study we establish L5 and upper L6 of S2 as a robustly multisensory area, strongly driven by touch and sound (Figure 3.5A). We eliminate alternative explanations of the auditory responses in terms of whisker reafference or touch (Figure 3.2). Comparing the earliest onset latencies reveals a distinct separation between touch responses, which arrive at 8 ms, and auditory responses, which arrive at 14 ms (Figure 3.3). We show that both touch and auditory responses exhibit a diversity of early, late, and sustained responses (Figure 3.4A, C). Comparing protraction and retraction Z-scored response magnitudes reveals that some touch neurons show reversed polarity for the two, but for those that do not, their response magnitudes are largely proportional (Figure 3.4B). The same analysis of pole-up and pole-down auditory responses reveals that more auditory neurons show reversed polarity, and these differences are more pronounced than for reversed-polarity touch neurons (Figure 3.4D). Further, sound responsive neurons show a stronger preference across sound type than touch neurons show across touch type (Figure 3.4D). Finally, we show that the numbers of touch and auditory responsive neurons are approximately the same, and that more neurons are tuned to both stimuli than to either stimulus alone (Figure 3.5B).

We further examined S2 using a new audio-playback-based variant of the object location task and collected an additional dataset that we have not yet analyzed in depth. However, we found through observation that naive mice also exhibit strong auditory responses.
We observed no obvious response to the light cue, suggesting that this is an auditory-specific response and not a generalized cue response. Finally, we observed that these auditory responses are not specific to pole sounds but are elicited by a variety of auditory playback, suggesting they are general auditory signals.

Multisensory integration

It is now well known that the cortex, even in primary sensory areas, is multisensory as a guiding principle (Fu et al., 2003; Komura et al., 2005; Ghazanfar and Schroeder, 2006; Bizley et al., 2007; Land et al., 2012; Rao et al., 2014; Maruyama and Komai, 2018; Couto et al., 2019; Zhang et al., 2020; Lohse et al., 2021). Even subcortical structures categorized as unimodal sensory thalamic nuclei demonstrate multisensory interactions (Khorevin, 1980; Komura et al., 2005; Bieler et al., 2018; Kimura, 2020; Lohse et al., 2021; Ansorge et al., 2021). Subthreshold multisensory responses in thalamus have also been observed when the non-primary modality is presented alone (Donishi et al., 2011; Kimura and Imbe, 2018). Moreover, both 'uni-modal' thalamus and cortex receive input from primary sensory cortices outside of their primary sensory modality (Lohse et al., 2021; Zhou et al., 2022). Multisensory interaction in these early sensory areas typically manifests as a modulatory effect on responses to the preferred sensory modality when it is presented simultaneously with the non-primary modality, and this effect is most often inhibitory (Bizley et al., 2007; Rao et al., 2014; Zhang et al., 2020). This class of multisensory input can be thought of as modulatory: it shapes responses to the primary modality but rarely reaches suprathreshold alone.

There is a vast amount of literature focused on somatosensory responses in auditory thalamus and cortex, but auditory responses in somatosensory areas are less studied and presumably less common. One study found an inhibitory effect of auditory stimulation in both S1 and S2 in mice (Zhang et al., 2020). This finding is distinctly different from ours, as most of the auditory responses they observed were subthreshold and primarily inhibitory. Previous research found that rat S2 contains a distinct multisensory zone that responds to sound, defined using a flat multi-electrode array placed on the cortical surface (Brett-Green et al., 2004; Menzel and Barth, 2005). While no corresponding organization has been described in mice, we suggest that the same organization could exist, and that our observations result from patching neurons in this region. Alternatively, all of S2 may be strongly driven by sound, at least in head-fixed, water-restricted, and actively behaving mice. Lastly, one study in the agouti found similar auditory-tactile neurons near the border of S2 and primary auditory cortex, consistent with our results (Santiago et al., 2018).

Source of auditory signal

There are a few possible sources of this auditory signal. First, the nearby anterior auditory field (AAF) shares a border with S2 (Zhang et al., 2020) and could drive activity through corticocortical projections. Auditory-driven responses in AAF have an onset latency of about 11 ms (Sołyga and Barkat, 2019), just 3 ms before the auditory responses we observe here. This timing aligns well with direct projections from AAF, making it the most likely source of the auditory responses. There are, however, alternative possibilities, as A1 projects to both VPN and POm (Zhou et al., 2022).
These seem unlikely to drive the responses we see, since VPN is not known to have strong auditory responses and POm strongly drives L4 of S2 (Viaene et al., 2011), where we observed the smallest proportion of auditory neurons. A projection from auditory cortex to S2 has not been established, but some connections from primary auditory cortex do target the forepaw region of S1 in rodents (Santiago et al., 2018; Godenzini et al., 2021).

Proposed analysis and details of additional dataset

Multisensory stimuli

For the additional dataset, some straightforward analyses can shed light on the functional properties of multisensory neurons in S2. The first is to evaluate which of the presented sounds best drive these neurons and whether their frequency tuning curves and preferred stimuli are homogeneous. Our initial experiments, which found neurons that respond with opposite polarity to pole-up and pole-down sounds, suggest that their tuning curves are not uniform, at least in trained mice. It would be interesting to test whether this also occurs in naive mice, before these sounds signify the start and end of a trial. In the same respect, it would be interesting to evaluate differences in preferred stimuli between trained and naive mice for behaviorally relevant sounds (pneumatic pole sounds, clicking of water reward delivery, and the sound of the Zabor motor moving) versus behaviorally irrelevant sounds (beeping and chirps). This is especially true considering the rather complex auditory tuning of some neurons, which show reverse responses for pole-up and pole-down sounds. Finally, evaluating whether the light cue elicits any responses at all can help distinguish whether these are 'cue' responses or truly innate auditory responses; the latter seems most likely.

Multisensory interactions

Preliminary analysis of neurons tuned to both touch and the pole-up sound revealed that at least three neurons exhibited a distinct interaction between the touch response and the time of first touch relative to the pole-up sound. Specifically, each of these three neurons showed a distinct increase in the touch response when touch occurred within 100-120 ms after the pole-up trigger (Figure S3.1). Accounting for the latency from the trigger to the pole-up sound (20 ms) and the latency from the sound to the peak (not the onset) of the auditory response (~21 ms), this means that approximately 59 to 79 ms after S2 activity sharply increases in response to the sound, touch responses in these neurons are upregulated by about two-fold. This result is preliminary and was only found in select multisensory neurons we analyzed. Even so, this discovery aligns with the theory of phase resetting, which suggests that sensory cortices integrate multiple streams of information by aligning phase oscillations (Ghazanfar and Schroeder, 2006; Lakatos et al., 2007; van Atteveldt et al., 2014; Bauer et al., 2020). Briefly, phase resetting is grounded in the concept of phase-dependent modulation, a phenomenon in which the phase of ongoing network-level neural activity affects the probability that a neuron integrates incoming activity and generates spikes, or, similarly, generates more spikes than it would otherwise (Ghazanfar and Schroeder, 2006; Bauer et al., 2020).

Figure S3.1 Touch response is modulated by time from pole-up sound. A) PSTH of pole-sound-aligned responses (top left) with statistically significant period (blue; Methods). Touch response across whisker angle at touch (bottom left). Raster aligned to touch onset (black axis) for first touches in a trial, with the 0-25 ms signal period (blue bar), sorted by time from pole onset; the blue line represents the significant pole sound response window. The rainbow line represents the time from pole onset, aligned to the red axis and read from the color bar (see example on right). B) Same as (A) but for all touches in a trial.
Put simply, during an up phase, network-level activity is increased, so if a neuron receives input during this phase, the network activity can combine with a sensory (or any other) neural signal, making the neuron more likely to fire spikes or generally increasing its spike rate. Phase resetting is the resetting of the phase of a given oscillation by a large burst of spikes, after which a stereotyped oscillation can upregulate other neural signals that arrive during the up phase (Ghazanfar and Schroeder, 2006; Bauer et al., 2020). It is theorized that one modality can use phase resetting to align incoming signals from another. This may explain what we see in these three example neurons, and it would put this auditory-induced oscillation between 12.7 and 16.9 Hz (Figure S3.1). If this is indeed phase resetting, then the effect should be evident in other touch neurons even if they are not modulated by sound. Evaluating this across all of the S2 neurons collected could help substantiate phase resetting theory under awake, behaving conditions. Finally, it is important to consider features that are collinear with touch onset, like pole location (Figure S3.1, theta heatmaps), to determine what best explains this finding.

Task engagement and attention-based sensory interactions

For about half of the recorded neurons in this dataset, we collected video recordings of pupil dynamics, which can be used as an indirect measure of animal state (Vinck et al., 2015; McGinley et al., 2015; Shimaoka et al., 2018). Further, trained mice show distinct periods in which they engage with the task. For a subset of multisensory neurons, we recorded over 300 trials during which mice were engaged with the task and then stopped performing it, while the behavior and sounds continued for over 100 further trials. Given that a hallmark of multisensory integration is its dependence on context and animal state (Shumkova et al., 2021; Haider et al., 2013; Land et al., 2012; Niell and Stryker, 2010; Vinck et al., 2015; McGinley et al., 2015; Shimaoka et al., 2018), it seems likely that auditory responses will covary with pupil dynamics and behavioral engagement. Understanding the relationship between animal state and multisensory responses could provide key insight into the functional purpose of these multisensory responses.

An important consideration for this behavior is anticipation of the pole-up. Due to the structure of the task, a Zabor stepper motor moves the pole into position before it is triggered to rise. These motors make a quiet but audible sound that stops a variable amount of time before the pole is triggered. Because of this, there is an unintentional cue, based on the end of the motor sound, that probabilistically predicts the pole-up trigger. Evaluating performance, whisking, pupil dynamics, and auditory responses along this axis could be highly informative, as it provides a way to determine how anticipation shapes each of these features.
Sequential Sound Presentation and Auditory Adaptation

In our investigation of the pole-up sound, we employed two distinct audio playback conditions to elucidate the role of these auditory stimuli. For the first, we included an audio-only trial that presented the pole-up sound five times in succession, each separated by a 250 ms interval. This design directly tests for adaptation between sequential presentations, an important control for interpreting the second condition. The second was a test condition dubbed 'double-pole-up fake trials' (Figure 3.1E, top right). In this setup, the pole-up sound was played as expected at the onset of the sampling period but was replayed 1.5 seconds later, during the reward period. If differences in the responses to these identical sounds exist, several explanations might account for them. They may be attributed to sensory adaptation from the initial playback, which can be tested using the first condition. Alternatively, they could be explained by disparities in attention and expectation, which can be tested by comparing across behavioral performance and pupil dynamics. A third explanation could be differences in sensorimotor features, like collecting a reward or whisking during the second presentation, which can be tested using the extracted whisker features and by comparing licking and non-licking trials. Testing each of these possibilities can offer critical information on the role of auditory signals in S2.

Paired neural recordings

Juxtacellular loose-seal recording using patch pipettes is considered a gold standard for single-unit isolation and offers high temporal fidelity. In our experiments, there were several instances in which two units were isolated simultaneously. In total, eight excitatory-excitatory pairs, three excitatory-inhibitory pairs, and one excitatory-unclassified pair were recorded. While this subset of recordings is small, excitatory-inhibitory pairs showed a clear inverse correlation, suggesting strong synaptic coupling. Examining the degree of this coupling and determining whether a similar correlation is observed in excitatory-excitatory pairs can shed light on how the observed response properties are influenced by shared connections.

Materials and Methods

Object-localization task

Outside of the unquantified observations on the additional dataset, which used the modified task with audio playback, all neural data reported were recorded using the original object localization task outlined in Chapters 1 and 2.

Modified object localization task with audio playback

As described in the text, we modified the original localization task to include audio playback and a light cue on 50% of all trials (Figure 3.1E). Audio-only playback occurred on 10% of all trials. In the remaining 40%, a fake pole-down sound was played, and on the following trial either a light cue (10% of all trials) or a fake pole-up sound (30% of all trials) was presented. In half of the fake pole-up trials (15% of all trials), another identical pole-up sound was played 1.5 seconds later. Five pole-up and five pole-down recordings were used for these trials, selected pseudo-randomly to prevent any undetected distinguishing feature of a single recording from signaling a fake pole-down trial. For light cue trials, the pole-up sound was substituted with a small blue LED illuminated for 120 ms.
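For illustration, the trial-type proportions above could be generated with a pseudo-random schedule like the minimal Python sketch below; the function and field names are hypothetical and not taken from the actual behavioral control software.

```python
import numpy as np

def build_trial_schedule(n_trials, seed=0):
    """Sketch of the trial-type proportions used in the modified task:
    50% standard, 10% audio-only, 10% light cue, 30% fake pole-up,
    with half of fake pole-up trials receiving a second pole-up sound."""
    rng = np.random.default_rng(seed)
    types = rng.choice(
        ["standard", "audio_only", "light_cue", "fake_pole_up"],
        size=n_trials,
        p=[0.50, 0.10, 0.10, 0.30],
    )
    schedule = []
    for t in types:
        trial = {"type": t, "second_pole_up_sound": False}
        # Half of fake pole-up trials (15% of all trials) replay the
        # pole-up sound 1.5 s later, during the answering period.
        if t == "fake_pole_up" and rng.random() < 0.5:
            trial["second_pole_up_sound"] = True
        schedule.append(trial)
    return schedule

schedule = build_trial_schedule(500)
```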
Six possible audio-only trials were played back to test responses to other auditory stimuli. These consisted of five different sets of randomly ordered beeping tones (Figure S3.2B) or a single audio file containing five consecutive reward-valve clicks, five consecutive pole-up sounds, the Zabor motor sound, a linear chirp ascending in frequency, the motor sound again, and a linear chirp descending in frequency.

We used Psychtoolbox in MATLAB, run on a mid-2015 MacBook Pro (15-inch, 2.8 GHz quad-core Intel Core i7, 16 GB DDR3 RAM). Single-channel audio data, recorded at a 96 kHz sampling rate with an M50 measurement microphone, was paired with the video trigger in the other channel for data alignment. An Arduino facilitated bitcode interpretation from the behavioral computer, ensuring trial data alignment, and communicated trial information to the audio computer. A signal initiated audio recording slightly before video, neural, and behavioral data capture, and the playback audio was loaded into memory in preparation for the trial. Subsequently, camera trigger times and trial numbers were used to align the data.

Audio frequency response balancing

We used an Earthworks M50 measurement microphone to record playback sound and to record all audio from our experiments. The M50 offers a balanced frequency response from 50 Hz to 50 kHz, which allowed us to record and play back frequency-balanced audio. The speaker used for playback was the ADAM Audio AX5, which has a frequency range of 50 Hz to 50 kHz. To create frequency-balanced sound that could fool mice into thinking the pole had gone down even when it had not, we first recorded the actual sounds with the microphone positioned where mice were head-fixed. We recorded behavioral sounds for the Zabor motor moving, the reward valve clicking, and the pole-up and pole-down sounds. For pole sounds, we made five recordings of each across the span of about 100 trials, to prevent any spontaneous sound or variation over a session from cueing mice that the playback was a recording. Each audio recording was independently frequency balanced using a semi-automatic iterative filtering process. An example is displayed in Figure S3.2 for a recording of beeping tones randomly ordered in time but equally spaced in the frequency domain.

Figure S3.2 Frequency-balanced audio using iterative filter application. A) Normalized amplitude difference between the original audio and recorded playback, before and after iterative frequency response balancing. B) Frequency response curves before (top) and after (bottom) balancing for beeping sounds at different frequencies.

For balancing audio, we first played back the sound with the microphone placed where the head of the mouse would be. Next, we trimmed this recording using automatic template matching, generated its frequency response curve, and subtracted this from the frequency response curve of the original recording (Figure S3.2A). We multiplied this 'ideal filter' by a learning rate between 0.1 and 0.3, added one, and then multiplied by the previous filter. Next, we fit a Yule-Walker filter using MATLAB's yulewalk function. Lastly, we filtered the original audio signal using the fit coefficients and MATLAB's filter function. Filters were updated using the following formula:

$$f_k = f_{k-1} \cdot \left(1 + \alpha_1 \cdot (A - R_k)\right)$$

where $f_k$ is the filter function at the k-th iteration, $f_{k-1}$ is the filter function at the previous (k-1)-th iteration, $\alpha_1$ is the learning rate, $A$ is the frequency response of the original audio recording, and $R_k$ is the frequency response of the audio recorded at the k-th iteration, after applying the most recent filter.
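To illustrate the update rule, here is a minimal numpy sketch of one balancing iteration, assuming matched-length original and playback recordings and a coarse magnitude spectrum on a fixed frequency grid; the helper names and grid size are placeholders, and in practice the resulting target response was fit and applied with MATLAB's yulewalk and filter functions as described above.

```python
import numpy as np

def magnitude_response(audio, fs, n_points=512):
    """Coarse magnitude response: |FFT| interpolated onto a fixed frequency grid."""
    spec = np.abs(np.fft.rfft(audio))
    freqs = np.fft.rfftfreq(len(audio), d=1.0 / fs)
    grid = np.linspace(0, fs / 2, n_points)
    resp = np.interp(grid, freqs, spec)
    return resp / resp.max()  # normalized amplitude

def update_filter(prev_filter, original, playback_recording, fs, alpha=0.2):
    """One iteration of f_k = f_{k-1} * (1 + alpha * (A - R_k))."""
    A = magnitude_response(original, fs)              # response of the original recording
    R_k = magnitude_response(playback_recording, fs)  # response of the re-recorded playback
    return prev_filter * (1.0 + alpha * (A - R_k))

# Usage sketch: start from a flat filter and iterate until A and R_k converge;
# the resulting response would then be fit with an IIR design (e.g., Yule-Walker)
# and applied to the playback audio.
fs = 96_000
f_k = np.ones(512)
# f_k = update_filter(f_k, original_audio, recorded_playback, fs)
```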
Behavior, videography, and electrophysiology

All behavior, animal training, videography, and electrophysiology were recorded using procedures identical to those in Chapter 2.

In vivo loose-seal juxtacellular recordings

All recording procedures were identical to those described in Chapter 2, other than the targeted recording location.

Targeting S2

All animals used were transgenic mice (VGAT-ChR2-EYFP) expressing channelrhodopsin in inhibitory neurons. Mice underwent surgery to allow head-fixed training and were trimmed to one whisker (C2), which was maintained throughout training and recording. Intrinsic signal imaging was used to map the C2 barrel of S1 using a custom-designed, electromagnet-driven silent whisker stimulator to avoid activating auditory cortex. S2 was targeted using coordinates (0.338 mm posterior and 1.284 mm lateral) derived from an image in Maatuf et al., 2016, who used voltage-sensitive dye imaging to map S1 and S2 in a control mouse while stimulating the C2 whisker. Targeting was supplemented by any signal changes observed near the expected target during intrinsic signal imaging (Figure 3.1F). Prior to recording, a small craniotomy (200–300 μm) was made above the recording location in isoflurane-anesthetized mice. Blind juxtacellular loose-seal patch recordings were targeted to L4 through L6, from ~400 μm to ~1200 μm below the pia.

Neural tuning classification

Across touch, sound, and whisking responses, significant responses were determined in the same way. For touch and whisking, signal windows were defined for early (0-25 ms), late (25-50 ms), and sustained (0-50 ms) periods. Pole response periods were shifted 25 ms forward to account for the delay from pole trigger to pole sound. Baseline periods were defined between -100 and -20 ms from each event. The sum of spikes was taken across each signal window and baseline period, resulting in two arrays, each of length equal to the number of events. We performed a paired two-sample t-test between these arrays and defined responses as significant at P < 0.01. If a neuron had more than one significant signal period, we classified it by the period with the lowest P-value.
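A minimal sketch of this window-versus-baseline test is shown below, assuming spike times are stored per event relative to event onset; the function and variable names are illustrative rather than taken from our analysis code.

```python
import numpy as np
from scipy import stats

def classify_response(spikes_per_event, windows=None, baseline=(-100, -20), alpha=0.01):
    """Paired t-test of per-event spike counts in each signal window vs. baseline.
    Returns (window_name, p_value) for the significant window with the lowest
    P-value, or None if no window reaches significance."""
    if windows is None:
        windows = {"early": (0, 25), "late": (25, 50), "sustained": (0, 50)}

    def counts(lo, hi):
        return np.array([np.sum((s >= lo) & (s < hi)) for s in spikes_per_event])

    base = counts(*baseline)
    best = None
    for name, (lo, hi) in windows.items():
        t_stat, p = stats.ttest_rel(counts(lo, hi), base)
        if p < alpha and (best is None or p < best[1]):
            best = (name, p)
    return best

# Hypothetical usage: spike times (ms, relative to each event onset) for 200 events.
rng = np.random.default_rng(0)
events = [np.sort(rng.uniform(-100, 50, size=rng.integers(0, 6))) for _ in range(200)]
print(classify_response(events))
```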
Chapter 4: Whisker Automatic Contact Classifier (WhACC) with Expert Human-Level Performance

Abstract

The rodent vibrissal system remains pivotal in advancing neuroscience research, particularly for studies of cortical plasticity, learning, decision-making, sensory encoding, and sensorimotor integration. While this model system provides notable advantages for quantifying active tactile input, it is hindered by the labor-intensive process of curating touch events across millions of video frames. Even with the aid of automated tools like the Janelia Whisker Tracker, millisecond-accurate touch curation often requires >3 hours of manual review per million video frames. We address this limitation by introducing the Whisker Automatic Contact Classifier (WhACC), a Python package designed to identify touch periods from high-speed videos of head-fixed behaving rodents with human-level performance. For our model design, we train ResNet50V2 on whisker images and extract features. Next, we engineer features to improve performance, with an emphasis on temporal consistency. Finally, we select only the most important features and use them to train a LightGBM classifier. Classification accuracy is assessed against three expert human curators on over one million frames. WhACC shows pairwise touch classification agreement on 99.5% of video frames, equal to between-human agreement. Additionally, comparison between an expert curator and WhACC on a holdout dataset comprising nearly four million frames and 16 single-unit electrophysiology recordings shows negligible differences in neural characterization metrics. Finally, we offer an easy way to select and curate a subset of data to adaptively retrain WhACC. Including this retraining step, we reduce the human hours required to curate a 100 million frame dataset from ~333 hours to ~6 hours.

Introduction

Quantitative analysis of behavior is an essential method in systems neuroscience research. High-speed video is a commonly used format for recording behavior. The resulting datasets are often large and, if they require manual curation, the analytical time cost is large as well. One such pain point is found in investigations using the rodent whisker system, a common model for investigating neural representations of tactile perception and sensorimotor integration. As the stimuli used in this field have evolved from passive whisker deflections in anesthetized rodents to active whisker touches during behavior, new methods for quantifying these stimuli are required. Here we address a specific and challenging problem: fully automated whisker touch detection from videography.

Rodents are highly tactile creatures that sweep an array of whiskers forward and back during many behaviors, including locomotion (Sofroniew and Svoboda, 2015; Grant et al., 2018), social interaction (Rao et al., 2014), and object investigation (Cheung et al., 2019; Cheung et al., 2020; Kim et al., 2020). When whiskers touch something, the resulting forces drive mechanotransduction in the follicle and propagation of neural activity to higher-order brain regions. The neural circuitry displays remarkable temporal precision (Jadhav et al., 2009; Hires et al., 2015; Bale and Maravall, 2018), so identifying the time of touch with millisecond resolution is crucial for investigations of tactile processing. Demonstrating this importance, automated classification programs such as Biotact (Lepora et al., 2011), the Janelia Whisker Tracker (Clack et al., 2012), and related tools (Knutsen et al., 2005; Voigts et al., 2008; Perkon et al., 2011; Towal et al., 2011; Betting et al., 2020) have been developed to identify when, where, and how hard whiskers are touching objects in head-fixed and freely moving animals, as well as in full-field, reduced-whisker, and single-whisker paradigms. These whisker tracking methods provide faster touch classification than hand scoring, but none achieve maximally accurate touch classification without a second stage of manual curation of classification results.
In the simple case of the Janelia Whisker Tracker applied to head-fixed, single-whisker touch classification, our second-stage manual curation process requires approximately 3 hours and 20 minutes per million video frames. To alleviate this time burden, we developed, trained, and validated a hybrid convolutional neural network and gradient-boosted machine model to accurately identify single-whisker touch directly from video frames in a fully automated manner.

Design and Implementation

The overall design goal was to rapidly classify high-speed video of head-fixed object localization in mice and accurately identify periods when a whisker was touching a presented object. The imaging viewpoint was overhead from a single camera, with the whisker pad, whisker, and object (a thin pole) backlit by a diffuse infrared light-emitting diode (Figure 4.1A). The resulting touch labels can then be used to characterize electrophysiological responses of neurons to touch (Figure 4.1A, inset). A full-frame video is useful for determining various kinematic features of whisker motion and deformation, such as velocity and curvature from bending. However, the most relevant information for touch identification is found in a small window of pixels around the object. Additionally, small images reduce data size and speed up the training process. Therefore, video input to the model was cropped by extracting a 61x61 pixel window centered on the touched object across all video frames (Figure 4.1A, red box). This eliminated extraneous and idiosyncratic image data (e.g., fur or stubs of other whiskers) that could impede model performance by fitting to unreliable features. Independently centering the object on each frame also accommodates the potential translation of the touched object across the field of view (e.g., when the object is being presented, withdrawn, or is vibrating).

When discriminating challenging touches, human curators often scroll forward and back between a few frames to infer when touch onset or offset occurs from the change in whisker displacement between frames. To provide the classifier with this temporal information, we overlaid three consecutive grayscale frames into three color channels (cyan, magenta, and yellow), such that the label at time t had access to the image at time t and lag images at times t-1 and t-2 (Figure 4.1B, left). To improve the generality of our training set to different imaging conditions and orientations, we augmented images with rotation, noising, brightness, and magnification adjustments (Figure 4.1B, right). These sets of images were used as inputs to a variety of convolutional neural networks (CNNs) (Figure 4.1C). We extracted all 2,048 available features from the penultimate layer of the best performing model and liberally engineered additional features based on the standard deviation, smoothing, shifting, and discrete difference of different window/step sizes, resulting in 40 additional sets of 2,048 features, for a total of 83,968 features (Methods). Additionally, we calculated the standard deviation of each of the 41 feature sets across feature space, for a total of 84,009 features (Figure 4.1D, left).
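The pre-processing described above (an object-centered crop and layering of consecutive frames into color channels) can be sketched as follows; the array names, crop handling, and example values are illustrative rather than the exact WhACC implementation.

```python
import numpy as np

def crop_around_object(frame, center_xy, size=61):
    """Extract a size x size window centered on the tracked object."""
    half = size // 2
    x, y = center_xy
    return frame[y - half:y + half + 1, x - half:x + half + 1]

def make_lag_image(frames, t):
    """Stack grayscale frames t-2, t-1, t into three channels so the
    classifier at time t also sees the two preceding frames."""
    return np.stack([frames[t - 2], frames[t - 1], frames[t]], axis=-1)  # (H, W, 3)

# Hypothetical usage on a (n_frames, H, W) grayscale video with a tracked pole center:
video = np.zeros((100, 400, 600), dtype=np.uint8)
pole_center = (300, 200)  # (x, y), e.g., from template matching
crops = np.array([crop_around_object(f, pole_center) for f in video])
lag_input = make_lag_image(crops, t=50)
```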
Once further feature removal degraded mean model performance, we had a total of 2,105 features remaining (Figure 4.1D, right). Figure 4.1 Flow diagram of WhACC video pre-processing and design implementation. A) Sample touch frame from high-speed (1000 fps) video and extracted object-centered window for CNN input (red box) and corresponding spike train from a touch responsive neuron (inset) B) Three consecutive extracted frames combined into three color channels (left) and example augmented images (right). C) Representation of ResNet50V2 model used to extract features. D) Demonstrative sample of features extracted from ResNet50V2 (left), representation of feature engineering (center) and feature selection (right) for final WhACC model. E) Final WhACC model was trained using LightGBM with Optuna to achieve the best performance. σ μ D E A B C t-2 t-1 t Feature engineering Extracted features Feature selection ResNet50V2 Images augmented 10 times Temporally layer images Extract frame Time (ms) Features 150 ms 1 mV Touch Touch 79 Establishing touch ground truth and error metrics Training a supervised classifier requires a set of inputs and a corresponding set of accurately classified labels. However, there is no independent ground truth for whether a whisker is touching the object or not on each video frame. Instead, we used a ‘majority rule’ on the output of a panel of three expert human curators as a proxy for ground truth to train the model (Figure 4.2A). To maximize accuracy, human curators were not limited to viewing the cropped images. Instead, we performed whisker tracing and linking from full field images with the Janelia Whisker tracker (Clack et al., 2012). We performed a first pass automated touch estimation based on extracted whisker curvature and estimated distance to pole. Humans then examined these estimated time series variables to screen and correct clear obvious misclassifications. Finally, for more challenging frames, humans inspected full, zoomed, and frame-by-frame difference images of touch and near touch periods with a visual browser and corrected scoring Figure 4.2 Touch frame scoring and variation in human curation. A) Example of disparity between three human curators. Majority touch (dark blue), majority non-touch (light blue) were used for training models. Consensus frames (green) were used when evaluating curator versus paired consensus in C. B) Example of scored touch array (human majority) and the corresponding edge errors (deduct and append), and touch count errors (split, ghost, miss and join). C) Individual and mean error rate for each human curator compared against the consensus of the other two curators for touch count errors (left) and edge errors (right). Consensus Majority not touch Majority touch B AC Actual Deduct Append Split Ghost Miss Join Error type Touch count 2 2 2 3 3 1 1 Touch Non-touch Touch onset Ghost Miss Join Split 0.00 0.01 0.02 0.03 0.04 0.05 0.06 Touch count errors per touch Human error rate Individual Mean Append Deduct 0.0 0.1 0.2 0.3 0.4 0.5 0.6 Edge errors per touch Video frames Video frames 80 until satisfied. Human labels were applied to the test set of cropped images used for model evaluation. All three curators agreed on 99.46% of test set frames. Not all errors are created equal. We identified six distinct error types (Figure 4.2B). Four error types (splits, ghosts, misses, and joins) were classified as ‘touch-count errors’ because they affected the total number of touches. Touch count errors arise in four distinct ways. 
Misclassifying a frame in the middle of a touch event splits the touch into two separate events. Conversely, misclassifying a frame in the middle of a non-touch period creates a ghost touch, where no touch actually occurred. Failing to label any frames within a touch event results in a missed touch, while labeling a non-touch period between two touch events joins them into one. The other two errors are edge errors, where the length of a touch is shortened (deducts) or lengthened (appends) by mislabeling the start or end frames. These errors have different functional consequences for analysis of neural responses. Touch count errors propagate to touch-evoked firing rates and peri-stimulus time histogram (PSTH) structure, while edge errors largely maintain touch-evoked firing rates but degrade touch-evoked latency and jitter measurements.

To determine the human error rate, we compared single curators against the consensus frames of the remaining two curators. Most human errors were edge errors, which occurred approximately 7 times more frequently than touch count errors (Figure 4.2C). During development of WhACC, model performance was evaluated across a range of error metrics, including area under the receiver operating characteristic curve (AUC), percent correct, touch count errors per touch (TC-error), and edge errors per touch. Our goal was to reduce TC-error because it most strongly degrades important measures like the total number of touches (Cheung et al., 2019), angle at touch (Cheung et al., 2020), and the number of spikes evoked at touch onset (Hires et al., 2015).

Selection of training, validation, and test set

Our goal was to produce a reliable whisker-pole touch classifier with expert-human-level performance under a wide range of imaging conditions, all while minimizing the time burden of curating by hand. To support generalization of model performance, our initial training and validation sets came from two experimentalists working on the same apparatus. In contrast, the test set comprised imaging data collected by two other people on different apparatuses, across two labs and eight years. These datasets were used to train and select the base CNN model and the changes to the input image data (i.e., augmentation and lag images). For our final model, WhACC, we pooled all our data and generated new training, validation, and test sets to sample equally from the diversity of videos. To prevent overfitting, the data were assigned to each set based on video identity, ensuring that adjacent frames were included in only one dataset. To ensure reliable performance on new data, WhACC was validated on an additional holdout set of 16 different sessions of 100 trials each (~4 million frames), across different mice and different times, all of which were curated by a single curator. These data were not used for any stage of training, early stopping, or feature selection, and therefore serve as a reliable expectation of WhACC's performance.

The training and validation sets were selected with a bias towards more touch-informative frames. In the majority of imaged frames, the whisker is not present within the cropped window. These frames should be easily classified and are thought to provide mostly redundant training information. Thus, we underweighted their presence by removing many, but not all, of them. To select more informative frames, we included all frames labeled as touch by human curation and any frames within 80 frames of a touch, discarding the rest (80-border extraction) (Figure 4.3A).
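A minimal sketch of this style of border extraction (keeping every touch frame plus any frame within a set number of frames of one) is shown below; the variable names are illustrative.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def border_extract(touch_mask, border=80):
    """Return indices of frames that are touch frames or lie within
    `border` frames of one (e.g., 80-border extraction)."""
    touch_mask = np.asarray(touch_mask, dtype=bool)
    keep = binary_dilation(touch_mask, iterations=border)
    return np.flatnonzero(keep)

# Example: a 1,000-frame video with one touch spanning frames 500-520.
touch = np.zeros(1000, dtype=bool)
touch[500:521] = True
kept_80 = border_extract(touch, border=80)  # frames 420-600
kept_3 = border_extract(touch, border=3)    # stricter 3-border set used for augmentation
```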
This eliminated many, but not all, 'no-whisker' frames, while retaining all touch and most near-touch frames. Including some no-whisker frames was useful to provide baseline information for the negative class in the absence of a whisker. To increase model generalizability, we generated 10 augmented images for each original image. To maintain training and data storage efficiency, we used a stricter selection criterion for these frames, restricted to all frames within three frames of any touch frame (Figure 4.3A). We hypothesized that these near-touch and touch frames would have much greater variation and that augmenting them would therefore help the model generalize better. Overall, the training set was ~676,000 frames and the validation set was ~282,000 frames, with a balanced set of classes (Figure 4.3B). The test dataset was not reduced at all and consisted of ~780,000 frames that matched the distribution of a real dataset. In each dataset, only the frames where the object was within reach were used.

Figure 4.3 Data selection and model performance. A) Data selected for un-augmented (green) and augmented images (blue), using all frames within 80 or 3 frames of majority-scored touch frames, respectively. B) Composition of training, validation, and test datasets used to train each CNN. C) Performance of four CNN models across three different image modification approaches, and the final WhACC model, on the separate test set described in Methods.

Model selection and evaluation

To assess the feasibility of automated contact curation, we trained and tested four different fully unfrozen base models: ResNet50V2 (He et al., 2016), Inception-v3 (Szegedy et al., 2016), MobileNetV3-Large, and MobileNetV3-Small (Howard et al., 2019). We selected these four models because they are relatively lightweight, are pre-trained on ImageNet, and have a well-established track record in addressing a wide range of problems across various domains. For each base model, we either used the original images or a set of augmented images combined with the original images. Finally, for each combination of these, we compared models trained with original images versus images where the three color channels contained frames from times t, t-1, and t-2 (Figure 4.1B, left). We refer to these as lag images because they contain frames that lag behind the timepoint we are predicting. Out of the resulting 16 unique models evaluated in Tables 4.1 and 4.2, we discuss 12 in detail, focusing on the ones that showed the greatest improvement with each sequential step. The remaining four models, trained on lag images with no augmentation, were excluded from the discussion as they did not perform as well as augmented images alone, but they still showed improvement over original images. TC-error and AUC were moderately correlated (r = -0.52); consequently, there were cases where the model with the highest AUC was not the same as the model with the lowest TC-error.
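To make the TC-error idea concrete, the sketch below counts contiguous touch events and tallies ghost, miss, split, and join errors from the overlap between predicted and human-labeled events; this is an illustrative approximation of the metric, not the exact WhACC scoring code.

```python
import numpy as np

def touch_events(mask):
    """Return (start, end) index pairs of contiguous touch segments."""
    mask = np.asarray(mask, dtype=int)
    d = np.diff(np.concatenate(([0], mask, [0])))
    return list(zip(np.flatnonzero(d == 1), np.flatnonzero(d == -1)))

def touch_count_errors(true_mask, pred_mask):
    """Approximate ghost/miss/split/join counts from event overlaps."""
    true_ev, pred_ev = touch_events(true_mask), touch_events(pred_mask)

    def overlaps(a, b):
        return a[0] < b[1] and b[0] < a[1]

    ghosts = sum(not any(overlaps(p, t) for t in true_ev) for p in pred_ev)
    misses = sum(not any(overlaps(t, p) for p in pred_ev) for t in true_ev)
    # Extra predicted events overlapping one true event count as splits.
    splits = sum(max(0, sum(overlaps(p, t) for p in pred_ev) - 1) for t in true_ev)
    # Extra true events covered by one predicted event count as joins.
    joins = sum(max(0, sum(overlaps(p, t) for t in true_ev) - 1) for p in pred_ev)

    n_touches = max(len(true_ev), 1)
    return (ghosts + misses + splits + joins) / n_touches  # TC-errors per touch

true = np.array([0, 1, 1, 1, 0, 0, 1, 1, 0])
pred = np.array([0, 1, 0, 1, 0, 0, 0, 0, 0])  # one split and one miss
print(touch_count_errors(true, pred))  # 1.0 (2 errors / 2 true touches)
```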
Due to WhACC's intended use case, all models were selected based solely on TC-error, to emphasize temporal consistency. Training each model using single-time-point images without augmentation showed poor results overall; the best model, Inception-v3, achieved a TC-error of 1.521 and an AUC of 0.820 (Table 4.1). Adding augmented images to the training data decreased TC-error in all base models except Inception-v3. ResNet50V2 performed best for this variation, with a TC-error of 1.464 and an AUC of 0.964 (Table 4.1).

Table 4.1 Performance metrics across models after median smoothing. Performance metrics across all tested model variants prior to any feature engineering. Bold text indicates the best performance for that metric within each image modification approach; yellow highlighted text indicates the best performance for that metric across all model variants. 'Lag' and 'Aug' indicate that lag and augmented images, respectively, were included in training these models. All metrics were calculated after median smoothing predictions with a window of five. These results are from the test set outlined in Figure 4.3B, which consists of visually different data from another laboratory.

Finally, using the same set of augmented and original images, we changed the input images to contain frames from previous time points in the color channels. Incorporating this temporal information decreased TC-error for the deeper models, Inception-v3 and ResNet50V2. Interestingly, for both MobileNetV3 models, adding temporal information via lag images worsened performance as measured by TC-error but improved performance as measured by AUC. The increased TC-error could be due to the depthwise separable convolution architecture that makes MobileNet so efficient. The best performing model was again ResNet50V2, with a TC-error of 0.402 and a corresponding AUC of 0.981. Despite having a better AUC of 0.989, Inception-v3 had a slightly worse TC-error of 0.512 (Table 4.1). Therefore, we selected ResNet50V2 as the best model for our purposes.

Table 4.2 Performance metrics across models without median smoothing. Same as Table 4.1 but without median smoothing.

While TC-error is the most important measure, edge errors per touch can also negatively impact analysis of neural data. Edge errors per touch were slightly negatively correlated with TC-error (r = -0.11). Examining image modification approaches revealed that, on average, augmentation with lag images showed the best performance for TC-error but the worst performance for edge errors; this suggests some tradeoff between these two types of errors (Table 4.1).

The above results showed that our overall approach to automated touch classification was feasible but needed improvement to match expert-human-level performance. To improve overall performance, we implemented a two-stage hybrid model. The first stage involved extracting 2,048 features from the penultimate layer of the best performing model, which was ResNet50V2 trained on augmented and lag images. These features were then input into a LightGBM classifier (Ke et al., 2017). We chose LightGBM due to its excellent performance across a range of classification and regression problems, as well as its fast and lightweight nature.
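A minimal sketch of this two-stage idea using Keras and LightGBM is shown below; note that in WhACC the feature extractor was first trained on whisker images, whereas here the ImageNet weights, layer choice, and hyperparameters are placeholders.

```python
import tensorflow as tf
import lightgbm as lgb

# Stage 1: CNN feature extractor (2,048-d global-average-pooled features).
feature_extractor = tf.keras.applications.ResNet50V2(
    include_top=False, pooling="avg", weights="imagenet",
    input_shape=(96, 96, 3))

def extract_features(images):
    """images: float array scaled to [-1, 1], shape (n, 96, 96, 3)."""
    return feature_extractor.predict(images, verbose=0)  # shape (n, 2048)

# Stage 2: gradient-boosted classifier on the extracted (and engineered) features.
clf = lgb.LGBMClassifier(n_estimators=500, learning_rate=0.05)

# Hypothetical usage with pre-cropped lag images and human touch labels:
# X_train = extract_features(train_images)
# clf.fit(X_train, train_labels)
# touch_prob = clf.predict_proba(extract_features(test_images))[:, 1]
```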
Preliminary results showed a noticeable increase in overall performance, reducing TC-error to 0.307 (Table 4.1). However, to achieve expert-level performance, we sought insights from human curators to better understand how to improve the system using temporal information. So far, our approach to touch classification was limited to integrating temporal information over three frames. However, expert human curators can integrate information over longer periods of time. A curator can intuitively infer times of high touch probability to bias their choice, since touches occur in clusters. They can also bias their confidence in a touch frame based on previously identified touch frames, since touches most often occur in segments of many frames in a row. For example, if the last 20 frames were identified as touch frames, the likelihood that the next frame is a touch frame is much higher than for a randomly sampled frame. Moreover, curators can distinguish onset and offset touch frames by identifying sudden changes in features like whisker speed and whisker bending, comparing them to earlier frames that act as a visual baseline for the change. Our model did not have access to these cues, so we engineered features to capture more of them.

Building off the original 2,048 extracted features, we engineered features that improved model performance (Figure 4.3C). How we engineered features was informed by the curator strategies outlined above. They included forward and backward shifts of up to 5 frames, rolling means and rolling standard deviations with windows ranging from 3 to 61 frames, and discrete differences between frames ranging from -50 to 50 frames apart (Methods). Frame shifting and small-window rolling mean operations should inform the current frame based on features of surrounding frames (Figure 4.4B, C). Larger-window rolling means and larger-window rolling standard deviations should contain information about regions of high touch probability or clusters of touches (Figure 4.4C, D). Smaller-window rolling standard deviations and discrete differences can reveal sudden changes in features, which can help identify onset and offset times (Figure 4.4D, E). Finally, by taking the standard deviation across feature space, we get a measure of dispersal unique to each feature set, each of which can inform touch properties in its own way (Figure 4.4F).

Combined with the original features, we now had a total of 84,009 features. To reduce model complexity and memory demands, we reduced the total number of features using recursive feature elimination (Methods). Our original training data could not fit into memory with all the features, so we divided the data into 10 training and validation sets and fit 10 LightGBM classifier models for each feature selection step. Because of how we split our data, these performance metrics are not directly comparable to previous figures (Methods). First, each model was trained with the original 2,048 features. Next, we trained each model on the full set of 84,009 features. Adding engineered features reduced error by over 50% (Figure 4.4G). For the full feature set model, most features had an embedded feature importance of 0 for all 10 models (i.e., not a single model used these features), so we eliminated all of these features for the next iteration, leaving 28,913 remaining. We continued to reduce features by selecting those that passed a threshold until there was a negative impact on the mean performance of the 10 models. In total, we isolated 2,105 high-value features.

Figure 4.4 Feature engineering and selection. A) The original 2,048 features extracted from the penultimate layer of ResNet50V2, (zoom) enlarged for detail (white box). Additional features generated by (B) shifting, (C) smoothing, (D) taking the rolling standard deviation, and (E) taking the discrete difference for each of the original 2,048 features. F) Standard deviation of the original and 40 additional engineered feature sets across feature space (columns). G) Model performance across feature engineering and reduction (feature selection).
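A minimal pandas sketch of this style of temporal feature engineering (shifts, rolling statistics, and discrete differences applied to each CNN feature, plus a per-set standard deviation across feature space) is shown below; the window and step values are a reduced, illustrative subset of those described above, not the exact WhACC settings.

```python
import numpy as np
import pandas as pd

def engineer_features(cnn_features):
    """cnn_features: (n_frames, 2048) array of per-frame CNN features.
    Returns a DataFrame of shifted, smoothed, rolling-SD, and differenced
    copies of every feature, plus a per-set SD across feature space."""
    base = pd.DataFrame(cnn_features)
    sets = {"original": base}

    for s in [-2, -1, 1, 2, 5]:                 # forward/backward shifts
        sets[f"shift_{s}"] = base.shift(s)
    for w in [3, 11, 41]:                       # rolling mean windows
        sets[f"mean_{w}"] = base.rolling(w, center=True, min_periods=1).mean()
    for w in [3, 11, 41]:                       # rolling standard deviation windows
        sets[f"sd_{w}"] = base.rolling(w, center=True, min_periods=1).std()
    for d in [-10, -1, 1, 10]:                  # discrete differences
        sets[f"diff_{d}"] = base.diff(d)

    out = pd.concat(sets, axis=1)
    # One extra column per feature set: SD across feature space on each frame.
    for name, df in sets.items():
        out[(name, "sd_across_features")] = df.std(axis=1)
    return out

features = engineer_features(np.random.randn(200, 2048))
```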
In total, we isolated 2,105 Figure 4.4 Feature engineering and selection. A) The original 2048 features extracted from the penultimate layer of Resnet 50 V2, (zoom) enlarged for detail (white box). Additional features generated by (B) shifting, (C) smoothing, (D) taking the rolling standard deviation, and (E) taking the discrete difference for each of the original 2048 features. F) Standard deviation of the original and 40 additional engineered feature sets across feature space (columns). G) Model performance across feature engineering and reduction (feature selection). Original 2,048 full 84,009 29,913 18,445 6,730 4,353 3,624 3,029 2,105 Number of features 0.0000 0.0005 0.0010 0.0015 0.0020 0.0025 1 - AUC Individual models Mean Step size Step size Window size Window size 1 512 1024 1536 2048 0 50 100 150 -5 -4 -3 -2 -1 1 2 3 4 5 Shift 37 11 15 21 41 61 Rolling mean 37 11 15 21 41 61 Rolling standard deviation Original Resnet50V2 2048 features A B -50 -20 -10 -5 -4 -3 -2 -1 1 2 3 4 5 10 20 50 Discrete difference F G CD Discrete difference Rolling SD Rolling mean Shift Original 0 50 100 150 E Features Standard Deviation of each feature set Time (ms) Time (ms) Touch 89 high value features. The mean performance of the models trained on this selection was indistinguishable from those trained on the full feature set (Figure 4.4G). Using the selected 2,105 features, we trained a final LightGBM classifier using Optuna (Akiba et al., 2019), to optimize the hyperparameters over 100 Optuna trials (Methods). The final model, which we named WhACC, performed much better than any of the CNN models alone (Figure 4.3C). To compare WhACC with expert curators, we compared all consensus frames of two curators against either the other curator or against WhACC. We did this for each combination of two curators for a total of three comparisons. This method of comparison reveals the human error rate and establishes a realistic performance ceiling. Overall, WhACC made fewer TC-errors on average. For each type of touch count errors, WhACC either had similar or less variability compared to the human error rates (Figure 4.5A). WhACC made more edge errors per touch on average, specifically because of more deduct errors. For both types of edge errors WhACC showed less variability compared to expert curators (Figure 4.5B). On average WhACC performs as well as a human curator or slightly better when evaluating on TC-errors but has a slight bias towards shortening the length of touches (Figure 4.5C). Finally, WhACC shows a nearly Figure S4.1 WhACC curation GUI. WhACC curation GUI used to curate a sample of data for later retraining of LightGBM model. Touch (green), non-touch (red) and unclassified (black, not shown) can be assigned using up arrow, down array and the ‘1’ key respectively. Right and left arrows scan through frames. 90 identical but slightly larger mean percent correct frames compared to expert curators (Figure 4.5D). We've demonstrated that WhACC performs comparably to an expert curator. However, given the variability of video data between labs, experimental apparatuses, experimenters, and over time, we need to ensure its effectiveness on future datasets (e.g., data drift). To account for these differences, we developed a retraining system. First, we automatically sample video frames based on user-defined time when the pole is within reach. Using the tracked object position data, frames are automatically sampled equally based on the object's location in each video. 
Next, we Figure 4.5 – WhACC shows expert human level performance. A) Human vs WhACC touch count error rate for each error type (top) and in total (bottom), error bars indicate 95% CI. B) Same as A for edge errors. C) Difference in error rate for human versus WhACC. Negative values indicate WhACC outperforming human curators on average. D) Percent correct for individual and mean performance of human curators versus WhACC. Human WhACC 0.00 0.01 0.02 0.03 0.04 0.05 0.06 Touch count errors per touch Split Ghost Miss Join Human WhACC 0.0 0.1 0.2 0.3 0.4 0.5 0.6 Edge errors per touch Deduct Append −0.04 −0.02 0.00 0.02 0.04 0.06 0.08 WhACC - Human (error rate) Human WhACC 0.055 0.060 0.065 0.070 0.075 Touch count errors per touch Mean Individual Human WhACC 0.35 0.40 0.45 0.50 0.55 0.60 0.65 Edge errors per touch Human WhACC 99.60 99.64 99.68 99.72 Percent correct (frames) Split Ghost Miss Join Deduct Append B AC D 91 curate this small subset using the included GUI (Figure S4.1). Finally, we retrain the LightGBM model with the newly curated data combined with the original training data. This approach enables us to quickly adapt WhACC to new datasets, without the need to retrain the ResNet50V2 model or repeat the time-intensive feature selection step. We then evaluated our retraining procedure using a holdout dataset with 16 different recording sessions of 100 videos each for a total of ~4 million frames. We validated our retraining procedure by comparing WhACC before retraining, after retraining with 100 frames per session (.03% of the data), and after retraining with 1000 frames per session (.3% of the data). Curating 1,000 frames takes between two and four minutes, while curating an entire session would take between three and five hours of focused work. Before retraining, WhACC performed well on 11 sessions (Figure 4.6A, top row) but poorly on the other 5 (Figure 4.6A, bottom row). Retraining with 100 frames per session led to a large reduction in TC-errors for the 5 poor performing sessions, while the 1000-frame model yielded an additional incremental reduction for some high-error sessions (Figure 4.6B). Both WhACC retrained models and the human curator identified the same seven sessions corresponding to touch- responsive neurons. The 1000-frame model generated more spikes per touch for three of the seven touch neurons compared to human curated data (Figure 4.6C). WhACC peak responses tended to be 1 to 2 ms earlier than the human curator's due to the increased number of deduct errors (Figure 4.6D). Finally, we found the signal window size of the 1000-frame model matched 92 that of the human curator for five neurons but was slightly larger for the other two (Figure 4.6E). Taken together, we have shown the utility of WhACC as a high-speed video touch classification system that allows for reliable, consistent, and accurate predictions that can be updated to adapt to new datasets. Figure 4.6 Retraining WhACC on a small sample of data can account for differences in datasets. A) Touch aligned PSTHs curated by WhACC before retraining (yellow), after retraining on 100 frames (orange), and 1000 frames (red) from each session compared against an expert curator (black, shaded regions represents 95% CI). (Top row) Examples of high accuracy without retraining and (bottom row) high accuracy following retraining. PSTHs are ordered by ascending TC-errors before retraining. B) Touch count error rate from a holdout dataset consisting of 16 different sessions before and after retraining. 
Gray outlines indicate sessions from the top row in A and black outlines indicate those from the top row in A. C) Comparison of spike evoked touch count across human and retrained WhACC, n=7 touch responsive neurons. D) Difference in peak touch response times between retrained WhACC models and human curators E) Same as D for the width of touch response windows. 0 25 50 75 0 50 100 150 200 No retraining Retrain 100 Retrain 1000 Human 0 25 50 75 0 10 20 0 25 50 75 0 10 20 30 0 25 50 75 0 2 4 6 0 25 50 75 0 5 10 0 25 50 75 0 20 40 60 0 25 50 75 0 2 4 No retraining Retrain 100 Retrain 1000 10 −2 10 −1 10 0 10 1 Touch count errors per touch Retrain 100 Retrain 1000 Human 0.0 0.2 0.4 0.6 0.8 |spikes/touch| −4 −3 −2 −1 0 1 Model - human peak time (ms) Model - human window width (ms) 0 1 2 3 4 5 Count −8 −4 0 4 8 0 1 2 3 4 5 Count Spikes per second Time from touch onset (ms) Retrain 1000 Retrain 100 0 2 4 6 A BC D E 93 Discussion Summary of WhACC The present study describes a novel approach to classify whisker touch frames from high-speed video using a 2-stage hybrid model implemented in a Python package named WhACC. WhACC was trained based on majority labels established by three expert curators and evaluated on a custom measure which penalized error classes which would be most detrimental to our downstream electrophysiology analysis. We integrated some temporal information into ResNet50V2 by layering different frames in color channels and trained the model using augmented images and dropout to increase generalizability. To further optimize performance, we extracted, engineered, and selected features from ResNet50V2. These features were fed into a LightGBM classifier, and the hyperparameters were optimized through OPTUNA. To account for data drift (e.g., changing imaging conditions over time), we developed and empirically validated a retraining system using a small number of sample frames. Our findings demonstrate that WhACC is an effective and reliable tool for whisker touch classification that can save time compared to manual curation. Potential limitations WhACC achieves human expert-level performance on our data but there are several conditions where its performance is uncertain. First, because the ResNet50V2 model was trained on frames from 1,000 frames per second (fps) video, WhACC may not be as effective for other frame rates because the lag images are constructed from differently sized time-steps. Second, WhACC may struggle when multiple whiskers are in the frame simultaneously, as it was trained to classify touches from single whisker videos. Third, our model was only trained on images where the 94 object was a small round pole so how well it generalizes to other contact objects is not known. Finally, we must consider video clarity and contrast because this impacts human curation, whisker tracing in the Janelia whisker tracker, and WhACC. Videos with different frame rates Preliminary tests using low fps video in our lab show promise for accurately identifying touches. Even so, an option for improvement could be to modify lower fps videos using a frame interpolation model to generate intermediate frames to match our training data at 1,000 fps. On the contrary, generating features from a low fps video and then interpolating those features afterward may not work as intended, because some of the features likely depend on the frame rate of the lag images (e.g., a velocity related feature derived from the distance of the whiskers in different color channels). 
Another alternative could be to downsample the original training data and retrain WhACC from scratch.

Multi-whisker video and alternative contact objects

Despite training on single-whisker videos, WhACC's ResNet50V2 stage likely captures general features useful for predicting touches in multi-whisker video. We believe that the existing retraining procedure should work well, provided that the retraining data contain images with multi-whisker touches. For different contact objects, our frame extraction method should work without issue because it is based on a template image defined by the user. While it is unclear how well WhACC will perform on contact objects of different shapes or sizes after retraining, we believe it should generalize well for at least a subset of possible objects, given our use of augmented training images and poles of varying diameters and optical zooms. Finally, we expect WhACC to perform well in freely moving rodents touching static objects from different directions, because we trained on rotated images, although we have not tested this.

Why is retraining required?

We determined that WhACC failed on five sessions prior to retraining due to very poor performance on whisker-out-of-frame images, which was unexpected given that these frames should be easy to classify. However, as soon as the whisker came into view, WhACC performed well. This result could be attributed to two main factors. First, it could be due to the variability of what 'baseline' whisker-out-of-frame images look like across sessions, including changes in focus, contrast, and even debris on the pole and whisker. Second, relatively few whisker-out-of-frame images were included in our augmented training data, which most likely hindered WhACC's ability to generalize to these frames. In addition, we noticed that WhACC performed slightly worse on far-distance touches, when only the tip of the whisker touched the pole. We suspect this is due to limited training data for these scenarios and the low signal-to-noise ratio in the visibility of the thin, tapered end of the whisker.

Final considerations and related work

TC-error measures the temporal consistency of predictions by penalizing changes in the total touch count, rather than only misclassified individual frames. After training with lag and augmented images and median smoothing, TC-errors decreased for Inception-v3 and ResNet50V2 compared to models trained on only augmented images. The reverse was true for both MobileNet models. This might be related to their depthwise separable convolution architecture, which first uses depthwise convolution on each color channel separately and then uses pointwise convolution on the results of the depthwise convolution (Chollet, 2017). This method is an efficient way to independently map spatial and color channel information (Guo et al., 2019), but when we added temporal information into the color channels, it could not effectively learn correlations between channels and spatial information. Despite this result, 3D CNNs that use a modified form of depthwise separable convolution have proven useful for efficiently capturing temporal information in video data (Xie et al., 2018). Lastly, it is worth noting that for the models where lag images did improve performance, the model weights initialized from ImageNet pre-training were sufficient for doing so.
We selected the CNN model based on its capability to generalize to visually distinct test data and prioritized temporal consistency by evaluating TC-error. However, all CNN models occasionally displayed inconsistent classification of adjacent frames identified by expert curators as nearly identical, highlighting a lack of temporal consistency. To some degree this is expected, because the CNN models have no or very little temporal information. On the other hand, because these frames are nearly identical to the human eye, this failure is more indicative of a lack of overall generalizability and is largely independent of time. These observations highlight the utility of temporal consistency in building models that can generalize well to a variety of scenarios. With the high temporal resolution of our video, we can see that subtle differences between frames can have a major impact on the model's predictions. While the factors that impact model performance are complex and varied, evaluating the temporal consistency of non-temporal models may help reveal vulnerabilities to alterations in input data caused by noise or adversarial attacks.

Using LightGBM as the second stage of WhACC grants us the flexibility to integrate new features to improve overall performance. For example, the Janelia whisker tracker extracts many useful features such as whisker follicle angle, angular velocity, approximate distance to pole, and whisker curvature. These and other useful features could be directly combined with our extracted features to improve overall performance. Alternatively, we could train a separate model on these additional features and fuse those predictions with WhACC. For example, tracking data is highly reliable at defining non-touch periods when the whisker is not near the pole, and WhACC excels at classifying touch and near-touch frames.

CNNs and other deep learning models have been combined with gradient boosting machines to achieve excellent results across multiple domains. These hybrid models have been used for image classification (Ismail and Islam Mondal, 2022; Sugiharti et al., 2022), as well as for time series data, including stock price forecasting (Liu et al., 2020), ultra-short-term wind power prediction (Ju et al., 2019), acoustic scene classification (Fonseca et al., 2017), and sleep stage classification (Chambon et al., 2018). Techniques for combining these models include using a CNN as a feature extractor and then feeding those features into a gradient-boosted machine model, which is similar to the process we employed here. Another option is a late-fusion approach, where models are trained in parallel and predictions are combined either by averaging or by using an additional classifier to generate the final predictions. CNNs and gradient-boosted machines have even been integrated into a unified framework (Thongsuwan et al., 2021).

A study related to our work (Bober-Irizar et al., 2019) identified precise temporal events in video data and leveraged a gradient-boosted machine but otherwise approached the problem differently. They use a late-fusion approach, whereas we use model stacking. Most importantly, they integrate temporal information using an LSTM, whereas we use feature engineering and allow LightGBM to capture temporal relationships.
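The model-stacking arrangement can be sketched as follows: a pre-trained CNN body produces per-frame spatial features, and a gradient-boosted classifier is trained on top of them. This is a schematic under simplifying assumptions (preprocessing, pooling choice, dummy data, and default LightGBM parameters are illustrative), not the exact WhACC code.

```python
import numpy as np
import tensorflow as tf
import lightgbm as lgb

# Stage 1: pre-trained ResNet50V2 body used purely as a spatial feature extractor
extractor = tf.keras.applications.ResNet50V2(
    include_top=False, weights="imagenet",
    input_shape=(96, 96, 3), pooling="avg")       # 2048 features per frame

def extract_features(lag_images):
    """lag_images: (n_frames, 96, 96, 3) uint8 array of lag images."""
    x = tf.keras.applications.resnet_v2.preprocess_input(
        lag_images.astype("float32"))              # scales pixels to [-1, 1]
    return extractor.predict(x)                    # shape (n_frames, 2048)

# Stage 2: gradient-boosted trees on the (optionally engineered) features.
# Dummy data below only shows the flow; real training uses curated labels
# and the tuned WhACC hyperparameters.
frames = np.random.randint(0, 255, size=(32, 96, 96, 3), dtype=np.uint8)
labels = np.random.randint(0, 2, size=32)
features = extract_features(frames)
clf = lgb.LGBMClassifier(n_estimators=50).fit(features, labels)
touch_probability = clf.predict_proba(features)[:, 1]
```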
To our knowledge, no other study has combined a CNN to extract spatial features from video frames with feature engineering and a gradient-boosted machine to capture temporal information in order to identify precise temporal events in video. In conclusion, WhACC is an efficient and adaptable classification software package that can greatly reduce human curation hours across different laboratories that use a similar experimental design. Our package and a full walkthrough can be found at https://github.com/hireslab/whacc.

Materials and Methods

Data selection and preprocessing

The original training data consisted of grayscale MP4 video from eight mice across eight different behavioral sessions, with two sessions each from four different scientists. The videos were collected using three different experimental rigs across two different laboratories. We created a Google Colab-compatible interface for selecting a template image of the pole (default 61×61 pixels); any template size can be used to match the specific object. Using this template image, we match and extract each frame in an entire session using OpenCV (Bradski, 2000). All data are stored in H5 files using the h5py package (Collette et al., 2020).

For each CNN model, training and validation videos were split based on the segments created from the 80-border extraction method in Figure 4.3A and described above. Test data consisted of a set from another laboratory. We reasoned that if the CNN model could perform well on these visually different images of the same whisker task, then the extracted features would be general features of the task and not overfit to the specific video conditions. For each of these datasets we selected a subset using 3-border extraction, made ten copies of these data, and then augmented each frame independently prior to training. Examples of fully augmented images are displayed in Figure 4.1B. Using Keras from TensorFlow, images were randomly augmented with all of the following augmentations: full rotation, shifting in any direction up to 10%, symmetrically zooming in or out up to 25%, and changes in brightness ranging from 20% to 120% of the initial value. Additionally, we used the imgaug (Jung, 2017) Python package to modify images with additive Gaussian noise, with scale set to three. These parameters were selected through trial and error to ensure frames were still interpretable to a human curator. For lag images, we simply stacked the images from the previous two time points into the different color channels.
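A sketch of the augmentation recipe described above, assuming 8-bit lag images already stacked into three channels, is shown below; parameter values follow the text, but the exact WhACC implementation may chain these operations differently.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import imgaug.augmenters as iaa

keras_aug = ImageDataGenerator(
    rotation_range=360,             # full rotation
    width_shift_range=0.10,         # shift up to 10% in any direction
    height_shift_range=0.10,
    zoom_range=0.25,                # symmetric zoom in or out up to 25%
    brightness_range=(0.2, 1.2))    # 20% to 120% of the initial brightness

noise_aug = iaa.AdditiveGaussianNoise(scale=3)   # imgaug additive Gaussian noise

def augment_frame(lag_image):
    """Independently augment one (H, W, 3) lag image with values 0-255."""
    params = keras_aug.get_random_transform(lag_image.shape)
    out = keras_aug.apply_transform(lag_image.astype("float32"), params)
    return noise_aug.augment_image(np.clip(out, 0, 255).astype(np.uint8))
```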
Training CNNs

Each CNN model was initialized with weights pre-trained on ImageNet using TensorFlow (Abadi et al., 2016). Input images are fed into the network using a batch generator, automatically normalized to the range −1 to 1, and resized to 96×96 pixels to match the pre-trained CNN model formats. Multiple levels of unfreezing were tested, but only fully unfrozen models are presented here, as they performed best. We used RMSprop as the optimization algorithm and binary cross-entropy as the loss function. Various batch sizes and learning rates were also tested, but final models were trained with a batch size of 200 and a learning rate of 10⁻⁶. We trained with a dropout rate of 50% to improve the generalizability of the model. We performed early stopping based on the validation loss with a patience of 15 epochs; for all models, training was halted by early stopping. The model epoch with the best validation loss was saved and later compared against the other CNN models evaluated on test data. ResNet50V2 with lag images, trained on augmented data, was selected as the feature extractor model because it had the lowest TC-error on the test set.

Feature engineering and selection

As depicted in Figure 4.4, we applied various transformations to the 2,048 features extracted from the penultimate layer of ResNet50V2. These included: shifts of −5, −4, −3, −2, −1, 1, 2, 3, 4, and 5 time steps; rolling means and rolling standard deviations centered at time = 0 with window sizes of 3, 7, 11, 15, 21, 41, and 61; and discrete differences between time = 0 and relative step sizes of −50, −20, −10, −5, −4, −3, −2, −1, 1, 2, 3, 4, 5, 10, 20, and 50. Together with the original features, this gave a total of 41 sets of 2,048 features. Lastly, we took each one of those sets and calculated the standard deviation across feature space to generate 41 additional features, for a total of 41 × 2,048 + 41 = 84,009 features.
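A condensed pandas sketch of these transformations is shown below; it follows the shifts, windows, and step sizes listed above, but column naming and the handling of undefined edge values are simplified relative to WhACC.

```python
import pandas as pd

SHIFTS = [-5, -4, -3, -2, -1, 1, 2, 3, 4, 5]
WINDOWS = [3, 7, 11, 15, 21, 41, 61]
DIFF_STEPS = [-50, -20, -10, -5, -4, -3, -2, -1, 1, 2, 3, 4, 5, 10, 20, 50]

def engineer_features(feats: pd.DataFrame) -> pd.DataFrame:
    """feats: (n_frames, 2048) ResNet50V2 features for one video."""
    sets = [feats]                                          # original features
    sets += [feats.shift(s) for s in SHIFTS]                # temporal shifts
    for w in WINDOWS:                                       # centered rolling statistics
        sets.append(feats.rolling(w, center=True).mean())
        sets.append(feats.rolling(w, center=True).std())
    sets += [feats.diff(d) for d in DIFF_STEPS]             # discrete differences
    out = pd.concat(sets, axis=1)                           # 41 * 2048 columns
    # one std-across-features summary per set: 41 extra columns, 84,009 total
    stds = pd.concat([s.std(axis=1) for s in sets], axis=1)
    return pd.concat([out, stds], axis=1)
```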
For feature selection, we split all available data into training (70%) and validation (30%) sets and then split each of those into 10 equal sets. This was necessary because our full dataset with 84,009 features could not fit into memory on our local machine with 128 GB of RAM. Using an ensemble of 10 models also helped reduce the risk of selecting poor predictors by chance. We split the data based on frame index (as opposed to segment or video) to ensure that variance was equally distributed across the different splits. Equal variance among these splits meant that important and reliable features should have approximately the same importance across models. This helped us confidently eliminate features that had moderate importance in a few models but little to no importance in the others. The data consisted of features generated from lag images without any augmentation from the original dataset (80-border extraction). Augmented images were not included in any LightGBM models because preliminary tests showed decreased performance when trained on augmented images. Due to the nature of our feature engineering operations, some data were undefined at the edges (e.g., with a centered smoothing window of 61, the first and last 30 frames are undefined). Because of this, we dropped any time points that contained undefined values prior to training, so that features with undefined regions were not eliminated due to underrepresentation.

As mentioned in the main text, models were trained with the original 2,048 features and then with the full 84,009 features for comparison. We evaluated model performance using the mean AUC of the validation sets. Across the ten models trained on the full feature set, only 28,913 features were used at all, so we eliminated the others. We continued to eliminate features based on thresholds of gain and split importance. This process was done through trial and error and was carried out by exploring distributions of feature importance and defining thresholds to eliminate weak predictors. Naturally, some feature elimination steps decreased performance, and so the actual selection process resulted in many offshoots not shown in Figure 4.4G. If eliminating features reduced performance, we simply devised a more conservative elimination step and tried again. We continued this process until we could no longer maintain the same level of performance while reducing features. Once finished, we had selected 2,105 features, which we proceeded to use to train the LightGBM classifier.

The LightGBM models used for feature selection were trained with the following hyperparameters. The number of leaves in the decision tree was set to 31, and the maximum number of iterations was set to 5,000, although all models trained were halted before reaching this limit. AUC was used as the evaluation metric, and early stopping was applied after 40 rounds. The histogram pool size was set to 16,384, and the maximum bin value was set to 255. The learning rate was set to 0.1, and the maximum depth of the tree was unlimited. The minimum number of data points allowed in a leaf was set to 20, and both the bagging fraction and the feature fraction were set to 1. The minimum number of data points allowed in a bin was set to 3.

Preprocessing and training

After ResNet50V2 was selected as our feature extractor model, we created different training (263,069 frames), validation (73,909 frames), and test (114,051 frames) sets by randomly assigning data from single videos. Training and validation sets were trimmed using 80-border extraction, but the test set included all frames to mimic the distribution of actual data. These data were used to train the final WhACC model and for the later retraining process. Just as in the feature selection step, features were generated from lag images without augmentation. The test set created at this step was used in Figure 4.3C to compare WhACC to the CNN models. Because it contains some data from each of the eight sessions, it includes some data used to train the CNN models. This means that in Figure 4.3C the performance is inflated for the CNN model but not for WhACC. We made this choice to emphasize WhACC's true performance against the CNN models. Also note that the feature extractor CNN was selected based on the original test set described in the data selection and preprocessing section and summarized in Figure 4.3B and Table 4.1.

Initial attempts to train our model showed great performance, but we observed edge effects for input data within a few frames of the last frame in a video. We discovered this was due to the model relying on features that were unavailable for those timepoints because of how they were engineered. To remedy this, we created an index matrix (100 by 2,105) of all the possible data points that could contain undefined values for each feature at each timepoint. Next, we created a duplicate of each dataset. For each timepoint in these duplicates, we randomly selected from the 100 possible timepoints that contained undefined values and added undefined values to the relevant features to act as a mask. These altered datasets simulate what the features look like for the first and last 50 frames of a video, where undefined values are possible. The final datasets for the LightGBM portion of WhACC were composed of these altered datasets combined with their unaltered counterparts. This process is also carried out by default when retraining WhACC on sample data.

To optimize the performance of the LightGBM classifier for WhACC, we used early stopping with a custom callback that evaluated TC-errors and AUC. If the validation data failed to improve on either of these metrics for 500 rounds, training was halted, and the model with the best TC-error was selected. The number of iterations was set to 10,000, but all models were halted before reaching this limit. To determine the optimal hyperparameters, we used Optuna, a hyperparameter optimization framework. Optuna settings included L1 and L2 regularization values between 1 × 10⁻⁵ and 10. The number of leaves in the decision tree ranged from 2 to 256, and the suggested fraction of features to use per tree ranged from 0.4 to 1.0. The suggested fraction of data points to use for each bagging sample ranged from 0.4 to 1.0, and the suggested frequency of bagging ranged from 1 to 7. Finally, the suggested minimum number of data points allowed in a leaf ranged from 5 to 100.
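A sketch of how such a search could be expressed with Optuna and LightGBM's scikit-learn interface is given below; the search ranges mirror those just listed, but the objective, data handling, and fixed parameters are simplified placeholders rather than the exact WhACC tuning code.

```python
import numpy as np
import optuna
import lightgbm as lgb
from sklearn.metrics import roc_auc_score

# placeholder data standing in for the precomputed WhACC feature sets
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(500, 20)), rng.integers(0, 2, 500)
X_val, y_val = rng.normal(size=(200, 20)), rng.integers(0, 2, 200)

def objective(trial):
    params = {
        "reg_alpha": trial.suggest_float("reg_alpha", 1e-5, 10.0, log=True),    # L1
        "reg_lambda": trial.suggest_float("reg_lambda", 1e-5, 10.0, log=True),  # L2
        "num_leaves": trial.suggest_int("num_leaves", 2, 256),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.4, 1.0),
        "subsample": trial.suggest_float("subsample", 0.4, 1.0),
        "subsample_freq": trial.suggest_int("subsample_freq", 1, 7),
        "min_child_samples": trial.suggest_int("min_child_samples", 5, 100),
    }
    clf = lgb.LGBMClassifier(n_estimators=1000, **params)
    clf.fit(X_train, y_train,
            eval_set=[(X_val, y_val)], eval_metric="auc",
            callbacks=[lgb.early_stopping(50)])
    return roc_auc_score(y_val, clf.predict_proba(X_val)[:, 1])

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
```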
Retraining procedure

To make WhACC more reliable and able to generalize across the variability of new datasets, we designed and tested a retraining process that uses a small subset of data. We tested different methods for sampling data, including using a few full-length videos and using a few frames across many different videos; the latter was more effective. For the retraining procedure, we used the object tracking location to select trials equally spaced along the horizontal coordinates of the video. We then selected a starting frame where the pole was available in all videos. We sampled 10 frames from either 10 or 100 videos, for a total of either 100 or 1,000 frames from each session, respectively. Once the data were selected, we used a GUI to curate the subset and saved our sample data (Figure S4.1). We then loaded the original training and validation data and split the new sample data derived from each session into each. We split the sample data into 70/30 training and validation sets and applied a weight of two to the new data to bias the model slightly. Next, we duplicated each dataset and added undefined values as described in the preprocessing and training section. We then retrained just the LightGBM stage of WhACC using the preselected hyperparameters or by using Optuna to select them again. Finally, we evaluated each model's validation TC-error and AUC to help select a final model.

We evaluated WhACC before and after retraining using a holdout dataset of 16 sessions. Each session was composed of 100 videos, and each video had 3,000 frames. Accounting for pole-available times, we tested a total of ~4 million frames that were not used in any stage of developing WhACC. This dataset was curated by only one expert curator, so the human error rate is not known. We compared WhACC before retraining, WhACC after retraining on an additional 100 frames from each session (1,600 total), and WhACC after retraining on an additional 1,000 frames from each session (16,000 total).

We constructed PSTHs by taking the mean spike response across all touch onset times for either each model or the expert curator. For the expert curator, 95% confidence intervals are included (1.96 × standard error). Touch neurons were identified using PSTH traces smoothed with a window of 5 ms. Using a baseline period from −100 to −20 ms relative to touch onset, we subtracted the baseline from the signal and divided by the standard deviation (SD) of the baseline to get a Z-scored trace. Any neuron with points outside of 2 SD from the baseline for four or more continuous time points from touch onset to 100 ms after was considered a touch neuron. If a touch signal was bipolar (i.e., positively and negatively modulated at different times following touch), only the significant times directly following touch onset were considered the signal window. If two significant regions were separated by some points below two standard deviations, then the signal window was defined as all the time points from the beginning of the first significant region to the end of the last significant region. Spikes per touch were calculated by taking the integral of the significant region of the unsmoothed, baseline-subtracted PSTH, and we then took the absolute value of this number.
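The touch-neuron criterion can be sketched as below, assuming a PSTH binned at 1 ms from −100 ms to +100 ms around touch onset; the thresholds and windows follow the text, while the smoothing method and array layout are simplified assumptions.

```python
import numpy as np

def is_touch_neuron(psth, t):
    """psth: mean spikes/s per 1-ms bin; t: bin times in ms relative to touch onset."""
    psth = np.convolve(psth, np.ones(5) / 5, mode="same")     # ~5 ms smoothing
    baseline = psth[(t >= -100) & (t < -20)]
    z = (psth - baseline.mean()) / baseline.std()             # Z-score against baseline
    post = z[(t >= 0) & (t <= 100)]
    # touch neuron: 4 or more consecutive post-touch bins beyond 2 SD of baseline
    exceed = np.abs(post) > 2
    run = 0
    for e in exceed:
        run = run + 1 if e else 0
        if run >= 4:
            return True
    return False
```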
Evaluating performance

We created a class to identify touch-count and edge errors and to determine their lengths. As explained in the main text, we used TC-errors to help us select a model that performed more similarly to human curators, specifically in terms of the types of errors that were deemed acceptable. We determined the human error rate by comparing consensus frames between two curators against the predictions of the remaining curator, for each combination of the three curators. This method allowed a fair comparison between WhACC and each curator. For all models, we median smoothed predictions with a 5 ms window because we found this uniformly increased performance across models. Median smoothing primarily helped to eliminate noisy predictions found near touch onset and offset events.

Concluding remarks

Studying neural response properties in the context of behavior is integral to understanding the brain. Here we examined whisker-guided object localization in mice trimmed to a single whisker. We find that mice have an object localization precision of less than 0.5 mm. By designing models to predict animal choice and trial type, we found that it is feasible that mice use a combination of touch count and whisker midpoint to solve this task, although alternative strategies could also account for the results. Further breaking the localization behavior into its component parts, we find that mice use the angle, and not the distance from their face, to solve our localization task. We found a neural representation of object location in S1 which can decode the pole position. Through lesion experiments, we found that mice can still detect touch but cannot discriminate touch locations by their sensorimotor variables. We also found that these neural representations of touch do not require any special learning, given that they exist in naive mice. While this is true, we did not eliminate the possibility that these representations are the result of naturally developed investigative whisking strategies mice employ when attempting to learn a task. Lastly, we found that location-tuned neurons are not simply the result of touch-amplified free-whisking-tuned neurons.

Our investigation of S2 during the object localization task revealed a high density of touch- and sound-responsive neurons in L5 and upper L6. We used the alignment of whisking onsets to eliminate the possibility that these auditory responses were driven by touch or self-motion. Single neurons were often tuned to both stimuli, and some showed complex tuning, which was especially interesting for the auditory responses in a traditionally somatosensory cortical region. Lastly, we show that auditory responses lag touch responses by about 6 ms when considering the shortest response latencies, suggesting that the auditory responses may be the result of a direct connection from AAF.

Our final study is centered around WhACC, which allows for fast classification of whisker touch times. Here we test multiple CNNs to determine which best predicts times of touch using a custom metric that emphasizes temporal consistency across video frames. We show that transfer learning on these pre-trained networks predicts touch times moderately well but lacks temporal consistency. By selecting the best-performing model and using it indirectly as a feature extractor, we show that these features can be modified across time using simple operations and fed into a LightGBM model to produce excellent results.
By using this approach, we also gain the ability to retrain the LightGBM head to fit new data using standard computer hardware. Finally, we show that across new datasets, with only a little retraining, our system can predict touch times as accurately as human expert curators.

I began my dissertation emphasizing how important it is to study the brain-behavior relationship. As we explore neural activity in the context of behavior, however, the complex and interdependent nature of the two makes it more difficult to meaningfully interpret how neural properties come to be. This complexity can be managed in two ways: through a reductionist framework (Krakauer et al., 2017), which isolates components, tests them separately, and uses these simpler components to explain a more complex idea. Alternatively, we can embrace the complexity and manage it by carefully observing behavior and using modeling techniques to understand how these variables interact and which variables best explain the results. Both approaches are of great value, but I argue that the latter is underutilized in neuroscience research. I strongly believe that embracing modeling approaches (especially those which allow non-linear interactions) is key to understanding these complex systems. Tackling these problems also requires additional assistance through new tools to relieve time burdens (e.g., WhACC), more collaboration across disciplines, and an overall mentality that complex systems cannot always be reduced to simple explanations.

References

Abadi M et al. (2016) TensorFlow: a system for large-scale machine learning. In: Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation, pp 265–283, OSDI'16. Savannah, GA, USA: USENIX Association.
Akiba T, Sano S, Yanase T, Ohta T, Koyama M (2019) Optuna: A Next-generation Hyperparameter Optimization Framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp 2623–2631.
Ansorge J, Wu C, Shore SE, Krieger P (2021) Audiotactile interactions in the mouse cochlear nucleus. Sci Rep 11:6887.
Bale MR, Maravall M (2018) Organization of Sensory Feature Selectivity in the Whisker System. Neuroscience 368:70–80.
Bauer A-KR, Debener S, Nobre AC (2020) Synchronisation of Neural Oscillations and Cross-modal Influences. Trends Cogn Sci 24:481–495.
Betting J-HLF, Romano V, Al-Ars Z, Bosman LWJ, Strydis C, De Zeeuw CI (2020) WhiskEras: A New Algorithm for Accurate Whisker Tracking. Front Cell Neurosci 14:588445.
Bieler M, Xu X, Marquardt A, Hanganu-Opatz IL (2018) Multisensory integration in rodent tactile but not visual thalamus. Sci Rep 8:15684.
Bizley JK, Nodal FR, Bajo VM, Nelken I, King AJ (2007) Physiological and anatomical evidence for multisensory interactions in auditory cortex. Cereb Cortex 17:2172–2189.
Bober-Irizar M, Skalic M, Austin D (2019) Learning to Localize Temporal Events in Large-scale Video Data. arXiv:191011631.
Boughorbel S, Jarray F, El-Anbari M (2017) Optimal classifier for imbalanced data using Matthews Correlation Coefficient metric. PLoS One 12:e0177678.
Bradski G (2000) The OpenCV Library. Dr. Dobb's Journal of Software Tools 25:120–126.
Brett-Green B, Paulsen M, Staba RJ, Fifková E, Barth DS (2004) Two distinct regions of secondary somatosensory cortex in the rat: topographical organization and multisensory responses. J Neurophysiol 91:1327–1336.
Campagner D, Evans MH, Bale MR, Erskine A, Petersen RS (2016) Prediction of primary somatosensory neuron activity during active tactile exploration. Elife 5:e10696.
Chambon S, Galtier MN, Arnal PJ, Wainrib G, Gramfort A (2018) A Deep Learning Architecture for Temporal Sleep Stage Classification Using Multivariate and Multimodal Time Series. IEEE Trans Neural Syst Rehabil Eng 26:758–769.
Chen JL, Voigt FF, Javadzadeh M, Krueppel R, Helmchen F (2016) Long-range population dynamics of anatomically defined neocortical networks. Elife 5:e14679.
Cheung J, Maire P, Kim J, Sy J, Hires SA (2019) The Sensorimotor Basis of Whisker-Guided Anteroposterior Object Localization in Head-Fixed Mice. Curr Biol 29:3029–3040.e4.
Cheung JA, Maire P, Kim J, Lee K, Flynn G, Hires SA (2020) Independent representations of self-motion and object location in barrel cortex output. PLoS Biol 18:e3000882.
Chollet F (2017) Xception: Deep Learning with Depthwise Separable Convolutions. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1800–1807. Honolulu, HI: IEEE.
Clack NG, O'Connor DH, Huber D, Petreanu L, Hires A, Peron S, Svoboda K, Myers EW (2012) Automated tracking of whiskers in videos of head fixed rodents. PLoS Comput Biol 8:e1002591.
Collette A et al. (2020) h5py/h5py: 3.1.0. Zenodo. Available at: https://zenodo.org/record/6575970 [Accessed March 30, 2023].
Couto J, Kandler S, Mao D, McNaughton BL, Arckens L, Bonin V (2019) Spatially segregated responses to visuo-tactile stimuli in mouse neocortex during active sensation. bioRxiv:199364.
Donishi T, Kimura A, Imbe H, Yokoi I, Kaneoke Y (2011) Sub-threshold cross-modal sensory interaction in the thalamus: lemniscal auditory response in the medial geniculate nucleus is modulated by somatosensory stimulation. Neuroscience 174:200–215.
Fee MS, Mitra PP, Kleinfeld D (1997) Central versus peripheral determinants of patterned spike activity in rat vibrissa cortex during whisking. J Neurophysiol 78:1144–1149.
Fonseca E, Gong R, Bogdanov D, Slizovskaia O, Gómez Gutiérrez E, Serra X (2017) Acoustic scene classification by ensembling gradient boosting machine and convolutional neural networks.
Fu K-MG, Johnston TA, Shah AS, Arnold L, Smiley J, Hackett TA, Garraghty PE, Schroeder CE (2003) Auditory cortical neurons respond to somatosensory stimulation. J Neurosci 23:7510–7515.
Ghazanfar AA, Schroeder CE (2006) Is neocortex essentially multisensory? Trends Cogn Sci 10:278–285.
Godenzini L, Alwis D, Guzulaitis R, Honnuraiah S, Stuart GJ, Palmer LM (2021) Auditory input enhances somatosensory encoding and tactile goal-directed behavior. Nat Commun 12:4509.
Grant RA, Breakell V, Prescott TJ (2018) Whisker touch sensing guides locomotion in small, quadrupedal mammals. Proc Biol Sci 285:20180592.
Guo Y, Li Y, Feris R, Wang L, Rosing T (2019) Depthwise Convolution is All You Need for Learning Multiple Visual Domains. arXiv:190200927.
Haider B, Häusser M, Carandini M (2013) Inhibition dominates sensory responses in the awake cortex. Nature 493:97–100.
He K, Zhang X, Ren S, Sun J (2016) Identity Mappings in Deep Residual Networks. In: Computer Vision – ECCV 2016 (Leibe B, Matas J, Sebe N, Welling M, eds), pp 630–645. Cham: Springer International Publishing.
Hill DN, Curtis JC, Moore JD, Kleinfeld D (2011) Primary motor cortex reports efferent control of vibrissa motion on multiple timescales. Neuron 72:344–356.
Hires SA, Gutnisky DA, Yu J, O'Connor DH, Svoboda K (2015) Low-noise encoding of active touch by layer 4 in the somatosensory cortex. Elife 4:e06619.
Howard A, Sandler M, Chen B, Wang W, Chen L-C, Tan M, Chu G, Vasudevan V, Zhu Y, Pang R, Adam H, Le Q (2019) Searching for MobileNetV3. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp 1314–1324.
Hubatz S, Hucher G, Shulz DE, Férézou I (2020) Spatiotemporal properties of whisker-evoked tactile responses in the mouse secondary somatosensory cortex. Sci Rep 10:763.
Ismail Md, Islam Mondal MdN (2022) Extreme Gradient Boost with CNN: A Deep Learning-Based Approach for Predicting Protein Subcellular Localization. In: Proceedings of the International Conference on Big Data, IoT, and Machine Learning (Arefin MS, Kaiser MS, Bandyopadhyay A, Ahad MdAR, Ray K, eds), pp 195–203. Singapore: Springer Singapore.
Jadhav SP, Wolfe J, Feldman DE (2009) Sparse temporal coding of elementary tactile features during active whisker sensation. Nat Neurosci 12:792–800.
Ju Y, Sun G, Chen Q, Zhang M, Zhu H, Rehman MU (2019) A Model Combining Convolutional Neural Network and LightGBM Algorithm for Ultra-Short-Term Wind Power Forecasting. IEEE Access 7:28309–28318.
Jung AB (2017) imgaug: Image augmentation for machine learning experiments. GitHub. Available at: https://github.com/aleju/imgaug [Accessed March 30, 2023].
Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, Ye Q, Liu T-Y (2017) LightGBM: A Highly Efficient Gradient Boosting Decision Tree. Advances in Neural Information Processing Systems 30.
Khorevin VI (1980) Effect of electrodermal stimulation on single unit responses to acoustic stimulation in the parvocellular part of the medial geniculate body. Neurophysiology 12:129–134.
Kim J, Erskine A, Cheung JA, Hires SA (2020) Behavioral and Neural Bases of Tactile Shape Discrimination Learning in Head-Fixed Mice. Neuron 108:953–967.e8.
Kimura A (2020) Cross-modal modulation of cell activity by sound in first-order visual thalamic nucleus. J Comp Neurol 528:1917–1941.
Kimura A, Imbe H (2018) Robust Subthreshold Cross-modal Modulation of Auditory Response by Cutaneous Electrical Stimulation in First- and Higher-order Auditory Thalamic Nuclei. Neuroscience 372:161–180.
Kleinfeld D, Deschênes M (2011) Neuronal basis for object location in the vibrissa scanning sensorimotor system. Neuron 72:455–468.
Knutsen PM, Biess A, Ahissar E (2008) Vibrissal kinematics in 3D: tight coupling of azimuth, elevation, and torsion across different whisking modes. Neuron 59:35–42.
Knutsen PM, Derdikman D, Ahissar E (2005) Tracking whisker and head movements in unrestrained behaving rodents. J Neurophysiol 93:2294–2301.
Knutsen PM, Pietr M, Ahissar E (2006) Haptic object localization in the vibrissal system: behavior and performance. J Neurosci 26:8451–8464.
Komura Y, Tamura R, Uwano T, Nishijo H, Ono T (2005) Auditory thalamus integrates visual inputs into behavioral gains. Nat Neurosci 8:1203–1209.
Krakauer JW, Ghazanfar AA, Gomez-Marin A, MacIver MA, Poeppel D (2017) Neuroscience Needs Behavior: Correcting a Reductionist Bias. Neuron 93:480–490.
Krupa DJ, Matell MS, Brisben AJ, Oliveira LM, Nicolelis MA (2001) Behavioral properties of the trigeminal somatosensory system in rats performing whisker-dependent tactile discriminations. J Neurosci 21:5752–5763.
Kwegyir-Afful EE, Keller A (2004) Response properties of whisker-related neurons in rat second somatosensory cortex. J Neurophysiol 92:2083–2092.
Lakatos P, Chen C-M, O'Connell MN, Mills A, Schroeder CE (2007) Neuronal oscillations and multisensory interaction in primary auditory cortex. Neuron 53:279–292.
Land R, Engler G, Kral A, Engel AK (2012) Auditory Evoked Bursts in Mouse Visual Cortex during Isoflurane Anesthesia. PLOS ONE 7:e49855.
Lee CC, Sherman SM (2008) Synaptic properties of thalamic and intracortical inputs to layer 4 of the first- and higher-order cortical areas in the auditory and somatosensory systems. J Neurophysiol 100:317–326.
Lepora NF, Fox CW, Evans M, Mitchinson B, Motiwala A, Sullivan JC, Pearson MJ, Welsby J, Pipe T, Gurney K, Prescott TJ (2011) A General Classifier of Whisker Data Using Stationary Naive Bayes: Application to BIOTACT Robots. In: Towards Autonomous Robotic Systems (Groß R, Alboul L, Melhuish C, Witkowski M, Prescott TJ, Penders J, eds), pp 13–23. Berlin, Heidelberg: Springer Berlin Heidelberg.
Liu J, Lin C-MM, Chao F (2020) Gradient Boost with Convolution Neural Network for Stock Forecast. In: Advances in Computational Intelligence Systems (Ju Z, Yang L, Yang C, Gegov A, Zhou D, eds), pp 155–165. Cham: Springer International Publishing.
Lohse M, Dahmen JC, Bajo VM, King AJ (2021) Subcortical circuits mediate communication between primary sensory cortical areas in mice. Nat Commun 12:3916.
Maruyama AT, Komai S (2018) Auditory-induced response in the primary sensory cortex of rodents. PLoS One 13:e0209266.
McGinley MJ, Vinck M, Reimer J, Batista-Brito R, Zagha E, Cadwell CR, Tolias AS, Cardin JA, McCormick DA (2015) Waking State: Rapid Variations Modulate Neural and Behavioral Responses. Neuron 87:1143–1161.
Menzel RR, Barth DS (2005) Multisensory and secondary somatosensory cortex in the rat. Cereb Cortex 15:1690–1696.
Moore JD, Mercer Lindsay N, Deschênes M, Kleinfeld D (2015) Vibrissa Self-Motion and Touch Are Reliably Encoded along the Same Somatosensory Pathway from Brainstem through Thalamus. PLoS Biol 13:e1002253.
Niell CM, Stryker MP (2010) Modulation of visual responses by behavioral state in mouse visual cortex. Neuron 65:472–479.
O'Connor DH, Clack NG, Huber D, Komiyama T, Myers EW, Svoboda K (2010a) Vibrissa-Based Object Localization in Head-Fixed Mice. J Neurosci 30:1947–1967.
O'Connor DH, Hires SA, Guo ZV, Li N, Yu J, Sun Q-Q, Huber D, Svoboda K (2013) Neural coding during active somatosensation revealed using illusory touch. Nat Neurosci 16:958–965.
O'Connor DH, Peron SP, Huber D, Svoboda K (2010b) Neural activity in barrel cortex underlying vibrissa-based object localization in mice. Neuron 67:1048–1061.
Perkon I, Kosir A, Itskov PM, Tasic J, Diamond ME (2011) Unsupervised quantification of whisking and head movement in freely moving rodents. J Neurophysiol 105:1950–1962.
Rao RP, Mielke F, Bobrov E, Brecht M (2014) Vocalization-whisking coordination and multisensory integration of social signals in rat auditory cortex. Elife 3:e03185.
Santiago LF, Freire MAM, Picanço-Diniz CW, Franca JG, Pereira A (2018) The Organization and Connections of Second Somatosensory Cortex in the Agouti. Front Neuroanat 12:118.
Severson KS, Xu D, Van de Loo M, Bai L, Ginty DD, O'Connor DH (2017) Active Touch and Self-Motion Encoding by Merkel Cell-Associated Afferents. Neuron 94:666–676.e9.
Severson KS, Xu D, Yang H, O'Connor DH (2019) Coding of whisker motion across the mouse face. Elife 8:e41535.
Shimaoka D, Harris KD, Carandini M (2018) Effects of Arousal on Mouse Sensory Cortex Depend on Modality. Cell Rep 22:3160–3167.
Shumkova V, Sitdikova V, Rechapov I, Leukhin A, Minlebaev M (2021) Effects of urethane and isoflurane on the sensory evoked response and local blood flow in the early postnatal rat somatosensory cortex. Sci Rep 11:9567.
Sofroniew NJ, Svoboda K (2015) Whisking. Current Biology 25:R137–R140.
Sugiharti E, Arifudin R, Wiyanti DT, Susilo AB (2022) Integration of convolutional neural network and extreme gradient boosting for breast cancer detection. Bulletin EEI 11:803–813.
Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the Inception Architecture for Computer Vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 2818–2826.
Thongsuwan S, Jaiyen S, Padcharoen A, Agarwal P (2021) ConvXGB: A new deep learning model for classification problems based on CNN and XGBoost. Nuclear Engineering and Technology 53:522–531.
Towal RB, Quist BW, Gopal V, Solomon JH, Hartmann MJZ (2011) The morphology of the rat vibrissal array: a model for quantifying spatiotemporal patterns of whisker-object contact. PLoS Comput Biol 7:e1001120.
van Atteveldt N, Murray MM, Thut G, Schroeder CE (2014) Multisensory integration: flexible use of general operations. Neuron 81:1240–1253.
Viaene AN, Petrof I, Sherman SM (2011) Properties of the thalamic projection from the posterior medial nucleus to primary and secondary somatosensory cortices in the mouse. Proc Natl Acad Sci U S A 108:18156–18161.
Voigts J, Sakmann B, Celikel T (2008) Unsupervised whisker tracking in unrestrained behaving animals. J Neurophysiol 100:504–515.
Xie S, Sun C, Huang J, Tu Z, Murphy K (2018) Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification. In: Computer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8–14, 2018, Proceedings, Part XV, pp 318–335. Munich, Germany: Springer-Verlag.
Zhang M, Kwon SE, Ben-Johny M, O'Connor DH, Issa JB (2020) Spectral hallmark of auditory-tactile interactions in the mouse somatosensory cortex. Commun Biol 3:1–17.
Zhou W, Ye C, Wang H, Mao Y, Zhang W, Liu A, Yang C-L, Li T, Hayashi L, Zhao W, Chen L, Liu Y, Tao W, Zhang Z (2022) Sound induces analgesia through corticothalamic circuits. Science 377:198–204.
Zuo Y, Diamond ME (2019) Rats Generate Vibrissal Sensory Evidence until Boundary Crossing Triggers a Decision. Curr Biol 29:1415–1424.e5.
Abstract
Examining neural representations during behavior affords unique insights into neural processing, despite the challenges of reduced experimental control and the time costs of precisely quantifying behavioral variability. In this study, we investigate both behavior and neural representation during a whisker-guided object localization task in head-fixed mice and introduce a Python package to reduce the burden of manual data curation it entails. We use simple models to identify that touch count and whisker midpoint at touch best predict behavioral outcomes. We modify the behavioral task to show that mice use object angle at touch, but not radial distance, to solve the task. Using this behavior, we identify a neural representation in the primary somatosensory cortex (S1) that is best explained by whisker angle at touch, but not phase at touch. We show that this neural representation exists in trained and naive mice in equal proportion, that these neurons can be used to decode object location, and that it arises independently of free-whisking tuning. Lesion experiments reveal that S1 is necessary for touch discrimination but not touch detection. Further, we discover multisensory neurons in the secondary somatosensory cortex which respond to sound and touch. These neurons exist in trained and naive mice, exhibit complex tuning across both modalities, and are concentrated in layer 5 and upper layer 6. Finally, we developed a Python package to classify touch events in high-speed video using a custom-trained ResNet50V2–LightGBM hybrid model. This system is customizable to new datasets and performs as well as or better than expert curators.