MINIMAL SENSORY MODALITIES AND SPATIAL PERCEPTION

by Douglas C. Wadle

A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (PHILOSOPHY)

August 2023

Copyright 2023 Douglas C. Wadle

Acknowledgements

I would like to thank the members of my committee, Janet Levin (chair), James Van Cleve (honorary co-chair), John Hawthorne, David Wallace, and Toby Mintz (who graciously stepped in at the 11th hour), for their support, insights, and guidance. I'd also like to thank the late Irving Biederman, my external committee member until his passing in 2022, for his input. Thanks to Andrew Bacon and Scott Soames, who guided me through my area exams, and to Alexis Wellwood, who has overseen my work in the Meaning Lab since 2021. Alexis, in particular, has been indefatigable in supporting the development of my empirical chops while also giving sage career advice. I would be remiss not to thank Mark Schroeder, who has also always found the time to offer advice (practical and philosophical) whenever I asked. But my thanks extend to all of the philosophy faculty I have had the good fortune to interact with and learn from during my time at USC. I also want to thank all of the graduate students, past and present, for countless philosophical discussions, diversions, and general camaraderie. There are too many names to list here – at least not without running the risk of unforgivable omissions – so a general thanks will have to do. My sincere thanks also go to the excellent administrative staff who have passed through the USC School of Philosophy during my time there. Of particular note for always going above and beyond – and being excellent people, to boot – are: Natalie Schaad, Amanda Velasco, Donna Lugo, and John Nikolai. Finally, I want to thank my family for their support and forbearance throughout this journey – especially my wife, Kristen, and my children, Clarence and Rose, but also my parents, Bob and Anne Wadle, and my in-laws, Richard and Diane Smiarowski.

Contents

Acknowledgements
List of Tables
List of Figures
Abstract
Introduction
Chapter 1: Sensory Modalities and Novel Features of Perceptual Experiences
  1. Background and significance
  2. Novel features
  3. Constraining the association rule
  4. O'Callaghan's association rule and some morals drawn from it
    4.1. O'Callaghan's rule
    4.2. comments regarding mechanisms and modalities
  5. Revising the association rule
    5.1. 'could' and 'would' in determinate sensory mechanisms
    5.2. loosening constraints on the stimulus
    5.3. loosening what counts as a result
  6. The problem generalizes
  7. Conclusion
Chapter 2: Information, Function, and Representation in Perceptual Processing
  1. The bottom-up approach: Neander's informational teleosemantics
  2. The top-down approach: Shea's varitel semantics
  3. The bottom-up constraint: internally accessible information
    3.1. sensory receptor stimulation
    3.2. background information
  4. The top-down constraint: derivatively adaptive low-level functions
  5. Testing the account
    5.1. toy examples
    5.2. content indeterminacy challenges
Chapter 3: Restricted Auditory Aspatialism
  1. Setting the target: restricted auditory aspatialism
    1.1. Strawson's restricted auditory aspatialism
    1.2. A new motivation for restricted auditory aspatialism
    1.3. The target notion of a purely auditory experience
  2. Spatial hearing
    2.1. monaural cues
    2.2. binaural cues
  3. Restricted auditory aspatialism, unrestricted auditory aspatialism, and the ontology of sound
  4. Conclusion
Chapter 4: The Contributions of the Bodily Senses to Cortical Body Representations
  1. The origins of the body schema
  2. Information and the minimal bodily senses
  3. Minimal proprioception and minimal touch
    3.1. minimal touch
    3.2. minimal proprioception
    3.3. minimal proprioception + minimal touch
    3.4. limitations
  4. Minimal equilibrioception
    4.1. minimal proprioception+touch+equilibrioception: orientation
    4.2. minimal proprioception+touch+equilibrioception: scale
  5. Implications and future directions
    5.1. the origins of the body schema
    5.2. the individuation of cortical body representations
Chapter 5: Vision, Vantage, and the Supposed Need for Action in Spatial Perception
  1. Perceptual relativity and action
    1.1. what is perceptual relativity?
    1.2. why does perceptual relativity (allegedly) support action-based views of perception?
  2. The egocentric coordinates motivation
    2.1. orienting up/down and left/right axes of egocentric space
    2.2. scuttling the priority claim: action coordinates also depend on integration
    2.3. distance in depth
  3. The sensorimotor knowledge motivation
    3.1. size
    3.2. shape
  4. The objective experience motivation
  5. Appearance-determining relations are not represented
  6. Conclusion
References
Appendix

List of Tables

Table 1.1. Summary of the association rules considered herein
Table 1.2a. Summary of results for each association rule with respect to the four constraints on an acceptable association rule
Table 1.2b. Comparison of the guiding feature association intuitions for the relevant features in test cases by association rules

List of Figures
Figure 3.1. Cone of confusion by interaural time difference, and tori of confusion by interaural time difference and interaural level difference
Figure 4.1. 1954 Penfield Homunculi
Figure 4.2. Receptive field architecture of mechanoreceptors of the foot by type
Figure 4.3. Illustration of importance of resting angle to the determination of limb position by proprioceptive input
Figure 4.4. Illustration of exploratory choreography
Figure 4.5. Vestibular labyrinth
Figure 5.1. Stereoptic field at different interocular distances/vergence angles
Figure 5.2. Illustration of effect of focal distance on geometry of the stereoptic field

Abstract

This dissertation develops new tools to approach old questions about the origins of (egocentric) spatial experience: in particular, whether it depends on an innate representation of space, on action or behavioral dispositions, or is derived solely from perception. In chapter 1 I argue that all attempts to formulate a rule for associating features of perceptual experiences with the Aristotelian senses (vision, audition, touch, taste, and smell) are inadequate; they either ignore subpersonal interaction between the putative senses or else they contradict intuitions that are constitutive of our very ideas of those senses. I conclude that we should stop theorizing about perceptual experience in terms of the Aristotelian senses. Chapter 2 develops my alternative approach, which focuses on the contributions of the neural mechanisms of perceptual processing (without regard for how we would map these mechanisms onto Aristotelian senses). On my proposal, perceptual content is determined by both the information that low-level mechanisms – beginning with the sensory receptors (e.g., retinas) – make available to subsequent perceptual processing and how perceptual processing uses that information to perform higher-level functions (e.g., bring about adaptive behaviors). Chapters 3-5 apply this method to show the extent to which the early stages of low-level perceptual processing – or, as I call them, minimal sensory modalities – capture information concerning spatial properties. In Chapter 3 I argue that minimal audition does not capture any egocentric spatial information by itself and, hence, that sensory experiences produced just by minimal audition (so-called 'purely auditory experiences') could not represent the locations of sounds. In Chapter 4 I use the minimal senses approach to evaluate and clarify the hypothesis that the body schema – a dynamic, subconscious representation of one's current bodily posture – is acquired via sensory stimulation resulting from in utero movements. I suggest further empirical work that will be needed to determine if the hypothesis is true and then draw out implications for debates concerning the existence of and interactions between other putative body representations. Chapter 5 argues that minimal vision cannot capture information about egocentric spatial features – it is restricted to a retinocentric spatial frame. However, the integration of the spatial frames of minimal vision with the body-based representations discussed in chapter 4 is sufficient for the perception of egocentric spatial properties. Furthermore, we do not need action or innate representations of external space to account for this integration.
Introduction

Going back at least to Kant, philosophers of spatial perception have been concerned with the following related questions: Can our representation/conception of space be empirically derived? Can we experience spatial properties/relations without possessing an antecedent (a priori) representation of space? Or, to put a modern twist on it: Suppose we were to build a robot with sensory receptors that are functionally very much like our own. Are there deep learning algorithms by which it could learn to represent (and on that basis, ultimately to navigate) space? Kant (1780) thought that we do need an innate representation of space in order to have spatial perception. A modern development of the Kantian idea maintains that this innate spatial representation serves as a common container in which impressions from the various senses are arranged into a coherent representation of the world. Matthen (2014), for instance, argues that the integration of distinct spatial frames associated with each of the individual senses cannot arise either by mapping the spatial coordinates of the various frames to one privileged frame (e.g., the visual spatial frame) post-perceptually or from action alone. From this Matthen concludes that there must be a pre-modal representation of space shared by the perceptual modalities. This pre-modal representation functions as a 'common measure' (or container) into which the objects and qualities perceived by the various senses can be placed to bring them into spatial relations with one another.

In this dissertation, I offer an empirically grounded argument that, while we do need some – exceedingly minimal – innate spatial content to enable spatial perception, none of these representations have the form or function of the innate representation proposed by Kant or his modern-day counterparts. Importantly, these innate spatial contents do not correspond to contents about egocentric space. All we need is innate content that tells us that, for any given sensory receptor organ, the surface of the receptor is spatially continuous. Given this innate content, perceptual learning is enough to coordinate the sensory input received by these receptors and acquire a perceptual representation of egocentric space from them.

Notice that I have shifted to talk of sensory receptors where Matthen, for instance, spoke of sensory modalities. I make this shift because Matthen's approach requires that we isolate those features of perceptual experiences that are contributed by the individual Aristotelian senses (sight, hearing, touch, taste, smell). As I argue in chapter 1, all attempts to formulate a rule for associating features of perceptual experiences with the Aristotelian senses are inadequate; they either ignore subpersonal interaction between the putative senses or else they contradict intuitions that are constitutive of our very ideas of those senses. The argument unfolds against the backdrop of the recent debate over the existence of features of perceptual experiences that aren't associated with any of the Aristotelian senses (e.g., aspects of flavor experience that don't reduce to taste or smell) (Bayne 2014; Briscoe 2016, 2017, 2019; Connolly 2014; Macpherson 2011a; O'Callaghan 2014a, 2014b, 2015, 2017, 2019; Spence and Bayne 2014). The immediate consequence for that debate is that the very question at issue is ill-founded.
The broader lesson is that we should stop theorizing about perceptual experience in terms of the Aristotelian senses – an approach that continues to dominate philosophy of perception and (to a lesser degree) the empirical study of perception.

In chapter 2, I develop an alternative approach, which focuses on the contributions of the neural mechanisms of perceptual processing (without regard for how we would map these mechanisms onto Aristotelian senses) to perceptual representation. On my proposal, perceptual content is determined by both the information that low-level mechanisms – beginning with the sensory receptors (e.g., retinas) – make available to subsequent perceptual processing and how perceptual processing uses that information to perform higher-level functions (e.g., bring about adaptive behaviors). The account I develop is within the informational teleosemantics camp but outperforms the earlier informational teleosemantic views from which it draws inspiration (Neander 2017; Shea 2018). These earlier views focus either on sensory input or on behavioral output as the primary determiners of perceptual content. This leaves them open to content indeterminacy challenges that can only be answered by appealing to both of these aspects of content determination.

In Chapters 3-5, I apply this method to the early stages of low-level perceptual processing (shallow processing of inputs to the sensory receptors) – or, as I call them, minimal sensory modalities. Each minimal sensory modality is identified with the mechanisms that process its characteristic inputs without taking input from the stimulation of sensory receptors associated with the other minimal senses. The idea is to identify what each minimal sense achieves, by way of spatial perception, first on its own and then in combination with other minimal senses. This approach builds on the long tradition of isolating sensory-specific contributions to our mental lives – e.g., Condillac's (1754) famous statue thought experiment, Strawson's (1959) 'purely auditory experience', and Haggard, Chen, and Fardo's 'skin space' (Cheng and Haggard 2018; Cheng 2019; Fardo et al. 2018) – while avoiding the pitfalls of theorizing about the contents of perception in terms of the Aristotelian senses pointed out in chapter 1. The minimal sensory modalities are not versions of the Aristotelian senses, though they frequently have Aristotelian analogues – e.g., minimal audition will be concerned with shallow processing of stimulation of the basilar membranes by pressure waves.

In Chapter 3 I argue that minimal audition does not capture any egocentric spatial information by itself and, hence, that sensory experiences produced just by minimal audition (so-called 'purely auditory experiences') could not represent the locations of sounds. Spatial audition requires visual and proprioceptive inputs (traditionally conceived). I show how this result undermines the primary argument in favor of the view that sounds are events or properties of their sources rather than pressure waves or sensations – an argument that depends on sounds being auditorily experienced as located (Casati and Dokic 2005, 2009; O'Callaghan 2007, 2010). Given the lack of agreement concerning the individuation of the Aristotelian senses, the (less-than-minimal) notion of audition at play in this argument might well turn out not to be audition at all.
In Chapter 4 I examine the contributions of the minimal bodily senses (proprioception, touch, and equilibrioception) to the formation of representations, in the brain, of the body's spatial structure and present posture. I argue that these resources are sufficient for the formation of representations of bodily space, but – in the absence of additional sensory input (particularly from vision) – they fall short of representing external spatial properties. In particular, I use the minimal senses approach to evaluate and clarify the hypothesis that the body schema – a dynamic, subconscious representation of one's current bodily posture – is acquired via sensory stimulation resulting from in utero movements (Meltzoff and Moore 1997; Meltzoff 2007a, 2007b; Marshall and Meltzoff 2014, 2015; Meltzoff and Marshall 2018; Fagard et al. 2018). I suggest further empirical work that will be needed to determine if the hypothesis is true and then draw out implications for debates concerning the existence of and interactions between other putative body representations.

Chapter 5 argues that minimal vision cannot capture information about egocentric spatial features – it is restricted to a retinocentric spatial frame (or a spatial frame defined by the integration of the two retinocentric frames in binocular vision). However, the integration of the spatial frames of minimal vision with the body-based representations discussed in chapter 4 is sufficient for the perception of egocentric spatial properties. These points are drawn out in a discussion of two claims about the dependence of our ability to see spatial properties on action. The first claim is that the spatial significance of vision – and hence the perspectival character of visual experience – is derived from action (Evans 1982; Grush 2007; Schellenberg 2007, 2010). The second is that the visual perception of the intrinsic (non-relational/viewpoint-invariant) spatial features of objects depends on action (Hurley 1998; Hurley and Noë 2003; Noë 2006; Schellenberg 2007, 2010). I argue that proponents of both claims have overlooked non-retinal inputs to (what they are taking as) visual experience. That is, they tend to unwittingly treat their common sense notion of vision as though it were – with respect to the perceptual inputs it receives – minimal vision. This leads them to mistakenly identify informational shortfalls in vision that they believe action is needed to overcome.

The collective result of these last three chapters is that the integration of the minimal modalities is sufficient for perceiving external spatial properties. For example, the integration of the minimal bodily senses can result in a spatial representation of the body that, when further integrated with input from minimal vision, can encode metric distances and directions in external space. However, this requires the coordination of innate subpersonal representations of spatial features of the sensory receptors – e.g., that they are continuous surfaces.

Chapter 1: Sensory Modalities and Novel Features of Perceptual Experiences

ABSTRACT: Is the flavor of mint reducible to the minty smell, the taste, and the menthol-like coolness on the roof of one's mouth, or does it include something over and above these – something not properly associated with any one of the contributing senses?
More generally, are there features of perceptual experiences – so-called novel features – that are not associated with any of the standard Aristotelian senses (vision, audition, touch, taste, and smell) taken singly (the flavor of mint being one of the prime candidates for such features)? This question has received a lot of attention of late.[1] Yet surprisingly little attention has been paid to the question of what it means to say that a feature is associated with a modality in the first place. Indeed, there is only one fully developed proposal in the literature (O'Callaghan, 2014a, 2014b, 2015, 2017).[2] I argue that this proposal is too permissive to inform the debate over novel features. I go on to argue that all attempts to formulate a better proposal along these lines fail. The corollary of my arguments is that the question of the existence of novel features is poorly formed. Furthermore, the problem generalizes, with the result that we should not rely on our pre-theoretical notions of the senses as the basis of theorizing about the features (contents and phenomenal character) of perceptual experiences.

[1] See Bayne (2014), Briscoe (2016a, 2016b, 2019), Connolly (2014), Fulkerson (2014a), Macpherson (2011a), O'Callaghan (2008, 2012, 2014a, 2014b, 2015, 2017a, 2017b, 2019). See Spence and Bayne (2014) for a critical look at the evidence for multisensory experiences (with or without novel features).

[2] A bit more, but still too little, attention has been paid to another unresolved problem that undermines the attempt to answer this question – namely, that there is no uncontroversial demarcation of the senses on which to base these associations. While I focus on the problem of feature association, I will engage with the further problem of individuating the senses to the extent necessary to address that first problem (see, especially, section 4.2).

Contents

1. Background and significance
2. Novel features
3. Constraining the association rule
4. O'Callaghan's association rule and some morals drawn from it
  4.1. O'Callaghan's rule
  4.2. comments regarding mechanisms and modalities
5. Revising the association rule
  5.1. 'could' and 'would' in determinate sensory mechanisms
  5.2. loosening constraints on the stimulus
  5.3. loosening what counts as a result
6. The problem generalizes
7. Conclusion
Appendix

Is the flavor of mint reducible to the minty smell, the taste, and the menthol-like coolness on the roof of one's mouth, or does it include something over and above these – something not properly associated with any one of the contributing senses? More generally, are there features of perceptual experiences – so-called novel features – that are not associated with any of the standard Aristotelian senses (vision, audition, touch, taste, and smell) taken singly (the flavor of mint being one of the prime candidates for such features)? This question has received a lot of attention of late.[3] Yet surprisingly little attention has been paid to the question of what it means to say that a feature is associated with a modality in the first place. Indeed, there is only one fully developed proposal in the literature (O'Callaghan, 2014a, 2014b, 2015, 2017).[4]
I argue that this proposal is too permissive to inform the debate over novel features. I go on to argue that all attempts to formulate a better proposal along these lines fail. The corollary of my arguments is that the question of the existence of novel features is poorly formed. Furthermore, the problem generalizes, with the result that we should not rely on our pre-theoretical notions of the senses as the basis of theorizing about the features (contents and phenomenal character) of perceptual experiences.

[3] See Bayne (2014), Briscoe (2016a, 2016b, 2019), Connolly (2014), Fulkerson (2014a), Macpherson (2011a), O'Callaghan (2008, 2012, 2014a, 2014b, 2015, 2017a, 2017b, 2019). See Spence and Bayne (2014) for a critical look at the evidence for multisensory experiences (with or without novel features).

[4] A bit more, but still too little, attention has been paid to another unresolved problem that undermines the attempt to answer this question – namely, that there is no uncontroversial demarcation of the senses on which to base these associations. While I focus on the problem of feature association, I will engage with the further problem of individuating the senses to the extent necessary to address that first problem (see, especially, section 4.2).

1. Background and Significance

The philosophy of perception has come a long way in the last two decades. As recently as 1999 one could be forgiven for confusing the philosophy of perception with the philosophy of vision, but since then great strides have been taken to broaden our perspective, first opening new avenues for philosophical research into the senses of hearing, taste, smell, and touch, and now beginning to transcend the boundaries of these senses themselves. We are now able to ask questions like the one with which I began: Are there distinctively multisensory (i.e., novel) features of perceptual experiences? In what follows I will argue that this progress has not taken us far enough. Multisensory approaches to perception retain the inheritance of the earlier sense-by-sense approach, and this inheritance impedes further progress in the philosophy of perception.

When we take the senses one at a time, the complexities introduced by their interactions aren't salient, and it is easy to think that our intuitive associations of features with these senses are unproblematic (and exhaustive). Unfortunately, this unreflective reliance on intuitive feature associations is still evident in much of the discussion of multisensory experience. Bayne (2014) is representative here. After acknowledging that there is no clear rule for associating a given feature of a perceptual experience with a sense and that there is no general agreement on how to individuate the senses, he remarks, "I will (largely) set [these issues] to one side here. As we will see, significant progress can be made in evaluating the decomposition thesis by relying only on our intuitive sense of how to individuate experiences" (p. 16). (The decomposition thesis is the claim that there are no novel features, and by "individuate experiences" Bayne means carve a perceptual experience up into sense-specific components – i.e., assign features of the experience to one or another of the Aristotelian senses – insofar as is possible.) But it is exactly at the interactions of the senses that the relevant intuitions start to break down. We need some other basis for associating features with the senses to properly undertake the multisensory account of perception.

O'Callaghan is alone in offering a thoroughgoing feature association rule.[5] As we shall see, though, his proposal won't actually settle the novel feature question. Furthermore, no proposal will.
Indeed, no proposal can even deliver all the uncontroversial associations of features with our intuitive, pre-theoretical understandings of the Aristotelian senses – at least not in a way that is otherwise adequate for settling the novel feature question. The lesson I draw is that our pre-theoretical notions of the senses, in terms of which the novel feature debate is couched, are an inadequate basis for theorizing about the content and phenomenal character of perceptual experience.[6] Though I focus on the novel feature debate, this point holds for any debate concerning the content or phenomenology of perceptual experience (e.g., intentionalism,[7] the thick/thin contents of experience debate, cognitive penetration, etc.). I recommend that we abandon the last vestiges of the sense-by-sense approach. Instead, we should focus on the contributions of the relevant perceptual mechanisms to perceptual experience, without worrying over how these mechanisms relate to our pre-theoretical notions of the (Aristotelian) senses. (More on mechanisms and senses below.) As the novel feature debate is my point of entry, the first step will be to get a clearer understanding of what these novel features are. It is to this that I now turn.

[5] Others have recently adopted O'Callaghan's proposal (e.g., Briscoe 2021), but none have held it up to scrutiny as I shall do. Fulkerson (2011, 2014) proposes his own partial rule based in feature binding. By his own admission, though, binding is "at best a sufficient condition" for features to be associated with a given modality (2014b, p. 34). Adjudicating the novel feature debate requires a complete remedy. So I will set Fulkerson's proposal aside (but see n.44).

[6] I follow the dialectic of the novel feature debate in speaking of perceptual experiences. While the experience/sub-experiential distinction does seem to be at play in explaining some of the intuitions I discuss, all that will matter for my arguments – particularly with respect to perceptual contents – is that the intuitions regarding the Aristotelian senses are what they are.

[7] The problem is especially salient for intramodal intentionalists, who claim that perceptual phenomenology supervenes on content+modality (e.g., Crane 2003, 2007; Lycan 1996). See Bourget (2017) and O'Dea (2006) for reasons why novel features pose a problem for intramodal intentionalism.

2. Novel Features

Novel features are features of a perceptual experience that are not properly associated with any Aristotelian sense (pre-theoretically understood) – vision, audition, touch, taste, or smell.[8] Rather, they arise due to the operation of multiple senses working together.[9] The relevant features are simple features of both the phenomenal character and representational content of the perceptual experience.
When I speak of features, I will mean such simple features. (Sometimes the literature focuses solely on phenomenal features, other times both, and sometimes it is ambiguous between the two. I will consider both.) Note, also, that the Aristotelian senses are exteroceptive – they inform us about the external environment. Novel features – if there are any – are features of experiences by which we encounter our environment. The novel feature debate does not address features of interoceptive experiences, which inform us of states of our own body (such as hunger or pain).

Novel features fall into two classes:[10]

i. Mere novel feature instances: Features of a perceptual experience that, while they can in some circumstances be associated with experience in just one sensory modality, only appear as a feature of the given experience due to the coordinated operation of at least two distinct sensory modalities.

ii. Novel feature types: Features of a perceptual experience that are never associated with a modality; they can only appear as a feature of a perceptual experience due to the coordinated operation of at least two distinct sensory modalities.

[8] An anonymous reviewer pointed out the intriguing possibility of features that are mistakenly attributed to a modality. Such features are either: (a) features that ought to be associated with one modality but are, due to interaction of the two senses, apparently associated with another (see Spence (2016) on oral referral as a potential instance of this) or (b) features that aren't properly associated with any modality but are mistakenly thought to be. Features of type (a) would not be novel. Features of (b) would be novel. Making sense of mistaken feature associations will, of course, depend on a rule for associating features with modalities – just as understanding novel features does.

[9] And the modalities involved should suffice for the resulting experience to include the novel feature. Being a bit of pine wood is not a novel feature of an experience in which one picks up a stick and smells it to ascertain what sort of wood it is because, though one needs the olfactory input along with visual and/or tactile input for the experience to include that property in its contents, one will also need cognitive inputs (e.g., a PINE concept) for this property to work into the contents of the experience.

[10] This distinction was introduced and further developed by O'Callaghan (2014a, 2015, 2017). The term 'novel feature' is his.

Relations, the relata of which are perceived by different modalities, are the paradigm candidates for mere novel feature instances – e.g., the temporal relation between the sight of a baseball player's foot touching third base and the sound of the ball striking the third baseman's mitt used by an umpire to determine whether the runner makes it safely to third. Other candidates include cross-modally perceived gestalts, such as a rhythm articulated by alternating flashes and beeps.[11]

A standard candidate for a novel feature type is flavor, as distinct from taste, where olfactory receptors interact with stimulation of the taste buds and input from the trigeminal nerve (by which we sense, e.g., the coolness of the mint or the heat of a chili pepper) to deliver an experience that is not (supposedly) reducible to the contributions of each of these, individually.[12] Another candidate is intermodal feature binding awareness – where features associated with different senses are experienced as bound together in a single perceptual object and the unity of the multisensory perceptual object is a feature of the experience (the putative novel feature).[13]

It has been suggested that there is a distinct flavor modality (e.g., Stevenson 2009). The proposed modality encompasses taste, smell, trigeminal tactile stimulation, and (in some cases) vision. Without ruling on the propriety of treating such a system as a distinct sensory modality, we can note that such a modality is not one of the Aristotelian senses in terms of which the novel feature debate is conducted. Nor do we have any pre-theoretical notion of this flavor sense – it is a theoretical posit.
Any dispute between someone who claims that flavor includes novel features and one who claims that, while there are features involved in flavor experience beyond those supplied by taste, smell, and trigeminal stimulation, these are properly associated with a further (theoretical) flavor sense is merely verbal. This demonstrates the need for an agreed-upon set of senses against which to evaluate novel features. In the novel feature debate, the Aristotelian senses play this role.

[11] O'Callaghan (2015) offers an extensive discussion of cross-modal gestalts in the context of the novel features debate; for a skeptical take see Spence (2015a).

[12] See, e.g., Connolly (2014) and O'Callaghan (2014a, 2014b, 2015, 2017, 2019) for (somewhat guarded) defenses of flavor as a novel feature type. For more on the various inputs to flavor perception see Auvray and Spence (2008), Spence (2015b).

[13] Again, Spence and Bayne (2014) offer reason to doubt that we have multisensory experiences, including the experience of multisensory objects. See also Spence and Frings (2020) for a discussion of one particular approach to multisensory feature integration. See especially O'Callaghan (2014a).

It is also important to mark a distinction made within this literature between cross-modal interactions between the (Aristotelian) senses and properly multisensory effects. Here is O'Callaghan on the matter: "one sense could causally but not constitutively impact another. And sensory processes might interact while every aspect of perceptual experience remains associated with some specific modality or another. So, despite all the cross-talk, conscious perceptual experience might remain modality specific" (2015, p. 134). Standard examples of such cross-modal interactions without novel features are recalibration effects of the sort that generate illusions like the McGurk effect and the ventriloquism effect.

In the McGurk effect, participants are presented with a video display of someone mouthing the syllable /ga/ onto which is dubbed the sound of someone pronouncing /ba/. Subjects report hearing /da/ even after they are made aware of the illusion. In the ventriloquism effect, a visual stimulus leads to a change in the apparent location of a sound.[14] The effect gets its name from the experience of hearing a ventriloquist's voice as coming from her dummy's mouth (because the ventriloquist's mouth doesn't move but the dummy's does). To give a more modern example: when watching a movie while wearing headphones, we experience the voices of the characters on the screen as coming from the mouths of those characters, not as coming from somewhere between our ears (Callan et al. 2015).

[14] The term 'ventriloquism effect' is sometimes used in a broader sense to include the spatial capture of audition by any other sense (see Caclin, Soto-Faraco, Kingstone, and Spence 2002). I will focus on the audiovisual ventriloquism effect described above.

These illusions are the result of sub-experiential interactions between the senses. Such interactions resolve discrepancies between the senses when they carry conflicting information about some feature of the environment or allow one modality to inherit information from another modality, where the second modality has a greater acuity or is more reliable with respect to that class of information. For instance, vision has greater spatial acuity than audition – at least for stimuli projected onto the fovea (a cone-rich structure of the retina onto which the center of the visual field is focused).
When auditory and foveal visual stimulation are attributed to a common source, the localization of the visual stimulus dominates audition, conferring the more precise visual location on the sound. This recalibration of information leads to a corresponding alteration in the phenomenology and content of the perceptual experience. In ordinary circumstances such recalibration tends to increase the reliability of the senses. However, in irregular circumstances – such as those in which the illusions arise – these interactions can lead to errors.

The thought, widely endorsed in the novel features literature, is that such causal interactions do not result in novel features.[15] A feature of a perceptual experience can still be associated with a single modality even when it is produced by more than one mechanism. If we classified features as multisensory strictly by the mechanisms that produce them, then nearly all the features of perceptual experience would be multisensory because causal interactions among the sensory mechanisms (insofar as we can assign them to pre-theoretical notions of the senses) are pervasive, as noted by O'Callaghan, above.[16] While this is a perfectly legitimate practice – one common in the sciences – it deviates dramatically from that of partisans in the novel feature debate, with their reliance on the pre-theoretical notions of the Aristotelian senses as the basis for classifying sensory features.

[15] Cf. Bayne (2014), with respect to contents. See Connolly (2014) for a reply. One might also push back on this claim by noting that cross-modal illusions do induce some meta-cognitive uncertainty about the reliability of the experience (Deroy, Spence, and Noppeney 2016). However, the uncertainty is not a feature of the first order experience itself; it is an assessment of the reliability of that first order experience. (We don't hear that there is uncertainty, we have uncertainty about what we hear.) Since novel features, if there are any, are features of the perceptual experience, metacognitive uncertainty won't undermine the intuition that cross-modal illusions don't produce novel features.

But, of course, to classify a feature as novel we will need some account of what it is for a feature (e.g., some particular color phenomenology) to be associated with a sensory modality (e.g., vision). As we saw with Bayne, above, philosophers debating the existence of novel features have generally relied on intuitive answers to this question (e.g., intuitions about the proper sensibles of a modality – those features that are perceivable by that modality and no others). But this approach will do no better, for the purposes of the novel feature debate, than the scientific approach detailed above. The appeal to such intuitions cannot settle the debate, since the features about which there is a dispute are precisely those about which intuitions diverge. If we are going to make any headway, we will need some other way of associating features with sensory modalities. The challenge is to draw the distinction between a causal impact and a constitutive impact in such a way that these intuitions are respected. It is very appealing to think that it should be drawn in terms of the features that the modified modality might produce on its own. A causal impact won't – the thought goes – exceed these bounds while a constitutive impact (yielding a novel feature) will.
This is the thought behind O'Callaghan's proposed association rule, which we will examine shortly. Before we can assess that rule, we are going to need some criteria by which to judge it.

[16] See Driver and Noesselt (2008), Ghazanfar and Schroeder (2006), Schroeder and Foxe (2005), Shimojo and Shams (2001).

3. Constraining the association rule

Any acceptable association rule must meet certain constraints. For instance, if the rule is going to be useful in assessing whether or not there are novel features, (1) it will have to tell us, for any given feature and sensory modality, whether or not that feature is associated with that modality. And (2) the associations the rule delivers must conform to certain intuitive associations of features with modalities. A rule that associates colors with smell and not with vision is not a good rule. Also (3) the rule must not prematurely trivialize the novel feature debate: We must not be able to read the existence or non-existence of novel features right off the rule. (A rule with, e.g., a final clause, "otherwise the feature is to be associated with vision", does not satisfy this constraint.) And the novel features the rule allows should have independent theoretical interest. More generally, it must not render the debate unsubstantive.

The intuitive associations of the second constraint that I shall focus on concern (a) proper sensibles and (b) merely causal influences of one sense on another. With respect to (a), these are features that are intuitively associated with only one sensory modality. Color, for instance, is a feature of visual experience but not of olfaction, audition, touch, or taste. Similarly, pitch, timbre, and loudness are proper sensibles of audition; odors are proper sensibles of smell, and so on. The way I am using proper sensibles here applies both to features of content and phenomenal character. The relevant intuitions regarding (b)-type cases are just those concerning causal interactions, including cross-modal illusions, discussed in the prior section. A perceptual experience will contain no novel features due to these effects, not even mere novel feature instances. The impacted features are associated with the modality of the perceptual object that instantiates them – e.g., the illusory sound localization in the ventriloquism effect is associated with audition, not vision. These intuitions need to be respected if we are going to capture the pre-theoretical notions of the senses at play in the novel feature debate.

There is one more constraint, which will require a brief argument: (4) The association must be grounded in an appeal to the stimulation states of the sensory mechanisms of our modalities. In particular, the rule must appeal to the stimulation states of the mechanism of the modality with which the given feature is associated.[17] Sensory mechanisms (the sensory organs, the portion of the nervous system connecting them to the brain, and the brain regions involved in processing inputs received from the sensory organs) and stimuli are two of the four axes along which theorizing about sensory modalities occurs. The other two are representational content/worldly features perceived and the phenomenal character of the perceptual experience – the very sorts of things novel features are. (These criteria were codified by Grice (1962).) So, for any sensory modality there will be a mechanism that responds to some characteristic stimulus (e.g., light), a set of phenomenal features, and a set of contents ascribed to that sense.

[17] If you are already satisfied that explanations of the contents and phenomenal character of perception ultimately need to be grounded in the physical (e.g., stimulation states of sensory receptors and subsequent neural processing), then you can skip the remainder of this section.
Any attempt to associate features with sensory modalities straightaway in terms of the content of the perceptual experience or its phenomenal character (or the two in combination) will be inadequate for present purposes. Here's why: We need to distinguish between mere groupings of (simple) features of a perceptual experience and groupings that amount to the set of features properly associated with a given sensory modality. This will require an account of what a modality is. For the purposes of the novel feature debate, the relevant modalities need to correspond to the pre-theoretical notions of the Aristotelian senses – vision, audition, taste, smell, touch. If we restrict ourselves to features of the experience in giving our account of the modalities, we will need to appeal to relations between these features. It won't do to merely stipulate associations – we would need to presume the very thing at issue in order to meet the first constraint. The difficulty is finding relations that don't appeal to anything beyond the features themselves and that don't beg the question either for or against novel features.

While some have argued that the proper approach to individuating the senses is to focus on content, phenomenology, or both, these proposals are not made with the novel features debate in mind.[18] For their part, partisans in the novel feature debate have generally sidestepped the issue of individuating the senses (see the quote from Bayne, above). So, what might be said for the content and/or phenomenology approach to individuating the senses for the purposes of adjudicating the novel feature debate?

[18] The closest account we get is that of Richardson (2014). Richardson addresses causal influences of one sense on another, but not the possibility of novel features.

We might try individuating the Aristotelian senses in terms of discrete phenomenal continua – e.g., color or pitch – in terms of which a notion of phenomenal similarity could be spelled out. But doing so leaves us with too many senses – e.g., a pitch sense, a timbre sense, a color sense, etc. Or we might try an appeal to dependencies between such features, letting the senses be the collections of interdependent features. For instance, we only see colors if we see them qualifying some shape (or spatial expanse), so color and (a certain kind of) shape content/phenomenology belong to the same sense (vision). However, this won't tell you which non-dependent features ought to be associated with a sensory modality, but we need an association of such non-dependent features to ground our associations of the dependent features.[19]

Or we might try individuating the Aristotelian senses by phenomenology/content pairs. But we will still need a reason to pick certain pairings and not others. It is not the case that an experience with a particular content necessarily has a given phenomenology. For instance, we can have a perceptual experience with the content that there is a circle accompanied by either visual or tactile phenomenology. The question is why the one pairing should count as visual and the other as tactile.[20]
It won't be because the visual phenomenology is grouped along one phenomenal continuum and the tactile another, since we will have to include multiple continua in each sense (see above). We need some further reason to include/exclude the relevant continuum in/from the sense in question. We can't appeal to dependencies to group the relevant continua into a sense (e.g., one can only see the color of the circle if one has the visual sort of circle phenomenology) without some prior rule for grouping the non-dependent features. But there don't appear to be any other relations between features of a perceptual experience that could make the difference of a sense.[21]

[19] And there are features that, while not seeming to ground any further associations, are non-dependent and are intuitively associated with an Aristotelian sense – e.g., silence is associated with audition and empty space is associated with multiple senses: I can see the space between two objects, or I can feel the empty space in front of me when I have my hand about.

[20] The point applies to experience with non-perceptual contents, too. I could be thinking about circles (or whatever else) while looking at a blank wall. Why shouldn't the intuitively non-perceptual content get paired with some perceptual phenomenology and, on the strength of that pairing, be associated with one of our sensory modalities? Presumably we distinguish perceptual contents from non-perceptual contents because the former, and not the latter, are the result of a perceptual mechanism. But this has us appealing to mechanisms, so it can't be the answer for the content/phenomenology pairing approach.

[21] If you are inclined to respond with an appeal to the relative determinacy of visual v. tactile shape discrimination, substitute this example: We can feel and see the location of a pin prick on our fingertip.

So we can't give an adequate account of the sensory modalities (for present purposes) in terms of just the content and phenomenal character of an experience. Fortunately, we can appeal to Grice's other two criteria, mechanisms and stimulation, in formulating our association rule – individuating the senses by sensory mechanisms and using the stimulation states of the mechanisms as the basis of associations of features with modalities – namely, the modalities responsible for the production of those features. Hence the fourth constraint on our association rule.[22]

[22] Since mechanisms are partially defined in terms of their response to stimuli, there is no disentangling the two. There are proposals that don't rely, directly, on Grice's criteria (esp. Nudds 2004) but appeal to conventions – e.g., those features count as visual that we conventionally associate with vision. However, such accounts – by their own admission – won't deliver unambiguous verdicts in the controversial cases.

4. O'Callaghan's association rule and some morals drawn from it

4.1. O'Callaghan's rule

O'Callaghan has offered an association rule that, at first blush, seems to satisfy this fourth constraint: "for each perceptual experience of a given modality, the [features] it instantiates that [are] associated with that modality on that occasion [include] only that which a corresponding [unimodal] experience of that modality could have" (2014b, p. 152).[23] He defines "corresponding experiences" as experiences resulting from an equivalent stimulus (2014b, p. 152). O'Callaghan understands unimodal experience in terms of mechanisms, suggesting that we get a fix on a unimodal experience by considering what is left once we subtract the contributions of the mechanisms associated with the other sensory modalities (2014b, p. 145). More formally, for an instance of a feature F of a perceptual experience E resulting from a stimulus R:
(OCR) That instance of F is associated with a given modality if, and only if, a corresponding instance of F could be a feature of a unimodal experience E', in that modality, resulting from stimulation by R.[24]

So, the smell of coffee brewing, when part of a perceptual experience including the sight of bread in the toaster, the feel of a jar of jam in your hand, the sound of coffee percolating, etc., will belong to olfaction because, were we to isolate olfaction, ruling out any input from the other modalities, and present it with exactly the same scene, the resulting experience would include the smell of the coffee brewing. But the sound of the coffee percolating would not be associated with olfaction because olfaction would presumably not, in the presence of that same total stimulus, deliver the sound of percolating coffee.

It is clear what OCR excludes: If a feature instance is to be assigned to a modality, its production cannot require concurrent stimulations outside the characteristic sensory inputs to that modality. But what about (i) past stimulations in another modality, as when one now "sees" the heat of an ember thanks to learned correlations between heat and a red glow? And what about (ii) inputs that are not stimulations at all, such as an innate Kantian spatial form of intuition?

[23] O'Callaghan's formulation focuses only on phenomenal features. I adjust it here to include content.

[24] All the proposed rules should be read as relativized to normally functioning sensory mechanisms. Notice, also, that the rules are formulated in terms of feature instances. A novel feature type will be one with no instances that are associated with any modality. We can say that a feature type is associated with a modality if some unimodal experience of that modality instantiates that feature. Finally, recognize that the association relation cannot be a function – we must leave open the possibility that a feature is associated with more than one modality if we are to accommodate common sensibles. It might be that the association relation is a function for phenomenal features, but that is a substantive philosophical thesis that is more often assumed than argued for.

Until these questions are answered, OCR cannot satisfy our first constraint.[25] It does not tell us for any given feature instance and for each modality whether or not the feature instance is associated with that modality.[26] With respect to (i), for instance, we learn to hear the elevation of a sound by correlating the effects of the outer ear on the waveform of the sound – which differ with elevation – with visual input regarding the elevation of sound sources. Whether or not we include this correlation as part of the input to the unimodal experience will determine whether or not we should associate sound elevation with audition according to OCR. The problem is even worse for (ii). Whatever could have been supplied (in terms of content or phenomenal character) by the other senses could be supplied by inputs that aren't derived from sensory stimulations. There could be, e.g., an innate correlation between waveform effects and sound elevation that does just what the learned correlation discussed above does.
And if that is possible, subtracting vision (or any other modality) won't shed any light on the deliverances of unimodal audition according to OCR. Likewise for all the other modalities. For instance, there could be an innate correlation between the smell of coffee brewing and the sound of a percolator, in which case OCR would associate the sound with olfaction. (I will have more to say about problems resulting from the use of 'could' below.)

[25] To be fair, O'Callaghan (2014b) does discuss the problem of learned correlations, drawing a distinction between two kinds of unimodal experience: pure and mere experiences of a modality. The former are perceptual experiences had by a creature whose past and present experiences have only been in that modality. The latter are those had by a creature whose experience is presently, but not historically, restricted to that modality. O'Callaghan inclines towards 'yes'. Others – most notably Strawson (1959) in his discussion of sounds and a 'no space world' – have said 'no'. De Vignemont's (2014) multimodal account of bodily awareness also depends on a 'no' answer. O'Callaghan does not discuss stimulus-independent inputs.

[26] One might also worry that there is no uncontroversial way of individuating the senses and, so, no way of identifying what, say, vision is – let alone what features should be associated with it (Coady 1974; Fulkerson 2014c; Gray 2013; Grice 1962; Heil 1983; Keeley 2002; Macpherson 2011b, 2011c, 2014; Matthen 2015; Nelkin 1990; Nudds 2004; Roxbee-Cox 1970). However, the cases considered in the individuation of the senses literature concern determinable senses – the sort of senses that might be shared by creatures with differing sensory mechanisms and capacities associated with that sense (e.g., a sense of vision that can be had by humans, bees, and pit vipers). The novel feature debate, on the other hand, concerns a characterization of our experiences. If we focus just on the (relatively) determinate senses that we, as a species, share (as O'Callaghan does in his discussion), we might get a workable individuation of the Aristotelian senses. (I borrow the determinate/determinable sense distinction from O'Callaghan (2019, chapter 6).)

The lesson is that we will have to take a stance on what is and is not actually part of/an input to the mechanism responding to the stimulus and giving rise to the experience (including any learned correlations and non-sensory inputs) before we can successfully apply an association rule based in an individuation of the senses by those mechanisms. Before proceeding to further attempts at an association rule, some remarks about these sensory mechanisms are in order.

4.2. Comments regarding mechanisms and modalities

The preceding discussion should alert us to the fact that we do not have uncontroversial intuitions about the exact extent of the sensory mechanisms of the Aristotelian senses.[27] This suggests that there are multiple intuitively acceptable ways of demarcating these mechanisms. This poses a problem for the novel feature debate because differences in demarcation will lead to differences in feature associations. And these differences are most likely to arise with respect to the controversial feature associations, including those concerning putative novel features.[28] For instance, one acceptable demarcation might associate contested flavor features with taste while another doesn't associate it with any of our Aristotelian senses.
But the debate presumes that there will be an unqualified answer to the question ‘are there novel features?’ We will need a criterion for selecting a preferred demarcation of the mechanisms of a given Aristotelian sense if we are to get our unqualified answer. (Footnote 27: Indeed, it isn’t entirely clear whether or not cognitive inputs ought to be included in the sensory mechanisms, not least because it isn’t clear where the line between perception and cognition should be drawn. If cognitive resources are required in order to give rise to experience, then they will certainly be involved in the relevant mechanisms. Specifying which cognitive resources are included will have ramifications for the debates over rich/thin contents of perceptual experience and cognitive penetration.) (Footnote 28: In developing his sensory pluralism, Fulkerson (2014c) notes that different individuations (chosen for different purposes) will give different results for different experiences. He doesn’t question that there will be some individuation of the senses that will be acceptable for answering the general question that is in dispute in the novel feature debate. (See Macpherson 2011b, 2011c for a similar sensory pluralist view.)) Here we can press our association rule into service. The idea is that candidate demarcations will be evaluated with respect to their ability to deliver the right verdict regarding our guiding feature association intuitions from constraint (2), among others. (These other intuitions – call them secondary intuitions – will be intuitive feature associations that we would like to retain if we can, but which we are willing to give up to save more central intuitions.) In order to do this, though, we need to settle on the right association rule – and that requires having clearly demarcated mechanisms for the rule to operate on. We need the rule to pick the demarcation, but we need the demarcation to pick the rule. The key to resolving this difficulty is finding some equilibrium between intuitions regarding feature associations and intuitions regarding the mechanisms of the Aristotelian senses.

Returning to our second constraint on the association rule – that it must satisfy certain guiding feature association intuitions – we can add the qualification: provided the notion of a modality that you feed into it satisfies guiding intuitions about that modality. It is no mark against an association rule that it associates colors with olfaction if the relevant notion of olfaction includes retinal stimulation. Fortunately, though they don’t decide what to say about inputs of type (i) and (ii), our intuitions concerning the mechanisms of the Aristotelian senses do not leave us entirely at sea. For instance, it is uncontroversial that each of those senses has primary receptors (e.g., retinas for vision, the basilar membrane for audition, etc.) that are tuned to a particular class of energy (e.g., light, pressure waves, etc.) and that the primary receptors of a given sense are not part of the mechanisms of the other senses. (Footnote 29: Things get a little trickier for the other senses but, I think, not unbearably so.) These intuitions are linked to, but not dependent on, the feature association intuitions of constraint (2). Regarding the intuition that the proper sensibles are associated with only one sensory modality, color is a proper sensible of vision because color is the sort of thing that is only detectable by the primary receptors of vision.
Similarly, the paradigmatic instances of recalibration effects, including the McGurk and ventriloquism effects, require stimulation of the primary receptors of multiple modalities. (It has been shown that mental imagery can induce cross-modal illusions – e.g., that an imagined visual stimulus can induce the ventriloquism effect (Berger and Ehrsson 2013). However, this does not impact the primary receptor intuition any more than our ability to imagine a color while receiving no visual stimulation does. The primary receptor intuition is linked to the proper sensible and recalibration effect intuitions only with respect to their paradigmatic instances – i.e., those resulting from occurrent stimulation of the senses involved. And it does not depend on these feature association intuitions even in the paradigmatic instances. If anything, the order of explanation goes the other way around. The primary receptor intuition is not affected at all by the existence of non-paradigmatic instances of the effects, such as those involving mental imagery.) 31 We might think that this gives us an alternative rule: A feature of a perceptual experience is associated with a 30 given modality just in case we lose those features of the experience in a corresponding situation differing only in that there is no stimulation of the primary receptor. However, this test won’t work. Consider the McGurk effect: If we omit stimulation of the basilar membrane (primary auditory receptor), we lose /da/ phenomenology (i.e., the sound of someone pronouncing /da/), so this feature is associated with audition. But if we omit retinal stimulation, we also lose that phenomenology, so it will also be associated with vision. But intuitively, /da/ phenomenology should only be associated with audition. Such imagery is another non-stimulus input of the sort discussed in relation to OCR, above. It remains an open 31 question whether or not the mechanisms of imagery ought to be included in those of one of our Aristotelian senses and, relatedly, whether or not imagery is properly thought of as perceptual. Settling the question will have consequences for applications of the association rules to follow. 26 So, any acceptable candidate for the preferred demarcation of the sensory mechanisms to be used in adjudicating the novel feature debate will need to conform to our intuitions regarding primary receptors of the Aristotelian senses. With one further (safe) assumption – namely, that we don’t have stimulus-independent resources for duplicating the information about our environment that the primary receptors would capture in a given circumstance, were they active (whether or not they are) – we can assess how our rules fare with respect to our intuitions concerning proper sensibles and the McGurk and ventriloquism effects using a general notion of an acceptable demarcation – a placeholder demarcation on which the primary receptor condition and the safe assumption are satisfied. The goal is to find a rule that can deliver these intuitions for the general notion of an acceptable demarcation of the mechanisms of the Aristotelian senses. Once we have the rule in hand, we can use it to find our preferred demarcation by testing the acceptable candidate demarcations with respect to their ability to deliver our secondary intuitions. In short, we find our rule using the generalized acceptable demarcations of the Aristotelian senses, and then we us our rule to find the preferred demarcation of the mechanism of each of those senses. 
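The two-step procedure just described has the shape of a simple search, and it may help to set that shape out schematically. The sketch below is purely illustrative – the names (candidate_rules, acceptable_demarcations, the two lists of intuitions, and the verdict function) are placeholders of my own, and nothing is implied about how the relevant evaluations would actually be carried out.

# Purely schematic sketch of the two-step procedure described above.
# Every name here is an illustrative placeholder; `verdict` stands in for
# whatever applying a candidate association rule to a case, given a
# demarcation of the sensory mechanisms, would actually involve.

def find_rules(candidate_rules, generalized_demarcation, guiding_intuitions, verdict):
    """Step 1: keep only those rules that deliver every guiding intuition
    (proper sensibles, McGurk, ventriloquism) when applied with the
    generalized notion of an acceptable demarcation."""
    return [rule for rule in candidate_rules
            if all(verdict(rule, case, generalized_demarcation) == expected
                   for case, expected in guiding_intuitions)]

def find_demarcation(rule, acceptable_demarcations, secondary_intuitions, verdict):
    """Step 2: with a rule in hand, prefer the acceptable demarcation that
    saves the most secondary (defeasible) feature-association intuitions."""
    return max(acceptable_demarcations,
               key=lambda d: sum(verdict(rule, case, d) == expected
                                 for case, expected in secondary_intuitions))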
One might worry that, even with an association rule in hand, there will still be equally good candidates for the preferred demarcation that deliver different verdicts with respect to specific novel features. Be that as it may, without an association rule there is no hope for a resolution to the novel feature debate. With a rule some hope remains, so it behooves us to make the effort to find an adequate association rule. We have already seen that the rule must operate on clearly demarcated sensory mechanisms, so the dimensions along which the proposed rules can vary are: what counts as a 27 result of the stimulation states of a mechanism and how much we allow the stimulus to deviate from the actual stimulus in a counterfactual unimodal experience. In what follows I consider a series of modifications to OCR, each of which differs from its predecessor by a minimal adjustment in one or the other of these dimensions, thereby covering all the plausible ways of formulating an association rule. As we shall see, none of these attempts succeed, so the worry above never gets off the ground. (Tables summarizing the rules and how they fare with respect to the four constraints are included in the appendix.) 5. Revising the association rule 5.1. ‘Could’ and ‘would’ in determinate sensory mechanisms Let’s first modify OCR so that it will distinguish between our candidate demarcations of the mechanisms of a given sense. For an instance of a feature F of a perceptual experience E resulting from the stimulation of the sensory mechanisms M1,…, Mn by a (total) stimulus R: (RR1) That instance of F is associated with the modality individuated by Mm if, and only if, were just Mm stimulated by R, a corresponding instance of F could be a feature of the resulting experience E’. According to RR1 we will be able say that, if Mm is an auditory mechanism that includes learned correlations by which waveform effects can convey information about elevation, the experience of the elevation of a sound will be associated with audition. If Mm does not include the learned 28 correlation (or an innate proxy thereof), sound elevation will not be associated with audition. That is an improvement over OCR. However, the phenomenal features that could result from a unimodal experience under equivalent stimulation will not be as constrained as we want. Some deaf musicians, for instance, are able to distinguish pitch by tactile sensation of a vibrating surface (the surface being vibrated by or being the source of a sound wave). The resulting experience – or any tactile experience of a vibrating surface, for that matter – could have the phenomenal character we associate with pitch rather than the characteristic feel of a vibrating surface. But then RR1 associates features that ought to be proper sensibles (e.g., pitch phenomenology) with more than one modality (e.g., audition and touch), in violation of one of our guiding intuitions from constraint (2) on an acceptable association rule. RR1 is too promiscuous in its associations. 32 We can avoid this promiscuity by considering only what would result from the actual, determinate sensory mechanisms of the subject: (RR2) That instance of F is associated with the modality individuated by Mm if, and only if, were just Mm stimulated by R, a corresponding instance of F would be a feature of the resulting experience E’. 
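The only difference between RR1 and RR2 is the modal force of their consequents, and it may help to display that difference explicitly. The regimentation below is offered only as a gloss in my own notation (nothing in the argument turns on the choice of modal logic): write Stim(Mm, R) for ‘only Mm is stimulated, and by R’, F(E′) for ‘a corresponding instance of F is a feature of the resulting experience E′’, ‘>’ for the counterfactual conditional, and ‘◇’ for possibility.

\[
\textbf{(RR1)}\qquad \mathrm{Assoc}(F, M_m) \;\leftrightarrow\; \big[\, \mathrm{Stim}(M_m, R) > \Diamond\, F(E') \,\big]
\]
\[
\textbf{(RR2)}\qquad \mathrm{Assoc}(F, M_m) \;\leftrightarrow\; \big[\, \mathrm{Stim}(M_m, R) > F(E') \,\big]
\]

On this rendering, the deaf-musician case exploits the ‘◇’ in RR1: it suffices that the tactile mechanism could have realized pitch phenomenology under that stimulation, whereas RR2 asks only what the subject’s actual, determinate mechanisms would deliver.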
While there is some evidence, from a select number of long-time users of sensory substitution devices, that 32 stimulation in one modality can result in some rudimentary phenomenology characteristic of another, this is due to recruiting resources of the latter modality to the processing of input to the former, in which case we are dealing with a different mechanism (one including the recruited resources) than the one with which we began (the ordinary touch mechanisms). The point here is that no acceptable demarcation of the ordinary mechanism of touch has, as a matter of fact, the ability to produce pitch phenomenology on its own, though it is conceivable that it could. 29 This corrects for our tactile pitch phenomenology. We wouldn’t have pitch phenomenology arising from tactile vibrations given the sensory mechanisms we actually have (on an acceptable demarcation of the tactile mechanism). However, given the restriction to the exact stimulus, R, in the counterfactual situation, there will be no way of capturing intuitions regarding recalibration effects. If the mechanism delivers a given feature only because of the influence of another mechanism, then that feature would not be contributed by the unimodal mechanism under that stimulation. For example, we cannot attribute any phoneme content/phenomenology to audition in the McGurk effect because the unimodal auditory mechanism would not, under equivalent stimulation result in /da/ content/phenomenology. That requires stimulation of the primary receptors of the visual mechanism as well. We will need to loosen the association rule if we are to capture the guiding intuition that the altered phoneme phenomenology/content in the McGurk effect is auditory. 5.2. Loosening constraints on the stimulus Perhaps we should loosen the restriction on the stimulus to something like ‘a substantially similar stimulus’ (assuming this can be made sufficiently precise to avoid violating our first constraint). This would give us, for an instance of a feature F of a perceptual experience E resulting from the stimulation of the sensory mechanisms M1,…, Mn by a (total) stimulus R: 30 (RR3) That instance of F is associated with the modality individuated by Mm if, and only if, there is some stimulus R’ that is substantially similar to R and, were just Mm stimulated by R’, the resulting experience E’ would have a corresponding instance of F. The idea here is to associate features with the mechanisms that would produce them under some similar stimulation, privileging unimodal mechanisms. For instance, any acceptable auditory 33 mechanism is capable of giving rise to an experience representing /da/ and providing the phenomenal features we associate with the pronunciation of that syllable. So, we can class /da/ experiences as auditory, even when they are produced by a multimodal mechanism, provided the stimulus that would result in unimodal /da/ experience counts as substantially similar to the stimulus of that multimodal mechanism. But there is a problem. Consider the following: I see what looks like a speckled (flat) wall while also feeling that the wall is rough with my hand. Had the bumps on the wall been slightly higher or had the lighting come from a slightly different angle, the wall would have visually appeared rough. 
(Footnote 33: Alternatively, you might think that a feature will be associated with a modality just in case it is one of the disjuncts in the disjunction of possible features resulting from the actual stimulus state of that modality’s mechanism plus whatever effects can be generated on its output by the other mechanisms under a range of stimulation. Such a rule will need a non-question begging means of identifying the experiential results of a causal influence of one mechanism on another to avoid associating every feature of the experience with each modality. This will require a fuller specification of the dependencies of phenomenal features on informational content carried by the sensory mechanisms. See §5.3 for reasons to think this won’t help adjudicate the novel feature debate.)

RR3 will correctly associate the content that the wall before me is rough with touch; but it will also associate that content (incorrectly) with vision because, were I presented (in a unimodal visual experience) with a substantially similar stimulus that differed just in the height of the bumps or the angle of the light, I would visually experience the content that the wall is rough. (Footnote 34: This result doesn’t depend on any resolution of the conflict between information carried by the visual and tactile mechanisms, as in work on the visual dominance of touch (Rock and Victor 1964; Rock and Harris 1967). It depends on the fact that there is a possible stimulus similar to the actual stimulus that would make a content accessible to vision that is only accessible through touch in the actual experience – i.e., the visual mechanism plays no part in contributing that content to the original experience.) On the other hand, RR3 will not ascribe tactile roughness phenomenology to vision because the tactile phenomenology isn’t the sort of phenomenology that (any acceptable notion of) vision would have from our substantially similar visual stimulus. The point is that RR3 does not do well with respect to the contents of the experience, at least where the contents concern common sensibles. This suggests that an association rule for phenomenology will be easier to come by – common sensibles are still (intuitively) phenomenally distinguishable across the senses – so I will turn my attention to an association rule that works for phenomenal features. (Footnote 35: This is not to say that the test could not be salvaged for content. Perhaps one could make a response along the lines of Nudds’s (2009) distinction in the determinateness of experiences of common sensibles perceived with different modalities or an appeal to the total content characterizing the experience in which the content in question is embedded. Or perhaps we could try to restrict ‘substantially similar’ to rule out these cases. But these will require taking on theoretical commitments that might not pan out.) If no rule for phenomenal features can be found then there is no hope for the harder case of representational content.

However, RR3 also gives us the wrong results with respect to the phenomenology of ventriloquism effect experiences because the effect does not merely relocate the sound, it makes the localization more precise. And this is not restricted to the ventriloquism effect. In general, sound localization phenomenology resulting from the stimulation of just the auditory mechanism will not be as refined as sound localization phenomenology resulting from the auditory mechanism when aided by foveal visual input. (Footnote 36: See Battaglia et al. (2003), King (2009), and especially Bizley and King (2008). See Blauert (1997, pp. 193-196) for an overview. For a presentation of the most precise spatial information available to audition from purely auditory cues, see Shinn-Cunningham et al. (2000).) But then, if we just consider an experience resulting from substantially similar stimulation of the auditory mechanism alone, we won’t get sound localization phenomenology as in the actual experience. We will get a much less precise localization. So, the localization of the sound will not be associated with audition according to RR3, contrary to our intuitions. The same is true of every recalibratory cross-modal mechanism in which one of the mechanisms is more finely tuned than the other but we want to associate the resulting (phenomenal) feature with the latter modality. (Footnote 37: Such recalibration for greater precision does not presuppose greater accuracy, as the case of the ventriloquism effect should make clear.)

It will be no help to loosen our restriction on the stimulus further, by dropping the similarity requirement and leaving the stimulus unconstrained:

(RR4) That instance of F is associated with the modality individuated by Mm if, and only if, there is some stimulus R’ such that, were just Mm stimulated by R’, the resulting experience E’ would have a corresponding instance of F.

We still get the same result with respect to vision-assisted sound localization because no stimulus impinging on the auditory mechanism alone can give the fine-grained sound spatialization that arises from the recalibration of sound localization by foveal visual input to the auditory mechanism. Furthermore, this rule associates all the common sensibles in a given experience with every modality that is able to give rise to them, whether or not they do so in that experience. RR4 is decidedly worse than RR3.

One might be inclined, at this point, to bite the bullet and abandon our guiding intuitions regarding the ventriloquism effect and other precision-increasing sensory interactions. However, it is unclear why we should allow the retreat to substantially similar stimulation to save the McGurk intuition but not allow a retreat to a substantially similar, but less precise, result to save the ventriloquism intuition. Pending a compelling reason for the asymmetry, we ought to see if we can revise our association rule so that it captures both intuitions. The next section attempts to spell out just how we ought to do this.

5.3. Loosening what counts as a result

Perhaps there is some profit to be gained by considering the differences between the ventriloquism effect and a case where RR3 arguably gets things right with respect to novel features, flavor. Flavor experience (allegedly) contains novel feature types whereas the ventriloquism effect experience merely evinces increased precision with respect to a unimodally available feature type (sound localization phenomenology). The flavor/taste distinction is, supposedly, a difference in kind. The ventriloquism effect (and ordinary visually enhanced auditory localization, generally) reveals a difference in degree. But RR3 cannot capture this distinction because it is bound by those features that the unimodal mechanism – however demarcated – would deliver on its own. We have no room left to maneuver with respect to either stimulus or mechanism. This suggests that we need to loosen our requirement on the features that would result from substantially similar stimulation.
But we will need to do so in a way that does not wind up attributing too much to the modality. A first pass at the required modification would be to associate features of a perceptual experience with a modality that either would produce them, under some substantially similar stimulation, or would produce a feature that is a determinate of the same determinable as the initial feature – a determinable being a relatively indeterminate property (e.g., color) and a determinate of a determinable being a more specific property in the range of the determinable (e.g., blue). (Footnote 38: Having already seen that the rule must operate on determinate mechanisms and that no loosening of the variation in the stimulation between the actual and the counterfactual situations will, on its own, get us an acceptable rule, this is all that is left to us.) Determinates of a given determinable can, themselves, be determinables with yet more specific determinates (e.g., azure). (The appeal to determinates/determinables is a more general way of capturing the difference in degree/kind alluded to above.) So, for an instance of a feature F of an experience E resulting from the stimulation of the sensory mechanisms M1,…, Mn by a stimulus R:

(RR5) That instance of F is associated with the modality individuated by Mm if, and only if, there is some stimulus R’ that is substantially similar to R and, were just Mm stimulated by R’, the resulting experience E’ would either: (a) have a corresponding instance of F or (b) have a corresponding instance of a feature F’ that is a determinate of the same determinable as F.

RR5 does save the ventriloquism intuition, but it runs afoul of one of our other constraints: (b) introduces an opportunity for features to become associated with a modality despite the mechanism of that modality not being involved in the production of that feature. This is a straightforward violation of the fourth constraint. To illustrate the problem, it will be useful to appeal to a hypothetical case that allows us to abstract away from the difficulties posed by our own, complicated perceptual systems.

Imagine a race of aliens very much like us except that they have evolved such that they ordinarily experience a form of lexical-gustatory synesthesia. When they hear names in their home language, they have concurrent taste phenomenology of some sort of sugary soft drink, a different one for each name. (They evolved to have this synesthesia because it is, for them, conducive to social bonding.) For the purposes of this example, we will include whatever mechanisms are necessary to produce the synesthetic taste phenomenology in this alien race’s auditory mechanism. (This will still count as an acceptable demarcation of audition because stimulation of the primary receptors of their gustatory mechanism is not necessary for the synesthetic taste phenomenology.) Furthermore, let us stipulate that there are no other relevant interactions between these and the other senses complicating the case. In sum: these aliens have auditory mechanisms that produce taste phenomenology in a narrow range (sugary soft drink tastes) when they hear names, and there are no other relevant interactions between their senses in producing their auditory and gustatory experiences. Otherwise they are just like us. Now suppose that one of these aliens, Lexi, visits earth, settles in California, learns English and assimilates with human society there.
Furthermore, ‘Joan’ turns out to be a name in Lexi’s home language, one resulting in a synesthetic experience of the taste of root beer. No other English words or names are names in her language. One day Lexi is drinking coffee when her friend Gus says, “June is always gloomy in Los Angeles”. (In terms of RR5, this total scene, including the coffee drinking and Gus’s utterance, is the stimulus R.) When Lexi hears “June” she has no synesthetic taste phenomenology – it is not one of the names that triggers her synesthesia. (Footnote 39: Also note that, for these aliens, taste – at least in the range of tastes of sugary soft drinks – is not a proper sensible. Nothing in the example depends on this, nor does this case rely on the feature association intuitions of constraint (2).) But she does experience the taste of coffee (RR5’s F, in this case) because she is, in fact, drinking coffee. However, the sound of “June” is very similar to “Joan”. (Footnote 40: If you don’t find ‘Joan’ and ‘June’ sufficiently similar, substitute the French (and alien) name ‘Jeune’ for ‘Joan’.) So there is a stimulus (R’) – namely, Gus uttering “Joan is always gloomy in Los Angeles” while everything else remains as it is in R – that is substantially similar to the original stimulus and that would result, under unimodal stimulation of Lexi’s auditory mechanism, in a corresponding feature (F’) – the taste of root beer – that is a determinate of the same determinable (phenomenal taste) as the coffee taste she is actually experiencing. Therefore, all the conditions on the right side of RR5 are satisfied, delivering the verdict that the coffee taste is associated with audition. Yet Lexi’s auditory mechanism has nothing to do with her tasting coffee. Her synesthesia never produces the taste of coffee. So this verdict violates constraint (4). (Footnote 41: This conclusion follows even if there are no creatures that work according to the stipulations we have introduced to describe Lexi. The point is simply that the rule is not adequately tied to the mechanisms that produce the features to be associated.)

Of course, one could respond that we have latched on to the wrong determinable. The relevant determinable is just the range of tastes that Lexi’s auditory system, when properly stimulated, can produce in her – namely, sweet soft drink tastes. That would rule out coffee. But this just points to a further worry; namely, that we can carve determinables more or less finely, with different ways of doing so giving different associations according to RR5. We need a way of picking out the relevant determinable.

The response in the preceding paragraph – that the objection relies on using the wrong determinable – is based in the intuition that the relevant determinable is fixed by the range of the mechanism in question (e.g., the range of taste phenomenology that Lexi can experience from stimulation of her auditory mechanism). Adding this restriction to RR5 would certainly help resolve the violation of the fourth constraint, but hewing too closely to the capacities of the mechanism in question is what caused our problem with the ventriloquism effect (or any visually assisted auditory localization) with respect to RR3 and RR4: The auditory mechanism lacks the spatial resolution of (foveal) vision-supported auditory localization, so (foveal) vision-supported localization phenomenology will fall outside the range of the relevant determinable. That means that we cannot associate the localization of the sound with audition, contrary to our intuition.
If we don’t restrict ourselves to the range of the determinable deliverable by the mechanism in question, we risk throwing ourselves back on the problem of selecting the relevant determinable (as we just saw). There is a middle way: restricting the determinable to the range of features that the subject’s sensory mechanisms can, individually, supply. (Footnote 42: To drop the restriction to the deliverances of the individual mechanisms would be to make novel features impossible, in violation of the third condition on our association rule.) However, this won’t fix RR5’s rulings in Lexi’s case. The taste of coffee is deliverable by one of Lexi’s sensory mechanisms; namely, the gustatory mechanism. We are going to need a more nuanced approach. In particular, we need a notion of feature inheritance – conditions under which a modality can take on a feature of another modality. With the notion of inheritance properly spelled out, we will be able to fix on the relevant determinable (according to our middle way) and accommodate the ventriloquism effect and Lexi intuitions. The idea is that, in the ventriloquism case, sound localization is inherited but in Lexi’s case coffee taste is not. So inheritance constrains the appeal to shared determinables in clause (b), bringing it in line with constraint (4).

In a proper instance of inheritance, the original stimulus (R) would not result in a unimodal experience of the inherited determinate (F) in the inheriting modality. We can narrow this further, given that clause (a) will capture any instances of the determinate that would result from some substantially similar stimulation: Inheritance involves determinates of a unimodally accessible determinable that wouldn’t result from a substantially similar stimulus (R’). Furthermore, inherited features are inherited from some other mechanism that can produce them in a unimodal experience. For example, when the visual localization of a trumpet is bound to an auditory object (the sound of the trumpet) in (foveal) vision-assisted sound localization, the localization – which is too precise to have been supplied by a unimodal auditory experience – is supplied by vision and inherited by audition. By contrast, Lexi’s coffee taste phenomenology is not inherited by audition – it is just a straightforward result of her taste mechanism. It is not necessary that the inherited feature be a feature of an object in the parent modality in the original experience (i.e., that (a) is satisfied for the parent modality), though this is typically so. Restricting inheritance in this way would present problems. First, it can’t accommodate the intuition that hemifield neglect patients experiencing the ventriloquism effect on their neglected side inherit the localization of the sound from vision (because there is no corresponding visual experience). (Hemifield neglect patients are susceptible to both the McGurk and the ventriloquism effects even when the visual stimulus is presented to their contralesional side and so, presumably, without any accompanying visual phenomenology (Bertelson et al. 2000; Leo et al. 2008; Soroker et al. 1995a, 1995b).) Also, some recalibration effects persist even after the contributor modality stops receiving stimulation. For the sake of thoroughness, we should acknowledge the possibility of chains of inheritance – e.g., one modality inheriting a feature from a mechanism that inherited the feature from yet a third mechanism.
At some point chains of inheritance will need to bottom out in a mechanism that has non-derivatively supplied the feature. How the coordination of perceptual objects across modalities actually goes will be addressed in more detail 44 below. 39 The appeal to shared determinables also constrains inheritance. In particular, it captures the idea that we don’t inherit features arbitrarily – e.g., we don’t have experiences of green sounds and circular smells. And the restriction to a corresponding instance of a shared determinable captures the idea that instances of inheritance involve the inherited feature (F) overwriting a feature (F’) of the given perceptual object that would have been produced under substantially similar unimodal stimulation. This overwriting follows from the fact that the feature inherited (F) is a determinate of the same determinable as is the unimodally available feature (F’). Determinates of the same determinable are incompatible – an object cannot be both entirely red and entirely blue. Adding this notion of inheritance to RR5, we get: (RR6) That instance of F is associated with the modality individuated by Mm if, and only if, there is some stimulus R’ that is substantially similar to R and, were just Mm stimulated by R’, the resulting experience E’ would either: (a) have a corresponding instance of F or (b) have a corresponding instance of a feature, F’, that is a determinate of the same determinable as F, provided that (i) Mm inherits the instance of F from some other mechanism Mk in E. RR6 seems to get the verdict right in the ventriloquism effect – we will say exactly the same thing about it as we did about the localization of the sound of the trumpet, above. And it certainly gets the right verdict with respect to Lexi’s case. The taste of coffee is not inherited by audition – Lexi doesn’t experience the coffee taste because her auditory mechanism inherited the feature 40 from her gustatory mechanisms. She tastes the coffee as a result of the operation of her gustatory mechanism, itself. Notice, though, that we can say this in Lexi’s case because we stipulated that there were no complicating interactions between the senses. Real life cases will require empirical facts to determine when a mechanism does and doesn’t inherit a feature from another. And this is going to lead to a problem for RR6. Consider the justification on offer for treating the ventriloquism effect localization phenomenology as auditory: A feature (localization) is bound to an auditory object (a sound), where that feature was inherited from vision. Without the empirical details regarding the interactions of the visual and auditory mechanisms, we have to fall back on our intuitions about which perceptual objects belong to which sensory modalities. But we (intuitively) identify perceptual objects with a modality because we associate their perceptible features with that modality. This has us going in a tight circle: A feature instance is associated with audition because it is instantiated in an auditory object, and a perceptual object is auditory because the features it instantiates are associated with audition. We will need something other than an 45 appeal to the intuitive association of perceptual objects with a modality to get us a non-circular justification for associating inherited features with the inheriting modality. It can’t just be that binding an initially visually supplied feature to an intuitively auditory object can, by itself, justify calling that feature auditory. 
(Footnote 45: This circularity worry is noted by Fulkerson with respect to his binding-based feature association condition (2014, p. 38). He attempts to block the worry by appeal, in part, to proper sensibles: Given an antecedent association of proper sensibles to modalities, we can type perceptual objects by the proper sensibles they contain and then group any other features that are bound to that object according to the modality of the object. This might be sufficient for his purpose, which is to show that haptic touch is unisensory. However, it won’t help our case: We are seeking a thoroughgoing association rule that will tell us, among other things, which sensory modalities the proper sensibles belong to. But that is the very thing Fulkerson assumes.)

After all, we haven’t ruled out cross-modal objects – i.e., perceptual objects with features intuitively associated with different sensory modalities. We will need to dig beneath the phenomenal surface to understand why there is feature inheritance in the hope that this will provide a justification for associating the phenomenal feature with the inheriting modality. That is, we need to explain RR6’s clause (i) in terms of the mechanisms responsible for feature inheritance. Barring this, we still have not fully satisfied constraint (4).

So what happens in one of these instances of inheritance? Conflicting determinates of the same determinable are pre-experientially attributed to (proto-)perceptual objects of different modalities. Perceptual objects, here, are bundles of bound features by which we encounter objective particulars. We can, for instance, be visually aware of some features of a thing and tactilely aware of others. The bundle of bound tactile features will be the tactile object, that of the visual features the visual object. (This doesn’t require an ontologically robust thing that is the perceptual object.) (Footnote 46: As noted above, this sort of intuitive classification of perceptual objects leads to a circularity worry, if it is to be used in associating perceptual features with sensory modalities. A better way of typing perceptual objects by modality is to do so in terms of the mechanisms that produce them (by binding features into those perceptual objects). Perceptual objects typed in this way can factor in a rule for associating features with modalities (see below).) If the resolution of this conflict yields a feature that could not have been supplied by one of the modalities being taken on as a feature of one of that modality’s objects, then we have a proper instance of inheritance. (Otherwise, we have mere recalibration.) (Footnote 47: Notice that the basic mechanism for mere recalibration and inheritance is the same, which supports treating the McGurk and ventriloquism effects as broadly similar. Inheritance is a special case of recalibration.)

But why should there be a conflict, given the fact that these are features of different perceptual objects? Because perceptual objects can be of the same objective particular (as in the look and feel of the cup in my hand) or of objects that are intimately related (as are sounds and their sources), such that having a determinate of a shared determinable in the one entails the same determinate in the other. We get conflicts when there is a clash of determinates in perceptual objects so related. To recognize such instances there has to be some pre-experiential mechanism linking objects and features across modalities and resolving apparent conflicts.
This suggests that sub-experiential information sharing across modalities underwrites inheritance. If informational content is contributed to some mechanism and that mechanism uses that content in the determination of the phenomenal features of its perceptual objects, then we will have a justification for associating the phenomenal feature with the inheriting modality. This justification avoids the circularity worry while explaining the intuitive pull of the idea that, e.g., a feature of an auditory object ought to be associated with audition. It also gives us a better grip on why features aren’t inherited arbitrarily: it is not adaptive to inherit features if there is no conflicting information to reconcile (e.g., no acceptable demarcation of the auditory mechanisms carries information regarding wavelength reflectance, so no conflict with vision can arise on this front). Notice, too, that the justification doesn’t require that there be an instance of the relevant 49 feature associated with the parent modality in the experience, only that the mechanism of the parent modality carry the sort of information that can be used to determine instances of the feature in the perceptual processing eventuating in the experience. This solves the problems from the ventriloquism effect in hemifield neglect patients and the persistence of recalibration once the parent mechanism is no longer receiving stimulation. The sub-experiential recognition of this sort of relation between (proto-)perceptual objects is called the ‘unity 48 assumption’ in the empirical literature. See Chen and Spence (2017) for a review. Of course, it is possible that there are non-adaptive inheritance-like phenomena that we might want to capture 49 under the label “inheritance”. I have no strong leanings as to whether the resulting features should be associated with the “inheritor” mechanism. If so, then we can drop the shared determinable restriction in clause (b). Whether or not the feature satisfies the shared determinable restriction simply sorts the normal (adaptive) instances of inheritance from the odd ones. 43 The key point, though, is that it is not phenomenology that is inherited, in the first instance, but information. Inheritance occurs sub-experientially, before any phenomenology arises. This 50 leaves room for the modality to do some work in determining the phenomenal features, thereby justifying the claim that these features are properly associated with the modality. But even if 51 informational content, alone, determines the feature instance, we can justify associating the feature with the inheritor modality due to its uptake of the information in the formation of its own perceptual objects. Incorporating this into our association rule, we get: (RR7) That instance of F is associated with the modality individuated by Mm if, and only if, there is some stimulus R’ that is substantially similar to R and, were just Mm stimulated by R’, the resulting experience E’ would either: (a) have a corresponding instance of F or (b) have a corresponding instance of a feature F’ that is a determinate of the same determinable as F, provided that (i) Mm inherits information from some other mechanism Mk in the course of sub-experiential sensory processing (resulting in E) and (ii) Mm uses that information in determining the instance of F (in E), as a feature of one of Mm’s perceptual objects. 
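Because RR7 mixes a counterfactual test with conditions on the actual course of processing, it may help to display its structure explicitly. The regimentation below is only a sketch in my own notation; in particular, it makes explicit one natural scoping choice – the provisos (i) and (ii) concern the actual experience E and its sub-experiential processing, so they sit outside the counterfactual. Write R′ ≈ R for ‘R′ is substantially similar to R’, Stim(Mm, R′) for ‘only Mm is stimulated, and by R′’, ‘>’ for the counterfactual conditional, F(E′) for ‘E′ has a corresponding instance of F’, Det(F, F′) for ‘F and F′ are determinates of the same determinable’, Inh(Mm, Mk, E) for clause (i), and Use(Mm, F, E) for clause (ii).

\[
\mathrm{Assoc}(F, M_m) \;\leftrightarrow\; \exists R'\, \Big[\, R' \approx R \;\wedge\; \Big( \big[\mathrm{Stim}(M_m, R') > F(E')\big] \;\vee\; \big( \big[\mathrm{Stim}(M_m, R') > \exists F'\,\big(F'(E') \wedge \mathrm{Det}(F, F')\big)\big] \;\wedge\; \exists M_k\, \big[\mathrm{Inh}(M_m, M_k, E) \wedge \mathrm{Use}(M_m, F, E)\big] \big) \Big) \Big]
\]

The left disjunct corresponds to clause (a) and the right disjunct to clause (b) together with its proviso.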
Mm ‘inherits information’ when it receives informational input from Mk that (partially) determines an instance of 50 F, where no unimodal stimulation of Mm would carry that information and where Mk does carry that information in the sub-experiential processing resulting in E, either non-derivatively or through a chain of inheritance bottoming out in a mechanism that carries the information non-derivatively. On this view the content might be associated with one (or more) modality, the resulting phenomenology with 51 another. For example, we might continue to associate the localization content inherited, in the ventriloquism effect, with vision while associating the localization phenomenology with audition. This can accommodate a view like that of Bayne (2014) on which the contents of perceptual experiences involving sensory integration are multisensory while maintaining our guiding intuitions with respect to phenomenal features. 44 RR7 retains the advantages of RR6 while solving the circularity problem – it bases inheritance in the operation of the sensory mechanisms, forestalling the need to appeal to intuitions about the modality of perceptual objects/features in its application. However, in accepting this association rule, we lose something. Our previous versions of the rule make the sorting of phenomenal features empirically tractable – at least in principle – without requiring us to take on any theoretical commitments concerning the basis of the phenomenal features of perceptual experiences. We simply demarcate our mechanisms, isolate them, see what phenomenal features result from their stimulation, and compare with the phenomenal features of the original experience. RR7, on the other hand, requires us to take a stand on what information a given mechanism carries at various points in the course of processing a given stimulus and to endorse the supervenience of phenomenology on informational content. And once we have settled on a story that allows us to associate contents with mechanisms – and accepting the supervenience thesis – we will have to settle the further question concerning the point in the flow of information at which phenomenology emerges. None of this is trivial, but until these issues are sorted RR7 will not rule on a large number of features. Furthermore, there is a problem lurking in RR7. To illustrate: Suppose Lexi takes a sip of root beer just as Gus says, “Joan is always gloomy in Los Angeles”. RR7 will associate the root beer taste phenomenology with both taste and audition by clause (a) because unimodal sensory stimulation in either modality would have resulted in that phenomenology. But imagine two ways Lexi’s synesthesia might operate: (1) When Lexi’s synesthesia is triggered, 45 processing of gustatory objects is stopped – the only taste-like phenomenology she has in those moments is due to her synesthesia (the mechanisms of which are part of her auditory mechanism). (2) When Lexi’s gustatory system is stimulated, her synesthesia does not take effect. It seems clear that in (1) the root beer phenomenology should be attributed to audition and not to (the sense of) taste and in (2) it should be attributed to taste rather than audition. But, again, RR7 associates it with both regardless of how Lexi’ s synesthesia operates. And that means that clause (a) fails to satisfy constraint (4) just as clause (b) did in previous iterations of the rule. We have accepted that informational content plays a role in determining phenomenology to circumvent the problem for (b). 
Now we must do the same for (a): (RR8) That instance of F is associated with the modality individuated by Mm if, and only if, there is some stimulus R’ that is substantially similar to R and, were just Mm stimulated by R’, the resulting experience E’ would either: (a) have a corresponding instance of F, where (i) the instance of F in E was (partially) determined by information carried by Mm or (b) have a corresponding instance of a feature F’ that is a determinate of the same determinable as F, provided that (i) Mm inherits information from some other mechanism Mk in the course of sub-experiential sensory processing (resulting in E) and (ii) Mm uses that information in determining the instance of F (in E), as a feature of one of Mm’s perceptual objects. 46 It will be helpful, in assessing RR8, to take a step back and remember why we are looking for an association rule in the first place. Primarily, we wanted an association rule to adjudicate the novel feature debate. We were interested in novel features because we were worried that theorizing about perceptual experience in a sense-by-sense manner would lead us to miss something crucial; namely, those features not explainable in terms of the individual (Aristotelian) senses. Now notice that, applying RR8 – and so using it to adjudicate the novel feature debate – requires a complete accounting of the flow of information through the sensory mechanisms and an account of when that information manifests consciously. But, if we have this, we haven’t left any features unexplained. It’s just that the explanation doesn’t appeal to the Aristotelian senses, it appeals to sensory mechanisms. Notice that this holds for representational content as well as phenomenology: We could press RR8 into service as a content ascribing rule because the link to information processing by mechanisms allows us to circumvent the common sensibles problem while also allowing information to be inherited from other mechanisms. All we need to add is an account of how information determines representational content (see, e.g., Neander 2017; Shea 2018; and, especially, my elaboration of them in chapter 2). But again, if we have already accounted for the complete flow of information in perceptual mechanisms, on the way to determine the representational contents of perceptual processing, there is no further explanatory role for the Aristotelian senses to play. One might think that there is at least one theoretically interesting class of features in the vicinity of novel features that RR8 will reveal; namely, emergent features – i.e., informational 47 content derived from multiple mechanisms, but not found in any of those mechanisms themselves, and phenomenal features supervening on this emergent information. Perhaps emergent features can be leveraged into a defense of novel features. Notice, though, that emergent information and features arise within as well as across the (acceptable demarcations of the) mechanisms of our pre-theoretical conceptions of the Aristotelian senses. For example, there is a chemoreceptor on the tongue that, when stimulated in isolation, contributes no phenomenal features. However, when it is stimulated along with gustatory receptors that do contribute phenomenal taste features, emergent features (called ‘kokumi’) result. These features include a strengthening of the basic tastes and features that are not entirely characterizable in terms of those basic tastes – e.g., ‘heartiness’. 
Kokumi receptors are part of the primary receptor of taste (i.e., they are chemoreceptors found on the tongue). (Footnote 52: See Ohsu et al. (2010), Ueda et al. (1990); see also Brennan, Davies, Schepelmann, and Riccardi (2014), Maruyama, Yasuda, Kuroda, and Eto (2012), Yamamoto et al. (2020), Yang, Bai, Zeng, and Cui (2019). A plausible candidate for the emergent information on which the kokumi phenomenology supervenes is information pertaining to expected caloric content (Tang, Tan, Teo, and Forde 2020).) And yet the result of a kokumi experience includes features that are more than the sum of the contributions of the sub-mechanisms giving rise to them – there is more to kokumi experience than one gets from the stimulation of any of the chemoreceptors of taste taken in isolation.

Given this, novel features will be a subset of emergent features. They are those emergent features that emerge from the interaction of sensory mechanisms we associate with distinct Aristotelian senses. If emergent features are going to form the basis of a defense of novel features, the classification of some emergent features as novel will need to add explanatory power to our theory of perception. But the further classification of emergent features as novel will not add any explanatory power to our theory of perception: Once we are in a position to apply RR8, with its attendant account of emergent features, we have accounted for all the features arising from the sensory mechanisms (however demarcated from one another). And the senses, themselves, are to be individuated in terms of their mechanisms. So, the further classification of emergent features as novel is nothing more than an attempt to carve the results of RR8 to match pre-theoretical intuitions about the senses. (While intuition matching can be used to evaluate theoretical proposals, there is no theoretical value to intuition matching qua intuition matching.)

To illustrate, suppose that it were discovered that the putative novel feature involved in the experience of mintiness is a ‘harmonizing’ of the features provided by taste, smell, and trigeminal stimulation. And suppose, in addition, that, like kokumi, this feature only emerges when a further chemoreceptor is activated – one that does not produce any phenomenal features when stimulated in isolation. Furthermore, these (hypothetical) chemoreceptors are found midway between the gustatory receptors and the olfactory receptors – somewhere between the soft palate and the back of the nasal cavity – so that it is not entirely clear whether or not they belong to taste, smell, or neither. The flavor feature described here is certainly emergent. Whether or not it is novel depends on whether we attribute the hypothetical receptor to one of the Aristotelian senses. If, for example, we attribute the receptor to taste, then the feature will not be novel – it will be analogous to kokumi: the ‘harmoniousness’ will be associated with taste, though it ‘harmonizes’ features from multiple senses. If we don’t attribute it to any Aristotelian sense, it will be novel. But choosing between these options doesn’t have any further repercussions for our theorizing about mintiness. In particular, the classification of mintiness as novel, should we choose that option, does not help us explain anything further about the experience of mintiness. So the appeal to emergent features cannot be leveraged into a defense of novel features. But that was our last hope for finding a theoretical use for novel features.
In short, RR8 renders the debate we were hoping to illuminate irrelevant. And so it fails to meet our third constraint, that the association rule make the novel feature debate substantive.

One might think this means we should look for a further tweak to the association rule that will fix these problems or for another way to formulate an association rule, other than our counterfactual strategy. (Footnote 53: Regarding the counterfactual strategy, the only room left to maneuver concerns what it is to count as a result of the stimulation of a mechanism, but we’ve covered all the obvious ground there. Any further proposal will be both ad hoc and unlikely to help.) I am inclined to draw a different lesson: Multisensory experience (as construed in the novel feature debate) is a red herring – or, rather, a ladder to be kicked away. It was a useful step in philosophy of perception’s recent march from a single-minded focus on vision through a broadened focus inclusive of other Aristotelian senses to the eventual recognition that these senses (intuitively construed) won’t be useful in theorizing about perceptual experience. But having arrived at this point, we have no further use for novel features or multisensory accounts of perception.

Furthermore, we also wanted an association rule to determine the preferred individuation of the Aristotelian senses – at least for the novel feature debate. We just saw that none of our association rules met all four constraints. If this result for the novel feature debate generalizes, then there will be no good rule for determining the preferred demarcation of the mechanisms of the Aristotelian senses for the purposes of theorizing about the features of perceptual experiences.

6. The problem generalizes

In fact, the failure to find an adequate association rule for the novel feature debate does generalize to any debate that takes the pre-theoretical notions of the Aristotelian senses as a starting point for theorizing about the contents and phenomenal character of perceptual experience. Any such debate needs an association rule that meets our four constraints (modifying the third constraint to apply to whichever debate is under consideration). But none of our association rules satisfied all four criteria. Perhaps, given that RR8 only ran afoul of the third constraint (to render the novel feature debate substantive), it will be useful for one of these other debates. If RR8 is going to be an acceptable association rule for one of these debates, then it will have to render that debate substantive. But RR8 clearly won’t render any of these debates substantive because – as we’ve seen – all the explanatory work regarding the contents and phenomenal character of perception is already done by the time we are able to apply RR8. Take the thick or thin contents debate, for example. The debate is over whether kind properties, such as being a pine tree, are ever contents of a perceptual (usually visual) experience. Insofar as the arguments in favor of thick contents rely on phenomenal contrast cases implicating a specific modality (e.g., Siegel 2006), we will need an association rule that tells us, for each kind property in the contents of a conscious experience, whether or not its correlated phenomenal character is associated with a sensory modality (such as vision).
RR8 will be able 54 to do this, but once we are in a position to apply RR8 we will already know all there is to know about which demarcations of, e.g., the visual mechanism carry which contents and result in which phenomenal features. Those who favor thick contents will prefer a demarcation of the visual mechanism that associates the phenomenal features correlated with certain kind properties with vision. Those who favor a thin account will prefer a demarcation that doesn’t. As with the novel feature debate, picking one demarcation over the other will have no further implications for any of our theorizing about perceptual content and phenomenology, though. All that theorizing will have already been done. A final comment: I am not saying that there could be no theoretical use for our pre- theoretical notions of the senses, just that there will be none with respect to explaining perceptual content and phenomenology. Nor am I saying that we should abandon any differentiation of the sensory mechanisms and focus just on a single modality, perceiving, in our explanations of content and phenomenology (as Speaks (2015) suggests). My approach is more akin to the 55 sensory pluralism of Macpherson (2011b, 2011c) and Fulkerson (2014), minus the extra step of declaring some demarcations of the sensory mechanisms to be versions of the Aristotelian senses. I omit the step because it adds nothing to the explanatory power of our perceptual theories. One could, of course, pursue the thick/thin debate without adverting to specific sensory modalities. But this is just 54 to concede the point that our pre-theoretical notions of the senses are not an adequate basis for theorizing about such matters. Indeed, the results canvased here suggest that the relevant mechanisms won’t divide into perceptual and 55 conceptual any more neatly than they will divide into perceptual modalities. 52 7. Conclusion I have argued that any association rule that will be acceptable for the purposes of informing the novel feature debate must meet four requirements: (1) It must answer ‘yes’ or ‘no’ to the question ‘Is this feature associated with that modality?’ for every feature and modality. (2) It must conform to our guiding intuitions regarding the correct associations of proper sensibles and the results of cross-modal recalibration effects. (3) It must render the debate substantive. (4) It must appeal to the stimulation states of sensory mechanisms. Given the general form of such a rule – roughly, a feature of a perceptual experience belongs to a given modality if, and only if, it could/would result from the stimulation of that modality in a unimodal experience – alterations can be made along the dimensions of (a) the demarcation of the sensory mechanisms by which we individuate the sensory modalities, (b) the stimulation received by that mechanism, and (c) what counts as a result of the stimulation of that mechanism. It was found that no variation on these parameters can satisfy all the constraints for an adequate association rule. Without an association rule we have no way to adjudicate the novel feature debate. Furthermore, our attempt to find an adequate association rule revealed that novel features are not an interesting theoretical notion. The failure to find an adequate association rule generalizes, undermining any attempt to theorize about the contents and phenomenal character of perception in terms of the Aristotelian senses. 
It is time to slough off the remaining baggage of the sense-by-sense approach to theorizing about the features of perceptual experiences in favor of an approach that proceeds, not 53 by appeal to under-defined notions of the sensory modalities, but in terms of the precisely demarcated sensory mechanisms. Doing so will be illuminating for a number of philosophical debates about perception. To cite a few that have received passing mention here: (1) Whether or not we need a Kantian intuition of space to have spatial perception, (2) the thick/thin contents of perceptual experience, and (3) a naturalistic account of the contents and phenomenology of perceptual experiences. The discussion here has already suggested how we might proceed with this project. The first step is to account for the informational contents carried in the perceptual mechanisms and how that informational content relates to the representational contents of perception. A good bit of work has already been done in this area, though there is certainly more to accomplish (e.g., Neander 2017; Shea 2018). The second step is to explain the relationship between phenomenology and informational/representational content. If phenomenology is determined by content or by content + mechanism, then we will be able to account for all the features of perceptual experiences once we have completed the first step. Even if it isn't, there is certainly a close connection between the content and phenomenology of a perceptual experience, so we will have gone some way towards explaining phenomenology once we have sorted the issues concerning mechanisms and contents. Furthermore, by focusing on clearly demarcated mechanisms, we make the question of the phenomenology of an experience resulting from the operation of that mechanism empirically tractable – at least in principle. We only need the means of isolating that mechanism and testing its deliverances (in light of our theories of informational content, representation, and phenomenology). This is not something that our pre-theoretical 54 notions of the senses afford, given the uncertainty about how precisely to demarcate their mechanisms. 55 Chapter 2 Information, Function, and Representation in Perceptual Processing ABSTRACT: In chapter 1 I argued that all attempts to formulate a rule for associating features of perceptual experiences with the Aristotelian senses are inadequate; they either ignore subpersonal interaction between the putative senses or else they contradict intuitions that are constitutive of our very ideas of those senses. The upshot is that we should stop theorizing about the representational contents and phenomenal character of perceptual experience in terms of the contributions of the Aristotelian senses – an approach that continues to dominate philosophy of perception, even in the recent turn toward multisensory experience. In this chapter, I offer an alternative approach to analyzing perceptual content, which focuses on the contributions of the neural mechanisms of perceptual processing (without regard for how we would map these mechanisms onto Aristotelian senses). The account builds on highly promising naturalistic accounts of mental representation recently developed by Neander (2017) and Shea (2018). These accounts combine the merits of earlier purely informational and purely teleosemantic views. Neander takes a bottom-up, input-driven approach. Shea takes a top-down, output-driven approach. I argue that – despite their merits – neither approach succeeds. 
Any such hybrid informational/teleosemantic theory must 56 account for the constraints the functions of low-level perceptual mechanisms impose on the information that factors in the production of high-level behaviors and the constraints high-level behaviors impose on the functional analysis of low-level perceptual mechanisms. I then show how to do this. Contents 1. The bottom-up approach: Neander’s informational teleosemantics 2. The top-down approach: Shea’s varitel semantics 3. The bottom-up constraint: internally accessible information 3.1.sensory receptor stimulation 3.2.background information 4. The top-down constraint: derivatively adaptive low-level functions 5. Testing the account 5.1.toy examples 5.2.content indeterminacy challenges Any externalist physicalist view of the mind needs a naturalized account of mental representation. Attempts to provide such an account fall primarily into two camps: informational views and teleosemantic views. Informational views focus on the features of the environment that cause or are correlated with mental states (Dretske 1981; Eliasmith 2000, 2005a, 2005b, 2013; Stampe 1977; and Usher 2001). Teleosemantic views focus on selected functions of the organism or system under consideration – often, but not always, adaptive behaviors by which the organism interacts with its environment (Millikan 1984, 1989, 2004; Papineau 1987, 1993, 57 1998). Both camps are plagued by content indeterminacy challenges, despite repeated attempts to solve these problems. 56 This has led to the development of hybrid views that use both information and function to determine the contents of a mental state. Neander (2017) and Shea (2018), in particular, offer highly promising versions of the hybrid view. Neander relies on a causal account of information and functions of low-level perceptual mechanisms to respond to environmental stimuli. Shea relies on correlational information and high-level functions to produce adaptive behaviors. That 57 is, Neander determines contents in a bottom-up (input-driven) fashion, Shea in a top-down (output-driven) fashion. Both Neander and Shea argue that their views answer the content indeterminacy challenges faced by exclusively informational and teleosemantic views, but they are both mistaken. In this chapter I suggest an alternative hybrid view that outperforms those of Neander and Shea. In §1, I describe Neander’s (2017) informational teleosemantics and show why it – and bottom-up approaches, generally – will not work. In §2, I describe Shea’s (2018) varitel semantics and show why it – and top-down approaches, generally – miss the mark. Any hybrid theory must account for the constraints the functions of low-level perceptual mechanisms impose on the information that factors in the production of high-level behaviors and the constraints high- The ‘indeterminacy’ applies to states for which our theory of content determination delivers an unwanted 56 multiplicity of contents – e.g., the theory gives the content P ∧ Q when there is reason to think that the content is just P. ‘Content indeterminacy’ is also sometimes used where there is a lack of wanted specificity – e.g., the theory gives the content P ∨ Q when there is reason to think the content is P. I will use ‘content indeterminacy’ in the former sense. Where the latter issue arises, I will speak of a lack of specificity. For content indeterminacy challenges faced by purely informational views see (Dretske 1986; Fodor 1990; Artiga and Sebastian 2018). 
For indeterminacy challenges for purely teleosemantic theories see (Dretske 1986; Fodor 1990, 1996; Griffiths and Goode 1995; Neander 1995). They also differ with respect to their characterization of information – Neander uses a causal account, Shea a 57 correlational one, as we see below. However, this difference will not factor as prominently in the discussion as the functional difference will. 58 level behaviors impose on the functional analysis of low-level perceptual mechanisms. (Neander and Shea restrict their accounts to perceptual contents. I do the same.) In §3 and §4, I offer an account that can satisfy both constraints. In §5 I show how my account can meet the content indeterminacy challenges. 1. The Bottom-up Approach: Neander’s Informational Teleosemantics Neander’s informational teleosemantics is the most well worked out version of the bottom-up hybrid approach. Neander’s account relies on ‘response functions’. A mechanism has a response function when it was selected for being caused to go into a particular state by a particular stimulation type (2017: 127). Given that this is a causal response and that on Neander’s view a state carries information about its causes (2017: 142), we get the result that a response function is a function to carry information about that to which it is a response (the perceptual input). This is why Neander’s view counts as bottom-up. Below is my gloss on Neander’s basic condition for content determination: Causal-informational Teleosemantics (CT). A state, R, of a perceptual mechanism M, has the content there’ s C iff M was selected for being caused to be in an R-type state by C-type events (in virtue of their C-ness). 58 For the original see (Neander 2017: 151). I replace Neander’s reference to ‘sensory-perceptual system’ with 58 ‘perceptual mechanism’ as it is more in line with the low-level entities she seems to have in mind (e.g., T5-2 cells in a toad’s optic tectum), which are embedded in what we tend to think of as (higher-level) perceptual systems. I also have made the causal nature of the response function transparent. Neander attributes both an existential and indexical reading to ‘there’. 59 Of course, specifying the relevant cause is no mean feat. Lots of things are causally implicated in – to use a well-worn example – a toad’s detection of an insect (or larvae or worm), including an oblong moving shadow on the retina; a (distal) small, dark, moving object (SDM); an insect; toad food; etc. Neander is aware of the difficulty. She partially addresses it 59 by employing a property-sensitive notion of causation. This gives the result she favors with respect to one aspect of content indeterminacy: whether the toad’s visual system represents a configuration of visible features – an SDM – or something more conceptually-laden such as a fly or food. According to CT it represents an SDM because the relevant mechanism is causally sensitive to SDMs, not food. The toad’s visual system will respond to SDMs whether or not they are food but will only respond to food if it is in the form of an SDM. Likewise for flies. 60 We are still left with a problem (which Neander recognizes): Evolution is too blunt an instrument to differentiate the response to the SDM from the response to the patterns of light/ shadow by which the response to the SDM is made. Assuming that only one of these contents is represented, CT underdetermines the content of the putative SDM-detecting mechanism. 
This is an instance of the problem of distal content – i.e., the problem of identifying a principle that picks the correct stopping point in the causal chain leading to a perceptual state to serve as the content of that perceptual state. 61 To address this problem, Neander augments CT with the distality principle: ‘Small’, ’dark’, and ‘moving’ are placeholders for some more or less specific range of size, color, and movement 59 type respectively. It remains an open question if causation is property sensitive in the way that Neander needs. I will set that worry 60 aside here. The distality problem is also known as the stopping problem or the horizontal indeterminacy problem. 61 60 Distality principle. “[…] R refers to C rather than the more proximal Q if the [mechanism (M)] responsible for producing Rs was adapted for responding to Qs (qua Qs) by producing Rs as a means of responding to Cs (qua Cs) by producing Rs, but it was not adapted for responding to Cs as a means of responding to Qs.” (2017: 222) The distality principle leverages an asymmetry to (supposedly) get the desired result: The SDM- detecting mechanism was adapted for responding to patterns of retinal stimulation as a means of responding to distal SDMs, but not vice versa. Notice, though, that Neander’s response functions are only derivatively adaptive. That is, they are only adaptive insofar as they contribute to the production of behaviors by which the organism successfully interacts with its environment. It is such behaviors that are, in the first instance, adaptive and, hence, selected for: Responding to the presence of an SDM is only adaptive because it leads to prey capture behavior, and capturing prey is beneficial to the toad. Without the beneficial behavior, SDM-detection wouldn’t be adaptive. Neander acknowledges the need for further adaptive effects following from the detection of some feature in her defense of response functions as etiological functions, but she does not specify how response functions must be linked to these further effects (2017: 129). Having 62 satisfied herself that they can be selected for, Neander treats response functions like non- derivatively adaptive functions – as though we will know what the mechanism was selected for if we know its etiology and how it behaves in the relevant circumstances. But this is not the case for derivatively adaptive functions. We can’t know what a given response function was selected See Millikan (1989, 2004), Papineau (1998), and Shea (2007) for the objection to which Neander is responding. 62 61 for responding to without understanding how that response is taken up by further processing eventuating in a non-derivatively adaptive effect (e.g., a behavior on the part of the organism). Therefore, the distality principle gives us no guidance on the content of a response function- bearing mechanism until the account of derivative adaptiveness is filled in. The question is: Can we fill in these details without losing input-driven functions, like response functions, as the primary functions relevant to perceptual content determination, as Neander’s view requires – i.e., without losing the bottom-up character of the view? The answer is: No. 63 Consider the case of two imaginary species of sea creature. One species, the day-drowser, is nocturnal. The other, the algae-eater, eats bioluminescent algae. Both evolved a mechanism that responds to blue light. 
In fact, the blue light-sensitive mechanisms of both species are indistinguishable and inherited from a common ancestor. But they are adaptive for different reasons: The blue light-sensitive mechanism modulates sleep patterns in day-drowsers by passing along information concerning the presence of blue light to a further mechanism that releases melatonin. The result is that day-drowsers get sleepy in the daytime, when there is a relative abundance of blue light in their environment. The blue light-sensitive mechanism in algae-eaters, on the other hand, helps them find and consume food (algae that glows blue) by There are further worries about the distality principle. For instance, it does not rule out ‘hidden features’ – features 63 that are causally upstream of the sensory stimulation, are not accessible to perceptual processing, but are nonetheless causally relevant to the sensory stimulation. The very same perceptual mechanism that responds to the presence of a clear, colorless, odorless liquid will respond to the presence of H2O, since being H2O is causally responsible for water presenting these perceptual appearances. Given the prevalence of water in our environment (and the lack of other clear, colorless, odorless liquids – e.g., heavy water, D2O), the one response could not have been selected for without selecting for the other. But being H2O is not represented in visual processing simply in virtue of detecting these perceivable features, according to Neander (2017: 119-122). Furthermore, the mechanism for detecting clear, colorless, odorless liquid was presumably selected because it was by detecting those features that the mechanism detects H2O. So the distality principle will give a result that Neander denies. (See Schulte (2018) for a related argument.) 62 passing along information concerning the presence of blue light to a further mechanism that triggers behavior culminating in eating algae. The result Neander wants is this: CT tells us that the day-drowser’s mechanism (M) was selected for detecting (going into an R state in response to) proximal blue light (Q) or for selecting daylight (C), and the distality principle tells us the presence of distal daylight (C) is the content (of M’s being in R) because the mechanism (M) was selected for responding to proximal blue light (Q) as a way of responding to daylight but not vice versa. And CT tells us that the algae-eaters’ mechanism (M) was selected for detecting (producing an R state in response to) proximal blue light (Q) or for detecting a distal glowing blue expanse (C), and the distality principle tells us that the presence of a distal glowing blue expanse (C) is the content (of M’s being in R) because the response to proximal blue light (Q) was selected as a means of responding to distal glowing blue expanses and not vice versa. But we can’t know what response was selected for just by looking at the response (R) of the perceptual mechanism (M) and knowing the evolutionary history of the mechanism. That won’t distinguish algae-eaters from day-drowsers because their blue light-sensitive mechanisms are identical and evolved into their present form in a common ancestor. We need to know how the response is used – what behavior the detection of blue light triggers – to know what it was selected for. It will not do to simply look at the organism-level behavior to determine how the response is used: Consider a third species of sea creature, the algae-drowser. 
Like day-drowsers, algae-drowsers are nocturnal and have a blue-light detecting mechanism that triggers the release of melatonin. This blue light-detecting mechanism is identical to those of the other species and shares the same ancestral origins. Like algae-eaters, though, algae-drowsers eat bioluminescent 63 algae. However, algae-drowsers detect bioluminescent algae by smell. It would be unfortunate for the algae-drowsers if they got sleepy whenever they were near their food, so they have evolved a mutation: when they detect the smell of the bioluminescent algae, the melatonin- releasing mechanism is suppressed (despite the blue-light sensitive mechanism responding normally to the presence of blue light). We would not be able to tell that this suppression is occurring simply by looking at the response of the blue light-sensitive mechanism in light of the behavior of the organism because the blue light-sensitive mechanism will respond to the glowing algae and the algae-drowser will eat the algae. It will look as though algae-drowsers are using the blue-light sensitive mechanism to detect the glow of bioluminescent algae, as the algae-eaters do. But that is wrong. Blue light detection does not produce algae-eating behavior in algae-drowsers. We need to understand how the blue light- and algae smell-sensitive mechanisms interact to bring about the algae-drowser’s (non-derivatively adaptive) behavior before we can determine what the blue light-sensitive mechanism was selected for. The general lesson is that understanding a mechanism’s derivatively adaptive function requires an understanding of the precise role that function plays in bringing about an adaptive behavior. And that requires knowing how it fits into the relevant perceptual processing stream (i.e., how the mechanism with that function interacts with other perceptual mechanisms). We can’t simply assume that the response plays the role we think it does in bringing about some high-level behavior. We need to work out the actual details of its contribution. But once we do this, we no longer have a bottom-up approach. We have an approach that considers input-driven response functions and functions to bring about (non-derivatively) adaptive behavioral outputs in 64 equal measure. Neander cannot both retain the bottom-up character of her view, with its resolute focus on response functions, and give an account of the adaptiveness of those response functions. The problem generalizes to all bottom-up approaches: Any perceptual processing prior to the production of behavior will only be derivatively adaptive. But the function of a derivatively adaptive mechanism (and, hence, the contents carried by states of that mechanism) is determined by the role it plays in the larger cognitive system that eventuates in the non-derivatively adaptive behavior. So we need a higher-level functional analysis in which to embed derivatively adaptive, low-level functions – an analysis not available on an exclusively bottom-up approach. Nevertheless, Neander is correct that low-level perceptual mechanisms’ sensitivity to features of the environment are crucial for an account of perceptual contents. This fact is either ignored or under-appreciated in top-down approaches – e.g., Shea’s (2018) otherwise admirable account of perceptual content. 2. The Top-Down Approach: Shea’s Varitel Semantics Shea’s varitel semantics offers a top-down version of the hybrid view. 
The functions that are 64 relevant for Shea’s account of perceptual content are outcome functions – functions to bring about certain behaviors or the consequences thereof (2018: 55). The information used in Shea’s account is a subset of correlational information. (A state F carries correlation information about a state G’s obtaining just in case the probability of G, given F, is not equal to the probability of G (i.e., P(G|F) ≠ P(G)).) The ‘vari-’ in ‘varitel’ signifies the variable elements of content determination on Shea’s account (e.g., different 64 ways that functions are selected) and that there may be other content determining conditions for other sorts of representational states (e.g., beliefs and desires); ‘-tel’ is short for ‘teleosemantics’. 65 There are a lot of moving parts to Shea’s account, which he introduces sequentially over 4 chapters, allowing for a streamlined presentation of his Condition for Content (2018: 85). However, it will help us to keep track of the moving pieces if the condition is spelled out in full, with the definitions of each part included, as I have in parentheses below. Shea offers a sufficient condition for representation, as opposed to Neander’s necessary and sufficient condition. Condition for Content (CC). For a component M of a system S, where S has the function to produce (instances of outcome type) F such that (a) S’s production of F is ROBUST (i.e., it occurs “in response to a range of different inputs” and “in a range of different relevant external conditions” (55)) and (b) S’s production of F is either STABLE (i.e., it has been systematically stabilized through evolution, learning, or by contributing to the persistence of S) or S has been intentionally designed to produce F (64), If (c) M’s being in a total state R carries EXPLOITABLE INFORMATION about a condition C’s obtaining (i.e., if M is in a state R in a region D and b is in region D’, then either P(Cb|M being in R) < P(Cb) for a univocal reason or P(Cb|M being in R) > P(Cb) for a univocal reason (77)), and 65 The appeal to regions D and D’ circumvents the need for universality. This makes good sense for stabilization, 65 which operates on locally reliable, rather than universal, correlations. See below on the relevance of these regions for robustness. Shea’s definition of exploitable correlational information begins “Item(s) a being in state F carries exploitable correlational information…” (2018: 77). I have modified it to ‘M’s being in a total state R’ to integrate it with the rest of CC. 66 (d) that M’s being in R satisfies (c) plays an unmediated role in a causal explanation (i.e., explains without adverting to a correlation between C and a further condition C’), through R’s contribution to the implementation of an algorithm, of (i) how the production of F “has been systematically stabilized through evolution, learning or contribution to persistence” (S-EXPLANATION) or (ii) how F “has been produced in response to a range of different inputs and achieved in a range of different relevant external conditions” (R-EXPLANATION) (84), then M’s being in R represents that C obtains 66 More succinctly: For a component M of system S, where S has the stabilized function to robustly produce F, if M’s being in R caries exploitable correlational information about C and the fact that M’s being in R carries that information factors in an unmediated causal explanation of the stabilization or robust production of F, then M’s being in R represents that C obtains. 
So a high-level function to produce an outcome (a behavior or its consequences) constrains the content of any lower-level mechanisms that plays a role in bringing about the outcome. (This is why Shea’s view is top-down.) An outcome is stable when it has been selected for (e.g., by evolution). An outcome is robustly produced when it is brought about in a range of Shea’s original statement of CC (2018: 85) leaves open the interpretation that what is being represented is a bare 66 property, but that makes talk of probability of C mysterious – probabilities are calculated on propositions, not properties. A subsequent statement (181) forestalls this interpretation. I have relied on the latter in my gloss of CC. The exploitable information carried by M could be information about C that makes it less likely that C obtains, however such information won’t generally factor in an R-EXPLANATION or S-EXPLANATION, so I set it aside here. 67 relevant circumstances – those that correspond with circumstances arising in the selection environment in which the production of the outcome is stabilized. But unmediated explanations of stability (S-EXPLANATIONS) and of robustness (R- EXPLANATIONS) will not necessarily call upon the same information. Consider a fanciful creature, see’ums, that evolved in an environment in which the only light was florescent lighting. Unfortunately for the see’ums, the hum of the florescent lights coincides with the resonant frequencies of their heads such that, if the hum gets too loud, their skulls will fracture. The hum produced by the florescent lights gets louder as the light gets brighter. As a result, see’ums avoid loud hums/bright lights. However, see’ums do not have auditory receptors. They can only detect the hum by way of its correlation with the intensity of the light emitted by the florescent bulbs. (They do have visual receptors.) The activation of the mechanism that initiates the see’um’s avoidance behavior correlates with both the presence of bright light and the presence of a loud hum. An unmediated S- EXPLANATION of the behavior will advert to the presence of a loud hum. After all, it is the hum that is dangerous. Put the see’um in bright light without a hum – e.g., by dragging it into daylight – and there is no advantage to avoiding the light. But, if you put the see’um in daylight, it will still engage in avoidance behavior because the avoidance initiating mechanism is triggered by its sensitivity to light intensity. So if we want to offer an unmediated R-EXPLANATION of the avoidance behavior, we will need to appeal to bright light: It is light that is relevant to the production of the behavior in all the circumstances and with all the inputs that robustly cause the behavior. An R-EXPLANATION invoking information 68 about the hum will be mediated: hums only explain the production of the behavior insofar as they correlate with light. According to CC, then, the avoidance initiating mechanism represents that there is a bright light and that there is a loud hum because the causal explanation that determines which EXPLOITABLE INFORMATION gets represented is S-EXPLANATION or R-EXPLANATION. Shea could respond (rightly) that for such toy examples this kind of content indeterminacy is acceptable. But the point is that, given their different explananda, the information invoked in unmediated S- EXPLANATIONS and R-EXPLANATIONS will typically differ. And this is true regardless of the simplicity or complexity of the organism (or other system) under consideration. 
67 Moreover, an account of perceptual representation for use in cognitive science should focus on R-EXPLANATIONS, not S-EXPLANATIONS – particularly if we want a notion of representation that will be useful in the dominant information processing paradigm of cognitive science, which seeks to explain the production of an organism’s behavior by operations on internal representations. Those explanations are R-EXPLANATIONS. S-EXPLANATIONS, by contrast, are only concerned with why, given a certain stimulus, a certain behavior was performed. (Whatever is going on inside the toad’s visual system, the thing that makes its capture of a fly adaptive is that things like that are relatively plentiful in the local environment and that they provide sustenance to the toad.) As a result, S-EXPLANATIONS can be given for any stabilized stimulus response system and are not made more perspicuous by an appeal to internal representational states. Examples are easy to find in human perception. For instance, stereopsis relies on the detection of effort in the 67 extraocular muscles. The least mediated explanation for the stabilization of the detection of this effort is that it correlates with eye position and, hence – when considering both eyes – with vergence. Given vergence and the interocular distance, we can determine the distance to the focal object. But it is effort, not vergence or eye position, that is driving the robust response, as demonstrated by Priot et al. (2012). 69 Notice, too, that R-EXPLANATIONS do not – as Shea states – rely on EXPLOITABLE INFORMATION. In particular, the region (D’) in which the behavior was selected is irrelevant to an explanation of how the behavior is produced in response to a range of different inputs, as we see when we take the see’um into the sunlight. The only relevant external factors for the production of the behavior are the sources of the perceptual stimulation (the environmental inputs). An R- EXPLANATION shouldn’t appeal to region D’ because the production of the behavior is not restricted to that region. So, what information is relevant for R-EXPLANATION? Information that is explicitly carried – i.e., made accessible to further perceptual processing – by the algorithms operating on the sensory input and eventuating in the robust/stable behavior. This just is the information invoked in a functional description of the algorithms executed by M (and other components of S) in the production of F. Not all the correlations captured by EXPLOITABLE INFORMATION are accessible in this way. For instance, the see’ums’ visual systems don’t make the presence of a hum accessible to further processing unless it includes a mechanism that caries background information about the correlation between light intensity and hum intensity and allows it to take input concerning light intensity and output information about hum intensity for use by further mechanisms. There is no reason to think there is such background information in the see’um case as described. The lesson is that – as with Neander’s view – we need to trace the stages of perceptual processing to determine exactly what contents are accessible at a given stage. In Shea’s case, however, the shortcoming is that he doesn’t take adequate account of the restrictions that the (derivatively adaptive) information processing functions of low-level perceptual mechanisms 70 place on the information that can be employed in an R-EXPLANATION of a high-level function (behavior). 
This generalizes to any top-down view, on which high-level functions to produce adaptive (stable) behaviors determine content without regard to the information available to lower-level perceptual processing. The information invoked in an R-EXPLANATION of a behavior must be restricted to information that is internally accessible to perceptual processing. Nevertheless, Shea is correct that low-level processing functions – by which we determine which of the internally accessible information is actual used – need to be understood in terms of the high-level behaviors those functions support. 3. The Bottom-Up Constraint: Internally Accessible Information In this section, I describe the constraints that the responsiveness of low-level perceptual mechanism impose on the information that can figure in an R-EXPLANATION. In the next section, I turn to the constraints imposed by the high-level function on the representational content of the given state of a low-level perceptual mechanism. Let us say that the (correlational) information that a mechanism carries and makes available to further processing is encoded by that mechanism. (So see’ums’ visual response encodes light intensity but not hum.) The information encoded at early stages of processing constrains the information that can be encoded by later stages of processing. 68 Encoded Information. Information regarding a condition C’s obtaining will be encoded by (a change in) the total state R of a perceptual mechanism M iff Some of this information may cease to be accessible to subsequent stages (e.g., the spectral profile of the ambient 68 illumination in achieving color constancy). 71 (i) the activation state of the sensory receptors, along with the implementation of any algorithms applied to those stimulation states by M (or mechanisms preceding M in the perceptual processing stream), causes M to be in R, and (ii) the activation state of the sensory receptors, invariant features of the physical system in which the receptor is embedded, the implementation of any algorithms applied to those stimulation states by M (or mechanisms preceding M in the processing stream responsible for M’s being in R), and any background information carried by M (or mechanisms preceding M in the processing stream responsible for M’s being in R) collectively nomologically determine information about C’s obtaining. 69 In §3.1 I describe what is encoded by receptor stimulation and subsequent processing thereof, without the application of background information. In §3.2 I describe the information encoded by these resources and background information. 3.1. Sensory Receptor Stimulation Encoding begins at the sensory receptors. In particular, it begins with the individual sensory transducers – e.g., the rods and cones – on a continuous receptor surface – e.g., the retina. 70 Background information can reintroduce factors such as locally reliable, but not law-like, correlations. But the 69 nomological determination condition forces us to be explicit about which background information is being invoked. If you don’t like nomological determination, you can substitute some other form of determination so long as it is stringent enough to prevent background information from sneaking in unnoticed. (This blocks the worry, raised in n.8, about causally upstream hidden features.) My description of encoding fits most easily within the traditional feed-forward understanding of perceptual 70 processing. 
However, it can easily be integrated with top-down approaches (e.g., Bayesian accounts of perception). Indeed, it is an essential part of any such account, insofar as these still allow for pure transduction at the receptors and need to account for the error between the incoming information and the top-down prediction. 72 Transducers evolved, in the first instance, to detect features of some characteristic proximal stimulus, converting (i.e., transducing) energy from this stimulus into neural impulses. A single transducer will encode information about the feature of this proximal stimulus to which it is sensitive. The information encoded is information concerning the presence of a feature at a given time – the time of the (change in) activation of the transducer: The activation state of the transducer raises the probability that the given feature is present, given its law-governed interactions with its characteristic stimulus. Of course, there will be other nomologically possible causes of the transducer activation, including misfires. So, technically, information about all the nomologically possible causes of the transducer state (including misfires) will be encoded. I 71 will return to these complications below. For now, I focus only on the encoding of features of the characteristic proximal stimulus. With the subsequent evolution of receptors, more reliability and a greater variety of proximal features were able to be encoded (due to the spatial arrangement of transducers on the receptor, etc.). This is accomplished via algorithms that take input from the activation states of 72 multiple transducers (or a single transducer at different times). These algorithms can be understood as specifying the conditions under which (changes in) the activation of transducers on the receptor indicate the presence of some feature of the proximal stimulus. The conditions are specified in terms of activation states of the individual transducers involved, their spatial arrangement relative to the other transducers of their respective receptor organs, and temporal As will ‘cheap features’ – e.g., that there is sufficient illumination for visual experience. Cheap distal features will 71 fall away as we proceed through perceptual processing because they do not usefully inform later stages of that processing (see §4). For example, see Schwab (2012) for a thorough overview of the evolution of camera-style eyes from early 72 photoreceptors. 73 relations among their activation states. If the conditions are satisfied, then information about 73 the presence of the given feature is encoded. (Such algorithms are implemented in early perceptual processing, but not, of course, by the transducers or receptors themselves.) For example, there are three types of cones (transducers) distributed along the retina (receptor), each exhibiting peak responsiveness to a different wavelength of light (ca. 558nm, 531nm, and 419nm; these cones are often characterized – somewhat inaccurately – as responding to red, green, and blue light respectively). The response of the cone weakens as the wavelength of the stimulus deviates from the peak. Taken individually, cone activation encodes surprisingly little. The response is dependent on both the intensity of the light and wavelength, so what gets encoded by the single cone is disjunctive information about intensity and wavelength pairings. 
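The point about the single cone can be made vivid with a minimal numerical sketch (in Python). This is an illustration only, not a model of actual photoreceptors: the Gaussian sensitivity curve, its width, and the stimulus values are invented for the example; only the peak wavelengths are the ones cited above.

import math

def sensitivity(wavelength_nm, peak_nm, width_nm=60.0):
    # Idealized (hypothetical) tuning curve for a cone type.
    return math.exp(-((wavelength_nm - peak_nm) ** 2) / (2 * width_nm ** 2))

def cone_response(intensity, wavelength_nm, peak_nm):
    # A cone's response depends on intensity x sensitivity, so it confounds the two.
    return intensity * sensitivity(wavelength_nm, peak_nm)

L_PEAK, M_PEAK = 558.0, 531.0
dim_on_peak = (1.000, 531.0)      # dimmer light at the M-cone's peak wavelength
bright_off_peak = (1.143, 500.0)  # brighter light at a shorter wavelength

# Taken alone, the M cone cannot tell these stimuli apart:
print(round(cone_response(*dim_on_peak, M_PEAK), 2),
      round(cone_response(*bright_off_peak, M_PEAK), 2))      # 1.0 1.0

# Comparing cone types does tell them apart (intensity cancels in the ratio):
print(round(cone_response(*dim_on_peak, L_PEAK) / cone_response(*dim_on_peak, M_PEAK), 2),
      round(cone_response(*bright_off_peak, L_PEAK) / cone_response(*bright_off_peak, M_PEAK), 2))   # 0.9 0.72

The last two lines anticipate the remedy described next: it is only by comparing the responses of different cone types that wavelength information can be separated from intensity.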
However, by applying an algorithm that takes the activation states of several adjacent cones as input – including at least one of each type – information about the wavelength profile of the light striking that region of the retina can be encoded, including the overall hue (dominant wavelength) and saturation (relative amounts of energies at various wavelengths). There will also be algorithms that take input from multiple receptors – including receptors of different types – and thereby encode further information about proximal features of the (overall) stimulus (or conjunctions of proximal features of different stimuli). For example, algorithms integrating the inputs from the left and right retinas are essential for encoding retinal disparities, which factor in stereopsis. The continuity of the receptor organ is an invariant feature of the physical system in which the receptor is 73 embedded. Similarly for the arrangement of the transducers on the receptor. Information about these invariant features can be attributed as background information to the relevant stage of perceptual processing, but this is the only background information presently under consideration. See §3.2 for more extensive background information. 74 The activation of the sensory receptors, and the application of algorithms thereto, will also capture information about distal features that could determine the pattern of receptor stimulation. The nomologically determined distal information will be more disjunctive than locally reliable correlations, upon which evolutionary processes work. These correlations rule out certain nomological possibilities. But the encoded information will only be made more specific by these correlations if some processing that carries information regarding them has intervened between sensory stimulation and the perceptual state to which we are ascribing content. For example, anaerobic bacteria produce magnetosomes – transducers that are sensitive to magnetic fields. These magnetosomes are strung together as a receptor that orients itself to the earth’s magnetic field, like a compass needle. The earth’s magnetic field correlates with anaerobic water, in which these bacteria thrive. The bacterium orients its movement according to the magnetosome response, with the result that it reaches (nearly) anaerobic water. However, the detection of a magnetic field, by itself, does not determine information about the direction of anaerobic water. Place a magnet near an anaerobic bacterium and it will reorient itself accordingly, regardless of the oxygen levels of the water towards which it swims. It is true that, within the context of the earth’s biogeochemical cycles, the correlation between anaerobic water and the earth’s magnetic field is governed by physical laws. But the only feature of this distal environment to which the anaerobic bacteria is sensitive, via its magnetosomes, is the magnetic field. So the internally accessible (i.e., encoded) information concerning the distal environment supplied by the response state of the magnetosome is just that which is determined by the magnetosome response. This will be a disjunction of information about all the nomologically possible conditions under which the magnetosome stimulation could 75 occur, including that the organism is in an earth-like biogeochemical environment and there is oxygen-depleted water in the indicated direction. 
Were the bacteria more complex, subsequent processing could apply background information (such as that such and such a magnetic field corresponds with such and such a level of oxygenation) that would determine information about the direction to anaerobic water. But at present we are only focused on information captured and made internally accessible – and, hence, suitable for inclusion in an R-EXPLANATION – by the processing of stimulation received by the sensory receptors without intervening background information. 3.2. Background Information We ultimately will need informational inputs beyond occurrent receptor stimulation because the algorithms of multiple receptors underdetermine distal features that are represented in perception. For instance, retinal disparities underdetermine the distance to the focal object. We also need information regarding vergence and interocular distance (see n.12). 74 We also need background information ruling out artificial sources such as stereograms, producing illusory depth perception. These additional inputs will be supplied by mechanisms that were selected for extracting information from the stimulation of the sensory receptors that the sensory receptors, themselves, cannot supply. 75 Background information is often implicit – i.e., it is not available to further processing but, rather, underwrites operations on explicit information. Explicit information is information ascribed to a mechanism that is available to further processing (i.e., is encoded). It is beyond the scope of this chapter to provide a complete constitutive account of these background contents. The first class of informational inputs beyond occurrent receptor stimulation (and shallow processing thereof) comprises those derived from past stimulation of the receptors. These include the results of associative learning and the integration of past encoded information into background representations. When activation states of distinct receptors exhibit a law-like correlation, the activation of one receptor can encode information that would be encoded by the correlated activation of the other. Such associative learning underwrites, for instance, the auditory detection of a sound's elevation. The contours of the outer ear affect the sound wave in a manner that depends on its angle of arrival. Moving one's head changes the effect on the sound wave. So, learned correlations between head movements (registered by proprioception and the vestibular organs), the resulting alterations to the sound wave, and visual fixation on sound sources will enable the encoding of information about sound elevation from the outer ear-induced effects on the sound wave in the absence of occurrent proprioceptive and visual input. The integration of past contents encoded by the sensory receptors can be seen in the construction of the body schema, which enables the awareness of the relative locations of body parts in a given bodily posture. The body schema is built up through sequential exploration of one's own body, drawing on tactile and proprioceptive input and vision. 76 This stored information will then be integrated, via algorithms, with occurrent stimulation to enable encoding of information about one's current body posture. The second class encompasses inputs (and associated algorithms) not available to, or wholly derived from, the sensory receptors. For example, efference copies (copies of the motor commands sent to the muscles) also contribute to perceptual processes that require tracking body position.
There are also more complex examples: Sensitivity to certain visual configurations (e.g., Y-vertices, such as those found at the corners of a drawing of a cube) are implicated in the perception of 3D volumetric shapes. That is why we see a Necker cube as – in See Brecht (2017), Longo et al. (2009), and Longo and Haggard (2010). 76 77 some sense – three dimensional. Such heuristics can make perceptual contents more specific/ 77 more determinate without costly computation and in the absence of some information that might be necessary for such a computation. They can be based upon statistical regularities in past experiences – like the learned associations mentioned above, though heuristics don’t require law- like correlations – or they can be innate. In either case they will operate on information encoded by prior stages of perceptual processing. 78 The operative rule here is: For any stage of perceptual processing you want to analyze, begin with occurrent sensory inputs and then proceed to other inputs – progressing from those most closely related to occurrent sensory stimulations to those least directly related to them – until you are able to explain the functioning of that stage. These additional inputs will allow us 79 to incorporate information about reliable correlations that we ruled out in the preceding section. And they do so in a way that makes their contributions to perceptual processing tractable while preventing us from smuggling information in unnoticed. Only an incremental account of the sort I propose can do this. 4. The Top-Down Constraint: Derivatively Adaptive Low-Level Functions R-EXPLANATIONS are the relevant explanations for determining perceptual contents that are useful for cognitive science. And encoded information is the relevant information for an R- Heuristics rely on implicit background information. For instance, the fact that Y-vertices are correlated with 77 volumetricity is not made available to visual processing in virtue of the implementation of the Y-vertex/ volumetricity heuristic that relies on that information. Of course, heuristics are not guaranteed to get things right – as in the apparent volumetricity of a line drawing of a 78 cube. The chancey-ness of heuristics is one reason I prefer correlational information. By tracking sensory inputs in light of responses to various stimuli and the functional role the response plays in 79 producing behavior, we can get a good picture of what unaccounted for inputs will be required at each stage (see §4). 78 EXPLANATION. But there will be more than one R-EXPLANATION that can be offered for the production of a given behavior – no one R-EXPLANATION needs to use all the information encoded by the mechanisms that give rise to the behavior. So there will still be considerable indeterminacy. For example, suppose that the toad’s SDM-sensitive mechanism encodes both the presence of an SDM and patterns of retinal stimulation by which SDMs are perceived. We will have an R-EXPLANATION that adverts to information about the SDM and another that adverts to information about patterns of retinal stimulation. To avoid content indeterminacy, we need to zero in on the relevant R-EXPLANATION. This is where the account of derivative adaptiveness comes into play. An R-EXPLANATION explains how a behavior comes to be robustly produced by the execution of (information processing) algorithms by components of the given system. 
Insofar as they are responsible for producing an adaptive behavior, the information processing roles cited in an R-EXPLANATION are derivatively adaptive. If the behavior is also stabilized, then these roles are stabilized functions of the mechanisms performing them. As a result, each R-EXPLANATION of a stabilized, non-derivatively adaptive behavior corresponds with an S-EXPLANATION of the information processing leading to the production of that behavior. The R-EXPLANATION that provides the least mediated S-EXPLANATION of these information processing functions tells us which encoded information is used in the production of the adaptive behavior (which information processing function each mechanism was selected for performing) and, therefore, which state of affairs is represented (the one the used encoded information is about). (Hence the 79 ‘top-down’ constraint: organism-level behaviors constrain our content-determining explanations of the lower-level processing giving rise to them.) For example, the contribution of a toad’s SDM-responding mechanism to the robust production of prey-capture behavior – e.g., a tongue-flick – could be given in terms of retinal stimulation or distal SDMs. Assuming that the tongue-flick response is truly robust and can be adjusted to accommodate SDMs at slightly different positions relative to the toad, there will need to be a transition to distal contents somewhere in the processing stream. Suppose, for instance, that the SDM-responding mechanism takes input from retinal disparities and computes the distance of the SDM from the toad, thereby enabling a response to differing positions of the SDM. In this case, the most direct explanation of the contribution of this mechanism to the production of the tongue-flick will be given in terms of information concerning the distal SDM. Any explanation of the role of this mechanism in the production of tongue-flicking behavior couched in terms of patterns of light on the retina will be highly mediated. This is captured in my Condition for Content’ below, which – like Shea’s CC – is a sufficient condition invoking explanations of stabilization and robustness. Unlike Shea, my condition has both top-down and bottom-up constraints: The responsiveness of low-level mechanisms constrain the information that can factor in an R-EXPLANATION while the robust, stable outcome function constrains which R-EXPLANATION is relevant for content determination. Condition for Content’ (CC’). In any case where component M of a system S is in a total state R and S has the function to stably and robustly produce F, if (a) M’s being in state R ENCODES INFORMATION about C’s obtaining, and 80 (b) the fact that M's being in R satisfies (a) plays a role in providing, through R’s contribution to the implementation of an algorithm, an R-EXPLANATION of S’s production of F, and (c) the R-EXPLANATION in (b) provides the least mediated S-EXPLANATION of S’s production of F, through the implementation of information processing algorithms, then M’s being in R represents that C obtains This suggests that we need to know the encoded information at each stage of processing leading up to the outcome function. (This is how we identify the R-EXPLANATIONS in (b) and (c).) Some will ask: Do we really need a complete functional account of internal processing, beginning with perceptual stimulation and culminating in some behavioral interaction with the environment, to know that we have zeroed in on the correct content? 
Sadly, yes; but there are two things to say about this. First, this does not mean that we cannot make educated guesses based on our current understanding of the relevant processing. For instance – returning to the issue of misfires and uncharacteristic causes of receptor stimulation encoded by early perceptual processing – retinal ganglion cells seem to be treating the activation of a sufficient number of the photoreceptors from which they take input as indicating the presence of features of light, weeding out the misfires. Also, there is no evidence that early processing makes use of uncharacteristic causes of retinal stimulation. So, we are well justified in thinking that photoreceptors and retinal ganglion cells have the function to encode the presence of features of the proximal stimulus (light). We 81 just need to acknowledge that, should details emerge that scuttle our current best guess, we will have to rethink it. Second, though this is not encouraging news given how far we are from a complete mapping of the functional connections in the brain, the world does not owe us easy answers to even our most burning questions. Even though identifying the contents of perceptual processes isn’t easy, the approach introduced here is the way to do it: It gives us the tools to make properly educated guesses and to revise our provisional answers as new discoveries come to light. And, given the incremental approach advocated here, we will be better able to pinpoint where things go wrong (when they do) and to understand what needs to be done to fix them. To make good on these benefits, we must proceed with caution, taking a holistic look at perceptual processing and attributing only that background information that is necessary to go from one stage of processing to the next, given the eventual performance of the output function. Otherwise, we risk smuggling in unwarranted contents. To illustrate, the Y-vertex/volumetricy heuristic relies on background information that Y- vertices correlate with three-dimensionality. When the output of a mechanism that takes input concerning the orientations of lines in the visual array is best understood as treating Y-vertices as corners of a volumetric shape, then we are licensed in attributing background information underwriting that transformation. If, however, the mechanism took input from retinal disparities and only treated Y-vertices as indicating a volumetric shape when the retinal disparities also supported this treatment, we should explain the function of the mechanism without appealing to background information about Y-vertices and volumetricity. The retinal disparities provide the information about volumetricity. 82 While we want to offer the most direct explanation of the contribution of some internal perceptual mechanism to the production of a non-derivatively adaptive behavior, we are still bound by the actual low-level processes that are being executed. We cannot just assign background information willy-nilly. We need good reason to think it is necessary for the performance of the specific step in perceptual processing. And we must acknowledge that, should our understanding of that step in perceptual processing be mistaken, we will have to rethink what information we can attribute to the mechanisms performing it. 5. Testing the Account Now that my combined high/low-level account is on the table, we can see how it fares. 
I'll start by considering the toy examples introduced earlier and then discuss some more general content indeterminacy challenges that have plagued naturalistic accounts of mental content. 5.1. Toy Examples It is plain to see how CC' will distinguish the day-drowsers from the algae-eaters, provided the processing stream is as simple as described in those cases: a blue-light detecting mechanism directly feeds into a mechanism that produces the relevant behavior. Assuming that information about the proximal blue light and the distal source of the light (daylight in the one case, bioluminescent algae in the other) is encoded, CC' will pick daylight for the day-drowsers 80 because blue-light detection functions as a trigger for getting the nocturnal day-drowser to sleep in the daytime (which is advantageous for the day-drowser, hence its stabilization). The creature will only encode these distal light sources if there is background information that blue light is to be treated as light emanating from the particular type of source. Otherwise what is represented is that there is blue light from a distal source. For the algae-eaters, CC' will pick bioluminescent algae as the content carried by the activation of the mechanism. In the case of the algae-drowser, CC' will pick daylight because subsequent processing makes use of daylight to trigger the release of melatonin. It does not use the response of the blue light-detecting mechanism to trigger algae eating behavior; so CC' will not pick bioluminescent algae as an additional content of the algae-drowser's blue light-detecting mechanism. Of course, if the distal information is not encoded by the blue-light detecting mechanism in these three creatures, then we will be restricted to the proximal information when detailing the contribution of the mechanism in our content-determining R-EXPLANATION. What CC' provides – and what Neander's account lacked – is a way of precisely contextualizing the feature detection mechanism within a larger processing stream leading to an adaptive behavior. And that is what determines what the mechanism was selected for and, hence, what the state of the mechanism represents. How about the see'ums? The R-EXPLANATION of the see'ums' avoidance behavior will be restricted to information about light unless we are warranted in attributing background information concerning the hum to some stage of the internal processing, because see'ums are unable to directly detect hums. Without this background information, CC' will ascribe content concerning the intensity of light detected to the light-detecting mechanism. This perfectly explains the robust production of avoidance behavior in various circumstances – e.g., when we drag a see'um into daylight – despite those circumstances not being immediately relevant for the stabilization of the avoidance behavior. CC' also gets things right with respect to a variant of the see'ums that only exhibits the avoidance behavior when the light intensity is coupled with the detection of a regular vibration in a certain frequency range on the skin. In this case we are (plausibly) licensed in attributing background information that treats the light intensity + vibration as indicative of the presence of a dangerous hum. Therefore the R-EXPLANATION of the see'ums' behavior can advert to the presence of the hum and will capture the behavior of the more advanced see'um in counterfactual situations. The advanced see'um will not exhibit avoidance behavior if you simply drag it into the sunlight.
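The structure of these verdicts can be made explicit with a toy sketch (hypothetical Python; the function names and the all-or-nothing signals are invented for illustration and are far cruder than any real processing stream). The three creatures share one and the same blue light-sensitive mechanism; what differs, and what CC' consults, is how that mechanism's output is taken up by the processing that produces the stabilized behavior.

def blue_light_detector(blue_light_present):
    # The shared, ancestrally inherited mechanism: identical in all three species.
    return blue_light_present

def day_drowser(blue_light_present):
    # Detector output feeds a melatonin-releasing mechanism: used to track daylight.
    return {"release_melatonin": blue_light_detector(blue_light_present)}

def algae_eater(blue_light_present):
    # Detector output feeds a feeding mechanism: used to track glowing algae.
    return {"eat": blue_light_detector(blue_light_present)}

def algae_drowser(blue_light_present, algae_smell_present):
    # Detector output still feeds melatonin release, but algae smell suppresses it;
    # feeding is driven by smell, not by the blue light-sensitive mechanism.
    detected = blue_light_detector(blue_light_present)
    return {"release_melatonin": detected and not algae_smell_present,
            "eat": algae_smell_present}

# One and the same detector state, three different downstream uses:
print(day_drowser(True))           # {'release_melatonin': True}
print(algae_eater(True))           # {'eat': True}
print(algae_drowser(True, True))   # {'release_melatonin': False, 'eat': True}

Nothing about the shared detector distinguishes the three cases; only the downstream use that figures in the R-EXPLANATION of the stabilized behavior does, which is why CC' keys content determination to that explanation rather than to the detector's response profile alone.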
5.2. Content Indeterminacy Challenges

Naturalistic accounts of mental contents are known to suffer from several content indeterminacy challenges. I will present the general strategy by which CC’ can answer these challenges using the examples of the (aforementioned) distality problem and the horizontal problem. The distality problem is the problem of assigning the right link in the causal chain leading to the state of a given perceptual mechanism as the content attributed to that mechanism’s being in that state. My solution to this problem has already been sketched in §4 and in the discussion of the algae-eaters, day-drowsers, and algae-drowsers, above. Applying it to the toad: Provided that (a) the activation of an SDM-sensitive mechanism, M, does in fact encode information about the presence of (distal) SDMs and (proximal) patterns of retinal stimulation, and (b) subsequent processing treats the output of M as indicating the presence of a distal object – as described above with respect to calculating the position of the SDM to guide the prey-capture behavior – then the information concerning the presence of an SDM will be the information factoring in the unmediated R-EXPLANATION that also provides the S-EXPLANATION of M’s information processing function. And so it will be the presence of the distal SDM that is represented by M’s activation state. However, if M were not wired up to the rest of the perceptual system in this way, it might be the proximal information that is represented by the activation of M. This would be the case if R takes input from a single retina, is restricted to a retinotopic frame of reference, and information concerning the presence of a distal SDM is not encoded until much later in the processing stream.[81] And that is as it should be: if the information is not available to be used or is available but not used, then it is not the function of any mechanism (up to that point) to contribute that information to further processing.[82]

[Footnote 81: Neander assumes that the toad’s T5-2 cells are capturing information about objects in regions of distal space corresponding to the receptive fields of the relevant T5-2 cells (2017: 109-115), even though T5-2 cells only receive monocular retinal input with minimal processing (Ewert and Schippert 2006). But T5-2 cell activation might only capture information about a configuration of visible features (e.g., elongated moving stimulus) in a retinotopic frame of reference.]

[Footnote 82: Notice that, once we fully account for the role of directly adaptive effects in determining the nature of derivatively adaptive effects like response functions, we explain the asymmetry in Neander’s distality principle such that there is no further work for the asymmetry to do.]

A similar result will follow for the horizontal problem – the problem of attributing the correct content out of multiple competitors that are locally co-instantiated (or were locally co-instantiated during the process of selection). This, too, can be illustrated with the toad and the fly. Here the question (assuming the content is distal) is: Is the content of the toad’s M being in state R that there is an SDM, that there is a fly, that there is toad food, or that there is something else that is co-instantiated with the SDM? First we must determine what is encoded by M’s being in R. If, for instance, information about the presence of toad food isn’t extractable from the sensory stimulation leading up to R, then this information won’t be encoded and can’t determine the
content represented by M’s being in R. We need to know what is encoded to know which R-EXPLANATIONS are in play.[83]

[Footnote 83: If only information about SDMs is encoded, then the R-EXPLANATION will proceed in terms of SDMs. This will provide a more mediated explanation of the stabilization of the derivatively adaptive function of R than would an explanation in terms of, say, toad food. However, it will be the least mediated explanation available, given the encoded information. And that is what CC’ requires.]

Even if information about all the candidate contents is encoded, an R-EXPLANATION must consider how the information provided by M’s being in R is used by further processing to produce the prey-capture behavior. This will depend on empirical matters concerning the toad’s perceptual system. If it is wired such that the toad indiscriminately produces a tongue-flicking behavior in the presence of SDMs and there is no good reason to attribute background information allowing the internal processing to treat SDMs as flies or toad food or some other co-instantiated entity, then CC’ will assign content about SDMs to R-states of M because there is no evidence that internal processing treats SDMs as anything other than SDMs. However, if – as is in fact the case – the tongue-flick is only produced in the presence of an SDM when the toad is hungry, then there is good reason to think that the toad is treating SDMs as toad food at some point in the processing leading from visual stimulation to tongue-flicking behavior. The question, then, is whether M is found at or after this point in that processing. If so (and only if so), CC’ will ascribe the content that there is toad food there to M (when in R) because information about food will offer the least mediated explanation of the selection of M’s contribution to the production of the tongue-flick-when-hungry behavior.

We can see from this discussion that CC’ gives us two fronts on which to attack any content indeterminacy challenge: First, we can rule out contents that are not, in fact, internally accessible. Second, we can rule out those internally accessible contents that don’t play a role in the stabilization of the derivatively adaptive information processing function of the given perceptual mechanism – i.e., encoded contents that aren’t used in the production of adaptive behaviors. As a result, my account is well-positioned – twice as well positioned as purely bottom-up or top-down accounts – to solve any remaining content indeterminacy challenges (e.g., Fodor’s (1990) disjunction problem or problems related to properties like Goodman’s (1955: 73-75) grue). If perceptual representation is a relation between a perceptual state and the environment, then the way to understand that relationship is in terms of the features of the environment to which perceptual processing is sensitive, given sensory input from the environment, and the adaptive behaviors that are supported by that perceptual processing and by which the organism (or system) interacts with its environment. This is precisely what my CC’ provides and what other views in this vein do not.

Chapter 3

Restricted Auditory Aspatialism

ABSTRACT: Some philosophers have argued that we do not hear sounds as located in the environment. Others have objected that this straightforwardly contradicts the phenomenology of auditory experience.
And from this they draw metaphysical conclusions about the nature of sounds – that they are events or properties of vibrating surfaces rather than waves or sensations. I argue that there is a minimal, but recognizable, notion of audition to which this phenomenal objection does not apply. While this notion doesn’t correspond to our ordinary notion of auditory experience, it does – in conjunction with our lack of an uncontroversial individuation of the senses and recent interest in distinctively multisensory features of perceptual experiences – raise the possibility of more expansive notions of audition, including some that do plausibly count as corresponding to our everyday notion of audition, that lack the spatial phenomenology cited in the objection. Until this possibility is ruled out, the phenomenal objection and metaphysical conclusions drawn from it remain inconclusive.

Contents
1. Setting the target: restricted auditory aspatialism
1.1. Strawson’s restricted auditory aspatialism
1.2. A new motivation for restricted auditory aspatialism
1.3. The target notion of a purely auditory experience
2. Spatial hearing
2.1. Monaural cues
2.2. Binaural cues
3. Restricted auditory aspatialism, unrestricted auditory aspatialism, and the ontology of sound
4. Conclusion

Frequently, on nighttime walks through my neighborhood, I hear a tone – a bit like the whistle of a tea kettle but lower pitched, not loud but not faint – in the distance. I call it the Whistler. I have no idea what causes the Whistler, but it sounds as though it is rather far away and so must be fairly loud at its source. It comes from the southwest. Were you to come walking with me, I trust you would corroborate all of this and that you would find your ability to do so as unremarkable as I find my ability to have made these observations in the first place. Nevertheless, some philosophers have argued that we do not enjoy spatial audition – that is, we do not hear sounds as located in the environment, even in the bare sense of merely off in a given direction or at some particular distance (Maclachlan 1989; Malpas 1965; Nudds 2001, 2009; O’Shaughnessy 2000, 2009; and, in a qualified form discussed below, Strawson 1959). Call this view ‘auditory aspatialism’. In spite of my evening encounters with the Whistler, I hereby submit this as my application for (qualified) membership in the aspatialist club.

In the first half of this chapter I provide the philosophical background and motivation for my approach to auditory aspatialism. In the second half, I provide empirically-based arguments in its favor. These arguments consider the receptivity of the primary auditory receptor – the basilar membrane – to features of a sound wave impinging upon it and the role of these features in determining spatial features of the sound or its source. I then show how these results undermine metaphysical conclusions about the nature of sounds that the phenomenal objection has been thought to support.

1. Setting the Target: Restricted Auditory Aspatialism

Auditory aspatialism confronts an immediate and obvious worry, already evident in my description of the Whistler: We do experience sounds as at a distance in some particular direction. Aspatialism straightforwardly contradicts this phenomenological evidence. Auditory spatialists have argued, from this fact and the assumption that our perceptual experiences don’t systematically mislead us, that sounds are events or properties of their sources (Casati and Dokic 2005, 2009; O’Callaghan 2007, 2010).
Recent aspatialists have responded to the objection by denying that we ever experience sounds as located and offering an error theory that explains away the recalcitrant phenomenology – generally in support of the view that sounds are (instantiated by) pressure waves (Nudds 2009; O’Shaughnessy 2000, 2009). If the wave view is correct, then our auditory experience cannot present sounds as located in some particular direction at a particular distance without systematically misleading us because the sound wave isn’t located at a distance when heard. It is located right where the perceiver is. Hence the need for an error theory.[1]

[Footnote 1: Aspatialism is about the experience of sounds, not the sounds themselves. Proponents of the wave view don’t deny that sounds (waves, according to them) are located. They deny that we experience them as located – but see (Sorensen 2008, Chapter 14) for a defense of a distal wave view. Also note, both spatialists and aspatialists agree that perception does not systematically mislead us – at least not so dramatically as to mis-locate its objects to the degree necessary were one to embrace the wave view and accept that sounds are experienced as reported in the phenomenal objection.]

Aspatialists can also respond to the objection by restricting their aspatialism to a special notion of audition to which the phenomenal objection does not apply. This is the course taken by Strawson (1959) in his original formulation of auditory aspatialism. Strawson’s aspatialism grew out of his project of descriptive metaphysics – an exploration of the relations between the concepts of space and objectivity in our conceptual system – rather than a concern with the metaphysics of sound (§1.1). I offer a different motivation for taking the restricted aspatialist approach; namely, that, when properly developed, restricted auditory aspatialism opens a new line of resistance against the phenomenal objection to unrestricted aspatialism – one that denies that the localization phenomenology cited in the objection is due to audition (§1.2). The rough idea is this: If restricted auditory aspatialism is true, then there is some notion of audition to which the phenomenal objection does not apply, even if it isn’t the intended target of the objection. This raises the possibility that there are characterizations of the ordinary (unrestricted) auditory experience to which the objection does not apply. This possibility becomes especially salient given the lack of any generally agreed upon individuation of the senses – or even that there is just one acceptable individuation of the senses – and recent interest in the possibility of features of perceptual experiences that aren’t properly associated with any of the standard five senses but are constitutively multisensory.[2] Until the proponent of the phenomenal objection rules out this possibility, the phenomenal objection to unrestricted aspatialism remains inconclusive. (Problems for meeting these burdens will also be discussed in §1.2.) Furthermore, if the localization phenomenology isn’t due to audition, and sounds are the objects of audition, then the metaphysical conclusions drawn from the phenomenal objection don’t follow.

Of course, this all hangs on the truth of restricted auditory aspatialism. To assess restricted aspatialism, we need to clearly characterize the relevant restricted notion of audition. And we need a principled way of assigning features of a perceptual experience to audition.
This is done in terms of components of perceptual mechanisms and the features of the stimulus to which they are sensitive (§1.3). These same tools can, I suggest, be extended in future work assessing the viability of this new line of resistance. I present the argument for the restricted form of auditory aspatialism in §2. I then return to the strategy for defending unrestricted aspatialism and how it undermines the metaphysical conclusions drawn from the phenomenal objection (§3). This should make the restricted aspatialist approach of interest to those concerned with the debate over the ontology of sounds, as well as more general questions concerning audition and spatial perception, without entangling us in Strawson’s descriptive metaphysics. On individuation of the senses, see (Coady 1974; Fulkerson 2014b; Gray 2013; Grice 1962; Heil 1983; Keeley 2 2002; Macpherson 2011b, 2011c, 2014; Matthen 2015; Nelkin 1990; Nudds 2004; Roxbee-Cox 1970). On multisensory features, see (Bayne 2014; Briscoe 2016, 2017, 2019; Connolly 2014; Fulkerson 2014a; Macpherson 2011a; O’Callaghan 2008, 2012, 2014a, 2014b, 2015, 2017a, 2017b, 2019). See (Spence and Bayne 2014) for a critical look at the evidence for multisensory experiences. See (Mandrigin 2021; Wadle 2021 (and chapter 1 of this dissertation)) for critical discussion of the arguments in favor of multisensory features. 93 1.1. Strawson's restricted auditory aspatialism The trick, for the proponent of the restricted aspatialist approach, is to find the right notion of auditory experience – one that escapes the phenomenal objection while still being recognizably auditory. Strawson’s restricted aspatialism provides a useful framework here. It is the historical articulation of the view. More importantly, discussing its ambiguities brings crucial desiderata for the target notion into focus and sets the stage for the new line of resistance mentioned above. Strawson begins with the claim that, in an exclusively auditory experience, there would be no place for the application of spatial concepts (Strawson 1959, p. 65). He goes on to say that: The fact that, with the variegated types of sense-experience which we in fact have, we can, as we say, ‘on the strength of hearing alone’ assign directions and distances to sounds, and things that emit or cause them, counts against this not at all. For this fact is sufficiently explained by the existence of correlations between the variations of which sound is intrinsically capable and other non-auditory features of our sense-experience. I do not mean that we first note these correlations and then make inductive inferences on the basis of such observation; nor even that we could on reflection give them as reasons for the assignments of distance and direction that we in fact make on the strength of hearing alone. To maintain either of these views would be to deny the full force of the words ‘on the strength of hearing alone’; and I am quite prepared to concede their full force. I am simply maintaining the less extreme because less specific thesis that the de facto existence of such correlations is a necessary condition of our assigning distances and directions as we do on the strength of hearing alone. (Strawson 1959, p. 66) In the passage above, Strawson embraces aspatialism with respect to both sounds and their sources. Despite this, he is explicit that his view is not prone to the phenomenological objection. 
3 He grants that we can locate sounds and sources 'on the strength of hearing alone' and insists that this is not a matter of making inductive inferences from the deliverances of auditory experience. 4 Contrast with Nudds’s (2009) aspatialism, which concerns our experience of sounds but not their sources. 3 See (O’Shaughnessy 2000, 2009) for a view on which sound localization is a matter of inference. 4 94 Strawson can only maintain both positions if he draws a distinction between the sort of auditory experience relevant to the localization of sounds 'on the strength of hearing alone' and the sort of auditory experience relevant to his aspatialism – the purely auditory experience. This distinction is, for Strawson, to be drawn along the lines of information sharing with the other sensory modalities. So, how exactly do we break these inter-modal correlations in order to get at the purely auditory experience? It will not be enough to simply subtract those experiences we associate with the other senses from our occurrent sensory experience. We are trying to drive a wedge between some specialized notion of auditory experience and the notion to which 'the full force' of the phrase 'on the strength of hearing alone' applies. What could the full force of this phrase be if not on the basis of one’s auditory experience without the assistance, in experience, of the other senses (individuated in the common sense way)? But then simply subtracting the other senses from an occurrent experience won’t give us a specialized notion of audition. It will give us the ordinary notion of audition. So, unless Strawson is confused, this cannot be what he has in mind. 5 Notice also that Strawson references correlations between experienced features of sounds and non-auditory features of sensory experience. Given that he does not think hearing the distance or direction of sounds or their sources 'on the strength of hearing alone' is a matter of noting these correlations and making inductive inferences from them, Strawson must be thinking of these correlations as operating subpersonally, prior to the ordinary auditory experience. 6 Despite this, this is one of the interpretations O’Callaghan (2010) gives him. 5 This is consistent with our being able to recognize the correlations and to make inferences from them or to offer 6 them as reasons for our localization of the sound. But it doesn’t require it. 95 There are two sorts of subpersonal correlation we must consider. The first is parasitic on previous conscious experience: Past awareness of the frequent coincidence of some spatial feature registered by a non-auditory modality with some feature of auditory experience leading to the automatic association of the given auditory feature with the given spatial feature. O’Callaghan’s (2017a) interpretation of Strawson’s purely auditory experience, as an auditory experience had by a subject who does not have and never has had any sensory experience in a modality other than audition, is an attempt to rule out just these sorts of correlations. On this interpretation, we are not having a purely auditory experience while lying on our backs in a sensory deprivation tank listening to music piped in through speakers on the ceiling of the tank because we have had experiences in other modalities prior to getting in the tank. Suppose that learned associations between looming visual objects and increasing auditory intensity are necessary to experience a sound (source) as approaching due to a steadily increasing intensity. 
One of O’Callaghan’s purely auditory experiences will not be able to avail itself of these learned correlations. But what about innate correlations? 7 O’Callaghan-style pure experiences do not rule out innate correlations to representations or capacities that could supply those contents lost by the isolation of the auditory experience from any historical interaction with the other senses. These representations or capacities might be associated with some other sensory modality or they might be premodal (in that they give input to sensory processes prior to their experiential output but do not belong to any one modality). For instance, changing intensity might be innately correlated to changing locations in a premodal We should also rule out learned correlations to non-sensory information, as well. This is generally passed over 7 without remark, perhaps due to an assumption that any relevant non-sensory information will be, itself, derived – at least in part – from sensory information or to an assumption that perceptual content is nonconceptual. 96 egocentric coordinate system, allowing changes in intensity to carry information about distance. Or it might be innately correlated, via visual looming, to such a coordinate system associated with vision. 8 So, if Strawson’s aspatialism is to be assured of avoiding the phenomenal objection, it will need to isolate its purely auditory experience from any innate correlations to non-auditory resources, as well. Textual evidence indicates that Strawson was only concerned with learned correlations, supporting O’Callaghan’s (2017a) interpretation. For instance, Strawson refers to 'correlations between the variations of which sound is intrinsically capable and other non- auditory features of our sense experience'. If such textual evidence is to be believed, this is where Strawson and I part company. My minimalism about the purely auditory experience cuts deeper 9 than his. Mine excludes these innate correlations, too, because they are, after all, correlations to non-auditory resources and so shouldn’t factor in an assessment of a purely auditory experience. 1.2. A new motivation for restricted auditory aspatialism As a first step in motivating my approach to the purely auditory experience, consider the ventriloquism effect, in which visual input dominates the localization of an auditory stimulus. Is the spatial phenomenology associated with the ventriloquism effect auditory, visual, or audio- visual? If we say auditory, we implicate our eyes as part of the auditory mechanism, the stimulation of which gives rise to the experience. If we say visual or audio-visual, we risk There is empirical support for the claim that correlations between visual looming/receding and changes in intensity 8 are innate, though this doesn’t touch on the question of whether or not these correlations support perception of change in auditory distance from changes in intensity without concurrent visual stimulation (Orioli et al. 2018). It is not clear that we should read the evidence this way. After all, given the motivations for his aspatialism (his 9 descriptive metaphysics project), he would likely happily omit innate correlations, too. Thankfully, we do not have to decide the exegetical question. Strawson has served his purpose for the present endeavor, and we can now set him aside. 97 making a great deal of what we ordinarily take to be spatial auditory phenomenology to be, in fact, non-auditory. 
(I have in mind all of the instances whereby our auditory localization is made more refined by visual input. ) 10 This is not to say that there is no way to resolve this tension with respect to the ventriloquism effect (or other cross-modal interactions). The point is that the choices we make about what is and isn’t a part of the mechanism that generates our ordinary auditory experiences will impact which features of a perceptual experience we ought to class as auditory (and vice versa). But intuitions about which features of a perceptual experience ought to be classed as auditory come apart from intuitions about what is and isn’t a part of our auditory mechanism. This tension must be resolved before we can say which features of our perceptual experiences ought to count as auditory – even (potentially) of our ordinary auditory experiences. Unfortunately, there is no uncontroversial proposal for individuating the standard five senses (sight, audition, smell, taste, and touch) and thereby determining which features of experience are due to them. Perhaps we can be sure that some features are auditory (for instance, proper 11 sensibles like pitch), but this certainty cannot be extended to features about which there is controversy (including sound localization phenomenology). But then we’re not in a position to adjudicate the debate between aspatialists and spatialists about the ordinary auditory experience Given that this is a difference in degree not kind, this will be so even if there are different mechanisms for 10 resolving discrepant audio-visual inputs and for improving resolution of auditory spatialization by way of visual input. Whatever reasons we have for letting one of these mechanisms in will be reason enough to let the other in. Nor is it a problem that the ventriloquism effect and spatial recalibration seem to presuppose that (what is being classified as) auditory input carries spatial information of its own. Even if this is true, we haven’t settled the question of whether audition, construed in this way, is free of other non-auditory inputs without which it couldn’t have this information. See n.2 for references. While this literature tends to focus on cross-species individuation, the lessons apply within 11 species as well. There is no generally accepted rule for resolving this tension revealed by the ventriloquism effect, for instance. 98 because we cannot yet say if the phenomenal features at play in the phenomenal objection are, in fact, properly auditory. This possibility becomes even more salient in light of recent arguments in favor of the existence of distinctively multisensory features – features of perceptual experiences that are not associated with any of the standard five senses. Whether or not there are such features certainly 12 depends on how we individuate those senses, but – again – there is no generally accepted account of how to do this. So it remains a live possibility that auditory spatial phenomenology is one of these multisensory features. Even if we resort to sensory pluralism – the view that the 13 individuation of a given sense is sensitive to the explanatory purposes to which it is put (Fulkerson 2014b; Macpherson 2011c, 2014) – to alleviate this worry, we still run into difficulties. According to sensory pluralism, there are different ways of individuating audition depending on context. But then we need to know which context(s) is (are) relevant for assessing spatialism about audition. 
Until these issues are settled, the phenomenal objection is inconclusive. So, too, are the metaphysical claims it has been used to support: If it is unclear that audition presents its objects as located in the distal environment, we can’t use this localization to draw conclusions about those objects (more on this in §3). The way forward is to find some preferred way to demarcate the auditory mechanism and then to assess whether or not that mechanism could give rise to auditory experiences of sounds as See n.2 for references. Notice that O’Callaghan, himself, is a leading advocate for the existence of perceptual 12 features not associated with any of the everyday senses, in addition to being an advocate of the phenomenal objection. Generally, it is assumed in the multisensory features literature that auditory localization (including ventriloquism 13 effect location phenomenology) is properly associated with audition, but no argument is offered for the intuition. Nor is it clear that this intuition can be reconciled with other feature association intuitions relied on in this literature (Wadle 2021; chapter 1, this dissertation). But then it remains an open possibility that supposedly auditory localization phenomenology is a novel feature. Likewise, it remains possible that this phenomenology is properly associated with some other sense. We can be mistaken about such things – see (Spence 2016). 99 located in a direction or at a distance. To do this we ought to start with the most minimal 14 sensory mechanism built out of uncontroversially auditory components that will still count as auditory – the mechanism associated with the purely auditory experience – and see if it can deliver the relevant spatial properties. If it can, then no form of auditory aspatialism is true (restricted or not). If it can’t, then restricted auditory aspatialism is true for that notion of audition. We can move from this result to results pertaining to the unrestricted aspatialist thesis, building upon the mechanism of the purely auditory experience until it does deliver these properties. Then we assess whether or not the mechanism specified should count as auditory. If 15 it does, then the spatialist carries the day, provided it is the most restricted acceptable characterization of (ordinary) audition. If not, the phenomenal objection remains inconclusive (barring further refinements to what is and isn’t an acceptable way of characterizing the mechanisms of ordinary audition). We build out from the mechanism of the purely auditory experience by introducing capacities and informational input from other sources – both those associated with other sensory modalities and those that are not (innate premodal spatial representations, for example). Of 16 course, we can always recover these spatial properties by positing such innate representations or capacities, but doing so comes with a burden: We need both a good reason to posit the existence We need to ground the discussion in mechanisms because the dispute concerns whether some phenomenal features 14 or representational contents should be attributed to audition, on its own. So we need to appeal to something other than phenomenology or contents to settle the question. The auditory mechanism is the obvious choice. More generally, focusing on sensory mechanisms to settle questions about phenomenology and content has the benefit of making these questions empirically tractable. Assuming that we can settle on criteria for an acceptable characterization of ordinary audition. 
One need only look 15 at the extensive literature on the individuation of the senses to realize that this is likely to be a difficult task. On Bayesian accounts of perception, these resources will include priors. A complete Bayesian account will need to 16 include a story about where the priors come from – whether acquired or innate. If we attend to the various inputs to minimal audition required to recover the features of an ordinary auditory experience – as I am urging – we will be able to tell which such inputs are learned from interactions with other senses and which are not. 100 (in us) of the representation or capacity and a good reason to think it part of the auditory mechanism – at least if this is going to secure the spatialist conclusion. And it seems unlikely that we will find good reason for thinking a given innate representation or capacity is best thought of as part of the auditory mechanism. Presumably the reason would be that it only interfaces with 17 unequivocally auditory perceptual mechanisms. But given the many interactions of the sensory mechanisms, beginning at their early stages, it is not obvious that there will be an unequivocal specification of the auditory mechanism at the point we are likely to find such representations or capacities. This gives us further reason to omit such representations or capacities (should they 18 exist) from our unequivocally auditory starting point – they are not unequivocally auditory. This chapter carries out the beginnings of this project – it establishes that restricted aspatialism is true. Whether or not any or all acceptable characterizations of the mechanisms underwriting our everyday notion of audition capture the relevant spatial properties is left as a question for another time. But it is a question that I hope to make more tractable by showing (i) how we should go about characterizing the relevant mechanisms and (ii) how we should ascribe features of a perceptual experience to those mechanisms. Undertaking these two tasks with respect to the purely auditory experience is the subject of the next section. Whether there is good reason to posit them in the first place (for any modality, not just audition) is a matter of 17 some debate. For recent entries in that debate see (Cassam 2005, 2007; Chomanski 2017; French 2018; Matthen 2014; Schwenkler 2012). For a sampling of the empirical evidence for this claim, see (Calvert et al. 2004; Driver and Noesselt 2008; 18 Ghazanfar and Schroeder 2006; Schroeder and Foxe 2005; Shimojo and Shams 2001). Evidence specific to audition will be discussed below. 101 1.3. The target notion of a purely auditory experience This, then, is how I propose we understand purely auditory experiences: A purely auditory experience is one arising from the stimulation, by sound waves, of the basilar membrane (or functionally equivalent receptor organ) and which receives no informational input from the other senses or from other non-auditory background representations. The basilar membrane is found in the cochlea (in the inner ear). Sound waves entering the ear are transmitted to the basilar membrane via the eardrum and the tiny bones of the inner ear. The resulting perturbations of the basilar membrane stimulate hair cells arranged along the length of the membrane, which is tonotopically structured – the closer together two regions of the basilar membrane are, the more similar the frequencies they respond to. 
In this way, the basilar membrane breaks complex incoming sound waves into their constituent frequencies and registers the intensity of those independent constituents (more on this in §2.1.2). Stimulation of these hair cells leads to the release of neural signals transmitted along the auditory nerve to the brainstem, from which they are passed along to the auditory cortex. As a first pass, we can say that a feature of the proximal stimulus (a sound wave) is encoded just in case stimulation states of the basilar membrane reliably track the presence of that feature in the course of its normal functioning. The stimulation states can be local (at a single hair cell along the basilar membrane) or global (at multiple hair cells along the basilar membrane). Global encoding is governed by algorithms specifying the conditions under which isolated portions of the stimulation of the membrane are to be considered together as potentially encoding some feature of the proximal stimulus. The conditions are specified in terms of the information encoded by each transducer, the spatial relations among transducers on a given 102 receptor organ, and temporal relations among the activation states of transducers. This covers the cases in which locally encoded features are brought together to encode further features of the proximal stimulus. Note, the spatial arrangement of the transducers on the basilar membrane is not, itself, encoded – at least not just in virtue of the occurrent sensory stimulation. This will need to be learned as background information (see below). We’ll say – again as a first pass – that features of objects and events in the distal environment, and the relations they bear to other objects (including spatial relations), are encoded just in case they are (nomically) determined by encoded features of the proximal stimulus. This guarantees that, whenever we have the determining proximal features (which the receptors evolved to detect), we will have the distal – that is, the presence of these proximal features entails the presence of the distal feature. The question of whether there are distal spatial features that are so determined by the encoded features of the proximal stimulus (sound wave) will be the focus of §2. In particular I focus on the spatial features of vantage point relative distance and direction. One might object that my reliance on determination in the encoding of distal features is overly stringent. After all, it is common in accounts of mental contents to allow background conditions relevant to the selection or performance of a given perceptual mechanism’s function – what Dretske (1981) calls ‘channel conditions’ – to play a role in content determination. I do not rule these out of my account, but I treat them as ‘background information’ – informational assumptions that are either acquired through perceptual experience or built into the operations a perceptual mechanism performs on its inputs (such as the implementation of one of the algorithms mentioned above) (see chapter 2, §3.2 for more a detailed description of background 103 information). We can now refine our first pass account of distal feature encoding in this way: Features of objects and events in the distal environment are encoded just in case they are determined by the encoded features of the proximal stimulus in conjunction with the relevant background information and algorithms. 
We can then make a similar refinement to the account of proximal feature encoding: Features of the proximal stimulus are encoded just in case stimulation states of the basilar membrane, along with the relevant background information (corresponding to ‘channel conditions’ enabling normal functioning of the basilar membrane), determine the presence of that feature. Treating channel conditions in this way has the benefit of putting them into the same format (information) as encoded information, facilitating their integration with information processing accounts of the mind. It also allows us to pinpoint the location at which background information impacts cognitive processing and to identify its sources. This is particularly important for isolating our target notion of the purely auditory experience. It allows us to determine which background information is relevant for the production of a purely auditory experience. To illustrate, interaural discrepancies, in time and level, between the stimulation of the two basilar membranes serve as some of the primary cues to sound (source) direction. These differences are registered in the superior olivary complex (SOC) of the brainstem, which takes direct input from the auditory nerve. The interaural disparities are then transmitted to the inferior colliculus (IC), in the midbrain. It is there that we see the first evidence of selectivity for head- relative left–right directions (azimuth) (Salminen et al. 2018; Schnupp and King 1997; Sterbing 104 et al. 2003; Thompson et al. 2006). If the disparities are not, themselves, sufficient for 19 calculating azimuth – and we will see that they are not – we will need to attribute background information (for instance, about the distance between the ears) that enables these transformations in IC. But is this background information best thought of as auditory? It is tempting to answer in the affirmative, given IC’s position in the auditory processing stream – virtually all signals carried by the auditory nerve that are passed along to the auditory cortex pass through IC. However, these transformations in IC are modulated by learned correlations with other sensory inputs delivered by downward projections to the IC (and SOC, for that matter) from the auditory cortex (Bajo et al. 2010; Bajo and King 2013; Brainard and Knudsen 1993; Budinger et al. 2006; Feldman and Knudsen 1997; Peterson and Schofield 2007). So IC does not contribute to the purely auditory experience because its outputs are conditions by learned correlations with non- auditory sensory inputs. 20 By contrast, the cochlea does not receive downward projections from the auditory cortex. Granted it does receive downward projections from IC, but these are involved in defensive measures to protect against hearing damage from loud sources – stiffening the basilar membrane and tightening the muscles of the inner ear to dampen the mechanical stimulation of dangerously loud sound waves (Guinan 2018). There is no evidence that IC passes along spatial information, It is also there that we see the first evidence for selectivity to sound (source) elevation. This selectivity does not 19 rise to the level of a map of auditory space. There is no topographic arrangement of IC corresponding to azimuth (or elevations), despite the selectivity for each. There is such a map in the superior colliculus (SC) – a multisensory midbrain structure downstream of IC (Sterbing et al. 2002, 2003; King and Palmer 1982, 1983). 
There is no evidence that IC passes along spatial information, or other learned correlations supplied by auditory cortex, to the cochlea. The same holds for the auditory nerve. So, any background information we need to attribute to the cochlea or the auditory nerve – concerning the tonotopic structure of the basilar membrane, for example – can unproblematically contribute to the purely auditory experience.[21]

[Footnote 21: See n.27 for instances where we will want to ascribe such information to auditory nerve fibers.]

The general rule is: To count as part of the basis of the purely auditory experience, a mechanism performing transformations on inputs originating with stimulation of the basilar membrane(s) cannot rely on information provided by non-auditory sources (such as learned correlations with other sensory inputs). The account of encoding, above, ensures that the encoded information we are considering is exclusively auditory, provided the background information implicated in encoding explicit information concerning proximal features is. So, in arguing for the aspatialism of the purely auditory experience, we will need to show that (a) there is no non-auditory background information implicated in the encoding of proximal features, and (b) there is no purely auditory background information that could combine with encoded features of the proximal stimulus to determine locational features of sounds (sources), thereby giving rise to an experience of sounds as located.

Call the sensory modality associated with the encoding of features of sound waves registered by the basilar membrane(s), in conjunction with the background information carried by wholly auditory mechanisms, ‘minimal audition’.[22] Minimal audition is minimal insofar as it is the most restricted notion of audition (individuated in terms of receptors and stimulus) that is still recognizable as such. The purely auditory experience is an experience that receives all of its informational input from minimal audition.[23]

[Footnote 22: This method of defining minimal audition bears affinities with proposals for individuating sensory modalities found in (Keeley 2002; Matthen 2015).]

I will argue for Aspatialism about Minimal Audition:

Aspatialism about Minimal Audition (AMA): Minimal audition cannot encode particular vantage point relative distances or directions of sounds (or their sources) in the distal environment.

With a modest assumption, AMA can be leveraged into a defense of aspatialism about the purely auditory experience. The assumption concerns the relationship between encoded features, representational contents, and phenomenal character: For the purely auditory experience to represent one of its objects (a sound or its source) as having a given feature or to make it seem as though one of its objects has a given feature, minimal audition must be able to encode that feature.
And for the experience to be accurate, it must be encoding that feature in fact. This 24 follows from the fact that the purely auditory experience only receives informational input from the encoding capacities of minimal audition, which allows us to set aside correlations to spatial Which, I must stress again, is not to say that it is one of the acceptable ways of characterizing our everyday notion 23 of audition. It is to say that minimal audition is recognizable as some form of audition. Advocates of action- constituting accounts of perception might object that, in ruling out those inputs that allow us to track our movements, we cease to have something that is recognizable as auditory. However, action-based views have been developed largely as a way of dealing with the perspectival aspects of visual experience and our ability to abstract away from those aspects. They do not support the claim that visual experience per se is impossible in the absence of action but only that perception of the intrinsic spatial features of visual objects is impossible in the absence of action (Evans 1982, 1985; Grush 2000, 2007; Noe 2004; Schellenberg 2007, 2010). Furthermore, there is no straightforward translation of this into the auditory realm: we don’t hear sounds as having shapes (intrinsic or perspectival) and, even if we could hear the shapes of sources, the standard view is that sounds are the primary objects of audition and act as intermediaries between perceivers and the physical objects and events that are sound sources; but see (Kulvicki 2016). A feature’s being encoded does not guarantee that it would be a constituent of the representational content of the 24 purely auditory experience, only that it could be. What decides whether encoded distal or proximal features (or both) get represented in the content of the experience will not be addressed here. 107 information provided from elsewhere (personal or subpersonal, learned or innate, sensory or premodal). To illustrate using a different modality: The retinal stimulation arising from looking at, say, a tennis ball will yield a Marrian 2 1 /2D sketch. That sketch is the basis for presenting the 3D shape of the ball as spherical. But to go from the 2 1 /2D sketch to the 3D shape goes beyond the resources supplied by the retinal stimulation, itself – and perhaps beyond the resources of even our ordinary notion of vision (see n.23). The retinal stimulation underdetermines the 3D shape of the ball. The occluded portion of the ball might be flat, concave, or any number of other shapes. Given this assumption, AMA entails Restricted Auditory Aspatialism: Restricted Auditory Aspatialism (RAA): A purely auditory experience could not present its objects as being located in some particular direction or at some particular distance, relative to the perceiver. If RAA is true, the phenomenal objection doesn’t apply to the purely auditory experience. However, the action is clearly at the level of what gets encoded by minimal audition. And so the focus will be on establishing AMA. There is no need to determine whether or not we ever have purely auditory experiences, as I describe them. 
The point is that, if the specification of the relevant auditory mechanism just is minimal audition and RAA is true, then it couldn’t be the case that the experience of sounds as located in a given direction or at a given distance were properly attributed to audition, alone, because no experience whose only informational input is the information encoded in minimal 108 audition can represent its objects as located in this way. This opens up a region of logical space heretofore uncharted in the debate over spatial audition – the region between pure and ordinary auditory experiences (and extending to their vague outer boundaries) – making the new line of defense for unrestricted auditory aspatialism available (§1.2, §3), provided RAA is true. The next section establishes that it is (by way of AMA). 2 . Spatial Hearing Given the way I have characterized minimal audition, we should begin with a consideration of the features of sound waves (the proximal stimulus) that the basilar membrane can encode: intensity, frequency, waveform, onset time and duration. Intensity is, roughly, the amount of energy in the wave. Intensity correlates with amplitude which, in standard graphical 25 26 representations, is given by the height of the peak of the wave form above (or below) equilibrium (zero). Frequency is the number of periodic repetitions in the wave in a given time interval. Waveform is the shape of the wave within a period. In graphical representations this will be the recurring (at the wave’s frequency) pattern of peaks and valleys of the wave. Duration is the time span from onset to decay of the wave, measured at a stationary point. We can get an intuitive grasp on these features of sound by noting their psychological correlates – the features of ordinary auditory experiences with which they are associated. Intensity is experienced as the loudness of the auditory event, frequency as its pitch, and waveform as its timbre (the ‘sound quality’ – that feature of sounds that allows us to distinguish, More precisely, it is the amount of energy per unit area in a direction perpendicular to that area. 25 It is proportional to the square of amplitude. 26 109 say, a saxophone from a trumpet producing the same pitch at the same intensity). It is important 27 to keep in mind, though, that these psychological correlates are not relevant for determining spatial attributes of sounds. Their underlying bases (features of sound waves) are. Information about these features of sound waves exhausts the sorts of information about the proximal stimulus that the basilar membrane encodes. Intensity is encoded by the degree of stimulation of the basilar membrane, frequency by the location of the stimulation on the basilar membrane. In the case of spectrum, algorithms consider the locations of the stimulation (as encoding the individual constituent frequencies of the complex stimulus) as well as microfluctuations in intensity and frequency. Of these features, intensity and waveform vary with distance and direction while frequency and duration do not. Differences in onset time and intensity at the two ears vary with 28 direction (and, in limited circumstances, distance). So we should look to intensity, waveform, and these binaural discrepancies for possible sources of spatial information regarding the distal The mapping of attributes of sound waves to their psychological correlates is not quite as straightforward as I have 27 represented it. 
For instance, perceived pitch does not always correspond to the lowest frequency component of a complex sound. In the missing fundamental phenomenon, for example, there is no energy at the frequency corresponding to the perceived pitch – it is ‘supplied’ by the auditory system on the basis of background information concerning the distribution of frequencies in complex sounds. Relatedly, perceived pitch can vary from that associated with the lowest sounding frequency when the partials are slightly inharmonic (Terhardt 1979; Terhardt et al. 1982). This raises a worry: if the relevant mechanisms carrying the background information necessary for pitch perception are not part of minimal audition, the perhaps minimal audition will not be recognizably auditory, pitch being (intuitively) a fundamental component of auditory experience. However, the relevant operations are performed in the auditory nerve fibers (Cariani and Delgutte 1996; Chialvo 2003). The auditory nerve, like the basilar membrane, does not receive input from other senses. So it can be the site of the mechanisms implementing the frequency-based algorithms for extracting perceived pitch from the stimulation of the basilar membrane. Similarly, the physical characteristics determining timbre are encoded by auditory nerve fibers (Town and Bizley 2013). Having settled this worry, the simplifications will pose no further complications here. I mean here that these vary with changes in distance to a stationary source. There is, of course, the doppler effect 28 for moving sources or auditors. But most sounds we hear are not both of sufficient duration and produced by sources moving fast enough to produce a significant doppler effect. Furthermore, the doppler effect will be subject to the same sort of worries that I raise for intensity and spectrum based cues to distance – it is confounded by changes in the intrinsic frequency of the source, and the amount of doppler shift relative to distance will depend on the speed and angle of approach or recession (Wightman and Jenison 1995, pp. 385–394). This information could be extracted from auditory experience by combining doppler cues with interaural time differences, but see §3.2 for problems with this approach to encoding the relevant spatial features in minimal audition. 110 location of sounds or sources in minimal audition. What I will show is that each of these putative cues to distance or direction underdetermines distance/direction in minimal audition. Rather than encoding distance/direction, they pose inverse problems in which any one of an infinite number of pairings of distance/direction with some other feature of the environment, sound, or source would account for that cue. Furthermore, in each case the infinite pairing includes the possibility that the sound is located right where the perceiver is – at no distance and in no direction. Each of these cases is consistent with the sound being anywhere and so cannot be the sole basis of the localization phenomenology cited in the objection, which depends on sound being experienced as at a distance or in some direction. I will argue that these inverse problems remain unsolved in minimal audition. If I am right about this, minimal audition fails to encode any particular distance or direction: The encoded features of the proximal stimulus (the putative cues) do not wholly determine vantage point-relative distance or direction, and AMA is true. Hence, given our assumption, RAA is true, too. 2.1. 
Monaural cues

I begin with a discussion of the monaural cues to direction and distance based in intensity and waveform. Once we get clear on what these cues are, we will see that they are not sufficient to locate sounds in distance or direction, and that they cannot work together to do so. (I argue that they cannot be coordinated with binaural cues to locate sounds in distance or direction without appealing to information gathered from some non-auditory source in §2.2.)

2.1.1. Intensity

In ordinary audition[29] the intensity of a sound wave is one obvious cue to the distance of the sound (source). (Intensity shows up in experience as loudness: the Whistler sounds quite faint at my position as I walk about my neighborhood – much fainter than the whoosh of a passing car. But, given the distance it must travel (it doesn’t come from anywhere in my neighborhood), the Whistler must be much louder at its source than the whoosh is just outside the car.) Call the intensity of a wave at its source the initial intensity. Call the intensity as measured at the listener’s position the perspectival intensity.

[Footnote 29: In this section I will set aside the individuation worries raised in §1.2 and use ‘ordinary audition’ in its rough-and-ready intuitive sense – the same sense of audition that is invoked in the phenomenal objection. Whether or not this intuitive notion actually corresponds to the best way (or even an acceptable way) of individuating audition will be picked up again in §3.]

That perspectival intensity decreases with distance from the sound source follows from the inverse square law – as the spherical wavefront expands, the energy is distributed over a larger and larger area, so less energy is carried at a given point on its surface.[30] Therefore, the perspectival intensity of a wave varies with distance from its source to the listener. But it also varies with the intensity of the sound wave measured at its point of origin. And this means that perspectival intensity underdetermines distance. For any given perspectival intensity, there is an infinite set of distance, initial intensity pairs that might correspond to it. But then the initial intensity must be tracked for this cue to suffice for encoding distance (Zahoric et al. 2005).[31] Minimal audition encodes distance via intensity, alone, only if it encodes both initial and perspectival intensity. But minimal audition doesn’t encode initial intensity. The only intensity we get from stimulation of the basilar membrane is perspectival intensity (because the basilar membrane doesn’t measure how much it is being stimulated at some location it doesn’t occupy). Therefore, minimal audition doesn’t encode distance merely in virtue of encoding intensity.

[Footnote 30: This is an idealization, which assumes the sound source is a point. Of course, sound sources are generally not points, and the geometry of the sound source impacts the rate of decay. However, the idealization only makes things more difficult for me as it prescinds from further complications undermining the determination of distance from intensity (for instance, minimal audition doesn’t encode the shape of the sound source and so can’t determine which equation is relevant for determining distance from the encoded intensity). Similarly for the impact of environmental conditions and the impact of the resistance of the medium.]

[Footnote 31: This is why estimates of distance from intensity are more accurate for familiar sounds. We know, roughly, the initial intensities of familiar sounds.]
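The underdetermination just described can be put in a single line. On the point-source, free-field idealization flagged in n.30, and writing P for the acoustic power of the source (a stand-in for the initial intensity) and r for the distance from source to listener, the inverse square law says:

\[
I_{\mathrm{persp}}(r) \;=\; \frac{P}{4\pi r^{2}},
\qquad\text{and hence}\qquad
\bigl\{\,(P, r) : P = 4\pi r^{2}\, I_{\mathrm{persp}},\ r > 0\,\bigr\}
\]

is the set of source-power/distance pairs compatible with a given measured intensity. Every pair in this infinite set – the source arbitrarily near and correspondingly quiet, or arbitrarily far and correspondingly loud – produces exactly the same perspectival intensity at the basilar membrane. The explicit formula is only an illustration; nothing in the argument depends on it, since relaxing the idealization (n.30) only adds further unknowns.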
One might suppose that there is auditory background information concerning reliable correlations between intensity and distance that could be used to mitigate this indeterminacy. However, the empirical evidence locates the first point at which auditory processing is independently sensitive to distance (that is, the point at which information from distance cues is transformed into information about distance) in the so-called ‘where’ pathway of the auditory cortex – especially in the planum temporale and the posterior superior temporal gyrus – which responds to location and motion (Kolarik et al. 2012, 2016). Not only does this raise the worry that this 32 processing is already conditioned by the multisensory inputs to IC – through which nearly all signals originating from auditory stimulation pass before reaching the auditory cortex – but these areas of the auditory cortex receive direct input from other senses. In fact, they are implicated 33 in cross-modal auditory spatial phenomena such as the ventriloquism effect (Callan et al. 2015; Zahoric et al. 2005). Given this, they must be excluded from the mechanisms of minimal audition. (And any background information they carry doesn’t factor in minimal auditory 34 processing.) ‘What’ and ‘where’ pathways have been proposed for audition (Rauschecker and Tian 2000; Ahveninen et al. 32 2006) on the model of the two visual systems hypothesis, which associates the ventral visual processing stream with object recognition and the dorsal visual processing stream with vision for action (Goodale and Milner 1992; Milner and Goodale 2006). The view that the traditional sensory cortices are, in fact, multisensory brain regions is increasingly common 33 among neuroscientists – see (Budinger et al., 2006; Schroeder and Foxe 2005; Shimojo and Shams 2001). At least barring evidence that the relevant background information is both unconditioned by these multisensory 34 inputs and sufficient to resolve the indeterminacy. Such evidence is unlikely to be forthcoming, given the extent of multisensory interactions in auditory processing. The result is that minimal audition will encode the infinite disjunction of possible locations of the sound source for the given perspectival intensity. But this is far too imprecise to give rise to the localization phenomenology cited in the phenomenal objection. Even if one thinks that this information will be enough to give rise to an experience of sounds as space occupiers, this phenomenology will be consistent with the sound being located right where the perceiver is, because the information on which it is based is consistent with that location. So this phenomenology won’t rule out sounds being identified with or instantiated by pressure waves. This gives us a way of operationalizing the ‘particular’ vantage point-relative distances and directions cited in AMA and RAA: The information regarding vantage point-relative distance must be determinate enough to give rise to phenomenology (without informational inputs beyond those available to minimal audition) that can be used to decide between competing views on the ontology of sounds. Contrast this with looking at the moon – we have no precise information about the distance to the moon, but we still see it as occupying (distal) space. We lack this information because the moon is much too far for proprioceptive input regarding the angle of convergence of the eyes to allow triangulation of the distance to the fixated object (the moon).
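By way of contrast, the convergence-based triangulation just mentioned can be given a simple geometrical sketch (again in Python; the interpupillary distance, the symmetric fixation geometry, and the specific distances are assumptions made only for illustration).

import math

IPD = 0.063  # assumed interpupillary distance in metres

def vergence_angle(distance_m):
    # Angle between the two lines of sight when both eyes fixate a point
    # straight ahead at the given distance.
    return 2 * math.atan((IPD / 2) / distance_m)

def distance_from_vergence(angle_rad):
    # The inverse relation: a sensed convergence angle fixes a distance.
    return (IPD / 2) / math.tan(angle_rad / 2)

for d in [0.25, 1.0, 10.0, 384_400_000.0]:  # the last is roughly the lunar distance
    a = vergence_angle(d)
    print(f"fixation distance {d:>13.2f} m -> vergence {math.degrees(a):.8f} deg")
# At lunar distance the vergence is on the order of 1e-8 degrees, far below any
# proprioceptive resolution, so the triangulated distance is effectively
# indeterminate even though the moon is still seen as distal.

The point of the sketch is only that, near to hand, convergence does determine a distance, whereas for the moon it does not – which is why the moon case involves an indeterminate distance rather than no distal placement at all.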
If we rule out all proprioceptive input from vision, all objects will be seen as at some highly indeterminate distance, as we now see the moon. But, given the laws of optics and the assumption that the stimulation of the two retinas have the same source, this range of distances will rule out the visual object’s being located right where the perceiver is. This is a crucial difference from the 35 If it turns out that this assumption is not carried (as background information) by uncontroversially visual neural 35 mechanisms, then what I am describing here is not the visual analog of minimal audition. The example is merely illustrative, though, and does not depend on whether what I have described is minimal vision or something between minimal and ordinary vision. 114 case of intensity (and all the other cues discussed below) – one that those pressing the phenomenal objection need to reconcile if the objection is going to rule out wave views of sounds. The point here is that this reconciliation is not possible for minimal audition. 2.1.2. Spectrum/waveform Most sound waves are composed of many frequency components (called partials) that are integrated into a single percept according to a set of heuristics (Bregman 1990). The spectrum 36 of a wave is the distribution of energy among its partials. The waveform is determined by its 37 spectrum: The individual amplitudes of each partial are combined additively, yielding the peaks and valleys of the waveform. (Spectrum is the main physical correlate of timbre.) Sounds perceived to have a discernible pitch have (relatively) harmonic relations between their partials – that is, the frequency of each partial is an integer multiple of the lowest constituent frequency (but see n.27). Sounds that do not exhibit such harmonic relations between their partials are noises. The spectrum of a sound wave changes with distance from its source (Butler et al. 1980; Little et al. 1992). In particular, higher frequencies are more attenuated with increased distance than are lower frequencies. So, at greater distances, we should expect a wave of a given initial spectrum to have lost more energy in its upper partials than in its lower partials, allowing spectrum to function as a distance cue in ordinary audition. These heuristics are akin to the algorithms of §1.3. While some of these heuristics will be attributable to minimal 36 audition – for example, those involved in spectrum detection – those involved in the production of more complicated auditory representations likely will not, as they are implemented in the auditory cortex (Christison-Lagay et al. 2015). This should alert us to the fact that intensity can be understood either locally (the intensity of one frequency 37 component of the complex stimulus) or globally (the overall intensity of all the frequency components as captured by one of our algorithms). The discussion of the preceding section refers to the latter. 115 Stimulation of the basilar membrane will fail to encode initial spectrum for reasons analogous to those for which it fails to encode initial intensity. Where before we were concerned with the overall level of excitation of the basilar membrane, now we are concerned with the distribution of that excitation in different regions of the basilar membrane. Otherwise the reasoning is the same: Variations in perspectival spectrum are confounded with the initial spectrum of the sound wave at its source. 
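The structure of this confound parallels the intensity case and can be sketched the same way (Python again; the frequency-dependent attenuation coefficients below are assumed, purely illustrative values – real air absorption also varies with humidity and temperature – and spherical-spreading loss, common to all partials, is left out).

FREQS_HZ = [250, 1000, 4000, 8000]
ALPHA_DB_PER_M = {250: 0.001, 1000: 0.005, 4000: 0.03, 8000: 0.1}  # assumed values

def propagate(initial_db, distance_m):
    # Perspectival spectrum: each partial loses alpha * distance dB of level.
    return {f: initial_db[f] - ALPHA_DB_PER_M[f] * distance_m for f in FREQS_HZ}

def initial_for(measured_db, distance_m):
    # For ANY assumed distance there is an initial spectrum that would yield
    # exactly the measured (perspectival) spectrum.
    return {f: measured_db[f] + ALPHA_DB_PER_M[f] * distance_m for f in FREQS_HZ}

measured = {250: 60.0, 1000: 58.0, 4000: 50.0, 8000: 40.0}  # assumed measurement (dB)

for d in [0.0, 2.0, 50.0, 500.0]:  # 0.0 m: a source right at the perceiver
    candidate = initial_for(measured, d)
    back = propagate(candidate, d)
    assert all(abs(back[f] - measured[f]) < 1e-9 for f in FREQS_HZ)
    print(f"{d:6.1f} m -> initial spectrum (dB): "
          + ", ".join(f"{f} Hz {candidate[f]:.1f}" for f in FREQS_HZ))

A relatively dull measured spectrum is equally consistent with a dull source nearby (or at no distance at all) and a bright source far away; the cue itself does not decide.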
But the basilar membrane encodes the spectrum at the position of the hearer, not at the position of the source. So again we have an infinite set of pairs of distances (including no distance at all) and some feature of the wave – in this case, initial spectra. Spectrum underdetermines distance in minimal audition. And again, background information concerning reliable correlations – for instance that, in our environment, sounds with more energy in their lower partials are much more likely to be distant – won’t be of any help here. As we’ve already seen in the discussion of distance from intensity, the first evidence of distance sensitivity from auditory cues is found in the auditory ‘where’ pathway, which receives input from the other senses and so must be omitted from minimal audition. Spectrum is also used, in ordinary audition, to identify sound direction in the median plane – the plane perpendicular to and bisecting the interaural axis. This is accomplished by identifying prominent frequency bands in the overall spectrum of the wave. That particular frequency bands are associated with particular directions is due to the fact that the pinna (the part of the ear sticking out from the sides of the head) and inner ear amplify different frequency ranges in a sound wave depending on its angle of arrival, owing to their shape (Blauert 1997). However, these cues are also confounded by initial spectrum, which might just happen to have prominent frequencies that are associated with a given direction in the median plane. Of course, we could get direction out if we simply had a mapping of prominent frequency bands to directions, but then spectral cues, alone, wouldn’t determine direction – spectral cues plus the mapping would. But such mappings aren’t part of minimal audition: Research suggests that spectral cues for direction are mapped to locations in the inferior colliculus (Grothe et al. 2010). But – as we’ve seen – these mappings are modulated by non-auditory information, and so must be ruled out of the mechanisms of minimal audition. Similarly, we could remove the ambiguity 38 if we could track head position through time, observing the changes in the perspectival spectrum as we do so. But that also requires going beyond minimal audition to include proprioceptive input. 2.1.3. Direct to reverberant ratio The direct to reverberant ratio (DRR) – the intensity of the direct signal relative to the intensity of reflections (subsequent stimulation caused by the original sound wave bouncing off surfaces back to the ear) – has also been shown to be an effective cue to distance in enclosed spaces in ordinary audition (Bronkhorst and Houtgast 1999; Kolarik et al. 2013; Kopco and Shinn-Cunningham 2011). DRR distance estimates exploit a feature of audition known as the precedence effect. The initiation of an auditory event is accompanied by a brief inhibitory period (~40 msec) during which further auditory events are not formed. Rather, stimulations of the 39 basilar membrane during this period are parsed as providing information about the already That spectral cues are modulated by input from the other senses is also supported by research demonstrating that 38 participants outfitted with artificial pinnae were initially impaired in their ability to locate sounds in elevation, but were able to correctly localize sounds after a couple of days spent wearing the prostheses while going about their normal daily routines (Hoffman et al. 1998; Trapeau and Schönwiesner 2018).
Unless they are substantially louder than the initiating event. 39 117 initiated auditory event(s). The precedent effect can be captured (in its relevant details) by one of our algorithms: Stimulation of like regions of the basilar membrane occurring within ~40 msec are considered together as encoding some feature of the proximal stimulus (namely DRR). 40 We can get a grasp on the phenomenon by considering the difference between reverberation and echoes. An echo is experienced as a distinct auditory event. Reverberation is not. Both are the result of sound waves bouncing off a surface before reaching our ears. The only difference is the time it takes for them to do this. DRR is a comparison of the intensity of the direct stimulation (which initiates the precedence effect) and the intensity of subsequent reverberant signals, arriving within the scope of the precedence effect. DRR varies with the distance of the sound source from the perceiver and the size and shape of the space in which the sound is heard. In a given space, the DRR will be lower with greater distance (there will be a weaker direct signal relative to the reverberant signal). But the same DRR in a different space will be correlated with a different distance. Yet again we have an infinite list of pairs (now distance and room dimensions) corresponding to our putative distance cue (DRR). DRR underdetermines distance in minimal audition. 41 This underdetermination could be solved if we could encode the shape and size of an enclosed space in minimal audition. But to do so requires encoding the distances and directions to various surfaces in the environment from stimulation of the basilar membrane. The basilar The precedence effect relies on background information; namely, that the the direct and reverberant signals have 40 the same source. However, the effect first shows up in auditory nerve fibers and, so, can be included – along with this background information – unproblematically in minimal audition (Brown et al. 2015; Parham et al. 1996, 1998). The same assumption is at play in the binaural cues discussed in §2.2. There, the relevant background information is ascribable to the interaural discrepancy detectors in SOC. Plausibly, DRR does encode information about the volumetricity of the space in which the sound is heard, given 41 the assumption that the reverberant signal is caused by the same sound source. But this does nothing to locate the sound, itself, which could – for all DRR tells minimal audition – be anywhere, including the perceiver’s location. 118 membrane is stimulated by sound waves, so we would need sound waves emanating from each surface of the room – as they do in a reverberant space. And that compounds the initial problem. Now instead of encoding the distance to one sound source, we need to encode the distance to a tremendous number of (effective) sound sources – points along the surfaces of the enclosed space that reflect sound waves back to the listener. Complexity, aside, DRR is going to need some other distance cue to define the shape and dimensions of the enclosed space in which it is effective and thereby to allow encoding of the distance to a sound source within that space. Here it is less likely that there are environmental regularities that could factor, as background information, in resolving the indeterminacies. But even if there were, present research locates the transformation of DRR into distance in the ‘where’ pathway of the auditory cortex (Kopco et al. 
2012, 2020), which we have already excluded from the workings of minimal audition. Therefore, minimal audition doesn’t encode distance wholly in virtue of DRR. 2.1.4. Monaural cue coordination What about the possibility that the cues work together to encode distance or direction? We have seen that each cue yields an infinite disjunction of values for the (pairs of) features with which it varies. In order for cue coordination to work, the apparent values for one cue must restrict the number of admissible pairs from the disjunctive list of another. For instance, perspectival spectrum – itself consistent with an infinite disjunction of distance, initial spectrum pairs – will have to eliminate a large number of the distance, initial intensity pairs given by perspectival intensity. And this requires dependencies between the non-spatial features with which these cues 119 vary – that is, it requires that an auditory event with a given perspectival spectrum could not have been produced by a wave of a given initial intensity at a given distance. But there are no such dependencies. Intensity is a measure of the overall energy of the wave. Spectrum is a measure of the distribution of that energy across the wave’s constituents. For any initial intensity and distance matching a perspectival intensity, there will be an intrinsic spectrum that would match the perspectival spectrum at that distance. Similar remarks apply to DRR. DRR varies with the dimensions of the enclosed space in which the sound is heard, but there simply is no dependency between the initial intensity or spectrum of a sound wave and the dimensions of the space in which it is heard. Furthermore, reliable correlations found between these cues and the location of sound (sources) will be of no help. Cue coordinations for distance cues utilizing such background information – for example, that sound of a certain intensity and spectrum tend to be far away – are still bound by the empirical results showing no selectivity for distance prior to the auditory ‘where’ pathway discussed above. So that is where we should attribute such background information, in which case it is not available to minimal audition. So much for the monaural cues, unless they can combine with the binaural cues to give distance or direction in minimal audition. I will argue below that they can’t. But first I will argue that the binaural cues, themselves, fail to encode distance and direction in the absence of correlations with information not encoded by the basilar membrane. 120 2.2. Binaural cues Binaural cues – those depending on the coordination of inputs at two ears (more to the point: at two basilar membranes) – can be admitted to minimal audition by one of our algorithms (akin to the precedence effect, now applied to stimulation across basilar membranes rather than at one basilar membrane). 2.2.1. Interaural level difference and interaural time difference The binaural cues are interaural level difference (ILD) – the difference in intensity of a sound wave at the right and left ears, due primarily to the acoustic shadow cast by the head at the farther ear – and interaural time difference (ITD) – the difference in arrival time at the right and left ears. They are used to determine the direction of sounds (or sources) in ordinary audition. When the wave has greater intensity and/or arrives sooner at the left ear, for instance, the auditory event will be experienced as being on the left (Brungart et al. 1999; Duda and Martens 1998; Konishi 1993). 
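A small sketch may help fix ideas before considering what these cues can determine. It uses the spherical-head approximation for ITD that is given in n.43 below, together with an assumed speed of sound, an ITD value matching the example in Figure 3.1, and a range of assumed interaural distances; the numerical inversion by bisection is included only for illustration and is not part of any model discussed in the text.

import math

SPEED_OF_SOUND = 343.0  # m/s, assumed nominal value

def itd_seconds(theta_rad, interaural_distance_m):
    # Spherical-head approximation (see n.43 below):
    # ITD = (d / 2c) * (theta + sin(theta))
    return (interaural_distance_m / (2 * SPEED_OF_SOUND)) * (theta_rad + math.sin(theta_rad))

def angle_from_itd(itd_s, interaural_distance_m):
    # Invert numerically for theta in [0, pi/2] by bisection (ITD is monotonic in theta).
    lo, hi = 0.0, math.pi / 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if itd_seconds(mid, interaural_distance_m) < itd_s:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

observed_itd = 0.0004  # 0.4 ms, the ITD value used in Figure 3.1

for d in [0.12, 0.15, 0.18, 0.21]:  # assumed interaural distances in metres
    theta = angle_from_itd(observed_itd, d)
    print(f"assumed interaural distance {d:.2f} m -> inferred angle {math.degrees(theta):5.1f} deg")

Holding the ITD fixed and varying only the assumed interaural distance moves the inferred angle from roughly 39° to roughly 76°; this is the dependence on interaural distance that the argument below turns on.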
For sources more than a meter away, directional information from ILDs and ITDs generally coincides, so for the present I will proceed as though there is a single binaural cue. I address the cases where ILDs and ITDs differ in §2.2.2. 42 Readers familiar with the duplex theory of sound localization might wonder about these cases of overlap. In the 42 standard duplex theory, ILDs dominate binaural sound localization for sounds above about 1500 Hz while ITDs dominate below that threshold. This is a matter of dominance (which cues are weighted more strongly) not exclusive domains. However, there remains a substantial overlap of effective ranges of ILD and ITD in the traditional duplex views. Furthermore, recent work has helped specify the role of ILD detection in auditory localization throughout the range of hearing, particularly in conjunction with ITDs at frequencies below the traditional duplex theory threshold (Grantham 1984; Harmann et al. 2016; Yost 1988). Similarly, sensitivity to, and a role in spatial processing for, ITDs in high frequency sounds above the traditional duplex theory threshold has been demonstrated (Henning 1974). Note also that most research supporting the duplex theory relied on sine tones – simple tones without partials – to test the effective ranges of ILDs and ITDs, but naturally occurring sounds are complex with many partials at different frequencies that straddle the traditional duplex theory boundary. 121 ILDs and ITDs can be used to specify a cone-shaped region of space – the so-called ‘cone of confusion’ – extending outward from the perceiver’s head directly to the left or right and with its vertex at a point on the interaural axis (see Figure 3.1, left). ILDs and ITDs vary with the angle of the sound source relative to the interaural axis. (Rotating this angle about the interaural 43 axis generates the cone.) Idealizing a bit, a sound wave emanating from anywhere on the surface of a given cone will result in the same ILD/ITD. (The localization of the sound actually falls along the surface of a volume generated by rotating a parabola about the interaural axis. This volume mostly coincides with the cone but intersects the interaural axis between the ears at a point nearer the ear closer to the sound source, whereas the vertex of the cone proper, is at the midpoint of the interaural axis. For any point on the surface of the parabola, the difference in distance from that point to the left ear and from that point to the right ear is constant.) Even if minimal audition could specify a cone of confusion to produce sound localization phenomenology on the basis of encoded ILD/ITD information (and just this basis), the resulting phenomenology wouldn’t place the sound in a particular direction. Each cone has a vertex on the interaural axis – right where the perceiver is, rather than off in some direction. Furthermore, each cone is symmetrical about the interaural axis. ILD/ITDs won’t place a sound, say, to the left and somewhat in front of the perceiver. They would place the sound to the left and either somewhat in front of or somewhat behind (or above, or below) the perceiver. However, minimal audition can’t determine a single cone of confusion from ILD/ITD. Consider ITDs, which are determined, not just by the angle of the wave’s arrival and the speed of For a spherical head, ITD = d/(2c)(θ + sinθ), where d is the interaural distance, c is the speed of sound, and θ is 43 the angle of the source relative to the midpoint of the interaural axis. 
ILDs are more complicated, but a little picture thinking should suffice to show that, for a spherical head, the shadowing effects will vary with angle of arrival. 122 sound, but also by the interaural distance (see n.43). If we are going to solve for the angle of the sound source relative to the interaural axis (and hence for which cone of confusion the ITD corresponds to), then we will need information regarding the the interaural distance as well as ITD. And interaural distance is not available to minimal audition: Recall that the transformation 44 of ITD (registered in the medial superior olive in the brainstem) into azimuth of the sound source occurs in the inferior colliculus. The inferior colliculus receives input from the auditory cortex which carries information about learned correlations between auditory stimulations and other sensory inputs that modulate the transformation of ITD into azimuth. These correlations – between proprioceptive input concerning head and eye movements and the resulting changes in visual and auditory stimulation – will be the basis for any background information concerning interaural distance. As such, we cannot include this information in the operations of minimal 45 audition. Similarly, to determine the size of the acoustic shadow cast by a head and thereby determine a cone of confusion from ILD (registered in the lateral superior olive), we need to know the dimensions of the head (including the interaural distance). Yet minimal audition does not have access to information concerning the dimensions of the head. These, too, cannot factor A little more picture thinking to illustrate: The parabola specifying the region in which the sound is located will be 44 wider when the interaural axis is longer because we will need to compensate for the increased distance to the far ear by bringing the sound source closer to it to maintain the difference in the distance from the source to the left and right ears. But this changes the egocentric direction to the sound (source). After all, children need to be able to update the interaural distance as they grow. So even if they begin with a 45 default representation of interaural distance in IC, it will quickly be superseded by a refined representation that is conditioned by multisensory input. 123 into auditory processing prior to the inferior colliculus, where (again) the transformation from interaural disparity to external direction (azimuth) is made. 46 At best, then, ILD/ITD determines, an infinite range of cones opening to one side of the head or the other in minimal audition. But even this credits minimal audition with too much: minimal audition can’t determine if a given ear is the right or left ear because it has no access to information by which the interaural axis can be oriented relative to external space – or even the perceiver’s bodily space. (This orientation is one more piece of information necessary to extract spatial information from the putatively auditory cues that will need to be accounted for in the transformations of interaural disparities to azimuth in IC.) So it can’t determine whether a given cone – or an infinite range of cones – opens to the left or to the right. Therefore, minimal audition doesn’t encode any particular direction from a given ILD/ITD, and ILD/ITD pose no threat to AMA. Every ILD/ITD is consistent with the sound (source) being located anywhere, including right where the perceiver is. 47 2.2.2. 
Binaural and monaural cue coordination At close distances (approximately 1 meter) the directional information extracted, in ordinary auditory experience, by ITDs and ILDs come apart. In these cases, ILDs – in conjunction with directional information determined by ITDs – can determine distance. In particular, ITDs generate a cone of confusion and the discrepant ILD can be used to determine a spherical region These points get further support from studies of adaptation to modified interaural cues in cases of unilateral 46 deafness or hearing impairment and experimental interventions (Javer and Schwartz 1995; Trapeau and Schönwiesner 2015). For ILD/ITDs of zero, we might think that we can get some directional information; namely, that the sound is 47 located on a plane perpendicular to and bisecting the interaural axis. However, an ILD/ITD of zero is also consistent with sounds surrounding the listener and sounds emanating from right where the listener is. 124 of space with its centre in line with the interaural axis and its surface intersecting the interaural axis between the ears (nearer the near ear). The sound source will be located somewhere within 48 this sphere, excepting a smaller spherical region also centered on the interaural axis but shifted towards the ear. (This inner sphere is the sphere that would be generated by an ILD one just noticeable difference below the registered ILD.) The resulting volume is a spherical shell that is thickest directly opposite the ear, thinnest where it intersects the interaural axis between the ears. When superimposed on the cone of confusion, this sphere isolates a torus-shaped section of the cone on which the sound (source) is located (Shinn-Cunningham et al. 2000) (see Figure 3.1). Note that this spherical region, is consistent with a sound source at the perceiver’s location. But once the spherical region and cone are superimposed to specify a torus-shaped region of space at their intersection in which the sound source might be located, this is no longer (necessarily) the case. However, we must once again contend with the fact that ITDs will not, in minimal audition, determine a single cone of confusion. So the region of intersection is far less precise than it is in ordinary audition. Indeed, ITDs won’t rule out any part of the sphere – there will be some cone intersecting a given portion of the sphere for the given ITD. And the calculation for determining this sphere from ILD requires information regarding the interaural distance, too. So, without information about the interaural distance, we also get an infinite range of spheres. When coupled with the difficulty concerning the orientation of the interaural axis, the result is that the region of possible intersections of cones and spheres is unconstrained. These ILD variations follow the inverse square law with the result that any location of the source that maintains 48 the given proportional distance of the source to the left and right ears will produce the same ILD (provided we disregard the impact of the material properties of the head – see below). Taken together these locations form the surface of a sphere. 125 Furthermore, the material properties of the head – its size, shape, and acoustic absorbency, as well as the position of the ears on its surface – will impact the ILD, and hence, the ILD-ITD discrepancy. And so the same ILD-ITD discrepancy is consistent with different localizations of a sound for differently shaped or constituted heads. 
In particular, for any ILD- ITD discrepancy, there could be a head so constituted that this discrepancy would result from a sound (source) at the perceiver’s location. The result is that minimal audition cannot take advantage of the ILD-ITD discrepancy to determine the distance of a sound (source). So far as minimal audition is concerned, a sound associated with a given ILD-ITD discrepancy could be anywhere, including right where the perceiver is. 126 Figure 3.1. The left half of the diagram represents cones of confusion generated by different ITD values. The shaded areas show a cross-section of the cone that would be generated by a sound source locates at ‘o’. The right half shows cross-sections of ILD-determined spheres. Dotted lines show the ITD cone for a sound source located at the ‘x’. A cross-section of the ‘tori of confusion’ is found at the intersection of this cone (dotted lines) and the shaded circular region (at 3 dB ILD and .4 ms ITD). (from Shinn-Cunningham et al. 2000, p. 1629) This is disanalogous to the stereopsis case (see §2.1.1), in which a visual object was determined to be at an imprecise distance from the perceiver – but at some distance or other – from retinal disparities along with the assumption that the stimulations had the same source and the laws of optics. In the case of discrepant ILDs and ITDs, we have a disparity and a unity assumption, but there are no laws of nature that take us from there to the fact that the auditory object is some distance from the perceiver. To get that, we also need inputs concerning the interaural distance and material properties of the head. And since minimal audition doesn’t encode or implicitly carry this information, it doesn’t encode distance (or direction) in virtue of ILD-ITD discrepancies. What about the coordination of binaural cues with monaural cues? It is unclear what additional information we could get by combining ILD and intensity cues because intensity values recorded at each ear are already factored into the calculation of ILD. There would need to be some spatially relevant information revealed by comparing the two – we would need to be able to restrict the distance or direction of the sound source determined by one by considering the other. For this to work ILD would need to constrain the source of underdetermination for intensity or intensity would need to constrain the source of underdetermination for ILD. But neither manages to do this: the source of underdetermination for intensity is initial intensity. The only constraint a given ILD value places on initial intensity is that the initial intensity be great enough to reach both ears from its point of origin. But there will be an initial intensity that satisfies this condition for a sound source any distance from the perceiver (along the ILD- determined cone of confusion). Similarly, intensity does not constrain the sources of indeterminacy for ILD: it does not limit the interaural distance or material properties of the head. 127 That leaves DRR and spectrum. Suppose you take DRR measures at both ears and compare them with ILD. ILD is a measure of just the direct signal. If it were not, the spatialization from ILD in an ordinary auditory experience could be confused – the overall measure of energy (D+R) could be greater at the farther ear if that ear were much closer to a wall thereby getting greater reverberant energy while the discrepancy in the direct signal remains relatively small. 
In such a condition, ILD spatialization would place the sound (or source) as in the wrong direction. But then the interaural differences in reverberant intensity vary independently from ILD. They vary with the locations of the sound source, the dimensions of the the space in which it is heard, and the location of the perceiver in that space. ILD varies with the location of the sound source and material properties of the head. The relevant material properties of the head – excepting, to a (too) limited extent, size – are not constrained by, or constraining on, the dimensions of any enclosed space the head happens to occupy. So comparisons of binaural DRR to ILD do not provide any additional information concerning the location of a sound (or source). The same goes for comparisons of binaural DRR to ITD, which varies with location of the sound source and distance between the ears (and which is also not sufficiently constrained by, or constraining on, room dimensions). And, given that DRR doesn’t constrain any of the relevant features for either ILD or ITD, it won’t help where ILDs and ITDs come apart, either. Binaural differences in spectrum, since they are law-governed functions of distance, constituent frequency, and the resistance of the medium, will not yield any useful information not already contained in the ILD. The overall level difference will be describable in terms of the level differences between individual frequency components, but that is all. Granted, pinna 128 filtering affects sounds coming from the front differently from those coming from the back, so the spectral direction cue could resolve front–back ambiguity for a given cone of confusion. However, this would require background information regarding the relative distributions of innate spectrum types in the environment and the relative front–back effects of pinna filtration, which are not available until IC, at the earliest. Any background information that might factor 49 in the integration of spectral distance cues with the binaural cues – as with all auditory distance cues and their combinations – is not plausibly attributed to auditory processing prior to the ‘where’ pathway. But neither IC nor the ‘where’ pathway are part of minimal audition. And so the coordination of cues will not determine vantage point-relative distance or direction. Therefore, minimal audition does not encode vantage point-relative distance or direction. AMA is true. And if AMA is true – given our assumption that the content and phenomenal character of a purely auditory experience are dependent on the encoding capacities of minimal audition – RAA is, too: The purely auditory experience cannot present its objects as located at some distance or in some direction in an egocentric frame of reference without drawing on contents provided by some non-auditory source. 3. RAA, unrestricted aspatialism, and the ontology of sound I now return to the strategy outlined in §1.2 for leveraging RAA into an argument that the phenomenal objection to unrestricted auditory aspatialism is inconclusive, thereby casting doubt on the use of the phenomenal objection to secure results about the ontology of sounds. My goal Indeed, there is evidence that such integration of spectral and binaural distance cues is performed in IC (Sterbing 49 et al. 2003). There is also evidence of spectral cues impacting azimuthal calculations in the superior colliculus (SC) of the mouse (Ito et al. 2020). SC is downstream of the IC and receives substantial multisensory input. 
129 here is to make the proposal more concrete in light of the details of the putatively auditory spatial cues canvased in §2. The strategy leans on the fact that there are no generally agreed upon criteria for individuating the senses and the possibility that there are distinctively multisensory phenomenal features of perceptual experiences. These points call into question the association of sound localization phenomenology with audition. It is perfectly plausible that, on the correct (or a correct) individuation of audition, the localization phenomenology is a multisensory feature – only supplied by the operation of audition in conjunction with non-auditory sensory inputs. This is plausible because this phenomenology is not attributable to at least some notion of audition; namely, minimal audition. So additional mechanisms will need to supplement those of minimal audition in order to produce this phenomenology. But, for each addition, we need to assess whether or not the resulting mechanism is auditory. To do this we need a clear set of criteria for individuating (non-minimal) audition, and we don’t have this. To illustrate, consider again O’Callaghan’s (2017a) interpretation of Strawson’s purely auditory experience – an auditory experience had by someone who has never had experiences in any other sensory modality. (This is a more permissive notion of the purely auditory experience than the one at issue in RAA.) As we’ve seen, to get the full benefit of the cues discussed in §2, the O’Callaghan-style purely auditory experience will need to be augmented with correlations to sensory inputs in the other senses. For instance, correlations between proprioceptive input regarding head movements and the resulting changes in auditory and visual stimulation could suffice to set the interaural distance and orient the interaural axis with respect to external space, thereby permitting ITD and ILD to determine a single cone of confusion and spectral effects to 130 determine the elevation of a sound (source) – as seems to happen in IC. The question to settle is whether the mechanism resulting from the addition of these learned correlations is best thought of as auditory or multisensory. On this question, researchers disagree. The multisensory interpretation matches de Vignemont’s (2014, 2018) etiological account of what it is for a feature to be multisensory. By 50 contrast, O’Callaghan’s (2015, 2017a, 2017b, 2019) account of multisensory features requires occurrent stimulation of multiple senses. Provided that the augmented experience doesn’t require occurrent stimulation of some other sense, the resulting localization phenomenology will be auditory on O’Callaghan’s approach. But auditory aspatialism will be true with respect to the 51 etiological approach, on which the phenomenology cited in the phenomenal objection isn’t auditory. Without a clear set of criteria for individuating the (ordinary) auditory mechanism, we can’t decide between these approaches. Similar problems arise for any non-sensory inputs that turn out to be involved in extracting distance and direction information from the cues discussed in §2. The result is that the phenomenal objection is inconclusive. And this inconclusiveness extends to the metaphysical claims that the objection is used to support. The phenomenal objection works in concert with the claim that our perceptual experiences do not systematically deceive us and the – usually unarticulated – assumption that sounds are the objects of audition. 
If audition presents its objects as located and doesn’t systematically deceive us, then its objects must be located. Since those objects are sounds, sounds must be located. But given the This approach has also been adopted by Wong (2017) and accords with standard scientific practice. 50 Briscoe (2019) adopts O’Callaghan’s approach. See (Wadle 2021) for a critique of this way of associating features 51 with senses. 131 competing notions of audition, we must consider the possibility that the objects of these distinct notions might be different, too. For instance, the object of an O’Callaghan-style purely auditory experience might be one thing – for instance, a wave – while the object of an experience that adds in learned correlations with other sensory inputs is another – such as an event. If that’s right, the phenomenal objection doesn’t get us to the claim that sounds are events because the relevant phenomenology only appears in an experience whose objects are events, not in an experience whose objects are waves, and we haven’t been given a reason to prefer the former characterization of the auditory experience over the latter. 52 To block this line of defense, the spatialist will either need to offer an argument that, by ‘sounds’, we mean the objects of the more expansive rather than the less expansive sort of experience or that the objects of both sorts of experience are the same thing, though one gives a more impoverished experience of that thing. Either way, we become embroiled in other contentious debates – over metasemantics and the correct account of mental content, respectively. In short, more needs to be said before the phenomenal objection can be taken as 53 supporting ontological claims about the nature of sounds. 4 . Conclusion I have argued that, given the ubiquity of interactions between the (putatively) distinct senses, beginning from the earliest stages of perceptual processing, it is not transparent which features of This bears some affinity with Nudds’s (2009) view. However, Nudds allows that both sounds – which are are 52 abstract particulars instantiated by sound waves on his view – and sound sources are objects of audition, though only the sources are located. My proposal allows us to avoid the difficult task of explaining why one sort of object of audition is experienced as located but the other is not. This result holds even if we adopt a sensory pluralism (Fulkerson 2014b; Macpherson 2011b, 2011c, 2014) that 53 allows both approaches to capture acceptable senses of audition. Each of these acceptable notions may well have distinct objects. 132 a perceptual experience ought to be associated with which sensory mechanisms. Next, I introduced a minimal notion of audition, defined so as to guarantee that it would circumvent these complications from cross-modal interactions. I then argued that minimal audition cannot encode vantage point relative distance or direction. I characterized a purely auditory experience as an experience that takes its informational inputs wholly from the information encoded by the minimal auditory mechanism and only relies on unequivocally auditory background information in the processing of those inputs. Given that representation of a spatial feature in the purely auditory experience is dependent on the ability of minimal audition to encode that feature, the purely auditory experience can’t present its objects as located in a particular direction or at a particular distance. 
(Nor does it seem as though sounds are located at a distance in some direction in the purely auditory experience.) So, restricted auditory aspatialism is true. Given these results, the phenomenal objection to unrestricted auditory aspatialism is inconclusive. We do not presently have an uncontroversial notion of the auditory mechanism to which our experience of sounds as located can be attributed. My experience of the Whistler as located far away to the Southwest – while very real – might not be due to audition on every reasonable demarcation of the auditory mechanism. For all we can say now, it might not be due to audition on any reasonable demarcation of the auditory mechanisms. But then it is premature to conclude that unrestricted auditory aspatialism is false and to draw further conclusions about the nature of sounds from that falsity. If the debate between spatialists and aspatialists is to be decided by appeal to auditory phenomenology, we will have to shift focus to the demarcation of the auditory mechanism and the sorts of features that the mechanism, on a given demarcation, 133 can capture (and therefore contribute to an experience) in the manner I have pursued with respect to minimal audition. 134 Chapter 4 The Contributions of the Bodily Senses to Cortical Body Representations ABSTRACT: Felix reaches up to catch a high line drive to left field and fires the ball off to Benji at home plate, who then tags the runner trying to score. For Felix to catch the ball and transfer it from his glove to his throwing hand, he needs to have a sense of where his hands are relative to one another and the rest of his body. This sort of information is subconsciously tracked in the body schema (or postural schema), a representation of the current bodily posture that is updated on the basis of proprioceptive inputs (Head 1920; Pallaird 1999; Gallagher 1998). While the existence of the body schema in not in dispute, its origin is. After reviewing the competing proposals (§1), I introduce the conceptual tools needed to move the debate forward (§2) and apply them to the question of the extent to which the body schema could be learned from perceptual input in utero (§3-§4). I argue that it could give rise to something recognizable as the body schema, though not quite rising to the level of the mature body schema. After considering the implications for further research on the origins of the body schema, I show how these results apply to other body representations, helping clarify the vexing question of the number, nature, and interactions among cortical body representation. This theoretical work also promises to 135 advance our understanding and treatment protocols for disorders affecting cortical body representations (e.g., anorexia nervosa) (§5). Contents 1. The origins of the body schema 2. Information and the minimal bodily senses 3. Minimal proprioception and minimal touch 3.1.minimal touch 3.2.minimal proprioception 3.3.minimal proprioception + minimal touch 3.4.limitations 4. Minimal equilibrioception 4.1.minimal proprioception+touch+equilibrioception: orientation 4.2.minimal proprioception+touch+equilibrioception: scale 5. Implications and future directions 5.1.the origins of the body schema 5.2.the individuation of cortical body representations Felix reaches up to catch a high line drive to left field and fires the ball off to Benji at home plate, who then tags the runner trying to score. 
For Felix to catch the ball and transfer it from his glove to his throwing hand, he needs to have a sense of where his hands are relative to one another and the rest of his body. This sort of information is subconsciously tracked in the body schema (or postural schema), a representation of the current bodily posture that is updated on the basis of proprioceptive inputs (Head 1920; Pallaird 1999; Gallagher 1998). While the existence of the body schema in not in dispute, its origin is. After reviewing the competing proposals (§1), I introduce the conceptual tools needed to move the debate forward (§2) and apply them to the question of the extent to which the body schema could be learned from perceptual input in utero (§3-§4). I argue that it could give rise to something recognizable as the body schema, though not quite rising to the level of the mature body schema. After considering the implications for further 136 research on the origins of the body schema, I show how these results apply to other body representations, helping clarify the vexing question of the number, nature, and interactions among cortical body representation. This theoretical work also promises to advance our understanding and treatment protocols for disorders affecting cortical body representations (e.g., anorexia nervosa) (§5). 1. The Origins of the Body Schema The traditional view is that the body schema is acquired after birth via coordination of sensory inputs from different modalities (primarily touch, proprioception, and vision) and, perhaps, efference copy (Piaget 1962; Wittling 1968; Assaiante et al., 2014) – e.g., an infant learns that certain proprioceptive inputs correlate with certain visual stimulation when she moves her right arm in front of her face. Such correlations are thought to be learned through more or less random self-motion in early childhood. More recently, it has been proposed that the body schema is already present at birth. On one version of this proposal the body schema is an innate endowment genetically hardwired into the fetus (Gallagher et al., 1998 Gallagher 2005; Rochat 2001; Bhatt et al., 2016). On the second present-at-birth approach, the body schema is acquired by a process of 1 exploratory self-motion – as with the traditional view – but this process, called body babbling or Evidence for phantom limbs in cases of congenital limb aplasia – as these authors note – appears after a latent 1 period in which there is no evidence of a phantom limb, raising the possibility that the phantoms form in the body image (a distinct body representation used for perception and body-based judgments – see §5.2) on the basis of visual and social learning. The evidence also focuses on perceptual reports, further implicating the body image (Weinstein and Sersen 1961; Weinstein, Sersen, and Vetter 1964; Vetter and Weinstein 1967; Poeck 1964; Melzack 1989; Scatena 1990; Grouios 1996). Gallagher and colleagues note that there remains an absence of evidence for representations in congenitally absent limbs in the body schema (e.g., we don’t see behaviors that indicate that action planning proceeded as though the missing limb were present). To circumvent this difficulty they propose that, in the absence of sensory stimulation in utero, innate representations of the affected body parts in the body schema will atrophy, dwindle, and perhaps even disappear (Gallagher et al. 1998; Gallagher 2005). 
137 motor babbling by its proponents, begins in utero (Meltzoff and Moore 1997; Meltzoff 2007a, 2007b; Marshall and Meltzoff 2014, 2015; Meltzoff and Marshall 2018; Fagard et al. 2018). In 2 the course of the exploratory self-motion, tactile sensation provides sensory feedback as body parts come into contact with one another or with the uterine wall. This is combined with proprioceptive input to determine the relative positions of body parts and how they correlate with various movements available to the fetus. Both present-at-birth views are inspired by results suggesting that extremely young neonates can imitate facial expressions and hand gestures (Maratos 1982; Meltzoff and Moore 1977, 1983, 1989, 1992, 1994; Kugiumutzakis 1999; Nagy et al., 2005; Nagy et al., 2014; Ullstadius 1998; Simpson et al. 2014). The idea is that an infant would need a body schema to represent the position of, say, their tongue to imitate an experimenter sticking their tongue out at the infant. Assuming that the support for neonatal imitation is adequate, the inference to a body schema at birth is supported. However, imitation studies cannot tell us whether the body schema originates with body babbling or with innate endowments. To decide between these alternatives, we need to know if there is sufficient sensory input to generate the body schema in utero. If there is, this places pressure on the innate view: The innate endowment view requires body babbling- like movements to reinforce the innate schema (see n.1). If such movements are also sufficient for forming the schema, then there is an argument from parsimony in favor of the body babbling In its initial presentation, body babbling was invoked as part of an explanation of infant imitation of facial 2 expressions that was inspired by verbal babbling accounts of the learned mappings of articulatory gestures and the resulting vocal sounds (Meltzoff and Moore 1997). The idea has since been expanded in work by Meltzoff and Moore to explain whole body imitation (1989, 1992, 1994). The resulting account of infant imitation then relies on intermodal equivalences established between the body schema resulting from body babbling and visual representations of others’ bodies (Meltzoff and Moore 1997, pp. 7-8). The origins of this cross-modal equivalence are left unexamined. 138 hypothesis. If there is not sufficient sensory input for the formation of a body schema, this lends 3 support to at least some innate endowments. The key question, then, is whether sensory information available in utero is sufficient for the formation of an imitation-supporting body schema without the support of innate endowments of the sort proposed by Gallagher and colleagues. However, these infant imitation studies are highly controversial (Jones 2009; Ray and Heyes 2011; Oostenbroek et al., 2016; Oostenbroek et al., 2018). So we cannot yet rule out the traditional view. Of course, failure to show imitation doesn’t guarantee that there is no body schema present at birth. There are other factors that are necessary for imitation that could be lacking, thereby explaining the failure to replicate the imitation results – e.g., social motivation (Bremner 2017, p. 6). So all three of our alternatives (the traditional view, body babbling, and innate endowments) are still on the table. 
If the controversy around neonatal imitation is resolved in favor of non-imitation, then the proponent of a present-at-birth body schema must fall back on other studies of in utero movements and corresponding brain activity. To date these studies have focused on brain activity in neonates ranging from preterm infants to infants no more than 60 days old (who are thought to be roughly on a par with near-term fetuses in terms of brain development). These studies provide support for the existence of cortical body maps in the neonatal/fetal brain (Müller 2003; Milh et al., 2007; Marshall and Meltzoff 2014, 2015; Nevalainen et al., 2015; Meltzoff and Marshall 2018; Fagard et al. 2018; Meltzoff, Saby, and Marshall 2019). Several authors claim that these results provide The argument is further strengthened by the consideration that the fetus’s body undergoes substantial 3 developmental changes in size and relative proportions of body parts, which would mean that any innate schema would need to be continually updated through the process of development. evidence of a functional body schema in the fetus (Marshall and Meltzoff 2014, 2015; Fagard et al., 2018; Meltzoff, Saby, and Marshall 2019). 4 However, these studies do not rule out (i) a role for fetal movements that is restricted to contributing to the organization of neural pathways resulting in the somatotopic organization of the primary somatosensory cortex and motor cortex corresponding to the homunculi (fig. 4.1) of the somatosensory and motor cortices and (ii) a role for genetically programmed developments of the body schema that prefigure (or are accompanied by) fetal movements of the corresponding body parts. The idea behind (i), which can be taken as a traditionalist response to the body babbling interpretation of these results, is that fetal movements merely help organize somatosensory cortex (S1) into a somatotopic map known as the sensorimotor homunculus (Penfield and Boldrey 1937; Penfield and Rasmussen 1950). The homunculus is the primary destination of pre-cortical tactile and proprioceptive pathways and has traditionally been viewed as a sensory relay, not as a functional representation of the spatial organization of the body. In particular, it is not thought 5 to function as the body schema. Two features of the homunculi support this view: (1) the homunculus is distorted in a way that would undercut the accuracy of the representation of limb These accounts also reference observations of seemingly goal-oriented movements of the fetus in utero (e.g., 4 opening the mouth in apparent anticipation of the arrival of the hand; Myowa-Yamakoshi and Takeshita 2006; Zoia et al. 2007; Reissland et al., 2014) to bolster this interpretation. The extent to which these seemingly anticipatory behaviors are in fact anticipations of the outcomes of planned actions is open to debate, though. There are, in fact, multiple hierarchically arranged homunculi found in Brodmann areas 3a and 3b of S1. Those of 5 3a are primarily associated with proprioception, those of 3b primarily with touch. While these take their primary input from their associated peripheral receptors (proprioceptors and mechanoreceptors in the skin, respectively), they also take inputs from elsewhere. For instance, there are projections to S1 from non-somatosensory nuclei of the thalamus – particularly nuclei that take multisensory inputs.
The function of these projections is currently unknown, as is whether they remain wholly somatosensory, but routed through nuclei that also receive inputs from other senses, or include inputs from other senses. Similarly, inputs to S1 related to proprioception (Brodmann area 3a) are largely restricted to input from the various peripheral proprioceptors. Unsurprisingly, 3a has extensive, reciprocal connections with motor areas. positions in the postural schema and (2) the structural organization of the homunculus does not correspond to that of the body, itself. If the brain activity and movement patterns seen in very 6 young neonates are merely indicative of the organization of the homunculi, then they do not support the presence of a body schema at birth. Nevertheless, the neurological evidence indicates that fetuses have sensory experience and engage in movements similar to those that the traditionalists cite as sources of acquired knowledge about the body postnatally. It would be surprising if no learning relevant to the formation of the body schema were accomplished in utero, even if that learning falls short of a Figure 4.1. 1954 Penfield Homunculi: (A) Sensory homunculus. (B) Motor homunculus. (Image from the Penfield Archives, Osler Library of the History of Medicine.) A recent study suggests that the somatosensory homunculus might play a role in the representation of body metrics 6 and, so, is more than a sensory relay (Giurgola et al., 2019). Nevertheless, the body schema certainly involves more than just the homunculi. functional body schema (de Klerk et al., 2021). The key question, here, is how much perceptual learning, relevant to the body schema, is accomplished in utero. 7 This is not conclusive, of course. The body schema might not begin to develop until infancy due to the absence of 7 relevant sensory input or insufficient brain maturation. To fully address (i), we would need to find activation in brain areas associated with the body schema (e.g., the posterior parietal cortex) beyond the homunculus. A difficulty with this approach is finding a method of locating brain activity with sufficient spatial resolution: Current research on neonatal brain activation during somatosensory stimulation uses EEG, which has relatively poor spatial resolution. More finely resolved technologies (e.g., fMRI) are difficult to use with infants.
We could also address (i) by examining behavioral evidence for or against action planning or awareness of limb positions in fetuses and very young neonates, as well as evidence for or against repetitive movements that could generate sensory information about the size and position of limbs in utero. These behavioral observations will, of course, bring interpretive difficulties and are not likely to be decisive.

Footnote 8: Indeed, if neonatal imitation is not supported, the innate endowment theorist will need to point to some achievement that implicates the body schema prior to the child having sufficient sensory experience to acquire the body schema. Taking neonatal imitation off the table extends the timeline for learning from sensory experience and puts the innate endowment theory back into competition with the traditional view as well as the body babbling hypothesis.

2. Information and the Minimal Bodily Senses

To begin, we need a way to assess the potential sensory contributions to the formation of a body schema in utero. The sensory contributions come from the bodily senses – touch, proprioception, and equilibrioception. (Visual input – the other primary sensory contributor to body representations – is too impoverished in utero to be much help (but see §5.1).) I understand these sensory contributions in terms of body-relevant information supplied to the body schema. It will be helpful, then, to specify what counts as a relevant bodily sense in this context, what is meant by information, and how that relates to representation (which is, after all, what the body schema is).

The relevant notions of these bodily senses are not the mature proprioception, touch, and equilibrioception of adult experience. They are, rather, the earliest developmental stages of what will later become these mature senses. The evidence that fetal movements play a role in the somatotopic organization of S1, and the fact that there are distinct regions of the primary somatosensory cortex (S1) for processing proprioceptive and tactile inputs (Brodmann areas 3a and 3b, respectively), suggest that these input streams are first organized separately and are integrated at a later stage of development. Therefore, it will be advisable to take extremely minimal notions of the senses – first in isolation, then in combination – as our starting point. As a first pass, we can say that the minimal bodily senses receive input from the stimulation of their peripheral receptor organs – e.g., pressure on the skin for minimal touch, stretching of muscle fibers and tendons for minimal proprioception, mechanical stimulation due to acceleration of hair cells in the vestibular labyrinths for minimal equilibrioception – and engage in only shallow processing thereof, prior to any interactions with other sensory inputs.

Regarding the information carried by the minimal bodily senses: A perceptual mechanism, M, carries information about a state of affairs, S, just in case some activity, A, of M makes it more or less likely that that state of affairs obtains (i.e., Pr(S) ≠ Pr(S|A)). To carry information about a state of affairs is not necessarily to represent that state of affairs, but our ultimate goal is a representation of the body's current posture (the body schema). On the framework I developed in chapter 2, the representational content attributable to some state of a perceptual mechanism is derived from a subset of the information carried by that mechanism.
It is that subset of information that is used by cognitive processing to perform some aspect of its function. Therefore, I will concentrate only on the information available for use – what I'll call encoded information – and will focus on the ability of the minimal bodily senses to encode information concerning the position of body parts relative to one another. Given what I have said about the relationship between encoded information and representation, this is a necessary precondition of the formation of a representation of the current body posture (the body schema). Since the role of the body schema is to make the current bodily posture available to motor processing, encoded information about bodily position will be represented in the body schema because it will be used by, e.g., motor planning. If the relevant information could be encoded by the minimal bodily senses, it could underwrite a dynamic body representation corresponding to the body schema. The activity of a perceptual mechanism will encode information in the following condition.

Perceptual Encoding. For an activity, A, of a perceptual mechanism, M, information, I, about a state, S, is encoded iff (i) activation of the peripheral sensory receptors causes M to do A, and (ii) the occurrent stimulation causing the activation of the peripheral receptors, any algorithms applied to that stimulation up to that point in perceptual processing, and any background information relied on by those algorithms collectively determine I.

The reason for the determination requirement is to ensure that all the background information contributing to the perceptual processing is accounted for as such – as opposed to smuggling it into information attributed to the sensory stimulation by appeal to something like Dretske's channel conditions (1981, pp. 115-116). These correlations will be accounted for in the analysis, but in a way that makes the question of whether the information is learned or innate tractable: Once we have disentangled the contributions of occurrent sensory stimulation from those of

Footnote 9: Accounts differ on their characterization of the relevant information and functions, but this generalized picture will do for present purposes. Also note, perceptual representations are representations that some state of the world obtains. The probabilistic component is not generally thought to be part of the representational content, though it may continue to have an impact on metacognition about one's perceptual states (e.g., how confident one is that the world is as perception represents it to be).

Footnote 10: Some authors have referred to information made available to further processing (and hence encoded) as explicit information, and to information that is not passed along to further processing as implicit information (e.g., Shea, 2015, p. 79). Implicit/explicit information, on these characterizations, is mechanism-relative – information that is available internally to the mechanism might not be made available by its output – and comes apart from implicit/explicit representational contents. For instance, Kirsch (2003) draws the distinction between implicit and explicit representational contents based on how costly it is for further processing to decode the information (making the implicit/explicit characterization a matter of degree rather than a sharp binary). This restricts representational contents – explicit and implicit – to encoded information (in line with my characterization, above).
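Before moving on, the probabilistic notion of carrying information used above can be made concrete with a toy numerical sketch. The numbers are invented purely for illustration (they are not from any study or from the dissertation): a binary world state S (pressure on a patch of skin) and a binary activity A of a mechanoreceptor, with information-carrying checked by comparing Pr(S) and Pr(S|A).

```python
# Illustrative only: a toy joint distribution over a world state S
# (pressure on a skin patch) and a transducer activity A (firing).
# The probabilities are invented for the example.
joint = {
    ("pressure", "firing"): 0.32,
    ("pressure", "quiet"): 0.08,
    ("no pressure", "firing"): 0.06,
    ("no pressure", "quiet"): 0.54,
}

def marginal(var_index, value):
    """Marginal probability that one variable takes the given value."""
    return sum(p for outcome, p in joint.items() if outcome[var_index] == value)

def conditional(s_value, a_value):
    """Pr(S = s_value | A = a_value)."""
    return joint[(s_value, a_value)] / marginal(1, a_value)

p_s = marginal(0, "pressure")                     # Pr(S)   = 0.40
p_s_given_a = conditional("pressure", "firing")   # Pr(S|A) is roughly 0.84

# The mechanism's activity carries information about S just in case
# the two probabilities differ.
print(f"Pr(S) = {p_s:.2f}, Pr(S|A) = {p_s_given_a:.2f}")
print("Carries information about S:", abs(p_s_given_a - p_s) > 1e-9)
```

On this made-up distribution the firing raises the probability of pressure from 0.40 to roughly 0.84, so the activity carries information about the state in the sense defined above. Nothing yet follows about whether that information is encoded, in the sense of the Perceptual Encoding condition, or represented.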
To prevent confusion, I will avoid using ‘implicit’ and ‘explicit’ with respect to information, focusing instead on whether the relevant information is encoded. The relevant sense of determination is determination by the laws of nature or law-like regularities. It is beyond the 11 scope of this paper to spell out exactly how strict the regularity must be to count as ‘law-like’. It will, however, need to be quite strict to serve the purpose described below. 145 background information, we can further assess whether the background information could be acquired from perceptual learning (by observing regularities in past sensory stimulations) or if it must be innate (see below). The goal here is to identify the maximum possible contributions of sensory stimulation using the minimum necessary innate endowments required to arrive at the body schema in utero. The relevant in utero sensory stimulation will be that received by the minimal bodily 12 senses. Other sensory inputs – especially visual input – that are relevant to the formation of a body schema are available in only highly attenuated form until outside the womb. I will now say a bit more about the roles of occurrent sensory stimulation and background information in perceptually encoding information. Occurrent sensory stimulation. Recall that the minimal bodily senses are distinguished by the shallow processing (prior to interaction with the other senses) of stimulation of their peripheral sensory receptors. The peripheral receptors of each minimal bodily sense include: (1) a set of transducers that respond to some feature or other of the sense’s characteristic proximal stimulus, and (2) a receptor organ – a continuous surface throughout which the transducers are distributed. We can get a rough-and-ready individuation of the minimal bodily senses in terms 13 of receptor organ types: minimal touch’s receptor organ is the skin; minimal proprioception’s, What I am advocating is akin to a ‘how-possibly explanation’: an explanation of how a phenomenon could be 12 generated or sustained by a mechanism. How-possibly explanations are contrasted with how-actually and how- plausibly explanations; see Carver and Darden (2013, pp. 34-35). Typically, a how-possibly explanation decomposes the mechanism into its components and these components are given a functional analysis that explains the operation of the higher-level mechanism they compose (Bechtel 2008; Bechtel and Abrahamsen 2005; Craver 2002, 2003, 2005, 2009; Darden 2002; Craver and Darden 2001, 2005, 2013). I am recommending that we look to the functional contributions of proposed inputs to the body schema-generating mechanism as a means of getting a better understanding of the operation of that (high-level) mechanism – a mechanism we have yet to locate in the brain. For a well-known example outside of the bodily senses, think of rods and cones (transducers) on the retina 13 (receptor). 146 the connective tissues of the musculoskeletal system; minimal equilibrioception’s, the vestibular labyrinths. In the case of minimal equilibrioception, there are multiple receptor organs of the 14 same type – the left and right vestibular labyrinths – whose inputs must be coordinated. The transducers of the bodily senses are the specialized mechanoreceptors found in their respective receptor organs. These mechanoreceptors transduce mechanical energy into neural signals. 
The (change in) activation of an individual transducer (e.g., a mechanoreceptor in the skin) will encode the degree of (change in) stimulation by its characteristic stimulus (e.g., pressure): The degree of stimulation is both the state (S) about which information is being encoded and the source of the occurrent stimulation to the transducer (M). The transducer responds to this stimulation in a law-governed way by converting the mechanical stimulation into neural impulses (A). Given this, Pr(S|A) > Pr(S), so M’s performing A carries information (I) about S. And, given that M’s performing A is a law-governed response to the stimulus, the occurrent stimulation of the transducer nomically determines – and hence encodes – I. (No background information or algorithms are brought to bear at this stage.) Of course, very little is gained by looking at the activation profiles of single transducers – a single receptor in the skin, for instance, will only respond to very localized stimulation. It is only when (changes to) the activation states of receptors working in concert are considered that we begin to see reliable and informative input concerning the stimulation of the bodily surface. So we will need to posit algorithms for tracking the activation states of multiple receptors on the Transducer type factors in distinguishing the minimal senses from one another when transducers on a single 14 receptor organ respond to different sorts of stimuli. For example, the skin contains distinct transducer types for pressure, pain, and heat. We want to be able to consider each of these individually before considering their interactions. Here we will just be concerned with tactile pressure sense (the receptors of which are the most spatially sensitive of the three.) Individuating minimal modalities just by transducer types is also unworkable: The various mechanoreceptors (sensitive to mechanical energy/pressure) in the skin (associated with touch) are roughly identical to those found in the musculoskeletal system (associated with proprioception). 147 particular organ throughout which they are distributed. This is why the physical continuity of the receptor organ is important. It guarantees that its transducers will behave in a law-governed way (given the physical interaction of the receptor with the stimulus and the distribution of transducers on the receptor) that can be exploited by these algorithms. Clearly, then, algorithms 15 will depend on some background information. Background information. The algorithms operating on the sensory inputs to the minimal bodily senses utilize background information that allows sensory processing to transform the information made immediately available by occurrent sensory stimulation into more useful information. For instance, background information about the distribution of transducers in the skin is necessary to extract the shape of a stimulus pressing on the skin from the occurrent sensory stimulation of pressure-sensitive mechanoreceptors. These algorithms can also apply to distinct receptor organs of a single minimal sense (e.g., the vestibular labyrinths) and – to effect the integration of the bodily senses – to receptor organs of distinct minimal bodily senses. Some of this background information could be learned – e.g., information concerning invariant features of the perceptual apparatus can be acquired given the law-like regularities they impose on stimulation states. Other background information might be innate. 
To illustrate with an example from audition: A mechanism translating time differences in the arrival of a sound wave at the left and right ears into directional information will require information concerning the interaural distance and the unity of the stimulus. The mechanisms translating interaural time differences into directional information are conditioned by visual inputs (Bajo et al. 2010; Bajo and King 2013; Brainard and Knudsen 1993; Budinger et al. 2006; Feldman and Knudsen 1997; Peterson and Schofield 2007). The implication is that learned correlations between visual, auditory, and proprioceptive stimulation are used to derive the interaural distance, a (relatively) invariant feature of the perceiver's perceptual apparatus. The assumption of the unity of the stimulus is more likely to be innate.

Turning back to information more directly relevant to the body schema: The body schema will need background information about the spatial relationships between mechanoreceptors of minimal touch and minimal proprioception, and the size and shape of the intervening limb segments, to transform information encoded by the occurrent sensory stimulation of proprioceptors into encoded information about the body's current position. This background information might be acquired (e.g., via body babbling in utero) or it might be the result of an innate endowment. Given our task of assessing the maximum possible contributions of perceptual learning to the formation of the body schema, we are only warranted in attributing innate information about such correlations once we have exhausted the possible contributions of occurrent stimulation and acquired background information but still find that the correlations are needed. And so, before attributing innate information, we must first consider whether there is a sufficient degree of uniformity in correlations obtaining among some subset of past stimulations to support learning of, e.g., spatial relations between sensitive portions of the skin from the regularities in adjacent stimulations. If there is not, but the most perspicuous functional analysis of the perceptual mechanism requires this information, then we are warranted in attributing it as innate information. In this way, we can account for the correlations that have been ruled out by the determination condition by including information about these correlations in the background information captured by the analysis. And it allows us to do so in a way that minimizes innate attributions while maximizing the contributions of sensory input, which is exactly what we need to assess the maximum possible contributions of sensory stimulation to the acquisition of a body schema in utero.

Footnote 15: Contrast with distinct (non-continuous) receptors of a given type: Vision (with its retinas) and audition (with its cochleae) are both subject to illusions arising from distinct stimuli being presented to each of the modality's receptor organs (e.g., by stereograms or stereo headphones).

Footnote 16: Insofar as this information is derived from learned correlations between visual, auditory, and proprioceptive input, and this input includes a mapping of points in visual space to interaural time differences, there will be no need to represent the interaural distance (i.e., use it in the course of perceptual processing).

Footnote 17: The account of encoding for innate contents remains an open question. For present purposes I will answer it obliquely (see below).
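Returning to the audition example above: the dependence on background information can be made vivid with a deliberately simplified calculation. This is a textbook-style approximation, not the dissertation's own model, and it assumes a distant source and a straight-line path between the ears. The point it illustrates is that the computation from time difference to direction cannot run without the interaural distance supplied as background information.

```python
# A minimal sketch (not the dissertation's model): recovering source azimuth
# from an interaural time difference (ITD) under the simplest straight-line,
# far-field approximation, ITD = (d / c) * sin(azimuth).
# The key point: the same ITD yields different directions for different
# assumed interaural distances d, so d must be supplied as background
# information for the algorithm to deliver directional information.
import math

SPEED_OF_SOUND = 343.0  # meters per second, roughly, in air

def azimuth_from_itd(itd_seconds: float, interaural_distance_m: float) -> float:
    """Return source azimuth in degrees (0 = straight ahead, positive = toward the ear the sound reaches first)."""
    s = (itd_seconds * SPEED_OF_SOUND) / interaural_distance_m
    s = max(-1.0, min(1.0, s))  # clamp against measurement noise
    return math.degrees(math.asin(s))

# The same 0.3 ms time difference, interpreted with two different assumed
# interaural distances, places the source in two different directions.
print(azimuth_from_itd(0.0003, 0.21))  # about 29 degrees
print(azimuth_from_itd(0.0003, 0.30))  # about 20 degrees
```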
Minimal Proprioception and Minimal Touch

We are now in a position to assess the possible contributions of the minimal bodily senses to the formation of the body schema. I begin with minimal touch and minimal proprioception, as these provide much more of the relevant information concerning body structure needed for the body schema than does minimal equilibrioception.

3.1. Minimal Touch

The mechanoreceptors of minimal touch are found in the skin and are responsive to pressure, torsion, and tension. While our ordinary sense of touch gives us the perception of shape, size, and surface texture of the objects manipulated, this requires information regarding the size and shape of the body part receiving the tactile stimulation as well as information about the spatial distribution of the stimulation of mechanoreceptors made by the pressure exerted by these objects on the skin's surface. Moreover, it requires information about the current position of the body parts receiving the stimulation: stimulation of the palm and underside of the fingers indicates one shape when the hand is held flat and another when it is cupped. That is, ordinary touch requires the body schema. The question before us is how tactile stimulation, prior to the development of the body schema, can contribute to the formation of that representation. In particular, the question is what minimal touch – the sense of touch we can expect fetuses to have – can contribute to the body schema. Of particular interest will be the ability of minimal touch to contribute to the acquisition of information about the size and shape of the body, which must combine with information about joint angles to determine the present bodily position (the very thing the body schema is supposed to represent).

Footnote 18: Different receptor types respond to different sorts of stimulation on different timescales (slow and fast adapting). Fast adapting: Meissner's corpuscles detect continuous movement along the skin's surface via low-frequency vibration. Pacinian corpuscles respond to pressure and vibration. Slow adapting: Merkel's disks respond to light pressure. Ruffini endings respond to stretch. Ruffini endings also provide input to proprioception. We will largely pass over them in minimal touch, returning to them when we consider the combination of minimal touch and minimal proprioception.

Footnote 19: The claim here is that this information is necessarily a part of the (probably) subpersonal processing underwriting tactile experience. It is not that ordinary tactile experience, itself, is mediated by conscious representations of, e.g., the shape of surfaces pressing on the skin.

Imagine a pencil laid lengthwise along your forearm. The pencil stimulates a number of pressure sensors embedded in the skin. Minimal touch encodes the degree of stimulation at mechanoreceptors a, b, and c, along the length of the pencil, but not the relative lengths of ab or bc, or whether these lie along a single straight line, turn a corner at some angle, or lie along an arc. Indeed – for all we've said – minimal touch won't even tell us that the stimulus is continuous between a, b, and c. That is, it won't differentiate stimulation by the pencil from stimulation by a comb whose teeth are spaced so as to fall right at the mechanoreceptors (holding the pressure of the stimulations constant). Our algorithms will be no help here: No set of discrete transducer states (of minimal touch) can distinguish the two stimulations.
And so occurrent stimulation of minimal touch won't encode the continuity of the stimulus/stimulated region of skin along abc, given the determination criterion for encoding.

However, there is more to say: These mechanoreceptors have overlapping receptive fields (RFs). Each RF is an area of the skin, stimulation of which will activate its associated mechanoreceptor. The mechanoreceptor will be most sensitive at the portion of the RF closest to it, but will respond to the immediately surrounding areas as well (fig. 4.2). Given the overlap of the RFs, differences in activation of a mechanoreceptor, due to the location of the stimulus relative to its RF, can underwrite the learning of continuities between RFs. Trace a finger along your forearm. At any given time, your fingertip will stimulate multiple overlapping RFs. As you move your finger, it will leave the RFs of some mechanoreceptors, remain in the RFs of others, and enter the RFs of previously unstimulated mechanoreceptors. Learning the regularities in coordinated RF responses (self-initiated or otherwise) will establish the continuity of the RFs as background information that can be used in conjunction with algorithms that take input from multiple mechanoreceptors (with a continuous RF structure) and the occurrent stimulation of those mechanoreceptors. With these resources we can differentiate stimulation by the pencil (a continuous region of the skin) from stimulation by the comb (discrete points).

By extension, we can acquire background information about the continuity of the skin's surface from sensory input. (There are no areas of the skin not in the RF of some mechanoreceptor.) However, this falls far short of information about the size and shape of the body/body parts. Consider a creature, Feelix, whose only sense is minimal touch. Feelix will be able to learn that he has a continuous surface, and, to some extent, Feelix can use this information to localize stimulation on his skin. He will be able to encode the location of the stimulation relative to individual mechanoreceptors when, e.g., a pencil is laid somewhere upon the surface of his skin. He will encode that a continuous region of skin is stimulated, given the overlapping RFs. But no amount of stimulation of minimal touch, alone, will allow Feelix to acquire information about the spatial disposition of each mechanoreceptor to the next or the size and shape of the RFs: To get the distance of a stimulation in an RF from the location of the

Figure 4.2. RF architecture of the foot by mechanoreceptor type, from Strzalkoski et al. (2018, p. 1236). 'FA' stands for 'fast adapting', 'SA' for 'slow adapting'. 'I' indicates small RFs, 'II' large RFs. FAI mechanoreceptors are also known as Meissner's corpuscles, FAII as Pacinian corpuscles, SAI as Merkel's disks, and SAII as Ruffini endings. See n.18 for a description of each mechanoreceptor type's function.

Footnote 20: We can also appeal to hierarchies of RFs, with a later stage of processing having an RF that incorporates those of multiple mechanoreceptors. Such downstream mechanisms will implement the sorts of algorithms we rely on for encoding based on the stimulation states of multiple transducers. There is evidence for such hierarchical processing of tactile stimulation, with the RFs expanding as we ascend the hierarchy; at higher hierarchical levels, these larger RFs are tuned to particular spatial patterns of stimulation analogous to those in the hierarchical processing of visual forms.
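The learning proposal sketched above, on which regularities in coordinated receptive-field responses establish which RFs overlap, can be illustrated with a toy simulation. This is my own illustrative sketch, not a model drawn from the empirical literature: it invents a one-dimensional strip of receptors with overlapping RFs, sweeps a point stimulus along the strip (as when a finger is traced along the forearm), and reads adjacency off the co-activation statistics. Note what is and is not recovered: the learner gets which receptors are neighbors, but not the metric distances between them, which is precisely the limitation pressed in the Feelix discussion.

```python
# A toy illustration (mine, not from the empirical literature): receptors
# with overlapping receptive fields along a 1-D strip of "skin". A stimulus
# swept along the strip co-activates neighboring receptors, and adjacency
# can be read off the co-activation counts alone.
import itertools

RECEPTOR_CENTERS = [0.0, 1.0, 2.0, 3.0, 4.0]   # arbitrary units along the strip
RF_RADIUS = 0.9                                # radii large enough to overlap

def active(receptor_center: float, stimulus_pos: float) -> bool:
    return abs(stimulus_pos - receptor_center) <= RF_RADIUS

# Sweep a point stimulus along the strip in small steps ("tracing a finger").
pairs = list(itertools.combinations(range(len(RECEPTOR_CENTERS)), 2))
co_activation = {pair: 0 for pair in pairs}
for step in range(0, 81):
    pos = step * 0.05                          # positions 0.0 .. 4.0
    on = [i for i, c in enumerate(RECEPTOR_CENTERS) if active(c, pos)]
    for pair in itertools.combinations(on, 2):
        co_activation[pair] += 1

# Receptors that are ever co-active are treated as adjacent/overlapping.
adjacent = sorted(pair for pair, count in co_activation.items() if count > 0)
print(adjacent)   # [(0, 1), (1, 2), (2, 3), (3, 4)], i.e., neighbors only

# What is NOT recovered: the distances between receptors. Doubling all the
# center spacings (and the RF radius with them) yields exactly the same
# adjacency pattern, so co-activation statistics alone leave the size and
# shape of the stimulated surface open.
```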
Which is not to say that there are no limits to our ability to distinguish continuous from multiple discrete 21 simultaneous stimulations or that there could not be illusions of continuity introduced by some abnormal stimulus. However, the extensive overlaps in the RFs of the various mechanoreceptors mitigate the extent of these illusions (once the regularities in coordinated mechanoreceptor responses are learned). Praise or blame for the name goes to Jennifer Foster. 22 153 mechanoreceptor of that RF – which is a precursor to getting the size and shape of an RF and the spatial disposition of individual mechanoreceptors – we would need a law-like regularity between intensity of the stimulation and distance from the mechanoreceptor. But a light touch near the mechanoreceptor can yield an equivalent response to a stronger touch further from the mechanoreceptor. So for all Feelix knows, the continuous region of stimulation might be long or short, wider or narrow, curved, straight, angled, etc. Hence the size and shape of stimulated 23 regions of the skin is not determined by the occurrent stimulation and background information acquired from past stimulations of minimal touch. Therefore, it won’t be encoded. Given the foregoing, Feelix won’t encode the size and shape of his body parts or their present positions relative to one another, as is necessary for a body schema: Imagine Feelix as a bit of silly putty. Draw a boundary around some region of Feelix’s surface. Feelix would be able to register that a stimulation falls within this region, but he will not register the region’s shape or size. Furthermore, we could take another bit of silly putty of the same mass as Feelix but differently shaped. Feelix won’t be able to tell if he has his actual shape, the shape of the second bit of silly putty, or some other shape the silly putty could take. Now draw a border around two more regions. Call one region ‘hand’, another ‘head’, and a third ‘foot’, and imagine that Feelix moves in an amoeba-like manner. Feelix’s hand can move nearer to his head than to his foot or nearer his foot than to his head, or it can be between the two. Feelix will not be able to encode These results scuttle a recently proposed variant of the superficial schema: skin space (Haggard et al. 2017; Cheng 23 and Haggard 2018; Fardo et al. 2018; Cheng 2019). According to proponents of skin space, A superficial schema (skin space) is acquired by the stimulation of the skin without any additional input from prior representations of the body or other senses – including the other bodily senses – and this schema is sufficient for localizing and detecting the shape of stimulations on the skin’ s surface. This is tantamount to saying that skin space is acquired via minimal touch. And minimal touch will not give us a representation of the surface of the skin that would be able to support the detection of the shape of stimulations on the skin’s surface. The acquisitional claim and the shape-detecting capacities attributed to skin space cannot both be correct. 154 that he has taken on these different positions: Without background information concerning the resting shape of the body and the range of motion of its parts, Feelix can’t tell if increased tension registered by stretch detectors in the midlayer of the skin indicates that points on either side are brought closer together or farther apart. He will be able to encode information concerning the fact that he has moved but not how he has moved. 
Even if Feelix’s hand were to touch his head, it would be indeterminate whether this was because they came into contact or because they were both touched by some other object(s). Since Feelix’s perceptual resources correspond with minimal touch, what goes for Feelix (with respect to sensory encoding) goes for minimal touch. The possible contributions of minimal touch, alone, to the formation of a body representation are limited to the continuity of the skin’s surface. 3.2. Minimal Proprioception Minimal proprioception involves mechanoreceptors that are found in the connective tissue of the musculoskeletal system: the fascia of the muscles (measuring tension and length), the tendons (measuring tension), and in the joint capsule and ligaments (measuring stretch and torsion, mainly at the extremes of the joint’s range), and also in the midlayers of the skin (measuring skin stretch). 24 Each Golgi tendon organ has a transducer (sometimes more than one) that is sensitive to tension placed on the 24 tendon (which connects a muscle to a bone). Muscle spindles contain two types of transducer, one which measures the contraction of muscle fibers and one which measures the rate of change of muscle contractions. Ruffini endings/ SAII receptors in the skin (similar to Golgi tendon organs) track the degree and direction of skin stretch. Similar mechanoreceptors are found in the joints: Ruffini endings in joint capsules (which surround the joint and are filled with sinovial fluid) measure tension on membrane of the capsule. These are mostly sensitive at the ends of the joint’s range of motion, but some respond to intermediate states. Ruffini endings are found in ligaments, with a similar function. 155 The musculoskeletal mechanoreceptors cluster at the joints and lack the RF architecture seen with respect to tactile mechanoreceptors, so we shouldn’t expect minimal proprioception to encode the continuity of the surface of the body, the position of the limbs and joints relative to one another, or the size and shape inter-joint body segments. The possible contributions of minimal proprioception to the acquisition (and updating) of the body schema will concern, at most, joint angles and body positions derived therefrom. Skeletal muscles work in pairs – one muscle contracts to bend the joint a particular way, another to unbend it. Each muscle contains many muscles spindles with mechanoreceptors that encode length changes or tension in the muscle fibers. One of our algorithms can consider collections of these spindles to encode the overall tension/change in length within individual muscles. The tension on the tendons connecting the muscle to bone will be encoded by stretch- sensitive mechanoreceptors in the tendon organs found where the tendon connects to the muscle. Given the tracking of all muscle contractions relative to one another (via an algorithm) we might hope that information encoded by the receptors of two opposing pairs of muscles could determine information regarding joint angle (in the simple hinge-joint case) and, hence, that occurrent minimal proprioceptive stimulation would suffice for encoding joint angle. However, this information does not determine joint angle. There are many reasons for this underdetermination, several of them widely recognized. 
For instance, proprioceptive information won’t determine joint angle without background information concerning the size and weight of the limb segment moved (Craske et al., 1982; Gurfinkle and Levick, 1991; Longo, Azañòn, and Haggard, 2010; Longo and Haggard, 2010). Similarly for information concerning any resistance exerted by external impediments. The 156 amount of tension/muscle contraction required to move a limb some amount will vary with all these features (There will be more tension on the ligaments when one bends the arm 15° while holding a 10 pound weight than while holding nothing.) Other reasons for this underdetermination have received less attention. Consider a proprioceptive analog to Feelix, Bendji. Suppose that Bendji has a musculoskeletal system like ours, but his only sensory input comes from proprioceptors. When Bendji bends his knee, he will receive minimal proprioceptive input concerning the amount of contraction in the quadriceps and hamstrings and the amount of tension on the tendons attaching these muscles to the bone, the ligaments attaching the bones on either side of the joint together, the joint capsule, and the amount of stretch in the skin around the knee. But without background information concerning the resting angle of the joint, he will not be able to tell whether his knee is bent (as is normal for us) so that the lower leg and thigh are at a 90 degree angle or his knee is bent the same amount from a resting position where the lower leg extends forward 45° from straight (fig. 4.3). 157 Figure 4.3. (A) top: leg at normal resting position (equal tension on hamstrings and quadriceps); bottom: flexed 45° from resting to 90°. (B) top: leg at abnormal resting position; bottom: flexed 45° from resting to straight. Minimal proprioception does not distinguish A from B. Even worse, though Bendji could learn which groups of transducers collaborate (including those of oppositional muscle pairings) via acquired background information about correlations among transducer activations from past stimulations, he can’t locate them relative to one another because minimal proprioception doesn’t enable acquisition of information about transducer locations. So far as I know, this has not been noticed before, but until the clusters of proprioceptors associated with oppositional pairings are located, Bendji can’t tell that the contraction of one and stretching of the other is the result of bending a joint as opposed to a tug- of-war in which the two tendons are arranged in a straight line and attached to a floating ball in the center (rather than bone). If the muscle at the far end of one tendon contracts, pulling the ball toward it, the muscle spindles will encode a shortening of the muscle fibers while the spindles of the other will encode a lengthening, just as they would when bending a hinge joint. (Similarly for changes in tension, considered in conjunction with these changes in muscle fibre length.) Barring an appeal to innate background information – which we are presently trying to do without – Bendji must learn the spatial relationships between the transducers (e.g., whether the transducers of one muscle are situated as in a hinge joint or as in a tug-of-war relative to its oppositional partner). We encountered a similar problem with respect to minimal touch/Feelix. There the RF architecture enabled learning the continuity of the skin’s surface and adjacency relations among RFs, but there is no analogous RF architecture in minimal proprioception. 
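Before continuing with Bendji, the underdetermination of joint angle illustrated in fig. 4.3 can be put schematically. This is my own sketch, not a physiological model: it treats the minimal proprioceptive signal as reporting only flexion relative to the resting posture, and shows that the same signal is compatible with different absolute joint angles until the resting angle is supplied as background information.

```python
# A schematic illustration (mine, not a physiological model): suppose the
# minimal proprioceptive signal reports only the change from the resting
# posture. Without background information about the resting angle, the same
# occurrent signal is compatible with different absolute joint angles
# (the point of fig. 4.3).

def absolute_joint_angle(flexion_from_rest_deg: float, resting_angle_deg: float) -> float:
    """Absolute knee angle = resting angle minus the flexion reported by the spindles."""
    return resting_angle_deg - flexion_from_rest_deg

signal = 45.0  # all the minimal sense delivers: 45 degrees of flexion from rest

# Two hypotheses about the unsensed resting posture:
print(absolute_joint_angle(signal, resting_angle_deg=135.0))  # 90.0  (as in fig. 4.3A)
print(absolute_joint_angle(signal, resting_angle_deg=225.0))  # 180.0 (as in fig. 4.3B)

# Same occurrent stimulation, different body positions: the absolute angle
# is not determined until the resting angle is supplied as background
# information, so it is not encoded by minimal proprioception alone.
```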
So, Bendji is in an even worse position than Feelix with respect to learning the spatial distributions of his mechanoreceptors. What goes for Bendji goes for minimal proprioception, generally. In contrast to our everyday notion of proprioception, minimal proprioception does not encode the position of body parts relative to one another. It does not encode any postural information by itself. However, adding the resources of minimal touch to those of minimal proprioception will partially rectify the shortcomings of minimal proprioception (and vice versa).

3.3. Minimal Proprioception+Touch

We were left with two primary problems: (1) how to get information about the direction and distance of points on the skin's surface relative to one another, given the limitations of minimal touch, and (2) how to encode joint angle, given the limitations of minimal proprioception. These intersect at: (3) the problem of acquiring background information about the structure of the body to be combined with occurrent stimulation of the bodily senses to form the body schema.

It seems likely that (3) can be solved – at least in a rough fashion – by combining the resources of minimal proprioception and minimal touch. The idea is that a systematic exploration of the correlations of variations in stimulation of the mechanoreceptors of the musculoskeletal system (encoded by minimal proprioception) and those of the skin (encoded by minimal touch) will allow the acquisition of background information about the size and general location of the body parts relative to one another. This process is facilitated by the fact that minimal proprioception and minimal touch both take input from the stretch-sensitive mechanoreceptors (transducers) in the midlayer of the skin, thus ensuring that the two systems are spatially integrated.

Tracking systematic correlations in proprioceptor and tactile receptor stimulations will allow us to learn the locations of proprioceptor groupings in relation to skin surfaces (given spatial integration) – i.e., we will be able to learn their spatial distribution relative to the skin and, therefore, learn that the groupings of coordinated proprioceptors cluster together. And we will be able to learn the location of these clusters relative to the joints, guaranteeing that oppositional pairings actually move a joint rather than engage in a tug-of-war.

Figure 4.4. (A) Straight forearms, straight upper arms; (B) curved forearms, straight upper arms. Changes in the shoulder angle as one slides the left arm downward (lower image) from the starting position (upper image) are greater for B than for A.

Footnote 25: Shared transducers – in this case Ruffini endings (SAII) – pose no problem for the minimal senses approach provided that both minimal senses respond to the same class of energy and that the physical continuity of the receptor organs in which the transducers are embedded is secured. Both conditions are satisfied in this case: The transducers of minimal proprioception and touch both respond to mechanical energy, and the dermis is connected to the deep fascia of the muscles by the connective tissue of the superficial fascia/hypodermis – press or stretch the dermis, superficial fascia, or deep fascia and the others will be impacted too. Alternatively, we could appeal to innate information or to some other modality to bring about the integration (e.g., vision), though this would pose problems: the congenitally blind do have an integrated body map.
To see how this would work, we can imagine a sequence of choreographed movements in which various body parts are moved along the surface of others. For instance, imagine pressing one’s palms together at the body’s midline. The various joint and muscle receptors of one arm will mirror the stimulation states of those of the other. Now raise the hands (still touching), by bending the elbows, until the inner forearms touch. The joint angles will be symmetrical and pressure on the forearms and hands will be (more or less) even. Bend the wrists back and slide the right forearm down along the left, until the wrist reaches the end of the elbow (fig. 4.4.A). Then reset and perform the corresponding action with the left arm. In this way we verify (or disconfirm) that the forearms are the same length: If one forearm were longer, portions of it not stimulated by the other at the start would come to be stimulated by it as the other began its slide. We can then run the forearms along each upper arm to verify that these, too, are the same length as one another. In the same way, we can establish that the forearms are (relatively) straight. Given uniform pressure applied at the joints (where the proprioceptors cluster), there will be (roughly) uniform pressure along the whole length of the segments brought into contact thereby if these segments are straight. If pressure falls off substantially, this indicates complementary curvature (or angles, depending on the rate of change in pressure along the relevant limbs). If there are regions without any pressure, then the limbs curve or bend away from one another. Our exploratory choreography can reinforce these results by comparing shoulder angles as each 161 forearm slides down the other (fig. 4.4.B). Similar exploratory movements will allow us to learn the size and shape of the hands. Continued explorations with the forearms and hands – once their relative sizes and shapes are learned – should be enough to establish the (rough) left/right symmetry of the body and the relative sizes and shapes of its parts. 26 By integrating the results of our exploratory choreography, we can acquire the background information regarding the size, shape, arrangement, and range of motion of our body parts that is needed to encode present body position from occurrent stimulation of the minimal bodily senses, without further inputs from the other senses or innate endowments. Furthermore, 27 the exploratory choreography just is a regimented variant of the body babbling hypothesis. Given sufficient experience with more or less random movements (becoming less random over time), this same background information could be acquired. And the resources appealed to here are available to the fetus in utero. It is at least possible, therefore, that the body schema is acquired in utero. 3.4. Limitations However, the body schema that can be acquired by minimal touch and minimal proprioception, alone, has shortcomings – particularly, with respect to scale and orientation – that must be overcome if it is to play its full role in action. Notice that this exploratory choreography depends on background information. For instance, it requires 26 information to the effect that such-and-such joint angle + pressure, along with changes to both over time, determine the size and shape of limb segments. 
However, the rigidity of inter-joint limb segments and their constant size, relative to the rest of the body, will be guaranteed by the response (or, rather, the relative lack of response) of the stretch detectors in the skin. De Vignemont (2014) argues that for the sighted bodily awareness is partially constituted by vision because vision 27 is less error prone than the bodily senses with respect to the metrical properties of the body. However, she does not deny that the blind can form a functional, if somewhat distorted, body representation (and so she doesn’t deny that my choreography would do its job). 162 The first problem is that most of our actions depend on our ability to scale our body to the external world, which (for us) requires integrating bodily space with visual (and auditory) space. This is the scaling problem. For example, Felix (our left fielder from the introduction) can only know how high he must reach to catch the fly ball if he has a sense of how tall he is and how long his arm is relative to the spatial dimensions of the visual scene. It comes as no surprise that such scaling isn’t available to minimal proprioception+touch, which does not have access to visual and auditory spaces. We might hope to solve the scaling problem by tracking changes in the body schema derived from minimal proprioception+touch as one moves through external space. However, to do so would require an implausibly accurate record of changes in the body’s size through time to arrive at a stable, body-independent measurement of, e.g., its stride length. Furthermore, we couldn’t distinguish between walking on solid, stationary ground and walking on a treadmill. That distinction will have to wait for vestibular input. And so, even if we could overcome the first difficulty, stride length won’t determine distance in extra-bodily space. Now for the orientation problem: Suppose a hybrid of Feelix and Bendji, call her FeeBee, whose sensory systems are limited to minimal touch and minimal proprioception, has acquired a body schema through exploratory movements (body babbling/our exploratory choreography). FeeBee could feel a tickle on her left arm and swat it away with her right arm, but she will not be able to know that it was her left arm that was tickled (or her right arm that did the swatting) from the standpoint of visually presented external space. The only spatial features that minimal proprioception+touch allows us to learn are spatial relations between parts of the body. And, where all we have access to are spatial relations internal to an object, we cannot determine 163 whether we are presented with the object or its mirror image – or its front/back or top/bottom inversion – from the standpoint of some external space. Minimal proprioception+touch, then, 28 cannot orient the dimensions of bodily space to those of extra-bodily space, as visually or auditorily presented. But this is crucial for a body representation – such as the body schema – that is used to plan and execute actions with distal objects. This is primarily a problem for creatures like us that need to integrate body space with the spatial frames of other senses. It will not matter to a creature with only bodily senses whether the touch is on its left arm from the standpoint of visual space. It will not diminish its ability to 29 swat away the touch on that arm with the hand connected to the other arm. 
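The point made above, that purely body-internal spatial relations cannot fix handedness, admits of a simple formal illustration. This is my own sketch, offered only to make the Kantian point already cited concrete: all the pairwise distances among the parts of a configuration are preserved under mirror reflection, so a representation built solely from such internal relations cannot distinguish a body from its incongruent counterpart.

```python
# A minimal sketch of the incongruent-counterparts point: pairwise (internal)
# distances are identical for a configuration and its mirror image, so
# internal spatial relations alone cannot fix left vs. right.
import math
from itertools import combinations

def pairwise_distances(points):
    return sorted(round(math.dist(p, q), 6) for p, q in combinations(points, 2))

# A toy asymmetric "body": head, left hand, right hand, foot, as (x, y) points.
body = [(0.0, 2.0), (-1.5, 1.0), (1.0, 1.0), (0.0, 0.0)]

# Its mirror image: reflect every point across the vertical axis (x -> -x).
mirrored = [(-x, y) for x, y in body]

print(pairwise_distances(body) == pairwise_distances(mirrored))  # True
# Every body-internal distance is preserved, yet this asymmetric configuration
# cannot be superimposed on its mirror image by any rotation and translation
# within the plane.
```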
There is no confusion about which arm is which in body space, just about which part of body space is on the left of external space (as visually or auditorily presented). But there is one external dimension to which an action-guiding body representation must be oriented, even for a creature with only bodily senses; namely, up/down. The reason for this is that the influence of gravity does need to be accounted for when executing actions. It takes more force to lift something than to drop it. Orienting the body relative to external space – especially relative to gravity – is also necessary for a complete description of the current body posture: While we don't need to orient body space in external space to know the disposition of body parts relative to one another, we certainly do need to do so in order to distinguish, e.g., whether the pressure on one's back is due to the fact that one has backed up against a wall or that one is lying on the floor. And which of these situations we are in will certainly have implications for action. We will now consider the extent to which vestibular input can mitigate these problems.

Footnote 28: The corresponding metaphysical fact has been well known at least since Kant's introduction of his incongruent counterparts (1768). See Van Cleve and Frederick (1991) for a sampling of the philosophical work on this topic after Kant. The point here is epistemological, as we are concerned with the acquisition of the body schema.

Footnote 29: Such creatures could, of course, learn the orientation of bodily space to visual space through cognitive means. For instance, if the creature has spatial hearing, then it could orient bodily space to auditory space. If it then comes to understand that there is a visual space and that it corresponds to auditory space, it could know how its body is oriented from the standpoint of visual space. Indeed, this is likely to be true of congenitally blind humans.

4. Minimal Equilibrioception

The receptor organs of minimal equilibrioception are the vestibular labyrinths, found in the left and right inner ears. Each vestibular labyrinth comprises three semicircular canals attached to a central structure called the vestibule (fig. 4.5).

Figure 4.5. The vestibular labyrinth. The anterior canal (AC) and posterior canal (PC) respond to movement away from the center of the head in the plane to which they are oriented. The lateral (or horizontal) canal (HC) responds to horizontal rotations of the head. The utricle (utr) and saccule (sac) respond to linear acceleration/gravity. Diagram from Kingma and Van de Berg (2016, p. 4).

The vestibule contains the otolithic organs – a saccule and a utricle. The semicircular canals are composed of three loop-like structures, the anterior, posterior, and lateral canals, each oriented in a different plane. Each canal is filled with fluid that is disturbed by motions of the head in the plane to which it is oriented. Motion of this fluid
Algorithms that compare input from the otolithic organs to input from the semicircular canals (which are only sensitive to rotational acceleration) can thereby determine the direction of gravitational force (based on deviations from uniform acceleration). Otolithic stimulation registers ‘gravity’s pull’ when there is no additional input indicating self-motion (e.g., from the semicircular canals or, leaving minimal equilibrioception, from proprioception and vision). The otolithic stimulation Felix receives as 30 he stands in left field between plays is due to gravity, that which he receives as he runs and jumps to catch a fly ball is due to his own motion (and its interaction with gravity). It is clear that minimal equilibrioception won’t encode any information about the spatial structure of the body on its own. Nor will it encode body-independent distances that might help solve the scaling problem. Given the mechanical function of the otolithic organs, once speed stabilizes we will stop receiving input. Therefore minimal equilibrioception can encode the duration of a changing rate of acceleration, but not the duration of movement (let alone speed or Gravity is equivalent to uniform acceleration, and so what is strictly encoded, where psychologists and cognitive 30 scientists speak of ‘detecting gravity’, is the deviation from uniform acceleration/cancelation of gravitational force introduced by an obstacle (in this case, the membranes of the otolithic organs). 166 distance traveled). To illustrate, as the speed of a car in which we are riding stabilizes, the 31 otoliths cease to register our forward momentum and we no longer feel as though we are moving. Minimal equilibrioception won’t help with orientation, either. First, it does not have access to, and cannot acquire (on its own), the orientation of the left and right vestibular labyrinths relative to one another. As a result we cannot be sure that the left and right saccules respond to force exerted in the same plane. For example, if the saccules are oriented at a 30° angle relative to one another, then equal stimulation of the saccules would place the direction of gravitational pull by a stationary agent 15° off of what it would be were the saccules in line. Therefore, a given pairing of stimulations in the left and right saccules underdetermines which way is down. Similarly for the detection of front/back and left/right from otolithic stimulation. 32 The second problem is that minimal equilibrioception doesn’t have access to, and cannot acquire, the orientation of the vestibular labyrinths in bodily space. So, even if there was a guarantee that, e.g., the saccules were oriented in the same plane, there is still no telling (from the standpoint of minimal equilibrioception) the angle at which that plane intersects bodily space. The upshot is that minimal equilibrioception merely encodes the presence of forces in indeterminate directions. But it can make important contributions to the acquisition of body representations once it is combined with the other minimal bodily senses. 33 The otolithic membrane is heavier than the surrounding structures of the otolithic organs and so responds more 31 slowly to acceleration, but shortly after stabilization, the membrane will catch up to the rest of the organ and stop responding. There is a corresponding problem for the orientation of the semicircular canals and vestibule to one another within 32 a single vestibular labyrinth, as well. 
(As we will see below, this orientation could be acquired and so there is no need – according to the approach we are taking here – to appeal to innate information about this orientation.) That vestibular input primarily augments the other senses should be unsurprising given the fact that there is no 33 dedicated portion of the cortex for processing vestibular input but we do see vestibular inputs routed to the regions of cortex traditionally treated as dedicated to other senses (somatosensory cortex, motor cortex, visual cortex). This contrasts with the other minimal bodily senses considered here, proprioception and touch, which are associated with areas 3a and 3b of S1, respectively. 167 4.1. Minimal Proprioception+Touch+Equilibrioception: Orientation Though minimal equilibrioception cannot encode the direction of gravity, it could encode the law-like covariation of activation states of the vestibular labyrinths. And, by considering the total stimulation of the minimal bodily senses, we could learn how these law-like regularities in vestibular activation correlate with bodily movements tracked by the rudimentary body schema derived from minimal proprioception and minimal touch. For example, as the fetus rotates, it will receive tactile stimulation of the body parts that contact the uterine wall that will progress in the opposite direction of the fetus’s movement. This tactile stimulation will correlate in a law-like way with stimulation of the mechanoreceptors in the semicircular canals (and registration of the position of the head relative to the rest of the body). This would allow the fetus to acquire background information orienting minimal equilibrioceptive stimulation within bodily space. This, in turn, would enable the encoding of the orientation of one dimension of bodily space in (visually presented) external space, but only this one dimension – namely, up/down. This is so because gravity provides an asymmetry between up and down that allows down to be differentiated from up. There is no analogous asymmetry for left/right or front/back. While we will be able to match forward momentum with momentum in the direction of the front of the body – just as we will leftward motion with one side of the body – that will not determine which direction is forward and which is back in visually or auditorily presented body-independent space any more than it will which is left and which is right. 34 It’s a further question whether there is any fact distinguishing external space from its mirror image. 34 168 Ultimately, each dimension of bodily space represented in the body schema must be oriented with respect to (objects presented in) the other spatial frames if it is to play its proposed role in action. It is unsurprising that we cannot achieve such a complete orientation with visual and auditory (and perhaps other) spaces without visual and auditory (and perhaps other) input. 4.2. Minimal Proprioception+Touch+Equilibrioception: Scale Despite the fact that it won’t enable encoding of the direction of travel in extra-bodily space, integrating minimal equilibrioception with a body schema derived from minimal touch+proprioception and occurrent minimal proprioceptive stimulation could allow us to encode that we are moving through external space. 
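The role just described for comparisons across vestibular (and other) inputs can be illustrated with a deliberately simplified sketch before turning to the fetal-rotation example that follows. This is not the dissertation's algorithm, and the sign conventions are simplified: the otolith signal is treated here as the vector sum of gravity and self-produced linear acceleration, so gravity can be recovered only given an independent estimate of self-motion, which, as argued above, minimal equilibrioception alone does not supply.

```python
# A simplified sketch (mine, not the dissertation's algorithm) of the
# gravito-inertial ambiguity. On the simplifying convention used here, the
# otolith signal is the vector sum of gravity and self-produced linear
# acceleration, so isolating gravity requires an independent estimate of
# self-motion (e.g., from the semicircular canals and the other senses).

def estimate_gravity(otolith_signal, estimated_self_acceleration):
    """Subtract the estimated self-acceleration from the otolith reading
    (all values in m/s^2, as (x, y, z) head-centered vectors)."""
    return tuple(o - a for o, a in zip(otolith_signal, estimated_self_acceleration))

# Case 1: other inputs indicate no self-motion, so the whole otolith signal
# is attributed to gravity; "down" is tilted relative to the head, so the
# head must be tilted.
print(estimate_gravity((1.0, 0.0, -9.76), (0.0, 0.0, 0.0)))   # (1.0, 0.0, -9.76)

# Case 2: the very same otolith reading, but other inputs indicate a forward
# acceleration of 1 m/s^2; the recovered gravity vector now points straight
# down, so the head is upright and the tilt-like component was self-motion.
print(estimate_gravity((1.0, 0.0, -9.76), (1.0, 0.0, 0.0)))   # (0.0, 0.0, -9.76)

# Without the second argument, which minimal equilibrioception alone cannot
# supply, the otolith signal underdetermines which way is down.
```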
For example, by integrating changes in body shape associated with locomotion that are tracked in the rudimentary body schema (via minimal proprioceptive input), pressure on the soles of the feet as they push against the ground (thanks to minimal touch), and stimulation of the saccules as we move forward (via minimal equilibrioception), we could encode that we are walking through external space, as opposed to walking on a treadmill. However, this will not be enough to solve the scaling problem. The reasons here are the same as those given in §3.4: Scaling the body to a body-independent space with only a body-based frame of reference, where (a) bodies change over time and (b) the spatial relationships between the individual transducers of each bodily sense need to be learned, would require an implausibly detailed record of past body representations updated with frequent recourse to something like our exploratory choreography. Nothing contributed by minimal equilibrioception changes this – not even the fact that we can now encode the duration of travel. If one's strides aren't already scaled to external space, the fact that one walked for so many minutes, taking such-and-such a number of strides, while experiencing some particular otolithic stimulation, won't determine a body space-independent distance travelled. Resolving this indeterminacy of worldly distance will require input from elsewhere. In particular, it will require input that allows us to scale body space to an independently presented extra-bodily space. This is not one of the possible contributions of body representations acquired just through the minimal bodily senses.

Footnote 35: The response of the otolithic organs to linear acceleration is transient – the response of the organs subsides shortly after the speed stabilizes (see n.31). When we stop moving or slow down, we will get activation in the opposite direction of travel, due to inertia. These changes are law-governed and will correlate with bodily movements associated with locomotion, so we will (diachronically) encode duration of movement.

Footnote 36: And some independent assurance that the objects against which we measure ourselves remain a constant size/are in fact the same objects at each instance of measurement.

5. Implications and Future Directions

5.1. The Origins of the Body Schema

A fairly substantial body schema could be acquired via perceptual learning in utero – at least this cannot be ruled out on conceptual grounds. But this is not a fully mature body schema. The scale problem and aspects of the orientation problem remain unsolved, so long as we are restricted to sensory input from the minimal bodily senses. This limits the sort of imitation that the present-at-birth body schema could support, should imitation results be vindicated. For instance, Meltzoff and Moore (1989) conducted a study that purports to show imitation of clockwise head rotations (as opposed to counterclockwise rotations, which were taken to be indicative of tracking the experimenter's head movements) in infants no older than 72 hours. If the body schema results solely from body babbling without any visual or auditory support, the infants should not be able
While in utero visual stimulation is highly limited, it remains possible that the fetus has sufficient experience with shadows cast by a moving body part passing between the fetus's eyes and a light source strong enough to penetrate the womb to solve the orientation problem on the basis of perceptual learning. Similarly, though in utero auditory input is restricted to low-frequency components that are harder to localize, sufficient experience with alterations in auditory stimulation related to fetal movements might allow perceptual learning to solve the orientation problem with respect to auditory space. [Footnote 37: While the neural mechanisms responding to binaural auditory discrepancies are fully formed prior to birth, it remains probable that a period of learning (postnatally) is required to use binaural cues for sound localization (Muir et al., 1979; Muir et al., 1989; Clifton et al., 1981; Muir and Hains 2004).] This suggests that further research is needed on the potential role for vision and audition in perceptual learning in utero. Should such studies show that in utero visual and auditory input is insufficient to solve the orientation problem, and should imitation results requiring orientation to visual or auditory space be vindicated, then we can conclude that there must be some innate endowment involved in solving the orientation problem (in utero). [Footnote 38: Unless it can be shown that 72 hours is sufficient for the acquisition of a solution to the orientation problem from sensory inputs.] Notice, though, that this innate orientation mechanism – which need only orient the body schema in extra-bodily space – falls far short of the innate endowments posited by Gallagher and colleagues, which include innate representations of body parts.

Further study is also needed to determine how reliable the imitation results are. These studies should test a variety of orientation-dependent tasks. (Here we circle back to the evaluation of present-at-birth accounts, relative to the traditional view, in addition to helping clarify the role of innate endowments and perceptual learning in utero.) Relatedly, studies on non-imitation-based activities requiring orientation – preferably free of confounds such as social motivation – should be pursued. Similar studies can (and should) be directed towards the origins of the solution to the scaling problem, which seems less likely to have an in utero perceptual learning-based solution. Note, too, that my account has not factored in constraints on fetal movements imposed either by developmental stage or by the uterine environment itself. Further study on these issues will be crucial for assessing the extent to which in utero body babbling supports the formation of the body schema. [Footnote 39: See Hayat and Rutherford (2018) for an MRI protocol for studying fetal movements that accommodates these issues.] The account I have provided here lays the groundwork for such studies by specifying how to link particular movements to informational contributions to the body schema.

5.2. The Individuation of Cortical Body Representations

The minimal bodily senses approach also has promise for addressing the broader question of individuation of the (increasingly large number of) cortical body representations that have been proposed. In a recent review article, Longo (2016) offers a (non-exhaustive) list of six distinct candidates.
As Longo classifies and describes them, these are:
• body image: a conscious representation of the size, shape, and composition of the body
• body schema: a dynamic representation of the disposition of body parts relative to one another
• superficial schema: a representation used for localizing stimulation on the body's surface (the skin)
• body model: a representation of the metrical properties of the body subserving perception
• semantic body representation: a representation of conceptual information concerning the body
• structural body description: a consciously accessible cognitive representation of the body's structure used in making judgments about the spatial relationship of body parts to one another

There is little agreement on the exact number of distinct cortical body representations, their functions, and the extent to which they overlap. The least controversial of these, along with the body schema, is the body image. Even here, there have been challenges both to the distinctness of the body image from the body schema and to the unity of the body image itself. Evidence for the distinctness of the body schema and the body image derives from dissociations exhibited in bodily disorders such as anorexia nervosa (AN) and Alice in Wonderland syndrome (AIW). AN patients represent themselves as weighing more than they actually do, implicating the body image. However, they are not impaired in their movements, as they would be if the body schema were affected. AIW is generally characterized as feeling as though one's legs are shorter than they are, though there is (usually) no impairment in actions taken with the legs – e.g., walking. This also suggests that representations supporting the perception of and judgments about the body (body image) are distinct from those tracking the body's position for use in action planning and guidance (body schema).

Recent work calls into question the extent of these dissociations. Pitron and De Vignemont (2017) cite first person reports of AIW to demonstrate that, in some cases, there are impairments of action corresponding to the misperception of body size. This leads them to propose the 'co-construction model', on which the body schema – though distinct from the body image – can influence the updating of the body image, and vice versa, leading to mirrored deficits in both body representations. Cases of complete dissociation are explained by positing an impairment of the mechanism that implements the cross-representation updating.

Recent empirical work also suggests that the body image might not be best construed as a single representation. In a seminal paper, Schwoebel and Coslett (2005) divide the traditional body image into a structural body description – a representation of the boundaries and proximity relations of body parts, which they took to be derived from visual input – and the body image, proper – which houses conceptual and affective contents pertaining to the body (e.g., associations of tools with the body parts required to use them). Schwoebel and Coslett located the body schema in the posterior parietal cortex (PPC) and the body image and structural body description in (different parts of) the temporal lobe.
More recent work on body image disturbances in AN has clarified and extended our understanding of the brain areas contributing to the body image – e.g., the insula, which is thought to contribute affective components to the body image (Mohr et al., 2010; Lee et al., 2014; Araujo et al., 2015; Ehrlich et al., 2015; Via et al., 2018).

[Footnote 40: Gadsby (2018) has questioned the co-construction model on two fronts. First, he argues that it is doubtful that the default is to bring the contents of the body schema and body image into line, which is a presupposition of the co-construction model (hence the positing of a mechanism serving this function). Second, he argues that, if the co-construction model and its assumption of default correspondence are correct, we should see the majority of AIW cases evincing shared disturbances in the body image and body schema. But Pitron and de Vignemont's evidence suggests that shared distortions are the exception, not the norm (Gadsby 2018, p. 167). Pitron, Alsmith, and de Vignemont (2018) respond that the default needn't assume exact correspondence – particularly if discrepancies are a result of the distinct uses to which the representations are put (e.g., the body schema needs more fine-grained spatial resolution than the body image, given the former's role in action planning and guidance) (pp. 2-3). They then offer a more complete sketch of the putative co-construction mechanism. To this reader's eyes, the refined account leaves Gadsby's second criticism unanswered.]

[Footnote 41: Notice that this characterization lumps the body image, proper, together with what Longo calls the semantic body representation. Such lack of agreement about labels for putative body representations is common in this literature.]

Kanayama and Hiromitsu (2021) suggest that the structural body description is derived from the bodily senses-derived body schema and visual input pertaining to the body traditionally associated with the body image (2021, p. 141). As with the co-construction model, this suggests that one cortical body representation might well include or draw on (a part of) another. If that is correct, understanding the acquisition of one body representation (e.g., the body schema) can advance our understanding of others (e.g., the structural body description, insofar as it builds upon the schema). Furthermore, the recent work on AN – as a disorder of the body image – cited above suggests that body representations are constituted by distributed, probably overlapping, networks rather than isolated brain regions. Distinct portions of these networks seem to correspond to different sorts of information pertaining to the body (perceptual, affective, conceptual) from different sources (e.g., bodily senses, vision) – information that is relevant to a wide range of functions served by putative body representations (e.g., input from the bodily senses, information pertaining to body topography or body metrics, etc.). This speaks against treating the putative body representations as distinct, functionally demarcated entities.

Indeed, looking back at Longo's list, we can see that the differences between many of these putative body representations are quite subtle: The body image differs from the body model primarily in that the former is consciously accessible while the latter functions as a subpersonal basis for locating bodily sensations and supporting tactile perception.
The body model differs from the structural body description in that the latter is the basis for judgments concerning the spatial relations between body parts, which can come apart from the perception of these relations enabled by the body model, and in that the former is not necessarily consciously accessible. And the structural body description differs from the body image (on Longo's characterization) in that it does not contain content concerning the material composition of the body, only a representation of its spatial structure. Given the subtle differences between these representations and the overlaps in the sort of information they must encode, along with the distributed nature of cortical body representations, it seems likely that there is considerable overlap in the mechanisms implementing these (putatively) distinct representations or even that some of them are merely different functional elaborations on a shared underlying representation. [Footnote 42: This points to a general problem with using dissociations to distinguish body representations: functional shortcomings found in a given bodily disorder might be caused by damage to a body representation that specifically supports that function or by damage to a mechanism that accesses a prior, more general body representation supporting many functions. Similarly, different spatial resolutions might result from the functional requirements of a mechanism accessing an underlying representation of body size and shape. For instance, processing loads can be reduced by mechanisms that need less fine-grained information by treating a range of values as equivalent. See De Vignemont (2007; 2018, ch. 8) and Holmes and Spence (2006) for a thorough discussion of the shortcomings of current empirical approaches.]

Nevertheless, these representations are often treated as discrete, self-contained constructs. The evidence for this treatment – as with the body schema and body image – comes from dissociations. But the range of these dissociations is vast and can be interpreted to be consistent with both many discrete body representations serving different functions and a complex body representation network including mechanisms that draw on shared representations to perform different functions. Continuing to treat these putative body representations as discrete functional units threatens to obscure the actual overlaps and interactions within the overall body representation network. As Ho and Lenggenhager put it, speaking of AN: "By using the body image as a unitary construct, and not considering the perceptual, affective, and cognitive
And we can identify functional shortcomings relative to putative body representations in the same way that we identified the orientation and scale problems, above. Furthermore, given our relatively good understanding of early perceptual processing, and its associated pathways, clarifying the perceptual contributions to the formation and updating of body representations lays the groundwork for understanding how other inputs interact with perceptual components to form and update the complete body representation network. The fine-grained analysis of perceptual processing required by my approach – with its careful build up from occurrent stimulation through subsequent stages of processing – also provides guidance on where to look for the mechanisms implementing these (fine-grained) processing functions: We target functions of interest (e.g., those involved in orienting body space to extra-bodily space) and study brain activity related to that function (e.g., by fMRI scans of individuals in the process of adapting to inverting lenses) in light of what we already know about 177 the relevant perceptual processing. We can also study brain activity in regions antecedently thought to be involved in particular body representations with an eye toward the way these areas link up with perceptual processing. This will tell us what fine-grained processing functions they might be contributing to, given their observed activity in response to sensory inputs/experimental task performance. As our understanding of other sorts of inputs (e.g. affective) catch up to our understanding of perceptual processing, we can make similar inferences about the roles their associated brain areas are playing in the body representation system. 43 This, in turn, informs us about the maximum degree of overlap between the components of putative body representations. As such, this approach is better suited to understanding the nature of cortical body representations than the unitary constructs approach. If the body representation system is a network of overlapping representations with shared mechanisms/ contents, then the unitary construct approach obscures this fact. If the body representation system is, in fact, a collection of discrete, functionally differentiated body representations, then the approach I am urging will reveal this. Furthermore, a clearer picture of the neural mechanisms implementing body representations will help us to better understand the disorders of body representations that have been the source of so much theorizing about cortical body representations, hopefully leading to therapeutic advances. In short: Clarifying what the stimulation of the bodily senses can contribute to the formation of body representations better positions us to untangle the web of issues surrounding This will also advance our understanding of other body-involving spatial representations – e.g., peripersonal 43 space, the space immediately surrounding the surface of the body which is processed by input from the body schema and the exteroceptive senses (including vision and audition). As with other body-based representations, there is debate over the number of representations of such peripersonal spaces (de Vignemont 2018; Vagnoni and Longo 2019). And it will help us avoid the over-attribution of information that can be derived from sensory stimulation in the bodily senses in conjunction with (plausibly attributed) background information (see n.23 on skin space). 
the origins and individuation of these body representations with both theoretical and practical benefits. It is the first step in understanding how we go from Feelix to Felix.

Chapter 5
Vision, Vantage, and the Supposed Need for Action in Spatial Perception

ABSTRACT: An influential account of the perspectival nature of visual perception holds that the appearance of an object of vision is determined by relational properties of the object relative to a viewpoint (Green and Schellenberg 2018; Hill 2009, 2016, 2020; Noë 2006; Schellenberg 2008, 2010; Tye 2002). Another influential view holds that perceiving spatial features (e.g., shapes) of objects requires action/dispositions to act (Evans 1982; Hurley 1998; Hurley and Noë 2003; Noë 2006; Grush 2007; Schellenberg 2007, 2010). It is often claimed that the former view supports the latter. Schellenberg (2007, 2010), in particular, has argued forcefully that we can only transcend the appearance-determining viewpoint-relative properties presented to vision to perceive the intrinsic spatial features of objects if we possess a practical understanding of space rooted in dispositions to act. Other arguments purport to show that action is necessary to perceive viewpoint-relative properties in the first place (Evans 1982; Grush 2007; Schellenberg 2007, 2010). I will argue that, once we account for some frequently overlooked inputs to visual experience, action/dispositions to act are not necessary to perceive either viewpoint-relative or intrinsic spatial properties of visually presented objects. I argue further that the relational properties cited by perceptual relativists are not represented in a way that could determine the phenomenal appearances of visually perceived objects.

Contents
1. Perceptual relativity and action
1.1.what is perceptual relativity?
1.2.why does perceptual relativity (allegedly) support action-based views of perception?
2. The egocentric coordinates motivation
2.1.orienting up/down and left/right axes of egocentric space
2.2.scuttling the priority claim: action coordinates also depend on integration
2.3.distance in depth
3. The sensorimotor knowledge motivation
3.1.size
3.2.shape
4. The objective experience motivation
5. Appearance-determining relations are not represented
6. Conclusion

An influential account of the perspectival nature of visual perception holds that the appearance of an object of vision is determined by relational properties of the object relative to a viewpoint (Green and Schellenberg 2018; Hill 2009, 2016, 2020; Noë 2006; Schellenberg 2008, 2010; Tye 2002). Another influential view holds that perceiving spatial features (e.g., shapes) of objects requires action/dispositions to act (Evans 1982; Hurley 1998; Hurley and Noë 2003; Noë 2006; Grush 2007; Schellenberg 2007, 2010). It is often claimed that the former view supports the latter. Schellenberg (2007, 2010), in particular, has argued forcefully that we can only transcend the appearance-determining viewpoint-relative properties presented to vision to perceive the intrinsic spatial features of objects if we possess a practical understanding of space rooted in dispositions to act. Other arguments purport to show that action is necessary to perceive viewpoint-relative properties in the first place (Evans 1982; Grush 2007; Schellenberg 2007, 2010).
I will argue that, once we account for some frequently overlooked inputs to visual experience, action/dispositions to act are not necessary to perceive either viewpoint-relative or intrinsic spatial properties of visually presented objects. I argue further that the relational properties cited by perceptual relativists are not represented in a way that could determine the phenomenal appearances of visually perceived objects.

1. Perceptual Relativity and Action

On a popular way of understanding visual experience, visual appearances are determined by relations obtaining between the observer and the perceived object(s) (Hill 2009, 2016, 2020; Noë 2006; Schellenberg 2008, 2010; Tye 2002). The idea, sometimes called perceptual relativity, has a long provenance in philosophy (Berkeley 1713, 1734; Hume 1777). On another popular view of perception, perceiving the spatial features of objects (size and shape) requires action (Hurley 1998; Hurley and Noë 2003; Noë 2006; Schellenberg 2007, 2010; Grush 2007; Green and Schellenberg 2018). The former view is often thought to support the latter view (Noë 2006; Schellenberg 2008, 2010). My goal here is to show that, once we get clear on the features of the object, perceiver, and situation that determine the visual appearances – and, hence, the relational properties cited by the relativist – we have already accounted for everything needed to perceive intrinsic features of objects. There is no further need to appeal to action. Moreover, the relational properties cited by perceptual relativists are not represented in perceptual experience in a way that could determine the viewpoint-dependent phenomenal appearances of visually perceived objects. [Footnote 1: See Hill (2020, pp. 187-188) for a related, but weaker, role for action in providing access to intrinsic features of visually perceived objects.]

1.1. What is perceptual relativity?

The idea behind perceptual relativity is that our perceptual systems encounter the world under certain viewing conditions, including the location of the perceiver's viewpoint relative to perceived objects. These viewing conditions, together with the intrinsic (non-relational) features of the perceived objects, determine the visual appearances of those objects. These appearance-determining features can be – and are, on recent versions of relativism – characterized as objective, relational properties of the perceived objects (properties they have relative to specific viewing conditions). For example, an object seen from fifty feet away subtends a particular angle from top to bottom that determines its apparent height (Harman 1990; Huemer 2001; Noë 2006). And it subtends this particular angle from fifty feet because of its non-relational, objective height. But subtending that particular angle from fifty feet is one of its relational, objective properties: the property of subtending a particular visual angle when seen from fifty feet away. Similarly, a coin tilted away from a perceiver will cast an elliptical projection on the retina, so the coin will have an objective relational property of casting that elliptical projection when tilted at that angle relative to a particular vantage point. [Footnote 2: This is what is sometimes called a viewpoint-dependent property. To determine the visual angle subtended by an object on the vertical axis, imagine drawing straight lines from the topmost and bottommost points of the object to the center of the viewer's lens and measure the resulting angle.]
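The construction described in Footnote 2 is the familiar visual-angle geometry. The following is a minimal sketch (in Python) that makes it concrete; the function name and the sample heights and distances are illustrative choices of mine, not values drawn from the text.

import math

def visual_angle(object_height, viewing_distance):
    # Visual angle (in degrees) subtended from top to bottom by an object of the
    # given height seen head-on from the given distance (same length units for both).
    # This implements the construction in the footnote above: lines drawn from the
    # object's topmost and bottommost points to the center of the viewer's lens.
    return math.degrees(2 * math.atan(object_height / (2 * viewing_distance)))

# A 6-foot-tall object seen from fifty feet subtends roughly 6.9 degrees; the same
# object seen from twenty-five feet subtends roughly 13.7 degrees. The intrinsic
# height is fixed; the relational, viewpoint-dependent property (the angle subtended
# from a given distance) varies with the viewing distance.
print(round(visual_angle(6.0, 50.0), 1))
print(round(visual_angle(6.0, 25.0), 1))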
[Footnote 3: Or onto a 2D plane intersecting the line of sight of the so-called 'cyclopean eye', the viewpoint of fused stereo vision.]

Talk of appearances is often ambiguous between talk of phenomenal character and talk of representational content, so we should clarify just what is at issue in recent discussions of perceptual relativity. Present-day relativists (e.g., Hill and Schellenberg) are representationalists. Representationalists hold that, at a minimum, representational content determines phenomenal character. So a representation of the relational property (e.g., casting a given elliptical projection to a particular point in space) determines the phenomenal appearance of the perceived object (e.g., looking elliptical from the given viewpoint). Though probably the dominant view in the philosophy of perception, representationalism is not without its critics. So we would do well to find a less contentious principle connecting content and phenomenology that is, nonetheless, consistent with representationalism. I suggest that we make do with the claim that the content and character of a perceptual experience are each determined by a subset – though not necessarily the same subset – of the information carried in the perceptual processing resulting in that experience. The relevant notion of information, here, is correlational: The activity (A) of a perceptual mechanism carries information about some state of affairs (S) iff the probability of that state of affairs, given the activity of the perceptual mechanism, is different from the unconditional probability of the state of affairs: Pr(S|A) ≠ Pr(S). Given the fact that representational content and phenomenal character are determined by a subset of information carried in perceptual processing, any difference in phenomenal character or representational content entails a difference in information carried by perceptual processing. [Footnote 4: To say that an activity of a neural mechanism carries information about a state is not yet to say that the activity represents that state. For instance, on an informational teleosemantic view, that information will need to be properly related to some sort of selected function for it to be represented (Neander 2017; Shea 2018). This gives us an externalist account of information and representation that allows us to accommodate non-representationalist physicalist views. Local statistical regularities and selection history can lead to physically identical systems giving rise to experiences with different representational content but identical phenomenal character.]

For the relativist, this means that the pathway from the relation to the content or phenomenology it determines runs through the processing of information about the relation. It also gives us a test to see if a given piece of information is carried in perceptual processing: If some change in information about a feature of the world (including features of the perceiver's perceptual system) induces a change in the perceptual experience (content or phenomenology), then information about that feature is carried in the perceptual processing leading up to that experience. I will use this test to show that all the information needed to represent intrinsic spatial properties can be made available to perceptual processing without relying on action.
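For concreteness, here is a toy illustration of the correlational criterion just stated. The joint distribution, the state and activity labels, and the helper functions are all invented for illustration; nothing here is drawn from the text beyond the inequality Pr(S|A) ≠ Pr(S).

# Toy illustration of the correlational criterion: activity A carries information
# about state S iff Pr(S | A) differs from Pr(S). The joint distribution below is
# invented purely for illustration.
joint = {
    ("object_present", "cell_firing"): 0.30,
    ("object_present", "cell_silent"): 0.10,
    ("object_absent", "cell_firing"): 0.15,
    ("object_absent", "cell_silent"): 0.45,
}

def pr_state(state):
    # Unconditional probability of the state of affairs S.
    return sum(p for (s, _), p in joint.items() if s == state)

def pr_state_given_activity(state, activity):
    # Probability of the state of affairs S given the activity A of the mechanism.
    pr_activity = sum(p for (_, a), p in joint.items() if a == activity)
    return joint[(state, activity)] / pr_activity

print(round(pr_state("object_present"), 3))                                 # 0.4
print(round(pr_state_given_activity("object_present", "cell_firing"), 3))   # 0.667
# Since the two values differ, the activity carries (correlational) information about
# the state -- which, as Footnote 4 notes, is not yet to say that it represents it.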
Of particular importance to the discussion is the fact that the relevant information includes information about the physical structure and organization of the perceptual apparatus that influences appearances – a fact that is frequently overlooked in discussions of perceptual relativity. Some of this information will merely be used to transform inputs of a given stage of processing into its output without, itself, being part of the input or output. For example, the distance between the lens and the retina, in conjunction with accommodation (alterations in the lens thickness brought about by tiny muscles within the eye), will determine the sharpness of the retinal projection. High-resolution receptive field mechanisms in visual processing assume that a certain degree of sharpness can be obtained – hence they assume that a given accommodation at a set distance between lens and retina will yield a retinal projection with a spatial resolution as fine as or finer than the highest spatial frequency receptive fields of visual processing. And so there must be information regarding the distance between the retina and lens. Similarly, the photoreceptors on the retina are not evenly distributed. They are much more densely packed at the center. But an object seen in the center of view does not seem much larger than it does when seen in peripheral vision. So there must be information regarding the spatial distribution of the photoreceptors that allows the visual system to correct for this uneven distribution.

This does not require that this information is recoverable by later stages of perceptual processing or even that it is encoded by any neural activity – such information might be supplied by the physical structure of the perceptual mechanisms themselves. [Footnote 5: Such information is sometimes called 'implicit information', but we should be careful not to confuse this use of 'implicit' with the use that arises in discussion of implicit attitudes. The latter concern representational contents. On my view, informational content is prior to representational content. Given this, my claim is even weaker than weak representationalism. My claim is simply that perceptual processing – understood as a form of information processing – determines perceptual phenomenology and constrains (if not determines) conscious representational content (see below). This should be relatively uncontroversial. It can be accepted, for instance, by critics of representationalism such as Block (1990, 1996, 2010), who cite worries from inverted spectra, mental paint, and the like, by appealing to an (implicit) mapping of information to phenomenal character.] In particular, it does not require that a given bit of information carried by perceptual mechanisms is among the subset of information that determines the representational content/phenomenal character of the perceptual experience. Also note that, since the information determining content needn't be the same as the information determining the phenomenal character, some features of the perceptual system that alter phenomenology (e.g., distance between lens and retina) needn't be represented in visual experience. But again – though the information carried by perceptual processing might outstrip the information that determines the representational content of the resulting perceptual experience – the information determining the representational contents of the perceptual experience cannot outstrip the informational content carried by perceptual processing. Similarly for phenomenal character.
1.2. Why does perceptual relativity (allegedly) support action-based views of perception?

There are three primary motivations for the view that action is necessary for visual perception. The first motivation – call it the Sensorimotor Knowledge Motivation – holds that perceiving intrinsic shape from viewpoint-dependent shape requires sensorimotor knowledge of how the appearance of the object would change were we to move (Berkeley 1709, 1734; Hume 1777; Hurley 1998; Noë and Hurley 2003; Noë 2006). For example, in order to perceive the intrinsic circularity of a tilted coin, we need to know how its (supposedly) elliptical viewpoint-relative shape would change as we move about it. And this sensorimotor knowledge is gained through past actions.

The second motivation – call it the Objective Experience Motivation – also concerns the perception of intrinsic shape. It holds that – even if we could perceive the intrinsic size or shape of an object without appeal to action (e.g., sensorimotor knowledge) – we have no reason to do so unless we conceive of the object as an objective particular, as having existence independently of our experience of it. And this requires action (Schellenberg 2007, 2010; see Strawson 1959 for a precursor).

The third motivation – call it the Egocentric Coordinates Motivation – holds that action is necessary even to perceive viewpoint-dependent spatial properties. The idea here is that perceptual input can only have spatial significance in relation to the perceiver's potential interactions with the objects generating that input (Evans 1982; Grush 2007; Schellenberg 2007, 2010). This is true of, and indeed the motivation focuses on, viewpoint-dependent spatial properties or, as they are sometimes called, egocentric spatial properties.

I will address each of these motivations in turn. Since perceptual appearances are viewpoint-dependent for the perceptual relativist, and the perception of intrinsic spatial features builds upon the perception of viewpoint-dependent spatial features, I will begin with the Egocentric Coordinates Motivation (§2). I then turn to the Sensorimotor Knowledge Motivation (§3), and finally to the Objective Experience Motivation (§4). Throughout, I will highlight the role of non-retinal informational inputs to visual appearances, both from occurrent stimulation of other sensory receptors and background information concerning invariant features of the perceptual apparatus. These inputs have been largely overlooked or minimized in discussions of the perception of intrinsic spatial features. As we will see, the supposed role of action is to overcome the informational shortfall we get from the perceptual experiences characterized by viewpoint-dependent appearances. If one does not pay adequate attention to the full range of inputs relevant to visual processing, one will be inclined to think of visual processing in an informationally impoverished way – e.g., corresponding to projections of distal objects to a singular viewpoint (the retina), as in talk of visual angles. And this will encourage the thought that action is necessary to visually perceive spatial properties.
2. The Egocentric Coordinates Motivation

The Egocentric Coordinates Motivation can be traced back to Evans's (1982) notion of 'behavioral space', according to which perceptual input can only have spatial significance in light of the perceiver's (potential) interactions with his environment, given his position in that environment (Evans 1982, p. 155). On Schellenberg's elaboration of the idea, we need dispositions to act to confer a 'practical understanding of space', which gives visual experience its egocentric spatial coordinates, which allow us to perceive both the location of perceived objects relative to the perceiver and their viewpoint-dependent shapes:

What is crucial for determining the coordinates of perception are the spatial locations from which possible movements originate and the directions of the relevant movements. The axes of our egocentric frame of reference are determined by our dispositions to act that bring about practical understanding of basic spatial directions. This practical understanding of basic spatial directions is a kind of spatial know-how. The idea of spatial know-how is related to Evans's thought that an understanding of spatial directions is not simply related to the place we occupy, but is related rather to the possibilities for action that one has given the way one occupies that location. When I tilt my head, I do not see objects on the verge of sliding off the surface of the earth. The reference of 'up' is not determined by the direction of my head, but rather by how I would move my body [to perform some action] given the position of my body. (2007, p. 616)

So, when you open your eyes in the morning while lying on your right side, you do not mistakenly perceive that (visual) 'up' is in the direction of the top of your head and that visual 'down' is in the direction of your feet. Rather, you perceive visual 'up' to be in the direction of your left shoulder and 'down' to be in the direction of your right shoulder. And this is the case, according to Schellenberg, because you know that you need to rotate your body such that your head moves in the direction of your left shoulder and your feet move in the direction of your right shoulder if you are going to get out of bed. This is an example of the spatial know-how that (supposedly) determines egocentric spatial coordinates and, hence, viewpoint-dependent spatial appearances. In other words, Schellenberg thinks that the spatial content of action is explanatorily prior to that of perception. Grush (2007, 2009) offers a similar view, also under the inspiration of Evans. However, according to Grush, the spatial contents of perception and action derive from dispositions one has to perform certain specific goal-directed actions in response to sensory stimulation. So for Grush the explanation of the spatial contents of perception and action is arrived at holistically – neither is explanatorily prior to the other. Below I argue that Schellenberg and Grush are both wrong.

2.1. Orienting up/down and left/right axes of egocentric space

To have a system of egocentric coordinates that allows us to locate things in egocentric space, we must be able to determine where things are along the left/right, up/down, and near/far axes. And to do this requires that we orient these axes (e.g., assign one end as 'up' and the other as 'down'), as in the example of getting out of bed in the morning. I'll deal with the near/far axis in §2.3. Here I will deal with the left/right and up/down axes.
The first point to note is that establishing egocentric coordinates does require integrating bodily space and retinocentric space (the spatial field associated with retinal stimulation prior to integration with bodily space). This is supported by extensive empirical evidence for the remapping of objects from sensory-specific spatial frames into egocentric coordinates in the posterior parietal cortex (Azañón et al. 2010; Andersen et al. 1997; Cohen and Andersen 2004; Zipser and Andersen 1988; see Whitlock 2017 for a review). [Footnote 6: The integration of these spatial frames includes orienting them to one another such that one side of the retinocentric space is linked to one side of the body, just as an orientation of the two retinocentric spatial frames must be operative to achieve a unified stereoptic visual field. It is only once such an orientation occurs that egocentric directions – e.g., left, right, up, down – make any sense. That is the kernel of truth in Evans's behavioral space view. But, as we are about to see, the behavioral space view (and other dispositional views) overreach when they insist that dispositions to act are necessary for egocentric spatial representation. The evidence suggests that this remapping is accomplished by performing coordinate transformations across spatial frames associated with specific peripheral sensory receptors (e.g., retinas, the skin, etc.) rather than by appealing to an explicit representation of egocentric space (Stein 1992). That is, the representation of egocentric space seems to emerge from the rules governing these coordinate transformations (see Briscoe 2021; Grush 2007). Cf. Matthen (2014). Matthen argues that the integration of these distinct spatial frames cannot arise either by mapping the spatial coordinates of the various frames to one privileged frame (e.g., the visual spatial frame) post-perceptually or from action, alone. From this Matthen concludes that there must be a pre-modal representation of space shared by the perceptual modalities. This pre-modal representation functions as a 'common measure', somewhat like a container into which the objects and qualities perceived by the various senses can be placed to bring them into spatial relations with one another. However, Matthen does not explicitly address the possibility that the representation of egocentric space is emergent in the way described above, and I see no reason to think that it is not – particularly in the absence of evidence for an explicit representation of egocentric space. The emergence of egocentric space is no more mystifying than the emergence of a three-dimensional stereoptic field from the integration of the two retinocentric spatial frames in stereopsis.]

This is how it works for the up/down axis: First we have a body schema – a representation of the dynamic spatial structure of the body that subserves proprioception (see chapter 4). The stimulation of proprioceptors in the connective tissues of the musculoskeletal system registers tension at the joints, which allows us to update the body schema with the current positions of our limbs. So, when the head tilts back, this movement will be registered by proprioceptors of the neck muscles. It will also be registered as rotational acceleration in the sagittal plane by the semicircular canals of the vestibular system. Otolithic organs in the vestibular system will register the direction of gravity's pull. This allows us to orient bodily space along the up/down axis, giving the body-relative reference of 'up' and 'down'. [Footnote 7: Nb. this representation of the body's position is not tantamount to the categorical basis for dispositions to act. It is wholly perceptual – associated with brain regions involved in perceptual processing (e.g., the posterior parietal cortex) – and distinct from the representations of the body in the motor cortex that coordinate action. It is true, however, that in the case of intentional movements, the body schema will also be updated on the basis of efferent motor signals.]
And, assuming our eyes are open, as the head tilts back and the eyes move upward, we will experience a corresponding change in the visual stimulation. The law-like correlation of changes in the retinocentric (and stereoptic) frames with bodily movements – given the location of the eyes in the head and the proprioceptive input concerning eye movements – will allow the visual field to inherit up/down from vestibular input and proprioception. Notice that all of these inputs are perceptual. This suggests that the movements required to achieve integration needn't be actions. Passive movement of the perceiver's body will provide the same perceptual input as a corresponding action. [Footnote 8: This is not to say that actions won't provide additional information – e.g., efferent motor signals (see the preceding note). I will argue below that, though these may be helpful in bringing about integration in egocentric space, they are not necessary.] As we'll see, this poses a serious problem for action-based views.

A similar story can be told with respect to the left/right axis. The orientation of the stereoptic spatial frame with a representation of the current body posture allows us to perceive visually presented objects on our right as closer to the right side of our body. But if this integration is sufficient for orienting the visual field to the body (and gravity), and the integration of these distinct spatial frames yields egocentric directions, then there is no need to appeal to spatial know-how to set the egocentric coordinates of visual experience. Rather, tracking changes in the retinotopic spatial arrangement of visual stimulation, in conjunction with bodily position changes and vestibular stimulation, will allow us to assign the referents of 'up', 'down', 'left', and 'right' in egocentric space.

2.2. Scuttling the priority claim: action coordinates also depend on integration

Furthermore, the integration of these distinct spatial frames of reference determines not only the coordinates of perception but also the coordinates of action. One doesn't know in which direction to reach to grasp a visually perceived object unless the stereoptic spatial frame has been oriented to bodily space such that the right of visual space corresponds with the right of bodily space, etc. Spatially oriented movements and actions (and, hence, spatial know-how) depend on egocentric coordinates that are derived from the integration of perceptual inputs in their
But action is not necessary for egocentric spatial perception. One might object that empirical results show that the orientation of retinocentric space and the body schema, relative to one another, is malleable in a way that supports Schellenberg’s order of explanation claim: Subjects fitted with left/right or up/down inverting lenses will experience a mismatch between visual information and proprioceptive and vestibular information, this interferes with our ability to effectively act (e.g., we will reach for objects with the wrong hand). And it interferes with egocentric perception – there is now an ambiguity regarding the egocentric position of those objects (are they on the right or on the left?). Is is 9 unsurprising, then, that after garnering sufficient experience navigating the world while wearing the inverting lenses subjects will undergo accommodation – i.e., they will reorient body space and visual space to enable normal interactions with the environment (Harris 1980; Kohler 1961, 1964; Linden et al. 1999; Stratton 1897; Yoshimura 2002). We should not overlook the fact that subjects have already integrated (and previously oriented) bodily space and 9 visual space before putting on the lenses. That is, the axes of each of these spatial reference frames are aligned, yielding a common (egocentric) spatial frame of reference. When subjects put on the lenses, the visual inversion interferes with the established orientation of the two spatial frames (visual and bodily). However, bodily space and visual space are still experienced as different representations of the same (already established) egocentric space. But now conflicts between them interfere with the identification of which of two candidates (left/right inversions of one another) is that common frame. 193 Note that the recalibration could run in one of two directions: the inverted visual space could be ‘righted’, recalibrating it to non-inverted body space; or body space could be reoriented to match inverted visual space. If Schellenberg’s priority claim is correct, we would expect that adaptation would manifest as a righting of the visual field back to its original position, in line with the non-inverted body schema and corresponding motor representations of the body. Indeed, Hurley and Noë (2003) argue that accommodation does involve reversion of the visual field to its pre-inverting lens orientation. However this interpretation has been called into serious question 10 as an interpretation of the relevant empirical results (Linden et al. 1999; Briscoe and Grush 2015; Klein 2007) and on phenomenological grounds (Degenaar 2014, pp. 383-388; Linden et. at. 1999). The balance of evidence points away from the reversion of the visual field to its original orientation. And this makes perfect sense: Distal objects are presented first by vision (retinal stimulation). We must adjust our movements to the visually apparent position if we want to interact with these objects. It is only once we do so that we are able to grasp them, touch them, etc. and thereby get proprioceptive and haptic input concerning them – the sorts of inputs registered in (non-inverted) body representations. But then it seems that the coordinates of action need to be set in terms of the coordinates of (one aspect) of perception (the aspect associated with retinal stimulation), as I claimed above. 
[Footnote 10: This interpretation of the inverting lens experiments has been taken as supporting the non-disposition action-based accounts (i.e., sensorimotor contingency theories) by Hurley and Noë (2003), Noë (2006), O'Regan and Noë (2001), and O'Regan (2011). Degenaar (2014) takes these results, and the results of his own experience with inverting lenses, as supporting a sensorimotor view, but he seems to have a weaker sensorimotor view in mind than these other authors: For Degenaar this just means that visual experience must take inputs from proprioception – i.e., that it must take account of body movements. He does not explicitly state that it requires agential self-movement. As such, his interpretation of the results is completely consistent with everything I will say here.]

It is simply not true that the axes of egocentric space are provided by dispositions to act. It is not, as Schellenberg suggests, how one would move one's body to perform some action, given the position of one's body, that determines the direction of 'up'. It is the integration of visual, proprioceptive, and vestibular reference frames that determines the direction of 'up' and, in doing so, determines how one would move one's body to perform that action. [Footnote 11: Notice that the resulting representation is not conceptual in any cognitively demanding way. It is, in fact, an instance of what is called 'nonconceptual content' by those who do not think that any constituent of mental content is a concept. This is the sort of representation that animals and infants could have.]

However, this does not undercut all versions of the Egocentric Coordinates Motivation. It is open to these views to claim that spatial concepts are dependent on movements and actions that are not spatially oriented – or not oriented in perceiver-external space. For instance, Grush (2007, 2009) can admit everything I have said provided that action is necessary to bring about the integration of these spatial frames, because Grush's view is that the spatial imports of perception and action are mutually co-constructed. Neither is explanatorily prior to the other. This leaves room for an action-requirement to reassert itself: What if the integration of the distinct spatial frames of reference to which I appeal does require action, just not actions that are oriented in egocentric space? After all, subjects experience accommodation after sufficient experience navigating their environments while wearing inverting lenses. The idea is that we require input that action can provide, and that cannot be provided by perception, to bring about the integration of visual and bodily spaces. This is precisely the sort of mutual co-construction of egocentric spatial representation by action and perception proposed by Grush.

However, the empirical work on inverting lenses suggests that action is not necessary. Adaptation can arise from passive and involuntary movement (Melamed et al. 1973; Mather and Lackner 1975). [Footnote 12: See Briscoe (2019) for discussion of the implication of these results and results from studies with tactile-visual sensory substitution devices for action-based views of perception.] Surely agential self-motion is a highly effective way of learning the correlations between these distinct spatial frames, and – given that we move ourselves more than we are moved – it will be the usual way of bringing about the integration. But it is not necessary.
Assuming that the same mechanisms are responsible for acquiring the integration of the body schema, vestibular input, and the stereoptic field as are responsible for re-integrating them in adaptation to inverting lenses, there is no need for action to bring about the initial integration of these spatial frames.

2.3. Distance in depth

This settles matters for left/right and up/down, but we still must account for distance in depth. After all, if action/dispositions to act are required to 'set the coordinates' of distance from the viewer's viewpoint, then action will be required to perceive relative distances of perceived objects – i.e., that one object is farther away than another. But we already get relative distance from retinal disparities. This is the principle behind stereoscopic photographs: dual images taken by cameras positioned roughly where the eyes of the viewer would be and viewed through a device that presents only the left image to the left eye and only the right image to the right eye, resulting in a fused image with apparent depth. When we add in information concerning the interocular distance and the vergence angle of the eyes – the angle at which the lines of sight from the two eyes intersect, which can be derived from information carried in the body schema (see below) – we can triangulate the metric distance to the focal point (at least within about 6m). That is, we will be able to tell, not only that one object is farther away than the other, but how much farther away. (This fact is widely exploited in computer vision.) [Footnote 13: This is not the only non-action-based means of determining metric distance to visually perceived objects. See Bennett (2011, 2022) for discussion of the role of the vertical position of the object within the visual field in determining distance. Another distance cue is provided by accommodation – the stretching or compressing of the lens by tiny muscles within the eye to adjust its focus. These adjustments are registered by proprioceptors attached to these muscles. Pictorial cues (such as shadows, contrast, and occlusion) also play a role in distance perception, but these only provide relative distance, not metric distance. As such, they can help us determine that the farther of Peacocke's trees is, in fact, farther away. But they cannot tell us how much farther. As we'll see, accessing metric distance is necessary for determining intrinsic size and shape. The view that (static) vergence plus retinal disparity is sufficient for metric depth perception is widely held in vision science (Rogers 2019; Parker et al., 2016; Banks et al., 2016; Wolfe et al., 2019; Mon-Williams and Tresilian, 1999; Tresilian et al., 1999; Meyer, 1842; Wheatstone, 1852; Baird, 1903; Swenson, 1932; Foley, 1980). Quinlan and Culham (2007) point to the likely involvement of the dorsal parieto-occipital sulcus in integrating vergence and accommodation and, perhaps, calculating metric distance to the perceived object from these cues alone. Linton (2020) provides a critical view of the role of vergence as an absolute distance cue, based on experiments that removed confounds from these other (non-action-involving) cues. This does not rule out a role for vergence working with, e.g., retinal disparities and accommodation to determine metric distance. Importantly, for our purposes, serious doubt has been cast on the effectiveness of motion parallax as a distance cue (Thomson et al., 2011; Creem-Regehr et al., 2015; Rogers 2019).]
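A minimal sketch can make the triangulation just described concrete. It assumes that the two eyes' gaze angles (registered, on the present account, by proprioceptors of the extraorbital muscles) and the interocular distance are available; the function name and the sample values are illustrative and not drawn from the text.

import math

def fixation_distance(interocular_distance, left_gaze_deg, right_gaze_deg):
    # Metric distance from the midpoint between the two eyes to the fixated point,
    # computed by triangulation (law of sines). The gaze angles are each eye's line
    # of sight measured from the interocular axis (90 degrees = looking straight
    # ahead); interocular distance serves as standing background information.
    tl = math.radians(left_gaze_deg)
    tr = math.radians(right_gaze_deg)
    vergence = math.pi - tl - tr  # the three angles of the triangle sum to pi
    left_eye_to_point = interocular_distance * math.sin(tr) / math.sin(vergence)
    # Place the left eye at the origin and the right eye at (interocular_distance, 0).
    px = left_eye_to_point * math.cos(tl)
    py = left_eye_to_point * math.sin(tl)
    return math.hypot(px - interocular_distance / 2.0, py)

# Illustrative values only: with a 6.4 cm interocular distance and symmetric gaze
# angles of 88.2 degrees, the fixated point lies roughly one metre away.
print(round(fixation_distance(0.064, 88.2, 88.2), 3))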
And it is easily demonstrable that information about interocular distance and vergence is carried in visual processing of depth. Consider the experience of looking through a stereoscope. Here slightly different perspectives on the same scene are presented, one to each eye, resulting in the sort of binocular disparity we get in everyday looking. (The sort that becomes evident when you alternately close one eye and then the other, observing the resulting changes in appearances.) While they do successfully bring about a three-dimensional appearance, stereoscope images tend to appear compressed in depth. Figures appear shallower than their natural counterparts. This is sometimes referred to as the cardboard cutout effect because people in stereoscope images tend to look a bit like flat cardboard cutouts arranged within the scene. The cardboard cutout effect arises because the stereo images were taken by cameras that were closer together than the eyes of the average viewer of those images through a stereoscope. When viewed by a perceiver with an interocular distance greater than the distance between the cameras that took the images, the resulting retinal disparities appear compressed in depth: A smaller interocular distance requires a smaller vergence angle to fixate on an object at a given distance, and a smaller vergence angle leads to a greater depth-to-width ratio in the fused stereoptic field (fig. 5.1). That is, the disparities captured by the cameras produce a fused image that appears shallower when viewed from a greater interocular distance and larger angle of convergence, as when we look through a stereoscope. [Footnote 14: For detailed mathematical models of stereoscopic space and the impact of vergence and interocular distance thereon, see Gao et al. (2018), Gao et al. (2020), Masaoka et al. (2006), Woods et al. (1993), and Yamanoue et al. (2006).]

Figure 5.1. The stereoptic field generated by looking at an object with left and right eyes closer together (top) has a larger depth:width ratio than the stereoptic field generated by looking at the object with eyes positioned farther apart (bottom), due to wider interocular distance and larger vergence angle. A blue box marks the stereoptic field (top and bottom).

It is also a well-established empirical result that artificially altering the tensions measured by the proprioceptors of the extraorbital muscles, which control eye position and by which vergence angle is detected, impacts the apparent distance in depth of perceived objects (Ebenholtz and Wolfson 1975; Ebenholtz 1981; Ebenholtz and Fisher 1982; Fisher and Ciuffreda 1990; Paap and Ebenholtz 1977; Priot et al. 2010; Priot et al. 2012; Tresilian et al. 1999; see Howard and Rogers 2002 for a review). This impact holds independently of interocular distance (which remains fixed in these studies). There are many studies supporting the claim that changes in interocular distance impact the apparent size and depth of perceived objects (IJsselsteijn et al. 2002; Häkkinen et al. 2011; Kim and Interrante 2017; Kytö et al. 2012; Renner et al. 2015; Utsumi et al. 1994). For example, Kim and Interrante (2017) demonstrated that virtual manipulations of interocular distance impact size perception.
What is relevant at present is that interocular distance and vergence angle independently impact the appearance (in both phenomenal and representational senses) of the scene resulting from the stereoptic fusion of the two retinal projections. So, according to the principle that perceptual phenomenology and content are determined by information carried in perceptual processing, there must be information concerning both interocular distance and vergence carried by visual processing. Furthermore, this information is sufficient to determine, by the law of sines (that is, by triangulation), the metric distance to the perceived object. But if this information is available and it is sufficient to determine the metric distance to perceived objects, then there is no informational shortfall for action to overcome. And once we have established metric distance in depth, we have enough information to determine metric distances in the left/right and up/down planes. (For metric distance beyond the effective range of stereopsis, see Bennett (2011, 2022), mentioned above.) Two points at equal distances in depth from the perceiver will form a visual angle which, together with the distances from the perceiver to these points, will determine the distance between them (in the left/right or up/down plane). More generally, given the metric distances from the perceiver of two points and the visual angle subtended by those two points at the retina, the distance between those two points can be computed (using the law of cosines).
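As a concrete illustration of this last step (the example and the numbers are mine, not the author's): if two points lie at metric distances d₁ and d₂ from the viewpoint and subtend a visual angle φ between them, then the separation s between them satisfies

s² = d₁² + d₂² − 2 d₁ d₂ cos φ.

For two points both at 2 m subtending 10°, for example, s ≈ 0.35 m. Nothing in this computation draws on action or dispositions to act; it requires only the metric distances already recovered and the visual angle.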
As such, the metric coordinates of egocentric space can be entirely set (within about 6 meters) without any appeal to action. The Egocentric Coordinates Motivation rests on a mistake.

3. The Sensorimotor Knowledge Motivation

That covers the viewpoint-dependent spatial features that perceptual relativists have claimed require action to represent. It is simply not true that action/dispositions to act are necessary to specify egocentric spatial contents – including the viewpoint-dependent shapes of perceived objects. I will now leverage the results of the preceding section to argue that the Sensorimotor Knowledge Motivation fails. We do not need knowledge garnered from past interactions with visible objects to perceive the intrinsic spatial features (size and shape) of objects.

3.1. Size

It is a very short step from the foregoing discussion of the metric coordinates of egocentric space to the determination of intrinsic (i.e., viewpoint-independent) size. Two trees might subtend equal visual angles from top to bottom at each of the perceiver's retinas despite one being farther away than the other. We are, of course, able to perceive that one of them is larger than the other (the one that is farther away). Furthermore, given the ability to determine their metric distance in depth – at least within about 6 meters – and the fact that this will determine the metric height corresponding to the visual angle at the distance of each tree, we can perceive more than just the relative difference in size between the two trees. We can perceive the intrinsic sizes (in terms of height – the same points follow for width) of the trees without any additional appeal to action/dispositions to act. (This doesn't mean that we will be able to accurately estimate the metric distances in some unit of measurement. Our perception of distance will nevertheless support our perceptual judgments and actions based thereon. This is how we manage to play – with varying degrees of success, of course – a game of horseshoes.) This goes a long way towards explaining how we can perceive intrinsic shape, as well. However, there are complications that arise when we consider objects that are slanted away from the perceiver. Here the angle subtended by the endpoints of a line (e.g., a horizontal line marking the diameter of a tilted coin) and the distance to an interior point of it does not determine the length of the line.
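For the simple case where the perceived object is roughly frontoparallel (the two-trees case above, for instance), the size computation can be made explicit (the formula is elementary trigonometry and the numbers are my own illustration, not the author's): an object at metric distance d subtending a vertical visual angle α has height

h ≈ 2 d tan(α/2).

Two trees each subtending 10°, one at 4 m and one at 6 m, thus come out at roughly 0.70 m and 1.05 m tall: equal visual angles, different intrinsic sizes. As noted just above, this simple relation is not available for surfaces slanted in depth, which is where the geometry of the stereoptic field, discussed next, has to do the work.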
3.2. Shape

Consider the widely used example of a coin that is tilted away from the perceiver – the edge of the coin is farther from the perceiver at some points than at others. The coin will appear elliptical, though its intrinsic shape is circular. Nevertheless, we perceive that the coin is circular. How is this possible? (Nearly everyone agrees that we perceive the coin as circular. That viewpoint-dependent properties – e.g., viewpoint-dependent ellipticality – are represented is, in fact, the more contentious claim. For arguments against the representation of viewpoint-dependent properties, see Briscoe 2008 and Peacocke 1983. I'll return to this issue in §4.)

Figure 5.2. Illustration of the effect of focal distance on the geometry of the stereoptically fused region. The top three rows provide a top view of objects at three different distances (A, B, C) that give rise to the same retinal disparities. The geometry of the fused region is given by the intersection of the left and right visual angles on the object. As the object gets farther away and vergence decreases, the depth:width ratio of the fused region increases. The fourth row provides a side view illustrating the shared top-to-bottom visual angle subtended by the object. The bottom row provides an illustration of the head-on shape of the three objects.

The first thing to note is that we will get some depth information from stereopsis, assuming the coin is within about 6 meters. This is important to note because discussions of the coin often proceed as though we are monocular, with a single elliptical shape projected by the circular coin at the given angle (e.g., Noë 2006). However, even recognizing that the coin is tilted won't decide between a circular coin at one angle and an elliptical coin or plate at a different angle and distance that could generate the same binocular disparities (figure 5.2). And, again, we see that the coin is circular. Presumably, we could recover the circularity by first fixating on the near edge and then on the far edge, but the difference in distances between these two edges will be quite small – perhaps too small for the desired precision of shape resolution. Furthermore, it does not seem that we need to look from one edge to another before we are able to perceive that the coin is circular.

But recall the cardboard cutout effect. When viewing images through a stereoscope, we often experience them as flattened in depth due to the placement of the cameras when taking the two images. And we get this impression immediately, without casting our eyes about the image (small saccades notwithstanding). This is because we do not merely define the distance to the focal point by triangulating from interocular distance and vergence. We also define the geometry of the entire stereoptically fused visual field (the area of distal space for which the left and right retinal projections are fused – the area in which we do not experience double vision). Figure 5.2 provides an illustration of the changes in the depth-to-width ratio of the stereoptically fused field given changes to interocular (or inter-camera) distances while preserving the retinal disparities. The result is that retinal disparities, in conjunction with information about interocular distance and vergence, are sufficient to define the geometry of the stereoptic field. Given this, we can recover the intrinsic shape of the visible surfaces of the perceived object (e.g., the circularity of the facing side of the tilted coin) by perceiving the way these surfaces fill a subregion of the stereoptic field. We've already seen that all of this information is present in the visual processing leading up to the experience of the coin. So, again, there is no information lacking that we need action to provide.

(This is not to say that we compute the precise metric properties of perceived objects for all perceptual purposes. For instance, this doesn't seem to be necessary for basic object recognition, which relies on information concerning non-accidental properties (NAPs) – spatial features of objects that persist despite changes in rotation in depth (Biederman and Bar 1999). Object recognition is a function of the ventral visual processing stream according to the two visual systems hypothesis (TVSH), while visual perception in support of action is the province of the dorsal stream (Goodale and Milner 1992). Metric properties are primarily associated with the dorsal stream; however, it seems that the two streams are not as separate as initially supposed (see Briscoe 2009 for discussion). I set these complications aside, but note that the role of the ventral stream is to provide spatial information concerning perceived objects to motor planning areas. This further suggests that Schellenberg's priority claim is incorrect. Also note that this use of NAPs presupposes a solution to the abstraction problem, whether innately programmed into us or acquired. See below for reasoning that action isn't necessary even if the solution is acquired.)

It appears, then, that we don't need action to perceive intrinsic shape, either. However, it remains possible that, even though the resources are present to compute intrinsic size and shape without appealing to action, we would not in fact compute them unless we recognized that objects seen from a particular egocentric vantage point can be seen from other vantage points as well.
Just because we can perceive intrinsic shapes doesn’t mean that we do. There is still the objective experience motivation for action-based views to contend with 4. The objective experience motivation: Schellenberg’s abstraction condition The over-arching idea behind the objective experience motivation is that – whether or not we have all the information necessary to perceptually determine the intrinsic shapes and sizes of visually perceived objects – we have no reason to do so until we have a reason to attribute such intrinsic (non-relational) spatial features to them. Schellenberg calls the condition that must be One might try a retreat to the claim that the complete intrinsic shape has not thereby been perceived, as we are still 20 limited to the non-relational shape of only the facing surfaces. So our perception of the shape of the object as a whole is still viewpoint-dependent. There are two things to note in response: First, the discussion has generally been cast in terms of the intrinsic (non-relational) shapes of visible surfaces. Discussion of the coin, for instance, concerns the circularity of its facing surface, not its cylindricality. Second, visual perception does seem to engage in amodal completion of 3D shape, but this is based on basic simplicity assumptions – e.g., that objects are symmetrical (Kwan et al. 2016; etc.). Illusions resulting from these assumptions persist even after one has encountered the object from a different view (contrary to the predictions of sensorimotor contingency views) (Ekroll et al. 2018). There is some evidence that some of the relevant assumptions do not emerge until 6 months of age, at which point infants are beginning to engage in coordinated visual-manual exploration (Soska and Johnson 2008); however, this does not establish that action is needed to acquire these assumptions, which – insofar as they are acquired – are generalizations from statistical regularities in visual experience of objects and so could, in principle, be acquired by passively viewing objects in motion, though action-based engagement with objects would certainly be more optimal. Notice, too, that the viewpoint-dependent appearances of visually perceived objects differ with changes in interocular distance and vergence. Vergence and interocular distance work together to determine the depth of the fused stereoptic area. Were we to adjust one while holding the other (and the retinal stimulation) fixed, we will get an elongating or shortening of apparent distance in depth. So we should expect that an angled circular coin seen close up and a larger angled elliptical plate seem from further away, which generate the same retinal disparities, will appear to have different viewpoint-dependent shapes (fig. 5.2) This calls into question the difference between the viewpoint-dependent shape of an object – if it is this that determines appearances – and the intrinsic shape of the visible surfaces of the object (which seems to be the relevant notion of intrinsic shape at play in the discussion). 205 met in order to see objects as having intrinsic spatial properties ‘the abstraction condition’. Her claim is that, to meet the abstraction condition, we must understand that the objects we perceive are perceivable from locations other than the location we occupy (2007, p. 618). And this 21 presupposes that we can represent our actual location relative to the perceived object (so that there is something to abstract away from). 
For Schellenberg, representing our location relative to the perceived object is simply a matter of representing our position as both the point of origin of perception and the point of origin of action (2010, pp. 150-151; 2007, p. 621), which entails – on Schellenberg’s view – that we have established the coordinates of egocentric space. We can then imagine that other points within egocentric space could be viewpoints with their own (action-determined) egocentric coordinates. However, the arguments of §2.1 show that it is not action that determines the coordinates of egocentric space, it is the integration of distinct perceptual spatial frames. If that is correct, then one should be able to represent her current location without depending on dispositions to act. The preceding arguments have laid the ground for another path to the representation of our position relative to perceived objects: Information concerning retinal disparities, interocular distance, and vergence angle are sufficient for the triangulation of one’s distance from the focal object and to define the geometry of the region of space represented in the stereoptic field. Noë offers a more demanding ‘abstraction condition’ that requires knowledge of how our visual experience would 21 change were we to move in order to abstract away from relational properties (which, for him, are projections onto a plane perpendicular to the line of sight, which he calls P-properties) (2004, p. 84). Hill (2020, p. 187-188) claims that one way to get at intrinsic properties via perception is to calibrate perceptual inputs with motor programs via learning. This doesn’t require person-level awareness of the sensorimotor correlations, but it does require that these correlations be represented somewhere in (subpersonal) processing. Schellenberg’s view is less demanding, so I will focus on it. If I can show that even her undemanding reliance on action is too demanding, it will scuttle the other views as well. Schellenberg phrases her condition as a necessary condition, though she often talks as though it is also sufficient for perceiving intrinsic shape. My intention here is to show that it is not necessary. 206 When integrated with information concerning one’s current body posture, the location of the eyes in the head, and vestibular input, this will provide all the necessary informational inputs to fix the relevant spatial relations between the perceiver and the object, thereby representing her location. 22 Where does this leave us with Schellenberg’s abstraction condition? According to Schellenberg, the viewpoint-dependency of perception/perceptual imagination (there is no view- from-nowhere experience of seeing or imagining an object), and the fact that the egocentric coordinates of perception/perceptual imagination are dependent on one’s dispositions to act (one’s spatial know-how), jointly imply that understanding objects as perceivable from other locations – and, thus, meeting the abstraction condition – involves representing other points in egocentric space as potential egocentric viewpoints (which Schellenberg calls ‘alter-ego vantage points’). On my view, having an egocentric viewpoint does not require dispositions to act. The coordinates of this spatial frame are set by other means. So it can’t be that we need disposition to act to establish the coordinates of the alter-ego viewpoint, either. 
If dispositions to act are necessary to satisfy the abstraction condition, it is because they are necessary for us to imagine other points as points we could occupy. The thought is that we could only imagine ourselves as occupying some other position if we have some understanding of our ability to move ourselves to that position. And this is relevant for securing objective experience; that is, for experiencing the objects of perceptual experience as mind-independent entities. The thought has been expressed by At one point, Schellenberg says “Through changes in perception brought about by changes in spatial relations to 22 objects, one can triangulate back to one’s location” (2010, p. 151). She doesn’t return to the triangulation point, and she may be speaking metaphorically. If not, her comments suggest triangulation by motion parallax, which would implicate motion at least (though not necessarily action). My point here is that binocular parallax also permits triangulation of one’s position and doesn’t require motion, let alone action. 207 Strawson (1959) in his requirement that objective experience requires that we conceive of the object of experience as capable of existing unperceived. Strawson connects this to the need for spatial perception and awareness of one’s own movements. (He doesn’t specify that they must be self-movements, though this is implicit in his discussion.) So the abstraction condition amounts to the requirement that we have some reason to think of the object as retaining its features independently of our experience of it. Schellenberg’s idea is that, in understanding that our movements would impact the situation dependent features/appearance of perceived objects, we become licensed in attributing objective features to them. This does not require, as Noë's (2004) view does, that we can predict how the changes in position will affect appearances, only that they will. That is, on the objective experience motivation, what is at issue is that objects have objective shapes and sizes, not what it takes for us to perceive those objective shapes and sizes. Even if we need to understand that we could move to another point in egocentric space to meet the abstraction condition, there is no reason to think that this requires knowledge of a capacity for self-movement (action). It only requires that we understand that we could be moved (whether by ourselves or others) from our current position to some other position. And this understanding could arise from past experiences of being moved (passively), once the stereoptic spatial reference frame was integrated with vestibular stimulation (which tracks acceleration in the direction of movement) and a body representation. Nor does this require – as Schellenberg concedes – that we be able to imagine how our experience of the object would change were we moved. So action is not necessary to imagine a point in egocentric space as a potential alter-ego 208 viewpoint. A sentient statue could satisfy the abstraction condition (Cf. Schellenberg 2007, pp. 610-611, 625-626). 23 Furthermore, it isn’t clear that movement is required. Our visual experience of objects arises from two distinct (though nearby) locations – namely, the locations of the eyes. While visual experience doesn’t generally appear as though coming from two locations, passive alternating occlusion of the eyes would make this apparent. 
And once it is apparent, it is not obvious why this wouldn't be sufficient for us to understand that we can have visual experiences from different locations. (There is room to push back, given that these locations aren't locations in the perceived egocentric space – we don't see the locations of our eyes. Nevertheless, I don't need the point about motion to show that meeting the abstraction condition doesn't require action.)

In short, we can meet Schellenberg's abstraction condition without appealing to dispositions to act (or past actions, as Noë (2006) maintains). Therefore, there is no bar to accepting that intrinsic size and shape can be determined in the manner specified above, without appealing to action. (My proposal is not subject to Schellenberg's worry that appeals to spatial concepts (e.g., left and right) to meet the abstraction condition over-intellectualize perception and simply presuppose what must be explained, namely the spatial structure of perception (2010, p. 157; 2007, p. 606). Nothing I have said relies on our perceiving things as, e.g., on the left. I only rely on our ability to perceive that something is on the left. And so I don't appeal to spatial concepts in an intellectually demanding sense. And the presupposition complaint only applies to views that posit these concepts as structuring perception while claiming that there is nothing further to be said about how and why they structure perception. The view I am advancing does not commit this sin: it gives an account as to how perception comes to have this structure and why it needs to have this structure.)

5. Appearance-determining relational properties are not represented

As mentioned in §1.1, modern-day proponents of perceptual relativity also tend to be representationalists. These authors believe that the representation of relational properties of perceived objects determines the phenomenal appearances of those objects (Schellenberg 2008, 2010; Hill 2009, 2016, 2020; and Green and Schellenberg 2018). In this section I assess such views. To do so I must first clarify: (1) what the relevant relational properties are, and (2) what the relevant notion of representation is.

So what are the relevant relational properties? There are two views currently on offer. Schellenberg (2008, 2010) and Green and Schellenberg (2018) rely on 'situation-dependent properties': "Situation-dependent properties are ontologically dependent on and exclusively sensitive to intrinsic properties of the environment (such as intrinsic shape and colour reflectance properties of objects) and situational features (such as the perceiver's location and the lighting condition)" (Schellenberg 2010: 146). Situation-dependent properties do not include features of the perceiver's perceptual apparatus. However, features of the perceptual apparatus do determine which situation-dependent properties a perceiver is sensitive to and, hence, which situation-dependent properties the perceiver can represent. (Visual angles are one sort of situation-dependent property. They are the basis of Noë's (2004) P-properties, or projection properties, which he uses to capture the (spatial) relational properties determining perception. P-properties are projections that a perceived object would cast onto a plane perpendicular to the line of sight. Situation-dependent properties are, at least officially, three dimensional, which is a reason to favor them over P-properties, as we'll see in the discussion of stereopsis, below. Noë, himself, is not a representationalist (2005); nevertheless, I will explain below why we can't appeal to something akin to P-properties to defend a version of representational relativism.) (Schellenberg acknowledges that phenomenal appearances are impacted by features of the perceptual apparatus – she endorses intramodal representationalism, on which the phenomenal character of a perceptual experience is determined by the content of the experience and the sensory modality of the experience (2008: 65; for the classic presentation of intramodal representationalism, see Lycan 1996). While this view includes features of the perceptual apparatus in the factors determining the phenomenal appearances of a perceptual experience, it does not include them in the represented contents of that experience. See Speaks (2015) for criticisms of intramodal representationalism.)
Hill (2009, 2014, 2016, 2020) differentiates the relational features determined by intrinsic features and viewing conditions (Schellenberg's situation-dependent properties) from what he calls 'Thouless properties'. Thouless properties are outputs of a mechanism that takes situation-dependent properties as inputs and transforms them into relational properties that are, roughly, a compromise between situation-dependent properties and intrinsic properties: They exhibit more perceptual constancy than situation-dependent properties – representations of Thouless properties are less sensitive to changes in viewing conditions than situation-dependent properties – but less constancy than the outputs of constancy functions capturing intrinsic features of objects. According to Hill, appearances are determined by representations of Thouless properties, rather than situation-dependent properties, and the relations captured in Thouless properties include a role for perceptual mechanisms (e.g., pseudo-constancy mechanisms) (Hill 2020: 183-184). (This allows Hill to sidestep the appeal to features of the perceptual system as an additional basis for the phenomenal character of a perceptual experience, as we see in weak representationalism.)

And what is the relevant notion of representation? To say that an experience has representational content is, minimally, to say that it has accuracy conditions. That is, the experience can be accurate or inaccurate depending on how it represents the world to be. (Schellenberg (2008, p. 64; 2017, pp. 4, 11-12) explicitly adopts this characterization of perceptual content.) This is not supposed to explain why it has that content; it merely pinpoints what the content is. And it does not entail that all the contents captured by the accuracy conditions will be accessible to the perceiver – the perceiver might lack the relevant concepts. Nor does it answer the question about the relationship between phenomenal appearances and representational contents (accessible or not). (For accounts of perceptual contents that are not accessible to the perceiver, see Crowther (2006), Speaks (2005), and Van Cleve (2012).) Representationalists answer this question with the claim (roughly) that the representational content of the experience determines its phenomenology (Byrne 2001; Dretske 1995; Harman 1990; Lycan 1996; Tye 2000, 2002; Hill 2009; Schellenberg 2008: 63, 2017: 12).
For present purposes, I will only need to focus on the information that factors into perceptual processing and not on which subset of that information is represented in conscious perceptual experience.

Representationalist relativists, like Schellenberg and Hill, think that perceivers represent relational properties in their visual experiences and that these represented properties determine the phenomenal appearances of those experiences. Furthermore, representationalists generally focus on the consciously accessible contents of experience, the idea being that the phenomenal character of the experience is how the content that determines it is made consciously accessible, hence the tight connection between phenomenology and content. If the relevant account of perceptual content is tied to accuracy conditions, then it is clear that such relational properties will be represented, given the laws of optics, the operations of visual processing, and the relevant features of proprioception and equilibrioception. But there is no guarantee that they will play a role in the determination of perceptual phenomenology, as the perceptual relativist claims. I have left it open that the information that determines visual phenomenology and the information that determines the content of visual experiences differ. Furthermore, some information – e.g., concerning invariant features of the perceptual system (such as the interocular distance or the distance between the lens and the retina) – might play a role in determining the content of a visual experience without, itself, being consciously represented in that experience. Not all the information carried in perceptual processing is represented. Similarly, contents that are represented in perceptual processing – and even, according to some, in the resulting perceptual experience – will not necessarily be consciously accessible to the perceiver enjoying the resulting perceptual experience.

Recall that the activity (A) of a perceptual mechanism carries information about some state of affairs (S) iff the probability of that state of affairs, given the activity of the perceptual mechanism, is different from the unconditional probability of the state of affairs: Pr(S|A) ≠ Pr(S). The activity of that mechanism represents that state of affairs when further perceptual processing uses the activity of the mechanism as indicating that state of affairs. Roughly, the content represented is the information used (see chapter 2). (More precisely, the content represented is that some state of affairs obtains (or does not obtain), where that state of affairs is a state of affairs about which the mechanism carries information and subsequent processing uses the activity of the mechanism as indicating that the state of affairs obtains (or does not obtain).)

This notion of representation is silent on the questions of whether the content is represented in perceptual experience and whether it is consciously accessible. It is a fairly liberal notion of representation, and – though it is not as liberal as the accuracy-conditions notion – it is the most liberal notion of representation that can be put into the service of representationalism: if representational contents are going to play a role in determining the phenomenology of perceptual experiences, then they must be used by the perceptual processing that gives rise to that phenomenology.
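A toy case may help fix these two notions (the numbers are invented purely for illustration): suppose that, unconditionally, the probability that a surface is slanted away from the viewer is Pr(S) = 0.2, but that, given a certain pattern of activity A in a disparity-sensitive mechanism, Pr(S|A) = 0.9. Since Pr(S|A) ≠ Pr(S), A carries information about S. Whether A also represents S is a further question: on the present notion it does so only if downstream processing uses A as indicating S, for instance by relying on A in computing the surface's intrinsic shape.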
I will now argue that information about relational properties does not play a role in visual processing, and hence is not represented. Lande (2018) has offered an argument to this effect, to which my argument is much indebted. According to Lande, the visual system is designed to account for the fact that objects are encountered from a particular viewpoint. Lande invokes a principle of non-redundancy to argue that there is no need to represent this information in visual processing because it is already accounted for in the physical structures of the visual system (Lande 2018, pp. 201-202). (Lande's argument draws inspiration from the oft-invoked hardware/software analogy on which representation is a matter of the processing done by the cognitive 'software' (e.g., neuron firings), but it is helped along by the physical configuration of the nervous system (e.g., the retinotopic arrangement of portions of the visual cortex or the spatial proximity of individual photoreceptors feeding into the same retinal ganglion cell). The idea is that we don't need to represent, in the processing stream, facts that are already captured by the architecture of the nervous system (e.g., the proximity of individual photoreceptors).)

Though I think that there is much of value in Lande's approach, I don't think it quite hits the mark. According to Lande, the physical configuration of the visual system accounts for the fact that the shapes detected by the retina are projective (i.e., that they are projected onto the retina from an external source and so are viewpoint-dependent). But this does not mean that the visual system doesn't need to represent the projective shape. It only means that the visual system doesn't need to represent it as projective. But that doesn't scuttle the relativist view. Schellenberg, for instance, is explicit that her situation-dependent properties needn't be represented as relational properties (2008, p. 68).

Even so, I think Lande is on the right track, and his conclusion is correct: the visual system does not need information about relational spatial properties to represent non-relational spatial properties of perceived objects. I also think that Lande's redundancy principle is correct, though I think 'superfluity' is a more apt term than 'redundancy.' Some redundancy can be functionally valuable, and different systems might compute the same information as part of distinct processes – e.g., if they evolved independently. Lande clearly has no problem with this kind of redundancy (because the redundancies are not superfluous). The idea, then, is that we should not attribute contents (informational or representational) that are superfluous from the standpoint of perceptual processing. If the information already present in the relevant processing stream is enough to do the job, there is no need to appeal to further information about relational properties. Given this, the relativist owes us a processing role played by relational properties that could not be played by any other information present in the relevant perceptual processing stream.

Schellenberg's (2008) epistemic dependency thesis captures the standard role proposed by representational relativists for relational properties: Perception of intrinsic features is dependent on – and by implication computed by way of – perception of viewpoint-dependent relational properties of perceived objects.
Schellenberg’s argument is straightforward: If we 214 defeat the perceptual evidence for the relational property, we thereby defeat the evidence for the corresponding non-relational (intrinsic) property, but not vice versa. Consider the tilted coin again. The standard description of the case appeals to an elliptical perspectival shape and the intrinsic circularity of the coin. This way of setting up the case bears a close affinity with the classic inverse problem for optics, which holds that the projective shape of an object underdetermines the shape of the three dimensional object that casts the projection. But this is a problem about projections to a single retina. As such it is more apt for understanding monocular visual processing than it is for understanding visual processing in binocular creatures like us. Furthermore, we’ve seen, in discussion of the cardboard cutout effect, that interocular 32 distance and vergence impact the phenomenal appearances of binocular visual experiences. So the relevant relational property cited by the relativist will need to go beyond the output of stereopsis if it is to account for the determination of visual phenomenology. This shows us that we cannot merely consider retinal projections when attempting to account for the visual perception of shape. However, if information concerning retinal projections, their stereoptic fusion, vergence, and interocular distance collectively determine the phenomenal shape appearances of visually perceived objects, then there is no role left for information about a relational property encompassing all of this information to play in the production of perceptual experience. The relevant relational property is an objective property of the perceived object, on Schellenberg’s This is a problem for Lande’s positive proposal, as well. Lande’s proposal invokes an array-like structure of 32 perceptual representations on which the tilted coin and a head on ellipse occupy the same cells within the representational array, where cells correspond to lines of sight from the perceiver (2018). But this, too, seems more apt to monocular vision. To avoid superfluity, Lande will have to show that there is some role to be played by overlaying this array of cells onto the stereoptic field, which is constructed out of two distinct viewpoints. It is unclear what role this could be. 215 view. As such, it would need to be something like: produces a given stereoptic fusion of two retinal projections to two points at a given distance (the interocular distance) to one another and whose vergence angle (given the angle of each eye relative to the interocular axis) will be such and such. But the individual pieces of information cited in the putative relational property are enough to determine phenomenal appearances and that the object – at least its visible surface – has a particular objective shape. There is no further need to invoke a role for an objective relational property of the perceived object that is pieced together from information we’ve already accounted for. Given the superfluity principle, we should deny that such relational properties play a role in the production of visual experience and, hence, such relational properties are not represented in visual experience – at least not in the phenomenology-determining way that the perceptual relativist claims. 
Furthermore, this shows that the role proposed for relational properties captured in Schellenberg's epistemic dependency thesis is not actually played by relational properties: they are not directly involved in the perceptual processing that calculates intrinsic spatial properties. They merely happen to be determined by the same factors as are the intrinsic spatial properties, hence the fact that defeating evidence for the relational properties defeats evidence for the intrinsic property. (This supports what Green and Schellenberg (2018) have called the distance and slant view – on which the viewpoint dependence of visual perception is explained by representations of (or information about) the distance and orientation of visually perceived objects relative to the perceiver – over their relational properties view (featuring Schellenberg's situation-dependent properties). Green and Schellenberg (2018, pp. 9-10), and Schellenberg (2008, pp. 68-69), argue that distance and slant are not represented in visual experience because this is too demanding. She phrases this in terms of awareness, so we can assume she is talking about consciously accessible representations of distance and slant (and other of her situational features – features concerning the viewing conditions). Whether or not distance and slant are represented in experience, they are certainly represented in subpersonal visual processing, which is all we need to make relational properties computationally superfluous.) (Schellenberg's claim that defeaters for intrinsic properties don't necessarily defeat evidence for relational properties seems to be a holdover from views that think of relational properties in terms of retinal projections. On such views there will be some evidence for intrinsic spatial features that can be defeated without defeating the retinal projection-based relational properties; namely, evidence concerning the information needed to solve the relevant inverse problem (monocular or binocular). But these relational properties don't determine appearances, and so make representational relativism false.)

Therefore, there is no place for the conscious representation of relational properties that determine the phenomenal appearances of visually perceived objects. This is not to say that relational properties of some sort aren't represented in perceptual processing at all (contra Lande). Indeed, the projective shapes objects cast onto each retina clearly must play a role in visual processing (though not as projective) in order to accomplish stereopsis. But, while these do play a role in the production of experience, they are not the sole perceptual inputs involved in the determination of visual appearances. They are just one input among many – including many that don't concern relational properties of perceived objects – that collectively determine appearances. Nor are they ordinarily part of the content of visual experience (though they can be, e.g., when we close one eye). Therefore, we cannot build a representational relativism on the basis of projective properties: Representations of projective properties – even if they are represented in perceptual experience – are insufficient for the determination of the phenomenal appearances of binocular visual experiences. (I am focusing here on spatial features of perceived objects (size and shape). There might be room for the relativist to push back with respect to color, though I suspect problems will still arise. Recall the image of the blue and black/white and gold dress. The distinct appearances arise despite the fact that the wavelength information available to all perceivers is the same. This suggests that appearances are determined by processing that has already accounted for aspects of the illumination in order to represent the intrinsic color of the dress.)

I have focused on Schellenberg's version of representational relativism, but my argument also applies to Hill's account unless he can carve out a role for his Thouless properties in perceptual processing.
But given the foregoing, it's not clear what role that could be that couldn't be played by (non-relational) information about the perceptual mechanisms outputting Thouless properties and the information that made Schellenberg's situation-dependent properties otiose. Furthermore, it is not clear that Thouless properties, as Hill conceives them, are properly relational as opposed to misrepresentations of intrinsic properties. This latter interpretation is, after all, the interpretation Thouless gave to his results concerning the systematic deviations from both intrinsic and retinal sizes he reported (1931a, 1931b). And it remains an open question whether Thouless's work establishes such systematic deviations (Epstein and Park 1963; Joynson 1958; Kaess 1978). As such, the conclusion stands: Information about relational properties that determine perceptual appearances plays no functional role in visual processing. And, if information about these properties isn't used by visual processing, visual experience doesn't represent them. Representational relativism is false.

6. Conclusion

In this chapter, I have argued that action is not needed to perceive either viewpoint-dependent or viewpoint-independent spatial features of visually perceived objects. This contradicts the commonly held view that the fact of perceptual relativity – i.e., that visual appearances are dependent on one's viewpoint – entails that action is necessary for spatial vision. I have argued further that the viewpoint-dependent phenomenal appearances of visual experiences are not explainable wholly in terms of representations of relational properties.

I now want to step back and contextualize this discussion within the larger framework of the minimal sensory modalities approach. Recall that a primary factor leading previous scholars
Even if the retinas are oriented to one another and stereopsis is achieved, until there is information specifying which retina is on the left and which on the right, there will be an indeterminacy with regards to depth perception – e.g., objects that appear concave in one configuration will appear convex in the other. Similarly, until there is information about the positions of rods and cones on the retina, a given pattern of photoreceptor activity won’t determine the shape of the light projected onto the retina that causes that pattern of activity. So the spatial features of even the monocular visual field are wildly underdetermined in minimal vision. The upshot is that restricted aspatialism is true of vision, just as it is true of audition (chapter 3), touch (chapter 4), and proprioception (chapter 4). To arrive at spatial vision, with respect to retinocentric or stereoptic space, we need additional information concerning invariant These points are introduced in more detail in chapter 3 and also discussed in chapter 4. 36 219 features of the visual system (arrangement of rods and cones on the retinas, position of the retinas relative to one another). As we saw with audition, touch, and proprioception this sort of information becomes available only when we combine inputs from the various peripheral sensory organs. For instance, we learn the placement of photoreceptors on the retinas by tracking changes in retinal stimulation in conjunction with changes in head position detected by proprioception. This is how we come to correct for the uneven distribution of photoreceptors and avoid perceiving objects casting projections to the centers of the retinas as substantially larger than those casting projections to the peripheries, as mention in §1.1. But integrating retinal and non-retinal perceptual inputs does more than provide retinocentric and stereoscopic spatial content to vision. It also allows vision to take on egocentric spatial content. The burden of this chapter has been to show that this can occur without appealing to action. The perceptual resources of proprioception, equilibrioception, and vision are enough. 220 References Ahveninen, J., Jaaskelainen, I. P., Raij, T., Bonmassar, G., Devore, S., Hamalainen, M., Levanen, S., Lin, F., Sams, M., Shinn-Cunningham, B. G., Witzel, T. and Belliveau, J. W. (2006). ‘Task-Modulated “What” and “Where” Pathways in Human Auditory Cortex.’ PNAS 103(39), 14608–14613. Alsmith, A. J. T. (2021). ‘Bodily Structure and Body Representation.’ Synthese 198, 2193-2222. Araujo, H. F., Kaplan, J. Damasio, H., and Damasio, A. (2015). ‘Neural Correlates of Different Self Domains.’ Brains and Behavior 5(12), article e00409. Artiga, M. (2021). ‘Beyond Black Dots and Nutritious Things: a Solution to the Indeterminacy Problem.’ Mind and Language 36(3), 471-490. Artiga, M. and Sebastian, M. A. (2018). ‘Informational Theories of Content and Mental Representation.’ Review of Philosophy and Psychology 2018: 1-15 Assaiante, C., Barlaam, F., Cignetti, F, and Vaugoyeau, M. (2014). ‘Body Schema Building During Childhood and Adolescence: A Neurosensory Approach,’ Clinical Neurophysiology 44, 3-12. Auvray, M., and Spence, C. (2008). ‘The Multisensory Perception of Flavor.’ Consciousness and Cognition 17, 1016-1031. Azañòn, E., Longo, M. R., Soto-Faraco, S., and Haggard, P. (2010). ‘The Posterior Parietal Cortex Remaps Touch into External Space’, Current Biology 20, 1304-1309. Baird, J.W. (1903). 
‘The Influence of Accommodation and Convergence upon the Perception of Depth.’ The American Journal of Psychology, 14(2), 150-200. Bajo, V . M., Nodal, F. R., Moore, D. R. and King, A. J. (2010). ‘The Descending Corticocollicular Pathway Mediates Learning-Induced Auditory Plasticity.’ Nature Neuroscience 13(2), 253–262. Bajo, V . M. and King, A. J. (2013). ‘Cortical Modulation of Auditory Processing in the Midbrain.’ Frontiers in Neural Circuits 6, article 114. Banks, M.S., Hoffman, D.M., Kim, J., and Wetzstein, G. (2016). ‘3D Displays.’ Annual Review of Vision Science 2(1), 397-435. Bayne, T. (2014). ‘The Multisensory Nature of Perceptual Consciousness.’ In D. J. Bennett and C. S. Hill (eds.), Sensory Integration and the Unity of Consciousness (pp. 15-36). Cambridge, MA: MIT Press. 221 Bechtel, W. (2008). Mental Mechanisms: Philosophical Perspectives On Cognitive Neuroscience. New York: Routledge. Bechtel, W. and Abrahamsen (2005). ‘Explanation: a Mechanistic Alternative.’ Studies in History and Philosophy of Biological and Biomedical Sciences 36, 421-441. Bennett, D. (2011). ‘How the World is Measured Up in Size Experience.’ Philosophy and Phenomenological Research 83(2), 345-365. Bennett, D. (2022). ‘Measuring Up the World in Size and Distance Perception.’ Erkenntnis, <https://doi.org/10.1007/s10670-022-00543-9>. Berger, C. C., and Ehrsson, H. H. (2013). ‘Mental Imagery Changes Multisensory Perception.’ Current Biology 23, 1367-1372. Berkeley, G. (1709). An Essay Towards a New Theory of Vision. Dublin: Aaron Rhames. Berkeley, G. (1713 [1979]). Three Dialogues between Hylas and Philonous. Indianapolis, IN: Hackett Publishing. Berkeley, G. (1734 [1998]). A Treatise Concerning the Principles of Human Knowledge. Dancy, J. (ed.). Oxford: Oxford University Press. Bertelson, P., Vroomen, J., de Gelder, B., and Driver, J. (2000). ‘The Ventriloquism Effect Does Not Depend on the Direction of Deliberate Visual Attention.’ Perception and Psychophysics 62(2), 321-332. Bhatt, R. S., Hock, A., White, H., Jubran, R., and Galati, A. (2016). ‘The Development of Body Structure Knowledge in Infancy.’ Child Development Perspectives 10, 45-52. Biederman, I. and Bar, M. (1999). ‘One-shot Viewpoint Invariance in Matching Novel Objects.’ Vision Research 39, 2885-2899. Blauert, J. (1997). Spatial Hearing (2nd ed.). Cambridge, MA: MIT Press. Block, N. (1990). ‘Inverted Earth.’ Philosophical Perspectives 4, 53-79. Block, N. (1996). ‘Mental Paint and Mental Latex.’ Philosophical Issues 7, 19-49. Block, N. (2010). ‘Attention and Mental Paint.’ Philosophical Issues 20(1), 23-63. Bourget, D. (2017). ‘Representationalism and Sensory Modalities: an Argument for Intermodal Representationalism.’ American Philosophical Quarterly 54(3), 251-268. 222 Brainard, M. S. and Knudsen, E. I. (1993). ‘Experience-Dependent Plasticity in the Inferior Colliculus: a Site for Visual Calibration of the Neural Representation of Auditory Space in the Barn Owl.’ Journal of Neuroscience 13, 4589-4608. Brecht, M. (2017). ‘The Body Model Theory of Somatosensory Cortex.’ Neuron 94(5), 985-92. Bregman, A. (1990). Auditory Scene Analysis: The Perceptual Organization of Sound. Cambridge, MA: MIT Press. Bremner, A. J. (2017). ‘The Origin of Body Representation.’ In de Vignemont, F. and Alsmith, A.J.T., (eds.), The Subject’ s Matter: Self-Consciousness and the Body (pp. 3-32). Cambridge, MA: MIT Press. Brennan, S. C., Davies, T. S., Schepelmann, M., and Riccardi, D. (2014). 
‘Emerging Roles of the Extracellular Calcium-Sensing Receptor in Nutrient Sensing: Control of Taste Modulation and Intestinal Hormone Secretion.’ British Journal of Nutrition 111, S16-S22. Briscoe, R. (2008). ‘Vision, Action, and Make-Perceive.’ Mind and Language 23(4), 457-497. Briscoe, R. (2009). ‘Egocentric Spatial Representation in Action and Perception.’ Philosophy and Phenomenological Research 79(2), 423-460. Briscoe, R. (2016). ‘Multisensory Processing and Perceptual Consciousness: Part I.’ Philosophy Compass 11(2), 121-133. Briscoe, R. (2017). ‘Multisensory Processing and Perceptual Consciousness: Part II.’ Philosophy Compass 12, 1-13. Briscoe, R. (2019). ‘Bodily Action and Distal Attribution in Sensory Substitution.’ In Macpherson, F. (ed.), Sensory Substitution and Augmentation (pp. 173-186). Oxford: Proceedings of the British Academy. Briscoe, R. (2021). ‘Bodily Awareness and Novel Multisensory Features.’ Synthese 198, 3913-3941. Briscoe, R. and Grush, R. (2015). ‘Action-Based Theories of Perception.’ The Stanford Encyclopedia of Philosophy, <https://plato.stanford.edu/entries/action-perception> (accessed 5/27/2022). Bronkhorst, A. W. and Houtgast, T. (1999). ‘Auditory Distance Perception in Rooms.’ Nature 397, 517-520. Brown, A. D., Jones, H. G., Kan, A., Thakkar, T., Stecker, G. C., Goupell, M. J. and Litovsky, R. Y . (2015). ‘Evidence for a Neural Source of the Precedence Effect in Sound Localization.’ Journal of Neurophysiology 114, 2991-3001. 223 Brungart, D. S. and Rabinowitz, W. M. (1999). ‘Auditory Localization of Nearby Sources: Head- Related Transfer Functions.’ Journal of the Acoustical Society of America 106(3), 1465-1479. Brungart, D. S., Drulach, N. I. and Rabinowitz, W. M. (1999). ‘Auditory Localization of Nearby Sources II: Localization of a Broadband Source.’ Journal of the Acoustical Society of America 106(4), 1956-1968. Budinger, E., Heil, P., Hess, A and Sceich, H. (2006). ‘Multisensory Processing via Early Cortical Stages: Connections of the Primary Auditory Cortical Field with Other Sensory Systems.’ Neuroscience 143, 1065-1083. Butler, R. A., Levy, E. T. and Neff, W. D. (1980). ‘Apparent Distance of Sounds Recorded in Echoic and Anechoic Chambers.’ Journal of Experimental Psychology: Human Perception and Performance 6, 745–750. Byrne, A. (2001). ‘Intentionalism Defended.’ Philosophical Review 110, 199-240. Caclin, A., Soto-Faraco, S., Kingstone, A., and Spence, C. (2002) ‘Tactile ‘Capture’ of Audition.’ Perception and Psychophysics 64, 616-630. Callan, A., Callan, D., and Ando, H. (2015). ‘An fMRI Study of the Ventriloquism Effect.’ Cerebral Cortex 25, 4248-4258. Calvert, G., Spence, C. and Stein, B. E. (2004). The Handbook of Multisensory Processes. Cambridge, MA: MIT Press. Cariani, P. A. and Delgutte, B. (1996). ‘Neural Correlates of the Pitch of Complex Tones: I. Pitch and Pitch Salience.’ Journal of Neurophysiology 76(3), 1698-1716. Casati, R. and Dokic, J. (2005). ‘Sounds.’ The Stanford Encyclopedia of Philosophy, <https:// plato.stanford.edu/entries/sounds> (accessed 4/23/2020). Casati, R. and Dokic, J. (2009). ‘Some Varieties of Spatial Hearing.’ In M. Nudds and C. O’Callaghan (eds.), Sounds and Perception (pp. 97-110). Oxford: Oxford University Press. Cassam, Q. (2005). ‘Space and Objective Experience.’ In J. L. Bermudez (ed.), Thought, Reference, and Experience: Themes from the Philosophy of Gareth Evans (pp. 258-289). Oxford: Clarendon Press. Cassam, Q. (2007). The Possibility of Knowledge. Oxford: Oxford University Press. 
Chen, Y .-C., and Spence, C. (2017). ‘Assessing the Role of the ‘Unity Assumption’ on Multisensory Integration: a Review.’ Frontiers in Psychology 8, 445. 224 Cheng, T. and Haggard, P. (2018). ‘The Recurrent Model of Bodily Spatial Phenomenology.’ Journal of Consciousness Studies 25(3-4), 55-70. Cheng, T. (2019). ‘On the Very Idea of a Tactile Field: a Plea for Skin Space.’ In T. Cheng, O. Deroy, and C. Spence (eds.), The Spatial Senses: Philosophy of Perception in an Age of Science (pp. 226-247). New York: Routledge. Chialvo, D. R. (2003). ‘How We Hear What Is Not There: A Neural Mechanism for the Missing Fundamental Illusion.’ Chaos 13(4), 1226-1230. Chomanski, B. (2017). ‘Balint’s Syndrome, Visual Motion perception, and Awareness of Space.’ Erkenntnis 83(6), 1265-1284. Christison-Lagay, K. L., Gifford, A. M. and Cohen, Y . E. (2015). ‘Neural Correlates of Auditory Scene Analysis and Perception.’ International Journal of Psychophysiology 95(2), 238-245. Coady, C. A. J. (1974). ‘The Senses of Martians.’ Philosophical Review 83(1), 107-125. Connolly, K. (2014). ‘Making Sense of Multiple Senses.’ In R. Brown (ed.), Consciousness iNside and out: Phenomenology, Neuroscience, and the Nature of Experience (pp. 351-364). New York: Springer. Crane, T. (2003). ‘The iNtentional Structure of Consciousness.’ In Q. Smith and A. Jokic (eds.), Consciousness: New Philosophical Perspectives (pp. 33-56). Oxford: Oxford University Press. Crane, T. (2007). ‘Intentionalism.’ In A. Beckermann and B. P. McLaughlin (eds.), Oxford Handbook to the Philosophy of Mind (pp. 365-373). Oxford: Oxford University Press. Craske, B., Kenny, F. T., Keith, D. (1984). ‘Modifying an Underlying Component of Perceived Arm Length: Adaptation of Tactile Location Induced by Spatial Discordance.’ Journal of Experimental Psychology: Human Perception and Performance 10(2), 307–317. Craver, C. F. (2002). ‘Interlevel Experiments, Multilevel Mechanisms in the Neuroscience of Memory.’ Philosophy of Science 69, S83-S97. Craver, C. F. (2003). ‘The Making of a Memory Mechanism.’ Journal of the History of Biology 36, 153-195. Craver, C. F. (2005). ‘Beyond Reduction Mechanisms, Multifield Integration, and the Unity of Neuroscience.’ Studies in History and Philosophy of Biological and Biomedical Sciences 36, 373-397. Craver, C. F. (2009). ‘Mechanisms and Natural Kinds.’ Philosophical Psychology 22, 575-594. 225 Craver, C. F. and Darden, L. (2001). ‘Discovering Mechanisms in Neurobiology: the Case of Spatial Memory.’ Theory and Method in the Neurosciences, 112-137. Craver, C. F. and Darden, L. (2005). ‘Introduction: Mechanisms Then and Now.’ Studies in History and Philosophy of Biological and Biomedical Sciences 36, 233-244. Craver, C. F. and Darden, L. (2013). In Search of Mechanisms. Chicago: University of Chicago Press. Creem-Regehr, S. H., Stefanucci, J. K., and Thompson, W. B. (2015). ‘Perceiving Absolute Scale in Virtual Environments: How Theory and Application Have Mutually Informed the Role of Body-Based Perception.’ In B. H. Ross (ed.), Psychology of Learning and Motivation, vol. 62 (pp. 195-224). Cambridge, MA: Academic Press. Crowther, T. (2006). ‘Two Conceptions of Conceptualism and Nonconceptualism.’ Erkenntnis 65(2), 245-276. Darden, L. (2002). ‘Strategies for Discovering Mechanisms: Schema Instantiation, Modular Subassembly, Forward/Backward Chaining,’ Philosophy of Science 69, S354-S365. Degenaar, J. (2014). 
‘Through the Inverting Glass: First-Person Observations on Spatial Vision and Imagery.’ Phenomenology and Cognitive Sciences 13, 373-393. Deroy, O., Spence, C., and Noppeney, U. (2016). ‘Causal Metacognition: Monitoring Uncertainty About the Causal Structure of the World.’ Trends in Cognitive Sciences 20, 736-747. Dixon, M.W.; Wraga, M.; Proffitt, D.R.; and Williams, G.C. (2000). ‘Eye Height Scaling of Absolute Size in Immersive and Nonimmersive Displays.’ Journal of Experimental Psychology: Human Perception and Performance 26(2), 582-593. Dretske, F. (1981). Knowledge and the Flow of Information. Cambridge, MA: MIT Press. Dretske, F. (1986). ‘Misrepresentation.’ In R. Bogdan (ed.), Belief: Form, Content and Function (pp. 17-36). Oxford: Oxford University Press. Dretske, F. (1988). Explaining Behavior: Reasons in a World of Causes. Cambridge, MA: MIT Press. Dretske, F. (1995). Naturalizing the Mind. Cambridge, MA: MIT Press. Driver, J., and Noesselt, T. (2008). ‘Multisensory Interplay Reveals Crossmodal Influences on ‘Sensory-Specific’ Brain Regions, Neural Responses, and Judgments.’ Neuron 57, 11-23. Ehrlich, S., Lord, A. R., Geisler, D., Borchardt, V ., Boehm, I. Seidel, M., Ritschel, F., Schulze, A., King, J. A., Weidner, K., Roessner, V ., and Walter, M. (2015). ‘Reduced Functional 226 Connectivity in the Thalamo-Insular Subnetwork in Patients With Acute Anorexia Nervosa.’ Human Brain Mapping 36, 1772-1781. Ekroll, V ., Mertens, K., and Wagemans, J. (2018). ‘Amodal V olume Completion and the Thin Building Illusion.’ i-Perception 9(3), 1-21. Eliasmith, C. (2000). How Neurons Mean: a Neurocomputational Theory of Representational Content. (unpublished dissertation, Washington University in St. Louis) Eliasmith, C. (2005a). ‘Neurosemantics and Categories.’ In H. Cohen and C. Lafebvre (eds.), Handbook of Categorization in Cognitive Science (pp. 1035-1054). Amsterdam: Elsevier. Eliasmith, C. (2005b). ‘A New Perspective on Representational Problems.’ Journal of Cognitive Science 6, 97-123. Eliasmith, C. (2013). How to Build a Brain: a Neural Architecture for Biological Cognition Oxford: Oxford University Press. Epstein, W. and Park, J.N. (1963). ‘Shape Constancy: Functional Relationships and Theoretical Formulations.’ Psychological Bulletin 60, 265-288. Evans, G. (1982). Varieties of Reference. Oxford: Clarendon Press. Evans, G. (1985). Collected Papers. Oxford: Oxford University Press. Fagard, J., Esseily, R, Jacquey, L., O’Regan, K., and Somogyi, E. (2018). ‘Fetal Origin of Sensorimotor Behavior.’ Frontiers in Neurobotics 12, 23. Fardo, F., Beck, B., Cheng, T., and Haggard, P. (2018). ‘A Mechanisms for Spatial Perception on the Human Skin.’ Cognition 178, 236-243. Feldman, D. E. and Knudsen, E. I. (1997). ‘An Anatomical Basis for Visual Calibration of the Auditory Space Map in the Barn Owl’s Midbrain.’ Journal of Neuroscience 17(17), 6820-6837. Fodor, J. (1990). A Theory of Content. Cambridge, MA: MIT Press. Fodor, J. (1996). ‘Deconstructing Dennett’s Darwin.’ Mind and Language 11, 246-262. Foley, J. (1980). ‘Binocular Distance Perception.’ Psychological Review 87, 411-434. French, C. (2018). ‘Balint’s Syndrome, Object Seeing, and Spatial Perception.’ Mind and Language 33(3), 221-241. Fulkerson, M. (2011). ‘The Unity of Haptic Touch.’ Philosophical Psychology 24(4), 493-516. 227 Fulkerson, M. (2014a). ‘Explaining Multisensory Experience.’ In R. Brown (ed.), Consciousness Inside and out: Phenomenology, Neuroscience, and the Nature of Experience (pp. 365-373). New York: Springer. 
Fulkerson, M. (2014b). The First Sense: a Philosophical Study of Human Touch. Cambridge, MA: MIT Press. Fulkerson, M. (2014c). ‘Rethinking the Senses and Their Interactions: the Case for Sensory Pluralism.’ Frontiers in Psychology 5, 1-14. Gallagher, S. (1998). ‘Body Schema and Intentionality.’ In J. L. Bermudez, A. Marcel, and N. Glan (eds.), The Body and The Self (pp. 225-244). Cambridge, MA: MIT Press. Gallagher, S. (2005). How the Body Shapes the Mind. Oxford: Oxford University Press. Gallagher, S., Butterworth, G. E., Lew, A., and Cole, J. (1998). ‘Hand-Mouth Coordination, Congenital Absence of Limb, and Evidence for Innate Body Schemas.’ Brain and Cognition 38(1), 53-65. Gadsby, S. (2018). ‘How are the Spatial Characteristics of the Body Represented? A Reply to Pitron and de Vignemont.’ Consciousness and Cognition 62, 163-168. Gao, Z., Hwang, A., Zhai, G., and Peli, E. (2018). ‘Correcting Geometric Distortions in Stereoscopic 3D Imagery.’ PLoS ONE 13(10), article e0205032. Gao, Z., Zhai, G., and Yang, X. (2020). ‘Stereoscopic 3D Geometric Distortions Analyzed from the Viewer’s Point of View.’ PLoS ONE 15(10), article e0240661. Ghazanfar, A., and Schroeder, C. E. (2006). ‘Is Neocortex Essentially Multisensory?’ Trends in Cognitive Sciences 10, 278-285. Giurgola, S., Pisoni, A., Maravita, A., Vallar, G., Bolognini, N. (2019). ‘Somatosensory Cortical Representation of Body Size.’ Human Brain Mapping 40, 3534-3547. Goodale, M.A. and Milner, A.D. (1992). ‘Separate Visual Pathways for Perception and Action.’ Trends in Neurosciences 15(1), 20-25. Goodman, N. (1955). Fact, Fiction and Forecast. Cambridge, MA: Harvard University Press. Grantham, D. W. (1984). ‘Interaural Intensity Discrimination: Insensitivity at 1000 Hz.’ Journal of the Acoustical Society of America 75, 1191-1194. Gray, R. (2013). ‘Is There a Space of Sensory Modalities?’ Erkenntnis 78(6), 1259-1273. Green, E. J. and Schellenberg, S. (2018). ‘Spatial Perception: The Perspectival Aspect of Perception.’ Philosophy Compass 13 (2), article e12472. 228 Grice, H. P. (1962). ‘Some Remarks About the Senses.’ In R. J. Butler (ed.), Analytic Philosophy, First Series (pp. 133-153). Oxford: Oxford University Press. Griffiths, P. and Goode, P. E. (1995). ‘The Misuse of Sober’s Selection for/Selection of Distinction.’ Biology and Philosophy 10, 99-107. Grothe, B., Pecka, M. and McAlpine, D. (2010). ‘Mechanisms of Sound Localization in Mammals.’ Physiological Reviews 90, 983-1012. Grouios, G. (1996). ‘Phantom Limb Perceptuomotor “Memories” in a Congenital Limb Child.’ Medical Science Research 24, 503-504. Grush, R. (2000). ‘Self, World and Space: The Meaning and Mechanisms of Ego- and Allocentric Spatial Representation.’ Brain and Mind 1(1), 59-92. Grush, R. (2007). ‘Skill Theory V2.0: Dispositions, Emulation, and Spatial Perception.’ Synthese 159(3), 389-416. Guinan, J. J. (2018). ‘Olivocochlear Efferents: Their Action, Effects, Measurement and Uses, and the Impact on the New Conception of Cochlear Mechanical Responses.’ Hearing Research 362, 38-47. Gurfinkel, V . S., and Levick, Y . S. (1991). ‘Perceptual and Automatic Aspects of the Postural Body Scheme.’ In J. Paillard (ed.), Brain and Space (pp. 147-162). Oxford University Press. Haggard, P., Chen, T., Beck, B., Fardo, F. (2017). ‘Spatial Perception and the Sense of Touch.’ In de Vignemont, F. and Alsmith, A.J.T. (eds.), The Subject’ s Matter: Self-Consciousness and the Body (pp. 97-114). Cambridge MA: MIT Press. Häkkinen, J., Hakala, J., Hannuksela, M., and Oittinen, P. (2011). 
‘Effect of Image Scaling on Stereoscopic Movie Experience.’ Proceedings, Society of Photo-Optical Instrumentation Engineers 7863, Stereoscopic Displays and Applications XXII, article 78630R. Harman, G. (1990). ‘The Intrinsic Quality of Experience.’ Philosophical Perspectives, 4: 31-52. Harris, C. (1980). ‘Insight or out of Sight?: Two Examples of Perceptual Plasticity in the Human Adult.’ In C. Harris (ed.), Visual Coding and Adaptability (pp. 95–149). Hillsdale, NJ: Lawrence Erlbaum. Hartmann, W. M., Rakerd, B., Crawford, Z. D. and Zhang, P. X. (2016). ‘Transaural Experiments and a Revised Duplex Theory for the Localization of Low-Frequency Tones.’ Journal of the Acoustical Society of America 139(2), 968-985. Hayat, T. T. A. and Rutherford, M. A. (2018). ‘Neuroimaging Perspectives on Fetal Motor Behavior.’ Neuroscience and Biobehavioral Reviews, 92: 390-401. 229 Head, H. (1920). Studies in Neurology, vol. 2. London: Oxford University Press. Heil, J. (1983). Perception and Cognition. Berkeley, CA: University of California Press. Henning, G. B. (1974). ‘Detectability of Interaural Delay in High-Frequency Complex Waveforms.’ Journal of the Acoustical Society of America 55, 84-90. Hill, C. (2009). Consciousness. Cambridge: Cambridge University Press. Hill, C. (2014). Meaning, Mind, and Knowledge. Oxford: Oxford University Press. Hill, C. (2016). ‘Perceptual Relativity.’ Philosophical Topics 44(2), 179-200. Hill, C. (2020). ‘Appearance and Reality.’ Philosophical Issues 30, 175-191. Ho, J. and Lenggenhager, B. (2021). ‘Neural Underpinnings of Body Image and Body Schema Disturbances.’ In Y . Ataria, S. Tanaka, and S. Gallagher (eds.), Body Schema and Body Image (pp. 267-284). Oxford: Oxford University Press. Hoffman, P. N., Van Riswick, J. G. A. and Van Opstal, A. J. (1998). ‘Relearning Sound Localization with New Ears.’ Nature Neuroscience 1(5), 417-421. Holmes, N. P. and Spence, C. (2006). ‘Beyond the Body Schema: Visual, Prosthetic, and Technological Contributions to Bodily Perception and Awareness.’ In G. Knoblich, I. M. Thornton, M. Grosjean, and M. Shiffrar (eds.), Human Body Perception From the Inside Out (pp. 15-64). Oxford: Oxford University Press. Huemer, M. (2001). Skepticism and the Veil of Perception. Lanham, MD: Rowman and Littlefield. Hume, D. (1777 [1993]). An Enquiry Concerning Human Understanding. E. Steinberg (ed.) Indianapolis, IN: Hackett Publishing. Hurley, S.L. (1998). Consciousness and Action. Cambridge, MA: Harvard University Press. Hurley, S.L. and Noë, A. (2003). ‘Neural Plasticity and Consciousness.’ Biology and Philosophy 18, 131-168. IJsselsteijn, W.A., de Ridder, H., and Vliegen, J. (2002). ‘Subjective Evaluation of Stereoscopic Images: Effects of Camera Parameters and Display Duration.’ IEEE Transactions on Circuits and Systems for Video Technology 10(2), 225-233. Ito, S., Si, Y ., Feldheim, D. A. and Litke, A. M. (2020). ‘Spectral Cues are Necessary to Encode Azimuthal Auditory Space in the Mouse Superior Colliculus.’ Nature Communications 11, article 1087. 230 Javer, A. R. and Schwartz, D. W. (1995). ‘Plasticity in Human Directional Hearing.’ Journal of Otolaryngology 24, 111. Jones, S. S. (2009). ‘The Development of Imitation in Infancy.’ Philosophical Transactions of the Royal Society, Series B: Biological Sciences 364(1528), 2325-2335. Joynson, R. B. (1958). ‘An Experimental Synthesis of the Associationist and Gestalt Accounts of the Perception of Size. Part II.’ Quarterly Journal of Experimental Psychology 10, 142-154. Kaess, D. W. (1978). 
‘Importance of Relative Width Differences and Instructions on Shape Constancy Performance.’ Perception 7, 179-185. Kanayama, N. and Hiromitsu, K. (2021). ‘Triadic Body Representations in the Human Cerebral Cortex and Peripheral Nerves.’ In Y . Ataria, S. Tanaka, and S. Gallagher (eds.), Body Schema and Body Image (pp. 133-151). Oxford: Oxford University Press. Kant, I. (1768 [1991]). ‘On the First Ground of the Distinction of Regions of Space.’ In J. Van Cleve and R. E. Frederick (eds.), The Philosophy of Left and Right (pp. 27–33). Boston, MA: Kluwer Academic Publishers. Kant, I. (1780 [1990]). Critique of Pure Reason. J. M. D. Meiklejohn (trans.). Amherst, NY: Promethius Books. Keeley, B. (2002). ‘Making Sense of the Senses: Individuating Modalities in Humans and Other Animals.’ Journal of Philosophy 99(1), 5-28. Kim, J. and Interrante, V . (2017). ‘Dwarf or Giant: The Influence of Interpupillary Distance and Eye Height on Size Perception in Virtual Environment.’ In R. Lindeman, G. Bruder, and D. Iwai (eds.), International Conference on Artificial Reality and Telexistence Eurographics Symposium on Virtual Environments. King, A. J. and Palmer, A. R. (1983). ‘Cells Responsive to Free-Field Auditory Stimuli in the Guinea Pig Superior Colliculus: Distribution and Response Properties.’ Journal of Physiology 342, 361-381. Kirsch, D. (2003). ‘Implicit and Explicit Representation.’ Encyclopedia of Cognitive Science 2, 478-481. Klein, C. (2007). ‘Kicking the Kohler Habit.’ Philosophical Psychology 20(5), 609-619. de Klerk, C. C. J. M., Filippetti, M. L., and Rigato, S. (2021). ‘The Development of Body Representations: an Associative Learning Account.’ Proceedings of the Royal Society, B, 288, article 20210070. 231 Kohler, I. (1961). ‘Experiments with Goggles.’ Scientific American 206, 62–72. Kohler, I. (1964). ‘The Formation and Transformation of the Perceptual World.’ Psychological Issues 3, 1–173. Kolarik, A. J., Cirstea, S. and Pardhan, S. (2013). ‘Discrimination of Virtual Auditory Distance Using Level and Direct-to-Reverberant Ratio Cues.’ The Journal of the Acoustical Society of America 134, 3395-3398. Kolarik, A. J., Moore, B. C. J., Zahorik, P., Cirstea, S. and Pardhan, S. (2016). ‘Auditory Distance Perception in Humans: A Review of Cues, Development, Neuronal Bases, and Effects of Sensory Loss.’ Attention, Perception, and Psychophysics 78, 373-395. Konishi, M. (1993). ‘Listening with Two Ears.’ Scientific American 268(4), 66-73. Kopco, N. and Shinn-Cunningham, B. G. (2011). ‘Effect of Stimulus Spectrum on Distance Perception for Nearby Sources.’ The Journal of the Acoustical Society of America 130, 1530-1541. Kopco, N., Huag, S., Belliveau, J. W., Raij, T., Tengshe, C. and Ahveninen, J. (2012). ‘Neuronal Representations of Distance in Human Auditory Cortex.’ PNAS 109(27), 11019-11024. Kopco, N., Doreswamy, K. K., Huang, S., Rossi, S. and Ahveninen, J. (2020). ‘Cortical Auditory Distance Representation Based on Direct-to-Reverberant Energy Ratio.’ NeuroImage 208, article 116426. Kugiumutzakis, J. (1999). ‘Genesis and Development of Early Infant Mimesis to Facial and V ocal Models.’ In J. Nadel and G. Butterworth (eds.), Imitation in infancy (pp. 36-59). Cambridge, UK: Cambridge University Press. Kulvicki, J. (2016). ‘Auditory Perspectives.’ In B. Nanay (ed.), Current Controversies in Philosophy of Perception (pp. 83–94). Abingdon: Routledge. Kwon, T., Li, Y ., Sawada, T., and Pizlo, Z. (2016). 
‘Gestalt-like Constraints Produce Veridical (Euclidean) Percepts of 3D Indoor Scenes.’ Vision Research 126, 264-277. Kytö, M.; Hakala, J.; Oittinen, P.; Häkkinen, J. (2012). ‘Effect of Camera Separation on the Viewing Experience of Stereoscopic Photographs.’ Journal of Electronic Imaging 21(1), article 011011. Lande, K. (2018). ‘The Perspectival Character of Perception.’ Journal of Philosophy 115(4), 187-214. Lee, S., Ran Kim, K., Ku, J., Lee, J. H., Namkoong, K., and Jung, Y . C. (2014). ‘Resting-State Synchrony Between Anterior Cingulate Cortex and Precuneus Relates to Body Shape 232 Concern in Anorexia Nervosa and Bulimia Nervosa.’ Psychiatry Research: Neuroimaging 22(1), 43-48. Leo, F., Bolognini, N., Passamonti, C., Stein, B. E., and Ladavas, E. (2008). ‘Cross-Modal Localization in Hemianopia: New Insights on Multisensory Integration.’ Brain 131, 855-865. Linden, D.E.J., Kallenbach, U., Heinecke, A., Singer, W., and Goebel, R.. (1999). ‘The Myth of Upright Vision: A Psychophysical and Functional Imaging Study of Adaptation to Inverting Spectacles.’ Perception 28, 469–481. Linton, P. (2020). ‘Does Vision Extract Absolute Distance from Vergence?’ Attention, Perception, and Psychophysics 82, 3176-3195. Little, A. D., Mershon, D. H. and Cox, P. H. (1992). ‘Spectral Content as a Cue to Perceived Auditory Distance.’ Perception 21, 405-416. Longo, M. R. (2016). ‘Types of Body Representation.’ In Y . Coello and M. H. Fischer (eds.), Foundations of Embodied Cognition, Volume 1: Perceptual and Emotional Embodiment (pp. 117-134). London: Routledge. Longo, M. R., Azañòn, E., and Haggard, P. (2010). ‘More Than Skin Deep: Body Representation Beyond Primary Somatosensory Cortex.’ Neuropsychologia 48, 655-668. Longo, M. and Haggard, P. (2010). ‘An Implicit Body Representation Underlying Position Sense.’ Proceedings of the National Academy of Sciences, USA 107(26), 11727-11732. Lycan, W.G. (1996). Consciousness and Experience. Cambridge, MA: MIT Press. Maclachlan, D. L. C. (1989). Philosophy of Perception. Englewood Cliffs, NJ: Prentice Hall. Macpherson, F. (2011a). ‘Cross-Modal Experiences.’ Proceedings of the Aristotelian Society 111(3), 429-468. Macpherson, F. (2011b). ‘Individuating the Senses.’ In F. Macpherson (ed.), The Senses: Classical and Contemporary Readings (pp. 3-36). Oxford: Oxford University Press. Macpherson, F. (2011c). ‘Taxonomizing the Senses.’ Philosophical Studies 153(1), 123-142. Macpherson, F. (2014). ‘The Space of Sensory Modalities.’ In D. Stokes, M. Matthen, and S. Biggs (eds.), Perception and its modalities (pp. 434-463). Oxford: Oxford University Press. Malpas, R. M. P. (1965). ‘The Location of Sound,’ In R. J. Butler (ed.), Analytical Philosophy, 2nd series (pp. 131-144). Oxford: Basil Blackwell. 233 Mandrigin, A. (2021). ‘Multisensory Integration and Sense Modalism.’ The British Journal for the Philosophy of Science 72(1), 27-49. Masaoka, K., Hanazato, A., Emoto, M., Yamanoue, H., Nojiri, Y ., and Okano, F. (2006). ‘Spatial Distortion Prediction System for Stereoscopic Images.’ Journal of Electonig Imaging 15(1), article 013002. Maratos, O. (1982). ‘Trends in the Development of Imitation in the First Six Months of Life.’ in T. G. Bever (ed.), Intersubjective Communication and Emotion in Ontogeny (pp. 81-101). Hillsdale, NJ: Lawrence Erlbaum Associates. Marshall, P. J. and Meltzoff, A. N. (2014). ‘Neural Mirroring Mechanisms and Imitation in Human Infants.’ Philosophical Transactions of the Royal Society, B 369: article 20130620. Marshall, P. J. and Meltzoff, A. N. 
(2015). ‘Body Maps in the Infant Brain.’ Trends in Cognitive Science 19(9), 499-505. Maruyama, Y ., Yasuda, R., Kuroda, M., and Eto, Y . (2012). ‘Kokumi Substances, Enhancers of Basic Tastes, Induce Responses in Calcium-Sensing Receptor Expressing Taste Cells.’ PLoS ONE 7(4), article e34489. Mather , J. and Lackner, J. (1975). ‘Adaptation to Visual Rearrangement Elicited by Tonic Vibration Reflexes.’ Experimental Brain Research 24(1), 103-105. Matthen, M. (2010). ‘On the Diversity of Auditory Objects.’ Review of Philosophy and Psychology 1(1), 63-89. Matthen, M. (2014). ‘Active Perception and Space.’ In D. Stokes, M. Matthen, and S. Biggs (eds.), Perception and Its Modalities (pp. 44-72). Oxford: Oxford University Press. Matthen, M. (2015). ‘Individuation of the Senses.’ In M. Matthen (ed.), The Oxford handbook of perception (pp. 567-586). Oxford: Oxford University Press. Melamed, L. E.; Halay, M.; and Gildow, J. (1973). ‘An Examination of the Role of Task Oriented Attention in the Use of Active and Passive Movement in Visual Adaptation.’ Journal of Experimental Psychology 72, 207-212. Meltzoff, A. N. (2007a). ‘The “Like Me” Framework for Recognizing and Becoming an Intentional Agent.’ Acta Psychologica 124, 26-43. Meltzoff, A. N. (2007b). ‘“Like Me”: a Foundation for Social Cognition.’ Developmental Science 10(1), 126-134. Meltzoff, A. N. and Marshall, P. J. (2018). ‘Human Infant Imitation as a Social Survival Circuit.’ Behavioral Sciences 24, 130-136. 234 Meltzoff, A. N. and Moore, M. K. (1977). ‘Imitation of Facial and Manual Gestures by Human Neonates,’ Science 198, 75-78. Meltzoff, A. N. and Moore, M. K. (1983). ‘Newborn Infants Imitate Adult Facial Gestures.’ Child Development 54, 702-809. Meltzoff, A. N. and Moore, M. K. (1989). ‘IMitation in Newborn Infants: Exploring the Range of Gestures Imitated and the Underlying Mechanisms.’ Developmental Psychology 25, 954-962. Meltzoff, A. N. and Moore, M. K. (1992). ‘Early Imitation Within a Functional Framework: the Importance of Person Identity, Movement, and Development.’ Infant Behavior and Development 15, 479-505. Meltzoff, A. N. and Moore, M. K. (1994). ‘Imitation, Memory, and the Representation of Persons.’ Infant Behavior and Development 17, 83-99. Meltzoff, A. N. and Moore, M. K. (1997). ‘Explaining Facial Imitation: a Theoretical Model.’ Early Development and Parenting 6(3-4), 179-192. Meltzoff, A.N., Saby, J. N. and Marshall, P. J. (2019). ‘Neural Representation of the Body in 60- Day-Old Human Infants.’ Developmental Science 22, article e12698. Melzack, R. (1989). ‘Phantom Limbs, the Self and the Brain.’ Canadian Psychology 30, 1-16. Meyer, H. (1842). ‘Über einige Täuschungen in der Entfernung w. Grösse der Gesichtsobjecte.’ Archiv für Physiologische Heilkunde. Milh, M, Kaminska, A., Huon, C., Lapillonne, A., Ben-Ari, Y ., and Khazipov, R. (2007). ‘Rapid Cortical Oscillations and Early Motor Activity in Premature Human Neonate.’ Cerebral Cortex 17(7), 1582-1594. Millikan, R. (1984). Language, Thought and Other Biological Categories. Cambridge, MA: MIT Press. Millikan, R. (1989). ‘Biosemantics.’ Journal of Philosophy 86, 281-297. Millikan, R. (2004). Varieties of Meaning. Cambridge, MA: MIT Press. Milner, A. D. and Goodale, M. H. (2006). The Visual Brain in Action, 2nd ed. Oxford: Oxford University Press. Mohr, H. M., Zimmerman, J., Röder, C., Lenz, C., Overbeck, G., and Grabhorn, R. (2010). ‘Separating Two Components of Body Image in Anorexia Nervosa Using fMRI,’ Psychological Medicine 40, 1519-1529. 235 Mon-Williams, M. 
and Tresilian, J.R. (1999). ‘Some Recent Studies on the Extraretinal Contributions to Distance Perception.’ Perception, 28(2), 167-181. Müller, G. B. (2003). ‘Embryonic Motility: Environmental Influences and Evolutionary Innovation.’ Evolution and Development 5, 56-60. Myowa-Yamakoshi, M. and Takeshita, H. (2006). ‘Do Human Fetuses Anticipate Self-Oriented Actions? A Study by Four-Dimensional (4D) Ultrasonography.’ Infancy 10(3), 289-301. Nagy, E., Kompagne, H., Orvos, H., Pal, A., Molnar, P., Janszky, I., Bardos, G. (2005). ‘Index Finger Movement Imitation in Human Neonates: Motivation, Learning, and Left-Hand Preference.’ Pediatric Research 58, 749-753. Nagy, E., Pal, A., and Orvos, H. (2014). ‘Learning to Imitate Individual Finger Movements by the Human Neonate.’ Developmental Science 17(6), 841-857. Neander, K. (1995). ‘Misrepresenting and Malfunctioning.’ Philosophical Studies 79(2), 109-141. Neander, K. (2013). ‘Toward an Informational Teleosemantics.’ In D. Ryder, J. Kingsbury, and K. Williford (eds.), Millikan and Her Critics. New York: Wiley. Neander, K. (2017). A Mark of the Mental. Cambridge, MA: MIT Press. Nelkin, N. (1990). Categorising the Senses. Mind and Language 5, 149-165. Nevalainen, P., Rahkonen, P., Pihko, E., Lano, A., Vanhatalo, S., Andersson, S., Lauronen, L. (2015). ‘Evaluation of Somatosensory Cortical Processing in Extremely Preterm Infants at Term With MEG and EEG’ Clinical Neurophysiology 126(2), 275-283. Noë, A. (2005). ‘Real Presence.’ Philosophical Topics 33(1), 235-264. Noë, A. (2006). Action in Perception. Cambridge, MA: MIT Press. Nudds, M. (2001). ‘Experiencing the Production of Sounds.’ European Journal of Philosophy 9, 210-229. Nudds, M. (2004). ‘The Significance of the Senses.’ Proceedings of the Aristotelian Society 104(1), 31-51. Nudds, M. (2009). ‘Sounds and Space.’ In M. Nudds and C. O’Callaghan (eds.), Sounds and Perception (pp. 69–96). Oxford: Oxford University Press. Nudds, N. (2010). ‘What are Auditory Objects?’ Review of Philosophical Psychology 1, 105-122. O’Callaghan, C. (2007). Sounds: a Philosophical Theory. Oxford: Oxford University Press. 236 O’Callaghan, C. (2008). ‘Seeing What You Hear: Cross-Modal Illusions and Perception.’ Philosophical Issues 18, 316-338. O’Callaghan, C. (2010). ‘Perceiving the Locations of Sounds.’ Review of Philosophical Psychology 1, 123–140. O’Callaghan, C. (2012). ‘Perception and Multimodality.’ In E. Margolis, R. Samuels, and S. Stich (eds.), Oxford handbook of philosophy of cognitive science (pp. 92-117). Oxford: Oxford University Press. O’Callaghan, C. (2014a). ‘Intermodal Binding Awareness.’ In D. J. Bennett and C. S. Hill (eds.), Sensory Integration and the Unity of Consciousness (pp. 73-103). Cambridge, MA: MIT Press. O’Callaghan, C. (2014b). ‘Not All Perceptual Experience Is Modality Specific.’ In D. Stokes, M. Matthen, and S. Biggs (eds.), Perception and its modalities (pp. 133-165). Oxford: Oxford University Press. O’Callaghan, C. (2015). ‘The Multisensory Character of Perception.’ Journal of Philosophy 112(10), 551-569. O’Callaghan, C. (2017a). Beyond Vision: Philosophical Essays. Oxford: Oxford University Press. O’Callaghan, C. (2017b). ‘Grades of Multisensory Awareness.’ Mind and Language 32(2), 155-181. O’Callaghan, C. (2019). A Multisensory Philosophy of Perception. Oxford: Oxford University Press. O’Dea, J. (2006). ‘Representationalism, Supervenience, and the Cross-Modal Problem.’ Philosophical Studies 130(2), 285-295. 
Ohsu, T., Amino, Y ., Nagasaki, H., Yamanaka, T., Takeshita, S., Hatanaka, T., Maruyama, Y ., Miyamura, N. and Eto, Y . (2010). ‘Involvement of the Calcium-Sensing Receptor in Human Taste Perception.’ The Journal of Biological Chemistry 285(2), 1016-1022. Oostenbroek, J. et al. (2016). ‘Comprehensive Longitudinal Study Challenges the Existence of Neonatal Imitation in Humans.’ Current Biology 26, 1334-1338. Oostenbroek, J. et al. (2018). ‘Re-Evaluating the Neonatal Imitation Hypothesis.’ Developmental Science 22, article e12720. O’Regan, J. K. and Noë, A. (2001). ‘A Sensorimotor Account of Vision and Visual Consciousness.’ Behavioral and Brain Sciences 24(5), 883-917. 237 O’Shaughnessy, B. (2000). Consciousness and the World. Oxford: Oxford University Press. O’Shaughnessy, B. (2009). ‘The Location of Perceived Sound.’ In M. Nudds and C. O’Callaghan (eds.), Sounds and Perception (pp. 111–125). Oxford: Oxford University Press. Orioli, G., Bremner, A. J. and Farroni, T. (2018). ‘Multisensory Perception of Looming and Receding Objects in Human Newborns.’ Current Biology 28(22), R1294-R1295. Paillard, J. (1999). ‘Body Schema and Body Image – a Double Dissociation in Deafferented Patients.’ In G. N. Gantchev, S. Mori, and J. Massion (eds.), MOtor Control: Today and Tomorrow (pp. 197-214). Sofia, Bulgaria: Academic Publishing House. Palmer, A. R. and King, A. J. (1982). ‘The Representation of Auditory Space in the Mammalian Superior Colliculus.’ Nature 299, 248-249. Papineau, D. (1987). Reality and Representation. Oxford: Blackwell. Papineau, D. (1993). Philosophical Naturalism. Oxford: Blackwell. Papineau, D. (1998). ‘Teleosemantics and Indeterminacy.’ Australasian Journal of Philosophy 76(1), 1-14. Parham, K., Zhao, H. B. and Kim, D. O. (1996). ‘Responses of Auditory Nerve Fibers of the Unanesthetized Decerebrate Cat to Click Pairs as Simulated Echoes.’ Journal of Neurophysiology 76, 17-29. Parham, K., Zhao, H. B. and Kim, D. O. (1998). ‘Responses of Anteroventral Cochlear Nucleus Neurons of the Unanesthetized Decerebrate Cat to Click Pairs as Simulated Echoes.’ Hearing Research 125, 131-146. Parker, A.J.; Smit, J.E.T.; and Krug, K. (2016). ‘Neural Architectures for Stereo Vision.’ Philosophical Transactions of the Royal Society B: Biological Sciences 371(1697), article 20150261. Pasnau, R. (1999). ‘What Is Sound?’ Philosophical Quarterly 49, 309-324. Peacock, C. (1983). Sense and Content. Oxford: Oxford University Press. Penfield, W. and Boldrey, E. (1937). ‘Somatic Motor and Sensory Representation in the Cerebral Cortex of Man as Studied by Electrical Stimulation.’ Brain 60, 389-443. Penfield, W. and Rasmussen, T. (1950). The Cerebral Cortex of Man; a Clinical Study of Localization of Function. New York: Macmillan. 238 Peterson, D. C. and Schofield, B. R. (2007). ‘Projections from Auditory Cortex Contact Ascending Pathways that Originate in the Superior Olive and Inferior Colliculus.’ Hearing Research 232(1-2), 67-77. Piaget, J. (1962). Play, Dreams, and Imitation in Childhood. New York: Norton. Pitron, V . and de Vignemont, F. (2017). ‘Beyond Differences Between the Body Schema and the Body Image: Insight From Body Hallucinations.’ Consciousness and Cognition 53, 115-121. Pitron, V ., Alsmith, A., de Vignemont, F. (2018). ‘How Do the Body Schema and the Body Image Interact?’ Consciousness and Cognition 65, 352-358. Pizlo, Z. (2008). 3D Shape: Its Unique Place in Visual Perception. Cambridge, MA: MIT Press. Poeck, K. (1964). 
‘Phantoms Following Amputation in Early Childhood and in Congenital Absence of Limbs.’ Cortex 1, 269-275. Priot, A-E., Neveua, P., Sillan, O., Plantier, J., Roumes, C., and Prablanc, C. (2012). ‘How Perceived Egocentric Distance Varies with Changes in Tonic Vergence.’ Experimental Brain Research 219, 457-465. Quinlan, D.J. and Culham, J.C. (2007). ‘FMRI Reveals a Preference for Near Viewing in the Human Parieto-Occipital Cortex.’ NeuroImage 36(1), 167-187. Rauschecker, J. P. and Tian, B. (2000). ‘Mechanisms and Streams for Processing of “What” and “Where” in Auditory Cortex.’ PNAS 97(22), 11800-11806. Ray, E. and Heyes, C. (2011). ‘Imitation in Infancy: the Wealth of the Stimulus.’ Developmental Science 14(1), 92-105. Reissland, N., Francis, B., Aydin, E., Mason, J., Schaal, B. (2014). ‘The Development of Anticipation in the Fetus: a Longitudinal Account of Human Fetal Mouth Movements in Reaction to and Anticipation of Touch.’ Developmental Psychobiology 56, 955-963. Renner, R.S.; Steindecker, E.; Müller, M.; Velichkovsky, B.M.; Stelzer, R.; Pannasch, S.; and Helmert, J.R. (2015). ‘The Influence of the Stereo Base on Blind and Sighted Reaches in a Virtual Environment.’ ACM Transactions on Applied Perception 12(2), article 7. Richardson, L. (2014). ‘Non Sense-Specific Perception and the Distinction Between the Senses.’ Res Philosophica 91(2), 215-239. Rochat, P. (2010). ‘The Innate Sense of the Body Develops To Become a Public Affair by 2-3 Years.’ Neuropsychologia 48(3), 738-745. Rock, I., and Harris, C. S. (1967). ‘Vision and Touch.’ Scientific American 216(5), 96-107. 239 Rock, I., and Victor, J. (1964). ‘Vision and Touch: an Experimentally Created Conflict Between the Two Senses.’ Science 143(3606), 594-596. Rogers, B. (2019). ‘Toward a New Theory of Stereopsis: A Critique of Vishwanath (2014).’ Psychological Review 126(1), 162-169. Roxbee-Cox, J. W. (1970). ‘Distinguishing the Senses.’ Mind 79, 1530-1550. Salminen, N. H., Jones, S. J., Christianson, G. B., Marquardt, T. and McAlpine, D. (2018). ‘A Common Period Representation of Interaural Time Differences in Mammalian Cortex.’ Neuroimage 167, 95-103. Scatena, P. (1990). ‘Phantom Representations of Congenitally Absent Limbs.’ Perceptual and Motor Skills 70, 1227-1232. Schellenberg, S. (2007). ‘Action and Self-Location in Perception.’ Mind 116(463), 603-631. Schellenberg, S. (2008). ‘The Situation-Dependency of Perception.’ Journal of Philosophy 105(2), 55-84. Schellenberg, S. (2010). ‘Perceptual Experience and the Capacity to Act.’ In N. Gangopadhya, M. Madary, and F. Spicer (eds.), Perception, Action, and Consciousness: Sensorimotor Dynamics and Two Visual Systems (pp. 145-159). Oxford: Oxford University Press. Schellenberg, S. (2019). ‘Perceptual Consciousness as a Mental Activity.’ Noûs 53(1), 114-133. Schnupp, J. W. H. and King, A. J. (1997). ‘Coding for Auditory Space in the Nucleus of the Brachium of the Inferior Colliculus in the Ferret.’ Journal of Neurophysiology 78, 2717-2731. Schroeder, C. E. and Foxe, J. (2005). ‘Multisensory Contributions to Low-Level, “Unisensory” Processing.’ Current Opinion in Neurobiology 15, 454–458. Schulte, P. (2018). ‘Perceiving the World Outside: How to Solve the Distality Problem for Informational Teleosemantics.’ The Philosophy Quarterly 68(271), 349-369. Schwab, I.R. (2012). Evolution’ s Witness: How Eyes Evolved. Oxford: Oxford University Press. Schwenkler, J. (2012). ‘Does Visual Spatial Awareness Require the Visual Awareness of Space?’ Mind and Language 27(3), 308-329. Schwoebel, J. 
and Coslett, H. B. (2005). ‘Evidence for Multiple, Distinct Representations of the Human Body.’ Journal of Cognitive Neuroscience 17, 543-553. Shea, N. (2007). ‘Consumers Need Information: Supplementing Teleosemantics with Input Conditions.’ Philosophy and Phenomenological Research 75(2), 404-435. 240 Shea, N. (2015). ‘Distinguishing Top-Down from Bottom-Up Effects.’ In D. Stokes, M. Matthen, and S. Biggs (eds.), Perception and Its Modalities (pp. 73-91). Oxford: Oxford University Press. Shea, N. (2018). Representation in Cognitive Science. Oxford: Oxford University Press. Shimojo, S., and Shams, L. (2001). ‘Sensory Modalities Are Not Separate Modalities: Plasticity and Interactions.’ Current Opinion in Neurobiology 11, 505-509. Shinn-Cunningham, B. G., Santarelli, S., Kopco, N. (2000). ‘Tori of Confusion: Binaural Localization Cues for Sources Within Reach of a Listener.’ Journal of the Acoustical Society of America 107(3), 1627-1636. Siegel, S. (2006). ‘Which Properties Are Represented in Perception?’ In T. S. Gendler and J. Hawthorne (eds.), Perceptual Experience (pp. 481-503). Oxford: Oxford University Press. Simpson, E. A., Murray, L., Paukner, A., and Ferrari, P. F. (2014). ‘The Mirror Neuron System as Revealed Through Neonatal Imitation: Presence From Birth, Predictive Power, and Evidence of Plasticity.’ Philosophical Transactions of the Royal Society, B 369, article 20130289. Sorensen, R. (2008). Seeing Dark Things: The Philosophy of Shadows. Oxford: Oxford University Press. Soroker, N., Calamaro, N., and Myslobodsky, M. (1995a). ‘McGurk Illusion to Bilateral Administration of Sensory Stimuli in Patients With Hemispatial Neglect.’ Neuropsychologia 33, 461-470. Soroker, N., Calamaro, N., and Myslobodsky, M. (1995b). ‘Ventriloquism Effect Reinstates Responsiveness to Auditory Stimuli in the ‘Ignored’ Space in Patients With Hemispatial Neglect.’ Journal of Clinical and Experimental Neuropsychology 17, 243-255. Soska, K. C. and Johnson, S. P. (2008). ‘Development of Three-Dimensional Object Completion in Infancy.’ Child Development 79(5), 1230-1236. Speaks, J. (2005). ‘Is there a Problem about Nonconceptual Content?’ Philosophical Review 114(3), 359-398. Speaks, J. (2015). The Phenomenal and the Representational. Oxford: Oxford University Press. Spence, C. (2015a). ‘Cross-Modal Perceptual Organization.’ In J. Wagemans (ed.), The Oxford Handbook of Perceptual Organization (pp. 649-664). Oxford: Oxford University Press. Spence, C. (2015b). ‘Multisensory Flavor Perception.’ Cell 161, 24-35. 241 Spence, C. (2016). ‘Oral Referral: on the Mislocalization of Odours to the Mouth.’ Food Quality and Preference 50, 117-128. Spence, C., and Bayne, T. (2014). ‘Is Consciousness Multisensory?’ In D. Stokes, M. Matthen, S. Biggs (eds.), Perception and its modalities (pp. 95-131). Oxford: Oxford University Press. Spence, C., and Frings, C. (2020). ‘Multisensory Feature Integration in (and out) of the Focus of Spatial Attention.’ Attention, Perception, and Psychophysics 82(1), 363-376. Stampe, D. (1977). ‘Toward a Causal Theory of Linguistic Representation.’ In P. French, H. K. Wettstein, and T. E. Uehling (eds.), Midwest Studies in Philosophy, 2 (pp. 42-63). Minneapolis: University of Minnesota Press. Stein, J. F. (1992). ‘The Representation of Egocentric Space in Posterior Parietal Cortex.’ Behavioral Brain Sciences 4, 691-700. Sterbing, S. J., Hartung, K. and Hoffmann, K-P. (2002). 
‘Spatial Tuning in the Superior Colliculus of the Guinea Pig in a Virtual Auditory Environment.’ Experimental Brain Research 142, 570-577. Sterbing, S. J., Hartung, K. and Hoffmann, K-P. (2003). ‘Spatial Tuning to Virtual Sounds in the Inferior Colliculus of the Guinea Pig in a Virtual Auditory Environment.’ Journal of Neurophysiology 90, 2648-2659. Stratton, G. (1897). ‘Vision without Inversion of the Retinal Image.’ Psychological Review 4, 341-360, 463-481. Strawson, P. F. (1959). Individuals. London: Methuen Press. Stevenson, R. J. (2009). The Psychology of Flavor. Oxford: Oxford University Press. Strzalkoski, N. D. J., Peters, R. M., Ingliss, J. T., and Bent, L. R. (2018). ‘Cutaneous Afferent Innervation of the Human Foot Sole: What We Can Learn From Single-Unit Recordings?’ Journal of Neurophysiology 120, 1233-1246. Swenson, H.A. (1932). ‘The Relative Influence of Accommodation and Convergence in the Judgment of Distance.’ The Journal of General Psychology 7(2), 360-380. Tang, C. S., Tan, V . W. K., Teo, P. S., Gorde, C. G. (2020). ‘Savoury and Kokumi Enhancement Increases Perceived Calories and Expectation of Fullness in Equicaloric Beef Broths.’ Food Quality and Preference 83, article 103897. Terhardt, E. (1979). ‘Calculating Virtual Pitch.’ Hearing Research 1, 155-182. 242 Terhardt, E., Stoll, G. and Seewann, M. (1982). ‘Algorithm for Extraction of Pitch and Pitch Salience from Complex Tonal Signals.’ Journal of the Acoustical Society of America 71(3), 679-688. Thompson, S. K., von Kriegstein, K., Deane-Pratt, A., Marquardt, T., Deichmann, R., Griffiths, T. D. and McAlpine, D. (2006). ‘Representation of Interaural Time Delay in the Human Auditory Midbrain.’ Nature Neuroscience 9, 1096-1098. Thomson, W.; Fleming, R.; Creem-Regehr, S.; and Stefanucci, J.K. (2011). Visual Perception from a Computer Graphics Perspective. Boca Raton: CRC Press. Thouless, R.H. (1931a). ‘Phenomenal Regression to the Real Object I.’ British Journal of Psychology 20, 339–359. Thouless, R.H. (1931b). ‘Phenomenal Regression to the Real Object II.’ British Journal of Psychology 21, 20-30. Town, S. M. and Bizley, J. K. (2013). ‘Neural and Behavioral Investigations into Timbre Perception.’ Frontiers in Systems Neuroscience 7, article 88. Trapeau, R. and Schőnwiesner, M. (2015). ‘Adaptation to Shifted Interaural Time Differences Changes Encoding of Sound Location in Human Auditory Cortex.’ NeuroImage 118, 26-38. Trapeau, R. and Schőnwiesner, M. (2018). ‘The Encoding of Sound Source Elevation in the Human Auditory Cortex.’ Journal of Neuroscience 38(13), 3253-3264. Tresilian, J.R., Mon-Williams, M., and Kelly B.M. (1999). ‘Increasing Confidence in Vergence as a Cue to Distance.’ Proceedings: Biological Sciences 266(1414), 39-44. Tye, M. (2000). Consciousness, Color, and Content. Cambridge MA: MIT Press. Tye, M. (2002). ‘Visual Qualia and Visual Content Revisited.’ In D. J. Chalmers (ed.), Philosophy of Mind: Classical and Contemporary Readings (pp. 447-456). Oxford: Oxford University Press. Ueda, Y ., Sakaguchi, M., Hirayama, K., Miyajima, R., and Kimizuka, A. (1990). ‘Characteristic Flavor Constituents in Water Extract of Garlic.’ Agricultural and Biological Chemistry 54(1), 163-169. Ullstadius, E. (1998). ‘Neonatal Imitation in a Mother-Infant Setting,’ Early Development and Parenting 7, 1-8. Usher, M. (2001). ‘A Statistical Referential theory of Content: Using Information Theory to Account for Misrepresentation.’ Mind and Language 16(3), 311-334. 243 Utsumi, A.; Milgram, P.; Takemura, H.; and Kishino, F. 
(1994). ‘Investigation of Errors in Perception of Stereoscopically Presented Virtual Object Locations in Real Display Space.’ Proceedings of the Human Factors and Ergonomics Society Annual Meeting 38(4), 250-254. Vagnoni, E. and Longo, M. (2019). ‘PEripersonal Space: its Functions, Plasticity, and Neural Basis.’ in T. Cheng, O. Deroy, C. Spence (eds.), The Spatial Senses: Philosophy of Perception in an Age of Science (pp. 199-225), New York: Routledge. Van Cleve, J. (2012). ‘Defining and Defending Nonconceptual Contents and States.’ Philosophical Perspectives 26, 411-430. Van Cleve, J. and Frederick, R., eds. (1991). The Philosophy of Left and Right: Incongruent Counterparts and the Nature of Space. Boston: Kluwer Academic Publishers. Vetter, R. J. and Weinstein, S. (1967). ‘The History of the Phantom in Congenitally Absent Limbs.’ Neuropsychologia 5, 335-338. Via, E., Goldberg, X., Sanchez, I., Forcano, I., Harrison, B. J., Davey, C. G., Pujol, J., Marinez- Zalacain, I., Fernandez-Aranda, F., Soriano-Mas, C., Cardoner, N., and Menchon, J. M (2018). ‘Self and Other Body Perception in Anorexia Nervosa: the Role of Posterior DMN Nodes.’ World Journal of Biological Psychiatry 19(3), 210-224. de Vignemont, F. (2007). ‘How Many Representations of the Body.’ Behavioral and Brain Science 30(2), 204-205. de Vignemont, F. (2014). ‘A Multimodal Conception of Bodily Awareness.’ Mind 123(492), 989-1020. de Vignemont, F. (2018). Mind the Body. Oxford: Oxford University Press. de Vignemont, F. and Alsmith, A.J.T., eds. (2017). The Subject’ s Matter: Self-Consciousness and the Body. Cambridge MA: MIT Press. Wadle, D. C. (2021). ‘Sensory Modalities and Novel Features of Perceptual Experience.’ Synthese 198(10), 9841-9872. Weinstein, S. and Sersen, E. A. (1961). ‘Phantoms in Cases of Congenital Absence of Limbs.’ Neurology 11, 905-911. Weinstein, S., Sersen, E. A. and Vetter, R. J. (1964). ‘Phantoms and Somatic Sensation in Cases of Congenital Aplasia.’ Cortex 1, 276-290. 244 Wheatstone, C. (1852). ‘Contributions to the Physiology of Vision, Part the Second: On Some Remarkable, and Hitherto Unobserved, Phenomena of Binocular Vision (continued).’ Philosophical Transactions of the Royal Society of London 142, 1-17. Wightman, F,. L. and Jenison, R. (1995). ‘Auditory Spatial Layout.’ In W. Epstein and S. Rogers (eds.), Perception of Space and Motion, 2nd ed. (pp. 365-400). San Diego: Academic Press. Wolfe, J.M., Kluender, K.R., Levi, D.M., Bartoshuk, L.M., Herz, R.S., Klatzy, R.L., and Merfeld, D.M. (2019). Sensation and Perception. Oxford: Oxford University Press. Wong, H. Y . (2017). ‘On Proprioception in Action: Multimodality versus Deafferentation.’ Mind and Language 32(3), 259-282. Woods, A.J.; Docherty, T.; and Koch, R.. (1993). ‘Image Distortions in Stereoscopic Video Systems.’ Stereoscopic Displays and Applications IV 1915, 36-48. Wraga, M. (1999). ‘Using Eye Height in Different Postures to Scale the Heights of Objects.’ Journal of Experimental Psychology: Human Perception and Performance 25(2), 518-530. Yamamoto, M., Terada, Y ., Motoyama, T., Shibata, M., Saito, T., and Ito, K. (2020). ’N-Terminal [Glu]3 Moiety of Γ-Glutamyl Peptides Contributes Largely to the Activation of Human Calcium-Sensing Receptor, a Kokumi Receptor.’ Bioscience, Biotechnology, and Biochemistry 84(7), 1497-1500. Yamanoue, H.; Okui, M.; and Okano, F. (2006). ‘Geometrical Analysis of Puppet-Theater and Cardboard Effects in Stereoscopic HDTV Images.’ IEEE Transactions on Circuits and Systems for Video Technology 16(6), 744-752. 
Yang, J., Bai, W., Zeng, Z., and Cui, C. (2019). ‘Gamma Glutamyl Peptide: the Food Source, Enzymatic Synthesis, Kokumi-Active and the Potential Functional Properties – a Review.’ Trends in Food Science and Technology 91, 339-346. Yoshimura, H. (2002). ‘Reacquisition of Upright Vision while Wearing Visually Left-Right Reversing Goggles.’ Japanese Psychological Research: Short Report 44, 228–233. Yost, W. A. and Dye, R. H. (1988). ‘Discrimination of Interaural Differences of Level as a Function of Frequency.’ Journal of the Acoustical Society of America 83, 1846-1851. Zahorik, P., Brungart, D. S. and Bronkhorst, A. W. (2005). ‘Auditory Distance Perception in Humans: A Summary of Past and Present Research.’ Acta Acustica united with Acustica 91, 409-420. Zoia, S., Blason, L., D’Ottavio, G., Bulgheroni, M., Pezzetta, E., and Scabar, A. (2007). ‘Evidence of Early Development of Action Planning in the Human Foetus: a Kinematic Study.’ Experimental Brain Research 176, 217-226.

Appendix

The tables below summarize the association rules: how each rule fared with respect to the four constraints on an acceptable association rule and, for those rules that failed to satisfy constraint 2 (regarding our guiding feature association intuitions), how they fared with respect to intuitions concerning proper sensibles, the McGurk effect, and the ventriloquism effect.

Table 1.1. Summary of the association rules considered herein, with respect to the dimensions along which those rules varied.

Rule | Demarcation of the sensory mechanisms | What is the relevant stimulation for assessing the counterfactual? | What counts as a result of stimulation of that mechanism?
OCR | indeterminate | equivalent to actual | what could result
RR1 | determinate | equivalent to actual | what could result
RR2 | determinate | equivalent to actual | what would result
RR3 | determinate | substantially similar to actual | what would result
RR4 | determinate | unconstrained | what would result
RR5 | determinate | substantially similar to actual | what would result, but allowing differences in determinates of the same determinable
RR6 | determinate | substantially similar to actual | what would result, but allowing restricted feature inheritance from other mechanisms
RR7 | determinate | substantially similar to actual | what would result, but allowing restricted inheritance of information from other mechanisms, with phenomenology supervening on informational content + modality
RR8 | determinate | substantially similar to actual | what would result, provided what did result was (partially) determined by information carried in the mechanism, but allowing restricted inheritance of information as in RR7

Table 1.2a. Summary of results for each association rule with respect to the four constraints on an acceptable association rule.
Rule | Rules on every feature for each modality | Satisfies guiding intuitions | Makes novel feature debate substantive | Sufficiently grounded in stimulation states of mechanisms
OCR | no | no | yes | no
RR1 | yes | no | yes | yes
RR2 | yes | no | yes | yes
RR3 | yes | no | yes | yes
RR4 | yes | no | yes | yes
RR5 | yes | uncertain | yes | no
RR6 | yes, but circular | yes, but circular | yes | no
RR7 | yes, pending theoretical results | uncertain | no | no
RR8 | yes | uncertain | no | yes

Table 1.2b. A comparison of the guiding feature association intuitions for the relevant features in our test cases with the results for those association rules that were rejected for failing to satisfy this constraint. (These results apply to our generalized notion of an acceptable demarcation of the sensory mechanisms.)

Rule | Proper Sensibles | McGurk Effect | Ventriloquism Effect
Intuition | non-novel, unique to a modality | auditory | auditory
OCR | non-novel, not unique | auditory | auditory
RR1 | non-novel, not unique | auditory | auditory
RR2 | non-novel, unique | non-auditory | non-auditory
RR3 | non-novel, unique | auditory | non-auditory
RR4 | non-novel, unique | auditory | non-auditory