An Integrated Environment for Modeling, Experimental Databases and Data Mining in Neuroscience

Copyright 2001 by Ying Shu

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
in Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY (COMPUTER SCIENCE)

May 2001

UMI Number: 3027780. Copyright 2001 by Shu, Ying. All rights reserved. UMI Microform 3027780. Copyright 2002 by Bell & Howell Information and Learning Company. All rights reserved. This microform edition is protected against unauthorized copying under Title 17, United States Code. Bell & Howell Information and Learning Company, 300 North Zeeb Road, P.O. Box 1346, Ann Arbor, MI 48106-1346.

This dissertation, written by Ying Shu under the direction of the Dissertation Committee, and approved by all its members, has been presented to and accepted by The Graduate School in partial fulfillment of requirements for the degree of DOCTOR OF PHILOSOPHY.

Dedication

Dedicated to my parents, Zhen Shu and Meifen Liang, and my sister, Hua Shu.

Acknowledgements

First of all, my sincere thanks to my advisor, Prof. Arbib, who made the completion of this thesis possible. His supervision and support have been a great help. Thanks to Prof. Liaw for leading me into modeling research initially; his fruitful discussions and help on modeling and neuroscience resolved many of my questions. Thanks to Prof. Shahabi for helpful discussions on database and data mining issues. Thanks to Prof. Shen for discussions on data mining. Thanks to Prof. Berger and the members of his lab for their help. Thanks to the members of the Brain Simulation Lab for their help. Thanks to Amy Yung, Gabriele Larmon, and Paulina Tagle for taking care of administrative issues. Thanks to Luigi Manna for solving problems on system administration. Last but not least, thanks to my family and friends who have helped me through my graduate study.

Table of Contents

Dedication
Acknowledgements
List of Figures
List of Tables
Abstract
1. Introduction
1.1 Challenging issues
1.2 Proposed approaches
1.3 Objectives and significance
1.4 Organization of the thesis
2. Background
2.1 Neuron and neural modeling
2.1.1 Basic properties of neurons
2.1.2 Synapses
2.1.3 Artificial neural network
2.1.4 Neural modeling and simulation systems
2.1.5 Discussion
2.2 Databases and data mining
2.2.1 Experimental databases
2.2.2 Data warehouse
2.2.2.1 Characteristics of a data warehouse
2.2.2.2 Processes in data warehousing
2.2.2.3 Designing data warehouses
2.2.3 Time series data mining
2.2.4 Discussion
2.3 System integration
2.3.1 Schema integration
2.3.2 Database middleware systems
2.3.3 Data integration systems
2.3.4 Discussion
3. System Architecture of the Integrated Environment
3.1 Basic system architecture
3.1.1 Background
3.1.2 Components
3.1.2.1 Graphical user interface
3.1.2.2 Multi-level modeling and simulation
3.1.2.3 Data mining
3.1.2.4 Experimental databases
3.1.2.5 Middle-ware layer
3.2 The design options
3.2.1 Loosely coupled vs. tightly coupled components
3.2.2 Centralized versus distributed data warehouses
3.2.3 Centralized versus parallel data mining
3.3 The adopted architecture model
4. Multi-level Modeling
4.1 Introduction
4.2 System complexity
4.2.1 Non-linear interactions of multiple processes at a given level or across different levels of organization
4.2.2 Spatial and time scales
4.3 Multi-level modeling system: EONS
4.3.1 EONS system design
4.3.1.1 EONS library
4.3.1.2 Flexible neuronal modeling
4.3.1.3 Composition
4.3.2 EONS development
4.3.2.1 EONS object library
4.3.2.2 Neuron modeling
4.3.2.3 Systematic set-up of neuron connections
4.3.2.4 Synaptic modeling
4.3.2.5 Neural network modeling with synaptic dynamics
4.4 Protocol-based simulation with EONS
4.5 Discussion
5. Data Warehouse and Data Mining for Experimental Databases
5.1 Introduction
5.2 Data warehouse for experimental databases
5.3 Flexible matching for time series neurophysiological recordings
5.3.1 Motivation
5.3.2 Related work
5.3.3 Flexible accuracy matching (FA-matching)
5.3.4 The index structure
5.3.5 An approximate algorithm for FA-matching
5.3.6 Implementation
5.3.7 Performance measurements
5.3.8 Queries
5.3.9 An extension of FA-matching: Flexible-attribute matching
5.3.10 Discussion
5.4 A framework for data mining of neuroscience databases
5.4.1 NeuroMiner
5.4.2 Data organization
5.4.3 Discussion
6. An Architecture for Linking Modeling and Experimental Studies (MDIS)
6.1 Introduction
6.1.1 Motivating example
6.1.2 A hypothesis-driven approach
6.1.3 The information integration platform
6.1.4 MDIS architecture
6.1.5 The process flow
6.2 Linking models to experimental data in the databases
6.2.1 Experimental protocols
6.2.2 Simulation protocols
6.3 MDIS architecture components
6.3.1 Protocol specification language and data model
6.3.1.1 Syntax of PSL
6.3.1.2 Semantics of PSL
6.3.1.3 Semantic model
6.3.2 The metadata model
6.3.2.1 Cooperative universal relation
6.3.2.2 The distinction of names from semantics
6.3.2.3 Parameter connection matrix (PCM)
6.3.2.4 Attribute linkage matrix (ALM)
6.3.2.5 Attribute relevance matrix (ARM)
6.3.3 Query multiple databases
6.3.3.1 Mapping simulation protocols with experimental protocols
6.3.3.2 Cooperative SQL processing
6.3.3.3 Query transformation
6.3.4 The Model-DataView structure
6.3.5 Controller
6.3.6 User interface
6.4 Characteristics of MDIS
6.5 Implementation issues
6.6 Related work
6.7 Discussions
7. Integration of Modeling, Experimental Databases and Data Mining
7.1 Introduction
7.2 Testing fitness between models and experiments
7.3 Integrating data mining into MDIS
7.3.1 A scenario
7.3.2 The extended architecture
7.4 A common representation for data transfer
7.5 System integration
7.5.1 Advantages of using distributed object computing
7.5.2 Interaction protocols
7.5.3 Implementation
7.6 Discussion
8. Conclusions and Future Work
References
Appendices

List of Figures

Figure 2.1 Schematic of a neuron
Figure 2.2 A synapse model
Figure 3.1 Relationships among the four components
Figure 3.2 A framework for linking modeling and simulation, experimental databases and data mining
Figure 3.3 The adopted architecture
Figure 3.4 The metadata in the integrated environment
Figure 4.1 Complexity of multi-level modeling
Figure 4.2 Processes evolve at different time scales
Figure 4.3 System design of EONS
Figure 4.4 The different stages of analysis and modeling
Figure 4.5 EONS library objects and methods
Figure 4.6 Paradigms for neural network simulation
Figure 4.7 Flexible neural network modeling with the EONS library
Figure 4.8 A hierarchy of neuron components
Figure 4.9 Neuron modeling
Figure 4.10 Systematic set-up of neuron connections in a neural network
Figure 4.11 C++ for setting up neuron connections in a 3-layer neural network
Figure 4.12 The simulation results (EPSPs) of AMPA receptor, NMDA receptor and spine
Figure 4.13 The plot of the simulation results in Figure 4.12 in one picture, where the lowest curve is the EPSP of the NMDA receptor, the middle curve is the EPSP of the AMPA receptor, and the top one is the summary EPSP from the AMPA and NMDA receptors
Figure 4.14 Variations in synaptic mechanisms and pattern transformation
Figure 5.1 A multi-dimensional modeling schema for neurophysiological EPSP data
Figure 5.2 An example of neuroscience time series data
Figure 5.3 A sequence of real experimental recordings
Figure 5.4 The index schema for experimental recordings
Figure 5.5 Illustration of the FA-matching algorithm
Figure 5.6 A search result of FA-matching of two similar EPSP recordings
Figure 5.7 Comparison of FA-matching with linear scan on average recall
Figure 5.8 The amplitudes of time series recordings
Figure 5.9 NeuroMiner in the overall system architecture
Figure 5.10 A portion of the experimental manipulations hierarchy
Figure 6.1 Kinetics for AMPA receptor
Figure 6.2 Kinetics for NMDA receptor
Figure 6.3 The Model/Data Integration System
Figure 6.4 Process flow within MDIS
Figure 6.5 A hierarchical description of experimental protocols
Figure 6.6 A hierarchical description of simulation protocols
Figure 6.7 The PSL representation of a simulation protocol
Figure 6.8 A simulation protocol semantic model
Figure 6.9 Cooperative component database schemas
Figure 6.10 General processes of querying experimental databases
Figure 6.11 A generic algorithm for translation between simulation protocol parameters and cooperative attributes (experimental protocol parameters)
Figure 6.12 The generic algorithm for query rewrite
Figure 6.13 An illustration of the fields in one Model-DataView unit
Figure 6.14 A portion of the hierarchy of experimental protocols
Figure 6.15 The ModelDataViewTable
Figure 6.16 A GUI for simulation protocol specification
Figure 6.17 The architecture of the protocol-based schema translation mechanism
Figure 6.18 A class definition of simulation protocols for mapping with experimental protocols
Figure 7.1 Comparison of an experimental result with the simulation results of two AMPA models
Figure 7.2 Comparison of an experimental result with the simulation result of the synapse model presented in Section 6.3.4.2
Figure 7.3 Comparison of an experimental result with the simulation result of the synapse model presented in this section
Figure 7.4 A scenario of information flow among the modeling, experimental database and data mining systems
Figure 7.5 An extended architecture for integrating the modeling, neuroscience experimental databases and data mining systems
Figure 7.6 A generic format of the header and body sections of the ModelData protocol

List of Tables

Table 1 The parameter connection matrix
Table 2 The cooperative attribute names and the attribute semantics
Table 3 The attribute linkage matrix for the synaptic modeling example in Section 6.1.1, based on Figure 6.9
Table 4 The attribute relevance matrix for the synaptic modeling example in Section 6.1.1, based on Figure 6.9

ABSTRACT

In this thesis I propose an integrated framework for a modeling and simulation system, experimental databases, and a data mining system. A middle-ware layer is developed on top of the components to facilitate the connections among them. This is a multi-disciplinary study falling at the intersection of neural modeling, experimental time series analysis, database systems and computer science. Traditionally, modeling, experimental studies and database systems have been separated, each with its own approaches. An integrated environment with emerging database techniques will provide more powerful tools to both modelers and experimentalists, and help them to gain insights from each other's work.

To achieve this objective, we develop a system architecture with five components: a graphical user interface, a modeling and simulation system, experimental databases, a data mining system, and a middle-ware layer, with emphasis on the last four components in this thesis. Each component is an independent system with its own specific characteristics that can communicate with the other components. We provide methods to enhance the collaboration among them.

The individual components in the environment have been developed first, including a multi-level modeling system EONS, a prototypical data mining system NeuroMiner,
an index structure (Sh-index) for experimental time series, and a data warehouse architecture for the experimental databases to support data mining. Moreover, emphasis has been placed on developing the Model/Data Integration System (MDIS) for linking modeling and experimental studies. This architecture serves as a middle-ware layer for modeling and experimental databases, and is then extended to include data mining components. A common representation for message passing through XML and an application-dependent interaction protocol for facilitating the communication among the components are provided.

The developed system architecture is loosely coupled and leaves room for extending individual components and adding new components to it. Each component can be modified and/or extended on its own without affecting the performance of the other components. This multi-tiered architecture separates interface, logic and data for maximum deployment flexibility. It also makes it easy for us to adopt a component-based approach to implement the system.

Chapter 1: Introduction

In this thesis I propose an integrated framework for a modeling and simulation system, experimental databases, and a data mining system. This is a multi-disciplinary study falling at the intersection of neural modeling, experimental time series analysis, and database systems. Traditionally, modeling, experimental studies and database systems have been separated, each with its own approaches. An integrated environment with emerging database techniques will provide more powerful tools to both modelers and experimentalists, and help them to gain insights from each other's work. To achieve this objective, we develop a system architecture with five components: a graphical user interface, a modeling and simulation system, experimental databases, a data mining system, and a middle-ware layer, with emphasis on the last four components in this thesis. Each component is an independent system with its own specific characteristics that can communicate with the other components. We provide methods to enhance the collaboration among them.

1.1 Challenging Issues

Most of the work in neural modeling has attempted to model brain functions at a single level. Very little of it has emphasized extracting principles from a complex system, modeling neural systems at different levels of granularity, and capturing the interactions among the processes at different levels of brain organization, i.e., from the molecular/cellular level, to the synaptic, neuron, and network levels.

Neural models are built to verify certain hypotheses and, in most cases, are tested against data taken from the literature. However, to verify the accuracy of the models, it is important to test them with real experimental data. How to link the models to the data in the experimental databases is a problem of interest to both modelers and experimentalists. Moreover, modeling and experimental studies are two very different disciplines. How to build a linkage between them so that the two disciplines can communicate is a challenge.

Experimentalists have accumulated a massive amount of experimental recordings through the years. However, few methods have been developed for analyzing these data at a large scale.
How to organize the massive amount of experimental data gathered by experimentalists in the experimental databases for easy data retrieval, data analysis, and maintenance is a fundamental challenge for designing experimental data storage systems.

Most of the current work in time series data mining is in the field of financial applications, such as stock market data, sales over a certain time period, etc. Very little of the work has considered neuroscience time series recordings. Adapting those methods to the analysis of neuroscience time series recordings is still a challenge. Moreover, to enhance the collaboration among these systems, there is a need for an environment that automates the connections among the modeling and simulation system, the experimental databases, and the data mining system.

1.2 Proposed Approaches

In this thesis I propose an integrated environment consisting of a modeling and simulation system, experimental databases, and a data mining system to better utilize the functionality of each to explore the principles of brain functions. To achieve our goal of building an integrated environment, I first work on some of the individual components. A multi-level modeling system, EONS, has been developed, as well as a prototypical data mining system, NeuroMiner, an index structure (Sh-index) for experimental time series, and a data warehouse architecture for the experimental databases in order to support data mining.

Then I provide mechanisms for system integration. Two important aspects are addressed. The first is to provide a middle-ware layer (MDIS) for linking models to the data in the experimental databases and testing the fitness between a model and an experiment. The second is to provide a common representation for message passing through XML and an application-dependent interaction protocol to facilitate the communication among the components. Specifically, several techniques have been proposed in this thesis:

• Multi-level modeling based on EONS (Elementary Objects for Neural Systems) is incorporated as our modeling and simulation component. The developed multi-level modeling system EONS can model neural systems at different levels of granularity and capture the interactions among processes at different levels. It can extract the principles of a complex system through modeling the neural system as compositions at different levels. An object-oriented approach has been taken to build a library of elementary neural objects, and new modules can be built from existing modules. A protocol-based simulation with EONS has also been developed.

• A data warehousing architecture is proposed to provide a conceptual schema to organize the data in the experimental databases, and to facilitate the mapping between the simulation protocols and the experimental protocols. A flexible matching mechanism, FA-matching, is developed to search for similar experimental recordings in the databases as the prototype function of NeuroMiner. An index structure, the Sh-index, is developed to support the matching mechanism for data mining. This technique can also be used when we compare experimental results with simulation results. Our approach has taken the specific requirements of neuroscience applications into consideration.
• A middle-ware layer, MDIS (Model/Data Integration System), is developed to provide several mechanisms: mapping between experimental protocols and simulation protocols, a protocol specification language, a semantic model, a metadata model, a Model-DataView structure, a controller, a query manager and a GUI. A hypothesis-driven approach to identifying key data in the databases is adopted: the hypotheses and the experimental protocols are used to identify key data in the experimental databases. Based on these mechanisms, the fitness between a model and an experiment can be tested, and the integration of modeling, experimental databases and data mining is achieved.

• A data transfer mechanism using XML and an application-dependent interaction protocol (the ModelData protocol) for facilitating the communication among the components have been proposed.

1.3 Objectives and Significance

I propose to develop an integrated environment for neuroscience researchers to interactively perform the different functions provided by the modeling and simulation system, the experimental databases, and the data mining system. Most current applications tend to be stand-alone systems with only one of these facilities. The integrated architecture proposed in this thesis provides a test base for a comprehensive study of brain functions. Besides the advantages provided by the integrated environment, each component in the environment also has its own significance.

The modeling and simulation component is based on a multi-level modeling system, EONS. This system can model and simulate neural functions at different levels of granularity. It can also model the interactions among processes at different levels of the neural system, from the molecular/cellular level, to the synaptic level, to the neuron level, and to the neural network level. The development of EONS is object-oriented, and each object is self-contained with the possibility of being reused by other modules. This modeling method makes it easy to include detailed analysis of neural dynamics by composing specific modules. Protocol-based simulation with EONS provides modeling studies with a tight coupling to experimental studies.

The design of the experimental databases has been extended into a data warehouse. We use a multidimensional modeling method to organize the database tables, which not only makes data analysis and data search easy, but also makes data mining on experimental data more efficient. Our proposed time series matching mechanism is designed specifically for our applications in brain research. An index structure has been developed to enhance its performance.

Modeling and databases are separated in most applications. The introduction of MDIS provides a logic layer between modeling and experimental studies that links them despite their different structures. MDIS includes several mechanisms. Specifically, the Protocol Specification Language (PSL) provides a common representation for simulation protocols and experimental protocols, which makes it possible to automate the mapping between them and to facilitate the interactions between modeling and experimental studies. The metadata model provides a common ontology for mapping
between model parameters and experiment attributes. By incorporating the Model-DataView structure into the model databases, the models (in EONS) can be tested with real data from the experimental databases instead of being fed manually by humans. Cooperative query processing helps to find relevant data in the databases. Our hypothesis-protocol approach provides the mechanism to identify key data in the experimental databases and the linkage between models and experimental data.

In order for these components to communicate with each other, a common representation for data transfer is needed. Introducing XML into this environment not only provides a common ground for communication, but also makes the work extensible to more applications. It will make it possible for authorized users to access our system through the Internet, search for models, run simulations, retrieve data from the experimental databases, and analyze or mine the experimental data. Thus, researchers at different geographic sites can cooperate with each other on brain research. The application-dependent interaction protocol, the ModelData protocol, used in this environment helps to provide communication services specific to the application.

The developed system architecture is loosely coupled and leaves room for extending different components and including new components in the environment. Each component can be modified and/or extended on its own without affecting the performance of the other components. This multi-tiered architecture separates interface, logic and data for maximum deployment flexibility. It also makes it easy for us to adopt a component-based approach to implement the system.

1.4 Organization of the Thesis

In this thesis, I describe the functionality of each individual component before discussing the system integration issues.

Chapter 2 consists of a review of the relevant literature. It includes some basic knowledge on neural systems, modeling and simulation systems, experimental databases, data warehouses, data mining, and system integration.

Chapter 3 provides a high-level system architecture for the integrated environment. It introduces the basic characteristics of each component, the design options for the integrated environment and the adopted system architecture.

Chapter 4 emphasizes the principles of multi-level modeling and the development of EONS. The principles and significance of multi-level modeling are first given. The development of the multi-level modeling system EONS is then presented. This system aims at modeling neural systems at a given level or across different levels of organization, capturing the principles of multi-level analysis of neural systems. We adopt an object-oriented approach to build a hierarchy of neural modules. Some elementary objects of neural systems are first constructed, and new models are composed from existing ones. Each object (module) is self-contained and can be modified and expanded without affecting other modules. Protocol-based simulation is also introduced.

Chapter 5 introduces the work on the data warehouse and data mining for experimental databases. I propose to use a multi-dimensional modeling method to organize the experimental data.
A flexible matching mechanism (FA-matching) for finding time series recordings in the databases similar to a given time series is presented. The matching mechanism can find experimental recordings at different accuracies according to the specification of the query submitted by the user. This technique can also be used to compare an experimental recording with a simulation result, which is required when we integrate the modeling studies with the experimental databases, as presented in Chapter 7. A shape index schema, the Sh-index, is developed and implemented together with the FA-matching mechanism in Java. A framework for data mining on experimental databases and the primary function of NeuroMiner are given.

Chapter 6 contains the work on developing MDIS. Experimental protocols and simulation protocols are introduced. The Protocol Specification Language, the semantic model and its implementation in C++ are shown. The metadata model is introduced. The Model-DataView architecture is proposed to group models with queries to search the databases for the required experimental data. A hypothesis-protocol approach for constructing the Model-DataView units is developed through an example. Cooperative query processing is presented. The GUI design is described.

Chapter 7 presents the work on the system integration of modeling, experimental databases and data mining. Fitness testing between a model and an experiment, as a case study of MDIS, is first presented. The extended architecture involving MDIS, the modeling and simulation system, the experimental databases, and the data mining system is developed. A common representation for message passing and an application-dependent interaction protocol are designed to facilitate the communication among the components. As for the implementation aspect of system integration, we use C++ and socket programming to simulate three nodes that can send information to each other.

Chapter 8 is the conclusion and outlines future work. Some directions for future work are proposed to extend the current system architecture and components.

Chapter 2: Background

In this chapter, I give an introduction to the background knowledge on the topics addressed by the thesis. The basic properties of neurons and synapses are introduced first. Then a brief review of artificial neural networks is given, followed by a discussion of some neural modeling and simulation systems. Section 2.2 contains related work on databases and data mining systems. The experimental databases in USCBP (University of Southern California Brain Project) are described, and some techniques for data warehousing and data mining are discussed. Finally, Section 2.3 discusses system architecture and system integration techniques. The literature review is not an exhaustive survey of the topics mentioned, but rather an analytic summary of relevant information.

2.1 Neuron and Neural Modeling

To develop a neural modeling system, it is essential to have a basic understanding of the properties of the neuron and its components.

2.1.1 Basic Properties of Neurons

The human brain is composed of approximately 100 billion cells of many different types. About 10 billion of them are neurons, a special kind of cell that conducts electrical signals in the human brain.
The remaining 90 billion cells are called glial (or glue) cells, which serve as support cells for the neurons. Figure 2.1 shows the schematic structure of a neuron. Typically, a neuron contains three parts: the cell body (soma), a tree-like network of nerve fibers called dendrites, and a long fiber extending from the cell body called the axon. The dendrites transport signals to the soma. The axon generally splits into smaller branches constituting the axonal arborization. The tips of these axon branches impinge either upon the dendrites, somas, or axons of other neurons, or upon effectors. The contact between an endbulb of an axon and the part it impinges upon is called a synapse. The signal flow in the neuron is from the dendrites through the soma, converging at the axon hillock and then travelling down the axon to the synapses. With some exceptions, the flow can be bi-directional (Bose and Liang 1996). A neuron typically has many dendrites but only one axon. Some neurons, such as the amacrine cells, lack axons.

2.1.2 Synapses

The excitation of a neuron affects other neurons through its synapses, which impinge on the membranes of other neurons. Synapses differ in size, shape, form and effectiveness. The endbulbs are called presynaptic terminals. The membranes the endbulbs impinge on are called postsynaptic spines. The gap between the presynaptic and postsynaptic membranes is called the synaptic cleft. Neurons communicate with each other through the release of small packets of chemicals into this gap, some of which bind to the postsynaptic membrane. One neuron may communicate with more than 100,000 other neurons.

Figure 2.2 shows a synapse model. An action potential propagated to the endbulb causes the release of chemical substances called neurotransmitters, which are stored in little packets (synaptic vesicles) at the presynaptic membrane. Neurotransmitters diffuse across the synaptic cleft to the postsynaptic membrane and cause a change of electrical potential in the postsynaptic membrane. About 100 different neurotransmitters have been found in the human brain. The direct cause of the change in the electrical potential of the postsynaptic membrane is primarily chemical rather than electrical: the electrical impulse transmitted across the synaptic cleft is chemically induced and controlled (Bose and Liang 1996). Electrical gap synapses may be found in prevertebrate systems, but they are not of main concern in this context.

[Figure 2.1: Schematic of a neuron, labeling the dendrites, cell body (soma), axon hillock, axon, axon arborization and synaptic endbulbs (Bose and Liang 1996)]

[Figure 2.2: A synapse model]

Generally, synapses are classified according to the different effects of the contact (not exclusively):

1) Excitatory synapse: excitation of the presynaptic neuron makes it easier for the postsynaptic neuron to be depolarized.

2) Inhibitory synapse: excitation of the presynaptic neuron makes it harder for the postsynaptic neuron to be depolarized.

Whether a synapse is excitatory or inhibitory depends not only on the type of neurotransmitter the presynaptic axon releases but also on the types of the postsynaptic receptors. There are two types of inhibitory synapses.
Presynaptic inhibition occurs at axon-axon synapses. Postsynaptic inhibition is caused by inhibitory neurons; these inhibitory neurons form only inhibitory synapses with other neurons.

There are some common features associated with synapses. In general, an impulse that arrives at a synapse generates a subthreshold change in the postsynaptic membrane. Potentials across the postsynaptic membrane propagate, with decay, along the dendrites and cell bodies of neighboring neurons. Signal flow is unidirectional, from the presynaptic to the postsynaptic neuron. The delay from the time an impulse reaches the presynaptic site to the time the potential changes across the synaptic cleft is relatively long, approximately 0.5 to 2 ms. Synaptic plasticity refers to the significant and widespread characteristic of chemical synaptic transmission whereby repetitive firing either facilitates or inhibits chemical synapses. The function of a synapse is very sensitive to the cell body's overall physiological and pharmacological conditions.

Most models of synapses are rather simplified and do not consider the detailed processes involved. In conventional neural network models, synapses are usually not modeled: only a link with a weight between two neurons is set up to represent the axon-dendrite connection. Even though this simplification facilitates the mathematical analysis of single neurons, very important neurobiological information is lost due to the simplification.

2.1.3 Artificial Neural Network

In this section, I give a brief review of some topics in artificial neural networks. Interested readers can find more details in the references provided below.

The Perceptron was among the first and simplest neural networks that were trainable (Rosenblatt 1958). It is a two-layer feedforward network whose inputs, called the retina, are connected to an association layer, which in turn projects to a response layer (Anderson 1997). The connections between the association layer and the response layer are learnable. The Perceptron usually functioned as a pattern classifier. If two pattern classes were linearly separable, i.e., there exists a hyperplane lying between the two pattern classes, the Perceptron with a simple learning rule would eventually learn the classification. A sketch of this learning rule is given below.
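To make the Perceptron learning rule just mentioned concrete, here is a minimal sketch in C++ (the language used for EONS elsewhere in this thesis). It assumes a single threshold unit with weight vector w, bias b and learning rate eta, rather than the original retina/association-layer architecture; all names and the value of eta are illustrative, not taken from the thesis.

```cpp
#include <vector>

// Minimal Perceptron-style threshold unit: output is 1 if w.x + b > 0, else 0.
struct Perceptron {
    std::vector<double> w;   // one weight per input
    double b = 0.0;          // bias
    double eta = 0.1;        // learning rate (illustrative value)

    int predict(const std::vector<double>& x) const {
        double s = b;
        for (size_t i = 0; i < w.size(); ++i) s += w[i] * x[i];
        return s > 0.0 ? 1 : 0;
    }

    // One step of the Perceptron learning rule: the weights move only
    // when the predicted class differs from the target class.
    void train(const std::vector<double>& x, int target) {
        int error = target - predict(x);   // -1, 0, or +1
        for (size_t i = 0; i < w.size(); ++i) w[i] += eta * error * x[i];
        b += eta * error;
    }
};
```

For linearly separable classes, repeatedly cycling through the training set with this update eventually yields a separating hyperplane, which is the content of the classical Perceptron convergence result alluded to above.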
As for the learning algorithms used in neural networks, there are supervised learning and unsupervised learning (Ballard 1997).

2.1.3.1 Supervised Learning

For a supervised learning algorithm, there is a set of data, a training set, for which the output is known. Given a member of the training set as an input to the neural network, depending on how much the outputs differ from the desired outputs (the errors), we can adjust the network weights to minimize the errors. The more information we have on the training set and the network, the better for learning. If there is more than one layer of learnable parameters, the feed-forward networks are called multi-layer learning networks. Many multi-layer neural networks adopt a learning algorithm called backpropagation.

The backpropagation algorithm is a supervised learning method for training multi-layer neural networks. It learns by first computing an error signal and then propagating the errors backward through the network; the network weights are then adjusted to minimize the errors. Backpropagation is by far the most popular and widely used neural network learning algorithm. It is a more complicated gradient descent algorithm than the Widrow-Hoff learning rule (Widrow and Hoff 1960). It was discovered independently by several groups. The earliest description of the algorithm was by Paul Werbos (Werbos 1974) in his 1974 Harvard Ph.D. thesis. The best-known early description was by Rumelhart, Hinton and Williams (Rumelhart and Hinton 1986), which showed that backpropagation can solve some difficult problems. David Parker (Parker 1985) and Yann Le Cun (LeCun 1986) also discovered the algorithm separately. Haykin (Haykin 1994) gave a long and thorough technical review of backpropagation; interested readers can refer to it for details.

2.1.3.2 Unsupervised Learning

One of the most influential unsupervised learning algorithms is the Hebbian learning rule (Hebb 1949). According to Hebb, the means by which neural activity changes synaptic function relies on a local correlational learning rule to model synaptic plasticity (Montague and Sejnowski 1994):

$\Delta w(t) = \eta \, x(t) \, y(t)$

The central idea is that when an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased. Here w(t) is the synaptic strength at time t, x(t) is a measure of presynaptic activity, y(t) is a measure of postsynaptic activity, and η is a fixed learning rate. This kind of learning rule is called local because the signals sufficient for changing synaptic efficiency are assumed to occur at the local synaptic contacts.
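The rule above transcribes directly into code. The following is a minimal C++ sketch of a single Hebbian weight update; the function and variable names are illustrative only.

```cpp
// One Hebbian update step: the weight grows in proportion to the
// correlation of presynaptic activity x and postsynaptic activity y.
double hebbian_step(double w, double x, double y, double eta) {
    double dw = eta * x * y;   // local, correlational weight change
    return w + dw;
}
```

Note that the raw rule only ever strengthens correlated pairs, so practical variants add a decay or normalization term (e.g., Oja's rule) to keep the weights bounded; the thesis text does not depend on that refinement.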
Subsequent theoretical and computational efforts have extended Hebb's ideas and successfully used correlational learning rules for aspects of map formation and self-organization of visual and somatosensory cortex (Montague, Gally et al. 1991; von der Malsburg 1977; von der Malsburg 1973).

For unsupervised learning algorithms, the exact output for a training set is not known, i.e., no supervisor exists. Unsupervised learning algorithms are of interest because structure and organization arise from the raw data. However, because unsupervised learning algorithms have little initial information to start with, they are generally difficult to use.

McCulloch and Pitts (McCulloch and Pitts 1943) proposed the first simple logical units, called cells or neuron models. Neural network models based on simple neuron models assume many simplifications over actual biological neural networks. Such simplifications are sometimes necessary to capture the intended properties and to apply mathematical analysis. However, simple structures are sometimes not enough to capture the underlying interactions among neurons. When the neuron model is too simple, some interesting details may be missing from the model, such as the role of synaptic dynamics in neural information processing.

The role of synaptic dynamics in neural information processing is an important characteristic to investigate in order to really understand information coding and transformation in the nervous system, because the connections between neurons represent the way neurons process information at the neural and synaptic levels. Thus there is a need to construct models of neural networks with more detailed neuron connections that include synaptic transmission.

2.1.4 Neural Modeling and Simulation Systems

Neuron simulation tools have been built to assist the process of modeling and simulating neurons at different levels of detail. In this section, I give a general introduction to several neuron simulation tools: NEURON, GENESIS, and NSL.

2.1.4.1 NEURON

One of the major goals in developing NEURON (Hines 1984) (Hines 1989) (Hines 1993) (Hines 1994) is to provide a flexible environment for implementing biologically realistic models of electrical and chemical signaling in neurons and networks of neurons. It is designed for the convenient creation of quantitative models of brain mechanisms and the efficient simulation of the operation of these mechanisms. The underlying mathematical basis for NEURON is cable theory. Although NEURON is a compartmental modeling program, the specification of biological properties has been separated from the numerical issues of compartment size (Hines and Carnevale 1997).

NEURON adopts an object-oriented design and uses the notion of a section to represent a continuous length of unbranched cable. Sections are connected together to form branched tree structures of any kind. Each section consists of one or several segments of equal length. At the center of each segment is a node, which defines the internal voltage of the segment. Adjacent nodes are connected through resistors.

NEURON provides two implicit integration methods: backward Euler and a variant of Crank-Nicolson (C-N). The two methods are almost identical in terms of computational cost per time step. Backward Euler is the default because of its robust numerical stability, but the C-N method is more accurate for small time steps.
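To illustrate the difference between the two implicit schemes just mentioned, here is a minimal C++ sketch that integrates a passive membrane equation dV/dt = (E - V)/tau with both backward Euler and a Crank-Nicolson-style trapezoidal update. This is a toy model, not NEURON's actual solver; all names and parameter values are illustrative.

```cpp
#include <cstdio>

// Integrate the passive membrane equation dV/dt = (E - V) / tau.
// Both updates are implicit, hence unconditionally stable.
int main() {
    const double E = -65.0;   // resting potential (mV), illustrative
    const double tau = 10.0;  // membrane time constant (ms), illustrative
    const double dt = 0.025;  // time step (ms)

    double v_be = 0.0;        // state for backward Euler
    double v_cn = 0.0;        // state for Crank-Nicolson (trapezoid)

    for (int step = 0; step < 4000; ++step) {
        // Backward Euler: V' = (V + (dt/tau) E) / (1 + dt/tau)
        v_be = (v_be + (dt / tau) * E) / (1.0 + dt / tau);

        // Crank-Nicolson: average the right-hand side over the step,
        // V' (1 + a) = V (1 - a) + 2 a E, with a = dt / (2 tau)
        const double a = dt / (2.0 * tau);
        v_cn = (v_cn * (1.0 - a) + 2.0 * a * E) / (1.0 + a);
    }
    std::printf("backward Euler: %f mV, Crank-Nicolson: %f mV\n", v_be, v_cn);
}
```

Both updates remain stable for any dt, which is the point of implicit integration; the trapezoidal averaging makes the C-N update second-order accurate in dt, which is why it wins for small time steps, while backward Euler's stronger damping makes it the more robust default.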
NEURON aims to provide convenience for the user without sacrificing simulation speed. User convenience is provided through an interpreter, while speedy simulations are carried out by compiled components. The interpreter built into NEURON, called HOC (High Order Calculator), also provides for assigning the location and strength of synapses, and the placement and strength of current injection electrodes or voltage clamps. The interpreter has been extended with some object-oriented syntax to implement abstract data types and data encapsulation.

NEURON includes several standard channel types (Moore and Hines 1994):

• Hodgkin-Huxley (HH) Na, K and leak channels (Hodgkin and Huxley 1952),
• the Moore-Cox sodium channel,
• a Ca channel,
• Ca-sensitive K channels.

Many additional channel types are available in the library of user contributions. Additional standard mechanisms include:

• a Na/Ca exchange,
• a metabolically driven Ca pump,
• intracellular Ca binding,
• intracellular Ca diffusion.

The compiled routines for all the membrane and intracellular processes are called by the interpreter to carry out calculations of membrane channel currents and a variety of other mechanisms. The NEURON interpreter also offers a convenient method for plotting the voltage and current responses, or the values of other parameters, at any location. An editor built into NEURON allows on-line changes to parameters, the variables plotted, etc. Since the programming editor is quite application-dependent, NEURON also accepts HOC code in the form of plain ASCII files created by users with other editors.

Membrane mechanisms can be added to NEURON using a model description language called MODL. The original model description language was developed at the NBSR (National Biomedical Simulation Resource) to specify models for simulation with SCoP (Simulation Control Program). Nonlinear algebraic equations, differential equations, and kinetic schemes can be specified for a physical model through the MODL language. MODL translates the specifications into C programs, which are then compiled and linked to the SCoP program. Later, with some modest extension, MODL became able to translate model specifications into a form that can be compiled and linked to NEURON. The extended MODL is called NMODL.

Using NMODL has several advantages: (1) consistency of the models is ensured; (2) mechanisms described by a kinetic scheme are clearly stated; (3) by describing the content at the model level instead of the C programming level, clarity of specification is enhanced and a simpler description language is achieved.

Membrane mechanisms usually deal with currents, concentrations, potentials, and state variables. Thus it is necessary for NMODL to divide the statements into current functions and state functions. An interface needs to be generated between the model variables and NEURON, and a memory allocation function needs to be created to store the variables of each segment. Also, local currents need to be summed into global ionic currents.
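The division of a membrane mechanism into current and state functions can be pictured with a small C++ analogy (not NMODL syntax): each mechanism exposes one routine that reports its local current at the present membrane potential and one that advances its state variables, and the simulator sums the local currents into a global ionic current. The interface and the leak mechanism below are hypothetical illustrations, with parameter values chosen only for the example.

```cpp
#include <vector>
#include <memory>

// A membrane mechanism contributes a local current and owns its state.
struct Mechanism {
    virtual double current(double v) const = 0;           // "current function"
    virtual void advance_state(double v, double dt) = 0;  // "state function"
    virtual ~Mechanism() = default;
};

// A passive leak: I = g (v - e). It has no state to advance.
struct Leak : Mechanism {
    double g = 0.0003, e = -54.3;                         // illustrative values
    double current(double v) const override { return g * (v - e); }
    void advance_state(double, double) override {}
};

// Per segment, the simulator sums local currents into the global
// ionic current, then lets each mechanism update its state variables.
double total_ionic_current(const std::vector<std::unique_ptr<Mechanism>>& ms,
                           double v, double dt) {
    double i_total = 0.0;
    for (const auto& m : ms) i_total += m->current(v);
    for (const auto& m : ms) m->advance_state(v, dt);
    return i_total;
}
```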
2.1.4.2 GENESIS

GENESIS (Bower and Beeman 1994) is a software package developed at the California Institute of Technology for the realistic modeling of neural structures. It provides a compartmental modeling abstraction for dendritic trees and several kinds of ionic channels. Its design is object-oriented. Simulations are constructed from modules (objects), each of which performs well-defined functions. Users can add new modules to existing modules to extend an application. Modules communicate with each other through message passing. GENESIS provides system commands to create networks of cells, e.g., addmsg, createmap, planarconnect/volumeconnect, planarweight/volumeweight, planardelay/volumedelay. Using these commands, a network of neurons can be systematically created.

The GENESIS user interface is made up of two parts. The underlying level of GENESIS includes a built-in script programming language called SLI (Script Language Interpreter). This is a very simple language used to create scripts that define GENESIS objects and control the progress of a simulation. It is similar to a UNIX shell, with an extensive set of commands for building simulations. The interpreter can read SLI scripts interactively from the command line or from files.

The graphical interface is XODUS (X-windows Output and Display Utility for Simulation). XODUS provides a higher-level view for users to develop simulations and monitor their execution, and it provides graphical computational models for users to choose from and display data.

With regard to numerical methods, GENESIS offers both explicit integration (Euler, exponential Euler) and implicit integration (backward Euler, Crank-Nicolson with the Hines numbering scheme). GENESIS also provides a parameter searching mechanism (the param library) to facilitate simulation.

2.1.4.3 NSL

NSL (Neural Simulation Language) is a simulation language and development system (Weitzenfeld, Arbib et al. 1997). The simulator has been implemented in both C++ and Java. In NSL, there is a common compiled language, NSLM, for building modules, and a common script language based on TCL, called NSLS, for building and controlling the simulation system. The implementation of models in NSL is modular. Each model is built upon hierarchical collections of sub-modules connected via ports, and can be used for building other modules (a sketch of this style of composition is given below).

Currently NSL has several independent sub-systems: a simulator; an object-oriented module building language called NSLM; a flexible scheduler that allows modules to have different time steps; hierarchical processing of modules; common neural network math functions; an object-oriented scripting language called NSLS; some graphical plotting capabilities; libraries of modules, with version control on some modules; and a schematic capture system.

The simulator is the part where model interaction and processing take place. It consists of model language libraries, a model language compiler, a processing module, a command interpreter, etc. The window interface is the part where all graphical interaction and display take place. It consists of temporal plots, spatial plots, area level graphs, zoom/unzoom, and dynamic selection of variables. The schematic capture system is ongoing work; it will contain an icon editor, a schematic editor, a text editor, a script and NSLM generator, and a hierarchical consistency checker. The NSL library contains modules such as the leaky neuron.

The NSL simulation language is designed as an object-oriented language. In NSL, users can program learning methods, such as Hebbian learning and backpropagation, into a simulation. NSL simulation commands may be read from batch files or entered interactively, and any number of command files can be associated with a single model file. There are pre-defined modules to control the sequence of execution.
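The hierarchical, port-connected module structure described above can be sketched in C++ as follows. This is only an analogy to convey the composition style; it is not NSL's actual NSLM API, and all class and member names are invented for illustration.

```cpp
#include <vector>

// A module reads its input port, computes, and writes its output port.
struct Module {
    double in = 0.0, out = 0.0;       // scalar "ports" for simplicity
    virtual void step(double dt) = 0;
    virtual ~Module() = default;
};

// A leaky integrator: tau dm/dt = -m + in, with out = m.
struct LeakyNeuron : Module {
    double m = 0.0, tau = 10.0;
    void step(double dt) override {
        m += dt / tau * (-m + in);
        out = m;
    }
};

// A composite module wires sub-module ports together and steps each
// child, so new models are composed from existing ones.
struct TwoNeuronChain : Module {
    LeakyNeuron a, b;
    void step(double dt) override {
        a.in = in;      // external input feeds the first neuron
        b.in = a.out;   // port connection: a's output drives b
        a.step(dt);
        b.step(dt);
        out = b.out;
    }
};
```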
2.1.5 Discussion

Most models of synapses are rather simplified and do not consider the detailed processes involved. In conventional neural network models, synapses are usually not modeled: only a link with a weight between two neurons is set up to represent the axon-dendrite connection. Even though this simplification facilitates the mathematical analysis of single neurons, very important neurobiological information is lost; the underlying neurobiological changes within the synapses are not taken into consideration. There is thus both a need and a challenge to build a neuron model with detailed synaptic information to capture real neural information processing.

Both NEURON and GENESIS provide a compartmental modeling abstraction for dendritic trees and support some cellular and molecular mechanisms. However, these two simulation tools have tightly coupled simulation and user interfaces, which limits the portability of models, and detailed synaptic modeling is not emphasized by either. When they are used for modeling neural networks, efficient performance is obtained only with small networks of neurons.

Among these tools, GENESIS and NEURON are designed to model a few neurons with detailed dendrites and compartments, while NSL is concerned with modeling large networks of neurons. In these simulation tools, the user interface and the simulation programs are tightly coupled, so modifying part of the simulation programs or the user interface affects the whole system. Detailed analysis and modeling of synapses are not included in any of these three systems, and modules developed with these simulation tools are difficult to re-use and extend without affecting the whole program. These open problems are the issues we address in developing the simulation program EONS.

Therefore, users have different alternatives when choosing a simulation tool. If they want to explore compartmental and other cellular and molecular properties, they can choose GENESIS or NEURON. If they want to simulate a large network of neurons, NSL is a good candidate. If they want to model neurons in detail, NEURON is a good choice. The multi-level modeling system EONS that we developed (detailed in Chapter 4 of this thesis) models neural systems taking into account the processes at one level and across different levels of organization. It provides a library of elementary neural objects that can be re-used and modified separately, and it provides detailed modeling of synapses and neural network modeling with detailed synaptic dynamics. If users want to model neural systems at different levels of granularity, from networks, neurons and synapses down to the molecular/cellular level with the underlying neural processes, EONS is a good candidate.

2.2 Databases and Data Mining

Database management systems are systems developed especially for the storage and flexible retrieval of large masses of structured data. Knowledge discovery in databases, often called data mining, aims at the discovery of useful information from large collections of data. The discovered knowledge can be clusterings of the objects, frequently occurring patterns, rules describing properties of the data in the database, etc. In this section, we introduce some issues related to databases and data mining. We first present some of the experimental databases developed at USCBP. Then the data warehouse is presented as an alternative and better extension of databases. Finally, some data mining issues, especially time series data mining, are discussed.

2.2.1 Experimental Databases

In the USC Brain Project, experimental databases are being built to provide storage and data analysis for individual laboratories conducting brain research. Among them are a NeuroCore database schema and a prototype time series database for in-vivo neurophysiology.
2.2.1.1 A NeuroCore Database Description

The NeuroCore database schema provides a general experimental database schema and guidelines for extension. In building the NeuroCore database description in USCBP, several design principles were considered (Grethe 1997):

• An easily modified database is desirable for meeting the specific needs of a laboratory while remaining compatible with other databases.
• The database is not just a place for storing and retrieving data. It should also provide experimental tools and data analysis.
• A monolithic database is not enough for ever-increasing data distribution.
• Intra- and inter-laboratory collaborations should be fostered.

Thus, the NeuroCore database descriptions (schema) aim to organize the core concepts of neurobiology experiments in a way that meets every laboratory's basic needs and provides means for extension.

The design method for NeuroCore is object-oriented. Each generic concept is modeled as a class, with inheritance between super-classes and sub-classes. Data encapsulation is preserved in the (class) object descriptions. By modeling the basic concepts as modules, the modules are self-contained and can be modified and maintained independently. When the modules are used by different laboratories, they can be changed and new information can be added to them.

In order to cover the main concepts of neurobiological experiments, the NeuroCore schema also includes the essential information for storing an experiment, and provides some extensions for easy augmentation and data sharing. Thus the core schema consists of five components (Grethe 1997):

• Core Experimental Framework
• Neuroanatomical Concepts
• Neurochemical Concepts
• Database Extension Interface
• Database Federation Interface

After the design of the core schema, the choice was to use an object-relational database. The data model of an object-relational database is very relation-centric: tables are the key central concept in the model, and inheritance exists between some tables. In one way it works like an object database, in that objects may be tables; in another way it works like a relational database, in that there are references and inheritance. An object-relational database fits our needs for several reasons:

1. Besides standard data types, an object-relational database can support user-defined data types for storing application-specific data, such as hypotheses and experimental protocols (experimental manipulations and experimental conditions).
2. There are many time series data in neurobiological experiments, and the object-relational database provides a time series datablade (see the Informix manual for details) for this kind of data.
3. The neuroanatomical and neurochemical experiments use atlas-based brain images for their research. An object-relational database provides built-in support for large objects as data types, such as image, audio and video.
The core database schema is used as a base when we want to build a specific database for a lab of USCBP.

2.2.1.2 A Time Series Database for in-vivo Neurophysiology

In general, the time series databases in the context of USCBP are concerned with storing, retrieving and analyzing electrophysiological data, together with experimental manipulations and experimental conditions (experimental protocols). In the USC Human Brain Project, several laboratories record and analyze time series data, so building time series databases will facilitate these laboratories' research and collaborations.

One example is the time series database in Dr. Thompson's lab. This database was built in the context of searching for substrates underlying learning and memory (Grethe 1997b). The purpose of the experiments is to train a rabbit to respond with an eyeblink to a tone (the conditioned stimulus, or CS). During an experiment, recordings are made of the peak of the response, the durations of the unconditioned and conditioned stimuli, and the relative distance in time between the point of peak response and the presentation of the stimulus. Thus, a time series database is needed to store these data, together with the experiment preparation and the experimental protocol.

To build a time series database for experiments like the above, the database design needs to be extended from the NeuroCore database schema, which serves as meta-data for constructing a general laboratory database schema. A new research subject needs to be defined from that of the NeuroCore schema, and more detailed information is then included. Because an object-relational database supports inheritance, the subclass inherits the superclass's description, and new fields and functions can be added to the subclass.

To be able to analyze the data more efficiently, the experimental conditions and experimental manipulations (together, the experimental protocol) under which the experiments were performed should all be included in the database. It is better to store the experimental results and the experimental protocol together for later retrieval and analysis. Thus, those experimental conditions and manipulations are also represented as tables created from the NeuroCore schema. Usually some statistical methods are associated with the recordings, so extra tables are needed to store the results of applying the statistical methods to the recordings.

2.2.2 Data Warehouse

Data warehousing is a concept, not a product that you can buy off the shelf. It is a set of hardware and software components that can be used to better analyze the massive amounts of data that organizations are accumulating, in order to make better decisions. The experimental data used to drive research in our application environment represent a wealth of knowledge that may not be fully tapped; they are an asset that is probably not being used to its fullest potential. Data warehousing can help us take advantage of the experimental data that experimentalists have created over time.

2.2.2.1 Characteristics of a Data Warehouse

There are generally four characteristics that describe a data warehouse (Weiss 1998):
• Subject-oriented: data are organized according to subjects. For example, an academic institute using a data warehouse would organize its data by department, research area, and administration, instead of by different products (auto, life, etc.) as a company might. The data organized by subject contain only the information necessary for decision-support processing.
• Integrated: when data reside in many separate applications in the operational environment, the encoding of data is often inconsistent. For instance, in one application gender might be coded as "m" and "f", and in another as 0 and 1. When data are moved from the operational environment into the data warehouse, they assume a consistent coding convention, e.g. gender data are always transformed to "m" and "f".
• Time-variant: the data warehouse contains a place for storing data that are 5 to 10 years old, or older, to be used for comparisons, trends, and forecasting. These data are not updated.
• Non-volatile: data are not updated or changed in any way once they enter the data warehouse; they are only loaded and accessed.

2.2.2.2 Processes in Data Warehousing

The first phase in data warehousing is to "insulate" your current operational information, i.e. to preserve the security and integrity of mission-critical applications while giving you access to the broadest possible base of data. The resulting database or data warehouse may consume hundreds of gigabytes, or even terabytes, of disk space, so what is required are efficient techniques for storing and retrieving massive amounts of information. Increasingly, large organizations have found that only parallel processing systems offer sufficient bandwidth.

The data warehouse thus retrieves data from a variety of heterogeneous operational databases. The data are then transformed and delivered to the data warehouse/store based on a selected model (or mapping definition). The data transformation and movement processes are executed whenever an update to the warehouse data is required, so there should be some form of automation to manage and execute these functions. The information that describes the model and the definitions of the source data elements is called "metadata". The metadata are the means by which the end-user finds and understands the data in the warehouse, and they are an important part of the warehouse (Weiss 1998). The metadata should at the very least contain:

• the structure of the data;
• the algorithm used for summarization;
• the mapping from the application environment to the data warehouse.

Data cleansing is an important aspect of creating an efficient data warehouse: it is the removal of certain aspects of operational data, such as low-level artifacts, which slow down query times. The cleansing stage has to be as dynamic as possible to accommodate all types of queries, even those that may require low-level information. Data should be extracted from application sources at regular intervals and pooled centrally, but the cleansing process has to remove duplication and reconcile differences between various styles of data collection. Once the data have been cleaned, they are transferred to the data warehouse, which is typically a large database on a high-performance machine, either SMP (Symmetric Multi-Processing) or MPP (Massively Parallel Processing).
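The following toy Python sketch illustrates the consistent-coding and duplicate-removal steps just described; the sources, field names and encodings are hypothetical.

```python
# Records arriving from two hypothetical operational sources use
# different encodings; the load step maps them onto one convention and
# removes duplicates before the warehouse is populated.
RAW_RECORDS = [
    {"source": "lab_a", "subject": "r-07", "sex": "F"},
    {"source": "lab_b", "subject": "r-07", "sex": 1},   # 0 = male, 1 = female
    {"source": "lab_b", "subject": "r-09", "sex": 0},
]

def normalize_sex(value):
    """Map every source-specific encoding onto a single convention."""
    mapping = {"m": "m", "f": "f", 0: "m", 1: "f"}
    return mapping[value if isinstance(value, int) else value.lower()]

seen, warehouse_rows = set(), []
for rec in RAW_RECORDS:
    row = {"subject": rec["subject"], "sex": normalize_sex(rec["sex"])}
    if row["subject"] not in seen:      # reconcile duplicates across sources
        seen.add(row["subject"])
        warehouse_rows.append(row)

print(warehouse_rows)
```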
Number-crunching power is another important aspect of data warehousing, because of the complexity involved in processing ad hoc queries and because of the vast quantities of data that the organization wants to use in the warehouse. A data warehouse can be used in different ways: for example, it can be used as a central store against which queries are run, or it can be used like a data mart. Data marts are small warehouses that can be established to provide subsets of the main store and summarized information, depending on the requirements of a specific group. The central-store approach generally uses very simple data structures with very few assumptions about the relationships between data, whereas marts often use multidimensional databases, which can speed up query processing because their data structures reflect the most likely questions. The data warehouse offers the potential to retrieve and analyze information quickly and easily.

2.2.2.3 Designing Data Warehouses

Designing data warehouses is very different from designing traditional information processing systems. For one thing, data warehouse users typically don't know nearly as much about their wants and needs as information processing users do. Second, designing a data warehouse often involves thinking in terms of much broader, and more difficult to define, application concepts than does designing an operational system (Weiss 1998). But while data warehouse design is different from what we have been used to, it is no less important, and the fact that end-users have difficulty defining what they need makes careful design no less necessary. In practice, data warehouse designers find that they have to use every trick in the book to help their users "visualize" their requirements. In this respect, robust, working prototypes are essential.

Developing or transforming databases into a data warehouse architecture is still a new concept. Most current commercial warehousing systems (e.g., Red Brick, Sybase, Arbor) focus on the query and analysis component, providing specialized index structures at the warehouse and extensive querying facilities for the end user. The WHIPS (WareHousing Information Project at Stanford) project (Labio, Zhuge et al. 1997), on the other hand, focuses on the integration component. In particular, they have developed an architecture and implemented a prototype for identifying data changes at heterogeneous sources, transforming and summarizing them in accordance with warehouse specifications, and incrementally integrating them into the warehouse. The WHIPS architecture is modular and has been designed specifically to fulfill several important and interrelated goals: data sources and warehouse views can be added and removed dynamically; the system is scalable by adding more internal modules; changes at the sources are detected automatically; the data warehouse may be updated continuously as the sources change, without requiring "down time"; and the warehouse is always kept consistent with the source data by the integration algorithms.
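The following toy sketch illustrates the general idea of incremental integration (it is not the WHIPS implementation): detected source changes are folded into a stored warehouse summary instead of recomputing the summary from scratch.

```python
# A stored warehouse summary, e.g. recording counts per experiment.
warehouse_view = {}

def apply_delta(view, change):
    """change = (op, experiment_id); op is 'insert' or 'delete'.
    Each detected source change updates the view incrementally."""
    op, key = change
    if op == "insert":
        view[key] = view.get(key, 0) + 1
    elif op == "delete" and view.get(key, 0) > 0:
        view[key] -= 1

source_changes = [("insert", "exp-1"), ("insert", "exp-1"), ("delete", "exp-1")]
for change in source_changes:
    apply_delta(warehouse_view, change)   # the warehouse stays consistent

print(warehouse_view)   # {'exp-1': 1}
```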
2.2.3 Time Series Data Mining

Time-series databases naturally arise in business as well as in scientific decision-support applications. The capability to find time sequences (or subsequences) that are "similar" to a given sequence, or to find all pairs of similar sequences, is needed in several applications, such as finding stocks with similar price movements, or finding seismic waves that are not similar in order to spot geological irregularities.

Similarity Search

The similarity search method first proposed in (Jagadish 1991) has since been refined and modified in several subsequent papers (Faloutsos, Ranganathan et al. 1994) (Agrawal, Lin et al. 1995) (Li, Yu et al. 1996). The other papers mentioned here (Shatkay and Zdonik 1996) (Agrawal, Psaila et al. 1995) have tried altogether different approaches to this problem.

Before any method for similarity can be applied to two signals, information about the signal shapes has to be extracted from the signals. A method for feature extraction was suggested in (Jagadish 1991), and the same method has appeared in subsequent papers (Faloutsos, Ranganathan et al. 1994) (Agrawal, Lin et al. 1995). The signal is mapped into the frequency domain by the Discrete Fourier Transform (DFT) and all but the first f frequencies are discarded. The remaining frequencies form an f-dimensional vector that is used as a representation of the original sequence. This makes it possible to use spatial access methods for finding similar shapes. The observations are that only the first few frequencies are of practical interest, and that the transformation preserves Euclidean distance in the frequency domain. When the system is given a sequence and asked to retrieve all similar sequences, all sequences whose feature vectors lie within a given distance of the query sequence's feature vector are returned. A major limitation of the method is that all sequences have to be of the same length, i.e. the system does not support subsequence matching.
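As an illustration of this scheme, the following Python sketch (using NumPy) extracts the first f DFT coefficients as a feature vector and answers a range query by Euclidean distance in feature space. The parameter values and data are illustrative only.

```python
import numpy as np

def dft_features(seq, f=3):
    """Map a sequence into the frequency domain and keep only the
    first f coefficients, as in (Jagadish 1991; Faloutsos et al. 1994).
    Real and imaginary parts are stacked into one feature vector."""
    coeffs = np.fft.fft(np.asarray(seq, dtype=float))[:f]
    return np.concatenate([coeffs.real, coeffs.imag])

def range_query(database, query, epsilon, f=3):
    """Return indices of stored sequences whose feature vectors lie
    within epsilon of the query's feature vector. All sequences must
    have the same length, the limitation noted above."""
    q = dft_features(query, f)
    return [i for i, seq in enumerate(database)
            if np.linalg.norm(dft_features(seq, f) - q) <= epsilon]

rng = np.random.default_rng(0)
db = [np.sin(np.linspace(0, 4 * np.pi, 64)) + 0.05 * rng.standard_normal(64)
      for _ in range(5)]
print(range_query(db, np.sin(np.linspace(0, 4 * np.pi, 64)), epsilon=1.0))
```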
A solution to the problem of subsequence matching is presented in (Faloutsos, Ranganathan et al. 1994), where a method for fast subsequence matching is proposed through the introduction of a sliding window. The window slides along the time series sequence, and for each window position the window contents are mapped into the frequency domain by the DFT. The first f frequencies (typically 2 or 3) are used to create a vector in feature space. The result is that each window's contents are mapped to a point in the f-dimensional feature space, and the resulting trail of points is represented by its minimum bounding (hyper-)rectangles, MBRs. The MBRs are stored in an R*-tree (Beckmann, Kriegel et al. 1990) that serves as an efficient index.

The R*-tree is a derivative of the R-tree, a popular spatial index. The R-tree is a height-balanced tree similar to a B-tree (Comer 1979), with index records in its leaf nodes containing pointers to data objects. The structure is designed so that a spatial search requires visiting only a small number of nodes. The index is completely dynamic: inserts and deletes can be intermixed with searches, and no periodic reorganization is required (Guttman 1984).

When we want to query the system for a signal, the query signal is subjected to the same procedure as described above. Then we calculate the Euclidean distance between the query signal's minimum bounding rectangles and the minimum bounding rectangles previously stored in the system; if the distance is sufficiently small, the signals associated with the found stored rectangles are returned as matches to the query. This generalization allows the data sequences to be of different lengths, and the query sequence can be smaller than any of the stored data sequences.
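A toy version of the sliding-window step, with a plain sequential scan standing in for the R*-tree index, might look as follows; the window length, f and the tolerance are illustrative choices.

```python
import numpy as np

def window_features(seq, w=16, f=2):
    """Slide a window of length w along seq and map each window's
    contents into f DFT coefficients: the trail of feature-space points
    that (Faloutsos et al. 1994) summarize with MBRs in an R*-tree.
    Here a plain list stands in for the spatial index."""
    seq = np.asarray(seq, dtype=float)
    feats = []
    for start in range(len(seq) - w + 1):
        c = np.fft.fft(seq[start:start + w])[:f]
        feats.append((start, np.concatenate([c.real, c.imag])))
    return feats

def subsequence_match(data_seq, query, epsilon, w=16, f=2):
    """Return offsets where a query of length w approximately matches a
    subsequence of data_seq; the query may be far shorter than the data."""
    c = np.fft.fft(np.asarray(query, dtype=float))[:f]
    q = np.concatenate([c.real, c.imag])
    return [start for start, vec in window_features(data_seq, w, f)
            if np.linalg.norm(vec - q) <= epsilon]

t = np.linspace(0, 8 * np.pi, 256)
print(subsequence_match(np.sin(t), np.sin(t[:16]), epsilon=0.5))
```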
In (Li, Yu et al. 1996) the work on similarity search for sequences is continued, and a method for reducing the search space in feature space is suggested. The features extracted from the signal are grouped in a hierarchy. When a signal is to be found, the most descriptive feature is used to make a fast selection from the database. All results that have a correlation above some predefined threshold are saved, and the process is repeated with the second most descriptive feature on the saved results. Because of this hierarchical search, a linear scan of the entire feature space can be avoided and thus a faster search can be made.

Another approach to the problem is suggested in (Shatkay and Zdonik 1996). They present a method in which all interesting "features" are extracted from the signal, but not in the same way as in the previously mentioned papers. Instead they extract those features that a domain expert finds interesting in the signal, for example peaks. The feature extraction is done so that the extracted features can be amplitude-scaled, contracted, dilated or shifted in time. This makes it possible to ask queries like "retrieve all signals with two peaks", and the system will find all sequences with two peaks, regardless of their length or of the peaks' height or position. This is done by breaking the sequence into smaller subsequences using a breaking algorithm. The breaking algorithm depends on the application, and the article does not discuss exactly how it works. Since this representation of the signal is more compact than the original representation (if the sequences obtained from the breaking algorithm are sufficiently larger than 1), an effective compression of the signal is achieved. Further information can also be gained from this representation by calculating derivatives, extremes, etc.

The Language Approach

A language approach was suggested by a group at IBM Almaden Research Center in (Agrawal, Psaila et al. 1995). This approach differs from the previous approaches in that they define a shape definition language, SDL. With SDL they are able to describe the shapes and properties of a signal. Shape matching is then transformed into the problem of unifying two expressions, the query expression and the stored sequence expression.

The most interesting feature of SDL is its capability for "blurry" matching, that is, the ability to describe the overall shape of a signal without having to care about specific details. We do not have to specify any exact values; it is enough to use a more diffuse description of the signal.

There are two different types of shapes in SDL: elementary shapes and derived shapes. The elementary shapes are those directly described by the SDL alphabet, while derived shapes are more complex shapes described by strings of tokens from the alphabet. The derived shapes can be very complex, since SDL has a set of modifiers that allows users to describe very complex signals. All shapes can be parameterized. SDL provides a natural and powerful language for expressing shapes and shape queries, and the important ability to perform blurry matching. However, SDL can only tell whether the signal is changing its value very rapidly or not, and whether the signal becomes zero. This introduces another problem that is not addressed in (Agrawal, Psaila et al. 1995), namely how, for example, "slightly increasing" should be defined. This is one of SDL's weaknesses.
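In the spirit of SDL's labeling of consecutive points (and of the Shape index developed in chapter 5), the following sketch assigns a symbolic label to each transition and performs a crude "blurry" match on the label string. The alphabet and the thresholds are illustrative choices, which is precisely the "slightly increasing" definition problem noted above.

```python
def shape_string(seq, small=0.1, big=1.0):
    """Label each pair of consecutive points with a shape token."""
    labels = []
    for a, b in zip(seq, seq[1:]):
        d = b - a
        if d > big:        labels.append("U")   # steep rise
        elif d > small:    labels.append("u")   # slight rise
        elif d < -big:     labels.append("D")   # steep drop
        elif d < -small:   labels.append("d")   # slight drop
        else:              labels.append("s")   # roughly stable
    return "".join(labels)

signal = [0.0, 0.2, 1.5, 1.6, 0.3, 0.25]
s = shape_string(signal)
print(s)                                  # "uUsDs" for this signal
# Blurry query: is there a steep rise followed, at any later point,
# by a steep drop, regardless of the exact values in between?
print("U" in s and "D" in s[s.index("U"):])
```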
2.2.4 Discussion

Experimentalists have accumulated a large amount of experimental data, and how to organize these data for easy retrieval, analysis and mining is challenging work. In the presence of a large amount of experimental data, a data warehouse is desirable for efficient data storage, retrieval, analysis and decision making.

The data addressed in the papers on time series similarity search are mostly stock market time series. However, I am working on neurophysiological recordings, which differ from the time series in the stock market domain. Neurophysiological data are generally not translated into the frequency domain but rather kept in the time domain, since the (truncated) translation results in a loss of information. Moreover, neuroscientists are usually interested in how the recordings change as time changes. There is generally no obvious hierarchy in neurophysiological time series recordings, so the hierarchical approach proposed in (Li, Yu et al. 1996) is not directly applicable. SDL provides a labeling mechanism for any two consecutive points in a time series, and it motivates my Shape index in chapter 5 of this thesis. But, as we can see, most of this work focuses on whole-sequence matching, without considering cases in which some subsequences match closely while other subsequences match only loosely.

2.3 System Integration

A system is a collection of connected units that are organized to accomplish a specific purpose. System architecture is the organizational structure of a system. We distinguish three different classes of architectural element:

• processing elements;
• data elements; and
• connecting elements.

How the components of the system architecture are used and partitioned defines the Application Architecture, which identifies the techniques, tools, technologies, and abstractions used to develop an application.

2.3.1 Schema Integration

The Scalable Knowledge Composition (SKC) project (Jannink, Mitra et al. 1999) at Stanford was initiated to develop a novel approach to resolving semantic heterogeneity in information systems. The SKC approach aims to develop an algebra over ontologies that represent the terminologies of distinct, typically autonomous domains. Through this approach, the problem of managing large knowledge bases is reduced to one of composition; no global agreement is needed among the maintainers of disjoint ontologies. This distributed approach to knowledge maintenance makes semantic interoperation truly scalable, since it partitions the ontology maintenance problem into small, disjoint and specialized domains (Mitra, Wiederhold et al. 1999). The project is hence conceptually quite innovative.

Schema Implantation and Semantic Evolution (Kahng and McLeod 1998) is a partial database integration scheme in which remote and local (meta)data are integrated in a stepwise manner over time. This technique has been experimentally implemented and demonstrated with USC Brain Project examples. The knowledge required to interrelate remote and local (meta)data is acquired from the corresponding domain experts, who have the necessary expertise about their application domain semantics. Relating remote data to local data enables information sharing and exchange between a remote database and a local database by allowing remote information units to be accessed and manipulated from within the local database environment. In (Kahng and McLeod 1996) a shared ontology, a collection of key concepts and terms along with their inter-relationships, is described. A "dynamic ontology" is a shared ontology that adapts to an application domain and evolves with time as the concepts in that domain change.

2.3.2 Database Middleware Systems

Database middleware systems (Papakonstantinou, Garcia-Molina et al. 1995) (Tomasic, Raschid et al. 1996) (Adali, Candan et al. 1996) (Bontempo 1995) integrate data from multiple sources. To be effective, such a system must provide one or more integrated schemas and must be able to transform data from different sources to answer queries against these schemas. The power of these systems' query engines and their ability to connect to several information sources make them a natural base for doing more complex transformations as well. Since data come from many diverse systems these days, such a system must provide access to a broad range of data sources transparently. It must have sufficient query processing power to handle complex operations and to compensate for the limitations of less sophisticated sources. Some transformation operations (especially the complex ones) require that data from different sources be interrelated in a single query.

Garlic (Carey et al. 1995) is a typical database middleware system providing most of the mentioned capabilities. Garlic's powerful query engine (Haas, Miller et al. 1999), in both planning and executing a query, can communicate with wrappers for the various data sources involved in the query. Systems of this type have two opportunities to transform data: first, at the wrapper, as the data are mapped from the source's model to the middleware model, and second, through queries or views against the integration schema.

The TSIMMIS project at Stanford University (Chawathe, Garcia-Molina et al. 1994) develops tools that facilitate the rapid integration of heterogeneous information sources that may include both structured and semistructured data. It provides an architecture and tools for accessing multiple heterogeneous information sources by translating source information into a common self-describing object model, called the Object Exchange Model (OEM).
TSIMMIS provides integrated access to heterogeneous sources through a layer of source-specific translators as well as "intelligent" modules, called mediators. Translators (wrappers) convert queries over information in the common model (OEM) into requests the source can execute, and the data returned by the source are converted back into the common model. Mediators are programs that collect information from one or more sources, process and combine it, and export the resulting information to the end user or an application program. Users or applications can choose to interact either directly with the translators or indirectly via one or more mediators.

In order to access information from a variety of heterogeneous information sources, wrappers are developed to convert queries into one or more commands/queries understandable by the underlying source and to transform the native results into a format understood by the application (Hammer, Breunig et al. 1997). In (Li, Yerneni et al. 1998), a TSIMMIS mediator is presented that generates feasible query plans in the presence of limited source capabilities.
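The translator/mediator split can be sketched schematically as follows; the two sources, their formats and the field names are hypothetical, and the code is not the actual TSIMMIS or Garlic interface.

```python
# Two hypothetical sources in different native formats.
CSV_SOURCE = [("r-07", 4.2), ("r-09", 3.8)]                     # tuples
KV_SOURCE = {"r-11": {"epsp_mv": 5.1}, "r-07": {"epsp_mv": 4.3}}  # key-value

def csv_wrapper(subject=None):
    """Translate a common-model query into access on the tuple source."""
    for sid, mv in CSV_SOURCE:
        if subject is None or sid == subject:
            yield {"subject": sid, "epsp_mv": mv, "source": "csv"}

def kv_wrapper(subject=None):
    """Translate the same query into access on the key-value source."""
    for sid, rec in KV_SOURCE.items():
        if subject is None or sid == subject:
            yield {"subject": sid, "epsp_mv": rec["epsp_mv"], "source": "kv"}

def mediator(subject=None):
    """Collect, combine and export records from all wrapped sources."""
    results = list(csv_wrapper(subject)) + list(kv_wrapper(subject))
    return sorted(results, key=lambda r: r["subject"])

for row in mediator("r-07"):
    print(row)
```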
2.3.3 Data Integration Systems

The data integration system developed at the University of Washington (Ives, Florescu et al. 1999) provides an automated method for querying across multiple heterogeneous databases in a uniform way. In essence, a specific schema is created to represent a particular application domain, and data sources are mapped as views over this schema. The user asks a query over the specific schema, and the data integration system reformulates it into a query over the data sources and executes it. Query processing in data integration occurs over network-bound, autonomous data sources, ranging from conventional databases on the LAN or intranet to web-based sources across the Internet.

Their Tukwila data integration system is designed to scale up to the amounts of data transmissible across intranets and the Internet (tens to hundreds of MBs), with large numbers of data sources. This requires extensions to traditional optimization and execution techniques for three reasons: there is an absence of quality statistics about the data, data transfer rates are unpredictable and bursty, and slow or unavailable data sources can often be replaced by overlapping or mirrored sources. The Tukwila data integration system is designed to support adaptivity at its core using a two-pronged approach. Interleaved planning and execution with partial optimization allows Tukwila to quickly recover from decisions based on inaccurate estimates. During execution, Tukwila uses some complex query operators (Ives, Florescu et al. 1999), which produce answers quickly, and the dynamic collector, which robustly and efficiently computes unions across overlapping data sources.

2.3.4 Discussion

The approach in SKC is a modular approach, since operations and rules are defined to integrate the ontologies from different domains through abstraction of the ontologies; consequently, source ontologies can be largely maintained autonomously. The approach in (Kahng and McLeod 1998), in contrast, is a partial database integration through attribute translation assisted by domain knowledge.

Schema integration, as the name implies, is mainly integration at the metadata level, while database middleware systems go further, not only integrating metadata but also processing the data themselves and performing complex queries over different sources where data can be in different formats. The Tukwila data integration system mainly addresses issues of data integration in an open network environment. As we can see, most of the work on system integration focuses on conceptual and system-level integration and/or database integration. Few of these efforts work on developing application-level connection mechanisms and building an integrated environment for a simulator, databases and data mining systems. These are the issues I address in this thesis.

Chapter 3: System Architecture of the Integrated Environment

Modeling studies and experimental studies are traditionally separated, each with its own methods. Most current neuroscience modeling and simulation systems are stand-alone systems: usually humans feed the simulations data from published experimental results in the literature instead of directly from experimental databases. Moreover, not many neuroscience simulation and experimental applications are linked to a data mining facility, which could potentially advance the development of both. The integration of modeling and simulation systems with experimental databases and a data mining facility will have a significant impact on neuroscience research and will provide new insights for both modelers and experimentalists.

Thus, this project aims at providing an integrated environment for neuroscientists and modelers to communicate with each other in a more systematic way. A middle-ware layer (MDIS), the Model/Data Integration System, is built on top of the three components. To this end, we have provided an N-tier system architecture: the user interface on top, MDIS in the middle, and the modeling and simulation system, data mining system and experimental databases on the bottom. We will emphasize the last two layers in this thesis.

3.1 Basic System Architecture

In this section, the system architecture for an integrated environment is presented. The background of the environment and some particular components of the architecture are described.

3.1.1 Background

This effort is a part of the Brain Project at the University of Southern California. The USC Brain Project (USCBP) is funded in part by the Human Brain Project consortium, and integrates research in neuroscience with research in Neuroinformatics, adapting such computational techniques as databases, the World Wide Web, data mining, and visualization to the analysis of neuroscience data. It employs modeling and simulation to study the relations between structures and functions of the brain.

The brain is a complex system with multiple levels of functional organization, prohibiting any attempt at an integrative study of the entire brain. Consequently, each neuroscience study focuses on a specific part of the brain, resulting in dispersed and fragmented data sets. Similar conditions exist in the discipline of computational modeling. Moreover, not only are the data sets fragmented, but our understanding of the brain is also limited by the fragmented neuroscience studies. Usually a model spans a few aspects of the nervous system, and each aspect is studied using a distinct approach. To verify and test the accuracy of the models, we need data sets from multiple sites. Thus linking the fragmented data sets from multiple sites will provide a better test base for modeling and give more insight into experimental studies as well. Further, new knowledge can be gained only through combining relationships among fragmented data sets, showing a need for data mining on the experimental data stored in multiple databases.

At the current stage, experimental data are stored in different files and formats at different sites, and the input data for simulations are manually selected from the neuroscience literature. To this end, USCBP has made initial efforts to organize experimental data into experimental databases, making data accessible electronically. Still, linking a simulator and multiple databases is not a straightforward process, because of the large data sets in the databases and the difficulty of selecting relevant data for a simulation. Moreover, finding the hidden knowledge among the experimental data is also challenging work. Therefore, an integrated environment can provide a comprehensive study of modeling and experimental studies, and can explore the methodology for mining data sets from neuroscience databases.

Our integrated simulation, data mining, and database environment will provide mechanisms offering specific characteristics to applications. The developed system architecture provides services for:

• Modeling neural systems at different levels of granularity,
• Simulating neural models with real experimental data from experimental databases,
• Searching the experimental databases for experimental results similar to corresponding simulation results,
• Providing an organized data storage system, a data warehouse, which can be subject-oriented,
• Finding trends and associations among experimental protocols and experimental recordings by data mining,
• Facilitating communication among the components with a common representation for data transfer and an application-dependent protocol for message passing,
• Providing mechanisms for integrating modeling and experimental studies.

We assume that more services will be included in this environment as it is further developed. In this thesis, we have addressed the following issues:
• Developing a multi-level modeling system to model neural systems at different levels of granularity (chapter 4),
• Building the connection between simulations and experimental databases, facilitated by a middle-ware layer (MDIS), and searching the experimental databases for experimental results similar to corresponding simulation results (chapter 6),
• An initial design of an organized data storage system, a data warehouse, which can be subject-oriented (section 5.2),
• Developing a flexible matching mechanism to find similar time series experimental recordings in the databases, as a base for the inclusion of further data mining utilities (sections 5.3-5.4),
• A preliminary design of a common representation for data transfer and an application-dependent protocol for message passing to facilitate communication among the components (sections 7.4-7.5).

3.1.2 Components

The overall architecture consists of the following components:

• Graphical user interface,
• Multi-level modeling and simulation system,
• Data mining system,
• Experimental databases, and
• Middle-ware layer.

Each of the first four components is a stand-alone system. Users can specify the requirements for a model and then run a simulation through the simulator. Users can get input data from the databases and feed them to the simulator. The outputs of a simulation can be the input for the data mining system, which searches for similar results in the databases. The communication among these components is through message passing. Figure 3.1 captures the relationships among the components. The middle-ware layer sits on top of the components, and all of them can access it.

[Figure 3.1: Relationships among the Four Components (the user interface, simulator, databases, and data mining system).]

The user interface is the front end for the users. The other three components form a triangle architecture, which is presented in detail in Figure 3.2. The middle-ware layer contains several mechanisms for integrating the modeling, experimental database and data mining components; details are shown in chapter 6.

[Figure 3.2: A Framework for Linking Modeling and Simulation (the EONS object library and Model-DataView), Experimental Databases (experiment data models) and Data Mining (NeuroMiner).]

3.1.2.1 Graphical User Interface

The GUI provides friendly user access to the components, as well as to the mechanisms in the middle-ware layer of the integrated environment. It provides the windows for users to input their requirements in certain formats, and it is also the medium for visualizing simulation results, data mining results and database search results. In this thesis, however, the construction of this component is not addressed; it is only briefly discussed when we present the development of the other components.

3.1.2.2 Multi-level Modeling and Simulation

The main component of the multi-level modeling and simulation system is EONS (Elementary Objects for Neural Systems), an object-oriented hierarchical object library for neural systems.
Each model, e.g. for a synapse or a neuron, is self-contained, and any reasonable degree of detail can be included. This provides the flexibility to compose complex system models that reflect the hierarchical organization of the brain from objects of basic elements, and new models can be constructed from existing models. Chapter 4 presents the development of the EONS system.

To be able to link a model with experimental data in the database, a common ground is needed. One of the fundamental characteristics of biological experiments is the testing of a certain hypothesis. The experimental procedures for testing a hypothesis can be fully characterized by a set of protocols that specify the conditions of the experiment and the manipulations of system parameters. We have developed a protocol-based simulation scheme to facilitate the communication between these two disciplines.
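A minimal sketch of what protocol-based simulation means in practice: the same protocol record that describes an experiment (conditions plus a schedule of manipulations) drives a toy model. All names and values here are illustrative and do not reflect the PSL syntax defined in chapter 6.

```python
# A hypothetical protocol record: experimental conditions plus a
# schedule of experimental manipulations.
protocol = {
    "conditions": {"temperature_c": 32, "preparation": "hippocampal slice"},
    "manipulations": [
        {"time_ms": 10, "action": "stimulate", "amplitude": 1.0},
        {"time_ms": 30, "action": "stimulate", "amplitude": 1.0},
    ],
}

def run_simulation(protocol, t_end_ms=50, tau_ms=8.0):
    """Drive a toy leaky response with the protocol's stimulation times."""
    stim = {m["time_ms"]: m["amplitude"] for m in protocol["manipulations"]
            if m["action"] == "stimulate"}
    v, trace = 0.0, []
    for t in range(t_end_ms):
        v += stim.get(t, 0.0) - v / tau_ms   # input plus leak, 1 ms steps
        trace.append(round(v, 3))
    return trace

print(run_simulation(protocol)[:15])
```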
3.1.2.3 Data Mining

The data mining system allows both experimentalists and modelers to find hidden patterns in the experimental data and/or to search for the models they need in the model database. It provides knowledge discovery mechanisms to extract patterns and summarize trends. As the first version of the system, we provide a flexible matching mechanism to search for similar experimental recordings in the databases and to compare simulation results with experimental recordings. A framework for data mining on neuroscience data is also presented. The data mining system operates on the databases. The multi-level modeling system and the data mining system can be seen as two loosely coupled systems: the simulation results can provide input for the data mining system, and the results of the data mining system can provide insights for the modeling of neural systems. Chapter 5 of this thesis covers this work.

3.1.2.4 Experimental Databases

The experimental databases are the data storage and manipulation system for experimental data. There can be one central database or a cluster of databases, and in our applications the databases could potentially contain a very large amount of experimental data. It is therefore desirable that a data warehouse be constructed for the databases, and data marts can be built for specific laboratories. Some index structures are also needed to facilitate data searching and data mining. Chapter 5 of this thesis presents a data warehouse schema for neurophysiological data and a shape index for similarity matching of experimental time series recordings. Since the experimental databases are still under development in USCBP, the data warehouse schema can be adopted later to better facilitate data mining on neuroscience data.

3.1.2.5 Middle-ware Layer

MDIS contains several mechanisms for linking modeling and experimental databases. A formal specification of protocols is needed to formalize the contents of protocols and to provide a common representation for mapping protocols between the modeling and experimental disciplines. Consequently, we have developed a Protocol Specification Language (PSL), motivated by terminological logic (Brachman and Schmolze 1985) (Brachman, McGuinness et al. 1991), to provide a formal specification for simulation protocols and experimental protocols so as to facilitate the integration of modeling and experimental studies. A semantic model is the internal model for PSL units, and a metadata model is designed for setting up the ontology for the integration.

We developed the Model-DataView units to connect a model to the specific experimental data it needs. Different simulation models need different kinds of data from the experimental databases. We can create database views to extract relevant data from the databases, and we design Model-DataView to relate different views to models in simulation. Based on the relationships between a model and the views, we combine a model and the related views into one unit, so that users who run the models later do not need to construct the queries/views every time they use the models. Different versions of the Model-DataView units will also be constructed to provide more flexibility, and, if necessary, we can materialize those views for future use. A query manager for cooperative queries and a GUI have been developed as well. The design and development of MDIS are discussed in chapter 6.
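The Model-DataView idea can be sketched as follows: a model is packaged with the named views that fetch its input data, so a returning user does not rebuild the queries. The views, data and model below are hypothetical stand-ins, not the MDIS implementation.

```python
# Hypothetical experimental recordings.
RECORDINGS = [
    {"experiment": "exp-1", "protocol": "paired-pulse", "epsp_mv": 4.2},
    {"experiment": "exp-2", "protocol": "single-pulse", "epsp_mv": 3.1},
]

# Named database views: each extracts the rows relevant to one model.
VIEWS = {
    "paired_pulse_epsp": lambda rows: [r for r in rows
                                       if r["protocol"] == "paired-pulse"],
}

class ModelDataView:
    """Bundle a model with the views that supply its input data."""
    def __init__(self, model_fn, view_names):
        self.model_fn, self.view_names = model_fn, view_names

    def run(self):
        data = {name: VIEWS[name](RECORDINGS) for name in self.view_names}
        return self.model_fn(data)

def synapse_model(data):
    rows = data["paired_pulse_epsp"]
    return sum(r["epsp_mv"] for r in rows) / len(rows)

unit = ModelDataView(synapse_model, ["paired_pulse_epsp"])
print(unit.run())   # mean EPSP over the view's rows
```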
3.2 The Design Options

This section explains a few of the options in designing the integrated environment. The design options range from loosely coupled vs. tightly coupled components, to centralized vs. distributed data warehouses, to centralized vs. parallel data mining. The different types of communication mechanisms are briefly discussed, and finally the architecture adopted in our project is described.

3.2.1 Loosely coupled versus tightly coupled components

Loosely coupled systems provide an almost unlimited scalability option: they provide the capability to attach multiple systems together in a "share nothing" cluster, and once connected, those loosely coupled systems appear as a single system to the users. It is known that loosely coupled systems offer more advantages in complex environments. The autonomous groups may be more sensitive to environmental change and allow more simultaneous adaptation to conflicting demands. If problems develop in one part of the system, that part can be sealed off from the rest of the system, so the resulting total system may be more stable when loosely coupled. Allowing local components to adapt to local environments can also reduce the coordination costs for the whole system. In contrast, tightly coupled components make scalability difficult: every change made to one component will affect the other components, since they share internal data structures.

Federated database systems (FDSs) can be defined as collections of independently managed, heterogeneous database systems that allow partial and controlled sharing of data without affecting existing applications. The component systems of an FDS may be more or less tightly coupled. In our application, there are multiple databases designed for different areas of neuroscience research. Those databases form a cluster of database systems similar to an FDS, and they are the data sources for the data warehouse. In this thesis, I consider the situation of one central database. Each component in the integrated environment, such as the modeling and simulation system, the data mining system and the experimental databases, is treated as a stand-alone system, so that the components can be developed, modified, and extended separately.

3.2.2 Centralized data warehouses versus distributed data warehouses

Centralized data warehouses are what most people think of when they are first introduced to the concept of a data warehouse. A centralized data warehouse is a single physical database that contains all of the data for a specific functional area, department, division or enterprise. Centralized data warehouses are often selected where there is a common need for informational data and there are large numbers of end-users already connected to a central computer or network. A centralized data warehouse may contain data for any specific period of time, and usually contains data from multiple operational systems. Centralized data warehouses are real: the data stored in the warehouse are accessible from one place and must be loaded and maintained on a regular basis. Normally, data warehouses are built around an advanced relational database management system or some form of multidimensional informational database server.

Distributed data warehouses are data warehouses in which certain components of the warehouse are distributed across a number of different physical databases. Increasingly, large organizations are pushing decision-making down to lower and lower levels of the organization, and in turn pushing the data needed for decision making down (or out) to the LAN or local computer serving the local decision-maker. Distributed data warehouses usually involve the most redundant data and, as a consequence, the most complex loading and updating processes.

In this thesis, I propose a data warehouse schema for a specific database containing neurophysiological EPSP (Excitatory PostSynaptic Potential) data. Data warehouse schemas for other neuroscience data, such as neurochemical or behavioral data, can be constructed in a similar fashion later as the project goes on. When more than one data warehouse exists in the environment, distributed data warehouse issues can be considered; this will be future work.

3.2.3 Centralized data mining versus parallel data mining

Data mining is the process of identifying valid, novel, potentially useful, and ultimately comprehensible knowledge from databases. It is the automated analysis of large volumes of data, looking for the "interesting" relationships and knowledge that are implicit in large volumes of data. Most of the current work on data mining lies in developing fast and accurate algorithms on a centralized basis.

Research and development work in the area of parallel data mining is concerned with the study and definition of parallel algorithms, methods, and tools for the extraction of novel, useful, and implicit patterns from data using high-performance architectures. Parallel data mining means that a single user can submit a query operation against several databases, and multiple data mining processor tasks are initiated internally to process the request.
When data mining tools are implemented on high-performance parallel computers, they can analyze massive databases in a reasonable time. Faster processing also means that users can experiment with more models to understand complex data, and high performance makes it practical to analyze greater quantities of data, which in turn yields improved predictions. Data mining algorithms written for a uniprocessor machine will not automatically run faster on a parallel machine; they must be rewritten to take advantage of the parallel processors. Independent pieces of the application are assigned to different processors; the more processors, the more pieces can be executed without reducing throughput. This is called inter-model parallelism, and this kind of scale-up is also useful in building multiple independent models.

In this thesis, I consider a centralized data mining environment. When the development of experimental databases in USCBP matures, more databases will be available, and a parallel data mining environment can be explored as future work.

3.3 The Adopted Architecture Model

We build a triangle structure for the three main components in our environment; the connections to the middle-ware layer are introduced in chapters 6 and 7. Data and requests can be sent among these three components. The adopted architecture for linking modeling and simulation, experimental databases and data mining is shown in Figure 3.3. We use a loosely coupled component model, with a modeling and simulation system, a central data warehouse, and a centralized data mining system. From the user interface, users can run a simulation and/or submit queries to search for data in the experimental databases. The data mining component runs on top of the databases.

[Figure 3.3: The Adopted Architecture. User requests are served by EONS, NeuroMiner and the metadata, with a data warehouse built over multiple databases (DB 2 through DB n).]
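Schematically, the middle-ware layer routes messages among the registered components, as in the following sketch; the component behaviors are stubs, and the real interfaces are developed in chapters 4 through 6.

```python
class MDIS:
    """A toy message router standing in for the middle-ware layer."""
    def __init__(self):
        self.components = {}

    def register(self, name, handler):
        self.components[name] = handler

    def send(self, target, message):
        return self.components[target](message)

mdis = MDIS()
mdis.register("warehouse", lambda msg: [4.2, 4.5, 4.1])            # stub query
mdis.register("eons", lambda msg: sum(msg["data"]) / len(msg["data"]))
mdis.register("neurominer", lambda msg: min(msg["data"]))

data = mdis.send("warehouse", {"query": "epsp for exp-1"})
sim = mdis.send("eons", {"data": data})          # run a (stub) simulation
best = mdis.send("neurominer", {"data": data})   # (stub) similarity search
print(sim, best)
```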
In the adopted architecture (Figure 3.3), I have addressed EONS, NeuroMiner, the data warehouse, and the metadata in my thesis; building the experimental databases is covered by others' work in USCBP. The development of EONS is discussed in chapter 4. This system can model neural systems at different levels of granularity and incorporate synaptic dynamics into neural networks. The prototypical work on NeuroMiner and the data warehouse is discussed in chapter 5. The preliminary version of the data miner for neuroscience, NeuroMiner, includes a flexible similarity matching function and some supporting index mechanisms, such as a shape index and a data warehouse schema for organizing neurophysiological EPSP data in USCBP. The integration issues for linking these components are discussed in chapters 6 and 7 of this thesis.

The development of the metadata is associated with the individual components and with the integration. Figure 3.4 shows the metadata in place among the components; each item of metadata is described in the corresponding chapter where I discuss the components and the integration. The neural hierarchy is discussed in chapter 4. The Model-DataView, simulation protocols, experimental protocols, hypotheses, PSL, XML and the ModelData communication protocol are discussed in chapter 6. The shape index and the data warehouse schema are discussed in chapter 5.

When there is more than one database, parallel data mining algorithms can be run to extract inter-database relationships, and federated database issues will also be considered to facilitate the data mining. These will be future work.

[Figure 3.4: The Metadata in the Integrated Environment: the neural hierarchy, Model-DataView, hypotheses, simulation protocols, experimental protocols, the shape index, the data warehouse schema, the database schema, PSL, XML and the ModelData protocol.]

Chapter 4: Multi-level Modeling

4.1 Introduction

A major unresolved issue in neuroscience is how the functional capacities of the brain depend on (emerge from) underlying cellular and molecular mechanisms. This issue remains unresolved because of the number and complexity of the mechanisms known to exist at the cellular and molecular level, their nonlinear nature, and the complex interactions of these processes. Thus, the capability of simulating neural systems at different levels of organization is essential for an understanding of brain function. To this end, the development of a multi-level modeling system, EONS (Elementary Objects of Neural Systems), has been undertaken to provide a general framework for representing structural relationships and functional interactions among different levels of neural organization.

The development of EONS is twofold. One goal is to build an object library of elementary neural objects that can be used to build more complex neural modules. The second is to use it as a multi-level modeling system for modeling neural systems at different levels of granularity. To build a library of neural objects, we need to abstract the important characteristics of the components of the neural system and
I have designed a connectivity mechanism that can systematically set up the connections between neurons in different layers for simulating a large neural network. A set of neural objects has been implemented as components in the EONS library. I have used EONS objects to 80 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. compose a synapse model with different combination o f AMP A and NMDA receptors. A neural network simulation incorporating detailed synaptic dynamics has also been conducted to demonstrate the capability ofEONS to extract essential principles from a complex system. I have implemented some core functions ofEONS for flexible neural modeling and reuse of the neural objects. The work on developing a set of system functions to make EONS a better simulation environment is an interesting future work. A tight coupling with experimental studies is critically important for the success of a computational model. This is particularly important given that experimental and computational approaches can complement each other. To address this, we have developed a protocol-based simulation with EONS. Therefore, in this chapter, I will first introduce the system complexity issues in section 4.2. Then in section 4 .3 ,1 will discuss the system design ofEONS, and the development o f core functions ofEONS including both model constructions and simulations. Protocol-based simulation is introduced in section 4.4. The discussion and future work are given in section 4.5. Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 4.2 System Complexity Nervous systems can be analyzed into different levels, ranging from network (system) level to the neural, cellular and molecular levels. When modeling nervous systems in terms of multiple levels, we need to model non-linear interactions of multiple processes at a given level or across different levels o f the organization. There are different processes going on, each functioning at different spatial and time scales. 4.2.1 Non-linear Interactions of Multiple Processes Figure 4.1 shows the complexity of multi-level modeling. Nervous systems are composed of processes from different levels of organizations. At the neuron level, neurons can be modeled with quite different models, such as McCulloch-Pitts neurons, or using an integrator model. When one wants to model the internal structures o f a neuron, more details need to be specified. A neuron can be modeled to include a soma, dendrites and an axon. Dendrites can be seen as compartments with characteristics and functions associating with those compartmental modules. The common method used is cable theory. When modeling the contact between two neurons, synapses can be included. In the postsynaptic site, different ion channels (Na+, K+, Ca2+, etc.) and receptor channels (AMP A, NMD A, GAB A, etc.) can be 8 2 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. integrated to produce the synaptic potentials. These bring in cellular and molecular levels of neurobiological interactions. Those interactions may complicate the processes not only in the synaptic level, but also in the neural level. The interactions consist of complicated neural, synaptic, or molecular processes that are non-linear in nature. A neural network can be composed of neurons with different properties. 
When building a neural network, the connections between neurons can be modeled at different levels. At the neuron level, as in conventional neural networks, a connection can be modeled as a simple synaptic weight represented by a number. Alternatively, more detail can be included, down to synaptic, cellular and molecular level connections. If we model down to the synaptic level, the connection runs from a presynaptic terminal to a postsynaptic spine. Detailed synaptic mechanisms are incorporated as non-linear functions that adjust the connections, so the synaptic strengths are no longer synaptic weights (numbers). The complexity of the non-linear functions is determined by the processes included and by the level of detail of the ion channel and receptor channel modeling. The functions involved in the connections reflect non-linear interactions from the molecular level up to the neural level.

Figure 4.1 Complexity of Multi-level Modeling (two 20x20 layers, i.e., 400 neurons per layer, with each neuron connecting to 40 neurons in the other layer for a total of 16,000 synapses; the inset shows a synapse down to the level of the spine, vesicles and AMPA receptors)

Moreover, the number of neurons in a network can be large, and consequently the number of connections between neurons can be large as well. Suppose there are two layers of neurons in a network, each layer having 20x20 neurons. If each neuron connects to 40 neurons (a small number) in the other layer, there are a total of 16,000 synapses. Since the interactions between the components at a given level and among different levels of the neuron structural hierarchy are non-linear, a systematic set-up of the connections between the components is a challenging task. The connections exist not only at the level of neurons but also at lower levels of the hierarchy. The choice of modeling level is subject to the user's judgment and interests, and any level of complexity can be selected. The models built should capture the underlying interactions (at or across multiple levels of organization), yet be simple enough to remain adequate for analysis.

4.2.2 Spatial and Time Scales

Information processing in the brain is concerned with the spread and interaction of electrical and chemical signals within and among neurons. This involves nonlinear mechanisms that span a wide range of spatial and temporal scales (Carnevale and Rosenthal 1992). The complexities of the non-linearities and spatio-temporal relationships involved are quite unlike those in non-biological domains, because they are constrained to operate within the intricate anatomy of neurons and their interconnections, in keeping with our understanding of the cells and circuits of the brain.

Take a neuron as an example. From the neuron structural hierarchy, we can see that different components inside a neuron have special structural relationships. The underlying interactions actually involve spatio-temporal relations as well. For instance, when an action potential propagates along the axon terminal, the patterns of the spikes, represented by different spatio-temporal combinations, encode certain information that the neuron is transferring. Cable theory is another example: its equations describe the relationships between current and voltage in a one-dimensional cable.
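For reference, a standard textbook form of the passive one-dimensional cable equation (stated generally here; this is not necessarily the exact formulation used in EONS) is

$$\lambda^{2}\,\frac{\partial^{2} V}{\partial x^{2}} \;=\; \tau\,\frac{\partial V}{\partial t} \;+\; V, \qquad \lambda = \sqrt{r_m/r_i}, \quad \tau = r_m c_m,$$

where $V$ is the membrane potential relative to rest, $r_m$ and $c_m$ are the membrane resistance and capacitance per unit length, $r_i$ is the axial (intracellular) resistance per unit length, $\lambda$ is the space constant, and $\tau$ is the membrane time constant.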
Spatial discretization of the partial (ordinary) differential equations is equivalent to reducing the spatially distributed neuron to a set of connected compartments: the longitudinal spread of voltage and current is represented by approximating the cable equation as a series of compartments connected by resistors. Cable diameters and channel densities need to be considered. Thus, spatial relationships among neurons are captured in equations that encode brain mechanisms.

The processes at different levels of neural organization also have different time scales. For instance, an action potential may function at the millisecond scale, while some molecular processes may function at the microsecond scale. Figure 4.2 illustrates the different time scales of neural processes (Bose and Liang 1996).

Figure 4.2 Processes Evolve at Different Time Scales (vesicle fusion and calcium diffusion at the microsecond scale; transmitter diffusion, synaptic currents and vesicle mobilization at the millisecond scale; synaptic plasticity from seconds up to days)

4.3 Multi-level Modeling System: EONS

Simulating neural systems at different levels of organization is essential for understanding brain function. To this end, a multi-level modeling system, EONS (Elementary Objects of Neural Systems) (Liaw, Shu et al. 1996) (Liaw, Shu et al. 1999), has been developed to provide a general framework for representing structural relationships and functional interactions among different levels of neural organization. An object-oriented design methodology is used to form a hierarchy of neural models, modeling systems, networks, neurons, and synapses, down to molecules. Each model is self-contained and any degree of detail can be included. This provides the flexibility to compose complex system models from objects of basic elements, in a way that reflects the hierarchical organization of the brain.

My contribution to this effort is mainly the development of the core functions of EONS. Therefore, in the following sections I will first introduce the system design of EONS, including the system architecture, the object library design and flexible neural modeling. Then I will focus on neural modeling and present simulation results. Some interesting future work for making EONS a better multi-level modeling tool is also addressed.

4.3.1 EONS System Design

The design of EONS is composed of three layers, namely the interface layer, the object-oriented modeling layer and the simulation layer (see Figure 4.3). The user interface layer captures the users' view of the system and helps to specify the requirements of the models. At the second layer, the users' requirements are captured in the object models; this is where multi-level modeling takes place. The object models in the second layer are object-oriented and can be shared by many users. The third layer is the simulation layer, whose main purpose is to efficiently implement the models built in the second layer.

Figure 4.3 System Design of EONS (the interface layer captures the users' view, the object-oriented modeling layer holds the logical structures, and the simulation layer provides the implementation)

Figure 4.4 shows the different stages (the life cycle) of modeling in which EONS is involved.
The modeling part starts with a system analysis, which is divided into structural analysis and functional analysis. After these stages, multi-level modeling is employed. New models can be constructed from existing models, and simulated and verified models can be placed into the object library for later use.

Figure 4.4 The Different Stages of Analysis and Modeling (system analysis, comprising structural and functional analysis; multi-level modeling, drawing on existing models and producing new models; and simulation and testing)

4.3.1.1 EONS Library

By grouping EONS models together to form a hierarchy, and including the numerical methods used, we can construct the EONS library, which can be included in simulation programs. The objects in EONS are represented as a hierarchy, from the neural network, to the neuron (axon terminal, soma and dendrite) and synapse (axon terminal, synaptic cleft and post-synaptic spine), down to cellular properties (membrane, ion channels and receptor channels). Lower-level properties such as membrane properties, channel types, cleft properties, etc., can also be specified. Molecular kinetics and numerical methods (ordinary differential equations, partial differential equations, and geometry) are also included. A graphical description of the EONS library is shown in Figure 4.5.

The EONS object library at the current stage is a set of class definitions and implementations in C++, including Neural network, Neuron, Synapse, Axon terminal, Post-synaptic spine, Membrane (including resistance, capacity, and leak current), AMPA and NMDA objects, and the numerical method ODE. This C++ code, representing EONS object classes and methods, can be reused by different neural models. The assembly of neural objects to form a neural model is still done manually; an interesting piece of future work is to automate this process.

Figure 4.5 EONS Library Objects and Methods (a hierarchy from the neural network through neurons with axon terminal, soma and dendrite, and synapses with axon terminal, cleft and post-synaptic spine, down to cellular properties such as the membrane with its capacity and leak current, ion channels such as the Na channel, and receptor channels such as AMPA; together with molecular kinetics covering diffusion, uptake and degradation, geometry and meshing, and numerical methods such as ODE and PDE solvers)

4.3.1.2 Flexible Neural Modeling

Neural models are abstract representations of the characteristics and processes of part of the neural system. In principle, different levels of detail can be incorporated in neural models depending on the modeling purposes. However, due to the computational complexity of the neural processes, neural models are usually simplified abstractions of the neural components. Therefore, flexible neural modeling that meets various modeling purposes is needed to provide a better view of the neural systems while constraining computational complexity at the same time. The EONS object library contains a set of elementary neural objects, modeled with an object-oriented design methodology, that can be included in a neuron module as needed according to the neuron hierarchy.
For example, we can model a neuron at different levels of granularity. A neuron consists of an axon, a soma and dendrites, and each of the three components can be modeled in further detail depending on the user's requirements. For instance, a soma object can specify how the membrane potential is influenced by the integrated synaptic potentials without specifying how the spine objects generate them, while a variety of combinations of receptor channels can be included in a spine object to produce a particular synaptic potential.

Neurons in a neural network can thus be modeled at different levels of detail by including different elementary objects from the EONS library and incorporating appropriate details of the synaptic processes into the neural network model. Therefore, one can capture the essential principles of a complex system by incorporating different levels of detail in the neuron models and neuron connections.

When simulating a network of neurons, there are two extremes (see Figure 4.6 for paradigms of neural network simulation). At one end, every neuron in the network is identical; one only needs to change the synaptic weights in a convolution to produce the output layer of neurons. At the other end, every neuron in the network is different, with varying internal structure; a neuron's pre-synaptic mechanisms vary across axon terminals, and neurons are no longer homogeneous. For this case, no system that we know of has solved the problem. EONS is an effort at the heterogeneous end of the spectrum in Figure 4.6. Our final goal is to work out the problems of constructing networks of neurons with diversified structures.

Figure 4.6 Paradigms for Neural Network Simulation (a spectrum from heterogeneous neural networks, served by NEURON, EONS and GENESIS, to homogeneous neural networks, served by NSL)

To model a neural network, there are several aspects to consider: it is necessary to describe the neurons which comprise the network, the interconnections among the neurons, and the dynamics of the neurons and their interconnections (Weitzenfeld, Arbib et al. 1997). Most neural simulation systems, such as NSL, model large numbers of neurons, and the interconnections between two layers of neurons are realized in simulation by a convolution operation over weight matrices. NSL makes the assumption that neurons in the same layer have the same number of axon terminals; when learning is not involved, the synaptic weights remain the same for these neurons. In our system there is no longer a weight matrix; instead, it is replaced by the pre-synaptic mechanisms.

One critical issue for simulating neural systems at different levels of organization is managing the large number of synaptic modules in a model of any realistic size. A clear connectivity representation is needed to perform the pre-synaptic mechanism procedures in the implementation. Each neuron in EONS is modeled down to the detail of pre-synaptic terminals and post-synaptic spines. A natural way of making the connections between pre-terminals and post-synaptic spines is through links. Depending on the number of pre-terminals of the input neuron, we can construct a matrix representing the terminal number. Using this matrix as a moving window, for each pre-terminal of a neuron in the input layer, EONS will "grow" the synapses between neurons
according to the connectivity matrix. In this way, the connectivity relationships among different layers of neurons are built up. Figure 4.7 shows flexible neural network modeling with the EONS library, where the neurons in a neural network can have various levels of detail through the inclusion of synaptic, cellular or molecular level objects from the library according to the neuron hierarchy.

Figure 4.7 Flexible Neural Network Modeling with EONS Library (somas connected through dynamic synapses, with objects at selectable levels of detail)

4.3.1.3 Composition

To build a library of neural models, a primary concern is composition. In EONS, we consider composition at two levels. At the first level, we construct models from primitive neural components. Every component that we model is self-contained and can be seen as a model in its own right. At the second level, we need to compose small networks into one large network.

For instance, the Axon-terminal object linked to the Neuron object can be selected from a set of objects with varying degrees of detail: the axon terminal can be defined as a simple relay of the pre-synaptic action potential at one extreme; it can include calcium channels, calcium diffusion, a representation of vesicles, molecular processes, etc., at the other extreme; or it can be an object of intermediate complexity. Assume a neuron is composed of a set of attributes, an array of pre-terminals and an array of post-synaptic spines, and that a post-synaptic spine is composed of AMPA and/or NMDA receptors. Each of the units is constructed separately, and can be modified separately without affecting its connecting parts. Composition at this level is conducted according to biological neural structures.

When we compose layers of a network, the composition problem becomes more complicated. We need to construct new models by composing existing models which comprise layers of neurons. In doing this, connectivity between the layers of the existing model and those of the new model needs to be added, and the existing connectivity relationships are then adjusted to accommodate the changes. This involves some structural changes to the existing system. As the connectivity relationships are modified, the outputs of the existing models should also be adjustable so that they can be fed as inputs to the new models.

4.3.2 EONS Development

After the discussion of the EONS system design in the previous section, in this section I will present my work on developing some core functions of EONS. The discussion and some future work are presented in section 4.5.

4.3.2.1 EONS Object Library

I have implemented the Neural network, Neuron, Synapse, Pre-terminal, Post-synaptic spine, Membrane (including resistance, capacity, and leak current), AMPA and NMDA objects, and the numerical method ODE, in C++. Each of them is a component of the EONS library. The C++ code can be downloaded at http://www-hbp.usc.edu/people/~yingshu. The EONS object library at the current stage is a set of class definitions and implementations in C++ consisting of the above objects. This C++ code, representing EONS object classes and methods, can be reused by different neural models. The assembly of neural objects to form a neural model is still done manually.
An interesting piece of future work is to automate this process.

4.3.2.2 Neuron Modeling

EONS carries the object-oriented design methodology into neural modeling. Neural components are modeled using conventional object-oriented programming methods: each component is a class with links to other classes, which are the components, parents or children of that class. In EONS, the internal structure of a neuron is modeled by treating each component as an object, and a class hierarchy is used to represent the relationships among the components within a neuron. Figure 4.8 shows an example of neuron components.

For example, we can define a class called Neuron (see the following C++ code) which includes some soma properties, a set of pre-terminals and a set of post-synaptic spines. The axon is not modeled right now, but we can define a dummy class Axon which links to both Neuron and Pre_terminal. The dendrite is not modeled at the moment; it will be developed together with cable theory in EONS as future work. Therefore, at the current stage, the connection between two neurons reduces to the connection of a synapse, which in turn consists of a pre-terminal and a post-synaptic spine.

The internal state of a neuron in EONS can vary. Some neurons are described by a single scalar quantity, the membrane potential m, which depends on the neuron's inputs and its past history. For others, more detailed structures are included; for example, a synapse model or even a receptor model can be included. Synaptic weights are dynamically changeable and can be replaced by pre-synaptic procedures. The post-synaptic spine can contain AMPA and NMDA receptors, etc.

Modeling a Neuron with a soma, axon terminals, spines, etc., is done in EONS through conventional C++ programming: we define a Neuron class with links to other classes, namely the axon terminal and the spine, where the spine class in turn has links to the AMPA and NMDA receptor classes. The properties of the soma are already included in the neuron properties. Further components can be added to the Neuron class (and its subparts) by adding links to other classes.

Figure 4.8 A Hierarchy of Neuron Components

We model the set of pre-terminals (post-synaptic spines) as an array of pre-terminal objects (post-synaptic spine objects). The pre-terminal and the spine are modeled as two separate classes, and the AMPA and NMDA receptors are modeled as two separate classes as well.

Figure 4.9 illustrates neuron modeling, where neurons are connected through the axon, pre-terminal, post-synaptic spine, dendrite, etc.

Figure 4.9 Neuron Modeling (two neurons connected through the axon, a pre-terminal, a post-synaptic spine carrying AMPA and NMDA receptors, and the dendrite)

Since the axon and dendrite are not modeled at the moment, we can define the Neuron class as having two friend classes, Pre_terminal and Post_synaptic_spine, which specify that there are linked class objects belonging to these two friend classes.
class Neuron {
    friend class Pre_terminal;
    friend class Post_synaptic_spine;
private:
    float delta_t;
    float Vm_Rest;       // resting membrane potential (in mV)
    float Ap_Refrac;     // length of a refractory period (in ms)
    float N_Vm;          // membrane potential
    float N_th;          // threshold for firing an action potential
    float N_Ap;          // action potential
    float N_Ap_Refrac;   // neuron in refractory period
    int term_num, spine_num;
public:
    Pre_terminal *has_pre_terminal[term_num];    // links to pre-terminals
    Post_synaptic_spine *has_spine[spine_num];   // links to post-synaptic spines
    void derivs(float, float*, float*) const;
    void run_module_input_neuron(float);
};

The internal structure of the neuron can be varied by defining different attributes and different implementations for the functions run_module_input_neuron(float), derivs(float, float*, float*), etc. Pre_terminal and Post_synaptic_spine become two link fields in the Neuron class object, indicating the components that can be composed into a neuron:

Pre_terminal *has_pre_terminal[term_num];
Post_synaptic_spine *has_spine[spine_num];

Here term_num and spine_num represent the numbers of pre-terminals and post-synaptic spines linked to the neuron. The function void run_module_input_neuron(float) calculates the action potential for a Neuron; the function void derivs(float, float*, float*) const takes care of the differential equations involved; and so on. The model for Post_synaptic_spine can be any combination of AMPA and NMDA receptors, and/or can include more synaptic details. The same holds for the Pre_terminal model.

class Post_synaptic_spine {
    friend class AMPA;
    friend class NMDA;
private:
    int no_ampa, no_nmda;   // numbers of AMPA and NMDA receptors
    float Ia_sum;           // summed AMPA current
    float In_sum;           // summed NMDA current
    float V;                // spine membrane potential
public:
    AMPA *has_ampa[max];
    NMDA *has_nmda[max];
    Post_synaptic_spine(int no_a, int no_n, float v_init);
    void calculate_V();
};

The above Post_synaptic_spine class defines a spine with a combination of a set of AMPA and NMDA receptors. The following are the AMPA and NMDA receptor class definitions, which include pointers linking to the Post_synaptic_spine class object. AMPA and NMDA class objects are therefore linked indirectly to the Neuron class through the Post_synaptic_spine class object.

class AMPA {
    friend class Post_synaptic_spine;
private:
    float Ia;   // AMPA-mediated current
public:
    Post_synaptic_spine *The_Spine;
    void derivs(float, float*, float*);
    void rk4(float *, float *, int, float, float, float *);
    void calculate_d();
};

class NMDA {
    friend class Post_synaptic_spine;
private:
    float In;   // NMDA-mediated current
public:
    Post_synaptic_spine *The_Spine;
    NMDA(Post_synaptic_spine*);
    void derivs(float, float*, float*);
    void rk4(float *, float *, int, float, float, float *);
    void calculate_d();
};

Each of the above class definitions is self-contained and can be modified, extended and reused by other models. More neural components, from the cellular/molecular level to the synaptic level, such as a GABA receptor or Ca diffusion, can be added to compose a neuron model. The pre-terminal of one neuron can be connected to the post-synaptic spine of another neuron by means of a link between them. For example, there is an array of pre-terminals in one Neuron and an array of post-synaptic spines in another Neuron.
One entry in the array of pre-terminal objects can have a pointer to one entry in the array of spine objects in another Neuron, and vice versa. The systematic set-up of connections between neurons (between the pre-terminal of one neuron and the spine of another) in the neural network is done through bidirectional links (Figure 4.11). Future work will use a graphical tool to specify the connections between models and generate the corresponding code. Each pre-terminal can be associated with different pre-synaptic procedures, which makes the connections into various functions instead of numbers representing weights between neurons. By linking neurons through pre-terminals and post-synaptic spines, we can connect a large network of neurons with detailed synaptic mechanisms. In this way, the representational power of the neural network greatly increases.

4.3.2.3 Systematic Set-up of Neuron Connections

A large network of neurons incorporating various mechanisms at the molecular and cellular levels can be constructed by linking neurons through axon terminals and post-synaptic spines. This, however, raises a critical issue for simulating neural systems at different levels of organization: the large number of synaptic modules in a network model of any realistic size. For example, connecting two layers of 20 x 20 neurons with a 5 x 5 connection matrix requires a total of 10,000 synapses, and each synapse may comprise multiple molecular objects (e.g., vesicles, receptors, release sites, etc.) which need not be uniformly distributed across synapses. A systematic indexing scheme has been developed to handle this complexity: given a neural network composed of layers of neurons and the connectivity matrix for each layer, EONS will "grow" the synapses between neurons by automatically generating bi-directional synaptic pointers between pre-synaptic terminals and post-synaptic spines according to the connectivity matrix. For example, each entry in the terminal array of each neuron in the first layer will have a pointer to one entry (in our case) in the spine array of a neuron in the second layer, and that entry in the spine array of the second-layer neuron will have a pointer back to the corresponding entry in the pre-terminal array of the first-layer neuron (Figure 4.10). As in many other applications, pointers are used for implementation reasons: they are easy to implement and provide concise, clear semantics for the representation. Moreover, EONS also contains objects representing the network, layers, neurons, axon terminals and spines (as components of the EONS object library). These objects, together with the synaptic indexes, form a structural modeling method for building neural models in which various neural processes can be incorporated at each level in a flexible manner.

In our approach, the connection between two neurons is no longer a simple value representing the synaptic weight. Rather, pre-synaptic mechanisms such as facilitation, feedback modulation, and augmentation are encoded, and each terminal has its own synaptic mechanism functions. The pre-terminals of one neuron are linked to the spines of a set of neurons.
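The core of one such bidirectional link can be sketched in a few lines of C++ (a hypothetical helper for illustration only, with access control ignored for brevity; the actual set-up code, Set_up_net, is given in Figure 4.11 below):

    // Hypothetical helper: tie one pre-synaptic terminal to one post-synaptic spine.
    void link_synapse(Pre_terminal *pre, Post_synaptic_spine *spine) {
        pre->a_spine = spine;      // terminal points forward to the spine it innervates
        spine->a_terminal = pre;   // spine points back to its pre-synaptic terminal
    }

Repeating this for every (terminal, spine) pair prescribed by the connectivity matrix yields the bidirectional pointer structure described above.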
The following is the neuron object, with an array of pre-terminals and an array of post-synaptic spines, that supports the connectivity mechanism.

class Neuron {
    friend class Pre_terminal;
    friend class Post_synaptic_spine;
private:
    float Ap_Refrac;     // length of a refractory period (in ms)
    float N_Vm;          // membrane potential
    float N_th;          // threshold for firing an action potential
    float N_Ap;          // action potential
    float N_Ap_Refrac;   // neuron in refractory period
    int term_num, spine_num;
public:
    Pre_terminal *has_pre_terminal[term_num];
    Post_synaptic_spine *has_spine[spine_num];
    void derivs(float, float*, float*) const;
    void run_module_neuron(float);
};

The high-level notation for the systematic set-up of neuron connections in a neural network is shown in Figure 4.10, and the C++ code for systematically setting up the neural connections, growing the synapses between neurons in different layers, is shown in Figure 4.11. I have taken a neural network with three layers as an example: neuro0*, neuro1* and neuro2*, having 10, 4 and 2 neurons respectively. The connections between neurons in different layers are realized by linking the pre-synaptic terminals of each neuron in one layer to the post-synaptic spines of all neurons in the other layer, and vice versa. When implementing the connectivity, we make the following assumption: the number of neurons in the second layer determines the number of pre-terminals of a neuron in the first layer, and the number of neurons in the first layer determines the number of post-synaptic spines of a neuron in the second layer. Therefore, by specifying the number of neurons in a neural network, the system can automatically set up the connections between neurons (between the pre-terminal of one neuron and the spine of another) by setting up bi-directional links among the neurons in different layers. Each neuron in the first layer will have a pointer to the neurons in the second layer through the linkage of pre-terminals and post-synaptic spines. There is an inhibition neuron in the second layer which is linked to the pre-terminals of the neurons in the first layer to provide feedback modulation. The same applies to the connections between the second and third layers. Consequently, altering synaptic weights becomes tuning the parameters of the pre-synaptic functions, which provides more representational power at the synaptic level and captures the neural network dynamics.

Figure 4.10 Systematic Set-up of Neuron Connections in a Neural Network (layers neuro0, neuro1 and neuro2, with inhibition neurons providing feedback)

void Set_up_net(Neuron *neuro0, Neuron *neuro1, Neuron *neuro2,
                int neuronNum_lay0, int neuronNum_lay1, int neuronNum_lay2)
{
    int i, j;
    // Connect layer 0 to layer 1; index neuronNum_lay1 is the inhibition neuron of layer 1.
    for (i = 0; i < neuronNum_lay0; i++) {
        for (j = 0; j < neuronNum_lay1 + 1; j++) {
            neuro0[i].has_pre_terminal[j]->inh_neuron = &neuro1[neuronNum_lay1];
            neuro0[i].has_pre_terminal[j]->a_spine = neuro1[j].has_spine[i];
            neuro1[j].has_spine[i]->a_terminal = neuro0[i].has_pre_terminal[j];
        }
    }
    // Connect layer 1 to layer 2; index neuronNum_lay2 is the inhibition neuron of layer 2.
    for (i = 0; i < neuronNum_lay1; i++) {
        for (j = 0; j < neuronNum_lay2 + 1; j++) {
            neuro1[i].has_pre_terminal[j]->inh_neuron = &neuro2[neuronNum_lay2];
            neuro1[i].has_pre_terminal[j]->a_spine = neuro2[j].has_spine[i];
            neuro2[j].has_spine[i]->a_terminal = neuro1[i].has_pre_terminal[j];
        }
    }
}
Figure 4.11 C++ Code for Setting up Neuron Connections in a 3-layer Neural Network

In Figure 4.11, the first for loop connects the pre-terminals of the neurons in the first layer to the post-synaptic spines of the neurons in the second layer, and vice versa. The last neuron in the second layer is an inhibition neuron and is linked to all the pre-terminals of the first layer. The same is done for the connections between the second and third layers (the second for loop). The above mechanism for systematically setting up neuron connections provides a general method for building the connections of a neural network: once the number of neurons in each layer and the neuron structure with pre-terminals and post-synaptic spines are given, a single call such as Set_up_net(neuro0, neuro1, neuro2, 10, 4, 2) sets up the connections automatically.

4.3.2.4 Synaptic Modeling

There is a great variety of synapses, differing from each other in mechanism, time constant, tendency to change strength with activity, long- and short-term effects of activation, and many other important properties. A synaptic junction has two sides: the input side, receiving an action potential from the driving cell, is referred to as pre-synaptic; the driven cell is the post-synaptic side. The synapse is an important component of a neuron model; however, it is not modeled, or is only crudely modeled, in many simulation tools. One of our tasks is to model the synapse at different granularities to reveal the principles of synaptic dynamics. To this end, we have used object-oriented techniques to model the synapse with different combinations of neural components and processes. Without loss of generality, we model a synapse as having zero or more AMPA receptors and zero or more NMDA receptors. Different numbers of AMPA and NMDA receptors, possibly with different properties, are allowed to be part of it, demonstrating flexible modeling at various granularities. More components, such as a GABA receptor or molecular properties, can easily be included through our method. In this way, more complex models can be built from EONS elementary objects. By varying the combinations of AMPA and NMDA receptors, we can produce different versions of the synapse model to match experimental results and show the fitness of the models; details are presented in chapter 6 of this thesis.

The actual modeling of the synapse with different combinations of receptors can be done by taking the portion of the Neuron model at the synapse level, with AMPA and NMDA receptors. The C++ code for the synapse with AMPA and NMDA receptors was shown in section 4.3.2.2; I show some of the class definitions and methods in C++ below. The purpose of the simulation is to show that the EPSP (Excitatory Post-Synaptic Potential) generated by the post-synaptic spine is the summed effect of the individual components of the synapse, in our case the AMPA receptors and NMDA receptors.

class Spine {
    friend class AMPA;
    friend class NMDA;
public:
    AMPA *has_ampa[max];
    NMDA *has_nmda[max];
    void calculate_V();
};

The spine consists of a set of AMPA and NMDA receptors. The number of AMPA and NMDA receptors in a spine is specified by the number of entries in the AMPA and NMDA arrays.
There can be zero or more receptors in each array. The following are the AMPA and NMDA class definitions and a portion of the function implementations. derivs(...) and rk4(...) are differential equation functions, and void calculate_d() calculates the EPSP. The AMPA and NMDA receptors are linked to the Spine class through link fields.

class AMPA {
    friend class Spine;
public:
    Spine *The_Spine;
    void derivs(float, float*, float*);
    void rk4(float *, float *, int, float, float, float *);
    void calculate_d();
};

class NMDA {
    friend class Spine;
public:
    Spine *The_Spine;
    void derivs(float, float*, float*);
    void rk4(float *, float *, int, float, float, float *);
    void calculate_d();
};

void AMPA::calculate_d() {
    int n;
    derivs(0.0, y, dydx);
    rk4(y, dydx, n, 0.0, delta_t, y);
    // current = (number of synapses) x (single-channel conductance)
    //           x (open fraction y[2]) x (driving force)
    Ia = number_of_synapses * single_G_AMPA * y[2] * (0.0 - The_Spine->V); ...
}

void NMDA::calculate_d() {
    int n;
    Mg = 0.0012;
    derivs(0.0, y, dydx);
    rk4(y, dydx, n, 0.0, delta_t, y);
    rate_k = 0.0088 * exp(The_Spine->V / 12.5);   // voltage-dependent relief of the Mg block
    GN_ub = (rate_k / (rate_k + Mg)) * y[2];
    In = number_of_synapses * single_G_NMDA * GN_ub * (0.0 - The_Spine->V); ...
}

The simulation results are shown in Figure 4.12. The top portion of Figure 4.12 shows the simulated EPSP (Excitatory Post-Synaptic Potential) of an AMPA receptor; the middle portion shows the simulated EPSP of an NMDA receptor; and the bottom portion shows the simulated EPSP of a spine with one AMPA receptor and one NMDA receptor. Notice that the scale of the EPSP of the NMDA receptor is much smaller. Figure 4.13 shows these three simulation results in one figure. As we can see, the EPSP generated by the NMDA receptor is very small when the curves are plotted together (it lies close to the x-axis). The EPSP generated by the spine is mostly contributed by the AMPA receptor, which is why the EPSPs of the AMPA receptor and of the spine are very close. The figure shows that the EPSP generated by the spine is the sum of the EPSPs generated by the AMPA receptor and the NMDA receptor.

With our modeling method, the numbers of AMPA and NMDA receptors can be easily changed, and the output can be easily viewed, varied, and compared with experimental recordings. This shows an advantage of EONS modeling: by modeling neural systems at different levels of granularity, it can serve as a bridge linking modeling and simulation studies with experimental neuroscience.

Figure 4.12 The Simulation Results (EPSPs) of an AMPA Receptor (top), an NMDA Receptor (middle) and a Spine with One AMPA and One NMDA Receptor (bottom)

Figure 4.13 The simulation results of Figure 4.12 plotted in one picture, where the lowest curve is the EPSP of the NMDA receptor, the middle curve is the EPSP of the AMPA receptor, and the top curve is the summed EPSP from the AMPA and NMDA receptors

4.3.2.5 Neural Network Modeling with Synaptic Dynamics

The computational capability of the brain is the result of complex interactions among a large number of underlying neurobiological mechanisms.
Among them, the synaptic mechanisms play an important role in neural information processing, because neural information transmission takes place between neurons through synapses. However, synaptic dynamics is absent from most neural network models. To extract the essential principles of synaptic dynamics in neural information processing, we incorporate synaptic dynamics in a network and analyze the behavior of the neural network. In conventional neural networks the connection between neurons is a synaptic weight, and synaptic dynamics is not easily incorporated. With the EONS modeling method, however, we can model a synapse with detailed neurobiological processes, and the connections between neurons become pre-synaptic functions.

A concise representation of synaptic dynamics is the dynamic synapse (Liaw and Berger 1996). The main idea behind the dynamic synapse is that the synapse has a pre-synaptic site that determines the probability of neurotransmitter release, and a post-synaptic site that scales the amplitude of the excitatory post-synaptic potential (EPSP) in response to neurotransmitter release. The probability of release is determined by the temporal pattern of the action potential Ap, the first and second components of facilitation F1 and F2, and the feedback modulation Mod.

Implementing a dynamic synapse in EONS involves defining a pre-synaptic object and a post-synaptic object together with a neuron object. Synaptic functions need to be defined to capture the non-linear nature of synaptic transmission. For example, a pre-terminal class is defined, and a pre-synaptic function in the pre-terminal is constructed that includes the release probability function determined by the temporal pattern of the action potential Ap, the first and second components of facilitation F1 and F2, and the feedback modulation Mod: Pr(R) = Ap + F1 + F2 + Mod.

The following is a portion of the class definition and a method implemented in C++ for the main functionality of the dynamic synapse. The pre-terminal is linked to a neuron, and is connected to a post-synaptic spine through the synaptic dynamics represented by the pre-synaptic function Pr(R). void derivs(...) and void rk4(...) are differential equation functions; run_module_bouton() implements the main idea of the dynamic synapse. By using the EONS modeling methodology, detailed synaptic dynamics is easily incorporated.

class Pre_terminal {
    friend class Neuron;
    friend class Spine;
private:
    float Release_th;        // threshold for release
    float Release_th_step;   // step for the release threshold
    float Pr_R_Rest;         // resting Pr(R)
    Neuron *a_neuron;        // the neuron this terminal belongs to
    Spine *a_spine;          // the post-synaptic spine it innervates
    Neuron *inh_neuron;      // inhibitory neuron providing feedback modulation
public:
    Pre_terminal(Neuron*);
    void run_module_bouton();
    void derivs(float, float *, float *);
    void rk4(float *, float *, int, float, float, float *);
};

void Pre_terminal::run_module_bouton() {
    int n;
    temp_Pr_Ap = Pr_Ap_k * this->a_neuron->N_Ap + Pr_R_Rest;
    temp_Pr_Mod = Mod_k * this->inh_neuron->N_Ap;       // feedback modulation
    tmp = F1_decay_rate * delta_t;
    F1 = F1 + F1_k * this->a_neuron->N_Ap - tmp * F1;   // first facilitation component
    temp_F1_neg = F1_neg_k * F1;
    temp_F2 = F2_k * this->a_neuron->N_Ap;              // second facilitation component
    temp_F2_neg = F2_neg_k * F2;
    derivs(0.0, y, dy);
    rk4(y, dy, n, 0.0, delta_t, y);
    Pr_R = Pr_Ap + F1 + F2 + Mod;                       // Pr(R) = Ap + F1 + F2 + Mod
    if ((Pr_R >= Release_th) && (Pre_NT_available >= NT_Quanta)) {
        Pre_NT = Pre_NT + NT_Quanta;                    // release one quantum
        Pre_NT_available = Pre_NT_available - NT_Quanta;
    }
    Pre_NT = Pre_NT * (1.0 - delta_t * NT_Diff_Rate);   // diffusion of released transmitter
    Pre_NT_available += delta_t * NT_Recharge * (NT_Init - Pre_NT_available);  // replenishment
}

We choose a dynamic synapse over a simple synaptic weight when we want to analyze a neural network with detailed synaptic dynamics incorporated. A neuron can be an integrate-and-fire model, which was derived by making a number of approximations to the Hodgkin-Huxley equations describing the behavior of the squid giant axon; or a neuron can be defined with more components, such as pre-synaptic and post-synaptic objects, to meet different modeling purposes. This raises the question of how the type restrictions work; in other words, how the choice between different models, say synaptic models, is interchanged in modeling. The answer lies in the object-oriented approach taken by EONS. Since each model is a stand-alone unit in EONS, its functions are encapsulated in the object implementation, and the connections between models are handled by the interface functions of the object model. Switching between different models is therefore taken care of by the object-oriented programming method.

Whether the dynamic synapse model is good enough has no definite answer yet; more research needs to be done. However, the dynamic synapse does provide more representational power in a neural network. To show its advantage, I have implemented a neural network with two neurons to illustrate its concepts. Following the EONS modeling method, the neuron is modeled with a pre-synaptic site that determines the probability that a quantum of neurotransmitter is released, and a post-synaptic site that scales the amplitude of the excitatory post-synaptic potential (EPSP) in response to a neurotransmitter release event. The model is organized into two layers, an input layer and an output layer, with one neuron in each layer. The output neuron is also an inhibitory neuron to the input neuron: there is a feedback connection from the output neuron to each pre-synaptic terminal of the input neuron. The input neuron is connected to the output neuron through four pre-terminals with dynamic synapses, and it receives unprocessed, noisy raw speech waveforms as inputs. The systematic set-up of neuron connections from section 4.3.2.3 is used.

Figure 4.14 shows a picture of this model and simulation results of the temporal patterns in the pre-synaptic terminals. The bottom of the figure shows the variations of neurotransmitter release under different pre-synaptic mechanisms for a given input. The x-axis is time in ms, and the y-axis is the amplitude of the input and the amount of neurotransmitter released.
The first line is the input data. The second line is the neurotransmitter release under a certain pre-synaptic mechanism; we keep the second line as a standard against which to analyze the charts. The third line is the neurotransmitter release when the facilitation component F1 is 1.25 times that of the standard: as F1 increases, neurotransmitter release also increases, which is what we see on the third line. When we increase facilitation further on line four, we see an increase in neurotransmitter release as well. On line five, when we increase the feedback modulation, which is inhibitory, we see a decrease in the neurotransmitter release. For the same spike train from the pre-synaptic neuron, by tuning the magnitudes of the pre-synaptic mechanisms we can obtain different neurotransmitter releases in the axon terminals. We can infer that for different spike patterns in the pre-synaptic neuron, with the same magnitudes of the pre-synaptic mechanisms, the neurotransmitter releases of the four axon terminals will change as well.

Figure 4.14 Variations in Synaptic Mechanisms and Pattern Transformation. Pr(R) = Ap + F1 + F2 + Mod, where Pr(R) is the probability of neurotransmitter release, Ap the current action potential, F1 and F2 the first and second components of facilitation, and Mod the feedback modulation. (The panels show the input and the neurotransmitter releases over time in ms as the magnitudes of Ap, F1, F2 and Mod are varied.)

The results show that a neural network incorporating synaptic dynamics can capture the invariant features of temporal patterns, and that the pre-synaptic mechanisms can affect the probability of neurotransmitter release. By tuning the pre-synaptic parameters, the spatio-temporal patterns of release, which encode the information that the neuron transfers, can vary. As the number of synapses grows, the coding capacity of the network increases, giving a neural network enormous representational and computing power for processing noisy signals. The EONS modeling method makes it easy to incorporate detailed synaptic dynamics in a model and provides advantages for implementing such dynamics in a neural network.

4.4 Protocol-based Simulation with EONS

Having developed the basic modeling framework of EONS in the previous sections, we take one step further to capture the experimental procedures used in neurobiological experiments (Shu, Liaw et al. 1997). A tight coupling with experimental studies is critically important for the success of a computational model, particularly because experimental and computational approaches can complement each other. For example, it is difficult, if not impossible, to physically translocate certain molecules in a biological neuron with current experimental methods; however, this can easily be achieved in a computational model with an appropriate spatial representation. It is, on the other hand, very difficult to solve the diffusion equation over a complex three-dimensional space, a process which can readily be manipulated experimentally. To increase the synergy between empirical and computational neuroscience, a communication language for these two disciplines is needed.
One of the fundamental characteristics of biological experiments is the testing of certain hypotheses. The experimental procedures for testing a hypothesis can be fully characterized by a set of protocols, which specify the conditions of the experiment and the manipulations of system parameters. Correspondingly, we have developed a protocol-based simulation scheme to facilitate the communication between these two disciplines. We will illustrate this simulation scheme using a case study, namely the formulation and testing of a specific hypothesis: that receptor channel aggregation on the postsynaptic membrane is the cellular mechanism underlying the phenomenon of long-term potentiation. The experiment is described in (Xie, Liaw et al. 1997).

In the protocol-based scheme, the information of a simulation is organized into two levels: a meta level, which spells out the hypothesis being tested, and a procedural level, which contains the initial conditions and the manipulations of model parameters. To facilitate the representation in a computer simulation program, a simulation protocol is specified by a set of system parameters. This representation goes beyond parameter/attribute translation for experimental protocols. For example:

■ Simulation conditions can be represented as a list of pairs (Pj, Vj), where parameter Pj in a simulation has the value Vj and remains constant throughout the simulation, with 1 ≤ j ≤ N; N is the total number of parameters in a simulation.

■ Simulation manipulations can be represented as a list of tuples (Pi, Vi, (Tm, Tn)), where 1 ≤ i ≤ N and 1 ≤ m, n ≤ K; N is the total number of parameters in a simulation and K is a given number of time points. A tuple (Pi, Vi, (T1, T2)) specifies that parameter Pi in a simulation is changed to Vi and keeps that value during (T1, T2).

■ If a manipulation is event-driven, it can be represented as a list of tuples (Pi, Vi, Am), where 1 ≤ i ≤ N and 1 ≤ m ≤ K. A tuple (Pi, Vi, A1) specifies that parameter Pi in a simulation is changed to Vi after (during) event A1.

We illustrate the protocol-based simulation scheme with the following example of using a synaptic model to study the molecular/cellular mechanisms underlying long-term potentiation (LTP).

4.4.1 Simulation Protocols for Testing Alternative Hypotheses

Long-term potentiation (LTP) is a widely studied form of use-dependent synaptic plasticity expressed robustly by glutamatergic synapses of the hippocampus. Although there is a convergence of evidence concerning the cellular/molecular mechanisms mediating the induction of NMDA receptor-dependent LTP, there remains substantial debate about the underlying molecular/cellular mechanisms for the expression of LTP. Here we illustrate three hypotheses regarding possible mechanisms and the corresponding simulation protocols for testing them. The key data set for testing the alternative hypotheses is provided by the experiments described in (Xie, Liaw et al. 1997). The main finding of the experiment is that during the initial phase of LTP, AMPA receptor-mediated responses evoked by presynaptic stimulation are increased; however, AMPA responses to focal application of agonist are not affected. The following simulation protocols for testing alternative hypotheses are based on a simple kinetic scheme:
A + B ⇌ AB ⇌ AB*

where A represents the concentration of neurotransmitter at the position where the receptor B is located on the postsynaptic membrane, AB represents the probability that B is bound to the neurotransmitter, and AB* is the probability that B is bound and open. K1, K−1, K2, and K−2 are the binding, unbinding, opening and closing rates, respectively (K1 and K−1 governing the first step, K2 and K−2 the second). The amplitude of the excitatory postsynaptic potential (EPSP) is assumed to be proportional to AB*. The following simulation protocols demonstrate how the above synaptic model can be used to test alternative hypotheses regarding the mechanisms underlying LTP (Shu, Xie et al. 1999).

Hypothesis 1: A change in the distribution of the receptor (B) is the mechanism underlying synaptic plasticity. The key component of the protocol is the variation of the location of the receptor channel, B, from trial to trial:

Trial #1 (Loc-of-B, 0 nm) from the release site
Trial #2 (Loc-of-B, 20 nm)
Trial #3 (Loc-of-B, 40 nm)

Hypothesis 2: An increase in neurotransmitter release is the underlying mechanism. The corresponding simulation protocol is similar to that for testing hypothesis 1, except that the concentration of agonist, A, is varied across trials:

Trial #1 (A, 0, (T1, T2)), (A, 100, (T2, T3)), (A, 0, (T3, T4))
Trial #2 (A, 0, (T1, T2)), (A, 200, (T2, T3)), (A, 0, (T3, T4))
Trial #3 (A, 0, (T1, T2)), (A, 400, (T2, T3)), (A, 0, (T3, T4))

Hypothesis 3: A change in the binding rate constant (K1) is the basis for synaptic plasticity. The corresponding simulation protocol is similar to that for testing hypothesis 1, except that the binding rate constant, K1, is varied across trials:

Trial #1 (K1, 0.1)
Trial #2 (K1, 0.2)
Trial #3 (K1, 0.4)

Thus, by constructing a model that includes an appropriate representation of the synapse, and a protocol for mapping between experimental data and model simulation, one can formulate multiple biologically based hypotheses and identify the one that is most consistent with the experimental observations. Currently, the EONS library consists of a set of objects such as: neural network, neuron, synapse, axon terminal, post-synaptic spine, receptor channels (e.g., AMPA and NMDA receptor channels), and various voltage-gated ion channels. These objects can be used to construct modules, together with appropriate protocols, to verify simulation results against experimental results.
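As a minimal illustration of how such protocol tuples might be encoded in a simulation program (the struct and the time values below are hypothetical, not part of the EONS library):

    #include <string>
    #include <vector>

    // One (P, V, (T_start, T_end)) manipulation tuple, following the notation above.
    struct Manipulation {
        std::string param;      // parameter name P, e.g. "A" (agonist concentration)
        double value;           // value V the parameter is set to
        double t_start, t_end;  // interval during which the value holds
    };

    int main() {
        // Illustrative time points (hypothetical values).
        const double T1 = 0.0, T2 = 50.0, T3 = 100.0, T4 = 150.0;

        // Trial #2 of Hypothesis 2: (A, 0, (T1,T2)), (A, 200, (T2,T3)), (A, 0, (T3,T4)).
        std::vector<Manipulation> trial2 = {
            {"A", 0.0,   T1, T2},
            {"A", 200.0, T2, T3},
            {"A", 0.0,   T3, T4},
        };
        // A simulator would consult these tuples at each time step to set parameter A.
        return 0;
    }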
4.5 Discussion

In this chapter, we have described the system design of EONS, the development of the EONS system and of the EONS object library of neural models, multi-level modeling and simulation of neural systems with EONS, and protocol-based simulation with EONS. The library of objects provides a basis for modeling neural objects with varying degrees of complexity, depending on the objective of a specific modeling study. The hierarchical scheme of EONS provides a general framework for representing structural relationships and functional interactions among different levels of neural organization.

Among currently available neural simulation systems, some specialize in the sophisticated modeling of a few single neurons, such as NEURON and GENESIS; others are concerned with modeling large networks of neurons, supported by simulation systems such as NSL (Weitzenfeld, Arbib et al. 1997). EONS models are constructed from sub-modules, and different levels of analysis can be included. By simulating models that incorporate appropriate details of neural processes, one can capture the essential principles of complex interactions in neural systems. Non-linear interactions of multiple processes at a given level or across different levels of organization, and spatio-temporal relationships, can be included. To exhibit the capability of multi-level modeling, we have developed models at different levels using EONS objects, from synaptic models to neural network models, and have shown the advantages of this module-based approach.

Apart from developing EONS as a simulation system for modeling nervous systems, another primary goal of our work is to develop a library composed of elementary neural objects which can be used by other simulation tools. More work needs to be done in this respect to make the library easy to use.

For computational models to be useful both as an interactive research and teaching tool and as a means for technology transfer, it is important to provide a closed-loop interaction between computational and experimental studies. This requires a well-structured modeling framework that parallels the biological nervous system, and a mechanism for seamless mapping between neurobiological system parameters and model parameters, allowing a direct comparison and interpretation of experimental manipulations and model simulations. To provide a link with experimental studies, we have developed a mechanism for protocol-based simulation (Shu, Liaw et al. 1997) (Shu, Xie et al. 1999) in which information about a model is organized in the same way as the specification of an experiment. The protocol-based simulation provides a well-structured specification scheme for integrating modeling and experimental studies by making simulation protocols similar to experimental protocols.

Overall, the goal of the EONS effort is to facilitate the development of models that are biologically well constrained and can thus be used as research tools for testing specific hypotheses, incorporating detailed neural processes from the cellular or molecular levels up to the synaptic and network levels.

There is some interesting future work associated with EONS modeling and development. For example, the dendrite is not modeled at the moment; it can be developed together with cable theory in EONS as future work. With regard to developing EONS into a better multi-level modeling environment, the following (non-exhaustive) issues need to be considered:

• Build a user interface and interpreter so that an EONS model and its connections can be specified graphically.
• Develop a utility for object management so that object assembly can be done automatically.
• Develop a utility for object manipulation so that objects can be retrieved, inserted and modified automatically.
Chapter 5: Data Warehouse and Data Mining for Experimental Databases

5.1 Introduction

Experimentalists have accumulated a large amount of experimental data through the years. How to organize these data so that they can be efficiently used for data analysis and data retrieval is a challenging task. Moreover, new knowledge can be gained by combining relationships among fragmented data sets, showing a need for data mining on the experimental data stored in experimental databases. Finding hidden knowledge among the experimental data is also challenging. Through the efforts of the USC Brain Project, I have designed a data warehouse schema for storing neurophysiological data, and have developed a flexible similarity matching mechanism for extracting various features of time series data (e.g. electrophysiological recordings). These methods provide a flexible way for users to define database queries to search through large data sets in order to identify similar recordings obtained under various experimental conditions and manipulations. Such a capability can facilitate the identification of hidden patterns or trends that would emerge only when a large number of data sets are analyzed together.

The methods for collecting and storing neuroscience experimental data vary. Different neuroscience research areas have their own ways of conducting experiments and storing experimental recordings, and there is no standard format for book-keeping the data, showing a need for a systematic data storage mechanism. Besides, neuroscience research involves a large amount of data recordings. These data need to be stored together with the experimental manipulations and conditions so that they can be analyzed in a more consistent manner. A data warehouse provides many characteristics that can be applied to meet the requirements of our applications. A multi-dimensional modeling method is proposed for organizing the experimental protocols and recordings in the databases.

There are many different kinds of experimental data recordings. The data I am working on are neurophysiological time series EPSP recordings without spikes, because most of the time when neuroscientists analyze EPSP recordings, they do not include recordings with spikes. I addressed one question neuroscientists may ask: find those EPSP recordings that closely match the given one in the rising phase, but only roughly match the given one in the decay course, or vice versa. It is by no means the only question neuroscientists will ask, but it represents a range of analyses for this domain. For other questions, there may be other methods. For example, a sequence of spikes with no obvious rising and decay courses is not the kind of data addressed by my method; other data mining methods for finding trends can be used for analyzing such spikes. My intention here is to explore the possibility of solving some neuroscience questions by applying data mining methods from computer science.

Time series data pose some special challenges for today's knowledge discovery and data mining techniques. Being able to measure similarity is an important aspect of data mining in time series. Most previous studies of time series mining focus on finding an accurately matched whole sequence.
However, it is often useful to consider finding a sequence with some segments accurately matched and the other segments less accurately matched in time-related data sets. To this end, I have proposed an approach, termed FA-matching (Flexible Accuracy matching), to flexibly measure the similarity of two sequences with different degrees of accuracy in the subsequences, motivated by our application of mining neuroscience time series. This kind of blurred matching can be applied to other application domains as well.

In this chapter I describe the work on designing a data warehouse for experimental databases in section 5.2. In section 5.3, I show a definition of the FA-matching model; an approximate, faster algorithm for implementing the model; an index structure (Sh-index) for neuroscience time series; and some preliminary experimental results on similarity matching of real experimental EPSP recordings without spikes. We also show that queries based on our FA-matching model can be generated for other application areas, e.g. the stock market. Based on this work, a framework for data mining on neuroscience databases is introduced. A prototype of NeuroMiner is developed for mining neuroscience time series data, which includes the prototype function for flexibly matching similar neuroscience time series data in the database. The preliminary version of this data miner for neuroscience data, NeuroMiner, is given in section 5.4. The work provides the foundation for further development of the data mining mechanism, and facilitates the integration of the modeling system with the experimental databases.

5.2 A Data Warehouse for Experimental Databases

A data warehouse is often used for a large amount of data that needs frequent analysis and decision making. There are experimental conditions, experimental manipulations and experimental recordings in our experimental databases. We use multi-dimensional modeling to decompose the data into a fact table and some dimension tables. For example, data entities such as experimental conditions, electrical manipulations, and pharmacological manipulations may all represent different dimensions, while some experimental recording computations may represent measurements to analyze. Viewing actual experimental recordings across the different dimensions can then be done in a very timely, very powerful fashion. The power of the data is given to the users who need it most, without the need for users to wait for complex reports to be generated.

Queries are designed to exploit this by using the dimension tables for counts, aggregation paths and searches for properties of the elements. Many queries can be resolved without even touching the fact table. When "facts" are needed, the method is to gather the key values from the dimension tables and then pull the matching records from the fact table, avoiding costly and time-consuming table scans and complex joins.

Figure 5.1 contains a data warehouse schema I designed for the neurophysiological EPSP data of Dr. Berger's lab in USCBP. In the figure, there is a fact table, Experiment. This fact table contains the dimension keys, which correspond to the keys of the different dimensions:
Cond_key (representing the experimental condition dimension), Pharm_key (representing the pharmacological manipulation dimension), Elec_key (representing the electrical manipulation dimension), Record_key (representing the experimental recordings dimension), and Time_key (representing the time dimension); avg-amp represents an aggregation value (the average experimental recording amplitude), etc. The electrical manipulation, the pharmacological manipulation, the experimental condition, the experimental recordings and time are dimensions linked to the fact table through Elec_key, Pharm_key, Cond_key, etc. The data shown is central in importance as it:

• reflects the most detailed manipulations and recordings, which are usually the most interesting;
• is voluminous, as it is stored at the lowest level of granularity;
• is (almost) always stored on disk storage, which is fast to access but expensive and complex to manage.

Figure 5.1 A Multi-dimensional Modeling Schema for Neurophysiological EPSP Data:
Experiment (fact table): Fact-ID, Cond_key, Elec_key, Pharm_key, Record_key, Time_key, avg-amp, avg-amp-area
Pharm dimension: Pharm_key, Receptor, In-bath, Focal, In-cell
Cond dimension: Cond_key, Species, Temp., Bath, Slice
Elec dimension: Elec_key, Type, Intensity, Frequency, Pattern
Time dimension: Time_key, Year, Month, Day, Hour, Minute, Second
Record dimension: Record_key, Amp., Slope, Rising_const, Decay_const, Rec_pattern

This structure can also be used for a database schema, although most databases use relational schemas rather than multi-dimensional schemas. My contribution in designing this structure is that this way of organizing neurophysiological EPSP data gives a clean structure to the experimental data and makes it easy to analyze experimental recordings under different manipulations and conditions. Moreover, this multi-dimensional structure, with its better organization of the experimental data, will facilitate our future work on cross data mining of experimental protocols and experimental recordings.

The data warehouse schema in Figure 5.1 is an example of a data warehouse designed for the neurophysiological EPSP data of USCBP. Only the most sophisticated organizations will be able to put together such an architecture the first time out. What the data warehouse schema provides, then, is a kind of roadmap to design toward. Coupled with an understanding of the options and of the application domain at hand, the data warehouse schema provides a useful way of determining whether the organization is moving toward a reasonable data warehousing framework. In our USCBP application, experimental databases may be dispersed among subsidiary labs, each potentially recording massive numbers of experiments, and the data warehouse schemas designed for different labs may vary. A complete data warehouse schema for USCBP will only be possible when the experimental databases in USCBP become available. The key idea is to make the critical information available to those who need it for further analytical processing and decision making, and to facilitate data mining.
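The dimension-first query strategy described above can be illustrated with a small Java sketch. The in-memory maps below stand in for the Cond dimension table and the Experiment fact table of Figure 5.1; the species value and all numbers are made-up examples, and none of this is actual USCBP code.

import java.util.*;

// A row of the Experiment fact table (only two keys kept for brevity).
class FactRow {
    int condKey, elecKey;
    double avgAmp;
    FactRow(int condKey, int elecKey, double avgAmp) {
        this.condKey = condKey; this.elecKey = elecKey; this.avgAmp = avgAmp;
    }
}

public class StarQuery {
    public static void main(String[] args) {
        // Cond dimension: Cond_key -> Species (hypothetical contents).
        Map<Integer, String> cond = new HashMap<>();
        cond.put(1, "rat");
        cond.put(2, "rabbit");

        List<FactRow> experiment = Arrays.asList(
            new FactRow(1, 10, 4.2), new FactRow(2, 10, 3.1), new FactRow(1, 11, 5.0));

        // Step 1: resolve the keys in the small dimension table.
        Set<Integer> ratKeys = new HashSet<>();
        for (Map.Entry<Integer, String> e : cond.entrySet())
            if (e.getValue().equals("rat")) ratKeys.add(e.getKey());

        // Step 2: pull only the matching rows from the large fact table,
        // avoiding a full scan with complex joins.
        for (FactRow r : experiment)
            if (ratKeys.contains(r.condKey))
                System.out.println("avg-amp = " + r.avgAmp);
    }
}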
5.3 Flexible Matching for Time Series Neurophysiological Recordings

Time series data pose some special challenges for today's knowledge discovery and data mining techniques. Being able to measure similarity is an important aspect of data mining in time series. Most previous studies of time series mining focus on finding an accurately matched whole sequence. However, it is often useful to consider finding a sequence with some segments accurately matched and the other segments less accurately matched in time-related data sets.

In this section, I propose an approach, termed FA-matching (Flexible Accuracy matching), to flexibly measure the similarity of two sequences with different degrees of accuracy in the subsequences, motivated by our application of mining neuroscience time series. The accuracy is captured in a user-specified number, with a higher number requiring more accuracy. Being able to measure subsequences with different degrees of accuracy gives us more adaptability in finding closely matched time series segments where the requirements are critical, while leaving other, less important segments more error tolerant. We show a definition of the FA-matching model; an approximate algorithm for implementing the model; an index structure (Sh-index) for neuroscience time series; and some preliminary experimental results on real neurophysiological EPSP recordings without spikes. I also show that queries based on our FA-matching model can be generated for other application areas, e.g. the stock market. Based on this work, a framework for data mining on neuroscience databases is introduced, and the preliminary version of a data miner for neuroscience data, NeuroMiner, is given.

5.3.1 Motivation

There have been several financial and scientific applications of time series data mining, such as stock market prices, geophysical data, etc. (Agrawal, Lin et al. 1995). Most of them focus on finding a whole matched sequence; finding a sequence with some segments tightly matched and the other segments loosely matched was not considered. However, the search for partially similar patterns occurs in many applications. For example, if we extract the first portion of the data in Figure 5.3 (an example of real neurophysiological EPSP recordings without spikes from Mr. Zhuo Wang's experiments), we get the data in Figure 5.2, which illustrates an experimental recording of an EPSP (Excitatory PostSynaptic Potential) in neuroscience research. S1(t1, v1) is the starting point, Sp(tp, vp) is the peak point, and S2(t2, v2) is the ending point; a is the peak amplitude, tp-t1 is the time to peak amplitude, and t2-tp is the decay time. Different postsynaptic potential recordings can have different values for these parameters. My intention here is to explore the possibility of solving some neuroscience questions by applying data mining methods from computer science.

Figure 5.2 An example of neuroscience time series data (a single EPSP with starting point S1, peak point Sp and ending point S2)

In Figure 5.2, the peak amplitude specifies how big the potential is, and tp-t1 specifies the time course of getting to the peak. These parameters are important information in the recordings and can be varied by different experimental conditions and manipulations. The decay, however, is influenced by many factors, not necessarily by experimental manipulations.
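Concretely, these shape parameters can be read directly off a recording stored as parallel time and value arrays. The short Java sketch below is a minimal illustration under that storage assumption; the array contents are toy values and the variable names are illustrative, not the dissertation's.

public class EpspFeatures {
    public static void main(String[] args) {
        double[] t = {0, 1, 2, 3, 4, 5, 6, 7, 8};
        double[] v = {0, 2, 5, 7, 6, 4, 2, 1, 0};   // a toy EPSP shape

        int p = 0;                                   // index of the peak point Sp
        for (int i = 1; i < v.length; i++)
            if (v[i] > v[p]) p = i;

        double peakAmplitude = v[p] - v[0];          // a
        double timeToPeak = t[p] - t[0];             // tp - t1
        double decayTime = t[t.length - 1] - t[p];   // t2 - tp
        System.out.println(peakAmplitude + " " + timeToPeak + " " + decayTime);
    }
}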
Usually, when neuroscientists analyze EPSP recordings, they are more concerned about how big the amplitude is and how fast the potential gets to the peak, and less concerned about the decay (personal communications with neuroscientists). Consequently, a query could be "Find the time series with a very similar shape in the time-to-peak period, and roughly the same shape in the decay period". For this kind of query, if we would like to find similar patterns, we would require that the sequences found match the subsequence over [t1, tp] closely, but may be a bit loose in matching the subsequence over [tp, t2]. Thus, a mechanism which is general for the whole sequence but able to control the degrees of similarity in the subsequences is desirable. In the example above, given a time series, finding patterns which tightly match the period [t1, tp] and loosely match [tp, t2] can be achieved by requiring a higher degree of accuracy when matching the subsequence [t1, tp], and a lower one when matching the subsequence [tp, t2]. Here a critical time point Sp needs to be determined. This leads to our definition of similarity with flexible accuracy (FA-matching) in the subsequences. Notice that if we put the same degree of accuracy on all the subsequences, with a small tolerance on the distance they are allowed to have, FA-matching can also be used for whole sequence matching. My method can also be applied to other experimental data having clear rising and decay courses, such as IPSPs. Due to the limited sources of experimental data, I did not have those kinds of data, and therefore do not show results on IPSP data here.

5.3.2 Related Work

A number of techniques have been developed for time series data mining, most of which deal with whole sequence similarity matching. In (Faloutsos, Ranganathan et al. 1994), a sequence is decomposed into a small set of multidimensional rectangles by sliding windows, and features are extracted from the data sequence in each window. (Agrawal, Faloutsos et al. 1993) proposed the DFT (Discrete Fourier Transform) to map time sequences to the frequency domain, with the observation that only the first few frequencies are strong; R*-trees (Guttman 1984) are then used to index the sequences and efficiently answer similarity queries. (Agrawal, Lin et al. 1995) proposed another approach to measuring similarity which can handle scaling, offset translation, and non-matching gaps; the distance is determined from specified envelopes of the original sequences. (Berndt and Clifford 1994) used a dynamic time-warping approach for finding similar patterns in time series. Tree matching algorithms have been used by (Shaw and Defigueiredo 1990) and (Wang, Zhang et al. 1994), where relational trees were used to capture the hierarchy of peaks/valleys in a sequence. (Keogh and Smyth 1997) proposed a probabilistic model that integrates local and global information and leads directly to an overall distance measurement between two sequence patterns.

Figure 5.3 A Sequence of Real Experimental Recordings (Series1 and Series2)

The above approaches emphasize developing criteria for accurately matching whole sequences.
However, in some applications such as neuroscience research, there are specific time points which divide a sequence into subsequences that can have different degrees of accuracy (similarity) requirements, because some parts of the recordings are more critical than others. To the best of our knowledge, matching a time series with different degrees of accuracy in the subsequences has not been addressed in previous studies. Therefore, I propose the FA-matching model for approximately matching a time series.

5.3.3 Flexible Accuracy Matching (FA-matching)

FA-matching stands for Flexible-Accuracy matching. In this subsection, a description of the FA-matching model is given. The basic idea is that the original time series can be transformed into another form of sequence by taking the slopes of segments of the original time series. Comparing two time series then becomes comparing their corresponding slopes. A time series can be divided into several parts, and each part can be given a different number of segments, with a high number meaning high resolution and a low number meaning low resolution; zero segments means ignoring that subsequence. In this way, a flexible accuracy representation for the subsequences can be achieved.

The FA-matching Model

A given time series is a finite sequence X of float pairs: ((t1, v1), ..., (tk, vk), ..., (tn, vn)), where t is a time point and v is the value at time t. Here tk is a special time point in the given query sequence. We would like to find sequences similar to X with different accuracy requirements in the subsequences divided by tk. For illustration purposes, I specify only one special time point, decomposing the sequence into two parts with two different accuracy requirements.

Let F denote a set of transformation functions for mapping sequences to sequences. The set F may consist of functions that, for instance, break a time series into several segments and take the slope of each segment. Then, comparing two time series becomes comparing their corresponding slopes. Consequently, a time series X is mapped into a sequence of floats after such a transformation.

Intuitively, we say that two sequences X = ((t1, x1), ..., (tk, xk)) and Y = ((t1, y1), ..., (tr, yr)) are (f, m, e)-similar if there is a function f(m) belonging to F such that, after applying f(m) to both X and Y, the resulting sequences of floats X' and Y' are approximately matched within a certain distance tolerance e, where X' = (x1, x2, ..., xm), Y' = (y1, y2, ..., ym), m is the user-defined segment number, and |X' - Y'| < e.

It is important to note that X' (or Y') does not consist of consecutive points of X (or Y). Rather, the points of X' appear in the same relative segment order as in X, and likewise for Y' in Y. This means that the matched sequence allows for a number of holes (outliers) in the original sequences. Clearly, if X and Y are similar, the number of outliers will be small, and X' and Y' will approximate them under the transformations.
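A minimal Java sketch of one such transformation function f(m) is given below: it splits a (t, v) sequence into m roughly equal segments and returns the slope of each segment. The method name and the parallel-array representation are illustrative assumptions, not the dissertation's actual code; the sketch assumes m is at most the number of points minus one.

public class SlopeTransform {
    // Break a time series into m segments and return each segment's slope.
    public static double[] slopes(double[] t, double[] v, int m) {
        double[] s = new double[m];
        int last = t.length - 1;       // index of the final point
        for (int i = 0; i < m; i++) {
            // End points of the i-th segment (roughly equal splits).
            int a = i * last / m;
            int b = (i + 1) * last / m;
            s[i] = (v[b] - v[a]) / (t[b] - t[a]);
        }
        return s;
    }
}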
Definition 1. Given a query time series X = ((t1, x1), ..., (tj, xj), ..., (tn, xn)) and a testing time series Y = ((t1, y1), ..., (tk, yk), ..., (tm, ym)), and numbers 0 < e1, e2 < 1, the sequences X and Y are (F, n, e)-similar with two different segment numbers n1 and n2 on the subsequences with the breaking point (tj, xj), if and only if there exist functions (e.g. two here) f1(n1) in F and f2(n2) in F, and subsequences X' = ((t1, x1), ..., (tj, xj)), X'' = ((tj+1, xj+1), ..., (tn, xn)) and Y' = ((t1, y1), ..., (tk, yk)), Y'' = ((tk+1, yk+1), ..., (tm, ym)), such that X' and Y' are (f1, n1, e1)-similar and X'' and Y'' are (f2, n2, e2)-similar. Here f1(n1) transforms X' and Y' into n1 segments and takes their slopes, and f2(n2) transforms X'' and Y'' into n2 segments and takes their slopes; the distance between the slopes of X' and Y' is less than e1, and the distance between the slopes of X'' and Y'' is less than e2.

The parameters n1 and n2 are used to control the number of segments in the subsequences X' (Y') and X'' (Y'') respectively. The parameters e1 and e2 control how closely we want the subsequences to match, i.e. X' and Y' can be matched with tolerance e1 and X'' and Y'' with tolerance e2 in distance.

Definition 2. For given X, Y, F, n, e, the FA-matching of X and Y is

FA-match(F, n, e)(X, Y) = {X, Y are (F, n, e)-similar}

In this work, we mostly consider functions in F that transform the original sequence into a set of segments and take their slopes. This transformation allows us to find similarities between sequences with different degrees of accuracy (specified by the segment numbers) in the subsequences and with different scaling factors.

5.3.4 The Index Structure

To develop a framework for our application, we observed that in the neuroscience time series recordings we are dealing with, a valid stimulus results in a recording with a rising period and a decay period together; we call this a base recording. Given a sequence of stimuli, a sequence of base recordings results (Figure 5.4). This motivates us to develop a shape index structure for these recording objects, to facilitate the search for time series in the experimental databases. Since we observed that there is little noise in Mr. Wang's recordings, I did not consider the noise issue; if there is too much noise in the experimental recordings, pre-processing of the recordings is needed, which can be future work.

The Shape Index (Sh-index)

The design of our index structure is based on the observation that time series in our application domain have the specific feature that each time series is the combination of one or more EPSP recordings with rising periods and decay periods. A sequence of this kind of recording has a sequence of rising and decay periods made from the single recordings (Figure 5.3). These data have the specific characteristic that there are peaks, and intuitively those peaks are used as special points by neuroscientists. Thus, we segment each single recording into two parts: the rising part, called an Up object, and the decay part, called a Down object. A single recording object therefore consists of an Up object and a Down object. If there is an interval period between recordings, we define it to be a Stable object. The segmentation of the data is done when I input the data into my data storage structure. For example, the recording (for illustration purposes) in Figure 5.4 will have the sequence of objects:

(Up Down Stable Up Down Up Down)

Figure 5.4 The Index Schema for Experimental Recordings

We can describe the Sh-index as follows:

a) A single EPSP recording is represented as an Up object (rising phase) and a Down object (decay phase).

b) A sequence of EPSPs is represented as a sequence of Up and Down objects. If there are intervals between the recordings, we use Stable objects to represent them.

Thus, an EPSP sequence can be represented by the three labels Up, Down and Stable, with each label pointing to a list of objects in the same sequence. The design of the Sh-index is an application of the language approach for querying the shapes of histories in (Agrawal, Psaila et al. 1995).
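The Up/Down/Stable labeling can be sketched as a simple pass over the raw values. The Java fragment below is an illustrative assumption of how such a segmentation might be done (the threshold eps and all names are hypothetical); the dissertation's actual segmentation is performed at data input time.

import java.util.ArrayList;
import java.util.List;

public class ShLabeler {
    // Label the phases of a recording from the sign of successive differences.
    public static List<String> label(double[] v, double eps) {
        List<String> labels = new ArrayList<>();
        String prev = "";
        for (int i = 1; i < v.length; i++) {
            double d = v[i] - v[i - 1];
            String cur = (d > eps) ? "Up" : (d < -eps) ? "Down" : "Stable";
            if (!cur.equals(prev)) {   // a phase change starts a new object
                labels.add(cur);
                prev = cur;
            }
        }
        return labels;  // e.g. [Up, Down, Stable, Up, Down]
    }
}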
If there are interval period between the recordings, we define it to be a Stable object. The segmentation of the data is done when I input the data into my data storage structure. For example, the recording (for illustration purpose) in Figure 5.4 will have a sequence of objects: (Up Down Stable Up Down Up Down) Down Down Down S3 Stable Figure 5.4 The Index Schema for Experimental Recordings Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. We can describe the Sh-index as following: a) A single EPSP recording is represented as an Up object (rising phase) and a Down object (decay phase). b) A sequence ofEPSPs is represented as a sequence ofUp and Down objects. If there are intervals between the recordings, we use Stable objects to represent them. Thus, a EPSP sequence can be represented by 3 labels Up, Down and Stable, with each of them pointing to a list o f objects in the same sequence. The design of the Sh-index is an application o f the language approach for querying the shapes of histories in (Agrawal, Psalila et al. 1995). 5.3.5 An Approximate Algorithm for FA-matching The approximate algorithm is illustrated by an example (Figure 5.5). The idea behind FA-matching is that we can segment the given query sequence (Si,..., Sp ) into m segments, getting the segment of (Si,..., S2), (S2, ..., S3),.... and (Sm , ..., Sp ). Users specify the number of the segments. By linking the end points of each segment, we get a chain of lines, which can be very similar to the original subsequences, if the segments are chosen reasonably. The higher the number, the more segment lines will 154 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. be. By comparing the closeness of the line segments of the sample sequence to the test sequences, we can decide whether those sequences are similar or not to the given query sequence. By specifying different segment number for different subsequences in the same sequence, the number of segment lines in subsequences will differ. Consequently, there are different degrees of similarity among the subsequences depending on the user specifications. Zero segment can be specified to ignore certain subsequence. For exam ple, a sequence with the starting point Sj(ti, V i), peak point Sp (tp , vp ), and the end point S2(t2, V 2). Assume that the time points are much more than the segment numbers. We want the degree of accuracy of the sequence from Si to Sp is 3, and the degree of accuracy of the sequence from Sp to S2 is 2. Then we can construct 3 lines: Si to S12, S12 to S23, S23 to Sp . We can also construct another two lines: Sp to S34, S34 to S2 (Figure 5.5). Sp si S2 Figure 5.5 Illustration of FA-matching Algorithm 155 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. A pseudo-code for FA-matching is shown below. The result on real data is shown in Figure 5.6. FA-matching ((SI, Sp), Y) (Figure 5.5) nl = the number of segments for subsequence (SI, Sp); count = 0 ; for subsequence (SI, Sp) { get the segments according to n l; compute the slopes; } do the same for the sample sequence Y; while compare the corresponding slopes of the subsequences { if the slope is similar within e tolerance, count++; } if count/nl >= threshold, return True; else return False In the sequence (SI, Sp), three slopes are computed for these three subsequence to get three floats, say (x, y, z). 
The corresponding parts of the given sequence Y are also segmented into 3 parts and 3 slopes are obtained, say (u, v, w). Then (x, y, z) are compared with (u, v, w). If, say, the distances between x and u and between y and v are smaller than a fraction, say 0.15, then those two pairs are treated as similar. Consequently, we have 2/3, approximately 0.67, which is bigger than the threshold, say 0.6, so the function will return True. We can do the same for (Sp, S2).

5.3.6 Implementation

FA-matching, together with the Sh-index, is implemented in Java. We take an object-oriented design methodology to construct classes of objects for the time series and its index, and I use the Sh-index structure to enhance the efficiency of FA-matching. Since there is no database available yet in USCBP, I use an array structure to store the time series experimental data. Each time series sequence has an attribute NumOfPoints representing its length and a label with one of the values {Up, Down, Stable}. These are obtained when we input the data into the database. If it is an Up sequence, there will be values for peak_amplitude and rising_constant; if the label is Down, there will be values for peak_amplitude and decay_rate. The time series raw data is stored in an array raw_data, where Seq[i] = (ti, vi). The definitions of Sequence and RawData are:

public class Sequence {
    public String label;              // one of Up, Down, Stable
    public int NumOfPoints;           // length of the sequence
    public double time_step;
    public double start_time;
    public double end_time;
    public double peak_amplitude;
    public double rising_constant;    // set for Up sequences
    public double decay_rate;         // set for Down sequences
    public RawData raw_data[];        // the (t, v) sample points
    public double slope[];            // slopes of the segments
}

public class RawData {
    public double t;
    public double v;
    public RawData(double vtime, double value) {
        t = vtime;
        v = value;
    }
}

The matching functions are:

public void match_seq_Obj(UnitRecord Qrec, Vector recV, int window, int Udegree, int Ddegree) { ... }
public void match_seq_Up(UnitRecord Qrec, Vector recV, int window, int degree) { ... }
public void match_seq_Down(UnitRecord Qrec, Vector recV, int window, int degree) { ... }

The time series can be decomposed into subsequences according to the user's requirements. For each subsequence, a different degree of accuracy for matching can be specified through a segment number: a number of 2 means dividing the subsequence into two segments, a number of 3 means dividing it into three segments. The above match functions are used for checking the slopes of the corresponding segments. To keep track of the subsequences, we define a class UnitRecord to store the UpList, DownList and StableList. Comparing two sequences therefore turns into comparing the corresponding slopes of small segments between two subsequences. I use a distance measurement e to control how closely we want two sequences to match. When comparing the difference between the values of two slopes, if the difference is within e, then 1 is added to a counter. If the ratio of matched segments to the total number of segments is higher than a threshold h, the two sequences are considered similar.
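Putting these pieces together, the following self-contained Java sketch realizes the slope comparison for one pair of subsequences, with the distance tolerance e and threshold h described above. It is a minimal illustration written for this chapter, not the actual match_seq_* implementation; the segment extraction follows the even-split scheme sketched earlier.

public class FAMatch {
    // Break a (t, v) subsequence into n segments and return their slopes.
    static double[] slopes(double[] t, double[] v, int n) {
        double[] s = new double[n];
        int last = t.length - 1;
        for (int i = 0; i < n; i++) {
            int a = i * last / n, b = (i + 1) * last / n;
            s[i] = (v[b] - v[a]) / (t[b] - t[a]);
        }
        return s;
    }

    // True if enough corresponding slopes agree within tolerance e.
    public static boolean similar(double[] tq, double[] vq,
                                  double[] ts, double[] vs,
                                  int n, double e, double h) {
        double[] q = slopes(tq, vq, n);
        double[] s = slopes(ts, vs, n);
        int count = 0;
        for (int i = 0; i < n; i++)
            if (Math.abs(q[i] - s[i]) <= e) count++;   // slope close enough
        return (double) count / n >= h;                // e.g. e = 0.2, h = 0.6
    }
}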
5.3.7 Performance Measurements

To test our algorithm, we use sequences of neurophysiological EPSP data to find similar sequences with different accuracies based on users' specifications. The shape of each EPSP looks similar to the ones in Figure 5.2 and Figure 5.4, but with different amplitudes, times to peak, decay times, numbers of responses, etc. Given a sample sequence, we would like to find the FA-matching ones.

Figure 5.6 is a search result for the query: find the EPSPs which closely match the given EPSP recording in the rising course (segment # = 3) and are roughly similar in the decay course (segment # = 1). The rising course of the given EPSP recording is therefore divided into 3 segments and a slope is computed for each segment, while only one slope is computed for the whole decay course. The same is done for the EPSP recordings being searched. After comparing the slopes, the program returns the FA-matching EPSP recordings. Figure 5.6 shows two recordings which are FA-matching similar. A linear scan, however, reports these two recordings as not similar, because it uses the Euclidean distance function |X - Y| = sqrt((X - Y) * (X - Y)) to calculate the distance over the whole sequence and does not take partially matched sequences into account. Notice that the decay courses of these two recordings do not match, but their slopes are similar; FA-matching finds the match, while linear scan misses it.

Figure 5.6 A Search Result of FA-matching: Two Similar EPSP Recordings

We compare the proposed method with the sequential scanning method in terms of average recall (i.e. whether, when a recording meets a query, the program finds it). The result is shown in Figure 5.7. All methods were implemented in Java on a Sun Sparc station 20. The measurements are obtained by sampling 30 neurophysiological recordings, and running the algorithm on each sequence against the given one. We use a distance parameter e of 0.2; if the percentage of matched segments exceeds 60%, we treat the pair as matched. The average recall of FA-matching is very high, at more than 92%; the few misses are caused by noise in the experimental recordings. The average recall for linear scan is not as high, because it only retrieves whole matched sequences and does not treat partially matched sequences as matches (Figure 5.6). The results for linear scan vary across data sets.

Figure 5.7 Comparison of FA-matching (method B) with Linear Scan (method A) on Average Recall

The user-specified segment numbers also play an important role when matching two sequences: the larger the number, the more accurately the subsequences are matched.

5.3.8 Queries

There are several kinds of queries that can be answered efficiently using FA-matching. Besides neuroscience time series, e.g. EPSP data, it can also be applied to other business and scientific applications such as stock market data. Notice that if we put the same degree of accuracy on all the subsequences, FA-matching can also be used for whole sequence matching. For example:

• Query 1: Find the pattern with an up-growing trend similar to the given one.
This query can be expressed by requiring more accuracy on the up-growing part of the given pattern, and less accuracy on the other parts.

• Query 2: Find the time series with a similar up-growing trend but a two times faster decay trend. This kind of query is not easy to represent in conventional whole sequence matching; instead, it is easily solved by our algorithm by calculating the required slope for the decaying subsequence.

• Query 3: Find the time series with a similar trend in January but a drop in February of this year. This query can be expressed by sampling the January and February sequences and matching them with other sequences.

• Query 4: Find the time series with two consecutive up-growing trends with a small interval in between during 24 hours. This kind of query is not easy to represent in conventional whole sequence matching, because the length and the shape of the interval can vary; instead, it is easily solved by our algorithm.

5.3.9 An Extension of FA-matching: Flexible-Attribute Matching

When we search time series recordings to find similar ones, we can also find those with certain attribute properties, such as time series with the same amplitude or the same rising constant. We can then assign different weights to selected attributes to represent the importance of each attribute. For example, a matching function f(n) can be defined as:

f(n) = w1*amplitude + w2*rising_const + w3*decay_rate + w4*(a2/a1) + ... + wn*property

where a1 and a2 are the amplitudes of the first and second peaks in a recording. The attribute values of the sequences in the experimental databases are normalized against the corresponding values of the query sequence in order to perform this matching. By specifying different weights w (0.0 <= w <= 1.0), the function will match recordings with different emphasis on their properties.
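As a small illustration of this weighted scheme, the Java sketch below scores a recording from its normalized attribute values. The attribute ordering and all numbers are hypothetical examples; only the weighted-sum form comes from the definition above.

public class AttributeMatch {
    // Weighted sum of normalized attribute values; w[i] in [0.0, 1.0].
    public static double score(double[] attrs, double[] w) {
        double f = 0.0;
        for (int i = 0; i < attrs.length; i++)
            f += w[i] * attrs[i];  // e.g. amplitude, rising_const, decay_rate, a2/a1
        return f;
    }

    public static void main(String[] args) {
        // Emphasize amplitude and rising constant; nearly ignore decay rate.
        double[] attrs = {1.0, 0.95, 1.3, 0.0};   // normalized values
        double[] w     = {1.0, 0.8, 0.1, 0.0};
        System.out.println("score = " + score(attrs, w));
    }
}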
5.3.10 Discussion

This section presented FA-matching with different degrees of accuracy on the subsequences, a variation of whole sequence matching motivated by neuroscience applications. We showed how to use FA-matching to find time series with some subsequences closely matched while other subsequences are only roughly matched. Numbers represent the degrees of accuracy, with higher numbers requiring more accuracy. Being able to measure subsequences with different degrees of accuracy gives us more adaptability in finding closely matched time series segments where the requirements are critical, while leaving other, less important segments more error tolerant. Thus, when matching patterns with known outliers, the outliers can be defined to be matched with very low accuracy. In our algorithm, the matching is done by comparing the slopes of two corresponding segments; therefore, scale is not a problem for our method: two sequences can have different scales but similar slopes in the segments.

One heuristic we use with FA-matching is to provide default segments (see the sketch at the end of this section). From the experimental measurements performed by domain experts, it is known that the slope of the rising phase is usually obtained by dividing the sequence into 3 parts and calculating their slopes. The three segments run from the point whose x-coordinate is 0 to 0.3T, from 0.3T to 0.7T, and from 0.7T to T, where T is the time from the start point to the peak. Thus each subsequence is segmented into the 3 parts [0, 0.3T], [0.3T, 0.7T] and [0.7T, T], and the slopes of these 3 segments are stored as parameters. For simplicity, we can apply the same criterion to the decay phase. This means that in FA-matching the segments do not even have to be the same length, which provides more flexibility in matching.

Another heuristic I consider in our application domain is based on the characteristics of the experimental recordings. An experimental recording can be represented by several parameters, such as amplitude, rising constant and decay constant, whose combination roughly defines a single experimental recording. However, for a sequence of recordings resulting from a sequence of stimuli, experimentalists usually measure the changes among the amplitudes, such as the amplitude ratios A2/A1, A3/A1, etc. (Figure 5.8). Within the same recording, the time course of each response is similar, and A2/A1 is a very important parameter for determining how the rest of the responses compare to the first amplitude. Thus, for a recording having a sequence of responses, we can use amplitude, rising constant, decay constant, and A2/A1 as four parameters to represent the sequence; when there is only one response, the fourth parameter A2/A1 is 0.

Figure 5.8 The Amplitudes of Time Series Recordings

Based on these specific features of our application domain, it seems that the R-tree (Guttman 1984) is a good candidate for the index structure. However, using only the 4 parameters as the dimensions of the R-tree will introduce false hits, since the 4 parameters are only a rough representation of the sequence; our FA-matching can be used to eliminate them. Using an R-tree may require extra cost, but may save computation if it limits the candidate sequences. If users do not want an exact match, but rather are interested in anything with the same shape, they can use FA-matching directly, because FA-matching takes care of scaling. Therefore, interesting future work includes incorporating these heuristics into the implementation and comparing FA-matching with other similarity matching methods.
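To make the default segmentation heuristic concrete, the sketch below computes the three default rising-phase slopes over [0, 0.3T], [0.3T, 0.7T] and [0.7T, T]. The linear interpolation and all names are illustrative assumptions for this sketch; only the break points come from the heuristic described above.

public class DefaultSegments {
    // Linear interpolation of v at an arbitrary time x within [t[0], t[last]].
    static double valueAt(double[] t, double[] v, double x) {
        for (int i = 1; i < t.length; i++)
            if (t[i] >= x) {
                double f = (x - t[i - 1]) / (t[i] - t[i - 1]);
                return v[i - 1] + f * (v[i] - v[i - 1]);
            }
        return v[v.length - 1];
    }

    // Slopes over [0, 0.3T], [0.3T, 0.7T], [0.7T, T] for a rising phase
    // that starts at t[0] and peaks at time tPeak.
    public static double[] risingSlopes(double[] t, double[] v, double tPeak) {
        double T = tPeak - t[0];
        double[] cuts = {t[0], t[0] + 0.3 * T, t[0] + 0.7 * T, tPeak};
        double[] s = new double[3];
        for (int i = 0; i < 3; i++)
            s[i] = (valueAt(t, v, cuts[i + 1]) - valueAt(t, v, cuts[i]))
                 / (cuts[i + 1] - cuts[i]);
        return s;
    }
}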
5.4 A Framework for Data Mining of Neuroscience Databases

Neuroscience databases contain different kinds of data, and sets of data mining functions are needed to perform the different tasks. NeuroMiner (Shu, Liaw et al. 1999b) is a data miner for this purpose. In this section, I introduce the architecture of NeuroMiner and discuss some expected functions of this miner.

5.4.1 NeuroMiner

NeuroMiner is a data mining engine for extracting interesting patterns from neuroscience databases. Discovering knowledge in experimental databases involves extracting interesting features from time series experimental recordings, finding association rules among the experimental protocols, finding general relationships between experimental recordings and experimental protocols, and deriving other general data characteristics not explicitly stored in the experimental databases. Such discovery may play an important role in understanding experimental data and in capturing the intrinsic relationships between experimental recordings and experimental protocols.

There are two architectural issues of concern for the data mining engine: the place of NeuroMiner in the overall system architecture, and the organization of the data mining engine itself.

System Architecture

The structure of NeuroMiner as it fits into the overall system architecture is outlined in Figure 5.9. Upon users' requests, the data mining engine will access the databases and perform mining functions. The data mining results are sent back to the users after the data mining engine has run the mining algorithms suited to the users' requirements to find interesting, previously unknown knowledge. Meta-data is used for guiding the searches. Modeling and simulation (EONS) help to verify the discovered knowledge and provide new queries for further discovery. There are APIs between the user interface and the data mining engine, and between the data mining engine and the databases, to handle the interaction issues.

System Structure

NeuroMiner is a data mining engine which can incorporate several data mining methods for a set of tasks. It can do data clustering, extract unknown knowledge by adopting meta-rules, do cross mining of experimental recordings and experimental protocols, mine time series recordings, mine association rules over experimental protocols, perform attribute-oriented induction through the conceptual hierarchy of the experimental protocols, use OLAP to enhance the data analysis, etc. The data mining results are displayed through a data visualization program. Currently, the preliminary version of NeuroMiner only includes the FA-matching mechanism and its related index structure; more work remains to be done.

Figure 5.9 NeuroMiner in the Overall System Architecture (user requests flow through a GUI to NeuroMiner, which exchanges hypotheses and meta-data with EONS and sends queries for data and protocols to the experimental databases DB1, DB2, ..., DBn)

5.4.2 Data Organization

Most neuroscience data are accessed manually. The USC Brain Project has made an effort to put the data into electronic form; however, the different data types and data formats make data organization a non-trivial task. The data warehouse is the underlying data organization for the experimental data. Currently we use multi-dimensional modeling, which has several advantages:

(a) It allows a complex, multi-dimensional data structure to be defined with a very simple data model. This makes it easy to define hierarchical relationships within each dimension.

(b) It simplifies the task of creating joins across multiple tables, and reduces the number of physical joins a query has to process. This greatly improves performance.

(c) By simplifying the view of the data model, it reduces the chances of users inadvertently submitting incorrect, long-running queries that consume significant resources and return inaccurate information.

(d) It allows the data warehouse we build to expand and evolve with relatively low maintenance.
This simple and powerful dimensional design provides a flexible foundation for the data warehouse's growth.

5.4.3 Discussion

In this subsection, I discuss some interesting future work for extending NeuroMiner. NeuroMiner is a data mining engine with a set of data mining algorithms built on top of the data warehouse. It can work on one centralized data warehouse, or on distributed data warehouses if parallel data mining algorithms are employed. The FA-matching mechanism is the prototype function in this data miner. However, to make NeuroMiner a complete data mining engine for our application, more functions need to be included, such as association rule mining on experimental protocols, cross mining between neuroscience time series recordings and the non-time-series data stored in the experimental databases, attribute induction, OLAP analysis, etc.

For example, we can infer from an experimental protocol that under certain conditions we will get a higher EPSP (Excitatory PostSynaptic Potential), which will cause some property changes of channels, and these changes of channels will in turn induce enzyme changes. This kind of data mining task can be assisted by meta-rules. Suppose that we know manipulation A will cause B, and B will affect C. We can then infer that manipulation A will affect C, a relationship previously unknown because the experiments were conducted by different people. Consequently, new experiments can be conducted to test this hypothesis.
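Such transitive inference over meta-rules can be sketched as a simple graph traversal. The Java fragment below is an illustrative sketch only; the rule contents are hypothetical examples, and nothing here is taken from the NeuroMiner implementation.

import java.util.*;

public class MetaRules {
    // Compute everything transitively affected by 'start' under the rules.
    public static Set<String> affectedBy(String start, Map<String, List<String>> rules) {
        Set<String> reached = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(start);
        while (!todo.isEmpty()) {
            String x = todo.pop();
            for (String y : rules.getOrDefault(x, Collections.emptyList()))
                if (reached.add(y)) todo.push(y);   // newly inferred effect
        }
        return reached;
    }

    public static void main(String[] args) {
        Map<String, List<String>> rules = new HashMap<>();
        rules.put("manipulation A", Arrays.asList("B"));
        rules.put("B", Arrays.asList("C"));
        // Infers that manipulation A also affects C.
        System.out.println(affectedBy("manipulation A", rules));
    }
}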
We can design query operations to perform searches that find the relationships between time series and non-time-series data, i.e. between the experimental recordings and the experimental protocols. We make the assumptions that (1) knowledge discovery is initiated by users' query requests, i.e. it is a command-driven, constrained search; and (2) background knowledge is available to assist the data mining process. There are different algorithms for performing the different tasks in a data mining process: (a) experimental protocol generalization before time series data generalization; (b) the reverse order; (c) interleaved generalization between time series recordings and experimental protocols.

Conceptual hierarchies provide background knowledge to constrain searches. Figure 5.10 shows a portion of the conceptual hierarchy for experimental manipulations. This hierarchy can be used when mining relationships between experimental recordings and experimental protocols: the relationships on experimental manipulations can be generalized (or specialized) along the hierarchy.

Figure 5.10 A Portion of the Experimental Manipulations Hierarchy (experimental manipulations divide into electrical, pharmacological and geometrical; electrical manipulations include stimulus properties such as intensity, frequency and patterns; pharmacological manipulations include agonist, antagonist, intracellular, duration and wash-out)

Although certain regularities, such as association rules, can be discovered and expressed at the primitive conceptual level by data mining techniques, stronger and more interesting regularities can often be discovered only at high conceptual levels and expressed in concise terms, such as: if A increases, B usually decreases. It is necessary to generalize low-level primitive data in databases to relatively high-level concepts for effective data mining. The availability of background knowledge not only improves the efficiency of a discovery process but also expresses users' preferences for guided generalization, which may lead to efficient and desirable generalization processes. By initiating a knowledge discovery process upon users' requests, the search over experimental data and experimental protocols is constrained to a specified subset of the data, and the generalization of rules is more focused. NeuroMiner can get user queries from a GUI. Key data in the different databases are identified by hypotheses (Shu, Liaw et al. 1998).

Visual data mining techniques have a high potential for mining large databases and a high value in exploratory data analysis. Data visualization tools let users explore higher-dimensional relationships by combining spatial and non-spatial attributes (location, size, color, and so on). It is desirable to incorporate some data visualization into our application. Since the development of the experimental databases is still ongoing in USCBP, most of the work for NeuroMiner discussed here remains future work.

Chapter 6: An Architecture for Linking Modeling and Experimental Studies (MDIS)

6.1 Introduction

Experimental studies and modeling are two closely related approaches to the understanding of a complex system. Experimental studies can be used to validate models, while modeling and simulation provide insights for experimental studies. Currently, data obtained from simulation are often compared with experimental data found in the literature in an ad hoc manner; those data are usually summarized and aggregated, so precise comparison is difficult. With an increasing amount of raw experimental data being stored in databases, a systematic approach to connecting simulation and experimental databases, based on hypothesis formation and verification, is needed. This approach should not only facilitate the search for experimental data to constrain model parameters, but also make the models more accessible to experimentalists. A combination of modeling and experimental studies is necessary to better characterize the dynamics of complex systems.

In neuroscience, understanding brain functions often requires combining different kinds of data stored in multiple databases. For example, understanding LTP requires neurophysiological, neurochemical, neuroanatomical, and behavioral studies. Each of these studies requires subtle expertise, and it is hard to conduct an integrated experimental study covering all those areas. Integrating empirical data collected in different databases provides new opportunities for scientific discovery. Generally, a model can span several fields of research and provide an integrated analysis of experimental data. Hence, models developed for insightful hypotheses can be an important foundation for the integration of data stored in different databases. To test the accuracy of the models built for certain hypotheses, it is desirable that the models be tested against experimental data stored in the different databases.
Moreover, it is essential to build a federation of databases and to link the data with related models, to help search for answers to complex research problems and to find hidden knowledge by analyzing experimental data at a large scale.

6.1.1 Motivating Example

The following example illustrates a scenario of integrating modeling and neuroscience experimental studies. Assume there is a hypothesis on synaptic modeling: different combinations of AMPA and NMDA receptors will result in different EPSPs (Excitatory PostSynaptic Potentials). A synaptic model (Ambros-Ingerson and Lynch 1993) (Lester and Jahr 1992) is built to capture the changes of postsynaptic potentials and the underlying receptor distributions for this hypothesis. In the synaptic model, there are AMPA receptors and NMDA receptors. Different distributions of receptors will affect the generation of postsynaptic potentials (PSPs). When a stimulus is applied, some receptors will be affected, the channels will open, and a corresponding EPSC will be yielded for each receptor involved. When we use an AMPA receptor agonist, the recording is the EPSP yielded by the AMPA receptors. When we use a glutamate receptor agonist, the recording is the EPSP yielded by the AMPA and NMDA receptors.

Figure 6.1 gives a kinetic model for the AMPA receptor (Ambros-Ingerson and Lynch 1993): an AMPA receptor R binds to the agonist A, with R + A <-> RA <-> Open, and with desensitized states Rd + A <-> RdA. The properties of the AMPA model require consideration of the time-dependent transition rates {KiXa(t); i = 1, 3}, where the transmitter concentration Xa(t) changes in time. The estimation of the kinetic parameters can be found in (Ambros-Ingerson and Lynch 1993). It is perhaps the simplest model that can account for a significant portion of the phenomenology that has been reported for the AMPA channel, such as desensitization and burst behavior (i.e., several openings closely spaced in time).

Figure 6.1 Kinetics for AMPA Receptor

The NMDA model (Lester and Jahr 1992) is presented in Figure 6.2. It represents an NMDA receptor R with two binding sites for the agonist A:

2A + R <-> A + AR   (rates 2Kon / Koff)
A + AR <-> A2R      (rates Kon / 2Koff)
A2R <-> O           (rates B / a)
A2R <-> A2D         (rates Kd / Kr)

O is the open state of the channel and A2D is the desensitized state. Kon and Koff are the binding and unbinding rates, B and a are the opening and closing rates, and Kd and Kr are the rates of desensitization and resensitization, respectively.

Figure 6.2 Kinetics for NMDA Receptor

Calculation of EPSC from the Kinetics of AMPA and NMDA

The calculation of the EPSC assumes that there are a number of AMPA and a number of NMDA receptors in a synapse (Shu 1997b). The numbers of AMPA and NMDA receptors represent the densities of the AMPA and NMDA receptors:

I_AMPA = (# of AMPA) * (single_AMPA_current)   (a)
I_NMDA = (# of NMDA) * (single_NMDA_current)   (b)
I_total = I_AMPA + I_NMDA + I_leak             (c)

where I_leak is the leak current. The density of receptors can be represented as, for example:

Synapse #1: 30% AMPA receptors, 70% NMDA receptors.
Synapse #2: 50% AMPA receptors, 50% NMDA receptors.

or

Synapse #1 before LTP: 30% AMPA receptors, 70% NMDA receptors.
Synapse #2 before LTP: 50% AMPA receptors, 50% NMDA receptors.

Calculation of EPSP

The calculation of the EPSP is shown in equation (d), where R is the membrane resistance, C is the membrane capacity, and V is the membrane potential:

R * C * (dV/dt) = -V + R * I_total   (d)
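The following Java sketch puts equations (a)-(d) together numerically: it computes the total synaptic current from the receptor counts and then integrates the membrane equation with a forward Euler step. All constants (single-channel currents, R, C, the time step) are made-up placeholder values for illustration only.

public class EpspSketch {
    public static void main(String[] args) {
        int nAmpa = 30, nNmda = 70;                 // receptor counts (e.g. 30%/70%)
        double iAmpaSingle = 0.002, iNmdaSingle = 0.001, iLeak = -0.01;
        double R = 100.0, C = 0.01;                 // membrane resistance, capacity
        double V = 0.0, dt = 0.0001;

        double iTotal = nAmpa * iAmpaSingle         // (a) I_AMPA
                      + nNmda * iNmdaSingle         // (b) I_NMDA
                      + iLeak;                      // (c) I_total

        for (int step = 0; step < 1000; step++) {
            double dV = (-V + R * iTotal) / (R * C);  // (d), solved for dV/dt
            V += dV * dt;                              // forward Euler step
        }
        System.out.println("membrane potential V = " + V);
    }
}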
However, a satisfactory explanation is achieved only when multiple levels of the neural system, from the molecular/cellular to the synaptic, neuron and neural network levels, are taken into consideration. An insightful modeling hypothesis spanning several neural levels can provide linkage among different areas of neuroscience research and suggest new perspectives for neuroscience experiments.

When we want to link modeling and experimental databases, hypotheses provide a way to constrain the common area of interest. With hypotheses as the conceptual-level description, search and retrieval of data in the databases can be realized by formulating queries using the terms related to the hypotheses. For example, there are different hypotheses concerning certain phenomena in the hippocampus:

Neurochemical Study
Observation: AMPA and NMDA receptors are co-localized on the same synapse in hippocampus.
Hypothesis: Uniform distribution of AMPA and NMDA receptors in hippocampus.
Main experimental procedures:
1. Apply AMPA to slice
2. Apply TCP to slice
3. Subtract NMDA binding from AMPA binding

Physiology Study
Observation: Variation in EPSPs originating from different dendritic regions of CA1.
Hypothesis: Non-uniform distribution of AMPA and NMDA receptors in CA1.
Main experimental procedures:
1. Record intracellular EPSPs evoked from different dendritic regions
2. Apply AMPA blocker to slice
3. Apply NMDA blocker to slice

Modeling Study
Observation: AMPA and NMDA receptor channels have different kinetics.
Hypothesis: Different combinations of AMPA and NMDA receptors will result in different EPSPs.
Main experimental procedures:
1. Vary the number of AMPA and NMDA receptors in a synapse
2. Calculate the EPSP in response to a pair of glutamate releases

The above example shows that different hypotheses address the same specific phenomena in experiments. By analyzing the various hypotheses in the experimental areas, modeling studies can derive a more general hypothesis for model construction and detailed analysis. In this example, using hypotheses from different areas of brain research, we can extract the important feature that different combinations of AMPA and NMDA receptors will result in different EPSPs. Given a synaptic model constructed on the hypothesized relationship between EPSP and the combined distributions of AMPA and NMDA receptors, the simulation conditions and manipulations can be obtained. Consequently, experimental conditions and manipulations can be derived and the relevant database attributes identified.

Hence, to solve the problem of identifying the needed data in the databases, a hypothesis-driven approach is adopted. In general, using the hypothesis-driven approach for key data identification consists of several steps:

1. Gather hypotheses from different areas of research concerning specific phenomena.
2. Derive a general hypothesis for the modeling study based on the hypotheses in the different experimental areas.
3. Construct a model for detailed analysis.
4. Select the relevant simulation condition and manipulation parameters.
5. Translate those simulation parameters into experimental attributes.
6. Transform the experimental attributes into database table attributes.
7. Get the needed data through those table attributes.
6.1.3 The Information Integration Platform

Currently, very few data management systems provide sufficient support for the integration of modeling and experimental studies, and in most cases the two approaches remain separate. The work presented here, the Model/Data Integration System (MDIS) for linking modeling and experimental studies, has been conducted within the framework of the USC Brain Project. An important goal of the Human Brain Project is to understand how to build a federation of databases in order to integrate, search and model multidisciplinary neuroscience data (Shepherd et al. 1998). The databases within the federation include Repositories of Empirical Data (REDs), Summary DataBases (SDBs), a Model Repository (MR) and Article Repositories (ARs). The REDs use the NeuroCore database schema to build databases of "protocol-wrapped" data from experiments (Grethe et al. 2000). The MR provides database access to computational models in such a way that the assumptions and predictions of the models are linked to the empirical and summary databases. The architecture presented here is primarily concerned with the linkage of the data in the Repositories of Empirical Data (REDs) and the Model Repository (MR).

6.1.4 MDIS Architecture

The MDIS architecture is based on a middleware layer between modeling and experimental databases. It applies database and knowledge representation techniques to neuroscience applications and takes the specific requirements of those applications into consideration. Figure 6.3 gives an overview of the components that form MDIS. It incorporates several mechanisms: a mapping between experimental protocols and simulation protocols, a protocol specification language, a semantic model, a metadata model, a Model-DataView structure, a controller, a query manager and a GUI. A hypothesis-driven approach is adopted for identifying key data in the databases. This multi-tiered architecture separates interface, logic and data for maximum deployment flexibility; it also makes it easy to adopt a component-based approach to implement the system.

Users interact with the system through the graphical user interface module, and requests from the GUI are sent to the appropriate component. The simulation protocol for a model, represented in terms of the protocol specification language, is the input to the semantic model. The outcome of the semantic model is the input to the metadata model, where the translation to experimental protocols, and further to database attributes, is supported. The output is a set of query constraints, which are used to define the query in the view section of a Model-DataView unit. The query is processed by the query manager and sent to the databases for data search. Hypotheses provide conceptual guidance for developing the metadata model, the Model-DataView units and the queries.
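As a rough illustration of this component decomposition, the narrow interfaces between the tiers might be rendered in C++ as below. This is a sketch under assumed names (SemanticModel, MetadataModel, QueryManager and their methods are invented for the example), not the thesis implementation.

#include <string>
#include <vector>

struct Query     { std::string sql; };
struct ResultSet { std::vector<std::vector<std::string>> rows; };

// Internal representation of a PSL-specified simulation protocol.
class SemanticModel {
public:
    virtual ~SemanticModel() = default;
    virtual std::vector<std::string> parameters() const = 0;
};

// PCM/ALM/ARM-backed translation of simulation parameters.
class MetadataModel {
public:
    virtual ~MetadataModel() = default;
    virtual std::vector<std::string> toCooperativeAttributes(
        const std::vector<std::string>& simulationParams) const = 0;
};

// Builds cooperative queries and dispatches them to the databases.
class QueryManager {
public:
    virtual ~QueryManager() = default;
    virtual Query build(const std::vector<std::string>& attrs) const = 0;
    virtual ResultSet run(const Query& q) const = 0;
};

// The data flow described above: protocol -> attributes -> query -> data.
ResultSet fetchDataForModel(const SemanticModel& sm,
                            const MetadataModel& mm,
                            const QueryManager& qm) {
    return qm.run(qm.build(mm.toCooperativeAttributes(sm.parameters())));
}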
The protocol specification language aims at formalizing the representation of experimental and simulation protocols and at facilitating the mapping between them, because simulation protocols and experimental protocols are the abstract representations of models and experimental data. The semantic model provides a common internal representation for protocols. Since a simulation protocol is the abstract representation of a subset of a model, PSL can also be used to describe a subset of a model. The metadata model is based on several matrices that set up a common ontology for parameter/attribute translation between modeling terms and experimental database terms. The query manager constructs generic queries based on user-selected constraints and processes users' queries. The Model-DataView structure tackles the task of associating models with the database queries for the needed data; the manipulation of Model-DataView units is conducted by the controller component. The GUI facilitates user interaction.

[Figure 6.3 The Model/Data Integration System: modeling (EONS) communicates with multiple experimental databases through the middleware layer (MDIS), whose components are the protocol specification language, semantic model, metadata model, Model-DataView structure, controller and query manager, with hypotheses guiding the layer and a GUI on top.]

6.1.5 The Process Flow

To discuss the information flow within MDIS, Figure 6.4 shows the workflow of the system from a modeler's point of view.

[Figure 6.4 Process flow within MDIS: a flowchart in which a query is checked for simulation protocol terms, mapped to experimental protocol terms via the metadata model, formed into a database query, and the returned experimental results are compared with simulation results, with refine, restart and exit branches.]

Users can query the databases in either simulation protocol terms or experimental protocol terms; the protocol terms are constrained by PSL. If the terms are simulation protocol terms, they are mapped to experimental protocol terms with the assistance of the metadata model. If a mapping is found, we can form the query and send it to the databases to get experimental results. The experimental results are compared with the simulation results. If they are similar, a hypothesis has been verified by both a simulation and an experiment; if they are different, the query can be refined for a new search. If the search terms are experimental protocol terms, the query can be sent directly to the databases. If they are in neither set of protocol terms, we check whether they appear in the cooperative database attribute list; if they do, we can form the query. Otherwise, we can restart a new search or exit the system. Details are given in the following sections.
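The branching in Figure 6.4 can be summarized as a single decision routine. The following C++ sketch is only an illustration of the control flow, with stub helpers standing in for the metadata model and query manager described later in this chapter; all names here are assumptions of the example.

#include <optional>
#include <string>

enum class TermKind { Simulation, Experimental, Unknown };

// Stubs standing in for MDIS components (assumed for this sketch).
TermKind classifyTerms(const std::string&) { return TermKind::Simulation; }
std::optional<std::string> mapToExperimentalTerms(const std::string& t) {
    return t;  // a PCM lookup would happen here
}
std::string queryDatabases(const std::string& q) { return "results(" + q + ")"; }
bool resultsSimilar(const std::string&, const std::string&) { return true; }

// One pass through the Figure 6.4 workflow.
bool runWorkflowOnce(const std::string& terms, const std::string& simResults) {
    std::string expTerms;
    switch (classifyTerms(terms)) {
    case TermKind::Simulation: {                  // map to experimental terms
        auto mapped = mapToExperimentalTerms(terms);
        if (!mapped) return false;                // no mapping found: restart
        expTerms = *mapped;
        break;
    }
    case TermKind::Experimental:                  // query databases directly
        expTerms = terms;
        break;
    default:                                      // try the cooperative
        return false;                             // attribute list, else exit
    }
    // A match means the hypothesis is supported by both a simulation and
    // an experiment; otherwise the query would be refined and re-run.
    return resultsSimilar(queryDatabases(expTerms), simResults);
}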
The remainder of the chapter is organized as follows. First, experimental protocols and simulation protocols are introduced. The components of MDIS are presented in detail in section 6.3. Section 6.4 presents the characteristics of MDIS, and section 6.5 discusses the implementation issues related to MDIS. Related work is presented in section 6.6, and section 6.7 is the discussion.

6.2 Linking Models to Experimental Data in the Databases

To integrate modeling and experimental studies, the first step is to develop a common ground for the two disciplines. Since experimental procedures for testing a hypothesis are usually specified in experimental protocols, designing simulation protocols based on experimental protocols can build a linkage between the two disciplines and establish a direct correspondence between a biological neural system and a modeling study.

6.2.1 Experimental Protocols

In the experimental databases, an experiment is organized into a hierarchy with three types of objects: session, block and trial. An experiment consists of several sessions, each session has several blocks, and each block has several trials. Each session, block and trial has a protocol associated with it. The experimental protocol specifies what is performed in each trial, block and session of an experiment.

Usually, an experiment includes manipulations such as drug applications, external stimuli and electrical stimulation. An experiment is performed under certain conditions. Experimental conditions may be time constraints; they may also be initial conditions that should remain the same throughout the experiment. The experimental manipulations and experimental conditions are captured in a protocol; that is, each protocol is formed by a set of experimental manipulations and a set of experimental conditions. A hierarchical description of an experimental protocol is illustrated in Figure 6.5.

[Figure 6.5 A Hierarchical Description of an Experimental Protocol: the protocol splits into experimental conditions — species (rat, rabbit), temperature (body, room), bath ([Ca2+]o), perfusion (ACSF, rate, duration), slice (thickness, orientation), circuitry — and experimental manipulations — pharmacological (receptor agonist/antagonist; in bath, focal, intracellular; duration, wash-out) and electrical (type of electrical stimulation, intensity, frequency, patterns, holding potential, current injection).]

6.2.2 Simulation Protocols

Simulation protocols with structures similar to experimental protocols are used to capture simulations. To set up connections between experimental protocols and simulation protocols for our schema translation, we focus on the connections between the parameters of the protocols. Basically, an experimental protocol corresponds to a simulation protocol if the set of experimental conditions overlaps with the set of simulation conditions, or if the set of experimental manipulations overlaps with the set of simulation manipulations. Figure 6.6 shows an example of a simulation protocol, and an example of the parameter/attribute correspondence is shown in Table 1.
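The session/block/trial organization of section 6.2.1, with a protocol attached at each level, translates naturally into a few aggregate types. The following C++ sketch is illustrative only; the field names are chosen for the example and are not taken from the NeuroCore schema.

#include <string>
#include <vector>

struct Condition    { std::string name, value; };          // e.g. species=rat
struct Manipulation { std::string kind, name;              // e.g. drug, AMPA
                      double onsetMs = 0, durationMs = 0; };

// Each protocol is a set of conditions plus a set of manipulations.
struct Protocol {
    std::vector<Condition>    conditions;
    std::vector<Manipulation> manipulations;
};

// The experiment hierarchy: a protocol can sit at every level.
struct Trial      { Protocol protocol; };
struct Block      { Protocol protocol; std::vector<Trial> trials; };
struct Session    { Protocol protocol; std::vector<Block> blocks; };
struct Experiment { std::string id; std::vector<Session> sessions; };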
6.3 MDIS Architecture Components

In the following, the components of MDIS are described: the protocol specification language and semantic model, the metadata model, the mechanisms for querying multiple databases, the Model-DataView structure (MDV), the MDV controller and the GUI.

6.3.1 Protocol Specification Language and Data Model

To let users specify queries in terms of protocol parameters, a method is needed to specify the parameters formally. The Protocol Specification Language (PSL) was developed to provide a formal notation for simulation and experimental protocols. It is motivated by the work on description logic (Brachman, McGuinness et al. 1991). Given that simulation protocols are partial representations of models, PSL can be used to describe a subset of a model. PSL is used for capturing both simulation protocols and experimental protocols. Therefore, model parameters can be mapped to experimental database attributes in a systematic way, and domain knowledge can be input into MDIS formally. Once a protocol has been described in terms of PSL, users can create its internal domain representation, referred to as its Semantic Model; this is a template for protocol specification.

[Figure 6.6 A Hierarchical Description of Simulation Protocols: the protocol splits into simulation conditions — species, temperature, bath ([Ca2+]o), perfusion (agent, rate, duration, concentration), membrane property (capacitance, resistance), morphological property (shape, size, location, density) — and simulation manipulations — pharmacological (glutamate, CNQX), electrical (current, voltage), kinetic (rate constant) and morphological property (shape, size, location, density).]

6.3.1.1 Syntax of PSL

The PSL language is defined by a set of concept-forming constructors such as AND, MUST, WITH, FIX and FILLS. It is a universal language for defining protocol concepts and protocol individuals. PSL is a subset of the language described in (Brachman, McGuinness et al. 1991). In our PSL, the new MUST and FIX constructors are introduced to restrict the filler of a role to a certain type or a certain individual; this suits our application domain because sometimes certain parameters must meet certain criteria first. Primitive role expressions are also introduced to include time relations for experimental (simulation) protocols. The BNF syntax of the protocol specification language is given below:

<concept-expr> ::= Protocol | <concept-name> |
                   (AND <concept-expr>+) |
                   (MUST <role-expr> <concept-expr>) |
                   (WITH <role-expr> <concept-expr>) |
                   (FIX <role-expr> <individual-name>+) |
                   (FILLS <role-expr> <individual-name>+)
<role-expr> ::= <symbol> | (PRIMITIVE <role-expr>)
<individual-expr> ::= <concept-expr>
<concept-name> ::= <symbol>
<individual-name> ::= <symbol> | <number>

Protocol is a pre-defined concept at the top of the hierarchy. The MUST constructor requires an attribute to be of a certain type, and the FIX constructor requires its role expression to take a certain value. The role values specified by the MUST and FIX constructors are priority attributes that need to be satisfied first. The other constructors, WITH and FILLS, are also used for constructing protocols, but their role values are not fixed and can be relaxed when we match values between a simulation protocol and an experimental protocol. AND represents the is-a relationship.

The basic components of PSL are concepts, individuals and roles. A concept expression introduces concepts for either simulation protocols or experimental protocols. A concept is usually specified from already defined concepts with constructors; it represents a class of individuals, and can combine other concepts and roles using the concept-forming constructors. Individuals are the instances of the concepts. Roles represent binary relations between individuals. The roles of a PSL individual can be filled by a set of individuals or be restricted to certain types of concepts.
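One way to hold PSL expressions internally is as a small algebraic data type mirroring the BNF above. The sketch below is hypothetical (the thesis implements the semantic model with its own C++ classes); it simply shows that each grammar production can become one node type, using the Current_manipulation concept defined just below as the example.

#include <memory>
#include <string>
#include <variant>
#include <vector>

struct ConceptExpr;
using ConceptPtr = std::shared_ptr<ConceptExpr>;

struct Role { std::string name; bool primitive = false; };

// One node type per production of the PSL grammar.
struct And   { std::vector<ConceptPtr> parts; };               // (AND c+)
struct Must  { Role role; ConceptPtr filler; };                // (MUST r c)
struct With  { Role role; ConceptPtr filler; };                // (WITH r c)
struct Fix   { Role role; std::vector<std::string> fillers; }; // (FIX r i+)
struct Fills { Role role; std::vector<std::string> fillers; }; // (FILLS r i+)
struct Named { std::string conceptName; };                     // concept-name

struct ConceptExpr {
    std::variant<And, Must, With, Fix, Fills, Named> node;
};

// Example: (AND Electrical_manipulation (MUST Electrical_type Current))
ConceptPtr currentManipulation() {
    auto base    = std::make_shared<ConceptExpr>(
                       ConceptExpr{Named{"Electrical_manipulation"}});
    auto current = std::make_shared<ConceptExpr>(ConceptExpr{Named{"Current"}});
    auto must    = std::make_shared<ConceptExpr>(
                       ConceptExpr{Must{Role{"Electrical_type"}, current}});
    return std::make_shared<ConceptExpr>(ConceptExpr{And{{base, must}}});
}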
For example, we can define a current manipulation as:

Current_manipulation: (AND Electrical_manipulation (MUST Electrical_type Current))

Electrical_manipulation and Current are concepts; Electrical_type is a role expression.

Primitive role expressions for time relations such as before, overlap and after are introduced in PSL as well. For a conditional protocol, i.e., one whose application is constrained by a previous protocol, a role after is defined to represent the constraint. Other timing relationships can be defined similarly. There are 13 kinds of time relationships (temporal logic), which can be pre-defined as primitive roles. For example, suppose P1 is an already defined simulation manipulation and P2 needs to be applied immediately after P1; then we can define:

P2: (AND Simulation_manipulation (MUST after P1))

Figure 6.7 provides an example of a PSL representation of the simulation protocol for the synapse model introduced in section 6.1.1.

Synapse_model_pro1: (AND Current_manipulation
                         (FILLS Electrical_intensity 0.5)
                         (FILLS Electrical_frequency 100))

Figure 6.7 The PSL representation of a simulation protocol

Using PSL for protocol specification is the key to MDIS' solution for connecting modeling with multiple experimental databases. To use MDIS for a completely new domain, only new concepts and individuals need to be defined; the underlying structure (the constructors) does not change, only the knowledge about the specific domain. The PSL specification also provides the capability to dynamically alter knowledge within a domain: a PSL specification can be changed, and the next time the subsystem is activated, the new specification is used.
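The 13 time relationships referred to above correspond to Allen's interval relations. For protocol steps represented as time intervals, each relation is decidable from the endpoint order alone, as the following illustrative C++ sketch shows (the function name and Interval type are assumptions of the example):

#include <string>

struct Interval { double start, end; };  // one timed protocol manipulation

// Classify a pair of intervals into one of Allen's 13 relations.
std::string allenRelation(const Interval& a, const Interval& b) {
    if (a.end < b.start) return "before";
    if (b.end < a.start) return "after";
    if (a.end == b.start) return "meets";
    if (b.end == a.start) return "met-by";
    if (a.start == b.start && a.end == b.end) return "equals";
    if (a.start == b.start) return a.end < b.end ? "starts" : "started-by";
    if (a.end == b.end) return a.start > b.start ? "finishes" : "finished-by";
    if (a.start > b.start && a.end < b.end) return "during";
    if (b.start > a.start && b.end < a.end) return "contains";
    return a.start < b.start ? "overlaps" : "overlapped-by";
}

A role such as after in the P2 example above is then just a named test of this classification.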
Their differences lie in when we define query based on PSL to search databases, the fillers for MUST (FIX) need to be satisfied first. 202 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 6.3.1.3 Semantic Model The semantic model is the common internal representation o f a simulation protocol associated with a neural model in our application. It is implemented through C++ classes and procedures. Each protocol has its own Semantic Model. Although protocols have different information in the respective semantic models, the general structure (template) and interface of the semantic models is the same. This is important because it allows all simulation protocols to be translated to experimental protocols using the same rules. The semantic model for a simulation protocol is dynamically created at run-time by model definition and hypothesis. The PSL definition for a particular simulation protocol is a major input item of the Semantic Model. The semantic model will contain other knowledge for mapping from simulation protocol parameters into experimental protocol attributes. The knowledge is defined in the metadata model ofMDIS, described in section 6.3.2. This allows simulation protocols to be mapped into appropriate experimental attributes. For instance, if an experimental database has been modified by adding/changing certain table attributes, MDIS will automatically handle the changes since it does not access the database schema until run-time. When a user queries the content of a simulation protocol, MDIS returns the current Semantic Model to the user. Figure 6.8 is an 203 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. illustration of the content of a semantic model for the simulation protocol for the example in section 6.1.1. Protocol Object SimulationProtocollD SimulationCondition Attributes Time-step C oefficien t-co n sta n ts LocationID ElectricalManipulation Attributes Electrical-type E lectrical-intensity Electrical-frequency Electrical-du ration PharmacologicalManiupulation Attributes Drug-type Drug-name Drug-duration T ranslationFunction() Figure 6.8 A Simulation Protocol Semantic Model 6.3.2 The Metadata Model To make it possible to link models with experimental data in the databases, there needs to be a notion of common ontology for model parameters and experimental 204 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. database attributes, and to support cooperative queries in multiple neuroscience database systems without building an integrated schema. In this section, we present the work on developing the metadata model, which in principle is an ontology translation based on parameter/attribute semantic equivalence. The basic premises of the metadata model are as follows. (1) Modeling and experimental databases are usually separated. (2) The component databases in neuroscience applications are so loosely related that integrating all database schemas into a single one is impossible. (3) The linkage among the metadata o f the component databases in neuroscience application is ad hoc, which implies that user assistance in identifying connections among them is needed. (4) Component databases are in different sub-areas of neuroscience research and therefore the identities o f the databases must be made explicit to users. 
6.3.1.3 Semantic Model

The semantic model is the common internal representation of a simulation protocol associated with a neural model in our application. It is implemented through C++ classes and procedures. Each protocol has its own semantic model. Although different protocols carry different information in their respective semantic models, the general structure (template) and interface of the semantic models is the same. This is important because it allows all simulation protocols to be translated to experimental protocols using the same rules.

The semantic model for a simulation protocol is dynamically created at run time from the model definition and the hypothesis. The PSL definition of a particular simulation protocol is a major input of the semantic model. The semantic model also contains other knowledge for mapping simulation protocol parameters onto experimental protocol attributes. That knowledge is defined in the metadata model of MDIS, described in section 6.3.2, and allows simulation protocols to be mapped onto the appropriate experimental attributes. For instance, if an experimental database has been modified by adding or changing certain table attributes, MDIS automatically handles the changes, since it does not access the database schema until run time. When a user queries the content of a simulation protocol, MDIS returns the current semantic model to the user. Figure 6.8 illustrates the content of a semantic model for the simulation protocol of the example in section 6.1.1.

[Figure 6.8 A Simulation Protocol Semantic Model: a Protocol object with a SimulationProtocolID; SimulationCondition attributes (Time-step, Coefficient-constants, LocationID); ElectricalManipulation attributes (Electrical-type, Electrical-intensity, Electrical-frequency, Electrical-duration); PharmacologicalManipulation attributes (Drug-type, Drug-name, Drug-duration); and a TranslationFunction().]

6.3.2 The Metadata Model

To make it possible to link models with experimental data in the databases, there needs to be a common ontology for model parameters and experimental database attributes, together with support for cooperative queries over multiple neuroscience database systems without building an integrated schema. In this section, we present the metadata model, which in principle performs an ontology translation based on parameter/attribute semantic equivalence. The basic premises of the metadata model are as follows. (1) Modeling and experimental databases are usually separate. (2) The component databases in neuroscience applications are so loosely related that integrating all database schemas into a single one is impossible. (3) The linkage among the metadata of the component databases is ad hoc, which implies that user assistance is needed to identify connections among them. (4) The component databases belong to different sub-areas of neuroscience research, and therefore the identities of the databases must be made explicit to users. (5) Different neuroscience databases cannot be merged, because their domains are different and their identities must be maintained.

In our approach, a metadata model for linking modeling and experimental studies is provided so that modelers and experimentalists can query the databases using the terms they are familiar with, while the component database attributes remain transparent to the users.

Most database attributes in our application are closely related to the parameters in the experimental protocols. If modelers want to find experimental data to constrain their model parameters, it is desirable that model parameters be linked to the parameters in the experimental protocols, and that experimental protocol parameters be consistent with the cooperative database attributes. In the protocol-based schema translation metadata model, a parameter connection matrix (PCM), essentially a table, is developed to contain all parameters for the mapping between experimental protocols and simulation protocols for a given application.

The fundamental problem of metadata management in multiple database systems is to provide a single database image so that users can pose queries without having to know the schemas of the individual component databases. In our protocol-based schema translation approach, an attribute linkage matrix (ALM) is built: another table, containing all attributes, for the mapping between cooperative database attributes and component database attributes. In addition to the ALM, the logical data structures of the component databases are specified by their export schemas and constrained by the attribute relevance matrix (ARM). The parameter connection matrix, the attribute linkage matrix, the attribute relevance matrix and the export schemas together represent the protocol-based metadata needed for linking simulations and experiments, as well as for processing cooperative queries.

If a simulation protocol has been represented in PSL, the simulation parameters in the PSL can be mapped to cooperative attributes via our connection matrices, and the corresponding database table attributes can be located accordingly.

6.3.2.1 Cooperative Universal Relation

The universal relation is a concept in the database domain as well as an implementation framework (Brosda and Vossen 1988) (Chang and Sciore 1992) (Leymann 1989). As a concept in database systems, a universal relation is a super relation consisting of all attributes in a database. It hides the existing logical database structures from the user and creates the illusion that there is only a single relation in the database. As an implementation framework, the universal relation allows the user to specify only the attribute names of interest in a query, providing a higher-level query language that makes the database more accessible; such queries can then be written more easily as conventional SQL queries.

In our protocol-based schema translation approach, we make use of the concept of the universal relation to define an ontology for the integrated system. The ontology is a superset of attributes that subsumes the parameters in the simulation protocols / experimental protocols, and the component database attributes.
To define the attribute list, it is necessary to define data dictionaries specifying the mapping of simulation protocols onto experimental protocols, and the definitions of all data attributes in the component databases. We assume that there exists a federation whose members have determined the data contents to be exported to the cooperative database system. The data dictionaries in our applications are then realized by defining equivalences among parameters that are consistent with the simulation/experimental parameters and subsume the corresponding attributes in the cooperative databases. Notice that if a parameter in an experimental protocol can be mapped to a cooperative attribute, then their semantics are equivalent. But not all parameters in the experimental protocols map to cooperative attributes, and there may be attributes in the cooperative attribute list that are not parameters in the experimental protocols. The notion of semantic equivalence between cooperative attributes and their corresponding attributes in the component databases is rather straightforward. More details are given in the following subsections.

For the PSL example in Figure 6.7, the roles Electrical_type, Electrical_intensity and Electrical_frequency specify the simulation parameters to be mapped to the experimental protocol attributes, which are also cooperative attributes in database terms.

6.3.2.2 The Distinction of Names from Semantics

To link modeling and experimental studies and issue cooperative queries, it is necessary to define the parameter semantics for the mapping between simulation protocols and experimental protocols, and to distinguish parameter names from parameter semantics. The same applies to distinguishing database attribute names from attribute semantics.

A parameter name is chosen to represent the parameter in the simulation protocol or experimental protocol respectively. For instance, "Rate of perfusion" can be a parameter semantic, while "Perfusion frequency" can be the parameter name in a simulation protocol and "Perfusion rate" the parameter name in an experimental protocol. As for database attributes, "The length of electrical manipulation" can be an attribute semantic, while "electrical duration" is the chosen attribute name in one database and "ElecPeriod" an attribute name with the same semantic in another database. These examples show that different parameter (resp. attribute) names can be chosen for the same parameter (resp. attribute) semantics in simulation and experimental protocols (resp. databases).

Usually, name conflicts occur at the protocol level and at the cooperative database level, because their schemas are based on parameter/attribute names, not on parameter/attribute semantics. Name conflicts can be resolved by checking the parameter/attribute semantics. A cooperative universal attribute list should be determined according to the parameter/attribute semantics. In the next subsection, we discuss how parameters in simulation protocols are connected to those in experimental protocols.
6.3.2.3 Parameter Connection Matrix (PCM)

There are two steps in defining the parameter connection matrix. First, we determine the parameters in the simulation protocols and their corresponding experimental protocol parameters. Second, the semantics of the experimental parameters and model parameters are stated. Both simulation and experimental parameters are named according to their cooperative parameter semantics, and the experimental parameter names are used as part of the cooperative universal database attribute names, since the cooperative attribute names are a superset of the experimental parameter names. In our application, most of the corresponding parameters in simulation protocols are chosen to be the same as those in experimental protocols. Table 1 shows a portion of the parameter connection matrix for the synapse model example of section 6.1.1, based on the description of the simulation/experimental protocols. In the next subsection, we discuss how cooperative attributes are derived by relating all database attribute names to their attribute semantics.

Table 1. The Parameter Connection Matrix

Parameter semantic                | Experimental parameter | Model parameter
Name of chemical                  | Chemical name          | Chemical name
Type of chemical manipulation     | Chemical type          | Chemical type
Duration of chemical manipulation | Chemical duration      | Chemical duration
Type of electrical stimulus       | Electrical type        | Electrical type
Intensity of electrical stimulus  | Electrical intensity   | Electrical intensity
Frequency of stimulus             | Electrical frequency   | Electrical frequency
Name of drug                      | Drug name              | Drug name
Type of drug                      | Drug type              | Drug type
Rate of perfusion                 | Perfusion rate         | Perfusion rate
Duration of perfusion             | Perfusion duration     | Perfusion duration

6.3.2.4 Attribute Linkage Matrix (ALM)

Figure 6.9 illustrates a multiple database system consisting of three component databases X, Y and Z. Notice that X and Y are two neurophysiological experimental databases, while Z is a neurochemical experimental database. Databases X and Y contain similar content with different naming conventions, and database Z has only minor attribute overlap with X and Y. To verify a hypothesis for the synapse model example of section 6.1.1, we can make use of the data in these databases.

The attribute linkage matrix is constructed to set up the linkage between the cooperative attributes and the component database attributes. The cooperative attribute semantics can be determined by collecting the attribute definitions in all databases and deriving a common list of attribute definitions as the attribute semantics. In a real-world multi-database system, the list of attribute semantics and cooperative names should be based on an agreement among the members of the federation. In our application, we construct a list of attribute semantics common to both modeling and experimental databases. The cooperative attributes are a superset of the parameters in the experimental protocols and the attributes from the component databases. Once the cooperative attributes are defined, we can easily find the correspondence between the cooperative and component attributes by comparing their semantics.
PhyExp(expID, name, date, region, chemManName, elecManType, recordID)
ChemMan(manName, type, duration)
ElecMan(manType, location, intensity, frequency)
(I) Component Database X

NeuroPhyExp(ID, pname, date, region, description, pchemName, pelecType, recordID)
Chem(chemName, type, duration)
Elec(elecType, intensity, frequency)
(II) Component Database Y

NeuroChemExp(exp#, name, date, section, chemManName, stainType, HistoID)
ChemMan(chemName, type, duration)
(III) Component Database Z

Figure 6.9 Cooperative component database schemas

We obtain the list of attribute semantics, along with their given cooperative names, as shown in Table 2.

Table 2. The Cooperative Attribute Names and the Attribute Semantics

Cooperative attribute  | Attribute semantic
Experiment ID          | Identifier of the experiment
Experiment Name        | Name of the experiment
Date                   | Date of the experiment
Brain Region           | Region of the experiment
Experiment Chemical    | Chemical name used
Chemical Name          | Name of the chemical
Chemical Type          | Type of the chemical
Chemical Duration      | Duration of the chemical
Experiment Electrical  | Type of electrical manipulation
Electrical Type        | Type of the electrical stimulus
Electrical Location    | Location where the electrical stimulus is applied
Electrical Intensity   | Intensity of the electrical stimulus
Electrical Frequency   | Frequency of the electrical stimulus
Experiment Description | Description of the experiment
Recording ID           | Identifier of the recording
Stain Type             | Type of stain

In our application, the cooperative universal relation and all component attributes are placed in a table called the attribute linkage matrix (ALM). The parameters in the experimental protocols overlap with the cooperative attributes; we simply add some more attributes to the PCM to build up the ALM. In the matrix, the first column consists of the list of cooperative attributes (experimental protocol parameters), and the first row, less the first cell, contains the names (or identifiers) of the component databases. Each non-blank entry of the matrix links a cooperative attribute to a component database attribute; an entry is blank if there is no corresponding attribute in that component database. Table 3 shows the content of the attribute linkage matrix for the example in Figure 6.9.

6.3.2.5 Attribute Relevance Matrix (ARM)

In our neuroscience applications, since the databases are created in different neuroscience sub-domains, their connections are not straightforward. If we apply data mining across multiple databases, it is essential to identify attribute connections among the databases. Models provide a high-level connection for defining cooperative queries; in addition, domain knowledge is needed to provide linkage among component database attributes. These requirements are reflected in the attribute relevance matrix, which links relevant attributes in the component databases to cooperative attributes. The selection of the relevant attributes is based on the models and the hypotheses for the models. Table 4 shows the attribute relevance matrix for the example of Figure 6.9, based on the synapse model of section 6.1.1, where both the neurophysiological and the neurochemical experiments perform some kind of chemical manipulation. Linking these attributes in different databases together can help to infer causal relationships that previously went unnoticed in the recordings, once further data analysis or data mining is performed; hidden knowledge can thereby be derived. The attribute relevance matrix is usually a subset of the attribute linkage matrix.
By linking these attributes in different databases together, it can help to infer causal relationships 215 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. that are previously ignored in the recordings after further data analysis or data mining processes. Therefore, hidden knowledge can be derived. The attribute relevance matrix is usually a subset o f attribute linkage matrix. Table 3. The A ttribute Linkage M atrix for the Synaptic Modeling Example in section 6.1.1 Based on Figure 6.9 C oopertive attribute X Y Z Experim ent ID PhyExp.expID NeuroPhyExp.lD NeuroChem Exp.exp# Experim ent N am e PhyExp.nam e NeuroPhyExp.pnam e N euroChem Exp.nam e D ate PhyExp.date NeuroPhyExp.date NeuroChem Exp.date B rain R egion PhyExp.region NeuroPhyExp.region NeuroChem Exp.section Experim ent C hem ical PhyExp.chem M anNam e NeuroPhyExp.pchem Nam e NeuroChem Exp.chem M anNam e C hem ical N am e C hem M an.m anNam e Chem .chem Nam e Chem M an.chem Nam e C hem ical Type C hem M an.type Chem .type Chem M an.type Chem ical D uration ChemMan.duration Chem .duration C hem M an.duration Experim ent Electrical PhyExp.elecM anType NeuroPhyExp.pelecType E lectrical Type ElecM an.m anType Elec.elecType E lectrical Location ElecM an.location E lectrical Intensity ElecM an.intensity Elec.intensity E lectrical Frequency ElecM an.frequency Elec.frequency Experim ent D escription NeuroPhyExp.description R ecording ID PhyExp.recordID NeuroPhy E xp.recordID NeuroChem Exp.HistolD S tain Type NeuroChem Exp.stainType 6.3.3 Query Multiple Databases After developing the cooperative attribute list and those matrix in the middleware layer, we can use them to query multiple experimental databases. Take the synapse 216 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Table 4. The attribute relevance matrix for the synaptic modeling example in section 6.1.1 based on Fig. 6.9 C ooperative attribute X Y Z Experim ent Chem ical PhyExp.chem M anNam e NeuroPhyExp.pchem Nam e NeuroChem Exp.chem M anNam e C hem ical N am e Chem M an.m anNam e Chem .chem Nam e C hem M an.chem Nam e C hem ical Type C hem M an.type C hem .type C hem M an.type C hem ical D uration Chem M an.duration Chem .duration Chem M an.duration model we shown in section 6.1.1 as an example. If we would like to find experimental data to constrain the synapse model parameters for certain hypothesis, we need to get both neurophysilogical and neurochemical experimental data. As shown in Figure 6.9, neurophysiological and neurochemical data are stored in different databases since the experiments are conducted separately. Therefore, cooperative SQL is needed to get the experimental data from the databases. In our application, experimental data have been represented in the format of database tables. There are two possible ways for users to query the databases: (1) search the databases with model parameters, (2) search the databases with cooperative attributes. In the first case, if the users are from modeling domain and they specify the search attributes as model parameters, the model parameters need to be translated into cooperative attributes before the queries are submitted for execution in the 217 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. databases. In the second case, cooperative database attributes are specified as search terms. 
The general process, from defining a simulation protocol to obtaining the experimental results, is represented in Figure 6.10.

[Figure 6.10 General Processes of Querying Experimental Databases: models yield simulation protocols, which — together with human queries and experimental protocol representations expressed in PSL — are mapped onto experimental protocols; the resulting constraints on experiment attributes are used to query the experimental databases and return results.]

6.3.3.1 Mapping Simulation Protocols with Experimental Protocols

The user-specified simulation protocol parameters can be translated into a set of experiment attributes in the databases by searching the PCM, where the parameters in simulation protocols are mapped to a subset of the cooperative database attributes. Figure 6.11 gives the generic algorithm for the function Map_simulation_experiment(), which translates simulation protocol parameters into cooperative attributes (experimental protocol parameters).

6.3.3.2 Cooperative SQL Processing

In (Zhao 1997), a schema coordination method is introduced for federated database management, where the databases are usually within the same business domain and are developed with naturally overlapping attributes. The difference in our application is that the neuroscience databases were previously separate, without an integrated schema, so user assistance is needed to identify the relevant attributes in the databases. Therefore, our cooperative queries are guided by the model parameters and hypotheses, and are constrained by the attribute relevance matrix (ARM).

The schema translation mechanism in our applications requires an extended SQL that allows the desired attributes to be specified in the SELECT clause and the desired databases in the FROM clause. This extension is a simple but important piece of functionality, because users need to query only selected attributes and component databases.

SELECT model parameters or cooperative attributes
FROM databases
WHERE selection and interdatabase join conditions

If users from the modeling domain specify model parameters, the function Map_simulation_experiment() of Figure 6.11 is called to convert the parameters into cooperative attributes. The task of translating the cooperative query into appropriate subqueries is accomplished by the query processing procedure in the middleware layer. As a result, users do not need to know the structural differences and naming conventions of the individual databases; they only need to know the cooperative attributes or model parameters, and are relieved of knowing the details of the databases.

For example, consider a cooperative query: "find the names of experiments and their experimental recordings in databases X and Y that use the same kind of chemical as those stored in ChemMan as "agonist" in database Z, and display the histology result IDs of the neurochemical experiments." Refer to Figure 6.9 for the database schemas related to the query.
IF the model parameter exists in the Parameter Connection Matrix, THEN convert each model parameter to its corresponding cooperative database attribute ELSE the model parameter is ignored. The results of parameter conversion is a., where a. is cooperative attribute Figure 6.11 A Generic algorithm for translation between simulation protocol parameters and cooperative attributes (experimental protocol parameters) IDs of the neurochemical experiments. Refer to Fig. 6.9 for the database schemas related to the query. This cooperative query is written as: SELECT (experimental name},{experimental recording} (a) FROM X:, Y:,Z: WHERE Z: (chemical type} = (X, Y): (chemical type} AND Z: (chemical type} = "agonist" Where the terms in { } denote cooperative attributes. A database name in the FROM list is followed by a colon to denote that it is a database, not a table. Interdatabase join conditions are represented by database names. The term Z : and (X:, Y:) indicate that joins are done between database Z and X as well as between database Z and Y. The last selection condition means that this applies only to database Z. Notice that interdatabase joins do not depend on database structures, but on user's information 221 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. requirement, in our application through the attribute relevance matrix. They must be specified by the user since the system has no way o f knowing which related attributes and databases to be joined for a particular query. As for joins within the same database, they are determined by foreign key join as normal database systems do. 6.3.33 Query Transformation The cooperative SQL needs to be rewritten to subqueries to be accepted by the component databases. The modified genetic algorithm for query transformation motivated by (Zhao 1997) is shown in Figure 6.12. Generally the translation is done as the following steps: 1. Parameter conversion. If there are model attributes in the SQL, call fimction Map_simulation_experiment() to translation them into cooperative attributes assisted by the PCM. 2. Subquery creation. Create one subquery per group o f relevant component databases needed to process the subqueiy. Interdatabase join is a indicator for split of subqueries. 3. Consistency checking. Check the attribute relevance matrix (ARM) to verify the conditions of interdatabase join. 4. Attribute conversion. For each subquery, convert each cooperative attribute in the 222 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. SELECT and WHERE clauses to its component database attribute according to the ALM. 5. Join determination. Specify the equal condition for joins through foreign keys for component databases. 6. Table indication. In the FROM clause, specify the tables together with the component databases need to be accessed by referring to the component databases and relations in the subquery. The result of the query transformation of (a) is as following: Subqueiy 1 : SELECT X.PhyExp.name, X.PhyExp.recordID, Z.NeuroChemExp.HistoID FROM X.PhyExp, Z.NeuroChemExp WHERE X.ChemMan.type = Z.ChemMan.type AND Z.ChemMamtype = “agonist” Subquery 2: SELECT Y.NeuroPhyExp.pname, Y.NeuroPhyExp.recordID, Z.NeuroChemExp.HistoID FROM Y.NeuroPhyExp, Z.NeuroChemExp WHERE Y.Chemtype - Z. ChemMan.type AND Z.ChemMamtype = “agonist” The query manager takes users’ query and transforms them into appropriate sub­ queries. 
Therefore, the task of the query manager involves query transformation and query processing.

Given a cooperative query Q(P, D, C), where P is the set of attributes, D is the set of component databases, and C is the set of selection and interdatabase join conditions:

IF P is a set of model parameters, call Map_simulation_experiment(P).
IF the query contains interdatabase joins:
    IF the corresponding attribute pair in the interdatabase join conditions does not exist in the Attribute Relevance Matrix, THEN exit.
Determine subqueries qi(P, di, C) for i = 1 to n, where n is the number of subqueries:
    IF the query contains interdatabase joins,
    THEN di is a set of dependent component databases in D with respect to the interdatabase joins in Q,
    ELSE di is the i-th component database in D.
FOR each subquery qi {
    FOR each cooperative attribute in P
        FOR each database in di
            IF the component attribute exists in the Attribute Linkage Matrix,
            THEN convert the cooperative attribute to its corresponding Database.Table.ComponentAttribute
            ELSE the cooperative attribute is ignored.
    The result of the attribute conversion is pi.
    FOR each cooperative attribute in C
        FOR each database in di
            IF the component attribute exists in the Attribute Linkage Matrix,
            THEN convert the cooperative attribute (or Database.CooperativeAttribute) to its corresponding Database.Table.ComponentAttribute
            ELSE qi is omitted, since it cannot be evaluated.
    The result of the attribute conversion is ci.
}
FOR each subquery qi {
    FOR each database in di, perform the normal database joins through foreign keys.
}
FOR each qi
    FOR each unique Database.Table in pi ∪ ci, put it in the FROM clause of qi.

Figure 6.12 The generic algorithm for query rewriting

6.3.4 The Model-DataView Structure

Based on hypotheses, different models are built. To test the accuracy of the models, experimental data are needed to verify them. When getting data from the experimental databases, it is desirable that users not have to compose the search queries from scratch each time, and that some default queries be ready for use with a model. To this end, we have developed the Model-DataView (MDV) structure.

The principal mechanism in the Model-DataView structure for obtaining selected data for a model is a view specification. A view on a database consists of a query that defines a suitably limited amount of data in the database. In the relational model, a view is defined as a query over the base relations, and perhaps also over other views. Current implementations do not materialize views but transform user operations on views into operations over the base tables; the final result is obtained by interpreting the combination of the view definition and the user query, using the description of the database stored in the database schema. To define a view, the first step is to identify the needed data in the databases; this is done with the hypothesis-driven approach.

Given the key data identified through a hypothesis, our proposal is to combine the concepts of models and views (queries) into a single concept: the Model-DataView. This proposal is motivated by the need to drive simulations of neural models with real data from the experimental databases.
The concept of views is used to generate a working set of tuples corresponding to the data input by the models. The view definitions can be replaced by normal database queries. The mathematical models of the nervous system are created with simplifications and assumptions, which are abstraction o f the subsystems of a biological system. The abstraction can be represented by a set of system parameters. When we want to simulate a model with real experimental data, we would like to get the experimental data corresponding to the system parameters, and to get simulation results that are similar to the experimental results. More precisely, • For each model, we would like to find the experiment, where relevant conditions and manipulations of simulation protocols and experimental protocols are sufficiently similar. • For each model, we would like to find the experiment, where the simulation result and experimental result are sufficiently similar for a hypothesis. The experimental data obtained in the above-mentioned ways are defined as database views for re-use. A view is a table that is derived from other views or base tables. 226 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Views depend on other tables and views to exist. However, they are dynamic in nature, i.e. their contents change automatically when the base tables and views, on which their definition are based, change. If several databases are involved, a cooperative view may be defined and parallel queries are conducted. At the user end, integration of the search results is needed. When a user creates a model for a hypothesis, he/she needs to construct the queries that get the data from the experimental databases to test the model. The view definition and the hypothesis definition are included in the text field in the Model-DataView units respectively. Figure 6.13 is an illustration o f the fields in one Model-DataView unit. Model_name M odelJD M D V J D TimeStamp Hypothesis View(query) Figure 6.13 An Illustration of the Fields in One Model-DataView Unit The Modelname is the name o f the model. The Model_ID is the model identifier. The MDV ID is the identifier for different versions ofMDV units for a model. The 227 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Timestamp specifies when the MDV unit is generated. The Hypothesis stores the hypothesis definition; the View stores the view definition (query). The Model-DataView unit provides the following facilities: « It extracts data from the databases in relational form as needed. This is now done by a query corresponding to a view. A view tuple or set of view tuples will contain projected data corresponding to a model input. A view relation will cover the input parameters o f a model. 0 It is a realization of the mapping from the experimental protocols to simulation protocols. # The list o f the MDV units for one model is used to keep track of the variations of the hypotheses for the model and their view definitions. 0 The list o f the MDV units for one model is the records for the changes of the model versions and their model inputs. 0 It can group the data into a set of inputs, and makes them available to the simulation program under different hypotheses. The generation of the Model-DataView needs information other than data to perform its fimctions: 228 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 
The generation of the Model-DataView needs information other than data to perform its functions:

• the database schema specifying the database,
• the query portion identifying the base data,
• the specifications of the primary relations identifying the model parameters,
• the translation mechanism between experimental protocols and simulation protocols, realized by the PSL, the semantic model and the metadata model in MDIS, together with the query definition,
• the simulation prototype specifying the structure and linkages of the data elements within a model, and the access functions for the model.

A model in a simulation is presented as a group of parameters and functions operating on those parameters. It is built according to its biological counterpart, and depending on the required detail of analysis, models can be constructed at different levels of granularity. For the synapse example of section 6.1.1, the input to the synaptic model may be a chemical stimulus. Based on the hierarchy of the experimental protocols (Figure 6.14), chemical_type can be refined to Agonist. Thus, an example of a view definition for the synapse model is:

Create view SynapseView as
Select intensity, recordings
From experimental_db
Where chemical_type = "agonist"

The Model-DataView units are stored in the model database at the client site. The experimental protocol hierarchy is provided so that new queries can be generated according to the hierarchical representation; for example, the chemical_type above can be changed to AMPA, NMDA, etc.

[Figure 6.14 A Portion of the Hierarchy of Experimental Protocols: Chemical Stimulus branches into Agonist (Glutamate, AMPA, NMDA) and Antagonist (CNQX, APV, AP5).]

If we want to change a model or use a different hypothesis, we need to change the view definitions accordingly to maintain a consistent linkage between the model and its query for the experimental data in the databases. Therefore, the MDV units keep track not only of model modifications but also of changes to hypotheses and queries. They are maintained as metadata for the models and are accessed through entries in a table called the ModelDataViewTable.

6.3.5 Controller

The ModelDataViewTable is a structure in MDIS that serves as metadata to keep track of the created Model-DataView units. The manipulation of the Model-DataView units is conducted by the controller component, which can insert, delete or update the entries in the ModelDataViewTable. For each model in the model database there is a corresponding entry in the ModelDataViewTable, and each entry holds a linked list of the MDV units for that model. Figure 6.15 illustrates the ModelDataViewTable structure. The MDV units in the same linked list have the same Model_name and Model_ID values. When a new version of a model is created, the system automatically generates an MDV_ID for the MDV unit, together with a timestamp (Figure 6.13). If the view definition or the hypothesis variation for the model needs to be modified along with the new model version, it is stored in the view field / hypothesis field, and a new MDV unit is added to the linked list.
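A minimal controller over the ModelDataViewTable could then look as follows; this is a sketch under assumed names, with a map of per-model linked lists standing in for the table of Figure 6.15 (the MdvUnit record repeats, compacted, the one sketched after Figure 6.13 for self-containment).

#include <ctime>
#include <map>
#include <memory>
#include <string>

struct MdvUnit {                     // as sketched after Figure 6.13
    std::string modelName, hypothesis, view;
    int modelId = 0, mdvId = 0;
    std::time_t timeStamp = 0;
    std::shared_ptr<MdvUnit> next;
};

class ModelDataViewTable {
public:
    // Add a new MDV version for a model; the MDV_ID and timestamp are
    // generated automatically, as described above.
    void insert(const std::string& modelName, int modelId,
                const std::string& hypothesis, const std::string& view) {
        auto unit = std::make_shared<MdvUnit>();
        unit->modelName  = modelName;
        unit->modelId    = modelId;
        unit->mdvId      = ++lastMdvId_[modelName];
        unit->timeStamp  = std::time(nullptr);
        unit->hypothesis = hypothesis;
        unit->view       = view;
        unit->next       = entries_[modelName];  // newest version first
        entries_[modelName] = unit;
    }
    std::shared_ptr<MdvUnit> latest(const std::string& modelName) const {
        auto it = entries_.find(modelName);
        return it == entries_.end() ? nullptr : it->second;
    }
private:
    std::map<std::string, std::shared_ptr<MdvUnit>> entries_;  // per model
    std::map<std::string, int> lastMdvId_;
};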
[Figure 6.15 shows the ModelDataViewTable: one entry per model name (Model_name1 through Model_name5), each pointing to a linked list of MDV (Model-DataView) units such as MDV1, MDV2, MDV3.]

Figure 6.15 The ModelDataViewTable

6.3.6 User Interface

The graphical user interface is the front end of MDIS for users to input their requests, protocol specifications and queries. It provides graphical assistance and constraints for user specifications. It is also the part where users visualize the results. A prototype of a user interface for protocol specification is shown in Figure 6.16. In the figure, model parameters are grouped into two classes: conditions, which include those parameters that remain constant throughout the simulation, and manipulations, which contain the time-stamped changes of parameters. Default values can be used in the condition part to simplify the specification process. A series of simulations to test a specific hypothesis is organized into blocks, with each block consisting of multiple trials. Manipulations that remain constant for all trials in a block are specified once at the block level, whereas those that vary from trial to trial are specified on a per-trial basis. (One possible in-memory representation of this block/trial organization is sketched after the translation procedure in section 6.4 below.)

[Figure 6.16 shows the prototype GUI, with menus (File, Set_up, Simulation, Edit, View, Init, Run, Cont, Interrupt, EONS library, Protocol), condition fields (membrane potential Vm: -70.0, membrane resistance Rm: 12000.0, membrane capacity Cm: 10.0, temperature T: 25.0), block-level manipulations (number of trials n: 3; glutamate with time points, e.g. 200.0 at t = 0 and 300.0 at t = 60, 0.0 otherwise), and per-trial manipulations (the position of AMPA for trials #1 through #3).]

Figure 6.16 A GUI for Simulation Protocol Specification

6.4 Characteristics of MDIS

MDIS is a middleware information mediation system that supports the connection between modeling and experimental databases. The protocol specification language, the semantic model, the parameter connection matrix, the attribute linkage matrix, the attribute relevance matrix, the MDV units, and the cooperative query processing method together form a protocol-based schema translation mechanism. Figure 6.17 illustrates the architecture of this mechanism. By developing such a middleware layer between modeling and experimental databases, a communication path is set up between them despite the difference in logical structure between models and experimental databases. The middleware layer is scalable since additional functionality can easily be included.

The procedure of protocol-based schema translation includes the following steps:

1. Define protocols according to the PSL and the semantic model.
2. Map simulation protocols to experimental protocols.
3. Create the parameter connection matrix.
4. Collect the export schemas of the component databases.
5. Determine the cooperative attributes.
6. Create the attribute linkage matrix.
7. Determine the relevant attributes in the multiple databases.
8. Create the attribute relevance matrix according to the models and hypotheses.
9. Translate the cooperative queries.
10. Store default queries to be supplied later by the Model-DataView units (alternative).
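As anticipated in section 6.3.6, the following is a minimal sketch of one possible in-memory representation of the condition / manipulation / block / trial organization of a simulation protocol. The type and field names are chosen purely for exposition and are not the EONS or MDIS declarations.

    // Sketch: a time-stamped parameter change (a manipulation).
    struct Manipulation {
        char  *param;                 // e.g. "Glutamate"
        float  time;                  // time point at which the change applies
        float  value;                 // new parameter value from that time on
        struct Manipulation *next;
    };

    // Sketch: a trial holds the manipulations that vary from trial to trial.
    struct Trial {
        Manipulation *trial_manips;   // e.g. the position of AMPA in this trial
        struct Trial *next;
    };

    // Sketch: a block holds manipulations shared by all of its trials.
    struct Block {
        Manipulation *block_manips;   // specified once at the block level
        int           no_trials;      // number of trials in the block
        Trial        *trials;
        struct Block *next;
    };

    // Sketch: a protocol = conditions (constant throughout) plus a series of
    // blocks testing one hypothesis.
    struct ProtocolSpec {
        float Vm, Rm, Cm, T;          // conditions, e.g. the defaults of Figure 6.16
        Block *blocks;
    };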
[Figure 6.17 shows the protocol-based schema translation architecture: a middleware layer containing the PCM, ALM, ARM, Model-DataView units, cooperative query processing, the protocol specification language, the semantic model and the metadata model, linking modeling (via simulation/experimental protocols and domain knowledge) to the multiple experimental databases.]

Figure 6.17 The Architecture of the Protocol-Based Schema Translation Mechanism

The protocol-based schema translation approach provides mechanisms for linking modeling and multiple neuroscience experimental databases at the parameter / attribute level through the parameter connection matrix and the attribute linkage matrix. The parameter / attribute equivalence is based on parameter / attribute semantics and on the experimental / simulation protocols. When simulation / experimental protocols or component database schemas are updated, we only need to change the parameter connection matrix and the attribute linkage matrix. Therefore, logical data independence is ensured for both modeling and experimental databases, and minimal schema mapping is supported. Consequently, the maintenance of the metadata model is easy.

Neural models provide the conceptual-level structure for identifying relationships among the different neuroscience experimental databases. The attribute relevance matrix is used to constrain the equivalence of certain attributes in different databases. Depending on the variations of models and hypotheses, the attribute relevance matrix can be modified, thus providing flexible extensibility for different query processes and data mining functionality.

The protocol-based schema translation mechanism requires the system to translate the cooperative queries into subqueries in conventional SQL. The semantics of the queries is defined by means of the matrices. The data returned from the queries can serve as input for OLAP analysis and other data mining facilities. Therefore, the middleware layer between modeling and experimental databases based on the protocol-based schema translation mechanism has the potential for further extension to data mining on neuroscience experimental data.

The Model-DataView structure provides a mechanism for storing default queries associated with each model. Hence, users do not need to construct queries for searching experimental data from scratch each time. This not only makes the model/experimental-data connection efficient, but also makes it feasible for experimentalists and modelers to use our Model/Data linkage mechanism, since they are not required to be experts in SQL to use the system. Therefore, the barrier is lowered for neuroscientists and modelers to use computer science technology.

6.5 Implementation Issues

In previous sections, I have described the MDIS architecture for integrating modeling and experimental studies. The implementation of this architecture is a complex task and has to be done incrementally. Therefore, a component-based approach is adopted. The component-based approach makes the implementation of components modifiable without affecting other components. It can reduce the cost, risk and time to deploy, maintain and upgrade the components.

The general process, from defining simulation protocols to obtaining experimental results, is as follows:
1. The user defines the simulation protocols according to the PSL.
2. The PSL protocols are represented as C++ classes.
3. The C++ simulation protocol class attributes are translated into experimental protocol attributes.
4. The user sends queries to the experimental database based on the constraints obtained in step 3.
5. The experimental database outputs the experimental recordings matching the queries.
6. The experimental results are compared with the simulation results to test the fitness between models and experiments.

In the following, the implementation of a subset of the MDIS architecture is discussed.

The Protocol Specification Language and Semantic Model

The PSL is a conceptual-level language that can be implemented according to the requirements of different applications. The challenge lies in how to map the PSL into a structure that can be implemented in a programming language. In our application, the PSL is used to describe the concept of protocols and individual protocols. There is a hierarchical representation of both simulation protocols and experimental protocols, and each protocol is a stand-alone unit built from a subset of the constructors in PSL. Therefore, we choose to build a common semantic model for the protocols and to use an object-oriented language for the implementation, which can preserve these characteristics of protocols.

The semantic model for the protocols includes the basic attributes of simulation protocols. The model is implemented in C++ as a base class. Other protocols with varying definitions inherit from the base class and override the attributes. Thus, the C++ classes define a hierarchy of simulation protocols, and individual protocols are instantiated from the classes. In our application, the experimental protocols have been represented in the format of a database schema, as table relations and table attributes. The class attributes can be mapped to their counterparts among the experiment attributes, based on the hypothesis and the parameter/attribute translation specified in the metadata model. To map simulation protocols into a subset of the database schema, a C++ program has been developed to translate user-specified simulation protocols (C++ classes) into a set of constraints, which are experiment attributes in the databases for query definitions.

In the mapping program, we define a class as the base class of the simulation protocol (Figure 6.18). It includes the simulation manipulation and the simulation condition; the simulation manipulation can be further divided into the electrical manipulation, the pharmacological manipulation, etc.

    class Simu_protocol {
        friend class Elec_manip;
        friend class Pharm_manip;
        friend class Simu_cond;
    public:
        int ProId;
        Elec_manip  *elec_manip;
        Pharm_manip *pharm_manip;
        Simu_cond   *simu_cond;
        MapTable    *mtable;
        void tranc_elec(int, ofstream&);
        void tranc_pharm(int, ofstream&);
        void tranc_cond(int, ofstream&);
        Simu_protocol(int, int);
        ~Simu_protocol();
    };

Figure 6.18 A Class Definition of Simulation Protocols for Mapping with Experimental Protocols
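To show how such a class might be driven, here is a small hypothetical usage sketch building on the class above. The meaning of the two constructor arguments and of the first argument of the tranc_* functions is not documented here, so the values below are placeholders; the output stream receives the generated experiment-attribute constraints.

    #include <fstream>
    using std::ofstream;

    int main()
    {
        ofstream constraints("constraints.txt"); // receives experiment attributes
        Simu_protocol proto(1, 1);               // placeholder constructor arguments
        // Translate each part of the user-specified simulation protocol into
        // constraints on attributes in the experimental database.
        proto.tranc_elec(1, constraints);        // electrical manipulation
        proto.tranc_pharm(1, constraints);       // pharmacological manipulation
        proto.tranc_cond(1, constraints);        // initial conditions
        return 0;
    }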
The Simu_protocol class includes the other classes Elec_manip, Pharm_manip and Simu_cond. The function tranc_elec(...) processes the matching between simulation protocols and experimental attributes for the electrical manipulation. The function tranc_pharm(...) processes the matching between simulation protocols and experimental attributes for the pharmacological manipulation. The function tranc_cond(...) processes the matching between simulation protocols and experimental attributes for the initial conditions. For the specific simulation model parameters input by the users, the program finds the corresponding experiment attributes. After the user inputs the parameters in the simulation protocols, the program outputs constraints for attributes in the experimental database. A small experimental database in Informix has been constructed to store the experimental recordings and protocols.

The Metadata Model

The protocol connection matrix (PCM), attribute linkage matrix (ALM) and attribute relevance matrix (ARM) have been implemented in C++ to set up the mapping between simulation protocol parameters and experimental database attributes. The concepts in the protocol connection matrix (PCM) are determined by the simulation and experimental protocols in an application. There are several options for how the PCM can be implemented: as a table in the database or as a user-defined data structure. Since a user-defined data structure is easier to maintain and update, a linked list is chosen; it is easy to search through a linked list and make modifications. A declaration of a PCM unit is shown below.

    struct PCM_Unit {
        char *simulation_param;    // a simulation model parameter
        char *experimental_attri;  // the corresponding experimental protocol attribute
        struct PCM_Unit *next;
    };

The *simulation_param is a simulation model parameter; its corresponding experimental protocol attribute is stored in *experimental_attri. Each pair is linked to the next by the next pointer. The definitions for the attribute linkage matrix (ALM) and attribute relevance matrix (ARM) are supplied by the database table attributes and the cooperative database attributes in the application. Depending on how many databases are involved and on the ontology of the application and databases, the concepts in the matrix are identified; in our application, they are determined by the terms in the experimental protocols and the database table attributes. The attribute relevance matrix is determined by the hypotheses in the application as well. There are several options for how the ALM and ARM can be implemented: as tables in the database or as user-defined data structures. Since a user-defined data structure is easier to maintain and update, we choose to implement them as linked lists. The definition and manipulation of these two matrices are similar to the PCM.

The MDV Controller

The implementation of the MDV controller is based on the assumption that there is a model database and that each model has a unique ID associated with it. A prototype of the controller component has been developed in C++ to keep track of the created Model-DataView units. It can be used to insert, delete and update the entries in the ModelDataViewTable. The following are the definitions of Model_DataView and ModelDataViewTable in C++.

    struct Model_DataView {
        char *model_name;
        int   model_ID;
        int   unit_ID;
        float timestamp;
        char *hypothesis;
        char *view;
        struct Model_DataView *next;
    };
    struct ModelDataViewTable {
        char *model_name;
        int   model_ID;
        Model_DataView *model_dataview;    // linked list of MDV units for this model
        struct ModelDataViewTable *next;
    };

    void Add_ModelViewUnit(Model_DataView *view_unit);
    void Delete_ModelViewUnit(Model_DataView *view_unit);

The *model_name is the name of the model; the model_ID is the model identifier; the unit_ID is the identifier for different versions of Model-DataView units for a model; the timestamp specifies when the Model-DataView unit is generated; the *hypothesis stores the hypothesis definition; the *view stores the view definition; the *model_dataview is a linked list of Model-DataView units; and the *next field is the pointer to the next Model-DataView unit or the next ModelDataViewTable entry. The Model-DataView units in the same linked list have the same *model_name and model_ID values. The function Add_ModelViewUnit() adds a Model-DataView unit to the corresponding entry in the ModelDataViewTable. The function Delete_ModelViewUnit() deletes a Model-DataView unit from the corresponding entry in the ModelDataViewTable.

At the moment, the query processing procedure is carried out in an ad hoc manner by a human: users submit sub-queries (SQL statements) to the relevant databases. The graphical user interface has not been implemented yet. However, the component-based implementation approach makes it easy to add a component to the architecture; the implementation of the query manager and the GUI will be interesting future work.
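As an illustration of how the controller's insert operation might work over these structures, here is a possible sketch of Add_ModelViewUnit. The global list head the_table is an assumption made for the sketch; the actual MDIS prototype may organize this differently.

    // Sketch: find the table entry whose model_ID matches the new unit and
    // prepend the unit, so the newest MDV version sits at the head of the list.
    ModelDataViewTable *the_table = 0;      // assumed global head of the table

    void Add_ModelViewUnit(Model_DataView *view_unit)
    {
        for (ModelDataViewTable *e = the_table; e != 0; e = e->next) {
            if (e->model_ID == view_unit->model_ID) {
                view_unit->next   = e->model_dataview;  // newest version first
                e->model_dataview = view_unit;
                return;
            }
        }
        // No entry yet: a new ModelDataViewTable entry for this model would
        // be created and linked here.
    }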
6.6 Related Work

Most neuroscience research has concentrated on either modeling or experimental studies; very little of it has addressed the development of an integrated architecture for linking the two, and database technology has not been widely applied to neuroscience research. Evidently, combining them will give more insight into both modeling and experimental studies, especially by integrating multiple neuroscience experimental databases.

Conventional federated database systems have mostly been based on the schema integration approach (Ahmed 1991) (Batini, Lenzerini et al. 1986) (Breitbart, Olson et al. 1986) (Breitbart 1990) (Josifovski and Risch 1999) (Larson, Navathe et al. 1989) (Milo and Zohar 1998) (Naumann and Leser 1999) (Reddy, Prasad et al. 1994) (Sheth and Larson 1990) to resolve the logical and structural diversity among component databases, where a global database schema for the component databases is obtained. However, this approach is not directly applicable to our neuroscience applications because most of the neuroscience databases in different sub-areas are designed and developed individually; an integrated database schema for all of them is impossible. In (Zhao 1997), a schema coordination approach has been proposed where global federated database attributes are derived from the component database attributes to form a matrix of corresponding attributes. Their approach differs from the conventional schema integration approach in that no global database schema is constructed; instead, a list of federated database attributes is formed, which simplifies the database integration and enables logical data independence. Their approach has been adopted in our work to develop the protocol-based schema translation mechanism. Essentially, users need only specify the search attributes in terms of the cooperative database attributes, which are then translated into component database attributes. However, the construction of the cooperative database attributes in our approach takes the specific characteristics of neuroscience applications into consideration, covering the terms in both modeling and experimental studies. It is based on the observations that simulation protocols can be mapped to experimental protocols and that the cooperative database attributes are a superset of both the experimental protocol parameters and the component database attributes.

Furthermore, modeling and experimental databases belong to two very different disciplines, and a direct connection between them is difficult. By developing a middleware layer between the two, our protocol-based schema translation mechanism provides a systematic approach to an integrated architecture and supports cooperative query processing over multiple experimental databases. This modular approach makes it easy to add more functionality and services to the middleware layer in future extensions.

System integration is closely related to middleware system development, which has been an active research area in computer science. DISCO (Tomasic, Raschid et al. 1996) builds on the notion of capability records and requires a wrapper writer to describe a data source's capabilities by means of a language based on a set of relational (logical) operators such as select, project, and scan. GARLIC (Carey et al. 1995) is a middleware system that provides an integrated view of a variety of legacy data sources without changing how or where the data is stored; Garlic wrappers model legacy data as objects, participate in query planning, and provide standard interfaces for method invocation and query execution (Roth and Schwarz 1997). The TSIMMIS system (Papakonstantinou, Garcia-Molina et al. 1996) used the Query Description and Translation Language (QDTL) (Papakonstantinou et al. 1995) to express the specifications of query power. Most of the work on middleware systems has emphasized developing wrappers and mediators so that a wide range of data source types can be integrated. In contrast, our work on system integration for neuroscience applications must not only tackle data integration, but also extract and derive connections between two different domains, i.e. modeling and experimental studies.

Very little work has concentrated on integrating database techniques into experimental and modeling studies. In (Dashti, Ghandeharizadeh et al. 1997), a conceptual design for a Neuroanatomical Rat Brain Viewer is proposed, and some database issues involved in neuroscientific applications, such as data integration, materialized views, indexing and fault tolerance, are addressed. In (Forss, Beeman et al. 1999), a distributed digital library for neuroscience is presented, describing a three-tier architecture with CORBA as the middleware to facilitate database federation. Database tools for integrating data within neurons and comparing data across neurons are described in (Mirsky, Nadkarni et al. 1998).
However, a general architecture addressing the representation gap and the communication between modeling and experimental studies is still lacking. In (Shu, Xie et al. 1999), simulation protocols are designed similarly to experimental protocols to build a common ground on which experimental and modeling approaches can communicate with each other. The work on building the metadata model for integrating modeling and experimental databases in MDIS was presented in (Shu and Arbib 2000b). To obtain an integrated architecture for neuroscience applications that enhances the representation and communication between modeling and experimental studies, more mechanisms need to be included besides the metadata model. This chapter has addressed the development of these mechanisms.

6.7 Discussion

One of the major difficulties in integrating the experimental and modeling approaches is finding relevant data to constrain model parameters such that comparisons of the simulation results and experimental data can be grounded. In this chapter, we propose an approach that integrates modeling and multiple neuroscience databases by means of neural models. We introduce the protocol-based schema translation mechanism within MDIS, a new methodology for processing cooperative queries in multiple neuroscience database systems for linking modeling and experimental databases. The novelty of this approach is that it provides a middle layer, based on parameter / attribute mapping and cooperative query processing, for linking modeling and multiple experimental databases. It offers minimal schema translation based on the notion of simulation / experimental protocols, a universal relation, and the identified relevant attributes in the databases.

The MDIS architecture is described together with its main components in this chapter. The architecture provides a generalized information integration system for neuroscience applications. The initial prototype software was developed to integrate neural models and data from neuroscience experimental databases; the same architecture can be used for other simulation and experimental environments. The architecture will not only facilitate the verification of models with experimental data, but also make it possible to analyze different kinds of data sets from multiple experimental databases to facilitate data mining.

The architecture of MDIS aims at accomplishing the integration of modeling and experimental databases without demolishing the logical structure differences between the two disciplines. Instead of building a global schema for the databases, experimental protocols are used as a common ontology for the model/data connection. MDIS serves as a middleware layer between modeling and experimental databases, so the architecture is very scalable for future functional extension. The development of MDIS is a good start toward realizing the goal of building a system that promotes collaboration, integrated analysis, data mining, data sharing and data management for both modelers and neuroscience experimentalists.
Chapter 7: Integration of Modeling, Experimental Databases and Data Mining

7.1 Introduction

Chapter 6 described the Model/Data Integration System (MDIS) for linking modeling and experimental databases. Testing the fitness between models and experimental data is an interesting application of this integration. Moreover, experimentalists have accumulated a large amount of experimental data, and useful knowledge could be found if we analyzed them at a large scale, showing a need for data mining on neuroscience databases. Therefore, the development of an integrated architecture that enhances the collaboration and interactions among simulation, experimental databases and a data mining system is better for conducting brain research.

In this chapter, I present extensions of the integration of modeling and experimental studies: testing the fitness between models and experiments, and adding data mining to the integrated architecture. The communication among the simulation engine, the neuroscience databases and the data mining component is facilitated by a data transfer mechanism for message passing using XML and by an interaction protocol. To accomplish the task of providing an extension of the integration in which modeling, experimental databases and the data mining component can communicate efficiently, some communication mechanisms are needed. In this chapter, we first introduce the application of testing the fitness between models and experiments. The architecture for integrating the three components is presented in section 7.3. Section 7.4 and section 7.5 describe the data transfer mechanism and the interaction protocol for component communication, respectively. Section 7.6 is the discussion.

7.2 Testing Fitness between Models and Experiments

The introduction of MDIS can facilitate the process of testing the fitness between models and experiments. Given an experiment, which model gives the most adequate interpretation of the experimental results? Moreover, if more than one experimental recording meets a certain query, which experimental result is more similar to the simulation result of a model than the others?

AMPA and Synapse Models

To show the capability of MDIS for testing the fitness between models and experiments, two different AMPA models and two different Synapse models were developed. One AMPA model and one Synapse model are based on (Ambros-Ingerson and Lynch 1993) (Lester and Jahr 1992), which I presented in section 6.1.1. The other AMPA model and the other Synapse model are based on (Zador 1990).

An alpha function was used for the non-NMDA current in (Zador 1990):

    I(t) = (Esyn - Vm) * k * gp * t * exp(-t/tp)                                  [1]

with k = e/tp (e the base of the natural logarithm), tp = 1.5 msec, Esyn = 0 mV, and the peak conductance gp = 0.5 nS. Note that with k = e/tp, the factor k * t * exp(-t/tp) reaches its maximum value of 1 at t = tp, so gp is indeed the peak conductance.

The NMDA conductance was a function of both time and membrane potential. The voltage dependence was derived from a model in which the binding rate constant of Mg2+ to the site of channel block varies as a function of voltage (Zador 1990):

    I(t) = (Esyn - Vm) * gn * (exp(-t/tau1) - exp(-t/tau2)) / (1 + eta * [Mg] * exp(-r * V))   [2]

with tau1 = 80 msec, tau2 = 0.67 msec, eta = 0.33/mM, r = 0.06/mV, Esyn = 0 mV, and gn = 0.2 nS.
Based on the above equations, an AMPA model and a Synapse model were developed in EONS and simulations were run on them. Consequently, there are different AMPA and Spine models in EONS. The variable curr_t (current time) below is controlled by the simulation time step and the total simulation time. An array of AMPA (NMDA) objects is used to represent a variable number of AMPA (NMDA) receptors in a synapse.

    void AMPA::calculate_d(float curr_t)
    {
        // Equation [1]: alpha-function AMPA current
        Ia = (Esyn - Vm) * k * gp * curr_t * exp(-curr_t / tp);
    }

    void NMDA::calculate_d(float curr_t)
    {
        // Equation [2]: NMDA current with voltage-dependent Mg2+ block
        Mg = 0.0012;
        In = (Esyn - Vm) * gn * (exp(-curr_t / tau1) - exp(-curr_t / tau2))
             / (1 + eta * Mg * exp(-alpha * The_Spine->V));
    }

    void Spine::calculate_V(float curr_t)
    {
        int i;
        float Ia = 0.0;
        float In = 0.0;
        for (i = 0; i < no_ampa; i++) {
            has_ampa[i]->calculate_d(curr_t);
            Ia += has_ampa[i]->Ia;     // sum the current over all AMPA receptors
        }
        for (i = 0; i < no_nmda; i++) {
            has_nmda[i]->calculate_d(curr_t);
            In += has_nmda[i]->In;     // sum the current over all NMDA receptors
        }
        float Il = G_leak * (V + 70.0);         // leak current
        V = V + delta_t * (Ia + In - Il) / KC;  // forward-Euler membrane update
    }

Testing Fitness between AMPA Models and AMPA Receptor EPSP Recordings

The simulation results of the two AMPA models are compared with the experimental results in the experimental database. The AMPA models are analyzed to determine which one best fits the AMPA receptor EPSP experimental recordings constrained by the protocols. A PSL definition for a simulation manipulation is:

    AMPA-simu: (AND simulation-manipulation
                    (MUST receptor ampa)
                    (WITH elecType current))

The above PSL representation can be used to constrain user queries for searching experimental data in the databases. Thus, a query is formed to get the experimental recordings for the AMPA receptor:

    select exp.recording
    from exp-db
    where pharm_type = "ampa" and elec_type = "current"                           (a)

After running the query and getting the experimental results, the experimental recordings are compared with the simulation results from the two different AMPA models to find out which model is a better fit for the experiments. At the same time, the model parameters can be changed to bring the simulation results closer to the experimental results.

The left plot in Figure 7.1 is the comparison of an experimental result with the simulation result of the AMPA model presented in section 6.1.1; the clear line is the simulation result and the noisy line is the experimental result. The right plot in Figure 7.1 is the comparison of the same experimental result with the simulation result of the AMPA model in [1] of this section; again, the clear line is the simulation result and the noisy line is the experimental result. We can see that the AMPA model of (Ambros-Ingerson and Lynch 1993) (left plot in Figure 7.1) is a better fit for the AMPA receptor EPSP recording experiment. This can be verified by running the Euclidean distance function on the simulation result and the experimental result; |X - Y| = sqrt((x - y) * (x - y)) is used as the Euclidean distance function for Figure 7.1.
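A minimal, self-contained sketch of this distance computation is given below: the squared pointwise differences between an experimental recording and a simulation trace of equal length are summed and the square root is taken. Whether the thesis prototype additionally normalizes by the number of samples is not stated, so this sketch uses the plain sum.

    #include <cmath>

    // Euclidean distance between an experimental recording and a simulation
    // trace sampled at the same n time points: sqrt(sum_i (x_i - y_i)^2).
    float compare(const float *exp_rec, const float *sim, int n)
    {
        float d = 0.0f;
        for (int i = 0; i < n; i++) {
            float diff = exp_rec[i] - sim[i];
            d += diff * diff;
        }
        return std::sqrt(d);
    }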
[Figure 7.1 shows the two comparison plots described above.]

Figure 7.1 Comparison of an Experimental Result with the Simulation Results of Two AMPA Models

Using the Euclidean distance function, a ranking of experimental recordings by their distance from a simulation result can be shown. A high value means more distance, i.e. less similar; a low value means less distance, i.e. more similar.

    Compare(exp, AMPA1) = 0.535409
    Compare(exp, AMPA2) = 1.50627

The distance of the experimental result from the first AMPA model is 0.535409; the distance from the second AMPA model is 1.50627. Thus, the first AMPA model is the better fit because the distance is smaller.

Testing Fitness between Synapse Models and EPSP Recordings with Both AMPA and NMDA Receptors

The simulation results of the two Synapse models are compared with the experimental results in the experimental database. The Synapse models are analyzed to determine which best fits the experiments on EPSP recordings with both AMPA and NMDA receptors. A PSL definition for a synapse model simulation manipulation is:

    Synapse-exp: (AND simulation-manipulation
                      (MUST receptor ampa+nmda)
                      (WITH elecType current))

The above PSL representation can be used to constrain user queries for searching experimental data in the databases. Thus, a query is formed to get the experimental recordings for a synapse with both AMPA and NMDA receptors:
The development of an integrated architecture that enhances the collaboration and interactions among simulation, experimental databases and data mining system is better for conducting brain research. The MDIS architecture presented in chapter 6 provides a generalized information integration system for neuroscience applications. Adding data mining component into the architecture will not only facilitate the verification of models with experimental data, but also make it possible to analyze different kinds of data sets from multiple experimental databases to find hidden knowledge. Therefore, an enhanced architecture with data mining capacity has the potential for integrated data analysis and facilitating data mining on multiple experimental databases. 7.3.1 A Scenario Consider a scenario for an integrated environment here (Figure 7.4). Suppose that an experimentalist has conducted a new experiment and put them into the experimental database from a GUI, and he can send a request to check the model database. 261 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. models queries experiments data trends requests patterns modeling Data mining Experimental databases Figure 7.4 A Scenario of information flow among modeling, experimental database and data mining system If there is no corresponding simulation result that uses the experimental data, he will get a message return and can invoke the simulator to run with the new experimental data. The simulation results will be compared with the experimental results. If the results are comparable, a message will be sent back to the GUI for the experimentalist, and no further actions will be triggered. If the results are not comparable, the experimentalist can either reconsider his experiment or send a message to the data mining engine to check what is the trend on the condition that when the values of parameters having been changed based on the experimental data. The results will be propagated back to the GUI for the experimentalist. This will trigger some reconsideration of the design o f both modeling and simulation, and experiments. 262 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. If a modeler has designed a new model and would like to test it with experimental data from th e experimental databases, he can issue queries to search for similar experimental data. If there is no corresponding experimental result in the database, he can do a query relaxation and find some relevant experimental results and compare them with his simulation result. If the database has been changed since the simulation last run, a new simulation will be run. The simulation results will be compared with the experimental results. If the results match with each other, a message will be sent back to the modeler, and no fu rth er actions will be triggered. If the results do not match, the simulator engine can send a message to the data mining engine to get additional information on the trend. New design of the model and updates on the Model-DataView units may be performed. 7.3.2 The Extended Architecture The integration of modeling, experimental databases and data mining is a n-tier architecture. The MDIS is on the top which provides metadata and manipulation mechanism. The lower layer consists of the communication mechanisms and the three components: simulation engine, experimental databases, and data mining component. Figure 7.5 is the overview of the architecture. 
[Figure 7.5 shows MDIS on top of the three components (modeling, experimental databases, data mining), with models, queries, experiments, data, requests and patterns exchanged among them.]

Figure 7.5 An Extended Architecture for Integrating Modeling, Neuroscience Experimental Databases and a Data Mining System

Since MDIS is a middleware-layer system, the data mining component can easily be attached; it only needs to access the metadata model in MDIS for the relevant information. The modeling and simulation system was described in chapter 4, the experimental databases and the data mining component in chapter 5, and the integration of modeling and experimental databases through MDIS in chapter 6. Therefore, the issues to be addressed in the following are the communication mechanisms among the three components: mainly, a common representation for message passing among the components and an interaction protocol for communication in the environment.

7.4 A Common Representation for Data Transfer

The modeling and simulation system, the experimental databases, and the data mining system can be three software entities each operating as a stand-alone application; unfortunately, each is then limited in its functionality. The modeling and simulation application has no integrated testing capabilities with the experimental databases; the data mining component does not get insights from simulation. Integrating them provides more capabilities for all of them. Moreover, the three components each have their own representation schema, and it is very difficult to integrate them without a common representation for message passing. The W3C standard for information exchange on the Internet, XML (eXtensible Markup Language), provides a language that can be used to define a common data modeling language and common representations in our environment.

XML is a textual language quickly gaining popularity for data representation and exchange on the Web (Bray, Paoli et al. 1998). Nested, tagged elements are the building blocks of XML. Each tagged element has a sequence of zero or more
At its most basic level, XML is a document markup language permitting tagged text (elements), element nesting, and element references. However, XML also can be viewed as data modeling language (Marchiori 1998). Semi-structured data has been defined as data that may be irregular or incomplete, and whose structure may Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. change. Although data encoded in XML may conform to a Document Type Definition, or DTD (Bray, Paoli et al. 1998), DTD’s are not required by XML. Database vendors have started to developed XML-enabled products, i.e. to allow one to easily read and write data formatted as XML documents to and from the databases. For instance, the Oracle8I Core XML Support consists o f three key facilities: • Oracle XML Parser, for programmatic processing of XML documents or document fragments. • XML Support in Oracle Internet File System to automate parsing and rendering of data between XML and database. XML documents can be stored into and rendered from a series o f related database tables and columns, or stored in a “text blob”. For documents that mix structures data and structured text markup, one can combine the two approaches. • XML-enabled “section searching” in ConText for more precise searches over structured documents. Any XML documents or document fragments saved into “text blobs” in the database can be enabled for indexing by Oracle8I InterMedia’s ConText text-search engine. 267 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. For example, if there is a database with a table Experimental Recording, we can use XML to model this table. The following is the table definition in XML. <!DOCTYPE ExperimentalRecording http://www-scf. use. eduM^shu/schemes/mvexperimental recordings. dtd> <experimental_recording> <recordings> <recording> <fact_id> 123 </fact_id> <record_key> 456 </record_key> <amplitude> -55 </amplitude> <rising_con> 0.98 </rising_con> <decay_con> 0.54 </decay_con> </recording> <recording> <fact_id> 234 </fact_id> <record_key> 756 </record_key> <amplitude> -58 </amplitude> <rising_con> 0.77 </rising_con> <decay_con> 0.34 </decay_con> </recording> </recordings> <elec_stimulus> <elec_stim> <intensity> 0.8 </intensity> <firequency> 100 </fi-equency> </elec_stim> <elec_stim> <intensity> 0.5 </intensity> <frequency> 100 </fi*equency> </elec_stim> </elec_stimulus> </experimental_recording> 268 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Using XML as a common representation, we can send messages from one component to another one. The messages are XML documents that are physically sent between the different components that are taking part in the environment. It requires that at the other end, there is a converter that can extract the information from XML. The message is constructed into blocks; and is exchanged between components to create a session. XML definitions of a message include a session reference block, which contains information identifying the session and the block. XML ID Attributes are used to identify data record Messages. Extra XML Elements and new user defined values for existing codes can be used when the message is extended. With the support o f MDIS, we can map the experimental protocols into simulation parameters and compare the experimental recordings with the simulation results. 
7.5 System Integration

In order to integrate the modeling and simulation system, the experimental databases, and the data mining system, we must devise a means whereby they may communicate with one another. In essence, we want the experimental databases to send a signal to the simulation engine when new experimental data are entered into the databases, and the simulation engine to confirm the results against simulation results if any are available when the signal is sent; if no simulation data are available, new simulations will be conducted. As a result, the chosen communication mechanism must support bi-directional links between the three software units. Adding to the difficulty of integration is the fact that the components may be written in different programming languages.

One integration strategy is to use pipes and FIFOs (Halsall 1996). A pipe is a mechanism for interprocess communication; data written to the pipe by one process can be read by another process. The data is handled in a first-in, first-out (FIFO) order. The pipe has no name; it is created for one use, and both ends must be inherited from the single process which creates the pipe. A FIFO special file is similar to a pipe, but instead of being an anonymous, temporary connection, a FIFO has a name or names like any other file. Processes open the FIFO by name in order to communicate through it. A pipe or FIFO has to be open at both ends simultaneously. If you read from a pipe or FIFO file that does not have any processes writing to it (perhaps because they have all closed the file, or exited), the read returns end-of-file. Writing to a pipe or FIFO that does not have a reading process is treated as an error condition; it generates a SIGPIPE signal, and fails with error code EPIPE if the signal is handled or blocked. Neither pipes nor FIFO special files allow file positioning; both reading and writing operations happen sequentially, reading from the beginning of the file and writing at the end.

Another way is to use sockets. A socket (Halsall 1996) is a generalized interprocess communication channel. Like a pipe, a socket is represented as a file descriptor. But, unlike pipes, sockets support communication between unrelated processes, and even between processes running on different machines that communicate over a network. Sockets are the primary means of communicating with other machines; telnet, rlogin, ftp, talk, and the other familiar network programs use sockets.

After weighing the above options, it was decided that a socket should be used as the fundamental conduit of information transfer among the simulation, the experimental databases, and the data mining. Using a socket, a simple communication protocol could be developed by which the three components communicate with one another. In addition, the only means by which the three components interact with each other is via the socket, thereby encouraging loose coupling among the three modules.

7.5.1 Advantages of Using Distributed Object Computing

From a purely software engineering perspective, splitting the implementation into three separate units enforces very loose coupling among them, thereby making each of the units more autonomous and cohesive.
The three software units do not even share a single global variable among them. As a result, significant changes can be made in one module or the middleware layer (MDIS) without adversely affecting the others. By making the three units separate, we only share as much information as necessary among them via inter-component communication. Had the three modules been tightly bound together, the temptation to share numerous data structures among them would be very strong, and subsequent changes to these data structures would have repercussions permeating the entire implementation. By modularizing the simulator, the database engine, and the data mining engine, the effects of radical implementation changes in one module do not usually extend into the other modules, hence localizing the impact of changes.

By creating a clear implementation distinction among the simulator engine, the database engine, and the data mining engine, it becomes much easier to remove one of the modules and replace it with another that has similar capabilities. For example, a different database with different contents may replace the current one. The three components will still be able to communicate with one another as long as the new database engine provides the simulator engine and the data mining engine with the necessary details regarding the structural representation and the database schema, and can parse the output results generated by the simulator engine. Alternatively, instead of replacing modules, we could augment the system relatively seamlessly by adding new modules. For example, if one were doing a comparison of two different discrete-event simulators, the same database could be employed to provide the necessary data records.

Another advantage of rigidly partitioning the three components into three distinct applications is that it establishes a framework by which the three units can communicate with each other at the socket level. This could lead to several benefits, since it implies that these three components do not necessarily have to be running on the same machine. For example, several people could conceivably be working on simulating different models using different simulator engines; the database they all use, however, could be running on a high-powered machine which is accessible to all. The transmission of the simulation data input request to the so-called "database server" would be transparent to the end user. Also, the data mining engines on different machines can access databases on different machines to do parallel data mining. With socket support in the core of the components, the distribution of the components over several machines becomes realizable. Of course, an efficient distribution of the components over several machines would have to take into consideration the potential communication delays inherent in any network. Similarly, effectively distributing the data into different databases, or distributing different data mining functions to different data mining engines, would necessitate an intelligent partitioning of the database or data mining sub-components so as to minimize network traffic between machines.
Moreover, by separating the three software units into loosely coupled applications, different programming languages can be used by each unit, as long as APIs are provided to link the core functions of the units to the communication mechanisms at the socket level.

7.5.2 Interaction Protocols

An interaction protocol is a set of rules governing some form of communication. Computer protocols have been developed to ensure that there are no mistakes when transferring data between computer systems. A protocol is not a program or a piece of hardware; it is a specification, like an algorithm. An interaction protocol is used to transfer files point-to-point. Network protocols are rules governing information exchange between two or more computers on a network. Most computer networks require that information transferred between two nodes be divided into blocks, called packets. These packets make the information more manageable for the sending and receiving nodes, as well as for any intermediate nodes (bridges or routers). In addition to the information, or data, being transferred, each packet contains control information used for error checking, addressing, and other purposes.

In our application, the interaction protocol used among the three components is the ModelData protocol. The ModelData protocol transfers the data units, and the structural information of the data units forming a data record, among the components. It is also responsible for transferring input and output signal values back and forth between any two components in the environment. There are certain fields that need to be included in the protocol.

The ModelData Protocol

The ModelData protocol is an extensible application-level protocol. The information transferred between the components goes through sockets. The structure of the protocol is composed of a header and a body. The header can be thought of as representing a
This consideration is based on the assumption that if later on we extend our architecture to 276 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Internet environment, then brain researchers all over the world can collaborate and share information in the model databases and experimental databases. The interaction protocol is a request/response protocol on top o f TCP/IP and it distinguishes two roles: client and server. At anytime, a component can be a client or a server. In a typical message flow a client sends a request to the server, and this sends a response back to the client. All messages exchanged between a client and a server are encapsulated as either requests or responses of the interaction protocol. Observe that both requests and responses are represented as XML messages that nest the information corresponding to interaction between any two components. On the sender side, messages to be sent are wrapped at each layer and passed down till getting to the communication layer responsible for the sending. Upon reception, on the receiver side, messages are unwrapped at each layer and passed up to higher layers till the message arrives at the facility for processing. An example of the generic format of the header of body of the ModelData protocol is shown in Figure 7.6. 277 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 7.5.3 Implementation Currently, we implement in C++ a prototype of three nodes which can send messages with each other using sockets at UNIX environment. The three nodes emulate the three components in our environment. There is a separate class Datagram designed for message which represent the general structure for a message sent among the nodes. The connection relationships among the three components are recorded in the *utable. In our case, they are bidirectional relationships among the components. A sample code of our Node definition is shown below. Header: attribute ^: value - j attribute 2 ' . value 2 attribute n: value n Body: data Figure 7.6 A Generic Format of the Header and Body Section of the ModelData Protocol Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. class Node { friend class Datagram; public: int Nodeld; Datagram ‘message; int sockfd; struct sockaddrjn the_addr, their_addr; int packetsSent; int packetsReceived; UniTable *utable; EventList ‘eventQ; // event queue struct timeval startjim e, current_time, tv; char ‘buffer; void make_message(int, int, char*); Node(int nodeld); ~Node(); void tranc_process(int, ofstream&); void establish_event(ofstream&); float difftime(struct timeval, struct timeval); void make_socket(); void Send(int); void Receive(); }; The message sending and receiving are conducted by socket functions. After a component received a certain message, it can perform the functions required by the message, and send confirm message or some results back to other components specified in the message. 279 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 7.6 Discussion Finding fitness between models and experimental data is an interesting application of the integration of modeling and experimental studies. In this chapter, we have shown a case study on this. Experimentalists have accumulated a large amount of experimental data. Useful knowledge could be found if we analyze them at a large scale. 
7.6 Discussion

Finding the fitness between models and experimental data is an interesting application of the integration of modeling and experimental studies, and this chapter has presented a case study of it. Experimentalists have accumulated a large amount of experimental data, and useful knowledge could be discovered by analyzing these data at a large scale. Therefore, an extended integration architecture that incorporates data mining and enhances the collaboration and interactions among the simulation system, the experimental databases and the data mining system is better suited for conducting brain research. Since MDIS has laid out the framework, adding data mining becomes straightforward: the data mining system can be attached to MDIS as a stand-alone module, just like the modeling and simulation system and the experimental databases. XML has been proposed as the data transfer mechanism in our application; it provides flexibility and efficiency in message passing, since the raw data are encapsulated in a file. An application-dependent interaction protocol, the ModelData protocol, has been designed to support communication among the components. Although the prototype is built for a neural modeling application, the techniques presented here can be used in many other applications.

To fully implement the system integration, we anticipate that each component will perform certain functions when it receives certain messages. Therefore, to further automate our integration scheme, we can build a state machine for each component to facilitate message sending and function selection. By incorporating a state machine into each component, we can achieve automation of the communication among the components, which will be interesting future work.
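As a hint of what such a component state machine might look like, the sketch below drives a component from a transition table keyed by the current state and the kind of incoming message. The states, message kinds and table-driven design are our own illustrative assumptions; the thesis leaves the concrete state set to future work.

#include <functional>
#include <map>
#include <utility>

// Illustrative states and message kinds for a component; both are assumptions.
enum class State { Idle, AwaitingResponse, Processing };
enum class Msg { Request, Data, Ack, Timeout };

struct Transition {
    std::function<void()> action;   // e.g., run a query or send a reply
    State next;                     // state to enter after the action
};

// A table-driven state machine: on each incoming ModelData message, look up
// the (state, message) pair, run the associated function, and change state.
class ComponentFSM {
    State state = State::Idle;
    std::map<std::pair<State, Msg>, Transition> table;
public:
    void addRule(State s, Msg m, Transition t) { table[{s, m}] = t; }
    void onMessage(Msg m) {
        auto it = table.find({state, m});
        if (it == table.end()) return;               // no rule: ignore the message
        if (it->second.action) it->second.action();  // perform the selected function
        state = it->second.next;
    }
};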
Chapter 8: Conclusions and Future Work

In this chapter, the results of the thesis are summarized first; some directions for future work are then presented.

8.1 Summary of the Thesis

In this thesis, an integrated architecture is outlined for modeling and simulation, experimental databases and data mining. The development of the modeling and simulation system EONS, the neuroscience databases and the data mining system NeuroMiner has been discussed in the respective chapters. Special emphasis has been placed on developing MDIS, a Model/Data Integration System. MDIS allows models to be linked with relevant data in the experimental databases and facilitates these connections through data access, model/data representation and integration methods. It incorporates several mechanisms: the mapping of experimental protocols to simulation protocols, a protocol specification language and semantic model, a metadata model, a hypothesis-driven approach to identify key data in the databases, a Model-DataView structure, a controller, a query manager and a GUI. Adding data mining to this architecture aims to give experimentalists and modelers an environment in which they have easy access to all of the components and can perform the functions of the components interactively and incrementally.

The development of EONS provides methodologies for modeling neural systems at different levels of granularity and for capturing the interactions among processes at different levels, i.e., from the cellular / molecular level, to the synaptic level, to the neuron level and the network level. The core of the simulation tool EONS (Elementary Objects of Neural Systems) is an object-oriented library comprising neural models, from networks, neurons and synapses down to molecules, in a hierarchical structure. Its modular design makes it possible to reuse existing modules when building new modules and to modify them without affecting other modules.

Experimentalists have accumulated many different kinds of experimental data, and organizing these data is very challenging work. We have proposed constructing a data warehouse to store them; facilitated by this architecture, more advanced data analysis can be performed. To this end, we have developed a flexible matching mechanism to find similar sequences in the experimental databases. This mechanism is supported by our Sh-index and the data warehouse architecture, and it is the prototype of our proposed NeuroMiner, a data miner for neuroscience data.

A protocol-based scheme is used by both the simulation system and the neurobiological database, which stores the information about the experiments, to provide the linkage between them. A prototype for protocol-based simulation, in which the information about a model is organized in the same way as the specification of an experiment, has been developed to connect modeling and experimental studies. A protocol specification language (PSL) and semantic models for the protocols have been designed. By using PSL, the common representation language for both experimental protocols and simulation protocols, we can constrain user queries with simulation protocol / experimental protocol terms. The metadata model, which includes several matrices, has been developed to facilitate the mapping. The hypothesis-driven approach provides the mechanism to identify key data in the experimental databases and helps formalize the query definitions for the models.

To test the accuracy of models, it is important to simulate them with real data from experimental databases. However, experimental databases and model databases are usually separate, each with its own structure, and the models created have different versions and can be used to verify different hypotheses. The Model-DataView units are designed to make querying the databases easier, since default queries are stored in them. All of the above methods together form the Model/Data Integration System (MDIS). We have used the mechanisms in MDIS to generate queries that retrieve relevant experimental results from the databases in order to test the fitness between different models and certain experimental data.

Adding data mining to MDIS extends the integrated architecture. The three components in our environment, namely the modeling and simulation system, the data mining system and the experimental databases, are stand-alone systems, and connections among them are built to facilitate communication. Towards this end, a message passing mechanism for data transfer through XML and an application-dependent interaction protocol have been developed to enhance the communication services specific to our application domain requirements.

8.2 Future Work

EONS is developed as a library of elementary neural objects which can be reused by other modules. Thus, linking EONS as a library into NSLJ is interesting future work. NSLJ provides the capability of simulating large numbers of neurons; combined with the detailed modeling methods in EONS, it would become a very flexible and powerful simulation language.
Our proposed data mining system, NeuroMiner, currently provides only a time series data matching mechanism. The data miner can be enriched by adding further data mining functions, such as cross mining of experimental protocols and experimental recordings, association rule mining on the experimental protocols, attribute induction and multidimensional analysis. Multi-dimensional analysis solutions, commonly referred to as On-Line Analytical Processing (OLAP) solutions, offer an extension to the relational model that provides a multi-dimensional view of the data. Multi-dimensional data structures provide both mechanisms to store the data and a basis for decision making.

We have designed the Model-DataView structure in MDIS. At present, the Model-DataView units are built manually; automating their construction with the help of the GUI could be future work.

The query manager in our architecture has not been implemented. Most of its functionality can be realized by a general database query engine; the implementation of a more sophisticated query manager for cooperative query processing would be interesting future work.

The mechanisms in the MDIS architecture have been used to facilitate testing the fitness between models and experiments. The ultimate goal of this work is to make the testing process automatic. Therefore, finding the fitness between a model and an experiment, as well as systematically tuning system parameters to match simulation results with experimental results, is interesting future work.

The implementation of XML can be conducted easily if we have a database that supports XML, such as Oracle. To implement the system integration, a state transition machine in each component will help to automate the processes of running certain functions and sending messages from one component to another. Thus, incorporating the functionality of a state transition machine into each component will give users the capability to perform different functions interactively and incrementally.

The GUI component is not emphasized in this thesis. Future work could develop a friendly and comprehensive user interface for users to input requirements and view the results from the simulation system, the databases, the data mining system and MDIS. With a powerful GUI and the message connection mechanisms, we can envision a semi-automated model/data integration system.

This integrated architecture can be extended by taking into consideration the issues of distributed systems and parallel processing. The proposed architecture assumes that there is a central data warehouse; however, we could also build several data marts for specific experimental labs, providing a distributed data warehouse environment. Consequently, parallel data mining systems could be used for data mining on the different data warehouses.

References

Adali, S., K. Candan, et al. (1996). Query caching and optimization in distributed mediator systems. Proc. of ACM SIGMOD, Montreal, Canada.

Agrawal, R., C. Faloutsos, et al. (1993). Efficient similarity search in sequence databases. Proc. of the 4th Int'l Conference on Foundations of Data Organization and Algorithms, Chicago.
Agrawal, R., K. Lin, et al. (1995). Fast similarity search in the presence of noise, scaling, and translation in time-series databases. Proc. of the 21st Int'l Conference on Very Large Databases, Zurich, Switzerland.

Agrawal, R., G. Psaila, et al. (1995). Querying shapes of histories. Proceedings of the 21st VLDB Conference, Zurich, Switzerland.

Ahmed, R., et al. (1991). "The Pegasus heterogeneous multidatabase system." IEEE Computer 24(12): 19-27.

Ambros-Ingerson, J. and G. Lynch (1993). "Channel gating kinetics and synaptic efficacy: A hypothesis for expression of long-term potentiation." Proc. Natl. Acad. Sci.

Anderson, J. A. (1997). An Introduction to Neural Networks. MIT Press.

Apparao, V. and S. Byrne, Eds. (1998). Document Object Model (DOM) Level 1 Specification.

Ballard, D. H. (1997). An Introduction to Natural Computation. Cambridge, MA, MIT Press.

Batini, C., M. Lenzerini, et al. (1986). "A comparative analysis of methodologies for database schema integration." ACM Computing Surveys 18(4): 323-363.

Beckmann, N., H.-P. Kriegel, et al. (1990). The R*-tree: An efficient and robust access method for points and rectangles. ACM SIGMOD.

Berndt, D. J. and J. Clifford (1994). Using dynamic time warping to find patterns in time series. KDD-94: AAAI Workshop on Knowledge Discovery in Databases, Seattle, Washington.

Bontempo, C. (1995). DataJoiner for AIX. IBM Corporation.

Bose, N. K. and P. Liang (1996). Neural Network Fundamentals with Graphs, Algorithms, and Applications. McGraw-Hill Inc.

Bower, J. and D. Beeman (1994). The Book of GENESIS: Exploring Realistic Neural Models with the General Neural Simulation System. Springer-Verlag.

Brachman, R. J., D. L. McGuinness, et al. (1991). "Living with CLASSIC: When and how to use a KL-ONE-like language." Principles of Semantic Networks.

Brachman, R. J. and J. G. Schmolze (1985). "An overview of the KL-ONE knowledge representation system." Cognitive Science 9.

Bray, T., J. Paoli, et al. (1998). Extensible Markup Language (XML) 1.0.

Breitbart, Y. (1990). "Multidatabase interoperability." ACM SIGMOD Record 19(3): 53-60.

Breitbart, Y., P. L. Olson, et al. (1986). Database integration in a distributed heterogeneous database system. Proceedings of the Second International Conference on Data Engineering.

Brosda, V. and G. Vossen (1988). "Update and retrieval in a relational database through a universal schema interface." ACM Transactions on Database Systems 13(4): 449-485.

Carey, M., et al. (1995). "Towards heterogeneous multimedia information systems: The Garlic approach." Proc. of the Intl. Workshop on Research Issues in Data Engineering (March): 124-131.

Carnevale, N. T. and S. Rosenthal (1992). "Kinetics of diffusion in a spherical cell: I. No solute buffering." J. Neurosci. Meth. 41: 205-216.

Chang, T.-H. and E. Sciore (1992). "A universal relation data model with semantic abstractions." IEEE Transactions on Knowledge and Data Engineering 4(1): 23-33.

Chawathe, S., H. Garcia-Molina, et al. (1994). The TSIMMIS project: Integration of heterogeneous information sources. Proceedings of the 100th IPSJ Conference, Computer Society of IEEE, Tokyo, Japan.

Comer, D. (1979). "The ubiquitous B-tree." ACM Computing Surveys.

Dashti, A. E., S. Ghandeharizadeh, et al. (1997). "Database challenges and solutions in neuroscientific applications." NeuroImage 5: 97-115.
Faloutsos, C., M. Ranganathan, et al. (1994). Fast subsequence matching in time-series databases. ACM SIGMOD Conference.

Forss, J., D. Beeman, et al. (1999). "The Modeler's Workspace: A distributed digital library for neuroscience." Future Generation Computer Systems 16: 111-121.

Grethe, J. S. (1997). NeuroCore database description. In: NeuroInformatics Workbench: Experimental Database Guide. University of Southern California Brain Project.

Grethe, J. S. (1997b). Building a time-series database for in-vivo neurophysiology. In: NeuroInformatics Workbench: Experimental Database Guide. University of Southern California Brain Project.

Guttman, A. (1984). R-trees: A dynamic index structure for spatial searching. Proc. ACM SIGMOD Int. Conf. on Management of Data.

Haas, L. M., R. J. Miller, et al. (1999). "Transforming heterogeneous data with database middleware: Beyond integration." IEEE Data Engineering Bulletin.

Halsall, F. (1996). Data Communications, Computer Networks and Open Systems. Addison-Wesley Publishing Company.

Hammer, J., M. Breunig, et al. (1997). Template-based wrappers in the TSIMMIS system. Proceedings of the Twenty-Sixth SIGMOD International Conference on Management of Data, Tucson, Arizona.

Hebb, D. O. (1949). The Organization of Behavior. New York, Wiley.

Hines, M. (1984). "Efficient computation of branched nerve equations." Int. J. Bio-Med. Comput. 15: 69-76.

Hines, M. (1989). "A program for simulation of nerve equations with branching geometries." Int. J. Bio-Med. Comput. 24: 55-68.

Hines, M. (1993). NEURON: A program for simulation of nerve equations. In: Neural Systems: Analysis and Modeling. F. Eeckman: 127-136.

Hines, M. (1994). The NEURON simulation program. In: Neural Network Simulation Environments. J. Skrzypek. Norwell, MA, Kluwer: 147-163.

Hines, M. and N. T. Carnevale (1997). "The NEURON simulation environment." Neural Computation 9(6).

Hodgkin, A. L. and A. F. Huxley (1952). "A quantitative description of membrane current and its application to conduction and excitation in nerve." J. Physiol. 117: 500-544.

Ives, Z. G., D. Florescu, et al. (1999). An adaptive query execution system for data integration. ACM SIGMOD Conference on Management of Data, Philadelphia, PA.

Jagadish, H. V. (1991). A retrieval technique for similar shapes. Proc. ACM SIGMOD Conference.

Jannink, J., P. Mitra, et al. (1999). An algebra for semantic interoperation of semistructured data. 1999 IEEE Knowledge and Data Engineering Exchange Workshop (KDEX'99), Chicago.

Josifovski, V. and T. Risch (1999). Integrating heterogeneous overlapping databases through object-oriented transformations. Proceedings of the 25th VLDB Conference, Edinburgh, Scotland.

Kahng, J. and D. McLeod (1996). Dynamic classificational ontologies for discovery in cooperative federated databases. Proceedings of the First IFCIS International Conference on Cooperative Information Systems (CoopIS'96), Brussels, Belgium.

Kahng, J. and D. McLeod (1998). Dynamic classificational ontologies: Mediation of information sharing in cooperative federated database systems. In: Cooperative Information Systems: Trends and Directions. M. P. Papazoglou and G. Schlageter. London, United Kingdom, Academic Press: 179-203.
Keogh, E. and P. Smyth (1997). A probabilistic approach to fast pattern matching in time series databases. Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, Newport Beach, California.

Labio, W. J., Y. Zhuge, et al. (1997). The WHIPS prototype for data warehouse creation and maintenance. Demonstration description, Proceedings of the ACM SIGMOD Conference, Tucson, Arizona; the demo was also given at the International Conference on Data Engineering, Birmingham, UK, April 1997.

Larson, J. A., S. B. Navathe, et al. (1989). "A theory of attribute equivalence in databases with application to schema integration." IEEE Transactions on Software Engineering 15(4): 449-463.

LeCun, Y. (1986). Learning processes in an asymmetric threshold network. In: Disordered Systems and Biological Organization. E. Bienenstock, F. Fogelman-Soulié and G. Weisbuch. Berlin, Springer-Verlag.

Lester, R. A. J. and C. E. Jahr (1992). "NMDA channel behavior depends on agonist affinity." The Journal of Neuroscience 12(2): 635-643.

Leymann, F. (1989). "A survey of the universal relation model." Data and Knowledge Engineering 4: 305-320.

Li, C., R. Yerneni, et al. (1998). Capability based mediation in TSIMMIS. Demo, SIGMOD 98, Seattle.

Li, C.-S., P. S. Yu, et al. (1996). HierarchyScan: A hierarchical similarity search algorithm for databases of long sequences. Proc. of the 12th International Conference on Data Engineering, Louisiana.

Liaw, J.-S., Y. Shu, et al. (1996). EONS: A multi-level modeling system. 26th Annual Meeting of the Society for Neuroscience, Washington, D.C.

Liaw, J.-S., Y. Shu, et al. (2001). EONS: A multi-level modeling system and its applications. In: Computing the Brain: A Guide to Neuroinformatics. M. Arbib and J. Grethe. Academic Press.

Marchiori, M. (1998). Proceedings of QL'98 - The Query Languages Workshop. Boston, MA.

McCulloch, W. S. and W. Pitts (1943). "A logical calculus of the ideas immanent in nervous activity." Bull. Math. Biophys. 5: 115-133.

Milo, T. and S. Zohar (1998). Using schema matching to simplify heterogeneous data translation. Proceedings of the 24th VLDB Conference, New York.

Mirsky, J. S., P. M. Nadkarni, et al. (1998). "Database tools for integrating and searching membrane property data correlated with neuronal morphology." Journal of Neuroscience Methods 82: 105-121.

Mitra, P., G. Wiederhold, et al. (1999). Semi-automatic integration of knowledge sources. Proceedings of Fusion '99, Sunnyvale, CA.

Montague, P. R., J. A. Gally, et al. (1991). "Spatial signaling in the development and function of neural connections." Cerebral Cortex 1: 1-20.

Montague, P. R. and T. J. Sejnowski (1994). "The predictive brain: Temporal coincidence and temporal order in synaptic learning mechanisms." Learning & Memory 1. Cold Spring Harbor Laboratory Press.

Moore, J. W. and M. L. Hines (1994). "Simulations with NEURON."

Naumann, F. and U. Leser (1999). Quality-driven integration of heterogeneous information systems. Proceedings of the 25th VLDB Conference, Edinburgh, Scotland.

Papakonstantinou, Y., et al. (1995). "A query translation scheme for rapid implementation of wrappers." Proc. of the Conference on Deductive and Object-Oriented Databases.
Papakonstantinou, Y., H. Garcia-Molina, et al. (1996). "MedMaker: A mediation system based on declarative specifications." Proc. of the IEEE Conference on Data Engineering.

Papakonstantinou, Y., H. Garcia-Molina, et al. (1995). Object exchange across heterogeneous information sources. Proceedings of the Data Engineering Conference, Computer Society of the IEEE, Taipei, Taiwan.

Parker, D. (1985). Learning Logic. Cambridge, MA, Center for Computational Research in Economics and Management Science, MIT.

Reddy, M. P., B. E. Prasad, et al. (1994). "A methodology for integration of heterogeneous databases." IEEE Transactions on Knowledge and Data Engineering 6(6): 920-933.

Rosenblatt, F. (1958). "The perceptron: A probabilistic model for information storage and organization in the brain." Psychological Review 65: 386-408.

Roth, M. T. and P. Schwarz (1997). Don't scrap it, wrap it! A wrapper architecture for legacy data sources. Proc. of the 23rd VLDB Conference, Athens, Greece.

Rumelhart, D. E. and G. E. Hinton (1986). Learning internal representations by error propagation. In: Parallel Distributed Processing, Vol. 1. D. E. Rumelhart and J. L. McClelland. Cambridge, MIT Press.

Shatkay, H. and S. B. Zdonik (1996). Approximate queries and representations for large data sequences. Proc. of the 12th International Conference on Data Engineering, Louisiana.

Shaw, S. W. and R. J. P. de Figueiredo (1990). Structural processing of waveforms as trees. IEEE Trans. ASSP.

Sheth, A. P. and J. A. Larson (1990). "Federated database systems for managing distributed, heterogeneous, and autonomous databases." ACM Computing Surveys 22(3): 183-235.

Shu, Y., J.-S. Liaw, et al. (1997). Protocol-based simulation with EONS. 27th Annual Meeting of the Society for Neuroscience, New Orleans.

Shu, Y. (1997b). Synaptic Modeling with AMPA and NMDA Receptors. Los Angeles, Dept. of Computer Science, University of Southern California.

Shu, Y., J.-S. Liaw, et al. (1998). Integration of different kinds of data in neurobiological databases. 28th Annual Meeting of the Society for Neuroscience, Los Angeles.

Shu, Y., J.-S. Liaw, et al. (1999). Data mining for neuroscience databases. 29th Annual Meeting of the Society for Neuroscience.

Shu, Y., X. Xie, et al. (1999). "A protocol-based simulation for linking computational and experimental studies." Neurocomputing 26-27: 1039-1047.

Shu, Y., et al. (2000). Supporting connections between simulator and databases. 11th International Conference of the Information Resources Management Association, Anchorage, Alaska.

Tomasic, A., L. Raschid, et al. (1996). "Scaling heterogeneous databases and the design of DISCO." Proc. ICDCS.

von der Malsburg, C. (1973). "Self-organization of orientation sensitive cells in the striate cortex." Kybernetik 14: 85-100.

von der Malsburg, C. and D. J. Willshaw (1977). "How to label nerve cells so that they can interconnect in an ordered fashion." Proc. Natl. Acad. Sci. 74: 5176-5178.

Wang, J. T., K. Zhang, et al. (1994). "A system for approximate tree matching." IEEE TKDE.

Weiss, S. M. and N. Indurkhya (1998). Predictive Data Mining: A Practical Guide. Morgan Kaufmann Publishers, Inc. ISBN 1-55860-403-0.

Weitzenfeld, A., M. Arbib, et al. (1997). NSL: Neural Simulation Language, Modeler's Guide (alpha 2), version 3.0.a. Brain Simulation Laboratory, Center for Neural Engineering, University of Southern California.
Werbos, P. J. (1974). Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. Cambridge, MA, Harvard University.

Widrow, B. and M. Hoff (1960). "Adaptive switching circuits." 1960 WESCON Convention Record: 96-104.

Xie, X., J.-S. Liaw, et al. (1997). "Novel expression mechanism for synaptic potentiation: Alignment of presynaptic release site and postsynaptic receptor." Proceedings of the National Academy of Sciences 94: 6983-6988.

Zador (1990). "Biophysical model of a Hebbian synapse." Proc. Natl. Acad. Sci. 87 (Neurobiology): 6718-6722.

Zhao, J. L. (1997). "Schema coordination in federated database management: A comparison with schema integration." Decision Support Systems 20: 243-257.

Appendix: Description of the Programs in the Thesis

1. Name: spine.cc
Purpose: this program models a synapse as a combination of different numbers of AMPA and NMDA receptors. The number of receptors can be varied to obtain different EPSPs, which can be compared with experimental EPSP results.
Address: /export/home/hbptwb-00/yingshu/Demo/AMPA_NMDA/spine.cc
Language: C++
Number of lines: 382
How to use: to execute, type "spine".

Class name: spine
Purpose: a unit consisting of a number of AMPA and NMDA receptors.
Procedure: calculate_d(), calculates the EPSP generated by the spine.
How to use: it uses the AMPA and NMDA classes.

Class name: AMPA
Purpose: model the AMPA receptor.
Procedures: calculate_d(), calculates the EPSC generated by the AMPA receptor; rk4(float *y, float *dydx, int n, float x, float h, float *yout), a standard fourth-order Runge-Kutta routine for integrating the differential equations that give the EPSC of the AMPA receptor.
How to use: it is used by the Spine class.

Class name: NMDA
Purpose: model the NMDA receptor.
Procedures: calculate_d(), calculates the EPSC generated by the NMDA receptor; rk4(float *y, float *dydx, int n, float x, float h, float *yout), a standard fourth-order Runge-Kutta routine for integrating the differential equations that give the EPSC of the NMDA receptor.
How to use: it is used by the Spine class.

2. Name: spine_phy.cc
Purpose: this program models a synapse as a combination of AMPA and NMDA receptors. The number of receptors can be varied to obtain different EPSPs, which can be compared with experimental EPSP results.
Address: /export/home/hbptwb-00/yingshu/Demo/Biophys/spine_phy.cc
Language: C++
Number of lines: 274
How to use: to execute, type "spine_phy".

Class name: spine
Purpose: a unit consisting of a number of AMPA and NMDA receptors.
Procedure: calculate_V(), calculates the EPSP generated by the spine.
How to use: it uses the AMPA and NMDA classes.

Class name: AMPA
Purpose: model the AMPA receptor.
Procedure: calculate_d(), calculates the EPSC generated by the AMPA receptor.
How to use: it is used by the Spine class.

Class name: NMDA
Purpose: model the NMDA receptor.
Procedure: calculate_d(), calculates the EPSC generated by the NMDA receptor.
How to use: it is used by the Spine class.
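For reference, the rk4 routine named above follows the classic fourth-order Runge-Kutta integration scheme. The sketch below is a generic reconstruction based on the routine's signature, not the thesis code itself; the vector-based interface is our own simplification.

#include <vector>

// A generic fourth-order Runge-Kutta step, sketched after the rk4 signature
// used in the appendix programs. deriv(x, y, dydx) must fill dydx with the
// right-hand side of the system dy/dx = f(x, y).
template <typename Deriv>
void rk4_step(std::vector<float> &y, float x, float h, Deriv deriv) {
    const int n = (int)y.size();
    std::vector<float> k1(n), k2(n), k3(n), k4(n), yt(n);
    deriv(x, y, k1);                                       // slope at the start
    for (int i = 0; i < n; ++i) yt[i] = y[i] + 0.5f * h * k1[i];
    deriv(x + 0.5f * h, yt, k2);                           // slope at the midpoint
    for (int i = 0; i < n; ++i) yt[i] = y[i] + 0.5f * h * k2[i];
    deriv(x + 0.5f * h, yt, k3);                           // second midpoint slope
    for (int i = 0; i < n; ++i) yt[i] = y[i] + h * k3[i];
    deriv(x + h, yt, k4);                                  // slope at the end
    for (int i = 0; i < n; ++i)                            // weighted average
        y[i] += (h / 6.0f) * (k1[i] + 2.0f * k2[i] + 2.0f * k3[i] + k4[i]);
}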
3. Name: yn.cc
Purpose: this program models a neural network in which the connection between neurons is replaced by presynaptic procedures instead of weighted links. This architecture shows the flexibility of multi-level modeling of neural systems.
Address: /export/home/hbptwb-00/yingshu/Demo/Eons/yn.cc
Language: C++
Number of lines: 1868
Main procedures: Set_up_net(input_neuro, output_neuro), connects a network of neurons. Initial_neuron(input_neuro, output_neuro), initializes the variables in the Neuron class. YN_neurons(input_neuro, output_neuro, input_data), simulates the neural network. Learn_YN(Neuron, Neuron, Neuron), performs learning in the neural network. Normalize(int, int, int, Neuron, Neuron), normalizes the values of the parameters in the presynaptic procedure.
How to use: to execute, type "yn".

Class name: Neuron
Purpose: consists of pre-terminals and spines.
Procedure: run_module_neuron(float), calculates the membrane potential of a neuron and checks whether it passes the threshold.
How to use: it uses the Pre-terminal and Spine classes.

Class name: Pre-terminal
Purpose: model the presynaptic terminal.
Procedures: run_module_bouton(), calculates the neurotransmitter release generated by the pre-terminal; rk4(float *y, float *dydx, int n, float x, float h, float *yout), a standard Runge-Kutta routine for integrating the differential equations of neurotransmitter release in the pre-terminal.
How to use: it is used by the Neuron class.

Class name: Spine
Purpose: model the spine.
Procedures: run_module_bouton(), calculates the EPSP generated by the spine; rk4(float *y, float *dydx, int n, float x, float h, float *yout), a standard Runge-Kutta routine for integrating the differential equations of the EPSP of the spine.
How to use: it is used by the Neuron class.

4. Name: ControlDB.java
Purpose: given a sample time series sequence, this program finds similar ones in a group of time series sequences by flexible shape matching. The matching criteria are determined by the users.
Address: /export/home/hbptwb-00/yingshu/Demo/Neurominer/*.java
Language: Java
Number of lines: 578
How to use: to execute, type "java ControlDB".

Class name: UnitRecord
Purpose: read sequences of time series into a local data repository. Each sequence is labeled by Up, Down and Stable to represent its series of shapes.
Procedures: read_objlist(String name), labels each base recording in a sequence. read_uplist(String name), labels Up parts in a sequence. read_downlist(String name), labels Down parts in a sequence. read_stablelist(String name), labels Stable parts in a sequence.

Class name: Sequence
Purpose: get the slopes of the specified sequence.
Procedure: check_slope(int degree), gets the slopes for the specified number of segments.

Class name: RecordDB
Purpose: find time series in the repository that are similar to the given time series.
Procedures: match_seq_Obj(), matches the base recording object. match_seq_Up(), matches the Up parts of a sequence. match_seq_Down(), matches the Down parts of a sequence.

Class name: ControlDB
Purpose: control the input and the display of output.
Procedure: main(), gets the user's input and calls the respective functions to perform the requirements.
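To illustrate the Up / Down / Stable shape labeling that this program relies on, the following sketch segments a sequence by the sign of its slope. It is written in C++ for consistency with the other sketches given here; the tolerance and the labeling rule are illustrative assumptions, not the actual criteria in ControlDB.java.

#include <vector>

// Illustrative shape labels, as used by the flexible matching mechanism.
enum class Shape { Up, Down, Stable };

// Label each step of a time series by the sign of its slope. The tolerance
// eps deciding what counts as Stable is an assumption for illustration.
std::vector<Shape> labelShapes(const std::vector<float> &seq, float eps = 0.01f) {
    std::vector<Shape> labels;
    for (size_t i = 1; i < seq.size(); ++i) {
        float slope = seq[i] - seq[i - 1];
        if (slope > eps)       labels.push_back(Shape::Up);
        else if (slope < -eps) labels.push_back(Shape::Down);
        else                   labels.push_back(Shape::Stable);
    }
    return labels;   // two sequences match loosely when their label strings match
}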
5. Name: modelview.cpp
Purpose: this program translates a set of simulation protocol parameters into their corresponding experimental database attributes, based on the PSL and connection matrix specifications. The output can be used to construct queries for finding similar experiments in the databases.
Address: /export/home/hbptwb-00/yingshu/Demo/ModelView/modelview.cc
Language: C++
Number of lines: 446
Main procedures: main(), controls the input and the display of output. tranc_elec(int which, ofstream &output_obj), translates electrical manipulation parameters into experimental electrical manipulation attributes and writes them into an output file. tranc_pharm(int which, ofstream &output_obj), translates pharmacological manipulation parameters into experimental pharmacological manipulation attributes and writes them into an output file. tranc_cond(int which, ofstream &output_obj), translates simulation condition parameters into experimental condition attributes and writes them into an output file.
How to use: type "modelview #i config#". Config# contains the attribute connection information.

6. Name: Integr.cpp
Purpose: this program simulates three nodes, each of which can send messages to the others. The three nodes represent the simulator, an experimental database and a data mining component. When a node receives a message, it can perform the function requested in the message.
Address: /export/home/hbptwb-00/yingshu/Demo/Integr/integr.cc
Language: C++
Number of lines: 652
Main procedures: main(), controls the simulation time. tranc_process(int NodeId, ofstream &output_obj), through sockets, each node receives and processes messages. makeUnicastTable(i), for each node, constructs the connections with the other nodes. buildEventList(i), for each node, constructs its event list. Send(int node), each node can send packets to other nodes. Receive(), each node can receive packets. make_socket(), sets up the socket for each node.
How to use: to execute, type "run" (run it in the background).