Modeling Biochemical Components and Systems through Data
Integration and Digital Arts Approaches
by
Kyle Mathew McClary
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
Chemistry
May 2020
Copyright 2020 Kyle Mathew McClary
Table of Contents
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
Diabetes and Modeling the Pancreatic Beta Cell
Diabetes: a widespread metabolic disease regulated by insulin activity
Determining PBC insulin production mechanisms with cell modeling
The Melanocortin 4 Receptor (MC4R) and appetite regulation
MC4R: a metabolism regulating receptor with therapeutic potential
MC4R drug development, structural modeling, and mutagenesis
Data integration via digital arts approaches
Visualization as a means for integrating disparate data
Chapter 2: Omics and Ultrastructural Analysis of the Pancreatic Beta Cell
Background: PBC omics and imaging
PBC omics and imaging toward a whole-cell model
PBC omics for component lists
Structure availability analysis
Proteomic and transcriptomic profiling of PBC lines 1.1B4 and INS1-e
Ultrastructure profiling of PBC line INS1-e via soft X-ray tomography
Results and Discussion
Global Proteomic Analysis in 1.1B4 Whole Cell Lysates through comparative Fractionation Techniques
1.1B4 Proteomics and Transcriptomics Consistency Analysis
INS1E whole-cell and organelle proteomics
Ultrastructural mapping of 1.1B4 and INS1E via soft X-ray tomography
Methods
Chapter 3: Toward the Structure of the Melanocortin 4 Receptor
Background: GPCR structural and functional investigations
MC4R: an attractive target for structure based drug design
Results and Discussion
MC4R expression optimization
MC4R purification optimization
GFP insertion to measure protein purity and monodispersity
Receptor-stabilizing point mutations
Ligand co-purification for improved receptor stability
Construct and ligand pair optimization
Thermostability assessment
Combining stabilizing point mutations and ligands
Negative stain as a screening technique for GPCR purity and monodispersity
Methods
Chapter 4: Modeling Biochemical Systems with Digital Arts Platforms
Introduction: Communicating Science Visually
Developing Tools for Improved Microscope Image Viewing, Markup, and Sharing
Microscape: A point cloud rendering system for viewing microscope images
Importing and rendering microscope images in the Unity Game Engine
Image modification
Conclusion
Developing data-driven interactive experiences of PBC environments
The World in a Cell: A virtual exploration of PBC biology
Modeling the PBC as an input-output machine
Representing PBC molecular signaling pathways
Modeling PBC components in a scalable form-language
Storyboarding the interactions between PBC components
Current phase and next steps
Chapter 5: Communicating Science through Digital Arts
Science is becoming more prevalent in the media and popular culture
Artistic practices benefit scientists in their communications and productivity
Creating media to communicate science through digital arts approaches
References
List of Tables
Table 1. Comparison of discovered protein IDs in 1.1B4 whole-cell lysates using four prefractionation methods
Table 2. Top fifteen most abundant 1.1B4 genes/proteins present in both transcriptome and proteome datasets, ranked by transcript frequency
Table 3. INS1E protein identification summary for whole-cell lysate and organelle fractions
Table 4. 50 cell components that were modeled in phase I of the World in a Cell
List of Figures
Figure 1. Ligand cross-reactivity in the melanocortin system
Figure 2. MC4R signaling pathways
Figure 3. Structure predictions solved by Foldit players
Figure 4. Structural coverage of PBC proteins identified in analyses of: (a) transcriptomic; and (b) proteomics data, which were categorized according to the amount of available structural information
Figure 5. Sample preparation and analysis pipeline for PBC proteomics
Figure 6. Integrated proteomics pipeline to identify peptides from raw mass spectra, match peptides to parent proteins, and perform statistical significance assessments on the results
Figure 7. 1.1B4 whole-cell lysate, molecular function gene ontology analysis
Figure 8. 1.1B4 whole-cell lysate, protein class gene ontology analysis
Figure 9. Rank Rank HyperGeometric Overlap visualization of relationship between ranked lists of 1.1B4 transcripts generated with RNA-seq and 1.1B4 proteins generated with mass-spectrometry proteomics
Figure 10. Rank Rank HyperGeometric Overlap visualization of relationship between ranked lists of INS1E transcripts generated with RNA-seq and INS1E proteins generated with mass-spectrometry proteomics
Figure 11. Distribution of INS1E total protein molecular function gene ontology classifications
Figure 12. Distribution of INS1E total protein class gene ontology classifications
Figure 13. Identification of INS-1E ultrastructural components via LAC value
Figure 14. GPCR superfamily. Melanocortin family is circled in red
Figure 15. MC4R snakeplot denoting N- and C-terminal truncation sites (red) and pairs of ICL3 fusion protein insertion sites (blue and green)
Figure 16. GFP facilitated comparison of three fusion insertion constructs’ purity and monodispersity
Figure 17. Four mutations improving MC4R purity and monodispersity in SEC
Figure 18. Structures of agonists and antagonists tested for MC4R stability inducement
Figure 19. Determination of ten ligands yielding pure and monodisperse MC4R when purified with BRIL6 + N2.57L construct
Figure 20. Thermostability assay of pure and monodisperse MC4R fusion and point mutated construct with SHU9119 copurification
Figure 21. Purity and monodispersity is retained in MC4R BRIL fusion and N2.57L mutated construct after removal of GFP
Figure 22. Thermostability is retained in MC4R BRIL fusion and N2.57L mutated construct after removal of GFP
Figure 23. In addition to N2.57L, the E1.37V, G2.58V, and S2.59F mutations produced pure and monodisperse MC4R upon purification in the presence of SHU9119
Figure 24. Point mutation combinations led to pure and monodisperse MC4R upon copurification with SHU9119
Figure 25. Micrograph from negative stain electron microscopy imaging of MC4R copurified with SHU9119
Figure 26. Micrograph from negative stain electron microscopy imaging of MC4R copurified with [Nle4,D-Phe7]-α-MSH
Figure 27. Micrograph from negative stain electron microscopy imaging of MC4R copurified with MT-II
Figure 28. Micrograph from negative stain electron microscopy imaging of MC4R copurified with MT-II, with aggregation
Figure 29. Rendering pipeline diagram
Figure 30. Point cloud rendering inside Unity development environment
Figure 31. Fluorescent confocal microscope image rendered in Microscape with a high step count and a low step count
Figure 32. Camera capture and annotation of microscope image within Microscape
Figure 33. Example PBC pathway diagram
Figure 34. Visual language development for World in a Cell components
Figure 35. Storyboard of GLP1R activation process
Figure 36. Relative scale and frequency of components within a PBC
Abstract
This thesis explores aspects of biological modeling and data visualization as well as
biochemical experiments toward generating information regarding the components and
infrastructure of the cell. Experimental results focus on protein structure determination, cellular
proteomics, transcriptomics, fluorescence and X-ray based cell imaging, along with computational
work to model and visualize biochemical components at the atomic, organelle, and cellular scales.
The biological focus centers on aspects of metabolic regulation, including the pancreatic beta cell
(PBC), a highly-differentiated cell type in the pancreas responsible for producing insulin; as well
as the melanocortin 4 receptor (MC4R), a protein involved in the regulation of appetite and energy
homeostasis.
Chapter 1: Introduction
This thesis explores aspects of biological modeling and data visualization as well as
biochemical experiments toward generating information regarding the components and
infrastructure of the cell. The chapters that follow present results from experiments focused on
protein structure determination, cellular proteomics, transcriptomics, fluorescence and X-ray
based cell imaging, along with computational work to model and visualize biochemical
components at the atomic, organelle, and cellular scales. The biological focus centers on aspects
of metabolic regulation, including the pancreatic beta cell (PBC), a highly-differentiated cell type
in the pancreas responsible for producing insulin; as well as the melanocortin 4 receptor (MC4R),
a protein involved in the regulation of appetite and energy homeostasis.
Chapter 2 focuses on integrating data regarding cellular components and infrastructure in
an effort to generate models of cellular systems. Here, the focus was to develop frameworks for
modeling the activity of the PBC, specifically, its mechanisms for insulin production and secretion.
To this end, data were collected, analyzed, and integrated concerning the frequency and identities
of proteins (proteomics) and transcripts (transcriptomics) within PBCs, the structural availability
of proteins within PBCs (structural bioinformatics), and the mesoscale composition of PBC
organelles (fluorescent and X-ray based imaging). Efforts toward the structural determination of
MC4R are presented in Chapter 3, where the fundamentals of protein expression, purification, and
engineering are addressed along with protein crystallography, electron microscopy, and the
structural modeling of macromolecules.
Chapter 4 of this thesis focuses on using three- and four-dimensional entertainment media
development platforms, like Unity and Maya, to model the cell and its components, and to visualize
biological imaging data. As science labs are generating more data than ever, researchers are
working tirelessly to process information into digestible abstractions. Data are growing not only in
abundance but also in complexity, and we must rely on multidimensional visualizations in order to
build models of increasingly complex biological processes. To explore potential solutions to such
challenges, I had the opportunity to create two interactive applications for viewing and exploring
biological data. Specifically, Microscape takes point cloud images from confocal fluorescent
microscopes and renders them inside the Unity Game Engine for visualization on computer
screens, virtual reality headsets, and mobile devices. The second application, The World in a Cell,
is a virtual reality exploration of the multithreaded systems operating within a PBC.
Entertainment production platforms like Unity and Maya are beginning to fill needs in
research and communication pipelines as their robust toolkits are some of the most powerful for
creating multidimensional and interactive representations of data and knowledge. As scientific
efforts continue to work toward developing a more holistic understanding of human biological
systems through experimental techniques like imaging, omics, and macromolecular structure
determination, we are increasing our capacity for producing more complex and disparate data. In
this age of big data, it’s time to develop frameworks, as put forward in this thesis, for putting the
pieces together into more comprehensive models of biological systems. The development of
whole-cell and eventually tissue-scale models through multi-dimensional data integration
practices will allow for deeper insights into a myriad of health conditions, including diseases like
diabetes and cancer, each of which affects hundreds of millions of people worldwide.
Diabetes and Modeling the Pancreatic Beta Cell
Diabetes: a widespread metabolic disease regulated by insulin activity
Diabetes is a growing health epidemic affecting more than 400 million people worldwide [1].
The disease is characterized by the body’s inability to produce sufficient quantities of insulin.
Insulin is a hormone produced solely by specialized beta cells in the pancreas, and it regulates
glucose uptake from the blood. PBCs are surrounded by capillaries, and when they sense changes in
blood glucose levels they respond by producing insulin, which promotes cellular glucose uptake [2].
In diabetes, PBC function and mass are both decreased, which causes a loss of glycemic control [3].
Insulin itself is a protein hormone, and its synthesis, storage, and secretion are coordinated by a
host of PBC proteins. PBCs undergo signaling changes when exposed to varying amounts of
glucose, and the expression of proteins involved in insulin production processes is up- and
down-regulated as a result [4]. Improving our understanding of the changes that promote and decrease
insulin synthesis and secretion will provide leads for detecting protein malfunction in diabetes.
Diabetes is a group of metabolic diseases that, according to the International Diabetes
Federation, “occur when the pancreas is no longer able to make insulin, or when the body cannot
make good use of the insulin it produces” [5]. Diabetic patients experience excessive levels of glucose
in the blood. In a diabetic state, the body is unable to produce or use insulin to properly process
blood sugars, leading to a toxic accumulation of glucose known as hyperglycemia. Hyperglycemia can
lead to frequent urination, excessive thirst, fatigue, unexpected weight loss, and vision changes, and
over the long term it increases the risk of heart attack, stroke, kidney disease, and nerve damage [6].
Insulin plays a crucial role in diabetes since it is key to removing glucose from the bloodstream.
There are two main forms of diabetes, type I and type II. Type I diabetes is characterized by the
body’s immune system attacking PBCs, which diminishes the production of insulin in response to
high glucose levels [7]. Type I diabetes is often diagnosed in younger patients, though the disease
can develop at any age [8]. In many cases, type I diabetes patients supplement with external insulin
every day. Type II diabetes is characterized by the body’s inability to make and use insulin in an
efficient manner [9]. This form of diabetes occurs more often in adults, but can also develop at any
stage of life [10]. Type I and type II diabetes often persist within families, suggesting a strong genetic
component to the disease [11-12], though lifestyle behaviors, such as lack of exercise and poor diet,
have also been linked to diabetes, especially in the case of type II [13].
In a healthy physiological state, insulin is produced by the pancreas in a mechanism tightly
coupled with the sensing of blood glucose levels [14]. When blood glucose levels increase, insulin is
produced to remove glucose from the blood so it can be converted to energy. After insulin is
synthesized in PBCs and released into the bloodstream, the hormone interacts with the insulin
receptor on the membranes of cells throughout the body [15]. Upon insulin binding to the target
insulin receptors, a signaling cascade is initiated that ultimately increases the number of glucose
transporters embedded in the cell membrane. By increasing their number, the target cells allow more
glucose to diffuse from the bloodstream into the cytosol where it is further processed, ultimately
into ATP. Hence, insulin is a crucial regulator of all phenotypes within this class of metabolic disease.
Given their critical relationship to diabetes, insulin and its glucose-stimulated secretion from PBCs
receive much attention, and there is a long history of efforts to identify and study the metabolic
pathology [16]. The disease was first characterized in Egypt around 1500 BC as an occurrence of
frequent urination [17]. Descriptions of the disease are found in ancient Indian, Chinese, Greek, and
Arab manuscripts. Around 200 AD, Aretaeus of Cappadocia coined the name ‘diabetes’ and
provided the first accurate descriptions of its manifestations. Later, in 1776, Matthew Dobson
confirmed that the sweet-tasting urine often found in diabetes, observed centuries
earlier, was indeed due to the presence of excessive sugar. More than a century later, in 1889, Joseph
von Mering and Oskar Minkowski discovered the role of the pancreas in diabetes after removing
the pancreas from dogs and witnessing the development of symptoms found in human diabetes.
Von Mering and Minkowski’s finding was the beginning of a new chapter in diabetes research
that focused on pinpointing the systems that govern insulin production and release in the
pancreas. Following the finding that the pancreas plays a role in diabetes, Sir
Edward Albert Sharpey-Schafer discovered in 1910 that diabetes was strongly influenced by the
absence of a particular substance produced by the pancreas. He named this substance “insulin”
after the irregularly shaped patches of endocrine tissue located within the pancreas, the
“islets of Langerhans,” thought to be the locale of insulin-producing cells. Soon after, in 1921,
Frederick Banting, Charles Best, James Collip, and John Macleod were the first to extract and
purify insulin, opening new doors for the treatment of diabetes.
The precise contribution of PBC mechanisms to diabetes-related phenotypes is still largely
uncharacterized [3]. Many research modalities are being leveraged to understand PBC behavior, and
more are being adapted to delve deeper into this cell type [18]. This includes not only new
technological platforms but also biological research tools such as cell lines and assay logistics. In
many ways, the goal of PBC research, and diabetes research more broadly, is to understand the
mechanisms of insulin production so that we can engineer more optimal mechanisms of production
both in vivo and in vitro. Among many diabetes-oriented research groups, this focus on
understanding PBC mechanisms of insulin synthesis and secretion has also been adopted by the
Pancreatic Beta Cell Consortium (PBCC) [19], a team of scientists, clinicians, engineers, and digital
artists working together to develop new methods for diabetes research and intervention. The
PBCC’s mission is to develop spatiotemporal, multiscale, whole-cell models of human PBCs that
integrate data across the spectrum of research disciplines and techniques. In addition to generating
structural models of PBCs, the PBCC seeks to understand the mechanism of insulin production in
health and disease in order to create interventions for the prevention and treatment of diabetes.
Determining PBC insulin production mechanisms with cell modeling
Efforts presented in this thesis contribute to the PBCC’s overall goal of generating
spatiotemporal multiscale models both by contributing data to models in development and by
devising frameworks for integrating data between experimental platforms. My work with the
PBCC builds on not only a wealth of known information on diabetes and PBC biology, but also
on an extensive history of efforts in modeling cellular phenomena. Here, a whole-cell model
describes one or more aspects of the entire cell as a function of its components and relationships
among them. Such models increase our understanding of how the cell functions, is modulated, and
evolves with time. Recent efforts in whole-cell modeling are wide-ranging and include static visual
reconstructions of cellular landscapes, dynamic structural simulations of molecular interactions,
systems of mathematical equations recapitulating biochemical reaction pathways, and more. For
example, atomic resolution models of cytoplasmic subsections of Mycoplasma genitalium [20-21] and
Escherichia coli [22-23] were assembled and used for simulating dynamics via Brownian dynamics
(BD) or molecular dynamics (MD) to investigate diffusion and protein stability under crowded
cellular conditions. Other efforts focused on assembling 3D cellular landscapes using experimental
data, including, for example: models of HIV-1 virus and Mycoplasma mycoides using cellPACK
(a software tool that assembles large-scale models from molecular components using packing
algorithms) [24-25]; an atomic resolution snapshot of a synaptic bouton using quantitative
immunoblotting, mass spectrometry, electron microscopy, and super-resolution fluorescence
imaging [26]; and an ultrastructure model of mouse PBC using electron tomography [27]. Additionally,
mathematical models using differential equations and flux balance analysis have been used to
construct cellular [28] and metabolic networks [29] of whole cells to predict phenotype from genotype.
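To make the Brownian dynamics idea mentioned above concrete, the following is a minimal sketch of free particle diffusion in three dimensions. It is not taken from any of the cited studies: the function name `brownian_msd`, the parameter values, and the absence of crowding, forces, and boundaries are all simplifying assumptions made for illustration. The sketch only checks that the simulated mean squared displacement approaches the textbook value 6Dt for unobstructed diffusion; the crowded-cytoplasm simulations cited above add molecular shapes and interactions on top of this basic update rule.

```python
import math
import random


def brownian_msd(n_particles, n_steps, diff_coeff, dt, seed=0):
    """Mean squared displacement of free 3D Brownian particles.

    Hypothetical helper for illustration only: no crowding, no forces,
    no boundaries. diff_coeff is the diffusion coefficient D.
    """
    rng = random.Random(seed)
    # Each Cartesian displacement per step is Gaussian with variance 2*D*dt.
    sigma = math.sqrt(2.0 * diff_coeff * dt)
    total_sq = 0.0
    for _ in range(n_particles):
        x = y = z = 0.0
        for _ in range(n_steps):
            x += rng.gauss(0.0, sigma)
            y += rng.gauss(0.0, sigma)
            z += rng.gauss(0.0, sigma)
        total_sq += x * x + y * y + z * z
    return total_sq / n_particles


if __name__ == "__main__":
    # Assumed, illustrative values (e.g. D in um^2/s, dt in s).
    D, dt, steps = 10.0, 1e-3, 100
    simulated = brownian_msd(1000, steps, D, dt)
    expected = 6.0 * D * steps * dt  # free-diffusion result in 3D
    print(f"simulated MSD = {simulated:.3f}, 6*D*t = {expected:.3f}")
```

In a crowded environment the measured displacement would typically fall below the free-diffusion value, which is the kind of effect the cited BD studies set out to quantify.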
Many other platforms for modeling cellular processes using various techniques have been
developed over the last two decades. One example is VCell (Virtual Cell), a modeling platform that
simulates a variety of molecular mechanisms, including reaction kinetics, membrane transport, and
flow, using spatial restraints derived from microscope images [30-31]. Another popular cellular
modeling platform is MCell, which also uses spatial 3D cellular models and Monte Carlo methods to
simulate reactions and movement of molecules [32]. Similarly, the E-Cell platform simulates cell
behavior using differential equations and user-defined reaction rules regarding aspects like protein
function, regulation of gene expression, and protein-protein interactions [33]. Collectively, these
efforts required both an enormous amount of data and integrative computational methods. While
each of these models offered some degree of insight and represented important milestones in
whole-cell modeling, none was able to fully represent the complexity and scope of an entire cell.
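As a small illustration of the differential-equation style of modeling described above, the sketch below integrates mass-action kinetics for a single reversible binding reaction, A + B <-> C, with a forward Euler scheme. It does not use or reproduce the API of VCell, MCell, or E-Cell; the function name `simulate_binding`, the rate constants, and the choice of integrator are assumptions chosen only to keep the example self-contained.

```python
def simulate_binding(a0, b0, c0, k_on, k_off, dt, n_steps):
    """Forward Euler integration of A + B <-> C under mass-action kinetics.

    Hypothetical helper for illustration; concentrations and rate
    constants are in arbitrary, mutually consistent units.
    """
    a, b, c = a0, b0, c0
    for _ in range(n_steps):
        # Net forward rate: association minus dissociation.
        rate = k_on * a * b - k_off * c
        a -= rate * dt
        b -= rate * dt
        c += rate * dt
    return a, b, c


if __name__ == "__main__":
    # Assumed, illustrative parameters.
    a, b, c = simulate_binding(1.0, 1.0, 0.0, k_on=1.0, k_off=1.0,
                               dt=0.001, n_steps=20000)
    print(f"A = {a:.4f}, B = {b:.4f}, C = {c:.4f}")
    # At steady state the two opposing rates balance.
    print(f"k_on*A*B = {1.0 * a * b:.4f}, k_off*C = {1.0 * c:.4f}")
```

Production platforms typically use adaptive or stochastic solvers and couple many such reactions to spatial geometry; this sketch shows only the core rate-equation update for one reaction.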
A comprehensive whole-cell model should allow us to address questions from multiple
scientific fields, incorporate all available experimental information, and harness the power of a
wide variety of computational and database resources. Biologists, chemists, physicists, and many
others should be able to use the model to ask a myriad of scientific questions depending on the
researcher’s particular interest. For example, biologists could query the effects of a drug on a cell’s
expression patterns, chemists could test the stability of a particular compound in a cellular
environment, and physicists could examine the relationships between reaction rates in biochemical
contexts. For the model to be useful to many disciplines, it should integrate data generated from a
wide range of experimental platforms. For instance, in the model, each of the cell’s components
determined by omics approaches should be connected to its conformational data
determined through structural biology approaches. Similarly, each component should be linked to
subcellular localization data determined by microscopy, and so forth. To connect these disparate pieces of information, the
model will need to integrate a wide variety of database tools and will also require the incorporation
of extensive computational resources to perform simulations and queries. The scope of biological
questions accessible through a comprehensive whole-cell model will continue to evolve as the
available data and technology evolve. A comprehensive cell model should contain the following
attributes:
Complete and Multi-scale - The model must consist of all cellular components, including
individual atoms, small molecules (e.g., water and metabolites), macromolecules (e.g., proteins,
nucleic acids, and polysaccharides), and complexes (e.g., ribosomes, nuclear pore complex, and
proteasome), as well as organelles and cellular compartments (e.g., nucleus, mitochondria, and
vesicles). The model must describe the cell at multiple levels of its hierarchical organization, from
atoms to cellular compartments.
Space and Time - The spatial organization of the cell should be mapped by defining the
distributions of its components [34], as well as how these distributions change over time, through
atomic fluctuations, chemical reactions, diffusion, and active mass transport.
Logical Organization - The model should also define the logical organization of the cell
(e.g., via molecular networks, mass, and information flow), providing a sense of functional
relationships among its components; for example, how insulin secretion pathways change and are
affected by drug molecules in the environment.
Couple Multiple Representations - A comprehensive cell model will need to
simultaneously include various representations of the cell that researchers have established and
published in the past. Examples of some of these representations include: (1) a static description
of the spatial distribution of cellular components derived from various experiments like fluorescent
imaging and quantitative proteomics [25-26, 35-36]; (2) static molecular networks used in systems
biology [37]; (3) ordinary differential equations (ODEs) commonly used for modeling metabolic
pathways [28]; (4) flux balance analysis (FBA) for modeling biochemical networks [29]; (5) a
description of dynamics by statistical inference from the spatiotemporal distribution of components
and live imaging [38-39]; (6) particle diffusion in crowded environments via BD [40-41]; and (7) atomic
fluctuations by MD simulations [21]. Each of these representations is informed by different inputs
and provides answers to different questions. Therefore, each will be useful to include as a different
layer for modeling whole cells. Although the representations can be derived from diverse sources
of information, they must be in harmony and coupled in the final model. The model of the cell
should couple diverse representations directly or indirectly, such that a change in a model of one
cellular component described by one representation will be reflected in all models of all
components. For example, a change in concentrations derived from mathematical modeling
implies a change in the spatial model. Similarly: information about the diffusion of complexes and
metabolites from BD simulations should feed back into reaction-diffusion models of functional
processes; the structure of a protein complex implies a molecular network involving the constituent
proteins [42]; and a change in the structure of a subunit in a complex may imply a change in the
complex structure.
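As an illustration of one such layer, representation (3) can be sketched for a hypothetical two-step pathway S -> I -> P with first-order mass-action kinetics. The species names, rate constants, and the forward Euler integrator are assumptions made for this sketch and are not taken from any model cited here.

```python
def simulate_pathway(s0=1.0, k1=0.5, k2=0.2, dt=0.01, t_end=10.0):
    """Integrate a hypothetical S -> I -> P pathway with forward Euler.

    dS/dt = -k1*S,  dI/dt = k1*S - k2*I,  dP/dt = k2*I
    All parameter values are illustrative assumptions.
    """
    s, i, p = s0, 0.0, 0.0
    trajectory = [(0.0, s, i, p)]
    n_steps = int(round(t_end / dt))
    for step in range(1, n_steps + 1):
        ds = -k1 * s            # substrate consumed
        di = k1 * s - k2 * i    # intermediate produced and consumed
        dp = k2 * i             # product formed
        s, i, p = s + dt * ds, i + dt * di, p + dt * dp
        trajectory.append((step * dt, s, i, p))
    return trajectory


if __name__ == "__main__":
    t, s, i, p = simulate_pathway()[-1]
    print(f"t={t:.1f}  S={s:.4f}  I={i:.4f}  P={p:.4f}")
```

In a coupled model, the concentrations produced by a layer like this are the quantities that the spatial representation would need to remain consistent with.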
Integrative - Because of the vast complexities of cellular functions, all relevant information
must be used to construct the cellular model. This information includes varied experimental
datasets, physical theories, statistical inferences, and/or prior models. Thus, integrative modeling
must be used to convert the diverse input information into the cellular model [43-44].
Specify Uncertainty - Importantly, any model must include a quantification of its
uncertainty, which is essential for the interpretation of its results. Model uncertainty depends on
the sparseness of the input information and the uncertainty about how input information limits the
model [45]. For example, the B-factor used in structural biology specifies the uncertainty in atomic
positions in a model of protein structure [45-46]. Likewise, localization probability densities quantify
the uncertainty of an integrative model of macromolecular complex structure [43].
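For the isotropic case, the standard crystallographic relation between the B-factor and atomic displacement makes the B-factor example concrete. Here the symbol u denotes the displacement of an atom from its mean position along one direction, and the value B = 20 Å^2 is an arbitrary illustration rather than a value from this work.

```latex
\[
B = 8\pi^{2}\,\langle u^{2}\rangle
\quad\Longrightarrow\quad
\langle u^{2}\rangle = \frac{B}{8\pi^{2}}
\]
\[
B = 20\ \text{\AA}^{2}
\;\Rightarrow\;
\langle u^{2}\rangle = \frac{20}{8\pi^{2}} \approx 0.25\ \text{\AA}^{2},
\qquad
\sqrt{\langle u^{2}\rangle} \approx 0.50\ \text{\AA}
\]
```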
Descriptive and Predictive - A good model will allow the rationalization of existing
experimental observations. It will also be quantitatively predictive rather than providing only a
table or visualization of experimental observations. Thus, a model will allow testable predictions
to be made (i.e., be falsifiable).
Reflect Heterogeneity - Even cells of the same type vary in terms of abundance and
localization of components (e.g., molecules, complexes, and organelles). Therefore, a whole-cell
model must describe the variation among individual cells of the same subtype, among cells of
different subtypes [47-50], and among cells from different individuals in the human population. It
should also be able to take into account different environments and perturbations. These variations
are likely to play significant roles in the progression of disease and drug response and, therefore,
must be accounted for in a whole-cell model.
Iterative - A model must be capable of continued refinements, reflecting the growth in
quantity and quality of input information as well as improvements in computing capacity and
modeling methods. A good model will facilitate the identification of the most informative future
experiments and thus expedite the scientific cycle of iterating through experiment and
modeling [51].
In addition to these criteria, coupling the structural information of cell architecture and
experimentally-derived spatial distribution of cell components with mathematical models is an
important aspect of the whole-cell model. Platforms like V-Cell [31] and M-Cell [36], as mentioned
previously, already integrate such spatial information with mathematical modeling. V-Cell can
simulate various molecular mechanisms using ordinary and/or partial differential equations in the
context of compartments, whereas M-Cell can simulate cellular environments via 3D
reaction-diffusion using Monte Carlo methods. Other studies have also considered spatial aspects. For
example, one study extracted cell geometry from cryo-electron tomography experiments and used
stochastic simulations to study the effect of cell structure on reaction networks [52]. Some have relied
on fluorescent images to extract ultrastructure and spatial distribution of proteins for cell modeling
using V-Cell or M-Cell [35-36]. These methods do not represent cellular environments, including
proteins, lipids, and macromolecular assemblies, as three-dimensional structures. M-Cell
represents components as diffusing particles, and V-Cell models component concentrations via
ODEs. In the future, new versions of these modeling platforms that include 3D representations of
components and physical interactions are likely to be applied to larger and more complex systems,
such as human PBCs.
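The particle-based view can be illustrated with a minimal sketch of free diffusion as an unbiased 3D random walk. This is not M-Cell's actual algorithm or API; the function name, the parameter values, and the Gaussian step scheme are assumptions chosen for illustration. The sketch checks the simulated mean squared displacement against the textbook expectation of 6Dt for free diffusion in three dimensions.

```python
import math
import random


def mean_squared_displacement(n_particles=2000, n_steps=100,
                              diff_coeff=1.0, dt=0.01, seed=0):
    """Random-walk point particles in 3D and return their mean squared displacement.

    Hypothetical illustration only: particles are non-interacting points,
    and units are arbitrary.
    """
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * diff_coeff * dt)  # per-axis step standard deviation
    total = 0.0
    for _ in range(n_particles):
        x = y = z = 0.0
        for _ in range(n_steps):
            x += rng.gauss(0.0, sigma)
            y += rng.gauss(0.0, sigma)
            z += rng.gauss(0.0, sigma)
        total += x * x + y * y + z * z
    return total / n_particles


if __name__ == "__main__":
    msd = mean_squared_displacement()
    expected = 6 * 1.0 * (100 * 0.01)  # 6 * D * t
    print(f"simulated MSD = {msd:.3f}, expected 6*D*t = {expected:.3f}")
```

Because the particles here are structureless points, the sketch also shows the limitation noted above: nothing in it encodes the three-dimensional shape of proteins, lipids, or assemblies.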
Efforts to model the whole cell are timely because, for the first time, the computational
capabilities required to undertake whole-cell modeling are within reach. However, as already
discussed, it is highly likely that a variety of different types of modeling are needed to construct a
model of the cell. Thus, different types of computing architectures will also be required, at least
initially. These architectures will include special-purpose hardware such as the Anton
supercomputer for MD simulations [53] and Graphics Processing Units (GPUs) for image
processing [54]. Modeling will require general purpose parallel computer clusters with significant
data bandwidth and cloud computing with relatively low volume data transfer between a central
repository and individual nodes [43]. Large-scale recruitment of personal computers on the Internet,
as in Folding@Home, may also be helpful. Some computing capabilities of each one of these types
are already available. As the modeling methods evolve and as it becomes clearer what types
of modeling are required, it will also be beneficial to construct increasingly specialized and thus
efficient computing architectures, perhaps culminating in a special-purpose computer for
simulating cells. Just as the need to solve the phase problem in crystallography motivated the
development of early computer hardware [55], we anticipate that the goal of modeling whole cells
will also attract computer hardware and software engineers with commensurate support from
governments, philanthropic foundations, and industry partnerships. It is encouraging that a number
of government institutions have already recognized the need for increased computing for biology
in general and are investing in it, e.g., the Cloud Credits Models by NIH and The European Cloud
Initiative.
In addition to computing hardware for modeling, modeling efforts will also require
resources for archiving and disseminating the data and models. It is likely that this functionality
will be best achieved through a federated archive of repositories. One such arrangement is
envisioned for integrative structures of biological macromolecules and corresponding data for the
integrative/hybrid methods initiative of wwPDB [56-57]. The BioStudies resource [58] at the European
Bioinformatics Institute (EBI) may provide a convenient portal for the participating archives.
Therefore, it is conceivable that the common goal of modeling cells will encourage the
communities that generate data to come together and define appropriate standards for archiving,
annotating, validating, and disseminating their data to provide maximal use of the accumulated
results.
Accurate multi-scale models will contribute to our understanding of how the cell functions,
how it can be modulated, and how it evolved. Models of whole cells will lead to innovative ways
of designing therapeutics, just as structures of individual proteins often facilitate rational,
structure-based drug discovery. Drug design efforts have focused at the molecular level on isolated
proteins for decades, and now it is only logical to move toward designing drugs by modeling their
effects at the cellular level. Whole-cell models could help in target selection for drug discovery by
predicting a potential drug target’s impact on modulating various cell functions. In addition, these
models could also provide a basis for rationally designed cell therapies. While there is a rising number
of cell therapy clinical trials, only one has thus far been approved [59]. Therefore, there is scope for
models to help improve this outlook. For example, a major area of investigation for improving
diabetes therapy is focused on generating healthy PBCs in vitro, which will require a holistic
understanding of PBCs. Efforts to derive PBCs from induced pluripotent stem cells and implant
them in patients are active areas of research and present many challenges [60-61]. Accurate beta-cell
models can potentially be used to inform the design of these approaches for therapy [62].
Whole-cell models may also provide new insights about the molecular links from genotype
to phenotype and shed light on more complex emergent properties that arise from molecular
interactions, such as coordinated insulin release from clusters of PBCs. Multi-scale models will
also provide a way to comprehend heterogeneous data from a vast array of complementary
experiments and push computational limits so that complex biological questions can be answered.
Many of these points have also been considered in other contexts [63-68]. One specific outcome of
initially focusing on PBCs for the development of cell models would be the establishment of a
platform consisting of the basic pipeline, framework, toolbox, methods, approaches, experimental
and computational infrastructure, etc. that could then be extended to other cell types. Beyond that,
it is important to consider how these individual cells integrate into tissues, since such complex
structures could become an additional target for modeling efforts. To take the long view, the
development of whole-cell and eventually tissue-scale models will allow for deeper insights into
a myriad of health conditions, including diseases like diabetes and cancer, each of which affects
hundreds of millions of people worldwide.
The PBC omics and imaging studies presented in Chapter 2 contribute to the PBCC’s
efforts to build spatiotemporal multi-scale models of the PBC. Despite the PBC's central role in
metabolism, there has not been an effort to comprehensively describe its proteome at the organelle
level. In Chapter 2, my work toward determining the whole cell and organelle-specific proteomes
of an insulin-producing PBC line, INS1, is reported as a starting place for integrating information
about PBC protein identities and abundances with data from other PBC experimental modalities.
Additional proteomics analyses of PBC components will complement the Consortium’s ongoing
work in PBC imaging, systems analysis, structural modeling, and more, to build predictive, four-
dimensional models of PBC physiology.
The Melanocortin 4 Receptor (MC4R) and appetite regulation
MC4R: a metabolism regulating receptor with therapeutic potential
In the early 1990s, Roger Cone et al. were the first to clone the family of genes coding for
what are now the well-known melanocortin receptors [69]. At the time, it had been determined that
the melanocortin system was controlled by peptides in the brain, pituitary, and immune system,
yet the function and composition of the individual members of the system were largely unknown.
Today, we know the melanocortin system coordinates a wide variety of physiological functions
including metabolism, pigmentation, sexual function, analgesia, temperature control, and
cardiovascular regulation [70]. The system is regulated by a group of peptides that act on the five
melanocortin receptors. The melanocortin receptors are class A GPCRs, of which over fifty have
been structurally solved at atomic resolution, yet none of the melanocortin receptors have been
structurally determined [71]. Unique to the system of melanocortin receptors are two endogenously
expressed antagonists, while there are no identified native inhibitory ligands for any of the other
GPCR families [72]. The two antagonists are agouti and agouti-related protein (AgRP), and both were
discovered as a result of their effect on pigmentation, which originates at the agouti locus. The
melanocortin system also shares common agonists that all derive from a precursor peptide of 241
amino acid residues known as Pro-opiomelanocortin (POMC). POMC, synthesized in the pituitary
gland, has its own precursor, pre-POMC, that converts to POMC upon removal of the signal
peptide sequence during translation. POMC is then differentially cleaved to produce a series of
peptides with more specific action at the melanocortin receptors. One POMC derivative,
alpha-melanocyte-stimulating hormone (α-MSH), is the primary agonist of MC4R.
Soon after the isolation of MC4R, one of five members of the melanocortin receptor family, seminal
studies demonstrated that mutations in the gene can lead to monogenic obesity [73]. Almost
thirty years later, it is commonly held that MC4R mutations are likely the most common cause of
monogenic obesity in humans and other organisms, with at least one hundred mutations having
been reported that correlate with obese phenotypes [74]. Many efforts to prevent and treat obesity are
now centered around MC4R research. While there is strong evidence for the receptor’s role in
metabolic physiology, there is still much to learn about MC4R’s role in cells, the brain, and the
body at large. Many seek to understand the function of MC4R at the molecular level, as well as to
understand its interaction with ligands and effector proteins, and the atomic resolution structure of
MC4R has long been sought [75]. Presented in Chapter 3 are efforts toward generating the structure
of MC4R. Here, my work included construct design, purification optimization, and preparation for
x-ray crystallography and cryo-electron microscopy structure determination. Efforts concluded
with the determination of stably purified MC4R constructs with a series of ligands, including the
known endogenous agonist as well as several other agonists and antagonists. Upon the development
of several consistently highly expressing, stable, and purifiable MC4R constructs, the project was
transferred to ShanghaiTech University for further efforts toward structure determination using
x-ray crystallography and cryo-electron microscopy.
Figure 1. Ligand cross-reactivity in the melanocortin system. Adapted and reprinted with permission from
American Physiological Society [76].
Each of the five melanocortin receptors activates intracellular signaling through
G protein-coupled cAMP generation. MC4R in particular has been found to associate with several members
of the G protein family including Gs, Gi/o, and Gq [77]. MC4R, like the other melanocortin
receptors, has demonstrated interactions with G protein and cAMP pathways, and has also been
associated with influx and efflux of calcium ions, activation of inositol trisphosphate, and the
activation of MAP kinase, Janus kinase, and PKC pathways [78]. MC4R is predominantly expressed
in the central nervous system. In mice, the receptor has been found in nearly every brain region
including the cortex, hypothalamus, brainstem, and spinal cord [79]. In humans, the regions of highest
expression are the paraventricular nucleus, supraoptic nucleus, and the nucleus basalis of Meynert [80].
Overall, MC4R expression seems to approximate the distribution of its native ligands, AgRP and
α-MSH [80].
Figure 2. MC4R signaling pathways. Adapted and reprinted with permission from Springer [77].
In 1997, studies by Huszar et al. determined that the MC4R gene locus is a critical regulator
of metabolic function in mice [73]. Upon disrupting MC4R, homozygous knockout (-/-) mice displayed
phenotypes of hyperphagia, increased adiposity, increased longitudinal growth, and
hyperinsulinaemia, leading to maturity-onset obesity [73]. While the homozygous knockout mice display
an exaggerated phenotype of obesity, it was also found that mice with a heterozygous presence of
the MC4R gene (+/-) displayed an intermediate phenotype of weight gain, suggesting that the
receptor’s role in energy homeostasis is additive. About a decade earlier, studies showed that
injecting α-MSH and ACTH into the intracerebroventricular area of rats caused a pronounced
decrease in food intake; however, the locus of action for these molecules was unknown [81]. A
subsequent study found that agouti, which was a well-known antagonist of MC1R involved in
pigmentation, was also an antagonist of MC4R [82]. This finding led Cone et al. to hypothesize that
agouti, which is expressed in the central nervous system local to MC4R, could explain why mice
with the agouti pigmentation phenotype also display characteristic hyperphagia. During this
time, much effort was focused on characterizing ligands that interact with the melanocortin system,
giving rise to knowledge that molecules like MTII and other α-MSH analogues like SHU9119
show high potency and selectivity in modulating MC4R activity [83]. Cone et al. tested their
hypothesis about MC4R antagonism as the cause of hyperphagia by administering MTII and
SHU9119, a known MC4R agonist and antagonist, respectively, to several mouse models that
recapitulate obesity and measuring the dose response in feeding activity [84]. They found that
administering SHU9119 alone led to increased feeding, even in mice with diet-induced obesity,
suggesting that the molecule acts on a locus that exacerbates feeding behavior independent of
metabolic homeostasis. Further, upon administering the MC4R agonist MTII, feeding behavior
was decreased in a dose-dependent manner, followed by a dose-dependent reversal of the effect
upon administration of the MC4R antagonist SHU9119. Follow-up studies found that injecting the
compounds directly into the paraventricular nucleus of mice exacerbated the agonist and
antagonist responses. This region of the brain has been shown to express higher levels of MC4R
and is thought to be the site of action for regulating feeding behavior through the melanocortin
system. After early studies determined that MC4R activity is central to energy homeostasis, many
follow-up investigations sought to understand how specific mutations in the receptor might affect
the development of obesity.
Genome-wide association studies have identified a list of mutations correlated with the
development of obesity in humans. Estimates suggest that the prevalence of MC4R mutations is
around 4% in obese adults, with a higher prevalence in children with severe childhood-onset
obesity [85]. The effects of MC4R mutation in humans are comparable to those in mice, where
hyperphagia, hyperinsulinaemia, and increased adipose mass appear in heterozygous mutants and
manifest more intensely in homozygous mutants [86]. Of the list of over one hundred MC4R
mutations associated with dysregulation of energy homeostasis, many have been shown to cause
the receptor to fold improperly and be retained intracellularly. In such a case, the receptor is unable
to receive extracellular signals at the cell membrane and communicate them inward, resulting in a
loss of agonist response. Loss of agonist activity is thought to withhold necessary signals of
satiation directed by MC4R under normal conditions. Other obesity-related mutations have been
shown to decrease constitutive activity of MC4R without reducing the agonist response. It appears
that even slight deviations from MC4R’s highly conserved sequence can cause major disruptions
in energy regulation.
A classification system for MC4R variants that are functionally correlated to obesity
phenotypes was developed by Tao et al. to organize observations into five groups [87]. The variants
include: class I, which are lacking in expression; class II, which are expressed but improperly trafficked
to the cell membrane; class III, which have defective ligand binding; class IV, which are characterized by defects
in signaling response; and class V, which consists of MC4R mutants without known defects. In clinical
studies, the most common observation in patients with MC4R variants is over-eating with
early-onset obesity. Carriers of variant MC4R genes often exhibit not only characteristics of
hyperphagia, but they also tend to have a less prevalent occurrence of high blood pressure and
increased bone mineral density [88]. Recently, Kuhnen et al. summarized the outcomes from nineteen
groups of MC4R variant carriers following bariatric surgery to determine the effect of receptor
variations on recovery [87]. One observation that connects the melanocortin system to
diabetes-related physiology was that a cohort of 249 patients with the rs17782313 MC4R variant who received
gastric bypass exhibited biometrics associated with type 2 diabetes [89]. Also following gastric
bypass, carriers of the I251L and V103I mutations in a study of 1433 individuals lost more weight
than non-variant carriers [90]. It is unclear exactly how the melanocortin system, and MC4R in
particular, participate in the coordination of feeding behavior, energy homeostasis, and overall
metabolic regulation, though genome-wide association studies in large patient cohorts of MC4R
variant carriers are providing new insights into the receptor’s axis of influence.
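The five-class scheme described at the start of this paragraph lends itself to a simple lookup structure, sketched below. The class descriptions paraphrase the text; the type name `MC4RVariantClass` and the helper `reaches_cell_surface` are hypothetical, and the helper encodes only the inference that class I and class II variants never present receptor at the membrane.

```python
from enum import Enum


class MC4RVariantClass(Enum):
    """Five functional classes of MC4R variants, paraphrased from the text."""
    I = "lacking in expression"
    II = "expressed, but improperly trafficked to the cell membrane"
    III = "defective ligand binding"
    IV = "defective signaling response"
    V = "no known defect"


def reaches_cell_surface(variant_class):
    """Hypothetical helper: classes I and II yield no receptor at the membrane."""
    return variant_class not in (MC4RVariantClass.I, MC4RVariantClass.II)


if __name__ == "__main__":
    for cls in MC4RVariantClass:
        print(f"class {cls.name}: {cls.value} (surface: {reaches_cell_surface(cls)})")
```

A structure like this could serve as the annotation field that links a reported variant to the functional assays needed to classify it.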
In addition to demonstrating a critical role in the regulation of metabolism through appetite
and energy expenditure, MC4R has also been associated with the management of other primary
physiological activities including cardiovascular function, reproduction and sexual function, as
well as pain mediation. Along with abundant expression in the paraventricular nucleus of the
central nervous system, MC4R is also expressed in the nucleus of the solitary tract, a region
associated with the regulation of cardiovascular activity. Kuo et al. found that upon activation of
MC4R there is a corresponding increase in arterial pressure, while antagonized MC4R decreases
arterial pressure [91]. Subsequently, it was shown that the inhibition of MC4R by exaggerated
injection of SHU9119 and AgRP, both MC4R antagonists, resulted not only in a decrease of
arterial blood pressure but also in a concomitant decrease in heart rate [92].
While it is clear that MC4R activity is associated with cardiovascular regulation, whether
the receptor has therapeutic potential for aiding related pathology is still up for debate. MC4R also
appears to influence reproductive activity and sexual function as MC4R knockout mice have been
observed to exhibit erectile dysfunction and an accelerated reproductive aging process in females.
Antagonist inhibition of MC4R has been shown to constrain luteinizing hormone surges in female
mice, a precursor to ovulation [93]. In both male rats and humans, MC4R mRNA has been found to
be expressed in the penile tissues and the pelvic ganglion, a major center for the coordination of sexual
function [94]. Furthermore, it has been demonstrated in mice that the highly selective MC4R agonist,
THIQ, enhances copulatory behavior; however, this effect is reduced in MC4R knockouts. MC4R
mRNA is expressed in several regions that mediate the influence of opiates including the striatum
and nucleus accumbens. Further, POMC, the melanocortin peptide precursor, also gives rise to
beta-endorphin, a known agonist of the μ-opioid receptor. More than 30 years ago, it was
demonstrated that α-MSH antagonizes analgesia caused by morphine introduction [95], while chronic
introduction of morphine causes decreased MC4R expression in selective brain regions [96].
MC4R is not only modulated by peptides and small molecule compounds but interacts with
a series of intracellular binding partners that modify the receptor’s signaling pathway. Of the short
list of currently characterized MC4R-interacting proteins, the most heavily investigated are
Melanocortin Receptor Accessory Proteins (MRAPs) and Inward-Rectifier Potassium Channel 7.1
(Kir7.1). The MRAP family are small transmembrane proteins originally identified from their
interaction with MC2R, where they enabled functional expression of the receptor in transfected
cells [97]. In 2009, Chan et al. found that MRAP and MRAP2 interact with each of the five
melanocortin receptors and that the addition of both accessory proteins can result in a small but
significant decrease in MC4R surface expression [98]. Further, they found that effects of reduced
surface expression can be additive between MRAP and MRAP2. In 2013, Sebag et al.
demonstrated that MC4R interaction with MRAP2a caused a reduced binding affinity between
MC4R and α-MSH, whereas interaction with MRAP2b led to a reduction in MC4R constitutive
activity and an increased potency of α-MSH in zebrafish [99]. In a separate study, Asai et al.
demonstrated the obesity-related physiological relevance of MRAP2 in that mice with the gene
knocked out develop obesity at a young age. Additionally, the group found that rare heterozygous
MRAP2 mutations are associated with human early-onset obesity [100].
Recently, the Cone group identified a novel interaction between MC4R and Kir7.1. MC4R
passes its activation signal to downstream mediators via G protein alpha s activation, which
subsequently activates PKA signaling. In humans and other mammals, the endogenous peptide
α-MSH serves to activate the signaling cascade, while the antagonist AgRP acts in the reverse
direction to block the pathway. Cone’s group found that the resultant MC4R signaling pathways
can be modulated in hypothalamic neurons by α-MSH and AgRP independently of G protein alpha
s signaling through the coupling of MC4R to Kir7.1 [101]. It appears that α-MSH and AgRP not only
coordinate the activation and inhibition of MC4R coupling and activation of G protein alpha s, but
also coordinate the receptor’s interaction with the potassium channel. Hence, α-MSH causes
MC4R to interact with Kir7.1, leading to a closure of the channel and subsequent cell
depolarization. This finding presents a new potential target for the development of appetite and
energy expenditure-related therapeutics. The elucidation of the MC4R structure would contribute
not only to modeling the interaction between Kir7.1 and the receptor but would also provide a
basis for the determination of the complete receptor-ion channel complex.
MC4R drug development, structural modeling, and mutagenesis
MC4R regulates critical sections of physiology and is pharmacologically tractable as a
GPCR. As such, the development of MC4R-targeting therapeutics has great potential in
providing benefits for many indications. However, there are currently no clinically approved drugs
available to specifically modify the activity of MC4R. Extensive efforts have focused on
developing MC4R-based treatments for obesity and appetite regulation. MC4R activation leads to
a decrease in feeding behavior along with an increase in energy utilization. Many MC4R-defective
patients experience phenotypes and symptoms associated with the antagonized state of
MC4R, namely overeating and decreased energy utilization. Hence, a therapeutic to reverse symptoms
experienced by patients with impaired MC4R activity should mimic the effects of MC4R
activation. To this end, a number of MC4R agonists have been characterized. In 2014, Fani et al.
performed a systematic literature review for MC4R-related pharmacological obesity treatments
and found several preclinical studies involving MC4R agonists [102]. Each of the studies included in
the review consisted of in vivo studies of MC4R agonism in humans or other animals.
MC4R agonists from the fifteen studies included melanotan II, tetrapeptides, MK-0489, MK-0493,
urea-based piperazine, Ro27-3225, cyclophanes, ACTH-derivatives, pyrrolidine diastereoisomer,
BIM-22493 and BIM-22511, and β-MSH. They found that many MC4R agonists have undesirable
side effects, likely from off-target activations. For instance, melanotan II was observed to cause
reduced food intake and a concomitant weight decrease in rats, though this effect was accompanied
by a conditioned taste aversion.
Another challenge in developing MC4R-targeting therapeutics is bioavailability. Oral
administration is preferred for drug intake in humans, yet MC4R-targeting peptides have a short
half-life in the intestinal and blood environments. However, backbone cyclization strategies have
demonstrated an improvement in tetrapeptide bioavailability, and additional strategies will be
developed with improved understanding of structural features of MC4R activation [103]. Currently,
there is a lack of peripheral or surrogate biomarker detectability in the study of the efficacy of
MC4R-targeting compounds. Thus, it is difficult to determine the extent of MC4R engagement
following exposure to MC4R agonists, and most studies of MC4R activation and resulting
metabolic effects rely on traditional biomarkers and visual cues as a proxy for measuring efficacy.
As more information about MC4R’s role in energy homeostasis and peripheral biological influence
is uncovered, it is likely that we will build on our understanding of the system to the point where
it is possible to develop an MC4R-targeting compound that provides the desired effect on energy
management, without off-target effects. The contribution of an atomic resolution structure of
MC4R will accelerate such a trajectory.
Molecular modeling of MC4R remains challenging given the lack of structural information
about the receptor and its family members. Several homology modeling efforts have been applied
to MC4R in search of clues toward understanding ligand binding, receptor conformational states
and transitions, and binding to and signaling through effector proteins. Central to these efforts is
the development of therapeutic compounds to treat obesity and appetite disorders. A main
challenge here stems from the fact that ligands reacting to MC4R often show cross reactivity to
other receptors in the melanocortin family, leading to unwanted side effects. Modeling efforts
provide an opportunity to delineate features of the MC4R ligand binding pocket that serve to
differentiate between potential binders. Recently, Heyder et al. used the sphingosine 1-phosphate
receptor structure to model MC4R given the similarity between the two receptors
104
. They have
about a 50% sequence similarity in the transmembrane region, a similar number of N-terminal
residues, and common features in the loop regions. After developing an MC4R homology model,
the group sequentially substituted twenty-one naturally occurring side chain substitutions at
fourteen N-terminal locations in the receptor and found no impact on α-MSH binding or cAMP
signaling. This result suggests that the N-terminal region of MC4R may not play a critical role in
ligand differentiation. However, this finding should be orthogonally validated by additional
studies, ideally using an experimentally determined structure of the receptor.
Another recent modeling effort by Saleh et al. also used the sphingosine 1-phosphate
receptor crystal structure to develop an MC4R model for investigating ligand binding in molecular
dynamics simulations
105
. The group’s approach to generating the homology model focused on
conserving several key structural features in MC4R, including: a very short second extracellular
loop consisting of only four amino acid residues; an absent disulfide bridge between extracellular
loop two and transmembrane helix three that is normally highly conserved; a methionine in
position 5.50 instead of the highly conserved proline found in other Class A GPCRs; a DPxxY
motif in transmembrane helix seven instead of the highly conserved NPxxY motif; and a disulfide
bridge in the third extracellular loop. In simulations, they found several transmembrane residues
to influence agonist recognition but not antagonist recognition. One antagonist, MCL0129,
prevented the access of agonists to the binding pocket, yet did not form contact with important
transmembrane residues in the putative agonist binding pocket, including I125, L133, W258, F261,
and M292.
In addition to investigating potential residues involved in ligand recognition, the group
identified a sodium binding site thought to impact ligand binding through allosteric effects. In
Class A GPCRs, an allosteric sodium binding pocket was originally identified by Liu et al. in a 1.8
Å resolution crystal structure of an engineered human A2A adenosine receptor. The sodium
binding pocket has received much attention as a potential source of GPCR therapeutic
modulation
106
. To date, the allosteric sodium binding site has been found in over fifteen Class A
GPCRs including the adenosine, adrenergic, dopamine, and opioid receptors. In Saleh’s MC4R
homology model, the sodium binding site involves key residues found in other GPCR sodium
binding sites including S58, D90, S295, and N294. However, the group noticed that the site was
between transmembrane helices two and seven, close to helix one, while in other Class A GPCRs
the site is closer to helix three. The group also noticed that ligand binding was carried out through
a stepwise binding mechanism, like previous simulations of ligand binding in related receptors,
with the N-terminus playing a key role in ligand binding and recognition. This result contradicts
the conclusions of Heyder et al. where they contended that the N-terminus is not active in MC4R
ligand binding. A crystal structure of MC4R will provide the insights required to solve this
discrepancy, as well as other open questions in MC4R ligand recognition, binding, and activation.
There is a wealth of clinical and pharmacological data for MC4R, yet not a single solved
structure exists. Efforts presented in Chapter 3 are aimed at solving the structure of MC4R to
provide better models both for understanding the loss of receptor function in obesity and for the
rational, structure-based development of effective treatments for appetite regulation. These
efforts demonstrate the discovery and optimization of a series of MC4R gene constructs and ligand
pairs with the biophysical characteristics necessary for further structural studies.
Data integration via digital arts approaches
During my graduate studies, I not only had the opportunity to perform research on cells
and their components using a variety of experimental techniques, but I was also able to focus on
developing means for representing complex and multidimensional biochemical data. As science
labs are generating more data than ever, researchers are working tirelessly to process information
into digestible abstractions. Data is growing not only in abundance but also in complexity, and we
must rely on multidimensional visualizations in order to build models of increasingly complex
biological processes. Using entertainment platforms like Unity and Maya, I explored methods for
visualizing and interacting with biochemical data. Our visual tools focused on representing data
from biological imaging, omics, and macromolecule structure. Through my proteomics,
transcriptomics, and imaging efforts toward generating information for cellular models of the PBC,
as well as my efforts toward determining the atomic resolution structure of MC4R, I gained
experience in producing and analyzing biochemical data. Building on these experiences, my efforts
in developing interactive applications represent an exploration of outside-the-box approaches for
integrating and visualizing biochemical data.
Visualization as a means for integrating disparate data
Visualization is a critical component of research and communication in the life sciences.
We not only visualize our models of understanding in formats like graphs, diagrams, figures, maps,
etc., but we also visually investigate a large part of our primary data, as in instrument readouts. As
a picture is worth a thousand words, many researchers first look to figures when reading scientific
literature. In figures, we summarize our knowledge into crystallized, meme-like demonstrations of
an idea or result. Innovations in visualization techniques have moved in step with innovations in
biochemistry and our overall understanding of the processes of life. Less than a century ago, atoms
were represented only by arbitrary symbols and abstract shapes, and now we’re performing
simulations of trillions of atoms in biological environments
107
. Advances in computing and
visualization technology have provided tools that are able to visualize extremely detailed,
complex, and fast moving entities without needing a fancy studio, making them available to a
broader population of scientists including those focused on the microscopic worlds within us.
In my work with the PBCC, part of my focus has been on developing visualizations of the
cell and other biological structures using entertainment media and game production tools. Along
with many collaborators across departments at USC, I led two projects using 4D visualization
platforms like Unity, Maya, and others to create a microscope image viewer for desktop, mobile,
and virtual reality, as well as a virtual reality exploration of the world within a PBC. The
motivations and outcomes for both projects are presented in Chapter 4. Each project
relied not only on teams of individuals with diverse specialties, but also the tools and techniques
that have been developed across a range of fields.
In a number of fields, and especially in the study of the microscopic world, we are
collecting more information about dynamic processes with many components moving and
interacting in time. We must rely on 4D visualizations in order to build models of these complex
processes. Our understanding of the molecular scale is based on the synthesis of three types of
knowledge
108
. The first is our observation of the subject in the macro world. For instance, when
considering a skin cell, we might also consider our skin at large, as it is many of these cells
assembled together. The second is from the symbols we use to represent processes taking place at
the molecular scale, such as chemical equations. The third is our ‘mind’s eye’ picture of what is
taking place during a particular molecular phenomenon. It is this third type of understanding that
often comes into play when reasoning through a molecular process in the cell. Especially since
there is more structural information being produced in labs every day, our thought processes about
how the cell operates are becoming more representative of the structures of molecular objects. This
has created an opportunity for new ways to visualize our conceptions about the molecular worlds
within cells.
There is a long history of visualizations leading to major developments in biochemistry
and cell science
108
. It was only after van Leeuwenhoek produced drawings of the cells he saw in his
microscope that we were able to develop cell theory. Moreover, Mendeleev laid the foundations
for modern atomic physics and chemistry when he created the periodic table as a visualization of
relationships between the elements. August Wilhelm von Hofmann’s creation of atomic models established the basis
for structural chemistry, which is now central to the development of therapeutics. Jane
Richardson’s ribbon model visualization of protein folds allowed us to see molecular folds and
compare structural motifs across the machinery of life. Now computer-generated visualizations
are building on this long history of visual abstraction and representation in the biochemical and
cell sciences by creating interactive multidimensional models of our understanding of nature.
The visualization projects presented in Chapter 4 relied heavily on game and entertainment
technology. Each project was built in the Unity game engine and used components built in other
modeling and animation software like Maya and Blender. Unity is a cross platform game engine
that is used to create 2D, 3D, virtual reality, and augmented reality games, as well as simulations
and other experiences
109
. The engine has been adopted in non-gaming fields as well such as
architecture
110
, automotive
111
, and medical training
112
. The engine gives users the ability to create
3D environments with large numbers of constituent components each scripted with object-based
behaviors. The game engine is a framework that brings together several core areas in interactive
media design, and it excels in six areas: graphics, audio, networking, physics, graphical user
interface, and scripting. As far as graphics go, Unity provides developers with an architecture for
high performance rendering capabilities, and with access to a graphics API, so that builders can
achieve high visual performance in simulations, games, and experiences. For audio capabilities,
Unity provides a framework for integrating sounds developed in-house as well as third-party apps
so developers can focus on the nature and composition of sounds rather than on their
implementation. One of Unity’s most powerful features, and one that makes the engine particularly
well-suited to the sciences, is its networking capabilities. Unity has scriptable components and
APIs in place that ease the process of optimizing an experience for online or multiplayer, which is
very useful in an age where data sharing and communication are paramount. Unity also has a built-
in physics engine enabling the creation of objects that behave realistically with just a few lines of
code. This feature is useful when programming behaviors into components of the cell like proteins,
for instance. For the creation of interfaces for interactive experiences, Unity has its own tools for
designing and implementing graphical user interfaces making it easy to add buttons and link them
to commands in a simulation, for example. Finally, it is arguable that Unity’s most powerful feature
is the scripting element. Unity not only provides pre-built scripts and solutions for basic features
like object movement, it also offers the ability to write custom scripts to control the behavior of
components within an experience. In our biological simulations and experiences, this gives us the
ability to program biochemical elements, like those of a cell, with behaviors known to us through
research.
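The idea of attaching a scripted behavior to each component can be illustrated with a short sketch. Unity scripts themselves are written in C# and are called by the engine once per frame; the Python below is only a language-neutral illustration of that per-frame update pattern, not code from the projects described here. The `Granule` class, its fields, and the numbers used are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Granule:
    """Hypothetical scene component: a granule drifting toward a membrane."""
    distance_to_membrane: float  # arbitrary scene units (assumed)
    speed: float                 # scene units per second (assumed)
    docked: bool = False

    def update(self, dt: float) -> None:
        """Advance this component by one frame lasting dt seconds."""
        if self.docked:
            return
        remaining = self.distance_to_membrane - self.speed * dt
        self.distance_to_membrane = max(0.0, remaining)
        if self.distance_to_membrane == 0.0:
            self.docked = True


def run_frames(components: Iterable[Granule], dt: float, frames: int) -> None:
    """Stand-in for an engine loop: call every component's update each frame."""
    components = list(components)
    for _ in range(frames):
        for component in components:
            component.update(dt)


granule = Granule(distance_to_membrane=1.0, speed=0.5)
run_frames([granule], dt=0.5, frames=4)
```

Each component carries its own state and rule, so adding a new kind of cellular element would mean adding a new class with its own `update` method rather than changing the loop.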
Maya is an animation software package that supplements our development of multidimensional
interactive platforms for research and communication of processes at the molecular scale
114
. Here,
Maya is primarily used for modeling components and environments to be used in our creation of
interactive tools and experiences. It is a powerful platform for modeling as its robust toolsets allow
for creating, importing, and modifying three dimensional objects. Fields in the molecular sciences
like structural biology and imaging are rapidly and increasingly producing three dimensional
information concerning cellular systems and components. As our projects integrate data from these
domains, we used Maya to optimize the components within the interactive experiences we build
in Unity. For instance, in the World in a Cell project, Maya is used to develop representations for
protein and nucleotide structures, as well as larger entities like organelles, and even cellular
membranes. Before bringing molecular components into the Unity game engine, we start with
structures from data repositories like the PDB and then shape these into models optimized for not
only performance within the interactive experience, but also for aesthetics. Maya allows us to add
features to molecular representations, such as highlighting functional domains within enzymes
using texture, color, motion, and more. Maya is also utilized in translating data from imaging
experiments like X-ray tomography into models amenable to the Unity game engine’s pipelines.
For instance, raw X-ray tomography data is often reconstructed as an MRC file in order to
perform operations like segmenting areas of interest within a cellular sample, such as the
surfaces of organelles and the plasma membrane. Then the segmented cell surfaces are ingested
into Maya where the models are optimized for performance in the Unity game engine as well as
for aesthetic qualities. The development of these representations will be discussed further in
Chapter 4.
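To make the MRC step above more concrete, the following is a minimal sketch of reading a reconstructed volume and marking voxels of interest. It assumes a little-endian file that follows the MRC2014 header layout (a 1024-byte header with the dimensions and data mode in the first four 32-bit words, and the extended-header size at byte offset 92). The simple intensity threshold is only a placeholder for segmentation, which in practice is done in dedicated software; the function names are hypothetical and this is not the pipeline used in this work.

```python
import io
import struct
from typing import BinaryIO, List, Tuple

# struct codes for the MRC data modes handled in this sketch:
# 0 = int8, 1 = int16, 2 = float32, 6 = uint16.
_MODE_CODES = {0: "b", 1: "h", 2: "f", 6: "H"}


def read_mrc(stream: BinaryIO) -> Tuple[Tuple[int, int, int], List[float]]:
    """Return (nx, ny, nz) and the flat list of voxel values."""
    header = stream.read(1024)
    if len(header) < 1024:
        raise ValueError("truncated MRC header")
    nx, ny, nz, mode = struct.unpack_from("<4i", header, 0)
    (ext_size,) = struct.unpack_from("<i", header, 92)  # extended header bytes
    if mode not in _MODE_CODES:
        raise ValueError(f"unsupported MRC mode {mode}")
    stream.read(ext_size)  # skip the extended header, if any
    fmt = f"<{nx * ny * nz}{_MODE_CODES[mode]}"
    payload = stream.read(struct.calcsize(fmt))
    return (nx, ny, nz), list(struct.unpack(fmt, payload))


def threshold_mask(values: List[float], cutoff: float) -> List[bool]:
    """Placeholder segmentation: mark voxels at or above the cutoff."""
    return [v >= cutoff for v in values]


# Synthetic 2 x 2 x 1 float32 volume, built in memory for illustration.
header = bytearray(1024)
struct.pack_into("<4i", header, 0, 2, 2, 1, 2)  # nx, ny, nz, mode
volume = io.BytesIO(bytes(header) + struct.pack("<4f", 0.1, 0.9, 0.8, 0.2))
dims, values = read_mrc(volume)
mask = threshold_mask(values, cutoff=0.5)
```

A mask of this kind would then be converted to a surface mesh before being optimized in a modeling package; that conversion is outside the scope of this sketch.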
There is growing demand for 3D tools in the biosciences, and Unity is increasingly being
leveraged in research and communication within this area
115
. Fields within biochemistry are
experiencing rapid expansion as we uncover more data on omics networks, atomic resolution
structures of large complexes, networks of interactions between molecular components, and more.
This expansion creates an increasing demand for visual tools to represent and communicate
complex information. Even more so, we need visual tools to integrate data from disparate yet
connected fields within the life sciences. For instance, a visual tool could connect macromolecular
structure data with omics and enzyme kinetics by demonstrating the network of interactions
between components with known frequency numbers, reaction rates, and structural composition.
Biological entities such as a cell exist in four dimensional environments, so it is apparent that
interactive four-dimensional tools are required to accurately model such complex environments.
Much of our thinking in the life sciences to date has been one- or two-dimensional. Genomic
sequences are represented as one dimensional strings of nucleotides, proteomics results are
tabulated as a two-dimensional relationship between component identity and frequency, and the
list goes on. It is largely the imaging and structural biology fields pushing toward three- and four-
dimensional representations of biological systems. In order to represent data produced within these
fields, three and four dimensions must be used to model findings as close to full fidelity as
possible. Conveniently, these types of data are stored digitally in three- and four-dimensional file
types, which can be modified or implemented as is in visual programs like Unity, Maya, and others.
The demand for three- and four-dimensional tools in the life sciences, and the opportunity
for leveraging entertainment platforms, has contributed to a new genre in gaming and computing
called ‘serious games’
116
. Serious games are those that use entertainment technology like Unity
and Maya to create platforms for performing and contributing to research and communication.
Foldit is one of the most widely known examples of a serious game in the field of
biochemistry
117
. Foldit is an online puzzle video game focused on protein folding. The platform
was created by a collaboration between the University of Washington’s Center for Game Science
and the Department of Biochemistry. The objective of Foldit is to attempt to predict the native
structure of a protein by moving atomic components in space, seeking the most energetically
favorable conformation. The highest scoring attempts are then reviewed by professional structural
biologists for quality and accuracy. Foldit led to a breakthrough in 2010 where over 50,000 citizen
scientists contributed results that performed better than traditional structure prediction methods.
Then in 2011 Foldit players helped determine, in less than ten days, the crystal structure of a
viral protein that had eluded scientists for more than fifteen years. Altogether, Foldit represents an
opportunity for incorporating entertainment platforms in research pipelines. We are still in the
early days of leveraging such tools to discover insights about the mechanisms of life at the
microscopic level, and we will certainly see more development in this area as the need increases
to visually and interactively integrate an abundance of disparate information generated in life
science research.
Figure 3. Structure predictions solved by Foldit players. Native structures are shown in blue, starting
puzzles in red, and top scoring Foldit predictions in green. In puzzle a, the top scoring prediction flips and
slides a beta strand into place. In puzzle b, players correctly buried an exposed isoleucine by remodeling
the loop backbone. In puzzle c, the top scoring prediction rotated an entire misplaced helix. In puzzle d, the
top scoring prediction correctly buried exposed hydrophobic residues. In puzzle e, the top prediction
correctly buried an exposed phenylalanine. Figure adapted and reprinted with permission from Nature
117
.
Recently, several tools have been developed to help scientists bring their three- and four-
dimensional data into entertainment software like Unity and Maya. Most notable are CellPack,
ePMV, Molecular Maya, and UnityMol. These tools are not only used in the communication of
scientific data through the preparation of accurate models, but also for testing hypotheses about
molecular environments and interactions. CellPack
25
and ePMV
118
were both developed by the
Olson and Goodsell lab at TSRI for the purpose of importing macromolecular structures into
entertainment software, allowing molecules to be visualized and animated using advanced tools.
CellPack is focused on taking known structures from crystallography and electron microscopy and
packing them within a container such as a cell or organelle membrane. Once the macromolecules
are packed in the container, users can bring the ensemble to life using the tools built into programs
like Maya, Blender, and Cinema4D. ePMV similarly brings protein and other macromolecular
structures into these three entertainment programs but is focused more on single molecules and
their properties. ePMV not only migrates structure data from PDB files, but also carries the
metadata about atoms within the structure and other properties so that they can be used inside the
entertainment software to create representations based on those features. Similar to ePMV,
Molecular Maya is a plugin designed to import PDB files into Maya in order to use the extensive
toolset to manipulate biological structures
119
. It was developed by the Clarafi group led by Gael
McGill at Harvard University and allows metadata-rich protein and DNA structure datasets to
be imported into Maya so that they can be integrated with other forms of visual information, such as
X-ray tomography datasets. For instance, Maya and Molecular Maya can be used to populate
landscapes of cellular environments collected through imaging methods with 3D macromolecular
structures collected from crystallography or electron microscopy. Further, omics information can
be integrated using these data to determine at what frequency components should be used to
populate an environment. Finally, UnityMol is an additional tool in the genre of entertainment
software plugins for visualizing biochemical data
115
. Where the previously described tools
complement the Maya workflow as plugins to support the creation of pre-rendered visuals,
UnityMol is used to bring molecular information into the Unity game engine to be used in
developing interactive experiences with biological data.
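The use of omics information to decide how many copies of each component populate an environment can be sketched as a proportional allocation. This is an illustrative example under stated assumptions, not the method used by the tools named above: the function name, the component names, and the abundance values are hypothetical, and rounding is handled with a largest-remainder rule so that the counts always sum to the requested total.

```python
from typing import Dict


def allocate_instances(abundance: Dict[str, float], total: int) -> Dict[str, int]:
    """Split `total` scene instances across components in proportion to abundance."""
    weight_sum = sum(abundance.values())
    if weight_sum <= 0 or total < 0:
        raise ValueError("abundances must sum to a positive value; total must be >= 0")
    exact = {name: total * w / weight_sum for name, w in abundance.items()}
    counts = {name: int(share) for name, share in exact.items()}
    leftover = total - sum(counts.values())
    # Hand the remaining instances to the largest fractional remainders.
    by_remainder = sorted(exact, key=lambda n: exact[n] - counts[n], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


# Hypothetical relative abundances (arbitrary units) and a 10-object scene budget.
counts = allocate_instances(
    {"insulin": 700.0, "glucokinase": 260.0, "GLP1R": 40.0}, total=10
)
```

A fixed scene budget is used here because interactive experiences are limited by rendering performance; low-abundance components may round to zero and might need a guaranteed minimum in a real scene.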
Lines are blurring between traditional biochemistry structural investigation applications
like Chimera or Amira and entertainment modeling platforms like Unity or Maya. Applications
like Unity and Maya are beginning to fill needs in research and communication pipelines as their
robust toolkits are some of the most powerful for creating multidimensional and interactive
representations of data and knowledge. Unity and Maya are versatile tools for developing visuals,
applications, and experiences, and different aspects of the platforms are utilized in each project. In
the World in a Cell project, we relied on the Unity engine’s object-oriented nature to provide
individual elements within the experiences with their own unique behaviors that correspond to
their roles within a cell. We also used Maya to optimize, preprocess, and model components and
the overall environment. In the Microscape project, we most heavily relied on Unity’s graphics
management and shader scripting features, along with UI tools and cross platform deployability.
Each of our projects within this domain is only possible because of the robust toolkits that were
developed initially for completely different applications but are beginning to provide wide utility
to life scientists. Chapter 4 will discuss not only our approaches and current outcomes in
integrating and communicating biological data within these platforms but will also look to the
future in our own projects and others as they become more widely used in research and
communication.
Chapter 2: Omics and Ultrastructural Analysis of the Pancreatic Beta
Cell
Contributors to this work include: Jolene Diedrich, Yu Gao, John R. Yates III, Sanraj Mittal, Jitin
Singla, Kate White, Andrej Sali, Frank Alber, Carolynn Larabell, Gerry McDermott, Jian-Hua Chen
Background: PBC omics and imaging
PBC omics and imaging toward a whole-cell model
The role of the pancreas in diabetes was first discovered in 1889, while the PBC itself was
first described more recently in 1972
120
. The PBC has received much attention given its critical
role in diabetes-related physiology, and researchers use a wide variety of techniques, including
omics and imaging, to characterize these insulin producing cells
121-122
. Many current PBC omics
studies seek to elucidate changes in the proteome, transcriptome, lipidome, and genome in
conditions that recapitulate diabetic and healthy states, as well as transitions between the two.
Omics studies are particularly well-suited for such investigations given their ability to generate
lists of state-specific properties for components within biological samples
123
. PBC imaging studies
complement state-specific list-type observations by providing a window into the three
dimensionality of cellular systems. Ultimately, a detailed understanding of PBC behavior requires
knowledge of the cells’ inventory and architecture, in addition to many other vectors
124
. Omics
and imaging are ideal techniques for acquiring cell component inventory and ultrastructural
makeup. In this Chapter, these methods are used to generate baseline data toward structural models
of PBCs. Proteomics and transcriptomics were used to identify a series of constituent PBC
proteins, and soft X-ray tomography was used to characterize PBC ultrastructure. The PBC omics
and imaging studies of this thesis seek not only to connect with but also to add to the history of experiments
in the literature focused on understanding diabetes, the pancreas, and PBC biology.
The research presented here was part of the PBCC’s initial efforts toward creating a
structural model of the PBC
19
and of their overarching goal of using omics and imaging to gain a
baseline PBC component inventory and ultrastructural architecture. As a consortium, the PBCC
coordinates experiments using different techniques so that resulting data from each study can
ultimately be integrated using modeling approaches they are currently developing.
In addition to generating structural models of PBCs, the PBCC seeks to understand the
mechanism of insulin production in health and disease in order to create interventions for the
prevention and treatment of diabetes. As such, the PBCC needed to select a test-subject cell type
that was not only amenable to modeling cell structure but could also provide insight
into insulin production and diabetes-related biology. One major challenge in studying the biology
of insulin and diabetes is obtaining reliable subjects for experimentation. The ideal test subjects
are actual human PBCs, grown in the islets of the pancreas. However, in mammals, PBCs exist in
a complex environment within the pancreatic islets, hence their extraction for use in
experimentation is nontrivial
125
. Further, islets cannot be removed from live subjects, making them
scarcely available for research
126
. Islets are often extracted from deceased human subjects, which
can then be used for research, though the high demand for this scarce material leaves it difficult to
obtain
127
. As a result, lab animal islets are often used to study insulin related biology, and rodents
are frequently used as model organisms
128
.
In addition to primary PBCs, a series of cell lines have been developed to study diabetes
and insulin-related biology. There are around fifteen commonly used PBC-mimicking lines derived
from human, mouse, rat, and hamster cells
129
. Due to the limited availability of primary
beta cells, as a consortium we decided to use human- and rodent-derived cell lines for early method
development. The experiments reported in this chapter represent the initial PBCC proteomics,
transcriptomics, and soft X-ray tomography imaging efforts for characterizing insulin-producing
cells. Along with accessibility and continuous growth and division, one major advantage to using
cell lines in PBC research is the consistency and reproducibility achievable in using batches of
clonal cells. However, there are also major disadvantages in using cell lines. One prominent
disadvantage is that cells undergo changes during continuous growth, hence their evolution may
cause a divergence from normal PBC phenotypes. Additionally, cell lines often have abnormal
chromosome content, a result of the immortalization process, as well as other genetic mutations
that may cause atypical protein expression patterns and altered metabolism. Furthermore, cultured
cell lines fail to recapitulate the cell to cell interactions found in the complex environment of
pancreatic islets. However, despite these disadvantages, cell lines also offer many research
opportunities. Ultimately, the goal of the PBCC, and the diabetes research field in general, is to
understand human insulin-producing PBCs. Hence, the precursor omics and imaging of PBC cell
lines presented here represent the development of methods that seek to contribute to future
modeling efforts by the PBCC using primary human PBCs.
In developing the initial omics and imaging approaches to generate component and
ultrastructure information for structural PBC models, work began with the human-derived PBC
line 1.1B4, and rodent-derived PBC line INS1E. The 1.1B4 line was produced using electrofusion
between the immortal human PANC-1 epithelial cell line and freshly isolated human PBCs, and it
produces insulin in response to the nutrients, hormones, and drugs that act as stimulators or
inhibitors in normal PBCs
130
. INS1E is a popular PBC line stemming from rats, developed by Asfari et al.
131
.
It is insulinoma-derived and displays many important PBC characteristics including a response to
glucose stimulation with a concentration-dependence curve similar to that of rat islets
129
.
The efforts presented in this chapter are focused on determining the baseline proteomic and
transcriptomic inventory, as well as the ultrastructural organization of commonly studied PBC
lines. Through these omics and imaging efforts, lists of representative protein IDs were generated
for the 1.1B4 and INS1E lines and for mitochondria-, nucleus-, and membrane-specific proteomes,
and an ultrastructural map was determined for INS1E. In addition, proteomic separation techniques
for expanding protein ID discovery were tested on all four cell lines, contributing to method
development for future PBCC proteomic efforts.
PBC omics for component lists
Though there is a wealth of PBC omics data for rodent models and cell lines, currently only
a limited number of transcriptomics and even fewer proteomics datasets are available for human
PBCs. Studies with human samples have mostly been carried out on whole pancreatic islets due to
the difficulty of segregating PBCs from islets, limiting their utility in this context. The available
transcriptomics data for human PBCs were compiled and curated, then compared with the sole
available proteomics dataset of PBCs.
This assessment showed that the proteomics analysis represents a partial survey of the PBC
proteome. There are only 750 proteins included in the proteomics dataset and 11,606 genes are
available via transcriptomics corresponding to 11,700 proteins. Supplementary tables with
comparisons of proteome and transcriptome from literature are provided in the Appendix. Many
important proteins are missing from the proteomics data, including G protein-coupled receptors
(GPCRs) e.g., glucagon-like peptide 1 receptor (GLP-1R) and other cell membrane proteins that
are known to be expressed in human PBCs and are instrumental in glucose and insulin signaling.
Hence, there is a need for global proteomic analyses on human PBCs and single cell RNA-seq for
a more descriptive analysis and also to capture the variability between individual cells.
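The comparison between the proteomics and transcriptomics inventories amounts to set operations over protein identifiers. The sketch below illustrates the idea, assuming both lists have already been mapped to a common identifier space; the function name and the toy identifiers are hypothetical and do not reproduce the curated datasets.

```python
from typing import Dict, Iterable, Set, Union


def compare_inventories(
    proteome_ids: Iterable[str], transcriptome_ids: Iterable[str]
) -> Dict[str, Union[Set[str], float]]:
    """Report overlap between detected proteins and transcript-derived proteins."""
    proteome = set(proteome_ids)
    transcriptome = set(transcriptome_ids)
    shared = proteome & transcriptome
    return {
        "shared": shared,
        "missing_from_proteome": transcriptome - proteome,
        "proteome_only": proteome - transcriptome,
        # Fraction of transcript-derived proteins also seen by proteomics.
        "coverage": len(shared) / len(transcriptome) if transcriptome else 0.0,
    }


# Toy identifiers for illustration only.
result = compare_inventories(["INS", "GCK"], ["INS", "GCK", "GLP1R", "KCNJ11"])
```

With the counts reported above, coverage can be at most 750 / 11,700, or about 6.4%, and that upper bound is reached only if every detected protein also appears in the transcriptome-derived list.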
For cell modeling considerations related to component inventory, another crucial piece of
information is the respective protein copy numbers per cell, as well as the variability of the copy
numbers across individual cells. The relative abundance of proteins in a cell population can in
principle be extracted from proteomics datasets. However, this information is still lacking due to
the limited number of proteomics datasets available for human PBCs. Recently, new techniques
have been developed for single cell proteomics, but they have not yet been widely applied
132-134
.
A complete model of a cell also requires similar quantitative information about metabolites,
lipids, polysaccharides, and other molecules, including their identities and relative concentrations.
Partial data is available from omics datasets for rodent cells/islets and human islets, but not for
isolated human PBCs
135-139
. A complete list of the components of the cell is necessary to begin to
understand what processes occur inside a cell as a result of the interactions and reactions of these
components.
Structure availability analysis
Because the overall focus here is on building multiscale 3D models of PBCs, in addition
to the list and relative abundance of component proteins, their atomic structures are also required.
The number of available atomic protein structures was determined for the 11,700 PBC proteins
in the list. As of 2018, only about 28% of the proteins had either an experimental structure in the
PDB
140
with a sequence coverage ≥75% or a reliable homology model in SWISS-MODEL
141
with
a sequence coverage ≥75% and a normalized-QMEAN
142
≥0.6. It is estimated that about 10% of
proteins are structurally disordered, with a fraction of disordered residues ≥50%, while more than
60% of proteins have no reliable 3D structures
143
. This lack of structural knowledge is even more
pronounced when considering protein complexes, which have not yet been identified exhaustively
let alone structurally characterized
42, 144
.
The PBC proteins identified in the analyses of transcriptomic and proteomics data were
categorized according to the amount of available structural information (Figure 5). The
categories were: 1) Experimental Structures, which
consisted of entries in the Protein Data Bank (PDB) with sequence coverage ≥75%; 2) Reliable
Homology Models, which consisted of existing SWISS-MODEL structures with sequence
coverage ≥75%, and a normalized-QMEAN ≥0.6; 3) Disordered Proteins, which consisted of
entries in the Database of Disordered Protein Predictions (D2P2) with more than 50% of residues
predicted to be disordered; and 4) Unknown, which consisted of proteins with no reliable
experimental structure, no reliable homology model, and fewer than 50% of residues predicted to
be disordered. Three transcriptomics datasets were used to create a combined list of genes
expressed in PBCs
145-147
.
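The four categories described above amount to a simple decision rule. The following is a minimal sketch of that rule, not the analysis code used for this work. The record fields (pdb_coverage, model_coverage, model_qmean, disorder_fraction) are hypothetical names introduced for illustration, and the order in which the rules are tested is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProteinRecord:
    """Hypothetical per-protein summary; field names are illustrative only."""
    pdb_coverage: Optional[float]    # best PDB sequence coverage (0-1), None if no entry
    model_coverage: Optional[float]  # best SWISS-MODEL coverage (0-1), None if no model
    model_qmean: Optional[float]     # normalized QMEAN of that model, None if no model
    disorder_fraction: float         # fraction of residues predicted disordered (0-1)


def structural_category(p: ProteinRecord) -> str:
    """Assign one of the four categories; rule order is an assumption."""
    if p.pdb_coverage is not None and p.pdb_coverage >= 0.75:
        return "Experimental Structure"
    if (p.model_coverage is not None and p.model_qmean is not None
            and p.model_coverage >= 0.75 and p.model_qmean >= 0.6):
        return "Reliable Homology Model"
    if p.disorder_fraction > 0.5:
        return "Disordered Protein"
    return "Unknown"


# Toy records, not taken from the PBC protein list.
print(structural_category(ProteinRecord(0.9, None, None, 0.1)))
print(structural_category(ProteinRecord(0.4, 0.8, 0.7, 0.1)))
print(structural_category(ProteinRecord(None, 0.8, 0.5, 0.7)))
print(structural_category(ProteinRecord(None, None, None, 0.2)))
```

Testing experimental structures first means that a protein with both a PDB entry and a homology model is counted once, under the experimental category.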
There were approximately 13,400 unique genes in all three datasets combined. Reads Per
Kilobase of transcript per Million mapped reads (RPKM), Fragments Per Kilobase of transcript per
Million mapped fragments (FPKM), and Transcripts Per Million (TPM) are all normalization
methods used to convert raw transcriptomic read counts into estimates of transcript abundance in
biological samples. To improve confidence in the literature analysis of human PBC transcriptomics
data, the working list was limited to those genes present in at least two of the three datasets and
with an RPKM, FPKM, or TPM value ≥1.
This resulted in 11,606 genes for which 11,700 protein products were identified in UniProtKB
148
.
The list of transcripts from the three reports that met these criteria is available in the Appendix.
For the proteomics dataset, the sole available quantitative proteomics dataset specific to human
PBCs was used, which reported only 750 proteins
149
.
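The normalization and filtering steps described above can be illustrated with a short sketch. It assumes that a gene counts as present in a dataset when its normalized value is at least 1, which is one reading of the criterion in the text. The function names and the toy values are hypothetical and are not taken from the three published datasets.

```python
from collections import Counter
from typing import Dict, Iterable, List


def rpkm(read_count: float, length_bp: float, total_mapped_reads: float) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def tpm(read_counts: List[float], lengths_bp: List[float]) -> List[float]:
    """Transcripts per million: normalize by length first, then scale to 1e6."""
    rates = [count / length for count, length in zip(read_counts, lengths_bp)]
    total_rate = sum(rates)
    return [rate / total_rate * 1e6 for rate in rates]


def consensus_genes(datasets: Iterable[Dict[str, float]],
                    min_value: float = 1.0,
                    min_datasets: int = 2) -> List[str]:
    """Genes whose value is >= min_value in at least min_datasets datasets."""
    hits = Counter()
    for dataset in datasets:
        for gene, value in dataset.items():
            if value >= min_value:
                hits[gene] += 1
    return sorted(gene for gene, n in hits.items() if n >= min_datasets)


# Toy values only, for illustration.
toy = [
    {"GENE_A": 12.0, "GENE_B": 0.4, "GENE_C": 3.1},
    {"GENE_A": 8.5, "GENE_B": 2.0},
    {"GENE_A": 20.1, "GENE_C": 0.9},
]
print(consensus_genes(toy))
```

Because RPKM, FPKM, and TPM are scaled differently, applying the same threshold of 1 to each is a pragmatic cutoff rather than a strictly equivalent one across datasets.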
PBC protein classes and structural coverage
Figure 4. Structural coverage of PBC proteins identified in analyses of: (a) transcriptomic; and (b)
proteomics data, which were categorized according to the amount of available structural information.
Disordered content information came from the D2P2 database and homology models were downloaded
from SWISS-MODEL.
Proteomic and transcriptomic profiling of PBC lines 1.1B4 and INS1E
After determining gaps in available data regarding PBC component inventory, omics
inventories were created while developing methods to be used for further PBC profiling. Work
began with the 1.1B4 cell line in order to study human insulin production and secretion, since this
line is derived from human pancreatic cells. Following early tests with 1.1B4, efforts were
extended to include the popular rodent PBC line, INS1E. To develop an initial PBC component
inventory for the PBCC's modeling efforts, proteomic approaches were used to identify the
proteins, and transcriptomics was used to identify the mRNA present in 1.1B4 and INS1E cells.
Through the omics efforts presented in this chapter, a representative protein ID list was generated
for 1.1B4 and INS1E; mitochondria-, nucleus-, and membrane-specific proteome lists were
generated for INS1E; and proteomic separation techniques for expanding protein ID discovery
were tested on both cell lines, contributing to method development for future PBCC proteomics
efforts.
Following the determination of components for each cell line, gene list analyses were
performed using PantherDB
150
. PantherDB enabled the determination of: 1) the family and
subfamily of the determined components, which are, respectively, groups of evolutionarily related
proteins and groups of related proteins that also share the same function; 2) the molecular
function, or the function of the protein by itself or with directly interacting proteins at a
biochemical level, e.g. a protein kinase; 3) the biological process, or the function of the protein in
the context of a larger network of proteins that interact to accomplish a process at the level of the
cell or organism; and 4) the pathway, which is similar to a biological process but also explicitly
specifies the relationships between the interacting molecules.
Ultrastructure profiling of PBC line INS1E via soft X-ray tomography
To construct a comprehensive model that accurately represents the 3D structure,
organization, and function of a PBC, not only was a complete list of PBC component identities
needed, like those obtained for proteins and transcripts through proteomics and transcriptomic
efforts, but also, among other data, the basic ultrastructural architecture of the cell. This included
the overall shape and size of the cell, along with abundance, location, and membrane topology of
the organelles within the target PBC.
X-ray and cryo-electron tomography are capable of determining the ultrastructural
architecture of PBCs, each at different spatial scales. X-ray tomograms can be used to extract
information regarding quantity, location, shape, and size of the nucleus, Golgi, mitochondria, and
other organelles. At a higher resolution, cryo-electron tomography can elucidate the topology of
membranes and compartments within a cell, as well as reveal individual macromolecular
complexes. Even the low resolution electron tomograms taken a decade ago by serial sectioning
of high-pressure frozen and plastic-embedded cells showed a surprisingly tight packing of insulin
vesicles inside islet PBCs
27, 151
. However, with recent methodological advances including focused
ion beam milling, direct detectors, and phase plates, the resolution and contrast of captured images
can be increased so that even macromolecules can be visualized
152-153
. Computational methods
like template-based search
154-155
and template-free pattern mining
156-157
are also under
development for the detection of macromolecules in cellular tomograms. With further
technological advances and increased resolution, the combination of X-ray and cryo-electron
tomographic data along with accurate computational techniques will be key for creating spatial
and temporal distribution maps of the organelles and macromolecular complexes of PBCs under
various environmental conditions
52, 154
.
Within the PBCC, soft X-ray tomography was used to determine baseline ultrastructural
maps of both 1.1B4 and INS1E. In this way it was possible to distinguish not only large
membranes like that of the cell itself, but also organelle-level features like the mitochondria
and nucleus, down to the level of individual insulin-containing vesicles. Through imaging
and omics approaches, lists of state-specific PBC proteins and transcripts, as well as the
ultrastructural makeup of PBCC macrocomponents, were generated. Cell modeling and data integration efforts
within the PBCC are now focused on connecting information from these modalities through 3D
and 4D modeling applications to develop approaches for modeling the cell from the atomic scale
upward.
Results and Discussion
Global Proteomic Analysis in 1.1B4 Whole Cell Lysates through Comparative Fractionation Techniques
With the goal of generating a PBC proteomic inventory to be used for further cell modeling
efforts, initial experiments focused on finding a suitable cell line for method development. The
first PBC cell line obtained was 1.1B4, supplied by Sigma Aldrich. Initial tests used protein
fractionation techniques to optimize methods for discovering the largest number of protein IDs in
whole-cell lysates. Fractionation methods included a standard methanol precipitation, DiffPop and
short-DiffPop, developed by the Yates lab at TSRI, and high pH reversed-phase separations.
DiffPop and short-DiffPop are both serial protein precipitation methods that use increasing
methanol and acetic acid gradients to separate complex protein samples over several iterations
158
.
High pH reversed-phase separation is also a commonly used prefractionation method where a basic
liquid mobile phase is passed through a non-polar stationary phase to generate a time- and
hydrophobicity-based spectrum of separated eluents
159
. After samples were processed using
complexity reduction methods, each was precipitated, trypsinized, and profiled via mass
spectrometry. Raw mass spectrometry (MS) files were processed using the Integrated Proteomics
Pipeline, developed by the Yates lab
160
. The Integrated Proteomics Pipeline uses proprietary tools
like RawConverter, ProLuCID, and DTASelect, to identify peptides from raw mass spectra, match
them to parent proteins, and perform statistical significance assessments on the results. A diagram
of the overall sample preparation and analysis workflow is presented below (Figure 6) along with
a diagram representing the Integrated Proteomics Pipeline workflow for MS data processing
(Figure 7).
Figure 5. Sample preparation and analysis pipeline for PBC proteomics. Cell lines are cultured, lysed,
fractionated, precipitated, and acidified before running samples on the mass spectrometer. Following the
mass spectrometer profiling step, the Integrated Proteomics Pipeline was used to match spectra to peptide
fragments, and to match peptide fragments to parent proteins.
Figure 6. Integrated proteomics pipeline to identify peptides from raw mass spectra, match peptides to
parent proteins, and perform statistical significance assessments on the results. Raw MS data is converted
from binary to text format for processing using RawConverter. Peptide spectral matching is performed to
identify peptide fragments present in the sample using ProLuCID. Peptides are then matched to parent
proteins and assessed for statistical significance using DTASelect.
As a control method and metric for peptide identification validation, false discovery rates
were measured for each experiment. The false discovery rate approach, which yields the ratio of
falsely identified peptides to the total number of identified peptides, is the most widely accepted
method for validating proteomics results
161
. After a mass spectrum is acquired, a scoring function
is used to evaluate the similarity between peaks in the experimental spectrum against known
peptide spectral peak features. Software compares the experimental peaks to potential database
matches, scores each potential match for similarities, and then selects the highest scoring potential
match as the peptide identity for that peak in the experimental spectrum. In this study, the target-
decoy false discovery rate method was used to estimate the false discovery rate of identified
peptides
162
. The target-decoy method works by comparing experimental spectra to both a target
database of peptide-peak features as well as a decoy database of the same size that contains
contrived peptide-peak features. With an optimized decoy database, the number of false peptide
identifications will be evenly distributed between the target and decoy databases. Given that all of
the decoy database identifications are known to be false, the false discovery rate can be estimated
by the number of decoy database matches divided by the total number of peptide matches.
Community standards in proteomics suggest using a threshold false discovery rate of no higher
than 1%, which was the threshold used in experiments presented here
163
.
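The target-decoy estimate described above can be sketched in a few lines. This is an illustration only and is not the implementation used by DTASelect or the Integrated Proteomics Pipeline. It uses the ratio stated in the text (decoy matches divided by total matches). The representation of each match as a (score, is_decoy) pair and both function names are assumptions, and tied scores receive no special treatment.

```python
from typing import List, Optional, Tuple


def estimate_fdr(n_decoy: int, n_total: int) -> float:
    """Decoy matches divided by total matches, as described in the text."""
    return n_decoy / n_total if n_total else 0.0


def score_cutoff_for_fdr(matches: List[Tuple[float, bool]],
                         max_fdr: float = 0.01) -> Optional[float]:
    """Lowest score cutoff whose accepted matches keep the estimated FDR <= max_fdr.

    matches: hypothetical (score, is_decoy) pairs, higher score = better match.
    Returns None if no cutoff satisfies the threshold.
    """
    ordered = sorted(matches, key=lambda m: m[0], reverse=True)
    best = None
    n_decoy = 0
    for n_total, (score, is_decoy) in enumerate(ordered, start=1):
        n_decoy += int(is_decoy)
        if estimate_fdr(n_decoy, n_total) <= max_fdr:
            best = score
    return best


# Toy example: 199 target matches with scores 200..2, then three low-scoring decoys.
toy_matches = [(float(s), False) for s in range(200, 1, -1)]
toy_matches += [(1.5, True), (1.2, True), (1.1, True)]
print(score_cutoff_for_fdr(toy_matches, max_fdr=0.01))
```

Accepting every match at or above the returned score keeps the estimated false discovery rate at or below the 1% community threshold for this toy list.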
Separately, using standard methanol precipitation, DiffPop, short-DiffPop, and high pH
reversed-phase chromatography, each sample type yielded a protein ID count in the low thousands.
MS and protein identification results for 1.1B4 with each of the four methods are presented in Table
1. For 1.1B4 cells, DiffPop fractionation yielded the largest number of unique protein IDs at 2063,
while the short-DiffPop fractionated sample yielded the fewest unique IDs at 1111. Total
peptide and total protein amounts varied within an order of magnitude between the four methods,
again with DiffPop producing the largest quantity of each. This result can largely be attributed to
the greater number of sample fractions processed for the DiffPop dataset, and this is also observed
in the High pH RP sample containing the second largest number of individual IDs. In total, from
the four methods, 2371 unique proteins were identified in the 1.1B4 whole-cell lysates. Of the 750
proteins found in the Brackeva et al. human pancreatic beta cell proteomics dataset analyzed for
structural availability earlier, 486 of those proteins were also found in the proteomic profiling of
1.1B4 cells presented here.
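Combining the ID lists from the four methods and comparing them with a reference dataset reduces to set operations. The sketch below shows the idea with toy accession labels. The function names are hypothetical, and the toy IDs do not correspond to proteins from the 1.1B4 runs or from the Brackeva et al. dataset.

```python
from typing import Dict, Iterable, Set, Tuple


def combine_ids(ids_by_method: Dict[str, Iterable[str]]) -> Set[str]:
    """Union of protein IDs across fractionation methods (duplicates collapse)."""
    combined: Set[str] = set()
    for ids in ids_by_method.values():
        combined.update(ids)
    return combined


def overlap_with_reference(found: Set[str], reference: Set[str]) -> Tuple[int, float]:
    """Number of reference IDs that were also found, and that number as a fraction."""
    shared = found & reference
    return len(shared), len(shared) / len(reference)


# Toy ID lists, for illustration only.
toy_runs = {
    "Standard MeOH": ["P1", "P2", "P3"],
    "DiffPop": ["P2", "P3", "P4", "P5"],
    "Short DiffPop": ["P1", "P5"],
    "High pH RP": ["P3", "P6"],
}
toy_reference = {"P2", "P6", "P9", "P10"}

combined = combine_ids(toy_runs)
print(len(combined), overlap_with_reference(combined, toy_reference))
```

Applied to the counts reported above, 486 shared proteins out of the 750 in the reference dataset corresponds to roughly 65% of that dataset.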
Table 1. Comparison of discovered protein IDs in 1.1B4 whole-cell lysates using four
prefractionation methods.
Sample type     Fractions   Individual    Total Peptide   Total Spectral   False Discovery
                            Protein IDs   Count           Count            Rate
Standard MeOH   2           1348          18088           59563            0.93%
DiffPop         10          2063          54494           139296           0.90%
Short DiffPop   3           1111          15906           36287            0.89%
High pH RP      8           1573          24166           11124            0.93%
Total           23          2371          112654          346390           0.91%
Using PantherDB, the 2371 protein IDs were matched against gene ontology classifications
for molecular function, biological process, cellular component, and protein class. From the list of
1.1B4 proteins, the most frequent molecular function was found to be binding activity. According
to the Gene Ontology Consortium’s definition, binding activity is synonymous with ligand
activity, or the selective, non-covalent, often stoichiometric, interaction of a molecule with one or
more specific sites on another molecule. This class contains further subclasses including protein
binding, carbohydrate binding, cofactor binding, ion binding, heterocyclic compound binding,
neurotransmitter binding, and more. The 894 protein IDs within the binding gene ontology
definition fell mostly within the protein binding subclass.
Figure 7. 1.1B4 whole-cell lysate, molecular function gene ontology analysis
Chart data, 1.1B4 whole cell lysate, molecular function (category, protein IDs, share):
Transporter Activity (GO:0005215)               103    4%
Translation Regulator Activity (GO:0045182)     17     1%
Transcription Regulator Activity (GO:0140110)   58     2%
Catalytic Activity (GO:0003824)                 753    32%
Molecular Function Regulator (GO:0098772)       69     3%
Molecular Transducer Activity (GO:0060089)      28     1%
Structural Molecule Activity (GO:0005198)       164    7%
Binding (GO:0005488)                            894    38%
Other                                           285    12%
The combined list of 2371 unique protein IDs from each whole-cell lysate MS run was
further broken down by protein class using the PantherDB. The most representative protein class
within the 1.1B4 proteome was nucleic acid binding proteins, followed by hydrolases, enzyme
modulators, and transferases. Within the nucleic acid binding class, more than two-thirds of the
protein IDs corresponded to RNA binding protein activity, with the next-level subclassifications
corresponding to ribosomal proteins, mRNA processing factors, and translation factors. Given that
the nucleic acid binding protein category is overrepresented by proteins involved in ribosomal
protein translation, this suggests that the 1.1B4 cells possess a large degree of protein production
capabilities. Further, given that PBC function is centered around the process of synthesizing and
secreting insulin, which is a protein itself, the observation that translation-involved proteins are
among the most frequently observed within the cell fits expectations for a hormone producing line.
Figure 8. 1.1B4 whole-cell lysate, protein class gene ontology analysis
Chart data, 1.1B4 whole cell lysate, protein class (category, protein IDs, share):
nucleic acid binding (PC00171)       435    25%
hydrolase (PC00121)                  231    14%
enzyme modulator (PC00095)           155    9%
transferase (PC00220)                144    8%
cytoskeletal protein (PC00085)       139    8%
oxidoreductase (PC00176)             90     5%
transcription factor (PC00218)       83     5%
transporter (PC00227)                75     4%
membrane traffic protein (PC00150)   69     4%
other                                299    18%
The combined list of unique 1.1B4 whole-cell lysate protein IDs was also analyzed using
the Consensus Path DataBase (CPDB) from the Max Planck Institute for Molecular Genetics to
determine the identities of enriched pathways
164
. Of hundreds of enriched pathways present in the
dataset, the two most enriched were protein and general metabolism, while signal transduction was
third, and vesicle-mediated transport and membrane trafficking were twelfth and thirteenth,
respectively. Each of these categories would be relevant to a cell responsible for sensing nutrients
and secreting hormones, so the pathway enrichment analysis seemed to support the notion that the
1.1B4 cell line could be a reasonable cell model to recapitulate insulin production and secretion,
and ultimately PBC-related phenotypes.
Despite learning that the 1.1B4 cells have hormone exocytosis-related activity, no insulin
was detected in the MS runs. This was counter to expectations given that the cell line is reported
as an insulin producing, PBC recapitulating, cell line
130
. To follow up, the PBCC looked for insulin
presence using orthogonal methods such as transcriptomics and radiometric assays and requested
that collaborators repeat the MS runs as a control. Radiometric insulin assays as well as
collaborator-repeated proteomics assays also showed no presence of insulin in the 1.1B4 cell line.
Transcriptomics, however, did give an indication that insulin may be present in the cells.
1.1B4 Proteomics and Transcriptomics Consistency Analysis
The PBCC performed RNA-Seq transcriptomics experiments on 1.1B4 cells to generate an
ID list and quantify measures of mRNA in the PBC line. 1.1B4 RNA-Seq yielded a total of
5,142,805 detected transcripts, 5331 unique transcript IDs, and 5142 IDs with 10 or more
transcripts. Of the more than five-million transcripts, only 96 were insulin mRNA. This number is
surprisingly low given the expectation that the 1.1B4 line recapitulates PBC glucose-stimulated
insulin production and secretion. In addition to the scarcity of insulin transcripts, there were also no
transcripts present for GLP1R, glucagon, or glucagon receptor, each a well-characterized feature of PBCs.
Using the CPDB, the list of transcripts and observed frequency were analyzed for enriched
pathways. The top five enriched pathways were identical between the 1.1B4 proteomics and
transcriptomics data sets, suggesting a degree of consistency between workflows for the two
techniques. It was also noticed that the vesicle-mediated transport and membrane trafficking
pathways were slightly more enriched in the transcriptomics dataset compared to the proteomics
data. In the transcriptomics data, the two enriched pathways appeared as the tenth and eleventh
most enriched, while they appeared as the twelfth and thirteenth most enriched in the proteomics
data.
Next, several approaches were used to compare the 1.1B4 proteome and transcriptome.
The cells were cultured in consistent conditions, so the comparison served to check the relative
correlation between the 1.1B4 proteome and transcriptome in terms of component frequency. 1775
common IDs were found between the two lists. The top fifteen most abundant IDs present in both
the proteomics and transcriptomics datasets are listed in Table 2. The ten most abundant commonly
identified transcripts/proteins were B2M, GNAS, MYL6, PSAP, RPLP1, RPL37A, SSR4, RPS14,
RPL8, and ALDOA. Notable within the top ten are: GNAS, which is the alpha-s G-protein, a key
regulator of the insulin synthesis and secretion pathways in PBCs; RPLP1, RPL37A, RPS14, and
RPL8, which are ribosomal components; SSR4, which is a protein responsible for binding calcium
to the ER membrane, a process that contributes to controlling the release of insulin in PBCs; and
MYL6, which is a myosin light chain, a known PBC protein
165
.
Table 2. Top fifteen most abundant 1.1B4 genes/proteins present in both transcriptome and proteome
datasets, ranked by transcript frequency.
Gene     Protein                                                                 Transcriptome Rank   Proteome Rank
B2M      Beta-2-microglobulin                                                    5                    55
GNAS     Guanine nucleotide-binding protein G(s) subunit alpha isoforms short    9                    1016
MYL6     Myosin light polypeptide 6                                              15                   486
PSAP     Prosaposin                                                              17                   301
RPLP1    60S acidic ribosomal protein P1                                         18                   128
RPL37A   60S ribosomal protein L37-A                                             19                   76
SSR4     Translocon-associated protein subunit delta                             20                   732
RPS14    40S ribosomal protein S14                                               21                   81
RPL8     60S ribosomal protein L8                                                23                   52
ALDOA    Fructose-bisphosphate aldolase A                                        26                   69
RPL3     60S ribosomal protein L3                                                27                   58
RPL36    60S ribosomal protein L36                                               28                   66
RPL31    60S ribosomal protein L31                                               29                   80
RPL30    60S ribosomal protein L30                                               30                   109
GAPDH    Glyceraldehyde-3-phosphate dehydrogenase                                32                   26
To further analyze the degree of correlation, the two ID lists with their corresponding
frequency measures were assessed for the degree of monotonic dependence using Spearman’s rank
correlation analysis. The Spearman’s rank correlation coefficient is a nonparametric measure of
rank correlation, or in other words, a measure of statistical dependence between the rankings of
two variables. The Spearman correlation between two variables will approach a perfect correlation
of 1.0 when observations, such as transcript and protein frequency, have a similar rank for each
independent variable, such as gene or protein type. The comparative analysis of 1.1B4
transcriptomics and proteomics data yielded a Spearman Correlation Coefficient of 0.256, which
suggests there is a weak positive monotonic relationship between the ranked lists
166
. Further, this Spearman’s
rank correlation result for 1.1B4 transcriptomics and proteomics identifications is consistent with
previously reported rank-comparisons between transcriptomics and proteomics identifications in
a variety of cell types
167-169
. A positive monotonic relationship is one in which as the value of one
variable increases, so does the value of the other variable. Each independent variable, which in this
case is the identity of the gene-protein pair, has two dependent variables associated with it, a
transcript frequency ranking and a protein frequency ranking. The resulting Spearman’s
Correlation Coefficient value, and the relationship it implies, therefore gave confidence that the
results from the two omics techniques were broadly consistent in the type and abundance of the
identified gene and protein pairs.
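The rank correlation described above can be reproduced in outline as follows. This is a minimal standard-library sketch, not the analysis code used for this work. It assumes each dataset is available as a mapping from a shared gene identifier to an abundance measure, and it restricts the comparison to identifiers present in both lists. The function names and toy values are hypothetical.

```python
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from most to least abundant; ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; applied to ranks it gives Spearman's coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_shared(transcripts: Dict[str, float], proteins: Dict[str, float]) -> float:
    """Spearman correlation over the IDs found in both datasets."""
    shared = sorted(set(transcripts) & set(proteins))
    transcript_ranks = average_ranks([transcripts[g] for g in shared])
    protein_ranks = average_ranks([proteins[g] for g in shared])
    return pearson(transcript_ranks, protein_ranks)


# Toy abundances, for illustration only.
toy_rna = {"G1": 500.0, "G2": 120.0, "G3": 40.0, "G4": 5.0}
toy_protein = {"G1": 90.0, "G2": 30.0, "G3": 45.0, "G5": 10.0}
print(spearman_shared(toy_rna, toy_protein))
```

Because only ranks enter the calculation, the coefficient is insensitive to the different units of transcript counts and spectral counts.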
The relationship between the two ranked ID lists from 1.1B4 proteomics and
transcriptomics experiments was also visualized using the Rank Rank HyperGeometric Overlap
tool from the Crump lab
170
. The Rank Rank HyperGeometric Overlap algorithm serves to compare
independent high-throughput gene-expression experiments for significant overlap between the
identities and frequencies of the profiled components. The algorithm steps through two ranked
lists, in this case transcripts identified in RNA-seq and proteins identified with mass-spectrometry
proteomics, and successively measures the statistical significance of the number of overlapping
features, such as the consistency in rankings of gene-protein pairs identified by the two omics
techniques. The output is a graphical heatmap showing the strength, pattern, and bounds of
correlation between the two ranked lists. The heatmap color corresponds to the -log of the
hypergeometric p-value, which is the probability of observing at least that much overlap between
the top-ranked transcripts and proteins by chance. For the comparison of gene expression signatures between
two ranked lists, the algorithm takes an input of step-size, which is used to bin ranked items during
run-time to balance calculation efficiency versus output resolution. Using a step-size of 10, the
ranked list yielded the profile shown in Figure 10. The output heatmap suggests a monotonic
relationship between the two datasets given the significant overlap, shown in red, between the
highest frequency transcripts and proteins.
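The stepping procedure described above can be sketched as follows. This is a simplified illustration of the idea and not the published Rank Rank HyperGeometric Overlap tool. It assumes both ranked lists contain the same identifiers, tests only for enrichment (more overlap than expected by chance), and applies no multiple-testing correction. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def hypergeom_tail(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for a hypergeometric draw without replacement."""
    upper = min(successes, draws)
    total = math.comb(population, draws)
    favorable = sum(
        math.comb(successes, x) * math.comb(population - successes, draws - x)
        for x in range(k, upper + 1)
    )
    return favorable / total


def rank_overlap_matrix(list_a: Sequence[str], list_b: Sequence[str],
                        step: int = 10) -> List[List[float]]:
    """-log10 hypergeometric p-value for the overlap of the top i and top j items."""
    if set(list_a) != set(list_b):
        raise ValueError("both ranked lists must contain the same IDs")
    n = len(list_a)
    cutoffs = list(range(step, n + 1, step))
    matrix = []
    for i in cutoffs:
        top_a = set(list_a[:i])
        row = []
        for j in cutoffs:
            overlap = len(top_a & set(list_b[:j]))
            p = hypergeom_tail(overlap, n, i, j)
            row.append(-math.log10(p) if p > 0 else float("inf"))
        matrix.append(row)
    return matrix


# Toy ranked lists: 20 IDs compared with themselves and with the reversed order.
ids = [f"ID{n}" for n in range(20)]
same = rank_overlap_matrix(ids, ids, step=5)
opposite = rank_overlap_matrix(ids, list(reversed(ids)), step=5)
print(same[0][0], opposite[0][0])
```

A larger step size reduces the number of cutoff pairs that must be evaluated, at the cost of a coarser heatmap, which is the efficiency versus resolution trade-off noted above.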
Figure 9. Rank Rank HyperGeometric Overlap visualization of relationship between ranked lists of 1.1B4
transcripts generated with RNA-seq and 1.1B4 proteins generated with mass-spectrometry proteomics.
Using a step-size of 10, the ranked list yielded the profile shown here. The output heatmap suggests a
monotonic relationship between the two datasets given the significant amount of overlap, shown in red,
between the highest frequency transcripts and proteins.
INS1E whole-cell and organelle proteomics
After 1.1B4 cells demonstrated limited PBC-related characteristics in the proteome and
transcriptome datasets, attention was turned to other PBC-recapitulating cell lines. The next line
the PBCC chose to pursue was INS1E, an insulinoma line derived from a pancreatic cancer
cell in Rattus norvegicus. In the literature, there are several demonstrations of INS1E
glucose-stimulated insulin secretion, so after obtaining the INS1E line, the series of experiments
was begun by testing the cells for this feature. Upon confirming the insulin secretion of the INS1E
line, the next goal was to develop a baseline list of protein components, similar to previous efforts
for characterizing 1.1B4.
In addition to whole cell lysate proteome ID lists, organelle specific ID lists were also
generated. Efforts focused on extracting the nucleus, mitochondria, cell membranes, and cytosol
fractions of INS1E. Organelle extractions were performed using fractionation kits based on
differential centrifugation from Abcam, Thermo, and Sigma, and validated using manufacturer
recommended antigens. The number of unique protein IDs discovered in each organelle extraction
is given in Table 3.
Table 3. INS1E protein identification summary for whole-cell lysate and organelle fractions. Unique
protein IDs contained at least one peptide and spectral count.
Sample type    Samples   Unique        Total Peptide   Total Spectral   False Discovery
                         Protein IDs   Count           Count            Rate
Cytosol        2         4664          85639           282662           0.92%
Mitochondria   2         2045          13673           25099            0.96%
Nucleus        2         3870          92667           294450           0.98%
Membranes      2         4782          100785          295358           0.97%
Whole Cell     2         7808          109031          271953           0.94%
Total          10        8825          401795          1169522          0.95%
With the goal of determining a baseline protein ID list for a cell that recapitulates
glucose-stimulated insulin synthesis and secretion as observed in PBCs, instances of insulin-related
biology were identified after obtaining the whole-cell and organelle-specific ID lists. Taking the 8825
total unique protein IDs from the combined datasets, it was observed that insulin-1 was within the
highest 10% of expressed proteins, with an abundance rank of 682. Further, insulin-2 was found
within the highest 20% of expressed proteins, with an abundance rank of 1053. This result added
credence to the conclusion that INS1E could serve as a reasonable line to target integrative whole-
cell modeling efforts. Again, taking the 8825 total unique protein IDs from the combined datasets,
along with peptide counts for each of the IDs, the PantherDB’s gene list analysis tools were used
to perform a cellular pathway overrepresentation analysis. Notably high on the list of enriched
pathways are two pathways critical to insulin synthesis and secretion: ‘protein targeting to ER’
with an enrichment of 3.28 and a false discovery rate of 0.0219, along with ‘ER to Golgi vesicle-
mediated transport’ with an enrichment of 2.60 and false discovery rate of 0.00000664.
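The abundance ranks quoted above can be converted to positions within the ranked list with a short calculation. The sketch assumes that rank 1 denotes the most abundant protein and that the ranks are taken over all 8825 unique IDs. The function name is hypothetical.

```python
def top_percent(rank: int, total: int) -> float:
    """Position of a rank within a ranked list, as a percentage from the top."""
    return 100.0 * rank / total


for name, rank in [("insulin-1", 682), ("insulin-2", 1053)]:
    print(f"{name}: within the top {top_percent(rank, 8825):.1f}% of 8825 protein IDs")
```

Under these assumptions, insulin-1 falls at roughly 7.7% and insulin-2 at roughly 11.9% from the top of the list, consistent with the top 10% and top 20% groupings stated in the text.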
Similar to efforts for 1.1B4 component characterization, the PBCC performed RNA-seq
transcriptomics on INS1E to profile the identity and frequency of mRNA transcripts present in the
cell line. INS1E RNA-Seq yielded a total of 11,155 unique transcript IDs, with 6081 IDs present
with 10 or more transcripts. Further, the two variants of rat insulin, ins1 and ins2, accounted for
the two most abundant ID types with 294,534 and 37,293 transcripts, respectively. Next, similar
to the analysis for consistency between 1.1B4 transcriptomics and proteomics results, the results
from INS1E proteomics and transcriptomics were compared using the Spearman’s rank correlation
analysis as well as the Rank Rank HyperGeometric Overlap tool. There were a total of 5885 shared
IDs between the INS1E proteomics and transcriptomics lists. In assessing the degree of monotonic
dependence between the rankings of these shared IDs found in both techniques, the Spearman
Correlation Coefficient was found to be 0.265, which, similar to results from 1.1B4, suggests there
is a weak positive monotonic relationship between the two ranked lists. Further, the strength, pattern, and
bounds of correlation between the two ranked lists were visualized using the Rank Rank
HyperGeometric Overlap tool (Figure 11). The output heatmap suggests a monotonic relationship
between the two datasets given the significant overlap, shown in red, between the highest
frequency transcripts and proteins.
Figure 10. Rank Rank HyperGeometric Overlap visualization of relationship between ranked lists of
INS1E transcripts generated with RNA-seq and INS1E proteins generated with mass-spectrometry
proteomics. Using a step-size of 10, the ranked list yielded the profile shown here. The output
heatmap suggests a monotonic relationship between the two datasets given the significant
amount of overlap, shown in red, between the highest frequency transcripts and proteins.
To further characterize the total list of unique protein IDs detected in INS1E, PantherDB’s
gene list analysis tools were used to find gene ontology classification categories for molecular
function and protein class. Similar to the results for 1.1B4 molecular function classification, the
INS1E protein ID list was most frequently represented by the binding and catalytic activity
categories, together comprising almost three-fourths of the total annotations. Within the binding
class, heterocyclic compound binding and protein binding are the two most observed
subclassifications. Further, within the catalytic activity class, hydrolase and transferase activity
were the two most abundant molecular function categories for the identified proteins. For protein
class characterizations, again the INS1E protein ID list was most frequently represented by the
nucleic acid binding category, with more than two-thirds of the annotations falling within the RNA
binding protein subclass, and the majority of these IDs corresponding to ribosomal proteins and
translation factors.
Figure 11. Distribution of INS1E total protein molecular function gene ontology classifications.
Chart data, INS1E total protein IDs, molecular function (category, protein IDs, share):
transporter activity (GO:0005215)               243    5%
transcription regulator activity (GO:0140110)   233    5%
catalytic activity (GO:0003824)                 1707   35%
molecular function regulator (GO:0098772)       195    4%
structural molecule activity (GO:0005198)       468    10%
binding (GO:0005488)                            1895   39%
other                                           108    2%
Figure 12. Distribution of INS1E total protein class gene ontology classifications.
Chart data, INS1E total protein IDs, protein class (category, protein IDs, share):
nucleic acid binding (PC00171)       1065   27%
hydrolase (PC00121)                  462    12%
transferase (PC00220)                348    9%
enzyme modulator (PC00095)           335    8%
transcription factor (PC00218)       305    8%
cytoskeletal protein (PC00085)       215    5%
oxidoreductase (PC00176)             210    5%
transporter (PC00227)                174    4%
membrane traffic protein (PC00150)   151    4%
ligase (PC00142)                     121    3%
other                                609    15%
The research presented here focused on determining the baseline inventory of protein
components within insulin-producing, PBC phenotype-recapitulating cells. Using MS-based
proteomics and RNA-seq transcriptomics, a representative protein ID list was developed for the
1.1B4 and INS1E lines, as well as cytosol-, mitochondria-, nucleus-, and membrane-specific protein
ID lists for INS1E. In addition, comparisons between prefractionation and traditional precipitation
techniques contributed to methodological development for future PBCC proteomic efforts. In all,
a total of 2371 proteins were observed in 1.1B4 using several proteomics techniques, while
transcriptomics discovered 5331 unique sequences of mRNA in the same cell line. For INS1E, a
total of 8825 unique proteins were identified with proteomics, and from organelle fractionation
techniques, 4664 cytosolic proteins, 2045 mitochondrial proteins, 3870 nuclear proteins, and 4782
membrane proteins were identified.
Ultrastructural mapping of 1.1B4 and INS1E via soft X-ray tomography
In parallel with efforts to generate a protein level inventory of PBC lines 1.1B4 and INS1E,
the two lines were profiled at the mesoscale using soft X-ray tomography, with the ultimate goal
of integrative modeling between the experimental modalities and others through the PBCC.
Through soft X-ray tomography, the cells in a near native cryo-fixed arrangement were observed
while visibly resolving scales from 100 nm to 10 µm [171]. Soft X-ray tomography was also
well-suited for rapid collection of tomographic datasets at a rate of over 10 per hour [172].
Mesoscale imaging revealed the cell’s membrane, nucleus, mitochondria, and in the case of PBCs,
insulin vesicles [173]. For these reasons, the PBCC set out to incorporate soft X-ray tomography
into its integrative data collection and modeling platform, and collection commenced on a series
of tomograms on 1.1B4 and INS1E.
After collecting soft X-ray tomograms on more than twenty 1.1B4 and INS1E PBCs, the
images were segmented to define regions of biological interest using Amira. Differentiating
between ultrastructural elements relied not only on size and shape, but also on the linear absorption
coefficient (LAC) value of a given component [174]. The samples absorb X-rays according to the
Beer-Lambert law, and the absorption is therefore a quantitative measure of composition, based on
thickness and chemical makeup. Densely packed regions of the cell, where there is a large carbon
content per unit space, yield greater LAC values, while solvated regions in the cell appear more
transparent in comparison. The LAC value was used to connect boundaries of organelles during the
segmentation process. Interestingly, in the samples, LAC values were largest for the dense core
insulin secretory granules, which contain tightly packed insulin, creating a highly carbon-dense
environment. Figure 13 gives a demonstration of the visual differences in LAC values for different
components within INS1E as well as experimentally determined numerical LAC values.
Figure 13. Identification of INS-1E ultrastructural components via LAC value
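The Beer-Lambert relationship behind the LAC can be sketched in a few lines. This is a minimal illustration, assuming the standard form I = I0 · exp(−µ·t); the function names and the numerical values are hypothetical and are not measurements from this work.

```python
import math


def transmitted_fraction(lac_per_um: float, thickness_um: float) -> float:
    """Fraction of X-ray intensity transmitted through a uniform slab."""
    return math.exp(-lac_per_um * thickness_um)


def linear_absorption_coefficient(i0: float, i: float, thickness_um: float) -> float:
    """Solve I = I0 * exp(-mu * t) for mu (units: 1/um)."""
    if i0 <= 0 or i <= 0 or thickness_um <= 0:
        raise ValueError("intensities and thickness must be positive")
    return math.log(i0 / i) / thickness_um


# Hypothetical example: a carbon-dense region absorbs more than a solvated one
# of the same thickness, so it transmits a smaller fraction of the beam.
dense_region = transmitted_fraction(lac_per_um=0.6, thickness_um=1.0)
solvated_region = transmitted_fraction(lac_per_um=0.2, thickness_um=1.0)
```

Because the LAC is normalized by path length, it can be compared between voxels, which is what makes it usable as a segmentation criterion.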
The segmented ultrastructural makeup of the PBC specimen generated from X-ray tomography
is currently being integrated with proteomics data from the same cell lines using 3D and 4D
modeling and interactive development platforms typically used for animation and game
development. This research will be described in more detail in Chapter 4.
Data from both PBC omics and imaging are also currently being used by the Olson and Goodsell
labs at the Scripps Research Institute to create static models of mitochondria based on
experimentally-determined ultrastructural architecture and omics level component frequency data.
They use applications like CellPack, developed in their lab, to generate data-driven structural
biological models [25]. As a part of the PBCC, these approaches, along with other modeling efforts
within the group, seek to develop methods for modeling other organelles in addition to the
mitochondria, ultimately working toward modeling PBCs themselves.
Methods
Structure availability analysis
To determine the availability of 3D structure data for each protein confidently identified in
whole-cell lysates and organelle fractions, an analysis was performed using SWISS-MODEL. Each
protein was analyzed to determine if it has an experimentally-solved structure in the PDB or a
reliable homology model in SWISS-MODEL. Experimentally-determined structures were counted
as present if the sequence coverage was ≥75%, and reliable homology models were counted as present
if the sequence coverage was ≥75% and the QMEAN score was ≥0.6.
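The two cutoffs above can be expressed as a small classification rule. The sketch below is an illustration only: the record layout and field names are assumptions, not the SWISS-MODEL output format, and the example entries are invented.

```python
from dataclasses import dataclass
from typing import Optional

COVERAGE_CUTOFF = 0.75  # minimum sequence coverage (fraction), from the text
QMEAN_CUTOFF = 0.6      # minimum model quality score, from the text


@dataclass
class StructureRecord:
    """Hypothetical per-protein summary; None means no entry was found."""
    protein_id: str
    pdb_coverage: Optional[float] = None
    model_coverage: Optional[float] = None
    model_qmean: Optional[float] = None


def classify(rec: StructureRecord) -> str:
    """Return 'experimental', 'homology', or 'none' for one protein."""
    if rec.pdb_coverage is not None and rec.pdb_coverage >= COVERAGE_CUTOFF:
        return "experimental"
    if (
        rec.model_coverage is not None
        and rec.model_qmean is not None
        and rec.model_coverage >= COVERAGE_CUTOFF
        and rec.model_qmean >= QMEAN_CUTOFF
    ):
        return "homology"
    return "none"


# Invented example records, for illustration only.
examples = [
    StructureRecord("P_A", pdb_coverage=0.92),
    StructureRecord("P_B", model_coverage=0.80, model_qmean=0.71),
    StructureRecord("P_C", model_coverage=0.80, model_qmean=0.45),
]
labels = [classify(r) for r in examples]
```

Checking the experimental structure first means a protein with both a PDB entry and a homology model is counted once, as experimentally solved.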
Cell culture
1.1B4 cells were cultured in RPMI media (with 10% FBS, 2 mM glutamine, and 1× penicillin-
streptomycin) and seeded at a density of 2.0 × 10⁴ cm⁻². INS-1E cells were cultured in Addex Bio
Optimized RPMI media supplemented with 10% FBS, 1× penicillin-streptomycin, and 0.05 mM
β-mercaptoethanol and seeded at a density of 8.0 × 10⁴ cm⁻². All cells were grown in either T25
or T75 flasks in 37°C incubators with 5% CO2. Prior to purchasing, all cell lines were authenticated
by each supplier. INS-1E cells were analyzed by COI assay by the supplier (Addex Bio) to ensure
no interspecies contamination [175-176]. Additionally, RNA-seq was performed to confirm the
presence of insulin, and the cells were observed under microscopy to identify changes in
morphology and attachment properties. Growth curves were maintained to monitor cell-doubling
time, growth trends and passage number.
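As a worked example of the seeding densities above, the number of cells per flask follows from density times growth area. The flask areas below are the nominal 25 cm² and 75 cm² implied by the T25 and T75 names, which is an assumption; actual growth areas vary by manufacturer.

```python
# Seeding densities from the text, in cells per cm^2.
SEEDING_DENSITY = {"1.1B4": 2.0e4, "INS-1E": 8.0e4}

# Nominal growth areas in cm^2 (assumed from the flask names).
FLASK_AREA_CM2 = {"T25": 25.0, "T75": 75.0}


def cells_to_seed(cell_line: str, flask: str) -> float:
    """Number of cells needed to reach the target density in one flask."""
    return SEEDING_DENSITY[cell_line] * FLASK_AREA_CM2[flask]


per_flask = {
    (line, flask): cells_to_seed(line, flask)
    for line in SEEDING_DENSITY
    for flask in FLASK_AREA_CM2
}
```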
Whole-cell lysates and organelle extracts
All extracts and lysates were analyzed by SDS-PAGE with Coomassie staining and quantified for
protein content by bicinchoninic acid assay (Pierce). Enrichment verification was performed via
western blot by loading 20 µg of extract onto an 8-12% Bis-Tris gel (Invitrogen), transferring to a
developing membrane via the iBlot Dry Blotting System (ThermoFisher), and developing using
HRP-conjugated secondary antibodies (Novus). Nuclei and cytosol were extracted using NE-PER
Nuclear and Cytoplasmic Extraction Reagents (ThermoFisher) following the manufacturer’s protocol.
Enrichment of nuclei was verified using western blot with anti-histone deacetylase antibody.
Mitochondria were extracted using the Mitochondria Isolation Kit for Cultured Cells (Abcam)
following the manufacturer’s protocol. Enrichment of mitochondria was verified using western blot
with anti-Cytochrome C Oxidase antibody. Plasma membrane was extracted using the Plasma Membrane
Protein Extraction Kit (Abcam) following the manufacturer’s protocol. Extraction of plasma
membrane was verified using western blot with anti-Calcium Pump PMCA2 antibody.
Protein precipitation and digestion
Lysates and extracts were precipitated in 50 µg aliquots using MeOH/CHCl3. Protein pellets were
then resuspended in 8 M Urea, 100 mM Tris pH 8.5, TCEP, and 2-chloroacetamide. Following
alkylation, samples were diluted with TEAB and trypsinized overnight. Trypsin was quenched
using formic acid.
MS analysis
Digested peptides from whole-cell lysates and organelle fractionations were analyzed on a
Fusion Lumos Orbitrap tribrid mass spectrometer (Thermo). The digest was injected
directly onto a 30 cm, 75 µm ID column packed with BEH 1.7 µm C18 resin (Waters). Samples
were separated at a flow rate of 400 nL/min on an nLC 1200 (Thermo). Buffer A and B were 0.1%
formic acid in water and 0.1% formic acid in 90% acetonitrile, respectively. A gradient of 1-25%
B over 180 min, an increase to 40% B over 40 min, an increase to 90% B over 10 min, and a hold at
90% B for a final 10 min was used, for a 240 min total run time. The column was re-equilibrated
with 20 µL of buffer A prior to the injection of sample. Peptides were eluted directly from the tip
of the column and nanosprayed directly into the mass spectrometer by application of 2.5 kV voltage
at the back of the column. The Orbitrap Fusion Lumos was operated in a data dependent mode. Full
MS scans were collected in the Orbitrap at 120K resolution with a mass range of 400 to 1500 m/z
and an AGC target of 4e5. The cycle time was set to 3 sec, and within this 3 sec the most abundant
ions per scan were selected for CID MS/MS in the ion trap with an AGC target of 2e4 and minimum
intensity of 5000. Maximum fill times were set to 50 ms and 100 ms for MS and MS/MS scans,
respectively. Quadrupole isolation at 1.6 m/z was used, monoisotopic precursor selection was
enabled, and dynamic exclusion was used with an exclusion duration of 5 sec.
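The gradient described above can be written as a table of linear segments, which makes it easy to confirm the 240 min total and to look up the buffer B percentage at any time point. This is a sketch that assumes each segment is linear; the names are illustrative and are not instrument method parameters.

```python
# Each segment: (duration in min, %B at start, %B at end), taken from the text.
GRADIENT = [
    (180, 1, 25),   # 1-25% B over 180 min
    (40, 25, 40),   # increase to 40% B over 40 min
    (10, 40, 90),   # increase to 90% B over 10 min
    (10, 90, 90),   # hold at 90% B for 10 min
]


def total_run_time(segments) -> float:
    """Sum of all segment durations in minutes."""
    return sum(duration for duration, _, _ in segments)


def percent_b(t_min: float, segments) -> float:
    """Linearly interpolated %B at time t_min since the start of the gradient."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    elapsed = 0.0
    for duration, start, end in segments:
        if t_min <= elapsed + duration:
            return start + (end - start) * (t_min - elapsed) / duration
        elapsed += duration
    raise ValueError("time is past the end of the gradient")
```

For example, `percent_b(200, GRADIENT)` falls halfway through the second segment.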
Protein and peptide identification
Protein and peptide identification were performed with the Integrated Proteomics Pipeline – IP2
(Integrated Proteomics Applications). Tandem mass spectra were extracted from raw files using
RawConverter [177] and searched with ProLuCID [178] against the UniProt rat database. The search
space included all fully-tryptic and half-tryptic peptide candidates. Carbamidomethylation on
cysteine was considered as a static modification. Data were searched with 50 ppm precursor ion
tolerance and 600 ppm fragment ion tolerance. Identified proteins were filtered using
DTASelect [179], utilizing a target-decoy database search strategy to control the false discovery
rate to 1% at the protein level [180].
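Two quantities in the search settings above lend themselves to a short illustration: the m/z window implied by a ppm tolerance, and a target-decoy estimate of the false discovery rate. The sketch below uses the simple estimator decoys/targets, which is an assumption; the exact formula applied by the pipeline is not stated here, and the function names are hypothetical.

```python
def ppm_window(mz: float, tolerance_ppm: float):
    """Lower and upper m/z bounds for a given tolerance in parts per million."""
    delta = mz * tolerance_ppm / 1e6
    return (mz - delta, mz + delta)


def decoy_fdr(n_target: int, n_decoy: int) -> float:
    """Simple target-decoy FDR estimate: decoy hits divided by target hits."""
    if n_target <= 0:
        raise ValueError("need at least one target hit")
    return n_decoy / n_target


def score_cutoff_for_fdr(hits, max_fdr=0.01):
    """Lowest score cutoff whose accepted set stays within max_fdr.

    hits is a list of (score, is_decoy) pairs; higher scores are better.
    Returns None if no cutoff satisfies the limit.
    """
    cutoff = None
    targets = decoys = 0
    for score, is_decoy in sorted(hits, key=lambda h: h[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff
```

At the 50 ppm precursor tolerance, a precursor at m/z 1000 would be matched within roughly ±0.05 m/z.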
Sample preparation for soft X-ray tomography
After sample treatment, cells were placed on ice in Eppendorf tubes in preparation for freezing in
capillaries. Cells were loaded into thin walled (200 nm) glass capillaries (in Krebs buffer as
described above) and rapidly frozen by plunging into liquid nitrogen cooled propane and stored in
liquid nitrogen until data collection [181].
Soft X-ray tomography Data Collection
Data were collected as previously described [182]. Briefly, projection images were collected at
517 eV using XM-2, the National Center for X-ray Tomography’s soft X-ray microscope at the Advanced
Light Source of Lawrence Berkeley National Laboratory. The microscope was equipped with a 50
nm resolution defining objective lens. During data collection, the cells were maintained in a stream
of helium gas cooled to liquid nitrogen temperatures, which allowed for collection of projection
images while reducing the effects of exposure to radiation [183]. 180 projection images were
collected sequentially around a rotation axis in 1° increments to give a total rotation of 180°.
Depending on synchrotron ring current, an exposure time of between 140 and 300 ms was used.
Tomographic reconstructions were calculated using the iterative reconstruction method as previously
described [184].
Linear Absorption Coefficient (LAC) and Segmentations
Segmentation of the nucleus and plasma membrane was undertaken using the program Amira
(FEI) employing the “paintbrush” tool, where the outer edge of the nucleus was manually traced
orthoslice-by-orthoslice, and using the “interpolation” feature. The accuracy of ‘painting’ was
confirmed in three dimensions. Segmentation of the lipid droplets, DCSGs and mitochondria was
accomplished using the semi-automatic “magic wand” tool, where voxels of specific LAC values
could be selected for segmentation. A combination of LAC and morphology was used to segment
lipid droplets, DCSGs, and mitochondria. Lipid droplets and DCSGs were clearly distinguishable
by LAC and size, while DCSGs and mitochondria were easily distinguishable by LAC, shape and
size.
Chapter 3: Toward the Structure of the Melanocortin 4 Receptor
Contributors to this work include: Matthew Eddy, Ariel Wein, Sanraj Mittal, Tyler Marks, Benjamin
Stauch, Martin Audet, Vsevolod Katritch, Vadim Cherezov
Background: GPCR structural and functional investigations
MC4R: an attractive target for structure based drug design
MC4R is a class A GPCR expressed primarily in the brain [79]. The receptor is a critical
coordinator of mammalian energy homeostasis and body weight. Seminal studies of MC4R in mice
demonstrated that inactivation by gene targeting results in the development of a maturity onset
obesity syndrome associated with hyperphagia, hyperinsulinemia, and hyperglycemia [84]. Further,
mutations in the MC4R gene are the most common cause of monogenic human obesity - a disease
that has a public health cost in America of more than $100 billion annually [185]. There is a wealth
of clinical and pharmacological data for MC4R, yet not a single solved structure exists. The
research presented in this chapter is aimed at solving the structure of MC4R to provide better
models both for understanding the loss of receptor function in obesity and for the rational,
structure-based development of effective treatments for appetite regulation.
MC4R is a member of the GPCR super-family, consisting of more than 800 cell surface
receptors that play critical roles in most aspects of human physiology and pathology. As a family,
GPCRs are the targets for more than 30% of FDA approved drugs, with indications spanning
metabolic, cardiovascular, neurodegenerative, oncological, and psychiatric disease [186]. In
addition to the nearly 700 approved GPCR-targeting therapeutics, in 2017 there were around 300
more in clinical trials [186]. GPCRs have long been attractive therapeutic targets due to their
surface availability and tractable ligand response characterizations. As recent innovations in GPCR
structural biology have delivered new insights on ligand binding and molecular function, the family
of receptors is under immense focus in the area of structure-based drug design.
Figure 14. GPCR superfamily. Melanocortin family is circled in red. Adapted from reference [187].
The atomic resolution structure of MC4R would provide not only the basis for structure-
based drug design efforts in developing metabolic- and appetite-related therapeutics targeted at
this pharmacologically sought-after receptor, but also a basis for studies into the rising field
surrounding the activation mechanisms of Class A GPCRs [188]. MC4R has demonstrated
wide-ranging signaling characteristics, binding to each major G protein subtype as well as
arrestin [189]. MC4R is a model receptor in studies relating to biased signaling, and has been
characterized in its interactions with a series of ligands, with a few receiving increased attention
for useful signaling properties [87]. The MC4R structure would represent the basis for further
investigations into the relationship between structural formations and receptor activation. As MC4R
has been widely characterized in its pharmacological response to a series of molecules, it provides
a wealth of information for comparison to structural observations.
Before X-ray diffraction and cryogenic-electron microscopy (cryo-EM) structural analysis,
a GPCR like MC4R must be optimized for expression and purification, so that it can be produced
and isolated in high quantities, as well as for biophysical qualities like thermostability,
diffusibility, and concentrating ability [190]. Expression quality is measured largely by two
parameters, surface expression and total expression, indicating the number of receptors that reach
the cell surface and the total number of receptors present in the cell membrane and intracellular
space, respectively. After appropriate expression, the receptor is purified through differential
centrifugation and affinity-based methods, resulting in monodisperse receptors protected by a
detergent micelle. Purity and concentration are then measured using size exclusion chromatography
(SEC), with a minimum of one milligram of protein necessary to proceed to further biophysical assays.
Thermostability is measured using a microscale fluorescent thermal stability assay developed by
the Stevens lab, stability at high density is tested using SEC after centrifugal concentration, and
the ability to diffuse in lipid media for LCP-based crystallization is characterized using a high
throughput LCP-FRAP assay developed by the Stevens lab. Once optimized for appropriate
biophysical characteristics, the GPCR proceeds to structural studies via X-ray crystallography
and/or cryo-EM. The efforts described in this chapter demonstrate the discovery and optimization
of a series of MC4R gene constructs and ligand pairs with biophysical characteristics necessary for
further structural studies.
Results and Discussion
MC4R expression optimization
MC4R is a protein of 332 amino acids, with 40 residues in the N-terminus and 13 in the
C-terminus. As a first step toward improving the surface and total expression of the protein,
sequence deletions were made in the N- and C-termini, along with insertions of several fusion
proteins in the third intracellular loop (ICL3), followed by expression in Sf9 (see methods section
below). Sequence deletions along the N- and C-termini were tested at nearly every residue, with 30
truncations performed in the N-terminus and 14 truncations at the C-terminus. Of the 44 terminal
truncation mutations, none improved the surface expression of the receptor above 60% of the total
population. The highest expressing construct from this series of deletions had the N-terminus
truncated to position eight, which yielded a surface expression slightly above 58%. Though
there was not much improvement to the overall surface expression of the terminal truncation
mutants, the total expression of each construct appeared to remain more or less stable, with
averages in the eightieth and ninetieth percentile of all other expression analyses performed by the
Stevens lab. As surface and total expression remained largely unaffected by the truncations, so did
the SEC profile, which, like that of the wild-type MC4R, displayed characteristics of the presence
of both stable receptor and aggregated impurity. As there were no major improvements to the MC4R
construct’s expression and stability after purification as a result of truncations at the termini, the
original wild-type construct was carried forward for further modification toward improving the
surface and total expression of the receptor.
One common strategy for improving the expression, stability, and crystallization of a
protein for structural studies is to add a heterologous fusion protein to the construct of
interest [191]. The fusion protein is typically one that is stably expressed and has an appreciable
crystal contact surface area. Using recombinant genetics, the fusion protein is added to the protein
of structural interest at one of its termini or to an internal region, with a linker amino acid
sequence tethering the two together. As structural studies on GPCRs have accelerated, increased
interest has been paid to the development of a series of GPCR friendly fusion proteins, as well as
to the identification of GPCR regions that are amenable to the addition of a fusion protein. The
Stevens lab has found that five proteins in particular can improve GPCR expression, stability, and
crystallization when fused to a GPCR construct under structural investigation. They are Cytochrome
b562RIL, T4 lysozyme, rubredoxin, xylanase, and flavodoxin. These five fusion choices make for ideal
supplements given that they are all single domain proteins with crystal structures at higher
resolution than 2 Å, mean B-factors lower than 40, molecular weights between 5 and 20
kilodaltons, a similar number of basic and acidic residues, and distances of <15 Å between the N-
and C-termini for optimal insertion.
To the wild-type MC4R construct, three fusion proteins were added to ICL3 of the receptor
to improve expression abundance and stability. From the Stevens lab tool chest of GPCR fusion
partners, Cytochrome b562RIL, T4 lysozyme, and rubredoxin were the first to be tested with the
wild-type construct. A strategy was developed for testing insertion positions that started by
removing ICL3 of MC4R at the positions closest to transmembrane (TM) helices five and six,
where the N- and C-terminal ends of ICL3 connect. The positions for the beginning and
end of ICL3 were estimated using a homology model of MC4R from the Katritch lab, generated
using sequences and structures from the previously solved GPCRs. As the homology model
predicted that the first amino acid of ICL3 was M218 and the last was G243, these were the first
positions at which the loop was removed and the fusion protein was inserted. Thirteen fusion sites
were tested for each of the 3 fusion proteins. In addition to the first fusion site at the ends of ICL3,
the next 12 sites tested were each at positions one amino acid further toward the center of the loop
from both the N- and C-terminal ends. Hence, for the second fusion site, the loop was removed at
A219 and K242, and so forth for the next 11 sites. Most of the 39 total fusion addition experiments
resulted in decreased expression and stability upon purification. However, three of the insertions
improved the surface expression of MC4R into the seventieth percentile: Cytochrome b562RIL
inserted at I226 and I235, rubredoxin inserted at I223 and G238, and another rubredoxin inserted
at P230 and G231.
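The insertion-site scan described above is a simple enumeration: start at the loop boundaries and step one residue inward on both sides. The sketch below reproduces the 13 site pairs and the 39 fusion constructs; the function and variable names are illustrative only.

```python
ICL3_START, ICL3_END = 218, 243  # M218 and G243, from the homology model
N_SITES = 13
FUSION_PARTNERS = ["Cytochrome b562RIL", "T4 lysozyme", "rubredoxin"]


def insertion_sites(start: int, end: int, n_sites: int):
    """Residue-number pairs obtained by stepping inward from both loop ends."""
    sites = []
    for step in range(n_sites):
        n_side, c_side = start + step, end - step
        if n_side >= c_side:
            raise ValueError("insertion points have crossed; loop is too short")
        sites.append((n_side, c_side))
    return sites


sites = insertion_sites(ICL3_START, ICL3_END, N_SITES)
constructs = [(partner, site) for partner in FUSION_PARTNERS for site in sites]
```

The final pair, (230, 231), corresponds to adjacent residues, so thirteen is the largest number of symmetric sites this loop allows.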
Figure 15. MC4R snakeplot denoting N- and C-terminal truncation sites (red) and pairs of ICL3 fusion
protein insertion sites (blue and green) [192].
MC4R purification optimization
Following early screens for potential N- and C-terminal truncations along with fusion
protein insertions to improve expression abundance and stability after purification, focus shifted
to improving other conditions of expression and purification, such as buffer constitution and ligand
presence. During receptor purification, several salt solutions were used to remove cell debris and
other impurities, followed by an alkylation step where iodoacetamide was incubated with the
receptor to decrease aggregation. Then a solubilization step was undertaken where detergent was
used to move the receptor from its native membrane into a detergent micelle. In the
membrane washing steps, large concentrations of salts like sodium chloride were used to lyse cells
and precipitate organelle and protein impurities. One strategy for improving the quality of receptor
after purification was to titrate the amount of sodium chloride present in the wash solutions. This
led to no appreciable increase in the amount of stable and monodisperse receptor as measured by
SEC. The incubation time between the receptor and iodoacetamide before the solubilization step
was also titrated, which led to no appreciable increase in stably purified receptor. As a third attempt
to improve purification conditions, the concentration of detergent present in the solubilization
buffers was titrated, which did not improve protein stability after purification. Subsequent efforts
to improve receptor purification conditions focused on matching the pH of the buffers used in
purification to the receptor’s isoelectric point. Using the MC4R homology model, the isoelectric
point of MC4R was estimated to be near a pH of 7.0. At the outset of the process to express and
purify MC4R, general protocols from the GPCR Consortium, which have been used to express and
purify other GPCRs for structure determination, were followed. In the general protocols, it is
suggested that each of the buffers in the purification process be prepared at a pH of 7.5. To cater
to the purification of MC4R, the pH of these buffers was changed to match the estimated isoelectric
point of MC4R at 7.0. This led to a slight improvement in the SEC traces of the constructs of
interest. There was an appreciable decrease in the amount of low retention time material in the
SEC curve and an increase in the amount of high retention time material, suggesting an increase
in the amount of stable and monodisperse MC4R present in the sample.
While it appeared that changing buffer pH to match the isoelectric point of MC4R
improved the ability to purify more stable and less impure receptor, it was difficult to draw this
conclusion confidently given that SEC traces display not only qualities of the protein of interest,
but also other proteins present in the sample, assuming they absorb at 280 nm. One outstanding
question was whether the low retention time material was caused by non-MC4R protein material
that was not successfully removed during purification, or whether it was aggregated receptor. If
the low retention time material is impurity and not aggregated receptor then it is important to focus
on improving the purification process and not necessarily on improving the expression of stable
receptor. However, if the low retention time material is aggregated receptor and not impurity then
focus should fall on improving the stability of the MC4R construct through mutagenesis. To
answer this question, a strategy was developed based on GFP fluorescence to determine which
portion of the SEC curve corresponded to receptor versus other proteins present as impurities.
GFP insertion to measure protein purity and monodispersity
In order to individually measure the presence of MC4R in the samples after purification,
enhanced GFP (eGFP) was added to the C-terminus of the three highest expressing fusion insertion
constructs. eGFP has a peak absorbance at around 530 nm, which allowed the comparison of the
presence of MC4-eGFP in SEC traces against the overall protein presence detected through
absorbance at 280 nm. For each of the eGFP fusion constructs, a profile more closely resembling
the ideal Gaussian shape of a pure and stable receptor was detected. When compared to the
absorbance at 280 nm (A280), the A530 profile had less protein in the low retention time portion
of the curve, suggesting that the sample mostly consisted of MC4R, some monodisperse and some
aggregated, along with a smaller amount of impurity proteins.
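One way to make the A280 versus A530 comparison quantitative is to compute the fraction of each trace's area that elutes before a chosen retention time. The sketch below is a hypothetical metric, not the analysis used in this work; the cutoff and the trace values are invented for illustration.

```python
def trapezoid_area(times, signal, t_min, t_max):
    """Trapezoid-rule area over sampling intervals lying within [t_min, t_max]."""
    area = 0.0
    for (t0, y0), (t1, y1) in zip(zip(times, signal), zip(times[1:], signal[1:])):
        if t0 >= t_min and t1 <= t_max:
            area += 0.5 * (y0 + y1) * (t1 - t0)
    return area


def early_fraction(times, signal, cutoff):
    """Share of the total trace area eluting before the cutoff time."""
    total = trapezoid_area(times, signal, times[0], times[-1])
    if total <= 0:
        raise ValueError("trace has no positive area")
    return trapezoid_area(times, signal, times[0], cutoff) / total


# Invented traces sampled at the same retention times (min).
times = [5, 6, 7, 8, 9, 10]
a280 = [0.0, 2.0, 1.0, 4.0, 1.0, 0.0]  # total protein
a530 = [0.0, 0.5, 0.5, 4.0, 1.0, 0.0]  # eGFP-tagged receptor

early_a280 = early_fraction(times, a280, cutoff=7)
early_a530 = early_fraction(times, a530, cutoff=7)
```

A lower early-eluting fraction in the A530 trace than in the A280 trace would indicate that part of the low retention time material is not receptor.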
Figure 16. GFP facilitated comparison of three fusion insertion constructs’ purity and monodispersity
Plots: MC4-BRIL6 + GFP, MC4-Rub2 + GFP, MC4-Rub10 + GFP. Axes: intensity (absorbance) vs. retention time (min). Traces: Total Protein, Receptor.
Receptor-stabilizing point mutations
Given the knowledge that the sample consisted largely of receptor, some unfolded and
aggregated yet largely uncontaminated with other impurities, the next steps focused on improving
the stability of the receptor construct through mutagenesis. In many of the previously structurally
solved GPCRs, point mutations have been instrumental in stabilizing the receptor up to its point
of crystallization and X-ray diffraction [193]. There are several commonly used mutations in GPCR
structural biology that have aided in the solution of structures [194]. Most often used from this
group are D2.50N and C3.41W. Upon applying these two mutations there was no significant change in
SEC profiles when compared to the unmutated MC4-BRIL-eGFP construct, other than a slight
improvement in monodispersity for the D2.50N mutation.
It was clear that additional mutations would be necessary to stabilize the receptor, so a
strategy was developed for determining potential stabilizing MC4R mutations based on gene
alignments with previously structurally solved class A GPCRs. Using the GPCRdb’s structure-
based alignment tool, the MC4R gene sequence was compared to about 30 structurally determined
sequences, focusing on mutations within transmembrane helices as they are the most important
for creating foundational contacts. It was determined that the following mutations were common
in many crystallized GPCRs: E1.37V, A2.47V, N2.57L, G2.58P, G2.58V, S2.59F, M5.50P,
N6.30D, S7.46V, and L8.50F. The mutations were applied to the MC4-fusion-eGFP construct and
four significantly improved the monodispersity in all three fusion protein variants. The four
mutations, E1.37V, N2.57L, G2.58V, and S2.59F, each showed a strong Gaussian profile in SEC
similar to previously solved class A GPCRs, though still with a slight amount of high molecular
weight material present.
Figure 17. Four mutations improving MC4R purity and monodispersity in SEC
Plots: E1.37V, N2.57L, G2.58V, S2.59F. Axes: intensity (absorbance) vs. retention time (min). Traces: Total Protein, Receptor.
Ligand co-purification for improved receptor stability
The vast majority of structurally-solved GPCRs utilized ligands not only to stabilize the
receptor, but also to provide insight into receptor structural arrangements when bound to
biologically active molecules. The search for ligands for the purification and crystallization of
MC4R began in the literature surrounding MC4R signaling. As MC4R has an established role in appetite
regulation, many studies have characterized MC4R’s signaling response to a wide variety of molecules
in an effort to develop therapeutic molecules. From a list of molecules known to directly influence
MC4R activity, the following were tested during purification: the agonists α-MSH, melanotan II,
THIQ, and [Nle4,D-Phe7]-α-MSH; and the antagonists PG-901, MBP10, HS014, HS024, SHU9119,
JKC 363, and ML 00253764 hydrochloride. Each of the tested compounds is commercially
available and has been extensively characterized in its interaction with MC4R and downstream
signaling. Each of the eleven compounds was first tested with the MC4R BRIL and rubredoxin
fusion constructs and analyzed for stability inducing qualities via SEC. Initial tests maintained a
ligand concentration of 100 µM during all steps of purification, including membrane washing,
detergent solubilization, and affinity capture of the isolated receptor. Of the eleven compounds,
SHU9119 was the only one to show significant improvement in SEC following purification.
However, several other ligands including HS024 and [Nle4,D-Phe7]-α-MSH provided a noticeable
but slight improvement to the Gaussian quality of the MC4R SEC profile. Given that SHU9119
provided a significant improvement to the MC4R SEC profile and that several point mutations had
also provided a comparable improvement to the SEC profile, the next experiment was to take one
of the most promising point mutation constructs, N2.57L, and purify it in the presence of
SHU9119.
Figure 18. Structures of agonists and antagonists tested for MC4R stability inducement
Construct and ligand pair optimization
Combining the point mutation optimized MC4R construct with the ligand purification
strategy using SHU9119 yielded a SEC curve showing nearly complete monodispersity in the
receptor via a highly symmetric Gaussian profile. As it is often optimal to have several options for
stabilizing ligands when moving into receptor crystallization trials, the next step included testing
the complete set of eleven commercially purchased ligands with the mutation-optimized MC4R
construct. Of the 11 ligands tested, 9 provided additional stability to the MC4R construct
containing a BRIL fusion and the N2.57L stabilizing mutation. Each of the 9 stabilizing ligands,
when purified with the fusion and mutation-optimized MC4R construct, produced a Gaussian SEC
profile, recapitulating SEC curves shown in structurally determined GPCR constructs. Further,
purifications were performed with 10 µM concentrations of each compound present during
membrane washing, solubilization, and affinity-capture steps of the purification process. This
demonstrated that the point mutation stabilized construct required less of the stabilizing ligand
to achieve the desired receptor monodispersity. As commercial ligands are often expensive
and available in limited quantities, this result of requiring less ligand for stabilization provided the
opportunity to perform additional experiments. SEC profiles of the 9 stabilizing ligands when
purified with the BRIL and N2.57L optimized MC4R construct are shown in Figure 19.
Figure 19. Determination of ten ligands yielding pure and monodisperse MC4R when purified with BRIL6
+ N2.57L construct
[Figure 19 plot: "MC4-BRIL6-N2.57L Broad Ligand Screen"; x-axis: Retention time (min); y-axis: Intensity (absorbance); traces: Apo, SHU9119, [Nle4,D-Phe7]-α-MSH, PG-901, HS014, HS024, MT II, THIQ, ML 00253764, JKC 363.]
Thermostability assessment
Following the identification of a highly monodisperse MC4R construct and ligand pair, the
next step focused on the biophysical characterization of the new lead. In GPCR structural studies,
receptors need not only be monodisperse but also be stabilized to a degree that prevents unfolding
during the crystallization process. A microscale fluorescence assay developed by the Stevens lab
was used to test for thermal stability [195]. In the assay, the receptor is reconstituted in the presence
of thiol-specific fluorescent dye, N-[4-(7-diethylamino-4-methyl-3-coumarinyl)phenyl]maleimide
(CPM). The receptor and dye cocktail are heated from 20°C to 95°C at a rate of 2 degrees per
minute while monitoring the fluorescence of the solution, which increases as the receptor
destabilizes and unfolds. The majority of cysteine residues in MC4R and most class A GPCRs are
within the transmembrane region of the receptor and are embedded in the protein interior until
temperature-induced unfolding. As cysteine residues are exposed with heat, CPM dye, which is
nonfluorescent when unbound, binds the newly exposed thiol groups on the receptor and begins to
fluoresce. Previously crystallized class A GPCRs typically display a strong sigmoidal transition
between the folded and unfolded state at temperatures greater than about 50-60°C. Upon assaying
the MC4R fusion and mutation-optimized construct with 10 µM SHU9119, a strong sigmoidal
profile was observed with a transition temperature above 65°C as shown in Figure 21, suggesting
the construct and ligand pair were sufficiently thermally stable for downstream assays and
structural studies.
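As an illustration of how a transition temperature can be read from a sigmoidal melt curve, the sketch below normalizes a fluorescence trace to 0–100% and finds where it first crosses 50% by linear interpolation. This is only one simple way to estimate the midpoint and is not necessarily the analysis used in this work; the function names are hypothetical and the trace is synthetic (a logistic curve with an assumed midpoint of 66°C), not measured data.

```python
import math

def normalize(fluorescence):
    """Rescale raw fluorescence to 0-100 % between its minimum and maximum."""
    lo, hi = min(fluorescence), max(fluorescence)
    return [100.0 * (f - lo) / (hi - lo) for f in fluorescence]

def midpoint_temperature(temps, percent_unfolded, level=50.0):
    """Return the temperature where the curve first crosses `level`,
    using linear interpolation between the two bracketing readings."""
    pairs = list(zip(temps, percent_unfolded))
    for (t0, y0), (t1, y1) in zip(pairs, pairs[1:]):
        if y0 < level <= y1:
            return t0 + (level - y0) * (t1 - t0) / (y1 - y0)
    raise ValueError("curve never crosses the requested level")

# Synthetic sigmoidal melt curve (assumed midpoint 66 C, slope factor 2.5 C),
# sampled every 2 C from 20 C to 94 C. These are not experimental values.
temps = [20.0 + 2.0 * i for i in range(38)]
raw = [1000.0 + 4000.0 / (1.0 + math.exp(-(t - 66.0) / 2.5)) for t in temps]

tm = midpoint_temperature(temps, normalize(raw))
print(f"Estimated transition temperature: {tm:.1f} C")
```

A curve fit to a Boltzmann sigmoid would use every data point rather than just the two bracketing readings and is generally more robust to noise; the interpolation above is kept only because it needs no external libraries.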
Figure 20. Thermostability assay of pure and monodisperse MC4R fusion and point mutated construct with
SHU9119 copurification.
[Figure 20 plot: "MC4-BRIL6-N2.57L Ligand-induced Thermostability"; x-axis: Temperature (°C); y-axis: Protein Denaturation (%); traces: Apo, SHU9119.]
Through the fluorescent thermostability assay and SEC assays following purification, the
MC4R construct was observed to be mostly free of contaminants, monodisperse, and thermostable.
These are three necessary characteristics in order to continue the protein toward structural studies
via crystallography and electron microscopy. However, the constructs tested each contain an
additional eGFP fusion to aid in the SEC analysis of purity and monodispersity. Next, steps were
taken to remove the eGFP fusion and perform equivalent thermostability and SEC assays, ideally
with no difference in the assay results. Following the removal of GFP from the construct containing
a BRIL fusion at ICL3 and an N2.57L stabilizing mutation, no significant differences were
observed in purity, monodispersity, or thermostability (Figures 22 and 23).
Figure 21. Purity and monodispersity are retained in MC4R BRIL fusion and N2.57L mutated construct
after removal of GFP
[Figure 21 plot: "MC4R stability after GFP removal (SEC)"; x-axis: Retention time (min); y-axis: Intensity (absorbance); trace: MC4-BRIL6-N2.57L-SHU9119.]
Figure 22. Thermostability is retained in MC4R BRIL fusion and N2.57L mutated construct after removal
of GFP
Combining stabilizing point mutations and ligands
After identifying a ligand and mutation pair leading to pure, monodisperse, and stable
receptor in expression and purification, additional mutations were sought that would lead to similar
conditions. In addition to finding that the N2.57L mutation improves receptor stability, the
previously described mutation screens, based on sequence alignments to other crystallized class A
GPCRs, led to several other mutations that seemed to improve the monodispersity of MC4. The
SEC curves of the MC4R BRIL fusion with E1.37V, G2.58V, and S2.59F showed slight
improvement compared to the MC4R BRIL construct without point mutations, so these mutations
were screened in the presence of SHU9119. As a result, it was found that each mutation, when
added to the MC4R BRIL fusion construct and purified in the presence of SHU9119, led to pure
and monodisperse receptor as measured by SEC (Figure 24).
[Figure 22 plot: "MC4R stability after GFP removal (CPM)"; x-axis: Temperature (°C); y-axis: Intensity (absorbance); traces: Apo, SHU9119.]
Figure 23. In addition to N2.57L, the E1.37V, G2.58V, and S2.59F mutations produced pure and
monodisperse MC4R upon purification in the presence of SHU9119
To generate additional constructs as potential candidates for crystallization and
single-particle electron microscopy studies, the N2.57L, E1.37V, G2.58V, and S2.59F mutations were
combined, and each combination was purified in the presence of SHU9119. Of the 24 total combinations, 4 led to pure and
monodisperse receptor as measured by SEC. The point mutation combinations were tested using
the BRIL fusion construct previously characterized with each of the four mutations individually.
The four combinations of mutations that led to pure and monodisperse MC4R were E1.37V +
N2.57L, N2.57L + S2.59F, E1.37V + S2.59F, and E1.37V + N2.57L + S2.59F. SEC traces of the
successful mutation combinations are shown in Figure 25.
[Figure 23 plot: "MC4 stability via point mutations"; x-axis: Retention time (min); y-axis label as extracted: MC4R; traces: E1.37V, N2.57L, G2.58V, S2.59F.]
Figure 24. Point mutation combinations led to pure and monodisperse MC4R upon copurification with
SHU9119
[Figure 24 plot: "Point mutation combinations"; x-axis: Retention time (min); y-axis: Intensity (absorbance); traces: E1.37V + N2.57L, N2.57L + S2.59F, E1.37V + S2.59F, E1.37V + N2.57L + S2.59F.]
Negative stain as a screening technique for GPCR purity and monodispersity
Following the identification of a series of construct and ligand pairs leading to purification
of stable and monodisperse MC4, the structure of the receptor was sought both by X-ray
crystallography as well as through single particle cryo-EM. Early studies using cryo-electron
microscopy focused on finding a condition that stabilized the receptor as a single particle, using
negative stain. Negative stain is a prescreening technique commonly used in single particle
cryo-EM studies because it allows for a simple and quick observation of macromolecules and
macromolecular complexes through the use of a contrast enhancing stain reagent [196]. The speed of
sample preparation for negative stain makes it an ideal method for introductory assessments of
sample quality, through characteristics like purity, heterogeneity, and concentration. The
resolution of the technique is limited to about 20 Å, enough to inspect whether the protein sample
is aggregated or exists individually in a detergent micelle. For both X-ray crystallography and
single particle cryo-EM, it is important that the protein is monodisperse, pure, and stable, which
can be seen in negative stain preparations.
In the receptor negative stain images, the aim was to identify ligand co-purification
conditions that led to pure and monodisperse MC4. The BRIL fusion and N2.57L point-mutation
construct was purified in the presence of several ligands, individually, seeking to determine
whether a particular ligand induced more stability than others in terms of preventing aggregation.
The previously described ligands were tested including one antagonist, SHU9119, as well as three
agonists, MT-II, [Nle4,D-Phe7]-α-MSH, and THIQ. The negative stain images showed that the
grid with SHU9119 had the least amount of protein aggregation in each of the regions
imaged, while MT-II showed a large amount of aggregation on many of the imaged grid locations.
To date, the vast majority of GPCR structures have been determined using X-ray
crystallography, with the receptor co-crystallized with antagonists, and in the inactive state [197].
Antagonists are believed to lock GPCRs in the putative off-state and reduce the protein’s inherent
conformational flexibility, which facilitates the tight packing of crystals. Proteins in the process of
scanning structural conformations, like those showing constitutive activity, present a wider variety
of potential binding epitopes as they bend and twist through the conformational landscape. Hence,
those proteins are more likely to aggregate in non-crystalline assemblies, preventing the ability to
determine their structural arrangement through diffractive methods. Based on this reasoning, it
was anticipated that the least amount of aggregation in negative stain images would be observed
with the receptor bound to an antagonist. Figures 26-29 show representative images of MC4R co-
purified with several ligands.
Figure 25. Micrograph from negative stain electron microscopy imaging of MC4R copurified with
SHU9119.
Figure 26. Micrograph from negative stain electron microscopy imaging of MC4R copurified with [Nle4,D-
Phe7]-α-MSH.
Figure 27. Micrograph from negative stain electron microscopy imaging of MC4R copurified with MT-II.
Figure 28. Micrograph from negative stain electron microscopy imaging of MC4R copurified with MT-II,
with aggregation
Given that negative stain images of MC4R bound to the antagonist SHU9119 showed the
least amount of aggregation, the SHU9119 and MC4R construct pair was found to be a viable
candidate for further crystallization attempts. It was also observed that micrographs from the
negative stain images of MC4R with MT-II showed more protein aggregation than the others.
Concurrently, new crystallization trials were tested weekly on a series of MC4R constructs,
iterating on truncations of the N- and C-termini as well as adjusting the linker positions for the
ICL3 BRIL fusion protein insertion. The negative stain images showing the MC4-SHU9119
complex in pure and monodisperse arrangement provided confidence that the ligand and construct
pair was a viable target. Several sets of microcrystals with the MC4R and SHU9119 combination
were produced, yet the crystals did not show protein diffraction after testing on the LCLS beamline
at Stanford. MC4R has constitutive activity and is very hydrophobic, both of which are conditions
that seem to hinder protein crystallization, so it seemed likely that the best possibility of
crystallizing the receptor would be with an antagonist. However, the negative stain results showed
that other ligands, including agonists, could produce conditions of purity and monodispersity,
which were explored for the potential of solving a structure.
After determining a series of combinations of constructs and ligands that yielded pure,
monodisperse, and stable MC4, the project was transferred to the Stevens lab at Shanghai Tech
University, where graduate students carried on the work toward determining the receptor structure.
At this time, efforts with the PBCC were being ramped up, as described in Chapter 2, and I took
this as an opportunity to dedicate more time to helping lead the organization of the consortium as
we laid the foundation for future projects in this area, and to pivot my own research toward data
integration and whole-cell modeling. In transferring the project, I travelled to Shanghai Tech
University on multiple occasions to meet with students who would take over the project, and
passed on optimized purification and assay protocols, plasmids containing constructs of interest,
packages of data containing results from expression, purification, and biophysical
characterizations, as well as future plans for improving the size and packing of early crystals.
Based in part on the results presented herein, the structure of MC4R has now been solved at <3 Å
resolution, with the manuscript currently in preparation.
Methods
Cell culture and MC4R expression
Recombinant baculoviruses were generated using the Bac-to-Bac system (Invitrogen) and
transfected in 2.5 ml of Sf9 cells at 1 × 10⁶ cells ml−1 using Extreme Gene (Sigma Cat. No.
06366546001) at a multiplicity of infection of 5 and a specialized transfection media (Expression
Systems Cat. No. 95-020) for 96 h. The supernatant (P0) was then collected by centrifugation at
2,000 rpm for 20 min. This P0 was then used with an assumed titer of 1 × 10⁹ IU ml−1 to infect
40 ml of Sf9 cells at 2.0 × 10⁶ cells ml−1. Sf9 cells were placed in a shaker at 125 rpm and 27°C
for 48 h. Cells were sterilely collected into 50 ml conical tubes. A small aliquot was taken to
analyze for the presence of a FLAG tag using a FITC antibody (Sigma Cat. No. F4049-1mg)
reading by FACS. The FLAG antibody was tested both with and without the presence of Triton X
(Sigma Cat. No. X100-500ml) to determine both the surface and total expression values of MC4.
Cells were pelleted and collected as P1 biomass and placed in a −80°C freezer for purification and
further analysis. The supernatant was stored at 4°C as a P1 virus. This P1 virus was titered and
used for further infection of large-scale biomass. Typically, 5 L of biomass yielding ~400 µg L−1
was used to set up one round of crystallographic screening trials.
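The quantities in this protocol can be related with simple arithmetic. The sketch below computes the volume of P0 virus needed for the 40 ml infection and the receptor mass expected from a typical large-scale preparation. The helper name is hypothetical, and applying the multiplicity of infection of 5 to the 40 ml infection step is an assumption made only for this example.

```python
def virus_volume_ml(culture_ml, cells_per_ml, moi, titer_iu_per_ml):
    """Volume of virus stock needed to infect a culture at a given MOI."""
    total_cells = culture_ml * cells_per_ml
    return moi * total_cells / titer_iu_per_ml

# Culture volume, cell density, and assumed titer are taken from the protocol;
# using MOI 5 for this particular step is an assumption of the example.
p0_volume = virus_volume_ml(culture_ml=40, cells_per_ml=2.0e6,
                            moi=5, titer_iu_per_ml=1e9)
print(f"P0 virus to add: {p0_volume:.2f} ml")

# Receptor expected from a typical preparation (5 L at ~400 ug per litre)
biomass_l = 5
yield_ug_per_l = 400
expected_mg = biomass_l * yield_ug_per_l / 1000
print(f"Expected receptor: {expected_mg:.1f} mg")
```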
Receptor purification
All steps in purification were carried out at 4°C. Insect cell membranes were disrupted by dounce
homogenization of cell pellets in a hypotonic buffer containing 25 mM HEPES pH 7.0, 10 mM
MgCl2, 20 mM KCl, and protease inhibitor cocktail (Roche). Extensive washing of the membranes
was performed by repeated dounce homogenization and centrifugation in a high osmotic buffer
containing 1.0 M NaCl, 10 mM MgCl2, 20 mM KCl, and 25 mM HEPES pH 7.5. This process was
repeated three times. Purified membranes were resuspended in 500 mM NaCl, 20 mM KCl, 50 mM
HEPES pH 7.5, and 35% (v/v) glycerol, flash frozen with liquid nitrogen, and stored at −80°C
until further use. Purified membranes were thawed on ice in a solution of 500 mM NaCl, 50 mM
HEPES pH 7.5, 20 mM KCl, and 5% glycerol (v/v), at 4°C for 60 min. Iodoacetamide (Sigma)
was then added to a final concentration of 1 mg ml−1 and the solution was incubated for another
15 min before solubilization with 0.5% (w/v) n-dodecyl-β-D-maltopyranoside (DDM; Anatrace),
and 0.1% (w/v) cholesteryl hemisuccinate (CHS; Sigma) for 3 h at 4°C. The supernatant was
isolated by centrifugation at 160,000 × g for 45 min, supplemented with 25 mM imidazole, pH 7.5,
and incubated with TALON metal ion affinity chromatography resin (Clontech) overnight at 4°C.
After binding, the resin was washed with 15 column volumes of buffer 1 (500 mM NaCl, 50 mM
HEPES pH 7.5, 20 mM KCl, 10 mM MgCl2, 5% (v/v) glycerol, 1 mM ATP, 25 mM imidazole,
25 µM ligand of interest, 0.05% (w/v) DDM and 0.01% (w/v) CHS); and 5 column volumes of
wash buffer 2 (buffer 1 without MgCl2 or ATP), before protein elution with elution buffer
(500 mM NaCl, 50 mM HEPES pH 7.5, 20 mM KCl, 10% (v/v) glycerol, 250 mM imidazole,
0.025% (w/v) DDM and 0.005% (w/v) CHS). Receptor purity and monodispersity were monitored
via SDS–PAGE and analytical size exclusion chromatography (aSEC). Purified receptor was
desalted using a PD midiTrap G-25 column (GE Healthcare) into a buffer containing 500 mM
NaCl, 20 mM KCl, and 5% (v/v) glycerol, 50 mM HEPES pH 7.5. The protein solution was then
further concentrated to 40 mg ml−1 with a 100 kDa molecular mass cut-off Vivaspin concentrator
(GE Healthcare).
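Because wash buffer 2 is defined relative to wash buffer 1, the three affinity-step buffers can be written as a small data structure in which the second is derived from the first. The sketch below encodes the compositions listed above; the dictionary layout and helper name are illustrative choices, not part of the original protocol.

```python
# Concentrations as (value, unit), copied from the purification protocol
wash_buffer_1 = {
    "NaCl": (500, "mM"),
    "HEPES pH 7.5": (50, "mM"),
    "KCl": (20, "mM"),
    "MgCl2": (10, "mM"),
    "glycerol": (5, "% v/v"),
    "ATP": (1, "mM"),
    "imidazole": (25, "mM"),
    "ligand": (25, "uM"),
    "DDM": (0.05, "% w/v"),
    "CHS": (0.01, "% w/v"),
}

# Wash buffer 2 is buffer 1 without MgCl2 or ATP
wash_buffer_2 = {name: conc for name, conc in wash_buffer_1.items()
                 if name not in {"MgCl2", "ATP"}}

elution_buffer = {
    "NaCl": (500, "mM"),
    "HEPES pH 7.5": (50, "mM"),
    "KCl": (20, "mM"),
    "glycerol": (10, "% v/v"),
    "imidazole": (250, "mM"),
    "DDM": (0.025, "% w/v"),
    "CHS": (0.005, "% w/v"),
}

def detergent_to_chs_ratio(buffer):
    """Weight ratio of DDM to CHS in a buffer (hypothetical helper)."""
    return buffer["DDM"][0] / buffer["CHS"][0]

for label, buffer in [("wash 1", wash_buffer_1),
                      ("wash 2", wash_buffer_2),
                      ("elution", elution_buffer)]:
    print(f"{label}: {len(buffer)} components, "
          f"DDM:CHS = {detergent_to_chs_ratio(buffer):.1f}:1")
```

Writing the buffers this way makes it easy to see that the DDM to CHS weight ratio stays constant from wash through elution while the absolute detergent concentration is halved.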
Microscale thermal stability measurements (CPM assay)
A stock solution of 5 mg ml−1 CPM (7-Diethylamino-3-(4'-Maleimidylphenyl)-4-
Methylcoumarin, Sigma) was dissolved in DMF (dimethylformamide, Sigma) and stored at
−80°C, protected from light. Before each experiment, a fresh 2 µL aliquot of CPM stock was
combined with 38 µL room temperature desalting buffer (50 mM HEPES pH 7.5 at 4°C, 100 mM
NaCl, 20 mM KCl, 0.01% LMNG, 0.002% CHS, 5 µM ligand) to create a working stock solution
and protected from light. Thermal stability measurements were carried out using the Cary Eclipse
Spectrophotometer equipped with a Peltier 4 Position Multicell Holder. Measurements were
recorded using 100 µL capacity “sub-micro” cuvettes from Starna Cells (Cat. No. 16.100-Q-10/Z20).
Samples were temperature ramped from 20–80°C at a rate of 1–2°C min−1 while
recording the sample’s fluorescence at an excitation/emission wavelength of 384/470 nm.
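The dilution and timing implied by this protocol follow from simple arithmetic, sketched below. The variable and function names are illustrative; the input numbers are those given in the text.

```python
# CPM working stock: 2 uL of 5 mg/ml stock combined with 38 uL of buffer
stock_mg_per_ml = 5.0
stock_ul = 2.0
buffer_ul = 38.0

dilution_factor = (stock_ul + buffer_ul) / stock_ul
working_mg_per_ml = stock_mg_per_ml / dilution_factor
print(f"Dilution 1:{dilution_factor:.0f}, working stock {working_mg_per_ml} mg/ml")

def ramp_minutes(start_c, end_c, rate_c_per_min):
    """Duration of a linear temperature ramp."""
    return (end_c - start_c) / rate_c_per_min

# Ramp from 20 C to 80 C at the slowest and fastest stated rates
slow = ramp_minutes(20, 80, 1)
fast = ramp_minutes(20, 80, 2)
print(f"Ramp duration: {fast:.0f} to {slow:.0f} min")
```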
Lipidic cubic phase crystallization
Concentrated protein samples were reconstituted into LCP by mixing with molten lipid using a
mechanical syringe mixer at room temperature (~20–22°C) [162]. The LCP mixture contained 40%
(w/w) concentrated protein, 54% (w/w) monoolein (Sigma), and 6% (w/w) cholesterol
(Avanti Polar Lipids). Crystallization trials were performed in 96-well glass sandwich plates
(Marienfeld) onto which 40 nl protein-containing LCP drops and 0.8 µl precipitant solution drops
were deposited by the NT8-LCP crystallization robot (Formulatrix). The crystallization plates
were then sealed with a glass cover slip and stored at 20°C in an incubator/imager (RockImager
1000, Formulatrix).
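The stated weight fractions determine how much of each component goes into a batch of LCP, and the drop volumes determine the precipitant to LCP ratio. The sketch below works these out for a hypothetical 50 mg batch; the batch size and names are assumptions for illustration only.

```python
# Weight fractions of the LCP mixture, from the protocol
lcp_fractions = {
    "protein solution": 0.40,
    "monoolein": 0.54,
    "cholesterol": 0.06,
}

def component_masses(total_mg):
    """Mass of each component for a batch of the given total mass."""
    return {name: fraction * total_mg for name, fraction in lcp_fractions.items()}

masses = component_masses(50.0)  # hypothetical 50 mg batch
for name, mg in masses.items():
    print(f"{name}: {mg:.1f} mg")

# Lipid host composition and drop ratio implied by the protocol
monoolein_to_cholesterol = lcp_fractions["monoolein"] / lcp_fractions["cholesterol"]
precipitant_nl, lcp_nl = 800, 40  # 0.8 ul precipitant over a 40 nl LCP drop
print(f"Monoolein:cholesterol = {monoolein_to_cholesterol:.0f}:1 (w/w)")
print(f"Precipitant:LCP = {precipitant_nl // lcp_nl}:1 (v/v)")
```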
Chapter 4: Modeling Biochemical Systems with Digital Arts Platforms
Contributors to this work include: John Paul Francis, Alex McDowell RDI, Francesco Cutrale, Scott
Fraser, Helen Berman
Introduction: Communicating Science Visually
Visualization is a critical component of research and communication in the life sciences.
We not only summarize information in visual formats like graphs, diagrams, figures, maps, etc.,
but we also visually investigate a large part of our primary data, as in instrument readouts. As a
picture is worth a thousand words, many researchers first look to figures when reading scientific
literature. In figures, we summarize our knowledge into crystallized, meme-like demonstrations of
an idea or result. Innovations in visualization techniques have moved in step with innovations in
biochemistry and our overall understanding of the processes of life. Advances in computing and
visualization technology have provided tools that are able to visualize extremely detailed,
complex, and fast moving entities without needing a proper studio, making them available to a
broader population of scientists including those focused on the microscopic worlds within us.
During my graduate studies, I not only had the opportunity to perform research on cells
and their components using a variety of experimental techniques, I also was able to focus on
developing means for representing complex and multidimensional biochemical data. Using
entertainment platforms like Unity and Maya, I explored methods for visualizing and interacting
with biochemical data. Our visual tools focused on representing data from biological imaging,
omics, and macromolecule structure. I initiated the projects reported in this chapter after receiving
a grant from the USC Bridge Institute of the Michelson Center for Convergent Bioscience. The
goal of this grant was to investigate new methods to communicate science and visualize data by
working with digital arts platforms. Over the course of three years, the projects have evolved into
multidisciplinary collaborations between USC’s Departments of Chemistry, Molecular and
Computational Biology, Medicine, Computer Science, and Cinematic Arts. USC has proven to be
an ideal environment for developing these types of multidisciplinary projects as the university
houses specialists not only from across the life science disciplines but also from technical and
design-based digital arts practices.
Developing Tools for Improved Microscope Image Viewing, Markup,
and Sharing
Microscape: A point cloud rendering system for viewing microscope images
Microscape is a microscope image viewer developed as part of my thesis work that
explored the use of digital arts technology for displaying biological data. The image viewer takes
point cloud images from confocal fluorescent microscopes and renders them for visualization on
screens, virtual reality headsets, and mobile devices. Through the Microscape application, it is not
only possible to view microscope images in high definition, but one can also interact with the
images using gameplay mechanisms developed using the Unity Game Engine. In the current
version of the application users can: import a range of 3D microscope image-stack file types;
independently control color channels of the volumetric image; annotate the image using markup
tools; explore the image in virtual reality and pick up a camera to record and export images and
video; and share and manipulate the microscope images with collaborators in real-time using
multiplayer features.
Importing and rendering microscope images in the Unity Game Engine
One core feature of the Microscape application is its ability to upload microscope images
from a wide variety of microscope types and image formats. The Imaris File Converter is used to
convert images from specific-use formats like DICOM, Leica XLEF, Leica LOF and other
common formats (e.g. TIFF, JPEG, and PNG) into IMS file formats, which are then converted to
a format that is readable within Unity. Z-stack images, which consist of multiple 2D images
layered on top of one another to create a 3D representation, are the most common data format in
3D confocal fluorescence microscopy. Each slice of the z-stack is a 2D coordinate system with
points on the plane representing areas of fluorescence within the slice. Each coordinate not only
denotes whether part of the image exists at that location, but also what type of component is present
at this location. Component type usually refers to the color of fluorescence during image
acquisition, which ultimately refers to a specific biological structure present within the sample. In
order to read the z-stack files into Microscape and render them as pseudo 3D images, the images
must first be individually separated from the z-stack. Then, within Unity, the distance between
each slice of the image is specified. A z-stack of 2D planes is then rendered into an image through
a technique devised using Unity’s raycasting capabilities. A simplified overview of the rendering
pipeline is presented in Figure 30.
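The data layout described above can be made concrete with a small sketch that turns a z-stack into a list of fluorescent points. Python is used here purely for illustration (Microscape itself is built in Unity), and the function name, the nested-list representation, and the tiny two-slice stack are all assumptions of the example rather than the application's actual data structures.

```python
def zstack_to_points(slices, slice_spacing, threshold=0):
    """Convert a z-stack into (x, y, z, channel, intensity) points.

    `slices` is a list of 2D slices; each slice is a list of rows, and each
    pixel is a tuple holding one intensity per fluorescence channel.
    """
    points = []
    for k, plane in enumerate(slices):
        z = k * slice_spacing  # the distance between slices sets the depth
        for y, row in enumerate(plane):
            for x, pixel in enumerate(row):
                for channel, intensity in enumerate(pixel):
                    if intensity > threshold:
                        points.append((x, y, z, channel, intensity))
    return points

# Tiny hypothetical stack: 2 slices of 2 x 2 pixels with 2 channels each
stack = [
    [[(0, 0), (200, 0)],
     [(0, 0), (0, 0)]],
    [[(0, 0), (0, 0)],
     [(0, 90), (0, 0)]],
]

points = zstack_to_points(stack, slice_spacing=0.5)
for point in points:
    print(point)
```

Each point records where fluorescence exists (x, y, z) and which component it belongs to (the channel index), matching the two roles of a coordinate described in the text.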
Figure 29. Rendering pipeline diagram. First, images begin as z stacks after acquisition on confocal
microscope or other volumetric imaging platform. Next, the image slices are composited into a 3D
volume using the rendering algorithms developed in this project. Lastly, the image is displayed on a
screen for visual investigation.
In rendering, raycasting is an operation where a virtual camera sends out imaginary rays in
the direction the camera is pointing, each of which may ultimately collide with some object in the scene. If the
ray collides with an object, then the renderer knows to draw that object on the screen at the pixel
location from which the ray emanated. In Microscape, raycasting is used to detect points of
fluorescence within the slices of the z-stack so that the in-game camera knows where to draw
points of fluorescence on the viewing screen. In real-time, as the camera is moved around the z-
stack, rays intersect different coordinates of the slices, detect fluorescence, and then points of
fluorescence are drawn in the correct location on the viewing screen, whether on a desktop
computer, virtual reality headset, or mobile device.
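The idea of casting one ray per screen pixel and drawing whatever fluorescence it hits can be sketched in a few lines. The example below is a deliberately simplified stand-in, not the Microscape shader: it assumes a fixed camera looking straight down the z-axis, a single channel, and a hypothetical 3-slice volume, and it is written in Python for readability.

```python
def cast_ray(volume, x, y, threshold=0):
    """March one ray along the z-axis through pixel (x, y).

    Returns (slice_index, intensity) for the first voxel brighter than
    `threshold`, or None if the ray passes through without a hit.
    """
    for k, plane in enumerate(volume):
        value = plane[y][x]
        if value > threshold:
            return k, value
    return None

def render(volume, threshold=0):
    """Cast one ray per screen pixel and draw the intensity of its first hit."""
    height, width = len(volume[0]), len(volume[0][0])
    image = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            hit = cast_ray(volume, x, y, threshold)
            if hit is not None:
                image[y][x] = hit[1]
    return image

# Hypothetical single-channel volume: 3 slices of 2 x 2 pixels
volume = [
    [[0, 5], [0, 0]],
    [[7, 9], [0, 0]],
    [[1, 0], [0, 3]],
]

image = render(volume)
for row in image:
    print(row)
```

In a real-time renderer the ray directions change whenever the camera moves, so this per-pixel search is repeated every frame, which is why the work is normally done in parallel on the GPU.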
Confocal microscope images can contain millions of individual points of fluorescence
within the slices of a z-stack [198]. This creates a major computational challenge for moving the
fluorescent points to correct pixel locations on a screen while the in-game camera is in motion.
For instance, on a desktop screen, in order to rotate around an image, the engine must calculate
new display locations for all the fluorescent points relative to one another and relative to the focal
point of the in-game camera as they move around the image’s axis of rotation. On a virtual reality
device, where there are many more degrees of freedom for camera motion, along with constant
shakiness due to slight user head movements, the calculations required to place each fluorescent
point in the correct pixel location become much more computationally challenging.
A rendering strategy, described below, was developed to address the computational challenges of displaying the
millions of fluorescent points as they move in real time. The point cloud
rendering strategy designed here was undertaken almost entirely within the framework of Unity’s
shader system. In computer graphics, a shader is an algorithm used to create an image by producing
the appropriate levels of light, color, and texture of an image rendered on a screen. Most shader
code is designed for graphics processing units (GPUs) because their highly parallelized
architecture allows for faster calculations in translating and redrawing object locations on a screen.
As it is a computationally intensive process to render the many points in a fluorescent confocal
microscope image, Unity’s shader framework provided the ability to create references to image
slices within a z-stack and then probe those slices for points of fluorescence using Unity’s
raycasting system. The shader code for point cloud rendering was written in the High-Level
Shading Language developed by Microsoft for the DirectX pipeline.
Figure 30. Point cloud rendering inside Unity development environment. Unity provides built in tools for
manipulation of 3D objects, like our volumetric images, but first they must be rendered using shader
algorithms developed in this project.
The biggest challenge in real-time rendering of point cloud datasets stems from the
difficulty of moving the image on a screen without losing image quality or framerate. Point clouds
contain many coordinates that have to be calculated individually when translated in a viewing
space. A given processor, whether CPU or GPU, has finite calculation power, and often the number
of points in a confocal fluorescent microscope image point cloud is more than a typical processor
on a consumer-grade computer can manage. When the number of calculations required to translate
a point-cloud exceeds the computational power of a computer, trade-offs must be made between
either the speed of image translation from one location to another or in the number of points that
are being translated. A point cloud rendering scheme was developed in this project to balance this
trade-off so that the number of points being translated in real-time can be adjusted to meet the
computational power of the device used to render the volumetric image. Hence, when an image
contains more points than the viewing device can translate, the number of points that the algorithm
detects within the image is reduced. By reducing the number of points, there is some loss in image
resolution, yet the ability to navigate seamlessly around the image with a high framerate is
improved. This balance is controlled through the raycasting features described above. As rays are
cast through the slices of a z-stack they intersect a series of coordinates with points of fluorescence.
The number of fluorescent coordinates the ray intersects is dependent not only on the density of
fluorescence points within each slice but also the number of slices. Confocal fluorescent
microscope images often contain hundreds of slices in an individual z-stack, spaced equally along
the z-axis. To reduce the number of points, a form of real-time pseudo-downsampling was
developed where the rays intersecting each slice have a programmed minimum distance between
any two fluorescent points the ray can detect. This parameter was coined the ‘step value.’ At high
step values, the ray detects points of fluorescence at more closely spaced intervals, and at low step
values the ray ignores more points that lie close to one another by sampling fewer slices in the z-
stack.
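The trade-off between detail and speed can be illustrated by sampling a single ray at a configurable number of positions. The sketch below is an assumption-laden simplification written in Python, not the shader code: it treats the tunable parameter as a step count (more steps means denser sampling, as in the high and low step count renderings), and both function names and the budget heuristic are hypothetical.

```python
def sample_ray(intensities, step_count, threshold=0):
    """Sample a ray at `step_count` evenly spaced slices.

    `intensities` holds the value the ray meets in each slice. Returns the
    slice indices where fluorescence above `threshold` was detected.
    """
    n = len(intensities)
    steps = max(1, min(step_count, n))
    indices = [(i * n) // steps for i in range(steps)]
    return [k for k in indices if intensities[k] > threshold]

def choose_step_count(ray_count, sample_budget, max_steps):
    """Pick the largest step count that keeps total samples within a budget."""
    return max(1, min(max_steps, sample_budget // ray_count))

# Hypothetical intensities met by one ray crossing a 12-slice stack
ray = [0, 3, 0, 8, 8, 0, 2, 0, 0, 6, 1, 0]

dense = sample_ray(ray, step_count=12)   # every slice is probed
sparse = sample_ray(ray, step_count=4)   # only slices 0, 3, 6, 9 are probed
print("dense hits:", dense)
print("sparse hits:", sparse)

steps = choose_step_count(ray_count=10_000, sample_budget=500_000, max_steps=200)
print("step count for this device:", steps)
```

With fewer steps the ray probes fewer slices and misses some fluorescent points, which lowers resolution but reduces the work per frame roughly in proportion to the step count.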
Figure 31. Fluorescent confocal microscope image rendered in Microscape with a high step count and a
low step count.
Image modification
Several image manipulation tools were developed so that, after an image is loaded into the application,
it can be converted into formats appropriate for biological investigations and research
presentations. At the core of the image manipulation tools are the channel control features.
Confocal fluorescent microscope images are often acquired using multiple fluorescent antibodies
of different colors, which bind to different biological structures, so they can be visually
distinguished in the resulting image. In this project, capabilities were developed to control each of
the channels independently based on the particular color of fluorescence captured during
acquisition. With these capabilities, it is now not only possible to control the color displayed for
each individual channel, but also features like brightness, contrast, and attenuation. In addition to
channel controls, tools for cropping and stretching or compressing the image from any of the six
positive and negative x,y,z axes were also developed.
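Independent channel control amounts to adjusting each channel's intensity on its own and then blending the tinted channels into one displayed color. The sketch below shows one common way to do this with additive blending; the contrast and brightness formulas, the settings dictionary, and the function names are assumptions chosen for illustration and are not taken from the Microscape source. Attenuation is not modeled here.

```python
def adjust(value, brightness=0.0, contrast=1.0):
    """Apply contrast around mid-gray, then a brightness offset, clamped to 0..1."""
    out = (value - 0.5) * contrast + 0.5 + brightness
    return min(1.0, max(0.0, out))

def composite(pixel, channel_settings):
    """Blend per-channel intensities (0..1) into one RGB color.

    Each channel has its own display color, brightness, contrast, and
    visibility, so changing one channel leaves the others untouched.
    """
    rgb = [0.0, 0.0, 0.0]
    for value, settings in zip(pixel, channel_settings):
        if not settings.get("visible", True):
            continue
        v = adjust(value, settings.get("brightness", 0.0), settings.get("contrast", 1.0))
        for i in range(3):
            rgb[i] += v * settings["color"][i]
    return tuple(min(1.0, c) for c in rgb)

# Hypothetical two-channel setup: green channel, and a brightened magenta channel
channels = [
    {"color": (0.0, 1.0, 0.0)},
    {"color": (1.0, 0.0, 1.0), "brightness": 0.25},
]

color = composite((0.5, 0.25), channels)
print("displayed color:", color)

# Hiding the second channel changes only its contribution
channels[1]["visible"] = False
green_only = composite((0.5, 0.25), channels)
print("green only:", green_only)
```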
With these image manipulation tools, users can adjust visual features of the z-stack to the
appropriate colors, brightness levels, scale, cropping dimensions, and more, to set up the image
for further analysis or communications purposes. In addition to tools for modifying the original
microscope image, an annotation system for users to mark-up or highlight points of interest on the
z-stack was developed. The paintbrush tool allows the user to annotate the image and the
surrounding space as if it were a 3D chalkboard. On both the desktop and virtual reality versions of
Microscape, users can pick up the paintbrush tool and create markings on regions of interest within
the image, write notes outside the image, save those markings for future sessions, and share mark-ups
with collaborators. The tool allows for markup size and color to be adjusted to find the
appropriate contrast and viewability for the particular application. Following image setup and
annotation, additional tools like the camera and multiplayer network then allow users to share
images they have prepared in Microscape with collaborators, advisors, and the public. The capture
tools give users the ability to pick up a camera in virtual reality and move around an image while
capturing pictures and video that can be exported and used during research presentations. Further,
the multiplayer tool, developed using the Photon Unity Networking package, can be used to bring
collaborators, advisors, and the public into a shared image viewing session, either on the desktop
or in virtual reality, where multiple individuals can view and manipulate an image in real time.
Figure 32. Camera capture and annotation of microscope image within Microscape
Conclusion
As science labs are generating more data than ever, terabytes of images are stored away on
hard drives while researchers work tirelessly to process the data into digestible abstractions. In
literature and textbooks, microscope images often appear as 2D screen captures, even though these
images are 3D in nature. As there is a growing need to analyze and communicate this information
more effectively, those who are best able to share their work with others, and find collaborative
insights, will be at an advantage. Hence, as this need grows, there will be an escalation in
development of more and better tools for image viewing, sharing, and storage. In the research
environment of the future, researchers will be able to share more robust representations of their
imaging data with both the scientific community as well as the general public, increasing their
impact. Microscape attempts to take a step toward this future, by lowering the barrier of entry for
researchers to access software for viewing, annotating, and sharing research images. The hope is
that this tool will increase the ability for labs to share their work and engage in collaborations
centered around maximizing the impact derived from research images.
Developing data-driven interactive experiences of PBC environments
The World in a Cell: A virtual exploration of PBC biology
The World in a Cell is a virtual reality exploration experience of the multithreaded systems
operating within a PBC. The PBC is a vital component of human biology as it is the source of one
of our most important hormones - insulin. To produce insulin, the PBC relies on a series of
components - proteins, nucleotides, small molecules, and more - to coordinate their activities
toward a common goal. In the World in a Cell virtual reality experience, users can navigate the
complex environment of the PBC, meeting the components that carry out these processes, all while
engaging in concepts, pathways, and implications of cellular biology presented through narratives
backed by scientific rigor. The project is a large collaboration between artists and scientists
working together to use storytelling and digital arts approaches to immerse both lay-people and
experts in the rich and interconnected world of the human cell.
The enormous amount of complexity operating within a cell presents a major challenge in
visually depicting its inner processes. Estimates suggest that there are about 100 trillion atoms in
an average human cell [199]. Just one cell can contain billions of proteins, with tens of thousands of
different types available [199]. In our genome there are about 3 billion base pairs, which give rise to
the tens of thousands of mRNA transcripts that may be present in a cell at any moment [200]. This is
not to mention the staggering numbers of lipids, metabolites, and other integral cellular
components. All in all, there is a lot happening inside a cell, and there are many pieces to keep
track of in order to understand how the symphony orchestrates itself. This is where game engines
like Unity show great promise, as they provide the capabilities for programming behavior into
components in order to build large systems of interactions. Unity’s scripting capabilities allow
models of biological components to be embedded with features much like those understood to be
present in real biological systems, like the cell. As such, the World in a Cell interactive experience
was developed using the Unity Game Engine.
In establishing this project, my initial goal was to work with specialists in storytelling and
the digital arts to leverage emerging approaches in visualization and narrative toward developing
a platform for communicating how the cell carries out complex, multithreaded processes. As the
goal was essentially to develop an interactive story-world within the cell, I reached out to Alex
McDowell RDI of the USC World Building Media Lab for support, as he had been developing
story-worlds for Hollywood for much of his career. Fortunately, Alex saw potential in this idea
and joined the project. The work presented in this section represents the collaboration between the
World Building Media Lab of the USC School of Cinematic Arts and the USC Bridge Institute of
the Michelson Center for Convergent Bioscience. Over the course of almost three years of project
development, the collaboration has grown to involve a group of over fifty scientists, artists,
engineers, medical professionals, communications specialists, educators, and more. Initially, the
focus of this project was to create an interactive experience inside a canonical human cell. This
project was developed around the same time as the creation of the PBCC, presented in Chapters 1
and 2. In an effort to connect the World in a Cell project to the PBCC, the interactive experience
was focused on the insulin producing PBC.
Modeling the PBC as an input-output machine
To create an experience allowing one to enter the cell and explore its systems, first steps
focused on creating models and a visual language to represent the cell and its systems. This was a
very difficult task given that cells contain billions of components interacting with each other in
real-time. In modelling the cell, the goal was to include as much known information from the
literature as possible. There is a vast amount of information available concerning the form and
function of biological cells. To eventually build the capability to integrate highly detailed cellular
data into the experience, initial steps focused on features that were the most foundational to cellular
behavior. It was only possible to add layers of functionality to the cell after designing its
foundational architecture. PBCs are just as complex as any other human cell, but their most salient
feature is that they are quite specialized to fit their unique role of producing insulin for the rest of
the body. This feature of PBCs allowed for the modeling approaches within the interactive
experience to be simplified. At the outset, it was decided that the PBC would be represented as an
input-output machine, where sugar goes into the cell, and insulin comes out of the cell. Hence, the
focus was centered on modelling the pathways and components integral to this process of insulin
synthesis and secretion.
Conceptually, it is convenient to think of the cell as consisting of a series of components
performing chain reactions with one another in multithreaded cyclical loops. If you could shrink
down to see objects the size of cellular components and slow yourself down to the time-scale of
cellular processes, I imagine it would look like a tightly packed snow globe with pockets of
components cycling in and out of ordered processes; maybe like the bustling landscape of a coral
reef, with thousands of species swarming and interacting, all giving rise to a stable ecosystem.
Often, cell biologists represent this complexity of cellular systems using pathway diagrams.
Pathway diagrams graphically represent events occurring in the cell through a sequence of steps
laid out much like a wiring diagram for an electrical circuit. In a typical pathway diagram, nodes
represent biological components and edges represent interactions between components. The
diagrams allow one to follow the trajectories of sequences of reactions as they interconnect and
build on top of one another.
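The node-and-edge view of a pathway diagram can be sketched as a small directed graph. The following is a minimal illustration in Python, not the representation used in the project: the component names are a simplified reading of the insulin production pathway, and the edge labels, `build_graph`, and `trace` are hypothetical names introduced only for this example.

```python
from collections import defaultdict, deque

# Directed edges: (source component, target component, interaction label).
# Illustrative and heavily simplified; not an exhaustive or authoritative pathway.
EDGES = [
    ("GLP1", "GLP1R", "binds"),
    ("GLP1R", "G-protein", "activates"),
    ("G-protein", "cAMP", "stimulates production of"),
    ("cAMP", "kinase", "activates"),
    ("glucose", "ATP", "is metabolized to"),
    ("ATP", "kinase", "activates"),
    ("kinase", "transcription factor", "phosphorylates"),
    ("transcription factor", "insulin mRNA", "promotes transcription of"),
    ("insulin mRNA", "insulin", "is translated to"),
]


def build_graph(edges):
    """Map each node to the list of (target, label) edges leaving it."""
    graph = defaultdict(list)
    for source, target, label in edges:
        graph[source].append((target, label))
    return graph


def trace(graph, start, goal):
    """Follow edges breadth-first from start to goal.

    Returns a list of (source, label, target) steps, or None if the goal
    cannot be reached from the start node.
    """
    came_from = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for target, label in graph.get(node, []):
            if target not in came_from:
                came_from[target] = (node, label)
                queue.append(target)
    if goal not in came_from:
        return None
    steps = []
    node = goal
    while came_from[node] is not None:
        previous, label = came_from[node]
        steps.append((previous, label, node))
        node = previous
    steps.reverse()
    return steps


if __name__ == "__main__":
    pathway = build_graph(EDGES)
    for source, label, target in trace(pathway, "glucose", "insulin"):
        print(f"{source} {label} {target}")
```

Following a trajectory through the diagram then amounts to walking the edges from an input node such as glucose to an output node such as insulin.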
Representing PBC molecular signaling pathways
At the outset of the World in a Cell project, the insulin synthesis and secretion pathway of
the PBC was the initial sequence of events designed and modeled. Existing insulin production
pathway models were used, like those in Figure 34, as the basis for developing the cellular
environment. Ultimately, several pathway diagrams were combined and modified into a series of
components and reactions that would allow for an introductory-level story as to how and why
insulin is produced in the PBC. Focus was on the modeling of several sub-pathways including:
- the glucose metabolism pathway, which brings extracellular sugar into the cytosol where
it is converted to ATP through glycolysis, the Krebs cycle, and the electron transport
chain;
- the GLP1R activation pathway, which transduces signals prompting the synthesis of insulin
from the outside of the cell to the inside, where they are amplified through G-protein
signaling and cyclic AMP production;
- the insulin transcription activation pathway, where kinases activated by ATP and cyclic
AMP phosphorylate transcription factors, leading to the production of insulin mRNA;
- the insulin synthesis pathway, where insulin mRNA is translated by an endoplasmic
reticulum-embedded ribosome and later packed into insulin vesicles for eventual
exportation;
- the depolarization pathway, where ions are selectively permitted to enter and exit the cell,
leading to an increase in intracellular calcium concentration; and
- the insulin secretion pathway, where calcium mobilizes insulin vesicles toward fusing with
the cell membrane and releasing insulin into the bloodstream.
Figure 33. Example PBC pathway diagram. G protein-coupled receptors, such as GLP1, FFAR1, and
GPR40, on the cell surface are activated by their respective ligands, and then activate G-protein signaling
cascades. G-protein signaling cascades activate downstream mediators such as PKA, PKC, and EPAC2.
These downstream mediators then activate signaling pathways that result in the activation of insulin
transcription factors, such as PAX6, PDX-1, NeuroD1, and MafA, which stimulate the synthesis of insulin.
In parallel, glucose enters the PBC via transmembrane glucose transporters 1 and 2. Increased ATP production,
as a result of glycolysis, triggers potassium channels in the membrane to close, which prevents efflux of
potassium ions. As a result, sodium and calcium channels are opened to allow the influx of sodium and
calcium ions. In addition, some voltage-gated calcium channels are positively modulated by
phosphorylation via PKA. The dramatic increase in intracellular calcium ion levels then stimulates the
trafficking of insulin-containing vesicles to the cell membrane, which causes insulin release from the PBC.
Figure adapted and reprinted with permission from InTech Open [201].
Modeling PBC components in a scalable form-language
To build cellular pathways, first steps consisted of creating representations of the
components within the pathways. Within the six sub-pathways described above, fifty components
were chosen for which models were created, including proteins, nucleotides, small molecules, and
organelles. The 50 components chosen for the first phase of the interactive experience are listed in
Table 4. In developing the form language used to model the components in a cell it became clear
there was a need for a modular design pattern. The components of a cell exist across several orders
of magnitude in size, so it was challenging to create a visual style that was consistent across scales.
When representing cells, the atom is typically viewed as the smallest component, while organelles
are typically the largest. Organelles can consist of millions of atoms, so it is rare to see a
visualization of an organelle that includes atomic level detail. Even proteins, which often consist
of thousands of atoms are not typically viewed at atomic level detail, and instead viewed using
surface and ribbon displays. The goal in World in a Cell was to implement a visual system that
could remain consistent at the different scales of the cell, so the decision was made to use the
geometric shape of a tetrahedron as the fundamental unit of size. Tetrahedra are infinitely scalable
as they can not only be subdivided by smaller tetrahedra, but also stacked to form larger tetrahedra.
Hence tetrahedra were used to model small cellular components like glucose, and also to model
the largest components like organelles, with the implication being that the larger tetrahedra can be
subdivided as they are investigated at smaller size scales. An example of the tetrahedron visual
language is demonstrated in Figure 35.
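The scalability of the tetrahedron can be illustrated with a short geometric sketch. The Python below is an assumption about one way such a subdivision could be carried out, not the method used by the project's artists: it splits any tetrahedron at its edge midpoints into eight smaller tetrahedra of equal volume (four at the corners and four cut from the central octahedron). For a regular tetrahedron the four corner pieces are regular, while the four central pieces are not. The function names are hypothetical.

```python
def midpoint(p, q):
    """Midpoint of two 3D points given as tuples."""
    return tuple((pi + qi) / 2 for pi, qi in zip(p, q))


def volume(tet):
    """Volume of a tetrahedron given as four 3D points."""
    a, b, c, d = tet
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    det = (u[0] * (v[1] * w[2] - v[2] * w[1])
           - u[1] * (v[0] * w[2] - v[2] * w[0])
           + u[2] * (v[0] * w[1] - v[1] * w[0]))
    return abs(det) / 6.0


def subdivide(tet):
    """Split one tetrahedron into eight smaller ones of equal volume."""
    a, b, c, d = tet
    ab, ac, ad = midpoint(a, b), midpoint(a, c), midpoint(a, d)
    bc, bd, cd = midpoint(b, c), midpoint(b, d), midpoint(c, d)
    # Four half-scale copies of the original, one at each corner.
    corners = [(a, ab, ac, ad), (b, ab, bc, bd), (c, ac, bc, cd), (d, ad, bd, cd)]
    # The remaining octahedron is split into four pieces around the diagonal ab-cd.
    ring = [ac, bc, bd, ad]
    centre = [(ab, cd, ring[i], ring[(i + 1) % 4]) for i in range(4)]
    return corners + centre


def subdivide_to_depth(tet, depth):
    """Apply the subdivision recursively; depth 0 returns the input unchanged."""
    if depth == 0:
        return [tet]
    pieces = []
    for child in subdivide(tet):
        pieces.extend(subdivide_to_depth(child, depth - 1))
    return pieces


if __name__ == "__main__":
    unit = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    pieces = subdivide_to_depth(unit, 2)
    print(len(pieces), sum(volume(p) for p in pieces), volume(unit))
```

Read in the opposite direction, the same construction describes stacking: eight pieces at one scale assemble into a single tetrahedron at the next scale up, so a model can be refined only when the viewer moves to a smaller size scale.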
Table 4. 50 cell components that were modeled in phase I of the World in a Cell.
Component models were designed with respect to primary research data from structural
studies. One of the top-level goals in designing the World in a Cell experience was for users to be
able to enter the cell, watch components perform their roles, and then remember that component
and its role the next time it is encountered. The components should not only be memorable, but
also scientifically accurate to an acceptable degree of abstraction. So, it was important to develop
a visual style that was both engaging and intuitive, but that also remained faithful to the current
scientific understanding of a given component. For all of the protein components in the chosen
pathways, the models’ shape and size were based on experimental structures from the Protein Data
Bank when available, and homology models from SWISS-MODEL when no experimental
structure was present. Small molecule models for components like ATP and glucose were based
on molecular structures in PubChem. For larger components like organelles and the membrane of
the cell itself, imaging data was used as a basis for modeling the size and shape. Components were
modeled using a combination of entertainment production software and scientific visualization
software including Maya, Rhino, and Chimera.
Figure 34. Visual language development for World in a Cell components.
Storyboarding the interactions between PBC components
After designing the cellular environment as well as the models to populate the environment,
next steps were to build a system of interactions between the components within the cell. The
experience was developed using the Unity Game Engine’s native component interaction systems.
Before assigning behavior to the components, the sequence of events to be represented was
storyboarded. This consisted of taking revised pathway maps and looking at each edge and node
individually to devise functions and behaviors to display. Take the interaction between GLP1 and
GLP1R at the surface of the PBC, for example. Having already developed models for the two
components, it was then necessary to specify exactly how the models would connect with one
another to carry out the biological process of signaling activation. Inspiration for modeling this
connection was found in a publication presenting the crystal structure of GLP1 bound to the
extracellular domain of GLP1R [202]. Knowing the specific points of contact between the two
components allowed for the creation of an animation sequence using Maya, to specify that the
tetrahedron-based models would connect in a similar fashion as reported in the literature. Once
GLP1 binds to the receptor, the next consideration focused on a means for depicting the process
of receptor activation. Other scientists on the team found representations of structural transitions
in the literature [203] and then worked with the artists to animate the tetrahedron-based GLP1R to
perform similar structural rearrangements. This is representative of the process used to storyboard
each of the interactions within the cell experience. First, evidence of how the process takes place
was found in the literature, including the specific motions, speed, and intermediate steps, and then
those features were represented using animation and modeling software. This process is much like
the storyboarding process used in producing entertainment media, a pipeline in which the World
Building Media Lab has much experience. Storyboards were created for each of the processes
within the insulin production and export pathway as a reference for coding the specific behaviors
and motions into the cellular components in Unity. An example storyboard used to plan the process
of GLP1R activation is shown in Figure 36.
Figure 35. Storyboard of GLP1R activation process.
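A storyboarded interaction of this kind maps naturally onto a small state machine. The sketch below is a Python stand-in rather than Unity code, and it is not taken from the project: the three states, the event names `ligand_contact` and `rearrangement_complete`, and the `Receptor` class are assumptions made to show how a binding step followed by an activation step could be encoded as component behavior.

```python
from enum import Enum


class ReceptorState(Enum):
    UNBOUND = "unbound"
    LIGAND_BOUND = "ligand bound"
    ACTIVE = "active"


# Allowed transitions: (current state, event) -> next state.
# Event names are hypothetical labels for storyboard beats.
TRANSITIONS = {
    (ReceptorState.UNBOUND, "ligand_contact"): ReceptorState.LIGAND_BOUND,
    (ReceptorState.LIGAND_BOUND, "rearrangement_complete"): ReceptorState.ACTIVE,
}


class Receptor:
    """Minimal model of a receptor that binds a ligand and then activates."""

    def __init__(self):
        self.state = ReceptorState.UNBOUND
        self.history = [self.state]

    def handle(self, event):
        """Apply an event; return True if it caused a state change."""
        next_state = TRANSITIONS.get((self.state, event))
        if next_state is None:
            return False  # event is not valid in the current state, so ignore it
        self.state = next_state
        self.history.append(next_state)
        return True


if __name__ == "__main__":
    glp1r = Receptor()
    for event in ["rearrangement_complete", "ligand_contact", "rearrangement_complete"]:
        changed = glp1r.handle(event)
        print(f"{event}: changed={changed}, state={glp1r.state.value}")
```

Keeping the transitions in a table mirrors the storyboard itself: each row corresponds to one panel, and an animation could be triggered whenever `handle` reports a state change.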
Accurately depicting the scale of components was one major challenge in modeling the
cellular environment. Within the list of components chosen to model, large and small components
can differ in scale by up to five orders of magnitude. To determine the scale
of smaller components using experimentally-determined structures, structures were imported into
Chimera, rendered in surface representation, and then the volume of that surface was measured
using Chimera’s toolsets [204]. For larger components like organelles, where structural data were not
available, imaging data were used to make estimates for the dimensions of resulting models. When
scaling the models within Unity, a logarithm was used to convert the measured and estimated scale
of the components into a compressed range of values. Currently, strategies are being developed to
avoid this compression of size scale through approaches like giving the user the capabilities of
changing their own scale when navigating the cell experience. However, the current experience
does not yet have this capability, so a logarithmic size scale range is used in order to facilitate the
visual interrogation of each structure when navigating through the experience. In addition to
accurately representing the scale of components, another goal was to represent accurately the
relative number of components depicted in the cell model. A variety of sources were used to
estimate component frequency values on a per-cell basis, including omics and imaging data, as
well as estimates presented in the book Cell Biology by the Numbers, which proved to be an
invaluable resource [199]. The scale and frequency of selected representative components are shown in
Figure 37.
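The logarithmic compression of size scale can be sketched in a few lines. The Python below is a minimal illustration under stated assumptions, not the conversion used in the Unity build: the function name `compress_scale`, the choice of a base-10 logarithm, the output range of 0.1 to 1.0, and the example sizes in nanometers are all placeholders chosen to span roughly five orders of magnitude.

```python
import math


def compress_scale(size_nm, min_size_nm=0.1, max_size_nm=1e4,
                   out_min=0.1, out_max=1.0):
    """Map a physical size onto a compressed display scale.

    Sizes are interpolated linearly in log10 space, so each factor of ten in
    physical size adds the same increment to the displayed scale.
    """
    if not 0 < min_size_nm < max_size_nm:
        raise ValueError("size bounds must satisfy 0 < min < max")
    if not min_size_nm <= size_nm <= max_size_nm:
        raise ValueError("size is outside the supported range")
    span = math.log10(max_size_nm) - math.log10(min_size_nm)
    fraction = (math.log10(size_nm) - math.log10(min_size_nm)) / span
    return out_min + fraction * (out_max - out_min)


if __name__ == "__main__":
    # Rough, illustrative sizes only; not values taken from the text.
    examples_nm = {
        "small molecule": 1.0,
        "protein": 10.0,
        "vesicle": 300.0,
        "whole cell": 10000.0,
    }
    for name, size in examples_nm.items():
        print(f"{name}: {size} nm -> display scale {compress_scale(size):.2f}")
```

With these placeholder bounds, a 100,000-fold difference in physical size becomes a 10-fold difference in displayed size, which keeps the smallest components visible beside the largest at the cost of true proportions.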
Figure 36. Relative scale and frequency of components within a PBC.
Current phase and next steps
The design and development process for the World in a Cell involved: determining which
cellular pathways to represent; modeling components within the pathways; developing visuals for
the interactions between components; as well as determining the size and frequency of each
component. Following the first iteration through the design cycle for one of the sub-pathways,
models and behaviors for components were implemented within the Unity engine. As the project
continues to develop further, there are constant cycles of design and production, with each phase
informing the other. In the current build, components and interactions have been implemented for
glucose metabolism, GLP1R activation, transcription activation, insulin synthesis, and insulin
secretion sub-pathways. Additional new pathways are continually being added to the experience
as current models and interactions are simultaneously optimized. The system of interacting
components has been packaged into a Unity build that can be operated on a consumer-grade virtual
reality compatible computer. Recently, applications have been submitted to showcase the project
at festivals, and proudly, the World in a Cell has been accepted to exhibit at the 2019 Ars
Electronica Festival.
Chapter 5: Communicating Science through Digital Arts
Communication has always been a vital aspect of the discovery process. We make new
discoveries by building on previous discoveries, or as Newton put it, only by standing on the
shoulders of giants. In order to climb up to the shoulders and gain new perspectives, we must first
develop an understanding of previous discoveries, which is possible only through communication.
Throughout history we have communicated most of our ideas about science through the written
word, as in books and research publications. But now, with the democratization of tools for
creating and distributing digital media, there are new opportunities for improving the way we
communicate, not only to colleagues, but also to broad audiences. Scientists not only stand to boost
their reach and impact through media production, but by engaging in such artistic practices they
can also improve the effectiveness of their research in itself. The prevalence of science-related
media is on the rise not only in news outlets and research publications, but also through
entertainment platforms like TV, film, games, and social media. Through new tools for
disseminating ideas, scientists will additionally have the opportunity to improve the science
literacy of society and recruit more minds to participate in the process of discovery, which will
help fuel the accelerating rate of scientific advances. Presented in this chapter is a case for the
importance of effective science communication, as well as for developing skills in the art of
communication, followed by a brief overview of my efforts at USC to work with artists to create
scientific media.
The scale of science is growing and becoming more collaborative, making effective
communication more important than ever [205]. Now, it is not only common for science
collaborations to span multiple research fields, but also multiple countries [206]. Less than 125 years
ago, the first subatomic particle was discovered by Thomson in a small room with a few
improvised vacuum tubes and wires, whereas now the Large Hadron Collider discovers new
particles by employing over 2,500 individuals from many nations to operate a 17-mile long
scientific apparatus with an annual budget of more than a billion dollars [207]. In this multidisciplinary
collaboration, and the many smaller ones, scientists work with individuals outside of their
particular specialty, which requires effective communication in order to build commonly shared
goals. These collaborative scientific endeavors not only demand great communication among team
members, but also with the public, as the projects require the support of taxpayer funding.
Throughout the history of science, the communication of its results has largely been
accomplished through research publications and conference proceedings. However, as science
audiences are growing, science communication is taking on many new forms. Ideas about how to
communicate science have evolved somewhat passively as the enterprise of science has grown,
but recently much attention has been paid to the process of science communication in itself. There
are even journals solely dedicated to the communication of science, a practice that the journal
Science Communication defines as “the diffusion of knowledge, and the communication of science
and technology among professionals and to the public” [208]. In the past, science communication has
focused on the transmission of knowledge between professionals within the fields of science, as
well as to the policy makers who are appropriating funding. However, in a world rich with media,
there are new opportunities for communicating science to wider audiences, including the public.
Science is becoming more prevalent in the media and popular culture
Each day, the average citizen spends many hours consuming media, and science is growing
as a source of content [209]. According to a Pew Research poll in 2017, more than 33% of Americans
get science news at least a few times a week [210]. About 45% get their science news from
documentaries or other science video programs, and 12% get their science news through podcasts
or radio programs. Social media is also playing a large role in the growth of science audiences. In
a separate Pew study, more than 25% of Americans who use social media reported that they follow
a science-related page on Facebook [211]. The top two science-related Facebook pages, National
Geographic and Discovery, have a following of 44.3 million and 39 million individuals,
respectively. Not only are popular media outlets like National Geographic and Discovery taking
advantage of social media, but many scientific research institutes and organizations themselves are
keeping active social media accounts to engage both with fellow scientists as well as the public.
This is excluding the growing number of individual science researchers and science
communicators that are building a personal following through the internet.
In addition to the newfound presence of science in social media there is also an increased
presence in traditional entertainment media like movies and TV shows. Shows like Grey’s
Anatomy, House, CSI, and the Big Bang Theory have brought diverse fields of science and
research into millions of households. A 2017 Pew study found that 81% of Americans watch some
science-related entertainment media [212]. Scientists have much to gain by increased representation
in entertainment media, as this same study found that “viewers of science-related entertainment
media tend to see such shows as giving a positive impression of work in science, technology and
medicine.” This increased presence of science in media, and the positive impressions bestowed by
it, may play a large role in phenomena like the CSI effect, where college forensic chemistry
programs have been in greater demand by incoming students as a result of the show’s popularity.
Several universities have reported that the demand for forensic chemistry programs has as much
as doubled since the show’s debut [213]. The CSI effect can serve as inspiration for scientists in other
fields who want to grow their reach and expand their field of interest, given that science in media
can be a major factor in a student’s decision to pursue a career in the sciences. The proportion of
students pursuing degrees in science-related fields is growing rapidly, and there are more students
pursuing higher education in science now than ever [214]. All in all, the benefits of communicating
science not only help to build better collaborations and a more informed public, but also inspire
the next generation to join us in the professional pursuit of discovery.
Artistic practices benefit scientists in their communications and productivity
Recently, the arts have become a popular medium for communicating science [215]. For
scientists, an artistic hobby might not only improve their ability to communicate their
work, but can also boost their productivity in the lab [215]. In a 2008 study, researchers found that
Nobel Prize-winning scientists were almost three times as likely as an average scientist to possess
an artistic hobby related to communication, such as painting, photography, acting, composing, or
dancing [216]. Further, one study followed 38 scientists from 1958 to 1988, including some
who rarely published and four who went on to win Nobel prizes, and attempted to determine how
particular hobbies might lead to professional success for a scientist [217]. As a result, “significant
correlations were found between scientific success and particular modes of thinking (especially
visual ones), [and] between success and various hobbies (especially artistic and musical ones)....”
The leader of the study, Root-Bernstein, later said in an interview that when the scientists were
ranked based on their success “All the people in the bottom third, … the worst group, basically
said there’s two cultures and scientists and artists… can’t talk to each other.” While “all the people,
without exception, at the top said that’s ridiculous” [218]. This further suggests that scientists may
have much to gain from opening themselves up to the world of the arts.
The mechanisms operating behind the hypothesis that artistic practices can help an
individual in their scientific pursuits could result from science and art being so closely related in
practice. This may not always be apparent. To those outside the disciplines of art and science, it
may seem that science is purely concerned with the objective, while art is concerned with the
subjective; or that science focuses on logic, while art focuses on emotion. Similarly, science takes
observations of the outside world and brings them to the inner world of our psyche, while art takes
observations of the inner world of psyche and presents them to the outside world. However, those
who have engaged in both artistic and scientific practices know that there are more similarities
than differences between the science lab and the art studio. Artists and scientists share a desire to
understand the natural world through processes of inquiry and discovery. The processes of art and
science both engage in the ongoing feedback loop between thinking and testing, where open
inquiry is encouraged, and community and criticism are vital.
Most scientific training programs offer little support in preparing scientists with skills in
communication [219]. If a science student in higher education receives any training in communication,
it is likely for publishing technical communications, like research articles, which are often written
in highly subject-specific language, or jargon [220]. If there is little support in preparing science
students for technical communication, there is a near absence of support for learning skills in the
arts of visual communication and storytelling [221]. The few occasions when scientists do encounter
the need for traditional artistic communication are when preparing figures for research
publications, conference posters, or presentation slides. Though, most scientists will hardly
acknowledge that scientific figures are a form of art, even as the journal Science refers to figures
as “art” in their submission guidelines, and while figures sent to Nature Publishing Group are
reviewed by their “Art Department.” Figures are often highly technical as well, but can benefit
from the lessons of visual design and storytelling. In some scientific communities and journals,
there is a new movement focused on redesigning the display of research outcomes visually through
what are called “visual abstracts” [222]. Visual abstracts are like infographics that summarize a
paper’s main findings, and researchers are noticing that visual abstracts can not only boost
effectiveness in information dissemination within their particular scientific field, but also to the
public, as they are receiving much attention on social media, especially Twitter
(#VisualAbstract) [223]. All in all, scientists have much to gain from learning and implementing skills
in visual communication and storytelling, as they can boost one’s impact and reach.
Not long ago, scientists privy to the idea that effective communication through the visual
arts and storytelling can boost their research impact had few resources to learn from, and often
turned to a relatively scarce group of data visualization specialists, like Edward Tufte, Jacques
Bertin, and Irving Geis [224]. But now, there is an abundance of resources to help scientists get their
point across through image, video, and narrative. Scientific journals have been leaning into the
wave of building visual communication literacy by publishing articles and special issues on the
topic. One recent paper in PLOS Computational Biology titled “Ten Simple Rules for Better
Figures” has been read more than 400,000 times with over 1,500 shares on social media [225]. In
addition to new resources from scientific journals, a recent surge of books has been written on
the topic. Currently, one data visualization book is within the top 2,000 best sellers on Amazon:
Storytelling with Data [226]. Along with the many emerging data visualization resources in the
scientific literature and books, there is a wealth of new blogs and social media channels dedicated
to the topic. On Twitter, the Data Visualization Society has more than 14,000 followers, while the
top three data visualization groups on Facebook have a combined following of more than 25,000
individuals.
Not only are there a growing number of resources for scientists to learn techniques in the
art of storytelling and visual communication, recent advances in digital media production tools
have made artistic toolkits accessible to the general public, as well as scientists. No longer does
one need advanced training to make a high-quality video or graphic about their research. This is
creating an abundance of new opportunities for scientists to engage with artistic practices to
communicate their science, not only to the general public, but also to colleagues. Scientists should
take advantage of these opportunities, because as a group, there is much room for improvement
when it comes to communication. A 2019 Pew study found that while 89% of the public views
scientists as intelligent, only 54% see scientists as good communicators [227].
As artistic digital media production tools are becoming more accessible to lay audiences,
including scientists, there are also growing numbers of university programs focused on providing
training in digital media production. This trend is leading to an increase in trained media
production specialists becoming available to work with scientists to produce science-related
media. Further, many individuals are beginning to cross over between fields in digital media and
science research. As a result, communication literacy is growing in scientific communities. This
has all been fueled by the adjacent rise in social media, which has created new venues for building
large followings around research and scientific topics. Not only is the general public taking
advantage of these new means of access to scientific developments, but many science researchers
themselves are getting their science news from social media. In 2015, the Pew Research Center
found that 47% of scientists use social media to follow and discuss science [228], while a 2016 study
titled “How are Scientists Using Social Media in the Workplace” observed that most scientists
focus their attention toward Twitter, Facebook, and LinkedIn [229]. There are no prominent studies
on the number of scientists using social media to promote their science, but given the trends in
viewership it is likely that these numbers are also on the rise. When scientists produce content to
deliver scientific messages, they not only help foster a more scientifically informed public, but also
engage in new exchanges of ideas with fellow scientists, all while inspiring the next generation to
join the scientific enterprise.
Creating media to communicate science through digital arts approaches
During my graduate school work, I had the opportunity to embrace this new future of
science communication by founding an organization that brings together artists and scientists to
collaborate in creating media focused on presenting science to diverse audiences. Over the last
three years, my organization, The Bridge Art + Science Alliance (BASA), has produced over 15 projects in
mediums like live-action video, animation, games, software, comics, and more. We have also
hosted events at USC and in greater Los Angeles focused on the intersection of art and science.
These events served not only to communicate art-science developments to both the general public
and research audiences, but also to connect artists and scientists for interdisciplinary
collaborations. All of this was made possible by generous funding from the USC Michelson Center
for Convergent Bioscience, the more than one hundred individuals who collaborated and
participated in the organization, and the help of many mentors, especially my advisor Ray Stevens.
Entering graduate school, I had no formal training in media production, though I had been
engaged in the arts since childhood through music writing and recording, video production, and
graphic design. After accepting a graduate research position at USC, I saw an opportunity not only
to pursue science, but also to embrace my enthusiasm for the arts by working with and learning
from artists at USC’s School of Cinematic Arts. USC’s film school benefits from its proximity to
Hollywood and has many highly decorated members from the entertainment industry as faculty.
This advantage gives the school a consistent top ranking among film programs, as well as an
incredibly talented base of students, many of whom are excited about making science-related media.
Through my endeavors to collaborate with artists at USC, I not only received formal training in
digital media production, storytelling, and visual design, but also had the opportunity to gain
hands-on experience working with artists to create scientific media.
One of the first art-science projects I undertook through BASA was to use entertainment
media production tools to create an interactive digital exploration of the human cell. This project,
which would ultimately become The World in a Cell (described in Chapter 4), was largely inspired
by the very popular “Inner Life of a Cell” video produced by Harvard’s BioVisions program in
2006. The video has received more than 1.5 million views on YouTube alone, which suggests a
demand for media that explains the foreign and intricate world inside our cells. One of the
communication challenges we addressed in this project was the representation of molecular
signaling. As difficult as it is to prepare static models of molecules, showing their dynamic
behavior provided an even greater challenge. We used this as an opportunity to not only work with
influential molecular and cellular biologists like Helen Berman, Art Olson, and David Goodsell to
create new molecular representations that deliver information about structural features, but also to
work with renowned digital media artists, like Alex McDowell and Richard Weinberg, to create
dynamic and interactive representations of molecular function. As a result, we developed an
interactive platform that allows participants to witness and engage with the process of molecular
signaling, as well as other cellular phenomena.
In producing scientific media through BASA, I not only had the opportunity to work with
experienced professors and practitioners of digital media, but also with many students from the
USC School of Cinematic Arts. In video projects alone, I collaborated with USC artists and
scientists to produce and direct nine films covering a range of scientific aspects in research and
popular culture. These projects ranged in form and content from live-action shorts highlighting
research institutes and programs, to claymation films about the personal impact of cancer
diagnosis. Our videos have received awards at more than 20 domestic and international film
festivals. To name a few, “Tiniest Tremor,” a film about neonatal opiate addiction, won the
distinction of becoming an Official Selection for premiere at SXSW. Moreover, DANI, a short
film about cancer diagnosis, won first place at the Palm Springs Film Festival and received
consideration for an Emmy nomination as a result. In addition to film festivals and other events, we
promote our videos on social media where we have received significant viewership. For example,
a video I produced and directed about the PBCC (described in Chapter 2) titled “The Ultimate
Modeling Problem” received more than 2000 views on Facebook in the first week after it was
posted.
Success in film and interactive media projects led to opportunities to work with established
groups in science communication, like The Science and Entertainment Exchange. The Exchange
is a program of the National Academy of Sciences that connects entertainment industry
professionals with top scientists to create a synergy between accurate science and storytelling. The
group is now an advisor to the BASA program and has joined us as a partner working out of USC.
This has led to new opportunities for hosting events at the intersection of art and science, and
together we have hosted workshops, talks, and networking events aimed at engaging and
connecting artists and scientists at USC. As an outgrowth of the success in science media projects,
I have had the opportunity to give talks and keynotes at several conferences focused on art, science,
and communication, including the Association of Medical Illustrators Annual Conference, the
SoCal Science Writing Symposium, and VRSciFest. Additionally, through this work I was a proud
recipient of a 2019 Passion in Science Award for Arts and Creativity from New England Biolabs [230].
Communication will continue to be a fundamental aspect of the discovery process. New
opportunities and platforms for communicating science will continue to develop, as they have with
the recent democratization of tools for creating and distributing digital media. Scientists have much
to gain from expanding their reach and impact through media production, and engaging in such
artistic practices can also improve the efficacy of one’s research in itself. Through communication,
scientists have the opportunity to improve the science literacy of society and recruit more minds
to participate in the process of discovery, which will help fuel the accelerating rate of scientific
advances. I have been fortunate to benefit in my own development as a scientist and communicator
through the opportunities presented here. In all, I feel energized to continue not only in my
scientific pursuits, but also to work toward boosting the impact of these outcomes through a continued
dedication to the practice of communicating science.
References
1. Trikkalinou, A.; Papazafiropoulou, A. K.; Melidonis, A., Type 2 diabetes and quality of
life. World J Diabetes 2017, 8 (4), 120-129.
2. Johnston, N. R.; Mitchell, R. K.; Haythorne, E.; Pessoa, M. P.; Semplici, F.; Ferrer, J.;
Piemonti, L.; Marchetti, P.; Bugliani, M.; Bosco, D.; Berishvili, E.; Duncanson, P.; Watkinson,
M.; Broichhagen, J.; Trauner, D.; Rutter, G. A.; Hodson, D. J., Beta Cell Hubs Dictate Pancreatic
Islet Responses to Glucose. Cell Metab 2016, 24 (3), 389-401.
3. Chen, C.; Cohrs, C. M.; Stertmann, J.; Bozsak, R.; Speier, S., Human beta cell mass and
function in diabetes: Recent advances in knowledge and technologies to understand disease
pathogenesis. Mol Metab 2017, 6 (9), 943-957.
4. MacDonald, P. E.; Joseph, J. W.; Rorsman, P., Glucose-sensing mechanisms in pancreatic
beta-cells. Philos Trans R Soc Lond B Biol Sci 2005, 360 (1464), 2211-2225.
5. Kharroubi, A. T.; Darwish, H. M., Diabetes mellitus: The epidemic of the century. World
J Diabetes 2015, 6 (6), 850-867.
6. Berlanga-Acosta, J.; Schultz, G. S.; López-Mola, E.; Guillen-Nieto, G.; García-Siverio,
M.; Herrera-Martínez, L., Glucose toxic effects on granulation tissue productive cells: the
diabetics' impaired healing. Biomed Res Int 2013, 2013, 256043-256043.
7. Atkinson, M. A.; Eisenbarth, G. S.; Michels, A. W., Type 1 diabetes. Lancet 2014, 383
(9911), 69-82.
8. Monaghan, M.; Helgeson, V.; Wiebe, D., Type 1 diabetes in young adulthood. Curr
Diabetes Rev 2015, 11 (4), 239-250.
9. DeFronzo, R. A.; Ferrannini, E.; Groop, L.; Henry, R. R.; Herman, W. H.; Holst, J. J.; Hu,
F. B.; Kahn, C. R.; Raz, I.; Shulman, G. I.; Simonson, D. C.; Testa, M. A.; Weiss, R., Type 2
diabetes mellitus. Nature Reviews Disease Primers 2015, 1 (1), 15019.
10. Wu, Y.; Ding, Y.; Tanaka, Y.; Zhang, W., Risk factors contributing to type 2 diabetes and
recent advances in the treatment and prevention. Int J Med Sci 2014, 11 (11), 1185-1200.
11. Steck, A. K.; Rewers, M. J., Genetics of type 1 diabetes. Clin Chem 2011, 57 (2), 176-185.
12. Prasad, R. B.; Groop, L., Genetics of type 2 diabetes-pitfalls and possibilities. Genes
(Basel) 2015, 6 (1), 87-123.
13. Olokoba, A. B.; Obateru, O. A.; Olokoba, L. B., Type 2 diabetes mellitus: a review of
current trends. Oman Med J 2012, 27 (4), 269-273.
14. Röder, P. V.; Wu, B.; Liu, Y.; Han, W., Pancreatic regulation of glucose homeostasis. Exp
Mol Med 2016, 48 (3), e219-e219.
15. Boucher, J.; Kleinridders, A.; Kahn, C. R., Insulin receptor signaling in normal and insulin-
resistant states. Cold Spring Harb Perspect Biol 2014, 6 (1), a009191.
16. Karamanou, M.; Protogerou, A.; Tsoucalas, G.; Androutsos, G.; Poulakou-Rebelakou, E.,
Milestones in the history of diabetes mellitus: The main contributors. World J Diabetes 2016, 7
(1), 1-7.
17. Ahmed, A. M., History of diabetes mellitus. Saudi Med J 2002, 23 (4), 373-8.
18. Fradkin, J. E.; Rodgers, G. P., Diabetes research: a perspective from the National Institute
of Diabetes and Digestive and Kidney Diseases. Diabetes 2013, 62 (2), 320-326.
19. Singla, J.; McClary, K. M.; White, K. L.; Alber, F.; Sali, A.; Stevens, R. C., Opportunities
and Challenges in Building a Spatiotemporal Multi-scale Model of the Human Pancreatic β Cell.
Cell 2018, 173 (1), 11-19.
20. Feig, M.; Harada, R.; Mori, T.; Yu, I.; Takahashi, K.; Sugita, Y., Complete atomistic model
of a bacterial cytoplasm for integrating physics, biochemistry, and systems biology. Journal of
Molecular Graphics and Modelling 2015, 58, 1-9.
21. Yu, I.; Mori, T.; Ando, T.; Harada, R.; Jung, J.; Sugita, Y.; Feig, M., Biomolecular
interactions modulate macromolecular structure and dynamics in atomistic model of a bacterial
cytoplasm. Elife 2016, 5, e19274.
22. Hasnain, S.; McClendon, C. L.; Hsu, M. T.; Jacobson, M. P.; Bandyopadhyay, P., A New
Coarse-Grained Model for E. coli Cytoplasm: Accurate Calculation of the Diffusion Coefficient
of Proteins and Observation of Anomalous Diffusion. PLOS ONE 2014, 9 (9), e106466.
23. McGuffee, S. R.; Elcock, A. H., Diffusion, Crowding & Protein Stability in a Dynamic
Molecular Model of the Bacterial Cytoplasm. PLOS Computational Biology 2010, 6 (3),
e1000694.
24. Johnson, G. T.; Goodsell, D. S.; Autin, L.; Forli, S.; Sanner, M. F.; Olson, A. J., 3D
molecular models of whole HIV-1 virions generated with cellPACK. Faraday Discuss 2014, 169
(0), 23-44.
25. Johnson, G. T.; Autin, L.; Al-Alusi, M.; Goodsell, D. S.; Sanner, M. F.; Olson, A. J.,
cellPACK: a virtual mesoscope to model and visualize structural systems biology. Nat Methods
2015, 12 (1), 85-91.
26. Wilhelm, B. G.; Mandad, S.; Truckenbrodt, S.; Kröhnert, K.; Schäfer, C.; Rammner, B.;
Koo, S. J.; Claßen, G. A.; Krauss, M.; Haucke, V.; Urlaub, H.; Rizzoli, S. O., Composition of
isolated synaptic boutons reveals the amounts of vesicle trafficking proteins. Science 2014, 344
(6187), 1023.
27. Noske, A. B.; Costin, A. J.; Morgan, G. P.; Marsh, B. J., Expedited approaches to whole
cell electron tomography and organelle mark-up in situ in high-pressure frozen pancreatic islets. J
Struct Biol 2008, 161 (3), 298-313.
28. Karr, J. R.; Sanghvi, J. C.; Macklin, D. N.; Gutschow, M. V.; Jacobs, J. M.; Bolival, B.,
Jr.; Assad-Garcia, N.; Glass, J. I.; Covert, M. W., A whole-cell computational model predicts
phenotype from genotype. Cell 2012, 150 (2), 389-401.
29. King, Z. A.; Lu, J.; Dräger, A.; Miller, P.; Federowicz, S.; Lerman, J. A.; Ebrahim, A.;
Palsson, B. O.; Lewis, N. E., BiGG Models: A platform for integrating, standardizing and sharing
genome-scale models. Nucleic Acids Res 2016, 44 (D1), D515-D522.
30. Cowan, A. E.; Moraru, I. I.; Schaff, J. C.; Slepchenko, B. M.; Loew, L. M., Spatial
modeling of cell signaling networks. Methods Cell Biol 2012, 110, 195-221.
31. Moraru, I. I.; Schaff, J. C.; Slepchenko, B. M.; Blinov, M. L.; Morgan, F.;
Lakshminarayana, A.; Gao, F.; Li, Y.; Loew, L. M., Virtual Cell modelling and simulation
software environment. IET Syst Biol 2008, 2 (5), 352-362.
32. Stiles, J. R.; Van Helden, D.; Bartol, T. M., Jr.; Salpeter, E. E.; Salpeter, M. M., Miniature
endplate current rise times less than 100 microseconds from improved dual recordings can be
modeled with passive acetylcholine diffusion from a synaptic vesicle. Proc Natl Acad Sci U S A
1996, 93 (12), 5747-5752.
33. Tomita, M.; Hashimoto, K.; Takahashi, K.; Shimizu, T. S.; Matsuzaki, Y.; Miyoshi, F.;
Saito, K.; Tanida, S.; Yugi, K.; Venter, J. C.; Hutchison, C. A., 3rd, E-CELL: software
environment for whole-cell simulation. Bioinformatics 1999, 15 (1), 72-84.
34. Thul, P. J.; Åkesson, L.; Wiking, M.; Mahdessian, D.; Geladaki, A.; Ait Blal, H.; Alm, T.;
Asplund, A.; Björk, L.; Breckels, L. M.; Bäckström, A.; Danielsson, F.; Fagerberg, L.; Fall, J.;
Gatto, L.; Gnann, C.; Hober, S.; Hjelmare, M.; Johansson, F.; Lee, S.; Lindskog, C.; Mulder, J.;
Mulvey, C. M.; Nilsson, P.; Oksvold, P.; Rockberg, J.; Schutten, R.; Schwenk, J. M.; Sivertsson,
Å.; Sjöstedt, E.; Skogs, M.; Stadler, C.; Sullivan, D. P.; Tegel, H.; Winsnes, C.; Zhang, C.;
Zwahlen, M.; Mardinoglu, A.; Pontén, F.; von Feilitzen, K.; Lilley, K. S.; Uhlén, M.; Lundberg,
E., A subcellular map of the human proteome. Science 2017, 356 (6340), eaal3321.
35. Murphy, R. F., CellOrganizer: Image-derived models of subcellular organization and
protein distribution. Methods Cell Biol 2012, 110, 179-193.
36. Murphy, R. F., Building cell models and simulations from microscope images. Methods
2016, 96, 33-39.
37. Fabregat, A.; Sidiropoulos, K.; Garapati, P.; Gillespie, M.; Hausmann, K.; Haw, R.; Jassal,
B.; Jupe, S.; Korninger, F.; McKay, S.; Matthews, L.; May, B.; Milacic, M.; Rothfels, K.;
Shamovsky, V.; Webber, M.; Weiser, J.; Williams, M.; Wu, G.; Stein, L.; Hermjakob, H.;
D'Eustachio, P., The Reactome pathway Knowledgebase. Nucleic Acids Res 2015, 44 (D1), D481-
D487.
38. Komatsu, T.; Johnsson, K.; Okuno, H.; Bito, H.; Inoue, T.; Nagano, T.; Urano, Y., Real-
Time Measurements of Protein Dynamics Using Fluorescence Activation-Coupled Protein
Labeling Method. Journal of the American Chemical Society 2011, 133 (17), 6745-6751.
39. Mikuni, T.; Nishiyama, J.; Sun, Y.; Kamasawa, N.; Yasuda, R., High-Throughput, High-
Resolution Mapping of Protein Localization in Mammalian Brain by In Vivo Genome Editing.
Cell 2016, 165 (7), 1803-1817.
40. Ando, T.; Skolnick, J., BROWNIAN DYNAMICS SIMULATION OF
MACROMOLECULE DIFFUSION IN A PROTOCELL. Quantum Bioinform IV (2010) 2011, 28,
413-426.
41. Długosz, M.; Trylska, J., Diffusion in crowded biological environments: applications of
Brownian dynamics. BMC Biophys 2011, 4, 3-3.
42. Mosca, R.; Céol, A.; Aloy, P., Interactome3D: adding structural details to protein networks.
Nat Methods 2012, 10, 47.
43. Alber, F.; Dokudovskaya, S.; Veenhoff, L. M.; Zhang, W.; Kipper, J.; Devos, D.; Suprapto,
A.; Karni-Schmidt, O.; Williams, R.; Chait, B. T.; Rout, M. P.; Sali, A., Determining the
architectures of macromolecular assemblies. Nature 2007, 450 (7170), 683-694.
44. Ward, A. B.; Sali, A.; Wilson, I. A., Biochemistry. Integrative structural biology. Science
(New York, N.Y.) 2013, 339 (6122), 913-915.
45. Schneidman-Duhovny, D.; Pellarin, R.; Sali, A., Uncertainty in integrative structural
modeling. Curr Opin Struct Biol 2014, 28, 96-104.
46. Yuan, Z.; Bailey, T. L.; Teasdale, R. D., Prediction of protein B-factor profiles. Proteins
2005, 58 (4), 905-12.
47. Aguayo-Mazzucato, C.; van Haaren, M.; Mruk, M.; Lee, T. B., Jr.; Crawford, C.; Hollister-
Lock, J.; Sullivan, B. A.; Johnson, J. W.; Ebrahimi, A.; Dreyfuss, J. M.; Van Deursen, J.; Weir, G.
C.; Bonner-Weir, S., β Cell Aging Markers Have Heterogeneous Distribution and Are Induced by
Insulin Resistance. Cell Metab 2017, 25 (4), 898-910.e5.
48. Dorrell, C.; Schug, J.; Canaday, P. S.; Russ, H. A.; Tarlow, B. D.; Grompe, M. T.; Horton,
T.; Hebrok, M.; Streeter, P. R.; Kaestner, K. H.; Grompe, M., Human islets contain four distinct
subtypes of β cells. Nat Commun 2016, 7, 11756-11756.
49. Gutierrez, G. D.; Gromada, J.; Sussel, L., Heterogeneity of the Pancreatic Beta Cell. Front
Genet 2017, 8, 22-22.
50. Tamura, H.; Matsumoto, G.; Itakura, Y.; Terai, H.; Ikebuchi, K.; Mitarai, T.; Isoda, K., A
Case of Congenital Dyserythropoietic Anemia Type II Associated with Hemochromatosis.
Internal Medicine 1992, 31 (3), 380-384.
51. Sanghvi, J. C.; Regot, S.; Carrasco, S.; Karr, J. R.; Gutschow, M. V.; Bolival, B., Jr.;
Covert, M. W., Accelerated discovery via a whole-cell model. Nat Methods 2013, 10 (12), 1192-
1195.
52. Earnest, T. M.; Watanabe, R.; Stone, J. E.; Mahamid, J.; Baumeister, W.; Villa, E.; Luthey-
Schulten, Z., Challenges of Integrating Stochastic Dynamics and Cryo-Electron Tomograms in
Whole-Cell Simulations. J Phys Chem B 2017, 121 (15), 3871-3881.
53. Miao, Y.; Feixas, F.; Eun, C.; McCammon, J. A., Accelerated molecular dynamics
simulations of protein folding. J Comput Chem 2015, 36 (20), 1536-1549.
54. Zheng, S. Q.; Branlund, E.; Kesthelyi, B.; Braunfeld, M. B.; Cheng, Y.; Sedat, J. W.;
Agard, D. A., A distributed multi-GPU system for high speed electron microscopic tomographic
reconstruction. Ultramicroscopy 2011, 111 (8), 1137-1143.
55. Kendrew, J. C., Myoglobin and the Structure of Proteins. Science 1963, 139 (3561), 1259.
56. Burley, S. K.; Kurisu, G.; Markley, J. L.; Nakamura, H.; Velankar, S.; Berman, H. M.; Sali,
A.; Schwede, T.; Trewhella, J., PDB-Dev: a Prototype System for Depositing Integrative/Hybrid
Structural Models. Structure 2017, 25 (9), 1317-1318.
57. Sali, A.; Berman, H. M.; Schwede, T.; Trewhella, J.; Kleywegt, G.; Burley, S. K.; Markley,
J.; Nakamura, H.; Adams, P.; Bonvin, A. M. J. J.; Chiu, W.; Peraro, M. D.; Di Maio, F.; Ferrin, T.
E.; Grünewald, K.; Gutmanas, A.; Henderson, R.; Hummer, G.; Iwasaki, K.; Johnson, G.; Lawson,
C. L.; Meiler, J.; Marti-Renom, M. A.; Montelione, G. T.; Nilges, M.; Nussinov, R.; Patwardhan,
A.; Rappsilber, J.; Read, R. J.; Saibil, H.; Schröder, G. F.; Schwieters, C. D.; Seidel, C. A. M.;
Svergun, D.; Topf, M.; Ulrich, E. L.; Velankar, S.; Westbrook, J. D., Outcome of the First wwPDB
Hybrid/Integrative Methods Task Force Workshop. Structure 2015, 23 (7), 1156-1167.
58. McEntyre, J.; Sarkans, U.; Brazma, A., The BioStudies database. Mol Syst Biol 2015, 11
(12), 847-847.
59. Golubovskaya, V.; Berahovich, R.; Zhou, H.; Xu, S.; Harto, H.; Li, L.; Chao, C.-C.; Mao,
M. M.; Wu, L., CD47-CAR-T Cells Effectively Kill Target Cancer Cells and Block Pancreatic
Tumor Growth. Cancers (Basel) 2017, 9 (10), 139.
60. Millman, J. R.; Xie, C.; Van Dervort, A.; Gürtler, M.; Pagliuca, F. W.; Melton, D. A.,
Generation of stem cell-derived β-cells from patients with type 1 diabetes. Nat Commun 2016, 7,
11463-11463.
61. Rezania, A.; Bruin, J. E.; Arora, P.; Rubin, A.; Batushansky, I.; Asadi, A.; O'Dwyer, S.;
Quiskamp, N.; Mojibian, M.; Albrecht, T.; Yang, Y. H. C.; Johnson, J. D.; Kieffer, T. J., Reversal
of diabetes with insulin-producing cells derived in vitro from human pluripotent stem cells. Nature
Biotechnology 2014, 32, 1121.
62. Purcell, O.; Jain, B.; Karr, J. R.; Covert, M. W.; Lu, T. K., Towards a whole-cell modeling
approach for synthetic biology. Chaos 2013, 23 (2), 025112-025112.
63. Betts, M. J.; Russell, R. B., The hard cell: from proteomics to a whole cell model. FEBS
Lett 2007, 581 (15), 2870-6.
64. Carrera, J.; Covert, M. W., Why Build Whole-Cell Models? Trends Cell Biol 2015, 25
(12), 719-722.
65. Horwitz, R., Integrated, multi-scale, spatial–temporal cell biology – A next step in the post
genomic era. Methods 2016, 96, 3-5.
66. Horwitz, R.; Johnson, G. T., Whole cell maps chart a course for 21st-century cell biology.
Science 2017, 356 (6340), 806.
67. Macklin, D. N.; Ruggero, N. A.; Covert, M. W., The future of whole-cell modeling. Curr
Opin Biotechnol 2014, 28, 111-115.
68. Roberts, E., Cellular and molecular structure as a unifying framework for whole-cell
modeling. Curr Opin Struct Biol 2014, 25, 86-91.
69. Mountjoy, K. G.; Mortrud, M. T.; Low, M. J.; Simerly, R. B.; Cone, R. D., Localization of
the melanocortin-4 receptor (MC4-R) in neuroendocrine and autonomic control circuits in the
brain. Mol Endocrinol 1994, 8 (10), 1298-308.
70. Gantz, I.; Fong, T., The Melanocortin System. American journal of physiology.
Endocrinology and metabolism 2003, 284, E468-74.
71. Cong, X.; Topin, J.; Golebiowski, J., Class A GPCRs: Structure, Function, Modeling and
Structure-based Ligand Design. Curr Pharm Des 2017, 23 (29), 4390-4409.
72. Yang, Y.; Harmon, C. M., Molecular signatures of human melanocortin receptors for
ligand binding and signaling. Biochimica et Biophysica Acta (BBA) - Molecular Basis of Disease
2017, 1863 (10, Part A), 2436-2447.
73. Huszar, D.; Lynch, C. A.; Fairchild-Huntress, V.; Dunmore, J. H.; Fang, Q.; Berkemeier,
L. R.; Gu, W.; Kesterson, R. A.; Boston, B. A.; Cone, R. D.; Smith, F. J.; Campfield, L. A.; Burn,
P.; Lee, F., Targeted Disruption of the Melanocortin-4 Receptor Results in Obesity in Mice. Cell
1997, 88 (1), 131-141.
74. Iepsen, E. W.; Zhang, J.; Thomsen, H. S.; Hansen, E. L.; Hollensted, M.; Madsbad, S.;
Hansen, T.; Holst, J. J.; Holm, J.-C.; Torekov, S. S., Patients with Obesity Caused by
Melanocortin-4 Receptor Mutations Can Be Treated with a Glucagon-like Peptide-1 Receptor
Agonist. Cell Metab 2018, 28 (1), 23-32.e3.
75. Tao, Y.-X., The melanocortin-4 receptor: physiology, pharmacology, and
pathophysiology. Endocr Rev 2010, 31 (4), 506-543.
76. da Silva, A. A.; do Carmo, J. M.; Wang, Z.; Hall, J. E., The brain melanocortin system,
sympathetic control, and obesity hypertension. Physiology (Bethesda) 2014, 29 (3), 196-202.
77. Rodrigues, A. R.; Almeida, H.; Gouveia, A. M., Intracellular signaling mechanisms of the
melanocortin receptors: current state of the art. Cellular and Molecular Life Sciences 2015, 72 (7),
1331-1345.
78. Yang, L.-K.; Tao, Y.-X., Biased signaling at neural melanocortin receptors in regulation
of energy homeostasis. Biochimica et Biophysica Acta (BBA) - Molecular Basis of Disease 2017,
1863 (10, Part A), 2486-2495.
79. Mountjoy, K. G.; Mortrud, M. T.; Low, M. J.; Simerly, R. B.; Cone, R. D., Localization of
the melanocortin-4 receptor (MC4-R) in neuroendocrine and autonomic control circuits in the
brain. Molecular Endocrinology 1994, 8 (10), 1298-1308.
80. Siljee, J. E.; Unmehopa, U. A.; Kalsbeek, A.; Swaab, D. F.; Fliers, E.; Alkemade, A., Melanocortin
4 receptor distribution in the human hypothalamus. European Journal of Endocrinology 2013, 168
(3), 361-369.
81. Poggioli, R.; Vergoni, A. V.; Bertolini, A., ACTH-(1–24) and α-MSH antagonize feeding
behavior stimulated by kappa opiate agonists. Peptides 1986, 7 (5), 843-848.
82. Lu, D.; Willard, D.; Patel, I. R.; Kadwell, S.; Overton, L.; Kost, T.; Luther, M.; Chen, W.;
Woychik, R. P.; Wilkison, W. O.; Cone, R. D., Agouti protein is an antagonist of the melanocyte-
stimulating-hormone receptor. Nature 1994, 371 (6500), 799-802.
83. Hruby, V. J.; Lu, D. S.; Sharma, S. D.; Castrucci, A. D.; Kesterson, R. A.; Alobeidi, F. A.;
Hadley, M. E.; Cone, R. D., Cyclic Lactam Alpha-Melanotropin Analogs of Ac-Nle(4)-
Cyclo[Asp(5),D-Phe(7),Lys(10)] Alpha-Melanocyte-Stimulating Hormone-(4-10)-Nh2 with
Bulky Aromatic-Amino-Acids at Position-7 Show High Antagonist Potency and Selectivity at
Specific Melanocortin Receptors. J Med Chem 1995, 38 (18), 3454-3461.
84. Fan, W.; Boston, B. A.; Kesterson, R. A.; Hruby, V. J.; Cone, R. D., Role of
melanocortinergic neurons in feeding and the agouti obesity syndrome. Nature 1997, 385 (6612),
165-168.
85. Vaisse, C.; Clement, K.; Durand, E.; Hercberg, S.; Guy-Grand, B.; Froguel, P.,
Melanocortin-4 receptor mutations are a frequent and heterogeneous cause of morbid obesity. J
Clin Invest 2000, 106 (2), 253-262.
86. Farooqi, I. S.; Yeo, G. S.; Keogh, J. M.; Aminian, S.; Jebb, S. A.; Butler, G.; Cheetham,
T.; O'Rahilly, S., Dominant and recessive inheritance of morbid obesity associated with
melanocortin 4 receptor deficiency. J Clin Invest 2000, 106 (2), 271-279.
87. Kühnen, P.; Krude, H.; Biebermann, H., Melanocortin-4 Receptor Signalling: Importance
for Weight Regulation and Obesity Treatment. Trends in Molecular Medicine 2019, 25 (2), 136-
148.
88. Ellacott, K. L. J.; Cone, R. D., The role of the central melanocortin system in the regulation
of food intake and energy homeostasis: lessons from mouse models. Philos Trans R Soc Lond B
Biol Sci 2006, 361 (1471), 1265-1274.
89. Peterli, R.; Peters, T.; von Flüe, M.; Hoch, M.; Eberle, A. N., Melanocortin-4 Receptor
Gene and Complications after Gastric Banding. Obesity Surgery 2006, 16 (2), 189-195.
90. Sarzynski, M. A.; Jacobson, P.; Rankinen, T.; Carlsson, B.; Sjöström, L.; Bouchard, C.;
Carlsson, L. M. S., Associations of markers in 11 obesity candidate genes with maximal weight
loss and weight regain in the SOS bariatric surgery cases. International Journal of Obesity 2011,
35 (5), 676-683.
91. Kuo, J. J.; Silva, A. A.; Hall, J. E., Hypothalamic melanocortin receptors and chronic
regulation of arterial pressure and renal function. Hypertension 2003, 41 (3 Pt 2), 768-74.
92. Tallam, L. S.; Kuo, J. J.; da Silva, A. A.; Hall, J. E., Cardiovascular, renal, and metabolic
responses to chronic central administration of agouti-related peptide. Hypertension 2004, 44 (6),
853-8.
93. Watanobe, H.; Schiöth, H. B.; Wikberg, J. E. S.; Suda, T., The Melanocortin 4 Receptor
Mediates Leptin Stimulation of Luteinizing Hormone and Prolactin Surges in Steroid-Primed
Ovariectomized Rats. Biochemical and Biophysical Research Communications 1999, 257 (3),
860-864.
94. Van der Ploeg, L. H. T.; Martin, W. J.; Howard, A. D.; Nargund, R. P.; Austin, C. P.; Guan,
X.; Drisko, J.; Cashen, D.; Sebhat, I.; Patchett, A. A.; Figueroa, D. J.; DiLella, A. G.; Connolly,
B. M.; Weinberg, D. H.; Tan, C. P.; Palyha, O. C.; Pong, S.-S.; MacNeil, T.; Rosenblum, C.;
Vongs, A.; Tang, R.; Yu, H.; Sailer, A. W.; Fong, T. M.; Huang, C.; Tota, M. R.; Chang, R. S.;
Stearns, R.; Tamvakopoulos, C.; Christ, G.; Drazen, D. L.; Spar, B. D.; Nelson, R. J.; MacIntyre,
D. E., A role for the melanocortin 4 receptor in sexual function. Proc Natl Acad Sci U S A 2002,
99 (17), 11381-11386.
95. Contreras, P. C.; Takemori, A. E., Antagonism of morphine-induced analgesia, tolerance
and dependence by alpha-melanocyte-stimulating hormone. Journal of Pharmacology and
Experimental Therapeutics 1984, 229 (1), 21.
96. Alvaro, J. D.; Tatro, J. B.; Quillan, J. M.; Fogliano, M.; Eisenhard, M.; Lerner, M. R.;
Nestler, E. J.; Duman, R. S., Morphine down-regulates melanocortin-4 receptor expression in brain
regions that mediate opiate addiction. Molecular Pharmacology 1996, 50 (3), 583.
97. Noon, L. A.; Franklin, J. M.; King, P. J.; Goulding, N. J.; Hunyady, L.; Clark, A. J., Failed
export of the adrenocorticotrophin receptor from the endoplasmic reticulum in non-adrenal cells:
evidence in support of a requirement for a specific adrenal accessory factor. Journal of
Endocrinology 2002, 174 (1), 17-25.
98. Chan, L. F.; Webb, T. R.; Chung, T.-T.; Meimaridou, E.; Cooray, S. N.; Guasti, L.;
Chapple, J. P.; Egertová, M.; Elphick, M. R.; Cheetham, M. E.; Metherell, L. A.; Clark, A. J. L.,
MRAP and MRAP2 are bidirectional regulators of the melanocortin receptor family. Proceedings
of the National Academy of Sciences 2009, 106 (15), 6146.
99. Sebag, J. A.; Zhang, C.; Hinkle, P. M.; Bradshaw, A. M.; Cone, R. D., Developmental
Control of the Melanocortin-4 Receptor by MRAP2 Proteins in Zebrafish. Science 2013, 341
(6143), 278.
100. Asai, M.; Ramachandrappa, S.; Joachim, M.; Shen, Y.; Zhang, R.; Nuthalapati, N.;
Ramanathan, V.; Strochlic, D. E.; Ferket, P.; Linhart, K.; Ho, C.; Novoselova, T. V.; Garg, S.;
Ridderstråle, M.; Marcus, C.; Hirschhorn, J. N.; Keogh, J. M.; O’Rahilly, S.; Chan, L. F.; Clark,
A. J.; Farooqi, I. S.; Majzoub, J. A., Loss of Function of the Melanocortin 2 Receptor Accessory
Protein 2 Is Associated with Mammalian Obesity. Science 2013, 341 (6143), 275.
101. Ghamari-Langroudi, M.; Digby, G. J.; Sebag, J. A.; Millhauser, G. L.; Palomino, R.;
Matthews, R.; Gillyard, T.; Panaro, B. L.; Tough, I. R.; Cox, H. M.; Denton, J. S.; Cone, R. D., G-
protein-independent coupling of MC4R to Kir7.1 in hypothalamic neurons. Nature 2015, 520
(7545), 94-98.
102. Fani, L.; Bak, S.; Delhanty, P.; van Rossum, E. F. C.; van den Akker, E. L. T., The
melanocortin-4 receptor as target for obesity treatment: a systematic review of emerging
pharmacological therapeutic options. International Journal of Obesity 2014, 38 (2), 163-169.
103. Hess, S.; Linde, Y.; Ovadia, O.; Safrai, E.; Shalev, D. E.; Swed, A.; Halbfinger, E.; Lapidot,
T.; Winkler, I.; Gabinet, Y.; Faier, A.; Yarden, D.; Xiang, Z.; Portillo, F. P.; Haskell-Luevano, C.;
Gilon, C.; Hoffman, A., Backbone Cyclic Peptidomimetic Melanocortin-4 Receptor Agonist as a
Novel Orally Administrated Drug Lead for Treating Obesity. J Med Chem 2008, 51 (4), 1026-
1034.
104. Heyder, N.; Kleinau, G.; Szczepek, M.; Kwiatkowski, D.; Speck, D.; Soletto, L.; Cerdá-
Reverter, J. M.; Krude, H.; Kühnen, P.; Biebermann, H.; Scheerer, P., Signal Transduction and
Pathogenic Modifications at the Melanocortin-4 Receptor: A Structural Perspective. Frontiers in
Endocrinology 2019, 10, 515.
105. Saleh, N.; Kleinau, G.; Heyder, N.; Clark, T.; Hildebrand, P. W.; Scheerer, P., Binding,
Thermodynamics, and Selectivity of a Non-peptide Antagonist to the Melanocortin-4 Receptor.
Front Pharmacol 2018, 560. https://doi.org/10.3389/fphar.2018.00560 (accessed 2018).
106. Katritch, V.; Fenalti, G.; Abola, E. E.; Roth, B. L.; Cherezov, V.; Stevens, R. C., Allosteric
sodium in class A GPCR signaling. Trends Biochem Sci 2014, 39 (5), 233-244.
107. Germann, T. C.; Kadau, K. A. I., TRILLION-ATOM MOLECULAR DYNAMICS
BECOMES A REALITY. International Journal of Modern Physics C 2008, 19 (09), 1315-1319.
108. Jones, L. L.; Kelly, R. M., Visualization: The Key to Understanding Chemistry Concepts.
In Sputnik to Smartphones: A Half-Century of Chemistry Education, American Chemical Society:
2015; Vol. 1208, pp 121-140.
109. Haas, J. K. A History of the Unity Game Engine; 2014.
110. Pauwels, P.; De Meyer, R.; Van Campenhout, J., Visualisation of semantic architectural
information within a game engine environment. Proceedings of the 10th International conference
on Construction Applications of Virtual Reality 2010, 219-228.
111. Cioroaica, E.; Kuhn, T.; Bauer, T. In Prototyping Automotive Smart Ecosystems, 2018 48th
Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops
(DSN-W), 25-28 June 2018; 2018; pp 255-262.
112. Mathur, A. S. In Low cost virtual reality for medical training, 2015 IEEE Virtual Reality
(VR), 23-27 March 2015; 2015; pp 345-346.
113. Zubek, R., Getting Started with Unity and Entity/Component Model. Northwestern
University: 2016.
114. McGhee, J., 3-D visualization and animation technologies in anatomical imaging. Journal
of Anatomy 2010, 216 (2), 264-270.
115. Lv, Z.; Tek, A.; Da Silva, F.; Empereur-mot, C.; Chavent, M.; Baaden, M., Game on,
science - how video game technology may help biologists tackle visualization challenges. PloS
one 2013, 8 (3), e57990-e57990.
116. Gorbanev, I.; Agudelo-Londoño, S.; González, R. A.; Cortes, A.; Pomares, A.; Delgadillo,
V.; Yepes, F. J.; Muñoz, Ó., A systematic review of serious games in medical education: quality
of evidence and pedagogical strategy. Med Educ Online 2018, 23 (1), 1438718-1438718.
117. Cooper, S.; Khatib, F.; Treuille, A.; Barbero, J.; Lee, J.; Beenen, M.; Leaver-Fay, A.;
Baker, D.; Popović, Z.; Foldit Players, Predicting protein structures with a multiplayer online game.
Nature 2010, 466 (7307), 756-760.
118. Johnson, G. T.; Autin, L.; Goodsell, D. S.; Sanner, M. F.; Olson, A. J., ePMV embeds
molecular modeling into professional animation software environments. Structure 2011, 19 (3),
293-303.
119. McGill, G., Molecular Movies… Coming to a Lecture near You. Cell 2008, 133 (7), 1127-
1132.
120. Granger, A.; Kushner, J. A., Cellular origins of beta-cell regeneration: a legacy view of
historical controversies. J Intern Med 2009, 266 (4), 325-338.
121. Blodgett, D. M.; Cura, A. J.; Harlan, D. M., The pancreatic β-cell transcriptome and
integrated-omics. Curr Opin Endocrinol Diabetes Obes 2014, 21 (2), 83-88.
122. Ahlgren, U.; Kostromina, E., Imaging the Pancreatic Beta Cell. IntechOpen 2010.
123. Bensu, K., Overview of Systems Biology and Omics Technologies. Current Medicinal
Chemistry 2016, 23 (37), 4221-4230.
124. Goodsell, D. S.; Franzen, M. A.; Herman, T., From Atoms to Cells: Using Mesoscale
Landscapes to Construct Visual Narratives. Journal of Molecular Biology 2018, 430 (21), 3954-
3968.
125. Graham, K. L.; Fynch, S.; Papas, E. G.; Tan, C.; Kay, T. W. H.; Thomas, H. E., Isolation
and Culture of the Islets of Langerhans from Mouse Pancreas. Bio-protocol 2016, 6 (12), e1840.
126. Scharfmann, R.; Staels, W.; Albagli, O., The supply chain of human pancreatic β cell lines.
J Clin Invest 2019, 129 (9), 3511-3520.
127. Kaddis, J. S.; Olack, B. J.; Sowinski, J.; Cravens, J.; Contreras, J. L.; Niland, J. C., Human
pancreatic islets and diabetes research. JAMA 2009, 301 (15), 1580-1587.
128. King, A. J. F., The use of animal models in diabetes research. Br J Pharmacol 2012, 166
(3), 877-894.
129. Skelin, M.; Rupnik, M.; Cencic, A., Pancreatic beta cell lines and their applications in diabetes
mellitus research. ALTEX 2010, 27 (2), 105-13.
130. McCluskey, J. T.; Hamid, M.; Guo-Parke, H.; McClenaghan, N. H.; Gomis, R.; Flatt, P.
R., Development and functional characterization of insulin-releasing human pancreatic beta cell
lines produced by electrofusion. J Biol Chem 2011, 286 (25), 21982-92.
131. Asfari, M.; Janjic, D.; Meda, P.; Li, G.; Halban, P. A.; Wollheim, C. B., Establishment of
2-mercaptoethanol-dependent differentiated insulin-secreting cell lines. Endocrinology 1992, 130
(1), 167-178.
132. Hughes, A. J.; Spelke, D. P.; Xu, Z.; Kang, C.-C.; Schaffer, D. V.; Herr, A. E., Single-cell
western blotting. Nat Methods 2014, 11 (7), 749-755.
133. Budnik, B.; Levy, E.; Slavov, N., Mass-spectrometry of single mammalian cells quantifies
proteome heterogeneity during cell differentiation. bioRxiv 2017, 102681. https://doi.org/10.1101/102681
134. Su, Y.; Shi, Q.; Wei, W., Single cell proteomics in biomedicine: High-dimensional data
acquisition, visualization, and analysis. Proteomics 2017, 17 (3-4), 10.1002/pmic.201600267.
135. Gooding, J. R.; Jensen, M. V.; Newgard, C. B., Metabolomics applied to the pancreatic
islet. Arch Biochem Biophys 2016, 589, 120-130.
136. Pearson, G. L.; Mellett, N.; Chu, K. Y.; Boslem, E.; Meikle, P. J.; Biden, T. J., A
comprehensive lipidomic screen of pancreatic β-cells using mass spectroscopy defines novel
features of glucose-stimulated turnover of neutral lipids, sphingolipids and plasmalogens. Mol
Metab 2016, 5 (6), 404-414.
137. Huang, M.; Joseph, J. W., Metabolomic analysis of pancreatic β-cell insulin release in
response to glucose. Islets 2012, 4 (3), 210-222.
138. Roomp, K.; Kristinsson, H.; Schvartz, D.; Ubhayasekera, K.; Sargsyan, E.; Manukyan, L.;
Chowdhury, A.; Manell, H.; Satagopam, V.; Groebe, K.; Schneider, R.; Bergquist, J.; Sanchez, J.-
C.; Bergsten, P., Combined lipidomic and proteomic analysis of isolated human islets exposed to
palmitate reveals time-dependent changes in insulin secretion and lipid metabolism. PloS one
2017, 12 (4), e0176391-e0176391.
139. Wallace, M.; Whelan, H.; Brennan, L., Metabolomic analysis of pancreatic beta cells
following exposure to high glucose. Biochimica et Biophysica Acta (BBA) - General Subjects
2013, 1830 (3), 2583-2590.
140. Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.;
Shindyalov, I. N.; Bourne, P. E., The Protein Data Bank. Nucleic Acids Res 2000, 28 (1), 235-242.
141. Kiefer, F.; Arnold, K.; Künzli, M.; Bordoli, L.; Schwede, T., The SWISS-MODEL
Repository and associated resources. Nucleic Acids Res 2009, 37 (Database issue), D387-D392.
142. Benkert, P.; Tosatto, S. C. E.; Schomburg, D., QMEAN: A comprehensive scoring function
for model quality assessment. Proteins: Structure, Function, and Bioinformatics 2008, 71 (1), 261-
277.
143. Oates, M. E.; Romero, P.; Ishida, T.; Ghalwash, M.; Mizianty, M. J.; Xue, B.; Dosztányi,
Z.; Uversky, V. N.; Obradovic, Z.; Kurgan, L.; Dunker, A. K.; Gough, J., D²P²: database of
disordered protein predictions. Nucleic Acids Res 2013, 41 (Database issue), D508-D516.
144. Im, W.; Liang, J.; Olson, A.; Zhou, H.-X.; Vajda, S.; Vakser, I. A., Challenges in structural
approaches to cell modeling. Journal of molecular biology 2016, 428 (15), 2943-2964.
145. Blodgett, D. M.; Nowosielska, A.; Afik, S.; Pechhold, S.; Cura, A. J.; Kennedy, N. J.; Kim,
S.; Kucukural, A.; Davis, R. J.; Kent, S. C.; Greiner, D. L.; Garber, M. G.; Harlan, D. M.; diIorio,
P., Novel Observations From Next-Generation RNA Sequencing of Highly Purified Human Adult
and Fetal Islet Cell Subsets. Diabetes 2015, 64 (9), 3172-3181.
146. Li, J.; Klughammer, J.; Farlik, M.; Penz, T.; Spittler, A.; Barbieux, C.; Berishvili, E.; Bock,
C.; Kubicek, S., Single-cell transcriptomes reveal characteristic features of human pancreatic islet
cell types. EMBO Rep 2016, 17 (2), 178-187.
147. Nica, A. C.; Ongen, H.; Irminger, J.-C.; Bosco, D.; Berney, T.; Antonarakis, S. E.; Halban,
P. A.; Dermitzakis, E. T., Cell-type, allelic, and genetic signatures in the human pancreatic beta
cell transcriptome. Genome Res 2013, 23 (9), 1554-1562.
148. The UniProt Consortium, UniProt: the universal protein knowledgebase. Nucleic Acids Res
2017, 45 (D1), D158-D169.
149. Brackeva, B.; Kramer, G.; Vissers, J. P. C.; Martens, G. A., Quantitative proteomics of rat
and human pancreatic beta cells. Data Brief 2015, 3, 234-239.
150. Mi, H.; Muruganujan, A.; Ebert, D.; Huang, X.; Thomas, P. D., PANTHER version 14:
more genomes, a new PANTHER GO-slim and improvements in enrichment analysis tools.
Nucleic Acids Res 2018, 47 (D1), D419-D426.
151. Pfeifer, C. R.; Shomorony, A.; Aronova, M. A.; Zhang, G.; Cai, T.; Xu, H.; Notkins, A. L.;
Leapman, R. D., Quantitative analysis of mouse pancreatic islet architecture by serial block-face
SEM. J Struct Biol 2015, 189 (1), 44-52.
152. Danev, R.; Baumeister, W., Cryo-EM single particle analysis with the Volta phase plate.
Elife 2016, 5, e13046.
153. Rigort, A.; Bäuerlein, F. J. B.; Villa, E.; Eibauer, M.; Laugks, T.; Baumeister, W.; Plitzko,
J. M., Focused ion beam micromachining of eukaryotic cells for cryoelectron tomography. Proc
Natl Acad Sci U S A 2012, 109 (12), 4449-4454.
154. Beck, M.; Malmström, J. A.; Lange, V.; Schmidt, A.; Deutsch, E. W.; Aebersold, R., Visual
proteomics of the human pathogen Leptospira interrogans. Nat Methods 2009, 6 (11), 817-823.
155. Nickell, S.; Kofler, C.; Leis, A. P.; Baumeister, W., A visual approach to proteomics.
Nature Reviews Molecular Cell Biology 2006, 7 (3), 225-230.
156. Xu, M.; Tocheva, E. I.; Chang, Y.-W.; Jensen, G. J.; Alber, F., De novo visual proteomics
in single cells through pattern mining. arXiv preprint arXiv:1512.09347 2015.
157. Xu, M.; Beck, M.; Alber, F., Template-free detection of macromolecular complexes in cryo
electron tomograms. Bioinformatics (Oxford, England) 2011, 27 (13), i69-i76.
158. Xu, M.; Moresco, J. J.; Chang, M.; Mukim, A.; Smith, D.; Diedrich, J. K.; Yates, J. R., III;
Jones, K. A., SHMT2 and the BRCC36/BRISC deubiquitinase regulate HIV-1 Tat K63-
ubiquitylation and destruction by autophagy. PLOS Pathogens 2018, 14 (5), e1007071.
159. Yang, F.; Shen, Y.; Camp, D. G., 2nd; Smith, R. D., High-pH reversed-phase
chromatography with fraction concatenation for 2D proteomic analysis. Expert Rev Proteomics
2012, 9 (2), 129-134.
160. Lavallée-Adam, M.; Park, S. K. R.; Martínez-Bartolomé, S.; He, L.; Yates, J. R., 3rd, From
raw data to biological discoveries: a computational analysis pipeline for mass spectrometry-based
proteomics. J Am Soc Mass Spectrom 2015, 26 (11), 1820-1826.
161. Reiter, L.; Claassen, M.; Schrimpf, S. P.; Jovanovic, M.; Schmidt, A.; Buhmann, J. M.;
Hengartner, M. O.; Aebersold, R., Protein identification false discovery rates for very large
proteomics data sets generated by tandem mass spectrometry. Mol Cell Proteomics 2009, 8 (11),
2405-2417.
162. Kim, H.; Lee, S.; Park, H., Target-small decoy search strategy for false discovery rate
estimation. BMC Bioinformatics 2019, 20 (1), 438.
163. Mallick, P.; Kuster, B., Proteomics: a pragmatic perspective. Nature Biotechnology 2010,
28 (7), 695-709.
164. Kamburov, A.; Wierling, C.; Lehrach, H.; Herwig, R., ConsensusPathDB--a database for
integrating human functional interaction networks. Nucleic Acids Res 2009, 37 (Database issue),
D623-D628.
165. Dorajoo, R.; Ali, Y.; Tay, V. S. Y.; Kang, J.; Samydurai, S.; Liu, J.; Boehm, B. O., Single-
cell transcriptomics of East-Asian pancreatic islets cells. Sci Rep 2017, 7 (1), 5024-5024.
166. Ghazalpour, A.; Bennett, B.; Petyuk, V. A.; Orozco, L.; Hagopian, R.; Mungrue, I. N.;
Farber, C. R.; Sinsheimer, J.; Kang, H. M.; Furlotte, N.; Park, C. C.; Wen, P.-Z.; Brewer, H.;
Weitz, K.; Camp, D. G., II; Pan, C.; Yordanova, R.; Neuhaus, I.; Tilford, C.; Siemers, N.;
Gargalovic, P.; Eskin, E.; Kirchgessner, T.; Smith, D. J.; Smith, R. D.; Lusis, A. J., Comparative
Analysis of Proteome and Transcriptome Variation in Mouse. PLOS Genetics 2011, 7 (6),
e1001393.
167. Ghazalpour, A.; Bennett, B.; Petyuk, V. A.; Orozco, L.; Hagopian, R.; Mungrue, I. N.;
Farber, C. R.; Sinsheimer, J.; Kang, H. M.; Furlotte, N.; Park, C. C.; Wen, P.-Z.; Brewer, H.;
Weitz, K.; Camp, D. G., 2nd; Pan, C.; Yordanova, R.; Neuhaus, I.; Tilford, C.; Siemers, N.;
Gargalovic, P.; Eskin, E.; Kirchgessner, T.; Smith, D. J.; Smith, R. D.; Lusis, A. J., Comparative
analysis of proteome and transcriptome variation in mouse. PLoS Genet 2011, 7 (6), e1001393-
e1001393.
168. Maier, T.; Güell, M.; Serrano, L., Correlation of mRNA and protein in complex biological
samples. FEBS Letters 2009, 583 (24), 3966-3973.
169. Le Roch, K. G.; Johnson, J. R.; Florens, L.; Zhou, Y.; Santrosyan, A.; Grainger, M.; Yan,
S. F.; Williamson, K. C.; Holder, A. A.; Carucci, D. J.; Yates, J. R., 3rd; Winzeler, E. A., Global
analysis of transcript and protein levels across the Plasmodium falciparum life cycle. Genome
research 2004, 14 (11), 2308-2318.
170. Plaisier, S. B.; Taschereau, R.; Wong, J. A.; Graeber, T. G., Rank-rank hypergeometric
overlap: identification of statistically significant overlap between gene-expression signatures.
Nucleic Acids Res 2010, 38 (17), e169-e169.
171. Carzaniga, R.; Domart, M.-C.; Collinson, L. M.; Duke, E., Cryo-soft X-ray tomography: a
journey into the world of the native-state cell. Protoplasma 2014, 251 (2), 449-458.
172. Do, M.; Isaacson, S. A.; McDermott, G.; Le Gros, M. A.; Larabell, C. A., Imaging and
characterizing cells using tomography. Arch Biochem Biophys 2015, 581, 111-121.
173. Ekman, A. A.; Chen, J.-H.; Guo, J.; McDermott, G.; Le Gros, M. A.; Larabell, C. A.,
Mesoscale imaging with cryo-light and X-rays: Larger than molecular machines, smaller than a
cell. Biol Cell 2017, 109 (1), 24-38.
174. Weiß, D.; Schneider, G.; Vogt, S.; Guttmann, P.; Niemann, B.; Rudolph, D.; Schmahl, G.,
Tomographic imaging of biological specimens with the cryo transmission X-ray microscope.
Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers,
Detectors and Associated Equipment 2001, 467-468, 1308-1311.
175. Andrzejewski, D.; Brown, M. L.; Ungerleider, N.; Burnside, A.; Schneyer, A. L., Activins
A and B Regulate Fate-Determining Gene Expression in Islet Cell Lines and Islet Cells From Male
Mice. Endocrinology 2015, 156 (7), 2440-2450.
176. Huang, Q.; Merriman, C.; Zhang, H.; Fu, D., Coupling of Insulin Secretion and Display of
a Granule-resident Zinc Transporter ZnT8 on the Surface of Pancreatic Beta Cells. The Journal of
biological chemistry 2017, 292 (10), 4034-4043.
177. He, L.; Diedrich, J.; Chu, Y.-Y.; Yates, J. R., 3rd, Extracting Accurate Precursor
Information for Tandem Mass Spectra by RawConverter. Anal Chem 2015, 87 (22), 11361-11367.
178. Xu, T.; Park, S. K.; Venable, J. D.; Wohlschlegel, J. A.; Diedrich, J. K.; Cociorva, D.; Lu,
B.; Liao, L.; Hewel, J.; Han, X.; Wong, C. C. L.; Fonslow, B.; Delahunty, C.; Gao, Y.; Shah, H.;
Yates, J. R., 3rd, ProLuCID: An improved SEQUEST-like algorithm with enhanced sensitivity
and specificity. J Proteomics 2015, 129, 16-24.
179. Tabb, D. L.; McDonald, W. H.; Yates, J. R., 3rd, DTASelect and Contrast: tools for
assembling and comparing protein identifications from shotgun proteomics. J Proteome Res 2002,
1 (1), 21-26.
180. Peng, J.; Elias, J. E.; Thoreen, C. C.; Licklider, L. J.; Gygi, S. P., Evaluation of
Multidimensional Chromatography Coupled with Tandem Mass Spectrometry (LC/LC−MS/MS)
for Large-Scale Protein Analysis: The Yeast Proteome. J Proteome Res 2003, 2 (1), 43-50.
181. Le Gros, M. A.; Clowney, E. J.; Magklara, A.; Yen, A.; Markenscoff-Papadimitriou, E.;
Colquitt, B.; Myllys, M.; Kellis, M.; Lomvardas, S.; Larabell, C. A., Soft X-Ray Tomography
Reveals Gradual Chromatin Compaction and Reorganization during Neurogenesis In Vivo. Cell
Rep 2016, 17 (8), 2125-2136.
182. Clowney, E. J.; LeGros, M. A.; Mosley, C. P.; Clowney, F. G.; Markenscoff-Papadimitriou,
E. C.; Myllys, M.; Barnea, G.; Larabell, C. A.; Lomvardas, S., Nuclear aggregation
of olfactory receptor genes governs their monogenic expression. Cell 2012, 151 (4), 724-737.
183. Le Gros, M. A.; McDermott, G.; Larabell, C. A., X-ray tomography of whole cells. Curr
Opin Struct Biol 2005, 15 (5), 593-600.
184. Parkinson, D. Y.; Knoechel, C.; Yang, C.; Larabell, C. A.; Le Gros, M. A., Automatic
alignment and reconstruction of images for soft X-ray tomography. J Struct Biol 2012, 177 (2),
259-266.
185. Tsai, A. G.; Williamson, D. F.; Glick, H. A., Direct medical cost of overweight and obesity
in the USA: a quantitative systematic review. Obes Rev 2011, 12 (1), 50-61.
186. Sriram, K.; Insel, P. A., GPCRs as targets for approved drugs: How many targets and how
many drugs? Molecular Pharmacology 2018, mol.117.111062.
187. Newhouse, K. E., Goodman and Gilman's The Pharmacological Basis of Therapeutics.
Yale J Biol Med 1986, 59 (1), 71-72.
188. Venkatakrishnan, A. J.; Deupi, X.; Lebon, G.; Heydenreich, F. M.; Flock, T.; Miljus, T.;
Balaji, S.; Bouvier, M.; Veprintsev, D. B.; Tate, C. G.; Schertler, G. F. X.; Babu, M. M., Diverse
activation pathways in class A GPCRs converge near the G-protein-coupling region. Nature 2016,
536 (7617), 484-487.
189. Heyder, N.; Kleinau, G.; Szczepek, M.; Kwiatkowski, D.; Speck, D.; Soletto, L.; Cerdá-
Reverter, J. M.; Krude, H.; Kühnen, P.; Biebermann, H.; Scheerer, P., Signal Transduction and
Pathogenic Modifications at the Melanocortin-4 Receptor: A Structural Perspective. Frontiers in
endocrinology 2019, 10, 515-515.
190. Milić, D.; Veprintsev, D. B., Large-scale production and protein engineering of G protein-
coupled receptors for structural studies. Front Pharmacol 2015, 6, 66-66.
191. Chun, E.; Thompson, A. A.; Liu, W.; Roth, C. B.; Griffith, M. T.; Katritch, V.; Kunken,
J.; Xu, F.; Cherezov, V.; Hanson, M. A.; Stevens, R. C., Fusion partner toolchest for the
stabilization and crystallization of G protein-coupled receptors. Structure 2012, 20 (6), 967-976.
192. Munk, C.; Isberg, V.; Mordalski, S.; Harpsøe, K.; Rataj, K.; Hauser, A. S.; Kolb, P.;
Bojarski, A. J.; Vriend, G.; Gloriam, D. E., GPCRdb: the G protein-coupled receptor database - an
introduction. Br J Pharmacol 2016, 173 (14), 2195-2207.
193. Heydenreich, F. M.; Vuckovic, Z.; Matkovic, M.; Veprintsev, D. B., Stabilization of G
protein-coupled receptors by point mutations. Front Pharmacol 2015, 6, 82-82.
194. White, K. L.; Eddy, M. T.; Gao, Z.-G.; Han, G. W.; Lian, T.; Deary, A.; Patel, N.; Jacobson,
K. A.; Katritch, V.; Stevens, R. C., Structural Connection between Activation Microswitch and
Allosteric Sodium Site in GPCR Signaling. Structure 2018, 26 (2), 259-269.e5.
195. Alexandrov, A. I.; Mileni, M.; Chien, E. Y. T.; Hanson, M. A.; Stevens, R. C., Microscale
Fluorescent Thermal Stability Assay for Membrane Proteins. Structure 2008, 16 (3), 351-359.
196. Scarff, C. A.; Fuller, M. J. G.; Thompson, R. F.; Iadanza, M. G., Variations on Negative
Stain Electron Microscopy Methods: Tools for Tackling Challenging Systems. J Vis Exp 2018,
(132), 57199.
197. Wacker, D.; Stevens, R. C.; Roth, B. L., How Ligands Illuminate GPCR Molecular
Pharmacology. Cell 2017, 170 (3), 414-427.
198. Jonkman, J.; Brown, C. M., Any Way You Slice It-A Comparison of Confocal Microscopy
Techniques. J Biomol Tech 2015, 26 (2), 54-65.
199. Milo, R.; Jorgensen, P.; Moran, U.; Weber, G.; Springer, M., BioNumbers--the database
of key numbers in molecular and cell biology. Nucleic Acids Res 2010, 38 (Database issue), D750-
D753.
200. Venter, J. C.; Adams, M. D.; Myers, E. W.; Li, P. W.; Mural, R. J.; Sutton, G. G.; Smith,
H. O.; Yandell, M.; Evans, C. A.; Holt, R. A.; Gocayne, J. D.; Amanatides, P.; Ballew, R. M.;
Huson, D. H.; Wortman, J. R.; Zhang, Q.; Kodira, C. D.; Zheng, X. H.; Chen, L.; Skupski, M.;
Subramanian, G.; Thomas, P. D.; Zhang, J.; Gabor Miklos, G. L.; Nelson, C.; Broder, S.; Clark,
A. G.; Nadeau, J.; McKusick, V. A.; Zinder, N.; Levine, A. J.; Roberts, R. J.; Simon, M.; Slayman,
C.; Hunkapiller, M.; Bolanos, R.; Delcher, A.; Dew, I.; Fasulo, D.; Flanigan, M.; Florea, L.;
Halpern, A.; Hannenhalli, S.; Kravitz, S.; Levy, S.; Mobarry, C.; Reinert, K.; Remington, K.; Abu-
Threideh, J.; Beasley, E.; Biddick, K.; Bonazzi, V.; Brandon, R.; Cargill, M.;
Chandramouliswaran, I.; Charlab, R.; Chaturvedi, K.; Deng, Z.; Francesco, V. D.; Dunn, P.;
Eilbeck, K.; Evangelista, C.; Gabrielian, A. E.; Gan, W.; Ge, W.; Gong, F.; Gu, Z.; Guan, P.;
Heiman, T. J.; Higgins, M. E.; Ji, R.-R.; Ke, Z.; Ketchum, K. A.; Lai, Z.; Lei, Y.; Li, Z.; Li, J.;
Liang, Y.; Lin, X.; Lu, F.; Merkulov, G. V.; Milshina, N.; Moore, H. M.; Naik, A. K.; Narayan,
V. A.; Neelam, B.; Nusskern, D.; Rusch, D. B.; Salzberg, S.; Shao, W.; Shue, B.; Sun, J.; Wang,
Z. Y.; Wang, A.; Wang, X.; Wang, J.; Wei, M.-H.; Wides, R.; Xiao, C.; Yan, C.; Yao, A.; Ye, J.;
Zhan, M.; Zhang, W.; Zhang, H.; Zhao, Q.; Zheng, L.; Zhong, F.; Zhong, W.; Zhu, S. C.; Zhao,
S.; Gilbert, D.; Baumhueter, S.; Spier, G.; Carter, C.; Cravchik, A.; Woodage, T.; Ali, F.; An, H.;
Awe, A.; Baldwin, D.; Baden, H.; Barnstead, M.; Barrow, I.; Beeson, K.; Busam, D.; Carver, A.;
Center, A.; Cheng, M. L.; Curry, L.; Danaher, S.; Davenport, L.; Desilets, R.; Dietz, S.; Dodson,
K.; Doup, L.; Ferriera, S.; Garg, N.; Gluecksmann, A.; Hart, B.; Haynes, J.; Haynes, C.; Heiner,
C.; Hladun, S.; Hostin, D.; Houck, J.; Howland, T.; Ibegwam, C.; Johnson, J.; Kalush, F.; Kline,
L.; Koduru, S.; Love, A.; Mann, F.; May, D.; McCawley, S.; McIntosh, T.; McMullen, I.; Moy,
M.; Moy, L.; Murphy, B.; Nelson, K.; Pfannkoch, C.; Pratts, E.; Puri, V.; Qureshi, H.; Reardon,
M.; Rodriguez, R.; Rogers, Y.-H.; Romblad, D.; Ruhfel, B.; Scott, R.; Sitter, C.; Smallwood, M.;
Stewart, E.; Strong, R.; Suh, E.; Thomas, R.; Tint, N. N.; Tse, S.; Vech, C.; Wang, G.; Wetter, J.;
Williams, S.; Williams, M.; Windsor, S.; Winn-Deen, E.; Wolfe, K.; Zaveri, J.; Zaveri, K.; Abril,
J. F.; Guigó, R.; Campbell, M. J.; Sjolander, K. V.; Karlak, B.; Kejariwal, A.; Mi, H.; Lazareva,
B.; Hatton, T.; Narechania, A.; Diemer, K.; Muruganujan, A.; Guo, N.; Sato, S.; Bafna, V.; Istrail,
S.; Lippert, R.; Schwartz, R.; Walenz, B.; Yooseph, S.; Allen, D.; Basu, A.; Baxendale, J.; Blick,
L.; Caminha, M.; Carnes-Stine, J.; Caulk, P.; Chiang, Y.-H.; Coyne, M.; Dahlke, C.; Mays, A. D.;
Dombroski, M.; Donnelly, M.; Ely, D.; Esparham, S.; Fosler, C.; Gire, H.; Glanowski, S.; Glasser,
K.; Glodek, A.; Gorokhov, M.; Graham, K.; Gropman, B.; Harris, M.; Heil, J.; Henderson, S.;
Hoover, J.; Jennings, D.; Jordan, C.; Jordan, J.; Kasha, J.; Kagan, L.; Kraft, C.; Levitsky, A.;
Lewis, M.; Liu, X.; Lopez, J.; Ma, D.; Majoros, W.; McDaniel, J.; Murphy, S.; Newman, M.;
Nguyen, T.; Nguyen, N.; Nodell, M.; Pan, S.; Peck, J.; Peterson, M.; Rowe, W.; Sanders, R.; Scott,
J.; Simpson, M.; Smith, T.; Sprague, A.; Stockwell, T.; Turner, R.; Venter, E.; Wang, M.; Wen,
M.; Wu, D.; Wu, M.; Xia, A.; Zandieh, A.; Zhu, X., The Sequence of the Human Genome. Science
2001, 291 (5507), 1304.
201. Lazo de la Vega-Monroy, M. L.; Fernandez-Mejia, C., Beta-Cell Function and Failure in Type 1 Diabetes.
IntechOpen 2010, 93-116.
202. Underwood, C. R.; Garibay, P.; Knudsen, L. B.; Hastrup, S.; Peters, G. H.; Rudolph, R.;
Reedtz-Runge, S., Crystal structure of glucagon-like peptide-1 in complex with the extracellular
domain of the glucagon-like peptide-1 receptor. The Journal of biological chemistry 2010, 285
(1), 723-730.
203. Zhang, Y.; Sun, B.; Feng, D.; Hu, H.; Chu, M.; Qu, Q.; Tarrasch, J. T.; Li, S.; Sun Kobilka,
T.; Kobilka, B. K.; Skiniotis, G., Cryo-EM structure of the activated GLP-1 receptor in complex
with a G protein. Nature 2017, 546 (7657), 248-253.
204. Pettersen, E. F.; Goddard, T. D.; Huang, C. C.; Couch, G. S.; Greenblatt, D. M.; Meng, E.
C.; Ferrin, T. E., UCSF Chimera—A visualization system for exploratory research and analysis. J
Comput Chem 2004, 25 (13), 1605-1612.
205. Bennett, L. M.; Gadlin, H., Collaboration and team science: from theory to practice. J
Investig Med 2012, 60 (5), 768-775.
206. Hsiehchen, D.; Espinoza, M.; Hsieh, A., Multinational teams and diseconomies of scale in
collaborative research. Science Advances 2015, 1 (8), e1500211.
207. Big Science: The 10 Most Ambitious Experiments in the Universe Today.
https://www.popsci.com/science/article/2011-07/supersized-10-most-awe-inspiring-projects-
universe/.
208. Science Communication. https://journals.sagepub.com/home/scx.
209. The Nielsen Company. The Nielsen Total Audience Report: Q1 2018; 2018.
210. Funk, C.; Gottfried, J.; Mitchell, A. A majority of Americans rely on general outlets for
science news but more say specialty sources get the facts right about science; Pew Research
Center, 2017.
211. Hitlin, P.; Olmstead, K. Science-related Facebook pages draw millions of followers but
feature more posts with ‘news you can use’ or ads than scientific discoveries; Pew Research
Center, 2018.
212. Funk, C.; Gottfried, J.; Mitchell, A. Most Americans see science-related entertainment
shows and movies in either a neutral or positive light; Pew Research Center, 2017.
213. Lemaine, A., 'CSI' spurs campus forensics scene. Union-Tribune 2004.
214. Wright, J. STEM Majors Are Accelerating in Every State, Just as Humanities Degrees Are
Declining; Emsi, 2017.
215. Lesen, A. E.; Rogan, A.; Blum, M. J., Science Communication Through Art: Objectives,
Challenges, and Outcomes. Trends in Ecology & Evolution 2016, 31 (9), 657-660.
216. Root-Bernstein, R.; Allen, L.; Beach, L.; Bhadula, R.; Fast, J.; Hosey, C.; Kremkow, B.;
Lapp, J.; Lonc, K.; Pawelec, K.; Podufaly, A.; Russ, C.; Tennant, L.; Vrtis, E.; Weinlander, S.,
Arts Foster Scientific Success: Avocation of Nobel, National Academy, Royal Society, and Sigma
Xi Members. Journal of Psychology of Science and Technology 2008, 1 (2), 51-63.
217. Root-Bernstein, R. S.; Bernstein, M.; Garnier, H., Correlations between avocations,
scientific style, work habits, and professional impact of scientists. Creativity Research Journal
1995, 8 (2), 115-137.
218. Rogers, N. To Win a Nobel Prize in Science … Make Art?
https://www.insidescience.org/news/win-nobel-prize-science-make-art.
219. Grzyb, K.; Snyder, W.; Field, K. G., Learning to Write Like a Scientist: A Writing-
Intensive Course for Microbiology/Health Science Students. J Microbiol Biol Educ 2018, 19 (1),
19.1.10.
220. Russell, D. R., Writing in the Academic Disciplines: A Curricular History. Southern
Illinois University Press: 2002.
221. Brownell, S. E.; Price, J. V.; Steinman, L., Science Communication to the General Public:
Why We Need to Teach Undergraduate and Graduate Students this Skill as Part of Their Formal
Scientific Training. J Undergrad Neurosci Educ 2013, 12 (1), E6-E10.
222. Ibrahim, A. M.; Bradley, S. M., Adoption of Visual Abstracts at Circulation
CQO. Circulation: Cardiovascular Quality and Outcomes 2017, 10 (3), e003684.
223. Ibrahim, A. M.; Lillemoe, K. D.; Klingensmith, M. E.; Dimick, J. B., Visual Abstracts to
Disseminate Research on Social Media: A Prospective, Case-control Crossover Study. Annals of
Surgery 2017, 266 (6).
224. Friendly, M., A Brief History of Data Visualization. Handbook of Computational
Statistics: Data Visualization 2006, 1-43.
225. Rougier, N. P.; Droettboom, M.; Bourne, P. E., Ten Simple Rules for Better Figures. PLOS
Computational Biology 2014, 10 (9), e1003833.
226. Shepard, A. Sales Rank Express.
http://www.salesrankexpress.com/sre/8ce744fb3352c215d5e2ff1c4544df1f5d6edd.
227. Funk, C.; Hefferon, M. Most Americans have positive image of research scientists, but
fewer see them as good communicators; Pew Research Center, 2019.
228. How Scientists Engage the Public; Pew Research Center, 2015.
229. Collins, K.; Shiffman, D.; Rock, J., How Are Scientists Using Social Media in the
Workplace? PLOS ONE 2016, 11 (10), e0162680.
230. Passion in Science Awards. https://www.neb.com/about-neb/passion-in-science-
awards#tabselect3.
Abstract
This thesis explores aspects of biological modeling and data visualization as well as biochemical experiments toward generating information regarding the components and infrastructure of the cell. Experimental results focus on protein structure determination, cellular proteomics, transcriptomics, fluorescence and X-ray based cell imaging, along with computational work to model and visualize biochemical components at the atomic, organelle, and cellular scales. The biological focus centers on aspects of metabolic regulation, including the pancreatic beta cell (PBC), a highly-differentiated cell type in the pancreas responsible for producing insulin
Asset Metadata
Creator
McClary, Kyle Mathew (author)
Core Title
Modeling biochemical components and systems through data integration and digital arts approaches
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Chemistry
Publication Date
05/06/2020
Defense Date
12/02/2019
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
art science,artsci,biochemistry,cell modeling,cinematic arts,convergent bioscience,data integration,digital arts,melanocortin 4 receptor,microscopy,OAI-PMH Harvest,omics,pancreatic beta cell,point cloud rendering,science communication,science media,science visualization,structural biology,systems biology,virtual reality
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Stevens, Raymond Charles (committee chair), Cherezov, Vadim (committee member), Katritch, Vsevolod (committee member)
Creator Email
kmcclary@usc.edu,kmcclary88@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c89-298864
Unique identifier
UC11665825
Identifier
etd-McClaryKyl-8431.pdf (filename),usctheses-c89-298864 (legacy record id)
Legacy Identifier
etd-McClaryKyl-8431.pdf
Dmrecord
298864
Document Type
Dissertation
Rights
McClary, Kyle Mathew
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA