MOLECULAR IMAGING DATA GRID (MIDG) FOR MULTI-SITE SMALL
ANIMAL IMAGING RESEARCH BASED ON OGSA AND IHE XDS-i
by
Jasper Chung Lee
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(BIOMEDICAL ENGINEERING)
May 2011
Copyright 2011 Jasper Chung Lee
DEDICATION
I dedicate this dissertation work to my mother, wife, twin brother, and the Lord for their
guidance, strength, and encouragement.
ACKNOWLEDGMENTS
This work has been done with the partial contribution and collaboration of the following
colleagues and friends:
Prof. HK Huang Ph.D Advisor
Prof. Brent Liu Ph.D Advisor
Prof. Jianguo Zhou & staff XDS-i Concept and Application
Jorge Documet, PhD MIDG Graphical User Interface Development; IPILab
Michael Zhou, PhD Globus Toolkit Integration; formerly with IPILab
Ryan Park, MS Data Object Model; USC MIC
Archana Tank, MS Data Collection; USC MIC
Prof. Kirk Shung Multi-Site MIDG Evaluation Model; USC UTRC
This work has been supported by:
- NIH/NIBIB Biomedical Imaging Informatics Training Grant T32 EB00438
- USAMRMC/TATRC 2007011185
- MI2, USA
TABLE OF CONTENTS
Dedication ......................................................................................................................... i
Acknowledgements ............................................................................................................ iii
List of Tables .................................................................................................................... vii
List of Figures .................................................................................................................. viii
Abbreviations ..................................................................................................................... xi
Abstract ..................................................................................................................... xiii
Chapter 1. Introduction ...................................................................................................1
1.1 Multi-Site Molecular Imaging Research Using Small Animal Models ................1
1.2 Current Imaging Informatics Challenges in Small Animal Imaging Research ....2
1.3 Introduction of a Molecular Imaging Data Grid ...................................................3
Chapter 2. Small Animal Imaging Informatics .............................................................5
2.1 Pre-Clinical Imaging Workflow and Modalities...................................................5
2.2 Multi-Modality Data Objects ................................................................................8
2.3 Current Data Management and Sharing Infrastructures .......................................9
2.4 Imaging Informatics Challenges Addressed in this Research .............................11
Chapter 3. Data Grid Technology in Medical Imaging ..............................................14
3.1 Background .........................................................................................................14
3.2 Data Grid Concepts and Dataflow ......................................................................14
3.3 Open Grid Services Architecture (OGSA) .........................................................16
3.4 DICOM Standards and IHE Workflow Profiles in Medical Imaging
Informatics ..........................................................................................................17
3.4.1 Digital Imaging and Communications in Medicine (DICOM) ..................17
3.4.2 Integrating the Healthcare Enterprise (IHE) Workflow Profiles ...............21
3.5 Current Applications in Radiology and Imaging-based Research ......................23
Chapter 4. MIDG System Overview.............................................................................24
4.1 System Design ....................................................................................................24
4.2 Data Model..........................................................................................................26
4.2.1 MIDG Metadata Database Schema ............................................................28
4.2.2 DICOM Compliance ..................................................................................30
4.3 System Components............................................................................................30
4.3.1 Graphical User Interface (GUI) Web-Server .............................................31
4.3.2 Grid Node Server .......................................................................................32
4.3.3 Grid Management Server ...........................................................................33
4.3.4 Storage Devices .........................................................................................34
4.4 Workflow ............................................................................................................34
Chapter 5. Initial MIDG Design Using The Globus Toolkit ......................................36
5.1 MIDG System Architecture Using The Globus Toolkit 4.0.2 ............................36
5.1.1 Application Layer ......................................................................................39
5.1.2 Gateway Layer ...........................................................................................39
5.1.3 Grid-Middleware Layer .............................................................................41
5.1.4 Resources Layer .........................................................................................43
5.2 System Connectivity and Workflow ...................................................................44
5.2.1 Archiving of Molecular Imaging Data .......................................................46
5.2.2 Data Persistence Management ...................................................................49
5.2.3 Data Retrieval Across Multiple Research Sites .........................................51
5.3 Design Limitations and Bottlenecks ...................................................................54
Chapter 6. MIDG Design Based on the XDS-i Integration Profile and Open
Grid Services Architecture ..............................................................................................56
6.1 Dataflow based on IHE XDS-i Integration Profile .............................................56
6.1.1 Uploading and Downloading Workflow ....................................................57
6.1.2 Rules-based Data Management Workflow ................................................59
6.2 Services-Oriented Functionality .........................................................................61
6.2.1 Multi-Threaded GridFTP Transfers ...........................................................61
6.2.2 Rules-based Data Routing..........................................................................62
6.2.3 Internal Hardware and Software Monitoring .............................................63
6.2.4 Improved Data Security and Auditing .......................................................64
6.3 Service-Oriented Architecture (SOA).................................................................65
6.3.1 SOA Protocols ...........................................................................................65
6.3.2 4-Layer System Architecture .....................................................................67
6.3.3 Application Layer ......................................................................................68
6.3.4 Gateway Layer ...........................................................................................69
6.3.5 Grid-Middleware Layer .............................................................................70
6.3.6 Resources Layer .........................................................................................71
Chapter 7. Web-Based Graphical User Interface .......................................................73
7.1 Purpose and Design.............................................................................................73
7.1.1 Framework and Programming Language...................................................74
7.1.2 Study-Centric Sharing ................................................................................74
7.2 Uploading Molecular Imaging Study Datasets ...................................................75
7.2.1 Upload Workflow and Dataflow ................................................................75
7.2.2 Samples of Screenshot ...............................................................................78
7.2.3 DICOM Compliance ..................................................................................85
7.3 Monitoring and Management Tools....................................................................87
7.4 Downloading Molecular Imaging Datasets ........................................................93
7.4.1 Samples of Screenshot ...............................................................................93
7.4.2 Download Workflow and Dataflow ...........................................................95
Chapter 8. System Implementation and Evaluation ...................................................97
8.1 Objectives ...........................................................................................................97
8.1.1 Overview: Laboratory Model and Distributed Multi-Site Model ..............97
8.2 Datasets Collected for System Evaluation ........................................................100
8.3 Hardware Components......................................................................................102
8.3.1 Laboratory Model ....................................................................................102
8.3.2 Multi-Site Model ......................................................................................103
8.4 System Configurations ......................................................................................105
8.4.1 Network Configurations...........................................................................105
8.4.2 Configuring Grid Manager ......................................................................106
8.4.3 Configuring Grid Node Servers ...............................................................106
8.4.4 Configuring MIDG Web Servers and Investigator Workstations ............106
8.5 Laboratory Evaluation Model and Dataflow ....................................................107
8.6 Distributed Multi-Site MIDG Evaluation Model and Dataflow .......................109
8.6.1 Three Site Test-bed ..................................................................................109
8.6.2 Dataflow ...................................................................................................112
8.7 Networking and Bandwidth ..............................................................................114
Chapter 9. System Evaluation Results .......................................................................116
9.1 Laboratory Evaluation Results ..........................................................................116
9.2 Multi-Site Evaluation Results ...........................................................................119
9.2.1 Upload Performance Results ....................................................................120
9.2.2 Download Performance Results ...............................................................123
9.2.3 Fault-Tolerance Performance Results ......................................................124
9.3 Qualitative Impact on Pre-clinical Molecular Imaging Facilities .....................126
9.3.1 Laboratory Manager’s Feedback .............................................................126
9.3.2 Laboratory Supervisor’s Feedback ..........................................................127
9.3.3 Research Laboratory Specialist’s Feedback ............................................128
Chapter 10. Current Status, Discussion, and Future Plans .......................................129
10.1 Current Project Status .......................................................................................129
10.2 Discussion .........................................................................................................130
10.2.1 Comparing Existing Data Grids in Healthcare Informatics .....................130
10.2.2 Comparing Current Preclinical Molecular Imaging Informatics
Methods....................................................................................................131
10.2.3 Discussion Summary ...............................................................................132
10.3 Future Research and Development Opportunities ............................................133
Bibliography ....................................................................................................................137
Appendix: Author Publications and Presentations ..........................................................139
LIST OF TABLES
Table 1: Molecular Imaging File Formats That Need Archiving at the USC MIC ............9
Table 2: Sample Rules-Based Routing Configurations for the Routing Service ..............63
Table 3: Internally Monitored Components of the MIDG Infrastructure .........................64
Table 4: List of DICOM Tags that are Modified by the MIDG GUI During Upload ......86
Table 5: Preclinical Molecular Imaging File Formats Collected from USC MIC
for Evaluation....................................................................................................101
Table 6: Molecular Imaging Datasets Collected for Evaluation from the USC MIC .....102
Table 7: Hardware Components Used in MIDG Laboratory Model ..............................103
Table 8: Hardware Components Used in MIDG Distributed Multi-Site Model ............104
Table 9: Performance Tests Measuring the Time It Takes to Archive and Retrieve a
Study Dataset from the Six Preclinical Molecular Imaging Modality
Types Over a 100 Mbps Network .....................................................................117
Table 10: MIDG Project Timeline ..................................................................................130
LIST OF FIGURES
Figure 1: Traditional Small Animal Imaging Facility Workflow ........................................6
Figure 2: Preclinical Imaging Modalities. A) MicroPET, B) MicroCT, C) MicroMRI,
D) MicroUS, E) Optical Imaging, F) Autoradiography .....................................7
Figure 3: Sample Preclinical Imaging Studies. Left - Sagittal CT, Middle - PET, and
Right - PET/CT Fusion Image of a Mouse. .......................................................8
Figure 4: Typical VPN Remote Access for Remote Site Researchers to Access
Current Molecular Imaging Facilities’ Data Archiving Server .......................11
Figure 5: Conceptual Data Grid Hardware and Networking Diagram ..............................16
Figure 6: DICOM Model of the Real World .....................................................................19
Figure 7: Basic DICOM File Structure ..............................................................................20
Figure 8: IHE XDS-i Integration Profile ...........................................................................22
Figure 9: MIDG System Overview. ...................................................................................25
Figure 10: Small Animal Imaging Data Model for Organizing Data Context in the
MIDG ...............................................................................................................27
Figure 11: MIDG Metadata Database Schema ..................................................................29
Figure 12: Basic Connectivity of MIDG Components ......................................................31
Figure 13: Revised Small Animal Imaging Facility Workflow with Altered Steps
Identified in Gray. ............................................................................................35
Figure 14: Molecular Imaging Grid Architecture, Built with Globus Toolkit’s Grid
Services, is Tailored for Molecular Imaging Data Management .....................37
Figure 15: Data Grid System Architecture for Clinical Radiology Applications ..............38
Figure 16: RLS Mapping of Logical File Name (LFN) to Physical File Name (PFN) .....42
Figure 17: Components and Connectivity of the Molecular Imaging Grid
Architecture, from a Molecular Imaging Site’s Perspective (Top) ..................45
Figure 18: Archiving Molecular Imaging Files and Study Metadata - Dataflow
Diagram............................................................................................................47
Figure 19: Data Persistence Management - Dataflow Diagram ........................................50
Figure 20: Search and Data Retrieval of Molecular Imaging Studies - Dataflow
Diagram............................................................................................................52
Figure 21: MIDG Implementation of the IHE XDS-i Integration Profile .........................58
Figure 22: Rules-based Routing / Back-up Workflow ......................................................60
Figure 23: Communication Protocols in the MIDG ..........................................................66
Figure 24: Service-Oriented-Architecture of MIDG (Compared with Figure 14 of the
MIDG built with Globus Toolkit) ....................................................................68
Figure 25: Multiple Simultaneous Users can Access the MIDG Graphical User
Interface Because It is Web-Based with Process Request Queuing
Mechanisms .....................................................................................................73
Figure 26: MIDG Graphical User Interface – Upload Workflow and Dataflow ...............77
Figure 27: MIDG Upload GUI – Study Selection Level ...................................................78
Figure 28: MIDG Upload GUI – New Study Registration Form ......................................79
Figure 29: MIDG Upload GUI – New Session Registration Form ...................................80
Figure 30: MIDG Upload GUI – New Group Registration Form. ....................................81
Figure 31: MIDG Upload GUI – New Scan Registration Form ........................................82
Figure 32: Screenshot of Copied Files from a Client Machine into the Shared Study
Directory (X:\) of the MIDG Web-Server. ......................................................83
Figure 33: MIDG Upload GUI – Uploading Files From a Scan Dataset ...........................84
Figure 34: MIDG GUI – Study Monitoring .......................................................................88
Figure 35: MIDG GUI – Study Management ....................................................................90
Figure 36: MIDG GUI – Administrator’s Management Page ...........................................92
Figure 37: MIDG Download GUI......................................................................................93
Figure 38: Sample Search Results Page With Results for Optical Imaging Studies in
the MIDG .........................................................................................................94
Figure 39: MIDG Graphical User Interface – Download Workflow .................................96
Figure 40: Systems Integration and Workflow Overview of MIDG Implementation .......99
Figure 41: Laboratory Model - Components and Dataflow Overview ............................108
Figure 42: Geographic Location of the Three USC IPILab, MIC, and UTRC Sites
Participating in the Multi-Site MIDG Evaluation Model ..............................110
Figure 43: Components and Connectivity of the Multi-Site MIDG Model
Implementation for Evaluation ......................................................................113
Figure 44: Network Bandwidth in the 3-Site MIDG Evaluation .....................................115
Figure 45: Laboratory Results Plot ..................................................................................119
Figure 46: Upload Performance of a MicroCT Scan of Rat Animal Model ....................121
Figure 47: Upload Performance Results for Studies Uploaded at the USC MIC ............122
Figure 48: Download Performance Results for Studies Downloaded at the USC
IPILab. ...........................................................................................................124
Figure 49: Fault-Tolerance Results for a MicroCT Study Downloaded at the USC
UTRC .............................................................................................................125
ABBREVIATIONS
API Application Programming Interface
AR Autoradiography
BME Biomedical Engineering
DICOM Digital Imaging and Communications in Medicine
ebXML Electronic Business using Extensible Markup Language
ePR Electronic Patient Record
FTP File Transfer Protocol
GAP Grid-Access-Point
GB Gigabyte
GridFTP Grid File Transfer Protocol
HIPAA Health Insurance Portability and Accountability Act
HTTP Hypertext Transfer Protocol
HTTPS Hypertext Transfer Protocol Secure
IACUC Institutional Animal Care and Use Committee
IHE Integrating the Healthcare Enterprise
IPILab USC Image Processing and Informatics Laboratory
IT Information Technology
LAN Local Area Network
LFN Logical File Name
LRC Local Replica Catalog
MCS Metadata Catalog Service
μCT Micro-Computed Tomography
μMRI Micro-Magnetic Resonance Imaging
μPET Micro-Positron Emission Tomography
μSPECT Micro-Single-Photon Emission Computed Tomography
μUS Micro-Ultrasound
MIC Molecular Imaging Center
MB Megabyte
MIDG Molecular Imaging Data Grid
NAS Network-Attached-Storage
OPT Optical Imaging
PC Personal Computer
PFN Physical File Name
PHP PHP: Hypertext Preprocessor
PostgreSQL An Object-Relational Database Management System
RFT Reliable File Transfer
RLI Replica Location Index
RLS Replica Location Service
SAN Storage-Area-Network
SOA Service Oriented Architecture
SOAP Simple Object Access Protocol
SOP DICOM Service-Object Pair
SQL Structured Query Language
TB Terabyte
URL Uniform Resource Locator
USC University of Southern California
UTRC USC Ultrasound Transducer Resource Center
VM Virtual Machine
VPN Virtual Private Network
WAN Wide Area Network
XDS-I Cross Enterprise Document Sharing for Imaging
XML Extensible Markup Language
ABSTRACT
Molecular imaging is a relatively new field in medical imaging research that has been
propagating research discoveries in biology, medicine, disease studies, proteomics, and
radiopharmaceutical development, by using in-vivo biomarkers to visualize and quantify
cellular and molecular content and activity. Small animal imaging facilities throughout
medical research institutions have been growing in both the number of investigator studies and the
image data volume per study. To optimize utilization of pre-clinical molecular imaging
data in translational sciences research, a multi-modality Molecular Imaging Data Grid
(MIDG) has been designed to address challenges in data archiving, management, and
sharing among multi-site or multi-institution research consortiums.
The background, MIDG R&D (research and development), system architecture, data
model, and workflows of the Molecular Imaging Data Grid are presented, followed by
system evaluations and results discussion. The system design was based on the Open
Grid Services Architecture and the Cross-Enterprise Document Sharing for Imaging
(XDS-i) integration profile. The current status of the MIDG project is a 3-site
implementation within USC, enabling DICOM-compliant data sharing between a
molecular imaging facility, a medical imaging informatics lab, and an investigative
research lab in the department of Biomedical Engineering. This research presents a
system design proposal, methodology, and evaluated results for applying novel grid-
based data archiving infrastructures to sharing molecular imaging data in pre-clinical
small animal model research communities.
Chapter 1. INTRODUCTION
Molecular imaging modalities have become a necessary resource in small animal imaging
facilities and scientific medical research by helping investigators visualize and quantify
cellular and proteomic activity in small animal models. However, due to limited
interoperability between disparate data objects and geographically distant user-groups,
newly discovered imaging content is often under-utilized and discarded upon publication
of specific research findings. Over the past decade, grid technology [1] has entered the
information technology (IT) space, promising integrated and scalable IT infrastructure for
sharing research data and computational workload in an interconnected and virtualized
infrastructure. This research presents the implications, methodology, and evaluation of a
Molecular Imaging Data Grid for improving imaging informatics in multi-site molecular
imaging research communities.
1.1 Multi-Site Molecular Imaging Research Using Small Animal Models
Discovery and innovation in molecular biology research, in-vivo imaging modalities, and
image-processing technologies for improved healthcare delivery always begin in pre-
clinical stages of evaluation in small animal models. Small animal imaging facilities are
available at most medical research universities and continue to provide multi-modality
imaging resources to drive a new field in science called molecular imaging. These
modalities include nuclear imaging technology such as microPET and microSPECT,
optical and fluorescence imaging, microCT, microMR, ultrasound, autoradiography, and
microscopy, together forming a versatile set of imaging tools with biomarkers needed for
viewing and quantifying cellular and molecular processes in small animal models.
Medical investigators use them for disease studies, new pathological therapies, and
cellular and genomic research. Today, most small animal imaging facilities offer services
in multi-modality molecular imaging usage, quantification and analysis of image data,
and trained imaging research staff to both on-campus and off-campus medical
investigators. These facility resources together help investigators experiment, visualize,
characterize, and quantify biological and chemical events at the cellular and molecular
levels, giving them probes into otherwise unseen biochemical pathways in anatomy and
physiology.
1.2 Current Imaging Informatics Challenges in Small Animal Imaging
Research
The customers of small animal imaging research facilities are the investigators who use
their imaging modalities, computing software, and knowledgeable staff services. The
efficiency and rate of investigator workflows are relevant cost variables to both the
investigator and the small animal imaging facility. Currently, challenges in archiving,
managing, and distributing growing volumes of small animal imaging data are an impediment for
many small animal imaging facilities. Time spent managing data across
disparate modality imaging workstations and multiple users often burdens small
animal imaging staff and limits the number of people who have access to and understand
the complex datasets. Furthermore, the lack of secure and reliable wide-area-network
(WAN) connectivity into a small animal imaging facility’s data archives makes remote
access to the multi-modality datasets a tedious, if not unwanted, process for multi-site
and multi-investigator research. Thus, valuable imaging data from individual experiments
are often lost or discarded over time by small animal imaging facilities, leaving
investigators unable to re-examine or cross-examine previous image data, not to mention
search for similar experimental imaging studies at geographically distant small animal
imaging facilities. In summary, the challenging tasks of archiving disparate multi-modality
imaging data and distributing publishable pre-clinical image data have not been well
addressed in small animal imaging facilities, making systems integration, management,
and sharing of molecular imaging data a workflow impediment for both investigators and
molecular imaging facility staff.
1.3 Introduction of a Molecular Imaging Data Grid
The availability of open-source grid technologies [2] over the past decade has helped grid-
based digital data infrastructures to be adopted in various applications in scientific
research as well as data-intensive industries such as finance and healthcare. The grid’s
virtualization of data storage infrastructure across secured WAN promotes integration of
disparate computing systems and multi-site user interaction. However, applying data grid
infrastructures to a particular application is challenging and involves detailed data object
definitions, custom user-level application interfaces, data receiving and handling services,
software integration of grid middleware, and network connectivity of multiple remote
computing networks.
A data grid for small animal imaging research has not been designed or implemented
before, for a few reasons. First, the usually small IT staff of small animal imaging
research facilities do not have the resources to develop, integrate, and implement the
many sub-systems required in a multi-modality molecular imaging data grid. Secondly,
sharing of small animal imaging datasets, even of published imaging data, with outside
investigators and research groups has not generally been regarded as convenient or
beneficial for small animal imaging facilities. Consequently, many molecular imaging
facilities inherit a data privacy mentality over their completed investigative pre-clinical
research data archives. A third reason is the previous unavailability of distributed
computing infrastructure to access large imaging datasets among multi-institutional
consortiums. Only in the past few years has data grid technology been applied in the
medical imaging informatics field. Therefore, the objectives in this research are to design
and implement an institutionally-maintained Molecular Imaging Data Grid for archiving,
managing, and distributing multi-modality image datasets generated at small animal
imaging facilities, and to promote the sharing of published scientific imaging data in
multi-institution and translational science communities across the WAN. Details of the
Molecular Imaging Data Grid architecture and workflows will be discussed in later
chapters.
Chapter 2. SMALL ANIMAL IMAGING INFORMATICS
Small animal imaging informatics comprises the information technology systems and workflows
that archive, manage, distribute, and make available small animal imaging data files. In
this chapter, the imaging workflow, modalities, and data objects of a typical small animal
imaging facility are presented. Then, a few examples of how current small animal
imaging programs address these informatics tasks are shown. Lastly, I will summarize the
imaging informatics challenges that will be addressed in this research for molecular
imaging research facilities.
2.1 Pre-Clinical Imaging Workflow and Modalities
Before biological discoveries and therapies are tested in human trials, pre-clinical
research investigators work with small animal models and imaging facilities to plan,
schedule, prepare, and image preclinical experiments using imaging modalities,
biomarkers, and computer resources designed for small animals. Figure 1 below diagrams
this investigative workflow based on workflow at the University of Southern California
(USC) Molecular Imaging Center (MIC). Preparatory steps 1 through 4 are standard
protocols for nearly all small animal imaging facilities, while the order of steps after 4 may
vary between different imaging facilities. If post-processing of acquisition imaging data
is required, the molecular imaging staff uses dedicated software and computing servers
for image reconstruction or co-registration of multi-modality data. After viewable images
are rendered in steps 4 and 5, visualization and quantification analysis is performed by
staff or the investigators, often on a different software workstation. In step 7, completed
molecular imaging datasets are archived onto a large-capacity storage device, such as a
network-attached-storage (NAS) or storage-area-network (SAN) device, on the internal
networks. Archived molecular imaging datasets include the raw image acquisition files,
post-processed images, videos and screenshots, acquisition and post-processing header
files, and study report files. Investigators and collaborators can then retrieve these
datasets in step 8 of the workflow from the same storage devices by copying the physical
files onto their personal computers or burning them onto compact disc (CD), typically for
publication purposes.
Figure 1: Traditional Small Animal Imaging Facility Workflow
Common imaging modality systems currently being used in small animal imaging
facilities resemble many imaging modalities used on humans in radiology departments
because they are created by similar manufacturing companies. However, the resulting
scan acquisition files and post-processing datasets created in the pre-clinical small animal
imaging facility workflow are not standardized in format or transmission methods. This is
dissimilar to clinical radiology’s implementation of the DICOM standard because
initiatives have not been undertaken to thoroughly integrate multi-modality animal
imaging systems. Shown below in Figure 2 are photos of common small animal imaging
modalities in small animal imaging facilities, including the μPET, μCT, μMRI,
μUltrasound, optical imaging, and autoradiography.
Figure 2: Preclinical Imaging Modalities. A) MicroPET, B) MicroCT, C) MicroMRI,
D) MicroUS, E) Optical Imaging, F) Autoradiography
These small animal imaging modalities are able to capture images at higher spatial
resolution using scaled-down versions of the same medical physics techniques in
radiology modalities. A sample co-registered PET/CT of a mouse imaging dataset with a
cancerous prostate tumor biomarker is shown in Figure 3.
Figure 3: Sample Preclinical Imaging Studies.
Left - Sagittal CT, Middle - PET, and Right - PET/CT Fusion Image of a Mouse.
* Images courtesy of USC Molecular Imaging Center
2.2 Multi-Modality Data Objects
The identification of data file formats is necessary for designing a database model for
molecular imaging datasets. Due to variability in file formats generated from molecular
imaging modalities, the organization of data archives in molecular imaging centers is a
challenge. Raw acquisition images, acquisition header files, intermediate reconstruction
files, post-processing workflow files, analysis region-of-interest (ROI) files, and varying
display formats all add to the complexities of creating a centralized and intelligent data
storage infrastructure for sharing molecular imaging data and findings. Table 1
summarizes the modalities, file formats, description, and creating software based on the
workflow at the USC Molecular Imaging Center (MIC).
Table 1: Molecular Imaging File Formats That Need Archiving at the USC MIC

MicroPET (MicroPET Manager, Siemens):
  *.lst + *.lst.hdr - Image Acquisition files
  *.scn + *.scn.hdr - Histogram Reconstruction
  *.img + *.img.hdr - Final Reconstruction

MicroCT (Inveon and Cobra, Siemens):
  *.cat + *.cat.hdr - Image Acquisition files
  *.img + *.img.hdr - Final Reconstruction

PET/CT (Amide, GNU Project):
  *.img - PET Input file
  *.img - CT Input file
  *.xif - Final Co-registered Image

Optical (Living Image, Xenogen):
  *.tif - Acquisition files
  *.txt - Processing Parameters
  *.png - Final Overlayed Image

Ultrasound (Vevo 770, VisualSonics):
  *.avi - Recorded video clips
  *.tif - Screenshots
  *.dcm - Screenshots

Autoradiography (OptiQuant, Packard Instrument Co.):
  *.tif - Image Acquisition file
  *.ana - Analysis ROI file
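To make the relationships in Table 1 easier to use programmatically, the sketch below expresses the table as a simple lookup from modality to file-name patterns and creating software. This is purely illustrative; the MODALITY_FORMATS dictionary and the modalities_for helper are hypothetical names introduced for this example and are not part of the MIDG software described in this dissertation.

```python
import fnmatch

# Table 1 expressed as a lookup: modality -> (file-name patterns, creating software).
# A minimal sketch for illustration only; the MIDG's real metadata model is richer.
MODALITY_FORMATS = {
    "MicroPET": {
        "patterns": ["*.lst", "*.lst.hdr", "*.scn", "*.scn.hdr", "*.img", "*.img.hdr"],
        "software": "MicroPET Manager (Siemens)",
    },
    "MicroCT": {
        "patterns": ["*.cat", "*.cat.hdr", "*.img", "*.img.hdr"],
        "software": "Inveon / Cobra (Siemens)",
    },
    "PET/CT": {"patterns": ["*.img", "*.xif"], "software": "Amide (GNU Project)"},
    "Optical": {"patterns": ["*.tif", "*.txt", "*.png"], "software": "Living Image (Xenogen)"},
    "Ultrasound": {"patterns": ["*.avi", "*.tif", "*.dcm"], "software": "Vevo 770 (VisualSonics)"},
    "Autoradiography": {"patterns": ["*.tif", "*.ana"], "software": "OptiQuant (Packard Instrument Co.)"},
}

def modalities_for(filename):
    """Return which modalities in Table 1 could have produced a given file name."""
    return [modality for modality, info in MODALITY_FORMATS.items()
            if any(fnmatch.fnmatch(filename.lower(), pattern) for pattern in info["patterns"])]

print(modalities_for("rat07_scan.cat.hdr"))  # ['MicroCT']
```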
2.3 Current Data Management and Sharing Infrastructures
Data storage methods vary among small animal imaging facilities, but most consist of
multiple imaging workstations networked privately with a large disk-capacity storage
device such as a network-attached-storage (NAS) or storage-area-network (SAN) device.
At the USC MIC, multiple modality and post-processing workstations are internally
connected to a dedicated server via sharing of the server’s directories over the local-area-
network (LAN). Some other small animal imaging facilities have developed data storage
infrastructures with improvements in off-site data access and automation of investigative
workflows, and are described as follows.
At the UCLA Crump Institute for Molecular Imaging, where they have a pool of 40 or
more investigators, workflow management software has been implemented to automate
post-processing of raw image files between on-site software workstations. The result is
an automated post-processing workflow for investigators. Storage of these datasets,
however, is still kept internally on secure storage-area-network (SAN) devices where
facility staff managers organize datasets under investigator-named folders. On-campus
users can access their data through a web-based interface by logging in and requesting the
datasets using investigator ID or session ID numbers. Requested files are then transferred
to campus-wide file servers that students and faculty can mount onto their local computer
as a shared directory, assuming their local computer is within the campus LAN and they
have a valid university provided account. Similarly at the Stanford Molecular Imaging
Program (MIPS), imaging modality workstations and user analysis workstations are
connected to a central archive server via a mounted shared directory. The small animal
imaging staff manager selectively migrates completed image files from user workstations
to the multi-terabyte archive server. Investigators then use a web portal to search for and
request files to be made available on the campus-wide network fileserver. If off-site
investigators want access to these file-servers, they must create a virtual-private-network
(VPN) account with the university as shown in Figure 4, which can be tedious to set up
and slow in file-transmission across WAN.
Figure 4: Typical VPN Remote Access for Remote Site Researchers to Access Current
Molecular Imaging Facilities’ Data Archiving Server
2.4 Imaging Informatics Challenges Addressed in this Research
Access to data storage infrastructure is currently restrictive for users outside of a small
animal imaging facility in order to prevent archive corruption and unwanted access to
experimental research data. The storage repositories are typically large-capacity storage
devices with organized directories, mounted on each workstation as external file systems
so users can drag-and-drop files from their workstations. The sharing of molecular
imaging data is consequently limited to users who understand or have access to metadata
behind these filenames and directories. Searching and processing molecular imaging data
are very difficult for new or multi-site investigators trying to collaborate on these
datasets. Current methods of distributing molecular imaging studies and quantitative
analysis reports for investigators without LAN access include selecting files from a
central storage device based on self-maintained logs and record books, and copying them
onto portable storage devices or burned DVDs for pickup.
The lack of an organized data management and user interface with a metadata-enriched
archiving infrastructure makes management, distribution, and data-mining of molecular
imaging research datasets heavily dependent on staff at small animal imaging facilities
and on research individuals who often have little or incomplete knowledge about the
significance of the data being handled. Data mining across differing modalities, multiple
investigators, and diverse experimental imaging studies is made difficult with only a
filename and folder directory for comparison. Molecular imaging investigators often
spend significant amounts of time organizing, transmitting, and explaining file context
because the metadata is not readily available in existing storage solutions, resulting in
higher costs and less time for analyzing experimental data. The strain in organizing,
discovering, and distributing molecular imaging data is primarily due to the lack of a
dedicated archiving infrastructure. Small animal imaging informatics today is similar to
the state of informatics in radiology before the standardization of the DICOM file and the
Picture Archiving and Communication System (PACS) in the 1990’s.
However, where the Picture Archiving and Communication System (PACS) and the
Digital Imaging and Communications in Medicine (DICOM) standard have integrated
modality data, systems and workflow within the Radiology department for healthcare
delivery, the digital infrastructure and workflow requirements of molecular imaging
facilities are quite different in their design and specifications from those of radiological imaging
modalities. Variability in experimental objectives of small animal imaging studies,
complex file formats, extensive image post-processing and quantitative analysis of
volumetric datasets have limited the usefulness of conventional radiology workflows and
communication protocols between modalities, software tools, storage archives, and users.
By building upon features and services of grid technology, and adopting certain radiology
imaging standards and workflow profiles, the utilization of molecular imaging services
may be improved and costs reduced for both small animal imaging facilities and their
investigators. These challenges are explored in this research.
Chapter 3. DATA GRID TECHNOLOGY IN MEDICAL IMAGING
3.1 Background
Grid-based computing has been used for research information technology systems since
the 1990’s to share large datasets and computational resources among geographically
dispersed collaborating sites. Secure networking protocols and service-oriented-
architectures (SOA) have enabled grid-based infrastructures to provide research
initiatives an effectively integrated IT solution for archiving, managing, and distributing
large volumes of data among multiple involved sites. In medical imaging, grid-based
solutions have recently been considered as a method for multi-site data integration in
enterprise radiology [3] and image-based clinical trials [4]. This research demonstrates how
data grid technologies can be utilized for archiving, managing, and distributing diverse
molecular imaging datasets among dispersed investigative users for improved workflow
and collaboration in the molecular imaging research communities.
3.2 Data Grid Concepts and Dataflow
A data grid is a distributed network of hardware components and software services that
together enable remote users to share, search, and retrieve data files across the WAN.
Grid technology was conceived to remove obstacles that prevent seamless collaboration
among geographically dispersed institutions and sites. Its core services, interfaces and
protocols allow users to access remote computational hardware and data resources as if
they were located within their own domain while simultaneously preserving local control
over who can use resources and when. The general design concepts of a data grid consist
of information exchange infrastructure, data management services, robust data delivery
methods, event auditing, security protocols, and implementation portability [5].
The network structure of data grids typically involves node servers at participating data
grid sites and a centralized management server that hosts global database content and
web-services. Additional hardware components, such as large-capacity storage devices
and application-specific middleware servers, can be added to each site to utilize the data
exchange through the local grid node servers. Figure 5 diagrams the basic concept and
dataflow of a data grid infrastructure. Archiving is done by uploading data files from
client machines to the local grid node server, typically through a graphical user interface.
The grid node server would then process the files by registering it into the central
management server. Internal data management of a data grid system typically involves
system monitoring, routing of archived files to two or more storage devices for data
redundancy, and determining the fastest data source for retrieval to a user site. Finally,
retrieving data from a data grid system is done by sending requests to a local grid node
server that locates and retrieves files from either on-site or remote storage devices. These
concepts and dataflows will be presented in more detail in later chapters.
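The archiving and retrieval dataflow just described can be summarized in a short sketch. The class and method names below (CentralManager, GridNode, fetch_remote, and so on) are hypothetical placeholders chosen only for illustration; they are not Globus Toolkit or MIDG APIs, and the WAN transfer step is deliberately stubbed out.

```python
# A minimal, purely illustrative sketch of the data grid archive/retrieve dataflow.
class CentralManager:
    """Central management server: keeps the global catalog of file replicas."""
    def __init__(self):
        self.catalog = {}  # logical file name -> list of sites holding a replica

    def register(self, lfn, site):
        self.catalog.setdefault(lfn, []).append(site)

    def locate(self, lfn):
        # A real grid would pick the fastest replica; here we simply take the first.
        return self.catalog.get(lfn, [None])[0]


class GridNode:
    """Site-local grid node server: the entry point for uploads and downloads."""
    def __init__(self, site, manager, storage):
        self.site, self.manager, self.storage = site, manager, storage

    def archive(self, lfn, data):
        self.storage[lfn] = data               # write to the local storage device
        self.manager.register(lfn, self.site)  # register the replica centrally

    def retrieve(self, lfn):
        site = self.manager.locate(lfn)        # replica may be local or remote
        return self.storage[lfn] if site == self.site else fetch_remote(site, lfn)


def fetch_remote(site, lfn):
    raise NotImplementedError("WAN transfer (e.g. GridFTP) would happen here")


manager = CentralManager()
node_a = GridNode("Site A", manager, storage={})
node_a.archive("study42/scan01.img", b"...pixel data...")
print(manager.locate("study42/scan01.img"))  # Site A
```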
Figure 5: Conceptual Data Grid Hardware and Networking Diagram
3.3 Open Grid Services Architecture (OGSA)
The OGSA is a set of architectural requirements and capabilities put forth by the grid
research community to address topics in grid development and promote interoperability
through service-oriented designs. The first official OGSA document was distributed in
early 2005 by authoring members of the Global Grid Forum and was utilized in many
applications of grid-based research [6]. OGSA has become the framework for distributed
system integration, virtualization, and management by the specification of the interfaces,
behaviors, resource models, and binding mechanisms between grid services. In order to
achieve interoperability among vendor-agnostic systems, the OGSA framework promotes
designs that are service-oriented architectures implemented with web-services. The grid
capabilities covered by OGSA include: infrastructure services, execution management
services, data services, resource management services, security services, self-
management services, and information services. These topics are fundamental to
designing a robust grid-based system.
3.4 DICOM Standards and IHE Workflow Profiles in Medical Imaging
Informatics
The two medical imaging informatics standards that are relevant in this research are the
DICOM standard (Digital Imaging and Communications in Medicine) and the Cross-
Enterprise Document Sharing for Imaging integration profile (XDS-i) put forth by the
Integrating the Healthcare Enterprise initiative (IHE).
3.4.1 Digital Imaging and Communications in Medicine (DICOM)
To implement data grid technologies for medical imaging applications, a data grid’s
front-end services need to comply with existing data and messaging standards in medical
imaging informatics. However, the field of molecular imaging informatics is relatively
new, and it lacks a standard file format that is accepted across all modalities and
software. In clinical radiology, where many preclinical imaging modality vendors also
have a market share, the DICOM standard has been widely accepted as the data format
and data exchange method between two imaging-related devices. First created in the
1980’s, the DICOM standard has been continuously maintained by medical imaging
informatics professionals in the clinical radiology industry. Its goals were and still are to
enable vendor-agnostic interoperability of diagnostic imaging data between modalities
and information systems in the clinical healthcare setting. To do this, a data model was
created based on real-world entities linking patients to their imaging-related data files.
This data model is shown in Figure 6, which was taken from part 3 of the DICOM
standard in 2009.
Figure 6: DICOM Model of the Real World [7]
DICOM applies this real world model into its DICOM file format standard by attaching
textual metadata on top of its data content. This is best shown in the DICOM image
information object definition (IOD) of Figure 7. In addition to an image’s pixel data,
textual metadata is embedded into the DICOM file as multiple tag-value pairs. The tags
are unique fields that describe the image context, such as patient name, study date, and
modality manufacturer. An image’s pixel data then becomes an attribute that describes
the physical image file.
Figure 7: Basic DICOM File Structure. Courtesy of DCM4CHE [8]
Because of DICOM’s presence in clinical radiology systems and acceptance by the same
vendors that manufacture current preclinical imaging modalities, the DICOM standard is
a feasible data exchange and file format standard for preclinical molecular imaging data
objects. Standardization of preclinical imaging data formats can enable interoperability of
the Molecular Imaging Data Grid with external DICOM-compliant software and
information systems. The Molecular Imaging Data Grid design thereby adopts the
DICOM standard by supporting DICOM message exchange protocols and converting
current molecular imaging data formats into the DICOM format. Further details of how
preclinical metadata is mapped to these DICOM tags and which data formats are
converted to DICOM will be presented in later chapters.
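As an illustration of the kind of conversion described above, the following sketch wraps a small-animal scan's pixel data and study metadata into a DICOM Secondary Capture object using the open-source pydicom library. The choice of pydicom and the mapping of animal-subject identifiers into the patient tags are assumptions made for this example only; the actual MIDG tag mapping is presented in later chapters.

```python
import datetime
import numpy as np
from pydicom.dataset import Dataset, FileMetaDataset
from pydicom.uid import generate_uid, ExplicitVRLittleEndian

# File meta information: store the image as a DICOM Secondary Capture object.
file_meta = FileMetaDataset()
file_meta.MediaStorageSOPClassUID = "1.2.840.10008.5.1.4.1.1.7"  # Secondary Capture
file_meta.MediaStorageSOPInstanceUID = generate_uid()
file_meta.TransferSyntaxUID = ExplicitVRLittleEndian

ds = Dataset()
ds.file_meta = file_meta
ds.SOPClassUID = file_meta.MediaStorageSOPClassUID
ds.SOPInstanceUID = file_meta.MediaStorageSOPInstanceUID

# Assumed (illustrative) mapping of preclinical study metadata into DICOM tags:
# the animal subject takes the place of the clinical patient.
ds.PatientName = "Mouse^042"
ds.PatientID = "STUDY123-GRP1-M042"
ds.StudyDate = datetime.date.today().strftime("%Y%m%d")
ds.Modality = "OT"                      # "Other", for non-clinical modality types
ds.StudyInstanceUID = generate_uid()
ds.SeriesInstanceUID = generate_uid()

# Placeholder pixel data standing in for a reconstructed microPET/microCT slice.
pixels = np.zeros((128, 128), dtype=np.uint16)
ds.Rows, ds.Columns = pixels.shape
ds.SamplesPerPixel = 1
ds.PhotometricInterpretation = "MONOCHROME2"
ds.BitsAllocated = 16
ds.BitsStored = 16
ds.HighBit = 15
ds.PixelRepresentation = 0
ds.PixelData = pixels.tobytes()

ds.is_little_endian = True
ds.is_implicit_VR = False
ds.save_as("mouse042_scan.dcm", write_like_original=False)
```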
3.4.2 Integrating the Healthcare Enterprise (IHE) Workflow Profiles
Just as DICOM was adopted for standardizing preclinical imaging data in the MIDG, the
MIDG system was based on an IHE integration workflow profile for radiology. The
Cross-Enterprise Document Sharing of Images (XDS-i) was defined by the Integrating
the Healthcare Enterprise (IHE) initiative in 2005 [9], and specifies how medical IT
software components can register, query, locate, and deliver patient images between
geographically remote medical information systems.
Similar dataflow tasks in the MIDG system led to the adoption of XDS-i in the MIDG components and dataflow.
Using generic actors and existing messaging protocols, XDS-i was created for vendors to
promote interoperability and data sharing without micro-managing how actors and events
are implemented by the vendors. These messaging protocols include DICOM,
HTTP, ebXML, and SOAP, and are described in more detail in Chapter 6, MIDG Design
Based on the XDS-i Integration Profile and Open Grid Services Architecture. Figure
8 shows the XDS-i integration profile and below are its registration and retrieval dataflow
steps [10]. Note that the XDS-i profile does not specify ownership over data content, and is
only responsible for connecting an image data source to its consumer.
Figure 8: IHE XDS-i Integration Profile
Steps for Registering DICOM Images
1. Image Document Source sends a new message in the form of a manifest file to the
Document Repository. The manifest file describes the metadata of the DICOM study.
2. Document Repository notifies the Document Registry of its received manifest content.
3. Document Registry matches the received patient ID to the master patient ID from the
Patient Identity Source.
Steps for Query / Retrieving Documents
A. Image Document Consumer sends a query to the Document Registry.
B. Document Registry matches the patient ID to a master patient ID from the Patient
Identity Source.
C. Document Registry returns a query result to the Document Consumer.
D. Document Consumer retrieves the DICOM images directly from the Image Document
Source.
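To make the actor interactions above concrete, here is a minimal toy walk-through of the registration and query steps, with the actors modeled as plain Python objects. The ebXML/SOAP message formats, endpoints, and identifiers are intentionally omitted or invented for illustration; this is not IHE-specified code, only a sketch of the sequence of responsibilities.

```python
# Illustrative walk-through of the XDS-i registration and query steps listed above.
class PatientIdentitySource:
    def __init__(self, master_ids):
        self.master_ids = master_ids            # local patient ID -> master patient ID

    def resolve(self, local_id):
        return self.master_ids[local_id]


class DocumentRegistry:
    def __init__(self, identity_source):
        self.identity_source = identity_source
        self.entries = []                        # registered manifest metadata

    def register(self, manifest):                # step 3: match to master patient ID
        manifest["master_patient_id"] = self.identity_source.resolve(manifest["patient_id"])
        self.entries.append(manifest)

    def query(self, master_patient_id):          # steps A-C: return matching manifests
        return [m for m in self.entries if m["master_patient_id"] == master_patient_id]


class DocumentRepository:
    def __init__(self, registry):
        self.registry = registry

    def submit(self, manifest):                  # steps 1-2: receive manifest, notify registry
        self.registry.register(manifest)


# Step 1: the Imaging Document Source submits a manifest describing a DICOM study.
ids = PatientIdentitySource({"MIC-RAT-07": "MPI-0007"})
registry = DocumentRegistry(ids)
repository = DocumentRepository(registry)
repository.submit({"patient_id": "MIC-RAT-07",
                   "study_uid": "1.2.840.9999.1",
                   "source": "imaging-document-source.example"})

# Steps A-D: a Document Consumer queries the registry, then would retrieve the
# DICOM objects directly from the Imaging Document Source (retrieval not shown).
print(registry.query("MPI-0007"))
```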
3.5 Current Applications in Radiology and Imaging-based Research
Data grid technologies have been utilized in other medical imaging applications by the
Image Processing and Informatics Lab (IPILab) at USC. In 2005, a DICOM-compliant
data grid project was initiated to investigate how grid technologies can be used to meet
HIPAA requirements for off-site data back-up of clinical radiology data and images
archived in the Picture Archiving and Communications System (PACS) [3]. By developing a
DICOM services layer on top of file delivery and management services of the Globus
Toolkit, an open-source set of grid-enabling services, the IPILab was able to archive and
retrieve clinical radiology imaging studies between two WAN sites [11]. This DICOM-
compliant and fault-tolerant Data Grid was presented at the 2006 Radiological Society of
North America (RSNA) conference where it won an InfoRAD Certificate of Merit award.
In 2007, the system was adapted as an enterprise PACS solution such that multiple
hospitals and imaging centers could securely share radiological patient images [12]. Later
in 2007, the DICOM-compliant Data Grid was implemented with lossless digital
signature embedding and event auditing features so that imaging-based clinical trial cores
could use it to collect and share anonymized clinical trial images [13].
Chapter 4. MIDG SYSTEM OVERVIEW
The overall MIDG design is an integration of two systems: a web-based graphical user
interface and a distributed data grid infrastructure. Together they form a dedicated
solution for archiving, sharing, and distributing molecular imaging datasets
among users at geographically distant molecular imaging research sites. This chapter will
introduce the design and workflow of the MIDG system. Because the names of system
components change from the first MIDG prototype to the current MIDG system, they
will be referred to with generic labels until their respective design architectures are
presented in Chapters 5 and 6.
4.1 System Design
The MIDG system enables molecular imaging facilities and remote investigator sites to
upload and download preclinical molecular imaging data files. Ownership of imaging
files belongs to the primary investigator and molecular imaging facility staff, but data
privacy and security are the responsibility of all users that have been given access to
individual studies. Although the imaging facilities are more likely to take on the role of
data provider, and investigator sites are consumers of archived image datasets, there is
no restriction in the system design that prevents remote investigators from contributing
new data such as post-processed images or reports into the MIDG system as well. The
overall system design is shown in Figure 9 and demonstrates how a MIDG can be
implemented.
Figure 9: MIDG System Overview. The molecular imaging facilities perform animal-model
imaging scans and generate multi-modality datasets for distribution. Investigators at
preclinical research sites retrieve their imaging datasets via the MIDG GUI Web-Server
and Grid Node at their site. Informatics research labs can provide long-term storage for
molecular imaging facilities and also utilize the imaging data for computational and image
processing research.
Figure 9 shows three types of sites that may participate in a MIDG – a molecular imaging
facility, an informatics research lab, and preclinical investigator sites. The molecular
imaging facility typically has multiple animal model imaging modalities and work with
investigators to perform imaging studies. The informatics research lab typically has
robust computational hardware and services that may provide additional data and
computational infrastructure for a molecular imaging facility. Preclinical investigators at
remote sites connect with these two types of sites via the MIDG to collaborate and
utilize their imaging data and resources.
Setting up a site to participate in a MIDG involves configurations on user workstations,
GUI web-server, grid nodes, and grid management server. As shown in Figure 9, each
participating site has a GUI web-server and a grid node server. The grid node server is
the gateway into the MIDG back-end infrastructure, made up of a centralized grid
management server and the other grid node servers. The grid management server is
typically hosted at a robust operational facility such as an informatics research lab where
IT staff, server redundancy, backup power, air-conditioning, and secure networking
devices are available.
4.2 Data Model
Like the DICOM real-world data model, a MIDG data model was created for organizing
preclinical molecular imaging dataset based on current investigative workflows. Where
clinical radiology archives are centralized around the patient object, most preclinical
molecular imaging facilities organize their data by investigators. This however makes
searching for studies across multiple investigators very difficult, and managing user-level
access to multiple studies near impossible. The MIDG data model is study-oriented and
follows a hierarchy slightly different than that of radiology’s patient-study-series-file
structure. Figure 10 shows this study-centric MIDG data model where users such as
preclinical investigators and collaborating researchers are granted access to studies.
Figure 10: Small Animal Imaging Data Model for Organizing Data Context in the MIDG
Under each preclinical imaging study, data is organized into sessions, groups, scans, and
files. A session is performed on a single day, typically by a single imaging technician on a
single preclinical modality. During a session, animal subjects are grouped into control
and experimental variable groups differing by treatment and imaging parameters.
Finally, individual animals are scanned, generating a number of imaging files per scan.
The file types generated in a scan depend on the modality and imaging facility software,
but all can be categorized as acquisition, post-processing, or distributed. This MIDG
data model was created based on the data and imaging workflow at the USC Molecular
Imaging Center.
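To make this hierarchy concrete, the following sketch models the study-centric organization as plain Java classes. The class and field names are illustrative assumptions, not the actual MIDG implementation, which persists the hierarchy in the MIDG Metadata Database described in the next section.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the study-centric MIDG hierarchy (not the actual MIDG classes).
public class StudyModelSketch {

    enum FileCategory { ACQUISITION, POST_PROCESSING, DISTRIBUTED }

    static class ImagingFile {          // one data file produced by a scan
        String fileName;
        FileCategory category;
        ImagingFile(String fileName, FileCategory category) {
            this.fileName = fileName;
            this.category = category;
        }
    }

    static class Scan {                 // one scan of a single animal subject
        String animalId;
        List<ImagingFile> files = new ArrayList<>();
    }

    static class Group {                // control or experimental variable group
        String treatment;
        List<Scan> scans = new ArrayList<>();
    }

    static class Session {              // one day, one technician, one modality
        String date;
        String modality;
        List<Group> groups = new ArrayList<>();
    }

    static class Study {                // top-level object users are granted access to
        String studyId;
        String principalInvestigator;
        List<String> authorizedUsers = new ArrayList<>();
        List<Session> sessions = new ArrayList<>();
    }

    public static void main(String[] args) {
        Study study = new Study();
        study.studyId = "STUDY-001";
        Session session = new Session();
        session.modality = "microPET";
        study.sessions.add(session);
        System.out.println("Sessions in study: " + study.sessions.size());
    }
}
```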
4.2.1 MIDG Metadata Database Schema
The MIDG data model was implemented in the MIDG system design as the MIDG
Metadata Database, shown in Figure 11. This MIDG Metadata Database is installed on
the central grid management server and is accessed by all MIDG GUI web-servers at
participating MIDG sites. It contains global user account information and all preclinical
molecular imaging metadata, and is essential for monitoring, management, and searching
of preclinical molecular imaging studies in the MIDG system. When studies are
registered into the MIDG GUI at a particular site, the study’s metadata is written to this
central MIDG Metadata Database. Like the Grid Manager server, the centralized MIDG
Database should be hosted at a secure site with reliable hardware resources to guarantee
continuous availability in a MIDG implementation.
Figure 11: MIDG Metadata Database Schema. Tables with bold outline form the main
structure of the data model and are related using unique IDs and foreign keys. More
common molecular imaging study parameters such as modality and animal type are given
separate tables to enable molecular imaging facilities to pre-define these fields in the MIDG
GUI.
The database reflects the MIDG data model hierarchy with a separate table for study,
session, group, scan, and file metadata. There are also miscellaneous tables surrounding
these core tables so that each MIDG instance can be customized for different types of
preclinical studies. In the top left corner of Figure 11, user accounts are given access to
studies through the MIDG GUI by originating investigators or molecular imaging facility
staff.
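As a minimal illustration of how study registration might write to this central database, the JDBC fragment below inserts a study record and a user-access grant. The table and column names (study, access, study_id, user_id) and the connection parameters are assumptions for illustration; the authoritative schema is the one shown in Figure 11.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Hypothetical registration of a new study and a user-access grant in the
// central MIDG Metadata Database. Table and column names are illustrative only.
public class RegisterStudySketch {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:sqlserver://grid-manager:1433;databaseName=midg"; // assumed host and database
        try (Connection conn = DriverManager.getConnection(url, "midg_user", "secret")) {
            // Insert the study record entered through the MIDG GUI.
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO study (study_id, title, investigator) VALUES (?, ?, ?)")) {
                ps.setString(1, "STUDY-001");
                ps.setString(2, "MicroPET tumor uptake study");
                ps.setString(3, "johndoe");
                ps.executeUpdate();
            }
            // Grant a collaborating user access to the study (study-centric sharing).
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO access (user_id, study_id) VALUES (?, ?)")) {
                ps.setString(1, "collaborator01");
                ps.setString(2, "STUDY-001");
                ps.executeUpdate();
            }
        }
    }
}
```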
4.2.2 DICOM Compliance
Standardizing image files in the MIDG system into DICOM is needed for interoperability
within the MIDG and is beneficial for interoperability with external image analysis
software. Although the current DICOM standard does not specify metadata tags for
preclinical molecular imaging studies, the basic structure of radiology metadata is similar
to the metadata used in preclinical molecular imaging research. Furthermore, many medical
imaging viewers and image processing tools for molecular imaging data are currently
compliant with the DICOM messaging and data format standard. The MIDG can save
investigators time by automatically converting molecular imaging files to DICOM with
comprehensive study metadata already inserted into the DICOM header. This level of
organization and data provenance metadata is critical for sharing imaging research files
among multi-site collaborations.
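The sketch below illustrates one plausible mapping of MIDG study metadata onto standard DICOM header attributes during conversion. It is not the MIDG's actual conversion code; a real implementation would hand these values to a DICOM toolkit when writing the file together with the pixel data.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative mapping of preclinical study metadata onto DICOM header attributes.
// Attribute choices and values are assumptions; a real converter would pass these
// to a DICOM toolkit when writing the output file.
public class DicomMappingSketch {
    public static void main(String[] args) {
        Map<String, String> dicomHeader = new LinkedHashMap<>();
        dicomHeader.put("PatientName", "Mouse^Subject-042");             // animal subject identifier
        dicomHeader.put("PatientID", "STUDY-001_GRP-A_ANIMAL-042");
        dicomHeader.put("StudyDescription", "MicroPET tumor uptake study");
        dicomHeader.put("StudyDate", "20100315");                        // session date
        dicomHeader.put("SeriesDescription", "Group A / baseline scan");
        dicomHeader.put("Modality", "PT");                               // microPET mapped to PET
        dicomHeader.put("ReferringPhysicianName", "johndoe");            // principal investigator (assumed mapping)
        dicomHeader.forEach((tag, value) -> System.out.println(tag + " = " + value));
    }
}
```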
4.3 System Components
There are four major components making up the MIDG system design: the MIDG GUI
web-servers, grid node servers, data storage devices, and a centralized grid management
server. Investigator users at a MIDG site can upload, monitor, download, and manage
their preclinical imaging study datasets by connecting to their local GUI web-server via
web browser on their user workstation. The MIDG web-server then interacts with its
local grid node server to exchange metadata with the central grid management server, and
exchange imaging data files with a local storage device or a remote grid node server. The
combination of grid nodes, storage devices, and grid management server creates an
automated data grid infrastructure that can localize and transmit MIDG data files securely,
reliably, and quickly across the WAN without user intervention. Figure 12 draws
a basic connectivity diagram of these components in the MIDG.
Figure 12: Basic Connectivity of MIDG Components. Investigators at each MIDG site
interact with the web-based GUI hosted on the GUI web-server at their site. Each site also
has a Grid Node Server, and possibly an attached larger-capacity storage device, that
exchanges DICOM images with the GUI web-server. The Grid Node Servers interact with
the central Grid Management Server and each other over the Internet for data
management and distribution with the MIDG.
Each of the three servers in the MIDG design has a unique set of services and its own database,
and will be described in the following component overview sections.
4.3.1 Graphical User Interface (GUI) Web-Server
The MIDG graphical user interface features upload, monitor, download, and management
functionality for preclinical molecular imaging datasets. Written in PHP and Java, the
web-based interface is hosted on an Apache2 web-server and has process queuing to
allow simultaneous users at each site to access the MIDG. The MIDG user interface
depends on the MIDG Metadata Database designed from the data model for keeping
track of user account information and metadata of uploaded studies. Only one instance of
the MIDG Metadata Database is needed per implementation of the MIDG system, and it
can be hosted on the same server as the grid management server. However, for
redundancy and performance, the database can also be hosted on its own dedicated
server. Together with the MIDG Metadata Database, the web-based MIDG GUI provides
users with a lightweight portal into the MIDG and preclinical data resources. Features
include user log-in, data uploading, data search and downloading, and study monitoring
and management. Details about these features and the GUI workflow will be presented in
Chapter 7.
4.3.2 Grid Node Server
The grid node server is the entry point into the distributed MIDG data grid infrastructure.
Like the MIDG GUI web-server, a grid node server is required at every participating data
grid site. Its front-end services are DICOM compliant and communicate with the MIDG
web-server over DICOM messaging and file delivery. Its back-end services are
responsible for registering incoming DICOM files with the grid management server, and
retrieving DICOM files from the data grid infrastructure. During a data retrieval process,
the grid node server queries the central grid management server to locate particular
imaging study datasets, and retrieves these files by initiating data delivery from the remote
grid node server if it does not already have a copy of the data on local storage. Retrieved
data files are then pushed to the requesting MIDG GUI web-server over DICOM.
The grid node server maintains its own database for local configurations, process
queuing, and event logging that is unique to each site. The database is implemented on
every grid node server and consists of 14 tables. Configuration tables include a list of
DICOM clients that interact with this grid node and a list of supported DICOM SOP
classes. It is important to note that DICOM header information is not kept at the grid
node database due to the need to share this information with all participating data grid
sites. Therefore, metadata from incoming DICOM files and data retrieval
requests are directed to the grid management server and its database.
4.3.3 Grid Management Server
The grid management server is responsible for hosting data routing and management
services, as well as maintaining a master registry of all files archived in the MIDG. Its
routing services respond to grid node servers' data queries with the location from which a specific
imaging dataset can be retrieved. The grid management server’s monitoring services
handle failover when data retrieval attempts fail due to unavailability of a grid node
server or storage device. Further discussion about the design of the grid management
server will be described in Chapters 5 and 6.
The grid management server’s database is similar to the MIDG Metadata Database
because it only resides at a single centralized location in order to serve all grid node
servers. It differs from the MIDG Metadata Database because it does not include the
high-level preclinical molecular imaging study metadata such as study context and user
access configurations. The grid management server’s database is only focused on routing
and localization of the physical DICOM files that are archived in the MIDG. Made up of
23 tables, the database maintains a registry of all grid node servers, DICOM header
information for search purposes, mapping between DICOM files and grid node servers,
processing queues, and global event auditing logs. In order to provide continuous
availability for all grid node servers, the grid management database should also be
implemented in a robust server environment.
4.3.4 Storage Devices
Although dedicated storage devices are not needed because data can be kept on grid node
servers, they are highly recommended because of their internal RAID redundancy
features and for removing grid node servers as single points of failure. There can be
multiple storage devices within every multi-site MIDG implementation, but each storage
device should independently have sufficient storage capacity and high levels of built-in
reliability. Every storage device in the MIDG implementation should have a projected
storage capacity for a few years and depending on a facility’s back-up requirements.
Dedicated storage devices are mounted directly to grid node servers with local read and
write-able permissions. The types of supported storage range from hard-drives in the grid
node server to attached network storage devices, such as a partition-able network-
attached-storage (NAS) or a storage-area-network (SAN).
4.4 Workflow
The investigator’s workflow for conducting a preclinical molecular imaging study is
shown in Figure 13. The numbered steps 1 through 10 are the traditional workflow of
molecular imaging investigators. The grayed components in Figure 13 identify the steps
that have changed due to interaction with the MIDG system.
Figure 13: Revised Small Animal Imaging Facility Workflow with Altered Steps Identified
in Gray. See Figure 1 for Comparison.
In step 1, investigators and imaging facility staff work together to plan and schedule an
animal-model molecular imaging study. Upon completion of planning, the imaging staff
registers the study into the MIDG’s web-based GUI by entering the planned study,
sessions, groups, and scans in step 2. In step 7, the generated data files from the study are
uploaded and archived into the MIDG through the same web-based GUI. After the
imaging files are archived into the MIDG, remote investigators with access to the study
can download them using the MIDG GUI web-server and grid node server setup at their
own site. Note that only users who have been given shared access to a particular study in
step 2 can obtain search results from the MIDG GUI. This general workflow will be
presented and discussed in more detail in later chapters.
Chapter 5. INITIAL MIDG DESIGN USING THE GLOBUS
TOOLKIT
The Molecular Imaging Data Grid research began with an initial development of a
DICOM-compliant data grid infrastructure using The Globus Toolkit. The experience,
design architecture, and workflows from this initial design were largely carried over
when developing the second and current implemented MIDG system, to be discussed
later in Chapter 6. This chapter presents the 4-layer software architecture of the MIDG
system, detailed data archiving, management, and retrieval workflows, and finally the
design challenges in development of the MIDG system using The Globus Toolkit. The lessons
learned from these design challenges form the foundation of the current MIDG design and implementation.
5.1 MIDG System Architecture Using The Globus Toolkit 4.0.2
The Globus Toolkit is an open-source software package that enables developers to utilize
grid services as an application programming interface (API) library to implement file
management and data sharing across the WAN [14,15]. In the MIDG system, we used many
of the services provided in the Globus Toolkit package version 4.0.2 to build a 4-layer
system architecture that integrates application services, gateway services, core grid
middleware, and hardware resources. The Globus Toolkit’s Replica Location Service
(RLS) is used to maintain and manage file locations of files stored within the data grid.
The Globus Toolkit’s Reliable File Transfer (RFT) service and GridFTP protocols are
used to transfer files reliably across the WAN. And the Globus Toolkit’s Simple CA
credential management package is used to authenticate communications and file transfers
between grid node servers and remote storage devices.
Figure 14: Molecular Imaging Grid Architecture, Built with Globus Toolkit’s Grid
Services, is Tailored for Molecular Imaging Data Management
Figure 14 shows these four layers of the Molecular Imaging Data Grid system
architecture. Figure 15 is provided to demonstrate similarities and differences with a data
grid system that was developed before my research in molecular imaging at the IPILab
for clinical radiology applications. Note that the grey components in both Figures 14 and
15 represent packages from the Globus Toolkit. Because preclinical molecular imaging
systems do not adhere to a standardized DICOM data and messaging protocol, the
application layer of the MIDG architecture required a graphical user interface that takes
in study metadata through manual user input, and then converts uploaded preclinical
imaging data to DICOM before passing files to the gateway layer. New features were
also added to the gateway layer and grid-middleware layer to improve internal data
persistence during failure of resource layer components, and monitoring of resources and
events within the MIDG.
Figure 15: Data Grid System Architecture for Clinical Radiology Applications
The following sections will discuss the individual layers of the Molecular Imaging Data
Grid system architecture shown in Figure 14.
5.1.1 Application Layer
The Application Layer in Figure 14 comprises the user interfaces that handle data exchange
between user workstations and the Gateway Layer. Implemented as web-based interfaces
instead of DICOM protocols, the file upload and download at the application level is
done through HTTP web pages and shared web-server directories. Authorized users
interact with the Application Layer software to upload data files, submit data requests,
and retrieve parameter-based queries and files. Additionally, the Application Layer has
grid management and user authentication interfaces for molecular imaging facilities to
monitor and control user access over shared molecular imaging study data. Unlike the
data grid architecture designed for radiology, where users interact with the data grid’s
DICOM services from external DICOM-compatible applications, the MIDG needs to
maintain study-related metadata and user-level access control. Further description of this
web-based graphical user interface is in Chapter 7.
5.1.2 Gateway Layer
In Figure 14, the Gateway Layer consists of software installed on Grid-Access-Point
(GAP) servers that are needed to handle incoming files and requests from the MIDG user
interface at each participating data grid site. GAP servers are installed on the local area
network (LAN) of each site and should be protected behind network firewalls that only
allow particular ports to pass through to the public WAN. When GAP servers receive
new molecular imaging study files, the Metadata Catalog Service extracts relevant
metadata from the header of incoming DICOM files and updates the metadata database.
Then, the files are physically distributed to one or more of the external grid storage
archives over the WAN using secure file delivery and communications protocols. When
GAP servers receive a data retrieval request from the MIDG user interface, the Data
Retrieval Service contacts the Core Middleware Layer to locate and initiate GridFTP file
delivery of the requested study files. Upon receipt of the files at the GAP server, the files
are prepared and sent to the MIDG user interface via DICOM. Details of these two
workflows are described later in this chapter.
5.1.2.1 Data Persistence Manager
Because data files are archived at one or more storage devices within the Molecular
Imaging Data Grid, maintaining this redundancy during events of storage device failure is
handled by the Data Persistence Managers at each active storage site. Data persistence is
maintained by the Data Persistence Manager service through replication of data files
between two remote storage archives using GridFTP protocol provided in the Globus
Toolkit. The Data Persistence Manager is notified by the Metadata Catalog Service on
a GAP server when a file fails to be sent to a storage device in the data grid due to failed
connectivity, disk hardware failure, or certificate authorization errors. The Data Persistence Manager
monitors the availability of that failed remote storage device daily, and tries to re-
populate it during off-peak hours by replicating the missing files from another storage device. Upon
successful replication, the RLS is updated to reflect the new copy of that archived file.
This method creates an automated disaster recovery mechanism for all data files archived
in the MIDG.
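A minimal sketch of this daily retry behavior is shown below, assuming hypothetical isReachable() and replicate() helpers that stand in for the monitoring check and the GridFTP replication call; it is not the Data Persistence Manager's actual code.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of the Data Persistence Manager's daily re-population check.
// isReachable() and replicate() are placeholders for monitoring and GridFTP calls.
public class PersistenceRetrySketch {

    static boolean isReachable(String storageHost) {
        // In the real system this would test connectivity and certificate validity.
        return false; // placeholder
    }

    static void replicate(String sourceHost, String failedHost, String fileName) {
        // In the real system this would invoke a GridFTP transfer between archives.
        System.out.println("Replicating " + fileName + " from " + sourceHost + " to " + failedHost);
    }

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Check the failed storage device once a day, during off-peak hours.
        scheduler.scheduleAtFixedRate(() -> {
            String failedHost = "storage-archive-2";   // device reported by the Metadata Catalog Service
            if (isReachable(failedHost)) {
                replicate("storage-archive-1", failedHost, "STUDY-001.dcm");
                // On success, the RLS mapping would be updated to record the new copy.
            }
        }, 0, 24, TimeUnit.HOURS);
    }
}
```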
5.1.3 Grid-Middleware Layer
The Core Middleware Layer is the third layer down in Figure 14 and is necessary for
managing and coordinating grid resources across the multi-site infrastructure. These
middleware services should be implemented on a centralized server that is accessible
over the WAN in order to provide critical communications between GAP servers and
management of the data grid resources.
5.1.3.1 Replica Location Service
The Replica Location Service (RLS) is a web-service provided in the Globus Toolkit, and
is used for indexing and locating files among multiple remote storage devices. The RLS
maintains its own 2-tier database structure that maps filenames, also known as logical file
name (LFN), to the actual URL paths that point to the destination of actual files, also
known as physical file names (PFN), within the Molecular Imaging Data Grid [14]. Figure
16 shows how a logical file name ‘XYZ’ can be mapped to three copies of that file at
three different sites, having physical file names of ‘XYZreplica1’, ‘XYZreplica2’, and
‘XYZreplica3’.
Figure 16: RLS Mapping of Logical File Name (LFN) to Physical File Name (PFN).
Courtesy of The Globus Toolkit [14]
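The fragment below illustrates the LFN-to-PFN concept with a plain in-memory map; it is a conceptual stand-in for the RLS catalogs rather than a call into the Globus RLS API, and the replica URLs are placeholders.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual illustration of the RLS mapping: one logical file name (LFN)
// resolves to the physical file names (PFNs) of its replicas at different sites.
public class LfnPfnSketch {
    public static void main(String[] args) {
        Map<String, List<String>> replicaCatalog = new HashMap<>();
        replicaCatalog.put("XYZ", Arrays.asList(
                "gsiftp://site1.example.edu:2811/archive/XYZreplica1",
                "gsiftp://site2.example.edu:2811/archive/XYZreplica2",
                "gsiftp://site3.example.edu:2811/archive/XYZreplica3"));
        // A retrieval request looks up the LFN and tries the replicas in order.
        replicaCatalog.get("XYZ").forEach(pfn -> System.out.println("Replica: " + pfn));
    }
}
```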
5.1.3.2 File Delivery Service
To send files securely and efficiently between a Grid-Access-Point server and a remote
storage archive, the Grid-Access-Point services utilize a file delivery protocol called
GridFTP in the Globus Toolkit. GridFTP is paired with authentication methods, also
provided by the Globus Toolkit, to guarantee the security of data transmission across
the WAN using data encryption technology. Therefore, every component in the Molecular
Imaging Data Grid needs to have the GridFTP package and a valid digital certificate in order
to interact with other resources in the data grid. Every Grid-Access-Point server, RLS
server, and storage device needs to maintain a list of certificate subject names identifying
the remote grid resources with which it interacts. These subject names are used for
authentication prior to every messaging or file delivery exchange. Each Molecular
Imaging Data Grid instance also needs a certificate authority server where new
certificates can be signed and distributed to authorized Grid-Access-Point servers, disk
storage archives, and Replica Location Service servers.
5.1.3.3 Resources & Events Monitoring
Monitoring of grid hardware resources and auditing of data handling events are essential
to maintaining the integrity of data, services, and users in the molecular imaging grid. A
dedicated monitoring and auditing server should have its own database to record the
status of all hardware resources, such as storage archives, database servers, and
computational servers, and also all major events from the users and middleware services
so as to leave an audit trail.
5.1.4 Resources Layer
The components making up the resources layer in the Figure 14 architecture diagram are the
storage devices, databases, applications, and network devices. The servers that host these
components can be distributed across different site locations, but should be hosted at a
reliable and maintainable site. Because these services are shared, data fault-tolerance
within the storage archives and redundancy of the database servers are very important. There
are three main database resources in the imaging grid - the molecular imaging metadata
database, the Replica Location Service databases, and the real-time monitoring and
auditing database. Furthermore, the firewall of each participating data grid site should
only enable particular grid ports to pass through, including the Replica Location Service
port 39281, GridFTP port 2811, and the SQL database port 1433.
5.2 System Connectivity and Workflow
Figure 17 below is the connectivity diagram of the MIDG from the perspective of a
molecular imaging site. The light blue area shows the data grid components installed at a
single molecular imaging site where the Grid-Access-Point server interacts with the user
interface over HTTP web protocols, databases over structured query language (SQL), and
storage archives over GridFTP. The grey cloud represents the Molecular Imaging Data
Grid (MIDG) that is exposed to each molecular imaging site as a resource on the wide-
area-network (WAN), also known as the Internet. These include the expandable disk
storage archives and attached Data Persistence Server, the MIDG metadata database
server, the master Replica Location database server, the grid resources and events
monitoring server, and the certificate authority server.
Figure 17: Components and Connectivity of the Molecular Imaging Grid Architecture,
from a Molecular Imaging Site’s Perspective (Top). Bottom: Certificate Authority Server -
third-party verification of digital certificates; Data Persistence Server - maintain long-term
storage by study data migration; Grid Resources & Event Monitoring server - real-time
monitor of databases, storage archives, and data movement events.
Next are the three dataflow scenarios that are most significant to users of the Molecular
Imaging Data Grid – archiving, data persistence management, and data retrieval. Based
on these dataflow diagrams, the performance of a multi-site implementation of the
Molecular Imaging Data Grid system at USC will be presented. Details of the datasets
and setup for evaluation will be presented in Chapter 8.
5.2.1 Archiving of Molecular Imaging Data
To archive completed small animal imaging datasets, new studies are registered and
uploaded into the Molecular Imaging Data Grid
through the web-based user interface on a small animal imaging workstation. The step-
by-step dataflow is listed directly following Figure 18.
Figure 18: Archiving Molecular Imaging Files and Study Metadata - Dataflow Diagram
Dataflow
1. New molecular imaging studies are registered into the grid and imaging files are
uploaded by users to the local Grid-Access-Point through the user interface,
implemented here as a web-based client.
2. Study metadata and cached file locations are sent in an XML summary file to the
Metadata Catalog Service (MCS) also on the Grid-Access-Point server.
3. Metadata Catalog Service updates the master metadata database with the molecular
imaging study information. A local metadata database may also be updated as a back-
up in case of WAN network failure or master database failure.
4. Metadata Catalog Service calls the Replica Location Service (RLS) API to index the
replicated destinations for each file.
5. RLS service authenticates itself with the third-party Certificate Authority server using
the host certificate of the Grid-Access-Point server.
6. RLS service registers the logical file name in the RLI database, and the full
destination file path in the LRC database.
7. Metadata Catalog Services calls the Reliable File Transfer (RFT) service API to
replicate the files in the Grid-Access-Point cache to the (pre-configured) remote
storage archives. Files are kept in cache until transfers are complete.
8. RFT service authenticates itself with the CA server using the user certificates on the
Grid-Access-Point server.
9. RFT service authenticates the first destination server with the CA server using the
remote archive server’s user certificate.
10. RFT service sends a copy of the file to one remote storage archive via GridFTP.
11. RFT service authenticates the second destination server with the CA server using the
remote archive server’s user certificate.
12. RFT service sends a copy of the file to the second storage archive via GridFTP.
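To give a flavor of steps 10 and 12, the sketch below pushes a cached file to two storage archives by invoking the globus-url-copy command-line client through a ProcessBuilder. Host names and paths are placeholders, and the actual workflow drives these transfers through the Reliable File Transfer service rather than the command-line client.

```java
import java.io.IOException;

// Sketch of replicating a cached DICOM file to two remote storage archives,
// assuming the Globus globus-url-copy client is installed and a valid proxy
// certificate is in place. Hosts and paths are placeholders.
public class ArchiveReplicationSketch {

    static void copy(String source, String destination) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("globus-url-copy", source, destination)
                .inheritIO()
                .start();
        if (p.waitFor() != 0) {
            throw new IOException("Transfer failed: " + destination);
        }
    }

    public static void main(String[] args) throws Exception {
        String cached = "file:///gap/cache/STUDY-001/scan01.dcm";
        // Steps 10 and 12 of the archiving dataflow: one copy to each storage archive.
        copy(cached, "gsiftp://archive1.example.edu:2811/midg/STUDY-001/scan01.dcm");
        copy(cached, "gsiftp://archive2.example.edu:2811/midg/STUDY-001/scan01.dcm");
        // Only after both transfers succeed would the cache copy be eligible for cleanup.
    }
}
```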
5.2.2 Data Persistence Management
The detailed dataflow for data persistence management within the Molecular Imaging
Data Grid is shown in steps 1 thru 10 of Figure 19 below. The scenarios to be addressed
and evaluated include:
A. Maintaining data redundancy of all imaging studies within the data grid archives
B. Monitoring and preventing archive devices from running out of disk space
C. Migrating historic imaging studies to secondary storage archives
Figure 19: Data Persistence Management - Dataflow Diagram. The diagram depicts the Data
Persistence Manager (Monitoring Service, back-up policies, Knowledge-based Migration
Middleware, and Data Replication Service) at a molecular imaging site, a local 5 TB RAID 5
SAN as primary storage, a remote 20 TB NAS as secondary storage, and the certificate
authority, redundant metadata database, and redundant RLS database servers on the WAN.
Dataflow
1. Data Persistence Manager’s Monitoring Service loads the XML configuration file
containing the back-up policies for the local primary storage.
2. Monitoring Service periodically scans the SAN file-system for files that violate the
back-up policies.
3. When a file violates a back-up policy, an alert with the file name and path is sent to
the knowledge-based migration middleware.
4. Knowledge-based Migration Middleware queries the metadata database for the
study’s status to confirm that the file’s study has been marked as completed.
5. If the study has been completed, the Data Replication Service is told to migrate the
file from the (local) primary SAN to a pre-configured secondary storage device.
6. The DRS authenticates itself with the CA server.
7. The DRS modifies the two RLS database types – the RLI database and LRC database
– to reflect the migration of files to another storage device.
8. The DRS authenticates both source and destination host certificates against the CA
server.
9. The DRS initiates GridFTP file transfer from the (local) primary storage SAN to the
(remote) secondary storage NAS.
10. Knowledge-based Migration Middleware sends delete file and delete RLS mapping
commands to the DRS, to delete the physical file on the local SAN and the RLS
database mappings for that file.
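The sketch below shows how the policy scan in step 2 might be expressed: it walks the mounted SAN file-system and flags files older than a configurable age threshold. The mount point and threshold are illustrative; the actual Monitoring Service reads its back-up policies from the XML configuration file in step 1.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.Instant;
import java.util.stream.Stream;

// Sketch of a back-up policy scan over the mounted primary SAN file-system.
// Files older than the configured threshold would trigger an alert (step 3).
public class PolicyScanSketch {
    public static void main(String[] args) throws IOException {
        Path sanRoot = Paths.get("/mnt/primary-san/midg");   // assumed mount point
        Duration ageThreshold = Duration.ofDays(2 * 365);    // e.g. migrate studies older than 2 years

        try (Stream<Path> files = Files.walk(sanRoot)) {
            files.filter(Files::isRegularFile).forEach(file -> {
                try {
                    Instant modified = Files.getLastModifiedTime(file).toInstant();
                    if (Duration.between(modified, Instant.now()).compareTo(ageThreshold) > 0) {
                        // In the real system, an alert with the file name and path is sent
                        // to the Knowledge-based Migration Middleware.
                        System.out.println("Policy violation (age): " + file);
                    }
                } catch (IOException e) {
                    System.err.println("Could not read timestamp for " + file);
                }
            });
        }
    }
}
```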
5.2.3 Data Retrieval Across Multiple Research Sites
Investigators are able to search and retrieve molecular imaging datasets from the
Molecular Imaging Data Grid using the system’s User Interface shown in Figure 20.
Evaluation will be based on:
A. User-level authorization and filtered access to molecular imaging studies
B. Performance in retrieving datasets originating from a remote site
C. Performance in retrieving antiquated datasets archived in long-term storage
Figure 20: Search and Data Retrieval of Molecular Imaging Studies - Dataflow Diagram
Dataflow
1. User, upon log-in, queries for a study’s dataset by searching with study-specific
metadata parameters.
2. User interface web-server queries the local metadata database, if available, and
the master metadata database for molecular imaging studies that are authorized for access
to the user and that meet the user-specified study parameters.
3. Upon user selection of a desired imaging file to retrieve from the grid archives, the
web-server executes a script to run the Retrieve Service, also on the Grid-Access-
Point server, passing it the study ID and file name.
4. Retrieve Service queries the RLS service for all replica locations of the selected file.
5. RLS service authenticates itself with the CA server using the host certificate.
6. File replica locations are sent back to the Retrieve Service.
7. Retrieve Service calls on the Globus Reliable File Transfer Service to retrieve the file
from the first remote storage archive.
8. RFT Service authenticates itself with the CA server using the user certificate.
9. RFT Service authenticates the first file source server’s user certificate with the CA
server. If this fails, then skip to step 11.
10. RFT Service initiates file delivery from the source storage archive back to the Grid-
Access-Point server cache. If this fails, then skip to step 11.
11. RFT Service authenticates the second server’s user certificate with the CA server.
12. RFT Service initiates file delivery from the source storage archive back to the Grid-
Access-Point server cache.
13. Upon completion of GridFTP file transfer, the user can download the retrieved file
from the Grid-Access-Point cache. Note: the study’s temporary cache directory is
emptied at the end of each day.
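Steps 9 through 12 amount to a simple failover across replica locations. The sketch below captures that logic with a hypothetical fetch() helper standing in for the RFT/GridFTP retrieval, trying each replica in turn until one succeeds.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of replica failover during retrieval: try the first source archive,
// and fall back to the next replica if authentication or transfer fails.
// fetch() stands in for the actual RFT/GridFTP retrieval call.
public class RetrievalFailoverSketch {

    static boolean fetch(String replicaUrl, String cacheDir) {
        System.out.println("Attempting retrieval from " + replicaUrl);
        return false; // placeholder: a real call returns true on a successful transfer
    }

    public static void main(String[] args) {
        List<String> replicas = Arrays.asList(
                "gsiftp://archive1.example.edu:2811/midg/STUDY-001/scan01.dcm",
                "gsiftp://archive2.example.edu:2811/midg/STUDY-001/scan01.dcm");
        String cacheDir = "/gap/cache/STUDY-001";

        boolean retrieved = false;
        for (String replica : replicas) {
            if (fetch(replica, cacheDir)) {   // steps 9-10, then 11-12 on failure
                retrieved = true;
                break;
            }
        }
        System.out.println(retrieved ? "File cached for user download"
                                     : "All replicas failed; error reported to monitoring");
    }
}
```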
5.3 Design Limitations and Bottlenecks
A MIDG system using The Globus Toolkit was designed and implemented at the USC
Image Processing and Informatics Lab in 2009. The system was able to utilize The
Globus Toolkit to deliver and retrieve molecular imaging studies between simulated sites
within the laboratory environment [16]. However, challenges arose during more thorough
evaluations regarding performance bottlenecks, lack of robust failover techniques, and
complicated security requirements involved for file delivery. Because computationally
intensive DICOM middleware had to be inserted between the MIDG Application Layer
and the Globus Toolkit infrastructure, a proverbial moat was created around the file
delivery and management infrastructure of The Globus Toolkit components. Performance
bottlenecks were caused by unknown processes within the data grid infrastructure using
the Globus Toolkit. Failures within the data grid infrastructure were poorly
communicated to MIDG management services, making robust failover policies difficult
to implement. These internal failures were frequently caused by invalid digital certificates
given to new GAP servers with Globus Toolkit components. Furthermore, the limited
availability of API documentation thwarted continued development of needed
functionalities such as data life-cycle management, MIDG system monitoring, and rules-
based load-balancing. These lessons learned initiated a new approach in early 2009 to
design the MIDG with a data grid infrastructure based on IHE XDS-i. This new approach
forms the foundation of my dissertation accomplishment and is presented in Chapter 6.
Chapter 6. MIDG DESIGN BASED ON IHE XDS-i INTEGRATION
PROFILE AND OPEN GRID SERVICES ARCHITECTURE
The initial MIDG design using The Globus Toolkit software packages revealed design
limitations and performance bottlenecks, so a different data grid infrastructure that has
been developed for medical imaging data [17] was utilized in the MIDG to achieve better
quality of service and improved workflow methods. As with the Globus
Toolkit, this data grid designed for handling medical images follows the Open Grid
Services Architecture framework [18], but it also follows the workflow standards of the
IHE XDS-i integration profile [10] set forth for enterprise radiology. The proverbial black
box of file management and delivery in the Globus Toolkit [14] was replaced with these
dedicated web-services for optimized management and delivery of multi-modality
DICOM imaging datasets. The 4-layer MIDG system architecture has been revised and is
presented in this chapter. Note that the MIDG GUI component remains unchanged
because DICOM is still required between the MIDG Web-Server and the data grid node
server at each site.
6.1 Dataflow based on IHE XDS-i Integration Profile
Integrating the Healthcare Enterprise (IHE) defined a system and workflow integration
profile called Cross-Enterprise Document Sharing for Imaging (XDS-i) to specify a
standard communications and data exchange methodology between vendor-agnostic
systems that provide, archive, and consume medical images in an enterprise healthcare
environment. As with its other integration profiles in the Radiology Framework, the IHE
XDS-i integration profile is an industry supported workflow model to promote
interoperability and streamlined dataflow between healthcare systems and sites. With this
new MIDG data grid infrastructure based on XDS-i, messaging and data file transmission
involved in registering, querying, and delivering preclinical molecular imaging datasets
are achieved in fewer steps and with fewer components. Although the MIDG is a pre-
clinical application, it benefits from the IHE experience manifested in the integration
profiles such as XDS-i for clinical environments.
6.1.1 Uploading and Downloading Workflow
Figure 21 is taken from the IHE XDS-i Integration Profile and demonstrates the MIDG
workflow for uploading and downloading medical imaging data between a data providing
site, central archiving registry, and a data consuming site. The black boxes represent
actors involved, and the arrows mark the interactions between these components. The red
boxes in Figure 21 show how these actors are represented by the hardware components of
the MIDG data grid. Steps 1 and 2 are the uploading workflow, and steps 3, 4, and 5 are
the downloading workflow. Note that the Patient Identity Source component in the XDS-i
profile is not covered in the MIDG because investigator identifiers are expected to be
consistent across all participating MIDG sites, and do not need to be normalized to a
global patient identifier like in enterprise radiology. Also note that MIDG Web Servers
interact with the Grid Node Servers, but are not depicted in Figure 21 because they are
high-level applications that are not within the scope of the IHE XDS-i integration profile.
Figure 21: MIDG Implementation of the IHE XDS-i Integration Profile
Preclinical molecular imaging datasets are uploaded to Grid Node Server ‘A’ as a
DICOM imaging study and are registered into the Document Repository on the Grid
Manager Server as an XML document. In step 1 of Figure 21, Grid Node Server ‘A’
sends a SOAP message containing the in-coming DICOM study’s textual metadata as an
XML document to the Grid Manager Server’s Data Registration Service, represented by
the document repository. The received XML document is then parsed and entered into the
XDS Registry database that also resides on the Grid Manager Server.
When DICOM query and retrieve requests for a particular study dataset are sent to Grid
Node Server ‘B’ from the MIDG Web-Server, presumably located at a MIDG site
different from Grid Node Server ‘A’, the DICOM commands are handled by the Grid
Node Server ‘B’ Document Consumer web-service. First in step 3, the DICOM
Query/Retrieve Service on Grid Node Server ‘B’, also known in the XDS-i profile as the
Document Consumer, queries the Document Registry database on the Grid Manager
Server for instances of files for that molecular imaging study. Then the DICOM
Query/Retrieve Service fetches the uploaded XML document from the Document
Repository, identifying the archived location of those files. In step 5, the Grid Node
Server ‘B’ Imaging Document Consumer initiates a GridFTP transfer on Grid Node
Server ‘A’ via SOAP message to deliver the requested files back to itself.
6.1.2 Rules-based Data Management Workflow
As in the first MIDG design, internal data management workflow among Grid Node
Servers is separate from the uploading and downloading workflows just mentioned.
However, new rules-based data routing capabilities on the central Grid Manager Server
create a new method and workflow for maintaining continuous data availability within
the MIDG. Instead of deploying Data Persistence Manager services on every Grid-
Access-Point server, the new workflow centralizes routing and monitoring at the Grid
Manager Server and is shown in Figure 22.
Figure 22: Rules-based Routing / Back-up Workflow
The Grid Manager Server has an Intelligent Routing Service that is configured for multi-
site back-up purposes and long-term data storage migration. Figure 22 demonstrates the
workflow for automatically replicating incoming molecular imaging data at Grid Node
Server ‘A’ to Grid Node Server ‘B’ and/or ‘C’ for back-up data redundancy; this configuration
will be referred to as rule ‘X’ in the example. The following steps describe the workflow.
1. Intelligent Routing Service triggered by a rule ‘X’ initiates replication of
molecular imaging files from Grid Node Server ‘A’ to Grid Node Server ‘B’.
2. Grid Node Server ‘A’ sends a copy of the image file(s) to Grid Node Server ‘B’
over GridFTP.
3. DICOM Storage Service on Grid Node Server ‘B’ acknowledges and registers
this new file by sending a SOAP message to Grid Manager Server’s Data
Registration Service.
4. Rule ‘X’ of Intelligent Routing Service may also be configured to require another
copy of the same image file(s) to be sent to Grid Node Server ‘C’ as well.
5. The same process as step 3 is repeated for Grid Node Server ‘C’ upon receiving
its copy of the data.
6. (Conditional) If rule ‘X’ of Intelligent Routing Service on the Grid Manager
Server is a migration event for long-term storage, then a delete request is sent to
the Delete Service on Grid Node Server ‘A’ to remove the original file(s) that
were replicated to Grid Node Server ‘B’ and/or ‘C’.
6.2 Services-Oriented Functionality
Utilization of this new data grid infrastructure in the MIDG improves file delivery
performance and reliability because of distinct features that were previously absent in the
first MIDG design. This section describes the key new features of this data grid
infrastructure.
6.2.1 Multi-Threaded GridFTP Transfers
One of the challenges in using the Globus Toolkit was the difficulty of implementing
multiple simultaneous GridFTP file transfers, which are needed to move the many images that
typically make up a single imaging study. For example, transmitting a 4.5 GB MicroCT study
with 1000 images across the Internet using The Globus Toolkit required either
transferring a single huge ZIP file or individually transferring the 1000 images. The
former had the time-consuming task of compressing and decompressing the ZIP file at
each end, while the latter resulted in large overhead processing times needed to negotiate
transmission of each image. Although the Globus Toolkit was stated to be capable of multi-
threaded transfers, it was challenging to implement in the MIDG without further API
documentation. The new data grid infrastructure includes multi-threaded GridFTP
capability and can retrieve up to 5 images per instance, yielding a performance
improvement of up to five times.
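A minimal sketch of the multi-threaded transfer idea is shown below: a fixed pool of five workers, matching the up-to-five-images figure, each executing one transfer job. The transfer() helper is a placeholder for the underlying GridFTP call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch of multi-threaded image transfer: up to five images move in parallel,
// avoiding both a single huge ZIP and 1000 sequential per-image negotiations.
// transfer() is a placeholder for the underlying GridFTP call.
public class ParallelTransferSketch {

    static void transfer(String imageFile) {
        System.out.println(Thread.currentThread().getName() + " transferring " + imageFile);
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> images = new ArrayList<>();
        for (int i = 1; i <= 1000; i++) {
            images.add(String.format("microCT_%04d.dcm", i));  // e.g. a 1000-image MicroCT study
        }

        ExecutorService pool = Executors.newFixedThreadPool(5); // 5 concurrent transfers
        for (String image : images) {
            pool.submit(() -> transfer(image));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}
```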
6.2.2 Rules-based Data Routing
Intelligent management of imaging datasets within the data grid infrastructure is a large
factor in the usability and reliability of the MIDG design. In addition to the upload and
download requests from users, data movement from one site’s Grid Node Server to
another is also needed for data management operations. There are three scenarios that
require internal auto-routing of molecular imaging studies. First, if a participating site is a
small contributor to a MIDG research group or community, it may not
have long-term storage devices to contribute to the MIDG. Therefore the Grid Node
Servers at such sites act as lightweight gateways that rely on the remote MIDG sites for
both short-term and long-term data storage. Secondly, some collaborating sites in a
MIDG may want to back up critical molecular imaging datasets to a second remote
MIDG site for better data fault-tolerance. And lastly, a MIDG site may not have long-
term storage capacity and needs to migrate older molecular imaging study datasets to a
larger long-term storage site. All these scenarios require an intelligent routing service on
the Grid Manager that can automatically detect and replicate particular imaging study
types from one MIDG site’s Grid Node Server to another based on pre-configured rules.
Depending on the routing requirements, there is also a delete function that removes data
files from the replication source after the files have been successfully migrated to the
destination site’s Grid Node Server. Parameters required in these routing rules include
source site, destination site, modality type, data age threshold, and investigator. Table 2
shows three sample routing rules to demonstrate the three management scenarios. Note
that these management services are able to reuse the same web-services on the Grid
Management Server that are involved in the upload and downloading workflows.
Table 2: Sample Rules-Based Routing Configurations for the Routing Service
Routing ID # | Source Site   | Dest. Site   | Modality Type | Age Threshold | Investigator | Delete Source Copy
1            | temp_site     | backup-site  | --            | --            | --           | yes
2            | research-site | imaging-site | US            | --            | johndoe      | no
3            | imaging-site  | backup-site  | --            | 2 years       | --           | yes
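The sketch below expresses a routing rule as a simple Java structure with a matching check. The field names mirror the columns of Table 2 and are illustrative rather than the Intelligent Routing Service's actual data types.

```java
// Illustrative representation of a rules-based routing entry (see Table 2).
// Field names mirror the table columns; null means "any" (shown as -- in the table).
public class RoutingRuleSketch {

    static class RoutingRule {
        String sourceSite;
        String destinationSite;
        String modality;        // null = any modality
        Integer ageYears;       // null = no age threshold
        String investigator;    // null = any investigator
        boolean deleteSourceCopy;

        RoutingRule(String src, String dst, String modality,
                    Integer ageYears, String investigator, boolean delete) {
            this.sourceSite = src; this.destinationSite = dst; this.modality = modality;
            this.ageYears = ageYears; this.investigator = investigator; this.deleteSourceCopy = delete;
        }

        boolean matches(String site, String studyModality, String studyInvestigator, int studyAgeYears) {
            return sourceSite.equals(site)
                    && (modality == null || modality.equals(studyModality))
                    && (investigator == null || investigator.equals(studyInvestigator))
                    && (ageYears == null || studyAgeYears >= ageYears);
        }
    }

    public static void main(String[] args) {
        // Rule 3 of Table 2: migrate studies older than 2 years from the imaging
        // site to the back-up site and delete the source copy afterwards.
        RoutingRule rule3 = new RoutingRule("imaging-site", "backup-site", null, 2, null, true);
        boolean triggers = rule3.matches("imaging-site", "US", "johndoe", 3);
        System.out.println("Rule 3 triggers replication: " + triggers
                + ", delete source afterwards: " + rule3.deleteSourceCopy);
    }
}
```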
6.2.3 Internal Hardware and Software Monitoring
Although monitoring services were implemented in the first MIDG design, they did not
implement failover for data retrieval because the Globus Toolkit did not check for the
presence and integrity of a file before attempting to retrieve it from one site’s Grid Node
Server to another. Although event failure messages were generated in the Globus
Toolkit’s Replica Location Service, they were not extracted by the monitoring services,
making the monitoring observational rather than proactive
during the archiving, management, and retrieval workflows. In the new data grid
infrastructure, hardware and software monitoring tools were built into the uploading and
downloading web-services, receiving real-time error messages from low-level services
and responding with alternate attempts. Table 3 lists the hardware-level and software-
level components that are now able to be monitored.
Table 3: Internally Monitored Components of the MIDG Infrastructure
Monitored Hardware Components             | Monitored Software Events
Grid Node Servers                         | Grid Node Queue
Grid Node Database                        | New Imaging Study Registration
Grid Manager Server                       | Data Source Integrity
Grid Manager Database (aka. XDS Registry) |
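One simple way to monitor the hardware components listed above is a plain TCP reachability probe of each server's service port, as sketched below. The host names and ports are placeholders for an actual deployment's configuration, and the real Monitoring Service consumes richer error messages from the web-services themselves.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Sketch of a hardware-level availability check for the monitored MIDG components.
// Host names and ports are placeholders for an actual deployment's configuration.
public class ComponentMonitorSketch {

    static boolean isUp(String host, int port) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), 2000); // 2-second timeout
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[][] components = {
            {"Grid Node Server",      "grid-node-a.example.edu",  "2811"},
            {"Grid Manager Server",   "grid-manager.example.edu", "8080"},
            {"Grid Manager Database", "grid-manager.example.edu", "1433"},
        };
        for (String[] c : components) {
            boolean up = isUp(c[1], Integer.parseInt(c[2]));
            System.out.println(c[0] + (up ? " is reachable" : " is NOT reachable"));
        }
    }
}
```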
6.2.4 Improved Data Security and Auditing
One of the hurdles in developing the MIDG design with the Globus Toolkit was the
complex deployment of digital certificates for secure messaging and the lack of
integrated system-wide auditing. In the new MIDG infrastructure, direct cryptographic
public and private keys are exchanged automatically between interacting software
components, establishing a secure and encrypted data delivery protocol between web-
services across the WAN. Because the DICOM level web-services are seamlessly
integrated with the web-services responsible for file delivery, event auditing logs are able
to consolidate study-level metadata with low-level file delivery jobs, providing MIDG
management users with an informative high-level overview of the success or failure of
data traffic among the preclinical molecular imaging sites.
6.3 Service-Oriented Architecture (SOA)
As specified in OGSA, this MIDG data grid infrastructure is a service-oriented
architecture built with web-services and XML-based messaging. The concept of a
service-oriented design is a distinct separation of loosely coupled tasks that are relatively
autonomous in logic, stateless in relation to each other, and reusable for different tasks [18].
In this new infrastructure, gateway and grid-middleware layers are implemented as Java
J2EE web-services that interact with one another over SOAP messaging protocol, XML
document encoding, and WSDL service definitions. These three fundamental protocols of
SOA systems and the new 4-layer architecture will be presented in this section.
6.3.1 SOA Protocols
Figure 23 is a communications protocol diagram adopted from the Open-Grid Services
Architecture to demonstrate the hierarchy of protocols in a SOA design. The top-most
Grid Applications level is represented by the MIDG GUI, which is written in PHP and
Java and accesses the Grid System as a DICOM client application. The Grid System level
constitutes the actual data grid infrastructure made up of web-services deployed on Java
Application Servers. Communications among these web-services are carried out in
industry-accepted XML-based messaging protocols such as SOAP and WSDL, shown in
the third level. The bottom-most level comprises the underlying Internet Protocol standards
needed for security and robust delivery of these XML-based messages between the web-
services.
Figure 23: Communication Protocols in the MIDG. Layers from top to bottom: Grid
Applications; Grid System (Web-Services); XML Protocols (SOAP, WSDL); Internet
Protocols (HTTP, SSL).
As mentioned earlier, the three fundamental protocols of SOA design are XML
documents, SOAP, and WSDL. XML is a document format that uses opening and closing
tags to label textual field values, thus providing a name-value pair. These tags, however,
can follow a parent-children pattern such that a tag can contain one or more child tags.
The result is a metadata-enriched document that can be created and parsed by computer
software. The Simple Object Access Protocol (SOAP) is like an envelope that packages
data content with specific sender to receiver information in the form of an XML
document [19]. The combination of XML structured data encapsulated in SOAP messages
is a delivery method widely used over public Internet sites, often to transmit database
query results or form field values. In the MIDG, database interactions, web-service
triggers, and monitoring results are exchanged using the SOAP and XML document
protocols. Another type of data content packaged with SOAP is the Web Services Description
Language (WSDL) document, an XML-based document structure that describes a web-service
by providing the operations, input requirements, and output types of that web-service [19].
WSDL is critical in the discovery, utilization, and interoperability of
distributed web-services in the MIDG data grid infrastructure.
The two main protocols at the Internet Protocols level are the Hypertext Transfer
Protocol (HTTP) and Secure Socket Layer (SSL). HTTP is a fundamental networking
protocol that functions as a request-response exchange in a client-server model. SOAP
relies on HTTP for message negotiation and transmission between web-services. Secure
Socket Layer (SSL) provides security for communications by encrypting message
exchange at the Application Layer. Combining HTTP and SSL encryption enables SOAP
messages with encapsulated XML-based data content to be quickly and securely
delivered between MIDG web-services.
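To make the stack concrete, the sketch below posts a minimal SOAP envelope over HTTPS using the standard HttpsURLConnection class. The endpoint URL and message body are placeholders; the MIDG's actual services exchange richer, WSDL-defined messages.

```java
import java.io.OutputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import javax.net.ssl.HttpsURLConnection;

// Sketch of the MIDG protocol stack in action: an XML payload wrapped in a SOAP
// envelope, carried over HTTP, and encrypted with SSL/TLS. Endpoint and body are placeholders.
public class SoapOverHttpsSketch {
    public static void main(String[] args) throws Exception {
        String soapEnvelope =
                "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\">"
              + "  <soap:Body>"
              + "    <RegisterStudy><studyId>STUDY-001</studyId></RegisterStudy>"
              + "  </soap:Body>"
              + "</soap:Envelope>";

        URL endpoint = new URL("https://grid-manager.example.edu/services/DataRegistration");
        HttpsURLConnection conn = (HttpsURLConnection) endpoint.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
        conn.setDoOutput(true);

        try (OutputStream out = conn.getOutputStream()) {
            out.write(soapEnvelope.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("HTTP response code: " + conn.getResponseCode());
    }
}
```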
6.3.2 4-Layer System Architecture
The new MIDG design architecture is shown in Figure 24 with the data grid
infrastructure components represented in grey rectangles and the MIDG GUI components
represented in orange rectangles. The gateway layer is now made up of the Grid Node
Server and the MIDG GUI Web-Server. Furthermore, the gateway, grid-middleware, and
resource layers are now largely made up of the data grid infrastructure components
because the DICOM services are now integrated into the underlying distributive
infrastructure, rather than accessing the Replica Location Services as an external
consumer application.
Figure 24: Service-Oriented-Architecture of MIDG (Compared with Figure 14 of the MIDG
built with Globus Toolkit). *Grid Manager Database is also known as the XDS Registry.
6.3.3 Application Layer
The four primary interfaces in the Application Layer are for study data registration and
uploading, study review and monitoring, parametric search and study downloading, and a
management interface. These remain relatively unchanged aside from the addition of
process queuing features and layout improvements. As in Chapter 5, these user interfaces
are web-based pages written in PHP and Java, and hosted on MIDG Web-Servers at each
participating MIDG site. The MIDG Web-Server connects to a remote MIDG Database
for registration and retrieval of high-level molecular imaging study metadata that are not
covered by the DICOM file headers. The MIDG Web-Server is a user-level middleware
that interacts with the Grid Node Server at each site, specifically the DICOM Store
Service and DICOM Query/Retrieve Service. Further detail on these four interfaces is
presented in Chapter 7.
6.3.4 Gateway Layer
Three of the four user-level middleware services reside on Grid Node Servers that are
required at each participating MIDG site. These services are the DICOM C-Store Service,
DICOM Query/Retrieve Service, and Delete Service. The MIDG Web-Server is
considered in general as a service which sends user-initiated DICOM requests and
imaging files to these three Grid Node Server services. The DICOM C-Store Service
receives DICOM images during the upload process from the MIDG Web-Server, and
then registers the study metadata provided in the DICOM image header to the central
XDS Registry on the Grid Manager Server. The DICOM Query/Retrieve Service is
responsible for locating and sending imaging study data files from a source Grid Node
Server site to the requesting Grid Node Server, assuming no local copy of the data files
yet exist. This retrieval process is performed when an investigator requests to download a
particular study dataset of interest, or when the Intelligent Routing Service (core
middleware layer) on the Grid Manager Server initiates replication of study data files
from one site to another either for back-up or long-term storage migration. Following the
latter case of migration, the Delete Service is called on the original sending Grid Node
Server to permanently remove the physical files from the attached storage filesystem.
Notice that the Grid Node Server does not maintain any long-term data for the purpose of
minimizing single-points of failure at remote sites. Molecular imaging data files are
archived onto local storage devices, such as a network-share file-server or storage-area-
network device (SAN) with built-in disk redundancy. Therefore a Grid Node Server, with
properly configured settings, can quickly replace a faulty Grid Node Server in disaster
scenarios without service interruption or loss of local data.
6.3.5 Grid-Middleware Layer
The core middleware services reside on the central Grid Manager Server and carry out
the tasks not uniquely specific to a MIDG site. This deviates from the first MIDG design
with GTK where data registration, discovery, and persistence were handled heavily at
each site’s Grid-Access-Point Server. Here, only one instance of each core middleware
service is required per MIDG community. The only scenario with more than one instance
of core middleware services would be for mirroring Grid Manager Servers across two
hardware servers to provide automatic failover redundancy. Such redundancy at the
hardware level can be done, but is not discussed or implemented in this research.
The core middleware services are Java web-services running on the Java Application
Server on the Grid Manager Server. The Data Registration Service receives SOAP
messages from the DICOM Store Service that contain study metadata to be registered in
the XDS Registry Database, which in this research is hosted on the Grid Manager Server.
The Data Discovery Service responds to the DICOM Query/Retrieve Service with query
results from the XDS Registry Database regarding availability and location of a requested
molecular imaging dataset, packaged inside an XML-based SOAP message. The
Intelligent Routing Service is a rules-based data management service for multi-site data
redundancy and long-term storage migration features mentioned earlier in Section 6.1.2.
Lastly, the Monitoring Service is a web-service that receives event and error messages
from all user-level middleware and core middleware services already mentioned.
Centralizing data access and distribution events and error messages in the MIDG data
grid infrastructure is crucial for usable auditing information and automated error-
handling.
6.3.6 Resources Layer
The bottom-most layer in this MIDG architecture continues to be made up of disk storage
devices, databases, and networking infrastructure. The MIDG Database carries over from
the previous MIDG design architecture’s Imaging Metadata Database, but now is only
used by the MIDG Web-Server. The User-Level Middleware services no longer save and
query DICOM metadata from the Imaging Metadata Database, but rather connect to the
Grid Manager Server for access to the XDS Registry. This redirection demonstrates the
integration of web-services across the entire data grid infrastructure that connects the
multiple sites of a MIDG research community. The Globus Toolkit’s Replica Location
Database has also been removed and replaced by the Grid Manager’s XDS Registry,
mapping DICOM study identifiers to the physical locations of the distributed molecular
imaging datasets. The third database listed in the Resource Layer is the new Grid Node
Queue Database, demonstrating the process queuing capability at each Grid Node Server.
This Queue Database is utilized by all three web-services on the Grid Node Server and
implements a first-in-first-out rule. The networking infrastructure remains relatively the
same as before, with router devices and network firewalls, but the port configurations
have changed on these security devices to enable the various web-services to
communicate with one another.
Chapter 7. WEB-BASED GRAPHICAL USER INTERFACE
7.1 Purpose and Design
The MIDG graphical user interface (GUI) is used by investigators and molecular imaging
staff to upload, manage, search, and download molecular imaging study data from the
MIDG distributed data grid infrastructure. Regardless of whether dataset files are kept
within the MIDG on a local or remote site Grid Node Server, every site has a MIDG web-
server, as shown in Figure 25. This web-server interacts with its local Grid Node Server
over DICOM and can be implemented as a virtual machine running on the same hardware
machine as the Grid Node Server, given sufficient hardware resources. The MIDG
graphical user interface is web-based and accessible simultaneously via web-browser
from multiple user workstations at each site.
[Figure 25 diagram: multiple investigator workstations at a single MIDG site connect to the site’s GUI Web-Server, which communicates with the local Grid Node Server and storage device; the Grid Node Server connects over the Internet to the central Grid Management Server and remote Grid Node Server(s).]
Figure 25: Multiple Simultaneous Users can Access the MIDG Graphical User Interface
Because It is Web-Based with Process Request Queuing Mechanisms
As the data input and output gateway between users and the back-end MIDG data grid
infrastructure, the MIDG GUI web-server also handles converting files into DICOM
before they are uploaded into the data grid. It is also responsible for handling user
searches of preclinical imaging datasets before submitting a DICOM retrieve request to
the Grid Node Server for a particular study.
7.1.1 Framework and Programming Language
The software for the MIDG GUI is written in the PHP and Java programming languages
to create dynamic web pages with database and server-side logic functionality. In
order to handle multiple simultaneous user requests, each GUI web-server has queuing
capabilities using a local database for documenting incoming data requests. This local
database also holds DICOM connectivity configurations to the local Grid Node Server.
For all preclinical study metadata, however, the web-server must read from and write to
the central MIDG database, typically hosted on the remote Grid Manager Server.
7.1.2 Study-Centric Sharing
The MIDG GUI allows registered users to share their molecular imaging study datasets
with other registered users by following a study-centric data model, as presented in
Chapter 4. This is done via a table in the central MIDG database called ‘access’, which
maps user IDs to study IDs. There is no limit to the number of users who can access
a single study. However, only MIDG users with administrator privileges can
delete information and files through the MIDG GUI. Note that MIDG administrative
users have access to all studies in an MIDG implementation, and also have configuration
privileges in the GUI’s management interface.
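A minimal sketch of this access check is shown below. It assumes an ‘access’ table with user_id and study_id columns in a JDBC-accessible central MIDG database; the connection URL, credentials, and helper name are illustrative only and do not reflect the production code.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Sketch of the study-centric access check against the assumed 'access' table.
public class AccessCheck {

    static boolean canAccess(Connection db, String userId, int studyId) throws Exception {
        PreparedStatement ps = db.prepareStatement(
            "SELECT 1 FROM access WHERE user_id = ? AND study_id = ?");
        ps.setString(1, userId);
        ps.setInt(2, studyId);
        try (ResultSet rs = ps.executeQuery()) {
            return rs.next();   // any matching row grants the user access to the study
        }
    }

    public static void main(String[] args) throws Exception {
        try (Connection db = DriverManager.getConnection(
                "jdbc:postgresql://gridmanager/midg", "midg", "secret")) {
            System.out.println(canAccess(db, "jasperle@usc.edu", 1));
        }
    }
}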
When all preclinical data and results in a study are completed and/or published, principal
investigators have the option to make the imaging datasets available to the public. This
means that the completed studies will appear in search results for other registered
users in the MIDG instance. Data sharing within a molecular imaging research
community is currently very difficult because no comprehensive search and download
infrastructure has been made readily available. Study-centric sharing of molecular
imaging data files is critical while a multi-investigator research study is ongoing, but it
can also continue after preclinical experiments and results have been published so that the
larger molecular imaging community can analyze and build upon the data.
7.2 Uploading Molecular Imaging Study Datasets
7.2.1 Upload Workflow and Dataflow
Current methods of uploading data at molecular imaging facilities typically include the
use of investigator-named folders with sub-folders labeled by session dates. The staff or
imaging lab managers then drag-and-drop completed imaging datasets from user analysis
workstations to a central storage device such as an on-site file-server. In the MIDG
upload workflow, folders are named by study identifiers rather than investigator name
because multiple investigators may log in to the GUI to access a preclinical study.
However, session dates are still used to name study sub-folders, and copying data from
user workstations to the MIDG GUI web-server using the drag-and-drop method is also
kept the same. Aside from minimizing workflow disruption, the drag-and-drop method of
moving data from user analysis workstations at a preclinical molecular imaging facility to
the MIDG is faster and simpler than requiring users to select files through the GUI web
pages.
Figure 26 lays out the nine workflow and dataflow steps involved in uploading
molecular imaging study datasets from a browser-enabled user workstation to the MIDG
through the MIDG GUI. The boxes in grey signify steps requiring user input, whereas
boxes with dotted borders signify dataflow steps performed by server-side scripts on the
MIDG web-server. For the sake of discussion, Figure 26 does not include dataflow steps
that occur on the Grid Node Server after the created DICOM files have been sent to the
DICOM-compliant Grid Node services.
Figure 26: MIDG Graphical User Interface – Upload Workflow and Dataflow. The nine steps are:
1. Register a new study into the MIDG GUI.
2. Study metadata are written to the central MIDG Database (study metadata include comprehensive information about the study, sessions, groupings, and scans).
3. Study folders are created in the MIDG web-server’s shared Upload directory (if not already done, the local MIDG web-server’s shared upload directory is mounted onto user workstations).
4. Perform imaging scans on the modalities.
5. Copy-and-paste the new imaging files into the shared upload directory.
6. Select the files to be uploaded in the GUI.
7. The selected files are written into the MIDG database and the Upload Queue.
8. Files are converted to DICOM by a Java program executed every minute on the MIDG web-server.
9. The converted files are sent to the Grid Node by the same Java program.

At the USC Molecular Imaging Center, step 1 is performed by the imaging facility’s manager, steps 4 and 5 are performed by imaging staff technicians, and step 6 is performed by either the imaging staff or the investigator. Note that all study metadata involving a scan must be entered in step 1 before the corresponding scan folders are created in step 3. Also note that the users responsible for each step in grey can vary depending on the operational policies unique to each molecular imaging facility.
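The minute-interval processing behind steps 7 through 9 can be sketched as a scheduled Java task, shown below. The directory path is illustrative, and convertToDicom() and storeToGridNode() are placeholder stubs standing in for the actual DICOM conversion and DICOM C-STORE logic; this is a structural sketch under those assumptions, not the production program.

import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of the minute-interval program on the MIDG web-server that drains the upload
// queue, converts files to DICOM, and sends them to the local Grid Node Server.
public class UploadProcessor {

    static void convertToDicom(Path source) {
        System.out.println("Converting " + source + " to DICOM");          // stub
    }

    static void storeToGridNode(Path dicomFile) {
        System.out.println("C-STORE " + dicomFile + " to local Grid Node"); // stub
    }

    public static void main(String[] args) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        // Run once every minute, mirroring the polling interval described above.
        timer.scheduleAtFixedRate(() -> {
            Path queued = Paths.get("/midg/upload/Study_1/20110101/subject1/acquisition");
            convertToDicom(queued);
            storeToGridNode(queued);
        }, 0, 1, TimeUnit.MINUTES);
    }
}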
7.2.2 Sample Screenshots
The following screenshots demonstrate the MIDG upload GUI and workflow. Figure 27
shows a list of two previously registered studies that the user ‘jasperle@usc.edu’ has
access to. The parameters relevant at the study level are shown, including animal type,
region-of-interest (ROI), Institutional Animal Care and Use Committee (IACUC), and the
primary/principal institutions of the study.
Figure 27: MIDG Upload GUI – Study Selection Level
Clicking on the ‘Register a New Study’ link in Figure 27 brings the user to the study
registration page shown in Figure 28. In addition to the study metadata fields, the
last text field allows the registering user to share this study with other users by
entering user accounts separated by commas. These accounts must already be
registered as valid users in the MIDG GUI.
Figure 28: MIDG Upload GUI – New Study Registration Form. These Field Values Will Be
Inserted Into the DICOM Header of All Uploaded Image Files During the DICOM
Conversion Step.
Upon registration of a study, any number of planned sessions can be added to the study.
Figure 29 shows the session registration form. Note that each session in molecular
imaging workflows is unique to a specific modality, technician, and date. The duration of
a session is also requested for scheduling and billing purposes.
Figure 29: MIDG Upload GUI – New Session Registration Form. These Field Values Will
Be Inserted Into the DICOM Header of All Uploaded Image Files During the DICOM
Conversion Step.
Within each imaging session, one or more animals can be grouped by experimental
variables. Figure 30 shows the registration page for a new group within a session. For
example, there is usually a control or placebo group in an experimental imaging study
that is scanned together with the experimentally treated animals. Although contrast agents
are not always required in an imaging scan, contrast agent fields are made available as
optional.
Figure 30: MIDG Upload GUI – New Group Registration Form. These Field Values Will Be
Inserted Into the DICOM Header of All Uploaded Image Files During the DICOM
Conversion Step.
In the last step of the registration process, the scan registration form shown in Figure 31
asks for basic descriptions of the animal subject being scanned. More metadata fields
may be added to this form in the future, but this initial GUI design only asks for a subject
name, age, and weight. Due to experimental and preparatory variability, a comments field
is left open for users to enter free text.
Figure 31: MIDG Upload GUI – New Scan Registration Form. These Field Values Will Be
Inserted Into the DICOM Header of All Uploaded Image Files During the DICOM
Conversion Step.
After all the study, session, group, and scan metadata are registered into the MIDG GUI,
the investigators or imaging technicians can begin dragging and dropping newly created
imaging files directly from modality workstations into the appropriate study folder in the shared
‘Upload’ directory of the MIDG web-server. Figure 32 shows 12 microCT image files
that have been copied into the shared ‘Upload’ directory (X:\). Note the directory path
shown in the explorer browser that leads to this scan folder. The first three parent
directories are designated by study name, session date, and scan subject name. Because
some modalities such as microCT can generate tens to hundreds of 2-D images per scan,
the sample DICOM files shown in Figure 32 are put into a user-created folder named
‘CTScan_1’ under the ‘acquisition’ folder. This added ‘CTScan_1’ folder makes the
image selection process easier in the next step.
Figure 32: Screenshot of Copied Files from a Client Machine into the Shared Study
Directory (X:\) of the MIDG Web-Server. This Drag-and-Drop Method Over a LAN is
Simple, Fast, and Common in Existing Methods of Data Archiving at Preclinical Imaging
Facilities.
The last upload step for users is step 6, which requires selection of the files and folders
that have been copied into the corresponding upload folder. Carrying over from the
earlier example in Figure 32, the ‘CTScan_1’ folder is shown in Figure 33 below as a
single checkbox under ‘ACQUISITION Files’. Figure 33 also shows that there were no
files copied into the ‘PROCESSING Files’ folder, and four individual DICOM files were
copied into the ‘DISTRIBUTION Files’ folder. Upon selection and clicking the ‘Upload’
button at the bottom, these selected files are converted into DICOM files with the study
metadata entered earlier in the upload workflow. The metadata is written into the DICOM
header, as described later in Section 7.3.3.
Figure 33: MIDG Upload GUI – Uploading Files From a Scan Dataset. The Final Step For
Users In the Uploading Workflow is to Click on the Upload Button, Which Initializes
DICOM Format Conversion and Archival of Selected Files Into the MIDG.
7.2.3 DICOM Compliance
Because molecular imaging research and clinical radiology environments are similar in
some respects but differ in others, mapping molecular imaging study metadata to DICOM
header tags originally intended for clinical patient studies required some adjustments.
However, the standardized unique identifiers (UIDs), such as the Service-Object Pair (SOP)
Class UID and the Transfer Syntax UID, were kept in conformance with the DICOM standard.
Table 4 lists and describes the DICOM tags that are modified by the MIDG GUI before
being sent to a Grid Node server. Although not all of these 38 fields are required for
successful storage within the MIDG data grid infrastructure, additional metadata is
inserted for the benefit of investigators who will download and import these DICOM files
on their own computers. Because the preclinical molecular imaging workflow does not
require the assignment of patient, study, series, and file instance identifiers, these
DICOM tag values were generated by the MIDG GUI Web-Server using internal
identifiers from the MIDG database, and are highlighted in orange text in Table 4. Most
DICOM-compliant software is able to display and sort these DICOM files according to
these header tag descriptors.
Table 4: List of DICOM Tags that are Labeled by the MIDG GUI During Upload Process
DICOM Tag | Tag ID | VR | Description
Media Storage SOP Class UID | 0002,0002 | UI | Same as SOP Class UID
Transfer Syntax | 0002,0010 | UI | Supported transfer syntaxes: Implicit VR Little Endian = 1.2.840.10008.1.2; Explicit VR Little Endian = 1.2.840.10008.1.2.1; Explicit VR Big Endian = 1.2.840.10008.1.2.2; JPEG Baseline = 1.2.840.10008.1.2.4.50; RLE Lossless = 1.2.840.10008.1.2.5
SOP Class UID | 0008,0016 | UI | Supported SOP classes: SC Image = 1.2.840.10008.5.1.4.1.1.7; CT Image = 1.2.840.10008.5.1.4.1.1.2; PET Image = 1.2.840.10008.5.1.4.1.1.128; US Multi-Frame = 1.2.840.10008.5.1.4.1.1.3.1; MR Image = 1.2.840.10008.5.1.4.1.1.4
SOP Instance UID | 0008,0018 | UI | SeriesUID.FileID
Study Date | 0008,0020 | DA | Study date, in the form yyyymmdd
Series Date | 0008,0021 | DA | Session date, in the form yyyymmdd
Accession Number | 0008,0050 | SH | StudyUID without the periods in the middle
Modality | 0008,0060 | CS | Modality type abbreviations: CT, PT, OPT, US, AR
Manufacturer | 0008,0070 | LO | Modality manufacturer name
Institution | 0008,0080 | LO | Investigator's institution
Referring Physician | 0008,0090 | PN | Investigator's full name
Study Description | 0008,1030 | LO | Study description
Series Description | 0008,103E | LO | Scan comments
Department | 0008,1040 | LO | Investigator's department
Operators' Name | 0008,1070 | PN | Imaging technician's full name
Model Name | 0008,1090 | LO | Modality manufacturer's model name
Patient Name | 0010,0010 | PN | Animal subject's name
Patient ID | 0010,0020 | LO | Investigator ID
Patient Sex | 0010,0040 | CS | Animal subject's sex, M or F (must be uppercase)
Patient Age | 0010,1010 | AS | Animal subject's age, nnnD, nnnW, nnnM, or nnnY
Patient Weight | 0010,1030 | DS | Animal subject's weight in kg
Patient Species Description | 0010,2201 | LO | Animal type (e.g., mouse, rabbit)
Clinical Trial Committee | 0012,0081 | LO | "Clinical Trial Protocol Ethics Committee Name" = IACUC
Clinical Trial Approval Number | 0012,0082 | LO | "Clinical Trial Protocol Ethics Committee Approval Number"
Exam Part | 0018,0015 | CS | Animal imaging ROI, body part
Contrast Agent | 0018,0010 | LO | Radiopharmaceutical biomarker (e.g., FDG)
Study ID | 0020,0010 | SH | Same as Accession Number
Study UID | 0020,000D | UI | 1.2.StudyID
Series UID | 0020,000E | UI | StudyUID.ScanID (note: sessionID and groupID are not accounted for)
Deviating from the clinical DICOM data model, the ‘Referring Physician’ tag
(0008,0090) has been allocated for the preclinical investigator’s full name, and the
‘Patient ID’ tag (0010,0020) for the investigator’s ID. The investigator’s name is placed
into the ‘Referring Physician’ tag because investigators play an analogous role in
requesting an imaging exam. The investigator’s ID was used in the ‘Patient ID’ tag
instead of an animal ID because molecular imaging research puts little emphasis
on individual animal models. In stark contrast with clinical radiology, where patients are
the focus and owners of imaging studies, here investigators own the imaging studies
and animals are simply tools in a study. Placing the investigator’s ID into the Patient ID tag
also enables DICOM-compliant software to sort imaging studies by investigator rather
than by a multitude of different animal IDs. The other patient-related tags, such as patient
name and patient age, remain affiliated with the actual animal subject of the
imaging scan. In summary, although the GUI Web-Server follows a study-centric workflow
to maintain user-access control, the DICOM data model adapted for preclinical imaging
studies in the MIDG is a hybridization of investigator and animal subject metadata.
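To make the tag remapping and UID scheme concrete, the following schematic Java sketch builds the hybrid header described above from hypothetical internal database identifiers. All values shown are illustrative assumptions; the real MIDG GUI Web-Server performs this mapping during its DICOM conversion step using its own database IDs.

import java.util.LinkedHashMap;
import java.util.Map;

// Schematic sketch of the preclinical-to-DICOM header mapping and UID scheme in Table 4.
public class HeaderMapping {

    public static void main(String[] args) {
        int studyId = 12, scanId = 3, fileId = 7;                // hypothetical internal IDs
        String studyUid  = "1.2." + studyId;                     // Study UID      = 1.2.StudyID
        String seriesUid = studyUid + "." + scanId;              // Series UID     = StudyUID.ScanID
        String sopUid    = seriesUid + "." + fileId;             // SOP Instance UID = SeriesUID.FileID
        String accession = studyUid.replace(".", "");            // Accession      = StudyUID without periods

        Map<String, String> header = new LinkedHashMap<>();
        header.put("(0020,000D) Study Instance UID", studyUid);
        header.put("(0020,000E) Series Instance UID", seriesUid);
        header.put("(0008,0018) SOP Instance UID", sopUid);
        header.put("(0008,0050) Accession Number", accession);
        // Investigator metadata replaces the clinical patient-centric fields:
        header.put("(0008,0090) Referring Physician", "Lee^Jasper");   // investigator's name
        header.put("(0010,0020) Patient ID", "INV-001");               // investigator's ID
        header.put("(0010,0010) Patient Name", "Rat_42");              // animal subject's name

        header.forEach((tag, value) -> System.out.println(tag + " = " + value));
    }
}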
7.3 Monitoring and Management Tools
Investigators who have registered their molecular imaging studies into the MIDG system
can monitor their study’s progress based on the files that have been uploaded. The
monitoring page is shown in Figure 34 with two sample studies belonging to user
‘jasperle@usc.edu’ listed at the top. Upon clicking on a study link, its specific study
metadata and list of uploaded files appear below the list. In Figure 34, the link for
‘Study_1’ has been clicked, showing the description and list of DICOM files already
uploaded for that preclinical imaging study.
Figure 34: MIDG GUI – Study Monitoring. A Summary of the Log-In User’s Authorized
Studies is Shown, Including Study Metadata, Status, and Listing of Uploaded Image Files.
Another tool available for both investigator users and MIDG administrative users is the
study management page, shown in Figure 35. Here investigator users can mark studies as
completed, delete erroneous upload entries, and add or remove other users from having
access to their studies. When studies are marked as completed, no new data is expected to
be uploaded for that study because all planned imaging scans have been performed and
analysis of the data is done. Users click on the green checkmark icon to reflect this
completed study status into the MIDG Database, and also to delete the ‘upload’ directory
for that study on the local MIDG GUI Web-Server in order to free up hard-drive space
for future study uploads. Investigators and MIDG administrators can also delete
registered scans, groups, sessions, and studies by clicking on the red ‘X’ icon next to each
level. Deleting an entry removes the corresponding level’s information as well as all
children levels from the MIDG database. Note that the physical DICOM files already in
the data grid archives are not deleted for security purposes. Deleting the files in the
MIDG GUI simply removes it from the MIDG database, thereby preventing any search or
record of the entry. To remove the physical files previously uploaded into the MIDG
requires manual intervention by MIDG administrators who have access to the data grid
infrastructure.
Figure 35: MIDG GUI – Study Management. Mistakes During the Study, Session, Group,
or Scan Registration Process Can Be Deleted Here. Primary Investigators and
Administrators Can Also Change the User Access to Their Registered Studies.
Third, access control can be managed per study by selecting user accounts in
either the ‘Remove’ or ‘Add’ drop-down menu below each study box. The ‘Remove’
drop-down menu for each study lists all non-administrative user accounts
that currently have access to the study above it. The study’s principal investigator account
is excluded from this list because, like administrators, it cannot be removed from
having access to the study. Conversely, the ‘Add’ drop-down menu for each study
lists all non-administrative user accounts that do not yet have access to the
study. Clicking the ‘Add’ button then grants that user access to the study,
allowing them to see the study in search results and authorizing them to upload and
download its datasets.
Available only to administrative MIDG users is the grid management portion of the
MIDG management page. This section is shown near the top of Figure 36 and allows
administrators to configure the AETitle, IP address, and port number of the local Grid
Node servers, and also customize certain MIDG database parameters such as animal
types and new modality types. The selection of Grid Node servers is unique to each
MIDG web-server implementation because each site should have its own MIDG web-
server and Grid Node Server. In case of failure of a local Grid Node, the MIDG web-server
can be quickly reconfigured to a secondary backup Grid Node Server using this
administrative management interface.
Figure 36: MIDG GUI – Administrator’s Management Page. In addition to the study
management ability, administrators can add new fields or edit existing fields for the MIDG
GUI interface.
7.4 Downloading Molecular Imaging Datasets
7.4.1 Sample Screenshots
Downloading data from the MIDG system is done through the Download page of the
MIDG GUI, shown in Figure 37. Near the top of the page is a list of completed and in-
progress studies that the logged-in user has shared access to. The bottom half of the page
is an advanced search feature that allows users to filter results by specific study
parameters such as modality type, contrast agent, and treatment type. Note that the search
feature will also include public studies that the primary investigator has made available
upon study completion and/or publication.
Figure 37: MIDG Download GUI. Studies with imaging files can be downloaded by
authorized users. The advanced search feature can find studies by specific parameters.
The search feature queries the MIDG Database for studies that match the entered search
parameters and to which the currently logged-in user has been granted access, as shown in
the sample in Figure 38. The returned results list looks similar to the study management page,
but there is only one active icon on the right side. This red arrow icon submits a
download request to the MIDG GUI queue for that particular study. The workflow that is
involved in downloading a selected study is presented in the next section.
Figure 38: Sample Search Results Page With Results for Optical Imaging Studies in the
MIDG. The search feature allows investigators to perform detailed filtered search of
personal and public preclinical imaging studies available for download.
7.4.2 Download Workflow and Dataflow
Similar to the uploading workflow, the downloading workflow submits a request to the
local MIDG GUI Web-Server’s download queue, which gets processed by a server-side
Java program running once every minute. The Java program sends a DICOM c-find and
c-move command to the Grid Node server for each study in the queue, and waits for the
requested files to be returned from the Grid Node server. The DICOM receiver on the
MIDG web-server, shown in step 5 of Figure 39, is a modified version of the open-source
DCM4CHE2 toolkit because it places incoming DICOM files in folders corresponding to
the study name. These study folders are automatically created in the shared ‘Download’
directory once the first image file is sent into the MIDG GUI Web-Server. Like the copy-
and-paste method to upload datasets into the MIDG GUI Web-Server, users also drag-n-
drop retrieved study folders from the shared ‘download’ directory on the MIDG GUI
Web-Server to their client workstations.
Figure 39: MIDG Graphical User Interface – Download Workflow. The steps are:
1. Search for studies.
2. Select a study to be downloaded.
3. The download request is written to the Download Queue.
4. A Java program pulls download requests from the Download Queue and sends a DICOM Query/Retrieve to the Grid Node Server.
5. The DICOM Receiver puts the arriving image files into the study’s folder in the shared Download directory on the MIDG web-server.
6. Copy-and-paste the arrived imaging datasets to the client workstation (if not already done, the local MIDG web-server’s shared download directory is mounted onto client workstations).
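The folder-per-study behaviour of the modified receiver in step 5 can be sketched as follows. The directory locations and the studyNameOf() lookup are assumptions for illustration; the actual MIDG uses a modified DCM4CHE2 storage service that reads the study name from the DICOM header of each incoming file.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Sketch of filing incoming DICOM images into a study-named folder under the shared
// Download directory, creating the folder automatically when the first image arrives.
public class DownloadReceiver {

    static String studyNameOf(Path dicomFile) {
        return "Study_1";   // stub: the real receiver derives this from the DICOM header
    }

    static void fileIncoming(Path dicomFile) throws IOException {
        Path studyFolder = Paths.get("/midg/download", studyNameOf(dicomFile));
        Files.createDirectories(studyFolder);   // auto-create the study folder on first image
        Files.move(dicomFile, studyFolder.resolve(dicomFile.getFileName()),
                StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path incoming = Paths.get("/midg/incoming/image001.dcm");
        if (Files.exists(incoming)) {
            fileIncoming(incoming);
        }
    }
}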
Chapter 8. SYSTEM IMPLEMENTATION AND EVALUATION
8.1 Objectives
Preclinical imaging studies from five modalities at the USC Molecular Imaging Center
were collected and used to evaluate the Molecular Imaging Data Grid in both a laboratory
and then in a distributed multi-site implementation. The laboratory model was deployed
at the USC Image Processing and Informatics Lab (IPILab) with dedicated server
environments and high-speed LAN connectivity. The multi-site implementation was
deployed at three USC institutions – the USC Molecular Imaging Center (MIC), the USC
IPILab, and the USC Ultrasonic Transducer Resource Center (UTRC). The objectives of
the evaluation scenarios are to validate system design, quantify dataflow performance,
and determine limitations in the MIDG system design based on SOA and the XDS-i
integration profile. Staff at the USC Molecular Imaging Center were involved in the data
collection process as well as in planning these evaluation scenarios.
8.1.1 Overview: Laboratory Model and Distributed Multi-Site Model
In the laboratory and multi-site evaluation scenarios, collected preclinical imaging studies
are uploaded into the MIDG using a simulated user workstation, and then downloaded
onto another simulated user workstation either at the same site or a remote site,
depending on the MIDG implementation being tested. The laboratory implementation
was set up at IPILab for development and system design validation, while the distributed
multi-site implementation was set up at three USC sites for real-world dataflow
performance measurements and fault-tolerance testing. Having three sites in the
distributed MIDG model allows for fail-over testing scenarios where network connectivity
at one of two imaging data provider sites becomes temporarily unavailable. Variables that
can affect measured performance results include study dataset variability, hardware
server performance, and network bandwidth; these are taken into consideration during the
distributed multi-site evaluation. Figure 40 gives an overview of the MIDG
implementation and the evaluation scenario between a data provider site and a
data consumer site.
[Figure 40 diagram: the Data Provider Site (modality and workstation, MIDG GUI upload page, MIDG Web-Server, Grid Node Server) connects through the MIDG Management Site (Grid Manager Server with the master MIDG Database and Grid Manager Database) to the Data Consumer Site (Grid Node Server, MIDG Web-Server, MIDG GUI download page, investigator desktop); numbered steps 1 through 6 trace the upload and download dataflow.]
Figure 40: Systems Integration and Workflow Overview of MIDG Implementation
The Data Provider Site (left) uses the MIDG GUI Upload page to upload preclinical
molecular imaging studies into the MIDG system (steps 1 and 2), while the Data
Consumer Site uses the MIDG GUI Download page to retrieve archived study data to the
investigator’s remote workstation (steps 3 through 6). Step 5 represents data file transport
from the Data Provider Site to the Data Consumer Site via Grid Node Servers. The dotted
box around the MIDG Web-Server and Grid Node Server in Figure 40 indicates that the
two components are hosted on a single hardware server using VMware Server 2.0
technology [20]. This setup was for evaluation purposes, and the two MIDG
components can be implemented on separate machines for better system robustness.
8.2 Datasets Collected for System Evaluation
Twelve sample preclinical molecular imaging datasets were collected from the USC
Molecular Imaging Center’s six preclinical imaging modality types – microCT,
microPET, co-registered microPET-CT, optical imaging, ultrasound, and
autoradiography. Two sample datasets from each type of modality were collected. Each
modality type’s dataset has a unique variety of acquisition, post-processing, and
distributed image formats, and may vary between pre-clinical imaging facilities.
Table 5 lists the file formats identified by the USC Molecular Imaging Center (MIC) as
needing long-term archiving and/or distribution to their investigators. In the current
MIDG system design, only native DICOM and 2-D viewable image formats (shown in
red font in Table 5) were gathered for this evaluation because the current data grid
infrastructure supports DICOM format and protocols. Archiving MIC’s other data file
formats (shown in black text), such as the proprietary raw acquisition files and post-
processing protocol files, into the MIDG requires future work and will be discussed in
Chapter 10. The supported imaging data formats are converted to the DICOM file format
by the MIDG GUI Web-Server during the upload process.
Table 5: Preclinical Molecular Imaging File Formats Collected from USC MIC for
Evaluation
MicroCAT
MicroPET
Pet-CT
Optical
Imaging
US
Autoradio-
graphy
Acquisition CAT LST DCM TIFF -- TIFF
CAT.HDR LST.HDR TXT TXT ANA
Post-
Processed
IMG IMG XIF -- -- TIFF
IMG.HDR IMG.HDR
Distributed DCM DCM JPEG PNG DCM* BMP
PDF PDF PDF TIFF JPEG
XLS PDF PDF
AVI
* Ultrasound has both DICOM single images and DICOM multi-frame video formats.
A preclinical imaging study’s dataset size depends on a variety of factors such as
modality type, animal type, animal region-of-interest (ROI), file format, number of files,
and image resolutions. Table 6 lists the 12 study datasets that were collected from the
USC MIC with some relevant metadata. Table 6 forms a basis of discussion for file
transfer speed results in Chapter 9. Note that a column named DICOM SOP Class was
added to specify the type of DICOM object being created by the MIDG. This is important
because the DICOM-compliant software used by preclinical investigators and imaging
facilities to import data retrieved from the MIDG may not support all DICOM
object types.
Table 6: Molecular Imaging Datasets Collected for Evaluation from the USC MIC. (SOP:
Service-Object Pair; SC: Secondary Capture)
Study Name | Modality Type | Original Formats | DICOM SOP Class | # of Files | Total Dataset Size
μCT 1 | MicroCT | DICOM-CT | CT | 461 | 130 MB
μCT 2 | MicroCT | DICOM-CT | CT | 974 | 1.07 GB
μPet 1 | MicroPET | DICOM-PET | PET | 63 | 2.09 MB
μPet 2 | MicroPET | DICOM-PET | PET | 63 | 2.09 MB
PETCT 1 | Co-registered MicroPET/CT | JPEG | SC | 3 | 216 KB
PETCT 2 | Co-registered MicroPET/CT | JPEG | SC | 11 | 1.20 MB
OPT 1 | Optical | TIFF | SC | 4 | 914 KB
OPT 2 | Optical | TIFF | SC | 5 | 2.97 MB
US 1 | Ultrasound | DICOM-US Multi-Frame | Multi-Frame | 1 | 44.9 MB
US 2 | Ultrasound | TIFF | US | 3 | 6.6 MB
AR 1 | Autoradiography | TIFF | SC | 1 | 8.38 MB
AR 2 | Autoradiography | TIFF, BMP, JPEG | SC | 4 | 10 MB
8.3 Hardware Components
8.3.1 Laboratory Model
The hardware components used for the laboratory MIDG model were set up at the IPILab;
their technical specifications are detailed in Table 7. The laboratory model consists of a
simulated investigator workstation, a MIDG Web-Server, and two Dell PowerEdge
servers that host the data grid’s Grid Manager Server and two Grid Node Servers. A
diagram of this setup is shown later in Figure 41 of Section 8.5. The collected sample
datasets for evaluation are readily available on the investigator workstation (WS), a PC.
Two Grid Node Servers are hosted on one Dell PowerEdge server using VMware Server
2.0 technology, with each Grid Node Server running as a virtual machine (VM).
Both the laboratory and distributed MIDG implementations utilize this server
virtualization technology provided by VMware, Inc. for portability and hardware
consolidation. Note that although the laboratory model implements two Grid Node
Servers on the same server, they are hosted on separate hard-drives to make sure read and
write transfer speeds between the two Grid Node Servers are not affected by hard-drive
overloading during the laboratory evaluations.
Table 7: Hardware Components Used in MIDG Laboratory Model. Italicized components
were implemented as virtual machines (VM).
Location | Hardware | Component(s) | CPU | Memory | Total Storage
IPILab | Dell PowerEdge 2970 | Grid Manager | Quad-Core AMD Opteron, 2.4 GHz | 8 GB | 120 GB
IPILab | Dell PowerEdge 2970 | Grid Node Svr. 1, Grid Node Svr. 2 | Quad-Core AMD Opteron, 2.4 GHz | 8 GB | 250 GB
IPILab | Dell Dimension 9150 | GUI Web-Server w/ MIDG Database, Investigator PC | Pentium D, 3.2 GHz | 512 MB | 30 GB
IPILab | Dell Dimension 9150 | Investigator PC | Pentium D, 3.2 GHz | 3 GB | 100 GB
8.3.2 Multi-Site Model
The multi-site MIDG model implements an investigator WS, GUI Web-Server, and Grid
Node Server at three geographically distributed sites. The technical hardware
specifications are listed in Table 8, and its setup is shown later in Figure 43 of Section
8.6.2. The central Grid Manager Server is hosted at the USC IPILab on a Dell PowerEdge
server. At the MIC and UTRC sites, VMware was used to consolidate the Grid Node
Server and GUI Web-Server onto a single hardware PC for ease of deployment. However
at the IPILab, the Grid Node Server is hosted on a Dell PowerEdge server, while the GUI
Web-Server is hosted on a separate PC. The three simulated investigator workstations at
these sites are browser-enabled PCs with LAN connectivity to their site’s GUI Web-
Server.
Table 8: Hardware Components Used in MIDG Distributed Multi-Site Model
Location | Hardware | Component(s) | CPU | Memory | Total Storage
IPILab | Dell PowerEdge 2970 | Grid Manager | Quad-Core AMD Opteron, 2.4 GHz | 8 GB | 120 GB
IPILab | Dell PowerEdge 2970 | Grid Node Server | Quad-Core AMD Opteron, 2.4 GHz | 8 GB | 250 GB
IPILab | Dell Dimension 9150 | GUI Web-Server w/ MIDG Database, Investigator PC | Pentium D, 3.2 GHz | 512 MB | 30 GB
IPILab | Dell Dimension 9150 | Investigator PC | Pentium D, 3.2 GHz | 3 GB | 100 GB
MIC | Dell Vostro | Grid Node Server, GUI Web-Server | Intel Core i5, 3.2 GHz | 8 GB | 500 GB
MIC | iMac | Investigator PC | Intel Core i3, 3.06 GHz | 4 GB | 500 GB
UTRC | Dell Dimension 9150 | Grid Node Server, GUI Web-Server | Pentium D, 3.2 GHz | 2 GB | 80 GB
UTRC | Lenovo 3000 V100 | Investigator Laptop PC | Intel Core2, 2.0 GHz | 1 GB | 300 GB
8.4 System Configurations
Setting up a Molecular Imaging Data Grid requires configuration of the networking, Grid
Manager Server, Grid Node Servers, GUI Web-Servers, and investigator PCs.
8.4.1 Network Configurations
For privacy and security purposes, most institutions interconnect their local workstations
on a LAN behind a firewall or firewall-enabled router to prevent unwanted access from
users on the public Internet space, also known as the WAN. To set up a Grid Manager
Server and/or Grid Node Server at a site, static IP addresses are needed and certain
network ports must be opened on the firewall so that the Grid Node services can
communicate with Grid Manager Server’s services and remote Grid Node Servers over
the WAN. These ports include the Java Application Server port, web-services port,
DICOM C-Store port, and DICOM C-Find and C-Move port.
Because most sites use a router to separate their LAN from the WAN, the Grid Node
Servers have a LAN IP address for only on-site connectivity, and a WAN IP address that
is visible to the external WAN network. The mapping from the WAN IP address to the
LAN IP address is via network-address-translation (NAT), and is handled by the router at
each site. While all of the Grid Manager’s services should be configured with its own
WAN IP address, each Grid Node Server’s services are configured using its LAN IP address.
Each Grid Node Server is configured with the WAN IP address of the Grid Manager
Server and the WAN IP addresses of remote Grid Node Servers, because those are the
addresses visible over the WAN.
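A simple way to verify such a configuration is to test that each required port is reachable from outside the firewall. The Java sketch below attempts a TCP connection with a timeout; the host name and port numbers are placeholders that each deployment would replace with its own WAN addresses and configured service ports.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Sketch of a reachability check for the MIDG service ports exposed through a site's NAT/firewall.
public class PortCheck {

    static boolean reachable(String host, int port) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), 3000);   // 3-second timeout
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int[] ports = {8080, 8443, 11112, 104};   // assumed app-server, web-services, and DICOM ports
        for (int p : ports) {
            System.out.println("gridnode.example.edu:" + p + " open = "
                    + reachable("gridnode.example.edu", p));
        }
    }
}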
8.4.2 Configuring Grid Manager
Configuring the central Grid Manager Server requires making its registration, routing,
and retrieval services accessible to the remote Grid Node Servers’ services. The first step
is to assign its WAN IP address to the Java EE Application Server, Oracle Grid Manager
Database, and Grid Manager service configuration files. The second step is to register the
Grid Node servers’ WAN IP addresses into the Grid Manager’s database so that it can
direct data storage and retrieval requests between Grid Node Servers.
8.4.3 Configuring Grid Node Servers
Configuring a site’s Grid Node Server to interact with the Grid Manager’s services and
the local MIDG GUI Web Server requires three general steps. The first step is to assign
its LAN IP address to the Java EE Application Server, MySQL Grid Node Database, and
Grid Node services’ configuration files. The second step is to register the Grid Manager
Server’s WAN IP address into the MySQL Grid Node Database. The third step is to add
the local MIDG GUI Web Server as a DICOM client so that it can store, query, and
retrieve DICOM files.
8.4.4 Configuring MIDG Web Servers and Investigator Workstations
Configuring a site’s MIDG GUI Web Server is required for user workstations to access
their local web-based GUI and to mount the web-server’s shared Upload and Download
directories. This configuration process requires three general steps. The first step is to
configure the GUI web pages, written in the PHP:Hypertext Preprocessor language, and
the local PostgreSQL MIDG Database with the GUI Web-Server’s LAN IP address. The
second step is to configure the connectivity of the central MIDG Database for the PHP
web pages. The third step is to add the local site’s Grid Node Server as the DICOM
destination for sending and retrieving DICOM imaging files. This last step determines the
MIDG GUI Web-Server’s entry point into the MIDG data grid infrastructure, and can be
changed easily through the GUI to a different Grid Node Server in disaster recovery
scenarios.
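The switchable DICOM destination described in this last step can be sketched as a small configuration object, as below. The AE titles, IP addresses, and port values are illustrative assumptions; in the MIDG the equivalent setting is stored in the GUI Web-Server's database and changed through the administrative management page rather than in code.

// Sketch of a re-pointable DICOM destination (AE title, IP address, port) for the GUI Web-Server.
public class DicomDestination {

    String aeTitle;
    String host;
    int port;

    DicomDestination(String aeTitle, String host, int port) {
        this.aeTitle = aeTitle;
        this.host = host;
        this.port = port;
    }

    void repoint(String aeTitle, String host, int port) {
        // Invoked from the administrative GUI when the local Grid Node fails.
        this.aeTitle = aeTitle;
        this.host = host;
        this.port = port;
        System.out.println("Now sending DICOM to " + aeTitle + "@" + host + ":" + port);
    }

    public static void main(String[] args) {
        DicomDestination node = new DicomDestination("GRIDNODE1", "10.0.0.5", 11112);
        node.repoint("GRIDNODE2", "192.0.2.20", 11112);   // fail over to a backup Grid Node
    }
}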
8.5 Laboratory Evaluation Model and Dataflow
The laboratory evaluation of the MIDG was used for development and system design
validation. Sample preclinical molecular imaging datasets (Section 8.2) were uploaded
and downloaded using one GUI Web-Server, one Grid Manager Server, and two Grid
Node Servers across a consistent LAN network. The setup and dataflow are shown in
Figure 41. The uploading process is shown as steps 1 through 3, and the download process is
shown as steps 4 through 9. The two Dell PowerEdge servers at the IPILab are hosted in an
air-conditioned server room with back-up power. One of them is used for the Grid
Manager Server and the other is host to two Grid Node Servers. These hardware
specifications were listed earlier in Table 7.
[Figure 41 diagram: at the USC IPILab (100 Mbps LAN), the data provider and data consumer investigator desktops use the MIDG GUI upload and download pages on the MIDG Web-Server (with its MIDG Database), which connects to Grid Node Server 1 and Grid Node Server 2 and to the Grid Manager Server (with the Grid Manager Database); numbered steps 1 through 9 trace the dataflow described below.]
Figure 41: Laboratory Model - Components and Dataflow Overview
The basic laboratory evaluation dataflow is as follows:
1. New molecular imaging studies are registered and uploaded from the simulated
investigator PC to the MIDG Web-Server using the MIDG GUI and shared Upload
directory, respectively.
2. File upload requests are added to the upload queue on the MIDG Web-Server and
sent to Grid Node Server 1 via DICOM by a Java program that runs every minute.
3. Grid Node Server 1 receives the DICOM files and registers the new study with the
Grid Manager Server in its XDS registry, which is implemented using an Oracle
database.
4. A user requests a download of a molecular imaging study from the MIDG. This
request is added to the download queue on the MIDG Web-Server.
5. Another Java program, which runs every minute, handles these download requests one
at a time and submits a DICOM C-Find and C-Move request to Grid Node Server 2.
6. Grid Node Server 2 queries the Grid Manager to locate the requested imaging study.
7. Grid Node Server 1 is triggered to send the requested imaging study to Grid Node
Server 2 over a GridFTP connection.
8. Grid Node Server 2 receives the imaging study files and sends them to the MIDG
Web-Server over a DICOM connection.
9. MIDG Web-Server receives and packages the imaging studies in its shared Download
directory for the user to copy-and-paste to their local investigator PC or external hard-
drive.
8.6 Distributed Multi-Site MIDG Evaluation Model and Dataflow
The purpose of the multi-site MIDG evaluation is to demonstrate the applicability and
feasibility of the MIDG design in a real-world test across a WAN with variable network
bandwidth. The evaluation involves three geographically distributed sites, which allows for
fail-over testing scenarios in which one of two imaging data provider sites becomes
unavailable for study data retrieval.
8.6.1 Three Site Test-bed
The three sites in the MIDG multi-site model are the USC Molecular Imaging Center,
located on the USC Health Sciences Campus; the Image Processing and Informatics Lab,
located at the USC Annenberg Research Park; and the Department of Biomedical
Engineering, located on the USC main campus, as shown in Figure 42.
Figure 42: Geographic Location of the Three USC IPILab, MIC, and UTRC Sites
Participating in the Multi-Site MIDG Evaluation Model. Map Provided by Google.
USC Image Processing & Informatics Lab (IPILab)
The USC Image Processing & Informatics Lab is the informatics research and
development site behind the MIDG system. With experience in medical informatics
projects ranging from a PACS simulator to a web-based ePR solution for multimedia data
in surgical suites, my colleagues at the IPILab have the knowledge and resources to assist
me in establishing the MIDG management site for the multi-site MIDG implementation. The
central Grid Manager Server and MIDG Database are hosted in a dedicated server facility
with adequate air conditioning, back-up power supply, and high-speed network bandwidth for
efficient data transmission.
USC Molecular Imaging Center (MIC)
The USC Molecular Imaging Center is a multi-modality molecular imaging facility with
trained staff and dedicated imaging systems to support research in molecular imaging and
personalized medicine for the future clinical environment. Available imaging modalities
to investigators include the Siemens MicroPET R4, Siemens MicroCAT II, Xenogen
IVIS 200 optical imaging system, VisualSonics Vevo 770 ultrasound system, and
Faxitron MX-20 autoradiography system (see Section 2.1). Each modality is operated by
trained staff using dedicated software on the accompanying workstation for image
acquisition, post-processing, image display, and certain quantitative image analysis.
Every month, the Molecular Imaging Center receives an average of 5 new experimental
studies from investigators inside and outside of USC and generates an estimated 1
Terabyte of data per year. In this MIDG multi-site implementation, the MIC site
represents the primary imaging data provider and its users help upload collected sample
datasets during the evaluation scenarios.
USC Ultrasonic Transducer Resource Center (UTRC) at the Biomedical Engineering
Department
The Ultrasonic Transducer Resource Center (UTRC) is located in the BME department’s
building, which houses many biomedical research labs ranging from medical device
prototyping to medical image processing research. The UTRC is an ultrasound transducer
prototyping to medical image processing research. The UTRC is an ultrasound transducer
research lab conducting research, fabrication, and training on ultrasound transducer
design in medical imaging applications for academic investigators and outside private
institutions. The UTRC is an exemplary remote molecular imaging research site that can
contribute animal research data to the MIDG and benefit from a multi-site MIDG
infrastructure and data sharing research community. In this implementation of the multi-
site MIDG model, I set up a Grid Node Server and MIDG web-server at the UTRC to
simulate how an investigator at a remote imaging research institution can download and
contribute molecular imaging data from a computer as simple as a laptop.
8.6.2 Dataflow
There are four steps in this 3-site MIDG evaluation. Its setup and connectivity are shown
in Figure 43. Details about the hardware specifications were presented in Table 8, and
network bandwidths are presented in Section 8.7 of this chapter.
Figure 43: Components and Connectivity of the Multi-Site MIDG Model Implementation
for Evaluation. Courtesy of the USC MIC and USC UTRC.
The evaluation steps in this multi-site MIDG implementation are as follows:
1. Upload molecular imaging studies to the MIDG from the MIC site in order to
measure study upload performance. In addition to the total time for archival, three
sub-process times are also measured: data format conversion to DICOM; transmission
of dataset files across WAN; and registration of studies into the central XDS-i
repository on the Grid Manager Server at the IPILab.
2. Download the newly uploaded molecular imaging studies from a remote site, the
IPILab, to measure study retrieval time across the WAN. This download is performed
on the investigator workstation at the IPILab via the MIDG GUI.
3. Download the same molecular imaging studies a second time, again at the IPILab, to
compare WAN and LAN download performance. After step 2, the IPILab’s Grid
Node Server has a local copy.
4. Shut down the IPILab’s Grid Node Server to simulate a failure of a data-providing site
within the MIDG. Then, attempt to download the studies at a third site, the UTRC, to
validate and test MIDG download failover. The total time to download at the UTRC’s
investigator workstation will determine if a delay occurs when one of two data source
sites fails (IPILab’s Grid Node Server) or becomes unavailable prior to a download
attempt at a third data consuming site.
8.7 Networking and Bandwidth
The data transfer speed between remote sites is determined by the limiting upload and
download bandwidths at the data provider and consumer sites, respectively. An
understanding of the actual upload and download bandwidths at the three USC sites may
provide insight into the results of the distributed multi-site MIDG evaluation. Figure 44
shows the WAN upload and download bandwidths as well as the LAN bandwidth for each
site. Note that the LAN bandwidth for uploading and downloading at the MIC and UTRC
sites reaches 100 Mbps because the Grid Node Server and GUI Web-Server
are hosted on the same hardware server using VMware virtualization.
Figure 44: Network Bandwidth in the 3-Site MIDG Evaluation
Chapter 9. SYSTEM EVALUATION RESULTS
Two MIDG evaluations were performed: the first used the laboratory model within the
IPILab, and the second used the multi-site MIDG model at three USC institutions.
Results from the initial laboratory tests at the
IPILab establish baseline performance metrics including overall data upload and
download times, and also time measurements of sub-tasks for identifying potential
bottlenecks in the dataflow. Then, evaluation of the multi-site MIDG model addresses the
impact of the WAN on data transfer performance, and includes data fault-tolerance
testing. The quantitative results were obtained through three repeated tests to validate the
accuracy of the measurements. Although the preclinical imaging workflow does not demand
the urgent upload and download times of the clinical environment, performance should be
consistent for investigator end-users and should improve upon current archiving and
distribution methods. Finally, a qualitative assessment is presented based on feedback from
the molecular imaging staff at the USC Molecular Imaging Center regarding the potential
utility of the MIDG system in streamlining
internal workflow as well as promoting collaborative preclinical imaging research.
9.1 Laboratory Evaluation Results
Laboratory evaluation of the Molecular Imaging Data Grid (MIDG) (see Figure 41) was
performed by measuring the time it takes to upload and download the collected multi-
modality datasets. This evaluation serves for developmental validation and also establishes a
baseline metric for data delivery performance before the MIDG is evaluated over a WAN in
the multi-site model. The measured time for uploading a study dataset starts when the data upload
request from the user’s WS is processed at the GUI Web-Server, and ends when all
selected files have been registered into the Grid Manager by the receiving Grid Node
Server. This completion signifies that the uploaded study has been successfully archived
in the MIDG, and is available for download by any Grid Node Server in the MIDG. The
amount of time it takes to copy imaging datasets from modality workstations to the
shared study folder on the GUI Web-Server was not measured because this step does not
differ from the current method in preclinical imaging facilities that store imaging data on
a shared network fileserver.
The measured time for downloading a study dataset starts when the download request
from the user’s WS is processed at the GUI Web-Server, and ends when all DICOM files
requested have arrived in the shared ‘Download’ directory on the requesting GUI Web-
Server. Table 9 shows these results from the USC IPILab for one study from each of the
six modality types.
Table 9: Performance Tests Measuring the Time It Takes to Archive and Retrieve a Study
Dataset from the Six Preclinical Molecular Imaging Modality Types Over a 100 mbps
Network
Metric | MicroCAT | MicroPET | PET-CT | Optical Imaging | US | Autoradiography
# of Files in Animal Scan | 461 | 63 | 3 | 4 | 3 | 2
Total Size | 130 MB | 2 MB | 206 KB | 105 KB | 578 KB | 8.9 MB
Failures | none | none | none | none | none | none
Collisions | none | none | none | none | none | none
Archiving Time (mm:ss) | 5:18 | 0:35 | 0:10 | 0:14 | 0:12 | 0:22
Retrieval Time (mm:ss) | 1:56 | 0:14 | 0:05 | 0:04 | 0:06 | 0:07
One of the challenges in the MIDG system design was handling multi-user traffic and the
data transfer failures that potential data collisions could cause. The results shown in Table
9 demonstrate that the queuing mechanism at the MIDG GUI Web-Server successfully
queues upload and download requests using a first-in-first-out method so that data
transfer failures due to multi-user traffic are avoided.
From these laboratory results, it can be seen that the laboratory MIDG implementation
was able to archive and distribute an imaging dataset from each modality type. The
average length of time for study data retrieval was less than half the time for archiving
the same dataset. Although many variables affecting these performance times, such as the
number of files and the file formats, were not isolated in these initial results, Figure 45
plots these values to show a general comparison of upload and download times per
modality type. The modality types with small dataset sizes, most likely due to few image
files, were able to complete upload and download in under half a minute, whereas the larger
microCT scan with 461 images, resulting in a 130 MB dataset, took around 5 minutes to
upload and 2 minutes to download.
[Figure 45 plot: laboratory upload and download times per sample dataset (upload / download, mm:ss) – MicroCT (130 MB) 5:18 / 1:56; MicroPET (2 MB) 0:35 / 0:14; MicroPET-CT (206 KB) 0:10 / 0:05; Optical (105 KB) 0:14 / 0:04; US (578 KB) 0:12 / 0:06; Autoradiography (8.9 MB) 0:22 / 0:07.]
Figure 45: Laboratory Results Plot
9.2 Multi-Site Evaluation Results
In the multi-site evaluation of the MIDG, data transmission between Grid Node Servers
and the Grid Manager is tested across the WAN using the 12 collected preclinical
molecular imaging datasets from the MIC (see Table 6, shown again below). The
variables commonly associated with data delivery performance across a WAN are
network bandwidth, geographic distance, and failover scenarios. The purpose of this
multi-site MIDG evaluation is to measure the real-world performance of the MIDG
upload and download workflows, and also to obtain a qualitative assessment of its impact
at a preclinical molecular imaging research facility. Fault-tolerance of the MIDG is tested
using data downloading scenarios at the UTRC site. The qualitative assessment of the
MIDG is based on comments and opinions of the imaging staff and management at the
USC MIC.
Table 6: Datasets collected for evaluation from the USC MIC. (SOP: Service-Object Pair;
SC: Secondary Capture)
Study Name | Modality Type | Original Formats | DICOM SOP Class | # of Files | Total Dataset Size
μCT 1 | MicroCT | DICOM-CT | CT | 461 | 130 MB
μCT 2 | MicroCT | DICOM-CT | CT | 974 | 1.07 GB
μPet 1 | MicroPET | DICOM-PET | PET | 63 | 2.09 MB
μPet 2 | MicroPET | DICOM-PET | PET | 63 | 2.09 MB
PETCT 1 | Co-registered MicroPET/CT | JPEG | SC | 3 | 216 KB
PETCT 2 | Co-registered MicroPET/CT | JPEG | SC | 11 | 1.20 MB
OPT 1 | Optical | TIFF | SC | 4 | 914 KB
OPT 2 | Optical | TIFF | SC | 5 | 2.97 MB
US 1 | Ultrasound | DICOM-US Multi-Frame | Multi-Frame | 1 | 44.9 MB
US 2 | Ultrasound | TIFF | US | 3 | 6.6 MB
AR 1 | Autoradiography | TIFF | SC | 1 | 8.38 MB
AR 2 | Autoradiography | TIFF, BMP, JPEG | SC | 4 | 10 MB
9.2.1 Upload Performance Results
To begin the multi-site evaluation, the 12 sample datasets listed in Table 6 were uploaded
into the multi-site MIDG. Three measurements were taken per dataset because the upload
dataflow consists of three major steps – conversion, transmission, and registration – which
together add up to the total upload time. These metrics are demonstrated in Figure 46
using the results from uploading a sample microCT dataset.
The conversion, transmission, and registration process for this microCT study resulted in
standard deviations of 1, 4, and 5 seconds, respectively, over five repeated tests. This 3%
margin of variability in the total microCT upload time shows that the MIDG upload
process is reproducible.
[Figure 46 plot: cumulative time for uploading a 461-image microCT scan of a rat animal model at the MIC site – start 0:00, conversion complete at 1:05, transmission complete at 2:28, registration complete at 2:45 (mm:ss).]
Figure 46: Upload Performance of a MicroCT Scan of Rat Animal Model. Five repeated
tests resulted in standard deviations of 1, 4, and 5 seconds for the conversion, transmission,
and registration steps, respectively.
Incoming imaging files are converted to DICOM format during the conversion step with
proper study metadata inserted into the header fields. Then, the transmission step sends
the converted DICOM image files from the GUI Web-Server to the local Grid Node
Server. Once the files arrive at the Grid Node Server, its DICOM storage service constructs and
sends a SOAP message to the Grid Manager’s study registration web-service to register
the files and location of the study. Note that the upload dataflow is almost entirely carried
out over a site’s LAN between user workstations, GUI Web-Server, and the local Grid
Node Server. Only during the registration step is a SOAP message sent to the Grid
Manager Server over WAN. Figure 47 shows these upload times obtained at the USC
MIC for the multi-site evaluation.
[Figure 47 chart data – upload performance at the MIC site, excluding the MicroCT studies (mm:ss):
Study: MicroPET 1, MicroPET 2, PET-CT 1, PET-CT 2, Optical 1, Optical 2, US 1, US 2, AR 1, AR 2
Conversion: 00:02, 00:01, 00:01, 00:05, 00:02, 00:04, 00:16, 00:04, 00:02, 00:04
Transmission: 00:03, 00:02, 00:02, 00:02, 00:02, 00:02, 00:22, 00:02, 00:02, 00:02
Registration: 00:02, 00:01, 00:01, 00:02, 00:01, 00:01, 00:03, 00:01, 00:02, 00:02
Total Upload: 00:07, 00:04, 00:04, 00:09, 00:05, 00:07, 00:41, 00:07, 00:06, 00:08]
Figure 47: Upload Performance Results for Studies Uploaded at the USC MIC
For datasets that were already in native DICOM format, transmission accounted for the larger
share of the total upload time, while datasets that required conversion to DICOM spent
more time being converted at the GUI Web-Server than being delivered to the Grid Node
Server. Upload times did not exceed 10 seconds for datasets smaller than 10 MB. However,
the ultrasound study ‘US 1’ took 41 seconds to upload because it was a 45 MB multi-frame
DICOM file. More sample datasets with greater size variability will be needed to determine
the relationship between upload time and dataset size or file count.
9.2.2 Download Performance Results
To evaluate data retrieval across the WAN, the study datasets previously uploaded at the
MIC site were downloaded to the IPILab using the MIDG GUI. The download dataflow
has two major steps – localization and delivery of files from a source Grid Node Server
to the requesting Grid Node Server over the GridFTP protocol, and then the DICOM send of
these files to the MIDG GUI Web-Server over the DICOM C-Move protocol. However,
the latter step is inconsequential compared to the first if the requested dataset is located at
a remote site and if the GUI Web-Server and Grid Node Server are deployed on a single
VM host server with 100 Mbps internal bandwidth. In the download results shown in
Figure 48, each dataset was retrieved twice to compare data retrieval over the WAN with
data retrieval over the LAN. The first attempt took significantly longer than the second
because delivery of the datasets required localization, initiation, and transmission across the
WAN. The second attempts were almost an order of magnitude faster because the studies
were being retrieved over the LAN from the Grid Node Server to the GUI Web-Server.
[Figure 48 chart data – download performance at the IPILab site, excluding the MicroCT studies (h:mm:ss):
Study: MicroPET 1, MicroPET 2, PET-CT 1, PET-CT 2, Optical 1, Optical 2, US 1, US 2, AR 1, AR 2
WAN: 0:01:08, 0:01:05, 0:00:46, 0:00:48, 0:00:49, 0:00:48, 0:01:27, 0:00:46, 0:00:51, 0:00:52
LAN: 0:00:06, 0:00:06, 0:00:04, 0:00:05, 0:00:05, 0:00:05, 0:00:05, 0:00:05, 0:00:11, 0:00:06]
Figure 48: Download Performance Results for Studies Downloaded at the USC IPILab. See
Figure 44 for Internet and WAN speed bandwidth.
9.2.3 Fault-Tolerance Performance Results
There must be at least two copies of a dataset in a fault-tolerant 3-site MIDG
implementation to evaluate the failover mechanism of the MIDG. At this point in the
evaluation, the requirement is met because the 12 datasets are available on both the MIC’s
and the IPILab’s Grid Node Servers. Fault-tolerance was tested by downloading a 130 MB
microCT study at the USC UTRC site through the UTRC’s GUI Web-Server. Three
download tests were performed to simulate three different scenarios: normal operation, Grid
Node Server failure, and data corruption. After each attempt, the UTRC’s Grid Node Server
was cleared of all local studies so that datasets had to be re-retrieved from a remote site
over the WAN. The first scenario established a baseline download performance without
any simulated failure scenarios. The second scenario simulated failure of a remote Grid
125
Node Server by shutting down the MIC’s Grid Node Server before attempting to
download the microCT study at the UTRC. The third scenario simulated data corruption
by manually deleting the MIC’s local study files before attempting to download the
microCT at the UTRC. The results, shown in Figure 49, demonstrate whether failure at
the Grid Node Server or the data storage creates a delay in download times at a third
MIDG site.
Multi-Site MIDG Model
Download Failover Performance @ UTRC Site
130 MB MicroCT Rat Scan with 461 Images (times in mm:ss)

Scenario 1: Both Sites Online           02:55
Scenario 2: Failure of a Grid Node      03:00
Scenario 3: Corrupted Dataset           04:45

Figure 49: Fault-Tolerance Results for a MicroCT Study Downloaded at the USC UTRC
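The failover behavior measured above follows the retrieval logic sketched below: the replica catalog is consulted, each registered Grid Node Server is tried in turn, and a delivered copy is accepted only if its checksum matches the registered value. The function and catalog names are illustrative assumptions, not the actual MIDG grid-service interfaces.

# Sketch of replica failover during a MIDG download (illustrative names only).
import hashlib
from pathlib import Path

def sha1_of(path):
    return hashlib.sha1(Path(path).read_bytes()).hexdigest()

def download_with_failover(dataset_id, replica_catalog, fetch, local_dir):
    """Try each registered replica until one delivers an intact copy.

    replica_catalog: maps dataset_id -> list of (grid_node_url, checksum)
    fetch: callable(grid_node_url, dataset_id, local_dir) -> local file path,
           raising ConnectionError if a Grid Node Server cannot be reached.
    """
    for node_url, expected_sha1 in replica_catalog[dataset_id]:
        try:
            local_path = fetch(node_url, dataset_id, local_dir)
        except ConnectionError:
            continue                      # Scenario 2: Grid Node Server offline
        if sha1_of(local_path) != expected_sha1:
            continue                      # Scenario 3: corrupted or missing files
        return local_path                 # Scenario 1: normal retrieval
    raise RuntimeError(f"No intact replica of {dataset_id} could be retrieved")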
9.3 Qualitative Impact on Pre-clinical Molecular Imaging Facilities
To determine how the MIDG design and implementation would impact preclinical
molecular imaging research, the multi-site MIDG implementation was presented to three
staff members at the USC Molecular Imaging Center - the laboratory manager, laboratory
supervisor, and research laboratory specialist. The sample datasets collected for
evaluation were uploaded and downloaded in the 3-site MIDG to demonstrate feasibility
and utility of the MIDG system. In follow-up discussions, the three USC MIC
participants presented various advantages and concerns of the MIDG system in their
feedback.
9.3.1 Laboratory Manager’s Feedback
Advantages: The MIDG GUI is advantageous for scheduling and billing of investigator
studies at the MIC because comprehensive preclinical study metadata can be recorded
and accessed through a single user interface. The study monitoring capability of the
MIDG GUI creates a summarized list of study metadata and completed datasets that can
easily be exported as a PDF report page.
Concerns: For imaging experiments with many scans, the study upload process would take a long time because detailed study metadata must be registered through the MIDG GUI.
Suggestions: Use templates for studies, sessions, groups, and scans for faster metadata
input of common experimental studies.
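As a hypothetical illustration of this suggestion, a template could pre-populate the descriptive fields that repeat across animals in a common protocol, so that only subject-specific values are entered at upload time; the field names below are examples and do not reflect the MIDG database schema.

# Hypothetical upload template for a recurring tumor-model microPET protocol.
# Field names are illustrative; only per-animal values would be typed in.
STUDY_TEMPLATE = {
    "study":   {"protocol": "FDG microPET tumor model", "pi": "<investigator>"},
    "session": {"tracer": "18F-FDG", "uptake_min": 60, "anesthesia": "isoflurane"},
    "group":   {"species": "mouse", "strain": "nude", "n_subjects": 8},
    "scan":    {"modality": "microPET", "duration_min": 10, "bed_position": 1},
}

def new_scan_record(template, subject_id, weight_g):
    """Merge a template with the few per-animal fields entered by the user."""
    record = {k: dict(v) for k, v in template.items()}   # copy each level
    record["scan"].update({"subject_id": subject_id, "weight_g": weight_g})
    return record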
9.3.2 Laboratory Supervisor’s Feedback
Advantages: The MIDG’s most beneficial impact is the secured sharing of study data
with the MIC’s remote campus investigators. By setting up the MIDG at remote
campuses, the MIC staff no longer has to create and keep track of FTP user accounts
because the MIDG has user-level access control over studies. Investigators can now log in, search, and download their preclinical datasets without requesting them from the MIC staff. Previously, finding and distributing preclinical imaging datasets to remote investigators required a turn-around time of 24 hours, and up to one week if the MIC staff was busy with scans. Since distribution of imaging datasets to remote investigators is not a primary service of the preclinical molecular imaging facility, automating the data distribution process with the MIDG gives imaging facility staff more time to conduct scans and gives investigators faster access to preclinical data.
The laboratory supervisor also mentioned that the MIDG can promote collaborative
research in preclinical molecular imaging by connecting two imaging facilities that may
have different imaging modalities. An investigator’s study may require multiple
modalities or analysis software at remote facilities. By federating data storage for these
types of studies, investigators can access and monitor their many datasets from a single
GUI at their office.
Concerns: Lack of support for raw data formats and post-processing files in the MIDG.
Suggestions: Implementation of algorithms in the system to support additional data
formats and post-processing files as needed.
9.3.3 Research Laboratory Specialist’s Feedback
Advantages: The MIDG’s ability to keep track of study files is especially beneficial as an
imaging facility’s workload and/or number of users increase. Although the staff have rarely been unable to retrieve a previous study’s data files from the archives, the task of finding and maintaining completed study datasets depends heavily on the memory of the imaging staff member who performed a particular study, and it can quickly become difficult when more than a handful of staff members are responsible for archiving. Currently, up to 2 hours are spent per study
on archiving the study’s data files. Depending on the size of the dataset, up to 30 minutes
are needed to identify the key files that need to be archived, such as raw acquisition files,
post-processed imaging files, and final distributed image files. Then, a range of 1 hour to
1.5 hours is needed to transfer these files from temporary storage to long-term storage
devices such as DVDs, external hard drives, and networked file servers. Although the MIDG workflow does not save time here, the automated attachment of study metadata to the uploaded data files handles a significant data-management responsibility and makes searching for files more accurate and much faster.
Chapter 10 discusses future work in this project to address these suggestions from the MIC staff, as well as the qualitative impact on investigators.
Chapter 10. CURRENT STATUS, DISCUSSION, AND FUTURE
PLANS
10.1 Current Project Status
In my research on the Molecular Imaging Data Grid (MIDG) beginning in 2007 at the
IPILab, I visited several molecular imaging facilities and interviewed both staff members
and investigators. Based on their input and my own thoughts of what a MIDG should be,
I have developed the concept, design, and implementation of the MIDG in order to
facilitate and impact preclinical molecular imaging operations. The MIDG system has a workflow that mirrors that of preclinical molecular imaging research institutions, and a web-based GUI. Evaluation has been performed using a laboratory model and a distributed multi-site model on the USC campuses.
The current Molecular Imaging Data Grid system supports data file formats including
DICOM, TIFF, PNG, JPEG, and PDF that are generated during preclinical molecular
imaging studies at the USC MIC using their 5 available imaging modalities - microCT,
microPET, ultrasound, optical imaging, and autoradiography. Non-DICOM image
formats and the PDF report documents are supported by way of conversion to DICOM
formats after being uploaded into the MIDG. The supported file formats and their corresponding modalities are listed in Table 5 of Chapter 8. Evaluation and performance tests using 2 sample datasets per modality were conducted and presented in Chapter 9. Table 10 shows the project timeline planned at my first oral presentation for the entire MIDG project, with accomplishments logged for each required task.
Table 10: MIDG Project Timeline
Tasks scheduled by semester from Summer 2007 through Fall 2010: Workflow Study, Data Model Design, Application Interfaces, Grid-Access Service, Grid-Middleware Services, Resource Configurations, Data Collection, In-Lab Testing, Multi-Site Deployment, Evaluation, Thesis Writing, and Oral Defense.
10.2 Discussion
10.2.1 Comparing Existing Data Grids in Healthcare Informatics
Over the past decade in healthcare, there have been a handful of national and
international efforts to realize grid-based systems in biomedical research involving
imaging data, such as Europe’s ActionGrid and the United States’ Biomedical Informatics Research Network (BIRN) [21]. The difference between the MIDG and these existing efforts lies in its application and project scope. The Molecular Imaging Data Grid applies data grid technology to preclinical molecular imaging facilities, a
specific biomedical imaging research community that has not been addressed before.
Furthermore, the scope of MIDG is focused on a cluster of preclinical molecular imaging
researchers, centered around a few, if not one, preclinical molecular imaging facility and
its affiliated investigator institutions. The scope of the MIDG is purposely kept small to
enable comprehensive customization for study metadata and supported file formats, and
to empower preclinical molecular imaging facilities to become imaging cores with
accessible historical datasets. Nonetheless, a common theme in these grid-based projects
is the need for data standardization, user-interfaces, metadata databases, grid-based data
delivery, and extendable infrastructure for multiple sites [22]. The Molecular Imaging Data Grid takes these challenges into consideration and creates a preclinical molecular imaging informatics infrastructure with a workflow, data model, and user interfaces that can readily be integrated into larger-scoped initiatives in the future.
10.2.2 Comparing Current Preclinical Molecular Imaging Informatics Methods
Previous work at other preclinical molecular imaging facilities has been done to facilitate
preclinical molecular imaging workflow by developing web-based data management
interfaces for staff and investigative users within their respective institutions. To name
several, at UCLA’s Crump Institute for Molecular Imaging, a web-based interface is
implemented on campus for investigators to schedule scan sessions in advance and
request their own datasets to be made available on university-wide fileservers [23]. The
physical data archive consists of network file servers that organize datasets under
individual investigator folders. At Case Western Reserve University, a web-based Multi-
modality Multi-resource Information Integration (MIMI) system has been developed to
integrate staff, investigator, and data workflows. Its functionality ranges from scheduling,
to data cataloging, to billing. The MIMI system also has a database for documenting user,
equipment, project, and billing information. However, they too tackle archiving and
retrieval using shared fileservers and investigator folders [24]. Retrieval of data files from these previous informatics solutions remains institutionalized and investigator-centric. Thus, off-campus access, contribution, and discovery of new or historical preclinical molecular imaging datasets are strongly discouraged by the current storage infrastructure [25].
As the value of inter-institutional collaboration and the volume of molecular imaging data generated in preclinical trials increase, the need for multi-institutional data-sharing infrastructure and study-centric data management is becoming more relevant. The MIDG addresses these challenges.
10.2.3 Discussion Summary
Data grid technology is an integrative informatics platform which has been used in many
research arenas for organizing and sharing large datasets among collaborating
institutions. Preclinical molecular imaging facilities can become imaging cores within a
multi-disciplinary research community, such that medical investigators, basic sciences
researcher, and medical imaging engineers can discovery, contribute, and manage
preclinical molecular imaging data remotely. In this research, I presented the Molecular
Imaging Data Grid (MIDG) to demonstrate a novel method for archiving and
disseminating preclinical molecular imaging data while complying with the DICOM
imaging standard and IHE XDS-i workflow profile. A multi-modality data model was
defined, and the system architecture of the Molecular Imaging Data Grid was presented. I
have deployed a three-site research test-bed within the University of Southern California
to evaluate the Molecular Imaging Data Grid system based on data provided by the USC
Molecular Imaging Center. Evaluation has been performed in both laboratory and
distributed environments to measure quantitative performance times for archiving and
retrieving imaging study datasets from the Molecular Imaging Data Grid. By building
upon the features and services of grid technology, DICOM imaging standards, and IHE workflow profiles, the accessibility of disparate animal-model molecular imaging datasets for users outside a molecular imaging facility’s LAN can be improved. The productivity and efficiency of research by translational sciences investigators would thereby be improved through a streamlined experimental dataflow. In addition, the MIDG allows for data mining and content-based information retrieval for further knowledge discovery, which would be unachievable without the concept and design of the MIDG.
10.3 Future Research and Development Opportunities
The next steps in this research are to continue evaluation of the multi-site MIDG model
using live molecular imaging studies at the USC Molecular Imaging Center to improve
system robustness across all currently supported imaging data formats. Furthermore,
affiliated molecular imaging research sites that consistently utilize the USC Molecular
Imaging Center may be added to the current 3-site USC implementation to form a larger
molecular imaging research community and to promote collaboration and data-sharing
outside of the USC community.
Development of the MIDG GUI and data grid infrastructure can also be continued to
address the concerns mentioned in the evaluations by the MIC staff. The current upload
workflow in the GUI can be expedited using pre-defined templates for common
preclinical imaging studies such that study, session, group, or scan descriptions do not
have to be re-entered by users for multi-animal imaging experiments. The GUI can also
be extended for scheduling and billing at molecular imaging facilities using its current
metadata database. The data grid infrastructure can be further developed to address
current limitations in data format support, namely raw image acquisition data files and
proprietary post-processing files. By implementing an alternative data transfer protocol
between the MIDG GUI Web-Server and the Grid Node Server, these non-DICOM files and their affiliated metadata can be archived into the MIDG.
Another tangible parallel project would be to utilize the existing technology of radiology
PACS and the GUI component from the MIDG to bridge current clinical imaging
vendors with preclinical research applications. The IPILab and MIC have received grant
funding from The Los Angeles Basin Clinical and Translational Science Institute to pilot
a feasibility project entitled, “Development of a novel imaging and informatics platform
for translational small animal and molecular imaging research.” The objectives are to
build a DICOM conversion gateway with GUI capability that receives preclinical
imaging datasets and stores them in an attached clinical PACS. The bigger picture is to
prompt current PACS vendors to extend their market share and resources into molecular
imaging research fields, which have similar imaging data content, study workflow, and
billing procedures.
As presented in Chapter 1, grid technologies can be categorized into a data grid and a
computational grid. With intensive image post-processing algorithms being used in pre-
clinical molecular imaging research, the Molecular Imaging Data Grid can be interfaced with a computational grid that exchanges pre- and post-processed imaging data with a remote site. Currently, there are many vendor-provided grid computing infrastructures that can run grid-based post-processing software, such as Amazon’s Elastic Compute Cloud (EC2) [26]. The MIDG GUI can be further developed into a web portal that can
automate post-processing requests between the MIDG and external grid computing
infrastructures.
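Such a portal could, for example, hand a dataset reference and an algorithm name to an external compute service over HTTP and poll for the processed result; the endpoint URL, payload fields, and job semantics in the sketch below are purely hypothetical.

# Hypothetical sketch: the MIDG GUI portal submits a post-processing job to an
# external grid/cloud compute service and polls until the result is ready.
# The endpoint, payload, and response fields are assumptions for illustration.
import time
import requests

COMPUTE_API = "https://compute.example.org/api/jobs"   # hypothetical endpoint

def submit_postprocessing(dataset_uid, algorithm="pet_kinetic_model"):
    job = requests.post(COMPUTE_API, json={
        "dataset_uid": dataset_uid,        # MIDG dataset to process
        "algorithm": algorithm,            # grid-based post-processing routine
        "return_to": "gsiftp://gridnode-ipilab.usc.edu:2811/midg/processed/",
    }, timeout=30).json()

    while True:                            # simple polling loop
        status = requests.get(f"{COMPUTE_API}/{job['id']}", timeout=30).json()
        if status["state"] in ("finished", "failed"):
            return status
        time.sleep(60)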
Moreover, although DICOM compliance in the Molecular Imaging Data Grid creates a
standardized image output of molecular imaging datasets for external DICOM-compliant
distribution, analysis, and viewing software tools, the challenge of converting all input
molecular imaging data files to DICOM may eventually become too complex, if not futile, if vendors persist in using proprietary data formats for raw acquisition files. Modality, analysis, and post-processing software vendors in the molecular imaging field are gradually recognizing a need for vendor-agnostic interoperability of their output imaging data, but DICOM may not be the only standard format used in the future. For this reason, a future MIDG design may forgo data format normalization responsibilities by replacing the DICOM conversion middleware with direct uploading and downloading web services. In this design, incoming and outgoing datasets will require a method other than DICOM to correlate study-centric metadata with a dataset’s physical imaging files. A GUI will still be needed to input study, session, group, and scan information, but performance can be improved by consolidating the MIDG Database with the Grid Manager Database, and the MIDG Web-Server with each Grid Node Server. The basic infrastructure and workflow would remain similar to the current MIDG design, but services regarding data context and processing would be optimized with XML-based manifest metadata files.
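One possible form of such a manifest is sketched below using Python's standard xml.etree.ElementTree module: a per-dataset XML file carrying the study-centric metadata alongside the list of physical files and their checksums. The element names are illustrative assumptions rather than a defined MIDG schema.

# Sketch of an XML manifest correlating study-centric metadata with the
# physical files of one dataset (element names are illustrative only).
import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path

def build_manifest(dataset_dir, study_meta, out_path="manifest.xml"):
    root = ET.Element("midg_dataset")
    ET.SubElement(root, "study", study_meta)         # all values as strings
    files = ET.SubElement(root, "files")
    for path in sorted(Path(dataset_dir).glob("**/*")):
        if path.is_file():
            ET.SubElement(files, "file", {
                "name": str(path.relative_to(dataset_dir)),
                "bytes": str(path.stat().st_size),
                "sha1": hashlib.sha1(path.read_bytes()).hexdigest(),
            })
    ET.ElementTree(root).write(out_path, encoding="utf-8", xml_declaration=True)

# Example: build_manifest("/midg/cache/US1",
#                         {"id": "STUDY-0042", "session": "S1", "scan": "US"})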
Lastly, the data grid infrastructure of the MIDG can be utilized in other biomedical
informatics applications such as the Breast Imaging Data Grid (BIDG), which interconnects imaging data and radiologists from multiple breast cancer screening institutions to enable real-time tele-radiology diagnostics. I am currently involved in developing a web-
based electronic patient record (ePR) system for breast cancer screening patients with
multiple breast imaging studies including dedicated breast MRI. The concept is to
integrate the ePR application layer interface with the DICOM-compliant data grid
infrastructure used in the MIDG. Together, this system is called the BIDG ePR for multi-
modality breast imaging studies.
BIBLIOGRAPHY
[26] Amazon Elastic Compute Cloud (Amazon EC2) [cited 18 February 2010] Available from:
http://aws.amazon.com/ec2/
[25] Anderson N, Lee E, Brockenbrough J, et al (2007) Issues in Biomedical Research Data
Management and Analysis: Needs and Barriers. Journal of the American Medical
Informatics Association. 14:478-488. doi: 10.1197/jamia.M2114
[21] Biomedical Informatics Research Network. About: Overview [cited 20 Oct 2009] Available
from: http://www.birncommunity.org
[7] DICOM Standard, Part 3: Information Object Definition. PS 3.3-2009, pp.105 [cited 15 July
2009] Available from: http://medical.nema.org/
[19] Erl T (2005) Service-Oriented Architecture (SOA): Concepts, Technology, and Design.
Prentice Hall. ISBN-10: 0131858580
[8] Evans D. A Very Basic DICOM Introduction. [cited 15 Oct 2010] Available from:
http://www.dcm4che.org/confluence/display/d2/A+Very+Basic+DICOM+Introduction
[22] Flanders AE (2009) Medical Image and Data Sharing: Are We There Yet? RadioGraphics.
29(5):1247-1251. doi: 10.1148/rg.295095151
[5] Foster I (2006) Globus Toolkit Version 4: Software for Service-Oriented Systems. J.
Comput. Sci. & Technol. Vol 21. No 4. pp 513-520.
[1] Foster I (2005) Service-Oriented Science. Science Magazine. Volume 308.
[2] Foster I, Kesselman C, Nick J, Tuecke S (2002) The Physiology of the Grid: An Open Grid
Services Architecture for Distributed Systems Integration [cited 17 July 2009] Available
from: www.globus.org/alliance/publications/papers/ogsa.pdf
[18] Foster I, Kishimoto H, Savva A, et al. The Open Grid Services Architecture Version 1.0.
[cited 11 Nov 2009] Available from: http://www.gridforum.org/documents/GFD.30.pdf
[6] Global Grid Forum (2005) Open Grid Services Architecture, version 1.0. Available from:
http://www.gridforum.org/documents/GWD-I-E/GFD-I.030.pdf
[14] The Globus Toolkit. Data Management: Key Concepts. [cited 18 Sept 2009] Available
from: http://www-unix.globus.org/toolkit/docs/4.0/data/key/index.html
[15] A Globus Primer. [cited 19 Oct 2010] Available from: http://www-
unix.globus.org/toolkit/docs/4.0/key/GT4_Primer_0.6.pdf
[11] Huang HK (2008) Utilization of Medical Imaging Informatics and Biometrics Technologies
in Healthcare Delivery. Int J CARS. 3:27-39. doi: 10.1007/s11548-008-0199-4
[3] Huang HK, Zhang A, Liu BJ, et al (2005) Data Grid for Large-Scale Medical Image
Archive and Analysis. Proceedings of the 13th ACM International Conference on
Multimedia. pp 1005-1013. doi: 10.1145/1101149.1101357
[10] IHE Radiology Technical Framework Supplement 2005-2006 [cited 12 Oct 2010]
Available from: http://www.ihe.net/Technical_Framework/upload/IHE_RAD-
TF_Suppl_XDSI_TI_2005-08-15.pdf
[16] Lee J, Documet J, Liu BJ, Park R, Tank A, Huang HK (2010) MIDG-Emerging Grid
Technologies for Multi-Site Preclinical Molecular Imaging Research Communities.
International Journal of Computer Assisted Radiology and Surgery. doi: 10.1007/s11548-
010-0524-6
[13] Lee J, Ma K, Liu BJ (2008). Assuring Image Authenticity within a Data Grid Using
Lossless Digital Signature Embedding and a HIPAA-Compliant Auditing System.
Proceedings of the SPIE, Volume 6919, pp. 69190O.
[12] Lee J, Zhou Z, Talini E, Ma K, Liu BJ, Huang HK (2007). A Data Grid for Enterprise
PACS Tier 2 Storage and Disaster Recovery. RSNA 2007 Educational Exhibit, LL-
IN5226-B.
[9] Mendelson D, Bak P, Menschik E, Siegel E (2008) Image Exchange: IHE and the Evolution
of Image Sharing. Radiographics, Vol 28, pp.1817-1833. doi: 10.1148/rg.287085174
[23] Stout DB, Chatziioannou AF, Lawson TP, et al (2005) Small Animal Imaging Center
Design: the Facility at the UCLA Crump Institute for Molecular Imaging. Mol Imaging
Bio. 7(6):393-402. doi: 10.1007/s11307-005-0015-2
[24] Szymanski J (2008) An Integrated Informatics Infrastructure for Pre-clinical Research-IT
Support. Unpublished Ph.D Thesis, Case Western Reserve University.
[20] VMware, Inc. VMware Server: Getting Started with Virtualization Risk-Free.[cited 25 Oct
2010] Available from: http://www.vmware.com/products/server/
[17] Zhang J, Zhang K, Yang Y, et al (2010) Grid-based implementation of XDS-I as part of
image-enabled EHR for regional healthcare in Shanghai. Int J. CARS. doi:
10.1007/s11548-010-0522-8.
[4] Zhou M, Lee J, Huang HK, et al (2007) A Data Grid for Imaging-based Clinical Trials.
SPIE Medical Imaging. Vol.6516.
APPENDIX: AUTHOR PUBLICATIONS AND PRESENTATIONS
Peer-reviewed Publications
1. Lee J, Documet J, Liu BJ, Park R, Tank A, Huang HK. (2010) MIDG-Emerging Grid
Technologies for Multi-Site Preclinical Molecular Imaging Research Communities.
International Journal of Computer Assisted Radiology and Surgery. doi:
10.1007/s11548-010-0524-6
2. Gutierrez MA, Lage S, Lee J, Zhou Z. (2008) A Computer-Aided Diagnostic System
using a Global Data Grid Repository for the Evaluation of Ultrasound Carotid
Images. CCGRID 2007: Proceedings of the Seventh IEEE International Symposium
on Cluster Computing and the Grid, pp.840-845.
3. Guo B, Documet J, Lee J, Liu BJ, King N, Shrestha R, Wang K, Huang HK, Grant E.
(2007) Experiences with a Prototype Tracking & Verification System Implemented
within an Imaging Center. Journal of Academic Radiology, 14(3), 270-278.
Conference Proceedings Papers
1. Lee J, Documet J, Liu BJ. (2010) Data Migration and Persistence Management in a
Medical Imaging Informatics Data Grid. Proceedings of the SPIE, Volume 7628, pp.
762812.
2. Lee J, Gurbuz A, Liu BJ. (2010) An Investigator-centric Data Model for Organizing
Multimodality Images and Metadata in Small Animal Imaging Facilities. Proceedings
of the SPIE, Volume 7628, pp. 76280O.
3. Lee J, Gurbuz A, Dagliyan G, Liu BJ. (2009) A Virtualized Infrastructure for
Molecular Imaging Research using a Data Grid Model. Proceedings of the SPIE,
Volume 7264, pp. 726417.
4. Lee J, Ma K, Liu BJ. (2008) Assuring Image Authenticity within a Data Grid Using
Lossless Digital Signature Embedding and a HIPAA-Compliant Auditing System.
Proceedings of the SPIE, Volume 6919, pp. 69190O.
5. Lee J, Le A, Liu BJ. (2008) Integrating DICOM Structure Reporting (SR) into the
Medical Imaging Informatics Data Grid. Proceedings of the SPIE, Volume 6919, pp.
691904.
6. Guo B, Zhang Y, Documet J, Lee J, Liu BJ, Shrestha R, Wang K, Huang HK. (2007)
Comparison of fingerprint and facial biometric verification technologies for user
access and patient identification in a clinical environment. Proceedings of the SPIE,
Volume 6516, pp. 65160Y.
7. Gutierrez MA, Lee J, Zhou Z, Pilon PE, Lage SG. (2007) Utilization of a Global
Data Grid Repository in CAD Assessment of Carotid Wall Thickness. Proceedings of
the SPIE, Volume 6516, pp. 651614.
8. Zhou Z, Chao S, Lee J, Liu BJ, Documet J, Huang HK. (2007) A Data Grid for
Imaging-based Clinical Trials. Proceedings of the SPIE, Volume 6516, pp. 65160U.
9. Lee J, Zhou Z, Talini E, Documet J, Liu BJ. (2007) Design and Implementation of a
Fault-Tolerant and Dynamic Metadata Database for Clinical Trials. Proceedings of
SPIE, Volume 6516, pp. 65160S.
10. Lee J, Liu BJ, Documet J, Guo B, King N, Huang HK. (2006) Technical Experiences
of Implementing a Wireless Tracking and Facial Biometric Verification System for a
Clinical Environment. Proceedings of SPIE, Volume 6145, pp.61450U.
11. Liu BJ, Documet J, Chao S, Lee J, Lee M, Topic I, Williams L. (2005)
Implementation of an ASP Model Offsite Backup Archive for Clinical Images
Utilizing Internet2. Proceedings of SPIE, Volume 5748, pp. 224.
Conference Presentations
1. Lee J, Wang K, Liu BJ. (2009) Pre-clinical Implementation and Disaster Recovery
Evaluation of a Medical Imaging Informatics Data Grid Used as a Tier 2 Enterprise
PACS Back-up Solution at the USC Academic Medical Center. RSNA 2009
Educational Exhibit, LL-IN3052.
2. Liu BJ, Documet J, Law MY, Lee J, Hong X, Moin P, Ma K, Huang HK. (2009) An
Image-Intensive Breast Cancer Data Grid Infrastructure for Data Mining and
Outcomes Research. RSNA 2009 Educational Exhibit, LL-IN3550.
3. Lee J, Guo B, Liu BJ, Wang K. (2008) Optimization and management of User
Registration and HIPAA-Compliance in a Clinical PACS Environment by Integration
with a HIPAA-Compliant Auditing Toolkit. RSNA 2008 Educational Exhibit, LL-
IN1052.
4. Lee J, Le A, Liu BJ. (2008) Integration of Content-based DICOM-SR for CAD in the
Medical Imaging Informatics Data Grid with Examples in CT Chest, Mammography,
and Bone-Age Assessment. RSNA 2008 Educational Exhibit, LL-IN1123.
5. Documet J, Liu BJ, Wang K, Lee J. (2008) A Radiology Dashboard Integrated with
RFID Location System. RSNA 2008 Educational Exhibit, LL-IN1116.
6. Talini E, Lee J, Zhou Z, Caramella D, Huang HK. (2007) A Real Case PACS
Enterprise in Italy: Feasibility of a Data Grid Solution for Backup and Disaster
Recovery with a Web-based Centralized Management System. RSNA 2007
Educational Exhibit, LL-IN6881.
7. Lee J, Zhou Z, Liu BJ, Brown M, Guo B, Documet J. (2007) Assuring Image
Security within a Data Grid for Image-based Clinical Trials Using Lossless Digital
Signature Embedding and a HIPAA-compliant Auditing System. RSNA 2007
Educational Exhibit, LL-IN6883.
8. Lee J, Zhou Z, Talini E, Ma K, Liu BJ, Brown M. (2007) A Fault-tolerant Metadata
Database Design for Sharing Quantitative Results in Image-Based Clinical Trials
Based on the IHE XDS Profile. RSNA 2007 Educational Exhibit, LL-IN5250-B.
9. Lee J, Zhou Z, Talini E, Ma K, Liu BJ, Huang HK. (2007) A Data Grid for
Enterprise PACS Tier 2 Storage and Disaster Recovery. RSNA 2007 Educational
Exhibit, LL-IN5226-B.
10. Zhou Z, Liu BJ, Huang HK, Documet J, Brown M, Lee J. (2006) A Data Grid for
Imaging-based Clinical Trials. RSNA 2006 Educational Exhibit, LL-IN3104.
11. Guo B, Documet J, Lee J, Zhang Y, Liu BJ, Huang HK. (2006) Comparison of
Fingerprint, Iris, and Facial Biometric Verification Technologies for User Access
and Patient Identification in a Clinical Environment. RSNA 2006 Educational
Exhibit, LL-IN3105.
12. Documet J, Liu BJ, Zhou Z, Lee J. (2006) A Fault-Tolerant Metadata Database
Model Design in a Medical Image Data Grid for a DICOM Radiation Therapy
Information System (RTIS). RSNA 2006 Educational Exhibit, LL-IN2603.
ABSTRACT
Molecular imaging is a relatively new field in medical imaging research that has been propagating research discoveries in biology, medicine, disease studies, proteomics, and radiopharmaceutical development by using in-vivo biomarkers to visualize and quantify cellular and molecular content and activity. Small animal imaging facilities throughout medical research institutions have been growing in the number of investigator studies as well as in image data volume per study. To optimize utilization of pre-clinical molecular imaging data in translational sciences research, a multi-modality Molecular Imaging Data Grid (MIDG) has been designed to address challenges in data archiving, management, and sharing among multi-site or multi-institution research consortiums.