A Medical Imaging Informatics Based Human Performance Analytics System

By
Sneha K. Verma

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY
(BIOMEDICAL ENGINEERING)
May 2019
Dedicated to
My parents,
Late Suresh Chandra Verma
Mrs. Manorama Verma
My elder brother Capt. Kirti Kumar Verma
My elder sister Dr. Rashmi Verma
Table of Contents
Abstract ............................................................................................................................................................... 5
Chapter 1. Introduction ................................................................................................................................... 7
Human performance analysis .................................................................................................................... 7
Use of technology in human performance analysis for performance improvement and injury prevention ........ 10
Current workflow for biomechanics investigation and its challenges ...................................................... 13
Parallels with medical imaging and need for informatics systems .......................................................... 15
Research Aims and Scope ........................................................................................................................ 18
Summary .................................................................................................................................................. 19
Chapter 2. Background and Significance ........................................................................................................ 20
Human Performance Analysis and its current limitations. ....................................................................... 20
Baseline workflow .................................................................................................................................... 22
Factors affecting workflow ...................................................................................................................... 25
Medical Imaging informatics ................................................................................................................... 28
2.4.1. Medical imaging informatics infrastructure ........................................................................................ 28
2.4.2. Electronic patient records ................................................................................................................... 28
2.4.3. Digital Imaging and Communications in Medicine (DICOM) ................................................................. 29
2.4.4. Data warehousing ............................................................................................................................... 31
2.4.5. Structured reporting ............................................................................................................. 32
2.4.6. Data Visualization for imaging and clinical data ................................................................................. 33
Summary .................................................................................................................................................. 34
Chapter 3. Conceptual Design of Medical Imaging Based System for Human Performance Analytics and
system architecture ............................................................................................................................................ 35
Multilayer infrastructure for human performance analytics ................................................................... 35
3.1.1. Layer 1: Data sources .......................................................................................................................... 36
3.1.2. Layer 2: Functional layer ..................................................................................................................... 36
3.1.3. Layer 3: Modular Layer ....................................................................................................................... 37
3.1.4. Layer 4: Conceptual Layer ................................................................................................................... 38
System Architecture Design and Implementation .................................................................................... 40
3.2.1. Data Gateway ...................................................................................................................................... 40
3.2.2. Data Storage ........................................................................................................................................ 41
3.2.3. Data processing engines ..................................................................................................................... 42
3.2.4. User interface ...................................................................................................................................... 42
Summary .................................................................................................................................................. 43
Chapter 4. Design and development of informatics system for self-reported and multimedia data
management in rehabilitation ............................................................................................................................ 44
Background .............................................................................................................................................. 44
Challenges ................................................................................................................................................ 45
Informatics system design and deployment ............................................................................................ 47
Evaluation ................................................................................................................................................ 49
4.4.1. Modified (Simplified) paper questionnaire ......................................................................................... 50
4.4.2. Parser for fragmented data sources ................................................................................................... 51
4.4.3. Data preprocessing (Validation, Normalization, Cleaning, Annotating, and Synchronizing) .............. 52
4.4.4. Web portal for data collection ............................................................................................................ 53
4.4.5. Centralized data storage ..................................................................................................................... 53
4.4.6. Knowledge discovery & Decision support ........................................................................................... 53
Chapter 5. Design and development of informatics system for knowledge curation in collaborative research
environment ……………………………………………………………………………………………………………………………………………….... 55
Background .............................................................................................................................................. 55
Informatics system design and deployment ............................................................................................ 55
Evaluation ................................................................................................................................................ 57
5.3.1. Data Structuring .................................................................................................................................. 58
5.3.2. Data preprocessing (Validation, Cleaning, Annotating, and Synchronizing) ....................................... 59
5.3.3. Centralized data storage with access control ..................................................................................... 60
5.3.4. Data sharing for teaching and collaboration ....................................................................................... 60
Chapter 6. Results ......................................................................................................................................... 61
Chapter 7. Discussion & Future work ............................................................................................................ 68
References ......................................................................................................................................................... 71
Publications and Presentations .......................................................................................................................... 73
Abstract
With recent technological advances in the field of human performance monitoring and analysis,
the amount of data generated, and the speed at which it is generated, have increased exponentially.
However, with the current workflow of human performance analysis, there is a gap between the
amount of data recorded and the amount of knowledge extracted and utilized in terms of real-time
feedback and interventions. One solution to bridge this gap is the development of an informatics
system based on medical imaging informatics concepts. Such a system can handle challenges
including data storage, data sharing, data standardization, and data security, and can provide a
framework for developing data processing and knowledge extraction solutions that is scalable and
practical in a research setting and meets decision-support and feedback requirements.
In this dissertation, I present a discussion of the current state of the human performance analytics
workflow, the factors that impact this workflow, and how knowledge of medical imaging informatics
can be leveraged to develop a robust solution moving forward. For over a decade, the
medical imaging informatics infrastructure (MIII) has been used as a foundation for applying
imaging informatics in various application areas within clinical departments.
In my work, I develop a re-conceptualized MIII to meet the specific requirements
of human performance analytics. This is one of the first times that this approach has been expanded
to a new frontier area of research outside Radiology and related imaging-based clinical
departments. This modification is needed because the biomechanics community is unique in terms of
its end users, current workflows, and research study data models, yet it requires similar
management and visualization of large-scale multimedia data as traditional imaging informatics.
Based on the modified infrastructure, I implemented several solutions/tools for supporting
biomechanics research. I implemented and tested these solutions in two different use-case
scenarios, using various data sets and different users, to show the versatility of the newly proposed
infrastructure. The initial impact of having these functionalities is measured by comparing the
workflow and the predicted time difference in task completion with and without the informatics
system.
Chapter 1. Introduction
Human performance analysis
Human performance analytics involves analyzing how a human body interacts with its
environment to accomplish certain tasks. Studying the interactions of the neuromuscular and
musculoskeletal systems with the environment while performing a task is crucial to drawing
conclusions about what an individual is currently doing to accomplish that task. Human
performance analytics covers a vast landscape of applications, from preventing injury to improving
performance, and a broad population, ranging from clinical patients (such as manual wheelchair
users) to high-performing elite athletes (such as volleyball players). Despite this breadth, common
threads exist across the landscape. For example, in the manual wheelchair user population, a typical
task under study is performing activities of daily living while interacting with the wheelchair.
Similarly, for elite athletes such as volleyball players, studied tasks include, but are not limited to,
landing and takeoff. To analyze these tasks, a variety of multimedia devices are used to capture
data. For example, by recording wheelchair propulsion with a high-speed camera, observations can
be made about propulsion technique, one aspect of wheelchair use believed to be associated with
upper-limb overuse injury [1], [2]. Likewise, for volleyball players, analyzing the interaction
between a player's body and the landing surface or environment is critically important for the
development of proper technique. While performance metrics and movement analytics are
important, additional athlete-specific contextual information is extremely valuable for coaches
when designing an individual training program for an athlete. Contextual data often includes, but
is not limited to, data about behavior (e.g., frequency of workout sessions) and physiology (e.g.,
heart rate).
Contextual data can include current and historical, quantitative or qualitative
information. However, this contextual information is often challenging to record, share, and
analyze because it is fragmented across different sources. Therefore, there is a need for an
integrated informatics platform where coaches, athletes, and researchers within the biomechanics
community can collect and analyze data to gain a better understanding of performance and track
results longitudinally. For performance improvement and to lower the risk of injury during any
given task, intervention and feedback tools can be used to educate coaches, athletes, physical
therapists, or researchers to make knowledge-driven decisions. Currently, the basic methodology
for improving human performance in a research setting includes execution of the following stages,
as shown in Figure 1.1 [1]:
• Identification of current state
It is important to investigate exactly what an individual is doing to accomplish a certain task. For
example, this may include knowledge of shoulder loading at different speeds in wheelchair
populations, or how much sleep a player had before a competition or training.
• Investigation of interaction with environment
Various environmental factors affect the way the neuromuscular and musculoskeletal systems
interact while performing a task, for example, the fitting of a wheelchair to a person's needs or the
properties of the surface for track and field athletes. It is crucial to gain knowledge about this
interaction, and this knowledge can be extracted by repeated experimentation and observation.
• Simulation for understanding possible solutions and limitations
Broadly, there are two ways to affect task outcome. First, by changing the environmental factors
such as assistive devices like wheelchair fitting. Second, by changing the person and his or her
movement, either ergonomically or physiologically. Simulations are used to visualize and estimate
the effects of these changes.
• Feedback or intervention
Any change in movement technique or interaction with the environment requires meaningful,
accurate, reliable, timely, and understandable feedback that can be inserted into the existing
workflow.
Figure 1-1 Four stages defined for human performance data research
For the implementation of the above-mentioned key steps, a generalized workflow for performance
and technique improvement can be inferred, as shown in Figure 1.2. With recent advancements in
technology, there are various ways to implement this general workflow. The next section presents
a short discussion of the technology being used for human performance monitoring, performance
improvement, and injury prevention.
Figure 1-2 Generalized workflow for human performance research
Use of technology in human performance analysis for performance
improvement and injury prevention
Human performance analytics requires a better understanding of the challenges and limitations of
an athlete in the field or a wheelchair user in daily living conditions. To gain an understanding of
cause and effect relationships, various devices are used as means of data collection. With recent
advances in technology, there are many options available on the market for recording human
performance data. Each data source has its limitations, which will be discussed further in Chapter
2. These data sources are summarized in Figure 1.3 and can be broadly classified as follows:
• Videography
The primary method used for recording and studying human performance is digital
videography. Based on the research question, one of the following is selected for recording
visual evidence: (I) Motion capture system: Records the movements of objects or people using
reflective markers. Optical systems use data captured from image sensors to triangulate the 3D
position of a subject between two or more cameras calibrated to provide overlapping projections.
(II) High-speed/high-definition camera: Records high-speed movement as photographic
images onto a storage medium, with the ability to play back in slow motion. High-speed cameras
capture moving images with exposures of less than 1/1,000 second or frame rates greater than 250
frames per second; high-definition video is classified as video with more than 480 horizontal lines.
(III) Action camera: Records actions while being immersed in them. Action cameras are compact
and rugged. They record photos in burst and time-lapse modes, allowing recording over long
periods.
• Instrumented Devices/Surfaces
For quantitative analysis of movement patterns and to highlight potential risk factors, particularly
in high-impact activities, force or pressure is measured using: (I) Force plates: Most force
measurements in human performance research use a force plate, which measures the components
of the contact force between the performer and the ground (called the ground contact force) or
another surface. The measured reaction force exerted on the performer by the force plate has the
same magnitude as, but opposite direction to, the force the performer exerts on the plate. (II)
Instrumented hand rims for the wheelchair: For measuring the forces and moments applied to
the push rim during wheelchair propulsion, a 3-D force- and torque-sensing push rim is used.
• Self-reported Data
To add contextual information such as sleep, nutrition, and hydration, as well as to monitor the
impact of interventions, self-reported data is used. Ways of collecting such information include:
(I) Paper-based questionnaires: The traditional and well-established method in health institutes
for collecting survey data, generally completed by participants themselves. (II) Electronic
webpages and applications: Electronic versions of the survey save time in data review; with the
increasing use of the Internet, web-based questionnaires are becoming widely used alternatives.
(III) Interviews: Interviews and observations, from participants or the data collection team,
provide helpful information.
• External data sources
These are associated data such as healthcare records or performance records, often collected and
managed outside the research group. Examples include electronic medical records or performance
history.
• Wearables
Wearable sensors provide a method of monitoring real-time physiologic and movement parameters
during training and competition, as well as a way to capture data outside the lab. In human
performance research, the following wearable sensors are often utilized [3]: (I)
Accelerometer/gyroscope: These devices are composed of two components: a mechanical
movement-sensing device and a microchip that interprets signals from the mechanical device.
Technological progress and the development of microelectromechanical systems (MEMS) devices
have allowed multiple transducers to be packaged together, giving a single sensor the ability to
perceive movement in multiple dimensions. Accelerometers are used to estimate energy
expenditure, a crucial parameter for assessing the intensity of a training regimen. (II)
Pedometer: The simplest form of movement sensor, used for step-count monitoring purposes in
ambulatory settings. (III) Global Positioning System (GPS): These devices are an alternative to
the accelerometer for measuring positional data in athletics. They require signal transmission from
multiple GPS satellites orbiting the earth and have been used to monitor the speed and position of
athletes in outdoor sports. (IV) Heart rate monitors: Heart rate is a useful indicator of
physiological adaptation and intensity of effort. Standard heart rate monitors consist of a transducer
worn around the chest that transmits to a wireless wrist display. Heart rate monitors have also been
used with video data to determine the physiologic response and metabolic demand experienced
during competition in a number of sports. (V) Temperature monitors: These sensors are often
used to gather information about core body temperature where hyperthermia is a concern. (VI)
Integrated sensors: Multimodal integrated sensors have been developed for use in team and
individual fitness activities, with a number of variable sensing elements to obtain physiologic and
movement profiles in athletes.
Figure 1-3 Data sources used for performance monitoring
Current workflow for biomechanics investigation and its challenges
To accomplish the four stages of performance analysis discussed in Section 1.1, a linear
workflow is used, as shown in Figure 1.2. In this workflow, data from the different sources
mentioned in the previous section is gathered and analyzed. However, data collected from these
sources presents various unique challenges, which cause delays in data gathering, processing,
and utilization in a linear workflow [4]. These challenges can be broadly classified into the
following categories, as shown in Figure 1.4:
Figure 1-4 Challenges
• Unstructured data
In order to process large amounts of data collected in human performance analysis, a data model
is required to apply batch processing techniques to increase efficiency. The lack of a data model
and structure in complex datasets is a common problem when isolated data sources are used.
• Unsynchronized data
Information collected using multiple sources provides the basis to generate visual evidence in a
contextual manner. However, the data coming from different sources needs to be synchronized on
a common scale (time series) before interpretations can be made.
• Integrated data storage
Data collected from multiple sources operating on isolated systems needs to be stored in an
efficient manner. It is very difficult for researchers to search for information retrospectively when
complex datasets start to build up as the study progresses.
• Data presentation
When a large amount of data is collected from multiple sources at high speed and in high volume,
knowledge extraction techniques are required so as not to overwhelm the end users. To represent
this knowledge, visualization techniques are needed that allow data to be compressed into a visual
representation while preserving its knowledge content.
Parallels with medical imaging and need for informatics systems
In order to solve the above-mentioned challenges of human performance data collection, analysis,
and utilization, many concepts developed in medical imaging informatics can be used. As shown
in Figure 1.5, there are many similarities between the challenges that currently exist in human
performance analytics and the ones addressed by the medical imaging informatics community over
the past several decades. For example:
• Medical imaging data are collected using various vendor-based hardware systems and need
standardization through the Digital Imaging and Communications in Medicine (DICOM)
standard. Similarly, human performance data is gathered using many hardware
configurations, manufactured by different vendors, and also needs standardization.
• A medical imaging informatics system provides infrastructure for data sharing within a
single institute and across multiple institutes. A similar collaboration is required for human
performance data analysis with end users as well as collaborating institutions.
• Both medical imaging data and human performance data are large volumes of imaging and
multimedia data.
Figure 1-5 Overview of challenges that are addressed by imaging informatics community
Therefore, the experience and skills gained from the development of decision support systems
within imaging informatics can be further leveraged in a new frontier area of research that
requires multimedia data, such as human performance analytics. The medical imaging informatics
infrastructure (MIII) has been developed and widely used in many clinical and research
applications to utilize PACS images and related data for large-scale horizontal and longitudinal
clinical service, research, and education. The MIII components and their logical relationships are
shown in Figure 1.6. In short, MIII provides a framework for solutions to the challenges mentioned
above, such as data storage, data integration, and data visualization. MIII includes five layers [5]:
• First Layer: Addresses data sources, which in the case of medical imaging are connections
to medical imaging databases.
• Second Layer: Houses common tools developed for various data elements.
• Third Layer: Focused on knowledge extraction, simulation, and modeling using the tools
developed within the second layer.
• Fourth Layer: A placeholder for application-specific software.
• Fifth Layer: A placeholder for customized software.
These layers are used as the basis for designing a new and innovative human performance
informatics infrastructure. In the next chapter, I will present background on the MIII
components and simultaneously present their counterparts in the newly designed human
performance informatics infrastructure.
Figure 1-6 Medical imaging informatics infrastructure
Research Aims and Scope
There are many challenges that currently exist in human performance analytics and its related
technologies. These challenges are similar to those encountered within the medical imaging
community. My research objective is to design, develop, and evaluate an imaging informatics-
based solution targeted toward applications in the biomechanics community, specifically in the
area of human performance research. To evaluate my work, I have tested the hypothesis that a
system based on the concepts of the medical imaging informatics infrastructure can support human
performance research by providing a framework for solution development. If this is true, the
solutions developed with the designed framework will improve workflow efficiency in terms of
time saved and error rate.
The scope of my work on the informatics framework is based on two studies: (I) developing
decision support and data management solutions for rehabilitation; and (II) developing an
infrastructure to create a knowledge base in a collaborative research study. The overall project
goals are listed below:
• Design a data model and build a database for human performance data and the post-processed
knowledge extracted from this raw data set.
• Design and implement a robust program to parse diverse data from various data sources.
• Design a HIPAA-compliant data sharing protocol to facilitate multi-institutional
collaboration.
• Evaluate the system's ability to integrate data from at least two different institutions.
• Demonstrate the impact of the system on the conventional workflow.
Summary
Following this introduction, the next chapter explains the challenges in human performance
analytics in detail and introduces the parallels between human performance analytics and medical
imaging informatics. Chapter 3 outlines the architecture of the system and the conceptual design
of a medical imaging-based system for human performance analytics. Chapters 4 and 5 present
the two applications where the informatics infrastructure is used to improve workflow. Chapter 6
presents the evaluation results of the system in the two applications. Chapter 7 concludes the work
by discussing the advantages and disadvantages of extending informatics into this new area of
research, as well as future work.
Chapter 2. Background and Significance
Human Performance Analysis and its current limitations.
In this chapter, a short background on workflow analysis specific to human performance
analytics will be presented, including the observed current workflow for human performance
analysis in a research setting and its limitations. The second half of the chapter discusses concepts
of medical imaging informatics that can be used to improve this workflow. For
documenting and analyzing the current workflow of human performance analytics, the major
workflow components are categorized into four specific areas: Roles, Processes, Pathways, and
Data storage.
Figure 2-1 Roles, Processes, Pathways, and Data storage are used as building blocks for workflow
analysis.
Roles: For workflow analysis in any environment, the very first step is to create a user profile
that describes the characteristics of the end users. Each user plays a role to accomplish a task
in the workflow, and a user can have multiple roles at various stages of the
workflow. Defining roles makes development, deployment, and maintenance of the system more
efficient. Roles can be classified as: (I) Participant: A person who takes part in a human
performance study, for example, a wheelchair patient or a cross-country runner. (II) Collector: A
person responsible for collecting data from the participant, for example, a video camera operator
or a person receiving completed surveys from a participant. (III) Analyzer: A person or a group
coordinating with the collector and extracting knowledge from the data based on a specific research
question, for example, a team of students and researchers at a laboratory. (IV) End User: A person
who disperses information back to participants either as an intervention or feedback, for example,
an athletic team coach or a physical therapist.
Processes: A logical set of tasks that, once completed, accomplishes the goal for one of the above-
mentioned user roles is defined as a process. Processes include data access, data editing, data
transfer, and data visualization.
Pathways: Pathways are defined as the means or methods for accomplishing the processes defined
in the previous section. Pathways can be categorized as: (I) Networked: An established
network connection is available for performing processes. (II) Hardwired: Data is
transferred or accessed using hardwired devices. (III) Paper-based: Self-reported or other non-
electronic information is captured using paper-based means. (IV) Observed: Information
is not documented but is generated based on the observations of users.
Data Storage: Users with various roles perform multiple processes using one or more pathways.
However, from an informatics system deployment point of view, it is also important to document
and standardize data storage locations for temporary or long-term use based on the pathways.
Broadly, data storage can be classified as local or networked.
Baseline workflow
The current workflow for general human performance analytics in a research environment can be
documented by first analyzing the user groups and their interchangeable roles. As shown in Figure
2.2, the user group consists of: (I) Participants, who are the subjects of a study; (II) the Data
collector, who manages participants and data sources at a data collection site; (III) Data analyzers,
a group of biomechanics domain experts; and (IV) the End user, who uses the knowledge gathered
by the data analyzers. These roles are well established in the biomechanics community; however,
in some cases these roles are interchangeable, depending on the data collection design or a lack of
resources. For example, if wearable devices are used for field data collection, the participant also
takes on the role of data collector and is therefore required to manage the data sources.
Figure 2-2 Generalized view of different roles and mutual interactions.
The current operational workflow involves the following steps; each step corresponds to its
number as indicated in Figure 2-3:
1) Data is captured using various sources mentioned in chapter 1 based on a specific research
question.
2) A logbook is created electronically, consisting of details about the data capture. The
logbook is created and maintained by the data collector; its content varies from project
to project based on the collector's preferences.
3) The Data collector reviews the collected data.
4) The Data collector edits information in the logbook if needed according to the data
collected or adds more notes specific to the collection.
5) Data captured using various sources is gathered and transferred to one machine. Based on
the devices that are used, the process of data transfer varies.
6) Following the requirements analysis, the data analyzer creates a copy of the data
according to the analysis type.
7) The Data analyzer makes duplicate copies of data for processing.
8) Processed data and results of the analysis are created locally.
9) The results of the analysis are shared with end users, such as coaches and researchers,
by giving them access to the storage devices where the analyzed data resides.
10) The end user accesses the data and reviews the results.
Steps 6-10 are repeated based on the feedback or requirements of the end user.
Figure 2-3 Current general performance analytics workflow.
There are many limitations in the current workflow of human performance analytics, including the
following: (I) Duplication of data: Lack of data management and sharing causes a number of
versions of datasets to exist at various locations. (II) Linear data processing pipeline: In the
current workflow, completion of one process is necessary before starting the next. (III)
Isolated data: Data is captured on isolated platforms, which requires integration and
synchronization. (IV) One path for all: The current workflow is not customizable for different
data collections and data elements.
To overcome these limitations, an informatics system can be designed using various concepts of
the medical imaging informatics infrastructure, which will be discussed further in this chapter.
However, it is first important to outline how different factors affect the workflow specifically in a
human performance research setting and how a system must meet these challenges. The next
section addresses the effects of the user, environment, scale, and mode of analysis on the workflow
design for improved outcomes.
Factors affecting workflow
• Users
The user population affects the utilization and efficiency of the workflow. The user population can
be defined by the following attributes: total number of users, users' roles, users' locations, and user
experience. Many clinical studies involve data sharing between multiple institutes or within a
large team of researchers; therefore, setting up data sharing protocols is one of the most crucial
steps of any collaborative project. Data sharing and data management are also crucial for
scaling any analytics. However, data sharing and processing methods differ when implemented
for a single user as opposed to a group of users. Other factors that affect the workflow are
variations within the user population in terms of location, familiarity with the data, and overall
goals. The current workflow, as discussed in the previous section, does not provide any
customization based on these requirements; it is targeted at one single user profile. Based on the
limitations of the current workflow discussed in the previous section, the
following changes must be made: (I) shared data with a limited-access protocol; and (II) a
customizable interface based on user preference.
• Scale
Implementation of data sharing and storage protocols is highly sensitive to the volume of data.
Scaling a process designed for a small amount of data to a large volume of data can be challenging.
Data can accumulate in many ways: (I) a large amount of data is generated at a single
instant; (II) a relatively small amount of data is generated, but over time it totals a large
volume; (III) a small amount of data is generated at a high rate; and (IV) the amount of
data generated is not fixed and varies. The amount of data that one user needs to process or
view has a big impact on the workflow design and implementation. Currently, with no shared data
storage, every user needs to copy the data to local storage before processing. However, since the
amount of data that needs to be copied varies, it becomes difficult to estimate the efficiency of the
workflow. Furthermore, all pathways and processes in the current workflow are based on the
assumption that every data element is similar. One major change that can be made to the existing
workflow is varying the pathways based on the amount of data in each process.
• Environment
Environmental factors contribute substantially to workflow implementation, as the pathways for
processes depend on the available infrastructure. Network availability, local storage space on the
remote machine, and means of displaying information are a few of the many factors. Connectivity
and user interface platforms differ considerably depending on where data is collected, analyzed,
or reviewed. Different users at different stages of the workflow may require access to
specific data sets. The current workflow provides pathways for accomplishing certain tasks;
however, these pathways are not based on the user's environment. Tailoring certain pathways
to the user's environment can increase the overall efficiency of the workflow. For example,
if the end user has access to the network, they do not need to download data locally for viewing.
• Application
Human performance analytics can be used for feedback, intervention, or as an educational tool.
However, each of these use cases requires a different representation of the data. The current
workflow provides only a single pathway for results generation and analysis, irrespective of the
purpose of the analysis. Yet the same set of information may be used for intervention as opposed
to teaching, and the organization of the data therefore needs to differ. Currently, the entire
workflow must be changed for this translation, which is not efficient. Designing an
informatics-based system that allows customization of data storage and data visualization based
on the different modes of research will have a big impact on the translation of research into
clinical practice.
As discussed in Section 1.4, there are clear parallels between human performance analytics
challenges and medical imaging informatics concepts. The concepts developed by the medical
imaging informatics community provide a basis for addressing the above-mentioned factors.
The rest of this chapter discusses the concepts that were used in the design of the new performance
analytics informatics infrastructure and system implementation.
Medical Imaging informatics
2.4.1. Medical imaging informatics infrastructure
Over the past few decades, the medical imaging informatics infrastructure has been widely used
in clinical and research environments to handle large amounts of medical imaging data. MIII
provides pathways to a structured and organized solution to the challenges of data
standardization, integration, storage, and visualization.
2.4.2. Electronic patient records
An electronic patient record (ePR) is a digital version of the paper chart that contains all of the
patient's medical history from one practice. Another version of the patient record is the electronic
medical record (eMR), which contains standard medical and clinical data gathered in one
provider's office. A third version, the electronic health record (eHR), goes beyond the data
collected in the provider's office and includes a more comprehensive patient history, including from
the patient's home. All eMR systems consist of five major functions: (I) integration of direct digital
input of patient data; (II) the ability to analyze data across patients and providers; (III) clinical
decision support and suggested courses of treatment; (IV) outcome analysis and patient and
physician profiling; and (V) distribution of information across different platforms and health
information systems. Existing eMRs are mostly web-based and have large data dictionaries with
time stamps of their contents. In addition, they can query related healthcare information systems
and display the data in a flexible and robust manner. But just like other medical information
systems, development of the eMR faces several obstacles, such as [6]:
• A common method to input patient examinations and related data to the system.
• Development of an across-the-board data and communication standard.
• Buy-in from manufacturers to adopt the standards.
• Acceptance by healthcare providers.
The medical imaging informatics community has reaped the benefits of an integrated eMR, such
as efficient extraction of knowledge from images, knowledge-driven decisions, and investigations
of causal and non-causal relationships.
2.4.3. Digital Imaging and Communications in Medicine (DICOM)
The heterogeneity of information gathered in human performance analytics is similar to the
diversity of databases maintaining patient information in the healthcare sector. Therefore, it is
possible to use standards designed and deployed for medical imaging as guiding principles for
managing human performance data. Clinical informatics systems such as clinical picture archiving
and communication systems (PACS) use the Digital Imaging and Communications in Medicine
(DICOM) format for storing data. The development of DICOM has benefited the medical imaging
community at large by facilitating interoperability of medical imaging equipment through
specifying [7]:
• A set of protocols for network communication followed by devices conformant to the
DICOM standard;
• A syntax and semantics for commands and associated information that can be exchanged
using these protocols;
• A set of media storage services to be followed by standard compliant devices, as well as a
file format and a directory structure to facilitate access to images, waveform data, and
related information.
The DICOM format consists of a binary header of tag/value pairs. The tags are keys, but the
descriptions of the tags are stored independently in DICOM dictionaries and not in the data itself.
The value type contained in the tag/value pair enables accurate reading of the data and
metadata. The binary data itself is stored as tag/value pairs. In DICOM, all data is represented
within an information object class. Thus, entities such as a patient's demographics, image
acquisition variables, and the image data itself are specified by object classes. A service class
refers to a process by which data is generated, operated on (transformed), or communicated.
DICOM distinguishes between normalized object/service classes and composite object/service
classes that are constructed from two or more normalized classes [8].
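To make the tag/value mechanism concrete, the following minimal Python sketch (illustrative only; it uses the open-source pydicom library, and the file name is a placeholder) reads a DICOM header and prints each element as a tag/value pair, with the human-readable name supplied by the DICOM dictionary rather than by the file itself:

```python
# Illustrative sketch only: reading DICOM tag/value pairs with pydicom.
import pydicom

ds = pydicom.dcmread("example.dcm")  # placeholder file name

# Named attribute access resolves a keyword to its tag via the dictionary.
print("Patient ID:", ds.PatientID)

# Each data element is a tag/value pair; elem.name comes from the
# DICOM dictionary, not from the stored data itself.
for elem in ds:
    if elem.VR != "OB":  # skip bulk binary payloads such as pixel data
        print(f"{elem.tag} {elem.VR} {elem.name}: {elem.value}")
```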
Figure 2-4 DICOM hierarchical data model, which considers the relationship between a patient, an
imaging study, and the parts of the study (series, images, reports, presentation states).
Figure 2.4 shows the base hierarchy of the patient and related imaging data concepts. At the top of
this hierarchy, the patient object is the main entity around which all other data is organized, with
demographics and information on the individual relevant to conducting the imaging procedure. A
patient object is associated with one or more time-stamped imaging studies, with each study
encoding data such as the institution, the reason for the exam, and the referring physician.
In turn, each imaging study consists of one or more imaging series that describe the acquisition
parameters for a given scan sequence, such as modality, scanner, CT- and MR-specific values,
contrast agent, and orientation. Each imaging series then consists of a set of individual image slices
that make up the sequence, with descriptors for the image resolution and (physical) location, and a 2D
array of pixel values constituting the image or waveform values. The DICOM service classes
define the role of a device in a requested operation: is the device invoking the service, or is
it performing the operation? The former is referred to as a service class user (SCU), the
latter a service class provider (SCP). For a given service class, a device may act as the SCU, the
SCP, or both, depending on its function in the overall PACS. The interchange of data in DICOM is
governed by the concept of service-object pair (SOP) classes, which aggregate an information
object definition and a service class together.
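This hierarchy can be exploited directly when organizing loose files, since every instance carries the identifiers of its parent patient, study, and series. The sketch below (the directory name and the use of pydicom are assumptions for illustration) groups a folder of DICOM files by that hierarchy:

```python
# Illustrative sketch: grouping DICOM instances by the
# patient > study > series hierarchy encoded in each header.
from collections import defaultdict
from pathlib import Path
import pydicom

tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

for path in Path("dicom_data").glob("*.dcm"):  # placeholder directory
    ds = pydicom.dcmread(path, stop_before_pixels=True)  # header only
    tree[ds.PatientID][ds.StudyInstanceUID][ds.SeriesInstanceUID].append(
        ds.SOPInstanceUID
    )

for patient_id, studies in tree.items():
    print(f"Patient {patient_id}: {len(studies)} studies")
```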
2.4.4. Data warehousing
Data warehousing is the process of assembling information from various sources to obtain either
a single detailed view of one part of the data or a broader overview of all the elements to be
integrated. Traditionally, data warehousing is associated with business management, but with
advances in data collection technologies it is now a requirement even for research. This applies
to human performance research, where a large volume of data is collected and analyzed
at every step. A data warehouse, in general, is a data structure that is optimized for distribution,
mass storage, and complex query processing. One variant of data warehousing is the clinical data
warehouse, which addresses the requirements of healthcare data. Clinical data warehouses are
more complex than general-purpose data warehouses and require extensive analysis of data
design, architectural design, implementation, and deployment. When designing a clinical data
warehouse, the data integration tasks of the medical data store are particularly challenging.
Among the many challenges and issues, the most prominent are architecture, data quality,
patient privacy, report consistency, scalability, and user involvement. Since human performance
data collected for research is similar to healthcare data, the same guiding principles can be
leveraged when designing data warehousing solutions to support informatics needs. This is
further addressed in Chapter 3.
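As a concrete illustration of the warehousing idea at research scale, the following sketch builds a single queryable store for measurements arriving from isolated sources. It uses SQLite for brevity; the schema, table names, and sample values are assumptions, not the warehouse design used in this work:

```python
# Illustrative sketch: one integrated store for heterogeneous
# performance data, queryable across previously isolated sources.
import sqlite3

con = sqlite3.connect(":memory:")  # a file path would be used in practice
con.executescript("""
CREATE TABLE participant (
    participant_id TEXT PRIMARY KEY,
    cohort         TEXT                  -- e.g. 'wheelchair', 'volleyball'
);
CREATE TABLE measurement (
    participant_id TEXT REFERENCES participant(participant_id),
    source         TEXT,                 -- e.g. 'imu', 'video', 'survey'
    recorded_at    TEXT,                 -- ISO-8601 timestamp
    name           TEXT,                 -- e.g. 'heart_rate'
    value          REAL
);
""")
con.execute("INSERT INTO participant VALUES ('P001', 'wheelchair')")
con.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?, ?, ?)",
    [("P001", "imu", "2019-01-15T10:00:00", "heart_rate", 132.0),
     ("P001", "survey", "2019-01-15T08:00:00", "sleep_hours", 7.5)],
)

# One query now spans sources that were previously stored in isolation.
for row in con.execute(
    "SELECT source, name, value FROM measurement WHERE participant_id = 'P001'"
):
    print(row)
```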
2.4.5. Structured reporting
In DICOM, structured reporting (SR) objects, or SOP classes, are defined for the transmission and
storage of documents that describe or refer to images, waveforms, or the features they contain. The
structured report SOP classes provide the capability to record structured information, enhancing
the value of clinical documents and enabling users to link text data to images or waveforms [9].
SRs are designed as a self-describing information structure that can be tailored to diverse
clinical observation reporting applications through the use of templates and context-dependent
terminology. A DICOM-SR document consists of an ordinary DICOM "header" containing
demographic and identification information, accompanied by a "content tree" that consists of a
recursive structure of name-value pairs. In the realm of clinical trials, there are many reasons to
encode structured, quantitative, and coded information related to images. One of the initial goals
for the development of the DICOM Structured Report was the ability to encode such reports
in a form that allows information to be extracted more readily than from a paper printed
report or an unstructured plain text format. In medical imaging, there are two general categories
of such information: 1) information generated by human operators using imaging equipment, such
as distance and velocity measurements made during vascular, cardiac, or obstetric ultrasound; and 2)
information generated by automated analysis of images, including Computer Assisted Detection
(CADe) or Computer Assisted Diagnosis (CADx). Both types of use fall into the general category of
creating "evidence documents," where part of the content is often subsequently extracted and
included in human-generated reports that supply the interpretation of the findings. The DICOM SR
framework has proven ideal for the encoding, interchange, and persistent storage of such evidence
documents, largely because both the acquisition equipment and the image reporting
equipment already support DICOM encoding and services for exchanging images, so extending an
existing DICOM implementation takes less effort than developing a new method [7].
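To illustrate the recursive name-value structure of an SR content tree without the full DICOM encoding, the sketch below models a report as nested Python dictionaries (the report name and measurements are hypothetical) and walks it recursively:

```python
# Illustrative sketch: an SR-style "content tree" of recursive
# name-value pairs, modeled with plain dictionaries.
report = {
    "name": "Propulsion Analysis", "value": None, "children": [
        {"name": "Peak Push-Rim Force", "value": "92 N", "children": []},
        {"name": "Findings", "value": None, "children": [
            {"name": "Technique", "value": "semicircular", "children": []},
        ]},
    ],
}

def walk(node, depth=0):
    """Print every name-value pair, preserving the tree's nesting."""
    text = node["name"] if node["value"] is None else f'{node["name"]}: {node["value"]}'
    print("  " * depth + text)
    for child in node["children"]:
        walk(child, depth + 1)

walk(report)
```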
2.4.6. Data Visualization for imaging and clinical data
Patient records contain a large quantity of complex, heterogeneous data such as imaging studies,
textual reports, and various laboratory results. A comprehensive review of a patient's medical
record usually requires the physician to examine multiple documents while mentally noting the
current clinical context and filtering out unrelated information. However, given the time
constraints of each patient encounter and the data complexity and volume associated with
chronic conditions, physicians often have difficulty gathering all the relevant patient
information. This leads to a lack of coordination among caregivers, poor integration of
examination results for diagnosis, and the performance of redundant studies [10]. Over
the past two decades, significant progress has been made in making patient records more accessible
to clinicians through dedicated systems such as hospital information systems, PACS, and RIS. But
increased access to information does not necessarily translate into use of that information.
Knowledge needs to be extracted and organized in a way that facilitates efficient
presentation and retrieval by the clinician. Data in patient records have traditionally been
organized in three ways: (I) Source-oriented views, in which data are organized based on origin
(e.g., laboratory results are grouped together, whereas medications are grouped separately); (II)
Time-oriented views, in which data are organized based on when they are produced; and (III)
Concept-oriented views, in which data are organized based on relevance to a topic (e.g., medical
problems, current therapies).
Each view is well suited to a specific user type (e.g., clinicians,
researchers, patients) or task (e.g., follow-up, consultation); however, no single view is sufficient
to support the needs of all users in the medical community. The extracted and organized clinical
information can be fully utilized only by presenting it in a way that helps the user understand the
trends and relationships in the data. The overall goals of data visualization are to: visually present
medical data in a more intuitive, easy-to-understand format; visually magnify subtle aspects of the
patient record that are pertinent to tailoring diagnosis and treatment; and prevent information
overload by presenting only the information needed at a given time [5], [7], [11].
Research in information visualization has resulted in novel depictions that assist the user with such
tasks as interacting with a large amount of data, discerning trends, elucidating relationships, and
understanding changes over time. In the medical domain, techniques have included graphical
displays for planning therapies, facilitating decision support, and visualizing temporal trends. One
challenge of providing a visual summary of a patient’s record is integrating all the disparate
representations into a cohesive interface.
Summary
Currently, factors such as the user population, user environment, scale of study, and application
area are important but have not been integrated into the workflow design. To integrate these
factors into the workflow, the concepts utilized by the medical imaging informatics
community can be modified and leveraged in new frontier areas of research, because these concepts
provide a foundation for developing solutions for data reporting, data standardization, and data
visualization, as shown in the previous sections.
Chapter 3. Conceptual Design of Medical Imaging Based System for
Human Performance Analytics and system architecture
Multilayer infrastructure for human performance analytics
Based on the infrastructure of medical imaging informatics, as well as the workflow analysis and
requirements discussed in previous chapters, an infrastructure for an informatics system for
human performance analytics was designed and developed. In order to provide the customization and
flexibility required by human performance analytics, the solutions were developed in a multilayer
architecture, following a similar approach to medical imaging informatics. The human performance
analytics infrastructure is divided into four layers focusing on data connections, functions, modules,
and applications. When developing an informatics solution for any application in human performance
analytics, a multilayer design allows modifications without redesigning the entire system
architecture. Figure 3.1 shows the four layers of the infrastructure that was developed. Each layer is
described in greater detail in the following sections.
Figure 3-1 Multilayer infrastructure for human performance analytics
3.1.1. Layer 1: Data sources
This layer comprises the data sources, which present various challenges with regard to data
collection, management, and connectivity. The data sources layer provides solutions for: (I)
Speed and resolution of data: With various sensors and devices, data can be captured and
transmitted either offline or in real time and at various speeds. For example, the accelerometer/
gyroscope used in inertial measurement units (IMUs) can record data at 128 Hz to 500 Hz, whereas
multimedia data is often recorded at 60 Hz to 120 Hz. IMUs can transmit data in real time,
whereas multimedia data must be sent to storage offline with hardware-based data transfer
methods. (II) Format of data: Data from different devices is stored in varying formats. For
example, a multimedia file can be stored in .MOV, .MP4, or .AVI format; often, the file format
depends on the hardware. (III) Synchronization and integration of data: Data from different
sources is recorded on isolated systems and therefore requires time synchronization before its
information is integrated (a sketch of aligning two such streams follows this list). (IV)
Standardization of data: Data from sources is not saved in a specific structured way, which
prevents batch processing of large datasets. (V) Privacy and security of data: Depending on the
sources of data, de-identification and anonymization of the data is necessary.
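As a concrete illustration of the synchronization problem in item (III), the sketch below aligns a faster stream to a slower one by linear interpolation onto a common timebase. The sampling rates match the examples above, but the synthetic signal is an assumption for illustration:

```python
# Illustrative sketch: aligning a 500 Hz IMU channel to 120 Hz video
# frame timestamps by resampling onto the video timebase.
import numpy as np

imu_t = np.arange(0, 2.0, 1 / 500)           # IMU timestamps, seconds
imu_accel = np.sin(2 * np.pi * 1.5 * imu_t)  # stand-in accelerometer trace
video_t = np.arange(0, 2.0, 1 / 120)         # video frame timestamps

# Linear interpolation pairs each video frame with a contemporaneous
# IMU value, putting both streams on a common time scale.
accel_at_frames = np.interp(video_t, imu_t, imu_accel)
print(f"{len(video_t)} frames, each annotated with an interpolated IMU sample")
```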
3.1.2. Layer 2: Functional layer
This layer defines the functions that can be performed on a data set. Based on the general
workflow of human performance analytics and the sources of data, the following functions are
required at a minimum for an informatics system: (I) Data cleaning: data cleaning or data
cleansing is the process of detecting and correcting (or removing) corrupt or inaccurate records
from a data set; (II) Data validation: data validation is the process of ensuring that a program
operates on clean, correct, and useful data, using routines often called "validation rules" (a
minimal sketch of such rules follows this list); (III) Data cropping: data cropping is the process
of extracting segments of useful data from a long stream of data; (IV) Data transfer: based on the
data type, the source of the data, and the environment in which the data is gathered, data transfer
protocols are defined with this process; (V) Data annotation: data annotation or data tagging is
the process by which each isolated or integrated piece of data is tagged with an identification code
to facilitate quick data access.
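As referenced in function (II), the following is a minimal MATLAB sketch of rule-based validation; the field names and thresholds are illustrative assumptions, not the rules actually deployed in the system.

% Minimal rule-based validation sketch. Each rule is a named predicate
% over one record; the fields (Age, PainScore, VisitDate) are assumed
% examples, not the actual survey schema.
function [ok, failures] = validateRecord(rec)
    rules = { ...
        'Age in 18-100',       @(r) r.Age >= 18 && r.Age <= 100; ...
        'PainScore in 0-10',   @(r) r.PainScore >= 0 && r.PainScore <= 10; ...
        'VisitDate not empty', @(r) ~isempty(r.VisitDate)};
    failures = {};
    for k = 1:size(rules, 1)
        if ~rules{k, 2}(rec)
            failures{end+1} = rules{k, 1}; %#ok<AGROW> collect failed rule names
        end
    end
    ok = isempty(failures);
end

A record that fails any rule can then be flagged at entry time rather than discovered retrospectively, which is the behavior this functional layer is meant to guarantee.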
3.1.3. Layer 3: Modular Layer
Modules are defined as a group of one or more types of data sources tied to one or more types
of functionality. Modules are classified based on their use and placement within the workflow, as
well as on the type of analysis to be performed; examples include a survey-based module and a
multimedia data processing module. Based on the data sources, the diverse application areas, and
the user groups, the following modules were designed:
(I) Data parser module: This module is designed to handle incoming data streams and
extract information from them.
(II) Survey handler: This module collects self-reported information such as sleep and
hydration. Various platforms can be used, such as online apps, text-based forms,
and paper-based forms. This module is designed for processing self-reported data.
(III) Multimedia visualizer: Human performance analysis utilizes video data to provide
visual evidence for cause and effect relationships between the various factors that
affect the outcome of a performance. This module addresses the issues of collecting,
reviewing, and editing various formats of video files.
(IV) Data integration: Data integration modules are required for integrating multiple
data sets collected using the same or different data sources in order to provide a
unified view of the data for the end user. This module is crucial for data sharing
across multiple institutes.
(V) Data synchronization: Data is collected from various sources at various sampling
rates on isolated systems, which may require synchronization. The methods used
for synchronizing a data stream depend on the type of data source as well as on
how the data is accessed (a minimal synchronization sketch follows this list).
(VI) Data analytics: This module is used as a placeholder for scripts (MATLAB, R, etc.)
that will be used to process data.
(VII) Data visualization: This module is designed to create a visual representation of the
data depending upon the data source and end-user requirements.
(VIII) Data storage module: This module is designed to handle database operations, such
as creating access paths, data upload/download, and other database related
functions.
(IX) Data security module: This module handles user authentication, data de-
identification and anonymization, privacy, and data sharing protocols.
(X) Data reporting: This module handles report generation for data review and data
analysis.
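As referenced in module (V), below is a minimal MATLAB sketch of synchronizing two streams recorded at different sampling rates onto a common time base; the sampling rates and variable names are illustrative assumptions.

% Minimal synchronization sketch: align an IMU stream (assumed 500 Hz)
% with a video-derived stream (assumed 120 Hz) on a shared 120 Hz grid.
function [t, imuOnGrid, videoOnGrid] = syncStreams(imuTime, imuSignal, videoTime, videoSignal)
    t0 = max(imuTime(1), videoTime(1));    % use only the overlapping window
    t1 = min(imuTime(end), videoTime(end));
    t  = (t0:1/120:t1)';                   % common 120 Hz time base
    imuOnGrid   = interp1(imuTime,   imuSignal,   t, 'linear');
    videoOnGrid = interp1(videoTime, videoSignal, t, 'linear');
end

Linear interpolation on a shared clock is only one option; when streams are recorded on isolated systems without a common clock, alignment on a shared event (for example, an impact spike visible in both streams) can replace it.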
3.1.4. Layer 4: Conceptual Layer
The conceptual layer defines the application aspects of an informatics system in various modes:
(I) Intervention/Feedback: for new technique development and refinement in the athletic
population, an accurate, reliable, timely, and understandable feedback mechanism is required. One
of the primary advantages of building a human performance analytics informatics system is to
provide a decision support tool for the physical therapist or coach in the field; (II) Educational: to
demonstrate, teach, and document cause and effect relationships in the biomechanics of the human
body, a platform is needed that structures the information for easy access and portability;
(III) Research: before developing interventional and educational tools, knowledge extraction and
communication are a must for the research community. Knowledge sharing between institutes is a
requirement for large-scale projects and can be supported only by deploying a customizable
informatics system. To implement the above-mentioned infrastructure, a system architecture was
designed and developed. This architecture provides the basis for developing informatics solutions
for various applications in human performance analysis. The next section describes the overall
system architecture.
Figure 3-2 System architecture for the human performance analytics informatics system
System Architecture Design and Implementation
This section describes three aspects of the human performance analytics informatics system:
first, how the data is moved to the storage layer; second, where the moved data is stored; and
third, how the analytics mechanism is structured around this stored data. To address these
aspects, the system was designed and implemented using four major functional components: the
data gateway, the data storage (servers), the user interface, and the data processing engine.
3.2.1. Data Gateway
All types of data collected using different data sources need to be parsed for storage, processing,
and analysis. Because the data collected for human performance analytics varies with the data
sources used, and because the pathway for incoming data depends on environmental factors at
the collection site, creating a data gateway can be quite challenging, and the gateway becomes a
crucial component of the system architecture. In addition, the gathered data sometimes contains
no metadata and is not stored in a structured format. These challenges are addressed using a data
parser module with two functionalities: the first is to create metadata based on the log sheets,
and the second is to structure the files. There are two approaches to implementing data parsers:
(I) Local parser: metadata files are created on a local machine based on the logbook information,
and files are moved into user-defined structured folders. Parsing data locally allows data to be
parsed without a network; (II) Server-based parser: metadata files are created once the raw data
files are uploaded to the file server. In server-based parsers, files are not moved into a structured
folder; instead, relative file locations are recorded and saved as a separate file directly on the
server. A server-based parser also allows data uploads from multiple locations.
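As a minimal sketch of the local parser approach, the MATLAB function below creates a sidecar metadata record from a log sheet and copies each raw file into a structured folder tree; the log sheet columns (SubjectID, SessionID, TrialID, FileName) and the folder naming convention are illustrative assumptions, not the exact scheme of the deployed parser.

% Minimal local parser sketch: build metadata from a CSV log sheet and
% file raw data into Subject/Session/Trial folders without altering it.
function parseLocal(logSheetCsv, rawDir, outDir)
    T = readtable(logSheetCsv);            % assumes numeric ID columns and
    for k = 1:height(T)                    % a text FileName column
        dest = fullfile(outDir, ...
            sprintf('S%03d',       T.SubjectID(k)), ...
            sprintf('Session%02d', T.SessionID(k)), ...
            sprintf('Trial%02d',   T.TrialID(k)));
        if ~exist(dest, 'dir'), mkdir(dest); end
        copyfile(fullfile(rawDir, char(T.FileName(k))), dest);
        writetable(T(k, :), fullfile(dest, 'metadata.csv'));  % sidecar metadata
    end
end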
3.2.2. Data Storage
The data storage component is defined as the overall aggregate of components such as the
metadata, data warehouse, data mart, data acquisition, and information delivery. The components
fit together as follows. Data model: provides the mapping of the data in a database, including the
source of tables and columns, the meanings of the keys, and the relationships between the tables
[12]. To support human performance analytics needs and to provide the flexibility to adapt storage
to the various workflows discussed in the previous chapters, a data model was designed that is
conceptually compact and can be scaled and modified easily. The data model serves as the
foundation for designing the database and the file server. Metadata: provides information about a
data set that is required to process the referenced data set. Some data sources create metadata
files within their protocol; however, not all data sources create a metadata file by themselves, and
these therefore require a mechanism for metadata creation. Database: provides an organized
placeholder for data so that it can be easily accessed, managed, and updated. Commonly used
relational databases are made up of a set of tables with data that fits into a predefined data model.
Data warehouse: consists of the set of databases from which all information is accessed, queried,
and analyzed. The data warehouse is designed to hold versions of data at every stage of processing,
for example, raw data, clean data, cropped data, pre-processed data, and processed data. Data
mart: a subset of a data warehouse that focuses on one or more of the specific applications or
analytics to be performed. The data is extracted from the data warehouse, then de-normalized and
indexed to support repeated use of certain data elements.
3.2.3. Data processing engines
The web server is responsible for delivering content to end users through a web-based interface
used for data review or reporting. A primary example of a web server application is the web-based
visualization engine, which generates visualization tools for end users. The application server is
responsible for the interactive part of the informatics system, which allows users to sort through
data, download and upload files, and edit files. The application server works with the web server
as a logical unit; this also includes accessing databases or files and carrying out complex
calculations. A primary example of an application server is the web-based analytics engine, which
end users can use to perform calculations and data analysis. To test the accuracy and initial
deployment steps of the informatics system, two different testbed environments were utilized:
(I) a local server environment, used to test various modules including the data parser, data
uploader, and data visualizer; the main purpose of setting up this environment is to test features
of the informatics system in situations where there is no network; (II) virtual machines on a data
center server, used to deploy the informatics system and to test its remote data sharing features.
3.2.4. User interface
User interfaces (UI) are designed and implemented based on end user requirements and the
environment in which end users interact with the interface. For example, a dashboard will present
a different view when the end user is reviewing data already entered into the system than when
the end user is uploading new data or analyzing data. For networked and local use, the user
interface may also differ operationally. In the next chapter, different implementation scenarios of
this system are discussed, with solutions developed on the basis of the four-layer human
performance infrastructure as well as the system architecture that was designed and developed.
Summary
A robust system architecture design can have an impact on the overall integration of an
informatics-based solution into any workflow. For human performance studies, a system
architecture design is needed that can accommodate a variety of user groups and applications
and scale accordingly. In this chapter, I presented how the newly designed four-component system
architecture, derived from the original MIII infrastructure, can be used to implement a data
processing pipeline for a variety of applications.
Chapter 4. Design and development of informatics system for self-
reported and multimedia data management in rehabilitation
Background
Patients who rely on a manual wheelchair are at added risk of developing shoulder pain due to
the shoulder load involved in operating the chair. The prevalence of upper extremity pain in
individuals with spinal cord injury ranges from 36% to 73%; the incidence increases with time post
injury and is three times greater than in randomly selected individuals. One working hypothesis is
that shoulder pain is caused by shoulder load, which is affected by the fit of the wheelchair. To
test this hypothesis, a clinical study is being conducted at the Rancho Los Amigos National
Rehabilitation Center, where biomechanics analysis is being applied to the wheelchair fitting center
workflow. Figure 4.1 shows the overall design of this research study. The experimental design
involves capturing videos as well as kinetic and kinematic data of participants as they perform
wheelchair propulsion in an outdoor environment over the course of three visits during a
10-month period. As shown in Figure 4.1, the collected data set includes questionnaires,
multimedia data, sensor data, and free text notes from each visit, along with other observed
information. The objective behind collecting these different data elements is to gather as much
insight as possible into each wheelchair user's daily living and the techniques they use to propel
their chair.
Figure 4-1 Overview of a research study for analysis of shoulder pain causes in reference to
wheelchair fitting
Challenges
Data collection for the shoulder health study conducted at the Rancho Los Amigos wheelchair
fitting site involves the following steps, as shown in Figure 4.1:
1. Data is captured using paper-based forms to gain insight into the wheelchair user's
standard of living.
2. The subject's wheelchair usage is monitored using a sensor attached to the wheelchair
that records daily mileage. This information is gathered by the researcher, who contacts
the subject to obtain the reading.
3. The subject's chair configuration and physical measurements are recorded on a
separate paper-based form or in Excel sheets.
4. The physical therapist's observations, as well as notes on the number of changes
made, are recorded in an Excel sheet as well.
5. The data analyzer downloads the data separately for all data types onto a local machine.
6. Various commercially available software platforms are then opened to view each of
these data files, for example, Microsoft Word, Adobe Reader, and MATLAB.
Figure 4.2: Traditional and informatics-based workflows
There are many limitations to the current workflow, including duplication of data, a linear data
processing pipeline, and isolated or fragmented datasets. Factors such as the end users, the scale
of the data, the environment, and the applications of the data are not considered in the current
workflow. These factors can impact data collection design and time requirements, as has been
shown in various imaging-based clinical studies.
Informatics system design and deployment
Longitudinal human performance studies are very similar in design to many medical imaging-based
studies involving baseline measurements, treatment, and follow-up records. Therefore,
medical imaging informatics concepts such as the electronic patient record system can be used as
the basis for designing an informatics-based solution. Another similarity between biomechanics
research and various imaging informatics applications is that both use a variety of data-intensive
multimedia sources. For example, in this study information is gathered using paper-based forms,
spreadsheets, sensors, instrumented surfaces, and multimedia (video and picture) data.
However, unlike in medical imaging, these sources operate in complete isolation from each other,
which leaves the collected information unsynchronized and non-standardized. Therefore, the
very first step in developing an informatics system is to conceptualize methods to interlink these
isolated pieces of information into an integrated data set. This is done by translating the
experimental design in Figure 4.1 into a logical data model. The study workflow shown in Figure
4.3 shows how independent pieces of information can be linked. This data model is unique to
this application since it defines the data links based on the experimental design: one subject visits
for multiple sessions, and during each session multiple trials are conducted to gather data
elements from various data sources.
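As a minimal sketch, this subject-session-trial hierarchy can be expressed as a nested MATLAB structure; the field and file names below are illustrative assumptions, not the deployed schema.

% Minimal sketch of the logical data model: one subject, many sessions,
% many trials per session, each trial holding data from one source.
subject.id = 'S001';
subject.session(1).date = '2017-03-01';                 % assumed date
subject.session(1).trial(1).source = 'video';
subject.session(1).trial(1).file   = 'S001_V1_T1.mp4';
subject.session(1).trial(2).source = 'imu';
subject.session(1).trial(2).file   = 'S001_V1_T1.csv';

Each trial record carries a pointer to its raw file plus the keys (subject, session, trial) needed to join it with data from any other source.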
Figure 4.3: Study workflow and corresponding acquired data
Once the data model was created based on the experimental design, the corresponding databases
were created on the server. The next two informatics tools developed were the data parser and
the gateway. First, the data parser is used to integrate information from raw data files and to
provide structure and standardization; the parser function is modeled on the DICOM standard,
which is widely used in imaging informatics. Second, the gateway is used as a checkpoint for all
data elements before they are stored on the server where the database is located. The data parser
was developed in MATLAB, the database was built on the MySQL [13] relational database, and
the web interface was developed using PHP (Hypertext Preprocessor). Highcharts [14] and
Google Charts [15] are utilized for graphing the data points. The layout of information on the
user interface is based on observed feedback from the end user population. The development
work focused on workflow efficiency in a research environment, and the UI was therefore tailored
for researchers.
Evaluation
The human performance informatics system was evaluated with the shoulder pain study, which
aimed to enroll 30 subjects over a two-year period: 15 in the control group and 15 in the testing
group. Self-reported data were collected using standardized surveys during three sessions at
baseline, 1 month, and 10 months. Sensor and multimedia data were collected alongside the
digital questionnaires. These data sets were collected using isolated systems and were integrated
manually and retrospectively: data from the different sources are merged by hand at the
processing stage or at the reporting stage. In the current workflow, data is entered using
paper-based methods. Although paper forms are easy to deploy, in the long run they are not time
efficient. Paper handling itself is insecure, and it requires many processing and data management
steps before the entered data can be used. Manual data entry has a high error rate, and data
validation is only possible retrospectively. An informatics system developed on the basis of the
medical imaging informatics infrastructure allowed the creation of a platform for qualitative and
quantitative analysis. Various informatics tools were implemented, as listed in Table 4.1, on the
generic system architecture described in chapter 3. The overall added value of the system is
observed by analyzing each tool in isolation and in a sequential manner.
Table 4.1: Four-layer modular development for shoulder pain study
Layer 1 Data Source A. Modified (Simplified) paper questionnaire
B. Parser for fragmented data sources
Layer 2 Functional Layer C. Data preprocessing (Validation, Normalization,
Cleaning, Annotating, and Synchronizing)
Layer 3 Modular Layer D. Web portal for data collection
E. Centralized data storage
Layer 4 Application Layer F. Knowledge discovery & Decision support
4.4.1. Modified (Simplified) paper questionnaire
Self-reported questionnaires are used in almost all clinical and non-clinical research studies.
Traditionally, questionnaires are filled out on paper, although over the past decade electronic data
entry has been integrated into data collection workflows. Even where electronic data collection is
the norm, the collected data is often not integrated, which leads to fragmented data sets that
require additional time for management, maintenance, query, and presentation. The informatics
infrastructure created a pathway through which questionnaires can be completed and integrated
with other data sets. We used metadata creation modeled on medical imaging to annotate the
survey data, which allowed schematic data storage. When data analysis is performed, a narrative
is established for the data reviewer. For the shoulder pain study, four different standardized
questionnaires were used to assess the standard of living and overall health of wheelchair users.
The questions on these surveys are measured on different scales and therefore present a
challenge for interpretation and analysis. We created an integrated, condensed version of the
questionnaire in which all questions were merged according to focus area, as shown in Table 4.2.
Merging the questions had two benefits: first, it reduced the data collection time, improving the
workflow; second, it improved data processing time, since all questions are validated and
normalized at the time of collection. There are additional benefits, such as a reduction in data
entry errors and improved data integration.
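Because the four questionnaires use different response scales, merging requires a normalization step; the following is a minimal MATLAB sketch of min-max rescaling to a common 0-100 range, with the scale bounds as illustrative assumptions.

% Minimal sketch: rescale a response measured on [lo, hi] to a common
% 0-100 scale so that items from different questionnaires can be merged.
function y = toCommonScale(x, lo, hi)
    y = 100 * (x - lo) / (hi - lo);
end

For example, toCommonScale(4, 1, 5) and toCommonScale(7, 0, 10) map a 1-5 item and a 0-10 item to 75 and 70 respectively, making merged profiles directly comparable.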
4.4.2. Parser for fragmented data sources
In some cases data collection is possible only in a non-networked environment, where the
instrument used is not accessible during collection or the device is completely isolated, for
example, video or force data. Even electronic questionnaires are often collected as PDF
documents. Such cases must be considered when designing any informatics-based workflow. To
integrate data collected in a non-networked setup, a file uploader with an inbuilt parser was
designed. For the shoulder pain study, when a paper or offline PDF document is used for
self-reported data, a MATLAB based parser is applied. This parser extracts information from the
PDF document and adds identifiers to prepare the data for merging with other data sets in the
central database.
Table 4.2: Combining and normalizing variables from different data sources allows clean data
visualization
Source (number of variables): Demographics (5), Questionnaire 1 (15), Questionnaire 2 (10),
Questionnaire 3 (2), Questionnaire 4 (11)
Overall focus areas divided to create comprehensive profiles: Mobility, Pain, Health, Free text
notes
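Returning to the parser itself, the following is a minimal MATLAB sketch; it assumes each form has already been exported to plain text with fields on "Label: value" lines, and the identifier scheme is an illustrative assumption rather than the deployed one.

% Minimal form parser sketch: pull "Label: value" pairs out of a text
% export of a form and attach the join identifiers for the database.
function rec = parseFormText(txtFile, subjectID, sessionID)
    raw = fileread(txtFile);
    rec = struct('SubjectID', subjectID, 'SessionID', sessionID);
    tok = regexp(raw, '^\s*([A-Za-z][A-Za-z ]*)\s*:\s*(.+)$', ...
                 'tokens', 'lineanchors');
    for k = 1:numel(tok)
        key = matlab.lang.makeValidName(strtrim(tok{k}{1}));
        rec.(key) = strtrim(tok{k}{2});    % e.g. rec.PainScore = '4'
    end
end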
4.4.3. Data preprocessing (Validation, Normalization, Cleaning, Annotating, and
Synchronizing)
Data captured via electronic data entry or the file uploader/parser requires a level of massaging
before it can be used, including data type validation, data normalization, cleaning, and
synchronization. This layer is an essential attachment to the data source layer and is implemented
per data type. For example, for the shoulder study survey data, evaluating data integrity is
essential; integrity rules can be defined on expected user inputs or on logical reasoning. Figure 4.4
summarizes the rules used for validating the survey data. Because data preprocessing depends on
the type of data, the infrastructure's placeholder for these processes creates a pathway for adding
further preprocessing methods.
Figure 4.4: Flowchart for preprocessing survey datasets
4.4.4. Web portal for data collection
Based on the modified survey and the preferences of the data collectors, an interface was
developed that allows faster data collection than paper. This electronic data collection method
enables faster data acquisition as well as real-time data tagging and validation based on the
preprocessing defined in the functional layer. To accommodate data gathering over low-bandwidth
networks or via non-digital pathways, a second interface supports file upload and works with the
data parser from the data source layer.
4.4.5. Centralized data storage
Data storage solutions for non-clinical, non-imaging-based research have unique requirements.
First, data heterogeneity is an important factor, since data is acquired and captured by various
devices (paper, electronic survey, sensor, camera, etc.). Second, a dataset must be stored in its
various states (raw, processed, cleaned, pre-processed, etc.). Lastly, the data should be modeled
around how it will be uploaded (with added tags) and how it will be searched. For the shoulder
pain study, we focused on the following types of data: multimedia (video), information from
biomechanics processing, and text from the online survey and electronic data entry (paper,
observation, survey).
4.4.6. Knowledge discovery & Decision support
When data is stored in centralized data storage, information retrieval and use are fast. In the
current workflow, each piece of information must be searched for, opened, and arranged on
screen to be reviewed manually according to the data reviewer's requirements. This process does
not scale as large amounts of data accumulate. The current workflow also fails for longitudinal
data analysis, where data within a subject group or across multiple visits from the same subjects
must be reviewed; it is difficult to detect pattern formation, and as the number of data sources
increases the process breaks down. With the new infrastructure, information retrieval is more
efficient and standardized, as in medical imaging. The system design allows the data user, in this
case the biomechanics researcher, to interact with a simple interface, run complex queries over
multiple data sources, create standardized reports, and identify patterns visually. Figure 4.5 shows
the end interface, where multiple data sources are presented in context with each other to give a
whole picture.
Figure 4.5: Dashboard example
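As an illustration of the kind of cross-source query the dashboard supports, the following is a minimal MATLAB sketch over an assumed export of the integrated data set; the file name and columns (SubjectID, Visit, PainScore, DailyMileage) are illustrative, not the actual schema.

% Minimal sketch of a complex query spanning survey and sensor data.
T = readtable('integrated_dataset.csv');               % assumed export
sel = T(T.PainScore >= 4 & T.DailyMileage > 2.0, :);   % cross-source filter
byVisit = groupsummary(sel, 'Visit', 'mean', 'PainScore');
disp(byVisit)                                          % pattern across visits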
Chapter 5. Design and development of informatics system for
knowledge curation in collaborative research environment
Background
Collaborative research is often conducted across the biomechanics community with the aim of
creating rich data sets; data is often collected at multiple locations and requires access control
mechanisms. For data sharing, the current workflow uses commercial solutions such as
cloud-based storage and hard drives. Since collaborative research in human performance is often
a collaboration between competing institutes, securing and safeguarding intellectual property is a
high priority, and data access plays a crucial role in defining system operational protocols. This
challenge is similar to the patient data confidentiality requirements of any medical imaging-based
research, and specifically of imaging-based clinical trials. In healthcare, however, the provider has
the infrastructure and resources to anonymize and secure information, as well as established
medical imaging informatics databases. No such infrastructure exists for regulating data sharing
in the human performance research community. Therefore, the first and most important step in
building any informatics-based solution for collaborative applications is to develop methods and
tools to anonymize information before it is shared over secure channels with authenticated users,
as well as to define proper user access rights to the data sets.
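As a minimal sketch of the local anonymization step, the MATLAB function below strips configured sensitive fields from a record and replaces identity with a random tag; the field names are assumed institute configuration, not a fixed standard.

% Minimal de-identification sketch: drop sensitive fields and substitute
% a random anonymous ID before a record leaves the local machine.
function rec = stripSensitive(rec, sensitiveFields)
    if nargin < 2
        sensitiveFields = {'Name', 'DateOfBirth', 'Email'};  % assumed defaults
    end
    for k = 1:numel(sensitiveFields)
        if isfield(rec, sensitiveFields{k})
            rec = rmfield(rec, sensitiveFields{k});
        end
    end
    rec.AnonID = sprintf('ANON-%08x', randi([0, 2^31 - 1]));  % random tag
end

In practice the tag-to-identity mapping would be kept in a protected table at the originating institute so that longitudinal records can still be linked.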
Informatics system design and deployment
Solutions were designed to address the above-mentioned challenges. The first solution was to
design customizable local data parsers for each of the participating institutes. These data parsers
extract only sharable information from each data set before merging it into a centralized
database. The data parser also creates metadata, which makes it easy to track the integrated data.
The second solution was to develop methods to control data access by linking user roles with
their institutional affiliations and with the data. To design and develop an infrastructure that
supports collaborative research in human performance analytics, with data sharing and
management in a secure environment, we used a sample multimedia data set of the kind most
often collected for human movement analysis. The primary focus of this experiment was to test
the parsing mechanism and access control across a variety of user populations.
In the traditional scenario, commercial sharing mechanisms are used. This method has many
shortcomings, such as poor data integrity, data redundancy, and duplication, and it is also prone
to weak data sharing protocols and weak overall data security. This often leads to delayed data
utilization and risk to intellectual property. In the proposed informatics-based workflow, these
issues are handled at two distinct places. The first is at the front end, through customization of
the data uploader modules: parsers run locally on the data collectors' machines can strip sensitive
information away or add information such as metadata to the raw data files themselves. The
second is rule-based access control, implemented in the backend at the central database level.
The current workflow for data sharing in a collaborative environment includes the following steps:
1. Data from various sources is acquired on different systems.
2. Data is combined and saved in a folder on a local machine.
3. Various copies of the data are made on hard drives to share among data analyzers.
4. To share data with collaborating research teams or user groups, it is uploaded to
commercially available data sharing platforms.
5. Data access on the commercially available solutions is at the directory level; therefore,
new or duplicate directories are created to limit access to data.
Evaluation
There are many limitations to the current workflow, including duplication of data and insecure
access. Factors such as the environment and the number of users should be considered when
designing the workflow. As a case study of data sharing, I used generic datasets from a
biomechanics laboratory. The data model is defined to give a clear interconnection of the various
data elements. As shown in Figure 5.1, a username table is linked with the institute and user role
tables, which triggers background cross-checks when a user logs in to the system, as opposed to
traditional authentication based on a single username and password. Creating additional metadata
on top of the raw data files yields a unique identification tag that helps with data security,
reusability, and retrospective searches.
Figure 5-1 The database created here grants user access based on user roles, affiliated institute, and
metadata. The metadata is created using the metadata generator locally before the files are uploaded
onto the server-based storage.
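A minimal MATLAB sketch of the rule-based check this data model enables is shown below; the field names (Institute, Role, SharedWith) are illustrative, not the deployed schema.

% Minimal access-control sketch: grant access when the user's institute
% owns the file, the file is explicitly shared with it, or the user is
% an administrator; deny by default otherwise.
function ok = canAccess(user, fileMeta)
    sameInstitute = strcmp(user.Institute, fileMeta.Institute);
    sharedIn      = any(strcmp(user.Institute, fileMeta.SharedWith));
    isAdmin       = strcmp(user.Role, 'admin');
    ok = isAdmin || sameInstitute || sharedIn;
end

Because the check runs against per-file metadata rather than folder paths, access can be granted at the file level, which is the behavior evaluated later in this chapter.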
Currently, the system design takes into account that data can reside in three possible locations:
commercially available storage space such as Google Drive, local machines, or limited-access
servers. Because of this, the analytics engine is conceptually positioned where its processing
scripts can access all three locations. Also, since the data is distributed, various mechanisms are
deployed to keep the versions of the data consistent across all locations. The informatics system
developed on the basis of the medical imaging informatics infrastructure allowed the creation of
a platform for qualitative and quantitative analysis. Various informatics tools were implemented,
as listed in Table 5.1, on the generic system architecture described earlier. The overall added
value of the system is observed by analyzing each tool in isolation and in a sequential manner.
Table 5.1: Four-layer modular development for the collaborative research study
Layer 1 Data Source A. Data Structuring
Layer 2 Functional Layer B. Data preprocessing (Validation, Normalization,
Cleaning, Annotating, and Synchronizing)
Layer 3 Modular Layer C. Centralized data storage with access control
Layer 4 Application Layer D. Data sharing for teaching and collaboration.
5.3.1. Data Structuring
When data is collected in a research setting for evidence-based interventions and feedback, a
wide variety of data acquisition systems are used. With advances in technology, the variety and
quality of data have improved greatly; however, each data acquisition system operates in complete
isolation, and data is therefore often gathered as loose subfolders at the end of data collection.
To provide a sorted structure that improves the efficiency of data query and data analytics, a
method is needed to impose structure on the file folders. This process needs to run independently
on a local machine without altering the content of the files. In this module, that is achieved with
MATLAB based functions.
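A minimal sketch of such a function is shown below; it groups raw files into dated, extension-based subfolders using only file attributes, with the folder naming convention as an illustrative assumption.

% Minimal structuring sketch: sort raw files into date/extension folders
% by file attributes alone, copying so the originals are never altered.
function structureByAttributes(rawDir, outDir)
    files = dir(fullfile(rawDir, '*'));
    files = files(~[files.isdir]);
    for k = 1:numel(files)
        day = datestr(files(k).datenum, 'yyyy-mm-dd');
        [~, ~, ext] = fileparts(files(k).name);
        if isempty(ext), ext = '.misc'; end
        dest = fullfile(outDir, day, upper(strrep(ext, '.', '')));
        if ~exist(dest, 'dir'), mkdir(dest); end
        copyfile(fullfile(rawDir, files(k).name), dest);
    end
end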
5.3.2. Data preprocessing (Validation, Cleaning, Annotating, and Synchronizing)
Data captured from electronic data entry or the file uploader/parser requires a level of massaging
before it can be used, including data type validation, normalization, cleaning, and synchronization.
This layer is an essential attachment to the data source layer and is implemented per data type;
because preprocessing depends on the type of data, the infrastructure's placeholder for these
processes creates a pathway for adding further preprocessing methods. To add context to the
collected data, metadata elements were defined by observing a variety of human performance
research data collections. These elements were selected first on the basis of the requirements of
the data analytics and second on the basis of feasibility and practicality from the data collectors'
point of view. Since this work is a first step towards creating such standards for human
performance research, I focused on metadata tags specific to data collection and experimental
design. These tags serve two purposes: first, they create a systematic data view in which each
data element has its own reference point; second, they create a way to archive data so that data
reusability increases. These tags are used to add information about the data collection itself. Since
a large amount of the data collected for human performance analytics is visual, it requires
systematic data de-identifiers and anonymizers. Human performance research requires data
quality and standards, such as image and video resolution, to meet certain thresholds, which
makes the de-identifier a critical component of the data processing pipeline; for example, any
change in resolution directly affects computer vision-based algorithms. Likewise, if a survey
contains fields that should not be transcribed in digital format, specific redactions need to be
applied. This module is also developed using MATLAB.
5.3.3. Centralized data storage with access control
Data storage solutions for non-clinical, non-imaging-based research have unique requirements.
First, data heterogeneity is an important factor, since data is acquired and captured by various
devices (paper, electronic survey, sensor, camera, etc.). Second, a dataset must be stored in its
various states (raw, processed, cleaned, pre-processed, etc.). Lastly, the data should be modeled
around how it will be uploaded (with added tags) and how it will be searched. Defining user
groups and storing data with metadata makes data masking and access control easier to
implement. Each piece of data is saved with identification such as the data collection date, type,
uploader, and user. Access control within a large user group can then be enforced at the file
level, not just at the directory level.
5.3.4. Data sharing for teaching and collaboration
When data is stored in fragments, sharing it is less secure and often leads to duplication. In the
current workflow for collaborative research, information sharing is only possible at two levels:
someone can either access a folder or not. In the proposed informatics-based solution, I
implemented a pipeline that allows data access control at the file level.
Chapter 6. Results
Chapters 4 and 5 discussed two use cases and the development of their informatics-based
solutions. In this chapter, I present my evaluation findings and observations. I used two
quantitative methods to evaluate each module developed for each use case, categorized by
infrastructure layer. The first method observes the time difference between the informatics-based
workflow and the traditional workflow. The second method compares the workflow steps between
the informatics-based workflow and the traditional workflow. Tables 6.1 and 6.3 give details on
the quantitative assessment of each module and the comparison of the workflows; all workflow
comparisons are made from the end user's perspective. Tables 6.2 and 6.4 list the modules
developed for each layer in both use cases. Each module is developed to achieve the same results
with higher accuracy and more efficiency than the traditional workflow.
The informatics-based workflow, developed in modules, adds value in terms of accuracy and
efficiency. As seen in the layer 1 description, electronic forms for data collection not only speed
up data collection but also allow data validation to occur at the same time as data collection. This
reduces errors due to missing or wrong information, which often occur with paper-based forms.
Another advantage is that paper-based forms require file management at an organizational level,
which, although traditional, adds considerable cost.
For layer 2, as noted in Table 6.2, data preprocessing covers data validation, cleaning, and
annotation. This process is a simple rule-based data check, but it is not scalable if done manually.
As shown in Table 6.1, the informatics tools reduce this time by a large margin, and the steps are
reduced and automated.
Table 6.1: Design and development of informatics system for self-reported and multimedia
data management in rehabilitation

Layer 1: Modified (simplified) paper questionnaire
Traditional workflow (15-20 min per form):
1. Data collection is set up.
2. Paper forms are completed.
3. Paper filing is done.
4. Microsoft Excel spreadsheets are created to match the forms.
5. Manual data entry is conducted.
6. Checks are done for data accuracy and human data entry errors.
7. A usable data set is generated.
Informatics workflow (7 min per form):
1. Data collection is set up.
2. Web-based forms are created once.
3. Data is entered via web/PDF, where inbuilt data validation is implemented.
4. Usable data is generated.

Layer 1: Parser for fragmented data sources
Traditional workflow (30 min per form):
1. Data from different sources is copied onto the local machine.
2. Microsoft Excel spreadsheets are created to match the forms.
3. Manual data entry is conducted.
4. Checks are done for data accuracy and human data entry errors.
5. A usable data set is generated.
Informatics workflow (15 min per form):
1. Data collection is set up.
2. Data is entered from PDF via parsers, where fields are matched with specific variables and
inbuilt data validation is implemented.
3. Usable data is generated.

Layer 2: Data preprocessing
Traditional workflow (40 min per form):
1. Data from different sources is copied onto the local machine.
2. Microsoft Excel spreadsheets are created to match the forms.
3. Manual data entry is conducted.
4. Checks are done for data accuracy and human data entry errors.
5. A usable data set is generated.
6. For preprocessing, various copies of the data are made for version control.
7. Data sorting and cleaning are done manually by the data analyzer.
Informatics workflow (<1 min per query):
1. Data collection is set up.
2. Data is entered from PDF via parsers, where fields are matched with specific variables and
inbuilt data validation is implemented.
3. Usable data is generated.
4. A data query is written once and automated for the entire data set.

Layer 3: Web portal for data collection
Traditional workflow (30 min per form):
1. For paper-based data collection, the subject is required to visit the wheelchair fitting center.
2. The paper-based form is filled out at the fitting center, and paper file management is needed.
Informatics workflow (15 min per form):
1. Data collection is set up, and devices such as tablets are used to enter data.

Layer 3: Centralized data storage
Traditional workflow (30 min per form):
1. Data is collected but stored in various file folders.
2. To review data, all folders are accessed individually and files are opened separately using a
referenced logbook.
Informatics workflow (1 min per form):
1. Data is stored directly in the database, with tables for each form.
2. One interface is used to query data with any number of filters.

Layer 4: Knowledge discovery & decision support
Traditional workflow (4-5 hours per form):
1. Data from different sources is copied onto a local machine.
2. Different third-party software packages are used to open and analyze each piece of
information in a data set.
3. A logbook lookup is necessary for retrieving the correctly referenced files from each data
folder.
4. Checks are done for data accuracy and human data entry errors.
5. Data is arranged on screen or in PowerPoint slides to create a narrative for decision making.
Informatics workflow (15 min per form):
1. Data collection is set up.
2. Data is entered from PDF via parsers, where fields are matched with specific variables and
inbuilt data validation is implemented.
3. Usable data is generated.
A major impact on workflow efficiency can be seen in layer 4, where information must be
presented in a contextual manner; doing this in the traditional workflow requires a long time.
With the informatics-based workflow this time is reduced many fold, which adds to the capability
for data exploration.
Table 6.2: Observed added value

A. Modified (simplified) paper questionnaire; B. Parser for fragmented data sources
Designed and developed informatics tools: combined questionnaire; toolset developed with
MATLAB
Observed added value: data normalization; data collection on paper and in networked and
non-networked electronic environments; real-time data storage, visualization, and validation;
data integration

C. Data preprocessing (validation, normalization, cleaning, annotating, and synchronizing)
Designed and developed informatics tools: server-side and client-side pipeline toolset for
cleaning/preprocessing survey data
Observed added value: data extraction, transformation, and automated cleaning; data structure

D. Web portal for data collection; E. Centralized data storage
Designed and developed informatics tools: data collection portal for surveys; a database that
supports complex queries
Observed added value: remote data collection; access to data from multiple devices and the
ability to sort through data

F. Knowledge discovery & decision support
Designed and developed informatics tools: web portal designed specifically to show self-reported
data for reviewing purposes
Observed added value: a customizable interface built on top of the data storage that allows data
exploration
As shown in Tables 6.3 and 6.4, the major impact of informatics on the workflow for collaborative
work is in layers 1 and 3. At layer 1, data is collected in a raw, unstructured format and therefore
consumes time if managed manually; data structuring can be automated using file attributes,
which saves time, reduces errors, and prevents loss of data. At layer 3, access control reduces
data duplication and therefore improves overall data integrity.
Table 6.3: Design and development of informatics system for knowledge curation in a
collaborative research environment

Layer 1: Data structuring
Traditional workflow (30 min):
1. Data from different sources is copied onto a local machine.
2. Files are moved manually to create a folder structure.
3. A usable data set is generated.
Informatics workflow (1 min):
1. Data from different sources is copied onto a local machine.
2. A MATLAB based parser is used to structure the data.
3. Usable data is generated with the hierarchy of the data model.

Layer 2: Data preprocessing
Traditional workflow (40 min):
1. Video data requires a manual cropping mechanism.
2. Pictures and video require manual data redaction/de-identification.
3. A logbook must be kept for accessing a particular data file to match different data sources.
4. This process is not automated and must be repeated for each file.
Informatics workflow (10 min):
1. Based on the logbook, identifiers are created as metadata and referenced with the data.
2. Batch processing for de-identification and cleaning becomes possible, for example
vision-based data cleaning.

Layer 3: Centralized data storage with access control
Traditional workflow (30 min):
1. Data from different sources is copied onto a local machine.
2. Data is sorted and subfolders are created for different users.
3. Various copies of the data are made on flash drives.
4. Flash drives are shared among data users.
Informatics workflow (15 min):
1. Data from different sources is copied onto a local machine.
2. Data is uploaded to the database.
3. Data is accessible from multiple locations.

Layer 4: Knowledge discovery & decision support
Traditional workflow (time varies):
1. Data is shared among various data users with commercially available resources.
2. Since data access is not managed at the file level, different folders are created and shared.
Informatics workflow (<15 min):
1. Database access is granted based on the data elements (date of collection, type of collection,
etc.).
Methods traditionally used in medical imaging informatics for handling large datasets can be
applied in other domains with similar data handling limitations. To achieve the translation of the
core concepts of the medical imaging infrastructure to biomechanics-based human performance
analytics, this work presents foundational work and aims to set protocols and standards. The
proposed methods were evaluated on self-reported survey data from the rehabilitation study. By
developing the informatics system and standards, we were able to demonstrate their positive
effects on the current workflow. There are many applications and uses of medical imaging
infrastructure-based informatics systems, especially in the field of human performance analysis,
where technological advances now allow high-resolution data collection. However, without data
regulation and standardization of data collection, it is challenging to scale data processing. As
initial foundational work, we presented examples of both qualitative and quantitative data
analysis, and we showed, through workflow analysis, how the informatics system saves time and
increases the data user's confidence across various research avenues.
Table 6.4: Observed added value

A. Data structuring
Designed and developed informatics tools: methods to restructure data without altering it;
toolset developed with MATLAB
Observed added value: data normalization; data collection on paper and in networked and
non-networked electronic environments; real-time data storage, visualization, and validation;
data integration

B. Data preprocessing (validation, normalization, cleaning, annotating, and synchronizing)
Designed and developed informatics tools: server-side and client-side pipeline toolset for
cleaning/preprocessing survey data
Observed added value: data extraction, transformation, and cleaning are automated; data
structure

C. Centralized data storage with access control
Designed and developed informatics tools: access rules implemented at the data element level;
a database that supports complex queries
Observed added value: remote data collection; access to data from multiple devices and the
ability to sort through data

D. Data sharing for teaching and collaboration
Designed and developed informatics tools: web portal for collaboration
Observed added value: a customizable interface built on top of the data storage that allows data
exploration
Chapter 7. Discussion & Future work
The medical imaging-based informatics infrastructure was evaluated on two applications of human
performance research. This work may be advanced further by the scientific and commercial
community by applying this methodology to create data processing pipelines for various
applications. There is great value in taking this work further, since the area of health data analytics
and hospital information technology is evolving, and with the adoption of wearable technology it
is now possible to collect large volumes of continuous data. However, a large volume of data by
itself is not sufficient to generate reliable and usable knowledge; a robust infrastructure is needed
to process this data and extract knowledge from it.
For taking advantage of technologies such as artificial intelligence and machine learning, data
preparation and data bias almost always present a big problem. The work presented in this
dissertation aims to create an infrastructure that tackles these challenges of data preparation and
enables the design and development of solutions to them.
Currently, data management and knowledge discovery in human performance research are handled
by commercially available platforms such as Dropbox [16] and Google Drive [17]. These
solutions, however, are inadequate for large-scale collaborative research projects because the
collected data does not follow any community-defined structure. By using medical imaging
concepts such as electronic patient records and DICOM, we aimed to create an evidence-based
foundation upon which solutions supporting human performance research analytics can be
developed, and community-wide standards and protocols can be designed for data handling and
management. In medical imaging informatics, DICOM has facilitated the interoperability of
medical imaging equipment by specifying a set of media storage services to be followed by
standard-compliant devices, as well as a file format and a directory structure to facilitate access
to images, waveform data, and related information.
One key benefit of implementing the informatics infrastructure is that it provides a roadmap for
designing and developing a decision support system for human performance analytics and injury
prevention research. The data management tools give end users such as researchers and physical
therapists the ability to explore data with complex queries and investigate patterns that would
otherwise require significant training and resources, adding cost and time. The informatics
framework allows data utilization in real time and helps preserve data integrity for retrospective
analysis. Real-time data processing allows an end user to use data for decision making and to
account for subtle or complex variables, which increases their confidence in their decisions.
Standards and protocols for data processing allow the data collection and analytics to scale many
fold, enabling high-resolution data collection for large population groups.
In the real world, an informatics system has another advantage for collaborative research, where
data sharing is required within an institute or between multiple institutes spread across multiple
locations. The informatics platform creates a foundational data processing pipeline, which in
effect allows data standardization. This standardization allows faster transfer and sharing while
preserving the contextual integrity of the data itself. This is a very important factor for data
processing and presentation, since some components of the collected data are only useful in
context with other collected data elements.
Research workflows are often established around available resources and therefore often change
as the data collection phase progresses. It is therefore very difficult to measure the impact of the
informatics tools as a whole in a research setting. An informatics toolset and its impact should be
evaluated after users have been trained, since user bias might also affect the acceptability of a
changed workflow. Another limitation of this work is that its scope was limited to the single-site
experimental setup of the shoulder pain study. In the real world, many research studies are
conducted in parallel, which leads to changing data collection protocols; in order to fully deploy
the data standardization methods discussed in this dissertation, it is necessary to evaluate them
in such a setting. Changing data collection technologies also create challenges in terms of
consistency. Deploying informatics allows data collection to be streamlined based on the study
design from the initial state.
Lastly, the informatics system allows data normalization, validation, and synchronization in real
time, which makes the deployment of technologies such as machine learning or AI much more
reliable and attainable in real-world applications of human performance analysis.
References
[1] J. L. McNitt-Gray, K. Sand, C. Ramos, T. Peterson, L. Held, and K. Brown, "Using
technology and engineering to facilitate skill acquisition and improvements in performance,"
Proc. Inst. Mech. Eng. Part P J. Sport. Eng. Technol., vol. 229, no. 2, pp. 103–115, 2015.
[2] R. R. Deshpande, H. Li, P. Requejo, S. McNitt-Gray, P. Ruparel, and B. J. Liu, "Utilization
of DICOM multi-frame objects for integrating kinetic and kinematic data with raw videos in
movement analysis of wheel-chair users to minimize shoulder pain," Proc. SPIE, vol. 8319,
p. 83190S, 2012.
[3] R. T. Li, S. R. Kling, M. J. Salata, S. A. Cupp, J. Sheehan, and J. E. Voos, "Wearable
Performance Devices in Sports Medicine," Sport. Heal. A Multidiscip. Approach, vol. XX,
no. X, pp. 1–5, 2015.
[4] A. Chee, "Advances in Medical Imaging Informatics - Dealing with Big Data," May 2012.
[5] H. K. Huang, "Utilization of medical imaging informatics and biometrics technologies in
healthcare delivery," Int. J. Comput. Assist. Radiol. Surg., vol. 3, pp. 27–39, 2008.
[6] J. P. Wanderer, S. E. Nelson, J. M. Ehrenfeld, S. Monahan, and S. Park, "Evolving health
informatics semantic framework and metadata-driven architectures," J. Med. Syst., vol. 40,
no. 12, pp. 1–9, 2016.
[7] Medical Imaging Informatics, 2009.
[8] R. Deshpande, "A Vendor-Neutral, HIPAA-Compliant Decision Support System with
Embedded Data Mining for Assessing Dose End-Points in Radiation Therapy of Head and
Neck Cancer," 2014.
[9] D. A. Clunie, "DICOM Structured Reporting and Cancer Clinical Trials Results," pp. 33–56,
2007.
[10] Vijayaraghavan, M. Y. Y. Law, and B. Liu, "Problem-centric Organization and Visualization
of Patient Imaging and Clinical Data," vol. 29, no. 3, pp. 655–668, 2009.
[11] X. Wang, S. Verma, Y. Qin, J. Sterling, A. Zhou, J. Zhang, C. Martinez, N. Casebeer, H.
Koh, C. Winstein, and B. Liu, "Imaging informatics-based multimedia ePR system for data
management and decision support in rehabilitation research," in Progress in Biomedical
Optics and Imaging - Proceedings of SPIE, 2013, vol. 8674, p. 86740P.
[12] "Data warehousing in health care," 2002.
[13] MySQL: https://www.mysql.com
[14] Highcharts: https://www.highcharts.com/
[15] Google Charts: https://developers.google.com/chart/
[16] Dropbox: https://www.dropbox.com
[17] Google Drive: https://drive.google.com
Publications and Presentations
Conference Proceedings and talks
“Multimedia data handling and integration for rehabilitation research”, [Talk] at Imaging
Informatics for Healthcare, Research, and Applications at SPIE 2019
“Medical imaging informatics-based solutions for human performance analytics” [Poster] at
Imaging Informatics for Healthcare, Research, and Applications at SPIE 2018
“The development of a decision support system with an interactive clinical user interface for
estimating treatment parameters in radiation therapy in order to reduce radiation dose in head and
neck patients.” [Talk] at Imaging Informatics for Healthcare, Research, and Applications at SPIE
2017
“Multi-disciplinary data organization and visualization models for clinical and pre-clinical
studies: A case study in the application of proton beam radiosurgery for treating spinal cord injury
related pain” [Poster] at Advanced PACS-based Imaging Informatics, and Therapeutic
Applications, SPIE 2016
Honorable Mention Poster Award for Advanced PACS-based Imaging Informatics, and
Therapeutic Applications at SPIE 2015 for Poster Title: “An imaging informatics-based system to
support animal studies for treating pain in spinal cord injury patients utilizing proton beam
radiotherapy” [Poster].
Cum Laude Poster Award for Advanced PACS-based Imaging Informatics, and Therapeutic
Applications at SPIE 2014 for Poster Title: “An imaging informatics-based system utilizing
DICOM objects for treating pain in spinal cord injury patients utilizing proton beam radiotherapy”
[Poster].
“A web-based neurological pain classifier tool utilizing Bayesian decision theory for pain
classification in spinal cord injury patients.” [Talk] at Advanced PACS-based Imaging
Informatics, and Therapeutic Applications at SPIE 2014
“A multimedia system for decision support in neurological classification of pain in spinal cord
injury patients” [Talk] Advanced PACS-based Imaging Informatics, and Therapeutic Applications
at SPIE 2013
“Imaging Informatics System Utilizing DICOM Objects for Treating Pain in Spinal Cord Injury
Patients Utilizing Proton Beam Radiotherapy” [Demo] at Radiological Society of North America
(RSNA) 2013
Best poster award at SPIE 2010 in Ultrasonic Imaging and Signal Processing conference, Poster
Title: “Prostate brachytherapy seed localization using combined photoacoustic and ultrasound
imaging”
Received Sisyphus Award for overcoming unusual difficulties in the project on “Photoacoustic
Imaging in Biological Tissues” (May 2008) for Computer Integrated surgery II coursework.
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Development of an integrated biomechanics informatics system (IBIS) with knowledge discovery and decision support tools based on imaging informatics methodology
Imaging informatics-based electronic patient record and analysis system for multiple sclerosis research, treatment, and disease tracking
Knowledge-driven decision support for assessing radiation therapy dose constraints
Identifying injury risk, improving performance, and facilitating learning using an integrated biomechanics informatics system (IBIS)
An electronic patient record (ePR) system for image-assisted minimally invasive spinal surgery
Molecular imaging data grid (MIDG) for multi-site small animal imaging research based on OGSA and IHE XDS-i
Fresnel beamforming for low-cost, portable ultrasound systems
Development of fabrication technologies for robust Parylene medical implants
Mining an ePR system using a treatment plan navigator for radiation toxicity to evaluate proton therapy treatment protocol for prostate cancer
Decision support system in radiation therapy treatment planning
Control and dynamics of turning tasks with different rotation and translation requirements
Dynamic graph analytics for cyber systems security applications
Understanding reactive balance control strategies in non-disabled and post-stroke gait
Investigation of preclinical testing methods for total ankle replacements
Sense and sensibility: statistical techniques for human energy expenditure estimation using kinematic sensors
Engineering scalable two- and three-dimensional striated muscle microtissues for human disease modeling
An approach to experimentally based modeling and simulation of human motion
Architecture design and algorithmic optimizations for accelerating graph analytics on FPGA
Multi-scale biomimetic structure fabrication based on immersed surface accumulation
Modeling human regulation of momentum while interacting with the environment
Asset Metadata
Creator: Verma, Sneha K. (author)
Core Title: A medical imaging informatics based human performance analytics system
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Biomedical Engineering
Publication Date: 04/29/2019
Defense Date: 04/29/2019
Publisher: University of Southern California (original); University of Southern California. Libraries (digital)
Tag: medical imaging informatics, OAI-PMH Harvest, performance analytics
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Liu, Brent (committee chair), McNitt-Gray, Jill (committee member), Zhou, Qifa (committee member)
Creator Email: snehaver@gmail.com, snehaver@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c89-160803
Unique Identifier: UC11662484
Identifier: etd-VermaSneha-7355.pdf (filename), usctheses-c89-160803 (legacy record id)
Legacy Identifier: etd-VermaSneha-7355.pdf
Dmrecord: 160803
Document Type: Dissertation
Rights: Verma, Sneha K.
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA