A Multimodal Screen Reader For The Visually Impaired
by
Udayakumar Rajendran
Copyright 1996
A Dissertation Presented to the
FACULTY OF THE SCHOOL OF ENGINEERING
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
Master of Science
(Biomedical Engineering)
August 1996
Udayakumar Rajendran
This thesis, written by
Udayakumar Rajendran
under the guidance of his/her Faculty Committee and approved by all its members, has been presented to and accepted by the School of Engineering in partial fulfillment of the requirements for the degree of
Master of Science (Biomedical Engineering)
Date __________
Faculty Committee
Chairman
To my parents, teachers and friends,
who made me all I am
Acknowledgment
First I would like to thank my committee chairman, Dr. Manbir Singh, for all the support he has given me. I am very thankful to him for agreeing to be my advisor even though I wanted to do something completely outside his research area. I have taken a lot of his time to discuss my work and get his advice about various things. He gave me a lot of latitude and allowed me to pursue two or three different topics before settling on the final topic. He has been exceptionally kind in accommodating my erratic work schedule during the completion of this work.
Next I thank Dr. Vasilis Marmarelis, Chairman of the Biomedical Engineering Dept., for his guidance when I was setting up my thesis committee. I am greatly indebted to him for allowing me to continue my work after an interruption of about a year and a half.
I would like to thank Dr. Michael Khoo and Dr. Jean-Michel Maarek, the other members of my thesis committee. They spent valuable time going through my work and offering useful suggestions.
I would like to thank Louai Aldayeh for guiding me in completing all the paperwork that had to be completed for the submission of this thesis.
Finally I would like to thank the rest of the faculty and staff of the Biomedical Engineering department, who have helped in various ways throughout my stay here.
Udayakumar Rajendran
Table of Contents
1. Introduction
2. System Description
2.1. The Environment
2.2. Potential candidates for the various components
2.2.1. Windows Open Systems Architecture (WOSA)
2.2.2. The OLE Environment
2.2.3. Screen Reader Control Subsystem
2.2.4. Text to Voice Engine
2.2.5. Haptic Controller
3. Further work
3.1. Work for the Windows platform implementation
3.1.1. Technological Considerations
3.1.2. Economic considerations
3.2. Issues in porting to other operating systems
4. Conclusion
5. Bibliography
6. Appendix A - Cost estimate for end user system / development system
List of Tables and Figures
Figure 1. Block diagram of screen reader environment
Figure 2. Dataflow diagram of the Screen Reader
Figure 3. Windows Open System Architecture (WOSA) Components
Figure 4. The OLE Framework
Figure 5. Binary Representation of an Interface [4]
Figure 6. Text To Speech Subsystem
Abstract
Current screen readers used by visually impaired users of computers are based on audio feedback to the user. A screen reader based on both audio and haptic (sensory/touch) feedback is proposed here. First the various subsystems that make up the system are described at a functional level. The feasibility of building such a system using current technology is discussed next. Candidates for the various subsystems using current technology are briefly described. Finally the major issues for the actual implementation and a set of tasks for further work are discussed.
1. Introduction
Computers have become common in today's workplace. Screen readers allow a visually handicapped person to work with computers. Commercial screen reader packages have been available for a while for text based environments. Graphical user interfaces (GUIs), which are increasingly becoming the standard desktop environments, pose special challenges to the visually handicapped user since the complexity of the interface has increased. In addition to the two dimensional screen, it is possible to have multiple windows on the screen, each window potentially running a different application. Each window can contain various objects technically known as controls or widgets. The user needs to keep track of more information. Further, the operations that can be performed on the computer have increased both in functionality and complexity. For example, sharing data between multiple applications (word-processors, spread-sheets etc.) has become a common task. Many text based screen readers are being ported to GUI environments. Quite a few readers are available for the most commonly used operating system, the Microsoft Windows environment.
At this time most of the screen readers use audio feedback only. As the user switches between the windows and controls on the screen the system announces the type of the control and associated text, if any. For example, when the user switches to the application Microsoft Word, the system might say ‘Main Window - Microsoft Word’. When the user selects the File item on the menu, the system might say ‘File Menu’. If the user chose the Open option from the File menu the system might say ‘File Open
Dialog Box’. As the user changes the focus via the keyboard or the mouse the system
names the new control’s type and some identifying text like ‘OK button’, ‘Cancel button’
etc.
While audio based interfaces have been helpful, they require the user to have a certain amount of cognitive skill. The user needs to build a mental map of the windows and the various user interface elements on the windows. A multimodal interface would help make the process of building and working with this map easier and faster.
A screen reader with audio and haptic modes of operation using a modified pointing device was displayed recently at the eleventh annual conference of technology for people with disabilities. As the user moved the cursor across various items on the screen, the user was given some force feedback via the pointing device. For example, as the cursor crossed the border of a window the user felt a bump as he moved the pointing device. When the cursor reached the edge of the screen the pointing device stopped moving, giving the idea of striking against a fixed barrier like a wall.
This paper proposes audio and haptic output through a glove instead of a mouse. The advantage of using a glove is that much finer feedback can be given. Further, the glove seems to be a more natural interface than the mouse, which has a single point of contact with the surface being explored. Using the glove the user can actually feel the windows, controls and other user interface elements on the windows. If the user knows Braille it is possible to present the text on the screen via Braille.
An experimental system for blind users using multimodal feedback has been reported[2]. The system used audio feedback and tactile feedback via a two dimensional
Braille system. However, the system was built as an experimental platform and its functionality was limited in scope. It was built to work with a particular multimedia data retrieval application rather than for the broader use proposed in this report.
2. System Description
2.1. The Environment
[Figure 1. Block diagram of screen reader environment: the user, applications and the operating system connected to the keyboard driver, display driver, pointing device driver, audio interface driver and haptic interface driver.]
Figure 1 shows the various components in the system. The user interacts with various applications and the operating system (Links A & B in the figure). Applications use the various devices like the keyboard, pointing device and the display through services provided by the operating system (C). The operating system in turn depends on various drivers to actually control the external devices (D, E, F). The screen reader control subsystem registers itself with the operating system so that it will be notified whenever an application changes the display or the user performs an action via the keyboard or the pointing device (G). When the control subsystem processes the notifications from the operating system it decides the feedback to be delivered to the user through the audio and haptic interfaces. The control subsystem then sends commands to the text to voice engine and the haptic interface controller (G, H, I). The text to voice engine and the haptic controller give audio and haptic responses to the user via the audio interface driver and the haptic interface driver respectively (H, I, J, K). The user can adjust the behavior of the control subsystem and associated subsystems to suit his needs (L, M, N).
Figure 2, which represents the system as a dataflow diagram, shows the system in further detail. The user's actions and an application's response to the user's actions are processed by the operating system, which sends notifications to the screen reader control subsystem, the heart of the screen reader.
The control subsystem, which maintains an internal model of the screen, uses a set of mapping tables to decide what feedback is to be given to the user in response to a particular event, such as the opening of a window, focus being set to a new control on the screen, the pointer being moved to a new window etc. These event-feedback mapping tables are created by the macro processor, which processes macro scripts that characterize various input events and the appropriate feedback that is to be given to the user in response to these events. There will be two levels of feedback tables: a global table that defines the behavior of the system for all applications, and an application specific table that can be used to modify the response to an event for a particular application alone. While determining the response to an event the application specific table is searched first. If the event is not handled in the application specific table, the response specified in the global mapping table is provided, as the sketch below illustrates.
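As a minimal sketch of how such a two-level lookup could work (the type and function names below are hypothetical; the thesis does not prescribe an implementation), the control subsystem might consult the tables as follows:

#include <cstddef>
#include <map>
#include <string>

struct Feedback {
    std::string spokenText;   // handed to the text to speech engine
    std::string hapticCmd;    // high level command for the haptic engine
};

typedef std::map<std::string, Feedback> FeedbackTable;  // keyed by event name

// Returns the feedback mapped to an event, or NULL if no table handles it.
const Feedback* LookupFeedback(const FeedbackTable& appTable,
                               const FeedbackTable& globalTable,
                               const std::string& event)
{
    FeedbackTable::const_iterator it = appTable.find(event);
    if (it != appTable.end())
        return &it->second;        // application specific entry wins
    it = globalTable.find(event);
    if (it != globalTable.end())
        return &it->second;        // fall back to the global table
    return NULL;                   // unmapped event: give no feedback
}

A real implementation would key on richer event descriptors (window class, control type etc.) produced by the macro processor rather than on a bare event name.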
The control subsystem decides the audio and haptic feedback that is to be given to the user and passes these responses on to the text to speech engine and the haptic feedback engine respectively. The text to speech engine receives the text to be spoken out and determines the phonemes that need to be enunciated to speak out the text. The phonemes are determined by referring to pronunciation rules (word-phoneme mapping rules). In order to make the spoken text sound more natural, small pre-recorded segments of speech, called prosody segments, are used in addition to the word-phoneme mapping rules. These give proper intonation and timing to the spoken text. Once the phonemes and appropriate prosody segments are decided, the text to speech engine passes these to the audio interface driver, which actually drives the hardware to create audible sound via the speaker.
In addition to spoken speech the control system may use non-speech audio icons to indicate various events. For example the opening of a menu or the pointer crossing the border of a window might give different distinctive beeps. With regard to the haptic feedback, the screen reader control subsystem generates commands at a high level, such as ‘produce a bump of magnitude x’, ‘provide the texture of a smooth surface’ or ‘produce a raised rectangular surface of 2 cm by 1 cm (simulation of a button, for example)’. The control subsystem will also have a Braille rendering engine that will translate the text on the screen into Braille patterns, which will then be superimposed on the general pattern of user interface elements on the screen.
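As a toy illustration of what the lowest layer of such a rendering engine might do (the encoding table is abridged and the function is our own sketch, not part of the proposed system's specification), text can be mapped to six-dot Braille cells represented as dot bitmasks:

#include <string>
#include <vector>

// Six-dot Braille cell: bit n-1 set means dot n is raised (dots 1-6).
typedef unsigned char BrailleCell;

// Abridged letter table (letters a-e only, for illustration).
BrailleCell EncodeChar(char c)
{
    switch (c) {
    case 'a': return 0x01;                 // dot 1
    case 'b': return 0x01 | 0x02;          // dots 1,2
    case 'c': return 0x01 | 0x08;          // dots 1,4
    case 'd': return 0x01 | 0x08 | 0x10;   // dots 1,4,5
    case 'e': return 0x01 | 0x10;          // dots 1,5
    default:  return 0x00;                 // blank cell for unmapped input
    }
}

std::vector<BrailleCell> RenderBraille(const std::string& text)
{
    std::vector<BrailleCell> cells;
    for (size_t i = 0; i < text.size(); ++i)
        cells.push_back(EncodeChar(text[i]));
    return cells;   // superimposed on the haptic scene by the engine
}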
[Figure 2. Dataflow diagram of the Screen Reader: user input flows through the application and operating system as notifications to the control subsystem, which consults event-feedback lookup tables built by the macro processor from macro scripts; text to be spoken goes to the text to speech engine (using word-phoneme mapping rules and prosody segments) and on, as phonemes/synthesizer commands, to the audio interface driver and speaker; haptic feedback commands go to the haptic feedback engine (using actuation pattern sequences) and on, as haptic actuation sequences, to the haptic interface driver and glove; control parameters adjust each subsystem.]
The haptic feedback engine uses a set of actuation pattern sequences to generate
the lower level force-feedback sequences to be delivered to the final haptic device in
order to produce the high level tactile and kinesthetic effects requested by the control
subsystem. These actuation sequences are passed on to the haptic interface driver that
actually drives the hardware in the glove.
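Purely as an illustration of this expansion (the structure, field names and timing values below are invented for the sketch and are not taken from any device's documentation), a ‘bump of magnitude x’ command might be turned into a short actuation sequence like this:

#include <vector>

struct ActuationFrame {
    int actuatorId;      // which vibrotactile actuator on the glove
    double amplitude;    // normalized drive level, 0.0 - 1.0
    int durationMs;      // how long to hold this level
};

// Expand a "bump" of the given magnitude into a short rise/fall sequence.
std::vector<ActuationFrame> MakeBump(int actuatorId, double magnitude)
{
    std::vector<ActuationFrame> seq;
    ActuationFrame up   = { actuatorId, magnitude,       15 };
    ActuationFrame down = { actuatorId, magnitude * 0.3, 10 };
    ActuationFrame off  = { actuatorId, 0.0,              5 };
    seq.push_back(up);    // sharp onset gives the "edge" sensation
    seq.push_back(down);  // quick decay
    seq.push_back(off);   // return to rest
    return seq;           // handed to the haptic interface driver
}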
In addition to acting as an output device the haptic interface will also act as an input device. Apart from playing a major role in providing feedback to the haptic feedback engine, which uses the input as cues for further actuation sequences, the input will also be used as a special pointing device on the screen. The multi-dimensional input the haptic device provides will be modeled as one or more pointing devices, each of which could have multiple axes of freedom.
2.2. Potential candidates for the various components
While users could be using a wide variety of computers running various operating systems, we will limit our discussion to IBM compatible PCs running Microsoft Windows as the operating system. The platform was chosen since this combination is increasingly becoming the standard for commercial use. Large volumes have also made these systems very affordable, and so these systems are widely used as home computers too.
Of the various components shown in Figure 2 we will not discuss the standard components like the keyboard driver, pointing device driver and the display driver. Before starting the discussion on the various subsystems in earnest, brief descriptions of
the conceptual and implementation models of the Windows programming environment are presented. First the architecture of the Microsoft Windows programming model is discussed. This architecture, known as Windows Open System Architecture (WOSA), serves as an overall conceptual frame of reference for Windows programming. A brief discussion of the relevant portions of the OLE interface specifications follows. In addition to defining various building blocks that can be used as the foundation for the actual implementation, these specifications also provide the framework to create a modular implementation of our system.
2.2.1. Windows Open Systems Architecture (WOSA)
WOSA is the software architecture proposed by Microsoft to facilitate modular application development and easier integration of components from multiple vendors[3]. Figure 3 shows the high level programming model of this architecture. The operating system provides various services; a service is essentially another name for a piece of functionality. Examples of services are database connectivity, network connectivity, text to voice functions etc. Applications call various Windows application programming interfaces (APIs) to make use of services provided by Windows (Link A in Figure 3). The operating system translates the application requests into requests for specific service providers (B links) and passes these requests to the particular service providers (Link C). The advantage of using this layered architecture is that Windows provides standard interfaces for both applications and vendors who provide lower level building blocks. It decreases the amount of work needed by applications to support different lower level building blocks. It also gives more freedom to system builders since it allows easier mixing of components from multiple vendors.
[Figure 3. Windows Open System Architecture (WOSA) Components: Applications call the Windows Application Programmer's Interfaces (APIs); the Windows service managers route requests through the Windows Service Provider's Interfaces (SPIs) to the Service Providers.]
WOSA specifications have been developed for various functions like database connectivity, network connectivity, messaging, imaging, financial services etc. The specifications that are of interest to us are the Speech API and the DirectInput API. The Speech API provides the voice recognition and text to voice functionality. The DirectInput API currently provides just multiple joystick capabilities and supports up to six axes of movement for each joystick. The interface is capable of supporting force feedback capabilities, and the definition of the specific interfaces is currently in progress.
2.2.2. The OLE Environment
[Figure 4. The OLE Framework: a client holds pointers to an object's interface functions; the object, living in a component server, consists of properties (data members) and methods, accessible only through its interfaces.]
The OLE environment is the model that the Windows operating system presents of itself to a programmer. The textbook definition of OLE is that “It is a unified environment of object based services with the capability of both customizing those services and arbitrarily extending the architecture through custom services, with the overall purpose of enabling rich integration between components”[4]. More simply put, this means it presents the services of the operating system as a set of components. These components can be customized, and new components can be added to extend the functionality of the system. From the programmer's point of view OLE is made up of a set of APIs and a large number of interfaces. An interface is a group of semantically related functions through which you access the functionality provided by an object. OLE does
not allow direct access to an object's internal variables. All access to the object is via the functions that constitute the interface. Figure 4 shows a high level view of the OLE environment. In programming terms, any object can be considered to be made of a set of properties (data elements) and methods (member functions). While this represents the internal view of an object, the object presents itself only as a set of functions to any external entity.
Figure 5 shows the representation of an interface at the binary level. The interface is basically a pointer to a table of function pointers. It is important to note that, since the interface is the only view of the object available to an external entity, the object's implementation is completely hidden from the client. The object could be implemented in any language as long as the language environment allows for setting up the interface function table. Further, since this is a binary level standard, changing the object implementation does not require rebuilding any users of the object. As long as the order of the interface functions remains the same, the implementation can be changed as needed, transparently to the user of the component.
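A small C-style sketch may make this binary layout concrete. The names below are illustrative (a real OLE interface additionally begins with the three IUnknown entries, QueryInterface, AddRef and Release):

// An interface pointer is a pointer to a structure whose first member
// (lpVtbl) points to a table of function pointers, the "vtable".
typedef struct IExampleVtbl {
    long (*Function1)(void* self);
    long (*Function2)(void* self, int arg);
    long (*Function3)(void* self);
} IExampleVtbl;

typedef struct IExample {
    const IExampleVtbl* lpVtbl;  // first member: pointer to the vtable
    /* private object data follows here, hidden from the client */
} IExample;

/* A client calls through the table, never touching the object's data:  */
/*     long r = pObj->lpVtbl->Function2(pObj, 42);                      */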
[Figure 5. Binary Representation of an Interface [4]: the interface pointer points to lpVtbl, a pointer to the interface function table (vtable) of function pointers, which in turn point to the object's implementation of the interface functions; the private object data stays hidden behind the table.]
2.2.3. Screen Reader Control Subsystem
The screen reader needs to ask the operating system to notify it (the screen reader) when an application changes the screen or when the user provides the system some input via the keyboard or the pointing device. In the Windows environment the screen reader can receive these notifications by installing hooks via the SetWindowsHookEx function[5]. In particular the screen reader needs to install filter functions for the following hooks: WH_CBT, WH_GETMESSAGE and WH_SYSMSGFILTER.
The WH_CBT hook is called whenever Windows performs any significant action related to windows, including creation/destruction of a window, activation of a window, minimization/maximization of a window, sizing of a window, activation of any item in the system menu of a window, and setting focus to a window.
The WH_GETMESSAGE hook is called whenever the GetMessage or PeekMessage functions are about to return. GetMessage and PeekMessage are the functions, typically used in the main loop of any Windows based application, that an application uses to get input from the system/user.
The WH_SYSMSGFILTER hook is called when a dialog box, a message box, a scroll bar, or a menu retrieves a message, or when the user presses the ALT+TAB or ALT+ESC keys. Basically these are special circumstances where the application does not receive the input via the normal GetMessage and PeekMessage functions.
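A minimal sketch of installing the WH_CBT hook is shown below. The surrounding scaffolding (the g_hCbtHook variable, the InstallCbtHook helper and the actions suggested in each case) is our own illustration; a system-wide filter function must reside in a DLL, and error handling is omitted:

#include <windows.h>

static HHOOK g_hCbtHook = NULL;

// Filter function: called by Windows on window creation, activation,
// focus changes, minimization/maximization, sizing etc.
LRESULT CALLBACK CbtProc(int nCode, WPARAM wParam, LPARAM lParam)
{
    if (nCode >= 0) {
        switch (nCode) {
        case HCBT_CREATEWND: /* update the off-screen model: new window */ break;
        case HCBT_SETFOCUS:  /* announce the control gaining focus      */ break;
        case HCBT_ACTIVATE:  /* announce the newly active window        */ break;
        }
    }
    // Always pass the event on so other applications keep working.
    return CallNextHookEx(g_hCbtHook, nCode, wParam, lParam);
}

void InstallCbtHook(HINSTANCE hInstDll)  // DLL instance containing CbtProc
{
    g_hCbtHook = SetWindowsHookEx(WH_CBT, CbtProc, hInstDll, 0);
}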
The control subsystem maintains an off-screen model of the screen and gives appropriate feedback to the user according to the context. Audio icons can be used to indicate various actions happening on the screen. For example, events like a dialog box opening, a menu dropping down, a menu item getting selected etc. might give brief distinctive non-voice sounds followed by enunciated text appropriate to the particular context. Some of these events are already supported by the operating system itself, whereas the screen reader will have to handle other events.
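For instance, such audio icons could be triggered with the standard Win32 PlaySound call; in this sketch the event codes and .wav file assignments are invented for illustration:

#include <windows.h>
#include <mmsystem.h>   // PlaySound; link with winmm.lib

// Play a short non-voice audio icon for a screen event, asynchronously
// so that speech output is not delayed. File names are illustrative.
void PlayAudioIcon(int event)
{
    const char* wav = NULL;
    switch (event) {
    case 1: wav = "dialog_open.wav"; break;  // a dialog box opening
    case 2: wav = "menu_drop.wav";   break;  // a menu dropping down
    case 3: wav = "menu_select.wav"; break;  // a menu item selected
    }
    if (wav != NULL)
        PlaySound(wav, NULL, SND_FILENAME | SND_ASYNC);
}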
2.2.4. Text to Voice Engine
The Speech API provides for both text to voice and voice recognition functions. We are interested only in the text to voice functions. Figure 6 shows the major components of the WOSA text to speech subsystem[6]. The functionality is made available as a set of OLE components, i.e. each component exposes a set of interface functions that an application calls in order to use the services provided by the component. A brief description of the various components and their role in the system follows.
The text-to-speech enumerator enumerates the text-to-speech modes for all engines available to the application (Link A in Figure 6). The WOSA architecture allows for multiple text-to-speech engines to coexist on a single system. A text-to-speech mode is a particular style of language/speech, for example, an adult male voice speaking US English. The engine enumerator enumerates the text-to-speech modes for a particular engine (Link C). The engine object provides a text-to-speech mode for the application, which in our case is the screen reader control subsystem, to use. The multimedia audio destination object represents an audio destination and is based on a multimedia device driver.
[Figure 6. Text To Speech Subsystem: the application (the screen reader control subsystem) uses the text to speech enumerator, the engine enumerator, the text to speech engine and the multimedia audio destination.]
Once the control subsystem selects a particular text-to-speech mode to be used, it sends the text to be spoken to the text engine (Link D). The text engine converts the text into an appropriate phonetic representation and converts the phonetic representation into commands for the multimedia audio destination device (Link E). In some cases, like the playing of sound icons, the control subsystem might directly send commands to the audio destination device (Link F).
The actual steps involved in using the text-to-speech services (sketched in code after this list) are:
1. Initialization of the OLE libraries via a call to the CoInitialize function.
2. Creation of an instance of the multimedia audio destination object by a call to the CoCreateInstance function.
3. Creation of an instance of the text to speech enumerator and obtaining the handle of a text to speech engine and an appropriate mode from it via the Find function of the enumerator.
4. Creation of an instance of a custom class that implements the ITTSBufNotifySink interface. The engine will call this interface to notify us of buffer-related events.
5. Causing the engine to speak out text by calling the TextData function of the ITTSCentral interface of the engine with appropriate parameters. The text to be spoken is actually passed to the engine as a parameter to the TextData function.
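The fragment below sketches these five steps. The interface and function names (CoInitialize, CoCreateInstance, Find, ITTSBufNotifySink, ITTSCentral's TextData) are those cited above, but the declarations marked as stand-ins are simplified for illustration and do not reproduce the actual Speech API headers; the real signatures should be taken from the SDK documentation[6]:

#include <windows.h>
#include <ole2.h>

// --- simplified stand-in declarations (see note above) ---
struct ITTSCentral
{
    // Step 5: hand the engine a buffer of text to speak.
    virtual HRESULT TextData(DWORD dwFlags, const wchar_t* pszText,
                             IUnknown* pNotifySink) = 0;
};

struct ITTSEnum
{
    // Step 3: locate an engine and mode matching the requested attributes.
    virtual HRESULT Find(void* pModeInfoRequested, ITTSCentral** ppTTSCentral) = 0;
};

void SpeakSketch(ITTSEnum* pEnum, IUnknown* pNotifySink)
{
    CoInitialize(NULL);                  // step 1: initialize the OLE libraries
    // Step 2 (not shown): CoCreateInstance creates the multimedia audio
    // destination object that the engine will ultimately drive.
    ITTSCentral* pTTS = NULL;
    pEnum->Find(NULL, &pTTS);            // step 3: engine handle plus mode
    // Step 4: pNotifySink is an instance of a custom class implementing
    // ITTSBufNotifySink; the engine calls it back with buffer events.
    if (pTTS != NULL)
        pTTS->TextData(0, L"File Open Dialog Box", pNotifySink);  // step 5
    CoUninitialize();
}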
2.2.5. Haptic Controller
Conceptually, the model for the system will be something similar to the graphics rendering subsystem. The haptic interface driver implements some low level primitives, whereas the haptic feedback engine takes care of translating the more complex scene descriptions given by the control subsystem into the lower level primitives. This is equivalent to video drivers implementing functions like drawing a line, filling a polygon etc., while the GDI takes care of dithering, hidden surface removal, shadow rendering etc.
The haptic subsystem has not yet been standardized to the extent of the text-to-speech subsystem. The DirectInput section of the Microsoft Games SDK defines the functionality for a variety of devices like joysticks, flight yokes, virtual reality headgear etc.[7]. Each device can use up to six axes of movement, and the architecture supports up
to sixteen such devices on a single system. The overall architecture of the system follows the WOSA model and is not very different from the architecture of the text-to-speech subsystem described in the previous section. The architecture is flexible enough to accommodate force-feedback joysticks and gloves. As of now specific interfaces have not been defined; Microsoft is in the process of developing new interfaces along with a group of other vendors working in these areas.
Haptic feedback is composed of two elements: tactile perception, or the awareness of stimulation of the skin, and kinesthesis, or the sense of joint positions, movements and torques. A glove that can give haptic feedback is commercially available[8]. The CyberTouch™ glove provides haptic feedback using vibrotactile actuators on each finger and the palm of the hand, and uses resistive sensors to detect the wrist and finger positions and movement. The device interfaces to the PC via the serial port and allows the developer to develop custom actuation patterns. Another device that looks very promising is the PHANToM interface from Sensable Devices[9]. In its current commercial form this device provides haptic feedback to the tip of the finger via a thimble connected to appropriate actuating motors. The user places a finger in the thimble and is able to “feel” the virtual object. A study using this device as the interface to help blind users visualize technical and scientific data has been reported[10]. The device interfaces to the PC via a special adapter board. Work needs to be done to extend the interface to provide feedback via more fingers and possibly the entire wrist.
3. Further work
First, the work that needs to be done to actually implement such a system under the Windows environment will be discussed, followed by a brief discussion of the issues in porting the system to other platforms.
3.1. Work for the Windows platform implementation
3.1.1. Technological Considerations
The biggest technical issue is the standardization of haptic interfaces. The fact that haptic interface technology is relatively young and still evolving makes this harder still. However, the recent proliferation of multimedia computers and virtual reality technologies has generated a lot of interest in this area. Haptic interfaces are being considered for use by normal users for more realistic games, exploration of virtual worlds etc. This augurs well for improvements in the area, since commercial firms will be willing to devote more resources to the area in order to take advantage of the huge potential market.
Other than standardization, haptic interface technology itself needs to be explored further. In particular, the kinds of actuation sequences needed for expressing fine detail, a variety of textures etc. need to be designed. Getting acceptable response times while performing these complex sequences, especially from systems dealing with many actuation points, will be another problem. For example the CyberTouch™ glove uses 18-22 sensors, depending on the resolution desired, for just data acquisition to cover a single hand. The vibrotactile actuators are handled separately. If a more natural device, where the user can use both hands, is desired, the number of actuation points is quite large. The commercial version of the PHANToM device handles only two fingers at this time. To get acceptable performance we might have to use complex devices just to handle this part, which would increase the cost of the system. The cost factor is discussed further in the next section.
Better interfaces to present the system to the blind user have to be studied. For example, optimal ways of using normal speech output and non-speech sound icons to present an effective user interface were explored in a recent study[11]. New metaphors for use by the blind are also being studied[12].
3.1.2. Economic considerations
At this time the haptic interfaces described are extremely costly and make the entire system unaffordable for a typical visually impaired user. For example the CyberTouch™ glove costs about $14,000, while the PHANToM device costs about $20,000. However, as such devices become more mature and manufacture in large volumes starts, the price can be expected to come down to a large extent.
Other than this cost, it is estimated that a typical user's system will cost around $3,700. A developer's system for this project will cost around $6,000. Appendix A shows the details of the estimate.
3.2. Issues in porting to other operating systems
While the overall architecture of the system remains the same across all operating systems, common standardized interfaces are not available. For example, each of the
popular UNIX platforms (Sun, HP, SGI and IBM) has different hardware interfaces. The software interfaces to access the hardware also differ slightly between the corresponding operating systems. There is no standard system architecture model like WOSA, which again causes the drivers to be specific to each platform. So typically hardware is not portable across the various workstations.
The CyberTouch™ system supports the standard RS-232C serial port and so is compatible with these platforms at the hardware level. However the software interfaces are still different and have to be handled separately. The PHANToM™ system currently works with an ISA interface card and cannot be used with the Unix platforms directly. It might be useful to investigate the feasibility of creating a SCSI based interface for these devices, since this interface is supported across all platforms including IBM PC compatibles and Unix workstations. Also, this interface can support large bandwidths, which might be needed to produce finer and more accurate control of the haptic devices.
In terms of the software architecture, the X-Windows environment, particularly the Motif based version, has become fairly standard across all Unix platforms. Recently, hooks for supporting tools like screen readers have been defined for the X-Windows environment.
The cost of a base workstation for these Unix platforms is by itself slightly higher than that of a comparable PC, partly due to the lower volumes of manufacture of these systems. The same goes for the development tools too.
4. Conclusion
A high level design for a screen reader based on audio and haptic feedback was discussed, along with the issues involved in building the system using a standard operating system. Most of the components needed for such a system are currently available. However, a significant amount of work is needed with regard to the haptic subsystem. In particular, standard interfaces have to be developed, and the device itself is currently too costly for regular commercial production.
5. Bibliography
1. Lazzaro, Joseph J., "Adapting PCs for Disabilities", Addison Wesley Publishing Company, 1996.
2. -, "Voice-Tactile Interaction Facilitating Blind User Access to an Adapted Multimedia Retrieval Application", http://titan.mic.dundee.ac.uk/engineer/porting/blind/index.htm
3. -, "WOSA Backgrounder: Delivering Enterprise Services to the Windows Based Desktop", MSDN, July 1993.
4. Brockschmidt, Kraig, "Inside OLE", Microsoft Press, 1995.
5. Marsh, Kyle, "Microsoft Windows Hooks", MSDN, July 1992.
6. -, "Microsoft Speech API Documentation".
7. -, "DirectInput, MS Games SDK Documentation".
8. -, "CyberTouch™", Virtual Technologies Inc., http://www.quake.net/~virtex/products.html
9. -, "The PHANToM Interface", Sensable Devices Inc., http://www.sensable.com/users/sensable/
10. Fritz, Jason P., et al., "Haptic Representation of Scientific Data for Visually Impaired or Blind Persons", Proceedings of the Eleventh Annual Conference of Technology for People with Disabilities, Los Angeles, March 1996.
11. Darvishi, Alireza, "Use of Environmental Sounds in Non Visual Interfaces", Proceedings of the Eleventh Annual Conference of Technology for People with Disabilities, Los Angeles, March 1996.
12. Morford, Ronald, et al., "Windows City - A New Generation Windows Three Dimensional Screen Reader", Proceedings of the Eleventh Annual Conference of Technology for People with Disabilities, Los Angeles, March 1996.
6. Appendix A - Cost estimate for end user system / development system

Item                                         Normal User's Machine   Developer's Machine
IBM PC, Pentium 100 MHz, 16 MB RAM,
  1 GB hard drive (2 GB for developer),
  SVGA display                               $2,000                  $2,500
DECtalk speech synthesizer                   $900                    $900
Text to Speech Engine                        $800                    $800
C compiler                                   -                       $200
Visual Basic Compiler                        -                       $200
MSDN information                             -                       $495
Soft-ICE Debugger                            -                       $800
Total Cost                                   $3,700                  $5,895