DIALOGUE MANAGEMENT IN SPOKEN DIALOGUE SYSTEMS WITH
DEGREES OF GROUNDING.
by
Antonio Roque
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)
May 2009
Copyright 2009 Antonio Roque
Acknowledgments
This work has been sponsored by the U.S. Army Research, Development, and Engi-
neering Command (RDECOM). Statements and opinions expressed do not necessar-
ily reflect the position or the policy of the United States Government, and no official
endorsement should be inferred. I am personally grateful to the U.S. Army for under-
standing the value of research and for providing a context for my dissertation work.
I thank my committee: David Traum (advisor), who helped identify a topic of mutual
interest, made many comments, meticulously proofread my academic writing, and was
as tireless in seeking funding as he was generous in providing it; Kevin Knight, whose
input greatly improved the evaluations; and Shrikanth Narayanan and Milind Tambe,
who provided useful comments. The testbed projects involved infrastructure developed
by various co-workers at ICT, USC, and the Army, who I thank: Ron Artstein, Lila
Brooks, David DeVault, Sudeep Gandhe, Jillian Gerten, Panayiotis Georgiou, Dusan
Jan, Anton Leuski, John Looney, Bilyana Martinovski, Bill Millspaugh, Vivek Rangara-
jan, Susan Robinson, and Ashish Vaswani, among others. I also thank Alicia Abella and
the AT&T Labs Fellowship Program, William McMillan at Eastern Michigan Univer-
sity, Francisco Luis Roque, and my former colleagues at the University of Pittsburgh:
Dumisizwe Bhembe, Michael Bottner, Andy Gaydos, Brian Hall, Pamela W. Jordan,
Diane Litman, Maxim Makatchev, Umarani Pappuswamy, Michael A. Ringenberg, Car-
olyn Penstein Rosé, Stephanie Siler, Scott Silliman, Ramesh Srivastava, and Kurt Van-
Lehn.
I am particularly grateful to Carolyn Penstein Rosé, who proved by example that
it was possible to be profoundly devoted to both research and family: if it were not
for her guidance I would never have thought to pursue a PhD, and I did not know the
value of her frequent encouragement until I was absent from it. I am also grateful to
Mark Core for talking to me about research, science, and career issues in general, and to
David DeVault for reminding me that discussing issues in language and meaning could
be enjoyable.
Table of Contents
Acknowledgments ii
Abstract xiii
Chapter 1: Introduction 1
1.1 Grounding and Spoken Dialogue Systems 1
1.2 Thesis Questions 4
1.3 Contributions of This Dissertation 5
1.4 Outline of Dissertation 7
Chapter 2: Related Work 10
2.1 Overview 10
2.2 Error Handling Approaches 10
2.2.1 Detecting Misunderstandings 11
2.2.2 Reacting to Misunderstandings: Studies of User Behavior 11
2.2.3 Reacting to Misunderstandings: System Approaches 13
2.2.4 Reacting to Non-Understandings 14
2.3 Grounding Approaches 14
2.3.1 Clark and Grounding 14
2.3.2 Grounding Acts Model 18
2.3.3 Decision-Making Under Uncertainty 19
2.4 Discussion 20
Chapter 3: A Model of Degrees of Grounding 22
3.1 Overview 22
3.2 Evidence of Understanding 22
3.3 Degrees of Groundedness 25
3.4 Grounding Criteria 27
3.5 Dialogue Management Algorithms 29
3.6 Discussion 31
Chapter 4: Development Testbed: Call for Fire Training 32
4.1 Rationale for Domain Choice 32
4.2 Military Radio Communication 33
4.3 Calls For Fire 36
4.4 CFF Training in JFETS-UTM 38
Chapter 5: Discourse Analysis of Development Corpus 40
5.1 Overview 40
5.2 Dialogue Phases 40
5.3 Information Types 43
5.4 Discussion 45
Chapter 6: CFF Dialogue Modeling with Degrees of Grounding 47
6.1 Overview 47
6.2 Common Ground Units 48
6.3 Evidence of Understanding 49
6.3.1 Submit 49
6.3.2 Repeat Back 50
6.3.3 Resubmit 51
6.3.4 Acknowledge 51
6.3.5 Request Repair 51
6.3.6 Move On 52
6.3.7 Use 53
6.3.8 Lack of Response 54
6.4 Degrees of Groundedness 55
6.4.1 Unknown 57
6.4.2 Misunderstood 58
6.4.3 Unacknowledged 58
6.4.4 Accessible 58
6.4.5 Agreed-Signal and Agreed-Signal+ 59
6.4.6 Agreed-Content and Agreed-Content+ 60
6.4.7 Assumed 60
6.5 Grounding Criteria 60
6.6 Trace of Example Dialogue 62
6.7 Discussion 66
Chapter 7: CFF Dialogue Management with Degrees of Grounding 67
7.1 From Dialogue Modeling to Dialogue Management 67
7.2 Dialogue Management Algorithms 68
7.2.1 Identifying Evidence of Understanding 68
7.2.2 Identifying Degrees of Groundedness 70
7.2.3 Determining Evidence of Understanding in Response 72
7.3 Implementation Details 75
7.4 Trace of Dialogue Management Example 78
7.5 Corpus Evaluation of Dialogue Management Algorithms 82
7.5.1 Identifying Evidence - Inter-Coder Agreement 83
7.5.2 Identifying Evidence - Human-Algorithm Agreement 84
7.5.3 Change in Degree of Groundedness - Inter-Coder Agreement 85
7.5.4 Change in Degree of Groundedness - Algorithm Agreement 85
7.6 Evaluation of Implemented System 86
7.7 Discussion 87
Chapter 8: Grounding in Tactical Questioning with Degrees of Grounding 90
8.1 Extension of the Model to a Second Domain 90
8.2 A Virtual Human for Tactical Questioning Training 91
8.2.1 Domain: Tactical Questioning 91
8.2.2 Scenario: Hassan 91
8.2.3 Hassan System Architecture 93
8.2.4 Grounding and Dialogue Management 95
8.3 Example Dialogue Excerpt 98
8.4 Evaluation 99
8.4.1 Comparison to No Grounding Condition 99
8.4.2 Comparison to Random Grounding Condition 103
8.5 Discussion 105
Chapter 9: Conclusions and Future Directions 107
9.1 Conclusions of Completed Work 107
9.2 Future Directions 108
9.2.1 Application to New Domains 108
9.2.2 Machine Learning of Domain-Specific Features 109
9.2.3 Grounding, Emotion, and Personality 110
9.2.4 Probabilistic Approaches 111
Reference List 113
List of Tables
Table 1.1: Grounding-Related Terms 3
Table 2.1: Clark and Marshall’s Description of Shared Knowledge 15
Table 2.2: Clark and Schaefer’s Evidence of Understanding 17
Table 3.1: Example of Evidence of Understanding 23
Table 3.2: Example of Evidence of Understanding 25
Table 3.3: Example of Evidence of Understanding 26
Table 3.4: Example of Degrees of Groundedness 27
Table 3.5: Example of Degrees of Groundedness 28
Table 4.1: Example Radio Dialogue 35
Table 5.1: Example Call for Fire Dialogue 41
Table 5.2: Parameters Related to General and Establishing Dialogue Moves 44
Table 5.3: Parameters Related to Targeting Dialogue Moves 44
Table 5.4: Parameters Related to Delivery Dialogue Moves 45
Table 5.5: Parameters Related to Adjust Dialogue Moves 45
Table 6.1: Evidence of Understanding 50
Table 6.2: Example of Evidence in Dialogue 50
Table 6.3: Example of an Acknowledgment 51
Table 6.4: Example of a Request Repair 52
Table 6.5: Example of a Move On 53
Table 6.6: Example of a non-Move On 53
Table 6.7: Example of a Use 54
Table 6.8: Example of a Use 54
Table 6.9: Example of a Lack of Response 55
Table 6.10: Example of a Lack of Response 55
Table 6.11: Degrees of Groundedness 56
Table 6.12: Example of a Lack of Response 57
Table 6.13: Examples of Degrees of Groundedness 58
Table 6.14: Example of Unacknowledged and Agreed-Content Degrees 59
Table 6.15: Example of Agreed-Signal Degree 59
Table 6.16: Example of Agreed-Signal+ Degree 59
Table 6.17: Sample Set of Grounding Criteria 61
Table 6.18: CFF Dialogue Trace, Targeting Phase 62
Table 6.19: Change in CGU Record, Targeting Phase 64
Table 6.20: CFF Dialogue Trace, Post-Targeting Phase 65
Table 6.21: Change in CGU Record, Post-Targeting Phase 66
Table 7.1: Identifying Degrees of Groundedness 71
Table 7.2: Example of Repeat Back Evidence and Misunderstood Degree 72
Table 7.3: Identifying Evidence in Response 74
Table 7.4: Inter-Coder Agreement - Evidence 84
Table 7.5: Algorithm Agreement - Evidence 84
Table 7.6: Degree Increase/Decrease Agreements 85
Table 7.7: Performance Summary 87
Table 7.8: Performance in Grounding Condition 88
Table 7.9: Performance in Control Condition 88
Table 8.1: Example Dialogue With Hassan 93
Table 8.2: Evidence and Degree in a Dialogue With Hassan 98
Table 8.3: Experiment 1 - Questions Used 100
Table 8.4: Experiment 1 - Data 101
Table 8.5: Experiment 1 - Differences 102
Table 8.6: Experiment 2 - Questions Used 103
Table 8.7: Experiment 2 - Data 104
Table 8.8: Experiment 2 - Differences 105
Table 8.9: Example Dialogue: Misinterpretation and Grounding 105
List of Figures
Figure 3.1: Dialogue Management Algorithms 29
Figure 4.1: A Human in the JFETS-UTM Environment 38
Figure 5.1: Example Dialogue Moves and Parameters for an Utterance 43
Figure 6.1: Example Common Ground Unit 49
Figure 7.1: Dialogue Management Algorithm 68
Figure 7.2: Rules to Identify Evidence of Understanding 69
Figure 7.3: Rules to Identify Degrees of Groundedness 73
Figure 7.4: Rules to Determine Evidence in Response 75
Figure 7.5: IOTA Architecture: IOTA in the JFETS-UTM 76
Figure 7.6: IOTA Architecture: Components Within IOTA 77
Figure 7.7: Dialogue Management Example: CGU Objects 79
Figure 7.8: Dialogue Management Example: CGU Register 79
Figure 7.9: Dialogue Management Example: CGU Register 80
Figure 7.10: Dialogue Management Example: CGU Under Consideration 80
Figure 7.11: Dialogue Management Example: CGU Register 81
Figure 7.12: Dialogue Management Example: Evidence Provided 81
Figure 7.13: Dialogue Management Example: CGU Register After Reply 82
Figure 8.1: A Human Interacting With Hassan 92
Figure 8.2: Hassan System Architecture 94
Figure 8.3: Example Interpretation: Wh- Question 95
Figure 8.4: Example Interpretation: Assertion 95
Figure 8.5: Example Interpretation: Greeting 96
Figure 8.6: Example Interpretation: Unknown 96
Figure 8.7: Hassan Grounding Algorithm 96
Abstract
Spoken dialogue systems - computers that interact with humans through spoken conver-
sations - must become more robust before they will be widely accepted. One tradition in
improving error-handling in spoken dialogue systems involves studying and implement-
ing grounding behavior as used by humans. When humans converse, they typically work
together to establish mutual understanding by using behavior such as repetitions (“you
said seven o’clock...,”) acknowledgments (“ok, got it,”) and repairs (“No, I said ten
o’clock.”) These types of evidence of understanding combine to help humans establish
that the material under discussion is mutually understood to a level sufficient for the cur-
rent purposes. However, previous work in grounding has not examined how to explicitly
represent the degree to which material is grounded during a dialogue, whether this can
be useful for dialogue management in spoken dialogue systems, and what advantages
this brings to implemented systems.
This thesis presents the novel Degrees of Grounding model. This model answers
open questions by using a corpus study to identify how to explicitly represent the degree
to which material has reached mutual understanding during a dialogue. The model
describes how evidence of understanding combines to define the degree of groundedness
of some material under discussion, how grounding criteria can be defined in terms of
those degrees of groundedness, and how algorithms working with these concepts can be
used for dialogue management.
The components of the Degrees of Grounding model were developed by analyzing
behavior of artillery fire request dialogues in a virtual training environment. An eval-
uation confirmed that the dialogue management algorithms agreed with human judg-
ments, and that the dialogue manager was capable of managing dialogues in the virtual
environment while providing more detailed descriptions of dialogue behavior than were
previously available. The Degrees of Grounding model was then implemented in a vir-
tual human for tactical questioning training, and a set of experiments with users showed
that the Degrees of Grounding model produced more appropriate grounding utterances
when compared to baseline systems.
Chapter 1
Introduction
1.1 Grounding and Spoken Dialogue Systems
The general public is increasingly likely to encounter dialogue systems, and especially spoken dialogue systems: as telephone-based Interactive Voice Response systems, for in-car navigation, or for personal computer control, for example. Researchers study other domains such as tutoring [LS04] and interacting with virtual humans [RMG+02]. However, these research systems - and even current commercial systems - must become more robust before they will be widely accepted.
Typical dialogue systems have several components. If the system’s interface works
through speech rather than text, an Automated Speech Recognition component takes the
speaker’s acoustic signal and produces a text interpretation of the utterance. However,
to the dialogue system, that text does not automatically have any meaning that might
be useful in that domain. So a Natural Language Understanding component takes that
text and produces an interpretation of the meaning of that utterance. A Dialogue Man-
agement component must then determine what response, if any, to make to the human
speaker and update the information state of the dialogue, usually by referring to a model
of the state of the dialogue to that point, and possibly also to domain-specific informa-
tion, such as the state of a virtual world at that point or a set of information stored in a
database. Then, a speech generation system must produce the desired reply.
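To make this pipeline concrete, the following minimal Python sketch strings the four components together; the function names, the example utterance, and the placeholder behavior are purely illustrative and are not drawn from any system described in this dissertation.

    # Minimal sketch of the typical pipeline described above.
    # All component behavior here is a stand-in: a real system plugs in
    # actual ASR, NLU, dialogue management, and generation components.

    def recognize(audio_signal):
        # Automated Speech Recognition: acoustic signal -> text hypothesis
        return "the fax number is 3 1 0 5 7 4 5 7 2 5"

    def interpret(text):
        # Natural Language Understanding: text -> domain-level meaning
        return {"move": "submit", "item": "fax_number", "value": "3105745725"}

    def manage(meaning, information_state, domain_state=None):
        # Dialogue Management: decide what response (if any) to make,
        # and update the information state of the dialogue.
        information_state.setdefault("history", []).append(meaning)
        return {"act": "acknowledge", "item": meaning["item"]}

    def generate(reply):
        # Generation: render the chosen reply as output speech or text.
        return "roger, fax number received"

    information_state = {}
    reply_act = manage(interpret(recognize(b"...audio...")), information_state)
    print(generate(reply_act))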
However, Automated Speech Recognition and Natural Language Understanding technologies are error-prone, especially when attempting domain- and speaker-independence. Furthermore, the human’s speech is often a reaction to something the
dialogue system said, which may have involved errors by the Dialogue Manager or
speech generation system [BVPV03]. This is in addition to mistakes of comprehension
or generation that the human user may have contributed. So a dialogue manager does its
work in a potentially errorful situation.
Because of this, an active field of research involves handling errors in spoken dia-
logue systems. Researchers study ways of detecting these errors and handling them
once detected. Some of these approaches focus purely on improving system perfor-
mance rather than modeling human behavior. These approaches favor using confidence
scores from Speech Recognition or Language Interpretation components to detect errors
[BR05a], using belief models to identify when a belief inconsistent with its current
domain or discourse was detected [McR98], or using supervised learning trained on an
implemented system to determine recovery policies [BLR+06].
Another approach in researching error handling notes that humans also work in noisy
environments and have developed strategies for detecting and managing errors. These
researchers study the kinds of prompts given to indicate misunderstandings or acknowl-
edgments that understanding has occurred [KSTW01], the kinds of errors and error-
correction mechanisms humans use in dialogue [AF03], and the kinds of clarification
requests that humans use [RS04], for example.
One focus of research in this approach considers the behaviors humans use to man-
age error and mutual understanding while working collaboratively to establish mutual
belief. Clark and Schaefer [CS89] discuss the importance of studying how material is added to the common ground, the set of beliefs that the participants of that discourse have in common. Clark and Schaefer define grounding as a collaborative activity between participants who work towards establishing a set of mutually-held beliefs. As part of a model of grounding, Clark and Schaefer describe how participants provide evidence of understanding such as confirmations (“OK, you said seven o’clock”), backchannelling (“uh-huh”), and corrections (“I said ten, not seven”) to indicate their progress towards mutual understanding. Clark and Schaefer also describe how participants work towards the grounding criterion, at which point both parties are satisfied that they agree on the grounded material. These terms are summarized in Table 1.1.

Term                          Definition
Common Ground                 The set of beliefs that the participants of a discourse have in common
Grounding                     A collaborative activity between participants who work towards establishing a set of mutually-held beliefs
Evidence of Understanding     Indications of progress towards mutual understanding
Degree of Groundedness        The extent to which material is grounded
Degrees of Grounding model    The central contribution of this dissertation, which describes how degrees of groundedness arise from evidence of understanding, and shows applications to dialogue management

Table 1.1: Grounding-Related Terms
These concepts are further developed by Traum [Tra94], who presents a compu-
tational model of grounding implemented in a planning assistant for railroad resource
allocation. Traum uses a model of groundedness in which something is either grounded,
not grounded, or in the process of being grounded. However, he acknowledges that the
currently undeveloped notion of degrees of groundedness [Tra94, TD98] - the extent
to which material is grounded, as introduced by Clark and Schaefer - also appears to
be useful. Intuitively, it seems that some information needs to be better grounded than
others: for example, when ordering an airplane ticket over the phone it is essential that
every number of a credit card be agreed upon, but it is much less essential that every
word made in reply to “thank you very much and have a nice day” be mutually agreed
upon. The central contribution of this dissertation is the Degrees of Grounding model,
which describes how degrees of groundedness arise from evidence of understanding,
and shows applications to dialogue management.
1.2 Thesis Questions
This dissertation provides answers to several open questions regarding degrees of
groundedness.
It provides an answer to the question of how to represent degrees of groundedness.
As part of this, it answers the question of the relation between evidence of understand-
ing and degree of groundedness, in particular whether the strength of evidence of under-
standing is of primary importance in modeling the extent to which material is grounded.
It does this by showing that a degree of groundedness is the result of an accumulation of evidence of understanding, and that it is the degree of groundedness of the material that
is important in determining the extent to which material is grounded, rather than just the
latest type of evidence of understanding.
This dissertation also provides an answer to the question of whether degrees of
groundedness could be used in online dialogue managers, or if degrees of grounded-
ness were only useful for descriptive analyses of dialogues. Beyond providing a static
description of the features that influence groundedness, this dissertation describes a way
of reacting to the effect of various dialogue phenomena, tracking the degree of ground-
edness in an online fashion, and providing a reply based on the degree of groundedness
and grounding criteria of material under discussion.
Finally, this dissertation answers the question of what value, if any, there is to mod-
eling degrees of groundedness in spoken dialogue systems. It shows that modeling
degrees of groundedness, in a way that is based on an analysis of appropriate human
dialogue behavior, provides better representations of a human speaker’s performance in
a training dialogue and provides demonstrable improvements in the appropriateness of
a virtual human dialogue system’s replies.
1.3 Contributions of This Dissertation
The central contribution of this dissertation is the creation of the Degrees of Ground-
ing model. This model contains a set of types of evidence of understanding, such
as acknowledgments and repeat backs, which indicate progress towards mutual under-
standing. The model also contains a set of degrees of groundedness, such as accessible
and agreed-content, which measure the extent to which material is grounded. The model
describes how grounding criteria can be described in terms of degrees of groundedness,
and how these grounding criteria may vary. The model also contains several dialogue
management algorithms: one algorithm describes how evidence of understanding can be
identified from dialogue features, another algorithm describes how degrees of ground-
edness emerge from evidence of understanding, and a third algorithm determines what
evidence of understanding, if any, is appropriate to generate given the current state
of groundedness of the material under discussion. Although this model was developed
in the context of dialogue management in spoken dialogue systems, it provides general
principles relevant to text-based dialogue systems, and is also a useful tool for describing
dialogue between two humans.
The Degrees of Grounding model is developed based on a corpus analysis: an appropriate domain and corpus are identified, the material to be grounded and the dialogue behavior are identified, and the components of the model are developed. An evaluation
is conducted to verify that human annotators can agree upon the model’s features, and
that the algorithm’s identification of evidence of understanding and changes in degrees
of groundedness agree with the human annotator’s identifications. The model is imple-
mented in a spoken dialogue system, and it is shown to provide performance that equals
that of a traditional dialogue system, with the added benefit of better describing the
extent to which material has been grounded - which in a training system, can be useful
for post-session review or for identifying immediate instructor intervention. Then the
model is adapted to a second domain, distinct from the one in which it was developed.
The domain is studied and a component is built to handle grounding behavior in the
context of a virtual human spoken dialogue system. A set of experiments are run to
compare the performance of the virtual human with the Degrees of Grounding model to
baseline models. The experiments show that the system with the Degrees of Grounding
model provides improved performance.
The model of Degrees of Grounding makes several contributions by way of answer-
ing this dissertation’s thesis questions. First, it provides a representation of degree
of groundedness, and describes how it is updated by evidence of understanding. It
describes how the strength of degrees of groundedness, rather than the strength of an
individual piece of evidence of understanding, is of central importance to grounding.
Second, it provides a way of describing how degrees of groundedness are used to drive
dialogues forward by tracking the effect of dialogue phenomena on evidence of under-
standing, the effect of evidence of understanding on degrees of groundedness, and the
effects of degrees of groundedness and grounding criteria on decisions made regarding
dialogue replies. Third, it shows how the implemented model can be used to better rep-
resent a human trainee’s performance in a training dialogue system. Fourth, it shows
how adding grounding behavior can provide demonstrable improvements in the appro-
priateness of a virtual human dialogue system’s replies.
1.4 Outline of Dissertation
The dissertation proceeds as follows. Chapter 2 details research in error handling in
spoken dialogue systems and in human communication, such as work in detecting when
misunderstandings occur, how humans react to misunderstandings, what approaches to
reacting to misunderstanding are most likely to lead to task success in a dialogue system,
and what approaches are best when faced with a complete lack of comprehension of a
user’s utterance. This chapter also covers work in grounding, from its development as a theory of dialogue to computational approaches, emphasizing the concepts that
serve as background and foundation to the current work.
Chapter 3 describes the Degrees of Grounding model at a general level, indicating
how it follows from ideas in previous work in grounding. This chapter enumerates the
model’s components: evidence of understanding, degrees of groundedness, grounding
criteria, and algorithms for dialogue management. It provides examples of each, and
describes various factors that must be taken into consideration when considering how a
domain should be represented with this model.
To describe the model at a more detailed level, an appropriate testbed domain must
be identified. The testbed must be amenable to an initial analysis of grounding, by
containing a large amount of grounding behavior, and a discrete amount of information
to be grounded. The Call for Fire domain, in which artillery fire is requested by radio
dialogue, is identified as an appropriate choice. The domain is described in Chapter 4,
along with general considerations that affect and constrain military radio dialogue.
As part of the model development, a discourse analysis of the Call for Fire domain
is carried out, which identifies the information elements worked with in that domain,
along with the dialogue moves and dialogue move parameters used to communicate
about those information elements. This is described in Chapter 5.
Following that, Chapter 6 describes how the various components of the Degrees
of Grounding model are implemented in the Call for Fire domain, in the context of
describing a dialogue between two humans. The elements of each component of the
model are described, with examples. A sample dialogue is traced out, highlighting how
the dialogue model components describe the grounding-related phenomena after every
utterance.
Chapter 7 describes how the model works in an online interactive system in the Call
for Fire domain. An evaluation of the model is performed, first to confirm that human
annotators can agree upon identifying the model components, then that the dialogue
algorithms can agree with the humans. An online evaluation then shows that a dia-
logue manager using the Degrees of Grounding model can match the performance of a
traditional dialogue manager while providing additional information about the state of
groundedness.
Chapter 8 describes implementing the model for a virtual human in a new domain,
virtual humans for Tactical Questioning training. The Tactical Questioning domain is
defined, and the virtual human architecture is described. An example of how grounding
occurs in this domain is provided. A set of experiments are described, which show that
the virtual human implementing a Degrees of Grounding model performs better than
baseline systems.
Finally, Chapter 9 provides conclusions about the material presented, and discusses
research directions suggested by the current work, such as issues that might arise when
applying this model to new domains, ways in which machine learning could help with
learning domain-specific features, the importance of considering emotion and personal-
ity in grounding, and some thoughts on probabilistic approaches.
Chapter 2
Related Work
2.1 Overview
Human speakers conversing with each other risk errors of understanding in various
ways [Pae03], but humans have a strong set of error recovery strategies to call upon
[Ska05a]. A central current research question is how to help spoken dialogue systems
coordinate understanding and manage error as well as humans do.
Researchers have addressed various aspects of this problem, studying sources of mis-
understandings, types of clarifications, and strategies for error recovery, for example.
Section 2.2 reviews recent trends in this work.
A highly influential contribution comes from the idea of conversation as a joint
activity, in which grounding plays a central role [CS89]. Although much of the work
described in section 2.2 is aware of the notion of grounding and conversation as joint
activity, there is a set of research that more overtly builds on these concepts and devel-
ops them to attack problems in spoken dialogue systems. The set of research based on
grounding and conversation as joint activity, of which this thesis is a part, is described
in section 2.3.
2.2 Error Handling Approaches
Errors of understanding in Spoken Dialogue Systems are not always obvious. For that
reason, detecting misunderstandings is the first step towards handling errors.
2.2.1 Detecting Misunderstandings
Traditionally, spoken dialogue systems have relied on the confidence score of speech
recognition or interpretation components to identify misunderstandings [BR05a].
Hirschberg, Litman, and Swerts study ways of detecting user behavior that indicate
misunderstanding, focusing on prosodic cues [HLS99] and on automatic features such
as keywords, or the kinds of prompts that were given [HLS04]; using these techniques
they have found an improvement over simple confidence scores.
Kramer et al. study the kinds of cues humans use to indicate misunderstanding
[KSTW01]. They identify several features that tend to communicate positive evidence
that a human is successfully understanding: for example, humans might give confirma-
tions, give answers to questions, or add new information; they will not make corrections.
If a human is not understanding, they might give certain cues such as dis-confirmations, fail to give answers or new information when expected, or make a correction
or repetition.
Aberdeen and Ferro perform a corpus study to determine the kinds of errors, evi-
dence for errors, and correction mechanisms that humans performed, along with the
types of outcomes of each correction mechanism [AF03]. For example, evidence of an error might be that a participant ignores a certain context switch, or gives a
contradictory response. A correction mechanism might be an implicit confirmation and
a repeated prompt. However, they concede that their study was based on a small set of
data.
2.2.2 Reacting to Misunderstandings: Studies of User Behavior
Once a misunderstanding is detected, it must be reacted to. One approach researchers
have taken is to investigate how humans react to misunderstanding.
Litman, Hirschberg, and Swerts study the kinds of corrections that human users pro-
vide when they are misunderstood [LHS06]: for example whether users repeat them-
selves, provide paraphrases, repeat with task content added, repeat with less task con-
tent, or repeat with some content added and some omitted. Earlier work by these authors
studied the ways in which system dialogue strategies affect the types of correction given,
and identified that users preferred strategies that needed fewer corrections, even if that
meant that the tasks would take longer [SLH00].
Rodriguez and Schlangen study clarification requests that follow misunderstandings
[RS04]. They perform a corpus study to identify details of the forms and functions that
clarification requests may take: for example, the mood, tone, extent, and severity of the
clarification request.
Skantze studies the strategies that humans use to recover from error, principally in a
map task using the Galatea discourse modeller [Ska05b] in the Higgins spoken dialogue
system [ESC04]. Galatea tracks the grounding status of each concept in terms of who added it and when, and what confidence the system has in its understanding of the concept. Each concept has a grounding status ranging from high to low based on its confidence score. Based on
manually-set thresholds, if the grounding status is low, the system makes a clarification
request; if it is medium, the system makes a display of understanding, and if it is high,
then no clarification act is needed.
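A minimal sketch of that decision rule is shown below; the numeric thresholds are invented for illustration and are not the values used in Galatea.

    # Sketch of a threshold-based grounding decision of the kind described
    # for Galatea. The numeric thresholds are invented for illustration.

    CLARIFY_BELOW = 0.4   # "low" grounding status -> clarification request
    DISPLAY_BELOW = 0.8   # "medium" grounding status -> display of understanding

    def grounding_action(confidence):
        if confidence < CLARIFY_BELOW:
            return "clarification_request"
        elif confidence < DISPLAY_BELOW:
            return "display_of_understanding"
        else:
            return None   # high grounding status: no clarification act needed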
Skantze also ran an experiment with task-related conversations between humans, in
which he introduced error by having the communications mediated by ASR and vocoder
speech [Ska05a]. He noticed that a high word error rate led to fewer mis-understandings,
but many non-understandings. He also noticed that in the case of a non-understanding,
although sometimes the participant would signal non-understanding by explicitly say-
ing so, or re-assert what they thought was the correct topic, they more frequently tended
to ask questions related to the task that would confirm their hypothesis about what the
other participant believed. These hypothesis-confirmation questions led to the best immediate results, with significantly fewer non-understandings and more partial understandings, possibly because they constrain the response to the domain and language models.
2.2.3 Reacting to Misunderstandings: System Approaches
Other researchers have focused on the ways that systems can handle misunderstandings,
taking human behavior into account but not focusing on it as closely.
McRoy [McR98] identifies a way to detect and repair misunderstandings. A belief
created by an utterance would be accepted if that belief was consistent with current
beliefs. If the system could not find a role for that belief that was consistent with its
domain plan or discourse plan, it would decide it misunderstood that utterance. In that
case it would look back in previous turns for an interpretation that made sense if previous
utterances were interpreted differently. If it found none, it would ask the human for more
information.
McTear et al. [MOHL03] describe a system in which a confirmation status and
discourse peg are used to determine when to perform confirmations or repairs. Each
concept has a confirmation status such as new, inferred, repeated, and so on; as well as a
discourse peg equal to 1 if it had been repeated by the user, 0 if the value was modified,
or -1 if the system negated it indicating misunderstanding. If a new value was given, the
system would confirm it; if the value was 0, the system would make a repair confirm; if
the value was -1 it would request a repair. These decisions were made by rules.
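One plausible reading of those rules is sketched below; the representation of the confirmation status and the peg bookkeeping are assumptions made here for illustration, not details of McTear et al.’s system.

    # Sketch of confirmation-status / discourse-peg rules of the kind
    # described for McTear et al. The "new" flag and peg bookkeeping are
    # assumed here for illustration.

    def confirmation_action(status, peg):
        if status == "new":      # a new value was just given
            return "confirm"
        if peg == 0:             # the value was modified
            return "repair_confirm"
        if peg == -1:            # the value was negated (misunderstanding)
            return "request_repair"
        return None              # e.g. peg == 1: repeated by the user, no action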
Schlangen [Sch04] studies clarification requests, using the notion of levels of error
as described by Clark [Cla96] and Allwood [All95] to classify the causes of clarifica-
tions. He then uses a confidence score that is a mix of ASR confidence and intepretation
(semantic and pragmatic) confidence to decide on whether to perform clarification or
confirmations.
2.2.4 Reacting to Non-Understandings
In non-understandings, the system is unable to produce an interpretation of what the user said, or produces only an interpretation whose confidence falls below a certain threshold [BR05c]. In this case, the issues are
different from misunderstandings: detection is trivial, but there has been less research
in resolving it. The most recent contributions have come from Bohus and Rudnicky.
Bohus and Rudnicky developed RavenClaw, a system for exploring error detection
and recovery [BR05b] in which the error handling is done separately from the dialog task
specification, to make it domain-independent. For misunderstandings, it provides either
explicit confirmations, implicit confirmations, or rejections. For non-understandings, it
may ask for a repetition, ask for a rephrasing, repeat the last prompt possibly with more
information, merely tell the user that there was a problem, say nothing, go on to another
part of the task, give an example of what could be said, or provide a full help text. An
error handling process executes one of these strategies when needed, using MDPs and
confidence scores. Further research [BLR+06] explores online supervised learning of
these strategies.
2.3 Grounding Approaches
2.3.1 Clark and Grounding
The notion of grounding, and the kinds of behavior that conversation participants engage
in while achieving mutual belief, have been important to a particular direction in dealing
with error in dialogue. In particular, Clark and his collaborators have described dialogue
as a collective act, which depends on common ground and contributing to the common
ground. In their work they describe common ground in terms of mutual beliefs, and
the kinds of actions that conversation participants perform while establishing mutual knowledge.

shared_1 knowledge    A knows that p
                      B knows that p

shared_2 knowledge    A knows that p
                      B knows that p
                      A knows that B knows that p
                      B knows that A knows that p

shared_3 knowledge    A knows that p
                      B knows that p
                      A knows that B knows that p
                      B knows that A knows that p
                      A knows that B knows that A knows that p
                      B knows that A knows that B knows that p

And so on to shared_∞ knowledge.

Table 2.1: Clark and Marshall’s Description of Shared Knowledge
The concept of mutual knowledge is explored by Clark and Marshall in the context
of definite reference [CM81]. They summarize the technical distinction, developed in
the field of philosophy, between shared knowledge and mutual knowledge, describe the
importance of the latter, and resolve a paradox related to the issue.
Given a proposition p, and two people A and B, various levels of shared knowledge can be described as shown in Table 2.1. In shared_1 knowledge, both participants know the same p, but aren’t aware that the other knows it. In shared_2 knowledge, both participants know the same p and know that the other knows it, but don’t know that the other knows that they know it. And so on, to shared_∞ knowledge.
Clark and Marshall examine a definition of mutual knowledge as shared_∞ knowledge, which suggests a potential paradox: although mutual knowledge is described in the literature as essential in language, it seems to require an infinite checking of knowledge conditions. Because each check takes some non-zero amount of processing time, reaching mutual knowledge would require an infinite amount of time, which humans clearly do not spend.
Clark and Marshall resolve this through copresence heuristics, which depend on grounds or a basis which participants can use to induce mutual knowledge. A basis
involves evidence for that basis, possibly perceptual, for example, as well as a set of
assumptions, such as that the other participant is paying attention and will reason appro-
priately. This is relevant to the current work because certain conversation behavior can
be part of a basis.
Clark and Schaefer [CS89] discuss how conversations are highly coordinated efforts
in which participants work together to ensure that knowledge is properly understood by
all participants. They present a model which is not directly usable by an online system,
but which includes many useful concepts.
Clark and Schaefer note that models of discourse in philosophy, psychology of lan-
guage, linguistics, and computer science usually have three elements in common. They
tend to have a model of common ground, which represents the participants’ mutual
knowledge; they tend to agree that discourse involves participants adding to the common
ground; and they tend to believe that adding to the common ground is done unilaterally -
by saying the right thing at the right time. Clark and Schaefer disagree with the notion of
unilateral contributions, saying it does not account for grounding behavior: the positive
actions and the repairs that participants make to ensure material is correctly grounded,
or added to the common ground.
Grounding involves the conversation participants working to register a contribution
to a grounding criterion, which Clark and Schaefer define as “a criterion sufficient for
current purposes” [CS89]. Once the grounding criterion is met, the participants have
established mutual belief.
According to Clark and Schaefer, the grounding criterion is met through a two-phase
process in which a contributor presents an utterance, and a recipient accepts the utter-
ance by giving evidence of understanding. As shown in Table 2.2, Clark and Schaefer
[CS89] (and Clark and Brennan [CB91]) provide a set of types of evidence of understanding, which they order “roughly” from weakest (Continued Attention) to strongest (Display). Clark and Schaefer indicate that further study is needed to determine what kind of evidence is required for a given presentation.

Type of Evidence                            Definition
Continued Attention                         A participant continues to attend to the other’s utterances
Initiation of Relevant Next Contribution    A participant continues with an utterance in the dialogue
Acknowledgement                             A participant makes a general utterance of agreement such as “uh-huh”
Demonstration                               A participant uses the information in a way that shows that it is understood
Display                                     A participant repeats back a relevant part of the utterance

Table 2.2: Clark and Schaefer’s Evidence of Understanding
Clark and Wilkes-Gibbs [CWG86] describe how participants in a conversation will
act in a way that minimizes the amount of collaborative effort: the work that both of
them do in the grounding process. This Principle of Least Collaborative Effort explains
why, for example, a participant may not spend enough time to get an utterance right
the first time: given the constraints of time, it may be better to submit it and hope that
the other participant will help. Similarly, given a complex utterance, it may be better to
submit it in parts, to reduce the likelihood of confusion.
Further work by Clark and his collaborators describes various aspects of this
approach. For example, Clark and Brennan study how the communication medium influences grounding behavior [CB91]. The specifics of whether the participants are in the same environment, can see each other, and can hear each other, for example, influence the costs of grounding: how costly it is to formulate and to produce a message,
to receive and understand a message, and so on. Clark and Krych [CK04] follow up by
studying the ways in which conversation participants monitor each other for behaviour
such as gestures, nods, or gaze, which indicate understanding.
2.3.2 Grounding Acts Model
Traum points out some of the problems in applying Clark and Schaefer’s contributions
model directly to the task of grounding in dialogue systems [Tra99, Tra98]. Traum
notes that by using a two-phase structure, Clark and Schaefer introduce problems such
as needing to know how much acceptance the second phase requires, a problem which
is exacerbated by the lack of a fully-developed notion of degrees of grounding. It is also
difficult to adapt to an online environment, as it is better suited to an analysis in which
the entire conversation is available. Of course, this is in large part because Clark and
Schaefer’s model was not intended to inform dialogue systems.
Traum develops a model of grounding which avoids many of the problems of
Clark and Schaefer’s contributions model [Tra94, TD98]. Traum’s model uses a Finite
Automaton and includes a set of Grounding Acts which considers all acceptance acts
to be Acknowledgments, has a strong emphasis on repairs, and allows for the updat-
ing of the state after every utterance, so after every turn the grounding state is known.
Traum uses discourse units (later redefined as common ground units [NT99]) to rep-
resent the content being grounded, and grounding acts to describe the utterances that
ground the common ground units. Traum’s model is implemented in a dialogue manager that helps decide when a user utterance should be grounded, but it only involves three states - ungrounded, fully-grounded, and ’in process’ - and leaves unexplored the
notion of degrees of groundedness, the extent to which an item is grounded. This notion
will play a key part in the current study.
2.3.3 Decision-Making Under Uncertainty
Paek and Horvitz take the basic ideas about grounding in a different direction. Paek and
Horvitz use decision theory to approach grounding as decision-making under uncer-
tainty [PH00a]. In the process they developed several testbed agents, including the
Bayesian Receptionist [HP99], for which they studied the linguistic and visual cues
used by receptionists to determine what the user wanted, and the Lookout application
[HP00] which automated operations for the Microsoft Outlook application.
Taking inspiration from Clark [Cla96], Paek and Horvitz view error in dialogue as
happening at one of four levels: the channel level, which involves the action or utter-
ance; the signal level, which involves the behaviors that the participants agree upon as
being relevant to the communication; the intentional level, which involves the seman-
tic content being communicated; and the conversation level, at which the participants
coordinate the joint activity related to the task.
Paek and Horvitz combine the signal and channel levels into a maintenance module, use an intention module for the intentional level, and add a conversational control com-
ponent to coordinate interactions between these modules and manage the conversation.
Paek and Horvitz use Bayesian networks to model dependencies between these modules
and to manage the uncertainty in them.
Paek and Horvitz use value of information analysis to guide decisions about which information to request, by measuring the cost involved in getting that information versus the amount of information it would provide. They describe how visual cues, which humans acquire inexpensively, have a cost associated with activating and using them. Concep-
tual cues, which involve processing linguistic features, have a high cost, since clarifying
concepts with a human is intrusive, but term cues have a lower cost, since they involve
only clarifying a word [PH99].
Paek and Horvitz define the grounding criterion based on the expected utility of
repairing or not, given the probability of comprehension given evidence. A participant
should request a repair if the utility of repair is greater than the utility of not repairing.
If the utility of not repairing is higher than the utility of repairing for both participants,
then the material they are talking about has reached the grounding criterion [PH00b].
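Schematically (this rendering is added here for exposition, and the notation is not Paek and Horvitz’s own), with s ranging over comprehension outcomes:

    \[
    \mathrm{EU}(a) \;=\; \sum_{s} P(s \mid \mathrm{evidence}) \, U(a, s),
    \qquad a \in \{\mathrm{repair},\ \mathrm{no\ repair}\},
    \]

so a participant requests a repair when EU(repair) > EU(no repair), and the grounding criterion is reached once EU(no repair) ≥ EU(repair) holds for both participants.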
Paek and Horvitz cite several strengths of their approach [PH03]. It can track and
propagate the degree of uncertainty about various aspects of the conversation; for exam-
ple, about ASR confidence. Also, it can model contextual dependencies, such as the fact
that the background noise level influences the likelihood that a given speech input was
overheard, and not actually intended for the agent. Finally, it can model the stakes: the
severity of the consequences of a given decision.
Skantze continued work on one aspect of this approach, by investigating the use of
data to automatically derive expected utility values [Ska07]. Specifically, he used mea-
sures of dialogue efficiency, task failure consequences, and information gain to deter-
mine cost values, and used data to determine ASR confidence thresholds.
2.4 Discussion
As discussed in Section 2.2, researchers have studied ways of handling errors in spoken
dialogue systems: how to detect misunderstandings, how to react to misunderstandings,
and how to react to a complete lack of understanding. Section 2.3 focused on grounding
approaches, inspired by the behavior that human speakers exhibit while establishing
mutual knowledge.
Clark and Schaefer’s concept of grounding criteria implies that, at the very least,
material can either be ungrounded or fully grounded, and Clark and Schaefer’s set of
evidence of understanding and their strength of evidence principle suggest that mate-
rial will be more grounded after certain types of evidence of understanding, which
implies that there are various degrees of groundedness. However, a further specifica-
tion of degrees of groundedness and how they emerge from dialogue behavior has not
yet been provided. Chapter 3 provides an overview of the Degrees of Grounding model
and describes how evidence of understanding produces degrees of groundedness, and
why this might be useful for dialogue management. This model is more fully developed
in Chapters 6 and 7 in the context of an immersive training environment for artillery fire
requests, and in Chapter 8 in the context of a virtual human.
Chapter 3
A Model of Degrees of Grounding
3.1 Overview
The Degrees of Grounding model develops notions introduced by Clark and Schaefer,
as described in Chapter 2. However, a formalism such as Traum’s Grounding Acts
model is more straightforward for dialogue management. To this end, the Degrees of
Grounding model separates the degree of groundedness from the types of evidence of
understanding, and uses this along with the degrees of groundedness to determine an
appropriate grounding reply.
The Degrees of Grounding model includes several components: a set of types of
evidence of understanding, a set of degrees of groundedness, a definition of grounding
criteria based on degrees of groundedness, and algorithms for dialogue management.
These components are described in the following sections of this chapter, along with
discussions of which components are general, which are domain-specific, and how the
domain-specific components may be adapted to a new domain.
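As a preview of how these components can be represented, the Python sketch below enumerates the evidence types and degrees that are developed in Chapter 6 and attaches a grounding criterion to each element of material; the class layout is an illustrative sketch, not the implementation described in Chapter 7.

    # Illustrative data structures for the model's components. The evidence
    # types and degrees listed are those developed in Chapter 6; the class
    # layout itself is a sketch, not the Chapter 7 implementation.
    from dataclasses import dataclass, field
    from enum import Enum, auto

    class Evidence(Enum):
        """Types of evidence of understanding."""
        SUBMIT = auto()
        REPEAT_BACK = auto()
        RESUBMIT = auto()
        ACKNOWLEDGE = auto()
        REQUEST_REPAIR = auto()
        MOVE_ON = auto()
        USE = auto()
        LACK_OF_RESPONSE = auto()

    class Degree(Enum):
        """Degrees of groundedness."""
        UNKNOWN = auto()
        MISUNDERSTOOD = auto()
        UNACKNOWLEDGED = auto()
        ACCESSIBLE = auto()
        AGREED_SIGNAL = auto()
        AGREED_SIGNAL_PLUS = auto()
        AGREED_CONTENT = auto()
        AGREED_CONTENT_PLUS = auto()
        ASSUMED = auto()

    @dataclass
    class Material:
        """An element of material under discussion, with its grounding criterion."""
        name: str
        criterion: Degree                   # degree that counts as "grounded enough"
        degree: Degree = Degree.UNKNOWN     # current degree of groundedness
        evidence: list = field(default_factory=list)   # evidence seen so far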
3.2 Evidence of Understanding
This model begins with Clark and Schaefer’s notion of evidence of understanding as
phenomena that provide an indication of the extent to which material under discussion
is mutually understood. Clark and Schaefer’s set of types of evidence of understand-
ing includes dialogue behavior (general acknowledgments, repetitions of material heard
from the other speaker), indications of understanding based on semantic or task-based
properties (initiating a relevant next contribution, or using information that the other
speaker has provided in other ways) or nonverbal behavior (continuing to attend to the
other speaker).
The model of Degrees of Grounding adapts and extends Clark and Schaefer’s set of
types of evidence of understanding. Rather than focus on the strength of each individ-
ual type of evidence of understanding, the Degrees of Grounding model describes how
patterns of evidence are used by humans to reach the grounding criterion for the rele-
vant material under discussion. For example, consider the example in Table 3.1, which
describes a phone conversation between a patient (the author of this dissertation) and his
doctor.
Line   Speaker   Utterance                               Evidence
1      Patient   The fax number is 3 1 0 5 7 4 5 7 2 5   Submit
2      Doctor    3 1 0 5 7 4 5 7 2 5                     Repeat Back
3      Patient   right                                   Acknowledge

Table 3.1: Example of Evidence of Understanding
The speakers are discussing a fax number to which the doctor will send a document.
In line 1, the patient provides his fax number to the doctor - in terms of evidence of
understanding, the material in question is Submitted for consideration. As evidence, it
is not very strong: the speaker can assume that if the doctor is paying attention at that
moment, and if there are no errors in comprehension, then the material has been mutu-
ally understood; nevertheless, although weak, it is a type of evidence of understanding.
In line 2, the doctor recites the fax number - in terms of evidence of understanding,
the material in question is Repeated Back. This type of evidence of understanding is
much stronger, although it is still not unquestionable proof that the material is mutually
understood: the doctor might have misunderstood the number and then misstated it, for
example. Furthermore, after line 2, the doctor has no indication of whether the patient
has heard and accepted the doctor’s repetition. So the Acknowledgment in line 3 pro-
vides further evidence of understanding, because now the patient is indicating that he
has heard the doctor’s repetition, and found no fault in it.
When adapting the Degrees of Grounding model to a new domain, an initial step is
identifying the types of evidence of understanding in that domain. Many of the types
of evidence used in the domains described in this dissertation - such as when a speaker Submits new material or Repeats Back material - will transfer to the new domain, just
as they were transferred from Clark and Schaefer’s model to this model. However, other
types of evidence may not transfer between domains, especially those dependent on
modalities. For example, although Clark and Schaefer’s “Continued Attention” type
of evidence would almost certainly be useful to a dialogue manager with the ability to
attend to a speaker and to indicate continued attention, neither of the testbed domains
modeled in this dissertation had these abilities.
Identifying a type of evidence of understanding in the context of the Degrees of
Grounding model involves identifying what kinds of cues participants give each other
during the dialogue. Typically, identifying types of evidence of understanding will
involve a corpus analysis or study of the domain to identify these cues and the fea-
tures of that evidence that make it unique. Generally, these features are relevant on a
per-material basis. For example, the relevant features of the Repeat Back type of evi-
dence of understanding are that the material has been Submitted by the other participant,
and that the current participant is presenting the material again for the purposes of con-
firmation (i.e. without some indication that they think it is incorrect.) These features are
useful not only to define the types of evidence, but for use in algorithms for dialogue
management as described in Section 3.5.
3.3 Degrees of Groundedness
At some point in the dialogue shown in Table 3.1, the speakers reach the grounding cri-
terion for that material in that dialogue, and are satisfied that it is sufficiently grounded.
However, note that the essential component of reaching the grounding criterion is not
any single instance of evidence of understanding, but the combination of instances of
evidence of understanding. The degree of groundedness (extent to which material is
grounded) is not dependent on the strongest type of evidence used (arguably the Repeat
Back, which explicitly indicates a belief by the speaker that both dialogue participants
share the material in question), because if the patient had not produced the utterance in
line 3, the material would not be as strongly grounded (the doctor would not have been
assured that the patient found no fault in line 2.) The degree of groundedness is not
dependent on the last type of evidence used, because if the doctor had replied in line 2
with a simple acknowledgment such as “got that”, the material would not be as strongly
grounded. Rather, the degree of groundedness is dependent on ordered sequences of
evidence of understanding. Consider the examples in Tables 3.2 and 3.3.
Line   Speaker   Utterance                               Evidence
1      Patient   The fax number is 3 1 0 5 7 4 5 7 2 5   Submit
2      Doctor    Say that again?                         Request Repair
3      Patient   3 1 0 5 7 4 5 7 2 5                     Resubmit
4      Doctor    got that                                Acknowledge

Table 3.2: Example of Evidence of Understanding
In Table 3.2, the patient Submits the fax number in line 1, and the doctor Requests Repair of the material in line 2, indicating that they would like the other speaker to repeat the information. The patient Resubmits the material in line 3, and the
doctor Acknowledges the material in line 4. In contrast, the example in Table 3.3 uses
the same set of evidence of understanding, but in a different order. After the initial
submission, the doctor replies with an acknowledgment. However, the patient resubmits
the material, perhaps fearing the doctor did not sufficiently ground the material, and the
doctor replies with a request for repair. Although the same set of types of evidence of
understanding are used in these two examples, clearly the example in Table 3.3 is less
grounded by turn 4.
Line   Speaker   Utterance                               Evidence
1      Patient   The fax number is 3 1 0 5 7 4 5 7 2 5   Submit
2      Doctor    got that                                Acknowledge
3      Patient   3 1 0 5 7 4 5 7 2 5                     Resubmit
4      Doctor    Say that again?                         Request Repair

Table 3.3: Example of Evidence of Understanding
Degrees of groundedness are defined in terms of sequences of evidence of under-
standing. Consider Table 3.4, which repeats the initial doctor-patient dialogue, but addi-
tionally provides the degrees of groundedness of the relevant material after that line’s
utterance is made. After an initial Submit type of evidence, the degree of groundedness
is Accessible. If the sequence of evidence is Submit followed by a Repeat Back, the
degree of groundedness is Agreed-Content, indicating that the content of the material
(i.e. the actual number) has been agreed upon. If an Acknowledge type of evidence is
repeated after that, the degree of groundedness is Agreed-Content+, indicating that the
content of the material has been agreed upon, and further confirmed.
When adapting the Degrees of Grounding model to a new domain, after the set of
types of evidence of understanding is identified, the sequences of evidence of under-
standing that occur in the new domain need to be studied to determine whether previously
unseen patterns of evidence of understanding occur.
Line  Speaker  Utterance                              Evidence     Degrees
1     Patient  The fax number is 3 1 0 5 7 4 5 7 2 5  Submit       Accessible
2     Doctor   3 1 0 5 7 4 5 7 2 5                    Repeat Back  Agreed-Content
3     Patient  right                                  Acknowledge  Agreed-Content+
Table 3.4: Example of Degrees of Groundedness
These novel patterns may use evidence of understanding that has been newly identified
in this domain, or they may use previously-identified evidence types in new ways. In
either case, at this stage what is relevant to
the model is the effect that the evidence has on the degree of groundedness: the extent
to which the material has been mutually understood, as reflected in the behavior of the
dialogue participants. One of the key elements of that behavior is when the dialogue par-
ticipants finish their discussion of the material at hand because it has been sufficiently
grounded - the material has reached its grounding criterion.
3.4 Grounding Criteria
In the Degrees of Grounding model, each element of material being discussed has a
grounding criterion, which is defined in terms of the degree of groundedness at which
the material is sufficiently grounded for the purposes of the immediate dialogue.
The grounding criterion of some material is often dependent on the material being
discussed. For example, the fax dialogue being discussed in Tables 3.1 to 3.4 is of
high importance - if each digit of the number is not transmitted correctly, the fax will
not arrive. For that reason, the participants in that dialogue ground the material to the
Agreed-Content+ degree.
However, consider an extension of the dialogue shown in Table 3.5. In line 1, the
doctor indicates the time the fax will be sent. Rather than ground the time by repeating
it back, however, the patient grounds it with a pair of acknowledgments and moves on to
the next topic of the dialogue - closing the dialogue - because the dialogue participants
are satisfied that the time being discussed is sufficiently grounded.
Line  Speaker  Utterance                     Evidence              Degrees
1     Doctor   I’ll send it after 4 o’clock  Submit                Accessible
2     Patient  great                         Acknowledge           Agreed-Signal
3     Doctor   okay, goodbye                 Acknowledge, Move On  Agreed-Signal+
Table 3.5: Example of Degrees of Groundedness
If the time were more important - if, for example, it was essential to the receivers
that the fax be sent in a specific time interval - then the patient might have repeated the
material back instead of simply acknowledging.
Various factors can affect the grounding criterion for an element of information. Domain
has an effect: for example, when the doctor asks over a phone “how are you doing?”, the
reply might need to be grounded more thoroughly than when the same question is asked
during a casual phone conversation between friends. Also, potential for misunderstand-
ing may play a part: if the phone connection is very bad or the background environment
very noisy, then the participants may need to ground material more thoroughly. If a dia-
logue is being held as part of a training exercise, the material may need to be grounded
more thoroughly for purposes of instruction. Finally, the personality of the dialogue
participants may play a part in how thoroughly that person feels they need to ground
material.
When adapting the Degrees of Grounding model to a new domain, the information
elements that will be discussed in the dialogue must be identified, and the grounding
criterion for each must be determined, along with the relevant features, such as those
mentioned above, that might vary the grounding criteria in that domain. Also one must
determine whether the grounding criteria should vary during the dialogue (if communi-
cation conditions change, for example) or between dialogues (for more or less rigorous
instruction per trainee, for example.)
3.5 Dialogue Management Algorithms
As described so far, the Degrees of Grounding model contains sets of types of evidence
of understanding and degrees of groundedness, and an approach towards describing
grounding criteria in terms of degrees of groundedness. But to model online dialogue
and be useful as a dialogue manager, a set of algorithms is needed. A high-level view
of the approach that the Degrees of Grounding model uses is outlined in Figure 3.1.
for a given utterance
identify the types of evidence of understanding it provides
identify the degree of groundedness of the relevant material
determine the type of evidence to provide in return
Figure 3.1: Dialogue Management Algorithms
First, the model needs a way of automatically identifying types of evidence of under-
standing from utterance and dialogue features. These features can be the ones described
at the end of Section 3.2: the speaker, the order of presentation of the material, and
semantic features, for example. Adapting the model to a new domain involves adding
any new types of evidence of understanding that are identified and determining which
approaches to identifying types of evidence of understanding are new to this domain:
for example, whether there are additional semantic cues or dialogue behaviors that would
change how a type of evidence is identified.
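As an illustration of this first step, the sketch below (the phrase lists, the record structure, and the rules are assumptions made for exposition, not the identification procedure actually used in the testbed systems) derives a type of evidence from the features just mentioned: the speaker, whether and by whom the material was presented before, and a few semantic cues.

# A minimal sketch (phrase lists and rules are illustrative assumptions) of
# identifying a type of evidence of understanding from utterance and dialogue
# features.
ACK_PHRASES = {"roger", "got that", "right"}
REPAIR_PHRASES = ("say again", "say that again")

def identify_evidence(speaker, material, text, record):
    """record maps material -> the speaker who first Submitted it."""
    lowered = text.lower().strip()
    if lowered.startswith(REPAIR_PHRASES):
        return "Request Repair"
    if lowered in ACK_PHRASES:
        return "Acknowledge"
    if material not in record:
        record[material] = speaker
        return "Submit"                    # first presentation of this material
    # Material presented again: by the original presenter it is a Resubmit,
    # by the other participant it is a Repeat Back.
    return "Resubmit" if speaker == record[material] else "Repeat Back"

record = {}
print(identify_evidence("Patient", "fax number",
                        "the fax number is 3 1 0 5 7 4 5 7 2 5", record))  # Submit
print(identify_evidence("Doctor", "fax number",
                        "3 1 0 5 7 4 5 7 2 5", record))                    # Repeat Back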
Second, the model needs a way of automatically identifying the degree of ground-
edness of each relevant element of information. This will typically be based on the
history of evidence of understanding presented for that material, although the entire his-
tory of evidence may not be required in all cases, and there may be levels that do not
require evidence: for example, material may begin at an Unknown degree of grounded-
ness. Alternatively, material may be Assumed to be mutually known because it is com-
mon knowledge among the dialogue participants, or because it was grounded by other
means such as through a set of written instructions presented during an exercise brief-
ing. Adapting the model to a new domain involves first determining the effects of any
new types of evidence and degrees added, and second determining if there are any new
patterns of evidence used in the domain that now need to be modeled.
Finally, the model needs a way of determining what type of evidence of under-
standing, if any, an automated system should provide during a dialogue. Typically this
involves monitoring the degree of groundedness of the set of information being dis-
cussed, comparing it to the grounding criteria of that material, and determining what
type of evidence of understanding will advance the material towards its grounding cri-
terion. Adapting the model to a new domain involves identifying the effects of new
types of evidence of understanding and degrees of groundedness and identifying any
new grounding behavior to be modeled. For example, a new domain may require that
the dialogue manager delay grounding based on world, task, or emotional criteria.
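As an illustration of this third step, the sketch below compares the current degree of groundedness of some material against its grounding criterion and, if the criterion has not been met, picks evidence intended to advance it. The numeric ordering of the degrees and the simple response policy are assumptions for illustration, not the policy used in the systems described later.

# A minimal sketch of choosing what evidence of understanding to provide next.
DEGREE_RANK = {"Unknown": 0, "Accessible": 1, "Agreed-Signal": 2,
               "Agreed-Signal+": 3, "Agreed-Content": 4, "Agreed-Content+": 5}

def choose_evidence(current_degree, grounding_criterion):
    """Return the type of evidence to provide next, or None if none is needed."""
    if DEGREE_RANK[current_degree] >= DEGREE_RANK[grounding_criterion]:
        return None                     # criterion already met: provide nothing
    if current_degree == "Unknown":
        return "Request"                # ask for material that is still missing
    if current_degree == "Accessible":
        return "Repeat Back"            # echo the content back explicitly
    return "Acknowledge"                # a light confirmation is sufficient

print(choose_evidence("Accessible", "Agreed-Content+"))    # -> Repeat Back
print(choose_evidence("Agreed-Content", "Agreed-Content")) # -> None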
The specifics of how this algorithm will be integrated into an online system and
how it influences task decisions will vary based on the domain and the system archi-
tecture. The dialogue management algorithm described in Section 7.2 for the Call for
Fire domain handles almost every utterance made by the system and can ground mul-
tiple items in one utterance, whereas the dialogue management algorithm described in
Section 8.2.4 for the Tactical Questioning domain handles only the production of one
optional grounding statement per utterance.
3.6 Discussion
This chapter has described a general outline of the components of the Degrees of Ground-
ing model: a set of types of evidence of understanding, a set of degrees of grounding,
grounding criteria for each item of information being discussed, and algorithms for dia-
logue management. Several of these components are adapted from concepts presented
in Clark and Schaefer’s model of grounding: for example, the concept of evidence of
understanding, and grounding criteria. However, the Degrees of Grounding model is
innovative in terms of describing how degrees of grounding are derived from evidence
of understanding, and how these degrees can then be used to make dialogue management
decisions.
It is difficult to describe the model in depth without making the model concrete in
terms of a specific domain. To do this, an appropriate domain must be identified, and
a discourse analysis of that domain must be carried out for purposes of computational
modeling. An appropriate testbed domain is described in Chapter 4, and a discourse
analysis is described in Chapter 5. Following that, Chapter 6 describes how the various
components of the Degrees of Grounding model will play out in that domain, in the con-
text of describing a dialogue between two humans. To fully test some of the algorithms
for dialogue management, an online interactive system must be developed and tested;
this is provided in Chapter 7. Finally, some issues related to adapting the model to a
new domain and experiments in the usefulness of this approach are provided in Chapter
8.
Chapter 4
Development Testbed: Call for Fire
Training
4.1 Rationale for Domain Choice
Chapter 3 provided a description of the Degrees of Grounding model in general terms,
but to make the components of the model more concrete and put the whole model on a
more empirical footing, it must be more fully developed in the context of an appropriate
domain.
A good domain for this will have several features. It will have a number of different
types of information to be grounded, which will produce a variety of dialogues that
show the process through which that information is grounded. Some of that information
to be grounded should be more important than others, and that importance should be
reflected in differences in the ways that dialogue participants talk about the information.
There should be various kinds of dialogue moves, to provide a rich set of evidence of
understanding. There should be some rationale to the extent to which material should be
grounded. However, there should be a limited amount of ambiguity: although eventually
the model should be able to adapt to ambiguous or uncertain domains, limiting ambiguity
and uncertainty makes the task of defining the initial model less problematic.
This chapter describes the domain of Calls for Fire training in a virtual environment,
which involves simulated radio dialogues to request artillery fires on enemy units in
a virtual world. This domain has the desirable features of a good initial domain as
described above. It has a set amount of information that needs to be transmitted, and that
information varies in importance - some of it is relatively trivial, while in the ultimate
Call for Fire task correctly transmitting certain types of information is literally a matter
of life or death. The task is carried out using a conventionally defined set of dialogue moves. There
are issues related to radio transmission conditions and training correctness that define
the extent to which material should be grounded. Although there is some ambiguity,
the dialogue protocols have been designed to minimize that ambiguity, and because the
language is so constrained, Automated Speech Recognition components can provide
results with relatively high accuracy compared to other domains.
This chapter begins by describing the concerns and constraints of military radio
communications in general, and Calls for Fire in particular. Section 4.2 describes basic
characteristics of military radio communication, and protocols that have been developed
to handle them. Section 4.3 describes Calls for Fire, focusing on the participants and the
conversations involved, and Section 4.4 describes the JFETS-UTM Call for Fire training
environment. A discussion of the kinds of information transmitted during the Call for
Fire is deferred until Chapter 5, which describes a discourse analysis of the information
transmitted and dialogue moves involved in the Call for Fire domain.
4.2 Military Radio Communication
Military radio communications are shaped by the potentially hostile environment in
which they occur. Although radios are useful tools for communication, they are inher-
ently insecure because their transmissions can be intercepted by enemy technicians.
Even if radio transmissions are encrypted, enemy analysts can use direction-finding
technology to locate the transmitters, and if direction-finding is not possible, enemy
analysts can nevertheless monitor traffic patterns.
Military organizations develop protocols for managing the risks associated with
radio communications. One such set of protocols was developed by the Com-
bined Communications-Electronics Board (CCEB) jointly established by the military
forces of the United States, United Kingdom, Canada, Australia, and New Zealand
[Boa01, oS85, oS84]. These protocols acknowledge that they cannot cover all situations,
and as such there may be variety in execution. Indeed, there are significant variations
in usage depending on the task and situation. However, they provide guidelines that
structure the dialogues and strongly influence the behavior in the corpus being studied
here.
CCEB protocols make the worst-case assumption that all radio signals will be inter-
cepted by the enemy, who will exploit the information immediately. The only sure
defense against this is radio silence, which must be maximized. Only the most neces-
sary radio transmissions should be made and they should be as short as possible - less
than 20 seconds in all cases.
The CCEB protocols specify procedures for turn-taking, corrections, and working in
noisy environments. These procedures are focused on ensuring that radio messages are
sent in as brief a form as possible, with a minimal amount of ambiguity and a reduced
need for corrections and repetitions, thereby minimizing the number and length of poten-
tially exploitable transmissions.
One procedure that distinguishes military radio communication from ordinary con-
versation is that the radio operator should write down messages before sending them,
and in some cases messages should be written down upon receipt as well. Another charac-
teristic feature is that radio communications use procedure words that can abbreviate
entire phrases. For example, the procedure word “roger” abbreviates the phrase “I have
received your last transmission satisfactorily.”[Boa01]
According to the protocols, a message is not considered delivered until the sender
receives a ‘receipt.’ This can be a reply with a procedure word such as ‘roger’ or ‘wilco,’
(procedure word for “I will comply”) or in some cases a followup transmission not
explicitly addressing the original message.
Another protocol handles repetitions. If a radio operator is unsure of what another
speaker has said, they can request a repetition by saying ‘say again’. They
can also specify a word or range of words to be repeated by using the phrase ‘say again
word after...’ or ‘say again from ... to ...’.
Also, a radio operator can request a confirmation by saying ’read back’. This is
shown in Table 4.1, which is taken from the CCEB instructions [Boa01]. The partici-
pants use call signs to identify themselves, in this case 4D and 78. In line 1 the operator
whose call sign is 4D gives information about a convoy’s arrival time, and includes a
‘read back’ procedure word. In line 2 operator 78 reads back the message, and in line 3
4D provides a confirmation to that with the procedure word ‘correct.’ If the information
had been read back incorrectly, 4D would have used the procedure word ‘wrong’ in line
3.
1 4D Seven eight this is four delta, read back, convoy has arrived,
time one six three zero zulu, over
2 78 This is seven eight, I read back, seven eight this is four delta
read back convoy has arrived time one six three zero zulu, over
3 4D This is four delta, correct, out
Table 4.1: Example Radio Dialogue
Table 4.1 also contains examples of the procedure words ‘over’ and ‘out.’ ‘Over’
indicates the end of a speaker’s turn, and that a reply is expected. ‘Out’ indicates the end
of a speaker’s turn, and that no reply is expected.
There are also procedures for authenticating one’s identity, for directing a shift in
radio frequency, for managing a network of transmitters, and more, but they are not
discussed here as they do not play a part in the current corpus.
The CCEB instructions recognize that procedures may vary based on the conditions
in which the radio calls are being made. Specifically, it distinguishes between radio
calls which may be noisy or difficult because of enemy action or a weak radio signal,
for example, and radio calls which are made in satisfactory conditions.
If the conditions are difficult, the radio operators may speak numbers individually,
saying “five zero” rather than “fifty” for example. They may also use the phonetic
alphabet for abbreviations, such as “echo tango alpha” rather than “ETA.” They may
also use the procedure word “words twice” to indicate that they will repeat each phrase
in the transmission.
Being in continuous communication can also change procedure. For example, after
an initial call sign identification (such as the first phrase in Table 4.1), further identifica-
tions are not needed if there are no other participants on the net and the two participants
are in continuous communication.
On all radio nets, one radio operator is the Net Control Station, and this is the par-
ticipant who determines whether the conditions are difficult or satisfactory, by using the
procedure words ‘use full procedure’, ‘use full call signs’ or ‘use abbreviated procedure’
and ‘use abbreviated call signs.’
4.3 Calls For Fire
Calls for Fire (CFF) are coordinated artillery attacks on an enemy [Arm01]. Several
teams work together to execute a CFF. A Forward Observer (FO) team locates an
enemy target and initiates the call. The FO team is made up of two or more soldiers,
usually with one soldier dedicated to spotting the enemy and another soldier dedicated to
handling communications, usually by radio. The FO radio operator communicates with
the Fire Direction Center (FDC) team, which decides whether to execute the attack,
and if so, which of the available fire assets to use. The FDC team is usually made up
of four personnel: a Fire Support Officer to approve and coordinate the attacks, two
soldiers to support the Fire Support Officer, and a radio operator. Once the FDC team
has decided upon the type of artillery fire to send, the FDC team communicates with
the appropriate firing unit, which sends the fire. The FO team observes the results of
the fire, and then may request a repetition or positional adjustment, or may end the fire
mission.
Note that there are many conversations occurring in the FO and FDC teams. For
example, in the FO team, the soldier observing the enemy will talk to the soldier using
the radio. In the FDC team, the soldier using the radio will talk to the Fire Support
Officer, and probably others. These conversations generally do not occur on the radio,
although they influence what is said on the radio: for example, the FO observer may sug-
gest a correction if they hear the FO radio operator give incorrect information. However,
this research focuses on the radio conversations between the radio operator for the FO
team and the radio operator for the FDC team. These speakers are henceforth referred
to as simply FO and FDC.
In general the procedures described in section 4.2 are followed, with some minor
variations. For example, a ‘read back’ is assumed for all turns transmitting information
about the CFF target; however, they are typically not followed by a ‘correct’ or ‘wrong’
procedure word, although they are sometimes followed by a ‘roger.’ [oS84, Arm01]
4.4 CFF Training in JFETS-UTM
Providing real-world training for Calls for Fire is expensive in terms of personnel and
equipment required, so the U.S. Army has developed virtual immersive training facilities
such as JFETS [Gou05]. In training situations such as JFETS’s UTM scenario, a team of
one, two, or more Forward Observers enter an environment designed to resemble a room
in a war-torn Middle Eastern city, with one or more walls containing a rear-projected
computer screen resembling a window, as shown in Figure 4.1. These windows have a
view to a virtual city, in which the FOs locate enemy targets, and use a simulated radio
to communicate with the FDC. In this training environment the FDC is portrayed by
one or two human trainers who work together to communicate with the FOs and send
simulated artillery fires into the virtual world, for the FOs to see and react to.
Figure 4.1: A Human in the JFETS-UTM Environment
The dialogues that make up the corpus being studied here were collected during training
sessions in the JFETS-UTM training facility. CFFs made in a training environment
have slightly different characteristics from those made in the field. Trainers act as FDCs
and communicate with the FO teams being trained. The trainers may use the radio to
correct the FO. When correcting the FO, they may do so ‘in character’ as an FDC in the
world would, or ‘out of character,’ explicitly acting as a trainer. Alternately, the trainer
may take note of an error in protocol made by the FO and handle it during an after action
review.
The model of grounding developed for this research was initially implemented as
part of the Radiobot-CFF/IOTA project, whose purpose was to assist the operation of
the JFETS-UTM environment by handling routine radio calls and making firing deci-
sions, with the optional intervention of the human operator. Further details about the
implementation of Radiobot-CFF/IOTA system are provided in Chapter 7.
Developing the Degrees of Grounding model in the JFETS-UTM training domain
required a discourse analysis of the domain. This discourse analysis is described in
Chapter 5. A description of the development of the Degrees of Grounding model for
modeling dialogue between two humans follows in Chapter 6.
Chapter 5
Discourse Analysis of Development
Corpus
5.1 Overview
A discourse analysis of a corpus from the Call for Fire development testbed provided a
context for the development of the Degrees of Grounding model. This discourse analysis
had two components: an analysis of the phases of the dialogue, described in Section
5.2, which was useful in categorizing information types, and an enumeration of the
information types common to Calls for Fire, described in Section 5.3, which describes
the types of information that the participants of a Call for Fire transmit to one another.
5.2 Dialogue Phases
Table 5.1 shows a CFF dialogue that was conducted between two humans in the JFETS-
UTM training environment.
We have analyzed dialogue in the CFF training domain by reviewing field man-
ual documentation and transcripts from interactions between human participants in the
JFETS-UTM training environment. The CFF domain can generally be divided into sev-
eral phases, in which various kinds of dialogue moves are made.
First there is an establishing phase, in which the participants identify themselves and
their circumstances. This is shown in utterances 1-2 in Table 5.1, in which the
1 FO steel one niner this is gator niner one
adjust fire polar over
2 FDC gator nine one this is steel one nine adjust
fire polar out
3 FO direction five niner four zero distance four
eight zero over
4 FDC direction five nine four zero distance four
eight zero out
5 FO one b m p in the open i c m in effect over
6 FDC one b m p in the open i c m in effect out
7 FDC message to observer kilo alpha high
explosive four rounds adjust fire target
number alpha bravo one zero zero zero
over
8 FO m t o kilo alpha four rounds target number
alpha bravo one out
9 FDC shot over
10 FO shot out
11 FDC splash over
12 FO splash out
13 FO right five zero fire for effect out over
14 FDC right five zero fire for effect out
15 FDC shot over
16 FO shot out
17 FDC rounds complete over
18 FO rounds complete out
19 FO end of mission one b m p suppressed zero
casualties over
20 FDC end of mission one b m p suppressed zero
casualties out
Table 5.1: Example Call for Fire Dialogue
Forward Observers identify themselves by call sign. In other cases, they may establish
the strength of their radio signal by giving and receiving a Radio Check, they may
provide their location by giving Observer Coordinates, or they may settle on the location
of Known Points from which to later derive target locations.
In the targeting phase the Forward Observer provides information about the CFF,
including the kind of fire they are requesting, the coordinates of the target, and the kind
of target being fired upon. This is shown in lines 1-7 of Figure 5.1; the targeting phase
overlaps with the establishing phase, as the first two lines also specify a method of fire
(“adjust fire”) and a method of targeting (“polar”.) In utterance 3 the FO gives target
coordinates, and in utterance 5 the FO identifies the target as a BMP (a type of light
tank) and requests ICM rounds (“improved conventional munitions.”)
In the delivery phase of a CFF (utterances 7-12 in Table 5.1), after the FDC decides
what kind of fire they will send, they inform the FO in a message to observer (MTO) as in
utterance 7. This includes the units that will fire (“kilo alpha”), the kind of ammunition
(“high explosive”), the number of rounds and method of fire (“4 rounds adjust fire”),
and the target number (“alpha bravo one zero zero zero”). CFFs are requests rather than
orders, and they may be denied in full or in part. In this example, the FO’s request for
ICM rounds was denied in favor of High Explosive rounds. Next the FDC informs the
FO when the fire mission has been shot, as in utterance 9, and when the fire is about to
land, as in utterance 11. Each of these are confirmed by the FO.
In the adjustment phase (utterances 13-20 in Table 5.1), the FO regains dialogue ini-
tiative. Depending on the observed results, the FO may request that the fire be repeated
with an adjust in location or method of fire. In utterance 13 the FO requests that the shot
be re-sent to a location 50 meters to the right of the previous shot as a “fire for effect”
all-out bombardment rather than an “adjust fire” targeting fire. This is followed by the
abbreviated FDC-initiated phase of utterances 15-18. In utterance 19 the FO ends the
mission, describing the results and number of casualties.
Besides the behavior shown in Table 5.1, at any turn either participant may request
or provide an intelligence report describing the behavior of enemy units in the world,
or a status report describing the state of the FO team or of the artillery units. Further-
more, after receiving an MTO the FO may immediately begin another fire mission and
thus have multiple missions active; subsequent adjusts are disambiguated with the target
numbers assigned during the MTOs.
steel one nine this is gator niner one
ID-FDC: steel one niner
ID-FO: gator niner one
adjust fire polar
WO-MOF: adjust fire
WO-MOL: polar
Figure 5.1: Example Dialogue Moves and Parameters for an Utterance
5.3 Information Types
The types of information exchanged in a Call for Fire can be identified by Dialogue
Move and Parameter, to identify the kind of information that the speaker is providing.
The first part of the utterance “steel one nine this is gator niner one” is an Identification
dialogue move, with the parameters “fdc id” and “fo id”. Similarly, the second part of the
utterance, “adjust fire polar” is a Warning Order dialogue move, made up of parameters
“method of fire” and “method of location”. The word “over” in this example is a
procedure word used for managing turn-taking.
In this research, the grounded material is at the combined dialogue move and param-
eter level. For example, in Figure 5.1, which shows the Dialogue Moves and Parameters
for utterance 1 in Table 5.1, the parameters are the FDC’s call sign, the FO’s call sign,
the method of fire, and the method of location. This is part of the information that allows
the FDC to decide where and how to send the artillery fire.
Table 5.2 shows the various dialogue move and parameter combinations extracted
by our analysis, focusing on those relevant to the establishing phase of the CFF. Various
moves identify or request the status of the Forward Observer or Fire Direction Center,
refer to scenario Intelligence, or provide the Forward Observer’s Observer Coordinates.
Others request a given piece of information or make a repair request.
Table 5.3 shows the dialogue move parameters relevant to targeting information.
Parameters may be relevant to the Warning Order, or description of fire. Others describe
Abbreviation Description
ID-FO Identification of Forward Observer
ID-FDC Identification of Fire Direction Center
STATUS-FO Status of Forward Observer
STATUS-FDC Status of Fire Direction Center
INTEL Intelligence report
OBCO Observer Coordinates
REQ Making a Request - the parameter to this can be any
other parameter
SAYAGAIN Making a Repair Request
Table 5.2: Parameters Related to General and Establishing Dialogue Moves
Abbreviation Description
WO-MOF The method of fire - an initial ’adjust fire’ or full ’fire
for effect’ bombardment
WO-MOE The method of engaging the target - i.e. what kind of
artillery to send
WO-MOL The method of location - i.e. grid coordinates, polar,
or other
KP Referring to a ’known point’ to fire at
TL-GR The Grid coordinate of a target location
TL-DIR The direction component of a Polar target location
TL-DIS The distance component of a Polar target location
TL-ATT The attitude component of a target location
TD-TYPE The type of enemy being fired at
TD-NUM The number of enemy units
TD-DESC Further description of the enemy - kind of cover, etc.
Table 5.3: Parameters Related to Targeting Dialogue Moves
the location of the target either in relation to a known point, or by providing Grid or
Polar coordinates. Others describe the kind of target being fired at.
Table 5.4 describes the information relevant to the delivery of the fire mission. This
may include information given during the Message to Observer (MTO), and notifications
about when the fire has been shot, and when it should land.
Finally, Table 5.5 describes the kinds of dialogue moves that occur after a fire is
delivered: how the mission can be repeated and the target location moved, and how
the mission can be ended with a report of the damage done. Note that Target Number
Abbreviation Description
MTO-BAT The battery that will be firing
MTO-NUM The number of rounds that will be fired
MTO-TYPE The type of rounds that will be fired
TN The target number of the fire mission
SHOT Notification or recognition that the rounds have started
firing
RC Notification or recognition that the rounds have ended
firing
SPLASH Notification or recognition that the rounds should land
momentarily
Table 5.4: Parameters Related to Delivery Dialogue Moves
Abbreviation Description
TL-AD An add/drop adjustment
TL-LR A left/right adjustment
COMMAND-REPEAT Requesting a repeat of the fire
EOM-NUM End of Mission - number of casualties
EOM-TYPE End of Mission - type of casualties
EOM-BDA End of Mission - extent of casualties
TN The target number of the fire mission
Table 5.5: Parameters Related to Adjust Dialogue Moves
is used in both targeting-related utterances as well as in delivery-related utterances. In
both cases, target number is used to disambiguate the fire mission being referred to.
5.4 Discussion
A look back at the dialogue example in Table 5.1 will show the explicit importance of
grounding to training Calls for Fire. Every other turn in that dialogue involves repeating
back some information from the turn before, with some variations - for example, in
turn 8 the FO abbreviates ‘MTO’, does not repeat back the battery and munitions (‘kilo
alpha four rounds’) and abbreviates the target number. The central task of this domain is
transmitting information while ensuring that it is appropriately confirmed. Furthermore,
there is a specific set of information elements being discussed, which can
be tracked, as shown in Tables 5.2 to 5.5. Because of this, the training CFF domain was
a good candidate for initial development of the model of Degrees of Grounding, which
will be described in the next chapter.
Chapter 6
CFF Dialogue Modeling with Degrees
of Grounding
6.1 Overview
Chapter 3 described the Degrees of Grounding model in general. However, to properly
develop the model, an appropriate domain had to be identified as in Chapter 4, and a
discourse analysis of that domain had to be performed as in Chapter 5. This chapter
proceeds to develop the Degrees of Grounding model in the context of the Call for Fire
(CFF) training domain.
The Degrees of Grounding model was developed through an analysis of transcripts
of sessions between human trainees in the JFETS-UTM training environment described
in Section 4.4. After the discourse analysis was performed and a state-of-the-art dia-
logue manager (that did not explicitly model degrees of groundedness) was built to
explore the domain [RLR
+
06], a close review of transcripts between human trainees
was used to develop the initial model, which was then slightly modified after the exper-
iments described in Section 7.5 and Chapter 7.
This chapter begins by describing the data structures involved in the model: repre-
sentations of the level at which material is grounded, and the structure of those repre-
sentations. Following that, section 6.3 enumerates and provides examples of the types
of evidence of understanding that the dialogue participants provide, and Section 6.4
describes how degrees of groundedness arise from combinations of evidence of under-
standing. Section 6.5 describes the grounding criteria. Finally, Section 6.6 shows how
the dialogue components work together for dialogue modeling, by providing a turn-by-
turn example of how the components describe the state of grounding in a conversation
between two humans.
6.2 Common Ground Units
Generally, the common ground between dialogue participants represents a large set of
knowledge about the world, domain, and scenario. For the purposes of this application,
the important elements of the common ground are those related to the task at hand.
Evidence in the Degrees of Grounding model is tracked per Common Ground Unit
(CGU) [NT99]. An example of a CGU in the Call for Fire domain is given in Figure
6.1. Material under discussion is disambiguated by several identifying components of
the CGU: in this domain this is the dialogue move, the parameter, the mission number,
and the adjust number. In the CFF domain, the mission number is assigned by the Fire
Direction Center after the firing battery and available munitions have been identified:
until that time, the mission does not have a target number and is considered to be under
development. Because the actual value of the parameter is not used for identification,
material can be identified by participants who may not yet agree on its value.
The collection of CGUs is called the Common Ground Record. The Common
Ground is defined as the subset of the Common Ground Record whose CGUs have
reached the grounding criteria.
information:
dialogue move: target location
parameter: direction
value: 5940
mission number: to be determined
adjust number: 0
evidence history:
submit-G91, repeat_back-S19
degree of groundedness: agreed-content
grounding criteria met: true
Figure 6.1: Example Common Ground Unit
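For concreteness, the CGU of Figure 6.1 can be written as a small data structure; the Python dataclass below is only an illustrative rendering of the figure (the class and field names are assumptions), not the representation used in the implemented system.

# A minimal sketch of the Common Ground Unit of Figure 6.1 as a dataclass.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class CommonGroundUnit:
    dialogue_move: str                    # e.g. "target location"
    parameter: str                        # e.g. "direction"
    value: str                            # e.g. "5940"
    mission_number: Optional[str] = None  # assigned later by the FDC
    adjust_number: int = 0
    evidence_history: List[str] = field(default_factory=list)
    degree_of_groundedness: str = "Unknown"
    grounding_criterion_met: bool = False

# The unit shown in Figure 6.1:
cgu = CommonGroundUnit("target location", "direction", "5940",
                       evidence_history=["submit-G91", "repeat_back-S19"],
                       degree_of_groundedness="agreed-content",
                       grounding_criterion_met=True)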
6.3 Evidence of Understanding
As described in Section 3.2, evidence of understanding is a phenomenon that provides
an indication of the extent to which material under discussion is mutually understood.
The full set of types of evidence of understanding is shown in Table 6.1 and discussed
in the following sections. Unlike Clark and Schaefer’s model, the Degrees of Ground-
ing model does not tie the degree of groundedness of material directly to the strength
of a particular type of evidence; rather, the degree of groundedness is produced from
sequences of evidence of understanding, as described in Section 6.4.
6.3.1 Submit
A Submit type of evidence is provided when material is introduced into the common
ground for the first time. The Submit type of evidence is derived from the Presentation
phase of [CS89].
An example of a Submit is given in line 1 of Table 6.2: “direction 6120” is informa-
tion that had not yet been mentioned and has no assumed values.
Dialogue systems that do not specifically model grounding generally assume that
material is grounded when it is first Submitted unless there is evidence to the contrary.
Evidence          Description
Submit            The initial presentation of material
Repeat Back       Participant B presents material that participant A has previously Submitted
Resubmit          Participant A presents material that participant A has previously submitted
Acknowledge       A participant makes a general utterance of understanding
Request Repair    A participant makes an utterance asking the other participant to resubmit material
Use               Participant B makes a grounding move that demonstrates some understanding of material previously introduced by speaker A
Move On           Participant B makes a grounding move that takes the next step in the task at hand
Lack of Response  A given participant makes no reply
Table 6.1: Evidence of Understanding
Line  ID   Utterance                       Evidence
1     G91  direction 6120 over             Submit
2     S19  direction 6120 out              Repeat Back
3     G91  correction direction 6210 over  Resubmit
Table 6.2: Example of Evidence in Dialogue
6.3.2 Repeat Back
A Repeat Back type of evidence is provided when material that was Submitted by
another dialogue participant is presented back to them, often as part of an explicit con-
firmation.
The Repeat Back evidence is related to the “Display” evidence of Clark and Schaefer
[CS89]; however, here it is renamed to indicate that it pertains to verbal repetitions, rather
than general displays which may be in other modalities, such as visual. In fact, there is
evidence that grounding behavior related to visual feedback is different from that related
to auditory feedback [CB91, TG08]. An example is given in line 2 of Table 6.2: the
“direction 6120” information given in line 1 is Repeated Back as part of a confirmation.
6.3.3 Resubmit
A Resubmit type of evidence is provided when material that has already been Submit-
ted by a dialogue participant is presented again as part of a self- or other-correction.
This is an example of what [CB91] call negative evidence, which indicate a lack of
mutual belief. An example is shown in Table 6.2; the direction information which was
Submitted in turn 1 and Repeated Back in turn 2 is Resubmitted in turn 3.
6.3.4 Acknowledge
An Acknowledge type of evidence is a general statement of agreement that does not
specifically address the content of the material. Acknowledges are identified by seman-
tic interpretation. Acknowledges are a part of [CS89]’s set of types of evidence of
understanding. Table 6.3 contains an example: in line 1 the speaker G91 Submits infor-
mation about the target’s status, which is then Acknowledged by speaker S19 in line 2.
Line  ID   Utterance                             Evidence
1     G91  end of mission target destroyed over  Submit
2     S19  roger                                 Acknowledge
Table 6.3: Example of an Acknowledgment
6.3.5 Request Repair
A Request Repair type of evidence is a statement that indicates that the speaker needs
to have the material Resubmitted by the other participant. Request Repairs are identified
by semantic interpretation. Request Repairs are another example of negative evidence
[CB91]. Table 6.4 gives an example: in line 1 G91 submits a map grid coordinate, and
in line 2 S19 asks that the other speaker “say again” that grid coordinate, which is a
Request for Repair.
Line  ID   Utterance            Evidence
1     G91  grid 5843948         Submit
2     S19  say again grid over  Request Repair
Table 6.4: Example of a Request Repair
6.3.6 Move On
A Move On type of evidence is provided when a participant decides to proceed to the
next step of the task at hand. This requires that the given task have a set of well-defined
steps, and that the step being Moved On from needs to be grounded before the next
step can be discussed. Move Ons are identified based on a model of the task at hand.
Move Ons are related to [CS89]’s “Initiation of the relevant next contribution,” although
Clark and Schaefer do not specify that “next contributions” should be dependent on
sufficiently grounding the previous step.
A Move On provides evidence because a cooperative dialogue participant would
typically not move on to the next step of the task under such conditions unless they felt
that the previous step was sufficiently grounded.
Table 6.5 shows an example of a Move On. In line 1, G91 indicates that the kind of
artillery fire they want is a “fire for effect”; this is Repeated Back in line 2. G91 then
Submits grid information related to the target location. The task specification of Calls
for Fire indicates that fire requests should proceed in several steps: after a Warning Order
is established, a Target Location should be given, followed by a Target Description. By
moving on to the step in which a Target Location is provided, G91 tacitly indicates that
the step in which a Warning Order is established has been dealt with to their satisfaction.
Line  ID   Utterance             Evidence
1     G91  fire for effect over  Submit
2     S19  fire for effect out   Repeat Back
3     G91  grid 45183658         Submit, Move On
Table 6.5: Example of a Move On
Line  ID   Utterance                                                Evidence
1     S19  message to observer kilo 2 rounds AB0001 over            Submit
2     G91  mike tango oscar kilo 2 rounds target number AB0001 out  Repeat Back
3     S19  shot over                                                Submit
Table 6.6: Example of a non-Move On
Not all typical sequences provide Move On evidence. In the example in Table 6.6,
in line 1 S19 submits a “message to observer” indicating the kind of fire that is being
delivered, which is followed in line 2 by a confirmation by G91. S19 then proceeds to
the next step of the task by indicating in line 3 that the artillery has been fired. Line
3, however, is not a Move On because although it is typically the next step in the task,
providing that information is not dependent on fully grounding the material being dis-
cussed in line 2 - in fact, line 3 will be provided when the artillery has been fired, and
not based on any other decision by S19.
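The distinction between Tables 6.5 and 6.6 can be captured with a very small task model. In the sketch below, the step list and the set of grounding-dependent steps are illustrative assumptions: introducing material for the next step counts as Move On evidence only when that step's initiation presupposes that the previous step was sufficiently grounded.

# A minimal sketch of identifying Move On evidence from a task model.
TASK_STEPS = ["warning order", "target location", "target description",
              "message to observer", "shot"]
# Steps whose initiation signals satisfaction with the step before them; "shot"
# is excluded because it is sent when the artillery fires, not as a grounding
# decision (compare Tables 6.5 and 6.6).
GROUNDING_DEPENDENT_STEPS = {"target location", "target description"}

def is_move_on(previous_step, new_step):
    """True if introducing material for new_step provides Move On evidence."""
    if previous_step not in TASK_STEPS or new_step not in TASK_STEPS:
        return False
    is_next = TASK_STEPS.index(new_step) == TASK_STEPS.index(previous_step) + 1
    return is_next and new_step in GROUNDING_DEPENDENT_STEPS

print(is_move_on("warning order", "target location"))    # True  (as in Table 6.5)
print(is_move_on("message to observer", "shot"))          # False (as in Table 6.6)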
6.3.7 Use
A Use type of evidence is provided when a participant presents an utterance that indi-
cates, through its semantics, that a previous utterance was understood. Uses are related
to [CS89]’s “Demonstration”. In the Radiobot-CFF corpus, most Uses are replies to a
request for information, such as in Table 6.7, where S19’s request for a target description
in line 1 is answered with a target description, in line 2.
Line  ID   Utterance                                            Evidence
1     S19  s2 wants to know whats the target description over   Submit
2     G91  zsu over                                             Submit, Use
Table 6.7: Example of a Use
Another example of Use is shown in Table 6.8, in which S19 is providing an intel-
ligence report in line 1 regarding an enemy target, and line 2 replies with a statement
asking whether the target is a vehicle. The utterance in line 2 uses information provided
in line 1.
Line  ID   Utterance                                                     Evidence
1     S19  again it should have rather large antennas affixed to it uh
           they are still sending out signals at the time                Submit
2     G91  this is some kind of vehicle over                             Submit, Use
Table 6.8: Example of a Use
6.3.8 Lack of Response
A Lack of Response type of evidence is provided when neither participant speaks for
a given length of time. Identifying a Lack of Response type of evidence involves deter-
mining how much silence will be significant for signalling understanding or lack of
understanding.
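A minimal sketch of this decision is given below; the threshold value is an assumption chosen only so that the examples in this section behave as described, and in practice it would be tuned to the domain and corpus.

SILENCE_THRESHOLD_SECONDS = 8.0   # assumed value; would be tuned per domain

def silence_evidence(seconds_of_silence):
    """Return 'Lack of Response' once a pause is long enough to count as
    evidence, otherwise None (the pause is not yet significant)."""
    if seconds_of_silence >= SILENCE_THRESHOLD_SECONDS:
        return "Lack of Response"
    return None

print(silence_evidence(12.0))   # long pause, as in Table 6.9  -> Lack of Response
print(silence_evidence(2.3))    # short pause between turns    -> None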
In the example shown in Table 6.9, G91 submits an identifying utterance to see if
S19 is available. After 12 seconds, G91 has heard nothing back; this is negative evidence
of grounding, so in line 3 G91 resubmits the utterance.
Line  ID   Utterance                Evidence
1     G91  S 1 9 this is G 9 1      Submit
2          (12 seconds of silence)  Lack of Response
3     G91  S 1 9 this is G 9 1      Resubmit
Table 6.9: Example of a Lack of Response
A Lack of Response can also be an indication of positive grounding, as in Table 6.10.
In line 1, G91 submits information about a target, which in line 2 is repeated back. Line
3 indicates a period of silence, in which neither speaker took the opportunity to request
a repair or otherwise indicate their disapproval with the state of the groundedness of the
material. In that sense, the silence of line 3 is positive evidence of understanding.
Line  ID   Utterance                Evidence
1     G91  b m p in the open over   Submit
2     S19  b m p in the open out    Repeat Back
3          (10 seconds of silence)  Lack of Response
Table 6.10: Example of a Lack of Response
6.4 Degrees of Groundedness
As discussed in Section 3.3, degrees of groundedness represent the extent to which
material is grounded in a dialogue, given a sequence of evidence of understanding.
Table 6.11 shows the significant degrees identified during a review of sessions between
humans, as well as the definition or identifying pattern of evidence. These degrees are
Degree Pattern/Identifier
Unknown not yet introduced
Misunderstood (anything,Request Repair)
Unacknowledged (Submit, Lack of Response)
Accessible (Submit) or (anything,Resubmit)
Agreed-Signal (Submit, Acknowledgment)
Agreed-Signal+ (Submit, Acknowledgment, other)
Agreed-Content (Submit, Repeat Back)
Agreed-Content+ (Submit, Repeat Back, other)
Assumed grounded by other means
Table 6.11: Degrees of Groundedness
shown from Unknown, which is least grounded, to Assumed, which is always com-
pletely grounded. Most degrees are identified by patterns of evidence. For example, a
CGU is Misunderstood if the latest item of evidence provided is a Request Repair, and
a CGU is Unacknowledged if it is Submitted, after which there is a Lack of Response.
The relative strength of the Misunderstood degree of groundedness is possibly unique to
this domain. In a more general domain, it is possible that Misunderstood is stronger than
Accessible, because knowing that a topic is under discussion but having some confusion
about the specifics of that topic is more useful than one participant having mentioned
that topic without even knowing if the other participant is listening. However, in the
CFF training domain this is not so: there is information that is sufficiently grounded
just by being mentioned, such as “Shot” warnings when done during a training mission
with less strictness, but those warnings would need to be further dealt with if there was
a misunderstanding about them.
The degree of groundedness models the extent to which the participants have
achieved mutual understanding regarding a given material. The degree of grounded-
ness can be used to compute how much (if any) additional evidence is needed to reach the
grounding criterion, or “criterion sufficient for current purposes” as defined by [CS89].
This computation can be used in dialogue management to help select a next utterance.
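The patterns of Table 6.11 can be read as a small dispatch procedure over the evidence history of a CGU. The sketch below is one possible rendering (the function name, the dispatch order, and the treatment of Assumed material are assumptions), not the implemented algorithm.

def degree_from_history(history, assumed=False):
    """Map an ordered evidence history to a degree, following Table 6.11."""
    if assumed:
        return "Assumed"                          # grounded by other means
    if not history:
        return "Unknown"                          # not yet introduced
    if history[-1] == "Request Repair":
        return "Misunderstood"                    # (anything, Request Repair)
    if history[-1] == "Resubmit":
        return "Accessible"                       # (anything, Resubmit)
    if history == ["Submit"]:
        return "Accessible"
    if history[:2] == ["Submit", "Lack of Response"]:
        return "Unacknowledged"
    if history[:2] == ["Submit", "Acknowledge"]:
        return "Agreed-Signal+" if len(history) > 2 else "Agreed-Signal"
    if history[:2] == ["Submit", "Repeat Back"]:
        return "Agreed-Content+" if len(history) > 2 else "Agreed-Content"
    return "Unknown"

print(degree_from_history(["Submit", "Repeat Back"]))             # Agreed-Content
print(degree_from_history(["Submit", "Repeat Back", "Move On"]))  # Agreed-Content+
print(degree_from_history(["Submit", "Lack of Response"]))        # Unacknowledged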
Degrees of groundedness are defined such that material has a given degree before and
after any sequence of evidence given. For example, in Table 6.12 the target description
given in line 1 has a certain degree of groundedness before it is Submitted (Unknown),
another degree after it is Submitted (Accessible), another degree after it is Repeated
Back (Agreed-Signal), and another degree after the Lack of Response (Unacknowl-
edged).
Line  ID   Utterance                Evidence          Degree
1     G91  b m p in the open over   Submit            Accessible
2     S19  b m p in the open out    Repeat Back       Agreed-Signal
3          (10 seconds of silence)  Lack of Response  Unacknowledged
Table 6.12: Example of a Lack of Response
6.4.1 Unknown
If the degree is Unknown, then by definition the participants have no immediate evidence
of how grounded the material is.
It is useful to model this degree of groundedness if one of the participants can reason
that a given unit of information is required but not known. For example, in a Call for
Fire domain a Warning Order, a Target Location, and a Target Description are required
before a fire mission can be delivered. If only the Warning Order and Target Location
have been grounded but the Target Description is Unknown, then the agent can represent
exactly why the fire mission should not yet be delivered.
6.4.2 Misunderstood
If Request Repair evidence has been given for a material, then its degree of grounded-
ness is Misunderstood, as shown in line 2 of Table 6.13.
#  ID   Utterance                                                      Evidence        Degree
1  FO   grid four five four four three six four six over               Submit          Accessible
2  FDC  say again grid over                                            Request Repair  Misunderstood
3  FO   I say again grid four five four four three six four six over   Resubmit        Accessible
Table 6.13: Examples of Degrees of Groundedness
At the Misunderstood degree, the participants know that the material is being dis-
cussed, and that they are in the process of correcting it, but there has been some evidence
that one of the participants does not understand the material.
6.4.3 Unacknowledged
If the degree is Unacknowledged, then the participant still has no firm grasp on whether
the material is grounded, although there is the possibility that because it was Presented,
the other participant may have understood it but not been able to generate a reply.
If a given material is Presented but an expected reply has not been made, then it is
Unacknowledged, as shown in line 2 of Table 6.14.
6.4.4 Accessible
At the Accessible degree, the participants know that if the other participant is attending
to the conversation, and is able to access and interpret the material, it may be grounded.
This degree is reached after material has been submitted or resubmitted.
#  ID   Utterance                                  Evidence          Degree
1  FO   steel one nine, gator nine one over        Submit            Accessible
2       (27 seconds pass)                          Lack of Response  Unacknowledged
3  FO   steel one nine, gator nine one over        Resubmit          Resubmitted
4  FDC  gator nine one this is steel one nine out  Repeat Back       Agreed-Content
5  FO   fire for effect over                       Move On           Agreed-Content+
Table 6.14: Example of Unacknowledged and Agreed-Content Degrees
6.4.5 Agreed-Signal and Agreed-Signal+
A Presentation followed by an Acknowledgment results in an Agreed-Signal degree of
groundedness, as in the example given in turn 2 of Table 6.15. If an additional acknowl-
edgment or use is given, this increases the Degree to Agreed-Signal+ as shown in Table
6.16. If material is grounded to the Agreed-Signal or Agreed-Signal+ degree, then the
participants have agreed on the general topic of the material.
#  ID   Utterance                                                          Evidence     Degree
1  FDC  reconnaissance shows ah heavy presence of z s us and ah b m ps in
        the area over                                                      Submit       Accessible
2  FO   ah roger                                                           Acknowledge  Agreed-Signal
Table 6.15: Example of Agreed-Signal Degree
# ID Utterance Evidence Degree
1 FO direction 6 1 2 0 over Submit Accessible
2 FDC direction 6 1 2 0 out Repeat Back Agreed-Signal
3 FO roger out Acknowledge Agreed-Signal+
Table 6.16: Example of Agreed-Signal+ Degree
6.4.6 Agreed-Content and Agreed-Content+
If material is grounded to the Agreed-Content degree, then the participants have agreed
on the specific topic of the material. An example of this is shown in line 4 of Table
6.14, where the FDC repeats back the call signs, bringing the degree of groundedness to
Agreed-Content.
If further evidence is provided, then the degree increases to Agreed-Content+. In
line 5 of Table 6.14, the FO moves on to the next step in the task, which increases the
degree of groundedness to Agreed-Content+.
6.4.7 Assumed
Finally, certain material is Assumed to be already grounded: for example, the fact that
the method of location is ’grid’ unless otherwise stated; and the observer coordinates in the
scenario unless otherwise stated. Evidence for Assumed items is related to the ’Com-
munity Membership’ basis for mutual knowledge, as described in Clark and Marshall
[CM81].
Assumed material is made up of two kinds of information: conventional information,
which is known because of the conventions of the task or domain, and information which
has been grounded previously.
6.5 Grounding Criteria
Each type of material being modeled has a degree of groundedness that defines its
grounding criteria. There are some variations in the grounding criteria. First of all,
the grounding criterion for Identification is specified by domain manuals [Boa01]: an
initial Identification must achieve a high degree of grounding, but immediately after-
wards need not be presented or confirmed and can be assumed. More importantly, the
variation in grounding observed in the data depends on the situation. Domain
manuals [Boa01] also specify that worsening noise conditions may require higher levels
of groundedness. Additionally, because these dialogues occur in a training environment,
there may be variable levels of strictness in terms of achieving correct performance
immediately, as opposed to allowing occasional errors in grounding behavior, which
would be corrected during an After Action Review.
Table 6.17 lists the various types of information used in the CFF domain, and a
sample grounding criterion for each of these types of material.
Element Grounding Criterion
fdc id Agreed-Content / Accessible
fo id Agreed-Content / Accessible
method of fire Agreed-Content
method of control Agreed-Content
method of engagement Agreed-Content
method of location Agreed-Content
grid location Agreed-Content
direction Agreed-Content
distance Agreed-Content
known point Agreed-Content
td-target type Agreed-Content+
td-number of enemies Agreed-Content+
td-disposition Agreed-Content+
command Agreed-Signal
target number Accessible
mto-battery Accessible
mto-number Accessible
shot Accessible
splash Accessible
rc Accessible
left right Agreed-Content+
left right adjust Agreed-Content+
add drop Agreed-Content+
add drop adjust Agreed-Content+
eom-num Accessible
eom-type Agreed-Content
eom-bda Agreed-Content
Table 6.17: Sample Set of Grounding Criteria
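As a sketch of how such a table can be used, the fragment below checks whether a CGU has met its grounding criterion; the dictionary is a small excerpt of Table 6.17 and the numeric ranking follows the least-to-most-grounded ordering of Table 6.11, both reproduced here only for illustration.

GROUNDING_CRITERIA = {            # excerpt of Table 6.17
    "grid location": "Agreed-Content",
    "td-target type": "Agreed-Content+",
    "shot": "Accessible",
    "eom-num": "Accessible",
}
DEGREE_RANK = {"Unknown": 0, "Misunderstood": 1, "Unacknowledged": 2,
               "Accessible": 3, "Agreed-Signal": 4, "Agreed-Signal+": 5,
               "Agreed-Content": 6, "Agreed-Content+": 7, "Assumed": 8}

def criterion_met(element, degree):
    """True if the element's degree of groundedness reaches its criterion."""
    return DEGREE_RANK[degree] >= DEGREE_RANK[GROUNDING_CRITERIA[element]]

print(criterion_met("grid location", "Agreed-Content"))    # True
print(criterion_met("td-target type", "Agreed-Content"))   # False: needs more
print(criterion_met("shot", "Accessible"))                  # True: mention suffices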
6.6 Trace of Example Dialogue
Here follows an example of dialogue. This dialogue is between G91 as a Forward
Observer identifying a target, and S19 as a Fire Direction Center who will send the
artillery fire when given the appropriate information. The first several lines of the dia-
logue are shown in Table 6.18, which shows the utterance, interpretation, and identified
evidence for that utterance.
Line  ID   Utterance                                           Interpretation           Evidence
1     G91  fire for effect over                                WO-MOF: fire for effect  Submit
2     S19  fire for effect out                                 WO-MOF: fire for effect  Repeat Back
3          Silence: 0.7 seconds
4     G91  grid four five four two ah three six three eight    TL-GR: 45423638          Submit, Move On
5          Silence: 2.3 seconds
6     S19  grid four five four two three six three eight out   TL-GR: 45423638          Repeat Back
7          Silence: 0.7 seconds
           ah roger                                            ROGER                    Acknowledge
8     G91  b r d m                                             TD-TYPE: b r d m         Submit, Move On
           in the open over                                    TD-DESC: in the open     Submit
9          Silence: 1.3 seconds
10    S19  b r d m                                             TD-TYPE: b r d m         Repeat Back
           in the open out                                     TD-DESC: in the open     Repeat Back
11         Silence: 9.9 seconds                                                         Lack of Response
Table 6.18: CFF Dialogue Trace, Targeting Phase
In line 1, G91’s utterance is interpreted as a Warning Order - Method of Fire (WO-
MOF), describing the kind of artillery fire requested, whose value is “fire for effect.”
This is the first mention of a WO-MOF for this particular CFF, so it is identified as a
Submit type of evidence related to a new CGU, which now has an Accessible degree
of groundedness. In line 2, a WO-MOF is again given, this time by the other speaker.
The WO-MOF is identified as referring to the CGU introduced in line 1, and a Repeat
Back type of evidence is added to that CGU’s evidence history, which gives it an Agreed-
Content degree of groundedness. In line 3 there follows a silence that is not long enough
to be a Lack of Response type of evidence.
In line 4, G91 provides an Acknowledge type of evidence, and Moves On to the next
task item: identifying the Target Location - Grid (TL-GR) of the CFF. The Acknowl-
edge and Move On, referring to the CGU created in line 1, raise that CGU’s degree of
groundedness to its grounding criterion of Agreed-Content+, at which point it becomes
grounded. At the same time, the introduction of the TL-GR information creates a new
CGU, whose degree is Accessible. Line 5 provides another silence that is not long
enough to provide a Lack of Response type of evidence of understanding. In line 6 the
TL-GR CGU is Repeated Back, thereby raising its degree of groundedness to Agreed-
Content.
In line 8 an Acknowledge is provided and a set of information related to the Tar-
get Description (TD-) is given, providing a Move On, thereby grounding the TL-GR
CGU. So by line 8, two CGUs (WO-MOF and TL-GR) have been added to the common
ground, and two more CGUs (TD-TYPE and TD-DESC) have Accessible degrees and
are in the process of being grounded. In line 10 the TD CGUs are Repeated Back, rais-
ing their degree of groundedness to Agreed-Content. In line 11 the Lack of Response
raises the TD CGUs to Agreed-Content+ thereby grounding them. At this point there is
enough information in the common ground for S19 to send the artillery fire.
Table 6.19 shows the items in the CGU record that changed after each line. Note that
in lines 4, 8, and 11 items in the CGU record changed that were not explicitly mentioned
in the utterance: lines 4 and 8 because of the Move On type of evidence, and line 11
because of the silence.
Line | CGU Value | Degree | GC Met?
1  | WO-MOF: fire for effect | Accessible | No
2  | WO-MOF: fire for effect | Agreed-Content | Yes
4  | WO-MOF: fire for effect | Agreed-Content+ | Yes
   | TL-GR: 45423638 | Accessible | No
6  | TL-GR: 45423638 | Agreed-Content | Yes
8  | TL-GR: 45423638 | Agreed-Content+ | Yes
   | TD-TYPE: b r d m | Accessible | No
   | TD-DESC: in the open | Accessible | No
10 | TD-TYPE: b r d m | Agreed-Content | No
   | TD-DESC: in the open | Agreed-Content | No
11 | TD-TYPE: b r d m | Agreed-Content+ | Yes
   | TD-DESC: in the open | Agreed-Content+ | Yes
Table 6.19: Change in CGU Record, Targeting Phase

Table 6.20 shows the second part of the dialogue. In line 12, S19 provides infor-
mation about the artillery fire that is going to be sent. This includes the battery that
will be firing (MTO-BAT), the number of rounds to be fired (MTO-NUM) and the tar-
get number that will be used to refer to this particular fire mission from that point on
(TN). In line 14, G91 Repeats Back the information presented in line 12 along with an
Acknowledge. In line 16, S19 notifies that the mission has been fired; in line 18 this is
confirmed. Likewise, in line 19 S19 notifies that the mission is about to land; in line
21 this is confirmed.
The following several turns have been removed for space reasons. These turns were
related to an adjustment of the artillery fire: after the initial bombardment, the Forward
Observer requested that the same artillery be fired 100 meters to the left of the original
bombardment. This was confirmed and delivered. In line 23, G91 sends a description
of the amount of damage suffered by the target: the number of enemy affected (EOM-
NUM), the type of enemy (EOM-TYPE) and the extent of the damage (EOM-BDA).
In line 24, the information from line 23 is Repeated Back by S19, thereby ending the
CFF. Note that S19 does not Repeat Back the EOM-NUM. In this particular instance
the number of enemies is implied by the EOM-TYPE being singular, but throughout the
corpus EOMs are seen to have low grounding criteria.
Line | ID  | Utterance | Interpretation | Evidence
12   | S19 | message to observer kilo | MTO-BAT: kilo | Submit
     |     | two rounds | MTO-NUM: two | Submit, Move On
     |     | target number alpha bravo zero zero one over | TN: AB001 | Submit
13   |     | Silence: 3.1 seconds | |
14   | G91 | a roger mike tango alpha ah alpha | ROGER | Acknowledge
     |     | target number alpha bravo zero zero zero one | TN: AB0001 | Repeat Back
     |     | a kilo | MTO-BAT: kilo | Repeat Back
     |     | two rounds out | MTO-NUM: two | Repeat Back
15   |     | Silence: 11.4 seconds | | Lack of Response
16   | S19 | shot | SHOT | Submit
     |     | rounds complete over | RC | Submit
17   |     | Silence: 0.8 seconds | |
18   | G91 | shot | SHOT | Repeat Back
     |     | rounds complete out | RC | Repeat Back
19   | S19 | splash over | SPLASH | Submit
20   |     | Silence: 1.5 seconds | |
21   | G91 | splash out | SPLASH | Repeat Back
22   |     | Silence: 30.4 seconds | | Lack of Response
23   | G91 | ah end of mission a target number alpha bravo zero zero one | TN: AB001 | Submit
     |     | one | EOM-NUM: one | Submit
     |     | b r d m | EOM-TYPE: b r d m | Submit
     |     | destroyed over | EOM-BDA: destroyed | Submit
24   | S19 | end of mission b r d des m d correction b r d m | EOM-TYPE: b r d m | Repeat Back
     |     | destroyed out | EOM-BDA: destroyed | Repeat Back
Table 6.20: CFF Dialogue Trace, Post-Targeting Phase
Table 6.21 shows the items in the CGU record that changed after each line. Note
that the MTO and delivery-related CGUs have low grounding criteria, as this dialogue
is being processed with lenient grounding criteria. Also note that in lines 23 and 24, the
type and BDA of the end of mission have higher grounding criteria than the number and
target number.
Line | CGU Value | Degree | GC Met?
12 | MTO-BAT: kilo | Accessible | Yes
   | MTO-NUM: two | Accessible | Yes
   | TN: AB001 | Accessible | Yes
14 | MTO-BAT: kilo | Agreed-Content | Yes
   | MTO-NUM: two | Agreed-Content | Yes
   | TN: AB001 | Agreed-Content | Yes
15 | MTO-BAT: kilo | Agreed-Content | Yes
   | MTO-NUM: two | Agreed-Content | Yes
   | TN: AB001 | Agreed-Content+ | Yes
16 | SHOT | Accessible | Yes
   | RC | Accessible | Yes
18 | SHOT | Agreed-Content | Yes
   | RC | Agreed-Content | Yes
19 | SPLASH | Accessible | Yes
21 | SPLASH | Agreed-Content | Yes
22 | SPLASH | Agreed-Content+ | Yes
23 | TN: AB001 | Accessible | Yes
   | EOM-NUM: one | Accessible | Yes
   | EOM-TYPE: b r d m | Accessible | No
   | EOM-BDA: destroyed | Accessible | No
24 | EOM-TYPE: b r d m | Agreed-Content | Yes
   | EOM-BDA: destroyed | Agreed-Content | Yes
Table 6.21: Change in CGU Record, Post-Targeting Phase
6.7 Discussion
This chapter describes how a model of Degrees of Grounding has been developed
from a corpus analysis. In particular, Section 6.3 describes the various kinds of evi-
dence of understanding that participants use in discussions of the domain’s information
components. Section 6.4 describes how these can be combined to provide degrees of
grounding. Section 6.5 describes how grounding criteria can be defined from degrees
of grounding. Section 6.6 provides an example of how the model works in a dialogue
between two humans.
However, to allow this model to actually work in an online system, algorithms for
dialogue management must be developed, implemented, and evaluated. The model is
extended to handle these capabilities in Chapter 7.
Chapter 7
CFF Dialogue Management with
Degrees of Grounding
7.1 From Dialogue Modeling to Dialogue Management
The model of Degrees of Grounding described in Chapter 6 contains an algorithm for
determining whether to respond to an observed evidence of understanding, and if so,
what type of evidence of understanding to provide in return. The evaluation of the model
done in that chapter was on a corpus of utterances between two humans; by tracking
the grounding behavior between the dialogue participants, the algorithm performed a
dialogue modeling function.
However, evaluating the algorithm for providing evidence of understanding in
response is best done in an implemented system, in which the algorithm makes real-time
decisions to guide interaction with a human. In that case, the algorithm performs a
dialogue management function. That evaluation is the focus of this
chapter.
Section 7.2 describes the dialogue management algorithms. Section 7.3 provides
implementation details about the project in the context of which this work was done,
and Section 7.4 provides an example of how the dialogue manager works. Section
7.5 describes a corpus evaluation of the dialogue management algorithms. Section 7.6
describes results of testing the Degrees of Grounding dialogue manager, and Section 7.7
provides some conclusions.
7.2 Dialogue Management Algorithms
Exploiting the model described in Chapter 6 involves several steps. Evidence of under-
standing must be identified given a semantic interpretation and the history of evidence
provided so far. Given an utterance’s new evidence and a CGU’s current degree of
groundedness, the CGU’s new degree of groundedness must be determined. Once a
CGU’s current degree is determined, it can be compared to its grounding criterion to
determine whether or not it has been sufficiently grounded, and if not, a new item of
evidence may be suggested to help further ground the material.
Figure 7.1 shows how these individual algorithms work together for dialogue man-
agement. Note that this is an elaboration of the algorithm overview discussed in Section
3.5. Specifics of each algorithm follow.
for each dialogue act parameter,
identify the relevant CGU
identify evidence of understanding
compute the CGU’s degree of groundedness
for each CGU not sufficiently grounded
determine evidence to be given
compute the CGU’s degree of groundedness
Figure 7.1: Dialogue Management Algorithm
7.2.1 Identifying Evidence of Understanding
The rules for identifying the evidence of understanding are summarized in Figure 7.2.
They describe the processing that occurs for each CGU in an utterance.
for each CGU in an utterance,
    if that CGU is not already grounded or in the common ground record,
        then the Evidence is: Submit for that CGU
    if that CGU's Dialogue Move is Confirm
        then the Evidence is: Acknowledge for each CGU in the previous turn
    if that CGU is already grounded,
    and was Submitted by the other dialogue participant,
        then the Evidence is: Repeat Back for that CGU
    if the Dialogue Move has a 'prompt' in the previous turn
        then the Evidence is: Use
    if the Dialogue Move is Say Again
        then the Evidence is: Request Repair for the parameter of that Say Again
    if that CGU is already present in the common ground record,
    and was presented by yourself,
        then the Evidence is: Resubmit
for each CGU in the previous turn
    if that CGU was the next item in the task model
        then the Evidence is: Move On
after a move is made
    if n=5 seconds pass without a reply
        then the Evidence is: Lack of Response
Figure 7.2: Rules to Identify Evidence of Understanding

The first set of rules is given a chance to fire when analyzing each CGU in an utterance.
These rules are constructed based on the definitions of the types of evidence of under-
standing. For example, a Submit type of evidence is identified by asking whether the
relevant CGU is not yet in the common ground record, which would indicate that it is
being presented for the first time. (Recall that CGUs are identified not only by the type
of content: for example, in the case of calls for fire, Target Locations are identified
by their target number as well as by the fact that they are a Target Location.) This
is to be contrasted with the case in which the CGU is already present in the common
ground and was presented by the agent running the algorithm, in which case it is a
Resubmit. Note that this identification of ’Resubmit’ is specific to the CFF domain:
in other applications, having the same speaker present again information already in
the common ground record does not automatically identify a Resubmit: in the Tactical
Questioning domain described in Chapter 8, the repeated presentation of material already
in the common ground record would have to be accompanied by additional semantic
information to be identified as a Resubmit.
Indeed, even in the current algorithm shown in Figure 7.2, other rules are fired
based on the semantic interpretation of the associated CGU - expressed here as Dia-
logue Moves. For example, if the Dialogue Move is a Confirm, this would serve as
an Acknowledgment of the CGUs in the previous turn, and if the Dialogue Move is a
Say Again, the type of evidence is a Request Repair for the relevant parameter. In the
CFF domain, the main Use evidence (outside of general off-task talk) is a prompt for
information, followed by the submission of that information; so a rule to handle this is
added.
The second set of rules in Figure 7.2 is actually an iteration of every CGU in the
previous turn, and compares the CGUs in the previous move with those in the current
move, to see if the CGU referenced in the previous move was referenced in the current
move. This would determine whether the type of Evidence was ’Move On.’ Finally,
the last rule is used when examining whether or not there is a response to the latest
utterance, which would determine whether there was a Lack of Response.
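As an illustration of how such rules might look in code, the following Python sketch implements a few of the Figure 7.2 rules over a simplified CGU representation; the field names (in_common_ground, submitted_by, dialogue_move) and the function itself are hypothetical stand-ins, not the identifiers used in the IOTA implementation.

# Illustrative sketch of a few Figure 7.2 rules over a simplified CGU dict.
# Field names are hypothetical, not those of the actual IOTA system.
def identify_evidence(cgu, speaker):
    # Return the evidence type presented for one CGU in the current utterance.
    if not cgu["in_common_ground"]:
        return "Submit"              # first mention of this material
    if cgu["dialogue_move"] == "Say Again":
        return "Request Repair"      # explicit repair request for a parameter
    if cgu["submitted_by"] != speaker:
        return "Repeat Back"         # restating the other participant's material
    return "Resubmit"                # presenting one's own material again

# Example: a target-location CGU originally submitted by the other participant.
cgu = {"in_common_ground": True, "submitted_by": "G91", "dialogue_move": "Fire"}
print(identify_evidence(cgu, speaker="S19"))   # Repeat Back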
7.2.2 Identifying Degrees of Groundedness
Section 7.2.1 described how the latest evidence of understanding regarding a material
can be identified. Given this new evidence, the next task for the model is to determine
the new degree of groundedness of the material being referred to.
The degrees of groundedness are defined to capture the essential information about
the evidence of understanding that has been presented so far. For example, an Agreed-
Signal degree indicates that a Submit type of evidence has been presented, along with an
Acknowledgment. Because of this, a material’s current degree of groundedness along
with the new evidence of understanding is enough to determine that material’s new
degree of groundedness.
Table 7.1 shows how this can be done, given any possible combination of evi-
dence and degrees. The columns of the table represent the material's current degree
of groundedness, and the rows represent the latest type of evidence of understanding
regarding that material. The combination of the two leads to the material’s new degree
of groundedness.
Evidence \ Current degree | Unknown | Misunderstood | Unacknowl. | Accessible | Agreed-Signal | Agreed-Signal+ | Agreed-Content | Agreed-Content+
Submit | Accessible | n/a | n/a | n/a | n/a | n/a | n/a | n/a
Repeat Back | n/a | Agreed-Content | Agreed-Content | Agreed-Content | Agreed-Content | Agreed-Content | Agreed-Content+ | Agreed-Content+
Resubmit | n/a | Accessible | Accessible | Accessible | Accessible | Accessible | Accessible | Accessible
Acknowledge | n/a | Misunderstood | Unacknowl. | Agreed-Signal | Agreed-Signal+ | Agreed-Signal+ | Agreed-Content+ | Agreed-Content+
Request Repair | n/a | Misunderstood | Misunderstood | Misunderstood | Misunderstood | Misunderstood | Misunderstood | Misunderstood
Use | n/a | Agreed-Content | Agreed-Content | Agreed-Content | Agreed-Content | Agreed-Content | Agreed-Content+ | Agreed-Content+
Move On | n/a | Misunderstood | Unacknowl. | Accessible | Agreed-Signal+ | Agreed-Signal+ | Agreed-Content+ | Agreed-Content+
Lack of Response | n/a | Misunderstood | Unacknowl. | Unacknowl. | Agreed-Signal+ | Agreed-Signal+ | Agreed-Content+ | Agreed-Content+
Table 7.1: Identifying Degrees of Groundedness
Certain combinations are not applicable due to the definitions of types of evidence
and degrees. For example, the Accessible degree implies that the material has been
submitted already, so a Submit type evidence of understanding cannot be identified
for material that is already Accessible (the correct type of evidence in that case would
probably be Resubmit).
Also, certain combinations are not usually seen, though they can be imagined. For
example, it may seem unusual to Repeat Back material that is
currently Misunderstood, but Table 7.2 gives an imagined example of how that could
occur. In line 1 the forward observer submits a grid. Imagine the fire direction center is
unsure of the number, so submits a request for repair in line 2. At this point, the degree
of the grid number is Misunderstood. However, the fire direction center then becomes
more confident of their understanding of the utterance in line 1 (perhaps they initially
misunderstood the first digit, then remembered that all grids on that map begin with a
4) and thus feels free to provide a Repeat Back. So a Repeat Back on Misunderstood
material could lead to an Agreed-Content degree.
# ID Utterance Evidence Degree
1 FO grid 4 5 6 3 4 6 over Submit Accessible
2 FDC say again grid over Request Repair Misunderstood
3 FDC grid 4 5 6 3 4 6 over Repeat Back Agreed-Content
Table 7.2: Example of Repeat Back Evidence and Misunderstood Degree
Table 7.1 could be implemented as an FSA or a lookup-table, but in this dissertation
it is expressed in terms of rules; a subset of the most common combinations is
shown in Figure 7.3.
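To make the lookup-table reading concrete, a minimal Python sketch follows, in which part of Table 7.1 is stored as a dictionary keyed by (evidence, current degree); only a handful of cells are filled in, and the structure is illustrative rather than a transcription of the rule-based implementation used in this dissertation.

# Illustrative lookup-table encoding of part of Table 7.1:
# (evidence, current degree) -> new degree. Missing pairs correspond to the
# table's "n/a" cells or to entries omitted from this sketch.
DEGREE_TRANSITIONS = {
    ("Submit",           "Unknown"):        "Accessible",
    ("Repeat Back",      "Accessible"):     "Agreed-Content",
    ("Repeat Back",      "Agreed-Content"): "Agreed-Content+",
    ("Acknowledge",      "Accessible"):     "Agreed-Signal",
    ("Acknowledge",      "Agreed-Signal"):  "Agreed-Signal+",
    ("Request Repair",   "Accessible"):     "Misunderstood",
    ("Move On",          "Agreed-Content"): "Agreed-Content+",
    ("Lack of Response", "Accessible"):     "Unacknowledged",
    ("Lack of Response", "Agreed-Content"): "Agreed-Content+",
}

def update_degree(current_degree, evidence):
    # Return the new degree, or keep the old one if no transition is listed.
    return DEGREE_TRANSITIONS.get((evidence, current_degree), current_degree)

print(update_degree("Accessible", "Repeat Back"))           # Agreed-Content
print(update_degree("Agreed-Content", "Lack of Response"))  # Agreed-Content+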
if the last given Evidence was a Lack of Response
    if the current Degree is Accessible
        then the new Degree is Unacknowledged
    if the current Degree is Agreed-Signal
        then the new Degree is Agreed-Signal+
    if the current Degree is Agreed-Content
        then the new Degree is Agreed-Content+
if the last given Evidence was a Request Repair
    then the new Degree is Misunderstood
if the last given Evidence was a Submit or Resubmit
    then the new Degree is Accessible
if the last given Evidence was an Acknowledge
    if the current Degree is Agreed-Signal
        then the new Degree is Agreed-Signal+
    else the new Degree is Agreed-Signal
if the last given Evidence was a Use
    then the new Degree is Agreed-Content
if the last given Evidence was a Repeat Back
    if the current Degree is Agreed-Content
        then the new Degree is Agreed-Content+
    else the new Degree is Agreed-Content
Figure 7.3: Rules to Identify Degrees of Groundedness

7.2.3 Determining Evidence of Understanding in Response
Section 7.2.1 described algorithms for identifying the types of evidence of understand-
ing in an utterance, and Section 7.2.2 described algorithms for identifying the degree of
groundedness of material being discussed. The next algorithm to be defined is the one
that determines what kind of evidence of understanding needs to be presented to help
material become more grounded.
Table 7.3 shows all possible combinations of current degrees of groundedness
(columns) and grounding criteria (rows). For each of these possible combinations a type
of evidence of understanding is provided, which an analysis of the corpus has shown is
most likely to ultimately ground the material involved.
The large number of Lack of Response entries in Table 7.3 indicates that if the material
has already met its grounding criterion, it is enough to provide no additional evidence.
However, in some cases a Lack of Response increases grounding by signaling a lack of
objection, as is the case when the grounding criterion is Agreed-Content+ and the current
degree of groundedness is Agreed-Content. For material whose grounding criterion is
Unknown, the Lack of Response indicates that although a speaker or situation may bring
up the topic, it need not be grounded any further.
Grounding criterion \ Current degree | Unknown | Misunderstood | Unacknowl. | Accessible | Agreed-Signal | Agreed-Signal+ | Agreed-Content | Agreed-Content+
Unknown | Lack of Resp. | Lack of Resp. | Lack of Resp. | Lack of Resp. | Lack of Resp. | Lack of Resp. | Lack of Resp. | Lack of Resp.
Misunderstood | Submit | Lack of Resp. | Lack of Resp. | Lack of Resp. | Lack of Resp. | Lack of Resp. | Lack of Resp. | Lack of Resp.
Unacknowl. | Submit | Lack of Resp. | Lack of Resp. | Lack of Resp. | Lack of Resp. | Lack of Resp. | Lack of Resp. | Lack of Resp.
Accessible | Submit | Resubmit | Resubmit | Lack of Resp. | Lack of Resp. | Lack of Resp. | Lack of Resp. | Lack of Resp.
Agreed-Signal | Submit | Resubmit | Resubmit | Acknowl. | Lack of Resp. | Lack of Resp. | Lack of Resp. | Lack of Resp.
Agreed-Signal+ | Submit | Resubmit | Resubmit | Acknowl. | Lack of Resp. | Lack of Resp. | Lack of Resp. | Lack of Resp.
Agreed-Content | Submit | Resubmit | Resubmit | Repeat Back | Repeat Back | Repeat Back | Lack of Resp. | Lack of Resp.
Agreed-Content+ | Submit | Resubmit | Resubmit | Repeat Back | Repeat Back | Repeat Back | Lack of Resp. | Lack of Resp.
Table 7.3: Identifying Evidence in Response
The entries in this table only cover responses to be made for grounding purposes, and
are not exclusive of utterances that may be generated for other reasons. For example, if
a speaker’s task model prescribes that they should Move On to the next step in a task,
a grounding-related Lack of Response evidence should not override this. Similarly, all
material that is currently Unknown should not necessarily be immediately Submitted
for grounding purposes; other dialogue components, such as a task model, will dictate
which topic should be under discussion, and at what point. In practice, the table is
expressed in terms of rules; a subset of the most common rules is shown in Figure 7.4. Note that
these responses focus on what is needed for the system to handle the current domain;
for more complex dialogues, especially those handling misunderstandings, the current
degree of grounding may be relevant.
if the grounding criterion is Unacknowledged
then the evidence to reply is: Lack of Response
if the grounding criterion is Accessible
then the evidence to reply is: Lack of Response
if the grounding criterion is Agreed-Signal
then the evidence to reply is: Acknowledge
if the grounding criterion is Agreed-Signal+
and the current degree of groundedness is Agreed-Signal
then the evidence to reply is: Lack of Response
else
the evidence to reply is: Acknowledge
if the grounding criterion is Agreed-Content
then the evidence to reply is: Repeat Back
if the grounding criterion is Agreed-Content+
and the current degree of groundedness is Agreed-Content
then the evidence to reply is: Lack of Response
else
the evidence to reply is: Repeat Back
Figure 7.4: Rules to Determine Evidence in Response
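The rules of Figure 7.4 amount to a small decision function over the pair (grounding criterion, current degree); the following Python sketch is one illustrative way to write them down, not the dissertation's implementation.

# Illustrative encoding of the Figure 7.4 response rules: given a CGU's
# grounding criterion and current degree, choose the evidence to provide.
def evidence_to_give(criterion, current_degree):
    if criterion in ("Unacknowledged", "Accessible"):
        return "Lack of Response"
    if criterion == "Agreed-Signal":
        return "Acknowledge"
    if criterion == "Agreed-Signal+":
        return "Lack of Response" if current_degree == "Agreed-Signal" else "Acknowledge"
    if criterion == "Agreed-Content":
        return "Repeat Back"
    if criterion == "Agreed-Content+":
        return "Lack of Response" if current_degree == "Agreed-Content" else "Repeat Back"
    return "Lack of Response"   # e.g. an Unknown criterion needs no further grounding

print(evidence_to_give("Agreed-Content+", "Accessible"))      # Repeat Back
print(evidence_to_give("Agreed-Content+", "Agreed-Content"))  # Lack of Response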
7.3 Implementation Details
Initial efforts in developing a dialogue manager for Calls for Fire were part of the
Radiobot-CFF project [RLR+06], which worked in the context of the JFETS-UTM
system described in Section 4.4. After analyzing the domain, the initial Information
State-based dialogue manager distinguished between material that needed to be repeated
back, such as Warning Orders and Target Descriptions, and material that did not, such
as confirmations to Shot and Splash warnings [RT06]. Rules for handling corrections
were built, and the entire system was evaluated and shown to have a respectable rate of
completion compared to human performance [RRVT06].
The research effort described in this chapter was part of the IOTA project, which
extended the Radiobot-CFF work. Recent efforts in CFF dialogue management have
focused on building a rich representation of how grounded material is, without
losing the performance levels of the initial versions. Using a Degrees of Grounding
dialogue manager provides additional capabilities - additional information for student
assessment, by modeling more precisely the state of understanding between human and
machine - while providing the same level of service as a traditional dialogue manager
without Degrees of Grounding.
Figure 7.5: IOTA Architecture: IOTA in the JFETS-UTM
The IOTA system, which is an extension of the Radiobot-CFF system, interacts with
the JFETS-UTM CFF training environment described in Section 4.4. IOTA interfaces
with JFETS UTM as depicted in Figure 7.5. The trainee talks to the IOTA component
by voice through the simulated radio. IOTA determines what kind of response to make,
and replies back by radio. Also, IOTA interacts with the JFETS CASTrainer training
interface that operators usually use to manage and request fires. The notion is that when
IOTA is enabled, IOTA will collect information such as target descriptions and grid
locations, and fill out the form elements of the CASTrainer training interface. To this
end, when IOTA determines that a relevant informational element has been given, IOTA
sends XML messages to the CASTrainer interface, which then fills in the form elements
as requested. Finally, if IOTA determines that a mission needs to be fired, it sends a
message telling the CASTrainer to fire the mission, and the mission is fired just as if a
human operator had requested the fire through the CASTrainer interface.
Figure 7.6: IOTA Architecture: Components Within IOTA
Within the IOTA module, several pipelined components are at work, as shown in
Figure 7.6. A Speech Recognizer takes the voice signal and translates this into text,
using the SONIC speech recognition system [Pel01] with custom language and acoustic
models [SGN05]. Next, an Interpreter component determines what the meaning of the
text is: whether a Warning Order is given, or a Target Location, or some other Dia-
logue Move, and if so, what the Parameters are. The Interpreter component tags the
Speech Recognizer output with its dialogue move and parameter labels using two
separate Conditional Random Field [SP03, McC02] taggers trained on hand-annotated
utterances.
The IOTA dialogue manager was built using the Information State [TL03] approach,
which involves defining a set of information state components, along with rules to
update those components. The Dialogue Manager determines whether a voice con-
firmation is needed, and if so, uses a Text-to-Speech (TTS) engine to produce it. The
Dialogue Manager also determines what kind of XML command is needed for send-
ing to the JFETS CASTrainer, and produces that if so - it is the CASTrainer through
which the commands to fire missions (and therefore display explosions in the virtual
environment) are made. The dialogue manager uses algorithms for dialogue manage-
ment as described in Section 7.2. The dialogue manager was developed, integrated, and
tested as part of this dissertation research; the rest of the system was developed by other
researchers as enumerated in [RLR+06].
7.4 Trace of Dialogue Management Example
As described in Section 4.4, the dialogue manager receives the output of the Interpreter
component. For a given utterance, this is a dialogue move and parameter classification,
at the CGU level, as described in Section 6.2. So for the utterance “adjust fire polar
over”, the dialogue manager receives a classification of “Warning Order / Method of
Fire” (here abbreviated as “Fire”), with two parameters: “method of fire” with value
“adjust fire” and “method of location” with value “polar”. Because the dialogue manager
builds CGUs at the combined dialogue move / parameter level, the dialogue manager
therefore builds two objects, one for each parameter, as shown in Figure 7.7.
ASR: adjust fire polar over
CGU identified:
dialogue move: FIRE
parameter: method_of_fire
parameter value: adjust fire
CGU identified:
dialogue move: FIRE
parameter: method_of_location
parameter value: polar
Figure 7.7: Dialogue Management Example: CGU Objects
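A CGU object of the kind printed in this trace could be represented as a small record; the Python sketch below uses hypothetical field names chosen to mirror the figures, and is illustrative only.

# Illustrative record for a CGU as it appears in the trace figures;
# field names are hypothetical, chosen to mirror the printed trace.
from dataclasses import dataclass
from typing import Optional

@dataclass
class CGU:
    dialogue_move: str                    # e.g. "FIRE"
    parameter: str                        # e.g. "method_of_fire"
    value: str                            # e.g. "adjust fire"
    target_number: Optional[str] = None   # unassigned until the mission is numbered
    adjust_number: int = 0
    evidence: str = "submit"              # latest evidence observed for this CGU
    degree: str = "accessible"            # current degree of groundedness
    initiator: str = "other"              # who first submitted the material
    grounded: bool = False                # has the grounding criterion been met?

mof = CGU("FIRE", "method_of_fire", "adjust fire")
mol = CGU("FIRE", "method_of_location", "polar")
print(mof)
print(mol)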
The dialogue manager then considers the CGUs one at a time to determine a target
number, evidence, and degree: the first CGU being considered is shown in Figure 7.8.
Because the mission is in the process of being described, the Fire-Method of Fire CGU
does not have a target number yet. Because the mission has not yet been fired, it is not
possible for post-fire adjusts to have been given yet, so the adjust number is 0.
CGU under consideration:
type: Fire-method_of_fire
value: adjust fire
target number: unassigned
adjust number: 0
evidence: submit
degree: accessible
Figure 7.8: Dialogue Management Example: CGU Under Consideration
The Fire-Method of Fire CGU identified by unassigned-0 is not yet in the common ground record,
so the rules identifying the type of Evidence of Understanding being presented for this
CGU recognize this as a Submit. Because the latest Evidence was a Submit, the rules
identifying the Degree of Groundedness identify this as Accessible. The CGU tracks
who submitted the material, for purposes of disambiguating between a Resubmit and a
Repeat Back type of Evidence.
The dialogue system begins with a Grounding Criterion defined for each type of
information. The Grounding Criterion is a degree, and in this case it has not yet been
met, so the CGU is flagged as not yet having been grounded. Strictly speaking, this
grounding check could be made as needed and thus need not be recorded as part of
the CGU; however, it is useful for analysis and debugging to track the CGU's state of
groundedness as part of the CGU. After the method of fire has been considered, the
CGU register has one CGU in it, as shown in Figure 7.9.
CGU Register / Common Ground:
unassigned-0-FIRE-method_of_fire:
value: adjust fire
degree: accessible
initiator: other
grounded?: false
Figure 7.9: Dialogue Management Example: CGU Register
The same process is undergone for the second CGU identified, the “polar” method of
location. This also has an unassigned target number and an adjust number of 0. Algorithm
rules identify its evidence type as Submit, and its degree as Accessible, as shown in
Figure 7.10.
CGU under consideration:
type: Fire-method_of_location
value: polar
target number: unassigned
adjust number: 0
evidence: submit
degree: accessible
Figure 7.10: Dialogue Management Example: CGU Under Consideration
At this stage of processing, the incoming utterance has been fully processed. The
CGU register has two CGUs in it, and neither one has met its grounding criterion, as
shown in Figure 7.11.
CGU Register / Common Ground:
unassigned-0-FIRE-method_of_fire:
adjust_fire | accessible | other | false
unassigned-0-FIRE-method_of_location:
polar | accessible | other | false
Figure 7.11: Dialogue Management Example: CGU Register
The dialogue manager must now decide what evidence if any to provide. It begins
by looking through the CGU register and identifying which CGUs have not yet met
their grounding criteria; in this case, it identifies the two CGUs just created. A set of
rules determines that for these CGUs to reach their grounding criteria, Repeat Back
evidence must be provided for them. The CGUs and evidence needed are then
processed by the Generation subsystem, which produces the appropriate utterance, as
shown in Figure 7.12.
Evidence Needed:
Repeat Back, for: unassigned-0-FIRE-method_of_location
Repeat Back, for: unassigned-0-FIRE-method_of_fire
Generation: adjust fire, polar, out.
Figure 7.12: Dialogue Management Example: Evidence Provided
Finally, the dialogue manager considers the state of the CGU register after provid-
ing the Repeat Back evidence. The dialogue manager applies the appropriate rules to
determine that because of the Repeat Back evidence presented, the CGUs have met their
grounding criteria, and are now grounded, as shown in Figure 7.13. In this manner an
utterance has been received, processed, repeated back, and grounded.
CGU Register / Common Ground:
unassigned-0-FIRE-method_of_fire:
adjust fire | agreed-content | other | true
unassigned-0-FIRE-method_of_location
polar | agreed-content | other | true
Figure 7.13: Dialogue Management Example: CGU Register After Reply
7.5 Corpus Evaluation of Dialogue Management Algorithms
The validity of this model has been evaluated in several corpus tests to measure inter-
coder agreement in identifying evidence, to ensure that identifying evidence can reliably
be done by an algorithm, to measure inter-coder agreement in identifying the increase
or decrease of the degree of groundedness, and to ensure that identifying the increase or
decrease of a degree of groundedness can reliably be done by an algorithm.
The corpus used to develop the model of grounding was taken from a set of JFETS-
UTM sessions run at the Ft. Sill facility with a human trainee on the radio acting as
Forward Observer, a human trainer controlling the simulation acting as Fire Direction
Center, and no Radiobot-CFF system. These sessions had been recorded and transcribed
by various members of the original Radiobot-CFF team and were used early in the devel-
opment of Radiobot-CFF, to develop the dialogue model for the dialogue manager, and
to provide data for its speech recognition and interpretation components.
Human transcribers produced transcriptions of several sessions between two sets of
humans acting as Forward Observer and Fire Direction Center radio operators in the
training simulation. A subset of the corpus was used for close analysis: this subset was
made up of 4 training sessions, composed of 17 fire missions, totaling 456 utterances;
this provided a total of 1222 possible indicators of evidence of understanding made up
of 886 dialogue move parameters and 336 periods of silence.
A script performed a dialogue act interpretation on the utterances of these transcrip-
tions; these interpretations were then checked and corrected by hand. These anal-
yses used the dialogue moves and parameters described previously, with slight exten-
sions: whereas the dialogue moves and parameters described previously handled only
the Forward Observer, the annotations for the corpus analysis handled both the Forward
Observer and the Fire Direction Center.
A total of 1398 discrete potential phenomena were analyzed for evidence of under-
standing. Of these, 1001 were common ground units associated with dialogue act inter-
pretations. The other 397 phenomena in the corpus were periods of silence between
utterances, which were examined to determine whether they were long enough to rep-
resent Lack of Response evidence of understanding.
7.5.1 Identifying Evidence - Inter-Coder Agreement
The first corpus evaluation explored whether human annotators could reliably agree
on identifying evidence of understanding. This was framed in terms of an inter-coder
agreement experiment in which two coders tagged a subset of the corpus (318 dialogue
move parameters and 74 silences) to identify the evidence of understanding, given an
utterance and dialogue act interpretation. One coder was the author of this dissertation,
and the other was a computer professional who had no previous experience with the
domain or with tagging data.
Table 7.4 shows the results, broken down by the Standalone types of evidence, which
could occur by themselves (Submit, Repeat Back, Resubmit, Acknowledge, and Request
Repair), the Additional types of evidence, which only occurred with other types of evi-
dence (Move On and Use), and the Silence-Related Lack of Understanding type of evi-
dence. Each of these showed acceptable levels of agreement, with the exception of the
83
Evidence Type P(A) Kappa
Standalone 0.95 0.91
Additional 0.87 0.53
Silence-Related 0.92 0.84
Table 7.4: Inter-Coder Agreement - Evidence
Evidence Type P(A) Kappa
Standalone 0.88 0.81
Additional 0.98 0.92
Silence-Related 1.0 1.0
Table 7.5: Algorithm Agreement - Evidence
Kappa for the additional evidence. The low score on the additional evidence is prob-
ably due to the fact that Move On judgments depend on a strong understanding of the
domain-specific task structure, as described in Section 6.3.6; to a lesser extent Use judg-
ments tend to rely on an understanding of the scenario as well. This highlights the
fact that for most of the evidence of understanding (all except for Move On and Use),
agreement can be reached with a non-expert coder.
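For reference, P(A) is the observed agreement and Kappa corrects it for chance agreement given each coder's label distribution; a minimal Python sketch of the computation follows, illustrative rather than the scripts used for this study.

# Minimal sketch of observed agreement P(A) and Cohen's kappa for two coders.
from collections import Counter

def agreement_and_kappa(labels_a, labels_b):
    n = len(labels_a)
    p_a = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each coder's marginal label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[l] * count_b[l] for l in set(labels_a) | set(labels_b)) / (n * n)
    return p_a, (p_a - p_e) / (1 - p_e)

# Toy example with made-up evidence labels.
coder1 = ["Submit", "Repeat Back", "Acknowledge", "Submit", "Repeat Back"]
coder2 = ["Submit", "Repeat Back", "Acknowledge", "Submit", "Acknowledge"]
print(agreement_and_kappa(coder1, coder2))   # (0.8, about 0.71)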
7.5.2 Identifying Evidence - Human-Algorithm Agreement
The next corpus evaluation explored whether the dialogue management algorithm to
identify evidence of understanding could agree with human judgments. The results of
the inter-coder agreement test were merged into the larger 1222-markable corpus, to
create a consensus human-coded corpus.
The set of rules to identify Evidence of Understanding described in Section 7.2.1
were then applied to the 1222-markable corpus, and the resulting identifications were
then compared to the identifications made by the human coders. The results are shown
in Table 7.5. The respectable agreement and kappa values indicate that it is possible for
an algorithm to reliably identify evidence.
Agreement Type P(A) Kappa
Human-Human 0.97 0.94
Human-Algorithm 0.87 0.73
Table 7.6: Degree Increase/Decrease Agreements
7.5.3 Change in Degree of Groundedness - Inter-Coder Agreement
The next corpus evaluation explored whether human annotators could reliably agree
on whether a given material’s groundedness had increased or decreased after a given
turn. Two human annotators were given segments of dialogue transcripts containing
utterances and periods of silence (with length in seconds for the silences), the speaker
of each utterance, and an item of interest (a target location, for
example). For each utterance, the annotators were asked whether the item of interest
was more grounded, less grounded, or equally grounded compared to the turn before. This produced 74 data
points, all of which were annotated by both annotators, and agreement calculations were
made. The results are shown in the first row of Table 7.6, indicating that humans could
reliably agree among themselves.
7.5.4 Change in Degree of Groundedness - Algorithm Agreement
The final corpus evaluation explored whether the dialogue management algorithm to
identify degree of groundedness could agree with humans on identifying a change in
degree of groundedness.
The data from the human agreement study on change in degree of groundedness was
combined to make a human consensus set of data. Then the rules described in Section
7.2.2 were implemented and used to determine a CGU’s degree of groundedness, and
therefore whether that degree of groundedness had increased or decreased. Agreement
calculations were then made. The results are shown in the second row of Table 7.6,
indicating that the dialogue management algorithm to identify degree of groundedness
could reliably agree with human consensus judgments.
7.6 Evaluation of Implemented System
To evaluate the dialogue management model of Degrees of Grounding, we had human
subjects use the JFETS-IOTA system in a Call for Fire task. The purpose of this evalu-
ation was to ensure that the dialogue manager implementing the Degrees of Grounding
model did not decrease system performance. Our concern in this evaluation was whether
the Degrees of Grounding dialogue manager could handle dialogues as effectively as a
traditional dialogue manager used as a control. If this is the case, then the Degrees of
Grounding dialogue manager is an improvement over the traditional dialogue manager
because not only does it match its performance, but it also provides a detailed descrip-
tion of the extent to which material is grounded; this description can then be used to
suggest real-time operator intervention, or during After-Action Reviews.
Following [RRVT06], our primary area of interest was task completion, defined as
transmitting and confirming the information required to make a decision to deliver the
first fire. We focused on the first fire because after the initial fire, Forward Observers
have the ability to make multiple adjusts that reflect the FO’s ability at calculating the
target’s location or their desire to ensure adequate target destruction, rather than the
efficiency of a dialogue. We also focused on the number of turns to fire rather than
the time to fire, because the latter could be influenced by interactions with the virtual
world such as target identification and location calculation, which are not relevant to the
dialogue.
To perform this evaluation, twenty-four fire missions were called in by four sub-
jects in a JFETS-IOTA installation running on multiple desktop computers. To compensate for
ordering effects, two of the subjects began by calling in three missions in the control
condition followed by three missions in the Degrees of Grounding condition; the other
two began by calling in three missions in the Degrees of Grounding condition followed
by three missions in the control condition. Although the dialogue system allows the
operator to correct the dialogue system's interface to the simulator, task completion in
this context was defined as the ability of the system to perform the
task without human intervention.
Degrees Control
Number of Fires 12 12
Task Completion 92% 92%
Ave. Turns to Fire 3.7 4.4
Table 7.7: Performance Summary

Results of tests on the system running the grounding condition are summarized in
Table 7.7. In each of the conditions, 11 of the 12 initial fires were achieved without man-
ual intervention, for a task completion rate of 92%, with comparable turn to fire averages
in both conditions. The data is fully enumerated in Tables 7.8 and 7.9. A paired t-test
shows no statistically significant difference between the two conditions in terms of turns
to fire (p = 0.28). This demonstrates that the dialogue manager
implementing the Degrees of Grounding model can complete the Call for Fire task, and
suggests that its performance is comparable to a state-of-the-art dialogue manager.
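The comparison can be reproduced with a standard paired t-test; the SciPy sketch below uses made-up placeholder counts rather than the study's paired data, since the exact pairing of sessions across conditions is not spelled out here.

# Sketch of a paired t-test on turns-to-fire; the numbers are placeholders.
from scipy import stats

grounding_turns = [4, 6, 3, 3, 4, 4]   # placeholder per-mission counts
control_turns   = [4, 4, 8, 6, 3, 3]   # paired with the list above

t_stat, p_value = stats.ttest_rel(grounding_turns, control_turns)
print(f"t = {t_stat:.2f}, p = {p_value:.2f}")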
Turns to Fire
Subject 1 Session 1 4
Subject 1 Session 2 6
Subject 1 Session 3 3
Subject 2 Session 4 3
Subject 2 Session 5 4
Subject 2 Session 6 not completed
Subject 3 Session 1 4
Subject 3 Session 2 3
Subject 3 Session 3 3
Subject 4 Session 4 5
Subject 4 Session 5 3
Subject 4 Session 6 3
Average 3.7
Table 7.8: Performance in Grounding Condition

Turns to Fire
Subject 1 Session 4 4
Subject 1 Session 5 4
Subject 1 Session 6 8
Subject 2 Session 1 6
Subject 2 Session 2 not completed
Subject 2 Session 3 3
Subject 3 Session 4 3
Subject 3 Session 5 3
Subject 3 Session 6 3
Subject 4 Session 1 6
Subject 4 Session 2 4
Subject 4 Session 3 4
Average 4.4
Table 7.9: Performance in Control Condition

7.7 Discussion
Chapter 6 provided an answer to this dissertation’s first thesis question, by developing
a representation of degrees of groundedness and using it to describe dialogue phenom-
ena. This chapter has provided an answer to the second thesis question, about whether
degrees of grounding have a prescriptive value or can only be used descriptively. The
answer is that degrees of grounding can be used to prescribe agent actions. In fact, a sys-
tem using the Degrees of Grounding model can perform the CFF training task with a per-
formance comparable to that of a state-of-the-art system, while providing the additional
advantage of a session log that tracks the extent to which the trainee grounded material
during the session, which can be useful for assessment and After-Action Review.
A final thesis question remains to be answered, however. It is not yet clear what
additional advantages, if any, the Degrees of Grounding model provides. Chapter 8 pro-
vides an answer to that question, through a set of experiments that show how the Degrees
of Grounding model can provide performance improvements when handling grounding
tasks for a virtual human. Following that, Chapter 9 provides final conclusions, and a
set of future directions for this research.
Chapter 8
Grounding in Tactical Questioning
with Degrees of Grounding
8.1 Extension of the Model to a Second Domain
Chapters 6 and 7 have described the development of the Degrees of Grounding model
from a corpus, and the implementation of the model for dialogue management. How-
ever, development of the model so far has been in the same domain: Calls for Fire. To
better understand the potential value of the model, it is important to implement it in a
second domain with characteristics different from that of Calls for Fire. In that way, we
can examine how the various components of the model need to be updated, as well as
have some examples with which to consider the extent to which this model may be domain
independent. To that end, I selected the domain of developing virtual humans in Tactical
Questioning.
Furthermore, it is also important to run experiments to identify what improvements,
if any, are gained by using the Degrees of Grounding model. This chapter will show
that using Degrees of Grounding in a virtual human provides an improvement in human
perceptions in several measures, especially in perceived appropriateness of response.
8.2 A Virtual Human for Tactical Questioning Training
8.2.1 Domain: Tactical Questioning
In Tactical Questioning dialogues small-unit military personnel hold conversations with
individuals to produce information of military value [Arm06]. We are specifically inter-
ested in this domain as applied to civilians, where the process becomes more conver-
sational and additional goals involve building rapport with the population and gathering
general information about the area of operations.
A questioner should be aware of the subject's cultural tendencies: for example, their
tendency to avoid uncertainty, their level of focus on relationships when conducting
business, and their focus on long- versus short-term gains [Wun06].
In the specific case of Iraqi Arabic culture, appealing to the subject’s sense of honor,
and being aware of specific issues of interest to an influential person - such as a need
for secrecy, or cooperation on Civil Affairs projects - can lead to success in acquiring
information during a tactical questioning session [Pau06].
8.2.2 Scenario: Hassan
We have investigated the use of virtual humans - spoken dialogue systems embodied
in a virtual environment - for training individuals in conducting Tactical Questioning
[TRL+07]. This work is in the tradition of research in building affective dialogue sys-
tems [ADMH04] as part of embodied conversation agents [Cas01, RMG+02], with emo-
tional components for training or tutoring purposes [GM05]. Similar work [TSMG05]
has used a virtual human to decide on a negotiation strategy based on its emotional
appraisal of the topic, of its negotiation options, and of the human speaker.
Besides the research in grounding described here, some of the research issues inves-
tigated by our group are the development of a domain-appropriate model of emotions
and social interactions [RT07], the trade-offs between the possible kinds of system archi-
tectures [TLR+08], and issues related to developing authoring tools for end-users to
create such new scenarios [GDR+08].
The scenario used as a testbed for this thesis involved a virtual human named Hassan,
shown in Figure 8.1. The scenario takes place in contemporary Iraq, where the trainee
must talk to Hassan, a local government functionary. If the trainee convinces Hassan
to help him, the trainee will confirm suspicions about an illegal tax being levied on
a new marketplace; a successful trainee may discover that the tax has been placed by
Hassan’s employer, and even learn where to find that employer. But if Hassan becomes
adversarial, he may lie or become insulting.
Figure 8.1: A Human Interacting With Hassan
Table 8.1 shows an excerpt from a typical dialogue with Hassan. Rather than always
working cooperatively, Hassan has a set of goals that must be satisfied, such as assur-
ances of protection, or in some cases, an offer of money. In turn 1 the Trainee asks
a question about the identity of the person collecting the tax. In turn 2, Hassan indi-
cates that he might be interested in answering the question if his needs were fulfilled.
In turn 3 the Trainee indicates that they have previously assured Hassan that he could
have protection and money. In the actual dialogue, those previous assurances were not
correctly interpreted by Hassan, so this is actually the first time Hassan understands the
offer of money. In turn 4 Hassan indicates that he has understood the offer of money,
and answers the question about taxation.
1 Trainee Well then why don’t you just tell me
who is collecting the tax?
2 Hassan I might tell you what you want if there
was something in it for me
3 Trainee Now I’ve already told you that we can
protect you or offer you money
4 Hassan So, you offer me money. Indeed, you
might say that I collect the taxes.
Table 8.1: Example Dialogue With Hassan
As a training application, Hassan logs utterances, language features, and emotional
states at every turn, with the aim of producing a summary for after-action review, at
which time a human trainer and trainee may discuss the session. The notion is that
although Hassan is an automated system and handles dialogues without human interven-
tion, a trainer should be allowed to supervise the session and have the ability to intervene
mid-session or review the session after the fact. For this reason, Hassan may react real-
istically to a trainee’s bribes or threats of force, even though such actions are against
policy for Tactical Questioning of noncombatants [Arm06]: these behaviors would be
reviewed by a human trainer during or after the training session.
8.2.3 Hassan System Architecture
The natural language components of the version of Hassan that was used for this thesis
are shown in Figure 8.2.
Voice input is translated into text by an Automated Speech Recognition Component,
which uses the Sonic recognizer [Pel01] and custom language models [SGN05]. An
Anaphora Resolution component tries to resolve pronouns and other anaphors based on
dialogue context. An NLU component performs a statistical dialogue act classification
using a tool named NPCEditor, which is based on the research of [LT08, LPTK06]. The
output of the NLU component, which is a semantic frame, is used by two components:
the Grounding module and the Compliance module.
The Grounding module is based on the Degrees of Grounding model and is further
discussed below. The Compliance module updates Hassan’s emotional state, and is used
in conjunction with the Response Generation to determine the kind of reply (apart from
explicit grounding behavior) that Hassan will make. This response is effected by an
NLG component to produce a text reply, which may be made less or more polite by a
Style Filter based on [DTA08], which is influenced by the model of emotions. This final
reply is then sent to the virtual environment to be displayed by a virtual human, which
is an adaptation of components of the ICT Virtual Human project [KHG+07].
Figure 8.2: Hassan System Architecture
The Grounding module was developed, integrated, and tested as part of this disser-
tation research; the rest of the system was developed by other researchers as enumerated
in [TRL+07].
8.2.4 Grounding and Dialogue Management
As described above, the Grounding and Dialogue Management components work with
the output of an NLU component. An example NLU interpretation is shown in Figure
8.3. It is a speech act expressed in XML; the example shown is a question asking about
the reason that the market is not being used. The whq speech act tag indicates a Wh-
question, the market value of the object tag indicates the object of the question, and
the reason-not-used attribute provides more information about the question.
Figure 8.3: Example Interpretation: Wh- Question
Figure 8.4: Example Interpretation: Assertion
Figure 8.4 shows a speech act that Hassan might make as a reply to the human
questioner, asserting the fact that the tax collector was Hassan himself. Other examples
are shown in Figure 8.5, which represents a greeting speech act, and Figure 8.6, which
represents an unknown speech act, that is, one that could not be identified with sufficient
confidence.
Figure 8.5: Example Interpretation: Greeting
Figure 8.6: Example Interpretation: Unknown
The grounding component works with the NLU interpretation before the Dialogue
Manager component does. The algorithm used by the grounding component is described
in Figure 8.7. The algorithm begins by identifying the CGU relevant to a speech act’s
object: if the object has not been mentioned before, this involves creating a new CGU,
otherwise this involves identifying the CGU in the record of CGUs. The evidence of
understanding provided by the utterance and the CGU’s subsequent degree of ground-
edness is computed, the evidence to provide in reply is determined, and the CGU’s new
degree of groundedness is determined.
given a speech act object,
identify the relevant CGU
identify evidence of understanding
compute the CGU’s degree of groundedness
determine evidence to be given
compute the CGU’s degree of groundedness
Figure 8.7: Hassan Grounding Algorithm
Hassan’s grounding algorithm differs from the approach shown in Figure 7.1 by
only analyzing one speech act object per human utterance. This is because, as discussed
above, Hassan’s NLU component only identifies one speech act object per human utter-
ance, rather than the multiple dialogue move parameters identified by the IOTA Inter-
preter module. Also, rather than examine the entire common ground record to identify
material to ground, Hassan only examines the CGU related to the speech act object
associated with the human user’s utterance.
An addition to the model of evidence in the Tactical Questioning domain is the han-
dling of unknown speech acts. In the CFF training environment, utterances that cannot
be reliably interpreted are not responded to, under the notion that those would be han-
dled by a human supervisor. However, Hassan handles them differently. The grounding
module tracks the latest object being discussed throughout the dialogue; if there has
been an object brought up then Hassan would ask if the trainee was still talking about
that object. An ’unknown’ speech act provides negative evidence of understanding,
which decreases the degree of groundedness, and which needs to be remedied by pro-
viding a request repair of some sort. If no object had been brought up yet, then Hassan
would make a request repair with no object - this is cooperative behavior built under the
observation that this mostly happens during the beginning of the dialogue, when Hassan
has less reason to be uncooperative.
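This behavior can be summarized as a small decision rule; the following Python sketch is illustrative, with hypothetical function and field names rather than the actual interface of Hassan's grounding module.

# Illustrative sketch of the handling of an 'unknown' speech act: negative
# evidence lowers the degree of groundedness of the last discussed object
# (here simply marked Misunderstood), and the reply is a request for repair.
def handle_unknown_speech_act(last_object, cgu_register):
    if last_object is not None and last_object in cgu_register:
        cgu_register[last_object]["degree"] = "Misunderstood"   # negative evidence
        return f"Are you still asking about the {last_object}?"
    # Nothing has been brought up yet: a generic, cooperative repair request.
    return "I am sorry, could you say that again?"

register = {"imam": {"degree": "Agreed-Content"}}
print(handle_unknown_speech_act("imam", register))
print(handle_unknown_speech_act(None, {}))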
After the grounding module has decided whether to make a grounding-related utter-
ance or not, it notifies the Response Generation module, which then makes a decision
whether to make an additional reply - answering a question, or returning a greeting, for
example.
8.3 Example Dialogue Excerpt
Table 8.2 provides an example exchange. In line 1 the trainee asks the virtual human
if he knows the sheikh. The dialogue management algorithm associates the CGU with
the imam (synonymous with ’sheikh’ in this scenario), determines that the evidence is
Submit as it is being introduced for the first time, and determines that therefore the
degree is Accessible.
Because the topic is a sensitive one, the grounding criterion is high and is not met at
the Accessible degree of groundedness. The virtual human determines that this CGU
requires a Repeat Back evidence, which is provided in line 2, thereby bringing the
topic to its grounding criterion. Then, in line 3 the virtual human elicits a promise
of anonymity, which results in Submitting another CGU.
Line Speaker Utterance CGU
1 Trainee Do you know the Sheikh? Topic: Imam
Evidence: Submit
Degree: Accessible
2 Hassan You take an interest in the Imam. Topic: Imam
Evidence: Repeat Back
Degree: Agreed-Content
3 I would prefer to remain anonymous Topic: Anonymity
in these matters. Evidence: Submit
Degree: Accessible
Table 8.2: Evidence and Degree in a Dialogue With Hassan
8.4 Evaluation
8.4.1 Comparison to No Grounding Condition
The goal of evaluating Hassan’s grounding module was to determine whether it
improved the virtual human, and if so how. A particular focus was whether the sys-
tem made more appropriate responses, and whether the trainee had a better notion of
how grounded the material being discussed in the conversation was.
The first experiment involved examining whether the grounding module improved
the virtual human, compared to a virtual human with no grounding module. To test this,
10 subjects held dialogues with Hassan, interacting with him in two conditions: one in
which Hassan had the grounding module, and a control condition in which he had no
grounding module. The sequence of conditions alternated, so every other subject had
the control condition dialogue first, to account for ordering effects. After each dialogue,
subjects filled out a questionnaire that used 7-point Likert scales to measure the human’s
perceptions of Hassan’s grounding behavior and appropriateness of dialogue responses.
A brief description of the questions shown in Table 8.3 follows. The questions are
discussed here in a slightly different order than they were presented to the participants,
for easier comparison to the follow-up questions in the second experiment. The ques-
tions were stated in a colloquial way, to overcome the difficulty of handling subjects'
naive understanding of words such as “belief” and “grounding.”
Question | Text
1 | Did you have a sense of what Hassan wanted?
2 | How well did Hassan seem to understand what you were talking about?
3 | How much effort did Hassan seem to put into trying to understand you?
4 | When Hassan had problems understanding you, how human-like were his attempts to resolve the problems?
5 | How appropriately did Hassan respond to what you were saying?
6 | Taken as a whole, how human-like was Hassan as a conversation partner?
Table 8.3: Experiment 1 - Questions Used

The first set of questions was about beliefs. Question 1 was about beliefs about
beliefs. This addresses the human’s perceptions of the virtual human’s beliefs, which
is an important part of grounding. It was expressed as, ”Did you have a sense of what
Hassan wanted?” Question 2 was about beliefs about beliefs about beliefs. This ques-
tion addresses the human subject’s perceptions of the virtual human’s beliefs about the
human’s beliefs, which is an important component of grounding, and was stated as:
”How well did Hassan seem to understand what you were talking about?”
The next two questions attempted to measure whether the human could detect a
change in grounding behavior, and whether this grounding behavior seemed human-
like. In particular, Question 3 was about perceived grounding behavior. This addresses
whether the human subject can perceive the virtual human’s attempts at grounding, and
was stated as “How much effort did Hassan seem to put into trying to understand you?”
Question 4 was about human-like error-correction grounding behavior. This addresses
whether the subset of grounding behavior related to attempts to resolve problems
was human-like. The questionnaire states this as, “When Hassan had problems under-
standing you, how human-like were his attempts to resolve the problems?”
The final two questions were about overall improvements in the virtual human as
a dialogue system. Question 5 was about quality of responses. The hypothesis is that
the grounding module will produce more appropriate responses when implicit confirma-
tions occur as part of a repeat-back reply, as well as when a non-understanding occurs.
The questionnaire states this as "How appropriately did Hassan respond to what you
were saying?" Question 6 was about human-like behavior, which is a desirable qual-
ity in a virtual human. The questionnaire states this as, "Taken as a whole, was your
conversation with Hassan like a conversation with a human?"
Subject Condition Q1 Q2 Q3 Q4 Q5 Q6
1 Grounding 6 4 5 6 5 3
2 Grounding 5 3 4 3 4 2
3 Grounding 3 5 5 3 5 3
4 Grounding 2 4 4 4 3 3
5 Grounding 6 5 5 5 4 5
6 Grounding 1 2 5 4 3 5
7 Grounding 3 4 3 3 4 3
8 Grounding 5 3 5 3 4 3
9 Grounding 5 4 4 3 5 3
10 Grounding 1 2 3 3 3 2
Mean 3.7 3.6 4.3 3.7 4 3.2
1 Control 3 3 5 6 2 3
2 Control 2 3 3 2 2 2
3 Control 4 4 5 3 5 4
4 Control 4 4 4 4 4 4
5 Control 4 3 3 2 3 4
6 Control 1 1 3 1 1 1
7 Control 1 2 3 4 3 2
8 Control 1 1 2 3 2 1
9 Control 1 3 4 2 4 2
10 Control 4 4 3 4 3 4
Mean 2.5 2.8 3.5 3.1 2.9 2.7
p 0.084 0.035 0.026 0.11 0.0087 0.19
Table 8.4: Experiment 1 - Data
Table 8.4 shows the full set of Likert scale data from the surveys. Note that in all
questions, the mean was higher in the grounding condition than in the control condition.
Also, in a paired t-test, question 5 shows a p<0.01, and questions 2 and 3 show a p<0.05.
Because the number of subjects in this test could not guarantee a normal distribution, a
Wilcoxon Signed-Rank test was applied to the data, and only question 5 was shown to
be significant with p<0.025 (W=31, n_s/r=8).
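For readers who wish to reproduce this analysis, the following is a minimal sketch of the two tests applied to the Question 5 scores of Table 8.4, assuming Python with the scipy library (which is not part of the systems described in this dissertation). Note that scipy reports two-sided p-values by default, its alternative parameter selects one-sided variants, and its test-statistic conventions may differ from those used above, so the printed numbers need not match the values reported here exactly.

# A sketch of the paired tests for Question 5 of Experiment 1, using the
# per-subject scores copied from Table 8.4.
from scipy import stats

q5_grounding = [5, 4, 5, 3, 4, 3, 4, 4, 5, 3]
q5_control = [2, 2, 5, 4, 3, 1, 3, 2, 4, 3]

# Paired t-test over the per-subject score pairs.
t_stat, t_p = stats.ttest_rel(q5_grounding, q5_control)

# Wilcoxon signed-rank test; zero differences are dropped by default, which
# is why the effective sample size can be smaller than the number of subjects.
w_stat, w_p = stats.wilcoxon(q5_grounding, q5_control)

print(f"paired t-test: t={t_stat:.3f}, p={t_p:.4f}")
print(f"Wilcoxon signed-rank: W={w_stat:.1f}, p={w_p:.4f}")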
Table 8.5 shows the differences between the two conditions per participant: that is,
each cell describes the score that the participant gave for that question in the grounding
condition, minus the score that the participant gave for that question in the control con-
dition. So if the participant had a preference for the grounding condition, the number
would be positive; if the participant preferred the control condition, the number would
be negative; and if the participant rated the conditions equally, the number would be 0. Note
that for question 3 no participants expressed a preference for the control condition, and
for questions 2 and 5 only one participant expressed a preference for the control con-
dition. Question 1 had the highest mean: four participants expressed a preference for the
grounding condition that resulted in a difference of 3 or 4, but this was partly offset by three
participants who expressed a preference for the control condition.
Subject Q1 Q2 Q3 Q4 Q5 Q6
1 3 1 0 0 3 0
2 3 0 1 1 2 0
3 -1 1 0 0 0 -1
4 -2 0 0 0 -1 -1
5 2 2 2 3 1 1
6 0 1 2 3 2 4
7 2 2 0 -1 1 1
8 4 2 3 0 2 2
9 4 1 0 1 1 1
10 -3 -2 0 -1 0 -2
Mean 1.2 0.8 0.8 0.6 1.1 0.5
Table 8.5: Experiment 1 - Differences
In summary, subjects seemed to prefer the grounding condition over the control con-
dition in all measures. This was strongest in the measure of question 5, the appropriate-
ness of response, and also present in question 2, how well Hassan seemed to understand
what the human was talking about, and question 3, how much effort Hassan seemed to
put into trying to understand the human.
8.4.2 Comparison to Random Grounding Condition
Given that the grounding module seemed to improve the virtual human rather than harm
it, a second experiment was designed to further study whether the Degrees of Grounding
model itself was the cause of the improvement in survey ratings, or whether the survey
ratings could be improved by any grounding at all.
To measure this, 10 subjects (different from the first set of 10) held dialogues with
Hassan, interacting with him in two conditions: one in which Hassan had the ground-
ing module, and a control condition which performed grounding behavior in a random
fashion. Again the sequence of conditions alternated, and the subjects filled out ques-
tionnaires, although the questions were slightly modified, as shown in Table 8.6.
Question Text
1 How strong was your sense of what Hassan was thinking?
2 How strong was your sense of what Hassan thought you were saying?
3 In terms of the things he was saying, how good of an effort did Hassan make at trying to understand you?
4 In terms of the things he was saying, how good of an effort did Hassan make at trying to be understood?
5 In general, how appropriately did Hassan respond to what you were saying?
6 Taken as a whole, how human-like was Hassan as a conversation partner?
Table 8.6: Experiment 2 - Questions Used
Table 8.7 shows the full set of Likert scale data from the surveys in the second exper-
iment. The means for all questions were higher in the grounding condition than in the
control condition, with the exception of Question 3, in which participants marginally
preferred the control condition (although only by a difference of one point on one
questionnaire). The primary variable of interest, question 5, showed a statistically significant
difference, with p<0.05. Of the two variables that had also previously shown low p values, question 2
showed p<0.05 but question 3 did not. Because the number of subjects in this test
Subject Condition Q1 Q2 Q3 Q4 Q5 Q6
1 Grounding 7 5 7 5 7 5
2 Grounding 4 4 4 5 3 3
3 Grounding 6 5 3 4 5 6
4 Grounding 6 6 5 5 6 6
5 Grounding 5 3 2 3 4 4
6 Grounding 5 4 3 5 3 3
7 Grounding 2 6 2 4 3 3
8 Grounding 5 6 4 2 4 5
9 Grounding 4 4 3 6 4 3
10 Grounding 1 1 2 2 1 1
Mean 4.5 4.4 3.5 4.1 4 3.9
1 Control 5 4 4 3 4 4
2 Control 2 3 4 5 1 2
3 Control 6 3 3 4 3 6
4 Control 3 5 4 4 5 5
5 Control 5 4 3 3 4 4
6 Control 4 2 3 5 3 3
7 Control 4 4 5 5 3 4
8 Control 3 2 2 1 4 3
9 Control 4 4 6 5 3 3
10 Control 1 1 2 3 1 1
Mean 3.7 3.2 3.6 3.8 3.1 3.5
p 0.06 0.011 0.44 0.17 0.015 0.084
Table 8.7: Experiment 2 - Data
could not guarantee a normal distribution, a Wilcoxon Signed-Rank test was applied to
the data: question 2 was significant with p<0.025 (W=31, n_s/r=8) and question 5 was
significant with p=0.05 (W=15, n_s/r=5).
Table 8.8 shows the differences between the two conditions per participant. For
question 3, subjects had an almost negligible preference for the control condition. For
question 5 no participant expressed a preference for the control condition, and for
questions 1, 2, and 6 only one participant expressed a preference for the control
condition.
In summary, subjects preferred the grounding condition over the control condition
in almost all measures, especially in questions 2 and 5.
Subject Q1 Q2 Q3 Q4 Q5 Q6
1 2 1 3 2 3 1
2 2 1 0 0 2 1
3 0 2 0 0 2 0
4 3 1 1 1 1 1
5 0 -1 -1 0 0 0
6 1 2 0 0 0 0
7 -2 2 -3 -1 0 -1
8 2 4 2 1 0 2
9 0 0 -3 1 1 0
10 0 0 0 -1 0 0
Mean 0.8 1.2 -0.1 0.3 0.9 0.4
Table 8.8: Experiment 2 - Differences
8.5 Discussion
The advantage of a grounding module is shown in Table 8.9, which was taken from
the experiments described in this chapter. In this example, the trainee asked a question
about who Hassan was working for. However, the NLU component misinterpreted the
utterance as an offer to protect Hassan's family.
1 Trainee Who are you working for?
(Misinterpreted as an offer to protect
Hassan’s family)
2a Hassan So, you offer to protect my family.
2b Hassan I do trust you, and hope that the U S can
fulfill these expectations.
3 Trainee Yes we will protect your family
Table 8.9: Example Dialogue: Misinterpretation and Grounding
Without the grounding module, Hassan would reply only with 2b: an acceptance of
the offer. However, since the Trainee is unaware that Hassan misinterpreted the question,
that utterance would make no sense to the Trainee.
The grounding module mitigates this problem with a repeat-back of Hassan's under-
standing of the Trainee's utterance, shown here in line 2a. Because of this, although
the exchange is not ideal, the Trainee at least has a chance to understand why Hassan
is making the utterance in 2b, and is even able to proceed with the dialogue as in line 3.
This is the type of grounding behavior that helps produce the improvement in partici-
pant responses over control conditions.
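To make the decision behind line 2a concrete, the following is an illustrative sketch of the underlying rule, not the actual module's code or data structures: a repeat-back is prepended to the task reply whenever the interpreted material has not yet reached its grounding criterion. The degree names and their ordering here are placeholders standing in for the inventory defined in Chapter 3.

# Illustrative only: degree names and their ordering are placeholders.
DEGREE_ORDER = ["unknown", "accessible", "agreed-content"]

def reaches_criterion(degree, criterion):
    # True if the material's current degree meets or exceeds its criterion.
    return DEGREE_ORDER.index(degree) >= DEGREE_ORDER.index(criterion)

def plan_reply(interpretation, degree, criterion, content_reply):
    # Return the system's turns: an optional repeat-back, then the task reply.
    turns = []
    if not reaches_criterion(degree, criterion):
        # Repeating back the interpretation provides evidence of understanding,
        # giving the user a chance to detect a misinterpretation.
        turns.append("So, you %s." % interpretation)
    turns.append(content_reply)
    return turns

# The misinterpreted offer from Table 8.9.
print(plan_reply("offer to protect my family", "accessible", "agreed-content",
      "I do trust you, and hope that the U S can fulfill these expectations."))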
To summarize, this chapter has adapted the model of Degrees of Grounding
described in Chapters 6 and 7 to a new domain. A set of experiments has shown that
the Degrees of Grounding model provides advantages over baseline models in several
measures, but especially in terms of human perceptions of appropriateness of response,
and to a lesser extent in terms of providing an understanding of what Hassan thought
the human was saying.
This has also shown that the Degrees of Grounding model is relevant to more than
one domain. The extent to which it is relevant to other domains, and future directions,
will be discussed in Chapter 9.
Chapter 9
Conclusions and Future Directions
9.1 Conclusions of Completed Work
Research in error handling in spoken dialogue systems, as described in Chapter 2, has
produced several unanswered questions related to the notion of degree of groundedness,
the extent to which material has achieved mutual understanding during a dialogue. This
dissertation has presented the novel Degrees of Grounding model, which measures the
extent to which material is grounded, as described in Chapter 3. This model was devel-
oped from analyzing a corpus of dialogues between humans, and applied to modeling
the extent to which material was grounded in those dialogues, as described in Chapters
4 through 6. The model was then used for dialogue management, first in the domain
in which the model had been developed, then in an unrelated domain, as described in
Chapter 7 and Chapter 8.
The Degrees of Grounding model answers several open questions regarding ground-
ing. A key question was how to represent degrees of groundedness, and how they relate
to the notion of evidence of understanding. As detailed in Chapter 3, the Degrees of
Grounding model provides a way of representing degrees of groundedness, and describes
the interaction between degrees of groundedness and evidence of understanding. In
particular, the extent to which material is grounded depends on the existing degree of
groundedness of that material when a new type of evidence of understanding is observed;
this is in contrast to the model of Clark and Schaefer, in which the extent to which
material is grounded depends only on the strength of the most recent type of evidence
of understanding provided. The Degrees of Grounding model also enables using degrees
of groundedness to express grounding criteria, and provides a way of making decisions
about possible replies: by comparing the current degree of groundedness of a material
with that material's grounding criteria, as shown in Chapter 6.
Another open question was whether degrees of groundedness were only useful
descriptively, or whether they could be used prescriptively to drive dialogues forward.
The Degrees of Grounding model shows how to update the degree of groundedness of
material in an online fashion by tracking the influence of evidence of understanding
on the relevant material, and then how to use that to produce a response described in
terms of evidence of understanding, motivated by the relevant material’s current degree
of groundedness and grounding criterion. This is shown in Chapter 7.
Another question was whether degrees of groundedness were of any value to the
development of spoken dialogue systems beyond answering open questions in ground-
ing theory. Chapter 8 shows how, when implemented as the grounding module of a
virtual human, the Degrees of Grounding model provides generally higher performance
measures, particularly in terms of perceived appropriateness of response.
The Degrees of Grounding model, therefore, has made several theoretical and
applied contributions. Furthermore, the work completed to date also suggests many
interesting future research directions.
9.2 Future Directions
9.2.1 Application to New Domains
The Degrees of Grounding model was tested in two domains. However, this does not
exhaust the full range of types of dialogue systems in which it could be implemented,
or the extent of the benefit of explicitly modeling degrees of groundedness. Although it
seems likely that the basic set of components of the model (Evidence of Understand-
ing, Degrees of Groundedness, Algorithms for Dialogue Management) are domain-
independent, it is likely that the elements of those components (the specific types of
evidence, some of the degree definitions, the logic of the algorithms, the way in which
evidence to reply with is determined) will need to be adapted. As described in Chapter
3, analyzing a new domain will involve identifying new cues that participants provide
one another which may constitute new types of evidence of understanding, determining
if new patterns of evidence lead to new degrees of groundedness, and deciding whether
changes to the algorithms for dialogue management are needed to identify types of evi-
dence, update degrees, and make responses. And in almost all cases, moving to a new
domain will require a new process of determining grounding criteria for the material
being discussed in that domain.
This is especially obvious when adding new modalities to the model. A dialogue
system that provides the ability to detect visual feedback, for example, allows another
possible kind of evidence of understanding. As described above, for each new kind
of evidence of understanding identified, one needs to consider whether a sequence of
evidence leads to a previously-undefined degree of groundedness, and if so, how the
dialogue management algorithms need to be updated. And indeed, work by Thompson and Gergle
[TG08] suggests that a user utterance followed by a visual display leads to a different
degree of groundedness than a user utterance followed by a verbal repetition.
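As a purely hypothetical illustration of what such an extension might look like (the degree and evidence names below are invented for this example rather than taken from the model's actual inventory), a table of degree transitions keyed by the current degree and the observed evidence type could simply be extended with entries for the new modality:

# Hypothetical transition table: (current degree, evidence type) -> new degree.
TRANSITIONS = {
    ("accessible", "repeat-back"): "agreed-content",
    ("accessible", "acknowledge"): "accessible",
}

# Adding a visual modality means adding evidence types and, if the data call
# for it, new resulting degrees (cf. the finding cited above).
TRANSITIONS[("accessible", "visual-display")] = "visually-grounded"

def update_degree(degree, evidence):
    # Return the new degree after a piece of evidence is observed; the degree
    # is left unchanged if the pair is not in the table.
    return TRANSITIONS.get((degree, evidence), degree)

print(update_degree("accessible", "visual-display"))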
9.2.2 Machine Learning of Domain-Specific Features
Some of the issues involved in adapting the Degrees of Grounding model to a new
domain may be resolved by applying machine learning techniques to the new domain,
especially in terms of identifying new grounding criteria and building the algorithm to
determine a reply.
For example, given a corpus of dialogues between two humans in a domain to be
modeled, if algorithms to identify types of evidence of understanding and degrees of
groundedness in that domain are already implemented, that corpus might be processed
to identify at what point the material was considered sufficiently grounded by the dia-
logue participants, thereby automatically identifying the grounding criteria. Similarly,
learning algorithms might learn which responses are more likely to be produced in reply
to a given degree of groundedness. If dialogue features such as noise levels are available,
those might be used to identify features that modify the grounding criteria. If dialogue
success measures are available, reinforcement learning might be used to identify the
grounding criteria and response algorithms that are more likely to lead to successful
dialogues.
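As a sketch of what such learning could look like under strong assumptions about the annotation format (the corpus below is a fabricated toy example, and the material types, degrees, and evidence names are placeholders), simple frequency counts over an annotated corpus would give first estimates of the grounding criteria and of a response policy:

from collections import Counter, defaultdict

def estimate_criteria(corpus):
    # Estimated criterion per material type: the most frequent degree the
    # material had reached when the participants moved on.
    counts = defaultdict(Counter)
    for material_type, degree, _reply_evidence in corpus:
        counts[material_type][degree] += 1
    return {m: c.most_common(1)[0][0] for m, c in counts.items()}

def estimate_response_policy(corpus):
    # Simple response policy: the most frequent reply evidence type observed
    # for each degree of groundedness.
    counts = defaultdict(Counter)
    for _material_type, degree, reply_evidence in corpus:
        counts[degree][reply_evidence] += 1
    return {d: c.most_common(1)[0][0] for d, c in counts.items()}

# Toy annotated triples: (material type, degree when dropped, reply evidence).
corpus = [
    ("anonymity", "accessible", "repeat-back"),
    ("anonymity", "agreed-content", "acknowledge"),
    ("safety", "agreed-content", "acknowledge"),
]
print(estimate_criteria(corpus))
print(estimate_response_policy(corpus))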
9.2.3 Grounding, Emotion, and Personality
Humans do not always ground in a way that will increase task performance, especially
when factors of emotion and personality are involved. For example, a human may not
care if important material reaches a high level of mutual understanding if they have
negative emotions towards their addressee. If they have certain kinds of personalities,
they may ground material more or less than seems necessary for task success. This can
be expressed through grounding criteria that are higher or lower than they would be
for other individuals, but emotion and personality might also be represented as having
other effects. For example, a character might be so angry that they fail to appropriately
identify cues from evidence of understanding. A character might be so anxious that they
fail to identify the correct response that would most likely ground the material they are
discussing. A question, then, is what effect various emotions and personality types will
have on grounding behavior.
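One purely hypothetical way to express the first of these effects, with invented numeric scales and thresholds rather than values derived from data, is to let emotion and personality parameters shift the grounding criterion up or down:

# Hypothetical sketch: emotion and personality parameters modulate a
# grounding criterion; scales, thresholds, and degree names are invented.
DEGREE_ORDER = ["unknown", "accessible", "agreed-content"]

def adjusted_criterion(base_criterion, anger=0.0, conscientiousness=0.5):
    # Strong negative emotion lowers the criterion (the character cares less
    # about reaching mutual understanding); high conscientiousness raises it.
    index = DEGREE_ORDER.index(base_criterion)
    if anger > 0.7:
        index = max(0, index - 1)
    if conscientiousness > 0.8:
        index = min(len(DEGREE_ORDER) - 1, index + 1)
    return DEGREE_ORDER[index]

print(adjusted_criterion("agreed-content", anger=0.9))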
This question is likely to be important when developing virtual humans that aspire to
human-like behavior. It is possible (although not certain) that humans interacting with
these virtual humans will be able to detect or attribute emotion or personality cues to
the virtual humans based on grounding behavior. Part of this question is whether the
Degrees of Grounding model provides advantages for describing and controlling these
phenomena.
9.2.4 Probabilistic Approaches
The model presented in this dissertation takes a symbolic, rule-based approach to
modeling Degrees of Groundedness. However, it is possible that there would be some
benefit to using probabilistic approaches to represent various components.
In particular, the current model does not handle uncertainty in its dialogue components
or in its algorithms: it does not consider confidence scores from the Automated Speech
Recognition or Interpreter components, nor do its algorithms provide confidence scores
when determining evidence of understanding and degrees of groundedness.
Such an approach might involve defining each of the components of the model prob-
abilistically and using decision theory, rather than symbolically and using rule-based
approaches. Given an utterance, probabilities would be assigned to the likelihood that it
contains a given type of evidence of understanding. Likewise, a number would describe,
for each type of material, the probability that it is grounded to a given degree of ground-
edness. Dialogue management algorithms could be adapted to handle these probabilities
with known techniques, such as using a POMDP for dialogue management [WY06] or
managing ambiguities [DS07].
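A minimal sketch of such a probabilistic treatment, with made-up numbers and placeholder degree and evidence names rather than values estimated from data, would maintain a probability distribution over degrees of groundedness and update it whenever an utterance is assigned a probability of containing each evidence type:

# Hypothetical numbers throughout; this is a sketch of the update, not a
# proposal for the actual probabilities.
def normalize(dist):
    total = sum(dist.values())
    return {k: v / total for k, v in dist.items()}

# P(new degree | old degree, evidence type): an illustrative transition model.
TRANSITION = {
    ("accessible", "repeat-back"): {"accessible": 0.2, "agreed-content": 0.8},
    ("accessible", "none"): {"accessible": 0.9, "agreed-content": 0.1},
}

def update(belief, evidence_probs):
    # belief: P(degree); evidence_probs: P(evidence type in the utterance).
    new_belief = {}
    for new_degree in belief:
        p = 0.0
        for old_degree, p_old in belief.items():
            for evidence, p_ev in evidence_probs.items():
                trans = TRANSITION.get((old_degree, evidence), {old_degree: 1.0})
                p += p_old * p_ev * trans.get(new_degree, 0.0)
        new_belief[new_degree] = p
    return normalize(new_belief)

belief = {"accessible": 1.0, "agreed-content": 0.0}
print(update(belief, {"repeat-back": 0.7, "none": 0.3}))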
The approaches described in this future directions section need not be used to the
exclusion of each other, or of the current implementation. For example, probabilities
of types of evidence could be provided along with rules to manage the dialogue; learning
algorithms could be used to discover either rules or probabilities for dialogue management;
and the influence of emotions and personality could be measured regardless of which
approach is used. Future work also involves studying the relative strengths and limitations
of these varieties of approaches.
Reference List
[ADMH04] Elisabeth Andr´ e, Laila Dybkjær, Wolfgang Minker, and Paul Heisterkamp,
editors. Affective Dialogue Systems, Tutorial and Research Workshop, ADS
2004, Kloster Irsee, Germany, June 14-16, 2004, Proceedings, volume
3068 of Lecture Notes in Computer Science. Springer, 2004.
[AF03] John Aberdeen and Lisa Ferro. Dialogue patterns and misunderstandings.
In Proceedings of the ISCA Tutorial and Research Workshop on Error
Handling in Spoken Dialogue Systems, pages 17–21, August 28-31 2003.
Chateau d’Oex, Vaud, Switzerland.
[All95] Jens Allwood. An activity based approach to pragmatics, 1995. Gothen-
burg Papers in Theoretical Linguistics 76, Goteborg University, Goteborg,
Sweden.
[Arm01] Department of the Army. Tactics, techniques and procedures for observed
fire and fire support at battalion task force and below. Technical Report FM
3-09.30 (6-30), Department of the Army, 2001.
[Arm06] Department of the Army. Police intelligence operations. Technical Report
FM 3-19.50, Department of the Army, 2006. Appendix D: Tactical Ques-
tioning.
[BLR+06] Dan Bohus, Brian Langner, Antoine Raux, Alan Black, Maxine Eskenazi,
and Alexander Rudnicky. Online supervised learning of non-understanding
recovery policies. In Proceedings of SLT-2006, 2006.
[Boa01] Combined Communication Electronics Board. Communication instruc-
tions: Radiotelephone procedures. Technical Report ACP 125(F), Com-
bined Communication Electronics Board, September 2001.
[BR05a] Dan Bohus and Alexander Rudnicky. Constructing accurate beliefs in spo-
ken dialog systems. In Proceedings of ASRU-2005, 2005.
[BR05b] Dan Bohus and Alexander Rudnicky. Error handling in the Ravenclaw
dialog management architecture. In Proceedings of HLT-EMNLP-2005,
2005.
[BR05c] Dan Bohus and Alexander Rudnicky. Sorry, I didn’t catch that! - an investi-
gation of non-understanding errors and recovery strategies. In Proceedings
of SIGdial-2005, 2005. Lisbon, Portugal.
[BVPV03] Caroline Bousquet-Vernhettes, Regis Privat, and Nadine Vigouroux. Dia-
logue patterns and misunderstandings. In Proceedings of the ISCA Tutorial
and Research Workshop on Error Handling in Spoken Dialogue Systems,
pages 41–45, August 28-31 2003. Chateau d’Oex, Vaud, Switzerland.
[Cas01] Justine Cassell. Embodied conversational agents: Representation and intel-
ligence in user interface. AI Magazine, 22(3):67–83, 2001.
[CB91] Herbert H. Clark and Susan E. Brennan. Grounding in communication. In
Perspectives on Socially Shared Cognition, pages 127–149. APA Books,
1991.
[CK04] Herbert H. Clark and Meredyth A. Krych. Speaking while monitoring
addressees for understanding. Journal of Memory and Language, 50:62–
81, 2004.
[Cla96] Herbert H. Clark. Using Language. Cambridge University Press, Cam-
bridge, 1996.
[CM81] Herbert H Clark and Catherine R Marshall. Definite reference and mutual
knowledge. In A. Joshi, B. Webber, and I. Sag, editors, Elements of Dis-
course Understanding, pages 10–63. Cambridge, 1981.
[CS89] Herbert H Clark and Edward F Schaefer. Contributing to discourse. Cog-
nitive Science, 13:259–294, 1989.
[CWG86] Herbert H. Clark and Deanna Wilkes-Gibbs. Referring as a collaborative
process. Cognition, 22:1–39, 1986.
[DS07] David DeVault and Matthew Stone. Managing ambiguities across utter-
ances in dialogue. In Proceedings of the Workshop on the Semantics and
Pragmatics of Dialogue (DECALOG), 2007.
[DTA08] David DeVault, David Traum, and Ron Artstein. Making grammar-based
generation easier to deploy in dialogue systems. In Fifth INLG Conference,
2008.
[ESC04] J Edlund, G Skantze, and R Carlson. Higgins - a spoken dialogue system for
investigating error handling techniques. In Proceedings of the International
Conference on Spoken Language Processing, ICSLP 04, pages 229–231,
2004. Jeju, Korea.
[GDR+08] Sudeep Gandhe, David DeVault, Antonio Roque, Bilyana Martinovski, Ron
Artstein, Anton Leuski, Jillian Gerten, and David Traum. From domain
specification to virtual humans: An integrated approach to authoring tacti-
cal questioning characters. In Interspeech, 2008.
[GM05] Jonathan Gratch and Stacy Marsella. Some lessons for emotion psychol-
ogy for the design of lifelike characters. Journal of Applied Artificial Intel-
ligence, 19(3-4):215–233, 2005. Special issue on Educational Agents -
Beyond Virtual Tutors.
[Gou05] Scott R. Gourley. JFETS. Military Training Technology, 2005.
[HLS99] Julia Hirschberg, Diane Litman, and Marc Swerts. Prosodic cues to recog-
nition errors. In Proceedings of the Automatic Speech Recognition and
Understanding Workshop (ASRU99), pages 349–352, 1999.
[HLS04] Julia Hirschberg, Diane Litman, and Marc Swerts. Prosodic and other cues
to speech recognition failures. Speech Communication, 43:155–175, 2004.
[HP99] Eric Horvitz and Tim Paek. A computational architecture for conversa-
tion. In Proceedings of the 7th International Conference on User Modeling
(UM), pages 201–210, 1999.
[HP00] Eric Horvitz and Tim Paek. Deeplistener: Harnessing expected utility to
guide clarification dialog in spoken language systems. In Proceedings of
the 6th International Conference on Spoken Language Processing (ICSLP
2000), November 2000.
[KHG+07] Patrick Kenny, Arno Hartholt, Jonathan Gratch, William Swartout, David
Traum, Stacy Marsella, and Diane Piepol. Building interactive virtual
humans for training environments. In Proceedings of IITSEC, 2007.
[KSTW01] E Krahmer, M Swerts, M Theune, and M Weegels. Error detection in spo-
ken human machine interaction. International Journal of Speech Technol-
ogy, 4(1):19–29, 2001.
[LHS06] Diane Litman, Julia Hirschberg, and Marc Swerts. Characterizing and pre-
dicting corrections in spoken dialogue systems. Computational linguistics,
pages 417–438, 2006.
[LPTK06] Anton Leuski, Ronakkumar Patel, David Traum, and Brandon Kennedy.
Building effective question answering characters. In 7th SIGdial Workshop
on Discourse and Dialogue, 2006.
[LS04] Diane Litman and Scott Silliman. ITSPOKE: An intelligent tutoring spo-
ken dialogue system. In Proceedings of the Human Language Technology
Conference: 4th Meeting of the North American Chapter of the Association
for Computational Linguistics (HLT/NAACL), 2004.
[LT08] Anton Leuski and David Traum. A statistical approach for text processing
in virtual humans. In Proceedings of the Army Science Conference, 2008.
[McC02] Andrew Kachites McCallum. Mallet: A machine learning for language
toolkit. http://mallet.cs.umass.edu, 2002.
[McR98] Susan Weber McRoy. Achieving robust human-computer communication.
International Journal of Human-Computer Studies, 48(5):681–704, 1998.
[MOHL03] Michael McTear, Ian O’Neill, Philip Hanna, and Xingkun Liu. Handling
errors and determining confirmation strategies: an object-based approach.
In Proceedings of the ISCA Tutorial and Research Workshop on Error Han-
dling in Spoken Dialogue Systems, pages 129–132, August 28-31 2003.
Chateau d’Oex, Vaud, Switzerland.
[NT99] Christine Nakatani and David Traum. Coding discourse structure in dia-
logue (version 1.0). Technical Report UMIACS-TR-99-03, University of Maryland, 1999.
[oS84] The Joint Chiefs of Staff. Radiotelephone procedures for the conduct of
artillery and naval gunfire. Technical Report ACP 125 US Supp-2A, The
Joint Chiefs of Staff, February 1984.
[oS85] The Joint Chiefs of Staff. Communication instructions: Radiotelephone
procedures for use by united states ground forces. Technical Report ACP
125 US Supp-1, The Joint Chiefs of Staff, October 1985.
[Pae03] Tim Paek. Toward a taxonomy of communication errors. In Proceedings
of the ISCA Tutorial and Research Workshop on Error Handling in Spoken
Dialogue Systems, pages 53–58, August 28-31 2003. Chateau d’Oex, Vaud,
Switzerland.
[Pau06] Matthew C. Paul. Tactical questioning: human intelligence key to coun-
terinsurgency campaigns. Infantry Magazine, Jan-Feb 2006.
[Pel01] Bryan Pellom. SONIC: The university of colorado continuous speech rec-
ognizer. Technical Report TR-CSLR-2001-01, University of Colorado,
2001.
[PH99] Tim Paek and Eric Horvitz. Uncertainty, utility, and misunderstanding:
A decision-theoretic perspective on grounding in conversational systems.
In AAAI Fall Symposium on Psychological Models of Communication in
Collaborative Systems, 1999.
[PH00a] Tim Paek and Eric Horvitz. Conversation as action under uncertainty. In
Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence
(UAI), pages 455–464, 2000.
[PH00b] Tim Paek and Eric Horvitz. Grounding criterion: Toward a formal theory
of grounding. Technical report, Microsoft Research, April 2000. Microsoft
Technical Report, MSR-TR-2000-40.
[PH03] Tim Paek and Eric Horvitz. On the utility of decision-theoretic hidden
subdialog. In Proceedings of the ISCA Tutorial and Research Workshop on
Error Handling in Spoken Dialogue Systems, pages 95–100, August 28-31
2003. Chateau d’Oex, Vaud, Switzerland.
[RLR+06] Antonio Roque, Anton Leuski, Vivek Rangarajan, Susan Robinson, Ashish
Vaswani, Shri Narayanan, and David Traum. Radiobot-cff: A spoken dia-
logue system for military training. In 9th International Conference on Spo-
ken Language Processing (Interspeech 2006 - ICSLP), September 2006.
[RMG+02] Jeff Rickel, Stacy Marsella, Jonathan Gratch, Randall Hill, David Traum,
and Bill Swartout. Towards a new generation of virtual humans for inter-
active experiences. IEEE Intelligent Systems, pages 32–38, July/August
2002.
[RRVT06] Susan Robinson, Antonio Roque, Ashish Vaswani, and David Traum. Eval-
uation of a spoken dialogue system for military call for fire training. In
Proc. Army Science Conference, 2006.
[RS04] Kepa Joseba Rodriguez and David Schlangen. Form, intonation and func-
tion of clarification requests in german task oriented spoken dialogues. In
CATALOG ’04: The 8th Workshop on the Semantics and Pragmatics of
Dialogue, July 19-21 2004. Barcelona.
[RT06] Antonio Roque and David Traum. An information state-based dialogue
manager for call for fire dialogues. In 7th SIGdial Workshop on Discourse
and Dialogue, 2006.
[RT07] Antonio Roque and David Traum. A model of compliance and emotion for
potentially adversarial dialogue agents. In The 8th SIGdial Workshop on
Discourse and Dialogue, 2007.
[Sch04] David Schlangen. Causes and strategies for requesting clarification in dia-
logue. In Proceedings of the 5th SIGdial Workshop on Discourse and Dia-
logue, 2004.
[SGN05] Abhinav Sethy, Panayiotis Georgiou, and Shrikanth Narayanan. Building
topic specific language models from webdata using competitive models. In
Proceedings of Eurospeech, Lisbon, Portugal, 2005.
[Ska05a] Gabriel Skantze. Exploring human error recovery strategies: Implica-
tions for spoken dialogue systems. Speech Communication, 45(3):325–341,
2005.
[Ska05b] Gabriel Skantze. Galatea: a discourse modeller supporting concept-level
error handling in spoken dialogue systems. In Proceedings of SigDial,
pages 178–189, 2005. Lisbon, Portugal.
[Ska07] Gabriel Skantze. Making grounding decisions: Data-driven estimation of
dialogue costs and confidence thresholds. In Proceedings of the 8th SIGdial
Workshop on Discourse and Dialogue, 2007.
[SLH00] Marc Swerts, Diane Litman, and Julia Hirschberg. Corrections in spoken
dialogue systems. In Proceedings of the 6th International Conference of
Spoken Language Processing (ICSLP-2000), October 2000.
[SP03] Fei Sha and Fernando Pereira. Shallow parsing with conditional random
fields. Technical Report CIS TR MS-CIS-02-35, Univer-
sity of Pennsylvania, 2003.
[TD98] David R. Traum and Pierre Dillenbourg. Towards a normative model of
grounding in collaboration. In Working notes, ESSLLI-98 Workshop on
Mutual Knowledge, Common Ground and Public Information, pages 52–
55, August 1998.
[TG08] Will Thompson and Darren Gergle. Modeling situated conversational
agents as partially observable markov decision processes. In Proceedings
of Intelligent User Interfaces (IUI), 2008.
[TL03] David Traum and Staffan Larsson. The information state approach to dia-
logue management. In R. Smith and J. van Kuppevelt, editors, Current
and New Directions in Discourse and Dialogue, pages 325–353. Kluwer,
Dordrecht, 2003.
[TLR+08] David Traum, Anton Leuski, Antonio Roque, Sudeep Gandhe, David
DeVault, Jillian Gerten, Susan Robinson, and Bilyana Martinovski. Nat-
ural language dialogue architectures for tactical questioning characters. In
Army Science Conference, 2008.
[Tra94] David R. Traum. A Computational Theory of Grounding in Natural Lan-
guage Conversation. PhD thesis, University of Rochester, 1994.
[Tra98] David R. Traum. On Clark and Schaefer’s contribution model and its appli-
cability to human-computer collaboration. In Proceedings of COOP’98
Workshop on Use of Clark’s Models of Language for the design of Cooper-
ative Systems, May 1998.
[Tra99] David R. Traum. Computational models of grounding in collaborative sys-
tems. In Working Notes of AAAI Fall Symposium on Psychological Models
of Communication, pages 124–131, November 1999.
[TRL+07] David Traum, Antonio Roque, Anton Leuski, Panayiotis Georgiou, Jillian
Gerten, Bilyana Martinovski, Shrikanth Narayanan, Susan Robinson, and
Ashish Vaswani. Hassan: A virtual human for tactical questioning. In The
8th SIGdial Workshop on Discourse and Dialogue, 2007.
[TSMG05] David Traum, William Swartout, Stacy Marsella, and Jonathan Gratch.
Fight, flight, or negotiate: Believable strategies for conversing under cri-
sis. In 5th International Conference on Interactive Virtual Agents, 2005.
Kos, Greece.
[Wun06] William Wunderle. Through the lens of cultural awareness: A primer for
US armed forces deploying to arab and middle eastern countries. Technical
report, Combat Studies Institute Press, 2006.
[WY06] Jason D. Williams and Steve Young. Partially observable markov decision
processes for spoken dialog systems. Computer Speech and Language,
21(2):393–422, 2006.