Mechanisms Underlying Acquisition of Non-Adjacent Dependencies
Jia Li
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
August, 2015
Advisory Committee:
Professor Toben Mintz (Chair)
Professor Khalil Iskarous
Professor Bosco Tjan
Professor Justin Wood
ACKNOWLEDGMENT
First and foremost, I would like to thank my advisor, Dr. Toben Mintz, who
devoted so much of the past six years to guiding and teaching me. He created a free,
open atmosphere for me to share and develop any ideas, challenged me to think more
deeply, pointed out directions for me to explore, and provided me with advice and
support whenever I needed it. I will always admire his kindness and wisdom. Without
his patience and encouragement, I would not be where I am.
I am hugely indebted to Dr. James Magnuson at the University of Connecticut.
He offered me the chance to come study in the United States and experience
educational opportunities that have transformed my life. He exposed me to many
valuable, fascinating ideas, including those contained in this dissertation. I am also
fortunate to have had the opportunity to take Dr. Carol Fowler's course. Each class felt
like watching an epic movie, telling the story of research endeavors in the field of speech
perception.
I am also grateful for an inspiring committee that engaged in serious, thought-
provoking discussion, and helped to shape the direction of my research--many thanks to
Professors Sarah Bottjer, Khalil Iskarous, Bosco Tjan, and Justin Wood.
I am thankful for our department's highly professional and helpful staff members,
especially Irene Takaragawa, Twyla Ponton, Vivian Hsu-Tran, and Gabriel Gonzalez,
who greatly supported me with their professional knowledge and their genuine care for us
students. And I am deeply grateful to my wonderful group of friends in the psychology
department, especially Zhisen Urgolites, Jinxia Ma, Hao (Bill) Wang, Susan Geffen,
Pinglei Bao, Aditya Prasad, Miao Wei, Pan Wang, Jinshu Cui, Zhiqin Chen and Eustace
Hsu.
I am most grateful for the love and support of my elder brother and my parents.
Contents
ACKNOWLEDGMENT
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTER 1 – INTRODUCTION
CHAPTER 2 – LITERATURE REVIEW
Introduction
Non-Adjacent Dependencies of Phonemes
Non-Adjacent Dependencies of Syllables
Non-Adjacent Dependencies of Non-Linguistic Stimuli
Relevant Findings from Learning Adjacent Dependencies
Findings on Learning the NADs of Non-Linguistic Stimuli
The Dual Mechanisms Account
Questions
CHAPTER 3 – NON-ADJACENT DEPENDENCIES OF SYLLABLES
Introduction
Experiment 1 - 18
Methods
Results
Discussion
CHAPTER 4 – NON-ADJACENT DEPENDENCIES OF ABSTRACT IMAGES
AND TONES
Experiment 19 – 25
Methods
Results
Conclusions
CHAPTER 5 – TWO HYPOTHESIS ON MECHANISMS UNDERLYING
ACQUISITION OF NON-ADJACENT DEPENDENCIES
Study Synopsis
Experiment 26: NADs of Body Movements (Replication of Endress and
Wood (2011))
Methods
Results and Discussion
Experiment 27: NADs of Object Movements and Transformations
Methods
Results and Discussion
Experiment 28: NADs of Body Postures
Methods
Results and Discussion
Experiment 29: NADs of Object Transformations and Postures
Methods
Results and Discussion
Discussions
Conclusion
CHAPTER 6 - FURTHER EXPLORATION
Introduction
Images
Human Body Movements
Study Synopsis
Experiment 30: NADs of Object Images
Methods
Results
Experiment 31: NADs of Body Movements Performed by Different Actors
Methods
Results
Discussion
CHAPTER 7 – GENERAL DISCUSSION AND CONCLUSIONS
General Discussion
Limitations
Conclusion
REFERENCES
LIST OF TABLES
TABLE 1 RESULTS OF EXPERIMENTS TESTING ADULT PARTICIPANTS’
ACQUISITION OF NADS OF SYLLABLES
TABLE 2 RESULTS FROM REPLICATION STUDIES WITH TESTING EFFECT
CONTROLLED. THE REST WAS THE SAME AS IN TABLE 1.
TABLE 3 SUMMARY OF PARTICIPANTS’ PREFERENCE FOR RULE-WORDS
VS. PART-WORDS UNDER DIFFERENT TRAINING CONDITIONS.
TABLE 4 RESULTS FROM PREVIOUS STUDIES TESTING ADULT
PARTICIPANTS’ ACQUISITION OF NADS OF TONES AND IMAGES.
TABLE 5 MEAN PREFERENCE FOR RULE-TRIPLETS OVER 36 TRIALS
INCLUDING STANDARD DEVIATION, MEAN % OF PREFERENCE TO RULE-
TRIPLETS, INTERCEPT (Β), STANDARD ERROR OF INTERCEPT (SE), Z, AND
P-VALUE OF EACH EXPERIMENT USING MIXED LOGIT MODEL.
TABLE 6 MEAN PREFERENCE FOR RULE-TRIPLETS OVER 36 TRIALS
INCLUDING STANDARD DEVIATION, MEAN % OF PREFERENCE FOR
RULE-TRIPLETS, AND INTERCEPT (Β), STANDARD ERROR OF
INTERCEPT (SE), Z AND P VALUE OF EACH EXPERIMENT USING MIXED
LOGIT MODEL.
LIST OF FIGURES
FIGURE 1. THE STRUCTURE OF TRAINING AND TESTING STIMULI IN
PEÑA ET AL. (2002).
FIGURE 2. RESULTS FROM PEÑA ET AL. (2002).
FIGURE 3. SAMPLES OF TRAINING AND TESTING STIMULI FROM
OTSUKA ET AL. (2013) AND TURK-BROWNE & SCHOLL (2009).
FIGURE 4. EXAMPLE OF TRAINING STIMULI FROM TURK-BROWNE
ET AL. (2005).
FIGURE 5. RESULTS OF EXPERIMENTS 1-12.
FIGURE 6. RESULTS OF EXPERIMENTS 13-18.
FIGURE 7. IMAGES USED IN THE STUDY, AND THEIR CORRESPONDING
SYLLABLES.
FIGURE 8. RESULTS OF EXPERIMENTS 19-25.
FIGURE 9. FRAMES EXCERPTED FROM THE BODY MOVEMENT
ANIMATIONS USED IN EXPERIMENT 26 (DEPICTING THE MAXIMUM
EXTENT OF MOVEMENT), WHICH WERE ALSO THE STILL IMAGES
USED IN EXPERIMENT 28.
FIGURE 10. DIAGRAM OF THE DESIGN OF EXPERIMENT 26.
FIGURE 11. RESULTS OF EXPERIMENTS 26-29.
FIGURE 12. DEPICTION OF THE OBJECT TRANSFORMATIONS AND
NEUTRAL POSITION USED IN EXPERIMENT 27, AND THE IMAGES USED
IN EXPERIMENT 29.
FIGURE 13. IMAGES OF THE OBJECTS USED IN EXPERIMENT 30.
FIGURE 14. FRAMES EXCERPTED FROM THE VIDEOS USED IN
EXPERIMENT 31 (DEPICTING THE MAXIMUM EXTENT OF MOVEMENT).
FIGURE 15. RESULTS OF EXPERIMENTS 30 AND 31.
ABSTRACT
This dissertation examines the underlying mechanisms of learning Non-Adjacent
Dependencies (NADs), the dependencies of stimuli that are not temporally adjacent.
Previous studies have suggested that perceptual grouping facilitated such learning. Peña
et al. (2002) observed that, absent perceptual cues, brief pauses between syllable
sequences facilitated the acquisition of NADs of syllable sequences. Endress and Wood
(2011) observed successful acquisition of the NADs of body movements when the
movement sequences were separated by pauses. However, the function of the pauses, and the
underlying mechanism of acquisition, remain unknown. This dissertation first replicated
the key findings in Peña et al. (2002), confirming the acquisition of syllable NADs when
pauses separated the test sequences. It then applied the same experimental model to non-
linguistic stimuli, abstract images and tones, and found that the inclusion of pauses did
not facilitate the acquisition of the NADs of these stimuli. Based on these observations,
two hypotheses were proposed: (1) packaging the sequences into a coherent unit is
necessary for the acquisition of NADs; and (2) motor sequence learning is the mechanism
which underlies the acquisition of NADs. The dissertation studies then tested acquisition
of the NADs of human body movements (replicating Endress and Wood (2011)), object
movements, human body postures, and object postures. The results showed that while
participants acquired the NADs of the first three categories of stimuli, they failed to do so
with object postures. This outcome suggested that bracketing the sequences of stimuli as
fluid transformations is critical for NAD acquisition, a process which could occur as the
result of viewing actual fluid transformation (i.e. a video of continuous movement), or
perceived fluid transformation (i.e. a sequence of images suggesting continuous
movement). Particularly with stimuli that can be mapped onto the perceiver’s motor
representation, motor sequence learning enables the representation of stimuli sequences
as fluid transformations. Two more experiments were carried out, which support this
claim. Acquisition of the NADs of object images occurred when a packaging
mechanism(s) was available, while acquisition of the NADs of human body movements
disappeared when the packaging of the movement sequences was disrupted. The results
of this dissertation imply that the acquisition of syllable NADs may be supported by the
motor sequence learning of the vocal gestures that produce the syllable sequences.
CHAPTER 1 – INTRODUCTION
Our sensory experience is full of regularities distributed over time. How do we
track and discover these regularities quickly and unintentionally? Research on statistical
learning has shown that humans can discover both visual and auditory regularities by
tracking the co-occurrence patterns of the stimuli.
The seminal work of Saffran, Aslin, and Newport (1996) revealed human infants’
ability to track distributional cues, namely Transitional Probabilities (TPs),
among syllables in a continuous, uninterrupted speech stream; the authors named this
ability statistical learning. The TP from event X to event Y (e.g., syllables) is the
probability that XY occurs given X, P(XY|X), that is, the frequency of XY divided by
the frequency of X (Saffran, Newport, & Aslin, 1996). In Saffran et al. (1996), after
listening to a continuous stream of syllables
composed of artificial words (e.g., …bidakupadotigolabubidaku…), infants showed
responses that suggest that they grouped the syllables into word-like units (e.g., bidaku).
Statistical learning of the TPs provided a plausible approach for word segmentation in the
absence of word-boundary cues, as in the above example. Linguists have long used a
similar distributional approach to analyze unfamiliar languages (Harris, 1955).
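The TP computation described above can be made concrete with a short sketch. It is
illustrative only and is not the procedure used in any of the cited studies: the
function name transitional_probabilities, the lag parameter, and the fixed word
order are assumptions, and the three words are those visible in the example stream
above.

```python
from collections import Counter

def transitional_probabilities(stream, lag=1):
    """Map (x, y) to P(y occurs `lag` positions after x | x)."""
    pair_counts = Counter(zip(stream, stream[lag:]))
    # Count x only at positions that have a successor `lag` steps ahead.
    x_counts = Counter(stream[:len(stream) - lag])
    return {(x, y): n / x_counts[x] for (x, y), n in pair_counts.items()}

# Words taken from the example stream; the ordering below is a hypothetical,
# fixed stand-in for a randomized familiarization stream.
bidaku = ("bi", "da", "ku")
padoti = ("pa", "do", "ti")
golabu = ("go", "la", "bu")
order = [bidaku, padoti, golabu, bidaku, golabu, padoti, bidaku, padoti, golabu]

stream = [syllable for word in order for syllable in word]
tps = transitional_probabilities(stream)

# Within-word transitions are fully predictable; transitions that cross a
# word boundary are not, which is the cue proposed for segmentation.
print("bi -> da:", tps[("bi", "da")])
print("ku -> pa:", tps[("ku", "pa")])
```

With lag=2 the same function returns the TP between items separated by one
intervening item, which is the quantity relevant to non-adjacent dependencies.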
Subsequent studies can be generally divided into two major directions of research.
One research trend is to test whether statistical learning is specific to language, or
whether it is a domain-general learning mechanism, by testing statistical learning of
various types of non-linguistic stimuli such as tones (Creel, Newport, & Aslin, 2004;
Saffran, Johnson, Aslin, & Newport, 1999), noises (Gebhart, Newport, & Aslin, 2009),
shapes and images (Fiser & Aslin, 2001, 2002; Otsuka, Nishiyama, Nakahara, &
Kawaguchi, 2013; Turk-Browne, Scholl, Chun, & Johnson, 2008; Turk-Browne & Scholl,
2009), as well as other species’ statistical learning abilities (Hauser, Newport, & Aslin,
2001; Newport, Hauser, Spaepen, & Aslin, 2004). Another research direction looks into
the limits and characteristics of statistical learning of various types of linguistic stimuli
(such as phonemes, syllables, and words) and of different types of regularities, especially
non-adjacent dependencies (NADs) (Bonatti, Peña, Nespor, & Mehler, 2005; Gómez,
2002; Graf Estes, Evans, Alibali, & Saffran, 2007; Newport & Aslin, 2004; Peña, Bonatti,
Nespor, & Mehler, 2002; Mintz, 2002, 2003; Mirman, Magnuson, Graf Estes, & Dixon,
2008; Pons & Toro, 2010; Toro, Nespor, Mehler, & Bonatti, 2008; Toro, Shukla, Nespor,
& Endress, 2008). NADs are the co-occurrences between stimuli that are not temporally
adjacent. The purpose of the latter research direction is to establish the role of statistical
learning in language acquisition, specifically, whether it plays a role in learning complex
regularities in language other than word segmentation, such as grammar (e.g.,
subject-verb agreement) (Santelmann & Jusczyk, 1998).
Studies pursuing the first research direction tend to agree that statistical learning
is a domain-general mechanism (Gómez & Gerken, 2000; Karuza, Newport, Aslin,
Starling, Tivarus, & Bavelier, 2013; Saffran & Thiessen, 2007), because participants
have also successfully extracted patterns based on the TPs of non-linguistic stimuli (e.g.,
images and tones). However, the second trend of studies has observed both similarities
and differences between learning outcomes with linguistic stimuli and non-linguistic
stimuli (Bonatti et al., 2005; Creel et al., 2004; Endress & Wood, 2011; Fiser & Aslin,
2001, 2002; Gebhart et al., 2009; Gómez, 2002; Saffran et al., 1999), and variable
outcomes with different types of linguistic stimuli, such as phonemes and syllables
(Bonatti et al., 2005; Newport & Aslin, 2004; Peña et al., 2002; Toro et al., 2008), which
can be interpreted as evidence countering the domain-general account. For example,
human adults failed to learn the NADs of pure tones (Creel et al., 2004), body
movements (Endress & Wood, 2011), or syllables (Peña et al., 2002), from a continuous
stream of tones, body movements, or syllables respectively. Acquisition of the NADs of
syllables and body movements was enabled by putting items with NADs at the edge
positions (beginning or ending) of segmented sequences (Endress & Wood, 2011; Peña et
al., 2002). By contrast, the NADs of tones could be learned when the tones with NADs
were perceptually more similar (e.g., with higher pitches or the same kind of timbre)
compared to the rest of the tones (Creel et al., 2004).
The diverse results of statistical learning studies over almost 20 years of research
invite a review of the various findings and their interrelation, and investigation of the
possible cognitive mechanisms underlying statistical learning, in order to better
understand the nature of statistical learning and its potential role in language acquisition.
This dissertation looked into the mechanisms of statistical learning and proposed that
bracketing the sequences into coherent units was necessary for acquisition of the NADs
in the sequences. Such bracketing may take various forms. In particular, motor
representation provided a way to package discrete stimuli into fluid transformation. The
implications for language acquisition are discussed.
CHAPTER 2 – LITERATURE REVIEW
Introduction
Studies on statistical learning of temporal regularities can be broadly divided into
two categories based on the types of distributional cues, Adjacent Dependencies and
Non-Adjacent Dependencies. Previous research has confirmed human subjects’
consistently similar capacity to learn adjacent dependencies in both linguistic and non-
linguistic stimuli, such as tones, noises, images, and body movements (Creel et al., 2004;
Endress & Wood, 2011; Fiser & Aslin, 2002; Gebhart et al., 2009; Kirkham, Slemmer, &
Johnson, 2002; Saffran et al., 1999; Turk-Browne & Scholl, 2009). By contrast, the
acquisition of NADs exhibits differing characteristics with different types of stimuli.
This chapter will explore the major findings on learning NADs of different types
of stimuli, including linguistic stimuli (phonemes and syllables) and non-linguistic
stimuli (abstract images, tones, noises, and human body movements), and then introduce
the dual-mechanisms account for acquisition of the NADs.
Non-Adjacent Dependencies of Phonemes
Studies have shown that participants can acquire NADs between phonemes, with certain
limitations.
In studies of the acquisition of NADs among phonemes, phonemes with NADs
were either all consonants (e.g., t_k_p_, with “_” indicating spaces for vowels), or all
vowels (e.g. _e_i_u, with “_” indicating spaces for consonants) (Bonatti et al., 2005;
Newport & Aslin, 2004). Newport and Aslin (2004) proposed two theoretical
explanations for their finding that NADs between consonants or between vowels,
respectively, could be readily learned. The first explanation proposed that statistical
learning of linguistic stimuli was constrained by the Gestalt principle of similarity
(Wertheimer, 1923), given that consonants and vowels can be viewed as two groups of
phonemes that have similar intra-group acoustic features, but distinct inter-group acoustic
features. The second explanation proposed that consonants and vowels are segmented
into different phonological tiers, as proposed by autosegmental phonology (Goldsmith,
1976), and therefore learning the NADs between consonants or between vowels actually
equates to learning adjacent dependencies between them within their respective tier.
Non-Adjacent Dependencies of Syllables
Statistical learning has also been tested as a potential learning mechanism for
linguistic regularities that are distributed over temporally non-adjacent items, such as
subject-verb agreement (Peña et al., 2002; Mintz, 2002, 2003; Newport & Aslin,
2004; Bonatti et al., 2005; Toro et al., 2008; Wang & Mintz, 2008).
Newport and Aslin (2004) created a speech stream (e.g., …dokitakegubu…)
composed of repetitions of four tri-syllabic artificial words by pairing two NADs (e.g., do
_ ta and ke _ bu) and two middle syllables (ki and gu). In the test, participants were
asked to indicate whether words (e.g., dokita) or part-words (e.g., kitake), which are
sequences that span two words, were more familiar. The TP between the non-adjacent
syllable pairs in words (e.g., do_ta) was 1, while the TP between any two syllables in
the part-words was 0.5. If statistical learning operates over NADs, participants would be
expected to
consider the words to be more familiar than the part-words. However, participants did
not exhibit any familiarity with the words even when training time was increased to 21
minutes per day for 10 days.
Peña et al. (2002) made similar observations when testing participants’ sensitivity to
the syllable NADs in a speech stream. The researchers created nine tri-syllabic words (puraki,
puliki, pufoki, beraga, beliga, befoga, taradu, talidu, and tafodu) by pairing each of three
pairs of syllables with NADs (pu_ki, be_ga, ta_du) with each of three middle syllables (ra,
li and fo), and randomly concatenated repetitions of each word into a continuous speech
stream. After listening to the speech stream, participants were tested on their preference
between two kinds of words: rule-words and part-words. Rule-words were defined as
tri-syllabic sequences with the correct NADs paired with unattested middle syllables (e.g.,
pubeki). Part-words, as in Newport and Aslin (2004), were tri-syllabic sequences that
spanned two adjacent words in the stream. Rule-words and part-words differed in two major
ways: 1) participants were exposed to part-words during training, but not to rule-words;
2) rule-words, unlike part-words, contained the same NADs as the training words. Based on
these differences, familiarity with part-words could result from exposure to the syllable
sequences, while familiarity with rule-words could result from acquisition of NADs.
Figure 1 illustrates the design of the study.
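The logic of this design can be sketched in code. The sketch below is a
simplified illustration, not the stimulus-generation procedure of Peña et al.
(2002): the frames and middle syllables come from the description above, but the
block-shuffled ordering, the number of repetitions, the helper name tp, and the
choice to draw unattested middle syllables from the edge syllables of the other
frames (consistent with the example pubeki) are all assumptions.

```python
import random
from collections import Counter
from itertools import product

def tp(stream, lag):
    """Map (x, y) to P(y occurs `lag` positions after x | x)."""
    pairs = Counter(zip(stream, stream[lag:]))
    firsts = Counter(stream[:len(stream) - lag])
    return {(x, y): n / firsts[x] for (x, y), n in pairs.items()}

frames = [("pu", "ki"), ("be", "ga"), ("ta", "du")]   # NAD pairs A_C
middles = ["ra", "li", "fo"]                          # attested middle syllables

# Nine training words: every frame combined with every middle syllable.
words = [(a, x, c) for (a, c), x in product(frames, middles)]

# Rule-words keep a frame but use a middle syllable never heard in that slot
# (assumption: an edge syllable borrowed from another frame).
rule_words = [(a, m, c) for (a, c) in frames
              for other in frames if other != (a, c) for m in other]

# Hypothetical familiarization stream: 20 shuffled blocks of the nine words.
rng = random.Random(0)
stream = []
for _ in range(20):
    block = words[:]
    rng.shuffle(block)
    stream.extend(syllable for word in block for syllable in word)

trigrams = set(zip(stream, stream[1:], stream[2:]))
part_words = trigrams - set(words)    # heard in training, span a word boundary

adjacent = tp(stream, 1)       # e.g. pu -> ra
nonadjacent = tp(stream, 2)    # e.g. pu _ ki

print("pu -> ra (adjacent):", adjacent[("pu", "ra")])
print("pu _ ki (non-adjacent):", nonadjacent[("pu", "ki")])
print("rule-words heard in training:", len(set(rule_words) & trigrams))
```

Under these assumptions the frame dependency is perfectly predictive while each
adjacent transition out of a frame-initial syllable is not, and no rule-word ever
occurs in the stream, so a preference for rule-words cannot be explained by
familiarity with heard sequences.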
The study observed that when there were no segmental cues between the words,
participants displayed a preference for the part-words after 10 minutes of training,
suggesting that the participants failed to extract the dependencies between syllable pairs.
However, when 25-millisecond subliminal inter-word pauses were added, preference
shifted significantly towards rule-words, which did not appear in training but
preserved the co-occurrence patterns of the NADs, even after an exposure as brief as
2 minutes (20 repetitions of each word) (see Figure 2). This discovery suggested that
subliminal segmentation cues play a pivotal role in the acquisition of the NADs of syllables.
Peña et al. (2002) suggested that: 1) subliminal segmentation cues promoted the
acquisition of NADs (indicated by familiarity with rule-words); 2) acquisition of NADs
of syllables is rapid.
Figure 1. The structure of training and testing stimuli in Peña et al. (2002).
Figure 2. Results from Peña et al. (2002). Peña et al. (2002) found that adult participants
considered rule-words to be more familiar when there were 25ms pauses before and after
the NADs, suggesting that they were sensitive to the NADs of syllables under the with-
pause condition.
Non-Adjacent Dependencies of Non-Linguistic Stimuli
Relevant Findings from Learning Adjacent Dependencies
As with syllable sequences, tone sequences and image sequences with higher Transitional
Probabilities (TPs) can be extracted from continuous streams of tones or images by both
human infants and adults (Abla, Katahira, & Okanoya, 2008; Abla & Okanoya, 2009; Fiser &
Aslin, 2002; Kirkham et al., 2002; Roser et al., 2011; Saffran et al., 1999;
Turk-Browne et al., 2008; Turk-Browne & Scholl, 2009). Since images
can be presented simultaneously, researchers have found that image clusters that are
spatially concurrent could also be learned (Fiser & Aslin, 2001; Turk-Browne & Scholl,
2009).
Findings from Otsuka et al. (2013) suggest that the mental representation of visual
sequences may involve other aspects of the visual stimuli, such as the semantics of the
sequences. Unlike Turk-Browne and Scholl (2009), who used abstract images, Otsuka et
al. (2013) used images of familiar objects and animals (e.g., car and dog) to test
acquisition of adjacent dependencies of the images. In one experiment in Otsuka et al.
(2013), the training stimuli were images of line drawings (see Figure 3), while the testing
stimuli were written texts of the corresponding images; nonetheless, participants’
expressed preferences were comparable to those observed when line drawings were used
as the testing stimuli.
Figure 3. Samples of training and testing stimuli from Otsuka et al. (2013) (left) and
Turk-Browne & Scholl (2009) (right).
Findings on Learning the NADs of Non-Linguistic Stimuli
This section introduces findings on learning the NADs of non-linguistic stimuli,
including tones, noises, images and body movements.
Tones. Studies on tones and noises have uniformly indicated that the NADs of
these acoustic stimuli cannot be readily learned unless units with NADs are perceptually
similar, following Gestalt principles of perception (Wertheimer, 1923) (Creel et al., 2004;
Gebhart et al., 2009).
Creel et al. (2004) first demonstrated this principle with tones. They created four
tone triplets (denoted ABC, DEF, GHI and JKL, with each letter representing a distinct
tone), and randomly interweaved the triplets so that adjacent tones within a triplet were
each separated by one tone from another triplet (e.g., A-G-B-H-C-I-D-J-E-K-F-L).
Although no consistent pattern occurred between any two neighboring tones, a co-
occurrence pattern existed between every other tone, which indicated four tone-triplets.
After listening to the tone sequence, participants were tested on their knowledge of the
adjacent and nonadjacent dependencies between tones. In the test of adjacent
dependencies, participants were asked to choose the more familiar of two sequences: one
that was actually heard (e.g., AgBhCi) and one that was not present in the
training (e.g., ChAiBg). In the test of NADs, the two testing items in each pair were
a tone triplet (e.g., ABC) and a scrambled tone triplet in which the order of the first
two tones was flipped (e.g., EDF). When all the tones were from the same octave, with similar
pitch, participants succeeded in the test of adjacent dependencies, but failed to
discriminate between the two types of triplets in the test of NADs, indicating that they
acquired the adjacent dependencies, but not the NADs, between the tones. In the
following studies, the interleaving triplets had distinct pitches (higher or lower octaves)
or timbres (the characteristic quality of the sound), and participants’ performance
followed the opposite pattern: they manifested acquisition of NADs, but not adjacent
dependencies. Given that participants readily grouped perceptually similar tones, Creel
et al. (2004) suggested that the perception of tone sequences followed the Gestalt
principle of similarity.
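The statistical structure of an interleaved stream can be illustrated with a
brief sketch. It is a simplification under stated assumptions rather than a
reconstruction of the materials in Creel et al. (2004): the split of the four
triplets into two sub-streams (ABC and DEF versus GHI and JKL) is inferred from
the example A-G-B-H-C-I-D-J-E-K-F-L, the fixed cycling order replaces random
ordering, and the helper name tp is hypothetical.

```python
from collections import Counter
from itertools import product

def tp(stream, lag):
    """Map (x, y) to P(y occurs `lag` positions after x | x)."""
    pairs = Counter(zip(stream, stream[lag:]))
    firsts = Counter(stream[:len(stream) - lag])
    return {(x, y): n / firsts[x] for (x, y), n in pairs.items()}

set_one = ["ABC", "DEF"]   # assumed to form the first sub-stream
set_two = ["GHI", "JKL"]   # assumed to form the second sub-stream

# Interleave one triplet from each set, tone by tone (A-G-B-H-C-I, ...),
# cycling through every pairing so that each occurs equally often.
stream = []
for _ in range(50):
    for first, second in product(set_one, set_two):
        for x, y in zip(first, second):
            stream.extend([x, y])

adjacent = tp(stream, 1)   # neighboring tones, drawn from different triplets
skip_one = tp(stream, 2)   # every other tone, drawn from the same triplet

print("A -> G (adjacent):", adjacent[("A", "G")])
print("A _ B (non-adjacent):", skip_one[("A", "B")])
```

In this toy stream the within-triplet dependency at a distance of one intervening
tone is perfectly predictive, whereas neighboring tones are only weakly
predictive of each other, which is the contrast the tests of adjacent and
non-adjacent dependencies were designed to probe.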
Similar perceptual grouping was observed by Gebhart et al. (2009), who
created four noise triplets (e.g., AxB, CxD, AyB, and CyD, each letter
representing a distinct noise) by exhaustively combining two NADs (e.g., A_B and C_D)
and two middle sounds (e.g., x, y). There was a 150ms pause between successive noises,
but no extra pause between the triplets. After three days of training totaling 100 minutes,
participants failed to discriminate between rule-triplets (e.g., AxB) and part-triplets
(e.g., yDA). As in Creel et al. (2004), when the sounds with NADs were given a raspy sound
quality, which perceptually distinguished them from the rest of the sounds, participants
indicated that triplets with NADs were more familiar to them, supporting the claim that the
NADs of sounds can be learned only when they are perceptually similar.
Images. The acquisition of NADs between visual stimuli has not been investigated
as extensively as that of tones. Findings from Turk-Browne, Jungé, & Scholl (2005)
suggested that the acquisition of NADs of images requires both perceptual similarity of
the images and participants’ specific attention to the images with NADs. In this study,
two arrays of shapes (green and orange), each composed of shape-triplets, were
interweaved into one sequence, as shown in Figure 4. The NADs were among shapes
from the same triplets, which also had the same color.
Figure 4. Example of training stimuli from Turk-Browne et al. (2005). Visual shapes in
the middle column were presented sequentially during training. Figure adapted from
Turk-Browne et al. (2005).
During familiarization, shapes were presented on the computer screen
sequentially, with image presentation time and separating pauses of equal duration.
Participants were required to pay attention to shapes of one color to detect an immediate
repetition of the same shape, but not to shapes of the other color. In the tests, the choice
was between triplets and foil triplets of both the attended color and unattended color.
Results showed that participants only acquired the NADs of the attended color shapes,
but not of the unattended ones. In this study, participants’ attention was intentionally
directed to one array of shapes, which may have led them to strategically ignore the other array
and suppress its processing.
It is not yet known whether NADs between visual stimuli without any salient
perceptual cues (e.g., color) would be automatically acquired when they are viewed
passively. It is also unknown whether perceptual cues would influence the learning of
NADs between visual stimuli in the same way as the learning of tonal stimuli (Creel et al.,
2004).
Body Movements. Endress and Wood (2011) investigated statistical learning of
body movements, such as raising one’s hands or twisting one’s body. In this study, adult
participants watched a video of an animated male actor performing a series of actions.
The structure of the stimuli was similar to Peña et al. (2002), but replaced syllables with
body movements, and participants were similarly tested. They were shown two body
movement triplets, and asked to select the more familiar one. As with syllables, the
addition of pauses separating the movement triplets to the training sequence facilitated
participants’ acquisition of the rule-triplets (analogous to rule-words).
Summary
Taken together, the above studies suggest that: (1) Perceptual cues facilitate the
acquisition of NADs; (2) pauses between sequences with NADs also facilitate the
acquisition of NADs of syllables and human body movements.
The Dual Mechanisms Account
To account for the results in Peña et al. (2002), the dual mechanisms account
(Endress & Bonatti, 2007; Endress & Mehler, 2009) proposed that there are two learning
mechanisms at work during statistical learning of syllable sequences, one which rapidly
records syllables’ positions relative to the edges of the sequences, and another one which
tracks TPs among syllables. Each learning mechanism is hypothesized to produce its
own learning outcome, and the observed learning outcome in the studies was produced by
the interplay of the two respective outcomes.
TP learning is defined as learning based on the co-occurrence patterns of
neighboring items, and it has been described in the previous section. This
section will introduce positional learning. In positional learning, the syllables at the edge
of the sequences, e.g., pu and ki in the sequence puraki, play a critical role, and they will
be referred to as edge syllables in later discussion. The positional learning mechanism
records syllables’ positions relative to the edge syllables. On this account,
the edge items occupy cognitively distinct positions, acting as
“anchor points” to set the relative positions of the medial items (Endress & Bonatti, 2007;
Endress & Mehler, 2009; Endress, Nespor, & Mehler, 2009). Such a learning mechanism
has three features: (1) it requires sequence boundaries so that it has “anchors” to mark the
edge items of the sequences and distinguish the medial items; (2) the edge syllables are
most prominent; (3) learning is rapid and requires little training.
The dual mechanisms account explained the findings in Peña et al. (2002). In
training conditions without the pauses, there was no cue for the beginning or ending of
the tri-syllabic sequences, thus the positional learning mechanism failed to operate due to
a lack of marked edge syllables. As a result, rule-words, which resembled words
only in their edge syllables, could not be recognized. Concurrently, TPs rendered part-
words more familiar over time. In the with-pause condition, pauses marking word
boundaries enabled the positional mechanism to record syllable positions, which was
essential to learning the NADs of edge syllables. When training was short (2 min.),
positional learning preceded TPs-learning, so that rule-words, which had never been
heard during training but had the correct edge syllables, were preferred over part-words.
With an increased amount of training, TPs-learning had enough time to build up
knowledge of the TPs of syllables, and therefore part-words were preferred due to their
high TPs between syllables.
Questions
The above review of findings on the acquisition of NADs suggests two types of
cues which facilitate such acquisition, perceptual cues pointing to the items with NADs,
and pauses separating the sequences with NADs. The dual-mechanisms account
proposes the positional learning mechanism as the explanation for the successful
acquisition of NADs under the with-pause condition, and failed acquisition under the
without-pause condition. However, it is unknown what cognitive mechanisms underlie
positional learning.
As a first step in the exploration of the cognitive mechanisms behind the
acquisition of NADs, the following studies will first confirm the findings from Peña et al.
(2002) that after brief exposure to the stimuli, participants will successfully acquire the
NADs of syllables when there are pauses separating the syllable sequences, but fail to do
so without the pauses. Positional learning is grounded in the proposition that the edge
syllables occupy prominent positions because they mark the beginnings and endings of
the sequences. In Peña et al. (2002), apart from their special positions in the sequences,
the edge syllables contrasted with middle syllables in that the edge syllables contained
stop consonants (/p/, /k/, /b/, /g/, /t/, /d/), while the middle syllables contained liquids or
fricatives (/l/, /r/, /f/). Such phonetic cues could be a potential confounding factor for
participants’ sensitivity to the edge syllables, as pointed out by several researchers
(Newport & Aslin, 2004; Onnis, Monaghan, Richmond, & Chater, 2005). Therefore, it is
necessary to examine if the acquisition of NADs still occurs after removing phonetic cues
from the edge syllables.
Further, a key feature of positional learning is its rapidity. In Peña et al. (2002),
two minutes of exposure led to acquisition of the NADs of syllables, and acquisition after
two minutes of exposure has been replicated in Endress and Bonatti (2007) and Endress
and Mehler (2009). Given a learning mechanism as rapid as positional learning, a
detailed investigation of the speed of such learning would provide a basis for further
investigation of its underlying cognitive mechanisms, as well as the domain generality of
such learning mechanisms, by comparing the features (including speed) of positional
learning of linguistic stimuli and non-linguistic stimuli. Therefore, one goal of the
current research is to focus on the two-minute exposure time, and test how far this
exposure time can be reduced without disrupting acquisition.
The dissertation will then explore the domain generality of positional learning.
So far, it has been tested with syllables and human body movements; it is unknown if it
also applies to other non-linguistic stimuli such as abstract images and pure tones.
Based on the findings in the above studies, this dissertation will propose a
hypothesis regarding the underlying mechanism of acquisition of NADs, and further
develop studies to test the hypothesis.
Chapter 3 will first confirm the findings in Peña et al. (2002) with English-
speakers, controlling for phonological cues. Then the rapidity of the acquisition of
syllable NADs will be tested. Chapter 4 will examine if participants can learn the NADs
of images and tones with the aid of pauses. Chapter 5 will analyze the findings from
studies on the acquisition of NADs, propose a hypothesis, and test that hypothesis, which
Chapter 6 will examine further. Chapter 7 will provide an overall discussion of the
findings.
CHAPTER 3 – NON-ADJACENT DEPENDENCIES OF SYLLABLES
Introduction
Studies in this chapter first confirmed the findings in Peña et al. (2002) with
English speakers (as opposed to French speakers in Peña et al. (2002)), and then
examined the minimal amount of exposure to the training stimuli that is needed to
successfully acquire the NADs of syllables. This chapter consists of 18 independent
experiments.
Experiments 1 and 2 used training and testing stimuli which closely resembled the
stimuli in Peña et al. (2002), consisting of the same sets of words, synthesized using the
same MBROLA fr2 diphone database (Dutoit, Pagel, Pierret, Bataille, & van der Vreken,
1996). In Experiment 1, there were no pauses between words (referred to as the “no-
pause” condition), while in Experiment 2 the words were segmented by 25 ms pauses
(referred to as the “with-pause” condition). Both experiments tested whether participants
were able to discern rule-words from part-words after two minutes of exposure to the
training stimuli.
Experiments 3 (no-pause) and 4 (with-pause) differed from Experiments 1 and 2
only in terms of the MBROLA diphone database used (de7, German female voice),
because native English speakers reported that words synthesized with the German
diphone database were more intelligible.
Experiments 5 (no-pause) and 6 (with-pause) were similar to Experiments 3 and 4
respectively, except that 5 and 6 used a different set of syllables which did not contain
any plosive consonants. The new set of training stimuli in Experiments 5 and 6 thus eliminated
the confounding phonetic cues.
Experiments 7-12 used the same materials as Experiments 5 and 6 but varied the
training time to 30 seconds (Experiments 7 and 8), 1 minute (Experiments 9 and 10), and 5
minutes (Experiments 11 and 12), to test participants’ performance under different training
time conditions. The 5-minute training condition was added to test if the results follow a
continuous pattern from 2 minutes to 10 minutes.
Since there was a potential testing effect in Experiments 1-12 (see Stimuli section
for details), some of the key findings were replicated in Experiments 13-18.
Experiments 1-18
Methods
Participants. A total of 178 native English-speaking USC undergraduate
students participated in the studies. They were recruited through the USC Psychology
Subject Pool (https://usc.sona-systems.com) and received credits to fulfill course
requirements as compensation.
For the number of participants in each experiment see Table 1 and Table 2.
Apparatus and Stimuli. In Experiments 1 and 2, the stimuli were constructed to
resemble the stimuli in Peña et al. (2002). Nine tri-syllabic artificial words (puliki, pufoki,
puraki, beliga, beraga, befoga, talidu, tafodu, and taradu) were created by exhaustively
combining three syllable NADs (pu_ki, be_ga, and ta_du) with three middle syllables (li, fo,
and ra), as in Figure 1. The training speech was created by randomly concatenating 20
instances of each of the nine words, with the constraint that words were not immediately
followed by ones with the same NAD (e.g., puliki and puraki). The TP between adjacent syllables
within words was 0.33, and the TP between edge syllables within words was 1. The TP
between the right edge syllable and the following left edge syllable from the next word
was 0.5. The nine words were synthesized using the MBROLA French female voice
diphone database fr2 at a pitch of 200 Hz, as in Peña et al. (2002). Words were
concatenated with Praat 5.3.45. In Experiment 1 there were no pauses between words,
while in Experiment 2 there was a 25 ms pause between words.
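The transitional probabilities of a stream with this structure can be checked with a short simulation. The following Python sketch is illustrative only and is not the procedure used to build the actual stimuli: the function names, the random seed, and the restart-on-dead-end sampling strategy are all assumptions. It concatenates 20 tokens of each word so that no NAD repeats back to back, then estimates TPs at lag 1 (adjacent syllables) and lag 2 (non-adjacent syllables).

```python
import random
from collections import Counter

FRAMES = [("pu", "ki"), ("be", "ga"), ("ta", "du")]   # NADs from the text
MIDDLES = ["li", "fo", "ra"]                          # middle syllables from the text
WORDS = [(a, x, b) for a, b in FRAMES for x in MIDDLES]


def build_stream(reps=20, seed=0):
    """Return a list of words: `reps` tokens per word, no NAD repeated back to back."""
    rng = random.Random(seed)
    while True:  # restart whenever the sampler reaches a dead end
        remaining = Counter({w: reps for w in WORDS})
        order = []
        while sum(remaining.values()) > 0:
            prev = (order[-1][0], order[-1][2]) if order else None
            candidates = [w for w in WORDS
                          if remaining[w] > 0 and (w[0], w[2]) != prev]
            if not candidates:
                break  # only same-NAD words are left; try again
            weights = [remaining[w] for w in candidates]
            word = rng.choices(candidates, weights=weights)[0]
            order.append(word)
            remaining[word] -= 1
        else:
            return order


def lagged_tp(order, lag):
    """Estimate P(second | first) for syllables `lag` positions apart."""
    syllables = [s for word in order for s in word]
    pairs = Counter(zip(syllables, syllables[lag:]))
    firsts = Counter(syllables[:-lag])
    return {pair: n / firsts[pair[0]] for pair, n in pairs.items()}


stream = build_stream()
adjacent = lagged_tp(stream, 1)
nonadjacent = lagged_tp(stream, 2)
print(round(adjacent[("pu", "li")], 2))           # left edge -> middle, within a word
print(round(nonadjacent[("pu", "ki")], 2))        # left edge -> right edge, within a word
print(round(adjacent.get(("ki", "be"), 0.0), 2))  # right edge -> next word's left edge
```

With three middle syllables per NAD, each within-word adjacent TP is exactly 1/3, and with two permissible following NADs the cross-boundary TP is close to 0.5 in any particular random stream.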
The testing materials consisted of 36 rule-word/part-word pairs.
Rule-words were defined as tri-syllabic sequences composed of a legal NAD (pu_ki,
be_ga, or ta_du) with the middle position filled by an edge syllable (pu, ki, be, ga, ta, or du). Part-words
were defined as tri-syllabic sequences spanning two adjacent words (e.g., rakita).
Therefore, rule-words never occurred during training, whereas part-words were attested in
the training sequences, although at lower frequencies than words. Within each
pair, a rule-word was paired with a closely matched part-word, e.g., the rule-word
“pubeki” would be paired with the part-word “rakibe.” The order of the rule-word
and part-word within each pair was randomized, as was the order of the test pairs. In the
test items, each syllable lasted 230 ms, as in training, and there was a 500 ms
pause between the rule-word and part-word within each pair. In the training stream, each syllable
lasted 230 ms and each word 690 ms, and the pitch was set at
200 Hz. The training stimulus was around 2 minutes long in both the with-pause and no-pause
conditions. To ensure that the stream had no clear beginning or
ending point, the first 5 seconds of the stimuli faded in and the last 5 seconds faded out.
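To make the two test-item types concrete, the sketch below enumerates candidate rule-words and part-words from the with-plosive syllable set. It is a hypothetical Python illustration, not a reconstruction of the 36 test pairs actually used. It assumes that the middle slot of a rule-word is filled by an edge syllable taken from a different NAD, and that a part-word may straddle a word boundary in either of two ways (middle + right edge + next left edge, or right edge + next left edge + next middle).

```python
FRAMES = [("pu", "ki"), ("be", "ga"), ("ta", "du")]   # NADs from the text
MIDDLES = ["li", "fo", "ra"]                          # middle syllables from the text
WORDS = {a + x + b for a, b in FRAMES for x in MIDDLES}


def rule_words():
    """Legal NAD with an edge syllable from another NAD in the middle (assumption)."""
    items = []
    for a, b in FRAMES:
        for c, d in FRAMES:
            if (c, d) != (a, b):
                items.extend([a + c + b, a + d + b])
    return items


def part_words():
    """Tri-syllabic sequences that span the boundary between two adjacent words."""
    items = []
    for a, b in FRAMES:            # word that ends
        for c, d in FRAMES:        # word that begins
            if (c, d) == (a, b):
                continue           # the same NAD never repeats back to back
            for x in MIDDLES:
                items.append(x + b + c)   # middle + right edge | next left edge
                items.append(b + c + x)   # right edge | next left edge + next middle
    return items


print(len(rule_words()), sorted(rule_words())[:3])
print(len(part_words()), sorted(part_words())[:3])
```

Under these assumptions no rule-word coincides with a training word, whereas every part-word occurs somewhere in a training stream that contains the corresponding word transition.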
Since the edge syllables forming the NADs contained plosive consonants, this set of stimuli is
referred to as with-plosive.
Experiments 3 and 4 were identical to Experiments 1 and 2 respectively, except
that 3 and 4 used the MBROLA de7 German diphone database, female voice.
Experiments 5 and 6 were identical to Experiments 3 and 4 respectively, except that the
syllables used in Experiments 3 and 4 were replaced with a new set of syllables. The
NADs were “me_va,” “zi_nu,” and “su_fo,” while the middle syllables were “li,” “sa,”
and “ra.”
Experiments 7-12 were identical to Experiments 5 and 6, except that the
number of repetitions of each word varied from 5 to 50, and the duration of the stimuli
varied from 30 seconds to 5 minutes as a result, as shown in Table 1.
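The relation between repetitions and training duration follows from the syllable timing. The Python sketch below computes nominal stream durations; the helper name is hypothetical, and it assumes nine words per set, 230 ms syllables, a 25 ms pause per word token in the with-pause condition, and no allowance for synthesis or fade effects, so measured durations may differ slightly from these values.

```python
SYLLABLE_MS = 230          # syllable duration from the text
SYLLABLES_PER_WORD = 3
N_WORDS = 9                # nine artificial words per stimulus set


def stream_seconds(reps, pause_ms=0):
    """Nominal stream duration in seconds for `reps` tokens of each word."""
    tokens = reps * N_WORDS
    word_ms = SYLLABLES_PER_WORD * SYLLABLE_MS
    # Assumption: one pause is counted per word token.
    return tokens * (word_ms + pause_ms) / 1000


for reps in (5, 10, 20, 50):
    print(reps, stream_seconds(reps), stream_seconds(reps, pause_ms=25))
```

For example, 20 repetitions give 180 word tokens of 690 ms each, or about 124 seconds without pauses, which matches the approximately 2-minute training duration.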
In Experiments 1-12, each of the NADs appeared nine times, more often than any
of the part-words, which could induce a testing effect: participants’
preference for rule-words could have resulted from greater exposure to rule-words during testing.
To avoid this problem, Experiments 13-18 controlled for such testing effects by choosing
nine specific rule-words (pubeki, putaki, pugaki, beduga, bekiga, bepuga, tagadu, tabedu,
and takidu) and nine specific part-words (kitara, kitafo, gapufo, dubera, likita, lidube,
radube, ragapu, and fogapu), so that each word appeared nine times during the testing.
Experiments 13-18 only replicated experiments in which preference was either given to
rule-words, or no preference was expressed (see Table 2), but not experiments in which
preference for part-words was expressed, because the potential testing effect would bias
participants toward rule-words, not part-words. Aside from this change, the designs of
Experiments 13-18 each paralleled the design of the particular experiment they replicated.
Procedures. Participants were each instructed to listen to a sequence of words
for a period of time (depending on the actual time of each study), and to pay close
attention to the sequence, as they would be tested on it afterwards. After they were done
with the training, participants were informed that they would be completing a choice-test,
in which they would hear two sequences of syllables and indicate which one was more
familiar to them by pressing one of two buttons.
In all of the experiments in this study, PsyScope B53D was used to display
stimuli and record responses. The experiments were run on a 13-inch MacBook Pro, in a
quiet room free from disturbances.
Results
In the choice tests, participants’ responses were coded as binary variables, with
rule-words coded as 1 and part-words coded as 0. A logistic regression model (Jaeger,
2008) was used to compare participants’ choice with the chance level (0.5), controlling
for variance based on participants and test questions. The intercept (β) describes
deviation from chance level in the choice tests between rule-words and part-words, and
the p-value describes whether the difference from chance level was significant. For each
experiment, participants’ percentage of preference for rule-words with standard deviation,
intercept (β) with standard error, z, and p-value are listed in Table 1. The analysis was
performed with R 3.0.2 GUI 1.62 Snow Leopard build (6558), and lme4 R package,
version 1.0-4, and graphed with graphics version 3.0.2.
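As a reading aid for the intercepts reported below, the following Python sketch shows how a logit-scale intercept maps onto a choice probability and how a Wald z statistic and two-sided p-value follow from an estimate and its standard error. It is a simplified illustration, not the lme4 analysis itself: the function names are hypothetical, and it ignores the random effects for participants and test items, so the probabilities it returns need not equal the mean percentages in Table 1, and z values recomputed from rounded estimates may differ slightly from the reported ones.

```python
import math


def inv_logit(beta):
    """Convert a log-odds intercept to a probability of choosing the rule-word."""
    return 1.0 / (1.0 + math.exp(-beta))


def wald_z(beta, se):
    """Wald statistic for testing the intercept against 0 (chance level, p = 0.5)."""
    return beta / se


def two_sided_p(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Example with the rounded Experiment 1 estimate (beta = 0.42, SE = 0.13).
beta, se = 0.42, 0.13
print(round(inv_logit(beta), 3))
print(round(wald_z(beta, se), 2))
print(round(two_sided_p(wald_z(beta, se)), 4))
```

An intercept of 0 corresponds to chance (probability 0.5); positive intercepts indicate a preference for rule-words and negative intercepts a preference for part-words.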
Analysis of Experiment 1 (β = 0.42, SE = 0.13, z = 3.37, p < .01), Experiment 2 (β
= 1.09, SE = 0.19, z = 5.84, p < .001), Experiment 3 (β = 0.41, SE = 0.16, z = 2.54, p
= .01), and Experiment 4 (β = 1.37, SE = 0.19, z = 7.38, p < .001) yielded significant
intercepts, indicating that participants considered rule-words to be more familiar in these
four experiments, where the edge syllables started with plosive consonants, regardless of
the MBROLA diphone database used.
In Experiment 5, where the phonological cues from edge syllables were removed
and where there were no segmental cues between the words in the training, participants’
preference shifted to part-words (β = -0.49, SE = 0.22, z = -2.20, p < .05). However, in
Experiment 6, where phonological cues were removed, but the subliminal pauses were
present, preference for rule-words was expressed once again (β = 0.38, SE = 0.17, z =
2.16, p < .05). The major finding in Peña et al. (2002), that participants considered the
rule-words to be more familiar with pauses separating the triplets, was confirmed.
In Experiments 7 and 8, with only 30 seconds of training, participants did not show
any significant preference between rule-words and part-words under the no-pause
condition (β = -0.07, SE = 0.16, z = -0.48, p = 0.63), but preferred rule-words under the
with-pause condition (β = 0.39, SE = 0.20, z = 2.00, p = 0.045). When the exposure was
extended to 1 minute, part-words were preferred under the no-pause condition (β = -0.46,
SE = 0.15, z = -2.97, p < .01), while rule-words were preferred under the with-pause
condition (β = 0.40, SE = 0.18, z = 2.16, p < .05). When exposure was further increased
to 5 minutes, similar to the results from the 1-minute and 2-minute training conditions,
part-words were preferred under the no-pause condition (β = -0.41, SE = 0.20, z = -2.09,
p < .05), and rule-words were preferred under the with-pause condition (β = 0.53, SE =
0.11, z = 3.62, p < .001).
In Experiments 13 and 14, which replicated Experiments 3 and 4, results showed
that while participants still preferred rule-words under the with-pause condition (β = 1.06,
SE = 0.12, z = 8.87, p < .001), there was no preference for either one under the no-pause
condition (β = 0.12, SE = 0.14, z = 0.85, p = 0.396). Experiment 15 replicated
Experiment 6, which controlled the phonological cues with 2 minutes of training under
the with-pause condition, and results showed that participants still preferred rule-words (β
= 0.28, SE = 0.11, z = 2.43, p < .05). Experiments 16 and 17 replicated Experiments 7 and
8, which had 30 seconds of training. After controlling for the testing effect, participants
did not show a preference under either the no-pause condition (β = -0.16, SE = 0.17, z =
-0.95, p = 0.34) or the with-pause condition (β = 0.16, SE = 0.19, z = 0.83, p = 0.40).
Experiment 18 replicated Experiment 10, which had 1 minute of training with pauses
segmenting the words, and participants still preferred rule-words (β = 0.52, SE = 0.19, z =
2.72, p < .01).
To sum up, the results showed that with 2 minutes of exposure, when the stimuli were
synthesized with the French database and the edge syllables contained plosive consonants,
participants judged rule-words to be more
familiar under both with-pause and no-pause conditions. When the more intelligible
German database was used, preference for rule-words was observed only under the with-
pause condition; no preference was observed under the no-pause condition. When the
phonological cues were controlled, participants’ preference was consistent with 1 minute,
2 minutes, and 5 minutes of training: rule-words under the with-pause condition and part-
words under the no-pause condition. No preference was observed with only 30 seconds
of training.
Table 1
Results of experiments testing adult participants’ acquisition of NADs of syllables. The experimental design was similar to that of Peña et al.
(2002). Preference for rule-words (with NADs) was coded as 1, and preference for part-words (without NADs) as 0. Results were analyzed using
a mixed logit model. The replication experiments (Experiments 13-18) are listed in parentheses after the corresponding experiment
number.
Exp | Synth. Database | Plosive Consonants? | Repetitions/Training duration | Pauses | Number of Participants (N) | Mean % Choosing Rule-Words (SD) | Intercept (SE) | z | p
1 | French | Yes | 20/~2 min [a] | No | 14 | 59.72 (9.57) | 0.42 (0.13) | 3.16 | <.01 ** [b]
2 | French | Yes | 20/~2 min | Yes | 14 | 72.62 (11.21) | 1.09 (0.19) | 5.84 | <.001 ***
3 (Exp13) [c] | German | Yes | 20/~2 min | No | 14 | 59.33 (12.16) | 0.41 (0.16) | 2.54 | 0.01 *
4 (Exp14) | German | Yes | 20/~2 min | Yes | 14 | 77.98 (10.48) | 1.37 (0.19) | 7.38 | <.001 ***
5 | German | No | 20/~2 min | No | 14 | 39.88 (15.24) | -0.49 (0.22) | -2.20 | 0.03 *
6 (Exp15) | German | No | 20/~2 min | Yes | 16 | 58.33 (13.03) | 0.38 (0.17) | 2.16 | 0.03 *
7 (Exp16) | German | No | 5/~30 sec | No | 18 | 48.28 (11.81) | -0.07 (0.16) | -0.48 | 0.63
8 (Exp17) | German | No | 5/~30 sec | Yes | 16 | 58.16 (13.17) | 0.39 (0.20) | 2.00 | 0.045 *
9 | German | No | 10/~1 min | No | 14 | 40.22 (10.56) | -0.46 (0.15) | -2.97 | <.01 **
10 (Exp18) | German | No | 10/~1 min | Yes | 14 | 58.90 (14.25) | 0.40 (0.18) | 2.16 | 0.03 *
11 | German | No | 50/~5 min | No | 16 | 40.59 (14.43) | -0.41 (0.20) | -2.09 | 0.037 *
12 | German | No | 50/~5 min | Yes | 14 | 62.52 (11.40) | 0.53 (0.11) | 3.62 | <.001 ***
a. The duration of the training is listed as an approximate number because the two conditions, with-pause and no-pause, have different
durations. For example, in Experiment 1, the exact durations for the no-pause and with-pause conditions are 124.4 seconds and 128.7
seconds respectively.
b. * indicates the p-value is less than 0.05, ** less than 0.01, and *** less than 0.001.
c. Due to a potential testing effect, some of the studies were replicated with better controls. The corresponding replication study is listed
in parentheses.
Table 2
Results from the replication studies with the testing effect controlled. All other details are the same as in Table 1.
Exp | Plosive Consonants? | Repetitions/Training Duration | Pauses | Number of Participants (N) | Mean % Choosing Rule-words (SD) | Intercept (SE) | z | p
13 | Yes | 20/~2 min | No | 13 | 52.75 (11.80) | 0.12 (0.14) | 0.85 | 0.396
14 | Yes | 20/~2 min | Yes | 14 | 74.01 (8.44) | 1.06 (0.12) | 8.87 | <.001 ***
15 | No | 20/~2 min | Yes | 15 | 56.77 (10.58) | 0.28 (0.11) | 2.43 | 0.015 *
16 | No | 5/~30 sec | No | 15 | 46.11 (15.38) | -0.16 (0.17) | -0.95 | 0.341
17 | No | 5/~30 sec | Yes | 15 | 53.33 (16.11) | 0.16 (0.19) | 0.83 | 0.404
18 | No | 10/~1 min | Yes | 14 | 61.90 (16.01) | 0.52 (0.19) | 2.72 | 0.006 **
Table 3
Summary of participants’ preference for rule-words vs. part-words under different training conditions.
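Synthesizer Database | Plosive Consonants? | Repetitions/Training duration | Pauses | Preference
French | Yes | 20/~2 min [a] | No | Rule-Words
French | Yes | 20/~2 min | Yes | Rule-Words
German | Yes | 20/~2 min | No | No Preference
German | Yes | 20/~2 min | Yes | Rule-Words
German | No | 20/~2 min | No | Part-Words
German | No | 20/~2 min | Yes | Rule-Words
German | No | 5/~30 sec | No | No Preference
German | No | 5/~30 sec | Yes | No Preference
German | No | 10/~1 min | No | Part-Words
German | No | 10/~1 min | Yes | Rule-Words
German | No | 50/~5 min | No | Part-Words
German | No | 50/~5 min | Yes | Rule-Words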
Figure 5. Results of Experiments 1-12. Each dot represents the percentage of preference for rule-triplets for each participant, and the
diamond represents the group mean. The dotted line indicates chance level (50%).
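[Figure 5 plot. Panels, left to right: Experiments 1-12, alternating no-pause (NPause) and with-pause (WPause) conditions. Experiments 1-2: fr2, with plosives, 2 min. Experiments 3-4: de7, with plosives, 2 min. Experiments 5-12: de7, no plosives, with 2 min (5-6), 30 sec (7-8), 1 min (9-10), and 5 min (11-12) of training.]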
Figure 6. Results of Experiments 13-18. Each dot represents the percentage of preference for rule-triplets for each participant, and the
diamond represents the group mean. The dotted line indicates chance level (50%).
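[Figure 6 plot. Panels, left to right: Exp 13 (with plosives, no-pause, 2 min); Exp 14 (with plosives, with-pause, 2 min); Exp 15 (no plosives, with-pause, 2 min); Exp 16 (no plosives, no-pause, 30 sec); Exp 17 (no plosives, with-pause, 30 sec); Exp 18 (no plosives, with-pause, 1 min).]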
Discussion
In this study, 18 experiments were conducted in order to probe how rapidly
English-speakers can acquire the NADs of syllables using the paradigm in Peña et al.
(2002). Experiments 1 and 2 replicated Peña et al. (2002) using the MBROLA diphone
database for French with English-speakers, to test if English-speakers showed any
preference for rule-words under similar exposure conditions. Experiments 3-6 used the
MBROLA German diphone database (de7) in order to generate stimuli that were more
intelligible to native English-speakers, and Experiments 5 and 6 controlled the confounding phonological cues
that were present in the original Peña et al. (2002) study, confirming that participants
exhibited acquisition of NADs of syllables under the with-pause condition. In
Experiments 7-12, participants’ acquisition of NADs was examined with different
amounts of exposure to the training stimuli, varying from 30 seconds to 5 minutes.
Experiments 13-18 replicated selected experiments from Experiments 1-12 with the potential testing effect
controlled.
Peña et al. (2002) demonstrated that 2 minutes of exposure was sufficient for the
acquisition of syllable NADs. The current study focused more closely on the short
exposure time, tested acquisition after 30 seconds, 1 minute, and 2 minutes of exposure,
and demonstrated a new lower limit on the speed of acquisition. Experiment 10, which
was confirmed by Experiment 18, provided the key finding of the current study: the
acquisition of NADs of syllables is rapid, since participants considered rule-words to be
more familiar than part-words after only 1 minute of exposure to the stimuli. This result
supports rapidity as a key feature of positional learning in the dual mechanisms account.
Participants considered rule-words to be more familiar in Experiment 1, with the
stimuli synthesized using the French diphone database, even though there were no pauses
separating the words. Peña et al. (2002) did not report results under this condition. The
acquisition of syllable NADs in a continuous stream may be the result of phonological
cues from the edge syllables, and the different features of the French consonants
(compared to English consonants) added to the prominence of the edge syllables, when
perceived by English speakers. If this is the case, it suggests the necessity of controlling
for phonological cues from the edge syllables, and using a diphone database which more
closely resembles English to synthesize the stimuli. Results from Experiment 13
support this possibility, because when the German diphone database (de7, which is
considered more intelligible for English speakers) was used, keeping everything else
constant, preference for rule-words disappeared. The result from Experiment 13 is
consistent with results in Endress and Bonatti (2007), which found no preference between
part-words and words with attested edge syllables (see fn. 1) in Italian speakers. When the
phonological cues were further removed from the stimuli, preference switched to part-
words, as in Experiment 5. The step-wise change from a preference for rule-words
(French diphone database, phonological cues), to neutral preference
(German diphone database, phonological cues), to a preference for part-words (German diphone database,
no phonological cues) suggests that the phonological cues from the edge syllables biased
the participants towards rule-words.
1. In Endress and Bonatti (2007), the comparison was between class-words and part-words. Class-words are
defined as tri-syllabic sequences with the edge syllables from the attested positions but in an unattested
pairing, and the middle syllable from an edge position. If the nine artificial words are puliki, pufoki, puraki,
beliga, beraga, befoga, talidu, tafodu, and taradu, an example of a class-word would be pubega.
Experiment 9 shows that participants considered part-words to be more familiar
after 1 minute of training when there were no pauses bracketing the syllable sequences.
This result does not support the dual mechanisms account’s proposition that, because the
traces of memory are built over many instances, TPs learning occurs slowly (Endress &
Bonatti, 2007; Endress & Mehler, 2009). Peña et al. (2002) showed that there was no
preference between part-words and rule-words after 10 minutes of exposure. Endress and
Bonatti (2007) made similar findings comparing class-words (see fn. 1) with part-words.
Based on the discussion in the previous paragraph, the phonological cues present in
Endress and Bonatti (2007) might have biased participants towards rule-words; as a result,
preference for part-words was diminished under the short-exposure condition. This study
suggests that TP-based learning might be faster than previously proposed.
Taken together, this study observed that the NADs of syllables could be acquired
after only one minute of exposure to the training stimuli, and that TP learning is faster
than previously described.
CHAPTER 4 – NON-ADJACENT DEPENDENCIES OF ABSTRACT
IMAGES AND TONES
The previous chapter confirmed the key findings in Peña et al. (2002) that
participants could acquire syllable NADs without perceptual cues, provided there were
pauses separating the sequences. So far, it has been suggested that pauses facilitate the
learning of the NADs of syllables and body movements (Endress & Wood, 2011). It is
unknown if pauses facilitate the acquisition of non-linguistic NADs in general. This chapter
tests the domain generality of such learning by testing the acquisition of NADs of both
acoustic and visual non-linguistic stimuli: pure tones and abstract images.
Experiments 19-25
Methods
Subjects. Participants were all USC students, native English speakers. They
received credits fulfilling course requirements as compensation. The numbers of
participants for each condition are listed in Table 4.
Stimuli. The structure of the training and testing stimuli used in this study was
similar to that in Peña et al. (2002), with each syllable replaced with a pure tone or
an abstract image.
In the tone study (Experiments 19-24), the training sequence and test items had the
same structure as Peña et al. (2002), with each pure tone substituting for one syllable.
The durations of the pure tones and pauses were consistent with studies using syllables, with a
tone duration of 230 ms and pause duration of 50 ms.
Experiments 19-22 used the same sets of tones. The tonal NADs in these studies
were “E4_A#4”, “F4_D4”, and “C4_D#4”, with middle tones “G4”, “A4”, and “B4”.
The frequencies of the octave notes are: C4 (261.6Hz), D4 (293.7Hz), D#4 (311.1Hz), E4
(329.6Hz), F4 (349.2Hz), G4 (392.0Hz), A4 (440Hz), A#4 (466.2Hz), B4 (493.9Hz)
(“Scientific Pitch Notation”, n.d.).
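The listed frequencies are consistent with twelve-tone equal temperament tuned to A4 = 440 Hz, in which each semitone step multiplies the frequency by 2^(1/12). The Python sketch below recomputes them; the helper name and the note-name parsing are illustrative assumptions, and this is not the MATLAB code used to synthesize the stimuli.

```python
# Semitone distance of each pitch class from A within the same octave number.
SEMITONES_FROM_A = {"C": -9, "C#": -8, "D": -7, "D#": -6, "E": -5, "F": -4,
                    "F#": -3, "G": -2, "G#": -1, "A": 0, "A#": 1, "B": 2}


def note_frequency(note, a4_hz=440.0):
    """Equal-tempered frequency in Hz for a note such as 'C4' or 'A#4'."""
    name, octave = note[:-1], int(note[-1])
    steps = SEMITONES_FROM_A[name] + 12 * (octave - 4)
    return a4_hz * 2 ** (steps / 12)


for note in ["C4", "D4", "D#4", "E4", "F4", "G4", "A4", "A#4", "B4"]:
    print(note, round(note_frequency(note), 1))
```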
Experiments 23 and 24 are similar to Experiments 19 and 20, but the duration of
each tone was increased to 300 ms, and of each pause to 100 ms. To avoid the possibility
that participants might prefer particular chords, there were 10 different sets of training
and testing materials in Experiments 23 and 24, and participants were randomly assigned
to one of the ten training sets. The NADs and the middle tones of the ten sets are: Set 1
(E4_D#4, G4_F4, D4_C4; B4, A#4, A4), Set 2 (A4_G4, B4_A#4, F4_E4; C4, D4, D#4),
Set 3 (G4_F4, A#4_A4, E4_D#4; C4, D4, B4), Set 4 (F4_E4, A4_G4, D#4_D4; A#4, B4,
C4), Set 5 (D#4_D4, F4_E4, C4_B4; A4, G4, A#4), Set 6 (D4_C4, E4_D#4, B4_A#4; F4,
G4, A4), Set 7 (C4_B4, D#4_D4, A#4_A4; E4, F4, G4), Set 8 (B4_A#4, D4_C4, A4_G4;
D#4, E4, F4), Set 9 (C4_D4, G4_A4, A#4_B4; D#4, E4, F4), and Set 10 (D4_A4, G4_B4,
C4_A#4; E4, D#4, F4). All of the tone sequences were synthesized with
MATLAB_R2013a.
In the image study (Experiment 25), the training sequence and test items had the
same structure as Peña et al. (2002), with each image substituting for one syllable. The
images used were nine glyphs from the Sabaean alphabet and Ndjuka syllabary (Agers,
n.d.), following Turk-Browne et al. (2008). The images were 474 x 553 pixels, black
on a white background, with a duration of 500 ms and 51 ms pauses between images.
The images and their corresponding syllables are shown below in Figure 7.
Figure 7. Images used in the study, and their corresponding syllables.
Procedures. Participants were instructed to listen to a sequence of tones
(Experiments 19-24) or watch a sequence of images (Experiment 25) for a period of time,
paying close attention to the sequences as they would be tested on them after listening or
watching. After completing the training, participants performed a choice test, in which
they would hear or see two sequences of tones or images and indicate which one was
more familiar to them by pressing one of two computer keys.
PsyScope B53D was used to display stimuli and record responses in all of the
experiments in this study. The experiments were run on a 13-inch MacBook Pro in a quiet
room free from distractions.
Results
The experimental data were analyzed using the same methods as in Experiment 1.
The results are shown in Table 4 and Figure 8.
With 2 minutes of pure tone sequences as training stimuli, participants preferred
rule-triplets on average in 53.33% (SD = 9.69) of the test trials under
the no-pause condition (Experiment 19), and in 54.58% (SD = 9.62) under the with-pause
condition (Experiment 20). There was no significant preference for either
rule-triplets or part-triplets under either the no-pause condition (Experiment 19) (ß =
0.15, SE = 0.15, z = 0.98, p = 0.33) or the with-pause condition (Experiment 20) (ß =
0.22, SE = 0.17, z = 1.28, p = 0.20).
With training on the pure tone sequences increased to 5 minutes, participants
chose rule-triplets as more familiar in 51.71% (SD = 7.35) of the trials under the
no-pause condition (Experiment 21), and in 55.31% (SD = 6.85) under the with-pause condition
(Experiment 22). Again, there was no significant preference for either rule-triplets
or part-triplets under either the no-pause condition (Experiment 21) (ß = 0.08, SE
= 0.15, z = 0.50, p = 0.61) or the with-pause condition (Experiment 22) (ß = 0.25, SE =
0.16, z = 1.57, p = 0.12).
When the duration of each tone was lengthened to 300 ms and the pauses (in
Experiment 24) were also increased to 300 ms, participants still showed no
preference between rule-triplets and part-triplets, either in Experiment 23 (no pauses),
Mean Percentage = 45.37% (SD = 15.10), ß = -0.20, SE = 0.17, z = -1.19, p = 0.24, or in
Experiment 24 (with pauses), Mean Percentage = 54.34% (SD = 21.52), ß = 0.23, SE = 0.23,
z = 1.01, p = 0.31.
In Experiment 25, which explored acquisition of the NADs of abstract images,
participants considered rule-triplets to be more familiar in 52.94% of trials (SD = 15.02).
The regression model suggested that this preference did not differ from the chance level of 50%,
ß = 0.13, SE = 0.15, z = 0.81, p = 0.42.
Table 4
Results from previous studies testing adult participants’ acquisition of NADs of tones and images. The design of the studies was
similar to Peña et al. (2002). Participants’ preference for rule-triplets (with NADs) was coded as 1, preference for part-triplets
(without NADs) as 0, and results were analyzed using the Mixed Logit Model.

| Exp | Type of Stimuli | Number of Repetitions | Training Duration | Pauses | Number of Participants (N) | Mean % Choosing Rule-Triplets (SD) | Intercept (SE) | z | p |
|---|---|---|---|---|---|---|---|---|---|
| 19 | Pure Tone | 20 | ~2 min | No | 15 | 53.33 (9.69) | 0.15 (0.15) | 0.98 | 0.33 |
| 20 | Pure Tone | 20 | ~2 min | Yes | 17 | 54.58 (9.62) | 0.22 (0.17) | 1.28 | 0.20 |
| 21 | Pure Tone | 50 | ~5 min | No | 13 | 51.71 (7.35) | 0.08 (0.15) | 0.50 | 0.61 |
| 22 | Pure Tone | 50 | ~5 min | Yes | 23 | 55.31 (6.85) | 0.25 (0.16) | 1.57 | 0.12 |
| 23 | Pure Tone (10 sets) | 20 | ~2 min | No | 15 | 45.37 (15.10) | -0.20 (0.17) | -1.19 | 0.24 |
| 24 | Pure Tone (10 sets) | 20 | ~2 min | Yes | 16 | 54.34 (21.52) | 0.23 (0.23) | 1.01 | 0.31 |
| 25 | Image | 20 | ~4.5 min | Yes | 17 | 52.94 (15.02) | 0.13 (0.15) | 0.81 | 0.42 |
Figure 8. Results of Experiments 19-25. Each dot represents the percentage of preference for rule-triplets for each participant, and the
diamond represents the group mean. The dotted line indicates chance level (50%).
[Figure 8 panel labels. Tones: 2 min, No Pause; 2 min, With Pause; 5 min, No Pause; 5 min, With Pause; Longer Duration of Each Tone, 2 min, No Pause; Longer Duration of Each Tone, 2 min, With Pause. Images: With Pause.]
Conclusions
This chapter aimed to test if bracketing sequences with pauses would assist the
acquisition of NADs of non-linguistic stimuli, both acoustic and visual. The study
controlled the perceptual saliency of the stimuli, by using tones from the same octave,
and using black line drawings on a white background. The study further controlled the
potential influence of semantics by using meaningless abstract images as visual stimuli.
With non-linguistic stimuli, participants failed to acquire the NADs even under the with-
pause condition, a condition under which syllable NADs have been successfully acquired
in past studies.
So far, pauses have been shown to facilitate acquisition of NADs of syllables and
human body movements, but not of pure tones or abstract images. The next chapter will
explore why pauses lead to acquisition of the NADs of certain types of stimuli
but not others.
CHAPTER 5 – TWO HYPOTHESES ON MECHANISMS UNDERLYING
ACQUISITION OF NON-ADJACENT DEPENDENCIES
The above studies suggest that: (1) Perceptual cues facilitate the acquisition of
NADs; (2) pauses between sequences with NADs also facilitate the acquisition of NADs,
but only with respect to syllables and human body movements, not with respect to non-
linguistic acoustic stimuli, such as tones or noises. Two questions arise. First, what role
do the pauses play in the acquisition of the NADs of stimuli without perceptual cues?
Second, why do pauses facilitate the acquisition of the NADs of certain types of stimuli
(syllables and body movements) but not of others (abstract images and tones)?
One possible answer to the first question is that the pauses bracketed syllable
sequences; as a result, syllables at the beginning and ending positions occupied special
edge positions. Henson’s Start-End Model (SEM) proposed that, in sequence learning,
the representation system places a “start marker” and an “end marker” in each sequence
(Henson, 1998, 1999) and that the items’ positions are recorded as their distance to one of
the two markers. Building on this idea, the dual mechanisms account (Endress & Bonatti,
2007; Endress et al., 2009; Endress & Mehler, 2009) proposed that there are two learning
mechanisms at work during statistical learning of syllable sequences in Peña et al. (2002),
one rapidly recording syllables’ positions relative to the edges of the sequences, and
another one tracking TPs among syllables. According to this hypothesis, each learning
mechanism produces its own distinct learning outcome, and the two mechanisms taken
together produce the observed behaviors. The dual mechanisms account plausibly
explains the statistical learning of syllable NADs.
So far, no compelling answer has been proposed to the second question regarding
the reason why pauses facilitate the acquisition of NADs of some stimuli and not others.
Thus, the mechanisms involved in detecting patterns in NADs do not appear to be
engaged in all auditory stimuli, at least not under similar exposure conditions. This raises
the possibility that the mechanisms may be specific to language processing. However, in
the visual domain, the NADs of body movements were acquired (Endress & Wood,
2011). This naturally invites speculation that learning in these two domains is governed
by the same underlying mechanism. But if it is, that leads to a puzzle: what kind of
learning mechanism would be engaged by speech and body movements, but not by
non-linguistic auditory sequences?
One possibility is that syllables and body movements fluidly transform from one
stimulus to the next (unlike distinct tones and images which shift sharply from one
stimulus to the next), which facilitates the acquisition of the NADs. Sensitivity to the
NADs of movements is not particular to human body movements, and the NADs of any
movements at the beginning and ending of a continuous sequence of motion can be
learned. It has been shown that human perception is generally sensitive to dynamics of
an object. In an object recognition task in Vuong & Tarr (2004), participants were first
presented with a rotating object, and then with a single view of the object, and were then
asked to indicate whether the test object was the same rotating object from the training.
Participants responded faster and more accurately when the test views were from the
beginning or the end of the rotation. Participants were even sensitive to unattested views
that preceded or followed the trajectory of rotation in the training. In Vuong & Tarr
(2004), each object performed only one movement, rotating in a single direction, but it is
equally possible that higher familiarity with the particular movements at the beginning
and ending of a continuous series of movements would also result in the acquisition of
NADs.
Another possibility is that the acquisition of NADs of speech and that of body
movements share common cognitive processes, given that syllable sequences may be
perceived as sequences of corresponding vocal movements. For example, the motor
theory of speech perception (Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967;
Liberman & Mattingly, 1985) posits that in perceiving speech, human beings map the
acoustic signal to articulatory gestures. Brain imaging studies and Transcranial Magnetic
Stimulation (TMS) studies have shown that perception and production of the same
syllables activate the same premotor areas (Fadiga, Craighero, Buccino, & Rizzolatti,
2002; Pulvermüller, Huss, Kherif, Moscoso del Prado Martin, Hauk, & Shtyrov, 2006;
Watkins, Strafella, & Paus, 2003; Wilson, Saygin, Sereno, & Iacoboni, 2004). Although
these studies were largely based on the perception of isolated phonemes or syllables, it
may be reasonably hypothesized that this theory would apply equally to the perception of
syllable sequences. Likewise, the representation of visually perceived human movements
involves the activation of motor representations by the perceiver (e.g., Wilson &
Knoblich, 2005). Thus, it is possible that a kind of motor sequence learning is the
common underlying mechanism supporting the statistical learning of syllables and body
movements. The beginning and ending of the motor sequences are prominent since they
mark the change of status, from stillness to motion and from motion to stillness. This
might facilitate the detection of dependency patterns in which the beginnings and endings
take part.
Study Synopsis
The following experiments will examine each of these two possibilities as
potential explanations for the mechanism underlying NAD acquisition. It is worth noting
that the two explanations are not mutually exclusive. It is possible that they jointly
contribute to the acquisition of NADs.
The first question is whether the NADs of non-human movements can be acquired.
Experiment 26 replicated the Endress & Wood (2011) findings regarding human
movements, and Experiment 27 tested participants’ acquisition of NADs of sequences of
objects moving in a manner which would be biologically impossible for human beings,
but is nonetheless continuous and coherent. Next, whether continuous movement is
critical for the acquisition of NADs was explored. Experiment 28 tested NAD learning
with sequences of static images of body postures, and Experiment 29 tested learning of
static images of objects. Experiments 26 and 28 use stimuli that can be mapped onto
representation of body movements while Experiments 27 and 29 do not. Thus, these four
experiments investigated the continuous movement hypothesis and the motor sequence
learning hypothesis.
Experiment 26: NADs of Body Movements (Replication of Endress and Wood, 2011)
Experiment 26 replicated the finding in Endress and Wood (2011) that adult
participants were capable of acquiring NADs of human body movements.
Methods
Participants. Twenty undergraduate students from the University of Southern
California (USC) were recruited from the USC Psychology Subject Pool. Their
participation in the experiment was compensated with course credits.
Apparatus and Stimuli. The original episodes of body movements from Endress
and Wood (2011) were used to create training and testing stimuli in this experiment. In
each original episode, an animated male actor performed a movement (e.g., bending), and
the movements started and ended in the same neutral, still, standing position, referred to
herein as the “neutral position” consistent with Endress and Wood (2011). The actor
pausing briefly in the neutral position served as the interval pause between sequences of
movements.
There were two major differences between the stimuli in the current experiment
and in Endress and Wood (2011). First, the pauses between sequences in this experiment
were a blank screen, instead of the neutral position. The reason for this change was the
fact that neutral positions could not be used as intervals in Experiments 28 and 29, and
the study sought to minimize the differences between the designs of the experiments.
Second, the parameters of the visual presentation were different. In this experiment, each
movement episode lasted 625 ms with 15 frames presented at a frame rate of 24
frames/second. Each frame was sized 480×468 pixels.
Training. The structure of the training and testing stimuli in both experiments
was similar to that in Peña et al. (2002), with nine syllables replaced with nine body
movements. Nine triplets were created by pairing each of three pairs of NADs (a_b, c_d,
e_f, with each letter representing a body movement) with each of three middle items
(x,y,z), and 20 repetitions of the nine triplets were randomly concatenated into a
continuous visual stream. A sequencing constraint was imposed such that each triplet
was immediately followed by a triplet with a different NAD, and a different middle item.
Each triplet was presented for 1875 ms, with a 125 ms pause between the triplets,
resulting in an entire video training sequence of 6 minutes and 22 seconds. The first 5
seconds of the video faded in and the last 5 seconds of the video faded out.
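The triplet construction and sequencing constraint described above can be sketched as follows. This is an illustrative Python sketch, not the procedure actually used to build the stimuli: the labels a-f and x-z follow the notation in the text, while the function name `build_stream`, the count-weighted random draw, and the restart-on-dead-end strategy are assumptions introduced here.

```python
import random
from collections import Counter

FRAMES = [("a", "b"), ("c", "d"), ("e", "f")]  # the three NAD pairs a_b, c_d, e_f
MIDDLES = ["x", "y", "z"]                      # the three middle items
TRIPLETS = [(first, mid, last) for first, last in FRAMES for mid in MIDDLES]


def build_stream(repetitions=20, seed=0, max_attempts=10_000):
    """Return a list of triplets in which every triplet occurs `repetitions`
    times and each triplet is followed by one with a different NAD pair
    and a different middle item."""
    rng = random.Random(seed)
    target_length = repetitions * len(TRIPLETS)
    for _ in range(max_attempts):
        remaining = Counter({t: repetitions for t in TRIPLETS})
        stream = []
        while len(stream) < target_length:
            prev = stream[-1] if stream else None
            candidates = [
                t for t in TRIPLETS
                if remaining[t] > 0
                and (prev is None or (t[0] != prev[0] and t[1] != prev[1]))
            ]
            if not candidates:
                break  # dead end: discard this attempt and start over
            # Favor triplets with more repetitions left to keep counts balanced.
            weights = [remaining[t] for t in candidates]
            choice = rng.choices(candidates, weights=weights, k=1)[0]
            remaining[choice] -= 1
            stream.append(choice)
        else:
            return stream  # loop finished without hitting a dead end
    raise RuntimeError("no valid ordering found; increase max_attempts")


if __name__ == "__main__":
    stream = build_stream()
    print(len(stream), stream[:4])
```

Because the first element of a triplet identifies its NAD pair, comparing first elements is enough to check that consecutive triplets come from different pairs.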
Testing. After exposure to the training set, participants were tested on their
preference for rule-triplets or part-triplets. As in the previous experiments, there were 36
test pairs contrasting rule- and part-triplets. The presentation order of the two types of
sequences within a pair and the response buttons were counterbalanced.
Figure 9. Frames excerpted from the body movement animations used in Experiment 26
(depicting the maximum extent of movement), which were also the still images used in
Experiment 28. The stimuli are the original stimuli used in Endress and Wood (2011).
Figure 10. Diagram of the design of Experiment 26; the following experiments have the
same structure as Experiment 26, except that each body movement is replaced with object
movement (Experiment 27), still body posture (Experiment 28), and still object posture
(Experiment 29). During training, participants viewed videos of nine triplets of body
movements repeated by an animated actor. The nine triplets were constructed by
exhaustive matching of three pairs of Non-Adjacent Dependencies (NADs) and three
middle items. The color bars below the pictures illustrate the three pairs of NADs (blue,
green and purple) and three middle items (red) (the color bars were not part of the
stimuli). In the test, participants were asked to indicate which one is more familiar to
them, (1) part-triplets, episodes from the video spanning two triplets, or (2) rule-triplets,
NADs paired with unattested middle items.
Results and Discussion
Participants’ responses were coded as binary variables, with preference for rule-
triplets coded as 1 and preference for part-triplets coded as 0. A logistic regression model
was used to compare participants’ choice with the chance level (0.5), with the binary
responses as the dependent variable; the model controlled for variance based on
participants and test questions. For each experiment, participants’ mean preference for
rule-triplets over 36 testing pairs, the standard deviation and mean percentage of rule-
triplet preference, as well as the intercept (β), z, and p-value from the logistic regression,
are listed in Table 5. Intercept (β) indicates deviation from the chance level in the choice
tests between rule-triplets and part-triplets, and the p-value indicates whether the
preference for rule-triplets is significant. In Experiment 26, out of 36 test trials, the
average number of trials in which participants preferred rule-triplets was 22.75 (SD = 6.09),
which is approximately 63.19%. Logistic regression of Experiment 26 yielded a
significant intercept (ß = 0.61, SE = 0.18, z = 3.37, p < .001), indicating participants
considered rule-triplets to be more familiar. Figure 11 shows each participant’s
percentage of preference for rule-triplets in this experiment. The above analysis was
done using R 3.0.2 GUI 1.62 Snow Leopard build (6558), and lme4 R package, version
1.0-4, and graphed with graphics version 3.0.2.
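To make the interpretation of the intercept concrete, the sketch below converts a log-odds intercept into a probability and computes a Wald z statistic with its two-sided p-value. It is a hedged illustration in Python, not a refit of the mixed logit model (which was estimated in R with lme4 and included random effects for participants and test items); the function names are assumptions introduced here. Recomputing z from the rounded values β = 0.61 and SE = 0.18 gives about 3.39 rather than the reported 3.37, presumably because the reported z was computed from unrounded estimates.

```python
import math


def inv_logit(beta):
    """Convert a log-odds value to a probability; beta = 0 maps to 0.5 (chance)."""
    return 1.0 / (1.0 + math.exp(-beta))


def wald_test(beta, se):
    """Wald z statistic and two-sided p-value for H0: beta = 0."""
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


if __name__ == "__main__":
    beta, se = 0.61, 0.18  # rounded intercept and SE reported for body movements
    z, p = wald_test(beta, se)
    print(f"mean proportion from raw counts: {22.75 / 36:.4f}")
    print(f"probability implied by intercept: {inv_logit(beta):.3f}")
    print(f"z = {z:.2f}, p = {p:.4f}")
```

The probability implied by the intercept (about 0.648) need not equal the raw mean proportion (about 0.632), since the intercept of a mixed model describes a typical participant and item rather than the simple average over all responses.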
This experiment confirmed the finding in Endress and Wood (2011) that
participants’ preference for rule-triplets was above chance level, suggesting that
participants considered rule-triplets to be more familiar. Given that the participants were
never exposed to rule-triplets during training, and that rule-triplets only shared NADs in
common with the training triplets, the results suggest that participants were sensitive to
the NADs of body movements.
Table 5
Mean preference for rule-triplets over 36 trials (with standard deviation), mean % of
preference for rule-triplets, Intercept (β), Standard Error of Intercept (SE), z, and p-value
for each experiment, using a mixed logit model.

| Experiment | Mean preference for rule-triplets over 36 trials (SD) | Mean % of preference for rule-triplets | Intercept (β) (SE) | z | p |
|---|---|---|---|---|---|
| 26. Body Movements | 22.75 (6.09) | 63.19 | 0.61 (0.18) | 3.37 | <.001 |
| 27. Object Transformations | 23.1 (6.45) | 64.17 | 0.66 (0.19) | 3.49 | <.001 |
| 28. Body Postures | 21.15 (4.12) | 58.75 | 0.64 (0.11) | 3.41 | <.001 |
| 29. Object Postures | 16.8 (5.27) | 46.67 | -0.14 (0.14) | -1.04 | 0.3 |
Figure 11. Results of Experiments 26-29. Each dot represents percentage of preference
for rule-triplets for each participant, and the diamond represents the group mean. The dotted
line indicates chance level (50%).
One possible explanation for participants’ sensitivity to the NADs of the body
movements (Experiment 26) is that the representation of body movement sequences is
mapped onto motor representations of those sequences (Wilson & Knoblich, 2005),
hence participants were sensitive to the beginnings and endings of the motor sequences.
Another possibility is that participants were equally sensitive to the NADs of the
movements of any actor. Experiment 27 tested whether participants were similarly capable of
learning NADs of object movements and transformations.
[Figure 11 panel labels: Experiment 26, Body Movements; Experiment 27, Object Movements & Transformations; Experiment 28, Body Postures; Experiment 29, Object Postures. Axis label: Percentage Choosing Rule-triplets.]
Experiment 27: NADs of Object Movements and Transformations
Experiment 27 explores if the NADs of object movements and transformations
could also be acquired. In this experiment, the actor was a non-human object, and its
movements and transformations could not be plausibly performed by human actors.
Methods
The methods were the same as in Experiment 26, except that each body
movement was replaced with an animated object movement or transformation, as shown
in Figure 12. The red blanket-shaped object (as in the cell titled “Neutral Position”)
performed movements that cannot be mapped onto human motor representations. The
videos of object transformations were created using 3ds Max.
Figure 12. Depiction of the object transformations and neutral position used in
Experiment 27, and the images used in Experiment 29. In Experiment 27, movement into
each position was continuous from the flat, neutral position.
Results and Discussion
In Experiment 27, the average preference for rule-triplets was 23.1 (SD = 6.45)
over 36 test trials, which is about 64.17% (see Table 5 and Figure 11). Analysis of
Experiment 27 yielded a significant intercept (ß = 0.66, SE = 0.19, z = 3.49, p < .001),
suggesting participants considered rule-triplets more familiar.
This result suggests that continuous movements aid the acquisition of NADs
regardless of whether the movements can be mapped to motor representations. Bracketed
continuous movements provide sufficient packaging of sequences to facilitate learning
patterns involving the sequence beginning and end.
The following two studies further examine this idea by testing whether
participants are equally likely to learn the NADs of static images of body postures
(Experiment 28) and object postures (Experiment 29), rather than continuous movements.
It is possible that they would fail to do so in both Experiment 28 and 29 due to the lack of
continuous movement. It is also possible they would succeed in both experiments, which
would suggest that participants are sensitive to the NADs of images as well. Another
possibility is that participants would fail in Experiment 29, but succeed in Experiment 28,
because the images of body postures provide sequences of implied body actions that are
perceived as continuous movement (Shiffrar & Freyd, 1990; Urgesi, Moro, Candidi, &
Aglioti, 2006), which in turn triggers motor learning of continuous movements from one
posture to the next, as in Experiment 26.
To minimize the differences between the stimuli of Experiments 28 and 29 and
the stimuli of Experiments 26 and 27, the 9th frames of the movement episodes in
Experiments 26 and 27 were used as the image stimuli for the next two studies.
Experiment 28: NADs of Body Postures
Experiment 28 examines if participants can acquire the NADs of the images of
human body postures as they did in Experiment 26.
Methods
The methods were the same as in Experiment 26, except that each body
movement was replaced with the 9th frame of the video of each movement. Each image
was sized 480×468 pixels and was presented for 625 ms, the same as the duration of
each movement video in Experiments 26 and 27. The between-triplet pauses were also
125 ms. The total duration of the training sequence was 6 minutes and 22 seconds.
Results and Discussion
The average preference for rule-triplets in Experiment 28 was 21.15 (SD = 4.12),
or 58.75% (see Table 5 and Figure 11). Logistic regression yielded a significant intercept
(ß = 0.64, SE = 0.11, z = 3.41, p < .001), indicating that participants considered rule-triplets to be
more familiar and that they were sensitive to the NADs of both the body movements and
body postures.
Still, it is unknown if these findings would extend to images in general, or if
participants are sensitive to body postures because viewers interpret the posture
sequences as continuous movement, which then functions as in Experiment 26.
Experiment 29 tested whether participants could also acquire the NADs of images of the
objects in different postures.
Experiment 29: NADs of Object Transformations and Postures
Experiment 29 explores if participants can acquire the NADs of object postures as
they did with human postures in Experiment 28.
Methods
The methods were the same as in Experiment 26, except that each body
movement was replaced with a static image of an object posture, which was the 9th frame
of the corresponding video in Experiment 27.
Results and Discussion
In Experiment 29, the average preference for rule-triplets was 16.8 (SD = 5.27),
around 46.67% (see Table 5 and Figure 11). The intercept from logistic regression was
not statistically significant (ß = -0.14, SE = 0.14, z = -1.04, p = 0.3), suggesting that
participants had no preference between rule-triplets and part-triplets, and failed to
distinguish between them.
The outcome that participants successfully learned the NADs of static body
postures, but not of static object postures, suggests two points. First, with simple objects,
as opposed to human postures, packaging the sequences through continuous movement
appears to be necessary for acquiring NADs. Absent such packaging, participants failed
to learn the NADs of the images. Second, processing and representation of body
movement sequences appears to be special. As discussed above, mapping body postures
onto motor systems creates a perception of continuous body movements (Shiffrar &
Freyd, 1990; Urgesi, et al., 2006), which could have activated motor representations
similar to those activated when continuous motions were viewed, thereby achieving the same packaging
of the movement-triplets that highlights the beginnings and ends. However, the
representation of object postures did not support such mappings, so acquisition of NADs
of discrete objects failed.
Discussions
The inquiry into the representation of body movements and continuous object
movements stemmed from comparable results obtained in studies of the NADs of
syllables (Peña et al., 2002) and body movements (Endress & Wood, 2011). Both studies
suggested that dependency rules involving NADs of syllables and body movements could
be learned, with the condition that the sequences with NADs were bracketed by pauses
(Peña et al., 2002).
The four experiments in the current study probed two questions regarding visual
statistical learning of NADs: (1) if such learning pertains only to stimuli in the form of
human movements; (2) if continuous movement has an impact on participants’
acquisition of the NADs of non-human object stimuli.
Experiment 26 replicated one major finding in Endress and Wood (2011) and
confirmed that participants could acquire NADs of body movements under the current
experimental conditions. Experiment 27 probed if movements’ susceptibility to being
mapped onto the human body was a necessary condition for learning their NADs, by
replacing the human actor with an object performing movements that could not be
represented as human body movements, and found that participants similarly learned the
NADs of the object movements. Experiments 28 and 29 used static images that depicted
the maximal extent of the movements depicted in Experiments 26 and 27.
Experiments 26 (with body movements) and 27 (with object movements)
suggested that statistical learning of movement NADs is domain general, and not limited
to human body movements, since participants successfully acquired the NADs of object
movements/transformations as well. These two experiments demonstrated the general
capacity of the human cognitive system to track and learn the beginning and ending
movements of a continuous sequence of movements, regardless of the actor performing
them, or whether the movements were human-like. However, this does not mean that the
representation and processing of movements were the same for body movements and
object movements/transformations. The differing results of Experiment 28 (with static
human postures) and Experiment 29 (with static object postures) indicated differences in
the underlying processing of static images.
In Experiment 28, participants successfully learned the NADs of the static body
postures of the same actor as in Experiment 26. While the results could be explained by a
separate representation system for object sequences and body postures, the contrasting
results suggest the involvement of the motor system in visual sequence learning. The
discrete images of static body postures, once mapped onto a representation of the
observer’s motor system, are perceived as continuous body movements (Shiffrar & Freyd,
1990; Urgesi, et al., 2006). This may in turn activate motor representations similar to
those which are activated when continuous motions were viewed, thereby achieving the
same packaging of the movement-triplets that highlights the beginnings and ends, leading
to successful learning of NADs, as in Experiment 26. In other words, the motor system
facilitates the linkage of distinct body postures into coherent movements. In fact, viewers
of the static body posture sequences themselves reported that the sequences created a
sense of continuous movement. With respect to object postures, since they cannot be
mapped onto the motor system, they are still perceived as separate images of an object.
Therefore, participants failed to learn the NADs of these different objects.
Conclusion
Packaging of sequences into a coherent unit is critical for learning the Non-
Adjacent Dependencies of the stimuli. Motor sequence learning provides a way of
representing the sequences as a coherent unit. Based on this account, the observed results
can be explained as follows: (1) object sequences had fluid transformations over time, so that
acquisition was successful; (2) body postures were represented as continuous
transformations in the motor system, so that acquisition was also successful; (3)
acquisition of the NADs of body movements was successful as a result of the previous
two reasons; (4) acquisition of the NADs of object postures failed due to the lack of fluid
transformations from one posture to the next.
CHAPTER 6 - FURTHER EXPLORATION
Introduction
The previous chapter proposed that the packaging of stimuli into a coherent
sequence with fluid transformation is critical for successful acquisition of the Non-
Adjacent Dependencies of movements. Pauses played the role of bracketing the
sequences. The coherency can result from the stimuli’s fluid transformations over time,
or from the participant’s subjective perception that such transformations are occurring.
Experiments in the previous chapter provided preliminary findings supporting this claim
by demonstrating (1) acquisition of the NADs of movements of an object, based on its
visual fluid transformations, (2) acquisition of the NADs of images of body postures,
because of perceived movement connecting the images, and (3) acquisition of the NADs of human body movements,
which combine these two factors. By contrast, the NADs of images of an object’s
different postures (similar to human body postures) were not acquired under the same
training and testing conditions, possibly due to the absence of fluid transformations,
actual or perceived. Participants’ failure to acquire the NADs of abstract images in
Chapter 4 may be explained on the same basis. This chapter will further test this
explanation. In the following experiments, the stimuli were manipulated so that images
of objects created the perception of fluid transformation, while human body movements
no longer generated such perception, and the differences between experiments in this
chapter and experiments in the previous chapter were minimized. The experimental
prediction was that, with the new set of stimuli in this chapter, participants would acquire
the NADs of images, but not of human body movements or images of body postures.
Images
Previous studies have shown that the processing of sequences of identifiable
images (i.e., line drawings of real-world objects or animals) may involve semantic
processing or verbalization of the image sequences, which cannot occur with abstract
images. In Otsuka et al. (2013), participants demonstrated similar acquisition of the
sequences of identifiable images under all three testing conditions: (1) training and
testing stimuli composed of the same set of images; (2) training and testing stimuli which
used different sets of images, but the objects depicted in the images were from the same
basic-level categories (e.g., types of chairs); (3) training stimuli composed of images of
objects, and testing stimuli which were words for the objects shown in the training. The
third condition provided the strongest evidence that additional processing, apart from
visual processing, had taken place during exposure to the visual sequence, given that the
words and the images did not share any common visual features. Otsuka et al. (2013)
proposed that the semantic grouping of the training sequences contributed to the
comparable results under condition 1 (both training and testing sequences used the same
set of objects) and condition 3 (image sequences in training, word sequences in testing).
These results may also be explained by participants’ verbalization of the objects’ basic-
level labels, which could be stored in a phonological loop (Baddeley, Chincotta, &
Adlam, 2001), and could involve similar motor sequence learning of pronouncing the
object labels. The key point is that with identifiable images, additional mental
representation of those images, such as semantic processing or verbalization of the labels
of the images, may serve as a means to group the sequences.
The foregoing finding that the semantic representation or verbalization of images’
“labels” provides another way of bracketing the sequences gives rise to the
possibility that acquisition of the NADs of images could be enabled by replacing the
abstract images with images depicting identifiable objects. Experiment 30 examines this
hypothesis by testing the acquisition of the NADs of images in sequences, where the
abstract images used in Experiment 29 were replaced with images depicting 3D objects
(e.g., cubes), while keeping other experimental conditions the same. Results from this
experiment will test the hypothesis that packing the stimuli into a coherent unit, which
can be realized in various ways, is critical for acquisition of the NADs.
Human Body Movements
Previous studies have suggested that intrinsic motions and the identities of the
actors performing them are naturally represented together (Kersten, Earles, & Berger,
2015). “Intrinsic motions” refers to motions performed by parts of the objects or actors
(e.g., a man raising arms), as contrasted with “extrinsic motions,” which refers to the
objects’ or actors’ motions relative to their external frames (e.g., a man getting into a
vehicle). Kersten et al. (2015) demonstrated that participants automatically encoded the
intrinsic motions and the actors performing them, even though the participants were not
instructed to pay attention to the binding between specific actors and actions, suggesting
the intrinsic motions are represented and memorized in relation to the actors performing
them.
In Endress and Wood (2011) and Experiments 26 and 28 in this dissertation, the
NADs of movements and body postures were successfully acquired, because the
sequences with the NADs were represented as a whole event, due to either the visual
continuity of the movements, or the perceived continuity of the movements. The
movements in both Endress and Wood (2011) and Experiments 26 and 28 are intrinsic
motions. The findings in Kersten et al. (2015) suggest that if each movement is
performed by a distinctive actor, the continuity of the movements would be disrupted,
because the movements are represented in relation to the distinctive actors performing
them, and that such disruption would reduce or even eliminate the acquisition of the
NADs of movements. Experiment 31 below tests acquisition of the NADs of movements,
under the condition that the movements are performed by different animated human
actors.
Study Synopsis
The two experiments in this chapter test the role of packaging in the acquisition of
NADs. Earlier experiments in this dissertation suggested that participants could not learn
the NADs of images that lack meaning, or are difficult to describe. Experiment 30 will
use images of objects, so that semantic representation of the images or verbalization of
the images may facilitate grouping of the images. It is expected that under this new
condition, participants will successfully acquire the NADs. In Experiment 31, it is expected
that disrupting the packaging of continuous movement sequences, by having a different
actor perform each movement in a given sequence, will prevent the successful
acquisition demonstrated in Experiment 30.
Experiment 30: NADs of Object Images
This experiment tested whether participants could learn the NADs of images depicting
identifiable objects.
Methods
Participants. Twenty undergraduate students from the University of Southern
California (USC) were recruited from the USC Psychology Subject Pool.
Apparatus and Stimuli. The design of the experiment was similar to
Experiment 28, except that each abstract image was replaced with an image of a
monochromatic 3D geometric shape (e.g., cube, sphere, and cone) on a black
background (see Figure 13). Each 3D shape was shown in a distinct color. Each
image was presented for 625 ms, and the pauses between triplets were 125 ms, resulting
in a total duration of 1875 ms for each triplet. The total duration of the training sequence
was 6 minutes and 22 seconds.
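The timing values reported above can be checked with a short calculation. The following is a minimal sketch in Python, not part of the original analysis. The per-image duration, pause duration, and total training duration are taken from the text; the implied number of triplet presentations is an inference that assumes the training sequence consisted only of triplets, each followed by exactly one pause.

```python
# Timing values stated in the text, in milliseconds.
IMAGE_MS = 625
PAUSE_MS = 125
IMAGES_PER_TRIPLET = 3
TRAINING_MS = (6 * 60 + 22) * 1000  # 6 minutes 22 seconds


def triplet_duration_ms(image_ms=IMAGE_MS, n_images=IMAGES_PER_TRIPLET):
    """Duration of one triplet: three images shown back to back."""
    return image_ms * n_images


def implied_triplet_count(total_ms=TRAINING_MS, image_ms=IMAGE_MS, pause_ms=PAUSE_MS):
    """Number of triplet presentations implied by the total duration.

    Assumption (not stated in the text): the training sequence contains
    only triplets, and every triplet is followed by exactly one pause.
    """
    slot_ms = triplet_duration_ms(image_ms) + pause_ms
    return total_ms / slot_ms


if __name__ == "__main__":
    print(triplet_duration_ms())    # 1875
    print(implied_triplet_count())  # 191.0
```

Under these assumptions, each triplet plus its pause occupies 2000 ms, so the reported training duration corresponds to 191 triplet presentations.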
The stimuli were generated using Blender v. 2.71.
Figure 13. Images of the objects used in Experiment 30.
Results
On average, participants considered rule-triplets to be more familiar in 22.70 out
of 36 testing trials (SD = 8.64), around 63.06% of all the trials (see Table 6). The same
logistic regression model used in previous experiments was applied to compare
participants’ preference for rule-triplets with chance level (0.5). Participants’ responses
were coded as binary variables (1 for rule-triplets, and 0 for part-triplets). The results
suggested that participants’ preference for rule-triplets was significantly above chance
level (β = 0.86, SE = 0.36, z = 2.42, p = 0.02). Figure 15 shows each participant’s degree
of preference for rule-triplets in this experiment, expressed as a percentage. The above
analysis was done using R 3.0.2 GUI 1.62 Snow Leopard build (6558), and lme4 R
package, v. 1.0-4, and graphed with graphics v. 3.0.2.
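To make the reported statistics easier to interpret, the sketch below shows how a logit-scale intercept relates to a probability and how a two-sided p value follows from a Wald z statistic. It is an illustrative Python calculation using only the standard library, not the R/lme4 analysis described above, and the function names are hypothetical. Because the reported β and SE are rounded, the ratio β/SE only approximates the reported z.

```python
import math


def inverse_logit(beta):
    """Convert a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-beta))


def wald_z(beta, se):
    """Wald statistic: estimate divided by its standard error."""
    return beta / se


def two_sided_p(z):
    """Two-sided p value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Values reported for Experiment 30.
    print(round(inverse_logit(0.86), 3))  # probability implied by the intercept
    print(round(wald_z(0.86, 0.36), 2))   # close to the reported z of 2.42
    print(round(two_sided_p(2.42), 2))    # 0.02
```

An intercept of 0 corresponds to chance (0.5). If the model included by-participant random intercepts (an assumption here, consistent with the term "mixed logit model"), the probability implied by the intercept describes a typical participant and need not equal the raw group mean of 63.06%.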
Experiment 31: NADs of Body Movements Performed by Different Actors
This experiment tested whether participants could learn the NADs of movements in a
sequence when each movement is performed by a distinct actor.
Methods
Participants. Twenty undergraduate students from the University of Southern
California (USC) were recruited from the USC Psychology Subject Pool. Their
participation in the experiment was compensated with course credits.
Apparatus and Stimuli. The stimuli were modified from the stimuli used in
Zhisen Urgolites’s studies. In each of the nine episodes, a distinct animated human actor
performed a distinct action against a black background; each of the nine actors
corresponded to one of nine specific actions (see Figure 14). Similar to Experiment 26,
15 frames of each episode of movement were presented at the rate of 24 frames/second,
resulting in an episode duration of 625 ms. The pause between triplets was 125 ms, as in
previous studies. The total duration of each triplet was 1875 ms. Every frame
of the videos was sized 480 x 468 pixels.
Figure 14. Frames excerpted from the videos used in Experiment 31 (depicting the
maximum extent of movement).
Training. The structure of the training and testing stimuli in both experiments
was similar to that of Experiment 26. Each triplet was presented for 1875 ms, with 125 ms
pauses between the triplets, resulting in a total duration of 6 minutes 22 seconds for the
training video.
Testing. Same as Experiment 26.
Results
On average, participants considered rule-triplets to be more familiar in 17.55 out
of 36 testing trials (SD = 4.65), around 48.75% of all the trials (see Table 6 and Figure
15). Logistic regression showed that participants’ preference for either rule-triplets or
part-triplets was at chance level (β = -0.05, SE = 0.12, z = -0.43, p = 0.67).
Table 6
Mean preference for rule-triplets over 36 trials (with standard deviation), mean % of
preference for rule-triplets, intercept (β), standard error of the intercept (SE), z, and p value
for each experiment, using a mixed logit model.
Experiment | Mean preference for rule-triplets over 36 trials (SD) | Mean % of preference for rule-triplets | Intercept β (SE) | z | p
Images of Objects | 22.70 (8.64) | 63.06 | 0.86 (0.36) | 2.42 | 0.02
Movements by Different Actors | 17.55 (4.65) | 48.75 | -0.05 (0.12) | -0.43 | 0.67
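The descriptive columns of Table 6 are related by simple arithmetic, which the following Python sketch reproduces as a consistency check. It is an illustration rather than part of the original analysis; the group size of 20 is taken from the Participants paragraphs, and the function names are hypothetical.

```python
TRIALS = 36        # test trials per participant, as stated in the text
PARTICIPANTS = 20  # group size stated in each Participants paragraph


def percent_of_trials(mean_choices, trials=TRIALS):
    """Mean number of rule-triplet choices expressed as a percentage of trials."""
    return round(100.0 * mean_choices / trials, 2)


def total_rule_choices(mean_choices, participants=PARTICIPANTS):
    """Total rule-triplet choices across the group implied by the mean."""
    return mean_choices * participants


if __name__ == "__main__":
    for label, mean in [("Images of Objects", 22.70),
                        ("Movements by Different Actors", 17.55)]:
        print(label, percent_of_trials(mean), total_rule_choices(mean))
```

Both reported means correspond to whole numbers of rule-triplet choices across 20 participants (454 and 351 out of 720 trials), which agrees with the percentages given in the table.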
Figure 15. Results of Experiments 30 and 31. Each dot represents the percentage of
preference for rule-triplets of each participant, and the diamond represents group mean.
The dotted line indicates chance level (50%).
Discussion
The two experiments in Chapter Six were designed to test the hypothesis that
bracketing of the sequences is critical to acquiring the NADs of the stimuli.
In previous studies, it has been shown that participants failed to learn the NADs
of either abstract images or images of unidentifiable 3D objects. It was hypothesized that
acquisition failed to take place because the images could not be packed into a coherent
unit, as there was no transformation connecting the images within each triplet. As
noted above, in contrast with Experiment 31, participants in previous studies were able to
learn the NADs of human body movements or images of body postures, since the
continuity of the body movements and the perceived continuity between images within
the sequences functioned as a packaging mechanism for the stimuli.
If the foregoing hypothesis regarding the role of packaging is correct, it would
imply that the acquisition of the NADs of object images would be possible with a mechanism that
supports packaging of the images within a sequence, while the acquisition of the NADs of human body
movements would be prevented by the disruption of the packaging mechanism.
Experiment 30 tested the acquisition of the NADs of object images by using pictures of objects that could be
easily named. Otsuka et al. (2013) suggested that the act of viewing sequences of easily
identifiable objects may involve grouping those objects, possibly through semantic
grouping or verbalization. Therefore, it was expected that NAD acquisition would
successfully take place with sequences of identifiable objects. Experiment 31 was
designed to test if acquisition of the NADs of human body movements would fail when
the packaging mechanism was removed. Instead of using a single actor to perform all of
the movements, Experiment 31 used nine distinct actors to perform nine different
movements. Based on the findings in Kersten et al. (2015), it was expected that the
participants’ perception of continuous movements would be weakened or even disrupted
entirely, since the representation of each movement would be associated with each
corresponding actor, rather than connected with the next movement by a different actor.
Results from the two studies supported both experimental hypotheses. Their implications
are discussed in the next chapter.
CHAPTER 7 – GENERAL DISCUSSION AND CONCLUSIONS
General Discussion
The goal of this dissertation was to explore the cognitive mechanisms underlying
the acquisition of Non-Adjacent Dependencies (NADs). The dissertation first replicated the
critical findings from Peña et al. (2002), which suggested that brief pauses between
syllable sequences enable the acquisition of their NADs. The dissertation research
confirmed these findings with English speakers, and observed that such acquisition took
place faster than previously expected, implying that the underlying mechanism would
support rapid learning. The research then tested the domain generality of such learning
with images and tones, but did not observe acquisition of NADs. Given successful
acquisition with syllables and human body movements (Endress and Wood, 2011) and
failure with tones and images, two possibilities were advanced as potential explanations
for the mechanism underlying NAD acquisition: (1) fluid transformation is necessary for
connecting the items at the edge positions; and (2) the fundamental mechanism
supporting learning of NADs is motor sequence learning. Four experiments were
designed to test these two hypotheses. It was found that participants learned the NADs of
both human body movements (replicating Endress and Wood (2011)) and object
movements with equal degrees of success, supporting the fluid transformation hypothesis.
When the stimuli used were images, participants acquired the NADs when the images
depicted human body postures, but not when they depicted object postures, suggesting
that motor sequence learning played a critical role in NAD learning. Unifying the
foregoing body of experimental results, this dissertation hypothesized that learning NADs
requires bracketing of the sequences into coherent units, and motor sequence learning
provides the means for such bracketing. Experiments 30 and 31 confirmed this
hypothesis in two complementary ways, enabling the previously failed acquisition of
image NADs, and disrupting the previously successful acquisition of human body
movement NADs, by alternately enabling and disrupting the packaging mechanism.
The facilitating role of the motor system in statistical learning of visual stimuli
has implications for understanding the underlying mechanisms of statistical learning in
the domain of language. With acoustic stimuli, acquisition of NADs has been observed
with syllables (Peña et al., 2002) but not tones or noises (Gebhart et al., 2009; Li and
Mintz, 2014). The Motor Theory of Speech Perception proposes that the perception of
syllables is mapped onto vocal gestures, and those gesture representations drive
perception (Liberman et al., 1967; Liberman & Mattingly, 1985). More contemporary
research also implicates motor representations in the perception of other individuals’
movements, in speech, and more broadly (Fadiga, Craighero, & Olivier, 2005; Skipper,
Nusbaum, & Small, 2005). If this is so, the syllable sequences could be represented as
sequences of vocal movements by the motor system. Learning syllable sequences would
be learning the motor sequences producing the syllable sequences, similar to learning
body movements and static body postures.
The characteristics of motor sequence learning, most widely studied using finger
tapping, also converge with findings regarding the statistical learning of speech. A
paradigm widely used in motor sequence learning studies is the Serial Reaction Time (SRT) task,
in which subjects learn ordered finger movements. During training, participants placed
their fingers on four buttons and were cued to press the corresponding button by asterisks
shown in one of the four corners of the screen. After only ten cycles of training on the
same sequence, subjects’ reaction times dropped dramatically, compared to reaction
times to a new sequence (Nissen & Bullemer, 1987). Such improved performance
indicates participants’ acquisition of finger-movement sequences. In addition to its speed,
research has shown that such learning can happen in the absence of awareness, although
attention to the task is necessary (Clegg, DiGirolamo, & Keele, 1998; Destrebecqz &
Cleeremans, 2001; Nissen & Bullemer, 1987). Further, studies have shown that in
sequence learning, observation of others performing the serial learning task could also
produce similar learning outcomes as having subjects themselves perform the task, and
the same neural substrates were engaged in both observation and performance of the
motor sequences (Bird, Osman, Saggerson, & Heyes, 2005; Frey & Gerry, 2006;
Gruetzmacher, Panzer, Blandin, & Shea, 2011; Heyes & Foster, 2002; Mattar & Gribble,
2005; Osman, Bird, & Heyes, 2005; Rizzolatti & Craighero, 2004).
The above-described characteristics of sequence learning have also been observed in
statistical learning of syllable sequences: such learning is fast, does not require awareness, and can
occur through observation (Saffran et al., 1996; Saffran et al., 1997; Peña et al., 2002).
Rapid speed of acquisition is one basic feature of statistical learning of speech. Peña et al.
(2002) found that participants could discern syllable sequences with legal NADs after
only two minutes of exposure to the training stimuli. Studies in Chapter 3 observed that
shortening the exposure to only one minute elicited similar learning outcomes. Another
feature of statistical learning is that it occurs implicitly. Although participants paid
attention to the training stimuli, they were not told what regularities (e.g., NADs of
syllables) they should look for prior to training. Further, a motor sequence learning account would provide a plausible
explanation for the presence of the edge effect in statistical learning of syllables and body
movements, and its absence when using other stimuli, such as tones.
Therefore, motor sequence learning might be a critical part of learning syllable
dependency patterns in speech, in that it provides a kind of packaging of sequences that
highlights beginnings and ends, and therefore facilitates the learning of the patterns
between them.
Limitations
The studies in this dissertation provide an initial exploration of the mechanisms
underlying the acquisition of NADs. These studies have several major limitations which
call for improvement in future research.
First, in Experiment 30 with identifiable objects, it is still unknown what process
is involved in packaging the sequences, although it is hypothesized that it could either be
semantic grouping, as proposed in Otsuka et al. (2013), or verbalization of the objects’
labels. It is also possible that a third mechanism is responsible for this packaging. The
answer to this question will significantly improve our understanding of sequence learning
of objects.
Second, the packaging hypothesis has not been directly tested with acoustic
stimuli. This dissertation tested this hypothesis with visual stimuli, and predicts that the
same mechanism applies to acoustic stimuli. One way to extend this study to acoustic
stimuli would be to test professional vocalists’ acquisition of the NADs of tones. In
Chapter 3 of this dissertation, participants failed to learn the NADs of tones. If the
hypothesis is correct that motor sequence learning functions as a mechanism for packaging the
stimuli, and if professionals are better able to vocalize the tones due to their
extensive experience, it may be expected that professional vocalists would more
successfully acquire tonal NADs.
Third, apart from whether or not they were identifiable, the stimuli used in
Experiments 29 and 30 had differing saliency. Future studies should consider replicating
Experiments 29 and 30 while controlling for the saliency of the identifiable objects in
Experiment 30.
Conclusion
This dissertation explored the mechanisms underlying the acquisition of Non-Adjacent
Dependencies (NADs). The research results suggest that absent perceptual cues, acquisition of
NADs requires bracketing of sequences into coherent units. The packaging of stimuli
sequences into coherent units may result from the fluid transformation of the stimuli
within a sequence, delineated by pauses. It could also result from participants’
perception that such fluid transformation is occurring, even with sequences of discrete
stimuli. For stimuli that are visually discrete, but could be mapped onto the perceiver’s
motor system, such as images of body postures, motor sequence learning could facilitate
such bracketing. This, in turn, implies that the acquisition of syllable NADs may be
supported by the motor sequence learning of the vocal gestures that produced the
syllable sequences.
REFERENCES
Abla, D., Katahira, K., & Okanoya, K. (2008). On-line assessment of statistical learning
by event-related potentials. Journal of Cognitive Neuroscience, 20(6), 952–964.
Abla, D., & Okanoya, K. (2009). Visual statistical learning of shape sequences: an ERP
study. Neuroscience Research, 64(2), 185–190. doi:10.1016/j.neures.2009.02.013
Ager, S. (n.d.). Omniglot - the online encyclopedia of writing systems and languages.
Retrieved November 18, 2013, from http://www.omniglot.com/index.htm
Bird, G., Osman, M., Saggerson, A., & Heyes, C. (2005). Sequence learning by action,
observation and action observation. British Journal of Psychology, 96, 371–388.
doi:10.1348/000712605X47440
Bonatti, L. L., Peña, M., Nespor, M., & Mehler, J. (2005). Linguistic constraints on
statistical computations: the role of consonants and vowels in continuous speech
processing. Psychological Science, 16(6), 451–459.
doi:10.1111/j.0956-7976.2005.01556.x
Clegg, B. A., DiGirolamo, G. J., & Keele, S. W. (1998). Sequence learning. Trends in
Cognitive Sciences, 2(8), 275–281. Retrieved from
http://www.ncbi.nlm.nih.gov/pubmed/21227209
Creel, S. C., Newport, E. L., & Aslin, R. N. (2004). Distant melodies: Statistical learning
of nonadjacent dependencies in tone sequences. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 30(5), 1119–1130.
doi:10.1037/0278-7393.30.5.1119
Destrebecqz, A., & Cleeremans, A. (2001). Can sequence learning be implicit? New
evidence with the process dissociation procedure. Psychonomic Bulletin & Review,
8(2), 343–350.
Endress, A. D., & Bonatti, L. L. (2007). Rapid learning of syllable classes from a
perceptually continuous speech stream. Cognition, 105(2), 247–299.
doi:10.1016/j.cognition.2006.09.010
Endress, A. D., & Mehler, J. (2009). Primitive computations in speech processing.
Quarterly Journal of Experimental Psychology, 62(11), 2187–209.
doi:10.1080/17470210902783646
Endress, A. D., Nespor, M., & Mehler, J. (2009). Perceptual and memory constraints on
language acquisition. Trends in Cognitive Sciences, 13(8), 348–353.
Endress, A. D., & Wood, J. N. (2011). From movements to actions: two mechanisms for
learning action sequences. Cognitive Psychology, 63(3), 141–71.
doi:10.1016/j.cogpsych.2011.07.001
Graf Estes, K., Evans, J. L., Alibali, M. W., & Saffran, J. R. (2007). Can infants map
meaning to newly segmented words? Statistical segmentation and word learning.
Psychological Science, 18(3), 254–260. doi:10.1111/j.1467-9280.2007.01885.x
Fadiga, L., Craighero, L., & Olivier, E. (2005). Human motor cortex excitability during
the perception of others’ action. Current Opinion in Neurobiology, 15(2), 213-218.
Fadiga, L., Craighero, L., Buccino, G., & Rizzolatti, G. (2002). Speech listening
specifically modulates the excitability of tongue muscles: a TMS study. European
Journal of Neuroscience, 15(2), 399–402.
Fiser, J., & Aslin, R. N. (2001). Unsupervised statistical learning of higher-order spatial
structures from visual scenes. Psychological Science, 12(6), 499–504.
Fiser, J., & Aslin, R. N. (2002). Statistical learning of higher-order temporal structure
from visual shape sequences. Journal of Experimental Psychology: Learning,
Memory, and Cognition, 28(3), 458–467. doi:10.1037//0278-7393.28.3.458
Frey, S. H., & Gerry, V. E. (2006). Modulation of neural activity during observational
learning of actions and their sequential orders. The Journal of Neuroscience, 26(51),
13194–13201. doi:10.1523/JNEUROSCI.3914-06.2006
Gebhart, A. L., Newport, E. L., & Aslin, R. N. (2009). Statistical learning of adjacent and
nonadjacent dependencies among nonlinguistic sounds. Psychonomic Bulletin &
Review, 16(3), 486–490. doi:10.3758/PBR.16.3.486
Goldsmith, J. (1976). Autosegmental phonology (Unpublished doctoral dissertation).
Swarthmore College, Swarthmore.
Gómez, R. L. (2002). Variability and detection of invariant structure. Psychological
Science, 13(5), 431–436.
Gómez, R., & Gerken, L. (2000). Infant artificial language learning and language
acquisition. Trends in Cognitive Sciences, 4(5), 178-186.
Grafton, S. T., Hazeltine, E., & Ivry, R. (1995). Functional mapping of sequence learning
in normal humans. Journal of Cognitive Neuroscience, 7(4), 497–510.
Gruetzmacher, N., Panzer, S., Blandin, Y., & Shea, C. H. (2011). Observation and
physical practice: coding of simple motor sequences. Quarterly Journal of
Experimental Psychology, 64(6), 1111–1123. doi:10.1080/17470218.2010.543286
Harris, Z. S. (1955). From phoneme to morpheme. Language, 31, 190-222.
Hauser, M. D., Newport, E. L., & Aslin, R. N. (2001). Segmentation of the speech stream
in a non-human primate: statistical learning in cotton-top tamarins. Cognition, 78,
53–64.
Henson, R. N. A. (1998). Short-term memory for serial order: the Start-End Model.
Cognitive Psychology, 36(2), 73–137. doi:10.1006/cogp.1998.0685
Henson, R. N. A. (1999). Positional information in short-term memory: Relative or
absolute? Memory & Cognition, 27(5), 915–927.
Heyes, C. M., & Foster, C. L. (2002). Motor learning by observation: evidence from a
serial reaction time task. The Quarterly Journal of Experimental Psychology, 55A(2),
593–607. doi:10.1080/02724980143000389
Karuza, E. A., Newport, E. L., Aslin, R. N., Starling, S. J., Tivarus, M. E., & Bavelier, D.
(2013). The neural correlates of statistical learning in a word segmentation task: An
fMRI study. Brain & Language, 127(1), 46-54.
Kersten, A. W., Earles, J. L., & Berger, J. G. (2015). Recollection and unitization in
associating actors with extrinsic and intrinsic motions. Journal of Experimental
Psychology: General, 144(2), 274-298.
Kirkham, N. Z., Slemmer, J. A., & Johnson, S. P. (2002). Visual statistical learning in
infancy: evidence for a domain general learning mechanism. Cognition, 83(2),
B35–42.
Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. (1967).
Perception of the speech code. Psychological Review, 74(6), 431-461.
Liberman, A. M., & Mattingly, I. G. (1985). The motor theory of speech perception
revised. Cognition, 21, 1-36.
Liberman, A. M., Delattre, P. C., & Cooper, F. S. (1952). The role of selected stimulus-
variables in the perception of the unvoiced stop consonants. The American Journal
of Psychology, 65, 497-516.
Liberman, A. M., Delattre, P. C., & Cooper, F. S. (1955). Acoustic loci and transitional
cues for consonants. The Journal of the Acoustical Society of America, 27(4), 769-
773.
Liberman, A. M. & Whalen, D. H. (2000). On the relation of speech to language. Trends
in Cognitive Sciences, 4(5), 187-196.
Mattar, A. A. G., & Gribble, P. L. (2005). Motor learning by observing. Neuron, 46(1),
153–160. doi:10.1016/j.neuron.2005.02.009
Mintz, T. H. (2002). Category induction from distributional cues in an artificial language.
Memory & Cognition, 30(5), 678–686.
Mintz, T. H. (2003). Frequent frames as a cue for grammatical categories in child
directed speech. Cognition, 90(1), 91–117. doi:10.1016/S0010-0277(03)00140-9
Mirman, D., Magnuson, J. S., Graf Estes, K. G., & Dixon, J. A. (2008). The link between
statistical segmentation and word learning in adults. Cognition, 108, 271–280.
doi:10.1016/j.cognition.2008.02.003
Newport, E. L., & Aslin, R. N. (2004). Learning at a distance I. Statistical learning of
non-adjacent dependencies. Cognitive Psychology, 48(2), 127–162.
doi:10.1016/S0010-0285(03)00128-2
Newport, E. L., Hauser, M. D., Spaepen, G., & Aslin, R. N. (2004). Learning at a
distance II. Statistical learning of non-adjacent dependencies in a non-human
primate. Cognitive Psychology, 49(2), 85–117. doi:10.1016/j.cogpsych.2003.12.002
Nissen, M. J., & Bullemer, P. (1987). Attentional requirements of learning: Evidence
from performance measures. Cognitive Psychology, 19, 1–32.
Onnis, L., Monaghan, P., Richmond, K., & Chater, N. (2005). Phonology impacts
segmentation in online speech processing. Journal of Memory and Language, 53,
225-237.
Osman, M., Bird, G., & Heyes, C. (2005). Action observation supports effector-
dependent learning of finger movement sequences. Experimental Brain Research,
165, 19–27. doi:10.1007/s00221-005-2275-0
Otsuka, S., Nishiyama, M., Nakahara, F., & Kawaguchi, J. (2013). Visual statistical
learning based on the perceptual and semantic information of objects. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 39(1), 196-207. doi:
10.1037/a0028645
Peña, M., Bonatti, L. L., Nespor, M., & Mehler, J. (2002). Signal-driven computations in
speech processing. Science, 298(5593), 604–7. doi:10.1126/science.1072901
Pons, F., & Toro, J. M. (2010). Structural generalizations over consonants and vowels in
11-month-old infants. Cognition, 116(3), 361-367.
Rizzolatti, G., & Craighero, L. (2004). The mirror-neuron system. Annual Review of
Neuroscience, 27, 169–192. doi:10.1146/annurev.neuro.27.070203.144230
Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old
infants. Science, 274, 1926–1928.
Saffran, J. R., Johnson, E. K., Aslin, R. N., & Newport, E. L. (1999). Statistical learning
of tone sequences by human infants and adults. Cognition, 70(1), 27–52.
Saffran, J. R., Newport, E. L., & Aslin, R. N. (1996). Word segmentation: The role of
distributional cues. Journal of Memory and Language, 35, 606–621.
Saffran, J. R., & Thiessen, E. D. (2007). Domain general learning capacities. In E. Hoff
and M. Schatz (Eds.) Blackwell Handbook of Language Development, pp. 68-86.
Oxford: Blackwell Publishing.
Santelmann, L. M., & Jusczyk, P. W. (1998). Sensitivity to discontinuous dependencies
in language learners: evidence for limitations in processing space. Cognition, 69(2),
105–134.
Shiffrar, M., & Freyd, J. J. (1990). Apparent motion of the human body. Psychological
Science, 1(4), 257–264. doi:10.1111/j.1467-9280.1990.tb00210.x
Skipper, J. I., Nusbaum, H. C., & Small, S. L. (2005). Listening to talking faces: motor
cortical activation during speech perception. Neuroimage, 25(1), 76-89.
Toro, J. M., Nespor, M., Mehler, J., & Bonatti, L. L. (2008). Finding words and rules in
a speech stream: functional differences between vowels and consonants.
Psychological Science, 19(2), 137–144. doi:10.1111/j.1467-9280.2008.02059.x
Toro, J. M., Shukla, M., Nespor, M., & Endress, A. D. (2008). The quest for
generalizations over consonants: Asymmetries between consonants and vowels are
not the by-product of acoustic differences. Perception & Psychophysics, 70(8),
1515-1525.
Turk-Browne, N. B., Jungé, J. A., & Scholl, B. J. (2005). The automaticity of visual
statistical learning. Journal of Experimental Psychology: General, 134(4), 552–64.
doi:10.1037/0096-3445.134.4.552
Turk-Browne, N. B., & Scholl, B. J. (2009). Flexible visual statistical learning: transfer
across space and time. Journal of Experimental Psychology: Human Perception and
Performance, 35(1), 195–202. doi:10.1037/0096-1523.35.1.195
Turk-Browne, N. B., Scholl, B. J., Chun, M. M., & Johnson, M. K. (2008). Neural
evidence of statistical learning: efficient detection of visual regularities without
awareness. Journal of Cognitive Neuroscience, 21(10), 1934–1945.
doi:10.1162/jocn.2009.21131
Urgesi, C., Moro, V., Candidi, M., & Aglioti, S. M. (2006). Mapping implied body
actions in the human motor system. Journal of Neuroscience, 26(30), 7942–7949.
doi:10.1523/JNEUROSCI.1289-06.2006
Vuong, Q. C., & Tarr, M. J. (2004). Rotation direction affects object recognition. Vision
Research, 44, 1717-1730.
Wertheimer, M. (1923). Laws of Organization in Perceptual Forms. First published as
Untersuchungen zur Lehre von der Gestalt II, in Psychologische Forschung, 4, 301-350.
Translation published in Ellis, W. (1938). A source book of Gestalt psychology
(pp. 71-88). London: Routledge & Kegan Paul.
Wilson, M., & Knoblich, G. (2005). The case for motor involvement in perceiving
conspecifics. Psychological Bulletin, 131(3), 460–473. doi:10.1037/0033-
2909.131.3.460
Wilson, S. M., Saygin, A. P., Sereno, M. I., & Iacoboni, M. (2004). Listening to speech
activates motor areas involved in speech production. Nature Neuroscience, 7(7),
701-702.
Abstract
This dissertation examines the underlying mechanisms of learning Non-Adjacent Dependencies (NADs), the dependencies of stimuli that are not temporally adjacent. Previous studies have suggested that perceptual grouping facilitated such learning. Peña et al. (2002) observed that, absent perceptual cues, brief pauses between syllable sequences facilitated the acquisition of NADs of syllable sequences. Endress and Wood (2011) observed successful acquisition of body movements when the body movement sequences were separated by pauses. However, the function of the pauses, and the underlying mechanism of acquisition, remain unknown. This dissertation first replicated the key findings in Peña et al. (2002), confirming the acquisition of syllable NADs when pauses separated the test sequences. It then applied the same experimental model to non-linguistic stimuli, abstract images and tones, and found that the inclusion of pauses did not facilitate the acquisition of the NADs of these stimuli. Based on these observations, two hypotheses were proposed: (1) packaging the sequences into a coherent unit is necessary for the acquisition of NADs
Asset Metadata
Creator
Li, Jia (author)
Core Title
Mechanisms underlying acquisition of non-adjacent dependencies
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Psychology
Publication Date
07/06/2017
Defense Date
06/03/2015
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
dual-mechanism account, language acquisition, motor theory, non-adjacent dependencies, OAI-PMH Harvest, statistical learning
Format
application/pdf (imt)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Mintz, Toben H. (committee chair), Iskarous, Khalil (committee member), Tjan, Bosco S. (committee member), Wood, Justin N. (committee member)
Creator Email
jia.vivian.li@gmail.com,jli10@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c3-585986
Unique identifier
UC11300374
Identifier
etd-LiJia-3551.pdf (filename),usctheses-c3-585986 (legacy record id)
Legacy Identifier
etd-LiJia-3551.pdf
Dmrecord
585986
Document Type
Dissertation
Rights
Li, Jia
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA