THE EFFECTS OF VIOLATING SELECTED ITEM-WRITING PRINCIPLES
ON THE DIFFICULTY AND RELIABILITY OF MULTIPLE-CHOICE
TESTS FOR HEALTH PROFESSION STUDENTS

by

Muriel Wolkow

A Dissertation Presented to the
FACULTY OF THE SCHOOL OF EDUCATION
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF EDUCATION

October, 1978

UMI Number: DP26571

All rights reserved

INFORMATION TO ALL USERS
The quality of this reproduction is dependent upon the quality of the copy submitted.

In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion.

UMI DP26571
Published by ProQuest LLC (2014). Copyright in the Dissertation held by the Author.
Microform Edition © ProQuest LLC.
All rights reserved. This work is protected against unauthorized copying under Title 17, United States Code.

ProQuest LLC
789 East Eisenhower Parkway
P.O. Box 1346
Ann Arbor, MI 48106-1346

This dissertation, written under the direction of the Chairman of the candidate's Guidance Committee and approved by all members of the Committee, has been presented to and accepted by the Faculty of the School of Education in partial fulfillment of the requirements for the degree of Doctor of Education.

Date

Dean

Guidance Committee

Chairman

TABLE OF CONTENTS

                                                                    Page
LIST OF TABLES ..................................................... iv

Chapter

I.   THE PROBLEM ................................................... 1
     Introduction
     Purpose of the Study
     Research Problem
     Questions to be Answered
     Research Hypotheses
     Definition of Terms
     Importance of the Study
     Scope of the Study
     Organization of the Remainder of the Dissertation

II.  REVIEW OF THE LITERATURE ..................................... 11
     Test-wiseness
     Teaching Test-wise Skills
     Correlates of Test-wiseness
     Principles of Item-Writing

III. PROCEDURES ................................................... 25
     Research Design
     Independent Variables
     Dependent Variables
     Statistical Hypotheses
     Description of Sample
     Population I (Optometry Students)
     Population II (Dentistry Students)
     Description of Instruments
     Final Examination for Optometry 343
     Final Examination for Pedodontics 534
     Test Administration and Scoring
     Instruction Sessions
     Administering the Final Examinations
     Test Scoring
     Data Analyses
     Analyses Involving the Variable Test Difficulty
     Analyses Involving the Variable Test Reliability
     Assumptions
     Limitations

IV.  RESULTS ...................................................... 42
     Analyses of Test Difficulty Variable
     Extra-Long-Fault Category
     Findings
     Discussion
     Parallel-Fault Category
     Findings
     Discussion
     Number-Fault Category
     Findings
     Discussion
     All-of-the-Above-Fault Category
     Findings
     Discussion
     Analyses of Test Reliability Variable
     Findings
     Discussion

V.   SUMMARY, CONCLUSIONS AND RECOMMENDATIONS ..................... 75
     Summary
     Conclusions
     Recommendations
     Some Final Thoughts

REFERENCES ........................................................ 85

APPENDIX A: CONFIDENCE-IN-RESPONSE INSTRUCTIONAL PACKET ........... 90

APPENDIX B: SAMPLE ANSWER SHEET (PAGE 1) .......................... 98

LIST OF TABLES

TABLE                                                               Page

3.1  Assignment of Items to Manipulated and Non-Manipulated
     Categories for Optometry Examination .......................... 37
3.2  Assignment of Items to Manipulated and Non-Manipulated
     Categories for Dentistry Examination .......................... 37
4.1  Mean Scores on Extra-Long Items with Regular and Percent
     Confidence Scoring Procedures for Dentistry Students .......... 43
4.2  Mean Scores on Extra-Long Items with Regular and Percent
     Confidence Scoring Procedures for Optometry Students .......... 44
4.3  Analysis of Variance for Extra-Long Items with Regular and
     Percent Confidence Scoring Procedures for Dentistry Students .. 45
4.4  Analysis of Variance for Extra-Long Items with Regular and
     Percent Confidence Scoring Procedures for Optometry Students .. 46
4.5  Mean Scores on Parallel Items with Regular and Percent
     Confidence Scoring Procedures for Dentistry Students .......... 50
4.6  Mean Scores on Parallel Items with Regular and Percent
     Confidence Scoring Procedures for Optometry Students .......... 51
4.7  Analysis of Variance for Parallel Items with Regular and
     Percent Confidence Scoring Procedures for Dentistry Students .. 52
4.8  Analysis of Variance for Parallel Items with Regular and
     Percent Confidence Scoring Procedures for Optometry Students .. 53
4.9  Mean Scores on Number Items with Regular and Percent
     Confidence Scoring Procedures for Dentistry Students .......... 57
4.10 Mean Scores on Number Items with Regular and Percent
     Confidence Scoring Procedures for Optometry Students .......... 58
4.11 Analysis of Variance for Number Items with Regular and
     Percent Confidence Scoring Procedures for Dentistry Students .. 59
4.12 Analysis of Variance for Number Items with Regular and
     Percent Confidence Scoring Procedures for Optometry Students .. 60
4.13 Mean Scores on All-of-the-Above Items with Regular and
     Percent Confidence Scoring Procedures for Dentistry Students .. 63
4.14 Mean Scores on All-of-the-Above Items with Regular and
     Percent Confidence Scoring Procedures for Optometry Students .. 64
4.15 Analysis of Variance for All-of-the-Above Items with Regular
     and Percent Confidence Scoring Procedures for Dentistry
     Students ...................................................... 65
4.16 Analysis of Variance for All-of-the-Above Items with Regular
     and Percent Confidence Scoring Procedures for Optometry
     Students ...................................................... 66
4.17 Comparisons of Measures of Internal Consistency for Faulty
     and Fault-Free Items on Dentistry and Optometry Tests ......... 70
4.18 Reliability Coefficients (Cronbach's Alpha) for Faulty and
     Fault-Free Item Sets on Dentistry Test ........................ 71
4.19 Reliability Coefficients (Cronbach's Alpha) for Faulty and
     Fault-Free Item Sets on Optometry Test ........................ 71
4.20 Correlations of Faulty and Fault-Free Items with
     Non-Manipulated Items on the Dentistry and Optometry Tests .... 73

LIST OF FIGURES

FIGURE                                                              Page

1    Diagram of Basic Latin Square Replicated for Analyses ......... 38
2    Diagram of Complete Analysis Design Showing Basic Latin
     Squares Which Were Replicated 13 Times for Each Quartile ...... 39

CHAPTER I
THE PROBLEM
Introduction
In recent years, educators have increasingly favored objective testing over other paper-and-pencil techniques for purposes of student evaluation (1). The trend is particularly evident in health profession schools, where multiple-choice tests have become the most widely used of test instruments (2,3). Educational measurement specialists, cognizant of the critical decisions that are likely to be made on the basis of test achievement, have, over the years, stressed the need for ascertaining the validity of objective tests in general and of teacher-made versions in particular. Most textbooks on educational measurement prescribe guidelines to help teachers construct "good" tests; yet, there is little empirical research to support many of the recommendations. The study reported in this paper focuses on the effects of deliberate violations of selected item-writing principles on measures of achievement for health profession students.

The growing popularity of the multiple-choice test format can be attributed in part to economic conditions. During the seventies, most health profession schools have had to deal with the conflicting demands of escalating student enrollments combined with diminishing school budgets. In view of the cost/benefits of objective testing as compared with traditional evaluation techniques, it is not surprising that administrators have encouraged a shift toward the former procedures.

However, the enthusiastic acceptance of the objective test format by health profession educators has undoubtedly been influenced more by perceived educational advantages than by economic factors. For instance, objective tests are seen by many as a welcome solution to the problems of evaluating learning outcomes in the face of the tremendous volumes of factual knowledge that are included in an educational program for health professionals. More comprehensive sampling of course content is possible because the examiner can cover more facts in a shorter period of time (4). Also, since objective tests are easier to administer and score, they provide rapid feedback to large student groups with minimum expenditure of faculty time (5). Furthermore, in previous studies the validity and reliability of multiple-choice tests as compared to essay tests have been documented (6,7,8).

While these advocates of objective measurement techniques stress the advantages of increased reliability, greater economy in scoring and better sampling of content, opponents point out the rigidity of the structured response format, emphasis on low-level cognitive attributes, frequent ambiguity and hidden penalties for creative thinking (9,10,11). In consideration of both points of view, Adkins has suggested that "although a well-constructed objective test may provide greater assurance of validity than an equally well-constructed essay test, a poorly constructed objective test probably provides far less valid results than a poorly constructed essay test" (12). The problem then becomes one of defining criteria for evaluating test construction.

Item structure is considered to be an important determinant of the quality of an objective test. Several authors have suggested that
poorly written items which contain elements of ambiguity and/or inadvertent cuing may adversely affect the validity of a test as a measure of achievement for certain populations (13,14,15). It has been suggested that ambiguous wording tends to confuse the examinee and introduces components of chance into the scoring (16) and that item cues strongly bias a test in favor of the "test-wise" student (17,18,19).

Many textbooks on educational evaluation include guidelines for avoiding item-writing pitfalls (16,20,21,22). In most, the assumption is made that by adhering to stated principles of item-writing, these important sources of irrelevant variation in test results can be avoided. There is, however, generally no differentiation made among the recommended principles, the inference being that in order to maintain the integrity of a test as a measure of specific knowledge, each rule is of equal importance and must be precisely followed in all test situations.

Several research studies have been reported which investigate what happens when test developers do not adhere to recommended item-writing procedures (13,23,24,25,26). The concepts of test-wiseness that have evolved from these and other studies will be discussed in detail in Chapter II of this report. For the most part, however, these researchers have been concerned with achievement in low-content subject areas by comparatively naive test-takers. It seems reasonable to assume that among a relatively homogeneous group of sophisticated examinees, such as would constitute the population of a health profession school, the degree of variance due to test-wiseness may be very different from that evidenced among the school populations previously studied. The implications of such variance may also be different for the more sophisticated population. Perhaps there is a point of diminishing return in the investment of teacher time spent to achieve "perfection" in test construction, particularly at higher levels of academic experience. In order to answer some of the questions that these speculations pose, more empirical data are needed regarding the effects of violating item-writing principles on tests administered to populations of varying levels of sophistication in different test situations.
Purposes of the Study
This study was designed to investigate the influence of faulty items (items which do not adhere to selected item-writing principles) on the difficulty and reliability of a multiple-choice test administered to a population of health profession students. The specific purposes of the study were: (1) to determine the effects of each of four selected item-writing faults on the difficulty and/or reliability measures of a multiple-choice test administered to students in a health profession school; (2) to ascertain whether these effects were differential with respect to high and low achievers within the population tested; and (3) to determine whether the inclusion of faulty items on a test affects the internal consistency of the test.

The faults investigated were:

(1) Use of "extra-long" keyed response.
(2) Use of middle alternative for keyed response in numerical sequences.
(3) Use of "parallel" distractors.
(4) Use of "all-of-the-above" option as keyed response and only multiple-alternative response choice.
Research Problem
Questions to be Answered
The research was designed to provide answers to the following questions, which corresponded to the main purposes of the investigation:

1. With respect to the effects of each of the four specified faults on estimates of student achievement (difficulty level), three questions were asked:

   1.1 Is there a significant difference between mean scores obtained on items which contain a specified fault and the mean scores obtained on fault-free versions of the same items?

   1.2 Is there a significant difference between the mean confidence expressed in the correctness of keyed responses for items which contain a specified fault and the mean percent-confidence expressed in the correctness of keyed responses for fault-free versions of the same items?

   1.3 Does the magnitude of the effect of item faults on test difficulty differ with respect to high and low achievers?

2. Does the inclusion of items which contain the specified faults affect the reliability of a test?
Research Hypotheses
It was predicted that:

(1) Scores on faulty items would be higher than scores on fault-free versions of the same items.

(2) Subjects would express more confidence in the correctness of keyed responses on faulty items than on fault-free items.

(3) Low achievers would derive greater benefit than high achievers from the presence of faulty items on a test.

(4) The reliability of a test would be affected by the inclusion of faulty items.
Definition of Terms
For purposes of this study, the following definitions of terms were generated.

All-of-the-above fault: The item fault in which the keyed response, "all of the above," is the only multiple-alternative choice offered.

Alternative response choice: A response choice other than the keyed response.

Complex-alternative item format: An item format in which several multiple-alternative choices are displayed, including "all of the above" as a possible response.

Confidence-in-response (C-I-R): A numerical indication of the degree of certainty an examinee expresses in the correctness of a response.

Distractor: A term used in this study to designate a complex-alternative item format in which "all of the above" is not the keyed response.

Extra-long fault: The item fault in which the keyed response is noticeably longer than any of the other alternatives.

Fault-free items: Items which have been judged by the researcher and a test specialist to adhere to the principles of good item writing.

Faulty items: Items which have been manipulated to include violations of specified item-writing principles.

Item set: The characteristic of an item which is determined by whether it appears on a specified form of a test as a faulty or fault-free item.

Item structure: The characteristic of an item which designates it as faulty or fault-free.

Keyed response: The response choice considered to be correct for a given item.

Manipulated items: Items which have been altered by the researcher in any way for purposes of the study. These may include fault-free items and distractors, as well as faulty items.

Multiple-alternative choice: A response option that can be selected to indicate that more than one of the alternatives presented are correct.

Non-manipulated item: An item which appears on the test exactly as written by the course instructor and is not designated as either "faulty" or "fault-free."

Number fault: The item fault in which the response choices are displayed in numerical sequence and the keyed response is a middle alternative.

Parallel fault: The item fault in which the keyed response is obviously paired with one and only one of the other alternatives by reason of similarity or opposition (e.g., hyperglycemia and hypoglycemia) and no such pairing exists among the other alternatives.

Reproducing scoring system (RSS): A scoring system in which maximum scores are achieved when the examinee reports subjective probabilities honestly.

Test-wiseness (TW): A subject's capacity to use the characteristics of a test item, independent of content knowledge, to determine the correct response.
Importance of the Study
The intent of this research was to contribute to existing knowledge concerning sources of error variance in multiple-choice test scores. The study focused on populations of students enrolled in two health profession schools, a school of dentistry and a school of optometry. Although some empirical data already exist on the effects of poor item-writing techniques on parameters of test difficulty and reliability, the majority of these studies have been concerned with populations of students at lower academic levels. This investigation sought to determine the effects of specified item-writing faults on student achievement and on the reliability coefficients of tests administered to the more test-sophisticated students in these professional schools.
Scope of the Study
The population for the study was restricted to second-year students at the University of Southern California School of Dentistry enrolled in the introductory course in pedodontics, PEDO 534, and first-year students at the Southern California School of Optometry enrolled in the introductory course in clinical optometry, Optometry 343. Only multiple-choice items included in the final examinations for the two courses (PEDO 534 and Optometry 343) were included in the item samples for this study.
Organization of the Remainder of the Dissertation
The remainder of the dissertation is organized within four chapters. Chapter II presents a review of the literature. In Chapter III the research design and the methods employed to collect and analyze the data are described. Chapter IV presents an analysis and interpretation of the findings. A summary of the study, conclusions and recommendations are provided in Chapter V.

CHAPTER II
REVIEW OF THE LITERATURE

In this chapter the literature which served as a conceptual framework for the present investigation is summarized. The review focuses on the nature of test-wiseness (TW) and the effects of this trait on test scores, with emphasis on secondary-cue responses attributable to faulty item writing.
Test-wiseness
Katheryn Woodley refers to test-wiseness as a construct which has been operationally defined in order that it may be measured (27). One of the definitions most generally accepted by educational measurement specialists today appears in Good's Dictionary of Education as,

   A subject's capacity to utilize the characteristics and formats of the test and/or test-taking situation to receive a high score; logically independent of the examinee's knowledge of the subject matter which the items are supposedly measuring; includes knowledge of strategies in using time, avoiding error, guessing, reasoning deductively, and using cues and specific determiners (28).

The complexity of the definition is indicative of the intricacy of the trait itself. It is, therefore, not surprising to find that researchers have focused on only selected aspects of test-wiseness and no comprehensive body of literature is to be found.

The importance of test-wiseness as a source of variation in test scores is frequently discussed in the literature of test theory (10,16,22,29). Stanley describes the trait as a lasting and general characteristic of the individual causing stable individual differences in test performance (30). He suggests that test-wise factors contribute true variance to test scores, variance that may or may not be desirable depending on the purpose of the test and which, therefore, presents problems of test validity rather than test reliability. Stanley also discusses the probability that mental ability and test-wiseness are positively correlated, an assumption that has been contradicted by some recent findings (31,39).

Many of the early studies of variance in test scores attributed to test sophistication were concerned with the influences of coaching and practice on specific tests rather than general test-taking abilities. A study in 1921 by Rugg and Colloton (32) showed substantial increases in IQ scores on retests of the Stanford-Binet measurement instrument. Reporting on the effects of practice and growth on scores on the Scholastic Aptitude Test (SAT), Levine and Angoff indirectly referred to components of test-wiseness when they attributed score differences found with SAT tests to increased sophistication of the student population (33).

The effects of random guessing and individual test-taking biases also received considerable attention in early investigations (34,35,36,37). Although these studies were not directly concerned with correlates of TW-specific test-taking strategies, they did provide insights into the problem-solving approaches used by examinees in responding to multiple-choice questions which were valuable in designing later research.
Teaching Test-wise Skills
Other investigations have examined the feasibility of improving TW skills through instruction. Most experts consider test-wiseness to be a specific cognitive skill that can be improved through experience (38,39,40). Ebel points out the need for improving the test-taking skills of all examinees, asserting, "Given a test that measures command of knowledge and is free from technical flaws, errors in measurement are likely to be due to too little rather than too much test-wiseness" (16). Most of the studies in this area have examined treatment effects for populations of naive test-takers in elementary and secondary grades, focusing on such general aspects of test-taking as following directions, familiarity with a variety of test formats and careful pacing. Moore, Schultz and Baker developed a programmed test for eighth-grade students in which examinees were taught to identify and respond differentially to relevant cues in test directions (38). Another researcher, Carl Callenbach, working with second-grade students in 1973, was able to effectively improve achievement on standardized reading tests through instruction and practice in content-independent test-taking techniques (41).

Further research supporting the assumption that principles of test-wiseness can be taught was reported by Wahlstrom and Boersma in 1968 (42). Their study developed and tested a training program for ninth-graders using a modified version of the outline of principles of test-wiseness suggested by Millman, Bishop and Ebel (19). Experimental tests developed for the study utilized social studies content appropriate for the sample population, manipulating items to include item faults. In addition to comparisons between treatment and non-treatment groups, the research design provided for comparisons between two experimental groups. One of these groups was pre-tested with a "good" test and post-tested, immediately after training, with a test containing faulty items, while for the second group the pre- and post-tests were reversed. After treatment, the mean scores on the post-test for the first group were significantly higher than the mean scores on the pre-test, but no similar differences were seen for the second group, indicating that improvement effects could be attributed directly to the treatment. An interesting sidelight of the Wahlstrom and Boersma program was the incorporation of item-writing experiences in the training phase of the program in order to make the students more aware of item-writing faults, an approach that was apparently highly successful.

One of the few instructional programs specifically designed to promote improvement of adult test-wise skills was reported by Woodley in 1972 (27). The program was developed to assist adult candidates in preparing for the licensing examinations for the College of Life Underwriters (CLU). In contrast to many commonly used approaches to tutoring for adults, the Woodley program focused on instruction in test-taking techniques, not on providing practice in the types of items used in the CLU examinations, although items from actual tests were modified to develop a test-wiseness scale. Cue-using strategies involving grammatical cues, stem-option similarities and specific determiners were included on the measurement scale.
Correlates of Test-wiseness
Although no comprehensive study of test-wiseness has yet been reported, selected aspects of TW traits have been, as previously mentioned, examined by researchers. In recent years, the effects of test-wiseness training on test scores, personality correlates of test-wiseness and interrelations of test-wiseness with sex, age, and risk-taking behavior have all been subjects of study (17,42,43,44). Most of these investigations have been concerned with the responses of naive test-takers to non-content-related test items.

In 1972, Diamond and Evans investigated cognitive correlates of test-wiseness among a population of sixth-grade pupils (31). The test-wiseness instrument used for this study employed fictitious items which, according to the researchers, were used in order to eliminate knowledge of the subject matter as a response base. Item faults selected for study were: (1) similarities between stem and correct alternatives; (2) use of specific determiners; (3) extra-long alternatives; (4) grammatical clues; and (5) overlapping distractors. In accordance with their findings, Diamond and Evans concluded that test-wiseness, defined as a secondary-cue response, is a trait possessed by naive subjects and is specific to the particular clue or cue under investigation rather than a general characteristic. It was also suggested that the results indicate that secondary-cue responses have little relationship to a student's general cognitive ability, a finding which seems to be in conflict with the previously noted assumption expressed by Stanley (30).

The longitudinal study conducted by Crehan, Koehler and Slakter in 1974 examined the relationships of test-wiseness to grade differences, grade-by-sex interactions and the stability of TW over time (43). Subjects were students in grades 5 through 11 who were observed twice, with a 2-year interval between observations. Here again the TW scale used was developed with fictitious rather than content-oriented items. Results showed significant increases in test-wise factors over grades, no sex-by-grade interaction and stability of the trait for individuals over grade levels.
Principles of Item-Writing
In their analysis of test-wiseness as a source of variance in educational test scores, Millman, Bishop and Ebel suggest a theoretical framework for future empirical investigations (19). These authors differentiate between test-wise strategies that are independent of the test constructor and test purpose, such as timing and careful reading of instructions, and strategies that are dependent on characteristics of the specific test or test constructor, such as taking advantage of cues to the correct answer which have inadvertently been provided by the test-maker. Much of the significant research on secondary-cue responses since 1965 has evolved from recommendations included in this paper. In a discussion directed toward the test-taker, the authors describe two kinds of item flaws. Items in which partial knowledge may possibly be parlayed into a correct response are considered first. In this category the use of partial knowledge to identify correct all-of-the-above alternatives, elimination of options which imply the correctness of each other when only one answer is allowed, and use of content knowledge inadvertently provided in other items are discussed. More direct cue-using strategies are implied for the second group of item flaws, which are described as representing constant idiosyncrasies of the test constructor. Under this heading are placed such item-writing faults as (1) a keyed option in an ordered set which is in the middle of a numerical sequence; (2) a keyed response that is one of a pair of diametrically opposite (or closely similar) statements; (3) a keyed response that is noticeably longer (or shorter) than incorrect options.

One of the first investigations to focus on the effects on item scores of specific cue-using strategies was reported by Dunn and Goldstein in 1959 (25), before the publication of the Millman, Bishop and Ebel outline. These researchers investigated the effects of violations of four generally accepted principles of item construction on test difficulty, reliability and validity. Subjects for the study were 832 Army trainees. In this study relevant content was used to construct four experimental tests for each of four content areas. Comparisons were made between (1) items in which the lead is an incomplete statement and items in which the lead is a question, (2) items with no cues in the lead and those in which the stem includes a cue to the correct alternative, (3) items with alternatives of equal length and items in which the keyed response is extra long, and (4) items which are grammatically correct and items in which grammatical inconsistencies were included. Based on their findings, the authors concluded that items containing cues to correct alternatives, extra-long correct alternatives and inconsistencies in grammar were less difficult for the populations studied than items written according to the rules. No effect on item difficulty attributable to the form of the lead (incomplete statement or question) was found. Also, no significant differential effects on reliability or validity resulting from violations of the item-writing principles were noted. Correlational comparisons between experimental scores and aptitude test scores indicated little or no relationship between test-wiseness and intelligence, a finding which was later supported by results in the Diamond and Evans study (31).

Another early study concerned with relationships between TW traits and item faults was conducted by Bernard G. Gibb in 1964 (39). Gibb's main interest was in determining whether secondary-cue responses could appropriately be used to measure changes in levels of test-wiseness after TW training. The study did not examine cue-specific effects on test characteristics, as had been done in the Dunn and Goldstein study, but rather used item-cue responses to define, teach and evaluate test-wise skills. Gibb hypothesized: (1) test-wiseness is a cognitive skill that exists separately from knowledge of subject matter; (2) there are reliable individual differences in test-wise abilities; and (3) test-wise traits can be improved through training. Subjects were 103 college students, half of whom were given training in the recognition of seven types of item faults (stem-option similarities, absurd alternatives, use of determiners, overly precise correct alternatives, extra-long correct alternatives, grammatical inconsistencies and cues provided by other items on the same test). In order to avoid confounding knowledge of content with indications of test-wiseness, Gibb developed the experimental test for the study using items of "inappropriate" difficulty for the population sample. Gibb concluded that his initial hypotheses could be accepted as true. His data also indicated that the inclusion of secondary cues made items less difficult for both trained and untrained groups.

In the mid-sixties, at about the same time as Gibb reported on the development of his TW measurement scale, another study was reported which was concerned with characteristics of item format as related to response set rather than test-wiseness per se. Clinton I. Chase examined the relationships between varying length of options and response sets to choose the longest alternative (23). The researcher asked two questions: (1) How long must an option be before a set appears to choose it over other alternatives? (2) Can responses to previous items influence a length-of-option response set? For the study, experimental tests were developed using items which dealt with advanced topics in psychology and were administered to beginning students in a college course. Results showed a bias in favor of the longest options, i.e., options which were at least three times longer by word count than any of the other alternatives. In the second half of the study, highly difficult extra-long-option items were interspersed with easy items in which the longest alternative was never correct, so that each difficult item was preceded by several easy ones. Analysis of the scores on the second test showed that the long-alternative response set could be made to disappear.

In 1965, two investigators, Hughes and Trimble, reported a study on the use of complex alternatives to "thwart test-sophisticated students who parlay partial information into correct answers" (26). These researchers examined the effects of three types of alternatives, "all of the above," "none of the above" and "both 1 and 2 are correct," to determine whether the inclusion of such options affected item difficulty and/or item discrimination. Experimental tests for the research were modified versions of a standard classroom test in psychology administered to college freshmen. Multiple-alternative choices
were added to selected items on the existing test but no other changes were made; thus the added choice was always an incorrect option. In addition, the study investigated relationships between typical multiple-choice response procedures as used in the standard test and response procedures designed to evaluate partial information. As the authors themselves indicated, the procedures employed did not adequately answer several of the research questions. Complex alternatives were shown to make items more difficult, but the data did not indicate whether complex alternatives reward or penalize students with greater degrees of knowledge, nor were differences in response methods clearly identifiable. Also, the results did not differentiate between the effects of none-of-the-above and all-of-the-above options.

McMorris, Brown, Snyder, and Pruzek examined faults of cue, grammar, and length in item-writing as reflected in item and sub-test scores, in a study conducted in 1972 (24). Two forms of a social studies examination, appropriate for the sample population of 11th graders, were developed for this research. The forms differed only with regard to which items were manipulated to include flaws. Half of the items on each form were modified so that items which appeared in a faulted version on one form appeared on the parallel form in a clean version. In addition, an attempt was made to match clean and faulted items appearing on the same form with regard to difficulty level and content so that individual test-score differences between clean and faulted items could be obtained. Results indicated that for the population of secondary students considered, the faults generally made the items easier but did not affect either the validity or reliability coefficients. Difficulty effects for items in which the correct option was extra-long were considerably less than those found for faulty items containing cue and grammar violations. The design was also intended to provide measures of individual test-wiseness through score differences obtained by individuals on clean and faulted items. However, as the researchers themselves indicated, individual faulty-minus-clean scores are generally unreliable and interpretations that can be made from these score differences were limited. Furthermore, in this study, the assumption that the "matched" items on each form were truly matched might also be questioned. Correlations with standardized achievement tests suggested that students at different levels of mental ability did not appear to capitalize differently on any of the three faults. These conclusions were, however, based on mean scores for the groups and no analysis of results was made for subgroups of high, medium or low achievers.

Another study reported in 1972 was also concerned with the effects of poor item-writing practices on test difficulty, reliability and validity. This research was conducted by Cynthia Board and Douglas Whitney (13). Violations of four selected principles for writing multiple-choice items were examined. Effects of the following item flaws were analyzed: (1) inclusion of extraneous, irrelevant information in the stem (window dressing); (2) incomplete statements or questions in the stem; (3) extra-long or extra-short keyed responses; and (4) grammatical inconsistencies. For the study, three forms of a 30-item test were developed using items from a midterm examination in American Politics administered to undergraduate college students.
Test I was considered to be fault-free. Test II contained items, half of which had keyed responses that were either extra-long or extra-short and half of which contained grammatical inconsistencies that offered cues to correct responses. In contrast to the findings in the study by McMorris et al., the results indicated little or no overall effects on item difficulty for three of the four faults, but significant decreases in indices of test reliability and validity were found which were attributed to the item faults. It had been anticipated that the window-dressing and incomplete-stem faults would make items more difficult and that length and grammar faults would make items easier. The hypothesis for incomplete-stem faults was supported by the findings. No significant effects due to window-dressing were found, although there was some indication that the fault made items easier for poorer students and more difficult for better students. When keyed responses were made either extra long or extra short, poor students seemed to gain more than better students from the faults, but no overall significant decreases in difficulty were noted. Grammatical inconsistencies between stem and keyed responses did not have a major effect on test difficulty for either group.

A further study of the effects of violations of principles of item-writing was reported by Fred Pyrczak in a paper presented at the 1973 Annual Meeting of the National Council on Measurement in Education (45). The Pyrczak study investigated the effects of similarities between stems and keyed responses on test difficulty and whether examinees could be trained to take advantage of the "similarity" principle. The research design in this study differed from previously reported studies in two important ways. Firstly, faulty items were taken from published vocabulary tests, not deliberately developed with flaws for experimental purposes, and thus conceivably represent a more realistic duplication of a true testing situation. Secondly, the subjects for this study were selected from a population of extremely sophisticated test-takers. Examinees were undergraduate and graduate students enrolled in courses in educational research and measurement, and many had had considerable experience in writing multiple-choice tests. Results indicated no significant difficulty effects attributable to stem-option similarities prior to training but highly significant differences after training. However, since "training" consisted only of inclusion of instructions to look for stem-option similarities when in doubt, the researcher's interpretation of results can be questioned. The findings may be more indicative of the ability of these experienced test-takers to follow instructions carefully than of the acquisition of a cue-using strategy. It would have been interesting to see whether the skill was applied at a later date, using a similar test but with both faulty and non-faulty items included.

As the popularity of the multiple-choice test has increased, the quality of items in teacher-made tests has become of increasing concern to measurement specialists. Principles of "good" item-writing are outlined in texts by such authorities as Adkins, Ebel, Lindquist and Thorndike (12,16,21,22). Test specialists devote considerable time and effort to assuring that the established rules are followed, and collections of sample items to assist the classroom teacher in developing tests are available in many content areas (46,47,48). In addition, books of instruction on how to take tests have been written to encourage test-takers to be more cognizant of item-writing faults which offer advantages to the test-wise student (49). Yet, as can be seen from this review, empirical evidence to support or refute the implied assumptions of the "do" and "don't" rules is sparse. Further research is needed to determine the differential effects of violations of item-writing principles under realistic testing conditions, among populations of differing levels of test sophistication.

CHAPTER III
PROCEDURES
Research Design
As indicated in Chapter I, this study was designed to investigate the influence of faulty item-writing on the accuracy of multiple-choice tests as measures of achievement among two populations of health profession students. To answer the stated questions concerning the effects of faulty items on test difficulty and reliability, a quasi-experimental research design was developed.
Independent Variables
The design included two independent variables, item structure and achievement level. The first of these variables, item structure, was defined as the presence or absence of specified item faults. The item faults studied were:

1. Extra-long keyed response.
2. Keyed response as middle alternative in number sequence.
3. Parallel alternatives, one of which is the keyed response.
4. "All-of-the-above" as the keyed response and only multiple-alternative choice.

The second independent variable, achievement level, was derived on the basis of the subjects' scores on non-manipulated items included in the test instruments. For purposes of analysis, the sample populations were divided into quartiles with respect to this variable.
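
A minimal Python sketch of this quartile assignment follows; the function name and score values are hypothetical, since the dissertation does not describe the exact computation used.

```python
import numpy as np

def assign_quartiles(non_manipulated_scores):
    """Assign each examinee to an achievement quartile (1 = lowest, 4 = highest)
    based on the total score over the non-manipulated items."""
    scores = np.asarray(non_manipulated_scores, dtype=float)
    q1, q2, q3 = np.percentile(scores, [25, 50, 75])   # quartile cut points
    # Scores at or below a cut point fall into the lower group.
    return np.searchsorted([q1, q2, q3], scores, side="left") + 1

# Example: total non-manipulated-item scores for ten hypothetical students.
print(assign_quartiles([12, 30, 22, 18, 25, 27, 15, 20, 29, 24]))
```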
Dependent Variables
The study was designed to determine the effects of the fault/no-fault treatments on two dependent variables: (1) test difficulty and (2) test reliability.

Two separate measures of the variable test difficulty were used. Since previous studies had suggested that confidence-in-response (C-I-R) testing procedures might provide more finely differentiated data with respect to estimates of examinees' true state of knowledge than could be obtained through traditional number-of-correct-response tallies (50,51), both of these measures of test difficulty were incorporated into the design.
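
To make the two difficulty measures concrete, the short sketch below tallies a regular number-correct score and a percent-confidence score for the same examinee; the data structures and names are illustrative assumptions, not the study's actual scoring code.

```python
def regular_score(responses, key):
    """Conventional tally: one point for each item whose chosen option matches the key."""
    return sum(choice == key[item] for item, choice in responses.items())

def cir_score(confidences, key):
    """Confidence-in-response tally: the percent confidence assigned to the keyed
    option of each item, summed over items."""
    return sum(conf.get(key[item], 0) for item, conf in confidences.items())

key = {"q1": "c", "q2": "b"}
responses = {"q1": "c", "q2": "a"}                      # option chosen per item
confidences = {"q1": {"c": 80, "d": 20},                # percent confidence per option
               "q2": {"a": 60, "b": 40}}
print(regular_score(responses, key))    # 1 item correct
print(cir_score(confidences, key))      # 80 + 40 = 120 confidence points
```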
The second dependent variable, test reliability, was determined by measures of Cronbach's alpha.
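
Cronbach's alpha is a standard internal-consistency coefficient; the generic numpy sketch below (not the study's own computation) shows how it is obtained from an examinee-by-item score matrix.

```python
import numpy as np

def cronbach_alpha(item_scores):
    """item_scores: 2-D array with rows = examinees and columns = items (e.g. 0/1)."""
    x = np.asarray(item_scores, dtype=float)
    k = x.shape[1]                               # number of items
    item_variances = x.var(axis=0, ddof=1)       # variance of each item
    total_variance = x.sum(axis=1).var(ddof=1)   # variance of examinees' total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Tiny example: five examinees answering four items scored 0/1.
data = [[1, 1, 0, 1],
        [0, 1, 0, 0],
        [1, 1, 1, 1],
        [0, 0, 0, 1],
        [1, 0, 1, 1]]
print(round(cronbach_alpha(data), 3))
```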
Statistical Hypotheses
For purposes of analysis, the following null hypotheses with respect to the dependent variable test difficulty were formulated for each of the item faults studied:

1. There will be no significant difference between the mean scores (as indicated either by the number of correct responses or by the expressed confidence in the correctness of keyed responses) achieved on faulty items and the mean scores achieved on paired fault-free items.

2. There will be no significant interaction between student achievement levels and the effects of item faults on mean scores.

For purposes of analysis, the following null hypothesis with respect to the dependent variable test reliability was proposed in this study:

There will be no significant difference between the reliability (Cronbach's alpha) of a test consisting of all faulty items and that of a test consisting of paired fault-free items.
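
The ANOVA tables in Chapter IV indicate that the difficulty hypotheses were evaluated with analysis of variance, within a replicated Latin square design (Figures 1 and 2). As a simplified, hedged illustration only, the sketch below runs a conventional two-way ANOVA for a fault effect and a fault-by-quartile interaction on hypothetical item-set scores; the column names and values are invented and the layout does not reproduce the study's Latin square analysis.

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Hypothetical long-format data: one row per student per item-set score.
df = pd.DataFrame({
    "score":     [8, 6, 9, 5, 7, 6, 8, 5, 9, 7, 8, 6, 10, 7, 9, 8],
    "structure": ["faulty", "fault_free"] * 8,           # item structure factor
    "quartile":  [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4],
})

# Main effects of item structure and achievement quartile, plus their interaction.
model = smf.ols("score ~ C(structure) * C(quartile)", data=df).fit()
print(anova_lm(model, typ=2))
```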
Description of Sample
In order to offset bias particular to a specific professional school and to make study results more generalizable, two sample populations were included in this study, one from a school of optometry and the other from a school of dentistry. Although identical research designs were initially planned for the two groups, experience with data obtained from the sample population of optometry students suggested several modifications in the format of the experimental tests. These changes were subsequently incorporated into the study of the dental school population and are described under the heading Final Examination for Pedodontics 534. However, the changes did not necessitate any alteration in data analysis plans, and these were identical for both groups.
Population I (Optometry Students)
Subjects for this sample were first-year students enrolled in the introductory course in Clinical Optometry (Optometry 343) at the Southern California School of Optometry. A total of 107 students took the final examination in Optometry 343 in the spring semester, 1976; however, for the purposes of analyses three of these students were randomly dropped from the study in order to achieve equal N's in all of the cells.
Population II (Dentistry Students)

Subjects for this sample were 104 second-year students enrolled in the introductory course in pedodontics (PEDO 534) at the University of Southern California School of Dentistry. All students who took the final examination in PEDO 534 during the summer trimester, 1976, were included in the sample, providing a total of 126 subjects. Prior to analysis of the data, 22 students who were identified as foreign students with limited facility in the English language were dropped from the study. Consequently, data from a total of 104 students were available for analysis.
Description of Instruments
The experimental tests used for the study were the final examinations for Optometry 343 and Pedodontics 534. Two forms of each test were developed in the manner described below:
Final Examination for Optometry 343
The first step in developing this study instrument was the construction of a 78-item multiple-choice examination by the course instructor. The test was then reviewed to determine which items could most appropriately be manipulated for each of the study variables. An attempt was made to divide the 78 items so that an equal number of items could be assigned to each item-fault category, with a sufficient number of items remaining in the non-manipulated category to serve as a measure of student achievement for the course.

Initially it was planned that for the optometry study, 12 items would be assigned to each fault category, with the remaining 30 items serving as non-manipulated measures of achievement in the course. This
proved to be possible for the extra-long, number and parallel categories; however, only ten items were considered suitable for inclusion in the all-of-the-above category. Thus, a total of 46 items were available for manipulation. Included among these were several items in which one of the specified faults already existed. For each of these, a paired item in which the fault had been corrected was developed. Other items in this group were fault-free in their original form but were considered suitable for manipulation to fit into one of the faulty-item categories. Faulty versions of these items were developed in accordance with the procedures described below:

1. Faulty items in the extra-long-fault category were developed by rewriting the keyed response so it was at least 50% longer than any of the other alternatives. For example:

   In retinoscopy the ideal point for accurate determination of the principal meridian is:
   a. before the lenses are introduced.
   b. with only the working distance lens.
   *c. when sphere is added to produce neutrality in one meridian and "with" in the other.
   d. when all of the meridians are "against."

2. Faulty items in the parallel-fault category were developed by rewriting the item so that the keyed response was obviously paired with one of the other alternatives, either by reason of similarity or opposition, and no such pairing existed among the other alternatives. For example:
   In retinoscopy the behavior of the reflex changes with the addition of lenses so that as neutrality is approached:
   a. the speed of motion decreases.
   *b. the brightness increases.
   c. the brightness decreases.
   d. the width of the reflex decreases.
   e. the color fades.

3. Faulty items in the number-fault category were developed by modifying the response choices so that the numbers were in sequence and the keyed response appeared as a middle alternative. For example:

   The relation between axial length and refracting power is such that 1 mm of shortening of the axial length is equivalent to a dioptric change of about:
   a. 1D
   b. 2D
   *c. 3D
   d. 4D
   e. 5D

4. Faulty items in the all-of-the-above-fault category were developed by modifying the item stem or the alternative choices so that "all of the above" would be the keyed response and the only multiple-alternative choice listed. For example:

   An eye is myopic by 1.50 diopters in the vertical meridian and by 2.50 diopters in the horizontal meridian. Which of the following would be observed in retinoscopy?
   a. Vertical meridian neutral at 33 cm.
   b. Horizontal meridian neutral at 40 cm.
   c. Both meridians show "with" motion at 33 cm.
   d. Both meridians show "against" motion at 1 m.
   *e. All of the above.

When these procedures were completed and, as noted, 46 pairs of items had been developed, the item sets were again reviewed to ascertain whether they contained any item-writing faults other than those specifically intended for purposes of the study. When necessary, items were rewritten to meet criteria of fault-freeness as judged by a committee of three faculty members comprised of the researcher, the course instructor and a consultant psychometrist associated with the University of Southern California School of Medicine.

The next step in the procedure for developing the experimental instrument was to distribute the 46 item pairs on two separate forms (A and B). Half of the fault-free items in each category were randomly assigned to form A, with the remainder automatically placed on form B. For each fault-free item assigned to form A, the paired faulty version of that same item was assigned to form B. Similarly, the faulty versions of fault-free items on form B appeared on form A. Each of the test forms included the same 32 non-manipulated items.
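
A minimal sketch of this counterbalanced assignment, applied to one fault category at a time, is shown below; the function name, seed, and item identifiers are illustrative assumptions rather than details from the study.

```python
import random

def assign_forms(item_pairs, seed=0):
    """item_pairs: identifiers of items that exist in a fault-free and a faulty version.
    Returns two dicts mapping item id -> version placed on form A and form B."""
    rng = random.Random(seed)
    shuffled = item_pairs[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    form_a, form_b = {}, {}
    for i, item in enumerate(shuffled):
        if i < half:    # fault-free version on form A, faulty version on form B
            form_a[item], form_b[item] = "fault_free", "faulty"
        else:           # remaining pairs reversed
            form_a[item], form_b[item] = "faulty", "fault_free"
    return form_a, form_b

# Example with one fault category of 12 paired items.
pairs = [f"extra_long_{n}" for n in range(1, 13)]
form_a, form_b = assign_forms(pairs)
print(form_a)
```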
Final Examination for Pedodontics 534
For the Pedodontics 534 final examination, the instructor developed a 100-item multiple-choice test. As with the optometry examination, the dentistry test was reviewed to determine which items could most appropriately be manipulated for each of the study variables.
Again, an attempt was made to divide the items so that an equal number could be assigned to each item-fault category, with a sufficient number remaining in the non-manipulated category to serve as a measure of student achievement for the course. This time, ten items were deemed suitable for inclusion in the number and all-of-the-above categories and eight for the extra-long and parallel categories.

Data from the optometry study suggested one modification in item format that was subsequently incorporated into the design of the study instrument used in the dentistry study. Specifically, a change was made in the manner in which the fault-free versions of all-of-the-above items were developed. As noted previously, on the optometry examination faulty items in this category were corrected by altering either the item stem or item alternatives so that "all of the above" was no longer the correct response. After considering the data from the first population studied, the researcher felt that this procedure necessitated undesired changes in item content. Therefore, in order to assure equivalent content for paired items in all-of-the-above sets used in the dentistry tests, a complex-alternative item format suggested by the National Board Examinations was employed. This format provides several multiple-alternative response choices in addition to an all-of-the-above or all-are-correct alternative. An example of one of the items in this format, along with the instructions for responding as they appeared on the Pedodontics examination, is shown below:

For each of the questions or incomplete statements below, ONE or MORE of the answers or completions given is correct. Record your answer by circling the correct letter below the question.

   Which of the following statements is (are) true regarding injuries to primary teeth?
   1. Maxillary centrals are most often affected.
   2. Most occur at "toddler" age.
   3. Coronal fractures involving enamel only are rare.
   4. Displacement is more common than fracture.

   A. 1,2,3
   B. 1,3
   C. 2,4
   D. 4 only
   *E. All are correct

In order to avoid repeating the special instructions required for these items, the fault-free versions of items had to be grouped in one section on each form. This would have meant placing in sequence ten items in which the keyed response was "all are correct." To overcome this effect, eight additional items were developed in the complex-alternative format in which the keyed response was one of the alternatives other than "all are correct." These items, which were identified as "distractors," were intermixed with the fault-free items in which "all are correct" was the keyed response.
Test Administration and Scoring
Instruction Sessions
Since the confidence-in-response (C-I-R) test format was new to
most of the subjects who were to be included in the study, preliminary
instructional sessions on the use of C-I-R techniques were considered
necessary. Consequently, the researcher met with each group, at
the last class session prior to their final examination, to discuss
response and scoring procedures for C-I-R evaluation. At these
sessions, C-I-R testing aspects of the study were discussed in detail,
but no indication was given of the researcher's intent to measure the
effects of faulty item-writing. Also, the students were told that
there would be two forms of the final test distributed on the day of
the examination and that both forms would contain similar questions
but items would be displayed in different orders.
During the instructional sessions, the advantages of a testing
system in which examinees could be credited for partial knowledge were
pointed out, and the benefits to the examinee of reporting confidence
estimates honestly were emphasized. To further demonstrate these concepts,
each student was provided with an information packet which contained
a summary of C-I-R testing and scoring techniques. A copy of
the C-I-R information packet given to the dentistry students is included
in Appendix A of this report. Finally, in order to familiarize
the students with the techniques and to provide practice in their use,
a sample test was administered and scored during each instructional
session.
The sample tests consisted of five multiple-choice questions
which focused on content from advanced courses in the respective professional
school programs. The intent was to select subject matter
with which the examinees would probably have varying degrees of knowledge
and, consequently, could be expected to display varying degrees
of confidence in responses, thus providing examples of scoring across
the full range of response possibilities. The students were asked to
score their own tests by both regular and C-I-R scoring methods. The
procedures for each of these scoring techniques are described under
the paragraph heading, Test Scoring.
Administering the Final Examinations
On the scheduled dates for the final examinations in Optometry
343 and Pedodontics 534, the tests were administered by the respective
course instructors in accordance with the customary school procedures.
The researcher was present as an observer but did not participate in
the test administration.
In order to collect data on the expressed confidence in responses,
special answer sheets were designed which provided space for the
student to indicate the percent confidence felt in the correctness of
each response alternative for a given item (see Appendix B). On the
Optometry tests the students were asked to indicate confidence for all
items, but since the manipulated items on the Pedodontics examination
all appeared among the first fifty items of the test, subjects were
required to indicate confidence only for questions on the first half
of this test.
Test Scoring
Scores were derived for each test using both regular scoring
(conventional number-of-correct-response tally) and confidence-in-
response (C-I-R) scoring procedures. For purposes of analyses, C-I-R
scores were derived directly on the basis of percent confidence expressed
in keyed responses. However, in order to fulfill the researcher's
obligation to the students to provide maximum scoring for honest
estimates of confidence, a second set of C-I-R scores was calculated
for reporting purposes. These scores were derived using the reproducing
scoring system (RRS) suggested by Shuford, Albert and
Massengill (52).
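To make the two scoring procedures concrete, the following sketch derives a regular score and a percent-confidence (C-I-R) score for one examinee. It is only an illustration of the scoring logic described above: the items, key, and confidence figures are invented, the regular score is obtained here by treating the highest-confidence alternative as the examinee's choice, and the reproducing scoring system used for the scores reported to students is not reproduced.

# Each response records the percent confidence the examinee assigned to every
# alternative of an item (confidences for one item are assumed to sum to 100).
responses = {
    "item1": {"A": 100, "B": 0,  "C": 0,  "D": 0},
    "item2": {"A": 25,  "B": 50, "C": 25, "D": 0},
    "item3": {"A": 0,   "B": 10, "C": 70, "D": 20},
}
keyed = {"item1": "A", "item2": "C", "item3": "C"}   # hypothetical answer key

def regular_score(responses, keyed):
    # Regular scoring: the alternative given the greatest confidence is treated
    # as the examinee's choice; the score is the percent of items answered correctly.
    correct = sum(
        1 for item, conf in responses.items()
        if max(conf, key=conf.get) == keyed[item]
    )
    return 100.0 * correct / len(responses)

def cir_score(responses, keyed):
    # C-I-R scoring as used in the analyses: the mean percent confidence the
    # examinee expressed in the keyed response across all items.
    return sum(conf[keyed[item]] for item, conf in responses.items()) / len(responses)

print(regular_score(responses, keyed))   # 2 of 3 keyed alternatives chosen
print(cir_score(responses, keyed))       # (100 + 25 + 70) / 3

With the invented data above, the regular score and the C-I-R score differ, which is exactly the situation in which credit for partial knowledge becomes visible.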
Analyses of items after initial scoring procedures were completed
revealed several errors (typographical and/or content) that warranted
dropping items from the study analyses. Tables 3.1 and 3.2 show the
final number of items used in the data analyses for each category.
Data Analyses
Analyses Involving the Variable Test Difficulty
As noted previously, since faulty and fault-free versions of the
same items could not practicably be assigned to the same person in
real testing situations, items were assigned to the two forms of each
test in a counterbalanced manner. Each fault category could, therefore,
be considered to comprise two sets of items. By definition, one
of these sets, Item Set I, was considered to consist of items which
were assigned in their fault-free versions to form A of a test and,
thus, automatically appeared as faulty items on form B. Item Set II
was defined as the set of items which were fault-free on form B and
faulty on form A.
The counterbalanced design was analyzed by means of the basic
Latin square diagrammed in Figure 1, in which Item Structure (i.e.,
the presence or absence of item faults) is a two-level treatment factor
and Student and Item Set are considered as nuisance variables.
Comparisons between B1 and B2 served to test the hypotheses concerning
test difficulty. To answer the research questions with
respect to this variable, the basic Latin square was replicated 52
times in each study to encompass the 104 students in the population
samples.
TABLE 3.1
ASSIGNMENT OF ITEMS TO MANIPULATED AND
NON-MANIPULATED CATEGORIES FOR
OPTOMETRY EXAMINATION

                                 Number of Items
                      Initial        On Test        Analyzed
                    Design Plan     Instrument      for Study
Total                    78             78             70*
Non-manipulated          30             32             26
Manipulated              48             46             44
  Extra-long             12             12             12
  Parallel               12             12             12
  Number                 12             12             12
  All-of-the-above       12             10              8

*6 non-manipulated and 2 manipulated items were not analyzed because
of typographical and/or content errors found after scoring.
TABLE 3.2
ASSIGNMENT OF ITEMS TO MANIPULATED AND
NON-MANIPULATED CATEGORIES FOR
DENTISTRY EXAMINATION

                                 Number of Items
                      Initial        On Test        Analyzed
                    Design Plan     Instrument      for Study
Total                   100            100             93*
Non-manipulated          50             56             53
Distractors              10              8              8
Manipulated              40             36             32
  Extra-long             10              8              7
  Parallel               10              8              7
  Number                 10             10              9
  All-of-the-above       10             10              9

*3 non-manipulated and 4 manipulated items were not analyzed because
of typographical and/or content errors found after scoring.
                A1        A2
      S1        B1        B2
      S2        B2        B1

A = Item Set
    A1 = Fault-free items, form A
    A2 = Faulty items, form A
B = Item Structure
    B1 = Fault-free items
    B2 = Faulty items
S = Student
    S1 = Student who received form A
    S2 = Student who received form B

Figure 1. Diagram of basic Latin square replicated for analyses.
In order to test the hypotheses regarding interaction of test
difficulty with achievement level, subjects were divided into quartiles
on the basis of scores on non-manipulated items. Each quartile included
13 students who had taken form A of the test and 13 students who
had taken form B. The achievement level variable (C) was then superimposed
on the basic Latin square. Subjects who took the same test
form and were in the same quartile were considered to constitute a
Group Block (G). In all there were 8 group blocks, two for each
achievement level (one group taking form A and the other group taking
form B). The basic Latin square was then replicated 13 times for each
achievement level. Figure 2 shows the diagram of the complete design.
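A rough sketch of how the data for these difficulty analyses can be assembled may help clarify the design. The sketch is hypothetical: it assumes that each student's mean scores on the faulty and fault-free halves of a fault category have already been computed, and it only reproduces the bookkeeping (form, achievement quartile, group block) that feeds the replicated Latin squares; the analysis-of-variance computations themselves are not shown.

from dataclasses import dataclass

@dataclass
class StudentRecord:
    student_id: int
    form: str                     # "A" or "B"
    nonmanipulated_score: float   # used only to form achievement quartiles
    fault_free_score: float       # mean score on the fault-free items (B1)
    faulty_score: float           # mean score on the faulty items (B2)

def assemble_design(records):
    """Attach quartile (C) and group block (G) labels to each student record."""
    # Quartiles are formed from scores on the non-manipulated items.
    ranked = sorted(records, key=lambda r: r.nonmanipulated_score)
    quartile_size = len(ranked) // 4       # 26 students per quartile when N = 104
    design = []
    for rank, rec in enumerate(ranked):
        quartile = min(rank // quartile_size + 1, 4)
        # Two group blocks per quartile, one per test form (8 blocks in all).
        block = (quartile - 1) * 2 + (1 if rec.form == "A" else 2)
        design.append((rec, quartile, block))
    return design

def cell_means(design, quartile, form):
    """Mean fault-free (B1) and faulty (B2) scores for one quartile-by-form cell."""
    cell = [rec for rec, q, _ in design if q == quartile and rec.form == form]
    b1 = sum(rec.fault_free_score for rec in cell) / len(cell)
    b2 = sum(rec.faulty_score for rec in cell) / len(cell)
    return b1, b2

Note that for a student who took form A the fault-free scores come from Item Set I and the faulty scores from Item Set II, and the reverse holds for form B; this is the confounding of item set with the subject-by-structure interaction discussed later in the results.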
Analyses Involving the Variable Test Reliability
To answer the research questions regarding the effects of item
faults on test reliability, indices of reliability for faulty and
fault-free items were compared. Comparisons were made between (1)
the reliability (Cronbach's alpha) derived for faulty items on form A
and for their paired fault-free items on form B and (2) the coefficients
[Figure 2 diagram: the basic 2 x 2 Latin square (Item Set A1/A2 by Item
Structure B1/B2) repeated within eight group blocks, two blocks per
achievement quartile, each block containing students S1 through S13.]

A = Item Set
B = Item Structure
S = Student
G = Group Block
C = Achievement Quartile

Figure 2. Diagram of complete analysis design showing basic Latin
squares which were replicated 13 times for each quartile.
derived from the fault-free items on form A and their faulty counterparts
on form B. These analyses utilized statistic "W" with appropriately
corrected degrees of freedom as described by Feldt (53).
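For readers who want to see the reliability comparison in computational form, the following sketch computes coefficient alpha for a persons-by-items matrix of item scores and the ratio statistic used to compare two independent coefficients. It is a simplified illustration: only the simple ratio W = (1 - alpha_1)/(1 - alpha_2) is shown, the item scores are invented, and the corrected degrees of freedom prescribed by Feldt (53) are not derived here.

def cronbach_alpha(scores):
    """Coefficient alpha for a persons-by-items matrix of item scores."""
    n_items = len(scores[0])

    def variance(values):                       # population variance
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / len(values)

    item_variances = [variance([row[j] for row in scores]) for j in range(n_items)]
    total_variance = variance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)

def feldt_w(alpha_1, alpha_2):
    """Ratio statistic for comparing two independent alpha coefficients.

    W is referred to an F distribution; the appropriately corrected degrees
    of freedom are taken from Feldt (53) and are not reproduced here.
    """
    return (1 - alpha_1) / (1 - alpha_2)

# Illustration with invented 0/1 item scores for five examinees on four items.
form_a_items = [
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
    [1, 1, 1, 0],
]
print(round(cronbach_alpha(form_a_items), 3))

As a check against the reported results, the first dentistry entry of Table 4.17 follows this form: W = (1 - .030)/(1 - .442), which is approximately 1.74.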
Assumptions
For the purposes of this study, the following assumptions concerning
elements of the research design were considered reasonable and
were accepted as true:
1. It was assumed that items designated as "fault-free" did not
contain any violations of the principles of good item-writing.
2. It was assumed that the only fault in an item designated as
"faulty" was the specified item fault.
3. It was assumed that subjects indicated confidence in the
correctness of their responses honestly.
Limitations
The practical demands of educational programs placed certain
limitations on the research design for this study.
The following conditions were considered to be possible limiting
factors with respect to the generalizability of the study findings.
1. It was not practical to select randomly either courses or
instructors for the purposes of this study. However, since
the study focused on the characteristics of a specific test
in a testing situation, the effects of non-random selection
with respect to these variables were considered to be negligible.
2. The measurement instruments used in the study were actual
final examinations for the courses. Since test results were
to be used to determine official student rankings, it was not
considered ethical to manipulate items to experimental extremes
(e.g., excessive or minimal item difficulty, use of
nonsense questions, etc.).
3. The investigator was limited in choosing items to fit the
fault categories by the need to maintain content requirements
for the tests.
4. The design could not provide for measures of difference in
the response behavior of individuals on faulty and fault-free
versions of the same item.
Chapter IV
Results
In this chapter results of the analyses are presented and discussed
within the framework of the major research questions. Data obtained
from the two population samples with regard to the dependent
variable test difficulty will be examined first, followed by consideration
of the results of the analyses of the test reliability variable.
Analyses of Test Difficulty Variable
As previously noted, hypotheses with respect to test difficulty
(as indicated by differences between mean scores on faulty and fault-
free items) and the interactions of this variable with achievement
levels were set forth for each item fault category. In the following
discussions each of these will be considered in turn.
Extra-long-Fault Category
Findings
Tables 4.1 and 4.2 present the mean scores on items
in the extra-long category for the dentistry and optometry populations,
with results of the Latin square analyses of these data shown on
Tables 4.3 and 4.4. Each table presents data for regular as well as
percent confidence scoring systems. A significant increase (p<.01)
in mean scores on the faulty versions of extra-long items was indicated
for the optometry sample under both scoring systems, but no such
significance was noted for the dentistry data. On the other hand, the
percent confidence scoring system for the dentistry sample showed
TABLE 4.1
MEAN SCORES ON EXTRA-LONG ITEMS WITH REGULAR AND
PERCENT CONFIDENCE SCORING PROCEDURES
FOR DENTISTRY STUDENTS

                              Set I(a)                 Set II(b)
Achievement level      Fault-free    Faulty     Fault-free    Faulty

Regular Scoring
        1                 51.92       55.77        92.31       92.31
        2                 55.77       48.08        97.44       84.62
        3                 53.85       57.69        92.31       97.44
        4                 57.69       59.62        92.31       89.74

Percent Confidence Scoring
        1                 50.87       52.98        88.85       87.92
        2                 54.29       48.23        94.87       78.08
        3                 53.65       57.12        90.77       94.87
        4                 60.00       59.62        89.49       88.21

(a) Includes 4 items, fault-free on form A and faulty on form B
(b) Includes 3 items, fault-free on form B and faulty on form A
TABLE 4.2
MEAN SCORES ON EXTRA-LONG ITEMS WITH REGULAR AND
PERCENT CONFIDENCE SCORING PROCEDURES
FOR OPTOMETRY STUDENTS

                              Set I(a)                 Set II(b)
Achievement level      Fault-free    Faulty     Fault-free    Faulty

Regular Scoring
        1                 68.13       75.82        76.92       83.08
        2                 65.93       85.71        80.00       78.46
        3                 72.53       84.61        84.62       84.62
        4                 81.32       91.21        86.15       89.23

Percent Confidence Scoring
        1                 65.78       74.65        72.97       79.20
        2                 68.02       81.31        74.34       76.35
        3                 70.02       84.55        83.15       73.78
        4                 78.80       87.75        79.97       85.85

(a) Includes 7 items, fault-free on form A and faulty on form B
(b) Includes 5 items, fault-free on form B and faulty on form A
TABLE 4.3
ANALYSIS OF VARIANCE FOR EXTRA-LONG ITEMS WITH
REGULAR AND PERCENT CONFIDENCE SCORING
PROCEDURES FOR DENTISTRY STUDENTS

Regular Scoring
Source of Variance               SS      df         MS          F
Between Subjects
  C (achievement level)        481.79     3       160.60       <1.0
  Rows (group blocks)          120.52     1       120.52       <1.0
  Rows x C                      83.78     3        27.93       <1.0
  Error                      41829.99    96       435.73
Within Subjects
  B (item structure)            56.42     1        56.42       <1.0
  A (item sets)              72193.11     1     72193.11     310.17**
  BC                          1622.17     3       540.72       2.32
  AC                           417.63     3       139.21       <1.0
  Error                      22344.10    96       232.75

Percent Confidence Scoring
Between Subjects
  C (achievement level)       1194.16     3       398.05       1.09
  Rows (group blocks)          155.11     1       155.11       <1.0
  Rows x C                     252.97     3        84.32       <1.0
  Error                      34907.51    96       363.62
Within Subjects
  B (item structure)           201.20     1       201.20       1.03
  A (item sets)              62040.39     1     62040.39     317.15**
  BC                          1695.74     3       565.25       2.89
  AC                           556.94     3       185.65       <1.0
  Error                      18779.19    96       195.62

 *p<.05
**p<.01
TABLE 4.4
ANALYSIS OF VARIANCE FOR EXTRA-LONG ITEMS WITH
REGULAR AND PERCENT CONFIDENCE SCORING
PROCEDURES FOR OPTOMETRY STUDENTS

Regular Scoring
Source of Variance               SS      df         MS          F
Between Subjects
  C (achievement level)       3761.78     3      1253.93       4.21**
  Rows (group blocks)         1416.55     1      1416.55       4.75*
  Rows x C                     693.78     3       231.26       <1.0
  Error                      28607.65    96       298.00
Within Subjects
  B (item structure)          2652.72     1      2652.72      14.35**
  A (item sets)               1161.83     1      1161.83       6.28*
  BC                            72.87     3        24.29       <1.0
  AC                           327.80     3       109.27       <1.0
  Error                      17752.05    96       184.92

Percent Confidence Scoring
Between Subjects
  C (achievement level)       3058.94     3      1019.65       3.90*
  Rows (group blocks)         1493.40     1      1493.40       5.72*
  Rows x C                     987.89     3       329.30       1.26
  Error                      25067.21    96       261.12
Within Subjects
  B (item structure)          2229.23     1      2229.23      14.73**
  A (item sets)                227.56     1       227.56       1.50
  BC                           284.98     3        94.99       <1.0
  AC                           277.10     3        92.37       <1.0
  Error                      14524.30    96       151.29

 *p<.05
**p<.01
significant interaction between achievement level and item structure
(p<.05), whereas the optometry sample did not.
Tables 4.3 and 4.4 also show the results of the analyses of the
non-treatment variables of achievement level, item set and achievement-
times-set interaction. With respect to these variables, significant
differences in achievement level were shown for the optometry
sample (p<.01 for regular scoring and p<.05 for percent confidence
scoring) but no such differences were indicated for the dentistry
sample. In addition, very significant differences (p<.01) were found
between item sets in the dentistry sample under both scoring systems
and a difference at the p<.05 level was noted in the optometry sample
under regular scoring methods. Neither sample reflected a significant
interaction between achievement and item set.
Discussion
On the basis of the summary findings reported in Tables 4.3 and
4.4, there would appear to be considerable inconsistency between the
results obtained for the dentistry and optometry samples with respect
to the difficulty variable. However, a more detailed examination of
the data for the dentistry group reveals additional information that
should be noted. A review of the mean scores for faulty and fault-free
items in the extra-long category (see Table 4.1) shows that for most
of the achievement levels in the dentistry population, the scores on
faulty versions of items were equal to or higher than scores on the
corresponding fault-free versions, just as was true for the optometry
test (see Table 4.2). However, a reversal of this trend occurred in
the second quartile. Although this result cannot be explained other
than as a chance occurrence, its consequences should be considered.
For instance, it is likely that the significant interaction shown between
achievement and structure for the percent confidence scoring
system and the approaching significance evidenced under the regular
scoring procedures were results of this circumstance. Furthermore,
the reversal probably negated any possible findings of structure or
achievement effects.
Another result that warrants special consideration is the finding
of significant differences between item sets, despite the fact
that items were randomly assigned. It can be seen that similar
differences between item sets were found in almost all of the item
fault categories, although not always in the same direction; i.e., in
some fault categories, scores on items in Set I were lower, and in
others higher, than scores on items in Set II. Since, in the Latin
square design used, the item set effect was confounded with the
subject-times-structure interaction, the possibility that the set
effect could be caused by this interaction had to be considered. However,
this was believed to be highly unlikely because of the random
assignment of subjects to groups. Possibly, the number of items included
in each fault category was too few to establish a desirable
balance. This reasoning is supported by the fact that the overall
data show no consistent bias toward higher or lower-scoring items in
either set.
In summary, as originally anticipated, faulty items in the extra-
long-fault category seemed to be less difficult than their fault-free
versions. This finding is supported by results in previous studies
(24,25). However, in the present study there were no indications that
this effect was differential with respect to achievement levels as was
found in a study reported by Board and Whitney in 1972 (13).
Parallel-Fault Category
Findings
The results of the analyses of scores on faulty and fault-
free items in the parallel-fault category for the two sample populations,
under both regular and percent confidence scoring systems, are
shown on Tables 4.5, 4.6, 4.7, and 4.8.
Contrary to the findings in the extra-long-fault category, these
analyses showed significant decreases (p<.01 for confidence scoring
and p<.05 for regular scoring) in mean scores for faulty items on the
dentistry test and a similar downward trend, though not at significant
levels, for faulty items on the optometry test. No interactions between
achievement levels and item structure were indicated for either
sample population, under either scoring system.
Analyses of non-treatment factors in the parallel-fault category
revealed significant differences (p<.01) between achievement levels
for the optometry group but no such differences were indicated for
the dentistry sample.
Discussion
It is of particular note that the findings in the parallel-fault
category showed a trend in the opposite direction from the one anticipated
in the original hypotheses; that is, in general these faulty
items turned out to be more difficult than the fault-free items. In
interpreting these results, the different approaches used in developing
TABLE 4.5
MEAN SCORES ON PARALLEL ITEMS WITH REGULAR AND
PERCENT CONFIDENCE SCORING PROCEDURES
FOR DENTISTRY STUDENTS

                              Set I(a)                 Set II(b)
Achievement level      Fault-free    Faulty     Fault-free    Faulty

Regular Scoring
        1                 92.31       82.05        61.54       48.08
        2                 87.18       92.31        61.54       48.08
        3                 89.74       84.62        57.69       53.85
        4                 92.31       74.49        65.38       63.46

Percent Confidence Scoring
        1                 87.44       78.46        53.67       45.00
        2                 85.90       89.08        58.56       44.23
        3                 88.59       83.85        57.02       50.10
        4                 90.51       78.97        61.73       59.81

(a) Includes 3 items, fault-free on form A and faulty on form B
(b) Includes 4 items, fault-free on form B and faulty on form A
TABLE 4.6
MEAN SCORES ON PARALLEL ITEMS WITH REGULAR AND
PERCENT CONFIDENCE SCORING PROCEDURES
FOR OPTOMETRY STUDENTS

                              Set I(a)                 Set II(b)
Achievement level      Fault-free    Faulty     Fault-free    Faulty

Regular Scoring
        1                 69.23       64.42        69.23       65.38
        2                 77.88       69.23        73.08       73.08
        3                 82.69       79.81        65.38       55.77
        4                 87.50       88.46        71.15       73.08

Percent Confidence Scoring
        1                 67.98       63.33        64.88       64.08
        2                 73.42       66.36        71.81       70.67
        3                 78.76       79.02        62.50       56.71
        4                 87.40       87.64        69.10       72.69

(a) Includes 8 items, fault-free on form A and faulty on form B
(b) Includes 4 items, fault-free on form B and faulty on form A
TABLE 4.7
ANALYSIS OF VARIANCE FOR PARALLEL ITEMS WITH
REGULAR AND PERCENT CONFIDENCE SCORING
PROCEDURES FOR DENTISTRY STUDENTS

Regular Scoring
Source of Variance               SS      df         MS          F
Between Subjects
  C (achievement level)        543.14     3       181.04       <1.0
  Rows (group blocks)           75.14     1        75.14       <1.0
  Rows x C                    1472.63     3       490.88       1.15
  Error                      41061.48    96       427.72
Within Subjects
  B (item structure)          2526.90     1      2526.90       6.72*
  A (item sets)              46953.12     1     46953.12     125.00**
  BC                           495.08     3       165.03       <1.0
  AC                          1360.61     3       453.54       1.21
  Error                      36061.22    96       375.64

Percent Confidence Scoring
Between Subjects
  C (achievement level)       1144.84     3       381.61       1.17
  Rows (group blocks)           77.54     1        77.54       <1.0
  Rows x C                    1234.73     3       411.58       1.26
  Error                      31447.37    96       327.58
Within Subjects
  B (item structure)          2362.50     1      2362.50       8.77**
  A (item sets)              51876.46     1     51876.46     192.50**
  BC                            84.77     3        28.26       <1.0
  AC                          1085.76     3       361.92       1.34
  Error                      25871.31    96       269.49

 *p<.05
**p<.01
TABLE 4.8
ANALYSIS OF VARIANCE FOR PARALLEL ITEMS WITH
REGULAR AND PERCENT CONFIDENCE SCORING
PROCEDURES FOR OPTOMETRY STUDENTS

Regular Scoring
Source of Variance               SS      df         MS          F
Between Subjects
  C (achievement level)       4369.42     3      1546.47       5.57**
  Rows (group blocks)           12.02     1        12.02       <1.0
  Rows x C                     384.62     3       128.21       <1.0
  Error                      26646.63    96       277.57
Within Subjects
  B (item structure)           588.94     1       588.94       2.18
  A (item sets)               4338.94     1      4338.94      16.07**
  BC                           432.69     3       144.23       <1.0
  AC                          4495.19     3      1498.40       5.55*
  Error                      25925.48    96       270.06

Percent Confidence Scoring
Between Subjects
  C (achievement level)       5501.75     3      1833.92       9.40**
  Rows (group blocks)           40.89     1        40.89       <1.0
  Rows x C                     277.22     3        92.41       <1.0
  Error                      18732.01    96       195.13
Within Subjects
  B (item structure)           191.77     1       191.77       <1.0
  A (item sets)               4149.77     1      4149.77      20.48**
  BC                           270.69     3        90.23       <1.0
  AC                          4319.55     3      1439.85       7.10**
  Error                      19448.53    96       202.59

 *p<.05
**p<.01
item pairs for this fault category should be kept in mind.
For example, in some instances fault-free items were formed by
eliminating response-choices which paralleled keyed responses and
adding new alternatives. At other times an attempt was made to negate
the effects of a parallel fault by developing double parallel sets
for the fault-free item. For these items, a response choice was added
which paralleled one of the existing alternatives and one of the other
alternatives in the set was deleted. In all of these items, an equal
number of response-choices (four) appeared on faulty and fault-free
versions of items. Analyses of data for both sample populations
showed that faulty items developed using these procedures were generally
more difficult than their fault-free counterparts. For the
dentistry population, where almost all of the items were manipulated
in one of these two ways, the decrease in scores on faulty items was,
as previously noted, significant. In addition, all of the optometry
items manipulated in this same manner showed a similar increase in
difficulty.
Several explanations for the findings of decreased scores on these
faulty items can be postulated. For instance, it is possible that in
manipulating items to make them fault-free, highly pertinent alternatives
may have been eliminated and/or less favored response-choices
substituted, thus making it easier to select the correct answer.
Another possibility is that response alternatives added to mitigate
the effects of parallelism may have inadvertently provided additional
information that cued the examinee to the keyed response. Some evidence
for these explanations can be seen on inspection of the items
in the optometry test in which a reversed trend in mean effects is
noted. These are items in which existing alternatives were not modified
in any way but a fifth alternative was added, forming a second
set of parallel options. Because of the additional alternative, lower
scores on these items would be expected, and this was the case. Yet,
despite these effects, the overall trend for optometry items was, as
previously noted, toward increased difficulty for faulty items.
Examination of the summary data (see Tables 4.7 and 4.8) reveals
that the significant differences found between item sets in the extra-
long-fault category were evidenced here as well. This time, however,
for the optometry population, there is also a strong achievement-
times-item-set interaction indicated. As previously suggested, random
assignment of items to test forms and subjects to groups makes it reasonable
to consider that these effects may be chance results attributable
to the small number of items included in the samples.
Looking at the mean scores for both sample populations (Tables
4.5 and 4.6), it can be seen that the pattern of achievement that would
normally be anticipated for the groups is found with items in Set I
on the optometry test; however, unexpected reversals in achievement
trends were noted for both sets of items with the dentistry population
and for Set II items with the optometry group. Here, again, it may be
speculated that, due to the small number of items sampled, chance
effects may have been observed.
To summarize, the findings in this study with respect to parallel
items did not seem to support the initial hypothesis of increased scores
for faulty items but, instead, showed a trend in the opposite direction.
Although interpretations of the obviously complex effects in operation
here would be hazardous, based on the limited data in the present
study, the findings did emphasize a need for further investigation.
Number-Fault Category
Findings
Tables 4.9, 4.10, 4.11, and 4.12 show the results of the analyses of
the data on faulty and fault-free items in the number-fault category.
Here, the trend toward decreasing mean scores on faulty items which was
noted in the discussion of the parallel-fault category can again be
seen. Significant differences in the item structure variable at the
p<.05 level were indicated for the dentistry tests, but results for
the optometry tests were mixed, with little or no directional effects
seen. No interaction between achievement level and item structure was
noted for either population, under either scoring system. The analyses
revealed expected patterns of achievement, with significant differences
between achievement levels for the dentistry population at the p<.05
level and for optometry students at p<.01, under both scoring methods.
Also, a very significant difference between item sets (p<.01) was found
for the dentistry population under both scoring systems and with percent
confidence scoring for optometry students. Using regular scoring
procedures, differences between sets in the optometry population approached
but did not reach significance. No significant achievement-
times-item-set interactions were indicated with either group, under
either scoring system.
Discussion
For items in the number-fault category, a reversal of expected
TABLE 4.9
MEAN SCORES ON NUMBER ITEMS WITH REGULAR AND
PERCENT CONFIDENCE SCORING PROCEDURES
FOR DENTISTRY STUDENTS

                              Set I(a)                 Set II(b)
Achievement level      Fault-free    Faulty     Fault-free    Faulty

Regular Scoring
        1                 48.08       48.08        64.62       56.92
        2                 59.62       59.62        73.85       58.46
        3                 75.00       48.08        70.77       72.30
        4                 71.15       61.54        72.30       72.30

Percent Confidence Scoring
        1                 48.17       48.83        61.00       52.92
        2                 54.42       59.33        73.12       53.66
        3                 67.79       49.04        64.69       69.15
        4                 67.88       56.44        70.97       72.35

(a) Includes 4 items, fault-free on form A and faulty on form B
(b) Includes 5 items, fault-free on form B and faulty on form A
TABLE 4.10
MEAN SCORES ON NUMBER ITEMS WITH REGULAR AND
PERCENT CONFIDENCE SCORING PROCEDURES
FOR OPTOMETRY STUDENTS

                              Set I(a)                 Set II(b)
Achievement level      Fault-free    Faulty     Fault-free    Faulty

Regular Scoring
        1                 78.46       80.00        75.82       73.62
        2                 83.08       84.62        83.51       81.32
        3                 84.62       92.31        84.61       82.42
        4                 84.62       87.69        89.01       82.42

Percent Confidence Scoring
        1                 77.37       76.38        74.90       72.28
        2                 80.29       80.12        81.39       79.07
        3                 82.82       88.48        81.63       78.36
        4                 84.69       85.48        86.46       78.17

(a) Includes 5 items, fault-free on form A and faulty on form B
(b) Includes 7 items, fault-free on form B and faulty on form A
TABLE 4.11
ANALYSIS OF VARIANCE FOR NUMBER ITEMS WITH
REGULAR AND PERCENT CONFIDENCE SCORING
PROCEDURES FOR DENTISTRY STUDENTS

Regular Scoring
Source of Variance               SS      df         MS          F
Between Subjects
  C (achievement level)       6540.75     3      2180.25       3.09*
  Rows (group blocks)          182.81     1       182.81       <1.0
  Rows x C                    3711.90     3      1237.30       1.75
  Error                      67821.15    96       706.47
Within Subjects
  B (item structure)          2740.50     1      2740.50       6.30*
  A (item sets)               4025.12     1      4025.12       9.26**
  BC                           615.75     3       205.25       <1.0
  AC                           386.90     3       128.97       <1.0
  Error                      41744.23    96       434.84

Percent Confidence Scoring
Between Subjects
  C (achievement level)       5525.89     3      1841.96       3.55
  Rows (group blocks)            7.03     1         7.03       <1.0
  Rows x C                    4455.88     3      1485.29       2.86
  Error                      49863.91    96       519.42
Within Subjects
  B (item structure)          1743.77     1      1743.77       6.15*
  A (item sets)               3536.36     1      3536.36      12.48**
  BC                           116.35     3        38.78       <1.0
  AC                            60.74     3        20.25       <1.0
  Error                      27206.72    96       283.40

 *p<.05
**p<.01
TABLE 4.12
ANALYSIS OF VARIANCE FOR NUMBER ITEMS WITH
REGULAR AND PERCENT CONFIDENCE SCORING
PROCEDURES FOR OPTOMETRY STUDENTS

Regular Scoring
Source of Variance               SS      df         MS          F
Between Subjects
  C (achievement level)       2798.89     3       932.96       4.65**
  Rows (group blocks)          593.80     1       593.80       2.96
  Rows x C                     118.80     3        30.60       <1.0
  Error                      19279.27    96       200.83
Within Subjects
  B (item structure)             0.35     1         0.35       <1.0
  A (item sets)                416.75     1       416.74       2.96
  BC                           140.75     3        46.92       <1.0
  AC                           194.45     3        64.82       <1.0
  Error                      13520.98    96       140.84

Percent Confidence Scoring
Between Subjects
  C (achievement level)       2259.60     3       753.20       5.21**
  Rows (group blocks)          386.00     1       386.00       2.67
  Rows x C                     164.14     3        54.70       <1.0
  Error                      13885.74    96       144.64
Within Subjects
  B (item structure)           102.13     1       102.13       1.11
  A (item sets)                443.70     1       443.70       4.80**
  BC                           162.13     3        54.04       <1.0
  AC                           211.70     3        70.57       <1.0
  Error                       8872.90    96        92.43

 *p<.05
**p<.01
effects of faulty items was found for the dentistry sample, but little
similar effect was noted for the optometry group. For these items, however,
it can be postulated that the differences in effects for the two
populations were probably due to differences in the types of items
rather than differences in item format. It should be noted that on
the optometry test almost all of the number items required mathematical
computations; alternatives, therefore, served simply to protect
against wild guessing. For these items the position or type of alternative
would not be expected to be an important determinant of
response choice, and results indicated that this was so. On the other
hand, on the dentistry test, most of the items in the number-fault
category involved recall abilities. When one of these items was
faulty, the keyed response was a middle alternative with two pertinent
(close in numerical value) alternatives displayed on either side. In
making the item fault-free, the keyed response was placed as either
a first or last alternative and, thus, only one alternative close in
value to the keyed response remained. Consequently, it may have been
easier to select the keyed responses in fault-free versions of these
items. In previous studies (37), researchers focusing on non-numerical
alternatives have found little evidence of option-position bias. However,
the findings here suggest the need for further research to study
response-position effects for number items.
In the number-fault category, as with the extra-long and parallel
categories, there were no indications of interactions between item
structure and achievement levels for either population under either
scoring method (see Tables 4.11 and 4.12). Expected differences in
achievement levels were indicated for both groups, and the familiar
pattern of item-set differences which has previously been discussed
was again evidenced.
In summary, the results of these analyses suggest that when item
responses require numerical calculations, the order in which response
choices are displayed does not significantly affect item difficulty.
There were, however, indications that order may be significant when
selection of a keyed response is a function of recall. For the populations
studied, such items tended to be less difficult when keyed
responses were either the first or last options listed. This finding
is a reversal of the original research hypothesis in which it was
anticipated that the faulty items in the number-fault category would
be easier than their fault-free counterparts.
All-of-the-above-Fault Category
Findings
The results of analyses of scores on faulty and fault-free items
in the all-of-the-above-fault category for the two sample populations,
under both regular and percent confidence scoring procedures, are
shown on Tables 4.13, 4.14, 4.15, and 4.16.
These findings show highly significant (p<.01) increases in scores
on faulty items for the dentistry population under both scoring systems.
Significant increases in scores on faulty items were also
found with the optometry group, at the p<.01 level under confidence
scoring procedures and at the p<.05 level with regular scoring. Just
as in the previously discussed fault categories, no interaction between
achievement levels and item structure was found for either
TABLE 4.13
MEAN SCORES ON ALL-OF-THE-ABOVE ITEMS WITH REGULAR AND
PERCENT CONFIDENCE SCORING PROCEDURES
FOR DENTISTRY STUDENTS

                              Set I(a)                 Set II(b)
Achievement level      Fault-free    Faulty     Fault-free    Faulty

Regular Scoring
        1                 66.15       84.62        69.23       73.08
        2                 73.85       96.92        76.92       86.54
        3                 80.00       89.23        73.08       92.31
        4                 84.62       98.46        76.92       90.38

Percent Confidence Scoring
        1                 68.00       84.00        66.92       74.13
        2                 73.23       95.38        78.37       82.69
        3                 78.38       88.77        73.56       90.77
        4                 82.92       95.38        75.86       88.17

(a) Includes 5 items, fault-free on form A and faulty on form B
(b) Includes 4 items, fault-free on form B and faulty on form A
TABLE 4.14
MEAN SCORES ON ALL-OF-THE-ABOVE ITEMS WITH REGULAR AND
PERCENT CONFIDENCE SCORING PROCEDURES
FOR OPTOMETRY STUDENTS

                              Set I(a)                 Set II(b)
Achievement level      Fault-free    Faulty     Fault-free    Faulty

Regular Scoring
        1                 50.00       55.77        40.38       55.76
        2                 65.38       75.00        48.08       57.69
        3                 69.23       63.46        59.62       73.07
        4                 69.23       69.23        67.30       78.85

Percent Confidence Scoring
        1                 45.77       54.00        41.67       50.52
        2                 63.46       70.85        43.08       54.54
        3                 62.56       63.83        60.44       68.10
        4                 64.35       69.23        63.17       75.50

(a) Includes 4 items, fault-free on form A and faulty on form B
(b) Includes 4 items, fault-free on form B and faulty on form A
TABLE 4.15
ANALYSIS OF VARIANCE FOR ALL-OF-THE-ABOVE ITEMS WITH
REGULAR AND PERCENT CONFIDENCE SCORING
PROCEDURES FOR DENTISTRY STUDENTS

Regular Scoring
Source of Variance               SS      df         MS          F
Between Subjects
  C (achievement level)       5860.58     3      1953.53       4.82**
  Rows (group blocks)          276.92     1       276.92       <1.0
  Rows x C                    1331.73     3       443.91       1.09
  Error                      38932.69    96       405.54
Within Subjects
  B (item structure)          9969.23     1      9969.23      39.48**
  A (item sets)               1017.31     1      1017.31       4.03*
  BC                           177.88     3        50.29       <1.0
  AC                           245.19     3        81.73       <1.0
  Error                      24240.38    96       252.50

Percent Confidence Scoring
Between Subjects
  C (achievement level)       4491.74     3      1497.24       4.32**
  Rows (group blocks)          323.13     1       323.13       <1.0
  Rows x C                    1112.29     3       370.76       1.07
  Error                      33249.93    96       346.35
Within Subjects
  B (item structure)          8462.82     1      8462.82      42.10**
  A (item sets)               1029.51     1      1029.51       5.12*
  BC                            36.16     3        12.05       <1.0
  AC                           232.97     3        77.66       <1.0
  Error                      19295.89    96       201.00

 *p<.05
**p<.01
TABLE 4.16
ANALYSIS OF VARIANCE FOR ALL-OF-THE-ABOVE ITEMS WITH
REGULAR AND PERCENT CONFIDENCE SCORING
PROCEDURES FOR OPTOMETRY STUDENTS

Regular Scoring
Source of Variance               SS      df         MS          F
Between Subjects
  C (achievement level)      12220.55     3      4073.53       6.60*
  Rows (group blocks)         1325.12     1      1325.12       2.15
  Rows x C                     609.98     3       203.33       <1.0
  Error                      59278.85    96       617.49
Within Subjects
  B (item structure)          2887.62     1      2887.62       5.03*
  A (item sets)               1084.74     1      1084.74       1.89
  BC                           393.63     3       131.21       <1.0
  AC                          3302.28     3      1100.76       1.92
  Error                      55144.23    96       574.42

Percent Confidence Scoring
Between Subjects
  C (achievement level)      11749.57     3      3916.52       7.90**
  Rows (group blocks)          278.66     1       278.66       <1.0
  Rows x C                      89.08     3        29.69       <1.0
  Error                      47597.12    96       495.80
Within Subjects
  B (item structure)          3125.19     1      3125.19       6.94**
  A (item sets)               1113.47     1      1113.47       2.47
  BC                           193.74     3        64.58       <1.0
  AC                          3548.15     3      1182.72       2.63
  Error                      43199.67    96       450.00

 *p<.05
**p<.01
sample.
As indicated on Tables 4.15 and 4.16, expected achievement-level
differences were evidenced for both population groups. Item-set differences
at the p<.05 level were noted for the dentistry sample but no
significant differences in this variable were found for the optometry
sample. No interactions between achievement and set were found for
either population.
Discussion
Research hypotheses with respect to the variable, test difficulty,
are supported by the findings for both the dentistry and optometry populations
in the all-of-the-above fault category. It had been postulated
that mean scores on faulty items would be higher than the mean
scores on paired items in which this fault did not exist. It was
speculated that these faulty items might provide undesired opportunities
for examinees to take advantage of partial knowledge in
order to identify keyed responses and thus improve their scores. The
findings in this study would seem to offer support for these assumptions.
At the same time, it should be noted also that no significant
interactions between item structure and achievement levels were found
(see Tables 4.15 and 4.16). These data would seem to indicate that
the use of such test strategy to raise scores is probably not specific
to any achievement group.
In considering the findings with regard to this fault category,
the different approaches used in developing fault-free items for the
two populations should be noted. It will be recalled that on the
optometry test some all-of-the-above faults were corrected by altering
the item stems to form negative statements and substituting new keyed
responses for all-of-the-above options. For other items on this same
test, fault-free versions were developed by changing the content of
alternative choices so that only one choice was correct. In addition,
on some fault-free items all-of-the-above options were retained even
though they were no longer keyed responses and, on others, these
choices were dropped. The possibility that the item structure effects
seen in this portion of the study might have been confounded with potential
effects of the negative constructions and/or variation in item
content had to be considered as limiting factors in interpreting the
reported findings.
It was because of the possible confounding effects described
above that changes in the format of fault-free items in this category
were introduced on the dentistry test. Fault-free items on this test
were developed by providing additional multiple-alternative choices.
The modified format allowed the examinee to indicate that one, two, three
or all of the existing options were correct, but no changes in content
were made on any item. The highly significant differences in
difficulty between faulty and fault-free items in this subset were
considered to be strong support for the stated hypotheses. The findings
of increased difficulty for fault-free items were in agreement
with similar results reported by Hughes and Trimble (26) in their
study of the use of complex alternatives in multiple-choice items.
Analyses of Test Reliability Variable
For purposes of analyzing reliability effects, items were grouped
so that all of the faulty items formed one subset and all of the
fault-free items, a second subset. This procedure was necessary because
of the small number of items within each item fault category.
The analyses employed Feldt's formula for comparing reliability coefficients
derived from independent samples of examinees, utilizing
statistic "W" with corrected degrees of freedom (53).
Findings
Comparisons of estimates of internal consistency (Cronbach's
alpha) for fault-free and faulty items in Item Sets I and II on both
the dentistry and optometry tests are shown on Table 4.17. As can
be seen, the data for Set I items on the dentistry test revealed a
significantly higher alpha coefficient for fault-free items than for
faulty items. For the other three item sets, however, a trend in the
opposite direction was noted, although these differences did not
reach levels of significance.
Discussion
In assessing these findings, two questions concerning the nature
of the item groups selected for reliability comparisons were raised.
First, the possibility that the reliability effects observed might have
been confounded by the fact that all faulty items were pooled was considered.
Did the inconsistent reliability effects noted on Table 4.17
in truth reflect the differing mean effects of specific item faults?
Such confounding effects were seen as highly unlikely, however, since
when the alpha coefficients for individual subsets (see Tables 4.18
and 4.19) were compared with previously reported data on the mean
scores for these subsets, no apparent relationship between the effects
of a fault and its pattern of reliability-change could be found. For
TABLE 4.17
COMPARISONS OF MEASURES OF INTERNAL CONSISTENCY*
FOR FAULTY AND FAULT-FREE ITEMS ON
DENTISTRY AND OPTOMETRY TESTS

                                Reliability Coefficients
Item Set     i(a)    N        Faulty       Fault-Free        df        W(b)
Dentistry
  I           16    52         .030           .442         48,48      1.74**
  II          16    52         .438           .369         48,48      1.12
Optometry
  I           24    52         .460           .360         49,49      1.19
  II          20    52         .455           .265         49,49      1.42

 *Cronbach's alpha
**p<.05
(a) i = number of items in set
(b) W = Feldt's statistic "W" (53)
TABLE 4.18
RELIABILITY COEFFICIENTS (CRONBACH'S ALPHA) FOR
FAULTY AND FAULT-FREE ITEM SETS
ON DENTISTRY TEST

                                        Alpha Coefficients
Fault Category        Item Set     Faulty Items    Fault-Free Items
Extra-long               I             .042             -.198
Extra-long               II            .128              .072
Parallel                 I             .029              .015
Parallel                 II            .064              .131
Number                   I             .351              .145
Number                   II            .016              .433
All-of-the-above         I             .069              .539
All-of-the-above         II            .004              .472
TABLE 4.19
RELIABILITY COEFFICIENTS (CRONBACH'S ALPHA) FOR
FAULTY AND FAULT-FREE ITEM SETS
ON OPTOMETRY TEST

                                        Alpha Coefficients
Fault Category        Item Set     Faulty Items    Fault-Free Items
Extra-long               I             .334              .042
Extra-long               II            .110             -.207
Parallel                 I             .480              .209
Parallel                 II           -.533              .341
Number                   I             .009             -.092
Number                   II            .221              .127
All-of-the-above         I             .033              .265
All-of-the-above         II            .261              .442
example, although opposite reliability effects were noted for extra-
long and all-of-the-above subsets, the mean effects for both of these
categories were in the same direction (i.e., higher means for faulty
items than for fault-free items).
A second matter of concern was the degree to which reliability
effects observed when faulty items were grouped as a single test could
be considered to indicate probable effects in a real testing situation
in which faulty items would, of course, be interspersed with fault-
free items. It was felt that as long as faulty items correlated
positively with non-faulty items, one could reasonably expect that the
reliability effects would be similar in both situations. Examination
of the correlations between faulty items and non-manipulated items in
the present study (see Table 4.20) showed such a positive relationship
for both the optometry and dentistry populations.
Theoretically viable arguments for reliability changes in either
direction due to item faults can be--and have been--suggested, depending
on how test-wise factors are interpreted. If, as many experts
have suggested, test-wiseness is considered to be constant, non-random
error variance (30), it might be expected to add to, not suppress, "true"
achievement and thus increase test reliability. In their study, Dunn
and Goldstein based their hypothesis concerning increased reliability
for faulty items on a somewhat similar concept. These researchers suggested
that the presence of cues would reduce random guessing and thus
increase test reliability (25).
On the other hand, since test-wise skills are probably used at
certain times and not used at other times, test-wiseness can be seen
TABLE 4.20
CORRELATIONS OF FAULTY AND FAULT-FREE ITEMS
WITH NON-MANIPULATED ITEMS ON THE
DENTISTRY AND OPTOMETRY TESTS

                                           Correlation Coefficients
Item Set     i(a)    i(b)     N         Faulty Items    Fault-Free Items
Dentistry
  I           16      53     52             .228              .408
  II          16      53     52             .471              .084
Optometry
  I           24      26     52             .547              .620
  II          21      26     52             .343              .482

(a) i = number of items in set
(b) i = number of non-manipulated items
N = number of examinees
as introducing a chance element into a test. The introduction of such
a non-predictable element would, of course, detract from the consistent
measurement of achievement.
Despite the theoretical arguments, findings in this study seem
to indicate that, for practical purposes, faulty items need not be
considered as detrimental to test reliability estimates. The overall
findings reported here are in line with results reported by most of
the other researchers who have investigated similar phenomena. Dunn
and Goldstein (25), Hughes and Trimble (26), and McMorris et al. (24)
have all indicated that the inclusion of item faults does not seem to
affect test reliability. These conclusions are, however, in contrast
to those reported by Board and Whitney (13) who found that item flaws
significantly reduced the internal consistency of a test for three out
of the four item faults investigated.
Overall, then, it would appear that the stated null hypothesis of
no significant reliability effects attributable to the presence of
faulty items is acceptable on the basis of the findings in this study.
CHAPTER V
SUMMARY, CONCLUSIONS AND RECOMMENDATIONS
Summary
This study investigated the effects of violations of four authority-
recommended item-writing principles on the difficulty and reliability
of multiple-choice tests administered to a population of health
profession students. The selected item faults were:
1. A keyed response noticeably longer than other alternatives
(extra-long fault).
2. A keyed response which is diametrically opposite, or closely
similar, to one other alternative (parallel fault).
3. A keyed response which is the middle alternative in a
numerical sequence (number fault).
4. A keyed response which indicates that all of the given
options are correct and is the only multiple-alternative
choice offered (all-of-the-above fault).
Subjects for the research included 104 dentistry students and 104
optometry students. Experimental tests developed for the study were
administered to each group as final examinations in required course
work at the end of the academic year in each professional school. For
experimental purposes, parallel forms of each test were constructed by
manipulating a portion of the items on the test to include the specified
item faults. A counterbalanced design was used in which faulty
items on one form appeared on a second form of each test as fault-free
items; similarly, fault-free items on the first form were paired with
faulty versions of the same items on the second form.
It was hypothesized that (1) faulty items would be easier than
fault-free items, (2) examinees would express more confidence in the
correctness of the keyed response on faulty items than on fault-free
items, (3) low-achievers would benefit more than high-achievers from
the presence of faults and (4) the reliability of the tests would be
significantly affected by the inclusion of faulty items.
In order to examine the difficulty effects indicated in these
hypotheses, the counterbalanced design was analyzed by means of a
basic Latin square in which item structure (i.e., presence or absence
of item faults) was a two-level treatment factor. Findings indicated
differential treatment effects that appeared to be dependent on
characteristics of specific faults. Results showed faulty items in
the extra-long and all-of-the-above fault categories to be less difficult.
Also, examinees showed more confidence in the correctness of
keyed responses for faulty versions of these items than was indicated
for fault-free versions. Score and confidence differences for both
faults were significant for the optometry population and, for all-of-
the-above faults, for the dentistry population. A similar trend was
noted for the dentistry group with extra-long faults, but an unexplained
reversal of scores at one achievement level in the dentistry
test reduced the overall differences for this sample.
A reversal of the hypothesized directional effects was revealed
in the analysis of the numerical and parallel item-fault categories.
For both of these faults, faulty items in the dentistry sample were
significantly more difficult, with less confidence expressed in the
correctness of keyed responses. A similar trend, but not at significant
levels, was noted for the optometry population with parallel-fault
items. However, the introduction of item faults in the number-fault
category for the optometry test, where most items involved arithmetical
calculations, did not show an effect in either direction.
No interactions between achievement and item structure were seen
with any fault for either population.
Findings with respect to reliability effects were mixed and, with
one exception, were generally not significant. The exception appeared
in the data for items in Set I on the dentistry test. Here alpha coefficient
estimates were significantly higher for fault-free items
than for faulty items. Comparisons of alpha coefficients for the remaining
set on the dentistry test and for both optometry item sets
showed slightly lower alphas for fault-free items than for faulty items
but, as mentioned, differences were not significant.
Conclusions
In view of the inconsistencies indicated by this investigation,
it is apparent that global statements concerning the effects of item-
writing faults for the population studied would be inappropriate. However,
in assessing the implications of the study, several interpretations
of the results can be suggested.
One of the first questions that should be considered concerns the
certainty that the researcher can have that the differential effects
observed represent "true" inconsistencies that may be reasonably
attributed to factors inherent in faultiness characteristics of items
rather than reflections of chance variables. Consideration of the
findings reported by such researchers as Board and Whitney, McMorris
et al., and Dunn and Goldstein (13,24,25) offers strong support for
the assumption that these inconsistencies are indeed real and treatment
related. Within each of these previous studies, observed effects
among several different item faults varied in degree and, in some
cases, direction with respect to particular item faults.
Once the element of "inconsistency" is accepted as a realistic
phenomenon, further questions have to be asked regarding the nature
of these inconsistencies. Is there a common thread among faults that
elicit specific test-taking behaviors? Why are cue-using strategies
used in some instances and not in others? Are the differential effects
noted due to characteristics of the cues themselves or the settings
in which they exist or, perhaps, some combination of these factors?
It is highly possible that the set to take advantage of item-
writing cues is dependent on particular characteristics of the test-
taking situation itself. Some evidence for this notion can be found
in Chase's study of sets to choose alternatives of differing lengths
(23). It will be remembered that Chase showed that not only did a set
to choose the longest option exist but, more important, it was possible
to extinguish such a set by manipulating the surrounding items in the
test.
There is also considerable evidence that response sets toward
the use of cue-strategies can be encouraged through training. Successful
implementation of such training has been reported for populations
varying from Callenbach's second-graders to the graduate testing
and measurement students included in Pyrczak's sample (41,45). For
more sophisticated populations with demonstrated proficiency in the art
of taking tests, it is reasonable to assume that training has taken
place by virtue of long experience and that most are quite cognizant
of the strategies that might be employed. Yet, even so, the application
of these strategies seems to be selective, used in one situation
and not another. Why? It may be that the inconsistencies observed by
researchers are in part indications of two different kinds of response
sets, one content-oriented and the other, secondary-cue dependent.
Perhaps when the circumstances of an examination emphasize the importance
of content knowledge, examinees tend to focus on item content
and generally disregard secondary cues, but in the non-evaluative, less
stressful testing situation of, for instance, taking an experimental
test in which content-irrelevant items are featured, a set to seek out
secondary cues is more likely to be evidenced.
Even with settings held constant, as was done in the present study,
it appears that cue-using strategies are employed in response to one
kind of fault and not to another. This circumstance cannot be explained
definitively on the basis of existing data, but several possible
suggestions can be made. For instance, in their study of the
correlates of test-wiseness, Diamond and Evans concluded that secondary-
cue-response behaviors are specific to particular cues (31). Certainly
the findings of differences and, even, effect-reversals among the
faults studied in the present research support such a conclusion. If
this is the case, then what are the characteristics of specific faults
that precipitate the application or disregard of cue strategy in a
given situation? It can be theorized that cue-using strategies fall
into two different categories: those which utilize partial knowledge
to deduce correct responses, as might be the case with an all-of-the-
above faulty item, and those which are independent of content elements
within the item and are used in response to faults of parallelism,
numerical sequencing, etc. Millman, Bishop and Ebel hint at such a
distinction in their discussion of the strategies of deductive reasoning
(19). It may also be that partial-knowledge cue-using strategies
are emphasized in testing situations when the examinee has a set toward
content achievement, and non-content oriented cues are used in situations
which encourage consideration of item patterns.
It should be noted, however, that these distinctions might not always be clearly defined. The extra-long keyed response flaw, for instance, could be classified as either type, depending on the characteristics of the particular item involved. In some instances this fault may be providing a visual stimulus that provokes a set to choose it over the other options, while at other times extra-long options may actually be providing additional information that can be used to advantage by a "partial-knowledge" tactician to identify the correct response when the examinee is not sure of the answer. It is possible that the preference for extreme-length options shown in the Chase investigations of relative-length response sets was in part due to a partial-knowledge effect. It can also be speculated that partial-knowledge effects may be operating in the findings of decreased difficulty for items with extra-long keyed responses which have been reported by McMorris et al. and by Dunn and Goldstein (24, 25). Furthermore, the negative findings regarding effects of this fault on item difficulty indicated in the Board and Whitney study may possibly be explained by the fact that extra-long and extra-short options were investigated and analyzed as a single set. The partial-knowledge factor would probably exist only for the extra-long alternatives.
Data in the present study would seem to lend credence to these speculations concerning the use of partial knowledge as a response strategy for certain kinds of faults. As noted, highly significant difficulty-effect differences were found in the all-of-the-above and extra-long item categories, indicating that faulty items were easier than fault-free versions. On the other hand, faulty items in the parallel and number categories were either more difficult than fault-free items or did not differ significantly from them. Consideration must be given to the possibility that partial-knowledge cues were operating in the solution of the first two kinds of items and that a more pattern- or format-oriented strategy was operating in the case of the last two. Overall, therefore, it would seem that the inconsistent findings with regard to item-fault effects might be attributed to both context and fault-type factors.
An additional factor that findings in the present study indicate should be of concern is the effect of manipulating items to correct faults. As mentioned previously, an unexpected reversal was found in the difficulty trend for number and parallel items. This reversal can probably best be explained as resulting from changes that were made in manipulating items to generate faulty and fault-free versions. These results suggest the importance of maintaining the integrity of alternative sets. There is a good likelihood that an experienced faculty member selects the most pertinent alternatives for a set of item options and that, in attempting to correct presumed item faults, test specialists may be inadvertently altering the item characteristics and possibly making them less powerful elements of a particular test. This explanation is somewhat confirmed by the finding in the research reported here that for faults generated without changing existing alternatives no effect differences were seen, but when such changes were made in manipulating the items, difficulty reversals appeared.
Recommendations
Based on the results of the present investigation, as well as consideration of the findings reported by other researchers, the need for further research to define more accurately the effects of faulty item-writing is evident. Among the many avenues of investigation open to future researchers, the following are particularly recommended:
1. Interrelationships between test purposes and response sets to
use cue strategies.
2. Classification of cue-using strategies.
3. Interactions between the type of cue-using strategy and examinees' response set as determined by test purpose.
4. Effects of changing alternatives to make an item fault-free.
5. Differences in cue-using strategies among populations of varying test-sophistication.
6. Effects of the use of partial-knowledge strategies on test
measurement.
7. Types of faults which are most likely to provide partial knowledge.
8. Further examination of specific authority-recommended item-
writing principles.
Some Final Thoughts
In summary, the inconsistencies with regard to item-fault effects seen in the present study, as well as in the research of other investigators, preclude any overall statements on the nature of these effects. In general, the results would seem to indicate that cue-using strategies, presumably employed when item flaws are present, are specific to particular faults, to the experiential background of the population involved, and to characteristics of the test and testing situation in which items are used. Further investigations into the interrelationships among these variables would be valuable in establishing guidelines for test-developers.
In the meantime, however, test specialists might want to reexamine the basic premises on which they typically act when "perfecting" teacher-made tests. There is a reasonable possibility that future data may indicate that compulsive manipulation of certain kinds of items is not only unnecessary but possibly damaging to test parameters of overall mean, reliability, and discrimination. For item faults in which partial knowledge is not a particularly important factor, the extraneous cues provided by the flaws may be of far less importance than the potential damage that may be caused by disturbing the integrity of existing alternatives.
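As a purely illustrative aid, the sketch below (Python, with invented dichotomous item scores rather than data from this study) shows how two of the parameters named above, the overall mean and KR-20 reliability, might be recomputed before and after an item has been "corrected," so that any damage from the manipulation would appear as a drop in these figures; the discrimination index could be checked analogously with item-total correlations.

    # A minimal sketch with invented 0/1 item scores, not data from this study:
    # rows are examinees, columns are items; 1 = correct, 0 = incorrect.
    def kr20(scores):
        """Kuder-Richardson formula 20 reliability for dichotomously scored items."""
        n = len(scores)                        # number of examinees
        k = len(scores[0])                     # number of items
        totals = [sum(row) for row in scores]
        mean_t = sum(totals) / n
        var_t = sum((t - mean_t) ** 2 for t in totals) / n
        p = [sum(row[j] for row in scores) / n for j in range(k)]  # item difficulties
        return (k / (k - 1)) * (1 - sum(pj * (1 - pj) for pj in p) / var_t)

    before = [[1, 1, 0, 1], [1, 0, 0, 1], [0, 1, 0, 0], [1, 1, 1, 1], [0, 0, 0, 1]]
    after  = [[1, 1, 1, 1], [1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 1, 1], [0, 0, 1, 1]]

    # With these invented data, "correcting" the third item raises the mean total
    # score from 2.2 to 3.0 but lowers KR-20 from about 0.55 to about 0.27.
    for label, matrix in (("before", before), ("after", after)):
        mean_score = sum(sum(row) for row in matrix) / len(matrix)
        print(label, round(mean_score, 2), round(kr20(matrix), 2))

The hypothetical numbers are chosen only to show that a change which makes a test easier on the average can, at the same time, reduce its internal-consistency reliability.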
Finally, the implications drawn here are not intended to indicate that item faults should never be corrected, but only that extreme caution should be exercised in the process of manipulation: as few changes as possible should be made in alternative sets provided by content-wise faculty members, and the probability that a cue-using strategy will actually be applied to a particular item fault should be assessed with full cognizance of such test factors as purpose, characteristics, and population.
REFERENCES
1. Cox, R. Examinations and Higher Education: A Review of the Literature. London: Society for Research in Higher Education, 1966.
2. Walton, H.J., and Drewery, J. The Objective Examination in the Evaluation of Medical Education. British Journal of Medical Education, 1:255-264, 1967.
3. Hubbard, J.P., and Clemans, W.V. Multiple-Choice Examinations in Medicine. Philadelphia: Lea and Febiger, 1961.
4. Miller, G.E., Abrahamson, S., Cohen, I.S., Graser, H.P., Harnack, R.S., and Land, A. Teaching and Learning in Medical School. Cambridge: Harvard University Press, 1961.
5. Charvat, J., McGuire, C., and Parsons, V. A Review of the Nature and Uses of Examinations in Medical Education. Public Health Papers #36. Geneva: World Health Organization, 1968.
6. Cowles, J.T., and Hubbard, J.P. Validity and Reliability of New Objective Tests. Journal of Medical Education, 29:30-34, 1954.
7. Murphee, H.B. Studies on the Reliability and Validity of Objective Examinations. Journal of Medical Education, 36:813-818, 1961.
8. Cowles, J.T., and Hubbard, J.P. A Comparative Study of Essay and Objective Examinations for Medical Students. Journal of Medical Education, Vol. 27, No. 3, Part 2:14-17, 1952.
9. Rosinski, E.F., and Hamilton, D.L. Examination As Part of a New Curriculum. Journal of Medical Education, 41:135-142, 1966.
10. Hoffman, B. The Tyranny of Testing. New York: Crowell-Collier, 1962.
11. Gibson, A.L. Second Thoughts on Multiple-Choice Examinations. British Journal of Medical Education, 3:143-150, 1969.
12. Adkins, D.C. Test Construction: Development and Interpretation of Achievement Tests. (Second Edition). Columbus: Charles E. Merrill, 1974.
13. Board, C., and Whitney, D.R. The Effect of Selected Poor Item-Writing Practices on Test Difficulty, Reliability and Validity. Journal of Educational Measurement, 9:225-232, 1972.
14. Budner, S. Intolerance of Ambiguity as a Personality Variable. Journal of Personality, 30:29-50, 1962.
15. McCarthy, W.H. An Assessment of the Influence of Cueing Items in Objective Examinations. Journal of Medical Education, 41:263-266, 1966.
16. Ebel, R.L. Essentials of Educational Measurement. (Second Edition). Englewood Cliffs: Prentice-Hall, 1972. Pp. 236-237.
17. Alker, H.A., Carlson, J.A., and Hermann, M.G. Multiple-Choice Questions and Student Characteristics. Journal of Educational Psychology, 60:231-243, 1969.
18. Erickson, M. Test Sophistication: An Important Consideration. Journal of Reading, 16:140-144, 1972.
19. Millman, J., Bishop, C.H., and Ebel, R. An Analysis of Test-wiseness. Educational and Psychological Measurement, 25:707-726, 1965.
20. Ebel, R.L., and Damrin, D.E. Tests and Examinations. Encyclopedia of Educational Research. (Third Edition). New York: Macmillan, 1960.
21. Lindquist, E.F. (Ed.) Educational Measurement. Washington, D.C.: American Council on Education, 1951.
22. Thorndike, R.L., and Hagen, E. Measurement and Evaluation in Psychology and Education. (Fourth Edition). New York: John Wiley and Sons, 1977.
23. Chase, C.I. Relative Length of Options and Response Set in Multiple-choice Items. Educational and Psychological Measurement, 24:861-866, 1964.
24. McMorris, R.F., Brown, J.A., Snyder, G.W., and Pruzek, R.M. Effects of Violating Item-Construction Principles. Journal of Educational Measurement, 9:287-295, 1972.
25. Dunn, T.F., and Goldstein, L.G. Test Difficulty, Validity and Reliability as Functions of Selected Multiple-choice Item Construction Principles. Educational and Psychological Measurement, 19:171-179, 1959.
26. Hughes, H.H., and Trimble, W.E. The Use of Complex Alternatives in Multiple-choice Items. Educational and Psychological Measurement, 25:117-126, 1965.
27. Woodley, K.K. Test-wiseness Program Development and Evaluation. Paper presented at Conference of American Educational Research Association: New Orleans, 1973.
28. Good, C.V. (Ed.) Dictionary of Education. (Third Edition). New York: McGraw-Hill, 1973.
29. Stanley, J.C., and Hopkins, K.D. Educational and Psychological Measurement and Evaluation. (Fifth Edition). Englewood Cliffs: Prentice-Hall, 1972.
30. Stanley, J.C. Reliability. In Educational Measurement (Second Edition). R.L. Thorndike (Ed.). Washington, D.C.: American Council on Education, 1971. Pp. 363-365.
31. Diamond, J.J., and Evans, W.J. An Investigation of the Cognitive Correlates of Test-wiseness. Journal of Educational Measurement, 9:145-150, 1972.
32. Rugg, H., and Colloton, C. Constancy of the Stanford Binet IQ as Shown by Retests. Journal of Educational Psychology, 12:315-322, 1921.
33. Levine, R.S., and Angoff, W.H. The Effects of Practice and Growth on Scores on the Scholastic Aptitude Test. Research and Development Report No. 58-6/SR-586. Princeton, New Jersey: Educational Testing Service, 1961.
34. Sanderson, P.H. The Art of Setting Multiple-Choice Questions. The Lancet, 2:1291-1294, 1969.
35. Slakter, M. Effect of Guessing Strategy on Objective Test Scores. Journal of Educational Measurement, 5:217-221, 1968.
36. Cronbach, L.J. Response Sets and Test Validity. Educational and Psychological Measurement, 6:475-494, 1946.
37. Cronbach, L.J. Further Evidence on Response Sets and Test Designs. Educational and Psychological Measurement, 10:3-31, 1950.
38. Moore, J.C., Schultz, R.E., and Baker, R.L. The Application of Self-Instructional Technique to Develop a Test-taking Strategy. American Educational Research Journal, 3:13-17, 1966.
39. Gibb, B.G. Test-wiseness as a Secondary Cue Response. (Doctoral Dissertation, Stanford University). Ann Arbor: University Microfilms, 1964. No. 64-7643.
40. Slakter, M.J., Koehler, R.A., and Hampton, S.H. Learning Test-wiseness by Programmed Tests. Journal of Educational Measurement, 7:247-254, 1970.
41. Callenbach, C. The Effects of Instruction and Practice in Content-Independent Test-taking Techniques Upon the Standardized Reading Test Scores of Selected Second-Grade Students. Journal of Educational Measurement, 10:25-29, 1973.
42. Wahlstrom, M., and Boersma, F.J. The Influence of Test-wiseness Upon Achievement. Educational and Psychological Measurement, 28:413-420, 1968.
43. Crehan, K.D., Koehler, R.A., and Slakter, M.J. Longitudinal Studies of Test-wiseness. Journal of Educational Measurement, 2:209-212, 1974.
44. Rowley, G.L. Which Examinees are Most Favored by the Use of Multiple-Choice Tests? Journal of Educational Measurement, 2:15-22, 1974.
45. Pyrczak, F. Use of Similarities Between Stem and Keyed Choices in Multiple-Choice Items. Paper presented at the Annual Meeting of the National Council on Measurement in Education: Washington, D.C., 1973.
46. Lieberman, M., and Djokovic, J. Test Construction. International Anesthesiology Clinics, 14:57-75, 1976.
47. Nedelsky, L. Science Teaching and Testing. New York: Harcourt, Brace and World, 1965.
48. Tinkelman, S.N. Checklist for Reviewing Local School Tests. Improving the Classroom Test. Albany, New York: Bureau of Examinations and Testing, University of the State of New York, 1957.
49. Millman, J., and Pauk, W. How to Take Tests. New York: McGraw-Hill, 1969.
50. Rippey, R.M. Rationale for Confidence-Scored Multiple-Choice Tests. Psychological Reports, 27:91-98, 1970.
51. Koehler, R.A. A Comparison of Validities of Conventional Choice Testing and Various Confidence Marking Procedures. Journal of Educational Measurement, 8:297-303, 1971.
52. Shuford, E.H., Jr., Albert, A., and Massengill, H.E. Admissible Probability Measurement Procedures. Psychometrika, 31:125-145, 1966.
53. Feldt, L.S. A Test of the Hypothesis that Cronbach's Alpha or Kuder-Richardson Coefficient Twenty is the Same for Two Tests. Psychometrika, 34:363-373, 1969.
APPENDIX A
CONFIDENCE-IN-RESPONSE (C-I-R)
INSTRUCTIONAL PACKET
As Dr. Delaney explained at an earlier class meeting, some of the items on the final examination in Pedo 534 will require a new way of responding to multiple-choice questions. This memo is to acquaint you with the procedures and to tell you a little about why we are trying the "confidence-in-response" approach.
We believe that by indicating a degree of confidence in each alternative, rather than simply selecting one best choice, the examinee will provide a better indication of his true knowledge. This will permit the examiner to give credit for partial knowledge and provide a more equitable evaluation of individual achievement.
To help you understand the procedures to be used on the examination, I have enclosed the instructions and scoring for a sample test using confidence responses. Our experience has shown that an honest indication of degree of confidence almost invariably maximizes student scores. However, in order to assure an advantage to each student participating, we will score each test by both regular and confidence procedures. Only the higher of the two scores will be recorded for an individual.
If you have any questions concerning the confidence-in-response procedures which you would like to have answered before the day of the examination, please feel free to call me at my office or at home.
Thanks so much for your cooperation.
CONFIDENCE-IN-RESPONSE
The items in this test require a special way of responding. Please read the following instructions carefully.
All the items are of standard format and have only one correct answer. As in all objective tests, you are asked to indicate the answer you believe to be most correct. On this test, however, you are also asked to distribute 100 points over the alternatives in a way that reflects your feelings as to the correctness of each alternative.
For example, if you feel alternative B is the correct answer, you should circle B. Then, if you are 100% sure that B is the answer, assign 100 points to alternative B and mark each of the other alternatives 0. If, however, you are only 80% sure of B as the answer and feel that alternatives A and C have an equal outside chance of being the correct answer, you may assign 80 points to B and 10 points each to alternatives A and C. Finally, if you feel that all alternatives (or two, three or four of them) have an equal probability of being the correct answer, circle your "guess" and then distribute the 100 points equally over the alternatives.
Your score on an item is simply a logarithmic function of the number of points (probability) you assign to the true correct alternative. In this manner, you will get credit for partial knowledge. It is important to recognize that you will obtain your maximum score only if you are honest about your degree of certainty. That is, the maximum score you are able to obtain on these items will be achieved if you distribute the 100 points in a manner truly reflecting your knowledge.
Please answer each of the following items by circling the one answer you believe to be most correct. Then distribute 100 points over all the alternatives by writing the number which represents your confidence in the correctness of that response on the line adjacent to each response choice.
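The instructions above do not state the exact logarithmic rule, but the conversion table reproduced at the end of this appendix (0 points = 0, 10 points = 10.0, 50 points = 16.9, 100 points = 20.0, and so on) is consistent with a per-item score of ten times the common logarithm of the points placed on the keyed alternative, truncated to one decimal place. The sketch below (Python, in modern notation) encodes that inferred rule; the constants are an assumption reconstructed from the table, not a formula stated by the author.

    import math

    def cir_item_score(points_on_key):
        """Per-item confidence-in-response score on a 20-point scale.

        Inferred from the appendix conversion table: 10 * log10(points placed on
        the keyed alternative), truncated to one decimal place; 0 points scores 0.
        This reconstruction is an assumption, not the author's stated formula.
        """
        if points_on_key <= 0:
            return 0.0
        return math.floor(100 * math.log10(points_on_key)) / 10

    # The inferred rule reproduces the published conversion table exactly.
    assert cir_item_score(100) == 20.0   # full confidence in the keyed answer
    assert cir_item_score(50) == 16.9
    assert cir_item_score(10) == 10.0
    assert cir_item_score(5) == 6.9
    assert cir_item_score(0) == 0.0

Under regular scoring each item is simply worth 20% when the keyed alternative is circled, which is why, in the sample scoring that follows, the same set of responses earns 60 under regular scoring but 84.3 under confidence scoring.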
Confidence-In-Response
Sample Scoring
1. An examiner must do a refraction at a 15-foot distance but has an acuity chart constructed for a 20-foot test distance. The overall linear size of the letters best meeting the criterion of a 20/20 letter in this situation would be
0 a 4.35 mm
80 b* 6.5 mm
20 c 8.7 mm
0 d 9.5 mm
2. In a sphero-cylindrical lens the meridian of minimum power is called
60 a* the primary meridian
20 b the principal meridian
10 c the axis of the lens
10 d the base curve of the lens
3. Your patient is fogged to 2-.50 and observes the clock dial. He reports the 1-7 line blackest and the 2-8 line next blackest. Your phoropter has plus cylinder. You would set the axis at
0 a 120°
100 b* 127°
0 c 135°
0 d 150°
4. The image of a point, in simple hyperopic with-the-rule astigmatism and accommodation completely relaxed, will be a
15 a vertical oval
70 b* horizontal oval
0 c vertical line
15 d horizontal line
5. Versions and vergences differ in that
50 a* vergences are yoked or conjugate movements while versions are non-parallel or disjunctive movements
50 b versions are yoked movements while vergences are non-parallel movements
0 c versions occur only in the presence of fusion
0 d only vergences have a torsional component
Scoring
Item    Correct answer    Regular score    C-I-R score
1.      b                 20               19.0
2.      c                 0                10.0
3.      b                 20               20.0
4.      b                 20               18.4
5.      b                 0                16.9
Total                     60               84.3
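As a check, the C-I-R column above can be reproduced from the points each sample response places on the keyed alternative, using the same rule inferred earlier in this appendix (again, an illustrative reconstruction rather than the author's stated procedure).

    import math

    # Continues the inferred scoring sketch from the instructions page:
    # score = 10 * log10(points on the keyed alternative), truncated to 0.1.
    def cir_item_score(points_on_key):
        return math.floor(100 * math.log10(points_on_key)) / 10 if points_on_key > 0 else 0.0

    # Points the five sample responses place on the keyed alternative (items 1-5).
    points_on_key = [80, 10, 100, 70, 50]
    cir_scores = [cir_item_score(p) for p in points_on_key]
    print(cir_scores)                 # [19.0, 10.0, 20.0, 18.4, 16.9]
    print(round(sum(cir_scores), 1))  # 84.3, versus 60 under regular scoring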
ORAL PHOTOGRAPHY
Sample Test
Confidence-in-Response Scoring
This test is designed to assess your knowledge of the principles and techniques associated with intra-oral photography. For each item circle the letter on the answer sheet which corresponds to the one answer you believe to be most correct. Then distribute 100 points over all the alternatives by writing the number which represents your confidence in the correctness of that response on the line adjacent to each response choice.
1. The best way to focus a bellows intra-oral system is:
________a) to move the focus ring on the lens
________b) to move the bellows
________c) to move the entire system
________d) to move the camera body
2. F values are numerical representations of:
________a) iris diameter
________b) amount of light transmitted by lens
________c) focal length
________d) all of the above
3. Maximum depth of field is obtained by focusing:
________a) one third of the way into a scene
________b) on front edge of a scene
________c) on back edge of a scene
________d) middle of a scene
4. Mirrors for intra-oral photography should be:
________a) back surfaced
________b) made of glass
________c) made of metal
________d) made of plastic
5. Using a mirror parallel with the teeth for a lingual mandibular view will:
________a) have no effect on the direction of the teeth in the frame
________b) result in teeth straight across the frame
________c) result in the teeth going uphill in the frame
________d) result in the teeth going downhill in the frame
Scoring the Sample Test
Key:
Item No. Correct Choice
1 b
2 d
3 a
4 c
5 c
Regular Scoring: 20% for each correct answer
Confidence-in-response Scoring: Listed below are the percent
scores to be assigned for various levels of confidence in the correct
response, regardless of whether or not the alternative has been circled
as best choice.
No. of points assigned % score
0 0
5 6.9
10 10.0
15 11.7
20 13.0
25 13.9
30 14.7
33 15.1
35 15.4
40 16.0
45 16.5
50 16.9
55 17.4
60 17.7
65 18.1
70 18.4
75 18.7
80 19.0
85 19.2
90 19.5
95 19.7
100 20.0
APPENDIX B
SAMPLE ANSWER SHEET
Optometry 343
Final Exam
Do not mark on the test. For each item circle the letter on the answer sheet which corresponds to the one answer you believe to be most correct. Then distribute 100 points over all the alternatives by writing the number which represents your confidence in the correctness of that response on the line adjacent to each response choice.
You may use slide rules or calculators if you like. Scratch paper is also provided. If you change an answer, be sure that your final answer is quite clear. Return both your answer sheet and the examination.
ANSWER SHEET
Final Examination
OPTOMETRY 343
Circle the preferred choice for each item, then indicate the percent of confidence for each response.
[The original answer sheet presents a response grid for items 1 through 28, arranged in four columns of seven items each; each item lists its response choices, a through d (a few items also include a choice e), with a blank beside each choice for the confidence points.]