This dissertation has been microfilmed exactly as received. 69-5067

SCHWARTZ, Donald James, 1934-
THE SUBJECTIVE UTILITY FUNCTION AS AN ESTIMATOR OF OPTIMAL TEST WEIGHTS FOR PERSONNEL SELECTION.
University of Southern California, Ph.D., 1968
Psychology, general
University Microfilms, Inc., Ann Arbor, Michigan

THE SUBJECTIVE UTILITY FUNCTION AS AN ESTIMATOR OF OPTIMAL TEST WEIGHTS FOR PERSONNEL SELECTION

by
Donald James Schwartz

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Psychology)
August 1968

UNIVERSITY OF SOUTHERN CALIFORNIA
THE GRADUATE SCHOOL
UNIVERSITY PARK
LOS ANGELES, CALIFORNIA 90007

This dissertation, written by Donald James Schwartz under the direction of his Dissertation Committee, and approved by all its members, has been presented to and accepted by The Graduate School, in partial fulfillment of requirements for the degree of DOCTOR OF PHILOSOPHY.

ACKNOWLEDGEMENTS

This study, perhaps more than most dissertations, would not have been possible were it not for the assistance and encouragement of many individuals. The writer wishes at this time to express his appreciation to all those who have contributed to the study.

First of all, the writer owes a special debt of gratitude to his advisor, Dr. Norman Cliff, whose guidance and patience were indispensable to the completion of this dissertation. The writer also wishes to thank the other members of his guidance committee, Professors Richard E. Beckwith, Daniel J. Davis, J. P. Guilford, and Robert Priest, whose advice and direction were exceedingly helpful.

This study was begun as a research project assignment, whose purpose was to apply the J-coefficient technique to structuring achievement tests, while the writer was employed by the Civil Service Commission of the County of Los Angeles. The writer wishes to thank the Commission (now known as the Personnel Department) for its support of this project and especially Mr. J. H. Rainwater, Jr., then Chief Deputy of the Commission, under whose leadership the project was organized and conducted, and Mr. Leon Garneau, Head of the Test Research and Development Section. The writer is also indebted to the Personnel Consultants of the County, Drs. William B. Michael and Harry Case, whose assistance to the project was invaluable, and to all of the County employees who assisted in this project. Most notably, these include Mrs. Shirley Wise, Messrs. Robert H. Tollefson, Walter Cunningham, Morris Meadow, and Elliot Marcus.

The study was completed under a personal research grant from Educational Testing Service. The writer wishes to thank ETS for its support of this project. The writer especially is grateful to Mrs. Mary Zullig and Mrs. Betty Clausen, who typed the manuscript. Finally, the writer wishes to express his appreciation to Mr. Barry Dayton, whose help in programming the study was especially valuable.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS .......... ii
LIST OF TABLES .......... v
LIST OF FIGURES .......... vi

CHAPTER
I. INTRODUCTION .......... 1
II. THEORETICAL BACKGROUND .......... 10
III. DEVELOPMENT OF THE MODEL .......... 34
IV. THE EXPERIMENTAL DESIGN .......... 49
V. THE EXPERIMENTAL RESULTS ..........
VI. CONCLUSIONS AND INTERPRETATIONS .......... 62
REFERENCES .......... 72
APPENDICES
APPENDIX I. Sample Questionnaire .......... 97
APPENDIX II. Sample Scored Questionnaire .......... 99
APPENDIX III. Rating Scale .......... 102

LIST OF TABLES

TABLE / PAGE
1. Siegel and Thurstone Utility Scales .......... 79
2. Proportion of Choices of Criterion Element Over Other Elements (Thurstone Model) .......... 80
3. Descriptive Statistics of the Criterion Element Ratings .......... 81
4. Descriptive Statistics of the Tests .......... 82
5. Comparison of Converted Regression Weights and Converted Correlations to the Subjective Utility Scale .......... 83
6. Relationship of Test Scores to Criterion Element Ratings .......... 84

LIST OF FIGURES

FIGURE / PAGE
1. Paradigm of the Decision-Theoretic Approach .......... 86
2. The Consequence Space .......... 87
3. A Hypothetical Payoff Matrix .......... 88
4. Schematic View of a Cronbach-Gleser Decision Process (1965, p. 18) .......... 89
5. Validity Matrix for Treatment t .......... 90
6. Lattice of Hypothetical Options in the Siegel Higher Ordered Metric Method .......... 91
7. Plot of Test Beta Weights and Subjective Utilities .......... 92
8. Plot of Correlations of Tests With Criterion and Subjective Utilities .......... 93
9. Plot of Beta Weights of Ratings and Subjective Utilities .......... 94
10. Plot of Correlations of Ratings With Criterion and Subjective Utilities .......... 95

CHAPTER I

INTRODUCTION

Psychology, like many other sciences, is often marked by a schizophrenic-like split between "theoretical" and "applied." On the one hand, thousands of dedicated research workers are spending long and tedious hours of study and experimentation trying to discover the fundamental truths of human behavior; these theoreticians are usually little concerned with the practical applications of their endeavors and are in fact often rather contemptuous of their brethren who have abandoned some of the principles of the scientific method in order to provide economical and utilitarian applications of the truths. The latter group is composed of workers who believe that the basic purpose of psychology is not the discovery of fundamental truths but rather the formulation of methods that will enable man to modify and control his behavior and thereby create a better world. They are, in their turn, often critical of the theoreticians who spend months and even years in search of a bit of knowledge the application of which will never begin to repay the cost of the discovery.

One of the areas in which this division is most apparent and perhaps most deleterious is that of psychological measurement. It is here that the application of psychometric principles to problems of personnel selection, educational placement, and evaluation of achievement becomes a critical problem. Take, for example, the subarea of personnel testing. This area developed as an offshoot of the intelligence testing movement that was initiated by Binet; as such, its primary theoretical orientation remained that of classical mental test theory.
This means that the central concepts of personnel testing are the same as those of psychometrics; namely, reliability and validity (Loevinger, 1957). Personnel psychology is, however, an applied branch of psychology in that its primary purpose is not the discovery of the fundamental psychometric principles underlying tests used for personnel selection but rather the application of these principles to the economic problems that arise from the selection of n individuals who will be employed for a specific job from a (hopefully) larger group of N individuals who have applied for the job. It is important to note here that at least some investigators feel that it is not necessary to know what the principles are as long as they can be applied in a satisfactory manner, just as it is unnecessary to know how electrons flow or what creates alternating current in order to use electricity to provide light.

It is also important to note here that the selection problem must now be considered as related to business economics. Reliability and validity are no longer the sine qua non of tests, but must be evaluated in the light of the cost of obtaining high reliabilities and validities and the benefits to be gained from increasing the validity and reliability of tests. This approach is most clearly defined by Cronbach and Gleser (1957, 1965) in their now famous application of decision theory to personnel selection. The basis of this approach is the formulation of a quantifiable concept of test utility, a concept that incorporates test validity as one of its elements but also includes other factors such as cost of testing, selection ratio, and expected payoff from the use of a test. Thus, if the cost of testing exceeds the expected payoff from using the test, no validity coefficient however high will justify the use of the test for selection; conversely, if the expected payoff-to-cost ratio is very high, a test of even comparatively low validity will be quite useful (Cronbach, 1966).

An analogous situation applies to the decision whether to conduct a validity study on a test. Traditional personnel selection practices, based solidly on classical test theory, demand the exclusive use of adequately validated tests, and hundreds of thousands of dollars are spent each year on the validation of tests used for personnel selection. The fact remains, however, that this massive effort does not begin to meet the needs of industry; in the Bureau of National Affairs survey of testing practices in 170 companies, only 60 percent of large companies and 48 percent of small companies reported that they carried out any validity studies at all (Lawshe & Balma, 1966). It seems obvious that the majority of companies engaging in personnel testing feel that the benefits of test validation are outweighed by the cost of such validation.

The dilemma, then, that faces the personnel psychologist is choosing between psychometric theory, which demands adequate test validation, and business economics, which may deem such validation uneconomical. There is no question that in many cases the problem does not exist, as it can be shown that validation is justified economically as well as psychometrically (Balma, 1955; Guilford, 1955). In many other cases, however, the problem is a real one and demands resolution. In evaluating the need for test validation, one must take into account the decisions that the validity information will assist in making.
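The cost-benefit logic sketched above can be expressed in miniature computational form. The figures and function below are hypothetical illustrations rather than Cronbach and Gleser's notation; the sketch only assumes that an expected payoff from using a test has already been estimated in dollar terms and compares it with the cost of testing.

```python
def worth_using(expected_payoff_from_test, cost_of_testing):
    """A test is justified only when its expected payoff exceeds its cost,
    no matter how high or low its validity coefficient may be."""
    return expected_payoff_from_test > cost_of_testing

# Hypothetical figures: a test of modest validity with a high payoff-to-cost
# ratio is worth using, while a highly valid but costly test may not be.
print(worth_using(expected_payoff_from_test=900.0, cost_of_testing=50.0))
print(worth_using(expected_payoff_from_test=300.0, cost_of_testing=1200.0))
```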
Let us assume that an individual is selecting one of N tests to be used as a selection device and that the cost of testing is constant regardless of which test is selected. In this case, he will choose the test with the highest validity (provided that its use is profitable), and the only information needed for this decision is the rank order of test validities and the knowledge of whether the gain in utility from using the test with the highest validity coefficient exceeds the cost of testing. If this information is available more cheaply elsewhere, test validation is unprofitable. Thus, if a personnel psychologist can intuitively identify the correct rank order of validity coefficients, the need for a validity study diminishes, even if he cannot accurately estimate the size of the coefficients.

When more than one test is to be used, however, the situation changes. In this case, the overall validity (and therefore the utility) of the test battery varies depending upon the selection strategy used. Although a given selection strategy may include many possible actions, such as multiple cutoff methods, prereject and preaccept methods, and linear and nonlinear combination methods, we shall restrict our consideration to one type of strategy: the linear combination method. In this approach, a matrix of test scores X is postmultiplied by a vector of test weights W. The overall validity of the test battery is the correlation of the weighted test scores X_w = XW with a criterion; this validity coefficient can obviously vary considerably, although between limits, depending upon the test weights assigned. Usually, these weights are determined in such a manner as to maximize the overall validity of the battery, and knowledge of individual test validities is essential to this purpose. Thus, when more than one test is being used and the test scores are weighted in such a manner as to maximize overall validity, the individual test validities must be known to a level beyond that of an ordinal scale.

It seems likely that if the information necessary for determining test weights that will maximize validity can be derived from sources other than validity studies, the need for such studies would be reduced. And, indeed, several investigators have found that simplified methods of determining test weights are almost as predictive as methods utilizing validity data (Boyce, 1955; Lawshe & Schucker, 1959; Trattner, 1963). Most of these methods have, however, been based on the "practical" approach to the problem, and have generally avoided the question of providing a sound theoretical basis for the model.

The most tenable of the proposed methods of predetermining test weights is the J-coefficient approach developed by the Federal Civil Service Commission (Primoff, 1955a, 1955b, 1959). This approach has its theoretical basis in the synthetic validity concept, developed by Lawshe (1952). Essentially, synthetic validity consists of the analysis of a given job into its component parts, the determination of the importance of each component part to success on the job, the determination of the validity of a given test for each component of the job, and the synthesis of the two values into the overall validity of that test for the job. The J-coefficient technique uses job elements (determined by job analysis) as the intervening components, and defines as its estimate of validity:
J_tp = Σ_{i=1}^{n} β_ti · r_pi

where
J_tp = estimate of validity of test T for job P
β_ti = beta weight of test T with job element i
r_pi = correlation of element i with proficiency on job P.

Operationally, both β_ti and r_pi must be estimated by scaling judgments. The beta weights are estimated directly by a number of test experts familiar with the test. The r_pi are determined by measuring the "proportion of saturation" p_i of the job with the element. These "proportions of saturation" are established by having persons familiar with the job rate it on a 3-point scale with respect to each of the job elements in question. The ratings are made in response to four questions:

1. Is the element necessary for even a barely acceptable worker?
2. Is trouble likely to occur if the element is ignored in the examination?
3. Is the element related to one's being a superior rather than an ordinary worker?
4. Is it practical to expect ability in the element in the labor market?

The responses are summed, converted to a scale of 0 to 1, and multiplied by the estimated beta weights to provide initial estimates of J-coefficients. In practice, the process is an iterative one, in which the results of validity studies are used to revise the initial J-coefficients to provide new coefficients; in this manner, the ideal test weights are approached through successive approximation.

This method has apparently had some success as a test weighting technique. Trattner (1963) found that the J-coefficient approach developed test weights that were as adequate for predictive purposes as weights developed by the Wherry-Gaylord Gross Score Method and by the General Blue Collar Test Battery Method, both of which are related to selecting tests by use of test intercorrelations and correlations with a criterion. Unfortunately, the obtained validity coefficients by all three methods were all below .50, with the majority in the .20 to .40 range; this is hardly a degree of validity with which one could be satisfied.

The J-coefficient has, in the writer's opinion, three gross deficiencies that severely limit its applicability. First, there has been no demonstrated relationship between job elements and any measure of validity; the fact that certain elements may be important in the performance of a particular job does not mean per se that testing for this element will improve the validity of selection. It is necessary to demonstrate conclusively that this is the case or to select a different element, one which will be related directly to validity.

Second, it has not been demonstrated that the raters are knowledgeable enough to estimate the "proportion of saturation" of any element in a given job. The four questions utilized to obtain this value all refer to the area of testing for the given element, and would be extremely difficult for even a trained psychometrician to answer, let alone job supervisors; it is a reasonable hypothesis that the raters are responding to some overall index of worth (as subjectively perceived) of the element toward success on the job. There is some supportive evidence for this; in an unpublished J-coefficient study of six jobs involving probation officer positions, extremely high intercorrelations, all exceeding .85, were found among the responses to questions. The indication is that a common factor was underlying all of the questions.
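A minimal computational sketch of the estimate described above may make its mechanics concrete. The job-element names, the ratings, and the 0-to-1 rescaling convention below are hypothetical stand-ins (Primoff's operational scoring rules are more detailed than this); the sketch only illustrates the form J_tp = Σ_i β_ti · r_pi with judged beta weights and question-based saturation values.

```python
def proportion_of_saturation(ratings, points_per_question=2):
    """Sum the four question ratings and rescale the total to the 0-1 interval.
    The per-question point range is an assumption made for illustration."""
    return sum(ratings) / (len(ratings) * points_per_question)

def j_coefficient(judged_betas, saturations):
    """J_tp = sum over job elements i of (judged beta_ti) x (saturation of element i)."""
    return sum(beta * s for beta, s in zip(judged_betas, saturations))

# Hypothetical job elements: arithmetic reasoning, verbal comprehension, manual dexterity.
# Each tuple holds the ratings on the four questions (0-2 here) for one element.
element_ratings = [(2, 1, 2, 1), (2, 2, 1, 2), (1, 0, 1, 1)]
saturations = [proportion_of_saturation(r) for r in element_ratings]

judged_betas_for_test_T = [0.6, 0.3, 0.1]   # experts' direct estimates of beta_ti
print(round(j_coefficient(judged_betas_for_test_T, saturations), 3))
```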
Further, even if the possibility of the raters being qualified to judge the values is accepted, there is no evidence that the subjective "proportions of saturation" are equivalent to the objective r_pi's which, according to the mathematical development of the technique, are the real estimators of validity. It is therefore necessary to select as a basis of obtaining ratings a function which pertains directly to the knowledge and experience of the raters and for which a theoretical and empirical relationship can be determined between the subjective values in the world of the raters and the objective values that are intrinsic in the tests to be developed.

Third, the J-coefficient process really does not tell us much more about scientifically structuring tests than we knew before. The net result of this technique is a finite number of tests for each of which we have a J-coefficient. If the idea is to maximize the predictive validity of the examination, every test with a positive J-coefficient would have to be included in the examination, as the addition of every positive J would increase the overall validity coefficient, and we have created a "sorcerer's apprentice." Since this is obviously unsuitable, the problem of restricting the examination to the tests with the largest J's becomes once again a subjective judgment of the test constructor. To meet this problem, we would have to develop a technique that would provide adequate criteria for including some tests and not others and would "shut itself off" when further testing becomes unprofitable.

Although these problems are serious ones, the J-coefficient approach has, in the opinion of the writer, considerable merit. The basic concept of seeking an intersection of job scaling and test validity is too valuable to surrender without attempting to resolve the difficulties described. It is the writer's opinion that such an attempt is now possible through a reexamination of the J-coefficient approach in the framework of decision theory. Through this orientation, which has been applied only recently to problems of psychological measurement, it may be possible to arrive at a method of assigning a priori test weights that will be economically feasible and yet theoretically sound.

The purpose of this dissertation, then, is to reevaluate the J-coefficient approach in the decision-theoretic framework; to develop from this framework a revised method of developing a priori test weights; and to test experimentally such a method in a personnel selection problem. Prior to the development of this model, however, it will be necessary to examine the decision-theoretic approach to personnel testing; this discussion will be the subject of the next chapter.

CHAPTER II

THEORETICAL BACKGROUND

One of the most fruitful and interesting of the recent developments in psychometrics is the application of decision theory to psychological testing. This approach, originally designed to provide a more scientific basis for the practical use of tests in decision-making, has been attracting increasing support from test specialists in both the theoretical and applied ends of the spectrum (Curtis, 1966; Gregson, 1964; Mahoney & England, 1965). The basis of its popularity seems, indeed, to be its remarkable flexibility in relating the theoretical to the applied; if so, this approach should provide an excellent framework for the hypotheses of this dissertation.
Statistical decision theory was originally devised by Wald (1950) as a (supposedly) more feasible means of hypothesis testing in the areas of business and economics. Its basic premises have been delineated by Chernoff and Moses (1959); Luce and Raiffa (1957); Raiffa and Schlaifer (1961); and Savage (1954). The presentation here is based primarily on the work of Raiffa and Schlaifer.

The initial ingredient in the analysis of a decision problem using this approach is the action space (A); this may be defined as the set of all possible actions a decision-maker may take in solving the problem under question. In business problems, the action space may include elements such as "buy," "sell," or "wait." Associated with the action space is a parameter space (P). This may be defined as the set of all possible states of the world at the time of the action. Every element of the action space will result in a set of possible outcomes, depending on which of the possible states of nature is the true one; that is, the action space will act through the parameter space to create a consequence space (C), which is the set of all possible outcomes of all possible actions.

In the vast majority of cases, the parameter space is unknown; therefore, a fourth space, the sample space (S), is generated whose sole purpose is to induce a probability distribution over the parameter space. The sample space may be defined as all possible outcomes of a given experiment or random process. A series of decision rules may then be formulated, each rule linking each element of the sample space with one element of the action space. The experiment is then performed, one (and only one) element of the sample space is observed, the decision rule is selected, and the prescribed action is taken. (See Figure 1.)

So far, there has been no real divergence from classical hypothesis-testing. Here, the elements of the parameter space would be "the null hypothesis (H_0) is correct" and "the research hypothesis (H_1) is correct"; the sample space would be the experimental outcomes; the action space would consist of two elements, "reject the null hypothesis" and "do not reject the null hypothesis"; and the consequence space would be the Type I and Type II errors as well as the correct responses. The decision rule function would be based on the probability that the observed element of the sample space can occur given the existence of one of the elements of the parameter space. The elements of the parameter space themselves have a probability of occurrence of either zero or one. Before the experiment is run, the probability that the null hypothesis is correct is arbitrarily set at one and the probability that the research hypothesis is correct is set at zero. This assumption is reversed only if the sample space provides overwhelming evidence that this is not the case; that is, given element H_0 of the parameter space, the probability of the observed sample element is less than some small number. Further, this "small number" is arbitrarily selected and there is no provision for more than two elements of the action space; the null hypothesis can either be rejected or not rejected. Thus, while classical hypothesis-testing can be incorporated in the decision theory model, it places considerable and perhaps undue limitations on the model.

The complete decision model provides considerably more freedom.
First, the a priori probability distribution over the parameter space is not restricted to zero or one; its elements can be assigned varying probabilities based on previous experiments, expert judgments, or even hunches. Further, the results of the experiment can be used to revise the a priori probabilities by Bayesian methods. Second, there can be as many different elements of the action space as there are elements of the sample space. One can, for instance, define as actions "accept the research hypothesis," "accept the null hypothesis," "collect more data," or "drop the subject." This blessing, however, is not without qualification; while it provides for much greater flexibility, it also requires the specification of criteria for decision rules. These will be discussed later. Finally, decision theory provides for a numerical function over the consequence space. This function is a lexicographic ordering of the value or worth of the outcomes and is called the utility function of the outcomes; it is in fact the key concept of the decision-theory approach.

The utility function can perhaps be best illustrated by reference to the classical hypothesis testing paradigm described above. In this case, the action space consists of two elements: accept the null hypothesis (a_1) and reject the null hypothesis (a_2). The parameter space also consists of two elements: the null hypothesis is correct (s_1) and the null hypothesis is incorrect (s_2). The consequence space (Figure 2), then, contains four elements: accept the null hypothesis when it is in fact correct (c_11); accept the null hypothesis when it is in fact incorrect (c_12); reject the null hypothesis when it is in fact correct (c_21); and reject the null hypothesis when it is in fact incorrect (c_22). Here c_11 and c_22 represent correct actions based on the states of nature, and c_21 and c_12 represent Type I and Type II errors, respectively. The utility function represents the gain from correct actions and the loss from incorrect actions and is sometimes called a payoff matrix. A hypothetical payoff matrix in this context is shown in Figure 3. It may be noted that the gains from correctly accepting the null hypothesis and from correctly rejecting the null hypothesis need not be equal; nor do the losses associated with Type I and Type II errors need to be equal.

Utility Theory

The concept of utility is, of course, not dependent on statistical decision theory and in fact predates it by two centuries. The first definition of utility was provided by Jeremy Bentham (1789); he interpreted the statement "the greatest good for the greatest number" as meaning the maximization of utility and attempted to formulate a program to measure utility. Following Bentham, the development of utility theory followed two divergent lines: that of ordinal utility functions and that of cardinal utility functions.

Ordinal utility functions were first developed by Pareto (1907) and have their main application in the area of consumer preference behavior. On this approach, the only assumption concerning the utilities of commodities in a set is that of order. That is, the two necessary relations to generate an ordinal scale are those of preference (xPy) and indifference (xIy). Ordinal utility functions are unique only to an increasing monotonic transformation, a fact which tends to limit their usefulness in psychology, although apparently not in economics. A more fruitful area of exploration from the psychological viewpoint is that of cardinal utility functions.
This approach, which was developed originally in economics by Jevons, Walras, Marshall, and others (Luce & Suppes, 1965), is based on the assumption that the utility of a commodity bundle is the sum of the utilities of the individual components:

u(x_1, x_2, ..., x_n) = u_1(x_1) + u_2(x_2) + ... + u_n(x_n).

A cardinal utility scale is unique to at least a linear transformation, which implies rather strong assumptions need to be met to guarantee additivity; the specific assumptions vary according to the utility system selected.

The development of cardinal utility functions permits a more specific definition of decision rules in the decision theory model. The most generally used basis for this definition is the expected utility hypothesis, first formulated by Bernoulli (1738). This hypothesis can perhaps be best illustrated in light of the decision-theory model presented previously. In the model, the consequence space consists of a set of outcomes (c_11, c_12, ..., c_mn) and the parameter space consists of a set of possible states of the world (s_1, s_2, ..., s_n). For each outcome c_ij there is an associated utility u_cij, and for each state of the world s_i there is an associated probability p_si that this state exists at the time of action. If we restrict consideration to one element of the action space, a_1, we reduce the consequence space to the subset (c_11, c_12, ..., c_1n). The expected utility of a_1 is now defined as:

E(a_1) = u_c11 · p_s1 + u_c12 · p_s2 + ... + u_c1n · p_sn = Σ_{i=1}^{n} u_c1i · p_si.

The expected utility hypothesis is in effect a decision rule which states that the action with the highest expected utility is the one that should be taken; that is to say, the decision-maker should act to maximize expected utility.

Maximization of expected utility is by far the most popular and useful decision rule devised. This is not to say that it is the only decision rule utilized; indeed, many others have been formulated (Thrall, 1954). There is, for example, the "minimax" principle (Savage, 1951), in which the action that is chosen is the one that will minimize the maximum possible loss from a wrong decision; this is a highly conservative rule, one that is more appropriate in two-person games against shrewd and calculating opponents than in the more common decision problem in which one's opponent is the law of chance. For the application of decision theory to personnel testing, however, the principle of maximization of expected utility seems most appropriate.

Axiom System for Utility

The value of any theoretical construct depends, of course, on the development of appropriate axiom systems. These systems define the relationships between the components of the construct and thereby the type of scale that must be used to measure the construct; determine the relationships between the theoretical construct and observable phenomena; and establish testable relationships among the observable phenomena that will (hopefully) verify the existence of the construct, or at least illustrate its explanatory value.

Utility theory is blessed, or cursed, with a number of axiom systems. One can, using the same observable preference data, arrive at ordinal, interval, or ratio scales of utility by stipulating different axiom systems to relate preferences to utility. These systems may involve strictly algebraic relationships between preference behavior and utility, or they may define probability distributions over preferences or over utilities or both.
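The expected-utility decision rule introduced above, and its contrast with the minimax rule, can be put in a small computational sketch. All of the states, actions, utilities, and probabilities below are invented for the illustration; the sketch simply evaluates E(a) = Σ_i u_c_i · p_s_i for each action and picks the action with the largest value.

```python
# Sketch of the expected-utility decision rule: for each action a, compute
# E(a) = sum over states s of u(consequence of a under s) * P(s), and choose
# the action with the largest expected utility.  All figures are hypothetical.

state_probs = {"null_true": 0.3, "null_false": 0.7}

payoffs = {  # utility of each consequence c_ij (an action taken under a state of the world)
    "accept_null": {"null_true": 1.0, "null_false": -1.0},
    "reject_null": {"null_true": -4.0, "null_false": 6.0},
}

def expected_utility(action):
    return sum(payoffs[action][state] * p for state, p in state_probs.items())

best_eu = max(payoffs, key=expected_utility)                          # maximize expected utility
best_minimax = max(payoffs, key=lambda a: min(payoffs[a].values()))   # minimax: best worst case

print({a: round(expected_utility(a), 2) for a in payoffs})
print("expected-utility choice:", best_eu, "| minimax choice:", best_minimax)
```

With these hypothetical numbers the two rules pick different actions, which is the point of the contrast drawn in the text: the minimax rule guards against the worst outcome, while the expected-utility rule weighs outcomes by their probabilities.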
This situation is even more pronounced when one considers psychological theories of utility; Abelson (1964), for example, counted nine major axiom systems in the simplest form of psychological utility theory, which excludes uncertain events from consideration. The natural result of this overabundance of axiom systems has been a state of chaos in the measurement and application of utility theory. A person seeking to measure utility would soon become aware of the fact that his scale, and thereby his experimental design, would be affected by the axiom system he selects. He may then turn to one of two directions: he may either suspend the experiment until the theoretical ramifications are worked out or he may choose one utility theory and axiom system and cling to it for dear life. The former solution leads to stagnation in an important area of research, while the latter solution leads to rigidity. This situation has, probably more than any other factor, acted to limit the practical application of utility theory.

There is, however, a ray of hope on the horizon. Luce and Suppes (1965) have recently developed a classification system for utility theories that relates the various axiom systems in such a way as to permit meaningful choices between the various systems. The basis of the system is three dichotomous divisions which provide for complete and mutually exclusive assignments of utility theories to one of eight categories. The first distinction is between algebraic and probabilistic theories. This refers to the probability of response; if this probability is restricted to 0 or 1, the theory is an algebraic one. Otherwise, it is a probabilistic one. Although it appears that algebraic theories are merely special cases of probabilistic theories, the two types really have very different origins and bases. Algebraic theories were studied first in the fields of economics and statistics; probabilistic theories developed later in psychology, primarily as a result of the failure of the algebraic theories to explain empirical data on choice behavior.

The second distinction is between certain and uncertain outcomes. This refers to the probability of event a occurring if a is chosen in response to a certain stimulus. If the stimulus and response totally determine the outcome, the theory has certain outcomes; if there is in addition a probability mechanism governing the outcomes, the theory has uncertain outcomes.

The third distinction is between choice theories and ranking theories. This refers to whether the subject is asked to select one of several outcomes or whether he is asked to rank order the outcomes in terms of preference. This distinction is relevant only to probabilistic theories, as in algebraic theories the ranking process may be considered to be a series of choice processes where the subject is asked to select from a set of rankings the one that matches his preference.

The choice of an axiom system for this study was made on the basis of three requirements. First, there had to be previously determined methods of measuring utility in both the subjective and objective aspects. Second, there had to be previous research relating the utility theory to test validity and to preference behavior. Finally, of the systems meeting the first two requirements, the one with the simplest theoretical orientation was to be chosen.
This is in agreement with the general principle that the object of theory is to provide explanation for facts or observations using the fewest number of principles or assumptions (Underwood, 1957).

The axiom system selected was the one of von Neumann and Morgenstern (1944, 1947). This system was used by Cronbach and Gleser (1957, 1965) to provide a general decision theory of psychological tests and has been the subject of many experimental procedures for measuring utility. Further, it was the earliest axiomatization of utility involving numerical probability and is still the simplest of the theories that meet the requirements of the expected utility hypothesis. In the Luce and Suppes classification system, it is an algebraic choice theory involving uncertain outcomes.

The basis of this axiom system is the definition, given two alternatives x and y, of a new alternative consisting of x occurring with a probability of α and y occurring with a probability of 1 − α. This alternative is called the α mixture of x and y (xαy). If we define P as preference, R as weak preference, and I as indifference, the axioms that satisfy the existence of a von Neumann-Morgenstern system of utility are as follows:

1. R is a weak ordering of A;
2. xαy is in A;
3. xαy = y(1 − α)x;
4. (xαy)βy = x(αβ)y;
5. if xIy, then xαz I yαz;
6. if xPy, then xR(xαy) and (xαy)Ry;
7. if xPy and yPz, there is an α in (0,1) such that yP(xαz);
8. if xPy and yPz, there is an α in (0,1) such that (xαz)Py.

The satisfaction of these axioms guarantees the existence of a numerical utility function u defined on A such that for every x, y, and α:

1. xRy if and only if u(x) ≥ u(y); and
2. u(xαy) = αu(x) + (1 − α)u(y).

Moreover, if u' is any other function satisfying these two conditions, u' is related to u by a positive linear transformation.

Application of Decision Theory to Personnel Testing

The first complete statement of the personnel selection and classification problem in the terms of decision theory was by Cronbach and Gleser (1957). Previously, however, certain elements of the approach were incorporated into psychometric theory and test technology. Taylor and Russell (1939) demonstrated that the value of a test varies with the particular decision to be made using the test and that considerable benefit often accrued from using tests of low validity. The value of a test was defined as the increase in the proportion of employees that are satisfactory after the test was introduced in the selection program, and is dependent on the a priori probability of success, the selection ratio, and the validity of the test. Brogden (1946, 1949a, 1949b) introduced a utility scale for evaluating decisions in order to provide a better means of interpreting validity coefficients; he concluded that the gain from using a test is directly related to test validity regardless of the selection ratio. Other investigators contributing to the development of a decision-theoretic model of personnel testing are Richardson (1944) and Brogden and Taylor (1950).

Cronbach and Gleser's basic premise is that the basic purpose of psychological testing is not measurement, as in traditional test theory (Gulliksen, 1950; Hull, 1928), but rather to provide information for decision-making.
These decisions may take the form of hiring or rejecting individuals (selection problems); assigning individuals to different jobs or training programs on the basis of one test score (placement problems); or assigning individuals to different treatments on the basis of several categories of information (classification problems). The personnel decision process formulated by Cronbach and Gleser (see Figure 4) is quite similar to the basic decision theory model discussed earlier:

"There is, in the first place, an individual about whom a decision is required, and two or more treatments to which he may be assigned. The decision is to be made on the basis of information about the individual.

"The information is processed by some principle of interpretation, or strategy, which leads to either a terminal decision or an investigatory decision. A terminal decision ends the decision-making process by assigning the individual finally to a treatment. The outcome is his performance under that treatment. An investigatory decision calls for additional information, dictating what test or procedures will be used to gather that information. This then leads to a further decision. The cycle of investigatory decision, information gathering, and decision making continues until a terminal decision is made [1965, p. 18]."

It is readily apparent that the Cronbach-Gleser decision-theory model is basically an extension of the general decision model. The treatments represent the action space, the outcome the consequence space, the information represents either the parameter space or the sample space, and the strategy represents the decision function. The major difference appears to be the cyclical nature of the investigatory decision, information gathering, and decision making process in the Cronbach-Gleser model. This can be resolved by simply regarding the decision to continue investigating as a treatment; the model then becomes a linear one, with one element of the action space (Treatment D) becoming "obtain more information" or "revise the a priori probabilities of the parameter space and obtain more information."

Another difference between the two models is in the determination of probabilities. In the general decision model, the probability function is assigned over the parameter space and, in this context, represents the probability that the information obtained represents the true state of the world. In the Cronbach-Gleser model, the probability function is assigned directly to the decision rule or strategy. A strategy, then, consists of a set of conditional probabilities which represent the probability of each decision, given certain information about the individual. These probabilities need not be restricted to 1 or 0, although in the ideal case they would be. Any deviation from 1 or 0 indicates either the presence of other factors in the decision process or the operation of a random variable. This is a rather sticky problem to handle since it is difficult to conceive of a decision-maker using a probability mechanism to make his decisions. Cronbach and Gleser attempt to resolve this by explaining that they are describing decisions that are made rather than formulating rules for making decisions. In classical decision theory, a random process is often used to generate the sample space (e.g., "when an applicant appears, flip a coin. If heads, hire him.
If tails, reject him-"), and the binary decision rules multiply the random process outcomes to produce a set of "action probabilities" which correspond to conditional probabilities of the Cronbach-Gleser strategies. Thus, the distinction is, in all probability, academic. The basic product of the process, a set of conditional expected utilities, remains the same. The outcome in the Cronbach-Gleser formulation is identical to the consequence space of classical decision theory. It is described in terms of a criterion or set of criteria of performance. The relation ship between the information and the criterion may be empirically determined from previous results and expressed as a matrix of validity coefficients. This validity matrix (see Figure 5) consists of a set of conditional probabilities of a criterion state given on information category and a treatment, and may be used to predict outcome in future decision problems• The concepts of validity and criteria, as related to this dissertation, will be discussed further in Chapter III. 2k The Utility Function of Personnel Decisions There is, of course, a utility function defined over the outcomes. This utility function, called the value matrix or payoff, is expressed on an interval scale and is defined in terms of the von Neumann and Morgenstern axiom system for utility. The expected utility for a strategy is determined only for a large number of decisions and not for individual decisions. The general formula for expected net utility is U = N Z P 2 P. | E P i e - N L P C y y t c clyt c y y y U = utility of the set of decisions N = the number of persons involved in the set of decisions y = information category t = treatment c = outcome e^ = utility of outcome C = cost of gathering information. y Here, P_^ represents the probability distribution of y , -^t|y represents the strategy matrix, and Pc|yt represents the validity' matrix. This expression involves no assumptions other than those of the expected utility hypothesis; namely, that the expected utility for a large number of decisions is determined by summing the expected payoff for each score times the probability of that score and that the pre ferred strategy is.that which maximizes the expected utility. It is therefore generally applicable to all personnel decision problems. 25 Since we are concerned with one type of decision problem--that of selection— and primarily with the relationship of the utility function and test weights or validities, it would be helpful to impose some restrictive assumptions on the general utility function. These assumptions are as follows: 1. Selection decisions are made on a large population consisting of individuals who have been previously screened by any currently used procedure• 2. The decision for any individual i is binary: accept (t ) “ “ Si or reject (t-^) ■ 5- Every individual has a test score Y. j the distribution of i test scores has a mean of 0 and standard deviation of 1- U• For every individual there is a payoff e., which results a when the person is selected and a payoff e., which results when the 1 b person is rejected, e.. regresses linearly on y , while e., is a b unrelated to Y and may be set equal to zero- 5- The average cost of testing an individual with test Y is C , where C > 0 . y Y 6. The decision rule or strategy is to accept the n individuals with the highest test scores. 
A cutoff Y' will be determined on the Y continuum such that the desired proportion φ(Y') = n/N falls above Y'. The probability of acceptance for individuals with Y > Y' is 1.00; otherwise it is 0. This is the optimal strategy for fixed-quota selection (Cochran, 1951).

With these assumptions, the expression for net gain in utility per man selected can be simplified to

(net gain per man selected) = σ_e · r_ye · ξ(Y') / φ(Y') − C_Y / φ(Y')

where C_Y is the average cost of testing one person, r_ye is the test-criterion correlation in the a priori population, σ_e is the standard deviation of payoff, Y' is the cutoff on the test, and ξ(Y') is the ordinate of the normal curve at that point (Cronbach & Gleser, 1965, p. 37).

The most important result of the formulation, for the purposes of this dissertation, is that utility is a linear function of validity (Brogden, 1946, 1949a, 1949b). In fact, if we assume a cost of zero, utility is proportional to validity. If, therefore, we can determine the utility of a test used for selection we have, in effect, determined its validity to at least a linear transformation. The beta weights of the tests which will maximize the validity of the test battery are functions of the individual test validities and the intercorrelations between tests. If the tests are uncorrelated, then, the optimal weights will be equal to the test validities. It follows, therefore, that under the above conditions the optimal test weights are also equivalent to the rational utilities of the tests within a linear transformation. Thus, the utility function over the tests will provide all the information needed to determine the optimal test weights.

Subjective Expected Utility

It may be pointed out that, regardless of the merits of the utility function, we have not really provided much of a solution for the problem of assigning test weights. We still need to determine the validity matrix of the tests and, in addition, we have added the requirement for a payoff matrix that is difficult to determine with any degree of accuracy. It is apparent that the computation of utilities is even more difficult and expensive than that of validities. There is, however, one aspect of utility theory that holds promise; this is the postulated existence of a subjective utility function.

The concept of subjective expected utility is perhaps one of the most important ways in which the von Neumann-Morgenstern axiomatization of utility was extended (Luce & Suppes, 1965). Its theoretical foundations and ramifications have been thoroughly explored by Savage (1954) and Edwards (1962). Basically, the same model as rational utility can be used to define subjective expected utility. The only basic difference lies in the purpose of the model; while the former is designed to enable a decision-maker to take the best possible action in a given decision problem, subjective utility models are designed to explain why individuals make the decisions that they do. The subjective utility scale, then, is a set of values over possible outcomes, unique to every individual, that acts as an intervening variable to determine behavior. The individual need not be consciously aware of this process; it is theoretically sufficient to demonstrate that he acts to maximize subjective expected utility.

The expected utility hypothesis requires, in addition to the utility scale, a probability distribution over the possible states of the world; this probability distribution serves as a multiplier of utility to determine expected utility, which is the component that is maximized.
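Before turning further to subjective utility, a brief numerical sketch of the simplified per-selectee gain expression given earlier in this chapter may help fix ideas. The payoff standard deviation, validity, selection ratio, and testing cost below are hypothetical figures, and the normal-curve ordinate is computed directly rather than read from a table as it would have been in 1968.

```python
# Sketch of the simplified net gain per man selected stated above:
#   gain = sigma_e * r_ye * xi(Y') / phi(Y')  -  C_Y / phi(Y')
# where phi(Y') is the proportion selected and xi(Y') is the normal-curve
# ordinate at the cutoff.  All numeric inputs are hypothetical.
import math

def normal_ordinate(z):
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def gain_per_selectee(sigma_e, r_ye, cutoff_z, selection_ratio, cost_per_applicant):
    xi = normal_ordinate(cutoff_z)
    return (sigma_e * r_ye * xi - cost_per_applicant) / selection_ratio

# E.g., payoff SD of 1000, validity .35, top 30 percent selected
# (cutoff of roughly z = 0.52), and a testing cost of 5 per applicant.
print(round(gain_per_selectee(1000.0, 0.35, 0.52, 0.30, 5.0), 2))
```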
Subjective utility also requires such a function; this subjective probability distribution is hypothesized to exist within the individual and to represent his (subjective) view of the probability of various outcomes. It is not necessarily dependent on empirical data gathering (although such evidence may be useful in revising previous expectations) and certainly need not be veridical to objective probability.

A major question of the subjective utility model (the term "model" is used generically here; there are of course several subjective utility models, just as there are several rational utility models) is whether subjective utility can be measured. Obviously, if it cannot be measured the model cannot be verified. The problem is a complex one, as there are two theoretically independent variables involved in preference behavior: utility and subjective probability. Fortunately, the decomposition theorem of Luce (1959) shows that an individual can resolve his preference scale into the two separate components and provide meaningful measures of both subjective utility and subjective probability. It is then possible to measure utility by holding subjective probability constant and observing preference behavior in a number of choice situations, a method first suggested by Ramsey (1931). The specific nature of the choice situations depends, of course, on the axioms of the utility system employed.

The first real attempt to measure utility did not, however, involve subjective probability at all. This experiment, by Mosteller and Nogee (1951), was designed specifically to test the von Neumann-Morgenstern axioms of utility. The basic design consisted of the presentation of a series of bets to a subject, each of which could be accepted or rejected. If the bet was rejected, no money changed hands. If the bet was accepted, the subject could either lose 5¢ or win x amount of money, depending on the outcome of a random process (the throw of poker dice) with a probability distribution of p and 1 − p. According to the von Neumann-Morgenstern axioms, the utility of winning 0¢, winning x, and losing 5¢ are related by the following equation:

u(0¢) = p·u(−5¢) + (1 − p)·u(x).

If we fix u(0¢) at 0 and u(−5¢) at −1, which is permissible since we are measuring utility on an interval scale, then

u(x) = p / (1 − p).

By holding p constant and varying x, Mosteller and Nogee found the point at which the subject was indifferent between the options (that is, he accepted the bet 50 percent of the time). On the basis of these results, it was possible to construct utility curves relating the utility values to indifference offers in cents. These utility curves varied for different individuals; it was found, for example, that while the rate of increase of utility with increasing payoff declined for college students, the opposite was true for National Guardsmen. The general conclusions of the Mosteller and Nogee experiment were that it is feasible to measure utility experimentally; that the concept that individuals act in such a way as to maximize expected utility is feasible; and that it is possible to use empirical utility curves to estimate future behavior in similar but more complicated situations.

The first attempt to measure utility using subjective probability was that of Siegel (1956). This method was based directly on Ramsey's (1931) proposal that subjective probability be fixed at 1/2 by selecting an event which is "ethically neutral" to the subject.
This event was a die with two nonsense syllables (ZEJ and ZOJ) engraved on its sides, each syllable engraved on three of the sides. The subjective probability of the occurrence of each syllable was experimentally determined to be 1/2. The experimental procedure is rather simple: the subject is required to choose one of the two options, each of which will result in one of two payoffs, depending on the throw of the die. The payoffs may be diagrammed in a matrix such as

        Option 1   Option 2
E          a          c
Ē          b          d

If the subject chooses Option 1, he will receive a if Event E occurs and b if Event E does not occur (Ē); similarly, a choice of Option 2 will result in c if Event E occurs and d if Event E does not occur. If a subjective probability function S and a utility function U are defined over the options, the choice of Option 1 over Option 2 by a subject (abRcd in the von Neumann-Morgenstern system) may be represented by the following inequality:

S(E)U(a) + S(Ē)U(b) ≥ S(E)U(c) + S(Ē)U(d).

By using an event for which S(E) = S(Ē) = 1/2, the inequality reduces to:
4. Set the ranked entities in a lattice and require each person to state his preference between each nonorderable pair of combinations (a nonorderable pair is a pair that cannot be ordered merely from the rankings). 5- Observe the choices that can be connected by a line going consistently up or down the lattice; these choices permit the deter mination of an ordered metric scale. 6- Observe a sufficient number of nonorderable choices to permit determination of a higher-ordered metric scale. 33 7 • Check to see if the remaining choices are consistent with the scale derived from Step 6. If they are consistent, higher-ordered metric scaling has been achieved. Summary and Implications The purpose of this chapter is to present some of the theoretical and practical ramifications of utility theory and to examine its rele vance to the subject of this dissertation. We may briefly summarize its major points as follows: Tests and other devices used for per sonnel selection have definable and measurable utilities for selection and placement decision-making. These utilities are, under certain restrictions, linearly related to the validities of these tests for the specific selection or placement problems under consideration. These utilities are rational in the sense that objective methods are used to measure them and that the decision-maker consciously uses them to formulate decision rules that will maximize the payoff. Each decision-maker also possesses a definable and measurable subjective utility scale by which he makes decisions that maximize the expected payoff of the decision. The formulation of this utility scale is Identical to that of the rational utility scale except for the fact that objective probability is replaced by subjective probability. The implications of the decision-theory approach to the problem under consideration are now obvious. If we can use the subjective utility scale to estimate rational utility and thereby test validity we have a practical and theoretically appropriate means of estimating test weights. The relationship between the two scales constitutes the primary hypothesis of the study. CHARTER III DEVELOPMENT OF THE MODEL The purpose of this chapter is to present the model relating utility to optimal test weights that is to be tested in this disserta tion- It is, of course; first necessary to define what is meant by utility and by optimal test weights. The last chapter discussed the concept of utility and its relation to test validity. It is well known that if tests are uncorrelated the optimal weights for prediction of a criterion are directly proportional to validity. Hence, test weights and utility are related through the concept of validity. We must, therefore, begin the presentation of the model with an examination of the concept of validity. Types of Validity Historically, the first recognized definition of validity has been the degree to which' a test "measures what it. is supposed to measure." This definition, although true enough theoretically, was found by some to be too vague to be useful in an operational sense. This criticism led to the more practical and measurable definition of validity as "correlation with a criterion." Although both definitions have long ago been rejected as overly simplistic, they represent two poles of scientific thought with respect to validity. 
The former point of view turns its gaze inward and tries to measure the worth of a test in terms 3^ 35 of its intrinsic value, while the latter tries to equate the worth of a test with some external variable that is related to the job that the test is supposed to do. In a sense, this division is part of the "theoretical" vs. "applied" controversy mentioned in Chapter I, insofar as the first definition is more closely related to the theoretical bases of psychological tests, while the latter is tied to practical issues of test use. The American Psychological Association (1966) has delineated three types of test validity in current use. There are content validity, construct validity, and criterion-related validity. The first two types stem from the first definition of validity, while the last type is derived from the second definition. Content validity stems from the fact that any test consists of a sample of items drawn from a population of all possible items in the area being measured. It is defined as the degree to which the sample is representative of the population. Unfortunately, high content validity does not guarantee either of the two general definitions of validity; its use is therefore relatively minor in both the theoretical and practical aspects of testing. Content validity has, however, been related by Cronbach, Rajaratnam, and Gleser (1963) to reliability, in that it represents the degrees to which a score on a specific test can be generalized to a score on all possible tests in the area in question- Construct validity is a closely related concept. This is the degree to which a test measures a given hypothetical construct underlying behavior; in essence, this is the degree to which a test "measures what it is supposed to measure." In its original formulation, it was considered to be a validation of the theory under lying the test. ■ ■ Construct validity has been a point of controversy in psycholog ical testing ever since its formulation. Avidly supported by Cronbach and Meehl (1955)* Loevinger (1957)* an(i Bunnette (19^3), and equally avidly criticized by Bechtoldt (1959)* it remains today as much a point of dispute as ever. The controversy seems not to center so much on whether construct validity accomplishes its purpose of validating psychological theory but rather on whether it is a useful concept for evaluating the worth of psychological tests. Criterion-related validity is derived from the basic definition of validity as "correlation with a criterion." This category is a com bination of two previously existing categories: predictive validity and concurrent validity. Predictive validity is defined as the rela tionship between test scores and some future behavior which serves as a criterion. Concurrent validity differs from predictive validity in the respect that the criterion measure is available at the same time as the test scores are. A common example of concurrent validity is the "present employee" method of validation of employment tests, a method in which both test data and criterion measures are obtained from people already employed on the job. Personnel psychologists and others in terested in the application of psychological tests to personnel problems usually regard predictive validity as the "best" concept of validity, with concurrent validity often being used as a necessary compromise with reality, just as content validity is sometimes used as a practical approximation to construct validity. 
The use of predictive 37 validity, as opposed to other types, has been supported by Guion (1965) and Lawshe and Balma (1966), among others. The basic conflict between supporters of construct validity and predictive validity is not so much over which is the correct interpre tation of "true" validity but rather over which is the interpretation of validity that will be most useful in attaching the problems of psychological measurement. The controversy, then, is not one of "truth" but rather one of "priority." The case for construct validity has been ably stated by Dunnette (1963)• He feels that the basic goal of validity research is to produce scientific information about the meaning of test scores. The predictive validity studies, while pro ducing a wealth of useful data for personnel selection and placement problems, are not as useful for this basic goal as are construct validity studies- He proposes, therefore, that "the coefficient of practical validity (concurrent and predictive) be accorded a lower position in the status hierarchy and that they be used simply as one of a number of hinds of evidence to lend meaning to test behavior [p. 252]." Supporters of predictive validity, on the other hand, thinh of predictive validity as playing the most important role in psycho logical measurement; Lawshe and Balma (1966), for example, refer to it as "the heart of personnel testing [p. 269]-" It is the writer's belief that a fully developed theory of psychological measurement must resolve the differences between these two views. Face Validity and Synthetic Validity Two other concepts of validity have arisen in addition to the three types delineated by the APA- These are face validity and 38 synthetic validity. Face validity is a special case of content validity; it represents the degree to which a test looks like it is related to its purpose• Although many individuals, particularly in personnel work, regard face validity as a valuable supplement to pre dictive validity or construct validity, very few regard it as a sub stitute for the other types . A slight variation of face validity is supported by a small group of individuals, mostly in public personnel work,who insist they can determine what an item measures merely bs^ looking at it. This view is emphatically rejected by most psychome tricians . The other concept, synthetic validity, has a more direct bearing on the problem in question. This concept was introduced by Lawshe (1952), primarily to provide a means to infer validity coefficients in a specific situation without running predictive validity studies. The operational definition of synthetic validity, as developed by Balma (1959), is: ". . • the inferring of validity in a specific situation from a systematic analysis of jobs into their elements, a determination of test validity for these elements, and a conbination of elemental validities into a whole [p. 395)•" Basically, then, synthetic validity is a combination of a large number of test-element correlations, where the elements are determined by job analysis• This concept forms the basis of the J -coefficient approach (Primoff, 1955a, 1955b)• Since the J -coefficient approach itself forms the practical basis of the present model, the concept of syn thetic validity is obviously quite important to the present study. There is, however, a more important consequence to this concept. It 39 forms, in the writer's opinion, a means of connecting construct valid ity and predictive validity. 
Basically, the job elements can be regarded as constructs. The predictive validity of a test may then be interpreted as the sum of two components: the construct validity of the tests in measuring the job elements and the predictive validity of the elements in determining success on the job.

Decision Theory and Validity

The decision theory approach to testing rests on the interpretation of test validity as predictive validity (Cronbach & Gleser, 1965, p. 31). There is, however, provision for the two-stage concept of validity indicated above. Cronbach and Gleser imply this in their development of the utility expression involving an intervening aptitude factor s, so that r_ye = r_ys r_se and ΔU = σ_e r_ys r_se E(Y') - C_y (terms defined previously). It is therefore feasible to speak of both rational utility and validity of job elements. But whether it is also logical to speak of the subjective utility of job elements depends on how the criterion is defined.

The Nature of the Criterion

Both predictive validity and concurrent validity are defined as the relationship of scores on a test to a criterion, which is some form of behavior. A major problem, however, arises in the selection and measurement of that behavior that will serve as the criterion. Here, as in many areas of applied psychology, theoretical considerations are often overruled by practical limitations. Thus, a rating of doubtful reliability and untested relationship to overall job success may be used as a criterion if it is the only measure available. This situation is, fortunately, rather rare; more commonly, there are several possible criteria available (at least potentially), and one of the personnel psychologist's duties is to select from these the criterion measure to be used. To accomplish this, some means of evaluating the worth of potential criteria is needed.

One such method was implied by Thorndike (1949) in his hierarchical organization of criteria. The criterion with the highest value is the ultimate criterion. This is the final goal of selection; it includes everything that defines success on the job. As such, it is more of an abstraction, determinable only on rational grounds, than a practical criterion measure. The practical measures of job success are substitute criteria, which are judged to be related to the ultimate criterion. The degree of this relationship is an index of worth of the substitute criterion; it is in effect its validity.

One single substitute criterion taken by itself seldom has a high relationship to the ultimate criterion. Current thinking, therefore, is directed toward the development and use of multiple criteria. These multiple criteria may be weighted and combined to form one single composite criterion that (hopefully) is more closely related to the ultimate criterion than any of the substitute criteria taken separately. Some (e.g., Ghiselli, 1956; Seashore, Indik, & Georgopoulos, 1960) reject the concept that criteria can be combined in any meaningful way. They support the use of multiple criteria, but as completely separate entities. Guion (1961) sums up the objections to composite criteria as two-fold. First, the criteria involve different types of entities and cannot be expected to "mix." Second, it is difficult to determine what the effective weights of the criteria are. The fact is, as Toops (1944) and Nagle (1953) have pointed out, that a single, overall criterion is absolutely necessary in test validation.
There are no statistical methods for weighting tests un less they are related to one criterion. What those opposed to com posite criteria are suggesting does not violate this; they are saying, in effbct, that one criterion should be chosen, but this choice may be varied from situation to situation and from job to job. This is really a weighting pattern in vrhich one criterion is assigned a weight of 1 and the others of 0. This is an acceptable but restrictive practice; rather than accepting the restriction it may be well to try to overcome the objections to composite criteria. The Criterion as a Decision Composite criteria are usually formed by weighting each of the substitute criteria (which may be called criterion elements) by its relation to the ultimate criterion. It is here that the first problem arises. The ultimate criterion is, as mentioned earlier, an abstrac tion without measurable substance. In effect, the criterion elements are being related to an immeasurable entity. This entity must be defined operationally before it is possible to speak meaningftilly of composite criteria. The operational definition for the ultimate criterion comes, in the writer's opinion, from the only possible method of its determination. Somewhere, at some time, someone must decide on an individual's worth to the organization. The ultimate criterion is k-2 defined as this value. Whether it is determined by quantitative data or based on general impressions is irrelevant except perhaps as a means of convincing the decision-maker to use a more scientific basis for his decision. The main requirement is that this ultimate criterion decision produces measurable values. This definition needs some restriction, however. While it is certainly possible to identify this decision process, it may be neces sary to wait years until it has been made; this would be highly im practical. Generally, however, new employees serve a probationary period, and the definition of ultimate criterion can be restricted to that value which is obtained after completion of the probationary period. It is at this point that the criterion decision whether an employee is successful or not is made, usually by the immediate super visor of the employee. It should be noted that this decision need not be followed by action; it is quite possible that unsatisfactory em ployees may be retained In their positions because of unusual economic conditions or nepotism, while satisfactory employees may be released because of factors unrelated to the performance of their jobs. In other words, we are really talking about two separate utility func tions: the utility or worth of the employee to the organization and the utility of keeping the employee on the job. An employee will be regarded as unsatisfactory when the first value drops below the cost of retaining him, but will not be released until the second value drops below this cost. The relationship between the criterion elements and the ultimate criterion or criterion decision is now apparent. The decision-maker derives the criterion decision by estimating the ultimate utility of the individual to the organization; he formulates the decision rules by relating the utility of retaining that individual to the utilities of retaining other employees and to the loss resulting to the organ ization from leaving the position vacant. While the criterion de cisions themselves are binary, the ultimate utilities are continuous and, if the von Neumann-Morgenstera formulation holds, unique to a linear transformation. 
The ultimate utility vector can be conceived in this formulation as the sum of utilities of a set of criterion elements, the utility of each element being weighted by the probability that the individual possesses that element:

    u_i = Σ_j p_ij u_j ,

or, in matrix formulation:

    U_i = PU_j .

If the decision-making process is a "rational" one--that is, if there are no extraneous factors that tend to produce a divergence in the utility of an employee to the organization and the utility of retaining that employee--the decision rule to retain employees would be to set a cutting score on the ultimate utility vector and retain every individual whose utility falls above this cutoff.

Relationship to Test Weights

The process of determining the test weights that will maximize predictive validity is a similar one. A measure of worth of the employees which serves as an "ultimate" criterion or reasonable approximation is obtained. This measure of worth is sometimes obtained from objective data; more often, however, it is obtained from supervisory judgments of value. Scores on a series of predictors are also obtained and weighted in such a way as to maximize the prediction of the criterion. If the weights are such as to provide perfect prediction, then

    c_i = Σ_m β_m z_im ,   or   C_i = Z_i B ,

where c_i is the score of the individual on the criterion, z_im is the standard score on predictor m, and β_m is the regression weight for m.

If, as we have assumed, the ultimate criterion is in fact a decision, prediction of success is maximized when u_i = c_i. This situation exists when the utility scale of the person making the criterion decision is used to determine u_i. Then

    PU = ZB .

If, now, a series of predictors with construct validity for the criterion elements are used, they can be scored so that the raw score represents the probability that the individual has the element in question. If so, Z = P and U = B. This relationship forms the central thesis of this dissertation.

Implications of This Approach

If this approach is valid, we have a powerful technique for economically weighting tests by using subjective scaling methods. To derive the set of weights B that maximize validity we need only determine the subjective utility scale U of the criterion decision-makers.

In effect, we now have a modification of the J-coefficient process that meets the objections to the original approach. The test weights that maximize validity are approximated by the subjective utility of the criterion elements which are measured by the tests. These subjective utilities are determined by examining the preference behavior or responses of the decision-makers. The decision-makers are not asked for their estimate of the "importance" of job elements, but rather for their preferences only, which is a more meaningful statement for them to make. Further, we are now dealing not with job elements, but with criterion elements, which are more directly related to test validity. Finally, the use of the utility function permits the formulation of decision rules which allow for the elimination of a criterion element from the testing program if the utility of that element drops below the cost of testing for it.

Assumptions of the Model

The basic method of experimentally testing the model will be to derive the subjective utility function over a restricted number of criterion elements and to compare this function to the optimal weights of tests measuring these elements.
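The relationship Z = P and U = B developed above can also be sketched numerically. The following fragment, written in modern notation, is only an illustration: the utilities, the probability scores, and the sample size are hypothetical and are not taken from the study. It generates a criterion as an expected utility over probability-scored elements and shows that the least-squares weights of those element scores recover the utility vector:

```python
import numpy as np

# Hypothetical subjective utilities of five criterion elements (illustrative values only).
u = np.array([0.19, 0.30, 0.11, 0.13, 0.27])

rng = np.random.default_rng(0)

# P: one row per candidate; each entry is the probability that the candidate
# possesses the corresponding criterion element (the assumption Z = P).
P = rng.uniform(size=(500, 5))

# Ultimate criterion conceived as an expected utility: c_i = sum_j p_ij * u_j.
c = P @ u

# Least-squares (regression) weights of the element scores for predicting c.
b, _, _, _ = np.linalg.lstsq(P, c, rcond=None)

print(np.round(b, 3))  # equals u here, since the criterion contains no error
```

In the study itself the criterion contains error and the element measures are not perfectly factor-pure, so the utilities and the beta weights are compared only up to a linear transformation.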
There are, however, several assump tions about the model which must be verified if the process is to provide meaningful results. It would be helpful at this point to examine these assumptions in order to provide a more meaningful basis for evaluating the experimental results. Some of the assumptions relate to the determination of the sub jective utility function. The u. must be measurable by reasonably J economical scaling methods. The ratings must be made by individuals who would be making the criterion decisions. Finally, the provisions of the subjective expected utility hypothesis, namely that u. = Z-n.u. . , must hold, i ij There are several assumptions concerning the tests that must also be met. First, the tests must be valid and reliable factor-pure measures of the criterion elements; that is, they must have construct validity for these elements. They must also be uncorrelated. Then, the scores on the test must be such that a given standard score equals the probability that the individual receiving that score has the element in question (Z = P) - Finally, the test scores must be treated in such a way as to produce a Cronbach-Gleser utility scale that meets the requirements necessary to provide a linear relationship between utility and validity. The reason for this assumption is obvious when it is considered that it is test validity that is pro portional to optimal test weights. Essentially, these requirements are that the cost of testing be equal for all tests; that the payoff for each test be linearly related to the test score for selected indi viduals and zero for rejected individuals; that the selection decision be binary and dependent on whether the weighted test score for a given individual equals or exceeds a specified cutoff; and that the proba bility of acceptance be either 0 or 1, depending on whether the cutoff is exceeded or not. ^7 Observable Properties of the Model If the model proposed here is appropriate, there are certain observable phenomena that can be deduced from it; these phenomena can be used as a check on the accuracy and usefulness of the model. First, it can be inferred that the subjective utility function over the criterion elements can be defined and measured from the responses of a group of supervisors or other decision-makers; that is, these responses will be scalable in such a way as to satisfy the axioms of the von Neumann-Morgenstern utility system. Second, it can be inferred that if an -ultimate utility scale is available, this scale can be approximated by summing the subjective utilities, each utility being weighted by the probability that the individual in question possesses that criterion element. Third, it can be inferred that the optimal test weights of tests - measuring the criterion elements will be equiva lent to the utility function within a linear transformation. Finally, it can be inferred that the employment of utilities as test weights will provide a multiple correlation with the ultimate criterion that is not significantly less than that obtained by any other weighting method. Summary The purpose of this chapter was to develop a model within the decision theory framework that will permit the estimation of optimal test weights by relatively inexpensive scaling methods* The primary basis of this model is the relationship of test validity, which is the main determiner of optimal test weights when uncorrelated tests are used, to rational test utility. 
If the ultimate criterion is regarded as a decision, we can predict this decision hy a subjective decision model, employing the subjective utilities of a series of postulated criterion elements. The central thesis of this model is that if the tests are regarded as factor-pure measures of the criterion elements, the optimal test weights will be equivalent to the subjective utility function within a linear transformation. CHAPTER IV THE EXPERIMENTAL DESIGN The first three chapters of this dissertation described the theoretical background and development of the model for estimating test weights. The purpose of this chapter is to describe the experi mental procedure that was used to derive the subjective utility function and to compare it to objectively derived optimal test weights. Operational Definitions The terms used in this study.may be operationally defined as follows: 1. The criterion decision is a binary decision that an individual is successful in a given position. This decision is determined by his immediate supervisor. 2. The -ultimate utility scale is that index of an individual' s worth to an organization that is used to determine the criterion decision. 3* A criterion element (e) is a knowledge, skill, or ability that is to some degree predictive of success on the job. The subjective utility (u) of a criterion element is the preference scale value assigned to that element by a group of super visors who make the criterion decisions • 5- A unit test is a test measuring one criterion element only. 49 50 6- The selection decision is a binary decision to appoint or not to appoint a given individual to a specific position. 7> The rational utility of a unit test is the Cronbach-Gleser utility of that test for making the selection decision. Measurement of the Subjective Utility Function Two methods of measuring subjective utility were employed in the study: the Siegel higher-ordered metric method and the Thurstone pair- comparison method. The Siegel method is basically the original procedure, developed in 1957, which was described in Chapter II. Two changes were made in the procedure, however, to provide greater applicability to this problem. The first relates to the experimental method. Siegel set up his experiments as one-person games involving dice with nonsense syllables engraved on them; the subjective probability of the occur rence of each of the syllables was empirically determined to be l/2. This procedure was obviously inappropriate for use with supervisors in a work situation. They were given, instead, a questionnaire (see Appendix i) delineating the options and stipulating the objective probability as l/2. This is similar to the approach used by Fagot (1959) with very satisfactory results. The substitution of objective probability for subjective probability is admittedly a departure from the model; Edwards (1962), however, found that; "if subjective probabilities do have the additivity property, if the decision model in which they are used is in principle applicable to any conceivable set of events, and if the number of different subjective probabilities which may occur in 51 conjunction with a given objective probability is no more than denumerably infinite, then whenever objective proba bilities are defined subjective probability must equal objective probability [p. 130]." Although this is an artificial situation not generally found in most real-life decision problems, we can in this restricted situation adopt these assumptions and utilize the objective probability function. 
The second change relates to the utilization of the information. In the Siegel approach, a rank order of the entities was obtained first, and the probability combinations used were selected on the basis of this rank order. In this experiment, enough probability combina tions were obtained to provide a higher-ordered metric scale regardless of the rank order. The combinations which provided "nested” choices (e.g., if A>B>C , then the comparison AC and B was used) were scaled for the distance relationships, while the remaining choices served as a check on transitivity. This avoided the construction of a lattice for each questionnaire. The questionnaire itself included ' j k questions and was used to rank five entities. The first 10 items of the questionnaire were scored to provide a rank ordering of the areas for each supervisor. This rank order scale then was used to select from the remaining 44 items those used to- determine the distance between the elements. These inequalities were then solved for the distance relationships. For example, if the rank order of three elements is A > B > C and the mixture AC is preferred to B , then the relationship between the distances is AB > BC ■ If B is preferred to AC , then the converse is true. Similarly, if the rank order of four elements is 52 A > B > C > D and the mixture AD is preferred to the mixture BC , the relationship between the distances is AB > CD . Choices which do not involve nestings such as this are used to check for transitivity. For example, if the mixture AB is preferred to C the item is transitive; if the converse is true the item is intransitive. Next, the six relationships involving distances between adjacent elements were observed to provide an ordered metric scale. If these six relationships or the rank order relationships were intransitive, the entire questionnaire was rejected as unscorable due to intransi tivity. Next, a sufficient number of further preferences (generally from one to three) was observed to order all distances between all of the elements to form the higher-ordered metric scale. Finally, all relationships not used to construct the scale were observed for transi tivity, and the number of intransitivities counted. The scale was quantified by arbitrarily assigning a value of 1 to the smallest distance and expressing the larger distances in terms of the smaller distances. When a distance could not be expressed in terms of the smaller distances, it was arbitrarily given a value of one greater than the next smallest distance, except in the few instances when this would introduce inconsistent relationships later in the scale; in these instances fractional distances were used. The scale was then transformed linearly to a scale of 0 to 1 from the smallest entity to the largest entity. A scored questionnaire is found in Appendix II. The other method of measuring subjective utility in this study is the Thurstone pair-comparison method. This method, which is delineated in Torgerson (1958), requires only the comparison of each pair of entities and the forced choice of one of the pair by a group of sub jects. Subjective probability is not involved in this method at all. Sanders (1961) found a highly significant relationship between a Thurstone scale and a Siegel scale for preferences of newspaper con tent; if a similar relationship can be found in this area, the data collection procedures will be greatly simplified. The responses to the first ten items of the questionnaire were used to determine the Thurstone scale. 
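For readers unfamiliar with the computational side of the Thurstone procedure, the following sketch shows the usual Case V solution: choice proportions are converted to unit-normal deviates and averaged. The proportions, element labels, and clipping constant below are illustrative assumptions, not the study's data.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical proportion matrix: entry [i, j] is the proportion of judges
# preferring element i to element j (rows and columns in the order of `labels`).
labels = ["Surveying", "Hydraulics", "Mechanics", "Soil Mech.", "Concrete"]
P = np.array([
    [0.50, 0.58, 0.72, 0.80, 0.80],
    [0.42, 0.50, 0.70, 0.76, 0.84],
    [0.28, 0.30, 0.50, 0.70, 0.72],
    [0.20, 0.24, 0.30, 0.50, 0.52],
    [0.20, 0.16, 0.28, 0.48, 0.50],
])

# Thurstone Case V: z_ij = Phi^{-1}(p_ij) estimates the scale difference s_i - s_j;
# the row means give each element's scale value up to an additive constant.
Z = norm.ppf(np.clip(P, 0.01, 0.99))   # clipping guards against proportions of 0 or 1
scale = Z.mean(axis=1)
scale -= scale.min()                   # anchor the least-preferred element at zero

for name, value in zip(labels, scale):
    print(f"{name:<12s} {value:5.2f}")
```

The Siegel scoring, by contrast, is carried out one respondent at a time from the nested-choice inequalities described above, and the individual scales are averaged afterward.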
The questionnaire was completed by 51 immediate supervisors of Senior Civil Engineering Assistants employed by the engineering department of a large and populous county. Five criterion elements, all knowledges, were involved. They were knowledges of surveying and mapping, soil mechanics and foundations, cement and concrete, mechanics and materials, and hydraulics. The choice of these elements was governed by the examination procedure established by the personnel department of the county for this class. Measurement of Criterion Elements The other measure needed for this study was the degree to which the candidates possess the criterion elements in question. In any practical application of this approach, these measures must be obtained from test scores or scores on other screening devices. The tests must, however, meet certain rather stringent requirements pertaining to con struct validity and reliability, among other things (see p. b6). Since the vagaries of test construction make it exceedingly difficult to ensure that these requirements will be met and since the purpose of this study was to test the model in a theoretical as well as a practi cal sense, it was decided to use direct supervisory ratings of the candidates' possession of the criterion elements as these measures in addition to test scores. The rating form used to obtain the cri terion element measures is presented in Appendix III. The actual ratings were letter grades based on a 5-point scale, which were later converted to scores of 1 to 5- The form was constructed in such a manner as to encourage a normal distribution of scores. In order to obtain some indication of the practicality of this approach when predictors of the type ordinarily used in personnel selection are employed, a second set of criterion element measures was obtained from scores on written tests designed to measure the criterion elements in the study. These five achievement tests, each 15 items long, were administered to 39 candidates in an examination for Senior Civil Engineering Assistant; all of the candidates in the examination were currently employed as Civil Engineering Assistants. The test scores were used as part of the civil service promotion procedure • Determination of Beta Weights The beta weights used in this study were obtained from a multiple regression analysis performed at the Health Sciences Computing Facility, University of California Medical Center, Los Angeles. The predictors used in the analysis were (a) the criterion element ratings and (b) the actual test scores. The criterion (dependent variable) 55 used was a rating of proficiency for the position in question. This rating, called the "Appraisal of Promotability," is prepared for each candidate in the examination by the head of the department, and is reported on a scale of ^0 to 100. In effect, then, this portion of the study was a concurrent -validity study. Summary of the Experimental Design The design of the experimental test of the model can be summa rized as follows: 1. Derive from the responses of the supervisors of a specific class of employees a subjective utility scale U of criterion ele ments, using Siegel and Thurstone methods. 2. Obtain criterion element measures from rating scales and tests and criterion judgments and determine the beta weights that maximize the validity of the criterion element measures . 3- Compare the subjective utility scale with the beta weights of the criterion element measures. 
If the utilities provide good estimates of the beta weights, the model will be supported.

CHAPTER V

THE EXPERIMENTAL RESULTS

The purpose of this chapter is to describe the results of the actual tryout of the method on the group of candidates for the position of Senior Civil Engineering Assistant, as described in the preceding chapter.

Determination of the Subjective Utility Function

The subjective utilities of the criterion elements were determined by responses to the questionnaire described in Chapter IV and shown in Appendix I.

The major test of this method is that of transitivity. This was the biggest problem that Siegel (1956) found, and it has led to the formulation of probabilistic methods of determining utility. In this study, intransitivities may have one of two effects, depending on the entities involved. A limited number of the items in the questionnaire were used to determine the higher-ordered metric scale; any intransitivity in these items would render the entire questionnaire unscorable. The remaining items do not enter directly into the scaling process, but were used to provide a check on transitivity. Since intransitivities here do not affect the utility scale except to cast doubt on its accuracy, these intransitivities were merely counted.

Of the 51 questionnaires returned, two were incomplete and 14 were not scorable due to intransitivity. This left 35 questionnaires (69% of the total group) that yielded higher-ordered metric scales. Of this group, 14 (27% of the total number of questionnaires) had no intransitivities; the remainder had from one to seven intransitivities each. There were sharp differences between divisions within the department in the scorability of the questionnaires. For example, all eight of the questionnaires received from the Waterworks Division yielded utility scales, with three of the eight having no intransitivities; the Sanitation Division, on the other hand, yielded only three scorable questionnaires from the nine received. Since the jobs varied somewhat from division to division, it would appear that the feasibility of this method for determining utility is related in some way to the nature of the job being scaled.

The subjective utilities were also determined by Thurstone's pair-comparison method (Torgerson, 1958), using the first 10 responses to the questionnaire. This approach uses frequency methods to establish the scale values, and therefore reduces the problem of intransitivity. The results of the Thurstone scaling of utility and its comparison to the Siegel method are shown in Table 1.

The first column of this table contains the means of the subjective utilities of the criterion elements as determined by the Siegel method; the second column contains the subjective utilities as determined by the Thurstone method. The third column contains the Thurstone scale values after they have been placed on a common scale with the Siegel values by equating the means and standard deviations. It can be seen that the two scaling methods provide comparable values for the utility function; this is similar to the results of the Sanders (1961) study.

Description of the Criterion Element Measures

The basic criterion element measures used in the study were the ratings obtained from the supervisors of the candidates concerning the candidates' proficiency in each of the five criterion elements under consideration.
These ratings, which were obtained on a 5-point scale by the questionnaire shown in Appendix III, were used as hypothesized factor-pure test scores of the criterion elements. The descriptive statistics of these measures, including the intercorrelations, correlations with the criterion, and beta weights to predict the criterion, are given in Table 3. It should be noted that the assumption of independence is not met.

Another set of criterion element measures used was the scores on the five 15-item achievement tests designed to measure the criterion elements in the study. The scores used here were standard scores with a mean of 50 and a standard deviation of 10. The descriptive statistics, which include those mentioned above as well as the Kuder-Richardson Formula 20 test reliabilities, are given in Table 4. It will be noted that the test reliabilities are less than satisfactory for assuming construct validity of the tests. Further, the intercorrelation matrix of the tests indicates that they are not independent.

Comparison of Utility and Beta Weights

The beta weights were obtained from a multiple regression study of the criterion element measures, using as the dependent variable the Appraisal of Promotability described in the previous chapter. In order to facilitate the visual comparison of the beta weights and utilities, the beta weights were transformed linearly to the same scale as the subjective utility function by equating the means and standard deviations. The transformed values and the subjective utilities are given in Table 5 and presented graphically in Figures 7 and 9.

It will be noted that the utility scale is quite close to the beta weights when ratings are used (r = .83), but not when test scores are used for this purpose (r = .26). Further, the predictability of the utility function as a weighting device is comparable to that of the multiple regression weights when ratings are used as criterion element measures. The correlation using the utilities as test weights is .431, as compared to a multiple R of .573 obtained from the multiple regression study. This result becomes even more striking when it is considered that the multiple R can be expected to shrink under cross-validation, while there is no reason to expect the correlation using the utilities as weights to shrink. It may well be, then, that the utility function would provide an equal or higher correlation with the criterion under cross-validation than the multiple regression weights. Unfortunately, there was an insufficient sample size to divide the group for cross-validation.
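The logic of that expectation can be illustrated with a small simulation, sketched below in modern notation. All of the quantities here (sample sizes, weights, error variance) are hypothetical; the sketch is meant only to show why a weighting fixed in advance, unlike one fitted by least squares, does not capitalize on chance in the derivation sample:

```python
import numpy as np

rng = np.random.default_rng(1)

def validity(weights, X, y):
    """Correlation between a weighted composite of the predictors and the criterion."""
    return np.corrcoef(X @ weights, y)[0, 1]

# Hypothetical a-priori weights standing in for the subjective utilities.
fixed_w = np.array([0.19, 0.30, 0.11, 0.13, 0.27])

def make_sample(n, k=5, noise=2.0):
    """Draw a hypothetical sample of element scores and a fallible criterion."""
    X = rng.normal(size=(n, k))
    y = X @ fixed_w + rng.normal(scale=noise, size=n)
    return X, y

X1, y1 = make_sample(39)     # small derivation sample
X2, y2 = make_sample(5000)   # large "new" sample from the same population

beta = np.linalg.lstsq(X1, y1, rcond=None)[0]   # regression weights fitted to sample 1

print("fitted weights : derivation R =", round(validity(beta, X1, y1), 3),
      " cross-validated r =", round(validity(beta, X2, y2), 3))
print("fixed weights  : derivation r =", round(validity(fixed_w, X1, y1), 3),
      " cross-validated r =", round(validity(fixed_w, X2, y2), 3))
```

In runs of this kind the fitted weights typically show the larger drop from the derivation sample to the new sample, which is the shrinkage referred to above.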
It should be noted, however, that the same situation does not apply when test scores are used as criterion element measures; here the correlation was -.043 when the utility function was used, as compared to a multiple R of .603 obtained from the multiple regression study. (See Table 5.)

Comparison of Utility and Validity

It will be noted that neither the ratings nor the test scores were independent from one another. The beta weights take into account the intercorrelation of the predictors as well as the validity of each predictor. Since the model is based on the assumption of independence of criterion elements, it may be well to compare the utilities to the correlations of the criterion element measures with the criterion, i.e., the validities of the measures. In this manner the intercorrelations between the criterion elements would affect neither set of data. These values, also transformed to the subjective utility scale, are included in Table 5. The effect of using validities rather than beta weights increases the relationship to the subjective utility scale for the ratings but decreases it for the test scores. Here, the correlations between the scales are .91 for the ratings and -.10 for the tests. These results are shown graphically in Figures 8 and 10.

Summary

The method proposed in this dissertation was employed using empirical data from an examination for Senior Civil Engineering Assistant for a large government agency. The analysis was performed using two methods of scaling utility (Thurstone and Siegel) and two sets of criterion element measures: criterion element ratings and test scores. The results indicated a strong relationship between beta weights and subjective utility when ratings are used but no relationship when test scores are used. The significance of these findings to the model will be discussed in the next chapter.

CHAPTER VI

CONCLUSIONS AND INTERPRETATIONS

The overall purpose of this study was to derive and test a method for using the subjective utility function over criterion elements to weight tests used for personnel selection. This overall goal can be broken down into three subproblems: to demonstrate the existence and scalability of the subjective utility function over criterion elements; to verify empirically the model relating subjective utility to beta weights in an idealized situation; and to determine the practical feasibility of using subjective utilities as test weights in actual selection problems. The purpose of this chapter is to evaluate the findings of this experiment in light of each of these questions and to discuss the value and implications of this approach.

Determination of the Subjective Utility Function

The evidence from the experiment seems to indicate that the subjective utility function does exist and can be determined by scaling methods. Two methods were used to determine this function--the Siegel higher-ordered metric method and the Thurstone pair-comparison method. Although both methods used the same starting point (the pair-comparison data), the generation of the scale beyond the rank ordering of the entities was accomplished by completely different methods. The Thurstone method used the frequency of choosing A over B as the means of generating the interval utility scale, while the Siegel method generated an interval scale for each individual by examining choices between pairs of entities in situations of uncertain outcomes. The final interval scale for the group was then obtained by averaging the individual scale values. In spite of the differences between methods, however, the two scales were so close as to be for all intents identical (r = .997). This would tend to indicate that something beyond random variation was being measured by the scaling process. The fact that the utility function served adequately as a weighting device for one set of scores (the element ratings) indicates that what was being scaled was indeed subjective utility.

The Siegel method did present some problems with achieving transitivity. Only 14 of the 51 questionnaires were completely transitive, while an equal number were completely unscorable due to intransitivity. The majority of the questionnaires, while scorable, had one or more intransitivities in items used for checking the scale.
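The kind of consistency check involved can be stated compactly: a set of pairwise preferences is transitive if no triple of entities is chosen in a cycle. A sketch of such a check, written for a single respondent's choices (the data structure and the example choices are hypothetical, not taken from the questionnaires), follows:

```python
from itertools import permutations

def count_intransitive_triples(prefs):
    """Count cycles A > B, B > C, C > A among pairwise choices.

    `prefs` is a set of (winner, loser) pairs, one per observed choice.
    """
    items = {x for pair in prefs for x in pair}
    cycles = 0
    for a, b, c in permutations(sorted(items), 3):
        if (a, b) in prefs and (b, c) in prefs and (c, a) in prefs:
            cycles += 1
    return cycles // 3   # each cycle is found three times, once per starting point

# Hypothetical respondent: one cycle, among C, D, and E.
choices = {("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E"), ("E", "C")}
print(count_intransitive_triples(choices))   # prints 1
```

A questionnaire containing any such cycle among the items used to build the scale was, as noted above, rejected as unscorable.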
The problem of transitivity will have to be solved if this method of determining subjective utility in this context is to be generally operational. Verification of the Model The results of the experiment also seem to indicate that the rela tionship between subjective utility and beta weights is significant enough to allow for the use, in an idealized setting, of the subjective utility function as a weighting device in a personnel selection 6k problem. fyjhen ratings were used as criterion element measures, the correlation vising elements weighted by the subjective utilities and the criterion was nearly as high as the multiple correlation using beta weights. Consequently, it can be concluded that the observed data tend to support the model, or at least do not disprove it. Practical Usefulness of the Model The experimental results using test scores as criterion element measures do not, however, indicate any practical utility for this approach using these data. The zero-order correlations of the utilities and beta weights of the tests and the absolute lack of predictiveness of the utility function stand in stark contrast to the results found when ratings were used as criterion element measures. It would perhaps be helpful to examine some possible reasons for this situation. The primary reason seems to be that the assumptions of. the model appear to be inapplicable to the test scores. The basic assumption that is apparently untenable is that the tests have construct validity. It will be noted that the tests have generally low internal consistency reliabilities, which in itself rules out the possibility of high con struct validity. Further evidence is obtained by examining the corre lations between the test scores and criterion element ratings (Table 5), following the approach of Campbell and Fiske (1959)* These corre lations indicate that there is serious doubt that the tests and ratings are measuring the same thing. Only in the case of one element (Hydraulics) was the test-element correlation higher than the corre lation of the test with any other element. The study provides, of course, no conclusive evidence to show that the criterion element ratings are tetter indices of proficiency than the test scores. However, the evidence from the reliabilities and intercorrelations seems to indicate that this is the case. There is, in fact, one further bit of evidence that the tests do not have con struct validity. This relates to the manner of test construction used by the agency involved- Most test development procedures require that the items included in the final form of a test have empirical evidence relating to item reliability and validity; the agency, on the other hand, includes items solely on the basis of the judgment of subject- matter experts that the item measures what it purports to measure.'*' The tests included in this study were, therefore, essentially prelimi nary tryout forms with no predetermined empirical evidence as to the item parameters. Under these circumstances, it would be highly coin cidental if the tests did have either construct validity or predictive validity. The results of this experiment may have been entirely different if factor-pure tests had been constructed and used in this study• This result indicates a very important fact concerning this model: neither this approach nor the J -coefficient process obviates the need for test validation. 
The J -coefficient process is considered by its 4 developers to be only a preliminary step in test validation; the "*Tt should be pointed out that the test construction procedures used by this agency are quite common in the public personnel field. These comments should not, therefore, be regarded as critical of this specific agency but rather as an attempt to explain the results of the study. 66 results must be confirmed or revised by predictive validity studies. This approach, while not being part of an iterative process, shifts the emphasis from predictive validity of tests to construct validity; its sole advantage is that construct validity of tests is more easily and economically obtained than predictive validity. Without adequate test construction methods that ensure construct validity, however, the entire process breaks down. There is one other assumption concerning the test scores that must be considered. This is the assumption that the test score equal the probability that the individual earning that score has the criterion element in question (z = P). Neither the test scores nor the criterion element ratings were treated in such a way as to ensure this. Yet, the results may be due to the possibility that the ratings more closely satisfy this condition than the test scores. The feasi bility of such a transformation must be considered in future studies of this sort. In conclusion, it is felt that the most likely cause of failure to demonstrate the practical usefulness of this approach lies in the failure of the tests to meet the assumptions stipulated by the model. Limitations of the Study In addition to the difficulties mentioned above, the experimental design has several characteristics which tend to limit generalization of the results. First, there is the problem of the small sample used to test the model. The empirical data are based on one class in a fairly rigid civil service classification system, and on a small sample 67 of candidates within that class- The sample was, in fact, too small to subdivide for cross-validation purposes; it is felt that the multiple correlation with the criterion would shrink using the multiple regres sion weights in a cross-validation sample, but would remain relatively constant using the subjective utility function. In any event, it would be risky to generalize the results to any other class or position. Second, the model presented here severely limits the class of decision problems to which the process is appropriate- Basically, this approach applies only to testing situations in which the selection decision is based on a linear combination of predictors and in which the utility of the decision is directly related to the validity of the selection instruments- As Cronbach and Gleser (1965) have pointed out, however, there are a large number of decision problems in which this model is not appropriate. Third, there is the problem of "level of ability" in scaling utility. In this approach, the level of ability used as a reference point in establishing the subjective utility scale was arbitrarily set at the "journeyman level" of proficiency; in reality, proficiency is a continuous variable and utility of a particular criterion element would probably depend on the expected proficiency level of the appointee in that element. The problem of measuring subjective utility of criterion elements is, therefore, a multidimensional one; the unidimensional approach taken here is merely an approximation made under certain specified circumstances. 
This approximation is necessary at this point, since multidimensional methods of scaling utility have not been 68 as thoroughly worked out as unidimensional ones. In fact, the only published method is that of Hausner (195^)> which was applied to empirical data by Thrall (195*0 • As multidimensional methods of scaling utility are developed, it may be possible to apply them to this problem; in the meantime, however, it must be recognized that the approach used here is merely an approximation of the actual utility function. Implications of the Study The study presented here has several implications for the field of personnel testing. First, the results indicated that it is feasible to regard the ultimate criterion as a decision and to postulate the ex istence of a subjective utility function over criterion elements. This indicates that utility theory presents a link between construct validity and predictive validity. Second, the study indicates that decision theory has practical applications to the problem of determining test weights . One of the criticisms of decision theory has been that there are so many unknown factors in its application that it is effectively rendered useless in a practical sense. This study shows that a simple, economic method of determining test weights based on the principles of decision theory is feasible. Finally, the study indicates that the gap between the theoretical aspects of psychological measurement and the practical problems of test use are not so great that they cannot be bridged. And the indication is that decision theory will be the vehicle to provide this bridge. 69 Summary One of the major tasks of any of the applied fields of psychology is to provide for the economical application of the psychological principles developed by theoreticians. An area in which this problem is especially acute is in the determination of test weights that will maximize the validity of a test battery used for personnel selection- According to established psychometric theory, optimal test weights are determined, if the tests are uncorrelated, by the predictive validities of the tests - Validity studies are, however, often economically un feasible, and substitute estimates of validity must be used to deter mine these weights. The purpose of this study is to determine the feasibility of using the subjective utility scale of job supervisors over a set of criterion elements to estimate the optimal test weights of tests measuring those elements - Theoretically, this study is based on the assumption that for many positions the best or only criterion of success is a specific decision made by some individual, usually the supervisor of the employee in question, whether the employee is successful in performing the duties of the job. This criterion may be regarded as the sum of the expected utilities of a' series of criterion elements, which can be determined by established scaling techniques. A similar situation pertains to the weighting of tests to be used to select employees for these positions. In this case, the criterion is predicted by summing a series of weighted test scores, the weights being multiple regression (beta) weights. It has been shown by 70 Cronbach and Gleser that these weights are proportional in certain eases to the utilities of tests used for selection decisions. 
The basic hypothesis of this study was that the subjective utility function will provide a good estimate of the test weights that maximize predictive validity and may be used to provide estimates of test weights in situations where a predictive validity study is economically impractical. This hypothesis was tested by comparing the subjective utility function over a set of five criterion elements to the beta weights of measures for these elements, as determined by a multiple regression study in a personnel selection setting. The subjects were candidates for promotion to the position of Senior Civil Engineering Assistant with a government agency. Two criterion element measures were used: supervisory ratings of proficiency in the elements and achievement tests used by the personnel department of the agency as the written portion of the examination. The first set of criterion element measures served as a check of the adequacy of the model in idealized circumstances, while the second set was used to determine the usefulness of the model in a practical personnel selection problem. The subjective utility function was determined by a questionnaire based on Siegel-type measurement of subjective utility and by Thurstone pair-comparison methods. The results indicated that a subjective utility function was determinable over criterion elements and that this function was quite close to the beta weights when ratings were used as criterion element measures, but that there was no relationship between the utility function and the beta weights when the test scores were used. The latter finding was apparently due to the fact that the tests used in the study lacked construct validity. It was concluded that a subjec tive utility function does exist over a finite series of criterion elements in a personnel selection problem and can be measured by scaling methods; that this utility function is a good approximation of the beta weights in an .idealized setting, but that the practical application of this method of weighting tests requires the use of tests with high construct validity. REFERENCES REFERENCES Abelson, R. P. The choice of choice theories. In S- Messick & A. Brayfield (Eds.), Decision and choice. New York: McGraw-Hill, 196^. American Psychological Association. Standards for educational and psychological tests and manuals. Washington, D- C.: APA, 1966. Balma, M. J. Take another look at personnel testing: it can cut your costs. Personnel Journal, 1955? 55? A10-^13- Balma, M. J. The concept of synthetic validity. Personnel Psychology, 1959? 12, 595-596. Bechtoldt, H. P. Construct validity: a critique. American Psycholo gist, 1959? lA? 619-629. Bentham, J. The principles of morals and legislation. London, 1789. Bernoulli, D. Specimen theoriae novae de mensura sortis . Contentarii academiae seientiarum imperiales petropolitanae, 1758, 5? 175-192• (Translated by L. Sommer, in Econometrica, 195^-? 22, 23-26.) Birkhoff, G- Lattice theory. (Rev. ed.) New York: American Mathematical Society, Colloquium Publications, 19A8, XXV. Boyce, J. E. Comparison of methods of combining scores to predict academic success in cooperative engineering program. Unpublished doctoral dissertation, Purdue University, 1955* Brogden, H. E. On the interpretation of the correlation coefficient as a measure of predictive efficiency. Journal of Educational Psychology, I9A6, 37, 65-76. Brogden, H- E. When testing pays off. Personnel Psycholo n r L T j 19^9, 2, 171-185. (a) Brogden, H. E. 
A new coefficient; application to biserial correlation and to estimation of selective efficiency. Psychometrika, 19A9, lA, 159-182. (b) Brogden, H. E., & Taylor, E. K. The dollar criterion: applying the cost accounting concept to criterion construction. Personnel Psychology, 1950? 3, 133-15A • 75 7k Campbell, D. T., & Fiske, D- W* Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin., 1959, 56, 81-105. Chernoff, H-, & Moses, L. E. Elementary decision theory. New York: Wiley, 1959- Cochran, W. J. Improvement by means of selection. In J. Neyman (Ed.), Second Berkeley Symposium on mathematical statistics and proba bility ■ Berkeley: University of California, 1951* Pp. 419-170. Coombs, C- H. Psychological scaling without a unit of measurement. Psychological Review, 1950, 57, lk5-lk8. Cronbach, L. J. New light on test strategy from decision theory. In A. Anastasi (Ed.), Testing problems in perspective. Washington, D. C.: American Council on Education, 1966. Pp. 55-58. Cronbach, L- J., & Gleser, G- C- Psychological tests and personnel decisions. Urbana, 111.; University of Illinois Press, 1957- (2nd ed., 1965) Cronbach, L- J., & Meehl, P. E. Construct validity in psychological tests. Psychological Bulletin, 1955, 52, 281-502. Cronbach, L. J., Rajaratam, N., & Gleser, G. C. Theory of generaliz- ability: a liberalization of reliability theory. British Journal of Statistical Psychology, 1963, 16, 137-163■ Curtis, E. W. The application of decision theory and scaling methods to selection test evaluation. Dissertation Abstracts, 1966, 26(8), k79k. Dunnette, M- D. A note on the criterion. Journal of Applied Psychology, 1963, k7, 251-25k- Edwards, W. Subjective probabilities inferred from decisions. Psychological Review, 1962, 69(2), 109-135* Fagot, R. F- A model for ordered metric scaling by comparison of intervals. Psychometrika, 1959, 2k, 157-168. Ghiselli, E- E- Dimensional problems of criteria- Journal of Applied Psychology, 1956, k0, 1-k. Guilford, J. p. Is personnel testing worth the money? General Management Service, 1955, No. 1?6, 52-6k. Guion, R. M. Criterion measurement and personnel judgements. Personnel Psychology, 1961, lk, lkl-lk9- 75 Guion, R. M. Personnel testing. New York: McGraw-Hill, 1965* Gulliksen, H. Theory of mental tests■ New York: Wiley, 1950- Hausner, M- Multidimensional utilities. In R. M- Thrall, C- H- Coombs, & R. L* Davis (Eds.), Decision processes ■ New York: Wiley, 1957. Hull, C- L. Aptitude testing. Yonkers: World Book, 1928. Lawshe, C- H- Employee selection. Personnel Psychology, 1952, _5, 31-37• Lawshe, C. H-, & Balma, M. J. Principles of personnel testing. (2nd ed.) New York: McGraw-Hill, 1966. Lawshe, C- H-, & Schucker, R. E. The relative efficiency of four test weighting methods in multiple prediction. Educational and Psychological Measurement, 1959, 17, 103-1177 Loevinger, J. Objective tests as instruments of psychological theory. Psychological Reports, 1957, _3, 635-697- Luce, R. D. Individual choice behavior: A theoretical analysis- New York: Wiley, 1959- Luce, R. D., & Raiffa, H- Games and decisions. New York: Wiley, 1957- Luce, R. D., & Suppes, P. Preference, utility, and subjective proba bility. In R. D. Luce, R. R. Bush, & E. Galanter (Eds.), Handbook of mathematical psychology, Vol. III. New York: Wiley, 1965. Pp. 279-710. Mahoney, T. A., & England, G- W- Efficiency and accuracy of employee selection decision rules- Personnel Psychology, 1965, 18, 360-378. Mosteller, F., & Nogee, P. 
Mosteller, F., & Nogee, P. An experimental measurement of utility. Journal of Political Economy, 1951, 59, 371-404.
Nagle, B. F. Criterion development. Personnel Psychology, 1953, 6, 271-289.
Pareto, V. Manuel d'economie politique. Paris, 1907.
Primoff, E. S. Basic formulae for the J-coefficient to select tests by job analysis requirements. Washington, D. C.: U. S. Civil Service Commission, 1955. (a)
Primoff, E. S. Test selection by job analysis. (2nd ed.) Washington, D. C.: U. S. Civil Service Commission, 1955. (b)
Primoff, E. S. Empirical validations of the J-coefficient. Personnel Psychology, 1959, 12, 413-418.
Raiffa, H., & Schlaifer, R. Applied statistical decision theory. Cambridge, Mass.: Harvard University, 1961.
Ramsey, F. P. Truth and probability. In The foundations of mathematics and other logical essays. London: Kegan Paul, Trench, Trubner, 1931.
Richardson, M. W. The interpretation of a test validity coefficient in terms of increased efficiency of a selected group of personnel. Psychometrika, 1944, 9, 245-248.
Sanders, D. L. The intensity of newspaper content preferences. Dissertation Abstracts, 1961, 22(5), 1607.
Savage, L. J. The theory of statistical decision. Journal of the American Statistical Association, 1951, 46, 55-67.
Savage, L. J. The foundations of statistics. New York: Wiley, 1954.
Seashore, S. E., Indik, B. P., & Georgopoulos, B. S. Relationships among criteria of job performance. Journal of Applied Psychology, 1960, 44, 195-202.
Siegel, S. A method for obtaining an ordered metric scale. Psychometrika, 1956, 21, 207-216.
Taylor, H. C., & Russell, J. T. The relationship of validity coefficients to the practical effectiveness of tests in selection: discussion and tables. Journal of Applied Psychology, 1939, 23, 565-578.
Thorndike, R. L. Personnel selection: Test and measurement techniques. New York: Wiley, 1949.
Thrall, R. M. Applications of multidimensional utility theory. In R. M. Thrall, C. H. Coombs, & R. L. Davis (Eds.), Decision processes. New York: Wiley, 1954.
Toops, H. A. The criterion. Educational and Psychological Measurement, 1944, 4, 271-297.
Torgerson, W. S. Theory and methods of scaling. New York: Wiley, 1958.
Trattner, M. H. Comparison of three methods for assembling aptitude test batteries. Personnel Psychology, 1963, 16, 221-232.
Underwood, B. J. Psychological research. New York: Appleton-Century-Crofts, 1957.
Von Neumann, J., & Morgenstern, O. Theory of games and economic behavior. Princeton, N. J.: Princeton University Press, 1944. (2nd ed., 1947)
Wald, A. Statistical decision functions. New York: Wiley, 1950.

TABLES

TABLE 1
Siegel and Thurstone Utility Scales

Test                              Siegel   Thurstone   Conv. Thurstone
Surveying and Mapping               .66       2.16          .62
Hydraulics                          .60       2.02          .59
Mechanics and Materials             .12       0.00          .11
Soil Mechanics and Foundations      .28      -2.02          .29
Cement and Concrete                 .21      -2.16          .26
TABLE 2
Proportion of Choices of Criterion Element Over Other Elements (Thurstone Model)

                                        Preferred Element
Element                            SM     H     MM    SF    CC
Surveying and Mapping              --    .42   .28   .20   .20
Hydraulics                        .58     --   .50   .24   .16
Mechanics and Materials           .72    .70    --   .50   .28
Soil Mechanics and Foundations    .80    .76   .70    --   .48
Cement and Concrete               .80    .84   .72   .52    --

TABLE 3
Descriptive Statistics of the Criterion Element Ratings

                                        Intercorrelations                    Prediction of Criterion
Rating                             MM    SM    CC    SF    H      M      σ       r        β
Mechanics and Materials            --   .18   .81   .84   .69   3.03   0.79     .27      2.11
Surveying and Mapping             .18    --   .13   .18   .27   3.77   0.69     .52      3.93
Cement and Concrete               .81   .13    --   .89   .61   2.91   0.74     .17     -0.42
Soil Mechanics and Foundations    .84   .18   .89    --   .66   2.77   0.77     .19     -1.38
Hydraulics                        .69   .27   .61   .66    --   3.34   0.91     .31      0.89

Multiple R = .603

TABLE 4
Descriptive Statistics of the Tests

                                        Test Intercorrelations                    Prediction of Criterion
Test                               MM    SM    CC    SF    H       M       σ       r       r        β
Mechanics and Materials            --   .22   .53   .50   .66   50.05   10.21    .73     .40      .19
Surveying and Mapping             .22    --   .34   .34   .28   49.95   10.00    .38     .02     -.07
Cement and Concrete               .53   .34    --   .65   .70   50.64   10.05    .53     .05     -.33
Soil Mechanics and Foundations    .50   .34   .65    --   .70   50.00   10.11    .45     .36      .20
Hydraulics                        .66   .28   .70   .70    --   50.15   10.19    .61     .38      .20

Multiple R = .573

TABLE 5
Comparison of Converted Regression Weights and Converted Correlations to the Subjective Utility Scale

                                            Ratings           Tests
Test                                U       r      β        r      β
Mechanics and Materials            .19     .19    .24      .27    .25
Surveying and Mapping              .30     .33    .32      .10    .16
Cement and Concrete                .11     .13    .14      .12    .07
Soil Mechanics and Foundations     .13     .14    .10      .25    .26
Hydraulics                         .27     .21    .20      .26    .26

Correlation with utility                   .91    .83     -.10    .26

TABLE 6
Relationship of Test Scores to Criterion Element Ratings

                                                 Ratings
Test                               MM     SM     CC     SF     H
Mechanics and Materials           .53    .31    .50    .54    .42
Surveying and Mapping             .17    .15    .10    .19    .16
Cement and Concrete               .44    .12    .50    .33    .27
Soil Mechanics and Foundations    .53    .13    .44    .36    .35
Hydraulics                        .62    .20    .58    .58    .60

FIGURES

Fig. 1. Paradigm of the decision-theoretic approach.
Fig. 2. The consequence space.
Fig. 3. A hypothetical payoff matrix.
Fig. 4. Schematic view of a Cronbach-Gleser decision process (1965, p. 18).
Fig. 5. Validity matrix for treatment t.
Fig. 6. Lattice of hypothetical options in the Siegel higher ordered metric method.
Fig. 7. Plot of test beta weights and subjective utilities.
Fig. 8. Plot of correlations of tests with criterion and subjective utilities.
Fig. 9. Plot of beta weights of ratings and subjective utilities.
Fig. 10. Plot of correlations of ratings with criterion and subjective utilities.
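The comparison summarized in Table 5 and plotted in Figures 7-10 amounts to rescaling the regression weights and validities onto the same metric as the utilities and then correlating the two sets of values across the five elements. The sketch below illustrates one way to carry out such a conversion: the utility values follow Table 5, but the raw weights shown, the zero floor placed on negative weights, and the unit-sum normalization are illustrative assumptions rather than the procedure actually used in the study.

```python
# Illustrative sketch of a Table 5-style comparison; raw weights are hypothetical.
import numpy as np

elements = ["MM", "SM", "CC", "SF", "H"]
utility = np.array([0.19, 0.30, 0.11, 0.13, 0.27])   # subjective utility scale (Table 5)

def convert(weights):
    """Rescale weights to be nonnegative and sum to one (assumed conversion)."""
    w = np.clip(weights, 0.0, None)   # assumption: negative weights floored at zero
    return w / w.sum()

# Hypothetical raw regression weights and validities for the rating measures.
beta_ratings = np.array([0.21, 0.39, 0.05, 0.08, 0.27])
r_ratings = np.array([0.27, 0.52, 0.17, 0.19, 0.31])

for label, raw in [("beta", beta_ratings), ("r", r_ratings)]:
    conv = convert(raw)
    rho = np.corrcoef(conv, utility)[0, 1]
    print(label, np.round(conv, 2), f"correlation with utility = {rho:.2f}")
```

The same routine applied to test-based weights would show the low correlations reported for the achievement tests.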
APPENDICES

APPENDIX I. SAMPLE QUESTIONNAIRE

COUNTY OF LOS ANGELES
CIVIL SERVICE COMMISSION
RECRUITMENT DIVISION
Test Research and Development Section

EMPLOYEE PREFERENCE QUESTIONNAIRE
April 20, 1966

In order to gather information that may be useful in the construction of achievement tests for the class of Senior Civil Engineering Assistant, the Test Research and Development Section of the Civil Service Commission is conducting an experimental program to determine the factors that the supervisors of employees working in this class feel are important for success on the job. To this end, we are requesting your assistance in completing and returning this questionnaire. Your responses will be anonymous and will be used for research purposes only. We wish to emphasize that the results of this study will not be used to establish testing patterns for future examinations until the method has been validated.

For the purposes of this questionnaire, assume that you have a choice of selecting one of two individuals for a vacant position of Senior Civil Engineering Assistant in your unit. These two individuals are both equally and fully qualified in all respects except for a knowledge of the two areas in question. For each question, assume that Candidate A has a working knowledge at a journeyman's level of the area listed after the letter A, while Candidate B has no knowledge whatsoever of this area; similarly, assume that Candidate B has a working knowledge at a journeyman's level of the area listed after the letter B, while Candidate A has no knowledge whatsoever of this area. From this information, determine which of the two candidates you would prefer to hire as your employee and circle the letter which is the same as that of the candidate of your choice. Remember, we are interested in your personal preference only.

EXAMPLE:   A Civil Service Rules     B Engineering Mathematics

In this case, assume that Candidate A has a working knowledge of the Civil Service Rules and no knowledge whatsoever of Engineering Mathematics, while Candidate B has a working knowledge of Engineering Mathematics and no knowledge whatsoever of the Civil Service Rules. Remember, both candidates are fully and equally qualified in all other respects. If, on the basis of this information, you would prefer Candidate B to Candidate A as your employee, circle the letter B.

Please go through the questionnaire once only; do not change your first response unless you are absolutely sure you made a clerical error the first time. Answer every question; incomplete questionnaires cannot be processed and must be discarded.

1.  A Hydraulics                        B Soil Mechanics and Foundations
2.  A Surveying and Mapping             B Mechanics and Materials
3.  A Hydraulics                        B Cement and Concrete
4.  A Mechanics and Materials           B Soil Mechanics and Foundations
5.  A Surveying and Mapping             B Hydraulics
6.  A Cement and Concrete               B Soil Mechanics and Foundations
7.  A Mechanics and Materials           B Hydraulics
8.  A Cement and Concrete               B Surveying and Mapping
9.  A Cement and Concrete               B Mechanics and Materials
10. A Surveying and Mapping             B Soil Mechanics and Foundations

For the next group of questions, assume that there are three candidates involved.
If you choose Candidate A, you are assured of his appointment and you know that he has a working knowledge at a journeyman's level of the area listed after the letter A and no knowledge whatsoever of either of the two areas listed after the letter B. The letter "B", however, represents a panel of two candidates. If you choose this panel you must take a chance of appointing one of the two candidates, one of which has a working knowledge of the first area listed after the letter B and the other of which has a working knowledge of the second area listed after the letter B. Neither of these candidates has a working knowledge of both areas listed after the letter B nor of the area listed after the letter A. You know, however, that if you choose panel B you have a 50-50 chance of appointing each of these candidates; in other words, the situation is as if the appointment were based on the toss of a fair coin.

EXAMPLE:   A Engineering Mathematics     B Civil Service Rules or Hydraulics

In this case, Candidate A has a working knowledge of Engineering Mathematics and no knowledge of Civil Service Rules or of Hydraulics, while Panel B involves a 50-50 chance of appointing a person who has a working knowledge of Civil Service Rules and no knowledge of either Engineering Mathematics or of Hydraulics, or of appointing a person who has a working knowledge of Hydraulics and no knowledge of either Engineering Mathematics or Civil Service Rules. As above, circle the letter that corresponds to your choice.

11. A Surveying and Mapping              B Soil Mechanics and Foundations OR Hydraulics
12. A Soil Mechanics and Foundations     B Hydraulics OR Cement and Concrete
13. A Hydraulics                         B Surveying and Mapping OR Soil Mechanics and Foundations
14. A Cement and Concrete                B Mechanics and Materials OR Surveying and Mapping
15. A Cement and Concrete                B Soil Mechanics and Foundations OR Mechanics and Materials
16. A Surveying and Mapping              B Mechanics and Materials OR Cement and Concrete
17. A Hydraulics                         B Cement and Concrete OR Soil Mechanics and Foundations
18. A Cement and Concrete                B Hydraulics OR Soil Mechanics and Foundations
19. A Cement and Concrete                B Surveying and Mapping OR Hydraulics
20. A Surveying and Mapping              B Mechanics and Materials OR Soil Mechanics and Foundations
21. A Mechanics and Materials            B Hydraulics OR Cement and Concrete
22. A Mechanics and Materials            B Soil Mechanics and Foundations OR Surveying and Mapping
23. A Hydraulics                         B Mechanics and Materials OR Soil Mechanics and Foundations
24. A Soil Mechanics and Foundations     B Mechanics and Materials OR Surveying and Mapping
25. A Surveying and Mapping              B Soil Mechanics and Foundations OR Cement and Concrete
26. A Mechanics and Materials            B Surveying and Mapping OR Cement and Concrete
27. A Mechanics and Materials            B Soil Mechanics and Foundations OR Hydraulics
28. A Soil Mechanics and Foundations     B Hydraulics OR Surveying and Mapping
29. A Hydraulics                         B Mechanics and Materials OR Surveying and Mapping
30. A Hydraulics                         B Mechanics and Materials OR Cement and Concrete
31. A Cement and Concrete                B Soil Mechanics and Foundations OR Surveying and Mapping
32. A Soil Mechanics and Foundations     B Surveying and Mapping OR Cement and Concrete
33. A Surveying and Mapping              B Cement and Concrete OR Hydraulics
34. A Cement and Concrete                B Hydraulics OR Mechanics and Materials
35. A Mechanics and Materials            B Hydraulics OR Surveying and Mapping
36. A Hydraulics                         B Surveying and Mapping OR Cement and Concrete
37. A Mechanics and Materials            B Cement and Concrete OR Soil Mechanics and Foundations
38. A Surveying and Mapping              B Hydraulics OR Mechanics and Materials
39. A Soil Mechanics and Foundations     B Mechanics and Materials OR Hydraulics
40. A Soil Mechanics and Foundations     B Mechanics and Materials OR Cement and Concrete

The last group of questions is very similar to the ones immediately preceding, except that both letters A and B represent panels which involve a 50-50 chance of getting one of two appointees. If you choose Panel A, you have a 50-50 chance of getting an appointee who has a working knowledge of one of the two areas listed after letter A but no knowledge of either of the areas listed after letter B; further, you know that he will not have a working knowledge of both of the areas listed after letter A. The same situation holds for Panel B.

EXAMPLE:   A Civil Service Rules OR Engineering Mathematics     B Hydraulics OR Surveying

In this case, a choice of Panel A involves taking a 50-50 chance of appointing a person who has a working knowledge of Civil Service Rules and no knowledge of Engineering Mathematics, or of appointing a person who has a working knowledge of Engineering Mathematics and no knowledge of Civil Service Rules; in either case, the appointee will have no knowledge of either Hydraulics or Surveying. A choice of Panel B involves taking a 50-50 chance of appointing a person who has a working knowledge of Hydraulics and no knowledge of Surveying, or of appointing a person who has a working knowledge of Surveying and no knowledge of Hydraulics. Again, in either case the appointee will have no knowledge of either Civil Service Rules or Engineering Mathematics. Circle the letter that corresponds to the risk that you would prefer of the two.

41. A Mechanics and Materials OR Soil Mechanics and Foundations       B Surveying and Mapping OR Hydraulics
42. A Mechanics and Materials OR Surveying and Mapping                B Cement and Concrete OR Soil Mechanics and Foundations
43. A Cement and Concrete OR Soil Mechanics and Foundations           B Surveying and Mapping OR Hydraulics
44. A Soil Mechanics and Foundations OR Surveying and Mapping         B Mechanics and Materials OR Hydraulics
45. A Hydraulics OR Soil Mechanics and Foundations                    B Surveying and Mapping OR Cement and Concrete
46. A Mechanics and Materials OR Cement and Concrete                  B Surveying and Mapping OR Hydraulics
47. A Surveying and Mapping OR Soil Mechanics and Foundations         B Cement and Concrete OR Hydraulics
48. A Mechanics and Materials OR Soil Mechanics and Foundations       B Cement and Concrete OR Hydraulics
49. A Hydraulics OR Soil Mechanics and Foundations                    B Surveying and Mapping OR Mechanics and Materials
50. A Mechanics and Materials OR Cement and Concrete                  B Soil Mechanics and Foundations OR Surveying and Mapping
51. A Hydraulics OR Mechanics and Materials                           B Cement and Concrete OR Surveying and Mapping
52. A Cement and Concrete OR Surveying and Mapping                    B Mechanics and Materials OR Soil Mechanics and Foundations
53. A Cement and Concrete OR Soil Mechanics and Foundations           B Hydraulics OR Mechanics and Materials
54. A Soil Mechanics and Foundations OR Hydraulics                    B Cement and Concrete OR Mechanics and Materials

Please return this questionnaire to Don Schwartz, Room 548, Hall of Administration, 222 N. Grand Avenue, Los Angeles. If there are any questions, call Don Schwartz at 625-3611, ext. 64254. The Civil Service Commission wishes to thank you for your cooperation in this project.
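Before turning to the scored example in Appendix II, it may help to see how responses to the second group of items can be translated into the ordered-metric distance relations used in the Siegel scaling. The sketch below is a minimal illustration under an expected-utility reading of the 50-50 gambles; the respondent order, the element codes, and the two responses shown are hypothetical and are not taken from the appendix that follows.

```python
# Minimal sketch of scoring the "sure element vs. 50-50 gamble" items in the
# Siegel higher-ordered-metric fashion.  Order and responses are hypothetical.

# Simple order obtained from the ten paired-comparison items (best first).
order = ["H", "MM", "SM", "CC", "SF"]
rank = {e: i for i, e in enumerate(order)}   # smaller rank index = more preferred

def score_gamble_item(sure, gamble, chose_sure):
    """Return the distance relation implied by one 'A vs. (B or C; 50-50)' item.

    Choosing the sure element A over the gamble implies u(A) > (u(B) + u(C)) / 2,
    i.e. u(A) - u(worse) > u(better) - u(A); rejecting it implies the reverse.
    """
    better, worse = sorted(gamble, key=lambda e: rank[e])
    if chose_sure:
        return (sure, worse), ">", (better, sure)
    return (better, sure), ">", (sure, worse)

# Hypothetical responses: (sure element, the two gamble elements, chose sure?).
responses = [("MM", ("SM", "H"), True), ("SM", ("CC", "MM"), False)]
for item in responses:
    left, rel, right = score_gamble_item(*item)
    print(f"u({left[0]}) - u({left[1]}) {rel} u({right[0]}) - u({right[1]})")
```

Accumulating such inequalities over the forty gamble items, and checking them for transitivity, is what yields the ordered metric scale of distances between adjacent elements described in the scoring key.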
APPENDIX II. SAMPLE SCORED QUESTIONNAIRE

KEY TO SCORING SYMBOLS

A > B        A is preferred to B.
A,B > C      A 50-50 chance of either A or B is preferred to C.
A-B > C-D    The distance between A and B is greater than the distance between C and D. This relationship was used in determining the ordered metric scale of distances between adjacent elements.

Two further marks were used in scoring: one indicating that a response, although not used in determining the scale, was transitive with the relationships used to determine the scale, and one indicating that a response was intransitive with those relationships.

The remainder of the appendix reproduces the Appendix I questionnaire as completed by one respondent, with the chosen alternative circled for each item and the preference and distance relations implied by each response entered by hand beside it; these handwritten entries do not reproduce legibly in this transcript.

APPENDIX III. RATING SCALE

COUNTY OF LOS ANGELES
CIVIL SERVICE COMMISSION
RECRUITMENT DIVISION
Test Research and Development Section

CANDIDATE RATING SCALE
September 29, 1966

The Test Research and Development Section of the Civil Service Commission is conducting an experimental program to determine the effectiveness of the written tests in our selection program. To this end, we are requesting your assistance in completing one of these rating scales for every employee under your supervision who was a candidate in the recent examination for Senior Civil Engineering Assistant. Your responses will be used for research purposes only and will be kept strictly confidential.

NAME OF CANDIDATE

Please rate the candidate's knowledge in each of the areas and his overall knowledge as O (Outstanding), G (Good), A (Acceptable), L (Limited), or U (Unsatisfactory). A designation of "O" would indicate that his knowledge of the given element would be in the top 5% of the candidates for Senior Civil Engineering Assistant. A "G" would indicate the top 25%, an "A" the top 75%, an "L" the bottom 25%, and a "U" the bottom 5%, as on the scale below.

[Graphic rating scale marked with the O, G, A, L, and U ranges.]

1. Knowledge of Surveying and Mapping
2. Knowledge of Soil Mechanics and Foundations
3. Knowledge of Cement and Concrete
4. Knowledge of Mechanics and Materials
5. Knowledge of Hydraulics
6. Overall Knowledge Pertaining to the Work of a Senior Civil Engineering Assistant

Please return this form to Don Schwartz, Room 548, Hall of Administration.
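Because the rating categories are defined by percentile bands, one natural way to quantify them for analysis is to assign each category the normal deviate of the midpoint of its band. The sketch below illustrates that conversion; the band boundaries are an interpretation of the instructions above, and the method is an assumption for illustration rather than the scoring actually applied to the ratings in this study.

```python
# A minimal sketch: convert O/G/A/L/U ratings to numeric scores by assigning each
# category the normal deviate of the midpoint of its assumed percentile band.
from statistics import NormalDist

bands = {            # percentile band implied by the rating-scale instructions
    "U": (0, 5),     # bottom 5%
    "L": (5, 25),    # remainder of the bottom 25%
    "A": (25, 75),   # middle 50%
    "G": (75, 95),   # remainder of the top 25%
    "O": (95, 100),  # top 5%
}

# Midpoint of each band, expressed as a proportion, mapped to a normal deviate.
score = {cat: NormalDist().inv_cdf((lo + hi) / 200) for cat, (lo, hi) in bands.items()}

ratings = ["G", "A", "O", "L", "A"]   # hypothetical ratings for one candidate
numeric = [round(score[r], 2) for r in ratings]
print(numeric)   # e.g. [1.04, 0.0, 1.96, -1.04, 0.0]
```

Any monotone assignment of the five categories would serve the same purpose; the normal-deviate version simply keeps the resulting criterion scores on an interval-like scale.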