AN INVESTIGATION OF THE EFFECTS OF OUTPUT VARIABILITY AND OUTPUT BANDWIDTH ON USER PERFORMANCE IN AN INTERACTIVE COMPUTER SYSTEM

by Lawrence Henry Miller

A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, in Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (Computer Science)

October, 1976

UMI Number: DP22719. All rights reserved. INFORMATION TO ALL USERS: The quality of this reproduction is dependent upon the quality of the copy submitted. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion. UMI DP22719. Published by ProQuest LLC (2014). Copyright in the Dissertation held by the Author. Microform Edition © ProQuest LLC. All rights reserved. This work is protected against unauthorized copying under Title 17, United States Code. ProQuest LLC, 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106-1346.

UNIVERSITY OF SOUTHERN CALIFORNIA, THE GRADUATE SCHOOL, UNIVERSITY PARK, LOS ANGELES, CALIFORNIA 90007. This dissertation, written by Lawrence Henry Miller under the direction of his Dissertation Committee, and approved by all its members, has been presented to and accepted by The Graduate School, in partial fulfillment of requirements of the degree of DOCTOR OF PHILOSOPHY.

ACKNOWLEDGMENTS

Professor Donald Oestreicher deserves rich credit for planting the germ of the ideas from which this dissertation grew. As the advisor, his unusual insight into the problem areas provided a constant source of inspiration. He was particularly helpful in the refinement of the final experimental design during and after the initial pilot study. Making an important contribution to the overall success of the work was Professor Madigan of the USC Psychology Department. His willingness to work with people steeped in the jargon of Computer Science, and his suggestions of new vectors of inquiry and study, helped ensure the soundness of the experimental and statistical designs. I was privileged to have been able to associate with him over the years that this project was in the making. Professor Carlson of the Computer Science department similarly served an invaluable position of both encouragement and challenge during the initial preparations and final presentation of the results. The many good and enduring people at ISI who served as subjects must receive special thanks for their time and their interest in allowing these experiments to go forward. Finally, a deep debt of gratitude for the diversions provided by my good friend and financial consultant, Anton Vigorish. Col. Vigorish had an unerring ability to take my mind off of the seemingly endless concerns of producing a dissertation. Rita deserves all of the thanks that I can give for the encouragement and fortitude to stick with me over the years of this effort. I save for last my deepest gratitude for Dr. John Franklin Heafner of North Carolina. It is certain that without his prodding, his pioneering efforts in man-machine studies, his absolute encouragement and his incredible insights into the ways of the world, this dissertation would not have been accomplished.

TABLE OF CONTENTS

I. INTRODUCTION
II. RELATED TOPICS AND AREAS OF STUDY
III. THE EXPERIMENTAL APPROACH: The Model; Confidence Intervals; The Parameters
IV. METHODOLOGY: The System; The Subjects; Pilot Study; Experimental Setting; Data Analysis; Validity
V. RESULTS: Conclusions; Task Results; Discussion and Analysis; Post-Test Questionnaire; The Questions; Interaction Effects; Discussion and Analysis
VI. CONCLUSIONS: Future Studies and Extensions of Research
BIBLIOGRAPHY
APPENDICES: 1. Instructions to Subjects; 2. Tasks; 3. Post-Test Questionnaire; 4. Sample Messages

I. INTRODUCTION

The general area of this research is man-machine interaction, specifically analysis of system output. Broadly stated, this study shows that the variability of system output bandwidth significantly affects users' performance across a wide range of interactive tasks. In the field of computer science, it is now possible to divert attention away from fundamental theoretical issues towards refinements in systems and applications design for the greater satisfaction of end users. To this end, the previous ad hoc methods of refining systems to users' needs -- based on the intuition of the designer or programmer -- ought to give way to the more rigorous and reliable techniques of controlled observation, experimentation, and development. This research demonstrates that certain parameters of the man-machine interaction environment can be manipulated as a means of improving user performance. This research uses an interactive message processing system now in use at the University of Southern California's Information Sciences Institute and other locations. This program has been modified to provide a useful means of examining the relationship between the performance of the user and the variables influencing that performance. Later sections discuss in greater detail the parameters of the interaction, which are shown to influence performance. Appropriate measurements are developed for evaluating performance in this kind of interactive task. This work is an attempt to develop a model of man-machine interaction. Clearly a number of variables will affect performance in interactive tasks: for example, intellectual and cognitive differences in users, computer speed and power differences, command input differences, and data display differences. A complete model of the interaction would take all of these parameters into account, and use them to predict both user and system performance. Because this model was developed in order to predict user performance as a function of changes in the output parameters, it fixes the values of the other parameters to determine how changes in output affect user performance. Statistical and experimental design [Winer, 1973] provide a framework for testing the validity of the model. The parameters hypothesized to influence performance are classified as independent variables and the performance measures as dependent variables. The objective is then to demonstrate that changes in the former produce changes in the latter. Statistical design theory suggests ways in which the problem can be structured such that the model may be tested and a probability statement made concerning the likelihood that changes in the independent variables will actually produce changes in the dependent variables. This research has involved two distinct phases. The first selected reasonable variables, which were hypothesized to influence user performance, and useful performance measures.
The second tested for significance, i.e., the relationship between the independent variables and the performance measures. Ultimately, the rationale for performing research on the effects of changes in the system parameters upon user performance is to make possible the development of interactive computer systems that are more compatible with the needs and the limitations or abilities of the potential users of that system, as well as to provide a framework for testing future systems. In particular, Chapter VI points out that controlled observation of people using an interactive system, under conditions which stress the functions available in the system, provides a means of identifying those aspects amenable to design improvement. A number of broad-ranging questions occur as one studies the ways in which people interact with computer systems: suitable input language, keyboard and terminal designs, format and intensity of displays, amount of material, content of responses, speed and variability of display rates, etc. Additionally, there are questions concerning the way in which different individuals perform in the man-machine interaction: I.Q. differences, motivational and cognitive complexity factors, previous experience, etc. A complete theory of man-machine interaction (MMI) would take all of these factors into consideration in attempting to predict user performance for a given man-machine system (MMS). The theory would also include parameters relating directly to the individual system, and perhaps as well factors relating to the supporting computer system -- CPU speed, memory capacity, etc. Any complete theory or model of user performance would, of course, have to predict performance measures that are useful; a preliminary step in developing a model or theory of MMI is the selection of useful or reasonable performance measures. It is conceptually reasonable to break the set of MMI parameters into those which represent the man (user) and those which represent the machine. The machine parameters, in turn, may be divided into those which represent the particular interactive program, those which represent the interactive environment (the terminal, display and input language form), and those which represent the background processor. By fixing all of the parameters except the display ones, this research explores the effects of changes in the display upon user performance in a given (though not untypical) MMS. In fixing the user population, the MMS, the input language form, and the background computer system, we still tacitly assume that the levels at which these have been fixed are representative of a broad class of interactive systems and users. It is this assumption of the generalizability of the research which makes the results of potential interest outside the immediate system and subject sample. Further elaboration on this concept of external validity is made in Chapters IV and VI. Specific hypotheses are tested in this research. By limiting the parameters, the basic hypothesis is that, for both the given user population from which the sample was taken and the particular interactive system employed, there are significant performance differences between groups of subjects receiving different levels of the display variables. A more detailed elaboration of the techniques, both the physical experimental environment and the statistical and design techniques, is presented in Chapters III and IV. The experimental sessions involve three phases.
First, subjects are given an introduction to the system. Next, they are given a series of tasks to be accomplished through the use of the system, i.e., questions to be answered by selecting and examining messages from a data base. Since the data base is rather large, the search and boolean operations available in the system must be used in order to reduce the number of messages that must be examined. Upon finishing all of the tasks, the subject completes a questionnaire on a number of features of the system. The statistical model for the analysis of the subject's responses is contained in Chapter III. Discussions of the controlled testing of man-computer interface issues are sparse in the computer science literature. Chapter II of this report reviews some of the pioneering work from which the ideas presented in this work germinated. Since experimental design and statistical evaluation of experimental results are not within the mainstream of computer science, Chapter III briefly develops the statistical techniques necessary for the experimental design, as well as the analysis of variance and multivariate techniques used to analyze the task and questionnaire data. Chapter IV describes the experimental design used to test the hypotheses, describes the interactive system and the experimental environment (the physical setting), and presents a sample subject scenario. Finally, Chapter IV addresses the issues of internal and external validity (generalizability) of the results. The results of the experimental session are presented and discussed in Chapter V, including summary analysis of variance tables, graphical presentation of answers to the post-test questionnaire, and correlational findings. Chapter VI extends the discussion begun in Chapter V, drawing conclusions and making inferences about the utility of the types of experimentation conducted in this work. The shortcomings of the controlled experimentation and observation technique are specifically mentioned, and suggestions are made for further research into the relationships between the parameters of the MMI and the performance measures. Finally, the desirability of a theory of MMI is suggested, particularly the benefits to system designers of intensive observation of potential users to better understand those features of a given interactive system that best satisfy users' needs.

II. RELATED TOPICS AND AREAS OF STUDY

The actual number of controlled studies -- either of specific systems and their user populations, or broader theoretical studies of MMI -- is extremely limited, although a number of authors express the opinion that these studies are needed. Willmorth [1972] states, "Designing an information system for human use implies task analyses to determine the human actions to be performed, the decisions to be made, and the information required to be displayed to the human and expected from him, followed by the optimal design of the man/system interface. . . Time and effort must be devoted to designing a well-human-engineered system." Willmorth goes on to note that there is virtually no verified human engineering data for software and suggests an experimental methodology for examining the relationships between various versions of on-line planning systems and a set of (unnamed) performance measures or characteristics. The paper serves more as a call for ideas or research than as a detailed statement of valid or useful performance measures.
Bennett [1972] concludes that "After a careful search of the major human factors and applied psychology journals. . . there is remarkably little evidence of research undertaken for the express purpose either of increasing our understanding of man-computer interaction or of providing information that will be useful in the development of systems that are optimally suited to users' needs." He identifies three areas that would benefit from human-engineering expertise: (1) conversational languages, (2) the effects of computer system characteristics on user behavior, and (3) the problem of describing, or modeling, man-computer interaction. Bennett's main concern is with the utility of the human-engineering attempt at model building and its lack of benefit to a systems designer. He feels that there is too great a distance "between the symbolic concepts and real-world data." He further reiterates his call for research by noting that early work with interactive facilities had computer efficiency as the paramount consideration, and by noting that "the experience that makes optimum usage patterns obvious to the designer rests on a computer-oriented lore unknown to people who are not computer professionals." His final remark is worth quoting in its entirety for its clear statement of the need for a discipline of MMI design: "Because the theoretical basis for incorporating user problem-solving characteristics into analytical models is so rudimentary, the resulting user interface technology will take the form of procedural rules used by designers to guide their creative judgment. Indeed, the challenge for research is to transform the current art of design into an engineering discipline by developing an agreement on ways for characterizing user tasks, for allocating interface resources to meet task requirements, and for evaluating user effectiveness in task performance." Specific examples of research designs, methodology and results in the MMI literature include Walther and O'Neil [1974], who studied the effects of both user characteristics (for example, evaluative attitude [i.e., prior attitude towards computers], experience with on-line systems) and program and terminal characteristics (TTY vs. CRT, flexible vs. inflexible command recognizer). They found significant effects for terminal type and interface flexibility, as hypothesized, but there were often significant interactions with user experience or evaluative attitude. The utility of their work is that it submits to experimental verification their hypotheses concerning the relationships between the user and system variables of their study. Since some of their results were counter-intuitive, they add evidence that there is a need for carefully designed, well controlled experiments on the relationship between user and system characteristics and user performance. Their performance measures were limited to time per task and syntax errors, and did not include user attitudes towards the interactive system or its interactive environment. The specific system of their study was an interactive text editor; subjects were required to find and correct a number of mistakes in a body of text. Hansen [1976] examined differences in performance of groups of users in solving complex problems in on-line vs. batch environments. His study suffers from the difficulties inherent in performing research on users of interactive computer systems in that the extension of the results to actual real-world environments is not always justified.
However, he concludes his work by noting that "it is not necessary to predict accurately and in detail in order to be useful. Man-machine research may be effective if it serves only to help the designer to organize his thinking about how [users] perform, to enable him to distinguish those variables which are likely to be important, and to design ad hoc experiments to answer specific questions." Melnyk [1972] administered a post-test questionnaire in her study of the effects of limited ("frustrating") bibliographic search systems vs. a more open or free-form ("non-frustrating") search system. The questionnaire included items concerning keyboard design, keyboard ease of use, printing speed, etc.; a significant difference was found between the responses of those who experienced the "frustrating" system and those who used the "non-frustrating" system. There are some clear methodological difficulties in her study concerning the nature of the differences in the "frustrating" vs. "non-frustrating" systems, but her attempt to elicit user attitudes towards the system and its environment is significant. Unfortunately, there seems to be no concern for the effects of subject differences between the two groups in her experiments. She reports a broad range of background and terminal experience among her subjects, but does not appear to incorporate the necessary experimental or statistical techniques to control for these differences. Since her work pioneers in considering the process of people using computer systems, and concerns itself with user-oriented issues, it must be viewed in terms of its selection of performance and attitude measures as well as its methodology. Others who have done experimental studies of the effects of interface parameters upon user performance in interactive systems include Carlisle [1974], who was also concerned with interface complexity, and Ting and Badre [1976], who were more concerned with interactive modes and their effectiveness in teaching and the subject's subjective judgment of the operational quality of the features provided. Importantly, the authors believe that "the overall judgment of the usefulness of the system was taken to be an indication of the success of the [man-machine] interaction." The few researchers who have performed experimental research on interactive systems are concerned that the user's view of the system is as important to the success of a man-machine system as performance measures such as time to complete tasks, errors, cost, etc. In fact, there is as ample a volume of work on the nature of useful performance measures for evaluating interactive systems as there is a sparsity of actual studies evaluating systems. Sterling [1974] discusses the need for "humanizing" computerized information systems and the difficulty of deciding just what that term means. Martin, Carlisle and Treu [1973], in examining the man-machine interface in a number of interactive bibliographic systems, note that there is a lack of "knowledge about the blend of ingredients that produces a comfortable man-machine interface." Treu, in a later paper [1975], suggests experimentation on the effectiveness of interface languages, where the performance measure is a user-oriented one, i.e., the amount of mental work or "think time" spent by the user just before using a command. Thus it is reasonable to optimize systems in terms of user-oriented performance measures.
A questionnaire (see Appendix 3) was administered to the subjects of this study as a means of eliciting their attitudes toward the system as they had just experienced it. The post-test questionnaire data was further analyzed to determine whether differences in the versions of the system of this study are associated with differences in user attitudes towards the system. Chapters V and VI contain detailed discussions of the questionnaire. Questionnaires designed to elicit the subject's attitudes and opinions on system features were used by Heafner [1975] in his study of input language types for interactive message processing tasks, and by Heafner and Miller [1976] in their detailed study of the functions needed in a military automated message processing system. The study of time and delays in interactive systems is presented by Miller [1968], who lists a number of interaction modes (from first log-on through requests for lengthy compilations) and discusses reasonable time delays for system response. The reasonableness of a delay is based upon user expectation and the concept of psychological closure. He does not discuss the possible effects on the user of continuous excessive or unanticipated delays in response over a period of time. The effects of repeated delays upon the user's performance and attitudes form the foundation of this dissertation. Seven, Boehm and Watson [1971] were also concerned with the effects of delays in interactive problem solving. Their study forced users to remain away from the terminal (locked out) for varying periods of time and discovered that total problem solving time was lower at some longer delay times than at lesser delays. One conclusion is that system delays greater than a certain amount can be useful (if they are predictable) in that they may free a user to engage in other productive activities. Clearly the effects of delays on user performance and attitude warrant further study. The effects of the variability in the output display rate on the performance and attitudes of users in interactive computer tasks form part of the foundation of the research reported in this dissertation. There is a long tradition within experimental psychology of concern with the reaction time (RT) of subjects in various stimulus-response settings. The motivations for RT experiments vary from deep concern about the effects of fatigue and boredom upon drivers or pilots, etc. -- people whose work entails long periods of potentially extreme boredom occasionally mixed with sudden, usually unpredictable, moments when quick reactions are required -- to concern about the underlying neurological processes by which we discriminate between differing stimuli and construct the appropriate response, as a means of gaining insight into cognitive functioning. A number of individual studies involving varying stimulus and inter-stimulus times and the associated effects on RT have found that RT increases as the variability increases. Mackworth [1970], Mostofsky [1970] and Davies [1969] all present extensive surveys of the long history of experimental work on the variables (signal characteristics, task variables, subject variables and environmental variables) which affect user performance (generally measured in terms of response latency but also including other variables such as physiological measures of arousal, etc.) in RT and vigilance tasks.
Some of the earliest results indicate that decrements in performance occur as the variability in the inter-signal arrival rate is increased, with the best overall performance being found with a regular series of events. Specifically, Mackworth [1970] reports on experiments which examined three levels of inter-signal variability and found that reaction time was shortest with the minimum variability. At the medium variability, the longest RT was found with those signals which followed the shortest inter-signal arrival. McCormack and Prysiazniuk [1961] used three levels of inter-signal interval variability. They also found that the shortest mean RT was found with the regular interval and the longest with the most irregular. These authors indicate a theory of RT which involves the expectancy of the next signal on the part of the subject. It appears that subjects perform best when the next signal occurs at a time approximately equal to the mean inter-arrival rate of all previous signals. As the arrival of the next signal occurs significantly earlier or later than this mean arrival rate of previous signals, the subject's arousal is decreased and response suffers. It is this apparent decrease in performance directly associated with increased variability of inter-signal arrival rate that led to the conjectures studied in this dissertation: that increasing the variability of the output display rate in MMI tasks would be associated with decreased performance and attitude of users of the interactive system. It is clear, however, that the operations of reading a number of messages and responding to their content involve a greater amount of information processing than merely reacting to a single stimulus and responding with a simple manual operation. It is this greater information content of the stimulus in the tasks reported here which might lead one to believe that a greater level of subject interest occurs which could counter the effects of the variability of the output. But it has been observed informally that during periods of heavy computer system use, when the output and response times of the system are quite variable, user frustration and dissatisfaction do occur. The experiments reported in this dissertation are an attempt at demonstrating in a formal manner the existence of this performance and attitude decrement as system response variability increases. Suitable performance measures for evaluating user performance are mentioned in a number of the above references. Those used in this research are discussed in greater detail in later chapters. The above references suggested additional attitude measures to be incorporated. One additional questionnaire item (question 15; see Appendix 3) was suggested by Cooper [1973], who conjectured that a cold, hard cash expenditure might prove to be a useful means of judging the utility of an interactive system. His suggestion was modified somewhat for use in this research.

III. THE EXPERIMENTAL APPROACH

There is a need in the computer sciences for refinement in applications design in order to optimize system and user performance. Unfortunately, the well-established techniques that would allow designers to accomplish controlled research have not been accessible to them for a number of reasons: concern for "more pressing" design needs, lack of familiarity or experience with the techniques of experimental design, etc.
As a consequence, one occasionally sees such statements as "Our system is easy and natural to use" in reports on interactive systems, input languages, etc., without the verification studies to defend them. This work develops and parameterizes a model of man-machine interaction. People interact with machines in different ways, and with different styles. We would like to be able to predict the results of that interaction, with some probability of success, on the basis of values of certain parameters of the interaction. For example: how do differences in individual intelligence, cognitive style, training, specific experience with the system at hand, or educational background affect the interaction? How do machine differences affect performance? How do faster machines, higher input and output rates, and larger, more powerful machines affect performance? In fact, what kinds of performance measurements are useful in deciding upon the efficacy of a particular man-machine system? First we restrict our focus of attention to MMI where the machine is an interactive computer system. The man in the MMS will now be called the user of the facility. We conceive of him as approaching the system with the intent of solving a problem, which could be a mathematical task or an information request, etc. We may further assume that the interaction is iterative in nature, i.e., through some initial interaction with the system, the user develops a partial solution to his problem, and this partial solution is used to converge upon a better solution to his problem. We see the MMI as a joint effort between user and machine towards a final state objective. This objective need not be clearly perceived by the user before interaction begins, and may change during the course of the interaction. [See Figure III-1.]

Figure III-1. Schematic diagram of man-machine interaction

The figure is meant to imply that interaction moves towards an objective, and that changes in the perception of the objective are fed back to the system, which in turn may cause the objective to change.

THE MODEL

We have a performance measure, P, which we would like to maximize in the MMI. P may be a set of measures P = {P_i | i = 1,...,n}, containing values such as CPU time, memory, cost, time to complete the interaction, user satisfaction, frustration, etc. As described in the introduction, this work focuses on the user-oriented performance measurements described in detail below. We describe P as a function of various parameters of the system: P = f(p_1, p_2,...,p_n), where the p_i's are values of the parameters and f is a function of the p_i's. The p_i's relate to the man side of the MMI, and the machine side as well. Thus P = f(u_1, u_2,...,u_j, m_1, m_2,...,m_k), where U = {u_i} is the set of user (man) parameters, and M = {m_j} is the set of machine parameters. Some examples of these parameters have been mentioned above. The function f may take a number of forms. One simple form, for which a great deal of analytical power is available, is a linear function. In those cases where it is believed that the relationship is in fact non-linear, suitable transformations may often be used to make the linear techniques appropriate. In its general expanded form,

P = \sum_{i,j \ge 0} \sum_{k=1}^{k_{max}} \sum_{l=1}^{l_{max}} a_{ijkl} \, u_k^i \, m_l^j ,

so that P is a function of the u's, the m's, their higher-order powers, and their cross products (interactions). The model is refined and simplified below.
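To make the linear form of this model concrete, the following is a minimal sketch of fitting performance to machine parameters by least squares, the criterion adopted later in this chapter. The data values, the coding of the display parameters, and the use of Python with numpy are assumptions made for illustration only; they are not taken from the study. Interaction terms (e.g., vV, vB) would simply be added as further columns of the design matrix.

```python
# Illustrative only: least-squares fit of a linear performance model P = f(U, M).
# Values and parameter coding below are hypothetical, not data from the study.
import numpy as np

# Hypothetical sessions: display parameters (machine side, M).
v = np.array([0, 0, 1, 1, 0, 1])          # output variability: 0 = low, 1 = high
V = np.array([0, 1, 0, 1, 1, 0])          # output volume: 0 = <1000 chars, 1 = >1000 chars
B = np.array([1200, 2400, 1200, 2400, 1200, 2400])  # mean output baud rate

# Hypothetical performance measure P (e.g., seconds to complete the tasks).
P = np.array([410.0, 395.0, 480.0, 455.0, 430.0, 470.0])

# Design matrix for P = a0 + a1*v + a2*V + a3*B + e.
X = np.column_stack([np.ones_like(P), v, V, B])

# Least squares minimizes the sum of squared errors sum(e_i^2).
coef, _, _, _ = np.linalg.lstsq(X, P, rcond=None)
errors = P - X @ coef
print("coefficients a0..a3:", coef)
print("sum of squared errors:", float(errors @ errors))
```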
This work is directed at selecting suitable u and m parameters, and testing the effect of changes in these parameters on performance. A problem occurs: once we have selected u and m parameters that we believe affect performance, how do we demonstrate that this is in fact the case? Put another way, suppose we fix all values of U = {u_i} and M = {m_j}, except one, some parameter p. We then observe a number of different people interacting with our system, each perhaps with a different value or level of p. We would expect performance to differ, and we take our measurements and observe differences. The problem is to decide whether these observed performance differences came about because of chance fluctuations, because of changes in other factors which we did not keep constant, or as the result of changes in the parameter p. Additionally, the relationship between the parameter and the performance measure may be non-linear, so that the linear model is not sensitive to the relationship. The performance that we predict as a function of this one variable, p, is conditioned upon the fixed values of the other U and M parameters. Thus P = F(p | U, M). For the purpose of this research, as described in the introduction, we will assume fixed values of the u_i's. The value that we predict for P becomes a function of the machine parameters (M), conditioned upon fixing each of the user (U) parameters. Thus P = f(M | U), or, to shorten, P = f_U(M), where f_U is a function which depends on the specific values of the U parameters, and may be different for other values of the U parameters. This issue of the generalizability of the results is discussed in greater detail in Chapter IV. We will attempt to construct f_U in such a way as to minimize the error in predicting P from the values of the parameters. The function is constructed from the set of observed pairs (P_i, M_i), such that for each pair we calculate the predicted performance P'_i and an error e_i:

P'_i = f_U(M_i)          (predicted performance)
P_i = f_U(M_i) + e_i     (actual performance)
e_i = P_i - P'_i

Our criteria for constructing f_U include the requirement that some function of {e_i} be minimized. The choices for this function of the errors include one such that the maximum error is minimized, over all possible f's as defined above. This "mini-max" solution leads to Chebyshev approximations. Since our data may contain "noise" or "outliers," this approximation is not useful here, since our choice for f would be inordinately influenced by highly random or noise effects. Another choice might be to minimize the absolute value of the errors, or the square of the errors. For consistency with the experimental literature [Winer, 1971], our criterion will be to minimize the sum of the squares of the errors, i.e., the least squares solution.

CONFIDENCE INTERVALS

It is necessary to know whether the solution represents merely chance fluctuations in the data, or is descriptive of underlying processes. Unfortunately, this question cannot be answered with certainty. The best that can be done is to assign a probability value or confidence interval to the solution. For example, in testing the performance of each of two groups in an MMI task, where the two groups receive different values or levels of one of the parameters, a confidence rating may be assigned to the observed performance differences.
It might be possible to say that an observed difference as large as or larger than was observed would be expected to occur as the result of random fluctuations, between two otherwise equal groups, only p percent of the time. The level of p at which we would be willing to agree that the observed difference is not the result of random fluctuations depends on the nature of the experiment, the cost we place on erroneous interpretations, etc., but historically a level of 1 percent or 5 percent has been used.

THE PARAMETERS

It is possible to select from a large number of parameters those which would be expected to affect user performance in an MMI. Previous research (see Chapter II) indicates that those below may be expected to have a significant effect on performance. Furthermore, the variable "Output Variability" has implications for designers of time-shared interactive systems who are interested in allowing the largest possible number of users access to the system.

INDEPENDENT VARIABLES
(1) Output Variability (var) -- two levels: Low vs. High
(2) Mean Output Baud Rate (Baud) -- two levels: 1200 baud vs. 2400 baud
(3) Output Volume (Vol) -- two levels: <1000 chars. vs. >1000 chars.

DEPENDENT VARIABLES
(1) Time to complete task
(2) CPU time
(3) Keystrokes used
(4) Post-test questionnaire (attitude survey)

These dependent variables will be loosely called "User Performance," or P. Thus the model becomes P = f(v_i, i = 1,...,n). The linear model underlying multiple regression (or analysis of variance) implies that P = L(v_i, i = 1,...,n), where L is a linear function of the v_i's or their products. If we use the three independent variables indicated above, each at two levels, the linear model reduces to

P = a_0 + a_1 v + a_2 V + a_3 B + a_4 vV + a_5 vB + a_6 VB + a_7 vVB,

where v represents the output rate variability, V the output volume, and B the output baud rate. The statistical or experimental model used to test the effects of v and V on P will be a 2 x 2 x 2 factorial design (Figure III-2). A repeated measures design was used as a means of reducing the between-subjects variability and extracting a maximum amount of useful information with a minimum number of subjects. The figure below (Figure III-2) represents the possible combinations of system (independent) variables tested. Each "cell" of this factorial design represents one of the conditions: 1200 baud, low output variability; 1200 baud, high output variability; 2400 baud, low output variability; and 2400 baud, high output variability.

Figure III-2. Factorial Design: 2400 Baud and 1200 Baud vs. Low and High Output Variability

Each subject in the experimental sessions is randomly assigned to one of the four conditions of the experiment. For each subject in each of the cells, a number of performance and attitude measures are taken as described above. The relationships studied in this research concern the performance and attitude differences between the subjects in the various cells of the factorial design. Specifically, it is conjectured that subjects experiencing the high variability versions of the system will show poorer performance and have a lower attitude towards the system than those experiencing the low output variability versions. Similarly, it is conjectured that those experiencing the 1200 baud versions of the system will have poorer performance and attitude towards the system than those experiencing the 2400 baud versions.
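As an illustration of the design just described, the following sketch assigns hypothetical subjects to the four baud-by-variability cells, with both output-volume levels seen by every subject as the repeated measure. The balancing rule, seed, and subject names are assumptions made for this sketch; they are not the dissertation's actual initialization routine.

```python
# Illustrative only: random assignment of subjects to the 2 x 2 between-subjects
# cells (baud rate x output variability), with output volume as the within-subject
# (repeated) factor. Names and balancing logic are hypothetical.
import random

CELLS = [(baud, var) for baud in (1200, 2400) for var in ("low", "high")]
VOLUME_LEVELS = ("<1000 chars", ">1000 chars")   # every subject sees both

def assign(subjects, per_cell=9, seed=1976):
    """Assign each subject to a cell at random, keeping cell counts roughly equal."""
    rng = random.Random(seed)
    counts = {cell: 0 for cell in CELLS}
    plan = {}
    for subject in subjects:
        open_cells = [c for c in CELLS if counts[c] < per_cell]
        cell = rng.choice(open_cells)
        counts[cell] += 1
        plan[subject] = {"cell": cell, "volumes": VOLUME_LEVELS}
    return plan

if __name__ == "__main__":
    plan = assign([f"S{i:02d}" for i in range(1, 37)])   # 36 subjects, 9 per cell
    print(plan["S01"])
```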
A graph of performance vs. output variability for both the subjects experiencing the 2400 baud versions and those experiencing the 1200 baud versions would look like the figure below.

Figure III-3. Hypothetical Graph of Performance vs. Output Variability for 1200 Baud and 2400 Baud Versions

Classical analysis of variance [Winer, 1971] provides a framework and statistical model by which the separate and combined effects of the independent variables upon the performance measures may be tested. Specifically, given the observed differences in the cell means in the 2 x 2 x 2 factorial design, we attempt to determine whether these differences are the result of chance or random fluctuations in sampling where the population means are actually identical, or are in fact different. Since our subjects are sampled from a larger population of interactive computer system users, the best we can do is make a probability statement concerning the observed differences, i.e., that observed differences as large as or larger than we actually obtained would be expected to occur by chance, in otherwise equal populations, only p percent of the time. The general linear model, of which classical analysis of variance is a special case, offers a procedure by which we may test the observed differences and determine the probability p described above. Historically, a level of 1 or 5 percent is selected as the a priori level at which we would be willing to accept or reject the hypothesis of no population differences. The general linear model states that an observation or measurement upon subject k in cell i,j of an n x m factorial design can be described in terms of the separate effects of the overall mean of the measurement variable, the main effect for the first factor or independent variable, the main effect for the second factor or independent variable, the joint effect of the two of them combined, and an error term not accounted for by this linear model. Thus,

x_{ijk} = M + a_i + b_j + ab_{ij} + e_{ijk},

where x_{ijk} is the observation (measurement) of subject k in cell i,j; M is the overall mean of the dependent variable; a_i is the effect for variable A at level i; b_j is the effect for variable B at level j; ab_{ij} is the joint effect for A at level i and B at level j which cannot be accounted for by the separate effects of A and B (the interaction effect); and e_{ijk} is the error or residual in predicting x_{ijk} from the separate linear effects of A, B and the interaction. It represents all uncontrolled effects in the factorial design and may include higher order (non-linear) effects of the independent variables. Classical analysis of variance provides the statistical tests necessary to test the separate effects of the independent variables and their interaction upon the performance measurement. We may define the within-cell variance to be the average variance within each of the cells of the factorial design:

MS_{within} = \sum_i (x_i - \bar{X})^2 / (n - 1),

where x_i is an observation from a given cell, \bar{X} is the mean of that cell, and n is the number of observations in the cell. In this case, we assume that n is equal for each cell, and that the variances for each of the cells are equal. Since the linear model above defines the observation within a cell as a unique combination of the A effects, the B effects, the interaction effects and the uncontrolled or error effects, all observations within a cell would be equal if there were no error.
Thus, we equate the within-cell variance, MS_{within}, to the error term in the linear model. We define the mean square due to the main effect for factor A by

MS_A = nq \sum_i (\bar{A}_i - M)^2 / (p - 1),

where n is the number of subjects per cell (assumed equal), q is the number of levels of the second independent variable (B), p is the number of levels of the first independent variable (A), \bar{A}_i is the mean of A at level i, and M is the overall mean of the dependent variable. We define the MS due to the main effect for factor B analogously. Finally, the MS due to the interaction between A and B is defined by

MS_{AB} = n \sum_i \sum_j (\bar{AB}_{ij} - \bar{A}_i - \bar{B}_j + M)^2 / ((p - 1)(q - 1)),

where the terms are as defined before, \bar{AB}_{ij} is the mean of cell i,j, and the double sum is taken over all of the p x q cells of the factorial design. In a 2 x 2 design, the denominator reduces to 1. The hypothesis of no effects for any of the independent variables on the performance measure is equivalent to stating that the means of all the cells are equal. This is equivalent to stating that MS_{between} = MS_{within}, or, equivalently, MS_{between} / MS_{within} = 1.0. When A and B are fixed factors (the levels of each in the factorial design represent all of the levels to which generalizations will be made), tests planned before the data are obtained may be made by use of the F statistic, F = MS_{comparison} / MS_{within}, with a sampling distribution given by the F distribution at 1 and pq(n-1) degrees of freedom.

IV. METHODOLOGY

The experimental design used to test the effects of the variables of this study on the performance measures was a 2 x 2 x 2 factorial design, with repeated measures on each subject across the two output volumes. A diagram of the factorial design is presented below (Figure IV-1). The levels of the independent variables selected for this study were: output baud rate, 1200 vs. 2400; output rate variability, low vs. high; output volume, <1000 chars. vs. >1000 chars.

Figure IV-1. 2 x 2 x 2 Factorial Design

The performance measures used in this research have been described in the previous chapter. It is not initially clear just where the focus should be in order to improve the performance of an MMS. For example, tradeoffs are required whenever it becomes necessary to optimize one part of an interactive system. In general, increasing the output display rate requires either a more powerful processor or a limit on the number of users who may simultaneously use the interactive system. Beyond a limit in the display rate, further attempts to increase the speed of output, with a given CPU and a given number of users, lead to noticeable degradation in system performance; in particular, total time to solve a problem, perform a compilation, etc., is increased. Even though the nominal output rate is increased, the actual output rate is decreased, since the processor can in general service only one terminal at a time, and must then service others before returning. This leads to a high variability in the output display rate, with characters being displayed in bursts of variable length, and with a variable time delay between bursts.
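Such bursty output can be pictured with a small simulation. The sketch below is illustrative only: the burst-length and delay distributions are assumptions rather than measurements from the ISI system, and the ten-bits-per-character conversion is a convention, not a property of the terminals used in the study.

```python
# Illustrative only: simulating high-variability ("bursty") terminal output.
# Burst sizes and inter-burst delays are assumed distributions, not measured data.
import random
import sys
import time

def display(text, baud=2400, variability="high", seed=None):
    """Write text in bursts whose length and spacing depend on the condition."""
    rng = random.Random(seed)
    cps = baud / 10.0                      # rough characters per second (~10 bits/char)
    i = 0
    while i < len(text):
        if variability == "low":
            burst, pause = 20, 20 / cps    # steady, regular output
        else:
            burst = rng.randint(1, 80)                     # variable burst length
            pause = rng.uniform(0.0, 2.0 * burst / cps)    # variable delay, same mean rate
        sys.stdout.write(text[i:i + burst])
        sys.stdout.flush()
        time.sleep(pause)
        i += burst

if __name__ == "__main__":
    display("Headers Destination or date string: April\n" * 3, variability="high", seed=7)
```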
One of the conjectures tested in this research is that this variability in the display rate is associated with a decrease in user performance in interactive problem solving tasks, i.e., even when the nominal and actual display rates are similar, the variability alone would lead to reduced performance. Furthermore, it was conjectured that people who use a heavily loaded system (i.e., one with a great deal of variability in output display rate) would have a poorer overall view of the entire MMS and its environment. The post-test questionnaire represented an attempt to measure subjects' attitudes towards the entire MMS as they had just experienced it. Detailed analysis of the questionnaire is presented in the next chapter. The selection of performance measures used in this research followed from the above considerations. In particular, the total time to complete the tasks and the number of functions used (keystrokes) in completing the tasks would give a gross indication of user performance in problem solving. Other measures reflecting user performance included number of errors made, help requests, use of references, etc. Number of errors, however, was not used for these experiments, since the subjects were instructed to work on a task until they were satisfied of the correctness of their response; thus the number of wrong answers on any given task was virtually zero. Similarly, CPU time used in completing the tasks gives a gross measure of computer performance. Other machine-oriented measures might include measures of memory access, disk access, memory management time, etc., but these are partially reflected in the total CPU time used. Furthermore, the emphasis was on examining the relationship between the independent (display) variables and the user-oriented performance measures.

THE SYSTEM

The system used to test the influence of the independent variables on the performance measures is an interactive message retrieval system in use at the Information Sciences Institute and other locations. It works on unstructured but formatted text files which conform to a standard message format. The program is in regular use by a number of users of the ARPA computer network and is routinely used by all of the subjects of these experiments. The subjects required essentially no additional training. This system has been modified to permit performance measurements to be taken on-line. Further modifications were made to the program in order to mold it to the simulated travel department environment of this study. The data base consisted of approximately 200 travel request messages (see Appendix 4 for examples of messages from the data base). The messages were generated by a SAIL program [Smith, 1975; VanLehn, 1973] using a random number generator to select names of travellers, dates of travel and return, "fellow travellers" and destination cities. Each message consisted of the following fields:

To: All were to the Travel Department.

From: The name of the individual requesting the travel. In all cases, the person sending the message was requesting travel for himself and up to three additional people.
Subject: Consisted of the word "Travel" followed by the destination city and the date of intended departure. This field was accessed as the "Destination or Date" field. It corresponds to the "Subject" field in the standard message format.

Message body: All messages were worded as follows:

Please reserve a seat [or N seats if more than one person is travelling, where N is the total number travelling] to <destination> on <date of requested travel> for me [if more than one: Second traveller, Third traveller, Fourth traveller]
RETURN: <date of return, or OPEN>
Thanks

The program consisted of the Main, MSG and Questionnaire modules. The Main module initiates the experimental session. The identification of the subject is entered, and the values of the parameters for the subject are generated. The system returns to this module after all tasks are completed. The MSG module is the modified message processing system. Subjects are instructed to assume that they are a clerk in the travel department of an organization. The system has been modified so that commands to the system relate to the types of requests which might be made of a data base of travel messages. These include commands that allow the user to search on Date of requested travel, Destination city or Name of traveller. The system allows complex boolean search requests combining parts of the message normally available for searching. The usual result of a search is a listing of just the headers of the messages satisfying the search request. From these headers, the user may specify the exact message or messages to be displayed on the screen. If the user has reason to believe that the selected messages will not be too numerous, or knows that he will want to read all of the selected messages, he may have the system immediately begin typing them on the screen rather than first displaying the headers. The command structure uses single-letter commands. For example, to see the headers of those messages where the requested date of travel was in April, the user would type

H D April

The system would echo back what he had typed by completing the command, and the user would actually see the following on his screen:

Headers
Destination or date string: April

After a period of time during which the system is searching the entire data base and selecting just those messages which have the word April (case insensitive) in the Date of Travel string, the headers of those messages would be typed out. The Questionnaire module administers the post-test questionnaire to the subject (see Appendix 3), maintains the file of answers, and includes the screen editor for the final open-ended question.

THE SUBJECTS

The subjects used for these experiments were members of the professional and secretarial staff of ISI. In order to minimize training time and to better represent a population of experienced interactive computer users, subjects were taken from those who had already had some experience in using the system. Though most users of interactive computer systems do require training and experience with a particular system in order to reach maximum efficiency, the learning and adaptive phase represents only a very small fraction of the total time in which they will be interacting with the system. To use subjects who have not reached full familiarity with the particular test system leads to the problem of confounding learning effects with performance effects.
Though this population has been sampled only within the confines of a research institute environm ent, the broad cross section of subjects for this study- male and female, professional and secretarial- leads to the conclusion that this sample is representative of the types o f people who might use interactive search and retrieval systems in other environm ents. Such individuals might include, but are not limited to, librarians or other users of interactive bibliographic systems, airline reservations personnel, hotel reservations service personnel, users of d ata base m anagem ent or retrieval systems, etc. Subjects were randomly selected from the entire ISI staff. T h e only requirem ent for participation (other than volunteering) was that the subject use M SG, the message processing system modified for this study, on a regular basis. T h e factorial design necessitated approximately ten subjects in each of the four cells of fhe design. Approximately 40 subjects were needed for the experim ent. D ue to system difficulties, however, the number of actual subjects was 9 per cell, a total of 36. Each subject was assigned to one of the four ceils by a random num ber assignment procedure internal to the initialization routines. A fter approxim ately ten subjects, an attempt was made to weight the cell assignment so th at the num ber of subjects per cell would be equal. Most im portantly, however, subjects were assigned to baud and output variability conditions 39 w ithout regard to whether they were male or female, secretarial or professional staff, and without regard to time of day or current system loading. Some attem pt was made, however, to ensure that no one cell contained all of the females, or all of the professional staff, etc. P IL O T STU DY A pilot study was conducted prior to the main research effort reported here. It allowed for an initial testing of the relationship between the m achine-oriented independent variables and the performance measures. It dem onstrated the desirability of including output baud rate as an additional display variable, both because of its intrinsic potential effect upon user perform ance and also because of the potential confounding effect between output variability and actual display rate. T h e pilot study afforded an excellent opportunity to refine both .the experim ental design and the physical design of the sessions. In fact, a "pre" pilot test was conducted using a small subject sample in order to iron out problem s that subjects might have in the use of the T ravel Message System. T h e experience gained in the pilot study made the main experim ental sessions relatively free of the need for experimenter intervention. T his was not entirely the case during the pilot study. 40 ! D ata analyses were accomplished on the data taken during the pilot study, and they m ade it possible to refine actual statistical procedures accomplished on the m ain experim ent data. T he results of the analyses of the pilot study d a ta strongly agree with the results obtained in the main study. T h e pilot study provided initial support for the validity of the conjectured relationships between the display variables and the performance measures. E X P E R IM E N T A L SE TT IN G T h e experim ents were conducted in an office at ISI during norm al working hours (see Figure IV-2). Subjects were brought into the experim ental setting one at a tim.e. 
The room contained the usual ISI office furniture, including a Hewlett-Packard 2640A CRT and keyboard (HP), the computer terminal which all participants in the experiments ordinarily use for a large part of their daily activities, and a table with answer sheets for writing responses to the series of tasks to be performed. The HP includes a 24 line by 80 character (5 inch by 10 inch) rectangular CRT display and a separate keyboard attached to the display by a connecting cable. The normal display rate is switch-selectable from 110 baud to 2400 baud. At ISI, terminals are used at the 2400 baud rate (approximately 240 characters per second). To simulate the 1200 baud display rate used in these experiments, each line of output was interleaved with "null" characters, which produce no output on the screen but have a bit string and are handled as an ordinary character, in order to increase the display time for a given output string by a factor of two.

Figure IV-2. Experimental Arrangement (terminal, table, subject, experimenter)

As the subject entered the office, he or she was given a set of instructions on the use of the system as well as the answer sheets for writing the answers to the tasks. (See Appendix 1 for the instructional materials given the subjects and the instructions read to them by the experimenter.) The subject was invited to sit down in front of the terminal and was read the instructions by the experimenter. After being advised of his or her right to leave at any time, the subject was given an additional brief description of the nature and intent of the research. Subjects were then told that the first two tasks were sample tasks, and that they could use their time working on them to become familiar with the new commands available in the Travel Message System that are not included in the usual ISI version of MSG. They were also told that no data was being taken during the sample tasks, but that during later tasks, data would be taken on line, and that they were to work as quickly and efficiently as possible. They used the time on the sample tasks to experiment with commands in the system which they might not ordinarily use in their routine, day-to-day message retrieval procedures. In particular, most do not use the multiple search (boolean) requests on a regular basis; the sample tasks allowed them the opportunity to refresh their knowledge of the operation of the boolean search requests. For each of the sample tasks, and each of the actual tasks, the task question appeared in a reserved area at the top of the screen, where it would remain until the subject pressed the "N" key to go to the next task. All requested output would then appear below the reserved area and would scroll in the normal manner. The figure below (Figure IV-3) indicates the way the screen looked to the subject with the first sample task description in the reserved area, and sample output in the working area.

HOW MANY REQUESTS WERE MADE TO THE TRAVEL DEPARTMENT for travel to San Diego? Follow the instructions on the answer sheet to answer this question.
********************************************************************
headers destination or date string: san diego
8 Larry Miller Travel, San Diego, April 2 a.m.
11 Jane Doe Travel, San Diego, March 26 p.m.
24 Larry Miller Travel, San Diego, March 25 a.m.
39 Alan Schwartz Travel, San Diego, May 26 a.m.
46 T. Smith Travel, San Diego, May 15 a.m.
Each task required the subject to read or count a number of messages and write the appropriate information on the answer sheets provided. Because subjects had a good deal of familiarity with the system in their daily use, they required little help in the use of the system during the experimental session. The experimenter was available to provide help on the use of the functions if the subject requested it. Essentially, the experimenter simulated the help function normally available in MSG. The usual help facility, however, was not made available during these sessions because of programming peculiarities. After the subject completed a task and was satisfied with his answer, he pressed the "N" key to go to the next task. After all of the tasks were completed, the instructions for the post-test questionnaire appeared on the screen. The subject completed the questionnaire, including an open-ended question which allowed him to express his general comments on the system and, in particular, to comment on any areas about which he might have felt strongly but which were not adequately covered by the previous questions. In particular, though it was not a part of the variables of this study, this final question offered an opportunity for subjects to express their ideas on ways to improve the system for the kinds of tasks performed in this study. Each subject's total time in the experimental session varied with the particular combination of independent variables experienced; the average was about 1 to 1 1/2 hours.

A brief comment is in order at this point on the effect of a heavily loaded system upon the subjects. For a number of the subjects in the high variability versions of the system, the experiment was not entirely pleasant. Many of these subjects felt, and rightly so, that the system did not provide an adequate range of functions to efficiently work the tasks. These opinions are mirrored in the responses to a number of the questions in the post-test questionnaire; these are discussed in greater detail in the next section. However, it is useful to point out that it is in a heavily loaded system that the inadequacies become noticeable and burdensome. It appears that it is necessary to stress a system in a reasonable operating environment in order to better perceive its usefulness and its shortcomings. A further discussion of this point is made in Chapter VI.

DATA ANALYSIS

Subjects were randomly assigned to one of the four conditions of the factorial design, as described above. The theoretical basis for analyzing the relationship between these independent variables and the performance measurements of this study was described in the previous chapter.
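Before turning to the analyses, it is useful to picture the shape of the data they operate on: one row per subject per volume level, carrying the two between-subjects factors and the three performance measures. The sketch below shows such a long-format layout in Python; the column names and the handful of values are illustrative assumptions, not the original records.

```python
import pandas as pd

# Hypothetical long-format layout for the mixed (split-plot) design:
# two between-subjects factors (variability, baud), one within-subjects
# factor (volume), and the three performance measures per cell.
records = [
    # subject, variability, baud, volume, time_sec, cpu_sec, keystrokes
    (1, "low",  1200, "low",   560, 11.2, 42),
    (1, "low",  1200, "high", 1350, 18.4, 55),
    (2, "high", 2400, "low",   690, 12.9, 47),
    (2, "high", 2400, "high", 1870, 20.1, 60),
]
data = pd.DataFrame(
    records,
    columns=["subject", "variability", "baud", "volume",
             "time_sec", "cpu_sec", "keystrokes"],
)

# Cell means for one dependent measure, a first look at the data
# before the analyses of variance reported below.
print(data.groupby(["variability", "baud", "volume"])["time_sec"].mean())
```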
The factorial design, with repeated measures on each subject across the volume levels, lends itself to analysis by the classical analysis of variance methods. Analysis of variance provides a means of partitioning the total variance in the dependent measures into that which can be accounted for by differences in the independent variables (as well as that which cannot be accounted for by these differences). The ratio of the between-groups variance to the within-cell variance provides a test of the probability that the observed differences between the means of the cells in the factorial design are due to chance differences only.

The analysis of variance provides a means of testing for the main effects and the interactions as described above. Specifically, analysis of variance was used to test the effects of the independent variables upon the performance measures separately. Three separate analyses were required to test the effects on the time to complete the tasks, the CPU time used, and the number of functions used to complete the tasks. The data was analyzed with the performance measures totalled for the entire session, for low and high volume tasks.

The analysis of the data of these experiments involves the classical analysis of variance for differences in the main effects (main effects for output variability, output baud rate, output volume), and for the significance of the interaction effects. Classical analysis of variance does not provide the mechanism for testing the significance of the effects of a group of independent variables upon a group of dependent variables. Put another way, if the dependent or performance measure is viewed as an n-dimensional quantity, then multivariate techniques are necessary. The analyses carried out on the data of this study include multidimensional analyses of the relationship between the independent variables and the n-dimensional performance measures; the results are detailed in the next section. Specific methods used were a multidimensional analysis of variance to assess the effect of output variability alone on the n-dimensional performance measures, and canonical correlation, to assess the relationship between the two sets of variables.

VALIDITY

Internal Validity

Campbell and Stanley [1963] and others provide a framework within which the necessary control on the experimental design may be exercised in order to better assure internal and external validity. Specifically, internal validity refers to whether observed changes or differences between groups may reasonably be ascribed to changes or differences in the independent variables of the study. The possibility of uncontrolled, unanticipated, or unknown differences between the groups in the experimental design must be considered whenever an experimental design is analyzed. Following is a list of eight extraneous variables identified by Campbell and Stanley which, if not controlled for in the design, may produce effects which are confounded with the experimental variables.

1) HISTORY -- Specific events occurring between testings in addition to the experimental variables.

Not applicable, since the design of this study did not involve a test, re-test situation.

2) MATURATION -- Processes within subjects operating as a function of the passage of time (fatigue, hunger, boredom, etc.).
The experimental session was short enough, and involved types of tasks sufficiently similar to what subjects perform in their usual work, that fatigue and hunger effects are not reasonable. Boredom was a problem with some of the subjects in at least one of the experimental conditions (high variability coupled with 1200 baud output rate). However, this change in the subject's attitude, from interest to boredom, was considered to be of importance in evaluating the subject's overall response to the system and its environment. Thus, though boredom itself was not measured directly, its consequences -- as indicated by the answers to the post-test questionnaire -- were of interest.

3) TESTING -- The effects of testing upon subsequent testing.

For those subjects who were involved with the pre-test, there was a sufficient amount of time between testing sessions (approximately four months) so that they approached the sessions with no apparent carry-over effects. In any event, subjects were assigned randomly to groups in the design, so that there appears not to have been any selection bias. See item 6 below.

4) INSTRUMENTATION -- Changes in obtained measurement due to changes in instrument calibration, or changes in the observers or judges.

All data was taken on-line. Specifically, the timing measurements, CPU time used, and count of keystrokes were compiled internally by the program. Also, the post-test questionnaire was administered on-line: subjects typed their answers into the machine and the answers were written to a file by the program.

5) STATISTICAL REGRESSION -- Also known as regression towards the mean. When groups are selected on the basis of extreme scores on a pre-test, we expect, regardless of any treatment differences, that each will re-test closer to the initial mean.

The subjects were not selected on the basis of any pre-test. Each subject was randomly assigned to one of four experimental groups at the beginning of the session.

6) SELECTION -- Bias resulting from differential selection of subjects for the experimental groups.

Random assignment of subjects to treatment groups is an effective way of avoiding the selection bias. By having the computer assign the subject to an experimental group, experimenter bias in the selection was avoided. To further reduce the possibility of having all of the females in one group, or all of the secretaries, etc., some attempt was made to balance the distribution of identifiable subject categories amongst the four comparison groups: male, female, secretarial, professional.

7) EXPERIMENTAL MORTALITY -- Differential loss of subjects from the experimental groups.

Two subjects were lost, in that their data was invalid due to system difficulties. This problem in internal validity, however, is of greater concern in a test, re-test situation. No data from subjects who were unable to complete the session is included in the task data or in the analysis of the post-test questionnaire. Subject mortality is of concern also when the reason for subjects dropping out is loss of interest, low scores, etc.

8) SELECTION-MATURATION INTERACTION -- Interaction effects between any of the above variables, which may be mistaken for the effects of the experimental variables.
T his confounding effect is again of greatest concern in test, re-test designs. O th er sources of error which may have an effect on the validity of the experim ental design include experimenter bias, reactive measures effects and ratin g errors. Attempts were made to control for each of these. In particular, to control for differential effects of experimenter bias, a consistent set of instructions was read to each of the subjects. Though the experim enter could determ ine the experimental group to which the subject was assigned, he was 51 careful to m aintain a non-obtrusive attitude towards all subjects. Because the session was essentially self-paced, little or no experimenter intervention was required or necessary, except for those specific cases in which the subject requested help. In such cases, the experimenter attempted to simulate the norm al help facility available in the message system. R ating errors were not a factor in these experiments, since all d ata was taken on-line. A more serious source of error was the effects o f the experim ental environm ent per se on the subjects. In particular, the possibility o f guinea pig effects - whereby the measuring and testing process itself, with the subject in an environment involving an observer -- changes the respondent and biases the results. However, since subjects were assigned random ly to groups, and the potentiality of guinea pig effects applied equally to all subjects, one may conclude that there would need be a selection, environm ent interaction for the guinea pig effect to influence the comparisons between treatm ent groups. T h e same problem exists for role selection effects, where a subject may assum e a role different from his natural behavior in situations sim ilar to, but outside, the research setting. Consistent instructions to subjects, including the statem ent that the tasks are not meant to be a test of individual perform ance, but rath er of the performance of the total man machine interaction, help to m inim ize this source of error. T here is no doubt, however, that the use of 52 reactive measures, an environment with an observer and a known test situation, may bias the results of this research. T o the extent th at the groups experienced essentially identical experimental environments, except for the differences in the independent variables, one may reduce the consequences of reactive measures. T he desirability of controlled observation, similar system param eters, lack of distractions, etc., make the reactive measures design seem the appropriate one. It would be useful in future research to be able to take the measurements unobtrusively, perhaps by gathering tim ing and other d ata on-line in the daily activities of users of a particular program m ing system. Instrum ented versions of interactive programs are not uncommon. T h e difficulty, however, is in controlling the working environm ent in order to m ake useful interpretations of the data gathered and to provide m eaningful controls so that useful perform ance comparisons may be made. E x te r n a l V alidity Factors influencing the external validity, or generalizability, of the research may be broken into two categories: those factors which relate to the population from which the subjects were sampled, and those which relate to the realm of interactive computer systems, of which the Travel Message System is one example. In fact, of course, the two concerns are similar. T h e subjects for this study were selected from the staff at ISI. 
At the lowest level, one may say that they are representative of the members of the ISI staff who have had experience with MSG. It is reasonable to conclude that there is nothing unique about the hiring practices of ISI, or the use of MSG, that would forbid a generalization of the subjects to a larger group. A better description of the population from which the subjects were drawn would be those who use interactive computer data base search and retrieval programs on a regular basis, where the interaction device is a 1200-2400 baud CRT and keyboard. Subjects who utilize slower TTY (down to 110 baud) devices as their usual means of interacting with a machine may have different expectations of the speed and variability of the computer output. As the cost of mechanical devices such as TTYs increases, and the cost of CRTs decreases, more and more interactive computer users will be utilizing the faster CRTs. Also, users of interactive systems involved in program development, text editing, or other non-search or retrieval-oriented tasks may similarly bring a different set of expectations to the interactive environment.

V. RESULTS

Subjects were randomly assigned to one of four groups in the factorial design described in the previous section. As indicated below, these groups are 1200 baud, low output variability; 1200 baud, high output variability; and 2400 baud, low and high output variability (see Figure V-1).

Figure V-1. 2x2x2 Factorial Design (output baud rate, 1200 and 2400, crossed with output variability, low and high; output volume is the repeated-measures factor).

In addition to the two independent variables, baud rate and output variability, there was a third variable, output volume. Subjects were measured at more than one volume level. The final experimental design, as discussed earlier, is a 2x2x2 factorial design, with repeated measures on one of the factors (output volume). One of the reasons for using the repeated measures design is to reduce the within-cell or error variation. Another reason to use this design is economy of subjects. Though there exists the possibility of confounding main effects with subject differences in the mixed design utilized here, random assignment of subjects to groups leads one to reject this possibility [Winer, 1971]. In the simplified design below (Figure V-2), any observed differences in the criterion (dependent) variable across A could be ascribed to either the effect of A upon subjects or the consistent differences between subject group G1 and subject group G2.

Figure V-2. Simplified Repeated Measures Design (two subject groups, G1 and G2, each measured at levels a1 and a2 of the repeated factor A, with G1 receiving level b1 and G2 level b2 of the between factor B).

Because of random selection of subjects, we will ascribe any differences across A to be A effects rather than subject difference effects.

ORGANIZATION OF SECTION

A brief discussion of the organization of this section is in order. First will be presented the results concerning the series of tasks that the subjects performed. These support the following conclusions:

CONCLUSION 1
There is a significant effect for output variability on user performance.

CONCLUSION 2
The effects of changes in the output rate on user performance are not significant.

CONCLUSION 3
There is a significant effect for output volume on user performance.

CONCLUSION 4
There is a differential effect for output variability on performance at different volume levels. At high output volume, the effect of increased variability is greater than at low volumes.
CONCLUSION 5
Those subjects experiencing the 1200 baud, low variability version of the system performed significantly better than those experiencing the 2400 baud, high variability version.

These conclusions and supporting data are presented below. Secondly will be presented the results of analysis of the post-test questionnaire. These results support the following conclusions:

CONCLUSION 1
Those subjects who were in the high variability conditions had a significantly poorer view of the interactive computer system and its environment than those in the low variability conditions.

CONCLUSION 2
There is a significant differential (interaction) effect between variability and output display rate upon the user's view and tolerance of the system.

CONCLUSION 3
There is a significant difference in the user's attitude towards the system and its environment between those who received the low variance, 1200 baud version, and the high variance, 2400 baud version.

CONCLUSION 4
There is no significant difference in the user's attitude towards the system and its environment between those in the 1200 baud version and those in the 2400 baud version.

CONCLUSION 5
Those subjects who felt that the system was too slow, or too variable in response, were also less satisfied with other (non-manipulated) features of the interactive environment.

TASK RESULTS

Data was pooled across all 11 tasks. Additionally, tasks were divided into ones requiring low output volume and those requiring high output volume. The task with the median output volume was eliminated in order to further enforce the high-low dichotomy. The main conclusion, that there is a significant effect for output variability on user performance, is supported, p<.05, across all volume levels. Analysis of variance summary tables for the repeated measures design, with repeated measures across the two volume levels (Table V-1), appear below and support this conclusion.

Table V-1 (a)
ANALYSIS OF VARIANCE SUMMARY TABLE
(Dependent variable: total time in seconds to complete tasks)

SOURCE             SS         df   MS         F        p
Variability        1370340    1    1370340    6.10     <.05
Baud               398        1    398        <1.00    N.S.
Var x Baud         54         1    54         <1.00    N.S.
Error (between)    7196608    32   224894
Vol                17743917   1    17743917   233.30   <.01
Var x Vol          583020     1    583020     7.67     <.01
Baud x Vol         3160       1    3160       <1.00    N.S.
Var x Baud x Vol   2962       1    3296       <1.00    N.S.
Error (within)     2434000    32   76063

Table V-1 (b)
ANALYSIS OF VARIANCE SUMMARY TABLE
(Dependent variable: CPU time used)

SOURCE             SS         df   MS         F        p
Variability        9.90       1    3.90       <1.00    N.S.
Baud               1245.80    1    1245.80    19.00    <.01
Var x Baud         4.70       1    4.70       <1.00    N.S.
Error (between)    2042.80    32   63.80
Vol                1099.00    1    1099.00    17.90    <.01
Var x Vol          20.00      1    20.00      <1.00    N.S.
Baud x Vol         599.20     1    599.20     9.80     <.01
Var x Baud x Vol   3.30       1    3.30       <1.00    N.S.
Error (within)     1962.00    32   61.30

Table V-1 (c)
ANALYSIS OF VARIANCE SUMMARY TABLE
(Dependent variable: keystrokes used)

SOURCE             SS         df   MS         F        p
Variability        80.20      1    80.20      <1.00    N.S.
Baud               288.00     1    288.00     <1.00    N.S.
Var x Baud         312.50     1    312.50     <1.00    N.S.
Error (between)    14035.00   32   440.50
Vol                53.40      1    53.40      <1.00    N.S.
Var x Vol          450.00     1    450.00     1.50     N.S.
Baud x Vol         555.00     1    555.00     1.90     N.S.
Var x Baud x Vol   501.40     1    501.40     1.70     N.S.
Error (within)     9570.00    32   299.10

In Table V-1 (a), note that the effects for Baud, Var x Baud, Baud x Vol, and the triple interaction are not significant (p>.05).
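To make the logic of these tables concrete, the following is a minimal sketch of the between-subjects part of the computation for the variability main effect, using hypothetical per-subject session totals rather than the study's data; scipy's f_oneway yields the same F ratio for this one-factor case.

```python
import numpy as np
from scipy.stats import f_oneway

# Hypothetical total task times (seconds) for the low and high
# output variability groups, pooled over baud rate.
low_var  = np.array([1810, 1950, 2100, 1700, 2240, 1880, 2020, 1790, 2150])
high_var = np.array([2380, 2510, 2290, 2700, 2450, 2600, 2330, 2560, 2480])

groups = [low_var, high_var]
grand_mean = np.mean(np.concatenate(groups))

# Between-groups and within-groups sums of squares.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within  = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between = len(groups) - 1
df_within  = sum(len(g) for g in groups) - len(groups)

# The F ratio is the between-groups mean square over the within-groups mean square.
F = (ss_between / df_between) / (ss_within / df_within)
print(round(F, 2))

# Same test via scipy.
print(f_oneway(low_var, high_var))
```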
It is instructive at this point to view the data as a 2x2 factorial design, as indicated in Figure V-3, combining the low and high baud groups. The mean times (in seconds) to complete the tasks are indicated in their proper cells, and the results are plotted in Figure V-4. This more clearly shows the effects of increased output variability and increased output volume.

Figure V-3. 2x2 Design, 1200 and 2400 Baud Combined. Mean times (seconds) to complete the tasks:

                            OUTPUT VARIABILITY
                            LOW        HIGH
OUTPUT VOLUME    HIGH       1400       1856
                 LOW         587        683

Figure V-4. Plot of time to complete tasks (seconds) vs. output variability, for the two output volume levels.

The nominal baud rates for the low and high variability conditions and the low and high baud rate conditions are indicated in Figure V-5. The numbers in brackets indicate the average baud rate over the row or column, as appropriate.

Figure V-5. Nominal Baud Rates:

                         OUTPUT VARIABILITY
                         LOW        HIGH
BAUD RATE    2400        2400       1200       [1800]
             1200        1200        600       [ 900]
                        [1800]      [900]

Since the low variability conditions yielded an average nominal baud rate of 1800, and the high variability conditions one of 900, we might expect differences in performance to accompany these differences. Similarly, the average difference in nominal rate between the 2400 and 1200 baud conditions is also 1800 vs. 900. But here, no significant performance differences are observed, p>.05. Thus the significant performance difference between the low and high output variability groups cannot be accounted for by differences in nominal output rates. The implications of this result will be discussed later.

In Figure V-5, note that there are two cells with identical nominal output rates: 2400 baud, high variability, and 1200 baud, low variability. Each yields a nominal 1200 baud output rate. It seems clear that any performance differences between these two cells can be attributed to output variability differences only. Furthermore, comparing these two cells allows a test of which users would prefer: smooth but slow output, or jerky but fast. Additionally, the post-test questionnaire offered an opportunity to compare attitudes of users, as well as their behavior (performance). A t-test was performed comparing these two cells of the factorial design. The mean difference in time to complete the tasks between cells was significant: t = 4.28, p<.01.

Performance measures other than time to complete tasks were taken, including CPU time used and number of keystrokes or functions used to complete tasks. The analysis of variance summary tables for these variables are presented above in Tables V-1 (b) and V-1 (c). The effects of changes in the output baud rate and the output variability on CPU time used can be accounted for by the algorithms necessary to implement the 1200 baud version and the high variability version.

For each subject on each task, three performance measures were obtained: time to complete task i, Ti; CPU time used in completing task i, Ci; and keystrokes used, Ki. In the traditional analysis of variance, these three performance measures are considered to be independent, uncorrelated, unidimensional quantities, and the effects that the independent variables may have on them are determined separately and independently. Three separate analyses are done, each determining the effects of the set of independent variables on only one of the dependent variables at a time.
W hen there is a reasonable interpretation which can be placed on the performance m easures in I a metric sense, then it becomes useful to attempt to interpret the observed ; differences. In the task data presented here, the interpretation is I • j straightforw ard: the high output variability groups took more time to complete i ! the tasks than the low variability groups. We can reasonably say th at their perform ance was "better.'' If we examine the num ber of keystrokes used in perform ing the tasks, it is not clear whether using more keystrokes should be considered "better" performance or "worse." In this case, it would be necessary to attach meanings to the num ber of keystrokes used before the d ata analysis is accomplished. Post-hoc (or really, ad-hoc) reasoning is inappropriate. Since there was no strong motivation for stating a priori whether more, or fewer, keystrokes were "better," we leave the presentation of the result in the form, o f I • stating that there was, or was not, an observed difference. In the analysis of ! the post-test questionnaire, reported below, this same problem of interpretation also appears. However, there it is possible to make a priori statements I i concerning the meaning of response differences. T his is reported in detail I f below. i ! i i i 66 D IS C U S S IO N A N D A N A L Y SIS O F T A SK R E SU L T S T h e variables of interest in this study were output baud rate and output I rate variability. A third variable, output volume, was used to control for differences that large vs. small amounts of material to be read would have on I the perform ance measures. T he rationale for selecting these variables, and the i I perform ance measures, is discussed earlier. T o the extent that these objective m easurable quantities capture the effects upon the user of differences in system param eters, they can be described as good, or useful indicators of effectiveness o f the interactive system from the user’s point of view. i In this study there are two sets of performance measures: those taken du rin g j the tasks - time, C PU , and functions - designed to objectively measure the ! user’s perform ance, and the attitude survey (post-test questionnaire), designed to m easure the user’s attitude towards the system and the interactive environm ent. I T h e effects of the independent variables on the.perform ance measures will i be discussed individually, then their combined effects will be discussed. M AIN EFFEC TS V a ria b ility E ffe c ts T h ere is a significant difference in time to complete the tasks, across the output variability. Further comparisons of the two conditions, 2400 b au d/high variability vs. 1200 baud/low variability were done in order to ascertain the possibility of confounding effects. These two cells produced equivalent nom inal output rates: 1200 baud (see Figure V-5). As described earlier, the variability algorithm was designed such that the total time to display N characters on the screen would be approximately double the am ount of tim e to display the same N characters without the variability. Therefore, any perform ance differences between these two conditions can reasonably be ascribed to variability of output differences rather than total output time differences. T he result of the comparison of performance differences between these two conditions is that there is a significant difference in the time to complete the tasks. 
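The cell comparison just described is an ordinary two-sample t-test on per-subject completion times. A minimal sketch follows; the times below are hypothetical values for illustration, not the study's measurements.

```python
from scipy.stats import ttest_ind

# Hypothetical total completion times (seconds) for the two cells with the
# same nominal output rate: 1200 baud / low variability vs.
# 2400 baud / high variability.
low_var_1200  = [1790, 1910, 1850, 2040, 1760, 1980, 1880, 1830, 1950]
high_var_2400 = [2310, 2480, 2250, 2590, 2400, 2530, 2360, 2440, 2290]

# Independent two-sample t-test on the cell means.
t_stat, p_value = ttest_ind(low_var_1200, high_var_2400)
print(round(t_stat, 2), round(p_value, 4))
```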
The amount of CPU time used also varies, but this may be attributed to the additional processing needed to implement the 1200 baud display rate and the high output variability. There was no significant difference, however, in the number of keystrokes used in performing the tasks.

The conclusion is that increasing the variability of computer output significantly decreased user performance in the interactive tasks. To the extent that the test population for these experiments represents a broader category of potential interactive systems users, and the test system is representative of a broader class of interactive systems, we may conclude that variability in display rate has per se a detrimental effect on user performance. The question of the generalizability of the results of this study, and the issues of reliability and internal and external validity, were introduced in the previous chapter and will be discussed further in the next.

Baud Rate Effects

In contrast to the significant effects on performance of changes in output variability, presenting the requested information in the interactive travel message system at 1200 baud vs. 2400 baud produced no significant differences in performance. In particular, even though the nominal baud rate was 1800 for the 2400 baud groups and 900 for the 1200 baud groups (see Figure V-5), there was no significant difference in times to complete the tasks for the two groups. This result holds across high volume as well as low volume tasks (i.e., the baud x volume interaction was not significant). It appears that both 1200 and 2400 baud display rates are faster than the typical subject can read, so that time to read a page of material depends on the individual's reading speed rather than system display rates. One would then conjecture a plateau in the curve of time to read a screen of text vs. display rate, somewhat like the one in Figure V-6. Apparently the plateau is reached at display rates of less than 1200 baud. In fact, 1200 baud corresponds to approximately 1200 words per minute, a rate considerably faster than the rate at which the average person reads.

Figure V-6. Conjectured graph of time to read a screenful of material vs. display rate, leveling off to a plateau at higher rates.

However, the result is still somewhat curious. A number of tasks required the subject to visually search message bodies for particular names. While it would seem reasonable to expect that doubling the display rate should lead to shorter times in completing those kinds of tasks, this was not observed to occur. We may conclude that doubling the display rate from 1200 to 2400 baud does not produce improved performance for the subjects and system of these experiments. It should also be noted that the total time to present the typical amount of material was only about seven percent of the total time subjects needed to perform the individual tasks. For example, for the high volume tasks, average total output was approximately 2000 characters; at 1200 baud, displaying this amount of material on the screen requires approximately 20 seconds. However, from Figure V-4, it is observed that the time to complete the high output volume tasks varied from 1400 to 1900 seconds, about 280 to 380 seconds for each of the five tasks, depending on output variability. Similar results hold for the low output volume tasks. This result makes the one involving output variability seem that much stronger.
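The display-time fraction quoted above follows directly from the character counts and rates; a quick arithmetic check (assuming, as is conventional for such serial lines, 10 bits per character):

```python
# Rough check of the display-time fraction quoted above.
# Assumption: 10 bits per character on the serial line, so 1200 baud
# moves about 120 characters per second.
chars_per_task = 2000                  # approximate output per high-volume task
chars_per_second = 1200 / 10           # ~120 characters per second at 1200 baud

display_time = chars_per_task / chars_per_second   # ~17 s, roughly the 20 s cited
print(round(display_time, 1))

# Against per-task completion times of roughly 280-380 seconds (Figure V-4),
# the display itself accounts for only a few percent of the task time.
for task_time in (280, 380):
    print(round(100 * display_time / task_time, 1), "percent of a", task_time, "second task")
```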
Exam ining Figure V-5 again, we note that there was a difference in average display rates across the two variability conditions. It is not immediately clear whether the effect for variability might be confounded with the average display rate. T h at there is no significant difference across baud rate leads us to reject that possibility. I n te r a c tio n E ffec ts O f the four interaction terms in the analysis of variance (variability x baud, variability x volume, baud x volume, and the triple interaction), only the variability x volume effect is significant. T h e influence that increased output variability has on performance is greater at higher output volumes than at lower volumes. T his result is not unexpected, since for this study the low 71 output volume tasks required less than a screenful of text while the high volume tasks required considerably more. Apparently subjects could tolerate the higher variability in output rate if it occurred over a relatively short period of time. Also, in the high output volume tasks, a greater percentage of the task i ! time was spent in reading output as opposed to the other actions required in I i j the tasks. T hough there has been little reported in the literature on the effects I • j on reading speed or visual search of continuously varying the display rate, reaction time experiments tend to support the view that increasing the variability of the stimulus significantly increases response tim e (c.f. M ack worth, 1970; Mostofsky, 1970; Davies and Tune, 1969). P O S T -T E S T Q U EST IO N N A IR E A questionnaire (see Appendix 3) was administered to each subject on-line immediately following his or her series of tasks. Because the questionnaire was presented using the same values of the output parameters as for the tasks, subjects received a consistent view of the system. T he questionnaire consisted of 18 questions relating to the complete system — its speed, use of keyboard, display features, screen size, etc. - and asked the subject to numerically rate the particular area of the question on a 5 point scale. T h e rating scale was presented to the subject before he or she began answering the questions. T h e following paragraph was presented: 72 Please answer the following questions with a numerical rating in the range of 1-5. 1= Very Poor, Unacceptable, etc. 5= Excellent, Completely Acceptable, Easy to Use, etc. (1-2 implies a generally negative response, 4-5 a generally positive one.) However, a specific num erical scale will be given for each question. Even though specific meanings were attached to the ratings for each question, all obeyed the same ordering: 1-2 indicated a negative reaction to the point o r feature of the question, 3 a generally neutral response, and 4-5 a generally positive response. T he questionnaire and scale selection were designed to follow standard practices in questionnaire and survey research instrum ents [Babbie, 1973]. T he responses make at least an ordinal scale. Subjects were instructed to think of the response scale as representing a continuum and to answer anywhere in the range 1-5. Though not a ratio scale, and probably not an interval scale, a num ber of analytical techniques are available for analyzing the d a ta T h e analysis of the questionnaire d ata is presented later in this section. T H E Q U E ST IO N S N ot all of the questions were of immediate relevance to the tasks the subject perform ed, or to the variables of interest in this study. 
They were included, however, in order to provide a more complete view of the subject's overall impression of, and response to, the system as he or she had just experienced it.

The questions can be conceptually broken into groupings which correspond to different features of the interactive system. For example, questions 1-4 dealt with the use and the completeness of the system's commands. Questions 5-8 dealt with the physical aspects of the display: screen size, character size and shape, sufficiency and readability of output, etc. Questions 9-13 dealt with computer and printing speed, and variation in those speeds. Finally, 14-18 dealt with the overall utility of the system.

The following table (Table V-2) presents the results for each question individually. The actual F ratios are presented. Those which were significant beyond the 1 percent level (p<.01) are indicated by **; those that were significant beyond the 5 percent level (p<.05) are indicated by *. If in fact the data were uncorrelated and independent, we could make a probability statement of achieving N significant F ratios in M measures, at significance level p. Since our data does contain correlations, with the same subject providing answers on all questions, our probability statement is weakened.

Table V-2
Significance of individual questions in the Post-Test Questionnaire
(all F ratios with df = (1,26); Fv = variability, Fb = baud, Fvxb = variability x baud)

Question   Fv       Fb      Fvxb
1          11.7     1.10    <1
2          <1       <1      <1
3          <1       <1      2.34
4          6.38     6.38    <1
5          1.92     1.44    <1
6          8.16     1.44    <1
7          <1       2.97    4.77
8          <1       <1      <1
9          6.56     <1      1.81
10         2.47     1.36    <1
11         16.35    <1      <1
12         17.31    <1      <1
13         4.02     <1      <1
14         4.93     <1      4.49
15         6.49     6.20    3.08
16         1.78     3.94    <1
17         1.16     2.26    <1
18         7.96     <1      5.62

The 18 questions making up the post-test questionnaire may be thought of as comprising an "index of satisfaction" of the user with the system. If we simply add up the responses of each subject, this sum may be considered the satisfaction index. Figure V-7 presents the average response for the subjects in each of the four cells of the factorial design, while Figure V-8 presents the results graphically. An analysis of variance was performed using this index as the dependent measure.

Figure V-7. Mean responses to the post-test questionnaire for the 2x2 factorial design:

                         OUTPUT VARIABILITY
                         LOW        HIGH
BAUD RATE    2400        3.9        3.4
             1200        3.8        3.0

Figure V-8. Graph of average response to the Post-Test Questionnaire vs. output variability, for 1200 baud and 2400 baud.

The results of the analysis of variance are presented in the following table (Table V-4 (a)). We note the strong effect for output variability on the average answer to the post-test questionnaire.

Table V-4 (a)
ANALYSIS OF VARIANCE SUMMARY TABLE
(Dependent variable: average answer to the 18 questions of the post-test questionnaire)

SOURCE         SS     df   MS     F      p
Variability    2.77   1    2.77   19.4   <.01
Baud           0.43   1    0.43   3.0    N.S.
Var x Baud     0.29   1    0.29   2.1    N.S.
Error          3.40   24   0.14

If we view the data as an 18-dimensional quantity, then it is reasonable to compare not the sum (or average) of the answers but, rather, the norm of the answer vector in 18-space. This norm is given by the square root of the sum of the squares of the answers to the individual questions, |Q| = (sum of qi^2)^(1/2), where qi is the answer to question i and |Q| is the norm of the subject's answers.
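Both summary scores are simple to compute from a subject's 18 answers; a minimal sketch follows, in which the answer vector is made up for illustration and the four sub-index groupings anticipate the [C], [D], [S] and [U] breakdown discussed below.

```python
import numpy as np

# Hypothetical answers of one subject to the 18 questions (scale 1-5).
q = np.array([4, 3, 4, 5, 3, 4, 2, 4, 3, 3, 2, 3, 4, 4, 3, 4, 5, 4])

satisfaction_index = q.sum()                   # simple sum of the 18 answers
average_answer     = q.mean()                  # per-question average, as in Table V-4 (a)
norm_Q             = np.sqrt((q ** 2).sum())   # |Q|, the norm of the answer vector

# Sub-indices for the four question groups:
# [C] commands 1-4, [D] display 5-8, [S] speed 9-13, [U] utility 14-18.
groups = {"C": q[0:4], "D": q[4:8], "S": q[8:13], "U": q[13:18]}
sub_indices = {name: answers.mean() for name, answers in groups.items()}

print(satisfaction_index, round(average_answer, 2), round(norm_Q, 2))
print(sub_indices)
```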
The analysis of variance summary table using |Q| as the dependent variable, with nominal output rate (baud) and output rate variability as the independent variables, is presented below (Table V-4 (b)).

Table V-4 (b)
ANALYSIS OF VARIANCE SUMMARY TABLE
(Dependent variable: norm of the answers to the 18 questions of the post-test questionnaire)

SOURCE         SS     df   MS     F      p
Variability    0.14   1    0.14   4.67   <.05
Baud           0.12   1    0.12   4.00   N.S.
Var x Baud     0.08   1    0.08   2.67   N.S.
Error          0.75   24   0.03

In both of the analyses, using either the sum of the answers to the 18 questions or the norm of the answers, there is a significant effect for output rate variability but not for nominal rate differences. Both analyses allow an unequivocal view that users experiencing the high variability versions of the system expressed a significantly lower view of the system, its commands, its display, its speed, and its overall utility than those experiencing the low variability versions of the system. The questionnaire is further broken down later by questions relating to commands, display, speed and utility.

INTERACTION EFFECTS

In questions 3, 7, 9, 14, 15, and 18 an interesting pattern of interaction effects was observed. Analyzing the questions individually does not provide adequate sensitivity as to whether this pattern represents random (uncontrolled error) effects in the data or can reasonably be ascribed to differences in the display variables of this study. It is instructive to examine the nature of the V x B interaction. In all cases, the relationship between output baud rate (B), output variability (VAR), and group mean answer is as presented in Figure V-9.

Figure V-9. Graph of the typical response pattern vs. output variability, for the questions on which the interaction was significant, plotted separately for the 1200 baud and 2400 baud groups.

In each case, for the low variability condition, the subjects experiencing the 2400 baud version of the system had higher mean responses than those experiencing the 1200 baud version. This result is expected, and corresponds to the performance of these two groups in the tasks.

It is also noted in Figure V-9 that the mean answer for the 2400 baud, high variability group is lower than that for the 2400 baud, low variability group. This result is also consistent with the performance data on the tasks.

The interesting case, however, is that for those who experienced the 1200 baud version of the system. Those who experienced the 1200 baud, high variability version of the system had a pattern of higher mean responses than those experiencing the 1200 baud, low variability version of the system, or at least, their answers were not significantly lower.

It is useful to examine the subjects' answers to individual questions within groups, and the average response to the questions in each of the groups. For ease of later reference, the groupings of questions will be denoted [C] for 1-4 (commands), [D] for 5-8 (display), [S] for 9-13 (speed), and [U] for 14-18 (utility). Table V-5 presents the analysis of variance summary tables for the average response within groups. The analysis of variance summary tables for the individual questions have been presented previously (see Table V-2). As noted earlier, if we examine the results for a main effect of output variability upon the answers, we find that there is a significant effect on questions 1, 4, 6, 9, 11-15, and 18.
When the answers are viewed as groups, each representing a common component of the interactive environment, we find that there are significant differences between subjects in low vs. high variability groups in the average response within the three groups [C], [S] and [U], but not for [D].

Table V-5
ANALYSIS OF VARIANCE SUMMARY TABLES
(Post-Test Questionnaire by question groups: [C] questions 1-4; [D] questions 5-8; [S] questions 9-13; [U] questions 14-18)

[C]
SOURCE         SS      df   MS     F       p
Variability    1.38    1    1.38   3.93    <.05
Baud           0.14    1    0.14   0.40    N.S.
Var x Baud     0.39    1    0.39   1.10    N.S.
Error          8.45    24   0.35

[D]
SOURCE         SS      df   MS     F       p
Variability    0.93    1    0.93   2.45    N.S.
Baud           2.04    1    2.04   5.41    <.05
Var x Baud     0.13    1    0.13   0.35    N.S.
Error          9.06    24   0.38

[S]
SOURCE         SS      df   MS     F       p
Variability    7.26    1    7.26   16.24   <.01
Baud           0.04    1    0.04   0.09    N.S.
Var x Baud     0.04    1    0.04   0.09    N.S.
Error          10.70   24   0.45

[U]
SOURCE         SS      df   MS     F       p
Variability    2.64    1    2.64   13.63   <.01
Baud           0.83    1    0.83   4.31    <.05
Var x Baud     1.02    1    1.02   5.26    <.05
Error          4.65    24   0.19

Figure V-10 presents graphically the average response to the questions within the four groups ([C], [D], [S] and [U]) vs. output variability, for 1200 and 2400 baud.

Figure V-10 (a). Graph of average response to the [C] questions vs. output variability.
Figure V-10 (b). Graph of average response to the [D] questions vs. output variability.
Figure V-10 (c). Graph of average response to the [S] questions vs. output variability.
Figure V-10 (d). Graph of average response to the [U] questions vs. output variability.

If we (cautiously) embed our answers in a metric space, we are able to get a better insight into the nature of the differences within each group of questions. We may thus say that for those questions relating to system commands, [C], those experiencing the low output variability condition had an average response which was higher, and thus indicated better facility with the system, than those experiencing the high variability conditions. The same was true for those questions relating to system speed, [S], and overall system utility, [U].

DISCUSSION AND ANALYSIS OF POST-TEST QUESTIONNAIRE

The above analyses of the subjects' answers on the post-test questionnaire were made in order to provide insight into the overall view of the system that the subject would have upon completing a typical search and display session with an interactive system. Some of the results need to be examined in depth, since they may be at variance with intuition.

A fundamental conclusion, which is supported both by the task data and the questionnaire data, is that the nominal output baud rate, at 1200 baud vs. 2400 baud, has at best a very weak effect upon the user's performance and attitude towards the system. Specifically, over a number of tasks, involving both low and high output volumes, there was no significant performance difference between those receiving the 1200 baud version and those receiving the 2400 baud version. This result is observed across the low volume tasks, across the high volume tasks, and across the low output variability groups and the high output variability groups. On questions 3, 7, 9, 14, 15 and 18, an interaction as indicated in Figure V-9 was obtained. In these questions, the average response of the 2400 baud group, in the low variability condition, was
higher than in the 1200 baud group. In question 7, the 1200 baud group had a higher average response than the 2400 baud group, in the high variability condition. These six questions, 3, 7, 9, 14, 15 and 18, provide an interpretation problem . T h e result for the low variability condition is expected, and in virtually all of the questions, the 2400 baud answers were higher than the 1200 baud. For those experiencing the high variability version of the system, the answers to the questionnaire were more interesting. In a / / of the questions except 10 (variation in computer system speed) and 12 (variation in printing speed), those experiencing the 1200 baud version of the system have h ig h e r answers than those experiencing the 2400 baud version. As indicated previously, these differences are generally not significant on an individual question basis, but when the questions are viewed in a multidimensional sense, these differences do tend to become significant. 8 9 T h e answers to the questions might tend to imply that if we are faced with a system which suffers from a great deal of variability in output rate (such as a heavily loaded interactive system, or a user interacting with a host com puter through a communications interface processor), the nominal output rate should, | be adjusted to be about 1200 baud (if the selection is 1200 vs. 2400). T h is l j conclusion is not supported by a more thorough analysis of the interactive environm ent Virtually all subjects in the high variability/1200 baud version voiced their frustration at the slow, unsteady nature of the output. O n e subject, in fact, refused to continue the experiment and another became openly hostile towards the experimenter. T heir data are not included in the above analyses. W hile subjects in the other conditions were able to m aintain interest in the tasks and found the session a reasonable approximation to potential real-world tasks, those in the 1200 baud/high variability group had greater difficulty m aintaining interest or motivation. Many felt that the tasks were putting undue burdens upon them, given the system that they had to accomplish the tasks. These conclusions seem reasonable because of two supporting results. Firstly, the experim enter notes of the comments of subjects during the sessions dem onstrate the need for greater computer power (functions, speed) for this group (see C hapter V I for further elaboration). Secondly, on some of the questions dealing with system functions and speed, subjects in this group had 9 lower answers. T h u s there appears to be an indication of deteriorating attitude towards the system by subjects in this group, which matches their decreased perform ance in the tasks. j Because subjects in this 1200 baud/high variability group had lower ! | interest or motivation towards the tasks, because their frustration and dislike of I the system appeared to increase during the experimental session, their general attitude towards the post-test questionnaire was one of relief th at the agonizingly slow series of tasks had been completed. T h eir responses to questions were given not so much as a result of thoughtful consideration of the m eaning of the question and the best selection of possible alternative answers, b ut rather as a result of a desire to hurry onto the next question and end the session as quickly as possible. 
It is clear from reading the questions that some do not make immediate contact with the exasperating nature of the system, and so require the subject to think about the answer and its implications. Questions such as 5 (screen large enough) or 16 (need for more materials on functions available) fit into this category. When the questions dealing with speed and variation (9-13) are examined, we find the expected relationship of this cell to the others. The table below (Table V-6) presents the rank order of the 1200 baud, high variability group on the [S] questions, where 1 means this group had the highest mean response and 4 the lowest.

Table V-6
Ranks of questions in the Speed group

Question   Rank of 1200 baud, high variability group
9          3
10         4
11         4
12         4
13         4

On those questions which most closely matched the strong feelings that the subjects in this group had towards the system, their answers reflect their behavior and their expressions during the session.

Examining the questions by groups, we notice a strong effect of output variability upon the attitudes of the users towards the system on three of the four question groups. Subjects experiencing the high variability conditions had a lower response index to the questions in the [C], [S] and [U] groups. Examining more closely the meaning of the [C], [D], [S] and [U] indices, the following are concluded:

(1) High output variability subjects perceived the command structure as less adequate to their needs. As indicated earlier, the experimental design assumed a fixed terminal type with its own display characteristics. The HP terminal used in these studies works on a scrolling method where each new line of output is presented on the bottom of the screen and all lines above scroll up. The topmost line is lost as each new line is appended at the bottom. Some of the apparent dissatisfaction with the command language (and the terminal itself) may be associated with the scrolling typical of the HP and other terminals.
For example, looking at those questions which correlate significantly with the [S] questions, 9-13 (Table V-7) allows one to identify those aspects of the interactive system with which a user is least satisfied as the system becomes m ore stressful. It would be expected, of course, that there would be significant correlations between questions within the [S] grouping, and this is observed. Identifying those questions outside of the [S3 group which correlate with questions within the group, the following are concluded: (1) Those who perceived the system as being slower had a significantly poorer view of the ease of using the commands and the overall utility of |h e system, felt a need for more materials on the system, and generally had a lower overall view of system output, than those who perceived the system as being ! • " i relatively faster. (2) Those who perceived the system as being relatively high in variability of output and processing speed had a significantly lower view of the ease of using the commands, found the brightness and size of the display screen less 94 satisfactory, and found that the data presented was less sufficient for th eir needs than those who perceived the system as having relatively little variability in output or processing speed. Table V-7 Q uestion Questions with which it correlates Significantly (p<.05) 9 10,11,12,.13 10 4,5,8,9,12 11 9,12,13,14,18 12 1,5,9,10,13 13 9,11,12,16,18 I i 95 VI. C O N C L U S IO N S AND R E C O M M E N D A T IO N S T h e m ajor emphasis of the research presented in this dissertation is th at there are a num ber of parameters of the man-machine interaction which affect I the perform ance o f the user. Specifically, it was hypothesized that changes in 1 the nom inal display rate of the presentation of computer output, and the j variability in the display rate, would have significant effects on the i perform ance and attitudes of the users of the man-machine system. Reviewing the extensive statistical material presented in C hapter V , we find that, contrary to initial conjecture, doubling the display rate from 1200 baud to 2400 baud has no apparent effect on user performance in the interactive tasks o f this research. Also of interest, doubling the display rate from 1200 to 2400 baud has no apparent effect on the attitudes of the users towards the interactive system, its command structure, its display features (screen, characters, etc.), the speed of the system, or the overall utility of the system. In fact, in reviewing T able V-2 again, we note that in only two of the questions in the post-test questionnaire (question 4 -- screen brightness, question 15 — perform tasks by hand) was there a significant effect for baud rate on the subjects’ answers. Even for the low output variability groups, the differences in the I : answers to the post-test questionnaire between those subjects receiving the 1200 baud version and those receiving the 2400 baud version were not significant. 96 W hen the effects of variability of display rate upon perform ance and attitude are evaluated, the situation is entirely different. In both the perform ance of the subjects on the interactive tasks and the attitude measures tow ards the system evaluated via the post-test questionnaire, subjects receiving high variability versions of the system performed significantly more poorly, and h ad a significantly poorer attitude towards the system, than those receiving the low variability versions. 
T his effect is significant at both the 1200 baud and 2400 baud levels, and increases as the volume of output material increases. In fact, because of the significant interaction effect between output volum e and output rate variability, we see that the effects upon performance are m agnified at the greater output volumes. In the direct comparison between the two cells of the factorial design where the nominal output rates are the same (1200 baud low output variability vs. 2400 baud high output variability), the perform ance differences between the two groups is significant and quite strong at the high output volume level. T h is research has found that doubling the display rate of system output to the user of an interactive message processing system does not im prove perform ance, nor does it lead to an improved view 'of the system or attitude /) tow ards the system on the part of the user. W hat then are the effects of increasing system output, and what ways might be effective in both im proving user perform ance and improving attitude? At this point, a confounding 97 problem occurs. It has been a (seemingly not unreasonable) assumption on the p art of system designers that increasing the display rates leads to better perform ance in interactive systems. If we could guarantee that the variability in the display rate were held constant as the display rate were increased, the results of these experiments allow us to conclude that performance and attitude are not dim inished. They may even be improved, though this does not appear to be the case in this research. So there is certainly no immediately apparent draw back to providing faster displays. However, as systems become heavily loaded, increased display rates are associated with increases in the variability. T h e actual display rate may not be improved, and the results of this research strongly demonstrate that performance is decreased and that user attitudes tow ards the system deteriorate. These conclusions are so strongly supported by the d ata presented (see C hapter V) that a general recommendation to system designers would have to be that increasing output display rates should not be attem pted without a corresponding increase in CPU power. Subjects who received the high variability versions of the system experienced frustration and demonstrated a poorer view of the system and its environm ent in ways other than poorer performance and more negative questionnaire responses. For many of the subjects who participated in the experim ents, their experiences with the system subjected them to conditions which were beyond their normal use of an interactive system. In particular, 98 those subjects in the high variability conditions tended to express the opinion th at they were utilizing a system through a heavily loaded term inal interface processor (T IP) port [Bolt Beranek and Newman, 1974]. Such was the agonizingly slow response of the system at times that the functions provided to perform the series of tasks were clearly inadequate. Those subjects receiving the low variability versions of the system did not tend to express such negative opinions of the system and generally found the experience to be a reasonable approxim ation of the types of tasks that they (or at least the person they were sim ulating) might have to perform on a regular basis. It is interesting to com pare some of the responses to the open-ended question on the post-test questionnaire (see Appendix 3). 
Subjects who received the high variability versions of the system experienced frustration and demonstrated a poorer view of the system and its environment in ways other than poorer performance and more negative questionnaire responses. For many of the subjects who participated in the experiments, their experiences with the system subjected them to conditions which were beyond their normal use of an interactive system. In particular, those subjects in the high variability conditions tended to express the opinion that they were utilizing a system through a heavily loaded terminal interface processor (TIP) port [Bolt Beranek and Newman, 1974]. Such was the agonizingly slow response of the system at times that the functions provided to perform the series of tasks seemed clearly inadequate. Those subjects receiving the low variability versions of the system did not tend to express such negative opinions of the system and generally found the experience to be a reasonable approximation of the types of tasks that they (or at least the person they were simulating) might have to perform on a regular basis. It is interesting to compare some of the responses to the open-ended question on the post-test questionnaire (see Appendix 3). The material presented below is not complete, but has been selected from both low and high output variability groups in order to illustrate the point.

Selected responses from subjects in the high output variability groups:

Multiple mess, very useful but slightly confusing. Losing the page length in the middle of a search was a real pain. ...Output delays would become more of a pain as the proficiency increased. ...System was generally quite easy to use. Since questions were oriented toward info imbedded in the mess text it would be very useful to have the mess info either cross referenced or available to the command structure.

Another subject:

I felt that the system was lacking in two basic areas. First of all, there was a deficiency in the functionality of the system. Too much data had to be scanned in order to answer the questions at hand. The addition of a few more commands would have greatly eased the tasks. Secondly, and far more important, was the fact that the system response was so incredibly poor. I felt myself thinking of the wasted time involved. It wasn't even consoling to think of the time needed to complete the tasks by hand. The thing that stuck out most to me was that I felt that I wasn't being useful (to myself or anyone else) when I was waiting on the machine. In that sense, poring over a listing by hand, and possibly taking longer to complete the task, would have been more satisfactory.

Another subject:

The system (message system) would have been very easy to get used to - definitions of commands etc. Would very much have liked a means to search the body of the messages for strings. Found the long delays from the computer made the job really dull - it would have been more interesting if the information could have been gathered quickly and easily (the text string search).

Selected responses from subjects in the low output variability groups:

The subcommand "Multiple" I feel should take more than a one-line string. For example I would like to be able to type From (string) and From (string) again and have the message service give me both From strings. I feel in general that this message program does its tasks well in that it gives the user the facts he needs quickly and reliably.

Another subject:

I would like to see more commands similar to XED "Find" and "Search" or a method of reading all selected messages into a buffer to search for key words or phrases.

Another subject:

There should have been more effort in being able to find names in the body of the message. Also, since there seemed to be no distinction between the sender and the rest of the people on the trip, they should have equal status for searching etc. It should also process dates into the standard format before the search (I don't want to remember the exact format - especially which months are abrev. and which aren't). The system speed was acceptable. I just want more power. The output format should have been less verbose, i.e., find the people's names in the messages and just print those.

Quite often production systems are developed and implemented in an ad hoc manner without adequate regard for the stresses that a heavily loaded system can place upon the performance and attitude of the user of the system.
Apparently, from the nature of the responses to the open-ended question, a system designer can gain valuable insight into the needs, requirements, typical modes of interaction, etc., of users by observing a large number of potential users actually using the system, or a simulated version of the system, under a number of different conditions. This conclusion was one of the motivations for the research reported in Heafner and Miller [1976]. In that work, the authors demonstrated the utility of observing and questioning a large number of potential users of a military message processing system. Unlike the experimental paradigm used in the research reported in this dissertation, however, the authors of that paper were not only concerned with the average response of subjects to a series of questions, but were also interested in the small quantum of additional insight that each subject could provide. It was only through a careful series of probing questions that this additional information was elicited. A fortunate additional result of the research reported here, then, is the further demonstration of the utility of performing well designed and controlled testing and observation of a system before putting that system into general use.

FUTURE STUDIES AND EXTENSION OF RESEARCH

The research reported here, in addition to providing useful insights into the display variables which affect the performance of the user in MMI, also establishes a methodology which may prove of use to future systems designers. The research began by attempting to develop a mathematical model of MMI. It is apparent that there would need to be both man (user) oriented variables and machine oriented variables in the model in order to adequately describe user behavior. A systems designer, however, may be less interested in developing a general description of MMI and more interested in predicting user performance with a particular population, a particular set of values of the display variables, and a particular MMS. To the extent that the general model proposed here adequately describes user performance, and the performance measures used are of value to the designer, all that may be required is simply to solve the equations for performance by fixing the parameters. There may still be concern, however, as to the adequacy of the performance measures used in this research. Also, if the target user population is not relatively homogeneous, then steps must be taken to ascertain those user-oriented variables (I.Q., general attitude towards computers, typing ability, specific computer or system experience, etc.) which also affect user performance. Heafner [1974] and Carlisle [1974], among others, postulate, or demonstrate, the effects that user characteristics have upon performance in interactive problem-solving tasks.

A number of the results presented in the previous chapter are not easily explained within the traditional confines of computer science and human factors research. The unusual pattern of interactions between baud rate and output variability upon the answers to questions 3, 7, 9, 14, 15 and 18 (see Figure V-9) has been discussed in some detail in Chapter V, but it is interesting to re-examine the results of the post-test questionnaire at this point as a means of pointing out the need for a broader theoretical base upon which to interpret the results. These six questions do not appear to have any immediately observable properties in common.
In fact, they represent all four of the question groups discussed in Chapter V: Command questions [C], Display questions [D], Speed questions [S] and the General Utility questions [U]. One of the most startling aspects of the responses to the post-test questionnaire is that only three questions, 10, 12 and 13, yielded results similar to what was conjectured before the study began: that there would be main effects for baud rate and output variability, with nonsignificant interaction effects. The conjectured results are presented below (Figure VI-1).

[Figure VI-1. Prototypical Results of Post-Test Questionnaire: conjectured performance plotted against output variability (low to high); the figure itself is not reproduced in this transcript.]

The deviations from the "expected" results generally took the form of non-significance of the effect for difference in display rate. Even for the [S] (speed and variability) group of questions (9-13), whether evaluated on a question-by-question basis or via the "index of satisfaction" (the average answer over the questions within the group), there was a surprising lack of a baud rate effect. For the same "index of satisfaction" analysis, using the output variability as the independent variable, the effect was significant. Specifically, it would be of value to explore in greater depth the reasons why those subjects who received the slowest, most frustrating version of the system -- 1200 baud, high variability -- had responses to the questionnaire which in general were higher than those who received the 2400 baud, high variability version. Potential motivational reasoning was explored briefly in Chapter V. It seems apparent that there ought to be a discipline of computer interaction, or computer programming, psychology which would provide a theoretical basis for a synthesis between cognitive psychology and computer science.

As indicated above, the results obtained in these experiments are generalizable to other users and other interactive systems only if we are willing to accept as reasonable the description of the population from which the subjects were sampled, and of which the Travel Message system is an example. Furthermore, it seems apparent that there are motivational and cognitive processes occurring which require an analysis outside the scope of this research. Though it may be reasonable to interpolate the results to values of the independent variables between the extremes tested here, the extrapolation to values outside the range tested is unjustified. As terminals with faster display rates become available, the conclusion of no performance effect from increasing baud rates ought to be subjected to further scrutiny. If in fact increasing the display rate to 4800 baud, 9600 baud, or higher fails to produce significant user performance improvements, then system designers ought to be spending their resources in other areas. One area specifically identified in this research is the reduction of output rate variability, which can be decreased either by decreasing the nominal display rate or by increasing the power of the CPU. The research reported here supports the conclusion that, within the limits of the variables studied, if decreasing the nominal baud rate from 2400 to 1200 baud decreases the variability of the output, performance improves or is not decreased.

BIBLIOGRAPHY

Ambrozy, Denise, "On Man-Computer Dialogue," Int. J. Man-Machine Studies, Vol. 3, 1971, 375-383
Babbie, Earl R., Survey Research Methods, Wadsworth Publishing Co., Inc., Belmont, Ca., 1973

Bennett, John L., "The User Interface in Interactive Systems," in Cuadra, Carlos A. (ed.), Annual Review of Information Science and Technology, Vol. 7, ASIS, Washington, D.C., 1972

Boies, S. J., "User Behavior on an Interactive Computer System," IBM Systems Journal, No. 1, 1974, 2-19

Bolt Beranek and Newman, Terminal Interface Message Processor User's Guide, Report No. 2183, NIC No. 10916, Bolt Beranek and Newman, Inc., Cambridge, Mass., 1974

Burchfiel, Jerry D., Elsie M. Leavitt, Sonya Shapiro & Theodore Strollo, TENEX Users' Guide, Bolt Beranek and Newman, Inc., Cambridge, Mass., 1975

Campbell, Donald T. & Julian C. Stanley, Experimental and Quasi-Experimental Designs for Research, Rand McNally and Co., Chicago, 1963

Carlisle, James H., Man-Computer Interactive Problem Solving: Relationships Between User Characteristics and Interface Complexity, Ph.D. Thesis, Yale University School of Organization and Management, 1974

Cooper, William S., "On Selecting a Measure of Retrieval Effectiveness," JASIS, March-April, 1973, 87-100

Cooper, William S., "On Selecting a Measure of Retrieval Effectiveness, Part II: Implementation of the Philosophy," JASIS, Nov.-Dec., 1973, 413-424

Davies, D. R., & G. S. Tune, Human Vigilance Performance, American Elsevier Publishing Company, Inc., New York, 1969

De Greene, Kenyon B., "Man-Computer Interrelationships," in De Greene, Kenyon (ed.), Systems Psychology, McGraw-Hill, New York, 1970, 281-336

Hansen, James V., "Man-Machine Communication: An Experimental Analysis of Heuristic Problem-Solving Under On-Line and Batch-Processing Conditions," IEEE Trans. Systems, Man, and Cybernetics, Vol. SMC-6, No. 11, November, 1976, 746-752

Hansen, Wilfred J., "User Engineering Principles for Interactive Systems," in FJCC 1971, 523-532

Heafner, J. F., A Methodology for Selecting and Refining Man-Computer Languages to Improve Users' Performance, University of Southern California, Information Sciences Institute Research Report ISI/RR-74-21, September, 1974

Heafner, J. F. & Lawrence Miller, Design Considerations for a Computerized Message Service Based on Triservice Operations Personnel at CINCPAC Headquarters, Camp Smith, Oahu, University of Southern California, Information Sciences Institute Working Paper WP-3, September, 1976

Kerlinger, Fred N. & Elazar J. Pedhazur, Multiple Regression in Behavioral Research, Holt, Rinehart and Winston, Inc., New York, 1973

Mackworth, Jane F., Vigilance and Attention: A Signal Detection Approach, Penguin Books, Middlesex, England, 1970

Martin, Thomas H., James Carlisle, & Siegfried Treu, "The User Interface for Interactive Bibliographic Searching: An Analysis of the Attitudes of Nineteen Information Scientists," JASIS, March-April, 1973, 142-147

Melnyk, Vera, "Man-Machine Interface: Frustration," JASIS, Nov.-Dec., 1972, 392-401

Miller, Lance A., & Curtis A. Becker, Programming in Natural English, IBM Research Report RC 5137, November, 1974

Miller, Robert B., "Response Time in Man-Computer Conversational Transactions," in American Federation of Information Processing Societies, Fall Joint Computer Conference, San Francisco, California, 1968, Proceedings, Vol. 33, Part I, Thompson Book Company, Washington, D.C., 1968, 267-277
Mostofsky, David I. (ed.), Attention: Contemporary Theory and Analysis, Appleton-Century-Crofts, New York, 1970

Nickerson, Raymond S., Jerome I. Elkind and Jaime R. Carbonell, "Human Factors and the Design of Time Sharing Computer Systems," Human Factors, Vol. 10, No. 2, 1968, 127-134

Sackman, H., & Ronald L. Citrenbaum (eds.), Online Planning: Towards Creative Problem Solving, Prentice-Hall, 1972

Salton, G., Interactive Information Retrieval, Cornell University, Department of Computer Science Technical Report No. 69-40, August, 1969

Salton, G., "Dynamic Document Processing," CACM, Vol. 15, No. 7, July, 1972, 658-668

Seven, M. J., B. W. Boehm & R. A. Watson, A Study of User Behavior in Problem Solving with an Interactive Computer, Rand Report R-513-NASA, April, 1971

Shneiderman, Ben, "Exploratory Experiments in Programmer Behavior," International J. of Computer and Information Sciences, Vol. 5, No. 2, 1976, 123-143

Smith, Robert, Tenex Sail, Stanford University, Institute for Mathematical Studies in the Social Sciences, Technical Report No. 248, January, 1975

Sterling, Theodor D., "Guidelines for Humanizing Computerized Information Systems: A Report from Stanley House," CACM, Vol. 17, No. 11, November, 1974

Swets, John A., et al., Information Processing Models and Computer Aids for Human Performance, BBN Report No. 2008, Bolt Beranek and Newman, Inc., Cambridge, Mass., 31 July, 1970

Tomeski, Edward A. & Harold Lazarus, People-Oriented Computer Systems, Van Nostrand Reinhold Company, New York, 1975

VanLehn, Kurt A. (ed.), Sail User Manual, Stanford University Computer Science Department Report STAN-CS-73-373, July, 1973

Walker, Donald E. (ed.), Interactive Bibliographic Search: The User/Computer Interface, AFIPS Press, Montvale, N.J., 1971

Walther, George H. & Harold F. O'Neil, Jr., "On-Line User-Computer Interface: The Effects of Interface Flexibility, Terminal Type, and Experience on Performance," in National Computer Conference, 1974

Willmorth, N. E., "Human Factors Experimentation in Interactive Planning," in Sackman, H. & Ronald L. Citrenbaum (eds.), Online Planning: Towards Creative Problem Solving, Prentice-Hall, 1972, 281-313

Winer, B. J., Statistical Principles in Experimental Design, McGraw-Hill, New York, 1971

APPENDIX 1

INSTRUCTIONS TO SUBJECTS

You are about to participate in an experiment that is designed to help us produce interactive programs which are better and easier to use. In order to do this, we have selected an environment which allows us to observe a typical interaction session.

We are not trying to disguise what we are doing: it is important to point out that this is a test of certain aspects of the system you will be using, and is in no way a test of individual performance. In a sense, there are no right or wrong answers to the set of tasks and questions you will be asked to perform. In fact, you may assume that certain aspects of the system have been intentionally designed to be less than optimal, in order to determine whether they do in fact affect user performance. Different people in these experimental sessions will be performing their tasks with different versions of the system, and a comparison of the grouped data will be made.
You are to assume that you are a clerk in the travel department of a company. Individuals in the company make requests to the travel department for flights to various cities, using a computerized message creation and transmittal system. You utilize a modified version of the MSG system, renamed the Travel Messages Processing System, to access the data base of travel requests. You may assume that each request for a flight actually ended up in a flight, i.e., any cancellation of a request caused the initial request to be purged from the data base.

In your position as clerk in the travel department, you will be given a set of tasks relating to these travel requests. For example, who wanted to travel to N.Y. on such and such a date, etc. These tasks are answered by making searches through the data base and reading the retrieved messages. There are eleven tasks to be performed, plus two additional sample tasks at the beginning which will allow you to test your understanding of the system, experiment with the commands available, and determine what typical messages in the data base look like. Reference materials available for this session include the list of commands, the instructions for working the sample tasks, and the experimenter, who will answer questions concerning the use of the system's functions and commands.

After you have completed the series of tasks, you will be asked to complete a questionnaire relating to your experiences with the system. It is important to understand that this questionnaire asks you to rate certain features of the system on a numerical scale. Please do not hesitate to give a negative rating if you have negative feelings regarding a particular area. Similarly, do not hesitate to give a positive rating if you feel positively towards a question. Also, you must attempt to answer these questions based on your current use of the Travel Messages Processing System. Please do not answer based on your general knowledge of MSG or computers. Since questionnaires tend to include areas where you may have no strong opinion, and exclude areas where you may have a strong opinion, there is a free-text input question at the end, which allows you to express your general comments on the system, including those areas which you feel were not adequately covered in the previous questions.

It is important to emphasize that your participation in this experiment is voluntary. You may withdraw from this experiment at any time. Though there may be no immediate benefits to you from this experiment, it is hoped that the results of this research may guide system designers in the future in producing interactive programs which are easier to use.

APPENDIX 2

TASKS TO BE PERFORMED

S1) HOW MANY REQUESTS WERE MADE TO THE TRAVEL DEPARTMENT for travel to San Diego? Follow the instructions on the answer sheet to answer this question.

S2) WHO WANTED TO GO TO DES MOINES DURING THE MONTH OF JANUARY? Again, follow the instructions on the answer sheet to answer this question.

1) WHO WANTED TO GO TO LONDON IN MARCH? If there is more than one person who wanted to go, write down all of their names. If nobody wanted to travel to London in March, write "NONE."

2) WHO WANTED TO GO TO KANSAS CITY DURING THE MONTH OF FEB? Answer this question similarly to the previous question.

3) WHO WANTED TO GO TO PORTLAND DURING THE MONTH OF FEB? Answer this question similarly to the previous questions.
4) WHO WANTED TO GO TO MIAMI ON FEB. 2? Answer this question similarly to the previous questions.

5) WHO WANTED TO GO TO SAN DIEGO ON APRIL 2? Answer this question similarly to the previous questions.

6) HOW MANY REQUESTS FOR TRIPS TO SEATTLE ARE THERE IN THE DATA BASE?

7) WHO TOOK THOSE TRIPS, and how many trips did each of these people take to Seattle?

8) FOR THOSE WHO TOOK FIVE OR MORE TRIPS TO SEATTLE, to which other cities did they REQUEST travel?

9) LIST THE LOCATIONS AND REQUESTED DATES OF TRAVEL of the trips that Alan Schwartz made where he requested Sim Farar to also travel. Similarly, list the locations and dates of travel where Sim Farar requested travel with Alan Schwartz.

10) List the dates when Alan Schwartz and Sim Farar both traveled together to Seattle.

11) ON THOSE OCCASIONS WHERE BOTH SIM FARAR AND ALAN SCHWARTZ TRAVELED TOGETHER TO SEATTLE, list those who also traveled with them.

APPENDIX 3

POST-TEST QUESTIONNAIRE

INSTRUCTIONS FOR POST-TEST QUESTIONNAIRE

You are now requested to answer a brief series of questions concerning your opinions of the computerized system you've just been using. It is important that you answer these questions with the answer that best represents your attitude to the particular area of the question. Specifically, the questions require you to numerically rate certain aspects of the computerized system. Even though you may not have a strong feeling one way or the other, please select the numerical rating that best characterizes your attitude to that particular area.

You will note that the questions are answered using the computer. Please be careful to select the correct number for your answer. If you make an error, press the "DEL" (or A) key. When you are satisfied with your answer for that particular question, press the "RETURN" key.

Please answer the following series of questions with a numerical rating in the range of 1-5.

1 = Very Poor, Unacceptable, etc.
5 = Excellent, Completely Acceptable, Easy to Use, etc.
(1-2 implies a generally negative response, 4-5 a generally positive one.)

However, a specific numerical scale will be given for each question.

1) COMMANDS: EASE OF USE --
1 = Difficult to use
3 = Easy to use, but somewhat confusing
5 = Easy to use, no confusion as to meaning

2) COMMANDS: CLEAR AND MEANINGFUL FUNCTIONS --
1 = Commands produced results completely different from what was expected.
3 = Some commands were clear and simple, others were very confusing.
5 = All commands were completely clear.

3) COMMANDS --
1 = Would like to have had a number of additional commands available to make the tasks easier to accomplish.
3 = Some additional commands would have been useful.
5 = Available commands were completely adequate to accomplish the tasks.

4) SCREEN: BRIGHT ENOUGH?
1 = Too dim, completely unreadable.
3 = Too dim, but readable.
5 = Brightness just right.

5) SCREEN: LARGE ENOUGH?
1 = Screen size too small, completely unreadable.
3 = Screen size too small, but readable.
5 = Screen size just right.

6) CHARACTERS: LEGIBLE, ADEQUATE SIZE, ETC. --
1 = Characters too small or awkwardly shaped.
3 = Character size and shape adequate, but some difficulty in reading.
5 = Character size and shape just right.

7) PRINTING FORMAT: READABLE?
1 = Format unclear, jumbled, etc. Unreadable.
3 = Format readable, but not outstanding.
5 = Format excellently arranged and completely readable.
8) PRINTING FORMAT: SUFFICIENT DATA?
1 = Completely insufficient data to adequately complete tasks.
3 = Just barely sufficient data, but would have been able to utilize more.
5 = Data presented was completely adequate to complete tasks.

9) COMPUTER SYSTEM SPEED --
1 = Too slow
3 = Just right
5 = Too fast

10) VARIATION IN COMPUTER SYSTEM SPEED --
1 = So much variation in computer and printing speed that the system was difficult and bothersome.
3 = Some variation in computer speed and printing speed, but not enough to be bothersome.
5 = Little or no variation in the speed of the computer system.

11) PRINTING SPEED --
1 = Too slow
3 = Just right
5 = Too fast

12) VARIATION IN PRINTING SPEED --
1 = Far too much variation for easy reading of output
3 = Some variation, but no great difficulty in reading
5 = Output was smooth and easy to read

13) PROCESSING TIME --
1 = System took way too long to do what should have been simple tasks.
3 = System took about the time you would have expected.
5 = Too fast, felt rushed, etc.

14) WAS THE TRAVEL MESSAGE PROCESSING SYSTEM USEFUL IN ANSWERING THESE QUESTIONS?
1 = Completely useless, confusing, etc. Answering the questions was an exercise in futility.
3 = Found the system marginally useful, but some aspects were difficult to use, too slow, confusing, etc.
5 = Completely useful, no confusion in the use of the system. Speed of the system was just right, easy to adapt to.

15) SUPPOSE YOU HAD TO ACTUALLY ANSWER THE TRAVEL QUESTIONS BY GOING THROUGH THE MESSAGES BY HAND. HOW MUCH IS THE TRAVEL MESSAGES COMPUTER PROCESSING SYSTEM WORTH TO YOU IN ORDER TO SAVE YOU THE EFFORT OF DOING THIS BY HAND?
1 = No advantage seen in using the computer system. Would much prefer to perform these tasks by hand.
3 = No strong feeling one way or the other.
5 = Much prefer using the computer system rather than having to answer these questions by going through the messages by hand.

16) DID YOU FEEL A NEED FOR MORE MATERIALS ON THE FUNCTIONS AVAILABLE IN THE SYSTEM?
1 = Available materials were completely useless.
3 = What was available was useful, but more information was needed.
5 = All available material was useful; no more information was needed.

17) YOUR OVERALL RATING OF INPUT TO THE COMPUTER --
[Use a 1-5 scale as explained in the top portion of the screen.]

18) YOUR OVERALL RATING OF OUTPUT FROM THE COMPUTER --
[Use a 1-5 scale as explained in the top portion of the screen.]

Please type your general comments on the functions provided, their ease of use, and your general feelings of frustration or satisfaction in the use of the system. Be certain to address your feelings regarding the delays in output and the general speed of the system, particularly if the load average was high and you noted unacceptable delays in system performance.

APPENDIX 4

SAMPLE MESSAGES

To: TRAVEL DEPT.
From: Alan Schwartz
Subject: Travel, San Francisco, Feb. 2 a.m.
Date: 31 JAN 76 1303-PST
Message: Please reserve 2 seats to San Francisco on Feb. 2 a.m. for me and
Arnold Serkin
Return: OPEN
Thanks

To: TRAVEL DEPT.
From: David Simpson
Subject: Travel, Des Moines, Jan. 4 p.m.
Date: 31 JAN 76 1303-PST
Message: Please reserve 4 seats to Des Moines on Jan. 4 p.m.
for me and
Jane Doe
Arnold Serkin
John Wilson
Return: Jan. 8
Thanks

To: TRAVEL DEPT.
From: Arnold Serkin
Subject: Travel, Miami, April 23 a.m.
Date: 31 JAN 76 1303-PST
Message: Please reserve 3 seats to Miami on April 23 a.m. for me and
Sim Farar
Alan Schwartz
Return: April 27
Thanks