THE GENERATION OF PREDICTOR EQUATIONS BY RECURSIVE LINEAR REGRESSION

by

Charles Roderick Fillerup

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Chemical Engineering)

June 1962

This dissertation, written by Charles Roderick Fillerup under the direction of his Dissertation Committee, and approved by all its members, has been presented to and accepted by the Dean of the Graduate School, in partial fulfillment of requirements for the degree of DOCTOR OF PHILOSOPHY. Date: June, 1962.

ACKNOWLEDGMENT

The co-operation of the Service Bureau Corporation, in making computer time available to the author, is gratefully acknowledged.

TABLE OF CONTENTS

PREDICTION AND REGRESSION
    The predictor equation
    Regression methods
    The correlation coefficient
    Term-by-term regression
    Attributing the coefficients
GENERATION OF X-PARAMETERS
    Phase I
    Phase IA, accounting for historical models
    Phase IB, generating the X-parameters
    Phase IC, generation of numerical X-data
DEVELOPMENT OF THE PREDICTOR EQUATION
    Phase II
    Phase IIA, calculating the basic statistical parameters
    Phase IIB, selecting the recursion parameter
    Phase IIC, generating the new dependent parameter
EVALUATION OF THE ATTRIBUTIVE COEFFICIENTS
    Phase III
    Phase IIIA, generating the Z-parameters
    Phase IIIB, attributing the a-coefficients
AN APPLICATION OF RECURSIVE LINEAR REGRESSION
    Introduction
    Phase I
    Phase IIIA
    First recursion stage, Phase II
    First recursion stage, Phase III
    Second recursion stage, Phase II
    Second recursion stage, Phase III
APPRAISAL OF THE RECURSIVE LINEAR REGRESSION METHOD
    The vapor pressure example
    Comparison of regression treatments
APPENDICES
    I. NOMENCLATURE
    II. REFERENCES
    III. BIBLIOGRAPHY
    IV. MATHEMATICAL DERIVATIONS
    V. STULL'S VAPOR PRESSURE DATA
    VI. PARAMETERS OF THE INITIAL X-MATRIX
    VII. SAMPLE OF X-MATRIX
    VIII. SAMPLE OF Z-MATRIX
    IX. THE ANTOINE PARAMETER
    X. COMPUTER FLOW CHART
    XI. OVER-ALL-ERROR FRACTIONS FOR RAW DATA

LIST OF TABLES

    1. Primary Formulas for Vapor Pressure
    2. Regression Statistics for First Phase II Recursion
    3. Basic Statistics for Three Parameters of the First Recursion
    4. Regression Statistics for Second Phase II Recursion
    5. Basic Statistics for Two Parameters of the Second Recursion
    6. Over-All-Error Fractions for the First Two Recursions
    7. A Comparison of Recursive Linear Regression and Conventional Multivariate Regression Techniques
    8. Multiple Regression Coefficients for the Predictor Equation (VIIa) for Stull's Smoothed Data
    9. Basic Statistics for the Antoine Parameter

LIST OF ILLUSTRATIONS

    1. Erpenbeck-Miller Vapor Pressure Equation
    2. The Y-Augmented X-Matrix
    3. Chemical Properties of the n-Hydrocarbons
    4. Stull's Vapor Pressure Data, Low-Range
    5. Stull's Vapor Pressure Data, Mid-Range
    6. Stull's Vapor Pressure Data, High-Range
    7. Regression Coefficients, First Phase II
    8. Typical Chemical Properties
    9. Recursion-Error Fractions, First Phase II
    10. Phase III Recursions on $a_1$
    11. Final Recursion Coefficients, First Phase II
    12. Total Residuals, First Phase II
    13. Estimated Roots, $n_c$-Plot
    14. Estimated Roots, Z-Plot
    15. Regression Coefficients, Second Phase II
    16. Recursion-Error Fractions, Second Phase II
    17. Final Recursion Coefficient $a_2$
    18. Final Recursion Coefficient $a_{02}$
    19. Total Residuals, Second Phase II
    20. Multiple Regression Coefficients
    21. Non-Linear Coefficient, Antoine Parameter

PREDICTION AND REGRESSION

The predictor equation.—Virtually every problem in the pure and applied sciences is in part a concern with relationships among sets of physical measurements. Whether these sets of measurements may or may not be characterized as variables, the investigation of the relationships constitutes the essential nature of science—"explaining" physical reality (21). Attempts are inevitably made to reduce the conceptual relationship to a mathematical equation, the predictor equation, through which one variable may be predicted more or less quantitatively from one or more other variables. One may introduce, for example, a completely arbitrary distinction between the predicted (dependent) variable $y$ and a total of $m$ other (independent) variables $x_i$. The relationship under scrutiny may then admit to the generalization

$f(y, x_1, \ldots, x_m) = 0$.   (Ia)

The basic merit of a predictor equation is often, but not exclusively, predicated on the extent to which the $f$-function vanishes for all physically realizable combinations of the variables. In this sense, (Ia) can in fact be no more than an approximate equality. There exists an infinity of such combinations, all intimately associated with both random and systematic errors, which obviously could not be accommodated exactly by an equation. A predictor equation is, then, a mathematical form (Ia) which is satisfied approximately by one or more combinations of (i.e., observations on) all the variables $y$, $x_1$, ..., and $x_m$. Equation (Ia) may in some instances be written as

$f(y) = g(x_1, \ldots, x_m)$,   (Ib)

to indicate that the dependent variable is separable from those designated as independent. In this case a further transformation will prove useful.
Consider a form equivalent to (Ib),

$\sum_{i=1}^{n'} A_i Y_i' = a_0 + \sum_{i=1}^{n} a_i X_i'$.   (Ic)

The $Y_i'$ and $X_i'$ represent groups of variables, hereafter designated parameters, which are mathematical functions of one or more variables $x$ and $y$ (e.g., $x_i$, $y^2$, $x_i x_j/x_k$, $x_i\log_e x_j$, $x_i[a' + x_j]^{-1}$, etc.). A linear combination of these parameters constitutes the predictor equation. Instead of a relationship among variables, (Ic) implies the more specific relationship between dependent and independent parameters or variable groups. The number of variables $m+1$ and the number of parameters $n'+n$ will not generally coincide.

Yet still another particular form of predictor equation may be deduced, viz., one in which no parameter $Y_i'$ nor $X_i'$ contains unknown coefficients. The resulting predictor equation, similar to (Ic) except that the primes appended to the parameters are removed, is

$\sum_{i=1}^{n'} A_i Y_i = a_0 + \sum_{i=1}^{n} a_i X_i$.   (Id)

Linear in the coefficients $a$ and $A$, (Id) suggests the ultimate goal—a least squares determination of the best values for these coefficients. The general procedure is termed regression analysis, the criterion "best" implying the principle of least squares. The simplification introduced by linearizing the coefficients is not a necessary step insofar as regression technique is concerned, but only eliminates the necessity of introducing an iterative or trial-and-error procedure.

Consider an example of (Ib):

$A_1 y + A_2 y^2 = a_0 + a_1 x_1\log_e x_3 + a_2 x_1\log_e(a' + x_2)$.   (Ie)

Then the equivalent (Ic) form would be

$A_1 Y_1' + A_2 Y_2' = a_0 + a_1 X_1' + a_2 X_2'$,   (If)

for which

$Y_1' = y$,  $Y_2' = y^2$,  $X_1' = x_1\log_e x_3$,  $X_2' = x_1\log_e(a' + x_2)$

are the appropriate variable transformations. The determination of $a'$ in $X_2'$ would normally involve an iterative process. Various trial values of $a'$ would be used with experimental values of $x_1$ and $x_2$ to evaluate $X_2'$, and the least squares principle invoked to determine which $a'$ is best. The analysis of, and for, $a'$ is indeed distinct from that involving the linearly-related coefficients in (Id). The subject of non-linear coefficients will be discussed at somewhat greater length in a later section, but is treated in considerable detail by Williams (40).
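The trial-and-error evaluation of a non-linear coefficient such as $a'$ is easy to mechanize. The following sketch is an addition to the text, assuming numpy and the simpler single-dependent-parameter case; the function and variable names are illustrative only. It grid-searches candidate values of $a'$, fitting the linear coefficients by least squares at each trial.

```python
import numpy as np

def best_a_prime(y, x1, x2, x3, candidates):
    """For each trial a', build the parameters X1' = x1*ln(x3) and
    X2' = x1*ln(a' + x2), fit a0, a1, a2 by least squares, and keep
    the a' giving the smallest residual sum of squares."""
    best = None
    for a_prime in candidates:
        X = np.column_stack([np.ones_like(x1),
                             x1 * np.log(x3),
                             x1 * np.log(a_prime + x2)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ coef) ** 2))
        if best is None or rss < best[0]:
            best = (rss, a_prime, coef)
    return best  # (residual sum of squares, a', [a0, a1, a2])

# e.g. best_a_prime(y, x1, x2, x3, np.linspace(0.5, 1.5, 101))
```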
The preponderant majority of predictor equations include only one term as the dependent parameter, in which there also appear no undetermined coefficients. This particular case of (Id), $n' = 1$, may be written in the form

$Y = a_0 + \sum_{i=1}^{n} a_i X_i = G(X)$,   (Ih)

wherein the $A$-coefficient has been dropped. Equations of this type represent very nearly all predictor equations in the literature, and the parameter $Y$ may be virtually any algebraic or transcendental function of $y$ that is, preferably, single-valued. Not only are the methods of matrix algebra perfectly suited to the handling of (Ih), but elaborate statistical treatments have also been developed. The major part of this paper will therefore deal with predictor equations reducible to the form (Ih). They are in fact encountered so frequently as apparently to justify the publication of a variety of books, monographs, and journal articles on specialized methods (see Appendix III).

The adoption of predictor equation (Ih), twice removed from the very general form (Ia), should not be considered as an excision of general applicability for the methods which will subsequently be described. With some exceptions, to be noted, the discussions will apply in substance to all forms (Ia)-(Id) and (Ih), provided only that by preliminary regression analysis the non-linear coefficients have been evaluated. Many equations may be exactly converted into the form (Ih) by exponentiation or some inversion process, while approximations thereto can always be derived by the inclusion of portions of infinite series. As will become evident later, the more general forms are not particularly relevant to the methods of recursive linear regression.

Regression methods.—In both conventional multivariate and recursive linear regression analysis, a predictor equation such as

$Y = G(X_1, \ldots, X_n)$   (Ih)

is the ultimate goal. Multivariate regression is one means whereby the unknown coefficients in (Ih) may be evaluated. These coefficients are assigned numerical values in accordance with the standard least squares criterion that the sum of the squares of the residuals about $Y$ be a minimum:

$\sum \left[ Y - G(X_1, \ldots, X_n) \right]^2 = \text{minimum}$.   (Ii)

The function $G$ is specified prior to the analysis, as are the identities of the dependent and independent parameters $Y$ and $X_i$. Computational methods vary, but the results are unique; given the analytical formula $G(X)$, numerical values of the associated unknown coefficients may be generated in a single step which requires no additional decision-making by the analyst. The so-called regression order, equal to the number $n+1$ of unknown coefficients, is limited only by the memory capacity of a computer or by the research worker's own dedication to tedious and repetitive arithmetic.

The use of recursive linear regression enables the investigator to postpone or entirely bypass the initial choice of an equation form. At the same time, he becomes responsible for a greater number of decisions throughout the analysis. While these decisions might be termed strictly mathematical in nature, they would normally be interpreted in the light of certain characteristics of the physical model. This flexibility feature, the interrogation of intermediate results, is a distinctly useful option in recursive regression analysis with linear functions. Equally important, there is essentially no limit to the number of independent parameters $X_i$ which may be studied. The regression order is always two, since only linear equations are manipulated. All the advantages of recursive linear regression may, in fact, be attributed to this property of linearity.

The recursion procedure may be explained very simply, although the reader should bear in mind that extensive modifications can be introduced (a sketch in code follows this outline):

1. Among all independent parameters $X_i$ there is found that one $X_I$ which correlates most strongly with the original dependent parameter $Y_0$.

2. The coefficients in the linear equation

$Y_0 = a_{01} + a_1(X_I)_1$   (Ij)

are calculated, according to the least squares criterion (Ii).

3. A new dependent parameter $Y_1$ is created,

$Y_1 = Y_0 - a_1(X_I)_1 - a_{01}$,   (Ik)

and steps 1, 2, and 3 are performed repetitively to obtain $(X_I)_2$ and $Y_2$, $(X_I)_3$ and $Y_3$, etc.

In each recursion cycle a new dependent parameter is formed while an additional term of the predictor equation is introduced. The cycling process would normally be considered complete when, in some sense, the predictor equation is deemed adequate.
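The sketch below is an addition for concreteness (assuming numpy; names are illustrative, and the stopping rule is simply a fixed number of terms). It implements the three-step cycle just described for a matrix of candidate parameters.

```python
import numpy as np

def recursive_linear_regression(y0, X, n_terms):
    """X is an N-by-n array whose columns are candidate parameters
    X_i; y0 is the dependent parameter Y0.  Each stage selects the
    column most strongly correlated (in magnitude) with the current
    dependent parameter, fits the two-coefficient line (Ij), and
    takes the residuals (Ik) as the new dependent parameter."""
    y = np.asarray(y0, dtype=float).copy()
    stages = []
    for _ in range(n_terms):
        # Step 1: correlation of every candidate with the current Y
        r = np.array([np.corrcoef(X[:, j], y)[0, 1]
                      for j in range(X.shape[1])])
        jI = int(np.argmax(np.abs(r)))
        # Step 2: least squares line Y_{k-1} = a0k + ak * (X_I)_k
        ak, a0k = np.polyfit(X[:, jI], y, 1)
        stages.append((jI, a0k, ak, r[jI]))
        # Step 3: residuals become the new dependent parameter Y_k
        y = y - a0k - ak * X[:, jI]
    return stages, y
```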
The basic procedural distinction between multivariate and recursive linear regression is in the choice of the predictor function $G(X)$. This function is preassigned in multivariate regression, both as to number of terms $n$ and to the identity of each term $X_i$. The set of normal equations for the least squares calculation is immediately derivable from a knowledge of $G(X)$. The final solution for all unknown coefficients in (Ih) is, in most cases, a standard procedure amounting to one or more classical methods in matrix theory. The more sophisticated computer programs which accomplish this final data reduction produce, in addition, many individual statistics on which subsequent significance tests may be performed.

The application of recursive linear regression involves no predetermined function $G(X)$, but only an arbitrarily large number of specific parameters $X_i$. From this set of $X_i$ are drawn those which will eventually constitute the predictor equation (Ih). Since the equation is developed one term at a time, each addition accounting for some appropriate improvement, the number of parameters is not known at the start of the analysis. There could be generated, in fact, at least $n$ equations

$Y_0 \cong G_0[(X_I)_1]$,
$Y_1 \cong G_1[(X_I)_1, (X_I)_2]$,   (Il)
$\ \vdots$
$Y_{n-1} \cong G_{n-1}[(X_I)_1, \ldots, (X_I)_n]$,

each of which is in some way a better approximation to the dependent parameter $Y_0$ than the preceding ones. Each successive $Y_k$ becomes the basis for statistical testing and inference schemes which determine that $G_k$ finally adopted as the most suitable predictor equation.

The introduction of single terms into the developing predictor equation does not imply that non-linear functions cannot be accommodated. The nature of a particular physical relationship may be such that, for example, a reasonable and accurate mathematical fit is accomplished by a third-degree polynomial. In recursive linear regression, the polynomial must be transformed from the general expression

$Y = A + BX + CX^2 + DX^3$

to one of the form

$Y = A(X - B')(X - C')(X - D')$.

The necessity of devising and applying this transformation (in which a pair of the coefficients may be complex) and the requisite root-finding procedure are in no sense handicaps. On the contrary, considerable additional information concerning the physical relationship may be obtained during the study and determination of the zeros explicitly defined by the second form. The three factors are, of course, exactly the $x_i$-variables which have been discussed. The coefficients are of the non-linear type and may be evaluated by trial-and-error or, perhaps, by inspection of the Y-X plot.
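Where a fitted polynomial is already in hand, the zeros of the factored form can also be obtained directly. A brief sketch (an addition, assuming numpy, with purely illustrative coefficient values; note that the text reuses the symbol A for the leading constant of the factored form):

```python
import numpy as np

# Fitted cubic Y = A + B*X + C*X**2 + D*X**3; its factored form is
# Y = D*(X - r1)*(X - r2)*(X - r3), where r1, r2, r3 are the zeros
# (a pair of which may be complex, as noted in the text).
A, B, C, D = 2.0, -3.0, 0.5, 1.0      # illustrative values only
roots = np.roots([D, C, B, A])        # highest-degree coefficient first
print(roots)
```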
The conceptual difference between conventional multivariate and recursive linear regression should now be obvious. In the former a specific equation form is investigated, while in the latter a final equation form is to be determined. The quantity of initial data will generally be much larger for recursive linear regression, as a larger number of parameters may warrant statistical appraisal. The accessibility of such data is a consideration external to the regression analysis proper, however large the computational task of generation might be. Whether variables or quite elegant parameters, sets of measurements or sets of computed functions of measurements, the basic or raw data are common to both regression methods. Only the preliminary and operational assumptions differ.

The correlation coefficient.—In recursive linear regression the simple correlation coefficient $r$ is the primary criterion by which a particular independent parameter is introduced as a term in the predictor equation. In general, the coefficients of correlation between the current dependent parameter $Y$ and all independent parameters $X_i$ are calculated. That particular $X_i$, denoted $X_I$, which is associated with the correlation coefficient of greatest magnitude, becomes the new term in the predictor equation. The r-value for any parameter $X_j$ has a precise mathematical meaning (see Appendix IV), but may be regarded qualitatively as a measure of "goodness of fit" between the values of $Y_j$ and those calculated from the least squares linear relationship, $(Y_j)_{\text{calculated}} = A + BX_j$. Thus, $r$ is a measure of the dispersion of observed values of $Y_j$ about the least squares line. For a more complete discussion of the correlation coefficient, the reader is referred to Worthing (42) and to the more formal treatment by Chuprov (7). The purposes at hand are adequately served by the foregoing statements and by the following: the simple correlation coefficient may be regarded as that fraction of variation accounted for by the least squares line.

As one of the premises on which recursive linear regression is based, the r-criterion for selection of predictor terms is novel in the treatment of so-called causal relationships. The use of $r$ in decision-making has great precedence in the social and life sciences, presumably because the effects studied are stochastic rather than functional. The difference between the two types of variation is generally considered fundamental. In the physical sciences, predictor equations are meant to predict "best" values in the sense of "most accurate" or "most nearly true." There is built into such a predictor equation the a priori assumption that there does exist a certain value of the dependent parameter which might be called true or perfect. Were it possible to eliminate experimental error, one should always observe and record this particular value. At a specific temperature, for example, the assumption is that there exists exactly one properly-dimensioned number corresponding to the vapor pressure of a pure substance. That this value may never be established (and in fact cannot be known, according to the uncertainty principle) is manifestly not relevant to the regression problem; the predictor equation presumably represents a relation among absolute quantities. The equation and the relationship are both termed, in this case, causal, or functional.

In the social and life sciences, however, the concept of "true" value is often not even intuitively meaningful. One speaks, for example, not of the "true" or "best" or "most nearly perfect" weight of a ten-year-old boy, but only of a most probable weight. The most probable value is associated with a frequency distribution law, for all weights, all boys, which in spite of a superficial resemblance to the law of randomly distributed "experimental" errors, is somehow different. The difference has always hinged, classically, on the distinction between stochastic or correlational variation and that described as causal or functional. Semantically, the difference is as that between "experimental accuracy" and "maximum likelihood." The natural and physical sciences have each created imposing arrays of distinctive statistical methods for handling their own data reduction requirements. Even descriptions of research differ—the laws of nature versus natural law. Just after World War I the philosophies began to merge. Bohr (5) and Heisenberg (14) argued that causal relationships were perhaps non-existent after all.
Although many observed phenomena do obey apparently functional relationships, there is slight justification for insisting that they must, and none whatsoever for the dictum that there can be no other description. For this reason primarily, and for others previously noted, this writer takes the position (not unorthodox, but probably minority) that all observable phenomena are essentially stochastic in nature. To a greater degree (e.g., Kepler's laws of planetary motion) or lesser (laws of mass transfer) these stochastic relationships may not only appear to be causal but also to be accurately predictable. Even so, the cause-and-effect hypothesis will be presumed an unnecessary qualification. The adoption of this premise means that some useful concepts of correlation and variance analysis may be incorporated into the mathematical methods of the physical sciences, wherein they have heretofore failed to appear. Insofar as recursive linear regression is concerned, the notion of a correlation coefficient (which excepts functional relationships) becomes a primary judgment factor in the development of predictor equations. The introduction of the r-value criterion constitutes one of the basic premises.

Term-by-term regression.—The application of recursive linear regression leads to the development of an n-term predictor equation through a total of $n$ linear recursion equations. To compare the effects of such term-by-term regression with those of multivariate regression, let us consider a final predictor equation,

$Y_0 = a_{03} + a_1(X_I)_1 + a_2(X_I)_2 + a_3(X_I)_3$,   (Im)

and the several ways by which it might be generated from, say, $N$ observations or measurements on the three $(X_I)_i$ and the one $Y_0$ parameters. Using conventional regression techniques, there exists one solution, unique within the realm of the finite arithmetic methods employed. Ignoring for the moment the r-criterion, there will generally be $3! = 6$ such equations if recursive linear regression were used. In this latter case the parameters may be introduced in the order $[(X_I)_1, (X_I)_2, (X_I)_3]$, or $[(X_I)_2, (X_I)_1, (X_I)_3]$, etc. It can be shown (see Appendix IV) that the coefficients are different as the introduction-sequence is changed. The question naturally arises: which one of the six recursive and one conventional equations is correct, or best? If we were to consider only the $N$ particular observations and only this particular equation (Im), we might decide on that set of coefficients for which the Gauss criterion for closeness-of-fit is met,

$\dfrac{\sum \rho^2}{N - n} = \text{minimum}$,   (In)

in which $n = 3$ for the example being considered. Alternately, we might choose that equation whose coefficients are most nearly in accord with certain theoretical derivations. In fact, of the total of seven equations, there seems to be no specific criterion which will satisfy every user's notion of "best." Except on the point of tradition, no great accumulation of evidence points to the conventionally-derived equation as best. Representing as it does but one-seventh of the number of cases, each in its way satisfying the least squares principle, even the traditional priority is subject to question.
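The order-dependence is easy to exhibit for two parameters. (The following illustration is an addition, using the pseudo-variance notation $V$ defined in Phase IIA below.) Introducing $X_1$ first gives the recursive coefficient

$$a_1 = \frac{V_{YX_1}}{V_{X_1X_1}},$$

and the second stage, fitting the residual parameter $Y_1$ against $X_2$, leaves $a_1$ unchanged. The joint multivariate fit of the same two-term equation gives instead

$$a_1^{*} = \frac{V_{YX_1}V_{X_2X_2} - V_{YX_2}V_{X_1X_2}}{V_{X_1X_1}V_{X_2X_2} - V_{X_1X_2}^{2}},$$

which coincides with $a_1$ only when $V_{X_1X_2} = 0$, i.e., when the two independent parameters are uncorrelated; by the same token the recursive coefficients change when the introduction-sequence is reversed.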
The problem of choice described above is not completely solved by the recursive method, but there exists at the very least the option of studying all cases. In addition, the term-selection (i.e., r-value) criterion may be invoked. The first parameter then to be included in the predictor equation will be that $X_I$ which correlates most strongly with $Y_0$. If the correlation coefficient $r(Y_0, X_I)$ is in fact large, one may also be persuaded that $X_I$ belongs first intuitively. Not only is the coefficient associated with it known more accurately, according to the value of $r$, but most of the total variation may be removed in one parameter, and in one stage rather than several. The fewer the number of observations $N$, the more important it is that the total variation be eliminated by means of as few parameters as feasible. Otherwise one risks the masking of a simple relationship, unrealized, by several complex ones. Obviously, too, the theoretical impact is the greater, as the number of consequential predictor terms is smaller.

As was noted previously, the decisions which arise in recursive linear regression are greater, both in number and importance, than those normally encountered in multivariate regression. Especially in the matter of term-selection is the prescience of the investigator put to task. He will (if his calculations are computer-performed) have access to specific values of the correlation coefficients for the most strongly correlated parameters, if not indeed for the entire set of $X_i$. He is then at liberty to choose among these coefficient-and-parameter pairs just that $r(Y, X_j)$-$X_j$ pair which appears most suitable. The particular independent parameter selected may not be associated with the largest r-value but may be one which is, for other reasons, more appropriate in the equation or for the physical model. Perhaps the most highly correlated parameter is considered too complex, or perhaps too dissimilar to terms previously introduced. Most important, it may be too specific. For example, in analyzing the data on vapor pressures of normal hydrocarbons, one may find that a certain X-parameter correlates very strongly with the current $Y_j$ for a few members of the homologous series, but weakly for the others. At the same time there may exist some other parameter for which the r-values, though smaller, are uniformly significant for all members of the series. This other parameter should obviously be the one introduced into the predictor equation. The advantage of uniformity thus gained is almost inevitably worth the sacrifice in accuracy for a minority of the homologues.

The reader may surmise that considerable foresight is required of the analyst, since the cumulative changes in dependent parameters $Y_j$ afford no convenient projection or extrapolation to residual variation after the next succeeding stage. Although a particularly perceptive investigator will produce the more desirable and coherent results (as might be expected in any endeavor), the lack of appreciable mathematical insight is no great barrier to the results ultimately obtainable by the recursive procedure. One might regard each linear recursion equation as an entity, a predictor equation in itself. As will be described later, the process of attributing the coefficients $a_i$ in the predictor equation tends to separate the over-all analysis of an n-term predictor equation into $n$ smaller analyses. Mathematical insight works to advantage in each of these sub-analyses but is not generally extensible among them. In other words, what one knows about $Y_2$ at stage two,

$Y_2 = Y_1 - a_2(X_I)_2$,   (Ip)

can lead to profitable speculation concerning the next parameter to be removed, but not generally to $(X_I)_4$. The difficulties will be described further in the next section.
The comment was made earlier that discussion of the more general forms of the predictor equations was not relevant to the method of recursive linear regression. The adoption of the particular form (Ih) should by now appear justified. The difference between the general and particular forms (Id) and (Ih) lay wholly in a restriction on the number of dependent parameters. In the term-by-term generation procedure, however, the restriction vanishes. The distinction between dependent and independent parameters is purely arbitrary. All equation parameters save one may in fact be considered independent and thus moved to the right side. In this sense, (Id) and (Ih) are computationally indistinguishable. The same reasoning applies to a comparison between (Ia) and (Ic).

Insofar as the regression mathematics are concerned, a parameter may appear as either the predicted or as one of the predictor terms. Only in the interpretation of the least squares principle is the independent-dependent concept of parameters important. But the method of recursive linear regression is by design flexible to the extent that a newly-introduced term may be considered either dependent or independent. Multivariate regression involves elaborate techniques, not all direct, for maintaining the validity of the least squares criterion as the status of any parameter is modified. The transformation is linear, however, for the recursive regression method, and a trivial manipulation of the recursion equation. Consider the k-th recursion formula:

$Y_{k-1} = a_{0k} + a_k(X_I)_k$.   (Ir)

That parameter which correlates most strongly with $Y_{k-1}$ may be, for example, a truly independent parameter such as $x_1\log_e x_3$, a truly dependent parameter (a function of $y$ alone), or a composite parameter (one containing both independent and dependent variables). In the first case the recursion equation may stand as written. In the second and third cases the r-criterion may be synthetic (a regression X-on-Y rather than Y-on-X) and therefore must be checked by means of a parameter transformation before it is accepted as an authentic reflection of the least squares principle. The transformation

$(X_I)_k = -\left(a_{0k}/a_k\right) + \left(1/a_k\right)Y_{k-1}$

establishes the regression check and the proper values of the coefficients, regardless of the type of interaction between $x_i$ and $y$. The checking procedure is ad libitum, according to what may be regarded as theoretically sound in the mathematical model. The term-by-term regression method provides, in either case, a satisfactory solution to the dilemma posed by particularization of the generalized predictor equation form (Ia).

The primary advantage of term-by-term regression analysis is that of flexibility. At any stage of the recursion process one may perform a conventional multivariate regression on that same set of parameters as had been introduced recursively. The two equations, identical in form but with different coefficients, raise very fundamental questions as to which may be regarded the superior. In some cases the Gauss criterion alone may be deemed sufficient. In others, uniform variation in the coefficients for homologues may be considered decisive. Aside from those rare instances for which the theoretical or mathematical model is extremely trustworthy, these two criteria are antithetical. Either can be accommodated within the framework of recursive linear regression.

Attributing the coefficients.—If the purpose of a particular regression analysis were merely the development of an accurate predictor equation, the general procedure which has so far been described is adequate. A generally more difficult problem may be distinguished, however, wherein two different levels of regression are required. One may wish, for example, to derive a general equation for the vapor pressure of members of a homologous hydrocarbon series. Each member $k$ of the series might thus be presumed to obey one general predictor equation; after the n-th recursion,

$Y = a_{0k} + \sum_{s=1}^{n} a_{sk}(X_I)_s$.   (Is)

The dependent parameter $Y$ may in this case represent the function $\log_e P$. With (Is) are associated $k$ sets of the a-coefficients. Therefrom arises the possibility of developing predictor equations for the coefficients themselves, viz.,

$a_{sk} = b_{0s} + \sum_{j=1}^{p} b_{sj}(Z_j)_k$,   (It)

in $p$ recursion stages.
A generally more difficult problem may be distinguished, however, wherein two different levels of regression are required. One may wish, for example, to derive a general equation for the vapor pressure of members of a homologous hydro carbon series. Each member k of the series might thus be presumed to obey one general predictor equation; after the n-th recursion, The dependent parameter Y may in this case represent the function logeP. With (Is) are associated k sets of the a-coefficients. Therefrom arises the possibility of developing predictor equations for the coefficients themselves, viz., Attributing the coefficients.— If the purpose of a (Is) in p recursion stages. The generalized parameters in equation (It) represent different fundamental (chemical) properties of particular chemical species k, and are thus analogous to independent parameters of the general predictor equation (Is). Likewise, the unknown coeffi cients bg. j . are analogous to coefficients in equation (Is). Each coefficient ag^ might therefore be considered the basis for a regression analysis at a distinctly higher level of abstraction. The goal of such an analysis would be an explanation, so to speak, of the numerical values asj c* The development of n+1 auxiliary predictor equations (including the attributing of a . ) u n 9 j c is just such a process hereafter defined as attributing the coefficients. Equations (Is) and (It), analogous in form and treatment, differ in several important respects. Equation (It) represents a family of equations, one for each coefficient in (Is). Furthermore, these equations are not necessarily Identical in form. If, for example, the Z-parameters are functions of the number of carbon atoms, n , in the series of normal hydrocarbons, equations (It) v for any chemical species k might assume the following (expanded) forms: a2k,0 = b02,0» (Iu) a3k,1 ~ b03,1 + b31(logeIlc) Each of these equations is obviously a regression problem in itself. Theoretically, at least, there is no limit to the number of hierarchic regression levels. The reader will appreciate, however, the current impracticability of analyzing in a similar manner the (It) coefficients in terms of still other coefficients c. Serious efforts at attributing the a-coefficlents are almost never attempted, even for two- or three-term predictor equations of the form (Is). The major barrier to such research is in the author's opinion a direct outgrowth of the exclusive use of multivariate regression methods. The set of values for any particular coefficient is, as a result of such regression, too irregular to accommodate any reasonable and simple predictor equations as in (Iu). Figure 1 shows the variation in magnitude of the coefficients for a typical vapor pressure equation (11) for the normal hydrocarbons through heptane. The curve labeled C, and to a lesser extent that shown as A, demonstrate the irregularity of these two coefficients as a function of the number of carbon atoms in the chain. The picture is not complete, of course, for just this one 105 100 90 80 0.22 0.20 0.18 0.12 0.10 0.08 0.06 9.0 8.8 0.04 8.6 8.4 A 8.0 7.8 7.6 7.4 nc, Number of Carbon Atoms in Chain Pig.1 — Erpenbeck-Miller Vapor Pressure Equation: logjQp = A - BT_I +■ logjo Cl-C|£-) 25 chemical property. There may in fact be other chemical properties for which the curves would be smooth and uniform. For such a limited portion of the simple homolo gous series, however, one might expect more regularity in the variation with nc than is shown. 
The major barrier to such research is in the author's opinion a direct outgrowth of the exclusive use of multivariate regression methods. The set of values for any particular coefficient is, as a result of such regression, too irregular to accommodate any reasonable and simple predictor equations as in (Iu). Figure 1 shows the variation in magnitude of the coefficients for a typical vapor pressure equation (11) for the normal hydrocarbons through heptane.

[Figure 1.—Erpenbeck-Miller Vapor Pressure Equation: $\log_{10}P = A - BT^{-1} + \log_{10}(1 - C\,T/T_c)$; the coefficients A, B, and C plotted against $n_c$, the number of carbon atoms in the chain.]

The curve labeled C, and to a lesser extent that shown as A, demonstrate the irregularity of these two coefficients as a function of the number of carbon atoms in the chain. The picture is not complete, of course, for just this one chemical property. There may in fact be other chemical properties for which the curves would be smooth and uniform. For such a limited portion of the simple homologous series, however, one might expect more regularity in the variation with $n_c$ than is shown.

With coefficient B the prospects are considerably improved. Except for the slight irregularity at $n_c = 6$ (note that no values A, B, C are shown for n-butane, $n_c = 4$), the curve is smooth and without inflections. In fact B might be satisfactorily approximated with one or more elementary functions of $n_c$. The analytical form of the approximation will then represent a predictor equation for B. Since the term $BT^{-1}$ is itself an approximation to $\log_{10}P$, the predictor equation for B must naturally be one which would be characterized as a good fit. The over-all accuracy of the vapor pressure equation might otherwise be seriously impaired, to say nothing of the changes which might simultaneously be forced into the coefficients A and C. Changes in all the B-values, except those of a random nature, must be reflected in modified A- and C-coefficients if the least squares criterion is still to be met. The repetitive adjustment of two of the coefficients, as the third is undergoing the attributive process, is a routine task well-adapted to computing machines. The efforts might in fact be worth while for this particular equation, since it is based on the Dieterici equation of state and is thus attractive from the standpoint of theoretical inference.

As a general procedure, attributing coefficients should be undertaken only with the realization that two distinct and often antagonistic concepts are involved. One might seek a very accurate predictor equation, in which expressions for the $a_{sk}$ might be lengthy and complex. The alternative might be a more general equation in which these a-coefficients may be given only by constants, or by linear forms involving the simplest functions of elementary chemical properties.

Two separate procedures are used in attributing the a-values. If accuracy of the predictor equation were paramount, the attributive scheme would be initiated only after the entire predictor equation had been generated and all the a-values had been established. Separate equations would then be developed for the $a_{sk}$. In this case, $a_{sk}$-functions of the form (Iu) are apt to be disconcertingly complex. For large $s$ especially, the equations may have to represent quite tortuous curves. This difficulty is precisely the one which figures so prominently in multivariate regression analysis.

If a more general equation were to be devised—one of simpler form, for example, or of more substantial theoretical interest—each coefficient $a_{sk}$ should be analyzed immediately on its insertion into the predictor equation. Thus, the equation for $a_{2k}$ would be developed directly after the decision involving $(X_I)_{2k}$ but before the evaluation of $Y_2$. Equation (It), in other words, would be defined prior to the generation of equation residuals which constitute the new dependent parameter. The very attractive feature of this method lies in the comparative simplicity of the functions (Iu). Generally, the approximation to each coefficient may be somewhat relaxed, since the residual or slack variation may be taken up in the next succeeding dependent parameter. The data analyst may exert more control over each recursion stage, and may introduce only as much complexity in the $a_{sk}$-functions (Iu) as he feels is justified by the theoretical model. He runs the additional risk, however, of introducing certain types of periodicity, which may lead to explosive instability in the study of subsequent coefficients.
Under such circumstances he may be forced to reappraise the prior recursion stage, perhaps to modify the aggregate predictor equation for that stage.

The two different methods of attributing the coefficients lead to two equations which differ not only in the functions and values assigned to the $a_{sk}$ but perhaps also in the form of the general predictor equation itself. Initially, the same independent parameter $(X_I)_1$ is obtained. The $Y_1$-values, however, are calculated as residuals: the $Y_1$ for the two methods thus differ slightly, depending on the accuracy of the (It) predictor equation for $a_1$. The difference may be great enough to effect a change in the decision for $(X_I)_2$, since this decision is heavily or decisively influenced by the magnitude of the correlation coefficient $r[Y_1, (X_I)_2]$. The predictor equations based on the two attributive methods thus become more and more dissimilar as to a-values, if not eventually as to the $(X_I)_s$ themselves. Whether or not this bifurcation of the attributive process might distress those accustomed to the relative serenity of multivariate regression analysis, the additional flexibility must be considered a distinct asset of the recursive linear method. Perhaps the outstanding distinction between the two regression methods lies in the number of important decisions which inevitably must be made in recursive linear regression. One may no more than speculate as to the nature of the concealment of these decisions in multivariate regression.

GENERATION OF X-PARAMETERS

Phase I.—The nature of Phase I in recursive linear regression is best described as preparatory to the regression analysis proper. In three distinct steps, the intuitive notion of a predictor equation is advanced from that of a conceptual relationship, involving variables, to one of numerical analysis involving sets of measurements and functions thereof. The analytical work of Phase I thus serves as a basis for regression analysis in both Phase II (generating the predictor equation) and Phase III (attributing the coefficients). The decisions which must be made throughout Phase I, however, are for the most part less significant. In addition to straightforward computational procedures, the first phase is devoted to the interpretation and manipulation of existing predictor equations. Herein also is effected the somewhat subtle transition between the physical and mathematical models.

Phase IA, accounting for historical models.—The first step of Phase I consists of a search of the scientific literature for predictor equations representing the relationship under investigation. The search is for terms which will become parameters in the regression analysis. According to the extent of past scientific interest in the relationship, one may compile a more or less impressive list of analytical equations containing the chosen dependent variable. These equations may each be written in the form

$f_i(y, x_1, \ldots, x_m) = 0$,   (IIa)

recognizable as the most general predictor equation form. In addition to the equations themselves, the investigator will also have become familiar with the derivation and scope of application of each function $f_i$, and with the more subtle framework which underlies the fundamental predictor relationship.

As bases for the parameters, equations (IIa) should all be transformed so that the dependent variable $y$ appears at least once in the same form for each equation. This transformation is usually trivial if there exists but one term containing the dependent variable.
In almost all vapor pressure equations, for example, the vapor pressure appears as a logarithm, $\log P$. Those equations containing $P$, or $P^c$, or $e^P$, would thus be rewritten with $\log_e P$, or with $c^{-1}(\log_e P^c)$, or with $\log_e(\log_e e^P)$. The choice of a particular dependent parameter, $\log_e P$ in this example, depends to a great extent on the facility with which all equations may be so transformed.

The generation of terms containing non-linear coefficients may often be approximately represented by portions of infinite series. Any such series is truncated to an extent determined by the range of values assumed by the several variables in each term, and by the relative significance of each term within its equation. Consider the logarithm term in the Erpenbeck-Miller vapor pressure equation (11),

$\log_{10}P = A + BT^{-1} + \log_{10}\left[1 - C(T/T_c)\right]$.   (IIb)

The general treatment described above is made quite difficult by the fact that the infinite series

$\log_e(1 - CT_R) = -\sum_{i=1}^{\infty} \dfrac{(CT_R)^i}{i}$   (IIc)

converges very slowly for $CT_R = C(T/T_c)$ just less than unity, and of course diverges for $CT_R > 1$. Knowing that the approximation is poor for $CT_R > 0.1$, the analyst may approximate (IIc) by

$\log_e(1 - CT_R) \approx -(CT_R) - \tfrac{1}{2}(CT_R)^2$   (IId)

and resign himself to the confounding of one term in the predictor equation near $CT_R = 1$. However, he will recall that the journal article describing the Erpenbeck-Miller equation included the statements:

Because C is so near unity and it appears only in a correction term, it can be used arbitrarily as 1.00 in equation 17 [IIb] for all substances. The resulting relation fits the data nearly as well as equation 17. . . . Because equation 17 meets but one of the six criteria given by Waring (31) [39] for a long range vapor pressure equation, it was fitted to data only in the region from the triple point to the boiling point. (11)

Noting also that $\log_{10}(1 - T_R)$ is unstable near the critical point, the data analyst might well drop this term altogether.
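The slow convergence noted above is easily checked numerically. A brief sketch, added here and not part of the text (natural logarithms used for definiteness, assuming numpy):

```python
import numpy as np

# ln(1 - x) against its two-term truncation -(x + x**2/2), x = C*T_R:
for x in (0.05, 0.1, 0.3, 0.5, 0.9):
    exact = np.log(1.0 - x)
    truncated = -(x + 0.5 * x**2)
    print(f"x={x:4.2f}  exact={exact:9.5f}  two-term={truncated:9.5f}  "
          f"rel. error={abs(truncated - exact) / abs(exact):7.2%}")
```

The relative error grows from a fraction of a percent near $x = 0.1$ to tens of percent as $x$ approaches unity, which is the confounding anticipated near $CT_R = 1$.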
If there remain among the basic predictor equations certain terms which contain non-linear coefficients and which, for reasons appropriate to a particular case, may not be omitted or otherwise approximated, a trial-and-error determination of the coefficients becomes the immediate task. In the above example, the C-coefficient would be calculated for all the substances to be investigated, according to the original equation (IIb) and by conventional regression analysis. The non-linear coefficient thus is determined (and, perhaps, attributed in a separate analysis), whence the Erpenbeck-Miller equation reverts to the form

$\log_{10}P \cong A + BT^{-1} + x_{i'}$,

where $x_{i'}$ represents a particular member of the family of basic variables $x_i$. Conventional regression was utilized for convenience only. This treatment of C is, in fact, quite reasonable in the light of the quotation cited earlier.

Special treatment is not generally required except, as noted, when one or more equations (IIa) contain terms with non-linear coefficients. Virtually any other equation falls into the "typical" category, wherein all undetermined coefficients are linearly related. Even the unusual Frost-Kalkwarf equation for vapor pressure (12),

$\log_{10}P = A + BT^{-1} + C(\log_{10}T) + DPT^{-2}$,   (IIe)

is typical, although the dependent and independent variables are not separable and the dependent variable appears in two forms. As was stated in a previous section, the appearance of a composite parameter such as $PT^{-2}$ offers no difficulties in the recursive linear regression method.

Once the initial set of predictor equations from the literature has been tabulated, an auxiliary set of equations may be introduced, to account for parametric equations in a second class of variables. Some vapor pressure equations, for example, include as independent variables the liquid and/or vapor viscosities (26):

$A(\log_{10}T) + \log_{10}(\log_{10}P - \log_{10}\mu) = B$.   (IIf)

One may thereby feel justified in introducing one or more auxiliary equations

$\mu = h(P, T, \ldots)$   (IIg)

to approximate the relationship between viscosity and certain independent variables. Although a separate recursive regression analysis of (IIg) may not be justified, still one may not wish to overlook the possibility of concealed parametric forms. Viscosity, for example, has frequently been correlated (associated might perhaps be the better word) with the parachor

$[P] = \dfrac{M\gamma^{1/4}}{\rho_l - \rho_v}$.   (IIh)

It is not inconceivable, then, that a best predictor equation for vapor pressure might include the parachor, or a function thereof, as an independent variable. The particular and most appropriate function may not be obvious, but a consideration of forms such as (IIg) and (IIh) may reveal some hint as to a possible form. Such second-level equations would not normally warrant exact or rigorous derivation, were it even possible, or trial-and-error procedures. Such equations (loosely, functions for independent variables) might be considered an optional extension of the basic predictor equations (IIa). Together, equations (IIa) and those of the form (IIg) constitute the primary formulas for Phase IB.

Phase IB, generating the X-parameters.—In Phase IB the initial parameters or variable groups $X_i$ are generated. The basic set of parameters consists of all the independent $X_j$ which are represented in the primary formulas (IIa) and (IIg). In this basic set, no unknown coefficients appear; those which were originally present
He may thus Introduce sinh( TL,), cosh( Tt, ), i t x t tanh(TT,) as auxiliary parameters. The decision may have stemmed from his study of a graphical plot of reduced vapor pressure versus these functions. Perhaps, however, the possibility arose during the transformations he made In Phase IA, when the collected terms of various infinite series resulted in the sum TT + 57 + + ••• = sinh<TR) Almost as likely, the use of sinh(T^) may have been suggested from purest intuition, or as an ’ ’aside” to other research, perhaps even as an undefined hunch. The emphatic point Is that no restriction is placed on the number of parameters which may be handled by recursive linear regression, and therefore the investigator should not feel restricted to functional forms traditionally relevant to the relationship under study. Naturally, the Indiscriminate enlargement of the set of parameters X^ Increases (linearly) the amount of computation in Phases IC and II. Some balance must therefore be struck between computer time, cost, or availability, and the total number of parameters. Mathematical experimentation of the type 37 described is frequently considered by physical scientists an heretical technique. This writer does not subscribe to that opinion. The chance of discovering an important relationship by purest accident is surely remote, but then the auxiliary parameters should not be considered purely accidental or random. The distinction between intuitive and randomly-chosen parameters is perhaps subtle, but not trivial. The difference is essentially the same as that between a meaningful or manageable problem and one of infinite scope. Another type of auxiliary parameter may also be introduced at this point in Phase I. Having accommodated in the framework of a future regression analysis all possible linear combinations of the primary formulas, we now consider certain non-linear combinations. In particu lar, we consider parameters which are developed in products and/or quotients of the individual formulas. The expression [p. (Xi)] [r2(x.)] Y = A + P— * ------— — (Ilk) P3(Xi) is an example of such a non-linear combination, not indeterminate so long as ) is not indentically zero. This latter circumstance can never prevail, else the search for a perfect predictor equation will have ended. Generally, each function P^(X^) approximates the dependent parameter Y. A very large number of cross 38 products of parameters may appear in the expansion of (Ilk). This expansion will not usually consist of a linear combination of terms (containing no unknown coefficients), but frequently may be so approximated. Consider in (Ilk) the result of employing only the most significant parameter X^ in the F^-equatlon. Then (Ilk) does reduce to the linear combination sought. Each term will contain an X- and an Xw in the numerator and an *1 *2 XF in the denominator. The parameter XF XF /Xp thus becomes an independent parameter of the second auxiliary type. The statements made earlier regarding auxiliary parameters apply also to this second type. In the expanded representation of (Ilk) all cross-products of the two functions F^X^) and FgfX^) must necessarily be considered, and none can be pre-emptorlly omitted. At the same time the number of equations (Ilk) is very much greater than those of the linear combinatorial form, and some restriction must of necessity be invoked. One may define a complexity coefficient c as the largest number of functions ^j(X^) which are to be included in general equations of the form (Ilk). 
The most complex parameter which then appears In the expansion of (Ilk) can be no more than c times as complex as any original parameter. Complexity, in this sense, is taken to indicate the number of distinguishable parameters In the primary formulas. The number of added parameters of the third type increases 39 rapidly, however, as c is made larger and for a consider able number of primary formulas. The bookkeeping task may itself be formidable. In handling parameters of the third auxiliary type, most analysts will prefer to pick and choose the specific parameters to be added to the basic set, again from their own intuition. The number of distinct parame ters of the third type would in practice be considerably smaller than the number of all possible such parameters, since different primary formulas for the same physical relationship undoubtedly would contain many identical terms. The bookkeeping or identification problem is thereby magnified, but the selection process may be considerably simplified. The fact that parameters of this type are decidedly synthetic should relegate their role to one of very moderate scope. More elaborate combinations of the primary formu las are of course possible, but these would in most cases fall perilously close to that division between random parameters and parameters selected for reasonable cause. One must recall that each of these formulas has to some extent served as a predictor equation by itself. Except for good reason, unusual combinations of predictor equations would seem to border on the presumptuous. Phase 10. generation of numerical X-data.— Largely a matter of numerical computation, Phase IC involves the 40 preparation of the observation matrix. This Nxn matrix constitutes the basic arithmetic data for Phase II (regres sion analysis). Corresponding to the n independent parameters X^ (1<i<n) are the n columns of the matrix. Within each column are N rows, the Individual observations or data points. A general study of the Prost-Kallcwarf equation (12), for example, log1QP = A + BT_1 + C(log10T) + DPT-2, (He) would lead to a three-column matrix, one column for each of the X^-parameters T , log10T,PT . The number of vapor pressure values selected for study determines the number of rows in the matrix. The i-th data point (the pair ?i»T^) thus is identified as the i-th row, and (X^^ is the value of log^T^ As for the dependent parameter YQ=log^QP, a column of these data is augmented on the left of the matrix. The Y-column is specifically excluded from what is called the observation matrix, since the Y^ values for Phase II change at each recursion stage. The X^y values undergo no such change. The X-matrix, as it will be so called, is generated in the memory circuit (cards, tape, magnetic drum, etc.) of the computer and cycled through the logic of the computer program(s) during the regression analysis. Provision must generally be made for extending the number of columns of the X-matrix, since 41 each recursion stage may suggest additional X^ parame ters. These parameters, generated as needed, augment the X-matrix on the right. The generation of the X-matrix is a problem in elementary computer programming which will not be discussed here. Various subroutines (e.g., for transcendental functions) must generally be incorporated, but the user may alternately apply any one of several compilers avail able for specific computing equipment. 
Transformations which may be required for Y, the column of dependent-parameter data, would normally be made at the same time the $X_i$ values are calculated. For Phase II, only the X-matrix and the Y-vector are required.

The computer program should also accomplish the weighting of parameters, if required. This weighting procedure allows for unequal precision in the determinations or transformations of (or observations on) a particular parameter. The method, a weighting inversely as the variance, is covered adequately by Sherwood (33) and others (40,42), and will not be discussed here. Very infrequently, the difference between weighted and unweighted least squares treatments results in markedly different predictor equations. The difference is perhaps accentuated in the terminal stages of recursive linear regression, but the added computational effort introduced by observation weights cannot be overlooked. As Sherwood states:

Although the procedure in some cases is not particularly complicated, it is seldom justified in the treatment of chemical engineering data. After comparing the results in several cases with and without weighting, Scarborough [31] expresses the opinion that it is ordinarily not worth while to bother about the weights of the residuals. (33)

For parameters which are truly unknown, specifically the dependent parameter, the associated variances also are unknown and must be estimated from the $X_i$. In regression analysis this estimation is by trial-and-error. Data analysts currently use weighted regression analysis only with quite rigorous mathematical models.

The identification of all numerical data is complete in the indices i and j, but another subscript might subsequently prove necessary. If Phase III (attributing the coefficients) were contemplated, the basic data should be partitioned further according to the number of species in Phase III observations. In the vapor pressure example, Phase III data would be partitioned according to chemical species, in order that J sets of predictor coefficients may be obtained for each recursion stage. The X-matrix is conveniently handled, in this case, as an array of J sub-matrices with elements $(X_i)_{jj'}$, $1 \le i \le N_{j'}$, $1 \le j \le n$, where $N_{j'}$ is the number of observations for species $j'$. The general array is shown in Figure 2. The subscripted X's in Figure 2 are completely unambiguous. The parentheses are desirable in identifying the elements as members of a two-dimensional array. The Y-nomenclature is similar to that for X. The notation $(Y_i)_{j'}$ indicates the i-th observation (in a total of $N_{j'}$) on species $j'$ (in a total of J). The general nomenclature at this point may seem unduly sophisticated, but the necessity of complete identification is served by no significantly simpler system. Since in this section the concern is with individual items of data, in the more general formulas and interpretation of Phases II and III one subscript described above, i, disappears. A complete description of nomenclature appears in Appendix I.
(x 2 ^n2 (YNg^2 ( XN2 ^12 • * ♦ (XNg ^ 22 (XN2 J32 • * * ^XN2 ^n2 • • • <T' \ <X1>1k ^X1^2k ^X1^3k . . . (x 1 )nk ‘V * (X2 ^1k ^X2^2k ^X2^3k . . • (x2^nk < V 1k • • • ^XNk ^2k ^ N ^ k * * * ^ t f ^ n k DEVELOPMENT 0? THE PREDICTOR EQUATION Phase II.— The primary feature of recursive linear regression is introduced in Phase II, during which the general form of the predictor equation is established. The basic data generated in Phase IC are manipulated in a cyclic manner to produce correlation coefficients on successive residulals Y±. The three parts of Phase II are separately concerned with the calculation of correlation coefficients, the making of decisions as to parameter choice Xj, and the calculation of equation residuals, the new dependent parameter. Phases II and III may be used simultaneously, but in this section the technique of attributing the coefficients will not be discussed as such. In practice, however, part of Phase III is indis tinguishable, mathematically, from the complete Phase II. The calculations in Phase II are certainly complex and repetitive enough to warrant the use of a digital computer. The procedural steps to be discussed are those which have been used extensively in statistical analysis in the past, and the numerical computations thereof are competently analyzed elsewhere (15*31). For this reason computer-performed calculations are assumed in this section. Also, and to clarify further the logic of Phase 45 46 II, the basic nomenclature (viz., subscripting) has been simplified for expressions in which meaning and intent are unmistakable. Phase IIA. calculating the basic statistical parameters.— The correlation coefficients of Phase Ila are calculated from the rectangular array of numerical data produced during Phase 10. Consider now the sub-array of data for a single species. Each set (row) i, 1£i£N, consists of values (X^)j» 1s;j£n, on the n independent parameters X^. Together with similar values of the dependent parameter at recursion stage k, these data permit the calculation of all correlation coefficients r(Yjc_1,Xj) and the values of the coefficients in the least squares relationship <Yi>k-i 3 a0k,j + akj<xi>r as well as the residuals (p^)^ and error fractions ( €i^kj 8 (Pi^kj = (Yi^k-1 "a0k,j “ akj^Xi^j* (IIlb) aOk 1 - ^ 1 < «i>kj = 1 - 7— *k3 ' 1 ' Ullo) u i 'k - 1 U i ;k-1 On an electronic computer these five quantities r, an, ,, U K , j akj» P are easily generated from the same set of statistical measures. For a total of N observations (rows), the pseudo-variances are defined as 47 , J L o N - = nSIKXj^)2 - [2H(X1)]2, (Hid) 1=1 ■*■ 1=1 9 o JL JL N VYX = VXY = <Yi H xi> - (Y± )IZ] (X^). Then for the k-th recursion stage the following equations may be derived (see Appendix IV): r2 r(Yv-,.x1) = YY X *k-1 . 1 k-i* r vY vY v k-1 *k-l VY X ak 1 = J|=1i (me) VX X xrj aOk,j = ^ / Yi>k-1 " (X15 j] ’ in which ^ may also be represented by the means, a0k,j = Yk-1 " akjXj * The five statistics r, a0icfj» aicj* P » € are the com” plete set of results for a single parameter Xj of the specific sub-array of data. If the total number of independent parameters is large, the volume of results may be sharply reduced by limiting the number of sets of computed results to those associated with only the largest r’s. Since the magnitude of r Is a controlling factor for the choice of an independent parameter in each recur sion equation, the results given for small r-values will not generally be either useful or Informative. 
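A compact sketch of the Phase IIA computation, in the same illustrative Python, follows; the function and variable names are inventions for the sketch, not part of the original program.

```python
import numpy as np

def phase_iia(Y, X):
    """Phase IIA statistics for one candidate parameter: Y is the
    current dependent parameter Y_{k-1}, X one column of the X-matrix.
    Implements the pseudo-variances (IIId), the statistics (IIIe),
    the residuals (IIIb), and the error fractions (IIIc)."""
    N = len(Y)
    Vxx = N * np.sum(X**2) - np.sum(X)**2
    Vyy = N * np.sum(Y**2) - np.sum(Y)**2
    Vyx = N * np.sum(Y * X) - np.sum(Y) * np.sum(X)
    r   = Vyx / np.sqrt(Vyy * Vxx)       # correlation coefficient
    a   = Vyx / Vxx                      # slope a_kj
    a0  = Y.mean() - a * X.mean()        # intercept a_0k,j
    rho = Y - a0 - a * X                 # residuals, (IIIb)
    eps = rho / Y                        # error fractions, (IIIc)
    return r, a0, a, rho, eps
```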
Phase IIB, selecting the recursion parameter.--Of all the individual steps in recursive linear regression, none is more important nor, perhaps, more difficult than Phase IIB. During this stage the recursion equation is modified by the addition of a new independent parameter. Just as there is apparently no intuitively right or best choice of the parameter $X_I$ introduced, so there is also no way of delineating the decision-making step itself. For this reason a specific goal should be enunciated by the analyst. The search for a very accurate predictor equation, as mentioned earlier, may not at all coincide with the search for a general equation, or for a "theoretical" one. In all cases, however, three criteria may be investigated.

The foremost decision criterion is represented by the set of r-values generated in Phase IIA. An accurate predictor equation might well be produced solely on the basis of these r-values; that parameter $X_I$ associated with the correlation coefficient of greatest magnitude is introduced as a predictor term at each recursion stage. At the end of any recursion stage k the new dependent parameter

$$Y_k = Y_{k-1} - a_k(X_I)_k$$

is generated and used subsequently in the (k+1)-th recursion stage. If a number of analyses are performed simultaneously (e.g., for different chemical species), then the choice of $X_I$ may devolve on the $X_j$-associated r-values which are largest or most uniformly prominent in the group as a whole. The augmentation of the predictor equation is thus made to depend only on linear correlation coefficients. Justifying such a decision is the statistical principle that the residual or unaccountable variation from a linear equation varies inversely with the magnitude of the correlation coefficient (see Appendix IV). Thus the largest r-value is associated with the smallest residual sum of squares. The use of the r-criterion exclusively may be considered short-sighted, in a sense, as the prior and future recursion stages are beyond the realm of study. What is sought during each recursion stage in this case, however, is just that parameter $X_I$ which minimizes the residual sum of squares on the new dependent parameter. Accuracy is hence best served by the r-criterion alone.
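Under the r-criterion alone, the selection step reduces to a scan over the X-matrix columns; a sketch, again with hypothetical names:

```python
import numpy as np

def r_criterion(Y, X):
    """Phase IIB under the r-criterion alone: scan every column of
    the X-matrix and keep the one whose linear correlation with the
    current dependent parameter Y is greatest in magnitude."""
    best_j, best_r = 0, 0.0
    for j in range(X.shape[1]):
        r = np.corrcoef(Y, X[:, j])[0, 1]
        if abs(r) > abs(best_r):
            best_j, best_r = j, r
    return best_j, best_r
```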
Frequently the set of r-values is by itself an insufficient guide to the selection of the independent parameter $X_I$. Especially if the coefficients are to be attributed, the choice of an appropriate $X_I$ is aided by the second decision-making criterion, represented by the set of $a_j$-coefficients. The value of any coefficient $a_j$ ($j \ne 0$) is given by

$$a_j = \frac{V_{YX_j}}{V_{X_jX_j}} = r(Y,X_j)\sqrt{\frac{V_{YY}}{V_{X_jX_j}}}. \qquad \text{(IIIg)}$$

The a-criterion is thus an extension of the r-criterion, since the pseudo-variance $V_{YY}$ is not a necessary statistic for the computation of $a_j$. This coefficient $a_j$ is, in fact, just the slope of the least squares line fitting the $Y$-$X_j$ data pairs.

At this time we must digress momentarily to a consideration of Phase III procedure, since the attributing of coefficients may be greatly hindered by a singularly unfortunate choice of $X_I$. The complete set of $a_j$ (e.g., for all chemical species) should be studied graphically before the associated $X_j$ is chosen as $X_I$ for the current recursion stage. The $a_j$ which are to be attributed may be plotted against any of the simpler Z (chemical) parameters of Phase III. Only if the resulting plot can be described by a smooth curve will it be possible to express $a_j$ as a reasonable function of the Z selected as abscissa. The plot may actually reveal, more significantly, that the $a_j$ are too randomly distributed for representation by any reasonable function(s) of Z. The a-criterion is thus actually a test of consistency (which, however, must be interpreted in terms of the presumed uniformity of Z). Among all the parameters contemplated as added terms in the predictor equation, the search is for that one whose associated $a_j$-coefficients may be attributed most accurately, most simply, or most uniformly. Thus is embodied the essential nature of the a-criterion.

The interpretation of such $a_j$-Z plots as may be made rests partially on the associated values of $r(Y,X_j)$. Obviously, if r is very small, the corresponding coefficient $a_j$ may not lie close to the best $a_j$-Z curve. If a smooth curve can be drawn near most of the $a_j$ with large r-values, the a-criterion is thus still met. Regardless of the unsatisfactory visual picture such a plot might give, the Phase III (attributing) procedure for $a_j$ may nevertheless be quite satisfactory. The importance of a particular point on the $a_j$-Z graph must be intuitively weighted according to its r-value. Noting that the sum of squares of the residuals on the linear equation is (see Appendix IV)

$$\sum_{i=1}^{N}\rho_i^2 = \frac{1}{N}V_{YY}(1 - r^2), \qquad \text{(IIIh)}$$

one may establish a very rough weighting procedure. One may, for example, plot very close to $a_j$ that number of distinct points given by $(1 - r^2)^{-1}$. Weighting the relative importance of each $a_j$ by this method (i.e., replication of $a_j$-points on the plot) is thus approximately equivalent to an inverse weighting by the equation error. Aptly, the quantity $(1 - r^2)^{1/2}$ is called the coefficient of alienation.
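The rough replication weighting just described might be sketched so; the rounding rule is an assumption of the sketch, not part of the procedure above.

```python
import numpy as np

def replication_counts(r_values):
    """Number of coincident points to plot for each species'
    coefficient on an a-versus-Z graph: approximately (1 - r**2)**-1,
    an inverse weighting by the squared coefficient of alienation.
    Requires |r| strictly less than one."""
    r = np.asarray(r_values, dtype=float)
    return np.rint(1.0 / (1.0 - r**2)).astype(int)
```

For r = 0.999 the count is roughly 500; for r = 0.9 only about 5, so strongly-correlated coefficients dominate the visual fit.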
The third decision-making criterion for $X_I$-selection is based on $\epsilon$ and E, the recursion- and over-all-error fractions. For each observation i the k-th recursion-error fraction may be written as in (IIIc), with the species (j) subscript deleted:

$$(\epsilon_i)_k = 1 - \frac{a_{0k} + a_k(X_{iI})_k}{(Y_i)_{k-1}}. \qquad \text{(IIIi)}$$

In general, $\epsilon$ is a measure of the closeness of fit of the linear equation (IIIa). It is different from the over-all error for the augmented predictor equation, since the denominator is, for all recursion stages save the first, different from the original dependent parameter $Y_0$. As shown in Appendix IV, the over-all-error fraction is a function of Y-values for successive recursions, and of the error fraction $\epsilon$. After the k-th recursion, the over-all-error or total-error fraction in the i-th observation is given by

$$(E_i)_k = (\epsilon_i)_k\frac{(Y_i)_{k-1}}{(Y_i)_0}, \qquad \text{(IIIj)}$$

in which $(Y_i)_0$ is the i-th observation on the original dependent parameter. Both error fractions are useful; one ($\epsilon$) represents an improvement since the last recursion stage, the other (E) an attained accuracy with the current (k-th) predictor equation.

The expressions for average error (e.g., for all observations on a single chemical species) are not reducible to such simple forms as (IIIi) and (IIIj), and are best handled as simple summations (see also Appendix IV):

$$\overline{|\epsilon|}_k = \frac{1}{N}\sum_{i=1}^{N}\left|1 - \frac{a_{0k} + a_k(X_{iI})_k}{(Y_i)_{k-1}}\right|, \qquad \text{(IIIk)}$$

$$\overline{|E|}_k = \frac{1}{N}\sum_{i=1}^{N}\left|(\epsilon_i)_k\frac{(Y_i)_{k-1}}{(Y_i)_0}\right|.$$

These statistics are those generally reported as "deviation" and "average deviation," respectively. In the application of $\epsilon$- and E-criteria, maximum values (for one chemical species) are often the most useful parameters. If the error fractions for some adjacent members of a set of observations were significantly larger than the error fractions for other members, one might expect not only increasing instability during future recursion steps, but also considerable difficulty in attributing the coefficients in Phase III. As in the study of the a-criterion, a graphical representation is suggested. A plot of $\epsilon$ as a function of $X_I$ may reveal a decided departure from linearity, indicating thereby that the inclusion of $X_I$ in the predictor equation reduces the residuals non-uniformly, or non-randomly. Such information may be exceedingly valuable, for it may suggest just that transformation which will accomplish the desired reduction of the squares of the residuals. Suppose, for example, the $X_I$-choice leads to $\epsilon$-fractions which plot randomly for all observations, except for a mild hump, oscillation, or other eccentricity in the smoothly-drawn curve. Then the mathematical form of this curve may be deducible by inspection, in which case the table of $X_i$-parameters may be further augmented by an appropriately-chosen new independent parameter. The process is generally handled as a next Phase II recursion, but the investigation at this stage enables the data analyst to determine whether or not a proper $X_i$ (of the form needed) may already have been included in the original table or matrix of $X_i$-parameters.

In fact, the use of the $\epsilon$-criterion (for that matter, the $\rho$-values) is one demonstration of the power of recursive linear regression. If among all the independent parameters $X_i$ even that one ($X_I$) most strongly correlating with Y exhibits non-random residuals, an additional X-parameter may be introduced as compensation. The over-all improvement is directly related to the accuracy with which the residual or error curve may be fitted with a linear function of the new independent parameter. An obviously systematic error curve can always be approximated by any one of an infinite number of functions of the added X-parameter, although a relatively complex equation form may be required. In the extremity, various orthogonal functions may be summed for a fit to any desired degree of approximation. In any event, a positive improvement can be assured for the next succeeding recursion stage, through the augmenting of the X-matrix.

If the first three criteria (r, a, and $\epsilon$) all fail to reveal a preferred $X_I$ for inclusion in the predictor equation, three circumstances may prevail: either the set of X-parameters is too small, the $X_i$ are insufficiently complex to describe an obviously non-linear $Y$-$X_i$ relationship, or the residuals have been reduced to the state of essentially random variation. In the latter case, the analysis is manifestly terminated. In the first two cases, either an enlargement of the $X_i$-set or else an immediate attempt at some more primitive form of analysis is required.
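The three error statistics of equations (IIIi)-(IIIk) may be computed together; a sketch with hypothetical names, where XI is the column of the chosen recursion parameter:

```python
import numpy as np

def error_fractions(Y_prev, Y_orig, a0, a, XI):
    """Recursion-error fractions (IIIi), over-all-error fractions
    (IIIj), and their species averages (IIIk).  Y_prev is Y_{k-1},
    Y_orig the original dependent parameter Y_0."""
    eps = 1.0 - (a0 + a * XI) / Y_prev       # (IIIi)
    E   = eps * Y_prev / Y_orig              # (IIIj)
    return eps, E, np.mean(np.abs(eps)), np.mean(np.abs(E))   # (IIIk)
```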
The method is trial-and-error to the extent that various selected functions, perhaps containing coefficients related non-linearly, must be fitted to the dependent parameter Y. As a last recourse, consequent only on the failure of the three prime criteria, this procedure amounts essentially to the standard multiple regression scheme with a predetermined equation in specified independent parameters. Frequently a simple manual solution, not involving the least squares principle, will suffice. Either conventional or recursive regression procedures may be utilized, however, depending on whether or not the coefficients are to be attributed.

By far the most useful trial-and-error procedure is one involving only a simple translation of the $Y$-$X_i$ origin. The observation that a Y-versus-$X_i$ curve is approximated, for example, by a parabola symmetric about $X_i = X_F$ and with vertex at $Y = Y_F$ virtually acclaims the transformation

$$Y_T = Y - Y_F, \qquad (X_i)_T = X_i - X_F. \qquad \text{(IIIl)}$$

If the parabola appeared somewhat skewed, an additional non-algebraic transformation of either or both parameters may be obvious. When appropriate, an s-order polynomial approximation may be useful:

$$(X_i)_T = (X_i - A_1)(X_i - A_2)\cdots(X_i - A_s). \qquad \text{(IIIm)}$$

The coefficients $A_i$ are the zeros for an abscissa at the translated $X_i$-origin, $X_i = X_F$. Each transformation of the type (IIIm) represents one or more parameters added to the original X-matrix of Phase IC. Rather than solving for the $A_i$ by trial-and-error or by any use of regression techniques, one would preferably read approximate values from the graph and then insert several new X-parameters into the basic table. Each of these new parameters might represent a minor modification of the fundamental s-order polynomial deduced from the graph by inspection. If the coefficients $a_i$ of the predictor equation itself are to be attributed, an even more deliberate approach to (IIIm) may be taken. The procedure amounts to a regression analysis at the third level, in some cases, since equation (IIIm) may also represent a coefficient of Phase III. The $A_i$ in (IIIm) are very loosely attributed, to the extent that only the most general features of the curve are matched. The purpose in this effort is to reduce the amplitudes of maxima and minima. These steps, and in fact all those described above, would normally be taken only when the three basic criteria fail to supply the requisite information on a suitable $X_I$ for the recursion stage in process.

The methods of recursive linear regression are closely tied to the principle of least squares. Insufficiency in the body of data ($Y_0$ and all $X_i$) does not mitigate the prime hypothesis of inferential analysis in correlation and regression; the set of independent parameters must be exhaustive.
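New candidate columns of the type (IIIm) above, with zeros read approximately from a graph, might be generated as in the following sketch; the zeros shown are hypothetical.

```python
import numpy as np

def translated_polynomial(Xi, zeros):
    """Equation (IIIm): an s-order polynomial in X_i whose zeros
    A_1..A_s are loosely attributed by inspection of the graph."""
    Xi = np.asarray(Xi, dtype=float)
    XT = np.ones_like(Xi)
    for A in zeros:
        XT = XT * (Xi - A)
    return XT

# Several minor modifications of the deduced zeros would each be
# appended to the X-matrix as a separate candidate column, e.g.:
# X = np.column_stack([X, translated_polynomial(X[:, 0], [0.30, 0.70]),
#                         translated_polynomial(X[:, 0], [0.25, 0.75])])
```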
Phase IIC, generating the new dependent parameter.--Phase II is nominally complete with the selection of a particular $X_I$ for the current recursion stage, thus defining the k-th recursion (and predictor) equation:

$$Y_{k-1} \cong a_{0k} + a_k(X_I)_k.$$

If the analysis is deemed complete, perhaps on the basis of a "t"-test of $a_k$, or of randomly-distributed residuals, the predictor equation may be immediately written:

$$Y_0 \cong a_{0k} + \sum_{i=1}^{k} a_i(X_I)_i. \qquad \text{(IIIn)}$$

The coefficient $a_k$ may yet have to be attributed (in most cases, quite accurately) if this procedure were followed for the other $a_i$. Should another recursion be required, however, the new dependent parameter is calculated as

$$Y_k = Y_{k-1} - a_k(X_I)_k, \qquad \text{(IIIf)}$$

and Phase II reprocessed with a possibly-augmented X-matrix. Before (IIIf) may be utilized, the value of $a_k$ must be calculated. As a part of the r-$a_0$-$a_k$-$\rho$-$\epsilon$ output, this coefficient is best in only one of the senses of least squares regression. If $a_k$ is to be attributed it will take on a different value, as close to the optimum ($a_k$) as the p-th stage (Phase III) $a_k$-predictor equation

$$a_{k,p-1} \cong b_{0k,p} + \sum_i b_{ki}(Z_I)_{ki} \qquad \text{(IIIo)}$$

will permit, and satisfying the least squares principle in a different sense. Two alternatives are thus presented. If accuracy is paramount, the $a_k$ of Phase IIA (i.e., that of the r-$a_0$-$a_k$-$\rho$-$\epsilon$ output) may be used, and the residual $Y_k$ calculated immediately. This procedure usually is degenerative of any subsequent Phase III analysis. If on the other hand a formal expression for $a_k$ is desired, the Phase III procedure must be initiated. The result of Phase III, an analytical formula for $a_k$, is used in (IIIf). Whichever of the two methods is used, the outcome is a new dependent parameter $Y_k$.

For the first few recursions, Phase III would normally be performed, presuming the first few $X_I$ represent reasonably strong correlations with the then-current Y. Eventually, however, most of the residual variation may be found to scatter more or less randomly about a constant $a_{0k}$, the regression parameter. Further regression analysis would thereafter be futile. Occasionally the E-criterion of over-all accuracy may be superior to a test of randomly distributed residuals, since the latter may be recognizable only with difficulty.

Systematic experimental error has not heretofore been discussed, but should be mentioned as a possible vitiation of the random-residuals hypothesis. Normally, in conjunction with the approximations of the various a-coefficients, recursive linear regression leads to a terminal residual function characterized by relatively strong periodicity, of large amplitude and uneven frequency. Such functions may be: (a) primarily random and therefore arising out of random experimental error, (b) accountable by unknown X-parameters not originally included in Phase I, (c) attributable to systematic error. As in any handling of physical measurements, regression analysis may involve raw data which conceal unsuspected systematic error. Recursive linear regression presents one means of detecting such error, namely the inclusion, as $X_i$, of such instrumentation variables as may be suspect (e.g., parallax of optical devices, non-linear ranges of thermometers or other meters, interpolation ranges in discontinuous calibration curves, etc.). Variables such as these, related to misapplied or unapplied corrections to equipment readings, are distinct from those embodied in "experimental error" as commonly interpreted. Should the data analyst have no legitimate grounds for apprehension about the experimental technique, he is confronted with either of the two remaining explanations, (a) or (b) above, of the final residuals for the predictor equation. The difficulty arises when he attempts to define a logical discontinuation of further attempts to improve the equation. The problem of detecting an acceptably "best" solution does not arise in multivariate regression. In recursive linear regression the final stage may be taken as that for which the accuracy-correlation duality is satisfactory, or at least sufficient, for the purposes to which the predictor equation will be applied. As in any research, the final approach to perfection is inordinately expensive of time and morale. The initial statement of a goal is, as mentioned earlier, the only definitive limit.
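One complete Phase IIC step may be sketched as below; the choice between the raw least squares slope and an attributed value mirrors the two alternatives just described, and the names are illustrative only.

```python
def new_dependent_parameter(Y_prev, XI, a_raw, a_attributed=None):
    """Form the new dependent parameter Y_k of equation (IIIf).
    If Phase III has supplied an analytical value for a_k it is
    preferred; otherwise the raw slope is used, at some cost to
    any subsequent attributing."""
    a_k = a_attributed if a_attributed is not None else a_raw
    return Y_prev - a_k * XI
```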
EVALUATION OF THE ATTRIBUTIVE COEFFICIENTS

Phase III.--Although the method of recursive linear regression was intended originally as a Phase III operation, the actual procedure therein is precisely that of Phase II. Instead of a dependent parameter $Y_0$ we consider one predictor coefficient $a_k$, associated with the parameter introduced during the k-th recursion stage of Phase II. Instead of many physical parameters $X_i$ we consider, for example, many chemical or second-level parameters $Z_i$. In place of the form of a Y-predictor equation,

$$Y_0 \cong a_{0k} + \sum_{i=1}^{k} a_i(X_I)_i, \qquad \text{(IIIn)}$$

we seek that of an a-predictor equation in p Phase III stages:

$$a_k \cong b_{0k} + \sum_{h=1}^{p} b_{kh}(Z_I)_{kh}. \qquad \text{(IVa)}$$

For the N observations on (Y, $X_1$, ..., $X_n$) of any single species we substitute the M sets of values ($a_k$, $Z_1$, ..., $Z_m$). In most other respects Phase III is a duplication of Phase II. The two major steps in Phase III are essentially Phase I-Phase II in combination. For illustrative purposes in this section the assumption is that M different analyses (corresponding to M chemical species) have been performed simultaneously in Phases IA and II.

Phase IIIA, generating the $Z_i$-parameters.--The first step in attributing the a-coefficients of Phase II is, as in Phase IB, the compilation of a table of independent parameters. One set of these (chemical) parameters may be sufficient for all coefficients $a_k$ which are to be attributed, but often one may find it necessary to augment the Z-table after each Phase III recursion, depending on the nature of the recursion equation residuals. The Z-table should in any case be regarded as an open-end tabulation. Just as in a consideration of $X_i$-parameters in Phase II, one is not usually in a position to segregate all reasonably-conceived parameters $Z_i$ into smaller groups applicable to specific coefficients. Neither is this procedure an advantageous one in computer processing, as such manipulation of sets of data on cards or tape is cumbersome as compared to the arithmetic operations performed. Since the number of $Z_i$ may be made arbitrarily large, no such subdivision is required as a condition to the simplification or narrowing of scope of the Phase III analysis.

Phase IIIA is in most respects analogous to Phase IB, in which certain variables were selected for the regression study. These variables were defined during the course of a preliminary literature survey, a survey of existing equations for the fundamental physical relationship under investigation. Augmenting this set of variables were others suggested intuitively, the final set consisting of all "interesting" variables. From these were generated independent parameters $X_i$ for use in Phase II. The augmented set of Phase I variables $x_i$ represented physical variables in the sense that, regardless of chemical species, the fundamental predictor relationship was presumed reducible to but one mathematical form containing these $x_i$ with unknown coefficients. In the application of recursive linear regression to "functional" relationships of the physical sciences, we shall adopt the convention that properties of a system may be classified as physical (macroscopic, or system properties) or chemical (microscopic, or substance properties). The simplest definition, to be used here, is that a property is physical if it may be regarded as continuous, chemical if it appears as a discrete distribution.
For a pure substance, temperature, pressure, viscosity, electrical resistance, and vapor pressure are thus physical properties, each varying continuously and (perhaps) uniformly over a more or less well-defined range. To any properly-dimensioned number in this range there corresponds a physically realizable condition defined as the property T, P, $\mu$, etc. On the other hand formula weight, critical temperature, and vapor pressure at a specific temperature are not continuous properties for the set of all pure substances (presumably finite in number). It is at least theoretically possible to choose a particular number of degrees Kelvin and subsequently find no corresponding pure substance with that critical temperature. According to the above definitions, then, the $x_i$ are physical and the $z_i$ chemical properties.

The $z_i$ are itemized as much from a study of various chemical or other physical handbooks as from a literature survey. The analyst's knowledge of the fundamental physical relationship plays a large role at this stage, although in truth the number of simple and conveniently measurable chemical properties is relatively small. If the relationship under investigation involves various thermodynamic criteria, then formula weight, critical temperature, critical pressure, and critical density might be introduced. In the study of electrical or ionic phenomena the inclusion of dipole moment might be warranted; in nuclear physics, half-life of a nuclide. The types of $z_i$ used depend largely on the physical realm of the relationship under study. The $z_i$ may be few in number, but the set of parameters may be elaborated at will, precisely in accordance with the conditions discussed in Phase IB.

A graphical plot of each $a_k$ for all substances is as useful in Phase III as were similar plots in Phase II. They may well serve the purpose of establishing the Z-forms to be studied. When any $a_k$ is plotted against some chemical property or variable z, a smooth curve may often be superimposed on the data points and an equation in z immediately inferred. The reader is referred to the latter part of the section on Phase IIB (pages 56-58) for special techniques in handling either equation residuals or, in this case, coefficients to be attributed. Analogous to Phase IB, however, the development of a table of $Z_i$ from $z_i$ is largely intuitive, undoubtedly more so for Phase III. The illustrative example which concludes this paper will demonstrate some of the procedure for selecting a set of $Z_i$. Once chosen, these $Z_i$-parameters are formulated into a Z-matrix equivalent in form and mathematical notation to the X-matrix.

Phase IIIB, attributing the $a_k$-coefficients.--Subsequent to the construction of the Z-matrix, Phase III is virtually identical with Phase II, $a_k$ for Y, $Z_i$ for $X_i$, mutatis mutandis. The $a_{kj}$ of Phase II recursion stage k is to be approximated in a total of p Phase III recursion stages as

$$a_{kj} \cong b_{0k,p} + \sum_{h=1}^{p} b_{kh}(Z_{JI})_{kh}. \qquad \text{(IVb)}$$

For each Phase III recursion stage h ($1 \le h \le p$) the dependent parameter is approximated by

$$a_{kj,h-1} \cong b_{0k,h} + b_{kh}(Z_{JI})_{kh}. \qquad \text{(IVc)}$$

The new dependent parameter is then calculated as

$$a_{kj,h} = a_{kj,h-1} - b_{kh}(Z_{JI})_{kh}. \qquad \text{(IVd)}$$

No coefficient $b_{kh}$ is to be attributed (although theoretically this, too, is possible with recursive linear regression), so the $Z_I$-selection criterion reduces almost solely to a consideration of the correlation coefficients $r(a_{kj,h-1}, Z_i)$. To a lesser extent the specific form of $Z_I$ might be considered important to the development of a predictor equation with impressive inferential qualities.
Usually, however, the expressions for $a_{kj}$-coefficients may (or must) be considerably more complex than the basic expression or predictor equation for $Y_0$. In practical application of the final predictor equation these coefficients $a_{kj}$ are very infrequently calculated more than once by the user. At the same time, they must be fitted quite accurately with Z-functions during the regression analysis. A poorly approximated coefficient $a_{kj,h}$ may virtually nullify the benefits accrued by introducing into the basic predictor equation the perhaps eminently suitable parameter $(X_I)_k$ during the last (k-th) Phase II recursion stage.

One has the choice of limiting dimensionality in the a-predictor equation to the coefficients b, to the parameters Z, or to permit a mixture of dimensions in each. The first option would normally be chosen, primarily to avoid the difficulties which result in analysis of such combinations as $be^z$, $b\log_ez$, $bz^{1/2}$, $b\tan^{-1}z$. In these instances dimensioned z's may lead to perplexing philosophical problems. The predictor equation must be dimensionally homogeneous, as regards both the formula itself and the inferential analysis which presumably links the physical and mathematical models. In some cases the complex interpretations placed on Z-parameters may be resolved in a revised construction of the constant term $b_{0k,h}$; for example

$$b\left[\log_e(z/z_F)\right] = -b\log_ez_F + b\log_ez = A + b\log_ez. \qquad \text{(IVe)}$$

The dimensional properties of $b\log_ez$ may be thus tied to the predictor equation constant. The expansion of the basic predictor equation leads to terms of the type $bX_IZ_I$, each of which must carry units identical with those of the predicted dependent parameter $Y_0$. In the vapor pressure example, each $bX_IZ_I$ must be dimensionless, as is $Y_0 = \log_eP_R$. As long as each has the same dimensions, however, it need not be apparently dimensionless. One may consider any coefficient b as a product $Bb'$, and the B may be factored out of the equation. Then each term $b'X_IZ_I$ may take on arbitrary dimensions, B assuming the reciprocal dimensions. The separability is of course figurative; only homogeneity is sought.

The satisfaction of accuracy requirements for any coefficient $a_{kj}$ may be tested by means of the error fractions (see also Appendix IV)

$$(\delta_j)_h = 1 - \frac{b_{0k,h} + \sum_s b_{ks}(Z_{JI})_{hs}}{a_{kj,h-1}}, \qquad \text{(IVf)}$$

$$(D_i)_{kh} = 1 - \frac{a_{0k,j} + \left[b_{0k,h} + \sum_s b_{ks}(Z_{JI})_{hs}\right](X_{iI})_k}{(Y_i)_{k-1}}. \qquad \text{(IVg)}$$

The subscripts h and k refer, respectively, to the recursion stages in Phase III and Phase II. The error fraction $\delta$ represents the error in $a_{kj,h-1}$ due to approximation (IVb), the predictor equation for that coefficient. The error fraction D is the ultimate error in the basic predictor equation for $Y_0$. The difference between D and the E of equations (IIIk) is thus a measure of the total predictor equation error resulting from the attributing procedure on $a_{kj,h}$. Obviously $D \ge E$, and the use of Phase III causes a degeneration in the k-th recursion stage of Phase II. The fact that the best a-value in Phase II is only approximated in Phase III is not necessarily an objectionable feature. At best, the attributing procedure tends to damp out irregularities in the set of coefficients. At worst, into the new residuals $Y_k$ will be forced a periodic or oscillatory tendency which may at later stages become explosive. The various trial-and-error procedures which have been described will generally be employed more frequently in Phase III than elsewhere.
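A single Phase III recursion stage, equations (IVc)-(IVd), might be sketched as follows, with one value of the coefficient per chemical species; the names are hypothetical.

```python
import numpy as np

def phase_iii_stage(a_prev, Z):
    """One Phase III stage h: regress the coefficient residuals
    a_{kj,h-1} on every Z-column, keep the strongest correlate
    (the Z-selection criterion reduces to r here), and return the
    new residuals per (IVd)."""
    best = max(range(Z.shape[1]),
               key=lambda j: abs(np.corrcoef(a_prev, Z[:, j])[0, 1]))
    b, b0 = np.polyfit(Z[:, best], a_prev, 1)    # slope b_kh, intercept b_0k,h
    return best, b0, b, a_prev - b * Z[:, best]  # (IVd)
```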
The more strongly the first parameter $(X_I)_1$ correlates with $Y_0$, the more essential it is that $a_1$ (i.e., $a_{1j,h}$) be predicted accurately. The residual variation represented by $Y_1$ may otherwise be completely obscured by the induced error of approximation in the coefficient. The error fractions $\delta$ and D signal, to some extent, the danger which may thereby be concealed.

As might be supposed, a very large number of chemical species permits the generation of quite complex predictor equations simply on the basis of the available number of degrees of freedom. The theoretical impact of the resulting formula would be diluted considerably, however. Resistance to the use of such an equation is perhaps justifiable from both viewpoints, theory and empiricism. With even half a dozen different species, however, attributing of the a-coefficients can reward the entire regression effort. The data analyst should bear in mind that, in second-level regression, the fundamental knowledge is at best meagre. Only in the exceptional case will a trivial expression for an a-coefficient do aught but injure a developing predictor equation for Y.

Perhaps the most unfortunate aspect of recursive linear regression is the inverse relation between the sequence of regression steps or operations and the relative importance of decisions made during such steps. The most important decisions (e.g., selection of the first $(X_I)_1$) occur during the first recursion stages of Phases II and III. The last decisions, involving the final term of the predictor equation, amount primarily to a determination of what is consequential and what is not. This principle of diminishing returns is telescoped even more dramatically, however, in conventional multivariate regression. The first major decision therein (equation form) is in fact also the last.

AN APPLICATION OF RECURSIVE LINEAR REGRESSION

Introduction.--As an illustration of the general techniques of recursive linear regression, the author has chosen the example mentioned several times earlier, the vapor pressure of a pure substance. The reasons for this choice lay not only in his prior studies in vapor pressure phenomena, but also in the large collection of data available, in the myriad correlations or theoretical equations which have been developed, and in the importance of both the physical and mathematical models to chemical engineering research. The study of vapor pressure relationships, and the search for mathematical formulations thereof, has over many years resulted yet in no definitive equation. The equations of Riedel (30) and Frost and Kalkwarf (12) are probably the most dependable of those in current use. Riedel's equation is extremely complex (quite accurate, however, over the entire range between the triple point and critical point) while the Frost-Kalkwarf form requires an iterative procedure for the calculation of either vapor pressure or temperature. Neither can thus be considered ideal. In this section the method of recursive linear regression is used to develop an equation for the vapor pressure of the homologous normal, saturated, straight-chain hydrocarbons. Representing as it may but a small, however important, part of the general vapor pressure relationship, the derivation is meant to serve paramountly as an illustration of the recursion procedure. That a useful or important equation results is secondary to the illustrative purposes. The final product, a predictor equation, in no sense constitutes a proof of the method, but only a demonstration.
There can probably be no formal "proof," and in fact no index of reliability, except that acquired through repeatedly successful applications to more than one or even hundreds of individual cases. The writer's hope is that the descriptive text preceding this section presents arguments persuasive enough to reveal the power and flexibility of the recursive regression method. The example that follows is meant first of all to clarify the procedural steps involved.

In accordance with the stated illustrative purpose, certain liberties have been taken in the treatment of the predictor equation for vapor pressure. The major simplifications involve the data used in Phase I. What shall hereafter be treated as raw data were for the most part smoothed experimental data taken from Stull's compilation (37). The evaluation of experimental data is a task far from trivial, and this writer feels such an evaluation is beyond the scope of the research in recursive regression. Aside from the manipulation of truly massive collections of vapor pressure data from the literature, no other treatment would eliminate the prime risk of sampling data containing unknown but systematic error. The consequence of using smoothed data is the interposition of another level of abstraction between the set of raw data and the set of values predicted by an equation. A third "smoothed" model thus appears between the physical and mathematical ones. An indication of the reliability of Stull's data, specifically concerning bias, is provided by the frequent references to his compilation in more recent literature (27,30). Stull's prefatory remarks include:

Since the method of interpolation is a graphical one, the personal error is also an unknown factor. In a number of cases the same data were plotted and "read back" at times well separated from each other. The agreement was always within half a degree; therefore the writer feels certain the values presented in these tables are correct to the nearest degree (assuming that the original information is accurate). The writer further feels that in the majority of cases, the agreement will be of the order of a few tenths of a degree. (37)

The original article must be consulted to appreciate the particular graphical approach he used. In any case, error introduced through the use of smoothed data is mitigated somewhat to the extent of a test of the final predictor equation against genuinely raw data.

The second grievance which may be raised against the particular example used is the omission of statistical weights for the dependent parameter. Properly, the parameter $Y_0 = \log_eP$ should be weighted inversely according to its variance. The use of smoothed data lessens the possibility of large inverse weightings, however. Also, the recursion procedure considerably complicates the weighting procedure, especially as the coefficients are attributed. In the present case, the availability of computer check-out and processing time severely limited the arithmetic treatments the author might otherwise have used, and which he deemed expendable to the purposes of the vapor pressure example.

The sections which follow are the logically sequential operations previously discussed in detail. After the Phase I and Phase III preparations of the $X_i$ and $Z_i$ (X- and Z-matrices), the text is subdivided by recursion stage. The descriptions parallel those for the three-phase procedure.
Phase I.--The literature search for vapor pressure equations, which initiated this example, was limited to a study of several books (10,27,29,33) and the most recent (post-1947) volumes of Chemical Abstracts. From these sources was compiled a list of vapor pressure equations which represented distinct analytical forms. These equation forms, with source-references shown for other than classical equations, are tabulated in Table 1. The most general forms of the equations are shown, with unsigned empirical coefficients A, B, C, etc., which in some cases embody chemical properties (critical, melting, and normal boiling temperatures, for example). The symbols $T_R$ and $P_R$ represent the reduced temperature $T/T_C$ and reduced pressure $P/P_C$, respectively.

TABLE 1
PRIMARY FORMULAS FOR VAPOR PRESSURE

(a) Clausius-Clapeyron: $\log_{10}P = A + BT^{-1}$

(b) Antoine: $\log_{10}P = A + \dfrac{B}{T + C}$

(c) Cornelissen-Waterman (8): $\log_{10}P = A + BT^{-C}$, where $C = 1.50$

(d) Erpenbeck-Miller (11): $\log_{10}P = A + BT^{-1} + \log_{10}(1 - CT)$

(e) Ashworth (1): $\log_{10}P = A + \dfrac{B}{(T + C)^{1/2} + D}$

(f) Keyes-Taylor-Smith (17): $\log_{10}P = A + BT^{-1} + CT + DT^2 + ET^3$

(g) Smith (35): $\log_{10}P = (1 + AT)\left[B + C(1 + AT) + D(1 + AT)^2\right]$

(h) Baehr (3): $\log_eP_R = \left[(dP/dT)_0\right]\log_eT_R + B\left[4T_R - T_R^2 - 2\log_eT_R - 3\right]$

(i) Barkhuysen (4): $\log_{10}P = A + D\log_{10}\left[\dfrac{T}{B + CT}\right]$

(j) Mitra (25): $\log_{10}P = A + \dfrac{B}{(\log_{10}T)(1 + DT)}$

(k) Gamson-Watson (13): $\log_{10}P_R = A + BT_R^{-1} - Ce^{-20(T_R + D)^2}$

(l) Thodos (38): $\log_{10}P = A + BT^{-1} + CT^{-2} + D(ET - 1)^F$

(m) Frost-Kalkwarf (12): $\log_{10}P = A + BT^{-1} + C\log_{10}T + DPT^{-2}$

(n) Riedel (30): $\log_{10}P_R = 0.118\,\phi(T_R) - 7\log_{10}T_R + (A - 7)\left[0.0364\,\phi(T_R) - \log_{10}T_R\right]$, in which $\phi(T_R) = 36T_R^{-1} + 42\log_eT_R - 35 - T_R^6$

Conspicuous in the set of primary formulas is the presence of but one independent variable, temperature. Other than scattered equations which contain viscosity, surface tension, latent heat of vaporization, etc., by far the majority of all equations reported in the literature contain only T as an independent variable, save those previously defined as "chemical." This observation is in accord with the phase rule. As the independent variable in the present example, however, reduced temperature was chosen. The early introduction of a chemical parameter tremendously reduces the virtual task of attributing the coefficients, because of the non-dimensionality of $T_R$. A great deal of computational simplicity is further afforded by the condition that $0.2 < T_R < 1.0$ for the data used.

Essentially for the same reasons, the dependent variable was taken as the reduced pressure $P_R$. The continual appearance of P as the argument of a logarithm led to a decision to use the logarithm of the reduced vapor pressure as the sole dependent parameter:

$$Y_0 = \log_eP_R = \log_e(P/P_C).$$

The appearance of pressure as a variable in the last term of the Frost-Kalkwarf equation, (m) in Table 1, elevates an ordinary vapor pressure equation into one of very superior accuracy. The equation is inconvenient to use, however, for calculating either vapor pressure from temperature or vice versa. No other parameter including P is intuitively suggested to accomplish the reversal of curvature in the vapor pressure curve, so P as a variable was arbitrarily eliminated from the right-hand side of the predictor equation. Its inclusion would have presented no difficulties whatsoever in the recursive regression procedure, however. The compromise made was one merely of convenience. No independent parameters containing unknown coefficients were initially included in the X-matrix.
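The inconvenience of the Frost-Kalkwarf form, noted above, can be made concrete with a small fixed-point sketch: since P appears on both sides of (m), vapor pressure must be found iteratively. The coefficients and the convergence behavior here are assumed for illustration only.

```python
import math

def frost_kalkwarf_pressure(T, A, B, C, D, tol=1e-10):
    """Solve equation (m) of Table 1 for P at a given T by direct
    substitution, starting from the equation with the DPT**-2 term
    dropped.  Convergence is assumed, not guaranteed."""
    P = 10.0 ** (A + B / T + C * math.log10(T))
    for _ in range(100):
        P_new = 10.0 ** (A + B / T + C * math.log10(T) + D * P / T**2)
        if abs(P_new - P) <= tol * max(abs(P), 1.0):
            break
        P = P_new
    return P
```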
The basic transformation to reduced co-ordinates eliminated the obvious need for any translation of the origin. The trial-and-error procedures which would have been required in the inclusion of such parameters are demonstrated elsewhere in the regression analysis. Quite a number of the primary formulas are thus ignored, according to Phase IB procedure. In any genuine study of the vapor pressure relationship, the relatively uncomplicated analysis of unknown coefficients would be handled by recursive linear regression. Consider Barkhuysen's equation (4) as converted to a standard form:

$$\log_{10}P = A + D\log_{10}\left[\frac{T}{B + CT}\right] = A + D\left[\log_{10}T - \log_{10}(B + CT)\right] = (A - D\log_{10}B) + D\left[\log_{10}T - \log_{10}\left(1 + \frac{C}{B}T\right)\right],$$

which can be generalized to

$$\log_{10}P = A' + B'\left[\log_{10}T - \log_{10}(1 + C'T)\right].$$

Trial values of C' would be used with the raw data to evaluate by conventional multivariate regression the other coefficients A' and B'. The coefficient C' would then be attributed according to the Phase III procedure of recursive linear regression. The two different regression analyses would be performed for each of the chemical species being studied. The resulting set of C' coefficients would then be utilized in the calculation of the X-parameter $\log_{10}(1 + C'T)$, which appears for each observation on each species as a column of the X-matrix. For the simpler treatment involving the Antoine equation, correlation coefficients and other statistical results for the correlation $\log_{10}P$-versus-$(T + C)^{-1}$ are shown in Appendix IX. The trial-and-error procedures were considered too time-consuming and too numerous, for the set of primary formulas which appear in Table 1, to merit their inclusion in this example.

Under the conditions described in the preceding paragraphs, the initial set of $X_i$-parameters is

$$T_R,\ T_R^{-1},\ \log_eT_R,\ T_R^{1/2},\ e^{-T_R},\ T_R^{2},\ T_R^{-2},\ T_R^{6}.$$

The natural logarithm was used throughout these and all subsequent calculations, since a basic computer subroutine was available for this function. The first set of auxiliary $X_i$-parameters included the hyperbolic functions $\sinh T_R$, $\cosh T_R$, and $\tanh T_R$. These parameters were suggested to the writer as ones capable of introducing the reverse curvature invariably present in $\log P$-versus-$T^{-1}$ plots, and of complementing the exponential term $e^{-T_R}$. The use of hyperbolic functions is apparently unique in this study. As functions closely related to the exponential, they deserve more scrutiny than they have received. They are easily handled on electronic computing equipment, but are probably considered "exotic" by engineers because they are manipulable on a slide rule only with difficulty. Mathematically they are single-valued and well-behaved throughout the real domain. Their derivatives and integrals are analogous to those for the circular trigonometric functions. For this vapor pressure example they are eminently suitable as intuitive parameters.
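Returning for a moment to the Barkhuysen treatment above, the search over trial values of C' reduces to a one-dimensional sweep with an ordinary linear fit at each trial; a sketch with illustrative names:

```python
import numpy as np

def barkhuysen_fit(T, log10P, C_trials):
    """For each trial C', fit log10 P = A' + B'[log10 T - log10(1 + C'T)]
    by least squares in A' and B'; retain the C' giving the smallest
    residual sum of squares.  C' would afterwards be attributed by
    the Phase III procedure."""
    best = None
    for C in C_trials:
        x = np.log10(T) - np.log10(1.0 + C * T)
        B1, A1 = np.polyfit(x, log10P, 1)
        ss = float(np.sum((log10P - A1 - B1 * x) ** 2))
        if best is None or ss < best[0]:
            best = (ss, A1, B1, C)
    return best   # (residual sum of squares, A', B', C')
```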
Independent parameters of the third auxiliary type were deduced from products and quotients of the right-hand sides of the historical or primary formulas in Table 1. In most instances these combinations are represented as terms of second-level complexity (i.e., c=2, see pages 33- 39). Any unstable members (for P>0.05, T-r,i1.0) were K K deleted and some Interesting ones from the third c-level T P R \ added. Two (loggTR, TR ) were added further to fill an incomplete set of computer-oriented data cards; they are parameters selected essentially at random--no conscious deductive process was involved. The complete set of initial Xi are given in Appendix VI. They number 115 (=n), and are identified according to their ultimate positions in a deck of standard IBM eighty-column cards. As primarily an illus 33 tration of recursive linear regression, the initial X^-set may be considered quite extensive. It serves the stated purpose, however, and emphasizes but again that the number n of parameters may be arbitrarily large. The x-values were taken from Stull's tables (37), as temperatures given at pressures of 1, 3, 10, 20, 40, 60, 100, 200, 400 : mm. Hg, 1, 2, 5, 10, 20, 30, 40, . . . , PQ : atm. Pressure-temperature pairs were recorded for the following members of the homologous straight-chaln hydrocarbon series n-C.H : i 21+2 i = 1, 2, . . . , 19, 22, 23, 26, 29. Except for dodecane (i=12), in the region i>3 no critical point was given, nor were temperatures reported above that corresponding to a vapor pressure of one atmosphere. Of these cases, the critical points for i<19 were those of Michael and Thodos (24), smoothed, and for 1219 the pairs PC,TC were taken from an extrapolated seml-logarlthmlc plot (figure 3) which inter-related nc and critical prop erties.- The P^-function in figure 3 changed moderately over the range of data available, and the Tc-function was nearly linear. The possible error in the extrapolation of the Tp-curve must be quite small. An estimated confidence region (dashed line) includes no more than a 2.3'( error in Tp. Extrapolation of the critical pressure relation was Number o f Carbon Atoms I n Chain 84 Temperature (degrees) 0 100 200 300 ^00 500 600 30 20 10 9 8 7 6 5 4 3 2 1 40 20 Pressure (atm.) Pig.3— Chemical Properties of the n-Hydrocarbons 35 considerably less trustworthy. From the best estimated value of Pc=8.7 atm. for nonacosane (n-Cg^HgQ), the confi dence Interval shown might lead to an error of about eight per cent. In all subsequent analysis, therefore, the importance of statistics for substances i>13 was somewhat de-emphasized. Since no data for pressures above one atmosphere were used in these cases, uncertainty in the region near the critical is of no consequence. The maximum error may be propagated, however, throughout the low pressure region. The reduced oarameters P-jT,, were calculated (as K . in all numerical computations hereafter mentioned, on an IBM Model 650 Magnetic Drum Data Processing Machine) to eight significant digits. They were plotted on semlloga- rithmic paper to obtain the consistency graph, Figures 4, 5, and 6. Two obvious errors were immediately detected. The n-C^H^g temperature corresponding to a pressure of P=5 mm. Hg was changed from T=-62.5°C. to T=-58.2°C., and the n-Ci^Hgs temperature for the same pressure was changed from T=98.3°C. to T=90.0°C. These corrections were deduced from the consistency graph. The family of curves shown in Figures 4-6 contains three members whose behavior is inconsistent with that of the other members. 
Specifically, the curves for n-C^Hgg* n-C^gH-^g, and 11-G2 2H4 6 appear either translated or rotated. 86 -2 <D U y w CO < D £ 0745 0750 Tg, Reduced Temperature 0.60 Pig.4— Stull1s Vapor Pressure Data (Reduced), Low-Range Reduced Pressure m 0755 0.60 0.65 0.70 0.75 0.80 Tjj, Reduced Temperature Fig. 5— Stull*s Vapor Pressure Data (Reduced), Mid-Range 1D, Reduced Temperature ft Rig. 6— Stull's Vapor Pressure Data (Reduced), High-Range 89 Por octadecane an abnormality Is apparent in the range P-£<0.002 (i.e., P<10 mm. Hg). The curve for docosane (n-C22^46) crosses over those for the higher members. Most unusual of all is decane, the P^-T^ curve for which K K is nearly coincident with that of heptane. Stull's data in this case must be considered incorrect. His temperature data are compared below with that calculated from an Antoine equation given by Lange (1 9)i P (mm. Hg) Stull (°C.) Lange (°C.) Error (°C.) 1 6.3 20.8 14.5 5 30.5 45.1 14.6 10 42.3 57.5 15*2 20 55.3 71.1 15.3 40 71.2 36.9 15.7 60 80.8 96.5 15.7 100 93.9 109.3 15.4 200 114.0 129.0 15.0 400 136 151.3 15.3 760 159.7 175.0 15.3 Prom the critical curves (figure 3), quite regular within the region 4<i<l6, the estimated values of P( -,=20.3 atm. and Tq=346°C. closely match those given more recently by Michael and Thodos (24), P^=20.8 atm. and Tq=343.7°C. The cause of the error in Stull's tables is inexplicable, since it amounts to a translation of about 15.2°C. on the average. The error does not likely taint other members of the homologous series, if it indeed is an error, since the others as a whole plot uniformly and smoothly. Again, the -exceptions noted for octadecane and docosane also can not be explained. The abnormalities were duly weighted in the « 90 analysis of Phase III. The data on decane, while carried through in the numerical computations, were almost completely ignored throughout the recursive procedure. Since the consistency plot (Figures 4-6) was generally satisfactory, the Y-vector and X-matrix were computed and punched into an appropriately-identified file of approximately 8500 IBM cards. A sample of these data is shown in Appendix VII. The cards (and therefore the printed lines of Appendix VII)contain five X^, the values of T^ and P^, and ten digits of identification. The entire X-matrix is much too large for inclusion as a part of this paper, as almost two-hundred closely-printed pages would be required. Interested persons may write the author at the University for copies of the computed data on the X^-parameters. Phase IIIA.— The compilation of a table of Z- parameters, for the Phase III urocedure of attributing the a-coefficients, was largely intuitive. The chemical properties which might bear on the vapor pressure rela tionship Include the critical temperature and pressure, the number of carbon atoms in the chain, and the triple point pressure and temperature. Only for methane is the triple-point well-lcnown, and for this reason the melting point at atmospheric pressure was used as an approximation to the triple point. The validity of this approximation is probably good for the first several members of the homologous methane series, but for n_C29^60 may be consid" erably in error. The variables TM and P„r should r * i ’ > i therefore be considered as such, and not as characteristics of the triple point. They are, in this resnect, somewhat artificial chemical parameters. 
The parameters to be used in Phase III were taken as the reduced functions (Tv )r=t,,/Tc, (P^^P./P^Pq1 , and n , the number of carbon atoms in the straight chain. The reduced values were used for the same reasons given for the utilization of T„ and Pp as > ' -parameters, i t J - t 1 primarily for the sake of dimensionality. ho other chemical parameters were obviously apparent. The use of formula weight V . h . is not necessary, nor in fact advisable, since it can be expressed as a linear function of the number of carbon atoms in the chain and the atomic weights of carbon and hydrogen, (^s )q and ^f^n-C I I 0 = nc ^ V l a ^ C + ^2nc + 2^ Wa^H nQ 2nc+2 = 2< V h - nct<Vc + A Any parameter which is strictly a linear function of another parameter need not be included as either an or a ZThe correlation coefficients with the dependent parameter are identical. The small number of chemical parameters, coupled with the foreseen number of observations (twenty-three at most, one for each of the chemical species studied), led 92 to the conclusion that a very large Z-tahle could easily be accommodated. The initial set of three parameters was thus augmented by a second set, which in combination with the original three is: n"J, n”^, n“^ , n”1 , n~1 ^ n1^, nc, n3/2 c c ’ c ’ c’ c ’ c c ncr nc’ ^°Senc’ (tm)r5» (tm)r5 2» ^TM^R ’ ^TK^R * <tm>r * ^TM^R’ < V a ’ p°fe(Tm)^; ^PM^R5, ^PM^R ’ ^P:-^R5 ’ ^PK^R ’ ’ ^PM^r' ’ ^ PM^R’ ^?M'R '* ^Pm \r' ^ PI - ! ^ R ’ l°Se(Py^R* The reader's remonstrations are anticipated, that these are parameters introduced virtually at random, as specifi cally challenged in the discussion of Phase I. What are being sought here are sample functions, however, The process of attributing regression coefficients is decidedly a frontier operation, and practically nothing is known of the analytical forms such coefficients may either assume or accommodate. The a ^-values are known from experience to appear somewhat harmonic or periodic. The use of certain low-order polynomial forms might thus be very useful. At the same time, irrational or 93 transcendental functions may be needed to counter the symmetry Introduced by polynomials. The distribution of coefficients may be skewed, for example, or the periodicity may be of changing amplitude and frequency. The use of both types of functions is thus needed. Nevertheless, elaborate parameters have been avoided in Phase IIIA for the same reasons given in the Phase I preparation of the X - table. As examples of the third type of auxiliary parame ter, products or quotients of those Initially specified, many third-level (c=3) terms were introduced into the set of Z's. Specifically, all products of nc-with-(TM)R, nc- with-(Pji J R, and (TM)R-wlth-(?M)R were included. Together with the three original sets [nc, (T^)R, (Pj*)R] t3ie z~ parameters number 396. Naturally, such a Z-matrix considerably exceeds the bounds of deductive logic, but was included as a part of the development of a computer program to generate automatically, and with proper identi fication, such tables. Automatic procedures of this type are frequently useful in that they can, as mentioned previously in the discussion of Phase III, eliminate some of the manual work associated with trlal-and-error processes. A file of some 1800 cards was produced, and these constituted the initial Z-matrlx. A sample of this fifty-page table is in Appendix VIII. 
Copies of the complete set of are available from the author at the 94 University, in the unlikely circumstance that this purely illustrative material may prove otherwise useful. First recursion stage. Phase II,— The recursion method is initiated with a study of the simple correlation coefficients r(Y0=log P_,X. ) for all 115 narameters of o R 1 the initial X-matrlx. The results for the five most strongly correlated parameters for each species are given in Table 2. One would expect very high correlations, since such a simple parameter as T-1 has been known historically to remove all but a very small part of the total variation in log P . Surprisingly, however, 6 It X^q2=T^ appears only three times in Table 2, while the — 1 ” parameter ^io9=tr e ^-s represented nineteen times, g=T^^l°geTR seventeen times, and the following six teen times each: X51 ^=T^2sinh(TR), X1lZf=T^^2, X^21 = -5 /2 T^ 3inh (T^). On the basis of r-magnitude alone, also dominates Table 2. For r>0.9999, Tp1e ^ appears -T ten times, T^2sinh(T^) seven; for r>0.9995» T^1e ^ — 2 —2 appears nineteen times and Tp ' log Tn and Tn slnh(Tr>) ^ 6 it it it seventeen times each. The choice for (X,.) , with regard -1 "tr to accuracy alone, is clearly the parameter ^io9="R e The statistics a0“ai“ pi“^ i “ieimax aPPear ln Table 3 for each of three parameters X^. As in all future tabulations, the error fractions € (and E) are based on the vapor pressure itself, rather than the loga rithm of the reduced vapor pressure. Thus, instead of 95 TABLE 2 REGRESSION STATISTICS FOR FIRST PHASE II RECURSION nc ■ Parameter Codes and Simple Correlation Coefficients 1 502 0.999972- 305 0.999959 41 9 0.999909 514 0.999907- 222 0.999902 2 41 9 0.999960 1 09 0.999946- 514 0.999940- 502 0.999679- 421 0.999534- 3 109 0.999963- 41 9 0.999934 514 0.999910- 421 0.999543- 502 0.999535- 4 109 0.999953- 419 0.999872 514 0.999345- 1 14 0.999662- 421 0.999474- c ; 109 0.999921 - 41 9 0.999327 514 0.999303- 1 1 4 0.999733- 421 0.999486- 6 109 0.999367- 1 1 4 0.999330- 41 9 0.999747 Si 4 0.999717- 221 0.999436- 7 109 0.999913- 419 0.999315 11 4 0.999305- 514 0.999794- 421 0.999554- 8 114 0.999897- 109 0.999354- 41 9 0.999719 514 0.999702- 221 0.999517- 9 1 14 0.999906- 109 0.999319- 41 9 0.999603 514 0.999599- 421 0.999505- 10 1 14 0.999953- 109 0.999694- 221 0.999627- 306 0.999428 419 0.999420 11 109 0.999987- SI 4 0.999950- 419 0.999946 421 0.999913- 114 0.999569- 12 109 0.999877- 114 0.999358- 514 0.999758- 41 9 0.999754 421 0.999743- 13 514 0.999973- 419 0.999969 421 0.999963- 109 0.999954- 305 0.999773 14 109 0.999923- .114 0.999335- 421 0.999825- SI 4 0.999807- 41 9 0.999792 96 TABLE 2— Continued n c Parameter Codes and Simple Correlation Coefficients 15 1 14 0.999957- 221 0.999630- 109 0.999676- 306 0.999628 421 0.999558- 16 1 14 0.999385- 221 0.999733- 306 0.999713 521 0.999613- 109 0.999438- 17 1 14 0.999940- 109 0.999325- 421 0.999710- 514 0.999677- 41 9 0.999653 18 514 0.999985- 419 0.999982 109 0.999963- 305 0.999855 421 0.999324- 1 9 109 0.999964- 51 4 0.999911 - 419 0.999839 421 0.999872- 114 0.999831 - 22 502 0.999926- 222 0.999393 1 20 0.999834- 41 9 0.999683 305 0.999661 23 421 0.999901 - 1 14 0.999632- 109 0.999625- 514 0.999503- 221 0.999436- 26 421 0.999885- 221 0.999697- 114 0.999653- 306 0.999634 521 0.999575- 29 221 0.999490- 521 0.999433- 321 0.999442- 103 0.999400- 306 0.999399 (see Appendix VI for translation of parameter codes) 97 TABLE 3 BASIC STATISTICS FOR THREE PARAMETERS OF THE FIRST RECURSION nc r a0 ai l€l l^max X109 = T^1 e-T* 1 
The statistics r, a_0, a_1, |ε|, and |ε|_max appear in Table 3 for each of three parameters X_i. As in all future tabulations, the error fractions ε (and E) are based on the vapor pressure itself, rather than the logarithm of the reduced vapor pressure. Thus, instead of error fractions expressed as

    [log_e P_R - (log_e P_R)_calc] / log_e P_R,

the values given are

    [P - (P)_calc] / P.

The accuracy of a predictor equation for vapor pressure must obviously be measured in terms of the latter function. Too, log_e P_R = 0 at the critical point, and the error fraction is therefore undefined in the logarithmic form.

TABLE 3
BASIC STATISTICS FOR THREE PARAMETERS OF THE FIRST RECURSION

X_109 = T_R^(-1) e^(-T_R)
n_c      r           a_0        a_1          |ε|     |ε|_max
 1   0.999791-    2.26913    6.33754-     .0385    .0813
 2   0.999946-    2.50194    7.00991-     .0286    .0740
 3   0.999963-    2.71293    7.50153-     .0233    .0595
 4   0.999953-    2.88281    7.84537-     .0227    .0932
 5   0.999921-    3.05111    8.30892-     .0325    .0935
 6   0.999867-    3.18967    8.69910-     .0406    .1303
 7   0.999913-    3.31762    9.07361-     .0338    .0919
 8   0.999854-    3.48385    9.47254-     .0411    .1366
 9   0.999819-    3.69952    9.82747-     .0384    .0911
10   0.999694-    3.51979    9.25190-     .0517    .1452
11   0.999987-    3.76997   10.25340-     .0104    .0238
12   0.999877-    3.90452   10.59990-     .0345    .0839
13   0.999954-    3.99363   10.92496-     .0138    .0553
14   0.999923-    4.25063   11.43193-     .0252    .0456
15   0.999676-    4.45411   11.84551-     .0543    .1227
16   0.999488-    4.61678   12.23677-     .0700    .1497
17   0.999825-    4.58363   12.32327-     .0357    .1139
18   0.999963-    4.42660   12.13443-     .0168    .0436
19   0.999964-    4.71839   12.78227-     .0174    .0339
22   0.999447-    4.61473   13.03016-     .0602    .1790
23   0.999625-    5.23978   13.37021-     .0505    .1471
26   0.999430-    5.79235   15.24033-     .0620    .2041
29   0.998222-    6.49956   16.77095-     .1108    .3903

X_419 = T_R^(-2/3) log_e T_R
n_c      r           a_0        a_1          |ε|     |ε|_max
 1   0.999909     0.039267-   4.88267      .0249    .0592
 2   0.999961     0.033507-   5.44165      .0228    .0793
 3   0.999934     0.007362-   5.81037      .0234    .1167
 4   0.999872     0.040247    6.07441      .0401    .1485
 5   0.999827     0.035139    6.42228      .0485    .1500
 6   0.999747     0.035579    6.72214      .0559    .1837
 7   0.999815     0.023565    7.00244      .0431    .1445
 8   0.999719     0.045866    7.30077      .0592    .1913
 9   0.999603     0.137456    7.58264      .0577    .1473
10   0.999420     0.169105    7.14607      .0724    .1926
11   0.999946     0.052284    7.90403      .0212    .0537
12   0.999754     0.049063    8.15090      .0516    .1051
13   0.999969     0.028325    8.40902      .0167    .0327
14   0.999791     0.095339    8.78990      .0436    .1001
15   0.999423     0.145991    9.09610      .0716    .1671
16   0.999197     0.163598    9.33834      .0867    .1949
17   0.999658     0.099354    9.45369      .0510    .1563
18   0.999932     0.013392    9.31323      .0110    .0334
19   0.999839     0.065297    9.79735      .0239    .0675
22   0.999633     0.148213-  10.01913      .0444    .1378
23   0.999423     0.134321   10.61161      .0623    .2030
26   0.999151     0.230820   11.63363      .0783    .2596
29   0.997731     0.372113   12.77400      .1231    .4503

X_514 = T_R^(-2) sinh(T_R)
n_c      r           a_0        a_1          |ε|     |ε|_max
 1   0.999907-    6.89431    5.90423-     .0251    .0533
 2   0.999940-    7.71163    6.59139-     .0272    .0975
 3   0.999910-    8.25339    7.03609-     .0329    .1329
 4   0.999845-    8.67717    7.35253-     .0435    .1635
 5   0.999803-    9.16327    7.77124-     .0514    .1628
 6   0.999717-    9.53477    8.13098-     .0533    .1953
 7   0.999794-    9.96876    8.46355-     .0502    .1539
 8   0.999702-   10.41277    8.82796-     .0602    .1992
 9   0.999599-   10.91510    9.17335-     .0572    .1434
10   0.999403-   10.32993    8.64779-     .0729    .1939
11   0.999950-   11.23117    9.55973-     .0198    .0477
12   0.999758-   11.61552    9.85128-     .0514    .1023
13   0.999973-   11.96397   10.16744-     .0135    .0299
14   0.999807-   12.57421   10.62645-     .0407    .0893
15   0.999456-   13.05305   10.99533-     .0699    .1660
16   0.999235-   13.43919   11.34346-     .0846    .1928
17   0.999677-   13.51590   11.42651-     .0492    .1540
18   0.999985-   13.22971   11.25631-     .0092    .0366
19   0.999911-   13.96876   11.84188-     .0265    .0536
22   0.999616-   14.06466   12.10676-     .0483    .1508
23   0.999503-   15.24422   12.82665-     .0573    .1357
26   0.999264-   16.74239   14.06422-     .0724    .2394
29   0.997943-   18.51031   15.44889-     .1214    .4265

Three parameters, T_R^(-1) e^(-T_R), T_R^(-2) sinh(T_R), and T_R^(-2/3) log_e T_R, were selected as those most logically introducible as the first predictor-equation term.
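For each species, the basic statistics of Table 3 amount to a one-term least squares fit together with error fractions taken on the pressure itself; a hedged sketch of that computation (names hypothetical, arrays of reduced temperatures and pressures assumed):

```python
import numpy as np

def one_term_fit(tr, pr, x_of_tr):
    """Fit log_e P_R = a0 + a1 * X(T_R) for one species and report the
    error fractions on the pressure itself, (P - P_calc)/P.

    tr, pr  : reduced temperatures and reduced pressures (1-D arrays)
    x_of_tr : candidate recursion parameter, e.g. lambda t: np.exp(-t)/t
    """
    y = np.log(pr)
    x = x_of_tr(tr)
    a1, a0 = np.polyfit(x, y, 1)           # least squares straight line
    eps = (pr - np.exp(a0 + a1 * x)) / pr  # error fraction on P, not log P
    return a0, a1, np.mean(np.abs(eps)), np.max(np.abs(eps))
```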
The regression coefficients a_1 were then plotted against n_c, the number of carbon atoms in the chain, and the resulting curves are shown in Figure 7. The shapes of the three traces are so nearly identical that the Phase III attributing procedures would also be virtually indistinguishable. Each segmented curve is characterized by approximately the same eccentricities of form, the same inflections at n_c = 7, 11, 17. No one of the three parameters thus shown is to be preferred, and the one which is perhaps simplest, T_R^(-1) e^(-T_R), has little else to recommend it.

[Fig. 7 -- Regression Coefficients, First Phase II]

A study of other common chemical parameters is useful at this time. Several of the simpler Z's are shown in Figure 8 as functions of n_c. The most striking characteristic of all these parameters, as evidenced by the curves, is the duality of the functional values below about n_c = 16. In almost all cases the curves branch, one following smoothly the points (n_c)_odd and the other, in a similarly smooth and uniform manner, the complementary points (n_c)_even. Although the Z-parameters are single-valued, and the Z-versus-n_c curves smooth, this usage of "single-valued" and "smooth" is certainly different from that normally employed. What are thus characterized as chemical properties obviously, in the present case, are not continuous in the sense that the curves of Figure 7 are (apparently) continuous. The use of a different abscissa in Figure 7 would "improve" the three curves shown therein only if a quite different type of Z-parameter were employed. This particular circumstance is discussed in the next section as a Phase III recursion procedure.

[Fig. 8 -- Typical Chemical Properties Z_j]

The parameter (X_I)_1 = X_109 = T_R^(-1) e^(-T_R) was chosen as the first independent recursion parameter. The decision was based on the r- and a-criteria mentioned above, and on the ε-criterion graphically displayed in Figure 9. The average (absolute) recursion-error fractions were plotted against n_c for each of the twenty-three species. For almost all of these the errors for T_R^(-1) e^(-T_R) are less than those for either of the other two parameters tested.

[Fig. 9 -- Recursion-Error Fractions, First Phase II]

The features of Figure 7 are accentuated in Figure 9, however. The strange behavior of all these curves for n_c = 11, 12, 13 cannot be explained by this writer, nor can the undeniable peak at about n_c = 16. These two characteristics may have arisen solely from the use of Stull's smoothed data, in which case one may conclude that the interposed or smoothed model is not generally trustworthy for n_c > 9. If Stull's data are indeed an accurate representation of the raw data he extracted from literature sources, the entire vapor pressure study should be suspended until an account is made of the results shown in Figures 7 and 9. This writer feels that very serious consideration should be given such a study. The two unanticipated ambiguities (near n_c = 12 and n_c = 16) are too far removed from random fluctuation to be seriously considered as merely experimental error. One must grant that, in recursive linear regression, abnormalities of this type are magnified in successive recursion stages.
Nevertheless, in a first recursion stage one would not expect the residuals to be so grossly incompatible with the apparently uniform physical model. Whether the anomalies are a consequence of hydrocarbon chain resonance, molecular geometry (chain, helix, rod, etc.), or some type of clustering, a continuing or further analysis by any type of regression could now be described only as synthetic. With full realization that grave doubts have now been raised concerning either the uniformity of the physical model or the interposed model itself, we will terminate the first Phase II recursion stage. The (X_I)_1 which has been selected serves reasonably well at what one might call the macro-level, but does not satisfactorily describe local perturbations. The predictor equation at the end of the first Phase II recursion is thus

    Y_0,j = a_01,j + a_1j (X_I)_1j = a_01,j + a_1j [T_R^(-1) e^(-T_R)]_j,   (Va)

in which the numerical values of the two coefficients are as yet undetermined.

First recursion stage. Phase III.-- The coefficients a_1j (which may also be identified as coefficients a_1, as opposed to a_01) of the first recursion equation (Va) form a set of twenty-two values, one for each chemical species (the case n_c = 10 was omitted in this study). The largest correlation coefficients for the original Z_j-table (described on pages 90-94) are tabulated below:

    r(a_1, Z_309) = 0.99301+        Z_309 = n_c log_e (T_M/T_C)
    r(a_1, Z_508) = 0.99286-
    r(a_1, Z_108) = 0.99254-
    r(a_1, Z_201) = 0.99199-        Z_201 = n_c
    r(a_1, Z_337) = 0.99198+        Z_337 = n_c^(-3/2) (T_C/T_M)^(1/2)

As a first approximation to a_1, none of the above terms is a particularly good choice from the standpoint of accuracy, but the simplest one (n_c) is in every other sense the most appropriate. The reader will recall that almost four hundred different Z-parameters constituted the initial Z-matrix. That n_c should correlate more strongly with a_1 than all but three others is rather surprising. One would probably conclude that an accurate expression for a_1 cannot be derived from elementary combinations of the simplest chemical properties z_h.

Before continuing with the Phase III recursion stage using Z_201 = n_c as the recursion parameter (Z_JI)_11, let us consider other parameters Z_i which might better serve the basic predictor equation. We seek a Z which obeys, approximately, the equation a_1 = A + B Z. Thus, considering now the relationship between Z and n_c (from Figure 7), and noting that da_1/dZ = (da_1/dn_c)(dn_c/dZ), we may conclude that:

(a) Z must vary uniformly with n_c over the ranges n_c < 10 and 10 < n_c <= 29 (omitting, or "de-emphasizing," the points n_c = 10, 18, 22), since da_1/dZ = B and da_1/dn_c (Figure 7) is so characterized;

(b) either a discontinuity or an abrupt change in dZ/dn_c occurs in the neighborhood of n_c = 10, since the product of da_1/dn_c and dn_c/dZ must be constant while da_1/dn_c (Figure 7) suffers such a change;

(c) Z must be nearly constant in the neighborhood of n_c = 17, since the product of da_1/dn_c (= 0 from Figure 7) and dn_c/dZ must be approximately constant in the vicinity of n_c = 17;

(d) inflection points appear in the Z-n_c curve near n_c = 18, and probably near n_c = 14, since by the chain rule

    d^2(a_1)/dn_c^2 = (da_1/dZ)(d^2(Z)/dn_c^2) + (d^2(a_1)/dZ^2)(dZ/dn_c)^2,

and, with da_1/dZ = B and d^2(a_1)/dZ^2 = 0,

    d^2(a_1)/dn_c^2 = B [d^2(Z)/dn_c^2],

and these inflections do occur (d^2(a_1)/dn_c^2 = 0) in Figure 7.

One has difficulty conceiving a parameter Z which satisfies even a few of these criteria.
As shown in Figure 8, most of the unusual features of the Z-n_c curves initially chosen are quite well-behaved above n_c = 16. A chemical property, or combination of properties, which satisfies the specifications given above must be of rather remarkable nature. It must be uniform (with n_c) for n_c < 10, as most of the Figure 8 curves obviously are not. It must also change markedly in those regions where the Z-n_c curves are smooth and, so to speak, continuous. The search for such a Z-parameter is beyond the scope of this illustrative example, but either the physico-mathematical model (i.e., an assumed regularity in functions of n_c = 1, 2, 3, ..., or even-odd arrangements thereof) or the raw data (P_M, T_M) must be considered erroneous. A proper study of the vapor pressure relationship should certainly include some attempt to verify, in the light of Figure 9, either the model or the data.

On the assumption that n_c is the best parameter (Z_I)_1 for the first Phase III recursion, we may write the predictor equation for a_1 as the recursion formula

    a_1j = a_1j,0 = b_01,1 + b_11 (Z_JI)_11 = b_01,1 + b_11 (n_c),

for which the solved regression equation is

    a_1j,0 = -6.53396 - 0.33509 (n_c).   (Vb)

The species subscript j is, in this first approximation to a_1, equivalent to a subscript n_c. The two will henceforth be used interchangeably. In order to approximate a_1 more accurately, by a second Phase III recursion stage, the residuals

    a_1j,1 = a_1j,0 - b_11 (Z_JI)_11   (Vc)

were calculated and a correlation analysis performed on the parameter pairs a_1j,1, Z_j. The most strongly correlated parameter was in this case Z_311 = n_c (P_M/P_C)^(-3/2), for which the correlation coefficient was r(a_1j,1, Z_311) = 0.549. The resulting predictor equation for a_1 then becomes the second approximation

    a_1j,1 = b_01,2 + b_12 (Z_JI)_12 = b_01,2 + b_12 [n_c (P_M/P_C)^(-3/2)].

Solution of the corresponding least squares normal equations produces

    a_1j,1 = -5.54517 - 0.0011964 [n_c (P_M/P_C)^(-3/2)],

or, in expanded form,

    a_1j = -5.54517 - 0.33509 (n_c) - 0.0011964 [n_c (P_M/P_C)^(-3/2)].   (Vd)

The two approximations to a_1j may be compared in Figure 10. The simple correlation coefficient for the second stage (i = 311) is significantly non-zero at the 99.9+% level of confidence. The curves representing the two coefficients a_1j,0 and a_1j,1 do not differ significantly in form, however. The amplitudes of the peaks and valleys in the second recursion have been reduced by a factor of about ten, but the relative ranges of these extremes have not been measurably changed. Virtually the only conclusion which may be drawn from this analysis is that the coefficient a_1j must be studied by non-linear methods, viz., trial-and-error procedures.

[Fig. 10 -- Phase III Recursions on a_1]

The first trial parameter tested was one involving the hyperbolic sine function. The slight but detectable reverse curvature in a_1 for T_R^(-1) e^(-T_R) (see Figure 7) may be accounted for by an equation of the form

    a_1 = A + B sinh[C(n_c + D)],

in which the origin has been translated to (D, A) and the axes scaled by the factors C and B^(-1). Of several manually derived equations with which the writer experimented,

    a_1 = 11.5 + 2.582 sinh[0.1018(n_c - 15)]   (Ve)

followed the given curve best, as shown in Figure 11, with due allowance for the dubious points n_c = 10, 18, 22.
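The two linear Phase III stages (Vb)-(Vd) -- fit the best-correlated Z, carry the residuals forward, fit again, and let the last intercept subsume the earlier constants -- can be sketched as follows (illustrative only; the names are hypothetical, not the author's program):

```python
import numpy as np

def phase_iii_recursion(a1, z_candidates, stages=2):
    """Recursive residual fitting in the style of (Vb)-(Vd).

    a1           : array of a_1j coefficients, one per species
    z_candidates : dict {label: array} of chemical Z-parameters
    Returns the combined intercept and the list of (label, slope) terms.
    """
    resid = np.asarray(a1, dtype=float).copy()
    terms, intercept = [], 0.0
    for _ in range(stages):
        # pick the Z most strongly correlated with the current residuals
        label = max(z_candidates, key=lambda k:
                    abs(np.corrcoef(resid, z_candidates[k])[0, 1]))
        z = z_candidates[label]
        b1, b0 = np.polyfit(z, resid, 1)   # least squares line
        terms.append((label, b1))
        intercept = b0                     # latest intercept subsumes earlier ones
        resid = resid - b1 * z             # only the slope term is removed
    return intercept, terms
```

With the two stages of the text, the returned intercept corresponds to the -5.54517 of (Vd) and the two slopes to -0.33509 and -0.0011964.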
The constant a_01 was manually fitted in a similar manner to the equation

    a_01 = 4.12 + 0.5134 sinh[0.1552(n_c - 14)].   (Vf)

In both (Ve) and (Vf) the coefficients C and D were initially estimated from Figure 11. Residual sums of squares were then calculated for regression equations in which C and D assumed values within a relatively narrow range on either side of these estimates. The final selection of A, B, C, D was executed graphically. The coefficient a_01 need not be attributed, since the regression constants a_0k for all recursion stages through the k-th may be combined and attributed at the k-th stage. The attributing of a_01 at this point is in part an anticipation of difficulties in the second recursion stage, and will be justified subsequently.

[Fig. 11 -- Final Recursion Coefficients, First Phase II]

The first recursion equation is now completely defined:

    Y_0 = a_01 + a_1 [T_R^(-1) e^(-T_R)],   (Vg)

with

    a_1 = 11.5 + 2.582 sinh[0.1018(n_c - 15)],   (Ve)
    a_01 = 4.12 + 0.5134 sinh[0.1552(n_c - 14)].   (Vf)

The total residuals, tentatively the dependent parameter for the second Phase II recursion stage, were calculated as

    Y_1 = Y_0 - a_1 [T_R^(-1) e^(-T_R)]   (Vh)

and plotted as shown in Figure 12.

[Fig. 12 -- Total Residuals, First Phase II]

Throughout the analysis of the procedure discussed in the preceding paragraphs, one feature is quite prominent. The first nine members of the homologous series behave always in a regular, almost anticipated, manner. A similar but less pronounced tendency prevails for species n_c = 14-17, 19, 26, 29. For cases n_c = 10 (not plotted) and n_c = 11-13, 18, 22, 23, however, the trends toward uniform variation are almost totally absent. The general shape (essentially a third-order polynomial in the reduced temperature) of a typical curve of this latter group is either reversed in vertical orientation, or else the expected maxima and/or minima were obliterated. Just as the coefficients a_1j vary with n_c in a highly irregular manner, so do the residuals Y_1 vary irregularly among the twenty-two species. Essentially the same grouping appears in Figure 12 as is evidenced in Figure 7. For n_c > 9 Stull's data unfortunately did not cover the region between one atmosphere and the critical point pressure P_C (except for n_c = 12); hence the curves shown in Figure 12 are but approximately defined by the dashed, extrapolated portions. The constant a_01, given by equation (Vf), is also affected by this lack of data, since

    a_01 = Y̅_0 - a_1 (X̅_I)_1   (Vi)

by equations (IIIe). The distribution of points Y, X_I may thus not be representative enough of the complete range to establish the means Y̅ and X̅_I for that range. This question, regarding the uniformity or common basis of the means for all twenty-two species, will arise again in the next section.

Thus apparently is the termination point reached for this first manual procedure in Phase II recursion stage number one. In spite of the defects noted for the new dependent parameter defined by equations (Ve)-(Vh), the parameter Y_1 of equation (Vh) was selected as a starting point for the second Phase II recursion.
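The manual C, D scan described above is, in effect, a coarse grid search in which A and B follow linearly for each trial (C, D); a sketch under that reading (hypothetical names, not the author's graphical procedure):

```python
import numpy as np
from itertools import product

def fit_sinh(nc, a1, c_grid, d_grid):
    """Scan C and D over narrow grids; for each pair, A and B follow by
    linear least squares, since a1 = A + B*sinh(C*(nc - D)) is linear in
    A and B once C and D are fixed.  Returns the (A, B, C, D) with the
    smallest residual sum of squares."""
    best = None
    for c, d in product(c_grid, d_grid):
        s = np.sinh(c * (nc - d))
        b, a = np.polyfit(s, a1, 1)
        rss = np.sum((a1 - (a + b * s)) ** 2)
        if best is None or rss < best[0]:
            best = (rss, a, b, c, d)
    return best[1:]

# e.g., around the estimates read from Figure 11:
# A, B, C, D = fit_sinh(nc, a1, np.linspace(0.08, 0.13, 26), np.arange(12, 18))
```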
A bona fide investigation of the vapor pressure phenomenon would necessarily demand a considerably more thorough study of a_1j, especially in light of the unexpected and provocative features of the curves shown in Figures 7, 9, and 12. Such a research project is, however, beyond the scope of the example now being illustrated.

Second recursion stage. Phase II.-- The total residuals shown in Figure 12 were translated by the attributed coefficient a_01 of equation (Vf). This translation constant, the best attributed mean value of the quantities Y_0 - a_1(X_I)_1, is different for each of the twenty-two species and is indicated by the dashed horizontal line through each curve of Figure 12. The dependent parameter of the second recursion is thus

    Y_1 = Y_0 - [a_01 + a_1 (X_I)_1],   (VIa)

rather than the expression (Vh). In some analyses of physical data the first independent recursion parameter will be found to remove all but a very small part of the total variation originally present in linear relationships. In such cases, the regression constants a_0k might best be removed from successive dependent parameters during each recursion stage. Any functional characteristics of the residuals will then be magnified, rather than partially obscured by the constant of translation. In the present case, the elimination of a_01 in the first recursion stage enhanced computational accuracy in the eight-significant-digit arithmetic of the computer program. The values of Y_1 are not plotted, since all useful information pertaining thereto appears in Figure 12.

Even a casual surveillance of Figure 12 would indicate that a third-order polynomial in T_R, its roots being some functions of n_c, might drastically reduce the total residuals. From this assumption, the zeros of the drawn curves (for the translated origin, dashed) were studied. The three very approximate roots for each of the curves are plotted as functions of n_c in Figure 13. Intersections of the Figure 12 curves with their translated origins were determined by inspection, ranges being used when the roots were in doubt. This procedure of root-finding is, of course, one of the justifications for the elimination of a_01 in the first recursion stage. If an equation cubic in T_R will in fact remove a major part of the remaining "functional" variation, the Phase III attributing of the roots of

    Y_1 = A (T_R - B)(T_R - C)(T_R - D)   (VIb)

will be necessary.

The results depicted in Figure 13 establish beyond reasonable doubt that the first two zeros of Y_1 vary with n_c in a systematic manner. The situation is not so clear in the case of the third root.

[Fig. 13 -- Estimated Roots, n_c-Plot]

In order to determine whether a different abscissa would more completely define the systematic variation or trend which is barely discernible for this third root, a correlation analysis (Phase IIA) was performed. The probability of finding a better independent parameter did not seem large, in view of the scattered and regionalized data on the estimated roots. However, Figure 8 showed that rather remarkable changes in the Z_j do indeed occur for small n_c. The results of the correlation analysis, root(Y_1)-versus-Z, are tabulated below:
    r(root_1, Z_411) = 0.97735-     Z_411 = n_c log_e (P_M/P_C)
    r(root_1, Z_580) = 0.97712+     Z_580 = (P_M/P_C)^(1/3)
    r(root_1, Z_102) = 0.97669+     Z_102 = n_c^(1/2)
    r(root_1, Z_111) = 0.97637+     Z_111 = n_c (P_C/P_M)
    r(root_1, Z_436) = 0.97626+

    r(root_2, Z_505) = 0.99323-     Z_505 = P_C/P_M
    r(root_2, Z_406) = 0.99189-
    r(root_2, Z_451) = 0.98625+
    r(root_2, Z_107) = 0.98606-
    r(root_2, Z_244) = 0.98259+     Z_244 = (T_M/T_C) log_e n_c

    r(root_3, Z_520) = 0.56270+
    r(root_3, Z_345) = 0.52939+
    r(root_3, Z_311) = 0.49985+     Z_311 = n_c (P_M/P_C)^(-3/2)
    r(root_3, Z_123) = 0.49680-     Z_123 = n_c^2 (T_C/T_M)^3
    r(root_3, Z_338) = 0.45527+     Z_338 = n_c^(5/2) (P_C/P_M)^3

The first two roots obviously can be handled with any of several Z-parameters, since the difference between any two of the five largest correlation coefficients is so small as to suppress almost completely any consideration except simplicity of the Z-parameter. Just as obvious is the conclusion that the third root is going to be much more difficult to attribute; the correlation coefficients for root_3(Y_1), Z are of quite a different order of magnitude.

The estimated roots are replotted in Figure 14 as functions of the most strongly correlated Z-parameters. A comparison of the apparent (visual) fits of the two sets of curves in Figures 13 and 14 indicates that the Z-parameters discovered in the correlation analysis have not afforded any significant improvement in the attributing of the roots. Considering also that the roots were deduced graphically in the first place, with substantial latitude in several of the selections, one may conclude that the parameter n_c is still the preferred one for Phase III. As trial parameters, the n_c-functions to be used should be considered complex enough for the purposes of this example. The search, after all, has been only for an appropriate independent parameter. Suitable functions of n_c, to be utilized in the prediction of numerical values for the three roots, are yet to be determined.

[Fig. 14 -- Estimated Roots, Z-Plot]

Relatively simple manual (non-least-squares) methods were used to determine reasonable functions for the three roots of the Y_1-curves. In this particular Phase III recursion only the feasibility of a third-order polynomial form was under consideration. Should the results of the test prove to be encouraging, a more formal study of the zeros of the polynomial would be undertaken. At this point, however, the following equations, deduced from rather arbitrary trial-and-error procedures, were assumed to be satisfactory predictor equations for the roots:

    (T_R)_root1 = 0.340 + 0.397 tanh(0.053 n_c)   (VIc)
or
    (T_R)_root1 = 0.733 - (0.375 + 0.0427 n_c) e^(-0.139 n_c),   (VId)

and

    (T_R)_root2 = 0.476 + 0.356 tanh(0.1026 n_c),   (VIe)

    (T_R)_root3 = 0.880 + 0.0473 n_c e^(-0.1142 n_c).   (VIf)

For the first root two different predictor equations were derived. All four curves are shown in Figure 13. By setting

    B = (T_R)_root1 of (VIc),    B' = (T_R)_root1 of (VId),
    C = (T_R)_root2 of (VIe),    D = (T_R)_root3 of (VIf),

the residual curves of Figure 12 may be approximated by either of two equations (for species j):

    Y_1,j = A_j [(T_R)_j - B_j][(T_R)_j - C_j][(T_R)_j - D_j],   (VIg)
    Y_1,j = A_j [(T_R)_j - B'_j][(T_R)_j - C_j][(T_R)_j - D_j].   (VIh)

The product of the three bracketed terms in each of these equations was calculated by means of equations (VIa) and (VIc)-(VIf), and defined as X-parameter X_228 (for VIg) or X_330 (for VIh).
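For concreteness, the construction of the two trial parameters from the attributed roots might be sketched as follows; the function names are hypothetical, but the numerical constants are those of (VIc)-(VIf):

```python
import numpy as np

def root1_c(nc):  # (VIc)
    return 0.340 + 0.397 * np.tanh(0.053 * nc)

def root1_d(nc):  # (VId)
    return 0.733 - (0.375 + 0.0427 * nc) * np.exp(-0.139 * nc)

def root2(nc):    # (VIe)
    return 0.476 + 0.356 * np.tanh(0.1026 * nc)

def root3(nc):    # (VIf)
    return 0.880 + 0.0473 * nc * np.exp(-0.1142 * nc)

def cubic_parameter(tr, nc, first_root):
    """The product of the three bracketed factors in (VIg)/(VIh); the
    regression coefficient of this trial X plays the role of A_j."""
    return (tr - first_root(nc)) * (tr - root2(nc)) * (tr - root3(nc))

# X_228 uses (VIc) for the first root, X_330 uses (VId):
# x228 = cubic_parameter(tr, nc, root1_c)
# x330 = cubic_parameter(tr, nc, root1_d)
```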
These trial parameters, X_228 and X_330, were added to the original X-matrix, and a correlation analysis (Phase IIA) was performed for Y_1 on the augmented set of X-parameters. The results are given in Table 4. As indicated by the summary therein, the trial parameters which were used to augment the X-matrix are not markedly superior to the parameter X_420 = T_R^(-2/3) tanh(T_R) in reducing residual variation in Y_1. Apparently a third-order polynomial of the type selected does not, for such relatively simple n_c-functions, trace more than the most fundamental characteristics of the residual curves in Figure 12. Nevertheless, parameter X_228 holds a definite edge, albeit slight, over X_420 as a correlating parameter for this second recursion stage.

TABLE 4
REGRESSION STATISTICS FOR SECOND PHASE II RECURSION

n_c            Parameter Codes and Simple Correlation Coefficients
 1   123 0.94515    508 0.92261-   420 0.86868-   423 0.84557    320 0.74576
 2   228 0.81601    320 0.81248    111 0.76452    406 0.73343    118 0.69948    311 0.67818
 3   228 0.81947    330 0.79858    111 0.29912    406 0.26553    515 0.25107    311 0.21258
 4   410 0.89666-   113 0.89338-   209 0.89292-   316 0.89159    117 0.89123    330 0.88596
 5   228 0.88129    208 0.75517    316 0.74443    206 0.74436-   322 0.73823-
 6   228 0.97160    330 0.96151    208 0.71693    322 0.70008-   206 0.68240-   503 0.68139-
 7   228 0.96694    330 0.99074    208 0.71334    322 0.69579-   206 0.69067-   316 0.68778
 8   228 0.97796    330 0.92712    518 0.64525    519 0.63936-   118 0.55218-   311 0.48286-
 9   330 0.94768    228 0.88747    518 0.86032    519 0.65672-   508 0.73251    118 0.69073-
10   501 0.99573    402 0.99560-   102 0.99548-   117 0.99542-   115 0.99539-
11   306 0.95278-   521 0.95275    221 0.95270    114 0.95268    108 0.95267
12   118 0.75398-   518 0.73943    311 0.73806-   406 0.73394-   519 0.73188-
13   330 0.84690-   228 0.82159-   423 0.69123    123 0.67559    508 0.63424-   420 0.62073-
14   423 0.94656-   228 0.93512    420 0.91706    330 0.91132    320 0.90135-   513 0.89754
15   423 0.93855    420 0.91970
16   228 0.98491    330 0.93052    423 0.94017-   420 0.91566    320 0.87642-   513 0.86317
17   228 0.97068    330 0.97038    423 0.94220-   513 0.86650    420 0.86575    320 0.86550-
18   320 0.92661    420 0.92337-   423 0.91791    323 0.90304    407 0.90494
19   407 0.90198-   513 0.90144    320 0.90063-   403 0.90043-
22   420 0.97813-   123 0.90979    320 0.86068    423 0.85520    228 0.81424-   323 0.81039
23   123 0.69061-   508 0.64620    420 0.55316    519 0.46888-   518 0.46089
26   420 0.85368    123 0.83159-   320 0.60350-   423 0.54643-   508 0.54070
29   420 0.87860    123 0.87739-   508 0.60902    320 0.50022-   423 0.43396-

Summary.-- The parameters appearing most frequently among the six most strongly correlated in Table 4 are X_228 and X_330 [the cubic trial parameters of equations (VIg) and (VIh)], X_420 = T_R^(-2/3) tanh(T_R), and X_320, X_423, and X_123 (see Appendix VI for translation of parameter codes).

The three roots which were estimated from Figure 13 depended to a great extent on the origin-translation constant a_01. An increase or decrease in a_01 would have the effect of raising or lowering the equivalent zero axis which determined the three roots. For higher members of the methane hydrocarbon series the assignment of a_01 on the basis of a relatively small portion of the T_R-range may thus have led to an inaccurately positioned origin. The possibility exists, therefore, that the mathematical model for n_c < 9 may be quite different from that for, say, n_c = 23. Figure 13 does not sustain this conjecture, however, as the roots seem to vary uniformly as functions of n_c.
Except for the third root, Figure 13 gives little evidence that the zeros of the cubic form are not, essentially, points along single curves. We shall therefore accept the proposition that the accuracy of the three attributed roots decreases gradually, never abruptly, with increasing n_c. The possibility that a real change in the model may in fact exist is nevertheless not overlooked. The change, perhaps a discontinuity in the curves of Figure 13, may be masked by the concentration of P_R, T_R data in the region well below the critical. Only a study of more extensive data would clarify the shapes of the curves nearer the critical point P_C, T_C, and permit at the same time a better estimate of the third root.

Of all the parameters analyzed in the second Phase II recursion, either X_228 or X_420 seems to be most suitable as the second independent recursion parameter (X_I)_2. The basic statistics for these two X's are tabulated in Table 5. For X_228 the average absolute recursion-error fractions |ε| have in most cases been reduced substantially from those of the first recursion stage (see Table 3).

TABLE 5
BASIC STATISTICS FOR TWO PARAMETERS OF THE SECOND RECURSION

X_228: see equation (VIg)
n_c      r           a_0        a_1          |ε|     |ε|_max
 1   0.440507     2.35391    2.15937      .0416    .0994
 2   0.816010     2.49184    3.79085      .0154    .0390
 3   0.819455     2.71710    3.55347      .0100    .0573
 4   0.796043     3.00831    7.80332      .0250    .1004
 5   0.881290     3.13962    8.31227      .0191    .0493
 6   0.971597     3.28249   10.98967      .0108    .0287
 7   0.966940     3.39915    9.69454      .0104    .0253
 8   0.977963     3.52240   11.24351      .0083    .0291
 9   0.887472     3.69192   10.43602      .0185    .0544
10   0.430172-    4.33752   13.09142-     .1693    .9130
11   0.624667-    3.91379    5.43953-     .0233    .0936
12   0.443535     3.98888    4.95462      .0364    .0891
13   0.821591-    4.02710    4.18554-     .0115    .0315
14   0.935118     4.10072   10.09498      .0153    .0345
15   0.988093     4.20448   20.05645      .0134    .0243
16   0.984913     4.28342   26.63699      .0166    .0378
17   0.970681     4.38192   15.21016      .0150    .0310
18   0.898045-    4.54037    6.73496-     .0115    .0416
19   0.833138     4.58110    7.65320      .0213    .0387
22   0.814241-    4.87817   19.42937-     .0458    .1403
23   0.131555     5.22387   19.82568      .0514    .1463
26   0.420926     5.66422   10.54749      .0579    .1818
29   0.243963     6.36849   12.76732      .1069    .3821

X_420 = T_R^(-2/3) tanh(T_R)
n_c      r           a_0         a_1          |ε|     |ε|_max
 1   0.868679-    4.73739     3.16368-     .0242    .0572
 2   0.900777-    2.49549     0.01205-     .0274    .0831
 3   0.922623-    2.79634     0.11300-     .0227    .0663
 4   0.735216-    4.46071     1.96249-     .0320    .0965
 5   0.489193-    4.07478     1.26488-     .0392    .0895
 6   0.361613-    4.15355     1.17711-     .0472    .1192
 7   0.493865-    4.25952     1.15664-     .0387    .0842
 8   0.823432     3.29858     0.28327      .0445    .1085
 9   0.416603     2.54713     1.53073      .0357    .1043
10   0.769715-   14.34840    13.34663-     .1183    .5804
11   0.844561-    6.14100     2.95277-     .0169    .0609
12   0.178223-    4.56651     0.77102-     .0399    .0978
13   0.620728-    5.15549     1.48599-     .0167    .0382
14   0.917063     0.44191     4.81450      .0200    .0281
15   0.919702     3.01163     9.47901      .0320    .0771
16   0.915655     5.65126    13.03236      .0421    .1051
17   0.865750     1.37493     7.54109      .0320    .0666
18   0.923373     7.72623     4.17220      .0110    .0372
19   0.800352     1.21131     4.40808      .0203    .0579
22   0.978129-   16.27909    14.89790-     .0185    .0532
23   0.555318     1.07791     5.42958      .0418    .1280
26   0.853683     4.61033-   13.42066      .0343    .0833
29   0.878601    14.90231-   27.76479      .0484    .1638

The values of a_0 shown include the regression constant a_01 of equation (Vg).

The a_1 and |ε| of Table 5 are plotted in Figures 15 and 16, respectively, as functions of n_c, the number of carbon atoms in the hydrocarbon chain.
According to the general trends which are discernible in Figure 15, the coefficient associated with X_420 might prove the easier to attribute. In Figure 16 the choice between X_228 and X_420 is more obvious. For n_c < 18 the smaller error fractions are clearly those related to a recursion equation containing X_228. Above n_c = 18, however, X_420 leads to the smaller errors. Over the entire range of n_c, and qualitatively, neither parameter satisfies all the criteria for a best (X_I)_2. Both the a_2-coefficients and the error fractions are so dispersed as to evidence but slight systematic variation.

[Fig. 15 -- Regression Coefficients, Second Phase II]
[Fig. 16 -- Recursion-Error Fractions, Second Phase II]

In the present example, the illustrative purposes are served by choosing (X_I)_2 = X_420 = T_R^(-2/3) tanh(T_R). This parameter is essentially much less complex than X_228, which contains in its cubic form three non-linearly-attributed coefficients (see equation VIg). While the hyperbolic tangent in X_420 would undoubtedly be less popular with a user of the predictor equation, the parameter itself is nevertheless well-behaved and easily calculable.

Second recursion stage. Phase III.-- The second recursion equation for species j may be written

    Y_1,j = a_02,j + a_2j (X_I)_2j = a_02,j + a_2j [T_R^(-2/3) tanh(T_R)]_j,   (VIi)

or, combining (VIi) with (Va) and (VIa), as the expanded predictor equation

    Y_0,j = (a_01,j + a_02,j) + a_1j [T_R^(-1) e^(-T_R)]_j + a_2j [T_R^(-2/3) tanh(T_R)]_j,   (VIj)

in which a_02,j and a_2j are to be determined. The attributing of a_2 (i.e., all coefficients a_2j) begins with a correlation analysis of a_2 with the original Z-parameters. The Z-matrix was not augmented during this stage, since no obvious transformation was suggested logically by Figures 15 and 16. Of the original set of Z-parameters, Z_351 = n_c^3 (P_M/P_C)^3 correlated most strongly with a_2, to the extent r(a_2, Z_351) = 0.677. As shown also in Figure 17, the correlation is disappointing. The peak at n_c = 16 is still present in the a_2-values (as might have been anticipated), as are the special, anomalous groups of points n_c = 2, 3 and 3 < n_c < 10. These features are as inexplicable now as they were at the end of the first recursion stage of Phase II. In this example of recursive linear regression it is becoming increasingly doubtful that any real progress toward an accurate and significant predictor equation can be expected. Until the defects in the model and/or raw data are resolved, refinement of the predictor formula at each recursion stage is likely to be very moderate unless X- and Z-parameters of unwarranted complexity are introduced. Certainly any dramatic reduction of the error fractions seems remote. The second recursion stage shall thus be continued in the shadow of one very important consideration of theoretical vapor pressure characterization: in the region 9 < n_c < 23 the nature of the variation with n_c of all parameters Y or a is, in essence, completely unknown.

[Fig. 17 -- Final Recursion Coefficient a_2]

The coefficient a_2 will be taken as the attributed function (a least squares solution)

    a_2j = b_02,1 + b_21 (Z_JI)_21 = b_02,1 + b_21 (Z_351)_j,

or
    a_2j = -0.16699 + 0.63122 [n_c^3 (P_M/P_C)^3].   (VIk)

The coefficient a_02 may now be calculated as

    a_02,j = Y̅_1,j - a_2j (X̅_I)_2j.   (VIl)

These values are plotted against n_c in Figure 18. A Phase III correlation analysis on the parameter pairs a_02,j, Z_j revealed Z_351 again as the most strongly correlated parameter, in which case the regression equation became

    a_02,j = 0.14545 - 0.49048 [n_c^3 (P_M/P_C)^3].   (VIm)

The data may be fitted with about the same accuracy to a modified Gompertz equation:

    a_02,j = 0.12 - 444.84 (0.13594)^A,    A = 5 (0.80447)^{0.25(n_c - 8)}.   (VIn)

This type of equation, generally written

    Y = A + B C^(D^X),   (VIo)

is an extremely useful yet relatively obscure equation in numerical analysis. It has potentially wide application in engineering, since the basic shape of the curve is sigmoidal, with zero slope as X approaches infinity. The coefficients may be easily computed according to the method of Schuler, described by Davis (8, page 75). In the case of a_02,j, either of equations (VIm) or (VIn) is satisfactory. Equation (VIm) is chosen as the obviously simpler form, and also because Z_351 = n_c^3 (P_M/P_C)^3 is a parameter common to both coefficients a_02,j and a_2j of the second recursion stage.

[Fig. 18 -- Final Recursion Coefficient a_02]

The predictor equation at the end of the second stage, the second recursion equation, is (for species j)

    Y_0,j = (log_e P_R)_j = (a_01,j + a_02,j) + a_1j [T_R^(-1) e^(-T_R)]_j + a_2j [T_R^(-2/3) tanh(T_R)]_j,   (VIp)

in which (j equivalent to n_c):

    a_01,j = 4.12 + 0.5134 sinh[0.1552(n_c - 14)],   (Vf)
    a_02,j = 0.14545 - 0.49048 [n_c^3 (P_M/P_C)^3],   (VIm)
    a_1j = 11.5 + 2.582 sinh[0.1018(n_c - 15)],   (Ve)
    a_2j = -0.16699 + 0.63122 [n_c^3 (P_M/P_C)^3].   (VIk)

Not all digits in the coefficients of these equations are significant, but they are given in anticipation of possible further application or manipulation of the equations (e.g., numerical differentiation). The new dependent parameter Y_2 may now be calculated as

    Y_2,j = Y_1,j - a_2j (X_I)_2j = Y_1,j - a_2j (X_420)_j = Y_1,j - a_2j [T_R^(-2/3) tanh(T_R)]_j.   (VIq)

These values Y_2,j are shown in Figure 19 as functions of the reduced temperature. The translated origins, a_01 + a_02, are indicated by the dashed lines.

[Fig. 19 -- Total Residuals, Second Phase II]

A comparison of Figures 12 and 19, total residuals for the first two recursion stages, indicates that the effect of the second stage has been almost solely a small reduction in the amplitudes of the total residuals. The "functional" or systematic variation which characterized Figure 12 has been reduced but slightly. The question may properly be asked whether the second recursion has accomplished any significant improvement in the predictor equation. In this example of recursive linear regression no specific goal was initially specified. Insofar as accuracy of the predictor equation is concerned, the improvement has been small but is definitely in evidence. The average recursion errors are not in this case so useful as the average over-all-error fractions |E|_1 and |E|_2 which are given in Table 6. The two sets of error fractions may be compared in the grand averages |E̅|_1 and |E̅|_2. These figures are represented as

    |E̅|_k = (1/J) Σ_j |E̅|_kj,   (VIr)

in which the j indices assume the J (= 22) values of n_c excluding n_c = 10.
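The species-by-species grand average (VIr), as distinguished from the pooled per-observation average (VIs) discussed next, might be sketched as follows (names are hypothetical):

```python
import numpy as np

def grand_average(e_by_species, exclude=()):
    """(VIr): average the per-species mean |E| values, one species at a
    time, so that no species is weighted by its number of observations."""
    means = [np.mean(np.abs(e)) for nc, e in e_by_species.items()
             if nc not in exclude]
    return sum(means) / len(means)

def pooled_average(e_by_species):
    """(VIs): the average over all observations; the many observations at
    small n_c can mask the poorer fits at large n_c."""
    pooled = np.concatenate([np.abs(e) for e in e_by_species.values()])
    return float(np.mean(pooled))

# Table 6's starred row corresponds to grand_average(E, exclude=(18, 22)).
```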
Another type of grand average, the average of error fractions taken over all observations,

    (1 / Σ_j N_j) Σ_j Σ_i |(E_i)_kj|,   (VIs)

is again less useful in the present case. The larger number of observations for species j, for small j, tends to lower the so-called grand average error fraction and thereby give a spurious indication of accuracy for all substances j < 30.

The increase in accuracy for the second recursion formula (VIp) over the first (Vg) is small, as implied by the grand average error fractions of Table 6.

TABLE 6
OVER-ALL-ERROR FRACTIONS FOR THE FIRST TWO RECURSIONS

n_c        |E|_1       |E|_2
 1       0.11931     0.10202
 2       0.03296     0.05052
 3       0.04124     0.06236
 4       0.05471     0.04824
 5       0.04895     0.05390
 6       0.05492     0.06201
 7       0.05919     0.03006
 8       0.06082     0.08120
 9       0.04278     0.03884
11       0.05051     0.03778
12       0.04870     0.04317
13       0.02015     0.03156
14       0.04060     0.04551
15       0.07192     0.06601
16       0.09685     0.09138
17       0.05125     0.04334
18       0.09049     0.10146
19       0.03690     0.03123
22       0.09205     0.11472
23       0.11681     0.18596
26       0.10738     0.06972
29       0.46183     0.13401

Grand Average       0.08274     0.07164
(Grand Average)*    0.08089     0.06800

In the (Grand Average)* the cases n_c = 18, 22 are excluded.

The approximately one per cent improvement, slight compared to the analytical labor representing the second stage, is a direct result of the previously mentioned difficulties with the raw data and/or certain theoretical assumptions regarding the vapor pressure phenomenon. The evidence is decidedly against significant increases in accuracy for further recursion steps. The criterion of random residuals is certainly not met, according to the curves of Figure 19. For the coefficients (Figures 17 and 18) and recursion-error fractions (Figure 16) of the second recursion stage, randomness is a stronger possibility. Since the attributing of coefficients is a major feature of recursive linear regression, the resources of the latter method might be presumed to have been virtually exhausted at the end of the second recursion stage.

As a last exercise in this example of a vapor pressure predictor equation, genuine raw data (from the chemical literature) were substituted into the final recursion formula (VIp):

    log_e P_R = (a_01 + a_02) + a_1 (T_R^(-1) e^(-T_R)) + a_2 [T_R^(-2/3) tanh(T_R)].

The over-all-error fractions E_2, and the corresponding species or source averages, appear in Appendix XI. Whether the Y_0 data may be considered "raw" is problematic, as the literature sources in many cases gave the impression that the authors had already smoothed the data. Comparison of Table 6 and Appendix XI, in any event, gives some indication of the reliability of Stull's data; the |E| values are in most cases of comparable magnitude. Unexpectedly, however, the genuine raw data are apparently accommodated by the predictor equation somewhat more accurately than Stull's smoothed data, on which the predictor equation was based. For species n_c = 3, 4, 8, 10 Stull's data represent the better fit. For species n_c = 5, 7, 9, 11, 13, 14, 15, 16, 17, 18, 19, 22, 23, however, the grand average error fractions for Stull's and for the raw data are, respectively, 0.0709 and 0.0501. The raw data from the literature, selected more or less randomly from those sources most conveniently accessible, may therefore have been, coincidentally, of higher accuracy or precision than the smoothed values used in this example. An hypothesis that the predictor equation is generally more accurate than the error fractions of Table 6 (smoothed data) would indicate is sustained by no evidence determinable in the example of this paper, and must therefore be considered strictly conjectural.
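Substituting data into the final recursion formula, as was done for Appendix XI, requires only the evaluation of the attributed coefficients; a sketch of (VIp) with (Vf), (VIm), (Ve), and (VIk) follows (the function name is hypothetical, and pm_over_pc denotes the species' reduced chemical property (P_M)_R as defined in Phase IIIA):

```python
import numpy as np

def predict_log_pr(tr, nc, pm_over_pc):
    """Evaluate the second recursion equation (VIp) with the attributed
    coefficients (Vf), (VIm), (Ve), and (VIk)."""
    z351 = nc ** 3 * pm_over_pc ** 3
    a01 = 4.12 + 0.5134 * np.sinh(0.1552 * (nc - 14))
    a02 = 0.14545 - 0.49048 * z351
    a1 = 11.5 + 2.582 * np.sinh(0.1018 * (nc - 15))
    a2 = -0.16699 + 0.63122 * z351
    return ((a01 + a02)
            + a1 * np.exp(-tr) / tr           # T_R^(-1) e^(-T_R) term
            + a2 * np.tanh(tr) / tr ** (2/3)) # T_R^(-2/3) tanh(T_R) term

# The pressure itself is recovered as P = P_C * exp(predict_log_pr(...)).
```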
On the basis of grand average error fractions for all species studied, the raw data are associated with an error about 1.9% greater than that for the data used in deriving the equation.

APPRAISAL OF THE RECURSIVE LINEAR REGRESSION METHOD

The vapor pressure example.-- In the preceding text serious questions were raised regarding certain assumptions underlying the three vapor pressure models -- physical, "smoothed" (i.e., Stull's interposed), and mathematical. The final result of this study, equation (VIp),

    log_e P_R = (a_01 + a_02) + a_1 (T_R^(-1) e^(-T_R)) + a_2 [T_R^(-2/3) tanh(T_R)],

is of quite limited use as a vapor pressure predictor equation, but the illustrative purpose has been adequately served. Useful information has in fact been obtained from each decision made during the recursion analysis. One important conclusion which may be drawn concerns the chemical parameters Z_j. Such Z-parameters as may be useful in Phase III must apparently satisfy very unusual conditions (see pages 105-106). A predictor equation deduced entirely from theoretical considerations should manifestly account for this behavior in terms of chemical properties z. A second conclusion which may be drawn is that, with existing data, a good predictor equation (i.e., an accurate one, or one of theoretical significance) with attributed coefficients is almost certain to be extremely complex. During the first recursion stage the parameter T_R^(-1) e^(-T_R) was found to be superior to T_R^(-1), and the Antoine parameter (T_R + A)^(-1) considerably better than either. One would suspect that a parameter of this latter type (inverse temperature, in some form) may be the best for development of an accurate one-term predictor equation. Since the coefficient A in (T_R + A)^(-1) virtually defies any attributing effort (see Appendix IX), the ideal first term would undoubtedly include non-linear coefficients of presently unknown type. All evidence indicates, nonetheless, that a parameter typical of that in the Antoine equation presents the most interesting opportunities for further development.

The correlation coefficients r(Y_0, X_i) for the first recursion stage indicate that an appropriately chosen recursion parameter (X_I)_1 eliminates all but a very small portion of the total variation in Y_0. One might thereby conclude that a class of very satisfactory predictor equations may be derived with but one or two complex terms, rather than several simpler ones. In the Frost-Kalkwarf equation, for example,

    log_10 P = A + B T^(-1) + C log_10 T + D P T^(-2),

the four separate terms contribute numerically to log_10 P in the manner shown by the examples below (for which P, the vapor pressure, is expressed in millimeters of mercury, and the temperature T in degrees Kelvin):

n_c      T          A         B T^(-1)    C log_10 T    D P T^(-2)
 2     113.65    16.5182     9.3372-       7.1618-      0.000035
 2     220.35    16.5182     4.8161-       8.1636-      0.035427
 2     305.44    16.5182     3.4743-       8.6577-      0.177734
 7     239.26    28.1045    11.7601-      16.3535-      0.000051
 7     398.12    28.1045     7.0674-      17.8738-      0.027794
 7     540.18    28.1045     5.2088-      18.7348-      0.203059
12     323.28    47.2435    15.5056-      31.8642-      0.000108
12     465.05    47.2435    10.7789-      33.8692-      0.020942
12     663.13    47.2435     7.5592-      35.8258-      0.342451

The value of A, positive, is always approximately equal to the sum of the negative quantities B T^(-1) and C log_10 T, less the predicted quantity log_10 P. The two negative terms are furthermore of approximately the same magnitude for small n_c, while C log_10 T predominates as n_c increases.
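Because P appears on both sides of the Frost-Kalkwarf equation (through the term D P T^(-2)), a numerical evaluation must iterate; a minimal fixed-point sketch (the function name is hypothetical, and the coefficients A, B, C, D are supplied by the user):

```python
import math

def frost_kalkwarf(T, A, B, C, D, tol=1e-10, max_iter=100):
    """Solve log10 P = A + B/T + C*log10(T) + D*P/T**2 for P, which is
    implicit in P, by fixed-point iteration."""
    p = 10.0 ** (A + B / T + C * math.log10(T))  # start from the explicit part
    for _ in range(max_iter):
        p_new = 10.0 ** (A + B / T + C * math.log10(T) + D * p / T ** 2)
        if abs(p_new - p) <= tol * max(p, 1.0):
            return p_new
        p = p_new
    return p
```

For the tabulated examples the implicit term D P T^(-2) is small, so the iteration converges in a handful of steps.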
Over the range for which the coefficients were determined (23), however, the correlation is typical of that which generally prevails for multivariate regression equations (especially of the polynomial form) -- relatively large terms, the algebraic sum of which is relatively small. The sum of the first three terms in the Frost-Kalkwarf equation is thus much smaller than any of the terms individually, and comparable in magnitude to D P T^(-2). Such is not generally the case with recursive linear regression, as the most important terms are removed from the equation residual (dependent parameter) in decreasing order of significance. In light of the accuracy of the Frost-Kalkwarf equation, however, a recursive linear regression analysis of all four terms should be very interesting. Such an analysis would establish the relative significance of each of the four terms, in addition, perhaps, to allowing the accurate attributing of the four coefficients.

Perhaps the most important result of the two-stage recursion analysis is the appearance of unexplained anomalies in the mathematical model, especially for n_c = 11, 12, 13. The unusual change in recursion statistics for this region, most apparent in Figure 9, is more likely the result of an abrupt change in the physical phenomena of vapor pressure than of inaccurate data. The vapor pressure tabulations of Perry (27) and Krafft (13) differ from Stull's graphically-smoothed data by at most 0.5% (except as previously noted for n_c = 5, 10, 13). This difference is much too small to affect any of the functions in (VIp) to any marked degree. While data on the higher hydrocarbons are meager indeed, and not recent, still any reasonable observational error is insufficient to account for the inflections in Figure 11 (a_1j versus n_c). In fact, the vapor pressures of the normal hydrocarbons above n-C9H20 should be analyzed or redetermined as a serious further study. The change in shape of the locus of points (Figure 11) is certainly too strongly in evidence to be attributed to experimental error alone, and the parameters in the predictor equation are much too well-behaved for one to suspect analytic fluctuations of the type and magnitude shown. In lieu of other reasonable explanations, doubt must be cast on the theoretical vapor pressure model and our assumptions (e.g., uniform Z-n_c functions) pertaining thereto. These same conclusions should also be inferred as applying to the neighborhood of n_c = 16.

The usual correlating parameters Z (chemical) for vapor pressure were shown in Figure 8 to be strongly even-odd characterized by n_c. Recursive linear regression analysis has indicated that this dichotomy of hydrocarbon chain length is not, however, consequential. The procedures in both recursion stages were based on the clearly uniform variation with n_c of all statistics Y, a, and ε in the lower range n_c < 10. The abnormal behavior of these statistics became evident only for the larger n_c, for which the chemical parameters Z had become primarily single functions of n_c. An intuitive conviction of the importance of even or odd n_c-values (as deduced from Figure 8) is thus dismissed as a result of recursive linear regression techniques.

Comparison of regression treatments.-- Except in the imposition of the least squares criterion, recursive linear and multivariate regression analyses differ in almost every respect. Table 7 is a compilation of some of these differences -- major ones insofar as numerical analysis is concerned.
TABLE 7
A COMPARISON OF RECURSIVE LINEAR REGRESSION AND CONVENTIONAL MULTIVARIATE REGRESSION TECHNIQUES

Number of independent parameters, n:
    Recursive linear regression: unlimited; standard Phase I.
    Multivariate regression: limited by computer memory to 50-75.
Number of observations, N (per parameter):
    Recursive: somewhat limited (by computer memory) to ~100.
    Multivariate: essentially unlimited.
Establishment of equation form:
    Recursive: developed term by term, under control of analyst; standard Phase II.
    Multivariate: specified at start of analysis, inflexible throughout actual derivation.
Attributing of equation coefficients:
    Recursive: standard Phase III.
    Multivariate: extremely difficult and seldom attempted.
Correlation matrix and inverse:
    Recursive: neither required, but both can be generated.
    Multivariate: both matrices essential.
Inter-relationships among independent parameters:
    Recursive: linear independence not required; linear dependence detectable.
    Multivariate: linear independence a prerequisite.
Partial predictor equations:
    Recursive: any linear residuals calculated, as desired.
    Multivariate: partial equations analyzable only with difficulty.
Feasibility of manual computation:
    Recursive: relatively difficult in Phases I and IIA, subsequently a simple task.
    Multivariate: quite impractical for n > 6, seldom attempted for large n.
Equations of specified form:
    Recursive: possibility of n! different equations.
    Multivariate: one equation.
Non-linear coefficients:
    Recursive: trial-and-error procedure at some level L.
    Multivariate: trial-and-error procedure at a level L+1.
Consistency analysis of physico-mathematical model:
    Recursive: a designed purpose of the method.
    Multivariate: usually inextricable from multivariate statistics.

The decisions consequent on the use of the recursion method generally have no counterpart in multivariate regression, and a comparison of the ultimate results of the two techniques is elusive. The terms in Table 7 should therefore be interpreted with respect to primary regression goals rather than the secondary information about the mathematical model(s). The study of variable- or parameter-relationships in general is beyond the scope of multivariate regression, while precisely the purpose of the recursion method.

For the vapor pressure example which has been carried through two recursion stages, one might wish to consider the results of a conventional multivariate regression on the same Y_0, X_I data. The goal of such an analysis is undefined, however, since the requisite equation form is initially unspecified. Taking the final recursion equation (VIp) as the basic predictor equation, the form

    Y_0 = A'' + B'' (T_R^(-1) e^(-T_R)) + C'' [T_R^(-2/3) tanh(T_R)]   (VIIa)

may be analyzed as a multiple regression problem. The sets of coefficients which result appear in Table 8 and are plotted in Figure 20 as functions of n_c. Equation (VIIa) represents a slightly better fit than the equivalent recursion equation (VIp), but the Phase III analysis (attributing) of the coefficients A'', B'', C'' seems an almost hopeless task except in the case of B''. On most counts (VIp) is the more suitable as a predictor equation.
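The coefficients of Table 8 are, for each species, an ordinary least squares solution of (VIIa); in modern notation (names hypothetical):

```python
import numpy as np

def table8_coefficients(tr, y0):
    """Least squares fit of (VIIa) for one species:
    Y0 = A'' + B''*(exp(-TR)/TR) + C''*(tanh(TR)/TR**(2/3)).
    Returns (A'', B'', C'')."""
    x = np.column_stack([
        np.ones_like(tr),
        np.exp(-tr) / tr,
        np.tanh(tr) / tr ** (2 / 3),
    ])
    coef, *_ = np.linalg.lstsq(x, y0, rcond=None)
    return tuple(coef)
```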
TABLE 8
MULTIPLE REGRESSION COEFFICIENTS FOR THE PREDICTOR EQUATION
Y_0 = A'' + B''(T_R^(-1) e^(-T_R)) + C''[T_R^(-2/3) tanh(T_R)]
FOR STULL'S SMOOTHED DATA (37)

n_c       A''          B''           C''
 1      7.40768      6.60967-      6.52747-
 2      5.44119      7.20772-      3.69462-
 3      3.71281      7.56506-      1.26113-
 4      1.66644      7.76793-      1.53183
 5      1.39151      8.20804-      2.09542
 6      0.46257      8.53349-      3.43704
 7      1.34251      8.95961-      2.49506
 8      0.07339-     9.27765-      4.50991
 9      1.52089-     9.53127-      6.50977
11      3.97937     10.26484-      0.26455-
12      0.03362-    10.40461-      4.99571
13      5.36389     11.02061-      2.36741-
14      1.13423     11.28469-      3.95233
15      2.84763-    11.51129-      9.27573
16      4.56705-    11.83713-     11.68106
17      0.14231     12.13234-      5.64972
18      6.84749     12.24068-      3.07720-
19      3.37914     12.72813-      1.70592
22     16.94771     13.54423-     15.73252-
23      2.43328-    13.60203-      9.80402
26      6.54820-    14.89644-     15.83104
29     18.80788-    16.24397-     32.59296

[Fig. 20 -- Multiple Regression Coefficients]

On the basis of accuracy alone, one would certainly choose an entirely different equation (the Frost-Kalkwarf equation, for example, or the Antoine equation of Appendix IX). If the introduction sequence in (VIp) were reversed, the equations

    Y_0 = a'_01 + a'_1 [T_R^(-2/3) tanh(T_R)],
    Y'_1 = Y_0 - a'_01 - a'_1 [T_R^(-2/3) tanh(T_R)],   (VIIb)
    Y'_1 = a'_02 + a'_2 (T_R^(-1) e^(-T_R))

would result, in which the primed coefficients and dependent parameter differ numerically from the unprimed ones of (VIp). The equations (VIIb) are of course synthetic in the domain of recursive linear regression, since the change in introduction sequence is hypothetical and logically indefensible. However, (VIIb) can be reduced to the form

    Y_0 = A' + B' (T_R^(-1) e^(-T_R)) + C' [T_R^(-2/3) tanh(T_R)],   (VIIc)

just as the final recursion equation can be expressed generally, instead of as (VIp), as

    Y_0 = A + B (T_R^(-1) e^(-T_R)) + C [T_R^(-2/3) tanh(T_R)].   (VIId)

Equations (VIIa), (VIIc), and (VIId) are identical in form but contain numerically different coefficients. These are therefore three distinct equations, each one satisfying the least squares principle in one or more respects. Prior statements have indicated that (VIp), the equation generated during the illustrative example, is the most suitable predictor equation of the three. Nevertheless, for different purposes this choice may not be appropriate. The B'' coefficients of Table 8, analogous to the a_1, are quite regular and hence conceivably appropriate for use in Phase III. Since the multivariate regression equation for which they were evaluated is generally more accurate than (VIp), the former is somewhat a mixture of the two regression techniques. The vague relationships that apparently bind coefficients A'' and C'' still, however, leave that equation inferior to (VIp). The more closely one chooses to associate (not necessarily implying "fit") the predictor equation with the reality of vapor pressure phenomena, the more strongly is one convinced that the choice is, as stated, equation (VIp). Divorcing the mathematical and physical models completely, one may say that the numbers Y_0, X_109, X_420 are most nearly accommodated by equation (VIIa). In the last analysis the user must decide which of the equations is least fictitious for all possible cases to which it will be applied. Should interpolation and/or extrapolation be required, the selection of any equation other than (VIp) would appear to be unjustified.
The research investigator who contemplates the use of regression analysis must eventually turn to a consideration of available or easily prepared computer programs. A flow diagram of the general program used by this writer is given in Appendix X. His attention was recently called to a computer program written in the FORTRAN language by M. A. Efroymson of the Esso Research and Engineering Company. Entitled "E R MPR2 - Stepwise Multiple Regression," this program (written in 1958) accomplishes part of the recursive linear regression analysis of up to 59 independent variables. During the computer processing all decisions are based on the simple correlation coefficients (actually, residual sums of squares). Essentially, terms are added to the predictor equation on the basis of r-values only; the program logic was not designed to investigate the different chemical species or the higher regression levels to which this paper's Phase III alludes. At the same time an additional feature of Efroymson's program merits some discussion. A variable once introduced into the predictor equation may subsequently (at some later recursion stage) be removed from the equation on the basis of the statistical F-test. This writer does not subscribe to such a procedure, taking the position that more substantial reason than an F-level test must be invoked to delete a variable (or parameter) once introduced.

Consider a parameter (X_I)_k, introduced at the k-th recursion stage and found at some stage t (t > k+1) to contribute insignificantly (F-test) to the residual sums of squares. By analogy, one may conclude that an "error" was made in including (X_I)_k in the first place (at the k-th stage). The central issue is: how appropriate is the analogy? By using the r-criterion as a control, rather than a guide, one would seem to be continuously confounding the regression analysis with spurious predictor terms. The most reasonable action to take under these circumstances should amount to a reconsideration of the (k-1)-th predictor equation

    Y_0,k-2 = a_0,k-1 + Σ_{i=1}^{k-1} a_i (X_I)_i   (VIIe)

and the set of residuals

    ρ_{k-1} = Y_0,k-2 - [a_0,k-1 + Σ_{i=1}^{k-1} a_i (X_I)_i].   (VIIf)

The values (ρ_i)_{k-1} for observations i represent a parameter equivalent in every respect, mathematically, to the original dependent parameter Y_0. Now, no effort was made to determine either significance or redundancy of (X_I)_1 within Y_0 for the first recursion equation Y_0 = a_01 + a_1 (X_I)_1. Should one, or in fact can one, therefore be concerned about the "(X_I)_1-content" of Y_{k-1} in the k-th recursion equation Y_{k-1} = a_0k + a_k (X_I)_k, an equivalent form? For decisions oriented toward mathematical characteristics only (e.g., residual sums of squares, F-levels, etc.), the two equations should not be seriously differentiated.

The concept of deletion of parameters is, to this writer's thinking, one example of an excessive preoccupation with purely statistical measures. In dealing with intertwined physical and mathematical models, as engineers especially must, the research analyst assumes full responsibility for any tendency to become oblivious of the one while intently pursuing the other. Computer programs for either conventional or for Efroymson's stepwise multiple regression necessarily emphasize purely statistical aids to the decision-making procedure. The ultimate application of the predictor equation carries with it, however, the implication that the result is a blend of the physical and mathematical models.
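A schematic rendering of the stepwise procedure under discussion -- not Efroymson's MPR2 itself, whose internals are not reproduced here -- showing the F-to-enter step and the questioned F-to-remove step (names and F-levels are hypothetical):

```python
import numpy as np

def rss(y, X, cols):
    """Residual sum of squares of y on an intercept plus the columns in cols."""
    a = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    r = y - a @ np.linalg.lstsq(a, y, rcond=None)[0]
    return float(r @ r)

def stepwise(y, X, f_enter=4.0, f_remove=3.9):
    """Enter the variable that most reduces the residual sum of squares
    whenever its partial F exceeds f_enter; drop any entered variable whose
    partial F has fallen below f_remove.  Keeping f_remove < f_enter guards
    against a variable cycling in and out."""
    n, m = X.shape
    active, changed = [], True
    while changed:
        changed = False
        outside = [c for c in range(m) if c not in active]
        if outside:
            base = rss(y, X, active)
            new, c = min((rss(y, X, active + [c]), c) for c in outside)
            df = n - len(active) - 2              # residual d.f. after entry
            if df > 0 and (base - new) * df / new > f_enter:
                active.append(c)
                changed = True
        for c in list(active):                    # the deletion step questioned above
            full = rss(y, X, active)
            without = rss(y, X, [d for d in active if d != c])
            df = n - len(active) - 1
            if df > 0 and (without - full) * df / full < f_remove:
                active.remove(c)
                changed = True
    return active
```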
Such implication is proper in recursive linear regression, since each recursion stage k constitutes a concerted study of an isolated mathematical construct (the parameters Y_k, all independent parameters X, the k-th predictor equation, and all k-th stage residual statistics) and a physical relationship. Stepwise multiple regression proceeds at too rapid and automatic a pace to permit the computation of interesting intermediate results. By the time the k-th step has been completed many decisions have been made through the use of computed statistics and the application of statistical tests, when such decisions may properly have been the domain of the analyst, not the programmer. This writer can in fact foresee no immediate prospect of a complete computer program for recursive regression analysis. In no respect do the methods which have been described in this paper constitute a formalized procedure, a recipe, for the new regression analysis. The use of appropriate mathematical tools should be augmented by the investigator's knowledge of the physical relationship. Such knowledge is currently quite beyond the capacity of a computer or of computer logic. The tenor of this argument is that the results of a regression analysis are valuable according only to the analyst's conscious or implied manipulation of the primary mathematical tool. In this respect, recursive linear regression has a distinct advantage over other regression methods, methods characterized generally by an uninterrupted sequence of arithmetic operations and dichotomous logical decisions. Decision-making steps which are latent in conventional multivariate regression and potential in stepwise multiple regression are finally realizable in the recursion method.

APPENDICES

APPENDIX I - NOMENCLATURE

Functional notation.— Most of the equations which have been used may be written in any of several forms, depending on the number of subscripts which may be required to eliminate any ambiguity. The general recursion equation for the k-th stage Phase II recursion, chemical species (sub-array) j, observation i, may be written as

(Y_i)_{k-1,j} = a_{0k,j} + a_{kj}(X_{iI})_{kj},

or, in terms of the original dependent parameter Y_0, as

(Y_i)_{0,j} = a_{0k,j} + Σ_{s=1}^{k} a_{sj}(X_{iI})_{sj}.

With the tacit assumption that the equations hold for each species under consideration, the subscript j may be omitted and the following simplified expressions result:

(Y_i)_{k-1} = a_{0k} + a_k(X_{iI})_k,
(Y_i)_0 = a_{0k} + Σ_{s=1}^{k} a_s(X_{iI})_s.

Even more condensed versions may be formulated by dropping the observation index i; for each observation on each species the equations reduce to

Y_{k-1} = a_{0k} + a_k(X_I)_k,
Y_0 = a_{0k} + Σ_{s=1}^{k} a_s(X_I)_s.

These last two equations are the simplest expressions for the primary recursion formulas.

The secondary equations of recursive linear regression involve the generation of successive dependent parameters, the residuals on the primary equations. The generalized expression for the i-th observation of the j-th species, after the k-th Phase II recursion stage, is

(Y_i)_{k,j} = (Y_i)_{k-1,j} - a_{kj}(X_{iI})_{kj}.

Simplification may be effected, as in the preceding paragraphs, by the omission of the species subscript j,

(Y_i)_k = (Y_i)_{k-1} - a_k(X_{iI})_k,

or of both j and the observation subscript i,

Y_k = Y_{k-1} - a_k(X_I)_k.

Exactly equivalent formulas for the Phase III approximations to a_{kj} may be similarly simplified.
For the p-th Phase III recursion stage, the most general primary formula is

a_{kj,p-1} = b_{0k,p} + b_{kp}(Z_{jI})_{kp},

or, in terms of the original dependent parameter a_{kj},

(a_{kj})_0 = a_{kj} = b_{0k,p} + Σ_{s=1}^{p} b_{ks}(Z_{jI})_{ks}.

Simplification of these two primary recursion equations may be effected by dropping the j-subscript for species (in Phase III this subscript is actually an observation index):

a_{k,p-1} = b_{0k,p} + b_{kp}(Z_I)_{kp},
(a_k)_0 = a_k = b_{0k,p} + Σ_{s=1}^{p} b_{ks}(Z_I)_{ks}.

The two secondary recursion equations, for successive dependent parameters a_k, are in like manner deducible as

a_{kj,p} = a_{kj,p-1} - b_{kp}(Z_{jI})_{kp},
a_{k,p} = a_{k,p-1} - b_{kp}(Z_I)_{kp}.

With this background in development of the functional notation, derivations and discussions of pertinent equations may be carried out for either the general or the particular case.

Functional nomenclature.— The following symbols are defined in accordance with the text of the preceding section (I denotes a recursion-selected parameter):

a_{0k,j}, a_{0k} - regression constant for the k-th Phase II recursion equation for species j
a_{kj,p-1}, a_{kj}, a_k - regression coefficient for the k-th Phase II, p-th Phase III recursion equation for species j
b_{0k,p} - regression constant for the k-th Phase II, p-th Phase III recursion equation
b_{kp} - regression coefficient for the k-th Phase II, p-th Phase III recursion equation
…_{kj,p} - ultimate-error fraction for the k-th Phase II, p-th Phase III recursion equation for species j
(E_i)_k - over-all-error fraction for the i-th observation in the k-th Phase II recursion equation
V²_{JK} - pseudo-variance for parameters J and K
x_h - independent variable h
(X_i)_{hj} - i-th observation, in the j-th species, of independent parameter h
(X_{iI})_{kj}, (X_{iI})_k, (X_I)_k, X_I - observation, in the j-th species, on the k-th Phase II independent recursion parameter
y_h - dependent variable h
(Y_i)_{kj} or (Y_i)_k, Y_k - i-th observation, in the j-th species, on the k-th Phase II dependent recursion parameter
z_h - independent chemical (species) variable h
(Z_j)_h - j-th observation on independent chemical (species) parameter h
(Z_{jI})_{kp}, (Z_I)_{kp} - j-th observation on p-th Phase III independent recursion parameter, for the k-th Phase II recursion equation
δ_{kj,p-1}, δ_{kj}, δ_k - Phase III recursion-error fraction for that recursion defined by a_{kj,p-1}, a_{kj}, or a_k
(ε_i)_{kj}, (ε_i)_k - recursion-error fraction for the i-th observation, in the j-th species, of the k-th Phase II recursion equation
(ρ_i)_{kj}, (ρ_i)_k - residual error for the i-th observation, in the j-th species, of the k-th Phase II recursion equation

Any symbol over which a horizontal bar is placed (e.g., X̄, Ȳ, etc.) represents the arithmetic mean of the quantity denoted by that symbol. Summation is assumed, generally, over the observation index i.

General nomenclature.— Below are defined those symbols not otherwise mentioned above:

A, B, ... - arbitrary coefficients
e - base of Naperian logarithms
f, g, h - function (of)
F, G - function (of)
J - total number of chemical species j for Phase II
m - number of independent (physical) variables x or independent (chemical) variables z
M - number of Phase III observations, or species
n - number of independent (physical) parameters X
n_c - number of carbon atoms in straight-chain saturated hydrocarbon
N - number of Phase II observations, or data points
P - pressure; specifically, vapor pressure
P_c - critical pressure
P_M - melting point pressure, viz., one atmosphere
P_R - reduced (vapor) pressure, P/P_c
[P] - parachor
r_{JK} - correlation coefficient (simple, product moment, zero-order, or Pearson) between parameters J and K
T - temperature, absolute
T_c - critical temperature
T_M - melting point temperature, at one atmosphere pressure
T_R - reduced temperature, T/T_c
W - atomic weight, formula weight
γ - surface tension
μ - viscosity
… - function (of)
σ - standard deviation

APPENDIX II - REFERENCES

(1) Ashworth, A.A., J. Inst. Petr. Tech. 10, 787 (1924).
(2) Aston, J.G., Messerly, G.H., J. Am. Chem. Soc. 62, 1917 (1940).
(3) Baehr, H.D., Chem.-Ing.-Tech. 25, 717 (1954).
(4) Barkhuysen, F.H.C., Chem. Weekblad 55, 509 (1959).
(5) Bohr, N., "Atomic Physics and Human Knowledge," pp. 3-12, John Wiley & Sons, New York, 1958.
(6) Burrell, G.A., Robertson, I.W., J. Am. Chem. Soc. 37, 2189 (1915).
(7) Chuprov, A.A., "Principles of the Mathematical Theory of Correlation," W. Hodge & Co., Ltd., London, 1939.
(8) Cornelissen, J., Waterman, H.I., Chem. Eng. Sci. 5, 141 (1956).
(9) Davis, D.S., "Nomography and Empirical Equations," Reinhold Publishing Corp., New York, 1955.
(10) Dodge, B.F., "Chemical Engineering Thermodynamics," McGraw-Hill Book Co., New York, 1944.
(11) Erpenbeck, J.J., Miller, D.G., Ind. Eng. Chem. 51, 329 (1959).
(12) Frost, A.A., Kalkwarf, D.R., J. Chem. Phys. 21, 264 (1953).
(13) Gamson, B.W., Watson, K.M., Natl. Petr. News 36, R258 (May 3, 1944).
(14) Heisenberg, W., "Physics and Philosophy," pp. 96-102, Harper & Brothers Publishers, New York, 1958.
(15) Hildebrand, F.B., "Introduction to Numerical Analysis," McGraw-Hill Book Co., New York, 1956.
(16) Kemp, J.D., Egan, C.J., J. Am. Chem. Soc. 60, 1521 (1938).
(17) Keyes, F.J., Taylor, R.L., Smith, L.B., J. Math. Phys. 1, 211 (1922).
(18) Krafft, F., Chem. Ber. 15, 1687-1728 (1882).
(19) Lange, N.A., "Handbook of Chemistry," 9th Ed., McGraw-Hill Book Co., New York, 1956.
(20) Linder, E.G., J. Phys. Chem. 35, 531 (1931).
(21) Margenau, H., "The Nature of Physical Reality," pp. 25-31, 389-394, McGraw-Hill Book Co., New York, 1950.
(22) Matsen, J.M., Johnson, E.F., J. Chem. Eng. Data 4, 531 (1960).
(23) Messerly, G.H., Kennedy, R.M., J. Am. Chem. Soc. 62, 2988 (1940).
(24) Michael, G.V., Thodos, G., Chem. Engr. Progr. Symposium 49, No. 7, 131 (1953).
(25) Mitra, S.S., J. Indian Chem. Soc. 51, 444 (1954).
(26) Mitra, S.S., Chakravarty, D.N., J. Chem. Phys. 22, 1775 (1954).
(27) Perry, J.H., et al., "Chemical Engineers' Handbook," 3rd Ed., McGraw-Hill Book Co., New York, 1950.
(28) Perry, R.E., Thodos, G., Ind. Eng. Chem. 44, 1649 (1952).
(29) Reid, R.C., Sherwood, T.K., "The Properties of Gases and Liquids," McGraw-Hill Book Co., New York, 1958.
(30) Riedel, L., Chem.-Ing.-Tech. 26, 83 (1954).
(31) Scarborough, J.B., "Numerical Mathematical Analysis," 1st Ed., Johns Hopkins Press, Baltimore, 1930.
(32) Seibert, F.M., Burrell, G.A., J. Am. Chem. Soc. 37, 2683 (1915).
(33) Sherwood, T.K., Reed, C.E., "Applied Mathematics in Chemical Engineering," 1st Ed., pp. 268-270, McGraw-Hill Book Co., New York, 1939.
(34) Smith, E.R., J. Res. Natl. Bur. Standards 24, 224 (1940).
(35) Smith, L.B., et al., Proc. Am. Acad. Arts & Sci. 69, 137 (1934).
(36) Sondak, N.E., Thodos, G., A.I.Ch.E. Journal 2, 347 (1956).
(37) Stull, D.R., Ind. Eng. Chem. 39, 517 (1947).
(38) Thodos, G., Ind. Eng. Chem. 42, 1514 (1950).
(39) Waring, W., Ind. Eng. Chem. 46, 528 (1954).
(40) Williams, E.J., "Regression Analysis," John Wiley & Sons, New York, 1959.
(41) Willingham, C.B., et al., J. Res. Natl. Bur. Standards 35, 219 (1945).
(42) Worthing, A.G., Geffner, J., "Treatment of Experimental Data," John Wiley & Sons, New York, 1943.

APPENDIX III - BIBLIOGRAPHY

The following references contain information related to special and unusual problems which arise occasionally in regression analysis. The list is not meant to be exhaustive.

Durbin, J., J. Am. Stat. Assoc. 48, 799 (1953).
The accounting for extraneous information about coefficients in regression equations.

Ezekiel, M., Fox, K.A., "Methods of Correlation and Regression Analysis, Linear and Curvilinear," 3rd Ed., John Wiley & Sons, New York, 1960.
An excellent and modern basic treatment of correlation methods, regression analysis, and significance-testing.

Greyson, M., Cheasley, J., J. Petr. Ref. 38, No. 8, 135 (1959).
Special methods for fitting straight lines and parabolas, with descriptions of similar treatment for polynomials.

Keeping, E.S., Ann. Math. Stat. 22, 180 (1951).
Significance-testing of exponential regression equations.

Marquardt, D.W., Chem. Eng. Prog. 55, No. 6, 65 (1959).
Special techniques for regression analysis with non-linear models.

Opfell, J.B., Ind. Eng. Chem. 51, 226 (1959).
Simplified representation of weighting formulas for weighted least squares regression.

Rosen, J.B., J. Soc. Ind. Appl. Math. 8, 181 (1960).
Use of non-linear models in conventional linear programming methods.

Rosenbrock, H.H., Computer Journal 3, 175 (1960).
Use of gradient methods and partial least squares approximations with general optimization procedures.

Whittaker, S., Pigford, R.L., Ind. Eng. Chem. 52, 185 (1960).
Numerical differentiation through regression equations.

Wilkinson, G.N., Biometrics 14, 257 (1958).
Estimation of points missing from experimental observations.

APPENDIX IV - MATHEMATICAL DERIVATIONS

Fundamental recursion equations.— Consider a set of N observations on one dependent parameter Y and n different independent parameters X_h, 1 ≤ h ≤ n. The simple (Pearson) correlation coefficients between Y and any X_h are given (42) by

r(Y,X_h) = Σ_{i=1}^{N} [Y_i - Ȳ][(X_i)_h - X̄_h] / (N σ_Y σ_{X_h}),        (VIIIa)

in which

σ_Y² = (1/N) Σ_{i=1}^{N} Y_i² - Ȳ²,   σ_{X_h}² = (1/N) Σ_{i=1}^{N} (X_i)_h² - X̄_h²,        (VIIIb)

Ȳ = (1/N) Σ_{i=1}^{N} Y_i,   X̄_h = (1/N) Σ_{i=1}^{N} (X_i)_h.        (VIIIc)

The pseudo-variances are now defined:

V²_{YY} = N Σ Y_i² - (Σ Y_i)²,
V²_{X_hX_h} = N Σ (X_i)_h² - [Σ (X_i)_h]²,        (VIIId)
V²_{YX_h} = N Σ Y_i (X_i)_h - (Σ Y_i)[Σ (X_i)_h].

Substituting (VIIId) into (VIIIb) and (VIIIc) results in

σ_Y² = V²_{YY}/N²,   σ_{X_h}² = V²_{X_hX_h}/N².        (VIIIe)

Then substituting (VIIIc), (VIIId), and (VIIIe) into (VIIIa) leads to

r(Y,X_h) = V²_{YX_h} / (V_{YY} V_{X_hX_h}),        (VIIIf)

the correlation coefficient between Y and X_h.

During the k-th recursion stage of Phase II, the linear equation defined by the least squares regression of Y_{k-1} on X_h is given by

Y_{k-1} = a_{0k,h} + a_{kh} X_h

generally, or specifically for each observation i by

(Y_i)_{k-1} = a_{0k,h} + a_{kh}(X_i)_h.        (VIIIg)

The coefficients a_{0k,h} and a_{kh} are calculated as the solution to the following set of linear normal equations:

a_{0k,h} N + a_{kh} Σ (X_i)_h = Σ (Y_i)_{k-1},
a_{0k,h} Σ (X_i)_h + a_{kh} Σ (X_i)_h² = Σ (Y_i)_{k-1}(X_i)_h.

The solution to this system of equations may be expressed in the form

a_{kh} = V²_{Y_{k-1}X_h} / V²_{X_hX_h},   a_{0k,h} = Ȳ_{k-1} - a_{kh} X̄_h.        (VIIIh)

The error of approximation in (VIIIg) is

(ρ_i)_{kh} = (Y_i)_{k-1} - a_{0k,h} - a_{kh}(X_i)_h,

and the corresponding recursion-error fraction is

(ε_i)_{kh} = 1 - [a_{0k,h} + a_{kh}(X_i)_h] / (Y_i)_{k-1}.        (VIIIi)

If the residual error of approximation (ρ_i)_{kh} is squared and summed for all observations 1 ≤ i ≤ N, the result may be combined with (VIIIh) to produce

Σ_{i=1}^{N} (ρ_i)²_{kh} = (1/N)[V²_{Y_{k-1}Y_{k-1}} - (V²_{Y_{k-1}X_h})² / V²_{X_hX_h}],        (VIIIj)

which may be again combined with (VIIIf) to yield

Σ_{i=1}^{N} (ρ_i)²_{kh} = (V²_{Y_{k-1}Y_{k-1}} / N)[1 - r²(Y_{k-1},X_h)].        (VIIIk)
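The chain from (VIIId) to (VIIIk) is easily verified numerically. The following sketch (Python, with arbitrary synthetic data; an illustration, not part of the original apparatus) checks the pseudo-variance form of r and the residual sum-of-squares identity.

    # Numerical check of (VIIId) through (VIIIk) on arbitrary synthetic data.
    import numpy as np

    rng = np.random.default_rng(1)
    N = 25
    x = rng.normal(size=N)
    y = 2.0 + 0.5 * x + rng.normal(scale=0.1, size=N)

    Vyy = N * (y @ y) - y.sum() ** 2          # pseudo-variances, (VIIId)
    Vxx = N * (x @ x) - x.sum() ** 2
    Vyx = N * (y @ x) - y.sum() * x.sum()

    r = Vyx / np.sqrt(Vyy * Vxx)              # (VIIIf)
    assert np.isclose(r, np.corrcoef(y, x)[0, 1])

    a1 = Vyx / Vxx                            # (VIIIh)
    a0 = y.mean() - a1 * x.mean()
    rho = y - (a0 + a1 * x)                   # the residuals of (VIIIg)
    assert np.isclose((rho ** 2).sum(), (Vyy / N) * (1 - r ** 2))   # (VIIIk)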
Select now a particular X_h (often, that associated with the r-value of greatest magnitude) and denote it by (X_I)_k, the i-th observation by (X_{iI})_k. A new dependent parameter may now be written generally as

Y_k = Y_{k-1} - a_k(X_I)_k,

or specifically, for each observation i, as

(Y_i)_k = (Y_i)_{k-1} - a_k(X_{iI})_k.        (VIIIl)

The recursion equation is similar to (VIIIg), in which the subscript h has been dropped, the I-notation added:

Y_{k-1} = a_{0k} + a_k(X_I)_k,
(Y_i)_{k-1} = a_{0k} + a_k(X_{iI})_k.        (VIIIm)

For previous recursion stages (VIIIl) may be written

(Y_i)_{k-1} = (Y_i)_{k-2} - a_{k-1}(X_{iI})_{k-1},
(Y_i)_{k-2} = (Y_i)_{k-3} - a_{k-2}(X_{iI})_{k-2},
...
(Y_i)_1 = (Y_i)_0 - a_1(X_{iI})_1.

Adding all of these equations to (VIIIl) and simplifying,

(Y_i)_k = (Y_i)_0 - Σ_{s=1}^{k} a_s(X_{iI})_s.        (VIIIn)

The expanded predictor equation at recursion stage k may now be written as a combination of (VIIIm) and (VIIIn):

(Y_i)_0 - Σ_{s=1}^{k-1} a_s(X_{iI})_s = a_{0k} + a_k(X_{iI})_k,

or, by rearranging terms, as

(Y_i)_0 = a_{0k} + Σ_{s=1}^{k} a_s(X_{iI})_s.        (VIIIo)

The total- or over-all-error fraction is defined from equation (VIIIo) as

(E_i)_k = {(Y_i)_0 - [a_{0k} + Σ_{s=1}^{k} a_s(X_{iI})_s]} / (Y_i)_0,

which may be combined with (VIIIn) to form

(E_i)_k = [-a_{0k} + (Y_i)_k] / (Y_i)_0.        (VIIIp)

We now write the recursion form of the recursion-error fraction, equation (VIIIi), without the h-subscript:

(ε_i)_k = 1 - [a_{0k} + a_k(X_{iI})_k] / (Y_i)_{k-1}.        (VIIIq)

Combining (VIIIq) with (VIIIl) leads to

(ε_i)_k = [-a_{0k} + (Y_i)_k] / (Y_i)_{k-1}.        (VIIIr)

Comparison of (VIIIp) and (VIIIr) leads immediately to

(E_i)_k / (ε_i)_k = (Y_i)_{k-1} / (Y_i)_0.        (VIIIs)

For all observations 1 ≤ i ≤ N the mean error fractions are:

ε̄_k = (1/N) Σ_{i=1}^{N} (ε_i)_k = (1/N) Σ_{i=1}^{N} [-a_{0k} + (Y_i)_k] / (Y_i)_{k-1},        (VIIIt)

Ē_k = (1/N) Σ_{i=1}^{N} (E_i)_k = (1/N) Σ_{i=1}^{N} [-a_{0k} + (Y_i)_k] / (Y_i)_0.        (VIIIu)

Term-by-term introduction sequence.— Proof will be given that the recursion coefficients are not invariant as the term-by-term introduction sequence is changed. Consider N observations on each of three parameters, one (Y_0) dependent and two (X_1, X_2) independent. No parameter is to be constant for all observations, and the two independent parameters are to be mutually independent.

Case A.
Y_0 = a_{01} + a_1 X_1,
Y_1 = Y_0 - a_1 X_1,
Y_1 = a_{02} + a_2 X_2.

The pseudo-variance between Y_1 and X_2 is

V²_{Y_1X_2} = N Σ Y_1 X_2 - Σ Y_1 Σ X_2
            = N Σ Y_0 X_2 - a_1 N Σ X_1 X_2 - Σ Y_0 Σ X_2 + a_1 Σ X_1 Σ X_2
            = V²_{Y_0X_2} - a_1 V²_{X_1X_2}
            = V²_{Y_0X_2} - (V²_{Y_0X_1} / V²_{X_1X_1}) V²_{X_1X_2},

so that

a_2 = V²_{Y_1X_2} / V²_{X_2X_2} = V²_{Y_0X_2} / V²_{X_2X_2} - V²_{Y_0X_1} V²_{X_1X_2} / (V²_{X_1X_1} V²_{X_2X_2}).

Case B.
Y_0 = a'_{01} + a'_1 X_2,
Y'_1 = Y_0 - a'_1 X_2,
Y'_1 = a'_{02} + a'_2 X_1,

with

a'_1 = V²_{Y_0X_2} / V²_{X_2X_2}.

The final predictor equations for the two cases are:

Case A.  Y_0 = a_{02} + a_1 X_1 + a_2 X_2,
Case B.  Y_0 = a'_{02} + a'_2 X_1 + a'_1 X_2.

A comparison of the coefficients of X_2 leads immediately to

a'_1 - a_2 = V²_{Y_0X_1} V²_{X_1X_2} / (V²_{X_1X_1} V²_{X_2X_2}).

Since X_1 and X_2 are not separately constant for all observations, the pseudo-variances V²_{X_1X_1} and V²_{X_2X_2} are not zero. Then a_2 = a'_1 only if either V²_{Y_0X_1} or V²_{X_1X_2} is zero. The two coefficients of X_2 are thus distinct unless (a) Y_0 and X_1 are totally uncorrelated, or (b) X_1 and X_2 are totally uncorrelated, or both. The probability of such zero correlations, for reasonably accurate arithmetic procedures, is infinitesimal.

APPENDIX V - STULL'S VAPOR PRESSURE DATA

Below are tabulated the data of Stull (37), as used in the vapor pressure example. The two errors (see page 85) are identified by asterisks (*) to the left of the temperatures. Pressures of 1 to 400 are in mm Hg; from "1 atm" onward they are in atmospheres.

n_c    P           T, °C
1      100 mm Hg   …
1      200         -175.5
1      400         -168.8
1      1 atm       -161.5
1      2           -152.3
1      5           -138.3
1      10          -124.8
1      20          -108.5
1      30          -96.3
1      40          -86.3
1      45.8        -82.1
2      1 mm Hg     -159.5
2      5           -148.5
2      10          -142.9
2      20          -136.7
2      40          -129.8
2      60          -125.4
2      100         -119.3
2      200         -110.2
2      400         -99.7
2      1 atm       -88.6
2      2           -75.0
2      5           -52.8
2      10          -32.0
2      20          -6.4
2      30          +10.0
2      40          23.6
2      48.2        32.3

3      1 mm Hg     -128.9
3      5           -115.4
3      10          -108.5
3      20          -100.9
3      40          -92.4
3      60          -87.0
3      100         -79.6
3      200         -68.4
3      400         -55.6
3      1 atm       -42.1
3      2           -25.6
3      5           +1.4
3      10          26.9
3      20          58.1
3      30          78.7
3      40          94.8
3      42.0        96.8

4      1 mm Hg     -101.5
4      5           -85.7
4      10          -77.8
4      20          -68.9
4      40          -59.1
4      60          -52.8
4      100         -44.2
4      200         -31.2
4      400         -16.3
4      1 atm       -0.5
4      2           +18.8
4      5           50.0
4      10          79.5
4      20          116.0
4      30          140.6
4      36.0        152.8

5      1 mm Hg     -76.6
5      5           *-62.5
5      10          -50.1
5      20          -40.2
5      40          -29.2
5      60          -22.2
5      100         -12.6

[The remaining entries for n_c = 5 and all entries for n_c = 6 through 29 (pp. 180-182 of the original) are not legible in this copy.]
APPENDIX VI - PARAMETERS OF THE INITIAL X-MATRIX

101: log_e P_R (= Y_0)      201: …                        301: log_e T_R                 401: …                       501: e^{-T_R}
102: T_R^{1/2}              202: sinh(T_R)                302: cosh(T_R)                 402: tanh(T_R)               502: T_R^{-1}
103: T_R                    203: T_R^2                    303: T_R^3                     403: T_R^5                   503: T_R log_e T_R
104: T_R e^{T_R}            204: T_R e^{-T_R}             304: T_R^{3/2}                 404: T_R sinh(T_R)           504: T_R cosh(T_R)
105: T_R tanh(T_R)          205: e^{T_R} log_e T_R        305: e^{-T_R} log_e T_R        405: T_R^{1/2} log_e T_R     505: (log_e T_R) sinh(T_R)
106: (log_e T_R) cosh(T_R)  206: (log_e T_R) tanh(T_R)    306: T_R^{-1} log_e T_R        406: T_R^2 log_e T_R         506: T_R^{1/2} e^{T_R}
107: e^{T_R} sinh(T_R)      207: e^{T_R} cosh(T_R)        307: e^{T_R} tanh(T_R)         407: T_R^{-1} e^{T_R}        507: T_R^2 e^{T_R}
108: T_R^{-2} e^{T_R}       208: T_R^{1/2} e^{-T_R}       308: e^{-T_R} sinh(T_R)        408: e^{-T_R} cosh(T_R)      508: e^{-T_R} tanh(T_R)
109: T_R^{-1} e^{-T_R}      209: …                        309: T_R^{-2} e^{-T_R}         409: T_R^{1/2} sinh(T_R)     509: T_R^{1/2} cosh(T_R)
110: T_R^{1/2} tanh(T_R)    210: T_R^{-1/2}               310: …                         410: …                       510: …
111: T_R (log_e T_R) sinh(T_R)   211: T_R (log_e T_R) cosh(T_R)   311: T_R (log_e T_R) tanh(T_R)   411: …          511: T_R e^{T_R} sinh(T_R)
112: T_R e^{T_R} cosh(T_R)  212: T_R e^{T_R} tanh(T_R)    312: …                         412: T_R e^{-T_R} sinh(T_R)  512: T_R e^{-T_R} cosh(T_R)
113: T_R e^{-T_R} tanh(T_R) 213: T_R^{3/2} sinh(T_R)      313: T_R^{3/2} cosh(T_R)       413: T_R^{3/2} tanh(T_R)     513: …
114: T_R^{-3/2}             214: T_R^2 sinh(T_R)          314: T_R^2 cosh(T_R)           414: T_R^2 tanh(T_R)         514: …
115: T_R^{1/3}              215: T_R^{-1/3}               315: T_R^{2/3}                 415: T_R^{-2/3}              515: …

[The remaining definitions, through entry 523 (pp. 184-186 of the original), are not legible in this copy apart from the three entries given below.]
Tr sinh(TR) 509: TR//2co sh ( Tr) 110: 1 /p Trx tanh(TR) 210: m-1 /2 R 310: m TselRlogeTH 410: V ‘ l06eTn 510: TR/2l0Ee TS 111 • ’ T^-OgeTR)slnh(TR) 211: TR(logeTR)cosh(TR) 311: TR(logeTR)tanh(TR) 411 : T ^ 2eTR K ' T 511: TRe Rsinh(TR) 112: TRe cosh(TR) T 212: TRe Rtanh(TR) 312-: T ^ 2e Tr -T 412: T e Rsinh(Tn) sx K -T 512: TRe Rcosh(TR) -T 113: T_e Rtanh(T„) X l J -u 213: T^'/2sinh(T ) a K R /P 313: Tr cosh(TR) 413: TR/2 tanh(TR) 513: TR2log0TR 114: m-3/2 XR 214: T2sinh(Tr) 314: P TRcosh(Tr) 414: T2tanh(TR) 514: TR2sinh(TR) 115: t1/3 R 21 5: m-1/3 R 315: rp2/3 XR 415: t-2/3 XR in 03 P i * " ~Pi P i P i E H E h E H p i E h « . _ w v _ - > P i E H C U P i 4 i Si . c: E H c u to P = I Eh S 3 C O p i C U to o E H 1 •H o ei bD o 1 — 1 C U C U C O o ■p o r H K M K M K M K M K M K M i — ! C M ~v \ \ V \ C M C M C M C M C M C M C M K M 1 P i 1 P i i P i 1 P i i P i I P i KM Ph 1 P ! E i E h Eh Eh E H E H E h E H • » • • • • • • «• • • . . C M C M O O o O O « “ C M C M C M C M C M C M liM » — C M K M -=!• in cd C U C X I P i £ - > I C M C \ J C U k m KM i e r f i r t f E-t Eh CM KM Pi PJ P i Pi Pi Eh EH E h E h EH _- ■ '— ' — ' C D C U Pi A u i s i bD P i bO Pi E h Pi C O p i O E h o EH 1 •H o c3 i—1 C P H C U < D C O o +3 K M K M K M K M K M K M K M K M \ \ \ \ » — T — vo pi t- pi - P i t - Pi i- Pi Pi t- P i 1 P i i Pi Eh EH EH E h EH EH EH E h EH • • • • • • • • • • • • • • • • • • in M3 M3 M3 VO vo C' - K - p - r _ ,— i — T — T — T — r — in CM K M H f in H — C M K M Ph E h » C U KM t P3 Ei K- nt Pi Pi Pi Eh E h E h Pi - — * ■ — ^ ■w* Eh s i J m s i C D Pi s i C O s i bO Pi EH •H o cJ o E h I C O o - p p —1 C U cu C M C M C M C M C M C M \ \ \ \ KM KM KM i — i— t — I Pi 1 Pi 1 Pi i Pi i Pi i Pi E h E h E h E h Eh EH • • . . . . . . , _ C M C M C M C M Cv, C M C M C M C M C M Hj- in — C M f*M H t P 3 Eh £ T ~ t c o C M C M 0J UN P i E - f Ui CO o o C M P i e h .d s i a -p C M I P h • P h Eh Eh K M K M C M C M « - C M P i R R E 1 e h eh -_^ --v — / S i u i ,s 3 s i c o si •H O c 3 C O O -P K M KM- K M \ T p i T P 3 T P i E h EH E h • • • • e ' 00 C O en ! " Z C M Pi E h (U Pi bD Pi eh o e h 1 i —l C U C U K M KM K M \ \ \ C M Pi C M Pi C M Pi Eh &h Eh • • co •• • • 03 C O K M ■5- M M Pi Pi Pi Eh E H E H V__* ^3 ^3 s i s i C O s i o a C O o -p K M K M K M \ C M Pi C M Pi C M Pi E H E H Eh O N • • O N • « C M V— C M K M 186 323: (logeTR)2 423: T~6 T» 523: (Tr) k The three parameters on this page are the "random" ones added to complete the file of IBM data cards (see page 82). APPENDIX VII - SAMPLE OP X-MATRIX Numerical Values for Methane, n =1 c i (pR>i X107 X207 Y 307 X407 X507 0 0.0025195357 O.47451062 0.7915901 1.7915899 0.7101320 3.3871263 ■ 0.3618839 1 0.0028729027 0.48026797 0.8065482 1.3065431 0.7217030 3.3653448 0.3728593 2 0.0057458055 0.51114833 0.8897856 1.8897855 0.7849857 3.2616844 0.4355949 3 0.011491611 0.54621584 0.9907577 1.9907575 0.8593451 3.1612164 0.5151659 4 0.021834061 0.58442374 1.1091406 2.1091405 0.9433940 3.0696166 0.6127230 5 0.043668122 0.63257615 1.2718161 2.27181 60 1.0538419 2.9758531 0.7532687 6 0.10917030 0.705851 56 1.5514685 2.5514685 1.2316864 2.8696336 1.0091928 7 0.21834061 0.77650999 1.8628601 2.8628601 1.4145364 2.7995417 1.3107748 S 0.43668122 0.86132351 2.3024663 3.3024662 1.6505933 2.7470517 1.7584169 9 0.65502183 0.92567779 2.6842230 3.6842231 1 .8386091 2.7261949 2.1624020 10 0.87336244 0.97301737 3.0356158 4.0356157 2.0002512 2.7189483 2.5435523 11 1.0 1 .0 3.1945277 4.1945279 2.0702274 2.7182318 2.7182813 ■ X1 08 X208 > < ! 
i    (P_R)_i         (T_R)_i       X108        X208        X308        X408        X508
0    0.0025195357    0.47451062    7.1381462   0.4285935   0.3064401   0.6935599   0.2749061
1    0.0028729027    0.48026797    7.0082641   0.4287105   0.3086561   0.6913438   0.2761869
2    0.0057458055    0.51114833    6.3810913   0.4288294   0.3201161   0.6798839   0.2824125
3    0.011491611     0.54621584    5.7374857   0.4280196   0.3323000   0.6676999   0.2882243
4    0.021834061     0.58442374    5.2523818   0.4261396   0.3446376   0.6553624   0.2931360
5    0.043668122     0.63257615    4.7043395   0.4225052   0.3589013   0.6410987   0.2973903
6    0.10917030      0.70585156    4.0655622   0.4147713   0.3781361   0.6218639   0.3001963
7    0.21834061      0.77650999    3.6052872   0.4053587   0.3941960   0.6058040   0.2993272
8    0.43668122      0.86132351    3.1874373   0.3921245   0.4107929   0.5892071   0.2944894

APPENDIX VIII - SAMPLE OF Z-MATRIX

n_c   …             …             Z108        Z208        Z308        Z408        Z508
1     0.47440053    0.02183406    2.107923    0.2250559   4.44334     0.688767    1.451869
2     0.29441424    0.02074689    6.793150    0.1733595   23.07344    1.085199    3.685959
3     0.23253676    0.02380952    12.901185   0.1622200   55.48020    1.446662    6.221218
4     0.32428625    0.02777778    12.334781   0.4206463   33.03671    2.277845    7.924181
5     0.30494131    0.03030303    16.396597   0.4649460   55.76968    2.761075    9.054445
6     0.35009450    0.03378378    17.133229   0.7353969   48.95315    3.550127    10.140481
7     0.33805008    0.03717472    20.706990   0.7999450   61.25421    4.069945    12.039473
8     0.37996206    0.04048583    21.054733   1.1549692   55.41273    4.931235    12.978360
9     0.36869874    0.04444444    24.410172   1.2234488   66.20628    5.464351    14.821994
10    0.39317095    0.04807692    25.434229   1.5458339   64.69000    6.270335    15.948112
11    0.38728251    0.05181347    23.403037   1.6498651   73.33932    6.845523    17.675784
12    0.40041329    0.05714286    29.969035   1.9239696   74.84526    7.593386    18.963871
13    0.39630597    0.05952381    32.802936   2.0417594   82.77174    8.183869    20.650377
14    0.40439598    0.06369427    34.619531   2.2895954   85.60800    8.902899    22.015298
15    0.40123579    0.06756757    37.384501   2.4148522   93.17340    9.501476    23.680530
16    0.40529797    0.07142857    39.477123   2.6232630   97.40273    10.186082   25.132329
17    0.40345561    0.07518797    42.135985   2.7671991   104.43772   10.798986   26.763999
18    0.40385182    0.07936503    44.570802   2.9357332   110.36424   11.438880   28.324449
19    0.40437248    0.08333333    46.928354   3.1145126   115.90897   12.089622   29.860318
22    0.40558279    0.09433962    54.242931   3.6139425   133.74071   14.010784   34.544818
23    0.40424888    0.09708738    56.895642   3.7585944   140.74409   14.623530   36.174572
26    0.40252954    0.10752688    64.591532   4.2127307   160.46407   16.495755   40.980235
29    0.40103794    0.11494252    72.312353   4.6641111   180.31301   18.364991   45.793647

APPENDIX IX - THE ANTOINE PARAMETER

As a preliminary example of the trial-and-error calculations which may be required in Phase I, consider the primary historical formula

log_e P_R = A + B / (C + T_R),

also known as the Antoine equation. This equation can be rearranged as

log_e P_R = (A + B/C) + (A/C) T_R + (-C^{-1}) T_R log_e P_R = A' + B' T_R + C' T_R log_e P_R,

in which C' = -C^{-1}. The coefficients were determined by standard multivariate regression analysis for all species of the vapor pressure example. A plot of coefficient C against the number of carbon atoms in the hydrocarbon chain (n_c) appears as Figure 21. The scatter of these results over the range 10 < n_c < 23 destroys any reasonable hope of attributing the coefficient C. A first recursion stage, log_e P_R = Y_0 versus (C + T_R)^{-1}, was thus performed on the computed, rather than the attributed, coefficients. The results of this recursion are shown in Table 9. The statistics in Table 9, for the parameter (C + T_R)^{-1}, may be compared at this first recursion stage with those for the X-parameter X_{109} = T_R^{-1} e^{-T_R} (Table 3).
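The Antoine linearization above translates directly into a two-step calculation per species: one multivariate solve for A', B', C' (hence C = -1/C'), then a simple regression of Y_0 on (C + T_R)^{-1}. A Python sketch follows, with illustrative values assumed in place of Stull's data.

    # Sketch of the Appendix IX per-species calculation; tr and y0 are assumed.
    import numpy as np

    tr = np.array([0.47, 0.55, 0.63, 0.72, 0.81, 0.90, 1.00])
    y0 = np.log(np.array([2.5e-3, 1.1e-2, 4.4e-2, 1.4e-1, 3.4e-1, 6.6e-1, 1.0]))

    # log_e P_R = A' + B' T_R + C' (T_R log_e P_R), with C' = -1/C
    design = np.column_stack([np.ones_like(tr), tr, tr * y0])
    (ap, bp, cp), *_ = np.linalg.lstsq(design, y0, rcond=None)
    C = -1.0 / cp

    xi = 1.0 / (C + tr)                      # the Antoine recursion parameter
    r = np.corrcoef(y0, xi)[0, 1]
    B = ((y0 * xi).mean() - y0.mean() * xi.mean()) / xi.var()
    A = y0.mean() - B * xi.mean()
    print(C, r, A, B)                        # cf. the per-species rows of Table 9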
The average recurslon-error fractions for the Antoine 190 0.05 - 0.10 c -0.15 - 0.20 -0.25 5 10 15 20 25 n , Number of Carbon Atoms in Chain Fig. 21-Non-Linear Coefficient, Antoine Coefficient 191 TABLE 9 BASIC STATISTICS FOR THE ANTOINE PARAMETER logePR “ A + C + Tw ft nc C r* A B 161 l^lmax 1 0.04184- O.99996- 4.91932 4.73002- 0.0155 0.0445 2 0.04739- 0.99999- 5.37769 5.14402- 0.0088 0.0256 3 0.06156- 0.99999- 5. 54050 5.22063- 0.0073 0.0223 4 0.07532- 0.99999- 5.60239 5.18308- 0.0033 0.0089 5 0.08171 - 0.99999- 5.83129 5.36633- 0.0085 0.0334 6 0.09133- 0.99999- 5.94044 5.42312- 0.0120 0.0327 7 0.09011 - 0.99999- 6.23507 5.70349- 0.0125 0.0327 8 0.10224- 0.99999- 6.32486 5.70217- 0.0109 0.0267 9 0.11017- 0.99997- 6.41319 5.71436- 0.0154 0.0302 10 0.11533- 0.99999- 5.92749 5.24371- 0.0093 0.0198 1 1 0.03415- 0.99996- 7.20833 6.60512- 0.0172 0.0409 12 0.12038- 0.99989- 6.78123 6.00720- 0.0339 0.0823 13 0.08142- 0.99992- 7.74301 7.12142- 0.0244 0.0712 14 0.11107- 0.99997- 7.55073 6.71457- 0.0138 0.0370 15 0.13928- 0.99997- 7.28699 6.28045- 0.0180 0.0313 16 0.15484- 0.99993- 7.24127 6.13610- 0.0249 0.0534 17 0.12515- 0.99998- 7.89532 6.91572- 0.0145 0.0268 18 0.07896- 0.99998- 8.67593 7.99254- 0.0125 0.0323 19 0.11147- 0.99995- 8.48476 7.54618- 0.0219 0.0395 22 0.31838+ 0.99815- 17.51856 22.76648- 0.1244 0.3555 23 0.18767- 0.99929- 7.61307 6.22011 - 0.0737 0.1816 26 0.21724- 0.99935- 7.85601 6.17406- 0.0688 0.1681 29 0.29289- 0.99858- 7.02462 5.00473- 0.1042 0.2378 r* = r[logePR,(C+Tr)"1] 192 -1 -tr parameter are about half those for e , up to about nc=23. For higher members of the series, the difference Is much less pronounced. Basically, however, and as regards accuracy alone, the Antoine parameter is distinctly - 1 superior to e . Unfortunately, the difficulties inherent in a Phase III analysis (attributing the coeffi cients) outweigh the accuracy advantages. The attributing -T "1 R of the recursion coefficient for T^ e is, as shown in Figure 11, page 112, much more promising. A separate and isolated study of the Antoine equation might well be worth while, however. With appropriate allowance for Inaccurate data, and with a more extensive analysis of chemical parameters Z^, the first order recursion equation may become more accurate by a factor of two. The possibility of developing a very accurate four-constant vapor pressure equation thus is decidedly attractive. 193 APPENDIX X - COMPUTER FLOW CHART The flow chart that follows is the computational schematic for the basic statistics of recursive linear regression. Input-output programming is not scheduled, as such plans must account for the particular computing machinery used. The output format should, however, dupli cate the input format so that subsequent recursion stages are made operable from the same types of records. Floating point arithmetic is presumed for all statistical (non-logic) calculations, the precision depending on the word size of computer storage. Addres sable locations are identified in the flow chart as L(x), which symbol represents the numerical cell number in memory. The contents of L(x), in n-ary code, are denoted C (x). 1 94 INITIALIZE: X-,Y-,T-bands (N+9 cells each) to zero 1=1, j= N T= r T=Ns= j T=0 r2 SUM( X,X »XY)=V (XX,XY) =r (Y, X)=aQ=a^ =0 DEFINE: N (observation 1-count), S (number of strong- correlation sets desired as output ) end iJNrur OUTPUT entire T-band run yes Add 1 to Nm Store in L(i) of Y-band yes store X. in L(i) of X-band Stop no Is ^=JT? Is N=Nt? 
yes Is no yes a A yes r(Y.X) r Reset X-band to zero Set • - 3 I I o and i l p p — J Calculate SUM(X,X ,XY), ^XX*^XY»a0*a1 * and store in L(N+1 ) to L(N+9) of X-band Calculate SUK(Y,Y2), Y,Vyy> an< * store in L(N+1 ) to L(N+4) of Y-band a Set s=1 Is r(Y,X) C s(N+1 ) of T-band? Add1 to s yes Set t=2 no no Is s=S? l y e s Stop Add 1 to u Store C (t-1)(N+9)-u in L t(N+9)-u , both of T-band no Is U' =N+9? > yes Is t=s? no Add 1 to t yes Set u=0 Store all (N+9) cells of X-band Store in L s(N+9)+1 to L (s+1)(N+9) ----- > C S(N+9) of T-band in L(rT) 196 APPENDIX XI - OVER-ALL-ERROR FRACTIONS FOR RAN DATA YflTH SECOND RECURSION EQUATION Ref. nc ^obs ^calc E2 ( 6 l 3 0.003947 0.002633 0.3328 6 ) 3 0.007893 0 .0 0 5 112 0.3525 ( 6 ) 3 0.019737 0.012875 0.3476 (6 3 0.039^74 0.028264 0 .2840 6 3 0.065789 0.050942 0.2257 6 3 0.131579 0.110372 0.1612 ( 6 ) 3 0.197368 0.170330 0.1370 ( 6 ) 3 0.263158 0.235793 0.1040 6 3 0.394737 0.379960 0.3743 6 3 0.526316 0.503688 0.4299 ( 6 ) 3 0.657895 0.635255 0.3441 ( 6 ) 3 0.789474 0.766585 0.2899 ( 6 ) 3 0.855263 0.835896 0.2264 ( 6 ) 3 0.921053 0.897371 0.2571 6 ) 3 0.960526 0.940301 0.2106 ( 6 ) 3 1.000000 0.984838 0.1516 0 . 1358= IE^ ( 16) 3 0.015276 0.015860 0 .0 3 8 2- ( 16) 3 0.028947 0.028857 0 .0 3 14- ( 16) 3 0.049197 0.050807 0 .0 3 2 7- ( 16 ) 3 0.086105 0.089162 0 .0355- ( 16) 3 0.146092 0.152071 0 .0 4 0 9- ( 16) 3 0.236737 0.247997 0 .0 4 7 6- 16 3 0.353066 0.372612 0 .0 5 5 4- ( 16) 3 0.462711 0.490696 0 .0 6 0 5- ( 16) 3 0.604118 0.644368 0 . 0666- ( 16) 3 0.761697 0.817024 O .0726- ( 16) 3 0.982882 0.972652 0 .0 7 7 3- ( 16) 3 1 .016724 1.097828 0 .0 7 98- 0.0532a life ( 2 ) 4 0.013026 0.011956 0.0821 ( 2 ) 4 0.047711 0.044470 0.0679 (2 4 0.112618 0.105832 0.0603 ( 2 ) 4 0.191553 0.182220 0.0487 2 ) 4 0.328329 0.317035 0.0344 ( 2 ) 4 0.510158 0.501320 0.0173 Pressures in this table are expressed in atmospheres* 197 Ref. nc ?obs ■^calc E2 (2) 4 0.662289 0.655107 0.0108 (2) 4 0.796013 0.792272 0.0041 (2) 4 0.917276 0.918377 0.0012- (2) 4 0.976579 0.980064 0.0036- (2) 4 1.005921 1.010681 0.0047- 0.0305= il^ (6) 4 0.001316 0.001555 0.1815- (6) 4 0.003947 0.003070 0,2222 !6I 4 0.009211 0.006411 0.3039 (6) 4 0.019737 0.014532 0.2637 (6) 4 0.040789 0.031457 0.2288 (6) 4 0.065790 0.051399 0.2187 (6) 4 0.131579 0.101740 0.2268 (6) 4 0.197368 0.154373 0.2178 (6) 4 0.263158 0.210587 0.1997 (6) 4 0.394737 0.337991 0.1438 S§ ' 4 0.526316 0.474926 0.0976 (6) 4 0.657895 0.615250 0.0648 (6) 4 0.789474 0.764542 0.0316 !6} 4 0.855263 0.837267 0.0210 (6 4 0.921053 0.915394 0.0061 (6) 4 0.960526 0.964210 0.0038- (6) 4 1.000000 1.011079 0.0111- 0.1437= iEfe (23) 5 0.004053 0.003969 0.0205 (23) 5 0.025829 0.015253 0.0364 (23) 5 0.027816 0.026861 0.0343 (23) 5 0.058737 0.057123 0.0275 23) 5 0.102053 0.100130 0.0188 (23) 5 0.151408 0.149671 0.0115 (23) 5 0.216461 0.215631 0.0038 (23) 5 0.283908 0.284710 0.0028- 23) . 5 0.368158 0.371483 0.0090- 23) 5 0.453461 0.459905 0.0142- 23 5 0.530697 0.540928 0.0193- 23) 5 0.597474 0,611295 0.0231- (23) 5 0.673210 0.691079 0.0265- 0.0191= i E j L j 198 Ref. 
(2)     4    0.662289    0.655107    0.0108
(2)     4    0.796013    0.792272    0.0041
(2)     4    0.917276    0.918377   -0.0012
(2)     4    0.976579    0.980064   -0.0036
(2)     4    1.005921    1.010681   -0.0047
                                     0.0305 = |Ē₂|

(6)     4    0.001316    0.001555   -0.1815
(6)     4    0.003947    0.003070    0.2222
(6)     4    0.009211    0.006411    0.3039
(6)     4    0.019737    0.014532    0.2637
(6)     4    0.040789    0.031457    0.2288
(6)     4    0.065790    0.051399    0.2187
(6)     4    0.131579    0.101740    0.2268
(6)     4    0.197368    0.154373    0.2178
(6)     4    0.263158    0.210587    0.1997
(6)     4    0.394737    0.337991    0.1438
(6)     4    0.526316    0.474926    0.0976
(6)     4    0.657895    0.615250    0.0648
(6)     4    0.789474    0.764542    0.0316
(6)     4    0.855263    0.837267    0.0210
(6)     4    0.921053    0.915394    0.0061
(6)     4    0.960526    0.964210   -0.0038
(6)     4    1.000000    1.011079   -0.0111
                                     0.1437 = |Ē₂|

(23)    5    0.004053    0.003969    0.0205
(23)    5    0.015829    0.015253    0.0364
(23)    5    0.027816    0.026861    0.0343
(23)    5    0.058737    0.057123    0.0275
(23)    5    0.102053    0.100130    0.0188
(23)    5    0.151408    0.149671    0.0115
(23)    5    0.216461    0.215631    0.0038
(23)    5    0.283908    0.284710   -0.0028
(23)    5    0.368158    0.371483   -0.0090
(23)    5    0.453461    0.459905   -0.0142
(23)    5    0.530697    0.540928   -0.0193
(23)    5    0.597474    0.611295   -0.0231
(23)    5    0.673210    0.691079   -0.0265
                                     0.0191 = |Ē₂|

(34)    7    0.121737    0.125018   -0.0270
(34)    7    0.155342    0.159287   -0.0254
(34)    7    0.196579    0.202159   -0.0284
(34)    7    0.246803    0.254731   -0.0321
(34)    7    0.307526    0.318757   -0.0365
(34)    7    0.380434    0.390423   -0.0263
(34)    7    0.467393    0.489271   -0.0468
(34)    7    0.570474    0.600455   -0.0526
(34)    7    0.691921    0.732485   -0.0586
(34)    7    0.834197    0.888344   -0.0649
(34)    7    1.000000    1.071340   -0.0713
(34)    7    1.192184    1.285010   -0.0779
(34)    7    1.413921    1.533242   -0.0844
(34)    7    1.668461    1.820135   -0.0909
(34)    7    1.959395    2.150157   -0.0974
(34)    7    2.290487    2.527952   -0.1037
                                     0.0578 = |Ē₂|

(20)    8    0.001842    0.002247   -0.2199
(20)    8    0.003053    0.003486   -0.1418
(20)    8    0.004803    0.005430   -0.1306
                                     0.1641 = |Ē₂|

(18)    9    0.014474    0.014350    0.0086
(18)    9    0.019737    0.018702    0.0524
(18)    9    0.039474    0.038383    0.0276
(18)    9    0.065789    0.063349    0.0371
(18)    9    0.131579    0.123612    0.0606
(18)    9    1.000000    1.011272   -0.0113
                                     0.0329 = |Ē₂|

(20)   10    0.000217    0.000266   -0.2243
(20)   10    0.000276    0.000365   -0.3194
(20)   10    0.000303    0.000373   -0.2330
(20)   10    0.000618    0.000681   -0.1014
                                     0.2195 = |Ē₂|

(18)   10    0.014474    0.013432    0.0720
(18)   10    0.019737    0.018209    0.0774
(18)   10    0.039474    0.037920    0.0394
(18)   10    0.065789    0.062508    0.0499
(18)   10    0.131579    0.123099    0.0644
(18)   10    1.000000    0.994162    0.0058
                                     0.0515 = |Ē₂|

(18)   11    0.014474    0.013639    0.0577
(18)   11    0.019737    0.019146    0.0300
(18)   11    0.039474    0.038590    0.0224
(18)   11    0.065789    0.063597    0.0333
(18)   11    0.131579    0.128736    0.0216
(18)   11    1.000000    0.997219    0.0028
                                     0.0280 = |Ē₂|

(18)   12    0.014474    0.013442    0.0712
(18)   12    0.019737    0.019057    0.0344
(18)   12    0.039474    0.037866    0.0407
(18)   12    0.065789    0.061753    0.0614
(18)   12    0.131579    0.126363    0.0396
(18)   12    1.000000    0.953605    0.0464
                                     0.0490 = |Ē₂|

(18)   13    0.014474    0.014934   -0.0320
(18)   13    0.019737    0.020880   -0.0579
(18)   13    0.039474    0.040767   -0.0328
(18)   13    0.065789    0.066082   -0.0044
(18)   13    0.131579    0.134221   -0.0201
(18)   13    1.000000    1.013875   -0.0139
                                     0.0268 = |Ē₂|

(18)   14    0.014474    0.015656   -0.0817
(18)   14    0.019737    0.021175   -0.0729
(18)   14    0.039474    0.040492   -0.0258
(18)   14    0.065789    0.064739    0.0160
(18)   14    0.131579    0.131251    0.0025
(18)   14    1.000000    1.006036   -0.0060
                                     0.0341 = |Ē₂|

(18)   15    0.014474    0.015505   -0.0712
(18)   15    0.019737    0.020802   -0.0540
(18)   15    0.039474    0.039146    0.0083
(18)   15    0.065789    0.063042    0.0418
(18)   15    0.131579    0.127834    0.0285
(18)   15    1.000000    0.998398    0.0016
                                     0.0342 = |Ē₂|

(18)   16    0.014474    0.015761   -0.0889
(18)   16    0.019737    0.020570   -0.0422
(18)   16    0.039474    0.038903    0.0145
(18)   16    0.065789    0.063086    0.0411
(18)   16    0.131579    0.125952    0.0428
(18)   16    1.000000    1.004891   -0.0049
                                     0.0391 = |Ē₂|

(18)   17    0.014474    0.015306   -0.0575
(18)   17    0.019737    0.019884   -0.0074
(18)   17    0.039474    0.038603    0.0221
(18)   17    0.065789    0.063090    0.0410
(18)   17    0.131579    0.126215    0.0408
(18)   17    1.000000    0.994290    0.0057
                                     0.0291 = |Ē₂|

(18)   18    0.014474    0.013992    0.0333
(18)   18    0.019737    0.018494    0.0629
(18)   18    0.039474    0.036975    0.0633
(18)   18    0.065789    0.061060    0.0719
(18)   18    0.131579    0.121020    0.0803
(18)   18    1.000000    0.955495    0.0445
                                     0.0594 = |Ē₂|

(18)   19    0.014474    0.014449    0.0017
(18)   19    0.019737    0.019388    0.0177
(18)   19    0.039474    0.039008    0.0118
(18)   19    0.065789    0.063866    0.0292
(18)   19    0.131579    0.125229    0.0483
(18)   19    1.000000    0.983825    0.0162
                                     0.0208 = |Ē₂|

(18)   20    0.019737    0.019698    0.0020 = |Ē₂|
(18)   21    0.019737    0.018658    0.0547 = |Ē₂|
(18)   22    0.019737    0.017677    0.1044 = |Ē₂|
(18)   23    0.019737    0.016476    0.1652 = |Ē₂|
(18)   24    0.019737    0.015439    0.2178 = |Ē₂|
(18)   27    0.019737    0.014844    0.2479 = |Ē₂|

Grand average error fraction (by equation VIr): |Ē₂| = 0.0808
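For each observation the tabulated fraction evidently reduces to E_2 = (P_obs - P_calc)/P_obs, the pressure form of the over-all-error fraction of (VIIIp). A one-line check against the first entry of the table (an illustration only):

    # Check of the first Appendix XI entry.
    p_obs, p_calc = 0.003947, 0.002633
    print(round((p_obs - p_calc) / p_obs, 4))   # 0.3329 vs. the tabulated 0.3328
    # (the small difference reflects rounding of the printed pressures)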