A NEW METHOD OF HINDCASTING AND ITS APPLICATION TO BETTER PREDICTING EXTREME EVENTS IN CIVIL ENGINEERING, WINDS AND WAVES

by

Gholamreza Iranpourmobarekeh

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Civil Engineering)

May 1985

UNIVERSITY OF SOUTHERN CALIFORNIA
THE GRADUATE SCHOOL
UNIVERSITY PARK
LOS ANGELES, CALIFORNIA 90089

This dissertation, written by Gholamreza Iranpourmobarekeh under the direction of his Dissertation Committee, and approved by all its members, has been presented to and accepted by The Graduate School, in partial fulfillment of requirements for the degree of DOCTOR OF PHILOSOPHY.

Dean, 1985
DISSERTATION COMMITTEE

DEDICATION

To the late Professor Mark Kac, my best teacher and advisor, from whom I have learnt most of what I know about stochastic processes, and to Professors B.A. Troesch, Wlodek Proskurowski, Michael Waterman and Paul Seide.

TABLE OF CONTENTS

DEDICATION  ii
LIST OF FIGURES  vii
LIST OF TABLES  x
ABSTRACT  xvii

CHAPTER
1  INTRODUCTION  1
   1.1  General Statement of the Problem  1
   1.2  Hypothesis and Objectives  5
2  LITERATURE SURVEY  12
3  MATHEMATICAL SURVEY  28
   3.1  Introduction  28
   3.2  Classical Theory  29
      3.2.1  Preliminaries  29
      3.2.2  Asymptotic Distribution of Extremes  30
      3.2.3  Application of the Classical Extreme Value Theory to Civil Engineering  40
      3.2.4  Some Further Results on Classical Theory: Tables and Figures  41
   3.3  Random Number of Random Variables and the Partial Duration Series  63
      3.3.1  Introduction  63
      3.3.2  Definition and Preliminaries  64
      3.3.3  Compound Distribution Functions  71
      3.3.4  The Joint Distribution of X(t) and its Time of Occurrence T(t) for Identically Distributed Exceedances  75
      3.3.5  Distribution of the Number of the Exceedances n(t)  79
      3.3.6  Application to Civil Engineering  86
4  THE NEW PREDICTION METHOD AND MODELS FOR SIMULATION  108
   4.1  Introduction  108
   4.2  Basic Mathematics  109
   4.3  Practical Implementations  115
      4.3.1  The "Wide Interval" Method  119
      4.3.2  The "Individual Slice" Method  122
      4.3.3  The "Narrow Interval" Method  126
   4.4  Evaluation of F_Y(·)  128
   4.5  Choices for the Simulation Tests  130
   4.6  Simulation Method  133
      4.6.1  Ordered Pairs with a Specified Distribution  134
      4.6.2  Simulation Schemes  141
         4.6.2.1  Uniform Weather  143
         4.6.2.2  Seasonal Variations  144
         4.6.2.3  Seasonal Variations with Hurricanes  145
      4.6.3  Selection of Independent Observations  148
5  COMPUTER RUNS AND RESULTS  152
   5.1  Introduction  152
   5.2  Individual Outputs  153
   5.3  Summary Tables  186
      5.3.1  Description of the Tables  186
      5.3.2  Analysis of the Tables  194
   5.4  "Actual" Design Wave Heights  207
   5.5  Brief Note on Numerical Methods in DESHT  209
   5.6  Notes on Flowcharts  211
6  SUMMARY AND DISCUSSION  213
REFERENCES  227
APPENDIX

LIST OF FIGURES

Figure  Page
3.1  Some relationships between H^(i) and L^(i), i = 1, 2, 3  38
3.2  Probability density function of type (I)  44
3.3  Probability density function of type (II)  45
3.4  Probability density function of type (III)_U  46
3.5  Probability density function of type (III)_L  47
3.6  Probability density function of Log-Normal  49
3.7  Fitting Frechet extreme-value distribution function to significant wave heights (Thom 71)  51
3.8  Wave height exceedance diagram based on Log-Normal distribution (Isaacson and Mackenzie 81)  52
3.9  Comparison of histogram of data in table 3.3, using Log-Normal and Weibull probability density (Ochi and Whalen)  54
3.10  Cumulative distribution of data in table 3.3, using Log-Normal probability paper (Ochi and Whalen)  55
3.11  Cumulative distribution of data in table 3.3, using Weibull probability paper (Ochi and Whalen)  56
3.12  Asymptotic distribution curves for region 1  57
3.13  Comparison of the results obtained by using 1904-1952 and 1904-1965 data  58
3.14  The modified first asymptotic distribution and observed maximum magnitudes  59
3.15  Comparison of asymptotic distribution curves for the modified first and third asymptotic functions  60
3.16  Probabilities of exceeding magnitude M in t years in Chile  61
3.17  Probabilities of exceeding magnitude M in t years in Peru  62
3.18  Sample function of the process p(s)  65
3.19  (a) Graphical representation of the process X(t), 68; (b) Graphical representation of the process T(t), 69
3.20  Graphical presentation that F_t(x) is not differentiable at the point x = 0  74
3.21  Observed and corresponding theoretical (Poisson) distributions of the number of exceedances for intervals of 20, 60, 100, 140, 160, 180, 200, 220, 240, and 365 days for the Greenbrier River at Alderson, West Virginia  88
3.22  Observed distributions of the magnitude of flood peak exceedance flows for the winter season, spring season, and the entire year for the Susquehanna River at Wilkes-Barre, Pennsylvania  90
3.23  (a) Greenbrier River at Alderson, West Virginia, observed distribution functions of exceedances for four different periods, 91; (b) Greenbrier River at Alderson, West Virginia, theoretical and observed (1-year period) distribution function of exceedances, 92
3.24  (a) Greenbrier River at Alderson, West Virginia, seasonal occurrence of exceedances, 94; (b) Greenbrier River at Alderson, West Virginia, observed function Λ(t) and fitting function, 95
3.25  Theoretical and observed distribution functions of the largest exceedance for intervals of 140, 180 and 365 days for the Greenbrier River at Alderson, West Virginia  97
3.26  Comparison of statistical experiment values  106

LIST OF TABLES

Table  Page
3.1a  Classical probability distributions, theorem 3.1, used to describe extreme events such as wave heights, wind speeds, flood peaks, earthquake magnitudes, etc.  43
3.1b  Some other distributions used for extreme data fittings  48
3.2  Scale relationships for probability papers  50
3.3  Significant wave height data obtained from measurements in the North Sea, from Bouws 1978 (Ochi and Whalen 80)  53
3.4  Comparison of design wave height for once in many years at station three  105
3.5  Comparison of statistical experiment values  107
5.1  Gumbel-Gumbel simulation of winds and waves, no seasonal variation in model, conditional distribution hindcasting with bins, 80 years of joint data  157
5.2  Gumbel-Gumbel simulation of winds and waves, no seasonal variation in model, conditional distribution hindcasting with bins, 60 years of joint data  158
5.3  Gumbel-Gumbel simulation of winds and waves, no seasonal variation in model, conditional distribution hindcasting with bins, 40 years of joint data, the 1st period  159
5.4  Gumbel-Gumbel simulation of winds and waves, no seasonal variation in model, conditional distribution hindcasting with bins, 40 years of joint data, the 2nd period  160
5.5  Gumbel-Gumbel simulation of winds and waves, no seasonal variation in model, conditional distribution hindcasting with bins, 20 years of joint data, the 1st period  161
5.6  Gumbel-Gumbel simulation of winds and waves, no seasonal variation in model, conditional distribution hindcasting with bins, 20 years of joint data, the 2nd period  162
5.7  Gumbel-Gumbel simulation of winds and waves, no seasonal variation in model, conditional distribution hindcasting with bins, 20 years of joint data, the 3rd period  163
5.8  Gumbel-Gumbel simulation of winds and waves, no seasonal variation in model, conditional distribution hindcasting with bins, 20 years of joint data, the 4th period  164
5.9  Gumbel-Gumbel simulation of winds and waves, no seasonal variation in model, conditional distribution hindcasting with bins, 10 years of joint data, the 1st period  165
5.10  Gumbel-Gumbel simulation of winds and waves, no seasonal variation in model, conditional distribution hindcasting with bins, 10 years of joint data, the 2nd period  166
5.11  Gumbel-Gumbel simulation of winds and waves, no seasonal variation in model, conditional distribution hindcasting with bins, 10 years of joint data, the 3rd period  167
5.12  Gumbel-Gumbel simulation of winds and waves, no seasonal variation in model, conditional distribution hindcasting with bins, 10 years of joint data, the 4th period  168
5.13  Gumbel-Gumbel simulation of winds and waves, seasonal variation model with hurricanes, conditional distribution hindcasting with bins, 80 years of joint data  169
5.14  Gumbel-Gumbel simulation of winds and waves, seasonal variation model with hurricanes, conditional distribution hindcasting with bins, 60 years of joint data  170
5.15  Gumbel-Gumbel simulation of winds and waves, seasonal variation model with hurricanes, conditional distribution hindcasting with bins, 40 years of joint data, the 1st period  171
5.16  Gumbel-Gumbel simulation of winds and waves, seasonal variation model with hurricanes, conditional distribution hindcasting with bins, 40 years of joint data, the 2nd period  172
5.17  Gumbel-Gumbel simulation of winds and waves, seasonal variation model with hurricanes, conditional distribution hindcasting with bins, 20 years of joint data, the 1st period  173
5.18  Gumbel-Gumbel simulation of winds and waves, seasonal variation model with hurricanes, conditional distribution hindcasting with bins, 20 years of joint data, the 2nd period  174
5.19  Gumbel-Gumbel simulation of winds and waves, seasonal variation model with hurricanes, conditional distribution hindcasting with bins, 20 years of joint data, the 3rd period  175
5.20  Gumbel-Gumbel simulation of winds and waves, seasonal variation model with hurricanes, conditional distribution hindcasting with bins, 20 years of joint data, the 4th period  176
5.21  Gumbel-Gumbel simulation of winds and waves, seasonal variation model with hurricanes, conditional distribution hindcasting with bins, 10 years of joint data, the 1st period  177
5.22  Gumbel-Gumbel simulation of winds and waves, seasonal variation model with hurricanes, conditional distribution hindcasting with bins, 10 years of joint data, the 2nd period  178
5.23  Gumbel-Gumbel simulation of winds and waves, seasonal variation model with hurricanes, conditional distribution hindcasting with bins, 10 years of joint data, the 3rd period  179
5.24  Gumbel-Gumbel simulation of winds and waves, seasonal variation model with hurricanes, conditional distribution hindcasting with bins, 10 years of joint data, the 4th period  180
5.25  Gumbel-Gumbel simulation of winds and waves, seasonal variation model, conditional distribution hindcasting with bins, 80 years of joint data  181
5.26  Gumbel-Gumbel simulation of winds and waves, seasonal variation model, conditional distribution hindcasting with bins, 60 years of joint data  182
5.27  Gumbel-Gumbel simulation of winds and waves, seasonal variation model, conditional distribution hindcasting with bins, 40 years of joint data  183
5.28  Gumbel-Gumbel simulation of winds and waves, seasonal variation model, conditional distribution hindcasting with bins, 20 years of joint data, the 1st period  184
5.29  Gumbel-Gumbel simulation of winds and waves, seasonal variation model, conditional distribution hindcasting with bins, 20 years of joint data, the 2nd period  185
5.30  Uniform summary model, 60 years of data with wind only, recurrence period of 100 years  189
5.31  Uniform summary model, 60 years of data with wind only, recurrence period of 50 years  190
5.32  Summary model of seasonal with hurricanes, 60 years of data with wind only, recurrence period of 100 years  191
5.33  Summary model of seasonal with hurricanes, 60 years of data with wind only, recurrence period of 50 years  192
5.34  Summary model of seasonal variation, 60 years of data with wind only, recurrence period of 100 years  193
5.35  Summary model of seasonal variation, 60 years of data with wind only, recurrence period of 50 years  193
5.36  Maximum and minimum predicted value for each method  195
5.37  Correct, over-, and underprediction scores for the different methods based on summary tables, within a range of 0.5 feet  198
5.38  Erroneous uniform summary table with 60 years of wind only and 100 years of return period  202
5.39  Erroneous uniform summary table with 60 years of wind data and 50 years of return period  203
5.40  Erroneous seasonal variation summary table with 60 years of wind data and 100 years of return period  204
5.41  Erroneous seasonal variation summary table with 60 years of wind data and 50 years of return period  205

ABSTRACT

A new method of hindcasting wave data from wind measurements for extreme event analysis is presented. Instead of a single estimate of the unknown wave height corresponding to a wind datum, the method in effect provides the conditional distribution of wave heights, given the known wind speeds. The conditional distribution is estimated from the available joint data. This method of estimating the long-term extreme wave height distribution uses the information from the tail of the joint histogram which is lost in conventional regression methods. Various implementations of the general concept are possible, but one has been found to be particularly efficient to execute. In this method the wind speed range is divided into intervals of a few miles or km per hour, and the conditional distribution on each interval is taken as a Gumbel distribution. The method has been tested using random number (Monte Carlo) simulation.
Various modifications and adaptations to other areas appear possible.

CHAPTER 1
INTRODUCTION

1.1 General Statement of the Problem

In the largest sense, the problem is as old as modern engineering: stronger structures cost more, so the requirements of safety and economy are in conflict, and an engineer's design must be partly guided by statistical estimates of the stress it must face. A very common approach is to estimate the recurrence period of an event greater than a certain specified magnitude: waves larger than a certain height, floods with peak flow rates or total volumes greater than a given maximum, or earthquakes larger than a given Richter magnitude, for example. Shane and Lynn [44] point out that it is actually more directly useful to calculate the probability that a structure will encounter an event of a given magnitude during its expected lifetime, or better still to calculate the distribution of waiting times for a specific event, since the actual history of a structure is often different from the prediction made before construction. However, for all of these approaches, the principal problem is that the expected lifetime of a structure is often a large fraction of the length of the available geophysical record, or even longer, and so any prediction represents a large extrapolation from experience.

The general character of the solution is as obvious as the general problem: to make the best possible prediction by using the most information and the best mathematical methods that are available. I am presenting a specific way of using more information which appears to be new.

There are many methods in the engineering literature for analyzing geophysical data. Some of them are more mathematically tractable than others. Usually, the newer methods are based on more comprehensive mathematical theories, whereas the older ones were derived according to intuitive ideas of a reasonable-seeming statistical estimator, perhaps constrained by the lack of computational equipment. Mathematical tractability is valuable because it allows assessment of the significance of a method's predictions, and this can be another reason, in addition to the use of more information, why a new method would be better than an old one. For example, if data from two or more observing stations can be shown to have the same statistical properties, and yet the observations from the stations can be shown to be independent of each other, then the individual data streams can be considered independent realizations of the same stochastic process, and can be combined as if they were one long series of observations if the process is ergodic. Many authors have discussed the mathematical convenience of the partial duration series (PDS) method, since if one takes all the events larger than a sufficiently high threshold, the events will be independent and the theory of a random number of random variables can be applied. As will be seen in more detail in later sections, among the authors who have done such an analysis are Shane and Lynn [44], Todorovic and various coworkers [51-55], Gupta et al. [21], and Liu and Ma [35]. Some of these authors explicitly present their result as producing a joint distribution of event times and magnitudes.

It is not difficult to find pairs of other related quantities for which joint distributions can be constructed.
For example, on a coast, higher winds are associated with larger waves; in a watershed, heavier rains are associated with larger stream flows; and in a fault zone, longer intervals between earthquakes are associated with larger Richter magnitudes. A major goal of geophysical research in each of these areas is to replace these vague associations with more precise and deterministic mathematical models of the relationship, but even the best of them needs some allowance for random fluctuations. Furthermore, the more detailed models often require information which is not always available, especially when attempting to analyze old data, collected with more primitive instruments, or at fewer places, than what are used now. Thus, there may be applications where joint distributions are the most useful way to organize information.

This seems particularly true in extreme event analysis. By definition, extreme event analysis is concerned with unusual events, and random fluctuations often contribute to the extreme values. Thus, the tails of a joint distribution will contain important information about times when the quantities depart from their usual relationship.

Hindcasting is the process of using information about a more easily measured quantity, such as wind speed or rainfall, to provide information about a less easily measured but more directly important quantity, such as wave height or stream flow respectively. It is common to attempt to do this to extend data series, since the more easily measured parameters have usually been recorded for a longer period. It appears that usually the hindcasting is done by evaluating a model function relating the two quantities, but since for extreme event analysis the desired information is based on the tail of the distribution, the discussion above suggests that a method based more directly on joint distributions could be better. Thus, the specific problem has become: "For extreme event analysis for design, can we make better use of the available information by using a hindcasting method based on joint distributions?"

1.2 Hypothesis and Objectives

My hypothesis is that there are situations in which the best way to hindcast an event from observations of another parameter recorded simultaneously is to use a conditional distribution for the values of the desired parameter, given the particular value of the known parameter. For example, one can consider wave height to be the dependent variable and wind speed to be the independent variable, where the distribution of wave height is conditioned on the wind speed. If the conditional distributions are known and they are time invariant, then the joint distribution is determined by the marginal distribution of the independent parameter (in symbols, the joint distribution $F_{WH}$ follows from the wind marginal $F_W$ and the conditionals $F_{H|W}$); improved estimates of the independent parameter distribution will then lead to improved estimates of the joint distribution, and hence to improved estimates of the marginal distribution of the dependent parameter. As the desired recurrence period is ordinarily determined from the tail of this marginal distribution, its estimate will be improved in this indirect fashion if the estimate of the distribution of the independent variable is improved by the use of more data. From this viewpoint, it seems clear that if some computationally convenient way were found to use entire conditional distributions, instead of some "most probable values" or "best estimates" derived by regression, a better estimate of the joint distribution would be obtained.
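The dissertation's own implementation (the DESHT program discussed in Chapter 5) is not reproduced here, but the core idea above can be sketched in a few lines of Python. The sketch is only illustrative: the function names, the default bin width, the method-of-moments Gumbel fit, and the minimum of three points per bin are my assumptions, not details taken from this work.

```python
import numpy as np

def fit_gumbel_moments(x):
    # Method-of-moments Gumbel fit: scale from the standard deviation,
    # location from the mean (0.5772 is the Euler-Mascheroni constant).
    scale = np.std(x) * np.sqrt(6.0) / np.pi
    return np.mean(x) - 0.5772 * scale, scale

def hindcast_wave_cdf(joint_wind, joint_wave, long_wind, bin_width=5.0):
    """Estimate the marginal wave-height CDF from a short joint record
    plus a long wind-only record:
        F_H(h) = sum over bins b of P(wind in b) * F_{H|b}(h),
    with a Gumbel fitted to the waves observed in each wind-speed bin."""
    top = max(joint_wind.max(), long_wind.max())
    edges = np.arange(0.0, top + bin_width, bin_width)
    joint_bin = np.digitize(joint_wind, edges)
    fits = {}
    for b in np.unique(joint_bin):
        waves = joint_wave[joint_bin == b]
        if len(waves) >= 3:               # need a few points to fit a bin
            fits[b] = fit_gumbel_moments(waves)
    long_bin = np.digitize(long_wind, edges)
    weights = {b: np.mean(long_bin == b) for b in fits}
    total = sum(weights.values())         # renormalize over fitted bins
    def cdf(h):
        s = 0.0
        for b, w in weights.items():
            loc, scale = fits[b]
            s += (w / total) * np.exp(-np.exp(-(h - loc) / scale))
        return s
    return cdf
```

The returned function is the estimated marginal wave-height distribution; a design height for a given recurrence period would then be read from its upper tail.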
A single value cannot show the shape of the tail of the conditional distribution, which is the most important part for an extreme value analysis.

This method appears to have the advantage that it can be used whenever there is a period of joint observation of two parameters at one place and another period of single observation from the same point or nearby. While a record with too many gaps may pose questions of the validity of inferences drawn from an insufficiently representative sample, there are no formal requirements for uniformity of sampling, and there is no need for data from more than one place. By contrast, hindcasting models with more detailed physical information may pose more stringent requirements for data that are harder to fulfill. For example, the method used by Brater and Wallace [7] to hindcast stream flow from rainfall uses information about rainfall from the previous three years. While this approach is physically reasonable, given what is known about ground water seepage, it requires good data coverage. Similarly, a more physically insightful approach to ocean waves hitting a shore would require wind speed and direction information from points miles out at sea, which usually is not available, especially in old records. In short, there appear to be several reasons for thinking that a method of hindcasting based on joint distributions should have advantages over the methods now in use. The rest of this report will discuss the details of a particular implementation of this general concept, and the results that were obtained.

The objectives of this study were twofold. One was to study the mathematical principles of this approach and to compare them with other data analysis methods, especially the partial duration series (PDS) methods which have been studied by a number of authors. The simplest approach to a PDS assumes it is a stationary Poisson process, but nonstationary versions, especially for seasonal variations, and other modifications have also been incorporated. Liu and Ma, for example, essentially define a partial duration series of wave heights, but the events are selected on the basis of atmospheric conditions (specifically, when a typhoon occurs) instead of using a cutoff value for the waves. As the highest waves occur during typhoons, there is likely to be good agreement between the waves selected by Liu and Ma's method and the selection by a conventional PDS, but because of random variations in the size of waves generated by different typhoons of similar sizes, and because the weaker typhoons are distinguished from "tropical storms" by a somewhat arbitrary wind speed criterion, the agreement will not be exact.

An advantage of such mathematical studies is in modifying the algorithm for greater efficiency. For example, if the conditional distributions for waves given winds of a low speed, or range of speeds, give negligible probabilities for waves of a given height, compared to the probabilities given winds of a higher speed, then it is not necessary to calculate the entire joint distribution, but only the part of it that corresponds to winds above a certain speed. The remaining values, being rarer, are likely to be more nearly independent, and the analysis of the new method can follow more closely the development of other PDS methods than if all the low values are included to give a complete duration series. However, if there should be some reason to include the complete series, such an analysis would be needed to show what it is.
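As a concrete illustration of forming such a truncated series, the following sketch (mine, not the author's, and using the simplest possible clustering rule) records the peak of each excursion above a base level, so that the retained values are the rare, approximately independent exceedances discussed above.

```python
import numpy as np

def partial_duration_series(values, times, base_level):
    """Extract a PDS: the peak of each excursion above base_level,
    together with its time of occurrence. Peaks of separate excursions
    are treated as approximately independent when the base is high."""
    above = values > base_level
    peaks, peak_times = [], []
    i, n = 0, len(values)
    while i < n:
        if above[i]:
            j = i
            while j < n and above[j]:
                j += 1
            k = i + int(np.argmax(values[i:j]))  # peak of this excursion
            peaks.append(values[k])
            peak_times.append(times[k])
            i = j
        else:
            i += 1
    return np.array(peaks), np.array(peak_times)
```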
We may note that while a set of daily values may be considered a complete duration series for a quantity which changes slowly, it might be very incomplete for something which changes more rapidly.

The second objective was to develop and test computer programs which implement the new method as well as older methods, and to test them on simulated data and, if possible, on real observations. The use of simulation allows long time periods to be studied, but since a model must be used to generate the simulated data, the effectiveness of a calculation method in a simulation depends on whether the model in the calculation method matches the model used to simulate the data. Liu and Ma [35], for example, use a combination of a Poisson and a Gumbel distribution to generate their simulation data, so it is not surprising that their method, which analyzes data using a combined Poisson-Gumbel distribution, is extremely accurate in predicting the probabilities of simulated events, while a simple Gumbel distribution method predicts less accurately. Real data, on the other hand, may be difficult or expensive to acquire in machine-readable form. Also, in some situations the observation method may be poorly suited to the analysis method, but as explained above that is not likely to be the case here. Data availability has been the principal obstacle to a realistic test of the method, but a method intended for practical use, as this one is, must be tested extensively on real data. Further testing is needed beyond what is reported here.

CHAPTER 2
LITERATURE SURVEY

A large number of papers are relevant to this project and have been consulted during the course of my work. The first group of papers deals with the estimation of extreme values of single parameters without considering joint distributions, or other sorts of relations between the parameter being considered and other parameters. A clear summary of the basic ideas of extreme value distributions is found in Gnedenko [18], Gumbel [20], Galambos [17], and Deheuvels [12]. The basic formulas and definitions are given along with the statement of the two important theorems of Gnedenko, the relationships under various transformations of the different types of distributions defined by the theorems, and literature references to the various methods of estimating the parameters of the distributions.

Many papers discuss estimation of extreme values in particular applications. There appears to be a strong tendency for workers in one field to concentrate so exclusively on their own area that ideas are transferred relatively slowly between areas of civil engineering research. Thus, the remarks below are grouped according to their field.

In the field of flood prediction, the old paper of Gumbel [19] gives details of calculating plotting positions on Gumbel distribution probability paper in a way which attempts to take into account the sampling problem posed by small amounts of data. This work appears to have been superseded by the methods discussed in chapter 5 of Linsley [34], a very detailed discussion which describes everything a student must do to carry out standard procedures. Perhaps because of the great importance of successful flood prediction, and also because floods are so obviously related to rainfall, research on the estimation of flood heights long ago turned to consideration of estimates of flood heights from rainfall quantities, or to joint distributions of flood heights and times. These researches will be discussed below.
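For reference, the linearization behind the Gumbel probability paper mentioned above is easy to state: since $P = \exp[-\exp(-(x-\varepsilon)/\theta)]$, the reduced variate $y = -\ln(-\ln P)$ is linear in $x$ with slope $1/\theta$ and intercept $-\varepsilon/\theta$ (compare Table 3.2 later in this report). The sketch below is illustrative only; it uses the simple Weibull plotting positions $P_i = i/(n+1)$ rather than the refined positions of Gumbel [19].

```python
import numpy as np

def gumbel_probability_paper_fit(annual_maxima):
    """Fit a Gumbel distribution on probability paper: y = -ln(-ln P)
    is linear in x with slope 1/theta and intercept -eps/theta."""
    x = np.sort(np.asarray(annual_maxima, dtype=float))
    n = len(x)
    P = np.arange(1, n + 1) / (n + 1.0)   # Weibull plotting positions
    y = -np.log(-np.log(P))               # reduced Gumbel variate
    slope, intercept = np.polyfit(x, y, 1)
    theta = 1.0 / slope                   # scale parameter
    eps = -intercept / slope              # location parameter
    return eps, theta
```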
These researches will be discussed below. In 14 other areas papers are still published on the calculation of probabilities of extremes of single parameters, described by univariate distributions. In the field of wave height predictions, the paper by Issacson and Mackenzie [24] was quite useful for my early work, because it summarizes the formulas and current uses of a number of distributions commonly used in civil engineering geophysics research. These include the lognormal distribution as well as the extreme value distributions, since it is often used and has easily understood properties. Also, as suggested by figures which they present, for particular choices of parameters, a lognormal distribution might be an adequately close approximation of a Gumbel or a Frechet distribution for some practical purposes. While a more mathematically justified procedure seems preferable, their discussion is a very valuable summary of current practices. A further insight into current practices is given by remarks in the paper by Earle and Baer [13], which notes that the World Meterological Organization recommends the I use of the Gumbel distribution for hydrologic studies and ! that oceanographers commonly fit wave height data by lognormal, Gumbel, and Weibull distributions. Their i 1 principle interest is in wave height predictions, and ! particularly ,on the effect that uncertainties in distribution parameters will have on predictions of recurrence periods. They use an elaborate computer simulation of these effects on the lognormal distribution. They use- this distribution because it simplifies their calculations, but their results strongly suggest that it is important in general to obtain the best possible estimates of distributions. This is particularly true if they are to be extrapolated far beyond the periods and events that have been observed. Small uncertainties in distribution parameters translate into large uncertainties in recurrence intervals. The paper by Ochi and Whalen [40] is relevant in a different way. It deals with extreme wave heights, but its principal importance is an example of an approach which it seems best to avoid. They propose to fit extreme wave height data with a complicated function | which has four adjustable parameters. Unfortunately there is no explanation by Ochi and Whalen [40] about why they selected the particular formula which they present. Thus, their formula differs from the established distribution functions discussed above, which are derived from mathematical analysis of the expected behavior of extreme values of a stochastic process under certain assumptions. Ochi and Whalen's function fits the data well in the examples they present, but one cannot tell from their descriptions why a different formula with four parameters might not be as good or better. Their work is not necessarily invalid, but when one considers the many sources of doubt involved in trying to fit distribution functions to observations, a method with a good mathematical justification seems preferable to one whose justification is obscure. If goodness of fit were the only criterion, one might be able to fit Ochi and Whalen's data with a formula which had five or more adjustable parameters, and to obtain a closer fit. Among earthquake prediction researchers, univariate extreme value distribution models have recently been under intensive research. These models . are for the maximum magnitude of earthquakes. The papers by Yegulap and Kuo [59], and S.G. Kim [29] are representative of I i this work. 
Yegulap and Kuo take a world-wide view and use the Weibull distribution, while Kim restricts his attention to a few densely populated, seismically hazardous urban regions, and uses a modified Gumbel 17 distribution. As the flood and wave researchers have done before then, they work hard to find the best way to derive the parameters of their chosen distributions from the available data. Knowing the recurrence period of an earthquake larger than or equal to a given magnitude would allow an estimation of seismic risk, and would constitute a form of prediction, even though it would not give a specific time or place. As noted in the introduction, a number of authors have investigated partial duration series, since the PDS for any time period can be treated as a random number of random variables. The random integer gives the number of events, and the random variables give the individual magnitudes. Most of the papers of this type which I have are on the number of floods per year and their exceedances over some base level. The first of these is Shane and Lynn [44], .in which the number of events is assumed to have a poisson distribution, but Todorovic [51-55] was able to derive the properties of the extreme value of such a random number of random variables under the assumption of a more general distribution of the number of events. This opened the way to a series of studies by Todorovic and a variety of coworkers 18 (Todorovic and Zelenhasic [52], Todorovic and Rousselle [54], Todorovic and Woolhiser [53], Todorovic [55]) in which seasonal variations in flood probabilities are discussed. The exceedances can be considered to be independent if the base level is high enough, and their distributions appear to be independent of the season, as shown by the data presented in Todorovic [55], so these are independent identically distributed random variables. The times of the floods are treated as random variables and their number is given by a time-inhomogenecus Poisson distribution. The specific temporal inhomogeneity is a subdivision of the year into seasons, which are each considered homogeneous. These seasons do not necessarily coincide with the conventional beginning of Spring, Summer, Fall, and Winter on March 21st, June 21st, September 21st, and December 21st respectively, but are chosen according to the changes in flood probability, as shown by a plot of the cumulative distribution of flood events. These methods have been used on data from only a few rivers. Todorovic and Zelenhasic present data from the Susquehanna River at Wilkes-Barre, Pennsylvania, while Todorovic and Rousselle, and the subsequent papers by 19 Todorovic and the other collaborators discuss results for the Greenbrier River at Alderson, West Virginia. A closely related paper by Gupta et al. [21] builds on the work of the papers in the Todorovic series which had appeared up to that time. Gupta et al. discuss the joint distributions of flood magnitudes and times of occurrences. On general grounds, they doubt that the flood exceedances would be identically distributed, since some of them might be generated by the rainfall from convective air mass thunderstorms, while others could result from frontal storms or hurricanes. They consider data from the Mississippi River at St. Paul, Minnesota, and the Licking River at Catawba, Kentucky, which may represent a greater contrast in climatic areas than the Susquhehanna River and Greenbrier River. Gupta et al. 
are also concerned that over a sufficiently long time, urbanization might significantly change the hydrologic properties of a watershed, and this may be a valid concern, since the data series for all four rivers extend from the late 1800's to the 1960's. However, Gupta et al. do not attempt to include this long-term change in the applications to observations which they present. A less general development, in a similar spirit, is 20 presented for wind and wave data in a paper by Liu and Ma [35]. They treat the number*of typhoons in a year as a random number with a Poisson distribution, and the highest wave in a given typhoon as a random variable with a Gumbel distribution. The problem of incomplete data is especially serious for them, since they apparently do not have records from before 1949, when the present government established its control over China, and because the storm environment on the coast of China is ! quite severe, with numerous stations reporting an average I j of two or more typhoons per year. The average typhoon | wave height at station three, the station whose data are l the most carefully analyzed, is 5.38 meters and the expected 20-year design wave at this station, as predicted by their method, is 10.6 meter high. This implies very great strength and expense will be needed in structures intended to survive for long in this area of coast. Even though my method is somewhat different from that of Liu and Ma, their paper was the original inspiration for my work on hindcasting, because it first aroused my interest in potential application of joint distributions. Several other papers discuss various aspects of the 21 use of joint distributions. Pickands1 paper [41], on multivariate extreme value distributions, is largely concerned with estimating dependence functions, especially in the case of bivariate distributions. The dependence function which appears in these distributions is more general than the linear correlation coefficient which appears in bivariate Gaussian distributions, since the two quantities are not assumed, to be linearly related. A dependence function is also important in the work reported by Buishand [8], He is concerned to find when data from Dutch weather stations are independent of each other, so that they can be combined to provide the effect' of a longer series in the station year method. When data for the whole year are considered, the results from stations 30 km apart appear to be independent, especially when only the larger events are considered. However, when only Winter data are considered, the data from the I ! same combinations of stations show a dependence which increases with the magnitude of events. The tentative | explanation is that the storms which occur in the Winter are different from the ones which occur during the rest of the year, and that while the most intense rain of the 22 year comes from small storms which will not be observed at more than one station at a time, the most intense rain of the winter comes from the larger storms which are observed at several stations simultaneously. This appears to be an example of the situation discussed by Gupta et al., of different types of storms producing rainfall patterns with different statistical properties. It is also a good warning to be careful about analyzing data. Another paper which discusses a type of joint distribution is by Kiremidjian and Anagnos [31]. 
Another paper which discusses a type of joint distribution is by Kiremidjian and Anagnos [31]. Unlike the earthquake prediction papers mentioned earlier, which analyzed only the extreme magnitudes of earthquakes, these authors discuss the dependence of earthquake magnitude on the interval since the preceding earthquake. This dependence is very plausible, since the tectonic plates of the earth's crust appear to move relative to each other at rather constant rates. Thus, the strain in the rocks along a fault, and the energy available for release in an earthquake, is approximately proportional to the time since the fault last moved. These authors derive conditional probability formulas which are related to those obtained in other papers which have been discussed here, and use them to study the behavior of the Middle America Trench, which runs along the southern Pacific coast of Mexico, offshore from the state of Oaxaca.

Only a few papers are directly concerned with hindcasting, and none of them deal with hindcasting shore waves from shore winds, as I am discussing. Herbert C.S. Thom [48-50] explains that a frequently used type of hindcasting is an elaborate method for estimating waves in the open ocean. In this method, isobars of air pressure are estimated by extrapolating shipboard barometric measurements to areas of the ocean where direct measurements were not made; then winds are estimated from the isobars; waves are estimated from the winds; and finally a distribution function for wave heights is estimated from a large number of individual wave height estimates. This process accumulates so many uncertainties that Thom considers the final results to be meaningless. He finds that the scale of the extreme wave height distribution is proportional to the scale of the extreme wind speed distribution, so that the wave distribution can be obtained more directly and reliably. This direct use of distributions has certain similarities to my method, even though Thom is studying waves in the open ocean instead of coastal waves.

On a more technical level, Thom argues for the use of Frechet distributions instead of Gumbel distributions because the Frechet distributions are not defined for negative values of the variable, which are physically unreasonable in most applications. Obviously, neither a negative wave height nor a negative wind speed is possible. However, the figures and formulas presented by Isaacson and Mackenzie [24] indicate that a Gumbel distribution with reasonable parameters gives negligible probabilities for negative values. Thus, this difference does not appear significant for practical applications.
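This point is easy to quantify. With plausible Gumbel parameters for annual maximum significant wave height (the numbers below are my illustrative choices, not values from either paper), the probability of a negative value is vanishingly small:

```python
import math

# Gumbel CDF: F(h) = exp(-exp(-(h - loc)/scale)), so the probability of
# a (physically impossible) negative height is F(0) = exp(-exp(loc/scale)).
loc, scale = 5.0, 1.5    # assumed location and scale, in meters
print(math.exp(-math.exp(loc / scale)))   # about 7e-13
```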
However, it appears to be quite specific to the particular problem of rainfall and runoff, and does not seem to be well suited for an adaption to hindcasting waves from winds, unless wave behavior could be shown to be well correlated with some moving average of wind speed measurements, which could probably need to be made at intervals of less than an hour. As will be explained in later sections, my method could be used with a complete duration series. That is, it could be used to hindcast wave distributions from winds of all speeds, even for weak winds which have a negligible probability of being associated with excessively large waves. With a complete duration series, extreme events are described by the upper tails of distributions which are intended to describe all the events of a given type. Under highly idealized assumptions, ocean waves would be expected to follow a Rayleigh distribution, but these assumptions do not appear to be satisfied in practice. Papers by Nolte and Hsu [39] and by Tayfun [47] discuss possible responses to 26 the difference between real distributions of ocean waves, as they are usually recorded, and the expected Rayleigh distributions. Nolte and Hsu suggest applying a filtering operation to the wave record to make an analysis with a Rayleigh distribution valid. Tayfun argues that the problem arises because the Rayleigh distribution applies to wave height measured in terms of envelopes while the usual measurements are crest-to- trough wave heights. When the spectrum of the waves is broadband, the crest-to-trough wave heights do not agree with the envelope wave heights. Tayfun's computer simulation using the envelope definition provides better agreement with the Rayleigh distribution. Both the calculations of Tayfun and the filtering studies of Nolte and Hsu appear to deal primarily with wave records of a few minutes or hours, and thus do not appear to be highly relevant for the estimates of very long-term extremes such as those I wish to study. Also, the principal purpose of my proposed method is to take advantage of old data, rather than to rely, as Nolte and Hsu and Tayfun do, on measurements made at high data rate, which require sophisticated equipment that has only been available for a relatively short time, and is used i in comparatively few places. Thus we see that many papers civil engineering are relevant in work presented here. in different a some fashion 28 CHAPTER 3 MATHEMATICAL SURVEY 3.1 Introduction The theory of extreme values and the extreme value distributions play an important role in many areas of civil engineering. For example, extreme-value distributions arise quite naturally in studying coastal waves and winds, ocean waves and winds, floods and droughts, earthquakes and environmental problems such as assessing the levels of air pollution. The next sections will be a short review of classical and recent mathematical approaches to extreme value problems which have been discussed in civil engineering research papers that are relevant to the work presented here. 29 I 3.2 Classical Theory i l 3.2.1 Preliminaries I Suppose that £1,C2'**‘'Sn are independent and j identically distributed random variables from a i distribution F(x) which is assumed to be continuous. The classical theory of extreme values primarily concerns itself with the distribution of the smallest and largest values of K\r^2'••*•5n* T^at is, if Xj = min( ^2. 
$$X_1 = \min(\xi_1, \xi_2, \ldots, \xi_n), \qquad X_n = \max(\xi_1, \xi_2, \ldots, \xi_n), \tag{1, 2}$$

then, knowing $F(x)$, we would like to find

$$L_n(x) = P(X_1 \le x), \qquad H_n(x) = P(X_n \le x). \tag{3}$$

To find the distributions of the extreme values in (3), it is easy to see that

$$L_n(x) = P(X_1 \le x) = 1 - P(\xi_1 > x, \ldots, \xi_n > x) = 1 - \prod_{i=1}^{n} P(\xi_i > x) = 1 - [1 - F(x)]^n, \tag{4}$$

and similarly

$$H_n(x) = P(X_n \le x) = [F(x)]^n. \tag{5}$$

The distributions in (4) and (5) as $n \to \infty$ are known as the asymptotic (or limiting) distributions of extreme values. Fortunately, the behavior of these distributions is relatively simple, and is well understood.
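Equations (4) and (5) are easy to check numerically. The following sketch (illustrative only, not part of the original text) compares the empirical distributions of the minimum and maximum of n uniform variables with the exact formulas:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials, x = 10, 100_000, 0.9

# P(max of n iid Uniform(0,1) <= x) should equal [F(x)]^n = x^n, eq. (5).
samples = rng.random((trials, n))
print(np.mean(samples.max(axis=1) <= x), x ** n)            # both ~ 0.349

# P(min <= x) should equal 1 - [1 - F(x)]^n, eq. (4), here with x = 0.05.
print(np.mean(samples.min(axis=1) <= 0.05), 1 - 0.95 ** n)  # both ~ 0.401
```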
Furthermore, if <$n = If Yn = /n(an) n = 1,2, a - 0, b = 1 35 then by (5) we may determine H(®) in (6), as follows Xn- /n(an) lim P{--------- < x} n-^oo = lim P(xn < x+/n(an)} n-»-oo = lim [F(x+ In(an))3n n-^oo = lim (l-ae_x- /n(an))n n—>-oo e n = lim (1----) n-*-3 o n = exp{-exp(-x)} which is the first asymptotic distribution (Gumbel's law). It is easy to verify that F satisfies (10) if we simply let A(x) = -k and xQ =«> , 1-F{x(l+hA(x))} xh l - [ l - o e ” <x+" 7 ) 1 _ 2im ---------- = e“^. x^oo l-(l-ae~x) Analogous to the three "extreme value types" for the 36 largest values given in Theorem 3.1, we have three extreme value types for the smallest values That is, if X1 an x-a x-a lim P{— -- < — } = L< — ) Bn b b then L(«) must belong to one of the following: I (smallest): ,,. x-a x-a l(1)('— -) = l-exp[-exp(-7— ) ], -oo<x«», (13) b b II (smallest) x<a x-a ct l-exp[-(~— ) ], x>a, a>0 b (14) III (smallest): 37 l f-n x “ a x-a _a 1-exp[ - ( — ) ], x<a, a>0, D (15) x>a These extreme value types also have domains of attraction, and using criteria that are analogous to Theorem 3.2, we can verify that if F is a normal distribution, then F e £>( L^)), where as if F is an exponential, a uniform, or a Weibull, then F t D{L^2^). Since max[min] .. . ,£n) = -min[max] (-^lf.. . ,-^n), the L^1^ and 1^ distributions are very closely related by changes of variables. Figure 3.1 shows some of the relationships that exist between the distributions H^1 ^ and the L ^ , i = 1,2,3. It is easy to verify that in Figure 3.1, the reverse relationships also hold. For example, if Yn = -X^, then “X1 L^)(a,b,a) H^J(-a,b,a), 38 I 39 and so on. In view of this last relationship, and the relationships implied by Figure 3.1, it follows that we need only to consider the distribution L^^arb). All the other distributions considered here can be transformed to the distribution L^)(a,b), either by a change of variables or by a change of variable with a setting of the location parameter equal to zero. 40 3.2.3 Application of the Classical Extreme Value Theory to Civil Engineering H.C.S. Thom [48] has applied (8) to extreme wind speeds and then obtained annual extreme significant wave height distributions in the open ocean by a change of scale. Observations from around the world show a similar scaling between wind speed distributions and wave height distributions. He urges the use of this distribution because wave heights have a zero lower bounds, as does in the equation (8) while the other distributions have infinite lower bounds although they usually give negligible probabilities for negative heights. In I previous papers [49,50] he separately examined extreme i | wave heights and wind speeds in the open ocean. Nordquist's analysis of Southern California data shows that the probable occurrence of maximum magnitude earthquakes obeys the first asymptotic distribution (7), fairly well. Yegulalp and Kuo [59], however claim that (9) is more physically realistic than either (7) or (8) because the magnitudes of earthquakes must have an upper bound as does 2) . So Gu Kim [29] has used and j 2^ to compute return periods and maximum earthquakes in I ! densely populated regions such as Tokyo and Guatemala. 41 In certain respects, applying the classical extreme value distributions to geophysical data for civil engineering requires unrealistic assumptions. Gumbel [20] discusses the assumptions needed to apply (7) to his complete duration series data. 
'It must be admitted that the good fit in the example regave cannot be foreseen from the theory, which is based on three assumptions: 1) the distribution of the daily discharges is of the exponential type, 2) n = 365 days is sufficiently large, and 3) the daily observations are independent. Assumptions 1) and 2) cannot be checked since the analytical form of the discharges is unknown. The third assumption does not hold and the number of independent observations is certainly less than 365.' These remarks by Gumbel indicate that the mathematical assumptions underlying the classical extreme value theory may not always be applicable to a stream flow hydrograph. Similar comments could be applied to other geophysical records which are similar to hydrographs. 3.2.4 Some Further Results on Classical Theory: Tables and Figures For more familiarity with the classical extreme ! 42 value distribution functions of Theorem 3.1 (Gnedenko), we now summarize the related figures and repeat the mathematical equations with a slight change of notations. The means and variances will also be mentioned in terms of the parameter distributions. Equations and figures of other useful distributions which are not theoretically extremes but being used in extreme value analyses will also be shown. Finally some figures resulted from real | civil engineering extreme data analyses will be j I presented. For more detailed explanations regarding i I these tables and figures, please see the related | references. i 1 Table 3.1a Classical probability distributions, theorem 3.1, used to describe extreme events such as wave heights, wind speeds, flood peaks, earthquake magnitudes, etc. Distribution Range Cumulative probability Mean Variance Type (I) — oo<H<°° — 00< £ <0O 0 < 0 <oo {exp[-exp - (Si--) ] } e+y0 (-£+.589) 2 O (-1.640 ) Type (II) 8 8 8 V V V W a i S V V V o o o exp[- ( j) ] e r (l-—) a 2 2 ? 1 9 [ r (l-—) -r (l--) ] a a Type (III)D. (Upper Bound) -°o<H< e 0 < 0 <°° 0 <a<°° r , H-e.a. exp[ (- 0 ) ] £ - 0 r ( 1 + - ) a 9 2 [ r ( i + | ) - r 2 ( i + i - ) ] Type (III) J _ i (Lower Bound) e <H<°° 0 < 0 < ° ° 0 < a <0 0 - i r /H- e > a. 1-exp [ - ( — g— ) ] e + 0 r ( 1 - - ) a 02 [ r ( i + - ) - r 2 ( l + i ) ] a a u> 0.5 0.4 0.3 0.2 0.1 0.0 8 6 4 2 0 « Figure 3.2 Probability density function of type (I) 5 0 5 0 4 1 2 3 0 H Figure 3.3 Probability density function of type (II) CT>|3T 2.0 1 .5 .0 0.5 0.0 2 3 1 0 H e Figure 3.4 Probability density function of type (III)U .0 .5 0 5 0 H Figure 3.5 Probability density function of type (IH)L Table 3.1b Some other distributions used for extreme data fittings Distribution Range Cumulative Probability Mean Variance /2i h 1 r 1 ,ln (h) -0 . 2. ,, -xp[-T (---- J .--) ] dh ah exp (0+-y) 2 2 exp(29+a )[exp(a )—1] Lognormal 0<H<™ — 00 < 0 < oo 0 < a <0 0 Log Pearson type III Convert the data series to.logarithms, compute the mean, standard deviation, and skew coefficient g which is g = NX (log X-log X) (N-l)(N-2)(0log x) The values of X for various return periods are computed from log X = log X+Kcr log X where K is selected from tables based on g and the return period. Such tables could be found in Linsley [34]. 
co 2.0 1.5 1.0 0.5 0.0 0 1 2 3 4 5 H 6' Figure 3.6 Probability density function of Log-Normal Table 3.2 Scale relationship for probability papers Distribution Abseissa scale X Ordinate scale y Slope a Intercept b Type (I) H -£n{-£n[P(H)]} 1/0 -e/0 Type (II) An (H) -Xn{-£n[P(H)]} a -a Jin0 Type (IlDy Upper Bound -Jin (e-H) H - £n{ - Jin [ P (H) ] } -{-£n[P(H)]}1/a a 1/6 a£n0 -e/0 Type (III)L Lower Bound £n(H-e) H Xn{-£n[Q (H) ] } {-£n[Q(H)]}1/a a l / e -a£n6 -e/6 Lognormal Jin (H) 1 fy -Pt2 1 e 2t dt /2ir 0 1/a -8/a U1 o 1 LN (SIGNIFICANT WAVE HEIGHT) MEAN RECURRENCE INTERVAL TOO 1 0 1 4.2 4.0 3.8 3.6 3.4 3.2 3.0 .98 .99 .90 .95 01. .05 .10.20.30.40.50.60.70.80 PROBABILITY Figure 3.7 Fitting Frechet extreme-value distribution function to significant wave heights , (Thorn?71) cn H SIGNIFICANT WAVE HEIGHT (FT) 5 0.00001 0.0001 0.001 0.01 Q(H) 0.10 0.50 _L 5 10 H( ft) 20 JL 50 100 ,0.1 .0.3 -0.9 -*0.99 1/2 TR(yrs) Figure 3.8 Wave height exceedance:diagram based on Log-Normal distribution (Isaacson and Mackenzie 81) r 53 Table 3.3 Significant wave height data obtained from measurements in the North Sea from Bouws 1978 (Ochi and Whalen 80) Significant Number of Wave Height (M) Observations 0 - 0.5 1,280 0.5 - 1.0 1,549 1.0 - 1.5 1,088 1.5 - 2.0 628 2.0 - 2.5 402 2.5 - 3.0 192 3.0 - 3.5 115 3.5 - 4.0 63 4.0 - 4.5 38 4.5 - 5.0 18 5.0 - 5.5 21 5.5 - 6.0 7 6.0 - 6.5 8 6.5 - 7.0 2 7.0 - 7.5 1 TOTAL 5,412 in 3 years PROBABILITY DENSITY IN 1/METER .8 .-NORMAL DISTRIBUTION .6 BULL 3ISTRI5UTION HISTOGRAM 0.4 0.2 SIGNIFICANT WAVE HEIGHT IN METERS Figure 3.9. Comparison between histogram of above data using Log-Normal and Weibull probability density (Ochi and Whalen) CUMULATIVE DISTRIBUTION 0.9999 0 . 9 9 9 0 . 9 9 Log-N ■Di str )rmal ibi itinn 0 . 9 5 8 0 6 0 0 . 4 0 0.20 010 0 . 5 1 2 4 6 8 1 0 SIGNIFICANT WAVE HEIGHT (M) F i g u r e 3 . 1 0 C u m u l a t i v e d i s t r i b u t i o n o f t h e s e d a t a u s i n g L o g - N o r m a l p r o b a b i l i t y p a p e r ( O c h i a n d W h a l e n ) CUMULATIVE DISTRIBUTION 0.999 0 . 9 9 0 . 9 5 0 . 9 0 0 . 8 0 0 . 6 0 WEIBULL DISTRIBUTION 0 . 4 0 0.20 0.10 SIGNIFICANT WAVE HEIGHT (M) Figure 3.11 Cumulative distribution of these data using Weibull probability paper (Ochi and Whalen) U1 RETURN PERIOD U) o ZD O < £ Probabil ity 9 8 7 6 Aleutian Islands,West Alaska 5 .01 .10 .30 .50 .70 .80 .90 .95 .98 .99 .995 .999 J L 1 __I I -2-10 1 2 3 4 Reduced Variable, y Figure 3.12 Asymptotic distribution curves for region 1. — indicates the first type, --- indicates the third type, and + indicates observed annual maximum magnitude (Yegulalp 74) RETURN PERIOD 2 5 10 20 50 100 200 500 Yrs 9 8 o n 7 o £ 6 Aleutian Islands, West Alaska 5 I I I 11 I I i 1 J__U—I--1 --1 ---1 1ty .01 .10 .30 .50.70.80 .90 .95 .98 .99 .995 .999 - 2 - 1 0 1 2 3 4 5 6 7 Reduced Variable, y Figure 3.13 Comparison of the results obtained by using 1904-1952 and 1904-1965 data. ---- indicates the asymptotic distribution obtained from 1904-1952 data, and.-— . — indicates the asymptotic, distribution curve obtained from 1904-1965 data. + indicates 1904-1965 data, and o indicates 1904-1952 data . Ui co MAGNITUDE RETURN PERIOD (YRS) 500 1000 30 40 50 100 200 8 XX 7 6 5 .995 .997 .999 .950 .9.80 .990 0.10 .500 .800 .100 .900 G(X) Figure 3.14 The modified first asymptotic distribution and observed maximum magni tudes. 
1a, the Tokyo region for the period 1213-1931; 1b, the Tokyo region for the period 1906-1979; 2, the Guatemalan region; 3, the Managuan region; 4, the Los Angeles region (Kim 1983).

Figure 3.15 Comparison of asymptotic distribution curves for the modified first and third asymptotic functions; the solid curve indicates the modified first type and the dashed curve the third type (Kim 1983).

Figure 3.16 Probabilities of exceeding magnitude M in t years in Chile, 1900-1980; the solid and dashed curves indicate the modified first type and the Gumbel third type, respectively (Kim 83).

Figure 3.17 Probabilities of exceeding magnitude M in t years in Peru, 1900-1980; the solid and dashed curves indicate the modified first type and the Gumbel third type, respectively (Kim 83).

3.3 Random Number of Random Variables and the Partial Duration Series

3.3.1 Introduction

The work described in this section is an attempt to develop a theory that is more descriptive of actual geophysical behavior than the classical theory. Many phenomena, such as stream flow, wind speed, and wave height, show a typical behavior in which relatively long periods at a relatively low level are interrupted by excursions to much higher values. A partial duration series (PDS) is composed of the peak exceedances of the parameter above a base level which is above the range of typical values, together with the times when these exceedances occur. The PDS is studied as a random number of random variables. This is more realistic than the classical extreme value theory for several reasons: first, when the truncation base level is taken to be sufficiently high, the assumption of stochastic independence among individual exceedances becomes physically plausible; second, the assumption that the number of exceedances in a fixed time interval is a random variable not only has unequivocal empirical support but also allows this approach to be applied to any arbitrary time interval, e.g., a month, which is not true for the classical extreme value theory. For long time intervals, e.g., a year, this feature provides a natural way to incorporate the effects of seasonal variation in the occurrence of the individual exceedances. Moreover, the mathematical formalism is general enough to treat stochastic dependence among the exceedances in a PDS and nonidentically distributed exceedances.

3.3.2 Definitions and Preliminaries

Since the study of a finite random number of random variables is more complicated to write down than the classical discussion of asymptotic limits, it is necessary to define a number of notations.

Figure 3.18 shows a time series with features typical of many geophysical phenomena. It happens to be a stream flow hydrograph (Figure 1 of Todorovic [55]), but it could easily represent wind speed values recorded every few hours in a place subject to occasional severe wind storms. Such places include not only coastal areas subject to hurricanes or typhoons, but also some inland areas, such as large parts of the American Great Plains.
Figure 3.18 Sample function of the process ρ(s).

Since the values vary in a random manner with time,

\rho = \{\rho(s),\ s \ge 0\}    (16)

is a continuous-parameter stochastic process. Let us select a base level x_0 and record only those values that exceed this level, i.e., the sequence of a random number of random variables in [0, t]:

\xi_1, \xi_2, \ldots, \xi_{\eta(t)}, \qquad t < \infty,\ \eta(t) < \infty.    (17)

We let τ(k) be the time of occurrence of ξ_k, so that

\eta(t) = \sup\{k;\ \tau(k) \le t\}, \qquad t \ge 0,\ k = 0, 1, \ldots,    (18)

and we note that for Δt > 0, η(t) ≤ η(t + Δt). The series in (17) over an n-year period is called the PDS corresponding to the geophysical record ρ(s).

Designate the largest of all ξ_k in [0, t] by

X(t) = \sup_{\tau(k) \le t} \xi_k, \qquad \xi_0 = 0,\ \tau(0) = 0,    (19)

or, equivalently,

X(t) = \sup_{0 \le k \le \eta(t)} \xi_k, \qquad \xi_0 = 0,\ \tau(0) = 0.

It is apparent that X(t) is a stochastic process, each of whose sample functions is a nondecreasing step function; i.e., X(t_1) ≤ X(t_2) when t_1 < t_2. Figure 3.19a, from Todorovic and Woolhiser, shows the process X(t) and leads to the consideration of event times.

Let T(t) be the instant in [0, t] when the extreme X(t) is achieved. By definition, T(t) represents another stochastic process with nondecreasing step functions, since for every Δt > 0, 0 ≤ T(t) ≤ T(t + Δt) ≤ t + Δt. Figure 3.19b shows an example of T(t) corresponding to Figure 3.19a.

Finally, let E_k^t be the event that exactly k exceedances occur in [0, t]; i.e.,

E_k^t = \{\eta(t) = k\}.    (20)

Figure 3.19a Graphical representation of the process X(t).
Figure 3.19b Graphical representation of the process T(t). [For t < τ(1), T(t) = 0; for τ(1) ≤ t < τ(4), T(t) = τ(1); for τ(4) ≤ t, T(t) = τ(4).]

Then it is easy to see that

E_k^t = \{\tau(k) \le t < \tau(k+1)\} = \{\tau(k) \le t\} - \{\tau(k+1) \le t\},

E_i^t \cap E_j^t = \emptyset, \quad i \ne j = 0, 1, \ldots,    (21)

\bigcup_{k=0}^{\infty} E_k^t = \Omega,

P(E_k^t) = P\{\tau(k) \le t\} - P\{\tau(k+1) \le t\},

where ∅ and Ω are the null and sure sets, respectively.

We now have a way to deal with risk statistics that differs from the classical one. The risk of having an event (flood, wind, earthquake, or whatever is currently of interest) larger than a given magnitude can be determined from the distribution of X(t), and the times of greatest risk are given by the distribution of T(t).

3.3.3 Compound Distribution Functions

In this section the main goal is to find

F_t(x) = P\{X(t) \le x\}.    (22)

It is easy to verify that the most general form of (22) is

F_t(x) = P(E_0^t) + \sum_{k=1}^{\infty} P\Bigl\{\bigcap_{v=1}^{k}(\xi_v \le x) \cap E_k^t\Bigr\}.    (23)

Using (23), we first want to investigate the distribution function of X(t) for nonidentically distributed exceedances, via the following theorem.

Theorem 3.3 (Todorovic). Suppose each ξ_v is independent of τ(v) and τ(v+1), and the ξ_v are mutually independent, with

H_v(x) = P\{\xi_v \le x\}.    (24)

Then

F_t(x) = P(E_0^t) + \sum_{k=1}^{\infty} \Bigl[\prod_{v=1}^{k} H_v(x)\Bigr] P(E_k^t).    (26)

Proof: By (21), we have

P\Bigl\{\bigcap_{v=1}^{k}(\xi_v \le x) \cap E_k^t\Bigr\}
= P\Bigl\{\bigcap_{v=1}^{k}(\xi_v \le x) \cap [\tau(k) \le t < \tau(k+1)]\Bigr\}
= P\Bigl\{\bigcap_{v=1}^{k}(\xi_v \le x) \cap \tau(k) \le t\Bigr\} - P\Bigl\{\bigcap_{v=1}^{k}(\xi_v \le x) \cap \tau(k+1) \le t\Bigr\}
= \prod_{v=1}^{k} H_v(x)\,\{P[\tau(k) \le t] - P[\tau(k+1) \le t]\}
= \prod_{v=1}^{k} H_v(x)\,P(E_k^t).

Substituting this final expression into (23) completes the proof.

If, in addition to the assumptions of the theorem above, we assume identically distributed exceedances,

P\{\xi_v \le x\} = H(x),

then (26) becomes

F_t(x) = P(E_0^t) + \sum_{k=1}^{\infty} [H(x)]^k\,P(E_k^t),    (27)

and letting x = 0 in (27) yields F_t(0) = P(E_0^t), which by definition is the probability that there will be no exceedances in [0, t]. Figure 3.20 shows that F_t(x) is not differentiable at the point x = 0.

Figure 3.20 Graphical illustration that F_t(x) is not differentiable at the point x = 0.
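Equation (27) is straightforward to evaluate numerically once the count probabilities P(E_k^t) and the magnitude distribution H are specified. The sketch below is a minimal illustration; the Poisson count probabilities and exponential H are assumed purely for the example, since (27) itself accepts any count distribution and any H.

```python
import math

def compound_cdf(x, count_probs, H):
    """Evaluate F_t(x) = P(E_0) + sum_k H(x)^k P(E_k), eq. (27),
    with the series truncated at len(count_probs) - 1 terms."""
    Hx = H(x)
    return sum(p * Hx ** k for k, p in enumerate(count_probs))

# Illustrative inputs (assumed, not taken from the text):
lam_count = 3.0   # mean number of exceedances in [0, t]
count_probs = [math.exp(-lam_count) * lam_count ** k / math.factorial(k)
               for k in range(60)]
H = lambda x: 1.0 - math.exp(-2.0 * x)   # exponential magnitude distribution

print(compound_cdf(1.5, count_probs, H))
# With Poisson counts this matches the closed form exp(-lam*(1 - H(x))),
# which reappears later as eq. (34):
print(math.exp(-lam_count * (1.0 - H(1.5))))
```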
3.3.4 The Joint Distribution of X(t) and Its Time of Occurrence T(t) for Identically Distributed Exceedances

Theorem 3.4 (Gupta et al.). Consider the sequences {τ(i) ≤ t; i = 1, 2, ...} and {ξ_i; 1 ≤ i ≤ η(t)}.

(a) If {ξ_i} is a set of independent, identically distributed random variables with H(x) = P{ξ_i ≤ x} for all i ≥ 1, x ≥ 0, H(x) being continuous, and

(b) if the sequences {τ(i)} and {ξ_i} are stochastically independent,

then

F_t(x, u) = P\{X(t) \le x,\ T(t) \le u\}
= P\{\eta(t) = 0\} + \sum_{n=1}^{\infty} \frac{[H(x)]^n}{n}\,P\{\eta(t) = n\} \sum_{m=1}^{n} P\{\tau(m) \le u \mid \eta(t) = n\},
\qquad 0 \le u \le t,\ x \ge 0.    (28)

Proof: Define N_n to be the random index such that N_n = m if ξ_m = max(ξ_1, ξ_2, ..., ξ_n), 1 ≤ m ≤ n. Then

P\{N_n = m,\ \max_{1 \le i \le n} \xi_i \le x\}
= P\{\xi_m \le x,\ \xi_m = \max_{1 \le i \le n} \xi_i\}
= P\Bigl\{\bigcap_{i \ne m}(\xi_i \le \xi_m) \cap (\xi_m \le x)\Bigr\}
= \int_0^x P\Bigl\{\bigcap_{i \ne m}(\xi_i \le y)\Bigr\}\,d[H(y)] \qquad \text{by (a)}
= \int_0^x [H(y)]^{n-1}\,d[H(y)] = \frac{[H(x)]^n}{n}, \qquad 1 \le m \le n.    (29)

Now, within a fixed time interval [0, t], define N(t) to be the random index such that if X(t) = ξ_m = max_{1≤i≤η(t)} ξ_i, then N(t) = m. Since T(t) = τ[N(t)], ξ_0 = 0 and τ(0) = 0, we get

F_t(x, u) = \sum_{n=0}^{\infty} P\{X(t) \le x,\ \tau[N(t)] \le u,\ \eta(t) = n\}, \qquad 0 \le u \le t,\ x \ge 0
= P\{\eta(t) = 0\} + \sum_{n=1}^{\infty} P\{X(t) \le x,\ \tau[N(t)] \le u,\ \eta(t) = n\}
= P\{\eta(t) = 0\} + \sum_{n=1}^{\infty} \sum_{m=1}^{n} P\{\max_{1 \le i \le n} \xi_i \le x,\ \tau(m) \le u,\ N(t) = m,\ \eta(t) = n\}.

By definition N(t) = N_n on the set η(t) = n, which along with (b) yields

F_t(x, u) = P\{\eta(t) = 0\} + \sum_{n=1}^{\infty} \sum_{m=1}^{n} P\{\max_{1 \le i \le n} \xi_i \le x,\ N_n = m\}\,P\{\tau(m) \le u,\ \eta(t) = n\},

and plugging (29) into this equation leads to (28).

3.3.5 Distribution of the Number of Exceedances η(t)

Shane and Lynn [44] claim that under certain assumptions η(t) is a time-homogeneous Poisson process. Various proofs of this are available in the book in preparation by Kac and Iranpour [27], and we adapt the most mathematically convenient one to prove this in the language of extremes in civil engineering.

Theorem 3.5. Let η(t) be defined as in (18). Assume:

(1) the increments of η(t) over nonoverlapping intervals are independent, i.e., η(t + Δt) − η(t) is independent of η(s + Δs) − η(s) whenever (t, t + Δt) and (s, s + Δs) do not overlap;

(2) a. P\{\eta(t+\Delta t) - \eta(t) = 0\} = 1 - \lambda\Delta t + \theta(\Delta t),
    b. P\{\eta(t+\Delta t) - \eta(t) = 1\} = \lambda\Delta t + \theta(\Delta t),
    c. P\{\eta(t+\Delta t) - \eta(t) = k\} = \theta(\Delta t) \text{ for } k \ge 2,

where θ(Δt) is of higher order than Δt, i.e., θ(Δt)/Δt → 0 as Δt → 0; and

(3) the initial condition

P[\eta(0) = 0] = 1, \qquad P[\eta(0) = k] = 0 \text{ for } k \ge 1,

i.e., the initial state vector is (1, 0, 0, \ldots)^T.

We now want to prove that

P\{\eta(t) = k\} = \frac{e^{-\lambda t}(\lambda t)^k}{k!}.    (30)

For shorter notation, denote P_k(t) = P{η(t) = k}. Due to the independence of increments, we may begin by writing

P\{\eta(t+\Delta t) = k\} = P\{\eta(t) = k\}\,P\{\eta(t+\Delta t) - \eta(t) = 0\}
+ P\{\eta(t) = k-1\}\,P\{\eta(t+\Delta t) - \eta(t) = 1\}
+ P\{\eta(t) = k-2\}\,P\{\eta(t+\Delta t) - \eta(t) = 2\}
+ \text{higher-order terms, which are } \theta(\Delta t).

Using the shorter notation, we have

P_k(t+\Delta t) = P_k(t)(1 - \lambda\Delta t) + P_{k-1}(t)\,\lambda\Delta t + \theta(\Delta t), \qquad k \ge 1.

Rearranging and letting Δt → 0 gives the recursion formula

\frac{dP_k(t)}{dt} = -\lambda P_k(t) + \lambda P_{k-1}(t).    (31)

From the recursion formula (31), we get the linear system

\frac{d}{dt}\begin{pmatrix} P_0 \\ P_1 \\ P_2 \\ \vdots \end{pmatrix}
= \lambda \begin{pmatrix} -1 & 0 & 0 & \cdots \\ 1 & -1 & 0 & \cdots \\ 0 & 1 & -1 & \cdots \\ \vdots & & & \ddots \end{pmatrix}
\begin{pmatrix} P_0 \\ P_1 \\ P_2 \\ \vdots \end{pmatrix},

and, denoting the matrix in this equation as A, the fundamental matrix solution is

P(t) = e^{(t - t_0)A}\,P(t_0).

To compute e^{tA}, write A = λB − λI = λ(B − I), where B has ones on the first subdiagonal and zeros elsewhere. Since BI = IB, we can write

e^{tA} = e^{\lambda(B - I)t} = e^{B(\lambda t)}\,e^{I(-\lambda t)},

and to evaluate this in general, let us do the computations for the case when A is 3 by 3:
A = \lambda \begin{pmatrix} -1 & 0 & 0 \\ 1 & -1 & 0 \\ 0 & 1 & -1 \end{pmatrix}
= \lambda \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} - \lambda \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \lambda B - \lambda I.

Then

e^{I(-\lambda t)} = \begin{pmatrix} e^{-\lambda t} & 0 & 0 \\ 0 & e^{-\lambda t} & 0 \\ 0 & 0 & e^{-\lambda t} \end{pmatrix},
\qquad
e^{B(\lambda t)} = I + \frac{B(\lambda t)}{1!} + \frac{B^2(\lambda t)^2}{2!} + \cdots,

but B is nilpotent of order 3, i.e.,

B = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix},
\quad
B^2 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix},
\quad
B^n = 0 \text{ for } n \ge 3,

so that

e^{B(\lambda t)} = \begin{pmatrix} 1 & 0 & 0 \\ \lambda t & 1 & 0 \\ (\lambda t)^2/2! & \lambda t & 1 \end{pmatrix}.

Finally,

e^{tA} = e^{B(\lambda t)}\,e^{I(-\lambda t)} = \begin{pmatrix} e^{-\lambda t} & 0 & 0 \\ \lambda t\,e^{-\lambda t} & e^{-\lambda t} & 0 \\ \frac{(\lambda t)^2}{2!}e^{-\lambda t} & \lambda t\,e^{-\lambda t} & e^{-\lambda t} \end{pmatrix}.

Note that t_0 = 0, so

P(t) = e^{tA}\,P(0) = \begin{pmatrix} e^{-\lambda t} \\ \lambda t\,e^{-\lambda t} \\ \frac{(\lambda t)^2}{2!}e^{-\lambda t} \end{pmatrix}.

Thus, since A is in Jordan form, familiarity with the properties of Jordan matrices allows us to write e^{tA} by inspection in the general case, giving

P_k(t) = \frac{(\lambda t)^k}{k!}\,e^{-\lambda t}, \qquad k = 0, 1, 2, \ldots,

which finishes the proof of (30).

3.3.6 Applications to Civil Engineering

Todorovic and Zelenhasic [52] claim that, under certain assumptions, it has been shown by Todorovic and Yevjevich that one can get

P(E_k^t) = P\{\eta(t) = k\} = \frac{e^{-\Lambda(t)}[\Lambda(t)]^k}{k!},    (32)

where

\Lambda(t) = \int_0^t \lambda(s)\,ds = \sum_{k=1}^{\infty} k\,P(E_k^t).    (33)

This is a time-dependent Poisson process. To provide some evidence to support assumption (32), Todorovic [55] analyzes a 72-year flood record of the Greenbrier River at Alderson, West Virginia, covering 1896-1967, in the form of a PDS with the base level x_0 = 17,000 ft³/s; 205 exceedances occurred. Figure 3.21, from Todorovic [55], shows good agreement between the observed and theoretical Poisson distributions of the number of exceedances for periods of 20, 60, 100, 140, 160, 180, 200, 220, 240, and 365 days, which supports their hypothesis.

Figure 3.21 Observed and corresponding theoretical (Poisson) distributions of the number of exceedances for intervals of 20, 60, 100, 140, 160, 180, 200, 220, 240, and 365 days for the Greenbrier River at Alderson, West Virginia.
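A small simulation makes Theorem 3.5 concrete: counting the exceedances of a homogeneous Poisson stream in [0, t] and comparing the empirical frequencies with e^{-λt}(λt)^k/k!. This is an illustrative sketch, not the analysis performed on the Greenbrier record; λ and t below are arbitrary.

```python
import math
import random

random.seed(1)
lam, t, trials = 0.8, 5.0, 100_000

def count_exceedances(lam, t):
    """Generate exceedance epochs with exponential inter-arrival times
    (rate lam) and count how many fall in [0, t]."""
    s, k = 0.0, 0
    while True:
        s += random.expovariate(lam)
        if s > t:
            return k
        k += 1

counts = [count_exceedances(lam, t) for _ in range(trials)]
for k in range(9):
    empirical = counts.count(k) / trials
    poisson = math.exp(-lam * t) * (lam * t) ** k / math.factorial(k)
    print(f"k={k}  observed {empirical:.4f}  theoretical {poisson:.4f}")
```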
Substituting equation (32) into (27) gives

F_t(x) = \exp\{-\Lambda(t)[1 - H(x)]\},    (34)

and for all applications of this distribution function we need to find H(x), the distribution of the magnitudes of the exceedances. Todorovic and Zelenhasic use data from the Susquehanna River near Wilkes-Barre, Pennsylvania, and Figure 3.22, from their paper, shows that the exponential distribution

H(x) = 1 - e^{-\lambda x}    (35)

fits the data for λ = 2.628 × 10⁻⁵ (ft³ s⁻¹)⁻¹. Todorovic [55] has also shown that (35) is applicable to the Greenbrier River; Figure 3.23 is from that paper.

Figure 3.22 Observed distributions of the magnitude of flood peak exceedance flows for the winter season, spring season, and the entire year for the Susquehanna River at Wilkes-Barre, Pennsylvania.

Figure 3.23a Greenbrier River at Alderson, West Virginia. Observed distribution functions of exceedances for four different periods (winter, spring, summer and fall, and the entire year).

Figure 3.23b Greenbrier River at Alderson, West Virginia. Theoretical (exponential) and observed (one-year period) distribution functions of exceedances.

We now substitute (35) into (34) to obtain

F_t(x) = \exp\{-\Lambda(t)\,e^{-\lambda x}\},    (36)

and it is easy to note that this equation is completely determined by λ and Λ(t). Fourier series curve fitting has been applied to the data for the Greenbrier River to find

\Lambda(t) = 0.2475 + 0.1583\,t + 0.5086\cos\left[\frac{2\pi t}{18} + 0.6841\pi\right]
+ 0.0556\cos\left[\frac{2\pi t}{9} - 0.1476\pi\right]
+ 0.0154\cos\left[\frac{2\pi t}{6} + 0.7780\pi\right]
+ 0.0142\cos\left[\frac{2\pi t}{4.5} + 0.6742\pi\right].    (37)

This was first done by Todorovic and Zelenhasic [52] and Todorovic and Woolhiser [53]. Figure 3.24 shows the results.

Figure 3.24a Greenbrier River at Alderson, West Virginia. Seasonal occurrence of exceedances (October through September).

Figure 3.24b Greenbrier River at Alderson, West Virginia. Observed function Λ(t) and fitted function.

As mentioned earlier, Lynn and Shane's result that the number of exceedances is a time-homogeneous Poisson process does not seem to be general enough, and from Figure 3.24b we see that their choice of

\Lambda(t) = \lambda t    (38)

is obviously not the case. Finding λ = 8.821 × 10⁻⁵ (ft³ s⁻¹)⁻¹ and using Λ(t) from (37) for the Greenbrier River, we can see a comparison between the theoretical and observed distribution functions for t = 140, t = 180, and t = 365 days in Figure 3.25, from Todorovic and Rousselle [54].

Figure 3.25 Theoretical and observed distribution functions of the largest exceedance for intervals of 140, 180, and 365 days for the Greenbrier River at Alderson, West Virginia.

In the case of Theorem 3.4 (Gupta et al.), equation (28) represents a completely general and exact (nonasymptotic) expression for the joint distribution function F_t(x, u) under the assumptions indicated previously. On the basis of (32), the conditional probability P{τ(m) ≤ u | η(t) = n} in (28) admits an explicit representation. Gupta et al. [21] show that

\sum_{m=1}^{n} P\{\tau(m) \le u \mid \eta(t) = n\} = n\,P\{U_i \le u\} = n\,\frac{\Lambda(u)}{\Lambda(t)}, \qquad 0 \le u \le t,    (39)

and when (39) and (32) are substituted in (28), we get

F_t(x, u) = e^{-\Lambda(t)} + \frac{\Lambda(u)}{\Lambda(t)}\Bigl[\exp\{-\Lambda(t)[1 - H(x)]\} - e^{-\Lambda(t)}\Bigr], \qquad x \ge 0,\ 0 \le u \le t.    (40)

They also demonstrate certain mathematical properties of F_t(x, u), as follows. First, they show that (40) is discontinuous at the origin, i.e.,

\lim_{u \to 0,\ x \to 0} F_t(x, u) = e^{-\Lambda(t)},    (41)

and that in the remaining region x > 0, 0 < u ≤ t it is differentiable, so

f_t(x, u) = \frac{\partial^2}{\partial x\,\partial u} F_t(x, u).    (42)

Then they find the marginal distributions, that is,

P\{X(t) \le x\} = \lim_{u \to t} F_t(x, u) = \exp\{-\Lambda(t)[1 - H(x)]\}, \qquad x \ge 0,    (43)

P\{T(t) \le u\} = \lim_{x \to \infty} F_t(x, u) = e^{-\Lambda(t)} + \frac{\Lambda(u)}{\Lambda(t)}\bigl[1 - e^{-\Lambda(t)}\bigr], \qquad 0 \le u \le t.    (44)

Equation (43) is exactly the same as (34), and (44) was first derived by Todorovic and Woolhiser [53].
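Equations (36) and (37) are easy to evaluate together; the sketch below does so, with the coefficients transcribed from (37). The time unit of the Fourier fit is not stated in the transcript (the harmonic periods 18, 9, 6, and 4.5 suggest 36 units per year), so the evaluation points are illustrative only.

```python
import math

def Lambda(t):
    """Fourier fit (37) for the Greenbrier River exceedance rate.
    Time unit assumed: 36 units ~ one year (inferred, not stated)."""
    return (0.2475 + 0.1583 * t
            + 0.5086 * math.cos(2 * math.pi * t / 18 + 0.6841 * math.pi)
            + 0.0556 * math.cos(2 * math.pi * t / 9 - 0.1476 * math.pi)
            + 0.0154 * math.cos(2 * math.pi * t / 6 + 0.7780 * math.pi)
            + 0.0142 * math.cos(2 * math.pi * t / 4.5 + 0.6742 * math.pi))

def F_t(x, t, lam=8.821e-5):
    """Eq. (36): CDF of the largest exceedance in [0, t], using the
    exponential magnitude distribution (35) with rate lam (1/cfs)."""
    return math.exp(-Lambda(t) * math.exp(-lam * x))

for x in (10_000, 30_000, 60_000):   # exceedance above base level, cfs
    print(x, round(F_t(x, t=36.0), 4))
```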
Two discussions which have gone deeper into mathematical-theoretic results are those of Gupta et al., who have developed the joint distribution function of the largest flood peak and its time of occurrence for nonidentically distributed exceedances, and of Todorovic [55], who has developed a more general form for F_t(x) in (26) and (27) under less restrictive assumptions than those in Theorem 3.3 (Todorovic); i.e., he has replaced (24) with a weaker independence assumption (45). He has also used advanced mathematical concepts, such as Borel sets, in his generalization, which have doubtful physical applicability; thus, we avoid such mathematical complications in this section.

To end this chapter, we briefly explain how Liu and Ma developed their Poisson-Gumbel compound distribution to predict the highest waves in typhoons which hit the coast of China. Their ideas are mathematically quite similar to those of Todorovic et al., as may be seen by comparing Theorem 3.3 and Theorem 3.6.

Theorem 3.6 (Liu and Ma). Let ξ and α be random variables with distributions G(x) and Q(x), respectively. Designate ξ_i as the i-th independent observation of ξ, and let η ≥ 0 be an integer-valued random variable independent of ξ and α. Let P(η = k) = p_k, k = 0, 1, ..., and define

\rho = \alpha \text{ if } \eta = 0; \qquad \rho = \max_{1 \le i \le \eta} \xi_i \text{ if } \eta \ge 1.

Then

F_\rho(x) = \sum_{k=0}^{\infty} p_k [G(x)]^k - p_0[1 - Q(x)].

Proof:

F_\rho(x) = P\{\rho \le x\}
= P\{\rho \le x \cap \eta = 0\} + \sum_{k=1}^{\infty} P\{\rho \le x \cap \eta = k\}
= P\{\rho \le x \mid \eta = 0\}P\{\eta = 0\} + \sum_{k=1}^{\infty} P\{\rho \le x \mid \eta = k\}P\{\eta = k\}
= P(\alpha \le x)\,p_0 + \sum_{k=1}^{\infty} P\bigl(\max_{1 \le i \le k} \xi_i \le x\bigr)\,p_k
= p_0 Q(x) + \sum_{k=1}^{\infty} p_k [G(x)]^k
= \sum_{k=0}^{\infty} p_k [G(x)]^k - p_0[1 - Q(x)]
= F_0(x) - e(x),

where

F_0(x) = \sum_{k=0}^{\infty} p_k [G(x)]^k, \qquad e(x) = p_0[1 - Q(x)].

Assuming lim E = E lim, then lim_{x→∞} F_ρ(x) = 1 and lim_{x→−∞} F_ρ(x) = 0; but, both because p_0 is small and for computational convenience, they use F_ρ(x) ≈ F_0(x) in their application. Of course, F_0(x) is not quite a distribution function; i.e., lim_{x→∞} F_0(x) = 1, but lim_{x→−∞} F_0(x) = p_0 ≠ 0.

Tables 3.4 and 3.5 and Figure 3.26 show the results of their method in comparison with the annual maximum series method using the Gumbel distribution. Table 3.4 also contains some results from the Pearson Type III distribution. All of these results are based on their computer simulations.

Table 3.4 Comparison of design wave heights for once in many years at station three (design wave height in meters)

Method of calculation         | Period of data      | Number of data | Once in 100 yr | Once in 50 yr | Once in 20 yr | Once in 10 yr
P-III type curve              | 27 yr (1949-1975)   | 27             | 12.31          | 11.34         |  9.98         | 8.94
Gumbel curve                  | 27 yr (1949-1975)   | 27             | 12.99          | 11.84         | 10.30         | 9.21
Our method (for 27 yr)        | 27 yr (1949-1975)   | 86             | 12.97          | 11.97         | 10.59         | 9.52
Our method (for front 9 yr)   |  9 yr (1949-1957)   | 28             | 12.05          | 11.07         |  9.72         | 8.72
Our method (for middle 9 yr)  |  9 yr (1958-1966)   | 23             | 12.24          | 11.29         |  9.99         | 8.99
Our method (for rear 9 yr)    |  9 yr (1967-1975)   | 35             | 13.84          | 12.84         | 11.44         | 10.35

Note: "Our method" refers to Liu and Ma's method.
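With Poisson counts p_k = e^{-λ}λ^k/k! and a Gumbel G, the F_0(x) of Theorem 3.6 collapses to the closed form exp{-λ[1 - G(x)]}, which is the Poisson-Gumbel compound distribution. A minimal sketch follows; the parameter values are illustrative assumptions, not Liu and Ma's fitted values.

```python
import math

def gumbel_cdf(x, u, alpha):
    # G(x) for a Gumbel distribution with location u and scale alpha
    return math.exp(-math.exp(-(x - u) / alpha))

def poisson_gumbel_F0(x, lam, u, alpha, kmax=80):
    """F0(x) = sum_k p_k G(x)^k with Poisson p_k; equals
    exp(-lam*(1 - G(x))) in closed form."""
    G = gumbel_cdf(x, u, alpha)
    series = sum(math.exp(-lam) * lam ** k / math.factorial(k) * G ** k
                 for k in range(kmax))
    closed = math.exp(-lam * (1.0 - G))
    return series, closed

# lam = mean number of typhoons per year; u, alpha = Gumbel parameters
# of the per-typhoon maximum wave height (all assumed for illustration).
print(poisson_gumbel_F0(x=10.0, lam=3.2, u=6.0, alpha=1.5))
```

Design heights then come from solving F_0(x) = 1 - 1/T for the return period T, which is how entries such as those in Table 3.4 are produced.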
Figure 3.26 Comparison of statistical experiment values: wave height versus return period and frequency for the Gumbel distribution and the compound extreme value distribution.

Table 3.5 Comparison of statistical experiment values (calculated values and errors, in meters)

Method of calculation                              | Once in 10,000 yr | Once in 2,000 yr | Once in 1,000 yr | Once in 500 yr
Statistical experiment value                       | 9.80              | 8.20             | 7.80             | 7.26
Compound extremum distribution, calculated value   | 10.30             | 8.69             | 8.01             | 7.31
Compound extremum distribution, error              | 0.50              | 0.49             | 0.21             | 0.06
Gumbel distribution, calculated value              | 11.12             | 9.40             | 8.63             | 7.90
Gumbel distribution, error                         | 1.32              | 1.20             | 0.83             | 0.65

CHAPTER 4

THE NEW PREDICTION METHOD AND MODELS FOR SIMULATION

4.1 Introduction

The methods used in this investigation fall into several groups. First, there is the basic mathematical formulation of the intuitive concepts discussed in the previous chapters. Many of the operations involved in this formulation, such as integrals and the solution of nonlinear equations, are infinitary, and so the particular methods needed to approximate these processes on a computer must be discussed next. There are several alternative approaches, with differing degrees of reliability and efficiency; one of these has been selected for the computer program that was written. Finally, since the tests reported here are based on simulated data, the simulation methods are discussed. Several variations of the basic model were used, with varying degrees of realism.

4.2 Basic Mathematics

Consider the joint cumulative distribution of two random variables X, Y,

F(x, y) = P(X \le x,\ Y \le y).

For our purposes, we assume that F(x, y) is continuous in x and y, sufficiently differentiable and integrable that all the derivatives and integrals below exist, and that no problems arise with changing the order of integration when necessary. These assumptions appear reasonable for this application. We can say that wind speeds and wave heights vary over the nonnegative real numbers up to an indefinite upper limit, although if the wave heights are measured in feet and the wind speeds in miles per hour, the available instruments do not give more than one decimal place of precision. Furthermore, considering the turbulent motions of the atmosphere which are the principal concern of this study, and considering the irregularity of the crests of large waves, we can see that wind speed and wave height are quantities that are not defined with high precision. Using connected subsets of the real line as the domain of variation of random wind speeds and wave heights is thus something of an idealization. Nevertheless, there is no evidence that this idealization is misleading; conventional probability theory is formulated in terms of measures on the real line, and the concepts formulated here may be applicable to other linked quantities that can be defined with higher precision. It seems difficult to imagine a collection of measurements of continuously varying macroscopic physical quantities which could not be approximated by a distribution having the desired differentiability and integrability properties, and this is what will be used.
So we will follow the standard notation in writing the joint probability density

f(x, y) = \frac{\partial^2 F(x, y)}{\partial x\,\partial y},

the marginal density functions

f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy, \qquad f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx,

the marginal cumulative distributions

F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt = \int_{-\infty}^{x}\int_{-\infty}^{\infty} f(t, y)\,dy\,dt = P(X \le x),

F_Y(y) = \int_{-\infty}^{y} f_Y(t)\,dt = \int_{-\infty}^{y}\int_{-\infty}^{\infty} f(t, x)\,dx\,dt = P(Y \le y),

and, for each x, the conditional density

f_c(y/x) = \frac{f(x, y)}{f_X(x)}.

The conditional cumulative distribution function is then

F_c(y/x) = P(Y \le y / X = x) = \int_{-\infty}^{y} f_c(t/x)\,dt,

and thus

\int_{-\infty}^{\infty} F_c(y/x)\,f_X(x)\,dx
= \int_{-\infty}^{\infty}\left[\int_{-\infty}^{y} f_c(t/x)\,dt\right] f_X(x)\,dx
= \int_{-\infty}^{\infty}\int_{-\infty}^{y} \frac{f(x, t)}{f_X(x)}\,dt\,f_X(x)\,dx
= \int_{-\infty}^{y}\int_{-\infty}^{\infty} f(x, t)\,dx\,dt = F_Y(y),

so that the marginal distribution of the dependent variable can be considered as an appropriately weighted sum of the conditional distributions.

The above development shows that, in idealized terms, the conditional distribution method is a natural way to obtain the desired marginal distribution. It is also possible to justify the regression method in these terms, but it is more difficult. If we consider that with F(x, y) as assumed, with x, y varying over connected subsets of the real line, F_X(x) and F_Y(y) are both monotonic increasing functions from their respective domains to [0, 1], then for any y with F_Y(y) ∈ (0, 1) there exists a unique x such that F_Y(y) = F_X(x), so that we can define a function X̂(y) which makes F_Y(y) = F_X(X̂(y)). We can also define an idealized regression function r(x) between x and y by letting r(x) = y such that

\int_{-\infty}^{\infty}(y - t)^2 f_c(t/x)\,dt = \inf_{-\infty < u < \infty} \int_{-\infty}^{\infty}(u - t)^2 f_c(t/x)\,dt.

Note that even though the interval is not compact, any physically realistic conditional density function will ensure that the infimum is attained in the interior of the interval. Then if r(x) is monotonic, it will have an inverse, and it is possible to show that r⁻¹(y) is the desired X̂(y) which allows us to convert F_X(x) into F_Y(y). In the nonmonotonic case, where r⁻¹(y) is not a point but a set, one can obtain F_Y(y) by summing (or integrating, in the uncountable case) f_X(x) over the set {x : x ∈ r⁻¹(y)}. This derivation is not relevant to the method being presented, so there is no need to discuss details here, but it seems worth noting in passing that the difference in effectiveness between the regression method and the conditional distribution method does not occur in this idealized mathematical formulation (which corresponds to using an uncountable infinity of observations) but in the practical case, where only a finite number of observations are used.

The desired risk statistic is the minimum event which will have a given recurrence period. Since F_Y(y) = P(Y ≤ y), the average recurrence period of an event Y > y is

R(y) = \frac{1}{1 - F_Y(y)}.

Given a return period T, we want y = R⁻¹(T). Clearly F_Y(y) = 1 − 1/T, hence

R^{-1}(T) = F_Y^{-1}\!\left(1 - \frac{1}{T}\right).

Thus, our real goal is to be able to find R⁻¹(T) for practically interesting values of T. For the sort of application which prompted this research, these values of T are likely to range from a few decades to a few centuries.
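For a distribution with a closed-form inverse, R⁻¹(T) is immediate; e.g., for a Gumbel marginal, F_Y⁻¹(p) = U − α ln(−ln p). A minimal sketch, with U and α chosen purely for illustration:

```python
import math

def design_value_gumbel(T, u, alpha):
    """R^{-1}(T) = F_Y^{-1}(1 - 1/T) for a Gumbel marginal F_Y."""
    p = 1.0 - 1.0 / T
    return u - alpha * math.log(-math.log(p))

for T in (10, 20, 50, 100):
    print(T, round(design_value_gumbel(T, u=8.0, alpha=2.0), 2))
```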
4.3 Practical Implementations

As suggested by the above discussion, the practical problem of finding R⁻¹(T) can be subdivided into two areas:

1) estimation of F_Y(y) from the available data;
2) evaluation of F_Y⁻¹ for the desired return periods.

Before discussing the details of methods for estimating conditional distributions, and of using them to estimate F_Y(y), it is worth noting that such complex formulas for estimating F_Y(y) could not be considered without the recognition that the numerical evaluation of F_Y⁻¹ can be accomplished relatively easily, despite the intractability of an analytic inversion. This is because all the estimates for F_Y(y) are linear combinations, with positive coefficients, of estimates for F_c(y/x). Since the estimates for F_c are monotone and the coefficients are positive, the estimates for F_Y(y) are also monotone. Thus a unique solution always exists and is not difficult to find. Only a few significant figures of the solution are needed, since, as noted before, wind speeds and wave heights are not determined with high precision. Also, these results are intended to be used for design specifications, which is another reason why a large number of significant figures is not needed. Thus, the numerical ease of computationally inverting this type of estimate for F_Y(y) was a significant consideration in choosing the approach that has been selected.

If we have a finite set of observations of X, a frequency histogram of these will approximate f_X(x); the approximation improves as n increases. If we have an estimate F̂_c(y/x_i) for F_c(y/x_i) for each x_i, then we can approximate F_Y(y) by

\frac{1}{n}\sum_{i=1}^{n} \hat{F}_c(y/x_i).

The quality of this approximation will depend both on n and on the accuracy of the F̂_c(y/x_i). There are several possible ways to make an estimate F̂_c(y/x_i), but the differences come later. All the approaches which have been considered begin by dividing the range of possible X values into intervals of finite width. The widths do not have to be uniform; for example, the intervals where events are sparser may need to be wider for improved counting statistics. Then, if the set of joint data points is

\Theta = \{(x_i, y_i)\}

and the intervals are given by 0 ≤ t_0 < t_1 < ... < t_p, we can let

\Theta_j = \{(x_i, y_i) \in \Theta \mid t_{j-1} \le x_i < t_j\}, \qquad j = 1, \ldots, p.

We will assume t_p > max{x_i}, so that all the data points fall within the intervals. Let q_j be the number of observations in Θ_j, and let the observations be given new subscripts so that the elements of Θ_j are written as (x_{1j}, y_{1j}), (x_{2j}, y_{2j}), ..., (x_{q_j j}, y_{q_j j}). Then we can write

\Theta_j = \{(x_{ij}, y_{ij})\}_{i=1}^{q_j} \qquad \text{for } j = 1, \ldots, p.

Now for each j we can find

\bar{Y}_j = \frac{1}{q_j}\sum_{i=1}^{q_j} y_{ij}, \qquad
S_j = \left[\frac{1}{q_j}\sum_{i=1}^{q_j}(y_{ij} - \bar{Y}_j)^2\right]^{1/2},

the mean and standard deviation of the Y values corresponding to the X values in the specified interval. If necessary, other statistical measures could be evaluated on each interval, using the standard formulas, but for these investigations so far the mean and standard deviation have been sufficient. These mean and standard deviation values obviously provide an approximation to

\mu(x) = \int_{-\infty}^{\infty} y\,f_c(y/x)\,dy \qquad \text{and} \qquad
\sigma(x) = \left\{\int_{-\infty}^{\infty}\bigl(y - \mu(x)\bigr)^2 f_c(y/x)\,dy\right\}^{1/2},

the idealized versions of the mean and standard deviation of the conditional distribution of Y for a given X = x. These discrete approximation values are used in calculating the estimates F̂_c(y/x_i), and the differences between the methods which have been investigated lie in the various ways of estimating the F̂_c(y/x_i) using the Ȳ_j and S_j values.
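A sketch of the binning step just described: given joint observations and bin edges t_0 < ... < t_p, compute q_j, Ȳ_j, and S_j per interval (the population form of S_j is assumed here; the text does not specify the divisor).

```python
from bisect import bisect_right

def bin_stats(pairs, edges):
    """Group (x, y) pairs into intervals [t_{j-1}, t_j) defined by
    `edges` and return (count, mean, std) of the y values per bin."""
    bins = [[] for _ in range(len(edges) - 1)]
    for x, y in pairs:
        j = bisect_right(edges, x) - 1
        if 0 <= j < len(bins):
            bins[j].append(y)
    stats = []
    for ys in bins:
        q = len(ys)
        if q == 0:
            stats.append((0, None, None))
            continue
        ybar = sum(ys) / q
        s = (sum((y - ybar) ** 2 for y in ys) / q) ** 0.5
        stats.append((q, ybar, s))
    return stats

# Illustrative data and wind-speed bin edges (mph):
pairs = [(12, 4.1), (18, 5.0), (25, 5.9), (33, 6.8), (37, 7.4)]
print(bin_stats(pairs, edges=[0, 10, 20, 30, 40, 50]))
```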
These discrete approximation values are used 119 A> i n c a l c u l a t i n g t h e e s t i m a t e s F c ( y / x £ ) , a n d t h e d i f f e r e n c e s b e t w e e n t h e m e t h o d s w h i c h h a v e b e e n i n v e s t i g a t e d a r e i n t h e v a r i o u s w a y s o f e s t i m a t i n g t h e A__________________ F c ( y / x j . ) u s i n g Y j _ a n d S j v a l u e s . 4 . 3 . 1 T h e " W i d e I n t e r v a l ” M e t h o d P e r h a p s t h e s i m p l e s t a p p r o a c h i s t o a s s i g n a n e s t i m a t i o n f u n c t i o n F Y , ( y ) t o e a c h o f t h e p i n t e r v a l s . A ^ A _ n a t u r a l v / a y t o d o t h i s i s t o l e t f y _ . ( y ; Y ^ , S j ) b e a n a p p r o x i m a t i o n t o •c (y/tj-! < x < tj) = _t f t j V f(x,y)dx tj_i J 3 f x ( x ) d x t j - l t h e n A_____ _ where fv (y;Y^,S^) is assumed to be a distribution of a J ^ ^ particular type, such as Frechet or Gumbel. Since the parameters which appear in the conventional definitions of these functions can be obtained from Yj and Sj, we can 120 consider any of them to be parameterized by Yj and Sj, A _ hence the notation fy. (y;Yj,Sj). The possibility mentioned above that one might calculate other statistical measures on the Y values of interval j was to A allow for the possibility that a function fy_. might be used which needed more parameters than Yj and Sj. So far this has not seemed necessary. A This formulation does not require the Fy . function to be of the same type. If it seemed necessary to fit the data, one could be a lognormal, another could be a Gumbel, and the third could be a Frechet, for example. This possibility also has not seemed necessary so far, and the methods discussed later in this section are based on the assumption of one type of function for all intervals. The choice of the type of function in practice might depend largely on theoretical arguments of the sort discussed in the papers reviewed in Chapter 3, since it would take a large number of data to be able to say that the best possible empirical fit of one type of distribution was significantly better than the best possible fit of another type. This is because moSt of the observations will fall near the peak of the distribution, and standard fitting methods will select parameters for 121 each type of distribution to give good agreement with the observations near the peak, even if the upper tails of the different distributions disagree with each other. This problem leads to questions of data analysis which are outside the scope of this dissertation. There is an established methodology of using the Kolmogorov-Smirnov test and related tests to compare empirical with theoretical cumulative distributions. Determining whether it is possible to do better than the Kolmogorov- Smironov test in this type of application might be a research question in itself. The present studies have followed the lead of numerous previous authors in using * Gumbel distribution as the Fv . function for Y values which are measurements, predictions, or simulations of wave height. / s Once we have our function FYj(y) for each interval j, we take all of our observations of X, with or without accompanying observations of Y, and put them into a set ( ■%n D - (xi)i=l* Let nj be the number of X^ values in the set 122 Dj = {xi e D!tj_x < Xi < tj}, so that P E nj = n- j=l The "wide interval" method is simply to approximate A Fc(y/xj.) 
4.3.2 The "Individual Slice" Method

As noted above, in the "wide interval" method the F̂_{Y_j} functions can be of different types for different values of j without posing any conceptual difficulties. It also is not difficult to implement in a program, particularly if one is using one of the usual scientific programming languages, which include statement types specifically designed to select among several alternatives based on a selection parameter such as j. The "computed GO TO" statement in FORTRAN is one example of this, but other languages contain similar features, such as SELECT in PL/1 and CASE in PASCAL.

However, if the distributions of the observations are the same type of distribution function, such as a Gumbel, then a modification of the wide interval method is possible. One may consider Ȳ_j, S_j to be assigned to some particular x coordinate in the interval. One might use the center of interval j,

C_j = \frac{t_{j-1} + t_j}{2},

or perhaps the average of the x coordinates of the joint observations that fall into that interval:

C_j = \frac{1}{q_j}\sum_{i=1}^{q_j} x_{ij}.

Then we can interpolate, extrapolate, or both among the pairs (C_j, Ȳ_j) and (C_j, S_j) to obtain functions Ȳ(x), S(x) which can be evaluated at each x_i ∈ D; a sketch is given at the end of this subsection. Using regression to obtain the functions Ȳ(x), S(x) is another possibility. In either case, instead of having an F̂_{Y_j} assigned to interval j, it is now possible to define

\hat{F}_c(y/x) = G\bigl(y;\ \bar{Y}(x), S(x)\bigr),

where G is the general formula for some type of distribution function, such as a Gumbel, parameterized by the continuously varying parameters Ȳ(x), S(x). Then the individual slice estimate is

\hat{F}_Y(y) = \frac{1}{n}\sum_{i=1}^{n} \hat{F}_c(y/x_i).

Here we are actually taking a different conditional distribution (a "slice" out of the joint distribution) for each value. If we use regression to get the continuous Ȳ(x), S(x), then we are modeling the variation of the parameters of the conditional distribution of our desired quantity, rather than modeling the variation of the quantity itself. However, the number of observations at low values of X (low wind speeds, in our application) is likely to be so much larger than the number at higher values that, unless a weighting scheme is used that gives more weight to the observations at higher speeds, a regression curve might be merely an extrapolation of the trend displayed by the values for the lower speeds. It is possible that interpolation between the original points could be more faithful to the data than a uniformly weighted regression would be. This approach would need a way to extrapolate beyond the Ȳ_p, S_p values, in case for some i there were an x_i ∈ D with x_i > C_p (or whatever is being used as the x coordinate for Ȳ_p, S_p). For this sort of application, some very simple extrapolation scheme might be satisfactory. For example, one might simply take a linear extrapolation from Ȳ_p with slope

m = \frac{m_p + m_{p-1} + m_{p-2}}{3},

where m_j is the slope of the interpolation line between adjacent points. Obviously the same thing could be done to extrapolate from S_p, and various modifications to this approach could be desired, such as basing m on 2 or 4 points, or using a higher-degree extrapolation.
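As promised above, a sketch of the interpolation variant: piecewise-linear interpolation through the (C_j, Ȳ_j) pairs (the same routine serves for (C_j, S_j)), with the averaged-slope linear extrapolation described in the text. Values are illustrative.

```python
def piecewise_linear(cs, vals):
    """Return v(x): linear interpolation through (C_j, v_j); beyond C_p,
    extrapolate with the average of the last three segment slopes."""
    def v(x):
        if x <= cs[0]:
            return vals[0]
        for j in range(1, len(cs)):
            if x <= cs[j]:
                w = (x - cs[j - 1]) / (cs[j] - cs[j - 1])
                return vals[j - 1] + w * (vals[j] - vals[j - 1])
        slopes = [(vals[j] - vals[j - 1]) / (cs[j] - cs[j - 1])
                  for j in range(len(cs) - 3, len(cs))]
        m = sum(slopes) / 3.0
        return vals[-1] + m * (x - cs[-1])
    return v

C = [35, 45, 55, 67, 82, 100]              # bin-center wind speeds (mph)
Ybar = [6.4, 7.5, 8.4, 9.5, 10.9, 12.0]    # illustrative bin means
Y_of_x = piecewise_linear(C, Ybar)
print(Y_of_x(60.0), Y_of_x(120.0))         # interpolated vs. extrapolated
```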
However, unless the parameters Ȳ_j and S_j change very rapidly from one interval to the next, the individual slice method appears computationally too expensive to justify the added accuracy that it would bring. As will be discussed in more detail below, the search for F̂_Y⁻¹(1 − 1/T) requires numerous evaluations of F̂_Y(y). The more functions F̂_c are used in the estimate of F̂_Y(y), the longer each iteration takes. This is especially true since we are usually evaluating the extreme tail of the F̂_c function, where special time-consuming numerical-analytic techniques are needed to ensure accuracy.

4.3.3 The "Narrow Interval" Method

The remarks above imply that if the wide interval method does not seem accurate enough, the best approach would be a compromise between it and the individual slice method. An obvious approach is to calculate the continuous functions Ȳ(x), S(x) as before, but then to count the number of x_i values in each of a larger collection of narrow intervals, and evaluate Ȳ(x) and S(x) at the centers of those intervals. Let the intervals be defined by 0 ≤ v_0 < ... < v_r for r > p, and

d_k = \frac{v_{k-1} + v_k}{2} \qquad \text{for } k = 1, \ldots, r.

Then, letting E_k = \{x_i \mid v_{k-1} \le x_i < v_k\} and \ell_k = |E_k|, the number of x_i in E_k, we have \sum_{k=1}^{r} \ell_k = n and

\hat{F}_Y(y) = \frac{1}{n}\sum_{k=1}^{r} \ell_k\,G\bigl(y;\ \bar{Y}(d_k), S(d_k)\bigr).

For example, as noted above, in the "wide interval" method differences t_j − t_{j−1} ≥ 10 seem reasonable for wind-wave data, so v_k = v_{k−1} + 2 might be a good choice for the narrow intervals. Depending on the local climate, this would probably mean somewhere between 40 and 100 intervals. (Hurricane Camille had winds of about 200 miles per hour.) In any case, evaluating several dozen F̂_c functions would be better than the several thousand that might appear in a calculation using the individual slice method.

4.4 Evaluation of F̂_Y⁻¹(1 − 1/T)

To find

y = \hat{F}_Y^{-1}\!\left(1 - \frac{1}{T}\right)

by a numerical method, an iterative search is used to find y such that

\hat{F}_Y(y) = 1 - \frac{1}{T} = R,

which is equivalent to finding the solution y of F̂_Y(y) − R = 0. Of the possible choices, the bisection method is the simplest: it searches the interval to find two nearby points y_1, y_2 such that

\hat{F}_Y(y_1) - R < 0 \qquad \text{and} \qquad \hat{F}_Y(y_2) - R > 0.
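A sketch of the bisection search on the monotone estimate F̂_Y; any callable CDF works, and the tolerance and bracket-growing scheme are illustrative choices rather than the dissertation program's.

```python
import math

def invert_cdf(F, R, lo=0.0, hi=1.0, tol=1e-4):
    """Solve F(y) = R by bisection; F must be monotone increasing.
    The bracket [lo, hi] is doubled until it contains the solution."""
    while F(hi) < R:        # grow the bracket until F(hi) - R > 0
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) < R:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

F = lambda y: math.exp(-math.exp(-(y - 8.0) / 2.0))  # illustrative Gumbel
T = 100.0
print(invert_cdf(F, 1.0 - 1.0 / T))   # ~ 8 - 2*ln(-ln 0.99) ~ 17.2
```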
3 As Isaacson and Mackenzie [24] note in a table, 131 cited in Chapter three, the mean and variance of a Weibull or a Frechet distribution can be calculated from the parameters which characterize these distributions, However, the formulas do not seem to be analytically invertible by conventional methods. It is possible that there is some exotic special function, or some other clever device that is not widely known which would allow an analytic inverse to be defined, but it still might be more efficient for the present work to evaluate the inverse numerically. In any case, for this initial study, there were enough other potential sources of complexify for the programs which had to be written, and so it seemed wise to simplify this aspect of the programming by using Gumbel distributions. One of the unavoidable sources of complexity in the program was the need to have an accurate approximation for the tail of the distribution. If we define the normalized Gumbel variable as Y-Oj Zj = ---- i - J Oj -Z-: then for large Y, Zj is large, so that e J is small, and 132 is equivalent to having 1 _zi - = l-exp(-e J) -zAn “Z 4 n -z . (e 3)2 (e 3)3 = e 3 - -------- + — 21 31 In this way, one can eliminate the need to subtract numbers which are nearly equal to each other, and the extreme tail of the distribution can be evaluated. The -Zj i number of terms used depends on the size of X®— ; — — , l! since only 7 significant figures are available in single precision on IBM 370-series computers. Terms which are too small to contribute to the value are omitted. Double precision does not seem necessary, considering the other sources of uncertainty. Of course, the series would need to be truncated for any finite precision scheme. A similar approach would be used for other types of 133 distributions, but the one above implements the Gumbel distributions which I am using. Thus, the equation which is actually solved in the interval method is m. P “‘3 £ nj<~£ j-l i=l -Z-i i (~e 3) i! n T where mj is the number of terms taken for the interval j. One reason for using this type of approach is that any search method must be able to find its way back if it goes too far in a given direction. Thus, it seemed better to have an accurate estimate for l-Fy(y) even if y were too large, rather than letting it be rounded to 0 after the subtraction. 4.6 Simulation Method The effort to test this method by simulation involves several aspects. One is the basic problem of generating a collection of ordered pairs with a given joint distribution. Another is the degree of realism attempted in simulating the variation in weather behavior 134 during a year. The third concerns the relationship between the independence of consecutive numbers generated by a random number generating program as compared to the independence of actual weather observations. 
4.6.1 Ordered Pairs with a Specified Distribution To review briefly the generation of "random numbers" by computer programs, it was observed a long time ago that if 'm' is a large prime number, and if 'a' is an integer which is a fairly large fraction of m, and if bQ is another integer in the interval 0 < bQ < m, then if b^ a is determined by taking the remainder after ab^^is divided by m, the result is a sequence of numbers bQ,bi,...,bk which appear to be independent and uniformly distributed on the interval between 0 and m, as long as k < Each number is actually determined by the one which precedes it, so they actually are not random, but the relationship between them cannot be determined by conventional statistical tests. Thus, they are actually pseudo random numbers, but the use of such numbers in Monte Carlo Simulation programs is so well established that the routines which generate such numbers are usually just called "random number generators". A true random number generator which produced truly unpredictable 135 results, would have to be a device which was attached to the computer and produced measurements of some random physical process, such as radioactive decay, or Brownian motion * Such a device would pose many inconveniences, not 'only in design of computer hardware, but in program testing. It would be difficult to determine whether unexpected program results were the result of a program error, or an especially large random variation in the output of the random number generator, because the random number generator's outputs would never be repeated .exactly. Perhaps provisions could be made for recording the random number generator's outputs, and replaying them, instead of getting new random numbers, during program testing. However, researchers in the 1950's concluded that the benefits of using true random numbers were not great enough to justify the extra inconvenience, when congruential pseudo random number generator routines of the sort described above, worked so well, and could be incorporated so easily into a program. Ordinarily, the output values of a congruential routine are divided by 'm' in order to get values uniformly distributed between 0 and 1. It is therefore possible to obtain numbers which are distributed 136 according to another distribution by using these numbers as inputs to the inverse of the desired cumulative distribution function. (Note that another drawback of physical random number generators might be the distribution of the random numbers produced. Radioactive decays, for example, follow a Poisson distribution. This drawback can be overcome to some extent, as discussed below). The inverse cumulative function approach has weaknesses in simulating the extreme tails of distributions, because the uniformly distributed input values are a discrete set, and do not come arbitrarily close to 0 and 1. However, if used with caution, it can be very convenient. The discussion above is given to show why caution is needed, since I have used pseudo random number programs provided by the IMSL library and why it seems wise not to attempt to use my current programs to simulate the magnitudes of events with recurrence periods of more than several thousand years. The IMSL routines were used because they have been written by experts in statistics programming, and elaborately tested, so that they are better than routines which I could develop on my own for this purpose. 
For example, since the seed values are to be input in double 137 precision, it appears that the congruence calculations are done in double precision, and are only truncated to single precision for output. This would mean that more of the available single-precision numbers between 0 and 1 will be used, than if the calculations were done in single precision. It also means that a much larger number of pseudo random numbers can be generated before the sequence starts to repeat. With single precision on an IBM 370, the maximum number is no more than several hundred millions. With double precision it is many trillions. As my work only requires some thousands of random numbers, at present, the problem of repetition is not likely to be an important consideration, but it is clear that the use of double precision makes sure that no foreseeable program will be limited by repetition of the IMSL random number generators. Of course, if the inverse cumulative distribution function Fy1 converts random numbers which are uniformly distributed on the unit interval into the distribution Fy, then the cumulative distribution function will convert random variables with any distribution into u(0,l). This is clear from the following facts: I f U1,...,Un are U(0,1), then F y 1 {U1) , . . . , F y 1 (Un } 138 are distributed according to Fy This is because P(Fy1(U) < t) = P(U < Fy(t)) = FY{t)• Similarly, if Yx,...,Yn are random variables with any distribution Fy, then Fy(Y^),...,Fy(Yn) are U(0,1), i.e., P(FY(Y) < t) = P{Y < F^1(t)) = t Thus, in the ideal case, it would not matter if random numbers were generated by a physical device, in some nonuniform distribution. It would be possible to convert the generator's outputs to a uniform distribution, and then to the desired distribution. In fact, the two steps could be combined into one, using a single composite function. In practice, since a measuring device would produce only a finite set of outputs, the same problem of handling the extreme tails of the other distribution would occur, possibly made worse by the increased computational complexity in evaluating the composite function. In addition, in a device making measurements of a random physical process, the parameters of the process 139 might not be known with high accuracy, but they would be needed for the conversion. In the case of radioactive decays, the average number of disintegrations per second would change with time, unless a very long-lived isotope were used. All of this argues that it would be best to minimize the amount of conversion done to distributions, and to use pseudo random numbers instead of a physical random number generator. If the congruent method is stared with a different "seed" value cQ/ instead of the value bQ mentioned at the beginning, the value b0,b1,b2,... will appear independent of the values c0,ci,c2,... so that the ordered pairs (b^,c^) will be approximately uniformly distributed over the unit square and can be transformed by an inverse cumulative distribution method to a set of ordered pairs with the desired joint distribtuion. 
That is, given cumulative distribution F(x,y) with a marginal cumulative distribution Fx(x), conditional cumulative distribution Fc(y/x), and given a pair (rj_,r2) £ [0,l]x[0,l], from a uniform distribution on the unit square, we can evaluate X = Fjl(ri) and if for such an x, 140 r 2 = Fc(y/x) then Y = F"1(r2/x) = Fc1(r2/FX1(rl)>• As discussed above, for this simulation, Gumbel distributions are used for the conditional distributions, and it is easy to evaluate the inverse of a Gumbel distribution. If y-U -( ) P(Y < y) = e"e a = P then y = U-a in(- Jnp) . It is also easy to find the inverse of a Frechet distribution, i.e., since y - ( ) -ot P(Y < y) = e u = P then 141 1 y = u{- Inp) a . With these two inverse functions, a large number of simulations can be devised. 4.6.2 Simulation Schemes As suggested by the discussion in the previous chapters, which explains how the conditional distributions of waves for given winds are imagined to be the result of essentially permanent features of the shape of the shore and the sea bottom, the same type of conditional distribution is used in all the simulations. Thus, given a simulated wind speed x, and a random number r2 between 0 and 1, all the simulation generate the simulated wave height from the inverse Gumbel distribution y = F”1(r2/x) = U(x)-a(x) ln(-lnr2) then U(x) = Y(x)-.45 S(x) a(x) = .779 S(x) 142 and Y(x) = .lx+3. S (x) = 2.+ .08\ l i T . This gives wave heights of several feet for winds of a few dozen miles per hour, and models the increase in wave height, and also the greater variability of wave height, as the wind increses. Even for 0 wind, there will be a positive wave height, to allow for the swells from distant storms. It is very likely that not only the constants in the formulas for Y(x) and S(x), but the functional forms used, could be changed to model a particular coastal area more accurately. However, the main purpose of the simulations is to see whether the new approach makes better use of the data than older methods do. This advantage will be demonstrated if the new method gives answers which are more accurate, especially when it has only a small amount of previous information on which to base its estimates. While the conditional dependence of waves on winds 143 is assumed to be fixed, a large variety of simulated wave behavior can be obtained by different simulations of wind behavior. Three different type of simulations have been tried - "uniform weather method", "seasonal variations", and "seasonal with hurricanes". 4.6.2.1 Uniform Weather In this approach, a time-invariant distribution is used to generate the simulated winds. A certain distribution Fx{x;X,s) is chosen to represent the probability of different wind speeds, and then all the simulated winds are generated by feeding the uniformly distributed random numbers r^ through Fx^ (rj_ ;X,s) where X,s are constant. This procedure effectively assumes that a given wind speed has the same probability of being observed at all times during the year, which is not very realistic. For some applications, such a lack of realism might not matter but the new method is intended to be applied to weather records containing daily observations, or at least every few days, so that a hundred or more observations a year might be used. The number used might depend on considerations of independence of observations, discussed in the next section. Each observation is still imagined to be the extreme which occurs during its 144 particular time interval (a day, or a few days). 
However, as storms are usually concentrated in certain months, the set of values accumulated over a whole year, being drawn from different populations, might not be distributed according to one of the standard extreme - value distributions. All of this - suggests that a pure realistic simulation should put seasonal variation in the parameters of the inverse distribution. 4.6.2.2 Seasonal Variations The general climate modeled in the seasonal variation routine which has been used is one in which the winds are highest and most variable in the Summer, and lowest in the Winter. If there are to be n observations during the year, which is imagined to start and end in Winter, then for a random number r -j_ at time i, the simulated wind will be Fx1(ri;X(i),S(i)) where _ 2ni X ( i ) = Wi —Wocos (--- ) n i-l,2,...,n 2ni S ( i ) = W-5-W4COS (---) . J * n 145 wl' w2' w3' w4 are chosen to give plausible variations of the parameters. When w2 and W4 are large enough this should give extreme values that are detectably different from the uniform weather simulation. As noted in the discussion of the implementation of the conditional distribution, this particular seasonal variation model could doubtless be changed to model a real coastal climate more accurately. For example, the time of highest storm activity might be nearer the fall, or there might be two peaks, one in spring, and one later, with a calmer period in between, or other possibilities. If there were a good reason to adopt the seasonal variation model program to adapt an actual coast, it probably would not be difficult, but the sinusoidal scheme above seems adequate as long as we merely want a significant departure from the uniform weather model. 4.6.2.3 Seasonal with Hurricanes One more step toward realism for a particular type of climate has been programmed. Tropical storms (known as hurricanes in the Atlantic and typhoon in the Pacific), are sufficiently different from milder storms to deserve special treatment. Liu and Ma's research was specifically done for the heavily typhoon-battered coast of China. Since their research was one of the major inspirations for mine, a modification of the seasonal variation model was programmed to include typhoon events. In this model, for each simulated year, a random integer is generated to give the number of typhoons which occur in that year. This number is a sample from a Poisson distribution, and is generated by another IMSL routine. Unlike the inverse cumulative distribution method described above, which only works for random variables distributed on intervals of the real line, the IMSL routine works by generating uniformly distributed random variables, multiplying them, and counting to see how many are needed to exceed e~^ where X is the Poisson parameter of the distribution. This count is then returned as the integer. Actually, there are two different IMSL routines, implementing different algorithms. One of them is better for generating large number of integers sampled from a single distribution, while the other, which I use, is better if called frequently, for a small set of output integers per call. (All of the IMSL random number generating routines will 147 generate an entire array of numbers from a single call, for efficiency) . As I did not write or modify this routine, it seems best to refer the interested reader to the IMSL documentation for further details, and a reference to a proof that the method described is correct. 
Once the number of typhoons or hurricanes which occur in the year has been generated, the behavior of the simulation routine depends on this number. If 0 typhoons occur, the whole year is generated according to the seasonal variation model. If at least one typhoon occurs, the seasonal variation model is followed for the first quarter of the year. After that, for each typhoon, another Poisson distributed random integer is generated to tell how many weeks elapse from the current time until the typhoon occurs. The seasonal-variation model is evaluated until the typhoon occurs, when a wind is generated from a distribution based on a mean of 65 miles per hour and a standard deviation of 30. These parameters express the fact that a storm must have winds of 75 miles per hour to be classified as typhoon, but some storms loose some wind speed before they hit the coast. Typhoons are all assumed to last 5 days, but in a more 148 realistic simulation, this could also be determined by a random integer. After the wind for the typhoon is generated, the time of the typhoon is taken as the current time, an the process is repeated until all the typhoons have occurred. After that, the seasonal model is resumed to complete the year. These models could be constructed using either the Gumbel or the Prechet distribution for the winds, with the parameters being varied appropriately in the "seasonal”, and "seasonal with hurricanes” models. Because of the previously mentioned convenience of converting means and standard deviations into the Gumbel parameters, Gumbel distribution have been used in my tests, but a Prechet distribution routine has been written and incorporated into the program in a successful small-scale test execution. 4.6.3 Selection of Independent Observations If there is a positive dependence between X and Y, as there is in the case of winds and waves, it may be possible to omit some of the lowest values of X if they do not contribute significantly to the estimation of For example, in the wide interval method, if 149 for values of y large enough to be of interest in extreme value analysis, then there is no need to include A _ A— 1 1 evaluations of Fy^y) in the calculation of -Fy (1-^). Corresponding modifications can be made in the individual slice and narrow interval methods. This approach has more advantages than just computational efficiency. Many geophysical time series display the same general characteristic appearance. Relatively long periods of similar values at a relatively low level are interrupted by occasional brief excursions to much higher levels. For example, Kirby [30] comments that a steamflow hydrograph can be considered as a sequence of nearly instantaneous peaks separated by relatively long periods of low flows. Clearly, if the sampling period is such that it is common to have several samples during a period of low values, these samples will not be independent of each other. The averaging processes used at various stages of this calculation may not be valid if they are based on observations which are not independent. If we select only events which exceed some base 150 level, then if the base level is sufficiently higher, the magnitudes of these events usually are independent. Gupta et al. [21] discuss this for stream flow, but it should be true for winds as well. Computationally, this is very easy. One simply evaluates Fy(y) using the intervals or slices which are above the base level, instead of using all of them. 
Another way to improve the independence of the observations is to take samples which are separated by a time lag long enough that the autocorrelation of the time series is low at this lag. For example, wind is sufficiently changeable that observations every few days should be reasonably independent.

This discussion of independence of observations emphasizes one way in which all of the simulation methods discussed above are different from real data. Since the successive outputs of a random number generator are intended to be as independent as possible, the simulated values are probably more independent than a typical sequence of real observations would be. For the present simulation, this does not appear to be a serious problem. However, more realistic simulations are possible, in which the strictly independent values from the random number generator are fed into some moving-average or autoregressive algorithm to produce a random process with a significant temporal autocorrelation, and sampling schemes like those discussed above could be tested to see how well they restore independence. Possibilities for further work will be discussed more extensively in Chapter 6, Summary and Discussion.

CHAPTER 5
COMPUTER RUNS AND RESULTS

5.1 Introduction

The following pages contain the results from the simulation program. The first section contains the individual tables from the printouts, with the results of the uniform model grouped first, then the results of the seasonal model with hurricanes, and finally the less complete results which are currently available for the seasonal model. The next section contains some tables which summarize the results of the outputs in a more compact form, and the last section of this chapter contains flowcharts of the program routines, with a few additional notes on aspects of program operation which are not covered elsewhere.

5.2 Individual Outputs

All of these printouts follow the same form. The phrase "GUMBEL-GUMBEL SIMULATION" in the first line refers to the use of Gumbel distributions for both the winds and the conditional distributions of the waves. If Frechet distributions were used for the winds, as discussed previously, this line would be changed to read "FRECHET-GUMBEL SIMULATION". The next line gives the type of model: "NO SEASONAL VARIATION" means the uniform weather model, and the other two models are described by the same terms used in this dissertation. As will be seen at other points below, I have changed some of my terminology between the time the FORMAT statements were written in the program and the writing of this dissertation. If further work were done on the program, changing the statements which produce these labels would be easy.

The next line, which speaks of "HINDCASTING WITH BINS", refers to the use of the wide interval method. If the individual slice method or one of the various possible narrow-interval methods suggested in Chapter 4 were used, this line would be changed to describe that method.

The next four lines describe the particular selection of the data which produced the predictions below. As discussed in the previous chapters, this program simulates a situation in which a period when only wind data are available is followed by the introduction of wave-measuring instruments, after which joint wind and wave data become available. The simulation actually has two purposes: one is to assess the accuracy of the different methods, by comparing their predictions against the "real" results of simulating a longer period of later wave data.
The other is to assess the efficiency of the methods, by comparing the stability which they display when used on various subsets of the overall set of simulated joint data. Thus, in all of these simulations, as the printouts describe, 140 years of simulated data are generated. The first 60 years have only wind data, and the last 80 years have joint data. This is probably a longer period than what is actually available, but it made possible a large number of subdivisions of the joint data period in order to test stability. The particular subdivision used is given by the two lines which tell how many years of joint data were used, and which ones.

The number of intervals ("bins") for wind speed and their upper bounds are given next. The lower bound of each bin except the first is the upper bound of the previous bin. The first starts at 0. The same intervals were used in all runs, to make the results more easily comparable. The intervals are wider at the higher wind speeds to improve the counting statistics. As discussed in the section on "Selection of Independent Observations", the observations in the lowest intervals would not be used in a computation based on real data, and so the lowest three intervals are not used in this simulation. Bin 4 is the first which appears in the "DOSTAT SUMMARY", produced by subroutine DOSTAT (see the flowcharts for further information about it), which finds the means and standard deviations of the waves corresponding to the winds in each interval. This summary was originally printed above the first line which appears here, but is placed on the side in these pages for better compactness.

The wave height predictions at the bottom of each group are for the return periods given in years. Each row gives the predictions of one method, as shown. NTYPH is the number of typhoons in a year, needed for studying the results of Liu and Ma's method, and TYPHN is the cutoff level for considering a storm to be a typhoon, which is always 75 miles per hour.

Table 5.1 Gumbel-Gumbel simulation of winds and waves, no seasonal variation in model, conditional distribution hindcasting with bins, 80 years of joint data

140 years of data available
60 years of data with wind only
80 years of joint data used
start year = 61, stop year = 140
NTYPH, TYPHN = 44, 75.0
10 bins with upper bounds at 10. 20. 30. 40. 50. 60. 75. 90. 110. 140.

DOSTAT SUMMARY
BIN  NOBS  YBAR       SIG
  4   909  6.383E 00  2.523E 00
  5   459  7.464E 00  2.463E 00
  6   192  8.397E 00  2.450E 00
  7   104  9.470E 00  2.806E 00
  8    34  1.090E 01  2.635E 00
  9     8  1.201E 01  2.057E 00
 10     2  1.429E 01  1.115E 00

Predictions of wave heights for return periods
                     10.        20.        50.        100.
Gumbel (annual max)  1.742E 01  1.863E 01  2.019E 01  2.135E 01
Liu and Ma           1.321E 01  1.477E 01  1.671E 01  1.813E 01
New method           1.597E 01  1.733E 01  1.916E 01  2.056E 01

Table 5.2 Gumbel-Gumbel simulation of winds and waves, no seasonal variation in model, conditional distribution hindcasting with bins, 60 years of joint data

140 years of data available
60 years of data with wind only
60 years of joint data used
start year = 61, stop year = 120
NTYPH, TYPHN = 38, 75.0
10 bins with upper bounds at 10. 20. 30. 40. 50. 60. 75. 90. 110. 140.

DOSTAT SUMMARY
BIN  NOBS  YBAR       SIG
  4   690  6.367E 00  2.574E 00
  5   337  7.449E 00  2.446E 00
  6   137  8.321E 00  2.375E 00
  7    83  9.492E 00  2.819E 00
  8    29  1.111E 01  2.684E 00
  9     7  1.181E 01  2.132E 00
 10     2  1.429E 01  1.115E 00

Predictions of wave heights for return periods
                     10.        20.        50.        100.
Gumbel (annual max)  1.766E 01  1.891E 01  2.052E 01  2.172E 01
Liu and Ma           1.369E 01  1.525E 01  1.719E 01  1.862E 01
New method           1.602E 01  1.738E 01  1.924E 01  2.065E 01

Table 5.3 Gumbel-Gumbel simulation of winds and waves, no seasonal variation in model, conditional distribution hindcasting with bins, 40 years of joint data, the 1st period

140 years of data available
60 years of data with wind only
40 years of joint data used
start year = 61, stop year = 100
NTYPH, TYPHN = 28, 75.0
10 bins with upper bounds at 10. 20. 30. 40. 50. 60. 75. 90. 110. 140.

DOSTAT SUMMARY
BIN  NOBS  YBAR       SIG
  4   461  6.379E 00  2.512E 00
  5   226  7.414E 00  2.478E 00
  6    88  8.557E 00  2.481E 00
  7    58  9.165E 00  2.355E 00
  8    22  1.092E 01  2.601E 00
  9     5  1.208E 01  2.545E 00
 10     1  1.350E 01  0.0

Predictions of wave heights for return periods
                     10.        20.        50.        100.
Gumbel (annual max)  1.672E 01  1.775E 01  1.909E 01  2.009E 01
Liu and Ma           1.370E 01  1.523E 01  1.714E 01  1.855E 01
New method           1.579E 01  1.715E 01  1.892E 01  2.027E 01

Table 5.4 Gumbel-Gumbel simulation of winds and waves, no seasonal variation in model, conditional distribution hindcasting with bins, 40 years of joint data, the 2nd period

140 years of data available
60 years of data with wind only
40 years of joint data used
start year = 101, stop year = 140
NTYPH, TYPHN = 16, 75.0
10 bins with upper bounds at 10. 20. 30. 40. 50. 60. 75. 90. 110. 140.

DOSTAT SUMMARY
BIN  NOBS  YBAR       SIG
  4   448  6.388E 00  2.535E 00
  5   233  7.512E 00  2.453E 00
  6   104  8.261E 00  2.428E 00
  7    46  9.855E 00  3.276E 00
  8    12  1.086E 01  2.813E 00
  9     3  1.191E 01  1.353E 00
 10     1  1.508E 01  0.0

Predictions of wave heights for return periods
                     10.        20.        50.        100.
Gumbel (annual max)  1.806E 01  1.940E 01  2.114E 01  2.244E 01
Liu and Ma           1.259E 01  1.427E 01  1.631E 01  1.780E 01
New method           1.466E 01  1.610E 01  1.805E 01  1.958E 01

Table 5.5 Gumbel-Gumbel simulation of winds and waves, no seasonal variation in model, conditional distribution hindcasting with bins, 20 years of joint data, the 1st period

140 years of data available
60 years of data with wind only
20 years of joint data used
start year = 61, stop year = 80
NTYPH, TYPHN = 15, 75.0
10 bins with upper bounds at 10. 20. 30. 40. 50. 60. 75. 90. 110. 140.

DOSTAT SUMMARY
BIN  NOBS  YBAR       SIG
  4   228  6.432E 00  2.548E 00
  5   116  7.511E 00  2.511E 00
  6    31  8.595E 00  2.260E 00
  7    30  9.937E 00  2.691E 00
  8    12  1.062E 01  3.227E 00
  9     3  1.198E 01  2.173E 00
 10     0  0.0        0.0

Predictions of wave heights for return periods
                     10.        20.        50.        100.
Gumbel (annual max)  1.728E 01  1.835E 01  1.975E 01  2.079E 01
Liu and Ma           1.398E 01  1.578E 01  1.802E 01  1.969E 01
New method           1.467E 01  1.610E 01  1.798E 01  1.940E 01

Table 5.6 Gumbel-Gumbel simulation of winds and waves, no seasonal variation in model, conditional distribution hindcasting with bins, 20 years of joint data, the 2nd period

140 years of data available
60 years of data with wind only
20 years of joint data used
start year = 81, stop year = 100
NTYPH, TYPHN = 13, 75.0
10 bins with upper bounds at 10. 20. 30. 40. 50. 60. 75. 90. 110. 140.

DOSTAT SUMMARY
BIN  NOBS  YBAR       SIG
  4   233  6.327E 00  2.480E 00
  5   110  7.312E 00  2.450E 00
  6    57  8.536E 00  2.613E 00
  7    28  8.337E 00  1.598E 00
  8    10  1.129E 01  1.668E 00
  9     2  1.222E 01  4.049E 00
 10     1  1.350E 01  0.0

Predictions of wave heights for return periods
                     10.        20.        50.        100.
Gumbel (annual max)  1.605E 01  1.699E 01  1.822E 01  1.913E 01
Liu and Ma           1.338E 01  1.456E 01  1.603E 01  1.711E 01
New method           1.413E 01  1.550E 01  1.731E 01  1.871E 01
Table 5.7 Gumbel-Gumbel simulation of winds and waves, no seasonal variation in model, conditional distribution hindcasting with bins, 20 years of joint data, the 3rd period

140 years of data available
60 years of data with wind only
20 years of joint data used
start year = 101, stop year = 120
NTYPH, TYPHN = 10, 75.0
10 bins with upper bounds at 10. 20. 30. 40. 50. 60. 75. 90. 110. 140.

DOSTAT SUMMARY
BIN  NOBS  YBAR       SIG
  4   229  6.342E 00  2.700E 00
  5   111  7.522E 00  2.389E 00
  6    49  7.896E 00  2.129E 00
  7    25  1.025E 01  3.623E 00
  8     7  1.172E 01  3.062E 00
  9     2  1.114E 01  3.448E 01
 10     1  1.508E 01  0.0

Predictions of wave heights for return periods
                     10.        20.        50.        100.
Gumbel (annual max)  1.913E 01  2.061E 01  2.253E 01  2.397E 01
Liu and Ma           1.379E 01  1.546E 01  1.752E 01  1.904E 01
New method           1.491E 01  1.648E 01  1.865E 01  2.031E 01

Table 5.8 Gumbel-Gumbel simulation of winds and waves, no seasonal variation in model, conditional distribution hindcasting with bins, 20 years of joint data, the 4th period

140 years of data available
60 years of data with wind only
20 years of joint data used
start year = 121, stop year = 140
NTYPH, TYPHN = 6, 75.0
10 bins with upper bounds at 10. 20. 30. 40. 50. 60. 75. 90. 110. 140.

DOSTAT SUMMARY
BIN  NOBS  YBAR       SIG
  4   219  6.435E 00  2.355E 00
  5   122  7.504E 00  2.519E 00
  6    55  8.586E 00  2.643E 00
  7    21  9.386E 00  2.823E 00
  8     5  9.647E 00  2.136E 00
  9     1  1.345E 01  0.0
 10     0  0.0        0.0

Predictions of wave heights for return periods
                     10.        20.        50.        100.
Gumbel (annual max)  1.666E 01  1.773E 01  1.910E 01  2.014E 01
Liu and Ma           1.078E 01  1.238E 01  1.428E 01  1.565E 01
New method           1.433E 01  1.572E 01  1.756E 01  1.894E 01

Table 5.9 Gumbel-Gumbel simulation of winds and waves, no seasonal variation in model, conditional distribution hindcasting with bins, 10 years of joint data, the 1st period

140 years of data available
60 years of data with wind only
10 years of joint data used
start year = 61, stop year = 70
NTYPH, TYPHN = 6, 75.0
10 bins with upper bounds at 10. 20. 30. 40. 50. 60. 75. 90. 110. 140.

DOSTAT SUMMARY
BIN  NOBS  YBAR       SIG
  4   109  6.429E 00  2.304E 00
  5    67  7.352E 00  2.387E 00
  6    16  8.897E 00  2.524E 00
  7    17  1.055E 01  2.654E 00
  8     6  9.615E 00  1.845E 00
  9     0  0.0        0.0
 10     0  0.0        0.0

Predictions of wave heights for return periods
                     10.        20.        50.        100.
Gumbel (annual max)  1.565E 01  1.642E 01  1.741E 01  1.816E 01
Liu and Ma           1.115E 01  1.226E 01  1.363E 01  1.465E 01
New method           1.436E 01  1.572E 01  1.749E 01  1.885E 01

Table 5.10 Gumbel-Gumbel simulation of winds and waves, no seasonal variation in model, conditional distribution hindcasting with bins, 10 years of joint data, the 2nd period

140 years of data available
60 years of data with wind only
10 years of joint data used
start year = 71, stop year = 80
NTYPH, TYPHN = 9, 75.0
10 bins with upper bounds at 10. 20. 30. 40. 50. 60. 75. 90. 110. 140.

DOSTAT SUMMARY
BIN  NOBS  YBAR       SIG
  4   119  6.434E 00  2.763E 00
  5    49  7.729E 00  2.681E 00
  6    15  8.272E 00  1.977E 00
  7    13  9.130E 00  2.619E 00
  8     6  1.162E 01  4.134E 00
  9     3  1.198E 01  2.173E 00
 10     0  0.0        0.0

Predictions of wave heights for return periods
                     10.          20.        50.        100.
Gumbel (annual max)  (illegible)  1.943E 01  2.089E 01  2.198E 01
Liu and Ma           (illegible)  1.781E 01  2.036E 01  2.225E 01
New method           (illegible)  1.640E 01  1.850E 01  2.012E 01
Table 5.11 Gumbel-Gumbel simulation of winds and waves, no seasonal variation in model, conditional distribution hindcasting with bins, 10 years of joint data, the 3rd period

140 years of data available
60 years of data with wind only
10 years of joint data used
start year = 81, stop year = 90
NTYPH, TYPHN = 6, 75.0
10 bins with upper bounds at 10. 20. 30. 40. 50. 60. 75. 90. 110. 140.

DOSTAT SUMMARY
BIN  NOBS  YBAR       SIG
  4   113  6.353E 00  2.426E 00
  5    53  7.638E 00  2.877E 00
  6    31  8.485E 00  2.710E 00
  7    12  7.697E 00  1.542E 00
  8     3  1.175E 01  5.119E-01
  9     2  1.222E 01  4.049E 00
 10     1  1.350E 01  0.0

Predictions of wave heights for return periods
                     10.        20.        50.        100.
Gumbel (annual max)  1.621E 01  1.697E 01  1.795E 01  1.869E 01
Liu and Ma           1.383E 01  1.500E 01  1.647E 01  1.755E 01
New method           1.442E 01  1.595E 01  1.795E 01  1.947E 01

Table 5.12 Gumbel-Gumbel simulation of winds and waves, no seasonal variation in model, conditional distribution hindcasting with bins, 10 years of joint data, the 4th period

140 years of data available
60 years of data with wind only
10 years of joint data used
start year = 91, stop year = 100
NTYPH, TYPHN = 7, 75.0
10 bins with upper bounds at 10. 20. 30. 40. 50. 60. 75. 90. 110. 140.

DOSTAT SUMMARY
BIN  NOBS  YBAR       SIG
  4   120  6.303E 00  2.539E 00
  5    57  7.009E 00  1.949E 00
  6    26  8.598E 00  2.543E 00
  7    16  8.818E 00  1.511E 00
  8     7  1.109E 01  1.984E 00
  9     0  0.0        0.0
 10     0  0.0        0.0

Predictions of wave heights for return periods
                     10.        20.        50.        100.
Gumbel (annual max)  1.566E 01  1.670E 01  1.803E 01  1.904E 01
Liu and Ma           1.300E 01  1.418E 01  1.566E 01  1.675E 01
New method           1.365E 01  1.496E 01  1.668E 01  1.798E 01

Table 5.13 Gumbel-Gumbel simulation of winds and waves, seasonal variation model with hurricanes, conditional distribution hindcasting with bins, 80 years of joint data

140 years of data available
60 years of data with wind only
80 years of joint data used
start year = 61, stop year = 140
NTYPH, TYPHN = 108, 75.0
10 bins with upper bounds at 10. 20. 30. 40. 50. 60. 75. 90. 110. 140.

DOSTAT SUMMARY
BIN  NOBS  YBAR       SIG
  4   818  6.525E 00  2.585E 00
  5   432  7.369E 00  2.491E 00
  6   214  8.153E 00  2.420E 00
  7   163  9.501E 00  2.507E 00
  8    61  1.158E 01  2.893E 00
  9    33  1.315E 01  4.036E 00
 10    14  1.620E 01  2.425E 00

Predictions of wave heights for return periods
                     10.          20.        50.        100.
Gumbel (annual max)  (illegible)  2.051E 01  2.251E 01  2.401E 01
Liu and Ma           (illegible)  2.007E 01  2.268E 01  2.463E 01
New method           (illegible)  1.942E 01  2.157E 01  2.326E 01

Table 5.14 Gumbel-Gumbel simulation of winds and waves, seasonal variation model with hurricanes, conditional distribution hindcasting with bins, 60 years of joint data

140 years of data available
60 years of data with wind only
60 years of joint data used
start year = 61, stop year = 120
NTYPH, TYPHN = 79, 75.0
10 bins with upper bounds at 10. 20. 30. 40. 50. 60. 75. 90. 110. 140.

DOSTAT SUMMARY
BIN  NOBS  YBAR       SIG
  4   590  6.602E 00  2.616E 00
  5   340  7.379E 00  2.574E 00
  6   164  8.104E 00  2.373E 00
  7   123  9.375E 00  2.446E 00
  8    40  1.180E 01  2.828E 00
  9    27  1.357E 01  4.289E 00
 10    12  1.626E 01  2.610E 00

Predictions of wave heights for return periods
                     10.        20.        50.        100.
Gumbel (annual max)  1.942E 01  2.107E 01  2.319E 01  2.478E 01
Liu and Ma           1.855E 01  2.068E 01  2.339E 01  2.541E 01
New method           1.803E 01  1.971E 01  2.204E 01  2.384E 01
Table 5.15 Gumbel-Gumbel simulation of winds and waves, seasonal variation model with hurricanes, conditional distribution hindcasting with bins, 40 years of joint data, the 1st period

140 years of data available
60 years of data with wind only
40 years of joint data used
start year = 61, stop year = 100
NTYPH, TYPHN = 57, 75.0
10 bins with upper bounds at 10. 20. 30. 40. 50. 60. 75. 90. 110. 140.

DOSTAT SUMMARY
BIN  NOBS  YBAR       SIG
  4   390  6.581E 00  2.630E 00
  5   218  7.530E 00  2.696E 00
  6   110  8.077E 00  2.204E 00
  7    87  9.495E 00  2.537E 00
  8    28  1.138E 01  2.588E 00
  9    19  1.328E 01  2.827E 00
 10    10  1.628E 01  2.650E 00

Predictions of wave heights for return periods
                     10.        20.        50.        100.
Gumbel (annual max)  1.822E 01  1.951E 01  2.117E 01  2.242E 01
Liu and Ma           1.781E 01  1.965E 01  2.200E 01  2.374E 01
New method           1.758E 01  1.903E 01  2.098E 01  2.240E 01

Table 5.16 Gumbel-Gumbel simulation of winds and waves, seasonal variation model with hurricanes, conditional distribution hindcasting with bins, 40 years of joint data, the 2nd period

140 years of data available
60 years of data with wind only
40 years of joint data used
start year = 101, stop year = 140
NTYPH, TYPHN = 51, 75.0
10 bins with upper bounds at 10. 20. 30. 40. 50. 60. 75. 90. 110. 140.

DOSTAT SUMMARY
BIN  NOBS  YBAR       SIG
  4   428  6.474E 00  2.543E 00
  5   214  7.205E 00  2.256E 00
  6   104  8.232E 00  2.638E 00
  7    76  9.508E 00  2.489E 00
  8    33  1.174E 01  3.159E 00
  9    14  1.296E 01  5.383E 00
 10     4  1.600E 01  2.081E 00

Predictions of wave heights for return periods
                     10.        20.        50.        100.
Gumbel (annual max)  1.966E 01  2.144E 01  2.374E 01  2.546E 01
Liu and Ma           1.817E 01  2.045E 01  2.336E 01  2.552E 01
New method           1.593E 01  1.768E 01  2.012E 01  2.215E 01

Table 5.17 Gumbel-Gumbel simulation of winds and waves, seasonal variation model with hurricanes, conditional distribution hindcasting with bins, 20 years of joint data, the 1st period

140 years of data available
60 years of data with wind only
20 years of joint data used
start year = 61, stop year = 80
NTYPH, TYPHN = 31, 75.0
10 bins with upper bounds at 10. 20. 30. 40. 50. 60. 75. 90. 110. 140.

DOSTAT SUMMARY
BIN  NOBS  YBAR       SIG
  4   200  6.661E 00  2.852E 00
  5   109  7.716E 00  2.702E 00
  6    54  7.794E 00  2.188E 00
  7    40  9.348E 00  2.314E 00
  8    16  1.228E 01  2.325E 00
  9    10  1.296E 01  3.420E 00
 10     5  1.762E 01  2.510E 00

Predictions of wave heights for return periods
                     10.        20.        50.        100.
Gumbel (annual max)  1.915E 01  2.056E 01  2.240E 01  2.378E 01
Liu and Ma           1.866E 01  2.055E 01  2.295E 01  2.475E 01
New method           1.658E 01  1.820E 01  2.019E 01  2.169E 01

Table 5.18 Gumbel-Gumbel simulation of winds and waves, seasonal variation model with hurricanes, conditional distribution hindcasting with bins, 20 years of joint data, the 2nd period

140 years of data available
60 years of data with wind only
20 years of joint data used
start year = 81, stop year = 100
NTYPH, TYPHN = 26, 75.0
10 bins with upper bounds at 10. 20. 30. 40. 50. 60. 75. 90. 110. 140.

DOSTAT SUMMARY
BIN  NOBS  YBAR       SIG
  4   190  6.497E 00  2.378E 00
  5   109  7.345E 00  2.690E 00
  6    56  8.351E 00  2.205E 00
  7    47  9.620E 00  2.730E 00
  8    12  1.017E 01  2.508E 00
  9     9  1.365E 01  2.132E 00
 10     5  1.494E 01  2.238E 00

Predictions of wave heights for return periods
                     10.        20.        50.        100.
Gumbel (annual max)  1.709E 01  1.816E 01  1.953E 01  2.056E 01
Liu and Ma           1.678E 01  1.853E 01  2.077E 01  2.243E 01
New method           1.553E 01  1.687E 01  1.864E 01  1.998E 01
Table 5.19 Gumbel-Gumbel simulation of winds and waves, seasonal variation with hurricanes, conditional distribution hindcasting with bins, 20 years of joint data, the 3rd period

140 years of data available
60 years of data with wind only
20 years of joint data used
start year = 101, stop year = 120
NTYPH, TYPHN = 22, 75.0
10 bins with upper bounds at 10. 20. 30. 40. 50. 60. 75. 90. 110. 140.

DOSTAT SUMMARY
BIN  NOBS  YBAR       SIG
  4   200  6.642E 00  2.595E 00
  5   122  7.108E 00  2.325E 00
  6    54  8.159E 00  2.706E 00
  7    36  9.084E 00  2.218E 00
  8    12  1.280E 01  3.222E 00
  9     8  1.425E 01  6.858E 00
 10     2  1.618E 01  3.421E 00

Predictions of wave heights for return periods
                     10.        20.        50.        100.
Gumbel (annual max)  2.140E 01  2.356E 01  2.636E 01  2.846E 01
Liu and Ma           2.003E 01  2.280E 01  2.633E 01  2.894E 01
New method           1.651E 01  1.866E 01  2.186E 01  2.465E 01

Table 5.20 Gumbel-Gumbel simulation of winds and waves, seasonal variation model with hurricanes, conditional distribution hindcasting with bins, 20 years of joint data, the 4th period

140 years of data available
60 years of data with wind only
20 years of joint data used
start year = 121, stop year = 140
NTYPH, TYPHN = 29, 75.0
10 bins with upper bounds at 10. 20. 30. 40. 50. 60. 75. 90. 110. 140.

DOSTAT SUMMARY
BIN  NOBS  YBAR       SIG
  4   228  6.326E 00  2.494E 00
  5    92  7.335E 00  2.166E 00
  6    50  8.311E 00  2.587E 00
  7    40  9.889E 00  2.681E 00
  8    21  1.114E 01  3.035E 00
  9     6  1.124E 01  1.806E 00
 10     2  1.583E 01  1.081E 00

Predictions of wave heights for return periods
                     10.        20.        50.        100.
Gumbel (annual max)  1.738E 01  1.853E 01  2.003E 01  2.115E 01
Liu and Ma           1.608E 01  1.778E 01  1.994E 01  2.155E 01
New method           1.546E 01  1.675E 01  1.855E 01  1.999E 01

Table 5.21 Gumbel-Gumbel simulation of winds and waves, seasonal variation model with hurricanes, conditional distribution hindcasting with bins, 10 years of joint data, the 1st period

140 years of data available
60 years of data with wind only
10 years of joint data used
start year = 61, stop year = 70
NTYPH, TYPHN = 14, 75.0
10 bins with upper bounds at 10. 20. 30. 40. 50. 60. 75. 90. 110. 140.

DOSTAT SUMMARY
BIN  NOBS  YBAR       SIG
  4    95  6.651E 00  2.709E 00
  5    47  7.348E 00  2.643E 00
  6    35  7.921E 00  1.823E 00
  7    21  9.212E 00  2.569E 00
  8     7  1.179E 01  2.192E 00
  9     5  1.175E 01  1.394E 00
 10     2  2.014E 01  1.679E-01

Predictions of wave heights for return periods
                     10.        20.        50.        100.
Gumbel (annual max)  1.872E 01  2.028E 01  2.230E 01  2.381E 01
Liu and Ma           1.830E 01  2.030E 01  2.285E 01  2.474E 01
New method           1.583E 01  1.875E 01  2.017E 01  2.036E 01

Table 5.22 Gumbel-Gumbel simulation of winds and waves, seasonal variation model with hurricanes, conditional distribution hindcasting with bins, 10 years of joint data, the 2nd period

140 years of data available
60 years of data with wind only
10 years of joint data used
start year = 71, stop year = 80
NTYPH, TYPHN = 17, 75.0
10 bins with upper bounds at 10. 20. 30. 40. 50. 60. 75. 90. 110. 140.

DOSTAT SUMMARY
BIN  NOBS  YBAR       SIG
  4   105  6.670E 00  2.988E 00
  5    62  7.994E 00  2.735E 00
  6    19  7.560E 00  2.780E 00
  7    19  9.499E 00  2.056E 00
  8     9  1.266E 01  2.483E 00
  9     5  1.416E 01  4.555E 00
 10     3  1.594E 01  1.425E 00

Predictions of wave heights for return periods
                     10.        20.        50.        100.
Gumbel (annual max)  1.937E 01  2.055E 01  2.209E 01  2.324E 01
Liu and Ma           1.907E 01  2.089E 01  2.323E 01  2.497E 01
New method           1.670E 01  1.828E 01  2.053E 01  2.240E 01

Table 5.23 Gumbel-Gumbel simulation of winds and waves, seasonal variation model with hurricanes, conditional distribution hindcasting with bins, 10 years of joint data, the 3rd period

140 years of data available
60 years of data with wind only
10 years of joint data used
start year = 81, stop year = 90
NTYPH, TYPHN = 12, 75.0
10 bins with upper bounds at 10. 20. 30. 40. 50. 60. 75. 90. 110. 140.

DOSTAT SUMMARY
BIN  NOBS  YBAR       SIG
  4    92  6.487E 00  2.097E 00
  5    53  7.635E 00  3.200E 00
  6    30  8.375E 00  2.250E 00
  7    23  9.872E 00  3.136E 00
  8     4  8.990E 00  2.694E 00
  9     5  1.388E 01  2.537E 00
 10     3  1.385E 01  1.795E 00

Predictions of wave heights for return periods
                     10.        20.        50.        100.
Gumbel (annual max)  1.749E 01  1.855E 01  1.992E 01  2.095E 01
Liu and Ma           1.685E 01  1.874E 01  2.115E 01  2.293E 01
New method           1.578E 01  1.732E 01  1.938E 01  2.096E 01

Table 5.24 Gumbel-Gumbel simulation of winds and waves, seasonal variation model with hurricanes, conditional distribution hindcasting with bins, 10 years of joint data, the 4th period

140 years of data available
60 years of data with wind only
10 years of joint data used
start year = 91, stop year = 100
NTYPH, TYPHN = 14, 75.0
10 bins with upper bounds at 10. 20. 30. 40. 50. 60. 75. 90. 110. 140.

DOSTAT SUMMARY
BIN  NOBS  YBAR       SIG
  4    98  6.506E 00  2.625E 00
  5    56  7.070E 00  2.091E 00
  6    26  8.323E 00  2.195E 00
  7    24  9.380E 00  2.318E 00
  8     8  1.076E 01  2.361E 00
  9     4  1.336E 01  1.828E 00
 10     2  1.656E 01  2.185E 00

Predictions of wave heights for return periods
                     10.        20.        50.        100.
Gumbel (annual max)  1.671E 01  1.778E 01  1.917E 01  2.021E 01
Liu and Ma           1.687E 01  1.857E 01  2.075E 01  2.236E 01
New method           1.556E 01  1.666E 01  1.836E 01  1.963E 01

Table 5.25 Gumbel-Gumbel simulation of winds and waves, seasonal variation model, conditional distribution hindcasting with bins, 80 years of joint data

140 years of data available
60 years of data with wind only
80 years of joint data used
start year = 61, stop year = 140
NTYPH, TYPHN = 66, 75.0
10 bins with upper bounds at 10. 20. 30. 40. 50. 60. 75. 90. 110. 140.

DOSTAT SUMMARY
BIN  NOBS  YBAR       SIG
  4   821  6.574E 00  2.602E 00
  5   413  7.387E 00  2.585E 00
  6   192  8.205E 00  2.468E 00
  7   131  9.400E 00  2.490E 00
  8    41  1.103E 01  2.369E 00
  9    19  1.294E 01  3.114E 00
 10     6  1.535E 01  2.440E 00

Predictions of wave heights for return periods
                     10.        20.        50.        100.
Gumbel (annual max)  1.793E 01  1.924E 01  2.093E 01  2.220E 01
Liu and Ma           1.518E 01  1.689E 01  1.905E 01  2.065E 01
New method           1.653E 01  1.798E 01  1.987E 01  2.132E 01

Table 5.26 Gumbel-Gumbel simulation of winds and waves, seasonal variation model, conditional distribution hindcasting with bins, 60 years of joint data

140 years of data available
60 years of data with wind only
60 years of joint data used
start year = 61, stop year = 120
NTYPH, TYPHN = 53, 75.0
10 bins with upper bounds at 10. 20. 30. 40. 50. 60. 75. 90. 110. 140.

DOSTAT SUMMARY
BIN  NOBS  YBAR       SIG
  4   587  6.641E 00  2.637E 00
  5   323  7.413E 00  2.702E 00
  6   147  8.132E 00  2.412E 00
  7    98  9.151E 00  2.359E 00
  8    32  1.126E 01  2.541E 00
  9    16  1.349E 01  3.072E 00
 10     5  1.511E 01  2.643E 00

Predictions of wave heights for return periods
                     10.        20.        50.        100.
Gumbel (annual max)  1.819E 01  1.953E 01  2.126E 01  2.256E 01
Liu and Ma           1.576E 01  1.752E 01  1.973E 01  2.138E 01
New method           1.671E 01  1.816E 01  2.010E 01  2.159E 01
Table 5.27 Gumbel-Gumbel simulation of winds and waves, seasonal variation model, conditional distribution hindcasting with bins, 40 years of joint data

140 years of data available
60 years of data with wind only
40 years of joint data used
start year = 61, stop year = 100
NTYPH, TYPHN = 39, 75.0
10 bins with upper bounds at 10. 20. 30. 40. 50. 60. 75. 90. 110. 140.

DOSTAT SUMMARY
BIN  NOBS  YBAR       SIG
  4   389  6.619E 00  2.639E 00
  5   211  7.463E 00  2.656E 00
  6   101  8.032E 00  2.197E 00
  7    66  9.272E 00  2.453E 00
  8    25  1.093E 01  2.361E 00
  9    11  1.378E 01  3.131E 00
 10     3  1.439E 01  2.489E 00

Predictions of wave heights for return periods
                     10.        20.        50.        100.
Gumbel (annual max)  1.743E 01  1.862E 01  2.016E 01  2.131E 01
Liu and Ma           1.562E 01  1.733E 01  1.949E 01  2.109E 01
New method           1.658E 01  1.803E 01  1.999E 01  2.144E 01

Table 5.28 Gumbel-Gumbel simulation of winds and waves, seasonal variation model, conditional distribution hindcasting with bins, 20 years of joint data, the 1st period

140 years of data available
60 years of data with wind only
20 years of joint data used
start year = 61, stop year = 80
NTYPH, TYPHN = 19, 75.0
10 bins with upper bounds at 10. 20. 30. 40. 50. 60. 75. 90. 110. 140.

DOSTAT SUMMARY
BIN  NOBS  YBAR       SIG
  4   200  6.709E 00  2.875E 00
  5   104  7.581E 00  2.655E 00
  6    52  7.863E 00  2.165E 00
  7    32  9.407E 00  2.282E 00
  8    13  1.176E 01  2.079E 00
  9     5  1.470E 01  4.243E 00
 10     1  1.678E 01  0.0

Predictions of wave heights for return periods
                     10.        20.        50.        100.
Gumbel (annual max)  1.810E 01  1.934E 01  2.094E 01  2.214E 01
Liu and Ma           1.656E 01  1.837E 01  2.065E 01  2.234E 01
New method           1.388E 01  1.546E 01  1.762E 01  1.936E 01

Table 5.29 Gumbel-Gumbel simulation of winds and waves, seasonal variation model, conditional distribution hindcasting with bins, 20 years of joint data, the 2nd period

140 years of data available
60 years of data with wind only
20 years of joint data used
start year = 81, stop year = 100
NTYPH, TYPHN = 20, 75.0
10 bins with upper bounds at 10. 20. 30. 40. 50. 60. 75. 90. 110. 140.

DOSTAT SUMMARY
BIN  NOBS  YBAR       SIG
  4   189  6.523E 00  2.368E 00
  5   107  7.349E 00  2.664E 00
  6    49  8.212E 00  2.239E 00
  7    34  9.145E 00  2.631E 00
  8    12  1.003E 01  2.399E 00
  9     6  1.301E 01  1.911E 00
 10     2  1.320E 01  1.961E 00

Predictions of wave heights for return periods
                     10.        20.        50.        100.
Gumbel (annual max)  1.660E 01  1.768E 01  1.903E 01  2.012E 01
Liu and Ma           1.453E 01  1.605E 01  1.798E 01  1.941E 01
New method           1.330E 01  1.463E 01  1.636E 01  1.769E 01

5.3 Summary Tables

5.3.1 Description of the Tables

These tables were prepared to give a better overall view of the behavior of the different methods. The predictions for events with 50 and 100 year recurrence periods are grouped together because these longer-term predictions are a greater challenge to a prediction method than the 10 and 20 year recurrences listed in the individual tables. Also, 50 and 100 year recurrence intervals are short enough to have more practical importance than, for example, a 5000 year recurrence prediction, and there is likely to be more economic importance in a structure designed to last 50 or 100 years than in one which will last 10 or 20.

The format of the summary tables is largely self-explanatory. The first column, "years of joint data", gives the start and stop years of the period of the calculations, as they appear in the individual tables on the previous pages. The following columns appear in pairs, describing the results for each prediction method. The first column of each pair gives the number of observations on which the prediction is based, and the second is the prediction itself.
The first column gives the number of observations on which each prediction is based, and the second is the 187 prediction. Por the annual max series method, the number of observations is simply the number of years. The number of typhoons appear as a separate item on the earlier printouts, and the number of points for the new method is the sum of the NOBS column of the DOSTAT Summary. The first three pages give the results for the three types of models which were discussed in Chapter 4. The uniform model and the seasonal model with hurricanes were studied first, since they were expected to provide the greatest contrast with the seasonal model as an intermediate case. A less complete set of runs was made for the seasonal model, to reduce computer expense. As predicted in Chapter 4, evaluating the predictions of the models takes very little time, compared to generating the tens of thousands of random numbers needed in the simulation. A further expense was provided by the need to rerun the program after the results which are summarized in tables 5.34 and 5.35. These were generated with a major programming error which limited the highest wind speeds to 100 miles per hour. Thus, I inadvertently obtained a test of the sensitivity of the different methods to measurement errors in the data. This is valuable because 188 it is shown that the weather instruments used in earlier times were not as accurate as newer types. For example, the anemometers used before 1927 tended to overestimate the higher wind speeds. While correction methods are known for these errors, it seems likely that the older results are still not as reliable as the newer ones. One final item which appears on each page of tables is the "actual" minimum wave height with a given recurrence period estimated from simulation of a longer period of data. Owing to the aforementioned expense of generating huge numbers of random numbers, and because this method required more random numbers per simulated year than one required by Liu and Ma's simulation method, the future has not been simulated over hundreds of thousands of years, as they did. Periods of 200 and 500 years of waves have been simulated, and the results listed on the tables. The 200 years actually consists of 120 years of future, and the 80 years of waves in the joint data. Likewise, the 80 years are augmented by another 420 to get the 500. Table 5.30 Uniform summary model recurrence period of , 60 years 100 years of data with wind only, years of joint data n joint Annual max Gumbel n typhoon Liu and Ma n points New method 61-140 80 21.35 44 18.13 1708 20.56 61-120 60 21.72 38 18.62 1285 20.65 61-100 40 20.09 28 18.55 861 20.27 101-140 40 22.44 16 17.80 847 19.38 61-80 20 20.79 15 19.69 420 19.40 81-100 20 19.13 13 17.11 441 18.71 101-120 20 23.97 10 19.04 424 20.31 121-140 20 20.14 6 15.65 423 18.94 61-70 10 18.16 6 14.54 215 18.85 71-80 10 21.98 9 22.25 205 20.12 81-90 10 18.69 6 17.55 215 19.47 91-100 10 19.04 7 16.75 226 17.98 From an actual 500 year simulation From an actual 200 year simulation 23.36 20.35 189 Table 5.31 Uniform summary model, 60 years of data with wind only, recurrence period of 50 years Years of joint data n joint Annual max Gumbel . 
Table 5.31 Uniform summary model, 60 years of data with wind only, recurrence period of 50 years

Years of joint data  n joint  Annual max  n typhoons  Liu and Ma  n points  New method
61-140               80       20.19       44          16.71       1708      19.16
61-120               60       20.52       38          17.19       1285      19.24
61-100               40       19.09       28          17.14       861       18.92
101-140              40       21.14       16          16.13       847       18.05
61-80                20       19.75       15          18.02       420       17.98
81-100               20       18.22       13          16.03       441       17.31
101-120              20       22.53       10          17.52       424       18.65
121-140              20       19.10       6           14.28       423       17.56
61-70                10       17.41       6           13.63       215       17.49
71-80                10       20.89       9           20.36       205       18.50
81-90                10       17.95       6           16.47       215       17.95
91-100               10       18.03       7           15.66       226       16.68

From an actual 500 year simulation: 21.40
From an actual 200 year simulation: 19.93

Table 5.32 Summary model of seasonal with hurricanes, 60 years of data with wind only, recurrence period of 100 years

Years of joint data  n joint  Annual max  n typhoons  Liu and Ma  n points  New method
61-140               80       24.01       108         24.63       1735      23.26
61-120               60       24.78       79          25.41       1296      23.84
61-100               40       22.42       57          23.74       862       22.40
101-140              40       25.46       51          25.52       870       22.15
61-80                20       23.78       31          24.75       437       21.69
81-100               20       20.56       26          22.43       428       19.98
101-120              20       28.46       22          28.94       434       24.65
121-140              20       21.15       29          21.55       439       19.99
61-70                10       23.18       14          24.74       212       20.36
71-80                10       23.24       17          24.97       222       22.40
81-90                10       20.95       12          22.93       210       20.96
91-100               10       20.21       14          22.36       218       19.36

From an actual 500 year simulation: 27.35
From an actual 200 year simulation: 24.06

Table 5.33 Summary model of seasonal with hurricanes, 60 years of data with wind only, recurrence period of 50 years

Years of joint data  n joint  Annual max  n typhoons  Liu and Ma  n points  New method
61-140               80       22.51       108         22.68       1735      21.57
61-120               60       23.19       79          23.39       1296      22.04
61-100               40       21.17       57          22.00       862       20.98
101-140              40       23.74       51          23.36       870       20.12
61-80                20       22.40       31          22.95       437       20.19
81-100               20       19.53       26          20.77       428       18.64
101-120              20       26.36       22          26.33       434       21.86
121-140              20       20.03       29          19.94       439       18.55
61-70                10       22.30       14          22.85       212       20.17
71-80                10       22.09       17          23.23       222       20.53
81-90                10       19.92       12          21.15       210       19.38
91-100               10       19.17       14          20.75       218       18.36

From an actual 500 year simulation: 24.56
From an actual 200 year simulation: 22.70

Table 5.34 Summary model of seasonal variation, 60 years of data with wind only, recurrence period of 100 years

Years of joint data  n joint  Annual max  n typhoons  Liu and Ma  n points  New method
61-140               80       22.20       66          20.65       1623      21.32
61-120               60       22.56       53          21.38       1208      21.59
61-100               40       21.31       39          21.09       806       21.44
61-80                20       22.14       19          22.34       407       19.36
81-100               20       20.12       20          19.41       399       17.69

From an actual 200 year simulation: 23.12

Table 5.35 Summary model of seasonal variation, 60 years of data with wind only, recurrence period of 50 years

Years of joint data  n joint  Annual max  n typhoons  Liu and Ma  n points  New method
61-140               80       20.93       66          19.05       1623      19.87
61-120               60       21.26       53          19.73       1208      20.10
61-100               40       20.16       39          19.49       806       19.99
61-80                20       20.94       19          20.65       407       17.62
81-100               20       19.08       20          17.98       399       16.36

From an actual 200 year simulation: 22.53

5.3.2 Analysis of the Tables

The clearest result in these tables is that the new method is more stable if different time periods are selected, hence more immune to short-term sampling effects, than either the annual max series method or the method of Liu and Ma. Perhaps the easiest way to see this is to consider the range between the maximum and the minimum predicted value for each method, for the different models and recurrence periods. Table 5.36 provides this summary.

Table 5.36 Maximum and minimum predicted value for each method
                           Annual max             Liu and Ma             New method
Model, recurrence          least  greatest  diff  least  greatest  diff  least  greatest  diff
Uniform, 100               18.16  23.97     5.81  14.65  22.25     7.60  17.98  20.65     2.67
Uniform, 50                17.41  22.53     5.12  13.63  20.36     6.73  16.68  19.24     3.54
Seasonal w/ hurricanes, 100  20.21  28.46   8.25  21.55  28.94     7.39  19.63  24.65     5.02
Seasonal w/ hurricanes, 50   19.17  26.36   7.19  19.94  26.33     6.39  18.36  22.04     3.68

It is clear that the new method produces results which are usually about half as variable as the other methods. It is worth noting that the relative variability of the other methods depends on the model used. Thus, for the uniform model, Liu and Ma's method is more variable than the annual maximum method because this model produces few occasions of winds greater than 75 miles per hour, which can be treated as typhoons for Liu and Ma's method. In short, Liu and Ma's method has little information to work on in the uniform model case. This is obvious from the "n typhoons" columns in the summary table pages. On the other hand, the seasonal model with hurricanes makes the annual max series method more variable than the method of Liu and Ma. Not only are there more typhoons for the Chinese method to incorporate into its calculations, but the basic process, with its seasonal variations and randomly inserted hurricanes, seems less likely to be modeled well by a simple Gumbel distribution, such as the annual max method uses in this program.

Turning now to the absolute accuracy of the predictions, we find that all the methods tend almost always to underpredict the actual height of the events with specified recurrence periods. If we say, as a heuristically chosen criterion, that a method is successful if its prediction is within .5 feet of the "actual" result, and that otherwise it has overpredicted or underpredicted, then Table 5.37 shows the scores for the different methods based on the summary tables.

Table 5.37 Correct, over- and underprediction scores for the different methods, within a range of .5 feet, based on summary tables

                                Annual max method    Liu and Ma           New method
Recurrence period  Future years under correct over  under correct over   under correct over
Uniform model
100 years          200          4     3       5     11    0       1      7     5       0
                   500          11    0       1     12    0       0      12    0       0
50 years           200          6     2       4     11    1       0      12    0       0
                   500          9     2       1     12    0       0      12    0       0
Seasonal with hurricanes
100 years          200          7     2       3     4     1       7      10    1       1
                   500          11    0       1     11    0       1      12    0       0
50 years           200          6     4       2     5     4       3      12    0       0
                   500          11    0       1     11    0       1      12    0       0
Seasonal variation
100 years          200          5     0       0     5     0       0      5     0       0
50 years           200          5     0       0     5     0       0      5     0       0

Out of the twenty-nine different combinations of model, prediction method, return period, and length of future simulation presented above, there are only two in which the underprediction column is not the largest. Possibly the criterion for a successful prediction is excessively strict, but this display does show the tendency for predictions to be less than the observed values in these simulations. It is possible that further simulations would change this mixture.

A number of considerations suggest that these results may not completely represent the results that would be obtained from more extensive simulations. For example, all three models were run using the same seed numbers for the random number generators. Thus, the uniformly distributed random numbers generated by GGUBS were the same, although the different models generated different wind speeds and wave heights from them.
This accounts for the fact that, in both the uniform and the seasonal model with hurricanes, the twenty year period from 101 to 120 produced unusually high predictions, and that the 500 year future period produced higher values than the 200-year future period. A truly random process can display periods of apparently consistent behavior. This can be seen by considering that if a uniformly distributed random process has no correlation between its values at any lags, its autocorrelation function will be a delta function, and then the Fourier transform of the autocorrelation, the power spectrum, will be a constant function. Having significant power in the power spectrum at low frequencies means that the time series of the data values can exhibit temporary trends and brief periods when the values clump together. Since phase information is not included in a power spectrum, and such periods of apparently orderly behavior depend on the relative phase of the various sinusoids which make up the process, it is hard to predict when such events will occur, but the results of this simulation suggest that several such episodes occur in the sequences of random numbers generated by GGUBS starting with the currently used seeds.

If enough computer time were available, perhaps it would be good to try both longer simulations with the currently used seeds, and similar simulations using other seeds. It is worth noting that starting GGUBS with a different seed has the effect of starting it at a different point in the extremely long cycle of numbers which it generates. Considering that each run uses only a few hundred thousand of the many millions which can be generated, the chances are very good that any other seeds chosen would start the simulation in portions of the sequence not touched during the simulations reported here. The IMSL documentation recommends seeds of 7 digits or more. Possibly 7-digit strings from the RAND Corporation book of random numbers would be good seeds.

It also appears from the erroneous summary tables that the new method is more sensitive to data errors than the other methods are. The predictions by the new method based on the erroneous data with the 100 miles per hour wind limit, shown on the following pages, are often a foot or two below the ones in the previous summary tables, which resulted from the corrected data. The predictions by the annual max series method and Liu and Ma's are less affected. This is comprehensible. Since the new method was designed to make more use of the data, it might be more sensitive to errors which appear in data values which are ignored by the annual max series method, and are averaged together with more values (since all "typhoons" are treated together) by Liu and Ma's method. In particular, the estimation of standard deviations, which is so important in the new method, is sensitive to outlying values.

Table 5.38 Erroneous uniform summary table with 60 years of wind data and 100 years of return period

Years of joint data  n joint  Annual max  n typhoons  Liu and Ma  n points  New method
61-140               80       21.35       44          17.78       1708      20.56
61-120               60       21.72       38          18.22       1285      20.60
61-100               40       20.08       28          18.38       861       20.23
101-140              40       22.46       16          17.16       847       19.58
61-80                20       20.79       15          19.69       420       19.40
81-100               20       19.09       13          16.70       441       18.59
101-120              20       24.09       10          18.13       424       18.78
121-140              20       20.14       6           15.65       423       17.56
61-70                10       18.16       6           14.65       215       17.56
71-80                10       21.98       9           22.25       205       18.58
81-90                10       18.62       6           16.79       215       17.82
91-100               10       19.04       7           16.75       226       16.71

From an actual 500 year simulation: 23.04
From an actual 200 year simulation: 20.35
Table 5.39 Erroneous uniform summary table with 60 years of wind data and 50 years of return period

Years of joint data  n joint  Annual max  n typhoons  Liu and Ma  n points  New method
61-140               80       20.17       44          16.41       1708      19.16
61-120               60       20.51       38          16.84       1285      19.20
61-100               40       19.07       28          16.99       861       18.88
101-140              40       21.14       16          15.77       847       18.05
61-80                20       19.75       15          18.02       420       17.98
81-100               20       18.17       13          15.67       441       17.25
101-120              20       22.61       10          16.75       424       17.16
121-140              20       19.60       6           14.28       423       16.18
61-70                10       17.41       6           13.63       215       16.19
71-80                10       20.89       9           20.36       205       16.96
81-90                10       17.89       6           15.82       215       16.36
91-100               10       18.03       7           15.66       226       15.40

From an actual 500 year simulation: 21.29
From an actual 200 year simulation: 19.93

Table 5.40 Erroneous seasonal variation summary table with 60 years of wind data and 100 years of return period

Years of joint data  n joint  Annual max  n typhoons  Liu and Ma  n points  New method
61-140               80       21.93       66          19.65       1623      21.00
61-120               60       22.31       53          20.49       1208      21.27
61-100               40       21.17       39          20.64       806       21.27
61-80                20       21.94       19          21.83       407       20.86
81-100               20       20.12       20          18.92       399       18.85

From an actual 200 year simulation: 22.70

Table 5.41 Erroneous seasonal variation summary table with 60 years of wind data and 50 years of return period

Years of joint data  n joint  Annual max  n typhoons  Liu and Ma  n points  New method
61-140               80       20.69       66          18.19       1623      19.60
61-120               60       21.03       53          18.95       1208      19.83
61-100               40       20.03       39          19.09       806       19.82
61-80                20       20.75       19          20.21       407       19.20
81-100               20       19.08       20          17.53       399       17.52

From an actual 200 year simulation: 21.86

This may explain the apparently paradoxical result that a few of the predictions based on the erroneous data are higher than predictions based on the correct data. Making the correction meant that fewer waves corresponded to winds in the 90-110 miles per hour interval, which thus would have a smaller standard deviation. The wind interval from 110 to 140 might have only one or two points in it, particularly for the shorter joint-data periods, and thus also contribute either nothing or a highly unreliable value to the cumulative distribution. It is possible that small estimates of the standard deviation are an important contributor to the underprediction problem. If so, a better choice of wind speed intervals will help.

It is also possible that the annual max series method could be affected by standard deviation changes based on changes in outlying values. Since the Gumbel "mean" parameter is

$$U = \bar{Y} - 0.45S,$$

a value which moves farther out could conceivably affect $S$ more than $\bar{Y}$, and result in a decrease of $U$. This appears the most plausible explanation for the slight decrease in a few of the 100 year predicted values from the annual max series method after the data generation correction was made.

I can summarize this section by saying that the simulations which have been done so far have displayed a wealth of phenomena which have not yet been fully explained. Perhaps the most surprising result is the observed tendency of all the methods to underpredict extreme events as produced by this simulation.
More simulations are needed to see how important this tendency is. One result is clear, however. The new method is the most stable.

5.4 "Actual" Design Wave Heights

In order to check the predictions of the different methods, a long stretch of the future must be simulated. However, most of the values generated during this simulation can be discarded. Only the very highest need to be kept. For example, if one finds the highest 100 wave heights in 500 years of data, then on the average one of them occurred every 5 years, and the least of them is thus the minimum event with a 5-year recurrence period. This means that at any given time during the simulation of those 500 years, it is only necessary to keep a list of the 100 highest values which have been seen so far, ordered from highest to lowest. Whenever a new large value occurs, it can be inserted into its proper place in the list, bumping down the ones which should now be below it, and removing the old lowest value. Any value which is below the current lowest value is clearly too small for further consideration.

Thus, in general, if one is looking for the event with a recurrence period of k years, an approximation will be given for n years of simulated data by taking the highest $n/k$ values, and selecting the lowest. The recurrence periods considered in this program are 10, 20, 50 and 100 years. Thus, for a future simulation of n years, the program collects the highest $\frac{n}{10}$ values, and estimates the design wave heights by looking at the $\frac{n}{10}$th, $\frac{n}{20}$th, $\frac{n}{50}$th, and $\frac{n}{100}$th values in the list.
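A minimal sketch of this bookkeeping follows, in Python rather than the program's FORTRAN, with a binary min-heap substituted for the explicitly ordered list described above (the two are equivalent here; the heap simply makes the bump-and-discard step cheap). All names are illustrative, and it assumes n_years is divisible by each recurrence period.

```python
import heapq

def design_heights(wave_stream, n_years, periods=(10, 20, 50, 100)):
    """Scan simulated wave heights, keeping only the largest n_years/10
    values; any value below the current smallest kept value is discarded
    immediately.  The estimate for a k-year recurrence period is the
    (n_years/k)-th largest value seen."""
    keep = n_years // min(periods)       # e.g. 500 years -> keep top 50
    heap = []                            # min-heap of the kept values
    for h in wave_stream:
        if len(heap) < keep:
            heapq.heappush(heap, h)
        elif h > heap[0]:
            heapq.heapreplace(heap, h)   # bump out the old lowest value
    ranked = sorted(heap, reverse=True)  # highest first
    return {k: ranked[n_years // k - 1] for k in periods}
```

For example, with 500 simulated years, the 100-year design height is the 5th entry of the ranked list, and the 10-year height is the 50th.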
The remark above about the relatively rapid decrease of the distributions on their G' (x) tails can be justified by looking at ------, where G is G {x) the tail of the distribution function. We have primarily considered the upper tail of a Gumbel distribution, and hence have been looking at [l-exp(-e-x)} for large x. As discussed above, for sufficiently large x, 1-exp(-e-x) e-x and 211 -x> i (e A) = -1. „-x Rewriting the derivative in differential notation, we have dF dx dF — = -1 or — = -dx F F which means that in the case of a simple Gumbel distribution the relative change in the function is proportional to the absolute change in the independent variable. Allowing for the normalization of the variables, and the summing of the Gumbel distributions to get the overall distribution function still gives us that a 1% uncertainty in the probability corresponds to an uncertainty of a few hundredths of a foot in the outputs, which is why the outputs are given to two decimal places. 5.6 Notes on Flowcharts All "=" signs represent the FORTRAN replacement operation. Thus, I = 1+1 means I is replaced by one more 212 than its previous value, and is a legitimate statement. Flov/charts are shov/n in the Appendix. 213 CHAPTER 6 SUMMARY AND DISCUSSION In order to obtain improved estimates of extreme geophysical events for civil engineering designs, this dissertation has presented a new method of hindcasting shore waves from wind data, when a period of joint data is available. The joint data are used to estimate the conditional cumulative distribution of the waves, given the observed wind speed. Then the old wind speed observations are used to obtain the corresponding conditional cumulative distributions, and the conditional distributions are combined to obtain an improved estimate of the long-term cumulative distribution of the waves. From this distribution, it is possible to estimate the minimum magnitude of events with a specified recurrence period. This method is new, as far as can be told from the extensive literature search discussed in Chapters 2 and 3, but it is related to some previous methods. Compared 214 to the more conventional method of fitting a regression line through the joint data from which a most probable value is hindcast for each wind, we see that the new method hindcasts a conditional cumulative distribution, instead of a single value. Compared to the method of Liu and Ma, who concentrate on waves in typhoons, we see that the new method makes a finer subdivision of wind speeds, instead of ignoring everything except typhoons. Compared to the ocean wave hindcasting method criticized by H.C.S. Thom, the new method involves a much shorter chain of inferences, and works relatively directly with distributions, as he has done in some of his papers. There are several possible implementations of the general idea which I have discussed in Chapter 4. But for reasons of computational convenience and because Gumbel distributions are commonly used to fit measurements of extreme waves, the conditional distributions used so far have been assumed to be Gumbel distributions. I estimate the mean and standard deviation of the extreme waves corresponding to winds in specified speed intervals, and then use these values to estimate the parameters of Gumbel distributions, one for each wind speed interval. 
Other approaches might make more effort to subdivide the 215 range of speeds into narrower intervals using interpolation or regression methods to estimate intermediate values of the mean and standard deviation of the wave heights as function of wind speed. The method has been tested by simulation. Several simulation models have been tried, with varying degrees of realism. One is a "uniform weather" model which assumes an unchanging probability distribution of wind speeds throughout the year; another uses a sinusoidal variation of the mean and standard deviation of the wind speeds to simulate seasonal variations in weather patterns, while a third adds to the seasonal variations a random number of typhoon events. Once a simulated wind value is generated by one of these methods, using a uniformly distributed random number between 0 and 1, and an inverse cumulative distribution, a simulated wave value - is generated by using the simulated wind to determine the parameters of an inverse cumulative Gumbel distribution. A second, independent U{0,1) random number is fed into this inverse cumulative Gumbel to simulate the wave height raised by this wind. The uniformly distributed random numbers are generated by an IMSL routine which has been carefully tested for nationwide 216 use. The simulations show clearly that the new method is more stable than the others in the sense that its predictions show less variation with the portion of the input data set used. The other methods used in the simulation for comparison with the new method are the annual maximum series method assuming a Gumbel distribution, and the method of Liu and Ma, which employs a compound Poisson Gumbel distribution. Stability in cases where only small amounts of data are available was one of the chief concerns of Liu and Ma, who apparently have no data from before the end of the Chinese Revolution in 1949. In this simulation, the new method shows greater stability than Liu and Ma's method even on the seasonal model with hurricanes, which is similar to the simulation model which they used, but not identical. The simulation results are somewhat preliminary, as only a few years of data have been simulated so far, because of computer expense. All of the methods showed an unexpectedly large tendency to underpredict the magnitude of events with a specified return period. Only further simulation can show if this is a persistent tendency of this simulation program, or merely an artifact of the 217 particular sequence of random numbers generated by the IMSL routine using the particular ' ‘seed" numbers that were selected for this program. There is some evidence, discussed in Chapter 5, that the underprediction is partly by the result of local clumping of the random numbers. Obviously I would like to test this method on real data. I expect that source of wind and wave data might be available in Southern California, but the most promising collection of data of which I know now are in National Data Centers maintained by the U.S. Government on the East Coast. There is a National Climatic Data Center in Ashevile, North Carolina, a National Oceanic Data Center in Washington, D.C., and the Army Ocean Engineering Center in Ft. Belvoir, Virginia. The services of the National Climatic Data center are quite expensive, and the description of the National Oceanic Data Center's records suggests that they are not really suitable for my purpose at present, being concerned more with the open ocean than the shore. 
The use of real data will pose some questions of sampling which have not been addressed yet. Perhaps the largest is the decision about the maximum time lag allowable between wind and wave observations. The use of the Gumbel distribution in the simulation and in the calculation method assumes that when real wind and wave data are used, they should be the extreme values. There is no difficulty in selecting the highest wind speed, or the highest wave measurement, in a day or in a storm; but once such a datum is selected, how close in time should the other measurement be chosen? The moment when the highest wave is recorded is not likely to be the moment of the highest wind, and vice versa. The physics of the situation argues that wind and wave measurements made several hours apart may be expected to be dependent. Therefore it is reasonable to select the highest wind speed and the highest wave height within several hours of each other. Only a study of data from a particular location can help us to find how many hours would be reasonable to use. I doubt that a separation as long as a day could be used, but this is just a guess. As the areal extent of storms usually increases with their intensity, adjusting the time interval according to the intensity is conceivable, but it might not be helpful. For example, it might be difficult to do reliably, and hence would introduce more errors than it would eliminate, and it might easily require more information than is available in the old records for which this method is intended. I do not know how reliably one could estimate storm areas in the days before satellite photographs became available.

A number of plausible further modifications of this hindcasting approach seem possible. For example, the height of waves is affected by the angle between their direction of propagation and the direction of the wind. Possibly, if wind direction data were used in the calculation as well as wind speeds, this would account for some of the variance observed in wave heights corresponding to a given speed. In this case the conditional distribution of wave heights, given both wind speed and direction, would be narrower, but the number of combinations of wind speed and direction could be large.

For simulations, this idea suggests the possibility of a more physically realistic simulation scheme in which wind speeds and directions might be simulated at various points on a grid near a shore area, and the waves at the point of interest on the shore might be calculated as some sort of average of the simulated winds on the grid. This idea is based on a spatial average, as compared to the temporal average concept mentioned at the end of Chapter 4's discussion of simulations. The two ideas could be related if there were enough local ergodicity in storms. This would mean that the winds at a given point, if observed sufficiently frequently for a long enough period, might provide information comparable to that which would be obtained by making simultaneous measurements at many points in the storm. Once again, this idea of local ergodicity requires more data than would be available in old records, so it is not clear that it would be relevant to my original problem of making better use of long-term observations.

One might follow the example of Buishand [8] in combining data from stations whose observations are independent, but have statistical properties which are similar enough that they can be considered as independent realizations of the same stochastic process.
For example, the data from stations 4 through 7 in Liu and Ma's paper have Poisson parameters, respectively, of 2.66, 2.81, 2.48, and 2.33. These parameters are sufficiently close to each other to suggest that the data from these stations might be combined, which would give a total of 88 station-years of data, containing 200 typhoons. (They have, respectively, 19, 27, 21, and 21 years of data, containing 40, 59, 52, and 49 typhoons.) However, it might take considerable effort to find places in the U.S. which could be combined in this manner, because of the great quantities of data that are available. If the station-year method could be combined with the method introduced here, so that independent data series extended by conditional distribution hindcasting could be combined to form very large records, the improvement could be greater than what could be obtained by either method alone.

It appears to be rare for uncertainty estimates to be attached to estimates of long-term geophysical extrapolations for design. For example, Liu and Ma present an estimate that at their coastal "station three" there will be a 100-year return period for waves at least 12.97 meters high, based on all 27 years of data from the station. However, when this period is broken into 9-year intervals, the predictions are 12.05, 12.24, and 13.84 meters. Similar variations were displayed by all the methods in the simulations in Chapter 5, and the emphasis in the previous discussion was on the greater stability provided by the new method. We might take this variability as an indication of the true uncertainty, and assign an error range to the estimates, as physicists do. In this case the estimate for Liu and Ma's station three would be something like "13 ± 1 meters". We may note that estimating the uncertainty of a calculation by repeating it on different subsets of the original data is the essence of the "bootstrap" statistical method. Taking a more traditional approach, however, we note that if an uncertainty can be assigned to the mean Yj and standard deviation Sj calculated for the waves corresponding to the winds in interval j, then, since the width and location parameters of a Gumbel distribution are simple functions of Yj and Sj, standard approaches to error propagation (e.g., Bevington [4], Chapter 4) can be used to obtain uncertainties for the estimates F̂_Y(y); the relevant relations are written out at the end of this discussion.

As another approach to further mathematical analysis of this method, we note that, as discussed in Chapter 4 under "selection of independent observations", the number of wave events corresponding to wind speeds above a cutoff which makes them independent will be a random number. These wave events have random magnitudes, so the analysis methods of Todorovic and coworkers can be applied to the new method. They can also be applied to the method of Liu and Ma, since the number of typhoons, with the largest wave in each, is clearly a process of a random number of events of random magnitudes. Both their method and the new one are closely enough related to the partial duration series method that similar theoretical approaches can be used to analyze them.
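To make the propagation step mentioned above explicit, here is a sketch of the relevant relations, under the assumptions that the interval parameters are fitted by the method of moments and that Yj and Sj may be treated as uncorrelated (itself an approximation):

    \beta_j = \frac{\sqrt{6}}{\pi} S_j, \qquad
    \mu_j = Y_j - \gamma \beta_j \qquad (\gamma \approx 0.5772),

so that, to first order,

    \sigma_{\beta_j} = \frac{\sqrt{6}}{\pi} \sigma_{S_j}, \qquad
    \sigma_{\mu_j}^2 = \sigma_{Y_j}^2 + \gamma^2 \sigma_{\beta_j}^2,

and, with F_j(y) = \exp\left(-e^{-(y - \mu_j)/\beta_j}\right),

    \sigma_{F_j}^2 \approx
    \left(\frac{\partial F_j}{\partial \mu_j}\right)^2 \sigma_{\mu_j}^2
    + \left(\frac{\partial F_j}{\partial \beta_j}\right)^2 \sigma_{\beta_j}^2.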
For example, Brater and Wallace [7] use a 3-year moving average of rainfall to predict streamflow, although the greatest weight is given to rainfall in a few previous months. The conditional distribution method would allow one to deal with the residual variation left in the streamflow data after the best moving average model had been devised from the joint data, and the independent variable used in the hindcasting could not be individual rainfalls, but moving average values. It would not be excessively difficult to devise reasonably realistic simulation for runoff with which to 224 test the model. For example, an element of added variation, comparable to the hurricanes in my wave simulation, would be added by including mountain snowpeaks in the simulation model, with random snowstorm events, and a randomly varying temperature to control the meltrate. Perhaps the change in the temperature in degrees could be a Poisson distributed integer. There are many possibilities. The change in the snowpeak in each day (or whatever time period was used) would be the difference between the snowfall, if any, and the melting, if the temperature were warm enough, and so on. It may also be possible to apply this method to earthquake studies. For example, the magnitudes of earthquakes on a given fault tend to be positively correlated with the time intervals preceding them, since the plate tectonic movements which build up strain in the rocks appear to be fairly uniform. The times of large earthquakes are recorded for centuries past in many parts of the world, although instrumentally measured magnitudes are not available before the early 20th century. if rough estimates of magnitudes based on damage reports could be supplemented by more precise hindcast values, prediction and risk assessment might be aided. There 225 would be an additional payoff if the use of joint distributions led to observations of scaling behavior by which results from small faults could be applied to large ones. For example, the proportionality between earthquake magnitudes and preceding interval might be expected to depend both on fault length and on the rate of ground motion on opposite sides of the fault. Perhaps the type of fault, such as thrust, or strike slip, might be important, too, but if these turned out to be the principal factor, then it would be possible to improve the statistics by combining data from several faults as in the station year method. These later possibilities indicate the wide variety of potential studies opened by increased use of joint and conditional distribution. When I began looking for an improved method of hindcasting waves from winds, I had no idea that the work would have so many possible applications to other areas. The simulation method used has also provided valuable experience, and the possibility of adaptation to other areas. Engineering geophysical data analysis and simulation have become much more sophisticated and complex in recent years, with the increased availability of automatic 226 computation and accumulating evidence that more traditional methods, suitable for hand computation, have been inadequate. The damage done along the coast of Southern California by the unusually large storms of the winter of 1983 is a good example of failure in the past to predict correctly that such large storms might occur in the life times of the structures that were damaged. I hope that the methods and concepts which I have introduced will be a contribution to safer and more effective engineering. 
REFERENCES

[1] Alexander, G.N., "Heavy Rainfall as a Stochastic Process," Review of the International Statistical Institute, Vol. 38:1, 1970.

[2] Ang, Alfredo H-S. and Tang, Wilson H., Probability Concepts in Engineering Planning and Design, Vol. 1, John Wiley & Sons, Inc.

[3] Benjamin, Jack R. and Cornell, C. Allin, Probability, Statistics and Decision for Civil Engineers, McGraw-Hill Book Company.

[4] Bevington, Philip R., Data Reduction and Error Analysis for the Physical Sciences, McGraw-Hill Book Company.

[5] Bishop, Craig T., "Comparison of Manual Wave Prediction Models," Journal of Waterway, Port, Coastal and Ocean Engineering, ASCE, Vol. 109, No. 1, pp. 1-17, February 1983.

[6] Kinsman, Blair, Wind Waves: Their Generation and Propagation on the Ocean Surface, Prentice-Hall, Inc.

[7] Brater, Ernest F. and Wallace, Roger M., "Hindcasting Monthly Runoff," Journal of Hydraulic Engineering, Vol. 110, No. 2, pp. 126-141, February 1984.

[8] Buishand, T.A., "Bivariate Extreme-Value Data and the Station-Year Method," Journal of Hydrology, Vol. 69, pp. 77-95, 1984.

[9] Clarke, R.T. and Edwards, K.A., "The Application of the Analysis of Variance to Mean Areal Rainfall Estimation," Journal of Hydrology, Vol. 15, pp. 97-112, 1972.

[10] Cramer, Harald and Leadbetter, M.R., Stationary and Related Stochastic Processes, John Wiley & Sons, Inc.

[11] Cramer, Harald, Mathematical Methods of Statistics, Princeton University Press.

[12] Deheuvels, P., "Univariate Extreme Values - Theory and Applications," Invited Papers Meeting, pp. 837-858.

[13] Earle, M.D. and Baer, L., "Effects of Uncertainties on Extreme Wave Heights," Journal of the Waterway, Port, Coastal and Ocean Division, ASCE, Vol. 108, No. WW4, pp. 456-477, November 1982.

[14] Fisher, R.A. and Tippett, L.H.C., "Limiting Forms of the Frequency Distribution of the Largest or Smallest Member of a Sample," Proc. Camb. Philos. Soc., Vol. 24, pp. 180-190, 1928.

[15] Frechet, M., Ann. Soc. Pol. Math. Cracovie, Vol. 6, pp. 93-115, 1927.

[16] Gabriel, K.R. and Neumann, J., "On a Distribution of Weather Cycles by Length," Quarterly Journal of the Royal Meteorological Society, London, pp. 375-380, 1957.

[17] Galambos, J., The Asymptotic Theory of Extreme Order Statistics, Wiley, New York, 1978.

[18] Gnedenko, B.V., Ann. Math., Vol. 44, pp. 423-453, 1943.

[19] Gumbel, E.J., "Floods Estimated by Probability Method," Engineering News-Record, June 1945.

[20] Gumbel, E.J., Statistics of Extremes, Columbia University Press, New York, 1958.

[21] Gupta, Vijay K., Duckstein, L. and Peebles, R.W., "On the Joint Distribution of the Largest Flood and its Time of Occurrence," Water Resources Research, Vol. 12, No. 2, pp. 295-304, April 1976.

[22] Gutjahr, Allan L., et al., "Stochastic Analysis of Spatial Variability in Subsurface Flows: 1. Comparison of One- and Three-Dimensional Flows," Water Resources Research, Vol. 14, No. 2, pp. 263-271, 1978.

[23] Gutjahr, Allan L., et al., "Stochastic Analysis of Spatial Variability in Subsurface Flows: 2. Evaluation and Application," Water Resources Research, Vol. 14, No. 5, pp. 953-959, 1978.

[24] Isaacson, M.S.Q. and Mackenzie, Neil G., "Long-Term Distribution of Ocean Waves," Journal of the Waterway, Port, Coastal and Ocean Division, ASCE, Vol. 107, No. WW2, pp. 93-109, May 1981.

[25] Isikara, A.M.
and Vogel, Andreas (Editors), Multidisciplinary Approach to Earthquake Prediction: Proceedings of the International Symposium on Earthquake Prediction in the North Anatolian Fault Zone Held in Istanbul, March 31-April 5, 1980, Friedr. Vieweg.

[26] Jensen, Arne, "A Characteristic Application of Statistics in Hydrology," Review of the International Statistical Institute, Vol. 38:1, 1970.

[27] Kac, Mark and Iranpour M., G. Reza, A Mathematical Introduction to the Theory of Stochastic Processes, in preparation.

[28] Karlin, Samuel and Taylor, Howard M., A First Course in Stochastic Processes, Academic Press.

[29] Kim, So Gu, "On the Estimation of Parameters in the Statistical Prediction of Earthquakes," J. Phys. Earth, Vol. 31, pp. 251-264, 1983.

[30] Kirby, William, "On the Random Occurrence of Major Floods," Water Resources Research, Vol. 5, No. 4, pp. 778-784, August 1969.

[31] Kiremidjian, Anne S. and Anagnos, Thalia, "Stochastic Slip-Predictable Model for Earthquake Occurrences," Bulletin of the Seismological Society of America, Vol. 74, No. 2, pp. 739-755, April 1984.

[32] Knuth, Donald E., Seminumerical Algorithms, Addison-Wesley Publishing Company.

[33] Langbein, W.B., "Annual Floods and the Partial-Duration Flood Series," Transactions, American Geophysical Union, Vol. 30, No. 6, pp. 879-881, December 1949.

[34] Linsley and Franzini, Water Resources Engineering, McGraw-Hill.

[35] Liu, Teh-Fu and Ma, Feng-Shi, "Prediction of Extreme Wave Heights and Wind Velocities," Journal of the Waterway, Port, Coastal and Ocean Division, ASCE, Vol. 106, No. WW4, pp. 469-479, November 1980.

[36] Mann, Nancy R. and Singpurwalla, Nozer D., "Extreme-Value Distribution," pp. 606-613.

[37] Mejia, Jose M. and Rodriguez-Iturbe, Ignacio, "Correlation Links Between Normal and Log Normal Processes," Water Resources Research, Vol. 10, No. 4, pp. 689-690, August 1974.

[38] Morris, Charles D., "A Stochastic Model for a Small-Time-Interval-Intermittent Hydrologic Process," Journal of Hydrology, Vol. 68, pp. 247-272, 1984.

[39] Nolte, K.C. and Hsu, F.H., "Statistics of Larger Waves in a Sea State," Journal of the Waterway, Port, Coastal and Ocean Division, ASCE, Vol. 105, No. WW4, pp. 389-403, November 1979.

[40] Ochi, M.K. and Whalen, J.E., "Prediction of the Severest Significant Wave Height," Journal of the Waterway, Port, Coastal and Harbor Division, pp. 587-599, 1980.

[41] Pickands, James, "Multivariate Extreme Value Distributions," Invited Papers Meeting, pp. 859-878, December 1981.

[42] Papoulis, Athanasios, Probability, Random Variables and Stochastic Processes, McGraw-Hill Book Company.

[43] Ross, Sheldon M., Applied Probability Models with Optimization Applications, Holden-Day.

[44] Shane, Richard M. and Lynn, Walter R., "Mathematical Model for Flood Risk Evaluation," Journal of the Hydraulics Division, ASCE, Vol. 90, No. HY6, pp. 1-20, November 1964.

[45] Snyder, W.M., Mills, W.C. and Knisel, W.G., "A Unifying Set of Probability Transforms for Stochastic Hydrology," Water Resources Bulletin, Vol. 14, No. 1, pp. 83-38, February 1978.

[46] Takeuchi, Kuniyoshi, "Annual Maximum Series and Partial-Duration Series: Evaluation of Langbein's Formula and Chow's Discussion," Journal of Hydrology, Vol. 68, pp. 275-284, 1984.

[47] Tayfun, M.A., "Distribution of Crest-to-Trough Wave Heights," Journal of the Waterway, Port, Coastal and Harbor Division, Vol. 107, No. WW3, pp. 149-158, August 1981.
[48] Thom, H.C.S., "Extreme Wave Height Distributions Over Oceans," Journal of the Waterways, Harbors and Coastal Engineering Division, ASCE, Vol. 99, No. WW3, pp. 355-374, August 1973.

[49] Thom, H.C.S., "Asymptotic Extreme-Value Distributions of Wave Heights in the Open Ocean," Journal of Marine Research, Vol. 29, pp. 19-27, 1971.

[50] Thom, H.C.S., "Distributions of Extreme Winds Over Oceans," Journal of the Waterways, Harbors and Coastal Engineering Division, ASCE, Vol. 99, No. WW1, pp. 1-17, February 1973.

[51] Todorovic, P., "On Some Problems Involving Random Number of Random Variables," The Annals of Mathematical Statistics, Vol. 41, No. 3, pp. 1059-1063, 1970.

[52] Todorovic, P. and Zelenhasic, E., "A Stochastic Model for Flood Analysis," Water Resources Research, Vol. 6, No. 6, pp. 1641-1648, December 1970.

[53] Todorovic, P. and Woolhiser, D.A., "On the Time When the Extreme Flood Occurs," Water Resources Research, Vol. 8, No. 6, pp. 1433-1438, December 1972.

[54] Todorovic, P. and Rousselle, J., "Some Problems of Flood Analysis," Water Resources Research, Vol. 7, No. 5, pp. 1144-1150, October 1971.

[55] Todorovic, P., "Stochastic Models of Floods," Water Resources Research, Vol. 14, No. 2, pp. 345-356, April 1978.

[56] Vere-Jones, D., "Stochastic Models for Earthquake Sequences," Geophys. J.R. Astron. Soc., Vol. 42, pp. 811-826, 1975.

[57] Wax, Nelson (Editor), Selected Papers on Noise and Stochastic Processes, Dover Publications, Inc.

[58] Wiegel, R.L., "An Analysis of Data from Wave Recorders on the Pacific Coast of the United States," Transactions, American Geophysical Union, Vol. 30, No. 5, pp. 700-704, October 1949.

[59] Yegulalp, Tuncel M. and Kuo, J.T., "Statistical Prediction of the Occurrence of Maximum Magnitude Earthquakes," Bulletin of the Seismological Society of America, Vol. 64, No. 2, pp. 393-414, April 1974.

[60] Yevjevich, Vujica, Stochastic Processes in Hydrology, Water Resources Publications, Fort Collins, Colorado.

APPENDIX

[Flowchart: Main program. Generates simulated wind and wave data (GENDAT); performs the annual maximum series analysis with a Gumbel distribution (OLD); performs the analysis by Liu and Ma's method (NEWIER); performs the analysis by the new method (NEWEST); simulates a long future period to compare with the predictions (FORWRD); prints the predictions from all methods and the result of the simulation of the future.]

[Flowchart: Subroutine OLD (annual maximum series method with Gumbel distribution). For each year from the first to the last, finds the maximum wave BMAX in the current year and accumulates SUM1 = SUM1 + BMAX and SUM2 = SUM2 + BMAX**2; then finds the mean and standard deviation from SUM1 and SUM2, and predicts the design wave height for a specified return period from the inverse cumulative Gumbel distribution using that mean and standard deviation.]

[Flowchart: Subroutine NEWIER (Liu and Ma's method). Loops over all days of all years; whenever WIND(J,I) is at least the minimum wind speed for a typhoon, increments the typhoon count NTYPH and accumulates SUM1 = SUM1 + WAVE(J,I) and SUM2 = SUM2 + WAVE(J,I)**2; then finds the mean, standard deviation, and Poisson parameter lambda from SUM1, SUM2, and NTYPH, and predicts the design wave height from the inverse of the compound distribution with parameters mean, standard deviation, and lambda.]

[Flowchart: Subroutine NEWEST (the new method, wide interval version). Accumulates observations in wind speed intervals (DOSTAT), then predicts the design wave height by calculating the inverse of the estimated cumulative distribution from the DOSTAT data (DESHT).]
[Flowchart: Subroutine GENDAT (uniform weather model; generates all data with one call). Generates a uniform distribution in WIND for conversion to simulated winds (GGUBS, IMSL) and a uniform distribution in WAVE for conversion to simulated waves (GGUBS, IMSL); for each time step I, converts WIND(I) to a simulated wind (CGMINV), uses WIND(I) to give the parameters for generating the wave value, and converts WAVE(I) to a simulated wave (CGMINV).]

[Flowchart: Subroutine GENDAT (seasonal variation model; does one year per call). Generates uniform distributions in WIND and WAVE (GGUBS, IMSL); with THETA = 2*pi/(number of times per year), sets SEASON = -cos(THETA*I) for each time step I, sets the wind mean and standard deviation to their base lines plus SEASON times the amplitudes of variation, converts WIND(I) into a simulated wind using that mean and standard deviation (CGMINV), uses WIND(I) to give the parameters for generating the wave value, and converts WAVE(I) to a simulated wave (CGMINV).]

[Flowchart: Subroutine GENDAT (seasonal variation model with hurricanes; does one year per call). Generates uniform distributions in WIND and WAVE (GGUBS, IMSL), and a Poisson-distributed integer NTYPH giving the number of typhoons (GGPON, IMSL). The year is divided into seasons of NSEAS time steps; for each typhoon, a Poisson-distributed integer NWK (GGPON, IMSL) gives the number of weeks from the current time until the hurricane begins, and the typhoon length is the larger of one time period and 5 days. Ordinary days are simulated as in the seasonal model; during a typhoon, WIND(NDAY) is converted into a simulated wind using hurricane parameters, and the corresponding waves are generated as before.]

[Flowchart: Subroutine DOSTAT (accumulates statistics of wide intervals). Clears the arrays NOBS, YBAR, and SIG, which accumulate the number of observations, mean, and standard deviation of the waves for each interval of wind speeds. For each joint wind observation I, searches for the interval IPT it falls into, ignoring wind speeds below the cutoff level, and increments NOBS, YBAR, and SIG; YSTART is set to the largest wave height, to be used in the search procedure of DESHT. The mean and standard deviation are then calculated for the waves corresponding to the winds in each speed interval. Finally, every wind observation is counted into its interval, since all intervals are needed for correct probabilities in COMPVL, and a summary of the results is printed.]
[Flowchart: Subroutine FORWRD (simulates a long future period to compare with the predictions). NOVER is the number of future years to be simulated plus the number of years of wave data already available in the joint data. The number of values to examine for each desired recurrence period is found and put into the array IBND, and the NBIG = IBND(1) largest wave values are collected in the sorted WORK array: each simulated WAVE(I), and each wave WVOLD(I,IYR) from the years of joint data, is compared with the current contents of WORK, inserted in its proper place if it is large enough (the other elements being moved one space to make room), and the old WORK(NBIG) discarded. Finally the output values DESWV(I) = WORK(IBND(I)) are selected.]

[Flowchart: Subroutine DESHT (finds the design wave height). PROBVL = DATPD/RETPD, where DATPD is the sampling period in years and RETPD is the return period in years; e.g., a sampling period of 1/100 year and a return period of 50 years give a desired probability of 1/5000. A bisection search is performed, starting from the left boundary BLEFT = 0 and the right boundary BRIGHT = YSTART (the largest observed wave height seen in the joint data by DOSTAT), with YLEV the current estimate of the solution. At each iteration the probability PNOW for the current YLEV is calculated (COMPVL); YLEV is doubled while the solution lies beyond the current right boundary, and the bracket [BLEFT, BRIGHT] is halved otherwise, until PNOW falls between the bounds PROBL and PROBU around PROBVL. If the search fails to converge, a message with diagnostic information is written and the program stops.]

[Flowchart: Subroutine CGMINV (calculates the inverse cumulative Gumbel distribution). If the input probability VALIN is not greater than 0, the result is the larger of OUTMIN and OUTMIN + DEFVL; if VALIN is not less than 1, the result is OUTMAX; otherwise the inverse cumulative Gumbel distribution function is evaluated for VALIN, and the result is kept between OUTMIN and OUTMAX. CFRINV is identical, except for using the inverse Frechet cumulative distribution.]

[Flowchart: Function COMPVL (cumulative probability distribution for the interval methods). For each wind speed interval I containing data, adds NOBS(I) times the cumulative Gumbel probability for that interval, evaluated at the independent variable value YLEV, and finally divides the sum by the total number of observations.]

[Flowchart: Function GUMNC (complement of the normalized Gumbel distribution). Evaluates the tail 1 - exp(-e^(-x)) for large values of the normalized variable XNORM, formed from the input H, the mean HBAR, and the standard deviation S. With ABSEXP = e^(-XNORM) and EXPON = -ABSEXP, the tail is computed from a truncated power series, accumulating PROD = PROD*EXPON/I and SUM = SUM + PROD, with the number of iterations chosen according to the magnitude of ABSEXP; very large values of XNORM (above 150) are treated separately.]
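As a minimal sketch of the COMPVL/DESHT pair, the following Python fragment evaluates the mixture of the interval Gumbel distributions through their tail probabilities and searches for the design height by plain bisection. The function names follow the flowcharts but the code is not the program's FORTRAN; the tolerance and the exact bracketing steps are illustrative assumptions, since the program's own convergence tests are more elaborate. Each interval is represented by a tuple (n_obs, mu, beta).

    import math

    def gumbel_tail(y, mu, beta):
        # Exceedance probability 1 - exp(-exp(-(y - mu)/beta)) of a Gumbel distribution
        z = (y - mu) / beta
        if z < -30.0:
            return 1.0                    # far below the location: exceedance is essentially 1
        return 1.0 - math.exp(-math.exp(-z))

    def compvl(y, intervals):
        # Weighted mixture over the wind speed intervals (n_obs, mu, beta)
        total = sum(n for n, _, _ in intervals)
        return sum(n * gumbel_tail(y, mu, beta) for n, mu, beta in intervals) / total

    def desht(retpd, datpd, ystart, intervals, tol=1e-6):
        # Design height: the level whose exceedance probability is datpd/retpd,
        # e.g. sampling period 1/100 year and return period 50 years -> 1/5000.
        target = datpd / retpd
        left, right = 0.0, ystart
        while compvl(right, intervals) > target:
            right *= 2.0                  # expand the bracket, as DESHT doubles YLEV
        while right - left > tol:
            mid = 0.5 * (left + right)
            if compvl(mid, intervals) > target:
                left = mid                # exceedance still too likely: move up
            else:
                right = mid
        return 0.5 * (left + right)

With intervals fitted as in the earlier sketch, desht(50.0, 0.01, ystart, intervals) returns the height exceeded, on average, once in 50 years under the fitted mixture.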