Close
About
FAQ
Home
Collections
Login
USC Login
Register
0
Selected
Invert selection
Deselect all
Deselect all
Click here to refresh results
Click here to refresh results
USC
/
Digital Library
/
University of Southern California Dissertations and Theses
/
The analysis of circular data
(USC Thesis Other)
The analysis of circular data
PDF
Download
Share
Open document
Flip pages
Contact Us
Contact Us
Copy asset link
Request this asset
Transcript (if available)
Content
THE ANALYSIS OF CIRCULAR DATA
by
Ping Wen
A Thesis Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
MASTER OF SCIENCE
(Statistics)
December 1994
U N IV E R SIT Y O P S O U T H E R N C A L IF O R N IA
TH E GRADUATE SCHOO L
UN IV ER SITY PARK
LO S A N O ELES. CA LIFO R N IA B 0 0 0 7
This thesis, written by
......2-ING--WSN-...............................
under the direction of h+r. Thesis Committee,
and approved by all its members, has been pre
sented to and accepted by the Dean of The
Graduate School, in partial fulfillment of the
requirements for the degree of
Master of Science in Statistics
C .
D tm m
Date November 28, 1994
ffY V f n /fT T T F .F
C k m irm m M
WWLjuU2 UJaX€r^~iy^
D ed ica tio n
This thesis is dedicated to my parents and my husband.
1 1
A cknow ledgm ents
I would like to express my sincerest gratitude to my academic advisor Dr. Simon
Tavare for his valuable advice and careful guidance throughout the process of this
thesis. He provided me relevant materials and spent time helping me with the
difficulties that I encountered. I also would like to thank Dr. Michael Waterman
and Dr. Larry Goldstein for leading their support on my thesis committee. Special
thanks are given to Dr. Liqun Wang for helping me with techniques of S-Plus.
C on ten ts
D edication ii
Acknow ledgm ents iii
List Of Tables v
List O f Figures vi
1 Introduction 1
2 F itting M odels to the W ind D ata 3
2.1 Mean wind d i r e c t i o n ............................................................................................ 4
2.2 The LAR p r o c e s s .................................................................................................. 7
3 C A R M odel for W ind D irections 18
3.1 The von Mises distribution V M ( /i,/c ) ........................................................... 18
3.2 The CAR processes .......................................................................... 19
3.3 Estimation of p aram eters.................................................................................... 20
3.4 Simulations of CAR p ro c e s s e s .......................................................................... 23
3.5 Fitting CAR process for wind d i r e c t i o n s ...................................................... 26
3.5.1 CAR model for mean d ire c tio n s ......................................................... 26
3.5.2 Seasonal m o d e l ......................................................................................... 28
4 CA R M odel for W ind D irections w ith Zeroes 30
4.1 The Occurrence of w i n d .................................................................................... 30
4.2 The GCAR p ro c e s s ............................................................................................... 32
5 Concluding Rem ark and Discussion 35
R eference List 36
A ppendix A
Com puter P r o g r a m s ................................... 38
iv
List O f T ables
2.1 AIC values of the transformed data. They have the minimum value
subtracted from them so the minimum is zero............................................... 14
3.1 Estim ated parameters for 100 replications of 5,000 simulated data of
different order of auto-regression, /i = 4, « = 6, ct| = 0.3 (order 1),
Qi = 0.4 {order 2), a 2 = 0 . 3 ............................................................................... 25
3.2 Variance-covariance matrices for simulated d a t a ......................................... 26
3.3 Estimated parameters for seasonal m o d e l ....................................................... 28
4.1 BIC v a lu e s.................................................................................................................... 31
v
List O f Figures
2.1 Ordinary histogram and angular histogram of the wind d a t a .............. 4
2.2 Time series plot of the mean direction for each day in the 8-year data 6
2.3 Monthly angular histograms for daily mean directions from 1961 to
1968 .......................................................................................................................... 7
2.4 Time series plot of the transformed data, using the trigonometric link
g(9) = t a n ( 0 / 2 ) ..................................................................................................... 9
2.5 Time series plots of parts of data from 1962 and 1963. (a) and (c)
use the trigonometric link. The dates for the jumps are 167 and 110
for 1962 and 1963 respectively, (b) and (d) are the original data from
the same ranges. The value for the wind direction on 167th day of
1962 is 179.6224, and on 110th day of 1963 is 180.1533............................ 11
2.6 One example of how to choose the breaking point of the circle .... 11
2.7 Time series plot of the trigonometrically transformed data without
the “outliers” ........................................................................................................ 12
2.8 Autocorrelation function & partial autocorrelation function of the
transformed d a t a ................................................................................................. 13
2.9 Diagnostic checks of model A R ( 3 ) .................................................................. 15
2.10 Forecast with different transformed data, (a) is the forecast using
trigonometric link; (b) is the forecast using probit link; (c) is the
original data, which are the first fifty observations from January 1,
1969. (a) and (b) are forecasts of these data................................................. 17
3.1 3-D plots of the surfaces of log-likelihood of 500 simulated VM(2,6)
distributed data, (a) is part of (b).................................................................... 24
4.1 Histograms of data on wind directions from 11/1961 to 4/1962. . . . 32
vi
C hapter 1
In trod u ction
Wind data analysis is an interesting topic in the context of both tim e series analysis
and circular data analysis, and is important as well. Wind is a source of energy,
like gasoline, except that wind is a renewable resource. Like waterfall in a den,
wind direction and speed are crucial to a wind farm. The knowledge and prediction
of wind direction and speed in a wind farm can help to manage the wind turbine
generators there, and so derive the most energy from wind. In this paper, a set of
wind direction data is analyzed in a systematic way, incorporating ordinary linear
time series analysis, Markovian analysis and circular data analysis.
The directions of the wind data are modeled with different approaches; one is a
linear auto-regressive process (LAR), and the other one is a circular auto-regressive
process (CAR). The LAR models deal with transformed circular data, and the CAR
model deal with circular data directly. In Chapter 2, we introduce the concept of
mean directions and fit LAR model to them. In Chapter 3, we discussed the CAR
model and its various forms. In Chapter 4, Markov process models are used to
1
describe the occurrence of wind. A variant of CAR model, gapped circular auto
regressive model (GCAR) is fitted to the wind directions with “no wind” as obser
vation. In the last chapter, conclusions of the analysis and further discussions are
given. The relevant code can be found in the Appendix.
2
C hapter 2
F ittin g M od els to th e W ind D ata
The data set consists of 77,155 hourly observations of the wind direction at a mete
orological station at Roche’s Point, Ireland; see Raftery, Haslett and McColl (1982)
and Haslett and Raftery (1989). The data were recorded from 1:00 am on January
1, 1961 till 7:00 pm on October 19, 1969. The original data were recorded as 0 for
no wind, and then in 10 degree units clockwise from due North, for a total of 37
states. The paper uses the data of the first 8 years for analysis and model fitting,
the other data for evaluation and prediction.
Since the observed data tnt, where t denotes time, are of states 0 to 36, they are
not really angles, so we chose to recode the data in the following way:
/
0t =
0 if u > ( = 0
lQw( — 5 otherwise
3
Notice that, aside from the value 0, the 6t are angles having values in {5°, 10°,355°}.
Two representations of these data appear in Figure 2.1.
■ - o H
JS o
0 )
cc
o
c i
100 20 0 300
w ind
270 90
160
Figure 2.1: Ordinary histogram and angular histogram of the wind data
2.1 M ean w ind d irection
Since data are collected on an island, there are usually some windy hours within one
day. By taking the mean direction of the wind for each day, we can eliminate the
problem of having measurements with “no wind” in the data set, and focus on the
simplified overall picture of the wind direction. The reason why having “no wind”
in the data set causes trouble will be given in Chapter 4 .
4
Our sample is of wind directions, i.e. angles. The usual arithmetic mean
n -1 Bi is not appropriate to estimate the mean of those directions here. The
reason is because the sample data are not linear. Consider the sample of three points
365°, 1°,3°. The arithmetic mean is 123°. Actually, 365° is identical to 5° in terms
of angle, and the reasonable mean of the three points is 3°. This is the notorious
cross-over problem. We introduce angular mean direction as follows:
Define
n
C = V] cos 0,,
* = 1
S = ^ sin Bi,
1 = 1
R 2 = C 2 + S 2 (R > 0)
The mean direction ? is given by
tan & = SjC
or
tan l(S jC ) S > 0,C > 0
ta n - l (5 /C ) + n C < 0
ta n _ l(5 /C ) + 2rr 5 < 0, C > 0
5
The quantity R is called resultant length of the vector. R = n~lR is defined to be
mean resultant length. 0t is defined to be mean direction for day t. By this means,
we now have the mean direction for every day. The data set {5(} now consists of
2,922 data points, and is a time series of circular random variables. See Figure 2.2.
1901
1963
1905
1967
1902
\
1964
Y U
f
1966
I W W f l
i j h d
1968
Figure 2.2: Time series plot of the mean direction for each day in the 8-year data
As might have been suspected, there are some inhomogeneities in the distribu
tion of mean directions {#t}. Figure 2.3 shows the inhomogeneities in the form of
circular histograms. However, in the analysis of the next section, the data we use
are transformed and It turns out that inhomogeneities are not a problem.
6
Jan
*
Feb
&
Mai
*
Apt
*
May
f
Jun
*
Jul
*
Aug
*
Sep
*
Oct
♦
Nov
*
Dec
*
Figure 2.3: Monthly angular histograms for daily mean directions from 1961 to 1968
2.2 T he L A R process
To analyze the set of angular data, one approach is to use the existing standard time
series algorithms, which require that the observations should be linear. So the first
step is to transform the angular data into some form of linear data, via some link
function g{9), where
g : (0, 2rr) — ► (-o o , + oc)
7
g(9) needs to be one-to-one. Several possible choices of the link function have been
discussed. See Fisher (1993) for example. Basically, there are two simple forms:
1. trigonometric link:
< 7(0) = tan(0/2) 0 < 9 < 2n
2. probit link:
The first link function is a straightforward and simple connection between an
gles and real numbers. The second link function takes the advantage of $ (x ), the
standard normal distribution. In this paper we use the trigonometric link function
wherever a link function is needed.
Using the link function, the circular series {?(} is then transformed to a linked
linear series {X (}. If the process {X(} is an AR(p) process, then {0(} is called a link
autoregressive (LAR) process. The advantages of the link linear process are
1. the circular tim e series problem can be solved in the scope of well estab
lished ordinary time series theory;
2. the parameters are easily estimated by using standard software.
Figure 2.4 shows tim e series plots of the transformed mean wind directions. The
data axe smooth and oscillating about zero, except for a few huge jumps. These
8
19*1 1962
8
1
I
f
8
0 1M 200
1963
300 0 too 200
1964
300
8
. 1 . ^ .
8
8
T
Q 100 200
1965
300 0 too 200
1966
300
j — . I
8
1 M ^
8
8
........................ !
9 100 200
1967
300 0 V0 2 H O
1968
300
_ 1 .
8 ,
" I
1 r
8
8
Figure 2.4: Time series plot of the transformed data, using the trigonometric link
g{$) = tan(0/ 2)
jum ps might be considered outliers. In a one dimensional data set an outlier is a
point whose value is too large or too small. Identifying an outlier is not an easy job,
because first of all “too large” or “too small” is not a clear and precise standard;
secondly, it can be that the strange point is just the result of some transformation
that we apply on the original data set, but not that the observation is abnormal
itself. W hat happened here is exactly the second case. Figure 2.5 is the time series
plots of parts of data of year 1962 and 1963 containing two huge jumps. The two
9
unusual points are not really outliers, but large values due to the fact that as 9 gets
close to ir, tan(0/2) tends to infinity with increasing rate. Any link function is a
mapping of a circle to the real line, it has to break the circle somewhere on the
circle. Figure 2.6 shows an example in pictorial form. If the link function is chosen
to break the circle at a point, then points that were in a small neighborhood of that
point fall further apart on the real line, even though they meant to be close. So one
can choose the point in such a way that after the transformation close points stay
close, and far apart points stay far apart. For example, for a data set ranging (a,b)
with c as the midpoint, we can use a link function that breaks the circle at point
c — 180, like ta n ((0 — c — 180)/2).
If we were to treat the large jum ps as “outliers", there are two approaches we
can use. One is that we replace the outliers with the sample mean; the other one is
that we replace them with mean of the same day in different years. We consider the
outliers as
Outlier = (0 |tan (0 /2 ) < —100 or tan(0/2) > 100}
There are 12 “outliers” in the 8 year data set, and the differences between the two
approaches are not significant in the analysis. So here, we use the first approach to
handle the outliers problem. See Figure 2.7 for time series plots of the data without
the “outliers” .
10
1962 1963
160 163167 170 175 180 100 105 110 115 120
(c)
1963 1962
Figure 2.5: Time series plots of parts of data from 1962 and 1963. (a) and (c) use
the trigonometric link. The dates for the jum ps are 167 and 110 for 1962 and 1963
respectively, (b) and (d) are the original data from the same ranges. The value for
the wind direction on 167th day of 1962 is 179.6224, and on 110th day of 1963 is
180.1533.
c
Figure 2.6: One example of how to choose the breaking point of the circle
11
1961
--
a iso kb no
1963
l l L 1
1962
3
R
o
f ?
¥
no 3 0 0
1964
U ~
1965
* * * p b-
1966
8
9
£
O
¥
1967 1966
r ~ r
Figure 2.7: Time series plot of the trigonometrically transformed data without the
“outliers”
Consider the tim e series {Af}, an AR{p) process determined by
Xt — OriAi-j + Q j A ( _ 2 T ... + QpJVf-p -j- f(
where £( is a white noise process with zero mean and finite variance c r There are
several methods for determining the order p of the model. Upon examination of the
autocorrelation function and partial autocorrelation function, A/?{3) appears to be
an appropriate model. See Figure 2.8. It is confirmed by the Akaike’s Information
12
20
Figure 2.8: Autocorrelation function & : partial autocorrelation function of the trans
formed data
Criterion (AIC). AIC is a measure of goodness-of-fit considering the number of
parameters estimated as well as the reduction of estimated error variance. See
Shumway (1988).
For the present case of an order k model, this criterion can be written as
AIC(k) = n lo g (d ^) + 2k,
where
° l k = n_1
(=i
and it(k) is the estimated residuals from a model with k parameters estimated.
If the series is an AR process, then an estimate of the order of the model is the
value k that minimizes AIC(Ar). Table 2.1 gives the AIC values for the AR model.
13
k AIC
0 10.246
1 11.593
2 5.688
3 0.000
4 1.348
5 3.301
6 5.098
7 6.676
8 5.680
9 7.090
Table 2.1: AIC values of the transformed data. They have the m inim um value
subtracted from them so the minimum is zero.
The order is estim ated to be 3. The fitted model using the maximum likelihood
method is
X t = —0.0070Xt_i + 0.0593JV,_2 + 0.0566JV,_;, + i t
The variance-covariance m atrix is
ar(l)
ar(l) 3.417216e-04
ar(2) 1.244851e-06
ar(3) -2.017760e-05
ar(2)
1.244851e-06
3.405347e-04
1.244851e-06
ar(3)
-2.017760e-05
1.24485 le-06
3.417216e-04
The estimated a , ’s give us the relation between lagged tr a n s f o r m e d wind direc
tions. They do not have actual meaning here because there is no physical meaning
for the {At}. We can say nothing beyond that the transformed wind directions have
a negative relation at lag 1, and stronger positive relation at lag 2 and lag 3.
14
Since qj + q 2 + q 3 < 1, and the diagonal elements of the variance coefficient
matrix are reasonably close, the series is considered to be stationary, just as the
model requires. See Shumway (1988).
Plot of Stind«fdto*d Rwldualt
tMO l$M
ACF Plot of Residuals
a
b
o
a
a
o
0 2 0 J O
P-value for Goodness of Fit Statistic
Figure 2.9: Diagnostic checks of model AR(3)
To evaluate the performance of the model, in addition to the study of the residuals
themselves and the autocorrelations of them, we introduce the portmanteau test
statistic Q
Q = « £ 7j?
k=l
15
where 7* . is the autocorrelations of the residuals £t, K is a fixed maximum number
of lags, and n is the number of observations used to compute the likelihood. See
Box and Jenkins (1976). Typically, K should be between 10 and 20. If the correct
ARIMA model is fit, and the data are Gaussian, then Q is approximately distributed
as a x 2 random variable on K — r degrees of freedom, where r is the number of
parameters fit to the model. In Figure 2.9 the p-values for the Q statistics are
plotted using a x 2 distribution with the appropriate degrees of freedom. The AR( J)
model seems to behave well under the diagnostic measurements.
One of the most important steps in the model fitting in this case is forecasting.
Here all forecasts are based on the information available from the mean direction
for each day between the observed period, except the last day of leap years. See
Figure 2.10. For forecast based on trigonometric transformed data, the first six
forecasts are meandering below the mean, but they flatten out quickly at the level
of the mean direction. For forecasts based on probit linked data, the first ten or so
forecasts are above 180°, and quickly curve down to level of 180°. This explains why
we prefer trigonometric transformation here. These may come as a disappointment,
for it is natural to expect that good forecast would bounce up and down to capture
the ups and downs that we know the future will bring. In general, however, this is
not possible. We can hope to capture only the first couple of bounces, and those
imperfectly. It is a limitation of the nature of the world, not of the statistical
techniques we must use. So it is sensible to look at only the short term forecast
16
the models produce. Comparing the forecasts of both transformed data with the
original wind data, there is evidence that the trigonometric link yelled better result.
In the next chapter, we make more explicit use of a model for the mean directions
{0(} themselves, rather than a transformation of them.
0 10 96 V) 40 50
Figure 2.10: Forecast with different transformed data, (a) is the forecast using
trigonometric link; (b) is the forecast using probit link; (c) is the original data,
which are the first fifty observations from January 1, 1969. (a) and (b) are forecasts
of these data.
17
C hapter 3
C A R M od el for W ind D irection s
In this chapter, we work with CAR models, which based 0 1 1 wrapping a linear au
toregressive process around the circle. CAR process is part of data modeling, but it
is sufficiently im portant to be treated separately in a chapter.
In the first few sections, we gave brief introduction to a von Mises distribution,
a CAR model, and we talk about parameter estimation. A simulation algorithm
is used as an assessment of models. In Section 3.5 we fit CAR model for mean
directions obtained from Chapter 2.
3.1 T h e von M ises d istrib u tion VM(/i,«)
The von Mises distribution is the analogue on the circle of the Normal distribution
on the real line. It is a symmetric unimodal distribution, and is the most common
model for unimodal samples of circular data.
18
The probability density function of VM(/j , k) is
f(B) = [2tt/0(k)] 1 exp[«cos(0 — ^)] O < 0 < 2 t t , O < k < o o
The distribution function is
ts
F(0) = [27t/0(«)] 1 / exp[« cos{0 —/i)]d0
Jo
where (i is mean direction described in Section 2.1, and k is the concentration pa
rameter. The bigger the value of k, the more concentrated the sample is. As k — > 0,
the distribution converges to the uniform distribution on the circle and as k — > oo,
the distribution tends to the point distribution concentrated in the direction fi. Iq(k)
is the modified Bessel function of order zero, and has the expression
tin
I0(k) = (27t)_i I exp[« cos( — n)\d p
(3.1)
Pt t < p
For t < p, we assume 6t are independent V M (p(,*:() random variables, g is a link
function. Here link functions are used in such a way that the transformed angles
follow an ordinary AR(p) model.
CAR models are AR models wrapped around the circle; consequently, both the
correlation and the variance of the underlying linear models affect the appearance
of the circular models.
3.3 E stim a tio n o f param eters
The param eters are estimated by maximizing the likelihood function
/ (© ) = I\ u m ~ mt) (3.2)
20
In practice the /if and Kt must be parameterized, for example to reflect possible
seasonal components. We begin by assuming /jt = /*, Here the log-likelihood
comes in more handy, due to the special expression of the density function. For
independent and identically distributed observations, (3.2) becomes
so that
lo g /( 0 ) = log{[2fl7o(«)]- n exp[/<£" cos(0, - p,)]}
= — n log /o{«) + E" k cos(0( — jj,) — n log 2ir
Differentiating once with respect to // and k gives
= «E?sin (6t - fi)
= «E"(sin 6t cos/i + sin/i cos 0t)
= /((cos/iEJ1 sin9t ■ + ■ sin/iE” cos0t)
and
„ ^ I + E?cos(0( - / i )
/o(«)
/ (k)
n 1 + cos /tE” cos 0t + sin /iE* sin 6t
/o{«)
dlog(f)
8 k
21
setting them equate to 0, we get
tan(j/) = sin cos 9t
and
W R , a ■ n,
. . . = —(cos u cos 9 4- sin u. sin 9)
/o(«) «
The maximum likelihood estimates are therefore
and
« is the solution of Ai(k) = R.
where 6 is the mean direction, R is the mean resultant length R/n, >4i(a*) =
h { x ) f I q ( x ) , and / j(x) is the modified Bessel function of order 1.
For autocorrelated observations, there is no explicit simplification of (3.2). We
wrote a Fortran program to maximize the log-likelihood numerically, using Powell’s
method ( cf. W. H. Press, S. A. Teukolsky, W. T. Vetterling, B. P. Flannery 1993
). The program is given in the Appendix. One of the key issues in using such
an algorithm effectively is to determine sensible initial values for the parameter
estimates. We examine this in the next section.
22
3.4 S im u lation s o f C A R p rocesses
Unusual problems occurred when we applied the param eter estimating procedure
to the data on wind directions. The program using Powell’s method didn’t work
out well in finding the m axim um point of the log-likelihood. Each replication of
the modeling with different start positions resulted in a different maximum, which
was an obvious pitfall. One potential reason for this is that the surface of the log-
likelihood has many local maxima and their values are close. Simulations of iid
k) observations indicated that this is indeed the reason. Figure 3.1 is the
3-D plot of the log-likelihood of simulated V'A/(2,6) observations; the periodicity of
the ups and downs is obvious. W ith simulated data, the problem still exists, i.e.
different start points give us different final results. However if the initial values of
parameters are reasonably close to the actual values, the estimates are good. From
the simulations, it is evident that due to the nature of the von Mises distribution,
good initial param eter estimates are crucial in obtaining acceptable final results.
In a CAR process, for each observation 0t, with mean (3.1), initial estimates are
needed for k, a 1(..., a p. We have examined one simple method for producing such
initial estimates. They can be obtained following the procedure below:
1. Com pute preliminary estimates of y, and k
ft = Q ,
23
<«>
(t>)
Figure 3.1: 3-D plots of the surfaces of log-likelihood of 500 simulated VM(2,6)
distributed data, (a) is part of (b).
k = V ( / ? ) .
2. Transform the data via X t — g(0t — ft)
so that
9(& t - A) = - A) + ••• + Gpdidt-p - ft)
3. Use algorithms of standard time series analysis to estimate the coefficients
d i ,..., qp. We used a Fortran implementation of an algorithm of N. Andersen
(1974).
24
To assess the adequacy of this method, we used a simulation approach. Simu
lations of different combinations of parameters were carried out following the steps
indicated below:
Step 1. Specify n
Step 2 . Simulate n V M (m (, k) data, where m, is given in (3.1)
Step 3. Estimate initial parameters
Step 4. Run the program using Powell’s method, and obtain final results.
Three sets of simulations were done. Each consists of 100 runs of simulations of
5,000 data points. All sets of simulated data have the same p and k, but different
autoregressive orders; order 0, 1 and 2. The results are shown in Table 3.1
order p
£
k Q, d 2
0 3.9995 5.9905 - -
1 3.9990 6.0198 0.3020 -
2 3.9993 5.9968 0.4003 0.2991
Table 3.1: Estimated parameters for 100 replications of 5,000 simulated data of
different order of auto-regression, p = 4, k = 6, ou = 0.3 (order 1), Qi = 0.4 (order
2), a 2 = 0.3
The variance-covariance matrices for order 0, 1 and 2 are given in Table 3.2.
The results, and other simulations not reported here show that the parameter
estimation algorithm works well with simulated circular data.
25
p - 0 p K
ft 4.751471e-05 -0.0002118327
H -2.118327e-04 0.0130349212
p = 1 P K Q!
P
7.928724e-05 -1.642667e-04 6.480869e-06
K -1.642667e-04 1.271859e-02 -3.479517e-05
a i
6.480869e-06 -3.479517e-05 1.394758e-04
p = 2 p K Qi a 2
P
5.537744e-04 -2.993853e-04 6.939920e-06 -3.102682e-06
K -2.993853e-04 1.469394e-02 -1.412325e-04 7.740023e-05
6.939920e-06 -1.412325e-04 1.400885e-04 -6.381203e-05
a 2 -3.102682e-06 7.740023e-05 -6.381203e-05 1.456053e-04
Table 3.2: Variance-covariance matrices for simulated data
3.5 F ittin g C A R process for w ind d irection s
3.5.1 CAR m odel for mean directions
In Chapter 2, we converted observations of wind direction into mean directions for
each day, and fitted them with a LAR model. It will be interesting to see how well
a CAR model can do in this case. We fitted models of order p = 0, 1, 2 and 3. The
estimated log-likelihoods are
P
0 1 2 3
log-Hk (lp) -5165.88 -4081.62 -4079.14 -4079.64
A formal x a °f the likelihood ratio can be used to determine the order p.
In comparison of two models with log-likelihood I0 i) and where ft is the
26
estimates of parameters, the hypothesis is that the extra parameter in the model
with /?,•+1 is zero. The likelihood ratio test statistic (LRTS) for this hypothesis is
A = - 2 { / ( A ) - / ( A +i)}
If the hypothesis is true, then the log-likelihood of the model with one more pa
rameter will not be much larger than that of the reduced model, and so A will be
small. If the hypothesis is false, the A will be large. Critical values for A are based
on its asymptotic distribution when the null hypothesis is true, which is Xi- We
calculated the A values for order 0, order 1 and order 3, they are 2168.52, 4.95, 1.00
respectively. Xi o.os = 3-84, Xi.o.ojs = 5.02. With 97.5% confidence we consider order
1 is sufficient.
For the “best” model, its starting values for parameters are obtained as
= 241.68°, «o = 0.54, a 0 = 0.50.
This leads to the final estimates
ft = 253.05°, k = 1.62, a = 0.78.
Estimated parameters are consistent with what we would have thought of the
data by studying the distribution of them, which is indicated in Figure 2.1. The mean
27
is between 180° and 270°, and data are quite disperse, indicating low concentration.
Estimations of future mean directions can be obtained from (3.1).
3.5.2 Seasonal m odel
We also fitted a seasonal model to the data. This corresponds to 12 different pa
rameters for the p, and the k (, one for each month. The A can be used to determine
the order p. Following the same test as discussed in the previous section. We found
that the order for the seasonal model is again 1. the coefficient o is 0.77, and the
estimates for pt and t = 1,2,..., 12 are shown in Table 3.3 below.
Month p(in degrees) k
Jan 237.59 1.89
Feb 266.89 2.13
Mar 275.31 1.85
Apr 224.87 1.54
May 259.06 1.49
Jun 235.25 1.52
Jul 241.43 1.22
Aug 266.75 1.75
Sep 235.62 1.30
Oct 229.37 1.72
Nov 258.81 1.72
Dec 288.46 1.63
Table 3.3: Estimated parameters for seasonal model
Different months have differed mean direction and concentration, even though
not by very much. The estimations are consistent with what is indicated by Figure
28
2.3. For example the estimates shows that wind directions in February has a higher
concentration factor, and it is evident in Figure 2.3.
We note that the likelihood ratio test can also be used to test whether the seasonal
model is really needed. The log-likehood of the non-seasonal model is -4081.62, and
the log-likehood of the seasonal one is -4055.94 with an increase of 22 degrees of
freedom. So A = 51.4. Since X3 2[005 = 32.67, and y23io.o2 5 = 36.78, we conclude that
the seasonal model produces a significantly better fit.
In this chapter, we fit different CAR models to daily mean directions. In the next
chapter, we return to the original data set, which has zeros as observation, and fit
the data with a variation of CAR model: GCAR ( gapped circular auto-regressive)
model.
29
C h ap ter 4
C A R M od el for W ind D irection s w ith Z eroes
We return to the original observations of wind directions at Roche’s Point, Ireland.
Upon studying the histograms of the distributions of wind directions for separate
months, some inhomogeneities were revealed, but the distributions are quite similar
for the m onths of November through April, so we chose the wind directions of this
period to be modeled in the following analysis. See Raftery and Tavare (1994).
4.1 T h e O ccurrence o f w ind
There are hours in a day when there is no wind on the island. Figure 4.1 shows
that the state of no wind is quite prominent compared to other states of the wind
directions, and we need to take this into account. Here a two-state Markov chain
analysis is presented to the study of occurrence of wind. The states are labeled
“windy” and “no wind”.
30
Let
/(<) =
1 if hour t is windy
0 if hour t is no wind
P [ /( f ) |/( f — 1), I{t — 2),...] is of interest. We model {/(#), t > 0} as a Markov
chain of order p. Estimation of p is done using a Bayesian Information Criterion
(BIC) approach, as in Raftery and Tavare (1994). The result is shown in Table 4.1.
order p BIC
0 1169.28
1 824.08
2 802.28
3 825.62
4 882.22
5 985.982
Table 4.1: BIC values
Table 4.1 gives the BIC values for the Markovian model of different order. The
order is estim ated to be 2, i.e.
P [/(t)|/(< - ! ) ,/( ( - 2),...} = P [/(f)|/< ( - 1). I ft - 2)]
The occurrence of wind on one day depends on whether there is wind the previous
two days. We model the wind directions treating the observed {/(f)} as given.
31
4.2 T h e G C A R p rocess
Figure 4.1: Histograms of data on wind directions from 11/1961 to 4/1962.
The data analyzed here come from the period November 1961 to April 1962. After
being recoded in the way described in Chapter 2, we have a total of 4,344 consecutive
hourly observations. See Figure 4.1.
As we have discussed earlier, we need to recode the data on wind directions, but
recoding causes a problem. 0t's are representing angles except when 6t = 0, which
represents “no wind” instead of 0°. All the #(’s are not the same type, angles or
32
state of no wind, therefore an inconsistency may occur if we treat them as the same.
Simple circular tim e series analysis can not be applied here. We can not simply
delete them from the data set either; they are not outliers, and they are not missing
observations. To solve this problem, we think that the zeroes or the state of no wind
somehow represent the end of a cycle of wind directions and the start of another one,
subsequently, we modify the CAR models to take account of this factor. So for wind
direction data with zeroes, we define a variation of a CAR model: GCAR (gapped
circular autoregressive) model. In a GCAR model, the order of auto-regression is
specified: p. The conditional distribution of 8t given 0(_ i,..., 0,_p is still V M (m (,«:),
but the m t is defined differently from a CAR model as
p + “ /*) + ••• + “ /i)l i f # f _ i , . . . , 0 e_ p > 0
m < = < p + 0 -1[aig(0,_i - p ) + ... + a,s(0 ,_ , - p)] if 8t- i , ..., 0,_, > O,0(_fl_i = 0
p if 0t_i = 0
(4.1)
(4.1) is defined in such a way that there can be various autoregressive order in a
CAR model; it is either p or the number q of previous observations before reaching
p until a zero is encountered. Note that q < p.
We use maximum likelihood method to estim ate parameters. Again we encounter
the same problem as we did before; getting a good starting estim ate of parameters.
Recall that what we did to solve it was to use the param eter estimation procedure
(given in Section 3.4). Unfortunately, because there are zeroes in our observations,
33
the procedure can not solve the entire problem here. Precisely, time series algorithms
can not be used to identify the order of regression. But we can obtain estimates of p
and k, and from the previous section we know the occurrence of zero is of order two,
thus the autoregressive order of the data that we are interested in should not differ
from it much. Since a takes value between 0 and 1, which is a small range, what we
can do is to try different orders and a as initial parameters, and run the program
to find maximum likelihood estimates. It is a sensible approach t hat we can take in
this situation. We used the procedure given in Section 3.4 to get the estimates of fi
and *, and we tried different values for a. We obtained initial parameters for p — 1
as
Po = 265.96°, k0 = 0.26, d 0 = 0.70
The final estimates are
ft - 288.79°, k = 8.32, a = 0.97
34
C h ap ter 5
C on clu d in g R em ark and D iscu ssio n
This paper shows how to fit models to wind direction data. Two types of models are
fitted to the data: LAR model and CAR model. The first one is used in the context
of mean directions for each day, and it turned out that linearly transformed mean
direction for a day depends on the transformed mean directions up to three days ago.
There is not much physical meaning in it, however it can be used to predict future
values. The CAR model is used directly with circular data, here wind directions.
Mean directions for each day were modeled, and they have the mean about 253° with
small concentration. The mean direction of one day depends largely on the mean
direction the day before. Secondly, we deal with wind directions having “ no wind”
as observation. A Markov chain model is fitted to decide the stochastic structure
of the occurrence of wind. It is found that whether there is wind at a certain time
depends on whether there was wind both one and two hours ago. A variant CAR
model, the GCAR model, is used to fit this kind of data. The data that have been
used here are part of the whole data set: November 1, 1961 to April 1962. In this
35
period, wind directions have a mean of 289° with relatively higher concentration.
Wind direction at one tim e depends on the wind direction one hour ago, provided
that it was windy an hour ago.
In this paper, we deal with (1) non-linear or circular data, (2) data of angle
and data of “no angle”, (3) fitting standard model to non-standard data, (4) fitting
non-standard model to non-standard data. Nevertheless, we have not overcome
all the difficulties that arise during the processes. Some of them are: what is the
theoretically correct prescription for finding the starting point? Is the GCAR model
really a sensible model to solve the “zero” problem? W hat is the best way to test
goodness-of-fit of these models? Further study is needed.
Our code was written in Fortran and is listed in the Appendix,
36
R eference List
[1] Andersen, N. (1974) Geophysics 39, 69-72.
[2] Best, D.J. and Fisher, N.I. (1979) Efficient simulation of the von Mises distir-
bution. Appl. Statist. 28, 152-157.
[3] Box, G. E. P. and Jenkins, G. M. (1976) Time series analysis: forecasting and
control. Oakland, Calif.:Holden-Day.
[4] Fisher, N. I. (1993) Statistical analysis of circular data. Cambridge University
Press.
[5] Fisher, N. I. and Lee, A. J. (1992) Regression models for an angular response.
Biometrics. 48, 665-677.
[6] Fisher, N. I. and Lee, A. J. (1994) Time series analysis of circular data. J.R.
Statist. Soc. B 56, 327-339.
[7] Haslett, J. and Raftery, A. E. (1989) Space-time modelling with long-memory
dependence: assessing Ireland’s wind power resource (with discussion). Appl.
Statist. 38, 1-50.
[8] Press, W. H., Teukolsky, S. A., Vetterling, W. T. and Flannery, B. P. (1992)
Numerical recipes in FORTRAN. Cambridge University Press.
[9] Raftery, A. E., Haslett, J. and McColl, E. (1982) Wind power: a space-time
process? In Ttme Series Analysis: Theory and Practice (ed. O.D. Anderson),
pp. 191-202. Amsterdam: North-Holland.
[10] Raftery, A. and Tavare S. (1994) Estimation and modelling repeated patterns
in high order Markov chains with the mixture transition distribution (MTD)
model. Appl. Statist. 43, 179-199.
[11] Shumway, R. H. (1988) Applied statistical time series analysis. Englewood Cliffs,
NJ.
[12] Statistical Sciences, Inc. S-PLUS User’s Manual, Version 3.0, Seattle: Statistical
Sciences, Inc., 1992
37
A p p en d ix A
C om p u ter P rogram s
1. Fortran program to fit a CAR model to circular data
c program to f i t circular A R process to wind direction data,
c June 19, 1994.
c double precision version.
c-------------------- ----------------------------------------------------------------------------------
c
c nxdim ■ p.max + 2
Parameter (NPT * 5000)
Parameter (NXDIM - 10)
Parameter (PI - 3 . 141592654D0)
Implicit Double Precision (A-H.O-Z)
Com m on /PARAM/ LAG, NN, N
Common /TH/ THETA(NPT)
Dimension X(NXDIM), DVAL(NXDIM )
Dimension Y(NXDIM), XI(NXDIM,NXDIM)
Character*64 INFILE, OUTFILE
TW O PI - 2.0 * PI
c Write (6,*) 'Enter name of direction data f i l e '
c Read ( 5 ,'(A) ') INFILE
c Open (Unit*25,File*INFILE,Status*'unknown')
Open (Unit*25,File*’test.dat'.Status*'unknown')
c Write (6,*) 'Enter name of in itia l estimates f i l e '
c Read ( 5 , '(A) ') INFILE
c Open (Unit*24,File*INFILE,Status*’unknown')
Open (Unit*24,File*'infile'.Status*'unknown')
38
c Writs (6,*) ’Enter name of output f i l e ’
c Read ( 6 , ’ (A)’) OUTFILE
c Open (Unit-26,File-QUTFILE,Status-’unknown’)
Open (Unit-26, F ile -’o u tfile ’ .Status-'unknown’)
c
c read in the d a ta ....
c N N is the number of data points in the data sequence
c The data should be in one column, with values between
c 0 and 360. Program converts to radians
c
N N - 0
10 N N - N N + 1
Read (25,*,End-20) TH V A L
THETA(NN) - TW O PI * TH V A L / 360.0
G o To 10
20 Write (26,5800) INFILE
N N - N N - 1
Write (6,+) 'Number of data points - ’ , N N
Write (26,*) ’Number of data points - 1, N N
c
C This is the loop for making multiple runs
c
30 Write (6,*) ’New run.. . ’
Write (26,*) ’***************************+***+
Write (26,*) ’New run.. . ’
c
Write (6,*) ’W hat is the lag you want to use?’
Read (5,*) L A G
Write (26,*) ’Lag - ', L A G
N • L A G + 2
c
c Set up optimizer
c
C y - in itia l guess at * vector
Read (24,*) (Y(I),I - 1,N)
Rewind 24
Write (26,5100) Y(1)
Write (26,5200) Y(2)
Do 40 I - 3, N
39
Write (26,5900) I - 2, Y(I)
40 Continue
Do 50 I - 3, N
X(I) - Y(I)
50 Continue
X(l) - L0G(Y(1)/(TW0PI-Y(1)))
X(2) - L0G(Y(2))
FTO L - 1.0E-9
ITER • 20
c
c set up info for Povell method
FRET - FUNC(X)
Write (6,5300) -FRET
Write (26,5600) -FRET
60 Do 80 I - 1, N
Do 70 J - 1, N
XI(I,J) - 0.0D0
If (I.EQ.J) Then
XI(I,J) - 1.0DO
End If
70 Continue
80 Continue
Call POW ELL(X,XI,N,NXDIM ,FTOL, ITER,FRET)
c
c output results
c
Write (6,*)
Write (26,+)
Write (6,*) ’Derivatives at the solution are:’
Write (26,+) ’Derivatives at the solution are:'
Call DFUNC(X,DVAL)
Do 90 I » 1, N
Write (26,5000) DVAL(I)
Write (6,5000) DVAL(I)
90 Continue
40
Do 100 I - 3, N
Y(I) - X(I)
100 Continue
Y(l) - TW O PI * EXP(X(1)) / (1 .0+EXP(X(l)))
Y(2) » EXP(X(2))
Write (6,*) 'maximum lag - ', L A G
Write (6,5700) -FRET
Write (26,5700) -FRET
Y Y » 360.0 * Y(l) / TW O PI
Write (6,5400) Y (l), Y Y
Write (26,5400) Y(l), Y Y
Write (6,5500) Y(2)
Write (26,5500) Y(2)
Do 110 I - 3, N
Write (6,5900) 1 - 2 , Y(I)
Write (26,5900) 1 - 2 , Y(I)
110 Continue
Write (6,*) 'Do you want to restart from here? '
Write (6,*) 'Enter 1 for y e s’
Read (5,*) IG0
If (IG0.EQ.1) G o To 60
Write (6,*)
Write (6,*)
Read (5,*)
If (IG0.EQ.
5000 Format (IX,
5100 Format (IX,
5200 Format (IX,
5300 Format (IX,
5400 Format (IX,
5500 Format (IX,
5600 Format (IX,
5700 Format (IX,
5800 Format (IX,
5900 Format (IX,
'Do you want another run?'
'Enter 1 for y es’
IG D
1) G o To 30
F10.6)
'In itia l value of m u * '.F7.4)
'In itia l value of kappa « ’ .F8.4)
'In itia l value of log-likelihood * ’ ,F12.4)
'final value of m u * ’ ,F 7 .4 ,’ (',F 6 .2 ,' d)')
'Final value of kappa ■ ',F8.4)
'In itia l value of log-likelihood * ’ .F12.4/)
'Final value of log-likelihood * ’ .F12.4/)
'Direction data f i l e is ' ,A64)
'alpha(',1 2 ,’) » ' .F10.5)
End
41
c-------------------------------------------------------------------------------------
c
Function FUNC(X)
Parameter (NPT » 5000)
Parameter (PI - 3 . 141592654D0)
Implicit Double Precision (A-H.O-Z)
Com m on /PARAM / LAG, NN, N
Com m on /TH/ THETA(NPT)
Dimension X(N)
XLIK - 0 . 0
NBESS - 0
Do 10 I - 1, N N
XLIK ADD - ADD O N (I,X)
c
c add term to log-likelihood
c
XLIK - XLIK + XLIK ADD
10 Continue
c
c add log of Bessel function part
c
XLIK » XLIK - N N * L0G(2.0*PI*BESSI0(EXP(X(2))))
c
c powell is a minimiser, so multiply by -1
c
FU N C « -XLIK
Return
End
c
c-------------------------------------------------------------------------------------
c
Function ADDON(ILOC.X)
Parameter (NPT ■ 5000)
Parameter (PI - 3 . 141592654D0)
Implicit Double Precision (A-H.O-Z)
Com m on /PARAM/ LAG, NN, N
Dimension X(N)
Com m on /TH/ THETA(NPT)
42
c use tan link. Could be modified
c
c this returns the log of the density of the von Mises
c distribution, without the bessel function part
c
G1(Y) - TAN(Y/2.0)
G(Y) - 2.0 * ATAN(Y)
X M U - 2.0 + PI * EXPCX(l)) / (1 .0+EXPCXCl)))
XK APPA - EXPCXC2))
JLA G - 0
If (ILOC.GT.LAG) JLAG - L A G
If (JLAG.Eq.O) Then
SU M - THETA(ILOC) - XMU
ADDON - XKAPPA * COS(SUM)
Return
End If
SU M - 0.0
Do 10 I - 1, JLA G
SU M - SU M + XCI+2) * G1( (THETA(ILOC-I)-XMU))
10 Continue
SU M - G(SUM )
SU M - THETA(ILOC) - X M U - SU M
A D D O N - XK APPA * COS(SUM )
Return
End
c
c-------------------------------------------------------------------------------------------------
c
Subroutine DFUNC(X,DVAL)
c Compute derivatives of log-likelihood function
Parameter (NPT * 5000)
Parameter (NXDIM * 10)
Parameter (PI ■ 3 . 141592654D0)
Implicit Double Precision (A-H,0-Z)
43
Common /PARAM/ LAG, NN, N
Common /TH/ THETA(NPT)
Dimension DERIV(NXDIM)
Dimansion X(N), DVAL(N)
Do 10 I - 1. N
DVAL(I) - 0.0
10 Continua
NBESS - 0
TW O PI ■ 2.0 * PI
X M U - TW O PI * EXPCX(l)) / (1 .0+EXP(X(l)))
XK APPA - EXP(X(2))
Do 30 I ■ 1, N N
Call DIFF( I ,X,DERIV,THETAT)
add terms to derivatives
DVAL(l) - DVAL(l) - X K APPA + SIN(THETAT) * DERIV(l)
DVALC2) ■ DVALC2) + COS(THETAT)
If ( I .GT.LAG.AND.LAG.NE.O) Then
Do 20 J - 1, L A G
DVAL(J+2) - DVAL(J+2)-XKAPPA+SIN(THETAT)*DERIV(J+2)
20 Continue
End If
30 Continue
add derivative of Bessel function part
DVAL(2) » DVAL(2) - N N ♦ BESSI1(XKAPPA) / BESSIO(XKAPPA)
povell is a minimiser, so multiply by -1
Do 40 I - 1, N
DVAL(I) - -DVAL(I)
40 Continue
now adjust derivatives with respect to m u and kappa
to allov for reparameterizing variables
DVAL(l) - DVAL(1) * X M U * (1 .O-XM U/TW DPI)
DVAL(2) « DVAL(2) ♦ XK A PPA
c
Return
End
c
c-------------------------------------------------------------------------------------------------------
c
Subroutine DIFF( ILDC,X,DERIV.THETAT)
c compute derivatives of log-likelihood
Parameter (NPT ■ 5000)
Parameter (PI ■ 3 . 141592654D0)
Implicit Double Precision (A-H.Q-Z)
Common /PARAM/ LAG, NN, N
Dimension X(N), DERIV(N)
Common /TH/ THETA(NPT)
c
c use tan link. Could be modified
c
c th is returns the derivatives of log of the density of the
c von Mises distribution, without the bessel function part
c
G1(Y) - TAN(Y/2.0)
G(Y) - 2.0 * ATAN(Y)
G2(Y) - 1.0 / (C0S(Y/2.0)**2)
X M U - 2.0 * PI + EXP(XCl)) / (1.0+EXP(X(1)))
Do 10 I - 1, N
DERIV(I) - 0 . 0
10 Continue
JLAG - 0
If (IL0C.GT.LAG) JLAG - L A G
c
c compute theta~"(iloc)
c
THETAT - THETA(ILOC) - X M U
If (JLAG.NE.O) Then
SU M » 0.0
45
Do 20 I - 1, JLA G
SU M - SU M + X(I+2) * Gl(THETAClLOC-I)-XMU)
20 Continue
TH ETA A - SU M
TH ETAT - TH ETAT - G(SUM )
End If
c
c calculate d(theta"~)/d m u
c
DERIV(l) ■ -1.0
If (JLAG.NE.O) Then
SU M - 0.0
Do 30 I - 1, JL A G
SU M « SU M + XCI+2) + G2(THETA(I LOC-I ) -XMU)
30 Continue
DERIV(l) - DERIV(l) + SU M / (1 .0+THETAA++2)
c
c calculate d(theta“~)/ d(alpha_l)
c
Do 40 I ■ 1, JLA G
DERIV(I+2) * - 2 .0*G1(THETA(ILOC-I)-XMU)/ ( 1 . 0+THETAA*+2)
40 Continue
End If
Return
End
c
c------------------------------------------------------------------------------------------------------
c
c All Numerical Recipes used below need to be Double Precision
c
c-------------------------------------------------------------------------------------------------------
c
Function BESSIl(X)
C (C) Copr. 1986-92 Numerical Recipes Software 53#]v.
c
c-------------------------------------------------------------------------------------------------------
c
Function BESSIO(X)
C (C) Copr. 1986-92 Numerical Recipes Software 53#]v.
c
46
c ----------------------------------------------------------------------------------
c
Function FIDIM(X)
C (C) Copr. 1986-92 Numerical Recipes Software 53#]v.
c
c----------------------------------------------------------------------------------------
c
Subrout ine M N BR A K (AX,BX,CX,FA,FB,FC,FUNC)
C (C) Copr. 1986-92 Numerical Recipes Software 53#]v.
c
c -----------------------------------------------------------------------------------
c
Function BRENT(AX,BX,CX,F,TOL,XM IN)
C (C) Copr. 1986-92 Numerical Recipes Software 53#]v.
c
c----------------------------------------------------------------------------------------
c
Subrout ine POW ELL(P,XI,N,NP,FTOL, ITER,FRET)
C CO Copr. 1986-92 Numerical Recipes Software 53#]v.
c
c----------------------------------------------------------------------------------------
c
Subroutine LINMIN(P,XI.N.FRET)
C (C) Copr. 1986-92 Numerical Recipes Software 53#]v .
c
c----------------------------------------------------------------------------------------
2. Fortran Program to fit a seasonal CAR model to circular data
c program to f i t circular A R process to wind
c direction data. Seasonal case,
c October 21, 1994.
c
c double precision version,
c
c this does N O T compute derivatives at the maximum
c
c---------------------------------------------------------------------------------------
c
c nxdim ■ p_max + 24
47
Parameter (NPT » 5000)
c set up for max lag of 6, and monthly labels 1..............12
c itype is the number of seasonal parameters,
c they must be labeled 1, 2, itype
Parameter (NXDIM - 30,ITYPE - 12)
Parameter (PI - 3 . 141592654D0)
Implicit Double Precision (A-H,Q-Z)
Com m on /PARAM / LAG, NN, N
Com m on /TH/ THETA(NPT), NTYPE(NPT)
Dimension X(NXDIM), DVAL(NXDIM )
Dimension Y(NXDIM), XI(NXDIM,NXDIM)
Character*64 INFILE, OUTFILE
TW O PI - 2.0 * PI
c Write (6,+) 'Enter name of direction data f ile '
c Read ( 5 ,'(A) ') INFILE
c Open (Unit-25,File-INFILE,Status-'unknown')
Open (Unit-25,File-'testb.dat’ .Status-’unknown’)
c Write (6,+) 'Enter name of in itia l estimates f ile '
c Read ( 5 ,‘ (A) ’) INFILE
c Open (Unit-24,File-INFILE,Status-’unknown')
Open (Unit-24,File-’in file'.S tatu s*’unknown’)
c Write (6,*) 'Enter name of output f i le '
c Read ( 5 ,'(A)’) OUTFILE
c Open (Unit-26,File-OUTFILE,Status-’unknown')
Open (Unit-26,File-'outfile'.Status-'unknown')
c
c read in the data....
c N N is the number of data points in the data sequence
c The data should be in one column, with values between
c 0 and 360. Program converts to radians
c
N N - 0
10 N N - N N + 1
Read (25,*,End-20) THVAL, JINT, N V A L
48
c this needs to be fixed for particular input formats
THETA(NN) - TW O PI * TH V A L / 360.0
c THETA(NN) - TW O PI * (THVAL-0.5) / 36.0
NTYPE(NN) - N V A L
G o To 10
20 Write (26,5800) INFILE
N N - N N - 1
Write (6,+) 'Number of data points » ’ , N N
Write (26,*) 'Number of data points ■ ', N N
Write (6,*) 'Number of seasonal coefficients ■ ITYPE
Write (26,*) 'Number of seasonal coefficients ■ ITYPE
c
C This is the loop for making multiple runs
c
30 Write (6,*) 'New ru n ...'
Write (26,*) ’ i t*****************************************»
Write (26,*) 'New ru n ...'
c
Write (6,*) 'What is the lag you want to use?'
Read (5,*) L A G
Write (26,*) ’Lag ■ L A G
N - L A G + 2 * ITYPE
c
c Set up optimizer
c
C y ■ in itia l guess at x vector
Read (24,*) (Y(I),I - l.N)
Rewind 24
Do 40 I - 1, ITYPE
Y(I) ■ TW O PI * Y(I) / 360.0
Write (26,5100) I, Y(I)
40 Continue
Do 50 I » 1, ITYPE
Write (26,5200) I, Y(ITYPE+I)
50 Continue
IT2 ■ 2 * ITYPE
Do 60 I - IT2 + 1, N
J - I - IT2
Write (26,5900) J, Y(I)
60 Continue
49
Do 70 I - IT2 + 1, N
X(I) - Y(I)
70 Continue
Do 80 I - 1, ITYPE
X(I) - L0G(Y(I)/(TWOPI-Y(I)))
80 Continue
Do 90 I - 1, ITYPE
XCITYPE+I) - LOG(Y(ITYPE+I))
90 Continue
FTDL - 1.0E-9
ITER - 20
c
c set up info for Powell method
c
FRET - FUNC(X)
Write (6,5300) -FRET
Write (26,5600) -FRET
Do 110 I * 1, N
Do 100 J ■ 1, N
XI(I,J) - 0.000
If (I.EQ.J) Then
XI(I,J) - 1.0D0
End If
100 Continue
110 Continue
Call POW ELL(X,XI,N,NXDIM ,FTOL, ITER,FRET)
c
c output results
c
Write (6,*)
Write (26,*)
c Write (6,*) 'Derivatives at the solution are:'
c Write (26,*) 'Derivatives at the solution are:'
c Call DFUNC(X.DVAL)
c Do 80 I ■ 1, N
c Write (26,5000) DVAL(I)
c Writ* (6,5000) DVAL(I)
c 80 Continue
50
Do 120 I - 2 * ITYPE + 1, N
Y(I) - X(I)
120 Continue
Do 130 I - 1, ITYPE
Y(I) - TW O PI + EXP(X(I)) / (1.0+EXP(X(I)))
130 Continue
Do 140 I - 1, ITYPE
YCITYPE+I) - EXPCXCITYPE+I))
140 Continue
Write (6,*) ’maximum lag ■ L A G
Write (6,5700) -FRET
Write (26,5700) -FRET
Do 150 I - 1, ITYPE
Y Y - 360.0 + Y(I) / TW O PI
Write (6,5400) I, Y(I), Y Y
Write (26,5400) I, Y(I), Y Y
150 Continue
Do 160 I ■ 1, ITYPE
J - hype + I
Write (6,5500) I, Y(J)
Write (26,5500) I, Y(J)
160 Continue
Do 170 I - 2 * ITYPE + 1, N
Write (6,5900) I - 2 * ITYPE, Y(I)
Write (26,5900) I - 2 * ITYPE, Y(I)
170 Continue
Write (6,*) ‘Do you want another run?'
Write (6,*) ’Enter 1 for y es’
Read (5,*) IG0
If (IG0.EQ.1) G o To 30
5000 Format (1X.F10.6)
5100 Format (IX ,’In itia l value of mu(’ , I 2 , ’) * *,F7.4)
5200 Format (IX,1In itial value of kappa(’ ,1 2 ,’) * ’ ,F8.4)
5300 Format (IX,’In itial value of log-likelihood ■ ’ .F12.4)
5400 Format (IX ,’Final val of mu(',1 2 ,’) » ’ ,F 7.4,' ( ’ ,F 6 .2.’d ) '
5500 Format (IX ,’Final value of kappa(’ ,12,') - ’ ,F8.4)
5600 Format (IX ,‘In itia l value of log-likelihood ■ ’ .F12.4/)
5700 Format (IX,'Final value of log-likelihood ■ ’ ,F12.4/)
5800 Format (IX,'Direction data f i l e is ' ,A64)
5900 Format (IX,'alphaC M 2 , ’) - \F10.5)
End
Function FUNC(X)
Parameter (NPT • 5000)
Parameter (NXDIM - 30,ITYPE - 12)
Parameter (PI ■ 3 . 141592654D0)
Implicit Double Precision (A-H.0-Z)
Com m on /PARAM / LAG, NN, N
Com m on /TH/ THETA(NPT), NTYPE(NPT)
Dimension X(N)
XLIK - 0.0
NBESS * 0
IT - ITYPE
Do 10 I - 1, N N
XLIK ADD - ADDON(IT,I,X)
add term to log-likelihood
XLIK - XLIK + XLIK ADD
add log of Bessel function part
N T - NTYPE(I)
XLIK « XLIK - L0G(2.0+PI+BESSI0(EXP(X(ITYPE+NT))))
10 Continue
powell is a minimiser, so multiply by -1
FU N C - -XLIK
Return
End
c--------------------------------------------------------------------------------------------------
c
Function ADD0N(IT,ILOC.X)
Parameter (NPT ■ 5000)
Parameter (PI - 3 .141592654D0)
Implicit Double Precision (A-H.O-Z)
Com m on /PARAH/ LAG, NN, N
Dimension X(N)
Com m on /TH/ THETA(NPT), NTYPE(NPT)
c
c use tan link. Could be modified
c
c this returns the log of the density of the von Mises
c distribution, without the bessel function part
c the parameters kappa and m u are determined by the value
c of ntype(iloc). IT is the number of seasonal components
G1(Y) - TAN(Y/2.0)
G(Y) « 2.0 * ATAN(Y)
N T - NTYPE(ILOC)
X M U - 2.0 * PI * EXP(X(NT)) / ( l .0+EXP(X(NT)))
X K A PPA - EXP(X(IT+NT))
JLA G - 0
If (ILOC.GT.LAG) JLA G » L A G
If (JLAG.EQ.0) Then
SU M - THETA(IL0C) - X M U
A D D O N ■ X K A PPA * COS(SUM )
Return
End If
SU M - 0.0
Do 10 I - 1, JLA G
SU M - SU M + X(I+2*IT) * G1( (THETA(ILOC-I)-XMU))
10 Continue
SU M - G(SUM )
SU M - THETA(ILOC) - X M U - SU M
53
A D D O N » XK APPA * COS(SUM )
Return
End
c
c-------------------------------------------------------------------------------------------------
c
c All Numerical Recipes routines should be Double Precision
c
---------------------------------------------------------------------------------------------------------------------
c
Function BESSIO(X)
C (C) Copr. 1986-92 Numerical Recipes Software 53#]v.
c
c--------------------------------------------------------------------------------------------------
c
Function FIDIM(X)
C (C) Copr. 1986-92 Numerical Recipes Software 53#]v.
c
c--------------------------------------------------------------------------------------------------
c
Subrout ine M N BR A K (AX,BX,CX,FA,FB, FC,FUNC)
C (C) Copr. 1986-92 Numerical Recipes Software 53#]v.
c
c--------------------------------------------------------------------------------------------------
c
Function BRENT(AX,BX,CX,F,TOL,XM IN)
C (C) Copr. 1986-92 Numerical Recipes Software 53#]v.
c
c--------------------------------------------------------------------------------------------------
c
Subrout ine POW ELL(P,XI,N,NP,FTOL, ITER,FRET)
C (C) Copr. 1986-92 Numerical Recipes Software 53#]v.
c
c-------------------------------------------------------------------------------------------------
c
Subroutine LINMIN(P,XI,N,FRET)
C (C) Copr. 1986-92 Numerical Recipes Software 53#]v.
c
c--------------------------------------------------------------------------------------------------
c
Function BESSIl(X)
C (C) Copr. 1986-92 Numerical Recipes Software 53#]v.
54
c
c
3. Fortran program to fit a GCAR model to circular data
c program to f i t circular gapped A R process to wind directions,
c June 12, 1994.
c Single precision version
c
c------------------------------------------------------------------------------------------------------------
c
c nxdim * p_max + 2
Parameter (NPT ■ 5000)
Parameter (NXDIM ■ 10)
Parameter (PI ■ 3.141592654)
Common /PARAM/ JSTART, LAG , NN, N
Com m on /TH/ THETA(NPT), INDX(NPT)
Dimension X(NXDIM )
Dimension Y(NXDIM), XI(NXDIM,NXDIM)
Character*64 INFILE, OUTFILE
TW O PI - 2.0 * PI
c Write (6,+) 'Enter name of direction data f ile '
c Read ( 5 ,'(A) ') INFILE
c Open (Unit*25,File*INFILE,Status*'unknown’)
Open (Unit»25,File*,te s t .d a t ',Status*’unknown')
c Write (6,+) 'Enter name of in itia l estimates f ile '
c Read ( 5 ,’ (A) ') INFILE
c Open (Unit*24,File*INFILE,Status*'unknown’)
Open (Unit*24,F ile * 'f il’ ,Status*’unknown')
c Write (6,*) 'Enter name of output f i le '
c Read ( 5 , '(A)') OUTFILE
c Open (Unit*26,File*OUTFILE,Status*'unknown')
Open (Unit*26,File*'test.out',Status*’unknown’)
c
c read in the data....
55
c N N is the number of data points in the data sequence
c The data should be in one column, vith values between
c 0 and 360. Program converts to radians
c
N N - 0
10 N N - N N + 1
Read (25,*,End-20) TH V A L
IN - 1
If (THVAL.EQ.O) IN - 0
THETA(NN) » TW O PI * 10.0 * (THVAL-0.5) / 360.0
INDX(NN) - IN
G o To 10
20 Write (26,5200)
Write (26,5200) INFILE
Write (6,*) 'Number of data points * N N
Write (26,*) 'Number of data points - ', N N
c
C JSTART is the position that sequence counts start from
c
Write (6,*) 'Input the starting position in sequence '
Read (5,*) JSTART
Write (26,*) 'Starting position in sequence ■ ', JSTART
c
C This is the loop for making multiple runs
c
30 Write (6,*) 'New run...'
Write (26,*) ' * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ‘
Write (26,*) 'New r u n ...’
c
Write (6,*) 'What is the maximum lag (p) you want to use?'
Read (5,*) L A G
Write (26,*) 'Maximum lag ■ ', L A G
N - L A G + 2
c
c Set up optimizer
c
C y ■ in itia l guess at x vector
Read (24,*) (Y(I),I - 1,N)
Rewind 24
Write (26,*) 'In itia l value of m u ■ ', Y(l)
Write (26,*) 'In itia l value of kappa ■ Y(2)
56
Do 40 I « 3, N
Writ* (26,5300) 1- 2, Y(I)
40 Continue
Do 50 I • 3, N
X(I) « Y(I)
50 Continue
X(l) - L0G(Y(1)/(TW0PI-Y(1)))
X(2) - L0G(Y(2))
FTO L - 1.0E-7
ITER - 20
c
c set up info for Powell method
FRET - FUNC(X)
Write (6,*) 'In itial value of log-likelihood * ', -FRET
Write (26,5000) -FRET
Do 70 I • 1, N
Do 60 J - 1, N
XI(I.J) - 0.0D0
If (I.EQ.J) Then
XI(I.J) - 1.0D0
End If
60 Continue
70 Continue
Call POWELUX, XI ,N,NXDIM ,FTOL, ITER, FRET)
c
c output results
c
Do 80 I - 3, N
Y(I) - X(I)
80 Continue
Y(1) - T W O PI * EXP(X(1)) / (1.0+EXP(X(l)))
Y(2) - EXP(X(2))
Write (6,*) ‘maximum lag « L A G
Write (6,5100) -FRET
Write (26,5100) -FRET
57
Writ* (6,*) 'mu - ', Y(l)
Writ* (26,*) ’m u - *, Y(l)
Writ* (6,*) ’kappa * Y(2)
Writ* (26,*) 'kappa - Y(2)
Do 90 I - 3, N
Writ* (6,5300) 1 - 2 , Y(I)
Writ* (26,5300) 1 - 2, Y(I)
90 Continu*
Writ* (6,*) ’Do you want another run?’
Writ* (6,*) ’Enter 1 for yes'
Road (5,*) IG0
If (IG0.EQ.1) Go To 30
5000 Format (lX .’In itia l value of log-likelihood ■ ’ .F12.4/)
5100 Format (IX,'Final value of log-likelihood ■ ’ ,F12.4/)
5200 Format (IX, ' Direction data f i l e is ’ ,A64)
5300 Format (IX ,’alpha(’ ,1 2 ,’) - ’ .F10.5)
End
c
c---------------------------------------------------------------------------------------------------
c
Function FUNC(X)
Parameter (NPT ■ 5000)
Parameter (PI * 3.141592654)
Common /PARAM/ JSTART. LAG, NN, N
Common /TH/ THETA(NPT), INDX(NPT)
Dimension X(N)
XLIK -0.0
NBESS • 0
Do 10 I - JSTART, N N
If (INDX(I).NE.0) Then
NBESS - NBESS + 1
JLAG - JVAL(I)
XLIKADD - ADDON(I,JLAG,X)
c
c add term to log-likelihood
c
XLIK - XLIK + XLIK ADD
End If
58
10 Continue
add log of Bessel function part
XLIK - XLIK - NBESS * L0G(2. 0+PI*BESSI0(EXP(X(2))))
povell is a minimiser, so multiply by -1
FU N C - -XLIK
Return
End
Function ADD0N(IL0C,JLAG.X)
Parameter (NPT ■ 5000)
Parameter (PI ■ 3.141592654)
Com m on /PARAM/ JSTART, LAG, NN, N
Dimension X(N)
Common /TH/ THETA(NPT), INDX(NPT)
use tan link. Could be modified
this returns the log of the density of the von Mises
distribution, without the bessel function part
G1(Y) « TANCY/2.0)
G(Y) - 2.0 * ATAN(Y)
X M U - 2.0 * PI * EXP(X(1)) / (1 .0+EXP(X(l)))
XK APPA - EXP(X(2))
If (JLAG.LE.0) Then
A D D O N - X K A PPA * COS(THETA(IL0C)-XMU)
Return
End If
SU M - 0.0
Do 10 I - 1, JLA G
SU M - SU M + X(I+2) * G1((THETA(ILOC-I)-XMU))
10 Continue
SU M - G(SUM )
SU M - THETA(ILOC) - X M U * SU M
A D D O N - X K A PPA * COS(SUM )
Return
End
c
c------------------------------------------------------------------------------------------------
c
Function JVAL(X)
Parameter (NPT * 5000)
Com m on /PARAM/ JSTART, LAG, NN, N
Com m on /TH/ THETA(NPT), INDX(NPT)
JV A L - 0
If (LAG.EQ.O) Return
D o 10 J - 1, L A G
If (INDX(I-J).Eq.O) Return
JV A L - JVAL + 1
10 Continue
Return
End
c
c------------------------------------------------------------------------------------------------
c
c All Numerical Recipes routines need to be Single Precision
c
c------------------------------------------------------------------------------------------------
c
Function BESSIO(X)
C (C) Copr. 1986-92 Numerical Recipes Software 53#]v.
c *
c-------------------------------------------------------------------------------------------------
c
Function FIDIM(X)
C (C) Copr. 1986-92 Numerical Recipes Software 53#]v.
c
c-------------------------------------------------------------------------------------------------
c
Subrout ine M N BR A K (AX,BX,CX,FA,FB,FC,FUNC)
C (C) Copr. 1986-92 Numerical Recipes Software 53#]v.
c
c-------------------------------------------------------------------------------------------------
60
Function BRENT(AX, BX,CX, F,TOL,XM IN)
(C) Copr. 1986-92 Numerical Recipes Software 53#]v.
Subrout ine POWELL(P,XI,N,NP,FTOL, ITER, FRET)
(C) Copr. 1986-92 Numerical Recipes Software 53#]v.
Subroutine LINMIN(P,XI,N,FRET)
INFORMATION TO USERS
This manuscript has been reproduced from the m icrofilm master. U M I
films the text directly from the original or copy submitted. Thus, some
thesis and dissertation copies are in typewriter free, while others may
be from any type of computer printer.
The quality of this reproduction is dependent upon the quality of the
copy submitted. Broken or indistinct print, colored or poor quality
illustrations and photographs, print bleedthrough, substandard margins,
and improper alignment can adversely affect reproduction.
In the unlikely event that the author did not send UMI a complete
manuscript and there are missing pages, these will be noted. Also, if
unauthorised copyright material had to be removed, a note win indicate
the deletion.
Oversize materials (e.g^ maps, drawings, charts) are reproduced by
sectioning the original, beginning at the upper left-hand corner and
continuing from left to right in equal sections with small overlaps. Each
original is also photographed in one exposure and is included in
reduced form at the bade of the book.
Photographs included in the original manuscript have been reproduced
xerographically in this copy. Higher quality 6" x 9" black and white
photographic prints are available for any photographs or illustrations
appearing in this copy for an additional charge. Contact UMI directly
to order.
A Bell & Howell Information Company
300 North Z ee b Road Ann Arbor Ml 48106-1346 USA
313-' 761-4700 800.521-0600
UMI Number: 1376533
ONI Microform 1376533
Copyright 1995, by UMI Company. All rights reserved.
This microform edition is protected against unauthorized
copying under Title 17, United States Code.
UMI
300 North Zeeb Road
Ann Arbor, MI 48103
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
PDF
Independent process approximation for the coupon collector's problem
PDF
Structural equation modeling in educational psychology
PDF
The effects of dependence among sites in phylogeny reconstruction
PDF
Markovian Models For Discrete Data With Repeated Patterns
PDF
Ordered Probit Models For Transaction Stock Prices
PDF
Repeated Measures In Psychology: Bias In Collinearity Judgment
PDF
Implementation aspects of Bézier surfaces to model complex geometric forms
PDF
A finite element approach on singularly perturbed elliptic equations
PDF
Diminishing worlds?: the impact of HIV/AIDS on the geography of daily life
PDF
Structural geology of the Chiwaukum schist, Mount Stuart region, central Cascades, Washington
PDF
Altered interaction of human endothelial cells to the glycosylated laminin
PDF
The study of temporal variation of coda Q⁻¹ and scaling law of seismic spectrum associated with the 1992 Landers Earthquake sequence
PDF
Performances of orthodox and liberal Jewish children in third grade on praxis subtests of the SIPT: a cross cultural study
PDF
An analysis of private and social gains from plastics recycling
PDF
"It was in the air": Adolph Gottlieb, the pictographs, New York, and the Zeitgeist of the 1940s
PDF
Trifluoromethanesulfonates (triflates) for organic syntheses
PDF
Femtosecond laser studies of biological systems
PDF
Pulse oximetry failure rates
PDF
Hand function in older adults: the relationship between performance on the Jebsen Test and ADL status
PDF
Respiratory system impedance at the resting breathing frequency range
Asset Metadata
Creator
Wen, Ping
(author)
Core Title
The analysis of circular data
School
Graduate School
Degree
Master of Science
Degree Program
Statistics
Degree Conferral Date
1994-12
Publisher
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
OAI-PMH Harvest,statistics
Language
English
Contributor
Digitized by ProQuest
(provenance)
Advisor
Tavare, Simon (
committee chair
), Goldstein, Larry (
committee member
), Waterman, Michael (
committee member
)
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c18-6047
Unique identifier
UC11357905
Identifier
1376533.pdf (filename),usctheses-c18-6047 (legacy record id)
Legacy Identifier
1376533-0.pdf
Dmrecord
6047
Document Type
Thesis
Rights
Wen, Ping
Type
texts
Source
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the au...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus, Los Angeles, California 90089, USA
Tags
statistics