STAT 248 - Analysis of Time Series
1 Instructions
1. We will only accept typed solutions.
2. You need to upload your homework solution as a single PDF file on Gradescope by
midnight on 13 March.
3. Total points: 70
2 Questions
1. Consider the dataset Nile (from the R package dlm) on the measurements of the annual
flow volume of the river Nile. Consider fitting the local level model to this dataset with
diffuse initialization $X_0 \sim N(0, +\infty)$. Compute numerical estimates of the parameters
$\sigma_Z^2$ and $\sigma_\varepsilon^2$ by the EM algorithm. Compare the resulting estimates to those obtained
by the dlmMLE function in the package dlm. (5 points).
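As a rough illustration of the EM iteration for the local level model, here is a minimal pure-Python sketch. Several things in it are assumptions and not part of the assignment: the indexing (states $X_0, \dots, X_n$, observations $y_1, \dots, y_n$), the large prior variance `C0 = 1e7` standing in for the diffuse initialization, the function names `kalman_smooth` and `em_step`, and the simulated series used in place of the Nile data. Entries of `y` equal to `None` are treated as missing.

```python
import math
import random

def kalman_smooth(y, W, V, m0=0.0, C0=1e7):
    """Kalman filter and smoother for X_t = X_{t-1} + Z_t, Y_t = X_t + eps_t.

    W = Var(Z_t), V = Var(eps_t). C0 is a large prior variance (assumption)
    that approximates the diffuse initialization of X_0.
    """
    n = len(y)
    m, C = [m0], [C0]        # filtered mean / variance of X_t, t = 0..n
    a, R = [None], [None]    # one-step predicted mean / variance, t = 1..n
    loglik = 0.0
    for yt in y:
        at, Rt = m[-1], C[-1] + W
        if yt is None:       # missing observation: prediction only
            mt, Ct = at, Rt
        else:
            Qt = Rt + V      # variance of y_t given the past
            K = Rt / Qt
            mt, Ct = at + K * (yt - at), Rt * V / Qt
            loglik -= 0.5 * (math.log(2 * math.pi * Qt) + (yt - at) ** 2 / Qt)
        a.append(at); R.append(Rt); m.append(mt); C.append(Ct)
    # Backward smoothing pass; J[t] * S[t+1] = Cov(X_t, X_{t+1} | data)
    s, S, J = m[:], C[:], [0.0] * n
    for t in range(n - 1, -1, -1):
        J[t] = C[t] / R[t + 1]
        s[t] = m[t] + J[t] * (s[t + 1] - a[t + 1])
        S[t] = C[t] + J[t] ** 2 * (S[t + 1] - R[t + 1])
    return s, S, J, loglik

def em_step(y, W, V):
    """One EM update of (W, V); also returns the log-likelihood at the old values."""
    s, S, J, loglik = kalman_smooth(y, W, V)
    n = len(y)
    W_new = sum((s[t] - s[t - 1]) ** 2 + S[t] + S[t - 1] - 2 * J[t - 1] * S[t]
                for t in range(1, n + 1)) / n
    obs = [t for t in range(1, n + 1) if y[t - 1] is not None]
    V_new = sum((y[t - 1] - s[t]) ** 2 + S[t] for t in obs) / len(obs)
    return W_new, V_new, loglik

# Simulated stand-in data (not the Nile series): W = 1, V = 4
random.seed(0)
x, y = 0.0, []
for _ in range(200):
    x += random.gauss(0.0, 1.0)
    y.append(x + random.gauss(0.0, 2.0))

W, V = 1.0, 1.0
logliks = []
for _ in range(100):
    W, V, ll = em_step(y, W, V)
    logliks.append(ll)
```

The recorded log-likelihoods should be non-decreasing across iterations, which is a useful sanity check on any EM implementation. For the comparison asked for in the problem, the same model can be built in R with `dlmModPoly(order = 1, ...)` and passed to `dlmMLE`.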
2. Consider the data in the file “Prob2HW3DataStat248.csv” which is basically the Nile
dataset from the previous problem with some missing observations. You may assume,
for this problem, that the missing observations are “missing at random”.
a) Fit the local level model to this dataset and estimate the parameters $\sigma_\eta^2$ and $\sigma_\varepsilon^2$
by the EM algorithm (5 points).
b) Let $\alpha$ denote a time index corresponding to a missing observation. Show that
(below “data” refers to the observed data)
$Y_\alpha \mid \text{data}, \theta \sim N(m_{\alpha|\text{data}}, Q_{\alpha|\text{data}} + \sigma_\varepsilon^2)$
where $m_{\alpha|\text{data}}$ and $Q_{\alpha|\text{data}}$ denote the mean and variance of the conditional
distribution of $X_\alpha$ given the observed data. (3 points)
c) Impute the missing observations by providing point estimates and uncertainty
intervals. Compare your answers with the actual values from the full Nile dataset
(6 points).
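The distribution in part b) translates directly into an interval for a missing value. As a sketch, assuming a plug-in estimate $\hat\theta$ (so that uncertainty in the parameter estimates is ignored) and a 95% level, with $\sigma_\varepsilon^2$ denoting the observation noise variance:

\[
\hat{y}_\alpha = m_{\alpha|\text{data}}(\hat\theta),
\qquad
m_{\alpha|\text{data}}(\hat\theta) \pm 1.96 \sqrt{Q_{\alpha|\text{data}}(\hat\theta) + \hat\sigma_\varepsilon^2}.
\]

The extra $\hat\sigma_\varepsilon^2$ under the square root is what distinguishes an interval for the unobserved $Y_\alpha$ from an interval for the state $X_\alpha$.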
3. Consider the state space model
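$X_0 \sim N(0, C)$
$X_t = \phi X_{t-1} + Z_t$ for $t \ge 1$ and $Z_t \overset{\text{i.i.d.}}{\sim} N(0, \sigma_Z^2)$
$Y_t = X_t + \varepsilon_t$ for $t \ge 0$ and $\varepsilon_t \overset{\text{i.i.d.}}{\sim} N(0, \sigma_\varepsilon^2)$. (1)
Here $\phi, \sigma_Z, \sigma_\varepsilon$ are parameters of the model and $C$ is a large positive constant. Let
$\theta = (\phi, \sigma_Z, \sigma_\varepsilon)$.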
a) Write down the Kalman Smoother Recursions for this model. (3 points)
b) Using the Fisher identity, write down an explicit formula for the score vector in
this model at a fixed parameter value $\theta^{(0)} = (\phi^{(0)}, \sigma_Z^{(0)}, \sigma_\varepsilon^{(0)})$. The formula should
be in terms of the observed data $y_0, \dots, y_T$ and the values $m_{s|t}(\theta^{(0)})$ and $Q_{s|t}(\theta^{(0)})$
for some $s$ and $t$. (6 points)
c) Write down the formula for the EM iterate $\theta^{(n)} \mapsto \theta^{(n+1)}$. The formula should be
in terms of the observed data $y_0, \dots, y_T$ and the values $m_{s|t}(\theta^{(0)})$ and $Q_{s|t}(\theta^{(0)})$
for some $s$ and $t$. (4 points).
d) Consider fitting model (1) to the dataset from “Prob3HW3DataStat248.csv”.
Estimate the parameters $\theta = (\phi, \sigma_Z, \sigma_\varepsilon)$ of the model using maximum likelihood
calculated via the Gradient Ascent algorithm. (5 points)
e) Estimate the parameters $\theta = (\phi, \sigma_Z, \sigma_\varepsilon)$ using the EM algorithm. (5 points).
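One way to sanity-check a gradient ascent implementation for model (1) is to compare it against a version that needs no analytic score. The pure-Python sketch below evaluates the log-likelihood with the Kalman filter and climbs it using a central finite-difference gradient with step halving. Note that this swaps the Fisher-identity score of part b) for a numerical gradient, so it is a cross-check and not the method the problem asks for. The names `loglik`, `objective`, `num_grad` and `gradient_ascent`, the value `C = 1e6`, the log parametrization of the standard deviations, and the simulated data are all assumptions of this sketch.

```python
import math
import random

def loglik(y, phi, sig_z, sig_e, C=1e6):
    """Log-likelihood of y_0..y_T under model (1), via the Kalman filter."""
    ll, a, R = 0.0, 0.0, C                 # prior X_0 ~ N(0, C)
    for yt in y:
        Q = R + sig_e ** 2                 # variance of y_t given the past
        ll -= 0.5 * (math.log(2 * math.pi * Q) + (yt - a) ** 2 / Q)
        K = R / Q
        m, P = a + K * (yt - a), R * sig_e ** 2 / Q   # filtered moments of X_t
        a, R = phi * m, phi ** 2 * P + sig_z ** 2     # predict X_{t+1}
    return ll

def objective(p, y):
    # p = [phi, log sigma_Z, log sigma_eps]; logs keep both std devs positive
    return loglik(y, p[0], math.exp(p[1]), math.exp(p[2]))

def num_grad(p, y, h=1e-5):
    """Central finite-difference gradient (stand-in for the analytic score)."""
    g = []
    for i in range(len(p)):
        up = p[:]; up[i] += h
        dn = p[:]; dn[i] -= h
        g.append((objective(up, y) - objective(dn, y)) / (2 * h))
    return g

def gradient_ascent(y, p, iters=200):
    history = [objective(p, y)]
    for _ in range(iters):
        g = num_grad(p, y)
        norm = math.sqrt(sum(gi * gi for gi in g))
        step = 1.0 / max(1.0, norm)        # first trial moves at most distance 1
        while step > 1e-12:
            cand = [pi + step * gi for pi, gi in zip(p, g)]
            val = objective(cand, y)
            if val > history[-1]:          # accept only improving steps
                p = cand
                history.append(val)
                break
            step /= 2
        else:
            break                          # no improving step: stop early
    return p, history

# Simulated stand-in data with phi = 0.8, sigma_Z = 1, sigma_eps = 0.5
random.seed(1)
x, y = 0.0, []
for _ in range(300):
    y.append(x + random.gauss(0.0, 0.5))
    x = 0.8 * x + random.gauss(0.0, 1.0)

p_hat, history = gradient_ascent(y, [0.5, 0.0, 0.0])
phi_hat, sig_z_hat, sig_e_hat = p_hat[0], math.exp(p_hat[1]), math.exp(p_hat[2])
```

Because only improving steps are accepted, `history` is increasing by construction. An analytic score derived from the Fisher identity can be checked against `num_grad` at the same parameter value.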
4. In the file “Prob4HW3DataStat248.csv”, you will find a simulated time series of length
n = 500 that is generated according to the model:
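$Y_t = f(u_t) + \varepsilon_t$ for $t = 1, \dots, n$
where $u_t := (t-1)/n$, $f : [0, 1] \to \mathbb{R}$ is a smooth function and $\varepsilon_1, \dots, \varepsilon_n$ are i.i.d. errors
with mean zero and variance $\sigma_\varepsilon^2$. Consider fitting the following model to this data:
$Y_t = X_t + \varepsilon_t$ and $X_t - 2X_{t-1} + X_{t-2} = Z_t$ (2)
where $\{\varepsilon_t\}$, $\{Z_t\}$ are independent with
$\varepsilon_t \sim N(0, \sigma_\varepsilon^2)$ and $Z_t \sim N(0, \sigma_Z^2)$. (3)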
a) Calculate maximum likelihood estimates of $\sigma_Z$ and $\sigma_\varepsilon$. You can use any method
(including in-built functions from the package dlm) for this. (2 points)
b) Plot point estimates of the function $f$ (i.e., of $X_t = f(u_t)$ for $t = 1, \dots, n$) as well
as the associated 95% uncertainty bands. (4 points)
c) Compare your estimates of $f$ and the uncertainty intervals with the following true
function which was used to generate the data: (2 points)
$f(u) = \sin(15u) + \exp(-u^2/2) + \frac{1}{2}(u - 0.5)^2$ for $u \in [0, 1]$. (4)
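To run a Kalman filter or smoother on model (2), the second-order recursion first has to be written with a first-order state equation. One possible choice, where the stacked state $(X_t, X_{t-1})^\top$ is an assumption of this sketch and not part of the problem statement, and $\varepsilon_t$ is the observation noise:

\[
\begin{pmatrix} X_t \\ X_{t-1} \end{pmatrix}
=
\begin{pmatrix} 2 & -1 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} X_{t-1} \\ X_{t-2} \end{pmatrix}
+
\begin{pmatrix} Z_t \\ 0 \end{pmatrix},
\qquad
Y_t = \begin{pmatrix} 1 & 0 \end{pmatrix}
\begin{pmatrix} X_t \\ X_{t-1} \end{pmatrix} + \varepsilon_t.
\]

The first row reproduces $X_t - 2X_{t-1} + X_{t-2} = Z_t$ and the second row only carries $X_{t-1}$ forward. In the dlm package, a closely related form should be available through `dlmModPoly(order = 2, ...)` with the level variance set to zero, since the second difference of the level then equals the (lagged) slope innovation.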
5. In the file “Prob5HW3DataStat248.csv”, you will find a simulated time series of length
n = 500 that is generated according to the model:
$Y_t = f(u_t) + \delta_t$ for $t = 1, \dots, n$ (5)
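where $u_t := (t-1)/n$, $f : [0, 1] \to \mathbb{R}$ is a smooth function, and $\delta_1, \dots, \delta_n$ are mean zero
errors that are not necessarily independent. Writing $X_t = f(u_t)$, consider the following
model for $Y_t$:
$Y_t = X_t + \gamma_t + \varepsilon_t$
where
$X_t - 2X_{t-1} + X_{t-2} = \eta_{1t}$ and $\gamma_t - \phi\gamma_{t-1} = \eta_{2t}$
with $\{\varepsilon_t\}$, $\{\eta_{1t}\}$ and $\{\eta_{2t}\}$ all being independent with
$\varepsilon_t \sim N(0, \sigma_\varepsilon^2)$ and $\eta_{1t} \sim N(0, \sigma_1^2)$ and $\eta_{2t} \sim N(0, \sigma_2^2)$.
In other words, the dependent errors in (5) are modeled as $\delta_t = \gamma_t + \varepsilon_t$ with $\gamma_t$ being
AR(1) and $\varepsilon_t$ being i.i.d.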
a) This model has four unknown parameters: $\phi$, $\sigma_\varepsilon$, $\sigma_1$ and $\sigma_2$. What are the maximum
likelihood estimates of these parameters? You can use any method (including in-built
functions from the package dlm) for this. (4 points).
b) Plot point estimates of the function $f$ (i.e., of $X_t = f(u_t)$ for $t = 1, \dots, n$) as well
as the associated 95% uncertainty bands. (4 points)
c) Compare your estimates of f and the uncertainty intervals with the true function
(4) which was used to generate the data. (2 points)
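The trend-plus-AR(1) model above also fits the state space framework once the trend and the AR(1) component are stacked into one state vector. A possible form, where the three-dimensional state $(X_t, X_{t-1}, \gamma_t)^\top$ is an assumption of this sketch and $\varepsilon_t$ is the i.i.d. observation noise:

\[
\begin{pmatrix} X_t \\ X_{t-1} \\ \gamma_t \end{pmatrix}
=
\begin{pmatrix} 2 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & \phi \end{pmatrix}
\begin{pmatrix} X_{t-1} \\ X_{t-2} \\ \gamma_{t-1} \end{pmatrix}
+
\begin{pmatrix} \eta_{1t} \\ 0 \\ \eta_{2t} \end{pmatrix},
\qquad
Y_t = \begin{pmatrix} 1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} X_t \\ X_{t-1} \\ \gamma_t \end{pmatrix} + \varepsilon_t.
\]

The block-diagonal transition matrix reflects that the trend and the AR(1) error evolve independently; the smoothed first coordinate and its variance then give the point estimates and bands for $f$ in part b).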
6. In the file “Prob6HW3DataStat248.csv”, you will find a simulated time series of length
n = 2048 that is generated according to the model:
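$Y_t = X_t + \varepsilon_t$ with $\varepsilon_t \overset{\text{i.i.d.}}{\sim} N(0, \sigma_\varepsilon^2)$ and $\sigma_\varepsilon = 5$
for a piecewise constant sequence $X_1, \dots, X_n$ (piecewise constant means that $X_t = X_{t-1}$
for most but not all $t$). Answer the following questions employing the model:
$X_t = X_{t-1} + Z_t$
with
$Z_t \overset{\text{i.i.d.}}{\sim} (1 - \alpha) N(0, 0.00001) + \alpha N(0, \sigma_Z^2)$
for two parameters $\alpha \in (0, 1)$ and $\sigma_Z > 0$.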
a) Obtain maximum likelihood estimates of $\sigma_Z$ and $\alpha$. (6 points)
b) Obtain point estimates of $\{X_t\}$ for each $t = 1, \dots, n$ (4 points).