STAT3022 Applied Linear Models

STAT3022 Applied Linear Models Semester 1

Tutorial & Lab Sheet 1
Tutorial Problems
There are no tutorial problems in Week 1.
Computer Problems
We will be using the statistical software R for all analysis. Basic use of R is assumed knowledge in
this course. If you are not familiar with R, you should learn it quickly. Check the resource page in
Canvas to learn or refresh.
For Q1-Q3, you will treat R as a calculator to get the quantity you want. For Q4, you will get
started with literate programming using R Markdown. For Q5, you should try to produce an R
Markdown report yourself. Note that assignments will be based on your R Markdown output, so be
sure to learn how to produce this output in your computer class.
Question 1
(Assumed knowledge) Use R to find the following probabilities:
(a) $P(Z > -0.785)$, $Z \sim N(0, 1)$, with pnorm(q, lower.tail = FALSE).
(b) $P(t_2 \ge -1.26)$, with pt(q, df, lower.tail = FALSE).
(c) $P(\chi^2_4 < 4.7)$, with pchisq(q, df).
(d) $P(|t_9| > 1.85)$, with pf(q^2, df1, df2, lower.tail = FALSE) after thinking about how
the t and the F distributions relate to each other.
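The relationship hinted at in (d) can be checked numerically: if T follows a t distribution with ν degrees of freedom, then T² follows an F distribution with (1, ν) degrees of freedom. The following is a minimal sketch; the values of `q` and `df` are arbitrary illustration values (assumptions), not those of the question.

```r
# Arbitrary illustration values (assumptions, not taken from the questions)
q  <- 1.5
df <- 7

# Two-sided t tail probability: P(|T| > q) = 2 * P(T > q) for q > 0
p_t <- 2 * pt(q, df, lower.tail = FALSE)

# Same probability via F: T^2 ~ F(1, df), so P(|T| > q) = P(F > q^2)
p_f <- pf(q^2, df1 = 1, df2 = df, lower.tail = FALSE)

# The two routes should agree up to floating-point error
all.equal(p_t, p_f)
```

Substituting the values from (d) gives the requested probability by either route.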
Question 2
(Assumed knowledge) Use qnorm, qt, qchisq, and qf to find c in the following:
(a) $P(t_4 \ge c) = .995$ with qt,
(b) $P(|Z| \le c) = 1/11$ with both qnorm and qchisq,
(c) $P(F_{3,12} \le c) = .90$ with qf.
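For (b), the two quantile functions are linked because Z² follows a chi-squared distribution with 1 degree of freedom, so $P(|Z| \le c) = P(Z^2 \le c^2)$. A minimal sketch of this link follows; the probability `p` is an arbitrary illustration value (an assumption), not the one in the question.

```r
# Arbitrary illustration probability (assumption, not the value in Q2)
p <- 0.5

# Via the normal: P(|Z| <= c) = 2 * pnorm(c) - 1 = p, so pnorm(c) = (1 + p) / 2
c_norm <- qnorm((1 + p) / 2)

# Via the chi-squared: Z^2 ~ chisq(1), so c^2 is the p-quantile of chisq(1)
c_chisq <- sqrt(qchisq(p, df = 1))

# Both routes should give the same c up to floating-point error
all.equal(c_norm, c_chisq)
```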
Question 3
(Assumed knowledge) A machine produces metal pieces that are cylindrical in shape. A sample of 8
pieces is taken and the diameters are
1.01, 0.97, 0.39, 1.03, 1.04, 0.99, 0.98, 0.99.
(a) Construct a box plot representation of this data set. (Hint: use x <- c(1.01, ..., 0.99)
and boxplot(x).)
(b) Estimate the average diameter, µ, produced by the machine. Estimate the standard error
($= s/\sqrt{n}$) of your estimate. (Hint: mean(x), s = sd(x), and n = length(x).)
(c) Assuming that the diameter can be modelled by a normal distribution, calculate a 98%
confidence interval for µ. (Hint: t.test(x, mu = 1, conf.level = 0.98) also gives you the
solution for (d).)
(d) Would you reject the hypothesis $H_0: \mu = 1.00$ at significance level α = .02 on the basis of
these data?
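To see how the hints in (b) to (d) fit together, the interval reported by t.test can be rebuilt from the sample mean, the standard error and a t quantile. This is a minimal sketch using the eight diameters exactly as printed above; the variable names are illustrative choices.

```r
# Diameters as printed on the sheet
x <- c(1.01, 0.97, 0.39, 1.03, 1.04, 0.99, 0.98, 0.99)

n    <- length(x)
xbar <- mean(x)            # estimate of mu
se   <- sd(x) / sqrt(n)    # standard error s / sqrt(n)

# 98% interval by hand: xbar +/- t_{n-1, 0.99} * se
alpha     <- 0.02
t_crit    <- qt(1 - alpha / 2, df = n - 1)
ci_manual <- c(xbar - t_crit * se, xbar + t_crit * se)

# Same interval (and the test of H0: mu = 1) from t.test
fit <- t.test(x, mu = 1, conf.level = 0.98)

# The hand-built interval should match the one reported by t.test
all.equal(ci_manual, as.numeric(fit$conf.int))

# p-value for (d); compare it with alpha = 0.02
fit$p.value
```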
Question 4
Go to RStudio > File > New File > R Markdown. Click on ‘OK’ to get a pre-filled R Markdown
file. Push the Knit button on top of the console just under the file names and examine the output.
Have a play around and knit to understand how it works. Where did the data cars and pressure
come from?
Question 5
In this question, you will attempt to write your own reproducible report analysing the lengths
of time of passages of play from ten international rugby matches involving the “All Blacks”.
The data set (like all other course data, available from Canvas under Unit Schedule & Materials)
is provided as rugby.txt. This exercise helps you to digest part of Lectures 1-2 and to revisit
assumed knowledge on R and graphical displays. To get started, you may like to modify the R
Markdown file from Q4.
(a) Load the tidyverse R package, which loads a collection of R packages including ggplot2
and dplyr.
library(tidyverse)
(b) Download the data and read them into R, storing them as a data frame rugby. You can use the
command below, but you will need to make sure that the file you have downloaded is in the
right path. Make sure you master reading data into R.
rugby <- read.table("rugby.txt", header = TRUE)
(c) Look at the data frame by simply typing its name, rugby, into an R chunk and compiling the
pdf with Knit. You should see that the data frame has two columns. Scroll up to see that
these columns are headed Game and Time respectively. (These headings were read in from the
text file, rugby.txt; R was alerted to the presence of these headings by the header = TRUE
argument in the read.table command.) The variable Game identifies the match (labelled A, B,
..., K) and the variable Time contains the times of passages of play, in seconds.
(d) In reports it is often preferable to only show the first couple of lines in a data frame. Try the
following:
rugby[1:3, ]
head(rugby)
head(rugby, 2)
(e) Type rugby$Game into an R chunk and press Knit.
(f) The variable Game is categorical (also called a factor). You get the frequencies for each
category with table(rugby$Game). Which game had the most separate passages of play? Which
had the fewest? You can use the help function to learn more: try help(table) or, equivalently,
?table in the R console.
(g) We can display the data using a bar plot. You can produce a bar plot with
barplot(table(rugby$Game))
without any additional R package. Try producing a similar plot using the ggplot2 R package.
(h) The passage of play time is a continuous numerical variable. Try displaying it using a histogram.
Is the distribution of Time normal? If not, have you seen any other data sets with similarly
shaped histograms?
(i) Finally, we can look at the times broken down by individual match. Type
rugby %>%
  filter(Game == "A") %>%
  pull(Time)
That gives you just the passage times for match A. Try producing separate histograms of the
passage times for game A and game H using the ggplot2 R package or otherwise.
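For parts (g) to (i), the general shape of a ggplot2 call can be sketched as follows. The data frame `toy` below is a simulated stand-in (an assumption for illustration only, not the real rugby.txt data); only the column names Game and Time follow the sheet. The bin count is also an arbitrary choice.

```r
library(ggplot2)

# Simulated stand-in for rugby.txt: values are made up, NOT the real data.
# Only the column names Game and Time follow the sheet.
set.seed(1)
toy <- data.frame(
  Game = rep(c("A", "H"), times = c(30, 20)),
  Time = rexp(50, rate = 1 / 20)
)

# Bar plot of the number of passages per game,
# the ggplot2 analogue of barplot(table(toy$Game))
p_bar <- ggplot(toy, aes(x = Game)) +
  geom_bar()

# Histograms of Time, one panel per game
p_hist <- ggplot(toy, aes(x = Time)) +
  geom_histogram(bins = 10) +
  facet_wrap(~ Game)

print(p_bar)
print(p_hist)
```

With the real data, replacing `toy` by `rugby` (filtered to games A and H for the histograms) should give the plots asked for.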
Copyright © The University of Sydney
STAT3022 Applied Linear Models Semester 1
Tutorial & Lab Sheet 2
Tutorial Problems
Question 1
Suppose the linear regression model is given by $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, $i = 1, \ldots, n$, with $n \ge 2$.
Assume that $\varepsilon_i \sim NID(0, 1)$, that is, assumptions (A1)-(A4) hold. For convenience, the scale of
the x values is changed (e.g. from inches to centimeters) and the transformed explanatory values
$z_i = x_i/\tau$ are used instead. Write the new model as
$$Y_i = \gamma_0 + \gamma_1 z_i + \varepsilon_i.$$
(a) Express the estimates of $\gamma_0$ and $\gamma_1$ in terms of $\hat\beta_0$ and $\hat\beta_1$. (Lecture 3)
(b) Show that $r^2$ is invariant under this change of scale. (Lecture 3)
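Before proving (a) and (b), it can help to look at what happens numerically. The sketch below fits both models to simulated data; the sample size, the coefficients used to simulate, and the scale factor `tau` are all arbitrary assumptions chosen for illustration.

```r
# Simulated data: all numeric choices here are arbitrary assumptions
set.seed(42)
n   <- 30
tau <- 2.54
x   <- runif(n, 10, 20)
y   <- 1 + 0.5 * x + rnorm(n)

# Rescaled explanatory variable, as in the question
z <- x / tau

fit_x <- lm(y ~ x)   # original model
fit_z <- lm(y ~ z)   # model on the rescaled variable

# Compare the estimated coefficients of the two fits
coef(fit_x)
coef(fit_z)

# Compare the coefficients of determination
summary(fit_x)$r.squared
summary(fit_z)$r.squared
```

The printed values suggest the pattern to prove; the algebraic argument is still required for the question.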
Question 2
Show that the F-test statistic for testing $H_0: \beta_1 = 0$,
$$F = \frac{\hat\beta_1^2 \, S_{XX}}{\hat\sigma^2},$$
can be written as
$$F = \frac{r^2 (n - 2)}{1 - r^2},$$
where r is the coefficient of correlation between x and Y. (Assumed knowledge)
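The identity can be sanity-checked numerically before proving it. This is a minimal sketch using the built-in `cars` data set (an illustrative choice, not part of the question), comparing the F statistic reported by `summary()` with the expression in terms of r.

```r
# Simple linear regression on the built-in cars data (illustrative choice)
fit <- lm(dist ~ speed, data = cars)
n   <- nrow(cars)

# F statistic as reported by summary.lm
F_lm <- unname(summary(fit)$fstatistic["value"])

# F statistic rebuilt from the sample correlation
r   <- cor(cars$speed, cars$dist)
F_r <- r^2 * (n - 2) / (1 - r^2)

# The two values should agree up to floating-point error
all.equal(F_lm, F_r)
```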
Question 3
A vector of random variables $X = (X_1, X_2, X_3)^\top$ has covariance matrix
$$\Sigma = \begin{pmatrix} 9 & -4 & 1 \\ -4 & 25 & 0 \\ 1 & 0 & 2 \end{pmatrix}.$$
(a) Find the correlation coefficient between $X_1$ and $X_2$. (Assumed knowledge)
(b) Find the variance of $Y = -2X_1 + 3X_2$. Write Y as $a^\top X$ and show that the variance is $a^\top \Sigma a$.
(Assumed knowledge + Lecture 5)
(c) What is the standard deviation of $X_3$? (Assumed knowledge)
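Hand calculations for this question can be checked in R with basic matrix operations. The sketch below enters Σ as given above and compares the quadratic form $a^\top \Sigma a$ with the term-by-term expansion of the variance; the object names are illustrative choices.

```r
# Covariance matrix as given in the question
Sigma <- matrix(c( 9, -4, 1,
                  -4, 25, 0,
                   1,  0, 2), nrow = 3, byrow = TRUE)

# Y = -2 * X1 + 3 * X2 + 0 * X3 written as a'X
a <- c(-2, 3, 0)

# Variance as the quadratic form a' Sigma a
var_quadratic <- drop(t(a) %*% Sigma %*% a)

# Variance expanded term by term:
# a1^2 Var(X1) + a2^2 Var(X2) + 2 a1 a2 Cov(X1, X2)
var_expanded <- a[1]^2 * Sigma[1, 1] + a[2]^2 * Sigma[2, 2] +
  2 * a[1] * a[2] * Sigma[1, 2]

# The two computations should agree
all.equal(var_quadratic, var_expanded)

# Correlation matrix implied by Sigma, useful for checking (a)
cov2cor(Sigma)

# Standard deviations are the square roots of the diagonal, useful for (c)
sqrt(diag(Sigma))
```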
Question 4
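In Lecture 3, you saw that a simple linear regression model
$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,$$
assuming errors $\varepsilon_i \sim NID(0, \sigma^2)$, has the joint density evaluated at (the observed values)
$\mathbf{y}^\top = (y_1, \ldots, y_n)$ given by
$$
\begin{aligned}
f(\mathbf{y}; \beta_0, \beta_1, \sigma)
&= \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(y_1 - \beta_0 - \beta_1 x_1)^2}{2\sigma^2}} \times \cdots \times \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(y_n - \beta_0 - \beta_1 x_n)^2}{2\sigma^2}} \\
&= \left( \frac{1}{\sqrt{2\pi}\,\sigma} \right)^{n} e^{-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2}.
\end{aligned}
\tag{1}
$$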
(a) Write the log-likelihood, $\ell(\beta_0, \beta_1; \mathbf{y}, \sigma) = \log f(\mathbf{y}; \beta_0, \beta_1, \sigma)$.
(b) Find the values of $\beta_0$ and $\beta_1$ that maximise $\ell$, assuming σ is a known fixed value.
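An analytical answer to (b) can be checked against a numerical maximisation of the log-likelihood. The sketch below is one way to do this under stated assumptions: the data are simulated, σ is fixed at an arbitrary value, and the function name `negloglik` and the starting values are illustrative choices.

```r
# Simulated data: all numeric choices here are arbitrary assumptions
set.seed(1)
n     <- 50
sigma <- 2                               # treated as known and fixed
x     <- runif(n, 0, 10)
y     <- 3 + 1.5 * x + rnorm(n, sd = sigma)

# Negative log-likelihood in (beta0, beta1), built from the normal density (1)
negloglik <- function(beta) {
  -sum(dnorm(y, mean = beta[1] + beta[2] * x, sd = sigma, log = TRUE))
}

# Numerical maximisation of the log-likelihood
# (optim minimises, hence the negative sign above)
opt <- optim(c(0, 0), negloglik, method = "BFGS")

# Numerical maximiser of the log-likelihood
opt$par

# Least-squares estimates from lm, for comparison with opt$par
coef(lm(y ~ x))
```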