STATS 762 Learning Objectives for Sets 1–6
This document lists everything a student should be able to achieve following the first half of
the course. Each question in a mid-semester test or exam should directly relate to one or more
of these objectives. These objectives were written by Ben and may not apply when others teach
the course.
Set 1
By the end of this handout, students should be able to
Write down a specification for a generalised linear model using equations, including via
matrix notation.
List the assumptions of the generalised linear model
Describe the following components of a generalised linear model:
– The response variable
– The response distribution
– The link function
– The explanatory variables
Fit generalised linear models in R using lm() and glm().
For a linear model, calculate the following in R by directly coding up the required
equations, and describe what we can infer about the population from each:
– The estimated coefficients, β̂, given a design matrix X and a vector for the response
variable, Y .
– The estimated variance of the errors.
– The variance-covariance matrix for β̂.
– The residuals.
– Confidence intervals for each coefficient in β̂.
– A test statistic and p-value for a hypothesised value of a coefficient in β̂.
– Point predictions for the response variable, given some specified value(s) of the
explanatory variable(s).
– Prediction intervals for a response, given some specified value(s) of the explanatory
variable(s).
– A test statistic and p-value for an added-variable F -test. (For generalised linear
models, the equivalent test is the analysis of deviance.)
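The direct calculations listed above can be sketched as follows. This is a minimal illustration on simulated data, not the course's own code: the variable names, sample size, and true coefficient values are assumptions made for the example.

```r
# Simulated data; names and true coefficients are illustrative assumptions.
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)

X <- cbind(1, x1, x2)                      # design matrix with intercept column
p <- ncol(X)

XtX_inv  <- solve(t(X) %*% X)
beta_hat <- XtX_inv %*% t(X) %*% y         # (X'X)^{-1} X'y
res      <- as.numeric(y - X %*% beta_hat) # residuals
sigma2   <- sum(res^2) / (n - p)           # estimated error variance
V        <- sigma2 * XtX_inv               # variance-covariance matrix of beta_hat
se       <- sqrt(diag(V))

# 95% confidence intervals and t-tests of H0: beta_j = 0
t_crit <- qt(0.975, df = n - p)
ci     <- cbind(lower = beta_hat - t_crit * se, upper = beta_hat + t_crit * se)
t_stat <- beta_hat / se
p_val  <- 2 * pt(-abs(t_stat), df = n - p)

# Point prediction and 95% prediction interval at a new point (x1 = 0.5, x2 = -1)
x0       <- c(1, 0.5, -1)
pred     <- sum(x0 * beta_hat)
se_pred  <- sqrt(sigma2 * (1 + as.numeric(t(x0) %*% XtX_inv %*% x0)))
pred_int <- pred + c(-1, 1) * t_crit * se_pred

# Added-variable F-test for x2: compare the full model with the model without x2
X0     <- X[, 1:2]
rss0   <- sum((y - X0 %*% solve(t(X0) %*% X0) %*% t(X0) %*% y)^2)
rss1   <- sum(res^2)
F_stat <- ((rss0 - rss1) / 1) / (rss1 / (n - p))
F_pval <- pf(F_stat, df1 = 1, df2 = n - p, lower.tail = FALSE)
```

Each quantity can be checked against lm(), summary(), vcov(), predict(), and anova() applied to the same data.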
Use standard R functions (e.g., lm(), summary()) to extract all of the above for a linear
model.
Use standard R functions to do the equivalent steps for a generalised linear model, again
describing what we can infer about the population from each.
Describe the procedure of calculating β̂ for a generalised linear model.
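One standard procedure for calculating β̂ in a generalised linear model is iteratively reweighted least squares (IRLS). The sketch below assumes a Poisson response with a log link and simulated data; the starting values, iteration cap, and convergence tolerance are illustrative choices, not values taken from the course.

```r
# Simulated Poisson data; true coefficients are illustrative assumptions.
set.seed(2)
n <- 200
x <- runif(n)
y <- rpois(n, lambda = exp(0.5 + 1.2 * x))
X <- cbind(1, x)

beta <- c(log(mean(y)), 0)               # starting values
for (iter in 1:25) {
  eta <- as.numeric(X %*% beta)          # linear predictor
  mu  <- exp(eta)                        # inverse of the log link
  W   <- mu                              # working weights for Poisson with log link
  z   <- eta + (y - mu) / mu             # working response
  # Weighted least squares step: solve (X'WX) beta = X'Wz
  beta_new  <- as.numeric(solve(t(X) %*% (W * X), t(X) %*% (W * z)))
  converged <- max(abs(beta_new - beta)) < 1e-10
  beta      <- beta_new
  if (converged) break
}
beta
```

The result should agree with coef(glm(y ~ x, family = poisson)) up to numerical tolerance.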
Use the anova() function to carry out a series of hypothesis tests for a fitted model, and
interpret the output.
Define under- and overdispersion, and identify when we need to make corresponding
adjustments to our models.
Describe the similarities and differences between standard generalised linear models and
their quasilikelihood counterparts. Describe what changes when we switch from a standard
model to a quasilikelihood model, and what stays the same.
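As a small illustration of what changes and what stays the same, the sketch below fits Poisson and quasi-Poisson models to the same simulated overdispersed counts. The data-generating choices (negative binomial counts with size = 2) are assumptions made for the example.

```r
# Simulated overdispersed counts; all settings are illustrative assumptions.
set.seed(3)
n  <- 300
x  <- runif(n)
mu <- exp(1 + x)
y  <- rnbinom(n, size = 2, mu = mu)

fit_pois  <- glm(y ~ x, family = poisson)
fit_quasi <- glm(y ~ x, family = quasipoisson)

# Pearson-based estimate of the dispersion parameter
phi_hat <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)

# Point estimates are the same under both models
coef(fit_pois)
coef(fit_quasi)

# Standard errors are inflated by sqrt(phi_hat), up to numerical tolerance
se_pois  <- sqrt(diag(vcov(fit_pois)))
se_quasi <- sqrt(diag(vcov(fit_quasi)))
se_quasi / se_pois
sqrt(phi_hat)
```

An estimated dispersion well above 1 suggests overdispersion relative to the Poisson model.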
Fit models with negative binomial responses using glm.nb() and interpret the output.
Define offsets and identify when they are required in a generalised linear model.
Fit a model with an offset, and interpret the output.
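A minimal sketch of a model with an offset, assuming simulated count data where each observation has a known exposure. The variable name exposure and all numerical settings are hypothetical.

```r
# Simulated counts with varying exposure; settings are illustrative assumptions.
set.seed(4)
n        <- 200
exposure <- runif(n, 10, 100)            # hypothetical exposure, e.g. person-years
x        <- rnorm(n)
y        <- rpois(n, lambda = exposure * exp(-2 + 0.3 * x))

# log(exposure) enters the linear predictor with its coefficient fixed at 1
fit <- glm(y ~ x + offset(log(exposure)), family = poisson)

coef(fit)               # coefficients on the log-rate scale
exp(coef(fit)["x"])     # multiplicative change in the rate per unit increase in x
```

The same model can be specified with the offset argument of glm() instead of the offset() term in the formula.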
Summarise the inference obtained from a fitted model that can be used to answer questions
of interest.
Set 2
By the end of this handout, students should be able to
Identify when conducting a bootstrap is helpful, and explain why.
Describe how parametric and nonparametric bootstrapping works, and highlight the key
differences between the two, including their relative strengths and weaknesses.
Write R code to conduct bootstrapping for a generalised linear model.
Use a bootstrap procedure to calculate standard errors and confidence intervals, and carry
out hypothesis tests, for parameters (or functions of parameters) that are of interest.
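The two bootstrap procedures can be sketched for a Poisson GLM as follows. This is an illustration under assumed settings (simulated data, B = 500 resamples, percentile intervals), not necessarily the procedure used in the lectures.

```r
# Simulated data; all settings are illustrative assumptions.
set.seed(5)
n     <- 100
dat   <- data.frame(x = runif(n))
dat$y <- rpois(n, exp(0.5 + dat$x))
fit   <- glm(y ~ x, family = poisson, data = dat)

B      <- 500
np_est <- numeric(B)   # nonparametric: resample rows of the data
p_est  <- numeric(B)   # parametric: simulate responses from the fitted model
for (b in 1:B) {
  # Nonparametric bootstrap: resample observations with replacement
  idx       <- sample(n, replace = TRUE)
  np_est[b] <- coef(glm(y ~ x, family = poisson, data = dat[idx, ]))["x"]

  # Parametric bootstrap: keep x fixed, draw new y from the fitted Poisson means
  dat_sim   <- dat
  dat_sim$y <- rpois(n, fitted(fit))
  p_est[b]  <- coef(glm(y ~ x, family = poisson, data = dat_sim))["x"]
}

# Bootstrap standard errors and 95% percentile intervals for the slope
c(se_nonparametric = sd(np_est), se_parametric = sd(p_est))
quantile(np_est, c(0.025, 0.975))
quantile(p_est,  c(0.025, 0.975))
```

The parametric version relies on the assumed response distribution being correct, whereas the nonparametric version does not.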
Set 3
By the end of this handout, students should be able to
Define and explain the following terms: outliers, high leverage points, influential points,
multicollinearity.
Identify when outliers, high leverage points, influential points, and multicollinearity exist
in a data set using the diagnostic tools discussed in the lectures.
Directly calculate leverage, different types of residuals, and Cook’s distance for a linear
model in R.
Describe and discuss properties of the different types of residuals, and calculate residuals
directly in R for a linear model.
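For a linear model, these diagnostics can be computed directly from the hat matrix. The sketch below uses simulated data with illustrative settings; the names r_std and r_stud are labels chosen here for the standardised and studentised residuals.

```r
# Simulated data; settings are illustrative assumptions.
set.seed(6)
n <- 40
x <- rnorm(n)
y <- 2 + x + rnorm(n)
X <- cbind(1, x)
p <- ncol(X)

H  <- X %*% solve(t(X) %*% X) %*% t(X)   # hat matrix
h  <- diag(H)                            # leverages
e  <- as.numeric(y - H %*% y)            # raw residuals
s2 <- sum(e^2) / (n - p)                 # estimated error variance

r_std  <- e / sqrt(s2 * (1 - h))                     # standardised residuals
s2_del <- (sum(e^2) - e^2 / (1 - h)) / (n - p - 1)   # leave-one-out variance estimates
r_stud <- e / sqrt(s2_del * (1 - h))                 # studentised residuals
cook   <- r_std^2 * h / (p * (1 - h))                # Cook's distance
```

These match hatvalues(), rstandard(), rstudent(), and cooks.distance() applied to lm(y ~ x).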
For a generalised linear model, directly calculate
– Pearson residuals, given an observed response and a fitted value.
– Cook’s distance, given Pearson residuals and the hat matrix.
– Deviance change, given deviance residuals, Pearson residuals, and the hat matrix.
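The GLM calculations above can be sketched for a Poisson model with a log link. The data are simulated and all settings are assumptions. The deviance-change line uses one common one-step approximation; the formula taught in the course may differ.

```r
# Simulated Poisson data; settings are illustrative assumptions.
set.seed(7)
n   <- 80
x   <- runif(n)
y   <- rpois(n, exp(1 + x))
fit <- glm(y ~ x, family = poisson)

mu <- as.numeric(fitted(fit))
X  <- model.matrix(fit)
p  <- ncol(X)

# Pearson residuals: (y - mu) / sqrt(V(mu)), with V(mu) = mu for the Poisson
r_p <- (y - mu) / sqrt(mu)

# Hat matrix diagonal: H = W^{1/2} X (X'WX)^{-1} X' W^{1/2}, with W = mu here
WX <- sqrt(mu) * X
h  <- diag(WX %*% solve(t(WX) %*% WX) %*% t(WX))

# Cook's distance, with the dispersion fixed at 1 for the Poisson
cook <- r_p^2 * h / (p * (1 - h)^2)

# Approximate change in deviance from deleting each observation
# (assumed one-step approximation, not necessarily the course's formula)
r_d        <- as.numeric(residuals(fit, type = "deviance"))
dev_change <- r_d^2 + h * r_p^2 / (1 - h)
```

The Pearson residuals, leverages, and Cook's distances can be compared with residuals(fit, type = "pearson"), hatvalues(fit), and cooks.distance(fit).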
Create and interpret GAM plots to test for curvature in a regression surface.
Use the deviance to test for goodness-of-fit for a GLM either using a chi-squared sampling
distribution or a parametric bootstrap, where appropriate.
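Both versions of the deviance goodness-of-fit test can be sketched as below, assuming a Poisson model fitted to simulated data. The number of bootstrap replicates (B = 500) and all other settings are illustrative.

```r
# Simulated Poisson data; settings are illustrative assumptions.
set.seed(8)
n   <- 60
x   <- runif(n)
y   <- rpois(n, exp(2 + x))
fit <- glm(y ~ x, family = poisson)

# Chi-squared approximation to the sampling distribution of the deviance
D       <- deviance(fit)
p_chisq <- pchisq(D, df = df.residual(fit), lower.tail = FALSE)

# Parametric bootstrap: simulate from the fitted model, refit, record the deviance
B     <- 500
D_sim <- numeric(B)
for (b in 1:B) {
  y_sim    <- rpois(n, fitted(fit))
  D_sim[b] <- deviance(glm(y_sim ~ x, family = poisson))
}
p_boot <- mean(D_sim >= D)

c(chisq = p_chisq, bootstrap = p_boot)
```

The bootstrap version is useful when the chi-squared approximation is doubtful, for example when fitted counts are small.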
Decide whether or not a fitted model is appropriate using the diagnostic techniques
described in this lecture set, and explain why or why not.
Describe what effect a violated assumption may have on any inference obtained from a
model.
Propose modifications to a model in order to fix any problems identified by a diagnostic
technique.
Set 4
By the end of this handout, students should be able to
Directly calculate AIC and BIC, given a model’s maximised log-likelihood, number of
parameters, and sample size.
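With maximised log-likelihood ℓ, number of parameters k, and sample size n, the usual definitions are AIC = −2ℓ + 2k and BIC = −2ℓ + k log(n). A minimal sketch on simulated data (all settings illustrative):

```r
# Simulated Poisson data; settings are illustrative assumptions.
set.seed(9)
n   <- 100
x   <- rnorm(n)
y   <- rpois(n, exp(0.3 + 0.5 * x))
fit <- glm(y ~ x, family = poisson)

ll <- as.numeric(logLik(fit))     # maximised log-likelihood
k  <- attr(logLik(fit), "df")     # number of estimated parameters

aic <- -2 * ll + 2 * k
bic <- -2 * ll + log(n) * k
c(AIC = aic, BIC = bic)
```

These values agree with AIC(fit) and BIC(fit).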
Use AIC and BIC to assess the relative support of different models.
Discuss similarities and differences between information criteria such as AIC and BIC,
along with their strengths and weaknesses.
Carry out and interpret the output from model search strategies.
Discuss strengths and weaknesses of different search strategies.
Set 5
By the end of this handout, students should be able to
Describe descriptive models and causal models, and discuss their differences.
Create a causal diagram from a description of the direct effects that are believed to exist
amongst a set of variables.
Using a causal diagram, identify
– which variables have direct effects on others,
– which variables have indirect effects on others,
– which variables are confounders when considering the effect of one variable on another,
– which variables are colliders when considering the effect of one variable on another,
– which pathways are confounding pathways when considering the effect of one variable
on another, and
– which pathways are colliding pathways when considering the effect of one variable on
another.
Using a causal diagram, propose models that can estimate
– direct effects of explanatory variables on a response variable, and
– the total effect of a particular variable on a response variable.
Define what an effect modifier is, and propose models that are appropriate when an effect
of interest is affected by a modifier.
Fit models to estimate direct effects and total effects, and interpret these estimated effects.
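As an illustration of the difference between direct and total effects, the sketch below simulates a hypothetical causal structure in which x affects y both directly and through a mediator m. The structure and effect sizes are assumptions made for the example.

```r
# Hypothetical causal structure: x -> m -> y and x -> y. Effect sizes are assumed.
set.seed(10)
n <- 5000
x <- rnorm(n)
m <- 0.8 * x + rnorm(n)              # effect of x on the mediator m
y <- 0.5 * x + 1.0 * m + rnorm(n)    # direct effects of x and m on y

# Total effect of x on y: leave the mediator out of the model
total <- coef(lm(y ~ x))["x"]        # true value: 0.5 + 0.8 * 1.0 = 1.3

# Direct effect of x on y: include the mediator in the model
direct <- coef(lm(y ~ x + m))["x"]   # true value: 0.5

c(total = unname(total), direct = unname(direct))
```

Which model is appropriate depends on whether the question of interest concerns the total effect or the direct effect.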
Describe the impact of a missing variable on the inference obtained from a causal model.
Lab 1
Following this lab, students should be able to
Critique a model based on its description, identify when the fitted response distribution
is potentially inappropriate, and propose a better alternative.
Lab 2
Following this lab, students should be able to
Identify when a model might be inappropriate by visually inspecting the data and
comparing it with simulated data from a fitted model.
Propose ways to improve a model when it is found to be inappropriate using this approach.
Lab 3
Following this lab, and using their understanding from both Labs 2 and 3, students should be
able to
Discuss strengths and weaknesses of parametric and nonparametric bootstrapping.
Lab 4
Following this lab, students should be able to
Discuss scenarios for which ChatGPT is a useful tool in terms of using methods and
techniques covered in this course, and for which ChatGPT is unhelpful or misleading.
Provide examples of queries that fall into the helpful vs unhelpful/misleading categories.
Lab 5
Following this lab, students should be able to
Discuss the performance of information-theoretic criteria when presented with a
model-selection problem.