This document lists everything a student should be able to achieve following the first half of the course. Each question in a mid-semester test or exam should directly relate to one or more of these objectives. These objectives were written by Ben and may not apply when others teach the course.

Set 1

By the end of this handout, students should be able to

• Write down a specification for a generalised linear model using equations, including via matrix notation.
• List the assumptions of the generalised linear model.
• Describe the following components of a generalised linear model:
  – The response variable
  – The response distribution
  – The link function
  – The explanatory variables
• Fit generalised linear models in R using lm() and glm().
• For a linear model, calculate the following in R by directly coding up the required equations, and describe what we can infer about the population from each:
  – The estimated coefficients, β̂, given a design matrix X and a vector for the response variable, Y.
  – The estimated variance of the errors.
  – The variance-covariance matrix for β̂.
  – The residuals.
  – Confidence intervals for each coefficient in β̂.
  – A test statistic and p-value for a hypothesised value of a coefficient in β̂.
  – Point predictions for the response variable, given some specified value(s) of the explanatory variable(s).
  – Prediction intervals for a response, given some specified value(s) of the explanatory variable(s).
  – A test statistic and p-value for an added-variable F-test. (For generalised linear models, the equivalent test is the analysis of deviance.)
• Use standard R functions (e.g., lm(), summary()) to extract all of the above for a linear model.
• Use standard R functions to do the equivalent steps for a generalised linear model, again describing what we can infer about the population from each.
• Describe the procedure for calculating β̂ for a generalised linear model.
• Use the anova() function to carry out a series of hypothesis tests for a fitted model, and interpret the output.
• Define under- and overdispersion, and identify when we need to make corresponding adjustments to our models.
• Describe the similarities and differences between standard generalised linear models and their quasilikelihood counterparts.
• Describe what changes when we switch from a standard model to a quasilikelihood model, and what stays the same.
• Fit models with negative binomial responses using glm.nb() and interpret the output.
• Define offsets and identify when they are required in a generalised linear model.
• Fit a model with an offset, and interpret the output.
• Summarise the inference obtained from a fitted model that can be used to answer questions of interest.

Set 2

By the end of this handout, students should be able to

• Identify when conducting a bootstrap is helpful, and explain why.
• Describe how parametric and nonparametric bootstrapping work, and highlight the key differences between the two, including their relative strengths and weaknesses.
• Write R code to conduct bootstrapping for a generalised linear model.
• Use a bootstrap procedure to calculate standard errors and confidence intervals, and carry out hypothesis tests, for parameters (or functions of parameters) that are of interest.

Set 3

By the end of this handout, students should be able to

• Define and explain the following terms: outliers, high leverage points, influential points, multicollinearity.
• Identify when outliers, high leverage points, influential points, and multicollinearity exist in a data set using the diagnostic tools discussed in the lectures.
• Directly calculate leverage, different types of residuals, and Cook’s distance for a linear model in R.
• Describe and discuss properties of the different types of residuals, and calculate residuals directly in R for a linear model.
• For a generalised linear model, directly calculate
  – Pearson residuals, given an observed response and a fitted value.
  – Cook’s distance, given Pearson residuals and the hat matrix.
  – Deviance change, given the deviance residuals, the Pearson residuals, and the hat matrix.
• Create and interpret GAM plots to test for curvature in a regression surface.
• Use the deviance to test for goodness-of-fit for a GLM using either a chi-squared sampling distribution or a parametric bootstrap, where appropriate.
• Decide whether or not a fitted model is appropriate using the diagnostic techniques described in this lecture set, and explain why or why not.
• Describe what effect a violated assumption may have on any inference obtained from a model.
• Propose modifications to a model in order to fix any problems identified by a diagnostic technique.

Set 4

By the end of this handout, students should be able to

• Directly calculate AIC and BIC, given a model’s maximised log-likelihood, number of parameters, and sample size.
• Use AIC and BIC to assess the relative support of different models.
• Discuss similarities and differences between information criteria such as AIC and BIC, along with their strengths and weaknesses.
• Carry out and interpret the output from model search strategies.
• Discuss strengths and weaknesses of different search strategies.

Set 5

By the end of this handout, students should be able to

• Describe descriptive models and causal models, and discuss their differences.
• Create a causal diagram from a description of the direct effects that are believed to exist amongst a set of variables.
• Using a causal diagram, identify
  – which variables have direct effects on others,
  – which variables have indirect effects on others,
  – which variables are confounders when considering the effect of one variable on another,
  – which variables are colliders when considering the effect of one variable on another,
  – which pathways are confounding pathways when considering the effect of one variable on another, and
  – which pathways are colliding pathways when considering the effect of one variable on another.
• Using a causal diagram, propose models that can estimate
  – direct effects of explanatory variables on a response variable, and
  – the total effect of a particular variable on a response variable.
• Define what an effect modifier is, and propose models that are appropriate when an effect of interest is affected by a modifier.
• Fit models to estimate direct effects and total effects, and interpret these estimated effects.
• Describe the impact of a missing variable on the inference obtained from a causal model.

Lab 1

Following this lab, students should be able to

• Critique a model based on its description, identify when the fitted response distribution is potentially inappropriate, and propose a better alternative.

Lab 2

Following this lab, students should be able to

• Identify when a model might be inappropriate by visually inspecting the data and comparing it with simulated data from a fitted model.
• Propose ways to improve a model when it is found to be inappropriate using this approach.

Lab 3

Following this lab, and using their understanding from both Labs 2 and 3, students should be able to

• Discuss strengths and weaknesses of parametric and nonparametric bootstrapping.

Lab 4

Following this lab, students should be able to

• Discuss scenarios in which ChatGPT is a useful tool for applying the methods and techniques covered in this course, and scenarios in which it is unhelpful or misleading.
• Provide examples of queries that fall into the helpful vs unhelpful/misleading categories.

Lab 5

Following this lab, students should be able to

• Discuss the performance of information-theoretic criteria when presented with a model-selection problem.
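
As a rough illustration of the information criteria referred to in Set 4 and Lab 5, the sketch below computes AIC and BIC directly from a model's maximised log-likelihood, number of parameters, and sample size, and compares the results with R's built-in AIC() and BIC(). The simulated data, the Poisson model, and the object names (dat, fit, aic_direct, bic_direct) are assumptions made for this example only and are not taken from the course material.

```r
# Illustrative data only: simulated Poisson counts (an assumption for this sketch).
set.seed(1)
dat <- data.frame(x = runif(50))
dat$y <- rpois(50, lambda = exp(0.5 + dat$x))

# Fit a Poisson GLM with a log link (the default for this family).
fit <- glm(y ~ x, family = poisson, data = dat)

ll <- logLik(fit)      # maximised log-likelihood
k  <- attr(ll, "df")   # number of estimated parameters
n  <- nobs(fit)        # sample size

# AIC = -2 * log-likelihood + 2 * k
aic_direct <- -2 * as.numeric(ll) + 2 * k
# BIC = -2 * log-likelihood + log(n) * k
bic_direct <- -2 * as.numeric(ll) + log(n) * k

# Compare with the built-in functions; both comparisons should return TRUE.
all.equal(aic_direct, AIC(fit))
all.equal(bic_direct, BIC(fit))
```

Both criteria are only meaningful relative to other candidate models fitted to the same data, with lower values indicating more support.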