STATS 763 - Midterm test

Question 1 [21 marks total]

Wilms’ tumour is a rare childhood cancer of the kidney. Treatment is successful for the majority of patients, but a minority do relapse. An important risk factor for relapse is disease stage (how far it has spread).

The following data on all U.S. paediatric Wilms’ tumour patients between 1980 and 1994, inclusively, were collected:

•  Year: Year of diagnosis, from 1980 to 1994

•  Stage: Disease stage (I [least advanced], II, III, IV [most advanced])

•  rel5: Relapse within 5 years (0 [No], 1 [Yes]), hereafter called "relapse".

We fit a relative risk model of rel5 on Year*Stage, to capture any secular trend in relapses by disease stage, and obtain the following results:

Call:
glm(formula = rel5 ~ Year * Stage, family = binomial(link = "log"), data = wilms)

Coefficients:
                 Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)      52.59226    38.93026    1.351    0.1767
Year             -0.02767     0.01960   -1.412    0.1581
StageII        -107.15715    52.12848   -2.056    0.0398 *
StageIII         40.62429    50.38844    0.806    0.4201
StageIV         -29.17571    53.37873   -0.547    0.5847
Year:StageII      0.05422     0.02623    2.067    0.0387 *
Year:StageIII    -0.02008     0.02537   -0.792    0.4286
Year:StageIV      0.01525     0.02687    0.568    0.5703

Selected rows and columns from the estimated variance matrix of the coefficient estimates are given below:

                (Intercept)         Year      StageIV Year:StageIV
(Intercept)          1515.6       -0.763      -1515.6        0.763
Year                 -0.763     0.000384        0.763    -0.000384
StageIV             -1515.6        0.763       2849.3       -1.434
Year:StageIV          0.763    -0.000384       -1.434     0.000722

(a) [12 marks total]

We replace Year in the model by Year1980 <- Year - 1980.

i. [6 marks] What are the values of the new estimates for (Intercept), Year1980, StageIV and Year1980:StageIV?

ii. [6 marks] What are the standard errors of the new estimates for Year1980, StageIV and Year1980:StageIV?
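
One way to reason about part (a): shifting the origin of Year is a linear reparameterisation, so slope-type coefficients are unchanged and each level-type coefficient absorbs 1980 times its matching slope. The sketch below applies that idea to the rounded values printed above. The helper names `shift_estimate` and `shift_se` are illustrative only, and the results inherit the rounding of the printed tables, which matters here because large terms nearly cancel.

```python
import math

SHIFT = 1980  # Year1980 = Year - 1980

# Rounded estimates copied from the printed glm output
est = {"(Intercept)": 52.59226, "Year": -0.02767,
       "StageIV": -29.17571, "Year:StageIV": 0.01525}

# Rounded entries of the printed variance matrix
var = {"Year": 0.000384, "StageIV": 2849.3, "Year:StageIV": 0.000722}
cov_stage4_inter = -1.434  # Cov(StageIV, Year:StageIV)

def shift_estimate(level_term, slope_term):
    """New level coefficient = old level + SHIFT * matching slope."""
    return est[level_term] + SHIFT * est[slope_term]

def shift_se(var_level, var_slope, cov):
    """SE of (level + SHIFT * slope), using Var(A + cB)."""
    return math.sqrt(var_level + SHIFT**2 * var_slope + 2 * SHIFT * cov)

new_intercept = shift_estimate("(Intercept)", "Year")
new_stage4 = shift_estimate("StageIV", "Year:StageIV")
# Slopes are unaffected by shifting the origin of Year
new_year, new_inter = est["Year"], est["Year:StageIV"]

se_year = math.sqrt(var["Year"])
se_inter = math.sqrt(var["Year:StageIV"])
se_stage4 = shift_se(var["StageIV"], var["Year:StageIV"], cov_stage4_inter)

print(new_intercept, new_year, new_stage4, new_inter)
print(se_year, se_stage4, se_inter)
```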

(b) [5 marks]

According to the model, what is the estimated relative risk of relapse between a patient at Stage IV in 1990 and a patient at Stage III in 1980?
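
With a log link the linear predictor is the log risk, so a relative risk between two covariate patterns is the exponential of the difference in their linear predictors. The following is a minimal sketch using the rounded coefficients printed above; `log_risk` is a hypothetical helper, Stage I is assumed to be the reference level (as the output suggests), and the answer is sensitive to the rounding of the printed estimates.

```python
import math

# Rounded coefficients copied from the printed glm output
coef = {
    "(Intercept)": 52.59226, "Year": -0.02767,
    "StageII": -107.15715, "StageIII": 40.62429, "StageIV": -29.17571,
    "Year:StageII": 0.05422, "Year:StageIII": -0.02008,
    "Year:StageIV": 0.01525,
}

def log_risk(year, stage):
    """Linear predictor (log risk) for a given year and stage ('I'..'IV')."""
    eta = coef["(Intercept)"] + coef["Year"] * year
    if stage != "I":  # Stage I is assumed to be the reference level
        eta += coef["Stage" + stage] + coef["Year:Stage" + stage] * year
    return eta

log_rr = log_risk(1990, "IV") - log_risk(1980, "III")
relative_risk = math.exp(log_rr)
print(log_rr, relative_risk)
```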

(c) [4 marks]

According to the model, what is the estimated difference in log-risk of relapse corresponding to an increase in the year of diagnosis of 5 years for a patient in Stage I, and what is the standard error of this estimate?
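
A quantity like this is a linear combination a'β of the coefficients, with variance a'Va. The sketch below evaluates such a contrast from the rounded values printed above, restricted to the four coefficients whose variances are shown; `contrast` is a hypothetical helper, and Stage I is assumed to be the reference level so that only the Year coefficient is involved.

```python
import math

# Order: (Intercept), Year, StageIV, Year:StageIV (rounded printed values)
estimates = [52.59226, -0.02767, -29.17571, 0.01525]
vcov = [
    [1515.6,   -0.763,    -1515.6,   0.763],
    [-0.763,    0.000384,  0.763,   -0.000384],
    [-1515.6,   0.763,     2849.3,  -1.434],
    [0.763,    -0.000384, -1.434,    0.000722],
]

def contrast(a):
    """Return (estimate, standard error) of the linear combination a'beta."""
    est = sum(ai * bi for ai, bi in zip(a, estimates))
    variance = sum(a[i] * vcov[i][j] * a[j]
                   for i in range(len(a)) for j in range(len(a)))
    return est, math.sqrt(variance)

# Five extra years at Stage I (reference level): 5 * Year coefficient
diff, se = contrast([0, 5, 0, 0])
print(diff, se)
```

The same helper handles contrasts that mix several coefficients, where the covariance terms no longer vanish.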

Question 2 [16 marks total]

We fit a GLM by solving the score equation $\sum_{i=1}^{n} x_i^T w_i (Y_i - \mu_i) = 0$, where $x_i$ is the $i$th $1 \times p$ covariate vector, $\mu_i = g^{-1}(x_i \beta)$ and $0$ is the $1 \times p$ vector of all zeros. The GLM involves a dispersion parameter $\phi > 0$.

(a) [4 marks]

Explain why $w_i = 1$ when the canonical link is used.

(b) [4 marks total]

What is $w_i$ in the following settings?

i. [2 marks] Variance function $V(\mu) = \mu^2$, $\mu \in \mathbb{R}^+$ and link function $g(\mu) = \log(\mu)$.

ii. [2 marks] Poisson($\mu$) family and identity link.
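
As a hedged illustration for parts (a) and (b): a common textbook form of the score-equation weight, assumed here and not stated in the test, is w = 1 / (V(μ) g'(μ)), with any constant factor involving the dispersion dropped because it does not change the solution. Under a canonical link g'(μ) = 1/V(μ), so the weight collapses to 1. The function name `score_weight` and the value of `mu` are illustrative only.

```python
def score_weight(mu, variance, link_derivative):
    """Weight 1 / (V(mu) * g'(mu)) under the assumed textbook form."""
    return 1.0 / (variance(mu) * link_derivative(mu))

mu = 2.5  # arbitrary illustrative mean

# Poisson with its canonical log link: V(mu) = mu, g'(mu) = 1/mu
w_canonical = score_weight(mu, lambda m: m, lambda m: 1.0 / m)

# V(mu) = mu^2 with log link: g'(mu) = 1/mu
w_quadratic_log = score_weight(mu, lambda m: m**2, lambda m: 1.0 / m)

# Poisson with identity link: V(mu) = mu, g'(mu) = 1
w_poisson_identity = score_weight(mu, lambda m: m, lambda m: 1.0)

print(w_canonical, w_quadratic_log, w_poisson_identity)
```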

(c) [4 marks]

How do we usually estimate $\phi$? Write an expression for the estimator.
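
A widely used choice is the Pearson estimator: the Pearson chi-square statistic, the sum of (y_i - μ̂_i)² / V(μ̂_i), divided by the residual degrees of freedom n - p. The sketch below computes it; `pearson_dispersion` is a hypothetical helper and the toy values are made up for illustration, not taken from the test.

```python
def pearson_dispersion(y, mu_hat, variance, n_params):
    """Pearson chi-square statistic divided by residual degrees of freedom."""
    n = len(y)
    pearson_chi2 = sum((yi - mi) ** 2 / variance(mi)
                       for yi, mi in zip(y, mu_hat))
    return pearson_chi2 / (n - n_params)

# Made-up toy values purely for illustration (not from the test)
y = [2, 0, 5, 3, 7, 1]
mu_hat = [1.5, 1.0, 4.0, 3.5, 5.0, 2.0]

# Poisson-type variance function V(mu) = mu, two fitted parameters
phi_hat = pearson_dispersion(y, mu_hat, variance=lambda m: m, n_params=2)
print(phi_hat)
```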

(d) [4 marks]

For a certain value of $\beta_0$, you are given the observed values

(this last subscript means "evaluated at $\beta = \beta_0$").

Write down a test statistic that you can approximately compare to a $\chi^2_p$ quantile to test $H_0: \beta = \beta_0$ vs $H_1: \beta \neq \beta_0$.

Question 3 [8 marks total + 4 bonus]

Answer the following questions:

(a) [4 marks] You have written a scientific paper containing results from a linear regression model $E[Y \mid X = x] = x\beta$ fitted to independent count data. The data set was large and you estimated the variance of $\hat\beta$ using a sandwich estimator.

A reviewer writes that count data are not normally distributed, and therefore your Wald confidence intervals are incorrect because (1) $\hat\beta$ is therefore not normally distributed either and (2) the variances are wrong because they are estimated under the wrong model. How do you respond?
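
For readers unfamiliar with the sandwich estimator mentioned in (a), here is a minimal sketch for the simplest case of one covariate and no intercept, using the basic (HC0) form: the variance estimate is the sum of x_i² e_i² divided by the squared sum of x_i². The function name and the toy counts are assumptions for illustration; a real analysis would use a much larger data set and a full design matrix.

```python
import math

def sandwich_se_through_origin(x, y):
    """Least-squares slope and its HC0 sandwich SE for E[Y|x] = x * beta."""
    sxx = sum(xi * xi for xi in x)                          # "bread": X'X
    beta_hat = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    resid = [yi - xi * beta_hat for xi, yi in zip(x, y)]
    meat = sum((xi * ei) ** 2 for xi, ei in zip(x, resid))  # sum x^2 e^2
    return beta_hat, math.sqrt(meat / sxx ** 2)

# Made-up counts purely for illustration (not from the test)
x = [1, 2, 3, 4, 5]
y = [2, 3, 7, 8, 11]

beta_hat, se = sandwich_se_through_origin(x, y)
# Wald interval; its large-sample justification is the central limit
# theorem for beta_hat, not normality of Y (n = 5 here is only a toy)
ci = (beta_hat - 1.96 * se, beta_hat + 1.96 * se)
print(beta_hat, se, ci)
```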

(b) [2 marks]  Describe one situation in which a quasi-likelihood model may fail to produce reliable standard errors.

(c) [2 marks]  True or False: Data sampled according to the outcome will yield biased regression estimates unless it is appropriately weighted.

(d) [4 marks] (bonus) A regression coefficient estimated from a parametric generalised linear model has covariance matrix $\mathrm{Cov}(\hat\beta) = \phi (X^T X)^{-1}$. Find two combinations of family and link that will yield this covariance.
