STAT 211: Statistical Inference 1 (Fall 2024)
Instructor: Lucas Janson
Office hours: Th 3–5pm, SC 710

Homework 4
Due date: November 18, 2024 at 9:00am

Collaboration Policy: Collaboration is allowed, but students must write up and fully understand their own solutions, report their collaborators, and cite any online resources used when they turn in the assignment. Use of AI is allowed, but should be treated as you would a collaborator: you can use it for assistance (and if you do, you should indicate how in your submission), but you may not directly ask it for the answer.

Problem 1 (Periodicity testing). Consider i.i.d. data Y_1, ..., Y_n supported on [−1, 1), i.e., P(Y_i < −1) = P(Y_i ≥ 1) = 0.

(a) Assuming the Y_i have a continuous distribution, and therefore a probability density function f, construct an exact size-α test for the (nonparametric) null hypothesis

    H_0 : f(y) = f(y + 1) for all y ∈ [−1, 0),

i.e., the null hypothesis is that the probability density function f is periodic with period 1 over the range [−1, 1), or that f([−1, 0)) = f([0, 1)). By "construct" a test, I mean write down the test statistic and characterize its distribution under H_0 in terms of distributions that are known and/or were discussed in class, so that it is clear how to find the cutoff for any given α. Of course there is more than one exact size-α test, including the trivial one that simply rejects with probability α while ignoring the data; full credit will only be given to tests that take advantage of all the symmetries of the null hypothesis (and no credit will be given for the trivial test).

(b) Extend your test from part (a) to also work for the generalization of H_0 to non-continuous distributions (still supported on [−1, 1)):

    H_0^* : P(Y_i ≤ y) + 1/2 = P(Y_i ≤ y + 1) for all y ∈ [−1, 0).

(You can check, although you do not need to for this problem, that H_0 ⇒ H_0^*.)

Solution.
(a)
(b)

Problem 2 (Modified BHq). In this problem we will generalize the Benjamini–Hochberg procedure (BHq).
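For reference before generalizing it, here is a minimal sketch of the standard step-up BHq in Python; the p-values in the example call are made up purely for illustration and are not part of the problem.

```python
def bhq(pvals, q):
    """Standard Benjamini-Hochberg step-up procedure.

    Finds the largest k with p_(k) <= q*k/m (scanning from the largest
    p-value downward) and rejects the hypotheses with the k smallest
    p-values. Returns the set of rejected indices.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda j: pvals[j])  # indices by p-value
    k_hat = 0
    for k in range(m, 0, -1):          # step up: start at the largest p-value
        if pvals[order[k - 1]] <= q * k / m:
            k_hat = k
            break
    return set(order[:k_hat])

# Toy example with made-up p-values:
print(sorted(bhq([0.001, 0.008, 0.039, 0.041, 0.30, 0.74], q=0.1)))  # [0, 1, 2, 3]
```

Note how the loop accepts every hypothesis it scans past until the first k whose ordered p-value falls at or below its threshold q·k/m; the modified procedure below changes what happens at each of those acceptance steps.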
For some motivation, consider a regression problem where you have n i.i.d. samples of the (m+1)-dimensional random vector (Y, X_1, ..., X_m), and want to learn which elements X_j the response Y depends on. As a start, you might assume a linear model:

    Y | (X_1, ..., X_m) ~ N(Σ_{j=1}^m X_j β_j, σ²),

compute a p-value for each of the m hypothesis tests H_j : β_j = 0, and then apply BHq to those p-values to select a subset of those hypotheses to reject such that the false discovery rate (FDR) is controlled at some nominal level q (setting aside, for the moment, the assumptions on the p-values needed to ensure FDR control). But BHq is a step-up procedure, meaning that, for instance, it first considers the largest p-value (call it p_{j_max}) and, if p_{j_max} > q, it accepts H_{j_max} and then proceeds. Suppose that, if BHq accepts H_{j_max}, a new model is posed that excludes X_{j_max}, i.e.,

    Y | (X_j : j ≠ j_max) ~ N(Σ_{j ∈ {1,...,m}\{j_max}} X_j β_j, σ²),

and a corresponding new set of m − 1 p-values are computed before proceeding to the next step of BHq. This could then be repeated at the next step if another hypothesis is accepted, and so on, with the hope that by reducing the dimensionality every time a hypothesis is accepted, the p-values for the remaining hypotheses become more powerful. Note that in this modified version of BHq, not only are the p-values changing at each step, but the null hypotheses they are testing also change, since the model itself is changing and thus the meaning of a given β_j changes. For instance, when m = 2 and Y | (X_1, X_2) ~ N(X_1, 1), if (X_1, X_2) are bivariate Gaussian with unit variances and correlation ρ ≠ 0, then H_2 will be true in the first step (to be more precise we should call this H_2^{{1,2}} to convey that both X_1 and X_2 are included in that first step's model), but if j_max = 1 so that X_1 is discarded before the second step, then H_2 will become false in the second step, as Y | X_2 ~ N(ρX_2, 2(1 − ρ²/2)) (to be more precise, we should call this H_2^{{2}}).

The preceding paragraphs were just for motivation and intuition; we now proceed to formalize our modified BHq procedure and prove its validity under certain assumptions on the p-values. Since the p-values and null hypotheses depend on the set of hypotheses accepted at prior steps, we add a superscript to both notations. So the set of possible p-values one could observe in modified BHq is {p_j^S : j ∈ {1, ..., m}, S ⊆ {1, ..., m}, j ∈ S}, which is of size m·2^{m−1}, since there are m different values of j, and for every j there is a different p-value for every one of the 2^{m−1} subsets S that contain j. Each of these p-values also has a corresponding null hypothesis H_j^S, and any true/false pattern among the m·2^{m−1} null hypotheses is possible. Modified BHq then proceeds as follows:

1. Initialize S = {1, ..., m}.
2. Compute j_max = arg max_{j ∈ S} p_j^S.
3. If p_{j_max}^S ≤ q|S|/m, then stop and return S as the selected set; otherwise, set S = S \ {j_max}.
4. If |S| = 0, stop and return the empty set; otherwise, return to step 2.

(a) Show that modified BHq is identical to BHq when p_j^S = p_j^{S′} and H_j^S = H_j^{S′} for all j and all S, S′ containing j.

(b) For a given index j ∈ {1, ..., m}, denote by N_j the vector of p_j^S for which H_j^S is true, i.e., N_j = (p_j^S)_{S : H_j^S is true} (note the set of such S is non-random, but the p-values themselves are random, and hence N_j is random as well, unless H_j^S is false for all S, in which case we simply set N_j deterministically to some default value like 1). Prove that if the N_j are independent of one another and of all the non-null p-values {p_j^S}_{j,S : H_j^S is false}, then the above procedure controls the FDR, i.e., if we denote by Ŝ the set returned by modified BHq, then

    E[ |{j ∈ Ŝ : H_j^Ŝ true}| / max{|Ŝ|, 1} ] ≤ q.

As in the lecture notes, you may assume that any null p-value (i.e., a p-value whose corresponding null hypothesis is true) is marginally Unif(0, 1). Note that the null p-values within an N_j are not assumed independent. Hint: consider rewriting the FDR as

    E[ |{j ∈ Ŝ : H_j^Ŝ true}| / max{|Ŝ|, 1} ] = E[ Σ_{S ⊆ {1,...,m}} 1{Ŝ = S} · |{j ∈ Ŝ : H_j^Ŝ true}| / max{|Ŝ|, 1} ].

Solution.
(a)
(b)

Problem 3 (Controlling the tail of the FDP). Let p_1, ..., p_m be m independent p-values, m_0 of which are null (and therefore Unif([0, 1])), and let p_(1), ..., p_(m) as usual denote the ordered p-values. Instead of FDR control, where we control the expectation of the FDP at some q, we will instead prove the validity of a procedure for controlling the probability that the FDP exceeds q, for some q ∈ (0, 1). Consider a step-down procedure with p-value thresholds

    α_k = F^{−1}_{⌊qk⌋+1, m−k+⌊qk⌋+1}(α),

where F_{j,n} is the CDF of the jth order statistic among n i.i.d. Unif([0, 1]) random variables. So to be explicit, our procedure rejects nothing if p_(1) > α_1, and otherwise rejects the null hypotheses corresponding to the smallest k̂ p-values, where k̂ = min{k : p_(k+1) > α_{k+1}}, and each α_k is the αth quantile of the distribution of the (⌊qk⌋+1)th order statistic among m − k + ⌊qk⌋ + 1 i.i.d. Unif([0, 1]) random variables.

(a) Let S_k be the number of non-null p-values less than or equal to α_k, and define the set K = {k : k − S_k > qk}. Show that for our procedure, if FDP > q, then K is nonempty.

(b) Show that for our procedure, if FDP > q, then at least ⌊qk_0⌋ + 1 null p-values are less than or equal to α_{k_0}, where k_0 = min K (which exists since we know from part (a) that K is nonempty).

(c) Show that k_0 − S_{k_0} = ⌊qk_0⌋ + 1.

(d) Finally, show that for our procedure, P(FDP > q) ≤ α. You may use without proof the fact that F^{−1}_{j,n}(α) is nonincreasing in n.

(e) Plot these critical values and compare them to those of the Benjamini–Hochberg procedure.
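As a starting point for part (e), the thresholds α_k can be computed numerically: F_{j,n}(x) is the probability that at least j of n i.i.d. uniforms fall at or below x, i.e., F_{j,n}(x) = Σ_{i=j}^n C(n,i) x^i (1−x)^{n−i}, and this increasing function can be inverted by bisection. A minimal sketch (the values of m, q, and α are arbitrary choices for illustration, and the actual plotting, e.g. with matplotlib, is left out):

```python
from math import comb, floor

def order_stat_cdf(x, j, n):
    # P(U_(j) <= x) for n i.i.d. Unif(0,1): at least j of the n fall <= x
    return sum(comb(n, i) * x**i * (1 - x)**(n - i) for i in range(j, n + 1))

def order_stat_quantile(alpha, j, n, tol=1e-10):
    # Invert the (increasing) order-statistic CDF by bisection
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if order_stat_cdf(mid, j, n) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

m, q, alpha = 20, 0.1, 0.05  # arbitrary illustration values
step_down = [order_stat_quantile(alpha, floor(q * k) + 1,
                                 m - k + floor(q * k) + 1)
             for k in range(1, m + 1)]
bh = [q * k / m for k in range(1, m + 1)]  # BHq thresholds for comparison
```

One quick check on the output: α_1 has the closed form 1 − (1 − α)^{1/m}, since F_{1,m}(x) = 1 − (1 − x)^m.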
Which procedure would you prefer to use, given their respective guarantees on the FDP and how powerful you expect them to be?

Solution.
(a)
(b)
(c)
(d)
(e)

Problem 4 (FDR control between step-up and step-down procedures). Recall Theorem 8.4 in the notes, which states that the step-up procedure BHq controls the FDR (under conditions). It turns out that the step-down procedure that uses the exact same sequence of thresholds as BHq, call this procedure BHq(sd), also controls the FDR under the same conditions as in Theorem 8.4 (you do not need to prove this). Denote the number of rejections made by (step-up) BHq as k̂(su) and the number of rejections made by the step-down version of BHq as k̂(sd). Consider a procedure BHq(betw) that rejects H_{0,(1)}, ..., H_{0,(k̂)} for some k̂ that depends on the p-values (where no rejections are made if k̂ = 0). Suppose further that for any set of p-values, k̂(sd) ≤ k̂ ≤ k̂(su).

(a) Are these properties sufficient to establish Theorem 8.4 for BHq(betw) under the global null?

(b) Are these properties sufficient to establish Theorem 8.4 for BHq(betw) in general? Hint: A plot like Figure 1 in the lecture notes might help build intuition.

Solution.
(a)
(b)

Problem 5 (Shortest interval at a point). Let Y be a single sample from a N(θ_0, 1), and let z_τ = Φ^{−1}(τ) be the τth quantile of the N(0, 1), with z_0 = −∞ and z_1 = ∞. Let α < 0.5.

(a) Show that for any function w : R → [0, 1],

    C_w(Y) = {θ : Y + z_{αw(θ)} ≤ θ ≤ Y + z_{1−α(1−w(θ))}}

constitutes a valid 100(1−α)% confidence region for θ_0 (with w(θ) = 1/2 corresponding to the usual Normal confidence interval).

(b) Find a function w that minimizes the expected length of C_w(Y) when θ_0 = 0. What is the resulting confidence region? Is it an interval? For which values of Y is the resulting confidence region shorter than the usual confidence region, which sets w(θ) = 1/2?

Solution.
(a)
(b)
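Not part of the assignment, but a quick way to sanity-check the claim in part (a) of Problem 5 is to simulate coverage for the constant choice w(θ) = 1/2, which reduces C_w(Y) to the usual interval [Y + z_{α/2}, Y + z_{1−α/2}]; the values of θ_0 and α below are arbitrary.

```python
import random
from statistics import NormalDist

random.seed(1)
alpha, theta0 = 0.05, 1.7        # arbitrary illustration values
z = NormalDist().inv_cdf         # z(tau) = Phi^{-1}(tau)

# With w(theta) = 1/2, C_w(Y) is the interval
# [Y + z(alpha/2), Y + z(1 - alpha/2)]; count how often it covers theta0.
n = 200_000
covered = 0
for _ in range(n):
    y = random.gauss(theta0, 1)
    if y + z(alpha / 2) <= theta0 <= y + z(1 - alpha / 2):
        covered += 1
print(covered / n)  # should be close to 1 - alpha = 0.95
```

The same loop, with the interval endpoints replaced by a θ-dependent w, can be reused to check whatever region you derive in part (b).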