Mathematical Statistics (for Actuarial Studies) / Principles of Mathematical Statistics (for Actuarial Studies)

Expectation

Two coins are tossed. How many heads can we expect to come up? Let $Y$ = number of heads. Then
\[
p(y) = \begin{cases} 1/4 & y = 0 \\ 1/2 & y = 1 \\ 1/4 & y = 2. \end{cases}
\]
The answer seems to be 1 (the middle value). But what exactly do we mean by "expect"?

A mental experiment: suppose we toss the two coins 1000 times, and each time record the number of heads, $y$. The result would be something like 1, 1, 2, 0, 1, 2, 0, 1, 1, 1, ..., 1, 0. We would get about 250 zeros, 500 ones and 250 twos.

Expectation continued

So the average of the 1000 values of $Y$ would be approximately
\[
\frac{250(0) + 500(1) + 250(2)}{1000} = 0(1/4) + 1(1/2) + 2(1/4) = 1.
\]
This agrees with our intuitive answer. Observe that
\[
0(1/4) + 1(1/2) + 2(1/4) = 0\,p(0) + 1\,p(1) + 2\,p(2) = \sum_{y=0}^{2} y\,p(y).
\]
This leads to the following definition. Suppose $Y$ is a discrete random variable with pmf $p(y)$. Then the expected value (or mean) of $Y$ is
\[
E(Y) = \sum_{y} y\,p(y).
\]
(The sum is over all possible values $y$ of the rv $Y$.) We may also write $Y$'s mean as $\mu_Y$ or $\mu$. The mean $\mu$ is a measure of central tendency, in the sense that it represents the average of a hypothetically infinite number of independent realisations of $Y$.

Example 10

Suppose that $Y$ is a random variable which equals 5 with probability 0.2 and 7 with probability 0.8. Find the expected value of $Y$.
\[
E(Y) = \sum_{y} y\,p(y) = 5(0.2) + 7(0.8) = 6.6.
\]
This means that if we were to generate many independent realisations of $Y$, so as to get a sequence like 7, 7, 5, 7, 5, 7, ..., the average of these numbers would be close to 6.6. As the sequence got longer, the average would converge to 6.6. More on this later.

Example 11

Find the mean of the Bernoulli distribution. Let $Y \sim \mathrm{Bern}(p)$. Then
\[
p(y) = \begin{cases} p & y = 1 \\ 1 - p & y = 0. \end{cases}
\]
So $Y$ has mean
\[
\mu = \sum_{y=0}^{1} y\,p(y) = 0\,p(0) + 1\,p(1) = 0(1-p) + 1(p) = p.
\]
Thus, for example, if we toss a fair coin thousands of times, and each time write 1 when a head comes up and 0 otherwise, we will get a sequence like 0, 0, 1, 0, 1, 1, 1, 0, ... The average of these 1's and 0's will be about 1/2, corresponding to the fact that each such number has a Bernoulli distribution with parameter 1/2 and thus a mean of 1/2.

Example 12

Find the mean of the binomial distribution. Let $Y \sim \mathrm{Bin}(n, p)$. Then $Y$ has mean
\[
\begin{aligned}
\mu &= \sum_{y=0}^{n} y \binom{n}{y} p^y (1-p)^{n-y} \\
&= \sum_{y=1}^{n} y \,\frac{n!}{y!\,(n-y)!}\, p^y (1-p)^{n-y} \quad \text{(the first term is zero)} \\
&= np \sum_{y=1}^{n} \frac{(n-1)!}{(y-1)!\,(n-1-(y-1))!}\, p^{y-1} (1-p)^{(n-1)-(y-1)} \\
&= np \sum_{x=0}^{m} \frac{m!}{x!\,(m-x)!}\, p^{x} (1-p)^{m-x} \quad (x = y-1 \text{ and } m = n-1\text{; as } y \text{ runs from } 1 \text{ to } n,\ x \text{ runs from } 0 \text{ to } m) \\
&= np \quad \text{(since the sum equals 1, by the binomial theorem).}
\end{aligned}
\]
This makes sense. For example, if we roll a die 60 times, we can expect 60(1/6) = 10 sixes.

Expectations of functions of random variables

Suppose that $Y$ is a discrete random variable with pmf $p(y)$, and $g(t)$ is a function. Then the expected value (or mean) of $g(Y)$ is defined to be
\[
E(g(Y)) = \sum_{y} g(y)\,p(y).
\]
The text presents this equation as Theorem 3.2 and provides a proof of it. We have instead defined the expected value of a function of a rv, with no need for a proof.

Example 13

Suppose that $Y \sim \mathrm{Bern}(p)$. Find $E(Y^2)$.
\[
E(Y^2) = \sum_{y} y^2 p(y) = 0^2(1-p) + 1^2(p) = p.
\]
(Same as $E(Y)$; in fact, $E(Y^k) = p$ for all $k \ge 1$, since $0^k = 0$ and $1^k = 1$.)

Laws of expectation

1. If $c$ is a constant, then $E(c) = c$.
2. $E\{c\,g(Y)\} = c\,E\{g(Y)\}$.
3. $E\{g_1(Y) + g_2(Y) + \cdots + g_k(Y)\} = E\{g_1(Y)\} + E\{g_2(Y)\} + \cdots + E\{g_k(Y)\}$.

Proof of the 1st law: $E(c) = \sum_y c\,p(y) = c \sum_y p(y) = c(1) = c$.
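As a quick illustration of the definition $E(g(Y)) = \sum_y g(y)p(y)$ and the laws above, here is a minimal Python sketch (not from the slides; the function and variable names are purely illustrative) that computes expectations directly from a pmf:

```python
# Minimal sketch: E[g(Y)] = sum_y g(y) p(y) for a discrete pmf stored as {y: p(y)}.
def expectation(pmf, g=lambda y: y):
    return sum(g(y) * prob for y, prob in pmf.items())

pmf_heads = {0: 0.25, 1: 0.5, 2: 0.25}   # the two-coin example: Y = number of heads

print(expectation(pmf_heads))                               # E(Y) = 1.0
print(expectation(pmf_heads, lambda y: y**2))               # E(Y^2) = 1.5
# Linearity (laws 2 and 3): E(3Y^2 + Y - 2) = 3E(Y^2) + E(Y) - 2
print(expectation(pmf_heads, lambda y: 3*y**2 + y - 2))     # 3.5
print(3*expectation(pmf_heads, lambda y: y**2)
      + expectation(pmf_heads) - 2)                         # 3.5
```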
Example 14

Suppose that $Y \sim \mathrm{Bern}(p)$. Find $E(3Y^2 + Y - 2)$.
\[
E(3Y^2 + Y - 2) = 3E(Y^2) + E(Y) - 2 = 3p + p - 2 = 4p - 2.
\]
(Recall from Example 13 that $E(Y^k) = p$ for all $k$.)

Special expectations

1. The $k$th raw moment of $Y$ is $\mu'_k = E(Y^k)$.
2. The $k$th central moment of $Y$ is $\mu_k = E\left((Y - \mu)^k\right)$.
3. The variance of $Y$ is $\mathrm{Var}(Y) = \sigma^2 = \mu_2 = E\left((Y - \mu)^2\right)$.
4. The standard deviation of $Y$ is $\mathrm{SD}(Y) = \sigma = \sqrt{\mathrm{Var}(Y)}$.

We can also write $\mathrm{Var}(Y)$ as $V(Y)$ or $\sigma^2_Y$. Note that $\mu'_1 = \mu$. Also, $\mu_1 = E\left((Y - \mu)^1\right) = E(Y) - \mu = \mu - \mu = 0$.
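These definitions translate directly into code. A rough Python sketch (illustrative only, not from the slides), again using the two-coin pmf from the start of the section:

```python
# Sketch: raw moments, central moments, variance and SD computed directly
# from a discrete pmf stored as {y: p(y)} (illustrative only).
def raw_moment(pmf, k):
    # mu'_k = E(Y^k) = sum_y y^k p(y)
    return sum(y**k * prob for y, prob in pmf.items())

def central_moment(pmf, k):
    # mu_k = E((Y - mu)^k), where mu = E(Y)
    mu = raw_moment(pmf, 1)
    return sum((y - mu)**k * prob for y, prob in pmf.items())

pmf_heads = {0: 0.25, 1: 0.5, 2: 0.25}       # two-coin example from earlier
print(raw_moment(pmf_heads, 1))              # 1.0   -> the mean mu
print(central_moment(pmf_heads, 1))          # 0.0, as noted above
print(central_moment(pmf_heads, 2))          # 0.5   -> Var(Y)
print(central_moment(pmf_heads, 2) ** 0.5)   # ~0.707 -> SD(Y)
```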
Example 15

Suppose that $p(y) = y/3$, $y = 1, 2$. Find $\mu'_3$ and $\sigma$.
\[
\mu'_3 = E(Y^3) = \sum_{y} y^3 p(y) = 1^3\left(\tfrac{1}{3}\right) + 2^3\left(\tfrac{2}{3}\right) = \tfrac{17}{3}.
\]
\[
\mu = E(Y) = \sum_{y} y\,p(y) = 1(1/3) + 2(2/3) = 5/3.
\]
\[
\sigma^2 = \mu_2 = E\left((Y - \mu)^2\right) = \sum_{y} (y - \mu)^2 p(y) = \left(1 - \tfrac{5}{3}\right)^2 \tfrac{1}{3} + \left(2 - \tfrac{5}{3}\right)^2 \tfrac{2}{3} = \tfrac{2}{9}.
\]
Hence $\sigma = \sqrt{2}/3 = 0.4714$.

The various moments provide information about the nature of a distribution. We have already seen that the mean provides a measure of central tendency. The variance and standard deviation provide measures of dispersion. Distributions that are highly disperse have a large variance.

Variance example

Example: Suppose $X$ has pmf $p(x) = 1/2$, $x = 1, 3$, and $Y$ has pmf $p(y) = 1/2$, $y = 0, 4$. Find $\mathrm{Var}(X)$ and $\mathrm{Var}(Y)$. Which distribution is the more disperse?

Both distributions have a mean of 2 (= average of 1 and 3 = average of 0 and 4).
\[
\mathrm{Var}(X) = (1-2)^2(0.5) + (3-2)^2(0.5) = 1 \quad \text{and} \quad \mathrm{Var}(Y) = (0-2)^2(0.5) + (4-2)^2(0.5) = 4.
\]
We see that $\mathrm{Var}(Y) > \mathrm{Var}(X)$. This corresponds to the fact that $Y$'s distribution is the more disperse of the two.

[Slide sketch: the two pmfs above, together with binomial pmfs, one peaked and one flat.]

The first four moments tell you a lot about a distribution: $\mu'_1 = \mu$ (the 1st raw moment) is the mean; the central moment $\mu_2$ measures spread (dispersion); $\mu_3$ measures skewness (the binomial pmf sketched above is skewed to the right); and $\mu_4$ measures kurtosis, that is, how flat or peaked the distribution is.
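To see the dispersion comparison in the variance example empirically, here is a small simulation sketch (numpy-based, not from the slides; the sample size is arbitrary):

```python
# Sketch: simulate the two pmfs from the variance example and compare
# sample means and variances (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
x = rng.choice([1, 3], size=100_000, p=[0.5, 0.5])
y = rng.choice([0, 4], size=100_000, p=[0.5, 0.5])

print(x.mean(), y.mean())   # both close to the common mean of 2
print(x.var(), y.var())     # close to 1 and 4: Y's distribution is more disperse
```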
Two important results (for computing variances)

1. $\mathrm{Var}(Y) = E(Y^2) - (E(Y))^2$, or equivalently, $\sigma^2 = \mu'_2 - \mu^2$.
2. $\mathrm{Var}(a + bY) = b^2 \mathrm{Var}(Y)$.

Proof of 1:
\[
\mathrm{LHS} = E\left((Y - \mu)^2\right) = E(Y^2 - 2Y\mu + \mu^2) = E(Y^2) - 2\mu E(Y) + \mu^2 = E(Y^2) - \mu^2 = \mathrm{RHS}.
\]
Proof of 2:
\[
\mathrm{LHS} = E\left((a + bY - E(a + bY))^2\right) = E\left((a + bY - a - b\mu)^2\right) = b^2 E\left((Y - \mu)^2\right) = \mathrm{RHS}.
\]

Example 16

Find the variance of the Bernoulli distribution. Recall that if $Y \sim \mathrm{Bern}(p)$, then $E(Y) = E(Y^2) = p$. Therefore $Y$ has variance
\[
\mathrm{Var}(Y) = p - p^2 = p(1 - p).
\]
What is the variance of $X = 2 - 5Y$? $\mathrm{Var}(X) = (-5)^2 \mathrm{Var}(Y) = 25p(1 - p)$.

Moment generating functions

The moment generating function (mgf) of a random variable $Y$ is defined to be $m(t) = E(e^{tY})$. Mgf's have two important uses:

1. To compute raw moments, according to the formula $\mu'_k = m^{(k)}(0)$, the $k$th derivative of the mgf evaluated at $t = 0$. (See Thm 3.12 in the text for a general proof, and below for the case $k = 1$.)
2. To identify distributions, according to the result: if the mgf of a rv $Y$ is the same as that of another rv $X$, we can conclude that $Y$ has the same distribution as $X$. (This follows from "the uniqueness theorem", a result in pure mathematics.)

Moment generating functions continued

Note: $m^{(k)}(0)$ denotes the $k$th derivative of $m(t)$, evaluated at $t = 0$, and may also be written as $\left.\frac{d^k m(t)}{dt^k}\right|_{t=0}$. We may also write $m^{(1)}(t)$ as $m'(t)$, $m^{(2)}(t)$ as $m''(t)$, etc.

For all mgf's $m(t)$, it is true that $m^{(0)}(0) = m(0) = E(e^{0 \cdot Y}) = E(1) = 1$.

Proof that $\mu = \mu'_1 = m'(0)$:
\[
m'(t) = \frac{d}{dt} m(t) = \frac{d}{dt} E(e^{Yt}) = \frac{d}{dt} \sum_{y} e^{yt} p(y) = \sum_{y} \frac{d}{dt} e^{yt} p(y) = \sum_{y} y e^{yt} p(y).
\]
So $m'(0) = \sum_{y} y e^{y \cdot 0} p(y) = \sum_{y} y\,p(y) = E(Y) = \mu$.
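The result $\mu'_k = m^{(k)}(0)$ can also be checked symbolically. Here is a small sympy sketch (illustrative, not from the slides) using the Bernoulli mgf $m(t) = 1 - p + pe^t$, which follows directly from the definition; each derivative at $t = 0$ returns $p$, matching $E(Y^k) = p$ from Example 13:

```python
# Sketch: verify mu'_k = m^(k)(0) for the Bernoulli mgf m(t) = 1 - p + p e^t.
import sympy as sp

t, p = sp.symbols('t p')
m = 1 - p + p * sp.exp(t)                # E(e^{tY}) = (1-p) e^{0} + p e^{t}

for k in range(1, 4):
    mu_k = sp.diff(m, t, k).subs(t, 0)   # k-th derivative, evaluated at t = 0
    print(k, sp.simplify(mu_k))          # prints p for every k, as expected
```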
Example 17

Use the mgf technique to find the mean and variance of the binomial distribution. Let $Y \sim \mathrm{Bin}(n, p)$. Then $Y$ has mgf
\[
m(t) = E(e^{Yt}) = \sum_{y=0}^{n} e^{yt} \binom{n}{y} p^y (1-p)^{n-y} = \sum_{y=0}^{n} \binom{n}{y} (pe^t)^y (1-p)^{n-y} = \left\{ pe^t + (1-p) \right\}^n,
\]
by the binomial theorem. Thus $m(t) = (1 - p + pe^t)^n$.

Then
\[
m'(t) = \frac{dm(t)}{dt} = n(1 - p + pe^t)^{n-1} pe^t,
\]
by the chain rule for differentiation. (This rule is $\frac{du}{dt} = \frac{du}{dv}\frac{dv}{dt}$, where here $v = 1 - p + pe^t$ and $u = m(t) = v^n$.)

So $\mu = \mu'_1 = m'(0) = n(1 - p + pe^0)^{n-1} pe^0 = np$ (as before).

Example 17 continued

\[
m''(t) = \frac{d^2 m(t)}{dt^2} = \frac{dm'(t)}{dt} = np \left\{ (1 - p + pe^t)^{n-1} e^t + e^t (n-1)(1 - p + pe^t)^{n-2} pe^t \right\},
\]
by the product rule for differentiation. (This rule is $\frac{d(uv)}{dt} = u\frac{dv}{dt} + v\frac{du}{dt}$, where here $u = (1 - p + pe^t)^{n-1}$ and $v = e^t$.)
\[
\mu'_2 = m''(0) = np \left\{ (1 - p + pe^0)^{n-1} e^0 + e^0 (n-1)(1 - p + pe^0)^{n-2} pe^0 \right\} = np\{1 + (n-1)p\}.
\]
Therefore
\[
\sigma^2 = \mu'_2 - \mu^2 = np\{1 + (n-1)p\} - (np)^2 = np(1 - p).
\]

Example 18

A random variable $Y$ has the mgf $m(t) = \frac{1}{8}(1 + e^t)^3$. Find the probability that $Y$ equals three.
\[
m(t) = \left(1 - \tfrac{1}{2} + \tfrac{1}{2} e^t\right)^3 = (1 - p + pe^t)^n, \quad \text{where } n = 3 \text{ and } p = 1/2.
\]
Thus $m(t)$ is the mgf of a random variable whose distribution is binomial with parameters 3 and 1/2. Therefore $Y \sim \mathrm{Bin}(3, 1/2)$, and so $P(Y = 3) = 1/8$.
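A quick way to double-check Example 18 is to write the mgf as a polynomial in $s = e^t$: for a discrete rv on $\{0, 1, \ldots, n\}$, $m(t) = \sum_y p(y)e^{yt}$, so the coefficient of $s^y$ is exactly $p(y)$. A rough sympy sketch (illustrative only):

```python
# Sketch: read the pmf off the mgf m(t) = (1/8)(1 + e^t)^3 by substituting s = e^t.
import sympy as sp

s = sp.symbols('s')                         # s stands for e^t
m_in_s = sp.Rational(1, 8) * (1 + s)**3     # the mgf of Example 18, as a polynomial in s
poly = sp.Poly(sp.expand(m_in_s), s)

print(poly.all_coeffs())   # [1/8, 3/8, 3/8, 1/8] = [P(Y=3), P(Y=2), P(Y=1), P(Y=0)]
# These are the Bin(3, 1/2) probabilities, confirming P(Y = 3) = 1/8.
```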