STAT40150 Multivariate Analysis

Time Allowed: 2 hours
Instructions for Candidates
Answer all three questions.
Each question is worth 100 marks. For full marks, you must clearly show all steps,
define all notation and explain all reasoning.
Candidates should upload their examination script as a single (scanned) pdf file to
Brightspace within 30 minutes of the end of the exam.
Candidates may refer to their notes or online references when answering these questions,
or use software for numerical calculations, but they must not communicate with anyone
else during the examination. Candidates must show complete workings, and
associated reasoning, in their submitted examination script. Correct answers alone
will not achieve full marks.
Candidates are required to read, complete, and upload the Honour Code form that has
been distributed as the first page in their single pdf file submission.

1. A chemical analysis of 178 wines grown in the same region in Italy but derived from
three different cultivars was conducted. The analysis determined the quantities of
5 constituents found in each wine.
(a) Under the factor analysis model a p-dimensional observation $x_i$ ($i = 1, \ldots, N$)
is modeled as $x_i = \mu + \Lambda f_i + \epsilon_i$, where it is assumed that $f_i \sim \mathrm{MVN}_q(0, I)$,
$\epsilon_i \sim \mathrm{MVN}_p(0, \Psi)$ and $\mathrm{Cov}(f_i, \epsilon_i) = 0$. Derive the marginal distribution
$p(x_i)$, clearly motivating the steps in the derivation. [10 marks]
(b) The variances of the 5 constituents are detailed in Table 1. Would you advise
standardising these data prior to the application of factor analysis to them?
Prove that your answer is correct using the factor model definition detailed in
1(a), clearly defining any notation you use. [10 marks]
Alcohol   Sugar free extract   Fixed acidity   Tartaric acid   Chloride
0.66      5.01                 350.44          0.40            2505.74
Table 1: Constituents’ variances
(c) A factor analysis model with 2 latent factors was applied to the correlation
matrix of these data with the resulting loadings matrix detailed below. Which
constituent has the smallest uniqueness? What is that uniqueness value?
[15 marks]
                    Factor1  Factor2
Alcohol              -0.084    0.409
Sugar.free_extract   -0.058    0.996
Fixed_acidity         0.952    0.127
Tartaric_acid         0.624   -0.205
Chloride             -0.245    0.073
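For reference, the arithmetic behind a uniqueness can be sketched as follows. This is a minimal sketch, assuming the variables are standardised (the model was fitted to the correlation matrix) and the factors are orthogonal, so that each uniqueness is one minus the sum of squared loadings in that row. The function name `uniquenesses` is hypothetical and the loadings are copied from the table above.

```python
# Loadings copied from the table in 1(c): (Factor1, Factor2) per constituent.
loadings = {
    "Alcohol": (-0.084, 0.409),
    "Sugar.free_extract": (-0.058, 0.996),
    "Fixed_acidity": (0.952, 0.127),
    "Tartaric_acid": (0.624, -0.205),
    "Chloride": (-0.245, 0.073),
}


def uniquenesses(loadings):
    """Return 1 - communality for each variable.

    Assumes unit-variance variables and orthogonal factors, so the
    communality is the sum of squared loadings in the variable's row.
    """
    return {name: 1.0 - sum(l * l for l in row) for name, row in loadings.items()}


uniq = uniquenesses(loadings)
smallest = min(uniq, key=uniq.get)

for name, value in uniq.items():
    print(f"{name:20s} uniqueness = {value:.4f}")
print("Smallest uniqueness:", smallest)
```

A written answer would still need to define the communality and uniqueness and justify the formula from the model in 1(a).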
(d) Describe why a factor rotation is often employed when fitting a factor analysis
model, clearly defining any notation you use. How would you expect the
application of the ‘varimax’ rotation to affect the factor loadings detailed in
1(c)? [20 marks]

(e) Principal components analysis was then applied to the correlation matrix of
these data. The standard deviations of principal components 1 through 5
are 1.3224, 1.1795, 0.9398, 0.8183 and 0.55432, respectively. Compute the
proportion of the variance explained by each principal component and illustrate
(by hand, software is not necessarily required) the resulting scree plot.
[15 marks]
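The calculation in 1(e) can be checked numerically. The sketch below assumes the usual convention that each component's variance is the square of its standard deviation and that the proportion explained is that variance divided by the total. The crude text bars are only an illustration of the shape a scree plot would take.

```python
# Standard deviations of PC1..PC5 as quoted in 1(e).
sdev = [1.3224, 1.1795, 0.9398, 0.8183, 0.55432]

variances = [s ** 2 for s in sdev]
total = sum(variances)  # close to 5 for a correlation matrix of 5 variables
proportions = [v / total for v in variances]

# Running total of the proportions, useful when choosing how many PCs to keep.
cumulative = []
running = 0.0
for p in proportions:
    running += p
    cumulative.append(running)

for k, (p, c) in enumerate(zip(proportions, cumulative), start=1):
    bar = "#" * round(p * 50)  # crude text stand-in for a scree plot
    print(f"PC{k}: proportion = {p:.4f}  cumulative = {c:.4f}  {bar}")
```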
(f) The resulting principal components analysis loadings matrix is given below.
Given the loadings, and your answer to 1(e), how many principal components
do you think are required to represent these data? Explain your answer.
[10 marks]
                      PC1    PC2    PC3    PC4    PC5
Alcohol             -0.19   0.66  -0.16   0.64   0.29
Sugar.free_extract  -0.30   0.65   0.18  -0.51  -0.45
Fixed_acidity        0.57   0.32   0.34  -0.34   0.58
Tartaric_acid        0.64   0.12   0.24   0.40  -0.60
Chloride            -0.37  -0.17   0.88   0.24   0.08
(g) The (2-dimensional) scores resulting from the application of factor analysis and
from the application of principal components analysis to the wine data are
illustrated in Figures 1a and 1b respectively. What method could be used to
compare the similarity of these resulting scores plots? Explain how your
suggested method works, clearly defining any notation used, and how the method
quantifies the similarity of the scores plots. [10 marks]
(h) Based on the outputs provided above, do you think factor analysis or principal
components analysis would be the preferred dimension reduction approach for
these data? [10 marks]
[Total: 100 marks]

Figure 1: Two dimensional scores for the wine data (horizontal axis: Dimension 1; vertical axis: Dimension 2).
(a) Two dimensional factor analysis scores.
(b) Two dimensional principal components analysis scores.

2. (a) In linear discriminant analysis the aim is to maximize the posterior probability
$P(g \mid x)$ that observation $x$ belongs to class $g$ for $g = 1, \ldots, G$. Show that in the
case of $p = 1$ for $G = 2$ classes of equal size and with means $\mu_1$ and $\mu_2$
respectively, the Bayes decision boundary between the 2 classes corresponds to
the point
$$x = \frac{\mu_1 + \mu_2}{2}.$$
Clearly define all notation and motivate each step taken in your solution.
[20 marks]
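The stated boundary can be sanity-checked numerically before attempting the proof. The sketch below assumes the standard one-dimensional linear discriminant function with a variance shared by both classes. The means, variance and function name are hypothetical illustration values, not quantities from the question.

```python
import math


def discriminant(x, mu, sigma2, prior):
    """Linear discriminant function for p = 1 with shared variance sigma2."""
    return x * mu / sigma2 - mu ** 2 / (2.0 * sigma2) + math.log(prior)


# Hypothetical values chosen only for illustration.
mu1, mu2, sigma2 = 1.0, 4.0, 2.25
prior = 0.5  # equal class sizes give equal priors
midpoint = (mu1 + mu2) / 2.0


def gap(x):
    """Class 1 discriminant minus class 2 discriminant at x."""
    return discriminant(x, mu1, sigma2, prior) - discriminant(x, mu2, sigma2, prior)


print("gap at midpoint:", gap(midpoint))       # approximately zero
print("gap just below :", gap(midpoint - 0.5))  # positive: class 1 preferred
print("gap just above :", gap(midpoint + 0.5))  # negative: class 2 preferred
```

A numerical check of this kind does not replace the derivation, which must hold for any choice of means and shared variance.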
(b) Archaeological researchers analyzed 180 glass vessels from the 15th-17th
centuries using x-ray methods to determine the concentrations of 8 elements
present in the glass vessels. Four major compositional types could be
distinguished with 145, 15, 10 and 10 vessels in each type respectively. The
researchers’ goal was to predict the compositional type from the 8 elements
alone. The data are illustrated in Figure 2.
i. Which of the variables seem most likely to be useful in predicting
compositional type? Explain your reasoning. [5 marks]
ii. It was not possible to use quadratic discriminant analysis to classify these
data. Why is this the case? [10 marks]
iii. Three new glass vessels were analyzed to determine the concentrations
of the 8 elements present. The resulting linear discriminant functions are
detailed in the output below. To which composition type would you classify
each vessel? Explain your answer. [5 marks]
[,1] [,2] [,3] [,4]
[1,] 2327 2299 2279 2161
[2,] 2243 2268 2226 2096
[3,] 1806 1816 1946 2014
iv. Which of the 3 glass vessels in (iii) has the lowest uncertainty associated
with its classification? [30 marks]
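One way to approach parts (iii) and (iv) numerically is sketched below. It assumes that each row of the output is a vessel and each column a compositional type, and that the values are linear discriminant functions which already include the log prior term, so that the posterior probabilities are proportional to their exponentials. The function name `classify` and the use of one minus the largest posterior as the uncertainty are choices made for this illustration.

```python
import math

# Discriminant function values copied from the output in (iii).
# Assumption: rows are the three new vessels, columns are types 1 to 4.
scores = [
    [2327, 2299, 2279, 2161],
    [2243, 2268, 2226, 2096],
    [1806, 1816, 1946, 2014],
]


def classify(row):
    """Return (predicted type, uncertainty) for one vessel.

    Subtracting the row maximum before exponentiating avoids overflow.
    Uncertainty is 1 minus the largest posterior, computed as the
    posterior mass on the other types to keep precision.
    """
    top = max(row)
    best = row.index(top)
    weights = [math.exp(s - top) for s in row]
    total = sum(weights)
    uncertainty = sum(w for i, w in enumerate(weights) if i != best) / total
    return best + 1, uncertainty


results = [classify(row) for row in scores]
for vessel, (label, unc) in enumerate(results, start=1):
    print(f"vessel {vessel}: type {label}, uncertainty = {unc:.3e}")

least_uncertain = min(range(len(results)), key=lambda i: results[i][1]) + 1
print("Lowest uncertainty: vessel", least_uncertain)
```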
v. An additional variable is available for the glass vessels data that details
the geographical region in which the vessel was found. Would it be
appropriate to include this variable when classifying a glass vessel using linear
discriminant analysis? Explain your answer. [10 marks]