ELEE08021 Sensor Networks and Data Analysis 2
Exercise ML

1. Principal Component Analysis - Consider a dataset that consists of data points
$\{x_i\}_{i=1}^{5}$ where $x_i \in \mathbb{R}^2$. We can represent this dataset using a matrix $X \in \mathbb{R}^{5\times 2}$ where
each row corresponds to a data point.
\[
X = \begin{bmatrix} 1 & 2 \\ 0 & 0 \\ -1 & -2 \\ 2 & 4 \\ -2 & -4 \end{bmatrix}
\]
(a) Standardise this dataset to form a matrix $X_s \in \mathbb{R}^{5\times 2}$ where each column has
zero mean and unit variance.
(b) Compute the principal components of $X_s$. These should have unit norm.
(c) Use the first principal component to transform your standardised dataset down
to 1 dimension.
(d) Reconstruct your standardised dataset using the transpose of the first principal
component. Compute the reconstruction error.
Solution:
(a) First, compute the means and standard deviations of each column. Both
columns are zero mean, and the variances are
\[
\sigma_1^2 = (1^2 + 0^2 + (-1)^2 + 2^2 + (-2)^2)/5 = 2
\]
\[
\sigma_2^2 = (2^2 + 0^2 + (-2)^2 + 4^2 + (-4)^2)/5 = 8
\]
This means the standard deviations are $\sigma_1 = \sqrt{2}$ and $\sigma_2 = 2\sqrt{2}$. We divide
the values in each column by the column standard deviation (we don't have
to worry about dividing by zero here) to get our standardised data matrix:
\[
X_s = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 0 & 0 \\ -1 & -1 \\ 2 & 2 \\ -2 & -2 \end{bmatrix}
\]
Notice that both columns are the same, so we have a redundancy.
(b) First, form the covariance matrix $C = \frac{1}{N-1} X_s^\top X_s$:
\[
C = \frac{1}{4} \cdot \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 0 & -1 & 2 & -2 \\ 1 & 0 & -1 & 2 & -2 \end{bmatrix} \cdot \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 0 & 0 \\ -1 & -1 \\ 2 & 2 \\ -2 & -2 \end{bmatrix} = \begin{bmatrix} \frac{5}{4} & \frac{5}{4} \\ \frac{5}{4} & \frac{5}{4} \end{bmatrix}
\]
Now we need to compute the eigenvalues $\lambda_1, \lambda_2$ and eigenvectors $w_1, w_2$ of $C$.
Recall that an eigenvalue/eigenvector pair $\lambda, w$ satisfies $\lambda w = Cw$. This means we
can write $\det(C - \lambda I) = 0$:
\[
\begin{vmatrix} \frac{5}{4} - \lambda & \frac{5}{4} \\ \frac{5}{4} & \frac{5}{4} - \lambda \end{vmatrix} = 0
\]
\[
\left(\tfrac{5}{4} - \lambda\right)^2 - \tfrac{25}{16} = 0
\]
\[
\tfrac{25}{16} - \tfrac{5}{2}\lambda + \lambda^2 - \tfrac{25}{16} = 0
\]
\[
\lambda^2 - \tfrac{5}{2}\lambda = 0
\]
\[
\lambda\left(\lambda - \tfrac{5}{2}\right) = 0
\]
This gives $\lambda_1 = \frac{5}{2}$ and $\lambda_2 = 0$. Remember that we sort these by decreasing
magnitude. We can now insert $\lambda_1$ into the equation $\lambda_1 w_1 = C w_1$ to compute
the eigenvector $w_1$:
\[
\frac{5}{2} \begin{bmatrix} w_{1,1} \\ w_{1,2} \end{bmatrix} = \frac{5}{4} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} w_{1,1} \\ w_{1,2} \end{bmatrix}
\]
This gives us
\[
2 w_{1,1} = w_{1,1} + w_{1,2}
\]
\[
2 w_{1,2} = w_{1,1} + w_{1,2}
\]
which is satisfied when
\[
w_{1,1} = w_{1,2}.
\]
The scaling of eigenvectors is arbitrary, so we could have $w_{1,1} = w_{1,2} = 1$.
However, this vector would have a norm of $\sqrt{2}$ and we have been asked for
unit-norm eigenvectors, so we must have:
\[
w_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix}
\]
This is the first principal component. We can now insert $\lambda_2$ into the equation
$\lambda_2 w_2 = C w_2$ to compute the eigenvector $w_2$:
\[
0 \begin{bmatrix} w_{2,1} \\ w_{2,2} \end{bmatrix} = \frac{5}{4} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} w_{2,1} \\ w_{2,2} \end{bmatrix}
\]
This gives us
\[
0 = w_{2,1} + w_{2,2}
\]
\[
0 = w_{2,1} + w_{2,2}
\]
which is satisfied when
\[
w_{2,1} = -w_{2,2}.
\]
This gives us the 2nd principal component as
\[
w_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ -1 \end{bmatrix}
\]
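As a quick sanity check, this eigen-decomposition can be reproduced with NumPy. A minimal sketch (note that numpy.linalg.eigh returns eigenvalues in ascending order, and the sign of each eigenvector is arbitrary, so NumPy may return them negated):

import numpy as np

# Standardised data matrix X_s from part (a).
Xs = np.array([[1, 1], [0, 0], [-1, -1], [2, 2], [-2, -2]]) / np.sqrt(2)

# Covariance matrix with the 1/(N-1) normalisation used above.
C = Xs.T @ Xs / (Xs.shape[0] - 1)   # [[1.25, 1.25], [1.25, 1.25]]

# eigh handles symmetric matrices; eigenvalues come back in ascending order.
eigvals, eigvecs = np.linalg.eigh(C)
print(eigvals)    # [0.  2.5]
print(eigvecs)    # columns are unit-norm eigenvectors (signs may be flipped)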
(c) To project our data down to 1D, we can compute $X_D = X_s w_1$ where $X_s$ is our
standardised data matrix. The two factors of $\frac{1}{\sqrt{2}}$ (one from $X_s$, one from $w_1$)
combine into a factor of $\frac{1}{2}$:
\[
X_D = \frac{1}{2} \begin{bmatrix} 1 & 1 \\ 0 & 0 \\ -1 & -1 \\ 2 & 2 \\ -2 & -2 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ -1 \\ 2 \\ -2 \end{bmatrix}
\]
(d) We can reconstruct our data using $\tilde{X} = X_D w_1^\top$:
\[
\tilde{X} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 0 \\ -1 \\ 2 \\ -2 \end{bmatrix} \begin{bmatrix} 1 & 1 \end{bmatrix} = X_s.
\]
This perfectly reconstructs our standardised data, so the reconstruction error is zero.
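Continuing the NumPy sketch above, the projection and reconstruction can be checked in a few lines:

# Project onto the first principal component.
w1 = np.array([1, 1]) / np.sqrt(2)
XD = Xs @ w1                      # [ 1.  0. -1.  2. -2.]

# Reconstruct from the 1-D representation and measure the squared error.
X_rec = np.outer(XD, w1)          # equals Xs exactly in this case
print(np.sum((Xs - X_rec) ** 2))  # 0.0 (up to floating-point rounding)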
2. Dataset splits - Before learning a classifier, we first split our dataset into a training,
validation, and test set.
(a) Describe the purpose of each of these sets.
(b) Explain what is meant by generalisation and overfitting in machine learning.
Relate these to the dataset splits.
Solution:
(a) A training set is used to train our classifier. Non-parametric classifiers such
as K-NN use this training set directly to classify new points. In parametric
classifiers, the training data is used to learn the model's parameters and can be
discarded afterwards. Validation sets are used to tune the hyper-parameters of
a classifier. In K-NN we can tune K on this validation set. When we employ
gradient descent, the learning rate $\alpha$ is a hyper-parameter. We can tune it
by e.g. training lots of different classifiers with different learning rates, and
picking the one that does best on the validation set. The test set is used to
evaluate the performance of a (tuned) classifier.
(b) Generalisation is when a classifier trained on some data is able to successfully
classify new, unseen data. We use a train/test split to capture how well a
classifier generalises. Overfitting occurs if a classifier is very good at classifying
the data it was trained on, but fails to generalise to new data.
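One possible sketch of this workflow in Python with scikit-learn; the dataset here is synthetic and purely illustrative, and the 60/20/20 split ratios are an arbitrary choice:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic 2-D data, purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# 60/20/20 train/validation/test split.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Tune the hyper-parameter K on the validation set only.
best_k, best_acc = 1, 0.0
for k in [1, 3, 5, 7]:
    acc = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train).score(X_val, y_val)
    if acc > best_acc:
        best_k, best_acc = k, acc

# Report final performance once, on the held-out test set.
final = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
print(best_k, final.score(X_test, y_test))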
3. K-Nearest Neighbours in 2D - We are given a training set of data points $\{x_i\}_{i=1}^{10}$
where $x_i \in \mathbb{R}^2$. Each point has an associated class label $y_i \in \{0, 1\}$. These are
depicted in a scatter plot of the training data (figure not reproduced here), where
dim 1 and dim 2 are the first and second element of each datum; red points belong
to class 0 and blue points to class 1.
(a) Use a K-NN classifier with K = 1 to classify a test point at $\begin{bmatrix} 2 & -2 \end{bmatrix}$
(b) Use a K-NN classifier with K = 3 to classify a test point at $\begin{bmatrix} 0 & 2 \end{bmatrix}$
(c) Use a K-NN classifier with K = 5 to classify a test point at $\begin{bmatrix} 2 & 0 \end{bmatrix}$
(d) Use a K-NN classifier with K = 7 to classify a test point at $\begin{bmatrix} 4 & -4 \end{bmatrix}$
Solution:
While it is possible to solve this problem by computing distances explicitly, the
2D plot lets us perform K-NN visually in most cases.
(a) K = 1 means we are only interested in the nearest point to $\begin{bmatrix} 2 & -2 \end{bmatrix}$. This is
the blue point in the lower-right corner, so this test point will be classified as
class 1.
(b) We have K = 3. The nearest three points are the blue to the left, and the
red and the blue to the right. We have more blues than reds, so our test point is
classified as class 1.
(c) We have K = 5. The four nearest points are the two blue above and the two
red below. The next nearest point is red, which breaks the tie, so our test
point is classified as class 0.
(d) We have K = 7 and our test point is in the lower-right corner. The four
nearest points are clearly two red and two blue, so we are interested in the
three nearest points after those. It's harder to determine the next few by eye,
so we can compute some distances. The red point at $(-2,-2)$ is $\sqrt{40}$ away. The
red point at $(-3,-1)$ is $\sqrt{58}$ away. The red point at $(1,2)$ is $\sqrt{45}$ away and the blue point
at $(1,3)$ is $\sqrt{58}$ away. The remaining two blue points are further away. This gives us the
6 nearest points as 4 reds and 2 blues. The 7th nearest point is tied, but red
already has a majority, so our point is classified as class 0.
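The distance computations in part (d) can be checked with a short NumPy snippet. Only the four training points named above are included, since the full training set appears only in the omitted figure:

import numpy as np

# Training points named in the solution to part (d).
neighbours = np.array([[-2, -2], [-3, -1], [1, 2], [1, 3]])
test = np.array([4, -4])

# Squared Euclidean distances from the test point.
print(np.sum((neighbours - test) ** 2, axis=1))   # [40 58 45 58]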
4. K-Nearest Neighbours in higher dimensions - We are given a training set of data
points $\{x_i\}_{i=1}^{3}$ where $x_i \in \mathbb{R}^5$. Each point has an associated class label $y_i \in \{-1, 1\}$.
\[
x_1 = \begin{bmatrix} 1 & 2 & 2 & 2 & 0 \end{bmatrix} \quad y_1 = 1
\]
\[
x_2 = \begin{bmatrix} 1 & 1 & 0 & 1 & -1 \end{bmatrix} \quad y_2 = 1
\]
\[
x_3 = \begin{bmatrix} 2 & -4 & 2 & -1 & 0 \end{bmatrix} \quad y_3 = -1
\]
(a) Use a K-NN classifier with K = 1 to classify a test point at $\begin{bmatrix} 0 & 1 & 0 & 1 & 2 \end{bmatrix}$
(b) Give an example of something that we could represent with a 5D data point.
Solution:
(a) We can't visualise these points, but we can compute the distance between the
test point and each of the training points. We will compute squared distances;
these are a monotonic function of the true distances, so they preserve the
nearest-neighbour ordering.
\[
\|x_1 - x_t\|_2^2 = (1-0)^2 + (2-1)^2 + (2-0)^2 + (2-1)^2 + (0-2)^2 = 11
\]
\[
\|x_2 - x_t\|_2^2 = (1-0)^2 + (1-1)^2 + (0-0)^2 + (1-1)^2 + (-1-2)^2 = 10
\]
\[
\|x_3 - x_t\|_2^2 = (2-0)^2 + (-4-1)^2 + (2-0)^2 + (-1-1)^2 + (0-2)^2 = 41
\]
We have K = 1 and the nearest point, $x_2$, belongs to class 1, so we classify our
test point as class 1.
(b) There are many valid answers. For instance, you could take the 4D Iris
dataset and add a fifth measurement, e.g. stem length, to make each datum 5D.
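A minimal NumPy check of these distances, using the values above:

import numpy as np

X_train = np.array([[1, 2, 2, 2, 0],
                    [1, 1, 0, 1, -1],
                    [2, -4, 2, -1, 0]])
xt = np.array([0, 1, 0, 1, 2])

# Squared distances preserve the nearest-neighbour ordering.
d2 = np.sum((X_train - xt) ** 2, axis=1)
print(d2)             # [11 10 41]
print(np.argmin(d2))  # 1, i.e. x2 is the nearest neighbour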
5. Linear Classification - Consider a training set of data points $\{x_i\}_{i=1}^{4}$ where $x_i \in \mathbb{R}$,
in which each datum has an associated class label $y_i \in \{0, 1\}$.
\[
x_1 = -1 \quad y_1 = 0
\]
\[
x_2 = -2 \quad y_2 = 0
\]
\[
x_3 = 2 \quad y_3 = 1
\]
\[
x_4 = 1 \quad y_4 = 1
\]
Now consider a linear classifier $z_i = w x_i + b$ where $w$ and $b$ are a learnable weight and
bias parameter respectively. We will use our classifier to classify points as belonging
to class 1 if $z_i > 0.5$, and as belonging to class 0 if $z_i < 0.5$.
To learn our parameters we can minimise the mean squared error (MSE) loss across
our training data:
\[
L = \frac{1}{4} \sum_{i=1}^{4} (z_i - y_i)^2
\]
(a) Plot the training data. Annotate this plot with the classifier’s decision boundary
for initial parameters w = 1 and b = 1.
(b) Determine which side of the decision boundary is allocated to which class, and
hence classify a test point at $x_t = -0.6$.
(c) Derive expressions for $\frac{\partial L}{\partial w}$ and $\frac{\partial L}{\partial b}$ and explain how these may be used in gradient
descent to update the classifier's parameters.
(d) Perform a single iteration of gradient descent to update $w$ and $b$. Use a learning
rate $\alpha = 0.1$.
(e) Determine where the decision boundary is after this update, and reclassify the
test point at $x_t = -0.6$.
Solution:
(a) In 1D, the decision boundary is a point, rather than a line. We can locate this
point by considering the case where our classifier outputs exactly 0.5. Points either
side of this are classified differently.
We have $z_i = w x_i + b$, with $w = 1$ and $b = 1$. Let's plug these values in, and
consider a classifier output of 0.5:
\[
0.5 = x_d + 1.
\]
This gives us the point $x_d = -0.5$.
(b) Let's see what happens to a point a small distance $\epsilon$ to the right of this:
\[
z = (-0.5 + \epsilon) + 1 = 0.5 + \epsilon > 0.5
\]
So points to the right of the decision boundary will be put into class 1. $x_t =
-0.6$ is to the left of the decision boundary, so it will be classified as class 0.
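Parts (c) to (e) can be checked numerically. Differentiating the MSE loss above by the chain rule gives $\frac{\partial L}{\partial w} = \frac{1}{2}\sum_i (z_i - y_i)x_i$ and $\frac{\partial L}{\partial b} = \frac{1}{2}\sum_i (z_i - y_i)$; a minimal NumPy sketch of one gradient-descent update with $\alpha = 0.1$:

import numpy as np

x = np.array([-1.0, -2.0, 2.0, 1.0])
y = np.array([0.0, 0.0, 1.0, 1.0])
w, b, alpha = 1.0, 1.0, 0.1

# z_i = w * x_i + b; L = (1/4) * sum((z_i - y_i)^2), so
# dL/dw = (1/2) * sum((z_i - y_i) * x_i) and dL/db = (1/2) * sum(z_i - y_i).
z = w * x + b
dw = 0.5 * np.sum((z - y) * x)
db = 0.5 * np.sum(z - y)

# One gradient-descent step: move against the gradient.
w -= alpha * dw
b -= alpha * db
print(w, b)           # 0.65 0.9

# The updated decision boundary solves w * x + b = 0.5.
print((0.5 - b) / w)  # about -0.615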