DATA7703 Assignment

Instructions. Please read the instructions carefully — not following them may result in a
penalty. (a) Submit your solutions as a single PDF file on Blackboard. Go to Assessment,
Assignment 3 to submit. If you don’t know how to convert your file to a PDF, please search
for a guide online. You can submit as many times as you want before the deadline. The last
submission will be graded. (b) Write down your name and student number on the first
page of your solution report, and write down the question numbers for your solutions.
For programming questions, you are welcome to submit your code files or output files in a
separate zip file, but you must include both your code and relevant output in your submitted
PDF file. Excessive code output may be penalised. (c) Follow integrity rules, and provide
citations as needed. You can discuss with your classmates, but you are required to write
your solutions independently, and specify who you have discussed with in your solution. If
you do not know how to solve a problem, you can get 15% of the mark by writing down “I
don’t know”.
You are encouraged to keep your solutions concise — these questions require thought, not
long answers.
1. (15 marks) We consider linear combinations of threshold classifiers for classifying real
numbers in this question. Specifically, a threshold classifier for classifying real numbers is
one of the form h(x) = I(x ∘ t), where ∘ can be ≤, ≥, < or >, and the indicator function
I(·) takes value +1 if its argument is true, and −1 otherwise. For example, f(x) = I(x ≤ 5)
can be written as

    f(x) = { +1, if x ≤ 5,
           { −1, if x > 5.

Given threshold classifiers h_1, . . . , h_n, we say that a linear combination f = α_1 h_1 + . . . + α_n h_n
of them represents a set S ⊆ R if and only if the classifier g(x) = I(f(x) ≥ 0)
satisfies g(x) = I(x ∈ S).
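To make the definitions above concrete, here is a small sketch (not a solution to (a) or (b)) that checks numerically whether a linear combination represents a given set. The combination f = 2·h_1 used here is arbitrary and chosen only for illustration.

```python
def I(cond):
    # Indicator as defined above: +1 if the condition holds, -1 otherwise.
    return 1 if cond else -1

h1 = lambda x: I(x > 0)       # threshold classifier h1(x) = I(x > 0)
f  = lambda x: 2 * h1(x)      # linear combination f = 2*h1
g  = lambda x: I(f(x) >= 0)   # induced classifier g(x) = I(f(x) >= 0)

# f represents S = (0, +inf): g(x) equals I(x in S) at every grid point.
grid = [x / 10 for x in range(-50, 51)]
assert all(g(x) == I(x > 0) for x in grid)
```

Note that g outputs −1 exactly when f(x) < 0, so scaling a single classifier by a positive constant does not change the set it represents.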
(a) (5 marks) For any a < b, consider the threshold classifiers h_1(x) = I(x > a), h_2(x) =
I(x < b), and h_3(x) = I(x < +∞). What is the set represented by f = h_1 + h_2 + 0.1 h_3?
(b) (10 marks) Find a linear combination of threshold classifiers to represent the two intervals
(−2, −1) ∪ (1, 2).
2. (20 marks) We consider a variant of the standard bagging algorithm, which we call Wagging
(Weighted Aggregating), in this question. We only consider regression problems here.
In the standard bagging algorithm for regression, we train multiple models f_1, . . . , f_n and
average their outputs during prediction. In Wagging, we instead compute a weighted average
of the predictions of the individual models. Formally, we assign a weight w_i ≥ 0 to f_i
with Σ_{i=1}^n w_i = 1. If ŷ_i is the prediction of f_i, then Wagging predicts ŷ = Σ_{i=1}^n w_i ŷ_i.
There are many possible ways to choose w_1, . . . , w_n in Wagging, and we consider how
their values affect bias and variance below.
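The Wagging prediction itself is a one-liner; the sketch below shows it with made-up base-model predictions, and checks that uniform weights recover standard bagging.

```python
import numpy as np

# Hypothetical base-model predictions (illustrative values only).
yhat = np.array([2.0, 3.0, 10.0])    # predictions of f1, f2, f3
w    = np.array([0.5, 0.3, 0.2])     # weights w_i >= 0 with sum(w) = 1
assert np.all(w >= 0) and np.isclose(w.sum(), 1.0)

y_wag = float(w @ yhat)              # Wagging: yhat = sum_i w_i * yhat_i
y_bag = float(np.mean(yhat))         # uniform weights w_i = 1/n give bagging
```

With uniform weights w_i = 1/n, Wagging reduces exactly to the standard bagging average.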
In the remainder of this question, assume that ŷ_1, . . . , ŷ_n are identically distributed with
Var(ŷ_i) = σ² for all i, and cov(ŷ_i, ŷ_j) = ρσ² for all 1 ≤ i ≠ j ≤ n.
(a) (5 marks) Show that Wagging has the same bias as each individual model.
(b) (5 marks) Express Var(ŷ) in terms of w_1, . . . , w_n, ρ, and σ². In addition, for n = 2,
use your formula to evaluate Var(ŷ) when (w_1, w_2) equals (1, 0), (1/2, 1/2) and (0, 1)
respectively.
(c) (10 marks) For any n ≥ 2, find weights w_1, . . . , w_n that minimize the variance of
ŷ = Σ_{i=1}^n w_i ŷ_i.
3. (30 marks) We consider random forests in this question, and study the effect of m, the size
of the random feature subset used in choosing the splitting point in the decision trees.
Recall that when constructing a decision tree in a random forest, at each node, instead of
choosing the best split from all p given features, we can first choose 1 ≤ m ≤ p features,
and then choose the best split among them.
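In sklearn, m corresponds to the max_features parameter of RandomForestRegressor. The sketch below shows the mechanics on a small synthetic regression problem, which stands in for the California housing data so the sketch runs standalone; for the assignment itself, load the real data with sklearn.datasets.fetch_california_housing. The feature count (10) and m = 3 here are arbitrary illustrations.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 10 arbitrary features, noisy linear target.
rng = np.random.RandomState(0)
X = rng.randn(400, 10)
y = X @ rng.randn(10) + 0.1 * rng.randn(400)

# Random 70/30 train-test split with a fixed seed for reproducibility.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

m = 3  # size of the random feature subset, passed as max_features
rf = RandomForestRegressor(n_estimators=100, max_features=m, random_state=0)
rf.fit(X_tr, y_tr)
print("train R^2:", rf.score(X_tr, y_tr))
print("test  R^2:", rf.score(X_te, y_te))

# Average pairwise Pearson correlation between the trees' test predictions:
# np.corrcoef gives a 100 x 100 matrix; average the off-diagonal entries.
preds = np.stack([tree.predict(X_te) for tree in rf.estimators_])
corr = np.corrcoef(preds)
avg_corr = (corr.sum() - len(corr)) / (len(corr) * (len(corr) - 1))
print("average pairwise correlation:", avg_corr)
```

Note that score() for a regressor returns R², which is the natural reading of "accuracy" for this regression task.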
(a) (5 marks) Load the California housing dataset provided in sklearn.datasets, and
construct a random 70/30 train-test split. Set the random seed to a number of your
choice to make the split reproducible. What is the value of p here?
(b) (5 marks) Train a random forest of 100 decision trees using default hyperparameters.
Report the training and test accuracies. What is the value of m used?
(c) (5 marks) Compute all the pairwise correlations between the test set predictions of
the 100 trees, and report their average. Correlation refers to the Pearson correlation
in this question.
(d) (5 marks) Repeat (b) and (c) for m = 1 to p, and tabulate the training and test
accuracies, and the average correlations for all m values. In addition, plot the training
and test accuracies against m in a single figure, and plot the average correlation
against m in another figure.
(e) (5 marks) Describe how the average correlation changes as m increases. Explain the
observed pattern.
(f) (5 marks) A data scientist claims that we should choose m such that the average
correlation is smallest, because it gives us the maximum reduction in variance, and thus
the maximum reduction in expected prediction error. True or false? Justify your
answer.
4. (35 marks) We consider bagging using the OOB error to automatically determine the
number of base models to use in this question. Specifically, we keep adding one model
trained on a bootstrap sample to the bagging ensemble at a time, and stop only when
one of the following happens: (a) the number of models n in the ensemble reaches a
pre-specified maximum number of models M, or (b) n is at least 10, and e_n ≥ e_{n−5},
where e_n denotes the OOB error of the ensemble with n models.
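The stopping logic can be sketched without any actual models by treating the OOB error as a given sequence. The error values below are made up, and reading condition (b) as "n ≥ 10 and e_n ≥ e_{n−5}" is an assumption, since the inequality is garbled in the source text.

```python
# Sketch of the stopping rule only: fake_oob_errors[k] plays the role of the
# OOB error after the (k+1)-th model is added. Improves for a while, then flattens.
M = 50  # pre-specified maximum ensemble size
fake_oob_errors = [1.0 / k for k in range(1, 13)] + [0.09] * 40

e = []  # OOB error history
n = 0   # current ensemble size
for err in fake_oob_errors:
    n += 1            # "add one model trained on a bootstrap sample"
    e.append(err)
    if n >= M:        # condition (a): hit the pre-specified cap M
        break
    if n >= 10 and e[-1] >= e[-6]:   # condition (b): e_n >= e_{n-5} (assumed reading)
        break
print("stopped with", n, "models")
```

With this fake sequence, the loop stops as soon as the OOB error has failed to improve over the last five additions, well before the cap M is reached.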