MATH38161 Multivariate Statistics and Machine Learning

Coursework

November 2024

Overview

The coursework is a data analysis project with a written report. You will apply skills and techniques acquired from Week 1 to Week 8 to analyse a subset of the FMNIST dataset.

In completing this coursework, you should primarily use the techniques and methods introduced during the course. The assessment will focus on your understanding and demonstration of these techniques in alignment with the learning outcomes, rather than the accuracy or exactness of the final results.

The project report will be marked out of 30. The marking scheme is detailed below.

You have twelve days to complete this coursework, with a total workload of approximately 10 hours (including preliminary coursework tasks).

Format

• Software: You should mainly use R to perform the data analysis. You may use built-in functions from R packages or implement the algorithms with your own code.

• Report: You may use any document preparation system of your choice, but the final document must be a single PDF in A4 format. Ensure that the text in the PDF is machine-readable.

• Content: Your report must include the complete analysis in a reproducible format, integrating the computer code, figures, and text in one document.

• Title Page: Show your full name and your University ID on the title page of your report.

• Length: Recommended length is 8 pages of content (single-sided) plus the title page. Maximum length is 10 pages of content plus the title page. Any content exceeding 10 pages will not be marked.

Submission process and deadline

• The deadline for submission is 11:59pm, Friday 29 November 2024.

• Submission is online on Blackboard (through Gradescope).

Academic Integrity and Use of AI Tools

This is an individual coursework. Your analysis and report must be completed independently, including all computer code. Note that according to University guidance, output generated by AI tools is considered work created by another person.

• Citations: Acknowledge all sources, including AI tools used to support text and code writing.

• Ethics: Use sources in an academically appropriate and ethical manner. Do not copy verbatim, and cite the original authors rather than second- or third-level sources.

• Accuracy: Be mindful that sources, including Wikipedia and AI tools, may contain non-obvious errors.

Copying and plagiarism (passing off someone else’s work as your own) are very serious offences and will be strictly prosecuted. For more details, see the “Guidance to students on plagiarism and other forms of academic malpractice” available at https://documents.manchester.ac.uk/display.aspx?DocID=2870 .

Coursework tasks

Analysis of the FMNIST data using principal component analysis (PCA) and Gaussian mixture models (GMMs)

The Fashion MNIST dataset contains 70,000 grayscale images of fashion products categorised into 10 distinct classes. More information is available on Wikipedia and GitHub.

The data set to be analysed in this coursework is a subset of the full FMNIST data and contains 10,000 images, each with dimensions of 28 by 28 pixels, resulting in a total of 784 pixels per image. Each pixel is represented by an integer value ranging from 0 to 255. You can download this data subset as “fmnist.rda” (7.4 MB) from Blackboard.

load("fmnist.rda") # load sampled FMNIST data set
dim(fmnist$x) # dimension of features data matrix (10000, 784)
## [1] 10000 784
range(fmnist$x) # range of feature values (0 to 255)
## [1] 0 255

Here is a plot of the first 15 images:

par(mfrow=c(3,5), mar=c(1,1,1,1)) # 3 by 5 grid of panels with narrow margins
for (k in 1:15) # first 15 images
{
  m = matrix( fmnist$x[k,] , nrow=28, byrow=TRUE) # row k as a 28 by 28 image
  image(t(apply(m, 2, rev)), col=grey(seq(1,0,length.out=256)), axes = FALSE)
}

Each sample is assigned one label, represented by an integer from 0 to 9 (stored as an R factor with 10 levels):

fmnist$label[1:15] # first 15 labels
## [1] 7 1 4 8 1 4 7 1 2 0 7 0 8 1 6
## Levels: 0 1 2 3 4 5 6 7 8 9

Task 1: Dimension reduction for FMNIST data using principal component analysis (PCA)

The following steps are suggested guidelines to help structure your analysis but are not meant as assignment-style questions. Integrate your work as part of a cohesive report with a logical narrative.

• Do some research to learn more about the FMNIST data.

• Compute the 784 principal components from the 784 original pixel variables.

• Compute and plot the proportion of variation attributed to each principal component.

• Create a scatter plot of the first two principal components. Use the known labels to colour the scatter plot.

• Construct the correlation loadings plot.

• Interpret and discuss the result.

• Save the first 10 principal components of all 10,000 images to a data file for Task 2.
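
The steps above can be sketched in base R as follows. This is only an illustration under stated assumptions, not a model answer: it runs on a small simulated matrix rather than on fmnist.rda, and the names X, grp, pca, prop_var, cor_load and pc10, as well as the output file name, are hypothetical. Whether to scale the variables, and how to handle pixels that are constant across all images (for which a correlation is undefined), are choices left to the analyst.

```r
# Simulated stand-in data (NOT the real FMNIST subset): 300 samples, 20 variables
set.seed(1)
n <- 300
d <- 20
grp <- factor(rep(0:2, each = n / 3))              # hypothetical class labels
centres <- matrix(rnorm(3 * d, sd = 3), nrow = 3)  # one mean vector per class
X <- centres[as.integer(grp), ] + matrix(rnorm(n * d), nrow = n)

# PCA on centred, unscaled data (i.e. based on the covariance matrix)
pca <- prcomp(X, center = TRUE, scale. = FALSE)

# Proportion of total variation attributed to each principal component
prop_var <- pca$sdev^2 / sum(pca$sdev^2)
plot(prop_var, type = "b",
     xlab = "Principal component", ylab = "Proportion of variance")

# Scatter plot of the first two principal components, coloured by label
plot(pca$x[, 1:2], col = as.integer(grp), pch = 20,
     xlab = "PC1", ylab = "PC2")

# Correlation loadings: correlation of each original variable with PC1 and PC2
# (a variable with zero variance would give NA here)
cor_load <- cor(X, pca$x[, 1:2])
plot(cor_load, xlim = c(-1, 1), ylim = c(-1, 1), asp = 1,
     xlab = "PC1", ylab = "PC2")
symbols(0, 0, circles = 1, inches = FALSE, add = TRUE)  # unit circle for reference

# Keep the first 10 principal component scores and save them for later use
pc10 <- pca$x[, 1:10]
save(pc10, grp, file = file.path(tempdir(), "pc10_sketch.rda"))
```

With the real data, X would be replaced by fmnist$x and grp by fmnist$label, and the file would be saved to the working directory rather than to a temporary folder.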

Task 2: Analysis of the FMNIST data set using Gaussian mixture models (GMMs)

Using all 784 pixel variables for cluster analysis is computationally impractical. In this task, use the 10 (or fewer) principal components instead of the original 784 pixel variables. Again, these steps serve as guidelines. Integrate this work into your report logically following from Task 1.

• Cluster the data using Gaussian mixture models (GMMs).

• Find out how many clusters can be identified.

• Interpret and discuss the results.
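
One possible way to approach these steps is sketched below. It assumes the mclust package is installed (the brief itself does not name a package) and uses simulated two-dimensional data in place of the saved principal component scores. The names Z, truth and fit are hypothetical. Choosing the number of components by BIC is one common criterion, not necessarily the one expected in the course.

```r
library(mclust)  # assumption: installed via install.packages("mclust")

# Simulated stand-in for principal component scores: three Gaussian clusters
set.seed(2)
n_per <- 100
Z <- rbind(
  cbind(rnorm(n_per, -5), rnorm(n_per, 0)),
  cbind(rnorm(n_per,  5), rnorm(n_per, 0)),
  cbind(rnorm(n_per,  0), rnorm(n_per, 8))
)
truth <- rep(1:3, each = n_per)  # known group membership of the simulated points

# Fit GMMs with 1 to 6 components; Mclust compares the candidate
# covariance structures and numbers of components by BIC
fit <- Mclust(Z, G = 1:6, verbose = FALSE)

fit$G          # number of components selected by BIC
fit$modelName  # selected covariance parameterisation
plot(fit, what = "BIC")  # BIC curves for all fitted models

# Compare the estimated clusters with the known groups
table(truth, cluster = fit$classification)
adjustedRandIndex(truth, fit$classification)
```

For the coursework data, Z would be replaced by the saved principal component scores and truth by the known labels. The number of clusters found need not equal the number of labelled classes, which is part of what the interpretation should address.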

Structure of the report

Your report should be structured into the following sections:

1. Dataset

2. Methods

3. Results and Discussion

4. References

In Section 1, provide some background and describe the data set. In Section 2, briefly introduce the method(s) you are using to analyse the data. In Section 3, run the analyses and present and interpret the results. Show all your R code so that your results are fully reproducible. In Section 4, list all journal articles, books, Wikipedia entries, GitHub pages and other sources you refer to in your report.

Marking scheme

The project report will be assessed out of 30 points based on the following rubrics.

Description of data (6 marks)

• Excellent (5-6 marks): Provides a clear and thorough overview of the FMNIST dataset, detailing the image structure, pixel data, and its context within multivariate analysis.

• Good (3-4 marks): Provides a clear overview of the dataset with some context; minor details may be missing.

• Adequate (1-2 marks): Basic description of the dataset with limited context; lacks important details.

• Insufficient (0 marks): Little to no description provided.

Description of methods (6 marks)

• Excellent (5-6 marks): Clearly and thoroughly explains PCA and GMMs, their purposes, and how they apply to this dataset.

• Good (3-4 marks): Provides a clear explanation of PCA and GMMs, with minor gaps in clarity or relevance.

• Adequate (1-2 marks): Basic explanation of methods with limited detail or relevance to the course techniques.

• Insufficient (0 marks): Lacks clear explanations of the methods.

Results and discussion (12 marks)

• Excellent (10-12 marks): Correctly applies PCA and GMMs, presents clear and informative visualisations, and provides a coherent and insightful interpretation of the results.

• Good (7-9 marks): Accurately applies PCA and GMMs with mostly clear visuals and reasonable interpretation; minor improvements needed.

• Adequate (4-6 marks): Basic application of techniques, limited or unclear visuals, minimal interpretation.

• Insufficient (0-3 marks): Incorrect application of techniques, with little to no interpretation.

Overall presentation of report (6 marks)

• Excellent (5-6 marks): Report is well-organised, clear, and professionally formatted, with a logical narrative and adherence to page limits.

• Good (3-4 marks): Report is generally clear and organised, with minor structural or formatting issues.

• Adequate (1-2 marks): Report lacks coherence or has significant formatting issues; may not meet all format requirements.

• Insufficient (0 marks): Report lacks structure and clarity, does not meet formatting requirements.
