Overview
The coursework is a data analysis project with a written report. You will apply the skills and techniques acquired from Week 1 to Week 8 to analyse a subset of the FMNIST dataset. In completing this coursework, you should primarily use the techniques and methods introduced during the course. The assessment will focus on your understanding and demonstration of these techniques in alignment with the learning outcomes, rather than on the accuracy or exactness of the final results. The project report will be marked out of 30; the marking scheme is detailed below. You have twelve days to complete this coursework, with a total workload of approximately 10 hours (including the preliminary coursework tasks).
Format
• Software: You should mainly use R to perform the data analysis. You may use built-in functions from R packages or implement the algorithms with your own code.
• Report: You may use any document preparation system of your choice, but the final document must be a single PDF in A4 format. Ensure that the text in the PDF is machine-readable.
• Content: Your report must include the complete analysis in a reproducible format, integrating the computer code, figures, text, etc. in one document.
• Title page: Show your full name and your University ID on the title page of your report.
• Length: The recommended length is 8 pages of content (single-sided) plus the title page. The maximum length is 10 pages of content plus the title page. Any content exceeding 10 pages will not be marked.
Submission process and deadline
• The deadline for submission is 11:59pm, Friday 29 November 2024.
• Submission is online on Blackboard (through Gradescope).

Academic Integrity and Use of AI Tools
This is an individual coursework. Your analysis and report must be completed independently, including all computer code. Note that, according to University guidance, output generated by AI tools is considered work created by another person.
• Citations: Acknowledge all sources, including AI tools used to support text and code writing.
• Ethics: Use sources in an academically appropriate and ethical manner. Do not copy verbatim, and cite the original authors rather than second- or third-level sources.
• Accuracy: Be mindful that sources, including Wikipedia and AI tools, may contain non-obvious errors.
Copying and plagiarism (i.e., passing off someone else’s work as your own) is a very serious offence and will be strictly prosecuted. For more details, see the “Guidance to students on plagiarism and other forms of academic malpractice” available at https://documents.manchester.ac.uk/display.aspx?DocID=2870.
Coursework tasks
Analysis of the FMNIST data using principal component analysis (PCA) and Gaussian mixture models (GMMs)
The Fashion MNIST dataset contains 70,000 grayscale images of fashion products categorised into 10 distinct classes. More information is available on Wikipedia and GitHub. The dataset to be analysed in this coursework is a subset of the full FMNIST data and contains 10,000 images, each of 28 by 28 pixels, giving a total of 784 pixels per image. Each pixel is represented by an integer value ranging from 0 to 255. You can download this data subset as “fmnist.rda” (7.4 MB) from Blackboard.

load("fmnist.rda")   # load sampled FMNIST data set
dim(fmnist$x)        # dimension of features data matrix (10000, 784)
## [1] 10000   784
range(fmnist$x)      # range of feature values (0 to 255)
## [1]   0 255

Here is a plot of the first 15 images:

par(mfrow = c(3, 5), mar = c(1, 1, 1, 1))
for (k in 1:15) {  # first 15 images
  # each row of fmnist$x stores one image row by row, so fill the matrix by row
  m <- matrix(fmnist$x[k, ], nrow = 28, byrow = TRUE)
  # flip and transpose so image() draws the picture upright; low values white, high values black
  image(t(apply(m, 2, rev)), col = grey(seq(1, 0, length.out = 256)), axes = FALSE)
}
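The plotting loop relies on two conversions: each 784-value row is reshaped row by row into a 28 × 28 matrix, and that matrix is then flipped vertically and transposed, because `image()` draws `z[i, j]` at x = i, y = j with the origin in the bottom-left corner. A minimal sketch of these steps as reusable helpers follows. The function names `row_to_image_matrix` and `plot_fmnist_image` are hypothetical, and since `fmnist.rda` is only available on Blackboard, a randomly generated stand-in matrix (not real FMNIST data) is used to check where pixels end up.

```r
# Hypothetical helper: reshape one 784-pixel row into the layout image() expects
row_to_image_matrix <- function(v, side = 28) {
  stopifnot(length(v) == side * side)
  m <- matrix(v, nrow = side, byrow = TRUE)  # pixels are stored row by row
  t(apply(m, 2, rev))                        # flip vertically, then transpose
}

# Hypothetical helper: draw one image; zlim fixes the grey scale so that
# 0 is always white and 255 is always black, whatever the image's own range
plot_fmnist_image <- function(v) {
  image(row_to_image_matrix(v), col = grey(seq(1, 0, length.out = 256)),
        zlim = c(0, 255), axes = FALSE)
}

# Synthetic stand-in for fmnist$x: random integers, NOT the real data
set.seed(1)
x_demo <- matrix(sample(0:255, 3 * 784, replace = TRUE), nrow = 3)

z <- row_to_image_matrix(x_demo[1, ])
stopifnot(
  all(dim(z) == c(28, 28)),
  z[1, 28] == x_demo[1, 1],    # top-left pixel is drawn at x = 1, y = 28 (top)
  z[28, 1] == x_demo[1, 784]   # bottom-right pixel is drawn at x = 28, y = 1
)
```

With the real data loaded, `plot_fmnist_image(fmnist$x[1, ])` should draw the first image. Fixing `zlim` is a small design choice that differs from the loop above, which lets `image()` scale each picture to its own range; the two give the same picture whenever an image contains both 0 and 255.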
Each image is assigned one label, represented by an integer from 0 to 9 (stored as an R factor with 10 levels):

fmnist$label[1:15]   # first 15 labels
## [1] 7 1 4 8 1 4 7 1 2 0 7 0 8 1 6
## Levels: 0 1 2 3 4 5 6 7 8 9
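Because the labels are stored as a factor, the class balance can be inspected with `table()`, and the integer codes can be mapped to readable names for plot titles or legends. The sketch below assumes that this subset keeps the standard Fashion-MNIST coding (0 = T-shirt/top through 9 = Ankle boot) and that the factor levels are ordered "0" to "9" as printed above. The vector `class_names` is introduced here for illustration, and a random stand-in factor replaces `fmnist$label`, so the counts it produces are not those of the real data.

```r
# Standard Fashion-MNIST class names for codes 0-9 (assumed to apply to this subset)
class_names <- c("0" = "T-shirt/top", "1" = "Trouser", "2" = "Pullover",
                 "3" = "Dress", "4" = "Coat", "5" = "Sandal",
                 "6" = "Shirt", "7" = "Sneaker", "8" = "Bag",
                 "9" = "Ankle boot")

# Synthetic stand-in for fmnist$label: random labels, NOT the real data
set.seed(2)
label_demo <- factor(sample(0:9, 100, replace = TRUE), levels = 0:9)

# Number of images per class (all 10 levels are listed, even if a count is 0)
counts <- table(label_demo)
print(counts)

# Map each factor level to its readable name via the level label
label_names <- unname(class_names[as.character(label_demo)])
head(data.frame(code = label_demo, name = label_names))

# Pitfall: as.integer() on a factor returns the level index (1 to 10), not the
# original code; go through as.character() to recover the codes 0 to 9
codes <- as.integer(as.character(label_demo))
stopifnot(all(as.integer(label_demo) == codes + 1))
```

Converting through `as.character()` keeps the mapping correct even if the level index and the code differ, which is an easy off-by-one mistake when labels are used as array indices.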