5011CEM Big Data Programming
Module Learning Outcomes Assessed:
B1: COMPUTATIONAL THINKING: develop and understand algorithms to solve problems; measure and optimise algorithm complexity; appreciate the limits of what may be done algorithmically in reasonable time or at all.
B2: PROGRAMMING: create working solutions to a variety of computational and real-world problems using multiple programming languages chosen as appropriate for the task.
B4: DATA SCIENCE: work with (potentially large) datasets; using appropriate storage technology; applying statistical analysis to draw meaningful conclusions; and using modern machine learning tools to discover hidden patterns.
B5: SOFTWARE DEVELOPMENT: develop a product from the initial stage of requirement / analysis all the way through development to its final stages of testing / evaluation.
B6: PROFESSIONAL PRACTICE: understand professional practices of the modern IT industry, which include technical practices (e.g. version control / automated testing) as well as social, ethical & legal responsibilities.
B7: TRANSFERABLE SKILLS: apply a wide variety of degree-level transferable skills including time management, team working, written and verbal presentation to both experts and non-experts, and critical reflection on own and others' work.
B8: ADVANCED WORK: apply the above to advanced topics selected according to the interests of individual students.
The report is graded out of 150 and contributes 10 credits towards the module. Resit marks are capped at 40%.
For detailed guidance on mark allocation, see the grading scheme below.
This is also available as a separate Excel document on Aula.
Your original submission has been graded and feedback has been provided. Using the written feedback, along with the marks for each part, you are required to improve your work before re-submitting it for the resit assessment. For convenience, the details are repeated below.
Please note that work which has not been improved may attract lower marks at the second submission.
Over the course of this module you have been introduced to a range of techniques that may be used for programming a big data project. This assessment allows you to pull together these techniques in a realistic scenario to complete a big data analysis project. Below is a realistic project scenario. By using the techniques presented during class you are to carry out the project and write a final project report for your client.
In line with real-world projects, where the client has rejected your work and requested improvements, work which is not improved in line with the feedback may be marked lower.
You have been approached by a client who analyses atmospheric science and climate model data. They have developed a new analysis technique, but it takes too long to run for them to use it. They have asked you to investigate the use of big data techniques to reduce the processing time.
They have a large volume of data to process, and the analysis needs to be repeated frequently. They have the following basic requirements:
You have been tasked with investigating the use of parallel processing to achieve the analysis speed required, with the following expectations: