5011CEM Big Data Programming


Module Learning Outcomes Assessed:

B1: COMPUTATIONAL THINKING: develop and understand algorithms to solve problems; measure and optimise algorithm complexity; appreciate the limits of what may be done algorithmically in reasonable time or at all.

B2: PROGRAMMING: create working solutions to a variety of computational and real-world problems using multiple programming languages chosen as appropriate for the task.

B4: DATA SCIENCE: work with (potentially large) datasets, using appropriate storage technology, applying statistical analysis to draw meaningful conclusions, and using modern machine learning tools to discover hidden patterns.

B5: SOFTWARE DEVELOPMENT: develop a product from the initial requirements and analysis stage, through development, to the final stages of testing and evaluation.

B6: PROFESSIONAL PRACTICE: understand the professional practices of the modern IT industry, which include not only technical practices (e.g. version control and automated testing) but also social, ethical and legal responsibilities.

B7: TRANSFERABLE SKILLS: apply a wide variety of degree-level transferable skills, including time management, team working, written and verbal presentation to both experts and non-experts, and critical reflection on one's own and others' work.

B8: ADVANCED WORK: apply the above to advanced topics selected according to the interests of individual students.

The report is graded out of 150 and contributes 10 credits towards the module. Resit marks are capped at 40%.

For detailed guidance on mark allocation, see the grading scheme below.

This is also available as a separate Excel document on Aula.

Your original submission has been graded and feedback provided. Taking into account the written feedback and the marks for each part, you are required to improve your work before resubmitting it for the resit assessment. For convenience, the details are repeated below.

Please note that work which has not been improved may attract lower marks at the second submission.

Over the course of this module you have been introduced to a range of techniques that may be used for programming a big data project. This assessment allows you to pull these techniques together in a realistic scenario to complete a big data analysis project. A realistic project scenario is given below. Using the techniques presented in class, you are to carry out the project and write a final project report for your client.

As in a real-world project where the client has rejected your work and requested improvements, work that is not improved in line with the feedback may be marked lower.

You have been approached by a client who analyses atmospheric science and climate model data. They have developed a new analysis technique, but it takes too long to run for them to use it. They have asked you to investigate the use of big data techniques to reduce the processing time.

They have a large volume of data to process, and the analysis needs to be repeated frequently. They have the following basic requirements:

  1. Current analysis time is approximately 2.5 hours to analyse the climate model output data for a 1-hour time period.
  2. The data for a single day of model output is approximately 250 MB. However, they have over 100 years’ worth of data to analyse, making a total of over 9 TB.
  3. Each day, they need to analyse the new data set for that day, so they wish to complete the analysis of the data for a 24-hour period (25 data sets) in under 2 hours.
  4. It is not possible to hold all of this data in memory at one time, so the new process should load only 1 hour of data at a time for processing. If parallel processing is used, 1 hour of data per worker can be loaded as needed.
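The requirements above imply a minimum degree of parallelism. The following minimal Python sketch works through that arithmetic, assuming perfectly linear speed-up with no communication or I/O overhead, which is an idealisation; real scaling will be worse. The constants come from the requirements, while the function names `min_processors` and `wall_time_dataset_level` are hypothetical.

```python
import math

# Figures quoted in the client's requirements.
HOURS_PER_DATASET = 2.5   # current sequential time to analyse one 1-hour data set
DATASETS_PER_DAY = 25     # data sets per 24-hour period
TARGET_HOURS = 2.0        # required wall-clock time for one day's analysis
MB_PER_DAY = 250          # approximate size of one day of model output
YEARS = 100               # length of the archive to analyse


def min_processors(work_hours: float, target_hours: float) -> int:
    """Lower bound on processors, assuming perfect linear speed-up and no overhead."""
    return math.ceil(work_hours / target_hours)


def wall_time_dataset_level(workers: int) -> float:
    """Wall-clock hours if each worker analyses whole data sets one after another."""
    rounds = math.ceil(DATASETS_PER_DAY / workers)
    return rounds * HOURS_PER_DATASET


sequential_hours = HOURS_PER_DATASET * DATASETS_PER_DAY  # 62.5 processor-hours per day
required_speedup = sequential_hours / TARGET_HOURS       # 31.25x
archive_tb = MB_PER_DAY * 365 * YEARS / 1_000_000        # decimal units, ignores leap days

print(f"sequential time per day: {sequential_hours} h")
print(f"required speed-up: {required_speedup:.2f}x")
print(f"minimum processors (ideal scaling): {min_processors(sequential_hours, TARGET_HOURS)}")
print(f"wall time with 25 workers, one data set each: {wall_time_dataset_level(25)} h")
print(f"archive size: {archive_tb:.1f} TB")
```

The last two lines make two points explicit. The quoted 9 TB is consistent with 250 MB per day over 100 years. And if each worker processes a whole hourly data set, the day's wall-clock time cannot fall below the 2.5 hours a single data set takes, so meeting the 2-hour target would also require splitting or speeding up the analysis within each hour.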

You have been tasked with investigating the use of parallel processing to achieve the analysis speed required, with the following expectations:

  1. Test and compare the processing speed of sequential and parallel processing.
  2. Extrapolate your findings to indicate the number of processors required to achieve the target processing time.
  3. Test how your code responds to common errors, e.g. data that is text instead of numeric, or NaN used in the data as an error code.
  4. Run automated tests that allow your client to set the tests running and return later to see the results, without user intervention.
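The four expectations can be sketched together in Python. This is a minimal, hypothetical sketch. `load_hour` fabricates synthetic values instead of reading the client's files, and `analyse_hour` stands in for the client's unpublished technique with a simple mean. The extrapolation uses Amdahl's law, a standard scaling model chosen here as an assumption rather than one the brief prescribes. Each call loads a single hour of data, in line with requirement 4.

```python
import math
import multiprocessing as mp
import random
import time

N_VALUES = 20_000  # hypothetical size of one synthetic "hour" of data


def load_hour(hour: int) -> list:
    """Hypothetical loader: fabricates one hour of values as strings so the sketch runs.

    A real loader would read a single hourly file from the client's archive.
    Two bad entries are injected to exercise the error handling.
    """
    rng = random.Random(hour)
    values = [str(rng.uniform(200.0, 320.0)) for _ in range(N_VALUES)]
    values[0] = "missing"  # text where a number is expected
    values[1] = "nan"      # NaN used as an error code
    return values


def clean(values: list) -> list:
    """Keep only numeric, non-NaN values; skip bad entries instead of crashing."""
    cleaned = []
    for v in values:
        try:
            x = float(v)
        except (TypeError, ValueError):
            continue  # text (or None) instead of numeric
        if math.isnan(x):
            continue  # NaN error code
        cleaned.append(x)
    return cleaned


def analyse_hour(hour: int) -> float:
    """Hypothetical placeholder for the client's analysis (here: a simple mean).

    Only one hour of data is held in memory per call, i.e. per worker.
    """
    data = clean(load_hour(hour))
    return sum(data) / len(data)


def time_run(hours: list, workers: int) -> float:
    """Wall-clock seconds to analyse all hours sequentially (workers=1) or with a process pool."""
    start = time.perf_counter()
    if workers == 1:
        results = [analyse_hour(h) for h in hours]
    else:
        with mp.Pool(processes=workers) as pool:
            results = pool.map(analyse_hour, hours, chunksize=1)
    assert len(results) == len(hours)
    return time.perf_counter() - start


def serial_fraction(t1: float, tp: float, p: int) -> float:
    """Amdahl's law, T(p) = T(1) * (s + (1 - s) / p), solved for s and clamped to [0, 1]."""
    s = (tp / t1 - 1 / p) / (1 - 1 / p)
    return min(max(s, 0.0), 1.0)


def processors_needed(t1: float, target: float, s: float) -> float:
    """Smallest n with T(1) * (s + (1 - s) / n) <= target, or inf if unreachable."""
    ratio = target / t1
    if ratio <= s:
        return math.inf
    return max(1, math.ceil((1 - s) / (ratio - s)))


def main() -> None:
    hours = list(range(25))                   # one day = 25 hourly data sets
    workers = max(2, min(4, mp.cpu_count()))  # at least 2 so the fit is defined
    t1 = time_run(hours, 1)
    tp = time_run(hours, workers)
    s = serial_fraction(t1, tp, workers)
    print(f"sequential: {t1:.2f} s; {workers} workers: {tp:.2f} s; "
          f"speed-up {t1 / tp:.2f}x; estimated serial fraction {s:.3f}")
    # Assumption: the serial fraction measured on this toy run carries over to the
    # real workload of 62.5 h per day with a 2 h target.
    print(f"processors needed for 2 h: {processors_needed(62.5, 2.0, s)}")


if __name__ == "__main__":
    main()
```

Under this model the 2-hour target is unreachable with any number of processors once the serial fraction reaches 2 / 62.5 = 3.2%, so the fit is worth repeating for several worker counts rather than one. Because `main()` needs no input, the script can be started unattended with its output redirected to a results file and checked later, which is one way to approach expectation 4. The checks on `clean` could equally be placed in a `unittest` suite.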

