PROJECT OVERVIEW
In this project, you are tasked with solving a supervised classification problem. The client has
provided 5,899 rows of labeled data, and your job is to create a model that predicts the correct
output label for each row.
PROJECT STRUCTURE
You will be using Python and Jupyter notebooks to complete the data science process for this project.
Feel free to include any third-party packages (through pip or easy_install). However, all work must be
shown in the Jupyter notebook.
PROJECT TIMELINE
You will have three (3) days to complete the project. By the end of the third day, you must email your
Jupyter notebook and any supporting files/documents to [email protected] and
[email protected]. Everything else you plan to use in your presentation (e.g. PowerPoint slides,
visualizations, graphs) must also be emailed to the team by the end of the third day. As an example,
if you receive the challenge on a Friday, it will be due by end of day
(11:59PM) CST on Sunday. Upon receiving the documentation, the EY team will work with you to schedule
a presentation meeting to discuss the results of the project.
PRESENTATION STRUCTURE AND DELIVERABLES
After scheduling the presentation meeting with the team, you will either be asked to come into the office
or be given details for a conference bridge where you can present your findings. You will have one (1) hour to
walk through your process and answer any questions that the team has for you. You will be able to plug
your laptop into an external monitor through HDMI. If you need any special accommodations (physical
or technical), please let us know ahead of time so we can make the proper arrangements. You can use
any presentation medium that you prefer (Jupyter Notebook, PowerPoint, Power BI, etc.) as long as you
are able to run it from your own laptop.
PROJECT EVALUATION
First and foremost, you will be evaluated on the overall model prediction accuracy on your test data.
Secondly, you will be evaluated on your process. We would like to see a wide variety of techniques
demonstrated so we can get a feel for the depth and breadth of your knowledge of Python and the data
science process. Bonus points for clean, interesting visualizations. Lastly, you will be evaluated on your
communication and presentation skills as you deliver your findings to the team. Remember, great data
scientists can tell a compelling story.
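As a point of reference, prediction accuracy is commonly computed as the fraction of test rows whose
predicted label matches the true label. The sketch below assumes that definition; the function name
accuracy and the sample labels are illustrative only and are not taken from the client data.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels, used only to illustrate the calculation.
example_true = ["a", "b", "a", "c"]
example_pred = ["a", "b", "c", "c"]

# Three of the four predictions match, so the accuracy is 0.75.
example_accuracy = accuracy(example_true, example_pred)
print(example_accuracy)
```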
PROJECT DETAILS
• Columns A through G are available for use as input data
• Column H is the classification label to be used for prediction
• Set aside 10% of the data set to serve as test data
• You may use any learning algorithm(s) of your choice to complete the project, but be prepared
to explain your choice
• You are free to use any data exploration and visualization techniques that you see fit
• This document is intentionally left vague; we want you to have as much freedom in solving the
challenge as possible. Use this as an opportunity to showcase your talent, but be prepared to
explain any and all decisions you make
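One possible way to set aside the 10% test split described above is sketched here using only the Python
standard library. The helper name holdout_split, the fixed seed, and the placeholder rows are assumptions
made for illustration; the real rows would be loaded from the client file, and a library routine such as
scikit-learn's train_test_split would serve equally well.

```python
import random


def holdout_split(rows, test_fraction=0.10, seed=42):
    """Shuffle row indices and hold out a fraction of the rows as test data."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_test = round(len(rows) * test_fraction)
    test_idx = set(indices[:n_test])
    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return train, test


# Placeholder rows standing in for the client data (assumption):
# seven input values (columns A through G) followed by one label (column H).
rows = [[i] * 7 + [i % 2] for i in range(5899)]

train_rows, test_rows = holdout_split(rows)

# Separate inputs (A through G) from the label (H) in each partition.
X_train = [r[:7] for r in train_rows]
y_train = [r[7] for r in train_rows]
X_test = [r[:7] for r in test_rows]
y_test = [r[7] for r in test_rows]

print(len(train_rows), len(test_rows))
```

With 5,899 rows, rounding 10% gives 590 test rows and 5,309 training rows. The test rows should stay
untouched until the final evaluation so that the reported accuracy reflects unseen data.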