DEPARTMENT OF COMPUTER SCIENCE
FITE7410 Financial Fraud Analytics
First Semester, 2024-2025
Assignment 1 – Exploratory Data Analysis (EDA)
(Due Date: 11 Oct, 2024 (Fri) 23:59)
Assessment Criteria:
·Plagiarism: Please follow the guidelines laid down by our department.
·You are allowed to discuss the assignment with your classmates; however, you should submit your individual work. Any direct copy and paste is PROHIBITED and will be considered PLAGIARISM.
·Assignments will be marked based on logic, presentation and understanding of the problem, not only on accuracy.
·LATE PENALTIES: 50% of assignment marks will be deducted for late submissions. 0 marks if the submission is more than 2 weeks late.
Objectives of this assignment:
·Perform data cleaning and preparation.
·Explore and visualize the data to identify patterns and trends.
·Engineer new features based on domain knowledge or insights from EDA.
·Prepare a report summarizing the findings from EDA.
Instructions of this assignment:
1.(50%) Exploratory Data Analysis
a.Use the provided dataset for the mini-case study.
Download the dataset (A1_data.csv) from Moodle, which is a modified version of the IEEE-CIS Fraud Dataset.
b.Using R, conduct an exploratory analysis of the downloaded dataset.
·Identify and handle missing values, outliers, and inconsistencies, if applicable.
·Explore the distribution of features (e.g. univariate, bi-/multivariate analysis) using histograms, box plots, scatter plots, correlation plots, etc.
·Create new features that may be relevant for fraud detection.
NOTE: A sample R script is provided, but you still need to complete the program. Alternatively, you can build the model yourself and use whatever library you like.
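For orientation only, the three tasks in 1(b) could be sketched in base R roughly as follows. This is not the provided sample script. The small data frame stands in for A1_data.csv, and the column names (TransactionAmt, ProductCD, isFraud), the median imputation, the 1.5 * IQR rule and the log-scaled feature are all illustrative assumptions; check them against the actual file and justify your own choices.

```r
# Toy stand-in for A1_data.csv. Column names are assumptions and may
# not match the real file; replace with read.csv("A1_data.csv").
df <- data.frame(
  TransactionAmt = c(25, 40, 35, NA, 30, 5000, 45, 20),
  ProductCD      = c("W", "C", "W", "W", NA, "C", "W", "W"),
  isFraud        = c(0, 0, 0, 0, 0, 1, 0, 1),
  stringsAsFactors = FALSE
)

# 1. Missing values: share of NA per column, then a simple imputation
#    (median for the numeric column, an explicit "Unknown" label otherwise).
na_share <- colMeans(is.na(df))
df$TransactionAmt[is.na(df$TransactionAmt)] <-
  median(df$TransactionAmt, na.rm = TRUE)
df$ProductCD[is.na(df$ProductCD)] <- "Unknown"

# 2. Outliers: flag (rather than drop) amounts outside the 1.5 * IQR fences,
#    since extreme amounts may themselves be a fraud signal.
q     <- quantile(df$TransactionAmt, c(0.25, 0.75))
iqr   <- IQR(df$TransactionAmt)
lower <- q[[1]] - 1.5 * iqr
upper <- q[[2]] + 1.5 * iqr
df$amt_outlier <- df$TransactionAmt < lower | df$TransactionAmt > upper

# 3. Engineered feature (hypothetical): log-scaled amount to reduce skew.
df$log_amt <- log1p(df$TransactionAmt)

# 4. Univariate and bivariate views, with labelled axes.
hist(df$log_amt,
     main = "Distribution of log(1 + amount)",
     xlab = "log(1 + TransactionAmt)")
boxplot(log_amt ~ isFraud, data = df,
        main = "Log amount by fraud label",
        xlab = "isFraud", ylab = "log(1 + TransactionAmt)")

# Fraud rate per product code.
fraud_by_product <- tapply(df$isFraud, df$ProductCD, mean)
print(na_share)
print(fraud_by_product)
```

On the real dataset, the same pattern would apply column by column; whether to impute, flag or drop is a judgment call to be explained in the report.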
2.(50%) Write a short report on the following:
a.Describe the dataset based on the EDA result, including:
·A description of the data cleaning and preparation process.
·Visualizations of the data, with clear labels and explanations.
·A discussion of the key findings from EDA, including insights and potential hypotheses.
·A description of the engineered features and their rationale.
NOTE: The short report should consist of a main body of maximum 2-3 pages, focusing on your analysis and insights. Additional figures and diagrams can be included in a separate Appendix to support your report.
3.Submission on Moodle:
a.The R script.
b.A PDF version of the report.
NOTE: Report submissions will be checked for similarity using Turnitin.