COMP30027 Machine Learning
Submission: Source code (in Python) and written responses
Marks: The Project will be marked out of 20, and will contribute 20% of your total mark.
This will be equally weighted between implementation and responses to the questions.
Groups: You may choose to form a group of 1 or 2.
Groups of 2 will respond to more questions, and will be expected to produce commensurately more implementation.
Overview
In this Project, you will implement a supervised naïve Bayes learner and evaluate it on a variety of supervised datasets. You will then use your observations to respond to some conceptual questions about naïve Bayes.
Naïve Bayes classifiers
There are some suggestions for implementing your learner in the “Naïve Bayes” and “Discrete and Continuous” lectures, but ultimately, the specifics of your implementation are up to you. Your implementation must be able to perform the following functions:
• preprocess() the data by reading it from a file and converting it into a useful format for
training and testing
• train() by calculating prior probabilities and likelihoods from the training data and using
these to build a naive Bayes model
• predict() classes for new items in a test dataset (for the purposes of this assignment, you
can re-use the training data as a test set)
• evaluate() the prediction performance by comparing your model’s class outputs to ground
truth labels
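To make the expected interface concrete, one possible skeleton is sketched below. This is illustrative only: the CSV reading, the assumption that the class label is the last column, and accuracy as the evaluation metric are example choices, not requirements.

    import csv

    def preprocess(filename):
        """Read a CSV file; return a list of instances and a list of labels.
        Assumes the class label is in the last column."""
        X, y = [], []
        with open(filename) as f:
            for row in csv.reader(f):
                if row:                  # skip blank lines
                    X.append(row[:-1])   # attribute values (still strings)
                    y.append(row[-1])    # class label
        return X, y

    def train(X, y):
        """Estimate priors and per-attribute likelihoods; return them as the model."""
        ...

    def predict(model, X):
        """Return the most probable class for each instance in X."""
        ...

    def evaluate(y_true, y_pred):
        """Return the proportion of predictions that match the ground truth."""
        correct = sum(t == p for t, p in zip(y_true, y_pred))
        return correct / len(y_true)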
Your implementation should be able to handle both nominal and numeric attribute types in the same dataset. You can assume numeric attributes are Gaussian-distributed. When handling discrete attributes, you should implement some type of smoothing to ensure the likelihoods are greater than zero. Your implementation should actually compute the priors, likelihoods, and posterior probabilities for the naïve Bayes model and may not simply call an existing implementation such as GaussianNB from scikit-learn.
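For illustration, these two cases might be handled with helpers like the following. This is a sketch, not a prescribed design: the add-alpha scheme shown is Laplace smoothing, and the arguments (counts, totals, number of attribute values) are quantities you would collect from your training data.

    import math

    def gaussian_log_pdf(x, mu, sigma):
        """Log of the Gaussian density N(mu, sigma^2) evaluated at x."""
        return -0.5 * math.log(2 * math.pi * sigma ** 2) \
               - (x - mu) ** 2 / (2 * sigma ** 2)

    def smoothed_likelihood(count, class_total, n_values, alpha=1.0):
        """Laplace (add-alpha) estimate of P(attribute value | class):
        all counts are offset by alpha, so no likelihood is ever zero."""
        return (count + alpha) / (class_total + alpha * n_values)

Working with log-probabilities, as in gaussian_log_pdf, also avoids numerical underflow when many likelihoods are multiplied together.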
Data
For this assignment, we have adapted some of the classification datasets available from the UCI machine learning repository (https://archive.ics.uci.edu/ml/index.html). In all of these datasets, the task is classification, but the attribute types vary:
Datasets with nominal attributes only:
• breast-cancer-wisconsin
• mushroom
• lymphography
Datasets with numeric attributes only:
• wdbc
• wine
Datasets with ordinal attributes only:
• car
• nursery
• somerville
Datasets with a mix of attribute types:
• adult
• bank
These datasets vary in terms of number of instances and number of classes, in addition to the
number and type of attributes. More information is provided in the README file included with the
datasets. You are not required to use all of these datasets in your submission; however, it is strongly recommended that you use multiple datasets to answer the questions below. Different datasets will produce different results, so if you only test your algorithm on one or two datasets, you may arrive at an incorrect conclusion based on too small a sample.
Questions
The following questions are designed to pique your curiosity when running your classifier(s) over the given datasets:
1. Try discretising the numeric attributes in these datasets and treating them as discrete variables in the naïve Bayes classifier. You can use a discretisation method of your choice and group the numeric values into any number of levels (around 3 to 5 levels would probably be a good starting point; see the first sketch after this list). Does discretising the variables improve classification performance, compared to the Gaussian naïve Bayes approach? Why or why not?
2. Implement a baseline model (e.g., random or 0R; see the second sketch after this list) and compare the performance of the naïve Bayes classifier to this baseline on multiple datasets. Discuss why the baseline performance varies across datasets, and to what extent the naïve Bayes classifier improves on the baseline performance.
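For question 1, a minimal equal-width binning helper might look like the following. This is a sketch assuming numpy; equal-frequency or any other discretisation scheme is equally valid.

    import numpy as np

    def equal_width_bins(values, n_bins=4):
        """Map each numeric value to a bin index in {0, ..., n_bins - 1}
        using equal-width intervals over the observed range."""
        values = np.asarray(values, dtype=float)
        edges = np.linspace(values.min(), values.max(), n_bins + 1)
        # digitize against the interior edges gives indices 0 .. n_bins - 1
        return np.digitize(values, edges[1:-1])

Note that bin edges should be computed from the training data and then re-used on the test data, so both are discretised consistently.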
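For question 2, a 0R (majority-class) baseline takes only a few lines. This sketch assumes the label and instance lists have the format used by the earlier skeleton.

    from collections import Counter

    def zero_r(y_train, X_test):
        """0R baseline: always predict the most common class in the training data."""
        majority = Counter(y_train).most_common(1)[0][0]
        return [majority] * len(X_test)

Because 0R's accuracy equals the relative frequency of the majority class, it makes a useful reference point when discussing how class imbalance affects the apparent performance of naïve Bayes on each dataset.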