COMP30027 Report – Book Rating Predictions
Anonymous
1. Introduction
The world of literature has largely migrated
from physical books to online platforms. These
platforms offer booklovers a treasure trove of
books, and the reviews left by thousands of
readers make it possible to study and analyse
important information such as book ratings,
descriptions and publishers.
In recent years, machine learning techniques
have been applied to predict book ratings,
which can assist authors, publishers and
marketers in identifying potential audiences
and tailoring marketing strategies to maximise
reader engagement.
The aim of this report is to analyse features
such as book titles, authors and descriptions,
and to build a supervised machine learning
model that predicts the rating of a book. The
names of authors, descriptions and titles will
be extracted for sentiment analysis. The project
is divided into sections that use attribute
correlations and sentiment analysis of the
‘Text’ (book names, authors, descriptions and
publishers), together with the language of the
books, to predict a book rating at one of three
levels: 3, 4 or 5. The report trains classifiers
using different techniques and analyses the
results with regard to the attributes.
2. Methodology
2.1 Data Pre-processing
Different features are given in the training and
testing csv files. Upon manual inspection, the
data contains unwanted stop words, words in
different languages, non-words, etc. In order to
enhance the performance of the classifier, the
pre-processing methods described below were
carried out.
2.1.1 Case-folding
The raw text features contain both upper-case
and lower-case letters. In this step, all
upper-case characters are converted to lower
case, so that tokens differing only in case are
treated as the same word.
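
As a minimal sketch of this step, the snippet below lower-cases a few made-up book titles with Python's built-in `str.lower()`. The function name `case_fold` and the sample titles are illustrative assumptions and are not taken from the project data or code.

```python
def case_fold(text: str) -> str:
    """Return the text with every upper-case character lower-cased."""
    return text.lower()


# Hypothetical sample titles, used only for illustration.
titles = ["The Great Gatsby", "PRIDE and Prejudice"]
folded = [case_fold(title) for title in titles]
print(folded)
```

If the features are held in a pandas DataFrame, the same effect could be obtained column-wise with `Series.str.lower()`.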
2.1.2 Removing punctuation and numbers
After case-folding, numerical values and
punctuation marks still remain in the text.
These numbers, symbols and non-ASCII characters
carry no meaning in the data, so they are
treated as low-value information and removed.
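
One possible way to carry out this step is sketched below with a regular expression. It assumes the text has already been case-folded and that only the ASCII letters a to z are worth keeping, so digits, punctuation and non-ASCII characters (including accented letters) are replaced by spaces. The helper name `strip_non_letters` and the sample string are hypothetical.

```python
import re

# Anything that is not a lower-case ASCII letter or whitespace.
_NON_LETTER = re.compile(r"[^a-z\s]")
# Runs of whitespace left behind after the substitution.
_WHITESPACE = re.compile(r"\s+")


def strip_non_letters(text: str) -> str:
    """Replace digits, punctuation and non-ASCII characters with spaces,
    then collapse repeated whitespace. Assumes lower-cased input."""
    cleaned = _NON_LETTER.sub(" ", text)
    return _WHITESPACE.sub(" ", cleaned).strip()


# Hypothetical sample string, used only for illustration.
example = strip_non_letters("harry potter, vol. 2: the café!")
print(example)
```

Replacing with a space rather than an empty string keeps words that were joined only by punctuation from being merged into a single token.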
2.1.3 Removing stop words
Common English stop words are removed as
these words do not convey any specific
meaning. Removing words that carry little
information reduces the size of the dataset,
and the training time is reduced as well
because fewer tokens are involved.
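
The sketch below illustrates stop-word removal on a whitespace-tokenised string. The tiny `STOP_WORDS` set is a hand-written assumption for demonstration only. The report does not say which stop list was used; a fuller list, such as the English list shipped with NLTK or scikit-learn's `ENGLISH_STOP_WORDS`, would normally take its place.

```python
# Illustrative stop list only; a real pipeline would use a fuller list.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "in", "to", "is"})


def remove_stop_words(text: str, stop_words=STOP_WORDS) -> str:
    """Drop every whitespace-separated token found in stop_words."""
    kept = [token for token in text.split() if token not in stop_words]
    return " ".join(kept)


# Hypothetical sample string, used only for illustration.
before = "the lord of the rings and the hobbit"
after = remove_stop_words(before)
print(after)
print(len(before.split()), "tokens before,", len(after.split()), "after")
```

The token count printed at the end shows the reduction in dataset size that the paragraph above refers to.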
