Internshippython
By
SCHOOL OF COMPUTING
SATHYABAMA
INSTITUTE OF SCIENCE AND TECHNOLOGY
(DEEMED TO BE UNIVERSITY)
Accredited with Grade “A” by NAAC | 12B Status by UGC | Approved by AICTE
JEPPIAAR NAGAR, RAJIV GANDHI SALAI,
CHENNAI - 600119
APRIL - 2023
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
BONAFIDE CERTIFICATE
This is to certify that this Project Report is the bonafide work of BAVISETTI
GOWTHAM (39110146) who carried out the Project Phase-2 entitled “BOOK
RECOMMENDATION SYSTEM” under my supervision from Jan 2023 to April 2023.
Internal Guide
Dr. M. D. ANTO PRAVEENA M.E., Ph.D.,
DECLARATION
DATE: 19-04-2023
ACKNOWLEDGEMENT
I convey my thanks to Dr. T. Sasikala M.E., Ph.D., Dean, School of Computing, and
Dr. L. Lakshmanan M.E., Ph.D., Head of the Department of Computer Science and
Engineering for providing me necessary support and details at the right time during
the progressive reviews.
I would like to express my sincere and deep sense of gratitude to my Project Guide
Dr. M. D. Anto Praveena, M.E., Ph.D., whose valuable guidance, suggestions and
constant encouragement paved the way for the successful completion of my phase-2
project work.
I wish to express my thanks to all Teaching and Non-teaching staff members of the
Department of Computer Science and Engineering who were helpful in many
ways for the completion of the project.
ABSTRACT
Today the World Wide Web provides users with a vast array of information, and commercial
activity on the Web has increased to the point where hundreds of new companies are adding
web pages daily. This has led to the problem of information overload. Recommender systems
have been developed to overcome this problem by providing recommendations that help
individual users identify content of interest by using the opinions of a community of users
and/or the user’s preferences.
The aim of this thesis was to design and evaluate different approaches for producing
personalised recommendations within the book domain. To achieve this goal, the project
first investigated existing recommender systems and profiling techniques. The next step was
to build users’ profiles by monitoring users’ behaviour, and develop three different
approaches for producing recommendations. Finally, an evaluation of the system
recommendations’ accuracy was done, by first conducting live user experiments and then
performing offline analysis to measure the recommendations’ accuracy using appropriate
methods for testing.
The system evaluation results show that the accuracy of the system recommendations is very
good and that a recommender system based on the combination of content-based and
collaborative filtering approaches provides more accurate recommendations for the book
domain.
TABLE OF CONTENTS
Chapter No. TITLE Page No.
ABSTRACT v
1 INTRODUCTION 1
2 REQUIREMENTS ANALYSIS 3
DES 7
4 DESCRIPTION OF PROPOSED SYSTEM 13
4.1 Selected Methodology or process model 13
4.2 Data sets 14
4.3 Architecture / Overall Design of Proposed System 18
4.4 Description of Software for Implementation and Testing plan of the Proposed Model/System 19
4.5 Project Management Plan 24
5 IMPLEMENTATION DETAILS 32
5.1 Development and Deployment Setup 25
5.2 Algorithms 26
5.3 Testing 26
6 RESULTS AND DISCUSSIONS 32
7 CONCLUSION 33
7.1 Conclusion 33
7.2 Future Work 33
REFERENCES 35
APPENDIX
A. SOURCE CODE
B. SCREENSHOTS
C. RESEARCH PAPER
LIST OF FIGURES
CHAPTER 1
INTRODUCTION
This is an experimental project that first designs and then evaluates different
approaches for offering recommendations to readers regarding books they may wish to
purchase, as part of an online bookshop website.
Today the World Wide Web provides access to a vast array of information through
web pages, as a result of the growth of the Internet. Also, commercial activity on the Web has
increased to the point where hundreds of new companies are adding Web pages daily. With
this increase in information sources, a problem of information overload occurs, in which the
users are trying to deal with an excess of information that is not useful to them as they try to
make sensible decisions (Losee, 1989). As a response to this problem, a range of tools to help
with retrieving, searching, and filtering have been developed.
The tool most widely used to alleviate the problem of information overload is the search
engine. The benefits for the users from search engine technology have decreased as the
number of web pages has grown. In addition, the user must first consider the large number of
search tools available and decide which one to access. Then the user must interact with each
one individually because search engines are typically not personalised to individual users or
their prevailing context. Users usually make a choice on the basis of their personal experience
or other people’s experience. Based on these facts, recommender systems have been
developed to provide recommendations that help individual users identify content of interest
by using the opinions of a community of users and/or the user’s preferences.
system’s recommendations. This is done with a clear explanation from the system, presented
in a way that is in keeping with the consumer’s preferences. A good recommender system
can significantly contribute to achieving the consumer’s acceptance of the system
recommendations.
Objectives:
Investigate and assess the profiling techniques and recommender systems that are
already in use.
Create a user profile for the recommender system by observing dynamic user
behaviour. The user profile needs to change to reflect the user's shifting interests.
Create a recommender system that uses a variety of computation methods.
Use appropriate methods to assess the accuracy of the system's recommendations.
Dept of ISE,SKIT 2
CHAPTER 2
REQUIREMENT ANALYSIS
The project will create and assess a collaborative filtering and content-based recommender
system for a real online bookstore. Machine learning methods are typically needed for
content-based recommendations in order to identify trends in the products customers like
(Middleton, 2003). The experiences of actual users will be reflected in the content-based
technology. Users' profiles will be created so that their behaviour may be tracked.
Additionally, the system will produce recommendations by comparing the contents of the
books in the user's profile with those that the user hasn't reviewed.
Monitor : LED.
Mouse : Logitech.
Hard Disk : 1 TB
Language : Python 3
2.2.3 Python:
Python is a high-level, interpreted, interactive and object-oriented scripting language.
Python is Interactive − You can actually sit at a Python prompt and interact with the
interpreter directly to write your programs.
Easy-to-read − Python code is more clearly defined and visible to the eyes.
A broad standard library − The bulk of Python's library is very portable and
cross-platform compatible on UNIX, Windows, and Macintosh.
Interactive Mode − Python has support for an interactive mode which allows
interactive testing and debugging of snippets of code.
Portable − Python can run on a wide variety of hardware platforms and has the same
interface on all platforms.
Extendable − You can add low-level modules to the Python interpreter. These
modules enable programmers to add to or customize their tools to be more efficient.
GUI Programming − Python supports GUI applications that can be created and
ported to many system calls, libraries and windows systems, such as Windows MFC,
Macintosh, and the X Window system of Unix.
Scalable − Python provides a better structure and support for large programs than shell
scripting.
Apart from the above-mentioned features, Python has a long list of good features; a few are
listed below −
It supports functional and structured programming methods as well as OOP.
It provides very high-level dynamic data types and supports dynamic type checking.
2.2.6 Getting Python
The most up-to-date source code, binaries, documentation, news, etc. are
available on the official website of Python, https://www.python.org.
Follow the link for the Windows installer python-XYZ.msi file, where XYZ is the
version you need to install.
To use this installer python-XYZ.msi, the Windows system must support Microsoft
Installer 2.0. Save the installer file to your local machine and then run it to find out
if your machine supports MSI.
Run the downloaded file. This brings up the Python install wizard, which is really
easy to use. Just accept the default settings, wait until the install is finished, and you
are done.
The Python language has many similarities to Perl, C, and Java. However, there are some
definite differences between the languages.
$ python
>>>
If you are running a newer version of Python, you need to use print with
parentheses, as in print("Hello, Python!"). In Python version 2.4.3 as well, this
produces the following result −
Hello, Python!
We assume that you have the Python interpreter set in the PATH variable. Now, try to run
this program as follows −
$ python test.py
Hello, Python!
CHAPTER 3
The application will be developed using the incremental development methodology and will
be made up of four increments: Front End, Learning module, Recommendation module and
Database increment. The requirements outlined in the Requirements Document will be
mapped to manageable increments.
Recommender systems have been developed to overcome the above-mentioned limitations
of searching through the massive volume of information available. Recommender systems,
in comparison with other filtering tools, require less experience on the part of the user and
less effort to specify their interests when querying and operating the system (Resnick and
Varian, 1997).
Recommender systems rely on different technologies for computing recommendations.
The most important approaches are content-based filtering and collaborative filtering.
Content-based filtering treats users as individuals, while recommender systems
employing the collaborative filtering approach treat the user as part of a group (Fasli,
2006). In addition, an advanced recommender system that combines content-based and
collaborative filtering, to avoid the limitations of each approach, is said to use a hybrid approach.
Description And Proposed System
approaches are performed on textual documents, such as web pages and articles. The textual
document can be easily broken down into individual words, unlike video and physical
resources, which require sophisticated analysis.
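Breaking a textual document into individual words is what makes content-based comparison straightforward. The following is a minimal sketch of that idea and not the project's implementation: it tokenises short book descriptions, weights the words with TF-IDF, and compares books with cosine similarity. The sample descriptions and the helper names (tokenize, tfidf_vectors, cosine) are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case a document and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Return one {word: tf-idf weight} dictionary per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return vectors


def cosine(vec_a, vec_b):
    """Cosine similarity between two sparse word-weight vectors."""
    dot = sum(weight * vec_b.get(word, 0.0) for word, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical book descriptions, invented for this example.
descriptions = [
    "A wizard school adventure with magic and friendship",
    "A young wizard learns magic at a hidden school",
    "A practical guide to vegetable gardening",
]
vectors = tfidf_vectors(descriptions)
sim_fantasy = cosine(vectors[0], vectors[1])    # two books on a similar topic
sim_unrelated = cosine(vectors[0], vectors[2])  # books with no informative words in common
```

In this toy data the first two descriptions share several informative words, so their similarity is higher than that of the unrelated pair. A word that occurs in every description receives a weight of zero and does not contribute.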
Data Set
During the last few decades, with the rise of YouTube, Amazon, Netflix and many other such
web services, recommender systems have taken up more and more space in our lives. From
e-commerce (suggesting to buyers articles that could interest them) to online advertisement
(suggesting to users the right content, matching their preferences), recommender systems are
today unavoidable in our daily online journeys.
In a very general way, recommender systems are algorithms aimed at suggesting relevant
items to users (items being movies to watch, text to read, products to buy or anything else
depending on industries).
Recommender systems are critical in some industries, as they can generate a huge
amount of income when they are efficient and can also be a way to stand out significantly from
competitors. As proof of the importance of recommender systems, we can mention that, a
few years ago, Netflix organised a challenge (the “Netflix Prize”) where the goal was to
produce a recommender system that performs better than its own algorithm, with a prize of 1
million dollars to win.
removed from the dataset. Moreover, some content-based information is given (Book-Title,
Book-Author, Year-Of-Publication, Publisher), obtained from Amazon Web Services. Note
that in case of several authors, only the first is provided. URLs linking to cover images are
also given, appearing in three different flavours (Image-URL-S, Image-URL-M,
Image-URL-L), i.e., small, medium, large. These URLs point to the Amazon web site.
Different forms for providing recommendations have been developed; they can be classified
into the following forms: attribute-based recommendations, item-to-item correlation,
people-to-people correlation and non-personalised recommendations (Konstan et al., 2001).
For more detailed descriptions.
Content-based filtering has some shortcomings in recommending items. A user's selection is
based on the subjective attributes (such as the quality) of the item (Goldberg et al., 1992); in
contrast, content-based approaches are based on objective attributes (such as the description
of an item). Also, some items the users may be interested in cannot be
recommended to them because content-based methods compare new items with the items
previously seen by the user, while the user's interests may be beyond the scope of the
previously seen items. Finally, multimedia technology such as sound, video or physical items
cannot be analysed automatically for relevant attribute information, due to limitations of
resources (Jennings et al., 2005).
The user-based algorithm is based on the assumption that each user belongs to a larger group
of similarly behaving individuals. It uses statistical techniques to find a set of users with
similar interests, known as neighbours, in the entire user-item database, and uses them to
generate a list of recommendations for the active user (Middleton, 2003).
Different measures of similarity that are based on neighbourhood algorithms are used to
compute the similarity between the active user and other users in the database, such as the
Pearson correlation coefficient and Mean squared differences
algorithms (Breese et al., 1998). Moreover, to predict the rating of an item given by the active
user, the ratings from the most similar users for the item are averaged and weighted by their
similarities to the active user. The Pearson Correlation (Fig 4.2) reflects the degree of linear
relationship between two variables and ranges from +1 to -1. A positive correlation means
that the two users have very similar tastes, while a
negative correlation indicates that the users have dissimilar tastes (Fasli, 2006). The Pearson
Correlation Coefficient method defines the similarity between two users by:
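As an illustration of the user-based procedure described above, the sketch below computes the Pearson correlation between two users over the books both have rated, and then predicts a rating as the similarity-weighted average of the neighbours' ratings. It is a minimal sketch under stated assumptions: the ratings are invented, the function names are hypothetical, and only positively correlated neighbours are used, which is one common simplification rather than the only possible choice.

```python
import math


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items rated by both users."""
    common = [item for item in ratings_a if item in ratings_b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(ratings_a[i] for i in common) / len(common)
    mean_b = sum(ratings_b[i] for i in common) / len(common)
    numerator = sum(
        (ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common
    )
    denom_a = math.sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common))
    denom_b = math.sqrt(sum((ratings_b[i] - mean_b) ** 2 for i in common))
    if denom_a == 0 or denom_b == 0:
        return 0.0  # no variation in the co-rated items: correlation undefined
    return numerator / (denom_a * denom_b)


def predict(active_user, item, all_ratings):
    """Similarity-weighted average of the neighbours' ratings for one item."""
    weighted_sum = 0.0
    weight_total = 0.0
    for user, ratings in all_ratings.items():
        if user == active_user or item not in ratings:
            continue
        similarity = pearson(all_ratings[active_user], ratings)
        if similarity > 0:  # keep only positively correlated neighbours
            weighted_sum += similarity * ratings[item]
            weight_total += similarity
    return weighted_sum / weight_total if weight_total else None


# Invented ratings on a 1-10 scale; user A has not rated book "z".
ratings = {
    "A": {"x": 9, "y": 3, "w": 6},
    "B": {"x": 8, "y": 2, "w": 5, "z": 9},
    "C": {"x": 2, "y": 9, "w": 5, "z": 3},
}
sim_ab = pearson(ratings["A"], ratings["B"])  # similar tastes
sim_ac = pearson(ratings["A"], ratings["C"])  # opposite tastes
predicted_z = predict("A", "z", ratings)
```

Here users A and B rate the shared books in the same pattern, so their correlation is close to +1, while A and C are negatively correlated. The prediction for book z therefore follows B's rating.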
Flask Framework:
Flask is a web application framework written in Python. It is developed by Armin Ronacher,
who leads an international group of Python enthusiasts named Pocoo. Flask is based on the
Werkzeug WSGI toolkit and the Jinja2 template engine. Both are Pocoo projects.
The HTTP protocol is the foundation of data communication on the World Wide Web. Different
methods of data retrieval from a specified URL are defined in this protocol.
By default, a Flask route responds to GET requests. However, this preference
can be altered by providing the methods argument to the route() decorator. In order to
demonstrate the use of the POST method in URL routing, first let us create an HTML form and
use the POST method to send form data to a URL.
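As a small illustration of the methods argument, the sketch below registers one route that accepts both GET and POST and reads a form field from the POST body. The route path /login, the field name nm, and the inline HTML form are assumptions chosen for this example and are not taken from the project.

```python
from flask import Flask, request
from markupsafe import escape

app = Flask(__name__)

# Minimal HTML form that submits its data with the POST method.
FORM_HTML = """
<form action="/login" method="post">
  <input type="text" name="nm">
  <input type="submit" value="Submit">
</form>
"""


@app.route("/login", methods=["GET", "POST"])
def login():
    if request.method == "POST":
        # Form fields sent with POST are available in request.form.
        name = request.form.get("nm", "")
        # escape() guards against HTML injection in the echoed value.
        return f"Welcome {escape(name)}"
    # A plain GET request simply shows the form.
    return FORM_HTML
```

Without the methods argument the same route would answer GET only, and a POST to it would be rejected with a 405 Method Not Allowed response.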
1 GET
2 HEAD
3 POST − Used to send HTML form data to the server. Data received by the POST method
is not cached by the server.
4 PUT
5 DELETE
Fig 3.4 depicts the project management plan as a flowchart of standard project management
practices and methodologies that are widely recognized and used globally. It covers essential
topics such as project initiation, planning, execution, monitoring, and closure, offering
detailed processes, tools, and techniques for each phase.
CHAPTER 4
IMPLEMENTATION DETAILS
A recommendation system helps an organization create loyal customers and build trust by
offering them the products and services for which they came to the site. Today's
recommendation systems are so powerful that they can also handle a new customer who is
visiting the site for the first time: they recommend the products that are currently trending
or highly rated, and they can also recommend the products that bring maximum profit to the
company.
Implementation
4.2 ALGORITHM
Dataset description
Our dataset consists of three files extracted from book-selling websites.
Books – The first file contains all the information related to the books, such as
author, title and publication year.
Users – The second file contains the registered users' information, such as user ID and
location.
Ratings – The third file records which user has given which rating to which book.
Based on these three files we can build a powerful collaborative filtering model.
Loading data
We start by importing the libraries and loading the datasets. While loading the files we run
into a few problems:
The values in the CSV file are separated by semicolons, not by commas.
Some lines are malformed, so pandas cannot parse them and raises an error.
The file is encoded in Latin-1 rather than UTF-8.
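The three loading problems listed above map onto three arguments of pandas.read_csv. The sketch below is only an illustration: it reads a small in-memory stand-in instead of the real file, the sample rows are invented, and on_bad_lines assumes pandas 1.3 or newer.

```python
import io

import pandas as pd

# Stand-in for a semicolon-separated file. The second data row is malformed
# (it has an extra field) and would normally make read_csv raise an error.
raw = (
    '"ISBN";"Book-Title";"Book-Author"\n'
    '"0001";"Caf\xe9 Stories";"A. Writer"\n'
    '"0002";"Broken";"Row";"extra"\n'
    '"0003";"Plain Title";"B. Writer"\n'
).encode("latin-1")

books = pd.read_csv(
    io.BytesIO(raw),      # with the real data this would be the file path
    sep=";",              # values are separated by semicolons
    encoding="latin-1",   # the file is not UTF-8 encoded
    on_bad_lines="skip",  # drop rows that cannot be parsed (pandas >= 1.3)
)
```

Skipping malformed rows loses a little data silently, so it is worth comparing the number of rows read with the number of lines in the file.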
Preprocessing Data: The books file has some extra columns that are not required for our
task, such as the image URLs, so we drop them. We also rename the columns of each file,
because the original column names contain spaces and uppercase letters; the corrected names
are easier to use.
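A minimal sketch of this preprocessing step is shown below. It uses a two-row stand-in for the books file. The ISBN column and the new short column names are assumptions made for the example, since the text does not list the names actually chosen.

```python
import pandas as pd

# Tiny stand-in for the books file (the ISBN column is assumed).
books = pd.DataFrame({
    "ISBN": ["0001", "0002"],
    "Book-Title": ["First Book", "Second Book"],
    "Book-Author": ["A. Writer", "B. Writer"],
    "Year-Of-Publication": [1999, 2003],
    "Publisher": ["P1", "P2"],
    "Image-URL-S": ["s1", "s2"],
    "Image-URL-M": ["m1", "m2"],
    "Image-URL-L": ["l1", "l2"],
})

# Drop the cover-image URL columns, which the model does not use.
books = books.drop(columns=["Image-URL-S", "Image-URL-M", "Image-URL-L"])

# Rename to short lower-case names (hypothetical choices).
books = books.rename(columns={
    "ISBN": "isbn",
    "Book-Title": "title",
    "Book-Author": "author",
    "Year-Of-Publication": "year",
    "Publisher": "publisher",
})
```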
The dataset is reliable and can be considered large. It contains data on 271,360 books, the
website has approximately 278,000 registered users, and together they have given about
11 lakh (1.1 million) ratings. Hence we can say that the dataset is adequate and reliable.
We do not only want to find a similarity between users or books. Suppose user A has read
and liked books x and y, and user B has also liked these two books. If user A then reads and
likes a book z that B has not read, we should recommend z to user B. This is what
collaborative filtering is.
This is achieved using Matrix Factorization: we create a matrix in which the columns are
users, the index is books and the values are ratings, that is, a pivot table. Taking all the
books and all the users for modelling would create a problem, so we have to decrease the
number of users and books. We cannot consider a user who has only registered on the website
or has read only one or two books; we cannot rely on such a user to recommend books to
others, because we have to extract knowledge from the data. We therefore keep only users
who have rated at least 200 books, and only those books that have received at least 50
ratings from users.
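The filtering and pivot steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the project's code: the column names user_id, title and rating are hypothetical, unrated cells are filled with 0, book counts are taken after the user filter, and cosine similarity between book rows is assumed as the similarity measure. The thresholds default to the 200 and 50 mentioned in the text and are lowered for the toy data.

```python
import numpy as np
import pandas as pd


def build_pivot(ratings, min_user_ratings=200, min_book_ratings=50):
    """Keep active users and well-rated books, then pivot to books x users."""
    user_counts = ratings.groupby("user_id")["rating"].count()
    active_users = user_counts[user_counts >= min_user_ratings].index
    filtered = ratings[ratings["user_id"].isin(active_users)]

    book_counts = filtered.groupby("title")["rating"].count()
    rated_books = book_counts[book_counts >= min_book_ratings].index
    filtered = filtered[filtered["title"].isin(rated_books)]

    pivot = filtered.pivot_table(index="title", columns="user_id", values="rating")
    return pivot.fillna(0)  # a missing rating is treated as 0


def cosine_similarity_matrix(pivot):
    """Cosine similarity between every pair of book rows."""
    values = pivot.to_numpy(dtype=float)
    norms = np.linalg.norm(values, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    unit = values / norms
    return unit @ unit.T


def recommend(title, pivot, similarity, top_n=5):
    """Titles most similar to the given one, excluding the book itself."""
    position = list(pivot.index).index(title)
    ranked = sorted(
        enumerate(similarity[position]), key=lambda pair: pair[1], reverse=True
    )
    return [pivot.index[i] for i, _ in ranked if i != position][:top_n]


# Invented toy ratings; user 4 and book "q" fall below the thresholds.
ratings = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 2, 3, 3, 4],
    "title": ["x", "y", "z", "x", "y", "z", "x", "y", "q"],
    "rating": [9, 8, 9, 8, 9, 8, 2, 9, 10],
})
pivot = build_pivot(ratings, min_user_ratings=2, min_book_ratings=2)
similarity = cosine_similarity_matrix(pivot)
suggestions = recommend("x", pivot, similarity, top_n=2)
```

Filtering first keeps the pivot table small enough to hold in memory, which is the practical reason for the thresholds.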
The primary goal of exploratory data analysis (EDA) is to support the analysis of data prior to making any conclusions.
It may aid in the detection of apparent errors, as well as a deeper understanding of data
patterns, the detection of outliers or anomalous events, and the discovery of interesting
relationships between variables.
Website Deployment
We use PyCharm Community Edition to deploy the website, by creating the project Book
Recommendation System.
Flask provides configuration and conventions, with sensible defaults, to get started. This
section of the documentation explains the different parts of the Flask framework and how they
can be used, customized, and extended. Beyond Flask itself, look for community-maintained
extensions to add even more functionality.
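from flask import Flask

def create_app():
    # Application factory: build and configure a new Flask instance.
    app = Flask(__name__)
    # `hello` is an extension object assumed to be defined elsewhere.
    hello.init_app(app)
    return app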
CHAPTER 5
Today the World Wide Web provides users with a vast array of information, and commercial
activity on the Web has increased to the point where hundreds of new companies are adding
web pages daily. This has led to the problem of information overload. Recommender systems
have been developed to overcome this problem by providing recommendations that help
individual users identify content of interest by using the opinions of a community of users
and/or the user’s preferences.
The aim of this thesis was to design and evaluate different approaches for producing
personalised recommendations within the book domain. To achieve this goal, the project
first investigated existing recommender systems and profiling techniques. The next step was
to build users’ profiles by monitoring users’ behaviour, and develop three different
approaches for producing recommendations. Finally, an evaluation of the system
recommendations’ accuracy was done, by first conducting live user experiments and then
performing offline analysis to measure the recommendations’ accuracy using appropriate
methods for testing.
The system evaluation results show that the accuracy of the system recommendations is very
good and that a recommender system based on the combination of content-based and
collaborative filtering approaches provides more accurate recommendations for the book
domain.
CHAPTER 6
CONCLUSION
6.1 CONCLUSION
All of our systems (purely content-based, purely collaborative-filtering, and hybrid)
performed quite well. Looking back on the project, one thing that we might have chosen to
do differently would have been to spend more time searching for a dataset of
ratings with a higher rating variance per user. Had we been able to find such a dataset, our
implementations of algorithms would have been tested on data that would have been more
representative of what a typical commercial recommendation system could access in creating
its predictions. However, given the data that was available to us, as well as the results our
various approaches produced, our systems were largely successful, providing insight into
how the different systems we regularly use work and the varying algorithms that make that
possible.
APPENDIX
A. SOURCE CODE
Fig A.1 shows the Flask implementation used for the website deployment.
from flask import Flask, render_template, request
import pickle
import numpy as np

# Load the precomputed artefacts saved during model building.
popular_df = pickle.load(open('popular.pkl', 'rb'))
pt = pickle.load(open('pt.pkl', 'rb'))
books = pickle.load(open('books.pkl', 'rb'))
similarity_scores = pickle.load(open('similarity_scores.pkl', 'rb'))

app = Flask(__name__)


@app.route('/')
def index():
    # Home page: list the most popular books.
    return render_template('index.html',
                           book_name=list(popular_df['Book-Title'].values),
                           author=list(popular_df['Book-Author'].values),
                           image=list(popular_df['Image-URL-M'].values),
                           votes=list(popular_df['num_ratings'].values),
                           rating=list(popular_df['avg_rating'].values)
                           )


@app.route('/recommendation')
def recommendation_ui():
    return render_template('recommendation.html')


@app.route('/recommend_books', methods=['POST'])
def recommend():
    user_input = request.form.get('user_input')
    # Row position of the requested title in the pivot table
    # (raises IndexError if the title is not present).
    index = np.where(pt.index == user_input)[0][0]
    # Five most similar books; the first entry is the book itself, so skip it.
    similar_items = sorted(list(enumerate(similarity_scores[index])),
                           key=lambda x: x[1], reverse=True)[1:6]
    data = []
    for i in similar_items:
        item = []
        temp_df = books[books['Book-Title'] == pt.index[i[0]]]
        temp_df = temp_df.drop_duplicates('Book-Title')
        item.extend(temp_df['Book-Title'].values)
        item.extend(temp_df['Book-Author'].values)
        item.extend(temp_df['Image-URL-M'].values)
        data.append(item)
    print(data)
    return render_template('recommendation.html', data=data)
B. SCREENSHOTS
First of all, we import the required libraries and datasets (Fig B.1).
In Fig B.2, the books dataset is merged with the ratings dataset to find the books with
the highest average rating.
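The merge step can be sketched as follows. This is a minimal illustration on invented data and not a reproduction of the screenshot: the User-ID and Book-Rating column names and the minimum-ratings cut-off are assumptions, while num_ratings and avg_rating follow the names used in the source code of Appendix A.

```python
import pandas as pd

# Invented stand-ins for the books and ratings files.
books = pd.DataFrame({
    "ISBN": ["1", "2", "3"],
    "Book-Title": ["Alpha", "Beta", "Gamma"],
})
ratings = pd.DataFrame({
    "User-ID": [10, 11, 10, 11, 12, 12],
    "ISBN": ["1", "1", "2", "2", "2", "3"],
    "Book-Rating": [8, 10, 5, 7, 6, 10],
})

# Attach the book title to every rating.
merged = ratings.merge(books, on="ISBN")

# Count the ratings and average them per book.
popular = (
    merged.groupby("Book-Title")["Book-Rating"]
    .agg(num_ratings="count", avg_rating="mean")
    .reset_index()
)

# Ignore books with too few ratings, then rank by average rating.
MIN_RATINGS = 2  # hypothetical cut-off for the toy data
popular = popular[popular["num_ratings"] >= MIN_RATINGS]
popular = popular.sort_values("avg_rating", ascending=False)
```

The cut-off matters because a book with a single high rating would otherwise outrank books that many readers have rated well.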
In the next step, data pre-processing is carried out to modify the data as required
(Fig B.3).