IR - MINIPROJECT Final
ON
“Develop Fake news detection system”
Submitted to
UNIVERSITY OF PUNE
In Partial Fulfilment of the Requirement for the Award of
Degree of Bachelor of Engineering (Computer Engineering)
AFFILIATED TO
UNIVERSITY OF PUNE
JSPM’s Jayawantrao Sawant College of Engineering
Department of Computer Engineering
Hadapsar, Pune-028.
CERTIFICATE
This is to certify that the mini project of Information Retrieval entitled “Develop Fake news detection system”
is a record of bonafide work carried out by them in partial fulfillment of the
requirement for the award of Degree of Bachelor of Engineering (Computer Engineering)
at JSPM’s Jayawantrao Sawant College of Engineering, Pune under the University of Pune.
This work was carried out during the academic year 2024-2025.
Date: 10/10/2024
TABLE OF CONTENTS
1 Introduction
1.1. Introduction
1.2. Motivation
1.3. Objectives
2 Methodology
3 Implementation
4 Conclusion
CHAPTER 1: INTRODUCTION
1.1. INTRODUCTION
These days, fake news causes a range of problems, from satirical articles to fabricated
news and planned government propaganda in some outlets. Fake news and a lack of
trust in the media are growing problems with huge ramifications for our society.
Obviously, a purposely misleading story is “fake news”, but lately the blathering
discourse of social media has been changing the term's definition. Some people now
use the term to dismiss facts that run counter to their preferred viewpoints.
1.2. MOTIVATION
We will train and test our models on labelled data: supervised learning means that
each training example comes with a known label (here, real or fake). With the
training and testing data and their labels, we can apply different machine learning
algorithms. Before computing predictions and accuracies, however, the data needs to
be preprocessed: null or unreadable values must be removed from the dataset, and
the text must be normalized, tokenized and converted into vectors so that it can be
processed by the machine. The next step is to use this data to produce visual reports,
which we obtain with Python's Matplotlib library together with scikit-learn.
Matplotlib lets us present the results as histograms, pie charts or bar charts.
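The preprocessing steps described above (dropping null records, normalizing, tokenizing and vectorizing) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the toy records and the helper names `drop_nulls`, `tokenize`, `build_vocabulary` and `to_count_vector` are hypothetical. In a scikit-learn pipeline, `CountVectorizer` (from `sklearn.feature_extraction.text`) performs similar tokenizing and counting, and pandas' `DataFrame.dropna` is commonly used to remove null rows.

```python
import re
from collections import Counter

# Hypothetical toy records: (text, label); None stands for a missing value.
records = [
    ("Breaking: Scientists CONFIRM the moon is made of cheese!", "FAKE"),
    (None, "REAL"),
    ("The city council approved the new budget on Monday.", "REAL"),
    ("Celebrity endorses miracle cure, doctors stunned", None),
]

def drop_nulls(rows):
    """Keep only rows where both the text and the label are present."""
    return [(text, label) for text, label in rows if text is not None and label is not None]

def tokenize(text):
    """Normalize to lowercase and split into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_vocabulary(token_lists):
    """Assign each distinct token a column index, in sorted order."""
    return {tok: i for i, tok in enumerate(sorted({t for toks in token_lists for t in toks}))}

def to_count_vector(tokens, vocab):
    """Turn a token list into a fixed-length vector of word counts."""
    counts = Counter(tokens)
    return [counts[tok] for tok in vocab]  # dict order matches the column indices

clean = drop_nulls(records)
token_lists = [tokenize(text) for text, _ in clean]
vocab = build_vocabulary(token_lists)
vectors = [to_count_vector(toks, vocab) for toks in token_lists]
print(len(clean), len(vocab))  # 2 16
```

Each article becomes a vector of the same length (one column per vocabulary word), which is the form the classifiers expect.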
1.3. OBJECTIVE
The objective of this project is to examine the problems and possible implications
related to the spread of fake news. We will work on different fake news datasets, to
which we will apply different machine learning algorithms, training on the data and
testing it to determine which news items are real and which are fake. Fake news is a
problem that heavily affects society and our perception not only of the media but also
of facts and opinions themselves. Artificial intelligence and machine learning can
help address this problem, because they let us mine patterns from the data to
maximize well-defined objectives. Our focus, therefore, is to find which machine
learning algorithm is best suited to which kind of text dataset, and which dataset
gives the most reliable accuracy figures, since accuracy depends directly on the type
and amount of data. In general, the more data there is, the more reliable the accuracy
estimate, because more examples are available for both training and testing.
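To make the training, testing and accuracy discussion concrete, here is a minimal sketch of a deterministic train/test split and the accuracy metric (correct predictions divided by total predictions). The function names `split_train_test` and `accuracy` and the toy labels are hypothetical, chosen only for illustration; scikit-learn provides `train_test_split` and `accuracy_score` for the same purpose. With a test set of n examples, each misclassification changes accuracy by 1/n, which is one reason larger test sets give more stable accuracy figures.

```python
import random

def split_train_test(items, test_fraction=0.25, seed=42):
    """Shuffle a copy of the items with a fixed seed and split it into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical labels and predictions, only to show the arithmetic.
y_true = ["FAKE", "REAL", "REAL", "FAKE"]
y_pred = ["FAKE", "REAL", "FAKE", "FAKE"]
print(accuracy(y_true, y_pred))  # 0.75
```

The fixed seed keeps the split reproducible, so different algorithms can be compared on exactly the same test examples.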
CHAPTER 2 : METHODOLOGY
EXISTING SYSTEM
There exists a large body of research on machine learning methods for deception
detection, most of which has focused on classifying online reviews and publicly
available social media posts. Particularly since the American presidential election in
late 2016, the question of determining ‘fake news’ has also received particular
attention in the literature. Conroy, Rubin, and Chen outline several approaches that
seem promising towards the aim of accurately classifying misleading articles. They
note that simple content-related n-grams and shallow part-of-speech tagging have
proven insufficient for the classification task, often failing to account for important
context information. Rather, these methods have been shown to be useful only in
tandem with more complex methods of analysis. Deep syntax analysis using
probabilistic context-free grammars (PCFGs) has been shown to be particularly
valuable in combination with n-gram methods. Feng, Banerjee, and Choi were able to
achieve 85% to 91% accuracy in deception-related classification tasks using online
review corpora.
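The shallow n-gram features discussed above can be illustrated with a short sketch that extracts contiguous word n-grams and counts them. It covers only the n-gram part; the PCFG-based deep syntax analysis is not sketched here. The function name `word_ngrams` and the example headline are hypothetical.

```python
from collections import Counter

def word_ngrams(tokens, n):
    """Return the contiguous word n-grams of a token list as tuples."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical headline, already lowercased and tokenized.
tokens = "shocking new study reveals shocking truth".split()
bigrams = word_ngrams(tokens, 2)
# Unigram and bigram counts together form a simple n-gram feature set.
features = Counter(word_ngrams(tokens, 1) + word_ngrams(tokens, 2))
print(bigrams[:2])  # [('shocking', 'new'), ('new', 'study')]
print(features[("shocking",)])  # 2
```

Such counts capture only local word order, which is why the literature above pairs them with deeper syntactic analysis.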
PROPOSED SYSTEM
In this project, a model is built on either a count vectorizer or a TF-IDF matrix (i.e.,
word tallies weighted relative to how often the words are used in other articles in the
dataset). Since this problem is a form of text classification, a Naive Bayes classifier is a
natural choice, as it is a standard technique for text-based processing. The main
decisions in developing the model were the text transformation (count vectorizer vs.
TF-IDF vectorizer) and the type of text to use (headlines vs. full text).
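The two text transformations and the Naive Bayes classifier described above can be sketched from scratch as below. This is a minimal illustration under stated assumptions, not the project's implementation: the helper names, the four toy training documents and their labels are hypothetical, and the TF-IDF weighting uses the smoothed IDF formula that scikit-learn's `TfidfVectorizer` applies by default, without its final L2 normalization. The classifier is a multinomial Naive Bayes with add-one (Laplace) smoothing; in scikit-learn, `MultinomialNB` plays this role.

```python
import math
from collections import Counter

def count_features(docs):
    """Raw term counts per document (the count-vectorizer representation)."""
    return [dict(Counter(doc)) for doc in docs]

def tfidf_features(docs):
    """Term counts scaled by smoothed IDF: ln((1 + N) / (1 + df)) + 1.

    Mirrors scikit-learn's default smooth-IDF formula, but skips its
    L2 row normalization to keep the sketch short.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]

class MultinomialNaiveBayes:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing over term weights."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for row in rows for t in row}
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        totals = {c: Counter() for c in self.classes}
        for row, label in zip(rows, labels):
            for t, w in row.items():
                totals[label][t] += w  # accumulate term weight per class
        self.log_lik = {}
        for c in self.classes:
            denom = sum(totals[c].values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {t: math.log((totals[c][t] + self.alpha) / denom) for t in self.vocab}
        return self

    def predict(self, row):
        """Return the class with the highest log posterior; unseen terms are ignored."""
        def score(c):
            return self.log_prior[c] + sum(
                w * self.log_lik[c][t] for t, w in row.items() if t in self.vocab
            )
        return max(self.classes, key=score)

# Hypothetical toy training data (tokenized headlines).
train_docs = [
    "shocking miracle cure doctors hate".split(),
    "aliens secretly control the government".split(),
    "council approves annual budget".split(),
    "government publishes annual report".split(),
]
train_labels = ["FAKE", "FAKE", "REAL", "REAL"]

count_model = MultinomialNaiveBayes().fit(count_features(train_docs), train_labels)
tfidf_model = MultinomialNaiveBayes().fit(tfidf_features(train_docs), train_labels)
query = {"miracle": 1, "cure": 1, "shocking": 1}
print(count_model.predict(query), tfidf_model.predict(query))  # FAKE FAKE
```

Because both transformations produce term-to-weight mappings, the same classifier can be trained on either one, and on headline tokens or full-text tokens, which mirrors the comparison the proposed system describes.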
SYSTEM ARCHITECTURE
CHAPTER 3 : IMPLEMENTATION
SOURCE CODE
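import sys
from PyQt5 import QtWidgets

# Tail of the preceding methods (Ui_Dialog and its handlers are defined earlier, not shown):
#     print(e.args[0])            # inside an exception handler: the error message
#     tb = sys.exc_info()[2]
#     print(tb.tb_lineno)         # line number reported by the traceback
#     event.accept()              # end of an event handler

if __name__ == "__main__":
    app = QtWidgets.QApplication(sys.argv)
    Dialog = QtWidgets.QDialog()
    ui = Ui_Dialog()
    ui.setupUi(Dialog)
    Dialog.show()
    sys.exit(app.exec_())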
B) SCREENSHOTS
Fig 5.2: Checking a Statement with the Dataset
Fig 5.3: Detecting Fake News Using the Dataset
CHAPTER 4 : CONCLUSION
In conclusion, the fake news detection system uses machine learning algorithms to assess news content and
social media reviews in order to identify misinformation. Such a tool helps promote informed decision-making
and combat the spread of fake news.