Fake News Proposal

The document proposes a fake news detection system using machine learning. It discusses requirements gathering, feasibility analysis, the system design and methodology. The goal is to classify news as real or fake and provide users with truthful information for free.

Uploaded by

Ishwor Nepal

Tribhuvan University

Institute of Science and Technology

A Proposal Report

On

“Fake News Detection”

Submitted to:

Department of Computer Science and Information Technology

Ambition College

Mid-Baneshwor, Kathmandu

In partial fulfillment of the requirements

For the Bachelor’s Degree in Computer Science and Information Technology

Submitted by:

Rupesh Thakuri (TU Roll No.: 23801/076)

Ishwor Nepal (TU Roll No.: 230788/076)

Rajesh Chaudhary (TU Roll No.: 23799/076)


Under the Supervision of

Mr. Ramesh Kumar Chaudhary

March 2024
TABLE OF CONTENTS
List of Figures
List of Abbreviations
1. INTRODUCTION
2. PROBLEM STATEMENT
3. OBJECTIVES
4. METHODOLOGY
4.1. Requirement Identification
4.1.1. Study of Existing System
4.1.2. Literature Review
4.1.3. Requirement Analysis
4.2. Feasibility Analysis
4.2.1. Technical Feasibility
4.2.2. Operational Feasibility
4.2.3. Economic Feasibility
4.2.4. Schedule Feasibility
4.3. High Level Design of System
4.3.1. Methodology of proposed system
4.3.2. Flow Chart
4.3.3. Working mechanism of proposed system
4.3.4. Description of algorithm
5. EXPECTED OUTCOME
REFERENCES

LIST OF FIGURES
Figure 4.1: Use case diagram of fake news detection system
Figure 4.2: Gantt Chart
Figure 4.3: Agile software development life cycle
Figure 4.4: Flowchart of the system
Figure 4.5: Working mechanism of the system

LIST OF ABBREVIATIONS

ANN : Artificial Neural Network

CPU : Central Processing Unit

DNN : Deep Neural Network

FNDS : Fake News Detection System

RFA : Random Forest Algorithm

LSTM : Long Short-Term Memory

UI : User interface

UX : User experience

1. INTRODUCTION
The Fake News Detection System (FNDS) is a system that helps determine whether a given
news item is true or fake. Nowadays, news reaches people from many different sources, and
much of it is hard to trust. This system therefore plays a vital role in assessing the
validity of news and in supporting its trustworthiness. It separates real news from fake
news, protecting people from believing wrong information that is circulated through
websites with motives of their own, such as misleading citizens or triggering sharp
economic changes, for example in share prices.

The system is trained on news that has already been validated through different sources and
published as datasets on websites such as Kaggle. Once the model has been trained on this
data, a user can check a news item through a website simply by entering its title. The
website is free to use, so anyone can access it to test news. It thus brings correct news
to people free of cost. Such systems are in constant evolution, yet they face challenges
because the landscape of misinformation keeps changing, which calls for a comprehensive
approach that unites technology and media. The main purpose of the system is to deliver
correct news to the user and to establish an environment free of corrupt or fake news.

To bring the website and the model together, a model built with a machine learning
algorithm (the Random Forest Algorithm) is embedded in a website built with front-end
frameworks such as Bootstrap or Tailwind and served with the help of Flask. The model is
trained with the RFA in Python (Jupyter) and tested for its accuracy. The algorithm reaches
its final decision by taking the majority decision of all its binary decision trees. The
website will provide a good user interface with a search area where the user can enter a
news title to check the validity of the news. This process entails a thorough analysis,
leveraging pre-trained models, extensive datasets, and sophisticated algorithms to ensure
precise assessments. The implementation combines machine learning techniques with a
user-centric web interface, aiming for both accessibility and dependability.

2. PROBLEM STATEMENT

This project asks whether it is possible to detect fake news through machine learning
models. Specifically, the aim of this project is to determine the ideal model that is
efficient in predicting fake news while also limiting the memory and storage cost of
computation. “Fake news” has become a prevalent problem in recent years.

3. OBJECTIVES
The project attempts to fulfill the following objectives:
• To classify a piece of news as real or fake.
• To provide true, valid news to the user free of cost.

4. METHODOLOGY
4.1. Requirement Identification
Requirement identification is the process of determining the resources and information that
are essential to the successful development of the system. Requirements such as cost,
technical functionality and operational functionality were assessed using different
feasibility tests. Based on the needs of the system, diagrams such as UML class, structure
and activity diagrams were prepared.

4.1.1. Study of Existing System


A research article titled “Exploiting Network Structure to Detect Fake News,” authored by
three students from Stanford University, introduces a Neural Network approach for
identifying fake news. Their approach involves considering not only the article-specific
aspects like title and content but also incorporates the social context to enhance prediction
accuracy. This innovative strategy offers an avenue for refining prediction accuracy without
relying solely on advancing natural language processing techniques. [1]

A separate research paper titled “Fake News Detection: A Deep Learning Approach” explored
the use of three distinct neural network models. The focus was on comparing these models,
which differ primarily in how they process the article's content and title. This comparison
highlights that the methodology used to handle the text of an article significantly impacts
a model's performance. This observation is logical, given that the content of an article is
typically the primary basis for judging its credibility, which emphasizes the importance of
text processing methodologies. [2]

4.1.2. Literature Review


One study proposes a Hybrid Deep Neural Network Model that incorporates both a Convolution
Dynamic Semantic Structural Model (C-DSSM) and Deep Convolutional Neural Networks (D-CNN).
The initial layers of the C-DSSM extract salient features, while the subsequent D-CNN layers
perform the categorization. The hybrid model leverages the strengths of both architectures
to improve performance and accuracy. In the reported experiments, the proposed model
achieved an accuracy of 92.60%, which indicates the efficacy of the hybrid model for
classification. [3]

Another study applies several algorithms, namely Naïve Bayes, logistic regression and Long
Short-Term Memory (LSTM), to detect fake news. The aim was to compare the outcomes of these
algorithms in identifying deceptive information. Among the algorithms examined, LSTM
achieved the highest accuracy, at 92.36%. [4]

A further study employs Bidirectional Encoder Representations from Transformers (BERT), a
sophisticated deep neural network model. Its performance potential improves significantly
when it handles large datasets. In this study, the BERT model achieved an accuracy of 52%
on the given dataset. [5]

The last study reviewed proposes SLD-CNN (Semi-Supervised Linear Discriminant Convolutional
Neural Network), which integrates a Convolutional Neural Network (CNN) with Semi-Supervised
Linear Discriminant Analysis (LDA). A CNN extracts features from text and image data, and
LDA then uses the extracted features to predict the class of unlabeled data. In the reported
evaluation, this hybrid approach achieved a precision of 95.6% and a recall of 96.7%. The
combined approach therefore appears to offer a robust solution that leverages the strengths
of both CNN and LDA. [6]

4.1.3. Requirement Analysis


Requirement analysis is a critical stage in the software development process in which the
project team collaborates with stakeholders to define and document the system's
requirements. The requirements of the system are described further through the following
use case diagram.

4.1.3.1. Use case Diagram


The use case diagram shows the interaction of the user with the system. It represents the
different activities and uses arrows to show the user's involvement in each activity. The
activities and user interactions are shown in the diagram below:

Figure 4.1: Use case diagram of fake news detection system

4.2. Feasibility Analysis


Feasibility analysis checks whether the system will run successfully and efficiently under
current conditions. It considers changes in the environment, the economic status, the
technical and operational feasibility of the system, and the schedule feasibility of the
parties involved in building, testing and using the system.

4.2.1. Technical Feasibility


The project stands as technically feasible, aligning with current technology standards
encompassing both hardware and software components. The outlined technical requirements
for this project include a laptop with a minimum of 4GB RAM equipped with GPU and a
high-speed internet connection. This application is compatible with most contemporary
personal computers, meeting the specified hardware and software prerequisites.

4.2.2. Operational Feasibility
This project can be executed with minimal human resources, as two developers are engaged
in the project, which surpasses the required manpower. The project's objective is to develop a
Fake News Detection system that identifies fake news within the provided dataset.

4.2.3. Economic Feasibility


The project's development proves to be cost-effective, leveraging open-source software such
as Python, which is freely accessible. Hosting services, available for free until a certain
storage threshold is reached, align with the project's current efficiency. Moreover, the
advantages offered by the project outweigh its costs, establishing its economic feasibility at
present.

4.2.4. Schedule Feasibility


In assessing scheduling feasibility, the organization estimates the time required to complete
the project. Upon exploring these aspects, the feasibility analysis identifies potential
constraints the proposed project might encounter, including internal constraints like
technical, technological, budgetary and resource-related limitations.

Figure 4.2: Gantt Chart


4.3. High Level Design of System
A high-level design is the overall architecture that shows the development phases of a system
from planning through to implementation and maintenance. It contains a brief description of
the services, system and platform used, and of the relationships between the modules and the
events and activities that take place in building the system. The software development life
cycle (SDLC) used to build the system is the Agile software development life cycle, whose
implementation is described below:

4.3.1. Methodology of proposed system


The agile software development process is a system development process in which the system
is developed quickly, in frequent collaboration with customers, and with the ability to adapt
to changes quickly. It values individuals and interactions over processes and tools. It takes
the working software itself as the reference rather than the documents, and changes are
introduced quickly when needed, which eases the development of the system.

Figure 4.3: Agile software development life cycle


a. Concept

The scope of the project is determined by identifying the most critical tasks that might
arise in the future development of the project. Based on past research and studies, the
progress plan and the features that users need most were planned. A feasibility test is
performed to determine the cost and system requirements for running the project effectively
and efficiently in the market.

b. Inception

The team members are brought together after the decisions have been made and the future
prospects and possible challenges have been considered. The team members need to build a
user interface mock-up and lay down the project architecture with regular guidance and
feedback from the supervisor and from interim users (for example, classmates or other team
members).

c. Iteration

This is the building phase, where system development starts. It takes up most of the
project time, and the designer works with the UI/UX developer on the layout of the
project. At the conclusion of the initial iteration or sprint, the objective is to establish
the fundamental functionality of the product. Subsequent iterations can then incorporate
additional features and adjustments. This phase is pivotal in Agile software development
as it enables developers to rapidly create functional software and make adaptations to
fulfill the client's requirements.

d. Release

At this point, the product is nearing its release. Here, the quality assurance team needs to
conduct various tests to verify the complete functionality of the software. Initially, the
team will perform system testing to ensure the readiness of the code for release. Crucially,
any potential bugs identified by testers will be promptly addressed by the developers.
Once all these tasks are completed, the product's final iteration will enter the production
phase.

e. Maintenance

As part of this stage, the software development team will offer continuous support to
ensure the proper functioning of the system and address any new issues. Additionally, the
team will deliver additional training to users and confirm their comprehension of the
product's usage. Developers may gradually introduce new iterations to enhance the
product with advanced features over time.

f. Retirement

A product reaches the retirement phase due to two primary reasons:

1. It is replaced by new software.

2. The system becomes outdated or incompatible with the organization over time.

During this phase, the software development team will initially inform the users about the
decommissioning of the software. Subsequently, if the company identifies a replacement,
users will transition to the new system. Finally, the developers will finalize any remaining
end-of-life tasks and cease support for the current product.

4.3.2. Flow Chart


A flowchart represents the sequence of activities or processes that take place while
executing or implementing the system. The flowchart of the FNDS consists of the different
implementation steps shown in the diagram below:

Figure 4.4: Flowchart of the system
4.3.3. Working mechanism of proposed system
The Fake News Detection System (FNDS) builds a model that examines a news item through its
title and determines whether the news is true or fake. The system uses the Random Forest
Algorithm (RFA), which uses binary decision trees to produce a result from different sets of
input data. The processes involved in the working of the system are as follows:

a. Data Gathering
In this phase the data is gathered from a trusted source, such as Kaggle. The data exists
in comma-separated values (.csv) format and is later used to train the model. It consists of
different attributes that describe each record. For the FNDS, attributes such as news title,
text, subject, date and label are appropriate. The label attribute is the critical one: it
holds the result, true or fake, and the model is trained against it.
b. Data preprocessing
Data preprocessing removes unneeded parts of the data, which helps to reduce the training
time of the model. Processes such as tokenization, lowercasing, stopword removal, and
lemmatization or stemming can be used for data preprocessing.
Tokenization: It is the process of breaking a collection of sentences down into an array of
words, in other words splitting a string or input text into a list of tokens. Tokens help in
understanding the context and interpreting the meaning of the text by analyzing the sequence
of words.
Make Lowercase: The same word can appear different only because of upper- and lower-case
letters. For example, ‘The’ and ‘the’ would be treated as different words because of the
difference in ‘T’. Hence, making all the words lowercase helps to remove this ambiguity.
Remove Stopwords: Stopwords are words that carry little specific meaning on their own, such
as the articles (a, an, the). Removing them helps to reduce the noise and the dimension of
the feature set.
Stemming and Lemmatization: Stemming normalizes a word to its base or root form. Sometimes
it reduces the word to a root form that has no meaning, which can cause problems. This is
solved by lemmatization, which groups the different inflected forms of a word under its
lemma and so produces words that have meaning.
For example, stemming converts the word ‘lazy’ to ‘lazi’, which has no meaning in the
English dictionary, whereas lemmatization produces the exact word ‘lazy’ for ‘lazy’.
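The tokenization, lowercasing and stopword-removal steps described above can be sketched in a few lines of Python. This is only a minimal illustration: the stopword list and the function names `tokenize` and `preprocess` are assumptions made for the example, and stemming or lemmatization is left out because it would normally come from a library such as NLTK.

```python
import re

# Assumption: a tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "in", "on", "to", "and"}


def tokenize(text):
    """Split a string into word tokens made of letters, digits and apostrophes."""
    return re.findall(r"[A-Za-z0-9']+", text)


def preprocess(title):
    """Lowercase a news title, tokenize it and drop the stopwords."""
    tokens = tokenize(title.lower())
    return [token for token in tokens if token not in STOPWORDS]


if __name__ == "__main__":
    print(preprocess("The Minister Announces a New Budget in Kathmandu"))
```

Lowercasing is done before the stopword check so that ‘The’ and ‘the’ are treated as the same word.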
c. Vectorization
Vectorization is the process in which the text data is converted to vectors that can later
be processed easily by the algorithm. It can be done using bag of words (Count Vectorizer)
or TF-IDF (term frequency - inverse document frequency). It converts the words into matrix
form, which helps in reducing dimensionality and in feature extraction. In the matrix thus
formed, each row corresponds to a document and each column corresponds to a word or term.
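As a rough illustration of the two vectorization options, the sketch below builds a bag-of-words matrix and a TF-IDF matrix by hand. It assumes the documents are already tokenized, and it uses the plain textbook weighting tf × log(N / df); library implementations typically apply smoothing and normalization, so their numbers will differ. All function names are hypothetical.

```python
import math


def build_vocabulary(docs):
    """Map each distinct token to a column index (sorted for a stable order)."""
    terms = sorted({token for doc in docs for token in doc})
    return {term: index for index, term in enumerate(terms)}


def count_vectors(docs, vocab):
    """Bag of words: one row per document, one column per term."""
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for token in doc:
            row[vocab[token]] += 1
        rows.append(row)
    return rows


def tfidf_vectors(docs, vocab):
    """TF-IDF with tf = raw count and idf = log(N / df) (assumed, unsmoothed variant)."""
    counts = count_vectors(docs, vocab)
    n_docs = len(docs)
    # df[j] = number of documents that contain term j
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    return [
        [row[j] * math.log(n_docs / df[j]) for j in range(len(vocab))]
        for row in counts
    ]


docs = [["budget", "new", "budget"], ["new", "minister"]]
vocab = build_vocabulary(docs)
counts = count_vectors(docs, vocab)
tfidf = tfidf_vectors(docs, vocab)
```

A term that occurs in every document, such as "new" here, gets a TF-IDF weight of zero, which is how the weighting plays down words that do not help to tell documents apart.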
d. Model Building
Before training the model, the data vectorized in the previous step is split into training
and test data. The main algorithm, i.e. the Random Forest Algorithm, is then applied to
build the model. This algorithm works on the basis of decision trees, where the decision of
the majority of the decision trees is taken as the final result or output. The combination
of decision trees makes the random forest.
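The split into training and test data might look like the following sketch. The 80/20 ratio, the fixed seed and the function name `train_test_split` are assumptions chosen for illustration, not values taken from the proposal.

```python
import random


def train_test_split(rows, labels, test_ratio=0.2, seed=42):
    """Shuffle the record indices and hold out a fraction of them for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(rows) * test_ratio))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (pick(rows, train_idx), pick(rows, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))


rows = [[i, i % 2] for i in range(10)]          # toy feature vectors
labels = ["fake" if i % 2 else "real" for i in range(10)]
x_train, x_test, y_train, y_test = train_test_split(rows, labels)
```

Rows and labels are selected with the same shuffled indices, so each feature vector stays paired with its own label.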
e. Model Evaluation
The correctness of the model is checked for possible errors. The accuracy score, confusion
matrix and classification report can be used to check the accuracy and the occurrence of
errors in the model building process. Here the confusion matrix is a 2x2 matrix in which the
off-diagonal entries C12 and C21 show the number of errors that occur while evaluating the
model.
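A small sketch of the 2x2 confusion matrix and the accuracy score is given below. The layout (rows are actual labels, columns are predicted labels, in the order fake, real) is an assumption for this example, and the label lists are made-up toy data.

```python
def confusion_matrix(actual, predicted, labels=("fake", "real")):
    """Return a 2x2 matrix with rows = actual label and columns = predicted label."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Share of correct predictions: the diagonal entries over all entries."""
    correct = matrix[0][0] + matrix[1][1]
    total = sum(sum(row) for row in matrix)
    return correct / total


# Toy labels, invented for illustration only.
actual = ["fake", "fake", "real", "real", "real"]
predicted = ["fake", "real", "real", "real", "fake"]
cm = confusion_matrix(actual, predicted)
score = accuracy(cm)
```

With this layout, `cm[0][1]` and `cm[1][0]` play the role of the off-diagonal error counts, and the diagonal holds the correct predictions.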
f. Model Deployment
The model is then attached to the website, which is the actual user interface where the
user tests the truthfulness of the news using the news title. The integration of the model
and the website is possible using Flask.
g. Prediction on client data
The system uses a prediction pipeline that performs all the data preprocessing steps and
exposes them as a single method. This method can be called with an input (the news title)
and returns the result, true or fake, for the news.

Figure 4.5: Working mechanism of the system


4.3.4. Description of algorithm
The system is implemented with the Random Forest Algorithm (RFA), which reaches its final
decision by taking the majority decision of all its binary decision trees. Hence, to
implement this algorithm, the decision tree must first be understood and implemented. The
combination of decision trees is what gives the random forest its name.

4.3.4.1. Decision Tree


Given the data set, with one attribute chosen as the target and the rest as candidate
attributes, a decision tree can be built in the following way:
Step 1: Choose a target attribute among the attributes of the data given to train the model.
Step 2: The information gain of the whole data set is found as:
\[ IG = I(P, N) = -\frac{P}{P+N}\log_2\frac{P}{P+N} - \frac{N}{P+N}\log_2\frac{N}{P+N} \]
where P and N are the numbers of records in the two classes of the target attribute.
Step 3: The entropy is then determined for each remaining attribute. One of the remaining
attributes will be the root of the decision tree.
\[ E(A) = \sum_{i=1}^{n} \frac{P_i + N_i}{P + N} \, I(P_i, N_i) \]
where P_i and N_i are the class counts for the i-th of the n values of attribute A.
Step 4: Finally the gain is determined as:
\[ Gain(A) = IG - E(A) \]
The attribute with the highest gain becomes the root node of the tree.
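The calculation in Steps 2 to 4 can be tried out numerically with the sketch below. It assumes the usual two-class form of I(P, N) with base-2 logarithms, and the attribute counts in the example are invented for illustration; the function names are hypothetical.

```python
import math


def info(p, n):
    """I(P, N): information of a set with p records of one class and n of the other."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:  # a class with zero records contributes nothing
            fraction = count / total
            result -= fraction * math.log2(fraction)
    return result


def attribute_entropy(groups):
    """E(A): weighted sum of I(P_i, N_i) over the values of attribute A."""
    total = sum(p + n for p, n in groups)
    return sum((p + n) / total * info(p, n) for p, n in groups)


def gain(groups):
    """Gain(A) = I(P, N) - E(A), with P and N summed over all values of A."""
    p = sum(group[0] for group in groups)
    n = sum(group[1] for group in groups)
    return info(p, n) - attribute_entropy(groups)


# Invented example: an attribute with two values,
# one covering 3 real / 1 fake records and the other 1 real / 3 fake.
example = [(3, 1), (1, 3)]
example_gain = gain(example)
```

An attribute whose values split the records perfectly, for example `[(4, 0), (0, 4)]`, reaches the maximum gain of 1 for a balanced two-class data set, so it would be preferred as the root node.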
4.3.4.2. Random Forest Algorithm
The random forest is built from binary decision trees. The system follows these steps to
implement the Random Forest Algorithm:
Step 1: The data given by the admin is considered to be the observed data.
Step 2: From the observed data set, a bootstrap data set is taken.
A bootstrap data set is a collection of records picked at random from the observed data set.
The same record from the observed data may be repeated more than once, or may not appear at
all, in the bootstrap data set. The less repetition, the better.
Step 3: A decision tree is then built from the data in the bootstrap data set.
While building the decision tree, a subset of the variables is used to make the node at each
step. From any two randomly selected variables (attributes), the one with the highest gain
is chosen as the root node.
Step 4: A random record is then taken, leaving its target value unknown.
Step 5: The sample is passed through the decision trees, which decide the value of the
target attribute.
Step 6: The majority decision made by all the decision trees is taken as the final value of
the target attribute.
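The steps above can be sketched with a deliberately simplified forest in which every tree is a one-level decision tree (a stump) over binary features. This is an assumption made to keep the example short, not the full algorithm: a real random forest grows deeper trees. The toy data, feature meanings and function names are all hypothetical.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Entropy of a list of class labels (0.0 for an empty list)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())


def gain(rows, labels, feature):
    """Reduction in entropy obtained by splitting on one binary feature."""
    parts = {0: [], 1: []}
    for row, label in zip(rows, labels):
        parts[row[feature]].append(label)
    remainder = sum(len(p) / len(labels) * entropy(p) for p in parts.values())
    return entropy(labels) - remainder


def train_stump(rows, labels, rng, n_candidates=2):
    """One-level tree: pick the best of a few randomly selected features."""
    candidates = rng.sample(range(len(rows[0])), n_candidates)
    best = max(candidates, key=lambda f: gain(rows, labels, f))
    default = Counter(labels).most_common(1)[0][0]
    leaves = {}
    for value in (0, 1):
        subset = [l for r, l in zip(rows, labels) if r[best] == value]
        leaves[value] = Counter(subset).most_common(1)[0][0] if subset else default
    return best, leaves


def train_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on its own bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(train_stump([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest


def predict(forest, row):
    """Majority vote over the predictions of all trees."""
    votes = Counter(leaves[row[feature]] for feature, leaves in forest)
    return votes.most_common(1)[0][0]


# Toy data: columns could stand for the presence of three hypothetical words.
rows = [[1, 0, 1], [1, 0, 0], [0, 1, 0], [0, 1, 0], [1, 0, 1], [0, 1, 0]]
labels = ["fake", "fake", "real", "real", "fake", "real"]
forest = train_forest(rows, labels)
```

Calling `predict(forest, [1, 0, 1])` passes the record through every stump and returns the label that most of them vote for, which mirrors Steps 5 and 6.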

5. EXPECTED OUTCOME
The system will consist of a website and a model saved in Python pickle (.pkl) format,
merged together to give the system both a user interface and machine learning ability. The
UI, i.e. the website, will contain a detection section where the user can enter a news title
to check whether the news is true or fake.

Furthermore, the system will be rechecked and updated with new data as it is listed on the
news source (Kaggle). This fits the chosen system development life cycle, the Agile
development life cycle, in which testing and maintenance with feedback continue even after
the system is completed. However, the system cannot be expected to be 100% accurate. Many
news items may not yet be available from verified sources, and the system will not be able
to judge the truthfulness of news that is not represented in the training data file (.csv),
since the model is not suited to training on a very large data set at once.

REFERENCES
[1] M. Rao, “Exploiting Network Structure to Detect Fake News”, Stanford University,
School of Computer Science, 2018.
[2] A. Thota, “Fake News Detection: A Deep Learning Approach”, SMU Data Science
Review, 2018.
[3] Roshan R. Karwa, Sunil R. Gupta, “Automated hybrid Deep Neural Network model for
fake news identification and classification in social networks”, Journal of Integrated
Science and Technology, Volume 10, No. 2, 2022.
[4] Sudhanshu Kumar, Thoudam Doren Singh, “Fake News detection on Hindi news
dataset”, Global Transitions Proceedings, 2022, doi:
https://doi.org/10.1016/j.gltp.2022.03.014
[5] Enjoy Maity, Ankush Tomar, Ruhi Peter, “Fake News Detection System: In Hindi
Data Set Using BERT”, International Research Journal of Modernization in
Engineering Technology and Science, Volume 4, May 2022.
[6] Reza Mansouri, Mahmood Naderan-Tahan, Mohammad Javad Rashti, “A Semi-
supervised Learning Method for Fake News Detection in Social Media”, Iranian
Conference on Electrical Engineering (ICEE), 2020, doi:
10.1109/ICEE50131.2020.9261053.
