
RV COLLEGE OF ENGINEERING
BENGALURU-59
(Autonomous Institution affiliated to VTU, Belagavi)

DEPARTMENT OF ELECTRONICS AND TELECOMMUNICATION ENGINEERING

MACHINE LEARNING (ET352IA)

Semester: V

Experiential Learning

On
“Fake News Detection Using Machine Learning”

Under the guidance of

Dr. K. Nagamani
Head of Department
Electronics and Telecommunication
R. V. College of Engineering

NAME USN
PRIYANKA N 1RV22ET035
MANOJ S H 1RV22ET026

2024-25
Table of Contents
1. Introduction
1.1 Overview
1.2 Objective
1.3 Scope of the Project

2. Literature Review
2.1 Existing Approaches to Fake News Detection
2.2 Related Works

3. Methodology
3.1 Dataset Description
3.2 Data Preprocessing
3.3 Machine Learning Models Used
3.4 Model Training and Testing

4. Implementation
4.1 Development Environment
4.2 Dataset Splitting
4.3 Model Training

5. Results and Discussion
5.1 Model Performance
5.2 Key Findings
5.3 Limitations

6. Conclusion and Future Work
6.1 Summary of Findings
6.2 Future Improvements

7. References

1. INTRODUCTION

1.1 Overview

Fake news has become a critical issue in the digital era, as information spreads
rapidly through social media, online news platforms, and messaging applications.
The widespread dissemination of false or misleading news can have significant
social, political, and economic consequences. Traditional verification relies on
human fact-checkers, which is time-consuming and cannot keep pace with the volume
of content published online. As a result, there is a growing need for automated
solutions that detect and classify fake news accurately. Machine learning and
Natural Language Processing (NLP) make this possible by learning the patterns that
distinguish real from fake news from large labeled datasets. This project explores
several machine learning algorithms to develop an effective fake news detection
model.

1.2 Objective

The primary objective of this project is to build a machine learning-based system
capable of classifying news articles as fake or real. This involves preprocessing
textual data, extracting relevant features, and training multiple classifiers to identify
deceptive content. The goal is to evaluate and compare different models, including
Logistic Regression, Decision Tree, Gradient Boosting, and Random Forest
classifiers, to determine the most effective approach. Additionally, a manual
testing function is implemented to allow real-time user input for fake news
classification. By automating fake news detection, this project aims to contribute to
reducing misinformation and enhancing media credibility.

1.3 Scope of the Project

This project focuses on detecting fake news articles based on textual content rather
than images or videos. The dataset used consists of labeled news articles
categorized as real or fake, so that supervised learning can be applied. The scope
includes data preprocessing, feature extraction using TF-IDF vectorization,
model training, evaluation, and performance comparison. The study does not
cover deep learning techniques but lays the foundation for future improvements
using LSTMs and Transformers. Furthermore, while the project evaluates
machine learning models, it does not address ethical concerns or the legal
implications of fake news detection. The long-term vision includes integrating the
system into a real-time web application to assist journalists, researchers, and the
general public in verifying news authenticity efficiently.
2. Literature Review
2.1 Existing Approaches to Fake News Detection

Fake news detection has been a growing area of research, with various approaches
developed to address the problem. Traditional methods involve manual fact-checking
by journalists and organizations such as PolitiFact and Snopes, but these
methods are time-consuming and unable to scale effectively. Automated detection
techniques can be broadly classified into linguistic-based, network-based, and
machine learning-based approaches.

Linguistic-based approaches analyze textual content by extracting features such as
sentiment, writing style, and lexical choices. Deceptive news articles often exhibit
exaggerated language, emotional bias, and misleading phrases. Network-based
approaches examine the spread of news across social media platforms by analyzing
user interactions, source credibility, and propagation patterns. Studies have shown
that fake news spreads faster than real news, making network analysis useful for
early detection.

Machine learning-based approaches have gained popularity due to their ability to
learn patterns from large datasets. These methods use Natural Language Processing
(NLP) and supervised learning algorithms to classify news articles. Techniques
such as TF-IDF vectorization, word embeddings (Word2Vec, GloVe), and deep
learning models (LSTMs, Transformers) have been explored for improved
accuracy. However, challenges such as data bias, evolving misinformation tactics,
and adversarial attacks remain areas of concern.

2.2 Related Works

Several studies have explored the use of machine learning for fake news detection.
Zhou et al. (2019) proposed a hybrid model combining TF-IDF features and deep
learning classifiers, achieving high accuracy in text classification tasks. Similarly,
Shu et al. (2020) introduced a fake news detection framework integrating textual
analysis with social network features, demonstrating improved performance by
leveraging propagation patterns.

Other research has focused on feature engineering to improve classification accuracy.
Wang (2017) introduced the LIAR dataset, which pairs short political statements with
metadata such as speaker identity and party affiliation that can be used to improve
detection. Ruchansky et al. (2017) proposed CSI, a hybrid model that combines article
text with patterns of user engagement. Meanwhile, Singh et al. (2021) compared multiple
machine learning models and concluded that ensemble methods such as Gradient Boosting
and Random Forest classifiers outperform traditional algorithms.
Despite these advances, existing models face challenges in generalization and real-time
detection. Many models perform well on specific datasets but struggle with unseen news
articles. Recent research emphasizes the need for explainable AI (XAI) techniques to
make fake news classification transparent, and ongoing studies are exploring
Transformer-based models such as BERT and GPT for better contextual understanding and
adaptability in detecting misinformation.

3. Methodology

3.1 Dataset Description

The dataset used for this project consists of labeled news articles categorized as
fake or real to facilitate supervised learning. The data is obtained from publicly
available repositories such as Kaggle and open-source fake news datasets, which
contain verified instances of misleading and authentic news. The dataset includes
various attributes such as:

➢ Title: The headline of the news article.
➢ Text: The full content of the article.
➢ Subject: The category of news (e.g., politics, world news, entertainment).
➢ Date: The publication date of the article.
➢ Class: A binary label indicating whether the news is fake (0) or real (1).

For model training, we combine the separate fake and real news datasets into a single
labeled dataset. After merging, the dataset is shuffled so that articles of the two
classes are interleaved, and then split into training and testing sets to assess model
performance.
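The merge-and-shuffle step can be sketched in a few lines of pandas. This is an illustrative sketch rather than the project's own code: the two small DataFrames are hypothetical stand-ins for the separate fake and real news files (normally loaded with `pd.read_csv`), and the `class` column name and the `random_state=42` seed are assumptions.

```python
import pandas as pd

# Hypothetical stand-ins for the separate fake and real news files
# (in practice each would be loaded with pd.read_csv).
fake_df = pd.DataFrame({
    "title": ["Miracle pill cures everything", "Aliens secretly run city hall"],
    "text": ["Doctors do not want you to know this.", "Anonymous insiders confirm it."],
    "subject": ["News", "politics"],
    "date": ["2017-01-01", "2017-01-02"],
})
real_df = pd.DataFrame({
    "title": ["Parliament passes annual budget"],
    "text": ["The budget was approved on Tuesday."],
    "subject": ["worldnews"],
    "date": ["2017-01-03"],
})

# Attach the binary label: fake = 0, real = 1.
fake_df["class"] = 0
real_df["class"] = 1

# Stack both sources into one table, then shuffle the rows so that the
# later train/test split does not receive all fake articles first.
news = pd.concat([fake_df, real_df], ignore_index=True)
news = news.sample(frac=1, random_state=42).reset_index(drop=True)

print(news["class"].value_counts())
```

Shuffling with a fixed `random_state` keeps the row order reproducible between runs.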

Data Sources

➢ Kaggle Fake News Dataset: A well-known dataset containing labeled fake and real news articles.
➢ LIAR Dataset: A dataset of short political statements, each labeled with one of six truthfulness ratings: pants-fire, false, barely-true, half-true, mostly-true, or true.
➢ Fake News Corpus: A large-scale dataset that includes fake news articles sourced from various unreliable websites.

3.2 Data Preprocessing

To ensure high-quality input data for machine learning models, several preprocessing
steps are performed on the text data:

3.2.1. Text Cleaning

Raw text often contains unnecessary elements such as punctuation, special characters,
and stopwords that do not contribute to classification accuracy. We apply the
following transformations:

a. Convert text to lowercase for uniformity.
b. Remove punctuation and special characters using regular expressions (RegEx).
c. Eliminate numerical values that may not provide meaningful insights.
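Steps a to c can be sketched with Python's `re` module. The function name `clean_text` and the exact patterns are assumptions for illustration; note that the character class keeps only ASCII letters, digits, and whitespace, so accented characters are dropped as well.

```python
import re


def clean_text(text: str) -> str:
    """Apply steps a-c: lowercase, strip punctuation/special characters, drop digits."""
    text = text.lower()                       # a. lowercase for uniformity
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # b. punctuation and special characters -> space
    text = re.sub(r"\d+", " ", text)          # c. numerical values -> space
    return re.sub(r"\s+", " ", text).strip()  # collapse the leftover whitespace


print(clean_text("BREAKING: 5 Shocking Facts!!! (Read now)"))
# breaking shocking facts read now
```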

3.2.2. Tokenization

Tokenization involves splitting the text into individual words or phrases (tokens) to
facilitate further processing. This step helps in analyzing word frequency and
extracting linguistic features.
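NLTK's `word_tokenize` is a common choice for this step, but it needs NLTK's Punkt tokenizer models to be downloaded first. The sketch below swaps in a plain regular-expression tokenizer instead; the function name `tokenize` is hypothetical, and this simpler approach treats anything that is not a letter as a separator.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    # Each maximal run of letters becomes one token; lowercasing first keeps
    # the output consistent even if the text was not cleaned beforehand.
    return re.findall(r"[a-z]+", text.lower())


print(tokenize("the senate passed the annual budget"))
# ['the', 'senate', 'passed', 'the', 'annual', 'budget']
```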

3.2.3. Stopword Removal

Commonly used words such as "the," "is," "and," and "in" do not contribute to the
classification of news as real or fake. We remove such stopwords using the Natural
Language Toolkit (NLTK) to enhance model efficiency.
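The report uses NLTK's stopword list, which must be downloaded before use. To keep the sketch self-contained, it swaps in scikit-learn's built-in `ENGLISH_STOP_WORDS` set; the two lists are not identical, so results can differ slightly from an NLTK-based pipeline. The helper `remove_stopwords` is hypothetical.

```python
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS


def remove_stopwords(tokens):
    # Keep only tokens that are not in scikit-learn's built-in English stop list.
    return [t for t in tokens if t not in ENGLISH_STOP_WORDS]


print(remove_stopwords(["the", "senate", "is", "passing", "a", "budget"]))
# ['senate', 'passing', 'budget']
```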

3.2.4. Lemmatization

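Lemmatization converts words into their root forms (lemmas) to reduce dimensionality
while preserving meaning. For example, when given the correct part-of-speech tag, the
WordNet Lemmatizer reduces the verb "running" to "run" and the adjective "better" to
"good"; without a tag it treats every word as a noun and leaves both unchanged.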

3.2.5. Feature Extraction using TF-IDF

To transform text into numerical features, we use Term Frequency-Inverse Document
Frequency (TF-IDF) vectorization. TF-IDF weights each word by how often it appears in
a given article (term frequency), scaled down according to how many articles in the
entire dataset contain it (inverse document frequency). Words that are frequent in one
article but rare across the dataset therefore receive high weights, while words that
appear in almost every article receive low weights. The vectorized text data is then
used as input for the machine learning models.
3.3 Machine Learning Models Used

We employ multiple machine learning algorithms to classify news articles and compare
their effectiveness.

1. Logistic Regression (LR)

Logistic Regression is a binary classification algorithm that applies the logistic
(sigmoid) function to a weighted sum of the input features to predict the probability
that an article is real or fake. It is efficient and interpretable, making it a strong
baseline model for text classification tasks.

2. Decision Tree Classifier (DT)

Decision Trees classify an article through a hierarchy of yes/no decisions, each
testing whether a single feature (here, the TF-IDF weight of one word) exceeds a
threshold. They are effective at capturing non-linear patterns but tend to overfit if
their depth is not limited or the tree is not pruned.

3. Gradient Boosting Classifier (GB)

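Gradient Boosting is an ensemble learning technique that builds many weak models
(typically shallow decision trees) sequentially, with each new tree fit to the remaining
errors (the negative gradient of the loss) of the ensemble built so far. Combining these
trees mainly reduces bias, and with a suitable learning rate and number of trees the
ensemble is often more accurate than standalone classifiers.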

4. Random Forest Classifier (RF)

Random Forest is an ensemble of Decision Trees, each trained on a bootstrap sample of
the data with a random subset of features considered at each split. Averaging the
predictions of these decorrelated trees reduces overfitting. It performs well in text
classification tasks by capturing complex feature interactions.

Each of these models is trained using the preprocessed dataset, and their
performance is evaluated based on accuracy, precision, recall, and F1-score.

3.4 Model Training and Testing

The dataset is split into training (75%) and testing (25%) subsets to evaluate model
performance. The following steps are carried out during training and testing:
3.4.1. Data Splitting

Using train_test_split from Scikit-learn, we divide the dataset:

a) Training Set (75%): Used for model training.
b) Testing Set (25%): Used for evaluating model generalization.
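With scikit-learn, the 75/25 split can be sketched as follows. The toy `texts` and `labels` are placeholders, and passing `stratify=labels` (which keeps the fake/real ratio the same in both subsets) and `random_state=42` are assumptions not stated in the report.

```python
from sklearn.model_selection import train_test_split

texts = [f"article {i}" for i in range(20)]  # placeholder articles
labels = [0] * 10 + [1] * 10                 # 0 = fake, 1 = real

x_train, x_test, y_train, y_test = train_test_split(
    texts, labels,
    test_size=0.25,   # 75 % training / 25 % testing, as described above
    stratify=labels,  # assumption: preserve the class ratio in both subsets
    random_state=42,  # fixed seed so the split is reproducible
)
print(len(x_train), len(x_test))  # 15 5
```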

3.4.2. Model Training

Each model is trained using the TF-IDF vectorized text data and labeled classes.
The training process involves:

a) Fitting the model to the training data.
b) Adjusting hyperparameters to optimize performance.
c) Comparing training accuracy with test accuracy to detect potential overfitting.
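The training loop can be sketched by wrapping the TF-IDF vectorizer and each classifier in a scikit-learn pipeline, so the vocabulary is learned from the training texts only. The toy corpus and all hyperparameter values below are illustrative assumptions, not the report's settings.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Toy training data (0 = fake, 1 = real); stands in for the preprocessed split.
x_train = [
    "shocking secret cure doctors hide", "aliens control the government secretly",
    "miracle pill melts fat overnight", "celebrity clone spotted at secret base",
    "parliament approves annual budget", "central bank holds interest rates",
    "minister announces new rail project", "court upholds election results",
]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]

# Hyperparameters are illustrative assumptions, not the report's settings.
models = {
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(max_depth=10, random_state=0),  # depth cap limits overfitting
    "GB": GradientBoostingClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
}

fitted = {}
for name, clf in models.items():
    # Each pipeline learns its own TF-IDF vocabulary from the training texts,
    # then fits the classifier on the resulting sparse feature matrix.
    pipe = make_pipeline(TfidfVectorizer(), clf)
    pipe.fit(x_train, y_train)
    fitted[name] = pipe
    train_acc = pipe.score(x_train, y_train)  # compare with test accuracy later
    print(f"{name}: training accuracy = {train_acc:.2f}")
```

Comparing these training scores with test scores from the same pipelines is one way to spot overfitting, for example a Decision Tree that fits the training data perfectly but scores noticeably lower on the test set.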

3.4.3. Model Evaluation

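After training, the models are tested on unseen data. We use classification metrics
such as:

a) Accuracy: The percentage of articles that are classified correctly.
b) Precision & Recall: For a given class (e.g., fake), precision is the fraction of
articles predicted as that class that truly belong to it, and recall is the fraction of
articles of that class that the model correctly identifies.
c) F1-Score: The harmonic mean of precision and recall, which balances the two.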
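A small worked example makes these definitions concrete. The labels below are hypothetical; the code computes precision, recall, F1, and accuracy by hand with fake (0) treated as the positive class, and then produces the same per-class figures with scikit-learn's `classification_report`.

```python
from sklearn.metrics import classification_report

# Hypothetical labels for ten test articles (0 = fake, 1 = real).
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

# Count outcomes with "fake" (0) treated as the positive class.
pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 0 and p == 0)  # fake predicted as fake: 3
fp = sum(1 for t, p in pairs if t == 1 and p == 0)  # real predicted as fake: 2
fn = sum(1 for t, p in pairs if t == 0 and p == 1)  # fake predicted as real: 1

precision = tp / (tp + fp)                                 # 3 / 5 = 0.60
recall = tp / (tp + fn)                                    # 3 / 4 = 0.75
f1 = 2 * precision * recall / (precision + recall)         # harmonic mean, about 0.667
accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)  # 7 / 10 = 0.70

# scikit-learn reports the same numbers (the "fake" row matches the values above).
report = classification_report(y_true, y_pred, target_names=["fake", "real"], output_dict=True)
print(classification_report(y_true, y_pred, target_names=["fake", "real"]))
```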

3.4.4. Manual Testing Function

To allow real-time classification, a manual testing function is implemented. The
function accepts a news article as input, applies the same text preprocessing and the
TF-IDF vectorizer fitted during training, and uses the trained models to predict its
authenticity. The results from Logistic Regression, Decision Tree, Gradient Boosting,
and Random Forest are displayed, allowing users to compare predictions across
multiple classifiers.
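A manual testing function of this kind might look like the sketch below. The names `manual_testing`, `clean_text`, and `LABELS`, the simplified cleaning step, and the two toy pipelines are hypothetical; the report's version uses all four trained models. Because each model is stored as a pipeline together with its fitted TF-IDF vectorizer, a new article is transformed with the same vocabulary that was learned during training.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

LABELS = {0: "Fake News", 1: "Real News"}  # 0 = fake, 1 = real, as in 3.1


def clean_text(text):
    # Same normalisation as training: lowercase, keep letters and spaces only.
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def manual_testing(news, models):
    """Clean one article and report every model's prediction (hypothetical helper)."""
    cleaned = clean_text(news)
    results = {}
    for name, model in models.items():
        pred = int(model.predict([cleaned])[0])
        results[name] = LABELS[pred]
        print(f"{name} prediction: {LABELS[pred]}")
    return results


# Toy pipelines so the sketch runs end to end.
train_texts = [
    "miracle pill cures every disease overnight",
    "aliens secretly control the government",
    "parliament approves the annual budget",
    "central bank keeps interest rates unchanged",
]
train_labels = [0, 0, 1, 1]
models = {
    "LR": make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000)),
    "DT": make_pipeline(TfidfVectorizer(), DecisionTreeClassifier(random_state=0)),
}
for pipe in models.values():
    pipe.fit([clean_text(t) for t in train_texts], train_labels)

manual_testing("Parliament approves the new budget!", models)
```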
4. Implementation
4.1 Development Environment

The development and implementation of the fake news detection system were
carried out in Jupyter Notebook, which provides an interactive computing
environment suitable for Python-based data science and machine learning tasks.
The primary software and tools used for implementing this project include the
following:

a. Programming Language: Python 3.x
b. Integrated Development Environment (IDE): Jupyter Notebook (running in Anaconda)
c. Libraries and Frameworks:
   • Pandas: Used for data manipulation and cleaning.
   • NumPy: Used for numerical computations and array handling.
   • Matplotlib and Seaborn: Used for data visualization.
   • Scikit-learn: Provides the machine learning algorithms, evaluation metrics, and
     utilities used in this project, including:
      ➢ TfidfVectorizer, which converts text data into a numerical format suitable for
        machine learning models.
      ➢ The Logistic Regression, Decision Tree, Random Forest, and Gradient Boosting
        classifiers used for classification.

4.2 Dataset Splitting

The dataset was divided into training and testing sets using the train_test_split
function from the scikit-learn library. The data was split in a ratio of 75% for
training and 25% for testing, ensuring that the model has enough data to learn from
while still holding out a separate test set for performance evaluation.
4.3 Model Training

5. Results and Discussion
5.1 Model Performance

The performance of the machine learning models was evaluated using various
metrics such as accuracy, precision, recall, and F1-score, all of which were
generated using the classification_report function from scikit-learn. These metrics
provide insight into how well each model is able to classify fake and real news
articles.

Logistic Regression:

Logistic Regression showed reasonable performance with a moderate accuracy rate. The
precision and recall for fake news were lower than for real news, suggesting that the
model struggles slightly with identifying fake news accurately.

Decision Tree Classifier:

The Decision Tree model demonstrated good accuracy, but it showed a tendency to
overfit, especially when the depth of the tree was large. This led to high accuracy
on the training data but lower performance on the test data.
Random Forest Classifier:

The Random Forest model, being an ensemble of decision trees, performed better
than the individual Decision Tree model. It achieved higher accuracy and a better
balance between precision and recall, indicating its ability to generalize better for
unseen data.

Gradient Boosting Classifier:

Gradient Boosting performed best, achieving the highest accuracy among all
models tested. It provided balanced precision and recall scores, making it the most
reliable model for this task.

Each model was also assessed on its ability to handle the imbalanced nature of the
dataset, in which fake news articles were fewer than real ones. The models'
performance in this regard varied, with Gradient Boosting being the most robust to
class imbalance.
5.2 Key Findings

Gradient Boosting outperforms other models: Among all the models tested, the
Gradient Boosting Classifier achieved the highest overall accuracy and the most
balanced precision-recall scores. This suggests that ensemble methods such as
Gradient Boosting are effective for this text classification problem.

Random Forest is a strong competitor: Although not as accurate as Gradient
Boosting, Random Forest Classifier performed very well, demonstrating the
strength of ensemble methods in dealing with complex datasets like news
classification.

Logistic Regression is quick but less accurate: Logistic Regression, being a simpler
linear model, was faster to train but performed worse than the ensemble methods. It
was less effective at distinguishing fake news, which suggests that more expressive,
non-linear models are better suited to this task.

Overfitting in Decision Trees: The Decision Tree Classifier showed signs of
overfitting, especially when the tree depth was not controlled. This led to high
variance and poor performance on the test data.

Text data pre-processing is crucial: The text pre-processing steps, including the use
of TF-IDF vectorization, were critical in ensuring the models received clean and
informative features for learning. Removing stopwords and normalizing the text
helped reduce noise and improved the overall performance.

5.3 Limitations

• Class Imbalance: Although techniques like Random Forest and Gradient
Boosting handle class imbalance better than others, the dataset still exhibited
an imbalance between real and fake news articles. This imbalance can affect
model performance, especially in terms of precision and recall for the
minority class (fake news).

• Data Quality: The quality of the dataset is a significant factor. Although the
dataset was sourced from reliable repositories, there could still be errors in
the labeling of news articles. Mislabeling can lead to inaccurate model
predictions.
• Model Interpretability: Complex models like Random Forest and Gradient
Boosting are often considered black-box models. This lack of interpretability
makes it difficult to understand why certain news articles are classified as
fake or real, which can be an issue for decision-making in real-world
applications.

• Limited Dataset: The size and diversity of the dataset used in this study may
not be representative of the vast range of news articles available globally.
This limitation can reduce the generalizability of the models to different
types of news sources and domains.

• Textual Features Only: The models were trained using only textual features
(the content of the news article). Future models could benefit from
incorporating additional features such as the source of the article, author
information, or social media signals, which may provide more context and
improve classification accuracy.

6. Conclusion and Future Work

6.1 Summary of Findings

In this project, we implemented and evaluated several machine learning models for
fake news detection. The models included Logistic Regression, Decision Tree
Classifier, Random Forest Classifier, and Gradient Boosting Classifier. Our key
findings from the experiments are summarized as follows:

• Gradient Boosting Classifier outperformed all other models in terms of
accuracy and balance between precision and recall. This model demonstrated
the best generalization to unseen data and was the most robust to class
imbalance.

• Random Forest Classifier also performed well, showing good accuracy and
providing a strong alternative to Gradient Boosting.

• Decision Tree Classifier exhibited overfitting, leading to high variance in
performance between the training and testing sets.

• Logistic Regression was the fastest to train but the least effective model,
with lower accuracy than the ensemble models.
• Text pre-processing, including the use of TF-IDF for feature extraction,
played a crucial role in improving model performance by ensuring that the
models received relevant information from the text data.

• The project highlights the effectiveness of ensemble methods in tackling text
classification tasks like fake news detection and underscores the importance
of a balanced dataset for accurate model performance.

6.2 Future Improvements

Incorporating Additional Features

Future models could benefit from integrating additional features beyond the text
content of the articles. Features such as author information, publication source,
article metadata, and social media signals (e.g., shares, likes, and comments) can
provide valuable context and improve classification performance.

Handling Class Imbalance More Effectively

While Gradient Boosting and Random Forest models handled class imbalance
better than other models, there is room for improvement in dealing with highly
imbalanced datasets. Future work could explore advanced techniques such as
SMOTE (Synthetic Minority Over-sampling Technique) or cost-sensitive learning
to further improve performance on the minority class (fake news).
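Cost-sensitive learning is the simpler of the two options to try, since scikit-learn classifiers such as `LogisticRegression` accept a `class_weight` argument. The sketch below uses hypothetical labels and a toy one-feature input to show the weights produced by the `"balanced"` setting. SMOTE is not shown; it is available separately in the imbalanced-learn package (`imblearn.over_sampling.SMOTE`).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical imbalanced labels: 2 fake (0) vs 6 real (1) articles.
y = np.array([0, 0, 1, 1, 1, 1, 1, 1])
X = np.array([[0.1], [0.2], [0.9], [1.0], [1.1], [1.2], [1.3], [1.4]])  # toy feature

# "balanced" weights are n_samples / (n_classes * count_per_class):
# fake -> 8 / (2 * 2) = 2.0, real -> 8 / (2 * 6) = 0.667 (rounded)
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y)
print(dict(zip([0, 1], weights)))

# Cost-sensitive training: an error on a fake article now costs three times
# as much in the loss as an error on a real article.
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)
```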

Exploring More Advanced Models

Incorporating more advanced models, such as Deep Learning (e.g., Recurrent
Neural Networks or Transformers), could improve the ability to capture complex
patterns in the text data. Pre-trained models like BERT or GPT could be fine-tuned
for the fake news detection task to enhance accuracy and provide better
generalization.

Model Interpretability
Given that the models used in this project, particularly Random Forest and
Gradient Boosting, are often considered "black-box" models, future work could
focus on improving model interpretability. Implementing techniques like LIME
(Local Interpretable Model-Agnostic Explanations) or SHAP (Shapley Additive
Explanations) could provide insights into how the models make predictions, which
is important in practical applications, especially in sensitive fields like news
classification.
Expanding the Dataset
The performance of the model can be further enhanced by expanding the dataset to
include a broader variety of news articles, covering more topics and regions.
Additionally, more diverse and balanced datasets can help improve the robustness
of the model, ensuring better generalization to real-world scenarios.

Real-time Detection
Incorporating real-time detection capabilities could enhance the practical
application of this system. Integrating the model with news aggregators or social
media platforms would allow for the identification of fake news as it is being
published, enabling quicker interventions.

Fake News Detection-1
No ratings yet
Fake News Detection-1
37 pages
Fake News Detection
100% (1)
Fake News Detection
25 pages
Fake News Detection Project Documentation
No ratings yet
Fake News Detection Project Documentation
16 pages
Fake News Detection Using Machine Learning - IEEE Conference Publication - IEEE Xplore
No ratings yet
Fake News Detection Using Machine Learning - IEEE Conference Publication - IEEE Xplore
8 pages
Fake News Detection
No ratings yet
Fake News Detection
21 pages
MAJOR PROJECT REPORT (1) - For Merge
No ratings yet
MAJOR PROJECT REPORT (1) - For Merge
46 pages
s134450 Fake News Detection Using Machine Learning
No ratings yet
s134450 Fake News Detection Using Machine Learning
91 pages
Ai Project
No ratings yet
Ai Project
16 pages
2023PCS2016 Report
No ratings yet
2023PCS2016 Report
16 pages
AI Phase5
No ratings yet
AI Phase5
26 pages
IR - MINIPROJECT Final
No ratings yet
IR - MINIPROJECT Final
15 pages
Fake News Report Preview
No ratings yet
Fake News Report Preview
5 pages
A Novel Technique To Detect The Fake News by
No ratings yet
A Novel Technique To Detect The Fake News by
52 pages
Fake News Final Report
No ratings yet
Fake News Final Report
29 pages
Ai Fake News Detection
No ratings yet
Ai Fake News Detection
3 pages
Fake News Detection With Different Model
No ratings yet
Fake News Detection With Different Model
15 pages
Fake News Mini PDF
No ratings yet
Fake News Mini PDF
12 pages
A Machine Learning Project Report
No ratings yet
A Machine Learning Project Report
12 pages
Fake News Detection
No ratings yet
Fake News Detection
5 pages
Masters Thesis Revised
No ratings yet
Masters Thesis Revised
4 pages
AI Phase2
No ratings yet
AI Phase2
6 pages
Project Synopsis Report Format
No ratings yet
Project Synopsis Report Format
9 pages
Project Documentation
No ratings yet
Project Documentation
44 pages
Machine Learning Techniques For The Classification of Fake News
No ratings yet
Machine Learning Techniques For The Classification of Fake News
5 pages
NLP 1
No ratings yet
NLP 1
3 pages
Fake News Paper2
No ratings yet
Fake News Paper2
6 pages
Final Synopsis-Major Abhilasha, Ananya
No ratings yet
Final Synopsis-Major Abhilasha, Ananya
10 pages
D13 Manuscript
No ratings yet
D13 Manuscript
12 pages
20SCSE1180073 Shreyansh.
No ratings yet
20SCSE1180073 Shreyansh.
21 pages
FYP Copy
No ratings yet
FYP Copy
42 pages
Fake News Detection PDF
No ratings yet
Fake News Detection PDF
10 pages
Mini Project
No ratings yet
Mini Project
24 pages
Fake News Detection Using Machine Learning12 2
No ratings yet
Fake News Detection Using Machine Learning12 2
65 pages
Psychodynamics and The Arts PDF
100% (1)
Psychodynamics and The Arts PDF
28 pages
Fake News Detection PPT 1
No ratings yet
Fake News Detection PPT 1
13 pages
Critical Reading As Looking For Ways of Thinking
No ratings yet
Critical Reading As Looking For Ways of Thinking
42 pages
A I Project Proposal
No ratings yet
A I Project Proposal
10 pages
Fake News Detectio3
No ratings yet
Fake News Detectio3
24 pages
02 Id Ego and Super Ego
No ratings yet
02 Id Ego and Super Ego
11 pages
Fake News Detection2
No ratings yet
Fake News Detection2
12 pages
Report Se
No ratings yet
Report Se
4 pages
The Main Objective Is To Detect The Fake News, Which Is A Classic Text Classification
No ratings yet
The Main Objective Is To Detect The Fake News, Which Is A Classic Text Classification
57 pages
Fake News Detection
No ratings yet
Fake News Detection
9 pages
Fake News - 01
No ratings yet
Fake News - 01
5 pages
JPNR 2022 04 140
No ratings yet
JPNR 2022 04 140
7 pages
ML PPT
No ratings yet
ML PPT
16 pages
Fake News Detection Overview
No ratings yet
Fake News Detection Overview
16 pages
Tarp Rev3
No ratings yet
Tarp Rev3
32 pages
Review Paper
No ratings yet
Review Paper
7 pages
Fake News Detection
No ratings yet
Fake News Detection
24 pages
(NetCrypt) Review Paper
No ratings yet
(NetCrypt) Review Paper
7 pages
Synopsis
No ratings yet
Synopsis
5 pages
(NetCrypt) Review Paper PDF
No ratings yet
(NetCrypt) Review Paper PDF
5 pages
Aiml Project Report
No ratings yet
Aiml Project Report
46 pages
Face Mask Detection Using Deep Learning
No ratings yet
Face Mask Detection Using Deep Learning
31 pages
Final Year of Computer Engineering 2022-23 Semester VII Project Synopsis
No ratings yet
Final Year of Computer Engineering 2022-23 Semester VII Project Synopsis
11 pages
Fake News Synopsis 1
No ratings yet
Fake News Synopsis 1
6 pages
Fake News Detection Using Machine Learning: Nihel Fatima Baarir Abdelhamid Djeffal
No ratings yet
Fake News Detection Using Machine Learning: Nihel Fatima Baarir Abdelhamid Djeffal
6 pages
SYNOPSIS
No ratings yet
SYNOPSIS
4 pages
Fake News Detection Using Machine Learning
No ratings yet
Fake News Detection Using Machine Learning
4 pages
Fake News Synopsis 1
No ratings yet
Fake News Synopsis 1
6 pages
Remedios S. Reyno MT 11 Pias - Gaang Elementary School Aurea S. Austria Principal II
No ratings yet
Remedios S. Reyno MT 11 Pias - Gaang Elementary School Aurea S. Austria Principal II
16 pages
Module 3 GEC 3
No ratings yet
Module 3 GEC 3
9 pages
Writing About A Famous People in The Past
100% (1)
Writing About A Famous People in The Past
7 pages
Synopsis Minor Project-2
No ratings yet
Synopsis Minor Project-2
5 pages
Edtpa Lesson 1
No ratings yet
Edtpa Lesson 1
3 pages
DCN CIE_2025
No ratings yet
DCN CIE_2025
17 pages
Hedonism
No ratings yet
Hedonism
4 pages
The primary objective of this project is to build a machine learning-based system that classifies news articles as real or fake.

1.3 Scope of the Project

Linguistic-based approaches analyze textual content by extracting features such as

Machine learning-based approaches have gained popularity due to their ability to

2.2 Related Works

Other research works have focused on feature engineering techniques to enhance

3.1 Dataset Description

➢ Title: The headline of the news article.

3.2 Data Preprocessing

To ensure high-quality input data for the machine learning models, several preprocessing steps are applied.

3.2.1. Text Cleaning

Raw text often contains unnecessary elements, such as punctuation and special characters, that must be removed before analysis.

a. Convert text to lowercase for uniformity.
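The cleaning steps above can be sketched in Python as follows. This is a minimal illustration, not the project's code: the helper name `clean_text` is hypothetical, and removing URLs and digits alongside punctuation is an assumption about what "special characters" covers.

```python
import re
import string

# Translation table that maps every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_text(text: str) -> str:
    """Hypothetical cleaning helper: lowercase, then drop URLs, punctuation and digits."""
    text = text.lower()                                  # a. lowercase for uniformity
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # remove URLs before their dots/slashes are split
    text = text.translate(_PUNCT_TO_SPACE)               # punctuation -> spaces
    text = re.sub(r"\d+", " ", text)                     # digits treated as noise (assumption)
    return re.sub(r"\s+", " ", text).strip()             # collapse repeated whitespace


print(clean_text("BREAKING: Visit https://example.com NOW!!! 100% true..."))
# breaking visit now true
```

URLs are stripped before punctuation so that fragments such as "https" and "com" do not survive as separate tokens.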

3.2.3. Stopword Removal
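Stopword removal can be sketched as a simple token filter. The tiny `STOPWORDS` set below is an illustrative assumption; a real pipeline would use a complete list such as NLTK's `stopwords.words("english")` or scikit-learn's `ENGLISH_STOP_WORDS`, and which list the project used is not stated here.

```python
# Illustrative stopword set (assumption); real lists contain a few hundred words.
STOPWORDS = frozenset({
    "a", "an", "the", "is", "are", "was", "were", "of", "to",
    "in", "on", "and", "or", "for", "that", "this", "it",
})


def remove_stopwords(text: str, stopwords: frozenset = STOPWORDS) -> str:
    """Drop whitespace-separated tokens found in `stopwords` (expects lowercased text)."""
    return " ".join(tok for tok in text.split() if tok not in stopwords)


print(remove_stopwords("the president is expected to sign the bill on monday"))
# president expected sign bill monday
```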

3.2.5. Feature Extraction using TF-IDF

To transform text into numerical features, we use Term Frequency-Inverse Document Frequency (TF-IDF).
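To make the weighting concrete, here is a from-scratch sketch of TF-IDF that follows the default weighting of scikit-learn's `TfidfVectorizer`: raw term counts, smoothed IDF idf(t) = ln((1 + n) / (1 + df(t))) + 1, then L2 normalization of each document vector. It splits on whitespace for simplicity, whereas `TfidfVectorizer` uses a regex tokenizer, so vocabularies can differ slightly. The function name `tfidf` is hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def tfidf(docs: List[str]) -> Tuple[List[str], List[List[float]]]:
    """Return (vocab, rows) where rows[i][j] is the TF-IDF weight of vocab[j] in docs[i]."""
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n = len(docs)
    doc_sets = [set(toks) for toks in tokenized]
    # Document frequency: number of documents that contain each term.
    df = {t: sum(t in s for s in doc_sets) for t in vocab}
    # Smoothed IDF, as if one extra document contained every term once.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}

    rows = []
    for toks in tokenized:
        counts = Counter(toks)                       # raw term frequency
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])         # L2-normalize each document
    return vocab, rows


vocab, rows = tfidf(["fake news spreads fast", "real news travels slowly"])
print(vocab)
print([round(v, 3) for v in rows[0]])
```

Because "news" occurs in both documents it receives a lower weight than terms unique to one document. With scikit-learn, the equivalent step is `TfidfVectorizer().fit_transform(...)` on the training texts and `.transform(...)` on the test texts, so the IDF statistics come from the training set only.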

3.3 Machine Learning Models Used

We employ multiple machine learning algorithms to classify news articles:

1. Logistic Regression (LR)

Logistic Regression is a binary classification algorithm that predicts the probability that an article belongs to a given class (real or fake).
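The probability model behind Logistic Regression is p = sigmoid(w · x + b). The sketch below fits w and b by plain batch gradient descent on the log-loss, using toy two-feature data and an assumed label encoding of 0 = fake, 1 = real. It is unregularized, unlike scikit-learn's `LogisticRegression`, which applies L2 regularization by default; all names here are hypothetical.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Logistic function mapping any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w: List[float], b: float, x: List[float]) -> float:
    """P(label = 1 | x) = sigmoid(w . x + b)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic_regression(X: List[List[float]], y: List[int],
                              lr: float = 0.5, epochs: int = 500) -> Tuple[List[float], float]:
    """Batch gradient descent on the mean log-loss (no regularization)."""
    m, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi       # gradient of log-loss w.r.t. the score
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / m for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


# Toy features: [weight of a sensational term, weight of a sourcing term]; 0 = fake, 1 = real.
X = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]
y = [0, 0, 1, 1]
w, b = train_logistic_regression(X, y)
print(round(predict_proba(w, b, [0.0, 1.0]), 3), round(predict_proba(w, b, [1.0, 0.0]), 3))
```

An article is labeled real when its predicted probability exceeds 0.5, which corresponds to w · x + b > 0.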

2. Decision Tree Classifier (DT)

Decision Trees work by creating a hierarchy of decisions based on word features, where each internal node tests a single feature and each leaf assigns a class.
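A decision tree chooses each split by how well it separates the classes. The sketch below scores every candidate split x[f] <= t by weighted Gini impurity and returns the best one; a full tree applies this recursively to each side until the leaves are pure or a depth limit is reached. Gini is the default criterion of scikit-learn's `DecisionTreeClassifier`, but the helper names, the toy data, and the use of observed feature values as thresholds are illustrative assumptions.

```python
from typing import List, Optional, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity 1 - sum_k p_k^2; 0 for a pure node."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(X: List[List[float]], y: List[int]) -> Tuple[Optional[int], Optional[float], float]:
    """Return (feature, threshold, weighted Gini) of the best split x[feature] <= threshold."""
    best = (None, None, gini(y))                     # "no split" baseline
    n = len(y)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, t, score)
    return best


# Toy TF-IDF-like features: column 0 separates the classes, column 1 does not.
X = [[0.0, 0.3], [0.1, 0.2], [0.8, 0.25], [0.9, 0.3]]
y = [1, 1, 0, 0]
print(best_split(X, y))   # (0, 0.1, 0.0)
```

Because an unconstrained tree keeps splitting until every leaf is pure, it can memorize the training set; limiting depth (for example `max_depth` in scikit-learn) is a common way to curb that overfitting.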

3. Gradient Boosting Classifier (GB)

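Gradient Boosting is an ensemble learning technique that builds multiple weak learners (typically shallow decision trees) sequentially, with each new learner correcting the errors of the ensemble built so far.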
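Gradient Boosting can be sketched with one-split trees ("stumps") as the weak learners: start from the class log-odds, then repeatedly fit a stump to the residuals y - p (the negative gradient of the log-loss) and add a scaled-down copy of it to the model. This is a simplified illustration, not scikit-learn's `GradientBoostingClassifier`, which by default uses 100 trees of depth 3, a learning rate of 0.1, and a refined leaf-value update; the helper names and toy data are hypothetical.

```python
import math
from typing import Callable, List

Row = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X: List[Row], r: List[float]) -> Callable[[Row], float]:
    """Regression stump: the single split x[f] <= t minimizing squared error on r."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[f] <= t]
            right = [ri for row, ri in zip(X, r) if row[f] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, lm, rm)
    if best is None:                                 # no valid split: predict the mean residual
        mean = sum(r) / len(r)
        return lambda x: mean
    _, f, t, lm, rm = best
    return lambda x: lm if x[f] <= t else rm


def gradient_boost(X: List[Row], y: List[int], n_estimators: int = 20,
                   learning_rate: float = 0.3) -> Callable[[Row], float]:
    """Fit stumps sequentially to the negative log-loss gradient y - p."""
    prior = sum(y) / len(y)
    f0 = math.log(prior / (1 - prior))               # start from the class log-odds
    stumps = []

    def score(x: Row) -> float:
        return f0 + learning_rate * sum(s(x) for s in stumps)

    for _ in range(n_estimators):
        residuals = [yi - sigmoid(score(xi)) for xi, yi in zip(X, y)]
        stumps.append(fit_stump(X, residuals))       # each stump targets the current errors
    return lambda x: sigmoid(score(x))


X = [[0.0], [0.1], [0.2], [0.8], [0.9], [1.0]]
y = [0, 0, 0, 1, 1, 1]
model = gradient_boost(X, y)
print(round(model([1.0]), 3), round(model([0.0]), 3))
```

The small learning rate means each stump only partly corrects the remaining error, which is why boosting needs many rounds but tends to generalize better than one large correction.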

4. Random Forest Classifier (RF)

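Random Forest is an ensemble of multiple Decision Trees, which reduces overfitting by training each tree on a random subset of the data and features and combining their predictions.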
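The two sources of randomness in a Random Forest, bootstrap resampling of the training rows and random feature selection, can be sketched as below. For brevity each "tree" is a one-split stump (real forests grow full trees), so choosing features per tree is the same as choosing them per split as scikit-learn does. Trees are combined here by majority vote, whereas scikit-learn's `RandomForestClassifier` averages the trees' predicted probabilities. Names and data are illustrative.

```python
import random
from collections import Counter
from typing import Callable, List

Row = List[float]


def fit_stump(X: List[Row], y: List[int], features: List[int]) -> Callable[[Row], int]:
    """Classification stump over the allowed features, minimizing misclassifications."""
    majority = Counter(y).most_common(1)[0][0]
    best_err = sum(lab != majority for lab in y)
    best = lambda x: majority                        # fallback: constant prediction
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            err = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if err < best_err:
                best_err = err
                best = lambda x, f=f, t=t, l=l_lab, r=r_lab: l if x[f] <= t else r
    return best


def random_forest(X: List[Row], y: List[int], n_trees: int = 25,
                  max_features: int = 1, seed: int = 0) -> Callable[[Row], int]:
    """Bagging plus random feature choice; predictions are combined by majority vote."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]   # bootstrap sample
        feats = rng.sample(range(len(X[0])), max_features)     # random feature subset
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))

    def predict(x: Row) -> int:
        return Counter(tree(x) for tree in trees).most_common(1)[0][0]

    return predict


X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.7, 0.8], [0.8, 0.9], [0.9, 0.7]]
y = [0, 0, 0, 1, 1, 1]
forest = random_forest(X, y)
print(forest([0.05, 0.05]), forest([0.95, 0.95]))
```

Each tree sees a slightly different dataset, so their individual errors are partly independent and tend to cancel out in the vote.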

3.4 Model Training and Testing

Using train_test_split from Scikit-learn, we divide the dataset:

a) Training Set (75%): Used for model training.
b) Testing Set (25%): Used to evaluate model performance on unseen data.
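The 75/25 split can be sketched without any library as a seeded shuffle followed by a cut, returning the four lists in the same order as `train_test_split` (X_train, X_test, y_train, y_test). The function name and the seed value 42 are assumptions; with scikit-learn the corresponding call would be `train_test_split(X, y, test_size=0.25, random_state=42)`, and passing `stratify=y` keeps the real/fake ratio the same in both parts.

```python
import math
import random
from typing import List, Sequence, Tuple


def split_train_test(X: Sequence, y: Sequence, test_size: float = 0.25,
                     seed: int = 42) -> Tuple[List, List, List, List]:
    """Shuffle row indices and hold out ceil(test_size * n) rows for testing."""
    n = len(X)
    n_test = math.ceil(test_size * n)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)                 # fixed seed -> reproducible split
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train_idx], [X[i] for i in test_idx],
            [y[i] for i in train_idx], [y[i] for i in test_idx])


X = list(range(100))
y = [i % 2 for i in X]
X_train, X_test, y_train, y_test = split_train_test(X, y)
print(len(X_train), len(X_test))   # 75 25
```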

3.4.2. Model Training

a) Fitting the model to the training data.

3.4.3. Model Evaluation

a) Accuracy: Measures the percentage of correctly classified articles.
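Accuracy is the fraction of predictions that match the true labels. The sketch below computes it together with precision, recall, and F1 for one class, the per-class figures that scikit-learn's `classification_report` prints; these extra metrics are common companions to accuracy rather than results from this report, and treating label 0 as "fake" and as the positive class is an assumption.

```python
from typing import Dict, Sequence


def evaluate(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 0) -> Dict[str, float]:
    """Accuracy plus precision, recall and F1 for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


print(evaluate([0, 0, 0, 1, 1, 1, 1, 0], [0, 0, 1, 1, 1, 0, 1, 0]))
```

Accuracy alone can look good on an imbalanced dataset even when one class is mostly misclassified, which is why precision and recall are usually reported alongside it.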

3.4.4. Manual Testing Function

To allow real-time classification, a manual testing function is implemented.
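A manual testing function can be sketched as a loop over the trained models that preprocesses a single article and prints each model's verdict. The name `manual_testing`, the label strings, and the 0 = fake / 1 = real encoding are assumptions. In a scikit-learn pipeline, each entry of `models` would wrap `vectorizer.transform([text])` followed by the classifier's `predict`; here the default `str.lower` stands in for the full preprocessing chain, and the demo "models" are placeholders.

```python
from typing import Callable, Dict

# Assumed label encoding: 0 = fake, 1 = real.
LABELS = {0: "Fake News", 1: "Not Fake News"}


def manual_testing(news: str,
                   models: Dict[str, Callable[[str], int]],
                   preprocess: Callable[[str], str] = str.lower) -> Dict[str, str]:
    """Preprocess one article, run every model on it, and print each verdict."""
    cleaned = preprocess(news)
    verdicts = {name: LABELS[predict(cleaned)] for name, predict in models.items()}
    for name, verdict in verdicts.items():
        print(f"{name}: {verdict}")
    return verdicts


# Placeholder "models" for demonstration only; real ones would wrap the trained classifiers.
demo_models = {
    "Keyword rule": lambda text: 0 if "shocking" in text else 1,
    "Always real": lambda text: 1,
}
manual_testing("SHOCKING: scientists hide the truth!", demo_models)
```

Showing every model's verdict side by side makes disagreements between classifiers visible, which is useful when judging a borderline article.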

4. Implementation

4.1 Development Environment

a. Programming Language: Python 3.x

The models implemented are Logistic Regression, Decision Tree Classifier, Random Forest Classifier, and Gradient Boosting Classifier.

4.2 Dataset Splitting

Logistic Regression showed reasonable performance with moderate accuracy.

Decision Tree Classifier:

Gradient Boosting Classifier:

Random Forest is a strong competitor: although not as accurate as Gradient Boosting, it still performed well.

Overfitting in Decision Trees: The Decision Tree Classifier showed signs of overfitting.

• Class Imbalance: Although techniques like Random Forest and Gradient

6. Conclusion and Future Work

6.1 Summary of Findings

• Gradient Boosting Classifier outperformed all other models in terms of accuracy.

• Decision Tree Classifier exhibited overfitting, leading to high variance in its predictions.

• The project highlights the effectiveness of ensemble methods in tackling text classification tasks such as fake news detection.

6.2 Future Improvements

Incorporating Additional Features

Handling Class Imbalance More Effectively
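One common way to handle class imbalance is to weight each class inversely to its frequency, w_c = n_samples / (n_classes * count_c), which is the "balanced" heuristic scikit-learn applies for `class_weight="balanced"`. A minimal sketch, with a hypothetical function name:

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def balanced_class_weights(y: Sequence[Hashable]) -> Dict[Hashable, float]:
    """w_c = n_samples / (n_classes * count_c): rarer classes get larger weights."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


print(balanced_class_weights([0] * 30 + [1] * 10))
```

With 30 articles of one class and 10 of the other, the minority class gets weight 2.0 and the majority about 0.67, so both classes contribute equally to the total loss. `LogisticRegression`, `DecisionTreeClassifier` and `RandomForestClassifier` accept `class_weight="balanced"` directly; `GradientBoostingClassifier` has no `class_weight` parameter, so per-sample weights would be passed through `fit(X, y, sample_weight=...)` instead.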

Exploring More Advanced Models

