"The Pope Has A New Baby!" Fake News Detection Using Deep Learning
"The Pope Has A New Baby!" Fake News Detection Using Deep Learning
”
Fake News Detection Using Deep Learning
Samir Bajaj
Stanford University
CS 224N - Winter 2017
samirb@stanford.edu
Abstract
The objective of this project is to build a classifier that can predict whether a piece
of news is fake based only on its content, thereby approaching the problem from a
purely NLP perspective. An important part of the goal is to compare and report
the results from multiple different model implementations, and present an analysis
of the findings. Several architectures are explored, and a novel design that incorporates an attention-like mechanism in a Convolutional Network is investigated.
1 Introduction
Fake news is increasingly becoming a menace to our society. It is typically generated for commercial interests—to attract viewers and collect advertising revenue. However, people and groups with
potentially malicious agendas have been known to initiate fake news in order to influence events
and policies around the world. It is also believed that circulation of fake news had material impact
on the outcome of the 2016 US Presidential Election [1].
From an NLP perspective, this phenomenon offers an interesting and valuable opportunity to
identify patterns that can be encoded in a classifier. In our experiments, we will be ignoring all other
signals (e.g., the source of the news, whether it was reported online or in print, etc.), and instead
focusing only on the content being reported.
2 Data
The data used for this project was drawn from two different sources, both in public domain.
Fake news articles were procured from an open Kaggle dataset [2]—this collection is made up
of 13,000 articles that span multiple subjects. Authentic news articles (negative examples for
the classifier) were extracted from the Signal Media News dataset [3]; reservoir sampling was
used to select 50,000 entries uniformly at random from the corpus (a minimal sketch of the sampling appears at the end of this section). The ratio of fake to non-fake
examples was chosen on the basis of empirical evidence, in an attempt to reflect the distribution of published
news in the real world. Binary 0/1 labels were assigned to the real and fake news articles, respectively.
The collection of 63,000 articles was subsequently shuffled, and split into 60% training, 20%
dev/validation, and 20% test sets. The test set was stashed away for the final performance evaluation
of the models, and the validation set was used for hyperparameter selection.
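As an illustration, here is a minimal sketch of reservoir sampling (Algorithm R) of the kind used to draw the authentic articles; the function and variable names are hypothetical, and the training pipeline around it is omitted:

```python
import random

def reservoir_sample(stream, k):
    """Draw k items uniformly at random from an iterable of unknown length."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # Item i replaces a reservoir slot with probability k / (i + 1),
            # which keeps every item's inclusion probability at exactly k / N.
            j = random.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir

# Hypothetical usage over the Signal Media corpus:
# sample = reservoir_sample(signal_media_articles, 50000)
```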
The hyperparameters used for the various models are listed in Table 1.
3 Related Work
Text classification has a rich research history within the NLP community, and an equally impressive
array of practical applications to showcase its importance. See [4] for an introduction and references.
While there exist tools and products to detect sources of fake news (e.g., whether a web site
publishes misleading news), we will approach this problem as an instance of text classification,
using only the content of the article as the source of features. Doing so allows us to focus on
NLP-related algorithms and to explore in depth the performance of a variety of
models on a particular task.
4 Models
Several different models were implemented; the following sections describe each in detail. All mod-
els use pre-trained 300-dimensional GloVe [5] embeddings, which are not updated during training.
Given that the training set is small compared to the magnitude of the corpora used to train the GloVe
vectors, it makes little sense to update the embeddings as part of training the models.
4.1 Logistic Regression
This is a simple linear model, implemented to serve as a baseline for comparison with the
other, more sophisticated models.
In this model, vectors corresponding to words in each news story are averaged to produce
one embedding, which is first run through a linear layer and subsequently input to the sigmoid
nonlinearity to predict the label:
ŷ = σ(xW + b) (1)
All models used the logistic (sigmoid) cross-entropy loss, which is defined as:
J = −y log ŷ − (1 − y) log(1 − ŷ) (2)
Figure 1 shows the loss as a function of training epochs.
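A minimal NumPy sketch of equations (1) and (2), assuming the article's GloVe vectors have already been averaged into x; the training loop and gradient updates are omitted, and the names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def linear_forward_and_loss(x, W, b, y):
    """x: (1, 300) mean of the article's GloVe vectors;
    W: (300, 1) weights; b: scalar bias; y: 1 for fake, 0 for real."""
    y_hat = sigmoid(x.dot(W) + b)                              # equation (1)
    loss = -y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat)    # equation (2)
    return y_hat, loss
```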
4.2 Feed-Forward Neural Network
This model also uses a fixed-size input x ∈ R^(1×300), obtained as the average of all vectors corresponding to the words in the news story. It is inspired by the simple design explored in [6].
h = ReLU(xW + b1 ) (3)
hdrop = Dropout(h, pdrop ) (4)
ŷ = σ(hdrop U + b2 ) (5)
Averaging the word vectors has intuitive appeal; doing so not only normalizes every example for
length, but also takes every word in the content into account.
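A sketch of equations (3)-(5) in NumPy; the inverted-dropout formulation is an assumption (the paper only states that dropout was used), and the parameter names are illustrative:

```python
import numpy as np

def feedforward(x, W, b1, U, b2, p_drop, train=True):
    """Equations (3)-(5); x is the (1, 300) mean embedding of the article."""
    h = np.maximum(0.0, x.dot(W) + b1)                  # ReLU, equation (3)
    if train:
        # Inverted dropout: zero units with probability p_drop, rescale the rest
        mask = (np.random.rand(*h.shape) >= p_drop) / (1.0 - p_drop)
        h = h * mask                                    # equation (4)
    return 1.0 / (1.0 + np.exp(-(h.dot(U) + b2)))       # sigmoid, equation (5)
```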
4.3 Recurrent Neural Network
The RNN was initially implemented to process the entire length of every news article. However,
with the longest training sample at 26,970 words, it quickly became computationally infeasible to
train the model. Consequently, a limit of 200 time steps was imposed (a hyperparameter, determined
via cross-validation), and shorter examples were padded with zero vectors at the beginning, so that
the network gets to examine the words in the examples as it moves through the time steps to the end.
Padding at the end of the examples produced higher loss.
ht = σ(xt Wx + ht−1 Wh + b) (6)
The output at the final time step was recorded as the predicted label for the example. Figure 7
illustrates the basic architecture underlying the network used in this model, as well as in some of the
others that follow.
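A sketch of the truncated, front-padded forward pass described above, assuming the first 200 words of longer articles are kept (the paper does not say which end is truncated) and an internal state of size 100:

```python
import numpy as np

def rnn_forward(words, Wx, Wh, b, max_steps=200, d_hidden=100):
    """Equation (6) unrolled for at most max_steps steps.
    words: (T, 300) GloVe vectors for one article."""
    x = words[:max_steps]              # assumption: keep the first 200 words
    if x.shape[0] < max_steps:
        # Pad with zero vectors at the beginning, as described above
        x = np.vstack([np.zeros((max_steps - x.shape[0], x.shape[1])), x])
    h = np.zeros((1, d_hidden))
    for t in range(max_steps):
        h = 1.0 / (1.0 + np.exp(-(x[t:t+1].dot(Wx) + h.dot(Wh) + b)))  # eq. (6)
    return h   # final state, fed to the output layer for the prediction
```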
4.4 Long Short-Term Memories
The Long Short-Term Memory (LSTM) unit was initially proposed by Hochreiter and Schmidhuber
[7], and since then a number of modifications to the original unit have been made. Unlike the
recurrent unit, which simply computes a weighted sum of the input signal and applies a nonlinear
function, each LSTM unit maintains a memory ct at time t, which is subsequently used to determine
the output, or the activation, ht , of the cell.
ht = ot tanh(ct ) (7)
ct = ft ct−1 + it c̃t (8)
c̃t = tanh(xt Wc + ht−1 Uc + bc ) (9)
ot = σ(xt Wo + ht−1 Uo + bo ) (10)
it = σ(xt Wi + ht−1 Ui + bi ) (11)
ft = σ(xt Wf + ht−1 Uf + bf ) (12)
See [8] for a good exposition of the motivation behind this mathematical formulation.
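A single-step sketch of equations (7)-(12) in NumPy; the parameter dictionary P and its key names are hypothetical:

```python
import numpy as np

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, P):
    """One time step of equations (7)-(12); P maps names to weight arrays."""
    o = _sigmoid(x_t.dot(P['Wo']) + h_prev.dot(P['Uo']) + P['bo'])       # (10)
    i = _sigmoid(x_t.dot(P['Wi']) + h_prev.dot(P['Ui']) + P['bi'])       # (11)
    f = _sigmoid(x_t.dot(P['Wf']) + h_prev.dot(P['Uf']) + P['bf'])       # (12)
    c_tilde = np.tanh(x_t.dot(P['Wc']) + h_prev.dot(P['Uc']) + P['bc'])  # (9)
    c = f * c_prev + i * c_tilde         # memory update, equation (8)
    h = o * np.tanh(c)                   # activation, equation (7)
    return h, c
```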
4.5 Gated Recurrent Units
Although RNNs can theoretically capture long-term dependencies, they are very hard to actually
train to do this. Gated recurrent units are designed to maintain a more persistent memory,
thereby making it easier for RNNs to capture long-term dependencies [8]:
zt = σ(xt Wz + ht−1 Uz + bz) (13)
rt = σ(xt Wr + ht−1 Ur + br) (14)
h̃t = tanh(xt Wh + (rt ht−1 ) Uh + bh ) (15)
ht = (1 − zt ) h̃t + zt ht−1 (16)
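A single-step sketch of equations (13)-(16); as with the LSTM sketch, the parameter dictionary and key names are hypothetical, and this is one standard GRU formulation rather than a verbatim copy of the project's code:

```python
import numpy as np

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, P):
    """One GRU time step: update gate z, reset gate r, candidate state h~."""
    z = _sigmoid(x_t.dot(P['Wz']) + h_prev.dot(P['Uz']) + P['bz'])       # (13)
    r = _sigmoid(x_t.dot(P['Wr']) + h_prev.dot(P['Ur']) + P['br'])       # (14)
    h_tilde = np.tanh(x_t.dot(P['Wh']) + (r * h_prev).dot(P['Uh'])
                      + P['bh'])                                         # (15)
    return (1.0 - z) * h_tilde + z * h_prev                              # (16)
```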
4.6 Bidirectional LSTM
As a final RNN model, a bidirectional recurrent network with LSTM cells was also implemented.
The final states of the forward and backward layers are concatenated and passed through an affine
transform before being input to the sigmoid to generate the prediction.
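A sketch of this design, reusing the lstm_step function from the LSTM section above; the output-layer names are illustrative:

```python
import numpy as np

def bilstm_predict(x_seq, P_fwd, P_bwd, W_out, b_out, d_hidden=100):
    """Run an LSTM over the sequence in each direction, concatenate the two
    final states, then apply an affine transform and a sigmoid."""
    h_f = np.zeros((1, d_hidden)); c_f = np.zeros((1, d_hidden))
    h_b = np.zeros((1, d_hidden)); c_b = np.zeros((1, d_hidden))
    reversed_seq = x_seq[::-1]
    for t in range(x_seq.shape[0]):
        h_f, c_f = lstm_step(x_seq[t:t+1], h_f, c_f, P_fwd)
        h_b, c_b = lstm_step(reversed_seq[t:t+1], h_b, c_b, P_bwd)
    h = np.concatenate([h_f, h_b], axis=1)      # (1, 2 * d_hidden)
    return 1.0 / (1.0 + np.exp(-(h.dot(W_out) + b_out)))
```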
4.7 Convolutional Neural Network
This model starts with the 300-dimensional GloVe embeddings of all the words in the example,
and extracts bigrams, trigrams, 4-grams, and 5-grams from the text. Vectors corresponding
to these n-grams are then concatenated and fed into a linear layer, where they are scaled using
different weight matrices (but the same bias). A max pooling layer then selects the largest from the
scaled inputs. The results are added and transformed via a ReLU unit, whose output is finally sent
to a sigmoid nonlinearity to predict the label.
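A NumPy sketch of this pipeline; the per-n-gram weight layout, names, and output layer are illustrative assumptions based on the description above:

```python
import numpy as np

def ngram_cnn(x, W_ngram, b, U, b_out, sizes=(2, 3, 4, 5)):
    """x: (T, 300) word vectors. W_ngram[n]: (n * 300, d) weights for
    n-grams of size n; b is the single bias shared across n-gram sizes."""
    pooled = []
    for n in sizes:
        # Concatenate each window of n consecutive word vectors
        windows = np.stack([x[i:i + n].reshape(-1)
                            for i in range(x.shape[0] - n + 1)])
        scaled = windows.dot(W_ngram[n]) + b     # linear layer, shared bias
        pooled.append(scaled.max(axis=0))        # max pooling over positions
    h = np.maximum(0.0, sum(pooled))             # add pooled maps, then ReLU
    return 1.0 / (1.0 + np.exp(-(h.dot(U) + b_out)))
```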
4.8 CNN with Attention
As an exploratory exercise, a new architecture that augments a CNN with an attention-like mechanism was implemented. The intuition behind this approach is to determine if the features generated
by the convolution process can be enhanced to capture influence from prior n-grams before they are
max-pooled and sent through the nonlinearity. The idea is to learn an attention matrix for bigrams,
and another for trigrams (only two n-grams were used in this specific model) that capture influence
from a small set of n-grams preceding the convolution window.
Figure 1: Loss for the Linear Model
Figure 2: Loss for the Feed-Forward Neural Network Model
Figure 3: Loss for the RNN Model
Figure 4: Loss for the GRU Model
Figure 5: Loss for the CNN Model
Figure 6: Loss for the LSTM Model
In other words, for words x_i ∈ R^(1×k) in a window of size h, we learn not only the convolution
filter W^(h), but also an attention matrix W_a^(h), such that the feature map becomes
c = W^(h) [ W_a^(h) x^T_(i−1:i−1+h) x_(i:i+h) ] (17)
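The extracted equation leaves the shapes ambiguous; the sketch below adopts one dimensionally consistent reading, with W^(h) as a (1, h·k) filter row, W_a^(h) as an (h·k) × (h·k) matrix, and each window flattened into a row vector:

```python
import numpy as np

def attended_feature(x, i, h, W, Wa):
    """One reading of equation (17). x: (T, k) word vectors; the windows
    x_{i-1:i-1+h} and x_{i:i+h} are flattened to (1, h * k) rows.
    W: (1, h * k) convolution filter; Wa: (h * k, h * k) attention matrix."""
    x_prev = x[i - 1:i - 1 + h].reshape(1, -1)   # preceding window
    x_curr = x[i:i + h].reshape(1, -1)           # current window
    # The outer product couples the two windows; Wa learns which of those
    # pairwise interactions should modulate the convolution features.
    return W.dot(Wa.dot(x_prev.T).dot(x_curr))   # (1, h * k) feature row
```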
5 Implementation
All code was written in Python 2.7, using TensorFlow r0.12.1 and NumPy. However, TensorFlow's
implementation of LSTM and GRU cells was not used; instead, these cells were coded from scratch.
Xavier initialization was used for all variables. Dropout was employed as a regularization mechanism for the feedforward and the recurrent neural networks.
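For reference, a sketch of the Xavier (Glorot) uniform initialization in NumPy; the function name is illustrative:

```python
import numpy as np

def xavier_uniform(fan_in, fan_out):
    """Xavier initialization: uniform on [-r, r] with r = sqrt(6/(fan_in+fan_out)),
    which keeps activation variance roughly constant across layers."""
    bound = np.sqrt(6.0 / (fan_in + fan_out))
    return np.random.uniform(-bound, bound, size=(fan_in, fan_out))
```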
Experiments were run on local CPU-only machines, as well as on GPU-enabled Azure clusters provided by Microsoft.
Figure 7: Basic network architecture (word vectors as input; ReLU activation)
6 Results
Table 2: Model Performance on the Test Set
MODEL PRECISION RECALL F1
There were two major takeaways from these experiments. First, in accordance with the observations reported in [9] and [10], the RNN architecture with GRUs outperformed one with LSTM
cells in the task at hand. This result holds despite the fact that a positive bias was added to the
LSTM’s forget gate. Not only did the RNN with GRUs achieve the better F1 score (and the best
overall), it converged faster than the one using LSTM units. While the faster convergence time can
be attributed to the fact that GRUs have fewer gates, and hence fewer computations to generate
the output, the fact that the GRU exposes its full content without any control likely helped it
perform better overall, given that examples in the data set were long news stories, averaging 515
words across both classes. Further, the bidirectional RNN model equipped with LSTMs didn’t do
significantly better than the unidirectional network, reporting almost the same F1 score.
The fact that an RNN architecture won out was a very interesting result in itself. It is worth
noting that some of the news articles in the dataset were thousands of words long, and RNNs, even
those equipped with LSTMs or GRUs, cannot retain memory for more than a hundred or so time
steps. The secret to the success of RNNs in this task may lie in the network's large internal
state of size 100, even though a state that large poses challenges in training time and resource usage.
A smaller internal state didn't perform as well.
Second, it was unsurprising to see the feedforward network perform well as a classifier. Its
success can be attributed to the fact that its input comprises the mean across the embeddings of
all the words in the example. As a consequence, the model takes into account contributions from
potentially salient words that identify the news story.
It was disappointing to see that the Convolutional network didn’t perform as well, given that
multiple n-grams were used as inputs to the model, and that a max pooling layer was incorporated.
The assumption was that the max pooled inputs would be a strong indicator to the classifier; but
apparently that signal isn't adequate. Perhaps additional features would have helped. Further, enhancing the CNN with an attention-like device didn't add much value (although this model achieved
the highest precision), but that’s also because only ten epochs of training could be completed within
the given time and resource constraints. It is also worth noting that the introduction of attention
vectors increases the number of parameters to learn, necessitating a larger training set, which was
fixed for all experiments conducted in the project to allow timely completion and fair comparison
of model performance.
The precision numbers are high for all models because the dataset is (deliberately) skewed,
with authentic news examples (i.e., negative examples) outnumbering the fake news examples by
4 to 1. Consequently, any model can deliver high accuracy simply by guessing every test example
as negative. However, only the feedforward network and the RNNs with complex activation units
achieved a recall greater than 0.7, indicating that the models were able to extract some subset of
relevant features to classify more of the positive examples correctly.
In a very high-dimensional space, linear classifiers are generally able to take advantage of
the sparsity and find a hyperplane that separates the classes reasonably well. However, with dense
vectors like the GloVe embeddings used in all experiments for this project, it was apparent that
a nonlinear kernel is necessary for good results. The logistic regression model that serves as the
baseline is devoid of any nonlinearity, and therefore reports one of the lowest F1 scores among the
bunch.
7 Future Directions
A complete, production-quality classifier would incorporate many different features beyond the
vectors corresponding to the words in the text. For fake news detection, we can add as features the
source of the news, including any associated URLs, the topic (e.g., science, politics, sports, etc.),
publishing medium (blog, print, social media), country or geographic region of origin, publication
year, as well as linguistic features not exploited in this exercise—use of capitalization, fraction of
words that are proper nouns (using gazetteers), and others.
I also believe that the idea of attention-enabled CNNs is a promising avenue worth exploring
further. Finally, more sophisticated models—for example, pointer and highway networks—
definitely merit investigation.
Acknowledgments
Many thanks to Dr. Richard Socher, who advised me on this project. The TAs and instructors of CS
224N deserve high praise for making the course highly engaging, despite the logistical challenges
posed by the sheer size of the class.
Thanks also to the course staff for use of some boilerplate code from the homework assignments as a starting point for the project implementation.
References
[1] Allcott, H., and Gentzkow, M., Social Media and Fake News in the 2016 Election, https://web.stanford.edu/~gentzkow/research/fakenews.pdf, January 2017.
[2] Kaggle, Getting Real About Fake News, https://www.kaggle.com/mrisdal/fake-news, URL obtained on March 2, 2017.
[3] Signal Media, The Signal Media One-Million News Articles Dataset, http://research.signalmedia.co/newsir16/signal-dataset.html, URL obtained on March 2, 2017.
[4] Manning, C., Raghavan, P., and Schütze, H., Introduction to Information Retrieval, http://informationretrieval.org/, 2008.
[5] Pennington, J., Socher, R., and Manning, C., GloVe: Global Vectors for Word Representation, 2014.
[6] Joulin, A., Grave, E., Bojanowski, P., and Mikolov, T., Bag of Tricks for Efficient Text Classification, August 2016.
[7] Hochreiter, S., and Schmidhuber, J., Long Short-Term Memory, Neural Computation, 1997.
[8] Mohammadi, M., Mundra, R., Socher, R., and Wang, L., CS224N Lecture Notes, Part V, 2017, http://web.stanford.edu/class/cs224n/lecture_notes/cs224n-2017-notes5.pdf, URL obtained on March 3, 2017.
[9] Jozefowicz, R., Zaremba, W., and Sutskever, I., An Empirical Exploration of Recurrent Network Architectures, Proceedings of the 32nd International Conference on Machine Learning, 2015.
[10] Chung, J., Gulcehre, C., Cho, K., and Bengio, Y., Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling, 2014.