Cyber Security: PROJECT: Fake News Detection

1) The document describes steps to detect fake news using Python. It involves importing necessary libraries, reading in news data from a CSV file, splitting the data into training and test sets, using TF-IDF to vectorize text data, training a PassiveAggressiveClassifier model, and calculating accuracy on test data. 2) Key steps include vectorizing training and test sets using TfidfVectorizer, fitting a PassiveAggressiveClassifier on the training vectors and labels, predicting on test vectors and calculating accuracy, and generating a confusion matrix. 3) The model achieved an accuracy of 92.82% on test data for classifying news articles as 'FAKE' or 'REAL' according to the described process.

Uploaded by

Nehal Agarwal
Copyright
© All Rights Reserved
Cyber security

PROJECT: Fake news detection

Submitted By: Nehal Agarwal


School: Maheshwari Public School, Ajmer
Class : XII
What is Fake News?

A type of yellow journalism, fake news consists of news items that may be
hoaxes and that spread mainly through social media and other online media.
It is often created to promote or impose certain ideas, frequently in
service of a political agenda. Such items may contain false or exaggerated
claims, can be amplified by recommendation algorithms until they go viral,
and may leave users trapped in a filter bubble.
PREREQUISITES
1) You’ll need to install the following libraries with pip (the scikit-learn
package is the one imported as sklearn in the code):
pip install numpy pandas scikit-learn

2) You’ll need Jupyter Lab to run your code. If it is not installed yet, run
pip install jupyterlab first. Then open your command prompt and start it with
the following command:
C:\Users\DataFlair>jupyter lab

3) You’ll see a new browser window open up; create a new console and use it to
run your code. To run multiple lines of code at once, press Shift+Enter.
Steps for detecting fake news with Python

Follow the steps below to detect fake news and complete your first
advanced Python project:

1) Make necessary imports:

import numpy as np
import pandas as pd
import itertools
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
2) Now, let’s read the data into a DataFrame, and get the shape of the data and
the first 5 records:
#Read the data
df=pd.read_csv('D:\\DataFlair\\news.csv')
#Get shape and head
df.shape
df.head()

3) Next, get the labels from the DataFrame.


#DataFlair - Get the labels
labels=df.label
labels.head()

4) Split the dataset into training and testing sets.


#DataFlair - Split the dataset
x_train,x_test,y_train,y_test=train_test_split(df['text'], labels,
test_size=0.2, random_state=7)
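
To see what test_size=0.2 and random_state=7 mean in practice, here is a minimal pure-Python sketch of a seeded hold-out split. The helper split_train_test and the toy data are hypothetical, and the shuffle order will not match the one scikit-learn produces; only the idea (shuffle once with a fixed seed, then hold out 20%) is the same.

```python
import math
import random

def split_train_test(samples, labels, test_size=0.2, seed=7):
    """Shuffle indices once, then hold out a `test_size` fraction for testing."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)          # seeded, so the split is repeatable
    n_test = math.ceil(test_size * len(indices))  # round the test share up
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [samples[i] for i in train_idx]
    x_test = [samples[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, x_test, y_train, y_test

# Hypothetical toy data, only to show the resulting sizes.
texts = [f"article {i}" for i in range(10)]
tags = ["FAKE" if i % 2 else "REAL" for i in range(10)]
x_tr, x_te, y_tr, y_te = split_train_test(texts, tags)
print(len(x_tr), len(x_te))  # 8 training samples, 2 test samples
```

Fixing the seed matters because the reported accuracy depends on which articles land in the test set; with a different seed the score would shift slightly.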
5) Let’s initialize a TfidfVectorizer with English stop words and a maximum
document frequency of 0.7 (terms that appear in more than 70% of the documents
will be discarded). Stop words are the most common words in a language, and they
are filtered out before the natural language data is processed. A
TfidfVectorizer turns a collection of raw documents into a matrix of TF-IDF
features.
Now, fit and transform the vectorizer on the train set, and use the fitted
vectorizer to transform the test set:

#DataFlair - Initialize a TfidfVectorizer


tfidf_vectorizer=TfidfVectorizer(stop_words='english', max_df=0.7)
#DataFlair - Fit and transform train set, transform test set
tfidf_train=tfidf_vectorizer.fit_transform(x_train)
tfidf_test=tfidf_vectorizer.transform(x_test)
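
To make the effect of stop words and max_df concrete, here is a small pure-Python sketch of TF-IDF weighting. It is an illustration under stated assumptions, not scikit-learn's implementation: the stop-word list and the four sample headlines are made up, while the smoothed IDF formula ln((1 + n) / (1 + df)) + 1 and the L2 normalisation follow scikit-learn's documented defaults.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list; scikit-learn's English list is much longer.
STOP_WORDS = {"the", "is", "a", "of", "and", "in", "to"}

def tokenize(text):
    """Lowercase and keep tokens of two or more word characters."""
    return [t for t in re.findall(r"\b\w\w+\b", text.lower()) if t not in STOP_WORDS]

def tfidf_fit_transform(docs, max_df=0.7):
    """Return (vocabulary, rows); each row maps term -> L2-normalised TF-IDF weight."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Discard terms that occur in more than max_df of the documents.
    vocab = sorted(t for t, df in doc_freq.items() if df <= max_df * n_docs)
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}
    rows = []
    for tokens in tokenized:
        counts = Counter(t for t in tokens if t in idf)
        weights = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        rows.append({t: w / norm for t, w in weights.items()})
    return vocab, rows

# Hypothetical headlines; "claim" occurs in 3 of 4 documents, so max_df=0.7 drops it.
docs = [
    "The senator denied the shocking claim",
    "Shocking claim about the election goes viral",
    "The election results were certified",
    "Officials certified the claim as false",
]
vocab, rows = tfidf_fit_transform(docs)
print(vocab)
```

Rare terms such as "senator" end up with a higher weight than terms shared by several documents, which is what lets the classifier focus on distinctive vocabulary.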
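6) Next, we’ll initialize a PassiveAggressiveClassifier. This is an online
learning algorithm: it leaves the model unchanged when a sample is classified
correctly (passive) and corrects the weights when it is misclassified
(aggressive). We’ll fit this on tfidf_train and y_train.
Then, we’ll predict on the TF-IDF features of the test set and calculate the
accuracy with accuracy_score() from sklearn.metrics: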

#DataFlair - Initialize a PassiveAggressiveClassifier


pac=PassiveAggressiveClassifier(max_iter=50)
pac.fit(tfidf_train,y_train)
#DataFlair - Predict on the test set and calculate accuracy
y_pred=pac.predict(tfidf_test)
score=accuracy_score(y_test,y_pred)
print(f'Accuracy: {round(score*100,2)}%')
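
The name of the classifier comes from its update rule. Below is a minimal sketch of one such step in pure Python, using the PA-I variant with hinge loss. The function pa_update, the encoding of REAL as +1 and FAKE as -1, and the three-feature samples are assumptions made for illustration; scikit-learn's version works on sparse matrices and makes several passes over the data.

```python
def dot(w, x):
    """Dot product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))

def pa_update(w, x, y, C=1.0):
    """One passive-aggressive (PA-I) step for a label y in {-1, +1}."""
    loss = max(0.0, 1.0 - y * dot(w, x))   # hinge loss on this sample
    if loss == 0.0:
        return w                           # passive: margin already satisfied
    norm_sq = dot(x, x)
    if norm_sq == 0.0:
        return w                           # an all-zero sample carries no signal
    tau = min(C, loss / norm_sq)           # aggressive: step size capped by C
    return [wi + tau * y * xi for wi, xi in zip(w, x)]

# Hypothetical 3-feature samples; +1 stands for REAL and -1 for FAKE.
w = [0.0, 0.0, 0.0]
w = pa_update(w, [1.0, 0.0, 1.0], +1)      # misclassified, so the weights move
w_after_real = list(w)
w = pa_update(w, [0.0, 1.0, 0.0], -1)      # misclassified again, another step
w_unchanged = pa_update(w, [1.0, 0.0, 1.0], +1)  # margin is now met: no change
print(w_after_real, w, w_unchanged)
```

The cap C limits how far a single noisy article can pull the weights, which is why the step is written as min(C, loss / norm_sq) rather than the uncapped ratio.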
7) We got an accuracy of 92.82% with this model. Finally, let’s print out a
confusion matrix to see the number of true and false positives and
negatives.

#DataFlair - Build confusion matrix


confusion_matrix(y_test,y_pred,
labels=['FAKE','REAL'])
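
To read the result, it helps to know the layout: rows follow the true label and columns follow the predicted label, both in the order given by labels. The sketch below rebuilds such a matrix by hand in pure Python; the helper confusion_counts and the six sample labels are hypothetical and are not results from the project.

```python
def confusion_counts(y_true, y_pred, labels):
    """Rows follow the true label, columns the predicted label, in `labels` order."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, guess in zip(y_true, y_pred):
        matrix[index[truth]][index[guess]] += 1
    return matrix

# Hypothetical labels for six articles, not results from the project.
truth = ["FAKE", "FAKE", "FAKE", "REAL", "REAL", "REAL"]
guess = ["FAKE", "FAKE", "REAL", "REAL", "REAL", "FAKE"]
matrix = confusion_counts(truth, guess, labels=["FAKE", "REAL"])

# Accuracy is the share of predictions on the diagonal.
correct = sum(matrix[i][i] for i in range(len(matrix)))
accuracy = correct / len(truth)
print(matrix, f"{round(accuracy * 100, 2)}%")
```

With labels=['FAKE','REAL'], the top-left cell counts fake articles caught as fake, the top-right cell counts fake articles that slipped through as real, and the bottom row does the same for real articles.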
