RTRP Batch 10

The document outlines a system for SMS spam detection using FastICA and neural networks, detailing the process from data collection to classification. It discusses the existing system's methodology, proposed components, and advantages and disadvantages of the new approach. Additionally, it highlights software and hardware requirements, functional requirements, and benefits of using Python for implementation.

Subject: RTRP

BRANCH: Information Technology


ROLL NO’S: 23D41A1259
23D41A1261
23D41A1262
23D41A1264
SMS Spam Detection Using FastICA with Neural Networks
Abstract:
• Electronic messages are an important means of communication between millions of people worldwide.
• However, several people and companies misuse this facility to distribute unsolicited bulk messages, commonly called spam SMS.
• Spam SMS may include advertisements for drugs, software, adult content, health insurance, or other fraudulent advertisements.
• Various spam filters are used to provide a protective mechanism that is able to design a system to recognize the
Existing System:
• It involves pre-processing SMS text data with FastICA (Independent Component Analysis) to
extract independent features, which are then fed into a neural network classifier to identify
whether an SMS message is spam or not.
• FastICA helps isolate key patterns in the data that are most indicative of spam, enhancing
the neural network's ability to accurately classify messages.
• Text cleaning: Removing punctuation, special characters, and converting text to lowercase.
• Tokenization: Breaking down the SMS text into individual words or n-grams (sequences of
words).
• Feature extraction using FastICA:
• Applying FastICA to the numerical representation of the text (e.g., TF-IDF scores) to identify
independent components that best capture the underlying patterns of spam messages.
• These extracted features are considered to be the most relevant for spam detection.
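The preprocessing and feature extraction steps above can be sketched as follows. This is a minimal sketch, not the project's actual implementation: the sample messages, the `clean_text` helper, and the choice of three components are assumptions made for illustration. It uses scikit-learn's `TfidfVectorizer` and `FastICA`.

```python
import re

from sklearn.decomposition import FastICA
from sklearn.feature_extraction.text import TfidfVectorizer


def clean_text(text):
    """Lowercase the text and replace punctuation/special characters with spaces."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical toy messages, made up for illustration only.
messages = [
    "WIN a FREE prize now!!! Call 0800-123-456",
    "Are we still meeting for lunch today?",
    "URGENT: claim your free cash reward, reply YES",
    "Can you send me the notes from class?",
    "Free entry in a weekly competition, text WIN",
    "I'll call you when I get home.",
]

# Text cleaning
cleaned = [clean_text(m) for m in messages]

# Tokenization into words and 2-grams, scored with TF-IDF
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
tfidf = vectorizer.fit_transform(cleaned)

# scikit-learn's FastICA expects a dense array, so the sparse matrix is densified.
# n_components=3 is an arbitrary choice for this tiny example.
ica = FastICA(n_components=3, random_state=0, max_iter=1000)
components = ica.fit_transform(tfidf.toarray())

# One row per message, one column per independent component
print(components.shape)
```

On a real SMS corpus the TF-IDF matrix can be large, so densifying it may need a reduced vocabulary (for example via the vectorizer's `max_features` parameter).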
Proposed System:
Components:
1. Data Collection
2. Preprocessing
3. Feature Extraction
4. FastICA
5. Neural Network
6. Classification
7. Feedback Mechanism
Workflow:
1. Receive SMS messages
2. Preprocess messages
3. Extract features
4. Apply FastICA
5. Train neural network
6. Classify messages
7. Store results and feedback
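The workflow can be sketched end to end as a single scikit-learn pipeline. This is only an illustrative sketch under stated assumptions: the toy messages and labels are made up, `MLPClassifier` stands in for the neural network, the `to_dense` helper and the `results` list are hypothetical, and the component and layer sizes are arbitrary.

```python
from sklearn.decomposition import FastICA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def to_dense(matrix):
    """Convert a SciPy sparse matrix to a dense NumPy array for FastICA."""
    return matrix.toarray()


# Received SMS messages: hypothetical toy data (1 = spam, 0 = ham).
texts = [
    "Win a free prize now, call this number",
    "Are we still meeting for lunch today",
    "Urgent, claim your free cash reward",
    "Can you send me the notes from class",
    "Free entry in a weekly competition, text win",
    "I will call you when I get home",
    "You have been selected for a free gift card",
    "See you at the station at six",
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

workflow = Pipeline([
    # Preprocess messages and extract TF-IDF features
    ("tfidf", TfidfVectorizer(lowercase=True)),
    ("dense", FunctionTransformer(to_dense, accept_sparse=True)),
    # Apply FastICA (4 components is an arbitrary choice here)
    ("ica", FastICA(n_components=4, random_state=0, max_iter=1000)),
    # Train a small feed-forward neural network
    ("mlp", MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)),
])
workflow.fit(texts, labels)

# Classify new messages and store the results so user feedback can be attached later
results = []
for sms in ["Claim your free prize today", "Lunch at noon tomorrow?"]:
    predicted = int(workflow.predict([sms])[0])
    results.append({"text": sms, "predicted": predicted, "feedback": None})

print(results)
```

With so few training messages the predictions are not meaningful; the sketch only shows how the stages connect.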
Literature Survey:
• Qian Xu and Evan Wei Xiang, Baidu, A Survey on SMS Spam Detection Using Noncontent Features, Technical Report no. 1541-1672, Sep. 2012, IEEE.
• [1] The authors use a simple neural network to detect spam content. The network is trained on the extracted features and then used for classification.
• Saadat Nazirova, Survey on Spam Filtering Techniques, Technical Report no. 153-160, August 2011...
• [2] Various types of neural networks have been used to detect spam. Neural networks are able to detect features which can be detected by a human. They model complex relationships between input and output, and they make a system adaptable so that it can adjust to a changing environment. A huge number of techniques and solutions have been proposed to detect spam, but every technique has some pitfalls.
• Zhenhai Duan, Senior Member, IEEE, Peng Chen, Fernando Sanchez, Yingfei Dong, Member, IEEE, Mary Stephenson, and James Michael Barker, Detecting Spam Zombies by Monitoring Outgoing Messages, vol. 9, no. 2, March/April 2012.
• [3] This work addresses the detection of massive message spam. A bulk SMS service may include spam, as the receiver does not always want to receive the messages from the senders. For this purpose the authors developed an effective spam zombie detection system named SPOT, which monitors the outgoing messages of a network. SPOT is designed based on a powerful statistical tool called the Sequential Probability Ratio Test, which has bounded false positive and false negative error rates.
• Vandana Jaswal, Spam Detection System Using Hidden Markov Model, volume 3, issue 7, July 2013.
• [4] This work describes an image spam detection system that detects spam words. Filtering methods are used to detect stemmed words, and a hidden Markov model is then used to delete spam images.
• Nan Jiang, Yu Jin, Ann Skudlark, and Zhi-Li Zhang
• [5] Using SMS spam messages together with SMS network records collected from a large US-based cellular carrier, the authors carry out a comprehensive study of SMS spamming. Their analysis shows various characteristics of SMS spamming activities, such as spamming rates, victim selection strategies, and spatial clustering of spam numbers. It also reveals that spam numbers with similar content exhibit strong similarity in terms of their sending patterns, tenure, devices, and geolocations.
Block Diagram of SMS Spam Detection:
Software and Hardware Requirements:
• Software Components:
• 1. Programming Language: Python or MATLAB for implementing FastICA and
Neural Network algorithms.
• 2. Neural Network Library: TensorFlow, Keras, or PyTorch for building and
training the neural network.
• 3. FastICA Library: scikit-learn or MATLAB's built-in FastICA function for
implementing Independent Component Analysis (ICA).
• 4. SMS Dataset: A labeled dataset of SMS messages (spam and non-spam) for
training and testing the model.
• 5. Natural Language Processing (NLP) Tools: NLTK, spaCy, or Stanford CoreNLP
for text preprocessing and feature extraction.
Hardware Components
• 1. CPU: A multi-core processor (e.g., Intel Core i5 or i7) for efficient
computation.
• 2. GPU: A dedicated graphics processing unit (e.g., NVIDIA GeForce or
Quadro) for accelerating neural network computations.
• 3. Memory: Adequate RAM (at least 8 GB) for storing and processing
large datasets.
• 4. Storage: A fast storage drive (e.g., SSD) for storing the dataset,
model, and other files.
• 5. Server or Cloud Infrastructure: Optional, for deploying the model in
a production environment.
Functional Requirements:
• 1. Text Preprocessing: Cleans and normalizes text data.
• 2. FastICA: Extracts independent features from data.
• 3. Neural Network: Learns patterns and classifies SMS as spam or non-
spam.
• 4. Spam Filtering: Filters out spam messages based on model
predictions.
• 5. Model Evaluation: Evaluates model performance using metrics like
accuracy and recall.
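The spam filtering and model evaluation requirements can be illustrated with a small worked example. The labels, predictions, and messages below are made up for illustration (they do not come from the project's results); the metric functions are from scikit-learn.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Hypothetical true labels and model predictions (1 = spam, 0 = non-spam).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Accuracy: share of all messages classified correctly.
accuracy = accuracy_score(y_true, y_pred)

# Recall for the spam class: share of actual spam that was caught.
recall = recall_score(y_true, y_pred, pos_label=1)

# Confusion matrix counts: true negatives, false positives,
# false negatives, true positives.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

print("Accuracy:", accuracy)  # 6 correct out of 8 -> 0.75
print("Recall:", recall)      # 3 of 4 spam messages caught -> 0.75

# Spam filtering: keep only the messages predicted as non-spam.
messages = ["msg %d" % i for i in range(len(y_pred))]  # placeholder texts
inbox = [m for m, p in zip(messages, y_pred) if p == 0]
print("Delivered:", inbox)
```

Recall matters here because a missed spam message (false negative) reaches the user, while the false positive count shows how many legitimate messages would be filtered out by mistake.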
Advantages of the Proposed System:
• 1. High Accuracy
• 2. Robustness to Noise
• 3. Feature Extraction
• 4. Real-Time Detection
• 5. Scalability
• 6. Flexibility
• 7. Reduced False Positives
Disadvantages of the Proposed System:
• 1. Complexity: Requires expertise in machine learning and neural
networks.
• 2. Training Time: Requires large amounts of training data and
computational resources.
• 3. Overfitting: May not generalize well to new, unseen data.
• 4. Dependence on Data Quality: Requires high-quality training data to
achieve good results.
• 5. Computational Resources: Requires significant computational
resources for training and deployment.
Source Code:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Load dataset (the URL points to a labeled tweet corpus; its columns are renamed below)
df = pd.read_csv("https://raw.githubusercontent.com/dD2405/Twitter_Sentiment_Analysis/master/train.csv")
df = df[['label', 'tweet']]
df.columns = ['label', 'text']

# Map numeric labels to class names
df['label'] = df['label'].map({0: 'ham', 1: 'spam'})

# Split dataset into training (80%) and test (20%) sets
X_train, X_test, y_train, y_test = train_test_split(
    df['text'], df['label'], test_size=0.2, random_state=42)

# Create a text processing and classification pipeline:
# word counts -> TF-IDF weights -> Multinomial Naive Bayes classifier
pipeline = Pipeline([
    ('vectorizer', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('classifier', MultinomialNB())
])

# Train model
pipeline.fit(X_train, y_train)

# Predict on test data
y_pred = pipeline.predict(X_test)

# Function for prediction: returns 'ham' or 'spam' for a single message
def predict_sms(text):
    return pipeline.predict([text])[0]

# Example
sms_text = "Congratulations! You've won a free lottery ticket. Call now!"
print("Prediction:", predict_sms(sms_text))
Program Implementation:
Output:
Benefits of Using Python for SMS Spam
Detection:
1. Easy to Use & Readable – Python’s simple syntax makes it beginner-friendly for machine learning and text processing.
2. Rich Libraries – Powerful tools like scikit-learn, NLTK, and pandas simplify spam detection.
3. Strong Text Processing – Efficient tokenization, stemming, and TF-IDF transformation for better accuracy.
4. AI & ML Integration – Supports Naïve Bayes, Deep Learning, and scalable deployment with Flask/FastAPI.
5. Large Community Support – Open-source tools, datasets, and active forums for learning and troubleshooting.
THANK YOU
