Email Spam Filtering

Computer Security Seminar

N.Muthiyalu Jothir – 271120


Media Informatics

12/18/24 Email Spam Filtering - Muthiyalu Jothir 1


Agenda
• What is Spam?
• Statistics
• Who Benefits from It?
• Spam Filtering Techniques
• Combining Filters
• Conclusion

What is Spam?
• Spam → Unsolicited email
• Email sent as identical or nearly identical messages to thousands (or millions) of recipients.
• Caution!
  “SPAM” (spiced ham) is also a popular American canned meat brand…

Problem ☹
• With a tiny investment, a spammer can send over 100,000 bulk emails per hour.
• Junk mail wastes storage and transmission bandwidth.
• ISPs’ investment → a cost we absorb as the ISPs’ customers.
• Spam is a problem because the cost is forced onto us, the recipients.

Statistics
Email considered Spam                        40% of all email
Daily Spam emails sent                       12.4 billion
Daily Spam received per person               6
Annual Spam received per person              2,200
Spam cost to all non-corp. Internet users    $255 million
Spam cost to all U.S. Corporations in 2002   $8.9 billion
Estimated Spam increase by 2007              63%
Users who reply to Spam email                28%
Users who purchased from Spam email          8%
Wasted corporate time per Spam email         4-5 seconds


Who benefits from Spam?
[Diagram: flow between Spammers, Recipient, Lead Generators and Financial Firms]
• Financial Firms (e.g. Mortgage): information about interested customers
• Lead Generators: the recipient replies here; they gain 2% of the loan value per customer data
• Spammers: share the profit with the Lead Generators
• Recipient

Spam Control Techniques

Fight Back techniques
• Reporting Spam to ISP
• Fight back filters
• Slow Senders
• Law ???
• etc.

Filtering Techniques
• Challenge-Response Filtering
• Blacklists and White lists
• Content based filters
  • Rule based
  • Bayesian filters


Reporting Spam To ISPs
• The original spam solution
• Legitimate ISPs respond to such complaints
• Spammers get kicked off

Disadvantage
• Spammers disguise themselves.
• Naïve users cannot interpret the email headers

Filters that Fight Back (FFB)
• The majority of spam contains links to web pages.
• Spam filters could automatically retrieve the URLs and crawl those pages, which would increase the load on the server.
• If all the spam recipients do this at the same time, the server might crash, so the cost of spamming increases.

Caution!
• FFB usually works with blacklists (of malicious servers) in order to avoid attacking innocent servers.


Filtering Techniques



Spam Vs Ham
• Care must be taken with any spam filtering technique
• “All the spam could be allowed to pass through; but not even a single legitimate mail should be filtered.”
• False positive – legitimate mail classified as spam.
• The lowest possible false positive rate is desired…
• Caution: check your junk folder before deleting
• Don’t believe your spam filter ☺
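To make the false positive rate concrete, here is a minimal Python sketch. The function name `false_positive_rate` and the sample labels are illustrative assumptions and do not come from the slides.

```python
def false_positive_rate(actual, predicted):
    """Fraction of legitimate (ham) mails that were wrongly flagged as spam."""
    # Keep only the predictions made for mails that really are ham.
    ham_predictions = [p for a, p in zip(actual, predicted) if a == "ham"]
    if not ham_predictions:
        return 0.0
    false_positives = sum(1 for p in ham_predictions if p == "spam")
    return false_positives / len(ham_predictions)


# Hypothetical labels for five mails: what they are vs. what the filter said.
actual = ["ham", "ham", "spam", "ham", "spam"]
predicted = ["ham", "spam", "spam", "ham", "ham"]
rate = false_positive_rate(actual, predicted)
print(rate)
```

Here one of the three legitimate mails is flagged as spam, so the rate is one third. The missed spam (the last mail) does not count against this measure, which matches the slide's priority of protecting legitimate mail.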


Challenge-Response Filtering
• Emails from unknown senders receive an auto-reply message asking the senders to verify themselves
• Senders are “challenged” to type in a word that is hidden within a graphic or a sound file
• Mail is forwarded to the receiver’s inbox only after a successful “response”
• This technique filters almost all spam. No spammer would be interested in taking the extra effort to prove himself or herself.
• Commercial product: “spamarrest”

Disadvantage
• This technique is rude ☹
• Sometimes senders don’t reply, or forget to reply, to the challenge
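The challenge-response flow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `ChallengeResponseFilter`, its method names, and the use of a random hex token as the challenge word are all hypothetical, and a real product would hide the word in an image or sound file and send it by email.

```python
import secrets


class ChallengeResponseFilter:
    """Holds mail from unknown senders until they answer a challenge."""

    def __init__(self, known_senders=()):
        self.known_senders = set(known_senders)
        self.pending = {}  # sender -> (challenge word, held messages)
        self.inbox = []

    def receive(self, sender, message):
        """Deliver mail from known senders; hold and challenge everyone else."""
        if sender in self.known_senders:
            self.inbox.append((sender, message))
            return None
        if sender not in self.pending:
            # Assumption: a random token stands in for the hidden word.
            self.pending[sender] = (secrets.token_hex(3), [])
        word, held = self.pending[sender]
        held.append(message)
        return word  # stands in for the auto-reply sent to the sender

    def respond(self, sender, answer):
        """Release held mail only if the sender typed the right word."""
        if sender not in self.pending:
            return False
        word, held = self.pending[sender]
        if answer != word:
            return False
        del self.pending[sender]
        self.known_senders.add(sender)
        self.inbox.extend((sender, m) for m in held)
        return True


cr = ChallengeResponseFilter(known_senders={"friend@example.com"})
cr.receive("friend@example.com", "Lunch?")
challenge = cr.receive("stranger@example.com", "Hello")
delivered_before = len(cr.inbox)
accepted = cr.respond("stranger@example.com", challenge)
delivered_after = len(cr.inbox)
```

The sketch also shows the disadvantage listed above: if the stranger never calls `respond`, the held message stays in `pending` and never reaches the inbox.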


Blacklists and White lists
• Blacklists of misbehaving servers or known spammers are collected by several sites.
• The sender ID in the email is compared with the blacklist
• White lists are complementary to blacklists and contain addresses of trusted contacts
• Use blacklists and white lists for first-level filtering (before applying content checks), not as the only tool for making a decision.

Disadvantage
• Prone to misconfiguration: legitimate servers may be unable to get off a list to which they were incorrectly added.
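A minimal sketch of list-based first-level filtering, assuming the lists are plain Python sets of addresses and domains. The function name `first_level_filter`, the three return values, and the sample entries are hypothetical.

```python
def first_level_filter(sender, whitelist, blacklist):
    """Return 'accept', 'reject', or 'check_content' for a sender address."""
    sender = sender.lower()
    domain = sender.rsplit("@", 1)[-1]
    if sender in whitelist:
        return "accept"  # trusted contact
    if sender in blacklist or domain in blacklist:
        return "reject"  # known spammer or misbehaving server
    return "check_content"  # the lists alone cannot decide


# Hypothetical list entries.
whitelist = {"alice@example.com"}
blacklist = {"spammer@example.net", "bulkmail.example"}

trusted = first_level_filter("Alice@example.com", whitelist, blacklist)
blocked = first_level_filter("promo@bulkmail.example", whitelist, blacklist)
unknown = first_level_filter("bob@example.org", whitelist, blacklist)
```

Checking the white list first is a design choice in this sketch: it lets a trusted contact through even when the contact's server has been blacklisted by mistake. Anything the lists do not cover falls through to the content checks.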


Content based filters

• It is not a good idea to filter mail based on blacklists alone
• Wiser decision → consider the actual content of the email
• Almost all successful spam filters use this technique
• Major types: rule-based and Bayesian

Rule Based Filters
• Rule-based filters use a set of static rules to decide whether a mail is spam or not.
• Rules could look for
  • words and phrases
  • lots of uppercase characters
  • exclamation points
  • special characters
  • Web links
  • HTML messages
  • background colors
  • crazy Subject lines, etc.


Rule based filters
• Rules are given scores, based on importance
• Incoming mails are parsed and checked for known malicious patterns
• A total score is calculated over the triggered rules
• If Final Score > Threshold, classify as spam. Otherwise, classify as legitimate mail.
• The threshold is decided by the user.

Rule Based Filters
• “SpamAssassin”, a popular spam filtering product, uses rule-based filtering.
• Perl regular expressions (regex) are used for pattern checking
• Example rules
  • header __LOCAL_FROM_NEWS From =~ /news@example\.com/i
  • body __LOCAL_SALES_FIGURES /\bMonthly Sales Figures\b/
  • meta LOCAL_NEWS_SALES_FIGURES (__LOCAL_FROM_NEWS && __LOCAL_SALES_FIGURES)
  • score LOCAL_NEWS_SALES_FIGURES 0.8
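The score-and-threshold scheme of the previous slides can be sketched in Python. This is not how SpamAssassin itself is implemented; the rule names, patterns, scores and the threshold of 3.0 are all made-up assumptions for illustration.

```python
import re

# Hypothetical rules: (name, compiled pattern, score). The scores are made up.
RULES = [
    ("MANY_EXCLAMATIONS", re.compile(r"!{3,}"), 1.5),
    ("SHOUTING", re.compile(r"\b[A-Z]{5,}\b"), 1.0),
    ("FREE_OFFER", re.compile(r"\bfree\b", re.IGNORECASE), 2.0),
    ("WEB_LINK", re.compile(r"https?://", re.IGNORECASE), 0.5),
]


def score_message(text, rules=RULES):
    """Return the total score and the names of all rules that triggered."""
    triggered = [(name, score) for name, pattern, score in rules
                 if pattern.search(text)]
    total = sum(score for _, score in triggered)
    return total, [name for name, _ in triggered]


def classify(text, threshold=3.0):
    """Spam if the final score exceeds the user-chosen threshold."""
    total, _ = score_message(text)
    return "spam" if total > threshold else "ham"


spam_text = "WINNER!!! Claim your FREE prize at http://example.com"
ham_text = "Hi, the monthly sales figures are attached."

spam_score, spam_rules = score_message(spam_text)
ham_score, ham_rules = score_message(ham_text)
```

In this example all four rules trigger on `spam_text`, while none trigger on `ham_text`. Lowering the threshold catches more spam at the price of more false positives, which is why the slides leave the threshold to the user.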


Rule Based Filters
• Advantage
  • Easy to implement
  • No training required
• Disadvantage
  • Static rules are too general
  • Spammers find new ways to deceive the rules


Bayesian Filters
• Bayesian filters are the latest in spam filtering technology, and the most successful.
• Bayes classifiers have been used extensively in the field of pattern recognition.
• Given an unlabeled example, the classifier calculates the most likely classification, with an associated probability.

Bayesian Filters
• Steps in Bayes filtering
  • Training
  • Validation
  • Implementation
• Training starts with two collections of mail: one of spam and one of legitimate mail.
• For every word in these emails, the filter calculates a spam probability based on the proportion of spam occurrences.
• Bayesian filters are quite accurate, and adapt automatically as spam evolves.
• Bayesian filtering minimizes false positives because it considers evidence of innocence as well as evidence of spam.

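The training step can be sketched as follows. This is one simple way to turn "proportion of spam occurrences" into a number, not necessarily the estimate any particular product uses; the function names and the tiny mail collections are assumptions.

```python
from collections import Counter


def tokenize(text):
    """Very rough tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def train(spam_mails, ham_mails):
    """Return {word: estimated spam probability} from two labelled collections."""
    # Count in how many mails of each class a word appears at least once.
    spam_counts = Counter(w for mail in spam_mails for w in set(tokenize(mail)))
    ham_counts = Counter(w for mail in ham_mails for w in set(tokenize(mail)))
    probabilities = {}
    for word in set(spam_counts) | set(ham_counts):
        spam_freq = spam_counts[word] / len(spam_mails)
        ham_freq = ham_counts[word] / len(ham_mails)
        # Share of the word's occurrences that fall on the spam side.
        probabilities[word] = spam_freq / (spam_freq + ham_freq)
    return probabilities


# Hypothetical training collections.
spam_mails = ["cheap loans now", "cheap pills"]
ham_mails = ["meeting notes attached", "loans report attached"]
word_probs = train(spam_mails, ham_mails)
```

With these collections, "cheap" appears only in spam, "attached" only in legitimate mail, and "loans" equally in both. Words like "attached" are the "evidence of innocence" mentioned above. Practical filters usually keep the estimates away from exactly 0 and 1 so that a single word cannot decide the outcome on its own.
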
Bayesian Filtering
• Bayes probability:

  $$\Pr(\text{spam} \mid \text{words}) = \frac{\Pr(\text{spam}) \cdot \Pr(\text{words} \mid \text{spam})}{\Pr(\text{words})}$$

• A probability closer to 1 is classified as spam, and closer to 0 as ham.
• 0.5 is set as the threshold.
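A worked sketch of the formula in Python. It assumes the words are independent given the class (the usual "naive Bayes" simplification, which the slides do not state) and expands Pr(words) as Pr(spam)·Pr(words | spam) + Pr(ham)·Pr(words | ham). The per-word likelihoods and the prior of 0.5 are made-up values.

```python
def spam_probability(words, p_word_given_spam, p_word_given_ham, p_spam=0.5):
    """Pr(spam | words) by Bayes' rule, assuming independent words."""
    spam_term = p_spam        # Pr(spam) * Pr(words | spam), built up below
    ham_term = 1.0 - p_spam   # Pr(ham) * Pr(words | ham)
    for word in words:
        # Words never seen in training are skipped in this sketch.
        if word in p_word_given_spam and word in p_word_given_ham:
            spam_term *= p_word_given_spam[word]
            ham_term *= p_word_given_ham[word]
    # The denominator Pr(words) is the sum of the two terms.
    return spam_term / (spam_term + ham_term)


def classify(words, p_word_given_spam, p_word_given_ham, threshold=0.5):
    p = spam_probability(words, p_word_given_spam, p_word_given_ham)
    return "spam" if p > threshold else "ham"


# Hypothetical per-word likelihoods, e.g. obtained from a training phase.
p_word_given_spam = {"cheap": 0.8, "meeting": 0.1}
p_word_given_ham = {"cheap": 0.2, "meeting": 0.6}

p_cheap = spam_probability(["cheap"], p_word_given_spam, p_word_given_ham)
p_meeting = spam_probability(["meeting"], p_word_given_spam, p_word_given_ham)
```

For the single word "cheap" this gives 0.5·0.8 / (0.5·0.8 + 0.5·0.2) = 0.8, above the 0.5 threshold, while "meeting" gives roughly 0.14 and is classified as ham.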


Neural Network for Training
[Figure: Neural Network Structure]


Neural Networks for Training
• Neural networks are used to train the spam filter (rule-based or Bayesian); the network itself is not a filter
• Input → words, rules, etc.
• Trained over multiple samples of the user’s mail (both spam and ham)
• The weights of the links are adjusted until the desired output is obtained.

Supervised Learning
• Supervised learning → training with a “teacher” signal
• Train the system until the weights of the edges are optimized and no longer change.

Caution!
• Take care not to overtrain the network.
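As a rough illustration of training with a teacher signal until the weights stop changing, here is a single-layer perceptron in Python. It is a deliberate simplification of the multi-layer network the slides refer to; the feature encoding, learning rate and sample data are assumptions.

```python
def train_perceptron(samples, labels, epochs=20, learning_rate=0.1):
    """Adjust the weights until a full pass over the data changes nothing."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        changed = False
        for x, target in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            output = 1 if activation > 0 else 0
            error = target - output  # the "teacher" signal
            if error != 0:
                weights = [w + learning_rate * error * xi
                           for w, xi in zip(weights, x)]
                bias += learning_rate * error
                changed = True
        if not changed:
            break  # weights are unaltered, so training stops
    return weights, bias


def predict(x, weights, bias):
    """1 means spam, 0 means ham."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


# Hypothetical binary features per mail:
# [contains "free", contains a web link, sender is a known contact]
samples = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = ham

weights, bias = train_perceptron(samples, labels)
predictions = [predict(x, weights, bias) for x in samples]
```

The `epochs` cap bounds the training time. A common guard against overtraining is to stop based on accuracy on mails held out from training, which this sketch does not do.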


Combining Spam Filters

• Goal → the combined filter aims to improve on the individual filters’ performance.
• Combined Filter = Original Filter (OF) + Received Filter (RF)
• Max gain → the received filter contains some feature sets not found in the original filter.
• E.g.
  Original Filter = {“Share Market”, “Higher Studies”}
  Received Filter = {“Share Market”, “Job Alerts”}
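The example can be expressed with Python sets. Treating the "+" as a set union is an assumption made for this sketch; the slides do not spell out how the feature sets are merged.

```python
original_filter = {"Share Market", "Higher Studies"}
received_filter = {"Share Market", "Job Alerts"}

# Features both filters already know about.
shared = original_filter & received_filter

# Features only the received filter contributes: the source of the gain.
new_from_received = received_filter - original_filter

# Assumption: the combined filter covers the union of both feature sets.
combined = original_filter | received_filter

print(sorted(new_from_received))
print(sorted(combined))
```

Here the received filter adds “Job Alerts”, which the original filter could not judge on its own. If the received filter had no unique features, combining would bring no gain.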


Challenges
• Decisions (spam / ham) are made by both filters individually
• Decisions agree → no problem ☺
• Disagreement → due to the difference in feature sets
• Challenges
  • “How do we select the correct decision or filter?”
  • “Who selects it?”


Filter Selector (FS)
• Training phase → FS predicts the unique features (e.g. words) of RF
• Parse the emails of the training set and extract the features
• ‘Bag’ of (predicted) features for RF
• Text similarity comparison between the current email’s features and the feature sets of the filters.

Algorithm Flowchart
1. Training Phase
2. Final Verdict



TF-IDF Similarity Measure

• Commonly used in Information Retrieval applications.
• More frequent words are key to accurate classification of emails
• The feature set predicted by FS is unique
• “Query – Document” retrieval procedure.
  • 2 documents – the feature sets
  • Query – the current email
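A minimal sketch of the query-document comparison in Python: the two feature sets are the documents, the current email is the query, and the filter whose feature set is most similar is selected. The slides do not give the exact weighting, so the smoothed IDF, the cosine similarity and all function names here are assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """documents: list of token lists -> (list of {term: weight}, idf table)."""
    n = len(documents)
    df = Counter(term for doc in documents for term in set(doc))
    # Assumption: smoothed IDF, so terms present in both documents keep weight.
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    vectors = [{t: c / len(doc) * idf[t] for t, c in Counter(doc).items()}
               for doc in documents]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_filter(email_tokens, feature_sets):
    """Return the name of the most similar feature set and all scores."""
    names = list(feature_sets)
    vectors, idf = tfidf_vectors([feature_sets[name] for name in names])
    # Terms unknown to both feature sets carry no information and are dropped.
    counts = Counter(t for t in email_tokens if t in idf)
    query = {t: c / len(email_tokens) * idf[t] for t, c in counts.items()}
    scores = {name: cosine(query, vec) for name, vec in zip(names, vectors)}
    return max(scores, key=scores.get), scores


# Hypothetical feature sets for the original and the received filter.
feature_sets = {
    "original": ["share", "market", "higher", "studies"],
    "received": ["share", "market", "job", "alerts"],
}
chosen, scores = select_filter("new job alerts for you".split(), feature_sets)
```

For this email only "job" and "alerts" match a feature set, and both belong to the received filter, so its decision would be the one to trust.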


Experiments & Results



Conclusion
• We discussed techniques to “kill” spam
• Compared the various techniques
• So far, Bayesian filtering seems to be reliable
• Discussed a new approach to combining filters
• Future work:
  • Learning techniques for the Filter Selector
  • Better similarity measures


Thank You ☺
