
Spambase Names

This document describes the attributes of a spam email database. There are 48 continuous attributes giving the percentage of words in an email that match specific words, and 6 continuous attributes giving the percentage of characters in an email that match specific characters. Three further continuous attributes measure runs of consecutive capital letters in the email. Finally, a nominal class attribute indicates whether the email is spam.

Uploaded by

Heri Darmanto
Copyright
© All Rights Reserved

| SPAM E-MAIL DATABASE ATTRIBUTES (in .names format)
|
| 48 continuous real [0,100] attributes of type word_freq_WORD
| = percentage of words in the e-mail that match WORD,
| i.e. 100 * (number of times the WORD appears in the e-mail) /
| total number of words in e-mail. A "word" in this case is any
| string of alphanumeric characters bounded by non-alphanumeric
| characters or end-of-string.
|
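As a rough illustration of the word_freq_WORD definition above, the following Python sketch computes the percentage for a single word. The regular expression, the ASCII-only reading of "alphanumeric", the case-insensitive comparison, and returning 0.0 for an e-mail with no words are all assumptions that this file does not specify; `word_freq` is a hypothetical helper name.

```python
import re

# Assumption: a "word" is a maximal run of ASCII letters/digits, so runs are
# bounded by non-alphanumeric characters or the ends of the string.
_WORD_RE = re.compile(r"[A-Za-z0-9]+")


def word_freq(text: str, word: str) -> float:
    """Return 100 * (occurrences of `word`) / (total words in `text`)."""
    words = _WORD_RE.findall(text)
    if not words:
        return 0.0  # assumption: an e-mail with no words scores 0
    target = word.lower()  # assumption: matching ignores case
    count = sum(1 for w in words if w.lower() == target)
    return 100.0 * count / len(words)


print(word_freq("Free money, FREE offer!", "free"))
```

Here the text splits into four words (Free, money, FREE, offer), two of which match "free", so the result is 100 * 2 / 4 = 50.0. Numeric "words" such as 000 or 3d are handled the same way.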
| 6 continuous real [0,100] attributes of type char_freq_CHAR
| = percentage of characters in the e-mail that match CHAR,
| i.e. 100 * (number of CHAR occurrences) / total characters in e-mail
|
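A similar sketch for char_freq_CHAR, covering the six characters declared later in this file (`;`, `(`, `[`, `!`, `$`, `#`). Counting every character, whitespace included, toward the total is an assumption, as is returning 0.0 for an empty e-mail; `char_freq`, `char_freqs` and `CHARS` are hypothetical names.

```python
# The six characters that have char_freq_CHAR attributes in this file.
CHARS = [";", "(", "[", "!", "$", "#"]


def char_freq(text: str, char: str) -> float:
    """Return 100 * (occurrences of `char`) / (total characters in `text`)."""
    if not text:
        return 0.0  # assumption: an empty e-mail scores 0
    # Assumption: every character, including spaces and newlines, is counted.
    return 100.0 * text.count(char) / len(text)


def char_freqs(text: str) -> dict:
    """Map each attribute name, e.g. "char_freq_$", to its percentage."""
    return {f"char_freq_{c}": char_freq(text, c) for c in CHARS}


print(char_freq("Win $$$ now!", "$"))
```

The string "Win $$$ now!" has 12 characters, three of them `$`, so `char_freq` returns 25.0.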
| 1 continuous real [1,...] attribute of type capital_run_length_average
| = average length of uninterrupted sequences of capital letters
|
| 1 continuous integer [1,...] attribute of type capital_run_length_longest
| = length of longest uninterrupted sequence of capital letters
|
| 1 continuous integer [1,...] attribute of type capital_run_length_total
| = sum of the lengths of uninterrupted sequences of capital letters
| = total number of capital letters in the e-mail
|
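The three capital_run_length attributes can be derived together from the runs of consecutive capital letters. This sketch assumes only ASCII A-Z counts as a capital and that any other character, including a space, interrupts a run; `capital_run_features` is a hypothetical name. The ranges above start at 1, and this file does not say how e-mails with no capital letters were treated, so returning zeros in that case is also an assumption.

```python
from __future__ import annotations

import re

# Assumption: a run is a maximal sequence of ASCII capitals A-Z.
_CAP_RUN_RE = re.compile(r"[A-Z]+")


def capital_run_features(text: str) -> dict[str, float]:
    """Return the average, longest and total capital-run lengths."""
    runs = [len(run) for run in _CAP_RUN_RE.findall(text)]
    if not runs:
        # Assumption: no capitals -> zeros (the documented ranges start at 1).
        return {
            "capital_run_length_average": 0.0,
            "capital_run_length_longest": 0,
            "capital_run_length_total": 0,
        }
    total = sum(runs)  # equals the number of capital letters
    return {
        "capital_run_length_average": total / len(runs),
        "capital_run_length_longest": max(runs),
        "capital_run_length_total": total,
    }


print(capital_run_features("BUY NOW, Limited OFFER"))
```

For "BUY NOW, Limited OFFER" the runs are BUY, NOW, L and OFFER (lengths 3, 3, 1, 5), so the total is 12, the longest run is 5 and the average is 3.0. The total equals the number of capital letters, consistent with the second definition of capital_run_length_total.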
| 1 nominal {0,1} class attribute of type spam
| = denotes whether the e-mail was considered spam (1) or not (0),
| i.e. unsolicited commercial e-mail.
|
| For more information, see file 'spambase.DOCUMENTATION' at the
| UCI Machine Learning Repository: http://www.ics.uci.edu/~mlearn/MLRepository.html
1, 0. | spam, non-spam classes

word_freq_make: continuous.
word_freq_address: continuous.
word_freq_all: continuous.
word_freq_3d: continuous.
word_freq_our: continuous.
word_freq_over: continuous.
word_freq_remove: continuous.
word_freq_internet: continuous.
word_freq_order: continuous.
word_freq_mail: continuous.
word_freq_receive: continuous.
word_freq_will: continuous.
word_freq_people: continuous.
word_freq_report: continuous.
word_freq_addresses: continuous.
word_freq_free: continuous.
word_freq_business: continuous.
word_freq_email: continuous.
word_freq_you: continuous.
word_freq_credit: continuous.
word_freq_your: continuous.
word_freq_font: continuous.
word_freq_000: continuous.
word_freq_money: continuous.
word_freq_hp: continuous.
word_freq_hpl: continuous.
word_freq_george: continuous.
word_freq_650: continuous.
word_freq_lab: continuous.
word_freq_labs: continuous.
word_freq_telnet: continuous.
word_freq_857: continuous.
word_freq_data: continuous.
word_freq_415: continuous.
word_freq_85: continuous.
word_freq_technology: continuous.
word_freq_1999: continuous.
word_freq_parts: continuous.
word_freq_pm: continuous.
word_freq_direct: continuous.
word_freq_cs: continuous.
word_freq_meeting: continuous.
word_freq_original: continuous.
word_freq_project: continuous.
word_freq_re: continuous.
word_freq_edu: continuous.
word_freq_table: continuous.
word_freq_conference: continuous.
char_freq_;: continuous.
char_freq_(: continuous.
char_freq_[: continuous.
char_freq_!: continuous.
char_freq_$: continuous.
char_freq_#: continuous.
capital_run_length_average: continuous.
capital_run_length_longest: continuous.
capital_run_length_total: continuous.
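The declarations above follow the C4.5 .names layout: `|` starts a comment, the first non-comment line lists the class values, and each later line has the form `name: type.`. The following minimal parser sketch assumes exactly that layout (one declaration per line, no `|` or `:` inside attribute names); `parse_names` is a hypothetical helper, not part of any library.

```python
from __future__ import annotations


def parse_names(text: str) -> tuple[list[str], list[str]]:
    """Return (attribute_names, class_values) from C4.5-style .names text."""
    attrs: list[str] = []
    classes = None  # filled from the first non-comment line
    for raw in text.splitlines():
        line = raw.split("|", 1)[0].strip()  # drop "|" comments
        if not line:
            continue
        if classes is None:
            # e.g. "1, 0." -> ["1", "0"]
            classes = [c.strip() for c in line.rstrip(".").split(",")]
            continue
        # e.g. "char_freq_;: continuous." -> "char_freq_;"
        name, sep, _type = line.partition(":")
        if sep:
            attrs.append(name.strip())
    return attrs, classes or []


sample = "1, 0. | spam, non-spam classes\n\nword_freq_make: continuous.\nchar_freq_;: continuous.\n"
print(parse_names(sample))
```

Applied to the .names content alone, it should return the 57 attribute names in declaration order (48 word_freq, 6 char_freq, 3 capital_run_length) and the class values `["1", "0"]`. Appending `"spam"` to the names gives a column header for the class attribute, assuming the class value is stored last in each data row.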
