
PROJECT PRESENTATION
Robustness of Credibility Assessment

Team Members:
- Naga Dheeraj P
- Ravi Teja
Project Description
Our aim is to create 'adversarial examples' by making small modifications to each text snippet in the attack dataset that change the victim classifier's decision without altering the text's meaning.

Evaluation
Evaluation is done using the BODEGA score, which is the product of three numbers computed for each adversarial example:

- Confusion score (1 if the victim classifier changed its decision, 0 otherwise)
- Semantic score (BLEURT similarity between the original and the adversarial example, clipped to 0-1)
- Character score (Levenshtein distance scaled to a 0-1 similarity score)

The final ranking will be based on the BODEGA scores, averaged first over all examples in the dataset, and
then over the five domains and three victims. The number of queries to victim models needed to generate
each example will not influence the ranking but will be included in the analysis.
ROADMAP
We are given 3 classifier models (BERT, BiLSTM, and a Surprise Classifier). We are supposed to use the train dataset to train the surprise classifier and to understand how the other models work.

After that, we need to create a methodology to make changes to the text (in dev.tsv) using synonym replacement, rephrasing, and character-level changes until the classifier's decision changes.

After tweaking the texts, we use them to train the given models so that the models are exposed to these changes.

Finally, we evaluate the models on attack.tsv using the BODEGA score.
Dataset

Domain Name                         No. of rows   Dev dataset   Attack dataset   Size
Covid19                             1130          -             595              345 KB
Rumour Detection                    8683          2070          415              10.8 MB
Fact Checking                       172763        19010         405              51.2 MB
Style-based news bias assessment    60234         3600          400              248 MB
Propaganda Detection                11546         3320          416              1.9 MB

Each dataset has 3 TSV files: train.tsv, dev.tsv, and attack.tsv.
Each file has 3 columns: label, tweet id, and tweet (text).
0 denotes credible content; 1 denotes non-credible content.
RD_data value counts: 0 - 5671, 1 - 3012.
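The snippet below is a minimal sketch of loading one of these TSV files with pandas; the file path, the column names, and the header-less layout are assumptions based on the description above, not confirmed details of the released files.

import pandas as pd

def load_split(path):
    # Assumed layout: no header row; columns are label, tweet id, text.
    return pd.read_csv(path, sep="\t", header=None,
                       names=["label", "tweet_id", "text"],
                       quoting=3)  # csv.QUOTE_NONE: keep quote characters in tweets

rd_train = load_split("RD_data/train.tsv")  # hypothetical path
print(rd_train["label"].value_counts())     # RD_data counts from the slide: 0 -> 5671, 1 -> 3012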
Text Pre-processing

01. URL & Punctuation Removal
Eliminated URLs from the text to avoid irrelevant links that could skew the analysis.
Stripped punctuation marks to focus on the core words, enhancing text processing.

02. Emoji Decoding
Converted emojis to text descriptions, preserving their sentiment and meaning for better analysis.
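A minimal sketch of these two steps, assuming the third-party emoji package for emoji decoding (the slides do not name a specific library):

import re
import string
import emoji  # pip install emoji

URL_RE = re.compile(r"https?://\S+|www\.\S+")

def preprocess(text):
    text = URL_RE.sub("", text)  # 01: remove URLs before punctuation stripping mangles them
    text = text.translate(str.maketrans("", "", string.punctuation))  # 01: strip punctuation
    text = emoji.demojize(text, delimiters=(" ", " "))  # 02: droplet emoji -> " droplet "
    return " ".join(text.split())  # collapse leftover whitespace

print(preprocess("water shortage!! 💧 see https://example.com"))
# -> 'water shortage droplet see'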
Tokenization
Tokenization is the process of breaking down text into smaller units called tokens, typically words or phrases. This segmentation allows the analysis algorithms to examine each token independently and understand the context in which they appear.

Sample of tokenization output:
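The original slide shows the sample as an image; as an illustration, here is what NLTK's word tokenizer (one common choice; the slides do not name the tokenizer used) produces:

import nltk
nltk.download("punkt", quiet=True)  # tokenizer models, first run only
from nltk.tokenize import word_tokenize

print(word_tokenize("water shortage!! we have to save it!! people are dying"))
# ['water', 'shortage', '!', '!', 'we', 'have', 'to', 'save', 'it', '!', '!', 'people', 'are', 'dying']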
Stemming
Stemming is used to reduce words to their base or root form. The goal is to strip away prefixes, suffixes, and inflections to obtain the word stem, which might not always be a valid word itself. For example, the words "running," "runner," and "ran" can all be reduced to the stem "run".
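A short sketch with NLTK's Porter stemmer (an assumed choice; the slides do not say which stemmer was used). As noted above, the stem need not be a valid word:

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ["running", "connection", "ponies"]:
    print(word, "->", stemmer.stem(word))
# running -> run
# connection -> connect
# ponies -> poni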
TF-IDF
Term frequency (TF) of a term in a document:
TF(t, d) = (number of times t appears in d) / (total number of words in d)

Inverse document frequency (IDF) of a term:
IDF(t) = log( total number of documents in the collection / number of documents in the collection that contain t )

TF-IDF(t, d) = TF(t, d) * IDF(t)

Spam emails often contain specific keywords or phrases that are not commonly used in legitimate emails.
By analyzing the TF-IDF weights of words in an email, we can identify words that are more indicative of spam.
The classifier learns to associate higher weights for certain words with a higher probability of being spam.
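The sketch below implements the textbook formulas above directly (library implementations such as scikit-learn's TfidfVectorizer add smoothing and normalization, so their values differ slightly); the toy documents are illustrative:

import math

docs = [["free", "prize", "click", "to", "now"],
        ["meeting", "moved", "to", "monday"],
        ["click", "to", "claim", "prize"]]

def tf(term, doc):
    return doc.count(term) / len(doc)      # term frequency within one document

def idf(term, docs):
    df = sum(term in doc for doc in docs)  # number of documents containing the term
    return math.log(len(docs) / df)

def tf_idf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

print(tf_idf("prize", docs[0], docs))  # ~0.081: rare across documents, so informative
print(tf_idf("to", docs[0], docs))     # 0.0: appears in every document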
Correlation Analysis (to identify words that highly influence the labels)

1. Creating a Binary Matrix:
We start by creating a binary matrix that represents whether each word is present or absent in each tweet. If a word is present, its corresponding entry in the matrix is 1; otherwise, it is 0.

2. Computing Correlation:
For each word, we calculate the correlation between its presence across tweets and the labels assigned to those tweets. This correlation tells us how much the presence of a word in a tweet is related to the assigned label. A high positive correlation means the word is strongly associated with the label, while a negative correlation means its presence is associated with the opposite label.

3. Selecting Top Correlated Words:
After computing the correlation for each word, we select the top 6 words that have the highest correlation with the label. These words are considered the most influential words for predicting the label of a tweet (see the sketch after this list).
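Below is a sketch of steps 1-3 on toy data, assuming scikit-learn for the binary matrix and numpy for Pearson correlation; the texts and labels are illustrative:

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

texts = ["free prize click now", "meeting moved to monday",
         "click to claim free prize", "lunch moved to noon"]
labels = np.array([1, 0, 1, 0])  # 1 = non-credible, 0 = credible

vec = CountVectorizer(binary=True)  # step 1: entry is 1 if the word is present, else 0
X = vec.fit_transform(texts).toarray().astype(float)

# step 2: Pearson correlation of each word column with the label vector
corrs = np.array([np.corrcoef(X[:, j], labels)[0, 1] for j in range(X.shape[1])])

# step 3: the 6 words with the strongest correlation (by absolute value)
for j in np.argsort(-np.abs(corrs))[:6]:
    print(vec.get_feature_names_out()[j], round(corrs[j], 2))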
Tweaking the Statements

Original sentence: water shortage!! we have to save it!! people are dying
Adversarial sentence: h2o shortfall ! ! we have to economize it ! ! people are fail

The replace_with_synonyms method tokenizes the input text, identifies words based on their part of speech, and replaces them with synonyms retrieved from WordNet. This preserves the sentence's structure while altering its content.
The rephrase method rearranges the order of words in the input text, offering a simple form of
sentence transformation.
Additionally, the character_level_changes method introduces random typographical errors
into the text, simulating small alterations at the character level.
Tweaking the Statements (Code)
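The code from this slide is not preserved in this export. The sketch below reconstructs the three methods from the descriptions above using NLTK's WordNet; the part-of-speech handling mentioned for replace_with_synonyms is simplified away here, and all randomness choices are assumptions:

import random
import nltk
from nltk.corpus import wordnet
from nltk.tokenize import word_tokenize

nltk.download("wordnet", quiet=True)

def replace_with_synonyms(text):
    out = []
    for tok in word_tokenize(text):
        # candidate synonyms for this token from all its WordNet synsets
        lemmas = [l.name().replace("_", " ") for s in wordnet.synsets(tok)
                  for l in s.lemmas() if l.name().lower() != tok.lower()]
        out.append(random.choice(lemmas) if lemmas else tok)
    return " ".join(out)

def rephrase(text):
    toks = word_tokenize(text)
    random.shuffle(toks)  # naive word-order rearrangement
    return " ".join(toks)

def character_level_changes(text, n=1):
    chars = list(text)
    for _ in range(n):  # n random adjacent-character swaps (simulated typos)
        if len(chars) < 2:
            break
        i = random.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

print(replace_with_synonyms("water shortage!! we have to save it!! people are dying"))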
Evaluation (Final Phase)

Evaluation is done using the BODEGA score, which is the product of three numbers computed for each adversarial example:

- Confusion score (1 if the victim classifier changed its decision, 0 otherwise)
- Semantic score (BLEURT similarity between the original and the adversarial example, clipped to 0-1)
- Character score (Levenshtein distance scaled to a 0-1 similarity score)

BODEGA SCORE = CONFUSION * SEMANTIC * CHARACTER SCORE

The final ranking will be based on the BODEGA scores, averaged first over all examples in the dataset, and then over the five domains and three victims. The number of queries to the victim models needed to generate each example will not influence the ranking but will be included in the analysis.
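A sketch of the scoring formula as described above; bleurt_similarity stands in for a real BLEURT model call (not implemented here), and the exact scaling used by the official evaluator may differ from this reading of the slide:

def levenshtein(a, b):
    # standard dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb))) # substitution
        prev = cur
    return prev[-1]

def bodega_score(original, adversarial, decision_changed, bleurt_similarity):
    confusion = 1.0 if decision_changed else 0.0
    semantic = min(max(bleurt_similarity(original, adversarial), 0.0), 1.0)  # clip to 0-1
    character = 1.0 - levenshtein(original, adversarial) / max(len(original), len(adversarial))
    return confusion * semantic * character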
