Tweaking 2
PRESENTATION
Robustness of Credibility Assessment
Team Members:
-Naga Dheeraj P
-Ravi Teja
Project Description
Our aim is to create ‘adversarial examples’
by making small modifications to each text snippet
in the attack dataset that change the
victim classifier’s decision without altering the text meaning.
Evaluation
Evaluation is done using the BODEGA score, which is the product of three numbers
computed for each adversarial example: a confusion score (whether the attack changed the
victim's decision), a semantic similarity score between the original and the modified text,
and a character-level similarity score.
The final ranking will be based on the BODEGA scores, averaged first over all examples in the dataset, and
then over the five domains and three victims. The number of queries to victim models needed to generate
each example will not influence the ranking but will be included in the analysis.
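As a rough illustration of how the ranking metric is aggregated (a sketch, not the official evaluation script; the per-example component scores are assumed to be given):

```python
import numpy as np

def bodega_score(confusion, semantic_sim, char_sim):
    # BODEGA score of a single adversarial example: the product of its three components.
    return confusion * semantic_sim * char_sim

def ranking_score(scores_by_setting):
    # scores_by_setting: dict mapping (domain, victim) -> list of per-example BODEGA scores.
    # Average over the examples first, then over the domain/victim combinations.
    per_setting_means = [np.mean(scores) for scores in scores_by_setting.values()]
    return float(np.mean(per_setting_means))
```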
ROADMAP
We are given 3 classifier models (BERT, BiLSTM, Surprise Classifier).
We are supposed to use the train dataset to train the Surprise Classifier and to
understand how the other models work.
After that, we need to create a methodology to make changes to the text (in dev.tsv),
using synonym replacement, rephrasing, and character-level changes until the
classifier's decision changes (a sketch of this attack loop follows the roadmap).
After tweaking the texts, we use the tweaked texts to train the given models to
make sure that the models are exposed to these changes.
Finally, we evaluate the models on attack.tsv, using the BODEGA score.
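A minimal sketch of such an attack loop, assuming a victim exposed as a predict(text) function and a list of tweak functions (both are illustrative interfaces, not the shared-task code):

```python
import random

def attack(text, victim_predict, tweaks, max_queries=100):
    # Keep applying small tweaks until the victim's decision flips
    # or the query budget is exhausted.
    original_label = victim_predict(text)
    candidate = text
    for _ in range(max_queries):
        candidate = random.choice(tweaks)(candidate)  # e.g. synonym swap, rephrase, typo
        if victim_predict(candidate) != original_label:
            return candidate  # decision changed: adversarial example found
    return candidate  # no flip within the budget
```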
Dataset
[Table: for each domain, the number of rows in the dev dataset and the size of the attack dataset]
[Figure: sample of tokenization output]
Stemming
Stemming is used to reduce words to their base or root form.
The goal is to strip away prefixes, suffixes, and inflections to obtain the word stem, which
might not always be a valid word itself.
For example, the words "running," "runner," and "ran" can all be reduced to the stem "run".
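For instance, NLTK's Porter stemmer (one common rule-based stemmer) collapses regular inflections onto a shared stem; irregular forms such as "ran" usually need lemmatization instead:

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Rule-based suffix stripping: regular inflected forms collapse onto a common stem.
for word in ["running", "runs", "runner"]:
    print(word, "->", stemmer.stem(word))
# running -> run
# runs    -> run
# runner  -> runner  (not every related form reaches the same stem)
```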
Term frequency (TF) of a term in a document = (number of times the term appears in the document) / (total number of words in the document)
Inverse document frequency (IDF) of a term = log(total number of documents in the collection / number of documents in the collection that contain the term)
TF-IDF = TF * IDF
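A small worked example of these formulas on a toy three-document corpus (plain Python; the documents are made up for illustration):

```python
import math

docs = [
    "free prize claim your prize now".split(),
    "meeting scheduled for monday".split(),
    "claim free gift now".split(),
]

def tf(term, doc):
    # Times the term appears in the document / total words in the document.
    return doc.count(term) / len(doc)

def idf(term, docs):
    # log(total documents / documents that contain the term).
    containing = sum(1 for d in docs if term in d)
    return math.log(len(docs) / containing)

def tf_idf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

print(tf_idf("prize", docs[0], docs))  # high: frequent here, appears in no other document
print(tf_idf("now", docs[0], docs))    # lower: also appears in another document
```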
Spam emails often contain specific keywords or phrases that are not commonly used in legitimate emails.
By analyzing the TF-IDF weights of words in an email, we can identify words that are more indicative of spam.
The classifier learns to associate high TF-IDF weights for certain words with a higher probability that the email is spam.
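A minimal sketch of this idea with scikit-learn (the tiny inline dataset is invented for illustration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data, invented for illustration: 1 = spam, 0 = legitimate.
emails = [
    "win a free prize claim now",
    "lowest price on meds buy now",
    "meeting moved to monday at noon",
    "please review the attached report",
]
labels = [1, 1, 0, 0]

# TF-IDF features feed a linear classifier; words that carry high weight in
# spam emails end up with large positive coefficients.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(emails, labels)

print(model.predict(["claim your free prize"]))  # expected: [1]
```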
Correlation Analysis (to identify words that strongly influence the labels)
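One simple way to do this (a sketch; the toy emails and labels are the same invented ones as above) is to correlate each word's TF-IDF column with the label vector:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

emails = [
    "win a free prize claim now",
    "lowest price on meds buy now",
    "meeting moved to monday at noon",
    "please review the attached report",
]
labels = [1, 1, 0, 0]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(emails).toarray()
y = np.array(labels)

# Pearson correlation between each word's TF-IDF column and the label.
correlations = {
    word: np.corrcoef(X[:, idx], y)[0, 1]
    for word, idx in vectorizer.vocabulary_.items()
}

# The words with the strongest correlations are the ones most tied to the label.
for word, corr in sorted(correlations.items(), key=lambda kv: -abs(kv[1]))[:5]:
    print(word, round(corr, 2))
```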
Original sentence : water shortage!! we have to save it!! people are dying
Adversarial sentence: h2o shortfall ! ! we have to economize it ! ! people are fail
The replace_with_synonyms method tokenizes input text, identifies words based on their part
of speech, and replaces them with synonyms retrieved from WordNet. This preserves the
sentence's structure while altering its content.
The rephrase method rearranges the order of words in the input text, offering a simple form of
sentence transformation.
Additionally, the character_level_changes method introduces random typographical errors
into the text, simulating small alterations at the character level.
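A minimal sketch of what these three methods could look like, assuming NLTK with the punkt, averaged_perceptron_tagger, and wordnet data downloaded (names and details are illustrative, not the exact project code):

```python
import random
import nltk
from nltk.corpus import wordnet

def replace_with_synonyms(text):
    # Tokenize, tag parts of speech, and swap content words for WordNet synonyms.
    tokens = nltk.word_tokenize(text)
    out = []
    for word, tag in nltk.pos_tag(tokens):
        synsets = wordnet.synsets(word)
        if tag[0] in ("N", "V", "J", "R") and synsets:
            lemmas = [l.name().replace("_", " ") for s in synsets for l in s.lemmas()]
            candidates = [l for l in lemmas if l.lower() != word.lower()]
            out.append(random.choice(candidates) if candidates else word)
        else:
            out.append(word)
    return " ".join(out)

def rephrase(text):
    # A very simple transformation: rearrange the order of the words.
    words = text.split()
    random.shuffle(words)
    return " ".join(words)

def character_level_changes(text, n_typos=2):
    # Introduce a few random typographical errors by swapping adjacent characters.
    chars = list(text)
    for _ in range(n_typos):
        if len(chars) < 2:
            break
        i = random.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)
```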
Tweaking the Statements (Code)
Evaluation (Final Phase)
As described above, the final ranking is based on the BODEGA scores, averaged first over all
examples in the dataset and then over the five domains and three victims; the number of queries
to the victim models is reported in the analysis but does not affect the ranking.