
ConceptDrift: Uncovering Biases through the Lens of Foundational Models

arXiv:2410.18970v1 [cs.AI] 24 Oct 2024

Cristian Daniel Păduraru
Bitdefender, Romania; University of Bucharest
cpaduraru@bitdefender.com

Antonio Bărbălau
Bitdefender, Romania; University of Bucharest
ext-abarbalau@bitdefender.com

Radu Filipescu
Bitdefender, Romania; University of Bucharest
rfilipescu@bitdefender.com

Andrei Liviu Nicolicioiu
Mila, Montreal, Canada; University of Montreal, Canada
andrei.nicolicioiu@mila.quebec

Elena Burceanu
Bitdefender, Romania; Institute for Logic and Data Science, Romania
eburceanu@bitdefender.com

Abstract
Datasets and pre-trained models come with intrinsic biases. Most methods rely
on spotting them by analysing misclassified samples, in a semi-automated human-
computer validation. In contrast, we propose ConceptDrift, a method which
analyzes the weights of a linear probe learned on top of a foundational model. We
capitalize on the weight update trajectory, which starts from the embedding of
the textual representation of the class and drifts towards embeddings
that disclose hidden biases. Different from prior work, with this approach we can
pinpoint unwanted correlations in a dataset, providing more than just possible
explanations for wrong predictions. We empirically prove the efficacy of our
method by significantly improving zero-shot performance with bias-augmented
prompting. Our method is not bound to a single modality, and we experiment
in this work with both image (Waterbirds, CelebA, Nico++) and text datasets
(CivilComments).

1 Introduction
Deep neural networks, and especially fine-tuned versions of foundational models, are commonly
deployed in critical areas such as healthcare, finance, and criminal justice, where biased predictions
can have significant societal consequences [1]. Despite their impact, these models are often employed
in their natural black-box state, i.e. as highly non-linear, multi-layered decision processes, lacking
transparency or interpretability. Even if the pretrained model has been validated by the community,
the dataset leveraged in the fine-tuning process can, and usually does, imprint the model with new
biases. This issue is particularly concerning as biases from these datasets can lead to undesired
outcomes [6], reinforcing existing inequalities or creating new forms of discrimination. This scenario
finds its representation in subpopulation shift setups, where biases can naturally occur in samples.
Within the context of subpopulation shift setups, efforts employing foundational models [13, 37]
have been recently made towards identifying and preventing biases. However, these methods limit

Interpretable AI: Past, Present and Future, Workshop at the 38th Conference on Neural Information Processing
Systems (NeurIPS 2024).
[Figure 1 shows three panels: I. Linear probing, II. Ranking System, III. Dictionary filtering. It depicts the update trajectory of a class neuron's weight, the scope of relevant concepts for each class, the key concepts ranked for 'landbird' (e.g. forest, bamboo, tree branch) and 'waterbird' (e.g. ocean, lake, duck, seagull), and the filtered biases (e.g. 'forest next to a bird', 'bamboo forest', 'tree branch in forest' for landbird; 'bird on water', 'flying over ocean', 'bird flying over river' for waterbird).]
Figure 1: Illustration of ConceptDrift for the Waterbirds benchmark. The model’s classification
weights drift from the embedding of the textual representation of the class, outside the scope of
relevant concepts, towards biases. We propose a novel embedding-space scoring system, capitalizing
upon this drift, to identify which concepts factor in the final decision of the model and leverage a
dictionary-based approach to delineate biases.

themselves to data analysis alone. For instance, Kim et al. [13] focus on investigating misclassified
validation samples. Their method relies on validating the presence of a given object within the set of
mistakes and its absence from the set of correctly classified samples, in order to label it as
a bias. The actual internal decision-making process of the model is never investigated nor referred to.
As an example, a method focusing on analyzing misclassified samples, such as B2T [13], is restricted
to highlighting only the biases present in the validation set. Furthermore, some of the biases found
in the dataset might not have been imprinted upon the model weights. As an example, fine-tuning
a ViT-L-14 CLIP [25] model on the O2O-Hard setup from Spawrious [19], a dataset specifically
designed to instill biases at train-time and expose them at test-time, results in a 96% test-time accuracy.
This demonstrates that biases in the data need not necessarily translate to biases in the model, and that
model investigation is imperative in confirming whether or not a bias seen in the data is a contributing
factor in the decision making process of the model.
We endeavor to expand upon the current usage of foundational models, beyond the restricted scope
of simple data analysis, and propose a new direction for bias identification within the context of
subpopulation shift setups. Our method focuses on investigating the skew of the model's weights,
in order to detect and prioritize spurious features that are part of the decision-making process of the
investigated model. We hereby propose a novel protocol for uncovering biases using foundational
models such as CLIP [25] and mGTE [36], leveraging the topology of their embedding space to
identify and name biases instilled by linear probing. Our protocol, dubbed ConceptDrift, is illustrated
in Fig. 1. We showcase how, during training, the weights of the final classification layer drift away
from the textual representation of their associated class, towards representations of spurious attributes.
We propose a ranking system based on embedding-space arithmetic to extract keywords from concepts
which factor in the activation of class neurons, and leverage a dictionary-based approach to delineate
concepts outside the semantic scope of the classes, as biases.
We summarize our main contributions as follows:

1. We introduce ConceptDrift, a method capable of pinpointing concepts relevant to the decision-
making process of a model. We are the first to propose a weight-space approach for
identifying the biases of fine-tuned foundational models, diverging from the current data-
restricted protocols.

2. We propose a novel, embedding-space scoring method, able to reveal concepts which
discriminatively impact the class prediction.

3. We show how our procedure is suited to assist in bias investigation. We reveal previously
untapped biases on four datasets: Waterbirds [31], CelebA [17], Nico++ [35] and CivilCom-
ments [5], showcasing significant improvements in terms of zero-shot bias prevention over
state-of-the-art bias identification methods. Validated on image and text data, the method
can work on other modalities as well, given a foundational model with text processing capabilities.

2 Our Method
For a standard classification task $\{(x_j, y_j)\} \subset \mathcal{X} \times \mathcal{Y}$, we propose a method for pinpointing concepts
that are erroneously correlated to the task's classes. In order to achieve this, we train a linear layer on
top of frozen, pre-trained representations of the input data, obtained from a foundational model $M$.
Next, we find concepts $(c_i)_{1 \le i \le q}$ in textual form that are present in the training data and strongly
influence the predictions of the classifier.
We require that the model $M$ is capable of embedding both the concepts $c_i$ and the input samples
$x_j$ into the $\mathbb{R}^D$ vector space, such that their cosine similarity $\cos(M(c_i), M(x_j))$ is high when the
concept represented by $c_i$ is present in sample $x_j$.
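As an illustration of this requirement, the sketch below embeds a textual concept and an image with a CLIP model and computes their cosine similarity. It is a minimal example using the Hugging Face transformers CLIP wrappers; the checkpoint id and the image path are placeholders, not part of the original pipeline.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Placeholder checkpoint id and image path, for illustration only.
model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

image = Image.open("example_bird.jpg")
inputs = processor(text=["a bamboo forest"], images=image,
                   return_tensors="pt", padding=True)

with torch.no_grad():
    text_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                       attention_mask=inputs["attention_mask"])
    image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])

# Cosine similarity between the concept c_i and the sample x_j.
print(torch.nn.functional.cosine_similarity(text_emb, image_emb).item())
```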
The main steps of our method are the following:
Step 1: Initialization We initialize the weights $w_k$, $1 \le k \le |\mathcal{Y}| = N$, of the linear layer with the
embedding of the corresponding class name, extracted by the model $M$, for each class $k$.
Step 2: Drifting towards biases, through learning We perform ERM [29] training on our dataset of
interest, while keeping the weights $w_k$ on the unit sphere. Through learning, the weights of the linear
layer naturally shift away from their original initialization, towards concepts that can effectively distinguish
the samples of different classes. In an ideal dataset, unbiased w.r.t. the foundational model, the
learned weights would remain the embeddings of the class names. In all other cases, the concepts
used for classification drift, as visually presented in Fig. 1.
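A minimal sketch of Steps 1-2, assuming the sample embeddings and class-name embeddings are precomputed and L2-normalized; the class weights start at the class-name embeddings and are re-projected onto the unit sphere after each update, so the drift can later be read off by comparing the final weights with the initial ones. Hyperparameter details and the learned temperature used in the experiments (Sec. 3) are omitted here.

```python
import torch
import torch.nn.functional as F

def fit_probe(feats, labels, class_name_embs, epochs=10, lr=1e-4):
    """feats: (n, D) L2-normalized sample embeddings; class_name_embs: (N, D)."""
    # Step 1: initialize the class weights with the class-name text embeddings.
    W = F.normalize(class_name_embs.clone(), dim=-1).requires_grad_(True)
    opt = torch.optim.AdamW([W], lr=lr)
    for _ in range(epochs):
        logits = feats @ W.t()                      # cosine similarities (both sides unit norm)
        loss = F.cross_entropy(logits, labels)
        opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():
            W.copy_(F.normalize(W, dim=-1))         # Step 2: keep the weights on the unit sphere
    return W.detach()
```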
Step 3: Dataset concepts extraction For image classification tasks, we first use a captioning model
to obtain descriptions of the images in the dataset. Next, for both image and text classification, we
extract concepts from the captions or directly from the text samples.
Step 4: Rank the concepts For each class, we want to keep only the candidate concepts which
favour the prediction of that class with respect to another subset of classes. Since the weights $w_k$ of
each class are normalized, the prediction rule of the classifier can be formulated as:
$$\hat{y}_j = \arg\max_{k \in \mathcal{Y}} \cos(w_k, M(x_j)). \qquad (1)$$

This further motivates the need for $w_k$ to point closer to samples in class $k$ than the weights of the
other classes. Consider now a concept $c_i$ that has a high cosine similarity with the weight $w_k$, and a training
example $x_j$ containing the concept $c_i$. Based on the following inequality (proof in Appendix A):
$$\cos(M(x_j), w_k) \;\ge\; \cos(M(x_j), M(c_i)) - \sqrt{2\,(1 - \cos(w_k, M(c_i)))}, \qquad (2)$$
it follows that, as long as our assumption from the beginning of this section holds and $w_k$ is highly
similar to $M(c_i)$, then $w_k$ is also guaranteed to have a high similarity with samples containing the
concept $c_i$. Since we seek the concepts which favour the prediction of class $k$ as opposed to at least
one other class, we rank them by the difference between the similarity of $M(c_i)$ with $w_k$ and its similarity with the weight of any
other class $w_p$:
$$\mathrm{score}_k(c_i) = \cos(w_k, M(c_i)) - \min_{1 \le p \le N,\, p \ne k} \cos(w_p, M(c_i)) \qquad (3)$$
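The score in Eq. 3 translates directly into code. The sketch below assumes unit-norm concept embeddings `concept_embs` (shape (q, D)) and the learned class weights `W` (shape (N, D)) from the probe above; both names are placeholders.

```python
import torch

def rank_concepts(concept_embs, W, k):
    """Eq. 3: similarity of each concept to w_k minus its minimum similarity
    to the weight of any other class (higher score = stronger candidate)."""
    sims = concept_embs @ W.t()                         # (q, N) cosine similarities
    others = torch.cat([sims[:, :k], sims[:, k + 1:]], dim=1)
    return sims[:, k] - others.min(dim=1).values        # (q,) scores
```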

Step 5: Filtering concepts Among the concepts with a high rank based on the score in Eq. 3, we also
expect to find those that refer to the class itself, or to specific instances of it. We thus apply a filtering
procedure to remove instances of the class from the keywords, leaving only associated attributes or
keywords of completely different concepts.
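A sketch of such a filtering step, using WordNet to drop keywords that name the class or are hypernyms/hyponyms of it; the stopword handling and substring rules used in Sec. 3 are simplified away here, and `related_to_class` is an illustrative helper rather than the exact procedure.

```python
from nltk.corpus import wordnet as wn

def related_to_class(word, class_word):
    """True if `word` matches the class name or is a WordNet hypernym/hyponym of it."""
    if class_word in word or word in class_word:
        return True
    class_synsets = set(wn.synsets(class_word))
    for syn in wn.synsets(word):
        related = {syn} | set(syn.closure(lambda s: s.hypernyms())) \
                        | set(syn.closure(lambda s: s.hyponyms()))
        if class_synsets & related:
            return True
    return False

def filter_concepts(keywords, class_word):
    return [w for w in keywords if not related_to_class(w, class_word)]
```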

3 Experimental analysis
Foundational models (FM) We used mGTE (gte-large-en-v1.5 [36]) for text embeddings in Civil-
Comments [5], and OpenAI CLIP ViT-L/14 [25] for text and images in the other datasets.
We train the linear layer on L2-normalized embeddings extracted by these models, using the Py-
Torch [22] AdamW optimizer with a learning rate of 1e-4, a weight decay of 1e-5, a batch size
of 1024 and a cosine annealing learning rate scheduler. We use the cross-entropy loss with balanced
class weights as the objective. The weights of the layer are normalized after each update.
Table 1: Foundational Model (FM) zero-shot prompting task. We modify the prompt using several
bias-discovery methods and evaluate the zero-shot performance of the FM. Notice how our
ConceptDrift method significantly improves the worst-group accuracy for all datasets, over the baseline (prompt
template without the bias wildcard) and over the existing SoTA methods. All numbers are accuracies in % (higher is better).

                             Waterbirds       CelebA          Nico++       CivilComments
Method                      Worst   Avg.    Worst   Avg.    Worst   Avg.    Worst   Avg.
FM zero-shot [25]            35.2   90.7     72.8   87.4     57.7   88.4     33.1   83.4
FM w/ B2T [13]               48.1   86.1     72.8   88.0      -      -        -      -
FM w/ SpLiCE [4]              -      -       67.2   90.2      -      -        -      -
FM w/ Lg [37]                 -      -       67.2   90.2      -      -        -      -
FM w/ ConceptDrift (ours)    55.3   84.7     75.6   88.4     63.5   86.2     53.7   69.0

Table 2: Model ablation (zero-shot prompting task). We vary the ranking score and the cut-off for
concepts, revealing that both aspects can greatly influence the overall performance. All numbers are accuracies in % (higher is better).

                                      Waterbirds      CelebA        Nico++     CivilComments      Mean
Variations                           Worst  Avg.    Worst  Avg.   Worst  Avg.   Worst  Avg.    Worst  Avg.
top-q concepts: 30                    51.3  85.0     70.6  86.7    46.0  80.9    54.0  68.0     59.0  77.9
score: final - init weights           48.1  85.7     74.4  88.6    60.9  85.8    50.6  64.3     61.9  80.5
ConceptDrift (top-q concepts: 15,
  score: classes difference)          55.3  84.7     75.6  88.4    63.5  86.4    53.7  69.0     65.8  81.3

We also learn a temperature to scale the logits. As an early stopping criterion, we use the class-balanced
accuracy on the validation set.
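The class-balanced validation accuracy used as the early-stopping criterion can be computed as below; a small sketch, with `val_logits` and `val_labels` assumed to come from the probe and the validation split.

```python
import torch

def class_balanced_accuracy(val_logits, val_labels, num_classes):
    """Mean of per-class accuracies, so each class counts equally regardless of its size."""
    preds = val_logits.argmax(dim=1)
    per_class = [(preds[val_labels == c] == c).float().mean() for c in range(num_classes)]
    return torch.stack(per_class).mean()
```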
Keyword extraction For image captioning we use the GIT-Large model [32], trained on
MSCOCO [15]. Next, to extract concepts we use YAKE [7], taking the top 256 n-gram concepts,
for both n = 3 and n = 5. For post-processing the selected concepts, we split them into individual words to
remove stopwords, substrings of the class names, and hypernyms or hyponyms of the class concepts
using WordNet [20] (e.g. 'seagull' for the 'landbird' class). We remove keywords common to all classes,
as they usually rank high only because they are part of n-grams containing the class names.
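A hedged sketch of this captioning-plus-keyword-extraction step, using a GIT checkpoint through transformers and the yake package; the checkpoint id and the image path are assumptions, and the stopword/WordNet post-processing is the one sketched for Step 5 in Sec. 2.

```python
import yake
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

# Assumed checkpoint id for GIT-Large trained on MSCOCO; adjust as needed.
processor = AutoProcessor.from_pretrained("microsoft/git-large-coco")
git = AutoModelForCausalLM.from_pretrained("microsoft/git-large-coco")

def caption(image_path):
    pixel_values = processor(images=Image.open(image_path), return_tensors="pt").pixel_values
    ids = git.generate(pixel_values=pixel_values, max_length=50)
    return processor.batch_decode(ids, skip_special_tokens=True)[0]

def extract_concepts(captions, n=3, top=256):
    # YAKE returns (keyword, score) pairs; lower scores mean more relevant keywords.
    extractor = yake.KeywordExtractor(lan="en", n=n, top=top)
    return [kw for kw, _ in extractor.extract_keywords(" . ".join(captions))]
```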

3.1 Datasets

Waterbirds [31] is a common dataset for generalization and bias mitigation. It is created from CUB
[33], by grouping different species of birds into two categories, landbirds and waterbirds, each
associated with a spurious correlation regarding its background, land and water respectively.
CelebA [17] is a large-scale collection of celebrity images (over 200,000), widely used in computer
vision research. The setup for using it in a generalization context [18] consists of using the Blond_Hair
attribute as the class label and the Male attribute as the spurious variable.
Nico++ [35] is an image dataset with annotations for a main object and its context (e.g. dog on the beach).
Unlike other datasets, NICO++ includes over 50 classes and 6 contexts, providing a richer setting
for evaluating model generalization across diverse scenarios. For this work, we build a
setup with spurious correlations between 4 classes and 3 contexts (more details in Appx. A.2).
CivilComments [5] is a large collection (1.8 million) of online user comments, also used for research
on bias and fairness in NLP across different social and identity groups.

3.2 Quantitative analysis through zero-shot prompting

In this experiment, we validate the ability of our method to identify biases. We follow the B2T [13] setup
and choose the zero-shot prediction task. We augment the initial, class-only prompt with the
identified bias, through a minimal intervention (e.g. 'a photo of a {cls} in the {bias}', see Appx. A.1).
Table 3: Identified global biases. For a qualitative comparison, we show the biases extracted by
multiple methods on the Waterbirds and CelebA datasets. In red are biases that are off-topic, person
names, or too related to the semantic content of the class; in green are new biases that were not identified
before; in blue are words that come from expressions like 'body of water', which are quite difficult
to filter. Notice how our ConceptDrift method proposes many new bias candidates, which may well be correct,
since they are obtained by analysing the drift of the model weights while iterating through each dataset.

Waterbirds (highest rank first)
  B2T [13]              landbird: forest, woods, tree, branch
                        waterbird: ocean, beach, surfer, boat, dock, water, lake
  SpLiCE [4]            landbird: -
                        waterbird: -
  Lg [37]               landbird: forest, woods, rainforest, tree branch, tree
                        waterbird: beach, lake, water, seagull, pond
  ConceptDrift (ours)   landbird: bamboo, log, tree, surrounded, floor, field, ground, forest
                        waterbird: boat, lake, flying, ocean, snowy, pond, swimming

CelebA (highest rank first)
  B2T [13]              blonde hair: model, favorite, outfit, hair, love, style
                        non-blonde hair: man, player, person, artist
  SpLiCE [4]            blonde hair: hairstyles, dolly, turban, actress, tennis, beard
                        non-blonde hair: hairstyles, visor, amy, kate, fielder, cuff, rapper, cyclist
  Lg [37]               blonde hair: woman blonde hair, blonde hair, actress, model, woman long hair
                        non-blonde hair: man, man wearing sunglasses, young man, black hair, actor
  ConceptDrift (ours)   blonde hair: smiles, woman, body, long, girl, beautiful
                        non-blonde hair: man, dark, brown, eye, made, hat

For each class, we test one prompt for each bias identified in the dataset, taking into account the score
of the best one (zero-shot with max over templates). The results in Tab. 1 show how the biases
automatically selected by our method improve the worst-group accuracy, over the initial zero-shot
baseline and other bias-extracting solutions, in all four tested datasets. This emphasizes the quality
of the biases automatically extracted by ConceptDrift. The better they are, the more capable the
zero-shot prompt approach is to generalize, by adapting the prompt better to the new dataset context.
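The evaluation protocol can be sketched as follows, assuming a helper `embed_text` that returns unit-norm text embeddings from the foundational model (its name and the prompt templates are illustrative); for each class, the best prompt over its identified biases is used to score each image.

```python
import torch

def zero_shot_with_biases(image_embs, classes, biases_per_class, embed_text):
    """image_embs: (n, D) unit-norm image embeddings; returns predicted class indices."""
    class_scores = []
    for cls in classes:
        prompts = [f"a photo of a {cls}"] + [f"a photo of a {cls} in the {b}"
                                             for b in biases_per_class[cls]]
        text_embs = embed_text(prompts)                 # (num_prompts, D), unit norm
        sims = image_embs @ text_embs.t()               # (n, num_prompts)
        class_scores.append(sims.max(dim=1).values)     # best prompt per image
    return torch.stack(class_scores, dim=1).argmax(dim=1)
```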
Ablations We validate key decisions of our algorithm in Tab. 2. We change the ranking score (score)
in Eq. 3 to the difference in cosine similarity of a concept with the final weights and the initial ones,
for each class. This highlights the concepts that the weights of a class have become more similar to,
but does not take their similarity to the other class weights into account. We also notice that the number
of chosen concepts (top-q concepts) is important, as taking too many adds noise to the prompts and
lowers the performance. We leave finding a good cut-off strategy for future work.
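The ablated score from Tab. 2 can be written analogously to Eq. 3, as the change in similarity of each concept with a class weight between initialization and the end of training; a sketch reusing the placeholder names from Sec. 2 (inputs are torch tensors).

```python
def rank_concepts_by_drift(concept_embs, W_final, W_init, k):
    """Ablation score: how much more similar each concept became to class k's weight."""
    return concept_embs @ W_final[k] - concept_embs @ W_init[k]
```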

3.3 Extracting qualitative biases

In Fig. 2, we analyse the scores of the n-gram concepts on Waterbirds, for both classes, extracted as
explained in Sec. 2. Notice how the score variation for each class is steep at the margins, becoming
almost flat as soon as the similarity decreases, showing that there are only a few candidates with high
similarity scores, worth taking into account for the subsequent bias extraction.
Qualitative examples We present the identified biases in Tab. 3. Notice how our method comes with
many new proposals for biases (in green). This might be the case because our approach is fundamentally
different from the others [13, 37, 4], relying on the decision-making process of the
model being investigated. See Nico++ and CivilComments in Appx. A.2.

4 Related Work
To enable a more meaningful comparison, we have distilled in Tab. 4 existing methods down to the
aspects we consider fundamental to bias detection.
Biases and generalization Machine learning methods easily capture relevant factors to solve a task.
Nevertheless, many times, models capture shortcuts [10] that are helpful in solving a task but are not
fundamental or essential for it. These shortcuts represent spurious correlations or biases that do not
always hold and should not be used for reliable generalization outside of the training distribution, often
leading to degraded performance [24, 3, 11].
[Figure 2 consists of three plots of concept scores. Panel titles: 'Concepts scoring for Waterbirds classes' (all concepts, scored against the waterbird and landbird weights), 'Concepts scoring for landbird' and 'Concepts scoring for waterbird' (the high-score regions). The x-axes list ranked n-gram concepts such as 'forest floor', 'bamboo forest', 'tree branch in a forest', 'snowy forest' for landbird and 'lake', 'ocean', 'seagull', 'body of water', 'bird swimming', 'flying over a lake' for waterbird; the y-axes show the concept score.]
Figure 2: Top concept scores for Waterbirds (before Step 5: Filtering concepts). Notice that the curve
has a steep descent on both ends, showing that there are just a few top candidates (with high scores)
for each class. The first plot shows the similarity scores for all the concepts, while the second and third
plots detail the high-score areas, one per Waterbirds class.

Table 4: Comparison of bias-extraction approaches, based on fundamental differences between methods.

Method               Bias definition key focus      Source for bias candidates   Principle for scoring candidates
B2T [13]             mistakes driven                validation set               common keywords in mistakes
SpLiCE [4]           dictionary learning            full dataset                 Lasso solver
Lg [37]              class-specificity score        full dataset                 embedding-space arithmetic
ConceptDrift (ours)  weights drift towards biases   full dataset                 embedding-space arithmetic

Prior works [13, 37, 4] have thus focused on identifying
dataset biases through data analysis procedures.
Debiasing Debiasing and bias extraction techniques have become crucial in ensuring the fairness and
accuracy of machine learning models [28], with extensive research dedicated to removing harmful
biases across various domains. Some existing methods use bias annotations to train an unbiased model,
by means of group-balanced subsampling [12], reweighting [27] or data augmentations [34]. In the
absence of these annotations, other works [21, 16, 23] have proposed to first learn a biased model
and then focus on its mistakes to train an unbiased one.
Fairness Fairness in machine learning has been extensively studied, with numerous approaches [8, 30]
proposed to facilitate ethical research and ensure equitable outcomes across different subpopulations.
Most of those methods overlap with domain generalization and worst-group performance improve-
ments. This is also a field where model interpretability plays a crucial role [26], as understanding
how decisions are made can help in identifying and mitigating biases.
Invariant Learning Robustness to out-of-distribution changes can be obtained by enforcing that
the learning model is invariant to different environments or domains [2, 14, 34]. But there are many
cases where we don’t have access to such environments and we must discover them. Approaches

like [9] partition the data into subsets that maximally contradict an invariant constraint, and apply
algorithms for distributional robustness, like groupDRO [27], on those subsets, called environments.

5 Conclusions
We introduce ConceptDrift, the first method to identify biases using a weight-space approach, moving
beyond traditional data-restricted protocols. Our novel embedding-space scoring method highlights
concepts that significantly influence class predictions. We empirically demonstrate its effectiveness in
bias investigation across four datasets: Waterbirds, CelebA, Nico++, and CivilComments, revealing
previously undetected biases and achieving notable improvements in zero-shot bias prevention over
current state-of-the-art methods. Validated on image and text datasets, ConceptDrift can accommodate
any other modality, given a foundational model endowed with text processing capabilities for it.

6 Acknowledgments
This work was funded by EU Horizon project ELIAS (No. 101120237).

References
[1] Angwin, Julia and Larson, Jeff and Mattu, Surya and Kirchner, Lauren. Machine Bias. There’s
software used across the country to predict future criminals. And it’s biased against blacks.
https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing, 2016. Accessed
on: 2024-08-31.

[2] M. Arjovsky, L. Bottou, I. Gulrajani, and D. Lopez-Paz. Invariant risk minimization. arXiv preprint
arXiv:1907.02893, 2019.

[3] S. Beery, G. Van Horn, and P. Perona. Recognition in terra incognita. In Proceedings of the European
conference on computer vision (ECCV), 2018.

[4] U. Bhalla, A. Oesterling, S. Srinivas, F. P. Calmon, and H. Lakkaraju. Interpreting clip with sparse linear
concept embeddings (splice), 2024. URL https://arxiv.org/abs/2402.10376.

[5] D. Borkan, L. Dixon, J. Sorensen, N. Thain, and L. Vasserman. Nuanced metrics for measuring unintended
bias with real data for text classification. CoRR, abs/1903.04561, 2019. URL http://arxiv.org/abs/
1903.04561.

[6] A. Caliskan, J. J. Bryson, and A. Narayanan. Semantics derived automatically from language corpora
contain human-like biases. Science, 356(6334):183–186, 2017. doi: 10.1126/science.aal4230. URL
https://www.science.org/doi/abs/10.1126/science.aal4230.

[7] R. Campos, V. Mangaravite, A. Pasquali, A. Jorge, C. Nunes, and A. Jatowt. Yake! keyword extraction
from single documents using multiple local features. Information Sciences, 509:257–289, 2020. ISSN
0020-0255. doi: https://doi.org/10.1016/j.ins.2019.09.013. URL https://www.sciencedirect.com/
science/article/pii/S0020025519308588.

[8] S. Caton and C. Haas. Fairness in machine learning: A survey. ACM Comput. Surv., 56(7), apr 2024. ISSN
0360-0300. doi: 10.1145/3616865. URL https://doi.org/10.1145/3616865.

[9] E. Creager, J.-H. Jacobsen, and R. Zemel. Environment inference for invariant learning. In International
Conference on Machine Learning, pages 2189–2200. PMLR, 2021.

[10] R. Geirhos, J.-H. Jacobsen, C. Michaelis, R. Zemel, W. Brendel, M. Bethge, and F. A. Wichmann. Shortcut
learning in deep neural networks. Nature Machine Intelligence, 2(11):665–673, 2020.

[11] D. Hendrycks, K. Zhao, S. Basart, J. Steinhardt, and D. Song. Natural adversarial examples. In Proceedings
of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15262–15271, 2021.

[12] P. Izmailov, P. Kirichenko, N. Gruver, and A. G. Wilson. On feature learning in the presence of spurious
correlations. Advances in Neural Information Processing Systems, 35:38516–38532, 2022.

[13] Y. Kim, S. Mo, M. Kim, K. Lee, J. Lee, and J. Shin. Discovering and mitigating visual biases through
keyword explanation. In CVPR, 2024.

[14] D. Krueger, E. Caballero, J. Jacobsen, A. Zhang, J. Binas, D. Zhang, R. L. Priol, and A. C. Courville.
Out-of-distribution generalization via risk extrapolation (rex). In Proceedings of the 38th International
Conference on Machine Learning, ICML, 2021.

[15] T. Lin, M. Maire, S. J. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft
COCO: common objects in context. In ECCV, 2014.

[16] E. Z. Liu, B. Haghgoo, A. S. Chen, A. Raghunathan, P. W. Koh, S. Sagawa, P. Liang, and C. Finn.
Just train twice: Improving group robustness without training group information. In M. Meila and
T. Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, volume
139 of Proceedings of Machine Learning Research, pages 6781–6792. PMLR, 18–24 Jul 2021. URL
https://proceedings.mlr.press/v139/liu21f.html.

[17] Z. Liu, P. Luo, X. Wang, and X. Tang. Deep learning face attributes in the wild. In 2015 IEEE International
Conference on Computer Vision (ICCV), pages 3730–3738, 2015. doi: 10.1109/ICCV.2015.425.

[18] Z. Liu, P. Luo, X. Wang, and X. Tang. Deep learning face attributes in the wild. In Proceedings of
International Conference on Computer Vision (ICCV), December 2015.

[19] A. Lynch, G. J.-S. Dovonon, J. Kaddour, and R. Silva. Spawrious: A benchmark for fine control of spurious
correlation biases, 2023. URL https://arxiv.org/abs/2303.05470.

[20] G. A. Miller. Wordnet: a lexical database for english. Commun. ACM, 38(11):39–41, nov 1995. ISSN
0001-0782. doi: 10.1145/219717.219748. URL https://doi.org/10.1145/219717.219748.

[21] J. Nam, H. Cha, S. Ahn, J. Lee, and J. Shin. Learning from failure: De-biasing classifier from
biased classifier. In H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, editors, Ad-
vances in Neural Information Processing Systems, volume 33, pages 20673–20684. Curran Asso-
ciates, Inc., 2020. URL https://proceedings.neurips.cc/paper_files/paper/2020/file/
eddc3427c5d77843c2253f1e799fe933-Paper.pdf.

[22] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and
A. Lerer. Automatic differentiation in pytorch. 2017.

[23] M. Pezeshki, D. Bouchacourt, M. Ibrahim, N. Ballas, P. Vincent, and D. Lopez-Paz. Discovering environ-
ments with xrm. In Forty-first International Conference on Machine Learning, 2024.

[24] J. Quiñonero-Candela, M. Sugiyama, A. Schwaighofer, and N. D. Lawrence. Dataset Shift in Machine
Learning. The MIT Press, 12 2008. ISBN 9780262255103. doi: 10.7551/mitpress/9780262170055.001.
0001. URL https://doi.org/10.7551/mitpress/9780262170055.001.0001.

[25] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin,
J. Clark, et al. Learning transferable visual models from natural language supervision. In International
conference on machine learning, pages 8748–8763. PMLR, 2021.

[26] C. Rudin, C. Chen, Z. Chen, H. Huang, L. Semenova, and C. Zhong. Interpretable machine learning:
Fundamental principles and 10 grand challenges. Statistics Surveys, 16, 01 2022. doi: 10.1214/21-SS133.

[27] S. Sagawa, P. W. Koh, T. B. Hashimoto, and P. Liang. Distributionally robust neural networks. In
International Conference on Learning Representations, 2020. URL https://api.semanticscholar.
org/CorpusID:213662188.

[28] T. Tommasi, N. Patricia, B. Caputo, and T. Tuytelaars. A Deeper Look at Dataset Bias, pages
37–55. Springer International Publishing, Cham, 2017. ISBN 978-3-319-58347-1. doi: 10.1007/
978-3-319-58347-1_2. URL https://doi.org/10.1007/978-3-319-58347-1_2.

[29] V. Vapnik. An overview of statistical learning theory. IEEE Transactions on Neural Networks, 10(5):
988–999, 1999. doi: 10.1109/72.788640.

[30] S. Verma and J. Rubin. Fairness definitions explained. In Proceedings of the International Workshop
on Software Fairness, FairWare ’18, page 1–7, New York, NY, USA, 2018. Association for Computing
Machinery. ISBN 9781450357463. doi: 10.1145/3194770.3194776. URL https://doi.org/10.1145/
3194770.3194776.

[31] C. Wah, S. Branson, P. Welinder, P. Perona, and S. J. Belongie. The Caltech-UCSD Birds-200-2011
dataset, 2011. URL https://api.semanticscholar.org/CorpusID:16119123.

[32] J. Wang, Z. Yang, X. Hu, L. Li, K. Lin, Z. Gan, Z. Liu, C. Liu, and L. Wang. Git: A generative image-to-text
transformer for vision and language. arXiv preprint arXiv:2205.14100, 2022.

[33] P. Welinder, S. Branson, T. Mita, C. Wah, F. Schroff, S. Belongie, and P. Perona. Caltech-ucsd birds 200.
09 2010.
[34] H. Yao, Y. Wang, S. Li, L. Zhang, W. Liang, J. Zou, and C. Finn. Improving out-of-distribution robustness
via selective augmentation. In International Conference on Machine Learning, pages 25407–25437. PMLR,
2022.
[35] X. Zhang, Y. He, R. Xu, H. Yu, Z. Shen, and P. Cui. Nico++: Towards better benchmarking for domain
generalization. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition,
pages 16036–16047, 2023.
[36] X. Zhang, Y. Zhang, D. Long, W. Xie, Z. Dai, J. Tang, H. Lin, B. Yang, P. Xie, F. Huang, M. Zhang,
W. Li, and M. Zhang. mgte: Generalized long-context text representation and reranking models for
multilingual text retrieval. CoRR, abs/2407.19669, 2024. doi: 10.48550/ARXIV.2407.19669. URL
https://doi.org/10.48550/arXiv.2407.19669.

[37] Z. Zhao, S. Kumano, and T. Yamasaki. Language-guided detection and mitigation of unknown dataset
bias, 2024. URL https://arxiv.org/abs/2406.02889.

A Appendix
Finding biases in models We have discussed so far how our approach can be used to find biases in datasets, but it
can also be used to find biases of a model w.r.t. the foundational model. We apply the same procedure, with
the predictions of the model of interest used as the new ground-truth labels of the dataset entries.

Broader impacts We emphasize that our method should not be used in a stand-alone fashion for automated
discovery of biases in every field and that human assistance is needed in order to interpret the model output
before any further actions of consequence. Our tool is meant to aid and assist humans in the process of bias
identification, not to replace them.

Limitations An important limitation of our method is the captioning model used for image classification
tasks. Zhao et al. [37] acknowledged as well that these models usually do not extract all the details in the images,
so methods relying on them are limited to discovering the biases that the captions can express. Another limitation is
the keyword extraction procedure: using a more sophisticated one could bring forth new biases (e.g. extracting
topics or taking synonymy into account). The method also relies on known hierarchies of concepts to detect
biases, by filtering concepts related to the desired class. These hierarchies and the relations they provide thus
limit the type of filtering that we can ensure.

Bound on cosine similarity of vectors Let $u, v, t \in \mathbb{R}^D$ be three vectors of unit length, with $u$ and $v$
being fixed. We are interested in finding the vector $t$ that maximizes the difference in cosine similarity with the
two fixed vectors:
$$\arg\max_{\|t\|_2 = 1} \left( t \cdot u - t \cdot v \right),$$
where $\cdot$ represents the standard dot product of vectors. This can be rewritten as:
$$\arg\max_{\|t\|_2 = 1} t \cdot (u - v) = \arg\max_{\|t\|_2 = 1} \cos(t, u - v)\, \|u - v\|_2 = \arg\max_{\|t\|_2 = 1} \cos(t, u - v),$$
as $\|u - v\|_2$ is a constant. It is now easy to see that the solution to this problem is $t = \frac{1}{\|u - v\|_2} (u - v)$, the
unit-length vector with the same orientation as $u - v$. Using this we can place an upper bound on the initial
difference:
$$t \cdot u - t \cdot v \le \|u - v\|_2,$$
which we then rearrange as
$$t \cdot v \ge t \cdot u - \|u - v\|_2.$$
The norm $\|u - v\|_2$ can be equivalently expressed as
$$\|u - v\|_2 = \sqrt{(u - v) \cdot (u - v)} = \sqrt{2 - 2\, u \cdot v} = \sqrt{2(1 - u \cdot v)}.$$
Introducing this in the previous inequality we obtain
$$t \cdot v \ge t \cdot u - \sqrt{2(1 - u \cdot v)}.$$
Since $u$, $v$ and $t$ are vectors of unit length we can replace the dot products with the cosine similarity. By then
setting $u = M(c_i)$, $v = w_k$ and $t = M(x_j)$ we finally obtain the inequality:
$$\cos(M(x_j), w_k) \ge \cos(M(x_j), M(c_i)) - \sqrt{2(1 - \cos(M(c_i), w_k))}.$$
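As a quick numerical sanity check (not part of the original derivation), the bound can be verified with random unit vectors:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
for _ in range(1000):
    u, v, t = F.normalize(torch.randn(3, 64), dim=-1)   # three random unit vectors
    lhs = (t * v).sum()                                  # cos(M(x_j), w_k)
    rhs = (t * u).sum() - torch.sqrt(2 * (1 - (u * v).sum()))
    assert lhs >= rhs - 1e-6                             # the bound holds up to float error
```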

A.1 Zero-Shot Prompts


The basic prompts we used for each dataset are the following:

• Waterbirds: ’a photo of a {class name}’
• CelebA: ’a photo of a person with {class name}’
• CivilComments: ’{class name}’
• Nico++: ’a photo of a {class name}’.

Next, we change them to accommodate the bias wildcard:

• Waterbirds: ’a photo of a {class name} in the {bias}’

Table 5: Identified global biases on Nico++. We find words related to the environments that we
associated with each class, but also some attributes more specific to the class itself than to the other
ones (e.g. ’wooden’ for chair).

Nico++, ConceptDrift (ours), highest rank first
car    (ground-truth bias: outdoor)  beach, parking, standing, driving, parked, blue, road, pool, lot, group
flower (ground-truth bias: grass)    red, close, yellow, wild, field, floating, water, white
chair  (ground-truth bias: water)    sitting, red, pool, beach, wooden, floating, near
truck  (ground-truth bias: water)    driving, road, large, lake, black, beach, spraying, field, standing, water

Table 6: Identified global biases on CivilComments. Notice how references to religion and ethnicity
are common in the class of offensive comments, while on the opposite side we have words that are
more common in formal contexts.

CivilComments, ConceptDrift (ours), highest rank first
non-offensive: experienced, completely, responsible, barrier, coverage, attempt, Engineer, total, Notice, shared, primarily, regard, helping, accepting, paycheck, wrote, petition, case, always, aspects, rest, noticed, name, hours, analysis, Extension, personal, blog, based, relative, important, new, mentioned
offensive: losers, acting, bigotry, misogynist, mental, racist, Muslim, jesus, Christian, driving, Sexuality, White, supremacist, Trump, someone, rid, repub, president, white, Mental, lesbian, like, people, Jihadist, intellectuals, state, God, dangerous, black, mans, killing, ultimate

• CelebA: ’a photo of a {bias} with {class name}’
• CivilComments: ’a/an {class name} comment about {bias}’
• Nico++: ’a photo of a {class name} in the {bias}’.

The class names used in the templates and for the initialization of the linear layer weights are:

• Waterbirds: ’landbird’, ’waterbird’
• CelebA: ’non-blonde hair’, ’blonde hair’
• CivilComments: ’non-offensive’, ’offensive’
• Nico++: ’car’, ’flower’, ’chair’, ’truck’

A.2 Nico++ and CivilComments Biases


See Tab. 5 and Tab. 6 for the biases extracted with our method for Nico++ and CivilComments datasets.

Custom Nico++ subset For the experiments on Nico++ we selected only the first four classes and paired
each with the environment it had the most samples in, resulting in the following associations: (car,
outdoor), (flower, grass), (chair, water), (truck, water). Notice how the classes chair and truck share the same
bias, in contrast to most popular subpopulation shift datasets, which only have one-to-one associations of classes
and biases. For the training set we keep, for each class, 300 samples from its associated environment and only 25
from the other ones, while for validation we keep 50 from the associated environment and 25 from the others. The test set
is made up of all the remaining samples.

