IEEE Conference Template
Abstract—Artificial Intelligence (AI) has significantly advanced medical diagnostics, particularly in medical imaging, by enhancing accuracy and efficiency. However, traditional deep learning models often function as "black boxes," lacking transparency in their decision-making processes, a critical concern in clinical settings where interpretability is essential. To address this, Neuro-Symbolic AI has emerged, integrating the pattern recognition capabilities of neural networks with the logical interpretability of symbolic reasoning, aiming to create models that are both accurate and explainable. In parallel, contrastive learning, a self-supervised technique, has gained traction for learning meaningful representations from unlabeled data, enhancing a model's ability to distinguish between different classes, a valuable asset in medical image analysis. This paper explores the integration of Neuro-Symbolic AI and contrastive learning in medical image diagnosis. We propose a multimodal framework that combines visual data from chest X-rays with textual data from radiology reports. Advanced models, such as ConvNeXt V2 for image analysis and Flan-T5 for text processing, are employed to extract high-dimensional feature embeddings. These embeddings are projected into a shared latent space, facilitating seamless multimodal fusion. Contrastive learning techniques are applied to enhance the model's ability to discern between different classes by maximizing the similarity between related image-text pairs and minimizing it for unrelated pairs. Additionally, a symbolic reasoning module is incorporated to perform logical inference on the multimodal embeddings, mapping continuous representations to discrete symbolic concepts. This integration enhances the system's transparency, providing clinicians with clear and justifiable insights into the model's decision-making process.

Index Terms—Artificial Intelligence (AI), Medical Imaging, Neuro-Symbolic AI, Contrastive Learning, Multimodal Framework, Symbolic Reasoning, Self-Supervised Learning

I. INTRODUCTION

Artificial Intelligence (AI) has revolutionized several industries, with healthcare and medical diagnostics standing out due to their potential to significantly improve patient outcomes. The field of medical imaging, in particular, has benefited greatly from AI-driven innovations. However, traditional deep learning methods often operate as "black boxes," providing little transparency or explanation for their decisions. This opacity poses significant challenges in domains like healthcare, where interpretability, trustworthiness, and explainability are paramount for clinical decision-making. In response to these challenges, Neuro-Symbolic AI, an emerging paradigm combining neural networks with symbolic reasoning, has gained traction. By fusing the pattern recognition strengths of neural networks with the logical reasoning capabilities of symbolic AI, Neuro-Symbolic models offer both efficiency and interpretability, addressing limitations found in purely neural or purely symbolic approaches. Recent works [1]-[3] highlight the potential of Neuro-Symbolic AI to enhance interpretability in AI systems, especially in complex tasks such as medical diagnosis. Neuro-Symbolic AI has shown promise in various fields, including medical image analysis, by balancing accuracy with explainability, thus making AI-assisted diagnostic systems more trustworthy for clinicians. Additionally, contrastive learning, a self-supervised learning technique, has gained attention for its effectiveness in learning meaningful representations from unlabeled data. By contrasting positive and negative sample pairs, contrastive learning enhances a model's ability to distinguish between different classes, which is particularly beneficial in medical image analysis; studies have demonstrated its applicability in medical imaging tasks [4], [5]. This paper explores the integration of Neuro-Symbolic AI and contrastive learning in medical image diagnosis, focusing on methodologies, applications, challenges, and future directions.

II. RELATED WORK

A. Traditional Medical Image Diagnosis

Several studies have explored the role of AI in medical imaging, particularly in enhancing diagnostic accuracy.
For example, [6] discuss the application of AI in Computer-Aided Diagnosis (CAD), emphasizing the improvements AI brings in diagnostic accuracy, efficiency, and speed. Their paper highlights the use of deep learning techniques such as convolutional neural networks (CNNs) for tasks like image segmentation, detection, and classification. CAD systems can help detect lesions and perform quantitative analyses, showing considerable improvements in areas like lung nodule detection and liver tumor screening. CNNs have been instrumental in medical image analysis: one review focuses on CNN architectures such as AlexNet, ResNet, and VGGNet and their application in classification, detection, segmentation, and image enhancement. These models reduce the need for manual feature engineering, demonstrating significant improvements in areas such as image segmentation and disease detection [7]. Despite the success of AI-based systems, limitations persist. One of the main challenges is the lack of high-quality, annotated medical datasets, which limits the reliability of deep learning models. Several papers point out that the lack of standardized evaluation protocols also complicates the deployment of AI in clinical settings, and the complexity of medical images, especially those with minimal differences between pathological and healthy areas, further hinders AI performance in specific cases [8]. Some papers propose Sino-CT-Fusion-Net, a framework designed for the detection and classification of intracranial hemorrhages. This system integrates sinogram data (raw X-ray data) with CT images to enhance diagnostic accuracy; the fusion of data types provides significant improvements in detection rates, especially in identifying smaller hemorrhages, illustrating the potential of multi-modal data fusion in improving diagnostic outcomes [9]. Others have developed a CNN-Transformer hybrid model for bladder cancer detection using cystoscopy images. The model achieved high accuracy, demonstrating the effectiveness of integrating transformers for capturing global context, though the study noted limitations such as the small dataset size and the lack of clinical validation [10]. The COVID-19 pandemic has accelerated the use of AI in healthcare: authors discussed the COV19D Competition, which aimed to improve COVID-19 detection using 3-D chest CT scans. The top-performing models achieved impressive results, highlighting the importance of domain adaptation techniques to generalize across different hospitals and medical centers [11]. Other papers focus on pneumonia detection from chest X-ray images using CNNs; one study compared several CNN architectures, with a custom-tuned CNN achieving the highest accuracy (83.16%), underscoring the value of pre-processing and model tuning for performance [12]. Another notable study explores brain tumor classification using MRI scans, evaluating multiple models, including CNNs and logistic regression, with the CNN achieving the highest accuracy; however, the authors note the need for external validation and comparison with human radiologists [13].

B. Applications of Neuro-Symbolic AI and Contrastive Learning in Medical Image Diagnosis

Medical image diagnosis relies heavily on the interpretability of the models used. Recent studies have shown that Neuro-Symbolic approaches can address the shortcomings of traditional AI by providing explainable, rule-based outputs, and they emphasize the importance of balancing the computational demands of neural networks with the interpretability offered by symbolic systems. Work on models like the Neuro-Symbolic Concept Learner (NSCL) and Neuro-Symbolic Dynamic Reasoning (NS-DR) demonstrates how these hybrid approaches outperform purely neural methods in terms of transparency [14]. Further studies support this by comparing post-hoc explanation methods such as SHAP and LIME with neural-symbolic rule extraction. While SHAP and LIME provide local explanations, they often suffer from inconsistency and computational inefficiency; in contrast, neural-symbolic approaches provide clearer, more actionable insights, which are essential in medical image diagnosis [15]. Contrastive learning further enhances interpretability by learning representations that distinguish between different classes. Authors introduce a framework for contrastive learning of visual representations, demonstrating its effectiveness in various tasks [16], and in medical imaging, contrastive learning applied to chest X-rays has improved diagnostic performance by leveraging unlabeled data [17].

C. Hybrid Models for Medical Image Segmentation and Diagnosis

Hybrid Neuro-Symbolic models, which combine deep learning with symbolic reasoning, have shown great promise in medical imaging tasks like segmentation, tumor detection, and disease classification. Studies highlight the potential of hybrid models that use neural embeddings along with symbolic reasoning frameworks, which have proven effective in reasoning over knowledge graphs [18]. Studies also discuss the implementation of convolutional neural networks (CNNs) for image segmentation and object detection, integrated with symbolic reasoning; the resulting AI-based Computer-Aided Diagnosis (CAD) systems assist clinicians in identifying lesions and conducting quantitative analyses, reducing diagnosis times and increasing accuracy. In the realm of medical image segmentation, U-Net has been a foundational architecture, and recent advancements have integrated symbolic AI into such architectures to enhance performance. For instance, rule-based symbolic segmentation has been applied to analyze anatomical structures, improving the precision of boundary delineation in complex medical images. Furthermore, the integration of Neuro-Symbolic AI in brain tumor diagnosis has shown potential: by combining CNNs with symbolic reasoning, models can better handle the complexity and variability inherent in brain imaging, leading to more accurate and interpretable diagnostic outcomes.
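To make the contrastive objective discussed in this section concrete, the following is a minimal NumPy sketch of an InfoNCE-style loss: an anchor embedding is scored against one positive and several negatives, and cross-entropy treats the positive as the correct class. The function name and the toy embeddings are our own illustrative choices, not code from any of the cited works.

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """Minimal InfoNCE-style contrastive loss for one anchor.

    The anchor is pulled toward its positive pair and pushed away
    from each negative; all embeddings are L2-normalized so the
    dot product equals cosine similarity.
    """
    def normalize(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)

    a = normalize(anchor)
    p = normalize(positive)
    n = normalize(negatives)

    # First logit is the positive pair, the rest are negatives.
    logits = np.concatenate([[np.dot(a, p)], n @ a]) / temperature
    # Cross-entropy with the positive pair as the target class.
    return float(np.log(np.exp(logits).sum()) - logits[0])
```

The loss shrinks as the anchor aligns with its positive and grows as it drifts toward a negative, which is exactly the class-separating pressure the surveyed methods exploit.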
D. Enhancing Scalability and Efficiency

Contrastive learning contributes to scalability by enabling models to learn robust representations from unlabeled data, reducing the dependency on extensive labeled datasets. This approach is particularly beneficial in medical imaging, where labeled data is often scarce. Studies have demonstrated that contrastive learning frameworks can effectively utilize unlabeled data to improve diagnostic performance across various medical image datasets [19]. For the detection of Alzheimer's disease, researchers have developed a hybrid model that combines Graph Convolutional Neural Networks (GCNNs) and CNNs. This model utilizes an Alzheimer's dataset of MRI images across four categories: mildly demented, moderately demented, non-demented, and very mildly demented. To address sample imbalances, various data augmentation techniques were applied, and the model was rigorously evaluated against established pretrained models, such as VGG19, ResNet50, AlexNet, and DenseNet-121, with performance metrics encompassing accuracy, sensitivity, precision, and F1-score [20]. Research on liver cancer detection highlights a comprehensive preprocessing pipeline that includes essential steps such as normalization, noise reduction, contrast enhancement, and artifact removal. This thorough approach is critical for ensuring the quality and consistency of the input data, which directly influences the performance of the machine learning models. The study also compares multiple CNN architectures, including VGG16, ResNet50, and MobileNet, providing valuable insights into their relative strengths and weaknesses for liver cancer detection. By evaluating both CT and MRI scans, the research acknowledges the distinct advantages of each imaging modality, allowing for a more nuanced understanding of liver cancer, and it emphasizes the paramount importance of early detection in improving patient outcomes, linking the technical aspects of machine learning directly to practical clinical benefits [21]. Other systems have also utilized various architectures, such as AlexNet and GoogLeNet, alongside a custom 23-layer CNN model specifically for the detection and classification of brain tumors using MRI and CT scan images [22]. Another work presents a novel end-to-end methodology for the detection of malaria parasites, specifically designed to harness the strengths of high-cost microscope (HCM) images during training while ensuring robust performance on low-cost microscope (LCM) images during testing. This approach addresses a critical challenge in the field: the difficulty and expense associated with annotating LCM images, which often results in limited training data. By leveraging the clearer, high-resolution HCM images, the proposed method aims to improve the accuracy and reliability of malaria detection in more accessible and cost-effective imaging modalities. The core of this framework, named CodaMal (COntrastive Domain Adaptation for MALaria), employs a CSPDarkNet53 backbone for object detection, which is well suited for extracting meaningful features from complex imaging data. To bridge the gap between the differing domains of HCM and LCM images, the authors introduce a Domain Adaptive Contrastive (DAC) loss, specifically designed to minimize the domain discrepancy between the two image types and thereby enhance the model's ability to generalize across varying image quality and conditions. In addition to the DAC loss, the framework incorporates standard object detection losses, including classification, localization, and objectness, to ensure comprehensive training and accurate predictions. Furthermore, a non-linear projection layer maps features to a lower-dimensional latent space, facilitating improved representation learning. This multi-faceted approach positions CodaMal as a significant advancement in malaria parasite detection, paving the way for more accessible and efficient diagnostic solutions in resource-limited settings [23].

E. Challenges

While Neuro-Symbolic AI and contrastive learning offer promising advancements, several challenges remain. One major issue is the integration of neural and symbolic components: the symbolic components, although crucial for interpretability, often introduce bottlenecks in computational efficiency, particularly in real-time medical applications. Moreover, studies emphasize the scalability issues faced by Neuro-Symbolic AI when applied to large medical datasets, as seen in work on knowledge graph reasoning. There are also concerns about the generalization of Neuro-Symbolic models, as they may struggle with the inherent complexity of medical images, such as subtle anatomical variations [24]. Contrastive learning methods, while effective, require careful consideration in the selection of positive and negative pairs to ensure meaningful representation learning; in medical imaging, where inter-class similarities are high, defining these pairs can be challenging. Moreover, the reliance on large amounts of unlabeled data necessitates efficient data handling and processing capabilities.

III. METHODOLOGY

In response to the multifaceted challenges inherent in chest X-ray diagnosis, we propose a comprehensive multimodal Neuro-Symbolic AI framework. This approach synergistically combines visual and textual data to enhance interpretability, scalability, and diagnostic accuracy.

A. Data Acquisition and Preprocessing

Our methodology is grounded in the MIMIC-CXR dataset, a robust and extensive collection of chest X-ray images accompanied by detailed radiology reports. This dataset serves as a critical resource, providing both the
visual and textual information necessary for the effective training and evaluation of our model. To ensure the integrity and consistency of the data, a meticulous preprocessing pipeline is employed. For the image data, preprocessing steps include normalization to standardize pixel intensity values and resizing to ensure uniform input dimensions, thereby facilitating efficient processing by subsequent neural network architectures. The textual data, particularly the 'Findings' and 'Impression' sections of the radiology reports, undergoes thorough tokenization and cleaning. This involves parsing the text into manageable units and removing extraneous information, which is essential for accurate natural language understanding and subsequent embedding generation.

B. Multimodal Feature Extraction

The extraction of meaningful features from both visual and textual modalities is pivotal to our framework. Visual modality: we employ ConvNeXt V2, a cutting-edge convolutional neural network renowned for its efficiency and superior performance in image analysis. ConvNeXt V2 processes the preprocessed chest X-ray images to extract high-dimensional feature embeddings that encapsulate critical visual patterns indicative of various thoracic conditions. The model's architecture, characterized by its hierarchical design and optimized convolutional blocks, ensures effective capture of both local and global image features. Textual modality: the textual data is processed using Flan-T5, a transformer-based model adept at understanding and generating human language. Flan-T5 processes the 'Findings' and 'Impression' sections of the radiology reports to extract meaningful textual embeddings that encapsulate the semantic nuances of the medical narratives, providing context to the visual data; its encoder-decoder architecture facilitates the comprehension of complex medical terminologies and relationships within the text. To achieve a cohesive integration of visual and textual information, we project both ConvNeXt V2 and Flan-T5 embeddings into a shared latent space of identical dimensionality. This alignment ensures that corresponding visual and textual features reside proximally within the manifold, facilitating seamless multimodal fusion. Contrastive learning techniques are employed during this phase to enhance the model's ability to discern between different classes by maximizing the similarity between related image-text pairs and minimizing it for unrelated pairs. This approach leverages the complementary nature of visual and textual data, leading to more robust and informative representations.

C. Neuro-Symbolic Reasoning Integration

Building upon the aligned embeddings, we incorporate a symbolic reasoning module to enhance the interpretability and trustworthiness of our diagnostic system. This module leverages predefined medical ontologies and logical inference mechanisms to map continuous multimodal embeddings into discrete symbolic concepts, facilitating transparent and explainable clinical decision-making. Medical ontologies, such as SNOMED CT and the Unified Medical Language System (UMLS), provide structured and standardized vocabularies that define relationships between medical concepts. By integrating these ontologies, our system can interpret complex medical data within a clinically relevant framework. The symbolic reasoning module employs description logic, a formalism used in ontologies, to perform logical inference over the multimodal embeddings. This process enables the system to deduce new information and uncover hidden relationships between clinical findings, thereby enriching the diagnostic insights. For instance, if the system identifies imaging features indicative of pulmonary nodules and correlates them with textual reports mentioning a history of smoking, it can infer an increased risk of malignancy, providing a rationale for its conclusion. The transformation of continuous data into discrete symbolic representations is crucial for human-understandable explanations. Our approach involves the following steps: 1) Prototype learning: the system learns prototypical representations of medical concepts by clustering continuous embeddings derived from multimodal data; each cluster center represents a prototype corresponding to a specific medical condition or anatomical feature. 2) Similarity measurement: incoming patient data is mapped onto these prototypes by measuring the similarity between the patient's data embedding and each prototype, using metrics such as cosine similarity or Euclidean distance within the embedding space. 3) Symbol assignment: based on the highest similarity scores, the system assigns discrete symbolic labels to the patient's data, effectively translating complex continuous information into recognizable medical concepts. This methodology aligns with prototype theory, where concepts are represented by central exemplars, facilitating intuitive and explainable reasoning. By grounding abstract data in concrete medical terminology, clinicians can better understand and trust the system's diagnostic outputs. The integration of symbolic reasoning not only aids in accurate diagnosis but also enhances the transparency of the decision-making process. By providing clear mappings from data to diagnostic concepts and elucidating the logical pathways leading to conclusions, the system offers clinicians insight into its reasoning. This transparency is vital for validating the system's recommendations and for fostering trust in AI-assisted medical diagnostics.
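The three-step prototype mapping described above can be sketched as follows. The prototype names, vectors, and dimensionality are illustrative placeholders only, not learned values from our system; in practice the prototypes would be cluster centers in the shared embedding space.

```python
import numpy as np

# Hypothetical prototype table: cluster centers in the shared
# embedding space, each labeled with a medical concept. Names
# and vectors here are illustrative placeholders.
PROTOTYPES = {
    "pulmonary_nodule": np.array([0.9, 0.1, 0.0]),
    "pleural_effusion": np.array([0.0, 0.8, 0.6]),
    "no_finding": np.array([0.1, 0.1, 1.0]),
}

def learn_prototype(embeddings):
    """Step 1: a cluster center (here simply the mean of the
    cluster's embeddings) serves as the concept prototype."""
    return np.asarray(embeddings).mean(axis=0)

def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def assign_symbol(embedding):
    """Steps 2-3: score the patient embedding against every
    prototype and return the best-matching symbolic label."""
    scores = {name: cosine_similarity(embedding, proto)
              for name, proto in PROTOTYPES.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

The returned label and similarity score are exactly the discrete, human-readable output that the symbolic reasoning layer consumes, which is what makes the mapping auditable by a clinician.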