0% found this document useful (0 votes)
11 views6 pages

Medical Rag Report

This document presents a case study on the implementation of Retrieval-Augmented Generation (RAG) models to enhance medical query processing and healthcare education in low- and middle-income countries. It details the development of the SMARThealth GPT model, which utilizes a hybrid retrieval mechanism and adaptive chunking to improve the accuracy and relevance of responses for healthcare workers. The study emphasizes the potential of large language models (LLMs) in providing accessible medical information and improving healthcare delivery while addressing challenges such as factual accuracy and data privacy.

Uploaded by

Anmol Singh
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
11 views6 pages

Medical Rag Report

This document presents a case study on the implementation of Retrieval-Augmented Generation (RAG) models to enhance medical query processing and healthcare education in low- and middle-income countries. It details the development of the SMARThealth GPT model, which utilizes a hybrid retrieval mechanism and adaptive chunking to improve the accuracy and relevance of responses for healthcare workers. The study emphasizes the potential of large language models (LLMs) in providing accessible medical information and improving healthcare delivery while addressing challenges such as factual accuracy and data privacy.

Uploaded by

Anmol Singh
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 6

Optimized Retrieval-Augmented Generation

Framework for Enhanced Medical Query


Processing

Aarthi M Riddhi Gindodiya Anmol Singh


Department of Computer Science Department of Computer Science Department of Computer Science
(MTECH CSE) (MTECH CSE) (MTECH CSE)
Vellore Institute of Technology Vellore Institute of Technology Vellore Institute of Technology
Vellore,Tamil Nadu -632014 Vellore,Tamil Nadu -632014 Vellore,Tamil Nadu -632014
aarthimanoharan2003@gmail.com riddhigindodiya06@gmail.com mranmolsingh101@gmail.com

Abstract—Large language models (LLMs) have been a effective techniques for adapting pre-trained LLMs to
game-changer in a number of fields in recent years, including particular applications are retrieval-augmented generation
healthcare and medical education. This work offers a case study (RAG) and fine-tuning. In a "close-book" scenario, fine-
on the real-world implementation of retrieval-augmented tuning adjusts the model's weight according to a task-specific
models for improving healthcare education in low- and middle-
income nations that are based on generation (RAG). The need
dataset, depending only on extra input-output pairs of training
for easily available and locally relevant medical information to data for learning. On the other hand, RAG does not require
support community health workers in providing high-quality labeled training data and functions in a "open-book"
maternity care led to the development of the SMARThealth environment.
GPT model, which is the subject of this research. We outline the
whole RAG pipeline development process, which includes
A. What is RAG
parameter selection and optimization, knowledge embedding The implementation of goal-oriented large language
retrieval, response production, and the establishment of a models (LLMs) in conjunction with various LLM-oriented
knowledge base of Indian pregnancy-related rules. This case frameworks is expanding the range of AI applications and
study demonstrates how LLMs may improve guideline-based improving LLMs' ability to perform complicated tasks.
health education and develop the ability of frontline healthcare Modern LLMs are quite capable, ranging from chatbots that
workers. It also provides ideas for comparable applications in
can generate programming code to responding to inquiries on
environments with restricted resources. It is a resource for
machine learning researchers, teachers, medical experts, and legal papers with latent provenance. But this enhanced
legislators who want to use LLMs to significantly enhance potential also brings with it new complications. Despite their
education. strength at traditional text-based activities, emerging LLMs
require outside assistance to keep up with changing
Keywords—Machine Learning, Large language Models, Retrieval knowledge [2].
Augmented Generation, Natural Language processing, Medical
Assisstent.

I .INTRODUCTION
Large Language Models (LLMs) are the solution for majority of
the text-related tasks or LLMs, are the standard approach.
Their factual accuracy1, a drawback of their generative
nature, is still a serious worry, nevertheless. LLMs are made
to produce believable text based on learnt patterns rather than
to acquire exact facts [1]. Contextualizing LLMs by the use
of pertinent input tokens to affect their output is a common
method of improving their factuality. This includes more
complex Retrieval Augmented Generation (RAG) methods
as well as more straightforward prompting strategies like
"Let's think step by step." Context retrieval system
integration may, in fact, greatly improve LLM performance
and dependability[1].
Recently, With the growing availability of pre-trained large
language models (LLMs), including Open AI's GPT, Lama, Fig. 1. RAG Model
and PaLM, the field of natural language processing (NLP) has
recently witnessed amazing advancements. These models Non-parametric retrieval-based approaches, like as retrieval-
have been used in a variety of sectors and are becoming more augmented generation (RAG), are becoming essential to the
and more working in healthcare and medical education. Two most recent LLM applications in order to overcome this
difficulty, particularly for domain-specific tasks.

XXX-X-XXXX-XXXX-X/XX/$XX.00 ©20XX IEEE


The development of AI-stack applications emphasizes how Computing framework: Computing frameworks such as
important it is to improve RAG techniques in order to keep Apache Hadoop and Spark have made it easier to manage
LLMs' knowledge bases up to date. When using semantic large-scale knowledge base processing and analysis. These
similarity search to find the most pertinent passages, or top- frameworks enable the parallel processing of data over
K vectors, retrieval-based applications require optimization. several nodes, enabling the efficient and scalable
There are dependencies on time and token constraints when computation of complex tasks such as indexing, querying,
querying multi-document vectors and adding pertinent and analysis.
context to LLMs. The "bi-encoder" retrieval models make NLP and ML Techniques: To glean insights from vast
use of state-of-the-art approximation nearest-neighbor volumes of text data, deep learning architectures like
techniques[4]. transformers, in addition to other cutting-edge machine
learning and NLP models, are increasingly being employed.
B. Related work
Models that excel at tasks like text classification,
Numerous studies have been conducted in an effort to summarization, and question answering, such as B-E-R-T
address the problem of LLM factuality. Using LLMs' innate (Bidirectional-Encoder-Representations from Transformer)
In-Context Learning (ICL) capabilities was the main focus of and GPT (Generative Pre-trained Transformer), can manage
early attempts to enhance it, enabling individuals to adjust to large knowledge bases. Knowledge graphs are structured
new duties without particular training and with few examples. representations of knowledge that hold entities,
This opened the door for the creation of complex prompting relationships, and characteristics using a graph-based
strategies intended to elicit more precise and thoughtful structure. By organizing data into connected nodes and
answers. LLMs do better on challenging problems when edges, knowledge graphs make it easier to efficiently
guided through intermediate reasoning processes by Chain of navigate and retrieve relevant information from large
Thought (CoT) prompts. Self-Consistency (SC), on the other sources. When knowledge graphs are filled and improved
hand, takes use of the stochastic character of LLMs by with strategies like these, they become more beneficial for
generating and contrasting several results for the same input knowledge retrieval tasks.
before generating a single, cohesive response [3]. The Self- Mixed techniques: Several contemporary techniques use
Consistency Chain of Thought (SC-CoT) combines both. elements of the aforementioned strategies in order to
Researchers used to prompt strategies to integrate external optimize the advantages of different approaches. For
knowledge after realizing the limitations of relying just on example, hybrid systems can mix machine learning models
internal knowledge, which eventually gave rise to retrieval with traditional indexing methods or leverage distributed
RAG stands for Augmented Generation. By biasing replies computing frameworks to increase the scalability of
with real data, RAG systems greatly improve LLM knowledge retrieval and analysis processes[3].
performance by retrieving and integrating pertinent
information from external knowledge sets. Medprompt a
context retrieval system created for medical MCQA that B. LLMs in Medical Domains
produces state-of-the-art answers with GPT-4, proposes a Large Language Models (LLMs) have emerged as
combination of few-shot, CoT, and SC, which are frequently powerful tools in the medical domain, transforming how
utilized in the healthcare area to increase factuality. Although healthcare professionals, researchers, and patients access and
Medprompt has been modified for open-source models, a interpret complex medical information. These models,
comprehensive analysis of the best way to set up its trained on massive datasets, including scientific literature,
constituent parts (such as DBs and embeddings) is still a work clinical notes, and public health data, can understand,
in progress[4]. generate, and summarize medical content with remarkable
accuracy. In clinical decision support, LLMs assist
II .LITERATURE SURVEY physicians by providing evidence-based answers to
A. Existing Approaches for Large-Scale Knowledge diagnostic queries, suggesting treatment options, and
Bases analyzing patient data for potential risks. They are also
invaluable in biomedical research, helping researchers
Many techniques and tactics are now used to manage
navigate vast amounts of literature by generating insights and
large-scale knowledge bases, each of which is intended to
summaries from multiple sources, including databases like
address specific challenges associated with processing
PubMed[2].
massive volumes of textual material. These techniques may
LLMs contribute to patient engagement by simplifying
be broadly categorized into many key strategies.
medical jargon into understandable language, empowering
patients to make informed decisions about their health.
Searching and indexing algorithms: Conventional
Despite their immense potential, LLMs face challenges, such
information retrieval methods rely on indexing strategies
as ensuring data privacy, managing biases in training data,
such as inverted indexes and search algorithms such as TF-
and maintaining up-to-date medical knowledge [3].
IDF (Term Frequency-Inverse-Document-Frequency) to
Furthermore, the need for regulatory compliance and
efficiently locate relevant documents inside large
validation of AI-generated medical advice underscores the
knowledge repositories. Many information retrieval systems
importance of human oversight. As LLMs continue to evolve,
are built on these processes, which allow for the prompt and
their integration into the medical domain holds great promise
precise retrieval of information in response to user
for advancing healthcare delivery, research efficiency, and
queries[3].
patient outcomes.
C. RAG methods
Retrieval-Augmented Generation (RAG) is a hybrid
approach in natural language processing that combines
information retrieval with language generation to produce
more accurate and contextually relevant responses. Unlike
traditional language models that rely solely on pre-trained
knowledge, RAG dynamically retrieves external information
from large datasets or document repositories to augment the
generation process. This makes it particularly suitable for
tasks requiring factual accuracy and domain-specific
knowledge, such as biomedical literature search, customer
support, and legal document analysis[1].
1. Stuff Method
The stuff method directly concatenates all the retrieved
chunks of information and feeds them as context to the LLM.
The LLM processes the entire input at once to generate the
final response.
2. Refine Method
The refine method provides the LLM with one chunk of
information at a time. The initial response is generated from
the first chunk, and subsequent chunks are used to iteratively
refine or improve the response.
3. Map-Reduce Method
In the map-reduce method, the LLM processes each chunk Fig.2. Proposed Model
individually to generate partial answers (map phase). These
partial answers are then combined and summarized to
produce the final response (reduce phase).
4. Map-Retrieve Method A. Adaptive Chunking for context Retention
The map-retrieve method first generates partial answers from In natural language processing (NLP), adaptive chunking
each chunk (map phase). Then, instead of merely is a dynamic technique that maximizes context preservation
summarizing the results, it retrieves additional information while breaking up lengthy text sequences or massive datasets
based on these partial answers to refine the final output. into manageable, relevant pieces. Because traditional fixed-
size chunking techniques randomly break off text at
predetermined bounds, they frequently fail to preserve a
III. PROPOSED METHODOLOGY document's semantic coherence and may divide context-
The proposed model shown in the fig.2 outlines an sensitive material like sentences, paragraphs, or logical units.
advanced information retrieval and answer generation system On the other hand, adaptive chunking cleverly modifies the
tailored for the PubMed dataset. It begins with a user query, size and boundaries of every chunk according on semantic
which is encoded using a hybrid approach that combines linkages, language signals, or content structure. Applications
sparse embeddings, such as TF-IDF for exact term matching, containing lengthy texts, such research papers, legal
and dense embeddings from neural models like BERT for contracts, or biological literature (like the PubMed dataset),
semantic understanding. Simultaneously, the PubMed dataset benefit greatly from adaptive chunking. It improves language
undergoes adaptive chunking, where large documents are model performance in tasks including document retrieval,
segmented into coherent sections based on criteria such as question answering, and text summarizing by optimizing
token density, entropy, and medical entity recognition. This chunk size and placement.
chunking ensures that meaningful content is retained for In addition, adaptive chunking methods frequently use rule-
efficient processing[2]. based algorithms or machine learning models to identify the
The query and document embeddings are aligned, and a best chunk boundaries. It is possible to train these models to
hybrid retrieval mechanism is applied, combining dense identify textual patterns like paragraph transitions or
search for semantic relevance and sparse search for precise semantic similarity between parts. Some sophisticated
matches. Results are ranked using a combination of cosine methods constantly enhance chunking choices depending on
similarity and BM25 weighting, and the top K relevant downstream job performance by utilizing reinforcement
chunks are selected. These chunks are then passed to a large learning. By properly dividing text, adaptive chunking lowers
language model (LLM), which generates comprehensive memory and computational overhead in transformer-based
answers based on the retrieved information. This model models (like BERT or GPT), enabling models to handle data
effectively balances traditional keyword-based retrieval with more effectively within their input size restrictions. In the
semantic understanding, optimized context filtering, and end, adaptive chunking helps provide more precise and
advanced language generation, making it highly suitable for contextually aware NLP results, particularly for jobs that call
complex biomedical literature searches and information for in-depth understanding of large amounts of textual
extraction. material[3].
models, specifically embedding-based techniques (such as
Mathematical Formulation BERT or phrase transformers). Even in situations when there
Let a document be represented as: is no direct term overlap between the query and the content,
these vectors effectively enable retrieval by capturing
semantic meanings. Although dense approaches are very
good at semantic search, they can be computationally costly
and occasionally fail to find exact matches that sparse
approaches would find. These two perspectives are combined
a) Token Density calculation in the hybrid method. Hybrid retrieval systems combine
sparse and dense representations to provide robust semantic
comprehension and accurate keyword matching. This is
frequently accomplished by employing sparse and dense
scoring methods to evaluate documents independently, then
combining the findings using weighted aggregation or
learning ranking algorithms.
b) TF-IDF Calculation
Mathematical Formulation

a) Dense Embedding(Semantic Encoding)

b) Sparse Embedding(lexical encoding)


Entropy of chunk is then:

c) Hybrid Embedding fusion

c) Medical Entity Frequency

C. Low Memory Optimization with Quantization


In order to optimize machine learning models for
deployment on resource-constrained contexts, such as mobile
d) Adaptive Chunking Decision
devices, edge computing nodes, or low-power embedded
systems, quantization is a potent method that lowers memory
use and computational expenses. Quantization reduces the
memory footprint significantly while frequently preserving a
respectable level of model accuracy by encoding model
B. Hybrid Dense-Sparse Retrieval Mechanism parameters (weights and activations) using lower precision
data types rather than the conventional 32-bit floating-point
A hybrid dense-sparse retrieval mechanism is an advanced
format (FP32). High-precision data are converted into lower-
information retrieval technique that enhances search
precision representations using quantization, which usually
efficiency and accuracy by combining the advantages of
uses 8-bit integers (INT8) rather than 32-bit floats[3].In order
dense and sparse representations. By utilizing their
for the model to function with smaller data types, a
complementing qualities, it closes the gap between
continuous range of values must be mapped to a discrete set.
contemporary semantic search approaches (dense retrieval)
and conventional keyword-based search methods (sparse LLM generates the answer using
retrieval). Exact keyword matching is necessary for Sparse
Retrieval techniques, such those found in conventional search
engines that employ BM25 or Term Frequency-Inverse
Document Frequency (TF-IDF). When the query words
exactly match the content of the page, they function
effectively. They frequently have trouble, though, when D. Dataset
language varies or when searches call for semantic The National Library of Medicine (NLM) of the National
comprehension as opposed to precise matching[3]. Institutes of Health (NIH) has compiled the extensive and
Conversely, Dense Retrieval encodes queries and documents reputable PubMed dataset of biomedical literature. For
into dense vector representations using machine learning academics, researchers, and medical professionals working in
the biological sciences and healthcare domains, it is an V. CONCLUSION
essential resource. PubMed frequently offers links to In this study, we introduced a unique Retrieval-
publisher websites or open-access repositories such as Augmented Generation (RAG) framework that uses three
PubMed Central (PMC), but it does not contain the full-text important innovations—adaptive chunking, hybrid retrieval,
articles. In order to facilitate accurate literature categorization and quantized inference—to improve response accuracy and
and search, every item in the collection includes structured computing efficiency. Our adaptive chunking technique
metadata, such as titles, abstracts, authorship, publication maximizes retrieval relevance by dynamically segmenting
dates, and Medical Subject Headings (MeSH) keywords. The text according to semantic value. The hybrid retrieval process
dataset is a foundation for applications in text mining, natural includes both dense and sparse embeddings, boosting
language processing (NLP), and biological research because information retrieval precision. Furthermore, our quantized
of its comprehensive metadata and ease of access. The dataset inference method preserves model performance while
is widely used by researchers to develop machine learning drastically lowering computing cost. Our method performs
models for applications including large-scale systematic better than current RAG implementations in terms of retrieval
reviews, literature-based discovery, and biological entity efficiency, response quality, and inference time, according to
recognition. The PubMed dataset is easily accessible through empirical tests. Our approach is well-suited for real-world
its downloadable data subsets and API (E-utilities), which applications that demand scalable, effective, and precise
enables effective integration into computational pipelines for language comprehension because it makes use of these
cutting-edge research and development. improvements to provide better retrieval precision, lower
TABLE I. latency, and lower resource consumption. Subsequent
research endeavors will concentrate on expanding the model
Method Feature Existing Enhanced RAG to multi-modal retrieval, refining quantization methods, and
RAG
assessing its applicability in other fields.

Adaptive Chunking Fixed Token REFERENCES


length(eg. Density,entropy,medic
Tokens) al terms [1] Ke, Y., Jin, L., Elangovan, K., Abdullah, H. R., Liu, N., Sia, A. T. H.,
... & Ting, D. S. W. (2024). Development and Testing of Retrieval
Augmented Generation in Large Language Models--A Case Study
Report. arXiv preprint arXiv:2402.01733.
Hybrid Embedding Dense or Dense+Sparse fusion
Sparse [2] Kresevic, S., Giuffrè, M., Ajcevic, M., Accardo, A., Crocè, L. S., &
Shung, D. L. (2024). Optimization of hepatological clinical guidelines
interpretation by large language models: a retrieval augmented
Hybrid Retrieval Semantic( Hybrid-Cosine+BM25 generation-based framework. NPJ Digital Medicine, 7(1), 102.
Cosine weighting [3] Neelakanteswara, A., Chaudhari, S., & Zamani, H. (2024, March).
similarity) RAGs to Style: Personalizing LLMs with Style Embeddings.
In Proceedings of the 1st Workshop on Personalization of Generative
Redundancy Context Top-K Token limit+ AI Systems (PERSONALIZE 2024) (pp. 119-123).
Filtering Selection redundancy filtering [4] Meduri, K., Nadella, G. S., Gonaygunta, H., Maturi, M. H., & Fatima,
F. (2024). Efficient RAG Framework for Large-Scale Knowledge
Bases.
Prompting LLM Prompt Optimized context
Integration based prompting [5] Long, C., Liu, Y., Ouyang, C., & Yu, Y. (2024). Bailicai: A Domain-
Optimized Retrieval-Augmented Generation Framework for Medical
generation
Applications. arXiv preprint arXiv:2407.21055.
Fig. 3. Comparison Table [6] Şakar, T., & Emekci, H. (2025). Maximizing RAG efficiency: A
comparative analysis of RAG methods. Natural Language
Processing, 31(1), 1-25.
IV. RESULTS AND EXPERIMENTS [7] Soman, K., Rose, P. W., Morris, J. H., Akbas, R. E., Smith, B.,
The performance comparison table highlights that your Peetoom, B., ... & Baranzini, S. E. (2024). Biomedical knowledge
Hybrid RAG Model outperforms existing state-of-the-art graph-optimized prompt generation for large language
models. Bioinformatics, 40(9), btae560.
RAG models on the PubMed dataset across key evaluation
[8] Bayarri-Planas, J., Gururajan, A. K., & Garcia-Gasulla, D. (2024).
metrics. Your model achieves the highest Recall@5 (0.78) Boosting Healthcare LLMs Through Retrieved Context. arXiv preprint
and MRR (0.71), indicating superior document retrieval arXiv:2409.15127.
efficiency. Additionally, it surpasses other models in text [9] Murali, S., Sowmya, S., & Supreetha, R. (2024, August). ReMAG-KR:
generation quality, with improved BLEU (0.63) and Retrieval and Medically Assisted Generation with Knowledge
Reduction for Medical Question Answering. In Proceedings of the
ROUGE-L (0.72) scores, demonstrating its ability to produce 62nd Annual Meeting of the Association for Computational Linguistics
more fluent and relevant responses. The BERTScore (0.85) (Volume 4: Student Research Workshop) (pp. 62-67).
further confirms that your model's outputs closely align with [10] Al Ghadban, Y., Lu, H., Adavi, U., Sharma, A., Gara, S., Das, N., ... &
ground truth answers, outperforming OpenAI RAG and Hirst, J. E. (2023). Transforming healthcare education: Harnessing
Facebook DPR + FiD. The combination of BM25 and Dense large language models for frontline health worker capacity building
using retrieval-augmented generation. medRxiv, 2023-12.
Embeddings in your hybrid retrieval approach proves more
[11] Al Ghadban, Y., Lu, H., Adavi, U., Sharma, A., Gara, S., Das, N., ... &
effective than sparse or dense-only methods, leading to Hirst, J. E. (2023). Transforming healthcare education: Harnessing
enhanced retrieval and generation performance. large language models for frontline health worker capacity building
using retrieval-augmented generation. medRxiv, 2023-12.
[12] Zhao, S., Yang, Y., Wang, Z., He, Z., Qiu, L. K., & Qiu, L. (2024).
Retrieval augmented generation (rag) and beyond: A comprehensive
survey on how to make your llms use external data more wisely. arXiv [17] Yang, R. (2024). CaseGPT: a case reasoning framework based on
preprint arXiv:2409.14924. language models and retrieval-augmented generation. arXiv preprint
[13] Fleischer, D., Berchansky, M., Wasserblat, M., & Izsak, P. (2024). Rag arXiv:2407.07913.
foundry: A framework for enhancing llms for retrieval augmented [18] Das, S., Ge, Y., Guo, Y., Rajwal, S., Hairston, J., Powell, J., ... &
generation. arXiv preprint arXiv:2408.02545. Sarker, A. (2024). Two-layer retrieval augmented generation
[14] Adejumo, P., Thangaraj, P. M., Vasisht Shankar, S., Dhingra, L. S., framework for low-resource medical question-answering: proof of
Aminorroaya, A., & Khera, R. (2024). Retrieval-Augmented concept using Reddit data. arXiv preprint arXiv:2405.19519.
Generation for Extracting CHA2DS2VASc Features from [19] Hu, Y., & Lu, Y. (2024). Rag and rau: A survey on retrieval-augmented
Unstructured Clinical Notes in Patients with Atrial language model in natural language processing. arXiv preprint
Fibrillation. medRxiv, 2024-09. arXiv:2404.19543.
[15] Kim, S. (2025). MedBioLM: Optimizing Medical and Biological QA
with Fine-Tuned Large Language Models and Retrieval-Augmented
Generation. arXiv preprint arXiv:2502.03004.
[16] Leng, Q., Portes, J., Havens, S., Zaharia, M., & Carbin, M. (2024).
Long context rag performance of large language models. arXiv
preprint arXiv:2411.03538.

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy