Python Coding Exercise

Candidates must build a document management and Q&A application using a Retrieval-Augmented Generation (RAG) system, focusing on backend services in Python. The application includes APIs for document ingestion, Q&A, and document selection, and requires efficient embedding generation and storage. Evaluation criteria emphasize code quality, performance, scalability, and comprehensive documentation, alongside requirements for deployment and CI/CD integration.

Uploaded by

pdiksha0214

Coding Exercise: Document Management and RAG-based Q&A Application

Candidates are required to build an application that involves backend services and Q&A features
powered by a Retrieval-Augmented Generation (RAG) system. The application aims to manage users,
documents, and an ingestion process that generates embeddings for document retrieval in a Q&A
setting.

You can use mocking services, JSON, TF-IDF, BM25, scikit-learn, or any other retrieval algorithm
to complete the assignment.
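
As a rough illustration of the BM25 option, the sketch below ranks a handful of documents against a query using only the standard library. The names `tokenize` and `BM25Index`, the sample documents, and the parameter defaults `k1=1.5` and `b=0.75` are assumptions made for this example, not part of the assignment.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25Index:
    """Hypothetical in-memory BM25 index over a {doc_id: text} mapping."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.doc_ids = list(docs)
        self.term_freqs = [Counter(tokenize(docs[d])) for d in self.doc_ids]
        self.doc_lens = [sum(tf.values()) for tf in self.term_freqs]
        self.avg_len = sum(self.doc_lens) / max(len(self.doc_lens), 1)
        # Document frequency: in how many documents each term appears.
        self.df = Counter()
        for tf in self.term_freqs:
            self.df.update(tf.keys())

    def idf(self, term):
        n = len(self.doc_ids)
        return math.log(1 + (n - self.df[term] + 0.5) / (self.df[term] + 0.5))

    def search(self, query, top_k=3):
        """Return up to top_k (doc_id, score) pairs with a positive score."""
        terms = tokenize(query)
        scores = []
        for doc_id, tf, length in zip(self.doc_ids, self.term_freqs, self.doc_lens):
            score = 0.0
            for term in terms:
                f = tf.get(term, 0)
                if f == 0:
                    continue
                norm = f + self.k1 * (1 - self.b + self.b * length / self.avg_len)
                score += self.idf(term) * f * (self.k1 + 1) / norm
            if score > 0:
                scores.append((doc_id, score))
        return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_k]


docs = {
    "a": "Postgres stores embeddings for retrieval",
    "b": "Docker images make services portable",
    "c": "Asynchronous APIs in Python",
}
index = BM25Index(docs)
results = index.search("postgres embeddings")
```

Here only document `a` shares terms with the query, so it is the only hit. A library such as scikit-learn could replace this hand-rolled scoring.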

Application Components
1. Python Backend (Document Ingestion and RAG-driven Q&A)
o Purpose: Develop a backend application in Python to handle document
ingestion, embedding generation, and retrieval-based Q&A (RAG).
o Key APIs:
▪ Document Ingestion API: Accepts document data, generates
embeddings using a Large Language Model (LLM) library, and stores
them for future retrieval.
▪ Q&A API: Accepts user questions, retrieves relevant document
embeddings, and generates answers based on the retrieved content
using RAG.
▪ Document Selection API: Enables users to specify which documents to
consider in the RAG-based Q&A process.
o Tools/Libraries:
▪ Any one of the following for the LLM: Ollama (Llama 3.1 8B model),
LangChain, LlamaIndex, the OpenAI API, or Hugging Face Transformers.
▪ Database for storing embeddings (Postgres preferred).
▪ Asynchronous programming for efficient handling of API requests.
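
One possible shape for the three APIs is sketched below as plain async methods on an in-memory service. Everything here is an assumption for illustration: the `RagService` class, the toy bag-of-words `embed` function standing in for a real embedding model, and the dictionary standing in for Postgres. A real submission would expose these methods through HTTP routes and pass the retrieved context to an LLM. `asyncio.to_thread` requires Python 3.9 or later.

```python
import asyncio
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: L2-normalised bag-of-words."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def cosine(a, b):
    """Dot product of two normalised sparse vectors."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


class RagService:
    """Hypothetical in-memory service covering ingestion, selection, and Q&A."""

    def __init__(self):
        self._store = {}       # doc_id -> (text, embedding)
        self._selected = None  # None means "search every document"

    async def ingest(self, doc_id, text):
        # Run the potentially slow embedding step off the event loop.
        vector = await asyncio.to_thread(embed, text)
        self._store[doc_id] = (text, vector)

    async def select_documents(self, doc_ids):
        unknown = set(doc_ids) - set(self._store)
        if unknown:
            raise KeyError(f"unknown documents: {sorted(unknown)}")
        self._selected = set(doc_ids)

    async def ask(self, question, top_k=1):
        query = await asyncio.to_thread(embed, question)
        if self._selected is None:
            allowed = self._store
        else:
            allowed = {d: self._store[d] for d in self._selected}
        ranked = sorted(
            allowed.items(),
            key=lambda item: cosine(query, item[1][1]),
            reverse=True,
        )[:top_k]
        # A real system would send the question plus this context to an LLM.
        return {
            "question": question,
            "sources": [doc_id for doc_id, _ in ranked],
            "context": [text for _, (text, _) in ranked],
        }
```

Calling `ingest` for several documents under `asyncio.gather`, then `ask`, returns the best-matching document IDs. Calling `select_documents` first restricts the search to the chosen subset.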

Evaluation Criteria

Backend (Python - Document Ingestion and Q&A)


1. Code Quality:
o Asynchronous programming practices for API performance.
o Clear and concise code, with emphasis on readability and maintainability.
2. Data Processing and Storage:
o Efficient embedding generation and storage.
o Ability to handle large datasets (e.g., large volumes of documents and
embeddings).
3. Q&A API Performance:
o Effective retrieval and generation of answers using RAG.
o Latency considerations for prompt response times.
4. Inter-Service Communication:
o Design APIs that allow the backend to trigger ingestion and access Q&A
functionality seamlessly.
5. Problem Solving and Scalability:
o Demonstrate strategies for large-scale document ingestion, storage, and
efficient retrieval.
o Solution for scaling the RAG-based Q&A system to handle high query volumes.
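
One common strategy for large-scale ingestion is to process documents in batches while capping the number of batches in flight. The sketch below shows that pattern with `asyncio.Semaphore`. The function names, the `embed_batch` and `store_batch` callables, and the default sizes are hypothetical placeholders for whatever embedding client and database layer a candidate chooses.

```python
import asyncio


def batched(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


async def ingest_all(docs, embed_batch, store_batch,
                     batch_size=100, max_concurrency=4):
    """Embed and store (doc_id, text) pairs with bounded concurrency.

    `embed_batch(texts)` and `store_batch(rows)` are caller-supplied
    coroutines (assumed interfaces, not a real library API).
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(batch):
        async with semaphore:  # at most max_concurrency batches run at once
            vectors = await embed_batch([text for _, text in batch])
            rows = [(doc_id, vec) for (doc_id, _), vec in zip(batch, vectors)]
            await store_batch(rows)
            return len(batch)

    counts = await asyncio.gather(*(worker(b) for b in batched(docs, batch_size)))
    return sum(counts)
```

Capping concurrency keeps memory use and load on the embedding backend predictable. Batching reduces the number of round trips to both the model and the database.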

End-of-Development Showcase Requirements
At the end of the development, candidates should demonstrate the following:

1. Design Clarity:
o Show a clear design of classes, APIs, and databases, explaining the rationale
behind each design decision.
o Discuss non-functional aspects, such as API performance, database integrity,
and consistency.
2. Test Automation:
o Showcase functional and performance testing.
o Cover positive and negative workflows with good test coverage (70% or higher).
3. Documentation:
o Provide well-documented code and create comprehensive design
documentation.
4. 3rd Party Code Understanding:
o Explain the internals of any 3rd-party code used (e.g., libraries for LLM or
authentication).
5. Technical Knowledge:
o Demonstrate knowledge of HTTP/HTTPS, security, authentication, authorization,
debugging, monitoring, and logging.
6. Advanced Concepts:
o Usage of design patterns in code.
7. Test Data Generation:
o Demonstrate skills in generating large amounts of test data to simulate
real-world scenarios.
8. Deployment and CI/CD (Applicable to All Components):
o Dockerization: Dockerize each service, making it easily deployable and
portable.
o Deployment Scripts: Provide deployment scripts to run the application on
Docker or Kubernetes, compatible with any cloud provider (e.g., AWS, Azure,
GCP).
o CI/CD Pipeline: Implement a CI/CD pipeline for each component to automate
testing, building, and deployment.
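
For the test data generation item, a seeded generator is one simple way to produce large, reproducible document sets. The sketch below is only an illustration: the `generate_documents` function, the small `VOCAB` list, and the record layout are assumptions, and realistic data would need a richer vocabulary and structure.

```python
import random

# Hypothetical vocabulary; a real generator would draw on a larger corpus.
VOCAB = [
    "document", "embedding", "retrieval", "query", "answer",
    "index", "vector", "postgres", "latency", "pipeline",
]


def generate_documents(count, words_per_doc=50, seed=0):
    """Lazily yield `count` synthetic documents, reproducible for a given seed."""
    rng = random.Random(seed)
    for i in range(count):
        words = rng.choices(VOCAB, k=words_per_doc)
        yield {"id": f"doc-{i:06d}", "text": " ".join(words)}


sample = list(generate_documents(1000, seed=42))
```

Because the generator is lazy, it can feed millions of records into an ingestion test without holding them all in memory. Reusing the seed reproduces the same data set across test runs.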

You can use any one of the options below for the deployment part:

• Push the code to GitHub.
• Create Dockerfiles/Docker images.
• Create a README file with detailed instructions for the CI/CD workflow or
infrastructure workflow.