Aryan 2022PH11425
Abstract: Software engineering interviews, especially at the entry level, focus heavily on Data Structures
and Algorithms problems. Preparing for these interviews is a daunting task for many candidates.
PrepAccelerator streamlines interview preparation with several DL-based features: a Ratings Forecaster,
a Problems Suggester, a code editor with context-aware code completion for C++, Java, and Python 3,
an LLM-based chatbot that asks questions in the style of coding interviews, and personalized learning
path generation using GNNs.
1 Introduction
In today’s competitive job market, technical interviews play a crucial role in securing software engineering
positions. However, many candidates struggle with algorithmic problem-solving, coding efficiency, and
technical communication due to a lack of structured practice. To address this, PrepAccelerator is being
developed as an AI-powered coding interview preparation platform that provides an interactive and
personalized learning experience. The features planned for this platform are:
1. Ratings Forecaster: To prepare for coding interviews, many candidates participate in contests
on platforms like Codeforces, CodeChef, and LeetCode. Each participant is assigned a dynamic numerical
rating based on contest performance. The Ratings Forecaster will predict a participant's rating change
before the official contest, based on that participant's contest history.
2. Problems Suggester: This feature helps users improve on weaker areas in their preparation. After a
contest, once the problem tags (Topic, Level) for every problem have been updated, the Problems Suggester
(PS) will suggest the top-K relevant problems (K is set by the user) to solve in order to improve
performance in the next contest. Problems will be suggested from LeetCode and Codeforces (due to dataset
availability). Users can also enter topics and a difficulty level to receive the top-K relevant problems
from the aforementioned sites.
3. Context-aware code completion: A self-implemented version of a feature found in most code editors.
The plan is to train three separate models, one each for C++, Python, and Java. Each model will be a
CNN, RNN, or LSTM (subject to performance). The feature combines data structures with DL models for
predictive code completion.
4. LLM-Based Chatbot: Open-source LLMs such as LLaMA 2 will be fine-tuned to ask questions in the style
of a coding interview. The chatbot evaluates answers and provides feedback to the user, helping
candidates practice for real-time interviews.
5. Personalized Learning Path Generation using GNNs: The GNN model predicts the next best topic
to study based on user performance. It ranks topics and suggests adaptive learning paths based on
user data from LeetCode and Codeforces.
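As a rough illustration of the Problems Suggester in item 2, the sketch below picks the top-K problems that overlap a user's weak topics. The `Problem` record, the scoring rule (topic overlap first, then the share of likes among all votes), and the toy data are all assumptions made for this example; the actual relevance model is not specified here.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Problem:
    """Hypothetical problem record; field names are illustrative only."""
    title: str
    difficulty: str      # "Easy", "Medium", or "Hard"
    topics: frozenset    # e.g. frozenset({"Graph", "DP"})
    likes: int
    dislikes: int


def like_ratio(problem):
    """Share of likes among all votes; 0.0 when there are no votes."""
    total = problem.likes + problem.dislikes
    return problem.likes / total if total else 0.0


def suggest_problems(problems, weak_topics, difficulty=None, k=3):
    """Return titles of the top-k problems for the given weak topics.

    Assumed scoring: more overlapping topics ranks higher; ties are
    broken by like ratio. Problems with no overlap are skipped.
    """
    weak = set(weak_topics)
    candidates = []
    for p in problems:
        if difficulty is not None and p.difficulty != difficulty:
            continue
        overlap = len(weak & p.topics)
        if overlap == 0:
            continue
        candidates.append((overlap, like_ratio(p), p.title))
    return [title for _, _, title in heapq.nlargest(k, candidates)]


# Toy data, invented for the example.
PROBLEMS = [
    Problem("Problem A", "Easy", frozenset({"Array", "Hash Table"}), 900, 100),
    Problem("Problem B", "Medium", frozenset({"Graph", "DFS"}), 800, 200),
    Problem("Problem C", "Medium", frozenset({"DP"}), 700, 300),
    Problem("Problem D", "Medium", frozenset({"Graph", "DP"}), 600, 400),
]

print(suggest_problems(PROBLEMS, {"Graph", "DP"}, k=2))
```

With this toy data, "Problem D" ranks first because it covers both weak topics, even though its like ratio is lower. A learned relevance model could replace the hand-written score without changing the interface.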
3 Dataset Description
The datasets used for the Problems Suggester feature are:
1. Leetcode Problem Dataset (Kaggle): File Format: CSV. The LeetCode Problems Dataset consists
of 1,825 coding problems collected from LeetCode. This dataset contains rich metadata about each
problem, including difficulty level, company tags, problem frequency, and acceptance rates.
Feature             Description
id                  Unique problem identifier
title               Name of the coding problem
description         Full text of the problem statement
is premium          Indicates if a premium account is required (Boolean)
difficulty          Problem difficulty (Easy, Medium, or Hard)
solution link       Link to the problem’s solution
acceptance rate     Percentage of correct submissions
frequency           How often the problem is attempted
url                 Link to the problem on LeetCode
discuss count       Number of discussion threads on the problem
accepted            Number of times the solution was accepted
submissions         Total number of submissions
companies           Companies that have asked the problem
related topics      Topics related to the problem (e.g., Graphs, DP)
likes               Number of likes received by the problem
dislikes            Number of dislikes received by the problem
rating              Rating score calculated as likes / (likes + dislikes)
asked by faang      Indicates if the problem was asked by FAANG companies
similar questions   List of similar problems with metadata
2. Codeforces Dataset (Kaggle): File Format: CSV. The dataset consists of 6,819 Codeforces problems
and has two main columns: Problem Statement and Problem Tags. Problem Statement is the full text
from the problem page. Problem Tags are comma-separated tagged classes
1. CodeSearchNet (GitHub): The CodeSearchNet dataset is a collection of code snippets and their
corresponding natural language descriptions, designed to train and evaluate machine learning models
for code retrieval and code completion tasks. It has over 2 million code snippets in Python, Java and
CPP.
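The code completion feature is described as combining data structures with DL models. One plausible data-structure component, assumed here rather than taken from the project, is a prefix trie built over tokens seen in training snippets; it proposes candidates that a neural model could then rerank. The class names and the frequency-based ordering below are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # character -> TrieNode
        self.count = 0       # times a token ending at this node was added


class CompletionTrie:
    """Prefix index over tokens; suggests the most frequent completions."""

    def __init__(self):
        self.root = TrieNode()

    def add(self, token):
        node = self.root
        for ch in token:
            node = node.children.setdefault(ch, TrieNode())
        node.count += 1

    def complete(self, prefix, k=3):
        # Walk down to the node that matches the prefix, if any.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        # Collect every stored token below that node.
        found = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.count:
                found.append((-current.count, text))
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        # Most frequent first; alphabetical order breaks ties.
        return [text for _, text in sorted(found)[:k]]


trie = CompletionTrie()
for token in ["print", "print", "private", "printf", "range"]:
    trie.add(token)

print(trie.complete("pri", k=2))
```

In a full system the trie would only narrow the candidate set; the CNN, RNN, or LSTM would score candidates using the surrounding code context.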
Field Name Description
Miscellaneous Data: Fetched from Codeforces (Contest History, Rating changes etc.)
1. Statistical Methods: Moving Average (MA), Autoregressive Integrated Moving Average (ARIMA),
Exponential Smoothing (ETS).
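To make two of these statistical baselines concrete, the sketch below forecasts a participant's next rating from a short rating history with a moving average and with simple exponential smoothing (the most basic member of the ETS family). The function names, the window size, the smoothing factor `alpha`, and the sample ratings are assumptions for illustration. ARIMA is left out because it is normally fitted with a dedicated statistics library.

```python
def moving_average_forecast(ratings, window=3):
    """Forecast the next rating as the mean of the last `window` ratings."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = ratings[-window:]
    return sum(recent) / len(recent)


def exponential_smoothing_forecast(ratings, alpha=0.5):
    """Simple exponential smoothing; returns the final smoothed level.

    level_t = alpha * rating_t + (1 - alpha) * level_{t-1},
    initialised with the first observed rating.
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = ratings[0]
    for rating in ratings[1:]:
        level = alpha * rating + (1 - alpha) * level
    return level


# Invented rating history, oldest contest first.
history = [1200, 1250, 1300, 1280]

ma = moving_average_forecast(history, window=3)
ses = exponential_smoothing_forecast(history, alpha=0.5)

# Predicted rating change relative to the latest known rating.
print(round(ma - history[-1], 2), round(ses - history[-1], 2))
```

A larger `alpha` weights recent contests more heavily, which may suit participants whose skill is changing quickly.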