
Report on the Articles Search Using Deep Learning Project

Directed by: Ahmed Benameur, Siwar Ben Gharsallah, Sarra Ben Hadj Slama


Contents

1 The Problem Statement
  1.1 Recurrent Neural Networks
  1.2 Word Embeddings
  1.3 Semantic Matching

2 The Project
  2.1 Introduction
  2.2 Database Used
  2.3 Generating the Dataset
  2.4 Data Preprocessing
  2.5 Model Architecture Selection
  2.6 Model Development
  2.7 Model Training

3 Results and Conclusion

List of Figures

1 Bibliometrix sample database
2 Keywords
3 Data preprocessing
4 Defining the model
5 Training the model
6 Performing a query

1 The Problem Statement
The goal of this project is to develop a search query system that can understand and process user queries
to retrieve the most relevant articles from a given corpus. Traditional search systems often rely on
keyword matching and simple statistical methods, which can fall short when dealing with complex queries
or nuanced language. By utilizing Deep Learning (DL) architectures, our system aims to grasp the contextual
meaning of search queries and articles, thereby enhancing retrieval accuracy. The following are three
approaches to tackling the problem addressed in this project.

1.1 Recurrent Neural Networks


Recurrent Neural Networks (RNNs) are a class of artificial neural networks where connections between
nodes form a directed graph along a temporal sequence. This feature allows them to exhibit temporal
dynamic behavior, making RNNs particularly well-suited for tasks involving sequential data, such as
natural language processing (NLP).
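To make the idea of temporal dynamic behavior concrete, the following is a minimal NumPy sketch of a vanilla RNN forward pass; the dimensions and random weights are illustrative assumptions and not the model used in this project.

import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    # Run a simple RNN over a sequence; the hidden state carries context forward.
    h = np.zeros(W_hh.shape[0])                    # initial hidden state
    for x_t in inputs:                             # one step per token vector
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)   # new state depends on input and past state
    return h                                       # final state summarizes the whole sequence

# Toy dimensions: 4-dimensional inputs, 3-dimensional hidden state.
rng = np.random.default_rng(0)
sequence = [rng.normal(size=4) for _ in range(5)]
W_xh = rng.normal(size=(3, 4))
W_hh = rng.normal(size=(3, 3))
b_h = np.zeros(3)
print(rnn_forward(sequence, W_xh, W_hh, b_h))

The key point is that the hidden state is updated at every step from both the current input and the previous state, so the final state depends on the whole sequence.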

1.2 Word Embeddings


Word embeddings are a type of word representation that allows words to be represented as vectors
in a continuous vector space, capturing semantic relationships between words based on their usage in
context. Unlike traditional keyword matching techniques, word embeddings can understand the nuanced
meanings and relationships between words, making them particularly effective for natural language
processing (NLP) tasks.
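As a toy illustration of this idea, cosine similarity between word vectors is higher for semantically related words; the three-dimensional vectors below are hypothetical values, not trained embeddings.

import numpy as np

def cosine(u, v):
    # Cosine similarity: close to 1 for similar directions, near 0 for unrelated ones.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical 3-dimensional embeddings; real embeddings have hundreds of dimensions.
embeddings = {
    "article": np.array([0.8, 0.1, 0.3]),
    "paper":   np.array([0.7, 0.2, 0.4]),
    "banana":  np.array([0.0, 0.9, 0.1]),
}

print(cosine(embeddings["article"], embeddings["paper"]))   # high: related terms
print(cosine(embeddings["article"], embeddings["banana"]))  # low: unrelated terms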

1.3 Semantic Matching


Semantic matching involves understanding and matching the meaning behind words and phrases rather
than relying solely on exact keyword matches. This approach allows for a deeper comprehension of the
context and intent behind search queries and articles, leading to more accurate and relevant retrieval of
information. By focusing on the semantics of the content, our system aims to overcome the limitations
of traditional keyword-based search methods.
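A minimal sketch of semantic matching, under the assumption that a query and each article are represented by the average of their word vectors and ranked by cosine similarity; the helper names are hypothetical and do not correspond to the project's code.

import numpy as np

def sentence_vector(tokens, word_vectors):
    # Mean-pool the vectors of known words into a single sentence vector.
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    return np.mean(known, axis=0)

def rank_articles(query, articles, word_vectors):
    # Score each article by cosine similarity between its vector and the query vector.
    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
    q = sentence_vector(query.split(), word_vectors)
    scores = {title: cosine(q, sentence_vector(text.split(), word_vectors))
              for title, text in articles.items()}
    return sorted(scores, key=scores.get, reverse=True)

Because the comparison happens in embedding space, an article whose wording differs from the query but whose meaning is close can still score highly, which is exactly what plain keyword matching misses.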

2 The Project
2.1 Introduction
Throughout this report, we will detail the steps taken to develop and evaluate our RNN-based search
query system. This includes data preprocessing, model architecture selection, training and optimization
processes, and performance evaluation. We will also compare our RNN approach with traditional search
methods to highlight its advantages and areas for further improvement.
The findings from this project demonstrate the potential of RNNs to significantly enhance search
query performance in article retrieval, paving the way for more intelligent and efficient information
retrieval systems.

2.2 Database Used


The database used was generated with the bibliometrix library through its biblioshiny interface.

Figure 1: Bibliometrix sample database.

2.3 Generating the Dataset
By performing keyword queries and recording their answers, we generated a dataset with which to train
our RNN model.

Figure 2: Keywords

2.4 Data Preprocessing
Data preprocessing consisted of three main steps (a minimal code sketch is given after Figure 3):
Tokenization: Split the sentences into individual words or subword units to create a vocabulary.
Numerical Encoding: Convert words or subword units into numerical representations using the vocabulary.
Padding: Ensure all sequences have the same length by padding shorter sentences with special tokens.

Figure 3: Data preprocessing.
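A minimal sketch of these three steps using the Keras preprocessing utilities; the example sentences, the out-of-vocabulary token, and the maximum length of 10 are assumptions rather than the project's exact settings.

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

sentences = ["deep learning for article search", "recurrent neural networks"]

# Tokenization: split sentences into words and build the vocabulary.
tokenizer = Tokenizer(oov_token="<OOV>")
tokenizer.fit_on_texts(sentences)

# Numerical encoding: map each word to its index in the vocabulary.
sequences = tokenizer.texts_to_sequences(sentences)

# Padding: make every sequence the same length using the padding token (0).
padded = pad_sequences(sequences, maxlen=10, padding="post")
print(padded)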

2.5 Model Architecture Selection


After preprocessing the data, we had to choose an appropriate model architecture for the search-query
task. We opted for a sequence model based on recurrent neural networks (RNNs).

2.6 Model Development


This model takes input text, processes it through an embedding layer and an LSTM layer to capture
sequential dependencies, passes the result through fully connected dense layers to extract higher-level
features, and finally outputs the probability of each class using a sigmoid activation function. A
minimal sketch of this architecture is given after Figure 4.

Figure 4: Defining the model
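A minimal Keras sketch of the architecture described above (embedding layer, LSTM layer, dense layers, sigmoid output); the vocabulary size, layer widths, sequence length, and number of output classes are assumed values, not the project's exact configuration.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense

vocab_size = 10000   # assumed vocabulary size
max_len = 10         # assumed padded sequence length (matches the preprocessing sketch)
num_classes = 50     # assumed number of output classes (e.g. candidate keywords)

model = Sequential([
    Input(shape=(max_len,)),
    Embedding(input_dim=vocab_size, output_dim=64),   # word indices -> dense vectors
    LSTM(64),                                         # captures sequential dependencies
    Dense(32, activation="relu"),                     # higher-level features
    Dense(num_classes, activation="sigmoid"),         # probability for each class
])
model.summary()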

2.7 Model Training
Model training was one of the most critical steps in our project. We used the training dataset to adjust
the model's weights by minimizing the loss function over the training examples; a minimal sketch of this
step is given after Figure 5.

Figure 5: Training the model
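Continuing the sketch above, the training step could look as follows; the optimizer, loss, epoch count, and the random arrays standing in for the real padded sequences and labels are all assumptions.

import numpy as np

# Dummy data standing in for the real padded queries and their multi-hot labels.
X_train = np.random.randint(1, vocab_size, size=(500, max_len))
y_train = (np.random.rand(500, num_classes) > 0.9).astype("float32")

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
history = model.fit(X_train, y_train, validation_split=0.1, epochs=10, batch_size=32)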

3 Results and Conclusion

Figure 6: Performing a query
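Figure 6 shows a query being performed. As an illustration, reusing the tokenizer and model from the earlier sketches (with a hypothetical query string), the same step could look like this:

import numpy as np

query = "neural networks for information retrieval"
encoded = pad_sequences(tokenizer.texts_to_sequences([query]), maxlen=max_len, padding="post")
probs = model.predict(encoded)[0]            # one probability per class
top_classes = np.argsort(probs)[::-1][:5]    # indices of the 5 most likely classes
print("Top class indices for the query:", top_classes)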

The search-query model we developed demonstrated promising performance, although further improvement is
possible with continued training and additional adjustments.
