PROJECT REPORT
Introduction
This code is an implementation of a next word prediction model using a GRU (Gated Recurrent Unit)
neural network architecture. The goal of the model is to predict the next word in a given sequence of
words.
The code starts by importing the necessary libraries and loading the data from a text file. The data is
then cleaned and tokenized using the Keras Tokenizer class. The sequences of words are split into
input/output pairs, where the input is a single word and the output is the next word in the sequence.
The GRU model is then defined using the Keras Sequential API. The model consists of an embedding layer,
a GRU layer with 128 units, and a dense output layer with softmax activation. The model is compiled
with the categorical cross-entropy loss and the Adam optimizer.
The model is trained on the input/output pairs using the fit() function, and several callbacks are used to
monitor the training process and save the best model. The model is then used to predict the next word
in a given input sequence.
Overall, this code demonstrates how to build and train a GRU-based next word prediction model using
Keras.
Literature Review
Next Word Prediction has become an important task in Natural Language Processing. It has various
applications in text completion, machine translation, and speech recognition. The prediction model can
be built using various techniques such as n-grams, Hidden Markov Models, and neural networks.
Recently, Recurrent Neural Networks (RNNs) have become very popular for Next Word Prediction
because of their ability to model sequential data. Long Short-Term Memory (LSTM) and
Gated Recurrent Unit (GRU) are the most commonly used RNN architectures for this task. LSTM is
better suited to longer sequences, while GRU is less computationally expensive and has shown better
performance in some cases.
Tokenization is a crucial step in Next Word Prediction. It involves converting the input text into
numerical sequences; the Tokenizer class in Keras can be used for this purpose. The sequences are then
split into input (X) and output (y) variables, which are used to train the LSTM and GRU models.
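As a minimal illustration of this step, the Keras Tokenizer maps words to integer ids and converts text into sequences of those ids. The toy sentence below is invented for the example and is not the project's dataset:

from tensorflow.keras.preprocessing.text import Tokenizer

# Toy corpus used only to illustrate tokenization; not the project's dataset.
corpus = ["one morning gregor samsa woke from troubled dreams"]

tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)                     # builds the word -> integer index
sequence = tokenizer.texts_to_sequences(corpus)[0]

print(tokenizer.word_index)                        # e.g. {'one': 1, 'morning': 2, ...}
print(sequence)                                    # the sentence as a list of integer ids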
Methodology
Libraries: The required libraries (the Tokenizer and to_categorical utilities, the Keras model and layer classes, the training callbacks, and pickle) are imported at the top of the script.
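Based on the classes and functions referenced in the sections below, the import block looks roughly like this (exact module paths can differ between TensorFlow versions):

import pickle
import numpy as np

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Embedding, LSTM, GRU, Dense
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau, TensorBoard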
Data Preprocessing: The Tokenizer class from keras.preprocessing.text is used to tokenize the data.
The tokenizer is fitted on the cleaned text and saved using the pickle library. The sequences variable
is created by sliding a window of size 2 over the tokenized data, so each window becomes a pair of
integers in which the first integer is the input word and the second is the output word. The inputs are
stored in the variable X and the outputs in the variable y; the outputs are converted into categorical
(one-hot) format using the to_categorical function from keras.utils.
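A sketch of this preprocessing step, assuming the cleaned corpus is available as a single string; the file and variable names are assumptions made for the illustration:

import pickle
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.utils import to_categorical

data = open("metamorphosis_clean.txt", encoding="utf-8").read()   # file name is an assumption

tokenizer = Tokenizer()
tokenizer.fit_on_texts([data])
pickle.dump(tokenizer, open("tokenizer.pkl", "wb"))               # save the fitted tokenizer

tokens = tokenizer.texts_to_sequences([data])[0]
vocab_size = len(tokenizer.word_index) + 1                        # word indices start at 1

# Slide a window of size 2 over the token stream: [current word, next word].
sequences = np.array([tokens[i - 1:i + 1] for i in range(1, len(tokens))])

X = sequences[:, :1]                                              # input word, shape (N, 1)
y = to_categorical(sequences[:, 1], num_classes=vocab_size)       # next word, one-hot encoded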
Model Architecture: Two different models are created for the same task with different architectures:
LSTM and GRU. The architecture of the LSTM model consists of an embedding layer, two LSTM
layers, and two Dense layers. The architecture of the GRU model consists of an embedding layer, a
GRU layer, and a Dense layer.
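Sketches of the two architectures follow. The 128-unit GRU layer is taken from the Introduction; the embedding dimension and the LSTM/Dense layer sizes are assumptions, and vocab_size is the vocabulary size computed in the preprocessing step:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, GRU, Dense

vocab_size = len(tokenizer.word_index) + 1   # from the preprocessing step above

# LSTM model: embedding layer, two LSTM layers, two Dense layers.
lstm_model = Sequential([
    Embedding(vocab_size, 10),
    LSTM(1000, return_sequences=True),       # return the full sequence for the second LSTM layer
    LSTM(1000),
    Dense(1000, activation="relu"),
    Dense(vocab_size, activation="softmax"),
])

# GRU model: embedding layer, one 128-unit GRU layer, one Dense output layer.
gru_model = Sequential([
    Embedding(vocab_size, 10),
    GRU(128),
    Dense(vocab_size, activation="softmax"),
])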
Callbacks: The following callbacks are used during training; a configuration sketch follows the list:
• ModelCheckpoint is used to save the best model based on the loss value.
• ReduceLROnPlateau is used to reduce the learning rate when the loss value does not improve
for 3 epochs.
• TensorBoard is used to visualize the training process.
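A configuration along these lines would set up the three callbacks; the checkpoint path, the reduction factor, and the log directory are assumptions:

from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau, TensorBoard

checkpoint = ModelCheckpoint("next_word_model.h5",   # file name is an assumption
                             monitor="loss",
                             save_best_only=True,    # keep only the best model by loss value
                             verbose=1)
reduce_lr = ReduceLROnPlateau(monitor="loss",
                              factor=0.2,            # reduction factor is an assumption
                              patience=3,            # 3 epochs without improvement
                              verbose=1)
tensorboard = TensorBoard(log_dir="logs")            # log directory is an assumption

callbacks = [checkpoint, reduce_lr, tensorboard]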
Compilation and Training
The model is compiled with its compile method, using the categorical_crossentropy loss function and
the Adam optimizer, and trained with its fit method.
Model Training: We train the model with the Adam optimizer and the categorical cross-entropy loss. We
monitor performance on a validation set during training to check that the model is not overfitting, and
we experiment with different hyperparameters to find suitable values.
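Putting compilation and training together for the GRU model (the number of epochs, batch size, and validation split are assumptions; the LSTM model is compiled and trained the same way):

gru_model.compile(loss="categorical_crossentropy",
                  optimizer="adam",
                  metrics=["accuracy"])

history = gru_model.fit(X, y,
                        epochs=50,               # assumption
                        batch_size=64,           # assumption
                        validation_split=0.1,    # held-out data to watch for overfitting
                        callbacks=callbacks)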
Prediction: Either of the two trained models, LSTM or GRU, can be selected for next word
prediction. The user enters a sentence, and the last word of the sentence is used as input to predict the
next word. The load_model function from tensorflow.keras.models is used to load the selected model,
and the saved tokenizer is loaded using pickle. The Predict_Next_Words function takes the loaded
model, the loaded tokenizer, and the input text as input and predicts the next word of the input text. The
user can choose to stop the script by typing "stop the script".
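A sketch of this prediction loop; the model and tokenizer file names are assumptions, while the Predict_Next_Words function and the "stop the script" command follow the description above:

import pickle
import numpy as np
from tensorflow.keras.models import load_model

def Predict_Next_Words(model, tokenizer, text):
    # Predict the next word given a single input word.
    encoded = tokenizer.texts_to_sequences([text])[0]
    if not encoded:                                   # word was not seen during training
        return None
    sequence = np.array([encoded[:1]])                # shape (1, 1): one sample, one word
    predicted_id = int(np.argmax(model.predict(sequence, verbose=0)))
    for word, index in tokenizer.word_index.items():  # map the index back to its word
        if index == predicted_id:
            return word
    return None

model = load_model("next_word_model.h5")              # file name is an assumption
tokenizer = pickle.load(open("tokenizer.pkl", "rb"))

while True:
    text = input("Enter a sentence (or type 'stop the script' to exit): ").strip()
    if text.lower() == "stop the script":
        break
    if not text:
        continue
    last_word = text.split()[-1]                      # only the last word is used as input
    print(Predict_Next_Words(model, tokenizer, last_word))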
Model Improvement: We explore ways to improve the performance of the model. For example, we
can use a larger training dataset, fine-tune the hyperparameters, or incorporate additional features such
as part-of-speech tags or named entities. We also experiment with different architectures, such as adding
more LSTM layers or using a different type of recurrent neural network.
Overall, these methods provide a framework for building a next word prediction AI model using the
TensorFlow library. By following these steps and experimenting with different approaches, we can
build a model that accurately predicts the next word in a sequence and generates high-quality text.
Conclusion
The project demonstrates how to train a model for next word prediction using LSTM and GRU
architectures. The model is trained on the text corpus of the book "Metamorphosis" by Franz Kafka.
The trained models can then be used to predict the next word for a given input word.
References
• Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8),
1735-1780.
• Cho, K., Van Merriënboer, B., Bahdanau, D., & Bengio, Y. (2014). On the properties of neural
machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259.
• Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014). Empirical evaluation of gated recurrent
neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
• Karpathy, A. (2015). The unreasonable effectiveness of recurrent neural networks. Andrej
Karpathy blog.
• Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning (Vol. 1). MIT Press.