NLP Lab 4
AIM :
To implement word prediction using the N-Gram language model.
ALGORITHM :
STEP 1: Data Preprocessing: Collect and clean the text dataset by removing special characters and extra spaces, and converting the text to lowercase.
STEP 2: Tokenization: Split the text into individual words (tokens).
STEP 3: Building N-Grams: Generate n-grams (bigrams, trigrams, etc.) from the tokenized text.
STEP 4: Probability Calculation: Compute the conditional probability of each word given the previous word(s) from frequency counts; for a bigram model, P(w2 | w1) = count(w1, w2) / count(w1).
STEP 5: Prediction Model: Given an input sequence, choose the most probable next word by maximum likelihood estimation.
STEP 6: Smoothing (if needed): Apply a smoothing technique such as Laplace smoothing to handle unseen n-grams (see the sketch after these steps).
STEP 7: Implementation & Testing: Implement the model in Python and test its predictions on different input sequences.
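Note on STEPS 4 and 6: in the lab corpus below, the unsmoothed estimate is P(future | the) = 2/3, since "the" occurs three times and is followed by "future" twice. The program itself does not apply smoothing; the following is a minimal sketch of Laplace (add-one) smoothing that could be layered on the same counts. The function name laplace_bigram_prob and its parameters are illustrative choices, assuming bigram_freq and unigram_freq are Counter objects built from the token list.

from collections import Counter

def laplace_bigram_prob(prev_word, word, bigram_freq, unigram_freq, vocab_size):
    # Add-one smoothing: every bigram count is incremented by 1 and the
    # denominator grows by the vocabulary size, so an unseen bigram gets
    # a small non-zero probability instead of zero.
    return (bigram_freq[(prev_word, word)] + 1) / (unigram_freq[prev_word] + vocab_size)

# Assumed setup (hypothetical names):
# unigram_freq = Counter(tokens)
# vocab_size = len(set(tokens))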
PROGRAM :
import nltk
from nltk.util import ngrams
from collections import Counter

nltk.download('punkt')  # tokenizer models used by word_tokenize

# Small training corpus
text = """Artificial intelligence is the future of technology.
The future of AI is bright and promising.
AI is transforming the world with automation and intelligence."""

# STEPS 1-2: lowercase and tokenize
tokens = nltk.word_tokenize(text.lower())

# STEPS 3-4: build bigrams and count their frequencies
bigrams = list(ngrams(tokens, 2))
bigram_freq = Counter(bigrams)

# STEP 5: return the word that most often followed prev_word in training
def predict_next_word(prev_word):
    candidates = {pair[1]: freq for pair, freq in bigram_freq.items()
                  if pair[0] == prev_word}
    if not candidates:
        return "No prediction available"
    return max(candidates, key=candidates.get)

print("Predicted word after 'the':", predict_next_word("the"))
print("Predicted word after 'ai':", predict_next_word("ai"))
print("Predicted word after 'future':", predict_next_word("future"))
OUTPUT :
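Traced by hand from the bigram counts above (NLTK download messages omitted):

Predicted word after 'the': future
Predicted word after 'ai': is
Predicted word after 'future': of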
RESULT :
The N-Gram model successfully predicts the most probable next word based on previously observed word sequences.