NLPPR8
Department of Artificial Intelligence and Data Science
import re
import nltk
from nltk.tokenize import word_tokenize

# Download tokenizer
nltk.download('punkt')
# Preprocess the corpus
corpus = corpus.lower()                    # Lowercase the text
corpus = re.sub(r'[^\w\s]', '', corpus)    # Remove punctuation
words = word_tokenize(corpus)              # Tokenize into words
# Vocabulary size
vocab_size = len(set(words))
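
The prediction step below refers to unigrams, smoothed_bigrams, and smoothed_trigrams, which are not defined in this excerpt. The following is a minimal sketch of how these tables could be built with Laplace (add-one) smoothing; the construction and variable names are assumptions, chosen to match the 1 / vocab_size fallback used in the prediction code.

from collections import Counter

# Count n-grams from the tokenized corpus (assumed construction)
unigrams = Counter(words)
bigram_counts = Counter(zip(words, words[1:]))
trigram_counts = Counter(zip(words, words[1:], words[2:]))

# Add-one smoothed conditional probabilities:
# P(w2 | w1) = (count(w1, w2) + 1) / (count(w1) + V)
smoothed_bigrams = {
    (w1, w2): (c + 1) / (unigrams[w1] + vocab_size)
    for (w1, w2), c in bigram_counts.items()
}

# P(w3 | w1, w2) = (count(w1, w2, w3) + 1) / (count(w1, w2) + V)
smoothed_trigrams = {
    (w1, w2, w3): (c + 1) / (bigram_counts[(w1, w2)] + vocab_size)
    for (w1, w2, w3), c in trigram_counts.items()
}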
# Predict the most probable next word from the last one or two words.
# The function wrapper and final return are assumed completions of this
# excerpt, which originally showed only the branching logic.
def predict_next_word(words, n=2):
    if n == 2:
        # Unseen bigrams fall back to a probability of 1 / vocab_size
        candidates = [(word, smoothed_bigrams.get((words[-1], word), 1 / vocab_size))
                      for word in unigrams]
    elif n == 3:
        candidates = [(word, smoothed_trigrams.get((words[-2], words[-1], word), 1 / vocab_size))
                      for word in unigrams]
    else:
        return "n can only be 2 or 3."
    # Return the highest-probability candidate word
    return max(candidates, key=lambda pair: pair[1])[0]
Conclusion:
Developing a language model to predict the most likely next word is a core NLP task. Traditional n-gram models provide a statistical foundation, while modern neural models enable more powerful, context-aware predictions. Such models form the backbone of intelligent applications like chatbots, virtual assistants, and content generators.