INTELLIPAAT - 2024-01-20 - Transformers Cont. and Autoencoders

The document discusses different types of tokenization used in natural language processing, including word, character, and subword tokenization. It explains how tokenization breaks down text into smaller units to make it easier for algorithms to analyze language and perform tasks like machine translation, sentiment analysis, and chatbots.


In generative question answering, the answer is rephrased (generated) rather than copied word for word from the source text.

In the world of Natural Language Processing (NLP), tokens are the building blocks! They're essentially smaller units that we break
down text into, making it easier for computers to understand and analyze. Think of it like chopping up a giant pizza into slices – it's
much easier to manage and consume that way.

There are different ways to slice that pizza (text), depending on the task at hand:

- Word Tokenization: This is the most common type, where we simply split the text into individual words. For example, "The quick brown fox jumps over the lazy dog" becomes ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"].
- Character Tokenization: Here, we go even smaller, breaking down the text into individual characters. So our example sentence would become ["T", "h", "e", " ", "q", "u", "i", "c", "k", ...].
- Subword Tokenization: This takes a middle ground, splitting text into meaningful units that are smaller than words but larger than characters. This is particularly useful for languages with complex morphology (like German) or for dealing with rare words. A short Python sketch of all three styles follows this list.
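
As a rough illustration, here is a small Python sketch of the three styles. The word and character splits use plain Python; the subword split assumes the Hugging Face transformers package and a BERT WordPiece tokenizer, which are assumptions for illustration, not something from the lecture notes.

sentence = "The quick brown fox jumps over the lazy dog"

# Word tokenization: split on whitespace
word_tokens = sentence.split()
# ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']

# Character tokenization: every character (including spaces) becomes a token
char_tokens = list(sentence)
# ['T', 'h', 'e', ' ', 'q', 'u', 'i', 'c', 'k', ...]

# Subword tokenization: pieces smaller than words but larger than characters.
# Assumes `pip install transformers`; downloads the BERT vocabulary on first use.
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
subword_tokens = tokenizer.tokenize("tokenization of rare words")
# rare or long words get split into pieces, e.g. 'token', '##ization'
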
Now, why is tokenization so important? Well, computers don't naturally understand language the way humans do. By breaking
down text into smaller, more manageable units, we make it easier for NLP algorithms to identify patterns, analyze word
relationships, and perform various tasks like:

- Machine translation: Translating languages involves understanding the meaning of one sentence and expressing it in another. Tokenization helps break down the meaning into smaller pieces that can be translated more accurately.
- Sentiment analysis: Identifying the emotional tone of a text requires understanding the relationships between words. Tokenization allows us to analyze these relationships and determine if a sentence is positive, negative, or neutral.
- Chatbots and virtual assistants: These systems need to understand what users are saying in order to respond appropriately. Tokenization helps break down user queries into meaningful units that the chatbot can process and respond to.

So, the next time you interact with an NLP system, remember the tiny tokens behind the scenes, diligently working to help
computers understand and respond to human language!

BERT, which stands for Bidirectional Encoder Representations from Transformers, is based on the Transformer, a deep learning model in which every output element is connected to every input element and the weightings between them are calculated dynamically, via the attention mechanism, based on how relevant the elements are to each other.
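
To make the "dynamically calculated weightings" concrete, here is a minimal NumPy sketch of scaled dot-product self-attention, the mechanism the Transformer uses; the toy shapes and random inputs are illustrative only, not from the lecture.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # every query scored against every key
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax: each row sums to 1
    return weights @ V, weights                      # output = weighted sum of the values

# Toy self-attention over 3 token vectors of size 4 (Q = K = V = x)
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 4))
output, weights = scaled_dot_product_attention(x, x, x)
print(weights)   # a 3x3 matrix of dynamically computed connection weights
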

Example context passage used for question answering: "It took place in Qatar from 18 November to 20 December 2022, after the country was awarded hosting rights in 2010."
[SEP] is a special separator tag; it marks the boundary between the question and the context passage.

When you hit encode:

input_ids = tokenizer.encode(question, text)

it encodes the question and the context into token IDs. Behind the scenes, each of these tokens is mapped to an embedding.
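
Putting the pieces together, here is a minimal sketch of extractive question answering with BERT. It assumes the Hugging Face transformers library, PyTorch, and a SQuAD-fine-tuned checkpoint; the model name and the answer shown in the final comment are illustrative assumptions, not from the lecture.

import torch
from transformers import BertTokenizer, BertForQuestionAnswering

# Assumed checkpoint: BERT fine-tuned on SQuAD for extractive QA
model_name = "bert-large-uncased-whole-word-masking-finetuned-squad"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForQuestionAnswering.from_pretrained(model_name)

question = "When did the tournament take place?"
text = ("It took place in Qatar from 18 November to 20 December 2022, "
        "after the country was awarded hosting rights in 2010.")

# encode() builds [CLS] question [SEP] context [SEP] and maps the tokens to IDs
input_ids = tokenizer.encode(question, text)

with torch.no_grad():
    outputs = model(torch.tensor([input_ids]))

# The model scores every token as a possible start/end of the answer span
start = int(torch.argmax(outputs.start_logits))
end = int(torch.argmax(outputs.end_logits)) + 1
answer = tokenizer.decode(input_ids[start:end])
print(answer)   # expected: a span like "18 november to 20 december 2022"
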

Semantic search:

1:05:45 –
We can perform a similarity exercise, such as cosine similarity, between the question and the candidate sentences.

Vector databases support similarity search.

Algorithms that support this kind of search include Locality Sensitive Hashing (LSH) and Approximate Nearest Neighbors (ANN). A small cosine-similarity sketch follows below.
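
As a sketch of the similarity exercise, the snippet below embeds a question and a few candidate sentences and ranks them by cosine similarity. It assumes the sentence-transformers package and the all-MiniLM-L6-v2 model, which are assumptions rather than part of the lecture; at scale, a vector database or an ANN/LSH index would replace the brute-force loop.

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")   # assumed embedding model

sentences = [
    "The tournament took place in Qatar in 2022.",
    "Autoencoders compress an input and try to reconstruct it.",
    "Tokenization splits text into smaller units.",
]
question = "When was the World Cup held?"

sentence_vecs = model.encode(sentences)     # one dense vector per sentence
question_vec = model.encode([question])[0]

def cosine(a, b):
    # cosine similarity = dot product divided by the product of the norms
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

scores = [cosine(question_vec, v) for v in sentence_vecs]
best = int(np.argmax(scores))
print(sentences[best], scores[best])   # the most semantically similar sentence
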
In generative question answering the final output is sent to a decoder, which generates the answer text.
Extractive question answering versus generative question answering: extractive QA returns a span copied directly from the context, while generative QA rephrases the answer through a decoder.
1:49:47

The encoder compresses the input and the decoder is trained to recreate the original input.
Even if there is a minor amount of loss, it should still be able to reconstruct the image well enough.

We then calculate the reconstruction error and backpropagate from that error.

The whole thing put together becomes an autoencoder.

PCA produces a linear combination of the inputs, while an autoencoder learns a nonlinear combination of the inputs (through its nonlinear activations).

from tensorflow.keras.datasets import mnist
(x_train, _), (x_test, _) = mnist.load_data()

We don’t need the target variable in the above as the data set itself is the target variable.
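
Here is a minimal sketch of a dense autoencoder on MNIST in Keras, continuing from the load_data() line above. It assumes TensorFlow 2.x; the layer sizes, epochs, and loss are illustrative choices, not values from the lecture.

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import mnist

(x_train, _), (x_test, _) = mnist.load_data()

# Scale pixels to [0, 1] and flatten each 28x28 image into a 784-dim vector
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0
x_train = x_train.reshape((len(x_train), 784))
x_test = x_test.reshape((len(x_test), 784))

# Encoder compresses 784 -> 32 (nonlinear, unlike PCA); decoder reconstructs 32 -> 784
inputs = keras.Input(shape=(784,))
encoded = layers.Dense(32, activation="relu")(inputs)
decoded = layers.Dense(784, activation="sigmoid")(encoded)
autoencoder = keras.Model(inputs, decoded)

autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

# The input is also the target: the network learns to reconstruct x from x,
# and the reconstruction error is what gets backpropagated
autoencoder.fit(x_train, x_train,
                epochs=10, batch_size=256,
                validation_data=(x_test, x_test))

reconstructions = autoencoder.predict(x_test[:10])
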
