Commit a2bcd1d

readme for token classification
1 parent 5749330 commit a2bcd1d

2 files changed: +55 -3 lines changed

README.md

Lines changed: 55 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -19,7 +19,7 @@
1919
</h2>
2020

2121
<p align="center">
22-
Simple machine learning with
22+
Generative AI with
2323
<a href="https://www.postgresql.org/" target="_blank">PostgreSQL</a>
2424
</p>
2525

@@ -408,9 +408,61 @@ SELECT pgml.transform(
408408
]
409409
```
410410
## Token Classification
411-
## Table Question Answering
412-
## Question Answering
411+
Token classification is a task in natural language understanding, where labels are assigned to certain tokens in a text. Some popular subtasks of token classification include Named Entity Recognition (NER) and Part-of-Speech (PoS) tagging. NER models can be trained to identify specific entities in a text, such as individuals, places, and dates. PoS tagging, on the other hand, is used to identify the different parts of speech in a text, such as nouns, verbs, and punctuation marks.
412+
413+
![token classification](pgml-docs/docs/images/token-classification.png)
413414

415+
### Named Entity Recognition
416+
Named Entity Recognition (NER) is a task that involves identifying named entities in a text. These entities can include the names of people, locations, or organizations. Each token is labeled with the class of the named entity it belongs to, and tokens that are not part of any entity get the class "O" (for "outside"). In this task, the input is text, and the output is the annotated text with named entities.
417+
418+
```sql
419+
SELECT pgml.transform(
420+
inputs => ARRAY[
421+
'I am Omar and I live in New York City.'
422+
],
423+
task => 'token-classification'
424+
) as ner;
425+
```
426+
*Result*
427+
```sql
428+
ner
429+
------------------------------------------------------
430+
[[
431+
{"end": 9, "word": "Omar", "index": 3, "score": 0.997110, "start": 5, "entity": "I-PER"},
432+
{"end": 27, "word": "New", "index": 8, "score": 0.999372, "start": 24, "entity": "I-LOC"},
433+
{"end": 32, "word": "York", "index": 9, "score": 0.999355, "start": 28, "entity": "I-LOC"},
434+
{"end": 37, "word": "City", "index": 10, "score": 0.999431, "start": 33, "entity": "I-LOC"}
435+
]]
436+
```
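
The result above has one entry per token, so "New York City" comes back as three separate `I-LOC` tokens. A minimal Python sketch of how a client could merge adjacent tokens into whole entities is shown here. It assumes the JSON has the shape shown in the result; `group_entities` is a hypothetical helper, not part of `pgml`.

```python
import json

# Input text and result copied from the NER example above.
text = "I am Omar and I live in New York City."
result = json.loads("""
[[
  {"end": 9, "word": "Omar", "index": 3, "score": 0.997110, "start": 5, "entity": "I-PER"},
  {"end": 27, "word": "New", "index": 8, "score": 0.999372, "start": 24, "entity": "I-LOC"},
  {"end": 32, "word": "York", "index": 9, "score": 0.999355, "start": 28, "entity": "I-LOC"},
  {"end": 37, "word": "City", "index": 10, "score": 0.999431, "start": 33, "entity": "I-LOC"}
]]
""")


def group_entities(text, tokens):
    """Merge adjacent tokens of the same entity type into one span (hypothetical helper)."""
    spans = []
    for tok in tokens:
        # Split the tag into its prefix and type: "I-LOC" -> ("I", "-", "LOC").
        prefix, _, label = tok["entity"].partition("-")
        prev = spans[-1] if spans else None
        continues = (
            prev is not None
            and prefix != "B"  # a "B-" tag always opens a new entity
            and prev["label"] == label
            and tok["index"] == prev["last_index"] + 1
        )
        if continues:
            prev["end"] = tok["end"]
            prev["last_index"] = tok["index"]
        else:
            spans.append({
                "label": label,
                "start": tok["start"],
                "end": tok["end"],
                "last_index": tok["index"],
            })
    # Slice the original text so spacing and casing come from the input itself.
    return [
        {"label": s["label"], "text": text[s["start"]:s["end"]],
         "start": s["start"], "end": s["end"]}
        for s in spans
    ]


# The outer list holds one entry per input text; this example has a single input.
entities = group_entities(text, result[0])
for entity in entities:
    print(entity)
```

Slicing the input with `start` and `end` avoids having to reassemble words from token strings, which matters when a model splits a word into several sub-word tokens.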
437+
438+
### Part-of-Speech (PoS) Tagging
439+
PoS tagging is a task that involves identifying the parts of speech, such as nouns, pronouns, adjectives, or verbs, in a given text. In this task, the model labels each word with a specific part of speech.
440+
441+
Look for models with `pos` in their name on the :hugs: Hugging Face model hub to use a PoS tagging model.
442+
```sql
443+
SELECT pgml.transform(
444+
inputs => ARRAY[
445+
'I live in Amsterdam.'
446+
],
447+
task => '{"task": "token-classification",
448+
"model": "vblagoje/bert-english-uncased-finetuned-pos"
449+
}'::JSONB
450+
) as pos;
451+
```
452+
*Result*
453+
```sql
454+
pos
455+
------------------------------------------------------
456+
[[
457+
{"end": 1, "word": "i", "index": 1, "score": 0.999, "start": 0, "entity": "PRON"},
458+
{"end": 6, "word": "live", "index": 2, "score": 0.998, "start": 2, "entity": "VERB"},
459+
{"end": 9, "word": "in", "index": 3, "score": 0.999, "start": 7, "entity": "ADP"},
460+
{"end": 19, "word": "amsterdam", "index": 4, "score": 0.998, "start": 10, "entity": "PROPN"},
461+
{"end": 20, "word": ".", "index": 5, "score": 0.999, "start": 19, "entity": "PUNCT"}
462+
]]
463+
```
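
The tagged tokens above can be used directly on the client side. The following Python sketch pairs each word with its tag and keeps only content words. It assumes the JSON has the shape shown in the result; the tags look like Universal Dependencies part-of-speech tags, and the `CONTENT_TAGS` set is an illustrative choice, not something defined by `pgml` or the model.

```python
import json

# Result copied from the PoS example above.
result = json.loads("""
[[
  {"end": 1, "word": "i", "index": 1, "score": 0.999, "start": 0, "entity": "PRON"},
  {"end": 6, "word": "live", "index": 2, "score": 0.998, "start": 2, "entity": "VERB"},
  {"end": 9, "word": "in", "index": 3, "score": 0.999, "start": 7, "entity": "ADP"},
  {"end": 19, "word": "amsterdam", "index": 4, "score": 0.998, "start": 10, "entity": "PROPN"},
  {"end": 20, "word": ".", "index": 5, "score": 0.999, "start": 19, "entity": "PUNCT"}
]]
""")

# Illustrative set of tags treated as content words (an assumption for this sketch).
CONTENT_TAGS = {"NOUN", "PROPN", "VERB", "ADJ", "ADV"}

# The outer list holds one entry per input text; this example has a single input.
tokens = result[0]

# Pair each word with its predicted tag, in sentence order.
tagged = [(tok["word"], tok["entity"]) for tok in tokens]

# Keep only the words whose tag is in the content-word set.
content_words = [word for word, tag in tagged if tag in CONTENT_TAGS]

print(tagged)
print(content_words)
```

The words are lowercased because the model in the example is an uncased one; the `start` and `end` offsets can be used to recover the original casing from the input text.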
464+
## Question Answering
465+
## Table Question Answering
414466
## Translation
415467
## Summarization
416468
## Conversational
