http://github.com/postgresml/postgresml/pull/1570.diff

diff --git a/pgml-cms/docs/.gitbook/assets/Chatbots_Flow-Diagram.svg b/pgml-cms/docs/.gitbook/assets/Chatbots_Flow-Diagram.svg
new file mode 100644
index 000000000..382cab6e3
--- /dev/null
+++ b/pgml-cms/docs/.gitbook/assets/Chatbots_Flow-Diagram.svg
@@ -0,0 +1,281 @@
+
diff --git a/pgml-cms/docs/.gitbook/assets/Chatbots_King-Diagram.svg b/pgml-cms/docs/.gitbook/assets/Chatbots_King-Diagram.svg
new file mode 100644
index 000000000..8f9d7f7fd
--- /dev/null
+++ b/pgml-cms/docs/.gitbook/assets/Chatbots_King-Diagram.svg
@@ -0,0 +1,78 @@
+
diff --git a/pgml-cms/docs/.gitbook/assets/Chatbots_Limitations-Diagram.svg b/pgml-cms/docs/.gitbook/assets/Chatbots_Limitations-Diagram.svg
new file mode 100644
index 000000000..c96b30ec4
--- /dev/null
+++ b/pgml-cms/docs/.gitbook/assets/Chatbots_Limitations-Diagram.svg
@@ -0,0 +1,275 @@
+
diff --git a/pgml-cms/docs/.gitbook/assets/Chatbots_Tokens-Diagram.svg b/pgml-cms/docs/.gitbook/assets/Chatbots_Tokens-Diagram.svg
new file mode 100644
index 000000000..0b7c0915a
--- /dev/null
+++ b/pgml-cms/docs/.gitbook/assets/Chatbots_Tokens-Diagram.svg
@@ -0,0 +1,238 @@
+
diff --git a/pgml-cms/docs/.gitbook/assets/chatbot_flow.png b/pgml-cms/docs/.gitbook/assets/chatbot_flow.png
deleted file mode 100644
index f9107d99f..000000000
Binary files a/pgml-cms/docs/.gitbook/assets/chatbot_flow.png and /dev/null differ
diff --git a/pgml-cms/docs/.gitbook/assets/embedding_king.png b/pgml-cms/docs/.gitbook/assets/embedding_king.png
deleted file mode 100644
index 03deebbe8..000000000
Binary files a/pgml-cms/docs/.gitbook/assets/embedding_king.png and /dev/null differ
diff --git a/pgml-cms/docs/.gitbook/assets/embeddings_tokens.png b/pgml-cms/docs/.gitbook/assets/embeddings_tokens.png
deleted file mode 100644
index 6f7a13221..000000000
Binary files a/pgml-cms/docs/.gitbook/assets/embeddings_tokens.png and /dev/null differ
diff --git a/pgml-cms/docs/guides/chatbots/README.md b/pgml-cms/docs/guides/chatbots/README.md
index cd65d9125..9237f5c38 100644
--- a/pgml-cms/docs/guides/chatbots/README.md
+++ b/pgml-cms/docs/guides/chatbots/README.md
@@ -30,7 +30,7 @@ Here is an example flowing from: text -> tokens -> LLM -> probability distribution -> predicted token -> text
-

The flow of inputs through an LLM. In this case the inputs are "What is Baldur's Gate 3?" and the output token "14" maps to the word "I"
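The flow in this caption can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the vocabulary, the token ids other than 14, and the `toy_llm` scores are all made up, and real tokenizers split text into subword units rather than whole words.

```python
import math

# Hypothetical toy vocabulary; only the id 14 -> "I" comes from the caption above.
VOCAB = {"What": 3, "is": 7, "Baldur's": 9, "Gate": 11, "3?": 12, "I": 14}
ID_TO_WORD = {token_id: word for word, token_id in VOCAB.items()}


def tokenize(text):
    """Map each whitespace-separated word to a token id (heavily simplified)."""
    return [VOCAB[word] for word in text.split()]


def softmax(scores):
    """Turn raw scores into a probability distribution over token ids."""
    peak = max(scores.values())
    exps = {tid: math.exp(score - peak) for tid, score in scores.items()}
    total = sum(exps.values())
    return {tid: value / total for tid, value in exps.items()}


def toy_llm(token_ids):
    """Stand-in for a real model: returns hard-coded scores for every token id."""
    # A real LLM would compute these scores from token_ids.
    return {tid: (5.0 if tid == 14 else 0.0) for tid in ID_TO_WORD}


tokens = tokenize("What is Baldur's Gate 3?")             # text -> tokens
probabilities = softmax(toy_llm(tokens))                  # tokens -> LLM -> distribution
predicted_token = max(probabilities, key=probabilities.get)  # distribution -> token
predicted_text = ID_TO_WORD[predicted_token]              # token -> text
print(tokens, predicted_token, predicted_text)
```

Picking the highest-probability token (greedy decoding) is one choice among several; production systems often sample from the distribution instead.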

{% hint style="info" %} We have simplified the tokenization process. Words do not always map directly to tokens. For instance, the word "Baldur's" may actually map to multiple tokens. For more information on tokenization, check out [HuggingFace's summary](https://huggingface.co/docs/transformers/tokenizer\_summary).
@@ -108,11 +108,11 @@ What does an `embedding` look like? `Embeddings` are just vectors (for our use c
embedding_1 = embed("King") # embed returns something like [0.11, -0.32, 0.46, ...] ``` -

The flow of word -> token -> embedding

`Embeddings` aren't limited to words; we have models that can embed entire sentences.
-

The flow of sentence -> tokens -> embedding

Why do we care about `embeddings`? `Embeddings` have a very interesting property: words and sentences with close [semantic similarity](https://en.wikipedia.org/wiki/Semantic\_similarity) sit closer to one another in vector space than words and sentences without it.
@@ -157,7 +157,7 @@ print(context)
There is a lot going on here, so let's check out this diagram and step through it.
-

The flow of taking a document, splitting it into chunks, embedding those chunks, and then retrieving a chunk based on a user's query

Step 1: We take the document and split it into chunks. Chunks are typically a paragraph or two in size. There are many ways to split documents into chunks; for more information, check out [this guide](https://www.pinecone.io/learn/chunking-strategies/).
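As a rough sketch of Step 1 and the retrieval flow in the diagram caption, the snippet below splits a document on blank lines, embeds each chunk, and returns the chunk closest to the query. Every name here (`split_into_chunks`, `toy_embed`, `retrieve`, `max_chars`) is hypothetical, and `toy_embed` is a bag-of-words counter standing in for a real embedding model, so it captures word overlap rather than true semantic similarity.

```python
import math
import re
from collections import Counter


def split_into_chunks(document, max_chars=500):
    """Group blank-line-separated paragraphs into chunks of roughly max_chars."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]
    chunks, current = [], ""
    for paragraph in paragraphs:
        if current and len(current) + len(paragraph) + 2 > max_chars:
            chunks.append(current)
            current = paragraph
        else:
            current = f"{current}\n\n{paragraph}" if current else paragraph
    if current:
        chunks.append(current)
    return chunks


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: word-count vector."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks):
    """Return the chunk whose embedding is closest to the query embedding."""
    query_embedding = toy_embed(query)
    return max(chunks, key=lambda c: cosine_similarity(query_embedding, toy_embed(c)))


document = (
    "Baldur's Gate 3 is a role-playing video game.\n\n"
    "PostgreSQL is an open source relational database.\n\n"
    "Embeddings are vectors that represent text."
)
chunks = split_into_chunks(document, max_chars=60)
context = retrieve("What is Baldur's Gate 3?", chunks)
print(context)
```

Splitting on paragraph boundaries keeps each chunk readable on its own; the small `max_chars` value is chosen only so that this short sample document yields one chunk per paragraph.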
