
LangChain From 0 To 1 Public 1 PpuSgEN

Uploaded by

raghunadha
Copyright © All Rights Reserved
LangChain From 0 To 1

Unveiling the Power of LLM Programming


GitHub

https://github.com/Stell0/fosdem2024

- Presentation
- Code
- Useful links
Our Journey

1. Introduction to LangChain
2. Document loaders
3. Text Splitters
4. Embeddings
5. Vectorstores
6. Retrievers
7. Prompts and Templates
8. Large Language Models
9. Chains
10. RAG - Retrieval Augmented Generation
11. Demo

Retrieval Augmented Generation (RAG) 🔥🔥🔥

Augment LLM knowledge using additional data

● Combines retrieval + generation
● Data not in the training dataset
○ Private data
○ Data after the training cutoff date, even real-time data
● Improves accuracy and relevancy
● Supports evidence-based responses that can reference their sources

Example of RAG use case: QA over unstructured data

Figure: data + Question ➞ LLM ➞ Answer

Example of RAG use case: QA over unstructured data

Figure: YouTube video ➞ transcript ➞ chunks ➞ embedding vectors (e.g. [0.2, 0.3, 2.1, 0.2, …]) ➞ Prompt Template (Instructions + {Context} + {Question}) ➞ LLM ➞ Answer

LangChain

- Python (also JS/TS) framework
- Building blocks
- Swappable components
- Examples
- From PoC to Production
- Fast pace of improvement

LangChain
Preparing and storing data

Figure: sources (HTML, PDF, JSON, TXT, or a YouTube video transcript) ➞ Document loader ➞ Text Splitter ➞ Embedding Function (vectors such as [0.2, 0.3, 2.1, 0.2, …]) ➞ Vectorstore

Document Loaders

Figure: many source formats (HTML, PDF, JSON, TXT, …) loaded into Documents

Arxiv, CSV, Discord, Email, EPub, EverNote, Facebook Chat, Figma, Git, GitHub, HTML, JSON, Markdown, Mastodon, MediaWiki Dump, Microsoft Word, MongoDB, Open Document Format (ODT), Pandas DataFrame, PubMed, ReadTheDocs Documentation, Reddit, RSS Feeds, Slack, Snowflake, Telegram, URL, WhatsApp Chat, Wikipedia, X, XML, YouTube audio, YouTube transcripts
Document Loaders

Loading a YouTube video transcript

- YoutubeLoader from LangChain Community
- Loaders return a list of Documents

Document class

- page_content: the document text
- metadata: a dictionary, e.g. { "source": "https://…" }
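
To make the shape of a loader's output concrete, here is a minimal stand-in for the Document class, assuming only the two fields listed above. In a real project you would import `Document` from `langchain_core.documents` instead of defining it. The `load_transcript` helper and the example URL are hypothetical: the helper takes text that was already fetched, whereas `YoutubeLoader` fetches the transcript itself.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Minimal stand-in for LangChain's Document: text plus a metadata dict."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def load_transcript(url: str, transcript: str) -> list[Document]:
    """Hypothetical loader: wraps already-fetched text in a list of Documents.

    A real loader downloads the text itself; here the caller passes it in
    so the sketch runs offline.
    """
    return [Document(page_content=transcript, metadata={"source": url})]


docs = load_transcript("https://example.com/video", "Hello and welcome to FOSDEM.")
print(docs[0].page_content)  # Hello and welcome to FOSDEM.
print(docs[0].metadata)      # {'source': 'https://example.com/video'}
```

Returning a list, even for a single transcript, keeps every loader interchangeable: later steps always iterate over Documents.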


Text Splitters

Break text into smaller chunks

Figure: a long text split into several smaller chunks

https://chunkviz.up.railway.app

Text Splitters: 5 levels of text splitting

1. Characters / Tokens
2. Recursive Character
3. Document structure
4. Semantic Chunker
5. Agent-like Splitting
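
Level 1 is the easiest to picture: cut the text every `chunk_size` characters and let consecutive chunks overlap by `chunk_overlap` characters, so text cut at a boundary keeps some context on both sides. A minimal sketch in plain Python (the function name is ours, and it counts characters, not tokens):

```python
def split_by_characters(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Fixed-size character chunks; each chunk repeats the last
    chunk_overlap characters of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


print(split_by_characters("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

The weakness is visible in the output: boundaries ignore words and sentences, which is what the higher levels try to fix.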
Text Splitters

RecursiveCharacterTextSplitter
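
RecursiveCharacterTextSplitter tries a list of separators in order (by default paragraphs, then lines, then words, then single characters) and only falls back to a finer separator when a piece is still longer than `chunk_size`. The sketch below reproduces that idea in plain Python. It is a simplified illustration, not LangChain's implementation: it ignores `chunk_overlap` and uses a hard cut when no separator is left.

```python
def recursive_split(text: str, chunk_size: int,
                    separators: tuple[str, ...] = ("\n\n", "\n", " ")) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Tries the coarsest separator first and recurses with finer ones only
    for pieces that are still too long. Simplified: no chunk overlap.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # Nothing left to split on: hard-cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces into one chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too big: split it with the next separator.
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


print(recursive_split("aaa bbb ccc\n\nddd eee", chunk_size=8))
# ['aaa bbb', 'ccc', 'ddd eee']
```

Because paragraph breaks are tried first, the second paragraph stays whole, and only the over-long first paragraph is broken down to the word level.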
Embeddings

- Numerical representation of text
- Vectors in a high-dimensional space
- Each dimension reflects some aspect of the text
- Similarity = proximity in the embedding space

Figure: each text chunk maps to a vector, e.g. [0.2, 0.3, 2.1, 0.2, …], [1.2, 4.7, 0.1, 0.1, …], [0.9, 1.2, 2.1, 1.1, …], [0.4, 0.4, 1.5, 0.6, …]
Embeddings

- Complexity is hidden
- We rely on an external provider
- note: data is sent to the external provider
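
"Similarity = proximity" is commonly measured with cosine similarity: the dot product of two vectors divided by the product of their lengths, which gives 1.0 for vectors pointing the same way and 0.0 for orthogonal ones. A minimal sketch; the three-dimensional vectors are made-up toy values, whereas real embeddings from a provider have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction
    return dot / (norm_a * norm_b)


# Hypothetical hand-picked 3-dimensional "embeddings", not real model output.
cat = [0.9, 0.1, 0.0]
kitten = [0.85, 0.15, 0.05]
car = [0.1, 0.2, 0.9]

print(round(cosine_similarity(cat, kitten), 3))  # 0.996: close in meaning
print(round(cosine_similarity(cat, car), 3))     # 0.131: far apart
```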
Vectorstore

Storing embeddings

- Stores
- Search
- Retrieve

Figure: stored vectors, e.g. [0.2, 0.3, 2.1, 0.2, …], [1.2, 4.7, 0.1, 0.1, …], [0.9, 1.2, 2.1, 1.1, …], [0.4, 0.4, 1.5, 0.6, …]
Vectorstore

- ChromaDB, initialized from our documents
- OpenAI embedding function
- Optional: persist directory
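
ChromaDB does the storing and searching for us, but the mechanics fit in a few lines. The toy store below keeps (text, vector) pairs in a list and answers `similarity_search` by ranking every stored vector by cosine similarity to the query vector. Assumptions: the bag-of-words `embed` function is a stand-in we made up so the sketch runs offline (a real setup calls an embedding provider such as OpenAI), and a real vectorstore uses an index rather than a linear scan. The method name echoes LangChain's `similarity_search`, but `ToyVectorStore` is not LangChain code.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a sparse bag-of-words vector (word -> count).

    Stand-in for a real embedding model; it only captures shared words.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Stores (text, vector) pairs and searches them by cosine similarity."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.items: list[tuple[str, Counter]] = []

    def add_texts(self, texts: list[str]) -> None:
        for text in texts:
            self.items.append((text, self.embed_fn(text)))

    def similarity_search(self, query: str, k: int = 4) -> list[str]:
        query_vec = self.embed_fn(query)
        ranked = sorted(self.items, key=lambda item: cosine(query_vec, item[1]),
                        reverse=True)
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore(embed)
store.add_texts([
    "LangChain is a framework for LLM apps",
    "Chroma is a vector database",
    "Paris is the capital of France",
])
print(store.similarity_search("which vector database?", k=1))
# ['Chroma is a vector database']
```

Passing the embedding function into the store mirrors the slide's design: the store and the embedding provider are separate, swappable components.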
Most Used Vectorstores

https://blog.langchain.dev/langchain-state-of-ai-2023/
Using data

Figure: Question ➞ Retriever ➞ Prompt Template (Instructions + {Context} + {Question}) ➞ LLM ➞ Answer; Retriever, Prompt/Template and LLM are combined into a Chain
Retriever

Question ➞ Embedding ➞ distance ➞ Relevant Documents
Another Retriever

Multi Query Retriever

- Uses an LLM to generate multiple variations of our question
- Increases the chances of finding Documents close to the question
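
The idea behind the Multi Query Retriever can be sketched without an LLM: run the base retriever once per question variant and merge the results, dropping duplicates. Assumptions: `generate_variations` is a hypothetical hard-coded stand-in for the LLM call, and `keyword_retrieve` is a deliberately crude word-overlap retriever. LangChain's `MultiQueryRetriever` wraps a real retriever and a real LLM instead.

```python
def generate_variations(question: str) -> list[str]:
    """Hypothetical stand-in for the LLM step.

    A real Multi Query Retriever asks an LLM to reword the question; here the
    rewordings are hard-coded for this one example question.
    """
    return [
        question,
        "How do I break a document into chunks?",
        "What is a text splitter?",
    ]


def keyword_retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Crude base retriever: top-k docs by number of shared words (score > 0)."""
    words = set(query.lower().rstrip("?").split())
    scored = [(len(words & set(doc.lower().split())), doc) for doc in docs]
    matches = [pair for pair in scored if pair[0] > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in matches[:k]]


def multi_query_retrieve(question: str, docs: list[str]) -> list[str]:
    """Run the base retriever for every variation; return the unique union."""
    results: list[str] = []
    for variation in generate_variations(question):
        for doc in keyword_retrieve(variation, docs):
            if doc not in results:
                results.append(doc)
    return results


docs = [
    "Text splitters break a document into chunks",
    "A splitter is configured with chunk_size and chunk_overlap",
    "Embeddings map text to vectors",
]
question = "How does chunking work?"
print(keyword_retrieve(question, docs))      # []: no shared words
print(multi_query_retrieve(question, docs))  # the two splitter documents
```

The original wording finds nothing, while the rewordings each land near a different relevant Document, which is exactly the benefit listed on the slide.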
Prompt/Template

- Guide LLM output

Figure: {Question} + retrieved Documents (as {context}) ➞ Prompt

Prompt from Hub
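
A prompt template is text with placeholders filled at run time: the retrieved Documents are joined into `{context}` and the user's question goes into `{question}`. The sketch uses Python's built-in `str.format`; LangChain's `PromptTemplate` and `ChatPromptTemplate` use the same `{placeholder}` syntax. The instruction wording is our own example, not the Hub prompt from the slide.

```python
RAG_TEMPLATE = """Answer the question using only the context below.
If the answer is not in the context, say that you don't know.

Context:
{context}

Question: {question}
Answer:"""


def build_prompt(question: str, documents: list[str]) -> str:
    """Fill the template: retrieved documents become {context}."""
    context = "\n\n".join(documents)
    return RAG_TEMPLATE.format(context=context, question=question)


prompt = build_prompt(
    "Which languages does LangChain support?",
    ["LangChain is a Python framework.", "It also has a JS/TS version."],
)
print(prompt)
```

Keeping the instructions fixed and injecting only context and question is what "guide LLM output" means in practice: the model is steered toward answering from the retrieved text.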
LLM

https://python.langchain.com/docs/integrations/llms/
LLM
“Nobody Gets Fired For Buying IBM OpenAI”
Most Used LLM Providers

https://blog.langchain.dev/langchain-state-of-ai-2023/
Most Used OSS Model Providers

https://blog.langchain.dev/langchain-state-of-ai-2023/
Put everything together
Chains

Sequence of calls

- Advantages:
  - Simple
  - Modular
  - Efficient
- Compose your own
- Off-the-shelf
- Legacy Class
- LCEL
  - Streaming
  - Async (and sync) support
  - Optimized parallel execution
  - Integrated with LangSmith and LangServe
  - …
Put everything together using LCEL
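
LCEL chains components with the `|` operator, as in `prompt | llm | parser`, where each step's output becomes the next step's input. The toy `Step` class below imitates that composition in plain Python so the data flow of the RAG pipeline is visible. Assumptions: `Step` is hypothetical (in LangChain, runnables such as `RunnableLambda` play this role), `retrieve` is a crude word-overlap retriever, and `fake_llm` does not call any model.

```python
from typing import Any, Callable


class Step:
    """Toy imitation of an LCEL runnable: wraps a function and composes with |."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # (a | b).invoke(x) runs a first, then feeds its output to b.
        return Step(lambda x: other.invoke(self.invoke(x)))

    def invoke(self, x: Any) -> Any:
        return self.fn(x)


DOCS = [
    "LangChain is a Python framework.",
    "LangChain also has a JS/TS version.",
    "Chroma is a vector database.",
]


def retrieve(question: str, k: int = 2) -> dict:
    """Crude retriever: top-k docs by number of words shared with the question."""
    words = set(question.lower().rstrip("?").split())

    def overlap(doc: str) -> int:
        return len(words & set(doc.lower().rstrip(".").split()))

    top = sorted(DOCS, key=overlap, reverse=True)[:k]
    return {"context": "\n".join(top), "question": question}


def to_prompt(inputs: dict) -> str:
    """Prompt template step: fills context and question into the text."""
    return f"Context:\n{inputs['context']}\n\nQuestion: {inputs['question']}"


def fake_llm(prompt: str) -> dict:
    """Hypothetical LLM stand-in: 'answers' with the first context line."""
    return {"content": prompt.split("\n")[1]}


def parse_output(message: dict) -> str:
    """Output-parser step: extracts the plain string from the message."""
    return message["content"]


rag_chain = Step(retrieve) | Step(to_prompt) | Step(fake_llm) | Step(parse_output)
print(rag_chain.invoke("Is LangChain a Python framework?"))
# LangChain is a Python framework.
```

Swapping `fake_llm` for a real model, or `retrieve` for a vectorstore retriever, leaves the rest of the chain untouched; that modularity is the point of composing with `|`.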
Other use cases

- QA over structured data
  - Question ➞ SQL Query ➞ Query Results ➞ Additional Context ➞ Answer
- Extraction
  - Unstructured Text + JSON Schema ➞ Compiled JSON
- Summarization
  - MOAR text ➞ LESS text
- Synthetic data generation
  - JSON Schema ➞ [Unstructured Text, Unstructured Text, Unstructured Text, Unstructured Text …]
- Agents
  - Let the LLM take actions
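
The structured-data flow (Question ➞ SQL Query ➞ Query Results ➞ Answer) can be sketched with Python's built-in `sqlite3`. Assumptions: `question_to_sql` is a hypothetical hard-coded stand-in for the step where an LLM writes the query from the question and the table schema, and the `talks` table holds invented sample data.

```python
import sqlite3


def question_to_sql(question: str) -> str:
    """Hypothetical stand-in for the LLM: maps one known question to SQL."""
    if question == "How many talks are about LangChain?":
        return "SELECT COUNT(*) FROM talks WHERE topic = 'LangChain'"
    raise ValueError(f"no SQL for question: {question!r}")


def answer(question: str, conn: sqlite3.Connection) -> str:
    sql = question_to_sql(question)      # Question -> SQL query
    rows = conn.execute(sql).fetchall()  # SQL query -> query results
    # In a real chain, question + SQL + rows go back to the LLM as
    # additional context; here the answer is formatted directly.
    return f"{question} Answer: {rows[0][0]}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE talks (title TEXT, topic TEXT)")
conn.executemany(
    "INSERT INTO talks VALUES (?, ?)",
    [("LangChain From 0 To 1", "LangChain"),
     ("Intro to RAG", "LangChain"),
     ("Rust in the kernel", "Rust")],
)
print(answer("How many talks are about LangChain?", conn))
# How many talks are about LangChain? Answer: 2
```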
The End
https://github.com/Stell0
https://x.com/Stll00
https://t.me/Stll0
https://www.linkedin.com/in/stefano-fancello
