Skip to content

pgml chat opensourceai #1238

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Dec 12, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion pgml-apps/pgml-chat/.env.template
Original file line number Diff line number Diff line change
Expand Up @@ -3,4 +3,5 @@ DATABASE_URL=<POSTGRES_DATABASE_URL starts with postgres://>

SLACK_BOT_TOKEN=<SLACK_BOT_TOKEN>
SLACK_APP_TOKEN=<SLACK_APP_TOKEN>
DISCORD_BOT_TOKEN=<DISCORD_BOT_TOKEN>
DISCORD_BOT_TOKEN=<DISCORD_BOT_TOKEN>
SYSTEM_PROMPT_TEMPLATE=<SYSTEM PROMPT FOR CHAT COMPLETION MODEL. Check prompts.md file for examples>
51 changes: 41 additions & 10 deletions pgml-apps/pgml-chat/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ A command line tool to build and deploy a **_knowledge based_** chatbot using Po

There are two stages in building a knowledge based chatbot:
- Build a knowledge base by ingesting documents, chunking documents, generating embeddings and indexing these embeddings for fast query
- Generate responses to user queries by retrieving relevant documents and generating responses using OpenAI API
- Generate responses to user queries by retrieving relevant documents and generating responses using OpenAI and [OpenSourceAI API](https://postgresml.org/docs/introduction/apis/client-sdks/opensourceai)

This tool automates the above two stages and provides a command line interface to build and deploy a knowledge based chatbot.

Expand All @@ -12,7 +12,7 @@ Before you begin, make sure you have the following:

- PostgresML Database: Sign up for a free [GPU-powered database](https://postgresml.org/signup)
- Python version >=3.8
- OpenAI API key
- (Optional) OpenAI API key


# Getting started
Expand All @@ -30,24 +30,24 @@ wget https://raw.githubusercontent.com/postgresml/postgresml/master/pgml-apps/pg
```
3. Copy the template file to `.env`

4. Update environment variables with your OpenAI API key and PostgresML database credentials.
4. Update environment variables with your PostgresML database credentials and OpenAI API key (optional).
```bash
OPENAI_API_KEY=<OPENAI_API_KEY>
DATABASE_URL=<POSTGRES_DATABASE_URL starts with postgres://>
OPENAI_API_KEY=<OPENAI_API_KEY> # Optional
```

# Usage
You can get help on the command line interface by running:

```bash
(pgml-bot-builder-py3.9) pgml-chat % pgml-chat % pgml-chat --help
usage: pgml-chat [-h] --collection_name COLLECTION_NAME [--root_dir ROOT_DIR] [--stage {ingest,chat}] [--chat_interface {cli,slack,discord}]
[--chat_history CHAT_HISTORY] [--bot_name BOT_NAME] [--bot_language BOT_LANGUAGE] [--bot_topic BOT_TOPIC]
[--bot_topic_primary_language BOT_TOPIC_PRIMARY_LANGUAGE] [--bot_persona BOT_PERSONA]
usage: pgml-chat [-h] --collection_name COLLECTION_NAME [--root_dir ROOT_DIR] [--stage {ingest,chat}] [--chat_interface {cli,slack,discord}] [--chat_history CHAT_HISTORY] [--bot_name BOT_NAME]
[--bot_language BOT_LANGUAGE] [--bot_topic BOT_TOPIC] [--bot_topic_primary_language BOT_TOPIC_PRIMARY_LANGUAGE] [--bot_persona BOT_PERSONA]
[--chat_completion_model CHAT_COMPLETION_MODEL] [--max_tokens MAX_TOKENS] [--vector_recall_limit VECTOR_RECALL_LIMIT]

PostgresML Chatbot Builder

optional arguments:
options:
-h, --help show this help message and exit
--collection_name COLLECTION_NAME
Name of the collection (schema) to store the data in PostgresML database (default: None)
Expand All @@ -57,16 +57,21 @@ optional arguments:
--chat_interface {cli,slack,discord}
Chat interface to use (default: cli)
--chat_history CHAT_HISTORY
Number of messages from history used for generating response (default: 1)
Number of messages from history used for generating response (default: 0)
--bot_name BOT_NAME Name of the bot (default: PgBot)
--bot_language BOT_LANGUAGE
Language of the bot (default: English)
--bot_topic BOT_TOPIC
Topic of the bot (default: PostgresML)
--bot_topic_primary_language BOT_TOPIC_PRIMARY_LANGUAGE
Primary programming language of the topic (default: )
Primary programming language of the topic (default: SQL)
--bot_persona BOT_PERSONA
Persona of the bot (default: Engineer)
--chat_completion_model CHAT_COMPLETION_MODEL
--max_tokens MAX_TOKENS
Maximum number of tokens to generate (default: 256)
--vector_recall_limit VECTOR_RECALL_LIMIT
Maximum number of documents to retrieve from vector recall (default: 1)
```
## Ingest
In this step, we ingest documents, chunk documents, generate embeddings and index these embeddings for fast query.
Expand Down Expand Up @@ -146,6 +151,32 @@ Once the discord app is running, you can interact with the chatbot on Discord as

![Discord Chatbot](./images/discord_screenshot.png)

# Prompt Engineering
In addition to relevant context retrieved from vector search, system prompt to generate accurate responses with minimum hallucinations requires prompt engineering.
Different chat completion models require different system prompts. Since the prompts including the context are long, they suffer from **lost in the middle** problem described in [this paper](https://arxiv.org/pdf/2307.03172.pdf). Below are some of the prompts that we have used for different chat completion models.

## Default prompt (GPT-3.5 and open source models)
```text
Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
Always say "thanks for asking!" at the end of the answer.
```

## GPT-4 System prompt
```text
You are an assistant to answer questions about {topic}.\
Your name is {name}. You speak like {persona} in {language}. Use the given list of documents to answer user's question.\
Use the conversation history if it is applicable to answer the question. \n Use the following steps:\n \
1. Identify if the user input is really a question. \n \
2. If the user input is not related to the {topic} then respond that it is not related to the {topic}.\n \
3. If the user input is related to the {topic} then first identify relevant documents from the list of documents. \n \
4. If the documents that you found relevant have information to completely and accurately answers the question then respond with the answer.\n \
5. If the documents that you found relevant have code snippets then respond with the code snippets. \n \
6. Most importantly, don't make up code snippets that are not present in the documents.\n \
7. If the user input is generic like Cool, Thanks, Hello, etc. then respond with a generic answer. \n"
```

# Developer Guide

1. Clone this repository, start a poetry shell and install dependencies
Expand Down
Loading
pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy