Ways To Use LLM in Finance Organisation
Ways To Use LLM in Finance Organisation
— or even hundreds of millions. But there are several ways to deploy customized
LLMs that are faster, easier, and, most importantly, cheaper.
According to a September survey of IT decision makers by Dell, 76% say gen AI will
have a “significant if not transformative” impact on their organizations, and most
expect to see meaningful results within the next 12 months.
A large language model (LLM) is a type of gen AI that focuses on text and code
instead of images or audio, although some have begun to integrate different
modalities. The most popular LLMs in the enterprise today are ChatGPT and other
OpenAI GPT models, Anthropic’s Claude, Meta’s Llama 2, and Falcon, an open-source
model from the Technology Innovation Institute in Abu Dhabi best known for its
support for languages other than English.
There are several ways companies deploy LLMs, like giving employees access to
public apps, using prompt engineering and APIs to embed LLMs into existing
software, using vector databases to improve accuracy and relevance, fine-tuning
existing models, or building their own.
--------------------------------------------------------------------Deploying
public LLMs
Dig Security is an Israeli cloud data security company, and its engineers use
ChatGPT to write code. “Every engineer uses stuff to help them write code faster,”
says CEO Dan Benjamin. And ChatGPT is one of the first and easiest coding
assistants out there. But there’s a problem with it — you can never be sure if the
information you upload won’t be used to train the next generation of the model. Dig
Security addresses this possibility in two ways. First, the company uses a secure
gateway to check what information is being uploaded.
“Our employees know they can’t upload anything sensitive,” says Benjamin. “It’s
blocked.”
For example, someone can use a VPN or a personal computer and access the public
version of ChatGPT. That’s where another level of risk mitigation comes in.
“It’s all about employee training,” he says, “and making sure they understand what
they need to do, and they’re well trained on data security.”
Skyhigh Security in California says that close to a million end users accessed
ChatGPT through corporate infrastructures during the first half of 2023, with the
volume of users increasing by 1,500% between January and June, says Tracy Holden,
Skyhigh’s director of corporate marketing.
And in a July report from Netskope Threat Labs, source code is posted to ChatGPT
more than any other type of sensitive data at a rate of 158 incidents per 10,000
enterprise users per month.
------------------------------------------------------------------------Vector
databases and RAG
For most companies looking to customize their LLMs, retrieval augmented generation
(RAG) is the way to go. If someone is talking about embeddings or vector databases,
this is what they normally mean. The way it works is a user asks a question about,
say, a company policy or product. That question isn’t set to the LLM right away.
Instead, it’s processed first. Does the user have the right to access that
information? If the access rights are there, then all potentially relevant
information is retrieved, usually from a vector database. Then the question and the
relevant information is sent to the LLM and embedded into an optimized prompt that
might also specify the preferred format of the answer and tone of voice the LLM
should use.
“Right now, we’re converting everything to a vector database,” says Ellie Fields,
chief product and engineering officer at Salesloft, a sales engagement platform
vendor. “And yes, they’re working.”
And it’s more effective than using simple documents to provide context for LLM
queries, she says.
The company primarily uses ChromaDB, an open-source vector store, whose primary use
is for LLMs. Another vector database Salesloft uses is Pgvector, a vector
similarity search extension for the PostgreSQL database.
“But we’ve also done some research using FAISS and Pinecone,” she says. FAISS, or
Facebook AI Similarity Search, is an open-source library provided by Meta that
supports similarity searches in multimedia documents.
“We had Azure certified as a new sub-processor on our platform,” says Fields. “We
always let customers know when we have a new processor for their information.”
But Salesloft also works with Google and IBM, and is working on a gen AI
functionality that uses those platforms as well.
“We’ll definitely work with different providers and different models,” she says.
“Things are changing week by week. If you’re not looking at different models,
you’re missing the boat.” So RAG allows enterprises to separate their proprietary
data from the model itself, making it much easier to swap models in and out as
better models are released. In addition, the vector database can be updated, even
in real time, without any need to do more fine-tuning or retraining of the model.
“We’ve switched out models, from OpenAI to OpenAI on Azure,” says Fields. “And
we’ve switched among different OpenAI models. We may even support different models
for different parts of our customer base.”
Sometimes different models have different APIs, she adds. “It’s not trivial,” she
says. But switching out a model is still easier than retraining. “We haven’t yet
found a use case that’s better served by fine tuning rather than a vector
database,” Fields adds. “I believe there are use cases out there, but so far, we
haven’t found one that performs better.”
One of the first applications of LLMs that Salesloft rolled out was adding a
feature that lets customers generate a sales email to a prospect. “Customers were
taking a lot of time to write those emails,” says Fields. “It was hard to start,
and there’s a lot of writer’s block.” So now customers can specify the target
persona, their value proposition, and the call to action — and they get three
different draft emails back they can personalize. Salesloft uses OpenAI’s GPT 3.5
to write the email, says Fields.
To feed information into the LLM, Ikigai uses a vector database, also run locally.
It’s built on top of the Boundary Forest algorithm, says co-founder and co-CEO
Devavrat Shah.
“At MIT four years ago, some of my students and I experimented with a ton of vector
databases,” says Shah, who is also a professor of AI at MIT. “I knew it would be
useful, but not this useful.”
Keeping both the model and the vector database local means no data can leak out to
third parties, he says. “For clients who are okay with sending queries to others,
we use OpenAI,” says Shah. “We are LLM agnostic.”
PricewaterhouseCoopers, which built its own ChatPWC tool, is also LLM agnostic.
“ChatPWC makes our associates more capable,” says Bret Greenstein, the firm’s
partner and leader of the gen AI go-to-market strategy. For example, it includes
pre-built prompts to generate job descriptions. “It has all my formats, templates,
and terminology,” he says. “We have an HR, data and prompt experts, and we design
something that generates very good job postings. Now nobody needs to know how to do
the amazing prompting that generates job descriptions.”
The tool is built on top of Microsoft Azure, but the company also built it for
Google Cloud Platform and AWS. “We have to serve our clients, and they exist on
every cloud,” Says Greenstein. Similarly, it’s optimized to use different models on
the back end, because that’s how clients want it. “We have every model working,” he
adds. “Llama 2, Falcon — we have everything.”
“There’s a lot people can do,” he says, “like building up their data that’s
independent of models, and building up the governance.” Then, when the market
changes, and a new model comes out, the data and governance structure will still be
relevant.
-------------------------------------------------------------------------The fine
tuning
Management consulting company AArete took open source model GPT 2 and fine tuned it
on its own data. “It was lightweight,” says Priya Iragavarapu, the company’s VP of
digital technology services. “We wanted an open source one to be able to take it
and post it internally in our environment.”
If AArete used a hosted model and connected to it via API, trust issues come up.
“We’re concerned where the data from the prompting might end up,” she says. “We
don’t want to take those risks.”
When choosing an open source model, she looks at how many times it was previously
downloaded, its community support, and its hardware requirements.
“The foundational model should also have some task relevancy,” she says. “There are
some models for specific tasks. For example, I recently looked at a Hugging Face
model that parses content from PDFs into a structured format.”
Many companies in the financial world and in the health care industry are fine-
tuning LLMs based on their own additional data sets.
“The basic LLMs are trained on the whole internet,” she says. With fine tuning, a
company can create a model specifically targeted at their business use case.
A common way of doing this is by creating a list of questions and answers and fine
tuning a model on those. In fact, OpenAI began allowing fine tuning of its GPT 3.5
model in August, using a Q&A approach, and unrolled a suite of new fine tuning,
customization, and RAG options for GPT 4 at its November DevDay.
This is particularly useful for customer service and help desk applications, where
a company might already have a data bank of FAQs.
Also in the Dell survey, 21% of companies prefer to retrain existing models, using
their own data in their own environment.
“The most popular option seems to be Llama 2,” says Andy Thurai, VP and principal
analyst at Constellation Research Inc. Llama 2 comes in three different sizes, and
is free for companies with fewer than 700 million monthly users. Companies can
fine-tune it on their own data sets and have a new, custom model fairly quickly, he
says. In fact, the Hugging Face LLM leaderboard is currently dominated by different
fine-tunings and customizations of Llama 2. Before Llama 2, Falcon was the most
popular open source LLM, he adds. “It’s an arms race right now.” Fine tuning can
create a model that’s more accurate for specific business use cases, he says. “If
you’re using a generalized Llama model, the accuracy can be low.”
And there are some advantages to fine-tuning over RAG embedding. With embedding, a
company has to do a vector database search for every query. “And you’ve got the
implementation of the database,” Thurai says. “That’s not going to be easy,
either.”
There are no context window limits on fine tuning, either. With embedding, there’s
only so much information that can be added to a prompt. If a company does fine
tune, they wouldn’t do it often, just when a significantly improved version of the
base AI model is released.
Finally, if a company has a quickly-changing data set, fine tuning can be used in
combination with embedding. “You can fine tune it first, then do RAG for the
incremental updates,” he says.
Software companies building applications such as SaaS apps, might use fine tuning,
says PricewaterhouseCoopers’ Greenstein. “If you have a highly repeatable pattern,
fine tuning can drive down your costs,” he says, but for enterprise deployments,
RAG is more efficient in 90 to 95% of cases.
“We’re actually looking into fine-tuning models for specific verticals,” adds
Sebastien Paquet, VP of ML at Coveo, a Canadian enterprise search and
recommendations company. “We have some specialized verticals with specialized
vocabulary, like the medical vertical. Enterprises selling truck parts have their
own way of how the parts are named.”
For now, however, the company is using OpenAI’s GPT 3.5 and GPT 4 running on a
private Azure cloud, with the LLM API calls isolated so Coveo can switch to
different models if needed. It also uses some open source LLMs from Hugging Face
for specific use cases.
-------------------------------------------------------------------------Build an
LLM from scratch
Few companies are going to build their own LLM from scratch. After all, they are,
by definition, quite large. OpenAI’s GPT 3 has 175 billion parameters and was
trained on a data set of 45 terabytes and cost $4.6 million to train. And according
to OpenAI CEO Sam Altman, GPT 4 cost over $100 million.
That size is what gives LLMs their magic and ability to process human language,
with a certain degree of common sense, as well as the ability to follow
instructions.
“You can’t just train it on your own data,” says Carm Taglienti, distinguished
engineer at Insight. “There’s value that comes from training on tens of millions of
parameters.”
Today, nearly all LLMs come from the big hyperscalers or AI-focused startups like
OpenAI and Anthropic.
Even companies with extensive experience building their own models are staying away
from creating their own LLMs.
Salesloft, for example, has been building their own AI and machine learning models
for years, including gen AI models using earlier technologies, but is hesitant
about building a brand-new, cutting edge foundation model from scratch.
“It’s a massive computational step that, at least at this stage, I don’t see us
embarking on,” says Fields.