Ways to Use LLMs in a Finance Organisation

Several companies are deploying customized large language models (LLMs) in ways that improve security and control while reducing costs compared to building proprietary models. Some companies use public LLMs like ChatGPT through secure gateways or private instances. Others build vector databases to provide contextual information to queries before sending them to LLMs. Some companies run open source models like Falcon locally to avoid sending any data to third parties. LLM deployment options allow customization without the high costs of building new models.


Building a new large language model (LLM) from scratch can cost a company millions of dollars, or even hundreds of millions. But there are several ways to deploy customized LLMs that are faster, easier, and, most importantly, cheaper.

It’s the fastest-moving new technology in history. Generative AI is transforming the world, changing the way we create images and videos, audio, text, and code.

According to a September survey of IT decision makers by Dell, 76% say gen AI will
have a “significant if not transformative” impact on their organizations, and most
expect to see meaningful results within the next 12 months.

A large language model (LLM) is a type of gen AI that focuses on text and code
instead of images or audio, although some have begun to integrate different
modalities. The most popular LLMs in the enterprise today are ChatGPT and other
OpenAI GPT models, Anthropic’s Claude, Meta’s Llama 2, and Falcon, an open-source
model from the Technology Innovation Institute in Abu Dhabi best known for its
support for languages other than English.

There are several ways companies deploy LLMs, like giving employees access to
public apps, using prompt engineering and APIs to embed LLMs into existing
software, using vector databases to improve accuracy and relevance, fine-tuning
existing models, or building their own.

Deploying public LLMs
Dig Security is an Israeli cloud data security company, and its engineers use ChatGPT to write code. “Every engineer uses stuff to help them write code faster,” says CEO Dan Benjamin. And ChatGPT is one of the first and easiest coding assistants out there. But there’s a problem: you can never be sure the information you upload won’t be used to train the next generation of the model. Dig Security addresses this possibility in two ways. First, the company uses a secure gateway to check what information is being uploaded.

“Our employees know they can’t upload anything sensitive,” says Benjamin. “It’s
blocked.”

Second, the company funnels its engineers to a version of ChatGPT running on a private Azure cloud. This means Dig Security gets its own self-contained instance of ChatGPT. Even with this belt-and-suspenders approach to security, it’s not a perfect solution, Benjamin says. “There’s no perfect solution. Any organization that thinks there is, is fooling itself.”

For example, someone can use a VPN or a personal computer and access the public
version of ChatGPT. That’s where another level of risk mitigation comes in.

“It’s all about employee training,” he says, “and making sure they understand what
they need to do, and they’re well trained on data security.”

Dig Security isn’t alone.

Skyhigh Security in California saw close to a million end users access ChatGPT through corporate infrastructures during the first half of 2023, with the volume of users increasing by 1,500% between January and June, says Tracy Holden, the company’s director of corporate marketing.

And according to a July report from Netskope Threat Labs, source code is posted to ChatGPT more often than any other type of sensitive data, at a rate of 158 incidents per 10,000 enterprise users per month.

More recently, companies have been getting more secure, enterprise-friendly options, like Microsoft Copilot, which combines ease of use with additional controls and protections. And at OpenAI’s DevDay in early November, CEO Sam Altman said there are now 100 million active users of the company’s ChatGPT chatbot, two million developers using its API, and more than 92% of Fortune 500 companies building on top of the OpenAI platform.

Vector databases and RAG
For most companies looking to customize their LLMs, retrieval-augmented generation (RAG) is the way to go. If someone is talking about embeddings or vector databases, this is what they normally mean. The way it works is that a user asks a question about, say, a company policy or product. That question isn’t sent to the LLM right away. Instead, it’s processed first. Does the user have the right to access that information? If the access rights are there, then all potentially relevant information is retrieved, usually from a vector database. Then the question and the relevant information are embedded into an optimized prompt, which might also specify the preferred format of the answer and the tone of voice the LLM should use, and sent to the LLM.
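
In code, that flow is only a few steps. Below is a minimal Python sketch of the pattern; the check_access stub, the vector_db and llm objects, and their method names are hypothetical stand-ins for whatever access-control layer, vector store, and model API an organization actually uses.

```python
def check_access(user, question):
    # Hypothetical stub: replace with a real entitlement check.
    return True

def answer_question(user, question, vector_db, llm, top_k=5):
    # 1. Access control comes first: never retrieve documents
    #    the user isn't allowed to see.
    if not check_access(user, question):
        raise PermissionError("User may not access this information")

    # 2. Pull the most relevant chunks from the vector database.
    chunks = vector_db.search(question, top_k=top_k)
    context = "\n\n".join(chunk.text for chunk in chunks)

    # 3. Embed question and context into an optimized prompt that
    #    also pins down answer format and tone of voice.
    prompt = (
        "Answer in a friendly, professional tone, in at most three "
        "sentences, using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

    # 4. Only now is anything sent to the LLM.
    return llm.complete(prompt)
```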

A vector database is a way of organizing information in a series of lists, each one sorted by a different attribute. For example, you might have a list that’s alphabetical, and the closer your responses are in alphabetical order, the more relevant they are.

An alphabetical list is a one-dimensional vector database, but vector databases can have an unlimited number of dimensions, allowing you to search for related answers based on their proximity to any number of factors. That makes them perfect to use in conjunction with LLMs.
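
To make “proximity” concrete, here is a small Python sketch of nearest-neighbor search over embedding vectors using cosine similarity. The toy three-dimensional vectors and the brute-force scan are purely illustrative; production vector databases use much higher-dimensional embeddings and approximate indexes.

```python
import numpy as np

def cosine_similarity(a, b):
    # Proximity in embedding space: 1.0 means pointing the same direction.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k(query_vec, doc_vecs, k=2):
    # Brute-force scan over every document vector.
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: -scores[i])[:k]

# Toy 3-dimensional "embeddings"; real ones have hundreds of dimensions.
docs = np.array([[0.9, 0.1, 0.0], [0.1, 0.9, 0.2], [0.8, 0.2, 0.1]])
query = np.array([0.85, 0.15, 0.05])
print(top_k(query, docs))  # indexes of the two closest documents
```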

“Right now, we’re converting everything to a vector database,” says Ellie Fields,
chief product and engineering officer at Salesloft, a sales engagement platform
vendor. “And yes, they’re working.”

And it’s more effective than using simple documents to provide context for LLM
queries, she says.

The company primarily uses ChromaDB, an open-source vector store built mainly for use with LLMs. Another vector database Salesloft uses is pgvector, a vector similarity search extension for the PostgreSQL database.
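
As a rough illustration of what working with such a store looks like, here is ChromaDB’s Python client with made-up documents (the collection name and texts are invented, and argument details can vary across ChromaDB versions):

```python
import chromadb  # pip install chromadb

client = chromadb.Client()  # in-memory client; persistent clients also exist
collection = client.create_collection(name="company_policies")

# Chroma embeds the documents with its default embedding function.
collection.add(
    documents=[
        "Expenses over $500 require manager approval.",
        "Remote work requests go through the HR portal.",
    ],
    ids=["policy-1", "policy-2"],
)

# Query by text; Chroma returns the nearest stored documents.
results = collection.query(query_texts=["Who approves large expenses?"], n_results=1)
print(results["documents"])
```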

“But we’ve also done some research using FAISS and Pinecone,” she says. FAISS, or
Facebook AI Similarity Search, is an open-source library provided by Meta that
supports similarity searches in multimedia documents.
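
FAISS sits at the library level rather than running as a service: you hand it raw vectors and it builds the index in-process. A tiny sketch, with random vectors standing in for real embeddings:

```python
import faiss  # pip install faiss-cpu
import numpy as np

dim = 128  # embedding dimensionality; text embeddings often use 384-1536
index = faiss.IndexFlatL2(dim)  # exact L2 search; FAISS also offers ANN indexes

doc_vectors = np.random.random((1000, dim)).astype("float32")
index.add(doc_vectors)  # index 1,000 "document embeddings"

query = np.random.random((1, dim)).astype("float32")
distances, ids = index.search(query, 5)  # the 5 nearest stored vectors
print(ids[0])
```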

And Pinecone is a proprietary cloud-based vector database that’s also become popular with developers; its free tier supports up to 100,000 vectors. Once the relevant information is retrieved from the vector database and embedded into a prompt, the query gets sent to OpenAI running in a private instance on Microsoft Azure.
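
That last hop, in sketch form, using the openai Python package’s Azure client; the endpoint, deployment name, and API version below are placeholders that depend entirely on how a given Azure subscription is provisioned:

```python
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",  # placeholder
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",  # placeholder; use your provisioned version
)

response = client.chat.completions.create(
    model="YOUR-DEPLOYMENT-NAME",  # the Azure deployment, not the raw model name
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": "Context: ...\n\nQuestion: ..."},
    ],
)
print(response.choices[0].message.content)
```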

“We had Azure certified as a new sub-processor on our platform,” says Fields. “We
always let customers know when we have a new processor for their information.”
But Salesloft also works with Google and IBM, and is working on gen AI functionality that uses those platforms as well.

“We’ll definitely work with different providers and different models,” she says.
“Things are changing week by week. If you’re not looking at different models,
you’re missing the boat.” So RAG allows enterprises to separate their proprietary
data from the model itself, making it much easier to swap models in and out as
better models are released. In addition, the vector database can be updated, even
in real time, without any need to do more fine-tuning or retraining of the model.

“We’ve switched out models, from OpenAI to OpenAI on Azure,” says Fields. “And
we’ve switched among different OpenAI models. We may even support different models
for different parts of our customer base.”

Sometimes different models have different APIs, she adds. “It’s not trivial,” she
says. But switching out a model is still easier than retraining. “We haven’t yet
found a use case that’s better served by fine tuning rather than a vector
database,” Fields adds. “I believe there are use cases out there, but so far, we
haven’t found one that performs better.”
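
One common way to keep that switching cost down is to hide each provider behind a single internal interface, so application code never calls a vendor API directly. The sketch below is an invented illustration of the pattern, not Salesloft’s actual design:

```python
from abc import ABC, abstractmethod

class ChatModel(ABC):
    """Internal interface; the rest of the codebase sees only this."""

    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class OpenAIChatModel(ChatModel):
    # Wraps an openai-style client (OpenAI or Azure OpenAI both fit).
    def __init__(self, client, model_name: str):
        self.client = client
        self.model_name = model_name

    def complete(self, prompt: str) -> str:
        resp = self.client.chat.completions.create(
            model=self.model_name,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

# Swapping providers, or moving from OpenAI to OpenAI-on-Azure, now means
# adding another ChatModel subclass rather than rewriting application code.
```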

One of the first applications of LLMs that Salesloft rolled out was adding a
feature that lets customers generate a sales email to a prospect. “Customers were
taking a lot of time to write those emails,” says Fields. “It was hard to start,
and there’s a lot of writer’s block.” So now customers can specify the target
persona, their value proposition, and the call to action — and they get three
different draft emails back they can personalize. Salesloft uses OpenAI’s GPT 3.5
to write the email, says Fields.
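
A feature like that maps naturally onto a templated prompt filled in with the user’s three inputs. The template below is a hypothetical reconstruction for illustration, not Salesloft’s actual prompt:

```python
# Hypothetical prompt template for an email-drafting feature like the one
# described above; wording and constraints are invented.
EMAIL_PROMPT = """Write {n_drafts} distinct draft sales emails.
Target persona: {persona}
Value proposition: {value_prop}
Call to action: {cta}
Keep each draft under 150 words and end with the call to action."""

def build_prompt(persona: str, value_prop: str, cta: str, n_drafts: int = 3) -> str:
    return EMAIL_PROMPT.format(
        n_drafts=n_drafts, persona=persona, value_prop=value_prop, cta=cta
    )

print(build_prompt("VP of sales at a mid-size SaaS firm",
                   "cut outreach prep time in half",
                   "book a 20-minute demo"))
```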

Locally run open source models
Boston-based Ikigai Labs offers a platform that allows companies to build custom large graphical models, or AI models designed to work with structured data. But to make the interface easier to use, Ikigai powers its front end with LLMs. For example, the company uses the seven-billion-parameter version of the open-source Falcon LLM, and runs it in its own environment for some of its clients.
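
Running a checkpoint like Falcon-7B locally is typically only a few lines with Hugging Face transformers. A minimal sketch, assuming a machine with enough GPU memory and the public tiiuae/falcon-7b weights (the article doesn’t describe how Ikigai actually serves the model):

```python
import torch
from transformers import pipeline

# Public checkpoint on the Hugging Face Hub; roughly 14 GB in bfloat16.
generator = pipeline(
    "text-generation",
    model="tiiuae/falcon-7b",
    torch_dtype=torch.bfloat16,
    device_map="auto",  # spread across available GPUs
)

# Everything runs in-process: no prompt or document leaves the machine.
out = generator("Summarize the key risks in this contract:", max_new_tokens=100)
print(out[0]["generated_text"])
```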

To feed information into the LLM, Ikigai uses a vector database, also run locally.
It’s built on top of the Boundary Forest algorithm, says co-founder and co-CEO
Devavrat Shah.

“At MIT four years ago, some of my students and I experimented with a ton of vector
databases,” says Shah, who is also a professor of AI at MIT. “I knew it would be
useful, but not this useful.”

Keeping both the model and the vector database local means no data can leak out to
third parties, he says. “For clients who are okay with sending queries to others,
we use OpenAI,” says Shah. “We are LLM agnostic.”

PricewaterhouseCoopers, which built its own ChatPWC tool, is also LLM agnostic. “ChatPWC makes our associates more capable,” says Bret Greenstein, the firm’s partner and leader of the gen AI go-to-market strategy. For example, it includes pre-built prompts to generate job descriptions. “It has all my formats, templates, and terminology,” he says. “We have HR, data, and prompt experts, and we design something that generates very good job postings. Now nobody needs to know how to do the amazing prompting that generates job descriptions.”

The tool is built on top of Microsoft Azure, but the company also built it for Google Cloud Platform and AWS. “We have to serve our clients, and they exist on every cloud,” says Greenstein. Similarly, it’s optimized to use different models on the back end, because that’s how clients want it. “We have every model working,” he adds. “Llama 2, Falcon — we have everything.”

The market is changing quickly, of course, and Greenstein suggests enterprises adopt a “no regrets” policy toward their AI deployments.

“There’s a lot people can do,” he says, “like building up their data that’s
independent of models, and building up the governance.” Then, when the market
changes, and a new model comes out, the data and governance structure will still be
relevant.

Fine-tuning
Management consulting company AArete took the open-source model GPT-2 and fine-tuned it on its own data. “It was lightweight,” says Priya Iragavarapu, the company’s VP of digital technology services. “We wanted an open source one to be able to take it and post it internally in our environment.”
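
A minimal Hugging Face fine-tuning sketch for GPT-2 on internal text is below; the corpus path and hyperparameters are placeholders, since the article doesn’t describe AArete’s actual setup:

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Placeholder path: plain-text file of internal documents, one per line.
dataset = load_dataset("text", data_files={"train": "internal_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-internal", num_train_epochs=3,
                           per_device_train_batch_size=4),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("gpt2-internal")  # deployable entirely inside the firewall
```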

If AArete used a hosted model and connected to it via API, trust issues would come up. “We’re concerned where the data from the prompting might end up,” she says. “We don’t want to take those risks.”

When choosing an open source model, she looks at how many times it was previously
downloaded, its community support, and its hardware requirements.

“The foundational model should also have some task relevancy,” she says. “There are
some models for specific tasks. For example, I recently looked at a Hugging Face
model that parses content from PDFs into a structured format.”

Many companies in the financial world and in the health care industry are fine-
tuning LLMs based on their own additional data sets.

“The basic LLMs are trained on the whole internet,” she says. With fine tuning, a
company can create a model specifically targeted at their business use case.

A common way of doing this is by creating a list of questions and answers and fine-tuning a model on those. In fact, OpenAI began allowing fine-tuning of its GPT-3.5 model in August, using a Q&A approach, and rolled out a suite of new fine-tuning, customization, and RAG options for GPT-4 at its November DevDay.
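
OpenAI’s fine-tuning API takes exactly that shape: a JSONL file of chat-formatted question-and-answer examples, uploaded and then referenced by a fine-tuning job. A sketch with the openai Python package (the training file and its contents are invented for illustration):

```python
from openai import OpenAI

client = OpenAI()

# faq.jsonl holds one chat-formatted example per line, e.g.:
# {"messages": [{"role": "user", "content": "How do I reset my password?"},
#               {"role": "assistant", "content": "Go to Settings > Security ..."}]}
training_file = client.files.create(file=open("faq.jsonl", "rb"), purpose="fine-tune")

job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print(job.id)  # poll the job; on success it yields a custom model ID for requests
```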

This is particularly useful for customer service and help desk applications, where
a company might already have a data bank of FAQs.

Also in the Dell survey, 21% of companies prefer to retrain existing models, using
their own data in their own environment.

“The most popular option seems to be Llama 2,” says Andy Thurai, VP and principal
analyst at Constellation Research Inc. Llama 2 comes in three different sizes, and
is free for companies with fewer than 700 million monthly users. Companies can
fine-tune it on their own data sets and have a new, custom model fairly quickly, he
says. In fact, the Hugging Face LLM leaderboard is currently dominated by different
fine-tunings and customizations of Llama 2. Before Llama 2, Falcon was the most
popular open source LLM, he adds. “It’s an arms race right now.” Fine tuning can
create a model that’s more accurate for specific business use cases, he says. “If
you’re using a generalized Llama model, the accuracy can be low.”

And there are some advantages to fine-tuning over RAG embedding. With embedding, a
company has to do a vector database search for every query. “And you’ve got the
implementation of the database,” Thurai says. “That’s not going to be easy,
either.”

There are no context window limits on fine tuning, either. With embedding, there’s
only so much information that can be added to a prompt. If a company does fine
tune, they wouldn’t do it often, just when a significantly improved version of the
base AI model is released.

Finally, if a company has a quickly-changing data set, fine tuning can be used in
combination with embedding. “You can fine tune it first, then do RAG for the
incremental updates,” he says.

Rowan Curran, analyst at Forrester Research, expects to see a lot of fine-tuned, domain-specific models arising over the next year or so, and companies can also distill models to make them more efficient at particular tasks. But only a small minority of companies, 10% or less, will do this, he says.

Software companies building applications, such as SaaS apps, might use fine-tuning, says PricewaterhouseCoopers’ Greenstein. “If you have a highly repeatable pattern, fine tuning can drive down your costs,” he says, but for enterprise deployments, RAG is more efficient in 90 to 95% of cases.

“We’re actually looking into fine-tuning models for specific verticals,” adds Sebastien Paquet, VP of ML at Coveo, a Canadian enterprise search and recommendations company. “We have some specialized verticals with specialized vocabulary, like the medical vertical. Enterprises selling truck parts have their own way of naming the parts.”

For now, however, the company is using OpenAI’s GPT 3.5 and GPT 4 running on a
private Azure cloud, with the LLM API calls isolated so Coveo can switch to
different models if needed. It also uses some open source LLMs from Hugging Face
for specific use cases.

Build an LLM from scratch
Few companies are going to build their own LLM from scratch. After all, they are, by definition, quite large. OpenAI’s GPT-3 has 175 billion parameters and was trained on a 45-terabyte data set, at a cost of $4.6 million. And according to OpenAI CEO Sam Altman, GPT-4 cost over $100 million to train.

That size is what gives LLMs their magic and ability to process human language,
with a certain degree of common sense, as well as the ability to follow
instructions.

“You can’t just train it on your own data,” says Carm Taglienti, distinguished
engineer at Insight. “There’s value that comes from training on tens of millions of
parameters.”

Today, nearly all LLMs come from the big hyperscalers or AI-focused startups like
OpenAI and Anthropic.

Even companies with extensive experience building their own models are staying away
from creating their own LLMs.

Salesloft, for example, has been building its own AI and machine learning models for years, including gen AI models using earlier technologies, but is hesitant about building a brand-new, cutting-edge foundation model from scratch.

“It’s a massive computational step that, at least at this stage, I don’t see us
embarking on,” says Fields.
