Why Agents Are the Next Frontier of Generative AI
July 2024
Over the past couple of years, the world has marveled at the capabilities and possibilities unleashed by generative AI (gen AI). Foundation models such as large language models (LLMs) can perform impressive feats, extracting insights and generating content across numerous mediums, such as text, audio, images, and video. But the next stage of gen AI is likely to be more transformative.

We are beginning an evolution from the use of knowledge-based, gen AI–powered tools—say, chatbots that answer questions and generate content—to gen AI–enabled “agents” that use foundation models to execute complex, multistep workflows across a digital world. In short, the technology is moving from thought to action.

Broadly speaking, “agentic” systems refer to digital systems that can independently interact in a dynamic world. While versions of these software systems have existed for years, the natural-language capabilities of gen AI unveil new possibilities, enabling systems that can plan their actions, use online tools to complete those tasks, collaborate with other agents and people, and learn to improve their performance. Gen AI agents eventually could act as skilled virtual coworkers, working with humans in a seamless and natural manner. A virtual assistant, for example, could plan and book a complex personalized travel itinerary, handling logistics across multiple travel platforms. Using everyday language, an engineer could describe a new software feature to a programmer agent, which would then code, test, iterate, and deploy the tool it helped create.

Agentic systems traditionally have been difficult to implement, requiring laborious, rule-based programming or highly specific training of machine-learning models. Gen AI changes that. When agentic systems are built using foundation models (which have been trained on extremely large and varied unstructured data sets) rather than predefined rules, they have the potential to adapt to different scenarios in the same way that LLMs can respond intelligibly to prompts on which they have not been explicitly trained. Furthermore, using natural language rather than programming code, a human user could direct a gen AI–enabled agent system to accomplish a complex workflow. A multiagent system could then interpret and organize this workflow into actionable tasks, assign work to specialized agents, execute these refined tasks using a digital ecosystem of tools, and collaborate with other agents and humans to iteratively improve the quality of its actions.

In this article, we explore the opportunities that gen AI agents present. Although the technology remains in its nascent phase and requires further technical development before it’s ready for business deployment, it’s quickly attracting attention. In the past year alone, Google, Microsoft, OpenAI, and others have invested in software libraries and frameworks to support agentic functionality. LLM-powered applications such as Microsoft Copilot, Amazon Q, and Google’s upcoming Project Astra are shifting from being knowledge-based to becoming more action-based. Companies and research labs such as Adept, crewAI, and Imbue also are developing agent-based models and multiagent systems. Given the speed with which gen AI is developing, agents could become as commonplace as chatbots are today.

What value can agents bring to businesses?
The value that agents can unlock comes from their potential to automate a long tail of complex use cases characterized by highly variable inputs and outputs—use cases that have historically been difficult to address in a cost- or time-efficient manner. Something as simple as a business trip, for example, can involve numerous possible itineraries encompassing different airlines and flights, not to mention hotel rewards programs, restaurant reservations, and off-hours activities, all of which must be handled across different online platforms. While there have been efforts to automate parts of this process, much of it still must be done manually. This is in large part because the wide variation in potential inputs and outputs makes the process too complicated, costly, or time-intensive to automate.

Gen AI–enabled agents can ease the automation of complex and open-ended use cases in three important ways:
Exhibit 1: A generative AI agent system at work

1. Using natural language, the user prompts the generative AI agent system to complete a task.
2. The agent system interprets the prompt and builds a work plan. A manager agent subdivides the project into tasks assigned to specialist agents (for example, planner, analyst, and checker agents); they gather and analyze data from multiple sources and collaborate with one another to execute their individual missions. To complete the task, the specialist agents interact with external systems—databases and systems holding both organizational and external data.
3. The agent team shares the draft output with the user.
4. The agent team receives user feedback, then iterates and refines the output accordingly.
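The loop in the exhibit can be expressed compactly in code. The following Python sketch is purely illustrative: the call_llm helper is a hypothetical stand-in for any foundation-model API, and the planner, analyst, and checker roles are taken from the exhibit. A manager step decomposes the request, specialist steps execute it, and the loop repeats on user feedback.

```python
import json

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a foundation-model API call."""
    raise NotImplementedError("plug in a model provider here")

SPECIALISTS = ["planner", "analyst", "checker"]

def manager_plan(request: str) -> list[dict]:
    """Manager agent: subdivide the request into specialist subtasks."""
    raw = call_llm(
        "Decompose the request below into subtasks. Respond with a JSON "
        f"list of objects {{\"specialist\": one of {SPECIALISTS}, \"task\": \"...\"}}.\n\n"
        f"Request: {request}"
    )
    return json.loads(raw)

def run_specialist(role: str, task: str, context: str) -> str:
    """Specialist agent: execute one subtask, seeing teammates' prior work."""
    return call_llm(f"You are the {role} agent.\nContext so far:\n{context}\n\nTask: {task}")

def agent_system(request: str) -> str:
    draft = ""
    while True:
        # Steps 1-2: interpret the prompt and build a work plan.
        for step in manager_plan(request):
            draft += "\n" + run_specialist(step["specialist"], step["task"], draft)
        # Step 3: share the draft output with the user.
        print(draft)
        feedback = input("Feedback (press Enter to accept): ")
        if not feedback:
            return draft
        # Step 4: iterate and refine according to user feedback.
        request += f"\n\nRevise according to this feedback: {feedback}"
```

In a production system, the manager's plan would be validated and each specialist would be given access to real tools and data sources rather than plain prompts.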
4. Agent executes action: The agent executes any necessary actions in the world to fully complete the user-requested task.

Art of the possible: Three potential use cases
What do these kinds of systems mean for businesses? The following three hypothetical use cases offer a glimpse of what could be possible in the not-too-distant future.

Use case 1: Loan underwriting
Financial institutions prepare credit-risk memos to assess the risks of extending credit or a loan to a borrower. The process involves compiling, analyzing, and reviewing various forms of information pertaining to the borrower, loan type, and other factors. Given the multiplicity of credit-risk scenarios and analyses required, this tends to be a time-consuming and highly collaborative effort, requiring a relationship manager to work with the borrower, stakeholders, and credit analysts to conduct specialized analyses, which are then submitted to a credit manager for review and additional expertise.

Potential agent-based solution: An agentic system—comprising multiple agents, each assuming a specialized, task-based role—could potentially be designed to handle a wide range of credit-risk scenarios. A human user would initiate the process by using natural language to provide a high-level work plan of tasks with specific rules, standards, and conditions. Then this team of agents would break down the work into executable subtasks. One agent, for example, could act as the relationship manager to handle communications.
Exhibit 2: Financial institutions often spend one to four weeks creating a credit-risk memo.

The current process:
1. A relationship manager (RM) gathers data from 15+ sources on the borrower, loan type, and other factors.
2. The RM and a credit analyst collaboratively analyze the data.
3. The credit analyst typically spends 20+ hours writing the memo.
4. The RM reviews the memo and provides feedback; the credit analyst writes a new draft incorporating the feedback.
Large language models (LLMs), as we now know, are prone to mistakes and hallucinations. Because agent systems process sequences of LLM-derived outputs, a hallucination within one of these outputs could have cascading effects if protections are not in place. Additionally, because agent systems are designed to operate with autonomy, business leaders must consider additional oversight mechanisms and guardrails. While it is difficult to fully anticipate all the risks that will be introduced with agents, here are some that should be considered.

Potentially harmful outputs
Large language models are not always accurate, sometimes providing incorrect information or performing actions with undesirable consequences. These risks are heightened as generative AI (gen AI) agents independently carry out tasks using digital tools and data in highly variable scenarios. For instance, an agent might approve a high-risk loan, leading to financial loss, or it may make an expensive, nonrefundable purchase for a customer.

Mitigation strategy: Organizations should implement robust accountability measures, clearly defining the responsibilities of both agents and humans while ensuring that agent outputs can be explained and understood. This could be accomplished by developing frameworks to manage agent autonomy (for example, limiting agent actions based on use case complexity) and ensuring human oversight (for example, verifying agent outputs before execution and conducting regular audits of agent decisions). Additionally, transparency and traceability mechanisms can help users understand the agent’s decision-making process to identify potentially fraught issues early.
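As one illustration of the transparency and traceability mechanisms described above, every proposed agent action can be logged together with its rationale before it is executed. This is a minimal sketch; the audit and traced_execute helpers and the agent_audit.jsonl file are invented names, and a real deployment would write to a tamper-evident audit store.

```python
import json
import time

AUDIT_LOG = "agent_audit.jsonl"  # illustrative; production systems would use a durable audit store

def audit(event: dict) -> None:
    """Append one traceability record per agent decision."""
    event["timestamp"] = time.time()
    with open(AUDIT_LOG, "a") as f:
        f.write(json.dumps(event) + "\n")

def traced_execute(agent_id: str, action: str, rationale: str, execute):
    """Log the proposed action and its rationale *before* running it,
    then log the outcome, so reviewers can reconstruct every decision."""
    audit({"phase": "proposed", "agent": agent_id, "action": action, "rationale": rationale})
    result = execute()
    audit({"phase": "completed", "agent": agent_id, "action": action, "result": str(result)})
    return result

# Example: the analyst agent proposes flagging a loan for review.
traced_execute(
    agent_id="credit_analyst",
    action="flag_loan_as_high_risk",
    rationale="Debt-service coverage ratio below policy minimum.",
    execute=lambda: "loan flagged for human review",
)
```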
Misuse of tools
With their ability to access tools and data, agents could be dangerous if intentionally misused. Agents, for example, could be used to develop vulnerable code, create convincing phishing scams, or hack sensitive information.

Mitigation strategy: For potentially high-risk scenarios, organizations should build in guardrails (for example, access controls, limits on agent actions) and create closed environments for agents (for instance, limit the agent’s access to certain tools and data sources). Additionally, organizations should apply real-time monitoring of agent activities with automated alerts for suspicious behavior. Regular audits and compliance checks can ensure that guardrails remain effective and relevant.
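The access controls and action limits described above can be enforced deterministically, outside the model, so that an agent cannot invoke a tool it has not been granted. A minimal sketch, assuming a per-agent allowlist and a hard spending ceiling (ALLOWED_TOOLS, MAX_SPEND_USD, and the toy TOOL_REGISTRY are all invented for illustration):

```python
# Guardrails enforced outside the model: an agent may only invoke
# tools on its allowlist, and high-impact actions have hard limits.
ALLOWED_TOOLS = {
    "research_agent": {"web_search", "read_internal_docs"},
    "booking_agent": {"search_flights", "book_flight"},
}
MAX_SPEND_USD = 500  # hard ceiling on any single purchase an agent can make

class GuardrailViolation(Exception):
    """Raised (and alerted on) whenever an agent attempts a disallowed action."""

def invoke_tool(agent_id: str, tool: str, spend_usd: float = 0.0, **kwargs):
    if tool not in ALLOWED_TOOLS.get(agent_id, set()):
        raise GuardrailViolation(f"{agent_id} may not call {tool}")
    if spend_usd > MAX_SPEND_USD:
        raise GuardrailViolation(f"{tool} spend ${spend_usd} exceeds ${MAX_SPEND_USD}")
    return TOOL_REGISTRY[tool](**kwargs)  # dispatch to the real tool implementation

# Toy registry standing in for a closed environment of vetted tools.
TOOL_REGISTRY = {
    "web_search": lambda query: f"results for {query!r}",
    "read_internal_docs": lambda doc: f"contents of {doc}",
    "search_flights": lambda route: f"flights on {route}",
    "book_flight": lambda route: f"booked {route}",
}

# The booking agent may book a flight within its spending limit...
invoke_tool("booking_agent", "book_flight", spend_usd=320.0, route="SFO-JFK")
# ...but any attempt to read internal documents raises GuardrailViolation.
```

In practice, a violation would also trigger the automated alerts mentioned above rather than only raising an exception.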
Insufficient or excessive human–agent trust
Just as in relationships with human coworkers, interactions between humans and AI agents are based on trust. If users lack faith in agentic systems, they might scale back the human–agent interactions and information sharing that agentic systems require if they are to learn and improve. Conversely, as agents become more adept at emulating humanlike behavior, some users could place too much trust in them, ascribing to them human-level understanding and judgment. This can lead to users uncritically accepting recommendations or giving agents too much autonomy without sufficient oversight.

Mitigation strategy: Organizations can manage these issues by prioritizing the transparency of agent decision making, ensuring that users are trained in the responsible use of agents, and establishing a humans-in-the-loop process to manage agent behavior. Human oversight of agent processes is key to ensuring that users maintain a balanced perspective, critically evaluate agent performance, and retain final authority and accountability in agent actions. Furthermore, agent performance should be evaluated by tying agents’ activities to concrete outcomes (for example, customer satisfaction, successful completion rates of tickets).

In addition to addressing these potential risks, organizations should consider the broader issues raised by gen AI agents:

— Value alignment: Because agents are akin to coworkers, their actions should embody organizational values. What values should agents embody in their decisions? How can agents be regularly evaluated and trained to align with those values?

— Workforce shifts: By completing tasks independently, agent systems stand to significantly alter the way work is accomplished, potentially allowing humans to focus more on higher-level tasks that require critical thinking and managerial skills. How will roles and responsibilities shift in each business function? How can employees be provided with retraining opportunities? Are there new collaboration models that can enhance cooperation between humans and AI agents?

— Anthropomorphism: As agents increasingly have humanlike capabilities, users could develop an overreliance on them or mistakenly believe that AI assistants are fully aligned with their own interests and values. To what extent should humanlike characteristics be incorporated into the design of agents? What processes can be created to enable real-time detection of potential harms in human–agent interactions?
Lareina Yee is a senior partner in McKinsey’s Bay Area office, where Michael Chui and Roger Roberts are partners;
Stephen Xu is a senior director of project management in the Toronto office.
The authors wish to thank Aneri Shah, Arun Mittal, Henry Zhang, Kimberly Te, Mara Pometti, and Rickard Ström for their
contributions to this article.