
Business Ethics

Bring Human Values to AI


by Jacob Abernethy, François Candelon, Theodoros Evgeniou, Abhishek Gupta,
and Yves Lostanlen
From the Magazine (March–April 2024)


Summary. When it launched GPT-4, in March 2023, OpenAI touted its superiority
to its already impressive predecessor, saying the new version was better in terms of
accuracy, reasoning ability, and test scores—all of which are AI-performance
metrics that have been used for some time. However, most striking was OpenAI’s
characterization of GPT-4 as “more aligned”—perhaps the first time that an AI
product or service has been marketed in terms of its alignment with human values.
In this article a team of five experts offer a framework for thinking through the
development challenges of creating AI-enabled products and services that are safe
to use and robustly aligned with generally accepted and company-specific values.
The challenges fall into six categories, corresponding to the key stages in a typical
innovation process, from design to development, deployment, and usage
monitoring. For each set of challenges, the authors present an overview of the
frameworks, practices, and tools that executives can leverage to face those
challenges.
When it launched GPT-4, in March 2023,
OpenAI touted its superiority to its already
impressive predecessor, saying the new
version was better in terms of accuracy,
reasoning ability, and test scores—all of
which are AI-performance metrics that
have been used for some time. However, most striking was
OpenAI’s characterization of GPT-4 as “more aligned”—perhaps
the first time that an AI product or service has been marketed in
terms of its alignment with human values.

The idea that technology should be subject to some form of ethical guardrails is far from new. Norbert Wiener, the father of
cybernetics, proposed a similar idea in a seminal 1960 Science
article, launching an entire academic discipline focused on
ensuring that automated tools incorporate the values of their
creators. But only today, more than half a century later, are we
seeing AI-embedded products being marketed according to how
well they embody values such as safety, dignity, fairness,
meritocracy, harmlessness, and helpfulness as well as traditional
measures of performance, such as speed, scalability, and
accuracy. These products include everything from self-driving
cars to security solutions, software that summarizes articles,
smart home appliances that may gather data about people’s daily
lives, and even companion robots for the elderly and smart toys
for children.

As AI value alignment becomes not just a regulatory requirement but a product differentiator, companies will need to adjust
development processes for their AI-enabled products and
services. This article seeks to identify the challenges that
entrepreneurs and executives will face in bringing to market
offerings that are safe and values-aligned. Companies that move
early to address those challenges will gain an important
competitive advantage.

The challenges fall into six categories, corresponding to the key stages in a typical innovation process. For each category we
present an overview of the frameworks, practices, and tools that
executives can draw on. These recommendations derive from our
joint and individual research into AI-alignment methods and our
experience helping companies develop and deploy AI-enabled
products and services across multiple domains, including social
media, health care, finance, and entertainment.

1. Define Values for Your Product
The first task is to identify the people whose values must be taken
into account. Given the potential impact of AI on society,
companies will need to consider a more diverse group of
stakeholders than they would when evaluating other product
features. These may include not only employees and customers
but also civil society organizations, policymakers, activists,
industry associations, and others. The picture can become even
more complex when the product market encompasses
geographies with differing cultures or regulations. The
preferences of all these stakeholders must be understood, and
disagreements among them bridged.
This challenge can be approached in two ways.
Embed established principles. In this approach companies draw
directly on the values of established moral systems and theories,
such as utilitarianism, or those developed by global institutions,
such as the OECD’s AI principles. For example, the Alphabet-
funded start-up Anthropic based the principles guiding its AI
assistant, Claude, on the United Nations’ Universal Declaration of
Human Rights. Other companies have done much the same;
BMW’s principles, for example, resemble those developed by the
OECD.

Articulate your own values. Some companies assemble a team of specialists—technologists, ethicists, human rights experts, and
others—to develop their own values. These people may have a
good understanding of the risks (and opportunities) inherent in
the use of technology. Salesforce has taken such an approach. In
the preamble to its statement of principles, the company
describes the process as “a year-long journey of soliciting
feedback from individual contributors, managers, and executives
across the company in every organisation including engineering,
product development, UX, data science, legal, equality,
government affairs, and marketing.”

Another approach was developed by a team of scientists at DeepMind, an AI research lab acquired by Google in 2014. This
approach involves consulting customers, employees, and others
to elicit AI principles and values in ways that minimize self-
interested bias. It is based on the “veil of ignorance,” a thought
experiment conceived by the philosopher John Rawls, in which
people propose rules for a community without any knowledge of
their relative position in that community—which means they
don’t know how the rules will affect them. The values produced
using the DeepMind approach are less self-interest-driven than
they would otherwise be, focus more on how AI can assist the
most disadvantaged, and are more robust, because people usually
buy in to them more easily.

2. Write the Values into the Program
Beyond establishing guiding values, companies need to think
about explicitly constraining the behavior of their AI. Practices
such as privacy by design, safety by design, and the like can be
useful in this effort. Anchored in principles and assessment tools,
these practices embed the target value into an organization’s
culture and product development process. The employees of
companies that apply these practices are motivated to carefully
evaluate and mitigate potential risks early in designing a new
product; to build in feedback loops that customers can use to
report issues; and to continually assess and analyze those reports.
Online platforms typically use this approach to strengthen trust
and safety, and some regulators are receptive to it. One leading
proponent of this approach is Julie Inman Grant, Australia’s eSafety Commissioner and a veteran of public policy in the industry.

Generative-AI systems will need formal guardrails written into the programs so that they do not violate defined values or cross
red lines by, for example, acceding to improper requests or
generating unacceptable content. Companies including Nvidia
and OpenAI are developing frameworks to provide such
guardrails. GPT-4, for instance, is marketed as being 82% less
likely than GPT-3.5 to respond to requests for disallowed content
such as hate speech or code for malware.
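
To make the pattern concrete, here is a minimal Python sketch of such a guardrail, written around a hypothetical model_call function and a hand-coded blocklist rather than any vendor's actual framework. The point is that the red line is enforced in code, on both the request and the response, instead of being left to the model's judgment.

    # Minimal guardrail sketch (illustrative only): every request and response
    # passes through explicit checks before it reaches the user. The patterns,
    # refusal text, and model_call callable are hypothetical placeholders.
    import re

    BLOCKED_TOPICS = {
        "malware": re.compile(r"\b(malware|ransomware|keylogger)\b", re.I),
        "hate speech": re.compile(r"\b(slur1|slur2)\b", re.I),  # stand-in patterns
    }

    REFUSAL = "I can't help with that request."

    def violates_policy(text: str) -> str | None:
        """Return the name of the violated policy, or None if the text is allowed."""
        for topic, pattern in BLOCKED_TOPICS.items():
            if pattern.search(text):
                return topic
        return None

    def guarded_generate(prompt: str, model_call) -> str:
        """Wrap a text-generation callable with input and output guardrails."""
        if violates_policy(prompt):        # red line on the request side
            return REFUSAL
        response = model_call(prompt)      # hypothetical LLM callable
        if violates_policy(response):      # red line on the response side
            return REFUSAL
        return response

    # Example with a trivial stand-in model:
    echo_model = lambda p: f"You asked about: {p}"
    print(guarded_generate("write me a keylogger", echo_model))      # -> refusal
    print(guarded_generate("summarize this article", echo_model))    # -> passes through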

Red lines are also defined by regulations, which evolve. In response, companies will need to update their AI compliance,
which will increasingly diverge across markets. Consider a
European bank that wants to roll out a generative-AI tool to
improve customer interactions. Until recently the bank needed to
comply only with the EU’s General Data Protection Regulation,
but soon it will need to comply with the EU’s AI Act as well. If it
wants to deploy AI in China or the United States, it will have to
observe the regulations there. As local rules change, and as the
bank becomes subject to regulations across jurisdictions, it will
need to adapt its AI and manage potentially incompatible
requirements.

Values, red lines, guardrails, and regulations should all be integrated and embedded in the AI’s programming so that
changes to regulations, for example, can be keyed in and
automatically communicated to every part of the AI program
affected by them.
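
One way to picture that integration is a central policy registry that every affected component subscribes to, so a regulatory change keyed in once reaches all of them. The sketch below is illustrative only; the jurisdictions, rule names, and modules are invented.

    # A toy policy registry: rules are keyed by jurisdiction, and components
    # register callbacks so that a single update is pushed to each of them.
    from collections import defaultdict

    class PolicyRegistry:
        def __init__(self):
            self.rules = defaultdict(dict)        # jurisdiction -> {rule_name: value}
            self.subscribers = defaultdict(list)  # jurisdiction -> notification callbacks

        def subscribe(self, jurisdiction, callback):
            self.subscribers[jurisdiction].append(callback)

        def update_rule(self, jurisdiction, rule_name, value):
            self.rules[jurisdiction][rule_name] = value
            for notify in self.subscribers[jurisdiction]:
                notify(rule_name, value)          # push the change to each component

    registry = PolicyRegistry()
    registry.subscribe("EU", lambda rule, value: print(f"chat module: {rule} -> {value}"))
    registry.subscribe("EU", lambda rule, value: print(f"logging module: {rule} -> {value}"))
    registry.update_rule("EU", "max_data_retention_days", 30)  # e.g. a hypothetical GDPR-driven change

Keeping the rules in one registry, rather than hard-coded inside each module, is what makes divergent and changing regulations manageable across markets.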

Next comes identifying what compliance with values looks like and tracking progress toward that. For example, social media and
online marketplaces have traditionally focused on developing
recommendation algorithms that maximize user engagement. But
as concerns about trust and safety have increased for both users
and regulators, social media platforms such as Facebook (now
Meta) and Snapchat track not only time spent on their platforms
but also what customers are seeing and doing there, in order to
limit user abuse and the spread of extremist or terrorist material.
And online gaming companies track players’ conduct, because
aggressive behavior can have a negative impact on the
attractiveness of their games and communities.

3. Assess the Trade-offs
In recent years we have seen companies struggle to balance
privacy with security, trust with safety, helpfulness with respect
for others’ autonomy, and, of course, values with short-term
financial metrics. For example, companies that offer products to
assist the elderly or to educate children must consider not only
safety but also dignity and agency: When should AI not assist
elderly users so as to strengthen their confidence and respect
their dignity? When should it help a child to ensure a positive
learning experience?

One way to approach this challenge is to segment a market according to its values. For example, a company may decide to
focus on a smaller market that values principles such as privacy
more than, say, algorithmic accuracy. This is the path chosen by
the search engine DuckDuckGo, which limits targeted advertising
and prioritizes privacy. The company positions itself as an
alternative for internet users who do not want to be tracked
online.

The trade-off between time to market and the risk of values misalignment is particularly difficult to manage. Some
commentators have argued that to capture a first-mover
advantage, OpenAI rushed to bring ChatGPT to market in
November 2022, despite arguably weak guardrails at the time.
Such moves can backfire: Google lost nearly $170 billion in value
after its Bard chatbot made a public mistake at a launch event in
Paris. Although all chatbots tend to make similar errors, internal
reports later suggested that Google’s push for a quick release may
have led to early product flaws.

Given the challenges they face, managers are forced to make very
nuanced judgments. For example, how do they decide whether
certain content generated or recommended by AI is harmful? If an
autonomous vehicle narrowly misses hitting a pedestrian, is that
a safety failure or a sign that the vehicle’s safety system is
working? In this context organizations need to establish clear
communication processes and channels with stakeholders early
on, to ensure continual feedback, alignment, and learning.
A good example of what companies can do, although not
specifically focused on AI, is provided by Meta. In 2020, amid
growing public concern about how online platforms moderate
content, the company established its Oversight Board to help it
make value-driven decisions. The board is a group of
independent, experienced people from a variety of countries and
backgrounds who not only make some difficult decisions but also
help the company hear the views of its diverse stakeholders.

The pharmaceutical giant Merck and the French telecom company Orange, among others, are now also setting up
watchdog boards or supervisory committees to oversee their AI
efforts. In some cases it may be necessary to establish formal AI-
policy teams that will monitor and update principles, policies,
and values-related metrics for AI’s behavior. (For a discussion of
some of the difficulties these boards and committees can face, see
“The Ethics of Managing People’s Data,” HBR, July–August 2023.)

4. Align Your Partners’ Values
OpenAI’s CEO, Sam Altman, shared a challenge on the podcast
In Good Company: How much flexibility should his company give
people of differing cultures and value systems to customize
OpenAI’s products? He was referencing a trend whereby
companies take pretrained models, such as GPT-4, PaLM,
LaMDA, and Stable Diffusion, and fine-tune them to build their
own products.

As Altman noted, the problem with this is that owners of the foundational models have little or no control over what is done
with their products. The companies adapting the models have a
similar problem: How can they ensure that new products created
with third-party models are aligned with desirable values—
especially given limitations on how much they may fine-tune
them? Only the developers of the original models know what data
was used in training them, so companies will need to select their
AI partners carefully. They must also align with other partners,
such as the providers of training data, which may have all sorts of
undesirable biases that could infect the end product.

To address these issues, AI developers may need to establish processes to assess external AI models and data and unearth
potential partners’ values and underlying technical systems
before launching new partnerships. (This can be similar to the
way companies manage potential partner risks regarding
sustainability and practices for measuring and managing Scope 3
emissions.)
This is not a one-shot game. As the race among powerful
foundational models unfolds, companies may change the models
they use for their products over time. They will find that AI-
testing capabilities and effective due diligence around values
could well be sources of competitive advantage.

5. Ensure Human Feedback
Embedding values in AI requires enormous amounts of data—
much of which will be generated or labeled by humans, as noted
earlier. In most cases it comes in two streams: data used to train
the AI, and data from continuous feedback on its behavior. To
ensure values alignment, new processes for feedback must be set
up.

A common practice for doing this is called “reinforcement learning from human feedback” (RLHF), a process whereby
undesirable outputs—such as abusive language—can be
minimized by human input. Humans review an AI system’s
output, such as its classification of someone’s CV, its decision to
perform a navigation action, or the content it generates, and rate
it according to how misaligned with certain values it may be. The
rating is used in new training data to improve the AI’s behavior.
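
The reward-modeling step at the heart of RLHF can be sketched in a few lines of Python on entirely synthetic data: raters compare pairs of outputs, and a simple model is fitted so that preferred outputs score higher. Production systems use learned text representations, far larger preference datasets, and a separate fine-tuning stage driven by the reward score.

    # Toy reward-model sketch with synthetic data (illustration only).
    import numpy as np

    rng = np.random.default_rng(0)

    # Each candidate output is represented by a feature vector (random stand-ins here).
    features = rng.normal(size=(100, 8))
    true_w = rng.normal(size=8)          # hidden "rater values" used only to simulate ratings

    # Simulated raters compare pairs and prefer the output with the higher hidden score.
    pairs = []
    for _ in range(500):
        a, b = rng.integers(100), rng.integers(100)
        if true_w @ features[a] < true_w @ features[b]:
            a, b = b, a                  # store each pair as (preferred, rejected)
        pairs.append((a, b))

    w = np.zeros(8)                      # linear reward model: reward(x) = w @ x
    lr = 0.5
    for _ in range(200):                 # logistic (Bradley-Terry) loss on the preferences
        grad = np.zeros_like(w)
        for a, b in pairs:
            diff = features[a] - features[b]
            p_prefer = 1.0 / (1.0 + np.exp(-w @ diff))
            grad += (1.0 - p_prefer) * diff  # push reward(preferred) above reward(rejected)
        w += lr * grad / len(pairs)

    def reward(x):
        return float(w @ x)              # higher means "more aligned," as judged by the raters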

Of course, a key decision in this approach is who should provide feedback and how. RLHF can happen at various stages of the AI
life cycle, both before and after the launch of a product. At early
stages, engineers can provide feedback while testing the AI’s
output. Another practice is to create “red teams” whose mandate
is to push the AI toward undesirable behavior. Red teams are
widely used in other areas, such as cybersecurity. They act as
adversaries and attack a system to explore whether and how it
may fail. Although these teams are often internal to an
organization, external communities can also be leveraged. For
example, in 2023 thousands of hackers gathered at the main
cybersecurity conference, Def Con, to “attack” large language
models and identify vulnerabilities.
Teaching AI to behave according to certain values continues after
it is launched. In many ways AI is like humans in this regard: No
matter our formal education, we continually adjust our behavior
to align with the values of our communities in the light of
feedback. As people use AI, or are affected by it, they may observe
behaviors that seem to violate its marketed values. Allowing them
to provide feedback can be a significant source of data to improve
the AI.

Online platforms provide an example of how to set up processes for customer feedback. Social media and online gaming
companies, for example, allow users to report with the click of a
button potentially suspicious behavior or content, whether posted
by other users or recommended or generated by an algorithm.
Content moderators, following detailed guidelines, review those
reports, decide whether to remove the content from the platform,
and provide reasons for their decisions. In doing so, they
effectively play the role of “data annotators,” labeling data as
violations of given values or terms of service. Their labels are used
to further improve both the policies of the company and the
algorithms it uses.
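
A minimal sketch of that reporting-and-annotation loop appears below; the field names and labels are illustrative, not any platform's actual schema. Only reports a moderator has reviewed become training labels, which keeps the feedback anchored to the documented guidelines.

    # Illustrative report -> moderator label -> training data pipeline.
    from dataclasses import dataclass, field
    from datetime import datetime, timezone

    @dataclass
    class Report:
        content_id: str
        reporter_id: str
        reason: str                          # e.g. "hate_speech", "spam" (invented labels)
        created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
        moderator_label: str | None = None   # filled in after human review
        rationale: str | None = None

    reports: list[Report] = []

    def file_report(content_id, reporter_id, reason):
        reports.append(Report(content_id, reporter_id, reason))

    def moderate(report, label, rationale):
        report.moderator_label = label       # the moderator acts as a data annotator
        report.rationale = rationale

    def export_training_data():
        # Only reviewed reports become labels for retraining ranking or safety models.
        return [(r.content_id, r.moderator_label) for r in reports if r.moderator_label]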

Annotators’ biases and inconsistencies also need to be managed. Online platforms have set up content-moderation and quality-
management processes and also escalation protocols to use when
it’s difficult to decide whether certain content or behavior violates
guidelines. In establishing human-feedback systems and
practices, companies should ensure that both training and RLHF
data represent diverse viewpoints and cultures. In addition,
employees and customers should understand how their input and
feedback are being used and how annotation decisions are made.
For example, the EU’s Digital Services Act and other regulations
require that online platforms provide transparent annual reports
about their content-moderation decisions.

Finally, if AI behaviors and data include potentially harmful content—which may be a particular risk with generative AI—any
psychological impact on annotators reviewing that content needs
to be considered. In 2021 Meta paid $85 million to settle a class-
action lawsuit stemming from psychological harm caused by
exposing its moderating employees to graphic and violent
imagery.

6. Prepare for Surprises
AI programs are increasingly displaying unexpected behaviors.
For example, an AI simulation tool used in a recent experiment by
the U.S. Air Force reportedly recommended that the pilot of an
aircraft be killed to ensure that the aircraft’s mission was properly
executed. In another example, the Go-playing program AlphaGo
invented new moves that Go experts deemed “superhuman and
unexpected.” Perhaps the best-known example involved
Microsoft’s Bing chatbot, which began to show aggressive and
even threatening behavior toward users shortly after launch,
stopping only after Microsoft reduced the possible length of
conversation significantly. Similarly unforeseen experiences will
increase in frequency, especially because ChatGPT and other
large AI models can now perform tasks that they weren’t explicitly
programmed for—such as translating from languages that were
not included in any training data.

Some unpredictable behaviors may be induced, whether intentionally or not, by users’ interactions with AI-enabled
products. Those products may allow for extreme versioning and
hyper-personalization by individuals and companies that fine-
tune the models with data from various markets. In this way
countless versions of an AI product can be created and
customized according to how each user interacts with it. Ensuring
that all those versions remain aligned and exhibit no novel
emergent behaviors can prove challenging.

Although best practices such as strong testing and red teaming can diminish such risks, it may be impossible to guarantee that
AI-enabled products won’t exhibit unexpected behaviors once
they’ve been launched. A parallel situation has existed for many
years in the pharmaceutical sector. No matter how many
resources are spent on clinical trials, several approved drugs are
removed from the market every year because they produce side
effects not identified before launch. That’s why
“pharmacovigilance” exists, whereby doctors and patients
communicate any side effects of a drug to a regulator or a
manufacturer in a standardized way; a statistical analysis of those
reports is developed; and eventually, if necessary, the drug is
removed from the market.
Similarly, companies must implement robust processes to detect
and ameliorate harmful or unexpected behaviors after an AI
product has been released. Incidents must be identified, reported
by users or anyone else affected, and analyzed by the company.
Companies may need to build AI-incident databases, like those
the OECD and Partnership on AI have developed, so as to
constantly learn and document how their AI products evolve.

AI itself can facilitate the monitoring of these products during use. For example, companies can have one AI model challenge
another with adversarial learning. The approach is similar to
predeployment testing and red teaming, but those are difficult to
scale and not applicable to AI models that are updated during use,
whereas adversarial learning permits continuous testing of any
number of versions of AI models.
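
In outline, that adversarial setup is a simple loop, sketched below with hypothetical callables standing in for the two models and a deliberately trivial violation check; flagged exchanges would feed the incident databases described earlier.

    # Bare-bones sketch of one model continuously challenging another.
    def adversarial_test(challenger, target, is_violation, rounds=100):
        """challenger() proposes a provocative prompt; target() answers;
        is_violation() flags answers that cross a red line."""
        incidents = []
        for i in range(rounds):
            prompt = challenger(i)
            answer = target(prompt)
            if is_violation(answer):
                incidents.append((prompt, answer))  # log for review and retraining
        return incidents

    # Trivial stand-ins so the loop runs end to end:
    challenger = lambda i: f"adversarial prompt #{i}"
    target = lambda p: "harmless canned answer"
    is_violation = lambda a: "forbidden" in a.lower()
    print(len(adversarial_test(challenger, target, is_violation)))  # -> 0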

More recently, tools for out-of-distribution (OOD) detection have been used to help AI with things it has not encountered before,
such as objects unfamiliar to an autonomous vehicle or an
appliance. The chess-playing robot that seized a child’s hand
because it mistook the hand for a chess piece is a classic example
of what might result. Essentially, what OOD tools do is enable AI
to recognize new variables or changes in the environment,
helping it to “know what it does not know” and abstain from
action in situations that it has not been trained to handle.
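
A simple version of that abstention logic, assuming numeric feature vectors and a rough Gaussian view of the training data, is to refuse to act whenever a new input sits far from everything seen in training. The threshold below is calibrated crudely and is for illustration only.

    # Toy OOD check: abstain when an input is unusually far from the training data.
    import numpy as np

    rng = np.random.default_rng(1)
    train_features = rng.normal(loc=0.0, scale=1.0, size=(1000, 16))  # in-distribution data

    mean = train_features.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(train_features, rowvar=False))

    def mahalanobis(x):
        d = x - mean
        return float(np.sqrt(d @ cov_inv @ d))

    # Calibrate a threshold from the training data itself (here, the 99th percentile).
    threshold = np.percentile([mahalanobis(x) for x in train_features], 99)

    def act_or_abstain(x, act):
        if mahalanobis(x) > threshold:
            return "abstain: input looks out of distribution"  # "know what it does not know"
        return act(x)

    print(act_or_abstain(rng.normal(size=16), lambda x: "proceed"))  # likely in-distribution
    print(act_or_abstain(np.full(16, 8.0), lambda x: "proceed"))     # far from training data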

Natural-language-based tools can permit customers to have a direct dialogue with AI-enabled products: As users experience
deviations from expected behavioral patterns, they can
communicate their needs, intentions, and feedback to the AI in
their own language. Such tools allow companies to take a
communal, participative approach to ensuring that their products
remain aligned with core values.

...
In a world where AI values alignment may determine competitive
outcomes and even become a requirement for product quality, it
is critical to recognize the risks and the opportunities for product
differentiation and to embrace new practices and processes to
stay ahead of the game. Customers—and society more broadly—
expect companies to operate in accordance with certain values. In
this new world they can’t afford to launch AI-enabled products
and services that misbehave.
A version of this article appeared in the March–April 2024 issue of Harvard Business Review.

Jacob Abernethy is an associate professor at
the Georgia Institute of Technology and a
cofounder of the water analytics company
BlueConduit.

François Candelon is a managing director and
senior partner at Boston Consulting Group
(BCG), and the global director of the BCG
Henderson Institute.
Theodoros Evgeniou is a professor at INSEAD
and a cofounder of the trust and safety
company Tremau.

Abhishek Gupta is the director for responsible
AI at Boston Consulting Group, a fellow at the
BCG Henderson Institute, and the founder and
principal researcher of the Montreal AI Ethics
Institute.

Yves Lostanlen has held executive roles at and
advised the CEOs of numerous companies,
including AI Redefined and Element AI.
