Artefact Data and AI Transformation For Business Report
WE ACCELERATE DATA AND AI ADOPTION TO POSITIVELY IMPACT PEOPLE AND ORGANIZATIONS.
Offices: The Netherlands, UK, Germany, France, Switzerland, Spain, New York, Los Angeles, Mexico, Colombia, Brazil, South Korea, Chengdu, Shanghai, India, Malaysia, Singapore, Lebanon, Dubai, Saudi Arabia, Morocco, Senegal, South Africa
21 COUNTRIES · 1,500 EMPLOYEES · +1,000 CLIENTS
Contents

20 CARREFOUR — Google Data Lab: Using AI to drive value in store based on Artefact's AI Factory operating model
22 HEINEKEN Brazil — Using the Data Factory methodology as a Revenue Generation Center
24 Data Governance, a prerequisite for AI project success
26 The vital role data governance plays in achieving sustainability goals
30 PIERRE & VACANCES CENTER PARCS — How data governance and data quality can boost digital marketing and activation performance
33 Data Mesh: Principles, promises and realities of a decentralized data management model
37 Why is the "data as a product" concept central to data mesh?

AI Industry Solutions — Demand Forecasting
39 Demand forecasting: Using machine learning to predict retail sales
41 L'ORÉAL — Trend Detection: Innovating tomorrow's products today thanks to AI trend detection by Artefact
43 Scoring Customer Propensity using Machine Learning Models on Google Analytics Data

68 Gaining buy-in for data & analytics initiatives in financial services
74 The road ahead: data-driven sales is critical for the evolving car industry
77 Interview: How Nissan is transforming in the digital world

Data for Impact
81 Use data to measure and reduce your environmental impact with Artefact
83 Industrializing carbon footprint measurement to achieve neutrality
85 Applying machine learning algorithms to satellite imagery for agriculture applications
The outlook for data and AI transformation, today and tomorrow.
An interview with Vincent Luciani

The generative AI technology revolution has been a paradigm shift for all industries and sectors. Artefact sees AI as an incredible opportunity that, if used properly and ethically, will lead to economic, social, and democratic progress.

How is generative AI profoundly transforming society and businesses?

We are at the beginning of a new era. The generative AI revolution is reshaping societal and economic landscapes. After an experimentation phase, generative AI will continue to change the game for the global community. It's a technology with the potential to improve the world in many ways, as long as solid checks and balances are in place to ensure its responsible and beneficial development.

• Economically, it offers undeniable productivity gains that will spur innovation and new business growth.

• Socially, generative AI will streamline administrative tasks, freeing up more valuable and creative time, which could lead to innovative job opportunities and the development of new skills.

• Democratically, the accessibility of GenAI to all will provide deep knowledge and solutions to address specific societal and educational inequalities and advance the cause of social justice.

"At Artefact, we hold an optimistic vision, viewing AI as an incredible chance that, if used properly and ethically, will lead to economic, social, and democratic progress."

How is Artefact leading the generative AI transformation for enterprises?

Since the availability of the first LLMs (Large Language Models), even before the official public launch of ChatGPT in November 2022, we at Artefact have been one of the key global pioneers using this powerful technology, designing and deploying many generative AI use cases with our clients throughout 2023.

As certified experts with the major clouds and open source GenAI, we've already acquired strong expertise and developed a solid ecosystem. In this context, we recently announced our official strategic collaboration with Mistral AI, the powerful LLM platform positioned as a European answer to OpenAI.

Despite achieving notable reductions in development time and enhanced employee adoption, the scalability of GenAI projects remains a challenge, emphasizing the need for ethical and secure environments grounded in robust data foundations.

For more than 10 years, Artefact has prioritized the crucial role of data in AI success for enterprises. Initiating data acceleration programs, we focus on elevated data quality, governance, and interconnected platforms, adhering to ethical and responsible guidelines.

Anticipating substantial growth thanks to these new LLM technologies, companies are urged to embrace AI for a competitive edge. This transformative year will necessitate new organizational models and widespread AI deployment across business value chains, with Artefact accompanying its clients from strategy to full operations.

However, the success of technology shifts depends on fostering trust and enthusiasm among all employees, requiring consultation and support from top to bottom, an area where hackathons and training can be instrumental.
This case perfectly illustrates Artefact's firm belief that to achieve true data maturity, companies have no choice but to make data accessible to everyone: not only to experts, but also to operational staff in the field. This will lead to new forms of augmented work, where applications and their interfaces put intelligent information in everyone's hands to work more efficiently and with more autonomy.

Artefact also helped the Carrefour Group in reducing the carbon impact of its e-commerce branch with a solution that can be implemented by the company and consumers. Carrefour's aim is to become the world leader in food system transformation for all by committing to four major objectives, including achieving carbon neutrality by 2030 for its e-commerce activities. The challenge for Artefact was to enable Carrefour to reliably measure all greenhouse gas emissions from data storage, transport and logistics activities, from first click to final delivery.

Our solution measured greenhouse gas emissions generated by e-commerce orders, then collected activity data to convert it into carbon emissions. All Carrefour business teams helped obtain the data – which is why the operation was a success, as it allowed all stakeholders to become ambassadors for the group's "carbon neutrality 2030" objective.

How is Artefact able to always be at the forefront of AI through core research and advanced technology?

At Artefact, we've implemented major projects to ensure that we always leverage the best of data science and AI technologies for our clients:

• The launch of the Artefact Research Center, which fosters a robust data and AI R&D ecosystem by connecting PhD talent at Artefact with esteemed professors from top universities (Polytechnique, Sorbonne University, CentraleSupélec and University of Paris-Saclay) and leading enterprises including Orange, Société Générale and Decathlon, with other companies joining us soon. Through the developments and publications of the Artefact Research Center, we aspire to shape a future where AI is not only a powerful tool but is also tailored to the needs of businesses with ethics and responsibility, thereby facilitating its adoption.

• The creation of the SKAFF technology platform, an open source developer portal that includes a central software components catalog supporting TechDocs and a scaffolder for automating engineering processes. This platform enhances efficiency by swiftly delivering high-quality outcomes through the consolidation of technical assets, convictions, and tutorials focused on our core technologies.

After a decade of exponential growth, what is Artefact's ambition for the coming years?

First of all, our gratitude goes to our clients for entrusting us, a cornerstone of our success.

I believe that our success also stems from our unique ability to transform data and AI into value for companies. We offer our 1000+ clients a unique combination of innovation (Art) and data science (Fact).

By creating multidisciplinary teams and breaking down silos between business and technology departments, we generate real, immediate impact for clients. Artefact has become one of the first and few consolidated pure data & AI players in the market, with the most comprehensive set of data-driven services and AI applications.

We offer data acceleration programs, industry-specific AI solutions, and data-driven marketing services. Our engineers build tech-agnostic solutions, combining custom code with open source and proprietary software, backed by strong partnerships with leading cloud providers, to create exactly what you need for your data and AI transformation.

Today, Artefact is present in 20 countries across Europe, Asia, the Americas (North & South), the Middle East and Africa, with 23 offices and 1,500 employees. And we have robust plans for geographical expansion as well as an ambitious M&A policy that will continue.

We're also continuously hiring new consulting Partners and Directors, experts in their respective fields, orchestrating collaboration across Artefact's regions. They provide dedicated support and industry-specific services. While strengthening our positions in CPG, Retail, and Luxury, we've also intensified our development in Financial Services, Healthcare & Pharmaceuticals, and Manufacturing, reinforcing human resources.

We're excited about the promising future that AI holds for individuals and organizations. The excellence of AI technology will be realized through the collective capabilities of human talent.
Data Readiness

8 Enterprise Governance in the data age
27 The vital role data governance plays in achieving sustainability goals
Enterprise Governance in the data age

…routes: they don't factor in the distance to be covered, but rather the fewest left turns to be made on each route. By analysing the data, they realised that 60% of all accidents were caused by taking left turns, and only 3% by taking right turns (even though these require more waiting time).

Analysis, prediction and optimisation: with these, data becomes a 'production factor' rather than an 'innovation…
…to rectify. Treating data as a strategic asset means agreeing to invest in a program to improve data quality, documentation and accessibility, and to do so in a sustainable manner as sources multiply.

A data-driven company must become a talent development factory

Talent has become the decisive factor in the digital age. Access to technology is a commodity so universally accessible that the emergence of no-code, for example, and cloud computing are increasingly associated with turnkey services such as database storage and operation or automatic algorithm building.

The recruitment war is very serious: among the seven million available job offers in the US posted on LinkedIn (70% of the total), for example, two out of seven are for data-related positions. The pace of technological change is so rapid that it's impossible to establish a competency framework at any given time.

This acceleration is being driven by GAFA-backed big-budget cloud technology frameworks, research labs and free algorithms from a global network of 100k researchers, plus the almost immediate adoption and widespread use of open source in the start-up world. Companies must also continually develop in-demand expertise in areas such as data science and cybersecurity.

When considering the talents that will be needed tomorrow, it's tempting to focus only on technology and assume the next generation will be exclusively composed of engineers and data scientists. Clearly, there will be a need for them, and there will even be a shortage of them for the next few years, but this is only part of the story. In a world where data and algorithms can automate manual, repetitive and time-consuming tasks, and where technology is ever more accessible, there's plenty of room for other types of talents: problem-solving, creative, interpersonal, etc.

In a modern company, the decision-making process must be decentralised

In technology, there is a major progressive movement towards decentralisation, which began with technological breakthroughs (e.g., cryptocurrency or the metaverse, which are decentralised systems), but also IT systems (the cloud, where we share our machines, or distributed computing architectures such as Hadoop, a world-renowned framework for distributing calculations on different servers).

Decentralisation is also valid in governance. Why? Because centralisation is impossible: there's simply too much data, with poorly controlled sources, which can easily be poorly interpreted without contextual knowledge. Some benefits of decentralisation in governance include:

1. Rapid decision-making and less time spent going back and forth
2. Letting the 'one who knows' make the best decision
3. Empowering decision-makers with a mandate – with limits and a control loop, of course.

This implies a deep organisational change oriented around knowledge: the organisation as an autonomous whole, constituted of cognisant communities organised around knowledge. At Artefact, for example, we made sure that certain entities (chapters, tribes, guilds) could decide on their own very critical things, like salaries, bonuses, prices and offers, and even staffing! We have created a fully decentralised governance.
…all share them too. OKRs (Objectives and Key Results) are invaluable tools which break any strategy down into measurable objectives, then into two or three sub-objectives shared by all employees. They were devised by Andy Grove, who taught them to John Doerr; Doerr in turn wrote the book 'Measure What Matters' about the process. OKRs allow employees to be valued for their accomplishments, not merely their backgrounds, degrees, or titles.

Conclusion: A new perception of data and data governance

Data changes the way companies are governed, and the role of managers – and directors in particular – from one of making the best decisions in the company's interest to one of creating a system so that everyone contributes to making the best decisions in the company.
CASE STUDY
ORANGE FRANCE
A visual recognition AI solution serving the quality of Orange France's technical interventions

CHALLENGES
An AI solution based on
image recognition
Asking technicians to verify numerous control points
at the job site, or having enough human resources
dedicated to analyzing the 20,000 pictures generated
every day is not a feasible solution. This would be too
time-consuming, too costly, and would not be error-
proof. In addition, sampling is not an option, as each and
every intervention must be verified.
RESULTS

An application designed, tested, corrected and industrialized on a large scale. It was a real technical and human feat: the tool has now been adopted by the 10,000 technicians deployed every day in France.

This application is just one of the 150 use cases developed by Orange over the last two years as part of its transformation through AI. Since then, Orange – together with Artefact – has put 15 new models into production to support other functions, such as sales or customer service.

To understand their way of working, technicians have been part of the project from the very beginning. This allowed the development team to identify several points, one of which is crucial: the application should not be perceived as a means of controlling the work of technicians, but as a tool to facilitate their daily work.

So, to ensure that end-users are comfortable with the application and that it was ethically designed, the team worked on two aspects.

First, technicians must be able to maintain control over the machine and go against its recommendations. This is why the explainability of the results returned by the model was a core value. If the model finds one or more non-conformities, the AI must specify which area or areas are affected.

Then, once a first version of the application was ready, the team had it tested by 50 volunteer technicians. This allowed the team to collect relevant feedback so they could improve the models. As an example, the conditions in which the photos are taken can lead to confusion between orange-colored cables (from Orange) and red-colored cables (competitors). The recurrence of this error led the feature team to improve the algorithm's acceptability: the model's raw performance was reduced in order to avoid contradicting what the human sees.

For Vincent Luciani, co-founder and CEO of the Artefact group:

"All of our AI projects are designed to respect the seven fundamental principles for ethical AI use established by a group of European Commission experts. The first of these values is human control. We have placed technicians at the heart of the project to ensure that this new solution makes their daily lives easier and doesn't hinder their autonomy. This has also been crucial for its adoption by all Orange installers."

Médéric Chomel — VP Data & AI Automation, ORANGE FRANCE

"Regulations on artificial intelligence are in the process of being developed. Our transformation, using data and AI, is intended to be respectful of privacy, to benefit humans and their environment, and to be unbiased. This is why this strategy anticipates future regulatory changes as far as possible. We must remember that this type of project is not just technical; humans are the greatest factor in their success."
A Practical Approach to Business Impact from Data & AI

1 - Building data solutions should be driven by the business, for the business
3 - Prioritizing a few data solutions will ultimately have the most business impact

The goal shouldn't be to impress with a long list of data solutions, but rather to identify the most critical business areas that can benefit from data-driven insights. By avoiding the temptation to pursue too many data solutions, organizations can stay focused and increase their chances of building successful data solutions. It's also important to identify the value-added capabilities of data solutions beyond simple reporting. While reporting is valuable in providing a summary of business performance, it only provides a retrospective view of data, leaving little room for analysis and decision-making. To fully leverage the power of data, organizations must identify data solutions that provide diagnostic analytics that automatically identify the root causes of performance, and predictive analytics that anticipate future trends.

"As organizations seek to achieve tangible business results from their investments in data analytics and artificial intelligence, it's critical to adopt a focused approach that builds the right solutions and sets the right expectations. Through this approach, business leaders spearhead the development of data & AI solutions 'for the business by the business' – prioritizing the most impactful solutions, building quick POCs with data experts, scaling data solutions that work, and accepting 'failure' on those that don't. Having business teams lead the whole process ensures business buy-in and adoption by design."
Oussama Ahmad, Data Consulting Partner, Global Travel & Tourism Lead — ARTEFACT
4 - Assessing feasibility of data solutions requires a full understanding of data sources and technologies

Before embarking on the development of a data solution, it is vital to conduct a detailed feasibility study that examines the availability and quality of the required data sources, as well as the cost of the technologies and expertise required to collect and process these data sources. This includes examining the hardware and software requirements, as well as the human skills needed to implement and maintain the technology. This also helps to set realistic expectations for data solutions that are consistent with the maturity of the required data sources, technologies, and capabilities.

5 - Building data solutions efficiently needs a scalable AI Factory and an agile development process

Building and scaling data solutions for businesses requires a new operating model – an AI Factory – made up of feature teams led by business experts supported by data scientists, engineers, analysts and software engineers. This team structure ensures that data solutions are always built with a business objective in mind. Adopting an agile test-and-learn process that attempts to build a successful POC in a short time span is also essential to achieve faster time-to-build.

6 - Accepting that some data solutions will fail, and scaling and maintaining those that work

Not all data solutions will succeed; some will fail, due to technical or data limitations, despite careful planning and execution. It is crucial for organizations to recognize that failure is a natural part of the development process: it should not discourage them from pursuing future projects. Instead, companies should focus on industrializing successful data use cases, scaling them to full domains, and optimizing their algorithms and data sources. This also includes ongoing monitoring and improvement of the use case to ensure that it continues to meet the needs of the business users.

7 - Sharing knowledge is necessary but not sufficient for wide data solution adoption

Providing data solution training and easy-to-use documentation for business users is necessary, but usually not sufficient for widespread adoption of data use cases. Widespread adoption of data solutions by business users is best achieved by having users lead the development process, integrating these solutions into the organization's learning curriculum, and including adoption and impact KPIs in business user scorecards. By aligning business user scorecards with the organization's
data strategy, organizations can create a culture of data-driven decision making and ensure that the adoption of data solutions leads to tangible business impact.

8 - Improving data solutions is continuous; prioritizing enhancements that matter is key

To achieve continuous enhancement of data solutions, it is vital to regularly collect feedback from business users, evaluate their needs and requirements, and make the necessary adjustments. The Scrum methodology provides an effective approach for gathering and implementing improvements in an iterative and incremental manner. Users of data solutions should log continuous feedback on the accuracy and usability of data solutions, as well as required improvements to business processes. It's important to (1) implement improvements that increase the accuracy of the solution's output, (2) expand its features and functionality, and (3) improve its usability and user experience.

9 - Maintaining robust governance of data solutions ensures accurate results with minimal oversight

Maintaining high-quality data sources for data solutions is crucial for achieving automated, accurate results with minimal oversight. To achieve this, organizations should implement a robust data quality framework that enforces clear guidelines and standards for data collection and transformation. In addition, organizations should implement strong data security and privacy policies for secure and compliant data processing. This approach ensures that input data is accurate, current, and consistent, which reduces the risk of errors and improves the overall efficiency of the data processing workflow.

"Data acceleration projects have been surging in the MENA region in recent years, as organizations embrace the power of data for business growth. While certain challenges persist, such as maintaining data quality, especially with legacy systems, organizations are actively seeking solutions to overcome these obstacles. Building the right data capabilities within business teams and the right operating model is the single most important way to ensure the successful implementation and adoption of data solutions and the realization of tangible business impact."
Karim Hayek, Data Consulting Senior Manager — ARTEFACT

10 - Tracking the business impact of data solutions requires defining direct impact KPIs and assigning incremental business impact

Identifying the commercial or operational KPIs that are directly improved by a data solution is essential to measuring its business impact. Once these KPIs are identified, the next step is to develop a formula to measure the incremental impact of the data solution on each of these KPIs. This formula should take into account the baseline of these KPIs before (or without) the implementation of the data solution and compare it to the performance of these KPIs after (or with) the implementation of this solution, accounting for other factors that may have contributed to the change. Once the incremental impact on each KPI has been calculated, it should be translated into financial terms, such as reduced costs or increased revenues. Finally, it's always recommended to automate the business impact measurement of data solutions to ensure unbiased and timely measurement.
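To make the calculation described above concrete, here is a minimal sketch in Python of one possible implementation. The function and variable names, the external-growth adjustment and all figures are illustrative assumptions for this example, not a formula taken from the article.

    # Minimal sketch of incremental-impact measurement for a data solution.
    # Names and numbers are illustrative assumptions, not a client formula.

    def incremental_impact(kpi_before: float,
                           kpi_after: float,
                           external_growth_rate: float) -> float:
        """Uplift attributable to the solution, net of external factors.

        kpi_before: KPI baseline before (or without) the data solution.
        kpi_after: the same KPI after (or with) the data solution.
        external_growth_rate: estimated change that would have happened
            anyway (market trend, seasonality, other initiatives).
        """
        counterfactual = kpi_before * (1 + external_growth_rate)
        return kpi_after - counterfactual

    # Example: weekly revenue of 1.00M EUR before and 1.12M EUR after,
    # with an estimated 4% growth not attributable to the solution.
    weekly_uplift = incremental_impact(1_000_000, 1_120_000, 0.04)
    annual_impact = weekly_uplift * 52  # translate into financial terms
    print(f"Weekly uplift: {weekly_uplift:,.0f} EUR; annualised: {annual_impact:,.0f} EUR")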
…to implementation: becoming an AI factory

…advantage. The data doesn't lie: there's been an almost 25% year-on-year increase in business use of AI, with 63% of executives agreeing it has led to revenue increases. The global pandemic has only put this into sharper focus. The businesses that thrive and survive will be those able to adopt the right AI solutions and deploy and scale them quickly and efficiently.

Yet, as with all game-changers, AI initiatives raise new challenges. Implementation comes with many questions – chief among them: how can you adopt the right data approach to deploy AI initiatives rapidly and efficiently, without failure, and sustainably over the long term? The 'AI Factory' approach has been developed for precisely this reason.

Alexandre Thion de la Chaume — Managing Partner, Data Factory – Industries, ARTEFACT
The four pillars of the AI Factory
Once the company’s data strategy and AI vision are defined, you should have
a prioritised list of use cases to implement. But how can you start working on
them? An effective AI Factory implementation is founded on four distinct pillars:
The first pillar: hybrid teams based on agile methods. Agility ensures a flexible and adaptive way of working and avoids issues linked to a silo approach, such as isolated departments within the same structure or overly rigid procedures. This requires a good blend of business and technical profiles, to ensure that what is developed on the technical side always has a useful purpose that addresses business needs.

Scalability is an important overall characteristic of a team's makeup. The idea is that its structure can be easily duplicated, like Lego bricks. With a fully scalable model, more teams can be added to address additional use cases.

ADVANCED AI TECHNOLOGIES

Of course, effective AI deployment needs a foundation of AI-enabling technologies. An AI Factory uses a combination of open-source, proprietary and cloud solutions. They should be standardised across the whole data pipeline – from ingestion to visualisation, from beginning to end – according to best practices.

SYSTEMATIC & PROVEN METHODOLOGIES

Systematisation is needed to make sure a series of steps are always taken in a specific order, each with its own defined objective. The benefits are twofold. First, this gives an overall structure of common references throughout, creating a backbone that guarantees consistency. Second, this makes methodologies replicable and scalable, considerably accelerating the deployment of the industrialisation phase.

MLOPS: KEEPING THE FACTORY RUNNING

Alongside a set use case methodology, MLOps (Machine Learning Operations) practices must be deployed to close the gap between the concept phase and production. Inspired by the DevOps process, this should combine software development and IT operations to shorten the development life cycle.

The purpose of MLOps is to tackle challenges that traditional coded systems do not have. The first challenge is collaboration between teams: different units are often siloed and own different parts of the process. This stifles the unity needed to go into production.

The second is pipeline management, as ML pipelines are more complex than traditional ones. They have specific characteristics, including bricks that must be tested and monitored throughout production.

The final obstacle is that ML models usually need several iterations – when put into production in a manual, ad-hoc way, they become rigid and difficult to update.

Instead, an MLOps approach should embed all ML assets in a Continuous Integration and Continuous Delivery (CI/CD) pipeline to secure fast and seamless rollouts. All data, features and models should be tested before every new release to prevent quality or performance drift. All stakeholders should work on the same canvas and apply software engineering best practices to data science projects – versioning, deployment environments, testing.

Ultimately, MLOps is the discipline of consistently managing ML projects in a way that's unified with all other production elements. It secures an efficient technical delivery from the use case's early stage (first models) to use case industrialisation.

A FRAMEWORK FOR SUCCESS

AI holds tremendous promise, but also great risk for organisations unable to deploy it properly. The real benefit of the AI Factory model is that it establishes a core framework for swift and successful implementation. Processes, teams and tools are transferable and repeatable by nature, meaning a company can remain agile in pursuing its AI vision. Once the process is established and supported by MLOps, a business has what it needs to become an AI powerhouse.
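As a concrete illustration of the release-testing practice described in the MLOps section above – testing models before every new release to prevent quality or performance drift – here is a minimal sketch of an automated quality gate that a CI/CD pipeline could run before promoting a candidate model. The metric (AUC), the tolerated drop and the function names are assumptions for this example, not a prescribed implementation.

    # Minimal sketch of a pre-release model quality gate for a CI/CD pipeline.
    # The metric, threshold and surrounding pipeline are illustrative assumptions.
    from sklearn.metrics import roc_auc_score

    MAX_ALLOWED_DROP = 0.01  # tolerated AUC drop versus the production model

    def release_gate(candidate, production, X_holdout, y_holdout) -> bool:
        """Return True only if the candidate model may be released."""
        candidate_auc = roc_auc_score(y_holdout, candidate.predict_proba(X_holdout)[:, 1])
        production_auc = roc_auc_score(y_holdout, production.predict_proba(X_holdout)[:, 1])
        # Block the rollout on quality or performance drift.
        return candidate_auc >= production_auc - MAX_ALLOWED_DROP

    # In the CI job, a failing gate stops the deployment step:
    # assert release_gate(candidate, production, X, y), "Drift detected: release blocked"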
CASE STUDY
CARREFOUR
Using AI to drive value in store based on Artefact's AI Factory operating model

CHALLENGES
AI as a corporate strategy.

AI offers incredible opportunities in the retail space. Global retailer Carrefour is going through a digital transformation and has partnered with Google and Artefact to leverage the power of AI and capture value in several departments: assortment, pricing, supply chain, store operations, ecommerce, and marketing.

"We aim to build Artificial Intelligence and Machine Learning solutions to better serve our customers and employees."
Elina Ashkinazi-Ildis — Director, Carrefour-Google Data Lab

Carrefour's ambition is to sift through its vast trove of data (4 billion annual transactions, 1 million daily visits to digital platforms) to identify unaddressed issues, define use cases, scale AI solutions, spread the adoption of AI within the company, and conduct training and upskilling.

"We are really trying to inject innovation, agility, extra-collaboration."
Amélie Oudéa-Castéra — Head of E-Commerce, Data and Digital, Carrefour

Carrefour chose to set up a multidisciplinary hub of internal and external data experts.

40% additional revenue
SOLUTION

AI Factory by Artefact: a robust framework that turns AI technology into valuable AI projects.

RESULTS
CASE STUDY
HEINEKEN
Using the Data Factory methodology as a Revenue Generation Center

CHALLENGES

SOLUTION

Rafael Melo — Partner, ARTEFACT

We implement our Data Factory methodology, built on hybrid teams composed of business experts, data scientists and engineers, to deliver a product that is quickly actionable. The teams are a mix of people from Artefact and HEINEKEN, to help with the data-driven acculturation that was one of HEINEKEN's objectives.

Artefact delivers data products from start to finish: from the business problem prioritization phase and its solution, through data mapping, collection, and exploration, to the creation of a machine learning model and a final product to activate this model. Finally, we test the solution and industrialize the product for larger scopes. For this, we always rely on agile principles: we start with a reduced scope, to quickly show business value to stakeholders, and develop the solution incrementally.

In this partnership with HEINEKEN we created data products across practically the entire value chain: finance, HR, production, distribution & logistics, marketing and trade, as well as sales and e-commerce.
Daniel Guimarães — Logistics & Planning Manager, HEINEKEN BRAZIL

We had a challenge in the area of planning and logistics related to allocating products in distribution centers and making short-term decisions. The challenge was both extracting the information and creating the intelligence to generate the insight needed daily. In this way, we developed a stockout prediction model, which consists of the complete automation of data, modeling, and the creation of a dashboard which generates the necessary insights for our decision making. We started with a few products and a few distribution centers, but quickly saw the value of the solution and scaled to the rest. Today, this model is one of the main decision-making tools in the area.

RESULTS

Fábio Criniti — Data & Analytics Director, HEINEKEN BRAZIL

The biggest benefit of this partnership with Artefact is the speed at which we are able to deliver value to the business, and build a revenue generation center for HEINEKEN. Hybrid teams are able to connect the problem with a data solution very well. For us this is very important, as we were able to prove value and consequently invest more in innovative projects like these.
Data Governance, a prerequisite for AI project success

What are the challenges of data governance today?

The amount of data and the number of use cases around data are constantly increasing. First, companies have to deal with the challenge of getting the most possible value out of their data and democratizing it: good quality, well documented data should be accessible to the end user. Second, a company must ensure that it knows what data is flowing through its infrastructure. It must be fully transparent about what data it is…
…We support companies throughout their data governance implementation, from strategy to deployment. First, we perform an audit to see where they stand, then define a roadmap to identify areas to work on. Finally, we build a data asset structure into data products and help them choose the technical tools they need.

In our consulting approach, we insist on the importance of data as a vector of value for the company; then we work on deployment, quality tool selection and documentation of governance to give substance to the strategy and make it feasible.

We've also set up our own Artefact School of Data, which lets us train data stewards and data owners, essential roles in the implementation of data governance for businesses. Along with this professional training, we also intervene directly in companies to acculturate them to the need for advanced and supported data governance in order to succeed in their AI projects.

What is unique about Artefact's global vision?

Our strength is that we propose a global data governance model, focusing on end-use cases first. We position data governance as an "asset" of this transformation. We're able to transcribe use cases into tangible value and be part of a global transformation program.

We also have multidisciplinary experts. There are about 20 of us in France who specialize in data governance, with profiles from different backgrounds: data product owners who model products, data stewards who document and improve quality, but also data engineers and data analysts.

We also have an ecosystem of technology partners with whom we collaborate in an agnostic way. We're proficient in all the new tools that appear on the market. We have both technical and strategic DNA, and are able to link all of these subjects together to treat them in a holistic and comprehensive way and deploy them to many clients.

Have you got a concrete example of support that you've provided?

We assisted one of our major clients with very extensive data assets in their data transformation. The project concerned a redesign of their data governance. When we arrived in mid-2017, we saw that their governance had been approached from a too-technical and not sufficiently "business" perspective. This resulted in a lack of adoption of the necessary tools. To correct this, we linked their governance to their strategic use cases. To do so, we documented the use cases, democratized their access, and improved data quality to ensure good results. The first pilots were a success! We then faced the challenge of scaling up.

In 2020, we assisted this same company in launching a program that included structuring their data assets into a large "business domain" model, with the choice of tools to operate them, etc.

We're now entering a third phase of industrialization and extension of this AI program. As part of the migration to the cloud, we're analyzing how we can structure, rationalize and pool our data assets. At the moment, we've moved on to the second stage, which consists of structuring our data assets according to these major business families. Next, we're going to start thinking about the development of tomorrow's data products, which will serve different categories of use cases.

What can we expect in the future, once everyone has implemented their data governance?

The availability of data will allow the implementation of even more use cases, particularly in the area of Artificial Intelligence. This will accelerate value creation within organizations. It will also allow us to support all the issues surrounding data democratization and decentralization, especially in terms of bringing data closer to the business. Artefact's mission is to create this bridge between data and business, and we carry it out on a daily basis with our clients. If the data is well structured and clean, if the products are available, and if we have the push-button tools to manipulate them, theoretically in five years, everyone will be able to use data in their daily work!
The vital role data governance plays in achieving sustainability goals

In this article, we will present how to define sustainability goals and how to include them in data governance strategies.

Manuela Mesa — Director, ARTEFACT

Sustainability is a key focus for today's organisations, and with consumers' purchase decisions increasingly based on 'green' credentials, it can be a critical element in remaining competitive. Businesses are starting to improve their sustainable practices by addressing the products and services they provide, the processes they use, the waste they generate as a by-product, and the supply chain that facilitates their operations. But while 90% of executives believe that sustainability is essential, only 60% of organisations have sustainability strategies in place.
1. Financial: Companies can face enormous costs due to environmental risks that affect their supply chain. For example, Unilever estimated an annual loss of €300 million due to climate change endangering agricultural…

3. Customer trust: Today's consumers are actively choosing brands based on their ethical behaviour and their initiatives linked to sustainability and climate change – although 48% of UK adults say they do not trust the information companies…

Sustainability strategies and goals are crucial for companies, and if reliable data isn't available and accessible, their societal, environmental and legal requirements won't be met. Companies cannot implement sustainability strategies without data governance that offers transparent and valuable data for better data-driven decisions.

Data governance: what it is and why every company needs it
CASE STUDY
PIERRE & VACANCES CENTER PARCS
How data governance and data quality can boost digital marketing and activation performance

To meet the objectives of PVCP, it was necessary to:

• Prioritise Data Quality and Governance above all other subjects to work efficiently and reliably in order to have an immediate impact on the business

• Form a committed SWAT team composed of experts ready to work hand-in-hand with PVCP on complex subjects in order to correct current data quality issues – but also to prevent new ones

• Create a new Data Steward role with a network of SPOCs (Single Points of Contact) to reduce quality problems and produce high-performance analytics available to all departments.

"The objectives of the data quality project with Artefact were simply to have better cohesion in the quality of the data and to set up processes to help us be more efficient in the way we deal with the different subjects," clarifies Julien Soulard, PVCP's newly-appointed Tracking & Data Collection Specialist.
SOLUTION
RESULT
Data Mesh: Principles, promises and realities of a decentralized data management model

On 27 September at the Big Data & AI Paris 2022 Conference, Justine Nerce, Data Consulting Partner at Artefact, and Killian Gaumont, Data Consulting Manager at Artefact, along with Amine Mokhtari, Data Analytics Specialist at Google Cloud, conducted a Data Mesh Workshop. Data mesh is one of the hottest topics in the data industry today. But what is it? What are its business benefits? And above all, how can companies successfully deploy it across their organizations?

Data mesh is a new organizational and technological model for decentralized data management. A distributed architecture approach for managing analytical data, it allows users to easily access and query data where it resides, without first transporting it to a data lake or warehouse. Data mesh is based on four core principles:

• Domain-oriented data ownership,
• Data as a product,
• Self-serve data infrastructure as a platform,
• Federated computational governance.
3. Technology stack: Why choose
Google as a technology solution?
1. How to define data domains and associated responsibilities?
• Map systems and processes and the people attached to these systems
• Map business uses around data (BI solutions, applications) and the main users
• Establish a first mapping and a common vocabulary

2. How to measure success on a first domain?
• Business use: increase in the average number of users of data products
• Time to insight: time to deploy a new data product or access the information made available

3. When and how to scale up the model?
Several signs that scaling up is possible:
• If the first data products are widely distributed and reusable
• When a data domain needs a new data product
• When a domain has the resources to build its own feature teams and products
Deploying data mesh across the enterprise

Artefact's approach to data mesh deployment starts small, by prioritizing the business's use cases and pain points. All the domains and data products needed for each prioritized business use case (from raw data to finished products) are then identified. A feature team is assembled to develop the first products and set standards. Then, related products to be built in the future can be identified.

There are three prerequisites for data mesh deployment. The first: breaking down silos.

"If data mesh is to be a success, we must move towards an organizational model that breaks down the silos between IT, data and business to have platform teams composed of cross domain and cross product teams, across all entities", says Killian. "It won't happen overnight, obviously. But we've already begun breaking down silos by integrating business teams into IT data teams so that product teams developing data products can work more efficiently."

The second prerequisite is the Data Product Owner, who plays a key role in coordinating data mesh implementation. The data product owner has three missions: to design, build and promote data products. The first two missions are self-explanatory; the third is equally important, as the strength of a data product lies in the fact that it is adopted and used by the business. "The data product owner is responsible for ensuring that the data product is documented, understandable and accessible to users, and aligned with business needs. The criteria of his success are his KPIs: usage, technical performance, data quality", adds Killian.

The last prerequisite is that the business be able to clearly and continuously define its data domains and, once the model has proven its value, be capable of scaling up.

These are three of the most frequently asked questions by clients implementing data mesh, along with Artefact's recommendations for successfully defining domains, measuring success, and knowing when it's opportune to scale up.

The tech stack: managing data mesh with Google Cloud

"The first thing data and IT teams need to implement data mesh is the ability to make their data discoverable and accessible by publishing it in a data catalog", begins Amine Mokhtari. "To achieve this, Google has a first pillar, BigQuery, which enables the creation of shareable datasets."
The deployment approach used by Artefact clients consists of demonstrating the value of the model on an initial perimeter or domain:

1. Prioritization of business use cases and pain point analysis
2. Identification of domains and data products needed for prioritized business use cases (from raw data to finished product)
3. Staffing of 1st feature teams, development of 1st products, creation of 1st standards & alignment with tools
4. Within a domain, identification of related products to be built, opening of new domains, adaptation of standards
"The second pillar, the catalog itself, is made possible by Analytics Hub, which creates links to all the datasets created by various members of the organization or its partners so that subscribers may easily access them."

"It's important to understand that only links to data are made – never copies. Thanks to this system, subscribers can use data as if it belongs to them, even though it remains in its original physical location. This remains true even when you have data sets stored in a different cloud", assures Amine.

User experience is a major principle of the system and is reflected in all aspects of data mesh, not only in facilitating data sharing and data composition, but by keeping data permanently available, no matter how many users are active.

As for data security and governance, Google has it covered with Dataplex, its intelligent data fabric that helps unify distributed data and automate data management and governance across that data to power analytics at scale. Along with an Identity and Access Management (IAM) framework to assign a unique identity to each data consumer, "Dataplex offers companies a set of technical pillars that allow them to carry out any implementation of governance in the simplest way possible", explains Amine. "At Google Cloud, our aim is to provide you with a serverless data platform that will allow your data teams to focus on areas such as processes and business use cases, where they have added value no one else can produce."

Google's Dataplex gives users a 360° view of published data products and their quality.

Conclusion: three pitfalls to avoid when implementing data mesh

DON'T > Stay stuck in a project vision instead of a product vision
DO > Define priority data products according to different uses

DON'T > Scale up the new model too rapidly
DO > Test the model with a well-defined operating model

DON'T > Deploy an overly complex technical ecosystem
DO > Keep the tech stack small to bring as many players as possible on board.
AI Industry Solutions
Demand Forecasting
Or a product missing from the store
might actually be in stock – just
not yet out on the shelves. Big box
retailers often struggle to restock in
real time, so an instantly popular item
might disappear from the shelves
very quickly, and thus not perform
as well as expected, despite it being
available in inventory. This calls for
technology that can help retailers
seamlessly align supply and demand.
Using machine learning and
multiple signals to assess
inventory levels
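As a purely illustrative sketch of the approach this heading names – combining multiple signals in a machine learning model – a gradient-boosted regressor can learn demand from sales history, promotions, weather and stock levels. The tiny dataset, the column names and the model choice are assumptions for the example, not the article's actual system.

    # Illustrative sketch: forecasting demand from multiple signals.
    # Data, column names and model choice are assumptions, not the article's system.
    import pandas as pd
    from sklearn.ensemble import GradientBoostingRegressor

    history = pd.DataFrame({
        "sales_last_week": [120, 95, 130, 80, 110],
        "on_promo": [1, 0, 1, 0, 0],
        "avg_temperature": [18.0, 21.5, 17.2, 25.0, 19.3],
        "stock_on_hand": [300, 150, 280, 90, 200],
        "units_sold": [135, 90, 150, 70, 105],  # target: demand to predict
    })

    features = ["sales_last_week", "on_promo", "avg_temperature", "stock_on_hand"]
    model = GradientBoostingRegressor().fit(history[features], history["units_sold"])

    # Forecast demand for a store/product with known upcoming signals.
    next_week = pd.DataFrame([{"sales_last_week": 100, "on_promo": 1,
                               "avg_temperature": 20.0, "stock_on_hand": 250}])
    print(model.predict(next_week[features]))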
CASE STUDY
L'ORÉAL — Trend Detection

Charles Besson, Global Social Insights & AI Director at L'Oréal, and Fabrice Henry, Managing Partner at Artefact, discuss how L'Oréal Trend Detection, deployed with Artefact's AI trend detection solution, is predicting what cosmetics products consumers are going to want tomorrow.
CHALLENGES
SOLUTION

A co-creation that can forecast emerging consumer trends.

L'Oréal's ambitious project needed to go a step further with AI technology, in comparison with traditional market research models. That's where Artefact leveraged its advanced expertise in digital marketing and data science to help L'Oréal detect and predict new trends emerging in the digital space. Because discovering what consumers want – almost before they know they want it – is the Holy Grail every marketer seeks.

Developing an innovative and reliable trend prediction solution was both exciting and challenging for the project team. As soon as they started brainstorming, they realised that tracking influencers wasn't the answer.

"Sure, when Kim Kardashian wears a new lipstick, everyone starts buying the same colour, but by then it's already too late. The million-dollar question is: what happens before that?", explains Fabrice Henry, Managing Partner at Artefact. "So we went deeper, and asked upstream questions to find out where trends originate and how they propagate. Once a trend is born, how does it spread? Does it spread differently according to geography or community? What are the big sources – YouTube, blogs, Instagram, Facebook, etc. – from which data can be extracted in order to train algorithms?"

"We found different approaches for each of these subjects and proposed a final one to Charles Besson. And that's where our collaboration began, where we started this project," adds Fabrice.

The project co-created by L'Oréal and Artefact was based on three key success factors: co-development of an employee-centric solution, validation of the solution via an MVP (Minimum Viable Product) prior to scaling, and strong collaboration based on trust – a vital element when sharing sensitive information with your partners.

RESULT

This project is a predictive intelligence machine with three main components:

• DETECT: Using Natural Language Processing (NLP) algorithms, this feature can digest a database composed of millions of documents and extract weak signals – keywords that are relevant but rare (e.g. emerging terms) in the beauty domain.

• PREDICT: Once new, atypical, or relevant beauty terms have been detected, we have to see if they have staying power. To find out, we train machine learning algorithms based on predictive variables that have reliably demonstrated whether a given trend is going to grow or not, using factors such as number of mentions, commitment score, co-occurrence of author citations, etc.

• ILLUSTRATE: Building a number of visualisations to demonstrate the power of the trend, along with a variety of contextual elements (brands or authors that were talking about it, articles and visuals that mentioned it…), and letting all of this appear in the tool's interface.

"I'm really happy with it. We've launched a beta version, and if all goes well, the next steps will be the launch, adoption, and training. We've already had lots of positive feedback!", concludes Charles Besson, Global Social Insights & AI Director at L'Oréal.
Scoring Customer Propensity using Machine Learning Models on Google Analytics Data

A deep dive on how we built state-of-the-art custom machine learning models to estimate customer propensity to buy a product, using Google Analytics data.

• Propensity modeling can be used to increase the impact of your communication with customers and optimize your advertising budget spending.

• Google Analytics data is a well structured data source that can easily be transformed into a machine-learning-ready dataset.

• Backtests on historical data and technical metrics can give you a first sense of your model's performance, while live tests and business metrics will allow you to confirm your model's impact.

• Our custom machine learning model outperformed existing baselines during live tests in terms of ROAS (Return On Advertising Spend): +221% vs a rule-based model and +73% vs off-the-shelf machine learning (Google Analytics session quality score).

This article assumes basic fundamentals in machine learning and marketing.

What is propensity modeling?

Propensity modeling is estimating how likely a customer is to perform a given action. There are several actions that can be useful to estimate:

• Purchasing a product
• Churn
• Unsubscription
• etc.

In this article we will focus on estimating the propensity to purchase an item on an e-commerce website.

But why estimate propensity to purchase? Because it allows us to adapt how we want to interact with a customer. For example, suppose we have a very simple propensity model that classifies customers as "Cold", "Warm" and "Hot" for a given product ("Hot" being the customers with the highest chance of buying and "Cold" the least). Based on this classification, you can have a specific targeted response for each class. You might want to have a different marketing approach with a customer that is very close to buying than with one who might not even have heard of your product. Also, if you have a limited media budget, you can focus it on customers that have a high likelihood to buy and not spend too much on the ones that are long shots.
(Figure: a simple rule-based decision flow classifying customers as Hot, Warm or Cold.)
This simple type of rule-based classification can give good results and is usually better than not having any, but it has several limitations:

• It is likely not exploiting all the data you have at your disposal, whether it be more precise information on the customer journey on your website, or other data sources you may have at your disposal, like CRM data.

• While it seems obvious that customers classified as "Hot" are more likely to purchase than "Warm", who are more likely to purchase than "Cold", this approach does not give us any specific figures on how likely they are to purchase. Do "Warm" customers have a 3% chance to purchase? 5%? 10%?

• Using simple rules, the number of classes you can obtain is limited, which limits how customized your targeted response can be.

To cope with those limitations we can use a more data-driven approach: use machine learning on our data to predict a probability of purchase for each customer.

Understanding Google Analytics data

Google Analytics is an analytics web service that tracks usage data and traffic on websites and applications. Google Analytics data can be easily exported to BigQuery (Google Cloud Platform's fully managed data warehouse service), where it can be accessed via an SQL-like syntax.

Note that the BigQuery export table with Google Analytics data is a nested table at session level:

• Sessions are a list of actions a specific customer does within a given timeframe. They start when a customer visits a page and end after 30 minutes of inactivity.

• Each customer can have several sessions.

• Each session can be made of several hits (i.e. events), and each hit can have several attributes or custom metrics (this is why the table is nested; for instance, if you want to look at the data at hit level you will need to flatten the table).

For example, in this query we are only looking at session-level features:

    SELECT
      visitId,
      fullVisitorId,
      totals.hits,
      totals.pageviews,
      totals.timeOnSite,
      device.browser,
      geoNetwork.country,
      device.operatingSystem,
      channelGrouping
    FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801`
    WHERE totals.hits > 1

And in this query we have used an UNNEST function to query the same information at hit level:

    SELECT
      visitId,
      hits.hitNumber,
      hits.page.hostname,
      hits.page.pagePath,
      hits.eventInfo.eventAction
    FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801`,
      UNNEST(hits) AS hits
    WHERE totals.hits > 1

Note that our project was developed on GA360, so if you are using the latest version, GA4, there will be some slight differences in the data model; in particular, the table will be at event level. There are public sample tables of GA360 and GA4 data available on BigQuery.

Now that we have access to our raw data source, we need to perform feature engineering before we can feed our table to a machine learning algorithm.
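Before moving on to feature engineering, the query output can be pulled into a DataFrame for further processing, for example with the google-cloud-bigquery client library. This is a minimal sketch that assumes authentication, project configuration and the pandas extras are already in place:

    # Minimal sketch: load the session-level query into a pandas DataFrame.
    # Assumes Google Cloud credentials and project are already configured.
    from google.cloud import bigquery

    client = bigquery.Client()

    sql = """
    SELECT visitId, fullVisitorId, totals.hits, totals.pageviews,
           totals.timeOnSite, device.browser, geoNetwork.country,
           device.operatingSystem, channelGrouping
    FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801`
    WHERE totals.hits > 1
    """

    sessions = client.query(sql).to_dataframe()
    print(sessions.shape)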
Crafting the right features

The aim of the feature engineering step is to transform the raw Google Analytics data (extracted from BigQuery) into a table ready to be used for machine learning.

GA data is very well structured and requires minimal data cleaning. However, there is still a lot of information present in the table, much of which is not useful for machine learning or cannot be used as is, so selecting and crafting the right features is important. For this, we built the features that seemed most correlated with buying a product.

We crafted 4 types of features:

Features used in the model

Note that we are computing all these features at customer level, which means that we are aggregating information from multiple sessions for each customer (using the fullVisitorId field as a key).

GENERAL FEATURES

General features are numerical features that give general information about the session. Note that bounce rate is defined as the percentage of times the customer visited only one webpage during a session.

It was also important to include information on the recency of events: for instance, a customer who just visited your website is probably more keen to purchase than one who visited it 3 months ago. For more information on this topic, you can check the theory on RFM (recency, frequency, monetary value). So we added a feature, Recency since last session = 1 / Number of days since last session, which allows the value to be normalized between 0 and 1.

FAVORITE FEATURES

We also wanted to include some information on the key categorical data available, such as browser or device. Since that information is at session level, there can be several different values for a single customer, so we only keep the one that occurs most often per customer (i.e. the favorite). Also, to avoid having categorical features with too high a cardinality, we only keep the 5 most common values for each feature and replace all the other values with an “Other” value.
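As an illustration, the sketch below shows what these first two feature families can look like in pandas, assuming a flattened session-level table with illustrative column names (the real pipeline works on the BigQuery export described above):

import pandas as pd

# Illustrative session-level table: one row per session, keyed by customer
sessions = pd.DataFrame({
    "fullVisitorId": ["a", "a", "b"],
    "session_date": pd.to_datetime(["2017-07-01", "2017-07-30", "2017-06-15"]),
    "pageviews": [3, 8, 1],
    "device_browser": ["Chrome", "Chrome", "Safari"],
})
today = pd.Timestamp("2017-08-01")

# General features: aggregate sessions at customer level
general = sessions.groupby("fullVisitorId").agg(
    n_sessions=("session_date", "count"),
    total_pageviews=("pageviews", "sum"),
    last_session=("session_date", "max"),
)
# Recency since last session = 1 / number of days since last session
general["recency"] = 1 / (today - general["last_session"]).dt.days
general = general.drop(columns="last_session")

# Favorite features: most frequent value per customer, after collapsing
# everything outside the 5 most common values into "Other"
top5 = sessions["device_browser"].value_counts().nlargest(5).index
sessions["device_browser"] = sessions["device_browser"].where(
    sessions["device_browser"].isin(top5), "Other")
favorite = sessions.groupby("fullVisitorId")["device_browser"].agg(
    lambda s: s.mode().iloc[0])

features = general.join(favorite.rename("favorite_browser"))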
PRODUCT FEATURES

While the first two types of features are definitely useful in helping us answer the question “Is a customer going to buy on my website?”, they are not specific enough if we need to know “Is the customer going to buy a specific product?”. To help answer this question, we built product-specific features that only include the product for which we are trying to predict the purchase.

For Recency since last session with at least one interaction with this product, we use the same formula as for the session recency in the General Features. However, we can have cases where there are 0 sessions with at least one interaction with the product, in which case we fill the value with 0. This makes sense from a business perspective, since our highest possible value is 1 (when the customer had a session since yesterday).

SIMILAR PRODUCT FEATURES

In addition to looking at the customer’s interactions with the product for which we are trying to predict the probability of purchase, knowing that the customer interacted with other products of similar function and price range can definitely be useful (i.e. substitute products). For this reason, we added a set of Similar Product features that are identical to the Product features, except that we also include similar products in the variable scope. The similar products for a given product were defined using business inputs.

We now have our feature-engineered dataset on which we can train our machine learning model.

Training the model

Since we want to know whether a customer is going to purchase a specific product or not, this is a binary classification problem.

For our first iteration, we did the following to create our machine learning dataset (which was 1 row per customer):

• Compute the features using the sessions in a 3-month time window for each customer.

• Compute the target using the sessions in a 3-week time window subsequent to the feature time window. If there is at least one purchase of the product in the time window, the target is equal to 1 (defined as Class 1); otherwise the target is equal to 0 (defined as Class 0).

• Split the data between a train set and a test set using an 80/20 random split.

However, some first data exploration quickly showed that there was a strong class imbalance issue: the Class 1 / Class 0 ratio was over 1:1000 and we did not have enough Class 1 customers. This can be very problematic for machine learning models.

To cope with these issues, we made several modifications in our approach:

• We switched the target variable from making a purchase to making an add to cart. Our model loses a bit in terms of business meaning, but increasing the volume of Class 1 more than compensates.

• We trained the model on several shifting windows, each of 3 months + 3 weeks, instead of a single one. In addition to increasing our volume of data, this improves the generalization capacity of the model by training on various times of the year where customers can have different purchase behaviors. Note that, due to this, the same customer can be present several times in the dataset (at different periods). To avoid data leakage, we make sure that each customer is always either in the training set or in the test set.

• We undersampled our Class 0 so that the Class 1 / Class 0 ratio is 1. Undersampling is a good solution to deal with the class imbalance issue, compared to other options such as oversampling or SMOTE, because we were already able to increase the volume of Class 1 considerably with the first two changes. Only the training set is rebalanced, since we want the test set to have the same class ratios as the future data we will test the model on. Note that we tested higher ratios such as 5 or 10, but 1 was optimal in model evaluation.

Using this dataset, we tested several classification models: linear model, random forest and XGBoost, finetuning hyperparameters using grid search, and ended up selecting an XGBoost model.
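A hedged sketch of this training step (the shifting-window dataset construction is omitted, and the table below is illustrative rather than our production code):

import pandas as pd
from sklearn.model_selection import GridSearchCV, train_test_split
from xgboost import XGBClassifier

# `features`: customer-level table with numeric feature columns (categoricals
# one-hot encoded, e.g. via pd.get_dummies) and a binary `target` column
# (1 = at least one add to cart in the 3-week window after the feature window)
X, y = features.drop(columns="target"), features["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Undersample Class 0 in the training set only, down to a 1:1 ratio
train = X_train.assign(target=y_train)
pos = train[train["target"] == 1]
neg = train[train["target"] == 0].sample(n=len(pos), random_state=42)
balanced = pd.concat([pos, neg]).sample(frac=1, random_state=42)

# Grid search over a few XGBoost hyperparameters
grid = GridSearchCV(
    XGBClassifier(eval_metric="logloss"),
    param_grid={"max_depth": [3, 6], "n_estimators": [100, 300]},
    scoring="average_precision", cv=3)
grid.fit(balanced.drop(columns="target"), balanced["target"])

proba = grid.predict_proba(X_test)[:, 1]  # probability of add to cart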
Evaluating our model

When evaluating a propensity model, there are two main types of evaluations that can be performed:

• Backtest evaluation
• Livetest evaluation

BACKTEST EVALUATION

First, we performed a backtest evaluation: we applied our model to past historical data and checked that it correctly identifies customers who are going to perform an add to cart. Since we are using a binary classifier, the model produces a probability score between 0 and 1 of being Class 1 (add to cart).

A first step in evaluating a binary classification model is to create a confusion matrix and compute the precision and recall (or their combined form, the F1 score). However, there are two issues with these simple metrics:

• Some can be hard to interpret because the dataset is imbalanced (for instance, the precision metric will generally be very low because we have so few Class 1 customers).

• They require deciding on a probability threshold to discriminate between Class 0 and Class 1.

Confusion matrix example for our class-imbalanced problem

So we decided to use two metrics that were more interpretable:

• PR AUC: the area under the precision-recall curve. Essentially, this metric gives a global evaluation over every possible threshold. It is well suited to unbalanced datasets where the priority is to maximize precision and recall on the minority class, Class 1 (contrary to its cousin, the ROC AUC).

• Uplift: we sort customers by their probability score and divide the results into 20 ventiles. Uplift is defined as the Class 1 rate in the top 5% divided by the Class 1 rate across the whole dataset. So, for instance, if we have a 21% add-to-cart rate in the top 5% of the dataset versus a 3% add-to-cart rate in the whole dataset, we have an uplift of 7, which means our model is 7 times more effective than a random model.

Results on those metrics were rather positive; in particular, uplift was around 13.5.

Backtest evaluation is a risk-free method for a first assessment of a propensity model, but it has several limitations:

• Since it is only done on the past, the model output is not actually being used to impact the media budget strategy.

• With our metrics, we only assessed whether the model was able to correctly identify customers who would make an add to cart; we did not assess how the identification of those customers would generate a sales uplift.

LIVETEST EVALUATION

So, to get a better idea of our model’s business value, we need to perform a livetest evaluation. Here we activate our model and use it to prioritize advertising budget spending.

The results we obtained on the livetest were very solid:

• Compared to a simple rule-based approach to evaluating propensity, our model’s ROAS was +221%.

• We also compared our performance to a strong contender in the form of Google’s Session Quality Score, a score provided by Google in the Google Analytics dataset; in that case, our model was still at +73% ROAS. This shows how a custom ML approach can bring considerable business value.
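For reference, a compact sketch of the two backtest metrics described above, computed on the held-out test set from the previous snippet (average_precision_score is the usual scikit-learn approximation of PR AUC):

import numpy as np
from sklearn.metrics import average_precision_score

pr_auc = average_precision_score(y_test, proba)  # PR AUC

# Uplift: Class 1 rate in the top ventile (top 5% of scores)
# divided by the overall Class 1 rate
order = np.argsort(proba)[::-1]
top = order[: max(1, len(order) // 20)]
uplift = y_test.iloc[top].mean() / y_test.mean()

print(f"PR AUC = {pr_auc:.3f}, uplift = {uplift:.1f}")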
Conclusion
AI Industry
Solutions
AI for Call Centre
Powering your call centre with artificial intelligence

The recent advancements in technology and artificial intelligence enable call centres to profit from productivity gains, improve customer retention and create additional revenue.
4 — Seamless integration into your legacy system

For optimal performance, the Google Contact Centre AI must be integrated into the call centre workflow, and work with the existing databases and documentation (via APIs) and the front desk interfaces.

Organizations need to bring together a multidisciplinary team to achieve this project, according to their needs and their own IT architecture.

Before being set up, a chatbot needs to be fed with customer interaction data. The bot needs to be trained by listening to and analyzing past customer interactions. That will enable the virtual assistant to provide value immediately, with high levels of customer satisfaction. Existing data can be emails, chat messages or voice calls. The data will help train the model according to the customer journey and the expected optimisations.

In the event that a business doesn’t have any data to analyze or precise use cases to aim for, it is possible to implement a working solution by asking each caller a prompt such as “Could you please tell us the reason for your call?” and then letting them access the traditional customer experience journey. By analyzing the initial answer and the human agents’ interactions, the artificial intelligence will be trained to qualify future interactions.

5 — Why rely on a partner

Before taking on the task of implementing a virtual assistant solution into your architecture, it could be useful to bring in an experienced partner that can assist you in the different steps of the project and help you maximise value.

Artefact has been helping clients in various industries turbocharge their call centres with artificial intelligence. We provide assistance in different ways:

• Identification and prioritization of use cases
• Setup and training of artificial intelligence solutions
• Development of integrations to collect relevant data

Our company has extensive experience working with both partners and service providers from the digital, data and artificial intelligence industries. Artefact’s method is centred on feature teams, composed of members with complementary skills, from business…
CASE STUDY

Using topic modelling to reduce contact centre bottlenecks

CHALLENGES

MAIF is a French mutual insurer with more than 3 million members. One of the challenges facing its customer services team was managing the volume of calls coming into its call centre — on average, some 8 million a year.

With no way of vetting calls before they reached an operator, the team was wasting precious time responding to questions customers could easily find the answers to on the MAIF website.

To improve efficiencies, we needed to filter out unnecessary calls and free up more time for MAIF’s customer service teams to process more complicated requests.
SOLUTION

RESULTS
Using NLP to extract quick and valuable insights from your customers’ reviews

• Unsupervised data exploration
• Analyzing correlation between ratings and predefined business themes
• Data mining or sentiment analysis is more exploratory: it will find out what matters most, and what the main reasons driving a review to be positive or negative could be.

• Themes impact is used to associate score distributions with already defined business concepts (zoom, battery, …).
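As a hedged illustration of the exploratory side, a small scikit-learn topic model can surface what reviews talk about without any labels (the review snippets below are invented):

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

reviews = ["battery dies after one day", "great zoom and sharp pictures",
           "zoom is blurry in low light", "battery life is excellent"]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(reviews)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Top words per discovered topic (e.g. a "battery" theme vs. a "zoom" theme)
words = vec.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    print(i, [words[j] for j in topic.argsort()[::-1][:3]])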
Get a global look at the data you have collected

Whenever you’re starting a new data project, the first step is always to get the global picture on the data you have (is it imbalanced? is there enough data? are there a lot of missing values?).

Number of reviews per product category
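In practice, a few pandas calls answer these first questions; a minimal sketch, assuming a hypothetical reviews file with category, rating and review_text columns:

import pandas as pd

df = pd.read_csv("reviews.csv")  # hypothetical export of collected reviews
print(df["category"].value_counts())              # reviews per product category
print(df["rating"].value_counts(normalize=True))  # are ratings imbalanced?
print(df.isna().mean())                           # share of missing values per column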
WordCloud
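A word cloud like the one above can be produced with the open-source wordcloud package; a minimal sketch reusing the hypothetical df from the previous snippet:

import matplotlib.pyplot as plt
from wordcloud import WordCloud

text = " ".join(df["review_text"].dropna())
wc = WordCloud(width=800, height=400, background_color="white").generate(text)
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()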
CASE STUDY
CHALLENGES
Present in France for 20 years, HomeServe is the world leader in home insurance services, with 8 million customers and over one billion in revenue.

When it comes to home emergencies, the most common channel used by customers is the phone – 9 out of 10 customers prefer it. This particularity places the call centre at the heart of every step of the insurance value chain, from sales to customer service and, ultimately, assistance.

Although HomeServe has already developed AI-based conversational solutions and is present on Google Assistant and Amazon Alexa, they wanted to explore new ways in which AI could improve efficiency and customer experience in their existing phone channel. They were especially interested to see what impact speech analytics could have on the vast amounts of unexploited customer data they collected.

SOLUTION

Because we couldn’t build the entire architecture right away, we needed to quickly demonstrate the value of speech analytics to all stakeholders via a minimum viable product (MVP), able to expand after its validation with business experts.

To do this, we analysed two high-value use cases in a four-week cross-company workshop:

1. Refining understanding of customer contact root causes
2. Detecting risks of non-compliance within sales calls

We developed several microservices for data collection and processing, packaged so that these use cases could be developed further and reused in the future, should the MVP phase prove successful.
RESULTS
AI Industry
Solutions
Data for Finance & Industry
to scale AI

Athena Sharma
Consulting Director & Global Financial Services Lead
ARTEFACT

According to The Economist, some 54% of large financial institutions (FIs) had already adopted artificial intelligence back in 2020, so imagine where those numbers stand today. To add to that proliferation, 86% of financial executives say that they plan on increasing AI investment through 2025. And in another survey, 81% said that unlocking value from AI would be the key differentiator between winners and losers in the banking industry.

“There’s clearly a very strong value case to be made for AI in financial institutions”, said Athena. “Investment banks are perhaps the earliest adopters and beneficiaries of machine learning technology in the algorithmic trading space. After all, 70% of FIs now use machine learning for fraud detection, credit scoring or predicting cash flow events, and conversational AI is commonly used in retail banking and insurance. Yet despite this, many FIs fall short when it comes to productionising their AI projects to deliver concrete, enterprise-wide value.”

Athena explained the main challenges to AI project success and how to overcome them:

• Number one requires investing in core technology and data management.
• Number two involves implementing a future-oriented operating model.
• Number three concerns proactively considering AI ethics and regulation.

Investing in core technology and data management

For Athena, one of the key difficulties FIs face is that their core technology is built for traditional operations, such as payments, lending and claims management. “Legacy IT stacks don’t have the flexibility to deploy AI skills. The computational capacity for data management and analytics you need in a closed loop VR application just isn’t there, and testing and developing AI technologies can take days or even months – prohibitive when you’re trying to be innovative. The solution? The cloud brings the time it takes to test and develop AI solutions down to a few minutes, thanks to managed services”, assures Athena. “A bank I worked with started transitioning into the cloud two years ago, and their innovation rate has increased by about 49% according to their own KPIs. That might seem small, but for an incumbent, monolithic institution, it’s quite revolutionary.”

Another facet of this challenge is investing in data management – both in terms of data quality and data access. In FIs, data is siloed across various business units and divisions. As a result, data isn’t standardised, quality is hard to manage, and there’s no single source of truth, so stakeholders are unsure if the underlying data of proposed projects is trustworthy. “Investment in modern data governance and data management practices is crucial for FIs”, insists Athena. “And a key component of that is what we call an Enterprise Data Model, or EDM. It’s not an IT concept, but a way of describing and logically organising your data – all of your data – in business-relevant language – a kind of business glossary, if you will, that streamlines data quality management for all certified users.”

The final part of this challenge is data access. “Data is the most valuable raw material any organisation possesses; key to leveraging its value is to have access to analytics at scale, at the point of decision making. It’s especially difficult in banks due to data confidentiality. An innovative solution is to create API-enabled databases for more effective and secure data access, at scale and in real time, to fulfil your business objectives.”
Regulatory restrictions are to be
imposed on anyone who uses any
software associated with biometric
technology in financial institutions,
human capital management or credit
assessment of individuals. As things
stand, this will affect almost all FIs.
While the full extent of future AI
regulation is not yet clear to anyone,
what is evident is that regulations will
be ethics-based. But many leaders in
the financial services industry feel
their companies don’t understand
the ethical issues associated with AI.
Gaining buy-in for data & analytics initiatives in financial services
Speak the language of the business

The majority of data leaders have a technical background and possess a genuine passion for their area of expertise. Nonetheless, it is key to resist the urge to drop complex data concepts when pitching a data transformation proposal to business leadership. Business leaders do not always understand data jargon, and might be thrown off by a technical approach to solving their problems. They care about business outcomes and impacts: increasing sales, reducing operating costs, freeing up human resources and mitigating risks. Be sure to speak their language!

EXAMPLE
If you are trying to secure buy-in to launch an enterprise-wide data transformation project:
– Do not: talk about microservice architectures, data observability, domain-driven ownership or common data governance principles.
– Instead: reference the cross-selling opportunities, the time saved by teams cleaning data and the opportunity to reduce customer churn.

Advanced Analytics in service of your strategy
DEFENSE: Risk Monitoring & Mitigation • Regulatory Compliance • Reduce Costs
OFFENSE: Increased Revenue • More Retention • Competitive Advantage

Bring business and technology onboard

Just as our Renaissance hero combined art and science, modern-day Da(ta) Vincis need to recruit experts from across the business to instill a data-driven culture into an organization. Understanding the strengths of colleagues – and what motivates them – can help pave the way for effective collaboration, for example:

> Business talent is needed to spearhead the vision and quantify the value impact:
• Pique their interest with compelling use cases through a new data platform.
• Make it clear they would own the roadmap.

> Engineering & IT excel at building complex tools:
• To ensure they are onboard and invested, offer them the opportunity to roll out the latest data & AI technology.

EXAMPLE
Let’s consider a scenario where you are trying to build a next generation Customer Data Platform and need to attract talent from elsewhere in the business to drive the project:
• Have a clear plan of who you need and what they can do.
• Understand what will drive and motivate individual experts to work on the project.
• Provide incentives to work with the data team; make sure everyone benefits.
Business could be hooked by the use cases available through a new data platform. They would own the roadmap. IT would love to roll out the latest data & AI tech.
SITUATION
Poor data quality | Ineffective data products | No buy-in from leadership

TOP-DOWN
CDO team engaged business analysts to understand the impacts caused by poor data quality

BOTTOM-UP
Hybrid pods • Sprint retrospectives • Ask-me-anything hours • Coffee chats • Surveys • Slack
Aim big, start small

LAYING THE GROUNDWORK FOR SUCCESS

The Mona Lisa wasn’t painted in a day. Data leaders won’t be able to reorient their whole company to make it data-driven overnight. Have your Da(ta) Vincis invest their energy and resources into first sketching out the story. Use this to build a high-value use case that highlights all the advantages of leveraging your data insights.

Discovering analytics gold

To complement a strong case – or generate buy-in from the business without one – identifying existing pots of “analytics gold” in your organization is a good approach. There are almost certainly passionate individuals in the business who have developed impactful analytics solutions locally. Find them and industrialize their initiatives so the whole company can benefit.

• Business user-friendly tools
Using a lean data visualization POD team, we built and deployed an automated B2B Sales Power BI dashboard in three weeks, leveraging the existing data ecosystem with no disruption.

Our client reorganized their sales operations to leverage data from this dashboard. Marketing now wanted a dashboard to visualize omni-channel campaign results on a unified interface. We helped the client build a BI center of excellence to backlog, prioritize and execute on enterprise needs.
FIND

• Organize Slack/Teams channels for idea sharing.
• Build an enticing platform (newsletter, monthly data meetup, …) so people will want to share their ideas.

Data & analytics leaders, true Renaissance thinkers

Being a remarkable “sales” person is a quality that makes one stand out, no matter the profession. However, being remarkable at “selling” is less about the act of sales and more about the ability to solve complex problems. The role of a data leader encompasses all of these traits. They need to be an outstanding salesperson, a smooth operator, and possess the ability to understand their organization’s numerous challenges.

DATA & ANALYTICS LEADER: SPEAK THE LANGUAGE OF THE BUSINESS • BRING BUSINESS AND TECHNOLOGY ONBOARD • SELL & CONVINCE • DISCOVER ANALYTICS GOLD • EXECUTE & SCALE
The road ahead: data-driven sales is critical for the evolving car industry

Axel Tasciyan
Data Consulting Director, Automotive Lead
ARTEFACT

The commercial model for selling cars is shifting, with manufacturers adopting the direct-to-consumer trend (also known as the agency model) being witnessed by a wide variety of business sectors as people look to simplify the buying process.
Sell more
Sell better
behavior, it identified the people with the highest propensity to buy a vehicle; targeting these new audiences drove a conversion rate six times higher than the previous tactic of re-engaging website visitors, while reducing the cost per acquisition by 80 percent. More advanced activity will include using increasing amounts of first-party data to steer advertising campaigns.

Seek out new sales streams

Extending the car-buying experience gives manufacturers the opportunity to build up a picture of their customers — the environment they live in (rural or urban), the type of car they need, how they use it and potentially where they go.

Anonymizing these details and adding them to the vast amounts of first-party data that the industry already owns puts manufacturers in charge of a lucrative, mainly unrealised revenue stream that can be unlocked via data partnerships, using creative thinking to make money from everything they know about the people that buy their products, in a way that enhances the customer experience.

The data ‘glue’

So what is the link that holds together the modus operandi outlined above? Data. Data, data and more data.

This calls for a change of mindset in the car industry. Instead of one organization selling new cars, another selling used ones, another offering spare parts and yet another selling services, with each keeping their data separate, the vision of tomorrow relies on one central data hub that benefits everyone. Sometimes referred to as a Customer Data Platform (CDP), today’s cloud solutions for the different tech components keep costs manageable, and make it feasible and relatively uncomplex for most enterprises.

With a CDP established, data management and analysis can deliver relevant insight that adds value and generates revenue opportunities throughout the complete customer lifetime.

However, this isn’t a “tick box” exercise; the work of a data-marketer is never done… Data, and especially business data, is continually added to the CDP, and with it the picture of customers becomes clearer, richer and more granular — giving manufacturers and dealers ever more accurate tools with which to sell more, sell smart and develop new sales streams.
INTERVIEW
Pascal: OK, can you tell us how you are using data? How are you becoming customer-centric?

Dév: Over my career, I have come to hold a firm belief that the amount of data available will always be ahead of the organisational appetite to consume and action those insights. This is not a bad or a good thing, it’s just reality. Sometimes the battle is simply because the data conflicts with the intended course of action you wanted to take.

But more importantly, it’s when analysis does not present clear implications or recommendations — if the data is not telling you what to do next, it’s pretty useless. This is where the word ‘utilisation’ becomes relevant; it shifts the focus from how many people are using the data to its utility. We’ve been leading the adoption of data within our organisation by keeping a clear definition of its usefulness for our markets, business functions and digital teams.

At one end is the input data, like dashboards and the support systems that enable upstream planning and decision making; at the other end is output data, the predictive models and the data science that show the operational results of our initiatives. What links the two together is a data-driven, hypothesis-led test-and-learn culture.

Pascal: Can you give us an example of a data project that has resonated at scale within your organisation?

Dév: In one way it’s been our entire journey for the last two or three years… developing CEDAR, our internal brand for the data analytics function. It stands for Customer Experience Data Analytics and Reporting. It helps us turn data into information, knowledge and wisdom. That’s where the actionability and usability come in.

For the first couple of years, we were kind of hardcore, getting everything organised. The first ‘Eureka!’ moment was when the whole organisation realised they could look at the data and see consumer trends across 147 markets. Rather than make assumptions based on samples of 5,000 people, say, we can now see how one million visitors interact, every single minute of every single day.

A couple of years ago, we launched a new car in a specific market and spotted a lot of cross-segment buying going on. We could see when customers changed their minds from the model we thought they would buy to the one they actually did. We could also determine who would buy an automatic transmission or a manual transmission, for example. We can now consume data and see significant patterns in the consumer buying journey. We want to know what our customers want and respond to that.

Pascal: Can you tell me more about this CEDAR dashboard — this BI hub? Why does it exist?

Dév: It wasn’t the first thing we built. We started by focussing on data: what questions can we ask the data? What answers will the data give? We got our heads around that in the first year of the program. Then it was about ‘democratising’ this data. We realised that having the data was not the challenge, but making sense of it…
Data for
Impact
81 Use data to measure and reduce your
environmental impact with Artefact
83 Industrializing carbon footprint measurement
to achieve neutrality
85 Applying machine learning algorithms to
satellite imagery for agriculture applications
Use data to measure and reduce your environmental impact with Artefact

Companies cannot always know, for example, how their customers will use their products once they’ve purchased them, or what impact this will have on the environment. To better estimate this impact, companies can turn to Artefact’s data experts, who advise large companies on how to turn data into business value and focus more on the potential of data to positively impact the environment.
Data governance for energy
sobriety in digital activity
As with many of the challenges facing
companies, the creation of a reliable,
sustainable database is an essential
prerequisite for implementing a
strategy to reduce environmental
impact. Artefact’s sustainable data
governance offer enables its clients
to benefit from a clean and structured
data repository, a real added value for
the organization.
The climate emergency has become a major issue for our society. Recent events, in particular the multiple shortages and repeated heat waves, only confirm the acceleration of current and future difficulties that must be overcome. Today, many European companies listed on the stock exchange are announcing their commitment to climate transition. 30% have made a real commitment to reduce their carbon emissions, but it is estimated that only 5% of them are on track to do so. It is not a simple exercise. Reducing emissions in a sustainable way requires accurate measurement of the carbon footprint, in order to develop concrete actions. At Artefact, we believe that exploiting data to its full potential is a major asset for the success of this approach.

Achieving carbon neutrality with three objectives thanks to data

Let’s take the example of the Carrefour group, for whom we are carrying out an assignment. Carrefour’s ambition is to become the world leader in food transition, particularly in e-commerce. One of its major objectives is to make e-commerce carbon-neutral by 2030. Three main levers of action have been identified in order to reach these objectives: reducing Carrefour’s own emissions, engaging its service providers to reduce their emissions, and finally encouraging its customers to adopt eco-responsible behaviors. This ambition, in addition to responding to the climate emergency, also has a strong economic impact.

We must meet the expectations of consumers, who are increasingly committed, and anticipate the tightening of the legislative framework to come, such as the eco-score that will become mandatory from 2023 for certain players. In order to face these challenges, Carrefour understood that it was necessary to have a measure of its carbon footprint: to make a quantified inventory of the starting point, to determine the impact of reduction initiatives, and to be able to communicate both internally and externally on the successes, as well as on the challenges to come. This measure will be the compass for the 2030 neutrality trajectory. It will have to meet the requirements of reliability and transparency, and allow for the implementation of concrete actions.
We are undoubtedly at a crossroads as far as the ecological transition in companies is concerned. The successive disasters of the summer of 2022 are helping to accelerate this awareness, while a new generation of workers who are highly aware of these issues is entering the job market. Nearly 76% of Gen Yers place CSR above salary in their job search criteria, and 70% are willing to pay up to 35% more for a sustainable, low-carbon product or service.

The major challenge of prioritizing data

A large part of the project’s efforts consisted of collecting a large amount of very heterogeneous data from multiple sources (for example, mileage data of delivery services or IT infrastructure emissions data), in order to orchestrate them and build a consolidated carbon footprint measurement. The goal is to obtain a comprehensive measurement of all emission items for each individual order. The main difficulty with any project of this type is the complexity of accessing data that can be used quickly. Most large groups have already launched significant programs to better govern their data, addressing quality and accessibility issues first. These programs are often very large, and obviously cannot handle all the data created in a company. Prioritization of the data domains closest to the core business is necessary, such as sales, supplier or consumer data.

“Reducing your emissions in a sustainable and lasting way requires you to accurately measure your carbon footprint.”

Unfortunately, data related to sustainable development is rarely prioritized in such initiatives, as it is rarely used in an industrial way by large groups. Today, a team of experts needs several weeks of project time to calculate a carbon footprint measurement that is often static. It is certain that tomorrow, all companies will have to be able to calculate this carbon footprint at any time, in the same way that companies are required to be financially transparent.

The parallel with the data market

We can take the parallel further with the evolution of the data market. Ten years ago, awareness of data in large companies was still limited. It was the exclusive territory of small teams within the IT or digital departments who worked on use cases, without the capacity to bring their solutions to scale. Today, the importance of data is heard at the executive committee level of large groups, and is perceived as a strategic priority at all levels. This evolution has been, over the last ten years, the result of a collective awareness of the importance of data, notably through geopolitical and strategic issues, as well as tensions between major powers and large technology groups. This awareness has gradually taken hold in all organizations, even those less advanced in digital technology. It has been accelerated by the arrival of new generations (millennials) in decision-making positions, who have been aware of digital issues since their childhood.

Measuring the carbon footprint of all activities

This evolution is not going smoothly, and the use of data does not always give the expected results, often because robust foundations have not been put in place. The major groups have now understood the importance of this fundamental work and are launching numerous programs on the subject.

The market is still at this stage: there is a strong will to move forward, but the foundations needed to achieve these goals in a sustainable way often have to be built, which Carrefour has understood well. It is therefore crucial that companies equip themselves with the capabilities and tools to match their ambition, in particular the measurement of the carbon footprint of all their activities. This measurement must be industrialized, calculated in real time, accessible and integrated into all business processes. For example, the carbon footprint could be integrated into budgets and used to assess the impact of new projects, along with the revenues generated and the associated CapEx and OpEx costs.

Consolidating data governance

Once these foundations are built and consolidated, large corporations will be able to leverage their data much better to accelerate their green transition. Strong data foundations are a major prerequisite for deploying AI solutions at scale; the same is true for the green transition, where AI will certainly play a role once these foundations are consolidated. It’s often more appealing to talk about AI than data governance, but I am convinced that the success of these initiatives lies in the ability to move forward on both fronts: delivering impact through targeted initiatives, while building the right foundations to sustain those impacts.
Applying machine learning algorithms to satellite imagery for agriculture applications

Satellite imagery lends itself to a variety of business applications. Computing the number of plots, their average size, the density of vegetation, the total surface area of specific crops, and plenty more indicators could serve various purposes. For example, public organizations could use these metrics for national statistics, while private farming companies could use them to estimate their potential market with a great level of detail.
Solution 1A
TRAINING A PIXEL CLASSIFIER
The first solution for detecting
agricultural zones on large images
is to build a pixel classifier. For
each pixel, this machine learning
model would predict whether this
pixel belongs to a forest, a city,
water, a farm … and therefore, to an
agricultural zone or not.
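A minimal sketch of such a pixel classifier, here with a random forest on synthetic band values (real work would use labelled satellite bands and a held-out test area):

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-ins: `image` is (H, W, B) band values, `labels` is (H, W)
# land-cover classes, e.g. 0=forest, 1=city, 2=water, 3=farm
H, W, B = 64, 64, 4
rng = np.random.default_rng(0)
image = rng.random((H, W, B))
labels = rng.integers(0, 4, size=(H, W))

X = image.reshape(-1, B)  # one sample per pixel, one feature per band
y = labels.reshape(-1)
clf = RandomForestClassifier(n_estimators=100).fit(X, y)

pred = clf.predict(X).reshape(H, W)
agricultural_mask = pred == 3  # pixels predicted as farmland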
Solution 1A pros:
Solution 1A cons:
Solution 1B

MAPPING GEO COORDINATES TO PIXEL COORDINATES

If coordinates for your zone of interest have been labeled, or if you’re labeling coordinates yourself, it is possible to map these geo coordinates (latitude and longitude) to your images.

For example, if you have the coordinates associated with large farming areas, or if you draw large polygons on Google Maps yourself, you can easily obtain geo coordinates of agricultural areas. Then, all there is to do is map those coordinates to your satellite images and filter your images to only cover the zones within your polygons.

Solution 1B pros:
• Also a reliable method

Solution 1B cons:
• You need a list of coordinates associated with agricultural regions
• Manually creating those coordinates can be time-consuming
Visual representation of the NDVI on an agricultural zone and a desert (Copernicus Sentinel data 2019)

Solution 1C

USING A VEGETATION INDEX

It is possible to compute a vegetation index from the color bands provided by the satellite images. A vegetation index is a formula combining multiple color bands that is often highly correlated with the presence or density of vegetation (or other indicators, such as the presence of water).

Multiple indices exist, but one of the most commonly used in an agricultural context is the NDVI (Normalized Difference Vegetation Index). This index is used to estimate the density of vegetation on the ground, which could serve to detect agricultural areas over a large image.

After computing NDVI values for each pixel, you can set a threshold to quickly eliminate pixels with no vegetation. We used NDVI as an example, but experimenting with various indices could help achieve better results.

Note that computing a vegetation index can provide you with useful information to enrich your analysis, even if you have already implemented another way to detect agricultural areas.

Solution 1C pros:
• Absolutely no labelled data required

Solution 1C cons:
• Not very accurate: for example, it could be hard to differentiate agricultural crops from forests
• The thresholds have to be fine-tuned depending on climate and other specificities
In the absence of labeled data, we decided to go for an unsupervised approach based on OpenCV’s Canny edge detection. Edge detection consists of looking at a specific pixel and comparing it to the ones around it. If the contrast with neighboring pixels is high, then the pixel can be considered an edge.

Illustration of the full process of outlining plots (Copernicus Sentinel data 2019)
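A minimal OpenCV sketch of this step ("plots.png" is a placeholder for a grayscale image crop; the thresholds are illustrative and worth tuning):

import cv2

img = cv2.imread("plots.png", cv2.IMREAD_GRAYSCALE)
img = cv2.GaussianBlur(img, (5, 5), 0)  # smoothing reduces spurious edges
edges = cv2.Canny(img, threshold1=50, threshold2=150)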
Experimenting on contrast, saturation or sharpness can help improve the efficiency of the edge detection (Copernicus Sentinel data 2019)

OPTIMIZING THE PERFORMANCE OF THE EDGE DETECTION ALGORITHM

Forcing convex shapes fits most plots much better (Copernicus Sentinel data 2019)
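One way to implement the convex-shape idea, sketched with OpenCV contours on the edge map from the previous snippet (the minimum-area filter is illustrative):

import cv2

contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
hulls = [cv2.convexHull(c) for c in contours if cv2.contourArea(c) > 100]

outlined = cv2.cvtColor(edges, cv2.COLOR_GRAY2BGR)
cv2.drawContours(outlined, hulls, -1, (0, 255, 0), 2)  # draw convex plot outlines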
When working with farm plots, just a few weeks can make a large difference (Copernicus Sentinel data 2019)
WE OFFER END-TO-END
DATA & AI SERVICES
FMCG • RETAIL & ECOMMERCE • LUXURY & COSMETICS • HEALTHCARE • BANKING & INSURANCE • TELECOMMUNICATIONS • SPORTS & ENTERTAINMENT • TRAVEL & TOURISM • PUBLIC & GOVERNMENT • REAL ESTATE • MANUFACTURING & UTILITIES
CONTACT
hello@artefact.com
artefact.com/contact-us
ARTEFACT HEADQUARTERS
19, rue Richer
75009 — Paris
France
artefact.com