A Brief History of AI: How To Prevent Another Winter (A Critical Review)
Amirhosein Toosi
atoosi@bccrc.ca

Andrea Bottino
andrea.bottino@polito.it

Babak Saboury
University of Pennsylvania
Arman Rahmim
Departments of Radiology and Physics
University of British Columbia
Vancouver, BC
arman.rahmim@ubc.ca
September 6, 2021
Abstract
The field of artificial intelligence (AI), regarded as one of the most enigmatic areas of science, has witnessed exponential growth in the past decade, with a remarkably wide array of applications that have already impacted our everyday lives. Advances in computing power and the design of sophisticated AI algorithms have enabled computers to outperform humans in a variety of tasks, especially in the areas of computer vision and speech recognition. Yet AI’s path has never been smooth, having essentially fallen apart twice in its lifetime (‘winters’ of AI), both times after periods of popular success (‘summers’ of AI). We provide a brief rundown of AI’s evolution over the course of decades, highlighting its crucial moments and major turning points from inception to the present. In doing so, we attempt to learn from the past, anticipate the future, and discuss what steps may be taken to prevent another ‘winter’.
Keywords Artificial intelligence · machine learning · deep learning · artificial neural networks · perceptron
1 Introduction
Artificial Intelligence (AI) technology is sweeping the globe, leading to bold statements by notable figures: “[AI] is
going to change the world more than anything in the history of mankind” [1], “it is more profound than even electricity
or fire" [2], and "just as electricity transformed almost everything 100 years ago, today I actually have a hard time
thinking of an industry that I don’t think AI will transform in the next several years” [3]. Every few weeks there is
news about AI breakthroughs. Deep-fake videos are becoming harder and harder to tell apart from real ones [4, 5]. Intelligent algorithms are beating humans at an ever greater variety of games, with ever greater ease. For the first time in history, DeepMind’s AlphaGo has beaten the world champion at Go, arguably the most complex of board games. AI has been around for decades, enduring “hot and cold” seasons, and like any other field in science, AI developments indeed stand on the shoulders of giants (see figure 1). With this in mind, this article aims to provide a picture of what AI essentially is and the story behind this rapidly evolving and globally engaging technology.
Figure 1: Milestones on which AI stands, spanning the foundations of logic and mathematics (George Boole’s Boolean logic, Gottlob Frege’s first-order logic, Gerolamo Cardano’s probability, computability, incompleteness, tractability and NP-completeness), philosophy and psychology (dualism, empiricism, cognitive psychology, intelligence augmentation), early machines (the water clock of ca. 250 BCE as the first self-controlling machine with constant flow control, Charles Babbage’s universal computation machine, Alan Turing’s code-deciphering work and the first operational and electronic computers of the World War II era by John Mauchly and J. Presper Eckert), decision, utility, and game theory, and the growth of computer hardware, software, operating systems, and programming languages.
2 What is AI?
AI is “the theory and development of computer systems able to perform tasks normally requiring human intelligence,
such as visual perception, speech recognition, decision-making, and translation between languages” [6]. Marvin
Minsky, the American mathematician, computer scientist, and famous AI practitioner, defined AI as “the science of making machines do things that would require intelligence if done by men” [7]. John McCarthy, who coined the term “artificial intelligence” in 1956, described it as “the science and engineering of making intelligent machines”. IBM
suggests that “Artificial intelligence enables computers and machines to mimic the perception, problem-solving, and
decision-making capabilities of the human mind” [8]. McKinsey & Company explains it as a “machine’s ability to
mimic human cognitive functions, including perception, reasoning, learning, and problem-solving” [9].
Russell and Norvig [11] proposed four conceivable approaches to AI: Acting Humanly, Thinking Humanly, Acting Rationally, and Thinking Rationally (see figure 2). The British mathematician Alan Turing published a paper in 1950 (“Computing Machinery and Intelligence” [12]) in which he proposed a tool to determine whether a task has been performed by a person or a machine. This test, known as the Turing Test, consists of a series of questions to be answered. A computer passes the test if a human interrogator cannot tell whether the answers to the questions come from a person or a computer. As such, to pass the test, the computer is required to have a number of essential capabilities: natural language processing, to manage natural and effective communication with human beings; knowledge representation, to store the information it receives; automated reasoning, to answer questions and update its conclusions; and machine learning, to adjust to new situations and recognize new patterns. In Turing’s view, a physical simulation of a human is totally irrelevant to demonstrating intelligence. Other researchers, however, have suggested a complete Turing test [13, 14, 15] that involves interaction with real-world objects and people. Hence, the machine should be equipped with two additional (and vital) capabilities to pass this “extended” version of the Turing test: computer vision and speech recognition, to see and hear the environment; and robotics, to move around and interact with the environment [10].
3 History of AI
The field of AI has experienced extreme ascents and descents over the last seven decades. These recurring ridges of great promise and valleys of disappointment, referred to as AI’s summers and winters, have divided the history of AI into three distinct cycles (see figure 3). These cycles and seasons are discussed in the following sections.
When science fiction writer Isaac Asimov introduced his laws of robotics in 1942 (later collected in his timeless book "I, Robot"), he likely did not imagine that this work would, some 80 years later, become a primary source for defining the laws governing human-robot interactions in modern AI ethics. While Asimov’s novels (figure 4) are often considered the birthplace of the idea of intelligent machines [16], McCulloch and Pitts’ paper, “A Logical Calculus of the Ideas Immanent in Nervous Activity”, published in 1943 [17], was the first step towards the implementation of AI [18, 19, 20, 21].
Figure 3: Dividing the history of AI into three recurring cycles of public excitement over time: the birth of AI at the Dartmouth Conference and the first summer of AI (AI as search algorithms), the first winter of AI (the Lighthill report and global funding cuts), the second summer of AI (expert systems and the first commercialized expert system), the second winter of AI, and the third summer of AI (the machine learning era, Deep Blue beating Garry Kasparov, AlexNet winning the ImageNet ILSVRC).
Figure 4: (a) Isaac Asimov, the well-known sci-fi writer. (b) I, Robot, Asimov’s sci-fi book series [22].
Based on Alan Turing’s “On Computable Numbers” [23], their model provided a way to abstractly describe brain functions and demonstrated that simple elements connected in a neural network can have enormous computational power. The paper received little attention until John von Neumann, Norbert Wiener, and others applied its concepts. The “McCulloch-Pitts” neuron was the first mathematical model of an artificial neural network. This model, inspired by the basic physiology and function of the brain’s neurons, showed that essentially any computable function could be modeled as a connected network of such neurons [10]. Building on this work six years later, Donald Hebb proposed a simple learning rule to tune the strength of the connections between neurons [24]. His learning method, known as “Hebbian learning” [25], is considered the inspiration for learning in neural networks. Building upon these works, one year later, in the summer of 1950, two Harvard undergraduate students, Marvin Minsky and Dean Edmonds, built the first analog neural net machine, called SNARC [26]. SNARC stands for “stochastic neural-analog reinforcement calculator”; it was based on a network of 40 interconnected artificial hardware neurons built using 3,000 vacuum tubes and the remains
of a B-24 bomber’s automatic pilot mechanism. SNARC was successfully applied to find the way out from a maze (See
figure 5).
Figure 5: One of the 40 nodes constituting the stochastic neural-analog reinforcement calculator (SNARC) [27].
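To make the two ideas above concrete, the following minimal Python sketch (illustrative only, not taken from the original papers) implements a McCulloch-Pitts-style threshold neuron together with a simple Hebbian weight update, in which a connection is strengthened whenever its input is active while the neuron fires.

import numpy as np

def mcculloch_pitts(inputs, weights, threshold):
    # Fire (output 1) if the weighted sum of inputs reaches the threshold.
    return 1 if np.dot(inputs, weights) >= threshold else 0

def hebbian_update(weights, inputs, output, lr=0.1):
    # Hebb's rule: strengthen connections whose input is active when the neuron fires.
    return weights + lr * output * np.asarray(inputs, dtype=float)

# A two-input neuron acting as a logical AND gate (weights and threshold chosen by hand).
w = np.array([1.0, 1.0])
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    y = mcculloch_pitts(x, w, threshold=2.0)
    w = hebbian_update(w, x, y)
    print(x, "->", y)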
AI developed significantly from the studies of Alan Turing (figure 6), who, despite his short life, is considered in all respects one of the fathers of AI. Although Turing owes much of his fame to the work he did at Bletchley Park to decode German communications during World War II, his remarkable work on the theory of computation dates back to a paper he published when he was only 24 [23]. Turing demonstrated that his “universal computing machine” could perform any imaginable mathematical computation if it could be represented as an algorithm. John von Neumann stated that Turing’s paper laid the groundwork for the central concept of modern computers. A few years later, in 1950, in his paper entitled “Computing Machinery and Intelligence” [12], Turing raised the fundamental question “Can a machine think?”. The imitation game, or Turing test, evaluates the ability of a machine to “think”: a human interrogator is asked to distinguish between a machine’s written answers and those of a human (figure 6). The machine is considered intelligent if the interrogator cannot tell whether the answers were given by a human or a machine [28].
Before the term AI was coined, many works were pursued that were later recognized as AI, including two checkers-playing programs developed almost at the same time in 1952 by Arthur Samuel at IBM and Christopher Strachey at the University of Manchester.
The term AI was coined around six years after Turing’s paper [12], in the summer of 1956, when John McCarthy, Marvin Minsky, Claude Shannon, and Nathaniel Rochester brought together researchers sharing an interest in automata theory, neural networks, and cognitive science for a two-month workshop at Dartmouth College. There, the term “artificial intelligence” was coined by McCarthy, who defined AI as “the science and engineering of making intelligent machines”, emphasizing the parallel growth of computers and AI. The conference is sometimes referred to as the “birthplace of AI” because it coordinated and energized the field [10], and this time is considered the beginning of an era called “The First Summer of AI”.
One of the direct outcomes of the Dartmouth Conference was the work of Newell and Simon. They presented a mathematics-based system for proving symbolic logic theorems, called the Logic Theorist (LT), along with a list processing language for writing it called IPL (Information Processing Language) [29]. Soon after the conference, their program was able to prove most of the theorems (38 out of 52) in the second chapter of Whitehead and Russell’s “Principia Mathematica”; in fact, for one of the theorems it found a proof shorter than the one in the text. Newell and Simon later released their General Problem Solver (GPS), which was designed to mimic the problem-solving protocols of the human brain [30]. GPS is counted as the first work in the “thinking humanly” framework of AI.
Figure 6: (a) Alan Turing. (b) Schematic of the Turing Test, in which an evaluator questions a computer and a human without knowing which is which.

Using reinforcement learning, Arthur Samuel’s 1956 checkers player quickly learned to play at an intermediate level, better than its own developer [31]. Reinforcement learning is a type of AI algorithm in which an agent learns how to interact with its surrounding environment to achieve its goal through a reward-based system. He demonstrated
his checkers-playing program on television, making a great impression [32]. His program is considered the first reinforcement learning-based AI program, and indeed the forefather of later systems such as TD-GAMMON in 1992, one of the world’s best backgammon players [33], and AlphaGo in 2016, which shocked the world by defeating the human world champion of Go [34]. A turning point in AI, and specifically in neural networks, occurred in 1957 when the psychologist Frank Rosenblatt (considered a father of deep learning [35]) built the Mark I Perceptron at Cornell [36]. He built an analog neural network with the ability to learn through trial and error. More precisely, the perceptron was a single-layer neural network able to classify its input data into one of two categories. The network produces a prediction, say “left” or “right”, and if the prediction is incorrect, it adjusts itself to be more accurate the following time; accuracy thus increases with each iteration. A 5-ton, room-sized IBM 704 computer was fed a large stack of punch cards (figure 7) and, within 50 attempts, learned to distinguish cards marked on the left from cards marked on the right. The Mark I Perceptron is considered one of the forefathers of modern neural networks [37].
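The trial-and-error learning described above can be summarized by the classic perceptron update rule, w ← w + η(y − ŷ)x, where the weights change only when the prediction ŷ differs from the target y. The short Python sketch below (an illustration under simplified assumptions, not Rosenblatt’s original hardware) trains a single-layer perceptron on a toy, linearly separable “left vs. right” problem.

import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=20):
    # Weights plus a bias term, initialized to zero.
    w = np.zeros(X.shape[1] + 1)
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = 1 if np.dot(xi, w[1:]) + w[0] >= 0 else 0
            # Perceptron rule: adjust weights only when the prediction is wrong.
            w[1:] += lr * (target - pred) * xi
            w[0] += lr * (target - pred)
    return w

# Toy data: label is 1 when the first coordinate exceeds the second ("left" vs. "right").
X = np.array([[2.0, 1.0], [3.0, 0.5], [1.0, 2.5], [0.5, 3.0]])
y = np.array([1, 1, 0, 0])
print("learned weights:", train_perceptron(X, y))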
In 1958, John McCarthy introduced LISP, the first high-level AI-specific programming language, which remained the prevailing AI programming language for the next three decades. In his paper entitled “Programs with Common Sense”, he proposed a conceptual approach for AI systems based on knowledge representation and reasoning [39]. The year 1958 is also important because the first experimental work involving evolutionary algorithms in AI [40] was conducted by Friedberg towards automatic programming [41]. Nathaniel Rochester and Herbert Gelernter of IBM developed a geometry-theorem-proving program in 1959. Their AI-based program, called the “geometry machine”, was able to provide proofs for geometry theorems that many math students found quite tricky [42]. Written in FORTRAN, the “geometry machine” is regarded as one of the first AI programs that could perform a task as well as a human. Another important event of the early 60s was the emergence of the first industrial robot: named “Unimate”, the robotic arm was employed on an assembly line at General Motors in 1961 for welding and other metalwork [43]. In 1962, Bernard Widrow and Frank Rosenblatt revisited Hebb’s learning method: Widrow enhanced it in his network called Adaline [44], and Rosenblatt in his well-known perceptrons [36]. In 1963, Marvin Minsky proposed a simplification approach for AI use cases [45]. Minsky and Seymour Papert suggested that AI studies concentrate on designing programs capable of intelligent behavior in smaller artificial environments. The so-called blocks world, which consists of colored blocks of different shapes and sizes arranged on a flat surface, became the subject of many studies, and the resulting framework, called Microworld, became a backbone for subsequent works. Instances are James Slagle’s SAINT program of 1963 for solving closed-form calculus integration problems [46], Tom Evans’s ANALOGY of 1964 for solving the geometric problems of IQ tests [47], and the STUDENT program, written in LISP by Daniel Bobrow in 1964, for solving algebra problems [48]. ELIZA, the first chatbot in the history of AI, was developed by Joseph Weizenbaum at MIT in 1966. ELIZA was designed to
serve as a virtual therapist, asking questions and providing follow-ups in response to the patient [49]. SHAKEY, the first general-purpose mobile robot, was developed at the Stanford Research Institute in 1966, with the ability to reason about its surrounding environment [50].
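ELIZA worked by matching keywords in the user’s input against a small script of rules and echoing back canned, therapist-like prompts. The toy Python sketch below illustrates this keyword-matching idea; the rules are invented for illustration and are not Weizenbaum’s original DOCTOR script.

import re

# A few invented keyword -> response rules in the spirit of ELIZA's DOCTOR script.
RULES = [
    (r"\bI am (.*)", "Why do you say you are {0}?"),
    (r"\bmy (mother|father)\b", "Tell me more about your {0}."),
    (r"\bbecause\b", "Is that the real reason?"),
]
DEFAULT = "Please go on."

def eliza_reply(text):
    # Return the response of the first rule whose pattern matches the input.
    for pattern, template in RULES:
        match = re.search(pattern, text, re.IGNORECASE)
        if match:
            return template.format(*match.groups())
    return DEFAULT

print(eliza_reply("I am feeling tired"))       # -> Why do you say you are feeling tired?
print(eliza_reply("It is because of my job"))  # -> Is that the real reason?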
The hype and high expectations fostered by the media and the public on the one hand, and the false predictions and exaggerations of experts in the field on the other, led to major cuts in AI research funding in the late 60s. Governmental organizations such as the Defense Advanced Research Projects Agency (DARPA) had granted enormous funds to AI research projects during the 60s. Two reports brought about major halts in support for this research: the US government’s ALPAC report in 1966 [51], and the British government’s Lighthill report in 1973 [52]. These reports targeted the research pursued in AI, particularly work on artificial neural networks, and came to a grim prediction about the technology’s prospects. As a result, both the US and the UK governments started to reduce support for AI research at universities. DARPA, which had previously funded various research projects with few conditions, now required specific timelines and concise descriptions of each proposal’s deliverables. These events slowed the advancement of AI and ushered in the first AI winter, which lasted until the 1980s.
It is crucial to recognize the three key factors behind this major halt in AI research. First, many early AI systems pursued the “thinking humanly” approach to problem solving: instead of starting from a thorough analysis of the task, devising a possible solution, and turning it into an implementable algorithm, they relied merely on replicating the way humans perform the task. Second, there was a failure to recognize the complexity of many of the problems. Encouraged by the simplified frameworks proposed by Marvin Minsky, most early problem-solving systems succeeded mainly on toy (simplistic) problems, combining simple steps to arrive at a solution; many of the real-world problems that AI was attempting to solve were in fact intractable. It was commonly assumed that “scaling up” to bigger problems was merely a matter of faster hardware and higher memory capacity, but developments in computational complexity theory proved this assumption wrong. The third factor was the negative perception of neural networks and the limitations of their fundamental structures. In 1969, Minsky and Papert pointed out the limited representational abilities of a perceptron (to be exact, a single-layer perceptron cannot implement the classic XOR logical function); despite not being a general critique of neural networks, this also contributed to global cuts in neural network research funding.
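The XOR limitation can be seen with a short, standard argument. For a single-layer perceptron with weights $w_1, w_2$ and threshold $\theta$ to compute XOR, the four input cases would require

$0\cdot w_1 + 0\cdot w_2 < \theta$   (output 0 for input (0,0))
$1\cdot w_1 + 0\cdot w_2 \ge \theta$   (output 1 for input (1,0))
$0\cdot w_1 + 1\cdot w_2 \ge \theta$   (output 1 for input (0,1))
$1\cdot w_1 + 1\cdot w_2 < \theta$   (output 0 for input (1,1))

Adding the two middle inequalities gives $w_1 + w_2 \ge 2\theta$, while the first and last give $\theta > 0$ and $w_1 + w_2 < \theta$; together these force $2\theta < \theta$, i.e. $\theta < 0$, a contradiction. No choice of weights and threshold works, because the two output classes of XOR are not linearly separable.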
Mainstream AI research efforts during the previous two decades had generally been based on so-called weak methods: general solutions based on search algorithms over a space of all possible states, built upon basic reasoning steps. Despite being general-purpose, these approaches suffered from a lack of scalability to larger or more complex domains. To address these drawbacks, researchers turned to a more powerful approach utilizing domain-specific knowledge for stronger reasoning in narrower areas of expertise. The new approach, so-called “expert systems”, originated at Carnegie Mellon University and quickly found its way into corporations. DENDRAL [53], created at Stanford by Ed Feigenbaum, Bruce Buchanan, and Joshua Lederberg in the late 60s and early 70s to infer molecular structure from mass spectrometry data, was an early success story. DENDRAL was the first effective knowledge-intensive system: its expertise derived from a large number of special-purpose rules rather than from basic, general-purpose knowledge.
In 1971, at Stanford University, Feigenbaum started the Heuristic Programming Project, aimed at extending the areas in which expert systems could be applied. The MYCIN system, developed in the mid-70s by Edward Shortliffe under the supervision of Bruce Buchanan and Stanley Cohen for the diagnosis of blood infections, was one of the successful results of this new wave. MYCIN could identify the bacteria causing sepsis and recommend antibiotic dosages based on patient weight. It performed diagnosis on par with human experts in the field, and significantly better than medical interns, benefiting from a knowledge base of around 600 rules deduced from extensive interviews with experts and combined by means of uncertainty calculations [54].
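The flavor of such knowledge-based reasoning can be illustrated with a toy rule engine in Python. This is a deliberately simplified sketch with invented rules and MYCIN-style certainty factors, not MYCIN’s actual knowledge base: each rule maps observed findings to a conclusion with an attached certainty, and evidence from several rules supporting the same conclusion is combined.

# Toy rules: (required findings, conclusion, certainty factor). Invented for illustration.
RULES = [
    ({"gram_negative", "rod_shaped"}, "e_coli", 0.6),
    ({"gram_negative", "grows_aerobically"}, "e_coli", 0.4),
    ({"gram_positive", "clustered_cocci"}, "staph", 0.7),
]

def combine(cf_old, cf_new):
    # MYCIN-style combination of two positive certainty factors.
    return cf_old + cf_new * (1.0 - cf_old)

def diagnose(findings):
    certainties = {}
    for required, conclusion, cf in RULES:
        if required <= findings:  # all of the rule's findings are present
            certainties[conclusion] = combine(certainties.get(conclusion, 0.0), cf)
    return certainties

print(diagnose({"gram_negative", "rod_shaped", "grows_aerobically"}))
# e_coli with combined certainty of about 0.76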
Meanwhile, one of the most important moves towards deep convolutional neural networks (CNNs) happened in 1980, when the “neocognitron”, the first CNN architecture, was proposed by Fukushima [55]. Fukushima suggested several learning algorithms to train the parameters of a deep neocognitron so that it could learn internal representations of input data. This work is in fact regarded as the origin of today’s deep CNNs.

R1, described by McDermott in 1982, was the first successful commercial expert system, employed at Digital Equipment Corporation to configure orders for new computer systems [56]. Within about four years, R1 was saving the company an estimated 40 million dollars a year. By 1988, most major corporations in the U.S. were engaged with expert systems, either as users or by doing research in the field [57]. The application of expert systems to real-world problems resulted in the development of a wide range of representation and reasoning tools. The Prolog language gained popularity in Europe and Japan, while the PLANNER language family thrived more in the US. In Japan, the government started a ten-year plan to keep up with the new wave by investing more than 1.3 billion dollars in intelligent systems. In the U.S., the Microelectronics and Computer Technology Corporation (MCC) was established in 1982 as a research consortium, reviving AI research in hardware, chip design, and software. A similar change happened in the UK, where funds previously cut were reinstated. All of these events during the 80s led to a new “summer” for AI. The AI industry thrived on billions of dollars invested in the field, booming from a few million dollars in 1980 to billions of dollars in the late 80s, with hundreds of companies building expert systems, vision systems, robots, and software and hardware specialized for these purposes.
Despite all the efforts and investments made during the early 80s, many companies could not fulfill their ambitious promises, and hardware manufacturers failed to keep up with the specialized requirements of expert systems. The thriving expert systems industry of the early 80s thus declined tremendously and inevitably collapsed by the end of the decade, and the AI industry faced another winter, which lasted until the mid-90s. This second “winter” in the history of AI was so harsh that AI researchers subsequently tended to avoid even the term “AI”, choosing alternative labels such as “informatics” or “analytics”. Despite the broad shutdown of AI research, the second winter was the period in which the well-known backpropagation algorithm was revisited by many research groups [58, 59]. Backpropagation, a primary learning mechanism for artificial neural networks, was widely applied to learning problems during these years and eventually led to a new wave of interest in neural networks.
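As a reminder of what the algorithm does, the sketch below (a minimal NumPy illustration, not any specific historical implementation) trains a tiny two-layer network by computing the output error and propagating its gradient backward through the layers to update the weights.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 2))                         # toy inputs
y = (X[:, 0] * X[:, 1] > 0).astype(float)[:, None]   # a nonlinear target
W1, W2 = rng.normal(size=(2, 8)), rng.normal(size=(8, 1))
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for _ in range(2000):
    # Forward pass through the two layers.
    h = sigmoid(X @ W1)
    out = sigmoid(h @ W2)
    # Backward pass: propagate the squared-error gradient layer by layer.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient-descent weight updates (averaged over the batch).
    W2 -= lr * (h.T @ d_out) / len(X)
    W1 -= lr * (X.T @ d_h) / len(X)

print("training accuracy:", ((out > 0.5) == y).mean())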
The lessons learned during AI’s winters made researchers more conservative. As a result, during the late 80s and the 90s, the field of AI research witnessed a major shift towards more established, statistics-based methods. Among the theories finding their way into the field were Hidden Markov Models (HMMs) [60]. Being strictly mathematical and trained extensively on large real-world datasets, HMMs became a trusted framework for AI research, especially in handwriting recognition and speech processing, helping these applications make their way back to industry. Another important outcome of this conservative shift was the development of public benchmark datasets and related competitions in AI’s various sub-fields. Instances include the Letter Dataset [61], the Yale Face Database [62], the MNIST dataset [63], the Spambase Dataset [64], the ISOLET Dataset [65], TIMIT [66], the JARtool experiment Dataset [67], the Solar Flare Dataset [68], the EEG Database [69], the Breast Cancer Wisconsin (Diagnostic) Dataset [70], the Lung Cancer Dataset [71], the Liver Disorders Dataset [72], the Thyroid Disease Dataset [73], the Abalone Dataset [74],
the UCI Mushroom Dataset [75], and other datasets gathered during the 90s. The availability of these public benchmarks became an important means of rigorously measuring advances in AI research.
The gradual renewal of interest in AI during the early 90s opened doors to other emerging or established fields such as control theory, operations research, and statistics. Decision theory and probabilistic reasoning started being adopted by AI researchers, and uncertainty was represented more effectively with the introduction of Bayesian networks to the field [76]. Rich Sutton in 1998 revisited reinforcement learning, after around thirty years, by adopting Markov decision processes [77]. This led to growth in the application of reinforcement learning to problems such as planning, robotics, and process control. The vast amount of data becoming available in different areas on the one hand, and the influence of statistical methods such as machine learning and optimization on AI research on the other, resulted in a significant re-adoption of AI in subfields including multi-agent models, natural language processing, robotics, and computer vision. As such, new hopes for AI took shape in the 90s. Eventually, in 1997, AI-equipped machines showed off their power against “Man” to the public [78]: chess-playing AI software developed at IBM, called “Deep Blue”, defeated the great maestro and world chess champion, Garry Kasparov. Broadcast live, the match captured the public’s imagination once again for the AI systems of the future. The news was so breathtaking that IBM’s share value rose to an all-time high [79].
Massive advances in microchip manufacturing technologies in the late 90s led to the emergence of powerful computers, concurrent with the growth of the global internet, which generated massive amounts of data. These included enormous quantities of unprocessed text, video, voice, and images, along with semi-processed data such as geographical tracking, social media data, and electronic medical records, ushering in the era of Big Data [80]. In the computer vision area, the ImageNet dataset, gathering millions of labeled images, was created in 2009 and contributed significantly to the field [81]. A new wave of wide industry interest in AI began. Notable steps were taken in 2011, when IBM’s Watson defeated human champions in the highly popular TV quiz show Jeopardy! [82], significantly boosting the public’s impression of the state of the art in AI, and when Apple introduced its Siri intelligent assistant.
In 1989, Yann LeCun revisited convolutional neural networks and, using gradient descent in their training mechanism, demonstrated their ability to perform well on computer vision problems, specifically handwritten digit recognition [83]. Yet it was in 2012 that these networks came to the forefront, when a deep convolutional neural network developed in Geoffrey Hinton’s research group at the University of Toronto surpassed the competitors in the ILSVRC (ImageNet Large Scale Visual Recognition Challenge) by significantly improving on all ImageNet classification benchmarks [84]. Before the adoption of deep neural networks, the best-performing methods were mostly classical machine learning methods built on hand-crafted features. By 2011, the computing power of GPUs (graphics processing units) had grown enough to help researchers train networks of greater width and depth in a shorter time: an early implementation of convolutional neural networks on GPUs in 2006 [85] had achieved a 4-fold speedup compared to CPUs, and Schmidhuber’s team at IDSIA achieved a 60-fold speedup on GPUs in 2011 [86]. Meanwhile, the availability of huge amounts of labeled data, such as the millions of labeled images in the ImageNet dataset, helped researchers overcome the problem of overfitting. Eventually, in 2012, Hinton’s team proposed a deep convolutional neural network architecture, named AlexNet (after the team’s lead author, Alex Krizhevsky), that was able to train more layers of neurons. Utilizing mechanisms and techniques such as ReLU (rectified linear unit) activation functions and dropout, AlexNet achieved higher discriminative power in an end-to-end fashion, that is, fed with nothing but the raw images of the dataset [87]. This event is regarded as the birth of the third “boom” of AI.
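For readers unfamiliar with these building blocks, the sketch below (a minimal PyTorch illustration, far smaller than the actual AlexNet) shows how convolutional layers, ReLU activations, and dropout are combined into an end-to-end image classifier trained directly on raw pixels.

import torch
import torch.nn as nn

# A deliberately small AlexNet-flavored classifier for 32x32 RGB images and 10 classes.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Dropout(p=0.5),            # dropout regularization, as popularized by AlexNet
    nn.Linear(32 * 8 * 8, 10),    # raw class scores (logits)
)

# One training step on a random batch, standing in for real labeled images.
images, labels = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss = nn.functional.cross_entropy(model(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print("batch loss:", loss.item())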
Since then, deep learning-based methods have continued to achieve outstanding feats, including outperforming or performing on par with human experts in certain tasks in fields such as computer vision, natural language processing, medical image diagnosis [88], and natural language translation. The progress of deep neural networks gained public attention in 2016, when DeepMind’s AlphaGo beat the world champion of Go [34]. AI once again became the target of the interest of the media, the public, governments, industries, scholars, and investors. Deep learning methods now thoroughly dominate AI-related research, creating entirely new lines of research and industries. In 2018, Yoshua Bengio, Geoffrey Hinton, and Yann LeCun won the Turing Award for their pioneering efforts in deep learning. Figure 8 summarizes the timeline of AI from its birth to the present.
Figure 8: Timeline of major AI milestones from 1957 to the present, spanning three eras (1957-1966, 1967-1986, 1987-present): the first reinforcement learning and theorem-proving programs, the Mark I Perceptron (the first neural net computer), LISP (the first high-level AI programming language), the first mathematical prover, the first industrial robot “Unimate”, the perceptron convergence theorem, “Microworld”, the first chatbot “ELIZA”, the first general-purpose mobile robot “Shakey”, the introduction of the MNIST dataset, and subsequent milestones through 2021.
The previous section presented a brief story of AI’s journey, with all its ups and downs over the decades. The journey has not been easy, with multiple waves and seasons: AI has faced two main breakdowns (so-called winters) and three main breakthroughs (so-called summers or booms). Thanks to the convergence of parallel processing, higher memory capacity, and massive data collection (Big Data), AI has enjoyed a steady upward climb since the early 2010s. With all these pieces in place, much better algorithms have been developed, assisting this steady progression. Computers keep getting faster, with computing power continuing to double nearly every two years (Moore’s law), and technology is advancing roughly ten times faster than before: what used to take years may now happen in the course of weeks or even days [89]. On a global scale, AI has become an attractive target for investors, producing billions of dollars of profit per annum. From 2010 to 2020, global investment in AI-based startup companies grew steadily from $1.3B to more than $40B, with an average annual growth rate of nearly 50%, while in 2020 alone corporate investment in AI is reported to have been nearly $70B globally [90]. In the academic sector, from 2000 to 2020, the number of peer-reviewed AI papers per year grew roughly 12-fold worldwide, and AI conferences have witnessed similarly significant increases in attendance: in 2020, NeurIPS hosted 22,000 attendees, more than 40% growth over 2018 and tenfold the number in 2012. Concurrently, AI has become the most popular specialization among computer science Ph.D. students in North America, nearly three times the next rival (Theory and Algorithms) [91]; in 2019, more than 22% of Ph.D. candidates in computer science majored in AI/machine learning. With the introduction of machine learning, the landscape of the healthcare and biology sectors has changed dramatically. AlphaFold, developed by DeepMind, used deep learning to make a major advance in the decades-long
biology problem of protein folding. Scientists use machine learning algorithms to learn representations of chemical
molecules to plan more efficient chemical synthesis. ML-based approaches were used by PostEra, an AI startup, to
speed up COVID-related drug development during the pandemic [91]. This suggests that we are in the midst of the next hype cycle, and this time the hype is focused on applications with life-or-death implications, such as autonomous vehicles and medical applications, making it critical that AI algorithms be trustworthy.
5 The Future of AI
Numerous AI-related startups have been founded in recent years, with both companies and governments investing heavily in the sector. If another AI winter occurs, many people will lose their jobs and many startups will be forced to close, as has happened in the past. According to McKinsey & Company, the economic gap between an approaching winter period and continued prosperity by 2025 would be in the tens of billions of dollars [92]. A recurrent pattern in previous AI winters has been promises that sparked initial optimism yet turned out to be exaggerated. During both AI winters, budget cuts had a major effect on AI research: the Lighthill report resulted in funding cuts in the United Kingdom during the first AI winter, as well as cuts in Europe and the United States, and cuts to DARPA support contributed to the second AI winter. Significant attention therefore needs to be paid to technical challenges and limitations; recall the perceptron of the 1960s, noted as unable to solve the so-called XOR problem, or the limitations faced by expert systems in the 1980s. AI has appeared particularly vulnerable to overestimation coupled with technical limitations. Overall, the hype and fear that come with the prospect of reaching human-level intelligence have contributed to exaggerations and public coverage that are not common in other innovative tech sectors. To avoid another winter of AI, a number of important considerations need to be made:
i. It is extremely important to be aware of philosophical arguments about the utter sublimeness of what it means to be human, and not to make exaggerated claims about the ascension of AI systems to human status (see the very illustrative documentary [93]). In addition, these philosophical arguments (e.g. by Hubert Dreyfus, based on the philosophy of Martin Heidegger), had they been more extensively and interactively considered, could likely have contributed to further success for AI in its early years (e.g. earlier attention to ‘connectionist’ approaches to AI).
ii. Neglect of the above point, as well as exciting early successes, contributes to exaggerated claims that AI will soon solve every important problem. This is what contributed to the first winter of AI, when AI researchers made overconfident and overoptimistic predictions about upcoming successes, given the early promising performance of AI on simpler examples [94]. A lack of appreciation for computational complexity theory was another reason AI scientists believed that scaling simple solutions up to larger tasks was just a matter of employing faster hardware and larger memories.
iii. According to a recently released report [95], 40% of startups established in Europe that claim to use AI in their services do not actually do so. This largely owes to the fact that the definition of AI is ambiguous to the majority of the public and the media. Given the recent excitement around AI and the resulting hype and investment growth in the field, some businesses try to benefit from this ambiguity by misusing terms such as AI, machine learning, and deep learning. It is therefore crucial to define these terms more clearly with respect to related concepts and to spell out what they have in common and what they do not.
iv. There are significantly troubling trends in scientific methodology and dissemination by AI researchers, which contribute to the hype and confusion: these include failure to distinguish between explanation and speculation, failure to identify the real sources of performance gains, confusing or misleading use of mathematics, and misuse of language [96]. According to a recent study [97] reviewing a spectrum of machine learning approaches for detecting and prognosticating coronavirus disease 2019 (COVID-19) from standard-of-care chest radiographs (CXR) and CT images, none of more than 400 studies was found suitable for clinical application. The studies suffered from one or more issues, including the use of poor-quality data, poor application of machine learning methodology, poor reproducibility, and biases in study design.
v. AI, in its essence, is vague and covers a broad scope. As observed by Andrew Moore [98], “Artificial intelligence is the science and engineering of making computers behave in ways that, until recently, we thought only human intelligence is able to perform”. The critical point in this definition lies in the phrase ‘until recently’, which points to AI’s moving target through time. In other words, ideas and methods are referred to as AI only as long as they have not been completely figured out; once they are, they may no longer be associated with AI and instead receive their own label. This phenomenon, known as the AI Effect [99], contributes to the fast decline of public excitement about groundbreaking achievements in AI.
vi. An important challenge with AI technologies, and more specifically deep learning-based AI, is their so-called black-box, opaque nature of decision making: when a deep learning algorithm makes a decision, the process of its inference, or the logic behind it, may not be representable. In some tasks, such as playing board games (e.g. Go) where the objective is merely winning the game, this concern does not reveal itself; but in critical tasks, e.g. in healthcare, where a decision impacts human lives, this issue can lead to trustworthiness challenges (other examples of critical tasks include transportation and mobility systems and social or financial systems).
vii. One very specific concern is the issue of bias. As an example, soon after Google introduced BERT (Bidirectional Encoder Representations from Transformers), one of the most sophisticated AI technologies in language modeling, scientists discovered an essential flaw in the system: BERT and its peers (GPT-2, GPT-3, T5, etc.) were more likely to equate males with computer programming and, in general, failed to give females adequate respect. BERT, which is now being used in critical services such as Google’s internet search engine, is one of a number of AI systems (so-called “transformers”) that learn from massive amounts of digitized data, such as old books, Wikipedia pages, tweets, forums, and news stories. The main problem with BERT, GPT-3 (Generative Pre-trained Transformer, a rival introduced by OpenAI), and similar universal language models is that they are too complex even for their own designers (GPT-3’s full version has 175 billion parameters [100]); in fact, scientists are still learning how these models work. One certain fact about these systems is that they pick up biases as they learn from human-generated data. Because these mechanisms can be used in a variety of sensitive contexts to make critical and life-changing choices, it is essential to ensure that their decisions do not reflect biased attitudes against particular communities or cultures. Thus, the builders of AI systems have a duty to steer the design and use of AI in ways that serve society.
Overall, the future of AI appears promising. AI will eventually drive our automobiles, aid physicians in making more precise diagnoses, assist judges in making more consistent judgments, help employers hire more qualified applicants, and much more. We are aware, however, that these AI systems may be fragile and unjust. By adding graffiti to a stop sign, an attacker may make a classifier believe it is no longer a stop sign; by adding subtle noise or signal to a benign skin lesion image, a classifier may be fooled into believing it is malignant (so-called adversarial attacks). Risk management instruments used in US courts have been shown to be racially discriminatory, and corporate recruitment tools have been shown to be sexist. Towards trustworthy AI, organizations around the world are coming together to establish consistent standards for evaluating the responsible implementation of AI systems and to encourage international support for AI technologies that benefit humanity and the environment. Among these efforts are the European Commission’s report on Ethics Guidelines for Trustworthy AI [101] and DARPA’s XAI (eXplainable AI) roadmap [102].
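As an illustration of how easily such adversarial examples can be produced, the sketch below applies the fast gradient sign method (FGSM, one standard attack chosen here for illustration; the text above does not name a specific method) to an arbitrary PyTorch classifier: a small perturbation in the direction of the loss gradient is added to the input image.

import torch
import torch.nn as nn

def fgsm_attack(model, x, label, epsilon=0.03):
    # Fast gradient sign method: perturb the input in the direction that increases the loss.
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), label)
    loss.backward()
    return (x_adv + epsilon * x_adv.grad.sign()).clamp(0.0, 1.0).detach()

# A stand-in classifier and a random "image"; a real attack would target a trained model.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x = torch.rand(1, 3, 32, 32)
label = torch.tensor([3])
x_adv = fgsm_attack(model, x, label)
print("max pixel change:", (x_adv - x).abs().max().item())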
Figure 9: Trade-off between model accuracy and interpretability for common AI approaches, ranging from highly accurate but less interpretable methods (deep learning, ensembling, SVMs) through Bayesian models, generalized additive models, kNN, and decision trees, to highly interpretable but typically less accurate methods (linear/logistic regressions, rule-based learning).
According to Arrieta et al. [103], a trade-off between the interpretability of AI models and their accuracy (performance) can be observed under fair comparison conditions (figure 9). Simpler AI approaches, such as linear regression and decision trees, are self-explanatory (interpretable), since the classification decision boundary can be depicted in a few dimensions using the model parameters. However, for tasks such as the categorization of medical images in healthcare, these may lack the necessary complexity; yet to acquire the trust of physicians, regulators, and patients, a medical diagnostic system needs to be transparent, intelligible, and explainable: it should be able to explain the logic behind a certain decision to the stakeholders engaged in the process. Newer regulations, such as the GDPR (the European General Data Protection Regulation), are making the use of black-box models more difficult in various industries, because the traceability of decisions is increasingly required. An AI system designed to assist professionals should be explainable and should allow the human expert to retrace the system’s steps and exercise their own judgment. Some academics point out that humans are not always able or willing to explain their choices; however, explainability is a fundamental enabler for AI deployment in the real world, since it helps ensure that the technology is used in a safe, ethical, fair, and trustworthy manner. Breaking AI misconceptions by demonstrating what a model primarily looked at while making a judgment, e.g. via heat maps or activation maps, can help end-users trust the technology. For users unfamiliar with deep learning, such as most medical professionals, it is even more vital to show the domain-specific attributes used in the decision.
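A common way to produce such heat maps is gradient-based saliency: the gradient of the predicted class score with respect to the input pixels indicates which pixels most influenced the decision. The sketch below is a minimal illustration with a stand-in model; real medical applications would use a trained network and often more refined methods such as Grad-CAM.

import torch
import torch.nn as nn

def saliency_map(model, image):
    # Gradient of the top class score with respect to the input pixels.
    image = image.clone().detach().requires_grad_(True)
    scores = model(image)
    scores[0, scores.argmax()].backward()
    # Take the maximum absolute gradient over color channels as a per-pixel heat value.
    return image.grad.abs().max(dim=1)[0].squeeze(0)

# Stand-in classifier and input; in practice this would be a trained diagnostic model.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 5))
image = torch.rand(1, 3, 64, 64)
heat = saliency_map(model, image)
print("heat map shape:", tuple(heat.shape))  # (64, 64)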
For further enhanced AI, a way forward appears to be the convergence of symbolic and connectionist methods, which would combine the former’s higher interpretability with the latter’s significant recent success (more on this below). For instance, the use of hybrid distributional models, which combine sparse graph-based representations with dense vector representations and connect them to lexical tools and knowledge bases, appears promising for explainable AI in the medical domain [104]. However, the main obstacle to this solution is the historical division between these two paradigms.

The deep neural network approach is not novel. Today, it is fulfilling the promise stated at the beginning of cybernetics, benefiting from developments in computer processing and the existence of massive datasets. These techniques, however, have not always been deemed to constitute AI. Machine learning approaches based on neural networks (“connectionist AI”) were historically scorned and ostracized by the “symbolic” school of thought. The rise of AI, which was clearly distinct from early cybernetics, amplified the friction between these two approaches. The co-citation network of the top-cited authors in papers mentioning “AI” demonstrates the rift between researchers who have used the symbolic and connectionist paradigms (see figure 10).
Figure 10: Co-citation network of the 100 most cited authors with “Artificial Intelligence” in the title. The figure illustrates the names of some important authors, clearly clustered by community. At the heart of the “connectionists”, some core figures in deep learning appear. On the “symbolic” side, core figures are laid out in a way that represents their proximities and divergences, surrounded by primary contributors to cognitive modeling, expert systems, and even critics of symbolic AI (Dreyfus, Searle, Brooks). The figure is not comprehensive and misses some key contributors; it is intended to demonstrate the existing dichotomy between the connectionist and symbolic frameworks (figure taken from [105]).
Despite the obvious separation between the intellectuals of these two schools, a third subfield of AI has been emerging, namely Neuro-Symbolic (NeSy) AI, which focuses on combining the neural and symbolic traditions in AI for additional benefit [106]. The promise of NeSy AI is largely based on the aim of achieving a best-of-both-worlds
scenario in which the complementary strengths of neural and symbolic techniques can be advantageously merged. On
the neural side, desirable strengths include trainability from raw data and robustness against errors in the underlying
data, whereas on the symbolic side, one would like to retain these systems’ inherent high explainability and provable
correctness, as well as the ease with which they can be designed and function using deep human expert knowledge.
In terms of functional features, using symbolic approaches in conjunction with machine learning – particularly deep
learning, which is currently the subject of the majority of research – one would hope to outperform systems that rely
entirely on deep learning on issues such as out-of-vocabulary handling, generalizable training from small data sets,
error recovery, and, in general, explainability [106].
6 Conclusions
Rapid developments in the field of AI are changing different aspects of human life. Advances both in computational
power and AI algorithm design have enabled AI methods to outperform humans in an increasing number of tasks. The
field of AI has experienced decades of praise and criticism; its path has never been smooth. Given the two winters the field has experienced, each following a wave of great growth and high expectations, and given the costs paid by the community of researchers, corporations, start-ups, and governments, it is critical to recognize that the current wave of high hopes and expectations should not be taken for granted.
Acknowledgment
This work was in part supported by the Canadian Institutes of Health Research (CIHR) Project Grant PJT-162216. The
authors also wish to acknowledge valuable feedback from Ian Janzen of BC Cancer Research Institute.
References
[1] Catherine Clifford. The ‘oracle of a.i.’: These 4 kinds of jobs won’t be replaced by robots. https://www.cnbc.com/2019/01/
14/the-oracle-of-ai-these-kinds-of-jobs-will-not-be-replaced-by-robots-.html, January 2019. Accessed: 2021-6-29.
[2] Catherine Clifford. Google ceo: A.i. is more important than fire or electricity. https://www.cnbc.com/2018/02/01/
google-ceo-sundar-pichai-ai-is-more-important-than-fire-electricity.html, February 2018. Accessed: 2021-6-29.
[3] Shana Lynch. Andrew ng: Why ai is the new electricity. https://www.gsb.stanford.edu/insights/
andrew-ng-why-ai-new-electricity, 2017. Accessed: 2021-6-29.
[4] BBC News. Deepfake queen to deliver channel 4 christmas message. https://www.bbc.com/news/technology-55424730#:
~:text=While%20the%20Queen%20is%20delivering,news%20in%20the%20digital%20age., December 2020. Accessed:
2021-5-17.
[5] James Vincent. Tom cruise deepfake creator says public shouldn’t be worried about ‘one-click fakes’. https://www.theverge.
com/2021/3/5/22314980/tom-cruise-deepfake-tiktok-videos-ai-impersonator-chris-ume-miles-fisher, March 2021. Accessed:
2021-5-25.
[6] Oxford languages and google - english : Artificial intelligence definition. https://languages.oup.com/google-dictionary-en/,
May 2020. Accessed: 2021-4-14.
[7] Michael Aaron Dennis. Marvin Minsky, American scientist. Encyclopedia Britannica. https://www.britannica.com/biography/Marvin-Lee-Minsky, January 2021. Accessed: 2021-6-29.
[8] IBM Cloud Education. What is artificial intelligence (AI)? https://www.ibm.com/cloud/learn/what-is-artificial-intelligence.
Accessed: 2021-4-14.
[9] Michael Chui, Martin Harrysson, James Manyika, Roger Roberts, Rita Chung, Pieter Nel, and Ashley Van Heteren. Applying AI for social good. McKinsey & Company, technical report.
[10] S.J. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Pearson series in artificial intelligence. Pearson
education limited., 2021.
[11] Stuart Jonathan Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall, 1995.
[12] A M Turing. I.—computing machinery and intelligence. Mind, LIX(236):433–460, October 1950.
[13] Jose Hernandez-Orallo. Beyond the turing test. J. Log. Lang. Inf., 9(4):447–466, 2000.
[14] David L Dowe and Alan R Hajek. A computational extension to the turing test. In Proceedings of the 4th conference of the
Australasian cognitive science society, University of Newcastle, NSW, Australia, volume 1. Citeseer, 1997.
[15] Patrick Hayes and Kenneth Ford. Turing test considered harmful. In IJCAI (1), pages 972–977. researchgate.net, 1995.
[16] Michael Haenlein and Andreas Kaplan. A brief history of artificial intelligence: On the past, present, and future of artificial
intelligence. California management review, 61(4):5–14, August 2019.
[17] Warren S McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The bulletin of
mathematical biophysics, 5(4):115–133, December 1943.
[18] Jeremy M. Norman. McCulloch & pitts publish the first mathematical model of a neural network. https://www.
historyofinformation.com/detail.php?entryid=782. Accessed: 2021-6-7.
[19] Charles Wallis. History of the perceptron. https://web.csulb.edu/~cwallis/artificialn/History.htm. Accessed: 2021-6-7.
[20] Eric Roberts. Neural networks - history. https://cs.stanford.edu/people/eroberts/courses/soco/projects/neural-networks/
History/history1.html. Accessed: 2021-6-7.
[21] Gualtiero Piccinini. The first computational theory of mind and brain: A close look at mcculloch and pitts’s “logical calculus
of ideas immanent in nervous activity”. Synthese, 141(2):175–215, August 2004.
[22] I, robot by isaac asimov. https://prezi.com/r1go_sui-af2/i-robot-by-isaac-asimov/. Accessed: 2021-4-24.
[23] Alan Mathison Turing. On computable numbers, with an application to the entscheidungsproblem. Proceedings of the
London mathematical society, 2(1):230–265, 1937.
[24] Donald Olding Hebb. The organization of behavior: A neuropsychological theory. Psychology Press, 2005.
[25] S Song, K D Miller, and L F Abbott. Competitive hebbian learning through spike-timing-dependent synaptic plasticity. Nat.
Neurosci., 3(9):919–926, September 2000.
[26] Jeremy Bernstein. Marvin minsky’s vision of the future. https://www.newyorker.com/magazine/1981/12/14/a-i, 1981.
Accessed: 2021-6-29.
[27] Jef Akst. Machine, learning, 1951. https://www.the-scientist.com/foundations/machine--learning--1951-65792, May 2019.
Accessed: 2021-4-25.
[28] A brief history of artificial intelligence. https://cyfuture.com/blog/history-of-artificial-intelligence/, April 2020. Accessed:
2021-4-24.
[29] Daniel Crevier. AI: The Tumultuous History of the Search for Artificial Intelligence. Basic Books, 1993.
[30] Allen Newell, John C Shaw, and Herbert A Simon. Report on a general problem solving program. In IFIP congress, volume
256, page 64. Pittsburgh, PA, 1959.
[31] Arthur L Samuel. Some studies in machine learning using the game of checkers. IBM Journal of research and development,
3(3):210–229, 1959.
[32] Chris Bleakley. Poems That Solve Puzzles: The History and Science of Algorithms. Oxford University Press, August 2020.
[33] Gerald Tesauro. Temporal difference learning and TD-Gammon. Commun. ACM, 38(3):58–68, March 1995.
[34] David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser,
Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalch-
brenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis.
Mastering the game of go with deep neural networks and tree search. Nature, 529(7587):484–489, January 2016.
[35] Charles C Tappert. Who is the father of deep learning? In 2019 International Conference on Computational Science and
Computational Intelligence (CSCI), pages 343–348. ieeexplore.ieee.org, December 2019.
[36] Frank Rosenblatt. The perceptron, a perceiving and recognizing automaton Project Para. Cornell Aeronautical Laboratory,
1957.
[37] Professor’s perceptron paved the way for ai – 60 years too soon. https://news.cornell.edu/stories/2019/09/
professors-perceptron-paved-way-ai-60-years-too-soon, September 2019. Accessed: 2021-4-26.
[38] One page schoolhouse. Perceptron. https://ronkowitz.blogspot.com/2017/11/perceptron.html. Accessed: 2021-4-29.
[39] John McCarthy and Others. Programs with common sense. RLE and MIT computation center, 1960.
[40] Kenneth De Jong, David B Fogel, and Hans-Paul Schwefel. A history of evolutionary computation. In Handbook of Evolutionary Computation.
[41] R M Friedberg. A learning machine: Part i. IBM Journal of Research and Development, 2(1):2–13, 1958.
[42] H Gelernter, J R Hansen, and D W Loveland. Empirical explorations of the geometry theorem machine. In Papers presented
at the May 3-5, 1960, western joint IRE-AIEE-ACM computer conference, IRE-AIEE-ACM ’60 (Western), pages 143–149,
New York, NY, USA, May 1960. Association for Computing Machinery.
[43] Shimon Y Nof. Handbook of Industrial Robotics. John Wiley & Sons, March 1999.
[44] Bernard Widrow and others. Adaptive “Adaline” neuron using chemical “memistors”. 1960.
[45] Marvin Minsky. Society Of Mind. Simon and Schuster, March 1988.
[46] James R Slagle. A heuristic program that solves symbolic integration problems in freshman calculus. Journal of the ACM
(JACM), 10(4):507–520, October 1963.
[47] Thomas G Evans. A heuristic program to solve geometric-analogy problems. In Proceedings of the April 21-23, 1964,
spring joint computer conference, AFIPS ’64 (Spring), pages 327–338, New York, NY, USA, April 1964. Association for
Computing Machinery.
[48] Daniel G Bobrow. Natural language input for a computer problem solving system. 1964.
[49] Manisha Salecha. Story of eliza, the first chatbot developed in 1966. https://analyticsindiamag.com/
story-eliza-first-chatbot-developed-1966/, October 2016. Accessed: 2021-4-26.
[50] Shakey. http://www.ai.sri.com/shakey/. Accessed: 2021-5-4.
[51] Sergei Nirenburg, Harold L Somers, and Yorick A Wilks. Alpac: the (in) famous report. Readings in machine translation,
14:131–135, 2003.
[52] James Lighthill. Artificial intelligence: A general survey (the Lighthill report). http://www.chilton-computing.org.uk/inf/literature/reports/lighthill_report/p001.htm, 1972. Accessed: 2021-5-4.
[53] Edward A Feigenbaum, Bruce G Buchanan, and Joshua Lederberg. On generality and problem solving: A case study using
the dendral program. 1970.
[54] Edward H Shortliffe and Bruce G Buchanan. A model of inexact reasoning in medicine. Math. Biosci., 23(3):351–379, April
1975.
[55] Kunihiko Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition
unaffected by shift in position. Biol. Cybern., 36(4):193–202, April 1980.
[56] Drew McDermott, M Mitchell Waldrop, B Chandrasekaran, John McDermott, and Roger Schank. The dark ages of AI: A panel discussion at AAAI-84. AI Magazine, 6(3):122–122, September 1985.
[57] Kenneth Olsen and Harlan Anderson. Digital equipment corporation. First People, 25, 1983.
[58] David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning representations by back-propagating errors. Nature,
323(6088):533–536, October 1986.
[59] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
[60] Leonard E Baum and Ted Petrie. Statistical inference for probabilistic functions of finite state Markov chains. The Annals of Mathematical Statistics, 37(6):1554–1563, December 1966.
[61] Peter W Frey and David J Slate. Letter recognition using holland-style adaptive classifiers. Machine learning, 6(2):161–182,
March 1991.
[62] A Georghiades, P Belhumeur, and D Kriegman. Yale face database. Center for computational Vision and Control at Yale
University, 2(6):33, 1997.
[63] Y Lecun, L Bottou, Y Bengio, and P Haffner. Gradient-based learning applied to document recognition. Proceedings of the
IEEE, 86(11):2278–2324, November 1998.
[64] Christos Dimitrakakis and Samy Bengio. Online policy adaptation for ensemble algorithms. Technical report, IDIAP, 2002.
[65] Mark Fanty and Ronald Cole. Spoken letter recognition. Adv. Neural Inf. Process. Syst., 3:220–226, 1990.
[66] Victor Zue, Stephanie Seneff, and James Glass. Speech database development at MIT: TIMIT and beyond. Speech communication, 9(4):351–356, August 1990.
[67] G H Pettengill, P G Ford, W T Johnson, R K Raney, and L A Soderblom. Magellan: radar performance and data products.
Science, 252(5003):260–265, April 1991.
[68] Jinyan Li, Guozhu Dong, Kotagiri Ramamohanarao, and Limsoon Wong. Deeps: A new instance-based lazy discovery and
classification system. Machine learning, 54(2):99–124, February 2004.
[69] Lester Ingber. Statistical mechanics of neocortical interactions: Canonical momenta indicators of electroencephalography. Physical Review E, 55(4):4578–4593, April 1997.
[70] W Nick Street, W H Wolberg, and O L Mangasarian. Nuclear feature extraction for breast tumor diagnosis. In Biomed-
ical Image Processing and Biomedical Visualization, volume 1905, pages 861–870. International Society for Optics and
Photonics, July 1993.
[71] Zi-Quan Hong and Jing-Yu Yang. Optimal discriminant plane for a small number of samples and design method of classifier
on the plane. Pattern Recognition, 24(4):317–324, January 1991.
[72] A M Bagirov, A M Rubinov, N V Soukhoroukova, and J Yearwood. Unsupervised and supervised data classification via
nonsmooth and global optimization. TOP, 11(1):1–75, June 2003.
[73] J R Quinlan, P J Compton, K A Horn, and L Lazarus. Inductive knowledge acquisition: a case study. In Proceedings of
the Second Australian Conference on Applications of expert systems, pages 137–156, USA, October 1987. Addison-Wesley
Longman Publishing Co., Inc.
[74] David Clark, Zoltan Schreter, and Anthony Adams. A quantitative comparison of dystal and backpropagation. In Australian
conference on neural networks, 1996.
[75] Wayne Iba, James Wogulis, and Pat Langley. Trading off simplicity and coverage in incremental concept learning. In John
Laird, editor, Machine Learning Proceedings 1988, pages 73–79. Elsevier, San Francisco (CA), January 1988.
[76] Judea Pearl. Causality. Cambridge University Press, September 2009.
[77] Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction. MIT press, 2018.
[78] Bruce Weber. Computer defeats kasparov, stunning the chess experts. The New York Times, May 1997.
[79] Chris Higgins. A brief history of deep blue, IBM’s chess computer. https://www.mentalfloss.com/article/503178/
brief-history-deep-blue-ibms-chess-computer, July 2017. Accessed: 2021-5-3.
[80] Michael A Morris, Babak Saboury, Brian Burkett, Jackson Gao, and Eliot L Siegel. Reinventing radiology: Big data and
the future of medical imaging. Journal of thoracic imaging, 33(1):4–16, January 2018.
[81] Dave Gershgorn. The data that transformed AI research—and possibly the world. https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/, July 2017. Accessed: 2021-5-4.
[82] Adam Gabbatt. IBM computer Watson wins Jeopardy clash. https://www.theguardian.com/technology/2011/feb/17/ibm-computer-watson-wins-jeopardy, February 2011. Accessed: 2021-5-17.
[83] Yann LeCun, Bernhard Boser, John S Denker, Donnie Henderson, Richard E Howard, Wayne Hubbard, and Lawrence D
Jackel. Backpropagation applied to handwritten zip code recognition. Neural computation, 1(4):541–551, 1989.
[84] ImageNet large scale visual recognition competition 2012 (ILSVRC2012). https://image-net.org/challenges/LSVRC/2012/results.html. Accessed: 2021-5-4.
[85] Kumar Chellapilla, Sidd Puri, and Patrice Simard. High performance convolutional neural networks for document processing. In Tenth International Workshop on Frontiers in Handwriting Recognition, 2006.
[86] Dan Claudiu Ciresan, Ueli Meier, Jonathan Masci, Luca Maria Gambardella, and Jürgen Schmidhuber. Flexible, high performance convolutional neural networks for image classification. In Twenty-Second International Joint Conference on Artificial Intelligence, 2011.
[87] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks.
Adv. Neural Inf. Process. Syst., 25:1097–1105, 2012.
[88] Xiaoxuan Liu, Livia Faes, Aditya U Kale, Siegfried K Wagner, Dun Jack Fu, Alice Bruynseels, Thushika Mahendiran,
Gabriella Moraes, Mohith Shamdas, Christoph Kern, Joseph R Ledsam, Martin K Schmid, Konstantinos Balaskas, Eric J
Topol, Lucas M Bachmann, Pearse A Keane, and Alastair K Denniston. A comparison of deep learning performance against
health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. The Lancet
Digit Health, 1(6):e271–e297, October 2019.
[89] Mark Manson. I, for one, welcome our AI overlords. https://markmanson.net/artificial-intelligence, June 2016. Accessed:
2021-4-14.
[90] Raymond Perrault, Yoav Shoham, Erik Brynjolfsson, Jack Clark, John Etchemendy, Barbara Grosz, Terah Lyons, James Manyika, Saurabh Mishra, and Juan Carlos Niebles. The AI Index 2019 annual report. Technical report, AI Index Steering Committee, Human-Centered AI Institute, Stanford University, Stanford, CA, December 2019.
[91] Daniel Zhang, Saurabh Mishra, Erik Brynjolfsson, John Etchemendy, Deep Ganguli, Barbara Grosz, Terah Lyons, James Manyika, Juan Carlos Niebles, Michael Sellitto, et al. The AI Index 2021 annual report. arXiv preprint arXiv:2103.06312, 2021.
[92] Michael Chui, Martin Harrysson, James Manyika, Roger Roberts, Rita Chung, Pieter Nel, and Ashley van Heteren.
Applying artificial intelligence for social good. https://www.mckinsey.com/featured-insights/artificial-intelligence/
applying-artificial-intelligence-for-social-good, November 2018. Accessed: 2021-4-14.
[93] Tao Ruspoli. Being in the World - on the subject of the Heideggerian Dasein, April 2018. Alive Mind Cinema.
[94] Timothy Taylor. 1957: When machines that think, learn, and create arrived. https://www.bbntimes.com/global-economy/
1957-when-machines-that-think-learn-and-create-arrived. Accessed: 2021-6-29.
[95] James Vincent. Forty percent of ‘AI startups’ in Europe don’t actually use AI, claims report. https://www.theverge.com/2019/3/5/18251326/ai-startups-europe-fake-40-percent-mmc-report, March 2019. Accessed: 2021-4-17.
[96] Zachary C Lipton and Jacob Steinhardt. Research for practice: troubling trends in machine-learning scholarship. Communi-
cations of the ACM, 62(6):45–53, May 2019.
[97] Michael Roberts, Derek Driggs, Matthew Thorpe, Julian Gilbey, Michael Yeung, Stephan Ursprung, Angelica I Aviles-
Rivero, Christian Etmann, Cathal McCague, Lucian Beer, Jonathan R Weir-McCall, Zhongzhao Teng, Effrossyni Gkrania-
Klotsas, James H F Rudd, Evis Sala, and Carola-Bibiane Schönlieb. Common pitfalls and recommendations for using
machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans. Nature Machine
Intelligence, 3(3):199–217, March 2021.
[98] Irving Wladawsky-Berger. What machine learning can and cannot do. https://www.wsj.com/articles/
what-machine-learning-can-and-cannot-do-1532714166?tesla=y, July 2018. Accessed: 2021-4-18.
[99] AI set to exceed human brain power. http://edition.cnn.com/2006/TECH/science/07/24/ai.bostrom/, August 2006. Accessed:
2021-4-29.
[100] Tom B Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan,
Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan,
Rewon Child, Aditya Ramesh, Daniel M Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric
Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford,
Ilya Sutskever, and Dario Amodei. Language models are few-shot learners. arXiv e-prints, May 2020.
[101] Luciano Floridi. Establishing the rules for building trustworthy AI. Nature Machine Intelligence, 1(6):261–262, May 2019.
[102] David Gunning and David Aha. DARPA’s explainable artificial intelligence (XAI) program. AI Magazine, 40(2):44–58, June 2019.
[103] Alejandro Barredo Arrieta, Natalia Díaz-Rodríguez, Javier Del Ser, Adrien Bennetot, Siham Tabik, Alberto Barbado, Salvador Garcia, Sergio Gil-Lopez, Daniel Molina, Richard Benjamins, Raja Chatila, and Francisco Herrera. Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 58:82–115, June 2020.
[104] David Gunning. Explainable artificial intelligence (XAI). Defense Advanced Research Projects Agency (DARPA), nd Web, 2(2), 2017.
[105] Dominique Cardon, Jean-Philippe Cointet, Antoine Mazières, and Elizabeth Libbrecht. Neurons spike back. Réseaux, (5):173–220, 2018.
[106] Kamruzzaman Sarker, Lu Zhou, Aaron Eberhart, and Pascal Hitzler. Neuro-symbolic artificial intelligence: Current trends.
arXiv e-prints, page arXiv:2105.05330, May 2021.
Glossary
Artificial Intelligence (AI): The study and development of computer systems capable of imitating intelligent human behavior.
Machine learning (ML): A subset of AI in which computers learn to perform tasks from large amounts of data rather than being explicitly programmed.
Artificial neural networks (ANN): A class of computing systems loosely inspired by the structure and function of the human brain and nervous system.
Convolutional neural networks (CNN): A type of feed-forward neural network built from stacked convolutional layers, used primarily for computer vision tasks; a minimal code sketch follows below.
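As an illustration only (not taken from the paper), a small CNN for 28x28 grayscale images might be sketched in PyTorch as follows; the layer widths, input size, and 10-class head are assumptions chosen for the example:

# A minimal, illustrative CNN sketch (assumes PyTorch is installed);
# sizes are arbitrary choices for a 28x28 grayscale input, not values from the paper.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),   # learn local image features
    nn.ReLU(),
    nn.MaxPool2d(2),                              # downsample 28x28 -> 14x14
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                              # downsample 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),                    # classifier head, e.g. 10 digit classes
)

x = torch.randn(1, 1, 28, 28)                     # one dummy grayscale image
print(model(x).shape)                             # torch.Size([1, 10])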
Deep learning (DL): A subset of machine learning based on artificial neural networks with many layers, which learn increasingly abstract representations of the data.
Reinforcement learning (RL): A subfield of machine learning that studies how intelligent agents should act in an environment in order to maximize a cumulative reward.
Hidden Markov model (HMM): A statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states.
General problem solver (GPS): A computer program created in 1959 that was intended to work as a universal problem solver.
Bayesian Networks: A probabilistic graphical model that uses a directed acyclic graph (DAG) to represent a set of variables and their conditional dependencies.
Vanishing gradients: A difficulty in training artificial neural networks with backpropagation, in which the gradient can become vanishingly small in some situations, effectively preventing the weights from changing their values; a numerical illustration follows below.
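A minimal numerical sketch of the effect, assuming plain NumPy; the 30-layer depth and unit-weight chain are illustrative assumptions, not a real network:

# Illustrative only: the derivative of the sigmoid is at most 0.25, so a gradient
# backpropagated through many sigmoid layers shrinks roughly as 0.25 ** depth.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

grad = 1.0
for layer in range(30):            # an assumed 30-layer chain with unit weights
    a = sigmoid(0.0)               # activation at the point of steepest slope
    grad *= a * (1.0 - a)          # sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)) <= 0.25
print(grad)                        # ~8.7e-19: the gradient has all but vanished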
Natural language processing (NLP): A collection of methods for processing natural (human) language, such as machine translation.
Computer vision: An interdisciplinary scientific field that studies how computers can extract a high level of understanding from digital images or video streams.
Speech recognition: The technology that enables computers to comprehend spoken language.
Cognitive science: The interdisciplinary scientific study of the mind and its processes.
Logical notation: A collection of symbols frequently employed to express logical representations.
Turing test: A test of a machine’s ability to demonstrate intelligent behavior that is comparable to, or indistinguishable from, human behavior.
CVPR: The Conference on Computer Vision and Pattern Recognition, an annual conference on computer vision and pattern recognition.
AAAI: The Association for the Advancement of AI, an international scientific society devoted to promoting research in, and responsible use of, AI.
AI Index Report: A public-facing annual study on the state of AI across all relevant fields.
XOR function: A logical "exclusive OR" operator that returns TRUE when exactly one of its two inputs is true, and FALSE when both inputs are true or when both are false; a small illustration follows below.
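A tiny, illustrative truth-table check in Python (not from the paper); it also hints at why a single-layer perceptron, which can only draw one linear decision boundary, cannot represent XOR:

# Illustrative XOR truth table; TRUE only when exactly one input is true.
for a in (False, True):
    for b in (False, True):
        print(a, b, a != b)        # '!=' on booleans behaves as exclusive OR
# Output:
# False False False
# False True  True
# True  False True
# True  True  False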
Weak AI: AI aimed at a single, narrow task. Used in contrast to "strong AI," which describes a machine capable of applying intelligence to any problem rather than to just one specific problem.
Expert systems: Computer programs that simulate the decision-making ability of a human expert, designed to handle difficult problems by reasoning over bodies of knowledge, mostly represented as if-then rules.
Microelectronics and Computer Technology Corporation (MCC): One of the largest computer-industry research and development consortia in the United States.
MNIST dataset: The Modified National Institute of Standards and Technology database (MNIST), a large collection of handwritten digits frequently used to train image-processing systems.
ImageNet: A large visual database created for research on visual object recognition software.
Decision theory: The study of an agent’s decisions. It is closely related to game theory and is studied by economists, statisticians, data scientists, psychologists, biologists, social scientists, philosophers, and computer scientists.
Markov decision processes (MDP): A discrete-time stochastic control process that provides a mathematical framework for modeling decision-making in situations where outcomes are partly random and partly under the control of a decision maker; the standard optimality condition is sketched below.
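For reference, in standard textbook notation (not taken from this paper), the optimal state value $V^*$ of an MDP satisfies the Bellman optimality equation, where $P$ denotes the transition probability, $R$ the reward, and $\gamma \in [0,1)$ the discount factor:

$$V^*(s) \;=\; \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[\,R(s, a, s') + \gamma\, V^*(s')\,\bigr].$$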
Big data: A field of study concerned with methods for analyzing, extracting information from, or otherwise dealing with data volumes that are too vast or complex for typical data-processing application software to handle.
Data science: An interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract information and insights from structured and unstructured data.
ILSVRC: The ImageNet Large Scale Visual Recognition Challenge, an annual software competition in which participants compete to correctly classify and recognize objects and scenes.
GPU: A graphics processing unit, a specialized electronic circuit that rapidly manipulates and alters memory in order to accelerate the production of images in a frame buffer intended for output to a display device.
ReLU (Rectified Linear Unit): The rectifier, or ReLU, is an activation function defined as the positive part of its argument; the formula is given below.
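In symbols:

$$\mathrm{ReLU}(x) = \max(0, x) = \begin{cases} x, & x > 0,\\ 0, & x \le 0.\end{cases}$$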
DeepMind: An artificial intelligence company and research laboratory of Alphabet Inc., based in the United Kingdom. It was founded in September 2010 and acquired by Google in 2014.
Moore’s law: A historical observation and projection that the number of transistors in a dense integrated circuit (IC) doubles approximately every two years.
NeurIPS: The Conference and Workshop on Neural Information Processing Systems, an annual conference on machine learning and computational neuroscience held in December.
WiML workshop: The annual Women in Machine Learning (WiML) Workshop, a technical event where women can share their machine learning research.
AI4ALL: A not-for-profit organization dedicated to advancing diversity and inclusion in the field of artificial intelligence.
AlphaFold: A machine learning algorithm built by Google’s DeepMind that predicts the structure of proteins.
XAI: Explainable AI, a form of AI whose outcomes can be understood by humans. It contrasts with the "black box" notion in machine learning, in which even the designers of an AI system cannot explain why it made a particular decision.