Intelligence Primer
developed a unique language. These systems started with basic English. Large refers to the size of the model, but it could also refer to the size of the corpus of its data. Large Language Models, like ChatGPT, appear intelligent due to the language model driving them.

Turing test
The Turing test has become infamous in the world of Artificial Intelligence. Proposed by Alan M. Turing in 1950 and originally called the Imitation Game [71], it describes a simple method to determine intelligence. The test focuses on a machine's ability to exhibit intelligent behavior indistinguishable from that of a human. A person interacts through an interface with an unknown entity. That entity can be another human or a machine. The machine has passed the Turing test if the person cannot distinguish which one is the human and which is the machine.

Only one Artificial Intelligence system has passed the Turing Test [63]. The passing is not without controversy. The chatbot, called Eugene Goostman, acts like a 13-year-old Ukrainian boy. The chosen age and country of origin forced the judges to compensate for errors and awkward grammar. The Royal Society held the test in London in 2014. Eugene Goostman managed to convince 30% of the judges that it was human during a 5-minute typed conversation.

Efficiency: a back-of-the-envelope calculation. Let us create an efficiency equation for passing the Turing Test (an idea inspired by Paul Gleichauf):

$TTE_y = \frac{x}{y \cdot d \cdot h \cdot p}$  (1)

where $x$ is what you are comparing (human or artificial), $y$ is the age in years, $d$ is days per year (365), $h$ is hours in a day (24), and $p$ is power consumption (for humans, 20 Watts). Using an average graduate age of 21 years, the Turing Test Efficiency 21 ($TTE_{21}$) would have a divisor of 3.7 MWh (megawatt-hours). This number crudely represents the training cost in watt-hours to pass the Turing Test. Large Language Models can also be assessed with this measure; see Section 12.

$TTE_{21} = \frac{x}{3.7\,\mathrm{MWh}}$  (2)

The quotient for another 21-year-old would be 1, compared to a 57-year-old, which is 2.7. Passing the Turing Test takes a 57-year-old 2.7 times more training energy than a 21-year-old. The higher the quotient, the more energy is required to pass the Turing Test. The quotient gives us a simple metric to determine the cost of attaining a level of intelligence indistinguishable from humans.
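As a quick check of the arithmetic, here is a minimal sketch of equations (1) and (2) in Python, assuming only the constants given above:

```python
# Turing Test Efficiency (TTE): training energy in watt-hours, equation (1).
# Assumes the text's constants: 365 days/year, 24 hours/day, 20 W for a human.
def tte_divisor(age_years, watts=20, days=365, hours=24):
    return age_years * days * hours * watts

wh_21 = tte_divisor(21)                # 3,679,200 Wh, i.e. ~3.7 MWh
quotient_57 = tte_divisor(57) / wh_21  # ~2.7: a 57-year-old needs 2.7x the energy
print(f"{wh_21 / 1e6:.2f} MWh, quotient = {quotient_57:.1f}")
```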
Problem solving
Measuring problem-solving ability is a way to measure reasoning capability without language. One of the best ways to do this is through Behavioral Psychology. Behavioral Psychology is a school of thought focused on observable and measurable intelligence. It assumes a blank slate (tabula rasa) and that all aspects of intelligence are due to a learning process. One's intelligence is directly proportional to the complexity of the problem domain. In this case, problem-solving refers to a physical embodiment of deductive reasoning.

Pavlovian conditioning is the best example of Behavioral Psychology put into observable practice. By shaping the environment, specific actions can be correlated with understanding of the world and capacity for intelligence. Since animals cannot speak or refuse to talk to humans, this is one of the first examples of being able to test an animal's understanding.

Crows, primates, and cephalopods can all reason several steps in advance to solve problems. In some cases, crows can solve problems better than most 5-year-olds. Crows can retrieve objects floating in containers just out of beak reach by adding rocks [39]. Problem-solving as a measure shows some promise as a method to evaluate certain kinds of Artificial Intelligence. This is especially true for systems that navigate changing environments or new problems to solve [12].
Measuring brain activity
Measuring the brain's electrical activity is vital to determine whether a non-responsive person is in a vegetative state or suffering from locked-in syndrome. A vegetative state is when a person appears awake without awareness. By comparison, locked-in syndrome is a condition where a person is aware but cannot communicate. Some forms of stroke or head injury cause these situations.

This is where non-invasive Brain Computer Interfacing helps determine levels of consciousness. For example, one method to help discover the level of consciousness is called Zap and Zip [34]. A sensor cap fits onto a patient's head. The sensors measure the electroencephalogram, more commonly referred to as the EEG: the electrical activity occurring in the brain. First, a magnetic pulse ("zap") is applied, and the cap sensors pick up the EEG signals. The EEG data is a large amount of unstructured information. Then the zip compression algorithm is used to compress the data. The zip algorithm compresses common repeating patterns and leaves unique activity unchanged. Since no information is discarded, zip is a lossless compression algorithm. The resulting file size is the measurement of consciousness. A small file size indicates that most brain systems are running automatically. A large file leans toward the patient being conscious but unable to communicate.

Artificial Intelligence has no consciousness, so there is no equivalent method.
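A minimal sketch of the compress-and-measure idea, using Python's zlib as the lossless compressor; the signal arrays here are stand-ins, not real EEG data:

```python
import zlib
import random

def complexity(signal: bytes) -> float:
    """Compressed size over raw size: higher means less repetitive activity."""
    return len(zlib.compress(signal)) / len(signal)

random.seed(0)
repetitive = bytes([10, 20, 30, 40] * 2500)                     # automatic, regular
irregular = bytes(random.randrange(256) for _ in range(10000))  # diverse activity

print(complexity(repetitive))  # small ratio: patterns compress away
print(complexity(irregular))   # ratio near 1: little structure to remove
```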
Recommended reading
• The Feeling Of Life Itself, by Christof Koch
• Human Compatible, by Stuart Russell
• Artificial Intelligence: A Modern Approach, by Stuart Russell et al.
• On the Measure of Intelligence, by François Chollet

7 MATHEMATICALLY MODELING INTELLIGENCE
In an attempt to make the world understandable, humans use mathematics. Mathematics is said to be either invented or naturally occurring. If we believe it is naturally occurring, it is a discipline of discovery rather than an invention of ingenuity. Immanuel Kant, the philosopher, said "that it was Nature herself, and not the mathematician, who brings mathematics into Natural philosophy" [65]. For our discussion, we choose the invention path.

Mathematics is a collection of rules that attempts to model and map the world about us. There is a vast number of techniques for algorithmically representing the world. It is a tool that is gaining in capability. Is it only a matter of time before we can model intelligence mathematically? The following are what we believe to be the most focused mathematical models for intelligence.

In many cases, these methodologies require an expert, in most cases a human, to assist in separating the correct from the incorrect. While reading this section, the essential element to remember is the George Box aphorism: "All models are wrong, but some are useful."

We create models to generalize and abstract. We could make a model so complex that it reflects reality, or so simple that it is easy to implement and train. Models are essential in Machine Learning. From the outside looking in, we need to train a model; the model itself has to have the ability to learn. In general, there are two main categories of learning, namely supervised and unsupervised. As the name implies, supervised learning requires an expert to dictate. The expert determines what is correct and what is incorrect. Unsupervised learning requires no domain expert to intervene. There are sub-categories, which include, for example, semi-supervised learning.
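To make the distinction concrete, here is a small sketch contrasting the two categories; it assumes scikit-learn is available, and the toy data is purely illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])

# Supervised: an "expert" supplies the correct label for every sample.
y = np.array([0] * 50 + [1] * 50)
classifier = LogisticRegression().fit(X, y)

# Unsupervised: no labels; the algorithm must find the structure on its own.
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)
```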
We will categorize each model by pairing it with one of the previously defined elements associated with intelligence and highlighting its unique biomimicry. Biomimicry is the emulation of processes found in nature. The point of this is two-fold: first, to put the methodology into perspective, and second, to reinforce the notion that replication requires a detailed definition and a form of measurement.

Specialization vs generalization
Statistical modeling is all about optimizing based on some desired result. Engineers, researchers, and data scientists define the methods and the desired outcomes.

Chollet defines intelligence as "The intelligence of a system is a measure of its skill-acquisition efficiency over a scope of tasks, concerning priors, experience, and generalization difficulty." [12]. This observation is helpful to remember, as most, if not all, of these methodologies optimize for a specific task. Chollet points out that there is no quantifiable measure for the general. The more optimized a model is for a particular job, the more specialized it is, and the less general it will become.

Historically, we believed that specialized knowledge came from generalized knowledge. However, the following methods show how to obtain technical expertise without general knowledge. Each model has constraints, but all focus on optimizing a particular feature. For example, Reinforcement Learning (for now) optimizes along one variable, defined by its reward function.

Bayesian probabilism
Biomimicry note: Paired with deductive reasoning and statistics

Bayesian probabilism is where the comparison between human and Machine Intelligence stops, as biological systems are generally weak at statistics. As mentioned previously, Judea Pearl believes humans are wrongly wired or, more precisely, intentionally wrongly wired for other evolutionary priorities [54]. Bayesian probability primarily lends itself to robotics and Simultaneous Localization and Mapping (SLAM). The reason for its popularity in these areas is that it seems to be the best method for merging multiple sources of information to establish a consensus of best knowledge. For example, while a robot tracks its path, it always runs on minimal information and makes decisions based on statistical likelihood.

Bayes Theorem underpins Bayesian probability, a mathematical equation for how justified a specific belief is about the world. Bayes Theorem is defined as follows:

$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}$  (3)
Bayes Theorem mathematically expresses classical logic, as far
as the known is concerned. When dealing with the unknown, new
variables need to be introduced. Entropy in probabilism refers to the
amount of the problem space that remains unknown. As the equation
expresses the probability that something is true, the amount known
about the system must be recorded and categorized.
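As a worked example of equation (3), here is a minimal sketch of a single Bayesian update, with Shannon entropy standing in for how much of the problem space remains unknown; the sensor numbers are invented for illustration:

```python
import math

p_open = 0.5  # prior belief: a door is open with probability 0.5

# Invented sensor model: how often the sensor reads "open" in each true state.
p_read_if_open, p_read_if_closed = 0.9, 0.2

# Equation (3): P(open | reading) = P(reading | open) P(open) / P(reading).
p_reading = p_read_if_open * p_open + p_read_if_closed * (1 - p_open)
posterior = p_read_if_open * p_open / p_reading  # ~0.818

def entropy(p):
    """Shannon entropy in bits: 1.0 = fully unknown, 0.0 = fully known."""
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)

print(posterior)                            # belief sharpened by the evidence
print(entropy(p_open), entropy(posterior))  # uncertainty drops: 1.0 -> ~0.68
```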
Deep learning
Biomimicry note: Paired with the physical nature of the human brain
understanding. In this concept, consciousness is viewed simply as a byproduct of functional execution. The Global Workspace Theory (GWT) [5, 6] is one concept that falls under the functional category. This theory is closely related to an old idea in Artificial Intelligence. Global Workspace Theory has a centralized blackboard (workspace) where ideas live. These living ideas are placed on or taken from the blackboard; some ideas appear briefly. Ideas can be combined, processed, or ignored. Subsystems are available to handle low-level ideas. Consciousness, therefore, comes about through functional processes.
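The blackboard idea maps naturally onto code; here is a minimal sketch of that architecture, where the class and subsystem names are our own and purely illustrative:

```python
class Workspace:
    """Central blackboard: ideas are posted, broadcast, and fade unless re-posted."""
    def __init__(self):
        self.ideas = []
        self.subsystems = []

    def post(self, idea):
        self.ideas.append(idea)

    def cycle(self):
        current, self.ideas = self.ideas, []  # old ideas appear only briefly
        for idea in current:                  # broadcast to every subsystem,
            for sub in self.subsystems:       # which may combine, process,
                sub.react(idea, self)         # or simply ignore the idea

class AttentionSubsystem:
    """A low-level handler: reacts only to ideas mentioning light."""
    def react(self, idea, workspace):
        if "light" in idea:
            workspace.post(f"attend-to: {idea}")

ws = Workspace()
ws.subsystems.append(AttentionSubsystem())
ws.post("bright light on the left")
ws.cycle()
print(ws.ideas)  # ['attend-to: bright light on the left']
```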
The functional approach also includes the notion that animals are physical "wet" computers, processing complex data. The main task today is to understand how these complicated biological systems interact and function as a complex system.

High-level consciousness is a functional system, so the more we understand the functions, the closer we get to having the ability to understand and, in theory, create machines with consciousness.

• Universe basis: Galileo believed certain things have repeatable physical characteristics that obey mathematical laws [17], e.g., a ball rolling down a hill. Other concepts reside only in the conscious, e.g., smell, taste, and color. These qualities only exist in the mind. If people cease to exist, then these qualities evaporate, meaning our physical laws are incapable of providing the complete story. Philip Goff put forward the concept of Panpsychism [22]. Panpsychism proposes that consciousness is a fundamental aspect of reality [17], meaning it is equivalent to mass or charge. The belief is that consciousness is inherent in the fabric of the universe and not limited to a brain. This belief comes about if we separate the substrate and the concept. Christof Koch stated that if true, the "cosmos is suffused with sentience" [35]. This idea leads to cosmopsychism: that the universe itself is conscious. Panpsychism theory has difficulty explaining combination, i.e., how small consciousnesses combine to create a more significant form [17]. In other ideas, such as Psychological ether theory, brains do not produce consciousness but use consciousness, i.e., consciousness existed before our brains existed.
• Experiences basis: Christof Koch defines consciousness as a set of experiences. The historical experiences differentiate humans from each other and from machines [34]. Some humans are more disposed to experiences than others. Experiences are a form of causal action. Koch showed an exciting model, already introduced in Figure 8, to show the difference between consciousness and intelligence. In this figure, we have taken the liberty to show how we humans and Artificial Intelligence could map out. The driver for this figure is to provide two elements: the first is an abstract view of where we are in terms of intelligence, and the second is to highlight the importance of consciousness. Koch describes an experience in terms of the Integrated Information Theory (IIT) [70]:

"consciousness is determined by the causal properties of any physical system acting upon itself. That is, consciousness is a fundamental property of any mechanism that has cause-effect power upon itself."
Christof Koch [34, 70]

Integrated Information Theory can be said to be a form of Panpsychism. Koch and others proposed that we may never replicate biological consciousness if the replication method is digital simulation. Consciousness requires causal powers to make consciousness conscious. For engineers and scientists, this is probably best illustrated as a simple equation; see Figure 9. Using the equation, we would need both simulated rules and the ability to store and create causal effects to make a conscious machine.
• Quantum basis: As an alternative thought, Roger Penrose, a mathematical physicist, believes that "whatever consciousness is,
• Novacene, by James Lovelock
• Consciousness as a Social Brain, by Michael S. A. Graziano
• Where am I?, by Daniel C. Dennett

Recommended reading
• Novacene, by James Lovelock
• ChatGPT, by OpenAI

10 EXCEEDING HUMAN INTELLIGENCE
Up to this point, we have been discussing digital cloning; in this section, we discuss how to exceed our intelligence. This area comes with both optimism and concern. Surprisingly, it was John von Neumann who created the now infamous term Technological Singularity [73]. The belief is that once Singularity occurs, technology will reach an irreversible point that exceeds human capability; see Figure 10. Singularity is achieved by constructing an artificial system or through augmentation. Specifically for intelligence, this is known as superintelligence and hyperintelligence. There are at least two other methods to exceed human intelligence, namely evolution or external influence. Evolution does not stop [10]; it continues. Humans are just an interlude in the process; assuming we are an endpoint would be an error. The fourth possibility is an external influencer, i.e., extraterrestrial [23]. Extraterrestrial influence is beyond this paper's scope but raises some interesting questions about how we define life and, more precisely, how we define intelligent life, e.g., Assembly Theory [20].

There are always concerns. The primary problem is having superintelligence with no consciousness, i.e., no awareness of implications. Harari describes an Artificial Intelligent system that takes over the world (and beyond), and its only objective is computing π [26]. It constantly pursues gaining resources and removes all obstacles, with no awareness of right or wrong; the system takes over the world by continuously consuming more and more resources to feed a pointless calculation. It has no evil intentions; it is too focused on its goal to consider other factors. It has no awareness of the self and the implications of its actions. Humans are secondary at best in this scenario.

If some form of super Artificial Intelligence should emerge, will it occur by accident due to system complexity? The system would attain sufficient complexity to allow intelligence to emerge. In other words, emergence would not be designed or manufactured but would form naturally through chaotic processes. This chaotic emergence would mean that intelligence would appear without the expected controls.

Figure 10: Scala Naturae "Ladder of Being" (plus poetic license)

Superintelligence
There are at least two main methods to build a system capable of superintelligence [8]. The first method is to create a system so complex in knowledge and sophistication that intelligence hopefully appears. The other process involves transferring or copying an existing biological intelligence in the hope of jump-starting superintelligence. The jump-start consists of reading and copying neurons and synapses.

Koch pointed out that at the current rate of technological advancement, we should be capable of simulating a mouse brain within 2 years [34]. The Blue Brain/SpiNNaker project is on course to achieve this goal with a massively parallel spiking neuron machine [46]. A device must simulate 100 million neurons even for this relatively simple task. As Koch also points out, this is merely a functional model with no consciousness or awareness, i.e., a zombie intelligence.

Hyperintelligence
Hyperintelligence is a concept where humans are augmented with technology to enhance their intelligence. James Lovelock, the originator of the Gaia hypothesis (i.e., the Earth is a self-regulating system), believes that increasing human intelligence is the only solution to global warming and world issues. The only way to improve intelligence is through augmentation [41].

In recent years, we have seen a rapid increase in the sophistication of Brain Computer Interfacing (BCI) in reading and writing. The intrusive devices are implanted directly into the brain, connecting
to specific neurons. We typically apply these devices to people with cognitive disabilities, for example, people who have sustained head injuries or experienced a severe stroke. The neural pathways can be re-routed by stimulating particular brain regions. This method uses a biological mechanism called plasticity, which teaches the brain to bypass the damaged areas.

Augmentation could also take away aspects of our humanity. Augmentation could involve getting rid of the drive to explore or travel. It could lower respiration to reduce CO₂ levels or even make us all vegetarians to reduce methane production.

Companies such as Neuralink see this as an opportunity to speed up human-computer communication by allowing direct connection, i.e., opening the door for augmentation and hyperintelligence.

What happens?
If we exceed human intelligence, where does this lead us [23]? Do humans end up co-existing with this new intelligence? Do humans become pets and end up in zoos? Do humans ascend to a higher plane of existence (we become the machine)? Or do humans eventually become extinct (for good or bad reasons)? Do government regulations slow down or speed up this process? Do energy and resources become the primary constraint and limiting factor?

Recommended reading
• Superintelligence - Path, Dangers, Strategies, by Nick Bostrom
• The Major Transitions in Evolution Revisited, by B. Calcott et al.
• Human Compatible, by Stuart Russell
• Novacene, by James Lovelock
• The Feeling Of Life Itself, by Christof Koch

11 CONTROL OF INTELLIGENCE
We have intelligence and multiple individual intelligence nodes, i.e., animal and artificial. How do we control these nodes to do something useful or ensure they behave correctly? Sometimes correct behavior is optional, overridden by basic survival requirements. In animal intelligence, survival tends to have the highest priority. Humans have religion, laws, ethics, morals, and social norms to ensure compliance with society. A selfish motivator is applied, rewarding obedience: more money, promotion, or higher status. And, if we do not comply, depending on severity, repercussions occur, e.g., isolation of an animal from the pack.

For most animals, conformance training occurs when young, and more so for altricial species. For example, mature dogs make sure the younger ones are kept in check. Dog owners are very familiar with this concept, so they introduce younger dogs to an environment with older dogs. When dogs mature into adulthood without this social training, they lack some social skills, i.e., they do not comply with the social etiquette of dogs.

Constraining machine learning
The introduction mostly covered animal control, so what about Artificial Intelligence and Machine Learning? What are the control mechanisms available for artificial systems? What are the potential repercussions of non-control?

Machine Learning performs two significant tasks. The first is pattern matching using some form of correlation, i.e., deep learning. Second, Machine Learning strives toward an optimized goal utilizing some metric or reward, i.e., Reinforcement Learning. Both are useful within narrow problem spaces. The exciting part is when the system changes from perception to decision-making. Perception is about determining an environment, e.g., the orange is in front of the pineapple, or the bed is in a hotel. Perception is relatively safe since the consequences tend to be limited. By contrast, decision-making is about interacting within the physical world (for example, autonomous vehicles). Decision-making is inherently more dangerous since there are human implications.

Stuart Russell, in Human Compatible: Artificial Intelligence and the Problem of Control, and others have identified this transition as highly dangerous [49]. There is concern that unquestioning belief in reinforcement learning, with its endless pursuit of simple, attainable goals, might lead to problems. The real-world environment is much more complicated [77] since there are humans (other independent agents). For example, Russell [61] points out the potential danger if the system identifies protecting its kill-switch as part of an optimizing metric.

What are the mechanisms to control Machine Learning?
• Testing. Vigorous testing is the easiest way to control a Machine Learning model. Corrections are made to the training data if the model goes awry. The disadvantage is that we cannot handle every test case and scenario; a strong linkage exists between testing and training data.
• Boundary limitation. We can set boundary limitations for Machine Learning systems, i.e., no dialing the volume to 11. These are simple mechanisms that narrow the operating range. The disadvantage of boundary limitations is that not all environments have a precise operating scope, and maybe there is a rare instance that requires setting something to "11".
• Parallel modeling involves a simple duplicate functional model that checks the more complex Machine Learning model. If there is a noticeable difference, then a contention error is raised. By judging the contention, a decision on the right course of action can occur. The disadvantage of parallel modeling is that, like the previous two examples, it is only suitable for more straightforward problems with abundant computational resources.
• Multiple Machine Learning systems. N-version programming may provide a control method, with the systems using either the same input data or different input data. Each system votes on a final decision, and the majority wins. The voting method has resilience since it handles a wrongly trained model, i.e., a Byzantine-style algorithm (see the sketch after this list).
• Explainable AI. Another method to help control Machine Learning systems is to have a strategy to understand them. This strategy is essential to determining how conclusions come about in a network. Explainability is vital to avoid bad biasing. As a counterpoint, Julian Miller, Computer Scientist at York University, stated that explainability might act against the goal of Artificial Intelligence. For example, we find it difficult to understand our own decision-making, so why expect Artificial Intelligence to be explainable or understandable?
• Inverse Reinforcement Learning. As described by Russell, it is when you alter the reward mechanism to be more oriented
around humans. The reward is based on human preference to produce beneficial Artificial Intelligent systems. ". . . machines will need to learn more about what we want from observations of the choices we make and how we make them. Machines designed in this way will defer to humans; they will ask for permission; they will act cautiously when guidance is unclear, and they will allow themselves to be switched off" [61]. In other words, this means building mathematical models that can capture, understand, and work with human intent.
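Here is a minimal sketch of the N-version voting idea from the list above; the three "models" are stand-in functions, not real trained systems:

```python
from collections import Counter

def majority_vote(predictions):
    """Return the answer most models agree on; ties go to the first seen."""
    return Counter(predictions).most_common(1)[0][0]

# Three independently built stand-in classifiers; one is faulty (Byzantine).
model_a = lambda x: "cat"
model_b = lambda x: "cat"
model_c = lambda x: "dog"   # wrongly trained model, outvoted below

sample = object()  # placeholder input
print(majority_vote([m(sample) for m in (model_a, model_b, model_c)]))  # cat
```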
Today, Artificial Intelligence and Machine Learning systems are comparatively basic, i.e., narrow. The next generation of systems will likely be much more capable, and with that capability comes the requirement for more control. What resource cost are we willing to pay for explainability and control? Is it essential that Artificial Intelligent systems fall under human control mechanisms, e.g., ethics, laws, and religion?

Explainability in Deep Learning
Over the last few years, explainability has come to the forefront of Deep Learning conversations. Not all Deep Learning architectures are the same; some are more explainable than others. Convolutional Neural Networks (CNN) are the worst in explainability but the most popular in terms of ease of development. These two factors go together.

Explainable AI refers to decoding the black box that is Deep Learning. The issue is that the architectures are so massive and so removed from human involvement that they are no longer readable. There have been some methodologies to address this. Heat mapping is one approach; it highlights specific areas in the images of a dataset that correlate to high-impact weights in Neural Networks.

Explainability has been one of the significant factors impacting the adoption of Deep Learning. Most Deep Learning models lack transparency, which means that any human-involved interaction will be complicated.

This difficulty is genuine in the medical industry. Two metrics help in explainability, at least in terms of performance: sensitivity and specificity, which are specific to any model. Sensitivity refers to the proportion of correctly diagnosed positives, e.g., people identified as having cancer who do have it; specificity refers to the proportion of correct negatives, e.g., people diagnosed as cancer-free who are. These metrics aid in the adoption process by highlighting the likelihood of the worst-case scenario in medicine, i.e., a false-negative answer.
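A minimal sketch of both metrics computed from confusion-matrix counts; the counts are invented for illustration:

```python
def sensitivity(tp, fn):
    """True positive rate: correctly identified positives, e.g., cancers found."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: correctly identified negatives, e.g., healthy cleared."""
    return tn / (tn + fp)

# Invented screening results: 90 true positives, 10 missed (false negatives),
# 85 true negatives, 15 false alarms (false positives).
print(sensitivity(tp=90, fn=10))  # 0.9  -> 10% false negatives, the worst case
print(specificity(tn=85, fp=15))  # 0.85
```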
Recommended reading
• Human Compatible, by Stuart Russell
• Homo Deus: A Brief History of Tomorrow, by Yuval Noah Harari
• Superintelligence - Path, Dangers, Strategies, by Nick Bostrom
• The Possibility of Life, by Jaime Green

12 LARGE LANGUAGE MODELS AND GENERATIVE AI
Large Language Models (LLM) have become increasingly popular, raising excitement and concern. We can ask questions and receive near-perfect responses. The excitement comes from our desire to interact with computers using natural language, e.g., Star Trek's voice computer. This desire can be traced back to the mid-1960s with programs such as ELIZA and DOCTOR [80], which provided primitive natural language interactions.

Over what appears to be a short period, Large Language Models can now pass professional exams in disciplines such as medicine [33] or accountancy [82]. Not just pass, but pass at a high proficiency level. Let's ask one a question: "Concisely, what is a Large Language Model?". Below is the response from OpenAI ChatGPT-3.5 on Friday, July 28th, 2023, at 7:48 PM:

"A large language model is an artificial intelligence model that has been trained on vast amounts of text data to understand and generate human-like text. It uses a deep learning architecture called transformers to process and analyze language patterns, allowing it to answer questions, generate responses, perform language translation, and perform various natural language processing tasks. The term "large" refers to the extensive amount of training data and the size of the model's parameters, which helps it achieve a higher level of language comprehension and generate more coherent and contextually relevant responses."
OpenAI ChatGPT-3.5

If we did a traditional search, we could find a similar response, which would require reading various websites. This simple example shows some of the versatility of a Large Language Model. We next try a more challenging question: "Concisely, what is a Large Language Model from Plato's perspective?". This question is much more challenging, even for a human. Below is the response from OpenAI ChatGPT-3.5 on Friday, July 28th, 2023, at 8:10 PM:

"A large language model is like a modern-day oracle of language. It is a vast and intricate system designed by humans, equipped with the wisdom of countless texts from the past. This machine can comprehend human language and generate responses that appear to be imbued with intelligence, even though it lacks true understanding or consciousness. It is a tool that can be harnessed for various purposes, acting as a virtual companion, advisor, or storyteller, but its capabilities are limited to the information it has been fed. As a result, while it may seem to possess knowledge and insight, it remains a creation of human ingenuity, not a sentient being with genuine wisdom."
OpenAI ChatGPT-3.5

This second response provides much more insight into what a Large Language Model is and is not. It may not be true to Plato's style, but the model exhibits adaptability.

Large Language Models use a technique called transformers. Transformers were described in the Vaswani et al. paper Attention Is All You Need [75]. They create general models that can transfer learning to specific areas. Pre-trained models focus on multiple tasks. We can view transformers as another form of sophisticated
search tool. Instead of receiving a list of web pages to be post-reviewed, we get an answer to the question. For more specialized searches, we can use a prompt template to help narrow the search and provide higher-quality responses.
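The core operation inside a transformer is scaled dot-product attention; here is a minimal NumPy sketch of that one formula from the Vaswani et al. paper, where the shapes and values are toy examples, not a full transformer:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))  # 3 tokens attending...
K = rng.normal(size=(3, 4))  # ...over 3 tokens
V = rng.normal(size=(3, 4))
print(attention(Q, K, V).shape)  # (3, 4): one mixed value vector per token
```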
Initial concerns
All increases in capability cause concern, and these models have generated great excitement and worry. These concerns are not necessarily new, but because Large Language Models produce quality natural language responses, they magnify the standard issues. Organizations such as OpenAI propose alleviating some of these concerns through self-regulation, e.g., through organizations such as the Frontier Model Forum [51].

The concerns:
• Hallucinations: Large Language Models are trained on large data sets with incomplete or contrarian information. In response to some questions, the models can give potentially silly, inaccurate, or downright dangerous answers, i.e., garbage in, garbage out [50]. Human intelligence is not immune to such hallucinations.
• Verification: The requirement for verification is due to the hallucinations. Like all programs written by humans, we must verify the results. Assuming output from a Large Language Model to be correct would be dangerous, especially in cases where the results have consequences. For example, medical advice: even if the output from the model seems logical and friendlier than a human equivalent, it should be verified. Humans look for multiple sources for verification by comparing facts, opinions, and statements.
• Resources: A comment from Sajjad Moazeni, a University of Washington Assistant Professor of Electrical and Computer Engineering: "Overall, this can lead to up to 10 gigawatt-hour (GWh) power consumption to train a single large language model like ChatGPT-3. This is, on average, roughly equivalent to the yearly electricity consumption of over 1,000 U.S. households." [44]. Using the $TTE_{21}$ equation (2) for efficiency, the Large Language Model would have an estimated inefficiency level of 2700 (worked through in the sketch after this list). Passing the Turing Test takes 2700 times more energy than training a 21-year-old. Along with the power consumption, a model requires a large amount of water to cool the data center, estimated to be 185,000 gallons [3]. Resource consumption is an issue for these models. Intelligence relies on efficiency just as much as on being intelligent. If we want to scale Large Language Models, they must become efficient; otherwise, they are not sustainable.
• Transparency: How was the response created? What were the sources used? Just providing answers to questions does not mean that the logic is correct, i.e., sophistry. We are wary of untruths, biases, or made-up information without transparency. Also, plagiarism, copyright infringement, privacy, and intellectual property rely on transparency. We want transparency, but we could argue that humans do not provide transparency. Humans may do this because the act is too complex, they do not know, or they do not want to divulge.
• Education: Similar to any new tool in history, e.g., the slide rule, digital calculator, personal computer, and Google Search, there is always concern about how education adapts. Each new tool causes a rethink in teaching. Education is all about proof of learning. Intelligent systems (including humans) need to be able to learn.
• Normalization: We have a general tool so powerful that it can create and summarize texts. If it is so good, then are we not at risk of normalizing the written language to the point of it being obsolete? If we (humans) communicated in a normalized fashion, we would probably question the effectiveness and value of that communication. We would move away from individualism.
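The inefficiency figure in the Resources item follows directly from equation (2); here is a minimal check of the arithmetic, assuming the quoted 10 GWh training cost:

```python
# Equation (2): TTE_21 = x / 3.7 MWh, with x = quoted LLM training energy.
llm_training_wh = 10e9   # 10 GWh in watt-hours (Moazeni's estimate)
human_21_wh = 3.7e6      # 3.7 MWh: 21 years at 20 W, per equation (1)

tte_21 = llm_training_wh / human_21_wh
print(round(tte_21))     # ~2703: roughly the "2700" inefficiency level cited
```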
The above concerns are typical for all Artificial and Biological Intelligence, but are we expecting too much from these new models too soon, considering that the models are derivatives of human-produced knowledge? After all, a Large Language Model is, at this point, a sophisticated search tool that responds with well-formed natural language answers.

An important variant of the Large Language Model is the Multimodal Large Language Model (MLLM) [85]. As the name implies, it combines the capabilities of Large Language Models with the ability to converse in multiple modalities. Modalities include speech, images, audio, and many more. It allows for more human-like communication, potentially moving towards some form of Artificial General Intelligence.

Why are these models so significant? It is not because they consistently produce perfect results or provide any progress towards consciousness, but because they offer a method to explore what we already know. We want answers or, more importantly, good-enough answers to ambiguous questions.

Using natural language is a significant achievement, so we should consider it a Trinity event. We see a path to a proper multimodal capable system handling voice, images, sound, and taste. Trinity was the name given to the first successful atomic bomb experiment, a significant inflection point in history (for good or bad). There was a before and an after, and we may consider the emergence of this technology as equally significant. As of writing this section, we are only scratching the surface of understanding. The potential capabilities are just appearing. It is still early days.

Recommended reading
• ChatGPT, by OpenAI

13 LEGAL IMPLICATIONS
Most humans operate within some form of legal system. The legal system is required to pass blame or exonerate an entity. The critical question is, what happens when Artificial Intelligence makes the wrong decision? Is the Artificial Intelligence to blame? Is the operator or final integrator to blame? Is the person who switched on the system to blame? Is the engineer or data scientist to blame? Or, if in doubt, is the entire stack of people to blame?

These are fundamental questions for government regulators and insurance companies. For government regulators, it usually comes down to ensuring that the new systems do not act against society or hinder progress [9]. Insurance companies look at the problem of how best to protect their company from unnecessary costs. In other words, what does the insurance cover, and what does it not
cover? Artificial Intelligence technology introduces other problems for the legal world:

"A New York lawyer is facing a court hearing of his own after his firm used AI tool ChatGPT for legal research. A judge said the court was faced with an "unprecedented circumstance" after a filing was found to reference example legal cases that did not exist."
BBC News, May 2023 [4].

Other problems include taking knowledge and creating new versions, e.g., copying biographies and selling those biographies online [68]. These new tools can take existing content and reiterate the content into new forms. As Artificial Intelligence tackles evermore sophisticated problem spaces, the legal system must learn to adapt to these new challenges. We are making this a catch-up race, and the regulators and insurance companies need to catch up.

14 WRONG NUMBERS
We have finally got to the wrong numbers section, and this, in part, is inspired by a paper written by Roman V. Yampolskiy, titled Why We Do Not Evolve Software? Analysis of Evolutionary Algorithms [83]. What are the crucial numbers if we want to build a human simulator from the ground up, using fundamental principles, i.e., tabula rasa or a clean slate?

We start with a premise p′ (p prime). p′ states that "soon we will be able to create an intelligent machine with sufficient computational power to simulate all the evolutionary processes required to produce human intelligence". p′ depends largely upon significant advancements in computing. Today's technological advances must continue at a similar pace for the coming decades. Now, the vital question to ask is: what is the perceived computational gap between today's computer systems and human-level intelligence? To help answer that question, we start right from the beginning; see Figure 11.

A gram of DNA contains about 455 exabytes of data. Cellular transcription (RNA transcribes genetic information from DNA to a ribosome) runs at about 10^15 yottaNOPS. That is about 10^22 times that of the Fujitsu Fugaku supercomputer. Finally, we add an estimated computing time of roughly 3 billion years. These are all vast numbers.

Evolution plays with all these features using convergency (combining to known outcomes) and contingency (creating unexpected outcomes) [10]. The Earth is the only planet known to sustain intelligent life, making the probability of life a startlingly rare event, i.e., 1 in 26.1 × 10^21.

The simulator will have to simulate neurons. A simple neuron model is 1 × 10^3 Flops, Hodgkin-Huxley (an electrophysiological model) is 1,200 × 10^3 Flops, and the multi-compartmental model is 1,200 × 10^6 to 10^7 Flops. All at the scale of 10^25 neurons.

Currently, the fastest supercomputers range from 60 to 537 petaFlops [42]. The mobile phone network or the cryptocurrency mining community may exceed this in raw floating-point performance. The storage capacity required to run the simulation is 10^21 times that of the top 4 supercomputers in 2019.

If Moore's law continues, it will take roughly 6.7 years to increase the computation by one order of magnitude. After 100 years, the gap will still be significant. Even if we created dedicated hardware accelerators and optimized software, it would only add a few more orders of magnitude.
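A minimal sketch of these back-of-envelope numbers; the per-neuron costs and machine speeds are taken from the text, while the 2-year doubling period for Moore's law is our assumption:

```python
import math

# Per-neuron cost in Flops, from the text's three model fidelities.
simple, hodgkin_huxley, multi_compartment = 1e3, 1.2e6, 1.2e7
neurons = 1e25                           # scale quoted in the text

required = neurons * multi_compartment   # ~1.2e32 Flops needed
fugaku = 537e15                          # fastest machine cited, in Flops

gap = required / fugaku                  # how far short we fall
years_per_10x = 2 * math.log2(10)        # ~6.64 years per 10x, if doubling every 2 years
print(f"gap: 10^{math.log10(gap):.0f}, "
      f"~{math.log10(gap) * years_per_10x:.0f} years of Moore's law to close")
```

The output (a gap of roughly 10^14, taking on the order of 95 years to close) is consistent with the text's claim that the gap remains significant even after 100 years.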
From the evidence presented, traditional technology will fail to reach human-level intelligence in the next 50 or 100 years. Other possibilities, such as Quantum Evolutionary Computation, may create an equivalent to human intelligence using brute-force computation. Unfortunately, Quantum Evolutionary Computation is too new to predict its likelihood of success.

Recommended reading
• The Major Transitions in Evolution Revisited, by B. Calcott et al.