
PROJECTS FROM MAKE: MAGAZINE

Hey Robot! Build Your


Own AI Companion

Time Required: 1–3 Hours
Difficulty: Moderate
Price: $51–$150


By Shawn Hymel  October 28th, 2024



Chat and command your own embedded-AI companion bot using local LLMs

Imagine a fully autonomous robotic companion, like Baymax from Disney’s Big Hero 6 —
a friendly, huggable mechanical being that can walk, hold lifelike interactive
conversations, and, when necessary, fight crime. Thanks to the advent of large language
models (LLMs), we’re closer to this science
fiction dream becoming a reality — at least for
lifelike conversations.

In this guide I’ll introduce Digit, a companion bot that I helped create with Jorvon Moss (@Odd_jayy). It uses a small LLM running locally on an embedded computer to hold conversations without the need for an internet connection.

I’ll also walk you through the process of running a similar, lightweight LLM on a Raspberry Pi so you can begin making your own intelligent companion bot.

This article appeared in Make: Vol. 91. Subscribe for more maker projects and articles!

What Is a Large Language Model (LLM)?


A large language model is a specific type of AI that can understand and generate natural,
human-like text. The most popular example of an LLM right now is OpenAI’s ChatGPT,
which is used to answer questions for the curious, automatically generate social media
content, create code snippets and, to the chagrin of many English teachers, write term
papers. LLMs are, in essence, the next evolution of chatbots.

LLMs are based on the neural network architecture known as a transformer. Like all
neural networks, transformers have a series of tunable weights for each node to help
perform the mathematical calculations required to achieve their desired task. A weight in
this case is just a number — think of it like a dial in a robot’s brain that can be turned to
increase or decrease the importance of some piece of information. In addition to
weights, transformers have other types of tunable dials, known as parameters, that help
convert words and phrases into numbers as well as determine how much focus should
be given to a particular piece of information.
Instead of humans manually tuning these dials, imagine if the robot could tune them
itself. That is the magic of machine learning: training algorithms adjust the values of the
parameters (dials) automatically based on some goal set by humans. These training
algorithms are just steps that a computer can easily follow to calculate the parameters.
Humans set a goal and provide training data with correct answers to the training
algorithms. The AI looks at the training data and guesses an answer. The training
algorithm determines how far off the AI’s result is from the correct answer and updates
the parameters in the AI to make it better next time. Rinse and repeat until the AI
performs at some acceptable level.
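To make that loop concrete, here is a toy sketch in Python (my own illustration, nothing like real LLM training in scale): it tunes a single dial w so that w * x predicts y = 2x, using the same guess, measure the error, nudge the dial pattern.

# Toy illustration only: tune a single "dial" w so that w * x predicts y = 2 * x.
# Real LLM training adjusts billions of parameters this way, with far more math.
data = [(1, 2), (2, 4), (3, 6)]   # (input, correct answer) pairs
w = 0.0                           # the dial, starting at a bad guess
learning_rate = 0.1

for step in range(20):
    for x, y in data:
        guess = w * x                     # the AI guesses an answer
        error = guess - y                 # how far off was it?
        w -= learning_rate * error * x    # nudge the dial to do better next time

print(f"Learned dial value: {w:.2f}")     # converges toward 2.0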

To give you an idea of complexity, a machine learning model that can read only the
handwritten digits 0 through 9 with about 99% accuracy requires around 500,000
parameters. Comprehending and generating text are vastly more complicated. LLMs are
trained on large quantities of human-supplied text, such as books, articles, and websites.
The main goal of LLMs is to predict the next word in a sequence given a long string of
previous words. As a result, the AI must understand the context and meaning of the text.
To achieve this, LLMs are made up of a massive number of parameters. ChatGPT-4, released in 2023, is reportedly built from eight separate models, each containing around 220 billion parameters — about 1.7 trillion total.

Why a Local LLM?


The billions of calculations needed for ChatGPT to hold a simple conversation require
lots of large, internet-connected servers. If you tried to run ChatGPT on your laptop,
assuming you even had enough memory to store the model, it would take hours or days
to get a response! In most cases, relying on servers to do the heavy lifting is perfectly acceptable. After all, as consumers we already rely on plenty of cloud services: video streaming, social media, file sharing, and email.

However, running an LLM locally on a personal computer might be enticing for a few
reasons:

Maybe you require access to your AI in areas with limited internet access, such as remote islands, tropical rainforests, underwater, underground caves, and most technology conferences!

By running the LLM locally, you can also reduce network latency — the time it takes for packets to travel to and from servers. That said, the extra computing power of the servers often makes up for the latency on complex tasks like LLMs.

Additionally, you gain greater privacy and security for your data — the prompts, the responses, and the model itself — since none of it needs to leave your computer or local network. If you’re an AI researcher developing the next great LLM, you can better protect your intellectual property by not exposing it to the outside world.

Personal computers and home network servers are often smaller than the corporate counterparts used to run commercial LLMs. While this might limit the size and complexity of your LLM, it often means reduced operating costs.

Finally, most commercial LLMs contain a variety of guardrails and limits to prevent misuse. If you need an LLM to operate outside of commercial limits — say, to inject your own biases to help with a particular task, such as creative writing — then a local LLM might be your only option.

Thanks to these benefits, local LLMs can be found in a variety of settings, including healthcare and financial systems that must protect user data, industrial systems in remote locations, and some autonomous vehicles that interact with the driver without an internet connection. While these commercial applications are compelling, we should
focus on the real reason for running a local LLM: building an adorable companion bot
that we can talk to.

Introducing Digit
Jorvon Moss’s robotic designs have improved and evolved since his debut with Dexter
(Make: Volume 73), but his vision remains constant: create a fully functioning companion
robot that can walk and talk. In fact, he often cites Baymax as his goal for functionality. In
recent years, Moss has drawn upon insects and arachnids for design inspiration. “I
personally love bugs,” he says. “I think they have the coolest design in nature.”

Digit’s body consists of a white segmented exoskeleton, similar to a pill bug’s, that
protects the sensitive electronics. The head holds an LED array that can express
emotions through a single, animated “eye” along with a set of light-up antennae and
controllable mandibles. It sits on top of a long neck that can be swept to either side
thanks to a servomotor. Digit’s legs cannot move on their own but can be positioned
manually.

Like other companion bots, Digit can be perched on Moss’s shoulder to catch a ride. A
series of magnets on Digit’s body and feet help keep it in place.
Courtesy of Jorvon Moss

But Digit stands apart from Moss’s other companion bots thanks to its advanced brain — an
LLM running locally on an Nvidia Jetson Orin Nano embedded computer. Digit is capable
of understanding human speech (English for now), generating a text response, and
speaking that response aloud — without the need for an internet connection. To help
maintain Digit’s relatively small size and weight, the embedded Jetson Orin Nano was
mounted on a wooden slab along with an LCD for startup and debugging. Moss totes
both the Orin Nano and the appropriate battery in a backpack. You could design your own
companion bot differently to house the Orin Nano inside.

Courtesy of DigiKey

How Digit’s Brain Works


I helped Moss design and program the software system to act as Digit’s AI brain. This
system consists of three main components: a service running the LLM, a service
running the text-to-speech system, and a client program that interacts with these two
services.


Courtesy of DigiKey
The client, called hopper-chat, controls everything. It continuously listens for human speech from a microphone and converts everything it hears to text using the Alpha Cephei Vosk speech-to-text (STT) library. Any phrases it hears are compared to a list of wake
words/phrases, similar to how you might say “Alexa” or “Hey, Siri” to get your smart
speaker to start listening. For Digit, the wake phrase is, unsurprisingly, “Hey, Digit.” Upon
hearing that phrase, any new utterances are converted to text using the same Vosk
system.

The newly generated text is then sent to the LLM service. This service is a Docker
container running Ollama, an open-source tool for running LLMs. In this case, the LLM is
Meta’s Llama3:8b model with 8 billion parameters. While not as complex as OpenAI’s
ChatGPT-4, it still has impressive conversational skills. The service sends the response
back to the hopper-chat client, which immediately forwards it to the text-to-speech (TTS)
service.

TTS for hopper-chat is a service running Rhasspy Piper that encapsulates the en_US-
lessac-low model, a neural network trained to produce sounds when given text. In this
case, the model is specifically trained to produce English words and phrases in an
American dialect. The “low” suffix indicates that the model is low quality — smaller size,
more robotic sounds, but faster execution. The hopper-chat program plays any sounds it
receives from the TTS service through a connected speaker.
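The actual hopper-chat client is open source (linked later in this article); the following is a simplified sketch of that listen, wake, chat, speak loop in Python. It assumes the vosk, sounddevice, and ollama packages, a running Ollama server with the llama3:8b model pulled, and the piper command-line tool with an English voice installed — details such as the piper model name will vary by setup.

# Simplified sketch of a wake-word chat loop (not the actual hopper-chat code)
import json
import queue
import subprocess

import ollama
import sounddevice as sd
from vosk import Model, KaldiRecognizer

WAKE_PHRASE = "hey digit"
stt_model = Model(lang="en-us")               # Vosk English model
recognizer = KaldiRecognizer(stt_model, 16000)
llm = ollama.Client(host="http://localhost:11434")
audio_q = queue.Queue()

def audio_callback(indata, frames, time, status):
    audio_q.put(bytes(indata))                # stream raw mic audio to the queue

def speak(text):
    # Assumed piper invocation; adjust the voice model name/path for your install
    subprocess.run(["piper", "--model", "en_US-lessac-low", "--output_file", "reply.wav"],
                   input=text.encode())
    subprocess.run(["aplay", "reply.wav"])    # play the generated speech

awake = False
with sd.RawInputStream(samplerate=16000, blocksize=8000, dtype="int16",
                       channels=1, callback=audio_callback):
    print("Listening...")
    while True:
        if recognizer.AcceptWaveform(audio_q.get()):
            heard = json.loads(recognizer.Result()).get("text", "")
            if not heard:
                continue
            if not awake:
                awake = WAKE_PHRASE in heard  # wait for the wake phrase
                continue
            # Awake: send the utterance to the LLM and speak the reply
            reply = llm.chat(model="llama3:8b",
                             messages=[{"role": "user", "content": heard}])
            speak(reply["message"]["content"])
            awake = False                     # wait for the wake phrase again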

On Digit, the microphone is connected to a USB port on the Orin Nano and simply draped
over a backpack strap. The speaker is connected via Bluetooth. Moss uses an Arduino to
monitor activity in the Bluetooth speaker and move the mandibles during activity to give
Digit the appearance of speaking.

Moss added several fun features to give Digit a distinct personality. First, Digit tells a
random joke, often a bad pun, every minute if the wake phrase is not heard. Second,
Moss experimented with various default prompts to entice the LLM to respond in
particular ways. This includes making random robot noises when generating a response
and adopting different personalities, from helpful to sarcastic and pessimistic.
Courtesy of Jorvon Moss

Agency: From Text to Action


The next steps for Digit involve giving it a form of self-powered locomotion, such as
walking, and having the LLM perform actions based on commands. On their own, LLMs
cannot perform actions. They simply generate text responses based on input. However,
adjustments and add-ons can be made that allow such systems to take action. For
example, ChatGPT already has several third-party plugins that can perform actions, such
as fetching local weather information. The LLM recognizes the intent of the query, such
as, “What’s the weather like in Denver, Colorado?” and makes the appropriate API call
using the plugin. (We’ll look at some very recent developments in function calling, below.)

At the moment, Digit can identify specific phrases using its STT library, but the recorded
phrase must exactly match the expected phrase. For example, you couldn’t say “What’s
the weather like?” when the expected phrase is “Tell me the local weather forecast.” A
well-trained LLM, however, could infer that intention. Moss and I plan to experiment with
Ollama and Llama3:8b to add such intention and command recognition.
The code for hopper-chat is open source and can be found on GitHub. Follow along with
us as we make Digit even more capable.

DIY Robot Makers


Science fiction is overflowing with shiny store-bought robots and androids created by
mega corporations and the military. We’re more inspired by the DIY undercurrent —
portrayals of solo engineers cobbling together their own intelligent and helpful
companions. We’ve always believed this day would come, in part because we’ve seen it
so many times on screen. —Keith Hammond

Dr. Tenma and Toby/Astro (Astro Boy manga, anime, and films, 1952–2014)
J.F. Sebastian and his toys Kaiser and Bear (Blade Runner, 1982)
Wallace and his Techno-Trousers (Wallace & Gromit: The Wrong Trousers, 1993)
Anakin Skywalker and C-3PO (Star Wars: The Phantom Menace, 1999)
Sheldon J. Plankton and Karen (SpongeBob SquarePants, 1999–2024)
Dr. Heinz Doofenshmirtz and Norm (Phineas and Ferb, 2008–2024)
The Scientist and 9 (9, 2009)
Charlie Kenton and Atom (Real Steel, 2011)
Tadashi Hamada and Baymax (Big Hero 6, 2014)
Simone Giertz and her Shitty Robots (YouTube, 2016–2018)
Kuill and IG-11 (The Mandalorian, 2019)
Finch and Jeff (Finch, 2021)
Brian and Charles (Brian and Charles, 2022)

Roll Your Own Local LLM Chatbot


I will walk you through the process of running an LLM on a Raspberry Pi. I specifically
chose the Raspberry Pi 5 due to its increased computational power. LLMs are notoriously
complex, so earlier versions of the Pi might need several minutes to produce an answer,
even from relatively small LLMs. My Pi 5 had 8GB RAM; these LLMs may not run with
less.


Project Steps


1. SET UP THE PI 5 WITH OLLAMA
Follow the official Raspberry Pi Getting Started guide to install the latest Raspberry Pi OS
(64-bit) and configure your Raspberry Pi. You should use an SD card with at least 16 GB.
Once you have booted into your Raspberry Pi, make sure you are connected to the
internet and open a terminal window. Enter the following commands to update the Pi and install Ollama:

$ sudo apt update


$ sudo apt upgrade
$ curl -fsSL https://ollama.com/install.sh | sh

Next, start the Ollama service:

$ ollama serve

You might see a message that says “Error: listen tcp 127.0.0.1:11434: bind: address
already in use.” Ignore this, as it just indicates that Ollama is already running as a service
in the background.
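To confirm the Ollama CLI can reach the background server, you can list the installed models (the list will be empty until you pull one in the next step):

$ ollama list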

2. TRY OUT TINYLLAMA


Meta’s LLaMa models are almost open source, with some caveats for commercial
usage. AI researcher Zhang Peiyuan started the TinyLlama project in September 2023.
TinyLlama is a truly open-source (Apache-2.0 license), highly-optimized LLM with only 1.1
billion parameters. It is based on the LLaMa 2 model and can generate responses quite
quickly. It’s not as accurate as the newer generation of small LLMs, such as Llama3, but
it will run on hobby-level hardware like our Pi 5.

Download the latest version of TinyLlama with ollama:

$ ollama pull tinyllama

Run an interactive shell to chat with TinyLlama:

$ ollama run tinyllama

You should be presented with a prompt. Try asking the AI a question or have it tell you a
joke.
Courtesy of Shawn Hymel

Press Ctrl+D or enter /bye to exit the interactive shell.

3. SET UP THE OLLAMA PYTHON PACKAGE


By default, Ollama runs as a background server and exposes port 11434. You can
communicate with that service by making HTTP requests. To make life easier, Ollama
maintains a Python library that communicates directly with that locally running service.
Create a virtual environment and install the package:

$ python -m venv venv-ollama --system-site-packages


$ source venv-ollama/bin/activate
$ python -m pip install ollama==0.3.3

Open a new document:



$ nano tinyllama-client.py

Enter the following Python code:


import ollama

# Settings
prompt = "You are a helpful assistant. Tell me a joke. " \
         "Limit your response to 2 sentences or fewer."
model = "tinyllama"

# Configure the client
client = ollama.Client(host="http://0.0.0.0:11434")

# The message history is an array of prompts and responses
messages = [{
    "role": "user",
    "content": prompt
}]

# Send prompt to Ollama server and save the response
response = client.chat(
    model=model,
    messages=messages,
    stream=False
)

# Print the response
print(response["message"]["content"])

Close the file by pressing Ctrl+X, press Y when asked to save the document, and press
Enter.

4. CHAT WITH YOUR LLM BOT!


Run the Python script by entering:

$ python tinyllama-client.py
TinyLlama can take some time to generate a response, especially on a small computer
like the Raspberry Pi — 30 seconds or more — but here you are, chatting locally with an
AI!

Courtesy of Shawn Hymel


This should give you a sense of how to run local LLMs on a Raspberry Pi and interact
with them using Python. Feel free to try different prompts, save the chat history using the
append() method, and build your own voice-activated chatbot.
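For example, here is a minimal sketch (my own addition, building on the tinyllama-client.py script above) that appends the assistant's reply and a follow-up question to the message history before asking again:

# Continue the conversation by growing the message history
messages.append(response["message"])      # the assistant's previous reply
messages.append({
    "role": "user",
    "content": "Now explain that joke in one sentence."
})

follow_up = client.chat(model=model, messages=messages, stream=False)
print(follow_up["message"]["content"])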

LOCAL LLM CHATBOT WITH FUNCTION CALLING


LLMs have traditionally been self-contained models that accept text input and respond
with text. In the last couple of years, we’ve seen multimodal LLMs enter the scene, like
GPT-4o, that can accept and respond with other forms of media, such as images and
videos.

But in just the past few months, some LLMs have been granted a powerful new ability —
to call arbitrary functions — which opens a huge world of possible AI actions. ChatGPT
and Ollama both call this ability tools. To enable such tools, you must define the
functions in a Python dictionary and fully describe their use and available parameters.
The LLM tries to figure out what you’re asking and maps that request to one of the
available tools/functions. We then parse the response before calling the actual function.
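Before wiring up any hardware, here is a hedged, minimal sketch of the idea using a made-up get_time tool (my own example, not from the Digit code); the tools format is the same one used by the light assistant described below.

# Minimal tool-calling sketch with a hypothetical get_time tool. Assumes a local
# Ollama server and a tools-capable model (e.g. llama3.1:8b, used later) pulled.
import datetime
import ollama

def get_time():
    return datetime.datetime.now().strftime("%H:%M")

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_time",
        "description": "Get the current local time",
        "parameters": {"type": "object", "properties": {}, "required": []},
    },
}]

client = ollama.Client(host="http://localhost:11434")
response = client.chat(
    model="llama3.1:8b",
    messages=[{"role": "user", "content": "What time is it?"}],
    tools=TOOLS,
)

# If the LLM chose a tool, parse the response and call the real function ourselves
for call in response["message"].get("tool_calls") or []:
    if call["function"]["name"] == "get_time":
        print("The time is", get_time())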

Let’s demonstrate this concept with a simple function that turns an LED on and off.
Connect an LED with a current-limiting resistor to GPIO 17 on your Raspberry Pi 5.


Courtesy of Shawn Hymel / Fritzing

Make sure you’re in the venv-ollama virtual environment we configured earlier and install
some dependencies:
$ source venv-ollama/bin/activate
$ sudo apt update
$ sudo apt upgrade
$ sudo apt install -y libportaudio2
$ python -m pip install ollama==0.3.3 vosk==0.3.45 sounddevice==0.5.0

You’ll need to download a new LLM model and the Vosk speech-to-text (STT) model:

$ ollama pull allenporter/xlam:1b


$ python -c "from vosk import Model; Model(lang='en-us')"

As this example uses speech-to-text to convey information to the LLM, you will need a
USB microphone, such as Adafruit 3367. With the microphone connected, run the
following command to discover the USB microphone device number:

$ python -c "import sounddevice; print(sounddevice.query_devices())"

You should see an output such as:

0 USB PnP Sound Device: Audio (hw:2,0), ALSA (1 in, 0 out)


1 pulse, ALSA (32 in, 32 out)
* 2 default, ALSA (32 in, 32 out)

Note the device number of the USB microphone. In this case, my microphone is device
number 0, as given by USB PnP Sound Device. Copy this code to a file named ollama-
light-assistant.py on your Raspberry Pi.

You can also download this file directly with the command:

$ wget https://gist.githubusercontent.com/ShawnHymel/16f1228c92ad0eb9d

Open the code and change the AUDIO_INPUT_INDEX value to your USB microphone
device number. For example, mine would be:

AUDIO_INPUT_INDEX = 0

Run the code with:

$ python ollama-light-assistant.py
You should see the Vosk STT system boot up and then the script will say “Listening…” At
that point, try asking the LLM to “turn the light on.” Because the Pi is not optimized for
LLMs, the response could take 30–60 seconds. With some luck, you should see that the
led_write function was called, and the LED has turned on!

Courtesy of Shawn Hymel

The xLAM model is an open-source LLM developed by the Salesforce AI Research team.
It is trained and optimized to understand requests rather than necessarily providing text-
based answers to questions. The allenporter version has been modified to work with
Ollama tools. The 1-billion-parameter model can run on the Raspberry Pi, but as you
probably noticed, it is quite slow and misinterprets requests easily.

For an LLM that better understands requests, I recommend the Llama3.1:8b model. In
the command console, download the model with:

$ ollama pull llama3.1:8b

Note that the Llama 3.1:8b model is almost 5 GB. If you’re running out of space on your
flash storage, you can remove previous models. For example:

$ ollama rm tinyllama
In the code, change:

MODEL = "allenporter/xlam:1b"

to:

MODEL = "llama3.1:8b"

Run the script again. You’ll notice that the model is less picky about the exact phrasing of
the request, but it takes much longer to respond — up to 3 minutes on a Raspberry Pi 5
(8GB RAM).
When you are done, you can exit the virtual environment with the following command:

$ deactivate

A CLOSER LOOK AT OLLAMA TOOLS


Let’s take a moment to discuss how tools work in Ollama. Feel free to open the ollama-
light-assistant.py file to follow along.
First, you need to define the function you want to call. In our example, we create a simple
led_write() function that accepts an led object (as created by the Raspberry Pi gpiozero
library) and an integer value: 0 for off, 1 for on.

def led_write(led, value):
    """
    Turn the LED on or off.
    """
    if int(value) > 0:
        led.on()
        print("The LED is now on")
    else:
        led.off()
        print("The LED is now off")

The trick is to get the LLM to understand that calling this function is a possibility! Since
the LLM does not have direct access to your code, the ollama library acts as an
intermediary. By defining a set of tools, the LLM can return one of those tools as a
response instead of (or in addition to) its usual text-based answer. This response comes in the form of a JSON or Python dictionary that our code can parse in order to call the related function.
You must define the tools in a list of dictionary objects. As these small LLMs struggle
with the concept of an “LED,” we’ll call this a “light.” In our code, we provide the following
description of the led_write() function to Ollama:

TOOLS = [
    {
        'type': 'function',
        'function': {
            'name': "led_write",
            'description': "Turn the light off or on",
            'parameters': {
                'type': 'object',
                'properties': {
                    'value': {
                        'type': 'number',
                        'description': "The value to write to the light "
                                       "to turn it off and on. 0 for off, 1 for on.",
                    },
                },
                'required': ['value'],
            },
        }
    }
]

In the send() function, we send our query to the Ollama server running the LLM. This
query is captured by the Vosk STT functions and converted to text before being added to
the message history buffer msg_history.

response = client.chat(
    model=model,
    messages=msg_history.get(),
    tools=TOOLS,
    stream=False
)

When we receive the response from the LLM, we check to see if it contains an entry with
the key tool_calls. If so, it means the LLM decided to use one of the defined tools! We
then need to figure out which tool the LLM intended to use by cycling through all of the
returned tool names. If the name led_write is given for one of the tools, which we defined
in the original TOOLS dictionary, we call the led_write() function. We provide the function
call with the pre-defined led object and argument value that the LLM decided to give.

if response['message'].get('tool_calls') is None:
    print("Tools not used.")
    return
else:
    print("Tools used. Calling:")
    for tool in response['message']['tool_calls']:
        print(tool)
        if tool['function']['name'] == "led_write":
            led_write(led, tool['function']['arguments']['value'])

The properties defined in the TOOLS dictionary give the LLM context about the function,
such as its use case and the necessary arguments it needs to provide. Think of it like
giving an AI agent a form to fill out. The AI will first determine which form to use based
on the request (e.g. “control a light”) and then figure out how to fill in the various fields.
For example, the value parameter says that the field must be a number and it should be a
0 for “off” and 1 for “on.” The LLM uses these context clues to figure out how to craft an
appropriate response.
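For reference, when you say "turn the light on," the tool_calls entry that the code above parses looks roughly like this (an illustrative shape based on the keys used in ollama-light-assistant.py, not verbatim server output):

# Roughly what response['message']['tool_calls'] contains for "turn the light on"
[
    {
        'function': {
            'name': 'led_write',
            'arguments': {'value': 1}
        }
    }
]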

CONCLUSION: ROBOT POWERS, ACTIVATE!


This example demonstrates the possibilities of using LLMs for understanding user
intention and for processing requests to call arbitrary functions. Such technology is
extremely powerful — we can connect AI agents to the internet to make requests, and
control hardware! — but it’s still new and experimental. You will likely run into bugs, and
you can expect the code interface to change. It also demonstrates the need for better-
optimized models and more powerful hardware. A few boards such as the Jetson Orin
Nano and accelerators like the new Hailo-10H enable low-cost local LLM execution today.
I’m excited to see this tech get better!

MORE ABOUT OLLAMA:


Ollama is a lightweight framework for locally running LLMs
The Ollama API is compatible with most of the OpenAI API, which means you can use many of the same client function calls found in the OpenAI documentation (see the sketch below).
Ollama tools — support and examples
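For example, this hedged sketch (assuming you pip install openai and have already pulled the tinyllama model) points the official openai Python client at a local Ollama server; the api_key value is required by the client but ignored by Ollama:

# Sketch: talking to a local Ollama server through its OpenAI-compatible endpoint
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
completion = client.chat.completions.create(
    model="tinyllama",
    messages=[{"role": "user", "content": "Tell me a joke."}],
)
print(completion.choices[0].message.content)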
This article appeared in Make: Vol. 91. Subscribe for more maker projects and articles!

MATERIALS

Raspberry Pi 5 single-board computer, 8GB RAM or more


MicroSD card, 16GB or more
USB microphone such as Adafruit 3367
Breadboard
LED
Resistor, 220Ω to 1kΩ
Jumper wires

TOOLS

Computer to flash Raspberry Pi OS; not needed afterward


Keyboard, mouse, and monitor connected to Raspberry Pi
Raspberry Pi Imager
Light assistant code

 Tagged AI artificial intelligence companion cosplay expressive robots

Shawn Hymel
is an embedded engineer, maker, technical content creator, and instructor. He loves
finding fun uses of technology at the intersection of code and electronics, as well as
swing dancing in his free time (pandemic permitting).
