Chapter 1 - Machine Learning Fundamentals
Instructor: Melaku M.
FML by Melaku M
Quotes
❖“If you were a current computer science student, what area
would you start studying heavily?”
• Answer: Machine Learning.
–Bill Gates, Reddit AMA
❖“Machine learning is today’s discontinuity”
–Jerry Yang, Co-founder, Yahoo
❖“AI is the new electricity! Electricity transformed countless
industries; AI will now do the same.” –Andrew Ng
What is Machine Learning?
Potential Definitions for Machine Learning
❖Machine learning (ML) is a field of study that focuses on creating systems that
can learn from data without being explicitly programmed (Arthur Samuel, 1959).
➢Improve performance over time: As they are exposed to more data, ML models refine
their understanding and become more accurate in their predictions or decisions.
➢Make predictions or decisions: Based on the patterns they've learned, ML algorithms can
predict future outcomes or make informed decisions.
Potential Definitions for Machine Learning
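Definition by Tom Mitchell (1998):
❖“A computer program is said to learn from experience E with respect to some class of
tasks T and performance measure P, if its performance at tasks in T, as measured by P,
improves with experience E.”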
Defining the Learning Task
• Improve on task T, with respect to performance metric P, based on experience E
T: Recognizing and classifying hand-written words within images
P: Percentage of words correctly classified
E: Dataset of human-labeled images of handwritten words
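As a rough illustration of the performance metric P, the sketch below computes the percentage of correctly classified words. The function name `percent_correct` and the four example labels are hypothetical, chosen only for this example; the human-provided labels stand in for the experience E.

```python
def percent_correct(predicted, actual):
    """Performance metric P: percentage of words classified correctly."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one example")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * hits / len(actual)


# Hypothetical data: human labels (from E) and a classifier's output on task T.
actual = ["cat", "dog", "tree", "house"]
predicted = ["cat", "dog", "free", "house"]

score = percent_correct(predicted, actual)  # 3 of 4 words match
print(f"P = {score:.1f}% of words correctly classified")
```

A learner is improving on T if this number goes up as it is given more labeled examples.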
When We Need Machine Learning
Artificial intelligence (AI): the broader concept
Deep learning (DL): automatically extracts features from raw data, reducing the need for manual feature engineering.
State-of-the-Art Applications of Machine Learning
Generative AI
• ChatGPT: This LLM is built on the GPT architecture and
generates text that resembles something a human
would produce. It's a helpful companion for research,
strategy, and content creation.
• DALL-E 2: This model generates images from text prompts,
so creatives can create vibrant illustrations and concept art
that’s a useful accompaniment to content marketing.
• GitHub Copilot: This collaboration between GitHub and
OpenAI acts as a coding companion to help developers code
faster and more intuitively.
• Gemini: A large language model chatbot, also known as a
conversational AI.
Figure: generative AI platforms
Autonomous Cars
Path Planning
Adaptive Vision
Deep Learning in the Headlines
Deep Networks Learn Layered Representations
Figure: 1980s-era neural network vs. deep neural network
Image: https://www.pnas.org/content/116/4/1074
Object Recognition
Image Translation: Sketch to Photo
Image Synthesis – Image Inpainting
Image inpainting is essentially the art of filling in missing or damaged parts of an image.
E.g., repair scratches and cracks, take out unwanted objects, or fill in areas for artistic purposes.
NLP: Named Entity Recognition
NLP: Text Generation
NLP: Text Translation/Machine Translation
Automatic Speech Recognition
Figure: A typical speech recognition system
Table header: number of hidden layers (1, 2, 4, 8, 10, 12)
Machine learning is currently the preferred approach in
the following domains:
Machine Learning Pipeline
Supervised Learning
❖The model is trained on labeled data, meaning that the input
data is paired with the correct output.
– If the output y is categorical, the task is classification
– If the output y is real-valued, the task is regression
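The difference between the two output types can be sketched with two tiny learners written from scratch. Both are illustrative stand-ins, not methods named on the slides: a least-squares line for regression and a nearest-centroid rule for classification. All data values and names (`fit_line`, `fit_centroids`, `classify`) are made up for this example.

```python
def fit_line(xs, ys):
    """Regression: fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def fit_centroids(xs, labels):
    """Classification: remember the mean input value of each class."""
    groups = {}
    for x, label in zip(xs, labels):
        groups.setdefault(label, []).append(x)
    return {label: sum(v) / len(v) for label, v in groups.items()}


def classify(centroids, x):
    """Predict the class whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


# y is real-valued -> regression
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
predicted_value = slope * 5.0 + intercept

# y is categorical -> classification
centroids = fit_centroids([1.0, 2.0, 8.0, 9.0], ["small", "small", "large", "large"])
predicted_label = classify(centroids, 3.0)

print(predicted_value, predicted_label)
```

In both cases the model is fitted on (input, correct output) pairs; only the type of the output changes.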
Supervised Learning: Spam Detection
• This is a binary classification task:
Assign a label (i.e. spam/not-spam) to the
input (an email message)
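One possible way to learn such a spam/not-spam labeler is a Naive Bayes classifier over word counts. The slides do not prescribe this algorithm; it is used here only as a small, self-contained sketch. The four training messages and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(messages):
    """messages: list of (text, label) pairs; returns word and label counts."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return {"word_counts": word_counts, "label_counts": label_counts, "vocab": vocab}


def predict(model, text):
    """Return the label with the highest log-probability for the message."""
    total = sum(model["label_counts"].values())
    scores = {}
    for label, n in model["label_counts"].items():
        counts = model["word_counts"][label]
        denom = sum(counts.values()) + len(model["vocab"])  # add-one smoothing
        score = math.log(n / total)
        for word in text.lower().split():
            if word in model["vocab"]:  # words never seen in training are skipped
                score += math.log((counts[word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical labeled training data.
training = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting schedule for monday", "not-spam"),
    ("lunch with the project team", "not-spam"),
]
model = train_naive_bayes(training)

print(predict(model, "claim your free money"))
print(predict(model, "project meeting on monday"))
```

With realistic amounts of email the same structure applies, but tokenization and feature choice matter much more than in this toy setup.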
Supervised Learning: Document Classification
• This is a multi-class classification task:
Assign a label (e.g. Politics, Sports, Finance, Arts) to the input (a document)
Figure: labeled data → machine learning algorithm → classifier model; new document → classifier model → predicted label
In this class, we study algorithms and techniques to learn such models from data
Supervised Learning: Document Classification
• This idea generalizes to many types of data and applications
Data              Labels
Documents         Politics, Sports, Finance
Sentences         Positive, Negative
Phrases           Person, Location
Images            Cat, Dog, Snake, Horse
Medical records   Re-admit soon, Not
...
Supervised Learning: Digit Recognition
What is a ‘2’? What is a ‘4’?
Unsupervised Learning
❖ The model is trained on an unlabeled dataset and aims to discover hidden
patterns, structures, or relationships within the data. Given x1, x2, ..., xn (without
labels)
– E.g., clustering
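Clustering can be sketched with k-means, one common clustering algorithm (the slides mention clustering in general, not this specific method). The version below is a minimal sketch that assumes a deterministic start from the first k points; the six 2-D points are made up.

```python
def kmeans(points, k, max_iter=100):
    """Group points into k clusters; returns (labels, centroids)."""
    centroids = [points[i] for i in range(k)]  # assumption: start from the first k points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical unlabeled data: two visually obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

No labels are supplied at any point; the grouping comes only from distances between the inputs.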
Unsupervised Learning
• Genomics application: group individuals by genetic similarity (figure axes: genes vs. individuals)
• Social network analysis
• Finding image similarity
Image credit: statsoft.com Audio from http://www.ism.ac.jp/~shiro/research/blindsep.html
Unsupervised Learning
• Independent component analysis – separate a
combined signal into its original sources
Semi-supervised Learning
❖Definition: Combines a small amount of labeled data with large amounts
of unlabeled data.
❖ Common Applications:
✓ Medical imaging (e.g., labeling diseases in X-rays with limited labeled data).
✓ Speech recognition (e.g., learning from a few transcribed audio clips).
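One simple way to combine a few labels with many unlabeled points is self-training: fit on the labeled data, pseudo-label the unlabeled point the model is most sure about, and repeat. This is only one of several semi-supervised strategies and is not named on the slides. The sketch assumes 1-D inputs and a nearest-centroid classifier; all values and names are hypothetical.

```python
def centroids_of(labeled):
    """Mean input value per class, computed from (x, label) pairs."""
    groups = {}
    for x, label in labeled:
        groups.setdefault(label, []).append(x)
    return {label: sum(v) / len(v) for label, v in groups.items()}


def self_train(labeled, unlabeled):
    """Pseudo-label unlabeled points one at a time, most confident first."""
    labeled = list(labeled)
    remaining = list(unlabeled)
    while remaining:
        cents = centroids_of(labeled)

        def distance_to_nearest(x):
            return min(abs(x - c) for c in cents.values())

        # "Most confident" here means closest to any class centroid.
        x = min(remaining, key=distance_to_nearest)
        label = min(cents, key=lambda name: abs(x - cents[name]))
        labeled.append((x, label))
        remaining.remove(x)
    return labeled


# Two labeled points, six unlabeled ones.
result = dict(self_train([(1.0, "a"), (9.0, "b")], [2.0, 3.0, 7.0, 8.0, 4.0, 6.0]))
print(result)
```

Self-training can also reinforce its own early mistakes, which is why the most confident points are labeled first.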
Reinforcement Learning
❖ Reinforcement learning (RL) involves learning through interaction with an
environment. An agent learns to take actions in an environment to maximize a
reward signal.
❖ The agent learns through trial and error, receiving rewards or penalties for its
actions. The goal is to maximize the cumulative reward.
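The trial-and-error loop can be sketched with tabular Q-learning, one standard RL algorithm (the slides do not single out a specific method). The environment is a made-up five-cell corridor in which the agent starts at cell 0 and receives a reward of 1 on reaching cell 4. The hyperparameters (`alpha`, `gamma`, `epsilon`) are illustrative assumptions.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)  # action index 0 = move left, 1 = move right


def step(state, action_index):
    """Environment: apply the move and return (next_state, reward)."""
    next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values Q[state][action] by trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                best = max(q[state])       # exploit, breaking ties at random
                action = rng.choice([a for a in range(2) if q[state][a] == best])
            next_state, reward = step(state, action)
            # Move the estimate toward reward + discounted best future value.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
policy = [row.index(max(row)) for row in q_table[:-1]]  # best action per non-goal cell
print(policy)
```

The discount factor `gamma` is what turns "maximize the cumulative reward" into a preference for reaching the goal sooner.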
The Agent-Environment Interface
Main Challenges of Machine Learning
•In short, since your main task is to select a learning algorithm and
train it on some data, the two things that can go wrong are 1) "bad
data" and 2) "bad algorithm".
1. Dataset (training data)
Main Challenges of Machine Learning - Dataset
3) Poor-Quality Data:
•If your training data is full of errors, outliers, and noise (e.g., due
to poor quality measurements), it will make it harder for the system
to detect the underlying patterns, so your system is less likely to
perform well. It is often well worth the effort to spend time cleaning
up your training data. The truth is, most data scientists spend a
significant part of their time doing just that.
Main Challenges of Machine Learning - Dataset
4) Irrelevant Features:
• ..,
Main Challenges of Machine Learning - Algorithm
1) Overfitting the Training Data:
❖ Overfitting happens when a model learns the detail and noise in the training
data to the extent that it negatively impacts the performance of the model
on new data.
❖ The model performs well on the training data, but it does not generalize
well.
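The gap between training and test performance can be seen in a small hand-made example. The sketch below assumes 1-D inputs where the true rule is "label 1 if x > 0.5", with two deliberately mislabeled training points as noise. A 1-nearest-neighbor model memorizes the noise, a single learned threshold does not, and a constant predictor is included as a too-simple baseline. All data and function names are hypothetical.

```python
def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)


def fit_one_nn(train):
    """Memorizes the training set: predict the label of the closest training point."""
    return lambda x: min(train, key=lambda pair: abs(pair[0] - x))[1]


def fit_constant(train):
    """Too simple: always predict the most common training label."""
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority


def fit_stump(train):
    """Learn one threshold t (predict 1 if x > t) with the fewest training errors."""
    xs = sorted(x for x, _ in train)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    best = min(candidates, key=lambda t: sum((x > t) != bool(y) for x, y in train))
    return lambda x: int(x > best)


# Training data with two noisy labels (x = 0.3 and x = 0.7).
train = [(0.05, 0), (0.1, 0), (0.2, 0), (0.3, 1), (0.4, 0), (0.45, 0),
         (0.55, 1), (0.6, 1), (0.7, 0), (0.8, 1), (0.9, 1), (0.95, 1)]
# Clean held-out data.
test = [(0.15, 0), (0.28, 0), (0.33, 0), (0.42, 0),
        (0.58, 1), (0.68, 1), (0.72, 1), (0.85, 1)]

one_nn, stump, constant = fit_one_nn(train), fit_stump(train), fit_constant(train)
for name, model in [("1-NN", one_nn), ("stump", stump), ("constant", constant)]:
    print(name, accuracy(model, train), accuracy(model, test))
```

The 1-NN model scores perfectly on the training set but drops sharply on the test set, which is the signature of overfitting; the threshold model gives up a little training accuracy and generalizes better.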
Main Challenges of Machine Learning - Algorithm
2) Underfitting the Training Data:
✓ Underfitting is the opposite of overfitting: it occurs when your
model is too simple to learn the underlying structure of the data.
❖ Computational Resources: Training complex models can require significant time and resources.
❖Bias and Fairness: Models can inherit biases present in the data.
ML in Practice
• Understand domain, prior knowledge, and goals
• Data integration, selection, cleaning, pre-processing, etc.
• Learn models
• Interpret results
• Consolidate and deploy discovered knowledge
• Loop: repeat the steps above
Designing a Learning System
• Choose the training experience and what is to be
learned, i.e. the target function
• Choose a learning algorithm to infer the target
function from the experience