Slides 19

The document describes context-free grammars (CFGs) and how they can be used to define the structure of strings in a language. A CFG consists of nonterminal and terminal symbols, production rules that replace nonterminals with symbol strings, and a start symbol. Strings in the language of a CFG are those that can be derived from the start symbol through zero or more applications of the production rules. The document provides examples of CFGs for arithmetic expressions and explains how regular expressions can be converted to equivalent CFGs.

Uploaded by

Rupesh
Copyright
© © All Rights Reserved
Context-Free Grammars

Describing Languages

We've seen two models for the regular languages:

Finite automata accept precisely the strings in the
language.

Regular expressions describe precisely the strings
in the language.

Finite automata recognize strings in the language.

Perform a computation to determine whether a
specific string is in the language.

Regular expressions match strings in the language.

Describe the general shape of all strings in the
language.
Context-Free Grammars

A context-free grammar (or CFG) is an
entirely different formalism for defining a
class of languages.

Goal: Give a description of a language by
recursively describing the structure of
the strings in the language.

CFGs are best explained by example...
Arithmetic Expressions

Suppose we want to describe all legal arithmetic
expressions using addition, subtraction,
multiplication, and division.

Here is one possible CFG:
E → int
E → E Op E
E → (E)
Op → +
Op → -
Op → *
Op → /

A sample derivation:

E
⇒ E Op E
⇒ E Op (E)
⇒ E Op (E Op E)
⇒ E * (E Op E)
⇒ int * (E Op E)
⇒ int * (int Op E)
⇒ int * (int Op int)
⇒ int * (int + int)
Arithmetic Expressions

The same CFG also derives other expressions:

E
⇒ E Op E
⇒ E Op int
⇒ int Op int
⇒ int / int
Context-Free Grammars

Formally, a context-free grammar
is a collection of four items:

A set of nonterminal symbols
(also called variables),

A set of terminal symbols (the
alphabet of the CFG),

A set of production rules saying
how each nonterminal can be
replaced by a string of terminals
and nonterminals, and

A start symbol (which must be a
nonterminal) that begins the
derivation.

Example:
E → int
E → E Op E
E → (E)
Op → +
Op → -
Op → *
Op → /
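
The four-part definition can be written down as a small Python record. This is only a sketch: the class name `CFG`, its field names, and the constant `ARITH` are illustrative assumptions, not notation from the slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CFG:
    nonterminals: frozenset   # variables, e.g. {"E", "Op"}
    terminals: frozenset      # the alphabet of the grammar
    productions: tuple        # (head, body) pairs; body is a tuple of symbols
    start: str                # must be one of the nonterminals

    def __post_init__(self):
        if self.nonterminals & self.terminals:
            raise ValueError("a symbol cannot be both terminal and nonterminal")
        if self.start not in self.nonterminals:
            raise ValueError("the start symbol must be a nonterminal")
        symbols = self.nonterminals | self.terminals
        for head, body in self.productions:
            # Context-free: the left-hand side is a single nonterminal.
            if head not in self.nonterminals:
                raise ValueError(f"left-hand side {head!r} is not a nonterminal")
            if not all(symbol in symbols for symbol in body):
                raise ValueError(f"unknown symbol in production for {head!r}")

# The arithmetic-expression grammar from the slides.
ARITH = CFG(
    nonterminals=frozenset({"E", "Op"}),
    terminals=frozenset({"int", "(", ")", "+", "-", "*", "/"}),
    productions=(
        ("E", ("int",)),
        ("E", ("E", "Op", "E")),
        ("E", ("(", "E", ")")),
        ("Op", ("+",)), ("Op", ("-",)), ("Op", ("*",)), ("Op", ("/",)),
    ),
    start="E",
)
```

Constructing a `CFG` whose start symbol is a terminal, or whose rule has a terminal on the left, raises `ValueError`.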
Some CFG Notation

In today’s slides, capital letters in Bold Red
Uppercase will represent nonterminals.

e.g. A, B, C, D

Lowercase letters in blue monospace will represent
terminals.

e.g. t, u, v, w

Lowercase Greek letters in gray italics will
represent arbitrary strings of terminals and
nonterminals.

e.g. α, γ, ω

You don't need to use these conventions on your
own; just make sure whatever you do is readable. ☺
A Notational Shorthand

E → int
E → E Op E
E → (E)
Op → +
Op → -
Op → *
Op → /

Equivalently, using | to separate alternatives:

E → int | E Op E | (E)
Op → + | - | * | /
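
The shorthand is purely notational: each alternative stands for its own production. A minimal sketch of the expansion, assuming rules are written as text with `→` and `|` and that `|` is never itself a terminal (the helper name `expand_shorthand` is hypothetical):

```python
def expand_shorthand(rule):
    """Split 'E → int | E Op E | (E)' into one (head, body) pair per alternative."""
    head, alternatives = rule.split("→")
    return [(head.strip(), alt.strip()) for alt in alternatives.split("|")]

E_RULES = expand_shorthand("E → int | E Op E | (E)")
OP_RULES = expand_shorthand("Op → + | - | * | /")
# E_RULES  == [("E", "int"), ("E", "E Op E"), ("E", "(E)")]
# OP_RULES == [("Op", "+"), ("Op", "-"), ("Op", "*"), ("Op", "/")]
```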
Derivations
E → E Op E | int | (E)
Op → + | * | - | /

●
A sequence of steps where
nonterminals are replaced by
the right-hand side of a
production is called a
derivation.
●
If string α derives string ω, we
write α ⇒* ω.
●
In the example below, we
see E ⇒* int * (int + int).

E
⇒ E Op E
⇒ E Op (E)
⇒ E Op (E Op E)
⇒ E * (E Op E)
⇒ int * (E Op E)
⇒ int * (int Op E)
⇒ int * (int Op int)
⇒ int * (int + int)
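
A derivation can be replayed mechanically: each step picks one nonterminal in the current string and replaces it with the right-hand side of one of its productions. The sketch below assumes symbols are stored as list items; the names `GRAMMAR`, `step`, `replay`, and `STEPS` are illustrative.

```python
GRAMMAR = {
    "E": [["E", "Op", "E"], ["int"], ["(", "E", ")"]],
    "Op": [["+"], ["*"], ["-"], ["/"]],
}

def step(form, index, body):
    """Replace the nonterminal at form[index] with body, if that production exists."""
    head = form[index]
    if body not in GRAMMAR.get(head, []):
        raise ValueError(f"no production {head} → {' '.join(body)}")
    return form[:index] + body + form[index + 1:]

def replay(start, steps):
    """Apply (index, body) steps in order and return every intermediate string."""
    form = [start]
    history = [form]
    for index, body in steps:
        form = step(form, index, body)
        history.append(form)
    return history

# The nine-line derivation from the slide: eight steps after the initial E.
STEPS = [
    (0, ["E", "Op", "E"]), (2, ["(", "E", ")"]), (3, ["E", "Op", "E"]),
    (1, ["*"]), (0, ["int"]), (3, ["int"]), (5, ["int"]), (4, ["+"]),
]
HISTORY = replay("E", STEPS)
print(" ".join(HISTORY[-1]))  # int * ( int + int )
```

Attempting to rewrite a terminal, or to use a right-hand side the grammar does not contain, raises `ValueError`.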
The Language of a Grammar

If G is a CFG with alphabet Σ and start
symbol S, then the language of G is the
set
ℒ(G) = { ω ∈ Σ* | S ⇒* ω }

That is, ℒ(G) is the set of strings of
terminals derivable from the start
symbol.
If G is a CFG with alphabet Σ and start symbol S,
then the language of G is the set
ℒ(G) = { ω ∈ Σ* | S ⇒* ω }

Consider the following CFG G over Σ = {a, b, c, d}:
S → Sa | dT
T → bTb | c
How many of the following strings are in ℒ(G)?
dca
cad
bcb
dTaa
Answer at PollEv.com/cs103 or
text CS103 to 22333 once to join, then a number.
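
One way to check an answer to a question like this is to search over derivations directly. The sketch below does a breadth-first search over strings of terminals and nonterminals for the poll grammar S → Sa | dT, T → bTb | c. It assumes every symbol is a single character and that the grammar has no ε-productions, so a string never gets shorter as it is rewritten and anything longer than `max_len` can be discarded. The function name `derivable_strings` is hypothetical.

```python
from collections import deque

def derivable_strings(grammar, start, max_len):
    """All strings of terminals of length <= max_len derivable from start.

    Assumes no ε-productions, so pruning long intermediate strings is safe.
    """
    seen = {start}
    queue = deque([start])
    result = set()
    while queue:
        form = queue.popleft()
        # Always expand the leftmost nonterminal; this loses no strings.
        index = next((i for i, s in enumerate(form) if s in grammar), None)
        if index is None:
            result.add(form)          # only terminals left
            continue
        for body in grammar[form[index]]:
            new_form = form[:index] + body + form[index + 1:]
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return result

POLL_GRAMMAR = {"S": ["Sa", "dT"], "T": ["bTb", "c"]}
SHORT = derivable_strings(POLL_GRAMMAR, "S", 4)
print(sorted(SHORT))                                   # ['dbcb', 'dc', 'dca', 'dcaa']
print([w in SHORT for w in ["dca", "cad", "bcb", "dTaa"]])
```

Note that dTaa still contains the nonterminal T, so it is not a string over Σ and cannot be in ℒ(G) regardless of what the search finds.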
Context-Free Languages

A language L is called a context-free
language (or CFL) if there is a CFG G
such that L = ℒ(G).

Questions:

What languages are context-free?

How are context-free and regular languages
related?
From Regexes to CFGs

CFGs consist purely of production rules
of the form A → ω. They do not have the
regular expression operators * or ∪.

However, we can convert regular
expressions to CFGs as follows:

S → a*b

Introduce a nonterminal A for a*:

S → Ab
A → Aa | ε
From Regexes to CFGs

S → a(b ∪ c*)

Introduce a nonterminal X for b ∪ c*:

S → aX
X → b | c*

Introduce a nonterminal C for c*:

S → aX
X → b | C
C → Cc | ε
Regular Languages and CFLs

Theorem: Every regular language is
context-free.

Proof Idea: Use the construction from
the previous slides to convert a regular
expression for L into a CFG for L. ■
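
The construction can be sketched as a recursive function on the structure of the regular expression. The version below is an illustration under stated assumptions: regexes are given as nested tuples (`("sym", "a")`, `("eps",)`, `("cat", r1, r2)`, `("alt", r1, r2)`, `("star", r)`), every subexpression gets its own fresh nonterminal named `N0`, `N1`, ..., and no terminal uses such a name. The slides inline terminals directly (S → Ab), so the output here is equivalent but more verbose.

```python
from itertools import count

def regex_to_cfg(regex):
    """Return (start_nonterminal, productions) for a regex given as nested tuples."""
    fresh = count()
    productions = []              # (head, body) pairs; body is a list of symbols

    def build(node):
        head = f"N{next(fresh)}"
        kind = node[0]
        if kind == "sym":         # a single character:  N → a
            productions.append((head, [node[1]]))
        elif kind == "eps":       # the empty string:    N → ε
            productions.append((head, []))
        elif kind == "cat":       # R1 R2:               N → N1 N2
            productions.append((head, [build(node[1]), build(node[2])]))
        elif kind == "alt":       # R1 ∪ R2:             N → N1 | N2
            productions.append((head, [build(node[1])]))
            productions.append((head, [build(node[2])]))
        elif kind == "star":      # R*:                  N → N N1 | ε
            productions.append((head, [head, build(node[1])]))
            productions.append((head, []))
        else:
            raise ValueError(f"unknown regex node {kind!r}")
        return head

    return build(regex), productions

# a*b, the first example from the slides.
A_STAR_B = ("cat", ("star", ("sym", "a")), ("sym", "b"))
START, PRODUCTIONS = regex_to_cfg(A_STAR_B)
for head, body in PRODUCTIONS:
    print(head, "→", " ".join(body) or "ε")
```

Here `N1 → N1 N2 | ε` plays the role of A → Aa | ε, with `N2 → a` standing in for the terminal.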

Problem Set 8 Exercise: Instead, show
how to convert a DFA/NFA into a CFG.
The Language of a Grammar

Consider the following CFG G:
S → aSb | ε

What strings can this generate?

S
⇒ aSb
⇒ aaSbb
⇒ aaaSbbb
⇒ aaaaSbbbb
⇒ aaaabbbb

ℒ(G) = { aⁿbⁿ | n ∈ ℕ }
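
The recursive shape of S → aSb | ε translates directly into a recursive membership test. This is a minimal sketch (the function name `derives` is hypothetical); it peels one a off the front and one b off the back per call, mirroring one use of S → aSb.

```python
def derives(w):
    """True if S ⇒* w for the grammar S → aSb | ε."""
    if w == "":
        return True                     # S → ε
    if len(w) >= 2 and w[0] == "a" and w[-1] == "b":
        return derives(w[1:-1])         # S → aSb
    return False

print(derives("aaaabbbb"))  # True
print(derives("aab"))       # False
```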
[Diagram: the regular languages are contained in the CFLs, which are contained in the set of all languages.]
Why the Extra Power?

Why do CFGs have more power than
regular expressions?

Intuition: Derivations of strings have
unbounded “memory.”
S → aSb | ε

S
⇒ aSb
⇒ aaSbb
⇒ aaaSbbb
⇒ aaaaSbbbb
⇒ aaaabbbb
Time-Out for Announcements!
Midterm Exam Logistics

The next midterm is tonight from 7:00PM – 10:00PM.
Locations are divvied up by last (family) name:

A-I: Go to Cubberley Auditorium.

J-Z: Go to Cemex Auditorium.

The exam focuses on Lecture 06 – 13 (binary relations
through induction) and PS3 – PS5. Finite automata
onward is not tested.

Topics from earlier in the quarter (proofwriting, first-order
logic, set theory, etc.) are also fair game, but that’s primarily
because the later material builds on this earlier material.

The exam is closed-book, closed-computer, and limited-
note. You can bring a double-sided, 8.5” × 11” sheet of
notes with you to the exam, decorated however you’d
like.
Our Advice

Eat dinner tonight. You are not a brain in a jar. You
are a rich, complex, beautiful biological system.
Please take care of yourself.

Read all the questions before diving into them.
Tunnel vision can hurt you on an exam. There’s
evidence that spreading your time out leads to better
outcomes.

Reflect on how far you’ve come. How many of
these questions would you have been able to
understand two months ago? That’s the mark that
you’re learning something!
Three Questions

What is something you know now that, at
the start of the quarter, you knew you didn’t
know?

What is something you know now that, at
the start of the quarter, you didn’t know
that you didn’t know?

What is something you don’t know that, at
the start of the quarter, you didn’t know
that you didn’t know?
Back to CS103!
Designing CFGs

Like designing DFAs, NFAs, and regular
expressions, designing CFGs is a craft.

When thinking about CFGs:

Think recursively: Build up bigger structures
from smaller ones.

Have a construction plan: Know in what
order you will build up the string.

Store information in nonterminals: Have
each nonterminal correspond to some useful
piece of information.
Designing CFGs

Let Σ = {a, b} and let L = {w ∈ Σ* | w is
a palindrome }

We can design a CFG for L by thinking
inductively:

Base case: ε, a, and b are palindromes.

If ω is a palindrome, then aωa and bωb are
palindromes.

No other strings are palindromes.

S → ε | a | b | aSa | bSb
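
The inductive definition and the grammar line up rule for rule, which can be checked by brute force. In the sketch below (function names are illustrative), `palindromes(depth)` collects every string derivable using the recursive rules at most `depth` times, and the result is compared against all palindromes of length at most 2·depth + 1.

```python
from itertools import product

def palindromes(depth):
    """Strings derivable from S using aSa / bSb at most `depth` times."""
    base = {"", "a", "b"}                                    # S → ε | a | b
    if depth == 0:
        return base
    inner = palindromes(depth - 1)
    return base | {c + w + c for w in inner for c in "ab"}   # S → aSa | bSb

def all_strings(max_len):
    """Every string over {a, b} of length at most max_len."""
    return {"".join(t) for n in range(max_len + 1) for t in product("ab", repeat=n)}

EXPECTED = {w for w in all_strings(5) if w == w[::-1]}
print(palindromes(2) == EXPECTED)  # True
```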
Designing CFGs

Let Σ = {(, )} and let L = {w ∈ Σ* | w is a
string of balanced parentheses }

Some sample strings in L:
((()))
(())()
(()())(()())
((((()))(())))
ε
()()
Designing CFGs

Let Σ = {(, )} and let L = {w ∈ Σ* | w is a
string of balanced parentheses }

Let's think about this recursively.

Base case: the empty string is a string of
balanced parentheses.

Recursive step: Look at the closing parenthesis
that matches the first open parenthesis.

((()(()))(()))(())((()))

Removing the first parenthesis and the
matching parenthesis forms two new strings of
balanced parentheses.

(()(()))(())    (())((()))

S → (S)S | ε
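
Because the choice between the two productions is determined by the next character, S → (S)S | ε can be turned into a small recursive-descent recognizer. The sketch below is one way to do it; the names `parse_s` and `is_balanced` are hypothetical.

```python
def parse_s(w, i=0):
    """Consume a prefix of w[i:] derived from S and return the index just past it."""
    if i < len(w) and w[i] == "(":          # S → (S)S
        j = parse_s(w, i + 1)               # the S inside the first pair
        if j < len(w) and w[j] == ")":
            return parse_s(w, j + 1)        # the S after the matching ')'
        raise ValueError(f"expected ')' at position {j}")
    return i                                # S → ε

def is_balanced(w):
    """True if the whole string is derivable from S."""
    try:
        return parse_s(w) == len(w)
    except ValueError:
        return False

print(is_balanced("((()(()))(()))(())((()))"))  # True
print(is_balanced("(()"))                       # False
```

Each call handles exactly one matched pair, so very deeply nested input would hit Python's recursion limit.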
Designing CFGs

Let Σ = {a, b} and let L = {w ∈ Σ* | w
has the same number of a's and b's }
How many of the following CFGs have language L?

S → aSb | bSa | ε
S → abS | baS | ε
S → abSba | baSab | ε
S → SbaS | SabS | ε

Answer at PollEv.com/cs103 or
text CS103 to 22333 once to join, then a number.
Designing CFGs: A Caveat

When designing a CFG for a language,
make sure that it

generates all the strings in the language and

never generates a string outside the
language.

The first of these can be tricky – make
sure to test your grammars!

You'll design your own CFG for this
language on Problem Set 8.
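
Testing a grammar can be automated for short strings. The sketch below is one possible harness, not a prescribed method: `language_up_to` computes every string of length at most `max_len` that each nonterminal derives by iterating to a fixed point, and `mismatches` compares that set with a direct description of the target language. It assumes single-character symbols, grammars written as a dict from nonterminal to a list of right-hand sides, and `""` for ε; all names are hypothetical.

```python
from itertools import product

def language_up_to(grammar, start, max_len):
    """Strings of length <= max_len derivable from start (least fixed point)."""
    lang = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                partials = {""}
                for sym in body:
                    # A nonterminal contributes what it is known to derive so far.
                    choices = lang[sym] if sym in grammar else {sym}
                    partials = {p + c for p in partials for c in choices
                                if len(p) + len(c) <= max_len}
                if not partials <= lang[head]:
                    lang[head] |= partials
                    changed = True
    return lang[start]

def mismatches(grammar, start, in_language, alphabet, max_len):
    """Return (strings the grammar misses, strings it wrongly generates)."""
    generated = language_up_to(grammar, start, max_len)
    candidates = ("".join(t) for n in range(max_len + 1)
                  for t in product(alphabet, repeat=n))
    expected = {w for w in candidates if in_language(w)}
    return sorted(expected - generated), sorted(generated - expected)

def same_counts(w):
    return w.count("a") == w.count("b")

CANDIDATE = {"S": ["aSb", "bSa", ""]}      # first grammar from the poll
MISSING, EXTRA = mismatches(CANDIDATE, "S", same_counts, "ab", 4)
print(MISSING, EXTRA)  # ['abba', 'baab'] []
```

An empty pair of lists only shows agreement up to `max_len`; it is evidence, not a proof, that the grammar is right.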
CFG Caveats II

Is the following grammar a CFG for the
language { aⁿbⁿ | n ∈ ℕ }?
S → aSb

What strings in {a, b}* can you derive?

Answer: None!

What is the language of the grammar?

Answer: Ø

When designing CFGs, make sure your
recursion actually terminates!
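
Whether a nonterminal can ever produce a string of terminals is easy to check automatically. The sketch below (the function name `generating_nonterminals` is hypothetical) repeatedly marks a nonterminal as generating once some right-hand side consists only of terminals and already-marked nonterminals. It assumes single-character symbols and `""` for ε.

```python
def generating_nonterminals(grammar):
    """Nonterminals that derive at least one string of terminals."""
    generating = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in generating:
                continue
            # A body works if every symbol is a terminal or already generating.
            if any(all(s not in grammar or s in generating for s in body)
                   for body in bodies):
                generating.add(head)
                changed = True
    return generating

print(generating_nonterminals({"S": ["aSb"]}))       # set(): the language is Ø
print(generating_nonterminals({"S": ["aSb", ""]}))   # {'S'}
```

If the start symbol is not in the returned set, the language of the grammar is empty.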
Designing CFGs

When designing CFGs, remember that each
nonterminal can be expanded out
independently of the others.

Let Σ = {a, ≟} and let L = {aⁿ≟aⁿ | n ∈ ℕ }.

Is the following a CFG for L?
S → X≟X
X → aX | ε

No: the two X's expand independently, so the
grammar also derives strings outside L.

S
⇒ X≟X
⇒ aX≟X
⇒ aaX≟X
⇒ aa≟X
⇒ aa≟aX
⇒ aa≟a
Finding a Build Order

Let Σ = {a, ≟} and let L = {aⁿ≟aⁿ | n ∈ ℕ }.

To build a CFG for L, we need to be more clever with how we
construct the string.

If we build the strings of a's independently of one another,
then we can't enforce that they have the same length.

Idea: Build both strings of a's at the same time.

Here's one possible grammar based on that idea:
S → ≟ | aSa
S
⇒ aSa
⇒ aaSaa
⇒ aaaSaaa
⇒ aaa≟aaa
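
The difference between the two build orders shows up clearly when the strings are enumerated. In the sketch below (all function names are illustrative), `independent` models S → X≟X with X → aX | ε by choosing the two runs of a's separately, while `simultaneous` models S → ≟ | aSa by adding one a to each side per step.

```python
from itertools import product

def x_strings(max_a):
    """Strings derivable from X → aX | ε with at most max_a a's."""
    return ["a" * i for i in range(max_a + 1)]

def independent(max_a):
    """S → X≟X: the left and right X are expanded independently."""
    return {left + "≟" + right
            for left, right in product(x_strings(max_a), repeat=2)}

def simultaneous(max_n):
    """S → ≟ | aSa: each step adds one a on both sides."""
    def derive(n):
        return "≟" if n == 0 else "a" + derive(n - 1) + "a"
    return {derive(n) for n in range(max_n + 1)}

print("aa≟a" in independent(2))    # True: not in L, yet derivable
print("aa≟a" in simultaneous(2))   # False
print(sorted(simultaneous(2)))     # ['aa≟aa', 'a≟a', '≟']
```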
Function Prototypes

Let Σ = {void, int, double, name, (, ), ,, ;}.

Let's write a CFG for C-style function
prototypes!

Examples:
● void name(int name, double name);
● int name();
● int name(double name);
● int name(int, int name, int);
● void name(void);
Function Prototypes

Here's one possible grammar:

S → Ret name (Args);

Ret → Type | void

Type → int | double

Args → ε | void | ArgList

ArgList → OneArg | ArgList, OneArg

OneArg → Type | Type name

Fun question to think about: what changes
would you need to make to support pointer
types?
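
The prototype grammar can be checked by hand with a small recognizer. The sketch below is an illustration under two assumptions: the input has already been split into tokens from Σ (here by whitespace, purely for convenience), and the left-recursive rule ArgList → ArgList, OneArg is handled as a loop over comma-separated arguments, which accepts the same strings. The function name `is_prototype` is hypothetical.

```python
TYPES = {"int", "double"}                     # Type → int | double

def is_prototype(tokens):
    """Recognize S → Ret name ( Args ) ; over a list of tokens."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(token):
        nonlocal pos
        if peek() != token:
            raise ValueError(f"expected {token!r} at token {pos}")
        pos += 1

    def one_arg():                            # OneArg → Type | Type name
        nonlocal pos
        if peek() not in TYPES:
            raise ValueError(f"expected a type at token {pos}")
        pos += 1
        if peek() == "name":
            pos += 1

    try:
        if peek() not in TYPES and peek() != "void":   # Ret → Type | void
            return False
        pos += 1
        expect("name")
        expect("(")
        if peek() == "void":                  # Args → void
            pos += 1
        elif peek() in TYPES:                 # Args → ArgList
            one_arg()
            while peek() == ",":              # ArgList → ArgList , OneArg
                pos += 1
                one_arg()
        # Otherwise Args → ε and nothing is consumed.
        expect(")")
        expect(";")
    except ValueError:
        return False
    return pos == len(tokens)

print(is_prototype("int name ( int , int name , int ) ;".split()))  # True
print(is_prototype("int name ( void name ) ;".split()))             # False
```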
Summary of CFG Design Tips

Look for recursive structures where they exist:
they can help guide you toward a solution.

Keep the build order in mind – often, you'll
build two totally different parts of the string
concurrently.

Usually, those parts are built in opposite directions:
one's built left-to-right, the other right-to-left.

Use different nonterminals to represent
different structures.
Applications of Context-Free Grammars
CFGs for Programming Languages
BLOCK → STMT
| { STMTS }

STMTS → ε
| STMT STMTS

STMT → EXPR;
| if (EXPR) BLOCK
| while (EXPR) BLOCK
| do BLOCK while (EXPR);
| BLOCK
| …

EXPR → identifier
| constant
| EXPR + EXPR
| EXPR – EXPR
| EXPR * EXPR
| ...
Grammars in Compilers

One of the key steps in a compiler is figuring out what a
program “means.”

This is usually done by defining a grammar showing the
high-level structure of a programming language.

There are certain classes of grammars (LL(1) grammars,
LR(1) grammars, LALR(1) grammars, etc.) for which it's
easy to figure out how a particular string was derived.

Tools like yacc or bison automatically generate parsers
from these grammars.

Curious to learn more? Take CS143!
Natural Language Processing

By building context-free grammars for actual
languages and applying statistical inference, it's
possible for a computer to recover the likely meaning
of a sentence.

In fact, CFGs were first called phrase-structure
grammars and were introduced by Noam Chomsky in his
seminal work Syntactic Structures.

They were then adapted for use in the context of
programming languages, where they were called Backus-
Naur forms.

Stanford's CoreNLP project is one place to look for an
example of this.

Want to learn more? Take CS124 or CS224N!
Biography Minute:
Noam Chomsky

Invented CFGs!

Helped found fields of linguistics
and cognitive science
PC: Hans Peters / Anefo (via Wikimedia)


Today, perhaps more well known for political
writing than linguistics

Made it onto President Nixon’s “Enemies List”

Anti-capitalism, anti-imperialism, anti-war

Drawing on linguistics expertise, written extensively
on state propaganda (Manufacturing Consent)
Next Time

Turing Machines

What does a computer with unbounded
memory look like?

How would you program it?
