
CHAPTER 3

DESCRIBING
SYNTAX AND
SEMANTICS
Presented By : Group 2
OVERVIEW
3.1 Introduction

3.2 The General Problem of Describing Syntax

3.3 Formal Methods of Describing Syntax

3.4 Attribute Grammars

3.5 Describing the Meanings of Programs: Dynamic Semantics


3.1 INTRODUCTION
The study of programming languages, like the study of natural
languages, can be divided into examinations of syntax and
semantics. The syntax of a programming language is the form
of its expressions, statements, and program units. Its
semantics is the meaning of those expressions, statements,
and program units.
SYNTAX: The form or structure of the expression.
Example: while (boolean_expr) statement

SEMANTICS: The meaning of the expression.
Example: the semantics of this statement form is that when the current value of the Boolean expression is true, the embedded statement is executed, and control then implicitly returns to the Boolean expression to repeat the process; otherwise, control continues after the while construct.
REVIEW QUESTION NO. 1
In your own words, define syntax and semantics.
3.2 THE GENERAL
PROBLEM OF DESCRIBING
SYNTAX
LANGUAGE AS A SET OF
STRINGS
A language, regardless of whether it's natural (such as English)
or artificial (such as Java), can be thought of as a set of strings
made up of characters from some defined alphabet. These
strings are referred to as sentences or statements.
SYNTAX RULES
Every language has syntax rules that determine which strings
of characters from its alphabet are considered valid in the
language. For instance, English has a complex set of rules
governing its sentence structure. Similarly, programming
languages have syntax rules dictating the structure and
arrangement of code elements.
COMPLEXITY COMPARISON

NATURAL LANGUAGE: Natural languages like English have intricate and extensive sets of rules.

ARTIFICIAL LANGUAGE (PROGRAMMING LANGUAGE): Programming languages tend to be simpler in their syntax rules, even the most complex languages.
LEXEMES AND TOKENS
Lexemes (/lek-sim(s)/) are the lowest-level syntactic units of
a language. They encompass elements like numeric literals,
operators, special words, etc. These units are often not
explicitly described in formal syntax descriptions of
programming languages for the sake of simplicity. Instead,
lexemes are typically defined separately through lexical
specifications.
Lowest-level syntactic units: refers to the smallest meaningful elements or
components of a language's syntax. In other words, they are the basic building
blocks from which larger structures within the language are constructed.
LEXEMES AND TOKENS
Each group of lexemes is represented by a name, which is
termed a Token. Therefore, a token is essentially a category or
classification of lexemes within a language. For instance, in the
context of identifiers, the token represents the category of
lexemes that serve as names for variables, methods, or classes.
MEANING COMPARISON

LEXEMES: The sequence of characters that is the lowest-level syntactic unit in the programming language.

TOKENS: The syntactic category or classification of lexemes within a language.
EXAMPLE: index = 2 * count + 17;

LEXEMES    TOKENS
index      identifier
=          equal_sign
2          int_literal
*          mult_op
count      identifier
+          plus_op
17         int_literal
;          semicolon
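As a concrete illustration of how such groupings are recognized, here is a minimal lexer sketch in Python for exactly this statement. It is a hypothetical fragment, not the scanner of any particular compiler; the token names follow the table above, and the pattern set covers only the lexemes that appear here.

import re

# One pattern per token class, tried in order; names follow the table above.
TOKEN_PATTERNS = [
    ("int_literal", r"\d+"),
    ("identifier",  r"[A-Za-z_]\w*"),
    ("equal_sign",  r"="),
    ("mult_op",     r"\*"),
    ("plus_op",     r"\+"),
    ("semicolon",   r";"),
]

def lex(source):
    """Yield (token, lexeme) pairs for the input string."""
    pos = 0
    while pos < len(source):
        if source[pos].isspace():          # whitespace separates lexemes
            pos += 1
            continue
        for token, pattern in TOKEN_PATTERNS:
            match = re.match(pattern, source[pos:])
            if match:
                yield (token, match.group())
                pos += len(match.group())
                break
        else:
            raise SyntaxError("illegal character: " + source[pos])

print(list(lex("index = 2 * count + 17;")))
# [('identifier', 'index'), ('equal_sign', '='), ('int_literal', '2'), ...]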
FLOW OF LEXEMES AND TOKENS
(Figure: the flow of lexemes and tokens.)
In general, languages can be formally defined in two distinct ways:
by recognition and by generation (although neither provides a
definition that is practical by itself for people trying to learn or use
a programming language).
3.2.1 LANGUAGE RECOGNIZERS
Language recognizers are mechanisms or algorithms used to
determine whether a given string of characters belongs to a
specific language.
To formally define a language using the recognition method, a
mechanism called a recognition device is constructed. This
device is capable of reading input strings of characters from
the language's alphabet and indicating whether a given input
string is in the language or not. Essentially, the recognition
device acts as a filter, separating valid strings (those
belonging to the language) from invalid ones.
3.2.2 LANGUAGE GENERATORS
A language generator is a device that can be used to generate
the sentences of a language. We can think of the generator as
having a button that produces a sentence of the language
every time it is pushed. Because the particular sentence that is
produced by a generator when its button is pushed is
unpredictable, a generator seems to be a device of limited
usefulness as a language descriptor.
LANGUAGE RECOGNIZER: Determines whether a given string belongs to a language, accepting or rejecting it according to whether it adheres to the language's syntax rules. It checks membership, that is, whether a string is syntactically correct, and outputs a binary result (accept or reject) for each input string. Example: the syntax analysis part of a compiler.

LANGUAGE GENERATOR: Produces strings that belong to a language, generating syntactically correct strings according to the language's rules. Its output is strings that are members of the language. Example: a grammar (CFG, BNF).


3.3 FORMAL METHODS OF
DESCRIBING SYNTAX
This section discusses the formal language-generation
mechanisms, usually called grammars, that are commonly
used to describe the syntax of programming languages.
3.3.1 BACKUS-NAUR FORM AND
CONTEXT-FREE GRAMMARS
In the middle to late 1950s, two men, Noam Chomsky and John
Backus, in unrelated research efforts, developed the same
syntax description formalism, which subsequently became the
most widely used method for programming language syntax.
CONTEXT-FREE GRAMMARS
Noam Chomsky, a prominent linguist, described four classes of grammars; two of these, the context-free and regular grammars, later provided a theoretical framework for describing the syntax of programming languages. Chomsky's work in linguistics had significant implications for computer science, particularly for the study of programming languages.
BACKUS-NAUR FORM
John Backus, a member of the ACM-GAMM group, presented a
landmark paper in 1959 describing ALGOL 58, a programming
language. In this paper, Backus introduced a formal notation
for specifying programming language syntax. The new
notation was later modified slightly by Peter Naur for the
description of ALGOL 60 (Naur, 1960) which later became
known as Backus-Naur Form (BNF).
COMPARISON

CONTEXT-FREE GRAMMAR (CFG): Provides the theoretical framework for describing the syntax of programming languages.

BACKUS-NAUR FORM (BNF): A notation used to describe the syntax of programming languages or other formal languages. BNF can be described as a metasyntax notation for context-free grammars.
3.3.1.3 FUNDAMENTALS
A metalanguage is a language that is used to describe another
language. BNF is a metalanguage for programming languages.
BNF uses abstractions for syntactic structures. A simple Java assignment statement, for example, might be represented by the abstraction <assign> (pointed brackets are often used to delimit names of abstractions). The actual definition of <assign> can be given by:
<assign> → <var> = <expression>

The left-hand side (LHS), <assign>, is the abstraction being defined. The right-hand side (RHS) is the definition of the LHS; it consists of some mixture of tokens, lexemes, and references to other abstractions. (Actually, tokens are also abstractions.) Altogether, the definition is called a rule, or production.
This particular rule specifies that the abstraction <assign> is
defined as an instance of the abstraction <var>, followed by
the lexeme =, followed by an instance of the abstraction
<expression>. One example sentence whose syntactic
structure is described by the rule is
total = subtotal1 + subtotal2
A BNF description, or grammar, is a collection of rules.

TERMINAL SYMBOLS: lexemes and tokens.
NONTERMINAL SYMBOLS: abstractions.
Nonterminal symbols can have multiple distinct definitions,
representing different syntactic forms in the language. These
alternatives are separated by the symbol |, denoting logical
OR. For example, an if statement in Java can have different
forms, as shown in the provided examples.
<if_stmt> → if ( <logic_expr> ) <stmt>
<if_stmt> → if ( <logic_expr> ) <stmt> else <stmt>
These two rules can also be written as the single rule

<if_stmt> → if ( <logic_expr> ) <stmt>
          | if ( <logic_expr> ) <stmt> else <stmt>

In these rules, <stmt> represents either a single statement or a compound statement.


3.3.1.4 DESCRIBING LISTS
Variable-length lists in mathematics are often written using an
ellipsis (. . .); 1, 2, . . . is an example. BNF does not include the ellipsis,
so an alternative method is required for describing lists of
syntactic elements in programming languages (for example, a list
of identifiers appearing on a data declaration statement). For BNF,
the alternative is recursion. A rule is recursive if its LHS appears in
its RHS. The following rules illustrate how recursion is used to
describe lists:
<ident_list> → identifier | identifier, <ident_list>
This defines <ident_list> as either a single token (identifier) or an
identifier followed by a comma and another instance of <ident_list>.
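For example, a list of three identifiers is derived by applying the recursive alternative twice and then the nonrecursive one:

<ident_list> => identifier, <ident_list>
            => identifier, identifier, <ident_list>
            => identifier, identifier, identifier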
3.3.1.5 GRAMMARS AND DERIVATIONS
A grammar is a generative device for defining languages. The
sentences of the language are generated through a sequence of
applications of the rules, beginning with a special nonterminal of
the grammar called the start symbol. This sequence of rule
applications is called a derivation. In a grammar for a complete
programming language, the start symbol represents a complete
program and is often named <program>.
EXAMPLE 3.1
A Grammar for a Small Language
<program> → begin <stmt_list> end
<stmt_list> → <stmt>
| <stmt> ; <stmt_list>
<stmt> → <var> = <expression>
<var> → A | B | C
<expression> → <var> + <var>
| <var> – <var>
| <var>
The language described by the grammar of Example 3.1 has only one statement
form: assignment. A program consists of the special word begin, followed by a
list of statements separated by semicolons, followed by the special word end. An
expression is either a single variable or two variables separated by either a + or -
operator. The only variable names in this language are A, B, and C.
A derivation of a program in this language follows:

<program> => begin <stmt_list> end


=> begin <stmt> ; <stmt_list> end
=> begin <var> = <expression> ; <stmt_list> end
=> begin A = <expression> ; <stmt_list> end
=> begin A = <var> + <var> ; <stmt_list> end
=> begin A = B + <var> ; <stmt_list> end
=> begin A = B + C ; <stmt_list> end
=> begin A = B + C ; <stmt> end
=> begin A = B + C ; <var> = <expression> end
=> begin A = B + C ; B = <expression> end
=> begin A = B + C ; B = <var> end
=> begin A = B + C ; B = C end
This derivation, like all derivations, begins with the start
symbol, in this case <program>. The symbol => is read "derives." Each
successive string in the sequence is derived from the
previous string by replacing one of the nonterminals with one
of that nonterminal’s definitions. Each of the strings in the
derivation, including <program> , is called a sentential form.
In this derivation, the replaced nonterminal is always the
leftmost nonterminal in the previous sentential form.
Derivations that use this order of replacement are called
leftmost derivations.
EXAMPLE 3.2
A Grammar for Simple Assignment Statements

<assign> → <id> = <expr>


<id> → A | B | C
<expr> → <id> + <expr>
| <id> * <expr>
| ( <expr> )
| <id>

The grammar of Example 3.2 describes assignment statements
whose right sides are arithmetic expressions with multiplication and
addition operators and parentheses. For example, the statement
A=B*(A+C)
is generated by the leftmost derivation:
<assign> => <id> = <expr>
=> A = <expr>
=> A = <id> * <expr>
=> A = B * <expr>
=> A = B * ( <expr> )
=> A = B * ( <id> + <expr> )
=> A = B * ( A + <expr> )
=> A = B * ( A + <id> )
=> A = B * ( A + C )
3.3.1.6 PARSE TREES

One of the most attractive features of grammars is


that they naturally describe the hierarchical
syntactic structure of the sentences of the
languages they define. These hierarchical structures
are called parse trees.
FIGURE 3.1
A parse tree for the simple statement A = B * (A + C)

Every internal node of a parse tree is labeled with a nonterminal symbol; every leaf is labeled with a terminal symbol. Every subtree of a parse tree describes one instance of an abstraction in the sentence.

<assign>
├── <id>: A
├── =
└── <expr>
    ├── <id>: B
    ├── *
    └── <expr>
        ├── (
        ├── <expr>
        │   ├── <id>: A
        │   ├── +
        │   └── <expr>
        │       └── <id>: C
        └── )
3.3.1.7 AMBIGUITY

A grammar that generates a sentential form for


which there are two or more distinct parse trees is
said to be ambiguous.
EXAMPLE 3.3
An Ambiguous Grammar for Simple Assignment
Statements
<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <expr> + <expr>
| <expr> * <expr>
| ( <expr> )
| <id>
The grammar of Example 3.3 is ambiguous because the sentence
A=B+C*A
has two distinct parse trees, as shown in Figure 3.2. The ambiguity occurs
because the grammar specifies slightly less syntactic structure than does the
grammar of Example 3.2. Rather than allowing the parse tree of an expression
to grow only on the right, this grammar allows growth on both the left and the
right.
FIGURE 3.2
Two distinct parse trees for the same sentence, A = B + C * A
OPERATOR PRECEDENCE
3.3.1.8

The precedence of operators determines which operation is performed first when an expression contains more than one operator. In the expression x + y * z, the multiplication y * z is evaluated first; likewise, in x * y + z, the multiplication x * y is evaluated first. Operator precedence helps us avoid confusion and ensures expressions are evaluated correctly.
ASSOCIATIVITY OF OPERATORS
3.3.1.9

Associativity applies when an expression contains two or more operators of the same precedence. It determines the direction in which the expression is evaluated.
AN UNAMBIGUOUS GRAMMAR
FOR IF-THEN-ELSE
3.3.1.10

The rule for if-else statements in most languages is that an else is matched with the nearest previous unmatched if. An unambiguous grammar can enforce this rule using the concepts of matched and unmatched statements: a matched statement is one in which every if is paired with an else, and an unmatched statement is an if without an else.
AN UNAMBIGUOUS GRAMMAR
FOR IF-THEN-ELSE
3.3.1.10

<stmt> → <matched> | <unmatched>
<matched> → if <logic_expr> then <matched> else <matched>
          | any non-if statement
<unmatched> → if <logic_expr> then <stmt>
            | if <logic_expr> then <matched> else <unmatched>
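For example, in the nested statement

if <logic_expr1> then if <logic_expr2> then S1 else S2

the else must pair with the second if, the nearest unmatched one. The grammar above forces exactly that parse: whatever appears between then and else must be a <matched> statement, so an if whose body ends in an unmatched if can never capture the else.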
EXTENDED BNF
3.3.2

Optional parts are placed in brackets [ ]
- <if_stmt> → if (<expression>) <statement> [else <statement>]
- equivalent BNF:
  <if_stmt> → if (<expression>) <statement>
  | if (<expression>) <statement> else <statement>

Alternative parts of an RHS are placed inside parentheses and separated by vertical bars
<term> → <term> (+ | -) const

Repetition (0 or more times) is indicated by braces { }
<ident> → letter {letter | digit}
EXAMPLES OF BNF AND EBNF
VERSIONS OF EXPRESSION GRAMMAR
BNF:
<expr> → <expr> + <term>
| <expr> - <term>
| <term>

<term> → <term> * <factor>


| <term> / <factor>
| <factor>

EBNF:
<expr> → <term> {(+ | -) <term>}

<term> → <factor> {( * | /) <factor>}
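One practical payoff of the EBNF form is that the braces map directly onto loops in a hand-written parser. The following Python sketch (a hypothetical fragment; the token-list interface and the simplification of <factor> to an integer literal are assumptions) implements the two EBNF rules above:

def expr(tokens):
    # <expr> -> <term> {(+ | -) <term>}: the braces become a while loop
    value = term(tokens)
    while tokens and tokens[0] in ("+", "-"):
        op = tokens.pop(0)              # consume the operator
        right = term(tokens)
        value = value + right if op == "+" else value - right
    return value

def term(tokens):
    # <term> -> <factor> {(* | /) <factor>}
    value = factor(tokens)
    while tokens and tokens[0] in ("*", "/"):
        op = tokens.pop(0)
        right = factor(tokens)
        value = value * right if op == "*" else value // right
    return value

def factor(tokens):
    # <factor>: simplified here to a bare integer literal
    return int(tokens.pop(0))

print(expr("3 + 4 * 2".split()))        # 11

Because expr() calls term() before looking for + or -, multiplication groups more tightly than addition; the precedence structure of the grammar carries over to the code for free.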


RECENT VARIATIONS IN EBNF
3.3.2

Alternative RHSs are put on separate lines
Use of a colon (:) instead of →
Use of the subscript opt for optional parts
Use of "one of" for choices
ATTRIBUTE GRAMMARS
3.4

Attribute grammars (AGs) extend CFGs to carry semantic information on parse tree nodes.

An attribute grammar is a formal extension of a context-free grammar (CFG) that adds semantic-information processing to the grammar.

We associate attributes with both terminal and nonterminal symbols of the grammar; the attributes store additional information related to those symbols.


STATIC SEMANTICS
3.4.1

The static semantics of a language is only indirectly related to meaning during execution; it concerns the legal forms of programs.
Context-free grammars (CFGs) cannot describe all of the syntax of programming languages; rules such as type compatibility are context dependent and belong to static semantics.
BASIC CONCEPT
3.4.2

An attribute grammar is built from a context-free grammar by adding attributes, attribute computation functions, and predicate functions; these are the foundational concepts of the approach.
ATTRIBUTE GRAMMARS
DEFINED
3.4.3

In an attribute grammar, each grammar symbol (like a


variable or a nonterminal) has associated attributes.
These attributes are divided into two types:
Synthesized Attributes: These are used to pass
information up the parse tree. Think of them as
“results” that we compute at each node.
Inherited Attributes: These pass information down
and across the tree. They depend on the attributes of
parent and sibling nodes.
INTRINSIC ATTRIBUTES
3.4.4

Intrinsic attributes are synthesized attributes of leaf nodes whose values are determined outside the parse tree; for example, the type of a variable can come from the symbol table.
3.4.5 EXAMPLES OF ATTRIBUTE GRAMMARS
As a very simple example of how attribute grammars can be used to describe static semantics, consider the following fragment of an attribute grammar that describes the rule that the name on the end of an Ada procedure must match the procedure's name. The string attribute of <proc_name>, denoted by <proc_name>.string, is the actual string of characters that was found immediately following the reserved word procedure by the compiler.
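A sketch of such a rule in the usual attribute-grammar notation (the bracketed subscripts distinguish the two occurrences of <proc_name>; the fragment follows the standard textbook formulation):

Syntax rule: <proc_def> → procedure <proc_name>[1] <proc_body> end <proc_name>[2];
Predicate:   <proc_name>[1].string == <proc_name>[2].string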
The attributes for the nonterminals in the example attribute
grammar are described in the following paragraphs:
actual_type — A synthesized attribute associated with the
nonterminals <var> and <expr>. It is used to store the actual
type, int or real, of a variable or expression. In the case of a
variable, the actual type is intrinsic. In the case of an
expression, it is determined from the actual types of the
child node or children nodes of the <expr> nonterminal.

expected_type — An inherited attribute associated with the


nonterminal <expr>. It is used to store the type, either int or
real, that is expected for the expression, as determined by
the type of the variable on the left side of the assignment
statement.
EXAMPLE 3.6
An Attribute Grammar for Simple Assignment Statements
FIGURE 3.6
A parse tree for A = A + B
3.4.6 COMPUTING ATTRIBUTE VALUES
Consider the process of computing the attribute values of a
parse tree, which is sometimes called decorating the parse
tree. If all attributes were inherited, this could proceed in a
completely top-down order, from the root to the leaves.
Alternatively, it could proceed in a completely bottom-up order,
from the leaves to the root, if all the attributes were
synthesized.
FIGURE 3.7
A Flow of Attributes in the Tree
FIGURE 3.8
A Fully Attributed Parse Tree
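As a sketch of how this decoration can be carried out in code, the Python fragment below computes a synthesized actual_type bottom-up for the right side of A = A + B and checks it against the expected_type inherited from the left-side variable. The AST classes, the symbol-table contents, and the "int"/"real" encoding are assumptions made for illustration.

from dataclasses import dataclass

@dataclass
class Var:
    name: str

@dataclass
class Add:
    left: object
    right: object

SYMBOL_TABLE = {"A": "real", "B": "int"}       # assumed declarations

def actual_type(expr):
    # Synthesized attribute: computed bottom-up, leaves first.
    if isinstance(expr, Var):
        return SYMBOL_TABLE[expr.name]         # intrinsic: from the symbol table
    left = actual_type(expr.left)
    right = actual_type(expr.right)
    return "real" if "real" in (left, right) else "int"

def check_assign(lhs_name, rhs):
    # expected_type is inherited from the left-side variable; the predicate
    # compares it with the synthesized actual_type of the right side.
    expected = SYMBOL_TABLE[lhs_name]
    actual = actual_type(rhs)
    print("ok" if actual == expected else "type error")

check_assign("A", Add(Var("A"), Var("B")))     # decorates the tree for A = A + B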
3.4.7 EVALUATION
Compilers use attribute grammars to check static
semantic rules in their languages. However, the size
and complexity of these grammars make them
difficult to write and read, and the attribute values
on a large parse tree are costly to evaluate. Less
formal attribute grammars are more commonly used
by compiler writers who focus on the process of
producing a compiler.
3.5 DESCRIBING THE MEANINGS OF
PROGRAMS: DYNAMIC SEMANTICS

The dynamic semantics of programming languages are challenging to describe because no notation for them is universally accepted. A precise semantics specification could make it possible to prove programs correct without testing, to generate compilers automatically, and to detect ambiguities in language designs. Scheme is one example of a language whose definition includes a formal semantics.
3.5.1 OPERATIONAL SEMANTICS
Operational semantics describes the meaning of a program or
statement by specifying the effects of running it on a machine. This
is achieved by executing a compiled version of the program on a
computer. However, this approach has limitations due to the small
and numerous steps involved in machine language execution and
the complex storage of real computers. Intermediate-level
languages and interpreters are designed specifically for this
process. Operational semantics can be used at different levels,
from natural operational semantics to structural operational
semantics, to determine the precise meaning of a program.
3.5.1.1 BASIC PROCESS
An operational semantics description of a language involves designing
an intermediate language with clarity and unambiguous meanings. A
virtual machine is constructed for natural operational semantics, which
can execute single statements, code segments, or whole programs. The
intermediate language is often abstract, but a more human-oriented
language can be used for simpler control statements. This process is
commonly used in programming textbooks and reference manuals.
3.5.1.1 BASIC PROCESS
The human reader of a description is the virtual computer, assuming
they can execute instructions correctly and recognize their effects.
The intermediate language used for formal operational semantics
descriptions is often abstract, but a more human-oriented
intermediate language could be used for this purpose.

ident = var
ident = ident + 1
ident = ident – 1
goto label
if var relop var goto label
3.5.1.1 BASIC PROCESS
In these statements, relop stands for a relational operator, ident for an identifier, and var for an identifier or a literal value; such statements are easy to understand and implement. A slight generalization allows more general arithmetic expressions and assignment statements:

ident = var bin_op var


ident = un_op var

Here bin_op is a binary operator and un_op a unary operator. With an intermediate language of this kind, the semantics of various control statements can be described, and the approach extends to arrays, records, pointers, and subprograms, despite the complexity that multiple data types add.
3.5.1.2 EVALUATION

The first significant use of formal operational semantics was the Vienna Definition Language (VDL), a metalanguage developed at IBM and used to describe the semantics of PL/I in terms of an abstract machine. This method is effective for language users and implementors, but it can lead to circularities. More formal methods, based on mathematics and logic, are discussed in the following sections.
3.5.2 DENOTATIONAL
SEMANTICS
Denotational semantics is a formal method for describing the meaning of programs, based on recursive function theory. For each language entity, a mathematical object and a mapping function are defined; the function maps instances of that entity onto mathematical objects that model their exact meaning. Denotational semantics is related to operational semantics, in which programming language constructs are translated into simpler constructs; unlike operational semantics, however, denotational semantics does not model the step-by-step computational processing of programs. Constructing a denotational semantics specification for a programming language is a complex undertaking that requires rigorous manipulation of mathematical objects.
3.5.2.1 TWO SIMPLE EXAMPLES

We use a very simple language construct, character string


representations of binary numbers, to introduce the denotational
method. The syntax of such binary numbers can be described by the
following grammar rules:

<bin_num> → '0'
|'1'
| <bin_num> '0'
| <bin_num> '1'
3.5.2.1 TWO SIMPLE EXAMPLES

The syntactic domain of the mapping function for binary numbers is the set of character-string representations of binary numbers; the semantic domain is the set of nonnegative decimal numbers. Denotational semantics attaches meaning rule by rule: the first two grammar rules, whose right-hand sides are single terminal symbols, are mapped directly to the decimal numbers 0 and 1.
3.5.2 DENOTATIONAL
SEMANTICS

The semantic function, named Mbin, maps the syntactic objects described by the grammar rules above to the objects in N, the set of nonnegative decimal numbers. The function Mbin is defined as follows:

Mbin('0') = 0
Mbin('1') = 1
Mbin(<bin_num> '0') = 2 * Mbin(<bin_num>)
Mbin(<bin_num> '1') = 2 * Mbin(<bin_num>) + 1
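A minimal executable sketch of Mbin in Python; the recursion peels off the numeral's last character, mirroring the grammar:

def m_bin(s):
    # Denotation of a binary numeral, following the Mbin equations.
    if s in ("0", "1"):
        return int(s)                          # the two base rules
    prefix, last = s[:-1], s[-1]               # <bin_num> '0' or <bin_num> '1'
    return 2 * m_bin(prefix) + (1 if last == "1" else 0)

print(m_bin("110"))   # 6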


3.5.2 DENOTATIONAL
SEMANTICS
This section discusses denotational semantics descriptions of simple constructs, assuming the constructs are syntactically correct and restricting attention to integer and Boolean scalar types.

The denotational mappings for these syntax rules are

Mdec('0') = 0, Mdec('1') = 1, Mdec('2') = 2, . . ., Mdec('9') = 9
Mdec(<dec_num> '0') = 10 * Mdec(<dec_num>)
Mdec(<dec_num> '1') = 10 * Mdec(<dec_num>) + 1
. . .
Mdec(<dec_num> '9') = 10 * Mdec(<dec_num>) + 9
3.5.2.2 THE STATE OF A PROGRAM

Like operational semantics, denotational semantics defines the meaning of a program in terms of state changes in an ideal computer; the difference is that denotational semantics describes meaning through the values of all of the program's variables, whereas operational semantics expresses state changes through coded algorithms.

Let the state s of a program be represented as a set of ordered pairs, as follows:

s = {<i1, v1>, <i2, v2>, . . . , <in, vn>}

Each i is a variable name and each v is that variable's current value; a value may be the special value undef, which indicates that the variable is undefined. VARMAP is a function of two parameters, a variable name and a program state, that returns the variable's value in that state: VARMAP(ij, s) = vj. Most language constructs are mapped from states to states, which is how their meanings are defined; some constructs, such as expressions, are mapped to values instead.
3.5.2.3 EXPRESSIONS

Expressions are fundamental in programming languages. To keep the example manageable, we consider only expressions with the operators + and *, operands that are scalar integer variables and integer literals, no parentheses, and an integer result value.
Following is the BNF description of these expressions:

<expr> → <dec_num> | <var> | <binary_expr>


<binary_expr> → <left_expr> <operator> <right_expr>
<left_expr> → <dec_num> | <var>
<right_expr> → <dec_num> | <var>
<operator> → + | *
3.5.2.3 EXPRESSIONS

We ignore machine-dependent errors such as arithmetic overflow; the only error considered here is a variable having an undefined value. The semantic domain for the denotational specification is Z ∪ {error}, where Z is the set of integers. The mapping function for an expression is defined by cases on the form of the expression; the symbol Δ= means "is defined as," and dot notation refers to child nodes, so <binary_expr>.<left_expr> denotes the left operand of a binary expression.

Me(<expr>, s) Δ= case <expr> of
  <dec_num> => Mdec(<dec_num>, s)
  <var> => if VARMAP(<var>, s) == undef
             then error
             else VARMAP(<var>, s)
  <binary_expr> =>
    if (Me(<binary_expr>.<left_expr>, s) == error OR
        Me(<binary_expr>.<right_expr>, s) == error)
      then error
    else if (<binary_expr>.<operator> == '+')
      then Me(<binary_expr>.<left_expr>, s) + Me(<binary_expr>.<right_expr>, s)
      else Me(<binary_expr>.<left_expr>, s) * Me(<binary_expr>.<right_expr>, s)
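A minimal executable sketch of Me in Python, representing the state s as a dictionary, expressions as nested tuples, and error as a sentinel string; these encodings are assumptions made for illustration:

ERROR = "error"

def Me(expr, s):
    # Denotation of an expression in state s, following the case analysis above.
    kind = expr[0]
    if kind == "dec_num":
        return expr[1]                         # already a decimal value
    if kind == "var":
        return s.get(expr[1], ERROR)           # VARMAP; a missing variable is undefined
    op, left, right = expr[1], Me(expr[2], s), Me(expr[3], s)   # binary_expr
    if ERROR in (left, right):
        return ERROR
    return left + right if op == "+" else left * right

state = {"count": 4}
print(Me(("binary_expr", "*", ("dec_num", 2), ("var", "count")), state))   # 8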
3.5.2.4 ASSIGNMENT STATEMENTS
An assignment statement is an expression evaluation plus the
setting of the target variable to the expression’s value.
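The corresponding mapping function can be sketched in the same style as Me (x is the target variable, E the expression):

Ma(x = E, s) Δ= if Me(E, s) == error
                  then error
                  else s' = {<i1, v1'>, <i2, v2'>, . . . , <in, vn'>},
                       where for j = 1, 2, . . . , n
                         vj' = VARMAP(ij, s)   if ij ≠ x
                         vj' = Me(E, s)        if ij == x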
3.5.2.5 LOGICAL PRETEST LOOPS
The denotational semantics of a logical pretest loop is deceptively
simple. To expedite the discussion, we assume that there are two
other existing mapping functions, Msl and Mb, that map statement
lists and states to states and Boolean expressions to Boolean
values (or error), respectively. The function is:
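A standard way to write it (a sketch; B is the Boolean expression and L the loop body statement list):

Ml(while B do L, s) Δ= if Mb(B, s) == error
                         then error
                       else if Mb(B, s) == false
                         then s
                       else if Msl(L, s) == error
                         then error
                       else Ml(while B do L, Msl(L, s))

The recursion expresses iteration: the meaning of the loop in state s is the meaning of the same loop in the state produced by one execution of its body, terminating when the Boolean expression becomes false.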
3.5.2.6 EVALUATION

Objects and functions, such as those used in the earlier


constructs, can be defined for the other syntactic entities of
programming languages. When a complete system has been
defined for a given language, it can be used to determine the
meaning of complete programs in that language. This
provides a framework for thinking about programming in a
highly rigorous way.
3.5.3 AXIOMATIC SEMANTICS

Axiomatic semantics, thus named because it is based on


mathematical logic, is the most abstract approach to
semantics specification discussed in this chapter.

Axiomatic semantics was defined in conjunction with the


development of an approach to proving the correctness of
programs. Such correctness proofs, when they can be
constructed, show that a program performs the computation
described by its specification.
3.5.3.1 ASSERTIONS

The logical expressions used in axiomatic semantics are called predicates, or assertions.

An assertion immediately preceding a statement describes the constraints on the program variables at that point; an assertion immediately following a statement describes the new constraints on those variables (and possibly others) after execution of the statement. These assertions are called the precondition and postcondition, respectively, of the statement. For two adjacent statements, the postcondition of the first serves as the precondition of the second.
3.5.3.2 WEAKEST PRECONDITIONS

The weakest precondition is the least restrictive precondition


that will guarantee the validity of the associated
postcondition. For example, in the statement and
postcondition given in Section 3.5.3.1, {x > 10}, {x > 50}, and {x
> 1000} are all valid preconditions. The weakest of all
preconditions in this case is {x > 0}.
3.5.3.3 ASSIGNMENT STATEMENTS

The precondition and postcondition of an assignment


statement together define precisely its meaning. To define
the meaning of an assignment statement, given a
postcondition, there must be a way to compute its
precondition from that postcondition.
3.5.3.3 ASSIGNMENT STATEMENTS
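The standard axiom computes the precondition by substituting the right-side expression for every occurrence of the target variable in the postcondition. Writing the assignment as x = E and the postcondition as Q, the axiom is

{Qx→E} x = E {Q}

For example, for a = b / 2 - 1 {a < 10}, substituting b / 2 - 1 for a yields b / 2 - 1 < 10, so the weakest precondition is {b < 22}. The same computation explains the {x > 0} answer of Section 3.5.3.2, assuming the statement there had the form x = x + 1 with postcondition {x > 1}: substitution gives x + 1 > 1, that is, x > 0.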
3.5.3.4 SEQUENCES
• The weakest precondition for a sequence of statements cannot
be described by an axiom, because the precondition depends on
the particular kinds of statements in the sequence. In this case, the
precondition can only be described with an inference rule. Let S1
and S2 be adjacent program statements. If S1 and S2 have the
following pre- and postconditions
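{P1} S1 {P2}
{P2} S2 {P3}

then the inference rule for the sequence, in its standard form, is

{P1} S1 {P2}, {P2} S2 {P3}
--------------------------
{P1} S1; S2 {P3}

That is, if executing S1 from P1 establishes P2, and executing S2 from P2 establishes P3, then the sequence S1; S2 carries P1 to P3.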
3.5.3.5 SELECTION
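The standard inference rule for selection, with Boolean guard B, precondition P, and postcondition Q, is

{B and P} S1 {Q}, {(not B) and P} S2 {Q}
----------------------------------------
{P} if B then S1 else S2 {Q}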
• This rule indicates that selection statements must be proven both
when the Boolean control expression is true and when it is false.
The first logical statement above the line represents the then
clause; the second represents the else clause. According to the
inference rule, we need a precondition P that can be used in the
precondition of both the then and else clauses.
3.5.3.6 LOGICAL PRETEST LOOPS
• Another essential construct of imperative programming
languages is the logical pretest, or while loop. Computing the
weakest precondition for a while loop is inherently more difficult
than for a sequence, because the number of iterations cannot
always be predetermined. In a case where the number of iterations
is known,
the loop can be unrolled and treated as a sequence.
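When the number of iterations cannot be predetermined, the standard rule relies on a loop invariant I, an assertion that the loop body preserves:

{I and B} S {I}
---------------------------------
{I} while B do S end {I and (not B)}

Finding a suitable invariant, one weak enough to be implied by the precondition yet strong enough that, together with (not B), it implies the postcondition, is the creative step in such proofs.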
3.5.3.7 PROGRAM PROOFS

• This section provides validations for two simple programs. The


first example of a correctness proof is for a very short program,
consisting of a sequence of three assignment statements that
interchange the values of two variables.
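A sketch of such a program and its specification (the variable and constant names are assumed for illustration):

{x = A AND y = B}
temp = x;
x = y;
y = temp;
{x = B AND y = A}

Applying the assignment axiom backward through the three statements transforms the postcondition into {y = B AND x = A}, which is exactly the precondition, so the program is proven correct.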
3.5.3.8 EVALUATION

• As stated previously, to define the semantics of a complete


programming language using the axiomatic method, there must be an
axiom or an inference rule for each statement type in the language.
Defining axioms or inference rules for some of the statements of
programming languages has proven to be a difficult task. An obvious
solution to this problem is to design the language with the axiomatic
method in mind, so that only statements for which axioms or inference
rules can be written are included. Unfortunately, such a language would
necessarily leave out some useful and powerful parts.
CHAPTER 4

LEXICAL AND
SYNTAX
ANALYSIS
Presented By : Group 2
OVERVIEW
4.1 Introduction

4.2 Lexical Analysis

4.3 The Parsing Problem

4.4 Recursive-Descent Parsing

4.5 Bottom-Up Parsing


4.1 INTRODUCTION
There are three different approaches for implementing
programming languages:
Compilation Approach
Pure Interpretation Approach
Hybrid Implementation Approach
COMPILATION: Uses a program called a compiler to translate programs written in a high-level programming language into machine code. Examples: C++, COBOL.

PURE INTERPRETATION: Systems perform no translation; rather, programs are interpreted in their original form by a software interpreter. Example: JavaScript.

HYBRID IMPLEMENTATION: Translates programs written in high-level languages into intermediate forms, which are then interpreted. Examples: Java programs and programs written for the Microsoft .NET platform.
JUST-IN-TIME (JIT)
COMPILER
Traditionally, hybrid systems have resulted in much slower
program execution than compiler systems. However, in recent
years the use of JIT compilers has become widespread,
translating intermediate code to machine code at runtime.
JIT compilers improve execution speed, transforming the
hybrid system into a delayed compiler system.
SCOPE OF TASK COMPARISON

LEXICAL ANALYZER: Deals with small-scale language constructs, such as names and numeric literals.

SYNTAX ANALYZER: Deals with the large-scale constructs, such as expressions, statements, and program units.
There are three reasons why lexical analysis is separated
from syntax analysis:
1. Simplicity—Techniques for lexical analysis are less complex
than those required for syntax analysis, so the lexical-analysis
process can be simpler if it is separate.
2. Efficiency—Although it pays to optimize the lexical analyzer,
because lexical analysis requires a significant portion of total
compilation time, it is not fruitful to optimize the syntax
analyzer.
3. Portability—Because the lexical analyzer reads input program files and often includes buffering of that input, it is somewhat platform dependent; the syntax analyzer, by contrast, can be made platform independent.
4.2 LEXICAL ANALYZER
A lexical analyzer is essentially a pattern matcher. A pattern
matcher attempts to find a substring of a given string of
characters that matches a given character pattern.
A lexical analyzer serves as the front end of a syntax analyzer.
Technically, lexical analysis is a part of syntax analysis.
LEXICAL ANALYZER
Lexical analysis serves as the initial phase of syntax analysis. It
organizes characters into logical groupings (lexemes) and
assigns internal codes (tokens) to these groupings based on
their structure. Lexemes are recognized by matching the input
character string against predefined patterns.
LEXICAL ANALYZER PROCESS
(Figure: the lexical analysis process.)
4.3 THE PARSING PROBLEM
The part of the process of analyzing syntax that is referred to
as syntax analysis is often called parsing. This section
discusses the general parsing problem and introduces the two
main categories of parsing algorithms, top-down and bottom-
up, as well as the complexity of the parsing process.
WHAT IS SYNTAX
ANALYZER?
A syntax analyzer, also known as a parser, is a crucial component
of a compiler or interpreter. Its primary function is to analyze the
syntactic structure of the source code and check if it follows the
grammatical rules of the programming language.
SYNTAX ANALYZER
WHAT IS PARSING?
A parser for a programming language is a program that constructs a parse tree for a given string, provided the string can be generated from the underlying grammar.
Parsers are categorized according to the direction in which
they build parse trees. The two broad classes of parsers are
top-down, in which the tree is built from the root downward to
the leaves, and bottom-up, in which the parse tree is built from
the leaves upward to the root.
MEANING

TOP-DOWN PARSERS: A parser in which the tree is built from the root downward to the leaves.

BOTTOM-UP PARSERS: A parser in which the parse tree is built from the leaves upward to the root.
COMPARISON

TOP-DOWN PARSERS: The parser selects the leftmost nonterminal in the current sentential form and expands it according to the grammar rules.

BOTTOM-UP PARSERS: The parser finds substrings of the input that match the right-hand sides of production rules in the grammar. This corresponds to carrying out, in reverse, the steps of a rightmost derivation, producing the sentential forms from last to first.
We use a small set of notational conventions for grammar
symbols and strings to make the discussion less cluttered:
Terminal symbols — lowercase letters at the beginning of
the alphabet (a, b, . . .)
Nonterminal symbols — uppercase letters at the beginning
of the alphabet (A, B, . . .)
Terminal or nonterminal — uppercase letters at the end of
the alphabet (W, X, Y, Z)
Strings of terminals — lowercase letters at the end of the
alphabet (w, x, y, z)
Mixed strings (terminal and/or nonterminal) — lowercase
Greek letters
COMPLEXITY OF PARSING
Parsing algorithms that work for any unambiguous grammar are complex and inefficient, with a time complexity of O(n³), largely because of backtracking and reparsing; this makes them impractical for production use. Commercial compilers therefore use syntax analyzers with linear time complexity, O(n), that accept only a subset of all unambiguous grammars, sacrificing some generality for practical efficiency.
RECURSIVE DESCENT
PARSING
Recursive-descent parsing is named for its structure, which
consists of a collection of subprograms, many of which are
recursive. It generates a parse tree in a top-down manner,
reflecting the nested structures often found in programming
languages.
RECURSIVE DESCENT
PARSING
A recursive-descent parser has a subprogram for each
nonterminal in the grammar it is associated with. Each
subprogram is responsible for parsing a specific nonterminal
by tracing out a parse tree for that nonterminal based on the
input string.
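As a minimal sketch of this structure for the expression grammar of Example 3.2 (Python; the token-list interface is an assumption, and error handling is reduced to raising an exception):

class Parser:
    # One subprogram per nonterminal; expr() covers Example 3.2's <expr> rules.
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if expected is not None and tok != expected:
            raise SyntaxError("expected " + repr(expected) + ", found " + repr(tok))
        self.pos += 1
        return tok

    def expr(self):
        if self.peek() == "(":          # <expr> -> ( <expr> )
            self.take("(")
            node = self.expr()
            self.take(")")
            return node
        left = self.take()              # <expr> -> <id> ...
        if self.peek() in ("+", "*"):   # <id> + <expr> | <id> * <expr>
            op = self.take()
            return (op, left, self.expr())
        return left                     # <expr> -> <id>

print(Parser(["B", "*", "(", "A", "+", "C", ")"]).expr())
# ('*', 'B', ('+', 'A', 'C')): the structure of the parse tree in Figure 3.1

Each routine chooses among its RHSs by examining the next token of input, which is why grammars must satisfy certain conditions, discussed below, to be parsable this way.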
LL GRAMMAR CLASS
The first L in LL specifies a left-to-right scan of the input; the
second L specifies that a leftmost derivation is generated.
LL GRAMMAR CLASS
One simple grammar characteristic that causes a catastrophic
problem for LL parsers is left recursion.
A → A + B

Left recursion and failure of the pairwise disjointness test are


significant issues for top-down parsing algorithms like
recursive descent, but they can often be addressed through
grammar transformations and alternative representations.
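For example, assuming the grammar also contains the base alternative A → B, the left recursive pair

A → A + B | B

can be rewritten in the right recursive form

A → B A'
A' → + B A' | ε

which generates exactly the same strings (B, B + B, B + B + B, . . .) but no longer sends a recursive-descent routine for A into an infinite loop.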
SHIFT-REDUCE ALGORITHMS
Bottom-up parsers, often referred to as shift-reduce algorithms, are a common parsing technique used in the compilation of programming languages. They perform two operations:
Shift: moves the next input token onto the parser's stack.
Reduce: replaces the sequence of symbols on top of the parser's stack that matches the right-hand side of a production rule (the handle) with the rule's left-hand side nonterminal.
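As a sketch of the two operations in action, consider parsing id + id with the toy grammar E → E + T | T, T → id; the trace reconstructs a rightmost derivation in reverse:

Stack       Remaining input   Action
$           id + id $         shift
$ id        + id $            reduce by T → id
$ T         + id $            reduce by E → T
$ E         + id $            shift
$ E +       id $              shift
$ E + id    $                 reduce by T → id
$ E + T     $                 reduce by E → E + T
$ E         $                 accept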
LR PARSERS
The first L in LR specifies a left-to-right scan of the input; the
second R specifies that a rightmost derivation is generated.
LR (Left-to-right, Rightmost derivation) parsing is a popular
technique used in compiler design for syntax analysis. The LR
parsing algorithm, originally devised by Donald Knuth in 1965,
offers several advantages over other parsing techniques.
VARIATION OF LR PARSERS
Over time, variations of the canonical LR algorithm were
developed, such as those by DeRemer in 1971 and DeRemer
and Pennello in 1982. These variations aimed to reduce the
computational overhead of generating the parsing table while
still working with smaller classes of grammars.
ADVANTAGES: LR parsers can be constructed for all programming languages. They can detect syntax errors as soon as they occur during a left-to-right scan. The class of grammars parsable by LR parsers is a proper superset of the class parsable by LL parsers, so LR parsers can handle a broader range of grammars.

DISADVANTAGE: The main challenge with LR parsing is the manual construction of the parsing table for a given grammar, especially for complex programming languages. However, various software tools are available to automate this process.
THANK YOU
Presented By : Group 2
