2021BSE109PS
These two processes differ in their levels of complexity; lexical analysis and syntax
analysis fulfill unique roles and utilize distinct methods. Lexical analysis is
dedicated to dissecting the source code into tokens, which constitute the fundamental
elements of the programming language, such as keywords, identifiers, operators, and
literals. This phase addresses regular languages using approaches like finite automata
and regular expressions. Conversely, syntax analysis concentrates on the structure
and rules of the program, verifying whether the token sequence constitutes a valid
program. This phase handles context-free languages and makes use of techniques
such as parsing algorithms. The division of these two stages simplifies the overall
complexity of the compilation process, facilitating its implementation and upkeep.
Additionally, the distinction between lexical analysis and syntax analysis promotes
the portability and reusability of the compiler (Rajshekhar, 2018). The lexical
analyzer is tasked with ingesting the input program files, managing file I/O
operations, and handling buffering. These tasks may involve platform-specific
implementations. By detaching the lexical analysis phase from the syntax analysis
phase, it is possible to design the compiler to be more platform-independent. The
syntax analyzer operates on the token stream generated by the lexical analyzer,
distancing itself from platform-specific nuances.
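As an illustration of how a lexical analyzer classifies source text with regular expressions, the following minimal tokenizer can be sketched in Python (the token set here is a hypothetical one chosen for the example, not that of any particular language):

```python
import re

# Each pair names a token class and the regular expression that recognizes it.
# Whitespace is matched but discarded, mirroring how lexers skip separators.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/^=]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    # Scan the input left to right, emitting (class, lexeme) pairs.
    tokens = []
    for m in MASTER.finditer(source):
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(tokenize("x = 2 + 3"))
# [('ID', 'x'), ('OP', '='), ('NUMBER', '2'), ('OP', '+'), ('NUMBER', '3')]
```

The resulting token stream, not the raw characters, is what the syntax analyzer consumes, which is precisely what keeps the parser free of file-handling and platform concerns.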
Modularity and Maintenance: The separation fosters modularity and simplifies the
maintenance of the compiler. The lexical analyzer and syntax analyzer can be
developed as independent modules, each with a clearly defined interface. This
modular design enables the separate development, testing, and debugging of the
modules. Additionally, should there be revisions or updates to the language
specifications, it is generally simpler to adjust or expand the syntax analyzer without
impacting the lexical analyzer. Employing formal grammars to outline the language's
syntax offers a precise framework for both humans and software. These guidelines
serve as the direct foundation for the syntax analyzer, easing the understanding and
maintenance of the compiler's functionality.
Associativity dictates the grouping of operators with the same precedence in the
absence of parentheses. This is divided into two types:
Left Associativity: Operations are processed from left to right. For instance, the
expression a + b + c is grouped as (a + b) + c.
Right Associativity: Operations are processed from right to left. For example, the
expression a ^ b ^ c (used for exponentiation) is grouped as a ^ (b ^ c).
Precedence establishes the order in which operators are applied when they share
an operand: operators of higher precedence are applied first. For instance, in the
expression 2 + 3 * 4, the multiplication operator (*) has higher precedence than
addition (+), so the expression is evaluated as 2 + (3 * 4).
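Both ideas can be demonstrated together with a small precedence-climbing evaluator, sketched below under the assumption of a three-operator table in which * binds tighter than + and ^ is right-associative:

```python
# Hypothetical operator table: larger number = higher precedence.
PREC = {"+": 1, "*": 2, "^": 3}
RIGHT_ASSOC = {"^"}

def evaluate(tokens, min_prec=1):
    # tokens is a flat list such as [2, '+', 3, '*', 4]; it is consumed in place.
    lhs = tokens.pop(0)
    while tokens and PREC[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        # A right-associative operator recurses at the same precedence level,
        # so a ^ b ^ c groups as a ^ (b ^ c); a left-associative one recurses
        # one level higher, so a + b + c groups as (a + b) + c.
        next_min = PREC[op] if op in RIGHT_ASSOC else PREC[op] + 1
        rhs = evaluate(tokens, next_min)
        lhs = lhs + rhs if op == "+" else lhs * rhs if op == "*" else lhs ** rhs
    return lhs

print(evaluate([2, "+", 3, "*", 4]))   # 14, i.e. 2 + (3 * 4)
print(evaluate([2, "^", 3, "^", 2]))   # 512, i.e. 2 ^ (3 ^ 2)
```

Changing only the precedence numbers or the associativity set changes how the same token sequence is grouped, which is exactly the role these two properties play in a grammar.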
Left Recursion: A production rule is left-recursive when its right-hand side begins
with the non-terminal being defined, so the rule can call itself from the left
indefinitely. For example, A → A + C | C is left-recursive, since A can keep
deriving itself (A → A + C → A + C + C → ...). This causes infinite recursion in
top-down parsers.
Elimination of Left Recursion: To prevent infinite recursion in top-down parsers, left
recursion must be removed. A common approach to achieve this is by substitution,
replacing the recursive rule with a non-recursive alternative. For instance, A → A +
C | C can be reformulated as A → C B and B → + C B | ε (where ε denotes an empty
string).
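The transformed grammar can be parsed top-down without looping. The sketch below assumes, purely for illustration, that C is a single digit; a parser written directly from the left-recursive rule A → A + C would call itself immediately and never terminate:

```python
# Recursive-descent parser for the transformed grammar:
#   A -> C B
#   B -> + C B | ε
# Each function returns the position just past what it consumed.
def parse_A(tokens, pos=0):
    pos = parse_C(tokens, pos)
    return parse_B(tokens, pos)

def parse_B(tokens, pos):
    if pos < len(tokens) and tokens[pos] == "+":   # B -> + C B
        pos = parse_C(tokens, pos + 1)
        return parse_B(tokens, pos)
    return pos                                      # B -> ε (consume nothing)

def parse_C(tokens, pos):
    # Assumption for this sketch: C derives a single digit.
    if pos < len(tokens) and tokens[pos].isdigit():
        return pos + 1
    raise SyntaxError(f"expected a digit at position {pos}")

tokens = list("1+2+3")
assert parse_A(tokens) == len(tokens)   # the whole input is a valid A
```

Note that parse_B consumes a token before recursing, so every recursive call makes progress through the input, which is what the elimination of left recursion guarantees.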
Left Factoring: Left factoring refines a grammar by extracting the common prefix
shared by the alternatives of a production rule. For example, the rule
stmt → id := expr | id ( expr_list ) can be refactored into stmt → id stmt' and
stmt' → := expr | ( expr_list ). After this change the parser sees only one
production per prefix, so a single lookahead token is enough to decide which
alternative to take.
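The factored rule translates directly into a parser that reads the common prefix once and then branches on one lookahead token. The sketch below uses a simplified token list (the token names are assumptions for the example):

```python
# Parser for the left-factored rule:
#   stmt  -> id stmt'
#   stmt' -> := expr | ( expr_list )
def parse_stmt(tokens):
    if not tokens or tokens[0] != "id":
        raise SyntaxError("expected id")
    rest = tokens[1:]
    # One token of lookahead now distinguishes the two alternatives.
    if rest and rest[0] == ":=":
        return ("assign", rest[1:])    # stmt' -> := expr
    if rest and rest[0] == "(":
        return ("call", rest[1:])      # stmt' -> ( expr_list )
    raise SyntaxError("expected ':=' or '(' after id")

kind, _ = parse_stmt(["id", ":=", "expr"])
assert kind == "assign"
```

Without factoring, the parser would have to choose between two productions that both start with id before it had seen the distinguishing token.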
First and Follow Sets:
First Set: The First set of a non-terminal A contains the terminals (and possibly
the empty string ε) that can begin a string derived from A. For example, if
A → a | Bc and B → b, then First(A) = {a, b}.
Follow Set: The Follow set of a non-terminal A consists of the terminals that can
appear immediately following A in a derived sentential form. For instance, if the
grammar includes the rule S → AaB, then 'a' is included in the Follow set of A.
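The First-set definition can be sketched as a short recursive computation. The toy grammar below matches the example above (uppercase symbols are non-terminals, everything else is a terminal, and ε marks the empty string); the sketch assumes the grammar is not left-recursive, since the recursion would otherwise not terminate:

```python
# Toy grammar: A -> a | B c,  B -> b
GRAMMAR = {
    "A": [["a"], ["B", "c"]],
    "B": [["b"]],
}

def first(symbol, grammar):
    if symbol not in grammar:              # a terminal begins only with itself
        return {symbol}
    result = set()
    for production in grammar[symbol]:
        for sym in production:
            f = first(sym, grammar)
            result |= f - {"ε"}
            if "ε" not in f:               # sym cannot vanish, so stop here
                break
        else:
            result.add("ε")                # every symbol could derive ε
    return result

print(first("A", GRAMMAR))   # {'a', 'b'}
```

Follow sets are computed in a complementary pass over the productions: whatever can begin the symbols to the right of A (and, for rules like S → AaB, the terminal a itself) is added to Follow(A).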
References
Barve, A. and Joshi, B. K. (2014) 'Parallel Lexical Analysis of Multiple Files on
Multi-Core Machines', International Journal of Computer Applications (Foundation
of Computer Science), Vol. 96, Iss. 16, pp. 22-24.
Novikova, J., Balagopalan, A., Shkaruta, K. and Rudzicz, F. (Year) 'Lexical
Features Are More Vulnerable, Syntactic Features Have More Predictive Power',