Mastering Regular Expressions
Powerful Techniques for Perl and Other Tools
O’REILLY® Jeffrey E. F. Friedl
Table of Contents
Preface xv
1: Introduction to Regular Expressions 1
Solving Real Problems 2
Regular Expressions as a Language 4
The Filename Analogy 4
The Language Analogy 5
The Regular-Expression Frame of Mind 6
If You Have Some Regular-Expression Experience 6
Searching Text Files: Egrep 6
Egrep Metacharacters 8
Start and End of the Line 8
Character Classes 9
Matching Any Character with Dot 11
Alternation 13
Ignoring Differences in Capitalization 14
Word Boundaries 15
In a Nutshell 16
Optional Items 17
Other Quantifiers: Repetition 18
Parentheses and Backreferences 20
The Great Escape 22
Expanding the Foundation 23
Linguistic Diversification 23
The Goal of a Regular Expression 23
A Few More Examples 23
Regular Expression Nomenclature 27
Improving on the Status Quo
Summary
Personal Glimpses
2: Extended Introductory Examples
About the Examples
A Short Introduction to Perl
Matching Text with Regular Expressions
Toward a More Real-World Example
Side Effects of a Successful Match
Intertwined Regular Expressions
Intermission
Modifying Text with Regular Expressions
Example: Form Letter
Example: Prettifying a Stock Price
Automated Editing
A Small Mail Utility
Adding Commas to a Number with Lookaround
Text-to-HTML Conversion
That Doubled-Word Thing
3: Overview of Regular Expression Features and Flavors
A Casual Stroll Across the Regex Landscape
The Origins of Regular Expressions
At a Glance
Care and Handling of Regular Expressions
Integrated Handling
Procedural and Object-Oriented Handling
A Search-and-Replace Example
Search and Replace in Other Languages
Care and Handling: Summary 101
Strings, Character Encodings, and Modes 101
Strings as Regular Expressions 101
Character-Encoding Issues 105
Regex Modes and Match Modes 109
Common Metacharacters and Features 112
Character Representations 114
Character Classes and Class-Like Constructs 117
Anchors and Other “Zero-Width Assertions” 127
Comments and Mode Modifiers 133
Grouping, Capturing, Conditionals, and Control 135
Guide to the Advanced Chapters 141
4: The Mechanics of Expression Processing 143
Start Your Engines! 143
Two Kinds of Engines 144
New Standards 144
Regex Engine Types 145
From the Department of Redundancy Department 146
Testing the Engine Type 146
Match Basics 147
About the Examples 147
Rule 1: The Match That Begins Earliest Wins 148
Engine Pieces and Parts 149
Rule 2: The Standard Quantifiers Are Greedy 151
Regex-Directed Versus Text-Directed 153
NFA Engine: Regex-Directed 153
DFA Engine: Text-Directed 155
First Thoughts: NFA and DFA in Comparison 156
Backtracking 157
A Really Crummy Analogy 158
Two Important Points on Backtracking 159
Saved States 159
Backtracking and Greediness 162
More About Greediness and Backtracking 163
Problems of Greediness 164
Multi-Character “Quotes” 165
Using Lazy Quantifiers 166
Greediness and Laziness Always Favor a Match 167
The Essence of Greediness, Laziness, and Backtracking 168
Possessive Quantifiers and Atomic Grouping 169
Possessive Quantifiers, ?+, *+, ++, and {m,n}+ 172
The Backtracking of Lookaround 173
Is Alternation Greedy? 174
Taking Advantage of Ordered Alternation 175
NFA, DFA, and POSIX 177
“The Longest-Leftmost” 177
POSIX and the Longest-Leftmost Rule 178
Speed and Efficiency 179
Summary: NFA and DFA in Comparison 180
Summary 183
5: Practical Regex Techniques 185
Regex Balancing Act 186
A Few Short Examples 186
Continuing with Continuation Lines 186
Matching an IP Address 187
Working with Filenames 190
Matching Balanced Sets of Parentheses 193
Watching Out for Unwanted Matches 194
Matching Delimited Text 196
Knowing Your Data and Making Assumptions 198
Stripping Leading and Trailing Whitespace 199
HTML-Related Examples 200
Matching an HTML Tag 200
Matching an HTML Link 201
Examining an HTTP URL 203
Validating a Hostname 203
Plucking Out a URL in the Real World 205
Extended Examples 208
Keeping in Sync with Your Data 208
Parsing CSV Files 212
6: Crafting an Efficient Expression 221
A Sobering Example 222
A Simple Change—Placing Your Best Foot Forward 223
Efficiency Versus Correctness 223
Advancing Further—Localizing the Greediness 225
Reality Check 226
A Global View of Backtracking 228
More Work for a POSIX NFA 229
Work Required During a Non-Match 230
Being More Specific 231
Alternation Can Be Expensive 231
Benchmarking 232
Know What You're Measuring
Benchmarking with Java
Benchmarking with VB.NET
Benchmarking with Python
Benchmarking with Ruby
Benchmarking with Tcl
Common Optimizations
No Free Lunch
Everyone's Lunch is Different
The Mechanics of Regex Application
Pre-Application Optimizations
Optimizations with the Transmission
Optimizations of the Regex Itself
Techniques for Faster Expressions
Common Sense Techniques
Expose Literal Text
Expose Anchors
Lazy Versus Greedy: Be Specific
Split Into Multiple Regular Expressions
Mimic Initial-Character Discrimination
Use Atomic Grouping and Possessive Quantifiers
Lead the Engine to a Match
Unrolling the Loop
Method 1: Building a Regex From Past Experiences
The Real “Unrolling-the-Loop” Pattern
Method 2: A Top-Down View
Method 3: An Internet Hostname
Observations
Using Atomic Grouping and Possessive Quantifiers
Short Unrolling Examples
Unrolling C Comments
The Freeflowing Regex
A Helping Hand to Guide the Match
A Well-Guided Regex is a Fast Regex
Wrapup
In Summary: Think!
7: Perl 283
Regular Expressions as a Language Component 285
Perl's Greatest Strength 286
Perl's Greatest Weakness 286
Perl's Regex Flavor 286
Regex Operands and Regex Literals 288
How Regex Literals Are Parsed 292
Regex Modifiers 292
Regex-Related Perlisms 293
Expression Context 294
Dynamic Scope and Regex Match Effects 295
Special Variables Modified by a Match 299
The qr/…/ Operator and Regex Objects 303
Building and Using Regex Objects 303
Viewing Regex Objects 305
Using Regex Objects for Efficiency 306
The Match Operator 306
Match’s Regex Operand 307
Specifying the Match Target Operand 308
Different Uses of the Match Operator 309
Iterative Matching: Scalar Context, with /g 312
The Match Operator's Environmental Relations 316
The Substitution Operator 318
The Replacement Operand 319
The /e Modifier 319
Context and Return Value 321
The Split Operator 321
Basic Split 322
Returning Empty Elements 324
Split’s Special Regex Operands 325
Split’s Match Operand with Capturing Parentheses 326
Fun with Perl Enhancements 326
Using a Dynamic Regex to Match Nested Pairs 328
Using the Embedded-Code Construct 331
Using local in an Embedded-Code Construct 335
A Warning About Embedded Code and my Variables 338
Matching Nested Constructs with Embedded Code 340
Overloading Regex Literals 341
Problems with Regex-Literal Overloading 344
Mimicking Named Capture
Perl Efficiency Issues
“There's More Than One Way to Do It”
Regex Compilation, the /o Modifier, qr/…/, and Efficiency
Understanding the “Pre-Match” Copy
The Study Function
Benchmarking
Regex Debugging Information
Final Comments
8: Java 365
Judging a Regex Package 366
Technical Issues 366
Social and Political Issues 367
Object Models 368
A Few Abstract Object Models 368
Growing Complexity 372
Packages, Packages, Packages 372
Why So Many “Perl5” Flavors? 375
Lies, Damn Lies, and Benchmarks 375
Recommendations 377
Sun’s Regex Package 378
Regex Flavor 378
Using java.util.regex 381
The Pattern.compile() Factory 383
The Matcher Object 384
Other Pattern Methods 390
A Quick Look at Jakarta-ORO 392
ORO's Perl5Util 392
A Mini Perl5Util Reference 393
Using ORO's Underlying Classes 397
9: .NET 399
.NET's Regex Flavor 400
Additional Comments on the Flavor 402
Using .NET Regular Expressions 407
Regex Quickstart 407
Package Overview 409
Core Object Overview 410
Core Object Details 412
Creating Regex Objects 413
Using Regex Objects 415
Using Match Objects 421
Using Group Objects 424
Static “Convenience” Functions 425
Regex Caching 426
Support Functions 426
Advanced .NET 427
Regex Assemblies 428
Matching Nested Constructs 430
Capture Objects 431
Index 433