Lesson 2: Matching Single Characters

This document discusses matching single characters in regular expressions. It introduces literal character matching using text like "Ben" and metacharacter matching using "." to match any single character. Examples show matching file names containing patterns like "sales." or ".a.\." to find files related to sales or North/South America. Escaping characters with "\" is also demonstrated, such as "\." to match a literal period character.

Uploaded by

Me Its
Copyright
© All Rights Reserved

Lesson 2

Matching Single Characters

In this lesson you’ll learn how to perform simple character matches of one or more characters.

MATCHING LITERAL TEXT


Ben is a regular expression. Because it is plain text, it may not look like a regular expression, but it is. Regular
expressions can contain plain text (and may even contain only plain text). Admittedly, this is a total waste of regular
expression processing, but it’s a good place to start.

So, here goes:

Text

Hello, my name is Ben. Please visit
my website at http://www.forta.com/.

RegEx

Ben

Result (matches in [brackets])

Hello, my name is [Ben]. Please visit
my website at http://www.forta.com/.

Analysis
The regular expression used here is literal text and it matches Ben in the original text.
Note
In the examples you’ll see that the matched text is shaded. We’ll use this format throughout the book so you can easily
see exactly what an example matched.
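If you want to try the literal match yourself, here is a minimal sketch. It assumes Python and its standard re module, which this lesson does not prescribe; the variable names are illustrative only.

```python
import re

# The sample text from the example (the line break is written as \n).
text = "Hello, my name is Ben. Please visit\nmy website at http://www.forta.com/."

# A pattern made only of literal characters matches exactly that text.
match = re.search(r"Ben", text)

if match:
    # group() is the matched text; span() gives its start and end offsets.
    print(match.group(), match.span())  # Ben (18, 21)
```

re.search returns None when nothing matches, which is why the result is checked before it is used.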

Let’s look at another example using the same search text and a different regular expression:

Text

Hello, my name is Ben. Please visit
my website at http://www.forta.com/.

RegEx

my

Result (matches in [brackets])

Hello, [my] name is Ben. Please visit
[my] website at http://www.forta.com/.

Analysis
my is also static text, but notice how two occurrences of my were matched.

How Many Matches?


The default behavior of most regular expression engines is to return just the first match. In the preceding example,
the first my would typically be a match, but not the second.
So why were two matches made? Most regex implementations provide a mechanism by which to obtain a list of all
matches (usually returned in an array or some other special format). In JavaScript, for example, using the
optional g (global) flag returns an array containing all the matches.
Note
Consult Appendix A, “Regular Expressions in Popular Applications and Languages,” to learn how to perform global
matches in your language or tool.
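The difference between a first match and a global match can be sketched as follows. This assumes Python's re module rather than the JavaScript g flag mentioned above: re.search stops at the first match, while re.findall plays the role of a global search.

```python
import re

text = "Hello, my name is Ben. Please visit\nmy website at http://www.forta.com/."

# re.search stops at the first match.
first = re.search(r"my", text)
print(first.start())  # 7

# re.findall returns every non-overlapping match as a list.
all_matches = re.findall(r"my", text)
print(all_matches)  # ['my', 'my']
```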

Handling Case Sensitivity


Regular expressions are case sensitive, so Ben will not match ben. However, most regex implementations also
support matches that are not case sensitive. JavaScript users, for example, can specify the optional i flag to force a
search that is not case sensitive.
Note
Consult Appendix A to learn how to use your language or tool to perform searches that are not case sensitive.
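As one illustration, and again assuming Python rather than JavaScript, the counterpart of the i flag is re.IGNORECASE:

```python
import re

text = "Hello, my name is Ben."

# By default the search is case sensitive, so ben does not match Ben.
case_sensitive = re.search(r"ben", text)
print(case_sensitive)  # None

# With re.IGNORECASE the same pattern matches regardless of case.
case_insensitive = re.search(r"ben", text, re.IGNORECASE)
print(case_insensitive.group())  # Ben
```

Note that the returned text keeps the case of the original text, not the case of the pattern.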

MATCHING ANY CHARACTERS


The regular expressions thus far have matched static text only—rather anticlimactic, indeed. Next we’ll look at
matching unknown characters.
In regular expressions, special characters (or sets of characters) are used to identify what is to be searched for.
The . character (period, or full stop) matches any one character.
Therefore, searching for c.t will match cat and cot (and a bunch of other nonsensical words, too).
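A quick way to see this is to run c.t against a handful of candidate strings. The sketch below assumes Python's re module, and the word list is made up for illustration.

```python
import re

# Hypothetical candidates: some fit c.t, some do not.
words = ["cat", "cot", "c9t", "c t", "ct", "coat"]

# Keep every word in which c, any one character, and t appear in a row.
matched = [word for word in words if re.search(r"c.t", word)]

print(matched)  # ['cat', 'cot', 'c9t', 'c t']
```

ct fails because . must consume exactly one character, and coat fails because two characters sit between c and t.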

Here is an example:

Text

sales1.xls
orders3.xls
sales2.xls
sales3.xls
apac1.xls
europe2.xls
na1.xls
na2.xls
sa1.xls

RegEx

sales.

Result (matches in [brackets])

[sales1].xls
orders3.xls
[sales2].xls
[sales3].xls
apac1.xls
europe2.xls
na1.xls
na2.xls
sa1.xls

Analysis
Here the regex sales. is being used to find all filenames that contain sales followed by one more character (any character).
Three of the nine files match the pattern.
Tip
You’ll often see the term pattern used to describe the actual regular expression.
Note
Notice that regular expressions match patterns with string contents. Matches will not always be entire strings, but the
characters that match a pattern—even if they are only part of a string. In the example used here, the regular
expression did not match a filename; rather, it matched part of a filename. This distinction is important to remember
when passing the results of a regular expression to some other code or application for processing.
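The partial-match behavior is easy to check in code. This is a minimal sketch assuming Python's re module; presales1.xls is a hypothetical filename added only to show that the pattern is not tied to the start of the string.

```python
import re

filename = "sales1.xls"

# re.search finds the pattern anywhere in the string and returns only
# the part that matched, not the whole filename.
partial = re.search(r"sales.", filename)
print(partial.group(), partial.span())  # sales1 (0, 6)

# re.fullmatch succeeds only if the pattern accounts for the entire string.
whole = re.fullmatch(r"sales.", filename)
print(whole)  # None

# Hypothetical filename: the match can begin in the middle of the string.
inner = re.search(r"sales.", "presales1.xls")
print(inner.group(), inner.span())  # sales1 (3, 9)
```

If downstream code expects a complete filename, pass it the original string rather than the matched text.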
. matches any character: alphabetic characters, digits, and even . itself:

Text

sales.xls
sales1.xls
orders3.xls
sales2.xls
sales3.xls
apac1.xls
europe2.xls
na1.xls
na2.xls
sa1.xls

RegEx

sales.

Result (matches in [brackets])

[sales.]xls
[sales1].xls
orders3.xls
[sales2].xls
[sales3].xls
apac1.xls
europe2.xls
na1.xls
na2.xls
sa1.xls

Analysis
This example contains one additional file, sales.xls. The file was matched by the pattern sales. as . matches
any character.
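To see which character the . actually consumed in each filename, you could inspect the last character of each match. The sketch assumes Python's re module and uses a shortened file list.

```python
import re

filenames = ["sales.xls", "sales1.xls", "sales2.xls", "orders3.xls"]

# Record the character that the trailing . consumed in each matching name.
consumed = {}
for name in filenames:
    match = re.search(r"sales.", name)
    if match:
        consumed[name] = match.group()[-1]

print(consumed)  # {'sales.xls': '.', 'sales1.xls': '1', 'sales2.xls': '2'}
```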
Multiple instances of . may be used, either together (one after the other—using .. will match any two characters
next to each other) or in different locations in the pattern.
Let’s look at another example using the same text. This time you need to find all files for North America (na) or South
America (sa) regardless of what digit comes next:

Text

sales1.xls
orders3.xls
sales2.xls
sales3.xls
apac1.xls
europe2.xls
na1.xls
na2.xls
sa1.xls

RegEx

.a.

Result (matches in [brackets])

[sal]es1.xls
orders3.xls
[sal]es2.xls
[sal]es3.xls
a[pac]1.xls
europe2.xls
[na1].xls
[na2].xls
[sa1].xls

Analysis
The regex .a. did indeed find na1, na2, and sa1, but it also found four other matches that it was not supposed to.
Why? Because the pattern matches any three characters so long as the middle one is a.
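Listing the matches makes the unwanted ones obvious. A minimal sketch, assuming Python's re module and treating each filename as a separate string:

```python
import re

filenames = [
    "sales1.xls", "orders3.xls", "sales2.xls", "sales3.xls", "apac1.xls",
    "europe2.xls", "na1.xls", "na2.xls", "sa1.xls",
]

# Collect every three-character run whose middle character is a.
matches = [m for name in filenames for m in re.findall(r".a.", name)]

print(matches)  # ['sal', 'sal', 'sal', 'pac', 'na1', 'na2', 'sa1']
```

The first four entries are the extra matches: sal from the three sales files and pac from apac1.xls.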
What is needed is a pattern that matches .a. followed by a period. Here is another try:

Text

sales1.xls
orders3.xls
sales2.xls
sales3.xls
apac1.xls
europe2.xls
na1.xls
na2.xls
sa1.xls

RegEx

.a..

Result (matches in [brackets])

[sale]s1.xls
orders3.xls
[sale]s2.xls
[sale]s3.xls
a[pac1].xls
europe2.xls
[na1.]xls
[na2.]xls
[sa1.]xls

Analysis
.a.. does not work any better than .a. did; appending a . will match any additional character (regardless of what
it is). How then can you search for . when . is a special character that matches any character?

MATCHING SPECIAL CHARACTERS


A . has a special meaning in regex. If you need a . in your pattern, you need a way to tell regex that you want the
actual . character and not the regex special meaning of the . character. To do this, you escape the . by preceding it
with a \ (backslash). \ is a metacharacter (a fancy way of saying a character with a special meaning, in contrast to
the character itself). Therefore, . means match any character, and \. means match the . character itself.
Let’s try the previous example again, this time escaping the . with \.:

Text

sales1.xls
orders3.xls
sales2.xls
sales3.xls
apac1.xls
europe2.xls
na1.xls
na2.xls
sa1.xls

RegEx

.a.\.

Result (matches in [brackets])

sales1.xls
orders3.xls
sales2.xls
sales3.xls
apac1.xls
europe2.xls
[na1.]xls
[na2.]xls
[sa1.]xls

Analysis
.a.\. did the trick. The first . matched n (in the first two matches) or s (in the third). The second . matched 1 (in
the first and third matches) or 2 (in the second). \. then matched the . separating the filename from the extension.
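Comparing the unescaped and escaped patterns side by side shows what the backslash buys you. The sketch assumes Python's re module; the patterns are written as raw strings (the r prefix) so that Python passes the backslash through to the regex engine unchanged.

```python
import re

filenames = [
    "sales1.xls", "orders3.xls", "sales2.xls", "sales3.xls", "apac1.xls",
    "europe2.xls", "na1.xls", "na2.xls", "sa1.xls",
]

def find_all(pattern):
    """Return every match of pattern across all filenames, in order."""
    return [m for name in filenames for m in re.findall(pattern, name)]

# Unescaped: the fourth . still matches any character.
print(find_all(r".a.."))   # ['sale', 'sale', 'sale', 'pac1', 'na1.', 'na2.', 'sa1.']

# Escaped: \. matches only a literal period.
print(find_all(r".a.\."))  # ['na1.', 'na2.', 'sa1.']
```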
The example could be further improved by including the xls in the pattern so as to prevent a filename such
as sa3.doc from being matched, like this:

Text

sales1.xls
orders3.xls
sales2.xls
sales3.xls
apac1.xls
europe2.xls
na1.xls
na2.xls
sa1.xls

RegEx

.a.\.xls

Result (matches in [brackets])

sales1.xls
orders3.xls
sales2.xls
sales3.xls
apac1.xls
europe2.xls
[na1.xls]
[na2.xls]
[sa1.xls]
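To check that adding xls really does exclude a file such as sa3.doc, you can append that name to the list and compare the two patterns. This is a sketch assuming Python's re module; sa3.doc is not in the sample data and is added here only for the comparison.

```python
import re

filenames = [
    "sales1.xls", "orders3.xls", "sales2.xls", "sales3.xls", "apac1.xls",
    "europe2.xls", "na1.xls", "na2.xls", "sa1.xls",
    "sa3.doc",  # extra file, added only to test the extension check
]

# Without the extension, sa3.doc slips through.
without_ext = [name for name in filenames if re.search(r".a.\.", name)]
print(without_ext)  # ['na1.xls', 'na2.xls', 'sa1.xls', 'sa3.doc']

# With xls in the pattern, only the spreadsheet files remain.
with_ext = [name for name in filenames if re.search(r".a.\.xls", name)]
print(with_ext)  # ['na1.xls', 'na2.xls', 'sa1.xls']
```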
In regular expressions, \ is always used to mark the beginning of a block of one or more characters that have a special
meaning. You saw \. here, and you’ll see many more examples of using \ in future lessons.
Note
The use of special characters is covered in Lesson 4, “Using Metacharacters.”
Note
In case you were wondering, to escape \ (so as to search for a backslash) use \\ (two backslashes).
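Here is a small sketch of escaping a backslash, assuming Python's re module. The Windows-style path is hypothetical. The re.escape helper is specific to Python and is shown only as a convenience for escaping literal text automatically.

```python
import re

# Hypothetical path; the raw string keeps each backslash as a literal character.
path = r"C:\reports\na1.xls"

# The pattern \\ (two backslashes) matches one literal backslash.
backslashes = re.findall(r"\\", path)
print(len(backslashes))  # 2

# re.escape adds the backslash before special characters for you.
escaped = re.escape("na1.xls")
print(escaped)  # na1\.xls
```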
Tip
. matches all characters, right? Well, maybe not. In most regular expression implementations, . matches every
character except a newline character.
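The newline exception can be observed directly. The sketch assumes Python's re module, where the re.DOTALL flag is the switch that lets . match a newline as well; other tools use different names for this option.

```python
import re

# Two filenames separated by a newline character.
text = "na1\nna2"

# By default . will not match the newline between 1 and n.
default_match = re.search(r"1.n", text)
print(default_match)  # None

# With re.DOTALL, . matches the newline too.
dotall_match = re.search(r"1.n", text, re.DOTALL)
print(repr(dotall_match.group()))  # '1\nn'
```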

SUMMARY
Regular expressions, also called patterns, are strings made up of characters. These characters may be literal (actual
text) or metacharacters (special characters with special meanings), and in this lesson you learned how to match a
single character using both literal text and metacharacters. . matches any character. \ is used to escape characters
and to start special character sequences.
