4 Filter and Regex
The cut command expects exactly one delimiter between two fields.
A run of several whitespace characters may confuse it.

Example: Try to print only file size and name

    $ ls -l gnasl
    -rw-r--r--  1 hugo staff  2894 Feb 12 14:14 gnasl
    $ ls -l | cut -d ' ' -f 5,9
    staff 12
    $ _

The awk Filter
Strictly speaking, awk is not just a filter but a programming language.
Even without knowing the language, it's still useful for some tasks.

Example: Select fields from ls -l output with awk

    $ ls -l gnasl | awk '{ print $5, $9 }'
    2894 gnasl
    $ ls -l gnasl | awk '{ print $5, "\t", $9 }'
    2894     gnasl
    $ _
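The difference between cut and awk can be tried out on a fixed line instead of live ls output. The following is a minimal sketch; the sample line and the variable names are made up for illustration, and squeezing spaces with tr -s is only one possible workaround for cut.

```shell
# Hypothetical sample line, padded the way ls -l usually pads its columns.
line='-rw-r--r--   1 hugo  staff   2894 Feb 12 14:14 gnasl'

# cut counts every single space as a delimiter, so the padding shifts the fields.
with_cut=$(printf '%s\n' "$line" | cut -d ' ' -f 5,9)

# Workaround: squeeze runs of spaces with tr -s before handing the line to cut.
with_tr_cut=$(printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f 5,9)

# awk splits on runs of whitespace by default, so no preprocessing is needed.
with_awk=$(printf '%s\n' "$line" | awk '{ print $5, $9 }')

echo "cut: '$with_cut'  tr+cut: '$with_tr_cut'  awk: '$with_awk'"
```

Only the last two variants should yield the size and the name; plain cut picks the wrong columns.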
Client/Server Technology - Part 1: The Unix Operating System - Filters & Regular Expressions
Regular Expressions
Regular expressions can be used for describing text patterns.
Example: ^g matches text lines starting with a lowercase g.
Dialects differ, depending on the tools used.
Basic Operators

These are understood by most tools supporting regular expressions:

    \          activate or deactivate an operator, example: \\ produces a backslash
    [AaBbCc]   matches one character from the set {A, a, B, b, C, c}
    [a-z]      matches one character from a range, here between a and z
    [^a-z]     matches one character that is not within the range specified here
    .          matches one character (any)
    *          matches zero to infinity occurrences of the preceding expression,
               example: " *" matches any number of space characters
    ^          matches the beginning of the current line
    $          matches the current line's end
    \< \>      match the beginning and the end of a word,
               example: \<Hugo\> matches Hugo as a whole word
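The basic operators can be tried out with grep on a few lines of text. This is a small sketch; the sample text and the variable names are invented for illustration.

```shell
# Hypothetical sample text, one candidate per line.
sample='gnasl
Gnasl
hugo was here
x  y
end.'

# ^g        : lines starting with a lowercase g
starts_with_g=$(printf '%s\n' "$sample" | grep '^g')

# [Gg]      : one character from a set; ^ and $ anchor the whole line
either_case=$(printf '%s\n' "$sample" | grep -c '^[Gg]nasl$')

# [^a-z]    : one character that is NOT a lowercase letter
not_lower=$(printf '%s\n' "$sample" | grep -c '^[^a-z]')

# " *"      : any number of spaces between x and y
spaces=$(printf '%s\n' "$sample" | grep 'x *y')

# \.        : the backslash deactivates the dot operator, leaving a literal dot
literal_dot=$(printf '%s\n' "$sample" | grep '\.$')

echo "$starts_with_g / $either_case / $not_lower / $spaces / $literal_dot"
```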
Example: ls Output

Display only symbolic links:

    $ ls -l | grep "^l"
    lrwxrwxrwx 1 hugo staff 17 Jul 26 2001 foo -> bar
    lrwxrwxrwx 1 hugo staff 17 Sep 13 2001 x -> ../y
    $ _

Example: Log File

Select only the entries from the 28th and 29th of March 2001 in the Apache log file. Here's the format from which we want to get the information:

    $ tail -1 access_log
    myhost [28/Mar/2001:16:19:07 +0200] "GET /a.html"
    $ _

This is the regular expression used for getting the entries (it also restricts the match to requests for .html files):

    $ grep "2[89]/Mar/2001.*/.*\.html" access_log
    myhost [28/Mar/2001:16:19:07 +0200] "GET /a.html"
    [...]
    myhost [29/Mar/2001:17:00:12 +0200] "GET /b.html"
    [...]
    $ _
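To see which lines the log expression accepts and rejects, it can be run against a handful of invented entries. The log lines below are hypothetical and only follow the format shown above.

```shell
# Hypothetical log entries in the format shown above.
log='myhost [27/Mar/2001:10:00:00 +0200] "GET /a.html"
myhost [28/Mar/2001:16:19:07 +0200] "GET /a.html"
myhost [29/Mar/2001:17:00:12 +0200] "GET /b.html"
myhost [29/Mar/2001:17:05:00 +0200] "GET /logo.gif"
myhost [30/Mar/2001:09:00:00 +0200] "GET /c.html"'

# 2[89]  : day 28 or 29
# \.html : a literal dot followed by html, so the .gif request is dropped
hits=$(printf '%s\n' "$log" | grep -c '2[89]/Mar/2001.*/.*\.html')

echo "matching entries: $hits"
```

Only the two .html requests from the 28th and 29th are counted.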
sed stands for Stream Editor.
It can be used to manipulate text in a data stream.
Like grep, sed can use regular expressions.
We concentrate on the substitute command here.
More than one expression can be specified using -e.
Example: Evaluate a configuration file

    $ cat config.conf
    # Configuration file
    set A b
    set B c
    $ grep -v "^ *#" config.conf | sed 's/^set *//' \
    > | sed 's/  */=/'
    A=b
    B=c
    $ grep -v "^ *#" config.conf \
    > | sed -e 's/^set *//' -e 's/  */=/'
    A=b
    B=c
    $ eval `grep -v "^ *#" config.conf \
    > | sed -e 's/^set *//' -e 's/  */=/'`
    $ echo $A
    b
    $ _
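The same idea can be written as a small self-contained script. This is a sketch under a few assumptions: the configuration is written to a temporary file created with mktemp, the variable name assignments is invented, and the file content is trusted, since eval executes whatever the pipeline produces.

```shell
# Create a throwaway copy of the sample configuration.
conf=$(mktemp)
cat > "$conf" <<'EOF'
# Configuration file
set A b
set B c
EOF

# Drop comment lines, strip the leading "set", turn the first run of
# spaces into "=" so each line becomes a shell assignment.
assignments=$(grep -v '^ *#' "$conf" | sed -e 's/^set *//' -e 's/  */=/')

# eval runs the generated assignments in the current shell.
eval "$assignments"
rm -f "$conf"

echo "A=$A B=$B"
```

Note the two spaces in 's/  */=/': with a single space the pattern would also match the empty string at the start of the line.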
Herbert Martin Dietze <herbert@the-little-red-haired-girl.org>
More sed
The substitute command can take options:
Ignore case: i, and global replace: g (replace every match on the line, not only the first). The i option is not part of POSIX sed; GNU sed supports it.
They get appended to the expression: s/foo/bar/gi
What if the source or destination pattern contains slashes?
Escape the slashes with backslashes (can be difficult if the pattern is a variable's content), or use a different separator: almost any character is allowed (anything but a backslash or a newline)!
Example: Remove double slashes in path specs

    $ echo /usr//local/bin:/home/herbert///data \
    > | sed 's|//*|/|g'
    /usr/local/bin:/home/herbert/data
    $ _
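The effect of the g option and of an alternative separator can be checked in isolation. A brief sketch follows; the variables old and new and the paths are hypothetical, and the last command only works as long as the variables contain neither the separator nor regular expression operators.

```shell
# Without g only the first match on the line is replaced ...
first_only=$(echo 'a-b-c' | sed 's/-/+/')

# ... with g every match is replaced.
all=$(echo 'a-b-c' | sed 's/-/+/g')

# A pattern taken from a variable: with "|" as separator the slashes in
# the paths do not need to be escaped.
old=/usr/local
new=/opt
moved=$(echo '/usr/local/bin:/usr/local/lib' | sed "s|$old|$new|g")

echo "$first_only $all $moved"
```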
We can also reference matches from the search pattern:
\( and \) address a subpattern in the search field.
\1 selects the first, \2 the second etc. in the replace field.
Example:

    $ echo "Hugo <hugo@hotmail.com>" \
    > | sed "s/[^<]*<\([^@]*\)@\([^>]*\)>.*/\1 at \2/"
    hugo at hotmail.com
    $ _
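Subpattern references are also handy for reordering parts of a line. As a further sketch (the date value and the variable name are made up), three groups are captured and emitted in reverse order; the "|" separator is used because the replacement contains slashes.

```shell
# \(...\) captures year, month and day; \3/\2/\1 writes them back reversed.
reordered=$(echo '2001-03-28' \
    | sed 's|\([0-9]*\)-\([0-9]*\)-\([0-9]*\)|\3/\2/\1|')

echo "$reordered"
```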
Some tools understand more than just the basic operators.
Such tools are e.g. perl and egrep.
Other tools may support them: use \ to activate!

    ?             matches none or one occurrence of the preceding pattern
    +             matches one to infinity occurrences of the preceding pattern
    {n}           matches exactly n occurrences of the preceding pattern
    {n,m}         matches n to m occurrences of the preceding pattern
    {n,}          matches at least n occurrences of the preceding pattern
    text1|text2   matches text containing either text1 or text2
    (text)        bundles text to a unit for repetition operators (*, + etc.),
                  and it can now be selected by \1, \2 etc.
Example:

    $ ls -l | egrep "hugo|harry"
    -rw-r--r-- 1 harry staff 1315 Feb 14 11:05 annab
    -rw-r--r-- 1 hugo staff 2894 Feb 12 14:14 gnasl
    $ _
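The extended operators can be exercised the same way as the basic ones. The sketch below uses grep -E, which accepts the same extended syntax as egrep; the sample words and variable names are invented.

```shell
# Hypothetical sample text, one candidate per line.
sample='color
colour
colouur
ab
abab
2001
20011'

# ?      : the u is optional
optional=$(printf '%s\n' "$sample" | grep -E -c '^colou?r$')

# (..)+  : the group "ab" repeated one or more times
repeated=$(printf '%s\n' "$sample" | grep -E -c '^(ab)+$')

# {4}    : exactly four digits
four_digits=$(printf '%s\n' "$sample" | grep -E -c '^[0-9]{4}$')

# |      : either alternative anywhere in the line
either=$(printf '%s\n' "$sample" | grep -E -c 'color|2001')

echo "$optional $repeated $four_digits $either"
```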
The perl interpreter can be used like sed.
Advantage: no escaping of extended syntax necessary!
Also: perl can work on more than one line!
Syntax: perl -pe 's/source/destination/'
Example:

    $ echo "Hugo <hugo@hotmail.com>" \
    > | sed "s/[^<]*<\([^@]*\)@\([^>]*\)>.*/\1 at \2/"
    hugo at hotmail.com
    $ echo "Hugo <hugo@hotmail.com>" | perl \
    > -pe "s/[^<]*<([^@]*)@([^>]*)>.*/\1 at \2/"
    hugo at hotmail.com

Longer Example: Generate HTML from Inline Comments

The problem:
It is always nice to keep module descriptions in one place.
So why not generate HTML from the program sources?
Convention:
    Extract only comments starting with a double hash
    Ignore other comments and program code
    Add tags for special elements (function, type, variable, ...)
Example source:

    $ cat example.sh
    #!/bin/sh
    ###############################################
    #
    ## @function hugo
    ## print a friendly message to stdout.
    ##
    ## This function print a "hello world" to
    ## stdout. Quite nice.
    #
    ###############################################
    hugo ()
    {
        echo "hello world"
    }
    #
    ## @function main program
    ##
    ## The main program calls hugo and exits.
    #
    hugo
    $ _
Step 1: Discard unwanted lines

    $ egrep "^ *##" example.sh | egrep -v "^ *###"
    ## @function hugo
    ## print a friendly message to stdout.
    ##
    ## This function print a "hello world" to
    ## stdout. Quite nice.
    ## @function main program
    ##
    ## The main program calls hugo and exits.
    $ _

Step 2: Add HTML tags and remove hashes

    $ egrep "^ *##" example.sh | egrep -v "^ *###" \
    > | perl -pe 's/^ *## *$/<p>/; s/^ *## *//'
    @function hugo
    print a friendly message to stdout.
    <p>
    This function print a "hello world" to
    stdout. Quite nice.
    @function main program
    <p>
    The main program calls hugo and exits.
    $ _
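The two-stage filter from Step 1 can also be folded into a single extended expression: exactly two hashes, followed either by a character that is not a hash or by the end of the line. The sketch below compares both variants on a shortened, hypothetical source file written to a temporary file.

```shell
# A shortened stand-in for example.sh.
src=$(mktemp)
cat > "$src" <<'EOF'
#!/bin/sh
###############################################
#
## @function hugo
## print a friendly message to stdout.
##
# an ordinary comment
hugo () { echo "hello world"; }
EOF

# Variant from Step 1: keep "##" lines, then drop "###" lines.
two_stage=$(grep '^ *##' "$src" | grep -v '^ *###')

# Single expression: "##" followed by a non-hash character or end of line.
one_stage=$(grep -E '^ *##([^#]|$)' "$src")
rm -f "$src"

printf '%s\n' "$one_stage"
```

Both variants should select the same three documentation lines.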
Step 3: Translate pseudo-tags

The @ is escaped because Perl would otherwise treat @function as an array name inside the pattern.

    $ egrep "^ *##" example.sh | egrep -v "^ *###" \
    > | perl -pe 's/^ *## *$/<p>/; s/^ *## *//' \
    > -e '; s|\@function *(.*)|<h2>Function \1</h2>|'
    <h2>Function hugo</h2>
    print a friendly message to stdout.
    <p>
    This function print a "hello world" to
    stdout. Quite nice.
    <h2>Function main program</h2>
    <p>
    The main program calls hugo and exits.
    $ _

Last Step: Make it an HTML file

    $ ( echo "<html><head><title>Program Documentation</title></head>
    > <body><h1>Program Documentation</h1>"
    > egrep "^ *##" example.sh | egrep -v "^ *###" \
    > | perl -pe 's/^ *## *$/<p>/; s/^ *## *//' \
    > -e '; s|\@function *(.*)|<h2>Function \1</h2>|'
    > echo "</body></html>")
    <html><head><title>Program Documentation</title></head>
    <body><h1>Program Documentation</h1>
    [...]
    </body></html>
    $ _
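For repeated use the steps can be wrapped in a shell function. The following is only a sketch: the function name gen_doc is invented, sed is swapped in for perl (so the group needs \( and \)), and the sample source is a shortened, hypothetical version of example.sh.

```shell
# gen_doc FILE: print an HTML page built from the "##" comments in FILE.
gen_doc () {
    echo '<html><head><title>Program Documentation</title></head>'
    echo '<body><h1>Program Documentation</h1>'
    grep '^ *##' "$1" | grep -v '^ *###' \
        | sed -e 's/^ *## *$/<p>/' \
              -e 's/^ *## *//' \
              -e 's|^@function *\(.*\)|<h2>Function \1</h2>|'
    echo '</body></html>'
}

# Try it on a shortened stand-in for example.sh.
src=$(mktemp)
cat > "$src" <<'EOF'
#!/bin/sh
###############################################
## @function hugo
## print a friendly message to stdout.
##
## Quite nice.
###############################################
hugo () { echo "hello world"; }
EOF

html=$(gen_doc "$src")
rm -f "$src"
printf '%s\n' "$html"
```

Redirecting the output of gen_doc to a file gives the finished HTML page.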