


Regular expressions

Regular expressions can be used in cfengine in connection with editfiles and processes to search for lines matching certain expressions. A regular expression is a generalized wildcard. In cfengine wildcards, '*' matches any number of characters and '?' matches any single character. Regular expressions are more complicated than wildcards, but far more flexible.

NOTE: the special characters * and ? used in wildcards do not have the same meanings in regular expressions!
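To make the difference concrete, here is a small sketch in Python. It assumes that cfengine wildcards behave like shell-style patterns and uses Python's `fnmatch` module as a stand-in for them, with Python's `re` module standing in for cfengine's regular expression library; neither is what cfengine itself calls. The same pattern text, `cf*.con?`, is tried both ways.

```python
import fnmatch
import re

PATTERN = "cf*.con?"
TEXT = "cfengine.conf"

# As a wildcard: '*' = any run of characters, '?' = exactly one character.
as_wildcard = fnmatch.fnmatchcase(TEXT, PATTERN)

# As a regular expression: 'f*' = zero or more f's, '.' = any one
# character, 'n?' = an optional 'n'. Here the whole string must match.
as_regex = re.fullmatch(PATTERN, TEXT) is not None

# fnmatch can also translate a wildcard into an equivalent regex string;
# the exact text of the result varies between Python versions.
translated = fnmatch.translate(PATTERN)
as_translated = re.match(translated, TEXT) is not None

print(as_wildcard, as_regex, as_translated)  # True False True
```

The wildcard matches the file name, while the identical text read as a regular expression does not: to get the wildcard's meaning, `*` has to become `.*`, `?` has to become `.`, and the literal dot has to be escaped.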

Some regular expressions match only a single string. For example, every string which contains no special characters is a regular expression which matches only a string identical to itself. Thus the regular expression cfengine would match only the string "cfengine", not "Cfengine" or "cfengin". Other regular expressions can match more general strings. For instance, the regular expression c* matches any number of c's (including none). Thus this expression would match the empty string, "c", "cccc" or "ccccccccc", but not "cccx". These statements assume that the entire string must match; a command that looks for a match in only part of a string would also accept "cccx", because c* can match its leading c's (or even the empty string).
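The distinction between matching the whole string and matching part of it is easy to check. This sketch again uses Python's `re` module as a stand-in for the regular expression library cfengine uses: `re.fullmatch` plays the role of a command that requires the entire string to match, and `re.search` that of a command that accepts a match anywhere in the string. The helper names `matches_whole` and `matches_part` are hypothetical.

```python
import re

def matches_whole(pattern: str, text: str) -> bool:
    """True if the regular expression matches the entire string."""
    return re.fullmatch(pattern, text) is not None

def matches_part(pattern: str, text: str) -> bool:
    """True if the regular expression matches anywhere in the string."""
    return re.search(pattern, text) is not None

# A pattern with no special characters matches only an identical string;
# matching is case-sensitive.
for s in ["cfengine", "Cfengine", "cfengin"]:
    print(f"{s!r:12} whole={matches_whole('cfengine', s)}")

# c* matches any number of c's, including none.
for s in ["", "c", "cccc", "cccx"]:
    print(f"{s!r:12} whole={matches_whole('c*', s)} part={matches_part('c*', s)}")
```

Only "cfengine" matches the first pattern. For c*, "cccx" fails the whole-string test but passes the partial test.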

Here is a list of regular expression special characters and operators.


\
The backslash character normally has a special purpose: either to introduce a special operator, or to tell the expression interpreter that the next character is not to be treated as a special character. The backslash character stands for itself only when protected by square brackets, [\], or quoted with a backslash itself, \\.
\b
Match a word boundary (the empty string at the beginning or end of a word).
\B
Match the empty string within a word, i.e. anywhere that is not a word boundary.
\<
Match beginning of word.
\>
Match end of word.
\w
Match a character which can be part of a word.
\W
Match a character which cannot be part of a word.
any other character
An ordinary character matches itself.
.
Matches any single character.
*
Match zero or more instances of the previous object, e.g. c*. If no object precedes it, it represents a literal asterisk.
+
Match one or more instances of the preceding object.
?
Match zero or one instance of the preceding object.
{ }
Number of matches operator. {5} would match exactly 5 instances of the previous object. {6,} would match at least 6 instances of the previous object. {7,12} would match at least 7, but no more than 12, instances of the preceding object. The first number must not be greater than the second for the expression to be valid.
|
The logical OR operator: it combines two regular expressions so that a match of either one succeeds, e.g. cat|dog matches "cat" or "dog".
[list]
Defines a list of characters which are to be considered as a single object (ORed). For example, [a-z] matches any character in the range a to z, and [abcd] matches either a, b, c or d. Most characters are ordinary inside a list, but there are some exceptions: ] ends the list unless it is the first item, \ quotes the next character, [: and :] define a character class operator (see below), and - represents a range of characters unless it is the first or last character in the list.
[^list]
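Defines a list of characters which are NOT to be matched, i.e. match any character except those in the list.
[:class:]
Defines a class of characters, using the ctype library. A class can only be used inside a list, e.g. [[:digit:]] matches one decimal digit and [^[:space:]] matches any character that is not white space.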

alnum
An alphanumeric character
alpha
An alphabetic character
blank
A space or a TAB
cntrl
A control character
digit
A decimal digit, 0-9
graph
Same as print, but without the space character
lower
A lower case letter
print
A printable character (not a control character)
punct
A punctuation character: printable, but neither a space nor alphanumeric
space
Space, horizontal tab, carriage return, line feed, vertical tab and form feed
upper
An upper case letter
xdigit
A hexadecimal digit: 0-9, a-f or A-F

( )
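Groups together any number of operators so that they are treated as a single object, e.g. (ab)* matches any number of repetitions of ab.
\digit
Back-reference operator: matches the same text that was matched by the digit-th parenthesized group, so (ab)\1 matches "abab" (refer to the GNU regex documentation).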
^
Match start of a line.
$
Match the end of a line.
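The table-driven sketch below exercises most of the operators listed above. It uses Python's `re` module, which behaves like the GNU regex library for these particular patterns but is not the library cfengine uses. Some differences are worth knowing: Python has no `\<` and `\>` (`\b` is the nearest equivalent), does not understand `[:class:]` names inside lists, and treats a leading `*` as an error rather than a literal asterisk. Each pattern is searched for anywhere in the text, so `^` and `$` are added where the whole string should match.

```python
import re

# (pattern, text, should re.search find a match?)
EXAMPLES = [
    (r"\.", "a.b", True),              # backslash: '.' is literal here
    (r"\.", "ab", False),
    (r"a.c", "abc", True),             # '.' matches any one character
    (r"^ab*c$", "ac", True),           # '*' zero or more b's
    (r"^ab+c$", "abbbc", True),        # '+' one or more b's
    (r"^ab+c$", "ac", False),
    (r"^colou?r$", "color", True),     # '?' zero or one u
    (r"^[0-9]{3}$", "123", True),      # {3}: exactly three digits
    (r"^[0-9]{3}$", "1234", False),
    (r"^x{2,}$", "x", False),          # {2,}: at least two x's
    (r"^x{2,4}$", "xxxx", True),       # {2,4}: two to four x's
    (r"^(cat|dog)$", "dog", True),     # '|' alternation inside a group
    (r"^[a-c]+$", "cab", True),        # range inside a list
    (r"^[^#]", "# comment", False),    # negated list
    (r"\bcf\b", "run cf now", True),   # \b: word boundary
    (r"\bcf\b", "cfengine", False),
    (r"\Bngi\B", "cfengine", True),    # \B: inside a word
    (r"^\w+$", "cf_agent2", True),     # \w: word characters only
    (r"\W", "cf_agent2", False),       # \W: no non-word character
    (r"^(ab)\1$", "abab", True),       # \1: repeat of group 1
    (r"^(ab)\1$", "abba", False),
]

for pattern, text, expected in EXAMPLES:
    found = re.search(pattern, text) is not None
    assert found == expected, (pattern, text)
    print(f"{pattern:12} {text!r:13} {found}")
```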
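Because Python's `re` module cannot express `[:class:]` directly, the following sketch instead spells out, as Python sets built from the standard `string` module, which ASCII characters each ctype class contains in the C ("POSIX") locale. It is only meant to make the class list concrete; in other locales the ctype library may put additional characters in these classes. The `CLASSES` table and the `matches_class_run` helper are hypothetical, not part of cfengine.

```python
import string

ASCII = {chr(i) for i in range(128)}

# Members of each ctype class, restricted to 7-bit ASCII (C locale).
CLASSES = {
    "alnum": set(string.ascii_letters + string.digits),
    "alpha": set(string.ascii_letters),
    "blank": {" ", "\t"},
    "cntrl": {c for c in ASCII if ord(c) < 32 or ord(c) == 127},
    "digit": set(string.digits),
    "graph": {c for c in ASCII if 33 <= ord(c) <= 126},
    "lower": set(string.ascii_lowercase),
    "print": {c for c in ASCII if 32 <= ord(c) <= 126},
    "punct": set(string.punctuation),
    "space": set(string.whitespace),   # ' ', \t, \n, \r, \v, \f
    "upper": set(string.ascii_uppercase),
    "xdigit": set(string.hexdigits),   # 0-9, a-f, A-F
}

# Relationships between the classes, as described in the list.
assert CLASSES["graph"] == CLASSES["print"] - {" "}
assert CLASSES["punct"] == CLASSES["graph"] - CLASSES["alnum"]
assert CLASSES["print"] == ASCII - CLASSES["cntrl"]

def matches_class_run(name: str, text: str) -> bool:
    """Hypothetical helper: behaves like ^[[:name:]]+$ on ASCII text."""
    return text != "" and all(c in CLASSES[name] for c in text)

for name in sorted(CLASSES):
    print(f"{name:7} {len(CLASSES[name]):3} characters")
```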

Here are a few examples. Remember that some commands look for a regular expression match in part of a string, while others require a match of the entire string (see the Reference manual).

^#        match a string beginning with the # symbol
^[^#]     match a string not beginning with the # symbol
^[A-Z].+  match a string beginning with an uppercase letter
          followed by at least one other character
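The three examples can be tried against a few sample lines. As before, Python's `re` module stands in for cfengine's regex library: `re.search` models a command that looks for a match in part of the string, and `re.fullmatch` one that needs the entire string to match. The sample lines and the `hits` helper are made up for illustration.

```python
import re
from typing import List

PATTERNS = {
    "starts with #": r"^#",
    "does not start with #": r"^[^#]",
    "uppercase + 1 more": r"^[A-Z].+",
}

LINES = ["# a comment", "host = server1", "Cfengine", "A", ""]

def hits(line: str) -> List[str]:
    """Names of the patterns that match somewhere in the line."""
    return [name for name, pat in PATTERNS.items() if re.search(pat, line)]

for line in LINES:
    print(f"{line!r:16} {hits(line)}")

# Under whole-string matching, ^# alone no longer matches a comment line;
# the rest of the line has to be covered too, e.g. by ^#.*
print(re.fullmatch(r"^#", "# a comment") is not None)    # False
print(re.fullmatch(r"^#.*", "# a comment") is not None)  # True
```

Note that the empty line matches none of the patterns, since ^[^#] needs at least one character that is not #, and "A" fails ^[A-Z].+ because .+ needs at least one more character.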