Manual: (SunOS-4.1.3)

REGEXP(3)                  Library Functions Manual                  REGEXP(3)



NAME
       regexp - regular expression compile and match routines

SYNOPSIS
       #define INIT <declarations>
       #define GETC() <getc code>
       #define PEEKC() <peekc code>
       #define UNGETC(c) <ungetc code>
       #define RETURN(pointer) <return code>
       #define ERROR(val) <error code>

       #include <regexp.h>

       char *compile(instring, expbuf, endbuf, eof)
       char *instring, *expbuf, *endbuf;
       int eof;

       int step(string, expbuf)
       char *string, *expbuf;

       extern char *loc1, *loc2, *locs;

       extern int circf, sed, nbra;

DESCRIPTION
       This  page  describes  general-purpose regular expression matching rou-
       tines.

       The interface to this file  is  unpleasantly  complex.   Programs  that
       include  this  file must have the following five macros declared before
       the `#include <regexp.h>' statement.  These macros  are  used  by  the
       compile() routine.

       GETC()              Return the value of the next character in the regu-
                           lar expression pattern.  Successive calls to GETC()
                           should  return successive characters of the regular
                           expression.

       PEEKC()             Return the next character in  the  regular  expres-
                           sion.   Successive  calls  to PEEKC() should return
                           the same character, which should also be  the  next
                           character returned by GETC().

       UNGETC(c)           Cause the argument c to be returned by the next call
                           to GETC() or PEEKC().  No more than one character of
                           pushback is ever needed, and this character is
                           guaranteed to be the last character read by GETC().
                           The value of the macro UNGETC(c) is always ignored.

       RETURN(pointer)     This  macro  is  used on normal exit of the compile
                           routine.  The value of the argument  pointer  is  a
                           pointer  to  the character after the last character
                           of the compiled regular expression.  This is useful
                           to programs that have memory allocation to manage.
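
       The contract between GETC(), PEEKC(), and UNGETC() described above can
       be sketched without including regexp.h at all.  The following is only
       an illustration under stated assumptions: the file-scope pointer sp and
       the functions contract_holds() and pattern_length() are hypothetical
       names, not part of the library.

```c
#include <stddef.h>

/* Hypothetical read position; in a real program INIT would set this up. */
static const char *sp;

#define GETC()     (*sp++)   /* consume and return the next character   */
#define PEEKC()    (*sp)     /* look at the next character, no consume  */
#define UNGETC(c)  (--sp)    /* push back the last character read       */

/* Return non-zero if the three macros behave as the manual requires:
 * repeated PEEKC() calls agree, GETC() returns the peeked character,
 * and after UNGETC() the same character is read again. */
int contract_holds(const char *instring)
{
    int a, b, c;

    sp = instring;
    a = PEEKC();
    b = PEEKC();
    c = GETC();
    UNGETC(c);
    return a == b && b == c && GETC() == c;
}

/* Count the pattern characters before the delimiter eof, treating a
 * backslash as escaping the character that follows it. */
int pattern_length(const char *instring, int eof)
{
    int n = 0;
    int c;

    sp = instring;
    while (PEEKC() != eof && PEEKC() != '\0') {
        c = GETC();
        n++;
        if (c == '\\' && PEEKC() != '\0') {
            (void) GETC();   /* the escaped character is part of the pattern */
            n++;
        }
    }
    return n;
}
```

       A pointer-based reader like this one needs only a decrement for
       UNGETC(), because the character pushed back is always the one just
       read.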

ERRORS
       ERROR(val)          This is the abnormal return from the compile() rou-
                           tine.  The argument val is an error number (see ta-
                           ble  below  for  meanings).  This call should never
                           return.

                           ERROR     MEANING
                           11        Range endpoint too large.
                           16        Bad number.
                           25        ``\ digit'' out of range.
                           36        Illegal or missing delimiter.
                           41        No remembered search string.
                           42        \( \) imbalance.
                           43        Too many \(.
                           44        More than 2 numbers given in \{ \}.
                           45        } expected after \.
                           46        First number exceeds second in \{ \}.
                           49        [] imbalance.
                           50        Regular expression too long.
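
       Because ERROR(val) should never return, one plausible way to define it
       is with a non-local jump back to the caller of compile().  The sketch
       below assumes that approach; regerr(), regerr_text(), try_compile(),
       and failing_compile() are hypothetical names, and failing_compile()
       merely stands in for a compile() call that reports an error.  The
       messages are taken from the table above.

```c
#include <setjmp.h>
#include <stddef.h>

static jmp_buf compile_env;    /* hypothetical recovery point            */
static int compile_errno;      /* last error number passed to ERROR()    */

/* Record the error number and jump back; never returns to its caller. */
static void regerr(int val)
{
    compile_errno = val;
    longjmp(compile_env, 1);
}

#define ERROR(val) regerr(val)

/* Map an error number from the table to its message. */
const char *regerr_text(int val)
{
    switch (val) {
    case 11: return "Range endpoint too large.";
    case 16: return "Bad number.";
    case 25: return "``\\ digit'' out of range.";
    case 36: return "Illegal or missing delimiter.";
    case 41: return "No remembered search string.";
    case 42: return "\\( \\) imbalance.";
    case 43: return "Too many \\(.";
    case 44: return "More than 2 numbers given in \\{ \\}.";
    case 45: return "} expected after \\.";
    case 46: return "First number exceeds second in \\{ \\}.";
    case 49: return "[] imbalance.";
    case 50: return "Regular expression too long.";
    default: return "Unknown error.";
    }
}

/* Stand-in for a compile() call that fails with the given error number. */
static void failing_compile(int val)
{
    ERROR(val);
}

/* Return 0 on success, or the error number reported through ERROR(). */
int try_compile(int val)
{
    compile_errno = 0;
    if (setjmp(compile_env) != 0)
        return compile_errno;      /* arrived here via longjmp() */
    if (val != 0)
        failing_compile(val);
    return 0;
}
```

       A simpler program such as a one-shot filter could instead print the
       message and call exit(), which also satisfies the requirement that
       ERROR() not return.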

       The syntax of the compile() routine is as follows:

                     compile(instring, expbuf, endbuf, eof)

       The first parameter instring is never used explicitly by the  compile()
       routine but is useful for programs that pass down different pointers to
       input characters.  It is sometimes used in the INIT() declaration  (see
       below).  Programs that call functions to input characters or have char-
       acters in an external array can pass down a value of ((char *)  0)  for
       this parameter.

       The  next  parameter  expbuf  is a character pointer.  It points to the
       place where the compiled regular expression will be placed.

       The parameter endbuf is one more than the  highest  address  where  the
       compiled  regular expression may be placed.  If the compiled expression
       cannot fit in (endbuf-expbuf) bytes, a call to ERROR(50) is made.

       The parameter eof is the character that marks the end  of  the  regular
       expression.  For example, in an editor like ed(1), this character would
       usually be a `/'.

       Each program that includes this file must have a #define statement  for
       INIT().  This definition will be placed right after the declaration for
       the function compile() and `{' (opening curly brace).  It is  used  for
       dependent  declarations  and initializations.  Most often it is used to
       set a register variable to point to the  beginning  of  the  regular
       expression so that this register variable can be used in the
       declarations for
       GETC(), PEEKC(), and UNGETC().  Otherwise it can  be  used  to  declare
       external variables that might be used by GETC(), PEEKC(), and UNGETC().
       See the example below of the declarations taken from grep(1V).

       There are other functions in this  file  that  perform  actual  regular
       expression  matching, one of which is the function step().  The call to
       step() is as follows:

              step(string, expbuf)

       The first parameter to step() is a pointer to a string of characters to
       be checked for a match.  This string should be null-terminated.

       The second parameter expbuf is the compiled regular expression that was
       obtained by a call to the function compile().

       The function step() returns non-zero if the given  string  matches  the
       regular expression, and zero if it does not.  If there
       is a match, two external character pointers are set as a side effect to
       the  call  to  step().   The variable set in step() is loc1.  This is a
       pointer to the first character that  matched  the  regular  expression.
       The  variable  loc2,  which is set by the function advance(), points to
       the character after the last character that matches the regular expres-
       sion.   Thus  if  the  regular expression matches the entire line, loc1
       will point to the first character of string and loc2 will point to  the
       null character at the end of string.
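
       Since loc1 points at the first matched character and loc2 one past the
       last, the matched text is the half-open span from loc1 up to loc2.  A
       minimal sketch of extracting it follows.  The helper copy_match() is a
       hypothetical name, and it takes the two pointers as parameters rather
       than reading the globals so that it stands alone; a real caller would
       pass loc1 and loc2 after step() returns non-zero.

```c
#include <stddef.h>
#include <string.h>

/* Copy the matched span [loc1, loc2) into out as a null-terminated
 * string.  Return the match length, or -1 if out is too small. */
long copy_match(const char *loc1, const char *loc2, char *out, size_t outsize)
{
    size_t len = (size_t)(loc2 - loc1);

    if (len + 1 > outsize)
        return -1;
    memcpy(out, loc1, len);
    out[len] = '\0';
    return (long)len;
}
```

       An empty match is represented by loc1 equal to loc2, for which this
       helper returns 0 and stores an empty string.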

       step()  uses  the  external variable circf which is set by compile() if
       the regular expression begins with `^'.  If this  is  set  then  step()
       will try to match the regular expression to the beginning of the string
       only.  If more than one regular expression is to be compiled before the
       first  is executed the value of circf should be saved for each compiled
       expression and circf should be set to that saved value before each call
       to step().

       The function advance() is called from step() with the same arguments as
       step().  The purpose of step() is to step through the  string  argument
       and  call advance() until advance() returns non-zero indicating a match
       or until the end of string is reached.  To  constrain  the  match  to
       the  beginning  of string in all cases, step() need not be called;
       simply call advance().
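
       The division of labor between step() and advance(), and the effect of
       circf, can be modeled with a toy matcher.  This is not the regexp.h
       source: toy_step(), toy_advance(), toy_circf, toy_loc1, and toy_loc2
       are hypothetical stand-ins, and the "compiled expression" is assumed
       to be a plain literal string.

```c
#include <string.h>

static int toy_circf;          /* stand-in for circf: match at start only */
static const char *toy_loc1;   /* stand-in for loc1: start of the match   */
static const char *toy_loc2;   /* stand-in for loc2: one past the match   */

/* Toy advance(): succeed only if lit matches at the very start of
 * string, and record where the match ends. */
static int toy_advance(const char *string, const char *lit)
{
    size_t n = strlen(lit);

    if (strncmp(string, lit, n) != 0)
        return 0;
    toy_loc2 = string + n;
    return 1;
}

/* Toy step(): try successive starting positions until toy_advance()
 * succeeds or the end of string is reached.  When toy_circf is set,
 * only the first position is tried. */
int toy_step(const char *string, const char *lit)
{
    const char *p = string;

    if (toy_circf) {
        toy_loc1 = p;
        return toy_advance(p, lit);
    }
    do {
        if (toy_advance(p, lit)) {
            toy_loc1 = p;
            return 1;
        }
    } while (*p++ != '\0');
    return 0;
}
```

       Because the anchoring flag is global state rather than part of the
       compiled expression, a program that keeps several expressions must
       restore the saved flag for the one it is about to run, as described
       above for circf.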

       When advance() encounters a * or \{ \} sequence in the regular  expres-
       sion, it will advance its pointer to the string to be matched as far as
       possible and will recursively call itself trying to match the  rest  of
       the  string to the rest of the regular expression.  As long as there is
       no match, advance() will back up along the  string  until  it  finds  a
       match  or  reaches the point in the string that initially matched the *
       or \{ \}.  It is sometimes desirable to stop this backing up before the
       initial  point  in  the  string  is reached.  If the external character
       pointer locs is equal to the point in the string at some time during the
       backing  up process, advance() will break out of the loop that backs up
       and will return zero.  This could be used by an editor  like  ed(1)  or
       sed(1V) for substitutions done globally (not just the first occurrence,
       but the whole line) so, for example, expressions like  s/y*//g  do  not
       loop forever.

       The  additional  external  variables  sed and nbra are used for special
       purposes.

EXAMPLES
       The following is an example of how the regular  expression  macros  and
       calls could look in a command like grep(1V):
              #define INIT        register char *sp = instring;
              #define GETC()      (*sp++)
              #define PEEKC()     (*sp)
              #define UNGETC(c)   (--sp)
              #define RETURN(c)   return;
              #define ERROR(c)    regerr()

              #include <regexp.h>
              ...
                      (void) compile(*argv, expbuf, &expbuf[ESIZE], '\0');
              ...
                      if (step(linebuf, expbuf))
                              succeed();

SEE ALSO
       ed(1), grep(1V), sed(1V)

BUGS
       The handling of circf is difficult.



                                21 January 1990                      REGEXP(3)