 regcomp(3C)							 regcomp(3C)




 NAME
      regcomp(), regerror(), regexec(), regfree() - regular expression
      matching routines

 SYNOPSIS
      #include <regex.h>

      int regcomp(regex_t *preg, const char *pattern, int cflags);

      int regexec(
	   const regex_t *preg,
	   const char *string,
	   size_t nmatch,
	   regmatch_t pmatch[],
	   int eflags
      );

      void regfree(regex_t *preg);

      size_t regerror(
	   int errcode,
	   const regex_t *preg,
	   char *errbuf,
	   size_t errbuf_size
      );

 DESCRIPTION
      These functions interpret regular expressions as described in
      regexp(5).  They support both basic and extended regular expressions.

      The structures regex_t and regmatch_t are defined in the header
      <regex.h>.

      The regex_t structure contains at least the following member (use of
      other members results in non-portable code):

	   size_t re_nsub	    Number of parenthesized subexpressions.

      The regmatch_t structure contains at least the following members:

	   regoff_t rm_so	    Byte offset from start of string to
				    start of substring.

	   regoff_t rm_eo	    Byte offset from start of string to the
				    first character after the end of the
				    substring.

      regcomp() compiles the regular expression specified by the pattern
      argument and places the results in the structure pointed to by preg.
      The cflags argument is the bitwise inclusive OR of zero or more of
      the following flags (defined in <regex.h>):



 Hewlett-Packard Company	    - 1 -   HP-UX Release 11i: November 2000






	   REG_EXTENDED	     Use extended regular expressions.

	   REG_NOSUB	     Report only success or failure in regexec();
			     substring offsets are not recorded in pmatch.

	   REG_NEWLINE	     If REG_NEWLINE is not set in cflags, a newline
			     character in pattern or string is treated as an
			     ordinary character.  If REG_NEWLINE is set,
			     newlines are treated as ordinary characters
			     except as follows:

				  1.  A newline in string is not matched by
				      a period outside of a bracket
				      expression or by any form of a
				      nonmatching list.

				  2.  A circumflex (^) in pattern, when used
				      to specify expression anchoring,
				      matches the zero-length string
				      immediately after a newline in string,
				      regardless of the setting of
				      REG_NOTBOL.

				  3.  A dollar-sign ($) in pattern, when
				      used to specify expression anchoring,
				      matches the zero-length string
				      immediately before a newline in
				      string, regardless of the setting of
				      REG_NOTEOL.

	   REG_ICASE	     Ignore case in match.  If a character in
			     pattern is defined in the current LC_CTYPE
			     locale as having one or more opposite-case
			     counterparts, both the character and any
			     counterparts match the pattern character.
			     This applies to all portions of the pattern,
			     including a string of characters specified to
			     be matched via a back-reference expression
			     (\n).

			     Within bracket expressions: Collation ranges,
			     character classes, and equivalence classes are
			     effectively expanded into equivalent lists of
			     collating elements and characters.
			     Opposite-case counterparts are then generated
			     for each collating element or character to form
			     the complete matching list or non-matching list
			     for the bracket expression.  Opposite-case
			     counterparts for a multi-character collating
			     element include all possible combinations of
			     opposite-case counterparts for each individual
			     character comprising the collating element.
			     These are then combined to form new valid
			     multi-character collating elements.  For
			     example, the opposite-case counterparts for
			     [.ch.] could be [.Ch.], [.cH.], and [.CH.].
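
      The effect of REG_NEWLINE and REG_ICASE can be observed with a small
      helper.  The following is only a sketch: try_match() is a
      hypothetical name, not part of <regex.h>, and it always compiles
      pattern as a basic regular expression with REG_NOSUB added.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile pattern with the given cflags, run it once
   against string, and return 1 for a match, 0 for no match, or -1 if
   compilation or matching failed. */
int try_match(const char *pattern, const char *string, int cflags)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, string, 0, NULL, 0);
    regfree(&re);
    if (rc == 0)
        return 1;
    return (rc == REG_NOMATCH) ? 0 : -1;
}
```

      With the string "a\nb", the pattern "a.b" is expected to match only
      when REG_NEWLINE is not set, and the pattern "^b" only when it is
      set.  The pattern "hello" is expected to match "HeLLo" only when
      REG_ICASE is set.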

      The default regular expression type for pattern is Basic Regular
      Expression.  The application can specify Extended Regular Expressions
      by using the REG_EXTENDED cflags value.
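
      The difference between the two expression types shows up in
      characters such as the plus sign, which is an ordinary character in
      a basic regular expression and a repetition operator in an extended
      one.  The sketch below illustrates this; matches_as() is a
      hypothetical helper name chosen for this example.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if string matches pattern when compiled
   with cflags (0 for basic, REG_EXTENDED for extended), 0 if it does not
   match, and -1 on any other failure. */
int matches_as(const char *pattern, const char *string, int cflags)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, string, 0, NULL, 0);
    regfree(&re);
    if (rc == 0)
        return 1;
    return (rc == REG_NOMATCH) ? 0 : -1;
}
```

      Compiled with REG_EXTENDED, the pattern "^a+$" is expected to match
      "aaa" but not "a+".  Compiled as a basic regular expression, the
      same pattern is expected to match the literal string "a+" but not
      "aaa".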

      If the function regcomp() succeeds, it returns zero; otherwise it
      returns a non-zero value indicating the error.

      If regcomp() succeeds, and if the REG_NOSUB flag was not set in
      cflags, regcomp() sets re_nsub to the number of parenthesized
      subexpressions (delimited by \( and \) in basic regular expressions or
      ( and ) in extended regular expressions) found in pattern.
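
      A caller can read re_nsub after a successful regcomp() to decide how
      large a pmatch array to pass to regexec().  A minimal sketch follows;
      count_subexpressions() is a hypothetical helper name.

```c
#include <regex.h>

/* Hypothetical helper: return the number of parenthesized subexpressions
   that regcomp() reports for pattern, or -1 if compilation fails. */
long count_subexpressions(const char *pattern, int cflags)
{
    regex_t re;
    long n;

    if (regcomp(&re, pattern, cflags) != 0)
        return -1;
    n = (long)re.re_nsub;       /* valid because REG_NOSUB was not set */
    regfree(&re);
    return n;
}
```

      An array of re_nsub + 1 elements holds every subexpression plus the
      entry for the whole match in pmatch[0].  Note that unescaped
      parentheses are ordinary characters in a basic regular expression and
      therefore do not count.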

      regexec() matches the null-terminated string specified by string
      against the compiled regular expression preg initialized by a previous
      call to regcomp().  If it finds a match, regexec() returns zero;
      otherwise it returns non-zero indicating either no match or an error.
      The eflags argument is the bitwise inclusive OR of zero or more of
      the following flags:

	   REG_NOTBOL	     The first character of the string pointed to by
			     string is not the beginning of the line.
			     Therefore, the circumflex character (^), when
			     taken as a special character, does not match
			     the beginning of string.

	   REG_NOTEOL	     The last character of the string pointed to by
			     string is not the end of the line.  Therefore,
			     the dollar sign ($), when taken as a special
			     character, does not match the end of string.
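
      The eflags values matter when string is a fragment of a longer line.
      The sketch below shows how the anchors respond; exec_with_flags() is
      a hypothetical helper name.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile pattern as a basic regular expression and
   run it against string with the given eflags.  Returns 1 for a match,
   0 for no match, and -1 on any other failure. */
int exec_with_flags(const char *pattern, const char *string, int eflags)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, string, 0, NULL, eflags);
    regfree(&re);
    if (rc == 0)
        return 1;
    return (rc == REG_NOMATCH) ? 0 : -1;
}
```

      For the string "abc", the pattern "^abc" is expected to stop
      matching once REG_NOTBOL is passed, and "abc$" once REG_NOTEOL is
      passed.  An unanchored pattern such as "abc" is unaffected by either
      flag.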

      If nmatch is not zero, and REG_NOSUB was not set in the cflags
      argument to regcomp(), then regexec() fills in the pmatch array with
      byte offsets to the substrings of string that correspond to the
      parenthesized subexpressions of pattern: pmatch[i].rm_so is the byte
      offset of the beginning and pmatch[i].rm_eo is the byte offset one
      byte past the end of the substring i.  (Subexpression i begins at the
      ith matched left parenthesis, counting from 1.)  Offsets in pmatch[0]
      identify the substring that corresponds to the entire regular
      expression.  Unused elements of pmatch are set to -1.  If there are
      more than nmatch subexpressions in pattern (pattern itself counts as a
      subexpression), regexec() still does the match, but only records the
      first nmatch substrings.
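
      Because rm_eo is the offset one byte past the end of the substring,
      rm_eo - rm_so is its length in bytes.  The sketch below copies one
      reported substring into a caller-supplied buffer.  The name
      extract_group() and the limit MAX_GROUPS are assumptions made for
      this example only.

```c
#include <regex.h>
#include <string.h>

#define MAX_GROUPS 10   /* arbitrary limit chosen for this sketch */

/* Hypothetical helper: match string against the extended regular
   expression pattern and copy the text of subexpression `group`
   (0 = whole match) into out.  Returns the substring length, or -1 on a
   compile error, no match, a group that did not participate, or a buffer
   that is too small. */
int extract_group(const char *pattern, const char *string,
                  size_t group, char *out, size_t outsz)
{
    regex_t re;
    regmatch_t pm[MAX_GROUPS];
    int len = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (group <= re.re_nsub && group < MAX_GROUPS &&
        regexec(&re, string, MAX_GROUPS, pm, 0) == 0 &&
        pm[group].rm_so != -1) {
        /* rm_eo is one past the last byte, so the difference is the length */
        size_t n = (size_t)(pm[group].rm_eo - pm[group].rm_so);
        if (n < outsz) {
            memcpy(out, string + pm[group].rm_so, n);
            out[n] = '\0';
            len = (int)n;
        }
    }
    regfree(&re);
    return len;
}
```

      For the pattern "([a-z]+)@([a-z.]+)" and the string
      "mail bob@example.com now", group 0 is expected to yield
      "bob@example.com", group 1 "bob", and group 2 "example.com".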

      When matching a regular expression, any given parenthesized
      subexpression of pattern might participate in the match of several
      different substrings of string, or it might not match any substring,
      even though the pattern as a whole did match.  The following explains
      which substrings are reported in pmatch when matching regular
      expressions:




	   1.	If subexpression i in a regular expression is not contained
		within another subexpression, and it participated in the
		match several times, the byte offsets in pmatch[i] delimit
		the last such match.

	   2.	If subexpression i is not contained within another
		subexpression, and it did not participate in an otherwise
		successful match (because either *, ?, or | was used), then
		the byte offsets in pmatch[i] are -1.

	   3.	If subexpression i is contained in subexpression j, and a
		match of subexpression j is reported in pmatch[j], the match
		or no-match reported in pmatch[i] is the last one that
		occurred within the substring in pmatch[j].

	   4.	If subexpression i is contained in subexpression j, and the
		offsets in pmatch[j] are -1, the offsets in pmatch[i] will
		also be -1.

	   5.	If subexpression i matched a zero-length string, both
		offsets in pmatch[i] refer to the character immediately
		following the zero-length substring.
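
      Rules 1 and 2 above can be observed directly.  The following sketch
      returns the offsets reported for one subexpression; group_offsets()
      is a hypothetical helper name, and the array size of 4 is an
      arbitrary choice that suits the short patterns discussed here.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: run the extended regular expression pattern against
   string and store the offsets reported for subexpression `group` in *so
   and *eo.  Returns 0 on a match and non-zero otherwise. */
int group_offsets(const char *pattern, const char *string,
                  size_t group, long *so, long *eo)
{
    regex_t re;
    regmatch_t pm[4];           /* enough for the patterns used here */
    int rc;

    if (group >= 4)
        return -1;
    rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc != 0)
        return rc;
    rc = regexec(&re, string, 4, pm, 0);
    if (rc == 0) {
        *so = (long)pm[group].rm_so;
        *eo = (long)pm[group].rm_eo;
    }
    regfree(&re);
    return rc;
}
```

      For the pattern "(a|b)+" and the string "ab", subexpression 1
      participates twice and the offsets are expected to delimit the last
      participation, "b" (rule 1).  For "(x)?y" and the string "y",
      subexpression 1 does not participate and both offsets are expected to
      be -1 (rule 2).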

      If REG_NOSUB was set in cflags in the call to regcomp(), and nmatch is
      not zero in the call to regexec(), the content of the pmatch array is
      unspecified.

      regfree() frees any memory allocated by regcomp() associated with
      preg.

      If the preg argument to regexec() or regfree() is not a compiled
      regular expression returned by regcomp(), the result is undefined.  A
      preg can no longer be treated as a compiled regular expression after
      it is given to regfree().

      regerror() provides a mapping from error codes returned by regcomp()
      and regexec() to printable strings.  regerror() generates a string
      corresponding to the value of the errcode parameter, which was the
      last non-zero value returned by regcomp() or regexec() with the given
      value of preg.  The errcode parameter can take on any of the error
      values defined in <regex.h>.  If errbuf_size is not zero, regerror()
      copies an appropriate error message into the buffer specified by
      errbuf.  If the error message (including the terminating null) cannot
      fit in the buffer, it is truncated to errbuf_size - 1 bytes and null
      terminated.

      If errbuf_size is zero, the errbuf parameter is ignored, but the
      return value is as defined below.

      regerror() returns the size of the buffer (including terminating null)
      that is required to hold the entire error message.
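
      Because the return value is the full size required, a first call with
      errbuf_size set to zero can be used to size a buffer exactly.  The
      sketch below does this; regex_strerror() is a hypothetical helper
      name, not a function provided by <regex.h>.

```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: return a malloc()ed copy of the message for errcode,
   sized by a first regerror() call that passes errbuf_size 0.  The caller
   frees the result.  Returns NULL if memory cannot be allocated. */
char *regex_strerror(int errcode, const regex_t *preg)
{
    size_t need = regerror(errcode, preg, NULL, 0);  /* includes the null */
    char *msg = malloc(need);

    if (msg != NULL)
        (void)regerror(errcode, preg, msg, need);
    return msg;
}
```

      A fixed-size buffer remains the simpler choice when a truncated
      message is acceptable.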



 EXTERNAL INFLUENCES
    Locale
      The LC_COLLATE category determines the collating sequence used in
      compiling and executing regular expressions.

      The LC_CTYPE category determines the interpretation of text as single
      and/or multi-byte characters, the characters matched by character-
      class expressions in regular expressions, and the opposite-case
      counterpart for each character.

    International Code Set Support
      Single- and multi-byte character code sets are supported.

 RETURN VALUE
      regcomp() returns zero for success and non-zero for an invalid
      expression or other failure.  regexec() returns zero if it finds a
      match and non-zero for no match or other failure.

 ERRORS
      If regcomp() or regexec() detects one of the error conditions listed
      below, it returns the corresponding non-zero error code.	The error
      codes are defined in the header <regex.h>.

	   REG_BADBR	       The contents within the pair \{ (backslash
			       left brace) and \} (backslash right brace)
			       are unusable: not a number, number too large,
			       more than two numbers, or first number larger
			       than second.

	   REG_BADPAT	       An invalid regular expression.

	   REG_BADRPT	       The ? (question mark), * (asterisk), or +
			       (plus sign) symbol is not preceded by a
			       valid regular expression.

	   REG_EBRACE	       The use of a pair of \{ (backslash left
			       brace) and \} (backslash right brace) or {}
			       (braces) is unbalanced.

	   REG_EBRACK	       The use of [] (brackets) is unbalanced.

	   REG_EBOL	       The ^ (caret) anchor is used, but not at the
			       beginning of a line.

	   REG_ECHAR	       There is an invalid multibyte character.

	   REG_ECOLLATE	       There is an unusable collating element
			       referenced.

	   REG_ECTYPE	       There is an unusable character class type
			       referenced.



	   REG_EEOL	       The $ (dollar) anchor is used, but not at the
			       end of a line.

	   REG_EESCAPE	       There is a trailing \ in the pattern.

	   REG_EPAREN	       The use of a pair of \( (backslash left
			       parenthesis) and \) (backslash right
			       parenthesis) or () is unbalanced.

	   REG_ERANGE	       There is an unusable endpoint in the range
			       expression.

	   REG_ESPACE	       There is insufficient memory space.

	   REG_ESUBREG	       The number in \digit is invalid or in error.

	   REG_NOMATCH	       The regexec() function failed to match.
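
      When debugging, it can help to log the symbolic name of a return
      value alongside the regerror() text.  The sketch below maps a subset
      of the codes listed above to their names.  regex_code_name() is a
      hypothetical helper; REG_EBOL, REG_ECHAR, and REG_EEOL are left out
      on the assumption that other systems may not define them.

```c
#include <regex.h>

/* Hypothetical helper: return the symbolic name of a regcomp()/regexec()
   return value, or "unknown" for a value not handled here. */
const char *regex_code_name(int code)
{
    switch (code) {
    case 0:             return "OK";
    case REG_NOMATCH:   return "REG_NOMATCH";
    case REG_BADBR:     return "REG_BADBR";
    case REG_BADPAT:    return "REG_BADPAT";
    case REG_BADRPT:    return "REG_BADRPT";
    case REG_EBRACE:    return "REG_EBRACE";
    case REG_EBRACK:    return "REG_EBRACK";
    case REG_ECOLLATE:  return "REG_ECOLLATE";
    case REG_ECTYPE:    return "REG_ECTYPE";
    case REG_EESCAPE:   return "REG_EESCAPE";
    case REG_EPAREN:    return "REG_EPAREN";
    case REG_ERANGE:    return "REG_ERANGE";
    case REG_ESPACE:    return "REG_ESPACE";
    case REG_ESUBREG:   return "REG_ESUBREG";
    default:            return "unknown";
    }
}
```

      Which code a given malformed pattern produces can differ between
      implementations, so code that branches on a return value should
      normally distinguish only zero, REG_NOMATCH, and everything else.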

 EXAMPLES
	   #include <stdio.h>
	   #include <regex.h>

	   /* Match string against the extended regular expression in pattern,
	      treating errors as no match.  Return 1 for match, 0 for no match.
	      Print an error message if an error occurs. */

	   int
	   match(const char *string, const char *pattern)
	   {
	       int i;
	       regex_t re;
	       char buf[256];

	       i = regcomp(&re, pattern, REG_EXTENDED|REG_NOSUB);
	       if (i != 0) {
		   (void)regerror(i, &re, buf, sizeof buf);
		   printf("%s\n", buf);
		   return(0);			    /* report error */
	       }
	       i = regexec(&re, string, (size_t) 0, NULL, 0);
	       if (i != 0 && i != REG_NOMATCH) {
		   /* call regerror() before regfree() invalidates re */
		   (void)regerror(i, &re, buf, sizeof buf);
		   printf("%s\n", buf);		    /* report error */
	       }
	       regfree(&re);
	       return(i == 0);
	   }

      The following demonstrates how the REG_NOTBOL flag could be used with
      regexec() to find all substrings in a line that match a pattern
      supplied by a user.



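	   char *p = buffer;		     /* current search position */
	   error = regcomp(&re, pattern, 0);
	   /* look for first match at start of line */
	   if (error == 0)
	       error = regexec(&re, p, 1, &pm, 0);
	   while (error == 0) {		     /* while matches found */
	       if (pm.rm_eo > 0)
		   p += pm.rm_eo;	     /* offsets are relative to p */
	       else if (*p++ == '\0')
		   break;		     /* empty match at end of line */
	       /* find next match on line */
	       error = regexec(&re, p, 1, &pm, REG_NOTBOL);
	   }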
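
      The fragment above leaves out declarations and cleanup.  A
      self-contained variant that counts the matches might look like the
      following sketch.  The name count_matches() is hypothetical.  Each
      set of offsets returned by regexec() is relative to the address
      passed in that call, so the search position is advanced
      cumulatively, and an empty match advances it by one byte so that the
      loop always terminates.

```c
#include <regex.h>

/* Hypothetical helper: count the non-overlapping matches of the basic
   regular expression pattern in line.  Returns -1 if pattern fails to
   compile. */
int count_matches(const char *pattern, const char *line)
{
    regex_t re;
    regmatch_t pm;
    const char *p = line;
    int eflags = 0;             /* first search starts at beginning of line */
    int count = 0;

    if (regcomp(&re, pattern, 0) != 0)
        return -1;
    while (regexec(&re, p, 1, &pm, eflags) == 0) {
        count++;
        if (pm.rm_eo > 0)
            p += pm.rm_eo;      /* offsets are relative to p, not line */
        else if (*p == '\0')
            break;              /* empty match at end of line */
        else
            p++;                /* step past an empty match */
        eflags = REG_NOTBOL;    /* p no longer points at the line start */
    }
    regfree(&re);
    return count;
}
```

      For example, count_matches("ab", "ab ab ab") is expected to return
      3, while count_matches("^ab", "abab") is expected to return 1
      because REG_NOTBOL keeps the circumflex from matching at the second
      search position.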

 AUTHOR
      regcomp(), regerror(), regexec(), and regfree() were developed by OSF
      and HP.

 SEE ALSO
      regexp(5).

 STANDARDS CONFORMANCE
      regcomp(): XPG4, POSIX.2

      regerror(): XPG4, POSIX.2

      regexec(): XPG4, POSIX.2

      regfree(): XPG4, POSIX.2






























