LEX(1)                      General Commands Manual                     LEX(1)



NAME
       lex - lexical analysis program generator

SYNOPSIS
       lex [ -fntv ] [ filename ] ...

DESCRIPTION
       lex  generates  programs to be used in simple lexical analysis of text.
       Each filename (the standard input by default) contains regular  expres-
       sions  to  search  for,  and  actions  written in C to be executed when
       expressions are found.

       A C source program, lex.yy.c, is generated, to be compiled as follows:

              cc lex.yy.c -ll

       This program, when run, copies unrecognized portions of  the  input  to
       the  output,  and  executes  the  associated  C action for each regular
       expression that is recognized.  The actual string matched  is  left  in
       yytext, an external character array.

       When more than one expression matches, the longest  match  is  chosen;
       among  matches of equal length, the rule given first in the file wins.
       The strings may contain square brackets to indicate character classes,
       as  in  [abx-z]  to indicate a, b, x, y, and z; and the operators *, +
       and ?, which mean, respectively, zero or more, one or more,  and  zero
       or  one  occurrences  of  the preceding character or character class.
       The "dot" character (`.') is the class of all ASCII characters  except
       NEWLINE.
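
       As a rough illustration of the classes and operators above,  a  small
       lex  source file might look like the sketch below.  The rules and the
       text they print are invented for this example, and writing the  minus
       sign and the dot in quotes is an assumption about the most portable way
       to make them literal.

```lex
%%
[abx-z]+        printf("<%s: only a, b, x, y, z>", yytext);
"-"?[0-9]+      printf("<%s: integer, optional sign>", yytext);
[0-9]*"."[0-9]+ printf("<%s: digits optional before the dot>", yytext);
.               printf("<%s: any other character>", yytext);
```

       None of these rules matches NEWLINE, so it is  copied  to  the  output
       unchanged.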

       Parentheses for grouping and  vertical  bar  for  alternation  are  also
       supported.   The  notation r{d,e} in a rule indicates between d and e
       instances of regular expression r.  It has a higher precedence than |,
       but  lower  than  that  of  *,  ?, +, or concatenation.  The ^ (caret
       character) at the beginning of an  expression  permits  a  successful
       match  only immediately after a NEWLINE, and the $ character at the end
       of an expression requires a trailing NEWLINE.

       The  /  character in an expression indicates trailing context; only the
       part of the expression up to the slash is returned in yytext,  although
       the remainder of the expression must follow in the input stream.
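
       The anchors, the repetition count, and trailing context can  be  tried
       out  with  a sketch such as the following.  The patterns and messages
       are hypothetical; "px" is only a stand-in for whatever text must follow
       the match.

```lex
%%
^"#".*          ;       /* '#' at the start of a line: drop the rest of the line */
(ab|cd)+        printf("<pairs: %s>", yytext);
z{2,4}          printf("<2 to 4 z: %s>", yytext);
[0-9]+/"px"     printf("<pixels: %s>", yytext);
[ ]+$           ;       /* blanks just before NEWLINE are discarded */
```

       In the fourth rule only the digits are placed in yytext; the "px" that
       follows is left in the input and is scanned again by later matches.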

       An operator character may be used as an ordinary symbol if it is within
       `"' symbols or preceded by `\'.

       Three subroutines defined as macros are expected:  input()  to  read  a
       character;  unput(c)  to  push  back  a  character  that was read; and
       output(c) to write an output character.  They are defined in terms  of
       the  standard streams, but you can override them.  The macros input and
       output read from and write to the files yyin and yyout, which  default
       to  stdin  and  stdout,  respectively.  The generated function is named
       yylex(), and the library contains a main() which calls it.

       The action REJECT on the right side of a rule rejects  this  match  and
       executes  the  next  suitable match; the function yymore() accumulates
       additional characters into the same yytext; and the function  yyless(n)
       returns  to the input all but the first n characters of yytext, where n
       is the number of characters to retain.
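
       A minimal sketch of REJECT and yyless() follows.  The  rules  are  made
       up  for  illustration, and yyleng (the length of yytext) is a standard
       lex variable that this page does not otherwise describe.

```lex
%%
she             { printf("[she]"); REJECT; }
he              printf("[he]");
[0-9]+"px"      { yyless(yyleng - 2); printf("[number %s]", yytext); }
.|\n            ;
```

       On the input she, the first rule runs and then rejects its  match,  so
       the  scanner  falls back to shorter alternatives and the "he" inside the
       word is still seen by the second rule.  In the third  rule,  yyless()
       keeps  the  digits  in yytext and returns the two characters "px" to the
       input.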

       In a lex program, any line beginning with a blank is assumed to contain
       only  C  text  and  is  copied; if it precedes %% it is copied into the
       external definition area of the lex.yy.c file.  All rules should follow
       a %%, as in YACC.  Lines preceding %% which begin with a nonblank char-
       acter define the string on the left to be the remainder of the line; it
       can  be  used later by surrounding it with {}.  Note: curly brackets do
       not imply parentheses; only string substitution is done.
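
       The layout of the definitions section might be sketched as follows.  The
       names  words,  LETTER  and VOWEL are hypothetical.  The code is shown
       starting in the first column because a leading blank is significant.

```lex
        int words = 0;  /* leading blank: copied as C into lex.yy.c */
LETTER  [A-Za-z]
VOWEL   (a|e|i|o|u)
%%
{VOWEL}+        printf("<vowels: %s>", yytext);
{LETTER}+       { words++; printf("<word %d: %s>", words, yytext); }
```

       The parentheses in the definition of VOWEL  matter:  under  the  plain
       string  substitution  described above, an unparenthesized a|e|i|o|u fol-
       lowed by + would apply the + to u alone.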

       The external names generated by lex all begin with the prefix yy or YY.

       Certain table sizes for the resulting finite-state machine can  be  set
       in the definitions section:

              %p n   number of positions is n (default 2000)

              %n n   number of states is n (500)

              %t n   number of parse tree nodes is n (1000)

              %a n   number of transitions is n (3000)

       The  use  of  one  or  more  of  the above automatically implies the -v
       option, unless the -n option is used.

OPTIONS
       -f     Faster compilation. Do not bother to pack the resulting  tables;
              limited to small programs.

       -n     Opposite of -v; -n is default.

       -t     Place  the  result  on  the  standard  output instead of in file
              lex.yy.c.

       -v     Print a one-line summary of statistics  of  the  generated  ana-
              lyzer.

EXAMPLES
       The following command line:
              lex lexcommands

       would  draw  lex  instructions from the file lexcommands, and place the
       output in lex.yy.c.

       The following:

              %%
              [A-Z]   putchar(yytext[0]+'a'-'A');
              [ ]+$   ;
              [ ]+    putchar(' ');

       is  an  example  of  a  lex  program.  It converts upper case to lower,
       removes blanks at the end of lines, and  replaces  multiple  blanks  by
       single blanks.

              D       [0-9]
              %%
              if      printf("IF statement\n");
              [a-z]+  printf("tag, value %s\n",yytext);
              0{D}+   printf("octal number %s\n",yytext);
              {D}+    printf("decimal number %s\n",yytext);
              "++"    printf("unary op\n");
              "+"     printf("binary op\n");
              "/*"    {       loop:
                              while (input() != '*');
                              switch (input())
                                      {
                                      case '/': break;
                                      case '*': unput('*');  /* fall through */
                                      default: goto loop;
                                      }
                              }

FILES
       lex.yy.c

SEE ALSO
       sed(1V), yacc(1)

NOTES
       The  lex  command has not been changed to support 8-bit symbol names, as
       this would produce lex source code that is not portable between systems.



                                1 December 1988                         LEX(1)