With Bison, the GNU version of Yacc, one is able to generate a C program that
parses a given LALR(1) context-free grammar. Often when creating the grammar and
running it through Bison, one gets warnings about shift/reduce and reduce/reduce
conflicts. Although a functional parser is created, there is a possibility that
any given text is parsed incorrectly as the parser may make incorrect
choices/operations when it comes to resolving the conflicts. If you are an
experienced user of Flex and Bison then stop reading and click on the
Back button. If you are a beginner then continue reading.
To help resolve these conflicts, one can make use of the -v or --verbose
option when invoking Bison. This results in a *.output file being
created, which can be used to find and resolve these conflicts. I have looked
for documentation that describes the output file in more detail but haven't been
able to find anything decent. In this article I am putting forward my
interpretation of the contents of the *.output file.
Just a warning!!! - The grammar found in the structgrammar.y
file is very badly written but is done so that the output file can contain all
the relevant sections, especially those dealing with conflicts.
This wasn't what I was hoping to be my first article for Code Project. I was
hoping my first article would be an introduction to one of the applications that
I have been working on and off for the past three years. But at least this way I
start writing. OK, enough procrastination.
I have been trying on and off to create a parser with the help of GNU tools
Flex (Lex) and Bison (Yacc). My attempts have all failed so far, that is until I
bought the book Lex & Yacc. Die Profitools zur lexikalischen und syntaktischen
Textanalyse by Helmut Herold.
Contents of the output file
The output file provides descriptions of the parser states and can consist of
up to 7 sections.
This section lists all of the non-terminals that have been defined but that
are not used nor referred to in any other rules. That is to say that the
non-terminal doesn't occur on the right side of any of the other rules.
This section lists the rules that are not used in the definition of any other
rules and is similar to the Useless non-terminals
#45 template : TEMPLATE;
#46 template : epsilon;
The conflicts section says where the
shift/reduce conflicts and
reduce/reduce conflicts occur in the various parser states,
State 43 contains 4 reduce/reduce conflicts.
The fourth section contains the grammar which is divided up into rules,
rule 1 class_specifier -> class_head OPENBRACE
while the corresponding line from the context-free grammar is as follows:
class_specifier : class_head OPENBRACE member_specification CLOSEBRACE
The fifth section provides a list of all terminals that have been defined in
the grammar. As can be seen in the example below, the name of each terminal is
followed by two numbers, one that is enclosed in brackets and one that is not.
STRUCT (258) 9
The first number, 258, refers to the number generated by the Bison program
and is used in the C definition found in the structgrammar.tab.c
#define STRUCT = 258
while the second number, 9, is the number of the rule where the terminal is
rule 9 class_key -> STRUCT
The sixth section provides a list of all the non-terminals that have been
defined in the grammar.
on left: 3 4 5 6 7 8, on right: 1 2
Here again there are a few numbers that are displayed. I think the number
enclosed in brackets directly after the non-terminal is the index into the
yytname array found in the structgrammar.tab.c file. The
numbers found after
on left indicate the rules where the
non-terminal occurs on the left side of the rule, while the numbers found after
on right indicate the rules where the non-terminal occurs on the
right side of the rule.
rule 3 class_head -> class_key
rule 4 class_head -> class_key IDENTIFIER
rule 5 class_head -> class_key base_clause
rule 6 class_head -> class_key IDENTIFIER base_clause
rule 7 class_head -> class_key nested_name_specifier IDENTIFIER
rule 8 class_head -> class_key nested_name_specifier
rule 1 class_specifier -> class_head OPENBRACE
rule 2 class_specifier -> class_head
The seventh section contains details about each parser state. The
structgrammar.output file contains a total of 219 states. A lot of
information is contained in each state description. First, the rules that the
parser has worked through to get to the current state are listed. Some rules
listed haven't been parsed totally. How far the parser has worked through each
rule is indicated by the full stop (
.). The rest of the state
description shows the states to which the parser will proceed or which
operations will occur, depending on the value of the lookahead token. Conflicts
are demarcated by square brackets.
class_name -> IDENTIFIER . (rule 17)
template_id -> IDENTIFIER . LESSTHAN
template_argument_list GREATERTHAN (rule 59)
type_name -> IDENTIFIER . (rule 92)
LESSTHAN shift, and go to state 76
COMMA reduce using rule 17 (class_name)
COMMA [reduce using rule 92 (type_name)]
OPENBRACKET reduce using rule 17 (class_name)
OPENBRACKET [reduce using rule 92 (type_name)]
SEMICOLON reduce using rule 17 (class_name)
SEMICOLON [reduce using rule 92 (type_name)]
OPERATOR reduce using rule 17 (class_name)
OPERATOR [reduce using rule 92 (type_name)]
$default reduce using rule 92 (type_name)
As an example, state 43 (the conflicts section
above, refers to state 43 having 4
reduce/reduce conflicts) is
used. To reach state 43 the token
IDENTIFIER has been parsed and is
situated on top of the stack, as is seen in all three rules by the position of
the full stop after the
IDENTIFIER token. If the lookahead token is
LESSTHAN then a
shift operation occurs and the parser
goes to state 76. If the lookahead token is
COMMA then a
reduce operation will occur, and it is here that a
reduce/reduce conflict arises. The
token can be reduced either by rule 17 or rule 92 and this is where the conflict
arises, the parser doesn't know which rule to use in the reduction. The
following is a guess but I think that the line beginning with $default indicates
the action taken when the lookahead token doesn't match any of the given tokens.
An example of a
shift/reduction conflict as it appears in the state
description is as follows:
IDENTIFIER shift, and go to state 70
IDENTIFIER [reduce using rule 14 (nested_name_specifier)]
If the lookahead token is
IDENTIFIER then either a
reduce operation can occur. The conflict here
again is that the parser doesn't know which operation to perform.
I will gladly appreciate any comments on this article.
- October 2003 - First public release.