Facio


FSharpYacc

FSharpYacc is a tool for generating parsers for context-free grammars (CFGs) described by a parser specification file (*.fsy).

The parser specification files used by FSharpYacc and the parsers it generates are largely compatible with the older fsyacc tool from the F# PowerPack. It follows a similar specification to the OCamlYacc parser generator (especially when used with the ml compatibility switch)

Compatibility Notes

  • After switching from fsyacc to FSharpYacc, you may find that a parser specification which works correctly with fsyacc does not work correctly with FSharpYacc. If you encounter this problem, it is likely that your parser will return a result which is not "complete" -- i.e., the parser did not parse the entire contents of the input file. The cause of this seems to be that fsyacc contains a bug where it sometimes does not honor a %left declaration for a token, meaning that some conflicts on that token may be solved by shifting (the equivalent of a %right declaration). The problem can be remedied by changing the %left declarations in question to %right declarations.

    TODO: Include instructions for diagnosing which declarations need to be modified.

  • FSharpYacc handles multi-way conflicts differently than fsyacc. A multi-way conflict occurs when an LR parser table contains multiple REDUCE actions (and possibly, a SHIFT action) for a single LR parser state; this type of conflict often occurs when compiling a parser specification with an empty production rule for one or more nonterminals.

    The current version (as of 04-Jun-2013) of FSharpYacc simply discards all of the REDUCE actions except for the one with the lowest production rule number (i.e., the one which occurs closest to the top of the specification file). This stategy is "good enough" for now, but is not optimal so it will be refined in a future version.

    fsyacc handles multi-way conflicts on-the-fly while constructing an LR parser table. When it adds an action to an LR parser state which already contains a simple conflict (S/R or R/R), and the added action is not equal to either of the actions involved in the simple conflict, the two conflicting actions are discarded, leaving only the new action. The result of this is that the precedence and associativity declarations are not always applied to resolve conflicts, so the parser may behave unexpectedly.

Getting Started

Install the Facio nuget package.

The FSharpYacc can be installed from NuGet:
PM> Install-Package Facio -Pre

Sample input

Parser generators typically produce numbers represented by values in an F# Union Type. For example:

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
type Expr =
 | Val of string
 | Int of int
 | Float of float
 | Decr of Sxpr


type Stmt =
 | Assign of string * Sxpr
 | While of Expr * Stmt
 | Seq of Stmt list
 | IfThen of Expr * Stmt
 | IfThenElse of Expr * Stmt * Stmt
 | Print of Expr


type Prog = Prog of Stmt list

Given that, a typical parser specification is as follows:

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
18: 
19: 
20: 
21: 
22: 
23: 
24: 
25: 
26: 
27: 
28: 
29: 
30: 
31: 
32: 
33: 
34: 
35: 
36: 
37: 
%{
open Ast
%}

%start start
%token <string> ID
%token <System.Int32> INT
%token <System.Double> FLOAT
%token DECR LPAREN RPAREN WHILE DO END BEGIN IF THEN ELSE PRINT SEMI ASSIGN EOF
%type < Ast.Prog > start


%%


start: Prog {  $1 }


Prog: StmtList { Prog(List.rev($1)) }


Expr: ID { Val($1) }
    | INT {  Int($1)  }
    | FLOAT {  Float($1)  }
    | DECR LPAREN Expr RPAREN {  Decr($3)  }


Stmt: ID ASSIGN Expr { Assign($1,$3) }
    | WHILE Expr DO Stmt { While($2,$4) }
    | BEGIN StmtList END { Seq(List.rev($2)) }
    | IF Expr THEN Stmt { IfThen($2,$4) }
    | IF Expr THEN Stmt ELSE Stmt { IfThenElse($2,$4,$6) }
    | PRINT Expr { Print($2) }


StmtList: Stmt { [$1] }
       | StmtList SEMI Stmt { $3 :: $1  }

The above generates a datatype for tokens and a function for each start production. Parsers are typically combined with a lexer generated using FSharpLex.

MSBuild support

The nuget package includes MSBuild support for FSharpLex and FSharpYacc. You need to verify that Facio.targets is referenced from your project file like this (adjust the NuGet package number if needed):

1: 
2: 
<import project="..\packages\Facio.0.8.7-alpha\build\Facio.targets"
        condition="Exists('..\packages\Facio.0.8.7-alpha\build\Facio.targets')" />

You must also add FSharpLex and FSharpYacc entries like this:

1: 
2: 
3: 
4: 
5: 
6: 
<FSharpLex include="Lexer.fsl">
    <unicode>true</unicode>
</FSharpLex>
<FSharpYacc include="Parser.fsy">
    <modulename>Parser</modulename>
</FSharpYacc>

Command line options

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
18: 
19: 
20: 
21: 
22: 
23: 
24: 
25: 
26: 
27: 
28: 
29: 
FSharpYacc <filename>

    -o <string> Name the output file.

    -v: Produce a listing file.

    --module <string> Define the F# module name to host the generated parser.

    --internal: Generate an internal module

    --open <string> Add the given module to the list of those to open in both the generated signature and implementation.

    --light: (ignored)

    --light-off: Add #light "off" to the top of the generated file

    --ml-compatibility: Support the use of the global state from the 'Parsing' module in FSharp.PowerPack.dll.

    --tokens: Simply tokenize the specification file itself.

    --lexlib <string> Specify the namespace for the implementation of the parser table interperter (default Microsoft.FSharp.Text.Parsing)

    --parslib <string> Specify the namespace for the implementation of the parser table interperter (default Microsoft.FSharp.Text.Parsing)

    --codepage <int> Assume input lexer specification file is encoded with the given codepage.

    --help: display this list of options

    -help: display this list of options

Managing and using position markers

Each action in an FsharpYacc parser has access to a parseState value through which you can access position information.

1: 
2: 
3: 
4: 
5: 
6: 
7: 
type IParseState =
    abstract InputStartPosition: int -> Position
    abstract InputEndPosition: int -> Position
    abstract InputRange: int -> Position * Position
    abstract ParserLocalStore: IDictionary<string,obj>
    abstract ResultRange  : Position * Position
    abstract RaiseError<'b> : unit -> 'b

Input relate to the indexes of the items on the right hand side of the current production, the Result relates to the entire range covered by the production. You shouldn't use GetData directly - these is called automatically by $1, $2 etc. You can call RaiseError if you like.

You must set the initial position when you create the lexbuf:

1: 
2: 
3: 
4: 
5: 
let setInitialPos (lexbuf:lexbuf) filename =
     lexbuf.EndPos <- { pos_bol = 0;
                        pos_fname=filename;
                        pos_cnum=0;
                        pos_lnum=1 }

You must also update the position recorded in the lex buffer each time you process what you consider to be a new line:

1: 
2: 
let newline (lexbuf:lexbuf) =
    lexbuf.EndPos <- lexbuf.EndPos.AsNewLinePos()

Likewise if your language includes the ability to mark source code locations, see custom essay (e.g. the #line directive in OCaml and F#) then you must similarly adjust the lexbuf.EndPos according to the information your grok from your input.

Notes on OCaml Compatibility

OCamlYacc accepts the following:

1: 
%type < context -> context > toplevel

For FSharpYacc you just add parentheses:

1: 
%type < (context -> context)) > toplevel
type Expr =
  | Val of string
  | Int of int
  | Float of float
  | Decr of obj

Full name: yaccindex.Expr
union case Expr.Val: string -> Expr
Multiple items
val string : value:'T -> string

Full name: Microsoft.FSharp.Core.Operators.string

--------------------
type string = System.String

Full name: Microsoft.FSharp.Core.string
union case Expr.Int: int -> Expr
Multiple items
val int : value:'T -> int (requires member op_Explicit)

Full name: Microsoft.FSharp.Core.Operators.int

--------------------
type int = int32

Full name: Microsoft.FSharp.Core.int

--------------------
type int<'Measure> = int

Full name: Microsoft.FSharp.Core.int<_>
union case Expr.Float: float -> Expr
Multiple items
val float : value:'T -> float (requires member op_Explicit)

Full name: Microsoft.FSharp.Core.Operators.float

--------------------
type float = System.Double

Full name: Microsoft.FSharp.Core.float

--------------------
type float<'Measure> = float

Full name: Microsoft.FSharp.Core.float<_>
union case Expr.Decr: obj -> Expr
type Stmt =
  | Assign of string * obj
  | While of Expr * Stmt
  | Seq of Stmt list
  | IfThen of Expr * Stmt
  | IfThenElse of Expr * Stmt * Stmt
  | Print of Expr

Full name: yaccindex.Stmt
union case Stmt.Assign: string * obj -> Stmt
union case Stmt.While: Expr * Stmt -> Stmt
Multiple items
union case Stmt.Seq: Stmt list -> Stmt

--------------------
module Seq

from Microsoft.FSharp.Collections
type 'T list = List<'T>

Full name: Microsoft.FSharp.Collections.list<_>
union case Stmt.IfThen: Expr * Stmt -> Stmt
union case Stmt.IfThenElse: Expr * Stmt * Stmt -> Stmt
union case Stmt.Print: Expr -> Stmt
Multiple items
union case Prog.Prog: Stmt list -> Prog

--------------------
type Prog = | Prog of Stmt list

Full name: yaccindex.Prog
namespace System
type Int32 =
  struct
    member CompareTo : value:obj -> int + 1 overload
    member Equals : obj:obj -> bool + 1 overload
    member GetHashCode : unit -> int
    member GetTypeCode : unit -> TypeCode
    member ToString : unit -> string + 3 overloads
    static val MaxValue : int
    static val MinValue : int
    static member Parse : s:string -> int + 3 overloads
    static member TryParse : s:string * result:int -> bool + 1 overload
  end

Full name: System.Int32
type Double =
  struct
    member CompareTo : value:obj -> int + 1 overload
    member Equals : obj:obj -> bool + 1 overload
    member GetHashCode : unit -> int
    member GetTypeCode : unit -> TypeCode
    member ToString : unit -> string + 3 overloads
    static val MinValue : float
    static val MaxValue : float
    static val Epsilon : float
    static val NegativeInfinity : float
    static val PositiveInfinity : float
    ...
  end

Full name: System.Double
Multiple items
module List

from Microsoft.FSharp.Collections

--------------------
type List<'T> =
  | ( [] )
  | ( :: ) of Head: 'T * Tail: 'T list
  interface IEnumerable
  interface IEnumerable<'T>
  member Head : 'T
  member IsEmpty : bool
  member Item : index:int -> 'T with get
  member Length : int
  member Tail : 'T list
  static member Cons : head:'T * tail:'T list -> 'T list
  static member Empty : 'T list

Full name: Microsoft.FSharp.Collections.List<_>
val rev : list:'T list -> 'T list

Full name: Microsoft.FSharp.Collections.List.rev
namespace Microsoft
namespace Microsoft.FSharp
type obj = System.Object

Full name: Microsoft.FSharp.Core.obj
type unit = Unit

Full name: Microsoft.FSharp.Core.unit
Fork me on GitHub