Archive for April, 2009

A gotcha when using fslex with #light syntax


Tonight I’ve been implementing a small lexer/parser pair so that I can read data from a CSV-like file and process it in F# Interactive. I originally chose F# over C# because F# fits well with the data processing I’m doing. I expected to use a library written in C# for loading the data, but decided to use fslex and fsyacc instead, just for the fun of it. However, I came across a problem with fslex (or with my understanding of fslex), and I thought I’d share the solution.

I want all the code for loading the data to go into a module named Parser. Therefore, my Lexer.fsl begins with the following code:

{
module Parser =
  open System
  open Lexing
  open Parser
}


This section of the lex specification is traditionally called the definition section and contains initial code that I want copied verbatim into the final lexer. The code produced by fslex looks something like this:

module Parser =
  open System
  open Lexing
  open Parser

# 8 "Lexer.fs"
let trans : uint16[] array =
[|
(* State 0 *)

(* ...lots of code... *)

let rec __fslex_dummy () = __fslex_dummy()
(* Rule tokenize *)
and tokenize (lexbuf : Microsoft.FSharp.Text.Lexing.LexBuffer<_>) = __fslex_tokenize 0 lexbuf
and __fslex_tokenize __fslex_state lexbuf =
match __fslex_tables.Interpret(__fslex_state,lexbuf) with
| 0 -> (
# 14 "Lexer.fsl"
tokenize lexbuf
# 50 "Lexer.fs"
)
| 1 -> (
# 15 "Lexer.fsl"
EOR
# 55 "Lexer.fs"

(* ...more code... *)

However, when I tried to build the generated lexer, the compiler complained that “the value or constructor ‘EOR’ is not defined”. This had me stumped for a while, until I realized that the code generator wasn’t indenting the code properly. The

let rec __fslex_dummy () = ...

wasn’t indented at all, so under #light syntax it ended the Parser module. As a result, the

  open System
  open Lexing
  open Parser

statements, which were indented to be part of the Parser module, were no longer in scope. To fix this, I had to fall back to the more verbose syntax:

module Parser = begin
open System
open Lexing
open Parser

(* code code code *)

end

That satisfied the compiler.
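
The scoping rule behind the error can be reproduced without fslex at all. The following is only a minimal sketch: the Tokens module and its token type are hypothetical stand-ins for whatever fsyacc generates, and the names inside and outside are made up for illustration.

```fsharp
// Hypothetical stand-in for the token type that fsyacc would generate.
module Tokens =
    type token = EOR | EOF

module Parser =
    open Tokens
    // Indented, so still inside Parser: the open applies and EOR resolves.
    let inside = EOR

// Not indented, so this code is outside Parser and `open Tokens`
// no longer applies. Uncommenting the next line should give the same
// kind of "value or constructor 'EOR' is not defined" error.
// let broken = EOR

// A fully qualified name still works outside the module.
let outside = Tokens.EOR
```

The unindented code that fslex emits after the header is in the same position as the last two definitions here: it sits outside the module, so it cannot see anything the module opened.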

Just as you can add any valid F# code to the definition section of the lexer specification, you can add a similar section, called the user subroutines section, at the end of the file. fslex copies it to the end of the generated code. Thus, we can have fslex generate the desired code by changing the definition section of the lexer specification to

{
module Parser = begin
open System
open Lexing
open Parser
}

and adding a closing section like

{
end
}

I guess the code generator ought to avoid the problem by indenting the generated code properly, but this is just a CTP (Community Technology Preview), so hopefully it will be fixed in the future.