Archive for May, 2009

How to enable your Windows Service to have its name configured during installation

1 Comment »

In this posting I will show you how you can configure the name of a Windows Service as it is installed. Once we have tweaked the service installer properly, we will be able to have InstallUtil.exe install our service under a name which we supply as a command-line parameter.

Let’s say we’ve implemented a Windows Service in C#. To install the service, we’ve implemented an Installer, i.e. something like the following:

    [RunInstaller(true)]
    public class WindowsServiceInstaller : Installer
    {
        public WindowsServiceInstaller()
        {
            ServiceProcessInstaller serviceProcessInstaller =
                               new ServiceProcessInstaller();
            ServiceInstaller serviceInstaller = new ServiceInstaller();

            //# Service Account Information
            serviceProcessInstaller.Account = ServiceAccount.LocalSystem;
            serviceProcessInstaller.Username = null;
            serviceProcessInstaller.Password = null;

            //# Service Information
            serviceInstaller.StartType = ServiceStartMode.Automatic;

            serviceInstaller.DisplayName = "My Windows Service";
            serviceInstaller.ServiceName = "my_windows_service";

            this.Installers.Add(serviceProcessInstaller);
            this.Installers.Add(serviceInstaller);
        }
    }

Assuming that the service is contained in Installable.exe, we install the service using InstallUtil.exe like so:

c:\windows\Microsoft.NET\Framework\v2.0.50727\InstallUtil.exe /i c:\service1\Installable.exe

Now, what happens if we want to have two instances of this service running? We might want to have multiple services running, each associated with different databases. If we place Installable.exe and its configuration file in a new folder (say c:\service2) we might try to install this service via

c:\windows\Microsoft.NET\Framework\v2.0.50727\InstallUtil.exe /i c:\service2\Installable.exe

InstallUtil will throw an error: "System.ComponentModel.Win32Exception: The specified service already exists". InstallUtil complains because a service named my_windows_service is already installed.

So, what can we do? The obvious thing to do is to just change the lines

            serviceInstaller.DisplayName = "My Windows Service";
            serviceInstaller.ServiceName = "my_windows_service";

in WindowsServiceInstaller, recompile and run InstallUtil.exe again. While this works, it doesn’t play well with source control: what should the serviceInstaller.ServiceName property be set to in the code committed to the repository? How do you ensure that you remember to verify that this property is set correctly, before you compile and deploy the code?

What we really need is a way to supply the service name to the install procedure. Ideally, we would like to be able to do something like

c:\windows\Microsoft.NET\Framework\v2.0.50727\InstallUtil.exe /i c:\service1\Installable.exe /servicename="my_service_instance_1" /servicedisplayname="My Service Instance 1"

but InstallUtil doesn’t support these arguments. However, InstallUtil won’t bail out if it encounters unknown arguments either. This observation is going to be half the  solution. The other half is knowing that you can use System.Environment.GetCommandLineArgs() to get at the command line arguments provided for the current process.

Thus, in the installer we can access the command line arguments provided to InstallUtil, parse these arguments ourselves and set the service name accordingly.

To do this, we add the following private methods to the installer

private void SetServicePropertiesFromCommandLine(ServiceInstaller serviceInstaller)
{
	string[] commandlineArgs = Environment.GetCommandLineArgs();

	string servicename;
	string servicedisplayname;
	ParseServiceNameSwitches(commandlineArgs, out servicename, out servicedisplayname);

	serviceInstaller.ServiceName = servicename;
	serviceInstaller.DisplayName = servicedisplayname;
}

private void ParseServiceNameSwitches(string[] commandlineArgs, out string serviceName, out string serviceDisplayName)
{
	var servicenameswitch = (from s in commandlineArgs where s.StartsWith("/servicename") select s).FirstOrDefault();
	var servicedisplaynameswitch = (from s in commandlineArgs where s.StartsWith("/servicedisplayname") select s).FirstOrDefault();

	if (servicenameswitch == null)
		throw new ArgumentException("Argument 'servicename' is missing");
	if (servicedisplaynameswitch == null)
		throw new ArgumentException("Argument 'servicedisplayname' is missing");
	if (!(servicenameswitch.Contains('=') || servicenameswitch.Split('=').Length < 2))
		throw new ArgumentNullException("The /servicename switch is malformed");

	if (!(servicedisplaynameswitch.Contains('=') || servicedisplaynameswitch.Split('=').Length < 2))
		throw new ArgumentNullException("The /servicedisplaynameswitch switch is malformed");

	serviceName = servicenameswitch.Split('=')[1];
	serviceDisplayName = servicedisplaynameswitch.Split('=')[1];

	serviceName = serviceName.Trim('"');
	serviceDisplayName= serviceDisplayName.Trim('"');
}

The SetServicePropertiesFromCommandLine method retrieves the command line arguments and configures the installer to set the service’s name properties. The second method, ParseServiceNameSwitches, is just a utility method for SetServicePropertiesFromCommandLine.

We use these methods from InstallableServiceInstaller’s constructor:

public InstallableServiceInstaller()
{
	ServiceProcessInstaller serviceProcessInstaller =  new ServiceProcessInstaller();
	ServiceInstaller serviceInstaller = new ServiceInstaller();

	//# Service Account Information
	serviceProcessInstaller.Account = ServiceAccount.LocalSystem;
	serviceProcessInstaller.Username = null;
	serviceProcessInstaller.Password = null;

	SetServicePropertiesFromCommandLine(serviceInstaller);

	//# Service Information
	serviceInstaller.StartType = ServiceStartMode.Automatic;
	this.Installers.Add(serviceProcessInstaller);
	this.Installers.Add(serviceInstaller);
}

We can now install our service, providing the service names to InstallUtil as desired:

c:\windows\Microsoft.NET\Framework\v2.0.50727\InstallUtil.exe /i c:\service1\Installable.exe /servicename="my_service_instance_1" /servicedisplayname="My Service Instance 1"


How do I parse a csv file using yacc?

No Comments »

Parsing csv files… it’s tedious, it’s ugly and it’s been around forever. But if you’re working with legacy systems from before the rise of XML, chances are you will have to handle csv files on a regular basis.

Parsing csv files seems like a task perfectly suitable for a standard library or framework. However, this ostensibly easy task is apparently not quite that easy to tackle in a generic way. Despite a substantial effort, I haven’t been able to find a library that I find widely applicable and pleasant to work with. Microsoft BizTalk has some nice flat-file XSD extensions, but enrolling BizTalk on each of your projects involving csv files is hardly a palatable approach. A more light-weight approach might be the Linq2CSV project on CodeProject. This project allows you to define a class describing the entities in each line of the csv file. Once this is done, the project provides easy means for reading the data of a csv file and populating a list of objects of the class just defined. If the input doesn’t conform to the expected format, an exception will be thrown containing descriptions and linenumbers of the errors encountered.  This seems like a really nice approach and I will probably be using it on a few small projects in the near future.

However, the proper way of parsing is of course to bring out the big guns: yacc (Yet Another Compiler Compiler). As its name suggests, yacc is intended for tasks much more complex than parsing a bunch of one-liners. Yacc is a code-generation-tool for generating a parsers from a context free grammar specification. The generated parsers are of a type known as LR parsers and are well suited for implementing DSLs and compilers. Yacc even comes in a variety of flavors, including implementations for C, C#, ML and F#.

Below, I will show you how to parse a csv file in a sample format, generating a corresponding data structure. I will be using the F# yacc implementation (fsyacc) which comes with the F# installation (I’m using F# 1.9.6.2).

The parsing process

When we’re parsing text, the ultimate goal is to recognize the structure of a string of incoming characters and create a corresponding datastructure suitable for later processing. When designing compilers, the datastructure produced by the parser is referred to as an abstract syntax tree, but I will just to refer it as the datastructure.

Parsing the incoming text will fall in these steps:

The Parsing Process

The parser relies on a lexer to turn the incoming string of characters into a sequence of tokens. Tokens are characterized by their type and some may carry annotations. Thus, a token of type INT, representing an integer in the input stream, will have the actual integer value associated to it, because in most cases we will be interested in the particular value, not just the fact that some integer was present in the input. On the other hand, a token of type EOL, representing an end-of-line, will not carry any extra information.

We will not cover the details of the lexer in this posting.

The data format

The data we will be parsing will look like this:


328 15 20.1
328 13 11.1
328 16 129.2
328 19 4.3

Each line contains two integers followed by a decimal value, each separated by whitespace. This is not a traditional csv format, since the values are separated by whitespace. However, it is straightforward to adapt a lexer tokenizing the format above to a lexer procesing a more traditional format.

The dataset represents the result of measuring various health parameters of a person. The first integer of each line identifies a person. The next integer identifies a parameter and the decimal represents the value measured. Thus, if parameter 15 is BMI (body mass index), the first line above states that user 328 has a BMI of 20.1.

To parse this dataformat, we will need the following tokens:

Token Description Annotation
INT An integer The integer value
FLOAT A decimal The decimal value
EOR End of record (end of line) -
EOF End of input (end of file) -

The datastructure

To represent the data, we will use a very simple F# datastructure:

The datastructure

We define the datastructure in F# like this:

module Ast =
type Line = { userid : int; parameterid : int; value : float; }
type DataSet = DataSet of Line list

The context free grammar

For yacc to be able to generate a parser which makes sense of the data format described above, we need to provide yacc with instructions on how to convert a sequence of tokens into the components of the datastructure. We do this in the form of a context free grammar and a set of associated semantic actions. To understand these concepts, let’s have a look at the context free grammar we will actually provide to yacc:

DataSet LineList
LineList Line
  | LineList EOR Line
Line INT INT FLOAT

Each line in the grammar is traditionally called a production, because it states that the term on the left side of the arrow may be expanded into whatever is on the right side of the arrow. The terms on the left are called non-terminals, because they may be expanded into their constituents, namely the elements on the right. In the grammar above, DataSet, LineList and Line are non-terminals. On the other hand, no productions exist expanding EOR, INT or FLOAT. Thus, these elements are said to be terminals. They are the tokens which the lexer may provide.

The fourth production above states that the concept of a Line consists of two consecutive INT tokens and a FLOAT token, in that order. The second and third productions combined state that a LineList is either a Line or consists of a LineList followed by an EOR token and a Line. Thus, if two Line elements separated by an EOR token have been identified by the parser, it may consider this to be a LineList, since the first Line is a LineList by the second production while this sequence of a LineList followed by an EOR token and a Line is itself a LineList by the third production.

You should remember, that while the terms in the context free grammar are intimately related to the elements in our datastructure, these concepts are not the same. Also note that we had to introduce the recursively defined LineList element in the grammar to accomodate the concept of a list of elements.

If you’ve never encountered context free grammars before, a more thorough introduction than what I have provided may be desirable. In this case, you may want to consult Wikipedia.

Semantic actions

The datastructure is constructed in a bottom-up fashion by executing a piece of code each time a production is applied. The piece of code is called the production’s semantic action. For the

Line → INT INT FLOAT

production, we create a corresponding Ast.Line instance (cf. the "The datastructure" section above). These are the semantic actions we will need:

Production Semantic action Description
DataSet LineList DataSet($1) Create a Ast.DataSet, passing the Ast.Line list to the constructor
LineList Line [$1] Create a Ast.Line list containing a single element
  | LineList EOR Line $3 :: $1 Concatenate the Ast.Line to the list of the LineList
Line INT INT FLOAT

{

userid = $1;

parameterid = $2;

value = $3;

}

Create a new Ast.Line assigning the first integer of the line to Ast.Line.userid, the second integer to Ast.Line.parameterid and the float to Ast.Line.value

As you have probably guessed, the $x variables in the fourth semantic action refers to the values of the INT and FLOAT tokens.

When specifying the semantic action for a production to fsyacc, you enclose the appropriate piece of code in braces after the production. Thus, our parser specification will look like this:

%{
open RI.Statistics.Ast
%}

%start start
%token <System.Int32> INT
%token <System.Double> FLOAT
%token EOR
%token EOF
%type <RI.Statistics.Ast.DataSet> start

%%

start: DataSet { $1 }

DataSet: LineList { DataSet($1) }

LineList: Line { [$1] }
| LineList EOR Line { $3 :: $1 }

Line: INT INT FLOAT { { userid = $1; parameterid = $2; value = $3; } }

Generating and exercising the parser

To generate the parser, you run yacc from the command line, passing the name of the file containing the parser specification above as an argument. For fsyacc, we get:



C:\Users\rui\Projects\Statistics\Parser>fsyacc Parser.fsp --module Parser

building tables

computing first function...time: 00:00:00.1604878

building kernels...time: 00:00:00.1359233

building kernel table...time: 00:00:00.0407968

computing lookahead relations.............time: 00:00:00.0723062

building lookahead table...time: 00:00:00.0439270

building action table...time: 00:00:00.0673112

building goto table...time: 00:00:00.0099398

returning tables.

10 states

5 nonterminals

7 terminals

6 productions

#rows in action table: 10

C:\Users\rui\Projects\Statistics\Parser>

This produces two files, Parser.fs and Parser.fsi, which contain the implementation of the parser. We will include them when we compile the parser.

To test the parser, we create a console application which will parse the sample data presented earlier and print the resulting datastructure:

#light
open Lexing
open Parser

let x = @"328 15 0,0
  328 13 11,1
  328 16 129,2
  328 19 4,3"

let parse() =
  let myData =
    let lexbuf = Lexing.from_string x in
      Parser.start Lexer.Parser.tokenize lexbuf in
  myData

let data = parse()
printfn "%A" data

Compiling this with the generated parser and executing the resulting console application results in this:



C:\Users\rui\Projects\Statistics\ConsoleApplication\bin\Debug>ConsoleApplication.exe

DataSet [{userid = 328; parameterid = 19; value = 4.3;};

{userid = 328; parameterid = 16; value = 129.2;};

{userid = 328; parameterid = 13; value = 11.1;};

{userid = 328; parameterid = 15; value = 0.0;}]

C:\Users\rui\Projects\Statistics\ConsoleApplication\bin\Debug>

Presto, we have our datastructure! And with only a minimum amount of code!

 
So, is this really an appropriate approach for parsing csv files? Well, no, not quite. Even though the procedure described above is rather straightforward, there’s no error reporting facility, making it inappropriate for anything but a prototype application. Thus, for parsing csv files, the aforementioned Linq2CSV project, or something similar, will probably give us much more initial headway than the yacc approach. But the yacc approach scales very well with the complexity of the input, hence may become feasible as the complexity of the input increases.

UPDATE: It has come to my attention that Robert Pickering, in the second edition of his book on F#, intends to include a special section on parsing text files using other means than fsyacc. Thus, if you’re reading this posting with the intention of actually using the method described above for production, you may want to consult Robert Pickering’s book for alternatives.