Archive for the ‘.NET’ Category

Unit testing against Windows Azure Development Storage

When developing an application targeted at Windows Azure, your code will execute in an environment to which you have somewhat limited access, making debugging hard. Thus, you will probably want to test your application as much as possible before deploying it. Enter unit testing.

To be able to execute unit tests against the Azure storage (well, such tests are perhaps more like integration tests), you need to have the Azure Development Storage running when the tests are executed. To achieve this, you will need to do two things:

  1. Make sure the development storage is available
  2. Make sure the development storage is running before testing begins

The first task is taken care of by installing the Windows Azure SDK. If you are developing Windows Azure applications, you will probably already have this installed on your computer. However, if you plan to run your tests on a dedicated build server, you may need to install the SDK on that machine.

For the second task, you need to start the Development Storage prior to executing tests. To start the Development Storage you can use the CSRun utility from the Azure SDK. If you have installed the SDK to the default location, you will find csrun.exe in C:\Program Files\Windows Azure SDK\v1.0\bin.

If you want to integrate this task with your build scripts, you can of course create a NAnt target for it:

<property name="azure.sdk.csrun.exe" value="C:\Program Files\Windows Azure SDK\v1.0\bin\csrun.exe"/>

<target name="azure.devstorage.start" description="Starts the Azure Development Storage">
    <exec program="${azure.sdk.csrun.exe}"
        commandline="/devstore:start">
    </exec>
    <sleep seconds="5"/><!-- Sleep 5 seconds, waiting for the development storage to start -->
</target>

As you can see, I have added a delay in order to allow the storage to fully start before subsequent tasks are executed.

Now just call this target before executing your unit tests. You may also want to shut down the Development Storage upon completion of your tests; use csrun /devstore:shutdown to do this.
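If you drive your build from a script rather than NAnt, the same start/wait/shutdown sequence can be sketched in Python as well. This is a hypothetical helper, not part of the SDK; the csrun.exe path assumes the default install location:

```python
import subprocess
import time

# Default install location of the Azure SDK v1.0 tools (adjust as needed).
CSRUN = r"C:\Program Files\Windows Azure SDK\v1.0\bin\csrun.exe"

def start_dev_storage(csrun=CSRUN, wait_seconds=5):
    """Start the Development Storage, then wait for it to come up."""
    code = subprocess.call([csrun, "/devstore:start"])
    time.sleep(wait_seconds)  # same purpose as the <sleep> in the NAnt target
    return code

def stop_dev_storage(csrun=CSRUN):
    """Shut the Development Storage down again when the tests are done."""
    return subprocess.call([csrun, "/devstore:shutdown"])
```

Call start_dev_storage() before the test run and stop_dev_storage() afterwards, just as with the NAnt target.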

Note that if your project is the first project to use Development Storage on the build server, you will want to initialize the database underlying Development Storage before running tests, using the DSInit utility from the SDK. Initializing the Development Storage will create a database called something like DevelopmentStorageDb20090919 in the local SQL Server (Express) instance.

You just need to do this once. However, if you feel like it, you can force DSInit to recreate the database as part of your test procedure by issuing the command dsinit.exe /forceCreate. DSInit requires administrator privileges, though.


IDisposable, finalizers and garbage collection

Proper resource management in .NET requires that you understand a few basic concepts, which I will discuss in this posting.

The main resource consumed by your .NET program is memory, and this consumption comes in two flavours: memory consumed by the stack and memory consumed by the heap. The stack is where local variables of value types are stored, while instances of reference types are allocated on the heap.

In addition to memory your application will probably claim other system resources as well, such as file handles.

Since the available amount of any resource is constrained, its usage has to be managed. This means not only that we need to create or claim resources when we want to use them, but also that we need to destroy or release them when we are done. This last point, to which many textbooks do not dedicate enough attention IMO, is the tricky bit.

Because .NET is a garbage collected environment, you usually do not have to think about managing memory: memory used by local variables is managed automatically as the stack grows and shrinks during program execution. Memory allocated on the heap is managed by the .NET framework’s garbage collector. This is a component of the .NET runtime which will periodically freeze program execution, take a look at the memory allocated on the heap, compare it with the set of references currently used by your program, and release memory to which no references exist.

.NET offers (at least) two means to help you manage resources not handled by the garbage collector: the IDisposable interface and object finalizers.

Finalizers

Every .NET object has a Finalize() method. This method is called by the .NET runtime when the garbage collector collects the object. The Finalize() method should be overridden in any class which needs to release unmanaged resources prior to its destruction. However, in C# you cannot directly implement the Finalize method. That is, the compiler will not allow this:

protected override void Finalize()
{
    //Do clean up, releasing resources etc.
}

Instead you must implement a destructor. A destructor is implemented by creating a method whose name is constructed by prefixing the type’s name with a tilde (~). Thus, to create a destructor for the class MyClass, you would write this:

~MyClass()
{
    //Do clean up, releasing resources etc.
}

The C# compiler will transform this into

protected override void Finalize()
{
    try
    {
        //Do clean up, releasing resources etc.
    }
    finally
    {
        base.Finalize();
    }
}

Thus, the point of forcing you to use a destructor instead of overriding Finalize() directly is to help you remember to call base.Finalize(). Notice that it is the responsibility of the .NET runtime to call your destructor. In fact, you are not allowed to call the destructor yourself.

IDisposable

You can’t predict when the garbage collector runs (unless you explicitly force it to run). Thus, the destructor of an object may run long after your application is done with the object. For objects using scarce resources this is not desirable behaviour: you will usually want to perform clean up as soon as you are done with the object. The IDisposable interface is a means to this end.

The IDisposable interface contains a single method:

// Summary:
//     Defines a method to release allocated resources.
[ComVisible(true)]
public interface IDisposable
{
    // Summary:
    //     Performs application-defined tasks associated with freeing, releasing, or
    //     resetting unmanaged resources.
    void Dispose();
}

In this method you are encouraged to put the code which should run to release the resources the current object is holding. By implementing IDisposable you are signaling to users of your class that it uses unmanaged resources and that they should call Dispose() as soon as they are done with an instance of the class to release these resources. Moreover, implementing IDisposable does give you some benefits:

When your class implements IDisposable, the C# compiler will let you create an object of the given type in a using statement and will generate code that calls Dispose() when control leaves the statement:

using (MyDisposableClass m = new MyDisposableClass())
{
    //yada yada yada
}//m.Dispose() will be called here.

By calling Dispose implicitly like this, you make sure that you do not forget to call Dispose() or accidentally remove the call when reorganizing the code.

Now, how do destructors and Dispose() relate? Obviously you don't want to duplicate clean-up code in your class, so where should you put it: in the destructor or in Dispose()? Since Dispose() is the more aggressive approach and you aren't allowed to call a destructor directly, you should put your clean-up code in Dispose() and call Dispose() from your destructor.

Be aware that putting a call to Dispose() inside your destructor means that your Dispose() method may be called multiple times: first when control leaves a using statement in which the object is being used, and then again when the garbage collector collects the object and the destructor is called. Thus, your Dispose() method should not blow up if it is called multiple times.

Also, Microsoft specifically states that you should not make any assumptions about the order in which objects are collected or have their Finalize methods called. Thus, when your clean-up code runs because of a call to Finalize(), you should not reference other managed objects, since these may already have been finalized.

Suppressing Finalize

Executing a Finalize method can be costly and it actually delays the reclaiming of the memory used by the object. Thus, if appropriate clean up has already been performed for an instance through a call to Dispose(), it is desirable that the destructor is not run when the instance is collected by the garbage collector. To achieve this, your Dispose method should call the GC.SuppressFinalize() method, passing in the instance as a parameter.

That’s it. I hope this has cleared up some of the confusion usually surrounding IDisposable, finalizers and garbage collection.

As you can see, there are a number of rules you should adhere to if you want to manage resources effectively. To help you arrive at a correct resource management scheme, Microsoft provides a reference implementation here: http://msdn.microsoft.com/en-us/library/system.gc.suppressfinalize.aspx. Notice how the Dispose(bool disposing) overload is used to ensure that only unmanaged resources are accessed when Finalize causes a dispose.
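The shape of this pattern is not unique to C#. As a rough analogy (a sketch, not the .NET reference implementation), here it is in Python, where __exit__ plays the role of Dispose() at the end of a using block and __del__ plays the role of the finalizer; note that dispose() is idempotent, as argued above:

```python
class Resource:
    """Illustrates the dispose/finalize discipline discussed above."""

    def __init__(self):
        self._disposed = False

    def dispose(self):
        # May be invoked twice (once via __exit__, once via __del__),
        # so it must tolerate repeated calls.
        if self._disposed:
            return
        self._disposed = True
        # ... release scarce resources here ...

    def __enter__(self):        # entering a 'with' block ~ 'using (...)'
        return self

    def __exit__(self, *exc):   # leaving the block ~ the implicit Dispose()
        self.dispose()

    def __del__(self):          # ~ the finalizer safety net
        self.dispose()
```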


Creating an Outlook add-in to write new messages to disk

In this post I will describe how to create an add-in for Outlook which will cause Outlook to write new messages to a folder on disk.

I’ve created the add-in using Visual Studio 2010, since I couldn’t get the Visual Studio Tools for Office (VSTO) to work for my VS 2008 installation.

After firing up VS 2010 choose Visual C# –> Office –> 2007 –> Outlook 2007 Add-in:

Visual Studio will create a class called ThisAddIn. You will initially only see a partial class in the file ThisAddIn.cs:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Linq;
using Outlook = Microsoft.Office.Interop.Outlook;
using Office = Microsoft.Office.Core;

namespace NewItemsPersister
{
    public partial class ThisAddIn
    {
        private void ThisAddIn_Startup(object sender, System.EventArgs e)
        {
        }

        private void ThisAddIn_Shutdown(object sender, System.EventArgs e)
        {
        }

        #region VSTO generated code

        /// <summary>
        /// Required method for Designer support - do not modify
        /// the contents of this method with the code editor.
        /// </summary>
        private void InternalStartup()
        {
            this.Startup += new System.EventHandler(ThisAddIn_Startup);
            this.Shutdown += new System.EventHandler(ThisAddIn_Shutdown);
        }

        #endregion
    }
}

Visual Studio will also have created a file called ThisAddIn.Designer.cs which contains the bulk of the class. You will not need to bother with this file, but it is nice to know that it is there.

Obviously, the ThisAddIn_Startup method will execute when the add-in is loaded while the ThisAddIn_Shutdown method will execute when the add-in is unloaded.

To execute code when Outlook downloads a new message, we will need to register an appropriate event handler. The ThisAddIn class has an Application property (auto-generated by the Visual Studio template) which you can use to access an event firing whenever a new mail is received.

Thus, we update the class to contain the following code:

public partial class ThisAddIn
{
    private ApplicationEvents_11_NewMailExEventHandler handler;

    private void ThisAddIn_Startup(object sender, System.EventArgs e)
    {
        this.Application.NewMailEx += handler;
    }

    void Application_NewMailEx(string EntryIDCollection){ }

    private void ThisAddIn_Shutdown(object sender, System.EventArgs e)
    {
        this.Application.NewMailEx -= handler;
    }

    #region VSTO generated code

    /// <summary>
    /// Required method for Designer support - do not modify
    /// the contents of this method with the code editor.
    /// </summary>
    private void InternalStartup()
    {
        this.Startup += new System.EventHandler(ThisAddIn_Startup);
        this.Shutdown += new System.EventHandler(ThisAddIn_Shutdown);

        handler = new ApplicationEvents_11_NewMailExEventHandler(Application_NewMailEx);
    }

    #endregion
}

The ApplicationEvents_11_NewMailExEventHandler delegate (found in the Microsoft.Office.Interop.Outlook namespace) will be passed a string containing a comma-separated list of identifiers for the received mails. Once you have such an identifier, you can get a hold of the corresponding mail like this:

MailItem item = (MailItem)Application.Session.GetItemFromID(entryId, null);

To save the MailItem, use the MailItem’s SaveAs method. This method takes two parameters: the path in which to save the message and an instance of the OlSaveAsType enumeration designating the file type.

To have the add-in save new messages to disk in the C:\tmp folder, update the Application_NewMailEx method as follows:

void Application_NewMailEx(string EntryIDCollection)
{
    string[] entryIdArray = EntryIDCollection.Split(',');
    foreach (string entryId in entryIdArray)
    {
        try
        {
            MailItem item = (MailItem)Application.Session.GetItemFromID(entryId, null);
            string fullpath = string.Format(@"c:\tmp\{0}.msg", item.Subject);
            item.SaveAs(fullpath, OlSaveAsType.olMSG);
        }
        catch (System.Exception e)
        {
            MessageBox.Show(e.Message);
        }
    }
}
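One wrinkle the handler above glosses over: a subject line can contain characters that are illegal in Windows file names (':' or '?', for instance), which would make SaveAs fail. A sanitizing step along these lines would help; it is sketched in Python for brevity, with safe_filename being a hypothetical helper (in C# you would build the equivalent around Path.GetInvalidFileNameChars()):

```python
# Characters Windows forbids in file names.
INVALID_CHARS = set('<>:"/\\|?*')

def safe_filename(subject, fallback="no-subject"):
    """Replace forbidden characters so the subject can be used as a file name."""
    cleaned = "".join("_" if c in INVALID_CHARS else c for c in subject).strip()
    return cleaned or fallback
```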

When you build the application, four files are generated.

Double-clicking the NewItemsPersister.vsto will install the add-in.

Now, whenever I receive a mail, it is immediately written to disk.

When developing Outlook add-ins, you will often want to uninstall the add-in to be able to install an updated version. To uninstall the add-in, use Control Panel –> Programs and Features.

NOTE: When developing the add-in above, I continually ran into trouble when reinstalling it. I would get an error saying that the add-in could not be installed because it was already installed, even though I had removed it using Control Panel –> Programs and Features. To solve this, I had to clear the ClickOnce install cache using the Visual Studio command prompt.

More on the Manifest Generation and Editing Tool (Mage.exe) here.


Writing an Add-in for Visual Studio 2008

I’ve been thinking about writing a test generation add-in for Visual Studio. The idea is to have an add-in that will enable the user to auto-generate a bunch of tests for each controller in an ASP.NET MVC project.

I haven’t done any programming for Visual Studio before, so I decided that the first step would be to place a menu item in the Solution Explorer context menu of files containing controllers.

[Screenshots: vsaddin_enabled, vsaddin_disabled – the context menu item in its enabled and disabled states]

To be able to query the Solution Explorer for things like ‘give me the files the user currently has selected’, ‘does the file the user has selected contain a controller’ etc., I created a wrapper:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using EnvDTE;
using EnvDTE80;
using TestGenerator.CodeGeneration;

namespace TestGenerator
{
    public class SolutionExplorerWrapper
    {
        private DTE2 _applicationObject;

        public SolutionExplorerWrapper(DTE2 applicationObject)
        {
            this._applicationObject = applicationObject;
        }

        public bool IsFolder(ProjectItem item)
        {
            foreach (string guid in folderGUIDs)
            {
                if (item.Kind.ToUpper() == guid)
                    return true;
            }
            return false;
        }

        public IEnumerable<ProjectItem> GetSelectedProjectItems()
        {
            UIHierarchy uih = (UIHierarchy)_applicationObject.Windows.Item(Constants.vsWindowKindSolutionExplorer).Object;
            Array selectedItems = (Array)uih.SelectedItems;
            foreach (UIHierarchyItem item in selectedItems)
                yield return (ProjectItem)item.Object;
        }

        public IEnumerable<ProjectItem> GetSelectedItemsContainingControllers()
        {
            return (from pi in GetSelectedProjectItems()
                    where
                        pi.IsPhysicalFile() &&
                        ContainsControllers((FileCodeModel2)pi.FileCodeModel)
                    select pi);
        }



        private bool ContainsControllers(FileCodeModel2 file)
        {
            FileCodeModel2QueryHelper helper = new FileCodeModel2QueryHelper(file);
            return helper.FileContainsControllers();
        }

        private readonly string[] folderGUIDs = new string[3] { "{6BB5F8EF-4483-11D3-8BCF-00C04F8EC28C}", "{6BB5F8F0-4483-11D3-8BCF-00C04F8EC28C}", "{66A26722-8FB5-11D2-AA7E-00C04F688DDE}" };
    }
}

The elements in Solution Explorer are represented as ProjectItem objects, and because of all the COM goo (I guess) you can’t just do something like

if(projectItem is IFolder)

to determine if a given ProjectItem instance is a folder. Thus, to ease the interaction with these objects, I created two extension methods:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using EnvDTE;

namespace TestGenerator
{
    public static class ProjectItemExtensions
    {
        public static bool IsPhysicalFile(this ProjectItem item)
        {
            return item.Kind == EnvDTE.Constants.vsProjectItemKindPhysicalFile;
        }

        public static bool IsPhysicalFolder(this ProjectItem item)
        {
            return item.Kind == EnvDTE.Constants.vsProjectItemKindPhysicalFolder;
        }
    }
}

Given a COM object like a ProjectItem, you can use a class in the Microsoft.VisualBasic namespace to get a string representation of what type of object the COM object is wrapping. Hence, I created the following extension method, which was very helpful during development:

public static class ComObjectExtentions
{
    public static string GetTypeName(this object o)
    {
        return Microsoft.VisualBasic.Information.TypeName(o);
    }
}

The actual implementation of the add-in looks complex at first glance, but it is really quite simple. Much of the code is just boilerplate code generated by the Visual Studio Add-In wizard.

public class Connect : IDTExtensibility2, IDTCommandTarget
    {
        /// <summary>Implements the constructor for the Add-in object. Place your initialization code within this method.</summary>
        public Connect()
        {
        }

        /// <summary>Implements the OnConnection method of the IDTExtensibility2 interface. Receives notification that the Add-in is being loaded.</summary>
        /// <param term='application'>Root object of the host application.</param>
        /// <param term='connectMode'>Describes how the Add-in is being loaded.</param>
        /// <param term='addInInst'>Object representing this Add-in.</param>
        /// <seealso class='IDTExtensibility2' />
        public void OnConnection(object application, ext_ConnectMode connectMode, object addInInst, ref Array custom)
        {
            try
            {
                _applicationObject = (DTE2)application;
                _addInInstance = (AddIn)addInInst;

                switch (connectMode)
                {
                    case ext_ConnectMode.ext_cm_UISetup:

                        // Create commands in the UI Setup phase. This phase is called only once when the add-in is deployed.
                        CreateCommands();
                        break;

                    case ext_ConnectMode.ext_cm_AfterStartup:

                        InitializeAddIn();
                        break;

                    case ext_ConnectMode.ext_cm_Startup:

                        // Do nothing yet, wait until the IDE is fully initialized (OnStartupComplete will be called)
                        break;
                }
            }
            catch (Exception ex)
            {
                MessageBox.Show(ex.ToString(), "Error", MessageBoxButtons.OK, MessageBoxIcon.Error);
            }
        }

        private void InitializeAddIn()
        {
            CommandBarControl myCommandBarControl;
            CommandBar codeWindowCommandBar;
            Command myCommand1;
            CommandBars commandBars;

            // Retrieve commands created in the ext_cm_UISetup phase of the OnConnection method
            myCommand1 = _applicationObject.Commands.Item(_addInInstance.ProgID + "." + m_COMMAND_GENERATETESTS_NAME, -1);

            // Retrieve the context menu of an item in solution explorer
            commandBars = (CommandBars)_applicationObject.CommandBars;
            codeWindowCommandBar = commandBars["Item"];

            // Add a popup command bar
            myCommandBarControl = codeWindowCommandBar.Controls.Add(MsoControlType.msoControlPopup,
               System.Type.Missing, System.Type.Missing, System.Type.Missing, System.Type.Missing);

            m_commandBarPopup = (CommandBarPopup)myCommandBarControl;

            // Change its caption
            m_commandBarPopup.Caption = "RUI Test Generator";

            // Add controls to the popup command bar
            m_commandBarControl1 = (CommandBarControl)myCommand1.AddControl(m_commandBarPopup.CommandBar,
               m_commandBarPopup.Controls.Count + 1);

            m_commandBarControl1.Caption = "Generate Tests";
        }

        /// <summary>Implements the OnDisconnection method of the IDTExtensibility2 interface. Receives notification that the Add-in is being unloaded.</summary>
        /// <param term='disconnectMode'>Describes how the Add-in is being unloaded.</param>
        /// <param term='custom'>Array of parameters that are host application specific.</param>
        /// <seealso class='IDTExtensibility2' />
        public void OnDisconnection(ext_DisconnectMode disconnectMode, ref Array custom)
        {
            try
            {
                if (m_commandBarPopup != null)
                {
                    m_commandBarPopup.Delete(true);
                }
            }
            catch (Exception ex)
            {
                MessageBox.Show(ex.ToString(), "Error", MessageBoxButtons.OK, MessageBoxIcon.Error);
            }
        }

        /// <summary>Implements the OnAddInsUpdate method of the IDTExtensibility2 interface. Receives notification when the collection of Add-ins has changed.</summary>
        /// <param term='custom'>Array of parameters that are host application specific.</param>
        /// <seealso class='IDTExtensibility2' />
        public void OnAddInsUpdate(ref Array custom)
        {
        }

        /// <summary>Implements the OnStartupComplete method of the IDTExtensibility2 interface. Receives notification that the host application has completed loading.</summary>
        /// <param term='custom'>Array of parameters that are host application specific.</param>
        /// <seealso class='IDTExtensibility2' />
        public void OnStartupComplete(ref Array custom)
        {
            InitializeAddIn();
        }

        /// <summary>Implements the OnBeginShutdown method of the IDTExtensibility2 interface. Receives notification that the host application is being unloaded.</summary>
        /// <param term='custom'>Array of parameters that are host application specific.</param>
        /// <seealso class='IDTExtensibility2' />
        public void OnBeginShutdown(ref Array custom)
        {
        }

        /// <summary>Implements the QueryStatus method of the IDTCommandTarget interface. This is called when the command's availability is updated</summary>
        /// <param term='commandName'>The name of the command to determine state for.</param>
        /// <param term='neededText'>Text that is needed for the command.</param>
        /// <param term='status'>The state of the command in the user interface.</param>
        /// <param term='commandText'>Text requested by the neededText parameter.</param>
        /// <seealso class='Exec' />
        public void QueryStatus(string commandName, vsCommandStatusTextWanted neededText, ref vsCommandStatus status, ref object commandText)
        {
            if (commandName == _addInInstance.ProgID + "." + m_COMMAND_GENERATETESTS_NAME)
            {
                SolutionExplorerWrapper solutionExplorer = new SolutionExplorerWrapper(_applicationObject);
                if (solutionExplorer.GetSelectedItemsContainingControllers().Any())
                    status = vsCommandStatus.vsCommandStatusSupported | vsCommandStatus.vsCommandStatusEnabled;
                else
                    status = vsCommandStatus.vsCommandStatusSupported;
            }
        }

        /// <summary>Implements the Exec method of the IDTCommandTarget interface. This is called when the command is invoked.</summary>
        /// <param term='commandName'>The name of the command to execute.</param>
        /// <param term='executeOption'>Describes how the command should be run.</param>
        /// <param term='varIn'>Parameters passed from the caller to the command handler.</param>
        /// <param term='varOut'>Parameters passed from the command handler to the caller.</param>
        /// <param term='handled'>Informs the caller if the command was handled or not.</param>
        /// <seealso class='Exec' />
        public void Exec(string commandName, vsCommandExecOption executeOption, ref object varIn, ref object varOut, ref bool handled)
        {
            if (commandName == _addInInstance.ProgID + "." + m_COMMAND_GENERATETESTS_NAME)
            {
                GenerateTestsForSelectedFiles();
            }
        }

        #region Menu and command setup

        private const string m_COMMAND_GENERATETESTS_NAME = "GenerateTests";
        private const string m_COMMAND_GENERATETESTS_TEXT = "Generate Tests";
        //private const string m_NAME_COMMAND2 = "MyCommand2";

        private CommandBarPopup m_commandBarPopup;
        private CommandBarControl m_commandBarControl1;
        //private CommandBarControl m_commandBarControl2;

        private void CreateCommands()
        {
            object[] contextUIGuids = new object[] { };

            _applicationObject.Commands.AddNamedCommand(_addInInstance, m_COMMAND_GENERATETESTS_NAME, m_COMMAND_GENERATETESTS_TEXT, m_COMMAND_GENERATETESTS_TEXT, true, 59,
               ref contextUIGuids, (int)vsCommandStatus.vsCommandStatusSupported);
        }

        #endregion

        public bool GenerateTestsForSelectedFiles()
        {
            SolutionExplorerWrapper solutionExplorer = new SolutionExplorerWrapper(_applicationObject);
            foreach (ProjectItem projectItem in solutionExplorer.GetSelectedProjectItems())
            {
                if (!projectItem.IsPhysicalFile())
                {
                    MessageBox.Show(string.Format("Cannot generate code for {0}", projectItem.Name), "Project item not supported");
                    return true;
                }

                CodeFileGenerator generator = new CodeFileGenerator();
                generator.GenerateTests(projectItem, _applicationObject.Solution);
            }
            return true;
        }

        private DTE2 _applicationObject;
        private AddIn _addInInstance;
    }

Of particular interest are the methods QueryStatus(…) and Exec(…).

QueryStatus is called whenever Visual Studio wants to know if the add-in should be available for use. As you can see, I just use the SolutionExplorerWrapper to make the add-in available whenever the user has selected a file containing one or more controllers.

Exec is called whenever a command associated with the add-in is invoked. If the command is the command for generating tests (the add-in might contain other buttons or menu items invoking other commands), I proceed to create tests.

That’s it. My add-in now shows up whenever the user right-clicks a controller in Solution Explorer.

For completeness’ sake, I have made all of the source code available here, including the CodeFileGenerator that I am using for generating (at the moment, very rudimentary) test code.


Parsing EDI messages with IronPython

I have recently been wondering how to build a generic system for parsing and processing lots of EDI messages in such a way that a minimum of work is needed when a new message type is to be processed by the system. The syntactic format of EDI messages is fairly consistent, but the semantics of particular fields are open to interpretation. Thus, I thought it should be possible to write a general parser for turning a message into a tree structure (much like an abstract syntax tree for “the EDI language”) which would be appropriate as input to the processing phase of the system.

Having recently picked up “IronPython in Action”, I decided to apply IronPython to the task. Since this was only an experiment, I decided to give up somewhat on the idea of a tree structure and instead leverage the dynamic programming capabilities of IronPython. Given an EDI message at runtime, I wanted to generate an object with properties corresponding to the segments, subsegments and elements of the message.

To represent the document elements I created the following very simple classes:

class Document(object):
    pass

class Segment(object):
    pass

class Element(object):
    pass

class DataElement(object):
    pass

As you can see, these are all identical classes, simply derived from object. They are created as separate classes so that code that reflects on their type can make sense of their intended use.

To describe the various delimiters and escape characters of an EDI message I created an EDIDelimiterContext:

class EDIDelimiterContext(object):
    def __init__(self, segmentSeparator='\'', elementSeparator='+', dataElementSeparator=':', escapeCharacter='?'):
        self.SegmentSeparator = segmentSeparator
        self.ElementSeparator = elementSeparator
        self.DataElementSeparator = dataElementSeparator
        self.EscapeCharacter = escapeCharacter


The code for the parser looks like this:

from Document import Document, Segment, Element, DataElement
import clr

class Parser(object):
    def __init__(self, ediDelimiterContext):
        self.EDIDelimiterContext = ediDelimiterContext

    def splitter(self, input, escape, separator):
        "Splits the string input according to the given separator, taking the given escape character into consideration"
        subStrings = []
        i = 0
        while (i < input.Length):
            if (input[i] == separator and (i == 0 or input[i - 1] != escape)):
                subStrings.append(input.Substring(0, i))
                input = input.Remove(0, i + 1)
                i = 0
            else:
                i += 1

        if(input.Length > 0):
            subStrings.append(input)

        return subStrings

    def getSegmentsStrings(self, input):
        return self.splitter(input, self.EDIDelimiterContext.EscapeCharacter, self.EDIDelimiterContext.SegmentSeparator)

    def getElementsStrings(self, input):
        return self.splitter(input, self.EDIDelimiterContext.EscapeCharacter, self.EDIDelimiterContext.ElementSeparator)

    def getDataElementsStrings(self, input):
        return self.splitter(input, self.EDIDelimiterContext.EscapeCharacter, self.EDIDelimiterContext.DataElementSeparator)

    def attachProperty(self, targetObj, propertyName, value):
        "Attaches a property named propertyName with value value to the object targetObj. If a property with that name already exists, an index is added to the end of the name to construct a unique property name"
        if hasattr(targetObj, propertyName):
            i = 2
            while hasattr(targetObj, propertyName + str(i)):
                i += 1
            setattr(targetObj, propertyName + str(i), value)
        else:
            setattr(targetObj, propertyName, value)

    def buildDocumentObject(self, input):
        "Builds a Document from a string (input)"
        doc = Document()
        segmentStrings = self.getSegmentsStrings(input)
        for segmentstr in segmentStrings:
            name, value = self.buildSegmentObject(segmentstr)
            self.attachProperty(doc, name, value)
        return doc

    def buildSegmentObject(self, input):
        "Builds a Segment from a string (input)"
        seg = Segment()
        elementsStrings = self.getElementsStrings(input)
        name, elements = self.splitNameFromSegment(elementsStrings)
        for elementstr in elements:
            self.attachProperty(seg, 'Element', self.buildElementObject(elementstr))
        return name, seg

    def buildElementObject(self, input):
        "Builds an Element from a string (input)"
        element = Element()
        dataElements = self.getDataElementsStrings(input)
        for dataelementstr in dataElements:
            self.attachProperty(element, 'DataElement', dataelementstr)
        return element

    def splitNameFromSegment(self, elements):
        "Determines the name of a segment. If the segment contains multiple elements, the first element forms the name of the segment. That is, the segment ABC+DEF:def+GH:gef will be named ABC"
        if len(elements) == 0:
            return ('Segment', elements)

        name = elements.pop(0)
        return (name, elements)

The method of primary interest in the Parser is attachProperty(self, targetObj, propertyName, value), which attaches a new property to the object given by targetObj. The property will have the value given by the value parameter and will be named according to the propertyName parameter, unless a property with that name already exists. In that case we try <propertyName>2, <propertyName>3, etc. Coming from a background in C#, I think it is really nice to see how easily a property can be attached using the setattr function. Dynamically typed, late-bound languages like IronPython and JavaScript really make you see the world in a different light once tricks like this become part of your arsenal.
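The numbering behavior is easy to demonstrate in plain Python (a standalone sketch of the same idea, using a generic container class rather than the parser's Document types):

```python
class Bag(object):
    """Generic container to attach properties to."""
    pass

def attach_property(target, name, value):
    # First occurrence keeps the plain name; later ones get a numeric suffix.
    if not hasattr(target, name):
        setattr(target, name, value)
        return
    i = 2
    while hasattr(target, name + str(i)):
        i += 1
    setattr(target, name + str(i), value)
```

Attaching three values under the name 'Element' yields properties Element, Element2 and Element3.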

Using the parser above, I can now create an object from an EDI message as follows:

>>> context = Parser.EDIDelimiterContext()

>>> parser = Parser.Parser(context)

>>> doc = parser.buildDocumentObject('ABC+DEF:def+GH:gef\'IJK+LM:nop:q21')

>>> doc.ABC

<Segment object at 0x000000000000002B>

>>> doc.ABC.Element.DataElement

'DEF'

>>> doc.ABC.Element.DataElement2

'def'

>>>

Notice that when I write doc.ABC.El<TAB>, the tab completion of the Python console will allow me to easily cycle through the elements of the ABC segment. This will be immensely valuable, since, given a sample EDI message of some new type, tab completion will guide me when I’m implementing the logic for processing such messages.
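For readers who want to experiment without IronPython and .NET strings, the escape-aware splitting logic at the heart of the parser can be sketched in plain Python (a rough equivalent, not the original code):

```python
def split_escaped(text, escape, separator):
    """Split text on separator, ignoring separators preceded by the escape char."""
    parts = []
    i = 0
    while i < len(text):
        if text[i] == separator and (i == 0 or text[i - 1] != escape):
            parts.append(text[:i])   # emit everything before the separator
            text = text[i + 1:]      # continue scanning the remainder
            i = 0
        else:
            i += 1
    if text:
        parts.append(text)
    return parts
```

With the UN/EDIFACT-style defaults, split_escaped('ABC+DEF:def+GH:gef', '?', '+') yields ['ABC', 'DEF:def', 'GH:gef'], while an escaped separator survives the split: split_escaped('A?+B+C', '?', '+') yields ['A?+B', 'C'].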


MVC Model Binding to an interface

No Comments »

Have you ever tried using ASP.NET MVC’s data binding capabilities against an interface? If so, you may have experienced some unexpected behaviour.

Let’s say we’ve got a form with two fields that we want to post to a Create action. In the Create action we’ll usually databind the fields to properties of a business object and persist the object to the database. For this example, we’ll just perform the data binding and display the bound values.

Our Order class is very simple:

public class Order : IOrder
{
    public int Id { get; set; }
    public string Text { get; set; }
}

Our form for submitting an order is equally simple:

<%@ Page Language="C#" MasterPageFile="~/Views/Shared/Site.Master" Inherits="System.Web.Mvc.ViewPage" %>

<asp:Content ID="indexTitle" ContentPlaceHolderID="TitleContent" runat="server">
    Home Page
</asp:Content>

<asp:Content ID="indexContent" ContentPlaceHolderID="MainContent" runat="server">
    <h2><%= Html.Encode(ViewData["Message"]) %></h2>

     <% using (Html.BeginForm("Create", "Home")){ %>
        <fieldset>
        <legend>Order:</legend>
            Id:<br />
            <%= Html.TextBox("order.Id") %> <br />
            Text:<br />
            <%= Html.TextBox("order.Text") %><br />
        </fieldset>
    <%} %>
</asp:Content>

We’ve got a Create action like this

[AcceptVerbs(HttpVerbs.Post)]
public ActionResult Create(FormCollection form)
{
    Order order = new Order();
    UpdateModel(order, "order", form.ToValueProvider());

    ViewData["Id"] = order.Id;
    ViewData["Text"] = order.Text;

    return View();
}

and a corresponding view which displays the values:

<%@ Page Title="" Language="C#" MasterPageFile="~/Views/Shared/Site.Master" Inherits="System.Web.Mvc.ViewPage" %>

<asp:Content ID="Content1" ContentPlaceHolderID="TitleContent" runat="server">
    Create
</asp:Content>

<asp:Content ID="Content2" ContentPlaceHolderID="MainContent" runat="server">

    <h2>Create</h2>
    <p>Data used for create:</p>
    <p>
        Id: <%= ViewData["Id"] %> <br />
        Text: <%= ViewData["Text"] %>
    </p>


</asp:Content>

When we enter a pair of values like so

forminput

we will see this upon submitting

result1

However, what happens if we try to databind to a reference of type IOrder? The IOrder interface looks like this

public interface IEntity
{
    int Id { get; set; }
}

public interface IOrder : IEntity
{
    string Text { get; set; }
}

If we change the Create action to use an IOrder reference like so

[AcceptVerbs(HttpVerbs.Post)]
public ActionResult Create(FormCollection form)
{
    IOrder order = new Order(); //This line has been changed
    UpdateModel(order, "order", form.ToValueProvider());

    ViewData["Id"] = order.Id;
    ViewData["Text"] = order.Text;

    return View();
}

we will see a somewhat surprising result:

result2 

What happens is that the posted value is not bound to the Id property of IOrder! The reason for this turns out to be that the Id property is declared in the IEntity interface. If you move the Id property from IEntity to IOrder, things will again be working as expected.

This is something to keep in mind when you’re considering what to use for databinding in your application. If all your business objects are encapsulated in interfaces by the time they reach the web tier, you may have to provide special types for use in databinding, even though this to some extent defeats the purpose of databinding.

I have previously written a blog post about why this behaviour occurs: Hey, where are my interface’s properties?


Hey, where are my interface’s properties?

2 Comments »

This post is about the somewhat unexpected behaviour of the Type.GetProperties() method when it is called on an interface.

Last week I began writing an ASP.NET MVC application to act as a dashboard for an application which has previously had a very cumbersome UI (involving pgAdmin – certainly not best practice :-). I wanted it to have a very light feel, hence decided to use as much AJAX as I could get away with. This has worked out very well. I’ve now got 1(!) view and a bunch of controller actions all returning JsonResults.

My business objects all reference each other, even in a circular fashion: an Order has a List<OrderLine> OrderLines property and each OrderLine has an Order property. I would like to pass orders and orderlines to the browser using JSON, but using an Order as data for a JsonResult will cause the serializer to barf because of the circularity. Thus, we’ll need to pass a simpler object to the JsonResult (that’s a good idea anyway, since you will usually not need/want all the object’s properties to be passed to the browser).

My business objects looked something like

namespace X.BO
{
    public class Entity
    {
        public virtual int Id { get;set; }
    }

    public class OrderLine : Entity, IOrderLine
    {
        public virtual string Text { get; set; }
        public virtual int UnitPrice { get; set; }
        public virtual string Quantity { get; set; }
        public virtual string ProductNumber { get; set; }
        public virtual Order Order {get;set;}
    }
}

and I decided that I would want to pass objects like

namespace X.JSON
{
    public class OrderLine
    {
        public int Id { get; set; }
        public string Text { get; set; }
        public int UnitPrice { get; set; }
        public string Quantity { get; set; }
        public string ProductNumber { get; set; }
        public int OrderId {get;set;}
    }
}

to the JsonResult. I furthermore anticipated that I would be passing all kinds of objects to the browser, and hence didn’t want to write the boilerplate code for mapping from my business objects to the JSON objects over and over. What I needed was a generic mapper. Among the features I wanted was the ability to automatically map references to business objects in the source object to an identifier in the target JSON object. For an orderline this means automatically mapping the BO.OrderLine.Order property of type BO.Order to the JSON.OrderLine.OrderId property of type Int32 by copying the value of BO.Order.Id (all my business objects have an identifier named Id of type Int32).
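The mapping rule itself can be sketched in Python (hypothetical names; the real Business2JSONMapper is C# and reflection-based, but the idea is the same): copy same-named properties directly, and resolve a target property named <X>Id from a source reference named <X> by taking its Id.

```python
class BOOrder(object):
    def __init__(self, id):
        self.Id = id

class BOOrderLine(object):
    def __init__(self, id, text, order):
        self.Id = id
        self.Text = text
        self.Order = order  # reference to another business object

class JsonOrderLine(object):
    # Flat, JSON-friendly shape: the Order reference becomes OrderId.
    Id = 0
    Text = ''
    OrderId = 0

def map_to_json(source, target_cls):
    target = target_cls()
    for name in [n for n in dir(target_cls) if not n.startswith('_')]:
        if hasattr(source, name):
            # Same-named property: copy directly.
            setattr(target, name, getattr(source, name))
        elif name.endswith('Id') and hasattr(source, name[:-2]):
            # <X>Id on the target maps to source.<X>.Id.
            setattr(target, name, getattr(source, name[:-2]).Id)
    return target
```

Mapping an order line whose Order has Id 9 thus produces a flat object with OrderId set to 9.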

I considered using AutoMapper, but, not having used it before, I decided that for my limited needs it would be faster to just roll my own mapper (of course, as is most often the case, my needs changed over time and I am now thinking about investing the time to get acquainted with AutoMapper – I guess hindsight is always 20/20 :-).

So, I got to work on the mapper, initially named Business2JSONMapper. Everything was working out fine, I was writing unit tests, they were all passing and I was feeling good about myself. At last I fired up the application and, lo and behold, the Business2JSONMapper blew up, complaining that the target object X.JSON.OrderLine had a property named ‘Id’ of type Int32, but that the source object had no such property! This had me baffled! The source object certainly had a property named ‘Id’ of type Int32!

I started furiously banging out regression tests to try and reproduce the behaviour in my unit tests. I had been using pretty simple objects previously, so I started adding virtual properties, inherited properties etc.

In the end, it turned out that the problem lay with the IOrderLine interface. In my application all my references are interfaces, but I hadn’t been using interfaces in the tests. The IOrderLine interface looks something like

namespace X.BO
{
    public interface IEntity
    {
        int Id { get; set; }
    }

    public interface IOrderLine : IEntity
    {
        string Text { get; set; }
        int UnitPrice { get; set; }
        string Quantity { get; set; }
        string ProductNumber { get; set; }
        Order Order { get; set; }
    }
}

In the Business2JSONMapper code I used the Type.GetProperties() method to determine the set of properties on the source object which might act as sources for each of the properties of the target object. This worked great when acting on classes like OrderLine, but when called on the IOrderLine type, the Id property wasn’t among the properties being returned!

Some investigation revealed the reason for this: when an interface inherits from a parent interface, it does not inherit the properties of the parent interface, it only inherits the requirement to implement these properties!

This conforms with section 8.10 of the CLI specification:

Only object types can inherit implementations, hence only object types can inherit members (see §8.9.8). While interface types can be derived from other interface types, they only “inherit” the requirement to implement method contracts, never fields or method implementations.

Thus, in order to get a full list of the properties required by an interface, we will have to traverse the inheritance hierarchy:

public List<PropertyInfo> GetTypesProperties(Type type)
{
    List<PropertyInfo> typesProperties = type.GetProperties().ToList();

    if (!type.IsInterface)
        return typesProperties;

    foreach (Type intface in type.GetInterfaces())
        typesProperties.AddRange(GetTypesProperties(intface));

    return typesProperties;
}

For mapping purposes this might lead to some problems, since we may find properties of the same name and type in different interfaces. Fortunately, I know that I don’t have to worry about this in my particular case, but if you’re writing a mapper yourself, it may be a problem you’ll have to consider.
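Python offers a loose analogy (a sketch only; CLR reflection behaves differently from Python attribute lookup): vars() on a class lists only the members declared directly on it, much like GetProperties() on an interface type, while a recursive walk over the bases recovers the full set:

```python
class IEntity(object):
    Id = 0

class IOrderLine(IEntity):
    Text = ''

def declared(cls):
    # Only members declared directly on cls, ignoring inherited ones.
    return [n for n in vars(cls) if not n.startswith('_')]

def all_required(cls):
    # Recursively collect declared members from cls and its bases.
    names = list(declared(cls))
    for base in cls.__bases__:
        if base is not object:
            names.extend(all_required(base))
    return names
```

Here declared(IOrderLine) is just ['Text'], while sorted(all_required(IOrderLine)) gives ['Id', 'Text'], mirroring the difference between GetProperties() and a recursive traversal of the interface hierarchy.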


How to enable your Windows Service to have its name configured during installation

1 Comment »

In this posting I will show you how you can configure the name of a Windows Service as it is installed. Once we have tweaked the service installer properly, we will be able to have InstallUtil.exe install our service under a name which we supply as a command-line parameter.

Let’s say we’ve implemented a Windows Service in C#. To install the service, we’ve implemented an Installer, i.e. something like the following:

    [RunInstaller(true)]
    public class WindowsServiceInstaller : Installer
    {
        public WindowsServiceInstaller()
        {
            ServiceProcessInstaller serviceProcessInstaller =
                               new ServiceProcessInstaller();
            ServiceInstaller serviceInstaller = new ServiceInstaller();

            //# Service Account Information
            serviceProcessInstaller.Account = ServiceAccount.LocalSystem;
            serviceProcessInstaller.Username = null;
            serviceProcessInstaller.Password = null;

            //# Service Information
            serviceInstaller.StartType = ServiceStartMode.Automatic;

            serviceInstaller.DisplayName = "My Windows Service";
            serviceInstaller.ServiceName = "my_windows_service";

            this.Installers.Add(serviceProcessInstaller);
            this.Installers.Add(serviceInstaller);
        }
    }

Assuming that the service is contained in Installable.exe, we install the service using InstallUtil.exe like so:

c:\windows\Microsoft.NET\Framework\v2.0.50727\InstallUtil.exe /i c:\service1\Installable.exe

Now, what happens if we want to have two instances of this service running? We might want to have multiple services running, each associated with different databases. If we place Installable.exe and its configuration file in a new folder (say c:\service2) we might try to install this service via

c:\windows\Microsoft.NET\Framework\v2.0.50727\InstallUtil.exe /i c:\service2\Installable.exe

InstallUtil will throw an error: "System.ComponentModel.Win32Exception: The specified service already exists". InstallUtil complains because a service named my_windows_service is already installed.

So, what can we do? The obvious thing to do is to just change the lines

            serviceInstaller.DisplayName = "My Windows Service";
            serviceInstaller.ServiceName = "my_windows_service";

in WindowsServiceInstaller, recompile and run InstallUtil.exe again. While this works, it doesn’t play well with source control: what should the serviceInstaller.ServiceName property be set to in the code committed to the repository? How do you ensure that you remember to verify that this property is set correctly, before you compile and deploy the code?

What we really need is a way to supply the service name to the install procedure. Ideally, we would like to be able to do something like

c:\windows\Microsoft.NET\Framework\v2.0.50727\InstallUtil.exe /i c:\service1\Installable.exe /servicename="my_service_instance_1" /servicedisplayname="My Service Instance 1"

but InstallUtil doesn’t support these arguments. However, InstallUtil won’t bail out if it encounters unknown arguments either. This observation is going to be half the solution. The other half is knowing that you can use System.Environment.GetCommandLineArgs() to get at the command line arguments provided for the current process.

Thus, in the installer we can access the command line arguments provided to InstallUtil, parse these arguments ourselves and set the service name accordingly.

To do this, we add the following private methods to the installer

private void SetServicePropertiesFromCommandLine(ServiceInstaller serviceInstaller)
{
	string[] commandlineArgs = Environment.GetCommandLineArgs();

	string servicename;
	string servicedisplayname;
	ParseServiceNameSwitches(commandlineArgs, out servicename, out servicedisplayname);

	serviceInstaller.ServiceName = servicename;
	serviceInstaller.DisplayName = servicedisplayname;
}

private void ParseServiceNameSwitches(string[] commandlineArgs, out string serviceName, out string serviceDisplayName)
{
	var servicenameswitch = (from s in commandlineArgs where s.StartsWith("/servicename") select s).FirstOrDefault();
	var servicedisplaynameswitch = (from s in commandlineArgs where s.StartsWith("/servicedisplayname") select s).FirstOrDefault();

	if (servicenameswitch == null)
		throw new ArgumentException("Argument 'servicename' is missing");
	if (servicedisplaynameswitch == null)
		throw new ArgumentException("Argument 'servicedisplayname' is missing");
	if (!servicenameswitch.Contains('=') || servicenameswitch.Split('=').Length < 2)
		throw new ArgumentException("The /servicename switch is malformed");

	if (!servicedisplaynameswitch.Contains('=') || servicedisplaynameswitch.Split('=').Length < 2)
		throw new ArgumentException("The /servicedisplayname switch is malformed");

	serviceName = servicenameswitch.Split('=')[1];
	serviceDisplayName = servicedisplaynameswitch.Split('=')[1];

	serviceName = serviceName.Trim('"');
	serviceDisplayName= serviceDisplayName.Trim('"');
}

The SetServicePropertiesFromCommandLine method retrieves the command line arguments and configures the installer to set the service’s name properties. The second method, ParseServiceNameSwitches, is just a utility method for SetServicePropertiesFromCommandLine.
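The switch parsing itself is simple enough to sketch in a few lines of Python (an illustration of the technique, not the C# code above):

```python
def parse_switch(args, name):
    """Find /name=value among args and return the value, with quotes stripped."""
    prefix = '/' + name + '='
    for arg in args:
        if arg.startswith(prefix):
            # Split on the first '=' only, so values may themselves contain '='.
            return arg.split('=', 1)[1].strip('"')
    raise ValueError("Argument '%s' is missing" % name)
```

Given InstallUtil-style arguments such as ['/i', '/servicename="my_service_instance_1"'], parse_switch returns my_service_instance_1.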

We use these methods from WindowsServiceInstaller’s constructor:

public WindowsServiceInstaller()
{
	ServiceProcessInstaller serviceProcessInstaller =  new ServiceProcessInstaller();
	ServiceInstaller serviceInstaller = new ServiceInstaller();

	//# Service Account Information
	serviceProcessInstaller.Account = ServiceAccount.LocalSystem;
	serviceProcessInstaller.Username = null;
	serviceProcessInstaller.Password = null;

	SetServicePropertiesFromCommandLine(serviceInstaller);

	//# Service Information
	serviceInstaller.StartType = ServiceStartMode.Automatic;
	this.Installers.Add(serviceProcessInstaller);
	this.Installers.Add(serviceInstaller);
}

We can now install our service, providing the service names to InstallUtil as desired:

c:\windows\Microsoft.NET\Framework\v2.0.50727\InstallUtil.exe /i c:\service1\Installable.exe /servicename="my_service_instance_1" /servicedisplayname="My Service Instance 1"


How do I parse a csv file using yacc?

No Comments »

Parsing csv files… it’s tedious, it’s ugly and it’s been around forever. But if you’re working with legacy systems from before the rise of XML, chances are you will have to handle csv files on a regular basis.

Parsing csv files seems like a task perfectly suited for a standard library or framework. However, this ostensibly easy task is apparently not quite that easy to tackle in a generic way. Despite a substantial effort, I haven’t been able to find a library that I find widely applicable and pleasant to work with. Microsoft BizTalk has some nice flat-file XSD extensions, but enrolling BizTalk on each of your projects involving csv files is hardly a palatable approach. A more lightweight approach might be the Linq2CSV project on CodeProject. This project allows you to define a class describing the entities in each line of the csv file. Once this is done, the project provides easy means for reading the data of a csv file and populating a list of objects of the class just defined. If the input doesn’t conform to the expected format, an exception will be thrown containing descriptions and line numbers of the errors encountered. This seems like a really nice approach and I will probably be using it on a few small projects in the near future.

However, the proper way of parsing is of course to bring out the big guns: yacc (Yet Another Compiler Compiler). As its name suggests, yacc is intended for tasks much more complex than parsing a bunch of one-liners. Yacc is a code-generation tool for generating a parser from a context-free grammar specification. The generated parsers are of a type known as LR parsers and are well suited for implementing DSLs and compilers. Yacc even comes in a variety of flavors, including implementations for C, C#, ML and F#.

Below, I will show you how to parse a csv file in a sample format, generating a corresponding data structure. I will be using the F# yacc implementation (fsyacc) which comes with the F# installation (I’m using F# 1.9.6.2).

The parsing process

When we’re parsing text, the ultimate goal is to recognize the structure of a string of incoming characters and create a corresponding datastructure suitable for later processing. When designing compilers, the datastructure produced by the parser is referred to as an abstract syntax tree, but I will just refer to it as the datastructure.

Parsing the incoming text will fall in these steps:

The Parsing Process

The parser relies on a lexer to turn the incoming string of characters into a sequence of tokens. Tokens are characterized by their type and some may carry annotations. Thus, a token of type INT, representing an integer in the input stream, will have the actual integer value associated to it, because in most cases we will be interested in the particular value, not just the fact that some integer was present in the input. On the other hand, a token of type EOL, representing an end-of-line, will not carry any extra information.

We will not cover the details of the lexer in this posting.

The data format

The data we will be parsing will look like this:


328 15 20.1
328 13 11.1
328 16 129.2
328 19 4.3

Each line contains two integers followed by a decimal value, each separated by whitespace. This is not a traditional csv format, since the values are separated by whitespace. However, it is straightforward to adapt a lexer tokenizing the format above to a lexer processing a more traditional format.

The dataset represents the result of measuring various health parameters of a person. The first integer of each line identifies a person. The next integer identifies a parameter and the decimal represents the value measured. Thus, if parameter 15 is BMI (body mass index), the first line above states that user 328 has a BMI of 20.1.

To parse this dataformat, we will need the following tokens:

Token   Description                   Annotation
-----   ---------------------------   -----------------
INT     An integer                    The integer value
FLOAT   A decimal                     The decimal value
EOR     End of record (end of line)   -
EOF     End of input (end of file)    -
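A hand-written lexer for this simple format can be sketched in a few lines of Python (an illustration of the token stream only; the actual lexer for the parser below is written with fslex):

```python
def tokenize(text):
    """Turn the whitespace-separated number format into (type, value) tokens."""
    tokens = []
    lines = [line for line in text.strip().splitlines() if line.strip()]
    for i, line in enumerate(lines):
        for word in line.split():
            if '.' in word:
                tokens.append(('FLOAT', float(word)))
            else:
                tokens.append(('INT', int(word)))
        if i < len(lines) - 1:
            tokens.append(('EOR', None))  # end of record between lines
    tokens.append(('EOF', None))          # end of input
    return tokens
```

Note how INT and FLOAT tokens carry their values as annotations, while EOR and EOF carry none.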

The datastructure

To represent the data, we will use a very simple F# datastructure:

The datastructure

We define the datastructure in F# like this:

module Ast =
    type Line = { userid : int; parameterid : int; value : float }
    type DataSet = DataSet of Line list

The context free grammar

For yacc to be able to generate a parser which makes sense of the data format described above, we need to provide yacc with instructions on how to convert a sequence of tokens into the components of the datastructure. We do this in the form of a context free grammar and a set of associated semantic actions. To understand these concepts, let’s have a look at the context free grammar we will actually provide to yacc:

DataSet → LineList
LineList → Line
    | LineList EOR Line
Line → INT INT FLOAT

Each rule in the grammar is traditionally called a production, because it states that the term on the left side of the arrow may be expanded into whatever is on the right side of the arrow. The terms on the left are called non-terminals, because they may be expanded into their constituents, namely the elements on the right. In the grammar above, DataSet, LineList and Line are non-terminals. On the other hand, no productions exist expanding EOR, INT or FLOAT. Thus, these elements are said to be terminals. They are the tokens which the lexer may provide.

The fourth production above states that the concept of a Line consists of two consecutive INT tokens and a FLOAT token, in that order. The second and third productions combined state that a LineList is either a Line or consists of a LineList followed by an EOR token and a Line. Thus, if two Line elements separated by an EOR token have been identified by the parser, it may consider this to be a LineList, since the first Line is a LineList by the second production while this sequence of a LineList followed by an EOR token and a Line is itself a LineList by the third production.

You should remember that, while the terms in the context free grammar are intimately related to the elements in our datastructure, these concepts are not the same. Also note that we had to introduce the recursively defined LineList element in the grammar to accommodate the concept of a list of elements.

If you’ve never encountered context free grammars before, a more thorough introduction than what I have provided may be desirable. In this case, you may want to consult Wikipedia.
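To see what the grammar asks of a parser, here is a hand-written Python sketch that consumes a (type, value) token stream of the kind described above (the recursive LineList productions collapse into a simple loop; this is an illustration, not what fsyacc generates):

```python
def parse(tokens):
    """Consume (type, value) tokens: records of INT INT FLOAT separated by EOR."""
    lines = []
    i = 0
    while tokens[i][0] != 'EOF':
        # Line -> INT INT FLOAT
        line = {'userid': tokens[i][1],
                'parameterid': tokens[i + 1][1],
                'value': tokens[i + 2][1]}
        lines.append(line)
        i += 3
        # LineList -> LineList EOR Line: an EOR means another record follows.
        if tokens[i][0] == 'EOR':
            i += 1
    return lines
```

Feeding it the token stream for two records produces the two corresponding line dictionaries, in input order.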

Semantic actions

The datastructure is constructed in a bottom-up fashion by executing a piece of code each time a production is applied. The piece of code is called the production’s semantic action. For the

Line → INT INT FLOAT

production, we create a corresponding Ast.Line instance (cf. the "The datastructure" section above). These are the semantic actions we will need:

Production                     Semantic action                                  Description
DataSet → LineList             DataSet($1)                                      Create an Ast.DataSet, passing the Ast.Line list to the constructor
LineList → Line                [$1]                                             Create an Ast.Line list containing a single element
LineList → LineList EOR Line   $3 :: $1                                         Prepend the Ast.Line to the list of the LineList
Line → INT INT FLOAT           { userid = $1; parameterid = $2; value = $3; }   Create a new Ast.Line, assigning the first integer of the line to Ast.Line.userid, the second integer to Ast.Line.parameterid and the float to Ast.Line.value

As you have probably guessed, the $x variables in the fourth semantic action refer to the values of the INT and FLOAT tokens.

When specifying the semantic action for a production to fsyacc, you enclose the appropriate piece of code in braces after the production. Thus, our parser specification will look like this:

%{
open RI.Statistics.Ast
%}

%start start
%token <System.Int32> INT
%token <System.Double> FLOAT
%token EOR
%token EOF
%type <RI.Statistics.Ast.DataSet> start

%%

start: DataSet { $1 }

DataSet: LineList { DataSet($1) }

LineList: Line { [$1] }
| LineList EOR Line { $3 :: $1 }

Line: INT INT FLOAT { { userid = $1; parameterid = $2; value = $3; } }

Generating and exercising the parser

To generate the parser, you run yacc from the command line, passing the name of the file containing the parser specification above as an argument. For fsyacc, we get:



C:\Users\rui\Projects\Statistics\Parser>fsyacc Parser.fsp --module Parser

building tables

computing first function...time: 00:00:00.1604878

building kernels...time: 00:00:00.1359233

building kernel table...time: 00:00:00.0407968

computing lookahead relations.............time: 00:00:00.0723062

building lookahead table...time: 00:00:00.0439270

building action table...time: 00:00:00.0673112

building goto table...time: 00:00:00.0099398

returning tables.

10 states

5 nonterminals

7 terminals

6 productions

#rows in action table: 10

C:\Users\rui\Projects\Statistics\Parser>

This produces two files, Parser.fs and Parser.fsi, which contain the implementation of the parser. We will include them when we compile the parser.

To test the parser, we create a console application which will parse the sample data presented earlier and print the resulting datastructure:

#light
open Lexing
open Parser

let x = @"328 15 0,0
  328 13 11,1
  328 16 129,2
  328 19 4,3"

let parse() =
  let myData =
    let lexbuf = Lexing.from_string x in
      Parser.start Lexer.Parser.tokenize lexbuf in
  myData

let data = parse()
printfn "%A" data

Compiling this with the generated parser and executing the resulting console application results in this:



C:\Users\rui\Projects\Statistics\ConsoleApplication\bin\Debug>ConsoleApplication.exe

DataSet [{userid = 328; parameterid = 19; value = 4.3;};

{userid = 328; parameterid = 16; value = 129.2;};

{userid = 328; parameterid = 13; value = 11.1;};

{userid = 328; parameterid = 15; value = 0.0;}]

C:\Users\rui\Projects\Statistics\ConsoleApplication\bin\Debug>

Presto, we have our datastructure! And with only a minimum amount of code!

 
So, is this really an appropriate approach for parsing csv files? Well, no, not quite. Even though the procedure described above is rather straightforward, there’s no error reporting facility, making it inappropriate for anything but a prototype application. Thus, for parsing csv files, the aforementioned Linq2CSV project, or something similar, will probably give us much more initial headway than the yacc approach. But the yacc approach scales very well with the complexity of the input, hence may become feasible as the complexity of the input increases.

UPDATE: It has come to my attention that Robert Pickering, in the second edition of his book on F#, intends to include a special section on parsing text files using other means than fsyacc. Thus, if you’re reading this posting with the intention of actually using the method described above for production, you may want to consult Robert Pickering’s book for alternatives.


A gotcha when using fslex with #light syntax

No Comments »

Tonight I’ve been implementing a small lexer/parser pair in order to be able to read data from a csv-like file and process the data using F# Interactive. I originally chose F# over C# because F# fits well with the data processing I’m doing. I expected to use a library written in C# for loading the data, but decided to use fslex and fsyacc instead, just for the fun of it. However, I came across a problem with fslex (or my understanding of fslex) and I thought I’d share the solution:

I want all the code for loading the data to go into a module named Parser. Therefore, my Lexer.fsl begins with the following code:

{
module Parser =
  open System
  open Lexing
  open Parser


This section of the lex specification is traditionally called the definition section and contains initial code I want copied into the final lexer. The code produced by fslex will look something like

module Parser =
open System
open Lexing
open Parser

# 8 "Lexer.fs"
let trans : uint16[] array =
[|
(* State 0 *)

(* ...lots of code... *)

let rec __fslex_dummy () = __fslex_dummy()
(* Rule tokenize *)
and tokenize (lexbuf : Microsoft.FSharp.Text.Lexing.LexBuffer<_>) = __fslex_tokenize 0 lexbuf
and __fslex_tokenize __fslex_state lexbuf =
match __fslex_tables.Interpret(__fslex_state,lexbuf) with
| 0 -> (
# 14 "Lexer.fsl"
tokenize lexbuf
# 50 "Lexer.fs"
)
| 1 -> (
# 15 "Lexer.fsl"
EOR
# 55 "Lexer.fs"

(* ...more code... *)

However, when trying to build the generated parser, the compiler would complain that “the value or constructor ‘EOR’ is not defined”. This had me stumped for a while until I realized that the code generator wasn’t indenting the code properly. The

let rec __fslex_dummy () = ...

wasn’t indented at all, thus the

open System
open Lexing
open Parser

statements, which were indented to be part of the Parser module, weren’t in scope anymore. To fix this, I had to fall back to the more verbose syntax

module Parser = begin
open System
open Lexing
open Parser

(* code code code *)

end

This made the compiler concur.
Just like you can add any valid F# code to the definition section of the lexer specification, you can add a similar section, called the user subroutines section, to the end of the file. fslex will copy it to the end of the generated code. Thus, we can have fslex generate the desired code by changing the definition section of the lexer specification to

{
module Parser = begin
open System
open Lexing
open Parser
}

and adding a closing section like

{
end
}

I guess the code generator ought to have handled the problem by indenting the generated code properly, but this is just a CTP, so hopefully it will be fixed in the future.