Archive for August, 2009

Parsing EDI messages with IronPython


I have recently been wondering how to build a generic system for parsing and processing lots of EDI messages in such a way that a minimum of work is needed when a new message type is to be processed by the system. The syntactic format of EDI messages is fairly consistent, but the semantics of particular fields are open to interpretation. Thus, I thought it should be possible to write a general parser for turning a message into a tree structure (much like an abstract syntax tree for “the EDI language”) which would be appropriate as input to the processing phase of the system.

Having recently picked up “IronPython in Action”, I decided to apply IronPython to the task. Since this was only an experiment, I decided to set aside the idea of a tree structure for now and instead leverage the dynamic programming capabilities of IronPython. Given an EDI message at runtime, I wanted to generate an object with properties corresponding to the segments, subsegments and elements of the message.

To represent the document elements I created the following very simple classes:

class Document(object):
    pass

class Segment(object):
    pass

class Element(object):
    pass

class DataElement(object):
    pass

As you can see, these are all identical classes, simply derived from object. They are created as separate classes so that code that reflects on their type can make sense of their intended use.
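
As a minimal sketch of what "reflecting on the type" could look like, the snippet below dispatches on the class of a node with isinstance. The describe helper is hypothetical and not part of the parser in this post; it only illustrates why separate, otherwise empty classes are useful.

```python
class Document(object):
    pass

class Segment(object):
    pass

class Element(object):
    pass


def describe(node):
    """Return a short label based on the node's class (hypothetical helper)."""
    if isinstance(node, Document):
        return "document"
    if isinstance(node, Segment):
        return "segment"
    if isinstance(node, Element):
        return "element"
    # Anything else (for example a plain string) is treated as a raw value
    return "raw value"


print(describe(Segment()))
print(describe("DEF"))
```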

To describe the various delimiters and escape characters of an EDI message I created an EDIDelimiterContext:

class EDIDelimiterContext(object):
    def __init__(self, segmentSeparator='\'', elementSeparator='+', dataElementSeparator=':', escapeCharacter='?'):
        self.SegmentSeparator = segmentSeparator
        self.ElementSeparator = elementSeparator
        self.DataElementSeparator = dataElementSeparator
        self.EscapeCharacter = escapeCharacter
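
Because every delimiter is a keyword argument with a default, a context for a message that uses other separators can be created by overriding only the values that differ. The sketch below repeats the class so that it runs on its own; the '~' and '*' values are purely illustrative assumptions, not taken from any particular message.

```python
class EDIDelimiterContext(object):
    def __init__(self, segmentSeparator='\'', elementSeparator='+',
                 dataElementSeparator=':', escapeCharacter='?'):
        self.SegmentSeparator = segmentSeparator
        self.ElementSeparator = elementSeparator
        self.DataElementSeparator = dataElementSeparator
        self.EscapeCharacter = escapeCharacter


# Defaults as defined above
default_context = EDIDelimiterContext()

# Illustrative override: only the two separators that differ are passed
custom_context = EDIDelimiterContext(segmentSeparator='~', elementSeparator='*')

print(custom_context.SegmentSeparator, custom_context.DataElementSeparator)
```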

The code for the parser looks like this:

from Document import Document, Segment, Element, DataElement
import clr

class Parser(object):
    def __init__(self, ediDelimiterContext):
        self.EDIDelimiterContext = ediDelimiterContext

    "Splits the string input according to the given separator, taking the given escape character into consideration"
    def splitter(self, input, escape, separator):
        subStrings = []
        i = 0
        while (i < input.Length):
            if (input[i] == separator and (i == 0 or input[i - 1] != escape)):
                subStrings.append(input.Substring(0, i))
                input = input.Remove(0, i + 1)
                i = 0
            else:
                i += 1

        # Keep whatever remains after the last separator
        if (input.Length > 0):
            subStrings.append(input)

        return subStrings

    def getSegmentsStrings(self, input):
        return self.splitter(input, self.EDIDelimiterContext.EscapeCharacter, self.EDIDelimiterContext.SegmentSeparator)

    def getElementsStrings(self, input):
        return self.splitter(input, self.EDIDelimiterContext.EscapeCharacter, self.EDIDelimiterContext.ElementSeparator)

    def getDataElementsStrings(self, input):
        return self.splitter(input, self.EDIDelimiterContext.EscapeCharacter, self.EDIDelimiterContext.DataElementSeparator)

    "Attaches a property named propertyName with value value to the object targetObj. If a property with that name already exists, an index is added to the end of the name to construct a unique property name"
    def attachProperty(self, targetObj, propertyName, value):
        if hasattr(targetObj, propertyName):
            # Name already taken: find the first free numeric suffix, starting at 2
            i = 2
            while hasattr(targetObj, propertyName + str(i)):
                i += 1
            setattr(targetObj, propertyName + str(i), value)
        else:
            setattr(targetObj, propertyName, value)

    "Builds a Document from a string (input)"
    def buildDocumentObject(self, input):
        doc = Document()
        segmentStrings = self.getSegmentsStrings(input)
        for segmentstr in segmentStrings:
            name, value = self.buildSegmentObject(segmentstr)
            self.attachProperty(doc, name, value)
        return doc

    "Builds a Segment from a string (input)"
    def buildSegmentObject(self, input):
        seg = Segment()
        elementsStrings = self.getElementsStrings(input)
        name, elements = self.splitNameFromSegment(elementsStrings)
        for elementstr in elements:
            self.attachProperty(seg, 'Element', self.buildElementObject(elementstr))
        return name, seg

    "Builds an Element from a string (input)"
    def buildElementObject(self, input):
        element = Element()
        dataElements = self.getDataElementsStrings(input)
        for dataelementstr in dataElements:
            self.attachProperty(element, 'DataElement', dataelementstr)
        return element

    "Determines the name of a segment. If the segment contains multiple elements, the first element will form the name of the segment. That is, the segment ABC+DEF:def+GH:gef will be named ABC"
    def splitNameFromSegment(self, elements):
        if len(elements) == 0:
            return ('Segment', elements)

        name = elements.pop(0)
        return (name, elements)
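
The splitter method above relies on .NET string members (Length, Substring, Remove), so it only runs under IronPython. For comparison, here is a rough sketch of the same escape-aware split using only plain Python slicing. The name split_escaped is my own and is not part of the Parser class. Like splitter, it leaves the escape character in the output and does not treat a doubled escape character specially.

```python
def split_escaped(text, separator, escape):
    """Split text on separator, skipping separators preceded by escape."""
    parts = []
    start = 0  # index where the current piece begins
    for i, ch in enumerate(text):
        if ch == separator and (i == 0 or text[i - 1] != escape):
            parts.append(text[start:i])
            start = i + 1
    # Keep whatever remains after the last separator
    if start < len(text):
        parts.append(text[start:])
    return parts


print(split_escaped("ABC+DEF:def+GH:gef'IJK+LM:nop:q21", "'", "?"))
print(split_escaped("a?+b+c", "+", "?"))
```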

The method of primary interest in the Parser is attachProperty(self, targetObj, propertyName, value), which attaches a new property to the object given by targetObj. The property gets the value given by the value parameter and is named according to the propertyName parameter, unless a property with that name already exists. In that case we try <propertyName>2, <propertyName>3, etc. until a free name is found. Coming from a background in C#, I think it is really nice to see how easy it is to attach a property using the built-in setattr function. Dynamically typed, late-bound languages like IronPython and JavaScript really make you see the world in a different light once tricks like this become part of your arsenal.
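
To see the naming scheme in isolation, here is a small standalone sketch of the same idea as a free function. The name attach_property and the fact that it returns the name it ended up using are my additions for illustration; the method in the Parser returns nothing.

```python
class Segment(object):
    pass


def attach_property(target, name, value):
    """Attach value under name, or under name2, name3, ... if name is taken."""
    if not hasattr(target, name):
        setattr(target, name, value)
        return name
    i = 2
    while hasattr(target, name + str(i)):
        i += 1
    setattr(target, name + str(i), value)
    return name + str(i)


seg = Segment()
used_names = [attach_property(seg, 'Element', v) for v in ('DEF:def', 'GH:gef', 'X')]
print(used_names)
print(seg.Element2)
```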

Using the parser above, I can now create an object from an EDI message as follows:

>>> context = Parser.EDIDelimiterContext()

>>> parser = Parser.Parser(context)

>>> doc = parser.buildDocumentObject('ABC+DEF:def+GH:gef\'IJK+LM:nop:q21')

>>> doc.ABC

<Segment object at 0x000000000000002B>

>>> doc.ABC.Element.DataElement

'DEF'
>>> doc.ABC.Element.DataElement2

'def'

Notice that when I write doc.ABC.El<TAB>, the tab completion of the Python console lets me easily cycle through the elements of the ABC segment. This will be immensely valuable: given a sample EDI message of some new type, tab completion will guide me when I’m implementing the logic for processing such messages.
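
The same discovery can be done in code. Attributes attached with setattr show up in dir() and vars(), and, as far as I can tell, this kind of runtime listing is what console completion typically builds on. A minimal sketch, using a stand-in Segment object rather than a parsed message:

```python
class Segment(object):
    pass


seg = Segment()
setattr(seg, 'Element', 'DEF:def')
setattr(seg, 'Element2', 'GH:gef')

# vars() returns only the attributes stored on the instance itself
dynamic_names = sorted(vars(seg))
print(dynamic_names)

# Roughly what typing seg.El<TAB> narrows the candidates down to
completions = [name for name in dir(seg) if name.startswith('El')]
print(completions)
```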