Tuesday, November 21, 2006
For want of a '++' or When will I ever learn
Last night etc3 wrote its first piece from a grammar with two syntactic structures. Prior to that it had written pieces using different structures, but only from grammars that contained one of them. The code to handle two was there; it just hadn't been exercised (or so I thought). I spent much of yesterday writing a GUI to manage grammar content, so I wouldn't have to keep keying it in via SQL commands. I got that to work and used it to link two structures to a single grammar. (I'd previously tested each independently.) This was the first important test and, of course, the program crashed right out of the box.
The call stack showed a problem in the constructor of one of the TAG nodes (a nominal-pronoun node). The constructor was looking for a "composition preferences" object that doesn't get linked to a tree until the complete grammar loads. I should have known better: Never do work in a constructor. (During testing of that node, I'd hard-coded a preference object where the node could see it, but after the test I moved it to where it should be, as an attribute of a "composition plan" object.) OK, that's easy to fix, right? Just extract a method that uses that object and call it from the places where the node wants some direction on how it should realize itself.
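For readers who want to see the shape of that fix, here is a minimal sketch in Java. Every name in it (ConstructorSketch, CompositionPlan, CompositionPreferences, NominalPronounNode, realize, preferPronoun) is hypothetical and not taken from etc3, and it assumes the node can reach the preferences through the plan. The constructor only stores a reference; the lookup is deferred to the extracted method, which runs after the grammar has loaded.

```java
// Hypothetical sketch; none of these names come from etc3.
class ConstructorSketch {

    // Stand-in for the "composition preferences" object.
    static class CompositionPreferences {
        final boolean preferPronoun;

        CompositionPreferences(boolean preferPronoun) {
            this.preferPronoun = preferPronoun;
        }
    }

    // Stand-in for the "composition plan"; its preferences are linked late.
    static class CompositionPlan {
        private CompositionPreferences preferences; // null until the grammar has loaded

        void setPreferences(CompositionPreferences preferences) {
            this.preferences = preferences;
        }

        CompositionPreferences getPreferences() {
            return preferences;
        }
    }

    static class NominalPronounNode {
        private final CompositionPlan plan;

        // The constructor does no work: it only remembers where to look later.
        NominalPronounNode(CompositionPlan plan) {
            this.plan = plan;
        }

        // The extracted method: preferences are consulted only when the node realizes itself.
        String realize(String noun, String pronoun) {
            CompositionPreferences prefs = plan.getPreferences();
            if (prefs == null) {
                throw new IllegalStateException("composition preferences not linked yet");
            }
            return prefs.preferPronoun ? pronoun : noun;
        }
    }

    public static void main(String[] args) {
        CompositionPlan plan = new CompositionPlan();
        NominalPronounNode node = new NominalPronounNode(plan); // safe: nothing is looked up here
        plan.setPreferences(new CompositionPreferences(true));  // linked once loading is complete
        System.out.println(node.realize("the composer", "she")); // prints "she"
    }
}
```

With this arrangement, building the node before the preferences exist is harmless; only a call to realize() made too early would fail, and it fails with a message that says why.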
Still crashed.
This time the problem appeared to be in the grammar factory, which uses a node factory to build up its TAG trees. From the stack trace it looked like the grammar needed to know about the preferences object and the preferences object about the grammar. This was not good. Bidirectional associations almost always mean a seriously flawed design. Oh boy.
An hour of stepping the code: There it was. The code that loads the trees first gets the names of all of the trees for a given grammar, then gets the Xml for each tree (since a tree is just nested nodes, Xml was the logical choice), which it stores in an array allocated to the length of the list of names. The problem was that in spinning the array of names, I wasn't incrementing the index into the array of Xml definitions, so the second definition just overwrote the first. The second slot stayed null, and when the grammar tried to load it, it got very confused.
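The loop described above is easy to reconstruct in miniature. What follows is a sketch of the pattern in Java, not the actual etc3 code: the class, the method names, the tree names and the map standing in for the Xml store are all made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from etc3.
class TreeLoaderSketch {

    // Buggy version: the write index never advances, so every definition lands in slot 0.
    static String[] loadBuggy(List<String> names, Map<String, String> store) {
        String[] xml = new String[names.size()];
        int i = 0;
        for (String name : names) {
            xml[i] = store.get(name); // missing ++
        }
        return xml;
    }

    // Fixed version: the postfix increment moves to the next slot after each write.
    static String[] loadFixed(List<String> names, Map<String, String> store) {
        String[] xml = new String[names.size()];
        int i = 0;
        for (String name : names) {
            xml[i++] = store.get(name);
        }
        return xml;
    }

    public static void main(String[] args) {
        List<String> names = List.of("treeA", "treeB");
        Map<String, String> store = Map.of(
                "treeA", "<node id='a'/>",
                "treeB", "<node id='b'/>");

        System.out.println(Arrays.toString(loadBuggy(names, store)));
        // prints [<node id='b'/>, null]
        System.out.println(Arrays.toString(loadFixed(names, store)));
        // prints [<node id='a'/>, <node id='b'/>]
    }
}
```

With a single tree the two versions behave identically, which would explain why a grammar with only one structure never exposed the problem.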
This kind of thing happens all the time and this post isn't just an old and grey programmer spinning yarns. What interests me about this phenomenon is how seriously a minor flaw, xml[i] instead of xml[i++], can completely derail a large software development project. If this were such a project, there would be one team writing the Xml interface, another the grammar, another the rules of composition and so on. Some investigation would show that the problem does not appear to be isolated to a particular component, but is a function of the interaction of two or more components. So we have meetings to figure out what each other is doing, review the UML that supports the programming, assign testers to devise more test cases to verify that there is indeed a general problem not specific to a given grammar, and on and on and on.
Brooks noted that most programmers think they can build systems more quickly than large teams. But what they do better is write better programs. Systems are a whole 'nother thing. But sometimes, when I catch a problem such as the one I wrestled with last night, I wonder if maybe the programmer is on to something.