Wednesday, January 31, 2007

 

How to tell a treasure? By its reaction.


Over at Grand Text Auto, Nick Montfort has some quite interesting commentary on the Etc3 beta. As usual, Nick is dead on the mark, and he underscores the challenges of getting from the conceptual design of a poetry machine to a fully programmed implementation.

Perhaps the greatest challenge is formulating a test plan. It's hard enough testing deterministic software (one occasionally longs for the simplicity of the fully compliant APR calculation), but when the entire system is nondeterministic, testing can be a nightmare. And one thing I've learned for sure on this journey is that an undetected software defect in a poetry generation system will come back and humiliate you at exactly the moment you are showing off. So how can we begin?
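None of this is Etc3's actual code, but the usual first line of defense is worth sketching: thread a seeded random source through the generator, so any "nondeterministic" run can be replayed exactly in a test. A minimal Python sketch, with `compose_line` as a hypothetical stand-in for a real generation step:

```python
import random

def compose_line(rng):
    """Hypothetical stand-in for one stochastic composition step."""
    nouns = ["beggary", "onyx", "politeness", "cashmere"]
    templates = ["{} and hate", "Of {}", "{}"]
    return rng.choice(templates).format(rng.choice(nouns))

# Two generators seeded identically replay the same "random" poem,
# which turns a nondeterministic pipeline into a testable one:
a, b = random.Random(42), random.Random(42)
assert [compose_line(a) for _ in range(10)] == [compose_line(b) for _ in range(10)]
```

A defect found in the wild can then be reproduced later simply by logging the seed that produced it.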

First off, we need to find out whether the basic premise of this version is sound: that the grammar is where everything of importance lives--that the grammar is pretty much the system. So how do you test a grammar? The first thing you need is a form upon which to impose the grammar, being mindful that we want the final poem to vary from other machine poems and from the poems of the more general world. To accomplish all of that, we know that in the final deployed version there must be some stochastic selection of grammatical choices within a given form. But building that form and giving it choices means we can never be sure that a given bit of defined syntax will ever be exercised.
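That stochastic selection might look roughly like this in Python; the structure names and weights below are invented for illustration, not taken from Etc3:

```python
import random

# Invented syntactic structures with selection weights (illustrative only).
STRUCTURES = {
    "NP_CONJ_NP": 3,  # e.g. "Beggary and hate"
    "PP_NP":      2,  # e.g. "Of onyx"
    "NP":         1,  # e.g. "Politeness"
}

def pick_structure(rng=random):
    """Choose one structure for the next line, biased by its weight."""
    names = list(STRUCTURES)
    weights = [STRUCTURES[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]

# Any pick is a defined structure, but which one varies run to run --
# which is exactly why a rarely chosen rule may go untested:
assert pick_structure() in STRUCTURES
```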

To get around this problem, I developed a Grammar Test form. This form consists of a single-stanza poem that looks up all of the possible syntactic structures for a given grammar and then uses each one in turn to construct the stanza. Anytime this form is used, it will produce a "poem" of the exact same grammatical sequence. Thus the three structurally identical poems Nick quotes: "Beggary and hate," "Of onyx," and "Politeness." Nick is right that this kind of repetition impoverishes the machine, but it serves the purpose of removing a bit of the randomness that makes testing so hard.
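In outline, such a test form is just a loop over every defined structure in a fixed order; the realizer and structure names here are invented stand-ins, not Etc3's internals:

```python
# A deterministic "Grammar Test" stanza: exercise every syntactic
# structure exactly once, in a fixed order, instead of sampling.
STRUCTURES = ["NP_CONJ_NP", "PP_NP", "NP"]

def realize(structure):
    """Invented realizer; the real system fills structures from its sources."""
    samples = {
        "NP_CONJ_NP": "Beggary and hate",
        "PP_NP": "Of onyx",
        "NP": "Politeness",
    }
    return samples[structure]

def grammar_test_stanza():
    return [realize(s) for s in STRUCTURES]

# The stanza's shape is identical on every run, so any change in the
# grammatical sequence flags a regression:
assert grammar_test_stanza() == grammar_test_stanza()
```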

Nick also points out that there is too much lexical repetition as well, as in his example, "Of cashmere." What's at work there is a test of grammatical weights. In any set of syntactic structures, we want some of them to be rarer than others. One way to test that is with a two-element grammar, one element weighted high and the other low. That's no guarantee that we'll get the kind of distribution we're after, but it reduces the number of possible outcomes so that we can draw reasonable inferences from the results.
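Such a two-element check could be sketched like this; the 9:1 weighting and the tolerances are assumptions for the example, not Etc3's actual figures:

```python
import random

def sample_counts(n, weights, rng):
    """Draw n structures from a tiny weighted grammar and tally them."""
    names = list(weights)
    w = [weights[k] for k in names]
    counts = dict.fromkeys(names, 0)
    for _ in range(n):
        counts[rng.choices(names, weights=w, k=1)[0]] += 1
    return counts

rng = random.Random(0)
counts = sample_counts(10_000, {"common": 9, "rare": 1}, rng)
# With a 9:1 weighting we expect roughly 9000/1000; a wide tolerance
# keeps the check from flaking on ordinary sampling noise.
assert 8500 < counts["common"] < 9500
assert 500 < counts["rare"] < 1500
```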

Then there's that thing about the interface into the monster. Should the system give its users some kind of control over input parameters, and if so, how much? That's a question I can only guess at right now. I have made myself an absolute rule for the building of Etc3: If I find a defect, either conceptual or structural, anywhere in the monster, I fix it first, before adding anything new. (I learned the hard way with Etc1 and Etc2, where I cut corners to meet deadlines--the result was flawed and irreparable software. Not this time.) The effect of this rule is a continual and aggressive refactoring of the design and code. Putting an interface "out there" would mean refactoring it as well, and I would incur completely unacceptable opportunity costs. Etc3's final coming out would be delayed.

And then there is the fact that I can never know (perhaps it's just me) what sorts of configuration utilities the monster will require until I get enough infrastructure built to make a feature work. For example, until I had a real sense of how weights in a grammar should function, I simply created the weights (stored in a DB) by issuing command-line SQL. Not very user-friendly.
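For illustration only, that kind of command-line weight maintenance might have looked like the SQL below; the table name and columns are guesses, run here through Python's sqlite3 rather than whatever database Etc3 actually uses:

```python
import sqlite3

# A guessed schema for grammar weights; the real Etc3 tables are unknown.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE grammar_weight (
        structure TEXT PRIMARY KEY,
        weight    INTEGER NOT NULL
    )
""")
# The kind of statements one might issue at a SQL prompt:
conn.execute("INSERT INTO grammar_weight VALUES ('PP_NP', 2)")
conn.execute("UPDATE grammar_weight SET weight = 5 WHERE structure = 'PP_NP'")
row = conn.execute(
    "SELECT weight FROM grammar_weight WHERE structure = 'PP_NP'"
).fetchone()
print(row[0])  # 5
conn.close()
```

Workable, but as the post says: not very user-friendly compared to a proper configuration utility.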

So I'm not ready to expose the control a user can have over Etc3 until I'm further along. But I can show you what I use in my own testing. Here's my main test form:


The "databases" are the sources for semantic choices, sort of functioning like lexica. Any composition will be constrained in what it can say by what is in these guys. This was how I got past Etc2's enormous performance problems. Here, instead of a gargantuan source (31,000,000 rows), we use very small models. But we can add new ones forever. (BTW: Processing these source texts requires the use of about a dozen additional utilities, all of which themselves had to be coded and tested.) The composition plans are actually forms. We adopted and adapted NLG's "document plan." Each new form has to be coded. The "Use custom preferences" option provides an interface that creates an override to a form's default, things like tense, and subject person, and x and y and z....
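The "custom preferences" override can be sketched abstractly; `tense` and subject person come from the post, while everything else here (names, defaults, the merge itself) is invented:

```python
# A minimal sketch of form defaults plus a user override, loosely in the
# spirit of an NLG document plan; none of this is Etc3's actual data model.
FORM_DEFAULTS = {"tense": "present", "person": "third", "stanzas": 1}

def effective_preferences(custom=None):
    """Merge custom preferences over the form's defaults."""
    prefs = dict(FORM_DEFAULTS)
    if custom:
        prefs.update(custom)
    return prefs

# An override touches only the fields the user sets:
assert effective_preferences({"tense": "past"})["tense"] == "past"
assert effective_preferences({"tense": "past"})["person"] == "third"
```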

Replicating these widgets in JSP forms just isn't an option at this time, but it should be worth the wait (weight?).

And then there's the utility by which grammars can be formulated. It was only through using early grammars that I became aware of additional attributes a grammar has to have to be usable (e.g., a sample output string would be helpful). Eventually I got to this:



And this:


All in all, everything Nick notices waits in the wings. Whether any of it should be allowed into the spotlight is very much an open question.
