Wednesday, June 14, 2006

 

Once more into the breach


I've begun iteration four of the poetry engine. The current version is OK, but decidedly flawed. It grew as a response to a couple of problems in its immediate ancestor. First, the ancestor's poetry became repetitive, a problem I traced to the paucity of words from which it could work. I used the Brown corpus as the text source in that version. With "only" a million words, the engine soon began reusing some of them--a great many bigrams appeared only once in that corpus, so selecting a word by the context of its neighbor usually returned the same partner word.
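To make the bigram problem concrete, here is a toy sketch in Python. It is not the engine's code; the function names and the tiny sample text are made up for illustration. It counts how many distinct bigrams occur exactly once and lists the context words that leave no choice at all about what comes next.

```python
from collections import Counter, defaultdict


def bigram_stats(tokens):
    """Return (bigram counts, map from each word to its set of successors)."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    successors = defaultdict(set)
    for first, second in bigrams:
        successors[first].add(second)
    return bigrams, successors


def singleton_fraction(bigrams):
    """Fraction of distinct bigrams that occur exactly once."""
    if not bigrams:
        return 0.0
    singles = sum(1 for count in bigrams.values() if count == 1)
    return singles / len(bigrams)


def forced_contexts(successors):
    """Words that are followed by only one distinct word in the corpus."""
    return {word for word, nxt in successors.items() if len(nxt) == 1}


# Hypothetical miniature "corpus", standing in for a real one.
tokens = "the cat sat on the mat and the dog sat on the rug".split()
bigrams, successors = bigram_stats(tokens)

print(singleton_fraction(bigrams))          # share of one-off bigrams
print(sorted(forced_contexts(successors)))  # contexts with a single successor
```

In a small corpus most context words land in the "forced" set, so a generator that picks the next word from observed successors has nothing to choose between, and the same pairings keep coming back.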

The current version uses the British National Corpus. With 85,000,000 usable words, it rarely repeats itself, at least semantically. But now the engine is woefully repetitive in structure and style. And the content of the works tends to a kind of high-level quotidian, a result of so much of the text's being drawn from journalistic sources where the practicalities of government and business dominate. So there are lots of references to monetary amounts and parliament.

And performance suffered. The datasets are huge and take a long time to retrieve.

But worst is the weakness of structure. This is a serious design problem. Though the current system implements a "structure" class that attempts to stitch together the pieces of a poem into a compositional whole, it is only and completely semantically based. No question and answer. No hinges. No shifting speaker. And the machine only generates one type of poem, a mildly abstract sort of free verse. No MFA specials, no radical abstractions, no sonnets.

My understanding of how these monsters can best be designed has followed a kind of geodesic pattern. From a design hypothesis I've developed a system, then examined its output. Discovering weakness in content and form leads to a better design hypothesis, which I then test as a functioning system. The problem is that going from recognizing basic flaws to delivering a working system takes longer and longer. The last go-round took over a year. I anticipate about that much time for this new version.

I welcome any reader comments on where Erica goes wrong and suggestions on how she could do better.
