Monday, July 24, 2006

 

1 / (common sense) = animus


My post last week goofing on entropy wasn't entirely tongue-in-cheek. And it looks as if there may actually be something to it. I've updated the ETC software to experiment with the idea. The original programming selected bigrams using a simple weighted selection--the more often a bigram appeared in the model, the more likely it was to be selected within an utterance. There were a couple of reasons for the approach. First, I'd hypothesized that poetry wants to be mostly normal text, with unexpected variations here and there. Second, low-frequency bigrams seemed to trash the poetics.
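For the curious, here is a minimal sketch of frequency-weighted selection. It is not the ETC code; the function name pick_bigram_weighted and the toy counts are made up for illustration, and it assumes the model is just a table mapping each bigram to the number of times it was seen.

```python
import random
from collections import Counter

def pick_bigram_weighted(counts, rng=random):
    """Pick one bigram with probability proportional to its count.

    `counts` maps a (word1, word2) tuple to how often it was seen.
    This is an illustrative stand-in, not the actual ETC selection code.
    """
    bigrams = list(counts)
    weights = [counts[b] for b in bigrams]
    # random.choices draws with replacement according to the weights.
    return rng.choices(bigrams, weights=weights, k=1)[0]

# Hypothetical toy model: one common bigram, one rare one.
toy_counts = Counter({("the", "sea"): 9, ("sea", "weeps"): 1})
print(pick_bigram_weighted(toy_counts))
```

With these toy counts, ("the", "sea") should come up about nine times out of ten, which is the "mostly normal text" behavior described above.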

This low-frequency distortion turned out not to be "normal." What I was seeing was the result of tagging mistakes and anomalies in the source text (British National Corpus): mispunctuated compound sentences, where the two words really belonged to two different sentences and so did not form a natural bigram; spelling errors; phonetically contrived attempts to capture an accent in a literary passage; and so on. A closer look at the model revealed that about 30% of the volume under the distribution curve consisted of bigram pairs with a frequency of one! I trashed all of those and the output was much less distorted. Now that the model is a better representation of language as it's "normally" used, I can more easily reason about things like Barthes' poetic zone of speech within the model's constraints.
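A rough sketch of that cleanup step, under a couple of assumptions: the names singleton_share and prune_singletons are invented here, and "volume under the curve" is read as the share of all bigram occurrences that belong to bigrams seen exactly once. The real model may have measured it differently.

```python
from collections import Counter

def singleton_share(counts):
    """Fraction of all bigram occurrences that come from bigrams seen once."""
    total = sum(counts.values())
    singles = sum(1 for c in counts.values() if c == 1)
    return singles / total if total else 0.0

def prune_singletons(counts, min_count=2):
    """Return a new table keeping only bigrams seen at least min_count times."""
    return Counter({b: c for b, c in counts.items() if c >= min_count})

# Hypothetical toy model, not BNC data.
toy_counts = Counter({("the", "sea"): 7, ("sea", "is"): 2, ("sea", "weeps"): 1})
print(singleton_share(toy_counts))          # 1 occurrence out of 10
print(sorted(prune_singletons(toy_counts)))
```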

So I changed the bigram selection code to favor infrequent bigram pairs. That's what's running now. We'll see if the poetry is any better.
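One simple way to favor infrequent pairs is to weight each bigram by the inverse of its count. That is only a guess at the general idea, not the formula actually running in ETC; pick_bigram_inverse and the toy counts are hypothetical, and the table is assumed to have been pruned of frequency-one bigrams already.

```python
import random
from collections import Counter

def pick_bigram_inverse(counts, rng=random):
    """Pick one bigram with probability proportional to 1 / count.

    Rare bigrams get the larger weights, so they are chosen more often.
    Inverse weighting is an assumption made for this sketch.
    """
    bigrams = list(counts)
    weights = [1.0 / counts[b] for b in bigrams]
    return rng.choices(bigrams, weights=weights, k=1)[0]

# Hypothetical pruned model: every bigram was seen at least twice.
toy_counts = Counter({("the", "sea"): 9, ("sea", "weeps"): 2})
print(pick_bigram_inverse(toy_counts))
```

Here ("sea", "weeps") gets weight 1/2 against 1/9 for ("the", "sea"), so the rarer pair wins most draws. Gentler curves (say, one over the square root of the count) would keep more of the "normal" text in the mix.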

Sunday, July 23, 2006

 

More of why ALG is hard


I've been silent here for quite a while. I've been trying to meet a project deadline that has me ruminating on the interdisciplinarity of poetry generation. Here's what's open and scattered about my desk:
ALG is hard because there's just too much to know in fields so disparate from one another that they seem to not even speak the same language.

Saturday, July 15, 2006

 

I've been thinking too much


If this is true:

And if this is true:
Syntagmatic freedom is clearly related to certain aleatory factors: there are probabilities of saturation of certain syntactic forms by certain contents… This phenomenon is called catalysis; it is possible to imagine a purely formal lexicon which would provide, instead of the meaning of each word, the set of other words which could catalyse it according to possibilities which are of course variable—the smallest degree of probability would correspond to a ‘poetic’ zone of speech. (Barthes, Elements of Semiology)
Does it follow that poetic value could be explained with this?:







Wednesday, July 05, 2006

 

A tool that should be in every ALG programmer's utility belt


I taught middle and high school English for 12 years. You know the drill: lots of spelling, vocabulary and grammar, composition practice, some literature, and a different Shakespeare play every year. But mostly grammar and usage. So much so that I was sure I knew enough about English grammar to embark on my first ALG project and build its grammar just from what was in my head. So wrong! It took Quirk et al.'s A Comprehensive Grammar of the English Language to get me anywhere near the place I needed to be to construct a grammar out of OOP classes. Eventually I tired of playing dueling recall requests with whoever else needed the library's copy and bought my own. I consult it every day that I work on an ALG project. It's well worth the price.
