Tuesday, December 26, 2006
LTAG, you're it!
Consider this recent poem from the ETC Web site:
Busy
Tooth is a skirt.
Acquaintance and the crowd.
Worn, dines Jim.
The meddling intellect allows short-term fluctuation.
Wines in three good years.
Considerates in mathematics.
Wines hand.
Considerates.
I am no major threat.
This sample illustrates several problems in aesthetic text generation. In the first line, tooth lacks a leading determiner. Obviously, a better (and more "correct") line would be A tooth is a skirt. The problem is that ETC2's phrase structure grammar, being context-free, can't distinguish between nouns that take determiners and those that don't. Courage is a skirt works and A tooth is a skirt works, but A courage is a skirt and Courage is skirt don't.
The difference is that courage is an abstract noun, while tooth and skirt are concrete. ETC2 tries to normalize determiners by looking at the frequency distributions of the various determiners associated with different nouns, and it does this at runtime. That allows for all kinds of errors. It won't work well if the parsing algorithm doesn't. It won't work well if the frequencies are skewed because some words' contexts are overrepresented in the concordance. And since no parsing algorithm is flawless, and since every body of text misrepresents the language as a whole ("all words are rare"), there will always be errors.
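To make the runtime approach concrete, here is a minimal Python sketch. It assumes the concordance has already been boiled down to (determiner, noun) pairs and that the generator simply picks each noun's most frequent determiner. The names (OBSERVED, determiner_table, noun_phrase) and the counts are made up for illustration; this is not ETC2's code, and ETC2 may well sample from the distribution rather than take the maximum.

```python
from collections import Counter, defaultdict

# Hypothetical (determiner, noun) observations pulled from a parsed concordance.
# "" stands for "no determiner".
OBSERVED = [
    ("a", "tooth"), ("a", "tooth"), ("a", "tooth"), ("the", "tooth"), ("the", "tooth"),
    ("", "courage"), ("", "courage"), ("", "courage"), ("the", "courage"),
]


def determiner_table(pairs):
    """Map each noun to a Counter of the determiners observed in front of it."""
    table = defaultdict(Counter)
    for det, noun in pairs:
        table[noun][det] += 1
    return table


def noun_phrase(noun, table):
    """Prefix the noun with its most frequent determiner (none if the noun was never seen)."""
    counts = table.get(noun)  # .get() does not create an empty entry in the defaultdict
    det = counts.most_common(1)[0][0] if counts else ""
    return f"{det} {noun}".strip()


table = determiner_table(OBSERVED)
for noun in ("tooth", "courage", "skirt"):
    print(noun_phrase(noun, table))
```

Here tooth comes out as "a tooth" and courage stays bare, but skirt, which never appeared in the toy concordance, also comes out bare. A concordance stuffed with "the courage of his convictions" would likewise flip courage to "the courage". Those are exactly the two failure modes just described.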
And so Erica realizes that, at this stage of her development, she is no major threat. Making her a threat becomes possible through a lexicalized tree adjoining grammar (LTAG). I posted about basic TAGs last month. LTAG goes a step further and ties the grammar's trees to categories of words (right down to individual words). The adjoining operation itself is what gives TAGs a bit more power than a context-free grammar, a property usually described as mildly context-sensitive. Take seem as an example. Ought to be easy, right? Not. It can't be used in an expansion such as VP->V-NP, because it's intransitive (copulative, actually). So just define separate word categories for transitive and intransitive verbs. Still not. Be is the ultimate intransitive. So She is beautiful and She is a friend both work. She seems beautiful is OK. But She seems a friend is not (or only marginally; the sentence really wants to be She seems to be a friend). More interesting, She is bleeding works, but She seems bleeding doesn't. Seem wants real adjectives, not participles, which suggests that is bleeding is really a progressive verb form rather than copular be plus a complement.
LTAG offers a way past all of this. Just give seem its own node in the grammar and tag it as requiring adjoining. Then define substitution trees for seem and feed the result into the adjoining algorithm. And SHAZZAM, it works! (It really does--we got it working last week.)
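For readers who haven't seen the two TAG operations side by side, here is a toy Python sketch of substitution (plugging an initial tree into a ↓ site) and adjunction (splicing an auxiliary tree in at an interior node, with the displaced subtree hanging from the auxiliary tree's foot node). It is not our implementation. The tree shapes are assumptions loosely modeled on XTAG-style analyses: the predicate (beautiful, bleeding) anchors the sentence tree, is and seems are VP auxiliary trees, and a hypothetical pred feature plus a requires argument stand in for the constraint that seem only adjoins above a real adjective.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                   # category (S, NP, VP...) or, at a leaf, a word
    children: list[Node] = field(default_factory=list)
    subst: bool = False                          # substitution site, written NP↓ in TAG notation
    foot: bool = False                           # foot node of an auxiliary tree, written VP*
    pred: str | None = None                      # hypothetical feature: kind of predicate below


def leaves(node):
    """Yield the words along the frontier, skipping unfilled ↓ and * nodes."""
    if not node.children:
        if not (node.subst or node.foot):
            yield node.label
        return
    for child in node.children:
        yield from leaves(child)


def sentence(tree):
    return " ".join(leaves(tree))


def substitute(tree, initial):
    """Copy tree, replacing the first ↓ site whose label matches initial's root."""
    tree = copy.deepcopy(tree)

    def walk(node):
        for i, child in enumerate(node.children):
            if child.subst and child.label == initial.label:
                node.children[i] = copy.deepcopy(initial)
                return True
            if walk(child):
                return True
        return False

    if not walk(tree):
        raise ValueError(f"no {initial.label}↓ site to substitute into")
    return tree


def find_site(node, label, requires):
    """First interior node (pre-order) where an auxiliary tree rooted in label may adjoin."""
    if node.children and node.label == label and (requires is None or node.pred == requires):
        return node
    for child in node.children:
        site = find_site(child, label, requires)
        if site is not None:
            return site
    return None


def find_foot(node):
    if node.foot:
        return node
    for child in node.children:
        foot = find_foot(child)
        if foot is not None:
            return foot
    return None


def adjoin(tree, aux, requires=None):
    """Copy tree and splice aux in at the first matching node; the old subtree moves to aux's foot."""
    tree, aux = copy.deepcopy(tree), copy.deepcopy(aux)
    site = find_site(tree, aux.label, requires)
    if site is None:
        raise ValueError(f"no {aux.label} node with pred={requires!r} to adjoin to")
    foot = find_foot(aux)
    foot.children, foot.pred, foot.foot = site.children, site.pred, False
    site.children, site.pred = aux.children, aux.pred
    return tree


# Elementary trees (illustrative shapes only).
she = Node("NP", [Node("she")])
beautiful = Node("S", [Node("NP", subst=True),
                       Node("VP", [Node("A", [Node("beautiful")])], pred="adj")])
bleeding = Node("S", [Node("NP", subst=True),
                      Node("VP", [Node("V", [Node("bleeding")])], pred="prog")])
is_aux = Node("VP", [Node("V", [Node("is")]), Node("VP", foot=True)])
seems_aux = Node("VP", [Node("V", [Node("seems")]), Node("VP", foot=True)])

print(sentence(adjoin(substitute(beautiful, she), seems_aux, requires="adj")))  # she seems beautiful
print(sentence(adjoin(substitute(bleeding, she), is_aux)))                      # she is bleeding
try:
    adjoin(substitute(bleeding, she), seems_aux, requires="adj")
except ValueError as err:
    print("blocked:", err)
```

The last call raising ValueError is the point: with the constraint in place, She seems bleeding simply never gets generated, while is adjoins happily to either tree.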
There are some complications in getting there, for sure. We have to capture more attributes for most word categories. We have to weight tree selections, so that rarely used constructions don't become as legitimate as frequently used ones. And these tasks require more analytic intervention into the part-of-speech tagging process and a more complex database schema. But keeping the semantic model small makes this manageable.
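Weighting tree selection can be as simple as drawing trees in proportion to how often each construction shows up in the tagged corpus. Here is a minimal sketch, assuming per-anchor counts are already available; the tree names and numbers are hypothetical, and proportional sampling is just one reasonable choice, not necessarily what ETC does.

```python
import random
from collections import Counter

# Hypothetical corpus counts for trees anchored by "seems" (names are illustrative).
TREE_COUNTS = {
    "seems+AP": 120,      # She seems beautiful
    "seems+to-VP": 300,   # She seems to be a friend
    "seems+NP": 3,        # She seems a friend (marginal)
}


def pick_tree(counts, rng=random):
    """Pick a tree name with probability proportional to its count."""
    names = list(counts)
    return rng.choices(names, weights=[counts[n] for n in names], k=1)[0]


rng = random.Random(2006)  # seeded so runs are repeatable
tally = Counter(pick_tree(TREE_COUNTS, rng) for _ in range(10_000))
print(tally.most_common())
```

With these counts the marginal She seems a friend construction still turns up now and then (3 chances in 423), so it stays possible without becoming as legitimate as the common ones.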
The really neat thing about an LTAG implementation, however, is that it means we can deploy the monster and let its anomalies surface through use. When we find one, all we have to do is define whatever lexicalized trees are needed and slam them into the grammar (stored in the DB). No new code. No recompilation. No redeployment. Sweet!
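Keeping the grammar as data is what makes the no-recompile patch possible. Here is a minimal sketch using Python's built-in sqlite3 module, assuming a hypothetical elementary_trees table with trees serialized as JSON; the schema is illustrative, not the one ETC actually uses.

```python
import json
import sqlite3

# Hypothetical schema: one row per lexicalized elementary tree.
SCHEMA = """
CREATE TABLE IF NOT EXISTS elementary_trees (
    id     INTEGER PRIMARY KEY,
    anchor TEXT NOT NULL,                                   -- word that anchors the tree
    kind   TEXT NOT NULL CHECK (kind IN ('initial', 'auxiliary')),
    tree   TEXT NOT NULL,                                   -- nested lists serialized as JSON
    weight REAL NOT NULL DEFAULT 1.0                        -- selection weight
)
"""


def add_tree(conn, anchor, kind, tree, weight=1.0):
    """Patch the grammar: a newly discovered construction is just a new row."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO elementary_trees (anchor, kind, tree, weight) VALUES (?, ?, ?, ?)",
            (anchor, kind, json.dumps(tree), weight),
        )


def trees_for(conn, anchor):
    """Read every tree anchored by a word, fresh from the table on each call."""
    rows = conn.execute(
        "SELECT kind, tree, weight FROM elementary_trees WHERE anchor = ? ORDER BY id",
        (anchor,),
    )
    return [(kind, json.loads(tree), weight) for kind, tree, weight in rows]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
add_tree(conn, "seems", "auxiliary", ["VP", ["V", "seems"], "VP*"], weight=120)
print(trees_for(conn, "seems"))
```

Because trees_for reads the table every time it is called, a tree added with add_tree is visible to the very next generation request, with nothing to rebuild or redeploy.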
Wednesday, December 06, 2006
Advice to fledgling poets
Just how does one go about determining whether computer-generated poetry can compete with conventionally written poetry? Early on in this project I tried a number of ways. I distributed copies of machine-generated texts alongside poems written by people and asked the audience to rank each poem according to their confidence that it was human or machine in origin. I distributed mixed bundles of machine and human poetry to various audiences (undergraduate classes at two universities, graduate classes, and, most interesting of all, the members of a high-school English faculty) without telling them that some of the poems were not of human origin, and asked them to rank the poems in order of quality. What I was attempting to do was to apply the principles of ethnography to the question of competitiveness, looking for a way to define “control” and “test” subject groups and trying to develop a null hypothesis (something about readers never choosing a machine poem over a human poem). But the attempt was hopeless. There weren’t enough samples to get any kind of reasonable distribution of results, and no matter how I constructed a test, it could always be argued that the null hypothesis was rejected: someone inevitably assigned at least one machine poem a very high rank. I was going about this the wrong way.
If I were claiming that machine poetry could compete with conventional poetry, then I ought to be competing as conventional poets compete—for publication space.
I started composing texts using ETC Version 1 and sending them out to little magazines and literary journals. I enhanced ETC to generate poems by the hundreds, from which I could select the ones that seemed to me most likely to be considered for publication. Sometimes I extracted the best lines from several poems on the same subject, merging them into a single composition. At other times I would work interactively with the software. I would have it generate a few lines. If the results held little promise, I discarded them and started over. If they did look promising, I’d ask for a few more lines, perhaps adjusting the input parameters based on the new text. Working back and forth like that, I could easily “write” a half dozen poems in an evening. And then I’d edit, correcting grammar and mechanics, adjusting tense and number, replacing prepositions (ETC 1 did a truly abominable job with prepositions). I’d make a few word substitutions, replacing a prosodic word here and there with words containing long vowel sounds. And in the process, if I thought a poem wanted some help in articulating a particular aesthetic or political stance, I’d accommodate it, nudging the poem in one direction or another.
It was through this process that Erica came to find her identity. I’d chosen the name for its initials, matching the name of the software, ETC. Erica T. Carter was the first name I thought of and, for whatever reason, I kept it. Erica evolved from persona to personality. Guiding the machine’s poems to slick finishes meant imposing my own judgments and aesthetic preferences on them. These couldn’t be poetic adjuncts, since I’m no poet, but rather calculated editorial changes. How could this or that piece be bent just a little into just the right distortion a particular editor might accept?
Sometimes a poem would seem to want to be a love poem. Given my limited imagination and long-standing heterosexuality, I didn’t even try to guess what it would be like to be a woman loving a man and so just expressed the object of the speaker’s attentions as she. (ETC 1 did not handle personal pronouns.) And Erica became a lesbian. I became bolder in writing the cover letters. Erica became a disciple of Mary Magdalene and little by little a stiletto-heel feminist. She rides a motorcycle. She hates cats. She lives in the woods with her companion Rose (my wife's name). She can't hold a job. She drinks too much. She is irreverent, just a little bit trashy, and serious about her writing. I chose the publications with some care, sending each poem to journals whose editorial selections and practices coincided with its seeming theme: Political poetry to activist journals. Pastiche to the heady youngsters. Feminist positions to gender-centric journals (this tended to work fairly well, I suspect because of Erica’s sexual orientation, which she hinted at in her cover letters). Abstractions to experimentalists.
With scores of poems going out to dozens of publications, I needed some way to keep track. So I developed a database of publications and poems that I could query to learn just which poems were currently outstanding and which journals had rejected or accepted which poems. The project became process: Compose poetry, research potential targets, match the work up with one of a dozen stock cover letters (edit the letter to taste), mail, record in the database. This was not about aesthetics. It was about procedure and process documentation, which is what business people do very well, I among them. And through process came my first understandings about the game of publication.
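A tracking database of this kind doesn't need much. Here is a minimal sketch in Python's sqlite3, assuming three hypothetical tables (journals, poems, submissions) and placeholder sample rows; it is not the actual database described above.

```python
import sqlite3

# Hypothetical schema for tracking submissions.
SCHEMA = """
CREATE TABLE journals (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE poems    (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE submissions (
    poem_id    INTEGER NOT NULL REFERENCES poems(id),
    journal_id INTEGER NOT NULL REFERENCES journals(id),
    sent_on    TEXT NOT NULL,                     -- ISO date
    status     TEXT NOT NULL DEFAULT 'outstanding'
               CHECK (status IN ('outstanding', 'accepted', 'rejected'))
);
"""


def outstanding(conn):
    """Which poems are currently out, and where, oldest first."""
    return conn.execute("""
        SELECT p.title, j.name, s.sent_on
        FROM submissions s
        JOIN poems p    ON p.id = s.poem_id
        JOIN journals j ON j.id = s.journal_id
        WHERE s.status = 'outstanding'
        ORDER BY s.sent_on
    """).fetchall()


def acceptance_rate(conn):
    """Accepted submissions as a fraction of those that have received a decision."""
    accepted, decided = conn.execute("""
        SELECT SUM(status = 'accepted'), SUM(status != 'outstanding')
        FROM submissions
    """).fetchone()
    return accepted / decided if decided else 0.0


# Placeholder sample data.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO journals (id, name) VALUES (?, ?)", [(1, "Journal A"), (2, "Journal B")])
conn.executemany("INSERT INTO poems (id, title) VALUES (?, ?)", [(1, "Poem 1"), (2, "Poem 2")])
conn.executemany(
    "INSERT INTO submissions (poem_id, journal_id, sent_on, status) VALUES (?, ?, ?, ?)",
    [(1, 1, "2006-01-10", "accepted"), (2, 1, "2006-01-10", "rejected"),
     (2, 2, "2006-02-01", "outstanding")],
)
print(outstanding(conn))      # [('Poem 2', 'Journal B', '2006-02-01')]
print(acceptance_rate(conn))  # 0.5
```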
I researched the first publications I sent to with a great deal of care. In addition to reviewing their Web sites, I read the journals. I researched the editors. I looked at the poetry of contributors beyond what they'd published in the journal. Which got me to The Shattered Wig Review, editor: one Sonny Bodkin.
I hadn’t read any issues of Shattered Wig but thought it promising because it promotes itself as “…hav[ing] a penchant for hardboiled surreal absurdity, but will accept anything that hits the target with a solid twang” (Little Magazines & Small Presses). Shattered Wig was Erica’s first placement. Her cover letter began with this:
Dear Mr. Bodkin:
Bodkin is your real name, right? Isn’t that the guy who seduced James Joyce’s wife into marrying Joyce by dying (because Joyce after that reminded her of death by tuberculosis)? Something like that. Not you, I’m guessing.
Erica included three prose poems (since her prose pieces seemed a bit more experimental than her open form poems). And “Sonny” responded:
Well, first off you are the grand prize literary winner! In all the years I’ve been using Sonny Bodkin as an editorial name, you’re the first to recognize it! Secondly I’m really intrigued as to whether these are computer generated. They seem a little on the “warm” side to be completely generated. Anyway, no matter how they came about, I want to include “The stones of west Wyoming” and “In the unappeasable small hours” in Shattered Wig Review #22, due out in February. Thanks for the intrigue & unveiling the Bodkin riddle.
Rupert Wondolowski
BTW: Wondolowski is the only editor to suspect computational intervention. I'd dropped a couple of bread crumbs in the cover letter. After that I didn't.
I received Erica’s complimentary copy and was surprised at the generally high quality of the work. Pastiche, to be sure, but some delightful little experimental riffs as well. The issue’s contributors had some gravitas: John M. Bennett, with a master’s degree in English and a PhD in Latin American literature, has published over 200 books, chapbooks, and audio recordings, and has appeared in TapRoot Reviews. Rob Cook is editor of Skidrow Penthouse. Sarah Fox has a published collection and her poetry has appeared in Jacket and jubilat, among others. John Colburn, with an MFA from the University of Minnesota, is the editor of Spout Magazine. Dan Raphael has an undergraduate degree in English from Cornell and an MFA from Bowling Green and has published a half-dozen or so chapbooks.
An encouraging start—except Erica’s pieces were not all in the editorial vein of the rest of the work. Where the others had a mischievous gleam in their eyes, Erica was serious. Where they flouted convention, Erica by comparison embraced it. Her poems should not have made it into Shattered Wig. It was obvious that the cover letter let her in.
Over time I learned that on average every eighth submission garnered an acceptance and that there was no evidence whatsoever that "quality" had anything at all to do with a poem's placement. We were competitive, for sure. But why and how remain very much open questions.
Now the folks who disdain (or fear) machine poetry are heartened by this observation, which kind of raises the question of how they found publication space, doesn't it?
Monday, December 04, 2006
Between the poet and the machine lies the shadow
Resistance to machine poetry takes two forms. The first is simply to ignore it. New Media Poetics, a collection of essays by seventeen experts in the field, makes not a single reference to generated work; digital poetics is the province of the human artist. As Marjorie Perloff declares in her essay, "Screening the Page/Paging the Screen," "No medium of technique or production can in itself give the poet (or any other kind of artist) the inspiration or imagination to produce works of art."
What every single one of these writers believes is that the new media made possible through computational technology are all expressive media--raw material the artist molds into art. And it is certainly that. The computer makes available to the artist a plasticity of material never before possible--the opportunity for, not hundreds, but thousands of visions and revisions. But the computer is so much more--it affords the possibility of invented intelligence, not just invented visions--the computer as expressing medium.
The scariest thing of all: The existence of artificial intelligence implies the possibility of an artificial poet. The real threat is annihilation--if the critic is the poet's shadow, then when the poet disappears, so does he.
Part II tomorrow.