Wednesday, January 31, 2007
How to tell a treasure? By its reaction.
Over at Grand Text Auto, Nick Montfort has some quite interesting commentary on Etc3beta. As usual, Nick is dead-on, and he underscores the challenges in getting from a conceptual design of a poetry machine to a fully programmed implementation.
Perhaps the greatest challenge is formulating a test plan. It's hard enough testing deterministic software (one occasionally longs for the simplicity of the fully compliant APR calculation), but when the entire system is nondeterministic, testing can be a nightmare. And one thing I've learned for sure on this journey is that an undetected software defect in a poetry generation system will come back and humiliate you just at the moment you are showing off. So how can we begin?
First off, we need to find out whether the basic premise of this version is sound: that the grammar is where everything of importance lives--that the grammar is pretty much the system. So how do you test a grammar? The first thing you need is a form upon which to impose the grammar, being mindful that we want the final poem to vary from other machine poems and from the poems of the more general world. In order to accomplish all of that, we know that in the final deployed version there must be some stochastic selection of grammatical choices within a given form. But building that form and giving it choices means that we can never be sure that a given bit of defined syntax will ever be exercised.
To get around this problem, I developed a Grammar Test form. This form consists of a single-stanza poem that looks up all of the possible syntactic structures for a given grammar and then uses each one in turn to construct the stanza. Anytime this form is used, it will produce a "poem" of the exact same grammatical sequence. Thus the three structurally identical poems Nick quotes: "Beggary and hate," "Of onyx," and "Politeness." Nick is right that this kind of repetition impoverishes the machine, but it serves the purpose of removing a bit of the randomness that makes testing so hard.
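For the curious, here's a minimal sketch of that idea in Java: walk every structure the grammar defines instead of sampling at random, so each one is exercised at least once. The class and method names (Grammar, SyntacticStructure, realize) are stand-ins of my own for illustration, not Etc3's actual API.
```java
import java.util.List;

public class GrammarTestForm {

    /** Builds one test stanza by realizing every structure in the grammar, in order. */
    public static String buildTestStanza(Grammar grammar) {
        StringBuilder stanza = new StringBuilder();
        for (SyntacticStructure structure : grammar.allStructures()) {
            // Deterministic traversal: every defined bit of syntax gets exercised once.
            stanza.append(structure.realize()).append('\n');
        }
        return stanza.toString();
    }
}

/** Hypothetical stand-ins, not Etc3's real interfaces. */
interface Grammar {
    List<SyntacticStructure> allStructures();
}

interface SyntacticStructure {
    String realize();
}
```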
Nick also points out that there is too much lexical repetition as well, as in his example, "Of cashmere." What's at work there is a test of grammatical weights. In any set of syntactic structures, we want some of those to be rarer than others. A way to test that is with a two-element grammar, one weighted high and the other low. That's no guarantee that we'll get the kind of distribution we're after, but it reduces the number of possible outcomes so that we can draw reasonable inferences from the results.
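In Java terms, the test amounts to something like the toy below: a two-way weighted choice made many times, so the resulting ratio is easy to eyeball. The 9-to-1 weights here are arbitrary illustrations, not the weights actually used in Etc3.
```java
import java.util.Random;

public class WeightTest {

    public static void main(String[] args) {
        Random random = new Random();
        double commonWeight = 9.0;   // the heavily weighted structure
        double rareWeight = 1.0;     // the lightly weighted structure
        double total = commonWeight + rareWeight;

        int common = 0;
        int rare = 0;
        for (int i = 0; i < 1000; i++) {
            if (random.nextDouble() * total < commonWeight) {
                common++;
            } else {
                rare++;
            }
        }

        // With only two possible outcomes, the observed distribution is easy to judge.
        System.out.println("common: " + common + ", rare: " + rare);
    }
}
```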
Then there's that thing about the interface into the monster. Should the system give its users some kind of control over input parameters and if so how much? That's a question I can only guess at right now. I have made myself an absolute rule for the building of Etc3: If I find a defect, either conceptual or structural, anywhere in the monster, I fix it first, before adding anything new. (I learned the hard way with Etc1 and Etc2 where I cut corners to meet deadlines--the result was flawed and irreparable software. Not this time.) The effect of this rule is a continual and aggressive refactoring of the design and code. Putting an interface "out there" would mean refactoring it as well, and I would incur completely unacceptable opportunity costs. Etc3's final coming out would be delayed.
And then there is the fact that I can never know (perhaps it's just me) what sorts of configuration utilities the monster will require until I get enough infrastructure built to make a feature work. For example, until I had a real sense of how weights in a grammar should function, I simply created the weights (stored in a DB) by issuing command-line SQL. Not very user-friendly.
So I'm not ready to expose the control a user can have over Etc3 until I'm further along. But I can show you what I use in my own testing. Here's my main test form:
The "databases" are the sources for semantic choices, sort of functioning like lexica. Any composition will be constrained in what it can say by what is in these guys. This was how I got past Etc2's enormous performance problems. Here, instead of a gargantuan source (31,000,000 rows), we use very small models. But we can add new ones forever. (BTW: Processing these source texts requires the use of about a dozen additional utilities, all of which themselves had to be coded and tested.) The composition plans are actually forms. We adopted and adapted NLG's "document plan." Each new form has to be coded. The "Use custom preferences" option provides an interface that creates an override to a form's default, things like tense, and subject person, and x and y and z....
Replicating these widgets in jsp forms just isn't an option at this time, but should be worth the wait (weight?).
And then there's the utility by which grammars can be formulated. It was only through using early grammars that I became aware of additional attributes a grammar had to have to be usable (e.g.: a sample output string would be helpful). Eventually I got to this:
And this:
All in all, everything Nick notices awaits in the wings. Whether any of them should be allowed into the spotlight is very much an open question.
Friday, January 26, 2007
Jess-Belle
Thursday, January 25, 2007
Could we incite a server pages war?
I've taken a break from completing features in etc3 to get a Web version going. Keeping the two user interfaces in synch now means a lot less work later. Also, I've been getting requests for access to the monster and a Web version is the easiest way to do that.
But this guy is Java, which suggests JavaServer Pages as a processing mechanism, and I don't (didn't) know a thing about it. My Web work has been primarily in ASP and ASP.NET. So JSP is another thing to learn.
And learning new enabling technologies always invites questions about the relative quality and utility of what we've worked with before. What's interesting to me about JSP is less what it does than how much it looks like ASP. Of course JSP came first, which means Microsoft stole the approach from Sun. (Before going off, remember TS Eliot's observation that bad poets borrow, good poets steal.)
But JSP has little resemblance to the newer ASP.NET. Where JSP and ASP allow the mingling of program code (Java or VB) with HTML, ASP.NET completely separates them. Each aspx form file has a "code-behind" file that really is pure program code, which makes getting to an MVC pattern easy. Moreover, ASP.NET 2.0 (the 2005 release) uses an xcopy deployment strategy. Whereas the first version of ASP.NET required you to precompile your code-behind files into binaries and install those, the current version has you deploying the code-behind files themselves. When IIS passes them off to ASP.NET, ASP.NET checks to see whether the source file it's working with has been modified since the last browser request for it. If so, it compiles the file and merges that binary into a single application binary. Very easy. Very dependable.
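For anyone who hasn't seen the mingling in question, here's a toy JSP fragment (not from Etc3, and the parameter name is made up) with a Java scriptlet sitting right in the markup, the style ASP shares:
```jsp
<%-- Illustrative only: Java code embedded directly in the page's HTML --%>
<html>
  <body>
    <h3>Latest composition</h3>
    <%
        // Java scriptlet mingled with the markup
        String title = request.getParameter("title");
        if (title == null) {
            title = "Untitled";
        }
    %>
    <p>Now showing: <%= title %></p>
  </body>
</html>
```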
What would be pretty cool would be for Sun to steal back from MS some of what's good about ASP.NET and make a "better" JSP, forcing MS to make a "better" ASP forcing Sun to make a "better" JSP forcing MS to....
Competition sires quality. But Sun has to keep doing its part.
Monday, January 22, 2007
Irony in the marketplace of ideas
I've been reading Kate Hayles' My Mother Was a Computer. Now before I make my point, know that I like Hayles' work and her contributions to thought about electronic literature, and especially her ongoing support of the Electronic Literature Organization. That and the fact that she's just a nice person.
That said, I'm a bit troubled by her shots at capitalism. The book is very lightly seasoned with biased references to market economics: "the nefarious corporate practices of Microsoft, the capitalistic greed that underlies its ruthless business practices," "the machinations of evil corporations," "resistant practices [read: 'reactionary'] and hegemonic reinscriptions associated with them [capitalism's politics and economics]."
Hayles is not alone. Over on the other side of the campus, in the humanities, there exists a political litmus test: Be an anti-capitalist or else. Now some readers (sounds like I have a lot of readers, but I actually only have about three) will note that my adjunct appointment at an elite business school no doubt distorts my views. Maybe so. But I've personally experienced the venom of my friends "over there." Not only is it clearly impossible, because of my Wharton affiliation, that I know anything about literature (poetry and poetics in particular), but I'm also no doubt a rapacious robber baron directly responsible for the starvation of children just by virtue of my job. Oh well...
But it seems to me that the anti-capitalism so evident in the academy smacks of McCarthyism. Just a whispered allegation like "You know, that guy supports pro-business legislation" or even better "is a conservative" consigns "that guy" to a blacklisted status.
Which seems to me profoundly hypocritical. Every tenured faculty member (and even adjuncts like me) receives some portion of his/her salary from income from the institution's endowment: a high percentage in the elite university, a lower one in the not-so-elite. And nearly every penny of that income is realized from investment in capitalistic enterprises.
BTW: Those lower-tier institutions depend on parental contributions, which very often are the result of careful financial planning and investment in equities, and on government programs supported by tax revenues generated by our capitalistic, market economy.
Oddly, the academy's political practices, as manifest in faculty hiring, are resoundingly capitalistic. With a limited number of seats at the table, if a young PhD is to have any chance of getting one of them, he/she had better produce and offer for sale just what the market wants: the academic party line. Otherwise, just as in business, where organizations that make money survive and those that don't don't, that aspiring academic with nothing the academy wants to buy will be out in the cold, begging for a handout at the adjunct entrance.
Thursday, January 18, 2007
Break on through....
With apologies to the Doors, I submit for your approval:
Of love
We will be silver
Silver
Water
Freezing
We will be white
After we will be white, freezing, seeing, silver as a flower.
Flowering silver leans
Freezing
Until we will be silver, freezing, standing, silver as a flower.
This is output from yesterday's Etc3 tests, straight from the machine, no edits (except bolding the title). We've often been criticized because our machine's results required human intervention--"editing to taste" was good enough for John Cage but not for us. This pretty much works and seems to us about as good as most of what's "out there." (We don't generally care for contemporary lyric poetry, but it's dominant in available publication space and so until we got the lyric right, those naysayers could rest content--but no more!)
This validates our choice of tree adjoining grammars for this iteration of the monster. Creating TAG nodes as independent trees allows their aggregation into more or less complex grammars. The grammar for this piece consists of seven trees, weighted variably for selection by the generator. The structurally identical lines closing the second and third stanzas are the result of a syntactic structure built up from a subordinate clause (itself an aggregation of two trees), two participles, and a "simile" node.
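Schematically, and with invented names and weights rather than Etc3's real data structures, the arrangement looks something like this:
```java
import java.util.ArrayList;
import java.util.List;

public class TagGrammarSketch {

    /** An elementary tree: a named piece of syntax with a selection weight. */
    static class Tree {
        final String name;
        final double weight;

        Tree(String name, double weight) {
            this.name = name;
            this.weight = weight;
        }
    }

    public static void main(String[] args) {
        // Seven trees, weighted variably for selection by the generator.
        List<Tree> grammar = new ArrayList<Tree>();
        grammar.add(new Tree("declarative", 4));
        grammar.add(new Tree("fragment", 3));
        grammar.add(new Tree("subordinator", 1));
        grammar.add(new Tree("clause", 1));
        grammar.add(new Tree("participle", 2));
        grammar.add(new Tree("participle", 2));
        grammar.add(new Tree("simile", 1));

        // The stanza-closing lines aggregate several trees into one structure:
        // a subordinate clause (itself two trees), two participles, and a simile.
        List<Tree> closingLine = new ArrayList<Tree>();
        closingLine.add(grammar.get(2));  // subordinator
        closingLine.add(grammar.get(3));  // clause
        closingLine.add(grammar.get(4));  // participle
        closingLine.add(grammar.get(5));  // participle
        closingLine.add(grammar.get(6));  // simile

        for (Tree tree : closingLine) {
            System.out.print(tree.name + " ");
        }
        System.out.println();
    }
}
```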
And, by the way, we aren't positing these lyrical structures from our own notions of poetic form. We go to poetry as it exists for that.
Monday, January 15, 2007
What does that mean?
Using a Lexicalized Tree Adjoining Grammar is working out well. I've been able to code the generation of quite a few different semantic patterns and have been able to organize them into grammars. This requires a lot more analytic work (identifying nouns as concrete or abstract in the database, specifying transitivity for verbs, flagging words that need their own lexicalizing nodes, and so forth), but that's an evolving answer--as grammatical problems expose themselves, just respond, either by changing a lexical entry's attributes or the definition of the appropriate tree. "Grammaticalness" in ALG systems that aim to range semantically over the whole of the English language becomes a solved problem, one that has consumed much of our time in working out our designs. (And for many of us, the only part of the problem we've been attacking.)
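To give a sense of that analytic work, here's a bare-bones sketch of what a lexical entry carrying those attributes might look like. The fields and names are illustrative only, not the actual database schema.
```java
public class LexicalEntry {

    public enum Category { NOUN, VERB, ADJECTIVE }

    private final String lemma;
    private final Category category;
    private final boolean concrete;     // meaningful for nouns: concrete vs. abstract
    private final boolean transitive;   // meaningful for verbs

    public LexicalEntry(String lemma, Category category,
                        boolean concrete, boolean transitive) {
        this.lemma = lemma;
        this.category = category;
        this.concrete = concrete;
        this.transitive = transitive;
    }

    /** A tree can test an entry's attributes before lexicalizing a node with it. */
    public boolean suitsConcreteNounSlot() {
        return category == Category.NOUN && concrete;
    }

    public String getLemma() { return lemma; }
}
```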
We can now turn our attention to making these Frankensteinian monsters mean something. We're getting there. Oh lousy poets, be afraid, be very afraid.
Wednesday, January 10, 2007
Madly normal
In Writing and Difference, Derrida makes this rather remarkable assertion:
It doesn’t matter at all where the text comes from: just as it is the fact of the sentence that makes it senseful, it is the fact of the poem that makes it poetical. It doesn’t matter where it comes from: man, woman, child, sociopath, criminal, dictator, or machine--the poem carries normalcy and poeticalness within itself by way of the very virtue of its being. The language of poetry is what makes it what it is--whether thought or not.
By its essence, the sentence is normal. It carries normality within it, that is, sense, in every sense of the word—Descartes’s in particular. It carries normality and sense within it, and does so whatever the state, whatever the health or madness of him who propounds it, or whom it passes through, on whom, in whom it is articulated.
Machine poetry is not going away. Sooner or later the critical community is going to have to concede its intelligism and accept that artificial intelligence is every bit as deserving of study as their own human intelligence when it comes to the propagation of poetry. Think they'll put up a fight?
Tuesday, January 09, 2007
Considerates do as considerates are
In a recent post, I ruminated about a problem with ETC's poems, specifically its inability to distinguish between abstract and concrete nouns and issues with syntactically specific words. The sample poem I used illustrates other problems. One of these is the inappropriate substitution of an adjective for a noun:
Considerates in mathematics
This malapropism is a result of the way I programmed the system to initiate itself. The poem object accepts a set of "seed" words in its constructor (nouns, verbs, and adjectives) and then uses those to build up contextual sets of related words. The programming is quite strict about getting what it wants and assumes that if it's been told a given word is a noun, then it must be a noun. The Web interface prompts a user for lists of nouns, verbs, and adjectives. The way "Considerates" came to be was by way of a user's keying in the adjective considerate in the noun text field. Thinking it had a noun, the system composed the line to be structurally similar to "Lessons in mathematics," making a plural noun from the singular error.
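In outline (with hypothetical class and method names, not ETC's actual code), the trusting constructor behaves something like this:
```java
import java.util.List;

public class Poem {

    private final List<String> nouns;
    private final List<String> verbs;
    private final List<String> adjectives;

    public Poem(List<String> nouns, List<String> verbs, List<String> adjectives) {
        // No validation: if the caller says it's a noun, it's a noun.
        this.nouns = nouns;
        this.verbs = verbs;
        this.adjectives = adjectives;
    }

    /** Builds a line shaped like "Lessons in mathematics". */
    public String nounPhraseLine(String fieldOfStudy) {
        String noun = nouns.get(0);
        // An adjective keyed into the noun field gets pluralized like a noun:
        // "considerate" -> "Considerates in mathematics"
        return pluralize(noun) + " in " + fieldOfStudy;
    }

    private static String pluralize(String singular) {
        return singular.endsWith("s") ? singular : singular + "s";
    }
}
```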
That's how the mistake happened. But the cause is more complex. "Seed words" is a technical concept, an initiator (first cause?). You can't get poetry from technical concepts, only from literary concepts. What we really want is for the poem to be about something, to have a topic, e.g.: "The death of a beautiful woman."
Now the software eventually has to figure out the grammatical categories into which the words in the topic fall, but it should figure that out on its own. After all, if we asked a "real" poet to write about the death of a beautiful woman, he/she might quite conceivably compose a line containing something like "sorrow for the lost," without our telling him/her that death is a noun and beautiful an adjective.
An improvement in ETC3 over its predecessors is that it actually takes a topic and parses it on its own. Which of course means that it has to have a decent parser. It does. Every word in its lexicon is stored with all of that word's inflections. If it's a word ETC3 knows about, it will find it, just like a "real" poet.
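A skeletal version of that lookup-by-inflection idea might look like the following; the Lexicon interface here is my own shorthand, not ETC3's actual code.
```java
import java.util.Optional;

public class TopicParser {

    /** Hypothetical lexicon: every inflected surface form maps back to a category. */
    public interface Lexicon {
        Optional<String> categoryOf(String surfaceForm);   // e.g. "death" -> "noun"
    }

    private final Lexicon lexicon;

    public TopicParser(Lexicon lexicon) {
        this.lexicon = lexicon;
    }

    /** e.g. parse("The death of a beautiful woman") prints each word's category. */
    public void parse(String topic) {
        for (String token : topic.toLowerCase().split("\\s+")) {
            String category = lexicon.categoryOf(token).orElse("unknown");
            System.out.println(token + " -> " + category);
        }
    }
}
```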
Software that thinks it's software will never be a threat, but software that thinks it's a writer just might be.
Tuesday, January 02, 2007
It's probably just me...
...but there are posts all over the place about the recent MLA Convention in Philadelphia, including by some who lay claim to innovation and resistance to norms. There's been a lot of this going on recently. Slought (one of my favorite places on earth) recently hosted a talk by Barrett Watten, in which he claimed to still be a resister even though he's a tenured English professor (because there are degrees of resistance), and its Rogue Thought Award presentation (sponsored by the MLA), with a public conversation that included Gregg Lambert and James English. To his credit, Lambert noted that though MLA members pride themselves on their radicalness, the MLA is about as conservative as you can get.
So riddle me this: Once you are tenured (ordained?) into the academy and you make decisions on who else gets tenure and who the next generation of faculty will be, haven't you by definition not only allowed yourself to be appropriated but become one of the appropriators yourself? And how is it that "radical" writers don't boycott the convention?