Tuesday, October 31, 2006
The good, the bad, and the velmu
If globals are bad and goto is bad and Linux is good, then how do we explain this function from the Linux kernel? And why would anyone ever pass in a global as a function parameter?
static int check_free_space(struct file *file)
{
    struct kstatfs sbuf;
    int res;
    int act;
    sector_t resume;
    sector_t suspend;

    spin_lock(&acct_globals.lock);
    res = acct_globals.active;
    if (!file || !acct_globals.needcheck)
        goto out;
    spin_unlock(&acct_globals.lock);

    /* May block */
    if (vfs_statfs(file->f_dentry, &sbuf))
        return res;
    suspend = sbuf.f_blocks * SUSPEND;
    resume = sbuf.f_blocks * RESUME;

    sector_div(suspend, 100);
    sector_div(resume, 100);

    if (sbuf.f_bavail <= suspend)
        act = -1;
    else if (sbuf.f_bavail >= resume)
        act = 1;
    else
        act = 0;

    /*
     * If some joker switched acct_globals.file under us
     * we'd better be silent and _not_ touch anything.
     */
    spin_lock(&acct_globals.lock);
    if (file != acct_globals.file) {
        if (act)
            res = act > 0;
        goto out;
    }

    if (acct_globals.active) {
        if (act < 0) {
            acct_globals.active = 0;
            printk(KERN_INFO "Process accounting paused\n");
        }
    } else {
        if (act > 0) {
            acct_globals.active = 1;
            printk(KERN_INFO "Process accounting resumed\n");
        }
    }

    del_timer(&acct_globals.timer);
    acct_globals.needcheck = 0;
    acct_globals.timer.expires = jiffies + ACCT_TIMEOUT*HZ;
    add_timer(&acct_globals.timer);
    res = acct_globals.active;
out:
    spin_unlock(&acct_globals.lock);
    return res;
}
Monday, October 30, 2006
Canon fodder
At last week's Autostart festival, celebrating the release of Volume 1 of the Electronic Literature Organization's annual digital anthology, the conversation turned occasionally to the notion of quality as it might apply to electronic writing. Just what constitutes quality in electronic writing? Is it necessary that the text the medium presents be of high quality? And does that mean that the measure of a successful electronic piece is at least in part how well it stands up to the quality of mainstream (or counter-mainstream) print writing? What about the digital presentation? What makes one presentation "better" than another? Who gets to decide?
The very fact that there is an Electronic Literature Collection, Volume 1 certainly implies that its editors, having made editorial choices, believe that there is a standard of quality that can be applied to electronic writing. The "e-blurb" blog entry says that the collection represents a "broad overview of the field of electronic literature." But there is no discussion (at least that I can find) concerning why the editors included any particular work.
This is important. The canon is defined by who makes it into the anthologies. In time e-writing will become a field for conservative pedagogy, the locus of PhD dissertations, and knowledge of it the currency of a future body of scholars. And what, and whom, the by-then petty arguments rage about will be determined by choices made now. Shouldn't there be really good reasons motivating those choices, so that only the aristocratic make it in and the rabble are left behind?
Saturday, October 28, 2006
I can do it in my sleep
Over the last couple of weeks I've been wrestling with some thorny design issues the etc3 project has opened up. In particular, I've been trying to complete the design for a text-generation program ignorant of the grammar it is using. This is in service of several different design goals, such as the system's ability to shift among different grammars and lexical distributions. To get this to work I would need a way to create the grammars; a way to persist them so that they could be split and spliced; a way to realize surface text without any terminal being aware of the grammar in which it was embedded, while still being responsive to a composition plan; and a way to attend to details that require constituents to know a little something about each other (e.g., agreement).
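For the record, here is roughly the shape I have in mind -- a sketch only, with invented names (Grammar, Node, Realizer), not the working design:

// A minimal sketch of the separation described above (all names hypothetical):
// the realizer walks nodes without ever seeing the grammar that produced them,
// and agreement is handled in a later pass over realized constituents.
import java.util.List;

interface Grammar {
    Node rootFor(String compositionGoal);   // responsive to a composition plan
    Grammar splice(Grammar other);          // grammars can be split and spliced
}

interface Node {
    boolean isTerminal();
    String surface();                       // a terminal realizes itself
    List<Node> expand();                    // a non-terminal yields its children
}

final class Realizer {
    // Knows nothing about Grammar: it only ever sees Nodes.
    String realize(Node root) {
        StringBuilder out = new StringBuilder();
        walk(root, out);
        return out.toString().trim();
    }

    private void walk(Node node, StringBuilder out) {
        if (node.isTerminal()) {
            out.append(node.surface()).append(' ');
        } else {
            for (Node child : node.expand()) {
                walk(child, out);
            }
        }
    }
}

// Agreement (number, tense, and the like) would run as a separate pass over
// the realized constituents, so no terminal needs to know about any other.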
Experience told me that I would only have one chance to screw this up--a mistake anywhere would cascade into mistakes everywhere. So I did a little prototyping, figured out an XML schema for defining TAG trees (actually, this part was kind of easy), and drew page after page after page of UML class collaboration diagrams. Then one day early this week, I worked on the problem too late into the night and slipped into bed without taking a little down time. And of course I dreamed I was working on the problem.
(This isn't the first time I've dreamed of the work of computer programming. Any programmer who's worked on a difficult problem has had this experience. Once I even dreamed I was a DATA statement in a BASIC program that looped infinitely through its DATA statements. Every time I was read in the dream, I woke up, only to go back to sleep and continue the dream.)
When I awakened this time, I just figured I'd been working too hard and needed a rest. But then I started to think about what I'd dreamed. Hmmm... That works. So does that. Yeah, that's the answer to the agreement problem. Turns out I'd solved the entire design problem, except for one issue that was actually a result of my ignorance about a particular Java feature.
Now if I can just figure out a way to code while sleeping...
Tuesday, October 24, 2006
Genesis
Got the first etc3 full poem this morning (rather than fragments and bits):
she is alone"she" must be a computer programmar.
she is alone
she is blue
Monday, October 23, 2006
The grammar is not the code--except when it is the code
The more I think about ALG problems in the context of a tree adjoining grammar, the more I'm convinced that the grammar is the code. But first, two claims (whose defense I'll defer--in time, gentle readers, in time):
1 - In a quiet moment of reflection, we admit (at least to ourselves) that some measure of our motivation in developing ALG machines is to disturb (piss off, if possible) "real" writers. To do that, ALG has to find and walk the line between utter conformity and chaos. If we are to be a viable threat to "real" writers, our output has to be recognizable as poetry--it has to conform to a greater or lesser degree to the forms of the near greats: the language poet, the memoir poet, the authentic poet, the confessing poet, the conceptual poet, and especially the academic poet. At the same time, our texts have to appear "original." More simply: While they are same enough to compete, they have to be different enough to compel.
2 - In a well-behaved OOP application, all of the work should be done in objects, those little ontological islands of definition and capability. We are at our best when our programs exemplify the OOP programmer's mantra: Objects should be responsible for their own behaviors.
Our first two "serious" etc attempts tried very hard to be sets of collaborating objects: Poem objects that knew how to select utterance objects that knew how to select constituent objects that new how to select lexical objects, a veritable Russian nest egg of surprise.
But.... How to control the selection of the utterances in the first place? We want some fully fleshed sentences, so we can compete with Poetry. We want some fragments, so we can compete with Jacket. We want some lyricism (but not too much). And so on and so on and so on....
My programs inevitably contained (within the poem object) a very procedural method (Poem.Compose) that did a lot of weighted selections of different forms, and that code gets really ugly, really fast.
But what if all of that were in the grammar? Even better: Why not put it right smack dab in the TAG nodes themselves? Let them decide when and what to adjoin and adjunct with. Let them make the lexical choices.
Then the grammar can be open-ended. As long as the programmed node types line up with their attendants in the grammar, we can just add new grammatical structures at any time and never touch the program. We can have multiple grammars, one for language poetry, one for memoir, one for confession, one for authenticity. We can define new grammars by splitting existing grammars into pieces and splicing grammars together. We can have one grammar for an initial stanza, another for the bridge, and a return to the first for the finale.
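A toy version of what I mean (hypothetical names throughout, nothing like the real prototype's detail) might look like this: the node itself decides whether to adjoin and what lexical choice to make, and everything it decides with comes from the grammar file rather than the program.

// A node that is responsible for its own behavior: adjunction and lexical
// choice are decided here, driven by data loaded from the grammar.
import java.util.List;
import java.util.Random;

abstract class TagNode {
    protected final Random rng = new Random();
    abstract String realize(Lexicon lexicon);
}

interface Lexicon {
    String pickNoun();
}

class NounNode extends TagNode {
    private final List<TagNode> adjuncts;   // candidates read from the grammar file
    private final double adjoinChance;      // also data, not code

    NounNode(List<TagNode> adjuncts, double adjoinChance) {
        this.adjuncts = adjuncts;
        this.adjoinChance = adjoinChance;
    }

    @Override
    String realize(Lexicon lexicon) {
        String head = lexicon.pickNoun();    // the node makes the lexical choice
        if (!adjuncts.isEmpty() && rng.nextDouble() < adjoinChance) {
            TagNode adjunct = adjuncts.get(rng.nextInt(adjuncts.size()));
            return adjunct.realize(lexicon) + " " + head;   // the node decides to adjoin
        }
        return head;
    }
}

Adding a new grammatical structure then means adding data, not code, so long as the node types it names already exist in the program.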
I have a prototype working. The code is so clean, I suspect faeries are stealing into my computer at night and replacing my heavy, swarthy code with their light and ephemeral programs.
BTW: Notice how a well-placed frame of skeptical quotation marks makes me sound especially arrogant and smart. Just part of learning how to compete in the academy.
Thursday, October 19, 2006
Is there anything outside the grammar?
Employing a tree adjoining grammar (TAG) in an ALG system isn't so hard. Just climbing and falling down trees--a common pattern, and a tight one. Lots of recursion, but that's OK--it makes it hard for novices to duplicate, and they're the only real threat to getting to the demon grail of computationally expressed texts that look "normal." Seasoned and grizzled old timers either already know or don't care. It's the young who are always on the attack.
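To make "climbing and falling down trees" concrete, here is a bare-bones sketch of adjunction (a hypothetical Tree type, nowhere near LTAG-grade): the auxiliary tree takes the adjunction site's place, the site's subtree re-hangs from the auxiliary tree's foot node, and realization is one recursive walk.

import java.util.ArrayList;
import java.util.List;

class Tree {
    String label;
    List<Tree> children = new ArrayList<Tree>();
    Tree foot;                    // non-null only in auxiliary trees

    Tree(String label) { this.label = label; }

    // Splice the auxiliary tree aux in at child i.
    void adjoinAt(int i, Tree aux) {
        Tree site = children.get(i);        // node where adjunction happens
        aux.foot.children = site.children;  // the site's subtree moves to the foot
        children.set(i, aux);               // the auxiliary tree takes its place
    }

    // Falling back down the tree: leaves are surface words, interior
    // nodes just concatenate whatever their children realize.
    String realize() {
        if (children.isEmpty()) return label;
        StringBuilder out = new StringBuilder();
        for (Tree child : children) {
            if (out.length() > 0) out.append(' ');
            out.append(child.realize());
        }
        return out.toString();
    }
}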
But it gets me to thinking, not so much about TAG as about LTAG (lexicalized tree adjoining grammar) and how it admits to the place of semantics inside a grammar. Turns out context sensitivity is a real thing after all.
If the lexicon can find a place in the grammar, couldn't affect also? And style? Is there anything that can't be defined within a grammar?
Tuesday, October 17, 2006
Accident and essence
I learn a little something from each Etc iteration. Etc's grandfather was a Python program working against a text repository built from Jane Austen's novels (all of them). It composed single-line poems from a string of slots that represented a string of parts of speech roughly approximating a sentence. It took about a half hour to compose a poem. It based its semantic selection on the distribution of adjacent bigrams. These poems often used the same word at different positions in a line, e.g., "The nurse saw the nurse." Lesson learned: A word's context includes itself--trivial now, but a revelation then. And in-memory language models suck.
Etc1 (C++) took the experience and added to it a relational database--much faster. It used a phrase structure grammar with LHS=>RHS expansion rules. It used the Brown corpus to build a semantic association model (its "vocabulary"). To enforce some kind of semantic cohesion, it used a bag-of-words approach, selecting from its vocabulary 1,000 words associated with each other in some way (synonyms, antonyms, context, etc.). These became the subset vocabulary from which a poem could draw down words. Context was again based on adjacent bigrams. It could compose a 20-line poem (once a bag was created, which could be serialized for reuse) in about a minute. Some bags-of-words resulted in better poems than others. After a few hundred poems, Etc1 began to repeat itself--some bigrams only appeared once, and so Etc1, once given a particular word, had no choice but to use the other (a conditional certainty?). Lessons learned: Context counts. The 1,000,000-word Brown corpus was too small to lead to semantic variation. And adjacent bigrams are too restrictive.
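For concreteness, the selection step described above amounts to something like the following sketch (invented names; the real Etc1 was C++ against a relational database, not in-memory Java): only words in the poem's bag are candidates, weighted by how often they followed the previous word.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

class BigramChooser {
    // counts.get(w1).get(w2) = times w2 followed w1 in the corpus
    private final Map<String, Map<String, Integer>> counts;
    private final Random rng = new Random();

    BigramChooser(Map<String, Map<String, Integer>> counts) {
        this.counts = counts;
    }

    String next(String previous, Set<String> bagOfWords) {
        Map<String, Integer> followers = counts.get(previous);
        if (followers == null) return null;
        List<String> weighted = new ArrayList<String>();
        for (Map.Entry<String, Integer> e : followers.entrySet()) {
            if (!bagOfWords.contains(e.getKey())) continue;
            for (int n = 0; n < e.getValue(); n++) {
                weighted.add(e.getKey());       // weight by corpus frequency
            }
        }
        // If a bigram occurred only once, the "choice" is a conditional certainty.
        if (weighted.isEmpty()) return null;
        return weighted.get(rng.nextInt(weighted.size()));
    }
}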
Etc2 (C#) used the British National Corpus (100,000,000 words) and a much refined and controlled phrase structure grammar. Bigrams were defined as pairs of words appearing together in a sentence. Slower (because of the massive amount of data), but certainly more varied. Lots of annoying usage and mechanical errors. Lots of oddball word combinations. After a while Etc2 started to repeat itself as well--not so much semantically as structurally. Too many participials. Too many "gets to the...." Lessons learned: Tend to the surface--it really does count. And variation exists as much in formal novelty as in quirky semantics.
So....
The word distribution model as represented in Etc1 and 2 is a flawed concept. A model as monolith only ever speaks itself--and so can never be other than a briefly interesting thing--once the structure has been spoken, it has nothing else to say. Structure counts not only as an expressive medium, but as an expressing one.
Etc3's design (Java) does not access a language model or define a grammar. Rather it defines a set of rules by which language models can be instantiated (a meta-model, I suppose) and another set of rules for importing a grammar. The only thing it knows how to do is to realize a terminal. It is ignorant of the actual grammar it is using to find the terminals and it is oblivious to whatever language model it is using at any instant. In Etc3, grammar and vocabulary will be states, not drivers. They are nowhere coded and everywhere present.
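As a thumbnail of "states, not drivers" (invented names again, not the actual design), the composing loop reduces to something like this: the composer holds whatever grammar and lexicon happen to be current, and they can be swapped between stanzas without the program knowing anything about either.

// Grammar and vocabulary held as swappable state: nothing about any
// particular grammar or language model appears in the program itself.
interface GrammarState {
    String realizeStanza(String goal, LexiconState lexicon);
}

interface LexiconState {
    // instantiated from a language-model description, per the meta-model rules
}

class Composer {
    private GrammarState grammar;
    private LexiconState lexicon;

    void use(GrammarState g, LexiconState l) {
        this.grammar = g;
        this.lexicon = l;
    }

    String stanza(String goal) {
        return grammar.realizeStanza(goal, lexicon);   // whichever pair is current
    }
}

// Usage, roughly: one grammar for the opening stanza, another for the bridge,
// then back to the first for the finale.
// composer.use(memoirGrammar, brownModel);     String opening = composer.stanza("opening");
// composer.use(languageGrammar, bncModel);     String bridge  = composer.stanza("bridge");
// composer.use(memoirGrammar, brownModel);     String finale  = composer.stanza("finale");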
How cool is that!
Monday, October 16, 2006
Writing Poetry
As I work through the design details of Etc3 (e.g.: a tree adjoining grammar this time, but oh, what an innovative one!), I think I'm beginning to understand why ALG systems so often put out non-poetic texts.
What we've been trying to do is mimic the poets' output, conforming to standard AI practices. (It doesn't have to actually be a duck, it just has to look, sound, and walk like one.) Hisar Manurung's doctoral dissertation is a case in point (and well worth the read, by the way). His attempt, which in the end he finds unsatisfactory, is to build an engine that produces output with the observable features of poetry, the AI party line.
But the problem with this approach is that what's observable in poetry is only surface (admittedly that "only" is just downright hard to get right). What makes a poem Poetry is something deeper. And whatever it is, a poetry machine has to express that within its source. What we are asking our code to do is write Poetry, not poems. What these machines need to be able to do is not to produce copies of poems but to define poetry itself. Then the poems will come naturally, a fraught phrase if I've ever written one.
This is a lot to ask. There's little agreement on what Poetry is as expressed in natural language. Now we are asking programmers to describe it in the poverty-stricken vocabulary and grammar of a computer programming language. We need another language, and so we "write" that first: the interfaces and classes and enumerations and functions that provide the expressive capability such a new language requires. We write Poetry and it writes the poems.
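Purely by way of illustration (none of this is etc3 code, and every name here is invented), the shape of such a "language" might be types that let the program talk about poetics rather than strings:

// Hypothetical fragments of the kind of vocabulary meant above.
enum Register { LYRIC, CONFESSIONAL, CONCEPTUAL, MEMOIR }

interface Line {
    String surface();
}

interface Poetics {
    // The program expresses what a poem should be...
    boolean admits(Line candidate, Register register);
}

// ...and the poems follow from whatever satisfies it.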
Friday, October 13, 2006
Change is so hard....
I've completed moving my new ALG project (ETC3) over to the MacBook--all in all a pretty smooth transition. The MySQL DB came right over. Their claims to portability stand up. The code was a little trickier. I'd been using Sun's Studio Enterprise on the PC, which they don't offer for the Mac. But since SE is really just an extension of NetBeans, I was able to get the projects to load in NetBeans for OS X. I had to rewrite some XML load and serialize code, because I'd used a third-party lib that I couldn't get to work on the Mac (in the end, we all know that "Write once, run anywhere" is just propaganda). But that only took a couple of days.
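For what it's worth, the kind of rewrite involved is the sort of thing the JDK's bundled JAXP classes can handle on their own; a minimal sketch (not my actual code, and the class name is made up) looks something like this:

// Load and re-serialize an XML grammar file using only standard-library
// classes, so no third-party XML dependency has to survive the port.
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class GrammarXml {
    public static Document load(File file) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(file);
    }

    public static void save(Document doc, File file) throws Exception {
        TransformerFactory.newInstance()
                .newTransformer()
                .transform(new DOMSource(doc), new StreamResult(file));
    }
}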
It's the change itself that's hard. I'm still gaining proficiency in Java. It takes me a couple of years to get really good with a programming language, and wrestling with the language and its libraries while coding in a new IDE under a new OS is a tad frustrating, especially with cross-platform tools. Not only are the shortcut keys different on the PC and the Mac, but NetBeans doesn't fully implement the Mac shortcuts (such as full-line text selection). So I find myself first keying the PC shortcuts (from 20 years of practice) and then trying the Mac version, which doesn't work. I end up using the menus, so coding is taking longer than I'd like.
So much of the activity of programming is reflexive proficiency with the keyboard. Watch a programmer programming and what you see is flying fingers and an intense stare (accompanied by regular, softly explosive expletives). In my consulting days, I had a reputation for extraordinary productivity. 'Twas all a charade--I just made sure I was really, really good with the IDE. I'll get there with NetBeans on the Mac, but for now it is so hard....
Saturday, October 07, 2006
Switching
Another hiatus. This time I'm moving my new application work from Windows to Mac. It's a time-consuming process.
For years I made my living coding or consulting on coding Windows programs. Very difficult stuff--but the harder, the better: You can charge more.
But I don't do that anymore and now I just want things to work. And at the same time, I found myself needing to have my ALG programs run on servers other than MS--not possible with C#, .NET, and SQL Server. So I've been writing ETC3 in Java, using MySQL as the DB. I started on a PC, to satisfy myself that it would work. It does.
I'm not entirely new to Java, but pretty close. So there's the learning of the language. And there's the learning of MySQL (entirely trivial actually). And the learning of a new IDE. (Those who exclaim that real programmers program in emacs will forever be casualties on the competitive landscape.)
And then, there's the move to my lovely new MacBook. In recent years, I have only touched a Mac to access the online card catalog in the library, so there's a learning curve there. But my oh my oh my! What a sweetheart this little guy is. The DB ported with nary a hitch (I had a little trouble with the UNIX command line, until I remembered to prefix commands executing in the current directory with "./", but that's more an embarrassment than a problem). And the application port is going OK--once I figured out that Sun's Studio Enterprise is just NetBeans. (I'd used SE on the PC, but there is no Mac version.) I started with Eclipse on the Mac, which would have meant an awful lot of rewriting to address that guy's GUI lib. Switching to NetBeans seems to have solved that problem.
All through this process, I've been stunned by the superior quality of the MacBook and OS X. I knew it was better, but had no idea it was this much better. So far, there is no single function, process, or feature that I can say is better implemented on the PC. Reasonable observers could not disagree that the Mac is the superior machine.
So riddle me this: Why hasn't the business world caught on? Any business that replaced its PCs with Macs would, after absorbing the switching costs (which would be significant), enjoy an enormous increase in productivity and a huge decrease in operations expense.
This is reminiscent of my experience with Unisys technology in the early '90s. I was in a sensitive position, comparing various midrange systems for a new implementation, including IBM's AS/400 line and Unisys's A Series. I knew the AS/400--very solid. But compared to the A Series, amateurish. You just couldn't compare those two lines and not conclude that the A Series was better in every possible way. Unisys engineering is just about the best there is. But the company is imploding at an ever-increasing rate, simply because it hasn't been able to sell itself.
Hey Steve, take a page from Bill's book and learn how to sell!