Tuesday, May 23, 2006
Between essence and descent
In our last post, we looked at the elegant compositional simplicity of the C strcpy() function. Today, we'll talk about just how dangerous that bit of poetry really is.
strcpy()'s successful operation depends on the destination buffer being large enough to hold the contents of the source buffer. The length of those contents is determined solely by the position of the first zero-valued byte (the null terminator) in the source buffer. Consider this code fragment:
char dest_buffer[16];
strcpy(dest_buffer,"Hello world!");
This works just fine. The literal is 12 characters long, occupying positions 0 through 11, so the compiler-supplied null terminator lands at position 12, immediately after the ! character and well short of the destination buffer's final position, 15 (offsets in C are zero-indexed).
But what about this?:
char dest_buffer[16];
strcpy(dest_buffer,"This is the way the world ends");
A seasoned and grizzled old-timer would take one look at this and shiver and, should the snippet be his own code, remark, "That is not what I meant at all."
As we saw in the last post, strcpy() just copies each character from the string literal to the dest_buffer array, but it doesn't stop at position 15, the array's last slot. It keeps right on copying through position 30, where the null terminator of the 30-character literal lands. Positions 16 to 30 exist, immediately following the memory allocated to dest_buffer. But that space is probably being used for something else: another variable perhaps, or an argument list on the stack, or (shudder) executable code. And whatever was there gets overwritten. If the gods are in a good mood, the program crashes. If they are in their usual demonic phase, this bit of sloppy code actually changes the program!
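To put numbers on the two fragments above, here is a minimal sketch. The helper name fits_in_buffer is hypothetical and not part of the original code; it only compares the bytes a string needs (its length plus one for the terminator) with the capacity of the destination.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (an assumption for illustration): returns 1 if src,
   including its null terminator, fits in a buffer of dest_size bytes,
   and 0 otherwise. */
int fits_in_buffer(const char *src, size_t dest_size)
{
    /* strlen() does not count the terminator, so the string needs
       strlen(src) + 1 bytes; that fits only if strlen(src) < dest_size. */
    return strlen(src) < dest_size;
}
```

With a 16-byte buffer, fits_in_buffer("Hello world!", 16) yields 1, since that string needs 13 bytes, while "This is the way the world ends" needs 31 bytes and yields 0.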
It gets worse. Consider some typical startup code:
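#include <stdlib.h>
#include <string.h>

int main(int argc, char** argv, char** env) {
    char* pc = NULL;
    if ( argc > 1 ) {
        // Assign to the outer pc; redeclaring it here would shadow it
        pc = (char*)malloc(12);
        strcpy(pc, argv[1]);
    }
    // Do stuff
    if ( pc != NULL )
        free(pc);
    return 0;
}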
At first glance, this code's author seems to have been cautious. She checks that there is indeed a command-line argument before attempting to process one (argc counts the command-line arguments, including the program name itself, so argc > 1 means at least one was supplied). And she makes sure that pc actually points to allocated memory before trying to release it via the call to free() (malloc() is the call that requests a block of memory). Nice defensive code.
BUT…..
How does she know that argv[1]'s length is less than 12 and will fit into the allocated buffer pointed to by pc? She doesn't, and when (not if) it is too long to fit, this program will crash and burn (itself or its user or its user's machine).
BTW: Unchecked copies like this are behind a great many of those buffer-overrun security flaws in Windows, where clever hackers have figured out where the buffers are in a program or OS function and simply send their own code to an accommodating function that, blissfully ignorant of incoming buffer sizes, copies the code right into a place where it can execute and do whatever damage it wants.
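One way to remove the guesswork is to measure the argument before allocating, rather than hard-coding 12. The sketch below is only an illustration under that assumption; the helper name copy_arg is hypothetical and does not appear in the original code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (an assumption for illustration): returns a heap
   copy of arg sized to fit it exactly, or NULL if allocation fails.
   The caller is responsible for calling free() on the result. */
char *copy_arg(const char *arg)
{
    size_t needed = strlen(arg) + 1;   /* +1 for the null terminator */
    char *copy = malloc(needed);
    if (copy == NULL)
        return NULL;                   /* allocation failed; nothing copied */
    memcpy(copy, arg, needed);         /* copies the terminator as well */
    return copy;
}
```

In the startup code, pc = copy_arg(argv[1]) would replace the malloc(12) and strcpy() pair, so the buffer is always as large as the argument it receives. The original fragment has no such guarantee.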
Most of these security holes could have been avoided by the simple and very common practice that says "Never, ever, ever use strcpy()." (How sad that such beauty must wilt, unseen and ever innocent.)
If you need to copy strings, use strncpy(), which takes a third argument, the maximum number of bytes to copy. You can derive that value from whatever you used to size the destination buffer. One caveat: when the source is too long, strncpy() fills the buffer without writing a null terminator, so set the final byte of the destination to zero yourself. This has been common programming wisdom at least since I started using C about 20 years ago.
There's a disconnect here between the elegance of strcpy()'s composition and its practical application. It raises a host of questions about the roles and utility of radiant program texts, and about the relative balance of aesthetic and engineering practices in the development of useful applications.
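As a minimal sketch of that advice, the hypothetical helper below (copy_bounded is an illustrative name, not a standard function) wraps strncpy() so that the copy never exceeds the destination's size and the result is always terminated.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (an assumption for illustration): copies src into
   dest without writing more than dest_size bytes, and always terminates
   the result. Returns 1 if the whole string fit, 0 if it was truncated
   or dest_size was 0. */
int copy_bounded(char *dest, size_t dest_size, const char *src)
{
    if (dest_size == 0)
        return 0;                        /* no room even for a terminator */
    strncpy(dest, src, dest_size - 1);   /* writes at most dest_size - 1 bytes */
    dest[dest_size - 1] = '\0';          /* strncpy() omits this when src is too long */
    return strlen(src) < dest_size;
}
```

For an array such as char dest_buffer[16], the call would be copy_bounded(dest_buffer, sizeof dest_buffer, src). For a block obtained with malloc(12), pass 12, since sizeof applied to a pointer gives the size of the pointer and not of the block. Copying "This is the way the world ends" into the 16-byte array leaves the truncated string "This is the way" and returns 0.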
Next: poetic software vs. engineered texts.