SPEW
Number of guitars I own, measured in strings: 28

May 21, 1999: you want to turn it on its head by staying in bed

Friday

computer programming geekiness today

I was browsing through a book I own, Jim Blinn's A Trip Through the Graphics Pipeline--a compilation of columns he wrote for IEEE Computer Graphics. I was struck by one odd column: a confessional in which he admitted that he had stuck with Fortran even through C's heyday, but he had now finally been won over, converted, yeah he's switching--to C++.

Elsewhere on my website you can find a rant about the programmer virus we call C++. What struck me, though, was how all the things he liked about it--all the things that mattered to him for C--were things I strongly dislike about it.

For example, he was thrilled with operator overloading. Admittedly, he was thrilled with it for use with mathematical operations--the one use to which I'm willing to admit it has decent applicability. But all the drawbacks still apply: hidden performance impact, inability for other people to read/modify/maintain your code. Since he's working solo, the latter aspect doesn't matter as much, and since he's the one implementing the classes, he'll realize the performance ramifications as he goes. So, ok, maybe this is a decent application of operator overloading--but only because he's a lone wolf. Since C++ is pushed as a language for building large products, with many team members, I remain unconvinced.

Arguably, operator overloading is just syntactic sugar, allowing an infix notation instead of a prefix one. C++ carries along with it an even simpler piece of syntactic sugar over C, one that everyone loves, and which I'm really skeptical about.

In C++, you can declare a variable anywhere within a procedure, instead of in the initial 'declaration' section for the procedure. The classic arguments for why this is good are that it saves typing, and that it lets you declare variables closer to where they're used.

Here are the classic arguments why this is bad:

I can't argue with the fact that it saves typing. It doesn't really save that much typing, though:

/* C-style */
void myfunc(void)
{
   void *frog;
   int headlessFrogWeight;
   ...
   frog = ...
   headlessFrogWeight = computeWeight(frog);
   ...
}


// C++-style
void myfunc(void)
{
   void *frog;
   ...
   frog = ...
   int headlessFrogWeight = computeWeight(frog);
   ...
}

If you check, C++ saves you from typing 'headlessFrogWeight' a second time, plus a ';'. It also saves one screen line, if you care about fitting as much as possible on a single screen.

The second argument is that, rather than having to cluster all of your variables in a big monolithic pack, which then becomes a pain to search through, you can declare your variables much closer to where they're used and thus make it easier to find them.

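You may have noticed that I tried to be as factual as possible in my presentation of the arguments above. Now let me offer you my perspective on the problem overall. I think the valid arguments for doing this are that an initialized declaration guarantees the variable is never used uninitialized, and that it allows const variables to be defined anywhere within a procedure.

My arguments why this is bad are that it makes the declaration for a given variable harder to find, and that it encourages huge monolithic functions.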

Let's look at my reasons in favor of "delayed variable declaration" first. If your variable declaration is initialized, you can guarantee that the variable is never used without being initialized. If you're forced to declare your variable at the beginning of the function, you may not have meaningful values available to assign to it. So on the surface you can avoid a class of bugs.

On the other hand, if you delay the variable declaration into the middle of the function, but don't initialize it, you haven't really improved the situation. It does mean that if there are references to the variable before that declaration, they'll get caught as a compile time error. But any such references are exactly the ones which it's easy for a compiler to catch as a reference to an uninitialized variable. The compiler can't always catch such references, but the cases where it can't are also cases where a delayed declaration would have to be uninitialized anyway. So the advantage here is really minimal, assuming you're using a non-sucky compiler. (For example, Java actually mandates that compilers catch uninitialized variables as a fatal compile error, not just a warning.)
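To make the two cases concrete, here's a rough sketch. The body of computeWeight() and the 'weigh' flag are made up for illustration; only the shape of the declarations matters.

```cpp
// Hypothetical stand-in for computeWeight() from the examples above.
int computeWeight(const void *frog) { return frog ? 42 : 0; }

// Delayed AND initialized: there's no point in the function where the
// variable exists without a value.
int weightInitialized(const void *frog)
{
   int headlessFrogWeight = computeWeight(frog);
   return headlessFrogWeight;
}

// Delayed in spirit but NOT initialized: every path has to remember to
// assign it, exactly as if it had been declared at the top.
int weightUninitialized(const void *frog, bool weigh)
{
   int headlessFrogWeight;             // no value yet
   if (weigh)
      headlessFrogWeight = computeWeight(frog);
   else
      headlessFrogWeight = 0;          // drop this branch and you have the classic bug
   return headlessFrogWeight;
}
```

In the second function, moving the declaration up or down a few lines changes nothing about which paths can reach the return without an assignment.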

Let me put it this way. I'd much rather compiler development effort had gone into making compilers detect uninitialized variables, rather than supporting all the extra complexity of C++. Besides, how many C++ style guides have you seen that said "only use delayed variable declaration if you're providing an initializer"? None, because it's still perceived as better even if you don't provide an initializer. That may be, but then it makes this argument in favor somewhat irrelevant.

A stronger argument is the notion that it allows const variables, in other words variables that aren't really varying, to be defined anywhere within a procedure. Such a variable isn't a general constant, but rather a value computed at that moment in time, and guaranteed to be locked in for the duration of that variable's scope. This is a powerful tool for programming, because it makes unambiguous the notion of 'value', making certain kinds of understanding of the code easier. Indeed, in purely "functional" programming languages, this is the only kind of 'variable' allowed. (If you want to compute a different value, stick it in a different variable, or compute it on a different iteration.)
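Something like the following is what I have in mind, as a sketch only. computeWeight() gets a made-up body here, and totalWeight() and its 'count' parameter are invented for the example.

```cpp
// Hypothetical stand-in for computeWeight() from the earlier examples.
int computeWeight(const void *frog) { return frog ? 42 : 0; }

int totalWeight(const void *frog, int count)
{
   if (count < 0)
      count = 0;                       // an ordinary variable: still free to vary

   // Declared where the value first becomes known, and locked in from here
   // to the end of the scope.
   const int headlessFrogWeight = computeWeight(frog);

   // headlessFrogWeight = 7;          // would not compile: assignment to a const
   return headlessFrogWeight * count;
}
```

Anyone reading the rest of the function knows that headlessFrogWeight means one thing and one thing only, which is not something they can know about 'count'.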

Of course, you can create similar values in C by simply only assigning to a variable in one location. In some cases, that may even be cleaner, since you could assign to the variable only once unambiguously, but allow the assignment to be made in one of several statements, i.e. distinguished by ifs or switch statements or the like. In other words, C++'s implementation of const is weakened for this purpose because the rest of the language doesn't make single-point-of-initialization as simple as might be desired.
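Here's a sketch of that difference. The species codes (0 for a frog, 1 for a beetle) are invented purely for the example.

```cpp
// C-style: one variable, assigned exactly once, in whichever branch applies.
int legCountAssigned(int species)
{
   int legs;
   switch (species) {
      case 0:  legs = 4; break;        // frog
      case 1:  legs = 6; break;        // beetle
      default: legs = 0; break;        // unknown
   }
   return legs;
}

// C++ const: the initializer has to be a single expression, so the
// branching gets squeezed into nested ?: (or pushed out into a helper).
int legCountConst(int species)
{
   const int legs = (species == 0) ? 4
                  : (species == 1) ? 6
                  : 0;
   return legs;
}
```

Both compute the same thing. The first reads like ordinary code but relies on discipline; the second is enforced by the compiler but gets ugly fast once the branches do more than pick a number.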

Nonetheless, C++ provides compiler-guaranteed-support that you won't multiply assign into the variables, so clearly C++ has an advantage here. On the other hand, how many C++ style guides have you seen that said "only use delayed variable declaration for variables that are 'const'"?

Now let's look at the aspects of delayed variable declaration that I consider to be a problem. My number one concern is that it makes finding the declaration for a given variable harder. This may seem surprising, since the opposite is claimed by other people.

If you sit down and look at a piece of code, and there's reference to a variable 'frog', and you're not sure what type it is, what do you do? You could scan back up the text of the program, line by line, looking for the declaration of 'frog'. If that's what you do, moving declarations 'down' in the program closer to their use will definitely shorten your search.

In an ideal world, an editor which knew a little about C/C++ syntax could just find it for you with the tap of a hotkey. But it's not an ideal world--or at least not everyone who might read your program will have such an editor--so it's a good idea to make this task easier.

Now, in C you can still do something a bit like C++, by declaring a new block. The difference is that you have to define a place where that block ends, and the new variable goes out of scope.

/* C-style */
void myfunc(void)
{
   void *frog;
   ...
   frog = ...
   {
      int headlessFrogWeight = computeWeight(frog);
      ...
   }
   ...
}

In some sense, C++ is just making this practice more efficient to type, and more flexible. I've even (very infrequently) used the above construction, so what's my beef?

While not every editor provides automatic declaration finding, every programming editor worth mentioning does include 'parenthesis matching'; for example, in vi it's '%', and in MSVC's editor it's ctrl-]. Since the '{' character is rarely used for anything other than blocks (initializer lists being the main exception), it's easy to find the nearest containing '{' character. In C, the only place where a variable might be declared is immediately after one of the nested '{' containing a given usage. Thus, you simply need to find all of those; the editor automation may or may not provide a tool to help.

Even if the editor doesn't help with this, indentation can easily be used to find all the locations. Indentation is of course not a feature of the language, but it is a universal style, and universally recommended in style guides. So finding all of the containing variable declaration blocks for a given use just isn't that hard. You can scan up vertically watching the indentation much faster than you can scan for a particular name. Then when you find a variable declaration block, you apply the detailed visual scan to just that section.

But you can do even better than this, using any editor. As long as you stick to a simple stylistic rule (one that's widely recommended)--that you shouldn't use the same name for two different variables in the same function (i.e. variables at different scopes)--then you can simply go to the top of the function and search forward for the name of the variable. The first use will always be the declaration, by definition.
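A small sketch of a function that follows the rule. The function and its parameters are invented for the example; the point is only that every name has exactly one declaration, and it's the first place a forward search lands.

```cpp
int sumHeavyFrogs(const int *weights, int count, int limit)
{
   int total = 0;      /* searching forward for "total" lands here first */
   int i;

   for (i = 0; i < count; ++i) {
      int weight = weights[i];   /* declared at the start of a nested block */
      if (weight > limit)
         total += weight;
   }
   /* Breaking the rule would mean declaring a second 'int total' inside
      the loop body: legal, but then the first match is the wrong
      declaration for any use inside that block. */
   return total;
}
```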

How hard is that?

Really, the argument that it makes it easier makes me quite incredulous. It makes it easier if the declaration happens to be on the previous line, but that's about the limit of its easierness. I generally stick by a (somewhat common) stylistic rule that advocates declaring all variables only in the initial function declaration section, which makes it pretty darn trivial to look up the declaration for a specific variable.

There are probably two kinds of functions that we care about: functions whose source code fits on a single screen, and functions whose source code does not. The former can be scanned visually, entirely without any input to the computer. In this case, the automation of forward scanning is less interesting--we care more about pure visual eye scans. So perhaps in this case, my argument isn't very applicable.

However, if a function fits on a single screen, it has to be relatively short. It can't have that many variables declared. Because it fits on one screen, it's trivial for the eye to find the beginning of the program and then scan the variable list. Indeed, I don't think this is really the case that people care about very much. What they care about is the other case, where they're writing code and they're far from the start of the function and they need a new variable and they're just too darn lazy to scroll up and declare it in the main section and it's so much easier just to type it in here.

But that's almost universally a bad idea, because it means you're making wonkingly huge functions that don't fit on a screen, that are hard to understand and follow and everything. That is bad style! Split it into multiple functions, loser! Sure, sometimes you have to do functions like that--they share too many local variables in arcane ways to be split up--but they should be exceedingly rare--rare enough not to justify putting in this feature just for those routines. If nothing else, this feature encourages people to just declare the variable and go, instead of splitting things up into multiple functions. And this is my final argument.

I recently ported some code that was written to be compiled as C++, even though it used almost no object-oriented code. It did have some very long sections of code used to initialize and set up a number of systems provided by the operating system. The code was basically straightline--some ifs, but no loops. And it was full of stuff that read like, 'ok, now it's time to do this thing, so here's declarations for the variables we need, ok, and here's the things to do with them'. Sometimes those variables would stick around for a while, needing to be used by later setup code, and sometimes they wouldn't. It's the kind of code that people rarely modularize, because everything in it is only done once, so there's no "savings" to modularizing it--just more typing.

My first inclination was to just turn as many of the delayed declarations into local { ... } blocks as possible, and move other declarations which needed longer lifetimes up to the initial function declaration stage. But as I did this and added comments to each of the major sections of the code, I realized that for most { ... } I was adding, I was putting a comment before that section describing what it did.

So I instead pulled out each { ... }, made it its own function, with a name based on the comment, changed the comment to a function call, added appropriate parameter-passing code to expose other variables needed by the block, and suddenly all of the code was nicely modular, and twice as readable.
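I can't reproduce the real code here, so this is a made-up miniature of the result. Every name in it (Display, Audio, initDisplay, initAudio, initSystems) is hypothetical; what carries over is the shape: one function per former comment, with the shared variables turned into parameters.

```cpp
// Hypothetical subsystems standing in for the real operating system ones.
struct Display { int width; int height; bool ready; };
struct Audio   { int sampleRate; bool ready; };

// Formerly a { ... } block under the comment "set up the display".
static Display initDisplay(int width, int height)
{
   Display d;
   d.width = width;
   d.height = height;
   d.ready = (width > 0 && height > 0);
   return d;
}

// Formerly a block under "set up audio". It relied on the display being
// up already, and now that dependency is visible in the parameter list.
static Audio initAudio(const Display &display, int sampleRate)
{
   Audio a;
   a.sampleRate = sampleRate;
   a.ready = (display.ready && sampleRate > 0);
   return a;
}

// What's left of the long straightline function: one call per old comment.
bool initSystems(int width, int height, int sampleRate)
{
   Display display = initDisplay(width, height);
   Audio audio = initAudio(display, sampleRate);
   return display.ready && audio.ready;
}
```

A call like initSystems(640, 480, 44100) then reads as a list of what gets initialized and in what order, and reinitializing one subsystem later is just a matter of calling its function again.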

Indeed, this was all for the best in the long run, because part of the modifications I had to make to the code after porting it was to reset the system to new states, which required shutting down some subsystems and reinitializing them with new values. Having each system broken off into its own initialization function was the appropriate structure for this anyway.

But even if this weren't true, I'd still want the code to be this way. It's much easier to read the initialization section, with a list of ten things that are being initialized, each with its own function call. It all fits on one screen; you can easily see how the systems interact. The degree to which they expose variables to one another is made explicit in the parameter passing. The order in which they are executed is made explicit. If you're looking for the chunk of code which performs a particular operation, it provides a top-down structure that makes it easier to find the section of straightline code which performs that operation.

Modular code good.

Monolithic code bad.

Single location for declarations good.

Delayed declarations bad.


attribution dammit: He Knows I'd Love To See Him Morrissey