prev : next : index SPEW

May 8, 1999



This is part II. Read yesterday's first.

But Then Came Spew...

When we last left our hero, we found him blindly leaping off a cliff known as "it's ok for this to be crap, it's finished and will never need changing" (a cliff, I should note, situated well apart and yet parallel to that known as "it's a temporary fix").

The careful reader should have noticed the foreshadowing--that macro begot spew--and been unsurprised by the climactic leap.

In classic serial fashion, however, I must reveal that it's not as bad as you might have thought.

The Need For Spew

Before one can write a program, one must decide what it will do. That means deciding what the end result should be, and then deciding how to get there.

I only worked this out in the roughest terms before I started coding. In the end, I didn't quite get what I really needed; as of this writing, navigation is a little clumsier than you might like, primarily around the month indexes. spew's model of a set of pages is just a little too simplistic, and that's too transparent. (But I have plans--mwa ha ha).

Here's what I wanted:

At this point, I was prepared to bite the bullet and write a brand new system to do this sort of thing, but I hated the idea of giving up the functionality of macro, so the idea of evolving it came to mind.

Then I got a clever idea... and then another... and the grand scheme came together: spew is still entirely independent of HTML!

I'm not sure what other uses it might have, but I promise you, I was quite astonished to come up with a solution which:

As I said, in the end it turns out that this isn't quite enough to do it right. Doing it right is going to require at least one more command. But maybe nothing more.

Multi-File Output

Adding multiple-file output is trivial:

   .file  "<filename>"
       Text following this command is put into the file <filename>.

In other words, you have a long file. Subregions of it are the definition for each page; between any given pair of .file commands, that section of text goes into the first .file's filename. (A given filename can only appear once; an alternative would allow them to appear multiple times, but this would be flaky in this particular situation.)
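As a rough illustration of that bracketing rule, here is a minimal Python sketch. It is not spew's actual code: the function name split_by_file and the regular expression are made up for the example, and it assumes a .file command sits alone on its line with a double-quoted filename.

```python
import re

# Hypothetical pattern for a line of the form:  .file "name"
FILE_CMD = re.compile(r'^\s*\.file\s+"([^"]+)"\s*$')

def split_by_file(source: str) -> dict[str, str]:
    """Map each filename to the text between its .file line and the next one."""
    pages: dict[str, list[str]] = {}
    current = None
    for line in source.splitlines():
        match = FILE_CMD.match(line)
        if match:
            name = match.group(1)
            if name in pages:
                # A given filename may only appear once.
                raise ValueError(f"filename appears twice: {name}")
            pages[name] = []
            current = name
        elif current is not None:
            pages[current].append(line)
        # Assumption: text before the first .file has nowhere to go and is dropped.
    return {name: "\n".join(lines) for name, lines in pages.items()}
```

Because a Python dict preserves insertion order, the keys of the result also record the order of the pages in the source, which is what the navigation discussed later depends on.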

Not everyone who might want to use a system like this will want everything in one file. That's ok; .include allows us to simulate them all being in one file, by making a single file that .includes everything else in the right order. If you need to automate it, you can write a simple shell script to generate this list of .includes from an existing directory. Personally, I don't like generating lists from directories, because it prevents me from using the directory flexibly; and I like the idea that the order of navigating through the output files is implicit in the ordering of the text within the main source file, not determined by sorting the filenames or some such. Nonetheless, any such logic can be done, outside of spew.
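For the curious, generating that list could look something like the following Python sketch (standing in for the shell script). The helper names are invented, the ".txt" suffix is arbitrary, and the sketch assumes .include takes a double-quoted filename in the same way .file does.

```python
import os

def include_lines(names):
    """Turn filenames into .include commands, ordered by sorted name."""
    # Assumption: the .include argument is a double-quoted filename.
    return [f'.include "{name}"' for name in sorted(names)]

def include_lines_for_directory(directory, suffix=".txt"):
    """Same thing, but reading the names from an existing directory."""
    names = (n for n in os.listdir(directory) if n.endswith(suffix))
    return include_lines(names)
```

Note that this bakes in exactly the thing I said I don't like: the page order comes from sorting filenames rather than from the source text.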


At this point, there's only one command left to be added, so it seems rather implausible that I could do everything else in the spec, unless it's a doozy.

Well, I probably misled you a bit. spew sidesteps by making use of "built-in macros". (C programmers should think of __FILE__ and __LINE__ and skip the rest of this paragraph.) A built-in macro is a string which gets macro substitution performed on it, but the user never defined that string as having a macro definition. Instead, it's built into the system. Moreover, the built-in macro can change its value depending on the context--something that a traditional macro can't do at all, and which macro's macros can only do in an extremely simplistic way.

spew introduces four built-in macros. One slight flaw here is that these macros become reserved words; they intrude on the space of strings the user can legitimately include in their source code. macro avoids this by leaving all syntactical constructions to the user, apart from the very minimal constraint of avoiding lines starting with ".raw", ".define", and ".include". With spew, the built-in macros are interpreted anywhere on any line, and so a user is more likely to encounter situations where this interferes with what the user wanted to do. The names have been chosen so as to hopefully avoid this. A user can always escape them by creating new .raw macros.

The four macros are (one second, I need to add a new escape character... ok):

Now, consider the navigation toolbar at the top of this page. To avoid having to do a lot of escaping, I'm not going to use the real names, but I'll sketch out how this is implemented:

   .define "linknext"      "|#NEXT#|next|"
   .define "linkprev"      "|#PREV#|prev|"
   .define "linkindex"     "|.|index|"
   .define "shorttoolbar"  "linkprev : linknext : linkindex"
   .define ".newbody"      ".body shorttoolbar .p"

The definition of "shorttoolbar" is cut-and-pasted from my macros, but the actual usage of it is complicated by the use of tables to create the floating SPEW on the right. (If you are reading this in the far future, I may very well have reformatted these pages, in which case none of this will make any sense. I forgot to mention that this is one of the other ways the pages might change--in formatting.)

Note that in an HTML context, there is very little use for #FILE#, since one would rarely if ever want to put in a link to a file itself. (It's not a fully qualified link, either, just the name of the file, so it's not even useful as plain text.)

However, as you will see shortly, it was useful for there to be a "this" that was symmetric to "next" and "prev". And other non-HTML applications might well have a use for it.
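To make the "value depends on context" idea concrete, here is a small Python sketch of how #PREV#, #FILE# and #NEXT# could be bound for each page from nothing but the order of the files. The function names are hypothetical, and what spew really substitutes at the first and last page is not described here, so the empty string is purely an assumption.

```python
def builtin_bindings(files: list[str]) -> dict[str, dict[str, str]]:
    """For each page, bind the built-in names by position in source order."""
    bindings = {}
    for i, name in enumerate(files):
        bindings[name] = {
            "#FILE#": name,
            # Assumption: pages at either end get an empty string.
            "#PREV#": files[i - 1] if i > 0 else "",
            "#NEXT#": files[i + 1] if i + 1 < len(files) else "",
        }
    return bindings

def expand(text: str, bound: dict[str, str]) -> str:
    """Replace each built-in name wherever it appears in the text."""
    for name, value in bound.items():
        text = text.replace(name, value)
    return text
```

With pages a.html, b.html and c.html, expanding "|#NEXT#|next|" while producing b.html gives "|c.html|next|", and the same macro text gives a different result on every other page. Notice that #NEXT# needs the whole list of files up front, which is why this cannot be done in a single pass over the source.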


I implemented something which works for indexing. Part of it is really elegant, and part of it is a hack. The part that's a hack is how navigation ends up working; the part that's really elegant is in how the HTML itself is structured. So let's focus on each of these one at a time.

Tree Structure

I conceptualize the pages as being organized into what's known as a tree: a root index which lists all of the months, and a "month" index which lists all of the days for that month. (Should SPEW go on for a surprisingly long time, I might need to insert a year index.)

Here's a diagram I drew to illustrate it. Please forgive the low-tech approach, but I didn't want to spend hours in a paint program:

You might be used to seeing trees drawn "sideways" compared to this; in a moment you'll see why I chose this orientation.

I decided on a convenient hack to simplify the problem. I would only automatically generate the month indexes, which are the ones that require per-entry updating. I'd update the master-index by hand (once a month).

Now, I'd like index pages to be able to contain arbitrary text. That is, they should be "normal" pages that happen to contain indices of other pages. As such, they should just appear in the file somewhere, too. And, to keep things simple, I decided that they would be just like normal files in terms of #PREV# and #NEXT#. This leads to this somewhat bizarre chaining:

The first picture shows just the #PREV#-#NEXT# linkage; the second shows the automatically generated indices as well. The master index, which I maintain by hand, I chose to make be the first page in the file. It appears to be a normal page, but the index-generating code automatically ignores it for indexing purposes (a hack, but not a totally non-general one).

Unfortunately, if you squish the second picture back to the orientation of the originally shown tree, which is how one is normally conceptualizing the end result, you get this:

The upshot is that when you get to the end of the month and click 'next', it takes you to that month's index. Click on 'next' from there and you go to the first page of the next month. This is bizarre because what people are expecting is for days to be linked directly to each other, like this:

(And plausibly the month indices should be linked to each other.)

Now that I've explained the major navigation problems, I'll explain why I chose to interleave the indices with regular pages in this way--which is what causes the navigation problems with #PREV# and #NEXT# in the first place.

The reason for this interleaving is very simple. It makes the "range" of the indexing implicit. Remember how the decision for what text belongs in a particular file is determined by it being "bracketed" by two ".file" commands? The convenience of this is that it's easy to specify, nearly minimal in the amount of effort. You stick in one command--specifying the name of the file, something you have to do somehow--and the location of that command (and other equivalent ones) also specifies exactly which text belongs in the file.

As you might guess from the pictures above, indices work the same way. If two pages are specified as being indexes, then the second one indexes all the pages between them.

Why the second and not the first, like .file? Because I originally thought index generation was going to be a one-pass affair, and in a single pass an index can only list pages that have already been seen, so it had to come after them. Implementing #NEXT# turned out not to be one-pass--one of the major mistakes I made (and quite obvious, really--just a good example of the sort of thing you can forget if you're doing disorganized seat-of-the-pants design).

A page which contains an index is indicated with the command ".index". This can appear anywhere within the text for that page (as a command, so it must be at the beginning of the line). If a ".index" appears, the page is understood to be an index page, and all pages between it and the previous index page are indexed. So it's not the ".index" itself which brackets the material to be indexed, but the pages containing ".index".
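The bracketing rule can be sketched in a few lines of Python. This is only an illustration: the function name is invented, and the "skip" parameter is a stand-in for however the hand-maintained master index gets ignored, which isn't spelled out above.

```python
def index_ranges(
    pages: list[tuple[str, bool]],
    skip: frozenset[str] = frozenset(),
) -> dict[str, list[str]]:
    """pages is (filename, contains_index_command) in source order.

    Each index page gets every ordinary page that appeared since the
    previous index page.
    """
    result: dict[str, list[str]] = {}
    pending: list[str] = []
    for name, is_index in pages:
        if is_index:
            result[name] = pending
            pending = []
        elif name not in skip:
            # Assumption: pages named in skip (e.g. the master index) are never indexed.
            pending.append(name)
    return result
```

Given a master index, two March pages, a March index page, one April page and an April index page, in that order, the March index collects the two March pages and the April index collects the one April page.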

Index Text

Generating an index without "knowing about" HTML might have seemed like an amazing trick, but hopefully by now I've laid enough groundwork that this isn't really surprising at all.

To generate an index, you issue the command ".index". Like other commands, .index takes a single string parameter, which contains normal string escaping and the like.

At this point, things deviate a bit. .index then outputs text to the current file, consisting of one copy of that string for every file indexed by this page. Macro substitution is performed on each of these strings independently, and #ITEM# is bound to the name of the page being indexed.

In other words, a simple index page looks like this:

   SPEW Index Page
   <h1>SPEW Index Page</h1>
   .index "<li>|#ITEM#|a page|"

Which, if it indexes pages named "one.html", "two.html", and "three.html", will output something like

   <h1>SPEW Index Page</h1>
   <li><a href="one.html">a page</a>
   <li><a href="two.html">a page</a>
   <li><a href="three.html">a page</a>

Unfortunately, this index page has the flaw that the description of every link is "a page", but we'll fix that in a second. Nonetheless, you can see how .index can be used to generate list items, paragraphs, table rows or table entries. It doesn't know anything about typesetting, but it allows a reasonably flexible application. And it does quite a lot with one command: specifying the text to be used (which can thus vary for different indices) and at the same time delimiting the pages to be indexed.
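Here is a minimal Python sketch of that expansion, assuming the list of indexed files is already known. It is not how spew is written: the function names are hypothetical, and the "|" link macro is approximated with a regular expression rather than real macro substitution.

```python
import re

def expand_links(text: str) -> str:
    """Approximate the "|" macro: |target|label| becomes an HTML link."""
    return re.sub(r"\|([^|]*)\|([^|]*)\|", r'<a href="\1">\2</a>', text)

def expand_index(template: str, items: list[str]) -> list[str]:
    """Emit one copy of the template per indexed file, binding #ITEM# each time."""
    return [expand_links(template.replace("#ITEM#", name)) for name in items]

if __name__ == "__main__":
    for line in expand_index("<li>|#ITEM#|a page|", ["one.html", "two.html", "three.html"]):
        print(line)
```

The point of the sketch is that expand_index knows nothing about HTML; all of the markup comes from the template string and the macro definitions.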

Of course, you wouldn't be able to do something clever like generate the month's days in calendar format (split up by weeks), but I never promised you a rose garden. You couldn't even generate a list of all the days of the month, with links for only those days that have entries. spew doesn't know about months or weeks or days. It just has lists of files. A good compromise, I think. You could use spew to provide hierarchical indexing for anything with an essentially linear structure.

Actually, if you were sick, you could do something like calendar month arrangements, assuming you update every day. You have to go to some effort each month, but not per day. (If you don't update every day, you might be able to fake it, but it would be really hard.) Here's the trick. Since there are always seven days in a week, we can use a macro which cycles through seven values. On the first day of the week, it outputs the end of the previous row, the start of a new row, and the start of the day, and the actual data. The others just generate a new table cell.

To make it even more explicit, you could do something like this:

  general table handling:
  .define "STARTWEEK"   "<tr>"
  .define "ENDWEEK"     "</tr>"
  .define "DAY"         "<td>|#ITEM#|DATE|"

  the specific table handling for each day of the week:
  .define "MON"    "STARTWEEK DAY"
  .define "TUE"    "DAY"
  .define "WED"    "DAY"
  .define "THU"    "DAY"
  .define "FRI"    "DAY"
  .define "SAT"    "DAY"
  .define "SUN"    "DAY ENDWEEK"

  a macro to cycle through days of the week: the first day
  in the list should be the day of the week of the very first
  .define "WEEK"   "MON" "TUE" "WED" "THU" "FRI" "SAT" "SUN"

  for the .index, just say
  .index "WEEK"

  macros to set up the calendar for a new month, padding with
  empty days:
  .define "MONTHHEADER"  \ 

  .define "PAD2"  "<td><td>"
  .define "PAD4"  "PAD2PAD2"

  for a month starting on a particular day, put this before the .index

  cut and paste this macro into the beginning of each month
  .define "DATE" "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" \ 
                 "11" "12" "13" "14" "15" "16" "17" "18" \ 
                 "19" "20" "21" "22" "23" "24" "25" "26" \ 
                 "27" "28" "29" "30" "31"

(Each time you redefine a macro, it starts over from the beginning; there's no way to force the restart without redefining it completely; if there were, you'd just say .reset "DATE" each month.)
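The cycling behaviour this trick leans on can be modelled in Python as follows. The class and method names are made up for the sketch, and it assumes a macro wraps around to its first value after its last one, which is what "cycles" suggests.

```python
class CyclingMacros:
    """Toy model of macros that step through a list of values on each use."""

    def __init__(self) -> None:
        self._values: dict[str, tuple[str, ...]] = {}
        self._position: dict[str, int] = {}

    def define(self, name: str, *values: str) -> None:
        if not values:
            raise ValueError("a macro needs at least one value")
        self._values[name] = values
        # Redefining a macro starts it over from the beginning.
        self._position[name] = 0

    def use(self, name: str) -> str:
        values = self._values[name]
        position = self._position[name]
        # Assumption: after the last value, the macro wraps to the first.
        self._position[name] = (position + 1) % len(values)
        return values[position]
```

In this model, defining "WEEK" with seven values and using it once per indexed page is what walks through the days of the week, and pasting in a fresh definition of "DATE" each month is what sends it back to "1".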

I'm not saying anyone should do that, but it's kinda surprising what you can do with such a limited system.

Descriptive Page Information

Clearly, when the time comes to generate the index, it would be nice to be able to say something more than 'a page'. It would be nice for the data there to be "personalized" on a per-page basis. Of course, you could always do something like:

   .index "<li>|#ITEM#|#ITEM#|"

but that's likely to be a little ugly looking, and cause you to use silly filenames.

But it suggests an alternative:

   .index "<li>|#ITEM#|#DESCRIPTION#|"

where #DESCRIPTION# also varies on a per-indexed-file basis, just like #ITEM#.

But where would #DESCRIPTION# come from? Well, presumably the user would set it. The user could set it per page. In fact, we could change ".file" so it takes two strings--the page name and the description.

For various reasons, though, I decided that wasn't a satisfying solution. Instead, I added yet another command. For historical reasons, given the purpose to which I was putting it, I called it ".title", but it has no hard-coded relationship to HTML's <title>.

   .title  "descriptive-text-for-file"
      Sets the value of built-in macros in certain conditions:
         #FILE.title# during this output file
         #PREV.title# during next file
         #NEXT.title# during previous file
         #ITEM.title# when index iteration names this file

Note that unlike #FILE#, #FILE.title# is very useful, as it allows you to define the "title" associated with a page once, and use it to define the HTML title, the header for the page, and the description used to index the page. (SPEW uses all three.)

The syntax for referring to these variables may seem clumsy, but they're normally only ever used inside other macros, so they're used rarely, and keeping them verbose and unlikely to conflict seems wise. Furthermore, the design allows for a future meta-extension, in which the command ".title" is removed, and instead a command ".data" is created which allows defining new commands of the form ".title". That is, I imagine:

    .data  "date"
    .data  "title"

    .define "toolbar"   "|#PREV#|| ...etc..."
    .define "fulltitle" " : #FILE.title#"

    .file       "foo.html"
    .title      "Foolish discussion"
    .date       "April 1, 2000"

Alternately, rather than creating a syntax for spawning new structures, you could use the slightly less convenient idea of using a command to attach arbitrary (name, value) pairs to a file:

    .define "toolbar"   "|#PREV#|| ...etc..."
    .define "fulltitle" " : #FILE.title#"

    .file "foo.html"
    .info "title"   "Foolish discussion"
    .info "date"    "April 1, 2000"

That's more typing, but a better usage of namespaces. I haven't bothered with either of these yet, but it could happen.
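Either variant boils down to attaching named values to a file and deriving built-in macros from them. A small Python sketch of that derivation, with an invented function name and with the assumption that a page with no neighbour simply gets no #PREV.*# or #NEXT.*# binding:

```python
def attached_bindings(
    pages: list[tuple[str, dict[str, str]]],
) -> dict[str, dict[str, str]]:
    """pages is (filename, {name: value}) in source order.

    Returns, for each file, macros such as #FILE.title#, #PREV.title#
    and #NEXT.date# built from its own data and its neighbours' data.
    """
    result: dict[str, dict[str, str]] = {}
    for i, (filename, _) in enumerate(pages):
        bound: dict[str, str] = {}
        neighbours = {"PREV": i - 1, "FILE": i, "NEXT": i + 1}
        for role, j in neighbours.items():
            # Assumption: a missing neighbour contributes no bindings at all.
            if 0 <= j < len(pages):
                for key, value in pages[j][1].items():
                    bound[f"#{role}.{key}#"] = value
        result[filename] = bound
    return result
```

The #ITEM.*# forms would come from the same table, looked up for whichever file the index iteration is currently naming.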

You may note that the actual toolbar also includes a link back to the current month, but I haven't specified a macro name that refers to "whatever page indexes this one". Indeed, I could and probably will do so in a future revision. For now, immediately before a given month's pages, I define a macro whose value is the name of that month's index file. The code to generate the navigation toolbar refers to that macro. If I don't remember to add a new definition each month, the toolbar will do the wrong thing.

Fixing the Navigation

I haven't done it, and I don't know if I'm going to, but I can see how I could address all the problems with a single command, and a not unreasonable amount of coding. Each page would get specified with a "level"--a depth within the tree. This might be a command, which would make it more explicit, or it might be able to be attached to the ".file" command. The "level" in the tree would be the distance from the right as shown in the illustrations above. The daily pages would be at level 0, the months at level 1, and the master index at level 2. (Or the years would be level 2, and the master index level 3.)

The meaning of #NEXT# would change from "the next file in the source" to "the next file in the source at the same level". As a result, #NEXT# for the last day of a month would refer to the first day of the next month, and #NEXT# for a month would refer to the next month. This would clean up the #NEXT# navigation and even provide nice navigation between months. Furthermore, it would allow generation of higher level indices; ".index" would generate an index of all pages of the next-lowest level since the last index page at this same level.

Finally, introduction of the macro #INDEX# would allow each page to correctly refer back to the index page which refers to it.
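A minimal Python sketch of the level idea, under the assumption that every page carries an integer level and that pages stay in source order. The function names are hypothetical, and this only illustrates the proposal as described above.

```python
from typing import Optional

Page = tuple[str, int]  # (filename, level)

def next_at_same_level(pages: list[Page], i: int) -> Optional[str]:
    """The proposed #NEXT#: the next file in the source at the same level."""
    level = pages[i][1]
    for name, other_level in pages[i + 1:]:
        if other_level == level:
            return name
    return None

def indexed_by(pages: list[Page], i: int) -> list[str]:
    """Pages one level down that appear since the last page at this level."""
    level = pages[i][1]
    found: list[str] = []
    for name, other_level in reversed(pages[:i]):
        if other_level == level:
            break
        if other_level == level - 1:
            found.append(name)
    found.reverse()
    return found
```

With daily pages at level 0 and month indexes at level 1, the last day of March is followed by the first day of April, and the March index is followed by the April index, which is the navigation people actually expect.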

I'm A Liar

After writing the above descriptions, I couldn't resist, so I went and implemented them, which is why the description doesn't match the actual navigation. I implemented both ".level" and ".data"--the latter can be seen in some of the "short descriptions" in the monthly indices.

I still didn't implement a "#INDEX#" which would point to the page which indexes the current page (allowing for the "month" link to be automated), because currently I don't have the information available at the correct time (I don't do any index processing during the first pass, and I'd need to know which index a page belongs to during the second).

That's All He Wrote

Once again, spew feels to me like a very elegant, tight little program which provides pretty much the minimum toolset necessary to accomplish the task. (This becomes more clear if the above extension for defining arbitrary "attached definitions" is used.)

I hope you've enjoyed this tour of my silly systems. The source code to macro and spew is available on my source code pages, and you can email me for win32 executables if you're desperate.

I will leave you now with the complete (as of this writing) list of macro definitions used to build SPEW, slightly tweaked so I wouldn't have to do too much quoting.

.comment   Basic macros for doing HTML more nicely

.raw "_"    "<b>"  "</b>"
.raw "@"    "<i>"  "</i>"
.raw "<"    "&lt;"
.raw ">"    "&gt;"
.raw "&"    "&amp;"

.raw ".p"      "<p>"
.raw "\n"     "<br>"
.raw ".pre"    "<pre>"   "</pre>"
.raw ".quote"  "<blockquote>"  "</blockquote>"
.raw ".rule"   "<hr>"

.raw "|"    "<a href=\""     "\">"    "</a>"
.raw ".h1"  "<h1>"    "</h1>"
.raw ".h2"  "<h2>"    "</h2>"
.raw ".h3"  "<h3>"    "</h3>"
.raw ".h4"  "<h4>"    "</h4>"
.raw ".h5"  "<h5>"    "</h5>"
.raw ".h6"  "<h6>"    "</h6>"

.raw ".head"  "<html><head><title>"
.raw ".end"   "</body></html>"
.raw ".body"  "</title></head><body>"

.raw ".list"    "<ul>"
.raw ".item"    "<li>"
.raw ".endlist" "</ul>"
.raw ".image"   "<img src=\""       "\">"

.raw "\_"   "_"
.raw "\@"   "@"
.raw "\\\\"  "\\"
.raw "\<"   "<"
.raw "\>"   ">"
.raw "\&"   "&"
.raw "\|"   "|"
.raw "\."   "."

.comment   Specialized macros describing the format
.comment      of all the basic page elements

.define "LINKNEXT"      "|#NEXT#|next|"
.define "LINKPREV"      "|#PREV#|prev|"
.define "LINKINDEX"     "|.|index|"
.define "LINKMAIL"      "buzzard @at@ world @dot@ std @dot@ com"
.define "LINKMONTH"     "|MONTHPAGE|month|"
.define "LINKHOME"      "||home|"

.define "INDEXtOOLBAR"  "|#PREV#|prev| : |#NEXT#|next month|\ 

.raw    "TABLEHACK"                                    \ 
    "<table border=0 width=100%><tr><td>"              \ 
    "<td align=right><b>SPEW</b></tr></table><p>"


.define "DAYHEADER"   "TOPtOOLBAR .h2 #PAGE.title# .h2"
.define ".daystart"    ".head SPEW: #PAGE.title# .body DAYHEADER"
.define ".dayend"      ".rule LONGtOOLBAR .end"
.define ".song"        ".rule LONGtOOLBAR \\nattribution dammit:"
.define ".songend"     ".end"

.comment   A monthly index page looks like this:
.comment       .page "mar99.html"
.comment       .title "March 1999 index"
.comment       INDEXSTART
.comment       .index "INDEXFORMAT"
.comment       INDEXEND

.define "INDEXTITLE"    ".h6 Sean's Personal Electronic Writings .h6"

.define  "INDEXSTART"   ".head SPEW: #PAGE.title# .body TOPtOOLBAR \ 
INDEXTITLE .h2 #PAGE.title# .h2 .list"

.define  "INDEXFORMAT"  ".item |#ITEM#|#ITEM.title#| @#ITEM.desc#@"

.define  "INDEXEND"     ".endlist .p .rule INDEXtOOLBAR .end"
