Sunday, October 24, 2004

What Yesterday Taught Me

In last night's game the action was too fast and furious to keep up with.

Text Editing

Somehow (isn't that the way these things always happen?) I made a mistake in the HTML, the mark-up language of the web, on the Cards-Sox page. This experience brings me to my next idea: how to make text editing work better, or at least less error-prone. And while we're at it, my hidden agenda is always to discover ever more generalizations: in text, programming, editing, human expression. This is a step in that general direction.

I'm a fan of the macro programming language m4, which allows a writer to re-use frequently repeated runs of text. Here's a Google search of m4. Often, in situations like this, editing on-line content, there is quite a bit of "dark matter" to provide the gravity that holds the product together and provides its structure. You know, the paragraphs, lists, headings, even the images.
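To make the idea of re-using runs of text concrete, here is a minimal sketch in Python. It is not m4 and does not follow m4's syntax rules; the macro names (H2, LINK), the NAME(arg, arg) call form and the $1, $2 placeholders are my own assumptions, chosen only to show repeated HTML "dark matter" captured once and expanded with arguments.

```python
import re

# Hypothetical macro table: name -> template with $1, $2 ... placeholders.
# The names and the call syntax are illustrative, not real m4 definitions.
MACROS = {
    "H2": "<h2>$1</h2>",
    "LINK": '<a href="$1">$2</a>',
}

# Matches NAME(arg1, arg2); nested parentheses are deliberately not handled.
_CALL = re.compile(r"(\w+)\(([^()]*)\)")


def expand(text, macros=MACROS):
    """Replace every known NAME(args) call in text with its filled-in template."""

    def replace(match):
        name, raw_args = match.group(1), match.group(2)
        if name not in macros:
            return match.group(0)  # unknown names are left untouched
        args = [a.strip() for a in raw_args.split(",")]

        def fill(placeholder):
            index = int(placeholder.group(1))
            # Missing arguments expand to the empty string.
            return args[index - 1] if 1 <= index <= len(args) else ""

        return re.sub(r"\$(\d)", fill, macros[name])

    return _CALL.sub(replace, text)


print(expand("H2(Text Editing)"))
print(expand("See LINK(http://example.com, this page) for more."))
```

The writer types the short form; the structure lives in one place, the macro table, where a mistake only has to be fixed once.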

M4 is a handy tool to capture and re-use the repeated text, seen or un-seen in the case of HTML. The world is moving on from "batch" and "compile" environments, where some jobs are done ahead of time, to "just-in-time" delivery of goods and services. I'm thinking that's appropriate here too. The web already has a good deal of timely information, much of it in on-line shopping, using databases with concurrent processes, locks, semaphores. All the recent software innovation is well used. My target is a little narrower. It's us.

As we sit here in our web logs, typing away, somewhere below the words is the structure. I, for one, would like to capture that in a useful way. To get an idea of the problem, select yesterday's Cards-Sox entry and try to make some sense out of the HTML. It's not worth my time in its current shape. But I've previously used m4 to format web pages on my own, and would like to bring it into this process, the on-line publishing. As I edit this very page, in addition to the "preview" option, I have an "Edit HTML" and a "Compose" tab. Most of the time, I'm composing. Occasionally, I like to see the real stuff, and switch to the HTML view. To make this work, I'm proposing to get the macro-processing out of the batch, pre-compile mode, and work it into the editor. My waking thought this morning was that maybe there's an emacs mode in this, though I think that will be a stretch.

A Macro Tool

Here's what I'm talking about, requirements-wise (sort of):
  • capture repeated text, both identical runs and patterns with arguments
  • install it in a client-server model, which means removing it from a pre-compile or batch process
  • support functional and procedural languages, to the possible omission of Object-Oriented languages
  • move personal communications onto the web (relying on someone else's server for backup and data security; this may need to be examined in detail)
This was where my thinking was yesterday morning. Yesterday was a day of great awakening for me, returning to this web log at blogger.com. I opened up this web log, Three Martinis, for reasons you can see below. In the process, I've had some of my preconceived ideas about the value of this medium challenged. Recently a critic, I'm now a convert, probably for the opportunity to ramble on, if not make an occasional point. The experience of Cards-Sox leads me to these conclusions (sort of an implementation):
  • The macro-processor belongs in the "keystroke interpreter" portion of the process
  • It communicates with the file store using stored templates and a view of the edited product.
  • It is stateful, either in a discovery mode or user dialogue. Think of the simple text-completion tools in browsers as an already-existing example of this process
  • Users may, like the editor for this web log, see a Compose mode, or an HTML (e.g.) mode, or an editor mode. The latter shows the structure in a raw sense, regardless of the target language.
My view of how this works is also informed by the workings of emacs. Each keystroke is interpreted as you type. Emacs is stateless in that sense, or it may be thought of as one big state machine, where each keystroke takes you to a possibly different state. In emacs's case, the default state (or action) is to insert the just-typed character at the point in the buffer, and move the point. But note the action when you are searching for text. My copy of emacs highlights the visible text fragments that match the search criteria. The cursor moves to the first matching string beyond the current point.
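The keystroke-as-state-machine picture can be sketched in a few lines of Python. This is a toy under stated assumptions, not how emacs is actually implemented: the class name ToyEditor, the two modes, and the key names "C-s" and "RET" are all mine. In the default mode a key is inserted at the point; in search mode each key extends the query, records every match (what a real editor would highlight), and moves the point to the end of the first match at or after the place where the search began.

```python
class ToyEditor:
    """Toy keystroke interpreter: every key is dispatched on the current mode."""

    def __init__(self, text=""):
        self.buffer = text
        self.point = len(text)  # insertion position
        self.mode = "insert"    # the default state
        self.query = ""         # search string typed so far
        self.origin = 0         # where the point was when the search began
        self.matches = []       # start offsets a real editor would highlight

    def _find_all(self):
        """Return the start offset of every occurrence of the query."""
        starts, i = [], self.buffer.find(self.query)
        while i != -1:
            starts.append(i)
            i = self.buffer.find(self.query, i + 1)
        return starts

    def key(self, k):
        if self.mode == "insert":
            if k == "C-s":  # hypothetical key name: begin a search
                self.mode, self.query, self.origin = "search", "", self.point
                self.matches = []
            else:           # default action: insert at the point, move the point
                self.buffer = self.buffer[:self.point] + k + self.buffer[self.point:]
                self.point += len(k)
        elif k == "RET":    # hypothetical key name: leave search mode
            self.mode = "insert"
        else:               # search mode: extend the query and search again
            self.query += k
            self.matches = self._find_all()
            hit = self.buffer.find(self.query, self.origin)
            if hit != -1:
                self.point = hit + len(self.query)
```

The same key means different things in different states, which is the hook a macro-processor living in the keystroke interpreter would need.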

Applications

The constraints on generalization are easy to imagine. If one has a large amount of text to re-use, such as legal boilerplate, frequently you see phrases such as "John Doe, hereafter referred to as the Party of the First Part (PFP)". This is an example of a request by the writer to make you, the reader, perform the necessary substitution as a mental exercise. Where the text is considerable, and there are but a few values to substitute, but possibly many places to perform the substitution, it is quite cost-effective to capture the boilerplate and insert the few instances of the repeated text. Form-letter, or mail-merge, software is an example of this use. If you think about how these are used, they are still batch processes. In truth, for such large amounts of text, the process is likely to remain a batch. Think of the job: "Fred, call up the contract and insert Grace Jones as the 2nd supplier". As the amount of text to re-use decreases, unless it's a tightly controlled legal requirement, it is frequently easier to simply re-type it, as the overhead of accessing it and bringing it into contact with the specific information is greater than the cost of re-typing. It's for each application, based on its re-use patterns, to determine the lower threshold on the size of a re-use pattern.
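The mail-merge case, few values and many places to put them, is the easy one to sketch. The example below uses Python's standard string.Template; the contract wording and the field names (supplier, role) are invented for illustration and stand in for real boilerplate.

```python
from string import Template

# Hypothetical boilerplate: two values, several places where they appear.
CONTRACT = Template(
    "$supplier, hereafter referred to as the $role, agrees to deliver the goods. "
    "The $role shall notify the buyer of any delay. "
    "Payment is due to $supplier within thirty days."
)


def fill(template, **values):
    """Substitute every placeholder; a missing value raises KeyError."""
    return template.substitute(**values)


print(fill(CONTRACT, supplier="Grace Jones", role="2nd supplier"))
```

Here the machine does the substitution the "hereafter referred to as" phrase otherwise asks the reader to do. Using substitute rather than safe_substitute is deliberate: a forgotten value fails loudly instead of leaving a stray placeholder in the contract.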

So, I'm imagining a world where re-used patterns are marketed, again imposing a threshold, but since it's economic, the costs and benefits may be weighed. Imagine this: you're typing, and you're informed by your editor there is an applicable text pattern for your use. You see a copy of your work, a copy of the text pattern, and an insertable template, which you may select. The first application is for those of us who teach, and defend against plagiarism. I needn't describe the scenario.
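A rough sketch of the "your editor informs you" step, in Python. Everything here is an assumption of mine: the pattern library, the function name suggest, and the rule that a pattern applies once the last few characters typed match the start of that pattern. A real tool would need something much smarter than a prefix match.

```python
# Hypothetical pattern library: name -> stored text.
PATTERNS = {
    "pfp": "hereafter referred to as the Party of the First Part",
}


def suggest(typed, patterns=PATTERNS, min_overlap=8):
    """Return (name, remaining_text) for each pattern the writer has begun typing.

    A pattern applies when the end of `typed` matches the start of the
    pattern for at least `min_overlap` characters.
    """
    hits = []
    for name, text in patterns.items():
        longest = min(len(typed), len(text))
        # Try the longest possible overlap first, then shorter ones.
        for size in range(longest, min_overlap - 1, -1):
            if typed.endswith(text[:size]):
                hits.append((name, text[size:]))  # the insertable remainder
                break
    return hits


print(suggest("John Doe, hereafter referred"))
```

The min_overlap value is the threshold from the previous section in miniature: below a certain size, offering the pattern costs the writer more attention than simply typing it.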

Three Martinis

In the mid-'70s, while a paid programmer for NORAD, I observed events which inform the ideas behind Three Martinis. With a personal suffix of "third", namely MMIII, the number three has especial significance to me. Working on my software, I had to program an interface to some system software, the Man-Machine Interface, or MMI, as it was known. After a few months, there was a sufficiently useful MMI document, and lo and behold, it worked as advertised. The interaction was pretty low-level for the Fortran IV most of us orbital mechanics used, but no problem. I was working my way through one, then two separate interfaces to this software. All was going well. Then, when I started an interface to a third module, I got the rather bright idea of generalizing this for all three modules, as the code was looking quite similar. It's worth noting the programming means was a batch card deck (or coding sheets for those who couldn't type). Of course this interface worked. It was all my software. As an aside, when I proposed this for the general library, I had to put it before the committee, and was rejected with the notion that "you're an applications programmer, not a systems programmer". That non-technical rejection proved the necessity of my interfacing module. I was able to shop it around to the group of ten I worked with. It didn't make the public library.

So, why three? For any number of years, I've had the occasion to repeat:

My threshold of pain is three.

Three of what? Three of anything. The number of applications for the principle that the third one is redundant (oops) is considerable. In software, there are readers and writers, clients and servers, processes and data, ... the list goes on. We are conditioned to think of binary as something fundamental. It seems that if you are doing something for the third time, you haven't discovered the appropriate generality. Such was the case with the NORAD MMI. If you have written two instances of a piece of code, and only two, the odds are good you'll need no more. But if you cross the two-three threshold, the need is likely to continue indefinitely. That's the time to generalize.
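The threshold can be put to work mechanically. Here is a small Python sketch that flags any line of text appearing three or more times as a candidate for a macro; the function name and the choice of whole lines as the unit of repetition are my assumptions, and real repeated patterns are rarely this tidy.

```python
from collections import Counter


def worth_generalizing(text, threshold=3):
    """Return (line, count) for lines seen at least `threshold` times."""
    counts = Counter(line.strip() for line in text.splitlines() if line.strip())
    return [(line, n) for line, n in counts.most_common() if n >= threshold]


sample = "<p>\nhello\n<p>\nworld\n<p>\nhello\n"
print(worth_generalizing(sample))
```

With the default threshold, something typed twice is left alone and something typed a third time is reported.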

Leaving for another day: is generality always better?




