Wednesday, May 21, 2008


In April, I gave a workshop for the Boston STC on doing structured authoring without using DITA or Frame. During the workshop, I mentioned a technology called microformats, and several people asked me to define it, both during and after the session. I did but, in retrospect, I wasn't satisfied with the answer I gave. So here's a better definition, along with some of its ramifications.

Microformats use elements and attributes from existing languages or standards, like HTML, to mark up web content so that it carries semantic information for use in Web 2.0, without requiring anyone to adopt new languages or standards. In short, microformats re-use existing features of current languages and standards.

There are several issues tied up in this definition. Let's take a look at two big ones.

- Existing languages or standards... - Every language or standard has a number of widely-used features and an often much larger number of little-known features. The latter often go unused or unnoticed. For example, the rel attribute of the link tag is what identifies the linked file as the stylesheet that we attach to a topic in a help system (the href attribute gives the file's location), but rel often goes unnoticed unless you delve into the code. Yet rel is actually pretty flexible, offering a bunch of pre-defined values *and* the ability to define your own. This can get pretty esoteric, but it does not require you to buy and learn new software, just new ways to work with what you already have.

- Semantic... - HTML tags like h1 tell us about structure and presentation rather than meaning. In other words, applying h1 to text tells us that it is a top-level heading and how to display it, but not what it is. For example, consider an online book store that uses HTML to mark up its listings. We could create a book listing and use h1 for the book title and h2 for the name of the author. We can then format the display by specifying the style attributes for h1 and h2, but we have no way of knowing that h1 is actually the book title and h2 is the author's name - i.e. the semantics of the information. XML lets us fix this by creating our own, semantically-definitive tags, such as tags named for the book title and the author rather than h1 and h2. But HTML already has elements that carry semantic information, such as the "cite" element that lets us identify a piece of text as a citation. In other words, we may well be able to add semantic information without having to move to XML or DITA.
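To make the two points above concrete, here is a minimal Python sketch of how a program might read semantic information out of ordinary HTML. It rests on some assumptions: published microformats such as hCard carry their meaning in class attribute values, but the class names `book-title` and `book-author`, the file name `help.css`, and the `SemanticExtractor` class are all invented for this illustration and are not part of any published microformat.

```python
from html.parser import HTMLParser

# Hypothetical book listing. The class names and the stylesheet file name
# are made up for this sketch; only rel="stylesheet" is standard HTML.
LISTING = """
<link rel="stylesheet" href="help.css">
<div>
  <h1 class="book-title">Microformats: Empowering Your Markup for Web 2.0</h1>
  <h2 class="book-author">John Allsopp</h2>
</div>
"""

# Elements that never have a closing tag, so they must not go on the stack.
VOID = {"link", "br", "img", "meta", "hr", "input"}


class SemanticExtractor(HTMLParser):
    """Collect text keyed by class name, plus rel values on any element."""

    def __init__(self):
        super().__init__()
        self.fields = {}   # class name -> text content
        self.rels = []     # (rel value, href) pairs
        self._stack = []   # class name (or None) for each open element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "rel" in attrs:
            self.rels.append((attrs["rel"], attrs.get("href")))
        if tag not in VOID:
            self._stack.append(attrs.get("class"))

    def handle_endtag(self, tag):
        if tag not in VOID and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Attribute the text to the nearest enclosing element with a class.
        for cls in reversed(self._stack):
            if cls:
                self.fields[cls] = self.fields.get(cls, "") + text
                break


extractor = SemanticExtractor()
extractor.feed(LISTING)
print(extractor.fields)
print(extractor.rels)
```

The listing still displays as an h1 and an h2, but a program can now tell which text is the title and which is the author, using nothing beyond attributes HTML already has.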

For a detailed overview of microformats, I recommend Microformats: Empowering Your Markup for Web 2.0 by John Allsopp, published by friends of ED. In fact, I recommend reading the book even if you never plan to use microformats, because of two other useful aspects of the book.

The first is the author's discussion of structural and semantic HTML in chapter 3. Here, he discusses some of the more rigorous programmatic aspects of HTML, how they're implemented, and why they're important for the long run.

The second is the nuggets sprinkled throughout the book, such as this one on why XML is important for RSS feeds.

"...RSSs are also XML-based languages, meaning that feeds must at least be well-formed..."

(from Microformats: Empowering Your Markup for Web 2.0, page 226.)

Why does this matter for technical communication? Today, most material produced by technical communicators is self-contained - e.g. a help system or user manual produced by one developer. But the web already has features like RSS feeds and aggregators that may be just as useful for technical communication. The nugget above suggests that RSS feeds and aggregators will most likely require XML, which in turn will require a move away from loosely written HTML and the adoption of authoring tools that produce content that's at least well-formed, if not valid. ("Well-formed" means the content follows XML syntax rules; "valid" means it also conforms to a DTD or schema.)
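As a rough illustration of what "well-formed" means in practice, the following Python sketch feeds two versions of the same paragraph to the standard library's XML parser. The helper name `is_well_formed` and the sample strings are my own, chosen for the example. Note that this checks well-formedness only; checking validity would need a DTD or schema and a validating parser, which this sketch does not attempt.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, i.e. it is well-formed."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Typical HTML that browsers accept: the <br> is never closed.
loose_html = "<p>First line<br>Second line</p>"

# The same content written with XML syntax: every element is closed.
strict_xml = "<p>First line<br/>Second line</p>"

print(is_well_formed(loose_html))  # False
print(is_well_formed(strict_xml))  # True
```

A browser displays both versions the same way, but an XML-based tool such as a feed aggregator would be within its rights to reject the first one.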

The book assumes familiarity with HTML, XML, IETF, and other acronyms and is very dense, but it's a quick read if you just look at a few code samples to get the idea and focus instead on the larger issues of programmatic rigor. Highly recommended.
