Wednesday, November 24, 2010

W3C Mobile Web Best Practices and HAT-Based Mobile – Part 10

A recommendation regarding terse markup…

45. [MINIMIZE] – Use terse, efficient markup.

I noted in part 9 that the desktops and laptops for which we create online content have so much power that we often code verbosely and inefficiently because we can. And it’s less work. I was referring to style sheets there, but the same point is true for markup in topics.

The issue of terse, efficient markup falls into several areas. In each case, we have to ask whether the benefit is worth the effort.

Redundant White Space

One area, according to the W3C, is that “content …marked up in… XML can… be made smaller while preserving… semantics by removal of redundant white space (i.e. spaces and new lines).” The W3C doesn’t say so explicitly, but I think it is referring to white space and new lines in code view, not WYSIWYG. There is an efficiency issue in WYSIWYG too, though, so I’ll look at that as well.

Code View – “Pretty Printing”

The W3C suggests avoiding “pretty printing,” the formatting of markup with indentation, because it “can generate large amounts of white space and… add to page weight.” But it also notes that “pretty printing” may be an important part of the authoring, in which case “try to arrange that redundant white space is stripped when serving a page.” It further notes that “…some network proxies strip white space that they think is redundant, [but] not all do so, so… not… to rely upon this behavior.” What is this about?

Look at a topic in code view in your authoring tool. You’ll see something like this:

Note the indentation and new lines, all of which add space. For example, the code above uses 194 bytes. Collapsing the indents to look like this:

reduces the size to 184 bytes, a 5.2% drop. And this:

reduces it to 172 bytes. So eliminating the indentation and white space reduces the size of the file from 194 bytes to 172, or 11.3%. (Your numbers will differ.) This may seem trivial, but smaller files reduce network traffic load. However, the increased terseness and efficiency of the markup also make the code less readable. The first example is easier to read than the second and certainly the third.
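To make the trade-off concrete, here is a minimal Python sketch of stripping pretty-printing white space when a page is served while keeping the readable version for authoring. The sample markup and the function names are made up for illustration; they are not taken from any authoring tool, and the byte counts it reports will not match the ones quoted above.

```python
import re


def strip_pretty_printing(markup):
    """Remove indentation and line breaks that sit between two tags."""
    # Only white space runs that contain a line break and lie between
    # '>' and '<' are removed. Caution: this is too aggressive for
    # <pre> blocks or for inline tags split across lines, where the
    # white space is meaningful.
    collapsed = re.sub(r">\s*\n\s*<", "><", markup)
    return collapsed.strip()


def percent_saved(original, stripped):
    """Return the size reduction in percent, measured in UTF-8 bytes."""
    before = len(original.encode("utf-8"))
    after = len(stripped.encode("utf-8"))
    return round(100 * (before - after) / before, 1)


# Hypothetical pretty-printed topic fragment.
sample = (
    "<body>\n"
    "    <h1>Title</h1>\n"
    "    <p>Some text.</p>\n"
    "</body>\n"
)

compact = strip_pretty_printing(sample)
print(compact)
print(percent_saved(sample, compact), "% smaller")
```

Running something like this as a publishing step, rather than by hand, lets authors keep the readable version in the project and serve only the compact one.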

This may force you to weigh the need to work at the code level versus code efficiency. If code efficiency is most important, you’d buy the most efficient authoring tool you can and make it a policy not to work in code except in unusual cases. Conversely, if you want to be able to work in code, you may go with a tool that creates less efficient markup. Talk to Engineering or IT about this.

WYSIWYG View – Line Spaces

HTML and XHTML collapse multiple character spaces, so we can’t use multiple spaces to indent text. However, one thing we often do is add a line space between paragraphs, a bad habit left over from our word-processing days, rather than adding a “space after” or bottom-margin property to Normal and other body-content related styles. Does using the margin property instead of line spaces reduce file size? Based on a rough test, yes.

I created a topic with five lines of text, effectively paragraphs, in Normal style, with one line space after each paragraph and a bottom margin of 0 in Normal style. The file size was 630 bytes. When I set the bottom margin to 1.12 em in Normal style and deleted the line spaces, the file size dropped to 538 bytes, a 14.6% drop. (Again, your numbers will differ.)

I always suggest that clients and attendees in training classes set paragraph spacing using a top and bottom margin setting for the Normal style in the CSS simply because the more we control through the CSS rather than by local formatting, the more future-proofed our content is. A file size reduction of this magnitude is just one more reason to get rid of hard line spaces between paragraphs.
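As a rough illustration of that cleanup, the Python sketch below removes empty "spacer" paragraphs from a topic so that a bottom margin in the CSS can do the spacing instead. It assumes the line spaces were saved as paragraphs containing only a non-breaking space or a line break, and that Normal style maps to the p element; both are assumptions, so check what your own tool actually writes.

```python
import re

# Assumed replacement rule for the style sheet (Normal mapped to <p>).
NORMAL_RULE = "p { margin-bottom: 1.12em; }"

# A paragraph that holds nothing but spaces, &nbsp; entities or <br> tags.
# The (?:\s[^>]*)? part allows attributes but keeps <pre> from matching.
SPACER = re.compile(
    r"<p(?:\s[^>]*)?>(?:\s|&nbsp;|&#160;|<br\s*/?>)*</p>\s*",
    re.IGNORECASE,
)


def remove_spacer_paragraphs(topic_html):
    """Return the cleaned markup and the number of spacers removed."""
    cleaned, count = SPACER.subn("", topic_html)
    return cleaned, count


# Hypothetical topic with a line space after each paragraph.
topic = (
    "<p>First paragraph.</p>\n"
    "<p>&nbsp;</p>\n"
    "<p>Second paragraph.</p>\n"
    "<p>&nbsp;</p>\n"
)

cleaned, removed = remove_spacer_paragraphs(topic)
print(removed, "spacer paragraphs removed")
print(len(topic.encode("utf-8")), "->", len(cleaned.encode("utf-8")), "bytes")
```

Back up the project before running any bulk change like this, and spot-check a few topics afterward.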

Excess Code

We may also find that our topics contain additional, proprietary codes. Some are inserted by our help authoring tool to add custom features. Others may appear in topics created by importing Word files into the help authoring tool. These extra codes increase the file size. Is it worth trying to get rid of them?

If we plan to keep our current authoring tool, then any proprietary code can probably stay in the file, even if the code is inelegant. However, this can be risky if we’ve been using a particular tool for years and take the proprietary codes for granted, only to learn that they don’t convert well when we change tools or formats and must be fixed by hand. Because of that, it’s a good idea to periodically review a sample of topic files created by our tools and check the codes to see what’s standard and non-standard.

A similar problem occurs in topics created by importing Word files. The files often have Word-specific code that may not harm the topics but does increase their file sizes. And it is possible that those Word-specific codes might cause trouble during some conversion in the future. So, again, it’s a good idea to periodically review a sample of topics created from Word files and check the codes to see what’s standard and non-standard. I don’t know of any specific references for this, but I recently read a book called ePub Straight to the Point by Elizabeth Castro, published by Peachpit Press in 2011, that discusses how to convert Word files to ePub format. In the discussion, Castro explains what many of the Word codes are and how to delete them. I don’t think it’s the complete answer but it is worth a look.
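One way to start such a review is to count the most recognizable Word leftovers in a topic before deciding whether cleanup is worth the effort. The Python sketch below looks for a few markers commonly seen in Word-generated HTML (mso- style properties, Mso class names, o:p tags, and mso conditional comments). The list is an assumption and certainly incomplete, and the function name is made up for illustration.

```python
import re
from collections import Counter

# A few markers commonly seen in Word-generated HTML (not a complete list).
WORD_PATTERNS = {
    "mso- style property": re.compile(r"mso-[a-z-]+\s*:", re.IGNORECASE),
    "Mso class name": re.compile(r"class\s*=\s*[\"']?Mso\w+", re.IGNORECASE),
    "o:p tag": re.compile(r"</?o:p\b[^>]*>", re.IGNORECASE),
    "mso conditional comment": re.compile(
        r"<!--\[if[^\]]*mso[^\]]*\]>", re.IGNORECASE
    ),
}


def audit_word_markup(topic_html):
    """Count occurrences of each Word-specific marker in one topic."""
    counts = Counter()
    for label, pattern in WORD_PATTERNS.items():
        hits = len(pattern.findall(topic_html))
        if hits:
            counts[label] = hits
    return counts


# Hypothetical fragment of an imported topic.
sample = (
    '<p class="MsoNormal" style="mso-margin-top-alt:auto">'
    "Imported text<o:p></o:p></p>"
)

report = audit_word_markup(sample)
for label, hits in sorted(report.items()):
    print(label, hits)
```

A report like this only tells you what is there. It deliberately deletes nothing, since removing these codes is the risky part.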

The big problem in deleting excess, proprietary code added by an authoring tool or Word is simply the greater knowledge required to do so and the risk of breaking something. So while eliminating these codes may increase terseness and efficiency and reduce file size, it seems like the most difficult part of the job and one to be approached very carefully and after some experimentation.

More to come…

Monday, November 22, 2010

W3C Mobile Web Best Practices and HAT-Based Mobile – Part 9

A recommendation regarding style sheet efficiency…

44. [STYLE SHEETS SIZE] – Keep style sheets small.

The desktops and laptops for which we create online content have so much power that we often code inefficiently because we can get away with it, and it’s less work than coding efficiently. But mobile devices lack the power and resources of desktops and laptops, so they force us toward more optimized control files.

One of the most important of those files is the CSS, and one issue is whether our target mobile devices support CSS caching. After a device has retrieved the CSS for a topic, will it store that CSS to use for the next topic, or does it have to retrieve the CSS again? If the device caches the CSS, we want to keep the CSS small to use minimal device resources. If the device does not support caching, it has to retrieve the CSS for each topic, so we still want the CSS small to reduce network traffic for the repeated downloads.

The W3C notes several ways to keep CSSs small but the simplest is to “…optimize style information so that only styles that are used are included.” How does this apply to help?

When our help authoring tools create style sheets, they include a default set of styles, like heads 1 - 6. The problem is that we often accept that set of default styles, even though we never use some of them in our topics. Those unused styles simply make our CSSs larger to no purpose. The solution is to remove the unused styles from the CSS. How?

We can easily remove unused styles through the authoring tool interface. But how do we know if those styles are really unused? Even if we keep careful records of style usage, it’s difficult to be sure - especially when dealing with legacy projects or large projects that we’re not totally familiar with.

A better idea is to search for the style codes in the topic code. Use the authoring tool’s multi-file search feature to look for the questionable codes within all the topics in the project. If the search doesn’t find any hits, then delete the style code from the CSS. This is a sure-fire approach technically, but it can be slow and tedious to work your way through a large and complex CSS or one that you have not checked for a while.
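If the multi-file search gets too tedious, the same check can be scripted. The Python sketch below compares the class names defined in a style sheet with the class attributes found in a set of topics and lists the ones no topic uses. It is a simplified illustration: it only covers class selectors (not element styles like unused heading levels), it ignores rules nested inside @media blocks, and the sample CSS and topics are invented.

```python
import re


def css_class_names(css_text):
    """Collect class names that appear in the selectors of a style sheet."""
    no_comments = re.sub(r"/\*.*?\*/", "", css_text, flags=re.DOTALL)
    # Drop the declaration blocks so values such as ".5em" are not
    # mistaken for class selectors.
    selectors_only = re.sub(r"\{[^}]*\}", " ", no_comments)
    return set(re.findall(r"\.(-?[A-Za-z_][\w-]*)", selectors_only))


def used_class_names(topics):
    """Collect every class name referenced in the topics' markup."""
    used = set()
    for html in topics:
        values = re.findall(r'class\s*=\s*"([^"]*)"', html, flags=re.IGNORECASE)
        for value in values:
            used.update(value.split())
    return used


def unused_classes(css_text, topics):
    """Return the class names defined in the CSS but used in no topic."""
    return sorted(css_class_names(css_text) - used_class_names(topics))


# Hypothetical project content.
css = (
    "p.note { color: navy; }\n"
    "p.caution { color: red; }\n"
    "span.bold { font-weight: bold; }\n"
    "h1 { font-size: 150%; }\n"
)
topics = [
    '<p class="note">Save your work first.</p>',
    '<p>Press <span class="bold">Enter</span>.</p>',
]

print(unused_classes(css, topics))
```

Treat the result as a list of candidates to review, not as a delete list. Styles can also be applied by scripts or by the authoring tool's own templates, which a scan of topic files won't see.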

So the best solution here isn’t technical but managerial. Create a standard CSS for all authors to use, do one search pass through your legacy topics to clean up the style codes they contain, and prohibit ad hoc modification of the CSS after that. It’s one more argument in favor of standards.

More to come…

Thursday, November 11, 2010

W3C Mobile Web Best Practices and HAT-Based Mobile – Part 8

Another recommendation regarding formatting.

41. [MEASURES] – Do not use pixel measures and do not use absolute units in markup language attribute values and style sheet property values.

I discussed absolute vs. relative units in part 2 in relation to scrolling, but I’ll define this issue again here and expand on it.

From my consulting and training experience, I find that authors usually define the size of fonts and spacing in points, such as 10pt for Normal text. Two reasons for this:

• Many help authoring tools, like RoboHelp, use points as the default so we’ve just gotten used to it.

• For authors who come from hard-copy, especially Word, points is the default and an easily visualized unit – 1 point = 1/72”, so 12 points = 1/6”…, for example

But when we put content online, absolute units like points cause trouble for two reasons.

• Different display technologies may render points differently. For example, 10 pt is standard for Normal text on PC displays but too small on Macs because of the different technologies. So if you’re creating online material that might be read on PCs and Macs, you’d need two CSSs, one for each, with two different sizes for Normal text.

This isn’t a problem if you create online content for local use on one platform, such as HTML Help on PCs. But if you have to convert that content to a format like WebHelp that might sit on a server and be accessed by users with different platforms, like PCs and Macs, this does become a problem.

• Some browsers can’t resize point-based text when users change the text size setting.

Going relative fixes these problems easily, once you get used to thinking in this new way. Relative units replace points, pixels, inches, etc., with units like % or ems. The benefit of relative units is that they automatically adapt, under the browser’s control, to the space on a given display.

For example, a graphic that’s 180 pixels wide might be fine on a standard display but too wide for a mobile device, so the device adds a horizontal scroll bar. Horizontal scrolling isn’t evil, but it does reduce usability, especially if you have to scroll vertically and horizontally, so you want to avoid it where possible.

Setting the width to 100% fixes the problem. This says that the graphic should display at 100% of the available space and leaves it up to the browser to figure out what that is.

This approach isn’t perfect. It’s possible to apply relative size to a graphic which displays it correctly but makes it too small to read. Or it may not display correctly in certain cases, such as within table cells in ePub format running in the Adobe Digital Editions (ADE) viewer. Still trying to track this one down…

What relative units can you use? There are several, including %, ems, and exes. A % font size is relative to the font size the element inherits, which for body text is normally the browser’s Normal size at 100%. So, for example, setting h1 style to 150% says that all heading 1 text should be half again as large as Normal text on each browser. (This isn’t exactly the case, but it’s close enough to get a mental picture.) The em is equal to the font size in effect for the element; the name comes from the width of the capital M in traditional typography. The ex is similar to the em but based on the height of the lowercase x.
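To see what a points-to-percent conversion looks like in practice, here is a small Python sketch that rewrites point-based font sizes as percentages of a base size. It assumes Normal text is 10pt, as in the example earlier in this post, and it only touches font-size declarations; the function name and sample rules are made up for illustration.

```python
import re


def points_to_percent(css_text, base_pt=10.0):
    """Rewrite 'font-size: Npt' as a percentage of the base point size."""

    def convert(match):
        points = float(match.group(1))
        percent = round(100 * points / base_pt)
        return f"font-size: {percent}%"

    return re.sub(r"font-size\s*:\s*(\d+(?:\.\d+)?)pt", convert, css_text)


# Hypothetical rules from a point-based style sheet.
css = "p { font-size: 10pt; }\nh1 { font-size: 15pt; }"

relative_css = points_to_percent(css)
print(relative_css)
```

One thing to watch for after a conversion like this: percentages compound when styled elements are nested inside each other, so check nested lists and table cells in particular.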

Of these three, the most widely supported (a totally unscientific statement based solely on experience) are % and em. Some people prefer the em. Others, myself included, prefer the % because it’s easier to picture mentally. Pick the one that makes the most sense for you and is easiest to use.

Next, keeping style sheets small and using terse, efficient markup…

Wednesday, November 3, 2010

W3C Mobile Web Best Practices and HAT-Based Mobile – Part 7

Several inter-related recommendations regarding content formatting.

53. [FONTS] – Do not rely on support of font related styling.

Per the W3C – “Mobile devices often have few fonts and limited support for font sizes and effects (bold, italic, etc). …use of font size, face, or effect, for example to highlight an answer or a stressed word may not achieve the desired result.”

This seems obvious. So how does it affect tech comm?

If you’re creating “true” mobile apps, they may have little or no font related styling since you’ll be creating the apps from scratch and thus, hopefully, following best practices. But if you’re mobilizing an existing online doc or help project, it may have lots of font related styling, bold or italics, because it was created at a time when styles and style sheets didn’t have the importance they do now. Those font related styles in your material can affect your project planning, development style, and development process.

1. You have to find out if the devices you’re publishing to can support font related styling. If so, the styling becomes a best practices issue instead of an immediate programming crisis. But even if it’s not a crisis, font related styling, essentially local formatting, is still bad practice in an increasingly single sourced world and will affect your projects eventually. And if any of your target devices don’t support font related styling, you’ll have to deal with it now. To do so…

2. You have to get rid of it. This isn’t difficult but it is tedious, especially if there’s a lot to get rid of. You may have to do large-scale search and replaces in the code to replace local formatting with character styles. This works but it’s scary, even if you’ve backed up the project. More difficult still…

3. You have to stop using font related styling, possibly a major change in how you work. It’s simple mechanically – add bold and italic character styles to the project CSS. What’s hard is breaking old work habits, no longer using the text formatting toolbar in Flare or RoboHelp, for example. This can be surprisingly difficult.
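The search and replace described in step 2 might look something like the following Python sketch, which swaps plain bold and italic tags for spans that use character styles. The class names "bold" and "italic" are placeholders for whatever character styles you add to the project CSS, and the sketch deliberately ignores tags that carry attributes and formatting written as inline style attributes, which some tools produce instead.

```python
import re

# Hypothetical character style names; use the ones in your own CSS.
TAG_TO_CLASS = {"b": "bold", "i": "italic"}


def replace_local_formatting(topic_html):
    """Replace plain <b> and <i> tags with spans that use character styles."""

    def open_tag(match):
        style = TAG_TO_CLASS[match.group(1).lower()]
        return f'<span class="{style}">'

    # Opening tags first, then the matching closing tags.
    converted = re.sub(r"<(b|i)>", open_tag, topic_html, flags=re.IGNORECASE)
    return re.sub(r"</(?:b|i)>", "</span>", converted, flags=re.IGNORECASE)


# Hypothetical topic fragment with local formatting.
topic = "<p>Press <b>Enter</b> to <i>save</i>.<br>Done.</p>"

converted_topic = replace_local_formatting(topic)
print(converted_topic)
```

Run it on a copy of the project first and compare a few topics before and after. The change is mechanical, but as noted above, it's still scary on a large project.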

So if font related styling is now off the table, how do we format text?

42. [STYLE SHEETS USE] – Use style sheets to control layout and presentation, unless the device is known not to support them.

Style sheets, or CSS, have been gaining ground in tech comm for years but still aren’t used as widely or consistently as they should be. CSS styles are used on heads and body content in online help and doc, but less often for things like bulleted or numbered lists, tables, notes, cautions, etc. More rarely still are CSS styles used for text enhancement, again because the text formatting toolbar is so convenient. Some authors are also deterred by the belief that they need to create a separate CSS for each target mobile device. (This isn’t necessary. The solution is the media types feature in the W3C’s CSS spec, which MadCap Flare calls “mediums”, but this feature is still not that widely known.) There’s also uncertainty in some Word shops about the distinction between how styles work in Word vs. in HTML/XHTML.

The upshot? Any doc group that sees single sourcing to mobile in its future should plan to train all authors on the concepts and use of style sheets and new development processes so as to reduce or even eliminate local formatting.

There’s more to this issue as well, including:

• The need to organize documents so that they can be read without style sheets if a particular mobile device doesn’t support them.

• Editing style sheets to minimize their file size.

• Using relative measures in place of absolute measures like pixels and points.

To be covered in the next installment…