A recommendation regarding terse markup…
45. [MINIMIZE] – Use terse, efficient markup.
I noted in part 9 that the desktops and laptops for which we create online content have so much power that we often code verbosely and inefficiently because we can. And it’s less work. I was referring to style sheets there, but the same point is true for markup in topics.
The issue of terse, efficient markup falls into several areas. In each case, we have to ask whether the benefit is worth the effort.
Redundant White Space
One area, according to the W3C, is that “content …marked up in… XML can… be made smaller while preserving… semantics by removal of redundant white space (i.e. spaces and new lines).” It isn’t clear, but I think what the W3C is referring to is white space and new lines in code view, not WYSIWYG, but there is an efficiency issue in WYSIWYG too so I’ll look at that as well.
Code View – “Pretty Printing”
The W3C suggests avoiding “pretty printing,” the formatting of markup with indentation, because it “can generate large amounts of white space and… add to page weight.” But it also notes that “pretty printing” may be an important part of the authoring, in which case “try to arrange that redundant white space is stripped when serving a page.” It further notes that “…some network proxies strip white space that they think is redundant, [but] not all do so, so… not… to rely upon this behavior.” What is this about?
Look at a topic in code view in your authoring tool. You’ll see something like this:
45. [MINIMIZE] – Use terse, efficient markup.
I noted in part 9 that the desktops and laptops for which we create online content have so much power that we often code verbosely and inefficiently because we can. And it’s less work. I was referring to style sheets there, but the same point is true for markup in topics.
The issue of terse, efficient markup falls into several areas. In each case, we have to ask whether the benefit is worth the effort.
Redundant White Space
One area, according to the W3C, is that “content …marked up in… XML can… be made smaller while preserving… semantics by removal of redundant white space (i.e. spaces and new lines).” It isn’t clear, but I think what the W3C is referring to is white space and new lines in code view, not WYSIWYG, but there is an efficiency issue in WYSIWYG too so I’ll look at that as well.
Code View – “Pretty Printing”
The W3C suggests avoiding “pretty printing,” the formatting of markup with indentation, because it “can generate large amounts of white space and… add to page weight.” But it also notes that “pretty printing” may be an important part of the authoring, in which case “try to arrange that redundant white space is stripped when serving a page.” It further notes that “…some network proxies strip white space that they think is redundant, [but] not all do so, so… not… to rely upon this behavior.” What is this about?
Look at a topic in code view in your authoring tool. You’ll see something like this:
Note the indentation and new lines, all of which add space. For example, the code above uses 194 bytes. Collapsing the indents to look like this:
reduces the size to 184 bytes, a 5.1% drop. And this:
reduces the size to 184 bytes, a 5.1% drop. And this:
reduces it to 172 bytes. So eliminating the indentation and white space reduces the size of the file from 194 bytes, or 11.3%. (Your numbers will differ.) This may seem trivial, but smaller files reduce network traffic load. However, the increased terseness and efficiency of the markup also makes the code less readable. The first example is easier to read than the second and certainly the third.
This may force you to weigh the need to work at the code level versus code efficiency. If code efficiency is most important, you’d buy the most efficient authoring tool you can and make it a policy not to work in code except in unusual cases. Conversely, if you want to be able to work in code, you may go with a tool that creates less efficient markup. Talk to Engineering or IT about this.
WYSIWYG View
HTML and XHTML collapse multiple character spaces, so we can’t use multiple spaces to indent text. However, one thing we often do is add a line space between paragraphs, a bad habit left over from our word-processing days, rather than adding a “space after” or bottom-margin property to Normal and other body-content related styles. Does doing so reduce file size? Based on a rough test, yes.
I created a topic with five lines of text, effectively paragraphs, in Normal style, with one line space after each paragraph and a bottom margin of 0 in Normal style. The file size was 630 bytes. When I set the bottom margin to 1.12 em in Normal style and deleted the line spaces, the file size dropped to 538 bytes, a 14.6% drop. (Again, your numbers will differ.)
I always suggest that clients and attendees in training classes set paragraph spacing using a top and bottom margin setting for the Normal style in the CSS simply because the more we control through the CSS rather than by local formatting, the more future-proofed our content is. A file size reduction of this magnitude is just one more reason to get rid of hard line spaces between paragraphs.
Excess Code
We may also find that our topics contain additional, proprietary codes. Some are inserted by our help authoring tool to add custom features. Others may appear in topics created by importing Word files into the help authoring tool. These extra codes increase the file size. It is worth trying to get rid of them?
If we plan to keep our current authoring tool, then any proprietary code can probably stay in the file, even if the code is inelegant. However, this can be risky if we’ve been using a particular tool for years and take the proprietary codes for granted, only to learn that they don’t convert well when we change tools or formats and must be fixed by hand. Because of that, it’s a good idea to periodically review a sample of topic files created by our tools and check the codes to see what’s standard and non-standard.
A similar problem occurs in topics created by importing Word files. The files often have Word-specific code that may not harm the topics but do increase their file sizes. And it is possible that those Word-specific codes might cause trouble during some conversion in the future. So, again, it’s a good idea to periodically review a sample of topics created from Word files and check the codes to see what’s standard and non-standard. I don’t know of any specific references for this, but I recently read a book called ePub Straight to the Point by Elizabeth Castro, published by Peachpit Press in 2011, that discusses how to convert Word files to ePub format. In the discussion, Castro explains what many of the Word codes are and how to delete them. I don’t think it’s the complete answer but it is worth a look.
This may force you to weigh the need to work at the code level versus code efficiency. If code efficiency is most important, you’d buy the most efficient authoring tool you can and make it a policy not to work in code except in unusual cases. Conversely, if you want to be able to work in code, you may go with a tool that creates less efficient markup. Talk to Engineering or IT about this.
WYSIWYG View
HTML and XHTML collapse multiple character spaces, so we can’t use multiple spaces to indent text. However, one thing we often do is add a line space between paragraphs, a bad habit left over from our word-processing days, rather than adding a “space after” or bottom-margin property to Normal and other body-content related styles. Does doing so reduce file size? Based on a rough test, yes.
I created a topic with five lines of text, effectively paragraphs, in Normal style, with one line space after each paragraph and a bottom margin of 0 in Normal style. The file size was 630 bytes. When I set the bottom margin to 1.12 em in Normal style and deleted the line spaces, the file size dropped to 538 bytes, a 14.6% drop. (Again, your numbers will differ.)
I always suggest that clients and attendees in training classes set paragraph spacing using a top and bottom margin setting for the Normal style in the CSS simply because the more we control through the CSS rather than by local formatting, the more future-proofed our content is. A file size reduction of this magnitude is just one more reason to get rid of hard line spaces between paragraphs.
Excess Code
We may also find that our topics contain additional, proprietary codes. Some are inserted by our help authoring tool to add custom features. Others may appear in topics created by importing Word files into the help authoring tool. These extra codes increase the file size. It is worth trying to get rid of them?
If we plan to keep our current authoring tool, then any proprietary code can probably stay in the file, even if the code is inelegant. However, this can be risky if we’ve been using a particular tool for years and take the proprietary codes for granted, only to learn that they don’t convert well when we change tools or formats and must be fixed by hand. Because of that, it’s a good idea to periodically review a sample of topic files created by our tools and check the codes to see what’s standard and non-standard.
A similar problem occurs in topics created by importing Word files. The files often have Word-specific code that may not harm the topics but do increase their file sizes. And it is possible that those Word-specific codes might cause trouble during some conversion in the future. So, again, it’s a good idea to periodically review a sample of topics created from Word files and check the codes to see what’s standard and non-standard. I don’t know of any specific references for this, but I recently read a book called ePub Straight to the Point by Elizabeth Castro, published by Peachpit Press in 2011, that discusses how to convert Word files to ePub format. In the discussion, Castro explains what many of the Word codes are and how to delete them. I don’t think it’s the complete answer but it is worth a look.
The big problem in deleting excess, proprietary code added by an authoring tool or Word is simply the greater knowledge required to do so and the risk of breaking something. So while eliminating these codes may increase terseness and efficiency and reduce file size, it seems like the most difficult part of the job and one to be approached very carefully and after some experimentation.
More to come…
No comments:
Post a Comment