HTML: The Definitive Guide


3. Anatomy of an HTML Document

Contents:
Appearances Can Deceive
Structure of an HTML Document
HTML Tags
Document Content
HTML Document Elements
The Document Header
The Document Body

HTML documents are very simple, and writing one shouldn't intimidate even the most timid of computer users. Although you might use a fancy WYSIWYG editor to help you compose it, an HTML document is ultimately stored, distributed, and read by a browser as a simple ASCII text file.[1] That's why even a user with nothing more than a barebones text editor can compose the most elaborate of HTML pages. (Accomplished webmasters often elicit the admiration of HTML "newbies" by composing astonishingly cool pages with the crudest text editor on a cheap laptop computer, and by doing so in odd places, like on a bus or in the bathroom.) HTML writers should, however, keep several of the popular browsers on hand and alternate among them to view new documents under construction. Remember, browsers differ in how they display a page; not all browsers implement all of the HTML standards; and some have their own special extensions to the language.

[1] Informally, both the text and the markup tags in an HTML document are ASCII characters. Technically, unless you specify otherwise, text and tags are made up of eight-bit characters as defined in the standard ISO-8859-1 Latin character set. The HTML standard does support alternative character encodings, including those for Arabic and Cyrillic. See Appendix E, Character Entities, for details.

3.1 Appearances Can Deceive

An HTML document never looks the same in a text editor as it does in an HTML browser. Just take a look at the source of any HTML document on the World Wide Web. At the very least, return characters, tabs, and leading spaces, although important for the readability of the source text, are for the most part ignored when the document is displayed. An HTML source document also contains a lot of extra text, mostly display tags, interactivity markers, and their parameters, which affect portions of the document but don't themselves appear in the display.
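As a small illustration of the point above, consider the following source fragment. The text and title are made up for this example, and the exact rendering will vary somewhat from browser to browser:

```html
<html>
<head>
<title>Whitespace Example</title>
</head>
<body>
<p>
    This    sentence is
        spread over several
    oddly indented lines.
</p>
<p>Some of this text is <b>bold</b>.</p>
</body>
</html>
```

A typical browser should display the first paragraph as ordinary running text, with each run of spaces, tabs, and line breaks collapsed into a single space. The <b> tags themselves never appear in the display; only their effect on the word "bold" does.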

Accordingly, new HTML authors must develop not only a presentation style for their HTML pages, but also a separate style for their HTML source text. The source document's layout should highlight the programming-like markup aspects of HTML, not its display aspects. And it should be readable not only by you, the author, but by others as well.

Experienced HTML document writers typically adopt a programming-like style, albeit very relaxed, for their source HTML text. We do the same throughout this book, and that style will become apparent as you compare our source HTML examples with the actual display of the document by a browser.

Our formatting style is simple, but serves to create readable, easily maintained documents:

The task of maintaining the indentation of your source HTML ranges from trivial to onerous. Some text editors, like Emacs, manage the indentation automatically; others, like common word processors, couldn't care less about indentation and leave the task completely up to you. If your editor makes your life difficult, you might consider striking a compromise, perhaps by indenting the tags to show structure, but leaving the actual text without indentation to make modifications easier.
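One way such a compromise might look is sketched below. The list and its text are invented for illustration, and this is only one possible layout, not a rule: the tags are indented to show how they nest, while the text itself starts in the first column.

```html
<body>
  <ul>
    <li>
First item. The text starts in the first column, so it can be
edited or rewrapped without disturbing any indentation.
    </li>
    <li>
Second item.
    </li>
  </ul>
</body>
```

Because the browser ignores the leading spaces and line breaks, both layouts display identically; the choice affects only how easy the source is to read and edit.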

No matter what compromises or stands you make on source code style, however, it's important that you adopt one. You'll be very glad you did when you go back to that HTML document you wrote three months ago searching for that really cool trick you did with. . . . Now, where was that?

