For the most part, HTML document tags are simple to understand and use, since they are made up of common words, abbreviations, and notations. For instance, the <i> and </i> tags tell the browser to respectively start and stop italicizing the text characters that come between them. Accordingly, the syllable "simp" in our barebones HTML example would appear italicized on a browser display.
The HTML standard and its various extensions define how and where you place tags within a document. Let's take a closer look at that syntactic sugar that holds together all HTML documents.
tag syntax
Every HTML tag consists of a tag name, sometimes followed by an optional list of tag attributes, all placed between opening and closing brackets (< and >). The simplest tag is nothing more than a name appropriately enclosed in brackets, such as <head> and <i>. More complicated tags contain one or more attributes, which specify or modify the behavior of the tag.
Tag and attribute names are not case-sensitive. There's no difference in effect between <head>, <Head>, <HEAD>, or even <HeaD>; they are all equivalent. The values that you assign to a particular attribute may be case-sensitive, however, depending on your browser and server. In particular, treat file location and name references--uniform resource locators (URLs)--as case-sensitive. [the section called "Referencing Documents: The URL"]
Tag attributes, if any, belong after the tag name, each separated by one or more tab, space, or return characters. Their order of appearance is not important.
A tag attribute's value, if any, follows an equal sign (=) after the attribute name. You may include spaces around the equal sign, so that width=6, width = 6, width =6, and width= 6 all mean the same. For readability, however, we prefer not to include spaces. That way, it's easier to pick out an attribute/value pair from a crowd of pairs in a lengthy tag.
If an attribute's value is a single word or number (no spaces), you may simply add it after the equal sign. All other values should be enclosed in single or double quotation marks, especially those values that contain several words separated by spaces. The length of the value is limited to 1024 characters.
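As a rough sketch of these quoting rules, the fragment below contrasts unquoted and quoted values. The file name logo.gif and the attribute values are made up for illustration and do not come from any real document:

```html
<!-- Single-word or numeric values may go unquoted -->
<table border=1 width=300>

<!-- Values that contain spaces need single or double quotation marks -->
<img src="logo.gif" alt="Company logo">
<img src="logo.gif" alt='Company logo'>

<!-- Quoting a single-word value is also acceptable -->
<table border="1" width="300">
```

Quoting every value, even the single-word ones, is a reasonable habit, since it means you never have to decide whether a particular value needs the quotation marks.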
Most browsers are tolerant of how tags are punctuated and broken across lines. Nonetheless, avoid breaking tags across lines in your source document whenever possible. This rule promotes readability and reduces potential errors in your HTML documents.
Here are some tags with attributes:
<a href="http://www.ora.com/catalog.html">
<ul compact>
<input name=filename size=24 maxlength=80>
<link title="Table of Contents">
The first example is the <a> tag for a hyperlink to O'Reilly & Associates' World Wide Web-based catalog of products. It has a single attribute, href, followed by the catalog's address in cyberspace--its URL.
The second example shows a tag that formats text into an unordered list of items. Its single attribute--compact, which limits the space between list items--does not require a value.
The third example shows a tag with multiple attributes, each with a value that does not require enclosing quotation marks.
The last example shows proper use of enclosing quotation marks when the attribute value is more than one word long.
Finally, what is not immediately evident in these examples is that while attribute names are not case-sensitive (href works the same as HREF and HreF), most attribute values are case-sensitive. The value filename for the name attribute in the <input> tag example is not the same as the value Filename, for instance.
We alluded earlier to the fact that most HTML tags have a beginning and an end and affect the portion of text between them. That enclosed text segment may be large or small, from a single text character, syllable, or word, such as the italicized "simp" syllable in our barebones example, to the <html> tag that bounds the entire document. The starting component of any tag is the tag name and its attributes, if any. The corresponding ending tag is the tag name alone, preceded by a slash. Ending tags have no attributes.
Tags can be put inside the affected segment of another tag (nested) for multiple tag effects on a single segment of the HTML document. For example, a portion of the following text is both bold and included as part of an anchor defined by the <a> tag:
<body>
This is some text in the body, with a
<a href="another_doc.html">link, a portion of which
is <b>set in bold</b></a>
</body>
According to the HTML standard, you must end nested tags starting with the most recent one and work your way back out. For instance, in the example, we end the bold tag (</b>) before ending the link tag (</a>), since we started them in the reverse order: the <a> tag first, then the <b> tag. It's a good idea to follow that standard, even though most browsers don't absolutely insist that you do so. You may get away with violating this nesting rule for one browser, sometimes even with all current browsers. But eventually a new browser version won't allow the violation, and you'll be hard-pressed to straighten out your source HTML document.
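To make the nesting rule concrete, here is a short sketch that reuses the link from the example, first nested properly and then improperly. The link text is invented for illustration:

```html
<!-- Properly nested: the bold tag was opened last, so it is ended first -->
<a href="another_doc.html">a link with <b>bold text</b></a>

<!-- Improperly nested: the link is ended before the bold tag; avoid this -->
<a href="another_doc.html">a link with <b>bold text</a></b>
```

Many browsers may display both lines the same way, but only the first follows the standard.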
According to the HTML standard, only a few tags do not have an ending tag. For example, the <br> tag causes a line break; it has no effect otherwise on the subsequent portion of the document and, hence, does not need an ending tag.
The standard HTML tags that do not have corresponding ending tags are:
<area>
<base>
<basefont>
<br>
<hr>
<img>
<input>
<isindex>
<link>
<meta>
<nextid>
<option>
<param>
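As a brief illustration, the fragment below uses three of the tags from this list; none of them has an ending tag. The file name photo.gif and the text are hypothetical:

```html
Line one of an address<br>
Line two of the address
<hr>
<img src="photo.gif" alt="A sample photo">
```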
You often see documents in which the author seemingly has forgotten to include many ending tags, in apparent violation of the HTML standard. But your browser doesn't complain, and the documents display just fine. What gives? The HTML standard lets you omit certain tags or their endings for clarity and ease of preparation. The HTML standard writers didn't intend the language to be tedious.
For example, the <p> tag that defines the start of a paragraph has a corresponding end tag </p>, but the </p> ending tag rarely is used. In fact, many HTML authors don't even know it exists! [the section called "The <p> Tag"]
Rather, the HTML standard lets you omit a starting tag or ending tag whenever it can be unambiguously inferred by the surrounding context. Many browsers make good guesses when confronted with missing tags, leading the document author to assume that a valid omission was made. When in doubt, add the ending tag: it'll make life easier for yourself, the browser, and anyone else who might need to modify your document in the future.
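The following sketch shows the same content twice, first with the optional ending tags left out and then with them written in full. It assumes the paragraph and list-item tags, whose ending tags can be inferred from the start of the next paragraph or item; the text itself is invented:

```html
<!-- Ending tags omitted: each new paragraph or list item implies
     the end of the previous one -->
<p>First paragraph.
<p>Second paragraph.
<ul>
<li>First item
<li>Second item
</ul>

<!-- The same content with every ending tag written out -->
<p>First paragraph.</p>
<p>Second paragraph.</p>
<ul>
<li>First item</li>
<li>Second item</li>
</ul>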
Browsers sometimes ignore tags. This usually happens with redundant tags whose effects merely cancel or substitute for themselves. The best example is a series of <p> tags, one after the other with no intervening text. Unlike the similar series of repeating return characters in a text-processing document, most browsers skip to a new line only once. The extra <p> tags are redundant and usually ignored by the browser.
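For instance, in a fragment like the following sketch (the text is invented), most browsers would be expected to treat the three empty paragraph tags as redundant and show only a single paragraph break between the two lines of text:

```html
<p>First paragraph.
<p>
<p>
<p>
<p>Second paragraph.
```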
In addition, most browsers ignore any tag that they don't understand or that was incorrectly specified by the document author. Browsers habitually forge ahead and make some sense of a document, no matter how badly formed and error-ridden it may be. This isn't just a tactic to overcome errors; it's also an important strategy for extensibility. Imagine how much harder it would be to add new features to the language if the existing base of browsers choked on them.
The thing to watch out for with nonstandard tags that aren't supported by most browsers is their enclosed contents, if any. Browsers that recognize the new tag may process those contents differently than those that don't support the new tag. For example, Internet Explorer supports a <comment> tag whose contents serve to document the source HTML and are not intended to be viewed by the user. However, no other browser recognizes the <comment> tag, so each of them renders its contents on the user's screen, defeating the tag's purpose and ruining the document's appearance. [the section called "Character Entities"]