Tags, Attributes and Elements
Introduction
An HTML document is a representation of a hierarchy of nodes. There are several different types of node, the most common of which are text nodes and element nodes.
A text node simply contains text. An element node carries additional information, describing structure, semantics or a relationship with another resource.
For example, the following HTML describes a paragraph containing some text and a link, and that link contains some more text.
<p>The quick brown <a href="fox.html">fox</a> jumped over the lazy dog.</p>
[ Diagram - nested nodes ]
Elements
In the above example there are two elements, a p (paragraph) element, and an a (anchor) element. The element includes everything inside it, so the text node 'fox' is considered part of the a element (and also part of the p element).
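The nesting described above can be made visible with a short script. This is only a sketch: it uses Python's standard html.parser module, and the names NodeOutline and outline are made up for this illustration. It prints one line per node, indented by how deeply the node is nested.

```python
from html.parser import HTMLParser


class NodeOutline(HTMLParser):
    """Collect an indented outline of element and text nodes."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many elements are currently open
        self.lines = []  # one entry per node, in document order

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + "element: " + tag)
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1

    def handle_data(self, data):
        self.lines.append("  " * self.depth + "text: " + repr(data))


def outline(source):
    """Return the outline of `source` as a list of lines."""
    parser = NodeOutline()
    parser.feed(source)
    parser.close()
    return parser.lines


example = '<p>The quick brown <a href="fox.html">fox</a> jumped over the lazy dog.</p>'
print("\n".join(outline(example)))
```

The text node 'fox' is indented beneath both the a element and the p element, which matches the statement that it is part of both.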
Tags
Tags are how the beginning and end of elements are represented in the HTML source of a document. Elements generally have both a start tag and an end tag.
<p> - start tag for a paragraph
</p> - end tag for a paragraph
A start tag consists of an opening angle bracket (or less than sign), followed by the element type, followed by any attributes (which I will discuss shortly), and then a closing angle bracket (or greater than sign).
An end tag consists of an opening angle bracket, a forward slash character (/), the element type and a closing angle bracket.
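The two descriptions above translate almost word for word into code. The following is a minimal sketch in Python; the function names start_tag and end_tag are invented for this example, and attribute values are always written in double quotes, which is one of several forms HTML allows.

```python
from html import escape


def start_tag(element_type, attributes=None):
    """Build a start tag: '<', the element type, any attributes, '>'."""
    parts = [element_type]
    for name, value in (attributes or {}).items():
        # Escape characters that would otherwise end the quoted value.
        parts.append(f'{name}="{escape(value, quote=True)}"')
    return "<" + " ".join(parts) + ">"


def end_tag(element_type):
    """Build an end tag: '<', '/', the element type, '>'."""
    return "</" + element_type + ">"


print(start_tag("a", {"href": "fox.html"}) + "fox" + end_tag("a"))
```

Run as written, this reproduces the link from the earlier example.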
Optional and forbidden tags
Optional tags
Sometimes the structure of a document allows the presence of a tag to be inferred. For example, it is not possible to have one paragraph contained inside another paragraph, so, logically, if a new paragraph starts, then the old paragraph must have finished.
HTML allows for this shorthand in a number of circumstances, by making certain tags optional.
Box: This feature may cause problems
While some tags are optional, including them explicitly is less likely to trigger browser bugs, makes debugging less work, and can make reading source code easier. I recommend always including them.
It is still worth understanding where tags are optional, as this helps when interpreting errors reported during validation.
If we look at the HTML specification for the p element we see:
<!ELEMENT P - O (%inline;)* -- paragraph -->
This is part of the DTD (a type of machine readable document that describes which elements are allowed where). If we break the line down into its component parts we learn that:
<! this is an SGML instruction
ELEMENT an element is being defined
P the element type is P
- the start tag is required
O the end tag is optional
(%inline;)* the element can contain only 'inline' nodes, but any number of them (including none)
-- paragraph -- this is a comment for humans to read
> end of instruction
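The same breakdown can be done mechanically. The sketch below is an illustration only: the regular expression and the name describe_element are assumptions made for this example, and it only copes with single-line declarations shaped like the ones quoted in this article, not with the DTD in general.

```python
import re

# Matches single-line declarations of the shape used in this article;
# this is an assumption for illustration, not a general SGML parser.
ELEMENT_DECLARATION = re.compile(
    r"<!ELEMENT\s+(?P<type>\S+)"           # <!ELEMENT and the element type
    r"\s+(?P<start>[-O])\s+(?P<end>[-O])"  # start tag rule, end tag rule
    r"\s+(?P<content>.*?)"                 # what the element may contain
    r"\s*(?:--\s*(?P<comment>.*?)\s*--)?\s*>"  # optional comment, then >
)

# '-' means the tag is required, 'O' means it may be omitted.
TAG_RULES = {"-": "required", "O": "optional"}


def describe_element(declaration):
    """Split an ELEMENT declaration into its named component parts."""
    match = ELEMENT_DECLARATION.fullmatch(declaration.strip())
    if match is None:
        raise ValueError("unrecognised declaration: " + declaration)
    return {
        "type": match["type"],
        "start_tag": TAG_RULES[match["start"]],
        "end_tag": TAG_RULES[match["end"]],
        "content": match["content"],
        "comment": match["comment"],
    }


print(describe_element("<!ELEMENT P - O (%inline;)* -- paragraph -->"))
```

For the P declaration this reports a required start tag and an optional end tag, as in the list above.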
So the end tag for a P element is optional. This makes the following two blocks of HTML equivalent.
<p>The quick brown fox.</p>
<p>The lazy dog.</p>
<p>The quick brown fox.
<p>The lazy dog.
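To see how an end tag can be inferred, here is a small sketch in Python built on the standard html.parser module. The class ParagraphCollector and the function paragraphs are hypothetical names for this example. It applies only the one rule described above (a new p start tag ends any open paragraph, as does the end of the input); real browsers follow a considerably longer set of rules.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text of each p element, inferring omitted </p> tags."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self.current = None  # text pieces of the open paragraph, or None

    def _close_paragraph(self):
        if self.current is not None:
            self.paragraphs.append("".join(self.current).strip())
            self.current = None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # A p cannot contain another p, so a new start tag
            # implies that any open paragraph has ended.
            self._close_paragraph()
            self.current = []

    def handle_endtag(self, tag):
        if tag == "p":
            self._close_paragraph()

    def handle_data(self, data):
        if self.current is not None:
            self.current.append(data)

    def close(self):
        super().close()
        # The end of the input also ends a paragraph that is still open.
        self._close_paragraph()


def paragraphs(source):
    """Return the text of every paragraph found in `source`."""
    parser = ParagraphCollector()
    parser.feed(source)
    parser.close()
    return parser.paragraphs


explicit = "<p>The quick brown fox.</p>\n<p>The lazy dog.</p>"
implied = "<p>The quick brown fox.\n<p>The lazy dog."
print(paragraphs(explicit) == paragraphs(implied))
```

Both blocks yield the same two paragraphs, which is what it means for them to be equivalent.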
Now, one of the components of the line taken from the DTD stated that the start tag was required. You've probably inferred that this means the start tag for some elements is optional, and you would be right.
<!ELEMENT HEAD O O (%head.content;) +(%head.misc;) -- document head -->
Let us take the HEAD element as an example. This element can appear only in one place in the document - immediately after the start of the HTML element. As a result, it is possible to infer the presence of the HEAD element based on the presence of the HTML element.
Box: XHTML is different
XHTML does not have optional start or end tags. All elements must be added to the document explicitly.
Forbidden tags
Some elements in HTML cannot contain any content (neither elements nor text nodes). They are always empty, and for these the end tag is explicitly forbidden.
<!ELEMENT HR - O EMPTY -- horizontal rule -->
You can see the EMPTY keyword in the line from the DTD. In this case the syntax is simply:
<hr>
Box: XHTML is different
XHTML does not have forbidden tags. All elements must be opened and closed explicitly.
If you are following Appendix C of XHTML (which you should do if you are serving your XHTML with a text/html content-type), then elements with forbidden end tags should use the XHTML-specific shorthand of a slash just before the closing angle bracket of the start tag instead of a normal end tag. This slash should be separated from the element type and attributes with a space.
<hr />
For XHTML documents that are not HTML compatible the normal closing tag syntax is fine:
<hr></hr>
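The three ways of writing an hr element shown above can be told apart by a parser. The sketch below uses Python's standard html.parser module; the class TagEvents and the function events are names made up for this example. Note that this module is a tolerant tokenizer, not a validator: it reports what it sees and has no knowledge of which syntax is permitted in HTML or XHTML.

```python
from html.parser import HTMLParser


class TagEvents(HTMLParser):
    """Record which kind of tag the parser reports for each element."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_startendtag(self, tag, attrs):
        # Called for the XHTML-style shorthand ending in "/>".
        self.events.append(("empty", tag))


def events(source):
    """Return the list of tag events reported for `source`."""
    parser = TagEvents()
    parser.feed(source)
    parser.close()
    return parser.events


for source in ("<hr>", "<hr />", "<hr></hr>"):
    print(source, events(source))
```

The HTML form produces a lone start tag, the Appendix C shorthand produces a single combined event, and the long XHTML form produces a start tag followed by an end tag.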
Quick reference
The index of elements (http://www.w3.org/TR/html4/index/elements.html) in the HTML 4.01 Recommendation can act as a quick reference to identify which elements have optional or forbidden tags.