Quick XHTML

So You Wanna Be a Boxer?

You want to use XHTML. That, my friend, isn’t as easy as the W3C, the Web 2.0 crowd, and every second-rate web-whacker this side of Carthage want you to believe. There are several drawbacks, and few advantages. Let’s get to it.

Question: “Why do I want to use XHTML?” There are no stock answers, but here are a few we have run across before:

“It is stricter than HTML” – This is a common misunderstanding. Because XHTML is an XML-based language, as opposed to HTML, which is an application of SGML, different rules apply; for the most part, however, they make no difference from an author’s point of view. (footnote 1)
“It is more semantic than HTML” – Also a misunderstanding. No elements were added between HTML 4.01 and XHTML 1.1 (footnote 2), nor will any ever be. Only by adding parts of other markup languages through the namespaces mechanism can XHTML 1.0 or 1.1 have extended semantics – but there is a but.
“With XHTML I can extend the language through the use of namespaces” – You could, in theory, do that. However, it requires not only that the browser parse the XHTML document as XML, but also that it understand namespaces and be able to create a coherent whole out of the various languages involved. Semantically, no browser can do that today. (footnote 3)
“XHTML is compatible with HTML” – This may be the greatest misunderstanding of all – no, it isn’t. Only by telling browsers that the page is actually HTML will a majority of today’s browsers handle the content, and then only because they have gotten very good at handling incorrectly written markup.
“I use it to be forwards compatible.” – The likelihood of any current, or future, web browser actually giving up support for HTML is rather on the low side. Do you really believe that, tomorrow, a user-agent won’t handle today’s HTML?

Remember, regardless of your arguments, one thing remains fact: a majority of web browsers do not, nor will they for the next several years, support XHTML, while it is highly unlikely that any of them will ever cease supporting HTML.

I don’t remember names …

Here is the list of things you must remember:

Always, without fail, write element and attribute names in lower case. There is no onLoad, only onload; no P, only p. This way, or the highway.
Always, without fail, close all elements. Empty elements, which have no end tag, should be written as follows and in no other way:
- <area … />
- <base … />
- <br … />
- <col … />
- <hr … />
- <img … />
- <input … />
- <link … />
- <meta … />
- <param … />
Configure your text editor to save using UTF-8 encoding. Configure your web server to send charset=utf-8 when it serves pages. Use UTF-8 throughout the process; it will cut down on your problems now, and later.
Don’t include the XML declaration – that is, make certain your document does not start with <?xml version="1.0" encoding="utf-8"?>.
Use both the lang and xml:lang attributes to specify the natural language used in the document – for instance: <html lang="en-GB" xml:lang="en-GB">.
Move your scripts and your styles out into external files, and reference them using <link … href="…" … /> and <script … src="…" …></script>. Don’t use the historical “comment” hacks to hide them, as these can have peculiar effects indeed in XHTML: parsed as XML, anything between <!-- and --> really is a comment, so the script or style sheet it wraps is simply ignored.
Always use a DOCTYPE. Always. Select the one which best fits the markup language you have written. Don’t select it based on whether or not it will change a particular browser’s rendering mode. You have, in effect, two choices: XHTML 1.0 Strict or XHTML 1.1. (footnote 4)
Configure your web server to send XHTML documents with the content-type text/html. This way the browser will always parse the content using the legacy HTML parser. Sounds illogical? Yes, and it is. However, if you want your XML to be parsed as XML and set the application/xhtml+xml content-type, a majority of browsers – including Internet Explorer up to and through version 7 – will prompt a download, as they don’t understand the type. (footnote 5)
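Several of the rules above are mechanical enough to check by machine. What follows is a minimal sketch, assuming Python’s standard xml.etree.ElementTree (a non-validating parser) and nothing else; the names check_xhtml and local_name are our own, hypothetical, and only a handful of the rules are covered: no XML declaration, well-formedness (which catches unclosed elements), lower-case names, and lang plus xml:lang on the root element. It never reads the DTD, so it is no substitute for a validator.

```python
import xml.etree.ElementTree as ET

# Sketch only: ElementTree stands in for a real validator here.
# ElementTree reports xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def local_name(name):
    """Drop a '{namespace-uri}' prefix, if any, from a tag or attribute name."""
    return name.rsplit("}", 1)[-1]


def check_xhtml(source):
    """Return a list of broken rules; an empty list means none were found."""
    problems = []

    # Rule: no XML declaration at the top of the document.
    if source.lstrip().startswith("<?xml"):
        problems.append("document starts with an XML declaration")

    # Rule: close all elements. A well-formedness error stops the parser.
    try:
        root = ET.fromstring(source)
    except ET.ParseError as err:
        problems.append(f"not well-formed: {err}")
        return problems

    # Rule: element and attribute names in lower case.
    for elem in root.iter():
        tag = local_name(elem.tag)
        if tag != tag.lower():
            problems.append(f"element name not in lower case: {tag}")
        for attr in elem.attrib:
            name = local_name(attr)
            if name != name.lower():
                problems.append(f"attribute name not in lower case: {name}")

    # Rule: both lang and xml:lang on the root element.
    if "lang" not in root.attrib or XML_LANG not in root.attrib:
        problems.append("root element lacks lang and/or xml:lang")

    return problems


if __name__ == "__main__":
    page = ('<html xmlns="http://www.w3.org/1999/xhtml" lang="en-GB" xml:lang="en-GB">'
            '<head><title>Test</title></head>'
            '<body onLoad="init()"><P>Hi<br /></P></body></html>')
    for problem in check_xhtml(page):
        print(problem)
```

An unclosed <br> shows up as a “not well-formed” entry, while <br /> passes; an onLoad or a P is reported by name.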
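The “peculiar effects” of the comment hack are easy to demonstrate. In the sketch below, Python’s xml.etree.ElementTree is our stand-in for a browser’s XML parser (an assumption; browsers use their own): a script hidden inside <!-- … --> is parsed as a comment, and the script element ends up with no content at all. An HTML parser, by contrast, treats the content of script as raw text, which is why the hack ever worked there.

```python
import xml.etree.ElementTree as ET

# Sketch only: ElementTree plays the part of a browser's XML parser.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# A page using the historical comment hack to "hide" its script.
page = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Comment hack</title>
<script type="text/javascript"><!--
document.title = "Changed by script";
//--></script>
</head>
<body><p>Hello</p></body>
</html>"""

root = ET.fromstring(page)
script = root.find(f".//{XHTML_NS}script")

# To an XML parser the script body is a comment, and comments are not
# content: the script element is left with no text at all.
print(repr(script.text))  # None
```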

At this point in the narrative you might have begun wondering what the point of using XHTML is if, in order for it to be understood at all, you must pretend that it is oddly coded HTML, thereby losing out on any benefits XML might give you. This is an excellent question, and with it in mind we suggest you go back and ask yourself, yet again: why XHTML?

I’m Feeling Fine

Study the HTTP Accept header, and study it well. Not only will it, under ideal circumstances, tell you which markup languages the user-agent supports, but also to what degree it prefers one over another. (footnote 6)
Decide how you want to store your information. There are several options, of which some even make sense:
- As XHTML 1.0 or 1.1. If a browser requires HTML, you simply transform the structure, using either XSLT or a similar method. No information – save any Ruby markup – is lost.
- As XHTML 2.0. The same applies, but here you’ll lose information, since several constructs in the 2.0 version cannot be transformed to equivalent structures in XHTML 1.0 or 1.1, or in any version of HTML.
- As a more expressive, public or private, XML- or SGML-based language. The same effect as for XHTML 2.0 – lost information – applies.
- In a database. From there you can produce whichever markup language is desired, provided, of course, that it can express the structures you need.
And, finally, make sure you have a little caching turned on. After all, you are now dynamically changing the structure of documents. The question, of course, remains: why are you spending all these resources just to support XHTML?

Yes. You could just change the content-type to text/html and leave the error handling to the browser. With such a suggested “solution” in mind, please re-read the argument above regarding how XHTML is stricter than HTML …
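Studying the Accept header properly means honouring its q-values (RFC 2616, section 14.1): each media range carries a quality between 0 and 1, defaulting to 1, and when several ranges match a type, the most specific one decides that type’s quality. Below is a minimal sketch in Python; the function names are hypothetical, it ignores media-type parameters other than q, and it answers with charset=utf-8 in line with the advice to use UTF-8 throughout.

```python
from typing import List, Optional, Tuple

# Sketch only: a simplified reading of RFC 2616, section 14.1.


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media range, q-value) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # the default quality is 1
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranges.append((media_range, q))
    return ranges


def quality(media_type: str, ranges: List[Tuple[str, float]]) -> float:
    """Quality of media_type: the most specific matching range wins."""
    main_type = media_type.split("/", 1)[0]
    best_rank, best_q = -1, 0.0
    for media_range, q in ranges:
        if media_range == media_type:
            rank = 2
        elif media_range == main_type + "/*":
            rank = 1
        elif media_range == "*/*":
            rank = 0
        else:
            continue
        if rank > best_rank:
            best_rank, best_q = rank, q
    return best_q


def choose_content_type(accept: Optional[str]) -> str:
    """Pick application/xhtml+xml only when it is strictly preferred."""
    if not accept:
        # No header at all: text/html is the safe choice.
        return "text/html; charset=utf-8"
    ranges = parse_accept(accept)
    q_xhtml = quality("application/xhtml+xml", ranges)
    q_html = quality("text/html", ranges)
    if q_xhtml > q_html:
        return "application/xhtml+xml; charset=utf-8"
    return "text/html; charset=utf-8"
```

Ties go to text/html on purpose: a browser whose Accept header lists only a few image types and a bare */* rates application/xhtml+xml and text/html equally, and sending it the former is what prompts the download. If you do negotiate, also send Vary: Accept, so that the caching mentioned above does not hand one browser’s response to another.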
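Storing XHTML 1.0 and transforming it for browsers that want HTML need not involve XSLT. The sketch below uses a “similar method” instead: Python’s xml.etree.ElementTree, which can serialize a tree as HTML. It strips the XHTML namespace, turns xml:lang into lang, and emits an HTML 4.01 Strict DOCTYPE (our choice of target); the function name is hypothetical, the input must be well-formed, and no attempt is made to deal with Ruby markup or other XHTML 1.1-only constructs.

```python
import xml.etree.ElementTree as ET

# Sketch only: a stand-in for an XSLT transformation.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
HTML401_STRICT = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
                  '"http://www.w3.org/TR/html4/strict.dtd">')


def xhtml_to_html(source: str) -> str:
    """Re-serialize a well-formed XHTML 1.0 document as HTML 4.01 Strict."""
    root = ET.fromstring(source)
    for elem in root.iter():
        # '{http://www.w3.org/1999/xhtml}p' becomes plain 'p'.
        elem.tag = elem.tag.rsplit("}", 1)[-1]
        for name in list(elem.attrib):
            if name.startswith("{"):
                # Namespaced attributes have no place in HTML; keep the
                # language, though, if only xml:lang was given.
                value = elem.attrib.pop(name)
                if name == XML_LANG:
                    elem.attrib.setdefault("lang", value)
    # method="html" writes <br> rather than <br />, and writes elements such
    # as <script src="…"></script> with an explicit end tag.
    body = ET.tostring(root, encoding="unicode", method="html")
    return HTML401_STRICT + "\n" + body
```

The result is meant to be served as text/html to the browsers whose Accept header shows they cannot take XHTML.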

Tomorrow

“What about XHTML 2?” – I can hear you ask. At the time of writing – June 2006 – this new markup language is a proposal, and nothing more. In time it might become a standard, and with even more time current browsers might support it.

Being a draft, the new language does not yet have a media type associated with it, but we can assume two things:

It’ll be application/xml.
text/html will still work.

It has already been announced that Microsoft will not support XHTML 1.0 or 1.1 in their upcoming Internet Explorer 7. They do, however, already support XML parsing, which means that sticking with a generic application/xml content-type will make everything groovy.

Not quite. IE, like Opera, Firefox, Mozilla, Safari, and any other XML-enabled user-agent, is entirely able to read, parse, and even apply CSS to a generic XML-based language, which is what XHTML 2 sent as application/xml would be. None of them will, today, understand any of it. For that to happen, they need to be taught the language. We leave – as yet another exercise – the analysis of when, given the current browser population and development cycles, it may be feasible to deploy XHTML 2 in a production environment.

Despite claims such as the one made by the W3C in http://www.w3.org/MarkUp/2004/xhtml-faq, that “Much of XHTML 2 works already in existing browsers, …”, it is a trivial task to find a user-agent today in which their linked example does not work. Hint: try finding the document through Google.

As an entirely hypothetical question, we wonder: what will happen to an XHTML 2 document served as text/html? It surely can’t be perceived as worse tag soup than what real-life user-agents have to deal with on an everyday basis. The question, perhaps, is whether their error correction algorithms are good enough to, for instance, apply styles to elements they don’t otherwise know anything about. Idle minds.

Notes

(footnote 1) This is something of a myth, which springs from the fact that the XML specification clearly states that, once a fatal error is detected, normal processing must stop. Leaving aside the complications such a philosophy raises for accessibility, it is worth noting that only fatal errors are to be treated in this manner. Hence, <html><head></head> <body><html /></body></html> contains an error that does not stop processing (it violates a validity constraint by not having a title element), while <html><head><title>foo</head> <body><html /></body></html> is a fatal error, as it violates a well-formedness constraint. And then we’ll stop.
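The distinction is easy to see with a non-validating XML parser, which checks well-formedness but never validity. A sketch using Python’s xml.etree.ElementTree (our choice of parser, not one the specification prescribes) on the two fragments above: the first, invalid but well-formed, is accepted; the second is rejected outright.

```python
import xml.etree.ElementTree as ET

# Sketch only: ElementTree is non-validating, so it checks well-formedness alone.

# Well-formed, but invalid: no title element, and an html element inside body.
invalid = "<html><head></head> <body><html /></body></html>"
# Not well-formed: the title element is never closed.
malformed = "<html><head><title>foo</head> <body><html /></body></html>"


def is_well_formed(source):
    """True if a non-validating XML parser accepts the source."""
    try:
        ET.fromstring(source)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(invalid))    # True: validity is never checked
print(is_well_formed(malformed))  # False: a fatal error, parsing stops
```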

(footnote 2) Well caught indeed – yes, the Ruby elements were added to XHTML 1.1. The support is, to put it mildly, limited.

(footnote 3) You are right – if the browser in question understands XHTML (Firefox and Opera do), understands namespaces (again, they do), and via them pulls in other XML-based languages it also understands – MathML, SVG, etc. – then it can create a coherent whole, semantics included. Congratulations, you have stumbled across the one reason for employing XHTML. It still won’t work outside a minority of browsers, but it is a reason.

(footnote 4) Yes, there exist both frameset and transitional DTDs for XHTML 1.0, but we have purposefully ignored them. This is also why <basefont … /> and <frame … /> are not mentioned above.

(footnote 5) Yes, even with XHTML 1.1. It is a common myth, but still a myth, that the 1.1 version cannot be sent as text/html. You would be well within the standards to do so, though it is recommended against. It is also … pointless.

(footnote 6) This, however, seems terribly difficult. The reader is encouraged to take a look at the article The Road to XHTML 2.0: MIME Types for an illustration of how not to go about this task. The important thing to notice is how the author suggests handling the q-parameters in the Accept header. For an illustration of the exact opposite, see the HTTP::Negotiate Perl library by Gisle Aas.

References

Bray, Tim, et al. Extensible Markup Language (XML) 1.0.
W3C. February 2004.
Raggett, Dave. HTML 4.01 Specification.
W3C. December 1999.
Pemberton, Steven et al. HTML and XHTML Frequently Answered Questions.
W3C. July 2004.
Fielding, R. et al. Hypertext Transfer Protocol — HTTP/1.1.
RFC 2616. June 1999.
Sawicki, Marcin et al. Ruby Annotation.
W3C. May 2001.
Hickson, Ian. Sending XHTML as text/html Considered Harmful.
Online. September 2002.
ISHIKAWA, Masayasu et al. XHTML Media Types.
W3C. August 2002.
Pemberton, Steven et al. XHTML 1.0 The Extensible HyperText Markup Language (Second Edition).
W3C. January 2000, August 2002.
Altheim, Murray et al. XHTML 1.1 – Module-based XHTML.
W3C. May 2001.
Axelsson, Jonny et al. XHTML 2.0.
W3C. May 2005.
Clark, James. XSL Transformations (XSLT).
W3C. November 1999.

Examples

XHTML 1.0 Strict

      <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
                          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

XHTML 1.1

      <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
                          "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

Acknowledgements

David Dorward, for grammar and catching my most horrid mistakes …
Jörgen Andreasen, for explanations of where I have assumed prerequisites the target audience cannot be expected to have …
Bugsy Malone for the headings.