It’s a really old subject, but I haven’t said my piece on the XHTML 1.0 versus HTML 4.01 debate. While commenting on Roger Johansson’s blog, 456 Berea Street, I said a little bit about what I think. I figured I ought to go ahead and say my fill.
XHTML was supposed to be the death of HTML. HTML 4.01, until recently, was supposed to be the last iteration of HTML. I think XHTML is great. It allows the designer to implement bits of XML, should he want to. It also is very strict, requiring proper syntax where HTML didn’t. If there is something wrong with the code, the page should not render at all. This makes good syntax coding a requirement rather than a suggestion. However, the current implementation of most of the XHTML pages I know of is ideologically broken. The rest don’t work on Internet Explorer.
In 2000, when XHTML 1.0 was introduced, there was a need for backward compatibility since most browsers could not render real XHTML. That is, browsers were built to render HTML served as text/html. Sending XHTML as XML resulted in, I assume, an interpreted XML tree. For XHTML to be adopted, it needed to work in browsers. So, the XHTML recommendation had guidelines in place for how to make XHTML work on HTML browsers. This wasn’t backward compatibility so much as a hack. It didn’t allow for any of the benefits of XHTML, though it succeeded in making pages written with XHTML syntax render on old browsers. So, the mime-type is the modern equivalent of DOCTYPE triggering quirksmode (which the Web Hypertext Application Technology Working Group embraces).
As far as hacks go, it worked. Like all transitional solutions, it was supposed to be dropped as browser support for XHTML grew. The problem is that Internet Explorer never supported real XHTML (even in version 7) and people opted to continue to send XHTML as HTML. Further, it seems, many people don’t realize that real XHTML requires that the correct application/xhtml+xml mime-type be sent. This is something that must be set up on the server, as text/html is the default mime-type for sending .html documents on every web server I know of.
When XHTML is sent as text/html, it behaves differently than if it is sent correctly. XHTML sent as HTML is treated as tag soup, which means any optimized, lightweight XML parsers aren’t used. Tag soup XHTML doesn’t require strict use of CDATA sections, and improper syntax doesn’t stop rendering of the page. The null closing tags are treated as broken attributes. While it still works, XHTML is ideologically broken if it is sent as text/html. People who suggest we ought to use XHTML to guarantee that fledgling designers pick up XHTML (which would require strict syntax) are suggesting that we use a broken implementation to uphold an ideal that is at odds with broken implementations (ignoring that an unforgiving markup language would cause most designers to give up out of irritation). It is hypocritical.
So, I see two choices that are ideologically sound. The first is to use XHTML and send it as application/xhtml+xml, Internet Explorer be damned. However, this is really not a good choice for most. Some would suggest content negotiation, but this is still a hack that requires a lot of extra thinking and planning (albeit a better one than sending XHTML as HTML). The second is to only use XHTML in specific instances where compatibility can be guaranteed.
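For anyone curious what content negotiation involves, here is a rough sketch of the decision a server would make from the browser's Accept header. The function names (parse_accept, choose_content_type) and the sample header strings are assumptions for illustration, and the rule used (send XHTML only when the browser lists it explicitly and ranks it at least as high as text/html) is one possible policy, not the only one.

```python
XHTML_TYPE = "application/xhtml+xml"
HTML_TYPE = "text/html"


def parse_accept(header: str) -> dict:
    """Map each media type in an Accept header to its q-value (default 1.0)."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unreadable q-value as "not acceptable"
        prefs[media_type] = q
    return prefs


def choose_content_type(accept_header: str) -> str:
    """Pick application/xhtml+xml only if the browser explicitly asks for it."""
    prefs = parse_accept(accept_header or "")
    xhtml_q = prefs.get(XHTML_TYPE, 0.0)
    html_q = prefs.get(HTML_TYPE, 0.0)
    if xhtml_q > 0 and xhtml_q >= html_q:
        return XHTML_TYPE
    return HTML_TYPE


if __name__ == "__main__":
    # Illustrative headers: one that lists XHTML explicitly, one that does not.
    print(choose_content_type("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
    print(choose_content_type("image/gif, image/jpeg, */*"))
```

Even this small sketch hints at the extra thinking involved: the markup has to be valid under both parsers, and caches need to know the response varies by Accept header.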
Let me elaborate on the second choice. HTML 4.01 is a web standard. Anyone in the web standards group who always advocates XHTML over HTML on the grounds that XHTML is better suited for use than HTML doesn’t understand what tools are for. HTML with a strict DOCTYPE can be validated, written semantically, and obsessed over as much as XHTML. HTML just allows the web designer to make the choice (and good designers will obsess over their markup no matter what).
HTML and XHTML are tools to solve a problem. Just like a screwdriver won’t help when a hammer is needed, XHTML is no good when HTML is needed. I’ll be specific. On pages where free-form user input is allowed, the potential for non-designers (and designers, too, for that matter) to enter bad markup is huge. In a real XHTML page, the user could easily break a page, preventing rendering. When using HTML, the page may no longer validate, but it will still render. In instances such as these HTML is a better tool than XHTML.
Web applications, however, are a different story. System requirements can be specified as they are on traditional applications (e.g. a browser that supports XHTML can be required). This means that no hacks need to be used. Since web applications generally use form elements to display and edit data, concerns over user input are drastically reduced. Most of the time, discrete data is required rather than the free-flow data that one might find on, say, a comments page. So, data can be more accurately validated. When that data is inserted into an XHTML page, it’s far less likely to break the page. On a syntactical level, XHTML meshes well with programming languages. That is, the code must live up to certain standards. XHTML would help tie the front end to the back end. XHTML is a perfect tool for web applications.
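As a small illustration of why data inserted into an XHTML page needs care, here is a sketch that drops a value into a page and checks whether the result is still well-formed. The template, the function names (render_comment, is_well_formed), and the sample input are all hypothetical, and escaping the value as plain text is just one assumed way of handling it.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical, stripped-down XHTML page with a single slot for user data.
PAGE_TEMPLATE = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Comments</title></head>"
    "<body><p>{comment}</p></body>"
    "</html>"
)


def render_comment(comment: str, escape_input: bool = True) -> str:
    """Insert the comment into the page, escaping &, < and > unless told not to."""
    text = escape(comment) if escape_input else comment
    return PAGE_TEMPLATE.format(comment=text)


def is_well_formed(page: str) -> bool:
    """Return True if a strict XML parser accepts the page."""
    try:
        ET.fromstring(page)
        return True
    except ET.ParseError:
        return False


# Sample free-form input with unclosed tags and a bare ampersand.
user_comment = "I <b>love<i> XHTML & HTML"

if __name__ == "__main__":
    print(is_well_formed(render_comment(user_comment, escape_input=False)))
    print(is_well_formed(render_comment(user_comment)))
```

Inserted raw, the free-form input leaves the page unparseable; treated as discrete text, it does not. That is the kind of control a web application has over its fields and a comments page allowing markup does not.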
So, ignoring all the common arguments about the dangers of using XHTML, the ideology of XHTML is broken if the page works in Internet Explorer 7 or below. Advocating the use of a broken technology is hypocritical when well-written HTML 4.01 with a strict DOCTYPE is better suited for normal web usage. However, XHTML has a defined and useful place on the World Wide Web.
If you want another opinion, Maciej Stachowiak of Apple’s WebKit / Safari project weighs in with pretty much the same opinions I have.
Update: Lachlan Hunt pointed out that I screwed up the XHTML MIME-type. I fixed the error.