Posts Tagged ‘HTML’

The Great MIME-Type Swindle

Sunday, November 19th, 2006

It’s a really old subject, but I haven’t said my piece on the XHTML 1.0 versus HTML 4.01 debate. While commenting on Roger Johansson’s blog, 456 Berea Street, I said a little bit about what I think. I figured I ought to go ahead and say my fill.

XHTML was supposed to be the death of HTML. HTML 4.01, until recently, was supposed to be the last iteration of HTML. I think XHTML is great. It allows the designer to implement bits of XML, should he want to. It also is very strict, requiring proper syntax where HTML didn’t. If there is something wrong with the code, the page should not render at all. This makes good syntax coding a requirement rather than a suggestion. However, the current implementation of most of the XHTML pages I know of is ideologically broken. The rest don’t work on Internet Explorer.

The comments I made assumed a little foreknowledge but basically say what I want to say here. However, I’ll go ahead and lay out my full argument here.

In 2000 when XHTML 1.0 was introduced, there was a need for backward compatibility since most browsers could not render real XHTML. That is, browsers were built to render HTML served as text/html. Sending XHTML as XML resulted in, I assume, an interpreted XML tree. So that adoption of XHTML would occur, it needed to work in browsers. So, the XHTML recommendation had guidelines in place of how to make XHTML work on HTML browsers. This wasn’t backward compatibility so much as a hack. It didn’t allow for any of the benefits of XHTML, though it succeeded in making pages written with XHTML syntax render on old browsers. So, the mime-type is the modern equivalent of DOCTYPE triggering quirksmode (which the Web Hypertext Application Technology Work Group embrace).

As far as hacks go, it worked. Like all transitional solutions, it was supposed to be dropped as browser support for XHTML grew. The problem is that Internet Explorer never supported it (even in version 7) and people opted to continue to send HTML-XHTML. Further, it seems, many people don’t realize that real XHTML requires the correct application/xhtml+xml mime-type being sent. This is something that must be set up on the server, as text/html is the default mime-type for sending .html documents on every web server I know of.

When XHTML is sent as text/html, it behaves differently than if it is sent correctly. XHTML sent as HTML is treated as tag soup, which means any optimized, light weight XML parsers aren’t used. Tag Soup XHTML doesn’t require strict use of CDATA elements and improper syntax doesn’t stop rendering of the page. The null closing tags are treated as broken attributes. While it still works, XHTML is ideologically broken if it is sent as text/html. People who suggest we ought to use XHTML to guarantee that fledgling designers pick up XHTML (which would require strict syntax) are suggesting that we use a broken implementation to uphold an ideal that is at odds with broken implementations (ignoring that an unforgiving markup language would cause most designers to give up out of irritation). It is hypocritical.

So, I see two choices that are ideologically sound. The first is to use XHTML and send it as application/xhtml+xml, Internet Explorer be damned. However, this is really not a good choice for most. Some would suggest content negotiation, but this is still a hack that requires a lot of extra thinking and planning (albeit a better one than sending XHTML as HTML). The second is to only use XHTML in specific instances where compatibility can be guaranteed.

Let me elaborate on the second choice. HTML 4.01 is a web standard. Anyone in the web standards group that always advocates XHTML over HTML on grounds that XHTML is better suited for use than HTML doesn’t understand what tools are for. HTML with a strict DOCTYPE can be validated, written semantically, and obsessed over as much as XHTML. HTML just allows the web designer to make the choice (and good designers will obsess over their markup no matter what).

HTML and XHTML are tools to solve a problem. Just like a screwdriver won’t help when a hammer is needed, XHTML is no good when HTML is needed. I’ll be specific. On pages where free-form user input is allowed, the potential for non-designers (and designers, too, for that matter) to enter bad markup is huge. In a real XHTML page, the user could easily break a page, preventing rendering. When using HTML, the page may no longer validate, but it will still render. In instances such as these HTML is a better tool than XHTML.

Web applications, however, are a different story. System requirements can be specified as they are on traditional applications (e.g. a browser that supports XHTML can be required). This means that no hacks need to be used. Since web applications generally use form elements to display and edit data, concerns over user input are drastically reduced. Most of the time, discreet data is required rather than free-flow data that one might find on, say, a comments page. So, data can be more accurately validated. When that data is inserted into an XHTML page, it’s far less likely to break the page. On a syntactical level, XHTML meshes well with programming languages. That is, the code must live up to certain standards. XHTML would help tie the front end to the back end. XHTML is a perfect tool for web applications.

So, ignoring all the common arguments about the dangers of using XHTML, the ideology of XHTML is broken if the page works in Internet Explorer 7 or below. Advocating the use of a broken technology is hypocritical when well written HTML 4.01 with a strict DOCTYPE is better suited for normal web usage. However, XHTML has a defined and useful place on the World Wide Web.

If you want another opinion, Maciej Stachowiak of Apple’s WebKit / Safari project weighs in with pretty much the same opinions I have.

Update: The Lachlan Hunt pointed out that I screwed up the XHTML MIME-type. I fixed the error.

W3C Listens, Incremental Update to HTML On The Way

Saturday, October 28th, 2006

Surprisingly, SlashDot scooped all the web design websites I normally read on Tim Berners-Lee’s announcement that HTML will be incrementally updated (as well as things such as the W3C‘s HTML validator)

In the post, Berners-Lee addresses folks like (in order of impact) Bjoern Hoehrmann, Jeffery Zeldman, Eric Meyer, and Molly Holzschlag who have been angry with the W3C for resting on their laurels, instead of listening to the community (apparently blatantly ignoring them). As everyone points out, the W3C should be advancing the web. Instead, small work groups are bringing us things like microformats and recommendations to update HTML.

So, the really important part of Berners-Lee’s post is not that HTML will be updated. It is that the W3C is claiming that it will now listen to the community by actually following the feedback mechanisms they have in place instead of blatantly ignoring them. The W3C is going to get back into the community instead of towering above it. Assuming they follow through, this could be a new age of progression.

I might have more to say after the weekend when the web design celebrities weigh in on the subject.

How To Make an Ajax Chat Room

Friday, June 23rd, 2006

I’ve been helping this guy from India with an Ajax / XML chat app. I wrote one ages ago, but I think it broke. So, since I’m already giving a how-to, I might as well write it down.

Ajax For Newbies

If you are familiar with XML, the buzzword Ajax, the XMLHttpRequest() object, and DOM scripting, skip to the juicy part. If not, keep on reading.

An XML Primer for Ajax

XML looks kind of like HTML, but it is quite different. For the sake of brevity, XML is a structured data file that draws from SGML. The difference is that it must be well formed and that the tags can be arbitrarily named (unless you specify a DTD).

Going into more detail would not be pragmatic since the JavaScript XML parsers only care about whether or not the XML is well-formed. So, here is an example XML file:

<messages>
 <message>
  <from>
   Fred
  </from>
  <body>
   Fred here... Just making a post!
  </body>
 </message>
 <message>
  <from>
   Joey
  </from>
  <body>
   Hey, Fred.  Thanks for the post.
  </body>
 </message>
</messages>

The good thing about good XML is that it has metadata built in. You can read it and know what is what. The code above is obviously a group of messages. Inside the messages group is a message. In this case, there are two of them. Inside each message is a from and a body. The from, of course, contains the name of the person who sent the message. It follows that the body contains the message body. This XML file will be used later.

XMLHttpRequest Primer for Ajax

XMLHttpRequest, at least on browsers that aren’t Internet Explorer 6 or less, is a native object in JavaScript that allows the client to send a request to the server in the background. That is, the browser doesn’t have to physically go to the page. This makes it easy to update parts of pages without iframe, frame and other hacks.

I’ll give you the condensed version of how to use it for the sake of brevity. I’m sort of copy / pasting code from Apple’s Developer documents because their example does XMLHttpRequest() the right way.

// This needs to be in the global scope so we can use it in other functions.
var xml_obj;
/*
 
 We will be sending loadXMLDoc the url of the page we want to call,
 the query string to post to the url, and a function to call when
 the script updates.
 
*/
function loadXMLDoc(url,query_string,callback) {
	xml_obj = false;
    // Try native XMLHttpRequest first.
    if(window.XMLHttpRequest) {
    	try {
			xml_obj = new XMLHttpRequest();
        } catch(e) {
			xml_obj = false;
        }
    // If native fails, fall back to IE's ActiveX objects.
    } else if(window.ActiveXObject) {
       	try {
        	xml_obj = new ActiveXObject("Msxml2.XMLHTTP");
      	} catch(e) {
        	try {
          		xml_obj = new ActiveXObject("Microsoft.XMLHTTP");
        	} catch(e) {
          		xml_obj = false;
        	}
		}
    }
	if(xml_obj) {
		// Any time the state of the request changes, callback will be called.
		xml_obj.onreadystatechange = callback;
		// Set the parameters of the request.
		// POST can also be GET.  We use the URL from above.
		// The <q>true</q> tells whether the request should be asynchronous.
		xml_obj.open("POST", url, true);
		xml_obj.setRequestHeader("Content-Type","application/x-www-form-urlencoded");
		// Since we are POSTing, we send the query string as a header
		// instead of as a string at the end of the URL.
		xml_obj.send(query_string);
	}
	else{
		alert("Failed to create XMLHttpRequest!");
	}
}
/*
 
This is an example callback function.
 
*/
function messagesRetrieved(){
	if (xml_obj.readyState == 4){
		// When readyState is 4, the request is complete.
		if (xml_obj.status == 200){
			// When status is 200, we know there are no server errors or anything.
			// and we can process the data.
		}
		else{
			alert("There was a problem retrieving the XML data:\n" + xml_obj.statusText);
		}
	}
}
 
// How we would call the loadXMLDoc
loadXMLDoc("http://domain.com/myurl.php","action=get_messages",messagesRetrieved);

Hopefully, from the comments in the script, you can see what is going on. When the request starts, the xml_obj.readyState property will be update. Once the request is complete, there will be several properties available.

  • xml_obj.responseXML is the most important for Ajax, since that is where the data is stored as an object. Yes, JavaScript automagically parses the XML and puts it in an object for you.
  • xml_obj.responseText is nice because it will print out the text returned from the server. It’s great for debugging.
  • xml_obj.status is the server status code of the request. This should be 200 or something is wrong with your scripts.

What is Ajax About, Then?

As I said, the term Ajax is a buzzword. It’s a short way of saying, JavaScript using XMLHttpRequest to get XML from a server in the background.

There are other methods that are probably get called Ajax, but aren’t. For example, XMLHttpRequest allows the server to send back text instead of XML (Ajat?). Also, if the asynchronous flag is set to false, the request doesn’t run in the background. Instead the script that is running waits for a response from the server before it continues.

It just turns out XML is slightly more usable for complex stuff and that synchronous requests defeat the point. Though Ajat have many uses.

DOM Scripting

DOM scripting is the key to Ajax. Ajax is pretty useless if you can’t extract the data. Basically put, the DOM is HTML / XML represented in an object. DOM Scripting, then, is using JavaScript (or some other language, I suppose) to manipulate this object. If you manipulate the DOM of your HTML page, it will reflect visibly in the browser.

DOM scripting replaces old-school document.write coding. Instead, you insert nodes (aka tags) or remove nodes from the DOM, and thus the page. The great thing is that XML files can be represented as a DOM object. That means that you can manipulate the XML the same way you manipulate the HTML. You’ll make a lot of use of the following (where xml = xml_obj.responseXML):

  • xml.childNodes is a collection of the children of the current node. It’s handy to loop on: for(i = 0; i < xml.childNodes.length; i++){}
  • xml.nodeName is the string value of the current tag.
  • xml.nodeType is a numeric representation of what kind of node the current element is, text (3) or element (1). This is handy for conditionals: if(xml.nodeType == 3){alert(xml.nodeValue);}
  • xml.nodeValue is the string value of the current element if it is a text node.

There are more useful properties and methods, but, for now, we're going to loop through output. It isn't the most efficient method, but it is more agnostic than, say, xml.getElementsByTagName.

A Basic Ajax Chat

As we see above, we already have a good deal of code written. In fact, most of the tedious stuff is done. We just have to solve a few problems.

Server Side Scripting

Ajax isn't worth much without server side scripting. PHP is my weapon of choice, but ASP, Perl, Ruby, or any other language will work just fine. However, since I chose PHP, it will affect my data storage solution.

Data Storage

Most people use Ajax to deliver data stored on the server. All the usual methods are available including mySQL, text files, and XML. XML seems like the likely choice, since we are spitting out XML. However, at least with PHP, working with XML is hard. Further, doing queries on XML seems like it'd be severely difficult in PHP. Other languages might like XML better than PHP, though.

So, for the sake of this article, that leaves text files and mySQL. Normally, I'd go with mySQL but this is a simple article. So, we're just going to use text files. Opening files is easier than writing all the code to get the database running.

The Text Storage File

The text file needs to only hold two variables: the name of the person sending the message, and their message. We'll use a tab-separated file since it is unlikely that the user will enter a tab (because that typically changes fields) and we'll remove all carriage returns. So, the XML file mentioned at the beginning before will be stored like this:

Fred	Fred here... Just making a post!
Joey	Hey, Fred.  Thanks for the post.

Convert The Text File To XML

We'll need to write a function in PHP to convert the text file to XML for display in our chat application. It should be something like this:

function tsv_to_xml($file){
	if(file_exists($file))
		$output = trim(file_get_contents($file));
	else
		die("Couldn't open the file.");
	if($output != "")
		$output = explode("\n",$output);
	else
		$output = array();
	$return = "<messages>";
	for($i=0; $i<count($output); $i  ){
		$output[$i] = explode("\t",$output[$i],2);
		$return .= "<message>";
		$return .= "<from>";
		$return .= "".htmlentities($output[$i][0])."";
		$return .= "</from>";
		$return .= "<body>";
		$return .= "".htmlentities($output[$i][1])."";
		$return .= "</body>";
		$return .= "</message>";
	}
	$return .= "</messages>";
	return $return;
}

That should return the XML file specified at the beginning from the tab-separated file.

Adding a New Message

We also need to add to the text file. So, we'll write a PHP function for that, as well:

function tsv_add($file, $from, $body){
	if(file_exists($file))
		$output = trim(file_get_contents($file));
	else
		die("Couldn't open the file.");
	$from = preg_replace("/(\s|\r\n|\r|\n)/"," ",$from);
	$body = preg_replace("/(\s|\r\n|\r|\n)/"," ",$body);
	$tsv = $from."\t".$body;
	$my_file = fopen("$file","w");
	fwrite($my_file, trim($tsv)."\n".$output);
	fclose($my_file);
	return tsv_to_xml($file);
}

Basically, we create a tab-separated entry for our file (stripping out protected characters), dump it into a file and return the updated XML file.

Putting the PHP Together

It'd be a bummer to use several different PHP files when one would do. So, we'll use a script to manage it. We want to accept an action parameter to help our script decide what to do. It will look like this:

$file = "./data/chat.txt";
switch($_POST["action"]){
	case "get_xml":
		header("Content-type: text/xml");
		echo tsv_to_xml($file);
		die();
		break;
	case "add_post":
		header("Content-type: text/xml");
		echo tsv_add($file, $_POST["from"], $_POST["post"]);
		die();
		break;
	default:
		die("There is no action associated with that call.");
		break;
}

Once we have all that PHP in one file, it will be our request file. So, we will call it request.php and put it on our development server putting the tsv_add and tsv_to_xml functions at the top. Make sure the data folder is writable! Make sure you put an empty chat.txt file in the data folder!

The HTML

The only thing we don't have is the HTML file (and a little bit of JavaScript) we'll use to display the XML from the chat. The code for the HTML should look something like this (except you'll probably want to put the CSS in a file and use the link tag):

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
 <head>
  <title>Ajax Chat</title>
  <style type="text/css">
   label{display:block; width:100%;}
   input{display:block; width:100%;}
   input#submit{width:auto;}
  </style>
  <script type="text/javascript" src="chat.js"></script>
 </head>
 <body onload="get_xml();">
  <form method="post" action="#" name="send_form" id="send_form" onsubmit="send_xml(); return false;">
   <p>
    <label for="uname" accesskey="n">Name</label>
    <input type="text" id="uname" name="uname">
   </p>
   <p>
    <label for="mbody" accesskey="m">Message</label>
    <input type="text" id="mbody" name="mbody">
   </p>
   <p>
    <input type="submit" id="submit" value="Send Message" accesskey="s">
   </p>
  </form>
  <div id="chat_area">Loading chat...</div>
 </body>
</html>

The Rest of the JavaScript

We need just a little bit more JavaScript. The get_xml() function, the send_xml() function, and the messagesRetrieved() function (see above).

function messagesRetrieved(){
	if (xml_obj.readyState == 4){
		if (xml_obj.status == 200 &amp;&amp; xml_obj.responseXML){
			xml = xml_obj.responseXML;
			output = "";
			for(var i=0; i < xml.childNodes.length; i++){
				for(var j=0; j < xml.childNodes[i].childNodes.length; j++){
					if(xml.childNodes[i].childNodes[j].nodeType == 1){
						for(var k=0; k < xml.childNodes[i].childNodes[j].childNodes.length; k++){
							if(xml.childNodes[i].childNodes[j].childNodes[k].nodeName == "from"){
								output  = "<b>" + xml.childNodes[i].childNodes[j].childNodes[k].firstChild.nodeValue + "</b>: ";
							}
							else if(xml.childNodes[i].childNodes[j].childNodes[k].nodeName == "body"){
								output  = xml.childNodes[i].childNodes[j].childNodes[k].firstChild.nodeValue + "<br>";
							}
						}
					}
				}
			}
			document.getElementById("chat_area").innerHTML = output;
		}
		else{
			alert("There was a problem retrieving the XML data:\n" + xml_obj.statusText + "\n" + xml_obj.responseText);
		}
	}
}
 
function get_xml(){
	loadXMLDoc("request.php","action=get_xml",messagesRetrieved);
	setTimeout("get_xml()",5000);
}
 
function send_xml(){
	u = document.send_form.uname.value;
	b = document.send_form.mbody.value;
	qs = "action=add_post&amp;from=" + escape(u) + "&amp;post=" + escape(b);
	loadXMLDoc("request.php",qs,messagesRetrieved);
}

Put all the JavaScript (except the initial skeleton of messagesRetrieved()) into a file called chat.js and save it to the same folder as the HTML file and request.php.

Finishing Up

Assuming everything is set up the way I set it up on my server, you should have a rudimentary Ajax chat. Of course, there are some tweaks I highly suggest. The first being adding clip and overflow to the chat_area div. If done right, you can have a scrollable window much like an iframe.

The second thing to do is modify the script so it keeps track of the time the posts were made. Then the script could figure out which posts the user hasn't received and send only them. This will cut down on bandwidth, which is important. I let my first Ajax Chat run all day and I ended up doing three gigs of bandwidth. That's a lot considering it was all text files.

The third thing I would do is learn more about DOM scripting and get rid of those for loops in messagesRetrieved(). While they are agnostic and easy to add message items to, they aren't pretty.

Disclaimer

I wrote this code and tested it on Safari and Firefox on Mac. It worked for me with those browsers. It might not work for you and you may have to tweak it heavily. I don't have time to support it. The code is free and it is yours to use. So, feel free to modify it and ask other people for help with it. If you want to be nice, link back to this page if you get it working.

Oh, and since this dynamically updates an area of the page, it probably isn't very accessible to screen readers. Don't say I didn't warn you.

Source Code

You can download all the files mentioned above in a zip file. They are uncommented, but the code is clean enough that you should be able to figure out what is going on.