My to-do list has had an "install local copy of the W3C validator" item on it for quite a while, and when I came across an article detailing how to do just that, I thought I was all set. However, my excitement faded shortly after I saw the steps required for the installation: a CVS checkout, replacing some files with Mac OS X-specific ones, Apache config file editing, two libraries to download and install, and fourteen Perl modules to set up. I resigned myself to an hour or two of drudgery, and went through the list. I eventually stumbled when trying to set up the OpenSP library: I didn't feel like installing Fink just for this one thing, and my attempt at building it by hand didn't quite work out (the libtool that was included wasn't the right one for OS X).
Rather than force myself to go through with the rest, I decided that an alternative approach was worth investigating. Instead of running my local (behind the firewall) documents through my own validator, I could transfer the file to another server, and then point the regular W3C validator to its (publicly visible) temporary URL. Doing this in the form of a favelet/bookmarklet seemed ideal, since it would provide one-click access and be more portable than a shell script. This favelet would then invoke a CGI script on my server, a hybrid design in the style of my feed subscription favelet.
The first thing that must be done is to get the current page's source code. Initially, an approach based on the innerHTML DOM property seemed reasonable. However, it turned out that this property is dynamically generated based on the current DOM tree, and thus does not necessarily reflect the original source. Furthermore, it's hard to get at the outermost processing instructions in XHTML documents, so the source wouldn't be complete anyway. Therefore, I decided to use an XMLHttpRequest to re-fetch the page and then get its source by using the responseText property. Unfortunately, at this stage Internet Explorer support had to be dropped, since its equivalent ActiveX object didn't seem to want to run from within a favelet (clicking on the identical
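A minimal sketch of the re-fetch step might look like the following. It is not the favelet's actual source; the function name and the injected constructor parameter are assumptions made here so the logic can be exercised outside a browser.

```javascript
// Sketch only: re-fetch a page and return its source as served.
// `XhrCtor` is a hypothetical parameter; in a favelet you would pass
// window.XMLHttpRequest.
function fetchPageSource(url, XhrCtor) {
  var request = new XhrCtor();
  // Passing `false` as the third argument makes the request synchronous,
  // which keeps the code short; an asynchronous request with a callback
  // would work as well.
  request.open("GET", url, false);
  request.send(null);
  if (request.status !== 200) {
    throw new Error("Could not re-fetch " + url + " (status " + request.status + ")");
  }
  // responseText holds the response body as text, unlike innerHTML,
  // which is serialized from the current DOM tree.
  return request.responseText;
}
```

In a browser this would be called as fetchPageSource(window.location.href, window.XMLHttpRequest).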
With the source thus in hand, I had to find a way to get it onto the server. The XMLHttpRequest object also supports the PUT HTTP method, but apparently Safari only supports GET and POST. In any case, the object's use is restricted for security reasons, and so it would've been difficult to make any requests to a server different from the one hosting the page that was to be validated. However, the other, more old-school way of communicating with servers, via a form, was still available. Therefore the favelet creates a form object on the fly, sets the value of a hidden item to the source, and then submits it to the CGI script. The script generates a temporary file and then passes it on to the validator.
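The form-based hand-off described above can be sketched roughly as follows. This is an illustration rather than the favelet's real code: the function names, the field name "source", and the action URL are all hypothetical and would have to match whatever the receiving script expects.

```javascript
// Sketch only: build a POST form that carries the page source in a
// hidden field. `doc` is the page's document object.
function buildSourceForm(doc, actionUrl, source) {
  var form = doc.createElement("form");
  form.method = "post";
  form.action = actionUrl; // hypothetical URL of the server-side script

  var field = doc.createElement("input");
  field.type = "hidden";
  field.name = "source"; // hypothetical field name
  field.value = source;
  form.appendChild(field);

  return form;
}

// Attach the form to the page and submit it. Unlike XMLHttpRequest,
// a form submission may target a different server than the page's own.
function submitSource(doc, actionUrl, source) {
  var form = buildSourceForm(doc, actionUrl, source);
  doc.body.appendChild(form);
  form.submit();
}
```

Because the browser navigates to the form's target, the server-side script is free to respond with a redirect to the validator.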
The validator favelet is thus ready to be dragged to your toolbar. The original, formatted and commented source code is also available, as is the server-side script that receives its data and passes it to the W3C validator. The development process of this favelet was made much more pleasant due to the generator that can be used to transform human-readable code into something that's ready to be pasted in a
Full disclosure: For some reason, perhaps because it was 1 AM, it didn't occur to me to use the POST method to submit the source. Instead I devised a (convoluted) method that would take the source, divide it up into ~2K chunks, and then create a series of iframes that would have as their src attribute the CGI script that took the current chunk as its query string (i.e. via the GET method). Since there was no guarantee that all of the chunks would arrive in order, I had to keep track of them and eventually join them to reconstruct the original source (à la IP packet fragmentation). You'd think that I would have realized the folly (i.e. difficulty in proportion to benefit) of this approach early on, but no, I pursued it until it worked 95% of the time (modulo some timing issues in Firefox). Only when I was researching this entry did I realize that the form/POST approach was much faster (with chunking, each chunk required a new HTTP connection and an exec on the server), and ended up implementing it in 15 minutes with half the code. Chalk one up to learning from your mistakes (hopefully).
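For the curious, the split-and-reassemble idea behind the abandoned approach can be sketched like this. It is a reconstruction of the general technique, not the original code: the names and the chunk record layout are assumptions, and the reassembly half is written in JavaScript purely for illustration, since in the scheme described it would have to happen on the receiving end.

```javascript
// Sketch only: split a string into fixed-size chunks, each tagged with
// its position and the total count so the receiver can reorder them.
function splitIntoChunks(source, chunkSize) {
  var total = Math.ceil(source.length / chunkSize);
  var chunks = [];
  for (var i = 0; i < total; i++) {
    chunks.push({
      index: i,
      total: total,
      data: source.slice(i * chunkSize, (i + 1) * chunkSize)
    });
  }
  return chunks;
}

// Returns a function that accepts chunks in any order. It yields null
// until every chunk has arrived, then the reconstructed source.
function createReassembler() {
  var received = [];
  var count = 0;
  return function accept(chunk) {
    if (received[chunk.index] === undefined) { // ignore duplicates
      received[chunk.index] = chunk.data;
      count++;
    }
    return count === chunk.total ? received.join("") : null;
  };
}
```

Note that a chunk sent in a query string grows once it is URL-encoded, so a chunk size measured in source characters is only a rough bound on the request length.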