Zettabyte Storage

Sunday, January 21, 2007

A Perfunctory Semantic Note about ECMAScript Javascript

There are probably quite a few of you out there who have started reading my series on Making Javascript Useful and are wondering why I insist on referring to the ECMAScript language as Javascript. There are a few reasons.

First, almost nobody knows what ECMAScript is, aside from myself and you, the one reader whose eyes have not yet glazed over. If I were to call it, somewhat more properly, ECMAScript, then essentially nobody would find these posts, and even those who did probably wouldn't know or care what ECMAScript is, because they only know how to program in Javascript. Everyone who knows what ECMAScript is knows that Javascript is ECMAScript; however, the opposite is certainly not true.

The second, and more important, reason is mere pedantry. ECMAScript is a language definition; Javascript is the language that is available "in the wild" on essentially every computer in the world. Although it might be true to say that SpiderMonkey (name your own favorite engine here, I don't care) is an implementation of the ECMAScript language, SpiderMonkey is not the program that runs your scripts; Firefox (or whatever browser you prefer) is. That program comes with baggage like the DOM, AJAX, Netscape 2 compatibility, the W3C Event Model, etc., whereas the ECMAScript language doesn't really care one way or the other what interfaces the host environment provides to your program. Since we only really care (at the moment, at least) about making our ECMAScripts run "in the wild", we are really worried about Javascript. ECMAScript will most certainly come along for the ride, but Javascript is the real target of our adventure.

Tuesday, January 16, 2007

Making Javascript Useful: Part 1, Simple Classes

Before I get into this post, I should mention that I am not the first person to write what are essentially the first three articles of this series. This is also probably not the best discussion of the topic. When I got started seriously using Javascript, much of this information did not exist (at least not on the parts of the web where I looked). After writing Part 0 and most of Part 1 (this post), I realized that the internet (1) is not static and (2) tends towards ever greater knowledge. Sure enough, when I looked again, there was much more information than had been available only a year ago. Now that I no longer need the information, it is more plentiful than the grains of sand on a beach, and it prickles similarly, too.

From my perfunctory scan of Google's top picks, one of the best places to go looking for advanced Javascript is http://javascript.crockford.com - the article "JavaScript: The World's Most Misunderstood Programming Language" even has a section "Lisp in C's Clothing". Not only am I late to the party, but Mr. Crockford has much catchier wording. All is not lost, however; there is as yet very little (good) information about polymorphism out there. Besides, this is all really just introductory material for the real topic of this series: making javascript and the web a real programming and distribution environment. That goes well beyond OOP and should give me interesting things to say well into the future.

Sadly, I couldn't find the article that originally pointed out how to do some very basic OOP things with Javascript. So props go to whoever the mystery hacker happened to be, even if I can't remember their name.

And now on to our main feature:

What it all comes down to is playing up to your talents. When you get down to it, Javascript has a very odd set of talents.

The things we want to do with Javascript are essentially the things we want to do with any other programming language: build tools powerful enough that doing the really hard work can be made entirely someone else's problem. What are the tools that javascript* comes with "out of the box"?

  1. generic Functions and Objects and the things we like doing with them (closures, hashing, et al.)
  2. a plethora of high and low level syntax (operators, if/then, while, function, et al.)
  3. free and easy memory management (when and if it works at all)
  4. a powerful string library
There are other things that come to mind (and probably some important ones that don't), but these are the big 4. This is actually a much bigger set of functionality than most environments give us. For instance, C only gives us 2**. Everyone who builds a significantly large piece of code in C ends up reimplementing 1, 3, and 4 out of hand: Apache has its pools and buckets, gtk has GObject, etc.

The one thing that Javascript doesn't give us (and, ironically, the BigThing for its namesake Java) is OOP. Of course we have the "Object" object, but what we really want is a class template, instances, public and private data, interfaces, super/subclassing, etc.

Without further ado, let's define a class:


function MyClass()
{
    // ... the class body goes here ...
}


You are probably wondering why our "class" is defined with the 'function' keyword. Remember that a function defined with the 'function' keyword is a first-class Function object, much like one you could build with the Function( "" ) constructor (with one difference: a 'function' definition can see the variables of the scope it is written in, while a Function( "" ) body only sees global names). Since a Function is just another 'Object', we can tack properties (hash-table keys) onto it willy-nilly; if those properties happen to be other functions, so much the better. More importantly for us, when MyClass is used as a constructor, the 'this' keyword inside it refers to the new object being built, and we will use 'this' to tack properties onto that object. Thus, like with a real class, our "class" will be a wrapper for function definitions, like so:

Defining public functions:


function MyClass()
{
    this.MyPublicMethod = function()
    {
    };
}


The problem here is the value of 'this'. I've already said that 'this' is what we use to tack on properties, but it is not obvious what 'this' actually refers to. If you just call the MyClass function (it is really a function after all), 'this' is not bound to any object of yours: in non-strict code it falls back to the global object (window, in a browser), so MyClass would quietly tack MyPublicMethod onto the global scope, and in strict mode 'this' is undefined and the assignment throws. Enter the 'new' keyword. The 'new' keyword creates a fresh, empty Object whose prototype is MyClass.prototype, calls MyClass with 'this' set to that new object, and then returns it (unless MyClass explicitly returns some other object). The 'new' keyword is what creates the class's instance: an ordinary Object, not a copy of the Function.

We instantiate the class like so:


var my_class_instance = new MyClass();
/*
typeof my_class_instance == "object"
typeof my_class_instance[ 'MyPublicMethod' ] == "function"
*/


Note: in the definition of MyClass there is a semicolon after the definition of MyPublicMethod. Consider this a litmus test: if you understand what we're doing with the Function object, it should be obvious why this is needed. If you don't understand why there is a semicolon here, you should think about it until you do: it is important.***
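To make the description of 'new' above concrete, here is a rough sketch of what the operator does, written as an ordinary function. The helper name `constructLike` is hypothetical, and the sketch skips details such as ES2015 class constructors; it relies only on the standard Object.create and Function.prototype.apply.

```javascript
// The MyClass constructor from above, with a body for MyPublicMethod.
function MyClass() {
    this.MyPublicMethod = function () {
        return "hello from an instance";
    };
}

// Hypothetical helper: a simplified imitation of the 'new' operator.
function constructLike(Constructor, args) {
    // 1. Create a fresh object whose prototype is Constructor.prototype.
    var instance = Object.create(Constructor.prototype);
    // 2. Run the constructor with 'this' bound to that fresh object.
    var result = Constructor.apply(instance, args);
    // 3. If the constructor returned its own object (or function), use that instead.
    if ((typeof result === "object" && result !== null) || typeof result === "function") {
        return result;
    }
    return instance;
}

var built_with_new = new MyClass();
var built_by_hand = constructLike(MyClass, []);
```

Both results are plain objects that are `instanceof MyClass`. Note that each instance gets its own copy of MyPublicMethod, because the function expression inside the constructor runs once per construction.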

Defining Public Data:


function MyClass()
{
    this._myPublicData = 0;
}


Since 'this' is just the new instance object we are tacking keys onto, it should be pretty obvious that we can do the same thing with other types of data as well.
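A small sketch of public data in action: every instance gets its own copy of the property, and both the instance's methods and outside code can read and write it. The `Increment` method is a hypothetical addition for illustration.

```javascript
function MyClass() {
    // Public data: a property on the new instance.
    this._myPublicData = 0;

    // Hypothetical public method that updates this instance's own copy.
    this.Increment = function () {
        this._myPublicData++;
        return this._myPublicData;
    };
}

var first = new MyClass();
var second = new MyClass();
first.Increment();
first.Increment();
second._myPublicData = 10;  // public data is writable from outside, too
```

Afterwards `first._myPublicData` is 2 and `second._myPublicData` is 10: changing one instance never touches the other.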

Private functions and variables are a little trickier. First, let's review the concept of the "closure". I think I must have accidentally slept through the lecture where this was defined formally, but it's actually really easy. Just go read Wikipedia's article on it if you are not familiar. (http://en.wikipedia.org/wiki/Closure_%28computer_science%29) The thing that matters here is that every function you define inside the class 'function' forms a closure over that function's local variables. Essentially, what this boils down to is that the variables you declare with 'var' inside the class function are kept around, with their current values, for as long as any function defined inside it still refers to them, and they are available to all of those inner functions. Incidentally, they are not available to anyone who is not declared inside the function. Since 'new' runs the class function once per instance, each instance gets its own private set of these variables.

Defining Private Stuff:


function MyClass()
{
    var _myPrivateData = 0;
    var _myPrivateFunction = function()
    {
        _myPrivateData++;
    };
}

var my_instance = new MyClass();
/* window.alert( my_instance._myPrivateData ); // alerts "undefined": not a property of the instance */
/* window.alert( my_instance._myPrivateFunction() ); // TypeError: my_instance._myPrivateFunction is not a function */

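Private members are only useful if something can reach them. A common pattern, sketched here with a hypothetical `Bump` method, is a "privileged" public method defined inside the constructor: it closes over the private variables, so outside code can use them only through that method.

```javascript
function MyClass() {
    // Private: local variables of the constructor, invisible from outside.
    var _myPrivateData = 0;
    var _myPrivateFunction = function () {
        _myPrivateData++;
    };

    // Privileged (hypothetical): a public method that closes over the private state.
    this.Bump = function () {
        _myPrivateFunction();
        return _myPrivateData;
    };
}

var my_instance = new MyClass();
var other_instance = new MyClass();
my_instance.Bump();
my_instance.Bump();
var hidden = my_instance._myPrivateData;  // undefined: not a property of the instance
```

Each call to 'new' creates a separate closure, so `other_instance` starts counting from zero regardless of what `my_instance` has done.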

These examples should give us everything that we need to know to define, instantiate, and use basic (non-polymorphic) objects. In my next article, I will finish up the basics by implementing polymorphism in javascript.

* - I don't care about your browser's DOM tree and its cute little HTML renderer - we are talking about core ecmascript here

** - Keep in mind that I said a "powerful" string library; the cstdlib hardly counts.

*** - You are creating a Function object with the 'function' keyword and assigning it to a property of the object that 'this' refers to. The assignment is a statement, and the semicolon goes after the statement, even if the value you are assigning is a function. Actually, you don't strictly _need_ a semicolon here: a conforming Javascript interpreter will insert one for you (automatic semicolon insertion), because the next token is the closing brace. Of course, that notion presupposes that there exists some conforming Javascript interpreter. In general, if a semicolon can go there, it should go there: relying on the interpreter to guess where your statements end will eventually bite you.
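Here is the classic case where leaving out that semicolon changes the meaning of the code. Automatic semicolon insertion only kicks in when the next token cannot continue the statement, and an opening parenthesis can: it turns the function expression into an immediate call. The variable names are illustrative.

```javascript
// No semicolon after the function expression: the next line's parenthesis
// is parsed as a call, so the function runs and its return value is stored.
var withoutSemicolon = function () { return "called"; }
(function () { return "never used"; });

// With the semicolon, the second line is a separate statement that
// creates a function and throws it away.
var withSemicolon = function () { return "called"; };
(function () { return "separate statement"; });
```

`withoutSemicolon` ends up holding the string "called", while `withSemicolon` holds the function itself.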

Sunday, January 07, 2007

Making Javascript Useful: Part 0, Taking off the Training Wheels

I am growing fond of Javascript.

Javascript is a fascinating language. In this respect, most serious* languages are fairly ordinary: in the first five minutes of reading a language tutorial, you generally have a grasp of that language's big "thing".** For instance, if you are coding in Ruby, you know that Everything-is-an-Object, so you put on your Noun hat and get to work. In Python, Spacing-is-Blocks-so-you-Better-Make-Small-Modules, so you open your file manager alongside your editor. In PHP, PHP-Interoperates-With-Apache, so you open your ssh terminal and ftp client. In general, this is a "Good Thing." The unifying feature of the language helps the nascent acolyte learn the language by giving her mind something to put a handle on - a solid, defensible concept to inform and direct the learning process.

I don't want to give the impression that a language's big selling point is the language. I am not saying that these languages are that "thing". That said, it is important to note that they almost always start that way.

With most languages, the initial concept grows on the programmer until that framework of thinking enables her to do wonderful things within the language. This almost always happens (to me at least) when I start to get a real "feel" for how the language works and know without having to look at a reference manual how it is going to handle something I haven't seen before. If you have ever known that gloating sensation of supreme competence when something new works the first time, you know what I am talking about. When you get to this stage, your abstractions become clear and concise as a matter of habit and the code almost writes itself.

Although this competence comes faster with every language I learn, it generally comes slowly over the course of several months. With Javascript, the competence came overnight, after a full nine months of learning. I'm not bragging about this: it is actually rather sad that I didn't "get it" sooner. The article that really made it click was this one. This is the Mozilla Foundation's Core Javascript reference on Functions. If you work with javascript at all regularly, you owe it to yourself to read and understand that article.

So, have you all gone and read the article? No? Well go do so, it is quite fascinating. You disagree? I suppose you can disagree, but you might miss an important point.

"That was certainly interesting," I can hear you saying***, "but what is the point?" The point is: this is somewhat different from what you will hear about javascript in any tutorial and almost every book about the language. Most presentations of javascript that I have seen point out that javascript looks pretty much like C and leave it at that. So javascript becomes the Javascript-Is-Very-Much-Like-Java-:-That-Hip-And-Popular-Language-You-Actually-Have-Heard-About-And-We-Kinda-Look-Like-That-Too-If-You-Squint-And-Don't-Do-Anything-Too-Complicated language. So, coming into javascript programming, you really have No Idea what to expect, except that it's something like C and Java, which, if you've actually used either of those languages, is very obviously, poignantly untrue. Thus, you define your one-deep functions with the "function" keyword, use 'if' statements and 'for' loops as if they were the garden-variety C versions of those constructs, and hope for the best. Perhaps you will brush up against something more esoteric in the cataclysmic depths of the ecmascript standard; you may notice the odd property that 'everything is a hash table.' Odd as those things may be, they don't really keep us from thinking about javascript as a quirky, clunky C'ish variant.

Well, what changes when you read the article? Probably nothing, if you just skimmed it. If you didn't notice it right off, try replacing the word "Function" in that article with the word "Lambda." Notice how the javascript Function object allows you to do Lambda-calculus-like things with your code: functions can be created on the fly, passed as arguments, and returned as values. As it turns out, in many ways, javascript has more in common with Lisp than with C.
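A minimal sketch of those Lisp-flavored moves in plain javascript: a function that takes functions and returns a new one (composition), and a function that returns a function waiting for its next argument (currying). The names `compose`, `add`, and `timesTwo` are illustrative, not from any library.

```javascript
// Higher-order function: takes two functions, returns their composition.
function compose(f, g) {
    return function (x) {
        return f(g(x));
    };
}

// Currying: add(a) returns a new function that remembers 'a' in a closure.
function add(a) {
    return function (b) {
        return a + b;
    };
}

var addOne = add(1);
var timesTwo = function (x) { return x * 2; };
var addOneThenDouble = compose(timesTwo, addOne);
```

Here `addOneThenDouble(3)` computes `timesTwo(addOne(3))`, which is 8, and `add(2)(5)` is 7. None of the functions involved were written with a specific call site in mind; they are simply values being combined.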

Consider the naive recursive Fibonacci number generator in "vanilla" javascript - it looks almost like the C equivalent:


function fib1( n )
{
    if( n < 2 ) {
        return n;
    }
    return fib1( n - 1 ) + fib1( n - 2 );
}


Now we can write this as a Function object:


var fib2_code = "if( n < 2 ) {return n;} return fib2( n - 1 ) + fib2( n - 2 );";
var fib2 = Function( "n", fib2_code );


Now it looks like poorly written C code. Of course, this doesn't really get us anything new or different; however, since this code is a string, we can modify it like a string, similar to the way we can edit Lisp s-expressions:


var myfunc = Function( "n", "if( n < 4 ) { return n - 2; } return n * 2;" );
var fib3_code = fib2_code.replace( /fib2/g, "myfunc" );
var fib3 = Function( "n", fib3_code );

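Here are the same fib2/fib3 snippets as a self-contained script. One assumption is worth stating: a body built with the Function constructor looks up free names such as fib2 and myfunc in the global scope. In a browser, top-level 'var' declarations are global, so the snippets work as written; in a Node.js module they are module-local, so this sketch registers the functions on globalThis explicitly.

```javascript
// Function-constructor bodies resolve free names globally, so register
// fib2 and myfunc as global properties.
var fib2_code = "if( n < 2 ) {return n;} return fib2( n - 1 ) + fib2( n - 2 );";
globalThis.fib2 = Function("n", fib2_code);

globalThis.myfunc = Function("n", "if( n < 4 ) { return n - 2; } return n * 2;");

// Treat the source as data: swap every reference to fib2 for myfunc.
var fib3_code = fib2_code.replace(/fib2/g, "myfunc");
var fib3 = Function("n", fib3_code);

// fib2 is still the ordinary Fibonacci function...
var fib2_of_10 = fib2(10);  // 55
// ...while fib3(5) computes myfunc(4) + myfunc(3), i.e. 8 + 1.
var fib3_of_5 = fib3(5);    // 9
```
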

Naturally, this is a rather silly example, but if you use your imagination, you can think of some rather clever (perhaps even devious) constructs that you can build with this technique. At the very least, it expands the "typical" javascript toolbox to include both Lisp'ish and C'ish constructs. With this realization, it should be relatively clear why I think javascript is so extraordinary: Javascript combines the mathematical generality and flexibility of Lisp with the high-level ease-of-use of C.
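One ramification of building functions from strings is worth seeing directly before leaning on the technique: a function made with the Function constructor cannot see the local variables around the place where it was created, while an ordinary function expression can. The names `compareScopes` and `helper` below are illustrative.

```javascript
function compareScopes() {
    var helper = function () { return 1; };

    // A function expression closes over 'helper' in this scope.
    var viaClosure = function () { return typeof helper; };

    // A Function-constructor body only sees global names, so 'helper' is not found
    // (typeof on an unresolvable name yields "undefined" instead of throwing).
    var viaConstructor = Function("return typeof helper;");

    return [viaClosure(), viaConstructor()];
}

var scopes = compareScopes();
```

Assuming no global named `helper` exists, `scopes` is `["function", "undefined"]`. In practice this means string-built functions can only talk to the outside world through globals or their parameters, which matters a great deal once you start thinking about how to factor and secure this kind of code.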

Of course, the fun hardly ends here. I'll write an article soon that deals with some of the ramifications of these internals. As it turns out, this will have a dramatic impact on how we implement, abstract, factor, and secure our client-side code.


* - I love BrainF*ck too, but until I see a web-server written in it, it fails my litmus test for being a "serious" language.

** - I am not insulting your favorite language! I love your favorite language too! However, to an outsider, a language's big selling point _is_ the language. Bear with me.

*** - Yes, that was (almost certainly) ventriloquism.

Friday, January 05, 2007

Waiting around is hard work. Let me explain: we take data backup seriously. A large part of that seriousness comes down to having the discipline to build and apply good tests and good testing procedures to our core backup code. The test suites we run against Perseus (our file mirroring agent) are split into four main components: feature tests, unit tests, upgrade tests, and the integration test. Before a release of Perseus gets anywhere near the Zettabits patch network, it has to pass all of these tests. Once we are done vetting a release against our test suite, we push the changes out to our 'testing' network, on which we run our internal dev machines. After we poke and prod it in a production-like environment to our satisfaction, we push it out to our Beta network. The zBox that hosts our giant code repository runs on the Beta patch network, so by the time code reaches this stage, we're staking our own data on its stability and correctness. Before we push a patch live, we always do a full restore to a fresh zBox from our own backups. Although this process generally produces exemplary code, it can take a frustratingly long time to get changes into the field.

I think the best way to put the test suite in perspective is with a simple line count: the test suite is 3.5 times larger than the core code.

The feature test suite is the first test set we instrumented against Perseus, before we had written a single line of code. Each feature test looks for a single specific feature (e.g. unicode filename support for directories) and does a complete backup-and-restore cycle, checking the results at each stage for correctness. As we add features to Perseus, our feature tests give us quick feedback about our progress implementing each one.

The unit test suite, on the other hand, picks at individual bits of code. Generally, this involves replacing much of the rest of the system with dummy modules. These dummy modules then feed misleading data through their interfaces to the module under test, in the hope of provoking a wrong result.
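The dummy-module idea can be sketched in a few lines. Everything here is hypothetical, since Perseus's real modules and interfaces are not described in this post: a restore function receives its storage layer as a parameter, so a test can hand it a fake store that returns corrupted data and check that the module notices instead of accepting the wrong result.

```javascript
// Hypothetical module under test: restores a file and verifies its checksum.
// The storage layer is passed in so tests can substitute a dummy.
function restoreFile(store, name) {
    var record = store.read(name);
    if (simpleChecksum(record.data) !== record.checksum) {
        throw new Error("checksum mismatch for " + name);
    }
    return record.data;
}

// Toy checksum for illustration only; not a real integrity check.
function simpleChecksum(text) {
    var sum = 0;
    for (var i = 0; i < text.length; i++) {
        sum = (sum * 31 + text.charCodeAt(i)) % 1000000007;
    }
    return sum;
}

// Dummy store that "lies": it reports the checksum of the original data
// but hands back corrupted contents.
var lyingStore = {
    read: function (name) {
        return { data: "corrupted", checksum: simpleChecksum("original") };
    }
};

// Dummy store that tells the truth, for the control case.
var honestStore = {
    read: function (name) {
        return { data: "original", checksum: simpleChecksum("original") };
    }
};

function detectsCorruption() {
    try {
        restoreFile(lyingStore, "notes.txt");
        return false;  // the module accepted bad data
    } catch (e) {
        return true;   // the module caught the lie
    }
}
```
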

The integration test suite is our "big-bang" test. This test is multi-tiered. As the test runs, we add files, remove files, rename files, and change and update files, backing up and restoring to verify the content several times over the course of the test. This test attempts to catch every use case that we can imagine and rolls it into a single big cruncher that we can run and get a yes/no answer out of.

The upgrade tests are smaller versions of the integration test. They work similarly to the integration test; however, they change the version of Perseus in between test phases. This ensures that when we push a new version of Perseus, no matter what version happens to be running on a client's box in the field, it will cleanly transition to the new code. We run the upgrade test from every version of Perseus that has ever been in the field to the current version.
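The shape of that upgrade matrix can be sketched as a loop over every version that has shipped. The version list and the install, phase, upgrade, and verify steps below are hypothetical in-memory stand-ins; a real harness would drive actual installs and backups on a test box.

```javascript
// Hypothetical list of every version that has been in the field.
var fieldVersions = ["1.0", "1.1", "1.2", "2.0"];
var currentVersion = "2.1";

// Stand-in for a test box: tracks the installed version and its files.
function makeTestBox(version) {
    return { version: version, files: {} };
}

// Stand-in for one test phase: write a file and record what must survive.
function runPhase(box, label, expected) {
    var name = label + "-" + box.version + ".txt";
    box.files[name] = "contents of " + name;
    expected[name] = box.files[name];
}

// Stand-in for an upgrade; a real one would swap binaries and migrate state.
function upgrade(box, version) {
    box.version = version;
}

// Check that every expected file is still present with the right contents.
function verify(box, expected) {
    return Object.keys(expected).every(function (name) {
        return box.files[name] === expected[name];
    });
}

// Run the upgrade test once per fielded version; return the versions that failed.
function runUpgradeMatrix() {
    var failures = [];
    fieldVersions.forEach(function (oldVersion) {
        var box = makeTestBox(oldVersion);
        var expected = {};
        runPhase(box, "before", expected);
        upgrade(box, currentVersion);
        runPhase(box, "after", expected);
        if (!verify(box, expected)) {
            failures.push(oldVersion);
        }
    });
    return failures;
}
```

The useful property of this structure is that adding a release to the field automatically adds a row to the matrix, so no upgrade path goes untested.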

Between these tests, we have a pretty good idea of how well we are doing when working on Perseus. On my desktop and on the pro edition zBox, these will all run in about an hour; on our standard edition zBoxes, this takes more like two or three hours. The longest test is the restore we run against our own massive archives. Even on our business-class cable connection, this takes several days. Of course, if any of the tests fails, we have to start over at the beginning.

Waiting for tests to finish can be trying when we have so much work invested in the code - I want to know if it works now. On the other hand, the assurance of having such a rigorous test suite makes the wait well worth it.