While Ben and I were talking about JavaScript performance (and other things) at Web 2.0 Expo NYC, Maciej Stachowiak announced SquirrelFish Extreme, the very new and improved version that appears to do very well at SunSpider:
SquirrelFish Extreme uses four different technologies to deliver much better performance than the original SquirrelFish: bytecode optimizations, polymorphic inline caching, a lightweight “context threaded” JIT compiler, and a new regular expression engine that uses our JIT infrastructure.
1. Bytecode Optimizations
When we first announced SquirrelFish, we mentioned that we thought the basic design had lots of room for improvement from optimizations at the bytecode level. Thanks to hard work by Oliver Hunt, Geoff Garen, Cameron Zwarich, myself and others, we implemented lots of effective optimizations at the bytecode level.
One of the things we did was to optimize within opcodes. Many JavaScript operations are highly polymorphic: they have different behavior in lots of different cases. Just by checking for the most common and fastest cases first, you can speed up JavaScript programs quite a bit.
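To make the "check the common, fast case first" idea concrete, here is a minimal sketch in Python. It is not JavaScriptCore code: the names `op_add`, `slow_add` and `js_str` are hypothetical, and the slow path only loosely imitates JavaScript's `+` rules.

```python
def js_str(value):
    # Hypothetical helper: render a value roughly the way JavaScript would.
    if isinstance(value, str):
        return value
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value)


def slow_add(a, b):
    # Generic path: string concatenation or numeric addition with coercion.
    if isinstance(a, str) or isinstance(b, str):
        return js_str(a) + js_str(b)
    return float(a) + float(b)


def op_add(a, b):
    # Fast path: both operands are plain integers, assumed here to be
    # the most common case, so it is tested before anything else.
    if type(a) is int and type(b) is int:
        return a + b
    # Everything else falls through to the slower, fully general path.
    return slow_add(a, b)
```

The point of the design is that the cheap type test runs once and the general machinery is only paid for when it is actually needed.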
In addition, we’ve improved the bytecode instruction set, and built optimizations that take advantage of these improvements. We’ve added combo instructions, peephole optimizations, faster handling of constants and some specialized opcodes for common cases of general operations.
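As an illustration of what a combo instruction produced by a peephole pass might look like, the sketch below fuses a compare followed by a conditional jump into one instruction. The opcode names (`less`, `jump_if_true`, `jump_if_less`) and the tuple encoding are invented for this example and are not the SquirrelFish instruction set. It also assumes the temporary register written by the compare is not read anywhere else.

```python
def fuse_compare_and_branch(code):
    """Peephole pass over a list of instruction tuples.

    ("less", tmp, lhs, rhs) directly followed by
    ("jump_if_true", tmp, label) becomes
    ("jump_if_less", lhs, rhs, label).
    """
    out = []
    i = 0
    while i < len(code):
        ins = code[i]
        nxt = code[i + 1] if i + 1 < len(code) else None
        if (ins[0] == "less" and nxt is not None
                and nxt[0] == "jump_if_true" and nxt[1] == ins[1]):
            _, _tmp, lhs, rhs = ins
            out.append(("jump_if_less", lhs, rhs, nxt[2]))
            i += 2  # both original instructions are consumed
        else:
            out.append(ins)
            i += 1
    return out


program = [
    ("less", "t0", "i", "n"),
    ("jump_if_true", "t0", "loop"),
    ("ret", "i"),
]
optimized = fuse_compare_and_branch(program)
```

Jump targets are symbolic labels here so that removing an instruction does not invalidate them; a pass over numeric offsets would also have to patch the targets.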
2. Polymorphic Inline Cache
One of our most exciting new optimizations in SquirrelFish Extreme is a polymorphic inline cache. This is an old technique originally developed for the Self language, which other JavaScript engines have used to good effect.
Here is the basic idea: JavaScript is an incredibly dynamic language by design. But in most programs, many objects are actually used in a way that resembles more structured object-oriented classes. For example, many JavaScript libraries are designed to use objects with “x” and “y” properties, and only those properties, to represent points. We can use this knowledge to optimize the case where many objects have the same underlying structure. As people in the dynamic language community say, “you can cheat as long as you don’t get caught”.
So how exactly do we cheat? We detect when objects actually have the same underlying structure (the same properties in the same order) and associate them with a structure identifier, or StructureID. Whenever a property access is performed, we do the usual hash lookup (using our highly optimized hashtables) the first time, and record the StructureID and the offset where the property was found. Subsequent times, we check for a match on the StructureID, since usually the same piece of code will be working on objects of the same structure. If we get a hit, we can use the cached offset to perform the lookup in only a few machine instructions, which is much faster than hashing.
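The mechanism described above can be sketched in a few dozen lines of Python. This is only an illustration under simplifying assumptions, not the WebKit implementation: the class names `Structure`, `Obj` and `GetSite` are hypothetical, the cache holds a single entry per access site, and a Python object identity check stands in for comparing StructureIDs.

```python
class Structure:
    """Shared layout: which property lives in which slot."""

    def __init__(self, offsets=None):
        self.offsets = offsets or {}   # property name -> slot index
        self.transitions = {}          # property name -> next Structure

    def with_property(self, name):
        # Objects that add the same properties in the same order follow
        # the same transitions and so end up sharing one Structure.
        if name not in self.transitions:
            offsets = dict(self.offsets)
            offsets[name] = len(offsets)
            self.transitions[name] = Structure(offsets)
        return self.transitions[name]


EMPTY = Structure()


class Obj:
    def __init__(self):
        self.structure = EMPTY
        self.slots = []

    def put(self, name, value):
        offset = self.structure.offsets.get(name)
        if offset is None:
            self.structure = self.structure.with_property(name)
            self.slots.append(value)
        else:
            self.slots[offset] = value


class GetSite:
    """One property access in the program, with a one-entry cache."""

    def __init__(self, name):
        self.name = name
        self.cached_structure = None
        self.cached_offset = None
        self.hits = 0
        self.misses = 0

    def get(self, obj):
        if obj.structure is self.cached_structure:
            self.hits += 1             # fast path: no hashing at all
            return obj.slots[self.cached_offset]
        self.misses += 1
        # Slow path: the ordinary hash lookup (raises KeyError for a
        # missing property in this sketch), then remember the result.
        offset = obj.structure.offsets[self.name]
        self.cached_structure = obj.structure
        self.cached_offset = offset
        return obj.slots[offset]


def make_point(x, y):
    p = Obj()
    p.put("x", x)
    p.put("y", y)
    return p


site = GetSite("x")
points = [make_point(i, i * 2) for i in range(3)]
values = [site.get(p) for p in points]
```

After this runs, the access site has missed once (on the first point) and hit on every later point, because all three points share the same structure.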
Here is the classic Self paper that describes the original technique. You can look at Geoff’s implementation of the StructureID class in Subversion to see more details of how we did it.
We’ve only taken the first steps on polymorphic inline caching. We have lots of ideas on how to improve the technique to get even more speed. But already, you’ll see a huge difference on performance tests where the bottleneck is object property access.
3. Context Threaded JIT
Another major change we’ve made with SFX is to introduce native code generation. Our starting point is a technique called a “context threaded interpreter”, which is a bit of a misnomer, because this is actually a simple but effective form of JIT compiler. In the original SquirrelFish announcement, we described our use of direct threading, which is about the fastest form of bytecode interpretation short of generating native code. Context threading takes the next step and introduces some native code generation.
The basic idea of context threading is to convert bytecode to native code, one opcode at a time. Complex opcodes are converted to function calls into the language runtime. Simple opcodes, or in some cases the common fast paths of otherwise complex opcodes, are inlined directly into the native code stream. This has two major advantages. First, the control flow between opcodes is directly exposed to the CPU as straight-line code, so much dispatch overhead is removed. Second, many branches that were formerly between opcodes are now inline, and made highly predictable to the CPU’s branch predictor.
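The shape of this translation can be shown with a toy example. The sketch below is an analogy rather than a JIT: it emits Python source instead of machine code, and the opcodes (`load_const`, `add_int`, `to_string`, `ret`) and the helper `runtime_to_string` are made up for illustration. What it does preserve is the structure described above: one opcode at a time, simple opcodes inlined, complex opcodes turned into calls into the runtime, and no dispatch loop left in the output.

```python
def runtime_to_string(value):
    # Stand-in for a complex operation that stays in the language runtime.
    return str(value)


def compile_bytecode(code):
    """Translate a list of instruction tuples into one Python function.

    The generated function takes a register list `r` and runs the
    program as straight-line code.
    """
    lines = ["def compiled(r):"]
    for ins in code:
        op = ins[0]
        if op == "load_const":      # simple opcode: inlined
            _, dst, value = ins
            lines.append(f"    r[{dst}] = {value!r}")
        elif op == "add_int":       # simple opcode: inlined
            _, dst, a, b = ins
            lines.append(f"    r[{dst}] = r[{a}] + r[{b}]")
        elif op == "to_string":     # complex opcode: call into the runtime
            _, dst, a = ins
            lines.append(f"    r[{dst}] = runtime_to_string(r[{a}])")
        elif op == "ret":
            _, a = ins
            lines.append(f"    return r[{a}]")
        else:
            raise ValueError(f"unknown opcode: {op}")
    lines.append("    return None")  # keeps the body valid without a ret
    namespace = {"runtime_to_string": runtime_to_string}
    exec("\n".join(lines), namespace)
    return namespace["compiled"]


program = [
    ("load_const", 0, 40),
    ("load_const", 1, 2),
    ("add_int", 2, 0, 1),
    ("to_string", 3, 2),
    ("ret", 3),
]
compiled = compile_bytecode(program)
result = compiled([None] * 4)
```

A real context-threaded JIT gets its speedup from the CPU seeing straight-line native code; this sketch only demonstrates the translation scheme, not the performance effect.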
Here is a paper describing the basic idea of context threading. Our initial prototype of context threading was created by Gavin Barraclough. Several of us helped him polish it and tune the performance over the past few weeks.
One of the great things about our lightweight JIT is that there are only about 4,000 lines of code involved in native code generation. All the other code remains cross-platform. It’s also surprisingly hackable. If you thought compiling to native code is rocket science, think again. Besides Gavin, most of us have little prior experience with native codegen, but we were able to jump right in.
Currently the code is limited to x86 32-bit, but we plan to refactor and add support for more CPU architectures. CPUs that are not yet supported by the JIT can still use the interpreter. We also think we can get a lot more speedups out of the JIT through techniques such as type specialization, better register allocation and liveness analysis. The SquirrelFish bytecode is a good representation for making many of these kinds of transforms.
4. Regular Expression JIT
As we built the basic JIT infrastructure for the main JavaScript language, we found that we could easily apply it to regular expressions as well, and get up to a 5x speedup on regular expression matching. So we went ahead and did that. Not all code spends a bunch of time in regexps, but with the speed of our new regular expression engine, WREC (the WebKit Regular Expression Compiler), you can write the kind of text processing code you’d want to do in Perl or Python or Ruby, and do it in JavaScript instead. In fact we believe that in many cases our regular expression engine will beat the highly tuned regexp processing in those other languages.
Since the SunSpider JavaScript benchmark has a fair amount of regexp content, some may feel that developing a regexp JIT is an “unfair” advantage. A year ago, regexp processing was a fairly small part of the test, but JS engines have improved in other areas a lot more than on regexps. For example, most of the individual tests on SunSpider have gotten 5-10x faster in JavaScriptCore, and in some cases over 70x faster than the Safari 3.0 version of WebKit. But until recently, regexp performance hadn’t improved much at all.
We thought that making regular expressions fast was a better thing to do than changing the benchmark. A lot of real tasks on the web involve a lot of regexp processing. After all, fundamental tasks on the web, like JSON validation and parsing, depend on regular expressions. And emerging technologies, like John Resig’s processing.js library, extend that dependency ever further.
Major kudos to the whole SFX team for pulling this off. Now, to grab a new nightly…