technology from back to front

Archive for the ‘Programming’ Category

Optimizing Salsa20 in BouncyCastle

A while ago I started dabbling in cryptography a bit, and inevitably, I ended up toying with performance of the related algorithms and code. In this post I’d like to share my approach to optimizing BouncyCastle’s implementation of Salsa20.

A few observations regarding Salsa20 (and presumably other modern ciphers) in the context of performance:

  • Salsa20 uses 32-bit integers; 32-bit CPUs are dirt cheap nowadays, so there is no reason to design an algorithm around smaller words.
  • It produces a pseudo-random stream in a block-based fashion, 16 ints per block. A good opportunity to leverage multiple cores and multiple execution units within a core.
  • It only uses basic, fast operations on ints: add, xor and rotate.
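For reference, here is the cipher's quarter-round written out in Java (the class and method names are mine, not BouncyCastle's):

```java
// A sketch of Salsa20's quarter-round (names are mine, not BouncyCastle's):
// every step is just an add, a fixed-distance rotate and a xor on 32-bit ints.
public class Salsa20QuarterRound {
    static int[] quarterRound(int a, int b, int c, int d) {
        b ^= Integer.rotateLeft(a + d, 7);
        c ^= Integer.rotateLeft(b + a, 9);
        d ^= Integer.rotateLeft(c + b, 13);
        a ^= Integer.rotateLeft(d + c, 18);
        return new int[] { a, b, c, d };
    }
}
```

The full core applies this quarter-round down the columns and along the rows of a 4x4 matrix of ints, ten double-rounds in all.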

As far as I could measure, BouncyCastle’s Salsa20 implementation works out at around 20-22 CPU cycles per byte with the latest JVMs. The fastest C implementations, according to DJB’s website, manage around 4-5 CPU cycles per byte. A noticeable gap; let’s see why this is and what can be done about it.

Salsa20Engine.java from BouncyCastle sources can be found here. There are two hotspots in it.

First, the salsaCore method, the actual implementation of the Salsa20 cipher. It produces a chunk of pseudo-random ints and stores it in an internal buffer of bytes. One of the recent commits is an optimisation of this part of the code. Buffering ints in local variables, as seen in the commit, can reduce the number of memory accesses compared to manipulating array elements, as well as the number of bounds checks the JVM has to perform.
Unfortunately, because the algorithm requires 16 such variables, the JIT is likely to produce extra stores and loads to handle them all. Furthermore, underneath this is still good old serial code with no utilization of SIMD execution units. The JITs in Java 7 and Java 8 can use SSE/AVX but are quite finicky about when and how they do it, and the code in Salsa20Engine.java doesn’t compel the JIT to make use of SIMD instructions. As the comment in the commit says, this optimization yields about 15%, and that is about as far as we can go this way.

This part of the code nonetheless has the potential to yield more with SIMD, but it has to be approached from a different angle. The topic of SIMD use in the JVM doesn’t seem to be well covered, so I had to resort to experimentation and analysis of the JIT’s disassembly. To explain it all in proper detail would take far too much space for a single post, so I share the full optimized source code and hope it speaks for itself. One last, somewhat generic, note: restructuring the execution flow to use SIMD entails extra data rearrangements, which in turn take up extra CPU time and reduce the gains. This is often the case when we try to optimise a part of the picture without changing too much, or when we just cannot carry out deeper structural changes.

The second hotspot, in the processBytes method, which implements an API entry point, xors an input stream of bytes with the sequence of bytes produced by salsaCore. The problem is that the Salsa20 algorithm operates on 32-bit ints whereas the API takes and outputs streams of bytes. As I mentioned before, Salsa20Engine.java converts the ints produced by the algorithm into a buffer of bytes, which in turn is used to xor an input buffer of bytes. The xoring itself is done byte by byte, and the JIT indeed produces code that processes it all in 8-bit chunks (including the costly loads and stores). A better approach is to keep the ints produced by the algorithm as ints and consume them as input bytes arrive, xor’ing each input byte with the respective byte of a precalculated int.
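A sketch of that idea in Java (the class, method name and signature are mine, not from the actual patch; BouncyCastle's real API also deals in offsets into larger buffers, dropped here for clarity):

```java
// Sketch of xor'ing input bytes directly against keystream ints (names are
// mine, not the actual patch). Each input byte is xor'ed with the matching
// byte of a 32-bit keystream word, selected by shifting - so no intermediate
// byte[] keystream buffer is needed.
public class KeystreamXor {
    static void xorWithKeystream(byte[] in, byte[] out, int len, int[] keyStream) {
        for (int i = 0; i < len; i++) {
            int word = keyStream[i >>> 2];  // which int holds byte i
            int shift = (i & 3) << 3;       // little-endian byte order, as Salsa20 defines
            out[i] = (byte) (in[i] ^ (word >>> shift));
        }
    }
}
```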

To test it all, FasterSalsa20Engine.java needs to be dropped alongside Salsa20Engine.java (into org.bouncycastle.crypto.engines package path), and SalsaTest.java needs to be compiled and run against this modified BouncyCastle.

On my laptop with a Sandy Bridge CPU and Java 1.8.0_11, example output for larger blocks shows the optimized version running at roughly 2 to 2.5 times the original speed:

        Salsa20   30.6kB :: min= 104mb/s  avg= 109mb/s  max= 111mb/s
  FasterSalsa20   30.6kB :: min= 221mb/s  avg= 227mb/s  max= 241mb/s
        Salsa20   86.5kB :: min=  99mb/s  avg=  99mb/s  max=  99mb/s
  FasterSalsa20   86.5kB :: min= 239mb/s  avg= 239mb/s  max= 239mb/s
        Salsa20   15.6kB :: min=  92mb/s  avg=  92mb/s  max=  92mb/s
  FasterSalsa20   15.6kB :: min= 231mb/s  avg= 231mb/s  max= 231mb/s
        Salsa20   72.4kB :: min=  93mb/s  avg= 100mb/s  max= 111mb/s
  FasterSalsa20   72.4kB :: min= 200mb/s  avg= 207mb/s  max= 221mb/s
        Salsa20    3.8kB :: min=  96mb/s  avg=  97mb/s  max=  98mb/s
  FasterSalsa20    3.8kB :: min= 140mb/s  avg= 193mb/s  max= 207mb/s
by jarek on 22/08/14

Automating pre-deployment sanity checks with Grunt

Grunt is a great tool for building, running and deploying ‘Single Page Apps’. I have a single grunt command to build and deploy to S3 for production, but recently I added some extra functionality to make deployment safer and even easier:

  • Abort if you are not on master branch
  • Abort if there are any uncommitted local changes
  • Abort if not up to date with the origin repo
  • Create a file revision.txt containing the deployed git revision hash, so we can GET it from the server and be sure of which revision is live
  • Automatically create a tag with the date and time.

I found a few existing pieces to implement some of these, but not all of them, and I ended up with a set of custom Grunt tasks, which I present here in the hope that they are useful to others. They could perhaps be packaged up into a Grunt plugin.

Without further ado, here is the stripped-down Gruntfile, showing just the parts relevant to this post, though the deploy-prod task definition leaves in the other task names for context in the overall flow.

module.exports = function(grunt) {

  // Load all grunt tasks matching the `grunt-*` pattern
  require('load-grunt-tasks')(grunt);

  grunt.initConfig({
    // Lots of other Grunty things
    // ...

    // Executing the 'gitinfo' command populates grunt.config.gitinfo with useful git information
    // (see https://github.com/damkraw/grunt-gitinfo for details) plus results of our custom git commands.
    gitinfo: {
      commands: {
        'status': ['status', '--porcelain'],
        'origin-SHA': ['rev-parse', '--verify', 'origin']
      }
    },

    gittag: {
      prod: {
        options: {
          tag: 'prod-<%= grunt.template.today("ddmmyy-HHMM") %>'
        }
      }
    },

    shell: {
      gitfetch: {
        command: 'git fetch'
      },
      saverevision: {
        // Save the current git revision to a file that we can GET from the server, so we can
        // be sure exactly which version is live.
        command: 'echo <%= gitinfo.local.branch.current.SHA %> > revision.txt',
        options: {
          execOptions: {
            cwd: 'dist'
          }
        }
      }
    },
  });

  grunt.registerTask('check-branch', 'Check we are on required git branch', function(requiredBranch) {
    grunt.task.requires('gitinfo');

    if (arguments.length === 0) {
      requiredBranch = 'master';
    }

    var currentBranch = grunt.config('gitinfo.local.branch.current.name');

    if (currentBranch !== requiredBranch) {
      grunt.log.error('Current branch is ' + currentBranch + ' - need to be on ' + requiredBranch);
      return false;
    }
  });

  grunt.registerTask('check-no-local-changes', 'Check there are no uncommitted changes', function() {
    grunt.task.requires('gitinfo');

    var status = grunt.config('gitinfo.status');

    if (status != '') {
      grunt.log.error('There are uncommitted local modifications.');
      return false;
    }
  });

  grunt.registerTask('check-up-to-date', 'Check code is up to date with remote repo', function() {
    grunt.task.requires('gitinfo');
    grunt.task.requires('shell:gitfetch');

    var localSha = grunt.config('gitinfo.local.branch.current.SHA');
    var originSha = grunt.config('gitinfo.origin-SHA');

    if (localSha != originSha) {
      grunt.log.error('There are changes in the origin repo that you don\'t have.');
      return false;
    }
  });

  // Some of these tasks are of course omitted above, to keep the code sample focussed.
  grunt.registerTask('deploy-prod', ['build','prod-deploy-checks','gittag:prod','aws_s3:prod']);

  grunt.registerTask('prod-deploy-checks', ['gitinfo','check-branch:master','check-no-local-changes','shell:gitfetch','check-up-to-date']);
};

We rely on a few node modules:

  • grunt-git which provides canned tasks for performing a few common git activities. We use it for tagging here.
  • grunt-gitinfo which sets up a config hash with handy data from git, and allows adding custom items easily. This helps us to query the current state of things.
  • grunt-shell which lets us run arbitrary command line tasks. We use it to run git fetch (not supported by grunt-git, though we could probably have abused gitinfo to do it) and to save the current revision to a file. I hope that the command I use for that is cross-platform, even on Windows, but it’s only tested on Mac so far.

Hence I ended up with the following added to package.json:

    "grunt-git": "~0.2.14",
    "grunt-gitinfo": "~0.1.6",
    "grunt-shell": "~0.7.0"
by Sam Carr on 15/08/14

Optimising compilers as adversaries

Suppose that you want to handle some secret data in C and, in the wake of some high-profile vulnerability or other, want to take precautions against your secret being leaked. Perhaps you’d write something along these lines:

#include <string.h>

typedef struct {
  char password[16];
} secret_t;

void get_secret(secret_t* secret);
void use_secret(secret_t* secret);

void wipe_secret(secret_t* secret) {
  memset(secret, 0, sizeof(secret_t));
}

int main() {
  secret_t secret;
  get_secret(&secret);
  use_secret(&secret);
  wipe_secret(&secret);
  return 0;
}

I think you could be forgiven for assuming that this does what it says. However, if you have what John Regehr calls ‘a proper sense of paranoia’, you might actually check. Here’s an excerpt of what I got when I ran clang -S -O2 -emit-llvm on this example:

define i32 @main() #0 {
  %secret = alloca %struct.secret_t, align 1
  call void @get_secret(%struct.secret_t* %secret) #4
  call void @use_secret(%struct.secret_t* %secret) #4
  ret i32 0
}

As if by magic, wipe_secret has completely disappeared.
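The compiler is within its rights: secret goes out of scope immediately after the memset, so the stores are deemed unobservable and dead-store elimination removes them. One well-known countermeasure, sketched below (the helper name is mine), is to call memset through a volatile function pointer, which the optimiser cannot prove still points at memset and so must leave the call in; C11's optional memset_s offers a standardised guarantee along the same lines.

```c
#include <string.h>

/* Sketch of a common mitigation (helper name is mine): calling memset through
 * a volatile function pointer stops the compiler from proving which function
 * actually runs, so it cannot eliminate the "dead" stores that wipe the secret. */
static void *(*const volatile memset_v)(void *, int, size_t) = memset;

void secure_wipe(void *p, size_t n) {
    memset_v(p, 0, n);
}
```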

Read more…

by ash on 30/06/14

Super-simple JavaScript inheritance

JavaScript uses prototype-based inheritance, which can prove a bit of a puzzler for those of us used to class-based object orientation. At first glance it seems like it’s basically the same and as if it can be used in very nearly the same way. If you pretend that those prototype objects are in fact classes and ignore the nuances you can get surprisingly far and code up a decent-sized heap of working code. However you will eventually be bitten and have to read up on what’s really happening at the nuts and bolts level. This is guaranteed to happen just when you’re under pressure, trying to get a critical feature working. It’s a horrible feeling, the dawning realisation that the subtle bug you can’t grok is because things don’t work the way you thought at a very basic level and that your heap of code is founded on quicksand. This happened to … a friend of mine.

This post isn’t going to attempt to explain the depths and subtleties of JavaScript’s prototype model. Plenty of others have been there before. In fact we will embrace our class-based stubbornness and attempt to get it working the way we really wanted. Plenty of others have done this too, but there are a few issues with most of their solutions:

  • They are too simplistic and don’t cover all the cases required, like having an arbitrarily deep hierarchy that can call up the constructor chain neatly
  • They are too complicated, having developed into powerful libraries with many features
  • The perennial problem: I didn’t write them, so I’m neither in control nor able to understand exactly what’s going on and adapt them to exactly my needs – no more, no less.*

I present the result below, wrapped up for require.js. There is really very little code indeed – just two functions: inherit and superConstructor.

// Because class is a keyword in JS, consumers should inject this as clazz.
define(function() {

  return {
    // In the unlikely event that you need to explicitly call a superclass implementation
    // of a method, because a method with the same name exists in the current class:
    //  foo.parent.bar.call(this, x, y);
    inherit: function(child, parent) {
      child.prototype = Object.create(parent.prototype);
      child.prototype.constructor = child;
      child.prototype.parent = parent.prototype;
    },

    // The superclass constructor should generally be called from child's constructor
    // otherwise it won't run and fields defined there will be missing:
    //   superConstructor(this);
    superConstructor: function(self) {
      // The constructor that we call here may in turn wish to call superConstructor() to
      // call its own parent's constructor (but with the same 'self') so we must take
      // special measures to allow this, as self will be the same object with each recursion.
      var constructor = (self.nextParent) ? self.nextParent.constructor : self.parent.constructor;
      self.nextParent = self.parent.parent;
      constructor.call(self);
      self.nextParent = undefined;
    }
  }

});

The contents of inherit are much as you’ll find in many a blog post, though there’s a surprising amount of subtle variation out there!

More interesting is superConstructor, which employs a somewhat offensive tactic to allow calls all the way up the constructor chain. What makes this difficult is that ‘this’ must remain the actual object being constructed throughout those nested calls, so we need to manually provide the context that identifies the next constructor up the chain.

Having done this and saved the code above into clazz.js, we can write code with inheritance as follows (working example as a jsfiddle).

// A Dog can bark.
function Dog() {
    console.log('Constructed a dog');
}
Dog.prototype.bark = function() { return 'Woof' };

// A Yorkie is a Dog that barks a lot!
clazz.inherit(Yorkie, Dog);
function Yorkie() {
    var self = this;
    clazz.superConstructor(this);
}
Yorkie.prototype.bark = function() {
    var noise = this.parent.bark.call(this);
    return noise + noise + noise;
};

// Create dogs and see what noises they make.
console.log(new Dog().bark());
console.log(new Yorkie().bark());

To be fair, my super-simple inheritance library is extremely restricted in its abilities, for instance not handling constructor parameters. But that’s because I didn’t need them, and any extra features should be easy to add. Most of all it was a valuable learning experience.

* Actually I love an off-the-shelf library as much as the next chap (or chappess) – but if you don’t feel comfortable with the libraries on offer and the problem seems nicely tractable and a worthwhile learning experience then why not go for it. You can always change your mind.

by Sam Carr on 16/06/14

CSS Transitions can’t animate display change

I’d like to demonstrate a fairly simple CSS issue that caught me out, and the straightforward solution. Put simply, CSS Transitions do not work if there is a change in the display property as part of the same change that fires the transition, but you can work around this by separating out the display change.

If you’re not already aware, CSS Transitions are a cute way of animating transitions on your web page. Simply add a transition property in your CSS stating which property of the element should be animated when it changes, and over what period of time.

.animatedWidth {
    transition: width 2s;
}

In the example above, whenever the width of the element is changed (e.g. programmatically from JavaScript) it will animate that change over 2 seconds, complete with ease-in and ease-out by default.

I’ve created a jsfiddle with a more convoluted example that demonstrates the display problem, so you can inspect the HTML, CSS and JS, and run it in the browser. The example has three coloured bars (though the second two start off invisible) and an Animate button. Click the button and you’ll see that the ordinary transition animates the width of the bar as expected, but where the coloured bar is being made visible at the same time it just winks into existence in its end state with no animation. The third bar appears and then animates correctly, because our JS separately shows it then triggers the animation. It uses a timeout with zero delay to achieve this, effectively giving the rendering engine its chance to handle the display change before then triggering the animation.

button.on('click', function() {
    // To get the animation working we need to change the
    // display property first (via jQuery toggle()) and then
    // trigger the CSS transition with a zero-delay timeout.
    bar3.toggle();
    window.setTimeout(function() {
        bar3.toggleClass('animate');
    }, 0);
});

In my real world situation where I first stumbled across this effect, the item being animated started offscreen (and invisible) and slid into place, with the problem only evident on Chrome for some still unknown reason. The change of display property was but one of many things going on via incidental CSS so it took some sleuthing to figure out that it was responsible for the problem. Coming at it from that baffling angle for the first time, the problem and its solution were not nearly so obvious as presented above!

by Sam Carr on 27/05/14

A simple Knockout page router

Knockout.js is a pleasantly simple approach to data-binding ViewModels into your HTML. Like many JavaScript libraries it sticks to a core mission with a few simple concepts, which makes it quite approachable. Its simple template support means that you don’t need to write much code to get a top-level page router going in your single page app (SPA) and that’s exactly what I have done.

Knockout-routing

It uses hash-based routing, so URLs must be of the form http://foo.com/index.html#myPage. This approach means that even a statically hosted site with just the one real URL (index.html in this example) and zero server-side dynamism can be a SPA with multiple virtual pages. All requests ultimately come to index.html and then the router takes over and shows the right actual page based on the hash in the URL. Back and forward buttons work, as do page refresh, bookmarking, emailing links etc.
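The core of the idea can be sketched in a few lines (an illustration, not the library's actual code; the win parameter stands in for the global window so the sketch can run outside a browser):

```javascript
// A sketch of hash-based routing (not the library's actual code). `win` is
// injected in place of the global window so the sketch is easy to exercise.
function createRouter(win, routes, defaultPage) {
    function dispatch() {
        var page = win.location.hash.replace(/^#/, '') || defaultPage;
        var handler = routes[page] || routes[defaultPage];
        handler(page);
    }
    // Re-dispatch whenever the fragment changes: back/forward buttons,
    // bookmarks and typed URLs all funnel through here.
    win.addEventListener('hashchange', dispatch);
    dispatch(); // route the initial URL, so refresh and emailed links work
    return { dispatch: dispatch };
}
```

In the browser you would simply pass window itself as the first argument.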

The code is on GitHub, with a decent README explaining the features and the key files to look at, so I won’t repeat that here. The code is also well-commented, with the intention that you can (and should) read it to see how it works. You can clone it, then simply double click src/index.html to open it in your browser and see its capabilities demonstrated. Nice and easy.

The router itself is just a 61 line JavaScript file, which would be very easy to extend with further features that you might need. The rest of the code on GitHub shows how to use it by example, and demonstrates all of its features.

Any feedback is very much appreciated. I imagine there are other similar routers out there, but this one is mine and making it (and using it in anger) taught me a lot and provided a nice, tight result which I can easily add to as required.

by Sam Carr on 30/04/14

Getting back into front-end web development

I’ve been working on a small SPA (Single Page Application) – just HTML, CSS and JavaScript statically served and doing its thing entirely in the browser. I learned a great deal throughout the project, but here are some of the things that strike me as most valuable.

Get a good workflow going

I used Grunt to set up a nice build system that is mostly a joy during development. It took a while to evolve my Gruntfile, but now when I edit a file, the results are immediately refreshed in the browser (I don’t even have to hit cmd-R). I can deploy to S3 test, staging and live sites with a single command that takes about 3 seconds. My SASS files are compiled down to minified CSS, my JS is minified with source maps, etc.

The hardest part of using Grunt is figuring out how to configure it and its many contrib plugins. I could have done with a definitive reference or perhaps I could have used Yeoman to give me an out of the box solution. However I recognised that I was always going to have to figure out the guts of Grunt so I think I really was better off bespoking it from the start. I’m glad I did as now I have a tight setup that does precisely what I want and that I understand completely.

Now it seems there is a new kid on the scene, Gulp – nicely introduced in this tutorial blog post. I will definitely be looking closely at that for my next project, with the piping approach looking like the key step beyond Grunt, along with nicer syntax. I’d also look at Browserify, for a nicer way to piece together the JS bits.

Learn JavaScript properly

To the uninitiated, JavaScript is fairly surprising in many subtle ways, and though I can grok the prototype-based inheritance fairly easily, the scoping rules caught me out repeatedly. This was especially the case as I tried to create jQuery plugins with private methods and state. Eventually a simple old article by the grand-daddy of JavaScript writing, Douglas Crockford, gave me the vital clues I was missing.

Really I should just read his book, and I would recommend that anyone else doesn’t just attempt to learn JavaScript as they go, but takes some time to pro-actively figure out the core concepts – it will pay off in short order.
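The vital clue, in short, is that JavaScript has function scope plus closures, so private state comes from wrapping it in a function rather than from any class-style access modifier. A minimal sketch of the pattern (my own illustration, not Crockford's code):

```javascript
// A minimal sketch of closure-based private state (my illustration, not
// Crockford's code): `count` is captured by the returned functions and is
// completely invisible from outside.
function makeCounter() {
    var count = 0; // private: not a property, just a closed-over local
    return {
        increment: function() { count += 1; return count; },
        current: function() { return count; }
    };
}
```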

jQuery is non-negotiable

And the award for most indispensable library goes to: jQuery. Seriously, it should be baked into the browsers or the ECMAScript standard. The nicest thing about it is I can pretty much just guess at the API and be right most of the time, though reading the docs usually reveals new conveniences that go further than I even imagined.

Browser quirks can be a living nightmare

JavaScript itself is fairly reliable, especially with judicious use of mature libraries like jQuery that paper over the cross-browser DOM cracks. CSS in complicated scenarios is where it all seems to go wrong however.

It’s amazing how broken/different some browsers are. Here are just a few highlights, though every day brought tens of new oddities and associated workarounds.

  • Mobile Safari on iOS 7 reports the viewport height inconsistently (depending on how you access it) leading to bad layout and horrible JavaScript workarounds.
  • Use of -webkit-overflow-scrolling:touch causes the hardware accelerated renderer to kick in, resulting in various flickers, flashes and glitches with content not rendering.
  • IE 10 on Windows 8 shows back/forward overlays at the left/right of the screen when your mouse moves near them, obscuring links in those locations.
  • Chrome running on Retina Macs suffers from strange graphical glitches when running CSS Animations, but is fine with CSS Transitions. However other browsers/platforms really need CSS Animations to get smooth, hardware accelerated movement. In my case it was necessary to implement both approaches and select using browser detection.
by Sam Carr on 06/03/14

A study in Scala

Cards on the table: I like Scala. Right now it’s my go-to general purpose programming language, but I know that many people have a dim opinion of it, or appreciate its positives but are heavily conflicted by its negatives. Actually I do put myself in that latter camp, though I think I’ve got deep enough into learning the language that I can get past most of those negatives. Here I present a classic tale of “life with Scala”.

The problem

Scala generally has collection methods for most things you can think of, so I was pleased but not surprised to find inits and tails methods, that list all of the prefixes and suffixes respectively. These methods return an iterator, hence the use of toList below, to actually see the results. For the rest of this post, we’ll just consider inits, though tails is trivially similar.

scala> List(1,2,3).inits
res1: Iterator[List[Int]] = non-empty iterator

scala> List(1,2,3).inits.toList
res2: List[List[Int]] = List(List(1, 2, 3), List(1, 2), List(1), List())

scala> List(1,2,3).tails.toList
res3: List[List[Int]] = List(List(1, 2, 3), List(2, 3), List(3), List())

The results include the empty collection, but as it happens that doesn’t suit my needs. I want a version that doesn’t return the empty list.

What to do?

Options include:

  • Call the existing inits method then strip off the last item. However since we get an iterator this isn’t quite so trivial – we can’t just do inits.init
  • Write a new custom method to do just what I want and return a collection, not an iterator.

I decided, in the spirit of Scala, that I would add a nonEmptyInits method to ‘all collections’ (and I’m deliberately vague here) via an implicit conversion, so that it can be called just as easily as inits. It would call inits but wrap the iterator to exclude the final element. Easy peasy, but the rabbit hole loomed ahead of me.

Because I was over-engineering and trying to be purist (this was hobby code) I wanted to implement my new method for the most generic type reasonable. Figuring out what type that was challenged me rather. The Scala collections API has a fairly complicated hierarchy. This is somewhat diagrammed and explained in the documentation, but that doesn’t show all the interim traits and classes in the guts of the implementation, such as TraversableLike, GenTraversable, GenTraversableLike and 10+ others relating just to Traversable. The API is set up to support some very clever facilities, like generic collection methods typically returning collections of the same type you started with. Apparently most libraries can’t/don’t do this. However it made it hard for me to find the right place to start. Ultimately I settled on Traversable, since inits itself is defined in TraversableLike, but adding to that directly seemed like more trouble than it was worth.

implicit class TraversableExtras[T](t: Traversable[T]) {
  def nonEmptyInits: Iterator[Traversable[T]] = {
    val bufferedInits = t.inits.buffered
    new Iterator[Traversable[T]] {
      def hasNext = bufferedInits.head.nonEmpty
      def next() = bufferedInits.next()
    }
  }
}

My solution above wraps the iterator with a buffered version, so that I can peek at head and determine whether to end early or not. This is fairly nice and simple and I’m quite happy with it. It means we can do the following, via the magic of implicit conversions.

scala> List(1,2,3,4).nonEmptyInits.toList
res7: List[Traversable[Int]] = List(List(1, 2, 3, 4), List(1, 2, 3), List(1, 2), List(1))

But Arrays mess it all up

This is all looking very promising and straightforward, but it took a fair bit of banging my head against the desk to figure out why the compiler wouldn’t let me call it on an Array.

scala> Array(1,2,3).nonEmptyInits
<console>:9: error: value nonEmptyInits is not a member of Array[Int]

To cut a long story short, though we can seamlessly treat Array as a Traversable, Array is actually somewhat special. It maps to Java arrays for implementation reasons, but supporting generics, and with an implicit conversion to WrappedArray, which is a Seq (and hence a Traversable) to provide all the collection methods we’re used to. So far so good, but it turns out that the compiler cannot apply more than one implicit conversion at a time, as the search space would explode. Hence it cannot convert from Array to WrappedArray to my TraversableExtras. A solution in such a case, where we want two conversions, is to explicitly cast the Array, to make the first conversion explicit, then the compiler does the rest.

scala> (Array(1,2,3): Traversable[Int]).nonEmptyInits
res10: Iterator[Traversable[Int]] = non-empty iterator

Wrap-up

Finally I got where I wanted, even if that cast is a bit irritating, and I learned a lot of useful things in the process – all part of the journey of familiarity with any language. But it struck me as soon as I got a bit lost in the Scala collections hierarchy that I was on a mission with which I am all too familiar. It was at that point that I decided I would probably blog about the experience and started taking notes!

I still like Scala, but I’m very well aware that this is the price I pay, and I’m not surprised that others find it too high a price.

Just to be clear…

I realise by the way that this was all a fool’s errand in the first place. I deliberately went down the path described above as an explicit learning exercise, because I wanted to get more experience with the collections API, iterables, implicits etc. It is possible to achieve the result more directly with a filter:

scala> Array(1,2,3).inits.filter(_.nonEmpty).toList
res16: List[Array[Int]] = List(Array(1, 2, 3), Array(1, 2), Array(1))

It’s not quite as pithy as having a nonEmptyInits method, but in all other senses it’s probably superior.

by Sam Carr on 10/02/14

Grunt uglify file specs

I struggled a bit finding relevant examples of Gruntfile configuration for Uglify, so having solved a few specific problems myself, here’s what I came up with.

This is just a snippet from the whole Gruntfile of course, and contains half-decent comments already, though I’ll provide some extra explanations below to point out the most interesting bits.

// Variables used internally within this config.
conf: {
  app: 'app',
  dist: 'dist',
  // Just our own custom scripts.
  script_files: ['scripts/*.js'],
  // All scripts that should be minified into final result.
  // Ordering is important as it determines order in the minified output and hence load order at runtime.
  // We don't include jquery (though we could) as it's better to get it from Google where possible.
  minify_js_files: [
      'scripts/vendor/modernizr/modernizr.custom.js',
      '<%= conf.script_files %>',
      'scripts/polyfills/**/*.js']
},

uglify: {
  options: {
    banner: '/*! <%= pkg.name %> <%= grunt.template.today("yyyy-mm-dd") %> */\n',
    sourceMap: '<%= conf.dist %>/scripts/source.map.js',
    sourceMapRoot: '/scripts',
    sourceMappingURL: '/scripts/source.map.js',
    sourceMapPrefix: 2
  },
  // For dev, effectively just concatenate all the JS into one file but perform no real minification.
  // This means that the HTML is the same for dev and prod (it just loads the single .js file) but
  // debugging in the browser works properly in the dev environment. It should work even when fully
  // minified, given the source maps, but practice shows that it doesn't.
  dev: {
    options: {
      report: false,
      mangle: false,
      compress: false,
      beautify: true
    },
    files: [{
      expand: true,
      cwd: '<%= conf.app %>',
      src: '<%= conf.minify_js_files %>',
      dest: '<%= conf.dist %>/scripts/main.min.js',
      // Because we want all individual sources to go into a single dest file, we need to use this
      // rename function to ensure all srcs get the same dest, otherwise each would get a separate
      // dest created by concatenating the src path onto the dest path.
      rename: function(dest, src) { return dest; }
    }]
  },
  prod: {
    options: {
      banner: '/*! <%= pkg.name %> <%= grunt.template.today("yyyy-mm-dd") %> */\n',
      report: 'min',
      mangle: true,
      compress: true
    },
    files: '<%= uglify.dev.files %>'
  }
},

Use of a rename function for configuring file srcs and dests

I was really struggling to come up with src/dest configuration for Uglify that pushed all of my source files into a single minified dest file. To be fair, this is trivially easy in the common case, as you can simply use files: { 'dest_path': ['file_1', 'file_2'] }.

However I have my list of source files in <%= conf.minify_js_files %> and the paths therein do not include the root app/ directory, because this works out for the best in various other Grunt tasks (not shown) where I use cwd in the files block to supply that root dir. Unfortunately, without my custom rename function, a separate dest is calculated for each src file, by concatenating the src path with the dest, so instead of one minified JS file we get lots of individual files sprayed over all sorts of unintended locations. The trivial rename function I’ve used overrides those calculated dest locations to our originally intended single dest. Where different srcs have the same dest, the grunt-contrib-uglify plugin has the good sense to assume you want to merge their output. And hence we get the result we want. To be clear, this is only complicated because I want to use cwd in the file config rather than using the simpler approach.

Re-using files blocks in multiple places

You can share common options amongst multiple targets by putting them at the top level of the task and overriding/extending as required in the specific targets. However you can’t do this when specifying files. In my case I want to specify the same files for both dev and prod Uglify targets, so I specify them in full for dev then use Grunt’s templating facility to refer to them from prod with files: ‘<%= uglify.dev.files %>’.

Theoretically I could have put the definition in the conf block at the top, but it’s specific to Uglify and only used there so I prefer it to be local to the Uglify task. It seems obvious now that I can refer back to it like this, but at the time I struggled to see how to achieve it. I think I had a blind spot for the generic nature of the templating mechanism, having only used it in a rigid way for config previously, and still being very new to Gruntfiles.

Uglify may break JS debugging

I found that my minified JS files could not be successfully debugged in any browsers. I could see the original un-minified code thanks to the source maps and I could set breakpoints, but they usually wouldn’t trigger, or if I did break (e.g. with JS ‘debugger’ command in my code) it was impossible to get variable values. All very frustrating.

Whilst I’m developing I use Grunt’s watch task to serve my files and to auto-process modifications on the fly, so in that mode I turn off all the actual minification features of Uglify and effectively just concatenate the files together into one. Because it’s still the same single file with the same name as in production, I can use identical static HTML to include my JS script. The source maps are still there and allow me to see the individual files in the browser debugger.

by Sam Carr on 08/01/14

Getting Sieves Right

The great thing about being wrong is that you get to learn something. In a previous post I went on at length about the Sieve of Eratosthenes. Now that I have been enlightened by Melissa O’Neill I must make corrections.

The problem is that Eratosthenes posited a fixed array from which we ‘cross off’ non-primes, the classic algorithm being:

  • Write a list of all integers > 1 up to whatever your maximum n is
  • Declare the first non-eliminated integer as a prime p, and eliminate all multiples of this (starting at p^2 as an optimisation). Note that multiples may be found by incrementing by p. A further optimisation is to stop eliminating at sqrt(n).
  • Repeat for the next non-eliminated integer.
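The steps above can be sketched directly (a JavaScript sketch, not from the original post):

```javascript
// A minimal fixed-array Sieve of Eratosthenes, following the steps above.
function sieve(n) {
  const composite = new Array(n + 1).fill(false);
  const primes = [];
  for (let p = 2; p <= n; p++) {
    if (composite[p]) continue; // already eliminated, not a prime
    primes.push(p);
    // Start eliminating at p^2; smaller multiples of p were already
    // crossed off by smaller primes.
    for (let m = p * p; m <= n; m += p) composite[m] = true;
  }
  return primes;
}
```

Note how each multiple is found by incrementing by p, so no division is ever performed.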

The problem is that fixed list at the start, which rules out generators. The usual implementation for generating infinite sequences of primes stores each p and tests future candidates against them. But this is not what Eratosthenes was talking about. O’Neill calls it the ‘unfaithful sieve’ and notes that

In general, the speed of the unfaithful sieve depends on the number of primes it tries that are not factors of each number it examines, whereas the speed of Eratosthenes’s algorithm depends on the number of (unique) primes that are

As an example (from the paper): if we are looking for primes below 541 (the first 100 primes), and find that 17 is a prime, we start crossing off multiples of 17 at 289 (17^2), crossing off 15 times in total. In contrast, the unfaithful sieve will check all numbers not divisible by 2, 3, 5, 7, 11 or 13 for divisibility by 17. This is a total of 99 checks.
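The 15 crossings-off for p = 17 are easy to verify (a quick sketch of the arithmetic, not from the paper):

```javascript
// For p = 17 sieving up to n = 541, the true sieve crosses off
// multiples of 17 starting at 17^2 = 289: that is 289, 306, ..., 527.
let crossings = 0;
for (let m = 17 * 17; m <= 541; m += 17) crossings++;
// crossings is 15, far fewer than the trial divisions the unfaithful
// sieve performs against 17.
```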

In the end, the correct Eratosthenes sieve is Θ(n log log n) to find all primes up to n, whereas the unfaithful sieve is Θ(n^2/(log n)^2).

O’Neill goes on to explore Haskell implementations, but what would a good sieve look like in Clojure? Recall that in Clojure the unfaithful sieve looks like this:

(defn lazy-sieve [s]
  (cons (first s)
    (lazy-seq (lazy-sieve (remove #(zero? (mod % (first s))) (rest s))))))

(defn primes []
  (lazy-sieve (iterate inc 2)))

(take 5 (primes))
;= (2 3 5 7 11)

It stores the list of eliminated primes in nested calls to lazy-sieve. To turn this into a lazy, infinite Eratosthenes sieve, we need to instead store a map of the next ‘crossings off’, along with their prime factors. Another example from O’Neill: when we discover 17 as a prime, we insert the first ‘crossing off’ (17^2 = 289) in the map of upcoming composites along with its prime factors (17 in this case). When we come to consider 289, we discover it is a known multiple of 17, remove 289 from the map, and insert 289 + 17 = 306. In essence we are building a map of iterators keyed by their current value. As it happens, 306 has prime factors 2, 3 and 17, so when 306 is considered, it is removed and entries inserted for 306 + 2, 306 + 3 and 306 + 17. Each of the iterators at that value is incremented appropriately.

Let’s quickly hack it together. We’re going to store the table of iterators in a plain hash-map, with the ‘crossed off’ values as the key, and a vector of prime factors as value.

;; Given the prime factors of the current value, generate a key-value list
;; mapping each factor's next multiple (current value + factor) to a new
;; single-element list of that factor.
(defn gen-iterator-keyvals [factors value]
  (mapcat #(list (+ % value) [%]) factors))

;; Update the iterator-map when a composite is crossed off: advance each of
;; its factors' iterators to their next multiples and remove the old entry.
(defn update-iterators [composite iterator-map]
  (let [iterators (apply hash-map (gen-iterator-keyvals (get iterator-map composite) composite))
        basemap (dissoc iterator-map composite)]
    (merge-with concat basemap iterators)))

(defn lazy-erastosthenes [iterator-map [x & xs]]
  (if (contains? iterator-map x)

    ;; if the value is 'crossed off', it's not a prime, so don't cons it; just update the map.
    (lazy-erastosthenes (update-iterators x iterator-map) xs)

    ;; otherwise chain the value on, and simply add an entry to the map.
    (cons x (lazy-seq (lazy-erastosthenes (merge-with concat iterator-map {(* x x) [x]}) xs)))))

(defn primes []
  (cons 2 (lazy-seq (lazy-erastosthenes {2 [2]} (iterate inc 2) ))))

(take 10 (primes))
;= (2 3 5 7 11 13 17 19 23 29)
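The same iterator-map idea translates directly to other languages. Here is a sketch in JavaScript (my own port, not from the post), using a Map from each upcoming composite to the list of primes that will cross it off:

```javascript
// Incremental Sieve of Eratosthenes as an infinite generator.
function* primes() {
  const iterators = new Map(); // next composite -> [prime factors]
  for (let x = 2; ; x++) {
    if (iterators.has(x)) {
      // x is composite: advance each of its factors' iterators by one step.
      for (const p of iterators.get(x)) {
        const next = x + p;
        iterators.set(next, (iterators.get(next) || []).concat(p));
      }
      iterators.delete(x);
    } else {
      // x is prime: start crossing off at x^2.
      iterators.set(x * x, [x]);
      yield x;
    }
  }
}

const it = primes();
const first10 = Array.from({ length: 10 }, () => it.next().value);
// → [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Because it's iterative rather than built from nested recursive filters, this version has no stack-depth concerns at all.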

Performance? Well, the old version had lower constant factors, so the better algorithm only starts to show through at about N=550. As we get to larger numbers, the improvement is clear. For one, it isn’t prone to stack overflows like the original (and the canonical lazy-seq example)! In fact I won’t give graphs here (see the O’Neill paper for some) because the stack overflows in the original are limiting what I can do, but by about N=4000 we’re looking at about an order of magnitude improvement.

by James Uther on 31/12/13


© 2000-14 LShift Ltd, 1st Floor, Hoxton Point, 6 Rufus Street, London, N1 6PE, UK. +44 (0)20 7729 7060