
Archive for the ‘Technology’ Category

When in Rome

I’ve been trying to integrate js-sequence-diagrams into Trac. I’ve reached the point where I can choose between my sequence diagrams getting rendered, and the rest of the JavaScript in Trac working. And it’s all because of an underscore…

There’s a popular library in the JavaScript world: underscore, which claims the global _ for itself. In Python, _ is conventionally bound to the gettext function used to internationalize a string, and Trac has carried that notation into its JavaScript. That description makes the clash sound harmless. It’s anything but: it took me a very long time to work out that this was the problem.
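
To make the clash concrete, here’s an illustrative sketch (not Trac’s actual source) of the two incompatible uses of the same global:

// Trac's JavaScript follows the Python gettext convention: _ translates a message id.
// (The real code looks the id up in a message catalogue; returning it unchanged is a stand-in.)
window._ = function (msgid) { return msgid; };

_("Wiki");  // fine while _ is the translation function

// Loading underscore.js afterwards rebinds the same global to the underscore object, so
// _("Wiki") now returns an underscore wrapper rather than a string; load the scripts the
// other way round and _.each, _.map and friends are undefined for sequence-diagram.js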

The lesson here is to write idiomatic code. Because JavaScript has no module system, a web page really is a place where you have to co-operate by observing the local customs. In Trac’s case it’s not reasonable to claim the global namespace all for yourself: its main claim is extensibility, and the easiest way to extend it is with JavaScript in the browser.

Anyway, I guess I should point out that the solution is require.js. It’s not all that complicated to use in this case: we leave all the Trac stuff, including jQuery and jQuery UI, outside. _ gets redefined, but only in the scope of sequence-diagram.js. main.js looks like this:

require.config({
  baseUrl: requireBaseUrl,
  shim : {
    'underscore': {
      exports: '_'
    },
    'raphael': {
      exports: 'Raphael'
    },
    'sequence-diagram': {
      deps: [ 'underscore', 'raphael' ]
    }
  },
});

require(["draw-sequence-diagrams"])

draw-sequence-diagrams is my script, which is written as a module:

define(
  // I don't actually refer to this, I need its side effects on jQuery
  ["sequence-diagram"],
  function(sd) {
    // jQuery comes from Trac - it's global
    jQuery(document).ready(function() {
      jQuery(".sequence-diagram").sequenceDiagram({theme: 'simple'})
    })
 })

I have to get Trac to calculate the base URL, so I add it as a variable. Here’s the Python for the Trac extension:

import pkg_resources

from genshi.core import Markup

from trac.core import implements, Component
from trac.web.chrome import ITemplateProvider, add_script, add_script_data
from trac.wiki.api import IWikiMacroProvider
from trac.web.api import IRequestFilter
from trac.util.translation import N_

MACRO_DESCRIPTION = """
Include a sequence diagram in the Wiki. We use a javascript
extension to actually render the sequence diagram.
"""


SEQUENCE_DIAGRAM_TEMPLATE = """
<div class="sequence-diagram">{text}</div>
"""


class JsSequenceDiagrams(Component):

    implements(ITemplateProvider,IWikiMacroProvider,IRequestFilter)
       
    # ITemplateProvider methods

    def get_htdocs_dirs(self):
        return [ ( 'js-sequence-diagrams',
                   pkg_resources.resource_filename(
                    'jssequencediagrams',
                    'templates') ) ]

    def get_templates_dirs(self):
        return []

    # IWikiMacroProvider methods

    def get_macros(self):
        yield 'SequenceDiagram'

    def get_macro_description(self, name):
        return 'messages', N_(MACRO_DESCRIPTION)

    def expand_macro(self, formatter, name, text, args):
        return Markup(SEQUENCE_DIAGRAM_TEMPLATE.format(text=text))

    def pre_process_request(self, req, handler):
        return handler

    def post_process_request(self, req, template, data, content_type):
        get = self.env.config.get
        add_script_data(req, { 'requireBaseUrl': '%s/js-sequence-diagrams' % req.href.chrome() })
        add_script(req, 'js-sequence-diagrams/require.js', 'text/javascript')
        add_script(req, 'js-sequence-diagrams/main.js', 'text/javascript')
        return (template, data, content_type)

You can see that I load main.js myself, rather than letting require.js do it for me. That’s because otherwise I wouldn’t be able to use Trac’s add_script, which can’t add the extra attribute I’d need for that.
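
For reference, this is roughly the script tag that letting require.js bootstrap main.js would involve; presumably it’s the data-main attribute that add_script can’t emit (the paths here are illustrative):

<script data-main="chrome/js-sequence-diagrams/main" src="chrome/js-sequence-diagrams/require.js"></script>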

Also note that I need to get Trac to calculate require’s base URL, and I pass that in as a JavaScript variable.

You will need to look elsewhere for how to include resources in your Trac extension.

by david on 31/03/14

Getting back into front-end web development

I’ve been working on a small SPA (Single Page Application) – just HTML, CSS and JavaScript statically served and doing its thing entirely in the browser. I learned a great deal throughout the project, but here are some of the things that strike me as most valuable.

Get a good workflow going

I used Grunt to set up a nice build system that is mostly a joy during development. It took a while to evolve my Gruntfile, but now when I edit a file, the results are immediately refreshed in the browser (I don’t even have to hit cmd-R). I can deploy to S3 test, staging and live sites with a single command that takes about 3 seconds. My SASS files are compiled down to minified CSS, my JS is minified with source maps etc.
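
A stripped-down sketch of that kind of Gruntfile, using the usual grunt-contrib plugins; the paths and task layout are illustrative rather than my actual config, and the S3 deployment task is omitted:

module.exports = function(grunt) {
  grunt.initConfig({
    sass: {
      dist: { files: { 'dist/css/main.css': 'app/styles/main.scss' } }
    },
    uglify: {
      dist: { files: { 'dist/scripts/main.min.js': ['app/scripts/**/*.js'] } }
    },
    watch: {
      options: { livereload: true },  // the browser reloads itself on every change
      styles:  { files: ['app/styles/**/*.scss'], tasks: ['sass'] },
      scripts: { files: ['app/scripts/**/*.js'],  tasks: ['uglify'] }
    }
  });

  grunt.loadNpmTasks('grunt-contrib-sass');
  grunt.loadNpmTasks('grunt-contrib-uglify');
  grunt.loadNpmTasks('grunt-contrib-watch');

  grunt.registerTask('default', ['sass', 'uglify', 'watch']);
};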

The hardest part of using Grunt is figuring out how to configure it and its many contrib plugins. I could have done with a definitive reference or perhaps I could have used Yeoman to give me an out of the box solution. However I recognised that I was always going to have to figure out the guts of Grunt so I think I really was better off bespoking it from the start. I’m glad I did as now I have a tight setup that does precisely what I want and that I understand completely.

Now it seems there is a new kid on the scene, Gulp – nicely introduced in this tutorial blog post. I will definitely be looking closely at that for my next project, with the piping approach looking like the key step beyond Grunt, along with nicer syntax. I’d also look at Browserify, for a nicer way to piece together the JS bits.

Learn JavaScript properly

To the uninitiated, JavaScript is fairly surprising in many subtle ways, and though I can grok the prototype-based inheritance fairly easily, the scoping rules caught me out repeatedly. This was especially the case as I tried to create jQuery plugins with private methods and state. Eventually a simple old article by grand-daddy of JavaScript writing Douglas Crockford gave me the vital clues I was missing.
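
The vital clue, in short, is that private state and helpers belong in a closure, with only the plugin function itself attached to jQuery. A minimal sketch of the pattern (the plugin and its names are invented for illustration, not taken from my project):

(function ($) {
  var count = 0;                      // private state, shared by every call to the plugin

  function highlight($el) {           // private helper, invisible outside the closure
    $el.css('background-color', 'yellow');
  }

  $.fn.myHighlighter = function () {  // the only name we expose
    return this.each(function () {
      count += 1;
      highlight($(this));
    });
  };
}(jQuery));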

Really I should just read his book, and I would recommend that anyone else doesn’t just attempt to learn JavaScript as they go, but takes some time to pro-actively figure out the core concepts – it will pay off in short order.

jQuery is non-negotiable

And the award for most indispensable library goes to: jQuery. Seriously, it should be baked into the browsers or the ECMAScript standard. The nicest thing about it is I can pretty much just guess at the API and be right most of the time, though reading the docs usually reveals new conveniences that go further than I even imagined.

Browser quirks can be a living nightmare

JavaScript itself is fairly reliable, especially with judicious use of mature libraries like jQuery that paper over the cross-browser DOM cracks. CSS in complicated scenarios is where it all seems to go wrong however.

It’s amazing how broken/different some browsers are. Here are just a few highlights, though every day brought tens of new oddities and associated workarounds.

  • Mobile Safari on iOS 7 reports the viewport height inconsistently (depending on how you access it) leading to bad layout and horrible JavaScript workarounds.
  • Use of -webkit-overflow-scrolling:touch causes the hardware accelerated renderer to kick in, resulting in various flickers, flashes and glitches with content not rendering.
  • IE 10 on Windows 8 shows back/forward overlays at the left/right of the screen when your mouse moves near them, obscuring links in those locations.
  • Chrome running on Retina Macs suffers from strange graphical glitches when running CSS Animations, but is fine with CSS Transitions. However other browsers/platforms really need CSS Animations to get smooth, hardware accelerated movement. In my case it was necessary to implement both approaches and select using browser detection.
by Sam Carr on 06/03/14

Two Magnolias, one container

We are using Magnolia in a number of projects here at LShift. I have been feeling that Magnolia has a simple way to do most things, but often there are a number of other plausible alternatives that gradually lead you into wasting enormous amounts of time.

Here I want to present a simple way to get both author and public instances of Magnolia running in your dev environment in the same container. It may seem very obvious. If so, good. This was not the first way I tried, and it cost me a lot of time.

We will be aiming for:

  1. Easily deploying Magnolia onto a stage or production environment — one file, one or two configuration parameters only.
  2. Making it easy for a tester to launch local public and author instances of Magnolia that talk to each other correctly.
  3. Making it easy for a developer to debug Magnolia, having both instances running under the control of the IDE.

Preconditions

I will be assuming that you have a parent project with a child project that represents your webapp. I also will assume that you have copied the contents of src/main/webapp/WEB-INF/config from the magnolia-empty-webapp project into your own webapp project. The source for this is in the ce-bundle at git.magnolia-cms.com/gitweb/?p=ce-bundle.pub.git, but assuming you have magnolia-empty-webapp as a dependency (as recommended) you should be able to pick it up from your target directory.

I will be using Tomcat 7 as Tomcat is recommended by Magnolia and 7 is the latest stable version at the time of writing.

Deploying Magnolia to Stage or Production environments

For deployment to stage or production you don’t want both author and public deployed in the same container, or even on the same machine; so we only need to be able to configure a single running instance to be either author or public.

This is quite simple and well documented. In your webapp project, open your src/main/webapp/WEB-INF/web.xml (that you copied from the empty webapp project as described above) and look for the lines:

  <context-param>
    <param-name>magnolia.initialization.file</param-name>
    <param-value>
      WEB-INF/config/${servername}/${contextPath}/magnolia.properties,
      WEB-INF/config/${servername}/${webapp}/magnolia.properties,
      WEB-INF/config/${servername}/magnolia.properties,
      WEB-INF/config/${contextPath}/magnolia.properties,
      WEB-INF/config/${webapp}/magnolia.properties,
      WEB-INF/config/default/magnolia.properties,
      WEB-INF/config/magnolia.properties
    </param-value>
  </context-param>

You will need to add your own line at the top of the <param-value> section:

      WEB-INF/config/${contextAttribute/instanceName}/magnolia.properties,

Then when you deploy your WAR, you can simply set the instanceName environment variable to magnoliaPublic or magnoliaAuthor depending on what type of instance you want. As you can see from the fragment of web.xml above, this will make the settings in src/main/webapp/WEB-INF/config/magnoliaPublic/magnolia.properties or src/main/webapp/WEB-INF/config/magnoliaAuthor/magnolia.properties active, respectively. Ultimately you will want to make more magnolia.properties files in more subdirectories (called, perhaps, stageAuthor, productionPublic and so on) with appropriate settings for those environments and you can simply make instanceName refer to the appropriate subdirectory.

Local Magnolia from the command line

Now, it would seem plausible that this method can be made to make your local testing environment work. Plausible, but wrong. This is the difficult way. You’ll start writing your context.xml files, then you’ll need a server.xml file, then before you know it you’ll be building your own Tomcat so that you can manage it all.

The “secret” is to use the fact that the web.xml already refers to the context path, in the form of the line:

      WEB-INF/config/${contextPath}/magnolia.properties,

(as well as in another line which we won’t concern ourselves with). This means that, instead of using an environment variable, you can deploy the same WAR file to two different context paths and Magnolia will set itself up differently for each. And if you choose the paths /magnoliaAuthor and /magnoliaPublic you will automatically pick up the properties files provided by the empty webapp and all will be fine — Magnolia even sets up the author instance to point at http://localhost:8080/magnoliaPublic by default, so you won’t have to configure it yourself!

Well, actually, it’s not all fine. If you try this, you’ll find that one of your instances will refuse to start, complaining that its repository is already locked. Of course, they are trying to use the same repository. Fix this by adding a line similar to the following to magnoliaPublic/magnolia.properties:

magnolia.repositories.home=${magnolia.home}/repositories-public

The name of the subdirectory is not important. Note that, as it stands, this will change where the stage and production deployed Magnolias you configured above store their data. If that bothers you, now might be a good time to make your productionPublic/magnolia.properties and similar files.

So, how do we get that running painlessly so that your tester doesn’t keep asking you how to do it?

Add the Tomcat Maven plugin to your webapp’s pom.xml, and configure it to launch your WAR twice on two different context paths:

      <plugin>
        <groupId>org.apache.tomcat.maven</groupId>
        <artifactId>tomcat7-maven-plugin</artifactId>
        <version>2.2</version>
        <configuration>
          <webapps>
            <webapp>
              <groupId>com.my.group</groupId>
              <artifactId>my-webapp</artifactId>
              <version>1.0-SNAPSHOT</version>
              <type>war</type>
              <asWebapp>true</asWebapp>
              <contextPath>/magnoliaAuthor</contextPath>
            </webapp>
            <webapp>
              <groupId>com.my.group</groupId>
              <artifactId>my-webapp</artifactId>
              <version>1.0-SNAPSHOT</version>
              <type>war</type>
              <asWebapp>true</asWebapp>
              <contextPath>/magnoliaPublic</contextPath>
            </webapp>
          </webapps>
        </configuration>
      </plugin>

Replacing com.my.group and my-webapp with your own webapp’s group and artifact id.

Now you can run your Magnolia simply with:

mvn tomcat7:run-war

For reasons best known to the Tomcat plugin, boring old mvn tomcat7:run doesn’t work — deploying only one Magnolia in its default location. Sorry.

The instances are available, of course, at http://localhost:8080/magnoliaAuthor and http://localhost:8080/magnoliaPublic.

Local Magnolia from your IDE

Now you’re on the home straight. Here’s how I configure the Tomcat plugin in Eclipse:

Firstly, you need to get Eclipse to know about Tomcat 7. The foolproof way to do this is as follows: Window -> Preferences -> Server -> Runtime Environments -> Add… -> Apache Tomcat v7.0 -> Next. Now give it a location that is writable by you in the “Tomcat installation directory” box and click “Download and Install…”; using your pre-existing Tomcat might not work if it isn’t laid out in the way Eclipse expects. Now Finish and open the Servers view.

You can now add a new Tomcat 7 server and double-click on it. Tick “Publish module contexts to separate XML files”, set the start timeout to something large like 480 seconds, and in the modules tab add your webapp project twice; once with the path /magnoliaAuthor and once with the path /magnoliaPublic.

Now you can launch and debug your two instances of Magnolia from within your IDE!

by Tim Band on 03/03/14

Predicting and controlling correlations in differentials of addition mod 2^n

This is a paper I wrote in collaboration with Scott Fluhrer in 2005. It was not accepted for FSE 2006; it would have been better if I hadn’t waited until 2014 to make it public, but better late than never.

It arose from a discovery I made when developing attacks on Salsa20 for “Truncated differential cryptanalysis of five rounds of Salsa20”. Several of the attacks were twice as effective as my calculations showed they should have been. I remarked in the paper “By experiment, we have even determined a few differential trails whose probability appears to be twice as high as their weight would suggest—this is presumably because of problems with the independence assumption”. Digging deeper, I discovered why these trails did not behave independently, and developed an algorithm for correctly predicting the probability that the trail holds; this algorithm works across a wide range of “ARX” (addition, rotation, XOR) cryptographic primitives. The obvious person to talk to about these ideas was Cisco’s Scott Fluhrer, whose expertise in differential cryptanalysis led to him breaking the cipher in my first publication, Mercy. Scott pointed me to the ARX MAC “Michael”, and we found that the same technique could be used to analyse an anomaly in Michael.

In all likelihood this is just a historical curiosity; the right place to look now would be the ARX toolkit and associated papers.

Paul Crowley, Scott Fluhrer, Predicting and controlling correlations in differentials of addition mod 2^n (PDF).

by Paul Crowley on 19/02/14

A study in Scala

Cards on the table: I like Scala. Right now it’s my go-to general purpose programming language, but I know that many people have a dim opinion of it, or appreciate its positives but are heavily conflicted by its negatives. Actually I do put myself in that latter camp, though I think I’ve got deep enough into learning the language that I can get past most of those negatives. Here I present a classic tale of “life with Scala”.

The problem

Scala generally has collection methods for most things you can think of, so I was pleased but not surprised to find inits and tails methods, which list all of the prefixes and suffixes respectively. These methods return an iterator, hence the use of toList below, to actually see the results. For the rest of this post, we’ll just consider inits, though tails is trivially similar.

scala> List(1,2,3).inits
res1: Iterator[List[Int]] = non-empty iterator

scala> List(1,2,3).inits.toList
res2: List[List[Int]] = List(List(1, 2, 3), List(1, 2), List(1), List())

scala> List(1,2,3).tails.toList
res3: List[List[Int]] = List(List(1, 2, 3), List(2, 3), List(3), List())

The results include the empty collection, but as it happens that doesn’t suit my needs. I want a version that doesn’t return the empty list.

What to do?

Options include:

  • Call the existing inits method then strip off the last item. However since we get an iterator this isn’t quite so trivial – we can’t just do inits.init
  • Write a new custom method to do just what I want and return a collection, not an iterator.

I decided, in the spirit of Scala, that I would add a nonEmptyInits method to ‘all collections’ (and I’m deliberately vague here) via an implicit conversion, so that it can be called just as easily as inits. It would call inits but wrap the iterator to exclude the final element. Easy peasy, but the rabbit hole loomed ahead of me.

Because I was over-engineering and trying to be purist (this was hobby code) I wanted to implement my new method for the most generic type reasonable. Figuring out what type that was challenged me rather. The Scala collections API has a fairly complicated hierarchy. This is somewhat diagrammed and explained in the documentation but that doesn’t show all the interim traits and classes that are in the guts of the implementation such as TraversableLike, GenTraversable, GenTraversableLike and 10+ others relating just to Traversable. The API is set up to support some very clever facilities, like generic collection methods typically returning collections of the same type you started with. Apparently most libraries can’t/don’t do this. However it made it hard for me to find the right place to start. Ultimately I settled on Traversable, since inits itself is defined in TraversableLike, but adding to that directly seemed like more trouble than it was worth.

implicit class TraversableExtras[T](t: Traversable[T]) {
  def nonEmptyInits: Iterator[Traversable[T]] = {
    val bufferedInits = t.inits.buffered
    new Iterator[Traversable[T]] {
      def hasNext() = bufferedInits.head.nonEmpty
      def next() = bufferedInits.next()
    }
  }
}

My solution above wraps the iterator with a buffered version, so that I can peek at head and determine whether to end early or not. This is fairly nice and simple and I’m quite happy with it. It means we can do the following, via the magic of implicit conversions.

scala> List(1,2,3,4).nonEmptyInits.toList
res7: List[Traversable[Int]] = List(List(1, 2, 3, 4), List(1, 2, 3), List(1, 2), List(1))

But Arrays mess it all up

This is all looking very promising and straightforward, but it took a fair bit of banging my head against the desk to figure out why the compiler wouldn’t let me call it on an Array.

scala> Array(1,2,3).nonEmptyInits
<console>:9: error: value nonEmptyInits is not a member of Array[Int]

To cut a long story short, though we can seamlessly treat Array as a Traversable, Array is actually somewhat special. It maps to Java arrays for implementation reasons, but supports generics, and has an implicit conversion to WrappedArray, which is a Seq (and hence a Traversable), to provide all the collection methods we’re used to. So far so good, but it turns out that the compiler cannot apply more than one implicit conversion at a time, as the search space would explode. Hence it cannot convert from Array to WrappedArray to my TraversableExtras. A solution in such a case, where we want two conversions, is to explicitly cast the Array, to make the first conversion explicit, then the compiler does the rest.

scala> (Array(1,2,3): Traversable[Int]).nonEmptyInits
res10: Iterator[Traversable[Int]] = non-empty iterator

Wrap-up

Finally I got where I wanted, even if that cast is a bit irritating, and I learned a lot of useful things in the process – all part of the journey of familiarity with any language. But it struck me as soon as I got a bit lost in the Scala collections hierarchy that I was on a mission with which I am all too familiar. It was at that point that I decided I would probably blog about the experience and started taking notes!

I still like Scala, but I’m very well aware that this is the price I pay, and I’m not surprised that others find it too high a price.

Just to be clear…

I realise by the way that this was all a fool’s errand in the first place. I deliberately went down the path described above as an explicit learning exercise because I wanted to get more experience with the collections API, iterables, implicits etc. It is possible to achieve the result more directly with a filter:

scala> Array(1,2,3).inits.filter(_.nonEmpty).toList
res16: List[Array[Int]] = List(Array(1, 2, 3), Array(1, 2), Array(1))

It’s not quite as pithy as having a nonEmptyInits method, but in all other senses it’s probably superior.

by Sam Carr on 10/02/14

Grunt uglify file specs

I struggled a bit finding relevant examples of Gruntfile configuration for Uglify, so having solved a few specific problems myself, here’s what I came up with.

This is just a snippet from the whole Gruntfile of course, and contains half-decent comments already, though I’ll provide some extra explanations below to point out the most interesting bits.

// Variables used internally within this config.
conf: {
  app: 'app',
  dist: 'dist',
  // Just our own custom scripts.
  script_files: ['scripts/*.js'],
  // All scripts that should be minified into final result.
  // Ordering is important as it determines order in the minified output and hence load order at runtime.
  // We don't include jquery (though we could) as it's better to get it from Google where possible.
  minify_js_files: [
      'scripts/vendor/modernizr/modernizr.custom.js',
      '<%= conf.script_files %>',
      'scripts/polyfills/**/*.js']
},

uglify: {
  options: {
    banner: '/*! <%= pkg.name %> <%= grunt.template.today("yyyy-mm-dd") %> */\n',
    sourceMap: '<%= conf.dist %>/scripts/source.map.js',
    sourceMapRoot: '/scripts',
    sourceMappingURL: '/scripts/source.map.js',
    sourceMapPrefix: 2
  },
  // For dev, effectively just concatenate all the JS into one file but perform no real minification.
  // This means that the HTML is the same for dev and prod (it just loads the single .js file) but
  // debugging in the browser works properly in the dev environment. It should work even when fully
  // minified, given the source maps, but practice shows that it doesn't.
  dev: {
    options: {
      report: false,
      mangle: false,
      compress: false,
      beautify: true
    },
    files: [{
      expand: true,
      cwd: '<%= conf.app %>',
      src: '<%= conf.minify_js_files %>',
      dest: '<%= conf.dist %>/scripts/main.min.js',
      // Because we want all individual sources to go into a single dest file, we need to use this
      // rename function to ensure all srcs get the same dest, otherwise each would get a separate
      // dest created by concatenting the src path onto dest path.
      rename: function(dest, src) { return dest; }
    }]
  },
  prod: {
    options: {
      banner: '/*! <%= pkg.name %> <%= grunt.template.today("yyyy-mm-dd") %> */\n',
      report: 'min',
      mangle: true,
      compress: true
    },
    files: '<%= uglify.dev.files %>'
  }
},

Use of a rename function for configuring file srcs and dests

I was really struggling to come up with src/dest configuration for Uglify that pushed all of my source files into a single minified dest file. To be fair, this is trivially easy in the common case, as you can simply use files: { 'dest_path': ['file_1', 'file_2'] }.

However I have my list of source files in <%= conf.minify_js_files %> and the paths therein do not include the root app/ directory, because this works out for the best in various other Grunt tasks (not shown) where I use cwd in the files block to supply that root dir. Unfortunately, without my custom rename function, a separate dest is calculated for each src file, by concatenating the src path with the dest, so instead of one minified JS file we get lots of individual files sprayed over all sorts of unintended locations. The trivial rename function I’ve used overrides those calculated dest locations to our originally intended single dest. Where different srcs have the same dest, the grunt-contrib-uglify plugin has the good sense to assume you want to merge their output. And hence we get the result we want. To be clear, this is only complicated because I want to use cwd in the file config rather than using the simpler approach.

Re-using files blocks in multiple places

You can share common options amongst multiple targets by putting them at the top level of the task and overriding/extending as required in the specific targets. However you can’t do this when specifying files. In my case I want to specify the same files for both dev and prod Uglify targets, so I specify them in full for dev then use Grunt’s templating facility to refer to them from prod with files: '<%= uglify.dev.files %>'.

Theoretically I could have put the definition in the conf block at the top, but it’s specific to Uglify and only used there so I prefer it to be local to the Uglify task. It seems obvious now that I can refer back to it like this, but at the time I struggled to see how to achieve it. I think I had a blind spot for the generic nature of the templating mechanism, having only used it in a rigid way for config previously, and still being very new to Gruntfiles.

Uglify may break JS debugging

I found that my minified JS files could not be successfully debugged in any browsers. I could see the original un-minified code thanks to the source maps and I could set breakpoints, but they usually wouldn’t trigger, or if I did break (e.g. with JS ‘debugger’ command in my code) it was impossible to get variable values. All very frustrating.

Whilst I’m developing I use Grunt’s watch task to serve my files and to auto-process modifications on the fly, so in that mode I turn off all the actual minification features of Uglify and effectively just concatenate the files together into one. Because it’s still the same single file with the same name as in production, I can use identical static HTML to include my JS script. The source maps are still there and allow me to see the individual files in the browser debugger.

by Sam Carr on 08/01/14

Getting Sieves Right

The great thing about being wrong is that you get to learn something. In a previous post I went on at length about the sieve of Eratosthenes. Now that I have been enlightened by Melissa O’Neill I must make corrections.

The problem is that Eratosthenes posited a fixed array from which we ‘cross off’ non-primes. The classic algorithm (sketched in Clojure just after this list) is:

  • Write a list of all integers > 1 up to whatever your maximum n is
  • Declare the first non-eliminated integer as a prime p, and eliminate all multiples of this (starting at p^2 as an optimisation). Note that multiples may be found by incrementing by p. A further optimisation is to stop eliminating at sqrt(n).
  • Repeat for the next non-eliminated integer.
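
Here’s what that bounded version might look like in Clojure (a quick sketch, not optimised; the function name is mine):

(defn classic-sieve
  "All primes up to n, using a fixed array of 'crossed off' flags."
  [n]
  (let [sieve (boolean-array (inc n) true)]            ; true = not yet crossed off
    (doseq [p     (range 2 (inc (int (Math/sqrt n))))
            :when (aget sieve p)                       ; p survived, so it's prime
            m     (range (* p p) (inc n) p)]           ; cross off multiples, starting at p^2
      (aset-boolean sieve m false))
    (filter #(aget sieve %) (range 2 (inc n)))))

(classic-sieve 30)
;= (2 3 5 7 11 13 17 19 23 29)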

The problem is that fixed list at the start, which rules out generators. The usual implementation for generating infinite sequences of primes stores each p and tests future candidates against them. But this is not what Eratosthenes was talking about. O’Neill calls it the ‘unfaithful sieve’ and notes that

In general, the speed of the unfaithful sieve depends on the number of primes it tries that are not factors of each number it examines, whereas the speed of Eratosthenes’s algorithm depends on the number of (unique) primes that are

As an example (from the paper): if we are looking for primes below 541 (the first 100 primes), and find that 17 is a prime, we start crossing off multiples of 17 at 289 (17^2), crossing off 15 times in total. In contrast, the unfaithful sieve will check all numbers not divisible by 2, 3, 5, 7, 11 or 13 for divisibility by 17. This is a total of 99 checks.

In the end, the correct sieve of Eratosthenes is Θ(n log log n) to find all primes up to n, whereas the unfaithful sieve is Θ(n^2/(log n)^2).

O’Neill goes on to explore Haskell implementations, but what would a good sieve look like in Clojure? Remember that in Clojure the unfaithful sieve looks like this:

(defn lazy-sieve [s]
  (cons (first s)
    (lazy-seq (lazy-sieve (remove #(zero? (mod % (first s))) (rest s))))))

(defn primes []
  (lazy-sieve (iterate inc 2)))

(take 5 (primes))
;= (2 3 5 7 11)

It stores the list of eliminated primes in nested calls to lazy-sieve. To turn this into a lazy, infinite Eratosthenes sieve, we need instead to store a map of the next ‘crossings off’, along with their prime factors. Another example from O’Neill: when we discover 17 as a prime, we insert the first ‘crossing off’ (17^2 = 289) in the map of upcoming composites along with its prime factors (17 in this case). When we come to consider 289, we discover it is a known multiple of 17, remove 289 from the map, and insert 17+289 = 306. In essence we are building a map of iterators keyed by their current value. As it happens, 306 has prime factors 2, 3 and 17, so when 306 is considered, it is removed and entries inserted for 306 + 2, 306 + 3 and 306 + 17. Each of the iterators at that value is incremented appropriately.

Let’s quickly hack it together. We’re going to store the table of iterators in a plain hash-map, with the ‘crossed off’ values as the key, and a vector of prime factors as value.

;; Given a seq of iterators and a prime, generate a key-value list of
;; the next values of the iterators (as keys) and new lists for the prime factors
(defn gen-iterator-keyvals [iterators prime]
  (mapcat #(list (+ % prime) [%]) iterators))

;; update the iterator-map when a 'crossed-off' (composite) value is reached.
(defn update-iterators [prime iterator-map]
  (let [iterators (apply hash-map (gen-iterator-keyvals (get iterator-map prime) prime))
        basemap (dissoc iterator-map prime)]
    (merge-with concat basemap iterators {(* prime prime) [prime]})))

(defn lazy-eratosthenes [iterator-map [x & xs]]
  (if (contains? iterator-map x)

    ;; if the value is 'crossed-off', it's not a prime so don't cons it. But update the map.
    (lazy-eratosthenes (update-iterators x iterator-map) xs)

    ;; otherwise cons the value on, and simply add an entry to the map.
    (cons x (lazy-seq (lazy-eratosthenes (merge-with concat iterator-map {(* x x) [x]}) xs)))))

(defn primes []
  (cons 2 (lazy-seq (lazy-eratosthenes {2 [2]} (iterate inc 2)))))

(take 10 (primes))
(2 3 5 7 11 13 17 19 23 29)

Performance? Well, the old version had lower constant factors, so the better algorithm only starts to show through at about N=550. As we get to larger numbers, the improvement is clear. For one, it isn’t prone to stack overflows like the original (and the canonical lazy-seq example)! In fact I won’t give graphs here (see the O’Neill paper for some) because the stack overflows in the original are limiting what I can do, but by about N=4000 we’re looking at about an order of magnitude improvement.

by James Uther on 31/12/13

CodeMesh 2013 Redux

Last month I attended the CodeMesh conference here in sunny London, along with a couple of my colleagues. Here are my recollections and thoughts.

The venue (the Hotel Russell on Russell Square) is a pleasantly rambling, grand old hotel, which hosted a few hundred hardcore geeks fairly well. A couple of the rooms were a bit small and not entirely suited to the task, though the main lecture hall and the exhibition/mingling/eating space alongside worked very well. Food was good and the craft beers on the first night were exactly what is missing from most such parties. Most event organisers don’t do any better than Becks, so the variety of beer here and the live, Clojure-generated beats made it stick in the memory.

The content of the conference (I didn’t just go for the beer and food) was tending strongly towards functional programming, new languages and general deep geekery. And for that I applaud it, though I found the quality of the talks variable. Putting it less politely, a few were atrocious, mainly due to scatty content and poor exposition, but there were just enough gems to make it worthwhile. I think I may have chosen badly between the available tracks as my colleagues seemed to fare better. I’ve been to enough conferences to know that this is par for the course, and like a round of golf, it just takes a couple of standouts to make it all worthwhile.

For me, Jafar Husain’s talk “End to End Reactive Programming at Netflix” was a sparkling example, and I have already presented a mini version of it back at base, introducing my colleagues to the Reactive Extensions (Rx) from a JS point of view, and including some nice examples that I found from Bacon.js, an alternative implementation based on the same principles. I strongly recommend looking at Rx for JavaScript if you’re working in the browser, and at the many other ports for use on the server or various UI frameworks. I’ve actually been exposed to the Scala version myself recently as part of the Principles of Reactive Programming course on Coursera, which was nicely timed to bolster my understanding.

More generally, the key themes that stuck out to me, with a heavy helping of personal opinion, were as follows.

  • Scala has a lot of mindshare and people are doing real things with it. It also brings with it a great deal of scepticism from those that think it’s too complicated or too impure as a functional language. I share those feelings, but I’m learning and using it all the same as it may be the best thing available right now and allows me to mix and match paradigms within a single program, bringing pragmatism to problem solving. I spoke to Bruce Tate after his talk on the evolution of programming languages and he reckoned Scala is a good bridge language to a better place that probably doesn’t exist yet (I’m paraphrasing).
  • Haskell maintains the moral high ground and the talk about what they’re doing with it at Facebook was a necessary example of its use in the real world. Having one of the long time Haskell compiler engineers working on the project has clearly helped, and indeed they have made compiler changes to enable their project to succeed. Somehow Haskell leaves me cold however, at least for now. Perhaps because it’s that much more of a stretch away from the JVM and its purity feels more like trouble than freedom. Maybe I’ll get there in the end.
  • People are really trying hard to bring new languages to the table that learn from what has gone before. Rust looks like a nice approach to systems programming and I’d definitely consider it if faced with a problem that looked like it required C or similar.
  • There was a background level of Erlang, but perhaps more interesting was Elixir, a new functional language that runs on the Erlang VM. Dave Thomas was up on the stage with José Valim, Elixir’s creator, waxing hysterical about how he was as excited as he was about Ruby back in the day. Personally I’m not feeling it, perhaps because I’m tending towards statically typed languages recently, having done plenty of Ruby and Groovy in years past.

To summarise, it was a little hit and miss, but allowed me to put my finger on the pulse of this section of the tech community, and to learn some worthwhile lessons. Many of those lessons were of the intangible, hand-wavy sort, but I value them highly as we try to navigate the currents of the fast-moving tech world. Then again, one of the things we learn is that at a higher level it doesn’t move that quickly and a lot of the stuff that’s hot right now has its origins in the 70s and 80s.

by Sam Carr on 29/12/13

Documenting an HTTP API with Swagger

I recently tried out Swagger, for documenting an HTTP API. The big win with Swagger is that it provides a sweet HTML UI to browse your API docs and experiment with sending requests and viewing responses, which is a great experience for other developers that are trying to get to grips with your API. Try out their demo of the Swagger UI, for a simple petstore example.

Swagger petstore example - screenshot

Swagger is effectively three things ‘architecturally’:

  • A specification for the JSON files, which contain your API documentation in the abstract
  • Various “generator” tools and libraries (many third party) for producing those JSON files in the first place, either statically or dynamically
  • Swagger UI, a static HTML/JS app which consumes those JSON files and presents the nice UI.

Ideally you use a generator that extracts the API documentation for free, right from your API code. Many such generators are available for various languages and HTTP/REST frameworks, so you may have to do very little to get a free lunch here. However typically you’d expect to use extra annotations to further document the API with the useful human-facing semantic information which isn’t present in the raw code, and further annotations may be required to clarify object serialisation etc. Still, this is a pretty good facsimile of the promised land, where documentation stays in sync with the code and never goes stale!

In our specific case we were working with an app written in Go, so could potentially have used the go-restful library for our REST services, which has Swagger support built into it. However we were already committed to another library that didn’t have that support and being new to Swagger we couldn’t be sure if it was worth switching libraries or wiring up our own swagger integration. We decided to prototype a Swagger solution by hand-crafting the JSON files in the first instance, to see if we (and our users) liked the results. This showed up a particular challenge that is worth covering here.

You can’t do URL hierarchies with static file serving

A typical REST API will have URL hierarchies such as /users (which lists users) and /users/fred-smith (details for a specific user), and indeed the Swagger JSON file URLs consumed by Swagger UI are assumed to be in this sort of hierarchy. Swagger UI consumes Swagger JSON files via HTTP: you give it the URL of the main JSON “resource listing” file which provides URLs for the subordinate “API declaration” files. If that resource listing is served from /main, it expects the API declarations to be at /main/user, /main/product etc. and this is hardcoded into the way it constructs URLs. Unfortunately if we want to provide these JSON files by simply serving them via Nginx, straight from disk with no smarts, we’re out of luck as your average filesystem cannot have both a file “main” and a directory “main” in the same parent directory. You just can’t do it, so you can’t serve up a hierarchy like that from static files.

Obviously you could configure your web server more intricately, mapping individual URLs to individual files to construct the hierarchy. This isn’t appealing however, especially as Swagger UI itself can be served statically (it’s just static HTML, JS, CSS etc.) and we are simply including our JSON files within its directory structure. Three simple lines of config in Nginx should be enough to serve up swagger-ui and our included JSON files:

location /api-docs {
    alias /opt/swagger-ui;
}

The root problem here is that Swagger UI is extremely simplistic about how it interprets paths in the top-level resource listing JSON. It assumes that the paths to the individual API declaration files can simply be concatenated to the resource listing path, as if they are laid out in a pure hierarchy as sub-resources. If the resource listing is at /api-doc.json and it references a path “users.json” then Swagger UI concatenates these and looks for the API declaration at /api-doc.jsonusers.json. This looks especially bad if you have a .json extension and no leading / on the path. By fixing those two problems we get a bit closer but it’s still looking for /api-doc/users and as mentioned above, we can’t have both a file and a directory named “api-doc” in the filesystem so we are stuck. As an aside, losing the file extension is worth doing regardless, as Swagger UI uses the full name as the title for each section of the docs and you really want “users” rather than “users.json” as your heading.

The trick to win the day here is to use a path like “/../users” in the resource listing. Then the concatenated path is /api-doc/../users which is ultimately resolved to just /users. That being the case, we can put our JSON files “api-doc” and “users” in the same directory (even though Swagger likes to consider them as hierarchical) and they will link together correctly. If you do want the API declaration files to be down a level, you could use “/../apis/users” and put them in an “apis” directory one level deeper than the resource listing file. The key here is that we don’t have to have a file and directory with the same name.
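
Concretely, the resource listing ends up looking something like this (a sketch in the Swagger 1.x resource-listing format; the file names and descriptions are ours to choose):

{
  "apiVersion": "0.1",
  "swaggerVersion": "1.2",
  "apis": [
    { "path": "/../users",    "description": "User operations" },
    { "path": "/../products", "description": "Product operations" }
  ]
}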

by Sam Carr on 27/11/13

Optimizing loops in C for higher numerical throughput and for fun

We had one of those typical C-vs-Fortran discussions here at LShift, which prompted me to follow up on it. In this holy war I stand by C and believe that a lot of opinions supporting the alleged superiority of Fortran in numerical throughput come from a poor understanding of what actually can be done on the C side. So, I’d like to demonstrate here a couple of loop optimization techniques that can be used in C in order to maximize numerical throughput. Why loops? Because they often constitute the bulk of computations and have significant potential for utilizing the architecture of modern CPUs. And if your code doesn’t spin in such loops, you should strive to make it do so; I hope this post provides convincing arguments as to why.

I use my laptop with a SandyBridge CPU, Core i5, running at 1.7GHz, and as a compiler, GCC 4.8. As a task, I’ve picked sum of squares of N vectors:

    y[i] = x1[i]^2 + x2[i]^2 + ... + xN[i]^2

It nicely fits the bill: sum of squares is commonplace in numerical computations, uses addition and multiplication, allowing us to demonstrate mechanisms available in modern CPUs, and has the desired structure of:

    for i ...
        for j ...
            do stuff here

I use snippets of a bit simplified C here, and although all tested code is pure C, I also use chunks of actual assembly produced by GCC to illustrate arising issues. The assembly snippets are for SSE target as GCC tends to produce cleaner code for SSE than for AVX.

OK, down to business: our naive loop summing squares of N vectors could be:

    for(i = 0; i < VECTOR_LEN; i++)        // mulss   %xmm0, %xmm0
        for(j = 0; j < N; j++)             // addss   %xmm0, %xmm1
            acc[i] += v[j][i] * v[j][i];   // movss   %xmm1, (%rdi)

Disassembly reveals that XXXss instructions are used. This subset of SSE instructions operates on one floating-point word at a time, not 4. This code is clearly not taking full advantage of the SIMD units and limits the throughput to 1xCPU_CLOCK. Since, logically, it doesn’t matter which loop is inner and which is outer, we can swap them while the algorithm remains valid. Now we have the “horizontal” loop:

    for(j = 0; j < N; j++)                  // mulps   %xmm0, %xmm0
        for(i = 0; i < VECTOR_LEN; i++)     // addps   %xmm0, %xmm1
            acc[i] += v[j][i] * v[j][i];    // movups  %xmm1, (%rdi, %rax)

Boom! This time, GCC vectorised the inner loop using full-width XXXps SSE instructions. This single change boosts performance to the expected SIMD_WIDTHxCPU_CLOCK mark, as will be shown below. Too bad that GCC cannot do this simple optimization for us automatically, but as far as I remember, ICC can. Moving on, the next logical step would be to unroll calculations “vertically”, which should reduce the number of memory reads/writes. An example of such a manually unrolled loop:

    for(j = 0; j <= N-4; j+=4)                 // movups  (%rdi, %r9), %xmm0
        for(i = 0; i < VECTOR_LEN; i++)        // mulps   %xmm1, %xmm1
            acc[i] += v[j+0][i] * v[j+0][i];   // addps   %xmm1, %xmm0
            acc[i] += v[j+1][i] * v[j+1][i];   // movups  %xmm0, (%rdi, %r9)   <== redundant write
            acc[i] += v[j+2][i] * v[j+2][i];   // movups  (%rax, %r9), %xmm1
            acc[i] += v[j+3][i] * v[j+3][i];   // mulps   %xmm1, %xmm1
                                               // addps   %xmm0, %xmm1
                                               // movups  %xmm1, (%rdi, %r9)   <== redundant write

Here we see the infamous effect of pointer aliasing so often brought up in C-vs-Fortran discussions. For each line of calculations the compiler produces extra read/write instructions which defeat the intended purpose of vertical unrolling. Luckily, the solution is trivial: an extra variable in the inner loop makes the compiler produce code which caches calculations in a register. Here is the “cached” loop:

    for(j = 0; j <= N-4; j+=4)             // movups  (%rcx, %r9), %xmm1   <== single reads
        for(i = 0; i < VECTOR_LEN; i++)    // movups  (%r8, %r9), %xmm0
            float tmp = acc[i];            // mulps   %xmm1, %xmm1         <== bulk calculations
            tmp += v[j+0][i] * v[j+0][i];  // addps   %xmm4, %xmm3
            tmp += v[j+1][i] * v[j+1][i];  // mulps   %xmm0, %xmm0
            tmp += v[j+2][i] * v[j+2][i];  // addps   %xmm3, %xmm2
            tmp += v[j+3][i] * v[j+3][i];  // addps   %xmm2, %xmm1
            acc[i] = tmp;                  // addps   %xmm1, %xmm0
                                           // movups  %xmm0, (%rdi, %r9)   <== single write

Now the block of resultant SSE operations is compact and doesn’t have the redundant memory accesses. The last optimization I’d like to introduce further leverages the capability of the modern CPU to parallelize independent streams of operations. In order to do this, we need to break dependency chains; in other words, split the calculations into independent sequences executed on separate registers and execution units. Here is our “final” loop:

    for(j = 0; j <= N-8; j+=8)
        for(i = 0; i < VECTOR_LEN; i++)
            float tmp1 = acc[i];
            float tmp2 = v[j+0][i] * v[j+0][i];
            float tmp3 = v[j+1][i] * v[j+1][i];
            tmp1      += v[j+2][i] * v[j+2][i];
            tmp2      += v[j+3][i] * v[j+3][i];
            tmp3      += v[j+4][i] * v[j+4][i];
            tmp1      += v[j+5][i] * v[j+5][i];
            tmp2      += v[j+6][i] * v[j+6][i];
            tmp3      += v[j+7][i] * v[j+7][i];
            acc[i] = tmp1 + tmp2 + tmp3;

The C code I used for testing all the above loops is here. To rule out memory bandwidth issues as much as possible, I ran tests on a bunch of vectors small enough to fit into the L1 cache. Throughputs for a single core:

                SSE              AVX
naive:       1733.4 MFLOPS    1696.6 MFLOPS    // 1xCPU_CLOCK barrier for scalar instructions
horizontal:  5963.6 MFLOPS    9419.8 MFLOPS    // 4xCPU_CLOCK and 8xCPU_CLOCK for SSE and AVX
unrolled:   11264.8 MFLOPS   11496.6 MFLOPS
cached:     14253.7 MFLOPS   15086.5 MFLOPS
final:      17985.4 MFLOPS   18210.4 MFLOPS    // Both, SSE and AVX settle at around 10xCPU_CLOCK

So it seems this midrange laptop CPU could potentially get us some 35 GFLOPS out of its two cores, resorting to nothing more than simple changes in pure C.

Things to consider:

  • Why did we manage to get a throughput of 10xCPU_CLOCK for SSE even though SSE operates on 4-float chunks? The SandyBridge architecture has separate execution units for addition and multiplication capable of operating fully in parallel; this effectively means SandyBridge can perform as if it had a fused add-mul, upping the theoretical limit in some situations to 8xCPU_CLOCK for SSE and 16xCPU_CLOCK for AVX.
  • Why then do we see 10xCPU_CLOCK, not 8xCPU_CLOCK, for SSE? SandyBridge CPUs have the TurboBoost feature, which provides extra headroom under certain circumstances; TurboBoost, however, may be very limited, especially when you properly harness all cores of your CPU.
  • Why then didn’t we get more than 10xCPU_CLOCK, and why did we hit the same wall for AVX? We hit an L1 memory bandwidth bottleneck; further memory access optimizations are needed, and this is where carefully handcrafted assembly code may squeeze even more out of the CPU.
  • How generic / reliable are these optimization techniques? Well, it’s a product of code, compiler and CPU, so your mileage may vary. But if you really wanna get a real kick for your buck, you will have to research and experiment with your own devices.
  • Last but not least, keep in mind that such changes are likely to produce slightly different outputs due to floating-point rounding.

 

by jarek