technology from back to front

Archive for December, 2013

Getting Sieves Right

The great thing about being wrong is that you get to learn something. In a previous post I went on at length about the sieve of Eratosthenes. Now that I have been enlightened by Melissa O’Neill I must make corrections.

The problem is that Eratosthenes posited a fixed array from which we ‘cross off’ non-primes, the classic algorithm being:

  • Write a list of all integers > 1 up to whatever your maximum n is
  • Declare the first non-eliminated integer as a prime p, and eliminate all multiples of it (starting at p^2 as an optimisation). Note that multiples may be found by simply incrementing by p. A further optimisation is to stop eliminating at sqrt(n).
  • Repeat for the next non-eliminated integer. (A fixed-array version of this is sketched just below.)
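
For concreteness, here is a minimal sketch of that fixed-array algorithm in Clojure (my code, not from the paper; array-sieve is a name I’ve made up), using a mutable boolean array to record the crossings-off:

;; a sketch of the classic fixed-array sieve, using the optimisations
;; above: start crossing off at p^2, stop sieving at sqrt(n), and find
;; multiples by incrementing by p (the step given to `range`)
(defn array-sieve [n]
  (let [crossed (boolean-array (inc n))]
    (doseq [p (range 2 (inc (int (Math/sqrt n))))
            :when (not (aget crossed p))
            multiple (range (* p p) (inc n) p)]
      (aset crossed multiple true))
    (remove #(aget crossed %) (range 2 (inc n)))))

(array-sieve 30)
;= (2 3 5 7 11 13 17 19 23 29)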

The problem is the fixed list at the start, which rules out generators. The usual implementation for generating infinite sequences of primes stores each prime p and tests every future candidate against all of them. But this is not what Eratosthenes was describing. O’Neill calls it the ‘unfaithful sieve’ and notes that

In general, the speed of the unfaithful sieve depends on the number of primes it tries that are not factors of each number it examines, whereas the speed of Eratosthenes’s algorithm depends on the number of (unique) primes that are.

As an example (from the paper): if we are looking for primes up to 541 (the first 100 primes) and find that 17 is prime, we start crossing off multiples of 17 at 289 (17^2), crossing off 15 times in total. In contrast, the unfaithful sieve will check every number not divisible by 2, 3, 5, 7, 11 or 13 for divisibility by 17: a total of 99 checks.
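
The first of those counts is easy to verify at the REPL:

;; multiples of 17 from 17^2 up to 541: one crossing-off each
(count (range 289 542 17))
;= 15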

In the end, the true sieve of Eratosthenes is Θ(n log log n) to find all primes up to n, whereas the unfaithful sieve is Θ(n^2/(log n)^2).

O’Neill goes on to explore Haskell implementations, but what would a good sieve look like in Clojure? Recall that in Clojure, the unfaithful sieve looks like this:

;; the 'unfaithful' sieve: each prime found adds another filter pass
;; over the remainder of the sequence
(defn lazy-sieve [s]
  (cons (first s)
    (lazy-seq (lazy-sieve (remove #(zero? (mod % (first s))) (rest s))))))

(defn primes []
  (lazy-sieve (iterate inc 2)))

(take 5 (primes))
;= (2 3 5 7 11)

It stores the primes found so far in nested calls to lazy-sieve. To turn this into a lazy, infinite sieve of Eratosthenes, we instead need to store a map of the next ‘crossings off’, along with their prime factors. Another example from O’Neill: when we discover 17 as a prime, we insert the first ‘crossing off’ (17^2 = 289) into the map of upcoming composites, along with its prime factors (just 17 in this case). When we come to consider 289, we discover it is a known multiple of 17, remove 289 from the map, and insert 17 + 289 = 306. In essence we are building a map of iterators keyed by their current value. As it happens, 306 has prime factors 2, 3 and 17, so when 306 is considered, it is removed and entries inserted for 306 + 2, 306 + 3 and 306 + 17. Each of the iterators at that value is incremented appropriately.
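
To make that bookkeeping concrete, here is the 306 step as plain data (a sketch; entries for other composites are elided, and real map ordering will vary):

;; upcoming composites just before 306 is considered;
;; 306 = 2 * 3^2 * 17, so three iterators are parked on it
(def upcoming {306 [2 3 17]})

;; considering 306: remove its entry and advance each factor's
;; iterator by that factor
(merge-with into
            (dissoc upcoming 306)
            (into {} (map (fn [p] [(+ 306 p) [p]]) (get upcoming 306))))
;= {308 [2], 309 [3], 323 [17]}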

Let’s quickly hack it together. We’re going to store the table of iterators in a plain hash-map, with the ‘crossed off’ values as keys, and vectors of prime factors as values.

;; Given the prime factors whose iterators are parked at a value, generate
;; a key-value list of each iterator's next value (as key) and a fresh
;; single-factor list (as value)
(defn gen-iterator-keyvals [factors value]
  (mapcat #(list (+ % value) [%]) factors))

;; update the iterator map when a known composite is reached: drop its
;; entry and advance the iterator of each of its prime factors
(defn update-iterators [composite iterator-map]
  (let [advanced (apply hash-map (gen-iterator-keyvals (get iterator-map composite) composite))
        basemap (dissoc iterator-map composite)]
    (merge-with concat basemap advanced)))
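
A quick REPL check of that update, with a hypothetical map in which 17’s iterator is parked at 289 (map ordering may vary):

(update-iterators 289 {289 [17] 293 [2]})
;= {293 [2], 306 [17]}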

(defn lazy-eratosthenes [iterator-map [x & xs]]
  (if (contains? iterator-map x)

    ;; if the value is 'crossed off' it's not a prime, so don't cons it;
    ;; just update the map
    (lazy-eratosthenes (update-iterators x iterator-map) xs)

    ;; otherwise x is prime: chain it on, and simply add an entry to the
    ;; map, starting its iterator at x^2
    (cons x (lazy-seq (lazy-eratosthenes (merge-with concat iterator-map {(* x x) [x]}) xs)))))

(defn primes []
  (cons 2 (lazy-seq (lazy-eratosthenes {2 [2]} (iterate inc 2)))))

(take 10 (primes))
;= (2 3 5 7 11 13 17 19 23 29)
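
As a sanity check, the two implementations should agree (assuming the unfaithful lazy-sieve from above is still loaded):

(= (take 100 (primes))
   (take 100 (lazy-sieve (iterate inc 2))))
;= true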

Performance? Well, the old version has lower constant factors, so the better algorithm only starts to show through at about N=550. As we get to larger numbers the improvement is clear. For one thing, the new version isn’t prone to stack overflows like the original (and the canonical lazy-seq example)! In fact I won’t give graphs here (see the O’Neill paper for some) because the stack overflows in the original limit what I can do, but by about N=4000 we’re looking at about an order of magnitude improvement.
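
If you want to reproduce the comparison, something like this works at the REPL (a rough sketch: timings vary with hardware and JVM warm-up, and the unfaithful version may need a larger stack, e.g. via -Xss, at bigger sizes):

;; primes below 4000 with each version: lazy-sieve is the unfaithful
;; sieve from the top of the post, primes the Eratosthenes version
(time (doall (take-while #(< % 4000) (lazy-sieve (iterate inc 2)))))
(time (doall (take-while #(< % 4000) (primes))))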

by James Uther on 31/12/13

CodeMesh 2013 Redux

Last month I attended the CodeMesh conference here in sunny London, along with a couple of my colleagues. Here are my recollections and thoughts.

The venue (Hotel Russell on Russell Square) is a pleasantly rambling, grand old hotel, which hosted a few hundred hardcore geeks fairly well. A couple of the rooms were a bit small and not entirely suited to the task, though the main lecture hall and the exhibition/mingling/eating space alongside worked very well. Food was good and the craft beers on the first night were exactly what is missing from most such parties. Most event organisers don’t do any better than Becks, so the variety of beer here, and the live, Clojure-generated beats, made it stick in the memory.

The content of the conference (I didn’t just go for the beer and food) tended strongly towards functional programming, new languages and general deep geekery, and for that I applaud it, though I found the quality of the talks variable. Putting it less politely, a few were atrocious, mainly due to scatty content and poor exposition, but there were just enough gems to make it worthwhile. I think I may have chosen badly between the available tracks, as my colleagues seemed to fare better. I’ve been to enough conferences to know that this is par for the course; like a round of golf, it just takes a couple of standouts to make it all worthwhile.

For me, Jafar Husain’s talk “End to End Reactive Programming at Netflix” was a sparkling example, and I have already presented a mini version of it back at base, introducing my colleagues to the Reactive Extensions (Rx) from a JS point of view, and including some nice examples that I found from Bacon.js, an alternative implementation based on the same principles. I strongly recommend looking at Rx for JavaScript if you’re working in the browser, and at the many other ports for use on the server or various UI frameworks. I’ve actually been exposed to the Scala version myself recently as part of the Principles of Reactive Programming course on Coursera, which was nicely timed to bolster my understanding.

More generally, the key themes that stuck out to me, with a heavy helping of personal opinion, were as follows.

  • Scala has a lot of mindshare and people are doing real things with it. It also brings with it a great deal of scepticism from those that think it’s too complicated or too impure as a functional language. I share those feelings, but I’m learning and using it all the same as it may be the best thing available right now and allows me to mix and match paradigms within a single program, bringing pragmatism to problem solving. I spoke to Bruce Tate after his talk on the evolution of programming languages and he reckoned Scala is a good bridge language to a better place that probably doesn’t exist yet (I’m paraphrasing).
  • Haskell maintains the moral high ground and the talk about what they’re doing with it at Facebook was a necessary example of its use in the real world. Having one of the long time Haskell compiler engineers working on the project has clearly helped, and indeed they have made compiler changes to enable their project to succeed. Somehow Haskell leaves me cold however, at least for now. Perhaps because it’s that much more of a stretch away from the JVM and its purity feels more like trouble than freedom. Maybe I’ll get there in the end.
  • People are really trying hard to bring new languages to the table that learn from what has gone before. Rust looks like a nice approach to systems programming and I’d definitely consider it if faced with a problem that looked like it required C or similar.
  • There was a background level of Erlang, but perhaps more interesting was Elixir, a new functional language that runs on the Erlang VM. Dave Thomas was up on the stage with José Valim, Elixir’s creator, waxing hysterical about how he was as excited as he was about Ruby back in the day. Personally I’m not feeling it, perhaps because I’m tending towards statically typed languages recently, having done plenty of Ruby and Groovy in years past.

To summarise, it was a little hit and miss, but allowed me to put my finger on the pulse of this section of the tech community, and to learn some worthwhile lessons. Many of those lessons were of the intangible, hand-wavy sort, but I value them highly as we try to navigate the currents of the fast-moving tech world. Then again, one of the things we learn is that at a higher level it doesn’t move that quickly and a lot of the stuff that’s hot right now has its origins in the 70s and 80s.

by Sam Carr on 29/12/13
