technology from back to front

Archive for the ‘Clojure’ Category

Getting Sieves Right

The great thing about being wrong is that you get to learn something. In a previous post I went on at length about the the sieve of eratosthenes. Now that I have been enlightened by Melissa O’Neill I must make corrections.

The problem is that Erastosthenes posited a fixed array from which we ‘cross off’ non-primes, the classic algorithm being:

  • Write a list of all integers > 1 up to whatever your maximum n is
  • Declare the first non-eliminated integer as a prime p, and eliminate all multiples of this (staring at p^2 as an optimisation). Note that multiples may be found by incrementing by p. A further optimisation is to stop eliminating at sqrt(n).
  • Repeat for the next non-eliminated integer.

The problem is that fixed list at the start, which rules out generators. The usual implementation for generating infinite sequences of primes stores each p and tests future candidates against them. But this is not what Erastosthenes was talking about. O’Neill calls it the ‘unfaithful sieve’ and notes that

In general, the speed of the unfaithful sieve depends on the number of primes it tries that are not factors of each number it examines, whereas the speed of Eratosthenes’s algorithm depends on the number of (unique) primes that are

As an example (from the paper). If we are looking for primes below 541 (the first 100 primes), and find that 17 is a prime, we start crossing of multiples of 17 at 289 (17^2), crossing off 15 times in total. In contrast, the unfaithful sieve will check all numbers not divisible by 2,3,5,7,11 or 13 for divisibility by 17. This is a total of 99  checks.

In the end, the correct Erastosthenes sieve  is ϴ(n log log n) to find all primes up to n, whereas the unfaithful sieve is  ϴ(n^2/(log n)^2).

O’Neill goes and explores Haskell implementations, but what would a good sieve look like in Clojure? Remember that Clojure, the unfaithful sieve looks like this:

(defn lazy-sieve [s]
  (cons (first s)
    (lazy-seq (lazy-sieve (remove #(zero? (mod % (first s))) (rest s))))))

(defn primes []
  (lazy-sieve (iterate inc 2)))

(take 5 (primes))
;= (2 3 5 7)

It stores the list of eliminated primes in nested calls to lazy-sieve. To turn this into a lazy, infinite Erastosthenes sieve, we need to rather store a map of the next ‘crossings off’, along with their prime factors. Another example from O’Neill: When we discover 17 as a prime, we insert the first ‘crossing off’ (17^2 = 289) in the map of upcoming composites along with it’s prime factors (17 in this case). When we come to consider 289, we discover it is a known multiple of 17, remove 289 from the map, and insert 17+289 = 306. In essence we are building a map of iterators keyed by their current value. As it happens, 306 has prime factors 2, 3 and 17, so when 306 is considered, it is removed and entries inserted for 306 + 2, 306 + 3 and 306 + 17. Each of the iterators at that value is incremented appropriately.

Let’s quickly hack it together. We’re going to store the table of iterators in a plain hash-map, with the ‘crossed off’ values as the key, and a vector of prime factors as value.

;; Given a seq of iterators and a prime, generate a key-value list of
;; the next values of the iterators (as keys) and new lists for the prime factors
(defn gen-iterator-keyvals [iterators prime]
  (mapcat #(list (+ % prime) [%]) iterators))

;; update the iterator-map when a prime is found.
(defn update-iterators [prime iterator-map]
  (let [iterators (apply hash-map (gen-iterator-keyvals (get iterator-map prime) prime))
        basemap (dissoc iterator-map prime)]
    (merge-with concat basemap iterators {(* prime prime) [prime]})))

(defn lazy-erastosthenes [iterator-map [x & xs]]
  (if (contains? iterator-map x)

    ;; if the value is 'crossed-off', it's not a prime so don't cons it. But update the map.
    (lazy-erastosthenes (update-iterators x iterator-map ) xs)

    ;; otherwise chain the value on, and add simply add an entry to the map.
    (cons x (lazy-seq (lazy-erastosthenes (merge-with concat iterator-map {(* x x) [x]}) xs)))))

(defn primes []
  (cons 2 (lazy-seq (lazy-erastosthenes {2 [2]} (iterate inc 2) ))))

(take 10 (primes))
(2 3 5 7 11 13 17 19 23 29)

Performance? Well, the old version had lower constant factors, so the better algorithm only starts to show through at about N=550. As we get to larger numbers, the improvement is clear. For one, it isn’t prone to stack overflows like the original (and the canonical lazy-seq example)! In fact I won’t give graphs here (see the O’Neill paper for some) because I’m the stack overflows are limiting what I can do, but by about N=4000 we’re looking at about an order of magnitude improvement.

James Uther

Expanding reducers

When playing with a new bit of language, it can be helpful to restrict the problem space to an old, well understood algorithm. For me at least, learning one thing at a time is easier! For this post, It’ll be prime sieves, and I’ll be exploring clojure reducers.

A quick recap, the sieve of eratosthenes is a not-maximally-non-optimal way of finding primes. It’s usually expressed as follows:
To find primes below n: generate a list of n integers greater than 1 while the list is not empty: take the head of the list and: add it to the output remove all numbers evenly divisible by it from the list
In clojure, something like:
(defn sieve ([n] (sieve [] (range 2 n))) ([primes xs] (if-let [prime (first xs)] (recur (conj primes prime) (remove #(zero? (mod % prime)) xs)) primes))) (sieve 10) ;= [2 3 5 7]
Which is fine, but I’d like it lazy so I only pay for what I use, and I can use as much as I’m willing to pay for. Let’s look at lazy sequences. Luckily for us, there is an example of exactly this on the lazy-seq documentation, which we slightly modify like so:
(defn lazy-sieve [s] (cons (first s) (lazy-seq (lazy-sieve (remove #(zero? (mod % (first s))) (rest s)))))) (defn primes [] (lazy-seq (lazy-sieve (iterate inc 2)))) (take 5 (primes)) ;= (2 3 5 7)
So now we have a nice generic source of primes that grows only as we take more. But is there another way?

A few months ago Rich Hickey introduced reducers. By turning the concept of ‘reducing’ inside out the new framework allows a parallel reduce (fold) in some circumstances. Which doesn’t apply here. But let’s see if we can build a different form of sieve using the new framework. First a quick overview (cribbing from the original blog post):

Collections are now ‘reducible’, in that they implement a reduce protocol. Filter, map, etc are implemented as functions that can be applied by a reducible to itself to return another reducible, but lazily, and possibly in parallel. So in the example below we have a reducible (a vector), that maps inc to itself to return a reducible that is then wrapped with a filter on even? which returns a further reducible, that reduce then collects with +.
(require '[clojure.core.reducers :as r])
We’ll be referring to r here and there – just remember it’s the clojure.core.reducers namespace
(reduce + (r/filter even? (r/map inc [1 1 1 2]))) ;= 6
These are composable, so we can build ‘recipes’.
;;red is a reducer awaiting a collection (def red (comp (r/filter even?) (r/map inc))) (reduce + (red [1 1 1 2])) ;= 6
into uses reduce internally, so we can use it to build collections instead of reducing:
(into [] (r/filter even? (r/map inc [1 1 1 2]))) ;= [2 2 2]
So here’s the core of ‘reducer’, which “Given a reducible collection, and a transformation function xf, returns a reducible collection, where any supplied reducing fn will be transformed by xf. xf is a function of reducing fn to reducing fn.”
(defn reducer ([coll xf] (reify clojure.core.protocols/CollReduce (coll-reduce [_ f1 init] (clojure.core.protocols/coll-reduce coll (xf f1) init)))))
And we can then use that to implement mapping as so:
(defn mapping [f] (fn [f1] (fn [result input] (f1 result (f input))))) (defn rmap [f coll] (reducer coll (mapping f))) (reduce + 0 (rmap inc [1 2 3 4])) ;= 14
Fine. So what about sieves? One thought is we could build up a list of composed filters, built as new primes are found (see the lazy-seq example above). But there’s no obvious place to do the building, as applying the reducing functions is left to the reducible implementation. Another possibility is to introduce a new type of reducing function, the ‘progressive-filter’, which keeps track of past finds and can filter against them.
(defn prog-filter [f] (let [flt (atom [])] (fn [f1] (fn [result input] (if (not-any? #(f input %) @flt) (do (swap! flt conj input) (f1 result input)) result))))) (defn progressive-filter [f coll] (reducer coll (prog-filter f)))
And we then reduce with a filtering function that is a function of the current candidate and one of the list of found primes (see the #(f input %) bit above)
(into [] (progressive-filter #(zero? (mod %1 %2)) (range 2 10)) ;= [2 3 5 7]
It’s nicely lazy, so we can use iterate to generate integers, and take only a few (r/take, as it’s operating on a reducer):
(into [] (r/take 5 (progressive-filter #(zero? (mod %1 %2)) (iterate inc 2)))) ;= [2 3 5 7 11]
Or even
(def primes (progressive-filter #(zero? (mod %1 %2)) (iterate inc 2))) (into [] (r/take 5 primes)) ;= [2 3 5 7 11]
You get the idea.

James Uther

Auto-generating LShift blog posts

I’ve often found myself at a loss for blog post topics, so rather than write one myself I decided to let a computer do the heavy lifting!

Markov chains offer a neat trick for generating surrealist blog oeuvres. They work by figuring out the probability of one word appearing after another, given a suitable corpus of input material.

Read more…

John Wright

Fallacies of a Monad

DSL based templating sucks! This looks a very short beep-like sound card. Let paragraphs rely on a sense of data. Roy recently released my mind: In practice of course, it grew features.

Our first local policy at LShift although we’ve had a Meteor is necessary to distinguish this point though, we want to and hence won’t discuss. Empty is the dark ages trying to wind position when your job to generate input data API calls to understand this blog post. Hello, add our socket buffer, etc.

I’m glad I have addressed your submitter claims to work out what if we just the two commits! You especially love peer review. Perhaps it’s about the motherboard. Success!

Read more…

John Wright

cloverage – a code coverage tool for clojure

A couple years ago we presented a couple design sketches for a code coverage tool for clojure. More recently we spent some time researching whether existing code coverage tools would suffice for our requirements, and after finding out that java based code coverage tools either don’t work at all, or produce unhelpful output, we decided to finally write cloverage. You can find it on github:

To try it out, add the lein-cloverage plugin to your user profile in ~/.lein/profiles.clj:
{:user {:plugins [lein-cloverage "1.0.2"]}}
Then run `lein cloverage` in your project root.

It’s based on a prototype one of our commenters mentioned on Tim’s post. Thanks Mike!

Jacek Lach

TDD for Esoteric Programming Languages (using Clojure and Befunge)

I’ve been learning Clojure recently, and I’d been looking around for a good initial project to test my new knowledge. I’ve always wanted to write a Befunge interpreter, and so decided that sounded like a fun project. Little did I know the maze of twisty little passages I was letting myself in for, but I’ve learnt a lot of interesting things along the way, and there’s a few things worth sharing.
Read more…

Tom Parker

Clojure to Smalltalk translation notes

Clojure has an interesting implementation of a Huet-style zipper. I started translating it to Smalltalk, and in the process discovered a number of things not really related to zippers. Given that the end result ends up looking very similar to something we’ve already seen , let’s talk more about the translation process itself.
Read more…

Frank Shearar



You are currently browsing the archives for the Clojure category.



2000-14 LShift Ltd, 1st Floor, Hoxton Point, 6 Rufus Street, London, N1 6PE, UK+44 (0)20 7729 7060   Contact us