July 3, 2020

Clojure and The Joy of Seqs

Introduction

Recently I've seen some great posts out there on creating sequences in Clojure and wanted to throw out a few of my own solutions as well as give some detailed explanations for those new to Clojure and the idea of creating lazy seqs, especially using the iterate function.

I think it is super fun to reduce a problem to its essence and provide an implementation that is very close to the mathematical description of the sequence. The fact that we have laziness on our side means we can do things like describe the problem without having to worry about termination or other concerns and just make the seq happen.

Enough talk, let's look at some code!

Primes - A Lazy Seq of Seqs

The following is an implementation of the Sieve of Eratosthenes that uses some very cool lazy sequence magic. The idea is quite simple:

  1. Start with a sequence beginning with the first prime (drop 2 (range)).
  2. Iteratively remove all items in the sequence divisible by the first item in the sequence. Each successive sequence will have been filtered by the previous primes so the first item in that sequence is also prime.
  3. This results in a lazy sequence of lazy sequences (super lazy). Map over the first item in each sequence and you've got your primes. Under the covers, each sequence is consuming the prior sequence as needed to produce the next value.
(def prime-seq
 (letfn [(step [[prime & r]] (remove #(zero? (mod % prime)) r))]
   (map first (iterate step (drop 2 (range))))))

(comment
 ;This works great
 (take 100 prime-seq) ;(2 3 5 7 11 13 17 19 23 29 ...)
 (nth prime-seq 1234)
 ;=> 10067
 
 ;But at some point will blow up.
 (count (take 2900 prime-seq))
 ;Execution error (StackOverflowError) at (REPL:1).
 ;null
 ;Ugh!
 (count (take 2900 prime-seq))
 ;=> 2698 - Note that the number it blows up at will vary, but is generally in the high 2000s.
)

This is a beautiful thing. Sadly, it blows up somewhere in the range of the high 2000 primes. My poor, beautiful sequence! Without diving into the details of the failure, clearly we've taxed this implementation to its limits. If you do want to better understand what happened, see this post by Stuart Sierra.

Primes - Iterate

Given the above limitation, let's try creating a prime seq instead by iterating on a map containing the value of the calculation at each step. You can generate any lazy sequence with iterate by defining seed value, a step function that returns a structurally similar value as the input, and then mapping out the desired output (if needed). It's great because you can break out these different pieces and reason about them independently.

In this case the seed is {:primes [] :n 2}. I'm keeping track of the current vector of primes to use as a sieve as well as the current number.

I need a step function to iterate on:

(defn prime-step [{:keys [primes n]}]
  (if (not-any? (fn [prime] (zero? (mod n prime))) primes)
    {:primes (conj primes n) :n (inc n) :prime n}
    {:primes primes :n (inc n)}))

prime-step takes the seed and sees if the current number is divisible by any of the known primes. Note that :primes is a vector because I want to conj new primes on the back such that I always run my sieve from first to last, effectively doing the exact filtering I did in the previous solution. If the number is prime, I conj it onto the vector of primes and add it as an entry in my map (:prime). In either case, I increment the current number.

I can now do a few manual steps of this function to see that it behaves as expected:

(prime-step (prime-step {:primes [] :n 2}))
;=> {:primes [2 3], :n 4, :prime 3}

Once I'm satisfied with a few manual steps, I'll iterate on it with a take to limit output:

(->> {:primes [] :n 2}
     (iterate prime-step)
     (take 5))
=>
({:primes [], :n 2}
 {:primes [2], :n 3, :prime 2}
 {:primes [2 3], :n 4, :prime 3}
 {:primes [2 3], :n 5}
 {:primes [2 3 5], :n 6, :prime 5})

Perfect! Now I just need to map over the :prime key and eliminate entries with no :prime key, like so:

(def prime-seq2
  (->> {:primes [] :n 2}
       (iterate prime-step)
       (map :prime)
       (filter identity)))

Depending on your preference, once you have a good solution you may want to just inline the step function anonymously.

(def prime-seq2
  (->> {:primes [] :n 2}
       (iterate (fn [{:keys [primes n]}]
                  (if (not-any? (fn [prime] (zero? (mod n prime))) primes)
                    {:primes (conj primes n) :n (inc n) :prime n}
                    {:primes primes :n (inc n)})))
       (map :prime)
       (filter identity)))

This solution does not suffer from the limitations of the old one. Give it a shot in a REPL.

(comment
  (take 10 prime-seq2)
  ;=> (2 3 5 7 11 13 17 19 23 29)
  (take 10000 prime-seq2)
  (nth prime-seq2 10000))

One protip is that you can turn any loop-recur into an iterate by converting the loop bindings into a map. This separates the concerns of iteration, stepping, and termination. This is exceptionally useful for both reasoning about your code as well as creating small, reusable pieces of functionality that can be handled in different ways (e.g. use the same seq to get the nth prime vs. n primes).

The Fibonacci Sequence

I can use the same iterate strategy for other sequences. Here's a solution for the Fibonacci sequence.

(def fib-seq
  (letfn [(step [[lo hi]] [hi (+ lo hi)])]
    (map first (iterate step [0N 1N]))))

The thing I like most about this solution is that when I first implemented it I used a seed of [0 1] which will overflow on the 92nd element. It was trivial to change the seed to [0N 1N] to avoid this problem using BigIntegers.

Pascal's Triangle

Pascal's Triangle is used to compute binomial coefficients. Here's a lazy seq implementation:

(def pascals-triangle-seq
  (letfn [(step [v] (map (partial apply +) (cons '(1) (partition-all 2 1 v))))]
    (iterate step '(1))))

The step function in this case bears a little explanation. Each iterative step takes the current row of the triangle. (partition-all 2 1 v) will create the pairs in the next row to be summed. For example:

(partition-all 2 1 [1 3 3 1])
;=> ((1 3) (3 3) (3 1) (1))

This is good, but I need a (1) at the front for symmetry, so I cons the '(1) on the front.

(cons '(1) (partition-all 2 1 [1 3 3 1]))
;=> ((1) (1 3) (3 3) (3 1) (1))

Finally, I map over the pairs and sum them:

(map (partial apply +) (cons '(1) (partition-all 2 1 [1 3 3 1])))
;=> (1 4 6 4 1)

Iterate on the seed of '(1) and you get something this:

(take 10 pascals-triangle-seq)
=>
((1)
 (1 1)
 (1 2 1)
 (1 3 3 1)
 (1 4 6 4 1)
 (1 5 10 10 5 1)
 (1 6 15 20 15 6 1)
 (1 7 21 35 35 21 7 1)
 (1 8 28 56 70 56 28 8 1)
 (1 9 36 84 126 126 84 36 9 1))

Summary

In this post I show how to create several sequences using iterate. There are other methods such as lazy-seq and lazy-cat but iterate is a workhorse you can use for all kinds of sequences. I generally default to it anywhere I have conditional iteration (rather than loop-recur), especially since the termination condition is separated from the concern of iteration. It is much easier to develop and reason about since the problem can be simplified into its component parts.

Tags: seq clojure iterate sequences laziness