Dear blog, I’ve got the hots for Haskell.
Its terse syntax is intimidating at first, but as you keep exploring, you start seeing its beauty and even possible ways to write a Hello World program.
But let’s start from the easy bits: this language makes it trivial to compose functions, and in a sense specialize the bare language to represent more closely the semantics of the computational problem at hand.

One of the most basic higher order functions is map, which takes a unary function f and a list l and returns a new list, made of mapping f onto every element of l:

map         :: (a -> b) -> [a] -> [b]
map f []     = []
map f (x:xs) = f x : map f xs

Silly examples!

> map (+1) [1,2,3]
> map (\c -> if c=='o' then 'i' else c ) "potatoes"

Similarly, the right ‘fold’ operation (called ‘reduce’ in other languages) foldr takes a binary function f, a starting object v and a list (x:xs) , and maps f across the elements of (x:xs) recursively:

foldr :: (t -> t1 -> t1) -> t1 -> [t] -> t1
foldr f v [] = v
foldr f v (x:xs) = f x (foldr f v xs)

Binary functions? We’re not restricted to arithmetic operations! If we consider f in the example above as the functional composition operator (.), and the starting element v being the identity function id, the Haskell typechecker will infer that foldr (.) id just requires a list of a -> a functions (i.e. that share the same type signature as id, in this case) and an element to start with!
So here’s compose :

compose :: [a -> a] -> a -> a
compose = foldr (.) id

This idea of representing n-ary functions as daisy-chains of unary elements is called “currying“, and is crucial for reasoning about programs, as we can see in the above example.
Functional composition is implemented as follows, where “\u -> .. body .. ” is the idiom for anonymous (lambda) functions of a variable u:

(.) :: (b -> c) -> (a -> b) -> a -> c
(.) f g = \x -> f (g x)

Another thing of beauty is unfold, which serves to build data structures up from a prototype element.
In this case, it’s specialized to the List monad, so the neutral element is the empty list [] and the structure constructor is the list concatenation operation (:).

unfold :: (a -> Bool) -> (a -> b) -> (a -> a) -> a -> [b]
unfold p h t x
  | p x       = []
  | otherwise = h x : unfold p h t (t x)

See the type signature? We need a function from a domain to Booleans (a “decision” function p), two functions h and t that are applied to the head and to the recursion tail, respectively.

Unfold can be generalized to arbitrary recursive structures, such as trees, heaps etc.

As a quick example, a function that takes a positive integer and returns its digits as a list can be very compactly formulated in terms of unfold (however, leading zeroes are discarded):

toDigits :: Integer -> [Integer]
toDigits = reverse . unfold (==0) (`mod` 10) (`div` 10)
> toDigits 29387456

Next up: Monoids! Functors! Monads! “Hello World” ! and other abstract nonsense.


Coffee + graphs

Good morning!

Today I kick off with some more Algorithms basic theory. Tim Roughgarden, the instructor, has a precise yet captivating delivery style (not too informal, not too nerdy), which makes this worthwhile.
In general, I noticed the same pattern during all of my studies: the instructor has a large responsibility in keeping the students’ heads up, in both senses.
The presentation of even the most formal of subjects has to be enriched by “refreshing” elements. For example, my Calculus 1, 2, 3 AND 4 teacher, the never-enough-celebrated G. C. Barozzi, had a way of embedding anecdotes and biographical facts about mathematicians. This both put the subject in historical context, making it feel less abstract, and relieved the audience from continued focus. Seems too nerdy still? You’ve found the wrong blog then! Ha.

I can agree that some subjects might feel a bit “dogmatic”, as formal proof of all their aspects would require a disproportionate effort from large part of the audience. In this case, it is up to the lecturer to select the more illuminating material beforehand : Longum iter est per praecepta, breve et efficax per exempla (Seneca)

So, contraction algorithms for the MINCUT problem for graphs: given a graph G = \{V, E\}, with nodes V and edges E, which is the smallest set of edges \tilde{E} that divides G in two otherwise disjoint subsets of nodes V_1 and V_2 (i.e. the smallest cutset)?

Random contraction (Karger’s algorithm)
Sequentially remove edges uniformly at random, merging the resulting nodes V(E_{k}) = V_i, V_j \rightarrow V_i (“edge contraction”) and rerouting the edges initially connected to V_i and V_j to V_i.
The cutset that remains when there are only two nodes left is a local optimum of MINCUT.
A single run of Karger’s algorithm (adapted from Wikipedia)

If N_E = |E| is the edge set size, there are N_E! possible contraction sequences. Only a subset of these yield the true MINCUT (i.e. the problem is path-dependent).

Recursive algorithms

September 14, 2013

A recursive algorithm calls itself with a subset of the initial data until its base case is reached (e.g. data cannot be split further).

The canonical example is the factorial operation: x! = x (x-1) (x-2) \cdots 2 .
A straightforward Python function that implements this idea follows:

def fact(x):
    if x <= 1:
        return 1
        return x*fact(x-1)

In practice, recursion actually can speed up the execution of divide-and-conquer (DNQ) algorithms (i.e. in which the initial problem is sequentially split in more manageable sub-problems), as discussed in the next section.

Analysis: the Master Theorem
An asymptotic estimate of the running time T of a recursive DNQ algorithm with deterministically equal size subproblems is given by the following formula: \displaystyle T(n) \leq a T(\frac{n}{b}) + O(n^d), where n is the data size, a \geq 1 and b >  1 are the number of recursive calls per level (i.e. the branching factor of the recursion tree) and the associated reduction in input size, and O(n^d) describes the computational effort associated with preprocessing (i.e. all that is done before the recursions begin).
One can then identify three cases (for simplicity, n is assumed to be a power of b):

  • If a = b^d then T(n) = O(n^d \log n)
  • If a < b^d then T(n) = O(n^d)
  • If a > b^d then T(n) = O(n^{\log_b a})

The first assignment for Algorithms 1 is an estimation for the percolation threshold in an n-by-n array of sites; the system is said to percolate whenever there exists an uninterrupted series of connections between the top “edge” and the bottom one.

We need a fast lookup for the neighborhood of any given site; the neighbors() method checks whether a site is in the middle or on the sides or corners of the grid:

def neighbors(ii, n):
    "returns list of neighbors of ii in n-by-n grid"
    l = []
    if mod(ii, n) > 0:
    if mod(ii-n+1, n) > 0 or ii == 0:
        l.append(ii + 1)
    if ii > n-1:
        l.append(ii - n)
    if ii < n*(n-1):
        l.append(ii + n)
    return l

I use two lists: L is a pointer array as in the previous post and O contains the state tags (‘#’ for closed, ‘.’ for open site). There are two extra sites, serving as roots for the top, resp. bottom sites.

def percolation_frac(L, O, sites):
    for ii, isite in enumerate(sites):
        O[isite] = '.'
        if isite < n:
            L[isite] = ridxTop
        elif n*(n-1) < isite <= N:
            L[isite] = ridxBot
        neighb = neighbors(isite, n)
        for inb in neighb:
            if O[inb]=='.':
                L = wqupc2(L, isite, inb)                
        if L[-1]==L[-2]: 
            """ percolation: 
            top and bottom virtual sites 
            belong to same component
            pf = float(ii) / N
            return pf

As there is no analytical means of predicting when the percolation phase transition is going to occur, we just run the experiment in Monte Carlo mode (many times with randomized initialization), and average the per-run percolation threshold estimate.
Theory tells us that in a square lattice, this value lies around 0.5927 .
My implementation bumps into a few false positives, possibly due to the order in which the union operation is performed, thereby skewing the average more toward 0.6something. Need to look into this.

Union-find Algorithms

August 25, 2013

Started following Coursera-Princeton’s Algorithms 1.
Being a non-computer scientist, I started missing these theoretical foundations in some of my recent scientific work, so this course might fill a few gaps.
Too bad they don’t offer any completion certificate (as other Coursera courses do), therefore I won’t be submitting the programming exercises (Java, btw).
Instead, I’ll work on the assignments in Python, to my heart’s content.

We start from the very top: n objects belonging to m \leq n disjoint sets (“connected components”, in graph terms).
Often one wishes to query whether two items belong to the same set and, in case, merge two sets.
If sets are represented as trees, a common tree root node indicates that two given items belong to the same set.

Tree root-find with recursion
Data structure: L is a pointer list, a list of integers which can be traversed recursively to find the root (i.e. root of element i is L[ ... L[ i ] ... ] ):

def root(L,ii):
    "ridx = tree root of element ii"
    "rd = depth of ii"
    ridx = ii
    rd = 0
    if L[ii] != ii:
        ridx, rd = root(L, L[ii])
        rd += 1    
    return ridx, rd

The branch() method outputs the list of nodes connecting ii to its root:

def branch(L,ii):
    "return branch from ii to root as list"
    if L[ii] != ii:
        Lout = Lout + branch(L, L[ii]) 
    return Lout

There is an interesting discussion about recursion in Python and its real implications here, the take-home message being simply that one should not implement iterative procedures (‘loops’) as recursion.

This algorithm needs the items to be arranged in a set of trees (a ‘forest’), so that the set query is simplified to an identity test of the respective tree roots.
Using the code above, this simply reads:

def qu(L, ii1, ii2):
    "attach subtree of ii2 to root of ii1"
    ridx1, rd1 = root(L, ii1)
    ridx2, rd2 = root(L, ii2)
    L[ridx1] = ridx2
    return L

A shortcoming of this approach is that trees tend to get tall and skinny, so k searches in them, in the worst case, require O(k n) operations.

Weighted Quick-Union
The complexity of a search on Union-Find trees can be kept manageable by weighting, i.e. one subtree in WQU is attached to the second’s root if the former’s size is smaller.
A more precise notion of size is in order: e.g. the total number of elements in the subtree or the subtree depth.
If we settle for the latter definition (union-by-depth),

def wqu(L, ii1, ii2):
    "weighted quick union"
    "attach subtree of ii2 to subtree of ii1,"
    "if the root distance of former is smaller"
    ridx1, rd1 = root(L, ii1)
    ridx2, rd2 = root(L, ii2)
    if rd1 <= rd2:
        L[ridx1] = ridx2
        L[ridx2] = ridx1
    return L  

This balancing helps to keep the complexity of a search on L at n+k\log_2(n) (where \log_2(n) is the average number of subtree size doublings).

Weighted Quick-Union with Path Compression
An optimization on WQU is given by “path compression”: at union time, all the parents of the node to be attached are set to be the root node of the larger tree.
A (single-pass) WQUPC code reads:

def wqupc2(L, ii1, ii2):
    "WQU with path compression"
    B1 = branch(L, ii1)
    B2 = branch(L, ii2)
    rd1 = len(B1)
    rd2 = len(B2)
    ridx1 = B1[-1]
    ridx2 = B2[-1]
    if rd1 <= rd2:
        for ii in B1:
            L[ii] = ridx2
        for ii in B2:
            L[ii] = ridx1
    return L   

While doing what it says on the label, the code above uses two recursive calls to find the branch roots (with the known memory and depth size limitations of recursion), storage of the branches and a loop along the smaller of the two branches.
A smarter implementation is clearly necessary for large-scale applications.
If the search tree can be “flattened out” maximally, the complexity of a union search can be further reduced so as to approach (but never reach) the size n of the data items.
That is, k union-find operations on n objects organized in a flattened tree need less than c \left( n + k \log^*_2(n) \right) array accesses, where \log^*() is the iterated log function (Hopcroft-Ullman, Tarjan).
BTW, the iterated logarithm is defined as the number of times \log() has to be applied in order to obtain \leq 1, and is an extremely slowly growing function (e.g. \log^*(2^{2^{16}})\sim 5).

Results and conclusions
We list the theoretical worst-case computational burden, along with typical figures for the average branch depth of the trees generated with each algorithm (obtained with sets of n=1000 integers, after 10^5 random union operations):

  • Quick-union
    Complexity of k searches on n items: O(k n)
    Average branch depth range: 124 – 133
  • Weighted quick-union
    Complexity of k searches on n items: O( n + k log2(n) )
    Average branch depth range: 5.3 – 8.3
  • Weighted quick-union with path compression
    Complexity of k searches on n items: O( n + k log*2(n) )
    Average branch depth: 2.01 (variations at the third decimal)

Path compression clearly minimizes the overhead associated with finding the set label of large collections of items.
Simple code leading to very deep mathematics and enabling huge-scale computation .. my appetite, on the other hand, is growing quickly.