Tuesday, January 28, 2020

Functional Programming and Data Science Applications

This month we had a talk introducing functional programming, its benefits and challenges, and potential applications to data science.

This blog post won't replicate all of the detail of the talk, but will provide an introduction for newcomers and some examples of key concepts.


The talk was given by Callum O'Brien who has expertise in both functional programming and mathematics. Amongst his many other interests are operating systems and their design.

Callum's slides are here: (plain text), and his code examples are here: (link).

A video recording of the talk is on the group's YouTube channel: [youtube].


Context

Functional programming has often been thought of as a niche, difficult and very theoretical pursuit with little application to the real-world building of technology.

Against this false perception, the last decade has seen a growing interest in functional languages and functional programming styles in everyday languages. Demand for functional programmers is also rising in some sectors.

Today, software is much larger and more complicated than it was 20 or 30 years ago. As the size of software grows, so does the challenge of avoiding or finding errors in that code. Some classes of error are a result of that complexity. Because functional programming avoids entire classes of error by design, the growth of software has been accompanied by a growth in functional programming.


What is Functional Programming?

Every programming language has a model for how it works, and how you should code.

The model for 6502 assembly language is the CPU registers and memory; programs manipulate data amongst these using moves and simple operations like arithmetic. Object-oriented programming languages encourage you to think in terms of semantically high-level objects which contain data and methods which may act on that data.

The model for functional programming is strictly mathematical functions, just like f(x) = 2x + 3 for example. This might seem like an unusual model for a programming language, so let's look at some properties of mathematical functions.

  • Mathematical functions always give the same output given the same input. 
  • Mathematical functions don't refer to information beyond what they are passed as arguments, and don't leave any information beyond what is returned by the function.
  • Simpler mathematical functions can be combined, or composed, to achieve more sophisticated computation.

Let's think about how each of these might apply to a programming language.


1. Referential Transparency
The first property might sound trivial. Of course f(x) = 2x + 3 always gives f(5) = 13. But if you're an experienced programmer you will know that many of the functions we come across don't always behave this way. Too many hours are spent trying to find bugs which cause functions to give unexpected results. The worst bugs are where functions give unexpected results sometimes, but not always.

Now, if a programming language guaranteed that functions always returned the same outputs given the same inputs, many hours of debugging would be avoided. Our confidence in the correctness of a program also increases if we know the functions behave in this strict manner.

Purely functional languages only allow you to write functions that have this property, called referential transparency.



2. No Side-Effects
That second property also sounds trivial. To evaluate the function f(5), all the information needed is in the definition of the function, 2x+3, and the information passed as an argument, 5. There is no need to look outside the function at all. If our code obeyed this property, our functions would work totally independently of whatever else was happening in our code, or indeed elsewhere in our system. This independence is needed to support the first property. If a function depended on some external variable, or on some data in some object, then it wouldn't be independent, and we couldn't make strict guarantees about it always producing the same result given the same input.

That second property also says that a function doesn't leave behind any information after it has been evaluated. Evaluating f(5) leaves us with only one outcome, which is the answer 13. No information elsewhere is changed. Again this sounds trivial, but in real-world software functions break this rule frequently, changing information outside the function or leaving behind newly created information. Changing external data means other functions that depend on that data can no longer guarantee the same output for every input. In complex software, many bugs happen because something has changed data in a way you didn't expect. The dependencies can grow into an entangled, error-prone web.

Purely functional languages enforce this property by preventing your functions from having any side-effects.



3. Composability
The third property also sounds trivial. If we have g(x) = 3x, we can have combinations like h(x) = f(x) + g(x) as well as h(x) = f(g(x)). We can use smaller functions to build bigger ones, and the bigger ones will behave in the same strict way as the smaller ones. This might not sound like much, but for software engineering, a guarantee that a bigger program made of many smaller functions will behave just as well as those smaller functions is a significant benefit.

But there's more in this third property. That example f(g(x)) shows the output of one function being fed straight into another, and in functional programming functions themselves can be passed to other functions as arguments. Because functions can't use data external to themselves, the only way to pass information to a function is as an argument. If functions were only allowed to take and return integers, building useful programs would be really hard. Being able to take functions as arguments makes building programs much easier.

If we're composing programs out of functions, and those functions can take functions as arguments, and those functions could also take functions ... we can see that functions must also have the ability to return functions, not just integers. This is why functional programming languages treat functions as first-class citizens.
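As a tiny sketch of this in Haskell (the functional language we'll meet later in this post), twice below is a function which takes a function and returns a new function; the names twice and addThree are just illustrative:


-- twice takes a function f and returns a new function which applies f twice
twice :: (a -> a) -> (a -> a)
twice f = f . f

-- an ordinary function to feed to twice
addThree :: Integer -> Integer
addThree x = x + 3

-- twice addThree is itself a function; twice addThree 5 evaluates to 11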



That was a lot of discussion from just three observations about mathematical functions!

Why Maths?
It is worth asking why languages might want to use mathematical functions as a model. The reason is that mathematics is strict enough for us to be able to make guarantees about mathematical functions, and about compositions of mathematical functions. Software engineering would benefit hugely from similar guarantees. We've talked about guaranteeing the same outputs for the same inputs, but there are other guarantees too. In mathematics we can prove things about entire classes of functions and algebras. In software, this means we can apply guarantees to entire classes of functions or data types. These guarantees can help us understand how a program will behave, and can also help the compiler to make optimisations.

The key mathematical theory of computation, lambda calculus, was developed in the 1930s by Alonzo Church (Alan Turing's PhD supervisor), before practical computers were ever built. Many functional languages derive their design from the lambda calculus.


Let's now bring some of this theory to life with some illustrative examples.


Example: Pure Function vs Impure Function

Let's have a look at this Python function which joins two lists together.


# impure function

a = [1,2,3]
b = [4,5,6]

def impure_append(b):
    a.extend(b)    # modifies the global list a - a side-effect
    return a

print(impure_append(b))

print(a)


When we run the function, we do get a result which is the two lists joined together. However, the list a has been modified by the function, and we can see this when we print it after impure_append() has been called.


[1, 2, 3, 4, 5, 6]
[1, 2, 3, 4, 5, 6]


Clearly impure_append() is not a pure function because it modifies data outside itself. Another reason it is impure is that the output of impure_append(b) does not depend just on b, but also on data outside the function which can change. In fact, if we call impure_append(b) again we get a different result.


[1, 2, 3, 4, 5, 6, 4, 5, 6]


Now have a look at a pure version of this function.


# pure function

a = [1,2,3]
b = [4,5,6]

def pure_append(a, b):
    return a + b    # builds and returns a new list; a and b are untouched

print(pure_append(a,b))

print(a)


When we run the function, we do get a result which is the two lists joined together. We can also see that the list a has not been modified.


[1, 2, 3, 4, 5, 6]
[1, 2, 3]


The function pure_append() will always give the same results for the same inputs, and it does not modify any data outside itself.

Although Python is not a purely functional language, we can still benefit from building our programs with well-behaved, safer functions.


Example: Imperative vs Declarative

The requirement that functions depend only on the information passed to them, and have no side-effects, encourages a style of programming with no statements that create or change state directly. Instead, programs are built from expressions which describe how the output is related to the input, a style called declarative programming.

Lots of programming is done in an imperative style. We state what needs to happen to data, and in what order, for the output to be calculated from the input.

In declarative programming, we only state how output is related to input using functions, just like a mathematical relationship using mathematical functions.

Have a look at the following Python code for a factorial function. It is written in a traditional imperative style, where data is created and then changed in a loop. The order in which the instructions are carried out is important to the correctness of the result.


# imperative factorial

def imperative_factorial(n):
    if n == 0:
        return 1

    product = 1
    for i in range(1, n + 1):
        product *= i    # product is repeatedly overwritten
    return product

imperative_factorial(5)


Another observation is that product is 1 initially, but later is something else. This can't be true mathematically. In functional programs, variables are immutable. This shouldn't be seen as a limitation, but as a guarantee that those variables, and indeed functions since they're first-class, will always have a single, always-correct value.
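As a brief aside, here is what immutability looks like in Haskell, the language used for the examples further below; the name total is just illustrative:


-- a binding is a permanent definition, not a mutable cell
total :: Integer
total = 1
-- total = total * 5   -- rejected: a name cannot be declared twice in one scope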

The following is a declarative version. It relies on the fact that n! = n * (n-1)!, or factorial(n) = n * factorial(n-1). Although it might take extra thought, the resulting code is simpler. The definition of the function doesn't focus on what to calculate, but on how the input and output are related.


# declarative factorial

def declarative_factorial(n):
    
    if n == 0:
        return 1
    else:
        return n * declarative_factorial(n-1)

declarative_factorial(5)


What's more, this definition is recursive, which means the function is defined in terms of itself. Because variables are immutable, there are no loop counters to update, so functional programming relies much more on solving problems recursively. Iteration is done through recursion, not with a counter-based loop.

The above examples were in Python to demonstrate the ideas in a familiar language. Functional languages are often much more concise in expressing these kinds of function. The following is a Haskell definition of a factorial function.


factorial :: Integer -> Integer
factorial 0 = 1
factorial n = n * factorial (n - 1)


The first line declares the type of our function factorial: it takes an Integer and returns an Integer. The second line defines factorial for the case where it is applied to 0. The final line defines factorial of n as n multiplied by factorial applied to (n-1). Simple and clean! Notice also how this definition is much more mathematical, stating the relationship and the special case (n = 0).

A more compelling illustration of declarative programming is the Haskell example of the Quicksort sorting algorithm. Quicksort takes an item from an unsorted list, the pivot, and then moves everything smaller than it to the left, and everything bigger to the right. The pivot is now in the right place, but the sub-lists to the left and right are unsorted. The same method is then applied recursively to the left and right sub-lists. Even describing this suggests that writing it in Python or C would take quite a few lines. The following is a Haskell declarative function that does this.


quicksort :: Ord a => [a] -> [a]
quicksort []     = []
quicksort (p:xs) = (quicksort lesser) ++ [p] ++ (quicksort greater)
    where
        lesser  = filter (< p) xs
        greater = filter (>= p) xs


The first line is the signature, stating that the quicksort function works on lists of any order-able data type (the Ord constraint): it takes a list as input and returns a list. The next line simply states that the quicksort of an empty list is an empty list. The third line has the magic. It says that the quicksort of a list (p:xs), where p is the head and xs the tail, is simply the concatenation of the quicksorted lesser list, then p, then the quicksorted greater list ... where lesser is all values from the tail xs which are less than p, and greater is all values greater than or equal to p.

Notice how this very concise definition of the quicksort function focusses on what a quicksorted list is, rather than the memory-level mechanics of how to re-arrange a list to get it sorted.
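As a quick illustrative check, here is how the function behaves in GHCi, Haskell's interactive interpreter:


ghci> quicksort [3,1,5,2,4]
[1,2,3,4,5]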

You can try Haskell yourself using an online interpreter: https://repl.it/languages/haskell together with a beginner-friendly practical tutorial http://learnyouahaskell.com/.


Performance and Parallelism

In recent years, it has become harder and harder to make commercially available computers faster. More specifically, the performance of a single CPU has not increased to the extent it did 20 or 30 years ago. Instead, CPU and system builders are trying to improve performance through parallelism - multiple cores in CPUs, and multiple CPUs for larger systems. For even larger systems, higher performance can be sought with computing nodes distributed over a network.

The hardware is only half the solution. We need our software to work efficiently across multiple computing nodes, and also to scale well with more nodes.

Experienced programmers will tell you that developing concurrent and parallel software in traditional imperative languages is a major challenge. Not all tasks can be parallelised, but when they can, the software is often plagued by hard-to-debug problems like race conditions, synchronisation and locking.

Parallelism makes ensuring data is not incorrectly changed even harder.

Luckily, functional languages require that no function can rely on, or change, information outside of itself. This isolation, which some might have considered painful, is a huge benefit in this era of parallel and distributed computing.

It is striking that functional languages, with their roots in mathematical theory developed in the 1930s before practical computers existed, can painlessly take advantage of modern parallel hardware where languages like C, Java and Python struggle!
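As a rough sketch of what this looks like in Haskell, the widely used parallel library provides parMap, which evaluates a pure function across a list in parallel; the function expensive here is just an illustrative stand-in:


import Control.Parallel.Strategies (parMap, rdeepseq)

-- a stand-in for an expensive pure computation
expensive :: Integer -> Integer
expensive n = sum [1 .. n * 100000]

-- parMap evaluates each list element in parallel; because expensive is
-- pure, the result is guaranteed to match the sequential version
main :: IO ()
main = print (sum (parMap rdeepseq expensive [1 .. 100]))


(Built with ghc -threaded and run with +RTS -N to use multiple cores.)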

There is something else to say about performance. Because a pure function always returns the same result for the same inputs, we can safely remember previous results and avoid doing the calculations again the next time a function is called with the same arguments. This is called memoisation.
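A minimal sketch of memoisation in Haskell, using the well-known trick of caching results in a lazy list (fib and fibs are our own names):


-- fibs is a lazily built table of results; each entry is computed at most once
fibs :: [Integer]
fibs = map fib [0 ..]

-- fib looks earlier results up in the table instead of recomputing them,
-- which is safe only because fib is pure
fib :: Int -> Integer
fib 0 = 0
fib 1 = 1
fib n = fibs !! (n - 1) + fibs !! (n - 2)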

In languages that don't require functions to be pure, this kind of optimisation is difficult and limited to a small number of very well understood functions, which are rarely those written by the user.


Testing & Verification

The task of testing software is dramatically easier if it is purely functional. We can test that a function produces the correct output for a given input, and that is a complete test of that function's behaviour. We don't need to worry about external factors changing the output, because pure functions don't use external information and don't change information outside themselves. In practice, this avoids the huge effort put into mocking up external or surrounding software and systems.
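This is also why property-based testing fits functional code so well. A brief sketch using Haskell's QuickCheck library, which checks a stated property against many randomly generated inputs (the property name is our own):


import Test.QuickCheck

-- a property of pure list appending: lengths always add up
prop_appendLength :: [Int] -> [Int] -> Bool
prop_appendLength a b = length (a ++ b) == length a + length b

main :: IO ()
main = quickCheck prop_appendLength   -- tests 100 random cases by default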

Because pure functional languages are based on a mathematical framework, they are easier to "prove" correct. In reality, this depends on how closely a particular language aligns to a mathematical framework such as lambda calculus. Haskell programs are much easier to verify than C programs, for example, and are amenable to proof assistants like Coq.


Types and Type Classes

Functional programming languages, having come from a mathematical foundation where correctness is emphasised, tend to have strong type systems. Types describe values, so 1, 4, 6 are all values of type integer, and 'a', 'x', 'z' are all values of type char. Haskell has a rich type system which also applies to functions, because they are first-class citizens.

Type classes define both possibilities and constraints for types. For example, integer and character types both support testing for equality, so both are instances of an equality type class (Eq in Haskell).
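As a small sketch in Haskell, a function can require the equality type class Eq as a constraint, and the compiler rejects any attempt to use it on a type without equality (allEqual is our own illustrative name):


-- the Eq constraint says this works for any type supporting equality
allEqual :: Eq a => [a] -> Bool
allEqual []       = True
allEqual (x : xs) = all (== x) xs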

Many classes of error can be avoided more elegantly by enforcing consistency and constraints at the type level.

I found this video useful in understanding the difference between types and type classes:



The Input/Output Puzzle (and Monads)

A very common question about pure functional programming is how any program can be useful if it can't interact with the outside world. To be useful, many programs need to obtain information from a user, a file system or a network, for example, and also write information to similar external systems.

It seems input/output contravenes the need to keep functions pure and free of side-effects.

How to solve this puzzle has been a hot topic of research in recent years and is ongoing. The core aim of functional language designers is to ensure that a program, which is a composition of functions, retains the referential transparency guarantee - necessary both for validating the correctness of code and for correctly optimising the execution of the program. A secondary aim is to ensure programmers don't inadvertently cause bugs by reading invalid data or overwriting good data with bad.

Imagine a function getName() which reads information from the user (or a database, or a network server). If we call it again, it might return a different value. Straight away this breaks referential transparency. Remember, all functions must always return the same output when called with the same input.

Let's consider another problem. Imagine two requests to read external information: a = getPrice(), and b = getPrice() called at a later time. A good functional compiler or runtime knows that a pure function always returns the same answer, so it will simply reuse the first answer for the second call - why would it calculate it again? Even if it didn't do this, there's no guarantee the first call will actually run first, because pure functions can be run in any order as long as their arguments don't define an order, and here they don't. It gets worse! Some languages are lazy, which is usually a good thing, but laziness doesn't sit well with the time-sensitive need to "get this price now, not later".

One solution is to model the external world as a data type and pass values of this type as additional arguments to functions. So a function getName() is passed a state of the world, and it returns not just the desired information but also a new world state, because the world has changed. Code would look something like this: (name, world2) = getName(world1). Another call to getName() could not use world1 again, because that world is now in the past, and some functional languages enforce this. Notice how getName(world1) and getName(world2) can now legitimately return different values, because they have different arguments.
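A toy sketch of this idea in Haskell; the World type and getName here are illustrative stand-ins, not a real API:


-- a stand-in for the state of the outside world
newtype World = World Int

-- each call consumes one world and returns the next, so getName stays pure:
-- the same world value always gives the same result
getName :: World -> (String, World)
getName (World n) = ("name-" ++ show n, World (n + 1))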

Too often the question of functional programming and side-effects leads to the mention of something mysterious called a monad. Monads are considered by many to be a difficult concept, with some warning mere mortals to keep away.

Words like monad, functors, and monoids come from the mathematical study of computation, functions and strict programming languages that aim to give us the same guarantees as mathematics itself.

We won't go into detail here, but simply leave an intuition about monads. The augmenting of a function's arguments and return value with a world value, which we just talked about, needs to be done in a way that still allows functions to be chained together and composed in the way we're used to. The underlying structure that allows this is a monad. That's it. A monad doesn't magically solve side-effects; it is just a way of augmenting functions with that additional unique-making information that enables functions to stay pure, whilst also keeping them composable.
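For a flavour of that chaining, here is a minimal sketch using Haskell's IO monad, where the bind operator >>= sequences one side-effecting step into the next:


main :: IO ()
main = getLine >>= \name -> putStrLn ("Hello, " ++ name)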

Monads have become lore in the world of computer science, responsible for many memes, myths and legends!


For a programmer, a nice feature of languages like Haskell is that any function that wants to have a side-effect has to declare it in its type signature. You can't get away with slipping a getPrice() inside an ordinary-looking function that returns an integer; the return type would have to be something like IO Integer. In this way, anyone else using your function knows that it has a side-effect, such as reading data from an external source.
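A minimal sketch of this (getPrice here is hypothetical):


-- the IO in the type advertises that this function performs input/output
getPrice :: IO Integer
getPrice = fmap read getLine

-- no IO in the type: this function is guaranteed free of side-effects
addTax :: Integer -> Integer
addTax p = p + p `div` 5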

You can read a good explanation of the IO problem here:


A mathematics-light starting point for understanding monads:



Recommendations For Real World Use

When considering functional programming for real-world use, there are several factors to evaluate.

An important consideration is the scarcity of people who can develop software in, for example, Haskell, which is not yet as widespread as JavaScript or Python. This scarcity is a risk to be weighed against the potential benefits. There is growing demand for Haskell programmers in sectors like finance, and this is viable because there is a sufficient supply of Haskell programmers in those areas.

A pragmatic strategy for all sectors is to infuse the discipline and good practice encouraged by functional programming into your existing development. Many languages are adopting syntax to support a functional style of programming, but simply using recursion, list comprehensions or maps isn't sufficient for writing safer, more correct code. What will have an impact is adopting patterns that minimise shared state and make transparent what your functions are doing.

The natural fit between functional programming and parallel computation is a major growth area, and it is worth investigating whether this applies to your work. The Erlang functional language has been used in highly concurrent telecoms systems for some years now.

One potential disadvantage of functional languages is that they use memory very liberally. In the early days even simple programs would consume inordinate amounts of memory; today's compilers and runtimes are much more efficient. If you require tightly constrained or highly predictable memory consumption, then functional programming is unlikely to be a good fit. For the vast majority of sectors, though, memory is a cheap resource.


More Reading