Rocking Functional Combinators

Published on February 15, 2018 by Vasily Vasinov

Higher-order functions applied to collections—or collection combinators—are some of the most powerful programming tools. They are largely based on functional programming principles and can be used to organize data processing code.

Anyone starting to incorporate functional and reactive idioms into their work can get overwhelmed by the sheer volume of different combinators. The purpose of this lesson is to explain how foundational combinators work and how higher-level combinators utilize base concepts to achieve more specialized solutions.

This lesson will focus on standard Scala implementations of combinators but the ideas behind Scala combinators can be applied to combinators in any other language, be it Ruby, Java, Python, or JavaScript. Most combinators can be used on all standard Scala collections (e.g., List, Set, and Map). They can also be applied to Options, Eithers, and Trys that are special—monadic—kinds of collections.

Foreach

Let’s start with the most basic combinator called foreach. Just like all combinators, it’s a higher-order function, meaning that it accepts another function as an argument.

Say, we have a collection of messages expressed as a sequence of case classes:

case class Message(id: Int, author: String, body: String,
                   sentAt: Long, receivedAt: Option[Long])

val t = Instant.now.getEpochSecond

val messages = Seq(
  Message(1, "Alice", "Hi Bob!", t - 20, Some(t - 19)),
  Message(2, "Bob", "Hey Alice!", t - 18, Some(t - 15)),
  Message(3, "Alice", "Want to hack on some crypto?", t - 10, Some(t - 9)),
  Message(4, "Bob", "Sure! BTC or ETH?", t - 5, Some(t - 3)),
  Message(5, "Alice", "Crypto is not cryptocurrency!", t, None)
)

How would we print bodies of all messages? In a typical OOP language we would use a for loop but Scala provides a different option:

messages.foreach(m => println(m))

Here we pass an anonymous function to the foreach combinator that calls—or applies—this function messages.length number of times and passes all elements of messages into it sequentially. Foreach doesn’t return anything and is only intended for code with side effects.

Side Effect

A side effect refers to the modification of state elsewhere in the program. It could be writing data to disk, changing a variable, or performing a UI interaction.

We can also always pass a named function to the combinator:

def printBody(m: Message) = println(m.body)

messages.foreach(m => printBody(m))

// or with a shortcut:

messages.foreach(printBody)

What if we want foreach to return data as opposed to print something in the console? We’d have to use map—one of the most widely used combinators.

Map

Map is similar to foreach in the sense that it applies a passed function to all elements of the collection. Say, we want to create a collection with a human-readable log of all messages:

messages.map(m => s"${m.author}: ${m.body}")

This code will produce the following list by applying m => s"${m.author}: ${m.body}" to every element in the original list:

List(
  "Alice: Hi Bob!",
  "Bob: Hey Alice!",
  "Alice: Want to hack on some crypto?",
  "Bob: Sure! BTC or ETH?",
  "Alice: Crypto is not cryptocurrency!"
)

Combinators like map are generally only expected to be used with pure functions. If you absolutely must have a side effect add foreach at the end of your combinator chain to execute the side effect.

Pure Function

Pure functions’ return values are determined by their input values without observable side effects. In other words, pure functions always return the same output for the same set of inputs.

Now, let’s try combining two combinators to print out a collection that was returned by map:

messages.map(m => s"${m.author}: ${m.body}").foreach(println)

Here we chained map and foreach combinators to produce the desired result. This is one of the main benefits of combinators: by chaining small digestible pieces of code we can generate complex behaviors with strong boundaries.

Exercise #1 easy

Write a program that will return a map of human-readable time strings as keys and author-body strings as values.

Show Solution

import java.time.Instant

messages.map { m =>
  Instant.ofEpochSecond(m.sentAt).toString -> s"${m.author}: ${m.body}"
}.toMap

FlatMap

So far we’ve been working with pretty straightforward data types. Now, let’s take a look at Message.receivedAt. This attribute is an Option[Long] meaning that it can either be Some[Long] or None. Let’s try extracting it into a list of human-readable strings.

First of all, we’ll have to somehow “unwrap” our Options, while looping through the collection. Depending on the requirements, we’ll either have to replace None with a default date or drop it. In the former case we can use a standard Option.getOrElse function or pattern matching:

import java.time.Instant

val t = Instant.now.getEpochSecond

messages.map { m =>
  Instant.ofEpochSecond(m.receivedAt.getOrElse(t)).toString
}

// or

messages.map { m =>
  val seconds = m.receivedAt match {
    case Some(r) => r
    case None => t
  }

  Instant.ofEpochSecond(seconds).toString
}

What if we don’t want to include receivedAt timestamps that are None? It’s a little trickier to think about conceptually than previous examples because we have to do two things simultaneously:

  • Collect and convert existing long values into human-readable strings while ignoring None values.
  • Unwrap Options and get rid of None values.

To address the second problem we can use a built-in collections method called flatten that literally flattens a collection of collections. What does it mean? Say, you have a collection of collections. When you call flatten on such a collection it unwraps internal collections and merges all their values into the parent collection:

val list = List(List(1, 2), List(3, 4), List())
list.flatten // results in List(1, 2, 3, 4)

As previously mentioned, since Option is just a special collection we can apply flatten to a collection of Options:

val list = List(Some("foo"), None, Some("bar"))
list.flatten // results in List("foo", "bar")

With this knowledge we can collect and convert existing receivedAt values:

messages.map { m =>
  m.receivedAt.map(r => Instant.ofEpochSecond(r).toString)
}.flatten

Again, since Option is just a special collection (similar to Try and Either) we can apply any function with the help of map to its value. Map unwraps Option’s value, applies our function to it, and then puts it back into Option. If there is no value (i.e., when Option is None) no function is applied.

By using two map combinators on two nested collections (Seq and Option) we can achieve the end goal. The result will be a list of Options. To generate a list of strings we simply call flatten on the resulting collection.

There is a shortcut that we can use to achieve the same effect. Instead of calling flatten on the result of map we can replace map with flatMap, which does the same thing as map and flatten but in one pass.

FlatMap is a powerful and useful function. It works by applying a function to each element in the collection while flattening the results into the original list:

messages.flatMap { m =>
  m.receivedAt.map(r => Instant.ofEpochSecond(r).toString)
}

There are surprisingly many applications of this combinator in the wild. It’s widely used with chained Futures and Trys, since API consumers are generally interested in a concrete value at the end of the chain of operations instead of nested containers.

Reduce

Looping over a collection is great and all but there are lots of cases when you have to “memorize” data while iterating over the collection. A trivial example of this is summing up a list of numbers. How can we store intermediate sums with map and then pass them into next steps? We could use an outside mutable variable to achieve this but this would defeat the purpose of using pure functions and immutable data structures.

Immutable Data Structure

An immutable data structure always preserves the previous version of itself when it’s modified, yielding a new updated structure for every applied function like a combinator.

Basic Usage

There is a combinator called reduce that can help us solve this problem. Instead of a function with one parameter it takes a function with two. The first parameter is the result of the previous execution step and the second parameter is the currently processed collection element. Let’s see how it works:

val list = Seq(1, 2, 3, 4)
list.reduce((x, y) => x + y) // returns 10

Here is a list of function parameters and return values for each step of the execution:

(1, 2) => 3
(3, 3) => 6
(6, 4) => 10

The first pair is special: it corresponds to the first two elements of the list. The rest of the pairs follow the aforementioned reduce rule. Scala provides a shortcut combinator for summing up elements in a collection called sum. The code from before simply becomes:

Seq(1, 2, 3, 4).sum // returns 10

There are many useful built-in shortcuts like this that take advantage of base combinators.

Exercise #2 easy

Write a function that calculates the factorial of N.

Show Solution

def factorialOf(n: Int) = {
  (1 to n).reduce((x, y) => x * y)
}

Scala provides a shortcut combinator for calculating a product called product that you can replace reduce((x, y) => x * y) with.

Order of Operations

Now, let’s talk about order of operations. So far in all of our examples, combinators looped over collections sequentially from left to right. Is it always the case? As it turns out, it depends on the underlying collection implementation.

On top of the standard collections library Scala (as well as many other platforms) provides implementations for parallel collections. Parallel collections were added to Scala in an effort to facilitate parallel programming by abstracting low-level parallelization details.

Parallel collection combinators have an out-of-order execution semantic. It means that they could be applied to elements of the collection in an arbitrary non-deterministic order. It makes sense since, by definition, we are executing combinators in parallel instead of sequentially. The underlying mechanism of parallel execution varies from platform to platform. Since Scala uses JVM, parallel computations are performed in an arbitrary set of threads.

To take advantage of parallelism, we have to pass functions that only perform associative operations. We can, however, pass functions with non-commutative operations in them.

Associative Operation

If the result of an expression with two or more occurances of the same operation doesn’t change based on the order in which those operations are performed then this operation is associative. For example, addition and multiplication are associative operations because their order of execution doesn’t matter: (a * b) * c is the same as a * (b * c).

Commutative Operation

An operation is called commutative if changing the order of the operands does not change the result. For example, multiplication is commutative because a * b is the same as b * a. String concatenation is not because "foo" + "bar" is not the same as "bar" + "foo".

Let’s take a look at the real example:

(1 to 20).reduce((x, y) => x - y) // returns -208

(1 to 20).par.reduce((x, y) => x - y) // can return 0, -40, -30, etc.

Subtraction is a non-associative operation. That’s why we get random results by executing the code above. However, if we perform string concatenation, which is associative and non-commutative, then the result will be correct:

Seq("welcome", "to", "grok", "academy").par
  .reduce((x, y) => x ++ " " ++ y)

// returns "welcome to grok academy"

What if we want to make sure that no matter what type of collection we use all of its elements are processed sequentially? For this we can use another combinator called reduceLeft. It works exactly the same as reduce for non-parallel collections.

If we want to go through our collection from right-to-left instead of left-to-right we can use reduceRight instead. It will result in the same returned value as reduceLeft for all associative operations but will be different for non-associative operations.

Exercise #3 easy

If the list of messages were shuffled, find the most recently sent message.

Show Solution

val shuffledMessages = Random.shuffle(messages)

val lastSent = shuffledMessages.reduce { (a, b) =>
  if (a.sentAt > b.sentAt) a else b
}
Exercise #4 hard

If the list of messages were shuffled, find the most recently received message.

Show Solution

val shuffledMessages = Random.shuffle(messages)

val lastReceived = shuffledMessages.reduce { (a, b) =>
  val result = a.receivedAt.map { ar =>
    b.receivedAt match {
      case Some(br) => ar > br
      case None => true
    }
  }

  if (result.getOrElse(false)) a else b
}

Fold and Aggregate

Reducing collections works great when we need to “memorize” data that has the same type as the collection elements and the starting point for this data is some element of the collection. But what if we need to introduce a custom starting value or an entity of an entirely different type while looping over the original collection? There is a standard family of combinators fold/foldLeft/foldRight that lets us achieve this. These combinators use currying with the first argument being an arbitrary object and the second being a function with two arguments—the same as the reduce combinator.

Currying

Currying is the process of decomposing a function of multiple arguments into a chained sequence of functions of one argument. Learn more in our lesson on currying and partial function application.

Let’s look at the real world example by implementing a method that finds all messages from a specific author:

def findByName(messages: Seq[Message], name: String): Seq[Message] = {
  messages.foldLeft(Seq[Message]()) { (ms, m) =>
    if (m.author == name) m +: ms
    else ms
  }
}

First, we “inject” an empty collection Seq[Message]() that gets passed to the applied function in the first step as ms. Then we either return the same collection or a modified version m +: ms based on the function logic.

Exercise #5 easy

Implement a higher-order filter function that takes a list of integers and a user-provided function containing filtering logic. This function should return a list of filtered integers. Here is the signature for this function:

def filter(inputs: Seq[Int], p: (Int) => Boolean): Seq[Int]

Show Solution

def filter(es: Seq[Int], op: (Int) => Boolean): Seq[Int] = {  
  es.foldLeft(Seq[Int]()) { (fes, e) =>
    if (op(e)) e +: fes
    else fes
  }
}

Did you notice anything strange about findByName?

Instead of using fold we went directly to foldLeft. Why did we do it? The simple answer is that fold only works for initial values that have the same type as elements of the collection. If we were to use fold in findByName we’d only be able to inject an object of type Message or its parent, which obviously wouldn’t have worked.

Why have this distinction? Let’s look at fold and foldLeft method signatures:

trait TraversableOnce[+A] {
  def fold[A1 >: A](z: A1)(op: (A1, A1) => A1): A1

  def foldLeft[B](z: B)(op: (B, A) => B): B
}

TraversableOnce is a trait that all Scala collections implement. Fold’s first argument can only be of type A or parent of A defined by the A1 >: A type constraint. FoldLeft accepts an arbitrary B type. Hence, fold can only have a custom starting value similar to A and foldLeft can have any starting value.

Remember how reduce was used to take advantage of parallelism? Fold works the same way—its signature is designed to be used in parallel collection implementations. You might wonder what it has to do with foldLeft’s inability to run in parallel. The reason why it’s impossible is because parallel collections apply combinator functions out of order. It means that during result merging from previous executions the combinator will pass B, B instead of B, A to op, which violates the type constraint. What does it mean intuitively? It means that we didn’t instruct our combinator how to merge intermediate results of the applied combinator, i.e. B, B.

Thankfully, there is a combinator called aggregate that lets us specify a function to merge intermediate objects. Here is its signature:

def aggregate[B](z: =>B)(seqop: (B, A) => B, combop: (B, B) => B): B

The first parameter z is the starting value of a custom type like in foldLeft. The second parameter seqop is our custom function that gets applied to every element of the collection. And the third parameter combop is a custom merge function that is used when the combinator needs to aggregate the results of seqop together. Here’s how we would rewrite findByName to be used with generic collections (both sequential and parallel):

def findByName(messages: GenSeq[Message], name: String): Seq[Message] = {
  messages.aggregate(Seq[Message]())(
    (ms, m) => {
      if (m.author == name) m +: ms
      else ms
    },
    (a, b) => a ++ b
  )
}

We can pass a regular collection to it:

findByName(messages, "Alice")

The standard implementation of aggregate will use foldLeft under the hood.

We can also pass a parallel collection to it as well:

findByName(messages.par, "Alice")

This will use complex custom logic to process our collection and then merge intermediate results in parallel.

Exercise #6 easy

Implement reduceLeft by using foldLeft. Here is a method signature to get you started:

def myReduceLeft[A](list: Seq[A])

Show Solution

def myReduceLeft[A](list: Seq[A]) = list.tail.foldLeft(list.head)(_)

val list = Seq(1, 2, 3, 4)

myReduceLeft(list) { (a, b) =>
  b + a
}

// returns 10
Exercise #7 hard

Write a function that takes a text file name as an argument and returns an sorted map of word/word-count pairs:

def countWords(fileName: String): SortedMap[String, Int]

Show Solution

def countWords(fileName: String): SortedMap[String, Int] = {
  scala.io.Source.fromFile(fileName)
    .getLines
    .flatMap(line => line.split("\\W+").map(_.toLowerCase))
    .foldLeft(SortedMap[String, Int]()) { (words, word) =>
      val count = words.getOrElse(word, 0) + 1
      words + (word -> count)
    }
}

Scan

There are situations when it’s useful to keep the list of all function application results as part of the returned structure. Scala provides a combinator for this called scan. A common use case for scan is sequence generation. For example, we can easily generate a list of factorials with this approach:

(1 to 200).map(BigInt(_))
  .scan(BigInt(1))((x, y) => x * y)

First, we inject an initial value—just like with fold. Then we apply a function where the first argument is the result of the previous function execution (except for the very first execution, when the argument is the initial value) and the second is the current element of the sequence. Here is the list of the first five argument-result combinations from the previous example:

(1, 1) => 1
(1, 2) => 2
(2, 3) => 6
(6, 4) => 24
(24, 5) => 120

Just like with reduce and fold, scan has left and right variants that can’t be parallelized.

The Rest of Them

By now, we’ve covered the most important combinators that exist in many programming languages. Those foundational logical pieces can be used to implement more specialized combinators. Let’s take a brief look at some other combinators that Scala and other languages provide.

Filter and Find

Filter returns collection elements that satisfy the function predicate. We already implemented a version of this combinator in exercise #5 but the Scala version is more generic:

messages.filter(m => m.id > 3)

Find is very similar to filter. Instead of finding all values that satisfy a given predicate it finds the first occurrence of an element that satisfies the predicate.

Zip and Unzip

Zip aggregates the contents of two lists into a single list of tuples. For example, with zip we can generate a UUID for each message in the list and then put both of them inside a tuple:

(1 to messages.length)
  .map(_ => UUID.randomUUID())
  .zip(messages)

Unzip splits a collection of tuples into two lists. It can be very useful for separating keys from values while parsing maps:

Seq(1 -> "foo", 2 -> "bar", 3 -> "baz").unzip

// results in

(List(1, 2, 3), List("foo", "bar", "baz"))
Exercise #8 easy

Implement unzip by using one of the fold variants for a collection of (Int, String) tuples:

def myUnzip(vs: Seq[(Int, String)]): (Seq[Int], Seq[String])

Show Solution

def myUnzip(vs: Seq[(Int, String)]): (Seq[Int], Seq[String]) = {
  vs.foldRight((Seq[Int](), Seq[String]())) { (unzipped, v) =>
    (unzipped._1 +: v._1,  unzipped._2 +: v._2)
  }
}

Partition and GroupBy

Partition splits a collection into two based on a function predicate. Here is an example of splitting the list of messages based on whether they were received or not:

messages.partition(m => m.receivedAt.nonEmpty)

GroupBy let’s you partition your collection by any property (not just boolean like partition). Here is how we can group our messages by author:

messages.groupBy(m => m.author)

GroupBy returns a map with keys representing grouping properties and values being lists of grouped elements of the original collection. If you only need grouped lists of elements then you can use groupBy with unzip:

messages.groupBy(m => m.author).unzip._2
Exercise #9 hard

Implement a generic function with the following signature without using groupBy that returns a key -> (count, values) map:

def groupByAndCount[A, B](vs: Seq[A])(f: A => B): Map[B, (Int, Seq[A])]

Show Solution

def groupByAndCount[A, B](vs: Seq[A])(f: A => B): Map[B, (Int, Seq[A])] = {
  // inject empty map
  vs.foldLeft(Map[B, (Int, Seq[A])]()) { (map, v) =>
    val key = f(v)

    // unwrap Option[(Int, Seq[A])]
    val pair = map.get(key).fold(key -> (1, Seq(v))) { value =>
      key -> (value._1 + 1, value._2 :+ v)
    }

    // add new pair to an updated map
    map + pair
  }
}

As previously mentioned, there are many other useful but less common combinators. When you are working on a problem I encourage you to always break up your problem into smaller logical chunks and see if any of those chunks match functional combinators that we covered. If nothing comes to mind try “inventing” a combinator and then checking if it exists in the standard collections library. If it doesn’t then you can always implement it yourself.


Join Discussion

Subscribe to our Grokked Weekly newsletter to receive excellent engineering articles and the latest lessons from Grok Academy.