Parallel Streams and Spliterators

Today we are going to look at one of the areas where using streams is a real win: when we need to spread work across threads. As well as parallel streams, we will also look at Spliterators, which act as the machinery that pushes elements into the pipeline.

Streams use a technique known as internal iteration. It’s internal because the Iterator (or in our case Spliterator) which supplies work through our stream is hidden from us. To use a stream all we need do [once we have a source] is add the stages of the pipeline and supply the functions that these stages require. We don’t need to know how the data is being passed along the pipeline, just that it is. The benefit is that the workings are hidden from us and we can focus more on the work that must be done rather than how it can be done.

The opposite, external iteration, is where we are given a loop variable or iterator, look up the value and pass it through the code ourselves. This gives us the benefit of full control and low overhead. The downside is that we have to do all the work of looking up the values and passing them through the loop body. This also means more test code, and testing loop bodies properly can be tricky. With normal for-loops we also have to be careful of off-by-one errors.
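
To make the contrast concrete, here is a minimal sketch that sums the squares of the even numbers in a range both ways. The class and method names are made up for illustration and are not used elsewhere in this article.

```java
import java.util.stream.IntStream;

class IterationStyles
{
  // External iteration: we drive the loop and hold the running total
  static int sumEvenSquaresLoop(int max)
  {
    int total = 0;

    for (int i = 1; i <= max; i++)
    {
      if (i % 2 == 0)
      {
        total += i * i;
      }
    }

    return total;
  }

  // Internal iteration: we only describe the stages of the pipeline
  static int sumEvenSquaresStream(int max)
  {
    return IntStream.rangeClosed(1, max)
                    .filter(i -> i % 2 == 0)
                    .map(i -> i * i)
                    .sum();
  }

  public static void main(String[] args)
  {
    System.out.println(sumEvenSquaresLoop(10));   // 220
    System.out.println(sumEvenSquaresStream(10)); // 220
  }
}
```

Both methods give the same answer; the difference is who owns the loop variable and the running total.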

The question we need to ask ourselves when considering the iteration method is: do we really need absolute control for the task? Streams do some things really well but come with a small performance penalty. Perhaps a non-stream (or even non-Java) solution is more appropriate for high performance work. On the other hand, sorting and filtering files to display in, say, a ‘recently accessed’ menu item doesn’t require high performance. In that case we’d probably settle for an easy and quick way to do it rather than the best performing one. Even if we go with a performant solution some benchmarking will be necessary, as surprises often await. Thus we’re trading performance off against convenience, development time and risk of bugs.

Streams are easy to parallelise, as we’ll see. We just switch the stream into parallel mode using the parallel() operation. Doing this with external iteration is hard because the loop is set up so that we get one item per iteration. The best we can do in that environment is pass work off to threads. To do things efficiently we’d probably have to ditch looping through all the values in the outer loop and look at dividing the work up another way. We’ll see a way of doing this.

With that in mind we’ll look at a prime number generator. A caveat first: this is not the most efficient prime number generator. For a demonstration it was useful to have an application that was well known, easy to understand, easy to perform with streams and would take a fair bit of computation time to complete.

Let’s look at the external iteration version first:

EDIT: It’s been pointed out that the simple test (i % j == 0) is better than (i / j * j == i)

import java.util.HashSet;
import java.util.Set;

public class ForLoopPrimes
{
  public static Set<Integer> findPrimes(int maxPrimeTry)
  {
    Set<Integer> s = new HashSet<>();

    // The candidates to try (1 is not a prime number by definition!)
    outer:
    for (int i = 2; i <= maxPrimeTry; i++)
    {
      // Only need to try up to sqrt(i) - see notes
      int maxJ = (int) Math.sqrt(i);

      // Our divisor candidates
      for (int j = 2; j <= maxJ; j++)
      {
        // If we can divide exactly by j, i is not prime
        if (i / j * j == i)
        {
          continue outer;
        }
      }

      // If we got here, it's prime
      s.add(i);
    }

    return s;
  }

  public static void main(String args[])
  {
     int maxPrimeTry = 9999999;

     long startTime = System.currentTimeMillis();

     Set<Integer> s = findPrimes(maxPrimeTry);

     long timeTaken = System.currentTimeMillis() - startTime;

     s.stream().sorted().forEach(System.out::println);

     System.out.println("Time taken: " + timeTaken);
  }
}

Note: Since we only need to find one divisor, and multiplication is commutative, we only need to exhaust all potential pairs of factors and test one of them [the smaller]. The smaller can’t be any bigger than the square root of the candidate prime and must be at least 2.
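
As a quick sanity check on this reasoning, the sketch below compares trial division up to the square root with trial division over every smaller number, using the i % j == 0 test mentioned in the EDIT above. The class and helper names are hypothetical.

```java
class SqrtBoundCheck
{
  // Trial division, stopping at the square root of the candidate
  static boolean isPrimeSqrt(int i)
  {
    if (i < 2)
    {
      return false;
    }

    int maxJ = (int) Math.sqrt(i);

    for (int j = 2; j <= maxJ; j++)
    {
      if (i % j == 0)
      {
        return false;
      }
    }

    return true;
  }

  // Trial division over every number below the candidate
  static boolean isPrimeFull(int i)
  {
    if (i < 2)
    {
      return false;
    }

    for (int j = 2; j < i; j++)
    {
      if (i % j == 0)
      {
        return false;
      }
    }

    return true;
  }

  // True if both tests agree for every candidate up to max
  static boolean agreeUpTo(int max)
  {
    for (int i = 0; i <= max; i++)
    {
      if (isPrimeSqrt(i) != isPrimeFull(i))
      {
        return false;
      }
    }

    return true;
  }

  public static void main(String[] args)
  {
    System.out.println(agreeUpTo(10000)); // true
  }
}
```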

This is an example of a brute force algorithm. We’re trying every combination rather than using any cleverness or optimisation. We’d also in this case expect the external iteration version to run fast since there is not a lot of work per iteration.

So why do we have to demonstrate this?

Suppose we want to take advantage of the multiple cores in modern processors and thread this up. How might we do it? Before Java 7, and certainly before Java 5, this would have been a real pain. We’ve got to divide up the workload, maintain a pool of threads, signal them that there is work available and then collect the work back from them when done. We probably also want to shut the worker threads down at the end if we have no more work to do. While it’s not rocket science, it can be hard to get right quickly and subtle bugs can be hard to spot.

Java 7 makes this a lot easier with the Fork/Join framework, although it’s still tricky and easy to get wrong. We’ll use a RecursiveAction to break up the outer loop into pieces of work using a divide-and-conquer strategy. Note that parallel streams do this as well.

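import java.util.HashSet;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class ForkJoinPrimes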
{
  private static int workSize;
  private static Queue<Results> resultsQueue;

  // Use this to collect work
  private static class Results
  {
    public final int minPrimeTry;
    public final int maxPrimeTry;
    public final Set<Integer> resultSet;

    public Results(int minPrimeTry, int maxPrimeTry,
                   Set<Integer> resultSet)
    {
      this.minPrimeTry = minPrimeTry;
      this.maxPrimeTry = maxPrimeTry;
      this.resultSet = resultSet;
    }
  }

  private static class FindPrimes extends RecursiveAction
  {
    private final int start;
    private final int end;

    public FindPrimes(int start, int end)
    {
      this.start = start;
      this.end = end;
    }

    private Set<Integer> findPrimes(int minPrimeTry,
                                    int maxPrimeTry)
    {
      Set<Integer> s = new HashSet<>();

      // The candidates to try
      // (1 is not a prime number by definition!)
      outer:
      for (int i = minPrimeTry; i <= maxPrimeTry; i++)
      {
        // Only need to try up to sqrt(i) - see notes
        int maxJ = (int) Math.sqrt(i);

        // Our divisor candidates
        for (int j = 2; j <= maxJ; j++)
        {
          // If we can divide exactly by j, i is not prime
          if (i / j * j == i)
          {
            continue outer;
          }
        }

        // If we got here, it's prime
        s.add(i);
      }

      return s;
    }

    protected void compute()
    {
      // Small enough for us?
      if (end - start < workSize)
      {
        resultsQueue.offer(new Results(start, end,
                                 findPrimes(start, end)));
      }
      else
      {
        // Divide into two pieces
        int mid = (start + end) / 2;

        invokeAll(new FindPrimes(start, mid),
                            new FindPrimes(mid + 1, end));
      }
    }
  }

  public static void main(String args[])
  {
    int maxPrimeTry = 9999999;
    int maxWorkDivisor = 8;

    workSize = (maxPrimeTry + 1) / maxWorkDivisor;

    ForkJoinPool pool = new ForkJoinPool();

    resultsQueue = new ConcurrentLinkedQueue<>();

    long startTime = System.currentTimeMillis();

    pool.invoke(new FindPrimes(2, maxPrimeTry));

    long timeTaken = System.currentTimeMillis() - startTime;

    System.out.println("Number of tasks executed: " +
                       resultsQueue.size());

    while (resultsQueue.size() > 0)
    {
      Results results = resultsQueue.poll();

      Set<Integer> s = results.resultSet;

      s.stream().sorted().forEach(System.out::println);
    }

    System.out.println("Time taken: " + timeTaken);
  }
}

This is quite recognisable since we have reused the sequential code to carry out the work in a subtask. We create two RecursiveActions to break the workload into two pieces, and keep breaking down until the workload is below a certain size, at which point we carry out the work. Finally we collect our results on a concurrent queue. Note there is a fair bit of code.
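
As a variation, and only as a sketch, the same split can be written with a RecursiveTask that returns its primes, so that no shared queue is needed. This is not the version that was timed for this article; the class name, the WORK_SIZE threshold, the use of the common pool and the i % j test are all choices made for this illustration.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class ForkJoinTaskPrimes
{
  // Hypothetical threshold: ranges narrower than this are searched directly
  private static final int WORK_SIZE = 10_000;

  private static class FindPrimes extends RecursiveTask<Set<Integer>>
  {
    private final int start;
    private final int end;

    FindPrimes(int start, int end)
    {
      this.start = start;
      this.end = end;
    }

    @Override
    protected Set<Integer> compute()
    {
      // Small enough for us?
      if (end - start < WORK_SIZE)
      {
        return findPrimes(start, end);
      }

      // Divide into two pieces
      int mid = start + (end - start) / 2;
      FindPrimes left = new FindPrimes(start, mid);
      FindPrimes right = new FindPrimes(mid + 1, end);

      // Run the left half asynchronously and the right half in this thread
      left.fork();
      Set<Integer> result = right.compute();
      result.addAll(left.join());

      return result;
    }
  }

  // The same sequential search as before, over a sub-range
  static Set<Integer> findPrimes(int minPrimeTry, int maxPrimeTry)
  {
    Set<Integer> s = new TreeSet<>();

    outer:
    for (int i = Math.max(2, minPrimeTry); i <= maxPrimeTry; i++)
    {
      int maxJ = (int) Math.sqrt(i);

      for (int j = 2; j <= maxJ; j++)
      {
        if (i % j == 0)
        {
          continue outer;
        }
      }

      s.add(i);
    }

    return s;
  }

  static Set<Integer> findPrimesParallel(int maxPrimeTry)
  {
    return ForkJoinPool.commonPool()
                       .invoke(new FindPrimes(2, maxPrimeTry));
  }

  public static void main(String[] args)
  {
    System.out.println(findPrimesParallel(100_000).size());
  }
}
```

Returning values from each subtask trades the concurrent queue for the cost of merging sets on the way back up, so it would need its own benchmarking before drawing any conclusions.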

Let’s look at a sequential Java 8 streams solution:

import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SequentialStreamPrimes
{
  public static Set<Integer> findPrimes(int maxPrimeTry)
  {
    return IntStream.rangeClosed(2, maxPrimeTry)
                    .map(i -> IntStream.rangeClosed(2,
                                          (int) (Math.sqrt(i)))
                    .filter(j -> i / j * j == i).map(j -> 0)
                    .findAny().orElse(i))
                    .filter(i -> i != 0)
                    .mapToObj(i -> Integer.valueOf(i))
                    .collect(Collectors.toSet());
  }

  public static void main(String args[])
  {
    int maxPrimeTry = 9999999;

    long startTime = System.currentTimeMillis();

    Set<Integer> s = findPrimes(maxPrimeTry);

    long timeTaken = System.currentTimeMillis() - startTime;

    s.stream().sorted().forEach(System.out::println);

    System.out.println("Time taken: " + timeTaken);
  }
}

EDIT: A better and quicker version was posted on DZone by Tom De Greyt. Out of courtesy I’ve asked for permission to repost his solution rather than just add it here, but it would also serve as a good exercise for the reader to try to find it. Hint: it involves a noneMatch. If you want to see it, it’s in the comments on the link but it would be beneficial to try to spend a few minutes to find it first.

We can see the streams solution matches up with the external iteration version quite well except for a few tricks needed:

  • Since we only need one factor we use findAny(). This acts like the break statement.
  • findAny() returns an Optional so we need to unwrap it to get our value. If we have no value (i.e. we found a prime) we will store the prime (the outer value, i) by putting it in the orElse clause.
  • If the inner IntStream finds a factor, we can map to 0 which we can filter out before storing.
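
To see these tricks in isolation, the following sketch pulls the inner pipeline out into a hypothetical helper method and applies it to a few candidates. It uses the i % j == 0 form of the divisibility test.

```java
import java.util.stream.IntStream;

class InnerPipelineDemo
{
  // Returns 0 if i has a divisor in [2, sqrt(i)], otherwise i itself
  static int zeroIfComposite(int i)
  {
    return IntStream.rangeClosed(2, (int) Math.sqrt(i))
                    .filter(j -> i % j == 0)
                    .map(j -> 0)
                    .findAny()
                    .orElse(i);
  }

  public static void main(String[] args)
  {
    for (int i : new int[] { 13, 15, 16, 17 })
    {
      // Prints 13 -> 13, 15 -> 0, 16 -> 0, 17 -> 17
      System.out.println(i + " -> " + zeroIfComposite(i));
    }
  }
}
```

For 2 and 3 the inner range is empty, so findAny() returns an empty Optional and orElse passes the candidate straight through.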

So let’s make it threaded. We only need to change the findPrimes method slightly:

  public static Set<Integer> findPrimes(int maxPrimeTry)
  {
    return IntStream.rangeClosed(2, maxPrimeTry)
                    .parallel()
                    .map(i -> IntStream.rangeClosed(2,
                                          (int) (Math.sqrt(i)))
                    .filter(j -> i / j * j == i).map(j -> 0)
                    .findAny().orElse(i))
                    .filter(i -> i != 0)
                    .mapToObj(i -> Integer.valueOf(i))
                    .collect(Collectors.toSet());
  }

This time we don’t have to mess around with the algorithm. Simply by adding the intermediate stage parallel() to the stream we make it divide up the work. parallel(), like filter and map, is an intermediate operation. Intermediate operations can change the behaviour of a stream as well as affect the values passing through it. Other intermediate stages we’ve not seen yet are:

  • sequential() – make the stream sequential
  • distinct() – only distinct values pass
  • sorted() – a sorted stream is returned, optionally we can pass a Comparator
  • unordered() – return an unordered stream
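
A short sketch of these stages in action follows. The class and method names are illustrative only.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class MoreIntermediateStages
{
  // Removes duplicates, then sorts into natural order
  static List<Integer> distinctSorted(Stream<Integer> values)
  {
    return values.distinct()
                 .sorted()
                 .collect(Collectors.toList());
  }

  // Sorts with a Comparator instead of natural order
  static List<Integer> sortedDescending(Stream<Integer> values)
  {
    return values.sorted(Comparator.reverseOrder())
                 .collect(Collectors.toList());
  }

  public static void main(String[] args)
  {
    System.out.println(distinctSorted(Stream.of(3, 1, 3, 2, 1)));   // [1, 2, 3]
    System.out.println(sortedDescending(Stream.of(3, 1, 2)));       // [3, 2, 1]

    // parallel() and sequential() set a mode for the whole pipeline
    System.out.println(IntStream.range(0, 10)
                                .parallel()
                                .isParallel());                     // true
    System.out.println(IntStream.range(0, 10)
                                .parallel()
                                .sequential()
                                .isParallel());                     // false

    // unordered() tells later stages that encounter order need not be kept
    System.out.println(Stream.of(1, 2, 2, 3)
                             .parallel()
                             .unordered()
                             .distinct()
                             .count());                             // 3
  }
}
```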

If we fire up jconsole while we’re running and look at the Threads tab, we can compare the sequential and parallel version. In the parallel version we can see several ForkJoin threads doing the work.

I did some timings and got the following results [note this is not completely accurate since other tasks might have been running in the background on my machine – values are to the nearest half-second].

  • External, sequential (for-loop): 8.5 seconds
  • External, parallel (ForkJoin): 2.5 seconds
  • Internal, sequential (sequential stream): 21 seconds
  • Internal, parallel (parallel stream): 6 seconds

This is probably as expected. The amount of work per iteration in the inner loop is low, so any stream actions will have relatively high overhead, as seen in the sequential stream version. The parallel stream comes in slightly faster than the for-loop, but the ForkJoin version outperforms it by a factor of more than 2. Note how much simpler the streams version was [once we get the hang of streams of course] compared to the amount of code in the ForkJoin version.

Let’s have a look at the work-horse of this work distribution, the Spliterator. A Spliterator is an interface like an Iterator, but instead of just providing the next value, it can also divide work up into smaller pieces which are executed by ForkJoinTasks.

When we create a Spliterator we provide details of the size of the workload and the characteristics that the values have. Some types of Spliterator, such as RangeIntSpliterator [which IntStream.range() supplies], use the characteristics() method to return characteristics, rather than having them supplied via a constructor like AbstractSpliterator does.

We obviously need the size of the workload so we can divide the work up and know when to stop dividing. The characteristics we can supply are defined in the Spliterator interface as follows:

SIZED – we can supply a specific number of values that will be sent prior to processing (versus an InfiniteSupplyingSpliterator)
SUBSIZED – implies that any Spliterators that trySplit() creates will be SIZED and SUBSIZED. Not all SIZED Spliterators will split into SUBSIZED spliterators. The API gives an example of a binary tree where we might know how many elements are in the tree, but not in the sub-trees
ORDERED – we supply the values in sequence, for example from a list
SORTED – the order follows a sort order (rather than sequence); ORDERED must also be set
DISTINCT – each value is different from every other, for example if we supply from a set
NONNULL – values coming from the source will not be null
IMMUTABLE – it’s impossible to change the source (such as add or remove values) – if this is not set and neither is CONCURRENT we’re advised to check the documentation for what happens on modification (such as a ConcurrentModificationException)
CONCURRENT – the source may be concurrently modified safely and we’re advised to check the documentation on the policy
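
To see which characteristics some everyday sources report, we can ask their Spliterators directly. This is a small sketch with made-up class and method names; it only uses hasCharacteristics() and the constants listed above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Spliterator;
import java.util.TreeSet;
import java.util.stream.IntStream;

class CharacteristicsDemo
{
  // Lists the names of the characteristics a Spliterator reports
  static List<String> names(Spliterator<?> s)
  {
    String[] labels = { "ORDERED", "DISTINCT", "SORTED", "SIZED",
                        "NONNULL", "IMMUTABLE", "CONCURRENT",
                        "SUBSIZED" };
    int[] flags = { Spliterator.ORDERED, Spliterator.DISTINCT,
                    Spliterator.SORTED, Spliterator.SIZED,
                    Spliterator.NONNULL, Spliterator.IMMUTABLE,
                    Spliterator.CONCURRENT, Spliterator.SUBSIZED };

    List<String> result = new ArrayList<>();

    for (int i = 0; i < flags.length; i++)
    {
      if (s.hasCharacteristics(flags[i]))
      {
        result.add(labels[i]);
      }
    }

    return result;
  }

  public static void main(String[] args)
  {
    List<Integer> values = Arrays.asList(3, 1, 2);

    System.out.println("ArrayList: "
        + names(new ArrayList<>(values).spliterator()));
    System.out.println("HashSet: "
        + names(new HashSet<>(values).spliterator()));
    System.out.println("TreeSet: "
        + names(new TreeSet<>(values).spliterator()));
    System.out.println("IntStream.range: "
        + names(IntStream.range(0, 10).spliterator()));
  }
}
```

The list reports ORDERED and SIZED, the hash set reports DISTINCT but not ORDERED, and the tree set adds SORTED, which matches the descriptions above.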

These characteristics are used by the splitting machinery, for example in the ForEachOps class (which is used to carry out tasks in a pipeline terminated with a forEach). Normally we can just use a pre-built Spliterator [and often don’t even need to worry about that because it’s supplied by the stream() method]. Remember the streams framework allows us to get work done without having to know all the details of how it’s being done. It’s only in the rare cases of a special problem or needing maximum performance that we have to worry.

Splitting is done by the trySplit() operation. This returns a new Spliterator covering some of the elements, or null if the source cannot be split any further. The API documentation sets out the full requirements for this method.
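
A minimal sketch of a single split, using the Spliterator that a List supplies (the class and method names are hypothetical). The elements covered by the returned Spliterator and those left in the original are collected separately so we can see that nothing is lost or duplicated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

class TrySplitDemo
{
  // Splits once and returns the elements covered by each part
  static List<List<Integer>> splitOnce(List<Integer> source)
  {
    Spliterator<Integer> rest = source.spliterator();
    Spliterator<Integer> prefix = rest.trySplit();

    List<Integer> first = new ArrayList<>();
    List<Integer> second = new ArrayList<>();

    // trySplit() may return null if the source can't be split
    if (prefix != null)
    {
      prefix.forEachRemaining(first::add);
    }

    rest.forEachRemaining(second::add);

    return Arrays.asList(first, second);
  }

  public static void main(String[] args)
  {
    List<Integer> source =
        new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8));

    List<List<Integer>> parts = splitOnce(source);

    System.out.println("Split off: " + parts.get(0));
    System.out.println("Remaining: " + parts.get(1));
  }
}
```

For an ORDERED source the part that is split off is a prefix of the elements, so joining the two lists back together gives the original order. Exactly where the split falls is up to the implementation.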

When we consume the contents of [part of] the stream in bulk using the Spliterator, the forEachRemaining(action) operation is called. This takes source data and calls the next action via the action’s accept call. For example if the next operation is filter, the accept call on filter is called. This calls the test method of the contained predicate, and if that is true, the accept method of the next stage is called. At some point a terminal stage will be called [the accept method calls no other stage] and the final value will be consumed, reduced or collected. When we call a stream() method, this pipeline is created and calling intermediate stages chains them to the end of the pipeline. Calling the final consuming stage makes the final link and sets everything off.

Alternatively, when we need to generate each element from a non-bulk source, the tryAdvance() function is used. This is passed an action on which accept is called as before. However, we return true if we want to continue and false if we don’t. InfiniteSupplyingSpliterator for example always returns true, but we can use an AbstractSpliterator if we want to control this. Remember the AbstractIntSpliterator from our SixGame in the finite generators article? One of our tryAdvance functions was this:

@Override
public boolean tryAdvance(Consumer action)
{
  if (action == null)
    throw new NullPointerException();
  if (done)
    return false;

  action.accept(rollDie());

  return true;
}

In this case if we roll the die we always continue. This would allow the done logic to be set from elsewhere if we didn’t want to roll a die again. It might have been slightly better to have returned !done instead of true to terminate generation immediately as soon as the six was thrown. However in this case going through another cycle was hardly a chore.
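
Here is a minimal sketch of a Spliterator that ends the stream by itself once a six has been passed on. It is not the SixGame code from the earlier article: RollUntilSix and rolls() are hypothetical names, and the die is passed in as an IntSupplier so the behaviour can be checked with a scripted sequence. In this sketch tryAdvance returns true whenever it has passed a value on, and the following call returns false.

```java
import java.util.Arrays;
import java.util.PrimitiveIterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.IntConsumer;
import java.util.function.IntSupplier;
import java.util.stream.IntStream;
import java.util.stream.StreamSupport;

class RollUntilSix extends Spliterators.AbstractIntSpliterator
{
  private final IntSupplier die;
  private boolean done = false;

  RollUntilSix(IntSupplier die)
  {
    // We don't know how many rolls there will be, so we give
    // Long.MAX_VALUE as the estimate and don't claim SIZED
    super(Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL);
    this.die = die;
  }

  @Override
  public boolean tryAdvance(IntConsumer action)
  {
    if (action == null)
      throw new NullPointerException();
    if (done)
      return false;

    int roll = die.getAsInt();

    action.accept(roll);

    // Remember that the six has been thrown; the next call stops
    done = (roll == 6);

    return true;
  }

  // Wraps the Spliterator in a sequential IntStream
  static IntStream rolls(IntSupplier die)
  {
    return StreamSupport.intStream(new RollUntilSix(die), false);
  }

  public static void main(String[] args)
  {
    // A scripted die so the output is repeatable
    PrimitiveIterator.OfInt script =
        IntStream.of(2, 5, 6, 1).iterator();

    System.out.println(
        Arrays.toString(rolls(script::nextInt).toArray())); // [2, 5, 6]
  }
}
```

Swapping the scripted die for something like () -> random.nextInt(6) + 1 would give a real game.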

That’s it for the streams overview. In the next article we’ll look a bit more at lambda expressions.

Collectors Part 2: Provided collectors and a Java 8 streams demonstration

Today we’re going to continue where the last article left off. In that one we looked at collectors, specifically reduction and short-circuiting operations. Today we’ll look at the collect function and then we’ll finish off with a more substantial example showing the power Java 8 streaming gives us.

A collector gathers results and terminates the stream. It can also do reductions. The Collectors class provides a number of useful collectors ready for us to use. We’ll start by looking at those which carry out the same operations we looked at in the previous article (count, sum, average, max, min and summaryStatistics).

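import java.util.Arrays;
import java.util.stream.Collectors;

// Not called Collectors, as that would hide java.util.stream.Collectors
public class CollectorsExample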
{
  public static void main(String args[])
  {
    Integer[] numbersArray = new Integer[] { 1, 2, 3, 4, 5 };

    System.out.println(Arrays.stream(numbersArray)
                             .collect(Collectors.counting()));

    System.out.println(Arrays.stream(numbersArray)
                             .collect(
                    Collectors.summingInt((Integer x) -> x)));

    System.out.println(Arrays.stream(numbersArray)
                             .collect(
                    Collectors.averagingInt((Integer x) -> x)));

    System.out.println(Arrays.stream(numbersArray)
                             .collect(
                    Collectors.maxBy(Integer::compare)).get());

    System.out.println(Arrays.stream(numbersArray)
                             .collect(
                    Collectors.minBy(Integer::compare)).get());

    System.out.println(Arrays.stream(numbersArray)
                             .collect(
                    Collectors.summarizingInt((Integer x) -> x)));
  }
}

Note we’re streaming Integer rather than int, so we need to pass a ToIntFunction for the sum, average and summarizing collectors. A ToIntFunction applies a function to a value of some type and returns an int. Here the parameter to the lambda is declared explicitly as an Integer to make the types obvious; the compiler can normally infer this from the stream’s element type, so x -> x should work too. Auto-unboxing will do the rest.

All relatively easy so far, but now it gets a bit harder. The problem that we face with the rest of the API is that the functions have several overloaded versions and there are lots of generics in the specification making it hard to read.

Take this one from partitioningBy which is not the most difficult to understand:

    public static <T, D, A>
    Collector<T, ?, Map<Boolean, D>>
       partitioningBy(Predicate<? super T> predicate,
                      Collector<? super T, A, D> downstream)

We can see clearly that partitioningBy takes a predicate and a collector, and returns a collector which the collect function can use. The problem is working out what all these types are, even with the help of the documentation. Luckily I’ll provide some examples which should make it easy to start.
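
To make the type parameters concrete, here is a small sketch (the class and method names are illustrative). T is the element type of the stream, D is the result type of the downstream collector, and A is the downstream collector’s internal accumulation type, which we rarely need to name ourselves.

```java
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PartitioningTypes
{
  // T = Integer; the downstream collector is counting(), so D = Long
  static Map<Boolean, Long> countEvenAndOdd(Stream<Integer> numbers)
  {
    return numbers.collect(
               Collectors.partitioningBy(x -> x % 2 == 0,
                                         Collectors.counting()));
  }

  public static void main(String[] args)
  {
    Map<Boolean, Long> counts =
        countEvenAndOdd(Stream.of(1, 2, 3, 4, 5));

    System.out.println("Even: " + counts.get(true));  // Even: 2
    System.out.println("Odd: " + counts.get(false));  // Odd: 3
  }
}
```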

What I did when I explored was make simple examples like those I’m presenting:

  • Try the simplest version with the least parameters. That will usually be the easiest to understand.
  • Once you get that working look at the implementation. Often the simpler one will be passing its own ‘default’ parameters to a more complicated version giving a clue to what that is expecting.
  • Try the more complicated one, but substitute the ‘default’ parameter with something else. See what happens. Does it compile and do what’s expected, if not why not?
  • Try stepping through the library code and see how the code is using the parameters.

Hopefully the examples I’ve provided make easy work of this, but you can learn a lot more by experimenting yourself.

Collecting sounds like the thing collections are for, and indeed there are several ways of building collections with results coming from a stream. We can build a generic list, a generic set, a generic map, or we can build a specific type of collection or map:

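import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class CollectInCollections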
{
  public static void main(String args[])
  {
    Character[] chars = new Character[]
                        { 'a', 'b', 'c', 'd', 'e', 'f', 'g' };

    // First a list
    List<Character> l = Arrays.stream(chars)
                              .collect(Collectors.toList());

    System.out.println(l);

    // toList gives us a generic list (code creates an ArrayList)
    // Let's get a linked list
    List<Character> ll = Arrays.stream(chars)
                               .collect(
                       Collectors.toCollection(LinkedList::new));

    System.out.println(ll);

    // toSet gives us a generic set (code creates a HashSet)
    Set<Character> s = Arrays.stream(chars)
                                   .collect(Collectors.toSet());

    System.out.println(s);

    // and now a generic map (code creates a HashMap)
    Map<Character, Character> m = 
                        Arrays.stream(chars).collect(
                  Collectors.toMap((Character k) -> 
                                       Character.toUpperCase(k),
                                   Function.identity()));

    System.out.println(m);

    // What happens if keys clash?
    try
    {
      Arrays.stream(chars).collect(
                  Collectors.toMap((Character k) -> 'a', 
                                   Function.identity()));
    }
    catch (IllegalStateException e)
    {
      System.out.println("Caught duplicate key");
    }

    // Let's provide a function to resolve this
    // we'll keep the first
    Map<Character, Character> m2 =
                        Arrays.stream(chars).collect(
               Collectors.toMap((Character k) -> 'a',
                                Function.identity(),
                               (v1, v2) -> v1));

    System.out.println(m2);

    // To keep the latest instead, return the second value.
    // (Returning null would remove the mapping rather than
    // keep the latest.)
    Map<Character, Character> m3 =
                        Arrays.stream(chars).collect(
                Collectors.toMap((Character k) -> 'a',
                                 Function.identity(),
                                 (v1, v2) -> v2));

    System.out.println(m3);

    // We can also request a different type of map
    Map<Character, Character> m4 =
                        Arrays.stream(chars).collect(
                Collectors.toMap(
                   (Character k) -> Character.toUpperCase(k),
                                 Function.identity(),
                                 (v1, v2) -> v1,
                                 TreeMap::new));

    System.out.println(m4);
  }
}

Notes:

  • toMap comes with several overloaded versions which help us deal with creating keys for our values and with clashes. As we know, if we put a key-value pair into a map where the key already exists, we overwrite the existing value.
    The first two parameters map each element onto a key and a value respectively. For one of these (often the value) we don’t want to change anything and so passing Function.identity() (or the lambda v -> v) will keep the value the same.
  • By default toMap uses a throwingMerger() which throws an IllegalStateException if two keys clash. We can see this in action if we force the keys to a single value. If we specify a third parameter we can supply a BinaryOperator (two values in, one out) to deal with the clash instead. Returning the first argument keeps the earliest value and returning the second keeps the latest. Be careful about returning null: the merge is carried out by Map.merge, which removes the mapping when the function returns null, so the next value for that key is simply added as if it were new.
  • If we specify a fourth parameter to toMap we can specify a specific type of map.
  • The documentation doesn’t state what type of collection is returned for toList(), toSet() and the 2 and 3 parameter versions of toMap(). The idea is that functional programming provides a rich set of features for a few collections rather than lots of collections. We therefore shouldn’t make any assumptions and where it matters use toCollection or the 4 parameter version of toMap to be sure, or convert later.
  • There are also concurrent versions of toMap called toConcurrentMap which can give better performance in parallel streams when we don’t care about the order.
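
The behaviour of the merge function is easy to check with a small experiment. In this sketch (class and method names are made up) every value is sent to the same key so that the merge function always runs. Note what happens when it returns null: Map.merge removes the mapping, so with an even number of values nothing is left.

```java
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class MergeFunctionDemo
{
  // Sends every value to the key 'a' so the keys always clash
  static Map<Character, Character> collide(
                          BinaryOperator<Character> merge,
                          Character... values)
  {
    return Stream.of(values).collect(
               Collectors.toMap((Character k) -> 'a',
                                Function.identity(),
                                merge));
  }

  public static void main(String[] args)
  {
    // Keep the first value
    System.out.println(collide((v1, v2) -> v1, 'a', 'b', 'c'));  // {a=a}

    // Keep the latest value
    System.out.println(collide((v1, v2) -> v2, 'a', 'b', 'c'));  // {a=c}

    // Returning null removes the mapping; the next value is then
    // added as if new, so the outcome depends on how many clash
    System.out.println(collide((v1, v2) -> null, 'a', 'b', 'c'));      // {a=c}
    System.out.println(collide((v1, v2) -> null, 'a', 'b', 'c', 'd')); // {}
  }
}
```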

Now on to the rest of the operations, which are for joining strings, grouping and partitioning:

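import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.stream.Collectors;

class JoiningGroupingAndPartitioning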
{
  public static void main(String args[])
  {
    Character[] chars =
           new Character[] { 'a', 'b', 'c', 'd', 'e', 'f', 'g' };

    // Join them all together
    System.out.println(
           Arrays.stream(chars).map(x -> x.toString())
                               .collect(Collectors.joining()));

    // Join with a ,
    System.out.println(
           Arrays.stream(chars).map(x -> x.toString())
                            .collect(Collectors.joining(",")));

    // Join with a , and surround the whole thing with []
    System.out.println(Arrays.stream(chars)
                             .map(x -> x.toString())
                             .collect(
                           Collectors.joining(",", "[", "]")));

    // Group into two groups
    Map<String, List<Character>> group1 =
           Arrays.stream(chars).collect(
                           Collectors.groupingBy(
          (Character x) -> x < 'd' ? "Before_D" : "D_Onward"));

    System.out.println(group1);

    // As before, but group values with like keys in a set
    Map<String, Set<Character>> group2 =
           Arrays.stream(chars).collect(
                           Collectors.groupingBy(
          (Character x) -> x < 'd' ? "Before_D" : "D_Onward",
                                         Collectors.toSet()));

    System.out.println(group2);

    // Put the whole grouping structure in a TreeMap
    Map<String, Set<Character>> group3 =
           Arrays.stream(chars).collect(
                           Collectors.groupingBy(
          (Character x) -> x < 'd' ? "Before_D" : "D_Onward",
                             TreeMap::new,
                             Collectors.toSet()));

    System.out.println(group3);

    // Partition into two lists
    Map<Boolean, List<Character>> partition1 =
           Arrays.stream(chars).collect(
                            Collectors.partitioningBy(
                                 (Character x) -> x < 'd'));

    System.out.println(partition1);

    // Partition into two sets
    Map<Boolean, Set<Character>> partition2 =
           Arrays.stream(chars).collect(
                            Collectors.partitioningBy(
                                 (Character x) -> x < 'd',
                                       Collectors.toSet()));

    System.out.println(partition2);
  }
}

The first type of collector in this example is a joining collector. This is used to join Strings together (in the example we map characters to Strings). The no parameter version just joins the Strings together, the one parameter version allows us to specify a string to put between any strings we join. The 3 parameter version also allows us to specify a start and an end string to flank the collected String with. This can be useful for producing debug or human readable output.

The second is a groupingBy collector. This allows us to group together elements that have the same classification into a map. To classify we pass a classification function which takes an element and returns the type we are using for the keys of the map. groupingBy has three versions: the first just takes the classification function; the second also takes a downstream collector, which allows us to specify the collection type for values that share a key (the default is a generic list). The third is the same as the second, but has another parameter (the 2nd one) which allows us to specify the type of map (as opposed to just a generic one). We pass a constructor using the :: notation, with the constructor denoted by ‘new’.

The third is a partitioningBy collector. This is like groupingBy, but instead of passing a function to specify the key, we pass a predicate which determines the key (true or false) for each value. In the single parameter version, values that share the same key are organised into a generic list, whereas in the two parameter version we can specify the collection type for the values.
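
The extra parameter to groupingBy and partitioningBy is a full collector, so it isn’t limited to choosing a collection type. As a sketch, here we count and join the members of each group instead (the class and method names are illustrative).

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class DownstreamCollectors
{
  // The same classification as in the example above
  static String classify(Character x)
  {
    return x < 'd' ? "Before_D" : "D_Onward";
  }

  // Count how many elements fall into each group
  static Map<String, Long> countPerGroup(Stream<Character> chars)
  {
    return chars.collect(
               Collectors.groupingBy(
                   DownstreamCollectors::classify,
                   TreeMap::new,
                   Collectors.counting()));
  }

  // Join the elements of each group into a single String
  static Map<String, String> joinPerGroup(Stream<Character> chars)
  {
    return chars.collect(
               Collectors.groupingBy(
                   DownstreamCollectors::classify,
                   TreeMap::new,
                   Collectors.mapping(Object::toString,
                                      Collectors.joining(","))));
  }

  public static void main(String[] args)
  {
    // {Before_D=3, D_Onward=2}
    System.out.println(
        countPerGroup(Stream.of('a', 'b', 'c', 'd', 'e')));

    // {Before_D=a,b,c, D_Onward=d,e}
    System.out.println(
        joinPerGroup(Stream.of('a', 'b', 'c', 'd', 'e')));
  }
}
```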

So we’ve seen plenty of Hello World style examples, but I think I owe you something that’s a bit more realistic. Let’s model a club. Each member of the club has a membership which keeps track of their name, age and gender. It’s also possible to register two members as a couple. There are three types of membership: a junior membership for the under 18s, a senior membership for those aged 60 and over, and an adult membership for everyone else.

We’re tasked with the following:

  • Get all members’ names as a String separated by ,
  • Find the average age (rounded down to the nearest integer)
  • Split the membership list into all the male and all the female members
  • Classify the members depending on their membership types
  • Get all couples as a List

This is a reasonably sized example that would pass quite easily for a university programming homework/exam, or a longer programming exercise in an interview. Being able to knock out such code from a description in say 30 minutes will stand you in very good stead.

Let’s think how to approach this. Before you start iterating through the collections using ‘for’ and keeping lots of mutable state, we’ll solve this without any mutable state. Well, except in one place for convenience (registering couples). The data is not going to get mutated further in the example once we’re set up anyway. This means we’re less likely to introduce bugs through incorrect mutation or off-by-one errors, and we can delegate more of the how-to-do-it machinery to the Java libraries, leaving us to focus on arranging it to solve the problem. These will be our first steps in thinking like a functional programmer, without being too far away from what an OO programmer would understand.

First we are going to create a ClubMember type:

public class ClubMember
{
  private String name;
  private boolean male;
  private int age;
  private ClubMember partner;

  public ClubMember(String name, boolean male, int age)
  {
    this.name = name;
    this.male = male;
    this.age = age;
  }

  public String getName()
  {
    return this.name;
  }

  public int getAge()
  {
    return age;
  }

  public ClubMember getPartner()
  {
    return partner;
  }

  @Override
  public String toString()
  {
    return name;
  }
}

All pretty straightforward and it wouldn’t look out of place in object-oriented code. We have a constructor, some getters and a toString method. We don’t have an isMale() getter, but we’ll see why in a second. There’s a lot of boiler-plate though, so if boiler-plate makes your blood boil [I’m allowed a quip] then you might want to take a look at Project Lombok.

We’re also going to have to register a couple somehow, and let’s assume this club frowns on polygamy. We will use a static method to do it. Why static? A static method can set the partner field of both members in the same method. If it were an instance method we’d have to rely on our class’ user to call it on both members, which they might forget to do. What if we made an instance register method that calls the passed member’s register method inside it? In that case we would have to decide how to stop an infinite loop ping-ponging between the two instances. After all, that method is going to call the first one again.

So here is the static method:

  public static void registerPartners(ClubMember cm1,
                                      ClubMember cm2)
  {
    cm1.partner = cm2;
    cm2.partner = cm1;
  }

I also want to add two static fields. Instead of a getter for isMale and a test for a partner being a member, I want to use Predicates instead so I can do some functional things with them:

public static final Predicate<ClubMember> isMale =
                                               m -> m.male;

public static final Predicate<ClubMember> isPartnerMember =
                                    m -> m.partner != null;

We’re done with the ClubMember class now.

Having a Tuple2 type to manage both age ranges for our memberships and couples would be useful. Unfortunately standard Java doesn’t have a Tuple2 type, so we’ll make our own (in real code, a library that provides a pair or tuple type, such as Apache Commons Lang’s Pair or Vavr’s Tuple2, would save us reinventing the wheel):

public class Tuple2<T1, T2>
{
  public T1 t1;
  public T2 t2;

  public Tuple2(T1 first, T2 second)
  {
    t1 = first;
    t2 = second;
  }

  @Override
  public String toString()
  {
    return "(" + t1 + ", " + t2 + " " + ")";
  }
}

A couple Tuple2 holds two ClubMember values; an age range Tuple2 holds two Integer values. As we would probably reuse the two-Integer Tuple2 elsewhere, we’ll define a dedicated IntegerRange:

public class IntegerRange extends Tuple2<Integer, Integer>
{
  public IntegerRange(Integer start, Integer end)
  {
    super(start, end);
  }

  public Integer getStart()
  {
    return t1;
  }

  public Integer getEnd()
  {
    return t2;
  }

  public final Predicate<Integer> inRange = 
                                i -> i >= t1 && i < t2;
}

This also allows us to define another predicate to determine whether a value is in the range. We’ll take the end-point as non-inclusive.
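
To see what a non-inclusive end-point means in practice, here is a minimal, self-contained sketch. It doesn’t reuse the IntegerRange class above; HalfOpenRangeDemo and its inRange helper are hypothetical names made up for illustration, and the ranges simply mirror the junior and adult ranges used in this article.

```java
import java.util.function.Predicate;

public class HalfOpenRangeDemo
{
  // Hypothetical helper: builds the same kind of test as IntegerRange.inRange,
  // with the start inclusive and the end exclusive.
  public static Predicate<Integer> inRange(int start, int end)
  {
    return i -> i >= start && i < end;
  }

  public static void main(String[] args)
  {
    Predicate<Integer> juniors = inRange(0, 18);
    Predicate<Integer> adults = inRange(18, 60);

    System.out.println(juniors.test(17)); // true
    System.out.println(juniors.test(18)); // false
    System.out.println(adults.test(18));  // true
    System.out.println(adults.test(60));  // false
  }
}
```

Because the end is excluded, adjacent ranges can share a boundary value without overlapping: an 18 year old is an adult only.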

Now on to our main class:

public class ClubMembers
{
  private List<ClubMember> members;

  private static final IntegerRange juniors =
                              new IntegerRange(0, 18);
  private static final IntegerRange adult =
                              new IntegerRange(18, 60);
  private static final IntegerRange seniors = 
               new IntegerRange(60, Integer.MAX_VALUE);

  private static final String juniorMembership =
                                   "Junior Membership";
  private static final String adultMembership =
                                    "Adult Membership";
  private static final String seniorMembership =
                                   "Senior Membership";

  public ClubMembers(List<ClubMember> members)
  {
    this.members = members;
  }
}

Let’s start off with a field to store the membership list and a constructor to initialise it. We also will define our age ranges for our membership types and some strings to represent them. We’ll add all the rest of the code into this class.

So the first thing we want to do is get all the members. Let’s join them together with a comma. Does that sound like a job for Collectors.joining? You bet!:

public String getAllMembers()
{
  return members.stream().map(m -> m.getName())
                         .collect(Collectors.joining(", "));
}

That was simple. We take a member, map it to just its name, and then collect with joining. The cool thing here is we don’t need to worry whether to add a comma or not as it’s taken care of for us. No more defining mutable state variables such as ‘first’ to avoid putting a comma before the first name.

Now on to average age. Again simple since we’ve seen this before:

public OptionalDouble getAverageAge()
{
  // mapToInt takes us straight from ClubMember to an IntStream of ages
  return members.stream().mapToInt(ClubMember::getAge)
                         .average();
}

Next we need to find all male and female members. Hmmmm two groups, this sounds like a job for Collectors.partitioningBy.

Note: One thing we should do when programming in a functional style is try to reuse as much as possible. Instead of the class being a template that we specialise, we use a function [often as a parameter] to give it special behaviour.

Let’s partition members by an arbitrary predicate:

private Map<Boolean, List<ClubMember>> partitionMembers(
                                   Predicate<ClubMember> p)
{
  return members.stream().collect(
                            Collectors.partitioningBy(p));
}

Now we can specialise this function by passing the appropriate predicate, conveniently defined in ClubMember:

public Map<Boolean, List<ClubMember>> getMembersByGender()
{
  return partitionMembers(ClubMember.isMale);
}

So now we have two groups, the false group are female, the true group are male.
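
As an aside, partitioningBy can also take a second, downstream collector if we want something other than a list for each group. The following is only a sketch: it uses plain strings rather than ClubMember, and the class name, method name and the ‘short name’ test are all invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PartitionDemo
{
  // Splits names into short (4 characters or fewer) and long,
  // counting each group instead of listing it.
  public static Map<Boolean, Long> countByShortName(List<String> names)
  {
    return names.stream().collect(
        Collectors.partitioningBy(n -> n.length() <= 4,
                                  Collectors.counting()));
  }

  public static void main(String[] args)
  {
    List<String> names = Arrays.asList("Dave", "Penny", "Bill", "Johnny");

    Map<Boolean, Long> counts = countByShortName(names);
    System.out.println("Short names: " + counts.get(true));
    System.out.println("Long names: " + counts.get(false));
  }
}
```

One handy property of partitioningBy is that the returned map always has both a true and a false entry, even if one of the groups turns out to be empty.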

What about our memberships? This time we have three groups, and thus Collectors.partitioningBy isn’t a good fit, so let’s use Collectors.groupingBy. Again let’s use the same trick: a generic classification function we pass a classifier to. This classifier is just a function taking a ClubMember and classifying into groups of an arbitrary type:

private <T> Map<T, List<ClubMember>>
         classifyMembers(Function<ClubMember, T> classifier)
{
  return members.stream().collect(
                         Collectors.groupingBy(classifier));
}

We need that classification function to take a ClubMember and return details of their membership type as a String:

private static final Function<ClubMember, String>
                  resolveMembershipType =
    m -> juniors.inRange.test(m.getAge()) ? juniorMembership :
         adult.inRange.test(m.getAge()) ? adultMembership :
                                          seniorMembership;

We defined the strings, we defined the age ranges, and we defined an inRange test. It was just a case of putting them together. Now we just specialise classifyMembers:

private Map<String, List<ClubMember>> classifyMemberships()
{
  return classifyMembers(resolveMembershipType);
}

See how easy this can be?

Last is the couple members. What does it mean to be a couple member?

Well, we expect the isPartnerMember predicate defined in ClubMember to return true. That’s a filter we need in order to check this.

Now if we go through all our members with partners we will return each couple twice: once as (personA, personB) and once as (personB, personA). That might sometimes be what we want, but in this case it’s not. Let’s assume that the two partners have different names (to be robust we should probably have member ids for this sort of thing). We need to arbitrarily choose one partner to go first, so let’s keep a member only if comparing their name to their partner’s name returns < 0. String has a compareTo method defined for us, so we can do that. This will need another filter.

We also want to return the couple, not just one of the members, so let’s use Tuple2 to hold that. This will need a map.

Finally we want a list of couples, so Collectors.toList() will do that job nicely:

public List<Tuple2<ClubMember, ClubMember>> getCouples()
{
  return members.stream()
             .filter(ClubMember.isPartnerMember)
             .filter(m -> m.getName().
                     compareTo(m.getPartner().getName()) < 0)
             .map(m -> new Tuple2<>(m, m.getPartner()))
             .collect(Collectors.toList());
}

I hope this is making you smile – a little bit of thought about the problem and we can easily solve it by putting blocks together.

Let’s write a driver main method in ClubMembers. To make things a little easier to verify we’ll refer to the couples by their titles and everyone else by their first name:

public static void main(String args[])
{
  ClubMember cm1 = new ClubMember("Johnny", true, 13);
  ClubMember cm2 = new ClubMember("Jenny", false, 9);
  ClubMember cm3 = new ClubMember("Dave", true, 21);
  ClubMember cm4 = new ClubMember("Penny", false, 28);
  ClubMember cm5 = new ClubMember("Mrs. Smith", false, 36);
  ClubMember cm6 = new ClubMember("Mr. Smith", true, 45);
  ClubMember cm7 = new ClubMember("Mr. Watts", true, 59);
  ClubMember cm8 = new ClubMember("Mrs. Watts", false, 60);
  ClubMember cm9 = new ClubMember("Bill", true, 68);

  ClubMember.registerPartners(cm5, cm6);
  ClubMember.registerPartners(cm7, cm8);

  ClubMember[] membersArray = new ClubMember[]
           { cm1, cm2, cm3, cm4, cm5, cm6, cm7, cm8, cm9 };

  ClubMembers members = new ClubMembers(
                              Arrays.asList(membersArray));

  System.out.println("Members: " + members.getAllMembers());
  System.out.println("Average age: " + 
         (int) members.getAverageAge().orElse(0));
  System.out.println("Membership by gender (true is male): " +
                        members.getMembersByGender());
  System.out.println("Memberships: " +
                        members.classifyMemberships());
  System.out.println("Couples: " + members.getCouples());
}

Let’s fire it up:

Members: Johnny, Jenny, Dave, Penny, Mrs. Smith, Mr. Smith, Mr. Watts, Mrs. Watts, Bill
Average age: 37
Membership by gender (true is male): {false=[Jenny, Penny, Mrs. Smith, Mrs. Watts], true=[Johnny, Dave, Mr. Smith, Mr. Watts, Bill]}
Memberships: {Senior Membership=[Mrs. Watts, Bill], Junior Membership=[Johnny, Jenny], Adult Membership=[Dave, Penny, Mrs. Smith, Mr. Smith, Mr. Watts]}
Couples: [(Mr. Smith, Mrs. Smith ), (Mr. Watts, Mrs. Watts )]

What you might notice when writing code like this on your own is that, once you deal with all the compile errors, it tends to work first time out of the box. This is very different from doing it in a procedural style using iterators, where you often end up with off-by-one errors, null pointers and a host of other problems. By eliminating the possibility of making those mistakes we can write code quicker and be more productive. Java 8 Streams and the supplied collectors make life very easy for us.

Collectors Part 1 – Reductions and Short-Circuiting Operations

In the first couple of articles we looked at streams. We saw that we could take something simple such as a list of countries, filter or map their names and then print them via forEach. We then looked at ranges/loops and generators as a way of supplying values as an alternative to a predefined list.

Although we didn’t explicitly mention this, a stream can be divided into 3 distinct parts:

  1. A source operation such as a supplier or a generator which pushes elements into our stream via a spliterator.
  2. Optional intermediate steps: these can filter values, sort values, map values, affect the stream’s processing (such as go parallel) and so on.
  3. Finally a terminal operation either consumes the values, reduces the values, short-circuits the values or collects them. Short-circuiting a terminal operation means that the stream may terminate before all values are processed. This is useful if the stream is infinite.

We’ve covered the first two parts reasonably well and also used forEach to do consuming, so let’s now look at collecting. Why collect instead of consume? There are several reasons including:

  • Since it returns nothing, consuming must involve a side-effect (else it wouldn’t do anything). When running in parallel, that side-effect might not happen in the order we expect, and forcing it into order causes unnecessary synchronisation
  • We want to use the results again later
  • We want to reduce the values into a single result
  • We want to be able to inspect/return the values, such as for unit tests or to build in reusability.
  • Side-effects can make testing hard and often require mocking
  • Side-effects break the concept of pure-functions (values in, results out only; same values in give same results out) which make it harder to prove code works

We’ll start by looking at reduction. This is a form of collecting where instead of returning all the results which come out of the stream, we condense them down into [usually] a single result. A common example would be summing all the values. Let’s look at the built in reduction operations using a list of Integer as the source:

public class ListReduction
{
  public static void main(String[] args)
  {
	List<Integer> numbersList = Arrays.asList(1, 2, 5, 4, 3);

	System.out.println(numbersList.stream().count());
		
	System.out.println(numbersList.stream().mapToInt(x -> x).sum());

	System.out.println(numbersList.stream().mapToInt(x -> x).average()
				.getAsDouble());

	System.out.println(numbersList.stream().mapToInt(x -> x).max()
				.getAsInt());

	System.out.println(numbersList.stream().mapToInt(x -> x).min()
				.getAsInt());

	System.out.println(numbersList.stream().mapToInt(x -> x)
				.summaryStatistics());
  }
}

Note:

  • the summaryStatistics() operation calculates all the values
  • average() returns an OptionalDouble – we need to use getAsDouble() to get the value
  • max() and min() return OptionalInt – we need to use getAsInt() to get the value

As already discussed in the article on Optional, if the Optional value happens to be the special empty() value [when we didn’t pass any values through or filtered all of them out] we will get a NoSuchElementException if we try to use get or getAs<type>. We might wish to use orElse, for example, to supply a default and avoid this.
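
Here is a short sketch of what happens when every value has been filtered out. The class name, the method name and the choice of -1 as a default are all just assumptions for the example.

```java
import java.util.NoSuchElementException;
import java.util.OptionalInt;
import java.util.stream.IntStream;

public class EmptyOptionalDemo
{
  // Largest even value, or the supplied default if there are no even values.
  public static int maxEvenOrDefault(int[] values, int defaultValue)
  {
    return IntStream.of(values).filter(x -> x % 2 == 0)
                               .max().orElse(defaultValue);
  }

  public static void main(String[] args)
  {
    int[] odds = { 1, 3, 5 };

    // Every value is filtered out, so max() has nothing to return
    OptionalInt max = IntStream.of(odds).filter(x -> x % 2 == 0).max();

    try
    {
      System.out.println(max.getAsInt());
    }
    catch (NoSuchElementException e)
    {
      System.out.println("No value present");
    }

    // Supplying a default avoids the exception
    System.out.println(maxEvenOrDefault(odds, -1));
  }
}
```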

Also note because we were streaming a list, we had to use mapToInt(x -> x) to change the stream shape from Object to int as IntStream works with int not Integer.

If we used an array of int instead we could dispense with the map:

public class ArrayReduction
{
  public static void main(String[] args)
  {
	int[] numbersArray = new int[] { 1, 2, 5, 4, 3 };

	System.out.println(Arrays.stream(numbersArray).count());

	System.out.println(Arrays.stream(numbersArray).sum());

	System.out.println(Arrays.stream(numbersArray).average().getAsDouble());

	System.out.println(Arrays.stream(numbersArray).min().getAsInt());

	System.out.println(Arrays.stream(numbersArray).max().getAsInt());

	System.out.println(Arrays.stream(numbersArray).summaryStatistics());
  }
}

This looks a bit tidier. We can’t do anything about having to create the stream each time. If we tried to save a reference to Arrays.stream(numbersArray) it would only be able to be used once. This is why summaryStatistics can be very useful.
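
The single-use rule is easy to demonstrate. In this sketch (class and method names are made up) we try to run a second terminal operation on a saved stream reference, and then get several results from one pass using summaryStatistics instead.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.stream.IntStream;

public class SingleUseStream
{
  // Returns true if a second terminal operation on the same stream
  // is rejected with an IllegalStateException.
  public static boolean secondUseFails(int[] values)
  {
    IntStream stream = Arrays.stream(values);
    stream.sum(); // first terminal operation uses the stream up

    try
    {
      stream.max();
      return false;
    }
    catch (IllegalStateException e)
    {
      return true;
    }
  }

  // One pass over the values gives count, sum, min, max and average.
  public static IntSummaryStatistics summarise(int[] values)
  {
    return Arrays.stream(values).summaryStatistics();
  }

  public static void main(String[] args)
  {
    int[] numbersArray = new int[] { 1, 2, 5, 4, 3 };

    System.out.println(secondUseFails(numbersArray));

    IntSummaryStatistics stats = summarise(numbersArray);
    System.out.println(stats.getSum() + " " + stats.getMin()
                       + " " + stats.getMax());
  }
}
```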

What if we want to write our own reductions? There are two ways. The first is to use the reduce operation which we’ll look at here. The other way is to use collect which we’ll look at in the next article.

To do reduction we need one or two things:

  • a binary function which takes two values and returns a single one
  • we may also need an initial value (termed the identity)

    Let’s imagine the stream as a queue of values [assume the stream is sequential]. If an identity value is given, we’ll put that in the queue first. All the values from the stream in turn are then added to the queue. Once we have our queue, we remove the first value and assign it to the accumulator. While there are more values in the queue, we remove the next value from the front of the queue, and then perform the binary function on the accumulator and the value removed. We then assign the result back to the accumulator. This is repeated until the queue is empty.

    It’s easy to see why it’s useful to have an identity value. In the case of sum, for example, the identity is zero, and thus zero is assigned to the accumulator before values are taken from the stream. If there are no values in the stream, the final result is just zero.

    What if both the stream is empty and there was no identity value? To solve this problem, the version of the API without an identity value returns an appropriate Optional. You can now see why we took a detour to discuss Optional in the last article.
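
The queue picture can be written out literally. This is purely a mental model for a sequential stream with an identity value, not how the library does it internally; ReduceByHand and its reduce helper are hypothetical names for the sketch.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;
import java.util.function.IntBinaryOperator;

public class ReduceByHand
{
  // Hypothetical helper that follows the queue description step by step.
  public static int reduce(int identity, IntBinaryOperator op, int... values)
  {
    Queue<Integer> queue = new ArrayDeque<>();
    queue.add(identity);          // the identity goes in first
    for (int v : values)
    {
      queue.add(v);               // then every value in turn
    }

    int accumulator = queue.remove();
    while (!queue.isEmpty())
    {
      accumulator = op.applyAsInt(accumulator, queue.remove());
    }
    return accumulator;
  }

  public static void main(String[] args)
  {
    int[] numbersArray = new int[] { 1, 2, 3, 4, 5 };

    // The hand-written version and the library version should agree
    System.out.println(reduce(0, Integer::sum, numbersArray));
    System.out.println(Arrays.stream(numbersArray)
                             .reduce(0, Integer::sum));

    // With no values at all, the result is just the identity
    System.out.println(reduce(0, Integer::sum));
  }
}
```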

    Let’s replace the built in operations above with explicit reductions using reduce:

    public class ExplicitReductions
    {
      public static void main(String[] args)
      {
        int[] numbersArray = new int[] { 1, 2, 3, 4, 5 };
    
        System.out.println(Arrays.stream(numbersArray).map(x -> 1)
                                 .reduce(0, Integer::sum));
    
        System.out.println(Arrays.stream(numbersArray)
                                 .reduce(0, Integer::sum));
    
        System.out.println(Arrays.stream(numbersArray)
                                 .reduce(Integer::min).getAsInt());
    
        System.out.println(Arrays.stream(numbersArray)
                                 .reduce(Integer::max).getAsInt());
      }
    }
    

    A few things to note:

    • To perform count we have to map the values to a 1 and then do a sum. It might seem that it would be far easier to just use length on the array to get the count, however remember in a stream we might have other operations first such as to filter some of the values. An example use might be to count how many values are even.
    • Average is missing since it’s a bit more complicated. We have to keep both a tally and a sum so the simple call to reduce is not enough to implement it.
    • The reduction operation is also called ‘fold left’ since if we drew a tree it would be leaning left.

    For example with 4 values:

    [Figure: fold-left tree]

    This reduces to (((Val1 Op1 Val2) Op2 Val3) Op3 Val4)
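
Coming back to the note on average: one possible way to do it with reduce is to carry the sum and the tally together in a small immutable object, using the three-argument form of reduce on a Stream of Integer. This is only a sketch; the Tally class is invented for the example, and returning 0.0 for an empty list is an arbitrary choice.

```java
import java.util.Arrays;
import java.util.List;

public class AverageByReduce
{
  // Hypothetical immutable holder for a running sum and a running count.
  static final class Tally
  {
    final long sum;
    final long count;

    Tally(long sum, long count)
    {
      this.sum = sum;
      this.count = count;
    }

    // Folds one more value into the tally, returning a new Tally.
    Tally add(int value)
    {
      return new Tally(sum + value, count + 1);
    }

    // Combines two partial tallies (used if the stream is split up).
    Tally merge(Tally other)
    {
      return new Tally(sum + other.sum, count + other.count);
    }

    double average()
    {
      return count == 0 ? 0.0 : (double) sum / count;
    }
  }

  public static double average(List<Integer> values)
  {
    return values.stream()
                 .reduce(new Tally(0, 0), Tally::add, Tally::merge)
                 .average();
  }

  public static void main(String[] args)
  {
    System.out.println(average(Arrays.asList(1, 2, 3, 4, 5)));
  }
}
```

It works, but creating a new object for every element is wasteful, which is one reason collect is usually the better tool for this kind of job.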

    We can use our own functions in reduce. For example to do a factorial we just need a function which multiplies the accumulator by the next value:

    public class Factorial
    {
      public static void main(String[] args)
      {
        int n = 6;
    		
        System.out.println(IntStream.rangeClosed(1, n)
                                    .reduce((x, y) -> x * y).getAsInt());
      }
    }
    

    Let’s finish off by looking at the short-circuit operators:

    public class ShortCircuit
    {
      public static void main(String[] args)
      {
        List<String> countries = Arrays.asList("France", "India", "China",
                                               "USA", "Germany");
    
        System.out.println(countries.stream()
                           .filter(country -> country.contains("i"))
                           .findFirst().get());
    
        System.out.println(countries.stream()
    		       .filter(country -> country.contains("i"))
                           .findAny().get());
    
        System.out.println(countries.stream()
                           .allMatch(country -> country.contains("i")));
    		
        System.out.println(countries.stream()
                           .allMatch(country -> !country.contains("z")));
    
        System.out.println(countries.stream()
                           .noneMatch(country -> country.contains("z")));
    
        System.out.println(countries.stream()
                           .anyMatch(country -> country.contains("i")));
    
        System.out.println(countries.stream()
                           .anyMatch(country -> country.contains("z")));
    
      }
    }
    

    As said earlier, terminal short-circuit operations may mean we don’t process all the values in the stream. There are built in operations to find the first value that matches [findFirst], any one value that matches [findAny] and to find out if all, any or none match [allMatch, anyMatch, noneMatch].

    Note that in the case of findFirst or findAny we only need the first value which matches the predicate (although findAny is not guaranteed to return the first). If the stream has no ordering then we’d expect findFirst to behave like findAny. The operations allMatch, noneMatch and anyMatch may not short-circuit the stream at all, since it may take evaluating all the values to determine whether the result is true or false. Thus an infinite stream using these may not terminate.
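
Here is a small sketch of short-circuiting on an infinite source. The class and method names are made up; the source is simply the integers counting up from 1.

```java
import java.util.stream.IntStream;

public class InfiniteShortCircuit
{
  // anyMatch stops as soon as one element matches, so this returns
  // even though the source never runs out of values.
  public static boolean hasValueAbove(int limit)
  {
    return IntStream.iterate(1, x -> x + 1)
                    .anyMatch(x -> x > limit);
  }

  // findFirst after a filter also stops at the first match.
  public static int firstMultipleOf(int n)
  {
    return IntStream.iterate(1, x -> x + 1)
                    .filter(x -> x % n == 0)
                    .findFirst().getAsInt();
  }

  public static void main(String[] args)
  {
    System.out.println(hasValueAbove(100));
    System.out.println(firstMultipleOf(7));
  }
}
```

By contrast, something like IntStream.generate(() -> 1).noneMatch(x -> x < 0) can never find the match that would let it stop, so it would keep running forever.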

    We’ve still got collectors to look at, so that will be the focus of the next article.

Optional: Java 8’s way to deal with null

    Anyone who has been programming in Java or C/C++ for any period of time will know that one of the most annoying things is trying to debug a crash due to accessing a null object. While the concept of null is needed to make a programming language work and to deal with deviations from the normal ‘happy’ path, including error handling, it doesn’t contribute towards implementing a solution. Yet we have to spend a fair portion of our time dealing with and protecting against null values to make robust software. Today we will take a look at how Optional can improve our code in general, followed by a quick look through its API.

    Null is the default value for an uninitialised class member field or static object, and we reassign references back to null to free memory. It’s also used for sentinel values, such as indicating no data. The problem is that when we try to access a null value we get an exception. We are then left trying to work out whether the value was uninitialised and thus the fault of some other code, or whether it was a sentinel value our code didn’t handle properly. Sometimes this leads to the wrong fix being made or dithering over which fix to make. This code will probably look familiar:

    public class ImportantData
    {
      private Data fileData; // Not constructor initialised
    
      ...
    
      // Call first before using fileData
      public void load(String fname)
      {
        try
        {
           fileData = loadCSVFromFile(fname);
        }
        catch (IOException e)
        {
          // Should at least have:
          // System.err.println("Can't load " + fname);
        }
      }
    
    ...
    }
    

    This is the ‘I can’t work out how to handle this yet’ pattern. Often we do this sort of thing just to get code running, because handling the error might not be trivial, might not yet be specified, and/or we’re making a proof of concept. Such code becomes more likely in the agile ‘always demonstrable’ development model. When we try to build on this code it’s easy to forget to revisit the shortcuts, and hard to find them again unless we consistently mark them. Worse is when the exception is caught but the handler is empty, without even a message, so we get a silent failure. This is further compounded by Java’s rule on having to catch checked exceptions, which tempts us into a shortcut. Testing might not even highlight the problem because it’s an exception case and might need something else to go wrong before we get a failure.

    If fileData mustn’t be null we should certainly do a check. We could use an assert, but that will usually be disabled in production. Unless space or time is at a premium it’s always better to be defensive: better to catch a problem sooner rather than later, and not allow it to go on and mess something else up. Before Java 7 we would have had to do the following:

        try
        {
           fileData = loadCSVFromFile(fname);
        }
        catch (IOException e)
        {
          // Should at least have:
          // System.err.println("Can't load " + fname);
        }
    
        if (fileData == null)
        {
          throw new NullPointerException("fileData can't be null!");
        }
    

    This will also help us with the silent IOException catch since fileData will also be null there.

    In Java 7 we can go one better and replace the null test with the built in:

       Objects.requireNonNull(fileData, "fileData can't be null!");
    

    This is shorter, documents our intention that fileData can’t be null and prevents a null object causing bother later in the code. There are two versions of requireNonNull, one with a message and one without which translate exactly to the older Java equivalent.
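
As a quick sketch of the behaviour (the class name, method name and message text are just made up for the example), the version with a message throws a NullPointerException carrying that message:

```java
import java.util.Objects;

public class RequireNonNullDemo
{
  // Returns "ok" for a non-null value, otherwise the message carried
  // by the NullPointerException that requireNonNull throws.
  public static String check(Object value)
  {
    try
    {
      Objects.requireNonNull(value, "value can't be null!");
      return "ok";
    }
    catch (NullPointerException e)
    {
      return e.getMessage();
    }
  }

  public static void main(String[] args)
  {
    System.out.println(check("some data"));
    System.out.println(check(null));
  }
}
```

In real code we wouldn’t normally catch the exception like this; the point of the check is to fail loudly at the place where the null first appears.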

    Java 8 added Optional to allow us to work better with nulls and distinguish between no result and uninitialised/an error occurred. Let’s change the code as follows:

    public class ImportantData
    {
      private Optional<Data> fileData; // Not constructor initialised
    
      ...
    
      // Call first before using fileData
      public void load(String fname)
      {
        // assume fileData is uninitialised at this point
    
        try
        {
           fileData = Optional.of(loadCSVFromFile(fname));
        }
        catch (IOException e)
        {
          // Should at least have:
          // System.err.println("Can't load " + fname);
        }
    
        Objects.requireNonNull(fileData, "fileData can't be null!");
      }
    
    ...
    }
    

    Now we’re using an Optional to wrap our Data object using Optional’s static ‘of’ method (Optionals can only be initialised using static methods). The ‘of’ method will throw a NullPointerException if we try to wrap a null. We might as well use this as a free safety check as the code will crash there and then. If we have to be more robust later we can search for Optional.of to locate all the places we need to be checking for NullPointerException.

    Once we’re finally done with fileData and need to release to the garbage collector we can’t just change the contents of an Optional (as that cannot be reassigned), we need to change what fileData references. We might consider a special sentinel object to indicate it was freed rather than using null which could be mistaken [when debugging] for never initialised (i.e. load was never called).

    Suppose it’s acceptable for loadCSVFromFile to return null, perhaps to indicate an empty file. Without wrapping this with an Optional we can’t tell the difference between an empty file, a file that wasn’t found, a file that was corrupted, or load never having been called. If we don’t handle those exceptions properly we have no way later to know the cause of fileData being null, and whether it should have been worked with or should have been handled earlier. Thus we’re not documenting our intentions, often leaving someone else to work out what we meant. This can lead to the wrong fix being made.

    Optional helps with this problem but to wrap nulls we must replace

    ...
           fileData = Optional.of(loadCSVFromFile(fname));
    ...
    

    with

    ...
           fileData = Optional.ofNullable(loadCSVFromFile(fname));
    ...
    

    Since passing a null to Optional’s ‘of’ method throws a NullPointerException, we have to use ofNullable which also wraps nulls. Under the hood an Optional.empty() is returned if null is passed to it. We can now tell the difference between an uninitialised fileData (due to exceptions or load not getting called) and the case of the file lacking any data.

    Note: The example assumed that we couldn’t change loadCSVFromFile, but if we could we’d return the Optional from that rather than wrapping it afterwards. This will also save the user of the API from having to decide whether to wrap with ‘of’ or ofNullable.

    Optional allows us to work with null objects more easily, as there are useful supporting functions which reduce the boilerplate ‘if (object != null) { …. ‘ that can litter code and make it hard to follow.

    Let’s now have a look at Optional’s API. Note there are also specialised Optionals: OptionalInt, OptionalDouble and OptionalLong, whose APIs are very similar. First we’ll start with creating (wrapping objects) and unwrapping them:

    public static void main(String[] args)
    {
        Optional<String> opt = Optional.of("hello");
        System.out.println("Test1: " + opt.get());
    
        try
        {
            Optional.of(null);
        }
        catch (NullPointerException e)
        {
            System.out.println(
               "Test2: Can't wrap a null object with of");
        }
    
        Optional<String> optNull = Optional.ofNullable(null);
    
        try
        {
            System.out.println(optNull.get());
        }
        catch (NoSuchElementException e)
        {
            System.out.println(
               "Test3: Can't unwrap a null object with get");
        }
    
        Optional<String> optEmpty = Optional.empty();
    
        try
        {
            System.out.println(optEmpty.get());
        }
        catch (NoSuchElementException e)
        {
            System.out.println(
               "Test4: Can't unwrap an empty Optional with get");
        }
    }
    

    There are four tests above:

    1. The first shows the wrapping of an object, which we do by calling the static ‘of’ method with the object we wish to wrap, then retrieving it with get (getAs<type> in the specialised Optionals).
    2. The second shows that we can’t wrap a null object with ‘of’ and if we try we get a NullPointerException. Thus ‘of’ should be used when we’re sure that a null is not possible or we wish to throw a NullPointerException if it is. If null is allowable we must use ofNullable instead.
    3. & 4. The third and fourth are actually the same case, since when ofNullable wraps a null an Optional.empty() is returned. We can also call the empty method directly. These tests show that if the get method is used to unwrap Optional.empty() it will throw a NoSuchElementException.

    One thing to note is the specialised versions (e.g. OptionalInt) do not have an ofNullable, although we can still do a test and manually get an OptionalInt.empty() if we want. Correspondingly that API works with int and not Integer.

    Since we may need to check whether an Optional is empty or not, we can use the isPresent() test for this. The API explicitly states we should never do a == check against Optional.empty() since it can’t be guaranteed to be a singleton.

    If we want to unwrap an Optional which may be empty we should use orElse instead to give it a default value (which can be null).

    public class OptionalTest2
    {
            public static void main(String[] args)
            {
                    Optional<String> opt = Optional.of("found");
                    System.out.println(opt.isPresent());
                    System.out.println(opt.orElse("not found"));
    
                    Optional<String> optNull = Optional.ofNullable(null);
                    System.out.println(optNull.isPresent());
                    System.out.println(optNull.orElse("default"));
    
                    Optional<String> optEmpty = Optional.empty();
                    System.out.println(optEmpty.isPresent());
                    System.out.println(optEmpty.orElse("default"));
            }
    }
    

    In addition to the code supplying a default value explicitly using orElse, we can call orElseGet to get a value from a supplier. There is also orElseThrow, in which the supplier passed will supply an appropriate exception, and an ifPresent method that passes the value to a consumer only if the Optional is wrapping a value. The next example demonstrates these:

    public class OptionalTest3
    {
    	private static class MySupplier implements Supplier<String>
    	{
    		@Override
    		public String get()
    		{
    			return "Supplier returned this";
    		}
    	}
    
    	private static class MyExceptionSupplier implements
    			Supplier<IllegalArgumentException>
    	{
    		@Override
    		public IllegalArgumentException get()
    		{
    			return new IllegalArgumentException();
    		}
    	}
    
    	private static class MyConsumer implements Consumer<String>
    	{
    		@Override
    		public void accept(String t)
    		{
    			System.out.println("Consumed: " + t);
    		}
    	}
    
    	public static void main(String[] args)
    	{
    		Optional<String> opt = Optional.of("found");
    		System.out.println(opt.orElseGet(new MySupplier()));
    		System.out.println(opt.orElseThrow(
                                             new MyExceptionSupplier()));
    		opt.ifPresent(new MyConsumer());
    
    		Optional<String> optNull = Optional.ofNullable(null);
    		System.out.println(optNull.orElseGet(new MySupplier()));
    
    		try
    		{
    			System.out.println(optNull.orElseThrow(
                                             new MyExceptionSupplier()));
    		}
    		catch (IllegalArgumentException e)
    		{
    			System.out.println("Exception caught");
    		}
    
    		// This one won't use the consumer
    		optNull.ifPresent(new MyConsumer());
    	}
    }
    

    Having to retrieve and check for values being present in this way, although initially tedious, makes us think more about what to do if values are null. It’s at least shorter than not using Optional.
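
Since Supplier and Consumer are functional interfaces, the same calls can be written with lambdas and method references rather than named classes. This is a sketch along those lines; the class and method names are invented for the example.

```java
import java.util.Optional;

public class OptionalWithLambdas
{
  // orElseGet with a lambda acting as the Supplier
  public static String valueOrSupplied(Optional<String> opt)
  {
    return opt.orElseGet(() -> "Supplier returned this");
  }

  // orElseThrow with a constructor reference acting as the Supplier
  public static String valueOrThrow(Optional<String> opt)
  {
    return opt.orElseThrow(IllegalArgumentException::new);
  }

  public static void main(String[] args)
  {
    Optional<String> opt = Optional.of("found");
    Optional<String> optEmpty = Optional.empty();

    System.out.println(valueOrSupplied(opt));
    System.out.println(valueOrSupplied(optEmpty));
    System.out.println(valueOrThrow(opt));

    // ifPresent with a lambda acting as the Consumer
    opt.ifPresent(t -> System.out.println("Consumed: " + t));

    // Nothing is printed here as there is no value to consume
    optEmpty.ifPresent(t -> System.out.println("Consumed: " + t));
  }
}
```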

    Note that in the latest example, if we were wrapping say Integer instead of String, we couldn’t use IntSupplier or IntConsumer. This is because orElseGet requires a Supplier of a type that extends Integer, and ifPresent requires a Consumer of a type that is a supertype of Integer (including Integer itself, of course). IntSupplier and IntConsumer do not extend Supplier and Consumer, so we cannot substitute them. The specialised OptionalInt does take an IntSupplier and an IntConsumer though.
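
    As a rough illustration of the primitive specialisation, the sketch below uses OptionalInt with an IntSupplier and an IntConsumer. The class name OptionalIntSketch, the helper valueOrDefault and the fallback of -1 are assumptions made for the example.

```java
import java.util.OptionalInt;
import java.util.function.IntConsumer;
import java.util.function.IntSupplier;

class OptionalIntSketch
{
    // Hypothetical helper: unwraps the value, using a primitive supplier as fallback
    static int valueOrDefault(OptionalInt opt)
    {
        IntSupplier fallback = () -> -1;
        return opt.orElseGet(fallback);
    }

    public static void main(String[] args)
    {
        IntConsumer printer = i -> System.out.println("Consumed: " + i);

        OptionalInt five = OptionalInt.of(5);
        OptionalInt none = OptionalInt.empty();

        System.out.println(valueOrDefault(five));
        System.out.println(valueOrDefault(none));

        five.ifPresent(printer); // the consumer runs
        none.ifPresent(printer); // nothing happens
    }
}
```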

    There are a few useful functional methods (which surprisingly haven’t been added to the wrapper classes or Number): filter, map and flatMap. FlatMap handles the case where the mapping function already returns an Optional and so doesn’t wrap it again. Conversely Optional’s map will wrap whatever the mapping function returns.

    Filter returns Optional.empty() if the predicate doesn’t match. If the Optional was already empty the predicate is not checked, although this shouldn’t concern us because Predicates should be just logical tests, and not have side-effects.

    Here’s a quick run through:

    public static void main(String args[])
    {
        Optional<String> hiMsg = Optional.of("hi");
    
        Optional<String> hiThereMsg = hiMsg.map(x -> x + " there!");
    
        System.out.println(hiMsg.get()); // Original
    
        System.out.println(hiThereMsg.get()); // Mapped
    
        System.out.println(hiThereMsg.filter(x -> x.equals("hi there!"))
    				 .orElse("Bye!"));
    
        // Filter test fails returning Optional.empty()
        System.out.println(hiThereMsg.filter(x -> x.equals("yo there!"))
    				.orElse("Bye!"));
    
        // The Optional gets wrapped
        Optional<Optional<String>> byeMessage = hiThereMsg
                                     .map(x -> Optional.of("Bye bye!"));
    
        // No extra wrapping
        Optional<String> byeMessage2 = hiThereMsg
                                 .flatMap(x -> Optional.of("Bye bye!"));
    
        System.out.println(byeMessage.get().get());
        System.out.println(byeMessage2.get());
    
        // This would be an error since the
        // mapping has to return Optional
        // hiThereMsg.flatMap(x -> "Bye bye!");
    
        // We can change the wrapped type
        Optional<Integer> five = hiThereMsg.map(x -> 5);
        System.out.println(five.get());
    
        Optional<Integer> six = hiThereMsg.flatMap(x -> Optional.of(6));
        System.out.println(six.get());
    }
    

    Finally a word of warning from the Java documentation itself: ‘This is a value-based class; use of identity-sensitive operations (including reference equality ==), identity hash code, or synchronization on instances of Optional may have unpredictable results and should be avoided.’ In short don’t try to use ==, the identity hash code or synchronized on Optional. Normal .equals can be used, but if you’re expecting a match then the object you are comparing against will also need to be an Optional. If both Optionals are Optional.empty(), that is considered a match.
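
    A small sketch of the equals behaviour described above. The class name OptionalEqualsSketch and the helper holds are made up for the illustration.

```java
import java.util.Optional;

class OptionalEqualsSketch
{
    // Hypothetical helper: true only if 'found' wraps a value equal to 'expected'
    static boolean holds(Optional<String> found, String expected)
    {
        return found.equals(Optional.of(expected));
    }

    public static void main(String[] args)
    {
        Optional<String> hi = Optional.of("hi");

        // Comparing Optional with Optional compares the wrapped values
        System.out.println(holds(hi, "hi"));

        // Comparing an Optional with a bare String never matches
        System.out.println(hi.equals("hi"));

        // Two empty Optionals are considered equal
        System.out.println(Optional.empty().equals(Optional.empty()));
    }
}
```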

    More will be said soon on how Optional fits in with the new functional programming features.

    Finite Sequence Generators in Java 8 – Part 2

    In the last couple of articles we looked at generators. First we looked at ways of generating an infinite sequence. In the second we saw a way of generating a finite sequence. Let’s look at a few more aspects before we move on.

    In the finite sequence article, we saw that unless we wanted to limit ourselves to a certain number of values we couldn’t use generate and iterate in a simple manner. This was because there was no way of indicating a stop condition. Limit is fine if we know how many values we need, but not if we don’t. If we use limit we’d have to create a new stream to get further values. There are a couple of other methods we could use for generating finite sequences without having to resort to using Iterable.

    Let’s go back to our die throwing SixGame example from the last article. Instead of using an Iterator/Iterable, we’ll use an IntSupplier coupled with IntStream’s generate method. If any of that is new to you, then first review the article on generators with infinite sequences. We’re going to attempt (and I’m not saying this is good practice) to stop generating when we get a Six by throwing an exception:

    public class SixGame
    {
    	public static class DieThrowSupplier implements IntSupplier
    	{
    		private Random rand = new Random(System.nanoTime());
    		private boolean done = false;
    
    		@Override
    		public int getAsInt()
    		{
    			if (!done)
    			{
    				int dieThrow = Math.abs(rand.nextInt()) % 6 + 1;
    
    				if (dieThrow == 6)
    				{
    					done = true;
    				}
    
    				return dieThrow;
    			}
    			else
    			{
    				throw new NoSuchElementException();
    			}
    		}
    	}
    
    	public static void main(String args[])
    	{
    		DieThrowSupplier dieThrows = new DieThrowSupplier();
    
    		IntStream myStream = IntStream.generate(dieThrows);
    		
    		try
    		{
    			myStream.mapToObj(i -> "You threw a " + i).forEach(
    					System.out::println);
    		}
    		catch (NoSuchElementException e)
    		{
    			// Escaped
    		}
    	}
    }
    

    Something here that’s new is the mapToObj call. We’re starting out with an IntStream, but we want to create a message which is a String. Thus we need to change the ‘shape’ of the stream from int to Object (there is no special String stream) and we can do that with mapToObj. It works like map, but instead of expecting an int to be returned from the function, it expects an Object.
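
    To see the difference between map and mapToObj in isolation, here is a minimal sketch on a fixed range. The class name MapToObjSketch, the helper messages and the "Doubled: " text are invented for the example.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class MapToObjSketch
{
    // Hypothetical helper: doubles each int, then turns it into a message
    static List<String> messages(int from, int to)
    {
        return IntStream.rangeClosed(from, to)
                        .map(i -> i * 2)                // int to int: still an IntStream
                        .mapToObj(i -> "Doubled: " + i) // int to Object: now a Stream<String>
                        .collect(Collectors.toList());
    }

    public static void main(String[] args)
    {
        messages(1, 3).forEach(System.out::println);
    }
}
```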

    We have to catch the exception, but luckily (or perhaps sloppily given this is a demonstration) we are using a side-effect to do something with the string we generate: printing in forEach. Once we go parallel though we need to remove side effects. Although we’ve not covered it yet, what we need to do is collect the results from the stream, perhaps in a list, and then perform the printing outside of the stream chain. Although this seems a lot for our simple game, getting streams to work properly in parallel is one of the more difficult tasks that we’re going to have to master eventually.
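
    As a sketch of the collect-then-print idea, the example below uses a fixed array of throws rather than our generator, so no exception is involved. The class name CollectThenPrintSketch and the helper throwsAsMessages are assumptions made for the illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class CollectThenPrintSketch
{
    // Hypothetical helper: builds the messages in parallel, with no side effects
    static List<String> throwsAsMessages(int[] dieThrows)
    {
        return Arrays.stream(dieThrows)
                     .parallel()
                     .mapToObj(i -> "You threw a " + i)
                     .collect(Collectors.toList());
    }

    public static void main(String[] args)
    {
        List<String> results = throwsAsMessages(new int[] { 3, 1, 4, 6 });

        // Printing happens outside the stream chain
        results.forEach(System.out::println);
    }
}
```

    Because the source is ordered, the collected list keeps the original order of the throws even though the work was done in parallel.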

    Our problem in the parallel world is going to be that we’re collecting, but we need to assign that collection to something when the stream is done. Try changing the try/catch code to the following broken version:

                List<String> l = null;
    
                try
                {
                        l = myStream.parallel().mapToObj(i -> "You threw a " + i)
                                               .collect(Collectors.toList());
                }
                catch (NoSuchElementException e)
                {
                        // Escaped
                }
    
                l.stream().forEach(System.out::println);
    

    Nothing gets printed this time, and we crash with a NullPointerException. The exception is thrown while the stream is running, before collect can return, so the assignment never happens and the list, l, stays null. We went through all the motions and got nothing for our troubles. Perhaps we could try making special collectors to handle exceptions, but given an exception is almost certainly a side-effect we should avoid these when going parallel. I can also imagine that catching an exception outside of a stream and hoping we still get all the results might be quite flaky, as we’re relying on the implementation to make it work. Implementations change, and other implementations come along. My verdict: unless Oracle say otherwise, avoid it.

    We also discussed that we wanted to avoid implementing a whole spliterator if there was another way available. To recap, a spliterator is an iterator that can be split into batches of work and is what drives streams. Getting that right isn’t trivial. We saw that we couldn’t get access to override InfiniteSupplyingSpliterator in order to make a version we could terminate. However, there exists a spliterator that is just missing tryAdvance which we use to inject the next value into the stream and indicate when we’re done. This is AbstractSpliterator, in particular AbstractIntSpliterator, which we can extend. Let’s have a look at our game using one of those:

    public class SixGame
    {
    	public static class DieThrowSpliterator extends
    			Spliterators.AbstractIntSpliterator
    	{
    		private Random rand = new Random(System.nanoTime());
    		private boolean done = false;
    
    		protected DieThrowSpliterator()
    		{
    			super(Long.MAX_VALUE, 0);
    		}
    
    		private int rollDie()
    		{
    			int dieThrow = Math.abs(rand.nextInt()) % 6 + 1;
    
    			if (dieThrow == 6)
    			{
    				done = true;
    			}
    
    			return dieThrow;
    		}
    
    		@Override
    		public boolean tryAdvance(IntConsumer action)
    		{
    			if (action == null)
    			{
    				throw new NullPointerException();
    			}
    			
    			if (done)
    			{
    				return false;
    			}
    
    			action.accept(rollDie());
    
    			return true;
    		}
    
    		@Override
    		public boolean tryAdvance(Consumer<? super Integer> action)
    		{
    			if (action == null)
    			{
    				throw new NullPointerException();
    			}
    
    			if (done)
    			{
    				return false;
    			}
    
    			action.accept(rollDie());
    
    			return true;
    		}
    	}
    
    	public static void main(String args[])
    	{
    		Stream<Integer> stream = StreamSupport.stream
    					(new DieThrowSpliterator(), false);
    
    		stream.map(i -> "You threw a " + i)
    	              .forEach(System.out::println);
    	}
    }
    

    First notice that we are creating the stream the same way we did when using an Iterable, except that this time we create the spliterator ourselves and pass it to the stream. The second thing to notice is that we implement two tryAdvance functions. These take Consumers which will use our value. The first takes a true IntConsumer, whereas the second takes a Consumer of any type which can hold an Integer (Object, Number and Integer). Strictly speaking only the IntConsumer version is abstract; Spliterator.OfInt supplies a default for the other one, but implementing both makes the behaviour explicit. I’ve kept the null check used in other spliterators. If we’re already done, we return false; otherwise we pass a roll to the action and return true. The parent constructor of our spliterator takes two values: an estimate of how many values we expect (we don’t know, so we pass Long.MAX_VALUE) and flags for the characteristics of the spliterator (0 being none of them).
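
    For comparison, here is a smaller, deterministic sketch of the same technique that implements only the IntConsumer version of tryAdvance and creates an IntStream directly with StreamSupport.intStream. The names CountdownSketch, CountdownSpliterator and countdown are invented for the illustration and are not part of the game.

```java
import java.util.Spliterators;
import java.util.function.IntConsumer;
import java.util.stream.IntStream;
import java.util.stream.StreamSupport;

class CountdownSketch
{
    // Hypothetical spliterator: emits start, start - 1, ..., 1 and then stops
    static class CountdownSpliterator extends Spliterators.AbstractIntSpliterator
    {
        private int remaining;

        CountdownSpliterator(int start)
        {
            // As in the game: size treated as unknown, no characteristics
            super(Long.MAX_VALUE, 0);
            this.remaining = start;
        }

        @Override
        public boolean tryAdvance(IntConsumer action)
        {
            if (remaining <= 0)
            {
                return false; // the stop condition ends the stream
            }

            action.accept(remaining--);

            return true;
        }
    }

    static IntStream countdown(int start)
    {
        return StreamSupport.intStream(new CountdownSpliterator(start), false);
    }

    public static void main(String[] args)
    {
        countdown(3).mapToObj(i -> "Count: " + i)
                    .forEach(System.out::println);
    }
}
```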

    No doubt we could continue the discussion on generation, particularly as we now have several ways to solve problems. For now we’ll move on and look at a few more aspects of Java 8 functional programming and lambda expressions.

    Finite sequence generators in Java 8

    … and introducing default methods.

    Last time we looked at generators, and more specifically those generating an infinite sequence. We saw that there were several ways to achieve this:

    • The older Java 7 way with an iterator-like class
    • Using Stream’s iterate method
    • Using Stream’s generate method

    We also saw that when using a Stream we had to use the limit method on our infinite sequence otherwise it would keep generating and the program couldn’t continue. The problem with using limit was that once we’d got the values, we couldn’t use the stream again to get more. To solve this problem, we used an IntSupplier and created several streams with it to process batches of values.

    What if the sequence we wanted to process was finite? We want to avoid buffering up values in advance. We also want to consume the sequence without knowing up front how many values it will return because of reasons such as:

    • We can’t work it out or it’s difficult to
    • There is a random element
    • We want to decouple the generating function from the stream doing the consuming

    We saw a simple finite sequence with our Hello World! example in the first article. In this case we were not generating the sequence with a function, we were instead processing a pre-initialised list. We also saw we could use an iterator when we discussed infinite sequences, but went on to discuss other ways. We’ll see that in this case, we are virtually forced into the Iterator solution.

    Let’s take a simple sequence which we can’t know the length of. We’ll simulate a game where the idea is to keep throwing a die until we get a six. First we’ll start with a non-functional implementation:

    public class SixGame
    {
    	public static class DieThrowIterator implements Iterator<Integer>
    	{
    		private int dieThrow = 0;
    		private Random rand = new Random(System.nanoTime());
    
    		@Override
    		public boolean hasNext()
    		{
    			return dieThrow != 6;
    		}
    
    		@Override
    		public Integer next()
    		{
    			dieThrow = Math.abs(rand.nextInt()) % 6 + 1;
    			return dieThrow;
    		}
    	}
    
    	public static void main(String args[])
    	{
    		DieThrowIterator dieThrowIterator = new DieThrowIterator();
    
    		while (dieThrowIterator.hasNext())
    		{
    			System.out.println("You threw a " + dieThrowIterator
                                      .next());
    		}
    	}
    }
    

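    (Note our random number generation is not very robust: Math.abs(Integer.MIN_VALUE) is still negative, and taking the remainder introduces a slight bias. Calling rand.nextInt(6) + 1 would be the more usual choice, but for a simple demonstration this will do).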

    Let’s try to use a Stream. We can’t use Stream’s iterate function because it doesn’t allow a stop condition. Also an IntSupplier with generate is not an option unless we want to use limit(1) and create a stream for each die throw, when we might as well not use a stream at all.

    If we look under the hood at the implementation of IntStream’s generate function we see that it creates an InfiniteSupplyingSpliterator. This has a problem for us – the tryAdvance function always returns true meaning we can never stop.

    We could implement our own Spliterator where our tryAdvance checks for a stop condition. We can’t simply extend InfiniteSupplyingSpliterator and override the tryAdvance method since it’s an inner class of a default access class. So the only way is a cut and paste job rather than inheritance. I’m very nervous about copying large parts of code which might change in future versions; this is saying to me it’s not what was intended. We should look for other ways first.

    Let’s look to see how List does its streaming. List gets its stream() method from Collection, which builds the stream from the spliterator() method that originates in Iterable. To create an Iterable we only need to implement its iterator() function; for the example let’s do so as a nested class:

    	public static class DieThrowIterable implements Iterable<Integer>
    	{
    		@Override
    		public Iterator<Integer> iterator()
    		{
    			return new DieThrowIterator();
    		}
    	}
    

    We can then stream:

    	public static void main(String args[])
    	{
    		Stream<Integer> stream = StreamSupport.stream(
    				new DieThrowIterable().spliterator(), false);
    
    		stream.map(i -> "You threw a " + i).forEach(System.out::println);
    	}
    

    Wait a moment… Iterable is an interface, yet it has a spliterator() method implemented. How can that be?

    If we look at the interface, there is indeed a spliterator() method creating a new spliterator for us. If we also look closely, we see the default keyword. This was added in Java 8 (cunningly default is already a reserved word for switch statements so it won’t break old code). When we write interfaces we can provide default methods already implemented. Now some people might have reservations about this as it’s effectively turning an interface into an abstract class. There are some good reasons for it and advantages it gives us:

    • It avoids having to create abstract classes to implement interfaces to supply default methods. Thus there is less boiler-plate to write, and also prevents the consumer of the API implementing the interface when we expected the abstract class to be used instead.

    • It helps with multiple inheritance issues (inheriting from two or more classes, which C++ supports but Java does not). In Java, the problems with multiple inheritance were avoided by allowing only single inheritance from classes, while letting us implement as many interfaces as desired. The problem is that we had to implement large portions of those interfaces in each class that used them, which was a real pain.

    • Our API vendors can also now add new methods to interfaces without breaking old code. This is one reason we see a default method here. A new interface would make the JDK even larger and make things more complicated.

    • Lambda expressions bind to interfaces with one single method left to implement. If there were more we couldn’t do this binding. We’ll look at this in a future article.
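
    To illustrate the last point, here is a minimal sketch of an interface with one abstract method and one default method, bound to a lambda. The names DefaultMethodLambdaSketch and Greeter are made up for the example.

```java
class DefaultMethodLambdaSketch
{
    // Hypothetical interface: one method left to implement, one default method
    interface Greeter
    {
        String name();

        default String greet()
        {
            return "Hello, " + name() + "!";
        }
    }

    public static void main(String[] args)
    {
        // The lambda supplies name(); greet() comes from the interface
        Greeter greeter = () -> "World";

        System.out.println(greeter.greet());
    }
}
```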

    Note: If we implement more than one interface with an identical default method, and we do not override it by implementing in the class or a parent class, this is an error. Whether eventually it’ll be possible to use Scala-like mix-in traits is an interesting question.
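
    A short sketch of that clash and one way to resolve it. The interfaces Walker and Swimmer and the class Duck are hypothetical; the Interface.super.method() syntax is how a class can pick, or combine, the inherited defaults.

```java
class DefaultConflictSketch
{
    interface Walker
    {
        default String move()
        {
            return "walking";
        }
    }

    interface Swimmer
    {
        default String move()
        {
            return "swimming";
        }
    }

    // Without the override below this class would not compile,
    // because it inherits two unrelated defaults for move()
    static class Duck implements Walker, Swimmer
    {
        @Override
        public String move()
        {
            return Walker.super.move() + " and " + Swimmer.super.move();
        }
    }

    public static void main(String[] args)
    {
        System.out.println(new Duck().move());
    }
}
```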

    So using Iterable is one way to do finite sequence generators, although the documentation accompanying the default spliterator() method suggests it should be overridden for better performance. In a later article we’ll come back and look at spliterators in some more detail.

    In the next article we’ll look at another couple of ways to implement finite sequence generators, one good, and one we should avoid.

    Generators with Java 8

    Today we’ll look at creating generators. In simple terms, a generator is a function which returns the next value in a sequence. Unlike an iterator, it generates the next value when needed, rather than returning the next item of a pre-generated collection. Some languages such as Python support generators natively via keywords such as yield. When a generator’s next value is requested in Python, the generator function continues to run until the next yield statement, where a value is returned. The generator function is able to continue where it left off which can be quite confusing for the uninitiated. So how to do something similar in Java?

    We saw in the last article that we can use an IntStream to generate a simple set of numbers, but we had to generate them all up front. That’s fine if we know how many we’re going to need. What if we don’t, and we want to be able to get the next whenever we like? This is where a generator comes in.

    Let’s choose a simple infinite sequence, the square numbers. In a standard Java implementation we’d end up with something like the following:

    public class Squares
    {
            private int i = 1;
    
            public int next()
            {
                    int thisOne = i++;
                    return thisOne * thisOne;
            }
    
            public static void main(String args[])
            {
                    Squares squareGenerator = new Squares();
    
                    System.out.println(squareGenerator.next());
                    System.out.println(squareGenerator.next());
                    System.out.println(squareGenerator.next());
            }
    }
    

    This prints the first three square numbers. Note we could have gone further and implemented this as an iterator.
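
    As a sketch of what the iterator version might look like (the class name SquaresIterator is made up for the illustration), hasNext() simply always returns true because the sequence never ends:

```java
import java.util.Iterator;

class SquaresIterator implements Iterator<Integer>
{
    private int i = 1;

    @Override
    public boolean hasNext()
    {
        return true; // an infinite sequence always has a next value
    }

    @Override
    public Integer next()
    {
        int thisOne = i++;
        return thisOne * thisOne;
    }

    public static void main(String[] args)
    {
        SquaresIterator squares = new SquaresIterator();

        System.out.println(squares.next());
        System.out.println(squares.next());
        System.out.println(squares.next());
    }
}
```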

    What we have here is an example of lazy evaluation in a non-functional style. Wikipedia defines lazy evaluation as: ‘In programming language theory, lazy evaluation, or call-by-need is an evaluation strategy which delays the evaluation of an expression until its value is needed’. Lazy evaluation is useful because we don’t need to worry about infinite sequences, performing computationally expensive operations up-front, and about storage.

    Let’s expand on the example to allow getting a batch of results. This is easy – create a nextN function which calls next() a number of times and returns the results in say a List:

    public class Squares2
    {
            private int i = 1;
    
            public int next()
            {
                    int thisOne = i++;
                    return thisOne * thisOne;
            }
    
            public List<Integer> nextN(int n)
            {
                    List<Integer> l = new ArrayList<>();
    
                    // 'count' avoids shadowing the field i that next() uses
                    for (int count = 0; count < n; count++)
                    {
                            l.add(next());
                    }
    
                    return l;
            }
    
            public static void main(String args[])
            {
                    Squares2 squareGenerator = new Squares2();
    
                    squareGenerator.nextN(10).forEach(System.out::println);
            }
    }
    

    A few points:

    • Notice in the nextN function there is the empty diamond in the new ArrayList statement. This was added in Java 7 to save having to state the type both on the left and the right hand side; the compiler now works it out.
    • List is an Iterable, and Iterable now has a forEach() method which was added in Java 8. We could use stream() as before to create a stream, but if all we want to do is pass the contents to a function forEach() does nicely.

    Now, to save having to write nextN for every sequence we make, we could create a new type which extends Iterator providing the nextN function.
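
    One possible shape for such a type is sketched below, using a Java 8 default method so that every implementation gets nextN for free. The names BatchIteratorSketch and BatchIterator are invented for the example, and stopping early when hasNext() is false is a design choice of this sketch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class BatchIteratorSketch
{
    // Hypothetical type: an Iterator that can also hand out batches
    interface BatchIterator<T> extends Iterator<T>
    {
        default List<T> nextN(int n)
        {
            List<T> batch = new ArrayList<>();

            // Stop early if the underlying sequence runs out
            for (int count = 0; count < n && hasNext(); count++)
            {
                batch.add(next());
            }

            return batch;
        }
    }

    static class Squares implements BatchIterator<Integer>
    {
        private int i = 1;

        @Override
        public boolean hasNext()
        {
            return true;
        }

        @Override
        public Integer next()
        {
            int thisOne = i++;
            return thisOne * thisOne;
        }
    }

    public static void main(String[] args)
    {
        new Squares().nextN(5).forEach(System.out::println);
    }
}
```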

    The only problem we face here is that we have to save the batch in a list before we can operate on it. Java 8 provides another way. Let’s go back and start again with the following code:

    public class Squares3
    {
            public static void main(String args[])
            {
                    IntStream.rangeClosed(1, 10).map(i -> i * i)
                             .forEach(System.out::println);
            }
    }
    

    This uses IntStream to get the indexes of the sequence in a stream and calls map to convert them into their squares. The problem is that to get more squares than the tenth we need to duplicate the pipeline and start it off from the right place. Let’s look at another way without using a range:

    
            public static void main(String args[])
            {
                    IntStream myStream = IntStream.iterate(1, i -> i + 1);
    
                    myStream.limit(10).map(i -> i * i)
                                      .forEach(System.out::println);
            }
    

    This also generates the first 10 square numbers. This time it uses the iterate function. This takes two parameters: the first is our initial value, and the second is a function defining how to get to the next value from the previous one. It’s a good place to use a lambda function. We can even dispense with the map function, since we can undo the squaring easily in iterate to get what the last index was:

            public static void main(String args[])
            {
                    IntStream myStream = IntStream.iterate(1,
                            i -> ((int) Math.pow(Math.sqrt(i) + 1, 2)));
    
                    myStream.limit(10).forEach(System.out::println);
            }
    

    This solves one of the problems of having to buffer beforehand. However, we need to use the limit operator on the stream to limit it to 10 items, otherwise it would keep on going. Unfortunately this is a problem, since once we’ve got the 10 the stream is ‘operated on’ and we can’t use it again to generate more. If we try, we get an IllegalStateException. We’d have to create another stream to get more.
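
    The reuse problem can be seen in a few lines. This is only a sketch; the class name StreamReuseSketch and the helper reuseFails are made up for the illustration.

```java
import java.util.stream.IntStream;

class StreamReuseSketch
{
    // Hypothetical helper: returns true if the second use of the stream is rejected
    static boolean reuseFails()
    {
        IntStream myStream = IntStream.iterate(1, i -> i + 1);

        // First use: the stream is now 'operated on'
        myStream.limit(3).forEach(i -> { });

        try
        {
            // Second use of the same stream object
            myStream.limit(3).forEach(i -> { });
            return false;
        }
        catch (IllegalStateException e)
        {
            return true;
        }
    }

    public static void main(String[] args)
    {
        System.out.println("Reuse rejected: " + reuseFails());
    }
}
```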

    So how do we get around the problem of the stream being used up? Instead of using IntStream’s iterate function, we can use generate instead. IntStream’s generate function takes an instance of an IntSupplier. IntSupplier has a getAsInt() function which returns the next int in the sequence which is very much like our next() function. Here is an example that prints the first 20 square numbers in two batches:

    public class SquaresGenerator
    {
            private static class SqSupplier implements IntSupplier
            {
                    int i = 0;
    
                    @Override
                    public int getAsInt()
                    {
                            i++;
                            return i * i;
                    }
            }
    
            public static void main(String args[])
            {
                    SqSupplier sqSupplier = new SqSupplier();
                    IntStream myStream = IntStream.generate(sqSupplier);
                    IntStream myStream2 = IntStream.generate(sqSupplier);
    
                    myStream.limit(10).forEach(System.out::println);
                    myStream2.limit(10).forEach(System.out::println);
            }
    }
    

    Again we’re using limit to stop the stream continuing indefinitely. However unlike last time, although the stream is used up, the generator still survives and can be used again. No buffering needed either, just keeping hold of the supplier. The only downside vs the old Java way is that we have to use Streams to get sequence members, although this comes with other benefits such as parallelism which we’ll see in a later article.

    Overall, there are several ways to generate a sequence, and which we choose may depend on our needs. Using an IntSupplier is a good way to integrate with the rest of the Java 8 functional programming support.