Tag Archives: map

Collectors Part 2: Provided collectors and a Java 8 streams demonstration

Today we’re going to continue where the last article left off. In that one we looked at collectors, specifically reduction and short-circuiting operations. Today we’ll look at the collect function and then we’ll finish off with a more substantial example showing the power Java 8 streaming gives us.

A collector gathers results and terminates the stream. It can also do reductions. The Collectors class provides a number of useful collectors ready for us to use. We’ll start by looking at those which carry out the same operations we looked at in the previous article (count, sum, average, max, min and summaryStatistics).

public class Collectors
{
  public static void main(String args[])
  {
    Integer[] numbersArray = new Integer[] { 1, 2, 3, 4, 5 };

    System.out.println(Arrays.stream(numbersArray)
                             .collect(Collectors.counting()));

    System.out.println(Arrays.stream(numbersArray)
                             .collect(
                    Collectors.summingInt((Integer x) -> x)));

    System.out.println(Arrays.stream(numbersArray)
                             .collect(
                    Collectors.averagingInt((Integer x) -> x)));

    System.out.println(Arrays.stream(numbersArray)
                             .collect(
                    Collectors.maxBy(Integer::compare)).get());

    System.out.println(Arrays.stream(numbersArray)
                             .collect(
                    Collectors.minBy(Integer::compare)).get());

    System.out.println(Arrays.stream(numbersArray)
                             .collect(
                    Collectors.summarizingInt((Integer x) -> x)));
  }
}

Note we’re streaming Integer rather than int so we need to pass a ToIntFunction function for the sum, average and summarizing collectors. A ToIntFunction applies a function to a type and returns an int. Given the stream has an Object shape we need to help the compiler and indicate the parameter to the lambda is really an Integer. Auto-unboxing will do the rest.

All relatively easy so far, but now it gets a bit harder. The problem that we face with the rest of the API is that the functions have several overloaded versions and there are lots of generics in the specification making it hard to read.

Take this one from partitioningBy which is not the most difficult to understand:

    Collector<T, ?, Map<Boolean, D>> 
       partitioningBy(Predicate<? super T> predicate,
                      Collector<? super T, A, D> downstream) {

We can see clearly that partitioningBy takes a predicate and a collector, and returns a collector which the collect function can use. The problem is working out what all these types are, even with the help of the documentation. Luckily I’ll provide some examples which should make it easy to start.

What I did when I explored was make simple examples like those I’m presenting:

  • Try the simplest version with the least parameters. That will usually be the easiest to understand.
  • Once you get that working look at the implementation. Often the simpler one will be passing its own ‘default’ parameters to a more complicated version giving a clue to what that is expecting.
  • Try the more complicated one, but substitute the ‘default’ parameter with something else. See what happens. Does it compile and do what’s expected, if not why not?
  • Try stepping through the library code and see how the code is using the parameters.

Hopefully with the examples I’ve make easy work of this, but you can learn a lot more by experimenting yourself.

Collecting sounds like the thing collections are for, and indeed there are several ways of building collections with results coming from a stream. We can build a generic list, a generic set, a generic map, or we can build a specific type of collection or map:

public class CollectInCollections
{
  public static void main(String args[])
  {
    Character[] chars = new Character[]
                        { 'a', 'b', 'c', 'd', 'e', 'f', 'g' };

    // First a list
    List<Character> l = Arrays.stream(chars)
                              .collect(Collectors.toList());

    System.out.println(l);

    // toList gives us a generic list (code creates an ArrayList)
    // Let's get a linked list
    List<Character> ll = Arrays.stream(chars)
                               .collect(
                       Collectors.toCollection(LinkedList::new));

    System.out.println(ll);

    // toSet gives us a generic set (code creates a HashSet)
    Set<Character> s = Arrays.stream(chars)
                                   .collect(Collectors.toSet());

    System.out.println(s);

    // and now a generic map (code creates a HashMap)
    Map<Character, Character> m = 
                        Arrays.stream(chars).collect(
                  Collectors.toMap((Character k) -> 
                                       Character.toUpperCase(k),
                                   Function.identity()));

    System.out.println(m);

    // What happens if keys clash?
    try
    {
      Arrays.stream(chars).collect(
                  Collectors.toMap((Character k) -> 'a', 
                                   Function.identity()));
    }
    catch (IllegalStateException e)
    {
      System.out.println("Caught duplicate key");
    }

    // Let's provide a function to resolve this
    // we'll keep the first
    Map<Character, Character> m2 =
                        Arrays.stream(chars).collect(
               Collectors.toMap((Character k) -> 'a',
                                Function.identity(),
                               (v1, v2) -> v1));

    System.out.println(m2);

    // If we return null from our merge function,
    // the latest is kept
    Map<Character, Character> m3 =
                        Arrays.stream(chars).collect(
                Collectors.toMap((Character k) -> 'a',
                                 Function.identity(),
                                 (v1, v2) -> null));

    System.out.println(m3);

    // We can also request a different type of map
    Map<Character, Character> m4 =
                        Arrays.stream(chars).collect(
                Collectors.toMap(
                   (Character k) -> Character.toUpperCase(k),
                                 Function.identity(),
                                 (v1, v2) -> v1,
                                 TreeMap::new));

    System.out.println(m4);
  }
}

Notes:

  • Map comes with several overloaded versions which help us deal creating keys for our values and dealing with clashes. As we know if we try to put a key-value pair into a map where the key already exists, we overwrite the existing value.
    The first two parameters map our value onto a key and a value respectively. For one of these (often the value) we don’t want to change anything and so passing Function.identity() (or the lambda v -> v) will keep the value the same.
  • By default toMap uses a throwingMerger() which throws an IllegalStateException if two keys clash. We can see this in action if we force the keys to a single value. If we specify a third parameter we can specify a BiFunction (two parameters in, one out) to deal with the clash instead. If the result of this function is null, the latest is kept.
  • If we specify a fourth parameter to toMap we can specify a specific type of map.
  • The documentation doesn’t state what type of collection is returned for toList(), toSet() and the 2 and 3 parameter versions of toMap(). The idea is that functional programming provides a rich set of features for a few collections rather than lots of collections. We therefore shouldn’t make any assumptions and where it matters use toCollection or the 4 parameter version of toMap to be sure, or convert later.
  • There are also concurrent versions of toMap called toConcurrentMap which can give better performance in parallel streams when we don’t care about the order.

Now on to rest of the operations which are for joining strings, grouping and partitioning:

class JoiningGroupingAndPartitioning
{
  public static void main(String args[])
  {
    Character[] chars =
           new Character[] { 'a', 'b', 'c', 'd', 'e', 'f', 'g' };

    // Join them all together
    System.out.println(
           Arrays.stream(chars).map(x -> x.toString())
                               .collect(Collectors.joining()));

    // Join with a ,
    System.out.println(
           Arrays.stream(chars).map(x -> x.toString())
                            .collect(Collectors.joining(",")));

    // Join with a , and surround the whole thing with []
    System.out.println(Arrays.stream(chars)
                             .map(x -> x.toString())
                             .collect(
                           Collectors.joining(",", "[", "]")));

    // Group into two groups
    Map<String, List<Character>> group1 =
           Arrays.stream(chars).collect(
                           Collectors.groupingBy(
          (Character x) -> x < 'd' ? "Before_D" : "D_Onward"));

    System.out.println(group1);

    // As before, but group values with like keys in a set
    Map<String, Set<Character>> group2 =
           Arrays.stream(chars).collect(
                           Collectors.groupingBy(
          (Character x) -> x < 'd' ? "Before_D" : "D_Onward",
                                         Collectors.toSet()));

    System.out.println(group2);

    // Put the whole grouping structure in a TreeMap
    Map<String, Set<Character>> group3 =
           Arrays.stream(chars).collect(
                           Collectors.groupingBy(
          (Character x) -> x < 'd' ? "Before_D" : "D_Onward",
                             TreeMap::new,
                             Collectors.toSet()));

    System.out.println(group3);

    // Partition into two lists
    Map<Boolean, List<Character>> partition1 =
           Arrays.stream(chars).collect(
                            Collectors.partitioningBy(
                                 (Character x) -> x < 'd'));

    System.out.println(partition1);

    // Partition into two sets
    Map<Boolean, Set<Character>> partition2 =
           Arrays.stream(chars).collect(
                            Collectors.partitioningBy(
                                 (Character x) -> x < 'd',
                                       Collectors.toSet()));

    System.out.println(partition2);
  }
}

The first type of collector in this example is a joining collector. This is used to join Strings together (in the example we map characters to Strings). The no parameter version just joins the Strings together, the one parameter version allows us to specify a string to put between any strings we join. The 3 parameter version also allows us to specify a start and an end string to flank the collected String with. This can be useful for producing debug or human readable output.

The second is a groupingBy collector. This allows us to group together elements that have the same classification into a map. To classify we pass a classification function which takes an element and returns the type we are using for the keys of the map. GroupingBy has three versions: the first just takes the classification function, the second allows us to specify the the collection type for values with duplicate keys (default is a generic list). The third is the same as the second, but also has another parameter (2nd one) which allows us to specify the type of map (as opposed to just a generic one). We pass a constructor using the :: notation and the constructor is denoted by the ‘new’ function.

The third set is a partitioningBy collector. This is like groupingBy, but instead of passing a function to specify the key, we pass a predicate which determines the key (true or false) for each value. In the single parameter version, values that share the same key are organised into a generic list, where as in the two parameter version we can specify the collection type for the values.

So we’ve seen plenty of Hello World style examples, but I think I owe you something that’s a bit more realistic. Let’s model a club. Each member of the club has a membership which keeps track of their name, age and gender. It’s also possible to register two members as a couple. There are three types of memberships, a junior membership for the under 18s, a senior membership for the 60s and overs, otherwise adult membership.

We’re tasked with the following:

  • Get all member’s names as a String separated by ,
  • Find the average age (rounded down to the nearest integer)
  • Split the membership list into all the male and all the female members
  • Classify the members depending on their membership types
  • Get all couples as a List

This is a reasonably sized example that would pass quite easily for a university programming homework/exam, or a longer programming exercise in an interview. Being able to knock out such code from a description in say 30 minutes will put you in very good stead.

Let’s think how to approach this. Before you start iterating through the collections using ‘for’ and having lots of mutable state, we’re solve without any mutable state. Well except in one place for convenience (registering couples). The data is not going to get further mutated in the example once we’re set up anyway. This will mean we’re less likely to make bugs by mutations happening incorrectly, have one-off errors and we can delegate more of the how to do it machinery to the Java libraries leaving us to focus on arranging it to solving the problem. This will be our first steps in thinking like a functional programmer but not be too far away from what an OO programmer would understand.

First we are going to create a ClubMember type:

public class ClubMember
{
  private String name;
  private boolean male;
  private int age;
  private ClubMember partner;

  public ClubMember(String name, boolean male, int age)
  {
    this.name = name;
    this.male = male;
    this.age = age;
  }

  public String getName()
  {
    return this.name;
  }

  public int getAge()
  {
    return age;
  }

  public ClubMember getPartner()
  {
    return partner;
  }

  @Override
  public String toString()
  {
    return name;
  }
}

All pretty straight forward and wouldn’t look out of place in object-oriented code. We have a constructor, some getters and toString method. We don’t have a isMale() getter, but we’ll see why in a second. There’s a lot of boiler-plate though so if boiler-plate makes your blood boil [I’m allowed a quip] then you might want to take a look at Project Lombok.

We’re also going to have to register a couple somehow and let’s assume this club frowns on polygamy. We will use a static method to do it. Why static? A static method can set the partner field of both members in the same method. If it was a normal method we’d either to rely on our class’ user to call it on both members – which they might forget to do. What if we made a member register method, calling the passed member’s register method inside it? In that case we would have to decide how are we going to stop an infinite loop ping-ponging between the two instances. After all that method is going to call the first one again.

So here is the static method:

  public static void registerPartners(ClubMember cm1,
                                      ClubMember cm2)
  {
    cm1.partner = cm2;
    cm2.partner = cm1;
  }

I also want to add two static fields. Instead of a getter for isMale and a test for a partner being a member, I want to use Predicates instead so I can do some functional things with them:

public static final Predicate<ClubMember> isMale =
                                               m -> m.male;

public static final Predicate<ClubMember> isPartnerMember =
                                    m -> m.partner != null;

We’re done with the ClubMember class now.

Having a Tuple2 type to manage both age ranges for our memberships and couples would be useful. Unfortunately standard Java doesn’t have Tuple2 just yet, so we’ll make our own (using a library like Guava for this functionality would be better to avoid having to reinvent the wheel):

public class Tuple2<T1, T2>
{
  public T1 t1;
  public T2 t2;

  public Tuple2(T1 first, T2 second)
  {
    t1 = first;
    t2 = second;
  }

  @Override
  public String toString()
  {
    return "(" + t1 + ", " + t2 + " " + ")";
  }
}

A couple Tuple2 has two ClubMember types, a age range Tuple2 has two Integer types. As we probably would reuse the two integer type Tuple2 elsewhere, we’ll define an IntegerRange specially:

public class IntegerRange extends Tuple2<Integer, Integer>
{
  public IntegerRange(Integer start, Integer end)
  {
    super(start, end);
  }

  public Integer getStart()
  {
    return t1;
  }

  public Integer getEnd()
  {
    return t2;
  }

  public final Predicate<Integer> inRange = 
                                i -> i >= t1 && i < t2;
}

This also allows us to define another predicate to determine whether a value is in the range. We’ll take the end-point as non-inclusive.

Now on to our main class:

public class ClubMembers
{
  private List<ClubMember> members;

  private static final IntegerRange juniors =
                              new IntegerRange(0, 18);
  private static final IntegerRange adult =
                              new IntegerRange(18, 60);
  private static final IntegerRange seniors = 
               new IntegerRange(60, Integer.MAX_VALUE);

  private static final String juniorMembership =
                                   "Junior Membership";
  private static final String adultMembership =
                                    "Adult Membership";
  private static final String seniorMembership =
                                   "Senior Membership";

  public ClubMembers(List<ClubMember> members)
  {
    this.members = members;
  }
}

Let’s start off with a field to store the membership list and a constructor to initialise it. We also will define our age ranges for our membership types and some strings to represent them. We’ll add all the rest of the code into this class.

So the first thing we want to do is get all the members. Let’s join them together with a comma. Does that sound like a job for Collectors.joining? You bet!:

public String getAllMembers()
{
  return members.stream().map(m -> m.getName())
                         .collect(Collectors.joining(", "));
}

That was simple. We take a member, map it to just its name, and then collect with joining. The cool thing here is we don’t need to worry whether to add a comma or not as it’s taken care of for us. No more defining mutable state variables such as ‘first’ to avoid putting a comma before the first name.

Now on to average age. Again simple since we’ve seen this before:

public OptionalDouble getAverageAge()
{
  return members.stream().map(m -> m.getAge())
                         .mapToInt((Integer x) -> x)
                         .average();
}

Next we need to find all male and female members. Hmmmm two groups, this sounds like a job for Collectors.partitioningBy.

Note: One thing we should do when programming in a functional style is try to reuse as much as possible. Instead of the class being a template that we specialise, we use a function [often as a parameter] to give it special behaviour.

Let’s partition members by an arbitrary predicate:

private Map<Boolean, List<ClubMember>> partitionMembers(
                                   Predicate<ClubMember> p)
{
  return members.stream().collect(
                            Collectors.partitioningBy(p));
}

Now we can specialise this function by passing the appropriate predicate, conveniently defined in ClubMember:

public Map<Boolean, List<ClubMember>> getMembersByGender()
{
  return partitionMembers(ClubMember.isMale);
}

So now we have two groups, the false group are female, the true group are male.

What about our memberships? This time we have three groups, and thus Collectors.partitioningBy isn’t a good fit, so let’s use Collectors.groupingBy. Again let’s use the same trick: a generic classification function we pass a classifier to. This classifier is just a function taking a ClubMember and classifying into groups of an arbitrary type:

private <T> Map<T, List<ClubMember>>
         classifyMembers(Function<ClubMember, T> classifier)
{
  return members.stream().collect(
                         Collectors.groupingBy(classifier));
}

We need that classification function to take a ClubMember and return details of their membership type as a String:

private static final Function<ClubMember, String>
                  resolveMembershipType =
    m -> juniors.inRange.test(m.getAge()) ? juniorMembership :
         adult.inRange.test(m.getAge()) ? adultMembership :
                                          seniorMembership;

We defined the strings, we defined the age ranges, we defined a inRange test. It was just a case of putting it together. Now we just specialise classifyMembers:

private Map<String, List<ClubMember>> classifyMemberships()
{
  return classifyMembers(resolveMembershipType);
}

See how easy this can be?

Last is the couple members. What does it mean to be a couple member?

Well, we expect the isPartnerMember predicate defined in ClubMember to return true. That’s a filter we need in order to check this.

Now if we go through all our members with partners we will return the couples twice though: once for personA, personB and once for personB, personA. That might be what we could want, but in this case it’s not. Let’s make the assumption that both partners have a different name (to be robust we probably should have member ids for this sort of thing). We need to arbitrarily choose one of the partners as our first partner, so let’s use it if personA’s name when compared to personB’s name returns < 0. String has a compareTo function defined for us so we can do that. This will need another filter.

We also want to return the couple, not just one of the members, so let’s use Tuple2 to hold that. This will need a map.

Finally we want a list of couples, so Collectors.toList() will do that job nicely:

public List<Tuple2<ClubMember, ClubMember>> getCouples()
{
  return members.stream()
             .filter(ClubMember.isPartnerMember)
             .filter(m -> m.getName().
                     compareTo(m.getPartner().getName()) < 0)
             .map(m -> new Tuple2<>(m, m.getPartner()))
             .collect(Collectors.toList());
}

I hope this is making you smile – a little bit of thought about the problem and we can easily solve it by putting blocks together.

Let’s write a driver main method in ClubMembers. To make things a little easier to verify we’ll call the couples by their titles and everyone else by their first name:

public static void main(String args[])
{
  ClubMember cm1 = new ClubMember("Johnny", true, 13);
  ClubMember cm2 = new ClubMember("Jenny", false, 9);
  ClubMember cm3 = new ClubMember("Dave", true, 21);
  ClubMember cm4 = new ClubMember("Penny", false, 28);
  ClubMember cm5 = new ClubMember("Mrs. Smith", false, 36);
  ClubMember cm6 = new ClubMember("Mr. Smith", true, 45);
  ClubMember cm7 = new ClubMember("Mr. Watts", true, 59);
  ClubMember cm8 = new ClubMember("Mrs. Watts", false, 60);
  ClubMember cm9 = new ClubMember("Bill", true, 68);

  ClubMember.registerPartners(cm5, cm6);
  ClubMember.registerPartners(cm7, cm8);

  ClubMember[] membersArray = new ClubMember[]
           { cm1, cm2, cm3, cm4, cm5, cm6, cm7, cm8, cm9 };

  ClubMembers members = new ClubMembers(
                              Arrays.asList(membersArray));

  System.out.println("Members: " + members.getAllMembers());
  System.out.println("Average age: " + 
         new Double(members.getAverageAge().orElse(0))
                   .intValue());
  System.out.println("Membership by gender (true is male): " +
                        members.getMembersByGender());
  System.out.println("Memberships: " +
                        members.classifyMemberships());
  System.out.println("Couples: " + members.getCouples());
}

Let’s fire it up:

Members: Johnny, Jenny, Dave, Penny, Mrs. Smith, Mr. Smith, Mr. Watts, Mrs. Watts, Bill
Average age: 37
Members are male: {false=[Jenny, Penny, Mrs. Smith, Mrs. Watts], true=[Johnny, Dave, Mr. Smith, Mr. Watts, Bill]}
Memberships: {Senior Membership=[Mrs. Watts, Bill], Junior Membership=[Johnny, Jenny], Adult Membership=[Dave, Penny, Mrs. Smith, Mr. Smith, Mr. Watts]}
Couples: [(Mr. Smith, Mrs. Smith ), (Mr. Watts, Mrs. Watts )]

What you might notice when writing code like this on your own, is that once you deal with all the compile errors it works first time out of the box. This is very different from doing it in a procedural style using iterators where you often end up with one off errors, null pointers and a host of other problems. By eliminating the possibility of making them we can write code quicker and be more productive. Java 8 Streams and supplied collectors make life very easy for us.

Finite sequence generators in Java 8

… and introducing default methods.

Last time we looked at generators, and more specifically those generating an infinite sequence. We saw that there were several ways to achieve this:

  • The older Java 7 way with an iterator like class
  • Using Stream’s iterate method
  • Using Stream’s generate method

We also saw that when using a Stream we had to use the limit method on our infinite sequence otherwise it would keep generating and the program couldn’t continue. The problem with using limit was that once we’d got the values, we couldn’t use the stream again to get more. To solve this problem, we used an IntSupplier and created several streams with it to process batches of values.

What if the sequence we wanted to process was finite. We want to avoid buffering up values in advance. We also want to consume the sequence without knowing up front how many values it will return because of reasons such as:

  • We can’t work it out or it’s difficult to
  • There is a random element
  • We want to decouple the generating function from stream doing the consuming

We saw a simple finite sequence with our Hello World! example in the first article. In this case we were not generating the sequence with a function, we were instead processing a pre-initialised list. We also saw we could use an iterator when we discussed infinite sequences, but went on to discuss other ways. We’ll see that in this case, we are virtually forced into the Iterator solution.

Let’s take a simple sequence which we can’t know the length of. We’ll simulate a game where the idea is to keeping throwing a die until we get a six. First we’ll start with a non-functional implementation:

public class SixGame
{
	public static class DieThrowIterator implements Iterator<Integer>
	{
		private int dieThrow = 0;
		private Random rand = new Random(System.nanoTime());

		@Override
		public boolean hasNext()
		{
			return dieThrow != 6;
		}

		@Override
		public Integer next()
		{
			dieThrow = Math.abs(rand.nextInt()) % 6 + 1;
			return dieThrow;
		}
	}

	public static void main(String args[])
	{
		DieThrowIterator dieThrowIterator = new DieThrowIterator();

		while (dieThrowIterator.hasNext())
		{
			System.out.println("You threw a " + dieThrowIterator
                                  .next());
		}
	}
}

(Note our random number generation is not very robust, but for a simple demonstration it will do).

Let’s try to use a Stream. We can’t use Stream’s iterate function because it doesn’t allow a stop condition. Also an IntSupplier with generate is not an option unless we want to use limit(1) and create a stream for each die throw, when we might as well not use a stream at all.

If we look under the hood at the implementation of IntStream’s generate function we see that it creates an InfiniteSupplyingSpliterator. This has a problem for us – the tryAdvance function always returns true meaning we can never stop.

We could implement our own Spliterator where our tryAdvance checks for a stop condition. We can’t simply extend InfiniteSupplyingSpliterator and override the tryAdvance method since it’s an inner class of a default access class. So the only way is a cut and paste job rather than inheritance. I’m very nervous about copying large parts of code which might change in future versions; this is saying to me it’s not what was intended. We should look for other ways first.

Let’s look to see how List does its streaming. List’s streaming comes from inheriting Iterable. To create an Iterable we only need to implement its iterator() function – for the example let’s do so as an inner-class:

	public static class DieThrowIterable implements Iterable<Integer>
	{
		@Override
		public Iterator<Integer> iterator()
		{
			return new DieThrowIterator();
		}
	}

We can then stream:

	public static void main(String args[])
	{
		Stream<Integer> stream = StreamSupport.stream(
				new DieThrowIterable().spliterator(), false);

		stream.map(i -> "You threw a " + i).forEach(System.out::println);
	}

Wait a moment… Iterable is an interface, yet it has a spliterator() method implemented. How can that be?

If we look at the interface, there is indeed a spliterator() method creating a new spliterator for us. If we also look closely, we see the default keyword. This was added in Java 8 (cunningly default is already a reserved word for switch statements so it won’t break old code). When we write interfaces we can provide default methods already implemented. Now some people might have reservations about this as it’s turning an interface in to effectively an abstract class. There are some good reasons and advantages this gives us:

  • It avoids having to create abstract classes to implement interfaces to supply default methods. Thus there is less boiler-plate to write, and also prevents the consumer of the API implementing the interface when we expected the abstract class to be used instead.

  • It helps with multiple inheritance issues (inherting from two or more classes which C++ supports, but not Java). In Java, the problems with multiple inheritance were avoided by allowing only single inheritance from classes, but we can implement as many interfaces as desired. Problem is we had to implement large portions of those interfaces in each class that used them – a real pain.

  • Our API vendors can also now add new methods to interfaces without breaking old code. This is one reason we see a default method here. A new interface would make the JDK even larger and make things more complicated.

  • Lambda expressions bind to interfaces with one single method left to implement. If there were more we couldn’t do this binding. We’ll look at this in a future article.

Note: If we implement more than one interface with an identical default method, and we do not override it by implementing in the class or a parent class, this is an error. Whether eventually it’ll be possible to use Scala-like mix-in traits is an interesting question.

So using Iterable is one way to do finite sequence generators, although the documentation accompanying the default spliterator() method suggests it should be overridden for better performance. In a later article we’ll come back and look at spliterators in some more detail.

In the next article we’ll look another couple of ways to implement finite sequence generators, one good, and one we should avoid.

Generators with Java 8

Today we’ll look at creating generators. In simple terms, a generator is a function which returns the next value in a sequence. Unlike an iterator, it generates the next value when needed, rather than returning the next item of a pre-generated collection. Some languages such as Python support generators natively via keywords such as yield. When a generator’s next value is requested in Python, the generator function continues to run until the next yield statement, where a value is returned. The generator function is able to continue where it left off which can be quite confusing for the uninitiated. So how to do something similar in Java?

We saw in the last article that we can use an IntStream to generate a simple set of numbers, but we had to generate them all up front. That’s fine if we know how many we’re going to need. What if we don’t, and we want to be able to get the next whenever we like? This is where a generator comes in.

Let’s choose a simple infinite sequence, the square numbers. In a standard Java implementation we’d end up with something like the following:

public class Squares
{
        private int i = 1;

        public int next()
        {
                int thisOne = i++;
                return thisOne * thisOne;
        }

        public static void main(String args[])
        {
                Squares squareGenerator = new Squares();

                System.out.println(squareGenerator.next());
                System.out.println(squareGenerator.next());
                System.out.println(squareGenerator.next());
        }
}

This prints the first three square numbers. Note we could have gone further and implemented this as an iterator.

What we have here is an example of lazy evaluation in a non-functional style. Wikipedia defines lazy evaluation as: ‘In programming language theory, lazy evaluation, or call-by-need is an evaluation strategy which delays the evaluation of an expression until its value is needed’. Lazy evaluation is useful because we don’t need to worry about infinite sequences, performing computationally expensive operations up-front, and about storage.

Let’s expand on the example to allow getting a batch of results. This is easy – create a nextN function which calls next() a number of times and returns the results in say a List:

public class Squares2
{
        private int i = 1;

        public int next()
        {
                int thisOne = i++;
                return thisOne * thisOne;
        }

        public List<Integer> nextN(int n)
        {
                List<Integer> l = new ArrayList<>();

                for (int i = 0; i < n; i++)
                {
                        l.add(next());
                }

                return l;
        }

        public static void main(String args[])
        {
                Squares2 squareGenerator = new Squares2();

                squareGenerator.nextN(10).forEach(System.out::println);
        }
}

A few points:

  • Notice in the nextN function there is the empty diamond in the new ArrayList statement. This was added in Java 7 to save having to state the type both on the left and the right hand side; the compiler now works it out.
  • List is an Iterable, and Iterable now has a forEach() method which was added in Java 8. We could use stream() as before to create a stream, but if all we want to do is pass the contents to a function forEach() does nicely.

Now, to save having to write nextN for every sequence we make, we could create a new type which extends Iterator providing the nextN function.

The only problem we face here is that we have to save the batch in a list before we can operate on it. Java 8 provides another way. Let’s go back and start again with the following code:

public class Squares3
{
        public static void main(String args[])
        {
                IntStream.rangeClosed(1, 10).map(i -> i * i)
                         .forEach(System.out::println);
        }
}

This uses IntStream to get the indexes of the sequence in a stream and calls map to convert them into their squares. The problem is that to get more squares than the tenth we need to duplicate the pipeline and start it off from the right place. Let’s look at another way without using a range:


        public static void main(String args[])
        {
                IntStream myStream = IntStream.iterate(1, i -> i + 1);

                myStream.limit(10).map(i -> i * i)
                                  .forEach(System.out::println);
        }

This also generates the first 10 square numbers. This time it uses the iterate function. This takes two parameters, the first is our initial value, and the second is a function defining how to get to the next value from the previous. It’s a good place to use a lambda function. We can even dispense of the map function since we can undo squaring easily in iterate to get what the last index was:

        public static void main(String args[])
        {
                IntStream myStream = IntStream.iterate(1,
                        i -> ((int) Math.pow(Math.sqrt(i) + 1, 2)));

                myStream.limit(10).forEach(System.out::println);
        }

This solves one of the problems of having to buffer beforehand. However, we need to use the limit operator on the stream to limit it to 10 items, otherwise it would keep on going. Unfortunately this is a problem, since once we’ve got the 10 the stream is ‘operated on’ and we can’t use it again to generate more. If we try, we get an IllegalStateException. We’d have to create another stream to get more.

So how do we get around the problem of the stream being used up? Instead of using IntStream’s iterate function, we can use generate instead. IntStream’s generate function takes an instance of an IntSupplier. IntSupplier has a getAsInt() function which returns the next int in the sequence which is very much like our next() function. Here is an example that prints the first 20 square numbers in two batches:

public class SquaresGenerator
{
        private static class SqSupplier implements IntSupplier
        {
                int i = 0;

                @Override
                public int getAsInt()
                {
                        i++;
                        return i * i;
                }
        }

        public static void main(String args[])
        {
                SqSupplier sqSupplier = new SqSupplier();
                IntStream myStream = IntStream.generate(sqSupplier);
                IntStream myStream2 = IntStream.generate(sqSupplier);

                myStream.limit(10).forEach(System.out::println);
                myStream2.limit(10).forEach(System.out::println);
        }
}

Again we’re using limit to stop the stream continuing indefinitely. However unlike last time, although the stream is used up, the generator still survives and can be used again. No buffering needed either, just keeping hold of the supplier. The only downside vs the old Java way is that we have to use Streams to get sequence members, although this comes with other benefits such as parallelism which we’ll see in a later article.

Overall, there are several ways to generate a sequence and which we chose may depend on our needs. Using an IntSupplier is a good way to integrate with the rest of the Java 8 functional programming support.

Ranges and Looping with IntStream

In the previous posts we looked at taking a container, getting it to stream its contents into a pipeline and looked at a few different operations on data in that pipeline (forEach, map, filter, peek).

Java 8 supports several specialist streams where the pipeline contains a specific type of object. Today we’ll look at IntStream which passes Integers along its pipeline.

public class IntStreamExample
{
	public static void main(String[] args)
	{
		System.out.println("[1,5]");
		IntStream.rangeClosed(1, 5).forEach(System.out::println);

		System.out.println("[1,5)");
		IntStream.range(1, 5).forEach(System.out::println);

		System.out.println("Just 3");
		IntStream.of(3).forEach(System.out::println);

		System.out.println("Specific values");
		IntStream.of(1, 3, 5, 6).forEach
                (System.out::println);

		System.out.println("[1,3] and [4,6] joined");
		IntStream.concat(IntStream.rangeClosed(1, 3),
		IntStream.rangeClosed(4, 6)).forEach(System.out::println);
	}
}

To build our stream we take the IntStream class and use one of its static methods. A selection of such methods are demonstrated in the example above.

For those who missed the previous articles, the :: operator means pass the function on the right, calling with the object on the left.

Let’s describe each of the methods:

  • The range and rangeClosed methods produce a stream which has an ordered pipeline of integers starting at the first number and ending at the second. The difference is that rangeClosed has an inclusive endpoint where are range does not. There is no version yet with a step or descending values – the pipeline is initialised as empty if the start is beyond the last element. If we wanted a step, we use transformations with map:
  • IntStream.rangeClosed(1, 5).map(x -> 6-x)
                               .forEach(System.out::println);
    

    This is a bit clumsy (and hopefully a step version will be added soon) but it does the trick. If we need a step often, we could make our own based on the IntStream class.

  • The of method puts a one or more values in the pipeline. The multiple value version takes a variable number of ints (which means we can also pass an array of int).
  • Finally the concat method can be used to put two or more IntStreams together into a single IntStream.

Note that there also is an empty() method which produces an empty stream.

A use of range is a functional-style for-loop. It’s functional-style because there is no mutable loop variable. The examples above have already demonstrate this – the ‘body’ of the loop just printed the loop counter.

What if we want to nest two loops? That’s easy:

public class Multiplication
{
	public static void main(String[] args)
	{
		IntStream.rangeClosed(1, 10)
		         .forEach(i -> IntStream.rangeClosed(1, 10)
                         .forEach(
            j -> System.out.println(i + " * " + j + " = " + i * j)));
	}
}

By using the forEach operation we can map each element onto a another stream. We don’t have to use the element there and then, we can use it later in the pipeline. In the example we use both streams to produce a multiplication table.

We could also save the output of an IntStream to an array like this:

	int[] a = IntStream.rangeClosed(1, 10).toArray();

This provides us with a very handy way to initialise an array to a sequence of integers.

The pipeline operation toArray returns an int[]. What if we want to create an array from two or more nested loops? The problem we run into in the nested version is that we can’t use toArray() in the inner loop since the inner loop is part of a map function which is expecting an int not an int[]. This means we have to use another trick:

	int[] a = IntStream.rangeClosed(1, 10)
			.flatMap(i -> IntStream.rangeClosed(1, 10)
                                          .map(j -> i * j))
            .toArray();

Here we use flatMap. flatMap flattens a number of IntStreams (10 of them here) into pipeline elements (ints). These are then passed on to the next operation which is toArray().

The lambda expression passed to flatMap is converted to an IntFunction and its apply function takes an int and returns an IntStream. Here is an inner-class with it implemented explicitly:

	private static class MultiplicationTable implements
			IntFunction<IntStream>
	{
		@Override
		public IntStream apply(int value)
		{
			return IntStream.rangeClosed(1, 10).map(j -> value * j);
		}
	}

using the call:

		int[] a = IntStream.rangeClosed(1, 10)
			               .flatMap(new MultiplicationTable())
			               .toArray();

Finally here is a version using a local function:

public class Multiplication
{
	private IntStream getTable(int i)
	{
		return IntStream.rangeClosed(1, 10).map(j -> i * j);
	}

	public void test()
	{
		int[] a = IntStream.rangeClosed(1, 10).flatMap
                                               (this::getTable)
					       .toArray();

		Arrays.stream(a).forEach(System.out::println);
	}

	public static void main(String[] args)
	{
		new Multiplication().test();
	}
}

Note we also used the stream function on the Arrays helper class to print out the array. To do this we need to pass in the array into the stream function and then we can use forEach to print.

That should give you a good start for using IntStreams. Note there are also special streams for Long and Double which you might want to take a look at.

An Introduction to Functional Programming with Java 8

Java 8 is perhaps one of the most exciting editions of the Java language in recent times. One of the headline features is support for functional programming which is the focus of this blog. The support comes mostly in three features:

  • Support for [work pipeline] streams. Streams allow us to process data through a number of stages in a pipeline in a functional way. We can create such streams from container instances such as List, or create them using specific stream classes such as IntStream.
  • Support for lambda functions. In simple terms this is an anonymous function (we don’t need to give it a name) which takes a number of parameters and returns a value. A returned value will be passed down the pipeline to the next operation.
  • Support for passing functions into other functions.

Given functional programming has been around since the 50s, and until recently mostly disregarded by the mainstream, why has it become such a hot topic? My opinion is that it’s because of its ability to easily process work in parallel taking advantage of multi-core processors, lazy (on-demand) evaluation, and ease of integration with other languages such as Java. Certainly the JVM has provided a good base for Scala which can even be embedded in a Java program, giving the best of both worlds plus an easier migration path for developers. Scala, however, has quite a steep learning curve, and there are already many existing Java programmers out there, so eventually there was pressure for Java itself to catch up to avoid having to learn a whole new language to do some cool and useful things.

Enough words, let’s get down to business and consider a hello world example:

public class HelloWorld
{
	public static void main(String[] args)
	{
		List<String> countries = Arrays.asList("France", "India", 
			"China", "USA", "Germany");

		for (String country : countries)
		{
			System.out.println("Hello " + country + "!");
		}
	}
}

This should be familiar to every Java programmer; we take a list of countries and print a greeting using an enhanced for-loop.

So, is there anything wrong just doing it like this? For a simple example, not a lot, but let’s imagine we were doing something more complicated.

We want to write some tests to check the code works, but testing the body of the loop might be tricky. We could extract the body into its own function of course, but if we did that everywhere we’d end up with lots of little functions for the bodies of loops.

What if we want to process items in parallel? We’d need to hand over a country to a thread that implements the greeting function. Should one thread process one country, or a batch of them? How are we going to pass the work in batches and check when it’s done? We have to also handle communication between the threads.

Another problem is that the loop body’s code is disjoint from the source of the loop variable. To see how a country gets created we must find the for statement. We’d prefer to tell the list of countries to do the message printing. The problem is that lists don’t understand being told to do things, they are just containers.

However, since Java 8, containers can pass their contents to an entity which does handle instructions – a Stream. I’ve changed the example to use a stream:

	public static void main(String[] args)
	{
		List<String> countries = Arrays.asList("France", "India", "China",
				"USA", "Germany");

		countries.stream().forEach(
				(String country) -> System.out
						.println("Hello " + country + "!"));
	}

The list of countries now creates a Stream and passes it a ‘spliterator’ from the container. This is an iterator which is capable of splitting the contents into batches of work. We’ll take a look at this batching (which then can be parallelised) in a future post.

Stream itself is an interface which is implemented by the abstract class ReferencePipeline. Jobs can then be chained to the returned Stream. Each job does some work, and perhaps calls a downstream job when it’s done. In the example the list creates a Stream and a forEach job is added to its pipeline. Like our loop in the first example, the forEach job does some work on each item. The funny -> indicates a lambda expression (an anonymous function), here taking a String named country and printing it in a message.

Since in this case the compiler can work out that country is a String, we can dispense with the type, and as there is just one parameter we can also dispense with the () around the lefthand side leaving the more concise:

countries.stream()
   .forEach(country -> System.out.println("Hello " + country + "!"));

Note it’s a lot clearer that we are doing work on countries rather than just referencing a loop variable which happened to be a country. It’s also easier to read as there is less boilerplate obscuring the actual business logic. Clearer and easier to read code is easier to understand, also it’s easier to spot mistakes. We don’t also have to worry about making mistakes in the boilerplate which cuts down our tests. Overall, coding is more fun and the development time is shortened.

Let’s experiment with this example a bit to illustrate what is going on under the hood. The lambda that foreach is taking is actually an implementation of Consumer<T>.

* Reminder for those a bit rusty or coming to generics for the first time that this means the Consumer type uses an unknown type T which is decided at compile time.

We can explicitly show use of Consumer by creating an inner-class:

private static class Greeter<T> implements Consumer<T>
{
	@Override
	public void accept(T t)
	{
		System.out.println("Hello " + t + "!");
	}
}

and changing the pipeline:

countries.stream().forEach(new Greeter<String>());

Consumer has an accept method which is called every time by forEach with each item in the pipeline.

Let’s separate the forming of the message and the printing into two jobs. To form the greeting message, we can use the map function. This applies a lambda function to an input transforming the input. The printing job is still carried out by a foreach:

public static void main(String[] args)
{
	List<String> countries = Arrays.asList("France", "India",
			"China", "USA", "Germany");

	countries.stream().map(country -> "Hello " + country + "!")
			.forEach(System.out::println);
}

Note the strange syntax System.out::println. We’re actually passing a function into forEach, the println method of the System.out instance. The :: syntax simply means pass the function on the right, calling with the object on the left. The left hand side can either be a class name for a static call, an object, or an alias (this or super) to an object. Note we cannot pass only the name of the function, since this would be taken to mean a variable whose type is-a Consumer.

Instead of a static main method, if we ran this inside an instance of an object which also had a doPrint method (taking a String and returning void), we could write:

       countries.stream().map(country -> "Hello " + country + "!")
                        .forEach(this::doPrint);

Let’s look at how map works under the hood. Map takes an instance of a class which is-a Function and calls its apply method. Apply takes type T and returns a type U. We’ll create a Greeter inner-class to demonstrate this:

private static class Greeter<T> implements Function<T, String>
{
	@Override
	public String apply(T t)
	{
		return "Hello " + t + "!";
	}
}

...

countries.stream().map(new Greeter<String>())
                  .forEach(System.out::println);

When we used a lambda function here, it was just taken to be an anonymous instance of Function.

To finish, let’s run the pipeline inside an object has both a doPrint and a makeGreeting method (taking a String and returning a String), and see what it looks like:

public class HelloWorldConcise
{
	private void doPrint(String str)
	{
		System.out.println(str);
	}

	private String greet(String country)
	{
		return "Hello " + country + "!";
	}
		
	public void greetCountries()
	{
		List<String> countries = Arrays.asList("France", "India", 
			"China", "USA", "Germany");

		countries.stream().map(this::greet).forEach(this::doPrint);
	}

	public static void main(String[] args)
	{
		new HelloWorldConcise().greetCountries();
	}
}

The greetCountries function is very concise. It’s clear that we take each country, make a greeting from it and then print it. Another great thing is it’s really easy to test. If we extend the HelloWorldConcise class to make a HelloWorldConciseTest, we can implement our own versions of makeGreeting and doPrint. Our own makeGreeting can be used to check the greeting is being formed correctly after calling its parent version, and the doPrint method can be a stub.

* Okay, printing to stdout during a test is not a big deal, but imagine if doPrint sent the greeting to a web page, we wouldn’t want to have to set all that up in order to test it.

So that’s our first taste of Java 8 functional programming. I hope you found it useful. In the next article I will look at some other operations which can be included in the pipeline.