Collectors Part 2: Provided collectors and a Java 8 streams demonstration

Today we’re going to continue where the last article left off. In that one we looked at collectors, specifically reduction and short-circuiting operations. Today we’ll look at the collect function and then we’ll finish off with a more substantial example showing the power Java 8 streaming gives us.

A collector gathers results and terminates the stream. It can also do reductions. The Collectors class provides a number of useful collectors ready for us to use. We’ll start by looking at those which carry out the same operations we looked at in the previous article (count, sum, average, max, min and summaryStatistics).

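import java.util.Arrays;
import java.util.stream.Collectors;

// Named CollectorsExample so it doesn't hide java.util.stream.Collectors
public class CollectorsExample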
{
  public static void main(String args[])
  {
    Integer[] numbersArray = new Integer[] { 1, 2, 3, 4, 5 };

    System.out.println(Arrays.stream(numbersArray)
                             .collect(Collectors.counting()));

    System.out.println(Arrays.stream(numbersArray)
                             .collect(
                    Collectors.summingInt((Integer x) -> x)));

    System.out.println(Arrays.stream(numbersArray)
                             .collect(
                    Collectors.averagingInt((Integer x) -> x)));

    System.out.println(Arrays.stream(numbersArray)
                             .collect(
                    Collectors.maxBy(Integer::compare)).get());

    System.out.println(Arrays.stream(numbersArray)
                             .collect(
                    Collectors.minBy(Integer::compare)).get());

    System.out.println(Arrays.stream(numbersArray)
                             .collect(
                    Collectors.summarizingInt((Integer x) -> x)));
  }
}

Note we’re streaming Integer rather than int, so we need to pass a ToIntFunction to the sum, average and summarizing collectors. A ToIntFunction takes a value of some type and returns an int. Because this is a stream of objects (a Stream<Integer>) rather than an IntStream, I’ve spelled out that the parameter to the lambda is an Integer. The compiler can normally infer this for itself, so x -> x or Integer::intValue should work just as well. Auto-unboxing will do the rest.
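As a rough sketch of the same idea, here the ToIntFunction is pulled out into a named field and written as a method reference. The class name ToIntFunctionSketch and the field name unbox are made up for illustration; only the Collectors calls come from the example above.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.function.ToIntFunction;
import java.util.stream.Collectors;

class ToIntFunctionSketch
{
  // A ToIntFunction<Integer> turns each boxed Integer into a primitive int
  static final ToIntFunction<Integer> unbox = Integer::intValue;

  static int sum(Integer[] numbers)
  {
    return Arrays.stream(numbers).collect(Collectors.summingInt(unbox));
  }

  static IntSummaryStatistics stats(Integer[] numbers)
  {
    return Arrays.stream(numbers)
                 .collect(Collectors.summarizingInt(unbox));
  }

  public static void main(String[] args)
  {
    Integer[] numbersArray = { 1, 2, 3, 4, 5 };

    System.out.println(sum(numbersArray));                // 15
    System.out.println(stats(numbersArray).getAverage()); // 3.0
  }
}
```

The same function object can be handed to any of the int-based collectors, which is handy if the conversion is more involved than simple unboxing.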

All relatively easy so far, but now it gets a bit harder. The problem that we face with the rest of the API is that the functions have several overloaded versions and there are lots of generics in the specification making it hard to read.

Take this one from partitioningBy which is not the most difficult to understand:

    public static <T, D, A>
    Collector<T, ?, Map<Boolean, D>>
       partitioningBy(Predicate<? super T> predicate,
                      Collector<? super T, A, D> downstream) {

We can see clearly that partitioningBy takes a predicate and a collector, and returns a collector which the collect function can use. The problem is working out what all these types are, even with the help of the documentation. Luckily I’ll provide some examples which should make it easy to start.
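One way to read a signature like this is to plug in concrete types and let the compiler confirm them. In this sketch (the class, method and variable names are mine, purely for illustration) T is Integer and the downstream collector is Collectors.counting(), so D is Long and the result is a Map<Boolean, Long>. The accumulator type A stays hidden behind the wildcard.

```java
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PartitioningTypesSketch
{
  static Map<Boolean, Long> countEvenAndOdd(Stream<Integer> numbers)
  {
    // Predicate<? super T> with T = Integer
    Predicate<Integer> isEven = n -> n % 2 == 0;

    // Collector<? super T, A, D> with D = Long
    Collector<Integer, ?, Long> downstream = Collectors.counting();

    // Collector<T, ?, Map<Boolean, D>> with T = Integer and D = Long
    Collector<Integer, ?, Map<Boolean, Long>> collector =
        Collectors.partitioningBy(isEven, downstream);

    return numbers.collect(collector);
  }

  public static void main(String[] args)
  {
    // Three odd numbers and two even ones
    System.out.println(countEvenAndOdd(Stream.of(1, 2, 3, 4, 5)));
  }
}
```

Writing the intermediate types out like this is far more verbose than you’d want in real code, but it’s a quick way of checking your reading of the generics.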

What I did when I explored was make simple examples like those I’m presenting:

  • Try the simplest version with the fewest parameters. That will usually be the easiest to understand.
  • Once you get that working, look at the implementation. Often the simpler one will be passing its own ‘default’ parameters to a more complicated version, giving a clue to what that is expecting.
  • Try the more complicated one, but substitute the ‘default’ parameter with something else. See what happens. Does it compile and do what’s expected? If not, why not?
  • Try stepping through the library code and see how the code is using the parameters.

Hopefully the examples will make easy work of this, but you can learn a lot more by experimenting yourself.
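To make the ‘default parameter’ idea concrete, here’s a small sketch using groupingBy. In the JDK source the one parameter version passes toList() to the two parameter version, which in turn passes HashMap::new to the three parameter version, so all three methods below should give the same result. The class and method names are invented for the example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class DefaultParametersSketch
{
  static final Function<String, Integer> byLength = String::length;

  // Simplest version: just the classifier
  static Map<Integer, List<String>> simplest(Stream<String> words)
  {
    return words.collect(Collectors.groupingBy(byLength));
  }

  // Same thing with the 'default' downstream collector spelled out
  static Map<Integer, List<String>> explicitDownstream(
                                          Stream<String> words)
  {
    return words.collect(
        Collectors.groupingBy(byLength, Collectors.toList()));
  }

  // Same again with the 'default' map factory spelled out too
  static Map<Integer, List<String>> explicitMapAndDownstream(
                                          Stream<String> words)
  {
    return words.collect(
        Collectors.groupingBy(byLength, HashMap::new,
                              Collectors.toList()));
  }

  public static void main(String[] args)
  {
    System.out.println(
        simplest(Stream.of("a", "bb", "cc", "ddd")).equals(
            explicitMapAndDownstream(
                Stream.of("a", "bb", "cc", "ddd")))); // true
  }
}
```

From here the next step is to swap toList() or HashMap::new for something else and see what changes.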

Collecting sounds like the thing collections are for, and indeed there are several ways of building collections with results coming from a stream. We can build a generic list, a generic set, a generic map, or we can build a specific type of collection or map:

public class CollectInCollections
{
  public static void main(String args[])
  {
    Character[] chars = new Character[]
                        { 'a', 'b', 'c', 'd', 'e', 'f', 'g' };

    // First a list
    List<Character> l = Arrays.stream(chars)
                              .collect(Collectors.toList());

    System.out.println(l);

    // toList gives us a generic list (code creates an ArrayList)
    // Let's get a linked list
    List<Character> ll = Arrays.stream(chars)
                               .collect(
                       Collectors.toCollection(LinkedList::new));

    System.out.println(ll);

    // toSet gives us a generic set (code creates a HashSet)
    Set<Character> s = Arrays.stream(chars)
                                   .collect(Collectors.toSet());

    System.out.println(s);

    // and now a generic map (code creates a HashMap)
    Map<Character, Character> m = 
                        Arrays.stream(chars).collect(
                  Collectors.toMap((Character k) -> 
                                       Character.toUpperCase(k),
                                   Function.identity()));

    System.out.println(m);

    // What happens if keys clash?
    try
    {
      Arrays.stream(chars).collect(
                  Collectors.toMap((Character k) -> 'a', 
                                   Function.identity()));
    }
    catch (IllegalStateException e)
    {
      System.out.println("Caught duplicate key");
    }

    // Let's provide a function to resolve this
    // we'll keep the first
    Map<Character, Character> m2 =
                        Arrays.stream(chars).collect(
               Collectors.toMap((Character k) -> 'a',
                                Function.identity(),
                               (v1, v2) -> v1));

    System.out.println(m2);

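    // If our merge function returns null, the entry for that key is
    // removed; the next value with the same key is then added afresh.
    // With seven values we end up with just the last one.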
    Map<Character, Character> m3 =
                        Arrays.stream(chars).collect(
                Collectors.toMap((Character k) -> 'a',
                                 Function.identity(),
                                 (v1, v2) -> null));

    System.out.println(m3);

    // We can also request a different type of map
    Map<Character, Character> m4 =
                        Arrays.stream(chars).collect(
                Collectors.toMap(
                   (Character k) -> Character.toUpperCase(k),
                                 Function.identity(),
                                 (v1, v2) -> v1,
                                 TreeMap::new));

    System.out.println(m4);
  }
}

Notes:

  • toMap comes with several overloaded versions which help us create keys for our values and deal with clashes. As we know, if we put a key-value pair into a map where the key already exists, we overwrite the existing value.
    The first two parameters map each stream element onto a key and a value respectively. For one of these (often the value) we don’t want to change anything, so passing Function.identity() (or the lambda v -> v) will keep the element as it is.
  • By default toMap uses a throwingMerger() which throws an IllegalStateException if two keys clash. We can see this in action if we force the keys to a single value. If we specify a third parameter we can supply a BinaryOperator (a BiFunction taking two values of the same type and returning one) to deal with the clash instead. If this function returns null, the existing entry for that key is removed rather than replaced, so the next value with that key is added as if it were new. In the example that leaves only the last value, which makes it look as though the latest is kept, but with an even number of clashing values the map would end up with no entry for that key at all.
  • If we specify a fourth parameter to toMap we can specify a specific type of map.
  • The documentation doesn’t state what type of collection is returned for toList(), toSet() and the 2 and 3 parameter versions of toMap(). The idea is that functional programming provides a rich set of features for a few collections rather than lots of collections. We therefore shouldn’t make any assumptions and where it matters use toCollection or the 4 parameter version of toMap to be sure, or convert later.
  • There are also concurrent versions of toMap called toConcurrentMap which can give better performance in parallel streams when we don’t care about the order.
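Here’s a small sketch for experimenting with the merge function on its own. It relies on the three parameter toMap accumulating through Map.merge, whose documentation says a null result removes the mapping. The class name MergeFunctionSketch and the collapse helper are invented for illustration.

```java
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class MergeFunctionSketch
{
  // Put every character under the single key 'a' so that every
  // element after the first clashes and goes through the merge function
  static Map<Character, Character> collapse(
                                Stream<Character> chars,
                                BinaryOperator<Character> merge)
  {
    return chars.collect(
        Collectors.toMap((Character c) -> 'a',
                         Function.identity(),
                         merge));
  }

  public static void main(String[] args)
  {
    // Keep the first value: {a=a}
    System.out.println(
        collapse(Stream.of('a', 'b', 'c'), (v1, v2) -> v1));

    // Keep the last value: {a=c}
    System.out.println(
        collapse(Stream.of('a', 'b', 'c'), (v1, v2) -> v2));

    // Returning null removes the entry, so two clashing values
    // leave nothing behind: {}
    System.out.println(
        collapse(Stream.of('a', 'b'), (v1, v2) -> null));
  }
}
```

If you really want the latest value to win, (v1, v2) -> v2 says so directly and doesn’t depend on how many values clash.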

Now on to the rest of the operations, which are for joining strings, grouping and partitioning:

class JoiningGroupingAndPartitioning
{
  public static void main(String args[])
  {
    Character[] chars =
           new Character[] { 'a', 'b', 'c', 'd', 'e', 'f', 'g' };

    // Join them all together
    System.out.println(
           Arrays.stream(chars).map(x -> x.toString())
                               .collect(Collectors.joining()));

    // Join with a ,
    System.out.println(
           Arrays.stream(chars).map(x -> x.toString())
                            .collect(Collectors.joining(",")));

    // Join with a , and surround the whole thing with []
    System.out.println(Arrays.stream(chars)
                             .map(x -> x.toString())
                             .collect(
                           Collectors.joining(",", "[", "]")));

    // Group into two groups
    Map<String, List<Character>> group1 =
           Arrays.stream(chars).collect(
                           Collectors.groupingBy(
          (Character x) -> x < 'd' ? "Before_D" : "D_Onward"));

    System.out.println(group1);

    // As before, but group values with like keys in a set
    Map<String, Set<Character>> group2 =
           Arrays.stream(chars).collect(
                           Collectors.groupingBy(
          (Character x) -> x < 'd' ? "Before_D" : "D_Onward",
                                         Collectors.toSet()));

    System.out.println(group2);

    // Put the whole grouping structure in a TreeMap
    Map<String, Set<Character>> group3 =
           Arrays.stream(chars).collect(
                           Collectors.groupingBy(
          (Character x) -> x < 'd' ? "Before_D" : "D_Onward",
                             TreeMap::new,
                             Collectors.toSet()));

    System.out.println(group3);

    // Partition into two lists
    Map<Boolean, List<Character>> partition1 =
           Arrays.stream(chars).collect(
                            Collectors.partitioningBy(
                                 (Character x) -> x < 'd'));

    System.out.println(partition1);

    // Partition into two sets
    Map<Boolean, Set<Character>> partition2 =
           Arrays.stream(chars).collect(
                            Collectors.partitioningBy(
                                 (Character x) -> x < 'd',
                                       Collectors.toSet()));

    System.out.println(partition2);
  }
}

The first type of collector in this example is a joining collector. This is used to join Strings together (in the example we map characters to Strings). The no parameter version just joins the Strings together; the one parameter version allows us to specify a string to put between the strings we join. The 3 parameter version also allows us to specify a start and an end string to flank the collected String with. This can be useful for producing debug or human readable output.
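One detail worth trying out is what the 3 parameter version does at the edges. In this sketch (the class and method names are mine) a single element gets no separator, and an empty list still gets its start and end strings.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class JoiningSketch
{
  // Separator between elements, with [ and ] around the whole result
  static String describe(List<String> items)
  {
    return items.stream()
                .collect(Collectors.joining(", ", "[", "]"));
  }

  public static void main(String[] args)
  {
    System.out.println(
        describe(Arrays.asList("a", "b", "c")));           // [a, b, c]
    System.out.println(
        describe(Collections.singletonList("a")));         // [a]
    System.out.println(
        describe(Collections.<String>emptyList()));        // []
  }
}
```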

The second is a groupingBy collector. This allows us to group together elements that have the same classification into a map. To classify we pass a classification function which takes an element and returns the type we are using for the keys of the map. groupingBy has three versions: the first just takes the classification function; the second also takes a downstream collector, which lets us specify the collection type for values that share a key (the default is a generic list). The third is the same as the second, but also has another parameter (the 2nd one) which allows us to specify the type of map (as opposed to just a generic one). We pass a constructor using the :: notation, with the constructor denoted by ‘new’.
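The downstream collector doesn’t have to build a collection. As a sketch (class and method names are made up, the classifier is the one from the example), here it is counting the members of each group, and joining them into a String via Collectors.mapping.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class GroupingDownstreamSketch
{
  static String classify(Character c)
  {
    return c < 'd' ? "Before_D" : "D_Onward";
  }

  // Count how many characters land in each group, in a TreeMap
  static Map<String, Long> countPerGroup(Character[] chars)
  {
    return Arrays.stream(chars).collect(
        Collectors.groupingBy(GroupingDownstreamSketch::classify,
                              TreeMap::new,
                              Collectors.counting()));
  }

  // Turn each group into a single String rather than a collection
  static Map<String, String> joinPerGroup(Character[] chars)
  {
    return Arrays.stream(chars).collect(
        Collectors.groupingBy(GroupingDownstreamSketch::classify,
            Collectors.mapping(Object::toString,
                               Collectors.joining())));
  }

  public static void main(String[] args)
  {
    Character[] chars = { 'a', 'b', 'c', 'd', 'e', 'f', 'g' };

    System.out.println(countPerGroup(chars)); // {Before_D=3, D_Onward=4}
    System.out.println(joinPerGroup(chars).get("Before_D")); // abc
  }
}
```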

The third is a partitioningBy collector. This is like groupingBy, but instead of passing a function to specify the key, we pass a predicate which determines the key (true or false) for each value. In the single parameter version, values that share the same key are organised into a generic list, whereas in the two parameter version we can specify the collection type for the values.
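One practical difference between the two is what happens when nothing matches. As far as I can tell (it’s documented from Java 9 onwards) the map from partitioningBy always has both a true and a false entry, while groupingBy only creates keys it actually sees. A small sketch with invented names:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PartitionVersusGroupSketch
{
  static Map<Boolean, List<Integer>> partition(List<Integer> numbers)
  {
    return numbers.stream().collect(
        Collectors.partitioningBy((Integer n) -> n < 0));
  }

  static Map<Boolean, List<Integer>> group(List<Integer> numbers)
  {
    return numbers.stream().collect(
        Collectors.groupingBy((Integer n) -> n < 0));
  }

  public static void main(String[] args)
  {
    List<Integer> noNegatives = Arrays.asList(1, 2, 3);

    // Both keys are present, the true group is just empty
    System.out.println(partition(noNegatives));

    // Only the false key exists here
    System.out.println(group(noNegatives));
  }
}
```

So with partitioningBy it should be safe to call get(true) and get(false) without a null check, which isn’t the case for a groupingBy on a boolean classifier.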

So we’ve seen plenty of Hello World style examples, but I think I owe you something that’s a bit more realistic. Let’s model a club. Each member of the club has a membership which keeps track of their name, age and gender. It’s also possible to register two members as a couple. There are three types of membership: a junior membership for the under 18s, a senior membership for those aged 60 and over, and an adult membership for everyone else.

We’re tasked with the following:

  • Get all members’ names as a String separated by ,
  • Find the average age (rounded down to the nearest integer)
  • Split the membership list into all the male and all the female members
  • Classify the members depending on their membership types
  • Get all couples as a List

This is a reasonably sized example that would pass quite easily for a university programming homework/exam, or a longer programming exercise in an interview. Being able to knock out such code from a description in, say, 30 minutes will stand you in very good stead.

Let’s think how to approach this. Before you start iterating through the collections using ‘for’ and building up lots of mutable state, note that we’re going to solve this without any mutable state. Well, except in one place for convenience (registering couples), and the data is not going to be mutated further once we’re set up anyway. This means we’re less likely to introduce bugs through incorrect mutation or off-by-one errors, and we can delegate more of the how-to-do-it machinery to the Java libraries, leaving us to focus on arranging it to solve the problem. These will be our first steps in thinking like a functional programmer, without straying too far from what an OO programmer would understand.

First we are going to create a ClubMember type:

public class ClubMember
{
  private String name;
  private boolean male;
  private int age;
  private ClubMember partner;

  public ClubMember(String name, boolean male, int age)
  {
    this.name = name;
    this.male = male;
    this.age = age;
  }

  public String getName()
  {
    return this.name;
  }

  public int getAge()
  {
    return age;
  }

  public ClubMember getPartner()
  {
    return partner;
  }

  @Override
  public String toString()
  {
    return name;
  }
}

All pretty straightforward, and it wouldn’t look out of place in object-oriented code. We have a constructor, some getters and a toString method. We don’t have an isMale() getter, but we’ll see why in a second. There’s a lot of boiler-plate though, so if boiler-plate makes your blood boil [I’m allowed a quip] then you might want to take a look at Project Lombok.

We’re also going to have to register a couple somehow, and let’s assume this club frowns on polygamy. We will use a static method to do it. Why static? A static method can set the partner field of both members in one place. If it was an instance method we’d have to rely on our class’s user to call it on both members, which they might forget to do. What if we made an instance register method that calls the passed member’s register method inside it? In that case we would have to decide how to stop an infinite loop ping-ponging between the two instances, since that second call is going to call the first one again.

So here is the static method:

  public static void registerPartners(ClubMember cm1,
                                      ClubMember cm2)
  {
    cm1.partner = cm2;
    cm2.partner = cm1;
  }

I also want to add two static fields. Instead of a getter for isMale and a test for a partner being a member, I want to use Predicates instead so I can do some functional things with them:

public static final Predicate<ClubMember> isMale =
                                               m -> m.male;

public static final Predicate<ClubMember> isPartnerMember =
                                    m -> m.partner != null;

We’re done with the ClubMember class now.
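As a taste of the ‘functional things’ we can do with predicates, here’s a sketch that builds new predicates out of existing ones with negate() and and(). To keep it self-contained it uses a cut-down, hypothetical Member class with a boolean hasPartner flag rather than the real ClubMember.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class PredicateCompositionSketch
{
  // Cut-down, hypothetical stand-in for ClubMember
  static class Member
  {
    final String name;
    final boolean male;
    final boolean hasPartner;

    Member(String name, boolean male, boolean hasPartner)
    {
      this.name = name;
      this.male = male;
      this.hasPartner = hasPartner;
    }
  }

  static final Predicate<Member> isMale = m -> m.male;
  static final Predicate<Member> isPartnerMember = m -> m.hasPartner;

  // New predicates built from the two above, no new logic needed
  static final Predicate<Member> isFemale = isMale.negate();
  static final Predicate<Member> isSingleMale =
                          isMale.and(isPartnerMember.negate());

  static List<String> namesMatching(List<Member> members,
                                    Predicate<Member> p)
  {
    return members.stream().filter(p)
                           .map(m -> m.name)
                           .collect(Collectors.toList());
  }

  public static void main(String[] args)
  {
    List<Member> members = Arrays.asList(
        new Member("Dave", true, false),
        new Member("Penny", false, false),
        new Member("Mr. Smith", true, true),
        new Member("Mrs. Smith", false, true));

    System.out.println(namesMatching(members, isFemale));
    System.out.println(namesMatching(members, isSingleMale));
  }
}
```

A plain isMale() getter could be used as a method reference too, but having the predicate as a value makes this sort of composition a one-liner.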

Having a Tuple2 type to manage both age ranges for our memberships and couples would be useful. Unfortunately standard Java doesn’t have Tuple2 just yet, so we’ll make our own (using a library that already provides a pair or tuple type, such as Pair in Apache Commons Lang, would be better to avoid having to reinvent the wheel):

public class Tuple2<T1, T2>
{
  public T1 t1;
  public T2 t2;

  public Tuple2(T1 first, T2 second)
  {
    t1 = first;
    t2 = second;
  }

  @Override
  public String toString()
  {
    return "(" + t1 + ", " + t2 + " " + ")";
  }
}

A couple Tuple2 holds two ClubMembers, and an age range Tuple2 holds two Integers. As we would probably reuse the two-Integer Tuple2 elsewhere, we’ll define an IntegerRange specially:

public class IntegerRange extends Tuple2<Integer, Integer>
{
  public IntegerRange(Integer start, Integer end)
  {
    super(start, end);
  }

  public Integer getStart()
  {
    return t1;
  }

  public Integer getEnd()
  {
    return t2;
  }

  public final Predicate<Integer> inRange = 
                                i -> i >= t1 && i < t2;
}

This also allows us to define another predicate to determine whether a value is in the range. We’ll take the end-point as non-inclusive.
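The reason for leaving the end-point out is that adjacent ranges then fit together with no overlap and no gap. Here’s a minimal sketch of that, using a made-up inRange factory method instead of the IntegerRange class so that it stands on its own.

```java
import java.util.function.Predicate;

class HalfOpenRangeSketch
{
  // Builds a predicate for the range start (inclusive) to end (exclusive)
  static Predicate<Integer> inRange(int start, int end)
  {
    return i -> i >= start && i < end;
  }

  static final Predicate<Integer> junior = inRange(0, 18);
  static final Predicate<Integer> adult = inRange(18, 60);
  static final Predicate<Integer> senior =
                              inRange(60, Integer.MAX_VALUE);

  public static void main(String[] args)
  {
    // 18 is the end of junior and the start of adult,
    // so it belongs to adult only
    System.out.println(junior.test(18)); // false
    System.out.println(adult.test(18));  // true

    // Likewise 60 belongs to senior only
    System.out.println(adult.test(60));  // false
    System.out.println(senior.test(60)); // true
  }
}
```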

Now on to our main class:

public class ClubMembers
{
  private List<ClubMember> members;

  private static final IntegerRange juniors =
                              new IntegerRange(0, 18);
  private static final IntegerRange adult =
                              new IntegerRange(18, 60);
  private static final IntegerRange seniors = 
               new IntegerRange(60, Integer.MAX_VALUE);

  private static final String juniorMembership =
                                   "Junior Membership";
  private static final String adultMembership =
                                    "Adult Membership";
  private static final String seniorMembership =
                                   "Senior Membership";

  public ClubMembers(List<ClubMember> members)
  {
    this.members = members;
  }
}

Let’s start off with a field to store the membership list and a constructor to initialise it. We also will define our age ranges for our membership types and some strings to represent them. We’ll add all the rest of the code into this class.

So the first thing we want to do is get all the members’ names. Let’s join them together with a comma. Does that sound like a job for Collectors.joining? You bet!

public String getAllMembers()
{
  return members.stream().map(m -> m.getName())
                         .collect(Collectors.joining(", "));
}

That was simple. We take a member, map it to just its name, and then collect with joining. The cool thing here is we don’t need to worry whether to add a comma or not as it’s taken care of for us. No more defining mutable state variables such as ‘first’ to avoid putting a comma before the first name.

Now on to average age. Again simple since we’ve seen this before:

public OptionalDouble getAverageAge()
{
  return members.stream().mapToInt(ClubMember::getAge)
                         .average();
}
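It’s worth knowing how the two ways of averaging behave when there are no members at all. In this sketch (all names are invented for illustration) IntStream.average() gives back an empty OptionalDouble, whereas the Collectors.averagingInt collector we saw earlier gives 0.0. The roundedDown helper shows one way of getting the whole number the task asks for.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.OptionalDouble;
import java.util.stream.Collectors;

class AverageSketch
{
  static OptionalDouble viaIntStream(List<Integer> ages)
  {
    return ages.stream().mapToInt(Integer::intValue).average();
  }

  static double viaCollector(List<Integer> ages)
  {
    return ages.stream()
               .collect(Collectors.averagingInt(Integer::intValue));
  }

  // The cast truncates towards zero, which for non-negative
  // ages is the same as rounding down
  static int roundedDown(OptionalDouble average)
  {
    return (int) average.orElse(0);
  }

  public static void main(String[] args)
  {
    List<Integer> none = Collections.<Integer>emptyList();
    List<Integer> ages =
        Arrays.asList(13, 9, 21, 28, 36, 45, 59, 60, 68);

    System.out.println(viaIntStream(none).isPresent()); // false
    System.out.println(viaCollector(none));             // 0.0
    System.out.println(roundedDown(viaIntStream(ages))); // 37
  }
}
```

Returning the OptionalDouble leaves the caller to decide what an average of no members should mean, rather than silently reporting 0.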

Next we need to find all male and female members. Hmmmm two groups, this sounds like a job for Collectors.partitioningBy.

Note: One thing we should do when programming in a functional style is try to reuse as much as possible. Instead of the class being a template that we specialise, we use a function [often as a parameter] to give it special behaviour.

Let’s partition members by an arbitrary predicate:

private Map<Boolean, List<ClubMember>> partitionMembers(
                                   Predicate<ClubMember> p)
{
  return members.stream().collect(
                            Collectors.partitioningBy(p));
}

Now we can specialise this function by passing the appropriate predicate, conveniently defined in ClubMember:

public Map<Boolean, List<ClubMember>> getMembersByGender()
{
  return partitionMembers(ClubMember.isMale);
}

So now we have two groups, the false group are female, the true group are male.

What about our memberships? This time we have three groups, and thus Collectors.partitioningBy isn’t a good fit, so let’s use Collectors.groupingBy. Again let’s use the same trick: a generic classification function we pass a classifier to. This classifier is just a function taking a ClubMember and classifying into groups of an arbitrary type:

private <T> Map<T, List<ClubMember>>
         classifyMembers(Function<ClubMember, T> classifier)
{
  return members.stream().collect(
                         Collectors.groupingBy(classifier));
}

We need that classification function to take a ClubMember and return details of their membership type as a String:

private static final Function<ClubMember, String>
                  resolveMembershipType =
    m -> juniors.inRange.test(m.getAge()) ? juniorMembership :
         adult.inRange.test(m.getAge()) ? adultMembership :
                                          seniorMembership;

We defined the strings, we defined the age ranges, we defined an inRange test. It was just a case of putting them together. Now we just specialise classifyMembers:

private Map<String, List<ClubMember>> classifyMemberships()
{
  return classifyMembers(resolveMembershipType);
}

See how easy this can be?

Last come the couple members. What does it mean to be a couple member?

Well, we expect the isPartnerMember predicate defined in ClubMember to return true. That’s a filter we need in order to check this.

Now if we go through all our members with partners we will return each couple twice: once as (personA, personB) and once as (personB, personA). That might be what we want in some cases, but here it’s not. Let’s assume that the two partners have different names (to be robust we should probably have member ids for this sort of thing). We need to arbitrarily choose one of the partners as the first, so let’s keep a member only if their name, when compared to their partner’s name, returns < 0. String has a compareTo method defined for us so we can do that. This will need another filter.

We also want to return the couple, not just one of the members, so let’s use Tuple2 to hold that. This will need a map.

Finally we want a list of couples, so Collectors.toList() will do that job nicely:

public List<Tuple2<ClubMember, ClubMember>> getCouples()
{
  return members.stream()
             .filter(ClubMember.isPartnerMember)
             .filter(m -> m.getName().
                     compareTo(m.getPartner().getName()) < 0)
             .map(m -> new Tuple2<>(m, m.getPartner()))
             .collect(Collectors.toList());
}

I hope this is making you smile – a little bit of thought about the problem and we can easily solve it by putting blocks together.

Let’s write a driver main method in ClubMembers. To make things a little easier to verify we’ll call the couples by their titles and everyone else by their first name:

public static void main(String args[])
{
  ClubMember cm1 = new ClubMember("Johnny", true, 13);
  ClubMember cm2 = new ClubMember("Jenny", false, 9);
  ClubMember cm3 = new ClubMember("Dave", true, 21);
  ClubMember cm4 = new ClubMember("Penny", false, 28);
  ClubMember cm5 = new ClubMember("Mrs. Smith", false, 36);
  ClubMember cm6 = new ClubMember("Mr. Smith", true, 45);
  ClubMember cm7 = new ClubMember("Mr. Watts", true, 59);
  ClubMember cm8 = new ClubMember("Mrs. Watts", false, 60);
  ClubMember cm9 = new ClubMember("Bill", true, 68);

  ClubMember.registerPartners(cm5, cm6);
  ClubMember.registerPartners(cm7, cm8);

  ClubMember[] membersArray = new ClubMember[]
           { cm1, cm2, cm3, cm4, cm5, cm6, cm7, cm8, cm9 };

  ClubMembers members = new ClubMembers(
                              Arrays.asList(membersArray));

  System.out.println("Members: " + members.getAllMembers());
  // The cast to int rounds the average down
  System.out.println("Average age: " +
         (int) members.getAverageAge().orElse(0));
  System.out.println("Membership by gender (true is male): " +
                        members.getMembersByGender());
  System.out.println("Memberships: " +
                        members.classifyMemberships());
  System.out.println("Couples: " + members.getCouples());
}

Let’s fire it up:

Members: Johnny, Jenny, Dave, Penny, Mrs. Smith, Mr. Smith, Mr. Watts, Mrs. Watts, Bill
Average age: 37
Membership by gender (true is male): {false=[Jenny, Penny, Mrs. Smith, Mrs. Watts], true=[Johnny, Dave, Mr. Smith, Mr. Watts, Bill]}
Memberships: {Senior Membership=[Mrs. Watts, Bill], Junior Membership=[Johnny, Jenny], Adult Membership=[Dave, Penny, Mrs. Smith, Mr. Smith, Mr. Watts]}
Couples: [(Mr. Smith, Mrs. Smith ), (Mr. Watts, Mrs. Watts )]

What you might notice when writing code like this on your own is that once you’ve dealt with all the compile errors, it tends to work first time. This is very different from doing it in a procedural style using iterators, where you often end up with off-by-one errors, null pointers and a host of other problems. By removing the opportunity to make those mistakes we can write code quicker and be more productive. Java 8 Streams and the supplied collectors make life very easy for us.