Generators with Java 8

Today we’ll look at creating generators. In simple terms, a generator is a function which returns the next value in a sequence. Unlike an iterator over a collection, it computes the next value when needed, rather than returning the next item of a pre-generated collection. Some languages, such as Python, support generators natively via keywords such as yield. When a generator’s next value is requested in Python, the generator function runs until the next yield statement, where a value is returned. The generator function is then able to continue where it left off, which can be quite confusing for the uninitiated. So how do we do something similar in Java?

We saw in the last article that we can use an IntStream to generate a simple set of numbers, but we had to generate them all up front. That’s fine if we know how many we’re going to need. What if we don’t, and we want to be able to get the next one whenever we like? This is where a generator comes in.

Let’s choose a simple infinite sequence, the square numbers. In a standard Java implementation we’d end up with something like the following:

public class Squares
{
        private int i = 1;

        public int next()
        {
                int thisOne = i++;
                return thisOne * thisOne;
        }

        public static void main(String args[])
        {
                Squares squareGenerator = new Squares();

                System.out.println(squareGenerator.next());
                System.out.println(squareGenerator.next());
                System.out.println(squareGenerator.next());
        }
}

This prints the first three square numbers. Note we could have gone further and implemented this as an iterator.
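As a rough sketch of that iterator idea, the same counter can sit behind java.util.Iterator. The class name SquaresIterator is made up for this illustration, and hasNext() always returns true on the assumption that we treat the sequence as infinite (ignoring int overflow):

```java
import java.util.Iterator;

// Hypothetical name: SquaresIterator is not part of the original example.
public class SquaresIterator implements Iterator<Integer>
{
        private int i = 1;

        @Override
        public boolean hasNext()
        {
                // Assumption: the sequence never ends (int overflow aside).
                return true;
        }

        @Override
        public Integer next()
        {
                int thisOne = i++;
                return thisOne * thisOne;
        }

        public static void main(String[] args)
        {
                Iterator<Integer> squares = new SquaresIterator();

                // Pull three values on demand, just like Squares.next().
                for (int count = 0; count < 3; count++)
                {
                        System.out.println(squares.next());
                }
        }
}
```

The benefit of going this far is that anything which already accepts an Iterator can consume the sequence without knowing it is generated on the fly.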

What we have here is an example of lazy evaluation in a non-functional style. Wikipedia defines lazy evaluation as: ‘In programming language theory, lazy evaluation, or call-by-need is an evaluation strategy which delays the evaluation of an expression until its value is needed’. Lazy evaluation is useful because we don’t need to worry about infinite sequences, about performing computationally expensive operations up front, or about storage.

Let’s expand on the example to allow getting a batch of results. This is easy: create a nextN function which calls next() a number of times and returns the results in, say, a List:

import java.util.ArrayList;
import java.util.List;

public class Squares2
{
        private int i = 1;

        public int next()
        {
                int thisOne = i++;
                return thisOne * thisOne;
        }

        public List<Integer> nextN(int n)
        {
                List<Integer> l = new ArrayList<>();

                // 'count' rather than 'i', to avoid shadowing the field i used by next()
                for (int count = 0; count < n; count++)
                {
                        l.add(next());
                }

                return l;
        }

        public static void main(String args[])
        {
                Squares2 squareGenerator = new Squares2();

                squareGenerator.nextN(10).forEach(System.out::println);
        }
}

A few points:

  • Notice in the nextN function there is the empty diamond in the new ArrayList statement. This was added in Java 7 to save having to state the type both on the left and the right hand side; the compiler now works it out.
  • List is an Iterable, and Iterable now has a forEach() method which was added in Java 8. We could use stream() as before to create a stream, but if all we want to do is pass the contents to a function forEach() does nicely.

Now, to save having to write nextN for every sequence we make, we could create a new type which extends Iterator and provides the nextN function.
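One way this might look is sketched below, using a Java 8 default method so that each sequence only has to supply hasNext() and next(). The names BatchIterator and BatchSquares are invented for this illustration:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical interface: BatchIterator is an assumed name, not a JDK type.
interface BatchIterator<T> extends Iterator<T>
{
        // Default method: collects up to n further items into a list.
        default List<T> nextN(int n)
        {
                List<T> batch = new ArrayList<>();

                for (int count = 0; count < n && hasNext(); count++)
                {
                        batch.add(next());
                }

                return batch;
        }
}

// Hypothetical class: the squares sequence expressed as a BatchIterator.
public class BatchSquares implements BatchIterator<Integer>
{
        private int i = 1;

        @Override
        public boolean hasNext()
        {
                return true; // assumption: treated as an infinite sequence
        }

        @Override
        public Integer next()
        {
                int thisOne = i++;
                return thisOne * thisOne;
        }

        public static void main(String[] args)
        {
                BatchSquares squareGenerator = new BatchSquares();

                squareGenerator.nextN(10).forEach(System.out::println);
        }
}
```

Any other sequence can now reuse nextN simply by implementing BatchIterator.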

The only problem we face here is that we have to save the batch in a list before we can operate on it. Java 8 provides another way. Let’s go back and start again with the following code:

import java.util.stream.IntStream;

public class Squares3
{
        public static void main(String args[])
        {
                IntStream.rangeClosed(1, 10).map(i -> i * i)
                         .forEach(System.out::println);
        }
}

This uses IntStream to get the indexes of the sequence in a stream and calls map to convert them into their squares. The problem is that to get more squares than the tenth we need to duplicate the pipeline and start it off from the right place. Let’s look at another way without using a range:


        public static void main(String args[])
        {
                IntStream myStream = IntStream.iterate(1, i -> i + 1);

                myStream.limit(10).map(i -> i * i)
                                  .forEach(System.out::println);
        }

This also generates the first 10 square numbers. This time it uses the iterate function, which takes two parameters: the first is our initial value, and the second is a function defining how to get to the next value from the previous one. It’s a good place to use a lambda function. We can even dispense with the map function, since we can easily undo the squaring inside iterate to recover the last index:

        public static void main(String args[])
        {
                IntStream myStream = IntStream.iterate(1,
                        i -> ((int) Math.pow(Math.sqrt(i) + 1, 2)));

                myStream.limit(10).forEach(System.out::println);
        }

This solves the problem of having to buffer results beforehand. However, we need to use the limit operator on the stream to limit it to 10 items, otherwise it would keep on going. Unfortunately this is a problem, since once we’ve got the 10 the stream is ‘operated on’ and we can’t use it again to generate more. If we try, we get an IllegalStateException. We’d have to create another stream to get more.
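To see the failure for ourselves, here is a small sketch that tries to use the same stream twice. The class and method names (StreamReuse, secondUseFails) are made up for the demonstration:

```java
import java.util.stream.IntStream;

// Hypothetical demo class showing that a stream cannot be operated on twice.
public class StreamReuse
{
        // Returns true if the second attempt to use the stream is rejected.
        public static boolean secondUseFails()
        {
                IntStream myStream = IntStream.iterate(1, i -> i + 1);

                // First use: fine.
                myStream.limit(3).forEach(System.out::println);

                try
                {
                        // Second use of the same stream object.
                        myStream.limit(3).forEach(System.out::println);
                        return false;
                }
                catch (IllegalStateException e)
                {
                        return true;
                }
        }

        public static void main(String[] args)
        {
                System.out.println("Second use rejected: " + secondUseFails());
        }
}
```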

So how do we get around the problem of the stream being used up? Instead of using IntStream’s iterate function, we can use generate instead. IntStream’s generate function takes an instance of an IntSupplier. IntSupplier has a getAsInt() function which returns the next int in the sequence which is very much like our next() function. Here is an example that prints the first 20 square numbers in two batches:

import java.util.function.IntSupplier;
import java.util.stream.IntStream;

public class SquaresGenerator
{
        private static class SqSupplier implements IntSupplier
        {
                int i = 0;

                @Override
                public int getAsInt()
                {
                        i++;
                        return i * i;
                }
        }

        public static void main(String args[])
        {
                SqSupplier sqSupplier = new SqSupplier();
                IntStream myStream = IntStream.generate(sqSupplier);
                IntStream myStream2 = IntStream.generate(sqSupplier);

                myStream.limit(10).forEach(System.out::println);
                myStream2.limit(10).forEach(System.out::println);
        }
}

Again we’re using limit to stop the stream continuing indefinitely. However, unlike last time, although the stream is used up, the generator still survives and can be used again. No buffering is needed either; we just keep hold of the supplier. The only downside versus the old Java way is that we have to use Streams to get sequence members, although this comes with other benefits such as parallelism, which we’ll see in a later article.

Overall, there are several ways to generate a sequence, and which we choose may depend on our needs. Using an IntSupplier is a good way to integrate with the rest of the Java 8 functional programming support.

An Introduction to Functional Programming with Java 8

Java 8 is perhaps one of the most exciting editions of the Java language in recent times. One of the headline features is support for functional programming which is the focus of this blog. The support comes mostly in three features:

  • Support for streams (work pipelines). Streams allow us to process data through a number of stages in a pipeline in a functional way. We can create such streams from container instances such as List, or create them using specific stream classes such as IntStream.
  • Support for lambda functions. In simple terms a lambda is an anonymous function (we don’t need to give it a name) which takes a number of parameters and may return a value. A returned value will be passed down the pipeline to the next operation.
  • Support for passing functions into other functions.

Given that functional programming has been around since the 50s, and was until recently mostly disregarded by the mainstream, why has it become such a hot topic? My opinion is that it’s because of its ability to easily process work in parallel, taking advantage of multi-core processors; its lazy (on-demand) evaluation; and the ease with which functional languages integrate with other languages such as Java. Certainly the JVM has provided a good base for Scala, which can even be embedded in a Java program, giving the best of both worlds plus an easier migration path for developers. Scala, however, has quite a steep learning curve, and there are already many existing Java programmers out there, so eventually there was pressure for Java itself to catch up, so that developers could do some cool and useful things without having to learn a whole new language.

Enough words, let’s get down to business and consider a hello world example:

import java.util.Arrays;
import java.util.List;

public class HelloWorld
{
	public static void main(String[] args)
	{
		List<String> countries = Arrays.asList("France", "India", 
			"China", "USA", "Germany");

		for (String country : countries)
		{
			System.out.println("Hello " + country + "!");
		}
	}
}

This should be familiar to every Java programmer; we take a list of countries and print a greeting using an enhanced for-loop.

So, is there anything wrong with just doing it like this? For a simple example, not a lot, but let’s imagine we were doing something more complicated.

We want to write some tests to check the code works, but testing the body of the loop might be tricky. We could extract the body into its own function of course, but if we did that everywhere we’d end up with lots of little functions for the bodies of loops.

What if we want to process items in parallel? We’d need to hand over a country to a thread that implements the greeting function. Should one thread process one country, or a batch of them? How are we going to pass the work in batches and check when it’s done? We also have to handle communication between the threads.

Another problem is that the loop body’s code is disjoint from the source of the loop variable. To see how a country gets created we must find the for statement. We’d prefer to tell the list of countries to do the message printing. The problem is that lists don’t understand being told to do things, they are just containers.

However, since Java 8, containers can pass their contents to an entity which does handle instructions – a Stream. I’ve changed the example to use a stream:

	public static void main(String[] args)
	{
		List<String> countries = Arrays.asList("France", "India", "China",
				"USA", "Germany");

		countries.stream().forEach(
				(String country) -> System.out
						.println("Hello " + country + "!"));
	}

The list of countries now creates a Stream and passes it a ‘spliterator’ from the container. This is an iterator which is capable of splitting the contents into batches of work. We’ll take a look at this batching (which then can be parallelised) in a future post.

Stream itself is an interface which is implemented by the abstract class ReferencePipeline. Jobs can then be chained to the returned Stream. Each job does some work, and perhaps calls a downstream job when it’s done. In the example the list creates a Stream and a forEach job is added to its pipeline. Like our loop in the first example, the forEach job does some work on each item. The funny -> indicates a lambda expression (an anonymous function), here taking a String named country and printing it in a message.

Since in this case the compiler can work out that country is a String, we can dispense with the type, and as there is just one parameter we can also dispense with the () around the left-hand side, leaving the more concise:

countries.stream()
   .forEach(country -> System.out.println("Hello " + country + "!"));

Note it’s a lot clearer that we are doing work on countries rather than just referencing a loop variable which happened to be a country. It’s also easier to read, as there is less boilerplate obscuring the actual business logic. Code that is clearer and easier to read is easier to understand, and it’s easier to spot mistakes in it. We also don’t have to worry about making mistakes in the boilerplate, which cuts down on our tests. Overall, coding is more fun and the development time is shortened.

Let’s experiment with this example a bit to illustrate what is going on under the hood. The lambda that forEach is taking is actually an implementation of Consumer<T>.

* Reminder for those a bit rusty or coming to generics for the first time that this means the Consumer type uses an unknown type T which is decided at compile time.

We can explicitly show the use of Consumer by creating a static nested class:

private static class Greeter<T> implements Consumer<T>
{
	@Override
	public void accept(T t)
	{
		System.out.println("Hello " + t + "!");
	}
}

and changing the pipeline:

countries.stream().forEach(new Greeter<String>());

Consumer has an accept method which is called every time by forEach with each item in the pipeline.
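To make the connection between the lambda and Consumer concrete, here is a small sketch in which the lambda is first stored in a Consumer<String> variable and then handed to forEach. The class name ConsumerDemo and the greetAll helper are invented for this example, and the greetings are gathered in a StringBuilder rather than printed so the result is easy to inspect:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical demo class: a lambda is just an instance of Consumer.
public class ConsumerDemo
{
        // Builds all the greetings by passing a Consumer to forEach.
        public static String greetAll(List<String> countries)
        {
                StringBuilder out = new StringBuilder();

                // The lambda is assigned to a Consumer<String> variable;
                // forEach will call its accept method once per item.
                Consumer<String> greeter =
                        country -> out.append("Hello ").append(country).append("!\n");

                countries.stream().forEach(greeter);

                return out.toString();
        }

        public static void main(String[] args)
        {
                System.out.print(greetAll(Arrays.asList("France", "India")));
        }
}
```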

Let’s separate the forming of the message and the printing into two jobs. To form the greeting message, we can use the map function. This applies a lambda function to each input, transforming it. The printing job is still carried out by a forEach:

public static void main(String[] args)
{
	List<String> countries = Arrays.asList("France", "India",
			"China", "USA", "Germany");

	countries.stream().map(country -> "Hello " + country + "!")
			.forEach(System.out::println);
}

Note the strange syntax System.out::println. We’re actually passing a function into forEach: the println method of the System.out instance. The :: syntax simply means pass the function on the right, to be called on the object on the left. The left-hand side can be a class name for a static call, an object, or an alias (this or super) to an object. Note we cannot pass only the name of the function, since this would be taken to mean a variable whose type is-a Consumer.
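The sketch below gathers the common forms of method reference in one place. Besides the forms described above, Java also allows a class name on the left with an instance method on the right (the argument becomes the object the method is called on) and ClassName::new for constructors. The class MethodRefForms and its greet and demo helpers are invented for this illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical demo class showing the different kinds of method reference.
public class MethodRefForms
{
        // A static helper, invented for this example.
        public static String greet(String country)
        {
                return "Hello " + country + "!";
        }

        public static List<String> demo()
        {
                // Constructor reference: ClassName::new
                Supplier<List<String>> newList = ArrayList::new;
                List<String> results = newList.get();

                // Object on the left: the call is bound to this particular list.
                Consumer<String> collector = results::add;

                // Class name on the left, static method on the right.
                Function<String, String> greeter = MethodRefForms::greet;

                // Class name on the left, instance method on the right:
                // the argument itself is the object the method is called on.
                Function<String, String> upper = String::toUpperCase;

                collector.accept(greeter.apply("France"));
                collector.accept(upper.apply("India"));

                return results;
        }

        public static void main(String[] args)
        {
                demo().forEach(System.out::println);
        }
}
```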

Instead of a static main method, if we ran this inside an instance of an object which also had a doPrint method (taking a String and returning void), we could write:

       countries.stream().map(country -> "Hello " + country + "!")
                        .forEach(this::doPrint);

Let’s look at how map works under the hood. Map takes an instance of a class which is-a Function and calls its apply method. Apply takes a type T and returns a type R. We’ll create a Greeter nested class to demonstrate this:

private static class Greeter<T> implements Function<T, String>
{
	@Override
	public String apply(T t)
	{
		return "Hello " + t + "!";
	}
}

...

countries.stream().map(new Greeter<String>())
                  .forEach(System.out::println);

When we used a lambda function here, it was just taken to be an anonymous instance of Function.
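We can check this by assigning the lambda to a Function variable and calling apply ourselves. This is only a sketch; the class name FunctionDemo and the GREETER constant are made up for the example:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical demo class: the lambda passed to map is an instance of Function.
public class FunctionDemo
{
        // The same greeting lambda, held in a Function<String, String>.
        public static final Function<String, String> GREETER =
                country -> "Hello " + country + "!";

        public static void main(String[] args)
        {
                List<String> countries = Arrays.asList("France", "India");

                // Calling apply directly is what map does for each item.
                System.out.println(GREETER.apply("China"));

                // Passing the variable to map behaves just like the inline lambda.
                countries.stream().map(GREETER).forEach(System.out::println);
        }
}
```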

To finish, let’s run the pipeline inside an object that has both a doPrint method and a greet method (taking a String and returning a String), and see what it looks like:

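import java.util.Arrays;
import java.util.List;

public class HelloWorldConcise
{
	// protected rather than private, so that a test subclass can override it
	protected void doPrint(String str)
	{
		System.out.println(str);
	}

	// protected for the same reason
	protected String greet(String country)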
	{
		return "Hello " + country + "!";
	}
		
	public void greetCountries()
	{
		List<String> countries = Arrays.asList("France", "India", 
			"China", "USA", "Germany");

		countries.stream().map(this::greet).forEach(this::doPrint);
	}

	public static void main(String[] args)
	{
		new HelloWorldConcise().greetCountries();
	}
}

The greetCountries function is very concise. It’s clear that we take each country, make a greeting from it and then print it. Another great thing is that it’s really easy to test. If we extend the HelloWorldConcise class to make a HelloWorldConciseTest, we can implement our own versions of greet and doPrint, provided those methods are visible to the subclass (protected rather than private). Our own greet can call its parent version and then check that the greeting is being formed correctly, and the doPrint method can be a stub.

* Okay, printing to stdout during a test is not a big deal, but imagine if doPrint sent the greeting to a web page, we wouldn’t want to have to set all that up in order to test it.
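Here is a rough sketch of what such a test subclass could look like. To keep it self-contained it includes its own copy of HelloWorldConcise, which is assumed to declare greet and doPrint as protected so they can be overridden. The recording lists and the checks in main are invented for the illustration, and a real project would more likely use a test framework:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Assumed variant of the class under test, with protected (overridable) methods.
class HelloWorldConcise
{
        protected void doPrint(String str)
        {
                System.out.println(str);
        }

        protected String greet(String country)
        {
                return "Hello " + country + "!";
        }

        public void greetCountries()
        {
                List<String> countries = Arrays.asList("France", "India",
                        "China", "USA", "Germany");

                countries.stream().map(this::greet).forEach(this::doPrint);
        }
}

// Hypothetical test subclass: records greetings and stubs out the printing.
public class HelloWorldConciseTest extends HelloWorldConcise
{
        final List<String> greetings = new ArrayList<>();
        final List<String> printed = new ArrayList<>();

        @Override
        protected String greet(String country)
        {
                // Call the parent version, then record what it produced.
                String greeting = super.greet(country);
                greetings.add(greeting);
                return greeting;
        }

        @Override
        protected void doPrint(String str)
        {
                // Stub: record the message instead of sending it anywhere.
                printed.add(str);
        }

        public static void main(String[] args)
        {
                HelloWorldConciseTest test = new HelloWorldConciseTest();
                test.greetCountries();

                if (!test.greetings.get(0).equals("Hello France!"))
                {
                        throw new AssertionError("Unexpected greeting: " + test.greetings.get(0));
                }

                System.out.println("Checked " + test.printed.size() + " greetings");
        }
}
```

Because this::greet and this::doPrint are resolved on the actual object, the pipeline in greetCountries picks up the overridden versions without any change to the class under test.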

So that’s our first taste of Java 8 functional programming. I hope you found it useful. In the next article I will look at some other operations which can be included in the pipeline.