65.9K
CodeProject is changing. Read more.
Home

Iteration Over Java Collections with High Performance

starIconstarIcon
emptyStarIcon
starIcon
emptyStarIconemptyStarIcon

2.08/5 (4 votes)

Jun 4, 2017

CPOL

2 min read

viewsIcon

33883

Travelling over Java collections is just a piece of cake, but when the size of the collections increases you have to choose wisely

Introduction

Java developers usually deal with Collections such as ArrayList, HashSet, Java 8 come with lambda and streaming API helping us to work easily with Collections. In most cases, we work with a few thousands of items and performance isn't a concern. But in strict code, when we have to travel over millions of items several times, performance will become a pain.

forEach vs C style vs Stream API

Iteration is a basic feature so that all programming languages have simple syntax to allow programmers to run through collections. Stream API can iterate over Collections in a very straightforward manner.

	public List<Integer> streamSingleThread(BenchMarkState state){
		List<Integer> result = new ArrayList<>(state.testData.size());
		state.testData.stream().forEach(item -> {
			result.add(item);
		});
		return result;
	}
	public List<Integer> streamMultiThread(BenchMarkState state){
		List<Integer> result = new ArrayList<>(state.testData.size());
		state.testData.stream().parallel().forEach(item -> {
			result.add(item);
		});
		return result;
	}

forEach loop is also simple:

	public List<Integer> forEach(BenchMarkState state){
		List<Integer> result = new ArrayList<>(state.testData.size());
		for(Integer item : state.testData){
			result.add(item);
		}
		return result;
	}

C style is more verbose, but still very compact:

	public List<Integer> forCStyle(BenchMarkState state){
		int size = state.testData.size();
		List<Integer> result = new ArrayList<>(size);
		for(int j = 0; j < size; j ++){
			result.add(state.testData.get(j));
		}
		return result;
	}

Then the performance:

	Benchmark                               Mode  Cnt   Score   Error  Units
	TestLoopPerformance.forCStyle           avgt  200  18.068 ± 0.074  ms/op
	TestLoopPerformance.forEach             avgt  200  30.566 ± 0.165  ms/op
	TestLoopPerformance.streamMultiThread   avgt  200  79.433 ± 0.747  ms/op
	TestLoopPerformance.streamSingleThread  avgt  200  37.779 ± 0.485  ms/op

With C style, JVM just simply increases an integer, then reads value directly from memory, so that it is very fast. But forEach is very different. According to  answer on StackOverFlow and document from Oracle, JVM has to convert forEach to Iterator and calls hasNext() with every item, that's why forEach is much slower than C style.

Which is a High Performance Way to Travelling Over Set?

With data:

    @State(Scope.Benchmark)
    public static class BenchMarkState {
        @Setup(Level.Trial)
        public void doSetup() {
            for(int i = 0; i < 500000; i++){
                testData.add(Integer.valueOf(i));
            }
        }
        @TearDown(Level.Trial)
        public void doTearDown() {
            testData = new HashSet<>(500000);
        }
        public Set<Integer> testData = new HashSet<>();
    }

Java Set also supports Stream API and forEach loop, according to the previous test, should we wrap Set to ArrayList, then travel over ArrayList?

    public List<Integer> forCStyle(BenchMarkState state){
        int size = state.testData.size();
        List<Integer> result = new ArrayList<>(size);        
        Integer[] temp = (Integer[]) state.testData.toArray(new Integer[size]);
        for(int j = 0; j < size; j ++){
            result.add(temp[j]);
        }
        return result;
    }

How about combine iterator with C style for loop?

    public List<Integer> forCStyleWithIteration(BenchMarkState state){
        int size = state.testData.size();
        List<Integer> result = new ArrayList<>(size);
        Iterator<Integer> iteration = state.testData.iterator();
        for(int j = 0; j < size; j ++){
            result.add(iteration.next());
        }
        return result;
    }

Or just simple travel?

    public List<Integer> forEach(BenchMarkState state){
        List<Integer> result = new ArrayList<>(state.testData.size());
        for(Integer item : state.testData) {
            result.add(item);
        }
        return result;
    }

Nice idea, but it doesn't work because initializzing new ArrayList also consumes resources.

    Benchmark                                   Mode  Cnt  Score   Error  Units
    TestLoopPerformance.forCStyle               avgt  200  6.013 ± 0.108  ms/op
    TestLoopPerformance.forCStyleWithIteration  avgt  200  4.281 ± 0.049  ms/op
    TestLoopPerformance.forEach                 avgt  200  4.498 ± 0.026  ms/op

HashMap (HashSet uses HashMap<E,Object>) isn't designed for iterating all items, the fastest way to iterate over HashMap is a combination of Iterator and C style for loop, because JVM doesn't have to call hasNext().

Conclusion

Foreach and Stream API are convenient to work with Collections, you can write code faster. However when your system is stable and performance is a major concern, you should think about rewriting your loop.