Tuesday, March 25, 2008

The concurrency support in Java7 is not enough

I am very interested in how today's programming languages will tackle the many-core hardware trend, which is only going to grow. It's not hard to foresee processors with 1000 cores or more within a couple of years. My personal feeling is that we need new languages to support this new hardware, but of course the current big ones (Java, C#, VB.Net) will try to bolt on features to handle many cores.

In the Java programming language it is easy to see the evolution of the language:

  • 1995: Support for threads built into the language (Thread, synchronized, volatile)
  • Java5: Included java.util.concurrent (which adds a lot of nice-to-have classes for thread programming)
  • Java7: Talk about introducing frameworks supporting further multi-core usage

My main concern for Java7 is that it might not be enough to cope with the new playing field that many-core hardware brings. What do I base this fear on?

Well, first and foremost, I think that new programming languages must have many-core support built into the language itself rather than having these important features added as APIs later on.

Another issue is that the new features in Java7 (e.g. the fork-join framework) aim to solve problems that are a minute part of the overall problem. Let me give you an example taken from the article "Java theory and practice: Stick a fork in it, Part 1". This well-written article introduces the fork-join framework, and it is a very handy tool when used for the right problem. And this is the crux: "when used for the right problem". In this case that is sorting. But let's not get into sorting with fork-join; let's rather address the real problem: I/O.

In the article mentioned above the author (Brian Goetz) writes the following: "Server applications normally run many more threads than there are processors available. The reason is because in most server applications, the processing of a request includes a fair amount of I/O which does not require very much of the processor" ... "As processor counts increase there may not be enough concurrent requests to keep the processors busy."
From there on the article describes how to divide the parts that can be divided (sorting etc.) into sub-problems and solve them all in parallel.
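
To make the divide-and-solve-in-parallel idea concrete, here is a minimal sketch using the fork-join API as it later shipped in java.util.concurrent. It sums an array instead of sorting it, purely to keep the example short. The class name ParallelSum, the threshold of 1,000 and the sum helper are illustrative assumptions, not taken from the article, and commonPool() assumes Java 8 or later.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example class: sums a long[] by recursively splitting it.
class ParallelSum extends RecursiveTask<Long> {
    // Assumed cut-off: ranges this small are summed sequentially.
    private static final int THRESHOLD = 1_000;

    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    ParallelSum(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            // Small enough: solve the sub-problem directly.
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        // Otherwise split in two halves.
        int mid = from + (to - from) / 2;
        ParallelSum left = new ParallelSum(data, from, mid);
        ParallelSum right = new ParallelSum(data, mid, to);
        left.fork();                     // schedule the left half asynchronously
        long rightSum = right.compute(); // work on the right half in this thread
        return left.join() + rightSum;   // wait for the left half and combine
    }

    // Convenience helper (an assumption of this sketch): run on the common pool.
    static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new ParallelSum(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[100_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(sum(data)); // prints 5000050000
    }
}
```

Note that this only helps with CPU-bound work that can be split into independent pieces; a request that is blocked waiting on I/O gains nothing from it.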

I'd like to focus on the second quoted sentence above though. What Brian writes is basically that unless we solve the real problems (I/O) then there will be no way to keep our many-cores busy.

This means that there will be idle computing power, yet it will be impossible to make the programs any faster! What CTO will be comfortable with this? What architect would want to go into the board room and explain why the CPUs are more or less idle but it's still not possible to handle the peak hour requests?

When this happens, and trust me it will happen sooner than we think, it will open up the field to new languages. There has been a lot of buzz around Erlang over the last 1.5 years, and there are good reasons for it. Just check out this comparison between Tomcat and Yaws (an Erlang based web server). I don't think Erlang will ever be a contender to Java in terms of popularity or usage, but perhaps other languages similar to Erlang will.

Scala is a programming language gaining momentum within the Java community. This language incorporates both functional and object oriented features. It could serve as a bridge between the two camps and work as an eye-opener for all those hard-core-Java-will-prevail-and-survive-anything coders :-)

Only the future will tell what languages we'll be coding in when many-core platforms are here. No one can know for certain, but I'd put my 2c on Java (and C# for that matter) struggling to keep the same attention from the IT community in a couple of years' time.

What do you think?

10 comments:

david said...

I think it's great that fork-join is hopefully gonna be in Java 7, but that's only part of the solution. IMHO the real problem is the application. If it can't be divided into "sub-problems", then no fork-join library is able to help you keep those cores busy.

That's when the functional part of let's say Scala comes in handy, it suggests dividing your problems and thus makes it easier to "fork and join".

Olaf said...

The problem Brian Goetz hints at is hardly affected by your choice of programming language. He implicitly contends that the ratio between raw computing power (instructions/sec) and IO (MB/sec) will inevitably increase.
This is a hardware limit, and a language/platform can only do so much to alleviate its effects.
So the real question is not how to design a language that maximizes IO - you cannot get past the hardware barrier - but rather how to best utilize the raw computing power that would otherwise go to waste while waiting for IO to become available. And this is exactly what the fork/join framework tries to achieve.

Cheers,
Olaf

Henrik Engström said...

Olaf,
If what you write is true, i.e. that there is no way to get past the hardware barrier, then there is no real need for fork/join either, since we all agree that I/O will be the main part of the execution time. It's like improving the wings of an airplane to cope with Mach 3 knowing that the engine will only make the plane fly at Mach 0.5...

I do think that new programming languages will introduce new ways of thinking (perhaps new ways of designing systems) + make better use of the hardware.

I don't think that the fork/join framework in Java7 is a bad idea. I'm just arguing that it will not be enough for Java to continue to be the no1 choice when developing systems in the future.

Cheers
Henrik

Jonathan said...

This post came at a surprising time for me, since I have recently taken an interest in Erlang.

It seems to me that Erlang has been doing something very right for 20 or so years and now it will probably get more popular.

The problem is that there are much more popular languages like Java that have so many additions, frameworks, communities, etc. behind them that it's very difficult to move away from those supportive environments.

I have tried to look into Java, today and yesterday even, and tried to learn if there are similar features to Erlang.
From what I found, Java has java.util.concurrent and JMS to send async messages between objects.
But Erlang's multi-processing is just so deeply rooted in the language that it's very difficult to get the same results in Java.

So basically, I will wait and see what will happen and keep my eye on Erlang.

Good post, by the way.

Anonymous said...

Just because I am curious, how exactly does Erlang allow you to accept more IO through say a 1gbps port?

You can't get 1.5gbps out of a 1gbps pipe. That's all there is to it.

if there is a x cycle delay to get data from the hdd, there isn't anything any language can do about it. you just block that process/thread and move onto another until you have the data in cache/ram.

Henrik Engström said...

Anonymous,
Do I interpret your comment incorrectly if I say that it is a bit sarcastic? :-)

The point of my post about Java being insufficient to cope with future HW demands was not that Erlang will get 1.5gbps out of a 1gbps pipe. I think Erlang will be as insufficient at that as any other programming language. You are absolutely spot on with that conclusion.

Instead we must try to avoid using I/O to the degree that we hit the roof. How is this possible? Well, for starters we could use RAM to avoid I/O, for example DB calls. Well, the Java advocates would say, Java can do that. And I agree, but (there's always a but) with Java it's possible to design systems in numerous ways (some better than others). When developing multi-threaded applications (coping with high loads) in Java it is very easy to make mistakes. This is where Erlang, or any other functional programming language for that matter, enters the scene. Erlang is built from scratch with concurrency in mind, hence it is easier to develop systems in Erlang for a concurrent environment.
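
As a rough illustration of the "use RAM instead of a DB call" idea, the sketch below keeps already-loaded values in a ConcurrentHashMap so that only the first request for a key pays the I/O cost. ReadThroughCache, the loader function and loadCount are hypothetical names for this example, the "database" is just a lambda, and computeIfAbsent with lambdas assumes Java 8 or later. A real cache would also need eviction and invalidation.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical read-through cache: values are loaded once and then served from RAM.
class ReadThroughCache<K, V> {
    private final ConcurrentHashMap<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;                  // stands in for the slow DB call
    private final AtomicInteger loads = new AtomicInteger();

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // computeIfAbsent is atomic per key, so concurrent callers asking for
        // the same missing key trigger only one load.
        return entries.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    // How many times the loader (the simulated I/O) was actually invoked.
    int loadCount() {
        return loads.get();
    }

    public static void main(String[] args) {
        ReadThroughCache<Integer, String> cache =
                new ReadThroughCache<>(id -> "user-" + id); // pretend this hits a database
        System.out.println(cache.get(7)); // first call goes to the loader
        System.out.println(cache.get(7)); // second call is served from memory
        System.out.println("loads: " + cache.loadCount());
    }
}
```

The thread-safety here comes entirely from the library class, which is part of the point above: hand-rolling the same thing with synchronized blocks is where mistakes tend to creep in.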

As I've said before, I am not an adversary of Java7 including fork/join or any other framework that improves it for the concurrent environment. On the contrary! Java is what puts bread crumbs on my table today (and probably will continue to do so for years to come). I just don't think the new APIs will meet the needs of future systems.

Java will still be a part of future systems (it's not the new COBOL) but it will not be the sole contender. New system designs (patterns), programming languages, and/or combinations of these will open new possibilities to handle the requirements to come.

Anonymous said...

Your whole argument hinges around one link (Tomcat vs Yaws), and the link is not a comparison of Tomcat and Yaws. It's a comparison of old style Apache httpd (written in C, not Java).

Anonymous said...

anon1 again...

Actually I think I was trying to be an ass :-)

I think I understand your argument better, and that is simply that since Java relies on the OS for threads/processes, at some point the OS itself will fall over because there are simply too many threads/processes. As seen with the *pitiful* results of Apache.

Don't get me wrong, I would love to have the problem of my server falling over because we have too many concurrent connections.

As an aside, if we assume the throughput is, as they reference in the article, 800 kbps, then I am left wondering: per connection or total? If it's per connection then we are looking at 64Gbyteps. Not likely. In the other case we are looking at 10 bytes per second as a connection speed, and their 20kb page would take a whopping 33 minutes to download. High Performance indeed. Apache when it falls over would take a mere 1min40secs to download. At least it knows to fall over so a sysadmin can fix the obviously broken system.

In any case the limiting factor is not IO, but the OS's capacity for multitasking.

Peter Lawrey said...

One family of highly multi-processing systems is the Azul appliances, which have systems with 768 processors.

I find it interesting that they only support Java/byte code. They don't have a C compiler!

http://www.azulsystems.com/products/compute_appliance.htm

So I would assume Azul believe Java can scale to lots of processors.

Jim said...

Adding more concurrency features to Java in the form of fork-join is helpful, but misses a bigger opportunity. Fork-join can help make one part of an application faster, but does not help the overall application.

I'm working on a framework in Java that is similar in concept to the Erlang capability of message passing to light-weight processes. It's based on a dataflow architecture and helps you to build scalable applications, not just scalable for loops.

It's in beta and free to download and try. I'd appreciate any feedback on the concept and direction. You can also read an article about it in this month's (March) issue of JDJ.

You can read about and download the software at http://www.pervasivedatarush.com.