For the last couple of weeks I have been working intensely
on the performance of a JEE application (EJB3, JPA) that
runs on Glassfish v2.1.
When the application server was stress tested we found
that the CPU load (4 cores + 8GB) was between 80-90% and
that the throughput and latency were not good enough.
Hence we began investigating what the cause of the
problem was. I'd like to share some of the findings as it
might point someone else in the right direction when
it comes to JEE applications and performance.
Loading the Application
To find out the performance (latency and throughput) of
your application you must be able to load it. We use Grinder
for this. If possible it is definitely better to set up the
load server on another machine than the one you run
your app server on.
Measure and analyze the results of the load test.
If the application does not fulfill its performance requirements,
you can find below a checklist that I have
found useful when tracing the root cause of the problem(s).
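To make "latency and throughput" a bit more concrete, here is a minimal sketch of how the raw response times from a load run could be summarized. It is not what Grinder does internally; the class name, the sample numbers, and the choice of the nearest-rank percentile are my own assumptions for illustration.

```java
import java.util.Arrays;

// Hypothetical helper for summarizing a load test run.
final class LoadTestSummary {

    /** Nearest-rank percentile of the samples; p must be in (0, 100]. */
    static long percentile(long[] latenciesMillis, double p) {
        if (latenciesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = latenciesMillis.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Completed requests per second over the whole run. */
    static double throughputPerSecond(int completedRequests, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        return completedRequests * 1000.0 / elapsedMillis;
    }

    public static void main(String[] args) {
        // Made-up response times in milliseconds, including two slow outliers.
        long[] samples = {120, 95, 110, 480, 105, 99, 130, 101, 97, 2500};
        System.out.println("median ms: " + percentile(samples, 50));
        System.out.println("max ms:    " + percentile(samples, 100));
        System.out.println("req/s:     " + throughputPerSecond(samples.length, 5000));
    }
}
```

Looking at a high percentile as well as the median matters because a few very slow requests can hide behind a decent average.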
Suspicion 1 - Database Issues
The first performance problem a JEE application "normally"
runs into is db performance (at least this is my experience as
a JEE developer since 1999).
Check the CPU load on the db server to get an indication of whether
this is the reason for the performance problems you are seeing.
If it is, then tuning the database is where you
should start! But this topic is not covered here - Google it!
Suspicion 2 - Code Issues
You may have unintentionally written code that performs
poorly. This happens sometimes, you know :-)
I suggest that you download a Java profiler and use it to
introspect how your code really operates.
We have used JProfiler (v5.2.1) to profile the JEE application
running in Glassfish (v.2.1). You can install and try it for
10 days for free. JProfiler worked well as long as we kept the
number of HTTP threads to a small number. When running
Glassfish with 128 HTTP threads they all ended up
in a deadlock.
We could not determine whether JProfiler or something else
was the cause of this (although running the application server
without JProfiler did not cause this faulty behavior).
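One way to get more information when threads appear stuck is to ask the JVM itself. The following is a minimal sketch using the standard `java.lang.management` API, not something we ran in this investigation. The class name `DeadlockCheck` is hypothetical, and the code would have to run inside the app server JVM (for example from a small diagnostic servlet) to see the HTTP threads.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; only detects deadlocks on object monitors
// (synchronized blocks), which is what findMonitorDeadlockedThreads reports.
final class DeadlockCheck {

    /** Returns one line per deadlocked thread, or an empty list if none are found. */
    static List<String> describeDeadlocks() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findMonitorDeadlockedThreads(); // null when no deadlock
        List<String> report = new ArrayList<String>();
        if (ids == null) {
            return report;
        }
        for (ThreadInfo info : threads.getThreadInfo(ids, 10)) {
            if (info == null) {
                continue; // the thread terminated in the meantime
            }
            report.add(info.getThreadName()
                    + " waits for " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
        return report;
    }

    public static void main(String[] args) {
        List<String> report = describeDeadlocks();
        if (report.isEmpty()) {
            System.out.println("no monitor deadlock detected");
        }
        for (String line : report) {
            System.out.println(line);
        }
    }
}
```

An empty result does not prove the threads are healthy. They may simply be blocked waiting on something else, such as a database connection.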
If you want to use freeware you can try the NetBeans Java profiler.
It worked ok too, but since it only measured wall time and we
wanted to see the real CPU time it did not meet our needs.
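The difference between wall time and CPU time is easy to see in a small experiment. Below is a minimal sketch, assuming a JVM that supports per-thread CPU time measurement; the class name `CpuVsWallTime` is made up for this example and has nothing to do with how either profiler works internally.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical helper comparing elapsed (wall) time with CPU time
// consumed by the current thread while a task runs.
final class CpuVsWallTime {

    /** Runs the task on the calling thread and returns {wallNanos, cpuNanos}. */
    static long[] measure(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isCurrentThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("thread CPU time not supported");
        }
        long wallStart = System.nanoTime();
        long cpuStart = threads.getCurrentThreadCpuTime();
        task.run();
        long cpuNanos = threads.getCurrentThreadCpuTime() - cpuStart;
        long wallNanos = System.nanoTime() - wallStart;
        return new long[] {wallNanos, cpuNanos};
    }

    public static void main(String[] args) {
        // A task that mostly waits: long wall time, very little CPU time.
        long[] result = measure(new Runnable() {
            public void run() {
                try {
                    Thread.sleep(200);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        System.out.println("wall ms: " + result[0] / 1000000);
        System.out.println("cpu ms:  " + result[1] / 1000000);
    }
}
```

A method that spends its time waiting on I/O or a lock looks expensive in wall time but cheap in CPU time, which is why the distinction mattered to us when the CPU was the saturated resource.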
Run the profiler and try to locate memory leaks,
performance issues, and so on.
The documentation of the profiler will give you a good
starting point for how to do this.
In our case we could not find any obvious issues with our
JEE code and, hence, we continued with the next step (below).
Suspicion 3 - App Server Issues
Could it be an application server problem that was causing
our issues? As written above we run Glassfish and the
first thing we did was to read this:
Glassfish Performance Tuning Guide.
Glassfish out of the box is targeting a development environment
so therefore you can tune a lot of parameters to squeeze out
higher performance.
However, our performance problems did not go away after the
tuning. Performance was better now, but still not good enough.
I wondered if it could be a Glassfish thing.
What if we ran our application on another application server
and compared the latency and throughput?
This would be a good indicator of whether it was an
app server problem.
After trying for a couple of hours to deploy our application
onto a JBoss 5 application server without any
success, I gave up! There are too many differences between
the two servers.
"Write once, run everywhere" my ass! (pardon my French).
Also I found this document and felt quite confident
that Glassfish could perform.
So, the next step is to tune the JVM.
Suspicion 4 - JVM Issues
If you're not familiar with the JVM GC parameters you
should start this investigation by reading:
JVM GC documentation.
Altering GC params can do wonders :-)
Also I highly suggest that you download and install JVMStat.
It's a great tool to get an overview of what is going on
with the heap.
When you feel confident enough you can play around with
some parameters. Some we found useful
(i.e. improved the performance significantly):
* -XX:+UseParallelGC
* -XX:+AggressiveHeap
* -XX:NewRatio=z (where z = 1, 2, 3, ...; the old generation is sized at z times the young generation)
Of course you must adjust and tune these parameters to
fit your environment.
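When experimenting with flags like the ones above, it helps to confirm that they actually reached the JVM and to see how much time the collectors are using. Here is a minimal sketch with the standard `java.lang.management` API; the class name `GcSnapshot` is hypothetical, and the code only tells you something useful when it runs inside the app server JVM.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for checking JVM tuning flags and GC activity.
final class GcSnapshot {

    /** JVM arguments that look like -XX options or heap sizing (-Xmx, -Xms, -Xmn). */
    static List<String> tuningFlags() {
        List<String> flags = new ArrayList<String>();
        for (String arg : ManagementFactory.getRuntimeMXBean().getInputArguments()) {
            if (arg.startsWith("-XX:") || arg.startsWith("-Xm")) {
                flags.add(arg);
            }
        }
        return flags;
    }

    /** Approximate accumulated GC time in milliseconds across all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("tuning flags: " + tuningFlags());
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        System.out.println("total GC ms: " + totalGcMillis());
    }
}
```

Taking one snapshot before and one after a load test run gives a rough idea of how much of the run was spent in garbage collection for a given set of parameters.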
Lessons Learned
Run performance tests for every release (and store the results -
as they can serve as benchmarks for coming releases).
Always run performance tests early in the project's life cycle.
It is hard to detect, and isolate, problematic areas
in the later stages of the project.
Also, it can prove very time consuming to fix any of the
problems found, so you may not be able to deliver on time.
Don't underestimate the issues with tuning the JVM and
its GC behavior! GC in itself is a topic in which you
can get a PhD :-) If your app is to run in an environment
where throughput and latency are keywords you should
invest some time in understanding the JVM GC concepts -
or hire an expert to do the tuning for you.
Happy tracing!
Resources:
Grinder
JProfiler
Java 5 GC Tuning Documentation
JVMStat
2 comments:
You didn't say whether you ran a thread dump and figured out what those threads were blocked on (kill -3). Most likely a dropped index or something and you're hanging on the connections in your DB connection pool or something simple like that (or someone thinks "synchronized" is a magic keyword that makes all threading work ;-) ). You should turn on GC logging if you suspect a GC issue (-Xloggc:file.log). If you want to try it on JBoss, you are probably running into classloader issues and just need to specify "loader-repository". Though if you weren't careful then default mappings of things like EJBs that you look up manually will get in your way. Glassfish seems pretty solid minus the mini-me toplink (I'd switch to Hibernate).