Saturday, March 26, 2011

Performance Tuning

What is performance tuning?
WebSphere exposes an extensive number of "knobs" and parameters. Performance tuning is the practice of adjusting them to enhance an application’s performance according to the specific needs of each application.

When to tune?
Do not try to fix anything that is not broken. Tune only during performance testing or when a performance problem has been identified.

Performance tuning strategy?
Have a holistic view of the WebSphere system, from the geographical load balancer all the way to the back end.

What to tune 
What to tune depends on the specific JEE application. The most frequently tuned areas are the following.
  • JVM, especially for memory-bound applications. Ensure that the heap is large enough, but not so large that it causes long pauses. There is no fixed "golden rule" for heap size; it depends on the application. For example, I have seen heaps as large as 2GB work well. For an application that creates many small, short-lived objects, you may want to experiment with a large "nursery". On a 64-bit system there is theoretically no upper limit other than the available physical memory, but too large a heap can certainly cause problems due to the "stop the world" compaction cycle of garbage collection.
  • EJB container - the EJB thread setting is a custom property. Use ITCAM or Introscope to observe thread usage and tune accordingly.
  • Web container for web applications - the web container thread pool is the most common bottleneck, especially for high-load systems. Watch for a creeping increase in load: even a very stable WebSphere system may eventually become unstable under a steady but slow increase of load over the years. When thread saturation occurs, it may manifest itself as a JVM heap problem, because a large number of requests pile up within the JVM. Increasing vertical clustering has been an effective fix for this problem that I have used over the years. Most application code supports vertical clustering, with the exception of the rare situation where the application uses a unique counter or a software-router kind of Java artifact. Quite a number of applications have logic that does not support horizontal clustering. For this type of JEE application, vertical clustering is the only means to achieve a level of inter-JVM fail-over and increased thread processing capability. It is very helpful to work with the application development team so that JEE applications are designed and coded to support both vertical and horizontal clustering.
  • JDBC connection pool - a common problem is a setting that is too low for the load. If you see a number of waitForConnection exceptions and timeouts in the log files, it is time to test an increased connection pool size.
  • JMS - the number of connections, the max retry value, and the number of messages in a session are among the tunable parameters.
  • OS - the ulimit value, the network settings (MTU), the amount of memory, and the CPU allocation, among others, contribute to the performance of WebSphere Application Server.
  • Web Server - are you running in worker mode, where you can take advantage of web server threads, or in prefork mode, where a separate web server process serves each request? Thread settings, timeout values, and the location of the web server all impact the performance of the WebSphere system. Look for warnings about reaching MaxClients. Here is a useful web server tuning guide on WAS 6.
What are the "knobs" and parameters?
  1. JVM heap size
  2. Thread pool size
  3. Connection pool size
  4. Data source statement cache size
  5. ORB pass by reference 
  6. Servlet caching
JVM heap size 
  1. Any increase in heap size should be balanced against the time and pause needed for garbage collection.
  2. The high (maximum) and low (minimum) heap settings should be equal to prevent dynamic heap size adjustments.
  3. The defaults of 50 and 256 MB are usually inadequate. It is wrong to take these default settings as optimal or IBM-recommended settings. For some applications a relatively small heap works best, while for others a very large heap gives the best performance. It is purely a matter of performance testing and tuning.
  4. The right heap size can only be found via testing (turn verbose garbage collection on when testing).
  5. Use the free heap after collection to isolate a memory leak: if the free heap keeps shrinking from one collection to the next, a leak is likely.
  6. The garbage collection policy is a main GC tuning parameter:
  • optthruput: (default) mark and sweep during garbage collection when the application is paused to maximize throughput.
  • optavgpause: mark and sweep while the application is running to minimize pause times to get the best response time.
  • gencon: manage short-lived and long-lived objects differently to provide a combination of lower pause times and high throughput.
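The "free heap after collection" check from item 5 can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class name HeapTrendSketch, the sample values, and the threshold are hypothetical, and in practice the samples would be read from the verbose GC log. The sketch fits a straight line through the samples and flags a steady downward trend.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: fits a line through "free heap after GC" samples.
class HeapTrendSketch {

    // Least-squares slope of the samples, in sample units per collection.
    static double slope(long[] freeAfterGc) {
        int n = freeAfterGc.length;
        if (n < 2) {
            return 0.0;
        }
        double meanX = (n - 1) / 2.0;
        double meanY = 0;
        for (long v : freeAfterGc) {
            meanY += v;
        }
        meanY /= n;
        double num = 0;
        double den = 0;
        for (int i = 0; i < n; i++) {
            num += (i - meanX) * (freeAfterGc[i] - meanY);
            den += (i - meanX) * (i - meanX);
        }
        return num / den;
    }

    // True when free heap drops faster than the given threshold per collection.
    static boolean looksLikeLeak(long[] freeAfterGc, long thresholdPerGc) {
        return slope(freeAfterGc) < -thresholdPerGc;
    }

    public static void main(String[] args) {
        // Current heap figures of this JVM, read through the standard MXBean.
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.println("used=" + heap.getUsed() + " committed=" + heap.getCommitted()
                + " max=" + heap.getMax());

        // Assumed sample data: MB free after each of four collections.
        long[] samples = {800, 760, 720, 680};
        System.out.println("slope=" + slope(samples) + " MB per GC, leak suspected="
                + looksLikeLeak(samples, 10));
    }
}
```

A flat or oscillating series gives a slope near zero, so only a sustained decline trips the check. The threshold has to be chosen per application during testing.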
    Thread pool size
    Minimum size: The minimum number of threads that the container keeps in the pool. Once the pool has grown to this number, that many threads are kept regardless of whether they are busy or idle.
    Maximum size: The maximum number of threads to maintain in the thread pool. Setting this too high can cause JVM resource issues and halt the application.
    Thread inactivity timeout: The amount of inactivity (in milliseconds) that should elapse before a thread is reclaimed. A value of 0 means do not wait, and a negative value (less than 0) means wait forever.
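The three settings map closely onto the JDK's own java.util.concurrent.ThreadPoolExecutor, which makes for a small experiment. This is an analogy, not WebSphere's implementation: the class ThreadPoolSketch and its method names are made up for illustration, and the rejection of excess work here simply stands in for saturation.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; the JDK pool is used as a stand-in for a container pool.
class ThreadPoolSketch {

    // minSize -> corePoolSize, maxSize -> maximumPoolSize,
    // inactivityTimeoutMillis -> keepAliveTime for threads above the minimum.
    static ThreadPoolExecutor newPool(int minSize, int maxSize, long inactivityTimeoutMillis) {
        // A SynchronousQueue has no capacity, so each request needs its own thread.
        return new ThreadPoolExecutor(minSize, maxSize, inactivityTimeoutMillis,
                TimeUnit.MILLISECONDS, new SynchronousQueue<Runnable>());
    }

    // Submits long-running requests and returns how many found no free thread.
    static int rejectedUnderLoad(ThreadPoolExecutor pool, int requests, CountDownLatch release) {
        int rejected = 0;
        for (int i = 0; i < requests; i++) {
            try {
                pool.execute(() -> {
                    try {
                        release.await(); // simulate a slow request
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                rejected++; // all threads busy and the maximum size is reached
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = newPool(2, 4, 500);
        CountDownLatch release = new CountDownLatch(1);
        int rejected = rejectedUnderLoad(pool, 6, release);
        System.out.println("threads=" + pool.getPoolSize() + " rejected=" + rejected);
        release.countDown(); // let the simulated requests finish
        pool.shutdown();
    }
}
```

With a minimum of 2, a maximum of 4, and 6 slow requests, the pool grows to 4 threads and the remaining 2 requests find no thread. Unlike the WebSphere setting, ThreadPoolExecutor does not accept a negative keep-alive time.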

    Connection pool size 
    Creating a connection and tearing it down are resource intensive. A connection pool allows connections to be reused, which improves performance.


    Minimum connections: The minimum number of physical connections. If the size of the connection pool is at or below the minimum connection pool size, an unused timeout thread will not discard physical connections. The pool does not create connections only to maintain the minimum connection pool size.
    Maximum connections: The maximum number of physical connections possible for this pool. If this number is reached, no new physical connections are created; requestors must wait until a physical connection is returned to the pool, or until a ConnectionWaitTimeoutException is thrown, based on the connection timeout. Too high a maximum connections value can stress or even overwhelm the back end.
    Thread inactivity timeout: The amount of inactivity (in milliseconds) before a thread is reclaimed. A value of 0 means no wait, and a negative value means wait forever.
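The wait-then-time-out behavior at the maximum can be modeled with a semaphore. The sketch below is a toy model, not WebSphere's pool: BoundedPoolSketch, PoolWaitTimeoutException, and their methods are hypothetical names, and the exception only plays the role that ConnectionWaitTimeoutException plays in the real product.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical stand-in for the timeout raised when no connection frees up in time.
class PoolWaitTimeoutException extends RuntimeException {
    PoolWaitTimeoutException(String message) {
        super(message);
    }
}

// Toy pool: at most maxConnections objects are ever handed out at once.
class BoundedPoolSketch<C> {
    private final Semaphore permits;
    private final Deque<C> idle = new ArrayDeque<>();
    private final Supplier<C> factory;
    private final long waitMillis;
    private int created;

    BoundedPoolSketch(int maxConnections, long waitMillis, Supplier<C> factory) {
        this.permits = new Semaphore(maxConnections, true);
        this.waitMillis = waitMillis;
        this.factory = factory;
    }

    // Waits up to waitMillis for a free slot, then reuses an idle connection if any.
    C borrow() {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            acquired = false;
        }
        if (!acquired) {
            throw new PoolWaitTimeoutException("no connection available within " + waitMillis + " ms");
        }
        synchronized (idle) {
            C reused = idle.pollFirst();
            if (reused != null) {
                return reused;
            }
            created++; // connections are created lazily, only on demand
        }
        return factory.get();
    }

    // Returns a connection to the pool and frees its slot for a waiter.
    void giveBack(C connection) {
        synchronized (idle) {
            idle.addFirst(connection);
        }
        permits.release();
    }

    int physicalConnectionsCreated() {
        synchronized (idle) {
            return created;
        }
    }

    public static void main(String[] args) {
        BoundedPoolSketch<Object> pool = new BoundedPoolSketch<>(2, 50, Object::new);
        Object first = pool.borrow();
        pool.borrow();
        try {
            pool.borrow(); // third request exceeds the maximum of 2
        } catch (PoolWaitTimeoutException e) {
            System.out.println("timed out: " + e.getMessage());
        }
        pool.giveBack(first);
        System.out.println("reused=" + (pool.borrow() == first)
                + " created=" + pool.physicalConnectionsCreated());
    }
}
```

A burst of timeouts in such a model, as in the real logs, means either the maximum is too low for the load or connections are held too long before being returned.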


    Data source statement cache size
    The data source statement cache improves the performance of prepared statements and callable statements. Try to get the number of these statements in the application and make the cache size equal to that number. Then test and increase the size until you see no discarded statements. This is how to adjust this knob.

    Data sources > Derby JDBC Driver XA DataSource > WebSphere Application Server data source properties.
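The "increase until there are no discards" procedure can be tried out on a toy cache. The sketch assumes least-recently-used eviction, which is an assumption made for illustration; StatementCacheSketch and the SQL strings are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy statement cache that counts how many cached statements it had to discard.
class StatementCacheSketch {
    private final int maxStatements;
    private int discards;
    private final LinkedHashMap<String, Object> cache;

    StatementCacheSketch(int maxStatements) {
        this.maxStatements = maxStatements;
        // accessOrder=true makes the map iterate from least to most recently used.
        this.cache = new LinkedHashMap<String, Object>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Object> eldest) {
                if (size() > StatementCacheSketch.this.maxStatements) {
                    discards++;
                    return true; // evict the least recently used statement
                }
                return false;
            }
        };
    }

    // Looks the statement up; on a miss, "prepares" it and caches the result.
    void prepare(String sql) {
        if (cache.get(sql) == null) {
            cache.put(sql, new Object()); // placeholder for a prepared statement
        }
    }

    int discards() {
        return discards;
    }

    // Replays a workload against a cache of the given size.
    static int discardsFor(int cacheSize, String[] workload) {
        StatementCacheSketch cache = new StatementCacheSketch(cacheSize);
        for (String sql : workload) {
            cache.prepare(sql);
        }
        return cache.discards();
    }

    public static void main(String[] args) {
        String[] workload = {
            "SELECT A", "SELECT B", "SELECT C",
            "SELECT A", "SELECT B", "SELECT C",
            "SELECT A", "SELECT B", "SELECT C"
        };
        for (int size = 2; size <= 3; size++) {
            System.out.println("cache size " + size + " -> discards " + discardsFor(size, workload));
        }
    }
}
```

With three distinct statements, a cache of 2 thrashes on every pass while a cache of 3 discards nothing, which is the point at which further increases stop helping.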

    ORB pass by reference
    This works the same way as in C++. Pass by value creates a new copy of the object, which is more costly than pass by reference. Use the following panel to change this setting.

    Servers > Server Types > Application servers > server name > Container services > ORB Service 
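The cost and the semantic difference can both be seen in plain Java. In the sketch below the copy is made through Java serialization, which is an assumption chosen for illustration; PassBySketch and copyByValue are hypothetical names, not ORB APIs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;

// Contrasts handing over a copy of an argument with handing over the same object.
class PassBySketch {

    // "Pass by value": the callee receives a full copy, here made by serialization.
    static <T extends Serializable> T copyByValue(T original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ArrayList<String> order = new ArrayList<>();
        order.add("item-1");

        // By value: the callee's change stays in its own copy.
        ArrayList<String> copy = copyByValue(order);
        copy.add("added-by-callee");
        System.out.println("after by-value call, caller sees " + order.size() + " item(s)");

        // By reference: no copy is made, so the callee's change is visible to the caller.
        ArrayList<String> same = order;
        same.add("added-by-callee");
        System.out.println("after by-reference call, caller sees " + order.size() + " item(s)");
    }
}
```

The second half is the caveat that comes with the speed-up: once no copy is made, any change the callee makes to a parameter is visible to the caller, so the application code has to tolerate that.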

    Servlet caching
    Using DynaCache to cache servlet fragments can improve performance. Servlet caching can be enabled in the administrative console by navigating to Servers > Application servers > server_name > Web container settings > Web container.

    Tools
    TPV, Introscope, and ITCAM all help with performance tuning. However, without a real-time capability to monitor and measure 1) user experience and 2) transactions all the way from the browser to the back end, your ability to fully understand what is going on in the system is limited.
