Recently I had the good fortune to meet a very senior WebSphere system engineer who works for a large financial services company. The discussion focused on WebSphere troubleshooting. Here are the notes I took from the conversation.
- Take an end-to-end view when troubleshooting WebSphere, from the browser all the way to the backend system.
- First, test the JVM to see if it is working. Make sure that the JVM is up and running and that there are no hung threads. Turn on verbose GC and look in the system logs and the native_stdout.log and native_stderr.log files for JVM-related error messages.
- From the browser, test whether the URL is working. If the return code is 500 (Internal Server Error), this may be a JVM or plugin issue. If the return code is 404 (Not Found), it may well be a web server problem.
- Try browsing directly to the transport ports of the web server and the application server. If the URL works on those ports, you can exclude the web server and application server from the troubleshooting scope.
- Use "telnet server_name port_number" to test network connectivity and server status, or to test other components of the system, for example an MQ server listening on port 1470.
- Look in the web server's access log to see whether requests are actually making it to the web server rather than getting stuck at the 3DNS or BIG-IP. Also look in the error logs for plugin problems and SiteMinder issues.
- If CPU usage is high, the cause is usually bad application code.
- If memory consumption is high, creating a heap dump with kill -3 helps. You can ship the dump to IBM for analysis if your workstation does not have enough memory to run the IBM Support Assistant suite of tools.
- Check the connection pool. A frequently seen problem is a bug in the JEE code that does not close the connection after use, which causes a connection leak. Use "telnet server_name 446" to examine the network connectivity between the WebSphere Application Server and the backend systems; this will also tell you whether the backend server is actually up and running. Sometimes connections pile up because of a connectivity issue. Use TPV, Introscope, or ITCAM to inspect the connection pool, and examine the system log for connection timeouts.
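
The telnet checks and the 500/404 rule of thumb from the notes are easy to script when there are several hosts to probe. What follows is a minimal Python sketch of my own, not something from the conversation. The function names (port_is_open, triage_status, check_all) are made up for illustration, and the only port numbers taken from the notes are 1470 and 446.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This is the same test as "telnet host port": it shows that something
    is listening, not that the service behind the port is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False


def triage_status(status: int) -> str:
    """Map an HTTP return code to the first suspect named in the notes."""
    if status == 500:
        return "possible JVM or plugin issue"
    if status == 404:
        return "possible web server problem"
    if 200 <= status < 400:
        return "URL is working at this tier"
    # The notes only cover 500 and 404; anything else needs the logs.
    return "not covered by the notes; check the logs"


def check_all(targets):
    """Probe each (host, port) pair and return {(host, port): bool}."""
    return {(host, port): port_is_open(host, port) for host, port in targets}
```

For example, check_all([("mq.example.com", 1470), ("backend.example.com", 446)]) would probe the MQ and backend ports in one go; both host names are placeholders. A True result only rules out a network or listener problem, so the log checks above still apply.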
It helps tremendously if you have transaction monitoring capability. Then you know exactly where a transaction gets stuck or slows down. Introscope provides this capability, though it requires in-depth Introscope expertise that takes time to build.
The capability to monitor user experience and transactions is critical in troubleshooting.