JRun problem debugging
Posted: 7/11/2001 5:40:34 PMBy: Comfortably Anonymous
Times Read: 3,396
Likes: 0 Dislikes: 0
Topic: Programming: Java
In order to debug an problem involving an application that uses Jrun, you will need to use the Jrun logs. Not only do these log files give you evidence of how Jrun is performing, they can also reflect issues "upstream" from Jrun, such as errors from the JAD, TAD, or the mainframe.
In /logs/Jrun, you will find two types of files, each with a date/timestamp extension so files don't overwrite each other or get impossibly (versus merely obnoxiously) long. They are stdoutmmddyy-hhmmss.log and stderrmmddyy-hhmmss.log, where mmddyy-hhmmssis the date and timestamp of when the logfile was first opened. Jrun will open a new logfile whenever (a) Jrun is restarted or (b) a new servlet is loaded by the Java virtual machine. stdout is the most useful file, as it contains timestamps and error messages. stderr contains some error messages, but since it doesn't have timestamps, it is difficult to correlate the messages to a specific problem. If you want to see when Jrun was stopped and restarted, you can view /software/jrun/jsm-default/logs/event.log; in this file, you will find a log showing when Jrun was stopped or restarted. This file can be useful when trying to determine what someone else may have done to attempt to restart Jrun.
When reviewing the log files on each server, the OS date/timestamp of the file will indicate when that file was last updated. If Jrun was merely
switching filenames, you should see another file with a date/timestamp in the filename that is within seconds of when the last file was closed out. If you see a large gap between the time that Jrun stopped writing log information and the time Jrun started back up, Jrun was probably unavailable during that time.
Reviewing Jrun Logs
What should I see in a Jrun log?
jcp [Jul09:05:35:45] 262143kb, 15+7=51 threads, 7 peak, 0/8 rq delayed, 0 dropped.
In doGet
Mon Jul 09 05:35:46 CDT 2001 x1.BoardLink 16
Mon Jul 09 05:35:46 CDT 2001 x2.PetServlet 641
In doGet
Mon Jul 09 05:35:48 CDT 2001 x1.BoardLink 0
In doGet
Mon Jul 09 05:35:49 CDT 2001 x2.BoardLink 0
In doGet
Mon Jul 09 05:35:50 CDT 2001 x1.BoardLink 16
jcp [Jul09:05:35:50] 262143kb, 15+9=51 threads, 9 peak, 0/7 rq delayed, 0 dropped.
Here is a brief description of each type of line in the log:
jcp [Jul09:05:35:45] 262143kb, 15+7=51 threads, 7 peak, 0/8 rq delayed, 0 dropped.
The JCP lines give you a status of the Java virtual machine, and are updated every 5 seconds. The following line from July 9th at 5:35 am tells you that the JVM heap was using 262,143 kilobytes (256 Meg). The next section of the line shows thread usage inside the JVM: 15+7=51 threads, means that Jrun had 15 listener threads and 7 request threads in use. Jrun is configured to have 15 request threads in use to constantly listen for requests from the web server. When a listener requests receives a request, it creates a request thread to handle the request, and the listener request can go back and listen for more requests. The total number of threads will not equal the number of listener and request threads since request threads may spawn additional threads. For example, for requests communicating with Tuxedo via Jolt, Jolt will create 3 threads for reading and writing. 7 peak indicates that during the previous 5 second interval, Jrun had a maximum of 7 requests during that time. 0/8 rq delayed indicate that of 8 requests that came in during the 5 seconds, none were delayed. Jrun is currently configured to execute up to 200 requests at any one time, and any requests over that amount are queued up to wait for a Jrun request thread to become free. Queued requests are generally bad; that indicates an overload condition. 0 dropped means that Jrun didn't drop any requests; Jrun also has a limit on the total number of threads in the JVM, and so if the thread count goes over 1000, then subsequent requests will be dropped in an attempt to try and finish the work that is underway. When requests are delayed, you will see an ! at the end of the status line; when requests are dropped you will see a !!; so when you see the exclamation point, that's usually a sign of a problem somewhere. Generally, high request counts are a sign of problems; usually it should be well under 100 requests, as our applications don't get THAT many requests and should be processing requests fairly quickly.
Mon Jul 09 05:35:46 CDT 2001 x1.BoardLink 16
The HttpServlet is set up to write out a line to stdout whenever a servlet completes. In this case, at 5:35:46 on Monday 7/9/2001, x1.BoardLink completed. It took .016 seconds to execute; the number reads in thousandths of a second. This time is a important measure; if you see requests taking longer than 20-30 seconds, that's usually an indication of a problem somewhere down the line, either in the data servers/mainframe, or even on the network.
In doGet
Sometimes developers will leave debugging statements in their application, and they will show up in the stdout logs as well. You may want to track down the developer who forgot to remove his debug code and ask them to clean it up, or just beat the shit of of their stupid asses. Log files are large enough in production without the debugging information.
What will you see when there is a problem?
jcp [Jul09:12:48:18] 262143kb, 180+200=818 threads, 541 peak, 3/10 rq delayed, 0 dropped.!
jcp [Jul09:12:48:23] 262143kb, 185+200=818 threads, 536 peak, 5/7 rq delayed, 0 dropped. !
jcp [Jul09:12:48:28] 262143kb, 180+200=816 threads, 538 peak, 11/6 rq delayed, 0 dropped.!
jcp [Jul09:12:48:33] 262143kb, 177+200=818 threads, 541 peak, 5/3 rq delayed, 0 dropped. !
Note the ! indicating that requests are being queued up. You should be seeing slow response time from Jrun.
jcp [Jul09:12:47:58] 262143kb, 180+200=818 threads, 511 peak, 3/10 rq delayed, 0 dropped. !
jcp [Jul09:12:48:03] 262143kb, 185+200=818 threads, 516 peak, 5/7 rq delayed, 0 dropped. !
jcp [Jul09:12:48:08] 262143kb, 190+200=818 threads, 521 peak, 3/10 rq delayed, 0 dropped. !
jcp [Jul09:12:48:13] 262143kb, 195+200=818 threads, 526 peak, 5/7 rq delayed, 0 dropped. !
jcp [Jul09:12:48:18] 262143kb, 210+200=818 threads, 531 peak, 3/10 rq delayed, 0 dropped. !
jcp [Jul09:12:48:23] 262143kb, 215+200=818 threads, 536 peak, 5/7 rq delayed, 0 dropped. !
jcp [Jul09:12:48:28] 262143kb, 218+200=816 threads, 538 peak, 11/6 rq delayed, 0 dropped. !
jcp [Jul09:12:48:33] 262143kb, 227+200=818 threads, 541 peak, 5/3 rq delayed, 0 dropped. !
This at the end of the log file is a sign of a Jrun problem where all of the Jrun request threads are locked, and Jrun is now unresponsive. Jrun is likely to crash, and Jrun alerts should be going off. Unless something else is locking the threads (such as a backend data server problem), Jrun will need to be stopped and restarted to clear the problem.
How to clear Jrun alerts
First, the problem needs to be isolated to a Jrun problem. If you are seeing problems on multiple Jrun servers, chances are the problem is "something else" and recycling Jrun isn't necessarily going to fix the problem. However, clearing Jrun can clear up resources that are either temporarily or permanently hung by the problem, since Jrun will occasionally encounter problems in response to other problems.
1. Cycle the Jrun service through the NT Services control panel.
2. If the Jrun service is hung, sometimes you'll see Jrun applications go into a "loop" taking people back to the initial page of the application instead of showing an error message. This is usually because the Java Virtual Machine (JVM) is hung up, and Jrun can't stop it. Sometimes, trying multiple stops and restarts work. You can also telnet into the NT server (If telnetd/services for Unix are installed), and issue a command line kill -f java.exe to force the JVM to terminate.
3. When all else fails, reboot since it's NT.
In /logs/Jrun, you will find two types of files, each with a date/timestamp extension so files don't overwrite each other or get impossibly (versus merely obnoxiously) long. They are stdoutmmddyy-hhmmss.log and stderrmmddyy-hhmmss.log, where mmddyy-hhmmssis the date and timestamp of when the logfile was first opened. Jrun will open a new logfile whenever (a) Jrun is restarted or (b) a new servlet is loaded by the Java virtual machine. stdout is the most useful file, as it contains timestamps and error messages. stderr contains some error messages, but since it doesn't have timestamps, it is difficult to correlate the messages to a specific problem. If you want to see when Jrun was stopped and restarted, you can view /software/jrun/jsm-default/logs/event.log; in this file, you will find a log showing when Jrun was stopped or restarted. This file can be useful when trying to determine what someone else may have done to attempt to restart Jrun.
When reviewing the log files on each server, the OS date/timestamp of the file will indicate when that file was last updated. If Jrun was merely
switching filenames, you should see another file with a date/timestamp in the filename that is within seconds of when the last file was closed out. If you see a large gap between the time that Jrun stopped writing log information and the time Jrun started back up, Jrun was probably unavailable during that time.
Reviewing Jrun Logs
What should I see in a Jrun log?
jcp [Jul09:05:35:45] 262143kb, 15+7=51 threads, 7 peak, 0/8 rq delayed, 0 dropped.
In doGet
Mon Jul 09 05:35:46 CDT 2001 x1.BoardLink 16
Mon Jul 09 05:35:46 CDT 2001 x2.PetServlet 641
In doGet
Mon Jul 09 05:35:48 CDT 2001 x1.BoardLink 0
In doGet
Mon Jul 09 05:35:49 CDT 2001 x2.BoardLink 0
In doGet
Mon Jul 09 05:35:50 CDT 2001 x1.BoardLink 16
jcp [Jul09:05:35:50] 262143kb, 15+9=51 threads, 9 peak, 0/7 rq delayed, 0 dropped.
Here is a brief description of each type of line in the log:
jcp [Jul09:05:35:45] 262143kb, 15+7=51 threads, 7 peak, 0/8 rq delayed, 0 dropped.
The JCP lines give you a status of the Java virtual machine, and are updated every 5 seconds. The following line from July 9th at 5:35 am tells you that the JVM heap was using 262,143 kilobytes (256 Meg). The next section of the line shows thread usage inside the JVM: 15+7=51 threads, means that Jrun had 15 listener threads and 7 request threads in use. Jrun is configured to have 15 request threads in use to constantly listen for requests from the web server. When a listener requests receives a request, it creates a request thread to handle the request, and the listener request can go back and listen for more requests. The total number of threads will not equal the number of listener and request threads since request threads may spawn additional threads. For example, for requests communicating with Tuxedo via Jolt, Jolt will create 3 threads for reading and writing. 7 peak indicates that during the previous 5 second interval, Jrun had a maximum of 7 requests during that time. 0/8 rq delayed indicate that of 8 requests that came in during the 5 seconds, none were delayed. Jrun is currently configured to execute up to 200 requests at any one time, and any requests over that amount are queued up to wait for a Jrun request thread to become free. Queued requests are generally bad; that indicates an overload condition. 0 dropped means that Jrun didn't drop any requests; Jrun also has a limit on the total number of threads in the JVM, and so if the thread count goes over 1000, then subsequent requests will be dropped in an attempt to try and finish the work that is underway. When requests are delayed, you will see an ! at the end of the status line; when requests are dropped you will see a !!; so when you see the exclamation point, that's usually a sign of a problem somewhere. Generally, high request counts are a sign of problems; usually it should be well under 100 requests, as our applications don't get THAT many requests and should be processing requests fairly quickly.
Mon Jul 09 05:35:46 CDT 2001 x1.BoardLink 16
The HttpServlet is set up to write out a line to stdout whenever a servlet completes. In this case, at 5:35:46 on Monday 7/9/2001, x1.BoardLink completed. It took .016 seconds to execute; the number reads in thousandths of a second. This time is a important measure; if you see requests taking longer than 20-30 seconds, that's usually an indication of a problem somewhere down the line, either in the data servers/mainframe, or even on the network.
In doGet
Sometimes developers will leave debugging statements in their application, and they will show up in the stdout logs as well. You may want to track down the developer who forgot to remove his debug code and ask them to clean it up, or just beat the shit of of their stupid asses. Log files are large enough in production without the debugging information.
What will you see when there is a problem?
jcp [Jul09:12:48:18] 262143kb, 180+200=818 threads, 541 peak, 3/10 rq delayed, 0 dropped.!
jcp [Jul09:12:48:23] 262143kb, 185+200=818 threads, 536 peak, 5/7 rq delayed, 0 dropped. !
jcp [Jul09:12:48:28] 262143kb, 180+200=816 threads, 538 peak, 11/6 rq delayed, 0 dropped.!
jcp [Jul09:12:48:33] 262143kb, 177+200=818 threads, 541 peak, 5/3 rq delayed, 0 dropped. !
Note the ! indicating that requests are being queued up. You should be seeing slow response time from Jrun.
jcp [Jul09:12:47:58] 262143kb, 180+200=818 threads, 511 peak, 3/10 rq delayed, 0 dropped. !
jcp [Jul09:12:48:03] 262143kb, 185+200=818 threads, 516 peak, 5/7 rq delayed, 0 dropped. !
jcp [Jul09:12:48:08] 262143kb, 190+200=818 threads, 521 peak, 3/10 rq delayed, 0 dropped. !
jcp [Jul09:12:48:13] 262143kb, 195+200=818 threads, 526 peak, 5/7 rq delayed, 0 dropped. !
jcp [Jul09:12:48:18] 262143kb, 210+200=818 threads, 531 peak, 3/10 rq delayed, 0 dropped. !
jcp [Jul09:12:48:23] 262143kb, 215+200=818 threads, 536 peak, 5/7 rq delayed, 0 dropped. !
jcp [Jul09:12:48:28] 262143kb, 218+200=816 threads, 538 peak, 11/6 rq delayed, 0 dropped. !
jcp [Jul09:12:48:33] 262143kb, 227+200=818 threads, 541 peak, 5/3 rq delayed, 0 dropped. !
This at the end of the log file is a sign of a Jrun problem where all of the Jrun request threads are locked, and Jrun is now unresponsive. Jrun is likely to crash, and Jrun alerts should be going off. Unless something else is locking the threads (such as a backend data server problem), Jrun will need to be stopped and restarted to clear the problem.
How to clear Jrun alerts
First, the problem needs to be isolated to a Jrun problem. If you are seeing problems on multiple Jrun servers, chances are the problem is "something else" and recycling Jrun isn't necessarily going to fix the problem. However, clearing Jrun can clear up resources that are either temporarily or permanently hung by the problem, since Jrun will occasionally encounter problems in response to other problems.
1. Cycle the Jrun service through the NT Services control panel.
2. If the Jrun service is hung, sometimes you'll see Jrun applications go into a "loop" taking people back to the initial page of the application instead of showing an error message. This is usually because the Java Virtual Machine (JVM) is hung up, and Jrun can't stop it. Sometimes, trying multiple stops and restarts work. You can also telnet into the NT server (If telnetd/services for Unix are installed), and issue a command line kill -f java.exe to force the JVM to terminate.
3. When all else fails, reboot since it's NT.
Rating: (You must be logged in to vote)