Using Apache Access Logs with JMeter

Written by Geoff Mottram (geoff at minaret dot biz).

Placed in the public domain on September 21, 2004 by the author.

This document and all associated software is provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the author be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with this document and associated software or the use or other dealings in same.

Contents
Introduction
Software
JMeter Tips
Using Apache Log Files
Filtering Your Apache Log Files
Duration and Load
Creating the Test Plan
Running the Test
Analyzing the Test Results

Introduction
This document provides some tips on how to use Apache access logs as the source material for a JMeter test plan. A Perl script for generating test plans from Apache log files and a test plan for displaying test results are provided. You will have to edit the enclosed scripts and test plans to suit your particular configuration.

Please note that after this article was written it was pointed out to me that there is a Tomcat Access Log Sampler. This sampler can read files in Apache common log format and generate requests to your web application. While I have not had an opportunity to try this sampler, many of the tips in this article will still apply. In particular, you will need to produce one access log file for each thread you want to run during your test.

Software
Download the zip file that accompanies this document. It contains the following:

File         Description
filter.sh    Filters a raw Apache access log to extract ten minutes' worth of requests.
makeplan.pl  Converts an Apache access log file into a JMeter test plan.
results.jmx  A test plan that can be used to process the results of a test.
runtest.sh   A shell script that will run a test in non-GUI mode, storing the test results in a file.
sample.jmx   A test plan with one thread and one request, generated by the makeplan.pl Perl script.

JMeter Tips
A few things that you may find useful with regard to JMeter:

  1. JMeter stores its test plans in XML. This means you can generate a test plan using a text editor and shell scripts.
  2. Create a simple case test plan within JMeter and save it using the Save Test Plan as option of the File menu. The .jmx file that is produced can be chopped up in a text editor for use in a shell script.
  3. When using Apache logs as your source material, test plans can get very big, very fast. A plan with 2800 requests will require 256 MB of RAM and cannot be loaded in JMeter's GUI mode without significantly increasing the JVM's maximum heap space.
  4. Run your tests in non-GUI mode to cut down on GUI overhead during the test (use the JMeter -n option).
  5. All of the JMeter Listeners (e.g. Aggregate Report) have a Filename box. Click on the associated Browse button to load and display the results of a test.
  6. Create a test plan that only contains Listeners to analyze the results of a test. See the results.jmx test plan that is provided with this technical tip for an example.
  7. You can run a JMeter test from a remote machine for which you do not have a GUI and copy the resulting .jtl file back to a machine that does have GUI support.
  8. If you are running JMeter under IBM AIX, you must edit the jmeter startup script that is located in the JMeter bin directory and comment out any -XX options; they are not supported by IBM's JVM.
  9. I never figured out what the HTTP Sampler's Redirect Automatically option does but if you make sure the Follow Redirects option is set on every one of your HTTP requests, your cookie testing redirects will work properly.
  10. I had the fewest problems on different platforms by running JMeter from its bin directory.
  11. Any time you are running a test, you can use your web browser to send additional requests to the server to get a feel for the system response time. This subjective approach may give you more of a feel for whether the system is responding as you would like. Remember that these additional requests will not be logged in your results file and will not be reflected in your test statistics.

The following tips are more applicable when you are using an Apache access log as your source material for a test (you may find it easier to follow along if you load the sample.jmx test plan provided with this documentation):

  1. Use HTTP Request Defaults to set the server name you are testing and leave this information out of the requests. This makes it much easier to change which server you are testing. You can change one setting instead of modifying every request.
  2. Use a Constant Throughput Timer when using Apache logs as your source material to deliver your requests at a rate that is similar to the original log files (more on this below). This constant rate is applied to each thread. In the sample test plan, each thread will send in ten requests per minute.
  3. Use a Uniform Random Timer when you have more than one thread to prevent your requests from being sent in waves. Otherwise, your threads will all send in their requests at the same time when using the Constant Throughput Timer. Choose a Random Delay value corresponding to your Constant Throughput Timer rate. In the sample plan, the Constant Throughput Timer is set for ten requests per thread, per minute. This gives each request a "window" of six seconds to complete (60 seconds per minute divided by ten requests per minute). To spread your threads out evenly over that six second interval, set the Uniform Random Timer Random Delay Maximum to 6000 milliseconds (six seconds).
  4. The HTTP Cookie Manager works really well and maintains a set of cookies for each thread that is run.

Using Apache Log Files
Apache log files make an excellent source of data for creating a JMeter test plan for load testing your Web server. It is a straightforward process to convert text log files into a test plan because JMeter stores its plans using XML. The advantage of using an Apache access log file is that you don't have to guess what a real world load on your server looks like -- you get that from the logs.

The downside to using Apache log files is that you are restricted to GET requests (since log files don't contain the contents of a POST). And while it is possible to simulate requests that return a 304 response (Not Modified), doing so requires adding headers to each request with the modification date and time of the file you are requesting. While you can do this with JMeter, it was more work than I had time for.

Filtering Your Apache Log Files
You must decide how long you want your test to run and then grab the appropriate entries from your Apache logs. I looked at the statistics and graphs generated by the Webalizer program to determine when the server I wanted to simulate was the busiest (i.e. the day and hour). Having decided that ten minutes was a good length of time to run the test, I used the filter.sh file included with this documentation to determine which ten-minute slice was busiest in my busiest hour and then to filter out the requests I could not use.
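
As a rough illustration of finding the busiest slice (this is not the author's filter.sh), the sketch below counts requests per ten-minute bucket. It assumes Common Log Format, where the fourth whitespace-separated field looks like [21/Sep/2004:10:03:01, and it only considers slices that start on a ten-minute boundary. The function name busiest_slices is made up for this example.

```shell
# Count requests in each ten-minute bucket and list the busiest first.
# Assumes Common Log Format: field 4 is "[dd/Mon/yyyy:hh:mm:ss".
busiest_slices() {
    awk '{
             # Day and hour, plus the tens digit of the minute, e.g. 21/Sep/2004:10:00
             key = substr($4, 2, 14) ":" substr($4, 17, 1) "0"
             count[key]++
         }
         END { for (k in count) print count[k], k }' "$@" | sort -rn
}

# Usage (access_log is a placeholder file name):
#   busiest_slices access_log | head -n 5
```

Each output line is a request count followed by the start of the bucket, so the first line names the busiest aligned slice.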

In my case, I removed any entries that did not return an HTTP 200 (OK) or 302 (Moved Temporarily) response. In the latter case, the application performs a cookie test when a session is first started by redirecting the user to a different URL, so I needed to keep those redirects. I had to remove all Login and POST requests and any requests that required a query to have already been run. Queries are problematic because, in my case, they are run from a POST form, and with multiple threads running I could not guarantee that a query would be built before it was used.
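
The status and method rules above might be expressed with awk along the following lines. This is only a sketch, not the contents of filter.sh. It assumes Common Log Format, so that field 6 is the quoted request method and field 9 is the status code, and the function name filter_log is hypothetical. Application-specific exclusions (such as login pages or requests that depend on an earlier query) are not shown because they depend on your URLs.

```shell
# Keep only GET requests that returned 200 (OK) or 302 (Moved Temporarily).
# Assumes Common Log Format: $6 is the method with its opening quote,
# $9 is the HTTP status code.
filter_log() {
    awk '$6 == "\"GET" && ($9 == 200 || $9 == 302)' "$@"
}

# Usage (file names are placeholders):
#   filter_log access_log > filtered.log
```

Log lines whose user name field contains spaces would shift the field numbers, so check a sample of the output before trusting it.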

If you are testing a database application (as I was), you have to find a way to log in your test sessions. If you use cookies to track your sessions, you will have one session created per JMeter test-plan thread. My application happens to support automatic login by IP address, so I just had to configure a test account with the IP address of the computer that would be running the JMeter test.

You have to be careful when you filter out requests from your log source material as you are in effect reducing the load on the server. To run a better test, you should add some additional requests from your log files to approximate the total load on your system. There is a whiff of hocus pocus to this part of the procedure as you are making an unscientific guess as to how many extra requests to add to make up for the requests you are filtering out.

Duration and Load
In the end, your filtered log file should contain a round number of requests that approximate a given period of time. Round numbers are important because they make it easier to simulate the load on your system using JMeter threads. In a real application, you may have hundreds (or thousands) of actual users making requests at various intervals. In your test, you will use a smaller number of threads to simulate those same users. Thus, the total number of requests should be evenly divisible by the number of threads you decide to run.

The number of threads you need to run the test depends on the longest amount of time your server might take to service a request. Add some extra time in case your load test starts to overwhelm your server. Since I was running a database test and most screens return in under two seconds, I decided that I didn't want the same thread making more than one request every six seconds (if you made it three seconds, you would only need half as many threads for the test). That meant that each thread would make ten requests per minute (60 seconds per minute divided by six seconds per request). This is the value that you will plug into the Constant Throughput Timer. Note that this value is the rate per thread.

You need more than the Constant Throughput Timer to simulate a real world load on your server, as that timer will wake up all of your threads at the same time to send out their requests. This will cause your requests to come in waves. To solve this, you must add a Uniform Random Timer as well (JMeter timers are additive). The Uniform Random Timer adds a random number of milliseconds to each thread's next request time. This value will be from 0 to the value of Random Delay Maximum. The value is recalculated with each request, which makes it that much more realistic. Random Delay Maximum should equal the interval between requests implied by the Constant Throughput Timer. To keep the two timers in sync, convert the timer's samples per minute into milliseconds per request.

In my case, ten requests per minute per thread works out to six seconds per request (60 seconds per minute divided by 10 requests per minute). This is 6000 milliseconds and should be set in the Random Delay Maximum of the Uniform Random Timer. This means that requests are sent at random times throughout each six second interval.
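
The conversion is simple enough to do by hand, but a small helper makes the arithmetic explicit. This is a minimal sketch; the function name random_delay_max_ms is made up for this example, and shell arithmetic is integer-only, so rates that do not divide 60000 evenly are rounded down.

```shell
# Convert a Constant Throughput Timer rate (samples per minute, per thread)
# into a Random Delay Maximum in milliseconds for the Uniform Random Timer.
random_delay_max_ms() {
    rate_per_minute=$1
    # 60000 milliseconds per minute divided by requests per minute
    echo $(( 60000 / rate_per_minute ))
}

random_delay_max_ms 10   # prints 6000
random_delay_max_ms 20   # prints 3000
```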

I had decided on 2800 requests for my ten minute test. By using 28 threads, each thread will make 100 requests in ten minutes or ten requests per minute. This is the value that was used to program the two timers. Note the flexibility you have with round numbers. If you need to cut the number of threads to 14, each thread will make 200 requests per ten minutes or twenty requests per minute.

If you want to run a test with a reduced load, have the test plan send in fewer requests in the same time period. For a half load test, you can cut the number of requests and the number of threads in half. In my case that would mean using 1400 requests and 14 threads.

To increase the load, increase the number of requests sent in the same time period along with the number of threads. For my test, a 125% load would require 3500 requests (2800 times 125%) and 35 threads (28 times 125%).
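
The thread counts in this section follow from dividing the total number of requests by the number of requests one thread sends during the test. The sketch below restates that arithmetic; the function names threads_needed and scale_percent are invented for this example, and the script refuses request totals that do not divide evenly, in keeping with the advice about round numbers.

```shell
# Threads needed to send TOTAL requests in MINUTES at RATE requests
# per minute per thread.
threads_needed() {
    total_requests=$1
    minutes=$2
    rate=$3
    per_thread=$(( minutes * rate ))
    if [ $(( total_requests % per_thread )) -ne 0 ]; then
        echo "total requests must be a multiple of $per_thread" >&2
        return 1
    fi
    echo $(( total_requests / per_thread ))
}

# Scale a request or thread count by a percentage (integer arithmetic).
scale_percent() {
    echo $(( $1 * $2 / 100 ))
}

threads_needed 2800 10 10    # prints 28 (full load)
threads_needed 2800 10 20    # prints 14 (same load, twice the rate per thread)
scale_percent 2800 125       # prints 3500 (requests for a 125% load)
scale_percent 28 125         # prints 35 (threads for a 125% load)
```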

Creating the Test Plan
The makeplan.pl Perl script that is included with this documentation has the following usage statement:

Usage: makeplan.pl [-h HOST] [-t THREADS] [FILE1] [FILE2] ...
Where: -h specifies the host name or ip address to test (default is "localhost").
       -t specifies a non-zero number of threads to generate (default is 1).

Reads each FILE in succession (or standard input), generating
  a JMeter test plan to standard output.

You will need to edit this program to suit your particular needs. In particular, the script checks every database request for an "auto=" parameter and adds one if necessary. You might change this to pass login and password information with each request to ensure the JMeter thread can always log in.

Redirect the output of this script to a file with a .jmx file name extension (e.g. plan.jmx). This is your test plan.

For your initial test, run the makeplan.pl script against a very small number of access log entries (one to start with, and then ten). Load this test file into JMeter and run the test. Then check the results sent back by your server. One way to do this is by adding a View Results Tree Listener to your test plan before you run your test. Once you have run your test, you can use the Response data tab to see exactly what was sent back.

Note: The makeplan.pl Perl script does not simply chop the source file into the same number of chunks as the number of threads you have requested. So that the requests arrive at the server in approximately the same order as in your Apache log file, each thread is given every n'th request (where n is the number of threads).
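
The every n'th request distribution can be pictured with a few lines of awk. This is an illustration of the idea only, not code taken from makeplan.pl; the function name lines_for_thread and the 0-based thread numbering are choices made for this sketch.

```shell
# Print the log lines that one thread would receive when THREADS threads
# share the log in round-robin order.
#   $1 = number of threads, $2 = thread number (0-based), rest = log files
lines_for_thread() {
    threads=$1
    thread=$2
    shift 2
    # NR counts lines across all input files, so the order is preserved
    # when several log files are read in succession.
    awk -v n="$threads" -v t="$thread" '(NR - 1) % n == t' "$@"
}

# Usage (filtered.log is a placeholder file name):
#   lines_for_thread 28 0 filtered.log    # requests 1, 29, 57, ...
```

With three threads and seven requests, thread 0 gets requests 1, 4 and 7, thread 1 gets 2 and 5, and thread 2 gets 3 and 6.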

Running the Test
You can run your full-blown test with the included runtest.sh shell script. You must first edit this script to point to the "bin" directory of your JMeter installation on the machine that will be running the test.

The gist of the runtest.sh shell script is:

jmeter -n -t $DIR/$1.jmx -l $DIR/$1.jtl

Here $DIR is your current directory and $1 is the name of the test plan. The -n option tells JMeter to run in non-GUI mode, the -t option names the test plan to run, and the -l option names the file that receives the test results.

Analyzing the Test Results
After your test has completed running, open the results test plan that is included with this documentation. This plan only contains listeners which can load and display test results.

I have found the Aggregate Report to be the most useful in analyzing the results of the tests described here. The most important column is labelled Average and indicates the average number of milliseconds it took for each request to return successfully. The rate column indicates how often a particular request was sent. Since you are using timers to throttle your rate, this number doesn't reflect how well the server is handling your load.

To load your test results, pick a Listener like Aggregate Report and in the Filename box, click on the Browse... button. Select your test results and click on the Load button. If you want to see these results in a different format (e.g. in a graph), select a different Listener and use the Browse... button to reload the data for that Listener.
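
If you only have a command line available, a rough per-request average can also be computed directly from the results file. The sketch below assumes the .jtl file is in CSV format with a header row that includes columns named label and elapsed; that is an assumption about your JMeter version and settings, since JMeter can also write results as XML, which this sketch cannot read. The function name average_elapsed is made up for this example, and labels that contain commas will confuse the simple field splitting.

```shell
# Print label, sample count and average elapsed milliseconds, tab-separated.
# Assumes a CSV results file whose first line names the columns,
# including "label" and "elapsed".
average_elapsed() {
    awk -F, '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
        {
            label = $(col["label"])
            sum[label] += $(col["elapsed"])
            n[label]++
        }
        END { for (l in sum) printf "%s\t%d\t%.1f\n", l, n[l], sum[l] / n[l] }
    ' "$@"
}

# Usage (plan.jtl is a placeholder file name):
#   average_elapsed plan.jtl | sort
```

This counts every sample, successful or not, so treat it as a quick check rather than a replacement for the Aggregate Report.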
