mod_gzip content compression

This morning I’ve been looking at the mod_gzip Apache module, and have had some impressive results!

mod_gzip is, for those of you who don’t know, an Apache module that compresses content before it is sent to the client’s web browser. Compressing content before sending it means there is less data to transfer, which equates to faster download times. The time taken to compress the content at the server end, transfer the compressed content to the client, and then decompress it at the client end is usually significantly shorter than the time taken to transfer the original uncompressed HTML file across the wire.
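
As a rough illustration of why this pays off, the sketch below gzips a made-up, repetitive HTML page with Python's standard gzip module and estimates the transfer time at 40 Kbps. The sample markup and the helper name transfer_seconds are assumptions for illustration only, and a real page will usually compress less neatly than this one.

```python
import gzip


def transfer_seconds(num_bytes, kbps=40):
    """Time to move num_bytes over a link running at kbps kilobits per second."""
    return num_bytes * 8 / (kbps * 1000)


# Hypothetical page: repetitive markup, much like a long list of blog entries.
rows = "".join(
    f'<li class="blog"><a href="/blog/{i}">Blog number {i}</a></li>\n'
    for i in range(500)
)
html = f"<html><body><ul>\n{rows}</ul></body></html>".encode("utf-8")

compressed = gzip.compress(html)

print(f"original:   {len(html)} bytes, {transfer_seconds(len(html)):.1f} s")
print(f"compressed: {len(compressed)} bytes, {transfer_seconds(len(compressed)):.1f} s")

# The browser reverses the step and gets the identical bytes back.
assert gzip.decompress(compressed) == html
```

This only models the transfer itself; the server's compression time and the browser's decompression time come on top, which is why the data start time matters as well.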

The graph below shows the download time for the London blogs page over at BritBlog, measured at a download speed of 40 Kbps:

Page download speed, showing time at which content compression was enabled.

This morning, at around 11.00 AM, I configured mod_gzip, and you can see straight away the effect it has had on the download time. At first I only enabled compression for HTML content, but as I have a hefty CSS file and a large JavaScript file, I enabled compression for this content too. (I know this may cause problems for some browsers - we’ll have to see what happens.)

The following two graphs show the component breakdown for these pages:

Before content compression:
download breakdown before content compression

After content compression:
download breakdown after content compression

As you can see, the text-based content comes down much more quickly. In fact, it knocks nearly two-thirds off the download time!

What are the cons?

The downside of mod_gzip is the additional load on the server. The table below shows some of the data that went into plotting the top graph, and you can see that the data start time goes up once mod_gzip is enabled (between 10:28 and 10:38). (Note: my mod_gzip configuration wasn’t correct straight away, which is why the download time came down gradually over the next few tests.)

Time of Test  DNS (s)  Connect (s)  Data Start (s)  Content (s)  Total (s)  Status Code
10:08:53      0.010    0.003        0.177           11.710       11.900     OK
10:18:53      0.008    0.003        0.191           11.607       11.809     OK
10:22:26      0.012    0.003        0.188           11.583       11.786     OK
10:28:53      0.006    0.001        0.186           11.601       11.794     OK
10:38:52      0.007    0.003        0.229           11.561       11.800     OK
10:47:44      0.017    0.003        0.238           6.752        7.010      OK
10:48:52      0.016    0.003        0.288           6.780        7.087      OK
10:57:59      0.012    0.003        0.240           6.765        7.020      OK
10:58:52      0.005    0.002        0.220           6.742        6.969      OK
11:03:13      0.005    0.002        0.230           4.023        4.260      OK
11:08:52      0.008    0.004        0.246           4.033        4.291      OK
11:18:52      0.010    0.003        0.219           4.096        4.328      OK
11:28:53      0.006    0.001        0.224           4.118        4.349      OK
11:38:52      0.009    0.003        0.245           4.052        4.309      OK
11:48:53      0.018    0.004        0.228           4.028        4.278      OK
11:58:53      0.006    0.001        0.223           4.109        4.339      OK
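
To put numbers on the trade-off, here is a small sketch that averages the data start and total times from the table above. The split into phases is my own reading of the data and should be treated as an assumption: tests up to 10:28:53 count as uncompressed, tests from 10:38:52 as having mod_gzip enabled, and tests from 11:03:13 as running the final configuration.

```python
# (time of test, data start in seconds, total in seconds), copied from the table above.
tests = [
    ("10:08:53", 0.177, 11.900),
    ("10:18:53", 0.191, 11.809),
    ("10:22:26", 0.188, 11.786),
    ("10:28:53", 0.186, 11.794),
    ("10:38:52", 0.229, 11.800),
    ("10:47:44", 0.238, 7.010),
    ("10:48:52", 0.288, 7.087),
    ("10:57:59", 0.240, 7.020),
    ("10:58:52", 0.220, 6.969),
    ("11:03:13", 0.230, 4.260),
    ("11:08:52", 0.246, 4.291),
    ("11:18:52", 0.219, 4.328),
    ("11:28:53", 0.224, 4.349),
    ("11:38:52", 0.245, 4.309),
    ("11:48:53", 0.228, 4.278),
    ("11:58:53", 0.223, 4.339),
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Assumed phase boundaries (see the lead-in); HH:MM:SS strings sort correctly as text.
before = [t for t in tests if t[0] <= "10:28:53"]
enabled = [t for t in tests if t[0] >= "10:38:52"]
final = [t for t in tests if t[0] >= "11:03:13"]

start_before = mean(t[1] for t in before)
start_enabled = mean(t[1] for t in enabled)
total_before = mean(t[2] for t in before)
total_final = mean(t[2] for t in final)
reduction = 1 - total_final / total_before

print(f"mean data start: {start_before:.3f} s before, {start_enabled:.3f} s with mod_gzip")
print(f"mean total:      {total_before:.2f} s before, {total_final:.2f} s in the final setup")
print(f"total download time reduced by {reduction:.0%}")
```

With only four uncompressed samples this is indicative at best, but it shows the shape of the trade: a few hundredths of a second extra before the first byte, against several seconds saved on the content.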

The data start time on my server isn’t that impressive at the best of times - and that’s without any normal server load. It will be interesting to see what happens once the server goes live. I’m also planning on building some sort of page cache into the site: the pages don’t change that rapidly, so building content on-the-fly like this isn’t really necessary.

mod_gzip Configuration

I’m not going to go into this in much depth; if you’re using mod_gzip for the first time then this is a useful example configuration file - you should read through it.

The configuration snippet below shows the main part of the mod_gzip configuration. This is the bit that specifies which content should get compressed and which shouldn’t. The best candidate for content compression is text, as you can easily knock 60% off the amount of data that needs to be transferred.

The configuration below allows all PHP, HTML, JavaScript and CSS files to be compressed. Images are already compressed, so trying to compress them any further is a waste of time. For this reason, images are excluded from the compression process.

# Filters
# (the dot is escaped so that it only matches a literal "." before the extension)
mod_gzip_item_include         file       \.php$
mod_gzip_item_include         file       \.html$
mod_gzip_item_include         file       \.js$
mod_gzip_item_include         file       \.css$

mod_gzip_item_include         mime       ^text/html
mod_gzip_item_include         mime       ^text/plain
mod_gzip_item_include         mime       ^text/css$
mod_gzip_item_include         mime       ^application/x-javascript$

# don’t compress images
mod_gzip_item_exclude         file       \.gif$
mod_gzip_item_exclude         file       \.png$
mod_gzip_item_exclude         file       \.jpg$
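
To illustrate why the images are excluded, the sketch below gzips two inputs of the same size with Python's standard gzip module. The markup is made up, and the random bytes are only an assumed stand-in for image data: GIF, PNG and JPEG files already store their data in compressed form, so there is little redundancy left for gzip to remove.

```python
import gzip
import os

# Repetitive markup, standing in for an HTML, CSS or JavaScript file.
text = ('<div class="entry"><p>Another London blog entry</p></div>\n' * 400).encode("utf-8")

# Assumption: random bytes stand in for a GIF, PNG or JPEG, whose data is
# already compressed and therefore has little redundancy left to remove.
image_like = os.urandom(len(text))

text_gz = gzip.compress(text)
image_like_gz = gzip.compress(image_like)

print(f"text:       {len(text)} -> {len(text_gz)} bytes")
print(f"image-like: {len(image_like)} -> {len(image_like_gz)} bytes")
```

The text shrinks dramatically, while the image-like input stays at roughly its original size, so the server would spend CPU time on it for no benefit.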

When I was trying to get this to work, I ran into some problems as my PHP and HTML files weren’t being compressed. I did a GET -ed http://www.britblog.com/directory/.../london.html and discovered that the Content-Type header (mime type) was coming back with a charset parameter:

Content-Type: text/html; charset=iso-8859-1

So I removed the $ from the regexp (the $ matches at the end of the string) for the mime filters so that they would allow for additional parameters, et voilà!
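
The effect of the trailing $ is easy to reproduce outside Apache. This sketch uses Python's re module rather than mod_gzip's own regular expression engine, so treat it as an approximation of the matching behaviour; the variable names are mine.

```python
import re

# The header value the server was actually sending.
header = "text/html; charset=iso-8859-1"

anchored = re.compile(r"^text/html$")  # must match the whole value
prefix = re.compile(r"^text/html")     # only has to match the start

print(bool(anchored.search(header)))       # False: the charset parameter gets in the way
print(bool(prefix.search(header)))         # True: anything may follow the mime type
print(bool(anchored.search("text/html")))  # True: a bare mime type still matches
```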

At the moment this is only running on the development site, but I’ll turn it on once we move to the new server.

If you are interested, the download data was captured and plotted by Site Confidence.

4 Responses to “mod_gzip content compression”

  • Jim
    June 13th, 2005 10:36
    1

    Hi Mark,
    do you have a web cache? That should help the server load significantly, right?
    jim

  • marky moo
    June 13th, 2005 11:32
    2

    Hey Jim,

    Do you mean client-side or server-side web cache? At the moment there is no server-side caching of dynamic content on BritBlog, but I’ll be introducing this soon.

    The problem at the moment is that BritBlog has personalised content on all the pages if you are logged in, so caching complete pages is no use.

    I could use partial page caching to cache content like tables and other database queries, but if I’m going to rewrite the site anyway, I may as well go the whole hog!

    Thinking about it, I could use whole page caching, but replace the personalised stuff on the fly… Maybe this would be worth investigating 8-)

  • marky moo’s blog » Even More Website Performance Improvement!
    June 19th, 2005 20:42
    3

    […] w have quite a satisfactory solution. As you may recall, I’ve been looking at using content compression and content caching. Conten […]

  • Son Nguyen
    January 18th, 2006 18:59
    4

    Be aware that there are many different quirks in client browsers that mean some content should not be compressed. See http://www.schroepl.net/projekte/mod_gzip/browser.htm for some
