Archive for June, 2005

Just a quickie

as I’ve got to get to sleep! I’m going up to Bradford tomorrow for a meeting, and this means a six o’clock start. It’s a bit early, but it should be nice to meet these folk.

On the downside, I woke up this morning with a sore throat, and I can only expect it to be worse tomorrow. This isn’t good.

I had a good weekend in Devon and Bristol with some friends. You can see the photos here, though you won’t see many unless you’re signed in to flickr and belong to my friends or family groups.

Right, can’t keep my eyes open. Must, get, sleep!

Gumtree

Just had my first gander at Gumtree.com. Looks pretty cool - may have to start using this a bit more.

Anyone else used Gumtree? Good? Bad?

TTFN, Mark

Encoding of ampersands in GET requests

Well, I’ve just learnt something new!

Basically, the following HTML is incorrect:

<a href="http://www.example.com/index.cgi?foo=bar&moo=poo">Click me</a>

and the following is correct:

<a href="http://www.example.com/index.cgi?foo=bar&amp;moo=poo">Click me</a>

According to this article, you should encode ampersands in query strings (when they appear in an HTML document) as entity references (&amp;). I can now go to bed a happy man ;-)
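If you’re building links like this in PHP, the easiest way to get the encoding right is to run the URL through htmlspecialchars() as you write it into the page. A quick sketch, using the same made-up URL as above:

<?php
// The query string is built with plain ampersands...
$url = 'http://www.example.com/index.cgi?foo=bar&moo=poo';

// ...and htmlspecialchars() turns the & into &amp; when the URL
// is written out as part of the HTML document
echo '<a href="' . htmlspecialchars($url) . '">Click me</a>';
?>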

BritBlog hits 2000 members

BritBlog hit 2000 members a few days ago! The site is growing at a nice steady pace these days, so I wonder how many members we’ll have by the end of the year.

Any guesses? In fact, let’s have a competition… Put your estimate for the total number of active BritBlog members on 31 December 2005 in the comments for this post, and the person with the closest guess will win a prize from the BritBlog shop!

I’ll start by guessing (totally randomly) 5,570 members.

More on Website Performance Improvements

Lately I’ve been looking at improving the performance of BritBlog, and I think I now have quite a satisfactory solution.

As you may recall, I’ve been looking at using content compression and content caching.

Content (HTTP) Compression

The content compression has made a huge difference to the download times of the pages, but only seems to have added a slight additional load on the server:

Download speeds before and after content compression

The graph above shows the total download time (green), and the data start times (yellow). The data start time is the time taken for the first byte of data to come back from the server, and with dynamic pages this largely reflects the time taken to generate the content.

(You can see four intermediate tests where the download time only improves by a small amount. This is where I compressed the HTML but not the CSS or JavaScript files.) Although you can’t really tell from these graphs, the data start times only go up a smidgen, so I’m guessing the compression work isn’t too much of an overhead.

Anyway, on the whole the content for these dynamic pages doesn’t change much throughout the day, so I’ve been looking into content caching with the two PEAR packages (Cache and Cache_Lite).

Content Caching

Cache_Lite seems to be marginally quicker than Cache, and it’s incredibly simple to use. It even has built-in output buffering, so you don’t need to control this yourself. Can you tell I like it?!
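In fact, for caching whole pages there’s a Cache_Lite_Output subclass that wraps the output buffering up for you. Here’s a rough sketch of the sort of thing I mean (the cache id, the one-hour lifetime and the placeholder page content are just for illustration):

<?php
// Cache_Lite_Output looks after the output buffering for us
require_once('Cache/Lite/Output.php');

$options = array(
    'cacheDir' => '/tmp/',   // must include trailing slash
    'lifeTime' => 3600       // cache life in seconds (1 hour)
);

$cache = new Cache_Lite_Output($options);

// start() returns true (and sends the cached copy) on a cache hit
if( !$cache->start('london_blogs_page') )
{
    // Cache miss: build the page as normal (this is where the
    // expensive database work would go)
    echo '<h1>London blogs</h1>';
    echo '<p>... slowly generated page content ...</p>';

    // end() stops buffering and saves the output to the cache
    $cache->end();
}
?>

On a hit, start() simply echoes the cached copy, so the expensive page-building work is skipped altogether.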

The graph below shows how the use of caching (with Cache_Lite) has improved the data start times of page requests, and therefore the overall web site performance. I’ve set the cache life to one hour, and you can see that once an hour there is a spike in the data start times where the cache is refreshed. Caching was enabled at about 4:30 PM, so the cache file was built on the first HTTP request at 4:38 PM.

Data start times improve once content caching is enabled

Although this only saves us about 200 milliseconds, the work done by the server is reduced dramatically. When looking at these results, you should also bear in mind that there is basically no load on the server at the moment - the only traffic comes from the Site Confidence test agents that I use to gather this data. Once the site is handling real production traffic, that reduction in server load should count for a great deal more.

So is it all worth it?

The short answer is yes, I think so! The traffic on BritBlog is going up every day, and it seems a bit unnecessary to generate all the pages on the fly when they change so little. It should also improve the visitor experience, as the site will respond more quickly to requests.

More stuff

If you find this interesting, you may like to see my earlier posts on mod_gzip content compression and using PEAR Cache_Lite.

Finally, I used Site Confidence to monitor my test server and to generate the graphs.

Using PEAR Cache_Lite

Cache_Lite is a PHP PEAR package designed for caching stuff. As the project description puts it:

This package is a little cache system optimized for file containers. It is fast and safe (because it uses file locking and/or anti-corruption tests).

The documentation for this project is actually quite good (I’m not impressed by the level of documentation for the majority of PEAR packages), so I would suggest you pop over there and take a look at it.

Anyway, I’ve just written a little bit of code to get the dictionary.com Word of the Day and cache it for two hours, so I thought I’d share it here. The script also uses the XML_RSS package, which is a handy little tool when you’re dealing with RSS.

<?php
/*
** This script demonstrates some simple usage of two PHP PEAR packages
** (Cache_Lite and XML_RSS).
**
** When run it displays the dictionary.com Word of the Day. First of
** all, it looks to see if we have a local cached copy of the Word of
** the Day. If not, it gets the Word of the Day RSS feed from
** dictionary.com, pulls out the current Word of the Day, and then
** caches it locally ready for use the next time.
*/

// include the PEAR packages
require_once('XML/RSS.php');
require_once('Cache/Lite.php');

// name a cache key
$cache_id = 'wotd';

// the data (an array) that we want to cache
$data = array();

// Set a few Cache_Lite options
$options = array(
    'cacheDir' => '/tmp/',                   // must include trailing slash
    'lifeTime' => 7200,                      // cache life in seconds (2 hours)
    'pearErrorMode' => CACHE_LITE_ERROR_DIE  // handy when debugging, but not
                                             // good for a production site
);

// Create a Cache_Lite object
$cache = new Cache_Lite($options);

// Test if there is a valid cache item for this data
if( $data = $cache->get($cache_id) )
{
    // Cache hit
    $data = unserialize($data);
}
else
{
    // Cache miss - fetch and parse the Word of the Day RSS feed
    $rss =& new XML_RSS('http://dictionary.reference.com/wordoftheday/wotd.rss');
    $rss->parse();
    foreach ($rss->getItems() as $item)
    {
        $data['link'] = $item['link'];
        list($data['title'], $tmp) = explode(':', $item['title']);
        // we only need the first item, so jump out here
        break;
    }
    // cache the serialised data under the same id we tried to get() above
    $cache->save(serialize($data), $cache_id);
}

// print out the results of our work!
echo 'Word of the day: <a href="' . $data['link'] . '">' . $data['title'] . '</a>';

?>

So there you have it!

It’s pretty simple really, but the package looks to be quite powerful. I’m sure the other caching package, PEAR Cache, is good too, but its documentation is rather poor :-(

Keyword analysis and HTTP referer - a real eye opener!

I’ve never really known how much traffic this blog or BritBlog gets. With BritBlog, the referrals generated by the BritBlog icons plastered all over people’s blogs cloud the results in Analog, the web analytics software I normally use.

So anyway, I decided to try StatCounter. First impressions were good: it has a clean and intuitive (if somewhat limited) interface, so this was a promising start. After an hour or so, though, BritBlog had completely used up the free log space you get, which meant I couldn’t do any longer-term analysis on things like search terms or referrers. This was a bit of a shame.

Being of a generous disposition though, I upgraded to one of the paid-for packages. It’s actually only a few quid for a month, and if it gives me some insight into how people are using BritBlog (or even this site), then it will be worth it.

One thing that surprised me straight away was the number of sex-related search terms coming through to the member interests pages. I’m not talking a few here - probably over 18% of people arriving from search engines are looking for something to do with sex. Quite an eye opener!

mod_gzip content compression

This morning I’ve been looking at the mod_gzip Apache module, and have had some impressive results!

mod_gzip is, for those of you that don’t know, an Apache module that compresses content before it is sent to the client’s web browser. Compressing content before sending it means there is less data to transfer, which equates to faster download times. The time taken to compress the content at the server end, transfer it, and then decompress it again at the client end is usually significantly shorter than the time taken to send the original uncompressed HTML across the wire.
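mod_gzip only does this for browsers that say they can cope with it: the request carries a header along the lines of

Accept-Encoding: gzip, deflate

and the compressed response comes back flagged with

Content-Encoding: gzip

so older clients that don’t advertise gzip support just get the uncompressed version as before.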

The graph below shows the download time for the London blogs page over at BritBlog, measured at a download speed of 40 Kbps:

Page download speed, showing time at which content compression was enabled.

This morning, at around 11:00 AM, I configured mod_gzip, and you can see straight away the effect it has had on the download time. At first I only enabled compression for HTML content, but as I have a hefty CSS file and a large JavaScript file, I enabled compression for those too. (I know this may cause problems for some browsers - we’ll have to see what happens.)

The following two graphs show the component breakdown for these pages:

Before content compression:
download breakdown before content compression

After content compression:
download breakdown after content compression

As you can see, the text-based content comes down much more quickly. In fact, it knocks nearly two thirds off the download time!

What are the cons?

The downside with mod_gzip is the additional load on the server. The table below shows some of the data that went into plotting the top graph, and you can see that the data start time goes up once mod_gzip is enabled (between 10:28 and 10:38). (Note: my mod_gzip configuration wasn’t correct straight away, which is why the download time came down gradually over the next few tests.)

Time of Test   DNS     Connect   Data Start   Content   Total    Status Code
               (all times in seconds)
10:08:53 0.010 0.003 0.177 11.710 11.900 OK
10:18:53 0.008 0.003 0.191 11.607 11.809 OK
10:22:26 0.012 0.003 0.188 11.583 11.786 OK
10:28:53 0.006 0.001 0.186 11.601 11.794 OK
10:38:52 0.007 0.003 0.229 11.561 11.800 OK
10:47:44 0.017 0.003 0.238 6.752 7.010 OK
10:48:52 0.016 0.003 0.288 6.780 7.087 OK
10:57:59 0.012 0.003 0.240 6.765 7.020 OK
10:58:52 0.005 0.002 0.220 6.742 6.969 OK
11:03:13 0.005 0.002 0.230 4.023 4.260 OK
11:08:52 0.008 0.004 0.246 4.033 4.291 OK
11:18:52 0.010 0.003 0.219 4.096 4.328 OK
11:28:53 0.006 0.001 0.224 4.118 4.349 OK
11:38:52 0.009 0.003 0.245 4.052 4.309 OK
11:48:53 0.018 0.004 0.228 4.028 4.278 OK
11:58:53 0.006 0.001 0.223 4.109 4.339 OK

The data start time on my server isn’t that impressive at the best of times - and that’s without any normal server load. It will be interesting to see what happens once the server goes live. I’m also planning on building some sort of page cache into the site: the pages don’t change that rapidly, so building content on the fly like this isn’t really necessary.

mod_gzip Configuration

I’m not going to go into this in much depth; if you’re using mod_gzip for the first time then this is a useful example configuration file - you should read through it.

The configuration snippet below shows the main part of the mod_gzip configuration. This is the bit that specifies which stuff should get compressed and which shouldn’t. The best candidate for content compression is text, as you can easily knock 60% off the amount of data that needs to be transferred.

The configuration below allows all PHP, HTML, JavaScript and CSS files to be compressed. Images are already compressed, so trying to compress them any further is a waste of time. For this reason, images are excluded from the compression process.

# Filters
mod_gzip_item_include         file       .php$
mod_gzip_item_include         file       .html$
mod_gzip_item_include         file       .js$
mod_gzip_item_include         file       .css$

mod_gzip_item_include         mime       ^text/html
mod_gzip_item_include         mime       ^text/plain
mod_gzip_item_include         mime       ^text/css$
mod_gzip_item_include         mime       ^application/x-javascript$

# don’t compress images
mod_gzip_item_exclude         file       .gif$
mod_gzip_item_exclude         file       .png$
mod_gzip_item_exclude         file       .jpg$

When I was trying to get this to work, I ran into some problems as my PHP and HTML files weren’t being compressed. I did a GET -ed http://www.britblog.com/directory/.../london.html and discovered that the Content-Type header (mime type) was coming back with a charset parameter:

Content-Type: text/html; charset=iso-8859-1

So I removed the $ (which anchors the match at the end of the string) from the mime filter regexps so that they would allow for additional parameters, et voilà!
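For reference, the relaxed mime filters end up looking something like this (the same includes as above, just without the trailing $ anchors):

mod_gzip_item_include         mime       ^text/html
mod_gzip_item_include         mime       ^text/plain
mod_gzip_item_include         mime       ^text/css
mod_gzip_item_include         mime       ^application/x-javascript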

At the moment this is only running on the development site, but I’ll turn it on once we move to the new server.

If you are interested, the download data was captured and plotted by Site Confidence.