Subscribe RSS

Archive for October, 2008

Database Primer – Part 3 Oct 25
She didnt get it till she read this

The moment the left join becomes clear

Part 1 | Part 2 | Part 3

Mastering the LEFT join

The left join is very powerful, very useful and very under used. In my experience few people have even heard of the left join, let alone used one. The left join comes in handy when you are trying to join tables together where you have unreliable joins or where you are managing subscriptions between two objects. A similar result to the left join can be achieved programatically using nested queries. I’ve seen many a PHP script that uses nested loops of SQL queries to accomplish a similar result. However performing these joins programtically is of course far slower than using a database that has been specially designed for the purpose. Using a query is simpler too, once you know how it works.

As with the standard join, its important to know when a left join is applicable. There ae many situations where the use of a left join is applicable, but i’ll just explain the simple, most obvious and most common ones I come across almost daily.

Unreliable Joins

This is a term I have made up. Please feel free to suggest a more technically correct word but this does it for me. An unreliable join is where we have a relationship between two objects but the relationship is optional – it does not always exist. Its ‘unreliable’.

more…

Category: SQL, Tutorials  | Tags: , , ,  | Leave a Comment
Database Primer – Part 2 Oct 25
She didnt get it till she read this

She really can fly you know

Part 1 | Part 2 | Part 3

Mastering the table join

In this tutorial I try to explain one of the most simplest and useful mechanisms available to you in a relational SQL database: the standard Join (also called Inner Join).

Querying the Tables

To connect these tables together we need to do a query. More specifically a query with a ‘join’ between the tables. The ‘join’ describes how we wish to relate the tables together.

As a simple example, consider you wanted to list all of the babes from the club. You could simply do this with the following query:

select name from surfer where gender_id=1;

This does the job, but it relies on us knowing the unique id of the gender ‘babe’ (which is ‘1′). A join allows us to accomplish this this using the gender table as a lookup.

select s.name, g.gender from surfer as s join gender as g where s.gender_id=g.id and g.gender="Babe";

Returns:

+--------+--------+
| name   | gender |
+--------+--------+
| Sheila | Babe   |
| Kylie  | Babe   |
| Danni  | Babe   |
+--------+--------+

This SQL statment introduces a number of syntax elements. The tables gender and surfer are aliased. e.g. “… surfer as b…”. This means we can reference the fields in the table surfer by simply using the letter ‘b’. You may use the table name but this can get confusing later on eg. “b.name” == “surfer.name”.

n.b. Its good practice, especially with large queries to use table aliasing. Although MySQL allows you to omit the table/alias name, it will bitch if you reference a field that appears in both tables as it won’t know which one you want.

more…

Database Primer – Part 1 Oct 25
She didnt get it till she read this

This is Sheila - shes fit as well as mental

Part 1 | Part 2 | Part 3

Building a relational database

Databases are extremely powerful data storage mechanisms, I use them everyday. A lot of people have extremely limited experience with relational databases, using them as they would an Excel document. This sequnce of posts aims to demonstrate how easy they are to use and how best to go about converting an everyday scenario into a useful database that can support your web site or application.

Part 1: Creating a database from a scenario

I believe it is always better to first PLAN what it is you are trying to achieve, so first we’ll construct a reason to have a database in the first place. [If you know how to normalise data then jump to here and skip the scenario]

The Scenario

Okay, we are going to model our Surf Shack Member’s Club, storing details about the members within the club. We’re gonna convert this into a normalised set of data tables and then use those tables in a few queries.

This is the data (a list of surfers, their gender and their skill level), as it might be presented scribbled onto a piece of paper:

BABES

Sheila – Mental

Kylie  - Spectator

Danni  - Advanced

DUDES

Hank   – Spectator

Brad   – Beginner

Joey   – Mental

Tom    - Professional

Jason  - Professional

more…

Category: SQL, Tutorials  | Tags: , ,  | Leave a Comment
Doodlebugged Oct 25
She didnt know the effect of a doodlebug either

Kylie didnt know the effect of a doodlebug either

The effect of a Google Doodle on a web server

A site I work on recently had the honour of being specifically targeted by a (UK) Google Doodle.  Fortunately we had some notice from Google and the client that the site was going to be the subject of some increased traffic so we decided to take some measures to ensure the site performed as well as possible.

Unfortunately, this was harder to plan for than we imagined.  Getting some raw data about the effect of being Google Doodled was unusually difficult. We asked Google, we asked similar companies recently Doodled, we even asked friends and family! No one could tell us what would happen. The data was either ‘confidential’ or unavailable.

In the end we decided to err on caution and do what we could within budget and time to prepare for the potential onslaught (or damp squib) that would result.

The website in question shall remain anonymous. It has around 250 pages, relatively graphic heavy in design but nothing over the top. Its a nice friendly information based web site (no video or large media).

A normal day for this website, according to its Analytics, generates around 1000+ unique visits, around 6500+ page views (more detailed figures below).

In preperation for the Google Doodle we decided to not only put the site on its own server, but move it into our newer co-lo hosting facility. The new web server is nothing particularly special. Dual core, black, hums a bit with some pretty flickering lights on the front, runs Linux/Apache. However, it is empty with no other sites running on it, so a good option for measuring the effect of the Doodle. The co-lo hosting environment has standard 5MB connection to a core backbone, which is burstable should the need arise to some stupidly high level.

The Day Arrives

The day finally arrived and the site was ready. Bring it on. Clicking the Google logo took you to a search result page, with our site appearing first in the list plus occupying a number of other positions on the first page. Therefore it takes users two clicks to get from the Doodle to the site.

Traffic increase was immediate, by around midday we started to notice a little connection latency. The apache server status report showed that all the apache connections were being used, so we upped the available spare ’servers’ from 256 to 1500. This had an immediate effect with the site responding normally.

The server status screen grab below was from about 3pm, the data here is a cumulative average from when we applied the restart just over 2 hours previously.

The apache server status log after a couple of hours

The apache server status log after a couple of hours

This screen gives a good snapshot of how the site is bearing up. In the last 2 hours 22 minutes:

  • The server has shifted 12GB
  • The CPU load by apache is 3%
  • The site is receieving 170 requests a second, almost 1.5 million accesses.
  • Its delivering a throughput of 1.4 MB/s (we saw this increase to 1.8MB/s later in the day)
  • 527 simultaneous requests are being processed (This rose to a maximum of about 650)

Over the day as a whole, traffic steadily increased well into late afternoon. At 6pm the server was still processing around 600 connections with a throughput of 1.8MB/s. Traffic ceased almost immediately once the Doodle was removed at the end of the day.

Analysis

The final statistcs for the day make for some interesting reading (see below). According to Analytics the site had over 191 thousand visits, thats a thousand percent increase on a normal day. The site shifted just over 100GB of traffic during the day. Fortunately the bandwidth seemed to cope more than adequately with the demand and after increasing the spare server allocation to Apache the server managed to deliver the site without problems and without getting too hot under the collar.

Hopefully, if your site is subject to a Google Doodle, this information may help you plan to avoid an outtage.

Statistics for the Day:

  • Site visits reported by Analytics: 191,491 (compare to 1073 the week before, thats a THOUSAND PERCENT increase)
  • Total ‘accesses’: ~10 million
  • Requests per second: up to ~200/second
  • Data transfer rate: up to ~1.8MB/second (roughly 6GB/hour over the busier part of the day)
  • Total data transferred: ~104GB
  • Requests being processed by apache: up to ~650 simultaneous
  • Apache CPU load: up to ~3%
  • Apache access log file, which logs each request made to the server: 2.9GB

Analytics Data for Comparison:

Visits to site on a ‘normal’ day:

  • 1,073 Visits
  • 6,429 Pageviews
  • 5.99 Pages/Visit
  • 33.74% Bounce Rate
  • 00:03:47 Avg. Time on Site
  • 83.88% % New Visits

Visits to site on Google Doodle day:

  • 191,491 Visits
  • 476,196 Pageviews
  • 2.49 Pages/Visit
  • 61.52% Bounce Rate
  • 00:00:52 Avg. Time on Site
  • 96.51% % New Visits
Category: Apache  | Tags: ,  | Leave a Comment