Working with large MySQL tables

October 14, 2009

I’m working with Global Affiliate Network right now, and one of the big problems we’re starting to run into is how to work with the sheer volume of leads that we have to store.  It’s early days yet, so we have to find a solution to this as the volume of data is only going to grow, even with sensible archiving going on.  The good news is that if huge operations like Google and Yahoo can run their many billions of records on MySQL, I know we have the right underlying technology!
So – the plan is to take things operation by operation and optimize each one, both in terms of the table data and the way we set up the queries…
One of the things we do is to run programs from other networks through our network.  With this setup, we have to be able to synchronize our database to show which leads were accepted by the other network in order to show commissions due, etc.  So each click is tagged with a unique ID in our database and in the receiving network, then I periodically pull a report from the receiving network, as an XML or CSV, to update our database.
All of our transactions are in one big table, currently some 35,000 records (so not massive by MySQL standards) and the report from the outside network we’re working on is pulled into a second table (currently about 7,500 records).  So there’s a few steps here:

  1. Pull the report from the receiving network
  2. Parse that report and add it to a data table just for that network (each network returns different information, and there’s only a few to work with, so this is probably better than trying to normalise and combine all the data into one table) – making sure we only add the NEW records.
  3. Run through our transaction table and update the status of each transaction matching the unique ID’s stored in that network table.

My first thought was that pulling the file was taking the time, but inserting a few timestamp echo statements into this file quickly showed that the first two steps were typically completed within 2-3 seconds.  The last step looked like this:
$sql = "update partners_transaction set piggyback_approved=1, transaction_status='pending' where transaction_status='held' AND unique_id in (select UID from net_trans WHERE NOT UID='');";
if (!$result = mysql_query($sql,$db)) die ("Could not update partners_transactions ".$sql);

Nice and simple, and all in one query – I even made sure that the main table was indexed on the key fields, but this single query was totally killing my database, even with only a few records to update.  After much work with the tables and googling MySQL experts, I finally ended up with the following solution – which seems totally counter intuitive to me:

  1. Read the list of UID’s from net_trans into an array
  2. Loop through the array, updating the matching record in partner_transaction as a single query each time

It seems to me that this requires far more queries, but apparently this is a MUCH quicker way to do things!  I’d appreciate any feedback from MySQL experts out there, but the final code looks like this:
//Get list of unique ID's:
$UID_list = array();
$sql = "select UID from trans;";
$result = mysql_query($sql, $db) or die(mysql_error());
while($row=mysql_fetch_assoc($result)) {array_push($UID_list, $row['TransactionId']);}

echo stamp()." Pulled list of UIDs";

$i=1; //Use as loop counter
foreach ($UID_list as $uid) {
$sql = "update partners_transaction set approved=1, transaction_status='pending' where unique_id='{$uid}' and transaction_type='sale' limit 1";
$result = mysql_query($sql,$db) or die ("Could not update partners_transactions ".$sql);
$i++;
}

echo stamp()." Transaction update complete - {$i} updates";

This solution processes almost 8,000 records in a total of 8 seconds, with negligible loading on the system. I’ll be limiting things more by selecting only the UID’s from the last few days – no need to really go back much more than 24 hours, but this solution already solves a huge load – and may give me the pointers to optimizing the other queries!

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: