Sunday, February 8, 2009

wrapping MYSQL queries in Memcached

One huge opportunity for scalability and speed improvements in almost every web app involves reducing the number of requests for dynamic data that must be directed to the back end database server. A great way to do that (for data that is not transactional) is to put in a memcached server, and use it 'on top' of the Mysql DB.

A memcached server is a brilliant piece of open-source engineering from the fine folks at danga.
Memcached is a distributed hash table. You create a key, composed of text, and give it some data to store with that key. You may then GET the data back by giving memcached the same key. It's that simple. Because the data is always RAM resident, and memcached does no complicated indexing when you add elements, inserting and retrieving elements scales nicely.

The basic idea of wrapping the mysql queries in the memcached is this:
1) When inserting or updating records in the DB, also add a memcached entry, with an expiry period, like say, 1 minute.
2) Then when the app needs to recover information, first check the memcached. If it's there, use it, and do not bother the database. If it's not in the memcached (it expired, r has not been set yet) look in the db for the data, and refer back to item 1) above.

Here is a ruby example that very flexibly stores just about any MySQL query.
not printed here are the mysql handle definition (standard) and the code for the dbupdate! method.
The dbupdate! method is a wrapper for the mysql select and update syntax, so that if a record exists, it will be updated, and if it does not exist, it will be added.
Ok - here are the three total memcached methods:

### Simple memcached methods.
def mchandle(namespace="pimp")
begin
MemCache.new(MEMCACHEDHOSTS, :namespace => namespace, :multithread => true)
rescue MemCache::MemCacheError => theerror
$stderr.print("Error creating memcached handle for this thread #{theerror}")
return nil
end
end
  def mcput(mchandle,key,value)
begin
mchandle.set(key,value,@expiry) #10 sec timeout
rescue MemCache::MemCacheError => theerror
$stderr.print("Error in Persist::mcput #{theerror}")
raise "Persist:mcput: #{theerror}"
end
end

def mcget(mchandle,key)
begin
mchandle[key]
rescue MemCache::MemCacheError => theerror
$stderr.print("Error in Persist::mcget #{theerror}")
raise "Persist:mcget: #{theerror}"
end
end


So - all that is required is a handle to the memcached, and a get method and a put method.
Now here are the two corresponding Mysql methods: wrapget and wrapput.



def wrapget(mchandle,dbhandle,table,what,where)
# totally generic, no set db or ms.
# On aDB hit, convert the result to an array of rows, each row is a hash of columns, in string format, to match how db results are
# cached in the wrapput, like [{val=>"one",uri=>"two"}] and insert in the memcached.
mckey = (where.gsub(/AND|and/,'') +'_'+ what +'_'+ table ).gsub(/[^a-zA-Z0-9_]/,'')
#$stderr.print("\n in wrapget, the mckey is #{mckey} (where_what_table)")
result = mcget(mchandle,mckey)
if(result != nil)
$stderr.print("\ncache HIT: #{mckey}")
else
$stderr.print("\ncache MISS: #{mckey}")
result = dbselect(dbhandle,table,what,where)
if(result.num_rows == 0)
$stderr.print("\ndb MISS: #{mckey}")
else #cache the hit for next time.
$stderr.print("\ndb HIT: #{mckey}")
cache_result = Array.new
result.each_hash{|x| cache_result.push(x)}
mcput(mchandle,mckey,cache_result)
result = cache_result #replace the result to pass out.
end
end
result #return result
end

def wrapput(mchandle,dbhandle,table,set,where)
# totally generic, no set db or ms.
# to make the mckey, strip the values from set, so that hval=123,hkey="shiz" gets condensed to hvalhkey, just for the purposes of setting the mc key.
# to memcached, insert an array of hash rows based on the sql, so "SET val= 'one', uri='two'" becomes [{val=>"one",uri=>"two"}]

mckey = (where.gsub(/AND|and/,'') +'_'+ set.gsub(/=.*?,|=.*$/,'') +'_'+ table).gsub(/[^a-zA-Z0-9_]/,'')
#$stderr.print("\n in wrapput, the mckey is #{mckey}")
dbupdate!(dbhandle,table,set,where)
set_hash = {}
set.split(',').each{|item| kv = item.split('='); set_hash.store(kv[0],kv[1])}
mcput(mchandle,mckey,[set_hash])
#$stderr.print("\n wrapput - here's the hash we are storing in the mc: #{[set_hash]}, it is class: #{[set_hash].class} ")
end






One interesting thing to note is that I convert both the insert/update and the select from a MYSQL::RESULT to an Array of hashes like this [{name=>"anthony"}{name=>"musetta"}] which mimics the structure of the mysql result set. This makes the cached result more consistent. And you could read the caches with nodes that have not included the 'mysql.rb' driver, and don't have the compiled mysql client.

No comments:

Post a Comment