One huge opportunity for scalability and speed improvements in almost every web app involves reducing the number of requests for dynamic data that must be directed to the back-end database server. A great way to do that (for data that is not transactional) is to put in a memcached server and use it 'on top' of the MySQL DB.
A memcached server is a brilliant piece of open-source engineering from the fine folks at Danga Interactive.
Memcached is a distributed hash table. You create a key, composed of text, and give it some data to store with that key. You may then GET the data back by giving memcached the same key. It's that simple. Because the data is always RAM-resident, and memcached does no complicated indexing when you add elements, inserting and retrieving elements scales nicely.
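If you have never touched it, the whole API really is just set and get. Here is a minimal sketch using the memcache-client gem (the same MemCache class used below), assuming a local memcached running on the default port 11211:

require 'memcache'                          # memcache-client gem

cache = MemCache.new('127.0.0.1:11211')     # assumed: one local memcached on the default port
cache.set('greeting', 'hello world', 60)    # store a value under a text key for 60 seconds
puts cache.get('greeting')                  # => "hello world" (nil once the entry expires)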
The basic idea of wrapping the MySQL queries with memcached is this:
1) When inserting or updating records in the DB, also add a memcached entry, with an expiry period of, say, 1 minute.
2) When the app needs to recover information, first check memcached. If it's there, use it, and do not bother the database. If it's not in memcached (it expired, or has not been set yet), look in the DB for the data, and refer back to item 1) above.
Here is a Ruby example that very flexibly stores just about any MySQL query. Not printed here are the MySQL handle definition (standard) and the code for the dbupdate! method. The dbupdate! method is a wrapper for the MySQL select and update syntax, so that if a record exists it will be updated, and if it does not exist it will be added.
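Those DB helpers are not part of this post, but to give a feel for them, here is a hypothetical sketch of what they might look like, assuming the ruby 'mysql' driver and the same table/what/set/where string arguments used in the methods below (wrapget further down also calls a dbselect helper in this style):

# Hypothetical sketches of the unprinted DB helpers -- not the author's actual code.
def dbselect(dbhandle, table, what, where)
  dbhandle.query("SELECT #{what} FROM #{table} WHERE #{where}")
end

def dbupdate!(dbhandle, table, set, where)
  if dbhandle.query("SELECT 1 FROM #{table} WHERE #{where} LIMIT 1").num_rows > 0
    dbhandle.query("UPDATE #{table} SET #{set} WHERE #{where}")   # record exists: update it
  else
    dbhandle.query("INSERT INTO #{table} SET #{set}")             # record missing: add it
  end
end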
OK - here are the three memcached methods:
def mchandle(namespace="pimp")
  begin
    # One handle per thread; :multithread makes the client safe to use across threads.
    MemCache.new(MEMCACHEDHOSTS, :namespace => namespace, :multithread => true)
  rescue MemCache::MemCacheError => theerror
    $stderr.print("Error creating memcached handle for this thread #{theerror}")
    return nil
  end
end
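MEMCACHEDHOSTS is not defined in the snippet above; it is just the list of memcached servers the client should talk to. Something along these lines, assuming a single local instance on the default port:

require 'memcache'                        # memcache-client gem
MEMCACHEDHOSTS = ['127.0.0.1:11211']      # assumed: one local memcached on the default port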
def mcput(mchandle, key, value)
  begin
    # @expiry is the cache lifetime in seconds (e.g. 60 for the one-minute window described above).
    mchandle.set(key, value, @expiry)
  rescue MemCache::MemCacheError => theerror
    $stderr.print("Error in Persist::mcput #{theerror}")
    raise "Persist:mcput: #{theerror}"
  end
end
def mcget(mchandle, key)
  begin
    mchandle[key]   # [] is shorthand for get; returns nil on a cache miss
  rescue MemCache::MemCacheError => theerror
    $stderr.print("Error in Persist::mcget #{theerror}")
    raise "Persist:mcget: #{theerror}"
  end
end
So - all that is required is a handle to memcached, a get method, and a put method.
Now here are the two corresponding MySQL methods: wrapget and wrapput.
def wrapget(mchandle, dbhandle, table, what, where)
  # Build a cache key from the query pieces, reduced to safe characters.
  mckey = (where.gsub(/AND|and/,'') + '_' + what + '_' + table).gsub(/[^a-zA-Z0-9_]/,'')
  result = mcget(mchandle, mckey)
  if result != nil
    $stderr.print("\ncache HIT: #{mckey}")
  else
    $stderr.print("\ncache MISS: #{mckey}")
    # Not in the cache: fall back to MySQL.
    result = dbselect(dbhandle, table, what, where)
    if result.num_rows == 0
      $stderr.print("\ndb MISS: #{mckey}")
    else
      $stderr.print("\ndb HIT: #{mckey}")
      # Convert the Mysql::Result to an array of hashes before caching it.
      cache_result = Array.new
      result.each_hash { |x| cache_result.push(x) }
      mcput(mchandle, mckey, cache_result)
      result = cache_result
    end
  end
  result
end
def wrapput(mchandle, dbhandle, table, set, where)
  # Same key scheme as wrapget: where clause + column names + table, safe characters only.
  mckey = (where.gsub(/AND|and/,'') + '_' + set.gsub(/=.*?,|=.*$/,'') + '_' + table).gsub(/[^a-zA-Z0-9_]/,'')
  # Write to the database first, then refresh the cache entry.
  dbupdate!(dbhandle, table, set, where)
  set_hash = {}
  set.split(',').each { |item| kv = item.split('='); set_hash.store(kv[0], kv[1]) }
  mcput(mchandle, mckey, [set_hash])
end
One interesting thing to note is that I convert both the insert/update and the select from a Mysql::Result to an array of hashes, like [{"name"=>"anthony"}, {"name"=>"musetta"}], which mimics the structure of the MySQL result set. This makes the cached result more consistent, and it means you could read the caches from nodes that have not required the 'mysql.rb' driver and don't have the compiled MySQL client.
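Putting it all together, a call sequence looks roughly like this (a sketch only: the users table, its columns, and dbhandle are made-up stand-ins for whatever MySQL connection and schema your app already has):

@expiry = 60                                  # one-minute cache window used by mcput
mc = mchandle()
# Write (or update) a record and prime the cache in one call.
wrapput(mc, dbhandle, 'users', "name='anthony',city='brooklyn'", "id=42")
# Reads inside the expiry window come straight from memcached, not MySQL.
rows = wrapget(mc, dbhandle, 'users', 'name,city', "id=42")
rows.each { |row| puts row['name'] }          # each row is a plain hash of column => value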