
Sunday, February 8, 2009

wrapping MySQL queries in Memcached

One huge opportunity for scalability and speed improvements in almost every web app is reducing the number of requests for dynamic data that must go to the back-end database server. A great way to do that (for data that is not transactional) is to put in a memcached server and use it 'on top' of the MySQL DB.

A memcached server is a brilliant piece of open-source engineering from the fine folks at danga.
Memcached is a distributed hash table. You create a key, composed of text, and give memcached some data to store under that key. You may then GET the data back by presenting the same key. It's that simple. Because the data is always RAM-resident, and memcached does no complicated indexing when you add elements, inserting and retrieving elements scales nicely.
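
To make that concrete, here's roughly what a set/get round trip looks like with the memcache-client gem (the server address, key, and value here are just for illustration):

require 'memcache' # the memcache-client gem

cache = MemCache.new('localhost:11211')          # a hypothetical local memcached
cache.set('user_42', {'name' => 'anthony'}, 60)  # store the value under the key for 60 seconds
cache.get('user_42')                             # => {"name"=>"anthony"}, or nil once it expires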

The basic idea of wrapping the MySQL queries in memcached is this:
1) When inserting or updating records in the DB, also add a memcached entry, with an expiry period of, say, 1 minute.
2) When the app needs to read information, first check memcached. If the data is there, use it, and do not bother the database. If it's not in memcached (it expired, or has not been set yet), look in the db for the data, and refer back to item 1) above.

Here is a ruby example that very flexibly caches just about any MySQL query.
Not printed here are the mysql handle definition (standard) and the code for the dbupdate! method.
The dbupdate! method is a wrapper for the MySQL select and update syntax: if a record exists, it will be updated, and if it does not exist, it will be added.
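
If you want to follow along, here's a minimal sketch of what dbselect and dbupdate! could look like with the classic ruby mysql gem - these are stand-ins I'm assuming into existence, not the original methods:

# Stand-in sketches, not the originals, using the classic mysql gem API.
def dbselect(dbhandle, table, what, where)
  dbhandle.query("SELECT #{what} FROM #{table} WHERE #{where}")
end

def dbupdate!(dbhandle, table, set, where)
  # update the record if it exists, otherwise insert it
  if dbhandle.query("SELECT 1 FROM #{table} WHERE #{where} LIMIT 1").num_rows > 0
    dbhandle.query("UPDATE #{table} SET #{set} WHERE #{where}")
  else
    dbhandle.query("INSERT INTO #{table} SET #{set}")
  end
end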
Ok - here are all three memcached methods:

### Simple memcached methods.
def mchandle(namespace = "pimp")
  begin
    MemCache.new(MEMCACHEDHOSTS, :namespace => namespace, :multithread => true)
  rescue MemCache::MemCacheError => theerror
    $stderr.print("Error creating memcached handle for this thread #{theerror}")
    return nil
  end
end

def mcput(mchandle, key, value)
  begin
    mchandle.set(key, value, @expiry) # expires after @expiry seconds
  rescue MemCache::MemCacheError => theerror
    $stderr.print("Error in Persist::mcput #{theerror}")
    raise "Persist:mcput: #{theerror}"
  end
end

def mcget(mchandle, key)
  begin
    mchandle[key] # returns nil on a cache miss
  rescue MemCache::MemCacheError => theerror
    $stderr.print("Error in Persist::mcget #{theerror}")
    raise "Persist:mcget: #{theerror}"
  end
end


So - all that is required is a handle to the memcached, a get method, and a put method.
Now here are the two corresponding MySQL wrapper methods: wrapget and wrapput.



def wrapget(mchandle, dbhandle, table, what, where)
  # Totally generic: no fixed db or mc.
  # On a DB hit, convert the result to an array of rows, each row a hash of
  # columns in string format, to match how db results are cached in wrapput,
  # like [{"val"=>"one","uri"=>"two"}], and insert it in the memcached.
  mckey = (where.gsub(/AND|and/, '') + '_' + what + '_' + table).gsub(/[^a-zA-Z0-9_]/, '')
  #$stderr.print("\n in wrapget, the mckey is #{mckey} (where_what_table)")
  result = mcget(mchandle, mckey)
  if result != nil
    $stderr.print("\ncache HIT: #{mckey}")
  else
    $stderr.print("\ncache MISS: #{mckey}")
    result = dbselect(dbhandle, table, what, where)
    if result.num_rows == 0
      $stderr.print("\ndb MISS: #{mckey}")
    else # cache the hit for next time
      $stderr.print("\ndb HIT: #{mckey}")
      cache_result = Array.new
      result.each_hash { |x| cache_result.push(x) }
      mcput(mchandle, mckey, cache_result)
      result = cache_result # replace the result to pass out
    end
  end
  result # return the result
end

def wrapput(mchandle, dbhandle, table, set, where)
  # Totally generic: no fixed db or mc.
  # To make the mckey, strip the values from set, so that hval=123,hkey="shiz"
  # gets condensed to hvalhkey, just for the purposes of setting the mc key.
  # Into memcached, insert an array of hash rows based on the sql, so
  # "SET val='one', uri='two'" becomes [{"val"=>"one","uri"=>"two"}].
  mckey = (where.gsub(/AND|and/, '') + '_' + set.gsub(/=.*?,|=.*$/, '') + '_' + table).gsub(/[^a-zA-Z0-9_]/, '')
  #$stderr.print("\n in wrapput, the mckey is #{mckey}")
  dbupdate!(dbhandle, table, set, where)
  set_hash = {}
  set.split(',').each { |item| kv = item.split('='); set_hash.store(kv[0], kv[1]) }
  mcput(mchandle, mckey, [set_hash])
  #$stderr.print("\n wrapput - here's the hash we are storing in the mc: #{[set_hash]}, it is class: #{[set_hash].class} ")
end
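
To see the two wrappers working together, here's a hedged usage sketch. The table, columns, and connection constants are made up for illustration, and the methods are assumed to live somewhere @expiry is set:

# Hypothetical usage - names and constants here are illustrative only.
mc = mchandle("pimp")
db = Mysql.real_connect(DBHOST, DBUSER, DBPASS, DBNAME)
@expiry = 60 # cache entries live for one minute

# write-through: update the db and prime the cache in one call
wrapput(mc, db, 'users', "val='one',uri='two'", "id=42")

# read: served from memcached until the entry expires, then from MySQL
rows = wrapget(mc, db, 'users', 'val,uri', "id=42")
rows.each { |row| puts row['val'] }

Note that both calls reduce to the same key (id42_valuri_users here), which is what lets a wrapput prime the cache for a later wrapget.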






One interesting thing to note is that I convert both the insert/update and the select from a Mysql::Result to an array of hashes, like [{"name"=>"anthony"},{"name"=>"musetta"}], which mimics the structure of the mysql result set. This makes the cached result more consistent. And you could read the caches from nodes that have not included the 'mysql.rb' driver and don't have the compiled mysql client.

introduction

I'm Anthony, and this is my scalability blog. I have a few projects I want to write about as I work on them - maybe I can connect with other people doing similar work, and we can help each other solve problems.

I am primarily interested these days in building web-scale distributed systems using pure Ruby, MySQL, and Memcached. I have two big opportunities to push the scalability envelope:

1) I am tasked at the company I work for (http://www.mogreet.com/) with building out and scaling up the SMS/MMS video delivery platform I architected and wrote (with a lot of help from Blake, and some pointers from Aber). This involves buying a lot of hardware - servers, databases, load balancers - to run more of the existing secret_sauce nodes; secret_sauce is the mogreet application server and messaging gateway, written in pure ruby. Over the next months I'll be designing a setup for redundant, high-availability MySQL servers, and working out an application-server-level load balancing solution, perhaps from the nice folks at F5.

2) My personal project, proceeding slowly on weekends, is the p-server. The p-server is a different architectural approach to scaling up a pure ruby app-server, although it could also be deployed as a cloud of cheap nodes. The basic idea of the p-server is to move the relevant portions of the web server into the app server - rather than vice-versa - and use ruby threads to manage a large number (1000? 10,000? 100,000?) of concurrent HTTP requests, which will generally be RESTful queries that the p-server will parse and respond to. The p-server is designed to be a real-time game server, so it has to handle large numbers of concurrent players who are all interacting with the system via different user interfaces: social network apps, web pages, iphones, etc.
The p-server has a producer-consumer thread model, where each incoming request is passed to a consumer thread, which runs for a limited number of passes or a limited period of time.
These consumer threads re-use connections to any number of MySQL databases, each wrapped in a Memcached distributed memory cache. Initial tests (on a laptop) show that the prior architecture I developed for the secret_sauce can handle about 20 connections a second per node when performing logic that involves database action (and it is massively distributed); the new p-server architecture handles 400-500 connections a second. And there is plenty of room for improvement: a faster Memcache client, supporting keep-alives, etc.
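As a rough illustration of that producer-consumer thread model - a minimal sketch, not the p-server itself; the port, pool size, and canned response are made up:

require 'socket'
require 'thread'

# Minimal producer-consumer sketch: the main thread accepts connections
# and feeds a queue; a fixed pool of consumer threads answers the requests.
queue = Queue.new

50.times do
  Thread.new do
    loop do
      client = queue.pop            # block until the producer hands us a socket
      request_line = client.gets    # e.g. "GET /players/42 HTTP/1.1"
      body = "ok"                   # real logic would consult memcached, then MySQL
      client.print("HTTP/1.1 200 OK\r\nContent-Length: #{body.size}\r\n\r\n#{body}")
      client.close
    end
  end
end

server = TCPServer.new(8080)
loop { queue.push(server.accept) }  # producer: accept and enqueue each connection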
My plan is to make the p-server available to other people as open source software, as soon as it's together enough, and documented enough, to be useful. The p-server relies on mysql and memcached, so I'll probably put it all together in a little software distro, compiled for Mac OS X and FreeBSD, to make it easy for people to try out.