Database Sharding : dbShards vs. Amazon AWS RDS

A friend was recently asking about our backend database systems.  Our systems are able to successfully handle high-volume transactional traffic through our API coming from various customers, having vastly different spiking patterns,  including traffic from a site that’s in the top-100 list for highest traffic on the net.   Don’t get me wrong; I don’t want to sound overly impressed with what our little team has been able to accomplish, we’re not perfect by any means and we’re not talking about Google or Facebook traffic levels.  But serving requests to over one million unique users in an hour, and doing 50K database queries per second isn’t trivial, either.
I responded to my friend along the following lines:
  1. If you’re going with an RDBMS, MySQL is the right, best choice in my opinion.  It’s worked very well for us over the years.
  2. Since you’re going the standard SQLroute:
    1. If your database is expected to grow in step with traffic, and you’re thinking about sharding early – kudos.  You’re likely going to have to do it, sooner or later.
      1. Sooner vs. later if you’re in the cloud and running under its performance constraints.
      2. Do it out of the gate, if you have time, after you’ve figured out how you’re going to do it (i.e. whether you’re going to leverage a tool, DYI, etc).
        1. In other words, if you have time, don’t “see how long you can survive, scaling vertically”.
          1. Sharding while running the race : not a fun transition to make.
      3. I realize what I’m saying is counter to popular thinking, which is “don’t shard unless you absolutely have to”.
        1.  Without the assumption that your data set is going to grow in step with your traffic, I’d be saying the same thing.
    2. Designing your schema and app layer for sharding, sharded on as few keys as possible, ideally just one, is not future-proofing, it’s a critical P0.
  3. Since you’re going to be sharding MySQL, your options are relatively limited last I checked.
    1. Ask for input from folks who have done it before.
    2. The other sharding options I started looking at over two years ago all had disallowing limitations, given our business model.
    3. At quick search-glance just now, it also does appear that dbShards is ruling this space at this point.
  4. So barring any other options I’m missing, your best options that I’m aware of:
    1. dbShards
      1. Definitions we/they use, to help clarify discussion  :
        1. global tables : tables that contain the same data on every shard, consistency managed by dbShards.
        2. shard : two (primary and secondary) or more hosts that house all global table data, plus any shard-specific data.
        3. shard tree : conceptually, the distribution of sharded data amongst nodes, based on one or more shard keys.
        4. reliable replication : dbShards proprietary replication, more details on this below.
      2. pros
        1. The obvious : you’ll be able to do shard-count more reads and writes that you’d otherwise be able to do with a monolithic, non-sharded backend (approximately).
          1. Alternatively, with a single-primary read-write or write-only node, and multi-secondary read-only nodes you could scale reads to some degree.
            1. But be prepared to manage the complexities that come along with eventual read-consistency, including replication-lag instrumentation and discovery, beyond any user notifications around data not being up-to-date (if needed).
        2. It was built by folks who have only been thinking about sharding and its complexities, for many years
          1. who have plans on their roadmap to fill any gaps with their current product
            1. gaps that will start to appear quickly, to anyone trying to build their own sharding solution.
              1. In other words, do-it-yourself-ers will at some point be losing a race with CodeFutures to close the same gaps, while already trying to win the race against their market competitors.
        3. It’s in Java, vs. some other non-performant or obscure (syntactically or otherwise) language.
        4. It allows for multiple shard trees; if you want (or have to) trade in other benefits for sharding on more than one key, you can.
          1. Benefits of just sharding on one key include, amongst other things, knowing that if you have 16 shards, and one is unavailable, and the rest of the cluster is available, 1/16th of your data is unavailable.
            1. With more than one shard tree, good luck doing that kind of math.
        5. It provides a solution for the auto-increment or “I need unique key IDs” problem.
        6. It provides a solution for the “I need connection pooling that’s balanced to shard and node count” problem.
        7. It provides a solution for the “I want an algorithm for balancing shard reads and writes”.
          1. Additionally, “I want the shard key to be based on a column I’m populating with the concatenated result of two other string keys”.
        8. It has a distributed-agent architecture, vs. being deeply embedded (e.g. there are free-standing data streaming agents, replication agents, etc instead of MySQL plugins, code modules, etc ).
          1. Provides future-proofing, scalability and plug-ability.
          2. Easier to manage than other design approaches.
        9. Streaming agents allow you to plug into the update/insert stream, and do what you like with changes to data.
          1. We use this to stream data into Redis, amongst other things.  Redis has worked out very well for us thus far, by the way.
          2. Other dbShards customers use this to replicate to other DBMS engines, managed by dbShards or not, such as a column store like MonetDb, InfoBright, even a single standalone MySQL server if it can handle the load.
        10. It supports consistent writes to global tables; when a write is done to a global table, its guaranteed to have been done on all global tables.
        11. It doesn’t rely on MySQL’s replication and its shortcomings, but rather on its own robust, low-maintenance and flexible replication model.
        12. Its command-line console provides a lot of functionality you’d rather not have to build.
          1. Allows you to run queries against the shard cluster, like you were at the MySQL command line.
          2. Soon they’re releasing a new plug-compatible version of the open source MyOSP driver, so we’ll be able to use the same mysql command line to access both dbShards and non-dbShards managed MySQL databases.
        13. Its web console provides a lot of functionality you’d rather not have to build.
          1. Agent management and reporting, including replication statistics.
          2. Displays warning, error, diagnostic information, and graphs query counts with types.
          3. Done via the “dbsmanage” host, which provides centralized shard node management as well.
        14. It’s designed with HA in mind.
          1. Each shard is two (or optionally more, I think) nodes.  We put all primary nodes in one AWS availability zone, secondaries in a different one, for protection against zone outages.
          2. Write consistency to two nodes; in other words DB transactions only complete after disk writes have completed on both nodes.  Secondary writes only require file-system (vs. MySQL) disk writes.
          3. Managed backups with configurable intervals; MySQL EC2/EBS backups aren’t trivial.
          4. Web-console based fail-over from primary to secondary; this is very helpful, particularly for maintenance purposes.
        15. Proven to work well in production, by us and others.
          1. We’ve performed 100K queries per second in load-testing, on AWS/EC2, using m1.xlarge instances.
        16. Designed with the cloud and AWS in mind, which was a great fit for us since we’re 100% in AWS.
        17. “dbsmanage” host
        18. Drivers included, of course.
          1. In addition to MyOSP, they have JDBC, PQOSP (native Postgres), ADO OSP (for .NET), and soon ODBC.
        19. Go-fish queries allow you to write standard SQL against sharded data
          1. e.g. sharded on user.id : SELECT * FROM user where FirstName=’Foo’;
            1. will return all results from all shards performing automatic aggregation
              1. sorting using a streaming map-reduce method
        20. Relatively easy to implement and go live with; took us about six weeks of hard work, deadline-looming.
        21. It’s the market-leading product, from what I can tell.
          1. 5 of the Top 50 Facebook apps in the world run dbShards.
        22. It supports sharding RDBMSs besides MySQL, including Postgres, DB2, SQL Server, MonetDb, others coming.
        23. Team : top-notch, jump-through-their-butts-for-you, good guys.
        24. Ability to stream data to a highly performant BI backend.
      3. cons
        1. As you can see, some of these are on the pro list too, double-edged swords.
        2. Cost – it’s not free obviously, nor is it open source.
          1. Weigh the cost against market opportunity, and/or the additional headcount required to take a different approach.
        3. It’s in Java, vs. Python (snark).  Good thing we’ve got a multi-talented, kick-ass engineer who is now writing Java plugins when needed.
        4. Doesn’t rely on MySQL replication, which has its annoyances but has been under development for a long time.
          1. Nor is there enough instrumentation around lag.  What’s needed is a programmatic way to find this out.
        5. Allows for multiple shard trees.
          1. I’m told many businesses need this as a P0, and that might be true, even for us.
          2. But I’d personally prefer to jump through fire in order to have a single shard tree, if at all possible.
            1. The complexities of multiple shard trees, particularly when it comes to HA, are too expensive to justify unless absolutely necessary, in my humble opinion.
        6. Better monitoring instrumentation is needed, ideally we’d have a programmatic way to determine various states and metrics.
        7. Command line console needs improvement, not all standard SQL is supported.
          1. That said, we’ve managed to get by with it, only occasionally using it for diagnostics.
        8. Can’t do SQL JOINs from between shard trees.  I’ve heard this is coming in a future release.
          1. This can be a real PITA, but it’s a relatively complex feature.
          2. Another reason not to have multiple shard trees, if you can avoid them.
        9. Go-fish queries are very expensive, and can slow performance to a halt, across the board.
          1. We’re currently testing a hot-fix that makes this much less severe.
          2. But slow queries can take down MySQL (e.g. thread starvation), sharding or no.
        10. HA limitations, gaps that are on their near-term roadmap, I think to be released this year:
          1. No support for eventually-consistent writes to global tables means all primaries must be available for global writes.
            1. Async, eventually consistent writes should be available as a feature in their next build, by early October.
          2. Fail-over to secondaries or back to primaries can only happen if both nodes are responding.
            1. in other words, you can’t say via the console:
              1. ‘ignore the unresponsive primary, go ahead and use the secondary’
            2. or:
              1. ‘stand me up a new EC2 instance for a secondary, in this zone/region, sync it with the existing primary, and go back into production with it’
          3. Reliable replication currently requires two nodes to be available.
            1. In other words, if a single host goes down, writes for its shard are disallowed.
              1. In the latest versions, there’s a configuration “switch” that allows for failing-down to primary
                1. But not fail down to secondary.  This is expected in an early Q4 2012 version release.
          4. dbsmanage host must be available.
            1. dbShards can run without it or a bit, but stats/alerts will be unavailable for that period.
          5. Shard 1 must be available for new auto-increment batch requests.
          6. go-fish queries depend on all primaries (or maybe all secondaries via configuration, but not some mix of the two as far as I’m aware) to be available
    2. DYI
      1. I can rattle off the names of a number of companies who have done this, and it took many months longer than our deployment of dbShards (about six weeks, largely due to the schema being largely ready for it).
      2. Given a lot of time to do it, appeals to me even now, but I still wouldn’t go this route, given the pros/cons above.
    3. The latest release of MySQL Cluster may be an option for you, it wasn’t for us back with MySQL 5.0, and not likely now, due to its limitations (e.g. no InnoDB).
    4. AWS RDS was an option for us from the onset, and I chose to manage our own instances running MySQL, before deciding how we’d shard.
      1. For the following reasons:
        1. I wanted ownership/control around the replication stream, which RDS doesn’t allow for (last I looked) for things like:
          1. BI/reporting tools that don’t require queries to be run against secondary hosts.
            1. This hasn’t panned out as planned, but could still be implemented, and I’m happy we have this option, hope to get to it sometime soon.
          2. Asynchronous post-transaction data processing.
            1. This has worked out very well, particularly with dbShards, which allows you to build streaming plugins and do whatever you want when data changes, with that data.
              1. Event-driven model.
              2. Better for us than doing it at the app layer, which would increase latencies to our API.
        2. Concern that the critical foundational knobs and levers would be out of our reach.
          1. Can’t say for sure, but this has likely been a good choice for our particular use-case; without question we’ve been able to see and pull levers that we otherwise wouldn’t have been able to, in some cases saving our bacon.
        3. Their uptime SLAs, which hinted at unacceptable downtime for our use-case.
          1. Perhaps the biggest win on the decision not to use RDS; they’ve had a lot of down-time with this service.
        4. Ability to run tools, like mk-archiver (which we use extensively for data store size management), on a regular basis without a hitch.  Not 100% sure, but I don’t think you can do this with RDS.
        5. CloudWatch metrics/graphing is a very bad experience, and want/need better operational insights to what it provides.  Very glad we don’t depend on CW for this.
      2. All of these reasons have come at considerable cost to us as well, of course.
        1. Besides the obvious host management cycles, we have to manage :
          1. MySQL configurations, that have to map to instance sizes.
          2. Optimization and tuning of the configurations, poor-performance root-cause analysis,
          3. MySQL patches/upgrades.
          4. maybe more of the backup process than we’d like to.
          5. maybe more HA requirements than we’d like to; although I’m glad we have more control over this, per my earlier comment regarding downtime.
          6. maybe more of the storage capacity management than we’d like to.
        2. DBA headcount costs.
          1. We’ve gone through two very expensive and hard-to-find folks on this front, plus costly and often not-helpful, cycle-costing out-sourced DBA expertise.
          2. Currently getting by with a couple of experienced engineers in-house and support from CodeFutures as-needed.
      3. As I’ve seen numerous times in the past, AWS ends up building in features that fill gaps that we’ve either developed solutions for, or worked around.
        1. So if some of the RDS limitations can be worked-around, there’s a good chance that the gaps will be filled by AWS in the future.
        2. But it’s doubtful they’ll support sharding any time soon, there’s too much design and application-layer inter-dependencies involved.  Maybe I’m wrong, that’s just my humble opinion.

This was originally posted last week here, but I wanted to re-post here and will be updating with our latest status and learnings, if there’s any interest.  Let me know.

Getting back to it

Wow, since I blogged last I chose Python/Django and we rocked the gamification world with it.  Lots of stuff I should have been chronicling.  Trying to get back to it…but there’s a single event that prompted the return, I’ll be posting shortly on that.

Python vs. Ruby, Rails vs. Django

I’ve been learning the basics of these programming languages and development platforms lately. I’m impressed with the huge Rails movement, and Django seems like a very cool project. Both platforms allow web sites to be built at an incredible pace.

Both Ruby and Python appear to be relatively straightforward. But I must say that at this still-early stage in my learning, Python wins for simplicity and ease. Ruby’s “code blocks” are a bit obtuse, as example reasoning.

From what I currently understand about Rails and Django, their strengths lie in auto-generated code whether the implementation be application-supporting code derived from database schema (Rails) or database schema derived from application code (Django). Microsoft’s LINQ, which we’ve been using a bit at work, is similar but not foundational to the platform. It’s all cool stuff that makes DBAs anxious, which is always fun.

After trying to test out Google’s AdWords client API using their supplied code samples, finding that those code samples require old libraries that the latest Python release doesn’t support, and realizing that Django itself won’t support the latest version of Python (3.0) for a year or more to come, I’m wondering what it would be like to have a wonderfully supported and rich platform like Rails tooled with a simple and evolved language like Python.

Now I’m wondering what I’m going to have for dinner. I’ll probably get more mileage out of this wonderment.

TED

A number of years ago I was employed as an audio technician, and had the wonderful opportunity to travel around the country, recording people say interesting things. Out of all the events that I got to attend, TED was my absolute favorite. I’d come home boiling over with inspiration and enthusiasm for technology. If you aren’t familiar with it, check out some of the video that they’ve graciously made available for free. I just watched a very cool clip with Pattie Maes and Pranav Mistry about MIT’s “Sixth Sense”.

Resuscitated DVD Player

I repaired an Onkyo DV-CP702 DVD changer today.  Below is a picture of it with the case still open, playing “This Film Is Not Yet Rated”.

Onkyo DV-CP702 6 DVD Changer

Onkyo DV-CP702 6 DVD Changer

The DVD player had stopped working months ago.  You could hear it loading discs, but nothing would play.

Here’s how I fixed it:

First, I unplugged it.  This was a key step.

Then I removed the outer panel by unscrewing the six Phillips machine screws and tilting it up from the rear.

Then I sat in wonderment for a good fifteen minutes, gazing at the complex beauty inside.  Gears, motors, circuit boards, a fuse that hadn’t blown, a dead spider.

What to do next?   It wasn’t very interesting without electricity flowing through it.  I noticed that there was a plastic shield covering what appeared to be the most interesting bits.  It had a big yellow warning sign on it that said something to the effect of “Watch out, there’s a laser in here.  Don’t look into it.”   Miffed that the innards of this remarkable piece of equipment were causing me to read, I removed the shield.

Again, I detected no movement, nor anything of any interest happening whatsoever.  So, of course I plugged it in.  Still nothing.

Then I turned it on.

A few things happened at that point: it went through a boot-up process, and then the carousel turned, trying to load disks.  The carousel is a round plastic disc that holds the six DVDs or CDs. It turned one time for each slot, stopping to allow the disc-reading mechanism to pop upwards and cradle a loaded disc.  I had put one in there, and when it got to my disc the disc-reader popped up under it, elevating it slightly.   I heard the mechanism that moves the laser engage, and then…nothing.

So I stared at it for a while. Why no workie? Finally it struck me –  the disc is supposed to spin.  Why disc no spin?  Hmm.  Time to take more apart.

Unplug.  Unscrew disc-loading mechanism, unplug electrical leads.  Look at mechanism more closely.  I noticed there were two motors: one for moving the laser back and forth, and another for turning the loaded disc.   The disc-turning motor had two small screws holding it in the chassis, and two wires coming off of it.   The spindle was crowned with the smaller disc that the DVD sits upon, and after looking without success for a tension screw, realized it had to be leveraged off by force with two screwdrivers. I then unscrewed the motor and clipped the wires where they connected to the circuit board.

Then I figured I better get an electrical parts manual, to find out what part number I needed to replace.  Ten dollars later, I had found the repair manual for the DVD changer on-line and had identified the part number needed, but didn’t have a lead on where to find one.  I later realized that it would be easier and cheaper to read the part number right off the motor itself.   Brilliant.

I Googled the part number, and found Mat Electronics, an on-line distributor who had one in stock.  Motor: $4.  Shipping: $7.  I ordered it.  Then I went on vacation for two weeks.

When I got back from vacation, I had completely forgotten where I was in the process, and couldn’t find any evidence (including a delivered motor) that I had ordered it.  After looking at the cost, and realizing that I’d have to buy a new soldering iron to finish the job, I decided it wasn’t worth my time.  It was going to likely cost me $50 bucks to fix a $200 DVD changer using a motor that was listed as a “high failure” part.  Screw it.

A week later the part arrived, much to my confusion.

Time to go pick up a soldering iron at Fry’s: $6.

So I brought the soldering iron home and got to work.  I screwed the motor back into the chassis, and soldered the two wires back onto the circuit board.  I then pressed the seat back on the spindle, pushing it all the way down.  I replaced the chassis and dreaded tiny spring, and used my soldering iron to re-glue the spring to the chassis.  Then I screwed the shield back on.

Excited to see my work in action, I plugged it in and loaded a disc.  Disc no spin.  Nothing.  Bummer.  Oh well, it was a fun little project. Fail fast and early to save time, cost and headache.

Nope – I couldn’t let it go.  I stared at it for a while.  Then I removed the shield and put a piece of tape over the laser to protect myself from my own morbid curiosity.  Load another disc.  Disc no spin.  Closer look.  Spindle seat was pressing against the laser housing!  Oh my goodness, that can’t be good.   I had pressed the spindle seat too far down on the motor spindle.  Dang me, blast it.

So I pried it off the spindle a bit and tried again.  Nothing.

Time to grasp at straws.  The manual shows how to reset the micro-controller, so I did that.  Still nothing.  Closer look – the spindle seat was still pushed pretty far down.  Pry it off a bit more.

Then – it happened.  After plugging it back in, the disc spun! 

I brought it out of the shop and into the nice warm house where I plugged the video out into a TV.  The disc spun, but was making a strange noise.  Not like “whirrrrr”, but more like “brrrrrrgrrrrwhrrrrrbraaaa”.  The player readout said “No Play”, which typically means a dirty disc.   I’m sure you can imagine the problem, given that description.  The spindle seat was still pushed too far down, and the disc was moving vertically up and down too much.  Unplug, dismantle, adjust spindle seat again.  Clean laser lens, just in case.  Reassemble, plug back in and…voila!  Taking the photo above was the next thing I did after jumping up and down a few times and crying out like a schoolgirl.

The total cost without the manual was actually under $20, and I spent a good three or four hours taking things apart, driving for parts, and reassembling them.

I make a comfortable living in the software business, and my time is very valuable to me.  Was all this time, effort and money spent to fix a $200 outdated and prone-to-fail-again electrical component worth it?  Hell yes.

Why not just buy a new one, or have someone else fix it (outsource it)? Because it was fun.

Westport

The snail

Well, I’ve spent the last few days in Westport WA. This picture pretty much sums up the simplicity, happiness, and speed at which I’ve enjoyed being with my family here on the coast. This is an amazing animal. Beautiful calcium carbonate shell, with little tentacles eyes and olfactory organs perched, retractable … and yes, he slimed me.

<span>%d</span> bloggers like this: