Migrating monolithic databases

December 22, 2017

How do you migrate from a monolithic application with a single large database to an eventually consistent, distributed persistence architecture?

Randy Shoup recently gave an excellent talk at QCon on exactly how Stitch Fix approached this problem. Stitch Fix started with a Rails monolith on top of an RDBMS, and defined a series of incremental milestones that let them realize partial benefits along the way. Their initial picture looks something like this:

       Rails ------> Monolithic-DB

And they wanted to get to an ideal end state that looks something like this:

  +--- Rails ------> Less-Monolithic-DB
  |
  |
  +--> mSvc1 ------> mDB1
  |
  +--> mSvc2 ------> mDB2
  |
  +--> mSvcN ------> mDBN

Taking an organizational approach to migration

One important thing to remember is that this end state is only beneficial if each of these services is owned by a team that is free to move forward independently. If the problem you are solving is poorly divided amongst these services, this picture can end up being just as unwieldy as your original monolith. It's therefore important that your strategy for reaching this end state prioritizes the independence of each service along the way. That starts with the organizational approach to service teams, and with a focus on maintaining as much autonomy as possible from day one.

The way Stitch Fix approached this was to define milestones for each service and delegate responsibility for achieving them to each service team.

The first milestone was to split out the business logic into a separate service that still talks to the same database. After a few services had reached this first milestone, they might have had a picture that looked something like this:

       Rails ------> Monolithic-DB
                      /|\  /|\
                       |    |
       mSvc1 ----------+    |
                            |
       mSvc2 ---------------+

       ...

Note that although this diagram illustrates things from a global perspective, each service/team would proceed independently through these milestones. Each service that achieves this first, relatively straightforward milestone gains an incremental benefit. The service is still coupled to the monolith by the data model, so any change that touches the data model must still go through the centralized release process for the monolith. But the service team is now free to make and release pure business logic changes independently, and that is a significant step toward autonomy.
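As a concrete sketch of this first milestone, the business logic can move behind a service boundary while the data access still targets the shared schema. All the names below (the service, the table, the pricing rule) are hypothetical, and sqlite stands in for the shared RDBMS:

```python
import sqlite3

class ShippingService:
    """Extracted business logic; still coupled to the monolithic schema."""

    def __init__(self, shared_db):
        self.db = shared_db  # same connection/schema as the monolith

    def shipping_cost(self, order_id: int) -> float:
        # Milestone 1: the service still reads the monolith's tables directly.
        row = self.db.execute(
            "SELECT weight_kg FROM orders WHERE id = ?", (order_id,)
        ).fetchone()
        # Pure business logic: flat rate plus a per-kilogram charge.
        return 5.00 + 2.50 * row[0]

# The monolith and the service share one database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, weight_kg REAL)")
db.execute("INSERT INTO orders VALUES (1, 2.0)")

svc = ShippingService(db)
print(svc.shipping_cost(1))  # → 10.0
```

The key property is that a pricing change is now a pure service release, while a schema change to `orders` still has to go through the monolith's release process.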

The second stage is for a microservice to take over management of its own data. This gets a bit trickier when migrating a monolith, but roughly speaking there are two milestones here. First, go through the monolithic codebase and replace all direct access to the microservice's data via the Monolithic-DB with calls into the microservice. Once you've done this, you get a picture that looks like this:

  +--- Rails ------> Monolithic-DB
  |                   /|\  /|\
  |                    |    |
  +--> mSvc1 ----------+    |
                            |
       mSvc2 ---------------+
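The replacement inside the monolith can be sketched as swapping a direct query for a call through a narrow client interface. The client below is a hypothetical in-process stand-in; in production it would wrap an HTTP/RPC call to mSvc1:

```python
class InventoryClient:
    """Hypothetical stand-in for the mSvc1 API."""

    def __init__(self, stock):
        self._stock = stock  # the service's private data

    def available(self, sku: str) -> int:
        return self._stock.get(sku, 0)

# Before: the monolith ran SQL against the shared tables, e.g.
#   SELECT quantity FROM inventory WHERE sku = ?
# After: the monolith only knows the service's API.
def can_fulfil(client: InventoryClient, sku: str, qty: int) -> bool:
    return client.available(sku) >= qty

client = InventoryClient({"sock-med": 3})
print(can_fulfil(client, "sock-med", 2))  # → True
print(can_fulfil(client, "sock-med", 5))  # → False
```

Once every such query in the monolith goes through the client, the service is free to change its storage without touching the monolith's release process.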

Then, once you are sure that the subset of the data owned by mSvc1 is no longer accessed from the monolith, you can migrate that data into its own database and end up here:

  +--- Rails ------> Slightly-Less-Monolithic-DB
  |                        /|\
  |                         |
  +--> mSvc1 ------> mDB1   |
                            |
       mSvc2 ---------------+
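Before dropping the service-owned tables from the monolithic database, it's worth verifying that the migrated copy in the service's own store actually matches. A minimal sketch of that check, modeling both stores as dicts keyed by primary key (the real stores and keys are assumptions):

```python
def diff_stores(monolith_rows: dict, service_rows: dict) -> dict:
    """Return keys that are missing or mismatched between the two stores."""
    problems = {}
    for key in monolith_rows.keys() | service_rows.keys():
        old, new = monolith_rows.get(key), service_rows.get(key)
        if old != new:
            problems[key] = (old, new)
    return problems

monolith = {1: "alice", 2: "bob"}
service = {1: "alice", 2: "bob"}
print(diff_stores(monolith, service))  # → {} (safe to cut over)
```

An empty diff, sustained over some period of dual-writing, is the signal that the monolithic copy can finally be retired.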

After completing this with more services, you end up with this picture:

  +--- Rails ------> Less-Monolithic-DB
  |
  |
  +--> mSvc1 ------> mDB1
  |
  +--> mSvc2 ------> mDB2
  |
  +--> mSvcN ------> mDBN

It's important to note that this can take a while to achieve. Often, a complete migration is not worthwhile. Instead, you want to focus on whatever functional areas will give you the most bang for the buck, e.g., something you need to scale, or an area where you want to build out significant new functionality. It's pretty common for successful microservices companies to still have a monolith at the center of their network of services.

A workflow for each service team

The other important thing to note is that "replacing direct access to the Monolithic-DB with calls to your microservice" is not a discrete development task. When you do this, you are changing from a transactional data access model to a non-transactional one. For this to work, your service teams will ultimately need to learn to design APIs that let the monolith, and/or any other consumer, accomplish what they need to do without the aid of transactions. There are a number of design patterns that can be useful here, but for even moderately complex domains this can require some trial and error to figure out what will work.
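One widely used pattern for working without cross-service transactions is to run each step through a service API and apply a compensating action if a later step fails (a minimal saga). A hypothetical sketch, where all the service calls are illustrative stand-ins:

```python
def place_order(reserve_stock, charge_card, release_stock):
    """Two non-transactional steps; undo the first if the second fails."""
    reservation = reserve_stock()
    try:
        charge_card()
    except Exception:
        # No shared transaction to roll back, so we compensate instead.
        release_stock(reservation)
        return "failed"
    return "placed"

def declined():
    raise RuntimeError("card declined")

released = []
ok = place_order(lambda: "res-1", lambda: None, released.append)
bad = place_order(lambda: "res-2", declined, released.append)
print(ok, bad, released)  # → placed failed ['res-2']
```

Note that a compensating action is not a rollback: other consumers can observe the intermediate state, which is exactly the kind of design question teams need trial and error to work through.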

For these reasons, as well as for maintaining the autonomy of your teams/services, make sure your teams have tools that:

  • let developers quickly and easily test in isolation a small change that spans both your monolith and the service they are developing,
  • and, once that change functions well in isolation, roll it out safely into production.

The reason for this focus is that if you follow a migration process like Stitch Fix's, you're going to have a lot of developers/teams running through this sequence of steps:

  1. Hunt down a Monolithic-DB dependency in your monolithic codebase.

  2. Prototype an API for your microservice that exposes that data in a way the monolith can consume it.

  3. Try out the prototype from your monolithic codebase, usually going back to step 2 a few times if/when it doesn't work so well.

  4. Once you're happy with (3) you will need to deploy (2) into production.

  5. Once (2) is in production, you will need to do a canary deploy of (3) into production in order to avoid overloading (2) by routing all your production traffic through a brand new data pathway that has not yet been tested under real load.

  6. With the canary deployed, you will want to assess its impact on the quality of service to your users: at a minimum, a differential comparison of request latency and request error rates between the canary and the stable deployment. If these comparisons are unfavorable, you may need to go back to step (2) again.

  7. Once you are happy with (6) you will then merge/deploy the change into your stable deployment.
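The go/no-go comparison in step 6 can be sketched as a simple differential check of error rate and latency between the two deployments. The metric names and thresholds below are illustrative assumptions:

```python
def canary_ok(stable, canary, max_error_delta=0.01, max_latency_ratio=1.2):
    """stable/canary are dicts with 'error_rate' and 'p50_latency_ms'."""
    error_delta = canary["error_rate"] - stable["error_rate"]
    latency_ratio = canary["p50_latency_ms"] / stable["p50_latency_ms"]
    # Promote only if the canary is not meaningfully worse on either axis.
    return error_delta <= max_error_delta and latency_ratio <= max_latency_ratio

stable = {"error_rate": 0.002, "p50_latency_ms": 120}
good = {"error_rate": 0.003, "p50_latency_ms": 130}
bad = {"error_rate": 0.050, "p50_latency_ms": 300}
print(canary_ok(stable, good))  # → True
print(canary_ok(stable, bad))   # → False
```

In practice you would compute these metrics over a window of real traffic and compare more percentiles than just the median, but the shape of the decision is the same.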

As you can see, these steps emphasize both the rapid prototyping and the canary aspects of the Service Oriented Development workflow, and reducing the latency of each step will accelerate you considerably. The benefit of streamlining this workflow is often larger than expected: with multiple teams running through this cycle simultaneously, you get a kind of "lock contention" on the monolith as each team tries to make the changes necessary to take ownership of its own data. A lower-latency cycle reduces that "lock contention", and it also reduces the overall risk and outages associated with the process in general.

Questions?

We’re happy to help! Look for answers in the rest of the Microservices Architecture Guide, join our Gitter chat, send us an email at hello@datawire.io, or contact our sales team.