Introducing Datawire Connect for Resilient Microservices

Scaling development organizations is hard. How do you ensure that the 50th engineer is as productive as the first, especially as your code base grows in complexity and size? The time-tested answer is to break your engineering organization into smaller teams, and to build a software architecture that supports this model of distributed development. However, modern cloud applications require continuous uptime, and building a continuous-uptime system that supports distributed development is not trivial.

Microservices is an architectural paradigm for cloud applications that addresses how organizations can build a continuous uptime distributed system that supports distributed development, enabling agility, velocity, and scale.

Over the past year, Datawire has had conversations with dozens of companies that have adopted microservices at scale, to understand what is driving that adoption and which principles they follow. The core of a microservices architecture is how the services communicate. The critical requirement for that communication is resiliency. In our conversations, we’ve found one simple, dominant pattern for communication between microservices: resilient RPC.

Companies like Netflix, HubSpot, Twitter, and Yelp have adopted resilient RPC as their standard microservices communication model. However, most of these efforts have involved writing custom libraries (e.g., Hystrix or Finagle) and convincing developers to learn and use these libraries.

Both of these are very hard.

Today, we’re introducing the open source Datawire Connect, which lets you easily add resilient RPC to your microservices using your existing tools, languages, and frameworks. With Datawire Connect, the microservice author documents their existing REST or WebSockets API using Quark, a simple language for APIs. Datawire Connect then automatically uses this information to create Node.js, Python, and Java libraries that can be used by any developer. These packages automatically include powerful resilience patterns such as circuit breaking, timeouts, and load balancing.

Datawire Connect allows you to add resilient RPC between your microservices without having to write custom libraries. You can use your existing codebase and Datawire Connect will take care of the rest.


Get Datawire Connect

How Datawire Connect Works

Imagine you have a RESTful service, ratings, with a single method that returns the customer rating of a given thing:

GET /ratings/:thingID

Here’s the API documented in Quark, with tunables that add timeouts and circuit breakers:

interface Ratings extends Service {
  // timeout, failureLimit, and retestDelay are the basic tunables for the RPC
  // circuit-breaker/retry functionality.
  static float timeout = 1.0;
  static int failureLimit = 1;
  static float retestDelay = 30.0;

  // get: get the rating for a given thingID. Note that this call is
  // _asynchronous_: Rating extends Future.
  Rating get(String thingID) {
    return ?self.rpc("get", [ thingID ]);
  }
}

Connect then compiles this file into JavaScript, Python, and Java libraries, and installs them locally (via npm, pip, and/or maven):

% quark install ratings.q --all
Compiling ratings ...
Installing ratings ...
Done!

This library implements the get() method, and by default adds:

  • a timeout of 1.0 seconds
  • a circuit breaker that trips after 1 failure, and automatically retests the connection after 30 seconds
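
To make the three tunables concrete, here is a minimal sketch of how a circuit breaker with a failure limit and a retest delay might behave. It is an illustration only, not the code that Datawire Connect generates; the names CircuitBreaker, CircuitOpenError, failure_limit, and retest_delay are hypothetical and simply mirror the tunables in the Quark file above.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Hypothetical breaker: opens after `failure_limit` consecutive
    failures and allows a trial call once `retest_delay` seconds pass."""

    def __init__(self, failure_limit=1, retest_delay=30.0, clock=time.monotonic):
        self.failure_limit = failure_limit
        self.retest_delay = retest_delay
        self.clock = clock          # injectable so the delay can be tested
        self.failures = 0
        self.opened_at = None       # None means the breaker is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.retest_delay:
                # Still inside the retest delay: fail fast, skip the service.
                raise CircuitOpenError("circuit open; failing fast")
            # Retest delay has elapsed: let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_limit:
                self.opened_at = self.clock()   # trip (or re-trip) the breaker
            raise
        # A success closes the breaker and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

With failure_limit=1 and retest_delay=30.0, a single failed call opens the breaker, calls in the next 30 seconds are rejected without touching the service, and the first call after that acts as the retest.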

Now, a microservice that wants to call the existing ratings microservice can simply use a get() method:

ratings = RatingsClient("ratings")
thingRating = ratings.get(thingID)
thingRating.await(1.0)             # 1.0 is the timeout, in seconds.
realRating = thingRating.rating    # Assumes no errors!

In practice, some application-level error handling should be added. The key with Datawire Connect is that the engineer accessing the service can use resilient RPC with minimal changes to her code and approach, and no changes to the existing REST-based service interface are required.
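
As a rough idea of what that application-level error handling could look like, the sketch below falls back to a default value when a call times out or fails. It uses the standard library's concurrent.futures as a stand-in for the generated client, whose error-reporting API is not shown in this post; rating_or_default and DEFAULT_RATING are hypothetical names.

```python
from concurrent.futures import TimeoutError as FutureTimeout

DEFAULT_RATING = None  # hypothetical fallback when no rating is available


def rating_or_default(submit_call, timeout=1.0, default=DEFAULT_RATING):
    """Return the rating, or `default` if the call times out or fails.

    `submit_call` is any zero-argument function that starts the request
    and returns a concurrent.futures.Future (a stand-in for the RPC).
    """
    future = submit_call()
    try:
        return future.result(timeout=timeout)
    except FutureTimeout:
        # The service was too slow: stop waiting and degrade gracefully.
        future.cancel()
        return default
    except Exception:
        # The call itself failed (for example, the service returned an error).
        return default
```

The caller can then render the page without a rating instead of failing the whole request, which is usually the point of isolating a non-critical service.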

Adding client-side load balancing for service availability

We’ve just introduced two resiliency patterns (circuit breakers and timeouts) that add service isolation. In the example above, if the ratings service is not available, a circuit breaker trips so that subsequent calls to the service return a no-op until the circuit breaker resets, giving the ratings service time to recover. Datawire Connect also supports service availability resiliency patterns in the form of dynamic load balancing.

Setting a resolver on the client adds dynamic load balancing:

ratings = RatingsClient("ratings")

# use Datawire Cloud service discovery
options = DWCOptions(<your token here>)
ratings.setResolver(DWCResolver(options))

thingRating = ratings.get(thingID)
thingRating.await(1.0)            # 1.0 is the timeout, in seconds.
realRating = thingRating.rating   # Assumes no errors!

In the example above, the get() method will dynamically route a request to an available instance of the ratings microservice. If an instance becomes unavailable, the get() method will route subsequent requests to another instance. Behind the scenes, the local library generated by Datawire Connect subscribes to a real-time data stream of microservice availability from the Datawire Cloud service, and uses the stream to update a local copy of a routing table. (Incidentally, this approach ensures that any issue in connecting to the Datawire Cloud service will not impact local system availability.)
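
The local routing table idea can be sketched as follows. This is a simplified illustration under assumptions, not Datawire's implementation: the RoutingTable class, the event dictionary shape ("service", "address", "up"), and the round-robin policy are all hypothetical.

```python
import itertools
import threading


class RoutingTable:
    """Local copy of available instances per service, updated from events."""

    def __init__(self):
        self._lock = threading.Lock()
        self._instances = {}              # service name -> list of addresses
        self._counter = itertools.count()

    def apply(self, event):
        """Apply one availability event: {"service", "address", "up"}."""
        with self._lock:
            addresses = self._instances.setdefault(event["service"], [])
            if event["up"] and event["address"] not in addresses:
                addresses.append(event["address"])
            elif not event["up"] and event["address"] in addresses:
                addresses.remove(event["address"])

    def pick(self, service):
        """Round-robin over the instances currently believed to be up."""
        with self._lock:
            addresses = self._instances.get(service, [])
            if not addresses:
                raise LookupError("no available instance for " + service)
            return addresses[next(self._counter) % len(addresses)]
```

Because pick() reads only the local table, a lost connection to the availability stream leaves the last known table in place, and requests keep routing to the instances that were up when the stream stopped.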

Summary

With Datawire Connect, you’re able to:

  1. Add service isolation resiliency patterns into your system, in the form of timeouts and circuit breakers
  2. Add service availability resiliency patterns into your system, in the form of dynamic client-side load balancing
  3. Do all of this with no changes to your existing APIs, frameworks, and languages

Interested? Download Datawire Connect and get started with our tutorial!
