Service discovery for microservices
No application is an island. Applications constantly communicate with other applications (services) -- or, more precisely, with instances of applications. Microservice architectures amplify the volume and frequency of these communications.
Service discovery is how applications and (micro)services locate each other on a network. Service discovery implementations include both:
- a central server (or servers) that maintains a global view of addresses, and
- clients that connect to the central server to update and retrieve addresses.
The history of service discovery
Service discovery is an old concept that has evolved alongside computer architectures. At the dawn of the networking age, computers had to locate each other, and this was done through a single global text file, HOSTS.TXT. Addresses were added manually, as new hosts appeared infrequently. As the Internet grew, hosts were added at an increasing rate, and a more automated and scalable system was needed. This led to the invention and widespread adoption of DNS.
Today, microservice architectures are driving the continued evolution of service discovery. In a microservice architecture, service lifespan is measured in seconds and minutes. With microservices, addresses are added and changed constantly as new hosts are added, ports are changed, or services are terminated. The highly transient nature of microservices is again pushing the limits of today’s technologies, and we see different organizations adopting different strategies.
Service discovery today
Today, three basic approaches exist to service discovery for microservices. The first is to use existing DNS infrastructure. The advantage of this approach, of course, is that every organization already has DNS deployed. Moreover, it is a well-understood, highly available distributed system with API implementations in every conceivable language. Examples of DNS-based service discovery systems include Mesos-DNS and Spotify's use of SRV records.
The second approach is to use an existing strongly consistent key/value datastore such as Apache Zookeeper, Consul, or etcd. These are highly sophisticated distributed systems. While the original design goal of these systems was broader than service discovery, their general robustness and simple interfaces lend themselves well to many service discovery use cases. Many early adoptions of these systems for service discovery grew out of convenience – there was a business need for Zookeeper, and then a need for service discovery – so the cost of adopting Zookeeper for service discovery instead of another mechanism was very low.
The final approach to service discovery is a specialized service discovery solution such as Netflix Eureka. This approach enables design choices optimized for service discovery. For example, Eureka prioritizes availability over consistency. However, to take full advantage of these capabilities, developers need more sophisticated client libraries, which adds to the engineering cost of building these solutions.
Service discovery clients
While a centralized service discovery mechanism – whether it be Eureka, Consul, DNS, or something else – is necessary, each of your microservices needs a client to actually communicate with service discovery.
At its core, a service discovery client needs to support service registration and service resolution. When a service starts, it uses service registration to signal its availability to other services. Once a service is available, other services use service resolution to locate it on the network. These two basic operations, however, encapsulate a wide range of sophisticated behavior necessary for microservices.
In addition to registering (and unregistering) a service when starting up and shutting down, service registration clients should also initiate a heartbeat system. A heartbeat is a message that periodically signals to other services that the service is alive and running. Heartbeats should be sent asynchronously to minimize any performance impact on the running service. If a heartbeat system is not implemented, a central server can instead poll services, although this centralized strategy does not scale as well as a distributed heartbeat. Finally, service registration is also responsible for setting the convention for what metadata about a service is shared, e.g., service name, version, and so forth.
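To make the registration side concrete, here is a minimal sketch in Python. The registry is an in-memory stand-in for a real discovery server, and all class and method names (ServiceRegistry, RegistrationClient, and so on) are hypothetical; the point is the shape of the protocol: register with metadata on startup, heartbeat asynchronously from a background thread, deregister on shutdown.

```python
import threading
import time

class ServiceRegistry:
    """In-memory stand-in for a central service discovery server."""
    def __init__(self):
        self._entries = {}
        self._lock = threading.Lock()

    def register(self, name, address, metadata):
        with self._lock:
            self._entries[(name, address)] = {
                "metadata": metadata, "last_heartbeat": time.time()}

    def heartbeat(self, name, address):
        with self._lock:
            entry = self._entries.get((name, address))
            if entry:
                entry["last_heartbeat"] = time.time()

    def deregister(self, name, address):
        with self._lock:
            self._entries.pop((name, address), None)

    def live_instances(self, name, ttl=3.0):
        """Return addresses whose last heartbeat is within the TTL."""
        now = time.time()
        with self._lock:
            return [addr for (svc, addr), e in self._entries.items()
                    if svc == name and now - e["last_heartbeat"] <= ttl]

class RegistrationClient:
    """Registers a service and sends heartbeats from a background thread."""
    def __init__(self, registry, name, address, metadata, interval=1.0):
        self.registry, self.name, self.address = registry, name, address
        self.metadata, self.interval = metadata, interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._beat, daemon=True)

    def start(self):
        self.registry.register(self.name, self.address, self.metadata)
        self._thread.start()

    def _beat(self):
        # wait() doubles as a sleep that can be interrupted by stop()
        while not self._stop.wait(self.interval):
            self.registry.heartbeat(self.name, self.address)

    def stop(self):
        self._stop.set()
        self.registry.deregister(self.name, self.address)

registry = ServiceRegistry()
client = RegistrationClient(registry, "billing", "10.0.0.5:8080",
                            {"version": "1.2.0"}, interval=0.1)
client.start()
time.sleep(0.3)  # heartbeats keep the entry fresh
print(registry.live_instances("billing"))  # ['10.0.0.5:8080']
client.stop()
print(registry.live_instances("billing"))  # []
```

A real client would, of course, talk to the registry over the network; the daemon heartbeat thread illustrates why the heartbeat is asynchronous and off the request path.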
Service resolution is the process of returning the physical (network) location of a microservice. A typical service discovery client implements several critical features in its service resolution implementation: caching, failover, and load balancing. To avoid requiring an expensive and unreliable network connection on each service lookup, caching of service addresses is critical. The cache typically subscribes to updates from the service discovery mechanism to ensure that it is always up to date as services come and go. Service resolution clients also implement failover and load balancing algorithms. A typical microservice is deployed as multiple physical instances for availability and scalability, and the service resolution client must know how to return the address of the appropriate instance based on load, availability, and other factors. For example, a service resolution client might use a round-robin algorithm to cycle through the various addresses associated with a specific microservice.
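The resolution side can be sketched just as briefly. This hypothetical client keeps a local cache that is refreshed by pushed updates (simulated here as a plain method call), so a lookup never needs a network round trip, and it cycles round-robin through the cached instances:

```python
import itertools

class ResolutionClient:
    """Caches service addresses and load-balances lookups round-robin."""
    def __init__(self):
        self._cache = {}    # service name -> list of addresses
        self._cursors = {}  # service name -> cycling iterator

    def on_update(self, name, addresses):
        """Invoked when the discovery server pushes a changed address set."""
        self._cache[name] = list(addresses)
        self._cursors[name] = itertools.cycle(self._cache[name])

    def resolve(self, name):
        """Return the next instance for `name`, served from the local cache."""
        if not self._cache.get(name):
            raise LookupError(f"no known instances of {name!r}")
        return next(self._cursors[name])

client = ResolutionClient()
client.on_update("payments", ["10.0.0.5:8080", "10.0.0.6:8080"])
print(client.resolve("payments"))  # 10.0.0.5:8080
print(client.resolve("payments"))  # 10.0.0.6:8080
print(client.resolve("payments"))  # 10.0.0.5:8080
```

A production client would layer failover on top: on connection failure, mark the instance unhealthy and resolve again to get the next candidate.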
Service discovery implementations
So we’ve now discussed several strategies for service discovery, along with the requirements for clients used to access service discovery. What actual implementations exist?
Category 1: DNS
With a DNS-based approach to service discovery, standard DNS libraries are used as clients. In this model, each microservice receives an entry in a DNS zone file, and a microservice performs a DNS lookup to locate another microservice. Alternatively, microservices can be configured to use a proxy such as NGINX, which can periodically poll DNS for service discovery.
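SRV records, the record type Spotify's approach relies on, carry a priority, weight, port, and target per instance. The sketch below parses the data portion of hypothetical SRV records and picks an instance; note that it simplifies RFC 2782, which calls for weighted random selection among records of equal priority rather than the deterministic pick shown here.

```python
def parse_srv(record):
    """Parse the data portion of a DNS SRV record: 'priority weight port target'."""
    priority, weight, port, target = record.split()
    return int(priority), int(weight), int(port), target.rstrip(".")

def pick_instance(records):
    """Choose the lowest-priority target, breaking ties by highest weight
    (a simplification of RFC 2782's weighted random selection)."""
    parsed = sorted((parse_srv(r) for r in records),
                    key=lambda t: (t[0], -t[1]))
    priority, weight, port, target = parsed[0]
    return f"{target}:{port}"

# Hypothetical SRV data for _billing._tcp.example.com
records = [
    "10 60 8080 billing-1.example.com.",
    "10 40 8080 billing-2.example.com.",
    "20 100 8080 billing-backup.example.com.",
]
print(pick_instance(records))  # billing-1.example.com:8080
```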
This approach works with any language, and requires minimal (or zero) code changes. There are several limitations to using DNS. First, DNS does not provide a real-time view of the world, and adjusting TTLs is insufficient when different clients have different caching semantics. Second, the operational overhead of managing zone files as new services are added or removed can become expensive. Finally, additional infrastructure for resilience (e.g., local caching when the central DNS server is unavailable) and health checks will need to be added, negating the initial simplicity of DNS.
Category 2: Key/Value Store and sidecar
With a key/value store and sidecar, a strongly consistent data store such as Consul or Zookeeper is used as the central service discovery mechanism. To communicate with this mechanism, a sidecar is used. In this model, a microservice is configured to speak to a local proxy. A separate process communicates with service discovery, and uses that information to configure the proxy. With Zookeeper, Airbnb's SmartStack, built on HAProxy, is a popular choice. Consul provides a number of different interfaces including REST and DNS that can be used with a sidecar process. For example, Stripe replicates Consul data into DNS and uses HAProxy as its sidecar process.
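The core job of the sidecar's companion process is turning discovery data into proxy configuration. As an illustration, here is a sketch that renders an HAProxy-style backend section from a list of instances (the instance data is hypothetical; a real sidecar would fetch it from Consul's catalog or health API and reload the proxy on change):

```python
def render_backend(service, instances):
    """Render an HAProxy-style backend section for one service."""
    lines = [f"backend {service}", "    balance roundrobin"]
    for i, (host, port) in enumerate(instances, 1):
        lines.append(f"    server {service}-{i} {host}:{port} check")
    return "\n".join(lines)

# Hypothetical instance data a sidecar might fetch from Consul
instances = [("10.0.0.5", 8080), ("10.0.0.6", 8080)]
print(render_backend("billing", instances))
```

The microservice itself only ever connects to the local proxy, which is what makes the approach language-agnostic.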
The sidecar approach is designed to be completely transparent to the developer writing code. With the sidecar, a developer can write code in any programming language without thinking about how their microservice interacts with other services. This transparency comes with several tradeoffs. First, the sidecar is limited to service discovery of hosts. The proxy is unable to route to more granular resources, e.g., topics (if using a pub/sub system like Kafka) or schemas (if using a database). A sidecar also adds latency, introducing an extra network hop for every microservice. Finally, a sidecar is yet another mission-critical application that needs to be tuned and deployed alongside each microservice.
Category 3: Specialized service discovery and library/sidecar
In the final category, a library (and API) is directly exposed to the developer, who uses the library (e.g., Ribbon) to communicate with a specialized service discovery solution such as Eureka.
This model exposes functionality directly to the end developer, and results in a different set of tradeoffs. Developers need to be aware that they’re coding a microservice, and explicitly call APIs. This approach enables discovery of any resource and is not limited to hosts. Deployment is straightforward, as the client library is deployed with all other service libraries.
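The difference from the sidecar model is easiest to see in code. In this sketch of a hypothetical client-library API (the class and method names are invented, not Eureka's or Ribbon's actual API), the developer calls discovery explicitly, and because resources are keyed by type, anything can be registered, not just hosts:

```python
class DiscoveryClient:
    """Hypothetical client-library API: explicit register/resolve calls
    over resources of any type, not just host addresses."""
    def __init__(self):
        self._resources = {}

    def register(self, kind, name, location):
        self._resources.setdefault((kind, name), []).append(location)

    def resolve(self, kind, name):
        entries = self._resources.get((kind, name))
        if not entries:
            raise LookupError(f"unknown {kind} {name!r}")
        return entries[0]

client = DiscoveryClient()
# Hosts and finer-grained resources share one namespace of (kind, name).
client.register("host", "billing", "10.0.0.5:8080")
client.register("kafka-topic", "orders", "kafka://broker-1:9092/orders")
print(client.resolve("kafka-topic", "orders"))  # kafka://broker-1:9092/orders
```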
Since languages must be supported one at a time, polyglot organizations need a broad set of client libraries covering every language in use.
Service discovery is arguably the first piece of infrastructure you should adopt when moving to microservices. When choosing your service discovery architecture, make sure you consider the following key areas:
- What is the best strategy for deploying service discovery clients?
- What types of resources and services will you ultimately want to address in your service discovery system?
- What languages and platforms need to be supported?
Regardless of your choice, the implementation of an automated, real-time service discovery solution will pay significant dividends for your microservices architecture. In a future article, we’ll explore the various benefits of using a real-time service discovery solution.
Note: the sidecar's host-only limitation can be addressed if the proxy implements native support for a given application protocol (e.g., the MySQL or Kafka protocol). In practice, most proxies support only HTTP and TCP.