Microservices at Netflix: Stop system problems before they start

In 2012, Netflix began adopting microservices. After a missing semicolon caused massive database corruption, the company realized that its monolithic application had a single point of failure: if one missing semicolon could shut down an entire backend system, something needed to change. Netflix decided to break its monolithic legacy application into smaller, independent microservices to reduce the chance of system-wide errors and improve long-term stability.

What a successful microservice structure looks like, particularly at scale, is still not well defined. When scaling, services must not only be kept separate; they must also let developers access information from a short-lived container after its job has ended. Adhering to these foundational building blocks helped Netflix develop its architecture.

Ideally, microservices should not only have a multi-tiered architecture but also avoid sharing backend data stores. When multiple services write to a single database, a schema change made for one service forces every other service on that database to follow suit. At scale, developer teams would have to update the database structure of hundreds of individual services every time a change was made. At Netflix, each microservice has its own backend data store, used only by that service. The benefit is simple: with separate data stores, a change to one service's database does not require updating every other service.
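The database-per-service pattern can be sketched in Python. This is a minimal illustration, assuming in-memory SQLite databases as stand-ins for each service's private store and plain method calls as stand-ins for network APIs; `UserService`, `ViewingHistoryService`, and their methods are hypothetical names, not Netflix code. The point is that `ViewingHistoryService` never queries the users table directly, so a schema migration inside `UserService` leaves it untouched.

```python
import sqlite3


class UserService:
    """Owns the users table; no other service connects to this database."""

    def __init__(self) -> None:
        # Private store for this service only (in-memory SQLite as a stand-in).
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

    def add_user(self, name: str) -> int:
        cur = self._db.execute("INSERT INTO users (name) VALUES (?)", (name,))
        self._db.commit()
        return cur.lastrowid

    def get_user(self, user_id: int) -> dict:
        row = self._db.execute(
            "SELECT id, name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        if row is None:
            raise KeyError(user_id)
        return {"id": row[0], "name": row[1]}

    def migrate_add_country(self) -> None:
        # A schema change that is local to this service's own store.
        self._db.execute("ALTER TABLE users ADD COLUMN country TEXT DEFAULT 'unknown'")
        self._db.commit()


class ViewingHistoryService:
    """Owns viewing history; refers to users only by id, never by joining their table."""

    def __init__(self, users: UserService) -> None:
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE views (user_id INTEGER, title TEXT)")
        self._users = users  # stand-in for calling UserService's public API

    def record_view(self, user_id: int, title: str) -> None:
        self._db.execute("INSERT INTO views VALUES (?, ?)", (user_id, title))
        self._db.commit()

    def history_for(self, user_id: int) -> dict:
        user = self._users.get_user(user_id)
        rows = self._db.execute(
            "SELECT title FROM views WHERE user_id = ? ORDER BY rowid", (user_id,)
        )
        return {"name": user["name"], "titles": [r[0] for r in rows]}


users = UserService()
history = ViewingHistoryService(users)
uid = users.add_user("alice")
history.record_view(uid, "Stranger Things")
users.migrate_add_country()  # only the user store changes
print(history.history_for(uid))  # {'name': 'alice', 'titles': ['Stranger Things']}
```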

Netflix also scales its data processing with tools like Genie 2.0, which runs Hadoop, Pig, and Hive jobs in the cloud. With Genie 2.0, Netflix developers can launch new clusters across testing, development, and production environments while also configuring and submitting pending jobs. This lets developers scale data processing to meet customer demand more efficiently. Because clusters can be added or terminated based on load, teams working with large amounts of data at scale can save significantly.
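To make routing jobs across a changing set of clusters concrete, here is a minimal sketch of tag-based job routing. It assumes, as a simplification, that each registered cluster carries a set of tags (environment, engine type) and that a job runs on the least-loaded cluster whose tags include every tag the job requests. `Cluster`, `JobRouter`, and the tag strings are hypothetical and are not Genie's actual API.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Cluster:
    name: str
    tags: Set[str]          # e.g. environment and engine type (hypothetical tags)
    running_jobs: int = 0   # crude load measure


@dataclass
class JobRouter:
    clusters: List[Cluster] = field(default_factory=list)

    def register(self, cluster: Cluster) -> None:
        self.clusters.append(cluster)

    def terminate(self, name: str) -> None:
        self.clusters = [c for c in self.clusters if c.name != name]

    def submit(self, job_tags: Set[str]) -> str:
        # Candidates are clusters carrying every tag the job asks for.
        candidates = [c for c in self.clusters if job_tags <= c.tags]
        if not candidates:
            raise LookupError(f"no cluster matches {sorted(job_tags)}")
        # Prefer the least-loaded match (the first one wins on ties).
        target = min(candidates, key=lambda c: c.running_jobs)
        target.running_jobs += 1
        return target.name


router = JobRouter()
router.register(Cluster("prod-a", {"env:prod", "type:hive"}))
router.register(Cluster("prod-b", {"env:prod", "type:hive"}))
router.register(Cluster("test-1", {"env:test", "type:pig"}))
print(router.submit({"env:prod", "type:hive"}))  # prod-a
print(router.submit({"env:prod", "type:hive"}))  # prod-b (prod-a is now busier)
print(router.submit({"env:test"}))               # test-1
```

Because clusters are looked up at submission time, adding a cluster with `register` or retiring one with `terminate` changes where new jobs go without touching the jobs themselves.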

Data persistence is a key concern with microservices, because their short-lived nature often means that when a task is killed, its data is lost with it. Integrating a data collection and monitoring solution gives DevOps teams insight into the status and health of their services, regardless of how long a job runs. Netflix takes a polyglot persistence approach while keeping its microservices' application tiers stateless, which pushes state management down into each application's persistence tier. When working with microservices, developers should choose the right data store for each use case. Moving state into a persistent data store keeps reads and writes reliable and flexible even as individual instances come and go. Keeping the application tier stateless gives developers flexibility and increases the overall availability of the cluster.
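A stateless application tier can be sketched as a service object that keeps no per-user data of its own and reads and writes everything through an injected store. In this minimal sketch, `PlaybackPositionService` is a hypothetical service and `InMemoryStore` is a hypothetical stand-in for a real persistence tier; in production the store would live outside the process (for example, in a database) so that it survives instance termination.

```python
from typing import Dict, Protocol


class PositionStore(Protocol):
    def get(self, key: str) -> int: ...
    def put(self, key: str, value: int) -> None: ...


class InMemoryStore:
    """Hypothetical stand-in for an external persistence tier such as a database."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}

    def get(self, key: str) -> int:
        return self._data.get(key, 0)

    def put(self, key: str, value: int) -> None:
        self._data[key] = value


class PlaybackPositionService:
    """Stateless tier: every request reads and writes through the store."""

    def __init__(self, store: PositionStore) -> None:
        self._store = store  # the only thing it holds; no per-user state in memory

    def advance(self, user_id: str, seconds: int) -> int:
        position = self._store.get(user_id) + seconds
        self._store.put(user_id, position)
        return position


store = InMemoryStore()
first = PlaybackPositionService(store)
first.advance("u1", 30)
del first  # simulate the instance being terminated

second = PlaybackPositionService(store)  # a fresh instance picks up where it left off
print(second.advance("u1", 15))  # 45
```

Because the service holds no state, any instance can handle any request, which is what lets a cluster add or replace instances freely.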

Utilizing Continuous Integration & Continuous Deployment for Better Code

When deploying microservices at scale, a solid continuous deployment workflow is crucial. As more companies move from legacy applications to microservices, continuous integration and deployment let developers catch errors in their code before it reaches production, roll back changes that fail, and make weekly or daily releases possible. Netflix uses Spinnaker, the open-source continuous delivery platform it developed, to manage its deployment pipelines. Spinnaker lets Netflix developers view an application's resource consumption across a stack and resize, start or stop, or disable server groups.
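Spinnaker supports a red/black (also called blue/green) deployment strategy, in which a new server group is brought up alongside the old one and the old group is disabled rather than destroyed, which makes rollback fast. The sketch below illustrates that idea under simplifying assumptions: `ServerGroup`, `deploy`, `rollback`, `active_group`, and the `healthy` callback are hypothetical names, not Spinnaker's API, and routing traffic is modeled as a single `enabled` flag.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ServerGroup:
    version: str
    enabled: bool = False                     # True while the group receives traffic
    replaced: Optional["ServerGroup"] = None  # group it took traffic from, kept for rollback


def active_group(groups: List[ServerGroup]) -> Optional[ServerGroup]:
    return next((g for g in groups if g.enabled), None)


def deploy(groups: List[ServerGroup], version: str,
           healthy: Callable[[ServerGroup], bool]) -> ServerGroup:
    """Bring up a new group, verify it, then shift all traffic to it."""
    candidate = ServerGroup(version)
    if not healthy(candidate):
        # Fail before any traffic moves; the current group keeps serving.
        raise RuntimeError(f"{version} failed health checks; traffic not shifted")
    previous = active_group(groups)
    candidate.replaced = previous
    groups.append(candidate)
    candidate.enabled = True
    if previous is not None:
        previous.enabled = False  # disabled, not destroyed, so rollback is instant
    return candidate


def rollback(groups: List[ServerGroup]) -> ServerGroup:
    """Re-enable the group the active one replaced and take the active one out of service."""
    current = active_group(groups)
    if current is None or current.replaced is None:
        raise RuntimeError("nothing to roll back to")
    current.enabled = False
    current.replaced.enabled = True
    return current.replaced


groups = [ServerGroup("v1", enabled=True)]
deploy(groups, "v2", healthy=lambda g: True)
print(active_group(groups).version)  # v2
print(rollback(groups).version)      # v1
try:
    deploy(groups, "v3", healthy=lambda g: False)
except RuntimeError as err:
    print(err)                       # v3 failed health checks; traffic not shifted
print(active_group(groups).version)  # v1
```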

Working out what your organization needs to consider when adopting a continuous integration strategy or moving toward continuous deployment can take time. Teams must consider what kinds of jobs they run, their overall pipeline structure, and how much cluster management their developers will need. Some teams can get by with minimal visibility into their applications if they deploy small, short-lived microservices that don't directly affect the rest of the system. However, if your team continuously deploys changes to production that directly affect the end-user experience, you need more visibility into the health of your applications and services throughout their deployment lifecycle.

Continuous integration and continuous deployment save developers time and resources through automated testing, faster error resolution, and the ability to pinpoint exactly where errors occur in the code. CI/CD tooling also offers different levels of application monitoring, so DevOps teams can check on applications at a glance or dig into how an application's performance affects an entire cluster.
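The way CI pinpoints where a change breaks can be sketched as a pipeline that runs stages in order, stops at the first failure, and reports the failing stage. This is a minimal, hypothetical sketch: the stage names, `run_pipeline`, and the `apply_discount` function under test are invented for illustration and do not correspond to any particular CI tool.

```python
from typing import Callable, List, Optional, Tuple

# A stage is a name plus a check that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, Optional[str], List[str]]:
    """Run stages in order, stop at the first failure, and report which stage failed."""
    log: List[str] = []
    for name, check in stages:
        passed = check()
        log.append(f"{name}: {'pass' if passed else 'FAIL'}")
        if not passed:
            return False, name, log  # later stages, including deploy, never run
    return True, None, log


# Hypothetical code under test.
def apply_discount(price: float, pct: float) -> float:
    return round(price * (1 - pct / 100), 2)


stages: List[Stage] = [
    ("unit-tests", lambda: apply_discount(10.0, 50) == 5.0),
    ("edge-cases", lambda: apply_discount(10.0, 150) >= 0),  # fails: no clamping
    ("deploy", lambda: True),
]

ok, failed_stage, log = run_pipeline(stages)
print(ok, failed_stage)  # False edge-cases
print(log)               # ['unit-tests: pass', 'edge-cases: FAIL']
```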

Takeaways from Netflix Microservice Architecture

Netflix structured its microservice architecture so that its data would be secure, persistent, and free of single points of failure. By refactoring its legacy monolith into a microservice-based architecture, Netflix reduced the risk of a single database failure taking down the whole system. And by keeping its application tiers stateless, Netflix can respond to sudden scaling demands and manage cluster resources across its stack with features such as Amazon EC2 Auto Scaling.
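Amazon EC2 Auto Scaling target tracking policies adjust a group's capacity to keep a metric, such as average CPU utilization, near a target value. The function below is a simplified proportional approximation of that behavior, not AWS's actual algorithm: it scales the current instance count by the ratio of observed to target CPU and clamps the result to the group's minimum and maximum size. The function and parameter names are hypothetical.

```python
import math


def desired_capacity(current: int, avg_cpu: float, target_cpu: float,
                     min_size: int, max_size: int) -> int:
    """Return an instance count that should move average CPU toward target_cpu.

    Simplified proportional rule (an assumption): if 10 instances average 80% CPU
    and the target is 50%, about 10 * 80 / 50 = 16 instances spread the same load.
    """
    if target_cpu <= 0:
        raise ValueError("target_cpu must be positive")
    if current <= 0:
        return min_size  # nothing running yet: start at the floor
    raw = math.ceil(current * avg_cpu / target_cpu)
    return max(min_size, min(max_size, raw))  # respect the group's bounds


print(desired_capacity(10, 80.0, 50.0, min_size=2, max_size=40))  # 16: scale out
print(desired_capacity(10, 20.0, 50.0, min_size=2, max_size=40))  # 4: scale in
print(desired_capacity(10, 5.0, 50.0, min_size=2, max_size=40))   # 2: clamped to min_size
```

Because the application tier is stateless, instances added or removed this way need no state migrated to or from them.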

Rather than rolling out large changes that take months to build, Netflix adopted an agile development pipeline in which developers deploy many small changes to production over the course of a day. With microservices, continuous integration and continuous deployment let developers automate their workflows based on the language, use case, and type of service they run. A long-running service such as a data store has different deployment, management, and testing needs than short-lived containerized services that terminate when their tasks complete.

By taking a cloud-native approach to its microservice and continuous delivery architecture, Netflix has shown the industry how to abstract applications away from physical servers, making the move to cloud-based application and software development more approachable.

Like Netflix, companies in highly competitive markets are embracing a more agile, fast-paced development pipeline that centers around microservices and continuous delivery. In order to remain competitive, software companies must be able to iterate on their codebase as quickly as possible, and microservices empower that.