
3 Reasons AWS Lambda Is Not Ready for Prime Time

Flynn

February 9, 2016 | 12 min read


If you’re not familiar with Lambda, it’s an AWS feature that’s meant to give you a way to quickly write a service and let Amazon worry about all the boilerplate junk that normally goes with standing your service up so that people can actually talk to it. You don’t configure subnets or instances or load balancers with Lambda: you just write some code and then tell Amazon to hook you up. It’s a pretty compelling promise.

When we at Ambassador Labs tried to actually use Lambda for a real-world HTTP-based microservice, we found some uncool things that make Lambda not yet ready for the world we live in:

Lambda is a building block, not a tool
Lambda is not well documented
Lambda is terrible at error handling

Lung skips these uncool things, which makes sense because they’d make the tutorial collapse under its own weight, but you can’t skip them if you want to work in the real world. (Note that if you’re using Lambda for event handling within the AWS world, your life will be easier. But the really interesting case in the microservice world is Lambda and HTTP.)

Lambda is a building block, not a tool

It’s not just Lambda, even: AWS’ model is that they provide building blocks, and they expect others to wrap real tools around them. If you try to interact directly with AWS, it’s absurdly manual.

To wit, Lung’s tutorial shows us that manually setting up a Python Lambda is a twenty-step process – and that’s a service with exactly one endpoint that uses GET and takes just one argument on the query string. Mind you, about half those steps (8-10, depending) are things you’ll have to repeat for every endpoint you create. If you have even five services, you’re looking not at 20 steps, but 50-60. Imagine 100 services. Imagine how often you’ll have to do this if you’re using versioned endpoints. Do 8-10 manual configuration steps per endpoint, every time you roll out a new version, sound like fun?
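The arithmetic behind those numbers can be sketched in a few lines. This assumes the simplest possible model: the first endpoint costs 20 steps in total, and 8-10 of those have to be repeated for each additional endpoint. The function name `total_steps` is made up for illustration.

```python
def total_steps(endpoints, per_endpoint, first_endpoint_total=20):
    """Estimate the manual steps needed to stand up a number of endpoints.

    Assumes the first endpoint costs `first_endpoint_total` steps, of which
    `per_endpoint` must be repeated for every endpoint after it.
    """
    one_time = first_endpoint_total - per_endpoint
    return one_time + per_endpoint * endpoints


for n in (1, 5, 100):
    low, high = total_steps(n, 8), total_steps(n, 10)
    print("{} endpoints: {}-{} manual steps".format(n, low, high))
```

Under that model five endpoints land in the 50-60 range quoted above, and 100 endpoints come out somewhere between roughly 800 and 1,000 manual steps, before you count any versioned redeploys.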

The root of the problem here is that we want a tool (our microservice) but AWS gives us building blocks, and leaves connecting them up to us. The blocks here are Lambda and the API Gateway, and it’s telling that Lung starts his tutorial not by creating a Lambda but rather by messing with the API Gateway – the gateway is a little annoying. Those 8-10 steps I mention above are all API Gateway stuff, and what’s worse, they’re the minimum for HTTP-only support. If you want HTTPS (and you should, it being 2016 and all), you need to add that into the mix as well.

This is, of course, the sort of thing that cries out for automation, and various folks are scrambling to fill the void. We already use Terraform for wrangling EC2, and it won’t take much for it to cope with Lambda and the API Gateway. Serverless announced Python support in their recent 0.2.0 release; Zappa’s initial release happened just two days ago. More on those as we experiment – at first glance, though, none of these tools looks like something I can trust in production yet, so we’re still left with the manual world. (Of course, if you’re working on any of these tools, I’d welcome hearing why I’m wrong here!)

Lambda is not well documented

“But wait,” I hear you (and all the AWS folks) shout, “you lie! There’re all kinds of docs about Lambda and AWS online!”

Well, let’s imagine that you’re a developer at Alice’s House of Grues. You’ve been tasked with creating the Grue Locator service, which takes the name of a grue and responds with its location. Let’s further imagine that you’re a good developer:

You’re a team player, so you and the rest of your team are going to agree on the API before you write code.
You’re conscientious, so you’re going to test your code locally before deploying it (and you’re going to automate the tests).
You’re a Python developer, so you’re going to write in Python.

We’ll make the API easy:

GET /v1/grue/grue_name

will give you back some JSON. Done. (We’ll assume that the service just magically knows where the grue is.)

Given the API, you can sit down to write your tests and code locally. First questions: how does the grue_name get passed to your code, and how does output get handed back? Remember: you’re about to write code, so you need specifics that cover normal operation, exceptional conditions, error cases, and corner cases brought on by wrong (or malicious) clients.
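To make the question concrete, here is the kind of code and test you would want to write at this point. It is only a sketch under a guessed contract: it assumes the handler receives a dict with the grue's name under a `grue_name` key, which is exactly the detail the docs would need to confirm. `locate_grue` and the location it returns are invented stand-ins.

```python
def locate_grue(grue_name):
    # Stand-in for the service that "magically knows" where the grue is.
    return {"name": grue_name, "location": "somewhere dark"}


def handler(event, context):
    # ASSUMPTION: the name arrives as event["grue_name"]. Nothing in this
    # sketch proves that is how Lambda will actually deliver it.
    return locate_grue(event["grue_name"])


def test_handler_returns_location():
    # Local test: call the handler directly with a hand-built event.
    result = handler({"grue_name": "fred"}, None)
    assert result["name"] == "fred"
    assert "location" in result


test_handler_returns_location()
```

The test passes locally no matter what Lambda really does, which is the problem: it checks the guess, not the platform.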

Go ahead. Fire up Google. I’ll wait.

Back again? OK. You found a lot of hits, right? Then you had to spend some time differentiating info about AWS Lambda from the Python programming term lambda. Even at that point there’s a lot of chaff, but I’m sure you eventually got to the AWS Programming Model for Authoring Lambda Functions in Python pages (http://docs.aws.amazon.com/lambda/latest/dg/python-programming-model.html), which dance around the topic and have examples from which you can infer many things, but do not actually provide a specification.

Here’s what the Python programming model actually says about the inputs and outputs:

event – AWS Lambda uses this parameter to pass in event data to the handler. This parameter is usually of the Python dict type. It can also be list, str, int, float, or NoneType type.

The implication here is that it’s expecting JSON over the wire, which will be deserialized into the event parameter. This is further implied by Lung’s directions about mapping templates, which include explicitly setting the input type to application/json and manually constructing a JSON dictionary from the URL query string, but it’s still an implication and not a specification.

This may seem unnecessarily pedantic. Maybe it seems obvious that the handler is going to accept a JSON dictionary and deserialize it, so they shouldn’t need to spell it out. But no, this really is important: if they don’t spell out what’s valid input, they’re not telling you what to expect when – not if – the user hands in something unexpected.

Suppose the user hands in a simple string instead of a JSON-encoded dict? Suppose they hand in a JSON-encoded array? Suppose they hand in gibberish? We’ll talk about this more shortly when we get to error handling, but the problem is that you really have no idea. Instead you have to guess and test.
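In practice, guess-and-test means writing the handler defensively against every shape the docs say `event` might take. A minimal sketch, again assuming a `grue_name` key (my invention, not anything AWS specifies); `validate_event` is a hypothetical helper:

```python
def validate_event(event):
    """Return (grue_name, None) on success or (None, reason) on failure."""
    # The docs allow dict, list, str, int, float, or None, so check first.
    if not isinstance(event, dict):
        return None, "expected a JSON object, got %s" % type(event).__name__
    name = event.get("grue_name")
    if not isinstance(name, str) or not name:
        return None, "missing or empty 'grue_name'"
    return name, None


# Try each of the "suppose the user hands in..." cases locally.
for candidate in ({"grue_name": "fred"}, "fred", ["fred"], None, {}):
    print(repr(candidate), "->", validate_event(candidate))
```

This covers what your own code does with odd input. It says nothing about what AWS does with that input before your code ever runs, which is the part only a spec could answer.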

Likewise, here’s what they say about outputs:

Optionally, the handler can return a value. What happens to the returned value depends on the invocation type you use when invoking the Lambda function:

If you use the RequestResponse invocation type (synchronous execution), AWS Lambda returns the result of the Python function call to the client invoking the Lambda function (in the HTTP response to the invocation request, serialized into JSON). For example, AWS Lambda console uses the RequestResponse invocation type, so when you test invoke the function using the console, the console will display the returned value.
If the handler does not return anything, AWS Lambda returns null.
If you use the Event invocation type (asynchronous execution), the value is discarded.

This is, well, better. It actually says that it’s going to take the output and serialize it into JSON. Great. The main issue here is that we have to go somewhere else to find out how to control the invocation type (the tutorial doesn’t even mention it); at least it explicitly says that the test console uses the RequestResponse version. But again, suppose you return malformed data? If you return a custom object, how does the serialization happen? What happens if you return None? What happens if your handler raises an exception? Again, without a spec it’s a guess-and-test game.
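One way to start guessing is to see what Python's own `json` module does with each kind of return value. To be loud about it: this assumes Lambda's serialization behaves like `json.dumps`, which the docs do not promise, so treat it as a local approximation only. The `Grue` class and `try_serialize` helper are made up for the example.

```python
import json


class Grue:
    """A custom object, standing in for whatever your service might return."""

    def __init__(self, name):
        self.name = name


def try_serialize(value):
    """Report how the standard json module treats a would-be return value."""
    try:
        return json.dumps(value)
    except TypeError as exc:
        # json.dumps rejects objects it has no encoding for.
        return "TypeError: %s" % exc


print(try_serialize({"location": "cellar"}))  # a plain dict serializes fine
print(try_serialize(None))                    # None becomes JSON null
print(try_serialize(Grue("fred")))            # a custom object is rejected
```

The `None` case at least lines up with the quoted statement that a handler returning nothing yields null. Whether Lambda treats a custom object the way `json.dumps` does is something you would still have to deploy to find out.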

Note that AWS Lambda is hardly alone here: poor docs are endemic in the industry, enough so that I’ll be writing further about it later. But it’s definitely an issue with trying to get started with Lambda.

Lambda is terrible at error handling

We mentioned a few specific cases above but let’s get more into this, because it’s a dealbreaker. How, exactly, are you meant to manage errors in Lambda?

Bear in mind that there are a lot of potential points of error in Lambda. The user can call you with bad data, your own service’s processing can fail, the network can fail, the list goes on and on… and, again, Amazon never documents how exactly you’re meant to deal with this.

Walking down some of the simple cases:

The user can give you unparseable input.

This is actually sort of simple: AWS will simply not run your Lambda if it can’t understand the input at all. Of course, the logging AWS provides (Amazon CloudWatch Logs) often didn’t capture anything about this case when I was trying it, so it was a little tough to be sure exactly what was happening.

The user can give you input that’s parseable but illegal.

This is the case where the user gives you valid JSON, but it doesn’t make any sense (maybe you have a required dictionary element that they didn’t provide). Lambda doesn’t seem to have any specific way to handle this, so we have to treat it as a generic error at runtime, which leads us to…

Something goes wrong at runtime.

Maybe another service that we need is down. Maybe the user made a nonsensical request. We’d like to log this for debugging, and return an error to the caller so that they know it didn’t work. Logging should be easy: per AWS, any logging we do from Python should appear in CloudWatch Logs, as should writes to stdout. Sadly, I often seemed to be missing output when I was experimenting, which complicated life.
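For reference, the two logging routes mentioned here look like this in a handler. It is a minimal sketch using only the standard library; the logger name `grue_locator` and the return value are arbitrary choices of mine.

```python
import logging

# Module-level logger; the name is arbitrary.
logger = logging.getLogger("grue_locator")
logger.setLevel(logging.INFO)


def handler(event, context):
    # Route 1: the standard logging module.
    logger.info("received event: %r", event)
    # Route 2: a plain write to stdout.
    print("received event (stdout): %r" % (event,))
    return {"seen": True}
```

Run locally, the `print` line always shows up, and the `logger.info` line shows up once a handler is configured (for example with `logging.basicConfig(level=logging.INFO)`). Whether both reliably reach AWS's logs is the part I could not confirm.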

Worse, there doesn’t seem to be a way to get Lambda in Python to return anything but HTTP 200. If you’re writing in JavaScript, maybe… so why not Python? (I’d love to be wrong about this, by the way.)

The combination means that the most effective tool here is to always return a dictionary with an element indicating whether the request succeeded or failed, since you don’t have HTTP status codes to work with.
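A sketch of what that dictionary convention might look like. The `ok`, `result`, and `error` keys are my own choice rather than anything Lambda defines, and `KNOWN_GRUES` is made-up data standing in for the real lookup.

```python
KNOWN_GRUES = {"fred": "cellar"}  # made-up data for illustration


def success(payload):
    return {"ok": True, "result": payload}


def failure(reason):
    return {"ok": False, "error": reason}


def handler(event, context):
    name = event.get("grue_name") if isinstance(event, dict) else None
    if not isinstance(name, str) or name not in KNOWN_GRUES:
        # Every failure goes out as a normal reply, flagged in the body.
        return failure("unknown grue: %r" % (name,))
    return success({"name": name, "location": KNOWN_GRUES[name]})
```

Callers then have to check `response["ok"]` on every reply instead of looking at the HTTP status, which pushes error handling onto every client you have.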

Finally:

The runtime raises an exception.

AWS claims that if a Lambda raises an exception, a specific JSON dictionary is returned. This pretty much seemed to work when I did it, but of course it means that you get something totally unlike the output you see in the normal case. Combined with the lack of control over HTTP status codes, wrapping your entire service in a big try/except block seems critical to have any control at all.
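One way to do that wrapping without burying the handler in boilerplate is a decorator. This is a sketch of the idea, not a recommendation from AWS: `catch_all` and the `ok`/`result`/`error` keys are names I invented, and the sample handler is deliberately fragile.

```python
import functools
import logging

logger = logging.getLogger("grue_locator")


def catch_all(func):
    """Turn any uncaught exception into the same dict shape as a normal reply."""

    @functools.wraps(func)
    def wrapper(event, context):
        try:
            return {"ok": True, "result": func(event, context)}
        except Exception as exc:
            # Log the traceback, then answer in our own format instead of
            # letting the runtime substitute its own error dictionary.
            logger.exception("handler failed")
            return {"ok": False, "error": "%s: %s" % (type(exc).__name__, exc)}

    return wrapper


@catch_all
def handler(event, context):
    # Fails with KeyError or TypeError on bad input, on purpose.
    return {"location": event["grue_name"] + "'s lair"}
```

With this in place, `handler({}, None)` comes back as an `ok: False` dictionary rather than an exception, so callers see one reply shape for both the good and the bad case.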

When I first sat down to write my microservice using Lambda, I really wanted it to be the greatest thing since sliced bread. It had so very much promise, and I just loved the idea that I’d be able to whip up 50 lines of Python and let Amazon worry about deploying.

Sadly, it was too good to be true. We recently rolled out that microservice using Terraform and EC2 instances because Lambda just isn’t quite ready for the real world. Maybe things will be different soon, though. Maybe Zappa, Serverless, and Terraform will take over the world. Who knows, maybe Amazon will decide that Lambda should be the first thing to move beyond the building-block stage.