AWS Lambda Vs. Storlets

In the past few months I have heard some people confuse Storlets and AWS Lambda, treating them as the same or as competing technologies. While both 'tools' leverage the same technologies and can enjoy similar building blocks, I think of them as different beasts. In this blog post I will try to explain the differences (as well as the similarities). Let me start with a few words on each of them.

AWS Lambda. I will use AWS Lambda as the representative of the various similar solutions out there (AWS was the first, I believe). These include (amongst others) Google Cloud Functions, Azure Functions and IBM OpenWhisk (an open source project). AWS Lambda allows the end user to write a piece of code, upload it to the cloud, and have it executed as a result of cloud events or triggers that she defines. In keeping with the good old cloud tradition, this can be done without the user needing to take care of any server side administration or maintenance. Just pay as you go.

Storlets allow the end user to write a piece of code, upload it to the cloud storage system, and execute it over her data locally in a secured and isolated manner. Again, without the need to take care of any server side administration or maintenance.
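To make this a bit more concrete, here is a minimal sketch of what a Python storlet could look like. It is modeled on the Python interface of the OpenStack Storlets project as I understand it (a class that receives a logger and is called with input files, output files and parameters); treat the exact method names as assumptions to check against the project documentation. The FakeInput and FakeOutput classes are hypothetical stand-ins I made up so the sketch can be tried on a laptop; in a real deployment the storage system would supply those objects.

```python
class UppercaseStorlet(object):
    """Streams the object through in small chunks and upper-cases it."""

    def __init__(self, logger):
        self.logger = logger

    def __call__(self, in_files, out_files, params):
        source, sink = in_files[0], out_files[0]
        # Pass the object's metadata through unchanged.
        sink.set_metadata(source.get_metadata())
        chunk_size = int(params.get("chunk_size", 16))
        while True:
            buf = source.read(chunk_size)
            if not buf:
                break
            sink.write(buf.upper())
        source.close()
        sink.close()


class FakeInput(object):
    """Hypothetical in-memory stand-in for the storlet input file."""

    def __init__(self, data, metadata):
        self._data = data
        self._pos = 0
        self._metadata = dict(metadata)

    def get_metadata(self):
        return dict(self._metadata)

    def read(self, size):
        buf = self._data[self._pos:self._pos + size]
        self._pos += len(buf)
        return buf

    def close(self):
        pass


class FakeOutput(object):
    """Hypothetical in-memory stand-in for the storlet output file."""

    def __init__(self):
        self.metadata = None
        self._chunks = []

    def set_metadata(self, metadata):
        self.metadata = dict(metadata)

    def write(self, buf):
        self._chunks.append(buf)

    def close(self):
        pass

    def getvalue(self):
        return b"".join(self._chunks)
```

Running it locally would look like: build a FakeInput around some bytes, call UppercaseStorlet(None)([source], [sink], {}), and read the result back with sink.getvalue(). The point to notice is that the code only ever sees streams handed to it by the storage system; it never downloads anything.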

Similarities

The most obvious similarity is being serverless. This is clearly a nice feature but certainly does not make them identical. Another similarity is that both target developers who want to do cool stuff and get more out of cloud services. As such, both would like to support as many languages as possible for writing those serverless functions. While Storlets support Java and Python, AWS Lambda supports Java, Node.js, C#, and Python (additional languages are supported by the 'relatives'). Another similarity is that both Storlets and AWS Lambda need to use sandboxing for running the functions. Sandboxing is needed primarily (but not only) for resource isolation. The need for sandboxing brings a lot of technical problems that need to be addressed in both technologies, e.g.: how and when to reuse a sandbox, how to make sure the dependencies of the functions exist in the sandbox, how to monitor the resource usage, etc.

The above similarities might imply some mutual building blocks, but this is where, IMO, the similarity ends.

Differences

AWS Lambda is event centric while Storlets are data centric. AWS Lambda is great for tailoring automated workflows across cloud services: the IoT device has spat out too much data, so raise an event that runs an aggregate-and-delete function. A new picture has been uploaded to the store, so create a thumbnail and run a feature extraction algorithm to tag it. The last example is probably what causes most of the confusion between the two technologies. The fact that AWS Lambda allows one to put triggers in S3, so that a Lambda function can be triggered when a new picture is uploaded, does not make it data centric. Why?

  1. The data still needs to be copied out from S3 to the cluster where the function is executed. What if the data is huge?
  2. What if we wish to process a large data set consisting of many S3 objects?
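To illustrate the first point, here is a minimal sketch of a Python Lambda handler reacting to an S3 upload notification. The event carries only metadata about the object (bucket name, key, size); the bytes themselves still have to be pulled over the network into the function. The `fetch` parameter is a hypothetical downloader I introduce so the sketch has no AWS dependencies; in real code it would be an S3 client call.

```python
from urllib.parse import unquote_plus


def describe_s3_records(event):
    """Return (bucket, key, size_in_bytes) for each record in an S3 notification."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(s3["object"]["key"])
        size = s3["object"].get("size", 0)
        objects.append((bucket, key, size))
    return objects


def handler(event, context, fetch=None):
    """Lambda-style entry point.

    `fetch(bucket, key) -> bytes` is a hypothetical downloader. When it is not
    supplied, the size reported in the event is used to estimate how much data
    would have to be moved.
    """
    total = 0
    for bucket, key, size in describe_s3_records(event):
        if fetch is not None:
            # This is the step that copies the data out of the object store.
            data = fetch(bucket, key)
            total += len(data)
        else:
            total += size
    return {"bytes_to_move": total}
```

Note that each notification describes individual objects, so a job that spans many objects (the second point) means many such fetches, or many invocations.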

This is what Storlets are for. Storlets are meant to run where the data is, without the need to move it, and not necessarily as a result of some automated workflow (although one can think of a Lambda function that triggers a Storlet). Looking at the pricing model of both AWS Lambda and Azure Functions, there is no direct accounting for the network. Thus, one can presume that the copying of data out of S3 in the above example is free of charge. However, the pricing model of both AWS and Azure relies on memory usage (no CPU!). Since Lambda functions cannot store anything locally, the memory usage is in complete correlation with the amount of data that needs to be moved. I am thus guessing that networking plays a major role in the pricing model (specifically given the lack of CPU utilization as an accounted resource).
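The memory-based accounting can be sketched as simple arithmetic: the billed quantity is memory multiplied by run time (usually expressed in GB-seconds), plus a per-request charge. The rates below are parameters on purpose; I am not quoting real prices, and the sketch ignores details such as rounding the duration up to the billing granularity.

```python
def gb_seconds(memory_mb, duration_ms):
    """Memory (in GB) multiplied by run time (in seconds) for one invocation."""
    return (memory_mb / 1024.0) * (duration_ms / 1000.0)


def estimated_cost(memory_mb, duration_ms, invocations,
                   price_per_gb_second, price_per_request=0.0):
    """Rough cost estimate; both price arguments are placeholders, not real rates."""
    compute = gb_seconds(memory_mb, duration_ms) * invocations * price_per_gb_second
    return compute + invocations * price_per_request
```

For example, a function configured with 512 MB that runs for 2 seconds consumes gb_seconds(512, 2000) = 1.0 GB-second. If the function has to hold a larger object in memory, the memory setting goes up and so does the bill, which is the correlation with data movement I am pointing at.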

The above data locality discussion brings up another big difference between the two: Storlets shine where there are huge amounts of 'unmovable' data. This suggests that Storlets are most likely to reside at the cloud 'back-end'. AWS Lambda functions, on the other hand, are also useful at the edge (as done with Lambda@Edge).

Another somewhat philosophical (and perhaps arguable) difference that arises from being event centric vs. data centric is the actual functionality being implemented: your Lambda functionality is probably as rich as the types of events in the system, while Storlets are as rich as the types of data you have and what you can do with it. I am not suggesting that one type is richer than the other, just pointing out the different nature of the functionality that might be used...

To conclude: I think that both technologies are great and both are required, but they are certainly not the same.
