me:Insights

The Data Redundancy Scheme Network Tradeoff

In a previous post I tried to develop a pricing model comparing CPU and network costs. The motivation behind that model was to show that putting more CPU power near the data is cheaper than moving the data. In this post I compare storage and network prices, with the following motivation: assuming that erasure-coded ('EC'ed') data cannot be processed locally, it must first be decoded, which means moving it, before it can be processed.
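To make the tradeoff concrete, here is a minimal sketch that compares a replicated scheme, processed locally, with an EC scheme that pays for network transfer on every processing pass. It is not the pricing model from the post. The function names, the prices, the k+m code parameters and the assumption that each pass moves the full data set are all hypothetical.

```python
# Hypothetical sketch: parameter names, prices and the cost formulas are illustrative
# assumptions, not the pricing model from the post.

def storage_cost(raw_gb: float, overhead: float, price_per_gb_month: float, months: float) -> float:
    """Cost of storing raw_gb of data with a given redundancy overhead factor."""
    return raw_gb * overhead * price_per_gb_month * months


def processing_network_cost(raw_gb: float, passes: int, price_per_gb: float) -> float:
    """Simplifying assumption: every processing pass over EC'ed data moves the full data set."""
    return raw_gb * passes * price_per_gb


def compare_schemes(raw_gb, replicas, k, m, storage_price, months, net_price, passes):
    """Return (replication_cost, ec_cost) over the given period."""
    # Replicated data is assumed to be processed locally, so it pays no network cost.
    replication = storage_cost(raw_gb, replicas, storage_price, months)
    # A k+m erasure code stores (k + m) / k bytes per raw byte, but must be moved to be processed.
    ec = storage_cost(raw_gb, (k + m) / k, storage_price, months) + processing_network_cost(
        raw_gb, passes, net_price
    )
    return replication, ec


replication, ec = compare_schemes(
    raw_gb=1000, replicas=3, k=10, m=4, storage_price=0.02, months=12, net_price=0.01, passes=4
)
print(f"replication: {replication:.2f}, EC: {ec:.2f}")
```

With these made-up numbers EC is cheaper; increasing `passes` or `net_price` shifts the balance toward replication, which is the kind of crossover a real pricing model would need to locate.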

AWS Lambda Vs. Storlets

In the past few months I have heard some people confuse Storlets with AWS Lambda, treating them as the same or as competing technologies. While both 'tools' leverage the same technologies and can share similar building blocks, I think of them as different beasts. In this blog post I will try to explain the differences (as well as the similarities). Let me start with a few words on each of them.

Spark and Storlets

One of the most compelling use cases for Storlets (at least as I see it) is boosting Spark analytics workloads by offloading compute to the OpenStack Swift object store. While not all types of workloads can be offloaded, some definitely can. This blog post introduces one such type of computation and presents Spark-Storlets [1], which implements its offload. When implementing Spark-Storlets, it was tempting to leverage the existing stack that connects Spark with Swift (especially Hadoop RDD and Hadoop-IO).

Thoughts on Hyperconvergence, Containers and 100TB SSDs

In the search for the holy grail of ‘compute and data co-location’, Hyperconvergence seems an interesting "avenue", especially with containers and the high storage-to-compute ratio offered by the forthcoming 100TB SSDs.

The CPU Network Tradeoff

Should data be brought to the compute or vice versa?

This is an old question in the 'storage centric' workload realm. Although the question is old, I am not aware of any cost model that tries to answer it in terms of price, that is: what costs more, bringing the data to the compute node or placing more CPU power where the data resides? Whether due to my ignorance or to the actual non-existence of such a model, here is a simple pricing model that attempts to answer that question.
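The question can be framed as a comparison of two costs. The sketch below is not the post's model; the function names, the per-GB network price and the per-CPU-hour price are hypothetical placeholders, assumed only to illustrate the shape of the comparison.

```python
# Hypothetical sketch: the parameter names and prices are illustrative assumptions,
# not the pricing model from the post.

def cost_move_data(gb_moved: float, price_per_gb: float) -> float:
    """Bring the data to the compute: pay for every GB sent over the network."""
    return gb_moved * price_per_gb


def cost_add_cpu(cpu_hours: float, price_per_cpu_hour: float) -> float:
    """Bring the compute to the data: pay for extra CPU hours on the storage nodes."""
    return cpu_hours * price_per_cpu_hour


def cheaper_option(gb_moved, price_per_gb, cpu_hours, price_per_cpu_hour) -> str:
    move = cost_move_data(gb_moved, price_per_gb)
    local = cost_add_cpu(cpu_hours, price_per_cpu_hour)
    # On a tie, this sketch prefers moving the data.
    return "compute near data" if local < move else "move data to compute"


print(cheaper_option(gb_moved=1000, price_per_gb=0.02, cpu_hours=100, price_per_cpu_hour=0.05))
```

A fuller model would also account for how many CPU hours the offloaded work actually needs on the storage side, which is where the interesting assumptions live.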
