All Things Co-located

The Case for Data Centric Hyperconvergence

In this blog post I would like to introduce the notion of data-centric hyperconvergence. Why should someone care about it? In two words: cost reduction. In a few more words: over the last decade, the ability to store data grew 10 times more than the ability to move data. It grew 20 times more if we consider only the last 5 years (details below). Thus, moving stored data (even within the data center) in order to process it becomes too costly, if not impossible. This growth in storage efficiency is driven by the need to store more data.

Machine Learning with Storlets

While working with the IOStack folks on a paper that shows how Storlets can be used to boost Spark SQL workloads, two of my colleagues, Raul Gracia Tinedo and Yosef Moatti, brought up the idea of doing machine learning with Storlets. Out of sheer ignorance I was against the idea. I was wrong, and in more than one way. Now that I am a little less ignorant about machine learning, I can say that Storlets can be useful for machine learning in several ways, which I describe in this blog post.

The Data Redundancy Scheme Network Tradeoff

In a previous post I tried to develop a pricing model for comparing CPU and network. The motivation behind the model was to show that putting more CPU power near the data is cheaper than moving the data. In this post I compare storage and network prices, with the following motivation: assuming that erasure-coded ('EC'ed') data cannot be processed locally, it must be decoded (that is, moved) before it can be processed.

AWS Lambda Vs. Storlets

In the past few months I have heard some people treat Storlets and AWS Lambda as the same or as competing technologies. While both 'tools' leverage the same technologies and can enjoy similar building blocks, I think of them as different beasts. In this blog post I will try to explain the differences (as well as the similarities). Let me start with a few words on each of them.

Spark and Storlets

One of the most compelling use cases for Storlets (at least as I see it) is boosting Spark analytics workloads by offloading compute to the OpenStack Swift object store. While not all types of workloads can be offloaded, there are some that definitely can. This blog post introduces one such type of computation, along with Spark-Storlets [1], which implements its offload. When implementing Spark-Storlets, it was tempting to leverage the existing stack that connects Spark with Swift (especially Hadoop RDD and Hadoop-IO).

Thoughts on Hyperconvergence, Containers and 100TB SSDs

In the search for the holy grail of ‘compute and data co-location’, hyperconvergence seems an interesting "avenue", especially with containers and the high storage-to-compute ratio offered by the forthcoming 100TB SSDs.

The CPU Network Tradeoff

Should data be brought to the compute or vice versa?

This is an old question in the realm of 'storage-centric' workloads. Old as it is, I am not aware of any cost model that tries to answer it in terms of price, that is: what costs more, bringing the data to the compute node or placing more CPU power where the data resides? Be it my ignorance or the actual nonexistence of such a model, here is a simple pricing model that attempts to answer that question.
