Wednesday, February 18, 2015

Searching faster than Binary Search

We all know that Binary Search is asymptotically the fastest way to search for data within a sorted (ordered) array in the comparison (or simple decision tree) model, assuming nothing in particular about the data or the data types.

However, there's an entire class of algorithms and data structures that focus on how efficiently they utilize the host system's cache while processing data. These are called cache-efficient algorithms and data structures [pdf].

There are two classes of cache-efficient algorithms and data structures:
  1. One class tunes the algorithm for a particular cache size or cache hierarchy. The algorithm is aware of the cache sizes, the number of caching levels, and the relative speeds of each of these caching levels.
  2. Another class of algorithms is oblivious of the underlying cache sizes and layouts, and is provably optimal for any cache size and caching layout (sounds magical, doesn't it!). These are called Cache Oblivious Algorithms. See this link on Cache Oblivious Algorithms for more details, and this link for more details on the model and the assumptions made in the Cache Oblivious model.

  • An example of a cache-efficient algorithm that is also cache-oblivious is Linear Search.
  • An example of a cache-inefficient algorithm that isn't cache-oblivious is Binary Search.
  • An example of a cache-efficient data structure that isn't cache-oblivious is the B-Tree (since B is a tuning parameter for the particular machine on which we are running).
Without getting into the details, the complexity of running Binary Search on an array in the Disk Access Model (DAM) (where we are only concerned with the number of disk blocks read, not the number of comparisons made) is O(log(N/B)), since we must load a block from disk for every probe until we reach a small enough sub-array (of size B) within which no more jumps will trigger another disk I/O operation to fetch another disk block. The optimal complexity for searching ordered data on disk is achieved by laying the data out recursively in a static search tree, which brings the cost down to O(log_B N).

However, implementing that structure is somewhat complicated, and we should ask ourselves if there is a way to get the best of both worlds, i.e.:
  • The runtime efficiency of the cache-oblivious recursive layout, and
  • The implementation simplicity of the standard Binary Search algorithm on an ordered array.

Turns out, we can reach a compromise if we use the square-root trick. This is how we'll proceed:

  1. Promote every sqrt(N)'th element to a new summary array. We use this summary array as a first-level lookup structure.
  2. To lookup an element, we perform binary search within this summary array to find the possible extents within the original array where our element of interest could lie.
  3. We then use binary search on that interesting sub-array of our original array to find our element of interest.




For example:

  • Suppose we are searching for the element '7': we first perform binary search on the top (summary) array, i.e. [10,22,33,43], and learn that 7, if present, must lie in the original array before the element 10. We then restrict our next binary search to the sub-array [5,7,8,10].
  • Suppose we are searching for the element '22': we first identify the sub-array [11,21,22,25,26,33] as potentially containing it, and then perform binary search on that sub-array.
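
To make this concrete, here is a minimal Python sketch of the two-level scheme (the helper names, and the choice of promoting each block's maximum element into the summary, are mine, for illustration):

import bisect
import math

def build_summary(values):
    # Promote every floor(sqrt(N))-th element (each block's maximum)
    # into a first-level summary array.
    step = max(1, math.isqrt(len(values)))
    summary = [values[i] for i in range(step - 1, len(values), step)]
    return summary, step

def lookup(values, summary, step, x):
    # First-level binary search: find the first block whose maximum
    # is >= x; only that block can contain x.
    b = bisect.bisect_left(summary, x)
    lo, hi = b * step, min((b + 1) * step, len(values))
    # Second-level binary search, restricted to that block.
    k = bisect.bisect_left(values, x, lo, hi)
    return k < hi and values[k] == x

values = [5, 7, 8, 10, 11, 21, 22, 25, 26, 33, 34, 43]
summary, step = build_summary(values)
assert lookup(values, summary, step, 7)       # present
assert not lookup(values, summary, step, 20)  # absent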

Even though we are asymptotically performing the same number of overall element comparisons, our cache locality has gone up: the number of block transfers we'll perform for a single lookup is now 2·log(√N/B) = log N − 2·log B, which is an additive factor of log B fewer than the log(N/B) = log N − log B transfers of our normal binary search on the sorted array.


You can find another similar idea named Cache Conscious Binary Search described here.

All the code needed for reproducing the results shown here can be found below:


Sunday, February 15, 2015

The many small steps to Pincha Mayurasana (पिंच मयूरासन ) (Feathered Peacock Pose)

Pincha Mayurasana (or Feathered Peacock Pose) is a yoga pose that roughly translates to a forearm stand. This can be the next step after mastering (or getting decent at) the Headstand (Shirshasana).

You'll need some tools to be able to start practicing the feathered peacock pose. These are:

  1. Forearm strength
  2. Shoulder strength
  3. Lower Back strength
  4. Lower Back flexibility (to arch into a slight concavity)
  5. General balance (that you'll have developed as part of the headstand practice)
  6. Some Core strength (that you'll have developed as part of the headstand practice)

Here are some posts explaining (in great detail) how to attain the final pose, along with some of the intermediate steps to practice on the way there:
  1. Baby Steps to Forearm Stands
  2. Pincha Mayurasana (Two Fit Moms)
  3. Feathered Peacock Pose (Yoga Journal) and the video
  4. Benefits of Pincha Mayurasana
Here are some excellent videos that you should watch to get a visual introduction to the pose:






Let me now discuss some of the extra bits I would like to add or stuff I would like to emphasize from the posts and videos above:


  1. Practicing the Cobra Pose (Bhujangasana) was helpful in opening up my lower back
  2. Practicing the Locust Pose (Salabhasana) (or even doing one leg at a time) was extremely helpful in developing lower back strength
  3. Forearm and shoulder strength can be developed by practicing the Peacock Pose (Mayurasana) and by using wrist strengtheners such as a Powerball
  4. When I started, I used a wall to support me when I kicked up from the dolphin pose
  5. Forearm strength, shoulder strength, and general flexibility can be developed by practising multiple rounds of Surya Namaskaar
  6. Andrei Ram Om's video on the Feathered Peacock pose (above) touches on some subtle but important aspects of the back position (arched) and the differences between various hand positions. I think it's extremely relevant to know these differences when you start, so that you can experience the full flavour of the pose

Om Shanti, and keep practising! Namastey!


Tuesday, February 10, 2015

Smallest multiple with constraints

Problem statement:

Given a number 'n', design an algorithm that will output the smallest integer number 'X' which contains only the digits {0, 1} such that X mod n = 0 and X > 0 (1 ≤ n ≤ 100000)

Problem credits: Ayush on the Stony Brook Puzzle Society Facebook Group

Insight: Suppose n = 6, and consider some candidate answer X, say 1101. Can this be correct? We can verify by dividing and checking the remainder: 1101 mod 6 = 3, which is non-zero, so 1101 is not a multiple of 6 and can't be our answer.

Another way to get the same remainder '3' is to say:
  • 1000 mod 6 = 4
  • 100 mod 6 = 4
  • 0 mod 6 = 0 (the tens digit of 1101 is 0, so it contributes nothing)
  • 1 mod 6 = 1
Add up all the remainders and take that mod 6. Hence, (4 + 4 + 0 + 1) mod 6 = 9 mod 6 = 3.
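
A quick way to convince yourself of this modular-arithmetic identity in Python (the numbers are the ones from the example above):

n = 6
parts = [1000, 100, 0, 1]  # 1101 = 1000 + 100 + 0 + 1
assert sum(parts) % n == sum(p % n for p in parts) % n == 3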

This basically means that we want to find a subset of the powers of 10 whose remainders, added together (mod n), leave a remainder of 0. We can solve this using dynamic programming, with a technique similar to the one used to solve the subset sum problem. The solution to our problem (like that of the subset sum problem) is pseudo-polynomial. Specifically, our problem is solved in time O(n log X).

Here is the ideone link, and below is the code (in python) to solve it.
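
As an illustrative sketch (not necessarily identical to the code linked above), here is one standard way to solve this in Python: a BFS over remainders mod n, a close relative of the subset-sum-style DP described above. BFS explores shorter numbers first, and tries the digit 0 before 1, so the first time we reach remainder 0 we have the smallest X:

from collections import deque

def smallest_01_multiple(n):
    # Smallest X > 0 using only digits {0, 1} with X % n == 0.
    start = 1 % n            # every candidate starts with the digit '1'
    if start == 0:           # n == 1: X = 1 works immediately
        return "1"
    # parent[r] = (previous remainder, digit appended), for reconstruction
    parent = {start: (None, "1")}
    queue = deque([start])
    while queue:
        r = queue.popleft()
        for digit in "01":   # try '0' before '1' to keep X smallest
            nr = (10 * r + int(digit)) % n
            if nr in parent:
                continue
            parent[nr] = (r, digit)
            if nr == 0:
                # Walk the parent chain back to rebuild X's digits.
                digits, state = [], nr
                while state is not None:
                    state, d = parent[state]
                    digits.append(d)
                return "".join(reversed(digits))
            queue.append(nr)

assert smallest_01_multiple(6) == "1110"

Each of the n possible remainders is enqueued at most once, so the search is pseudo-polynomial in n.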

Follow up problem:

Find a number 'X' given an 'n' (similar to the problem above) except that you are allowed to only use digits {0, 3, 5, 8}. Can you always find a solution for any 'n'?

Monday, January 19, 2015

How to make an intense cup of hot chocolate

tl;dr Find a way to melt chocolate; then add milk and sugar. Drink up (credits to Yazhini for this succinct description).

Experiment: One of the most intense hot chocolates you've ever had

Ingredients -- You will need (for 1 cup hot chocolate [240ml]):

  1. 250 ml whole milk
  2. 4-6 squares of dark chocolate (I use Ghirardelli)
  3. A ceramic cup
  4. 2 tablespoons of water
  5. (optional) 1/2 teaspoon sugar
Procedure:
  1. Pour the whole milk into a saucepan and heat till it begins to simmer
  2. Now take the chocolate squares and place them in the ceramic cup. Heat in a microwave for ~20 seconds at medium power
  3. Pour 2 tablespoons of water over the chocolate and heat in the microwave for 30 seconds at medium power
  4. Stir the chocolate and water well to make it a thick homogeneous mixture
  5. Pour the simmering milk into the ceramic cup, add sugar if needed, and stir well
  6. Optionally strain the hot chocolate to rid it of any thin layer of cream that may have formed while simmering the milk
  7. Drink and enjoy!
Observation:
One happy person (i.e. you)

Inference:
Hot chocolate makes you happy :)

Saturday, November 01, 2014

Macho-ism in Computer Science

It's common for me to see blog posts by companies talking about the high traffic volumes (in terms of QPS/RPS) they handle and the amount of data they process, and that's super cool. But then there's another class of facts I see floating around a lot: posts about the size of a company's Hadoop or serving cluster, where things like "dozens or hundreds" of machines in the fleet seem to be something to be proud of. I don't understand this way of thinking, or at least don't see the point of it. As I see it, it's nicer if you can get more done with fewer machines, not more machines in your fleet.

Sunday, October 12, 2014

Static To Dynamic Transforms or: How I learnt to stop worrying and love static data structures

Gaurav in his blog post describes in great detail what a static to dynamic transform is, when it is applicable, how to dynamize a static data structure, and the costs of inserting and looking up values in a dynamized data structure (relative to the costs in the corresponding static structure). I'll skip all of that since it's been presented so well in the link above, but will give a short description of the essentials.
  • What is a static data structure? A static data structure is one into which a new element can't be inserted efficiently; adding even a single element typically involves re-building the whole structure. e.g. A sorted array. If you want to keep an array of elements sorted, then inserting a single element could involve shifting all the elements one place to the right.
  • What is a dynamic data structure? A dynamic data structure is one where adding a single element is computationally efficient, and doesn't involve touching every element in the data structure. e.g. A height-balanced binary search tree. Inserting a single element into a height-balanced binary search tree takes O(log n) time (including any rebalancing rotations). See this page to read more about the differences between static and dynamic data structures.
  • What is the amortized insertion time to insert an element in a dynamized sorted array? Inserting a single element into a dynamized sorted array costs O(log n) per insertion.
  • How much extra space do you need to dynamize a sorted array? You need O(n) extra space to dynamize a sorted array, since you need intermediate storage space when merging parts (levels) of the dynamized data structure.
  • What is the query time in a dynamized sorted array? Searching for an element in a sorted array costs O(log n), whereas searching in a dynamized sorted array costs O(log² n).
We can see that the overhead of dynamizing a sorted array is something we can live with. In fact, it's almost unbelievable that we can dynamize a sorted array by paying only an O(log n) amortized cost per insertion and a multiplicative O(log n) overhead per query.
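
To make this concrete, here is a minimal Python sketch of the transform applied to a sorted array (the class and method names are mine; this is the standard "binary counter" / logarithmic method, not anyone's production code):

import bisect

class DynamizedSortedArray:
    # Level i is either empty or a sorted list of exactly 2**i elements,
    # so there are O(log n) non-empty levels for n elements.
    def __init__(self):
        self.levels = []

    def insert(self, x):
        # Like incrementing a binary counter: a new 1-element "carry"
        # merges its way up through full levels until it finds a slot.
        carry = [x]
        i = 0
        while i < len(self.levels) and self.levels[i]:
            carry = self._merge(self.levels[i], carry)
            self.levels[i] = []
            i += 1
        if i == len(self.levels):
            self.levels.append(carry)
        else:
            self.levels[i] = carry

    @staticmethod
    def _merge(a, b):
        # Standard linear-time merge of two sorted lists.
        out, i, j = [], 0, 0
        while i < len(a) and j < len(b):
            if a[i] <= b[j]:
                out.append(a[i]); i += 1
            else:
                out.append(b[j]); j += 1
        return out + a[i:] + b[j:]

    def contains(self, x):
        # One O(log n) binary search in each of the O(log n) levels:
        # O(log² n) per query, as promised.
        for level in self.levels:
            k = bisect.bisect_left(level, x)
            if k < len(level) and level[k] == x:
                return True
        return False

d = DynamizedSortedArray()
for v in [5, 3, 9, 1, 7]:
    d.insert(v)
assert d.contains(7) and not d.contains(4)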

Static to Dynamic transforms in practice: Consider that you're working with an inherently static data structure such as the SSTable (Sorted String Table), and that your system must maintain all its data as part of some SSTable. In such a case, inserting even a single row means that you need to rebuild a new SSTable containing all the rows from the previous SSTable plus the newly inserted row. This obviously means that the cost of inserting a single row can be linear in N, N being the number of elements in the newly created SSTable. This is extremely undesirable, since it means that inserting N elements into the system will cost O(N²).

This is exactly where the Static To Dynamic Transform comes in super handy. You just apply the transform, and almost magically, you've gone from an overall running time of O(n²) down to O(n log n) for inserting n elements into the data structure.

Saturday, October 11, 2014

Deamortizing disk writes during log rotation

According to this page on deamortization, "deamortization refers to the process of converting an algorithm with an amortized bound into one with a worst-case bound."

Log Rotation refers to the automated process used in system administration in which dated log files are archived. logrotate is one such UNIX command that lets you perform log rotation.

Many services (daemons in the UNIX world) write out log files that need to be periodically rotated and compressed. This is achieved using the logrotate command (mentioned above). Certain tools such as multilog also assist by augmenting the log with extra information such as timestamps, etc.

When a log file is rotated and compressed, the whole file is read back from disk (if it isn't already in the kernel's caches), and a lot of CPU is used for the duration of the compression. Depending upon the size of the log file, this could take anywhere from a fraction of a second to tens of seconds. If the log file is large (which is usually the case), it ends up starving other processes that need the disk or CPU. And if the supervising process (the one performing log rotation and compression) is synchronously reading from the daemon service's stdout/stderr and takes too long to rotate and compress the logs, it will block writes to the daemon's stdout/stderr file descriptors once the pipes between the supervisor and the daemon fill up.

This can be mitigated by deamortizing the costs of log rotation/compression by writing out the compressed file incrementally instead of compressing the uncompressed log file when it is rotated and a new log file is started. This not only deamortizes the I/O and CPU costs, but also ensures that we log a lot lesser in terms of bytes (on disk). This is because we were earlier writing out the uncompressed file and subsequently compressing it and writing the compressed log file, whereas we will now only write out the compressed file (eliminating writes of the uncompressed file, which is a large fraction of total log file writes). If we write out just the compressed log file incrementally, we're saving greatly on the IO cost associated with writing out the uncompressed log file. This simple change can go a long way in helping you scale your services to be able to handle a larger number of overall requests.