It has become fairly fashionable in networking to compare evolving architectures to trends that played out in compute and storage. The general idea is that if you can identify some architectural truth that applies nearly everywhere, you can draw conclusions that allow you either to predict the future or, at the very least, to position the particular future you advocate.

The challenge is that compute, storage, and networking are sufficiently different in nature that it is difficult to draw firm conclusions about networking from the way compute and storage have evolved.

For instance, the most basic architectural principle for both compute and storage is load distribution. By breaking workloads into small chunks and distributing them across a large number of smaller resources, we can perform in parallel tasks that used to be done serially. The resulting performance gains have been dramatic for some applications (think: Hadoop).

It stands to reason that networking might go through a similar evolution.

But what is the networking analog for distributing load across resources? The physical analogy is a push for lots of smaller devices (some of them virtual), but the meaningful resource the network provides is capacity rather than computing power. The networking equivalent of distributing load across a system is making better use of available paths. Rather than sending traffic across a small number of equal-cost paths, the principle of distributing load favors higher path utilization, even if some of those paths are not the shortest.

Once compute and storage moved to a distributed processing model, they faced interesting new failure scenarios that had to be addressed. For compute, the central architectural principle is distribution. If a workload's compute resources are spread across several pools, the others can pick up the slack when any one of those pools fails. This favors spreading virtual machines across servers rather than keeping them all on the same server. It also puts a premium on resource portability, so that compute workloads can be moved as conditions arise that favor one server over another.

On the storage side, it wasn’t just about distribution. Sure, distribution enabled the parallel processing needed to improve performance. For resilient architectures, however, it became necessary to store data in more than one place. This is why replication is so important. And it isn’t just a matter of replicating data across servers but of replicating it intelligently to control or mitigate failure scenarios. For example, Hadoop has the notion of rack awareness: data replicated across different racks is less susceptible to failures within a single rack (the loss of the top-of-rack switch, for example).
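Hadoop's HDFS documentation describes its default placement for the common replication factor of three: one replica on the writer's node (when the writer is itself a datanode), a second on a node in a different rack, and a third on a different node in that same remote rack. The following is a simplified sketch of that idea, not Hadoop's code or API; the function names, the `nodes_by_rack` layout, and the assumption that the writer is a datanode are hypothetical choices made for illustration.

```python
import random


def place_replicas(writer, nodes_by_rack, rng=random):
    """Choose three nodes for a block, loosely following HDFS's default policy.

    Hypothetical helper: assumes the writer is itself a datanode and that
    nodes_by_rack maps a rack name to the list of node names in that rack.
    """
    rack_of = {node: rack for rack, nodes in nodes_by_rack.items() for node in nodes}
    local_rack = rack_of[writer]
    # Candidate racks: anything other than the writer's rack that can hold two replicas.
    remote_racks = [rack for rack, nodes in nodes_by_rack.items()
                    if rack != local_rack and len(nodes) >= 2]
    if not remote_racks:
        raise ValueError("need a remote rack with at least two nodes")
    remote = rng.choice(remote_racks)
    # Second and third replicas: two different nodes in the same remote rack.
    second, third = rng.sample(nodes_by_rack[remote], 2)
    return [writer, second, third]


def survives_rack_failure(replicas, rack_of, failed_rack):
    """True if at least one replica lives outside the failed rack."""
    return any(rack_of[node] != failed_rack for node in replicas)
```

Because the replicas always span two racks, losing any single rack (or its ToR switch) still leaves at least one copy available, while keeping two replicas in the same remote rack limits the amount of cross-rack write traffic.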

So compute and storage evolved somewhat differently as each had to account for resilient design. For networking, what might a similar evolution look like?

In networking, resiliency requires multiple possible paths to get from A to B. If a device (or even a physical cable) breaks, that takes away some number of paths. The most resilient systems will find alternate paths in the event of failure. Where compute favored distribution and storage favored replication, networking favors diversity. Path diversity is the only route to more fault-tolerant topologies.
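One way to make path diversity concrete is to ask which single link failures would cut A off from B. The sketch below, using hypothetical helper names and a toy topology expressed as a list of undirected links, removes each link in turn and checks reachability with a breadth-first search.

```python
from collections import deque


def reachable(links, src, dst):
    """Breadth-first search over undirected links: can src still reach dst?"""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def critical_links(links, src, dst):
    """Links whose individual failure disconnects src from dst."""
    return [link for link in links
            if not reachable([other for other in links if other != link], src, dst)]


if __name__ == "__main__":
    chain = [("A", "B"), ("B", "C")]
    diverse = chain + [("A", "D"), ("D", "C")]
    print(critical_links(chain, "A", "C"))    # [('A', 'B'), ('B', 'C')]
    print(critical_links(diverse, "A", "C"))  # []
```

In the chain, every link is a single point of failure; adding one diverse path through D means no single link failure can separate A from C.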

So as we evolve networking, one of the things we ought to be collectively examining is how we introduce the concept of path diversity into the network. In point of fact, most network paths are determined by the same underlying algorithms, some more than 50 years old, that have long dictated how traffic crosses network topologies. The Shortest Path First (SPF) algorithms built on Dijkstra's work do a tremendous job of finding the lowest-cost paths (often simply the fewest hops) between two points. But the result is a relatively small number of active paths compared with all the possible ways to get between two places.
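To illustrate the gap, the sketch below runs Dijkstra's algorithm on a toy topology with unit link costs, counting how many distinct shortest paths connect two nodes, and compares that with the total number of loop-free (simple) paths between them. The topology, node names, and helper functions are hypothetical, and the brute-force path count is only practical for tiny graphs like this one.

```python
import heapq


def build_graph(links, cost=1):
    """Undirected adjacency map {node: {neighbor: cost}} with uniform link cost."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


def shortest_path_count(graph, src, dst):
    """Dijkstra's algorithm, also counting distinct shortest paths to each node."""
    dist, count = {src: 0}, {src: 1}
    heap, done = [(0, src)], set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry
        done.add(node)
        for nxt, weight in graph[node].items():
            if nxt in done:
                continue
            nd = d + weight
            if nxt not in dist or nd < dist[nxt]:
                dist[nxt], count[nxt] = nd, count[node]
                heapq.heappush(heap, (nd, nxt))
            elif nd == dist[nxt]:
                count[nxt] += count[node]  # another equal-cost way in
    return dist[dst], count[dst]


def count_simple_paths(graph, node, dst, visited=frozenset()):
    """Brute-force count of loop-free paths from node to dst."""
    if node == dst:
        return 1
    visited = visited | {node}
    return sum(count_simple_paths(graph, nxt, dst, visited)
               for nxt in graph[node] if nxt not in visited)


if __name__ == "__main__":
    links = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"),
             ("C", "D"), ("A", "E"), ("E", "F"), ("F", "D")]
    graph = build_graph(links)
    print(shortest_path_count(graph, "A", "D"))  # (2, 2): cost 2, two shortest paths
    print(count_simple_paths(graph, "A", "D"))   # 5
```

Shortest-path forwarding (even with ECMP across the two ties) uses two of the five loop-free paths from A to D; the other three, including the disjoint A-E-F-D path, sit idle until a failure forces a recomputation.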

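In fact, one of the issues with ECMP is that while it load balances across a number of equal-cost paths, traffic still ends up traversing a relatively small number of nodes, which makes bottlenecks possible, particularly around large flows. Because ECMP typically hashes each flow onto a single path to avoid reordering packets, a few large flows landing on the same path can congest it while other paths sit underused.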
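ECMP implementations commonly pick a next hop by hashing header fields (often the 5-tuple), so every packet of a flow follows the same path. The sketch below uses `zlib.crc32` as a stand-in for a switch's hash function, which is an assumption: real hardware uses vendor-specific hashes and configurable field selections. The helper names and the flow data are hypothetical. It sums bytes per next hop to show that spreading flows evenly by count does not spread load evenly when one large flow is present.

```python
import zlib


def ecmp_next_hop(flow, next_hops):
    """Pick a next hop by hashing the flow's 5-tuple (crc32 stands in for a switch hash)."""
    key = "|".join(map(str, flow)).encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]


def load_per_path(flows, next_hops):
    """Sum bytes per next hop; every packet of a flow follows its one hashed path."""
    load = dict.fromkeys(next_hops, 0)
    for flow, size in flows:
        load[ecmp_next_hop(flow, next_hops)] += size
    return load


if __name__ == "__main__":
    # Twenty small flows (src, dst, protocol, src port, dst port) plus one elephant.
    flows = [(("10.0.0.1", "10.0.1.1", 6, 40000 + i, 80), 1) for i in range(20)]
    flows.append((("10.0.0.2", "10.0.1.2", 6, 50000, 443), 1000))
    print(load_per_path(flows, ["spine1", "spine2", "spine3", "spine4"]))
```

Whichever spine the elephant hashes to carries at least 1,000 of the 1,020 units of traffic, no matter how evenly the small flows spread; that imbalance is the bottleneck described above.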

What compute and storage ought to have taught us is that we need to build path diversity into our networks if resiliency is a primary design criterion. This requires a fundamental change not just in how we design networks but also in how we evaluate networking gear. Some of the mainstay technical requirements (like ECMP) are still important, but they have to coexist with capabilities like non-equal-cost pathing.
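As one illustration of non-equal-cost pathing, the sketch below admits any path whose cost is within a `variance` multiple of the best cost and splits traffic inversely to cost, so longer paths still carry a share. This is loosely modeled on the `variance` knob EIGRP offers for unequal-cost load balancing, but it is a hypothetical scheme for illustration, not any vendor's algorithm; the function name, path labels, and integer costs are assumptions.

```python
from fractions import Fraction


def ucmp_weights(path_costs, variance=2):
    """Traffic shares for paths whose cost is within `variance` times the best cost.

    Hypothetical scheme: costs are positive integers, and each eligible path's
    share is inversely proportional to its cost, so cheaper paths carry more
    traffic but longer paths still carry some.
    """
    best = min(path_costs.values())
    eligible = {path: cost for path, cost in path_costs.items() if cost <= variance * best}
    inverse = {path: Fraction(1, cost) for path, cost in eligible.items()}
    total = sum(inverse.values())
    return {path: inv / total for path, inv in inverse.items()}


if __name__ == "__main__":
    costs = {"via-B": 2, "via-C": 2, "via-E-F": 3, "via-long": 5}
    for path, share in ucmp_weights(costs).items():
        print(path, share)  # via-B 3/8, via-C 3/8, via-E-F 1/4
```

With `variance=1` the function degenerates to plain ECMP (equal shares across the best-cost paths only); raising it trades some path length for more of the network's capacity being put to work.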

The point is that the analogies people lean on (comparisons to compute and storage) are useful, but their value lies less in direct correlation than in new ways of thinking about problems. As a thought exercise, it is worth examining what you believe are the driving reasons behind the changes. The changes themselves are simply instantiations of architectural evolution. What is actually driving the architectural change? And what are the networking analogs?

[Today’s fun fact: A lion’s roar can be heard from five miles away. My wife’s roar has triple that range.]