In my last column on complexity, I quoted from a “standard paper” in the realm of complexity to argue that complexity is necessary to provide adaptability in the face of uncertainty. The quote is nice, of course, but an argument from authority is rarely as strong as making the case directly. So how can we make the connection between complexity and uncertainty in a more concrete way? Let’s look at two examples in this post to cement the connection: hierarchical design and type-length-value (TLV) encoding in protocol design.

Hierarchical design is, of course, an industry standard going “all the way back.” From my first work in the networking field, in the early 1990s, I can remember having hierarchical design drilled into my head. My first book, Advanced IP Network Design, was mostly a treatise on hierarchical design, giving the thoughts, ideas, and reasons behind it. As we move towards a “flat network world,” can we safely discard hierarchical design as a way of building networks? I think we can safely answer that question with a resounding “no.” But doesn’t hierarchical design just add complexity that doesn’t need to be there? Isn’t building a network around some set of artificial rules about what should be placed where more complex than just building a big flat network? Doesn’t hierarchy make life more complicated for application designers, and don’t applications drive the network? All of the above, certainly. But before we abandon hierarchical design, let’s look at why the additional complexity is necessary in light of uncertainty in the network world.

There are two primary motivations for designing networks within a hierarchy: division of labor, and division of failure domains. Both are reactions to uncertainty. The more you divide labor, the easier it is to fit new things into the structure, and the better you are able to handle changing requirements. To take this from a policy perspective: how easy is it to find all the different bits of a policy scattered throughout the configurations of a thousand different devices deployed in a ten-thousand-router network? I can answer this from experience: it’s not very easy. By dividing a network into pieces that each focus on one or two functions, you can focus policy as well. Building networks in a hierarchy, then, is a reaction to the uncertainty of changing business requirements, policies, and applications deployed on the network itself.
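To make the policy point a bit more concrete, here is a minimal sketch; the tier names, policy names, and device counts are all hypothetical, not drawn from any real network or configuration system. The idea is simply that when policy is attached to a layer of the hierarchy, answering “where does this policy live?” becomes a lookup against a handful of tiers rather than a scan of ten thousand configurations.

```python
# A minimal sketch: policy attached to hierarchy tiers instead of
# scattered across devices. All names and counts are hypothetical.

# Each tier of the hierarchy owns the policy that applies there.
tier_policy = {
    "edge": ["filter-bogons", "rate-limit-customers"],
    "aggregation": ["summarize-pops"],
    "core": ["carry-no-external-routes"],
}

# Devices grouped by the tier they belong to.
routers = {
    "edge": [f"edge-{i}" for i in range(1000)],
    "aggregation": [f"agg-{i}" for i in range(100)],
    "core": [f"core-{i}" for i in range(10)],
}

def devices_with_policy(policy: str) -> list[str]:
    """Find every device a policy applies to by checking three tiers,
    not by scanning each device configuration individually."""
    return [
        router
        for tier, policies in tier_policy.items()
        if policy in policies
        for router in routers[tier]
    ]

print(len(devices_with_policy("summarize-pops")))  # 100 aggregation routers
```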

What about division of failure domains? The same concept applies. If we define a failure domain as the set of devices that must interact when the network topology (or reachability) changes, we can see that hierarchical design allows for aggregation, aggregation hides information, and hiding information reduces the size of a given failure domain. In the face of uncertain link and device performance, then, hierarchical network design is a rational response.
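Here is a minimal sketch of how aggregation shrinks a failure domain, using Python’s standard ipaddress module; the prefixes and the two-layer topology are hypothetical. Because the aggregation layer advertises only the covering /22 toward the core, a flap of any /24 beneath it is invisible beyond the aggregation point, and the core never has to react.

```python
# A minimal sketch of aggregation hiding information; the prefixes
# and topology here are hypothetical.
import ipaddress

# Four point-of-presence prefixes inside one aggregation block.
pop_prefixes = [ipaddress.ip_network(f"10.1.{i}.0/24") for i in range(4)]

# The aggregation-layer router advertises only the covering aggregate
# toward the core, hiding the individual /24s.
aggregate = ipaddress.ip_network("10.1.0.0/22")

def core_sees_change(flapped: ipaddress.IPv4Network) -> bool:
    """Return True if a flap of this prefix is visible beyond the
    aggregation point (i.e., it is not covered by the aggregate)."""
    return not flapped.subnet_of(aggregate)

# A flap of 10.1.2.0/24 is absorbed by the aggregate: the core's view
# (10.1.0.0/22 is still reachable) never changes, so core routers do
# not recompute anything. The failure domain stops at the aggregation
# layer.
print(core_sees_change(ipaddress.ip_network("10.1.2.0/24")))  # False
print(core_sees_change(ipaddress.ip_network("10.9.0.0/24")))  # True
```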

The case for type-length-value (TLV) encoding in protocol design is even easier to make. OSPF is the prime example of a protocol where the fields are hard-coded. Each packet format is fixed and fully determined, down to the last bit. This makes OSPF very efficient on the wire, and very simple from a packet-format perspective. OSPF was, in fact, much simpler than IS-IS when there were half a dozen or fewer LSA types. What’s happened since those early days, though? OSPF hasn’t weathered the storms of change in the network as well as IS-IS has from a packet-format perspective. Each new idea in OSPF requires a new LSA type, each with its own flooding and processing rules. One particular challenge, supporting IPv6, led to an entirely new protocol in the OSPF realm, OSPFv3. IS-IS, however, remains just IS-IS. New TLVs have been invented, of course, and some real new features have been added over the years, but IS-IS tends to take in new features, support for new address families, and the like very easily.
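A minimal sketch of the mechanism makes the point; this is a generic TLV walker, not the actual IS-IS encoding, and the type codes are made up. Because every TLV carries its own length, a parser written before a new type existed can simply step over it, which is exactly why new features tend to slot into IS-IS without requiring a new protocol version.

```python
# A minimal sketch (not the real IS-IS encoding) of why TLVs absorb
# change: an implementation that predates a TLV type can still parse
# the packet, using the length octet to skip what it doesn't understand.
import struct

KNOWN_HANDLERS = {  # hypothetical type codes
    1: lambda value: print("area addresses:", value.hex()),
    128: lambda value: print("IP reachability:", value.hex()),
}

def parse_tlvs(payload: bytes) -> None:
    offset = 0
    while offset + 2 <= len(payload):
        # Each TLV starts with a one-octet type and a one-octet length.
        tlv_type, tlv_len = struct.unpack_from("!BB", payload, offset)
        value = payload[offset + 2 : offset + 2 + tlv_len]
        handler = KNOWN_HANDLERS.get(tlv_type)
        if handler:
            handler(value)
        else:
            # Unknown type: the length field lets us step over it, so a
            # new feature doesn't break old parsers or force a new
            # protocol version.
            print(f"skipping unknown TLV type {tlv_type} ({tlv_len} bytes)")
        offset += 2 + tlv_len

# One known TLV followed by a made-up future type (200): the parser
# handles the first and cleanly skips the second.
parse_tlvs(bytes([1, 2, 0x49, 0x01]) + bytes([200, 3, 1, 2, 3]))
```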

None of this is to say that IS-IS is a superior protocol, or that OSPF is somehow a bad design. What we can learn, though, is that a little up-front complexity can ultimately reduce complexity by allowing for change; or rather, that complexity is a reaction to change in the environment. In OSPF’s case, the complexity has been added on through new LSA types, multiple protocol versions, and other mechanisms. In IS-IS’ case, the complexity has been added by defining new TLVs.

There are a number of other ways we can see how complexity is related to uncertainty, from injecting humans into security systems to counter brittleness, to creating processes for the development of new applications and protocols rather than “just doing it.” Now that we’ve validated the connection between complexity and uncertainty, we’ll jump into some of the foundational ideas of complexity in the next post.

‘til next time – keep it simple.

Russ White

Russ White is a Principal Engineer in IPOS who’s scribbled a basket of books, penned a plethora of patents, written a raft of RFCs, taught a trencher of classes, and done a lot of other stuff. He has 20 years of experience in network design and architecture, and holds an MSIT from Capella University, an MACM from Shepherds Theological Seminary, CCIE #2635, CCDE 2007::001, and a CCAr.
