Been there, done that... (chimeras of network engineering)
By gregoire on Sunday 14 September 2008, 17:49
I don't work in internet architecture anymore. Call it growing old, call it
taking the easy way out; on the whole, I was ready for a change, I was offered
one, and I signed up for it. Still, under no circumstances will I give up
digging through the industry's network tech news to keep sharpening my vision
of the technological ecosystem. Although I'm done (for the moment) with being a
grease monkey, I need that technical background to do whatever I'm doing right
now.
Now that I'm on the other side of the fence, I'd like to share with the very
few out there following my nerdy posts some of the frustrations I've been
accumulating over my past Internet Engineering years.
This oughta be fun
The chimeras listed below, which I've come across on my various journeys, keep
coming up when I chat over (a good number of) pints with my fellow Network
Architects. Surprisingly, we seem to share a common frustration over a good
number of topics which, despite the years they have been outstanding, never
seem to get resolved.
Layer 3 Switching VS {Switching OR Routing}
That one is actually my favorite. I even extend the concept to pretty much
any so-called "revolutionary" all-in-one electronic gizmo I come across.
Let's take a lively and very current example: cellphones. The initial
purpose of a cellphone is to give the user the ability to place phone
calls from virtually anywhere - it doesn't get simpler than that. Now try
going to your favorite local phone shop to purchase one that isn't some mix of
the following:
- camera
- mp3 player
- PDA
- GPS
What it usually ends up with is you buying one shiny/heavy/expensive device
that does it all.
One thing is for sure: the more research the phone manufacturer puts into one
of those features, the more detrimental it is to the initial purpose:
PLACING CALLS! It ends up with a vast range of side effects, such
as:
- battery lasts about 10 minutes with Bluetooth, Wi-Fi, camera (name any other one) all activated
- device won't fit in your pocket
- interface requires a PhD in human-machine interaction
- device is heavy as a brick
- AND MOST IMPORTANTLY: it does neither camera, nor mp3, nor PDA, nor phone as well as a single-purpose device does.
Let's be serious for a moment: a phone that is only a phone has a better
chance of working well than any multi-purpose device in the same price range.
That is a fact, mainly because cramming all those functionalities together
only raises the number of bugs that each extra feature causes in the phone
feature.
The case of L3 Switching is pretty similar.
Before (yeah, I'm leaping back to Y2K here), you had either Routing issues (not
too many really, routing has always been pretty standard)
OR Switching issues (in copious amounts).
The fabulous idea that L3 Switching promoted was to push a routing table into a
distributed switching table.
Sounds appealing on paper: less equipment, the cheapest price per routed port,
maximum flexibility, lower operational cost.
Here's a list of collateral issues it brought:
- lazy architects sacrificing the resilience of the "Access-Distribution-Core"
model for an all-in-one box, and particularly finance people reading the specs
and telling the techs "why do you need two of these to do that, when the specs
say one does it all?" - It is not about what the kit can do, it is about how
you want things to be done.
- memory issues... here they are... you know, when one feature sucks up all
your L3 switch's CPU, and all the subsequent cascading side effects end up with
the box process-switching all your traffic. Been there, done that. The absence
of decent partitioning between functions when it comes to memory soon becomes
your worst enemy.
- device code stability: it all comes down to the cellphone example - the
more features you add, the more you risk jeopardizing the existing ones. One
would say that cautious development rules this risk out, but it doesn't...
When you need to keep up with your competitors' features, you get sloppy, you
don't test as much, and your customers tend to become your field testers.
Been there.
- lazy engineers again, building temporary-but-everlasting designs made out
of VLANs forwarded all over the backbone, which later blow up when you least
expect it. Been there also: tons of headaches unravelling designs of varying
degrees of dumb and risky.
These are only a few examples of the side effects L3 Switching brought. But
let's face it: those of us who went to network engineering school were
taught one thing, and told again and again that it was the thing we should
always refer to: THE OSI LAYERED MODEL.
I'm not a rocket scientist myself, but I can easily understand why keeping a
partition between the Data Link Layer and the Network
Layer makes sense. I like to rely on the work of people who have
thought this through, rather than justify my copious Network Engineer salary by
re-inventing the wheel.
Layer 2 Normalization VS Vendor Specific
This is also one of my big time faves.
As I said, when I signed up for the Internet Industry, I found it comforting
that older people with grey beards had spent time torturing themselves over
what the best way to do things was, and that the fruit of all that common
grey matter was strict Norms. I would bless the IEEE and IETF people on a daily
basis. They just made my job easier: I just had to read.
IEEE 802.1, together with the Equipment Vendors, spoiled that naive vision. To
make it short, whenever it comes to Spanning Tree, I start to
freak out instantly. I have never been able to interoperate boxes from
different equipment vendors without side effects. Let's face it, plain STP
convergence time (30s at best) is not sufficient nowadays, especially with
increasing bandwidth. Whenever you configure RSTP, namely 802.1w, across
vendors, unless you deliberately test it, you will face a not-so-lovely surprise
when you discover that it has fallen back to plain Spanning Tree convergence
because of non-interop.
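For those wondering where that 30s comes from, here is a minimal sketch in Python. It assumes the default 802.1D timers (max age 20s, forward delay 15s); the function name is made up for the illustration, and real boxes may well be tuned differently.

```python
# Default classic Spanning Tree timers from IEEE 802.1D, in seconds.
MAX_AGE = 20
FORWARD_DELAY = 15


def stp_convergence_seconds(direct_failure: bool,
                            max_age: int = MAX_AGE,
                            forward_delay: int = FORWARD_DELAY) -> int:
    """Rough worst-case delay before a blocked port forwards again (hypothetical helper)."""
    # A port spends one forward delay in listening, then one in learning.
    transition = 2 * forward_delay
    if direct_failure:
        # The switch sees its own link go down and starts the transition at once.
        return transition
    # On an indirect failure it first waits for the stored BPDU to age out.
    return max_age + transition


print(stp_convergence_seconds(direct_failure=True))   # 30
print(stp_convergence_seconds(direct_failure=False))  # 50
```

In other words, every time RSTP silently falls back to plain STP, you are back to counting in tens of seconds.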
Some call it PVST+, some RSTP, some 802.1w. What I tend to assume is that
802.1w is hazy enough to let vendors implement their own flavour of it, hence
it ends up un-interoperable by default. Amongst other vendor-specific issues:
where do BPDUs go? Tagged in a proprietary VLAN? Untagged?
Why doesn't the PDF mention it if it is vendor specific?
I came to the unsatisfying conclusion that Spanning Tree is the oddest
protocol on earth: whenever you do Layer 2, you need it to prevent loops. Thing
is, once you've set it up, you don't have loop issues anymore, you have
Spanning Tree issues - what kind of sense does that make?
Most of the time, it forces you to sign up with a single equipment vendor,
which, from an engineering and financial perspective, is not satisfying at
all!
Dense Line Cards VS Non Blocking Backplane
This one is also a funny one. How many times have you met that guy in a suit,
spoiling your morning coffee moment (because the equipment vendor's office is
far away in the suburbs, you know, and the guy needs to drive to your office,
and because of traffic, it's easier early in the morning...), feeding you the
usual sales pitch: "you know, we have the densest non-blocking chassis on the
market!" *blink* *blink*.
Well OK, but if you question the suit guy enough (no offense, some of them have
become close friends...) he'll end up telling you that it is either
density or resiliency.
Let me figure this out, because I need to translate your sales pitch into usable
specs:
- if I stuff the chassis with all the densest linecards, then the backplane turns into not-so-non-blocking?
- those two management cards that I need to purchase to cope with that situation actually can't protect each other? Why is that? Because of the backplane design: in order to use all the ports at their nominal bandwidth, I lose the management resiliency feature?
- that shiny 48x1G card here has a 40G attachment to the backplane? Don't they come in 40x1G? What do I do with those last 8 ports that I've paid for as part of the kit?
- oh, and now, if I do multicast, the replication across separate linecards consumes internal bandwidth... so I will never be able to reach line rate?
What a shock, I would have thought that non-blocking actually meant
non-blocking.
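To put numbers on the 48x1G question, here is a small back-of-the-envelope sketch in Python. The function and field names are made up for the illustration, and it only looks at unicast: multicast replication would eat further into the fabric attachment and is not modelled here.

```python
def oversubscription(ports: int, port_gbps: float, fabric_gbps: float) -> dict:
    """Compare front-panel capacity with the card's backplane attachment (hypothetical helper)."""
    front_panel_gbps = ports * port_gbps
    # Ports that can run at full speed at the same time through the fabric.
    line_rate_ports = min(ports, int(fabric_gbps // port_gbps))
    return {
        "front_panel_gbps": front_panel_gbps,
        "ratio": front_panel_gbps / fabric_gbps,
        "line_rate_ports": line_rate_ports,
        "ports_beyond_fabric": ports - line_rate_ports,
    }


card = oversubscription(ports=48, port_gbps=1, fabric_gbps=40)
print(card["ratio"])                # 1.2
print(card["ports_beyond_fabric"])  # 8
```

A ratio above 1 means the card is oversubscribed: with all 48 ports pushing traffic through the backplane at once, something has to block.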
Enterprise grade MPLS VPNv4 VPNs
I've been in the enterprise business for a while, and the longer it goes, the
less I understand how we can pitch such VPNs.
One shared Architecture to provide all of our customers' VPNs. How
comforting is that? It often ends up with someone asking for MPLS rather
than asking for an actual solution - MPLS is trendy. MPLS does it all. MPLS is
the most secure thing on earth.
OK, first, if you decide to purchase an MPLS VPN, it is mainly because it is
cheaper. Otherwise, you'd build your own network on top of
leased lines.
You can't decently ask for a fully private (I'm talking resource-wise here)
network as long as you ask for it to be set up on a shared backbone, right?
If your goal is to go cheap, why don't you just buy some plain residential
internet access on all your sites, with more bandwidth than you actually need,
and pay a decent engineer to build tunnels on top of it? It will come out
really cheap, with the same amount of features. If you're paranoid, use two
separate internet access vendors on each site, residential bandwidth is cheap
nowadays.
What? You say the support from enterprise telcos is premium? Well,
sorry to spoil that, but it isn't. Try raising a ticket, you'll figure it out
by yourself.
Now something even more absurd: QoS
What is the use of QoS, except to cope with undersized pipes? I hear you:
"It brings Quality of Service!" Well, actually, it is just meant to keep
your important traffic from being dropped whenever
shit hits the fan (namely when your employees hog your centrally delivered
internet access).
I say QoS starts with sizing the network correctly. Whenever QoS is triggered,
you're past quality. You're just trying to limit the damage done. And yet, the
more QoS classes the telco offers, the more comforting you find it...
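A toy illustration of that point, in Python. It assumes a plain strict-priority scheduler, and the class names and traffic figures are invented for the example, not taken from any real telco offer.

```python
def strict_priority(link_mbps: float, offered: list[tuple[str, float]]) -> dict[str, float]:
    """Return the Mbps dropped per class; classes are listed highest priority first."""
    remaining = link_mbps
    dropped = {}
    for name, load in offered:
        # Serve as much of this class as the leftover capacity allows.
        served = min(load, remaining)
        remaining -= served
        dropped[name] = load - served
    return dropped


# Hypothetical offered load per class, in Mbps.
traffic = [("voice", 2.0), ("business", 6.0), ("web", 8.0)]

print(strict_priority(10.0, traffic))  # undersized pipe
print(strict_priority(20.0, traffic))  # correctly sized pipe
```

On the 10 Mbps pipe, voice and business traffic get through and 6 Mbps of web traffic is dropped. On the 20 Mbps pipe nothing is dropped, and the classes never come into play at all.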
I'd suggest one simple thing: before going after some trendy protocol, just
specify what your needs are, without even taking market trends into account.
Sum up your bandwidth needs, which features matter the most, and which are
only nice-to-have. Then go and ask various telcos what suits your needs
best.
I've got plenty of other chimeras to discuss with you folks, but I'm getting
sorta tired here... plus feeling helpless. I might write a couple more in a
future post, stay tuned :)