Use a CDN, you dummy!

As an introduction, this one link here goes out to all you web platform developers around the world - you need to have a read of what this guy says. Well, he's not any random "this guy", he's Steve Souders from Yahoo!'s web performance team. The article I mention states 34 rules to make your platform efficient, and guess what? Rule number two explicitly says: "use a CDN". One interesting fact you might want to know is that Yahoo!, pretty much the portal that symbolizes the whole Internet, has never worked without a CDN. Even in its early stages, it was powered by one of the oldest edge proxy caching CDNs around (I won't name it, but that is not the point).

That is a very well-known fact: you will never be able to serve as many users from as many locations as possible from a single central hosting facility in your home country. Sorry if it comes as a shock, but these are the raw facts. It holds whatever your business model is: either you are subscription based, and then performance is the driver to get more users from everywhere and keep your local ones (I'm not speaking about features here, let's assume you have the features and content to be successful), or you get money from your advertising partners, which means visibility must be good and as widespread as possible.

Also, say your traffic increases someday (good for you, you've become successful, you're living the dream), except that your business plan is not fixed yet, meaning you burn more money than you actually get from your platform. Are you going to deploy a network of your own before you even have a decent and realistic revenue forecast? If you're smart enough to put together a spreadsheet, there is no way you can conclude that deploying your own infrastructure and buying raw bandwidth will actually save you money... Not now (maybe later), not until your platform is finalized and making money. Not until you've had some experience and insight into how painful it can be to run your own infrastructure when your core job is designing a platform and making it interesting to other people, feature-wise. Right now you need to focus on your own job, and network and infrastructure simply aren't it. (Trust me, I've been there, or trust me not: I've loved convincing some of my previous employers that a datacenter, transit, routers and switches were the only way to go.) There will be a time when it will be a smart move, but simply not now.

So basically, whenever your traffic increases, you need performance at commodity prices. (Yeah right, who doesn't want that...)

The flavors of Content Delivery

Like any decent geek ice cream, CDN comes in many flavors. Basically, you'll end up choosing among these three:

  • reverse proxy caching (a la old school)
  • storage based (origin distributed)
  • peer to peer

These three have very different feature sets, weak points and costs, and each copes with a specific portion of your content. I'll try to go over them in detail.

Rule #1: Get to know the value your content assets have

Yeah right. Say you're in the early stages of developing a promising platform, but you don't know yet what your definitive business model is. (Don't laugh, the biggest ones are still looking for their business model, so there is very little chance you'll get it right on your first shot: the best example I have is Joost.) One thing is for sure: you don't know yet which slice of your content is valuable, so get technically ready to adapt your delivery methods and costs. That means thinking through the points below.

Meet Mr Content Director Engine

First of all, you need to consider that there is no perfect solution for all of your content. It all comes down to your ability to categorize your content closely. Based on the value and performance needs that you have set for each "class of content" (man, I do hate that CoS formalism, because it takes me back to when I explained that QoS was only necessary when you couldn't afford pipes big enough to carry your traffic and had to choose what to drop when shit hit the fan...), you will serve each class using a different delivery method. Basically, it is all about being able to write your own Content Director Engine. Make sure you have a philosophy for conditionally writing the URLs that point to your statics from the moment you start building your platform. It is a difficult thing to do once your platform is live and running, so please take a moment to think this through before you have too many users / too much content on your platform, or it'll become one of those technical hassles that end in blood, sweat, tears and big fat ugly downtime.
You might want to be able to choose where to serve your content from depending on:

  • The network from which the request is sent (I'm talking AS number here, which basically identifies the network of the ISP the request is coming from)
  • The location from which the request is sent (use GeoMapping databases such as MaxMind or Quova - MaxMind has a handful of APIs, including an Apache module here)
  • The popularity of the resource being viewed: something from the short tail is one of your most viewed items, so it should get better delivery.
  • The monetization value of the item: a plain "stairway faceplant" UGC video is low value, the hottest UGC video making the current buzz is high value, and licensed creative content is very high value. Basically, the long tail is low value, the short tail is mid value and creative/licensed content is premium value.
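
To make this a bit more concrete, here is a minimal sketch of what the URL-writing part of such a Content Director Engine could look like. Everything in it is an assumption made up for illustration: the host names, the AS numbers (taken from the range reserved for documentation), the view threshold and the policy itself. The country code is expected to come from a GeoIP lookup done elsewhere. Take it as one possible policy, not as the way to do it.

```python
from dataclasses import dataclass

# Hypothetical placeholders: replace with your own hosts, networks and thresholds.
CDN_HOST = "cdn.example.net"         # professional CDN, the default choice
ORIGIN_HOST = "static.example.com"   # your own central hosting
PEERING_HOST = "local.example.com"   # content served over local peering

PEERED_ASNS = {64500, 64501}         # networks you peer with (documentation ASNs)
HOME_COUNTRIES = {"FR"}              # countries your central hosting serves well
SHORT_TAIL_VIEWS = 10_000            # daily views at which an item counts as short tail


@dataclass
class Request:
    asn: int        # AS number of the network the request comes from
    country: str    # ISO country code, resolved by a GeoIP lookup elsewhere


@dataclass
class Asset:
    path: str
    daily_views: int
    value: str      # "low", "mid" or "premium"


def pick_host(req: Request, asset: Asset) -> str:
    """Choose a delivery host from network, location, popularity and value."""
    # Users on a peered network are served over peering, whatever the asset.
    if req.asn in PEERED_ASNS:
        return PEERING_HOST
    # Premium or popular items go to the CDN for performance.
    if asset.value == "premium" or asset.daily_views >= SHORT_TAIL_VIEWS:
        return CDN_HOST
    # Low-value long tail for nearby users can come from central hosting.
    if req.country in HOME_COUNTRIES:
        return ORIGIN_HOST
    # Everything else: CDN by default.
    return CDN_HOST


def static_url(req: Request, asset: Asset) -> str:
    """Write the URL of a static asset for this particular request."""
    return f"https://{pick_host(req, asset)}/{asset.path.lstrip('/')}"
```

The point of funnelling every static URL through one function like static_url is that changing the delivery policy later only means changing pick_host, not hunting for hardcoded host names all over a live platform.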

One step further in distributing your content:

More than the above, your content director engine can even move your assets around, from fully owned storage (expensive) to disposable storage clouds, for instance... You can even mix and match professional CDNs with centralized transit delivery or even peering, but only once you have a network to play with and cry about. I'd suggest peering is the cheapest and best-performing way to serve your local users.

The trick here is to have everything ready to detect new trends in valuable content, and to move that content to the proper storage and delivery bundle with the fewest possible tweaks to your platform. The general guidance would be to use a CDN by default, and to move content to your own internal distribution when it reaches significant revenue levels and you can kick off a project to serve it locally by your own technical means (you'll need hosting, edge proxies, a datacenter switching fabric, and routers with local peerings at an IX or private peerings with the local ISPs, plus transit).
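
As a rough illustration of the "CDN by default, internal when revenue justifies it" guidance, here is a small sketch that flags which classes of content might be worth an internal delivery project. The class names, the figures and the safety margin are all hypothetical, and the internal cost estimate is something you would have to work out yourself (hosting, proxies, switching, routers, peering, transit, people).

```python
def classes_to_internalize(monthly_revenue, monthly_internal_cost, margin=2.0):
    """Return the content classes whose revenue covers the estimated
    internal delivery cost at least `margin` times over.

    Classes with no cost estimate are never selected: they stay on the CDN.
    """
    return sorted(
        name
        for name, revenue in monthly_revenue.items()
        if revenue >= margin * monthly_internal_cost.get(name, float("inf"))
    )


# Hypothetical monthly figures, in whatever currency you bill in.
revenue = {"long_tail": 500.0, "short_tail": 4000.0, "licensed": 30000.0}
cost = {"long_tail": 8000.0, "short_tail": 8000.0, "licensed": 12000.0}

print(classes_to_internalize(revenue, cost))  # ['licensed']
```

The margin is there on purpose: running your own delivery always costs more than the spreadsheet says, so a class should clear the bar comfortably before you move it off the CDN.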

Again, maintaining a network architecture of your own is far away from your initial business, which is supposedly content. So whenever you decide to move some piece of your content to internal delivery methods, please take some well-deserved time to think it through: you need to know what the benefit is, because once you've stepped into building a network, you have responsibilities in the Internet ecosystem, and you need to maintain it 24/7 as one of the networks that make up the Internet. With great power comes great responsibility, some may say, so there have to be well-established reasons for you to venture out of your core business!

I'll stop for now; next time I'll go over the different delivery methods I just mentioned. Stay tuned for our next episode, fade to black, advertisement coming up :)