Why would you not choose to go colo and save the money? Even with hiring an extr...

RyanGWU82 · on Jan 2, 2013

Our #1 requirement has been to keep up with the growth in traffic on the site. We've been growing so fast that there's literally no way we could have ordered and racked equipment fast enough. We were also a very small team -- a year ago there were only about a dozen people in the whole company. At this point we're much larger, which gives us room to consider more options like colo or multiple cloud providers.

AWS certainly feels pretty costly when you compare colo prices to the list price for on-demand instances. But one of the reasons I wanted to present our work is to show that you can use the cloud for a lot less than the list price. It takes work to buy reserved instances or run spot instances, but that does make it much more cost competitive.

samstave · on Jan 2, 2013

I am not sure what you mean exactly.

By "going colo" they would need to lay out all the upfront hardware costs. These would not be insignificant. They would then have all the operational overhead of maintaining all that gear.

You may have a higher monthly cost for AWS services - but without needing to buy any physical hardware, you have way less to worry about. Further - if you need to scale, it can be done in seconds rather than weeks/months given lead times for procurement, design time, implementation time (i.e. scheduling the install in the colo, coordinating the need for more space etc...)

This is just scratching the surface...

vidarh · on Jan 2, 2013

> By "going colo" they would need to layout all the upfront hardware costs.

Leasing still ends up substantially cheaper than EC2, and so does managed hosting at a number of providers. I've never paid upfront for any colocated hardware I've been responsible for. Last time I priced this out, EC2 ended up 2-3 times as expensive as managed hosting (with no upfront costs for the managed hosting either), and the gap to leasing servers and putting them in a colo was even larger (but there you do need some scale before you cover the extra ops costs).

> They would then have all the operational overhead of maintaining all that gear.

If you're small enough, sure, your savings won't pay for extra ops people. But you don't need to be very large before the savings outweigh the cost of more ops people. And with managed hosting this is a non-issue - at that point you don't have any more ops issues than you have with EC2.

> Further - if you need to scale, it can be done in seconds rather than weeks/months given lead times for procurement, design time, implementation time (i.e. scheduling the install in the colo, coordinating the need for more space etc...)

It's not either/or. In fact, being prepared to use EC2 to handle peaks means the cost difference between self/colo hosted (+occasional EC2 use for peaks) and EC2 gets even larger, as you can run your own servers at far closer to full capacity without the risk you'd take if you didn't have that ability. Handling occasional peaks with EC2 is a great use of it, and definitely cost-effective.

> Further - if you need to scale, it can be done in seconds rather than weeks/months given lead times for procurement, design time, implementation time (i.e. scheduling the install in the colo, coordinating the need for more space etc...)

See above. But also consider that if you instead compare against the managed hosting option, a number of providers will auto-provision in minutes to a couple of hours once an order is placed. And many providers now also offer a mix of colo, managed hosting and EC2-like cloud solutions, so you can put your base load in a rented rack, scale in the mid term via managed hosting, and spin up cloud instances as needed if you want to deal with a single provider.

EC2 is great for "quick and dirty" temporary solutions, batch jobs or handling peaks that last less than about 6-8 hours a day, and I use it now and again for that reason. But the moment your instances are up more than about 8 hours a day, and you have more than a few of them, it will quickly start costing you more than the alternatives.

samstave · on Jan 2, 2013

>EC2 is great for "quick and dirty" temporary solutions, batch jobs or handling peaks that last less than about 6-8 hours a day, and I use it now and again for that reason. But the moment your instances are up more than about 8 hours a day, and you have more than a few of them, it will quickly start costing you more than the alternatives.

I think Adrian Cockcroft & Jedberg may disagree with this statement.

Netflix has made a point (and a business model) of pushing all their infrastructure costs for their streaming service to AWS for many reasons.

They clearly have a HUGE amount of traffic across their service, and they are very successful in keeping a lean team on staff that has a focused skillset while not needing all the IT ops folks on staff. The HW costs to support their service would be very large as well as the distribution of that HW across the [nation|globe] to support their userbase.

Also, I do not think you're properly accounting for all the design and support considerations.

In a large infrastructure implementation you're going to need quite a few ops specialties: (in smaller orgs, these roles can be collapsed; in very large orgs they are discrete. Your ops costs get high fast in large infrastructure deployments)

Architect

Network

Server

Support (deployment, ops, maintenance etc..)

With the need for 24/7/365 ops coverage - especially if you have multiple regions/internationally deployed infrastructure... you can see how this can get expensive.

So, I think there are a few sweet spots that can be looked at.

Finally, there is also the hybrid model, where you have your own base-line infrastructure which scales out to AWS to support larger load (CDN model)

vidarh · on Jan 3, 2013

> I think Adrian Cockcroft & Jedberg may disagree with this statement.

They might. But either they haven't priced it out, or they have decided it's worth paying several times as much for some reason. Given that the high price of EC2 gets brought up and that I've never seen them actually address the pricing issue, I'm not going to speculate about why they've decided to make that tradeoff. I find it quite baffling, though, and I'd be very interested to read it if they have published a serious assessment somewhere.

> They clearly have a HUGE amount of traffic across their service, and they are very successful in keeping a lean team on staff that has a focused skillset while not needing all the IT ops folks on staff.

Given the very public, very extensive issues that Reddit in particular have had with their hosting, and how they kept taking the entire service down for maintenance seemingly always when I want to use it (since I tend to want to use it when Americans are sleeping, I guess), I'm not so sure this is a glowing endorsement of doing things their way. I certainly couldn't get away with the stability record Reddit has: the CEO where I currently work would look at me as if I was crazy if I suggested even the number of scheduled maintenance windows Reddit takes. I don't use Netflix, so I haven't kept track of how they're doing stability-wise.

EDIT2: Actually looking at their numbers, and comparing EC2 prices, I'm fairly comfortable in saying that the setup we're running is actually larger than theirs in terms of total computing resources (but nowhere near them on bandwidth use), which is quite interesting...

> while not needing all the IT ops folks on staff.

You can have someone else do the IT ops for co-located services too. There are literally thousands of companies offering suitable services on an hourly basis, and dozens that offer it globally. Outsourcing ops is easy.

And with managed hosting, the ops you need to do yourself if you don't pay for extra service tiers is pretty much the same as for EC2. Someone else handles the hardware, just as with EC2. Someone else handles the network, just as with EC2. What you need to handle is what is installed on your servers, just as with EC2.

> The HW costs to support their service would be very large as well as the distribution of that HW across the [nation|globe] to support their userbase.

You pay for the HW with EC2 too. You just don't get to own it at the end. A typical colocated setup often involves leasing rather than purchasing, so you're still typically dealing with monthly payments. And if you don't want to own, managed hosting is still vastly cheaper.

As an example, leasing costs for our latest purchases of a quad-server box containing 4x dual hex-core 2.6GHz CPUs with 24GB RAM each, and 24x 256GB OCZ Vertex 4 SSDs, are about $600/month per unit. With their share of our rack space, power, bandwidth etc. the full hosting cost excluding our ops cost for this box is about $750/month (this is accounting for the fact our racks are currently nowhere near full, and so this price is higher than it could be).

Comparing them to EC2 is a bit tricky, since there's no direct equivalent. But to be very generous to EC2 and use a model that these servers substantially outperform, consider that 4 x single M3 Double Extra Large in US East is around $3300/month (which is indeed quite a bit better than last time I looked - I'll grant that). Against the ~$750/month hosting cost, that leaves me about $2550/month to assign to ops every month for that single box.
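A minimal sketch of that subtraction, using only the two monthly figures quoted above. The constant and function names are made up for illustration, and the result ignores EC2 bandwidth and EBS charges just as the comparison does.

```python
# Figures quoted in this comment, in USD per month.
HOSTED_BOX_COST = 750        # lease plus share of rack space, power, bandwidth
EC2_FOUR_INSTANCES = 3300    # 4 x M3 Double Extra Large, US East, on-demand


def ops_headroom(ec2_monthly, hosted_monthly):
    """Monthly amount that could go to ops before the hosted box costs as much as EC2."""
    return ec2_monthly - hosted_monthly


headroom = ops_headroom(EC2_FOUR_INSTANCES, HOSTED_BOX_COST)
print(f"Ops headroom per quad-server box: ${headroom}/month")
```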

In reality, for our loads the more direct equivalent would likely be the High I/O EC2 instances, which are almost 3 times as expensive.

(EDIT: Note also that this is before accounting for any bandwidth charges or costs for EBS volumes or similar for EC2; on the other hand you can of course cut the hourly cost by paying upfront for reserved instances - effectively you're then paying for "fractional managed hosting"... Last time I looked that still ends up more expensive, though the margin is definitely better)

If we had hardware that required enough extra time to deal with to cost us anywhere near that, we'd throw it in the garbage. We're in London. Here, that's 30%-50% of the fully loaded cost of a mid-level ops person...

In reality our dev-ops cost per server (remember the box above is four individual servers) is ~$400/month and dropping, as part of that cost is development work to automate more of our maintenance. That is our total. Of that, ~$100/month is related to the physical server or network infrastructure and maintenance, and thus covers costs that are included in the EC2 price.

The rest is related to maintenance of the VMs running on those servers as well as monitoring of the VMs, which we'd still pay for if we were using EC2.

So comparing against the relatively underpowered EC2 instances above, one of our new boxes costs us ~$1150/month for equivalent service ($750 hosting plus 4 x ~$100 of hardware and network ops), or ~$2350/month total ($750 plus 4 x ~$400). So we're getting all the dev-ops and monitoring for our VMs "for free" and then some compared to EC2, despite being small enough that we have a lot of ops overhead.
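The per-box totals can be reproduced from the per-server figures given above. This is only a sketch of the arithmetic; the variable names are illustrative, and the EC2 figure is the on-demand price quoted earlier, before bandwidth or EBS.

```python
SERVERS_PER_BOX = 4
HOSTING_PER_BOX = 750         # USD/month: lease plus rack, power, bandwidth share
DEVOPS_PER_SERVER = 400       # USD/month: total dev-ops cost per server
HW_NET_OPS_PER_SERVER = 100   # USD/month: the part EC2 pricing already includes
EC2_COMPARISON = 3300         # USD/month: 4 x M3 Double Extra Large, on-demand


def box_costs():
    """Return (EC2-equivalent cost, total cost) per quad-server box, USD/month."""
    ec2_equivalent = HOSTING_PER_BOX + SERVERS_PER_BOX * HW_NET_OPS_PER_SERVER
    total = HOSTING_PER_BOX + SERVERS_PER_BOX * DEVOPS_PER_SERVER
    return ec2_equivalent, total


ec2_equivalent, total = box_costs()
print(f"EC2-equivalent service: ${ec2_equivalent}/month")
print(f"Including VM maintenance and monitoring: ${total}/month")
print(f"Still below the EC2 instances alone by: ${EC2_COMPARISON - total}/month")
```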

Judging from our growth, our total dev-ops cost with twice as many servers as we have today would likely only increase by ~10%-20%, and so our per-server cost would drop accordingly. Similarly, our rack and power costs would remain roughly constant as we have spare space in our racks, and so the per-server costs would drop even more. I'd expect our rough per-box costs for the quad-server boxes above to drop to ~$900/month if the number doubled, with "EC2 equivalent" ops included.
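One way to arrive at roughly that ~$900/month figure, as a sketch under stated assumptions rather than the actual calculation behind it: the whole $150/month gap between the $600 lease and the $750 hosting cost is treated as a fixed cost spread over more boxes, and the 10%-20% growth is applied to the total hardware-related ops cost. Both are assumptions, as are the names used.

```python
LEASE_PER_BOX = 600      # USD/month, scales with the number of boxes
SHARED_PER_BOX = 150     # USD/month today (750 - 600); ASSUMED fixed in total
HW_OPS_PER_BOX = 400     # USD/month today (4 servers x ~$100 hardware/network ops)


def projected_box_cost(scale, ops_growth):
    """Per-box monthly cost after multiplying the box count by `scale`.

    `ops_growth` is the ASSUMED growth of the total ops cost (0.10 for 10%).
    """
    shared = SHARED_PER_BOX / scale                    # fixed total, more boxes
    ops = HW_OPS_PER_BOX * (1 + ops_growth) / scale    # total grows slowly
    return LEASE_PER_BOX + shared + ops


for growth in (0.10, 0.20):
    cost = projected_box_cost(scale=2, ops_growth=growth)
    print(f"ops cost +{growth:.0%}: about ${cost:.0f}/month per box")
```

With the box count doubled, both ends of the 10%-20% range land within a few percent of $900/month.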

Keep in mind again, that this is comparing to an instance type I know these servers outperform comfortably, and excludes EC2 bandwidth and EBS or other services.
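As a rough cross-check of the earlier rule of thumb about instances that are up more than about 8 hours a day, the same figures give a break-even point. This sketch assumes on-demand cost scales linearly with hours run and that the dedicated box is paid for the full month regardless of use; it again leaves out bandwidth and EBS, and the function name is illustrative.

```python
HOURS_PER_DAY = 24


def break_even_hours_per_day(ec2_full_time_monthly, dedicated_monthly):
    """Daily hours of on-demand use at which EC2 costs the same as the dedicated box.

    ASSUMES EC2 cost is proportional to hours run; the dedicated cost is flat.
    """
    return HOURS_PER_DAY * dedicated_monthly / ec2_full_time_monthly


# $3300/month for the EC2 instances running 24 hours a day,
# $1150/month for one box with "EC2 equivalent" ops included.
hours = break_even_hours_per_day(3300, 1150)
print(f"Break-even at about {hours:.1f} hours/day")
```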

> In a large infrastructure implementation you're going to need quite a few ops specialties:

I don't know why you believe that EC2 is any simpler to work with than managed hosting in this respect. It isn't. Simpler than a co-located setup where you own your own servers, sure. You don't need much size before it's still cheaper, though.

Many hosting providers even provide APIs for their managed hosting, and deploy them all using Xen, with the only difference being that you commit to pay for full months of service and a dedicated physical machine. At the same time you often get the benefit of being able to order custom setups tailored to your workload.

> Finally, there is also the hybrid model, where you have your own base-line infrastructure which scales out to AWS to support larger load (CDN model)

I mentioned exactly that, and it is what I recommend unless there are other reasons not to use EC2. If you handle peaks via EC2, and your traffic is suitably spiky, you can load your dedicated base servers to 90%+ if you're careful, instead of often <50% if you don't have any way of rapidly scaling up. This drives the cost advantage of dedicated hosting for your base load even higher.
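To put a number on the utilization effect: dividing the monthly cost of a box by the fraction of its capacity you actually use gives the cost per fully used box. This is only a sketch; the $1150/month figure is the "EC2 equivalent" cost from above, the function name is made up, and the occasional EC2 burst cost that the 90% case relies on is not included.

```python
def cost_per_used_box(monthly_cost, utilization):
    """Monthly cost per box-worth of capacity actually in use."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return monthly_cost / utilization


at_50 = cost_per_used_box(1150, 0.5)   # no way to burst, so run at half load
at_90 = cost_per_used_box(1150, 0.9)   # peaks handled elsewhere
print(f"50% loaded: ${at_50:.0f} per used box")
print(f"90% loaded: ${at_90:.0f} per used box")
print(f"Ratio: {at_50 / at_90:.1f}x")
```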