Lately I have heard many presentations about virtualization solutions, whether orchestration, storage, networking, or something else, and the vendors selling them share a usual phrase: "this product is designed for large companies AND service providers". The underlying idea, coming from vendors thinking about both use cases for their products, is that a large company is completely comparable to a service provider. At first glance the thought seems correct, but there is a significant difference that ultimately makes it wrong.
For sure, one of the reasons behind the idea of comparing large companies to service providers is the size of the infrastructure. A company serving 10,000 employees is not that different from a service provider hosting several virtual machines that, combined, serve the same number of users: in both cases there will be 10,000 mailboxes, 10,000 hits to websites, 10,000 logins to file servers, and so on.
If the running workloads have the same size and performance requirements, the infrastructure needed to run them will be about the same.
Furthermore, other characteristics are common, automation above all. You cannot think about managing thousands of servers and virtual machines without some automation tool. Whether it is Puppet, Chef, or another solution (some of them, funnily enough, were originally designed for large companies and have been adapted to be sold to service providers too), they are fundamental above a certain size: there are too many managed systems per system administrator, and manual management is simply impossible. In fact, one of the KPIs (Key Performance Indicators) in large environments is exactly this ratio.
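To make that KPI concrete, here is a minimal sketch. All the numbers, including the threshold for "manageable by hand", are hypothetical illustrations, not industry standards:

```python
# Illustrative sketch: the managed-systems-per-admin ratio as a KPI.
# All figures here are hypothetical examples.

def systems_per_admin(managed_systems: int, admins: int) -> float:
    """Average number of systems each sysadmin has to manage."""
    return managed_systems / admins

ratio = systems_per_admin(managed_systems=2000, admins=8)
print(f"{ratio:.0f} systems per admin")  # 250 systems per admin

# Hypothetical threshold: roughly what one admin could still handle manually.
MANUAL_LIMIT = 50
if ratio > MANUAL_LIMIT:
    print("Above this ratio, automation tools are mandatory")
```

Once the ratio climbs well past what a single person can touch by hand, tools like Puppet or Chef stop being a nice-to-have and become the only way to operate.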
Everything under control?
So, you would think the idea that those two use cases are the same is correct. In general it is, but there is at least ONE element that makes the two totally different, and it is usually overlooked or purposely forgotten: users, and how they consume the resources of the infrastructure.
In a large company, even the biggest one, you can achieve a high degree of accuracy in forecasting the growth of the IT infrastructure, and thus design it accordingly. All the requests from the various business units are evaluated, analyzed, and finally translated into an infrastructure solution that also includes capacity planning.
In a service provider, at least one of a certain size, once a user (the customer) has gained access to the services he bought, there is no direct or reliable way to know what he is going to run, how much compute power he will use, or how much storage he will consume. Unless you place limits on customers' growth (and even if many service providers do it, this is not what a service provider should do), at any time a customer can deploy and run as many workloads as he needs.
Let's look at a quick example: for a new e-commerce site, the user wants to create a classic 3-tier infrastructure, with a database server, some application servers, and some web servers. If the user is a business unit of a large company, he will meet with the IT department to evaluate the project together, so in the end he can be sure the required IT resources are available. But if the user is the customer of a service provider, he is simply going to log in and start deploying the required virtual machines. All of a sudden, the provider could end up with 50 new virtual machines and 10 more TB of consumed storage space.
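A quick sketch of what that single deployment does to a provider's headroom. The cluster figures are hypothetical, chosen only to show how fast free capacity disappears when there is no planning meeting in between:

```python
# Sketch: impact of one self-service deployment on a provider's headroom.
# Cluster headroom figures are hypothetical.

free_vm_slots = 120       # VMs the cluster can still host
free_storage_tb = 25.0    # TB of free storage

# The customer deploys his 3-tier stack, sized entirely at his discretion.
new_vms = 50
new_storage_tb = 10.0

free_vm_slots -= new_vms
free_storage_tb -= new_storage_tb

print(f"Remaining headroom: {free_vm_slots} VM slots, {free_storage_tb} TB")
# One login-and-deploy session ate a large share of the remaining capacity,
# and the provider found out only after the fact.
```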
Capacity Planning vs Design to Scale
If you agree with what I just explained, you will also agree that the two use cases require different kinds of design. In a service provider, one of the most painful activities is the so-called "forklift upgrade": the moment when a system can no longer be expanded, and the only way to upgrade is its complete replacement with another one. If we are talking about storage, this also means migrating all data from one system to the other, and that is something that can always lead to problems, even when it was completely planned.
In a large company, even if it could use a design with no or reduced forklift-upgrade problems, a component upgrade by substitution is less upsetting, because the company has control over its workloads. First, it can design a solution that lasts for the usual 3/4/5-year period; second, the migration during the forklift upgrade is somewhat manageable: by involving the different business units impacted by the upgrade, downtimes, for example, are easier to plan.
In a service provider, even if the same kind of upgrade is possible, it would have to be done without causing any problem to the customers, and ultimately without impacting the agreed SLA. It's not impossible, but it's hard to do.
So, instead of thinking about really detailed capacity planning, I think a service provider should focus more on "Design to Scale". This means using, wherever possible, modular components that can be scaled horizontally in a linear and transparent way. The compute part has been managed this way for years, thanks to VMware and its clusters made of several ESXi servers that can be added and removed dynamically. In storage this does not always happen: there are still few scale-out solutions, while the vast majority of storage systems are scale-up. I wrote a post some months ago titled "The future of storage is Scale Out", where I explained why I think scale-out storage is better; go and read it if you want.
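The "Design to Scale" idea can be sketched in a few lines. This is only an illustration of the principle, with made-up node sizes, not a model of any real product:

```python
# Sketch of "Design to Scale": capacity grows linearly by adding nodes,
# never by replacing the whole system. Node size is hypothetical.

class ScaleOutPool:
    """A resource pool whose capacity grows linearly with its node count."""

    def __init__(self, node_capacity_tb: float):
        self.node_capacity_tb = node_capacity_tb
        self.nodes = 0

    def add_node(self) -> None:
        # Adding a node is transparent to running workloads.
        self.nodes += 1

    @property
    def capacity_tb(self) -> float:
        return self.nodes * self.node_capacity_tb

pool = ScaleOutPool(node_capacity_tb=40.0)
for _ in range(4):
    pool.add_node()
print(pool.capacity_tb)  # 160.0

# To double the pool, add four more nodes: no forklift upgrade,
# no wholesale data migration, no replacement of the existing system.
```

A scale-up system, by contrast, would hit the ceiling of its single controller and force exactly the forklift upgrade described above.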
A service provider has limited capacity-planning possibilities, because he does not have a complete overview of his users. So, he had better focus on "Design to Scale", so that he can react more easily to the sudden requests of his users.
In the end, a service provider runs in a different way than a large company, even if they have some characteristics in common. If a vendor wants to compare them come hell or high water, just to sell the same product to both, well, that is not always a smart move. There are for sure some products that can be a great fit for both use cases, but many solutions are good in only one of them, and their use in the other can lead to problems; maybe those problems are not visible at first, and that makes them even more dangerous, because they will surface only when a (painful) upgrade is underway.
If you are a service provider, design using extreme scenarios: don't think about a system that can grow 30% per year over the next 3 years; ask instead, "Can this system be DOUBLED in size within a week? And how?"
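The gap between the two mindsets is easy to put in numbers. A quick sketch, with an arbitrary 100 TB starting point, comparing the enterprise forecast to the provider stress test:

```python
# Sketch: enterprise-style forecast vs the service-provider stress test.
# The starting size and growth rates are illustrative.

start_tb = 100.0

# Enterprise capacity planning: +30% per year, compounded over 3 years.
enterprise_3y = start_tb * 1.30 ** 3
print(f"After 3 years at 30%/year: {enterprise_3y:.0f} TB")  # ~220 TB

# Service-provider extreme scenario: doubled within one week.
doubled_1w = start_tb * 2
print(f"Doubled in a week: {doubled_1w:.0f} TB")  # 200 TB

# The stress test demands almost the full 3-year growth in days.
# The design question is not "will we reach ~220 TB eventually?" but
# "can we add another 100 TB transparently, on short notice?"
```

The point is that the doubling scenario compresses three years of forecast growth into a single week, which is exactly what a scale-out, design-to-scale architecture has to absorb.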