Sizing Veeam Cloud Connect using Big Data


One of the most common questions I receive regarding Veeam Cloud Connect is: “What size should I configure for my Veeam server?”

Usually, we answer this question using our sizing tools and our best practices, but I recently found a different and probably even better answer, thanks to our big data.


Sizing Veeam Server, the usual way

Over the years, Veeam professionals have developed easy-to-use best practices to correctly size a Veeam server. Parameters like the number of virtual machines to be protected, the amount of data to be moved each day, and the desired backup window are all taken into account to do the sizing.

And even if the calculations can easily be done by hand, applying the rules explained in the Veeam best practices, once someone understands the theory behind the numbers, a sizing tool like VSE can do the math in a few clicks.
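To make the idea concrete, here is a rough back-of-the-envelope sketch of this kind of sizing math. This is purely illustrative, not Veeam’s official formula; the function name, the 50 TB / 5% / 8-hour figures, and the formula itself are my own assumptions for the example:

```python
# Illustrative sizing sketch, NOT Veeam's official calculation:
# estimate the ingest rate a backup server must sustain from
# the protected capacity, the daily change rate, and the backup window.

def required_throughput_mb_s(protected_tb: float,
                             daily_change_rate: float,
                             backup_window_h: float) -> float:
    """Return the MB/s the server must ingest to finish in the window."""
    daily_change_mb = protected_tb * 1024 * 1024 * daily_change_rate
    window_s = backup_window_h * 3600
    return daily_change_mb / window_s

# Hypothetical example: 50 TB protected, 5% daily change, 8-hour window
rate = required_throughput_mb_s(50, 0.05, 8)
print(f"{rate:.1f} MB/s")  # roughly 91 MB/s in this example
```

Numbers like this are then mapped to CPU cores, RAM, and repository throughput using the best-practice ratios.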

BUT: this all holds in a controlled environment, where an admin can configure his own backup jobs, plan how to spread VMs across different policies, pick the target storage, and so on. There, an admin knows what is going to happen every time a backup runs.

A service provider landscape is totally different. I remember a post I wrote many years ago, titled Service providers ARE NOT large companies, and I believe it still holds true. Even if the size of a provider is in many cases comparable to a large enterprise, some of the design rules we have in the enterprise world cannot be applied to service providers, for one simple yet powerful truth:

Providers are not fully in control of what their tenants do.

Yes, they build the infrastructure. Yes, they can apply rules and limits. But ultimately it’s up to each tenant to deploy 1 VM or 1,000 all at the same time, or to run backups every day at 10pm or at 3am, without knowing what the other tenants will do. A provider may end up with ALL the backups starting at 10pm, simply because all the tenants decided so!

So, a provider can “plan” for a certain environment size as a starting point, and then have a flexible design that can quickly be adjusted and grown as needed. Otherwise, declaring a size as “the correct one” will always fail: it will be correct for a given moment, and maybe in just one month it will have to be changed.


The new sizing, using big data

So, back to our original question, how do we size a Veeam server for Cloud Connect, if the workloads are somewhat unpredictable?

Well, at some point I was struggling to reply, and that’s when I decided to look at the problem from a different angle. Instead of giving cold numbers from a calculator (even if based on very precise theory), I decided to explore a different path. I started asking around the company: “how can I know what providers are doing with their Cloud Connect environments?”. After some research I got in contact with our Data Analytics team, and it was a revelation!

I learned that, starting from version 10 of our main software, we collect anonymized data from the installations whose support logs we receive through support tickets. This is where we extract the information you sometimes see in Anton Gostev’s weekly digest, where you learn crazy things like “27% of all Veeam repositories are SMB shares” (14 December 2019).

So I quickly found out that yes, we also have statistics about our Veeam Cloud Connect installations, and in just a week (did I say our Data Analytics team is crazy smart?) they created a Grafana dashboard for me with all the values I wanted, and even more!

Let me give you an example, so I can explain how to read these numbers and how to size a Veeam server for Cloud Connect. First, some context: “how many providers are we using for our statistics?”. Remember, these are not all the Cloud Connect installations in the world, but only “installations running v10 that opened at least one support ticket”, since our software doesn’t upload logs automatically; the provider does so manually, at will.

Still, we are talking about 800 installations, and we have been able to split them by the number of protected VMs, which was one of my biggest goals in this project:

As we can expect, there is a pyramid where a few providers are managing a very large amount of workloads, and many manage smaller amounts.

As I said, this was a primary goal for me: I wanted to be able to map sizing requests to existing providers and give numbers from comparable installations. Once I got a new request, I could first ask “how many workloads are you protecting or planning to protect?”.

Let’s say 600. I can tell this provider: “OK, I have 61 installations to compare with”.

What can I tell about those installations? Let’s look at the memory values, for example:

You see how this is becoming powerful already? I can see that, on average, providers protecting 500-1000 workloads (the range the request I received falls into) configure their Veeam server with 27 GB of RAM. But we can learn even more: look at the smallest bracket, 1-100. There’s a “maximum” column where we found an extremely cautious provider who (over?)sized his server with 512 GB. Maybe he’s planning for massive growth, or this is an “all-in-one” physical box running the entire Cloud Connect environment. Whatever the reason, values like this ruin our average.

So, we added the median and mode values; I hope you have some statistics knowledge (I took a couple of stats exams at university, but don’t worry, it’s not rocket science). These can be translated as “what the vast majority of providers are doing”, and we see that for this environment size the most common value (the mode) is 16, while the middle of the set of numbers (the median) is 20. So, ultimately, I can say that 16-20 GB of RAM is a good sizing choice.
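The outlier effect described above is easy to demonstrate. In this sketch the RAM values are hypothetical (invented to mirror the 1-100 bracket in the article, with one 512 GB outlier), but Python’s standard `statistics` module computes the three measures exactly as described:

```python
from statistics import mean, median, mode

# Hypothetical RAM sizes (GB) reported by providers in the 1-100
# workload bracket; a single oversized 512 GB box skews the average.
ram_gb = [16, 16, 16, 16, 20, 20, 24, 32, 512]

print(f"mean:   {mean(ram_gb):.1f} GB")  # pulled way up by the outlier
print(f"median: {median(ram_gb)} GB")    # middle of the sorted values: 20
print(f"mode:   {mode(ram_gb)} GB")      # most common value: 16
```

With this sample, the mean lands above 70 GB while the median (20) and mode (16) still describe what most providers actually configure, which is why the dashboard reports all three.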

And I can do the same for CPU cores:

And if you want to “go full nerdy”, I can even tell you the most common Operating System used to install the Veeam Server (we didn’t go as far as splitting this by environment size, but we could…):

Windows 2019 is growing steadily, 2016 is, as expected, the most common one, and there are many 2012 R2 installations that were never upgraded, probably because “you don’t change what’s not broken”.

And yes, we also saw a couple of installations running on Windows 7!


The power of big data

Why did I do this project? My main goal was to give a “real” answer to providers looking for information from the field, rather than aseptic lab results (as accurate as those can be). Software designed for service providers, like Cloud Connect, can only be tested “properly” at scale in a real environment. So, having these numbers and being able to show them to other providers is key for me and for us at Veeam, because these are by definition validated values: otherwise, providers would have configured something different to make our software run smoothly.

We haven’t yet thought about eventually publishing these numbers on a public site; this is really the first time non-Veeam people have seen this research, and to be honest, not many Veeam people have seen it yet either.

But now I can finally answer any sizing question for Cloud Connect with validated numbers from the field, thanks to our big data.

PS: a huge, huge thanks to Alexandra from the Data Analytics team; none of what you see here would have been possible without her.