Veeam repositories, both Windows and Linux based, run a software component responsible for receiving and storing data as it is processed by proxies. One of the most important parameters when sizing a repository is its expected memory consumption. Here is some information for its proper configuration.
What Veeam repository stores in memory
A Veeam Repository is responsible for collecting the saved blocks coming from proxies and storing them on a disk target, local or remote. In order to speed up IO operations on disk, a Repository leverages multiple technologies involving memory. Obviously, there’s always a tradeoff between disk IO and memory IO, so any IO saved on the storage has to be compensated by some additional IO in memory. But given that memory is many times faster than disk, this is a good tradeoff.
At a first level, a repository uses memory to queue incoming blocks. This queue collects all blocks coming from proxies, caches them in memory, and after some optimizations flushes them to disk. This reduces random IO on the backup files to a minimum, while serializing write operations as much as possible. The amount of memory consumed by the queue is simple to calculate: it uses up to 2 GB of memory per active job.
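As a rough illustration of the queue sizing rule above, here is a minimal sketch in Python. The function name and the idea of passing in a count of concurrently active jobs are assumptions made for the example; the only figure taken from the article is the upper bound of 2 GB per active job.

```python
# Upper bound on queue memory per active job, as stated in the article.
QUEUE_GB_PER_ACTIVE_JOB = 2


def queue_memory_gb(active_jobs: int) -> int:
    """Worst-case memory (in GB) used by the incoming block queues.

    `active_jobs` is a hypothetical input: the number of jobs writing
    to this repository at the same time.
    """
    if active_jobs < 0:
        raise ValueError("active_jobs must not be negative")
    return active_jobs * QUEUE_GB_PER_ACTIVE_JOB


# Four jobs running concurrently against the same repository.
print(queue_memory_gb(4))  # 8
```

Since 2 GB is an upper limit, the result is a worst-case figure rather than a prediction of actual usage.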
But this is not the only memory consumed by the repository: Veeam backup files contain deduplicated information about the saved blocks. As in any deduplicated storage, in order to keep track of stored blocks, metadata is stored along with the file itself (remember, in Veeam metadata is compared inside the same file, not as in a global deduplication system).
When a new block needs to be written into the repository, for example during an incremental backup, the Veeam datamover component running in the repository (also referred to as the target datamover) has to read this metadata, compare the hashes of the stored blocks with those of the incoming blocks arriving from a proxy, and decide whether the block has to be stored because it’s new, or whether only the metadata needs to be updated because the block is already stored from a previous write operation. This is extremely important, especially in a scale-out design where multiple proxies are writing data into the same backup file: blocks coming from different proxies might be duplicated, so the right point in the chain to compare them is the target datamover itself.
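The decision described above can be pictured with a toy model. This is only a sketch of hash-based deduplication scoped to a single backup file, not Veeam's actual implementation: the class name, the reference counter, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib


class BackupFileIndex:
    """Toy per-file metadata: block hash -> number of references.

    Hypothetical structure for illustration; one instance stands for
    the metadata of one backup file (no global deduplication).
    """

    def __init__(self) -> None:
        self.refs: dict[str, int] = {}
        self.stored_blocks = 0

    def write_block(self, data: bytes) -> bool:
        """Return True if the block was stored, False if only metadata changed."""
        # SHA-256 is an assumption; the article does not name the hash used.
        digest = hashlib.sha256(data).hexdigest()
        if digest in self.refs:
            # Block already present in this file: update metadata only.
            self.refs[digest] += 1
            return False
        # New block: record it and store the data.
        self.refs[digest] = 1
        self.stored_blocks += 1
        return True


index = BackupFileIndex()
print(index.write_block(b"block from proxy A"))  # True
print(index.write_block(b"block from proxy A"))  # False
```

The second call models the same block arriving again, for example from a different proxy: only the metadata is touched, and nothing new is written to disk.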
To improve performance, the target datamover dynamically loads this metadata into memory. Before Veeam Backup & Replication v8 Update 2, the cache was used to accelerate writes, while from Update 2 it is also used to accelerate read operations; there are also differences in the way the cache is populated and used. Let’s first see what the cache contains: metadata is obviously much smaller than the data it refers to, but it still consumes some memory. The amount of memory consumed for metadata depends on the selected block size for deduplication:
When both deduplication and encryption are enabled, these are the consumption values:
|VBK size|Optimization|VBK block size|Memory consumption for VBK metadata|
|1 TB|WAN target|256 KB|700 MB|
|1 TB|LAN target|512 KB|350 MB|
|1 TB|Local target|1024 KB|175 MB|
|1 TB|Local target 16+ TB|8192 KB|22 MB|
Based on this table, you can easily calculate the amount of memory consumed on a repository from the size of the backup files you have to deal with; you can also easily understand why the 8 MB block size is to be preferred for large backup sets: a 30 TB backup file using the 1024 KB (Local target) block size would consume about 5.2 GB of memory (30 × 175 MB), while with the large block the consumption is just 660 MB (30 × 22 MB). The tradeoff is a worse deduplication result because of the larger block size. This is also the block size used for deduplication appliances: in this case, the reason is that it’s useless to consume a large amount of memory on the gateway server when the real deduplication happens on another machine, the deduplication appliance itself.
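The calculation from the table can be sketched in a few lines of Python. The per-TB figures are copied from the table; the function and dictionary names are made up for this example, and scaling linearly with the backup file size is the same assumption the 30 TB example in the text relies on.

```python
# Metadata memory in MB per 1 TB of backup file, keyed by block size in KB.
# Values are taken from the table in the article.
METADATA_MB_PER_TB = {
    256: 700,   # WAN target
    512: 350,   # LAN target
    1024: 175,  # Local target
    8192: 22,   # Local target 16+ TB
}


def metadata_memory_mb(vbk_size_tb: float, block_size_kb: int) -> float:
    """Estimate metadata memory in MB, assuming linear scaling with file size."""
    return vbk_size_tb * METADATA_MB_PER_TB[block_size_kb]


# The 30 TB example: default local target block versus the 8 MB block.
print(metadata_memory_mb(30, 1024))  # 5250
print(metadata_memory_mb(30, 8192))  # 660
```

The 5250 MB result corresponds to the roughly 5.2 GB quoted in the text.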
EDIT: after publishing the article, Veeam developers found a couple of inaccuracies in the cache description. In order to avoid having people reading wrong statements, I’ve temporarily removed the second section; it will be back soon. Sorry for the inconvenience.