PernixData in my Lab: some performance tests


A few weeks ago I published an article titled “My new I/O Test Virtual Machine”, describing the VM I built to run performance tests in virtualized environments. After a first post running those tests against my “plain” lab and against an enhanced version using another server-side caching solution, many of you asked me whether I was planning to run the same tests with PernixData.

There is no doubt this software is gaining traction, for several reasons; the main one is its support for accelerating writes, which at the moment no other solution offers. You can find many articles explaining how it works, so describing it again is outside the scope of this post.

Also, this post is NOT a head-to-head comparison between different caching solutions; you should first check the features and characteristics of each solution, and decide which one best fits your own use case. All of them deliver extremely high performance, so there is little sense in evaluating them from a pure performance standpoint. My main goal with this article is to see how much Pernix can speed up my lab, and to evaluate the impact of its write-back capabilities.

Test environment

For my tests I used my SkunkWorks Lab. On two of my three ESXi servers I have two Fusion-IO ioDrive 320 GB cards, plus the PernixData software, release 1.0.2. All these results have been obtained by running the tests described in my article “My new I/O Test Virtual Machine”, where I also published the initial results for my lab in its “base” configuration.
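For readers who want to reproduce the three FIO workloads reported in this post, here is a minimal Python sketch that turns the parameters from the test titles into fio command lines. The block size, queue depth, job count and read mix come from this post; the `libaio` engine, `direct=1`, the 60-second runtime and the `/dev/sdb` target are assumptions made for illustration, and may differ from what the test VM actually uses.

```python
import shlex

# Parameters taken from the test titles in this post.
WORKLOADS = {
    "max-bandwidth": {"rw": "read", "bs": "1M"},
    "max-real-io": {"rw": "read", "bs": "8k"},
    "real-life": {"rw": "randrw", "bs": "8k", "rwmixread": 80},
}

def fio_command(name, target="/dev/sdb", runtime_s=60):
    """Build a fio command line for one of the workloads above.

    target and runtime_s are hypothetical defaults, as are ioengine and direct.
    """
    opts = {
        "name": name,
        "filename": target,
        "ioengine": "libaio",  # assumption: Linux guest using libaio
        "direct": 1,           # assumption: bypass the guest page cache
        "iodepth": 32,
        "numjobs": 16,
        "runtime": runtime_s,
    }
    opts.update(WORKLOADS[name])
    args = ["fio"] + [f"--{key}={value}" for key, value in opts.items()]
    args += ["--time_based", "--group_reporting"]
    return shlex.join(args)

if __name__ == "__main__":
    for workload in WORKLOADS:
        print(fio_command(workload))
```

Note that the real-life workload writes to the target, so it should only be pointed at a scratch disk or file.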

One important warning about my results: my lab is built on HP ProLiant G5 servers. Their PCIe bus is not very efficient (to say the least…), so the Fusion-IO cards cannot run as fast as they otherwise could: here they only reach around 25-30k IOPS, while on different servers they would go above 60-70k, even though they are first-generation cards.

So, look at these numbers merely as a comparison with the performance of my lab without acceleration, and use them to understand how PernixData can improve performance, NOT as absolute results. To get real performance figures I would need a newer lab where the PCIe bus is not the main bottleneck.

You will find here two different results:
– Write-Through: caching of only read operations
– Write-Back: caching of reads and writes without any replica to a secondary host
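The difference between the two modes can be sketched with a toy model. The snippet below is a deliberately simplified illustration, not PernixData's actual logic: it assumes a fully warm cache by default, and the `Policy` enum and `flash_fraction` function are hypothetical names. It computes which share of a workload's I/Os can complete from flash under each mode.

```python
from enum import Enum

class Policy(Enum):
    WRITE_THROUGH = "write-through"  # only reads are served from flash
    WRITE_BACK = "write-back"        # reads and writes are served from flash

def flash_fraction(policy, read_pct, hit_rate=1.0):
    """Share of I/Os completed from flash, as a fraction between 0 and 1.

    read_pct: percentage of reads in the workload (0-100).
    hit_rate: fraction of reads found in the cache (1.0 = fully warm, an assumption).
    """
    reads = read_pct / 100 * hit_rate
    writes = (100 - read_pct) / 100 if policy is Policy.WRITE_BACK else 0.0
    return reads + writes

if __name__ == "__main__":
    for read_pct in (100, 80, 50):
        for policy in Policy:
            share = flash_fraction(policy, read_pct)
            print(f"{read_pct}% read, {policy.value}: {share:.0%} from flash")
```

Under these assumptions a 100% read workload gains nothing from write-back, while an 80/20 or 50/50 mix leaves 20% or 50% of the I/Os on the datastore in write-through mode.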


FIO Max Bandwidth: 100% read, 100% sequential, Block size 1M, IO depth 32, 16 jobs

Labs: 194 IOPS, 199,57 MB/s, 2593 ms latency

Labs + PernixData Write-Through: 609 IOPS, 623,95 MB/s, 634 us latency (yes, microseconds, not milliseconds, thanks Fusion-IO!)

Labs + PernixData Write-Back: 604 IOPS, 590,95 MB/s, 779 us latency


FIO Max Real I/O: 100% read, 100% sequential, Block Size 8k, IO depth 32, 16 jobs

Labs: 12717 IOPS, 101,73 MB/s, 4016 ms latency

Labs + PernixData Write-Through: 24689 IOPS, 197,52 MB/s, 30 us latency

Labs + PernixData Write-Back: 24282 IOPS, 190,41 MB/s, 32 us latency

FIO Real Life Test: 80% read, 100% random, Block Size 8k, IO depth 32, 16 jobs

Labs: 2800 IOPS, 22,40 MB/s, 181 ms latency

Labs + PernixData Write-Through: 13647 IOPS, 109,18 MB/s, 29,21 us latency

Labs + PernixData Write-Back: 18588 IOPS, 113,55 MB/s, 29,53 us latency

This test was the real battlefield for PernixData. Thanks to write-back acceleration, Pernix reached far better results than in any of the read-only caching tests, and you can clearly see it in this graph. Reads were basically the same as in Write-Through, but the 20% of writes was accelerated as well:

Pernix FIO Write-Back

JetStress: 2 hr run performance test

Labs: 1013 IOPS

Labs + PernixData Write-Through: 1644 IOPS

Labs + PernixData Write-Back: 3205 IOPS

JetStress is an almost perfect 50% read, 50% write workload, and the test results confirm it, as does the IOPS graph showing reads and writes:

Pernix jetstress rw
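To put the IOPS figures above in relative terms, the short sketch below computes the speedup of each mode over the plain lab. The numbers are copied from the results in this post; the `speedup` helper is only an illustrative name.

```python
# IOPS reported in this post: (plain lab, write-through, write-back)
RESULTS = {
    "FIO Max Bandwidth": (194, 609, 604),
    "FIO Max Real I/O": (12717, 24689, 24282),
    "FIO Real Life Test": (2800, 13647, 18588),
    "JetStress": (1013, 1644, 3205),
}

def speedup(baseline, accelerated):
    """Return how many times faster the accelerated run is than the baseline."""
    return accelerated / baseline

if __name__ == "__main__":
    for test, (lab, wt, wb) in RESULTS.items():
        print(f"{test}: write-through {speedup(lab, wt):.2f}x, "
              f"write-back {speedup(lab, wb):.2f}x")
```

On the two read-only tests the two modes are within a few percent of each other, while on the mixed workloads write-back pulls clearly ahead: roughly 6.6x against 4.9x on the FIO Real Life Test, and 3.2x against 1.6x on JetStress.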


I had several hardware problems while running my HammerDB tests, so I’m not able to publish any official numbers. You can see some figures quoted in the text below, and you’ll notice they did not skyrocket even with Pernix: my servers had serious problems, specifically on the PCIe bus where the Fusion-IO cards were connected…

However, there are some nice lessons that can be learned from the HammerDB tests, and I’d like to share them with you.

Most important of all, they showed the importance of warming the cache and of letting tests run long enough to be meaningful. Let me explain: the first execution of HammerDB with Write-Through acceleration gave me on average 11997 TPM and 5190 NOPM, even though the TPM peaked at 22878. What was interesting was the cache behaviour:

 Pernix hammerdb through

The Total (effective) IOPS line is not visible because it exactly overlaps the Datastore line. This means Pernix is not helping much, and most of the IOPS are coming from the backend storage; in this test Pernix is running in write-through, and the HammerDB TPC-C test is another 50-50 read/write workload. Pernix is caching those reads, but data needs to be read more than once to be served by Pernix, and it seems HammerDB is instead reading and writing each piece of data only once.
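The effect described above can be reproduced with a toy read cache. This is only a sketch under simple assumptions (a cache large enough to hold the whole working set, populated on first read, and a hypothetical `run_pass` helper); it does not reflect how PernixData is implemented. A first pass that touches every block once gets no hits, while a second pass over the same blocks is served entirely from the cache.

```python
def run_pass(cache, blocks):
    """Read each block once and return the hit rate for this pass.

    cache is a set of block numbers already held in flash. A miss is served
    by the datastore, and the block is then added to the cache.
    """
    hits = 0
    for block in blocks:
        if block in cache:
            hits += 1
        else:
            cache.add(block)  # populate on first read
    return hits / len(blocks)

if __name__ == "__main__":
    cache = set()
    working_set = range(10_000)  # hypothetical working set, read once per pass
    print(f"cold pass hit rate: {run_pass(cache, working_set):.0%}")
    print(f"warm pass hit rate: {run_pass(cache, working_set):.0%}")
```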

A second execution of the same test gave me different results. TPM averaged 17851 with 7727 NOPM, and peaked at 28992; most of all, the IOPS served by Pernix were now much higher and closer to the total value, meaning the cache was getting warmer. The real answer to these doubts comes, however, from this PernixData screen showing the Hit Rate:

Pernix hammerdb through3

As you can see, PernixData cannot cache (and then serve) all the I/O coming from HammerDB, and that’s why the results are similar to those obtained with the plain lab. This means only one thing: tests need to run longer, so that PernixData can fill the cache and start serving I/O. So I made a third run, this time for 40 minutes (so the whole run fits inside the 1-hour graph), and the cache behaviour was completely different:

Pernix hammerdb with warmed cache

Final notes

This series of tests helped me first of all to check the performance of PernixData, and its ability to boost a rather old and slow storage system like my FAS2020. I’d also like to highlight that write acceleration is a real benefit, and you can clearly see it in the numbers. The reason is very simple: no production environment only reads data and never changes it. Especially with databases, such as the Exchange and PostgreSQL ones I used in my tests, a fairly large share of the I/O is writes.

I’m sure designing a write-back caching system with all the data protection in place, as PernixData did, is not trivial; otherwise competitors would offer write acceleration too. I think they will catch up in the future; we’ll see what comes.

Time to finally upgrade my lab to vSphere 5.5…

  • Matt Vaughan

    Be aware that it’s my understanding that Pernix FVP does not work with ESXi 5.5 as of yet.

    • Luca Dell’Oca

      Yes and No. Version 1.0.2 that I’m using does not, but the new just released 1.5 works indeed with vSphere 5.5 😉


  • Nice write up Luca. As you mention data needs to be read at least once in order to warm up the cache. Some synthetic tests are very random and thus require multiple runs. Real-world workloads will often see even better results as it is much more likely that there will be multiple reads of the same data. Writes however will see immediate benefits from local flash as there is no such dependency.

    Quick question for one of the results: In the FIO Real Life Test section you list the Lab + write-through latency in ms (29.21), but in us for the writes (29.53). Should the WT latency not actually be in us as well? Based on the much higher IO it seems that most of it was coming from flash.


    • Luca Dell’Oca

      Hi Peter, you are totally right, I wrote ms instead of us. Corrected now, thanks!