My new “I/O Test Virtual Machine”



Being able to perform reliable I/O tests on a storage system is something that can turn into an art. It's partly the art of defining trustworthy and repeatable methods, but sadly, more often, the art of configuring ad-hoc tests to make the measurement tool say what the vendor wants it to say. Faking I/O tests is one of the easiest tasks: often you only need to omit certain parameters, such as latency or block size, from the published results to make them look completely different. You only need to use a 512-byte block size to skyrocket your IOPS, even though you know no "real" application will use that block size; or you can show your results while omitting the huge latency you suffered while running those tests…

There are some professional tools for reliable tests, like TPC-C or SPECsfs. These are really powerful tools, and above all they offer repeatable tests regardless of the storage they are run against. They would be almost perfect, but they are also really expensive; in fact, even many vendors use them only on their high-end storage arrays. For us simple users, they are out of reach.

In an "amateur" situation instead, one of the most common tools is without doubt IOmeter. It's really easy to use and lets you run tests quickly. However, its tests are far from "real": it's easy to measure the maximum performance of a storage system, but not to check real performance in production scenarios. It's like comparing a drag race with timing a lap on a circuit: a dragster is the fastest way to win a drag race, but it's not at all the best car to drive around a track.

Lately IOmeter has become even more unreliable, especially since many storage systems use caching or SSDs, or when you use a server-side caching solution. IOmeter is not able to create "hot spots", as Howard Marks has clearly explained here. Once it has created its test file of a few GBs, IOmeter reads and writes evenly across the whole file. There is no way to have a "new" data block that the caching system has never seen before, so it cannot simulate a "read miss".

Another solution, often quoted in VMware environments, is VMmark. It has been created directly by VMware, and it configures several virtual machines executing different applications (Exchange Server, web server, application servers…) running some common workloads. It's certainly realistic, really close to a proper production environment, but its configuration is a problem: it's really cumbersome and time-consuming, and it uses a large number of virtual machines.

To work around all those problems, I chose to follow a different path and created my own solution. I don't claim it's going to be the best one, but I think it's a simple way to run reliable tests, and above all it creates easily repeatable tests, so you can then compare the results.

In my lab I'm using a NetApp FAS2020 storage array; you can read the details of my lab on this dedicated page. I wanted to have a starting point, so I ran different tests on my storage. In the future, this will be my baseline for comparisons.

The Virtual Machine

My solution is based on a single virtual machine running Microsoft Windows Server 2008 R2 Standard; it has 4 vCPUs configured as 1 socket with 4 cores, 4 GB of RAM and a 30 GB thick disk, plus a secondary 350 GB disk used to run the tests. I need that many vCPUs and a large disk in order to create enough data and I/O to saturate the several caches of the storage array and the SSDs used for caching inside ESXi; otherwise data is never read from or written to the disks, and the final results are too high, and not true.

The virtual hardware is version 9, and both the operating system and VMware Tools are updated to November 2013. In order to keep results comparable, I'm not applying any further updates.

Finally, I exported this VM as an OVA file, so I can install it in other environments.

FIO

I discovered FIO (Flexible IO Tester) thanks to a suggestion from one of my friends at Fusion-IO (funny enough, the tool and the company share the same abbreviation… :D). Even though it's far less known, after testing it I can say for sure it's way better than IOmeter. First of all, its code is continuously updated: when I wrote this article the latest version was 2.1.2, released on 7th August 2013. Think about it, the last IOmeter version was released in 2006… Also, you can run it on both Linux and Windows; for the latter there is a port of the previous version, 2.1.1, which you can get here.

FIO has some interesting configuration options; thanks to them you can generate multiple binary block files and mix the I/O among them, accessing all of them randomly. This creates as much entropy as possible and ultimately stresses the storage.

FIO is used via the command line, and you can save all the parameters in a configuration file; you then use this configuration file by running the command "fio config_file". I ran several tests before finding a good configuration; you can use my configuration files as a starting point to develop your own tests. These configuration files are created for the Windows version; if you want to use them on Linux you need to update them. You can also change the IO depth and the number of parallel jobs to see how the storage reacts to those changes. Here are my files (change the extension to .fio before using them):

FIO Max Real I/O: 100% read, 100% sequential, Block Size 8k, IO depth 32, 16 jobs

FIO Max Bandwidth: 100% read, 100% sequential, Block size 1M, IO depth 32, 16 jobs

FIO Real Life Test: 80% read, 100% random, Block Size 8k, IO depth 32, 16 jobs
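To give an idea of what such a job file can contain, here is a sketch in FIO's job-file format following the "Real Life Test" mix above. The parameter values for the mix are the ones listed; the job name, directory, runtime and per-job file size are illustrative assumptions, not my exact files:

```ini
; Sketch of a FIO job file for the "Real Life Test" mix
; (80% read, 100% random, 8k blocks, IO depth 32, 16 jobs).
; Directory, size and runtime are illustrative, not my exact values.
[global]
ioengine=windowsaio   ; use libaio on Linux
direct=1              ; bypass the OS page cache
rw=randrw             ; mixed random read/write
rwmixread=80          ; 80% reads, 20% writes
bs=8k                 ; block size
iodepth=32
numjobs=16            ; 16 parallel jobs, each with its own file
runtime=600
time_based
group_reporting

[realtest]
directory=E\:\fio     ; the secondary test disk (colons must be escaped)
size=20g              ; per-job file size; 16 jobs spread the I/O over many files
```

You would then run it with something like `fio realtest.fio`; with `numjobs=16` each job gets its own data file, which is how FIO spreads the I/O across multiple binary block files as described above.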

With these configurations, my NetApp FAS2020 reached 12717 IOPS in the Max I/O test and 199.57 MB/s in the Max Bandwidth test, while the Real Life Test gave me 2800 IOPS, with 22.40 MB/s of bandwidth and 181 ms of average latency.

Also, just for fun, I went after a totally silly result, the overall maximum IOPS, by configuring a block size of 512 bytes. My NetApp did slightly more than 23,000 IOPS. As you can see, simply setting the block size to 8k (a more realistic value) made IOPS fall to 12717. This is another example of why we need to run meaningful tests.
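The relation between IOPS, block size and bandwidth is also a quick way to sanity-check published numbers: bandwidth is simply IOPS times block size. A small sketch (the figures are the ones measured above; the function name is mine):

```python
def bandwidth_mb_s(iops, block_size_bytes):
    """Throughput implied by an IOPS figure at a given block size, in MB/s."""
    return iops * block_size_bytes / 1_000_000  # decimal megabytes per second

# "Real Life Test": 2800 IOPS at 8 KB blocks
print(bandwidth_mb_s(2800, 8 * 1024))   # ~22.9 MB/s, close to the 22.40 MB/s measured

# The "silly" 512-byte test: 23,000 IOPS sounds huge...
print(bandwidth_mb_s(23000, 512))       # ...but it is only ~11.8 MB/s of data actually moved
```

This is exactly why a big IOPS number at 512 bytes is meaningless: the same array moves far more data per second at realistic block sizes, even though the IOPS figure looks smaller.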

JetStress

JetStress is the official Microsoft tool to simulate Exchange Server workloads. Compared to the other available tool (LoadGen), it does not need a complete Exchange Server installed and configured in order to run the tests. I chose the 2010 version even though 2013 was already available, since 2010 is much more widespread, so the test is much more interesting. You can follow this tutorial to install and configure JetStress.

Once JetStress is installed, it's really simple and easy to use. You start the graphical version of the program (using the option "Run as Administrator") and choose to start a new test. There are several options; I created a performance-type test, and you can run the same test by using these parameters:

Jetstress

The minimum duration of the test is 2 hours, and for the whole duration JetStress really simulates every possible activity of an Exchange Server. If you take a look at the log, you can see information like this:

Operation mix: Sessions 8, Inserts 40%, Deletes 20%, Replaces 5%, Reads 35%, Lazy Commits 70%.

As you can see, all activities are multi-threaded and are a mix of writes, updates, reads and deletions. Once the test is completed, the result looks like this (I only removed some details I do not need for the purpose of these tests):

Microsoft Exchange Jetstress 2010

Performance Test Result Report

Database Sizing and Throughput
Achieved Transactional I/O per Second: 1013.577
Capacity Percentage: 100%
Throughput Percentage: 100%
Initial Database Size (bytes): 17179934720
Final Database Size (bytes): 20418461696
Database Files (Count): 1

Jetstress System Parameters
Thread Count: 15 (per database)
Minimum Database Cache: 32.0 MB
Maximum Database Cache: 256.0 MB
Insert Operations: 40%
Delete Operations: 20%
Replace Operations: 5%
Read Operations: 35%
Lazy Commits: 70%
Run Background Database Maintenance: True
Number of Copies per Database: 1

Transactional I/O Performance (Instance 1956.1)
I/O Database Reads Average Latency: 9.557 msec
I/O Database Writes Average Latency: 13.949 msec
I/O Database Reads/sec: 501.278
I/O Database Writes/sec: 512.300
I/O Database Reads Average Bytes: 33146.672
I/O Database Writes Average Bytes: 34842.730
I/O Log Writes Average Latency: 3.103 msec
I/O Log Writes/sec: 163.294
I/O Log Writes Average Bytes: 7864.278
I/O Log Reads: 0/sec

Background Database Maintenance I/O Performance (Instance 1956.1)
Database Maintenance IO Reads/sec: 24.789
Database Maintenance IO Reads Average Bytes: 240015.465

Total I/O Performance (Instance 1956.1)
Same figures as the transactional I/O above, plus the maintenance reads: I/O Database Reads/sec rises to 526.066 and I/O Database Reads Average Bytes to 42894.433.

Overall, it's a really reliable test. As you can see, even though the FIO result was 2800 IOPS, JetStress only reached 1013, simply because it's a much more "real" test.

HammerDB

I tried for a long time to find a tool able to simulate a database server. Don't count on Microsoft: they have a tool called SQLIO, but it's not related to SQL at all; it's simply an I/O benchmark tool, just like IOmeter or FIO. They also have SQLIOSim; this one is a proper SQL simulator, but it does not run I/O tests. It's aimed more at testing storage resiliency by introducing errors into the database, and furthermore its I/O pattern is too random and cannot be repeated, so tests are not comparable.

Same problem with Oracle: there is Orion, but in the end it's another I/O simulator, even if it's dedicated to Oracle, and most of all the links on the vendor's website do not work… There is an alternative called SLOB, but for several months the binaries have not been downloadable, and the author never replied on his blog about this problem… (UPDATE: the author of SLOB2, Kevin Closson, has commented on this post that he fixed the broken link; you can try his tool by going here).


In the end I chose HammerDB. It needs a database server installed in order to run the tests, but you can simply use one of the free versions of the supported databases, and follow their guide to install the most common database servers. Also, HammerDB is available for both Linux and Windows, so I was able to run everything inside my Windows VM.

Following their guide, I installed and configured PostgreSQL. In my opinion it's a better choice than Microsoft SQL Server Express or Oracle Express, since it does not have any limit on CPU or RAM usage: PostgreSQL is completely free, and you can push it to the limits of the machine it's running on.

Once you install the database server and HammerDB, you can start the tests. HammerDB can run a complete OLTP test based on the TPC-C specification, and this is a huge advantage of this software: you get final results that you can then compare with other systems running TPC-C too. There are different guides to help you run a TPC-C test; I used this one, specifically written for PostgreSQL. By the way, this document is a great resource to learn about TPC-C tests.

To run the tests, HammerDB needs to be installed on a system other than the database server, so I used another machine in my lab, hosted on a different storage system; using vMotion I also kept the two VMs separated.

Tests can be configured with different parameters. If, like me, you are using a VM with 4 vCPUs, you need to configure 100 warehouses (5 for each vCPU, rounded up to the nearest 100) and 10 virtual users, 1 for every 10 warehouses. I'm not looking for the best possible performance in the TPC-C test; I only want to design a reusable configuration that can be run in every situation.
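The sizing rule above can be sketched as a small helper. The rule itself (5 warehouses per vCPU, rounded up to the nearest 100, then 1 virtual user for every 10 warehouses) is the one just described; the function name is mine:

```python
import math

def hammerdb_sizing(vcpus):
    """Warehouses and virtual users for a repeatable TPC-C run:
    5 warehouses per vCPU, rounded up to the nearest 100,
    then 1 virtual user for every 10 warehouses."""
    warehouses = math.ceil(vcpus * 5 / 100) * 100
    users = warehouses // 10
    return warehouses, users

print(hammerdb_sizing(4))   # (100, 10) — the values used for my 4 vCPU VM
```

Keeping the sizing tied to the vCPU count like this is what makes the test repeatable across differently sized VMs.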

The final goal of TPC-C is to evaluate the performance of a database server when running transactions. HammerDB gives us back a value called TPM, Transactions Per Minute, together with the NOPM value, the Number of Orders Per Minute. In fact, TPC-C simulates a database used for processing the orders of a large company…

My VM running on the FAS2020 reached 17790 TPM and 7845 NOPM, and as you can see in the graph, the controller cache had a hard time trying to cope with the high I/O stream reaching the storage. There is a peak value at 28896 TPM, but also much lower values:

HammerDB hitting the storage

Final notes

Performance tests are a difficult art to master. You often risk getting results that only have value in the test environment you used to obtain them, especially if those labs are "home labs" like mine. Bottlenecks like CPU, memory, disk controller and network are always lurking; they can skew the results and ultimately give readers wrong information. There are dedicated labs and companies that run these kinds of tests, and even they fail sometimes; so think about the overall value of results coming out of a home lab.

For these reasons, don't take my results, or those of other bloggers, as ultimate truths, even if some of them will try to convince you of the opposite. Rather, take my tests as a starting point and create YOUR OWN tests; just be sure they can be repeated on several different systems in different periods.

  • Micke

    Hi,
    Nice post!

    Regarding SLOB, it has evolved into SLOB2 and information can be found here:
    http://kevinclosson.wordpress.com/2013/05/02/slob-2-a-significant-update-links-are-here/

    It can also be downloaded here:
    https://my.syncplicity.com/share/rjvigh25y6uu2mm/2013.05.05.slob2

    regards
    /M

  • Micke

    Sorry, I didn't check the links properly in your post. Just realized it's the same link.

    /M

    • Luca Dell’Oca

      No problem Micke. I tried to get both SLOB and SLOB2, but they seem impossible to find; the links in the author's blog are not working anymore. Luckily HammerDB seems a perfect fit for database tests 🙂

      Luca.

  • Kevin Closson

    Hello Luca,

    Can you please try going to the SLOB2 page again and clicking on the link at the bottom (under the heading Download the SLOB2 Kit). The link expired for some reason. I had a couple of people ping me so I fixed the link. According to my download stats it has been downloaded successfully, on average, 20 times per day since the link was fixed.

    Please reach out to me through the contact section in my blog if you have trouble.

    I hope you enjoy SLOB.

    http://kevinclosson.wordpress.com/2013/05/02/slob-2-a-significant-update-links-are-here/

    • Luca Dell’Oca

      Hi Kevin,
      thanks for joining the conversation. Yes the link is working again now, I’m going to update the post with this info. In the next version of the Test Machine I will try SLOB too.

      Thanks,
      Luca.

  • Michael

    For scaling tests with 10-200 VMs I like VMware IO Analyzer. VMware IO Analyzer uses IOmeter. Also, you can mix different workloads, random and sequential, and so on.
    http://labs.vmware.com/flings/io-analyzer

    • Luca Dell’Oca

      Hi Michael,
      thanks for your input. How do you orchestrate the scaling of the test machines to 200 in an automated way?

      Luca.

  • Michael
    • Luca Dell’Oca

      vscsiStats is a different thing, sorry.
      It measures IOPS/latency from the ESXi perspective, that is, how ESXi handles the VMDK virtual disk. I want to see the performance "inside" the virtual machine, since in the end that's where my applications will run.
      Thanks nonetheless for the input.

      Luca.

  • Thanx Luca !

    I was just struggling with IOmeter to get some sensible results from my home lab (just updated with an SSD cache). Thank you very much for pointing out FIO as a better replacement; I just tried it, and it seems perfect.

    Ciao from neighbourhood, from Slovenia

    • Luca Dell’Oca

      Happy to help an Adriatic neighbour 🙂

  • Andy

    I need to run a stress test from Linux servers against my storage array,
    can you help?
    How do I modify those files to make them run from Linux servers?

    • Luca Dell’Oca

      Andy, the line you need to change is:
      ioengine=windowsaio
      with:
      ioengine=libaio

      • Andy

        thank you Luca