Performance and Storage Efficiency

This is another post written for a particular customer (Hi folks!) on a topic that comes up frequently, so here I am posting again.

Defaults

Currently shipping ONTAP AFF systems default to the most appropriate settings, which means inline deduplication, compression, and compaction.

Overhead?

Efficiency features require at least some CPU work. That is unavoidable. No matter what sort of storage system is used, CPU instructions or some kind of ASIC work will be required to compress or decompress those Oracle 8K blocks.

At worst, ONTAP adaptive compression should add around 7 microseconds of latency. That’s not noticeable in the database world. We’ve also had customers report significant performance increases after enabling compression, because compression allows more logical data to be processed with fewer physical I/Os.

Hybrid FlashPool systems are a good example. If you have 1TB of space on the SSD layer and you get 2:1 efficiency, you’ve effectively doubled your SSD capacity to 2TB.
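
To make the arithmetic concrete, here’s a minimal sketch of that calculation. The 1TB flash tier and 2:1 ratio are just the numbers from the example above, not measurements:

```python
# Effective capacity of a flash tier holding compressed data.
# Illustrative arithmetic only; real ratios vary by dataset.

def effective_capacity_tb(physical_tb: float, efficiency_ratio: float) -> float:
    """Physical capacity multiplied by the logical:physical efficiency ratio."""
    return physical_tb * efficiency_ratio

physical_ssd_tb = 1.0   # SSD tier size from the example above
ratio = 2.0             # assumed 2:1 compression efficiency
print(f"Effective SSD capacity: {effective_capacity_tb(physical_ssd_tb, ratio):.1f} TB")
# -> Effective SSD capacity: 2.0 TB
```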

Deduplication requires a fingerprint table, but more importantly it provides virtually no benefit with Oracle databases, so what’s the point? The structure of an Oracle datafile includes unique per-block metadata, which largely means there are no duplicate blocks to be found. Enabling ONTAP deduplication on any shipping version should not hurt performance, but it also has essentially zero chance of providing a benefit.
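
Here’s a toy illustration of that point. This is not ONTAP’s fingerprinting implementation and not Oracle’s real block format; it simply shows that identical payloads carrying unique per-block headers hash to unique fingerprints, leaving deduplication nothing to collapse:

```python
# Toy demonstration: unique per-block metadata defeats deduplication.
# Simplified stand-in for an Oracle datafile, not the real block layout.
import hashlib

def fingerprint(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()[:16]

payload = b"A" * 8188                   # identical user data in every "block"
blocks = [
    dba.to_bytes(4, "big") + payload    # hypothetical unique block address per block
    for dba in range(1000)
]

unique = {fingerprint(b) for b in blocks}
print(f"{len(blocks)} blocks, {len(unique)} unique fingerprints")
# -> 1000 blocks, 1000 unique fingerprints: nothing for dedupe to find
```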

Workloads

The main reason TR-3633 recommends changing the defaults in some cases is that I’m cautious in the extreme. Although most database workloads aren’t demanding enough to burden even an A200 controller, we also see a significant number of database customers who really do want 1M IOPS with a maximum latency of 1ms.

Now we’re talking about a system that will be heavily loaded with CPU work, and you have to make a choice. Do you want to accept a possible performance hit, even if it’s just 2%? Maybe that’s acceptable if the database is large and compressible and the savings will be significant. Or maybe a 2% performance improvement is worth a small sacrifice of efficiency. That’s a business decision.
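
To put rough numbers on that trade-off, here’s a back-of-the-envelope sketch. The 1M IOPS and 2% figures come from the discussion above; the 20TB database size and 2:1 ratio are assumptions purely for illustration:

```python
# Back-of-the-envelope trade-off: IOPS given up vs. capacity saved.
peak_iops = 1_000_000        # target workload from the discussion above
performance_hit = 0.02       # hypothetical 2% cost of inline efficiency
database_tb = 20.0           # assumed database size, illustrative only
compression_ratio = 2.0      # assumed 2:1 compressibility

iops_lost = peak_iops * performance_hit
tb_saved = database_tb - database_tb / compression_ratio

print(f"IOPS given up:  {iops_lost:,.0f}")   # -> 20,000
print(f"Capacity saved: {tb_saved:.1f} TB")  # -> 10.0 TB
# Whether ~20,000 IOPS is worth ~10TB of savings is the business decision.
```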

I also have a concern that somewhere, sometime, someone is going to run a database workload on a system alongside some high-file-count NFS shares, configure SMB3 encryption, and require a ton of LDAP lookup activity. Somewhere out there, the extra CPU work of one of the efficiency features might tip that system over the edge into a performance problem.

I don’t know of anyone with such a problem, but I’m paid to be cautious. If you know certain data isn’t compressible, why add even one extra CPU cycle to your system? If deduplication isn’t going to discover any duplicate blocks, why consume even one byte of space for the fingerprint table? It just takes a few seconds of effort to change the defaults of a volume at the time of provisioning.
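
That reasoning can be boiled down to a simple decision helper. The sketch below is just a rough Python encoding of the guidance in this post, not an official NetApp tool and not the TR-3633 flowchart; the function and its inputs are hypothetical:

```python
# Rough encoding of the guidance in this post; illustrative only.

def efficiency_suggestion(data_is_compressible: bool,
                          latency_is_critical: bool,
                          workload_is_oracle: bool) -> dict:
    """Suggested per-volume inline efficiency settings (hypothetical helper)."""
    return {
        # Compression only pays off when the data actually compresses, and an
        # extremely latency-sensitive system may prefer to skip the CPU work.
        "inline_compression": data_is_compressible and not latency_is_critical,
        # Oracle datafiles carry unique per-block metadata, so deduplication
        # has essentially nothing to find.
        "inline_dedupe": not workload_is_oracle,
    }

print(efficiency_suggestion(data_is_compressible=True,
                            latency_is_critical=False,
                            workload_is_oracle=True))
# -> {'inline_compression': True, 'inline_dedupe': False}
```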

POCs and Benchmarketing

We also run into a lot of synthetic database POC situations where vendors are evaluated by blasting their storage systems with a wholly unrealistic workload that doesn’t reflect any real needs, and the business is awarded to whoever has the best results.

In these situations, we want to tune for maximum performance. Even a 1% difference in the results can decide who wins the business.

Furthermore, some vendors try to game the system. For example, some vendors have had garbage collection problems with the drives in their all-flash arrays. If you understand the problem, you can be sneaky and craft a POC plan that triggers the problem in the worst possible way. I’ve seen a few bake-offs that required a vendor to fill up their array to 100%, then delete everything, then immediately jump into performance testing. Other times POC plans focus on an IO pattern that shows a particular vendor at their best, but still doesn’t reflect real-world needs.

Thin Provisioning

Relying on efficiency savings to stretch capacity means thin provisioning. If the logical data stored exceeds the physical space on the array, you’ve thin provisioned.

For example, if you thin provision 10 one-terabyte LUNs on a storage system with 5TB of capacity, you are assuming those LUNs will never be 100% full of real data.

Likewise, if you compress 10TB of data on a storage system that contains only 5TB of storage, you are counting on always having 2:1 compressibility of data.
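
Both examples reduce to the same arithmetic: logical commitments divided by physical capacity. A minimal sketch, using only the numbers from the examples above:

```python
# Overcommitment math for the two thin-provisioning examples above.

def overcommit_ratio(logical_tb: float, physical_tb: float) -> float:
    """How much efficiency (or unused space) you are counting on."""
    return logical_tb / physical_tb

# Example 1: ten 1TB LUNs on 5TB of physical capacity.
lun_ratio = overcommit_ratio(logical_tb=10 * 1.0, physical_tb=5.0)

# Example 2: 10TB of data compressed onto 5TB of physical capacity.
compression_needed = overcommit_ratio(logical_tb=10.0, physical_tb=5.0)

print(f"LUN example: safe only if LUNs average below {1 / lun_ratio:.0%} full")
print(f"Compression example: requires at least {compression_needed:.0f}:1 compressibility")
```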

Many customers won’t do that. Personally, I think thin provisioning is the best way to manage storage. Thin provision everywhere and monitor capacity utilization at the array level.

If for some reason that can’t be done, and all storage needs to be fully provisioned and ready to accommodate incompressible data, then you’ll never use the saved space, so what’s the point of enabling any of those features? Just disable them and save a few CPU cycles.

Options

It’s about options. Maybe that makes databases on ONTAP seem more complicated, but you have more features and capabilities. That’s a good thing.

If there’s a way to reduce latency by even one microsecond, or there’s a way to ensure consistent performance over time as the workload increases, I want to know about it and I want to document it for our customers, partners, and account teams.

Summary

Here’s a flowchart applicable to ONTAP 9 versions up to and including ONTAP 9.3.

[Flowchart]
