Oracle on ASM vs Oracle on NFS

G’day. I’ve had a lot of discussions lately about whether or not to keep using ASM.

Origins of ASM

Jump back to the days of Oracle 9i. If you had an enterprise database, it was almost certainly Oracle on Solaris on SAN. One of the first things you had to do was license Veritas Volume Manager. It was expensive, and managing licenses was difficult.

Oracle then released ASM with Oracle 10g. It was a win all around.

It was purpose-built for Oracle databases, it bypassed potentially buggy and performance-limiting behavior in operating systems, and it was free. Everyone benefited, both Oracle Inc. and its customers.

Customers spent less money on Oracle infrastructure, which meant they might spend more money on Oracle products. Databases became more reliable for customers, which meant Oracle spent less providing support.

SAN Evolution, and the arrival of NFS

Native OS volume managers slowly closed the gap with ASM. They too could distribute IO across lots of drives and offer transparent migration. They started supporting critical features like asynchronous and direct IO, which allowed Oracle databases to run more efficiently. ASM was still arguably better in some cases, but the difference wasn’t that significant, and some users found native LVMs easier to manage than ASM.

Then Oracle and NetApp got together. Seriously. You’d be surprised how much of ONTAP grew out of the Oracle-NetApp partnership. The most important project was turning NFS into an enterprise-class clustered filesystem, mostly for use in Oracle On-Demand. To this day, NetApp continues to spend a lot of effort profiling Oracle workloads and improving ONTAP. It really is engineered for databases.

You can even thank Oracle for FlexClone. Remember how the original Oracle Applications product required that APPL_TOP directory with a bazillion files? Oracle was already using snapshots for backup and recovery, but they had a devops problem. They asked if we could make snapshots read-write so developers could use them. That led to the creation of FlexClone.

Oracle then applied the reasoning behind ASM to NFS with the introduction of the Oracle DNFS client. Just as ASM bypassed the host OS filesystem layer, DNFS replaces the OS NFS client to deliver a more predictable, reliable, high-performance storage layer. You can still see your files on the NFS filesystem; the only difference is that the Oracle database processes open their own private TCP/IP sessions to perform file IO. No OS NFS client in the way.
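DNFS is normally pointed at the storage through an oranfstab file (in $ORACLE_HOME/dbs or /etc). The sketch below renders one server entry in that file’s keyword layout: a server line, one path line per network route to the storage, and export/mount pairs. The helper `render_oranfstab_entry` is hypothetical, not an Oracle tool, and the server name, IP addresses, and volume paths are placeholders; check the Oracle documentation for your release before relying on the output.

```python
from typing import Iterable, Optional, Tuple


def render_oranfstab_entry(
    server: str,
    paths: Iterable[str],
    exports: Iterable[Tuple[str, str]],
    nfs_version: Optional[str] = None,
) -> str:
    """Render one server entry in oranfstab's 'keyword: value' layout.

    Hypothetical helper for illustration. Each path is a storage IP that
    DNFS may open its own TCP sessions to; each (export, mount) pair maps
    an NFS export to the local mount point the OS already has mounted.
    """
    lines = [f"server: {server}"]
    # One path line per network route to the storage.
    lines += [f"path: {p}" for p in paths]
    if nfs_version:
        # e.g. "nfsv3" or "nfsv4.1"; when omitted, Oracle's default applies.
        lines.append(f"nfs_version: {nfs_version}")
    # The export and its mount point appear together on a single line.
    lines += [f"export: {export} mount: {mount}" for export, mount in exports]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical storage IPs and volume names.
    print(render_oranfstab_entry(
        server="ontap-svm1",
        paths=["192.0.2.10", "192.0.2.11"],
        exports=[("/ora_data01", "/oradata01")],
        nfs_version="nfsv3",
    ))
```

Listing more than one path gives DNFS several routes to the same storage, which is part of why it can behave more predictably than a single OS NFS mount.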

2019

Now it’s 2019. NetApp offers 100Gb Ethernet for NFS and iSCSI. We have 32Gb FC for SAN. Just a handful of NVMe LUNs (technically they’re called namespaces) can provide more performance than almost any database can consume.
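A quick back-of-envelope calculation makes the bandwidth point concrete. A nominal Ethernet line rate in Gbit/s divides by 8 to give GB/s. The efficiency factor below is an assumption standing in for protocol and encoding overhead, which varies by protocol and workload, and the 25 GB/s target is hypothetical, so treat the results as rough planning numbers rather than measured performance.

```python
import math


def usable_gbytes_per_sec(line_rate_gbit: float, efficiency: float = 1.0) -> float:
    """Convert a nominal link rate (Gbit/s) to GB/s, scaled by an assumed efficiency in (0, 1]."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return line_rate_gbit / 8 * efficiency


def ports_needed(target_gbytes_per_sec: float, line_rate_gbit: float,
                 efficiency: float = 1.0) -> int:
    """Smallest number of identical ports whose combined throughput meets the target."""
    per_port = usable_gbytes_per_sec(line_rate_gbit, efficiency)
    return math.ceil(target_gbytes_per_sec / per_port)


if __name__ == "__main__":
    # One 100Gb Ethernet port: 12.5 GB/s before any overhead.
    print(usable_gbytes_per_sec(100))
    # Hypothetical target of 25 GB/s, assuming 80% efficiency per port.
    print(ports_needed(25, 100, efficiency=0.8))
```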

There’s also more to IT than just a database. You have to look at the big picture, including virtualization, automation, management, backup, recovery, RPOs and RTOs. Even if ASM in a vacuum might be easier to manage, when you look at the infrastructure as a whole there are frequently better options.

The choice of storage protocols and filesystems is now mostly a question of business practices and personal preferences. There is no inherently superior technology.

Infrastructure

Broadly speaking, if someone has a massive FC SAN infrastructure in place and ASM was already embedded in established business practices, I wouldn’t try to change that. It would be a waste of money and it would be throwing away staff experience.

If a project involves a wholly new infrastructure, I’d rather go with an IP protocol. It costs less, and it’s easier to manage, especially where the cloud is involved. True, that opens the door to ASM over iSCSI, but why do that? If you already have a well-built, high-speed, resilient IP infrastructure, why not use NFS for the databases? You can still fall back on iSCSI for applications that require block storage, such as most SQL Server databases.

Manageability

The primary value of NFS is that you can actually see the database files. You don’t hide them inside LUNs where only DBAs can see the contents. You can move NFS filesystems around without the equivalent of dismounting an ASM diskgroup or rediscovering LUNs. You can also upsize or downsize NFS filesystems safely and without data movement.

Day-to-day changes are safer too. Most customers would hesitate to change FC zoning or discover new FC devices on hosts running mission-critical databases during the workday. The equivalent operations are essentially risk-free with NFS.

ASM does have an edge in manageability when it comes to migration. You can migrate ASM diskgroups transparently by adding new LUNs and dropping the old ones while the database stays online. Migration without ASM is still pretty fast because networks are so fast these days, but it’s not wholly nondisruptive. For example, you can use RMAN to duplicate a database and then switch over to the new copy with minimal disruption, but there will be at least a few minutes of outage at the point of cutover.

Scalability

How many databases do you have? The demands of managing three production databases on ASM aren’t all that different from NFS, xfs, or anything else. Try managing 300 or 3000 databases. Among other things, you will waste a lot of storage because of all the unused space that inevitably becomes stranded within ASM diskgroups across the environment. It’s not easy to reclaim that space. There are some options, such as writing zeros to unused LUN space or draining data off a specific LUN so it can be removed, but that’s not nearly as easy as simply resizing a volume as needed. Add a few TB, remove a few GB. It just requires a “volume size” command, and it’s completely safe.
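To see how stranded space adds up, consider a simplified model in which every database has its own ASM diskgroup and each diskgroup carries some free headroom that no other database can use. The diskgroup names and sizes below are hypothetical, chosen only to show the arithmetic.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class DiskGroup:
    """Capacity of one per-database ASM diskgroup (hypothetical figures, in GB)."""
    name: str
    total_gb: int
    used_gb: int

    @property
    def free_gb(self) -> int:
        return self.total_gb - self.used_gb


def stranded_free_gb(groups: Iterable[DiskGroup]) -> int:
    """Sum the free space locked inside individual diskgroups."""
    return sum(g.free_gb for g in groups)


if __name__ == "__main__":
    # Hypothetical fleet: 3000 databases, each with a 1000 GB diskgroup that is 70% full.
    fleet = [DiskGroup(f"DATA_{i:04d}", total_gb=1000, used_gb=700) for i in range(3000)]
    print(f"Stranded free space: {stranded_free_gb(fleet):,} GB")  # prints 900,000 GB
```

With thin-provisioned NFS volumes, that unused headroom generally stays available in the shared storage pool, and an individual volume can be resized when its database grows or shrinks.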

On the other hand, if you have a database that needs performance scalability at many GB/sec or 500K IOPS and beyond, it will probably be easier with ASM. Block protocols with OS multipathing parallelize better. You can do it with NFS, but if the number of databases in scope is limited, it will be easier overall with SAN. You just take a number of LUNs and their multiple paths to storage, and bond them together with ASM. It scales out predictably and largely linearly.
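The scaling argument rests on striping: ASM spreads a file’s extents across the LUNs in a diskgroup, so a large scan keeps every LUN, and every path to it, busy at once. The sketch below uses plain round-robin placement as a simplified stand-in; real ASM balances extents according to disk capacity and rebalances when disks are added or removed. The LUN names and the per-LUN throughput figure are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def place_extents(num_extents: int, luns: Sequence[str]) -> List[str]:
    """Assign extents to LUNs round-robin (a simplified model of ASM striping)."""
    if not luns:
        raise ValueError("need at least one LUN")
    return [luns[i % len(luns)] for i in range(num_extents)]


def aggregate_mb_per_sec(luns: Sequence[str], per_lun_mb_per_sec: float) -> float:
    """Ideal throughput when IO is spread evenly: grows linearly with LUN count."""
    return len(luns) * per_lun_mb_per_sec


if __name__ == "__main__":
    luns = [f"lun{i}" for i in range(8)]       # hypothetical 8-LUN diskgroup
    layout = place_extents(1000, luns)
    print(Counter(layout))                      # 125 extents on each LUN
    print(aggregate_mb_per_sec(luns, 500))      # assumes 500 MB/s per LUN
```

Doubling the LUN count in this model doubles the ideal throughput, which is the "largely linear" behavior described above; in practice, host, fabric, and controller limits cap it eventually.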

Capacity Management and Backup

Capacity management requires knowing where data exists. A system with 3000 ASM-based databases will probably be spread across 10,000 or more LUNs. Where is the free space located? You can’t really tell without logging into ASM and checking each diskgroup. NFS space consumption can be managed from a single, central location because the storage system can “see” the individual files. The free and consumed space is easy to visualize.

Backup and recovery are also easier when you work with files. You can certainly leverage snapshots with ASM, but recovery requires dismounting ASM diskgroups and restoring entire LUNs. With regular NFS, you can selectively restore individual files or restore entire filesystems without any extra steps to manage an ASM layer.
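On NFS, a snapshot is just another directory tree. Assuming the volume exposes its .snapshot directory to NFS clients, a single file can be restored with an ordinary copy. The helper below is hypothetical, not a NetApp tool, and the snapshot and file names are placeholders. For an Oracle datafile you would normally take the file offline before the copy and recover it afterward; ONTAP can also restore single files on the storage side without copying through the host.

```python
import shutil
import tempfile
from pathlib import Path


def restore_file_from_snapshot(mount_point, snapshot_name: str, relative_path: str) -> Path:
    """Copy one file from <mount>/.snapshot/<snapshot>/<path> back to its live location.

    Hypothetical helper: assumes the NFS volume exposes a .snapshot directory.
    """
    mount = Path(mount_point)
    src = mount / ".snapshot" / snapshot_name / relative_path
    dst = mount / relative_path
    if not src.is_file():
        raise FileNotFoundError(f"no such file in snapshot: {src}")
    dst.parent.mkdir(parents=True, exist_ok=True)
    # copy2 copies the contents plus timestamps and permission bits.
    shutil.copy2(src, dst)
    return dst


if __name__ == "__main__":
    # Simulate a mounted volume with one snapshot inside a temporary directory.
    mount = Path(tempfile.mkdtemp())
    snap_dir = mount / ".snapshot" / "nightly.0"
    snap_dir.mkdir(parents=True)
    (snap_dir / "users01.dbf").write_text("contents at snapshot time")
    (mount / "users01.dbf").write_text("damaged contents")
    restored = restore_file_from_snapshot(mount, "nightly.0", "users01.dbf")
    print(restored.read_text())
```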

NetApp Best Practices

In summary, there ain’t no best practices on this topic. The best option depends on what you want to do, the infrastructure available, and the scale at which you operate.

Personally, if I needed to design an architecture for 3 or 4 large mission-critical, IO-intensive production databases, I’d probably use ASM over FC-SCSI or FC-NVMe. If the requirement is a 3000-database DBaaS project that changes frequently, I’d probably choose NFS.

No matter what you choose, you benefit from all the usual ONTAP features, including the traditional snapshot-based backups, restores, replication, DR, and cloning, and also including newer features like tiering to S3 with FabricPool.

NetApp NVMe for your database

I’ve been assisting with Oracle tests on NetApp NVMe over Fabrics (NVMe-oF). The results are really impressive, but there are a few things worth explaining first…

NVMe is not media!

There’s a lot of confusion, some of it deliberately created, about NVMe being a type of storage media.

NVMe is a protocol, not a type of drive. You could build an NVMe-based tape drive if you really wanted to.
