Kubernetes+ONTAP = DBaaS Part 3: Trident and Storage

Part 2 of this series provided an overview of how I built the image that runs an Oracle container. However, that image only defined the runtime environment for the database.

It included the various Oracle binaries and supporting OS components, but the database needs somewhere to put its files. The most important part of this installment is NetApp Trident, our driver that integrates a containerized environment with the key features of NetApp storage systems. Some of Trident's capabilities were specifically designed to address the needs of database customers.

As mentioned previously, I participated in some Docker discussions early on, but no one in the group had an interest in persistent storage at the time. Docker was viewed as a sort of virtualization platform for microapplications, which would then use a backend database for any persistent data storage. I lost interest in Docker at that point.

Docker now supports persistent storage, and Kubernetes adds even more power to manage it. Specifically, Kubernetes abstracts a container's use of a volume from the volume itself. You first create persistent volumes, and then containers are granted access to a volume through a persistent volume claim (PVC). From an Oracle database point of view, it's pretty simple: I provision a volume on the storage array, create a PVC that links a container to that volume, and then place the Oracle datafiles, redo logs, and other instance-specific files in those volumes.
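
As a rough sketch of what that looks like, here is a minimal PVC; the claim name and size are hypothetical, not the actual manifests from my environment:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: oradata-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 25Gi

A container then references the claim by name rather than the underlying volume, which is exactly the abstraction described above.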

There are many, many ways to use PVCs. Among other things, I can create and destroy containers without destroying the underlying storage. This lets me do things like replace a damaged container. For example, if an administrator accidentally deleted critical files from ORACLE_HOME, I would be able to delete the container and grant the PVCs to a new container with undamaged binaries. The database can then start back up again.
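
To make that concrete, here's a hedged sketch of the relevant part of a replacement pod definition; the pod name, image name, and claim names are all illustrative:

apiVersion: v1
kind: Pod
metadata:
  name: oracle-db
spec:
  containers:
  - name: oracle
    image: oracle-12cr2    # the image built in Part 2; name is illustrative
    volumeMounts:
    - name: oradata
      mountPath: /oradata
    - name: oralogs
      mountPath: /oralogs
  volumes:
  - name: oradata
    persistentVolumeClaim:
      claimName: oradata-claim
  - name: oralogs
    persistentVolumeClaim:
      claimName: oralogs-claim

Deleting this pod and creating a new one with the same claimName entries reattaches the same data, which is the damaged-container scenario above.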

Volume Layout

My Oracle image is designed to construct a database using the usual two-volume Oracle layout. The two volumes are as follows:

  • One volume exclusively for datafiles, including the temporary datafiles
  • One volume for redo logs, archive logs, and controlfiles

The primary benefit of this layout is manageability. By separating the two data types, I can easily back up, restore, and clone a database without any need for extra software. This is made possible by Oracle's automatic recovery capability. When I take a snapshot or create a clone of the datafiles, I am capturing the entire set of datafiles at a perfectly atomic point in time. When I take a snapshot or create a clone of the log volume, I am capturing a perfectly synchronized set of controlfiles, redo logs, and archive logs.

As a result, I can get a valid backup by taking the snapshots in the correct sequence. Recovery only requires restoring a snapshot of the log volume that is later in time than the snapshot of the datafile volume. During recovery, the Oracle database processes use the SCN data in the controlfiles and datafiles to identify which logs must be replayed to make the datafiles consistent. There is no requirement for Oracle hot backup mode.
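
For example, on the ONTAP CLI (volume and snapshot names here are hypothetical), a valid backup is just two snapshots taken in the correct order, datafiles first and then logs:

volume snapshot create -vserver jfsCloud0 -volume oradata -snapshot backup_0800
volume snapshot create -vserver jfsCloud0 -volume oralogs -snapshot backup_0800

Because the log snapshot is taken second, it is guaranteed to be later in time than the datafile snapshot, which is all Oracle recovery needs.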

In addition, cloning is easier. A clone is a read-write copy of a snapshot. Therefore, if I want to clone a container, all I need to do is use the following sequence:

  • Clone the datafile volume
  • Clone the log volume
  • Attach the volumes to a new container

The recovery process happens automatically through sqlplus. From an Oracle RDBMS point of view, a clone is no different from a recovery.
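
On the ONTAP CLI, that sequence looks roughly like this (clone and snapshot names are hypothetical); the clones are then attached to the new container through new PVCs:

volume clone create -vserver jfsCloud0 -flexclone oradata_clone -parent-volume oradata -parent-snapshot backup_0800
volume clone create -vserver jfsCloud0 -flexclone oralogs_clone -parent-volume oralogs -parent-snapshot backup_0800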

Trident

The storage provisioning driver is called Trident. The following sections describe the configuration of Trident in my Kubernetes architecture.

Trident works on a variety of NetApp storage platforms, but I'm using ONTAP in this case. Trident acts as the bridge between the NetApp storage system and the container environment, provisioning storage while leveraging advanced ONTAP features such as snapshots and clones.

The key values of Trident-managed volumes for Oracle are as follows:

  • You can specify the size of the volume you want to provision.
  • You can control the Snapshot schedule for each volume.
  • You can create a volume that is a clone of another volume.
  • You can create a volume that is a clone, and you can immediately invoke the clone split operation to make it a fully independent volume (see the sketch after this list).
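
Within Kubernetes, Trident exposes the last two capabilities through PVC annotations. Here's a hedged sketch; the claim names are hypothetical, the annotations are per the Trident documentation of this vintage, and the storage class is the one defined later in this post:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: oradata-clone-claim
  annotations:
    trident.netapp.io/cloneFromPVC: oradata-claim
    trident.netapp.io/splitOnClone: "true"
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 25Gi
  storageClassName: ntap-dbaas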

Trident Backend

The first step in configuring Trident is to create the backend driver so that Trident knows how to talk to the storage system and is preconfigured with some basic defaults.

As an aside, I've noticed a pattern with Kubernetes. There are a lot of flexible file formats out there. At one time, it looked like XML was going to take over the world, and, while it's very powerful, it can be difficult to read at times. Kubernetes has settled on YAML. Nobody wants to run a command with 1,000 characters of arguments, so instead you create a small YAML file that describes the operation you want to execute, and then you feed that file to the Kubernetes management command.

My backend JSON file for Trident configuration is as follows:

{
  "version": 1,
  "storageDriverName": "ontap-nas",
  "managementLIF": "172.20.108.200",
  "dataLIF": "172.20.108.200",
  "svm": "jfsCloud0",
  "username": "dockeru",
  "password": "Netapp1234!",
  "aggregate": "data_02",
  "nfsMountOptions": "-o vers=3,nolock",
  "defaults": {
    "size": "25G",
    "spaceReserve": "none",
    "exportPolicy": "tridentcluster",
    "snapshotDir": "true"
  }
}

This is fairly intuitive, but here are the highlights of what I’ve done:

  • Trident directs API calls to the management IP address to create storage and so on.
  • The data IP, which is the same as the management IP, is used to mount the filesystem. In a true secure multitenancy environment, you would want to put the management interface on a different IP address that is only accessible from limited hosts.
  • The SVM used for creating storage resources is called jfsCloud0.
  • You can see the username and password used for calling APIs. I’ve locked down this user so that it can only invoke the APIs that are required for Trident to function correctly.
  • Any volumes created are drawn from the data_02 aggregate. This is a hybrid aggregate built from a mix of SSD and spinning media. In some cases, you might want to offer tiers. For example, you can create one backend that points to a hybrid aggregate and a second backend that points to an all-SSD aggregate.
  • The default NFS version varies with configuration, so I want to make sure that I'm mounting NFS filesystems with vers=3. I've also added the "nolock" parameter so that, if I have to kill a container or a host crashes, I'm not stuck with stale NFS locks. NFS locks provide minimal protection to an Oracle database, or really any NFSv3 file, because they're only advisory. There is also additional protection at the persistent volume claim (PVC) layer that I explain below.
  • The export policy is "tridentcluster", which includes all of the host IP addresses in my Kubernetes cluster that might need to mount this volume.
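
To load this configuration into Trident, I feed the JSON file to tridentctl; a sketch, assuming the file above is saved as backend.json:

tridentctl -n trident create backend -f backend.json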

I can view my backend configuration as follows and see that it’s online and operational.

[root@jfs4 kube]# tridentctl -n trident get backend
+-------------------------+----------------+--------+---------+
|          NAME           | STORAGE DRIVER | ONLINE | VOLUMES |
+-------------------------+----------------+--------+---------+
| ontapnas_172.20.108.200 | ontap-nas      | true   |       0 |
+-------------------------+----------------+--------+---------+

After I create the backend connection, it is time to create the storage class, which is defined in another YAML file:

[root@jfs4 kube]# cat ntap-dbaas-storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ntap-dbaas
provisioner: netapp.io/trident
mountOptions: ["rw", "nfsvers=3", "proto=tcp", "nolock"]
parameters:
  backendType: "ontap-nas"
  provisioningType: "thin"
  snapshots: "true"
  clones: "true"

I just need this one storage class, called "ntap-dbaas". When I create volumes, I specify this storage class, which in turn calls the backend to create the volume I need. I haven't actually created a volume yet; I've just configured the Trident interface so that it can create volumes when I ask for them.
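
Creating and verifying the storage class is one command each; a sketch:

kubectl create -f ntap-dbaas-storageclass.yaml
kubectl get storageclass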

This covers the storage configuration, but just for the container itself. I want to do more than run a database in a container; I want to manage the whole container environment. That's where Kubernetes comes in, as I discuss next.

Kubernetes Installation

After the basic Docker images are built, I am ready to install Kubernetes to manage them. A complete explanation of Kubernetes configuration would take many pages and is beyond the scope of this blog post, but the following summary is useful for newcomers.

Installing Kubernetes can be as easy as running the following command:

kubeadm init --pod-network-cidr=192.168.0.0/16 --ignore-preflight-errors=cri

You then get a command at the end of the operation that shows you how to add more servers to your cluster.
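
The join command looks something like the following; the address, token, and hash here are placeholders, not values from my cluster:

kubeadm join <master-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>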

In practice, it's a little more complicated. There are some configuration files that must be adjusted based on the version of Linux you're using, and there are always growing pains, but really it's not much work. After you understand how it works (and it did take a week of experimentation to get comfortable with everything), management and growth of the cluster is straightforward.

The various Kubernetes processes that manage and track your configuration are running in containers of their own, which provides a good example of the value of containers. You can define the Docker image that includes all the components that a given application needs, and you’re done. You can use that image to run the container anywhere. There’s little work to be done, and there are essentially no dependencies because the container is self-contained.

Calico

The only major problem I had was choosing a networking model. The basic Kubernetes environment starts, stops, manages, and monitors containers, but you still need a way to access them. If you have a wholly internal environment, you can use private networking from within the Kubernetes cluster itself. However, a private network won't work for a public service like DBaaS. There are also some performance limitations with certain Kubernetes network options.

I chose Calico for my configuration. Calico allows you to define a subnet to be used for container communication and makes your container network available to the outside world through the Border Gateway Protocol (BGP). You need to spend some time with the Calico documentation to properly secure everything, but Calico gives you true IP networking without any translation layers.
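
Installing Calico is itself mostly a matter of applying a manifest; a hedged sketch, where the calico.yaml file comes from the Calico documentation for your Kubernetes version, and Calico's default pod subnet matches the 192.168.0.0/16 CIDR passed to kubeadm above:

kubectl apply -f calico.yaml
kubectl get pods -n kube-system    # the calico pods should reach Running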

I plan to cover Calico in more detail, including how to fully secure it for DBaaS purposes, in a future post.

We are now at the stage where I can run Oracle containers within a Kubernetes cluster, but how do I make it dance? I don’t want to just edit more and more YAML files. I want simple, easy automation of everything. That’s discussed in the next posting on orchestration.
