Kubernetes+ONTAP = DBaaS Part 4: Orchestration

In parts 1 to 3 of this series, I discussed Docker configuration and how the NetApp Trident driver integrates the storage system into the Kubernetes framework, so all the pieces are in place.

The main management interface for Kubernetes is kubectl. I can create, start, stop, modify, and destroy containers with this command. I can also manage storage via the Trident driver using kubectl.

This would be sufficient if I were setting up a handful of containerized databases as a static production environment, but that's not the goal here. I want to provide DBaaS for a large number of users, which means I need an orchestration layer.

DBaaS Utility

To make container management easier, I created a wrapper script that manages the container and its resources. The script takes a few command-line arguments and then issues the various kubectl commands. I could type those commands by hand, but the script is quicker and reduces the risk of typos. It's basic Python, and I am happy to share it with anyone who asks.

The arguments for my utility look like this:

[root@jfs4 kube]# ./dbaas.v5
usage: dbaas create     (unique identifier for DBaaS service)
                         --sid (database sid)
                         --pdb (PDB name >=12.2.0.1 only)
                         --version [12.2.0.1|12.1.0.2|11.2.0.4]
                         --password (oracle password)

usage: dbaas provision  (unique identifier for DBaaS service)
                         --sid (database sid)
                         --pdb (PDB name >=12.2.0.1 only)
                         --from (DBaaS uuid of source)
                         --password (oracle password)

usage: dbaas clone      (unique identifier for DBaaS service)
                         --sid (database sid)
                         --pdb (PDB name >=12.2.0.1 only)
                         --from (DBaaS uuid of source)
                         --password (oracle password)

usage: dbaas rm         (unique identifier for DBaaS service)

usage: dbaas mktemplate (unique identifier for DBaaS service)

usage: dbaas show       (unique identifier for DBaaS service)

usage: dbaas cli        (unique identifier for DBaaS service)

I can add anything to this. For example, you might notice there's nothing about the size of the database in this utility. If I add a --size argument, it can pass the requested database size straight through to the Trident driver. This takes minimal effort, and it's a good illustration of the power of both the Kubernetes architecture and the NetApp Trident driver. It's just a lot of parsing of simple YAML text.
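As a rough illustration of how little that takes, here's a hedged sketch of what wiring in a --size option could look like. The argument names and the annotation plumbing below are hypothetical, not the actual script:

import argparse

# Hypothetical sketch only: the real dbaas script may parse its
# arguments differently.
parser = argparse.ArgumentParser(prog="dbaas")
parser.add_argument("action", choices=["create", "provision", "clone",
                                       "rm", "mktemplate", "show", "cli"])
parser.add_argument("uuid", help="unique identifier for the DBaaS service")
parser.add_argument("--sid", help="database sid")
parser.add_argument("--pdb", help="PDB name")
parser.add_argument("--version", help="Oracle version")
parser.add_argument("--password", help="oracle password")
parser.add_argument("--size", default="8Gi",
                    help="datafile volume size to pass to Trident")
args = parser.parse_args()

# The value drops straight into the PVC annotation that Trident reads:
annotations = {
    "trident.netapp.io/size": args.size,
    "ntap-dbaas-uuid": args.uuid,
}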

Creating Containers

To illustrate how the container works, let’s issue this command:

./dbaas.v5 create aaaa --sid NTAP --pdb NTAPPDB --version 12.2.0.1

Here, I'm creating a container with a unique identifier of "aaaa" from the Oracle 12.2.0.1 image, with a database called NTAP and a pluggable database called NTAPPDB.

First, the utility gets a list of all currently defined pods and PVCs. A pod is technically a group of containers, but my project has a 1:1 relationship between pods and containers, so the terms pod and container are synonymous in this case.

The utility then issues a kubectl get pod -o json command and scans the output for this result:

"metadata": {
    "annotations": {
        "ntap-dbaas-uuid":
    },
    "name":
}

I am labelling my containers and volumes with the uuid so that I know which containers are associated with which volumes. That means I must not duplicate any names.

My script grabs the output of the currently defined pods and volumes to verify that I don’t already have a DBaaS container or PVC with the same uuid that I passed to the script.
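A minimal sketch of that conflict check, assuming kubectl is on the PATH and the script shells out to it (the helper name is mine, not the actual code):

import json
import subprocess
import sys

def annotated_uuids(kind):
    # Ask Kubernetes for every object of this kind and collect the
    # ntap-dbaas-uuid annotation from each one.
    out = subprocess.check_output(["kubectl", "get", kind, "-o", "json"])
    items = json.loads(out)["items"]
    return {item["metadata"].get("annotations", {}).get("ntap-dbaas-uuid")
            for item in items}

uuid = "aaaa"
if uuid in annotated_uuids("pod") | annotated_uuids("pvc"):
    sys.exit("DBaaS uuid '%s' is already in use" % uuid)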

Assuming there are no conflicts, dbaas then issues kubectl create -f with the following YAML:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: aaaa-ntap-ntappdb-dbf
  annotations:
    trident.netapp.io/reclaimPolicy: "Delete"
    trident.netapp.io/snapshotPolicy: "docker-datafiles"
    trident.netapp.io/protocol: "file"
    trident.netapp.io/snapshotDirectory: "true"
    trident.netapp.io/unixPermissions: "---rwxrwxrwx"
    trident.netapp.io/size: "8Gi"
    ntap-dbaas-managed: "True"
    ntap-dbaas-version: "12.2.0.1"
    ntap-dbaas-uuid: "aaaa"
    ntap-dbaas-db: "oracle"
    ntap-dbaas-type: "dbf"
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
  storageClassName: ntap-dbaas

The system has created an 8GB volume for my datafiles. The volume is using a snapshot policy of docker-datafiles and I tagged it with metadata to identify the PVC as a datafile volume that is under the control of the dbaas utility. The unique ID is “aaaa” and the version is Oracle 12.2.0.1. By including the database and version I can extend the model to MySQL/MariaDB and PostgreSQL in the future.
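Incidentally, applying each manifest from Python is nearly a one-liner. Here's a sketch, assuming the utility pipes the generated YAML to kubectl on stdin (kubectl accepts -f - for exactly this):

import subprocess

def kubectl_create(manifest):
    # "-f -" tells kubectl to read the manifest from stdin, so no
    # temporary YAML file is needed.
    subprocess.run(["kubectl", "create", "-f", "-"],
                   input=manifest.encode(), check=True)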

Up next is creating the log volume.

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: aaaa-ntap-ntappdb-log
  annotations:
    trident.netapp.io/reclaimPolicy: "Delete"
    trident.netapp.io/snapshotPolicy: "docker-logs"
    trident.netapp.io/protocol: "file"
    trident.netapp.io/snapshotDirectory: "true"
    trident.netapp.io/unixPermissions: "---rwxrwxrwx"
    trident.netapp.io/size: "4Gi"
    ntap-dbaas-managed: "True"
    ntap-dbaas-version: "12.2.0.1"
    ntap-dbaas-uuid: "aaaa"
    ntap-dbaas-db: "oracle"
    ntap-dbaas-type: "log"
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 4Gi
  storageClassName: ntap-dbaas

Note that the snapshot policies differ. The datafile volume is using docker-datafiles and the log volume is using docker-logs. This is because the policies look like this within ONTAP®:

EcoSystems-8060::volume snapshot policy> snapshot policy show docker-*

Policy Name
------------------------
docker-datafiles

    Schedule                        Count
    ----------------------          -----
    docker-5-past-the-hour           48
    docker-5-past-midnight           30
    docker-5-past-midnight-monthly    3

Policy Name
------------------------
docker-logs

    Schedule                        Count
    ----------------------          -----
    docker-10-past-the-hour           48
    docker-10-past-midnight           30
    docker-10-past-midnight-monthly    3

This database is now preconfigured for snapshot-based backups. Each backup is a pair of Snapshot copies in which the log snapshot is created five minutes after the datafile snapshot. That's all I need to restore or clone. If you want to understand why, check out the section "Snapshot-Optimized backups" in TR-4591 (https://www.netapp.com/us/media/tr-4591.pdf).

I’m keeping 48 hourly backups, 30 nightly backups, and 3 first-day-of-the-month backups. That should be sufficient retention.

Restoration is a little more complicated. At the simplest level, the data could be recovered by copying data from the Snapshot directory within the container but that takes a long time for large databases. I need to find a good way for a user to invoke a recovery from within the container that restores the volumes instantly using ONTAP SnapRestore®. While this would be efficient, it’s not easy to communicate with Kubernetes from within a container and keep things secure. I’m still working on one particular idea but that’s a topic for later.

Finally, it’s time to create the container with another kubectl create -f command:

kind: Deployment
apiVersion: apps/v1
metadata:
  name: aaaa-ntap-ntappdb
  annotations:
    ntap-dbaas-managed: "True"
    ntap-dbaas-uuid: "aaaa"
    ntap-dbaas-type: "HAservice"
spec:
  replicas: 1
  selector:
    matchLabels:
      ntap-dbaas-HA: "Y"
  template:
    metadata:
      labels:
        ntap-dbaas-HA: "Y"
      annotations:
        ntap-dbaas-managed: "True"
        ntap-dbaas-version: "12.2.0.1"
        ntap-dbaas-uuid: "aaaa"
        ntap-dbaas-db: "oracle"
        ntap-dbaas-type: "container"
        ntap-dbaas-sid: "NTAP"
        ntap-dbaas-pdb: "NTAPPDB"
    spec:
      volumes:
        - name: datafiles
          persistentVolumeClaim:
            claimName: aaaa-ntap-ntappdb-dbf
        - name: logs
          persistentVolumeClaim:
            claimName: aaaa-ntap-ntappdb-log
      containers:
        - name: aaaa-ntap-ntappdb
          image: database:12.2.0.1-ntap
          command: ["/orabin/NTAP.go"]
          args: ["--sid", "NTAP", "--pdb", "NTAPPDB", "--version", "12.2.0.1", "--password", "oracle"]
          ports:
            - containerPort: 1521
              name: "sqlnet"
            - containerPort: 2022
              name: "oraclessh"
          volumeMounts:
            - mountPath: "/oradata"
              name: datafiles
            - mountPath: "/logs"
              name: logs

The important parts of this YAML are:

  • I'm creating a Deployment, which is essentially a container management service: a controller that keeps the desired number of pod replicas running.
  • The Deployment includes a specification of replicas: 1, meaning I want one replica of the container. This is how I get HA in a Kubernetes cluster; the Deployment ensures there is always one copy running somewhere on the cluster.
  • The Deployment then defines a template for my container, which includes the same metadata as the previously defined volumes.
  • The template references the datafile and log PVCs and mounts them under /oradata and /logs.
  • I'm exposing two ports: 1521 for sqlnet and 2022 for ssh.
  • The command that initiates the container is /orabin/NTAP.go.
  • I'm passing arguments to /orabin/NTAP.go that control how the database should be configured.

Creating Database Services

This is a newly created container, meaning the binaries are present but there’s no database. The command NTAP.go is just a shell script that looks like this:

#! /usr/bin/bash

/orabin/NTAP.startDB “$@”

NTAP.startDB is the main Python startup script that orchestrates container management. I originally set up the container to run /orabin/NTAP.startDB directly, but this resulted in a lot of defunct child processes; the Python subprocess library possibly isn't reaping the exit codes from the children properly. In any case, launching from bash fixes the problem.

So, NTAP.go simply calls /orabin/NTAP.startDB with the same list of arguments:

"--sid", "NTAP", "--pdb", "NTAPPDB", "--version", "12.2.0.1", "--password", "oracle"

This is fairly intuitive: the command tells NTAP.startDB to start an environment with the specified ORACLE_SID, ORACLE_PDB, and Oracle version, and to set the default passwords to "oracle".

The first thing the script does is check /oradata and /logs for files. That check is important: if the directories are empty, NTAP.startDB calls NTAP.createDB and passes it the same set of arguments. If they aren't empty, something different happens, but assume for now that they are empty.
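A sketch of that decision, assuming the check is as simple as looking for any files in the two mount points (the function names here are mine, not the actual script):

import os

def volumes_are_empty():
    # Both persistent volumes must contain no files for this to be
    # treated as a brand-new service.
    return not os.listdir("/oradata") and not os.listdir("/logs")

def start_service():
    if volumes_are_empty():
        print("Volumes are empty: creating a new database")  # NTAP.createDB path
    else:
        print("Existing database found: starting it")        # normal startup path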

Creating a Database

NTAP.createDB creates a new database structure using the following naming convention:

/oradata/[ORACLE_SID]
/logs/[ORACLE_SID]
/logs/[ORACLE_SID]/arch
/logs/[ORACLE_SID]/ctrl
/logs/[ORACLE_SID]/dbconfig
/logs/[ORACLE_SID]/redo

Since the SID here is “NTAP”, the datafiles are placed in /oradata/NTAP. The /logs/NTAP directory has four subdirectories:

  • Redo logs are in /logs/NTAP/redo
  • Archive logs are in /logs/NTAP/arch
  • Controlfiles are in /logs/NTAP/ctrl
  • Critical configuration files are in /logs/NTAP/dbconfig
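Creating that layout takes only a few lines of Python. Here's a sketch, under the assumption that NTAP.createDB builds it with something like os.makedirs:

import os

def create_layout(sid):
    # Datafiles live under /oradata; everything log-related lives
    # in subdirectories under /logs.
    os.makedirs(os.path.join("/oradata", sid), exist_ok=True)
    for sub in ("arch", "ctrl", "dbconfig", "redo"):
        os.makedirs(os.path.join("/logs", sid, sub), exist_ok=True)

create_layout("NTAP")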

Most of that is intuitive, with the exception of the dbconfig directory, which will be explained shortly. You can view the creation output with the kubectl logs command:

################################################################################
###                                                                          ###
###  /oradata and /logs volumes are empty, creating new database             ###
###                                                                          ###
################################################################################
ORACLE_SID is NTAP
ORACLE_HOME is /orabin/product/12.2.0.1/dbhome_1
ORACLE_PDB is NTAPPDB
Creating audit directory at /orabin/admin/NTAP/adump
Creating datafile directory at /oradata/NTAP
Creating log directory at /logs/NTAP
Creating DBCA template file
A new listener, sqlnet, and tnsnames file are also created:
Creating and linking listener.ora file
Creating and linking sqlnet.ora file
Creating and linking tnsnames file

When the database structure is in place, the script invokes dbca, the Database Configuration Assistant. A template stored at /orabin/NTAP.dbc is placed at $ORACLE_HOME/assistants/dbca/templates/NTAP.dbc.

Next, the response file must be edited to include details such as the ORACLE_SID. The template is read from /orabin/NTAP.dbca.rsp.tmpl and written to /orabin/NTAP.dbca.rsp, replacing fields such as ###ORACLE_SID### with the actual ORACLE_SID.
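A sketch of that templating step; only the ###ORACLE_SID### token appears in this post, so I'm assuming the other fields follow the same pattern:

def render_response_file(sid):
    # Read the shipped template, substitute the placeholder tokens,
    # and write out the response file that dbca will consume.
    with open("/orabin/NTAP.dbca.rsp.tmpl") as src:
        text = src.read()
    text = text.replace("###ORACLE_SID###", sid)
    with open("/orabin/NTAP.dbca.rsp", "w") as dst:
        dst.write(text)

render_response_file("NTAP")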

Then dbca is invoked to build the database:

  [WARNING] [DBT-06208] The 'SYS' password entered does not conform to the Oracle recommended standards.
   CAUSE:
a. Oracle recommends that the password entered should be at least 8 characters in length, contain at least 1 uppercase character, 1 lower case character and 1 digit [0-9].
b.The password entered is a keyword that Oracle does not recommend to be used as password
   ACTION: Specify a strong password. If required refer Oracle documentation for guidelines.
[WARNING] [DBT-06208] The 'SYSTEM' password entered does not conform to the Oracle recommended standards.
   CAUSE:
a. Oracle recommends that the password entered should be at least 8 characters in length, contain at least 1 uppercase character, 1 lower case character and 1 digit [0-9].
b.The password entered is a keyword that Oracle does not recommend to be used as password
   ACTION: Specify a strong password. If required refer Oracle documentation for guidelines.
[WARNING] [DBT-06208] The 'PDBADMIN' password entered does not conform to the Oracle recommended standards.
   CAUSE:
a. Oracle recommends that the password entered should be at least 8 characters in length, contain at least 1 uppercase character, 1 lower case character and 1 digit [0-9].
b.The password entered is a keyword that Oracle does not recommend to be used as password
   ACTION: Specify a strong password. If required refer Oracle documentation for guidelines.
Copying database files
1% complete
13% complete
25% complete
Creating and starting Oracle instance
26% complete
30% complete
31% complete
35% complete
38% complete
39% complete
41% complete
Completing Database Creation
42% complete
43% complete
44% complete
46% complete
49% complete
50% complete
Creating Pluggable Databases
55% complete
75% complete
Executing Post Configuration Actions
100% complete
Look at the log file "/orabin/cfgtoollogs/dbca/NTAP/NTAP.log" for further details.

The database now needs a few final touches. First, I still need to explicitly set the PDB to open automatically, and then I need to relocate the spfile and password file.

Setting PDB to open automatically
Relocating Oracle password file
Relocating spfile

Normally, the spfile and password file are located in $ORACLE_HOME/dbs, which is part of the container image. I'm not storing anything important on the image itself, because if I detach my persistent volumes from this container and attach them to a new container, anything stored outside those volumes is lost.

So, I’m using symlinks and this is what it looks like:

[oracle@aaaa-ntap-ntappdb dbs]$ ls -l $ORACLE_HOME/dbs
total 12
-rw-rw---- 1 oracle oinstall 1544 Mar 19 14:14 hc_NTAP.dat
-rw-r--r-- 1 oracle oinstall 3079 May 15  2015 init.ora
-rw-r----- 1 oracle oinstall   24 Mar 19 14:08 lkNTAP
lrwxrwxrwx 1 oracle oinstall   29 Mar 19 14:13 orapwNTAP -> /logs/NTAP/dbconfig/orapwNTAP
lrwxrwxrwx 1 oracle oinstall   34 Mar 19 14:13 spfileNTAP.ora -> /logs/NTAP/dbconfig/spfileNTAP.ora

The spfile and password file are symlinks to a location in /logs/NTAP/dbconfig, which is persistent.

Likewise, the tnsnames, sqlnet, and listener configuration files are symlinks:

[oracle@aaaa-ntap-ntappdb dbs]$ ls -l $ORACLE_HOME/network/admin
total 4
lrwxrwxrwx 1 oracle oinstall   32 Mar 19 14:07 listener.ora -> /logs/NTAP/dbconfig/listener.ora
drwxr-xr-x 2 oracle oinstall   64 Mar 13 12:20 samples
-rw-r--r-- 1 oracle oinstall 1441 Aug 28  2015 shrept.lst
lrwxrwxrwx 1 oracle oinstall   30 Mar 19 14:07 sqlnet.ora -> /logs/NTAP/dbconfig/sqlnet.ora
lrwxrwxrwx 1 oracle oinstall   32 Mar 19 14:07 tnsnames.ora -> /logs/NTAP/dbconfig/tnsnames.ora

The result is that the entire database, including its associated configuration files, is on protected, persistent storage.
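The relocation itself is trivial. Here's a sketch of what it could look like in Python; the helper and its default paths are mine, taken from the listings above, not the actual script:

import os
import shutil

def relocate(filename,
             dbs_dir="/orabin/product/12.2.0.1/dbhome_1/dbs",
             dbconfig="/logs/NTAP/dbconfig"):
    # Move the file onto persistent storage, then leave a symlink
    # behind so Oracle still finds it in the expected location.
    src = os.path.join(dbs_dir, filename)
    dst = os.path.join(dbconfig, filename)
    shutil.move(src, dst)
    os.symlink(dst, src)

relocate("spfileNTAP.ora")
relocate("orapwNTAP")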

The final step in the database configuration is to enable archive logging. I could add a switch to allow the user to choose whether archive logging should be enabled, but in my experience most users want this enabled.

Enabling log archival to /logs/NTAP/arch
Database closed.
Database dismounted.
ORACLE instance shut down.
ORACLE instance started.
Total System Global Area 1610612736 bytes
Fixed Size                  8793304 bytes
Variable Size             671089448 bytes
Database Buffers          922746880 bytes
Redo Buffers                7983104 bytes

The final step is to start the listener.

LSNRCTL for Linux: Version 12.2.0.1.0 - Production on 19-MAR-2018 14:14:30
Copyright (c) 1991, 2016, Oracle.  All rights reserved.
Starting /orabin/product/12.2.0.1/dbhome_1/bin/tnslsnr: please wait...
TNSLSNR for Linux: Version 12.2.0.1.0 - Production
System parameter file is /orabin/product/12.2.0.1/dbhome_1/network/admin/listener.ora
Log messages written to /orabin/diag/tnslsnr/aaaa-ntap-ntappdb/listener/alert/log.xml
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1)))
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=0.0.0.0)(PORT=1521)))
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC1)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 12.2.0.1.0 - Production
Start Date                19-MAR-2018 14:14:30
Uptime                    0 days 0 hr. 0 min. 0 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /orabin/product/12.2.0.1/dbhome_1/network/admin/listener.ora
Listener Log File         /orabin/diag/tnslsnr/aaaa-ntap-ntappdb/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=0.0.0.0)(PORT=1521)))
The listener supports no services
The command completed successfully

The NTAP.createDB script is complete, so control returns to NTAP.startDB, which verifies that the database is operational and then starts monitoring the Oracle alert log for the instance. If everything completed correctly, the final logs look like this:

################################################################################
###                                                                          ###
###  Database is open for read-write                                         ###
###                                                                          ###
################################################################################
################################################################################
###                                                                          ###
###  Monitoring NTAP alert log                                               ###
###                                                                          ###
################################################################################
Shared IO Pool defaulting to 64MB. Trying to get it from Buffer Cache for process 2310.
Starting background process CJQ0
2018-03-19T14:14:30.485479+00:00
CJQ0 started with pid=44, OS id=2577
Completed: alter database open

Managing the Database

I can also use my utility to check the current status of containers:

[root@jfs4 kube]# ./dbaas.v5 show
UUID NAME                  TYPE      STATUS  MANAGED DB     VERSION  NODE
---- --------------------- --------- ------- ------- ------ -------- ----
aaaa aaaa-ntap-ntappdb     container Running True    oracle 12.2.0.1 jfs5
aaaa aaaa-ntap-ntappdb-dbf dbf       Bound   True    oracle 12.2.0.1
aaaa aaaa-ntap-ntappdb-log log       Bound   True    oracle 12.2.0.1

The utility parses the output of kubectl get pod -o json and kubectl get pvc -o json to build a list of containers and volumes. I can also see that the container is currently running on the host jfs5.

Removing the Database

I automated the removal process and added some logic to get the status of the container so I can verify that it was completely removed before removing the volumes. If the volumes are removed first, you end up with a container that is stuck because it is permanently blocked on I/O for volumes that no longer exist.

The only way to fix this problem is to reboot the server. As with any hard-mounted NFS filesystem, if you remove the filesystem while processes are still using its files, those processes become unkillable. You must reboot.

It can take 30 seconds to remove a container (pod) cleanly, so I added logic that polls the container status and removes the volumes only once the container is definitely gone.
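A sketch of that polling loop, assuming the script shells out to kubectl and knows the pod and PVC names (as before, these helpers are illustrative, not the actual code):

import subprocess
import time

def pod_exists(name):
    # kubectl returns a nonzero exit code once the pod is truly gone.
    result = subprocess.run(["kubectl", "get", "pod", name],
                            capture_output=True)
    return result.returncode == 0

def remove_service(name, pvcs):
    subprocess.run(["kubectl", "delete", "deployment", name], check=True)
    while pod_exists(name):
        print("%s has status 'Terminating'" % name)
        time.sleep(5)
    # Only now is it safe to remove the volumes.
    for pvc in pvcs:
        subprocess.run(["kubectl", "delete", "pvc", pvc], check=True)

The actual removal looks like this: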

[root@jfs4 kube]# ./dbaas.v5 rm aaaa
Deleted container aaaa-ntap-ntappdb
aaaa-ntap-ntappdb has status 'Terminating'
aaaa-ntap-ntappdb has status 'Terminating'
aaaa-ntap-ntappdb has status 'Terminating'
aaaa-ntap-ntappdb has status 'Terminating'
aaaa-ntap-ntappdb has status 'Terminating'
aaaa-ntap-ntappdb has status 'Terminating'
Deleted datafile persistent volume claim aaaa-ntap-ntappdb-dbf
Deleted logfile persistent volume claim aaaa-ntap-ntappdb-log

Provisioning the Database Service

So far, I've explained how to create a completely new database container from scratch, but that installation process takes a couple of minutes to complete. We can do better than that.

Assume we created our base 12cR2 image with the following command:

./dbaas.v5 create 12cR2 --sid NTAP --pdb NTAPPDB --version 12.2.0.1

Rather than using that as a database, let’s use it as a template. I can log in and make any preferred changes, and perhaps create some basic tables for use with a particular application. When I’m happy with my database, I can convert it to a template as follows:

[root@jfs4 kube]# ./dbaas.v5 mktemplate 12cR2
Deleted container 12cr2-ntap-ntappdb

I removed the container, but not the volumes. The show command confirms that the volumes are still defined and the container is gone:

[root@jfs4 kube]# ./dbaas.v5 show
UUID   NAME                    TYPE      STATUS  MANAGED DB     VERSION  NODE
------ ----------------------- --------- ------- ------- ------ -------- ----
12cR2  12cr2-ntap-ntappdb-dbf  dbf       Bound   True    oracle 12.2.0.1
12cR2  12cr2-ntap-ntappdb-log  log       Bound   True    oracle 12.2.0.1

Now let’s provision a couple of databases based on my 12cR2 template.

[root@jfs4 kube]# ./dbaas.v5 provision test1 --sid NTAP --pdb NTAPPDB --from 12cR2
Created datafile persistent volume claim test1
Created logfile persistent volume claim test1
Created container test1

[root@jfs4 kube]# ./dbaas.v5 provision --sid NTAP --pdb NTAPPDB --uuid test2 --from 12cR2
Created datafile persistent volume claim test2
Created logfile persistent volume claim test2

The process is almost identical to creating a new database, with one important change in how the volumes are created. The YAML for creating the persistent volume claims (PVCs) looks like this:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test2-ntap-ntappdb-dbf
  annotations:
    trident.netapp.io/reclaimPolicy: "Delete"
    trident.netapp.io/snapshotPolicy: "docker-datafiles"
    trident.netapp.io/protocol: "file"
    trident.netapp.io/snapshotDirectory: "true"
    trident.netapp.io/unixPermissions: "---rwxrwxrwx"
    trident.netapp.io/size: "8Gi"
    trident.netapp.io/cloneFromPVC: "12cr2-ntap-ntappdb-dbf"     <----------
    trident.netapp.io/splitOnClone: "true"                       <----------
    ntap-dbaas-managed: "True"
    ntap-dbaas-version: "12.2.0.1"
    ntap-dbaas-uuid: "test2"
    ntap-dbaas-db: "oracle"
    ntap-dbaas-type: "dbf"
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
  storageClassName: ntap-dbaas

Notice the two additional fields passed to the Trident driver. The dbaas utility told Trident to create the new volume based on the template volume, which means I bypass the 5 minutes required to build a new database because I'm using FlexClone® to replicate the source volume. In addition, I'm telling ONTAP to split the clone into a fully independent volume once complete.
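In the utility, this can be as simple as adding two entries to the annotation dictionary when a source is given. A sketch, again with hypothetical helper names:

def pvc_annotations(uuid, size="8Gi", clone_from=None):
    annotations = {
        "trident.netapp.io/reclaimPolicy": "Delete",
        "trident.netapp.io/snapshotPolicy": "docker-datafiles",
        "trident.netapp.io/protocol": "file",
        "trident.netapp.io/size": size,
        "ntap-dbaas-uuid": uuid,
    }
    if clone_from:
        # These two lines are the entire difference between a fresh
        # volume and a FlexClone of the template volume.
        annotations["trident.netapp.io/cloneFromPVC"] = clone_from
        annotations["trident.netapp.io/splitOnClone"] = "true"
    return annotations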

My request to provision a volume does not require the user to specify a version because I’m storing basic metadata with the volumes. The version information was included in the volume that I’m cloning from.

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  annotations:
    ntap-dbaas-managed: "True"
    ntap-dbaas-version: "12.2.0.1"
    ntap-dbaas-uuid: "12cR2"
    ntap-dbaas-db: "oracle"
    ntap-dbaas-type: "dbf"

This means that the complete provisioning process takes about 30 seconds from the time I hit “enter” on the keyboard to the time I have a fully operational database. Not bad, right? The majority of the time is spent waiting for the container to be assigned to a node. The actual container startup time is around 8 seconds.

Cloning the Database Service

The cloning process is nearly identical to the provisioning process, with some important differences. The most important is that I can clone any container, whether it's running or not. For example:

[root@jfs4 kube]# ./dbaas.v5 clone testclone --sid NTAP --pdb NTAPPDB --from test1 --nocleanup
Created datafile persistent volume claim testclone
Created logfile persistent volume claim testclone
Created container testclone

In about 8 seconds I have a clone of the test1 service, which was up and running at the time. I didn't interrupt my source database to create the clone; in fact, I didn't have to interact with it at all.

A large part of this process is based on Oracle’s automatic recovery feature. If you’re not interested in Oracle-specific topics, you can skip to the next blog in the series Kubernetes+ONTAP = DBaaS Part 5: Manageability.

One of the most important features of an Oracle database is that you can perform a backup while the database is online. In the past, this meant creating a tape copy of files that were changing rapidly, so the result was a copy of the datafiles that was essentially corrupt. That's OK, however, because the Oracle relational database management system (RDBMS) also includes transaction log archival.

The usual practice for a database backup was to issue the alter database begin backup command and then copy the datafiles to tape or a disk location. After the backup was complete, the user issued the alter database end backup command and then created a backup of the archived transaction logs.

To recover the database the user required two things:

  1. A copy of the datafiles that were in hot backup mode
  2. All archive logs generated while the database was in hot backup mode, that is, logs later in time than the state of the datafiles.

The datafiles were essentially corrupt, but the archive logs allowed the database to replay all those transactions to make the database consistent.

It was easier to use hot backup mode, but since the release of Oracle 10gR2 it’s not strictly required. All you really need is:

  1. A copy of the datafiles at a write-ordered-consistent point in time
  2. A synchronized set of archive logs, controlfiles, and redo logs

If you have these, you can use Oracle's automatic recovery capability. During startup, the scripts associated with the Dockerfile run a recover automatic command to ensure the database is consistent.
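I can't show the actual startup scripts here, but a hedged sketch of what driving that recovery from Python could look like, assuming sqlplus is on the PATH and OS authentication is in use:

import subprocess

# Mount the database, let Oracle replay whatever it needs from the
# redo and archive logs, then open it.
RECOVERY_SCRIPT = b"""
startup mount;
recover automatic database;
alter database open;
exit;
"""

subprocess.run(["sqlplus", "-S", "/ as sysdba"],
               input=RECOVERY_SCRIPT, check=True)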

In the case of a clone, the dbaas script first clones the datafile volume, and then clones the consolidated log volume. ONTAP Snapshot copies are always write-order consistent, so we have a usable pair of volumes. The automatic recovery procedure brings the database right up.

That covers the basics of provisioning, backup/restore, and cloning. The final post in this series covers some of the manageability benefits of this design.
