SQL Database Continuous Availability

SQL Server Clustering

Clustering only applies to SQL Server, not databases hosted on Azure, SQL Data Warehouse or Parallel Data Warehouse
SQL Server has an 'AlwaysOn' feature which offers 100% availability and it uses Windows Server Failover Clustering (WSFC) to achieve this. A Failover Cluster Instance or FCI is a single instance of SQL Server that is installed across several Windows Server nodes and, possibly, across multiple subnets. Externally, an FCI appears to be a single instance of an SQL Server running on a single computer, but the FCI provides failover from one WSFC node to another if the current node becomes unavailable.

The WSFC service manages the server cluster, quorum configuration, failover policy, and failover operations, as well as the Virtual Network Name (VNN) and virtual IP addresses for the FCI. The WSFC cluster node which currently owns the resource group is called the active node. If the active node fails, or the service is moved for a planned upgrade, the resource group ownership is moved to another node in the FCI. A WSFC cluster can run multiple FCIs, with individual FCIs active on different nodes.

An SQL Server FCI runs in a WSFC resource group and there are some rules regarding the WSFC Resource Group hardware and software. The physical servers must contain a similar hardware configuration and an identical software configuration right down to operating system version and patch level. The SQL Server code and configuration must be identical on every node in the cluster. This includes software version, patch level, components, and the instance name.
The product binaries are installed locally on every node of the FCI, but the services are not started automatically, but managed by WSFC.

All the nodes in the FCI must have the same view of instance data whenever a failover occurs, so an FCI must use shared storage between all nodes of the FCI for database and log storage. The shared storage can be either SAN storage, file shares or WSFC cluster disks. The shared storage is a potential single point of failure so it should be raided and mirrored to protect it.

An FCI has a Virtual Network Name (VNN), which means that applications can connect to the VNN without the need to know the current active node. When a new active node starts up in a failover event, the VNN is registered to this new active node. As the applications are connected to the VNN, this means that the process is transparent to the client or application connecting to SQL Server and so the application downtime is minimised. If you use multi-subnet FCIs then a virtual IP address is assigned to each subnet. If a failover involves changing subnets then the VNN is updated on the DNS server to point to the virtual IP address of the surviving subnet. This means that applications and clients can continue to use the same VNN after a multi-subnet failover.

back to top

AND or OR Clustering?

Depending on how the nodes are clustered, the SQL Server failover cluster is configured in the following ways:

The OR IP address resource dependency option was introduced with SQL Server 2012. It means that the SQL Server can be online when there is at least one valid IP address that it can bind to, which can be important as the IP addresses are not owned by all the nodes in the failover cluster, and may not be all online during SQL Server startup. Earlier releases of SQL server relied on a stretched V-LAN to expose a single IP address for failover across sites.

A SQL Server multi-subnet failover cluster is a configuration where each failover cluster node is connected to a different subnet or different set of subnets. These subnets can be in the same location or in geographically dispersed sites. Clustering across geographically dispersed sites is sometimes referred to as stretch clusters. As there is no shared storage that all the nodes can access, data should be replicated between the data storage on the multiple subnets. With data replication, there is more than one copy of the data available. Therefore, a multi-subnet failover cluster provides a disaster recovery solution in addition to high availability.

Various ways exist to configure multiple subnets. You could have 2 nodes, each on a different subnet; 3 nodes, 2 on one subnet and 1 on the other; one node connected to both subnets and one node connected to just one subnet. In all these cases SQL Server Setup sees this configuration as a multi-subnet cluster and sets the IP address resource dependency to OR.
However if you have 2 nodes and 2 subnets, and connect both nodes to both subnets, then this configuration is not considered as a multi-subnet failover cluster configuration because the clustered nodes are on the same set of subnets so the IP address resource dependency is set to AND by SQL Server Setup.

You have options for OR configuration, for example, when there is a failure of one of the IP addresses on the node that currently owns the SQL Server cluster resource group, a failover is not triggered automatically until all the IP addresses valid on that node fail. When a failover occurs, SQL Server will come online if it can bind to at least one IP address that is valid on the current node. The IP addresses that did not bind to SQL Server at startup will be listed in the error log.

It is possible to install an SQL Server FCI instance and a standalone SQL Server instance on the same server node, but if you do this you need to make sure the TCP port assignments do not clash. The default TCP port is 1433 and it is usually best to leave the default for the FCI instance and configure the standalone instance with a fixed port.

back to top

Installation Options

First you need a Windows Cluster, and you need one Windows Server Failover Clustering group for each failover cluster instance you want to configure. This process is not described here as it belongs in the Windows section. Once you have the cluster fully configured, you must run the SQL Server Setup program on each node of the failover cluster. You have two options for doing this:

Installation on one Node at a time

Create and configure a single-node SQL Server failover cluster instance. While this is technically a cluster, it has no failover capability until you add another node. You then add each node to the SQL Server failover cluster, running Setup with Add Node functionality as follows.
Run Setup.exe from the SQL Server installation code which will launch the SQL Server Installation Center Wizard. Click 'Installation' in the left-hand pane, then select 'Add node to a SQL Server failover cluster'. When the Configuration Checker has completed running a discovery operation on your computer click OK.
Work through the Wizard, suppling your language and Product Key, the name of the SQL Server failover cluster instance, login accounts and passwords, and accepting the license agreement.
At various points in the process, the System Configuration Checker will validate the information that you have provided and display a summary.

You will also need to supply IP addresses as appropriate, depending on if you have extra, or different subnets. If the node you are adding is on a different subnet to the others, you also need to confirm the IP address resource dependency change to OR.

Installation on all Nodes

First, follow the install instructions above for every node in the new SQL Server failover cluster. Once you run Setup on one node, it creates a Configuration.ini file which contains the SQL Server Failover Cluster settings that you provided. On the additional nodes to be prepared, instead of following these steps, you can supply the autogenerated Configuration.ini file from first node as an input to the Setup command line like this, substituting the name of your .ini file.

Setup.exe /ConfigurationFile=ConfigFile.ini

The nodes are now prepared, but clustering is not operational. To finish the job off, run Setup on one of the prepared nodes. Now you will have an operational SQL Server failover cluster instance with all the nodes available to host the SQL Server failover cluster.

back to top

Database Snapshots

A database snapshot is a read-only view of a SQL Server database frozen at a moment in time. The original database is called the source database and a database snapshot always resides on the same server instance as its source database.

The ability to take database snapshots came with SQL Server 2005. Database snapshot is the same in principle to disk based snapshots, except that with disks the unit of data is a block, whereas the unit of a database snapshot is a page. A snapshot is a point-in-time copy of a database that can be used for rapid backups, or for testing purposes. Snapshots could also be used for applications that only run in read-only mode against a consistent point in time copy of a database. An excellent example would be month end or year end reports.
The initial copy is made by creating a set of pointers that refer to the pages in the original or source database. As the source database is updated, the original pages are copied to the snapshot, thus preserving the point in time condition of the snap without having to copy all the data initially. A user initiated database snapshot is implemented by using sparse files, an NTFS feature. Initially a sparse file contains no user data and has no disk space allocated. This process is often called copy-on-write.

You can have create several snapshots on a given source database and they will persist until they are explicitly dropped by the database owner. Database snapshots are dependent on the source database and if that database becomes unavailable for any reason, all of its database snapshots also become unavailable. Backing up the source database works normally; it is unaffected by database snapshots. Database snapshots always work on an entire database.

 animation showing database snapshot process

SQL Server creates its own database snapshots when it issues some DBCC commands like DBCC CHECKTABLE. These snapshots are implemented using sparse alternate data streams.

A database snapshot is consistent. There is no need to quiesce or stop the database before taking a snapshot as SQL Server ensures consistency of the snap copy by rolling back any uncommitted transactions at the point the snap was taken. Transactions are applied as normal to the source database. Because a snapshot database gets pages that have not been updated from the source database, the source database must be available and both databases must be in the same phyysical server for the snap to work.

A database snapshot is read only. From the user's point of view the snapshot database always looks unchanged because reads always access the original data at the point of snap. If a page has not yet been updated on the source database, then a read operation on the snapshot reads the original page from the source database. If a page from the source has been updated, then the original page will be copied to the source file before the update is written to disk. So after a page has been updated, a read operation on the snapshot still accesses the original page, which is now stored in a snapshot file.

A database snapshot does not use much space. Well, that statement is not exactly true. Physical space occupied by a database snapshot will grow larger the longer the snapshot exists, but the growth rate will depend on how often the source database is updated and how localised the updates are. A snapshot page is only copied the first time it is updated at the source, updates after that do not affect the snapshot. This means that a database could have a very high update rate, but those updates might be localised to only 10% of the database space, so the snapshot will never grow to much more than 10% of the source.
Sparse files grow in 64KB increments, and if there is no room on disk to add an extra 64KB the snapshot will be marked as suspect and has to be dropped. None of this affects the source database, unless it is competing for disk space with the snapshot. It is a good idea, especially for a busy database, to have a process for renewing snapshots at regular intervals to prevent them from using up too much disk space. It is also useful to monitor a database before a snapshot is taken, so you know how much update activity to expect and can get the snap sizing correct.

A database snapshot is dependent on the original database. If that database becomes unavailable then the snapshot is unusable. It is also possible to use a snapshot to revert a source database back to the point in time when the snapshot was created. This does not mean you do not need backups anymore as you cannot roll forward to any point in time from snapshots. Also, as database snapshots are dependent on the source database, if that database is destroyed by hardware failure, then the snapshots are also destroyed. One way to utilise the 'restore from snapshot' facility might be for testing purposes, when you take a snapshot before you start testing, run the tests, then revert back to the starting point using the snapshot.

There are limitations on how database snapshots can be used, here are some of them:

back to top

Creating a Database Snapshot

Before you start, you need to consider what you are going to use a snapshot for. Two obvious options spring to mind.
As a 'backup' so you can recover from a database error
As a PIT copy of the data, maybe for running reports against.

There are a number of issues with the first option. Snapshots fully depend on the original database, so if it is corrupted or destroyed by a hardware error, the snapshot is destroyed too. A snapshot restore resets the transaction logs, so you can only recover to the point in time of the snapshot, you cannot roll forward through the logs. Actually, you could rename the logs before you do the restore, then rename them back again, but that process is a bit error prone. Finally, you can only do a restore if one, and only one snapshot exists.

For the second option, you do not want to have to change file names on programs everytime you run your report, so a fixed naming standard would be ideal.
Putting these two together, I'd suggest a daily snapshot with a fixed name. For example, say you need to run reports against your database as it stood at 06:00 each morning. Call your snapshot something like Lascon001_snap_0600.
Lascon001 tells you which database it is, snap identifies it as a snapshot and 0600 is the time the snaphot happened.
Limiting the snapshot to a single version helps with potential recovery, and saves on disk space.

To create a database snapshot

You cannot use SQL Server Management Studio to create a snapshot, you need to use Transact-SQL. You also need to make sure you have enough disk space for a snapshot, potentially up to the size of the original database. If your database consists of multiple files, you need te explicitly snap all of these files. The general syntax of the command is.

CREATE DATABASE database_snapshot_name
  NAME =source_file_name,
  FILENAME ='snap_file_name'
  ) repeat as necessary
 AS SNAPSHOT OF source_database_name

source_database_name is the name of the original database you are snapping from, and database_snapshot_name is the name of the snapshot database. source_file_name is the logical name used in SQL Server when referencing the file, snap_file_name is the path and file name used for the snapshot.
To give a concrete example, the imaginary Lascon001 database has a single file.

CREATE DATABASE Lascon001_snap_0600 ON
 ( NAME='Lascon001_Data',
 FILENAME ='E:\Lascon\Database\Lascon001_snap_0600_Data.ds' )
 AS SNAPSHOT OF Lascon001;

To view a database snapshot, connect to the instance of the SQL Server Database Engine and then expand that instance. Expand Databases, Expand Database Snapshots then select the snapshot you want to view.


Remember that if your database is corrupt or you have had a hardware error, you will need to recover from a backup. A snapshot revert will not work. Only one snapshot can exist. If you have more than one, identify the one you want to revert to, and drop all the others. The general restore syntax is below, with a specific example. Note that you use the database names, not the file names.

 RESTORE DATABASE database_name FROM DATABASE_SNAPSHOT=database_snapshot_name
 USE master;
  RESTORE DATABASE Lascon001 from DATABASE_SNAPSHOT='Lascon001_snap_0600';

Once the restore completes, start the database, and then it is a good idea to run a full database backup.

back to top

MSSQL storage

Lascon latest major updates