MS SQL User Databases

Database Files

The maximum size for a SQL Server 2017 database is 524,272 TB. Hardware limitations and physical management issues will mean that databases are likely to be restricted to a fraction of this size. A database file is made up of a number of 8 KB pages, and a Disk I/O operation will read or write a whole 8 KB page at a time. Pages are managed in extents, which consist of eight contiguous pages. At the beginning of each page is a 96-byte header header record that contains metadata about the page, including the page number, and the amount of free space in the page. There are a number of different page types, including Data, Index, Text/Image, Global Allocation Map, Shared Global Allocation Map, Page Free Space, Index Allocation Map, Bulk Changed Map and Differential Changed Map.

A database can consist of three types of file:

Primary file
The primary data file is the starting point of the database and points to any other files in the database. You use the master database to find the locations of all the files in a database, but they are also recorded in the primary file for each database so that the master database can be rebuilt using data from the primary user database files. If a database is small, then all the user data will be held in the primary file.
The recommended file name extension for primary data files is .mdf.
Secondary files
If the primary file is not large enough to hold all the data in the database, then secondary files are required. They are basically used to hold the data that does not fit in the primary data file. Some databases will not need secondary data files, while others may be large enough to need several secondary data files. You may also want to define secondary files on separate disk drives to spread the IO workload across multiple disks.
The recommended file name extension for secondary data files is .ndf.
Transaction logs
These files hold records of updates to the data files, and can be used to recover the database after an error. Every database has at least one transaction log file, and some databases may have more than one. Log files do not contain pages, they contain a series of log records.
The recommended file name extension for log files is .ldf.
The SQL Server initially writes its updates to a buffer, both database updates and log updates. These are not written out to disk until the application issues a checkpoint. The term 'dirty page' refers to a page in the buffer that has been updated, but not yet written out to disk. It is essential that SQL Server writes the log file updates out to disk before the data updates, as otherwise uncommitted transactions could not be rolled back. This is called a Write-ahead log or WAL.

The .mdf, .ndf, and .ldf file naming standards are not enforced, but if you use them, then anyone can identify the file type from the extension. These are the names of the physical files on disk, called the os-file-names. You should not confuse these with the logical-file-names, the name used by applications.

File sizes and growth

When you define an SQL Server file, you can specify that it should grow automatically if it fills up its initial space allocation. You say how much space you want added to the file, and each time the file fills, it increases its size by that space amount. If you have multiple files in a filegroup, they will not autogrow until all the files are full. Growth then occurs using a round-robin algorithm.
If you don't state what growth parameters you want when you create a database then you will get the ones that are specified for the model database and they might not be appropriate. When a database adds an extent, all database processing stops until the add extent process completes. New extents will probably be allocated to different parts of the physical disk, so as extents are added, the database becomes fragmented and this can affect performance. Bottom line is that you do not want the database to be extending too often. If the database is growing rapidly, you want to be adding larger extents to reduce the adding frequency. If the database grows slowly, consider adding smaller extents to use space effectively.
You can also specify a maximum size for each file, otherwise the file can continue to grow until the disk is full. If you use autogrow, then you do not need to spend so much administration time monitoring the state of the database files and increasing them manually when they start to fill up.

By default, all files are allocated onto a single disk, as some servers just have one disk. If you have more than one disk available for database files, then you should separate the database and log files onto different disks.

back to top

Database Filegroups

It is possible to group database files together in filegroups to simplify management. It may be possible to improve performance by storing tables and indexes on separate disks, to spread the IO workload and prevent head contention. If you define a separate filegroup for every disk you can simplify the management of the files.
Log files are never members of a filegroup. Log space is managed separately from data space.

The primary filegroup is the default. It contains the primary data file and any other files that are not specifically assigned to another filegroup. System tables are allocated in the primary filegroup. If multiple data files exist within a filegroup, SQL Server allocates pages to the data files using a round-robin mechanism. So when data is inserted into a table, SQL Server allocates pages first to data file 1, then data file 2, then through the rest of the data files, then back to data file 1 again. SQL Server achieves this by an algorithm known as Proportional Fill.

You can place specific files into User-defined filegroups by using the FILEGROUP keyword in a CREATE DATABASE or ALTER DATABASE statement.
You can specify a default user-defined-filegroup for each database, so pages which are not given a specific filegroup are assigned to it, instead of the primary file group. If you do not specify a default filegroup, then the primary filegroup is the default.

It is possible to make all the files in a user-file-group read only, and it is also possible to detach a file group from one server, then reattach it to another server. This means that it is possible to copy an SQL database to removable media, then distribute it to remote clients, who would be able to read the data using their own SQL server.

You do not need to use file groups at all, especially if you are running a smaller system. In this case all files are included in the primary filegroup.

back to top

Azure Cloud Database as a Service

Azure Database as a service allows you to store your user databases in the Cloud as an Azure blob. Note that you should not store system databases in Azure blob storage as this is neither recommended nor supported.

There are several benefits to using Azure, some of which are:
Azure gives you limitless storage capacity, and you just pay for the storage you are using.
You can use Azure snapshots using the Azure Blob storage service to provide nearly instantaneous backups and quicker restores for database files.
You can move databases between machines without needing any application changes. The machines can be either on your premises, or in the Cloud.
You can create multiple copies of your databases on the local machines or virtual machines

You might be concerend about ther security of a database in the Cloud, but one way to ensure your data is to separate your compute instance from your storage instance.
You then encrypt your database using Transparent Data Encryption (TDE) certificates, which are physically separated from the data. The TDE keys can be stored in the master database, which must be stored locally on your site and should backed up locally (so the backups with the encryption key are not held in the Cloud). You then use these local held keys to access the data, which is hekd on Azure Storage. If an unauthosied person was able to access your cloud storage, they cannot access your data as they would not have access to the TDE certificates.

An Overview of Using Azure Databases

First, you need to create a storage account and at least one container in Windows Azure. A storage account can contain an unlimited number of containers within the storage limits. Each container will then hold a set of Blobs, either block blobs or page blobs. Page blobs are more efficient when ranges of bytes in a file are modified frequently.

You must then create a policy on the container, which contains things like read, write, and list rights. Wehn you create a policy on a container, you also generate a SAS key which is kept unencrypted in memory and is needed by the SQL Server to access the blob files in the container. For each container used by a data or a log file, you must create a SQL Server Credential whose name matches the container path. This includes information on the policy of the container as well as a shared access signature that is necessary to access the container. You store the information regarding Azure Storage container, its associated policy name, and SAS key in the SQL Server credential store.

back to top

Allocating a database

The following T-SQL command will create a database named 'Lascon' with three data files (1 primary and 2 secondary data files) and one log file. You can will need to substitute your own database name, file path, file names, size and file growth parameters.

( NAME = N'Lascon_log', FILENAME = N'C:\MSSQL\DATA\Lascon_log.ldf',SIZE = 50GB , MAXSIZE = 100GB , FILEGROWTH = 10%)

CONTAINMENT = NONE means that the database is not isolated from the SQL Server instance. ON and PRIMARY means that the disk files used to store the database components are explicitly defined, and that the first defined file will be the primary. The MAXSIZE and FILEGROWTH parameters combined mean that when the defined 50GB allocation fills up, the database will grow by increments of 5GB, until the disk fills up.
Now you need to create a table before you can add in data, using the T-SQL command below. Again, substitute your own names.

USE Lascon;
CREATE TABLE [Database_pages] (
   [Page_Name] CHAR (25) DEFAULT 'New Page',
  [Created_By] CHAR (25) DEFAULT 'Lascon AJA');

These are both very simple examples that use minimal parameters. Lots more parameters are available for both CREATE DATABASE and TABLE, see the Microsoft documentation for details.

back to top

Querying a Database Allocation

It is useful to be able to find out how much available free space there is in each data file of a database. One way to do this is to run the SQL query below.

USE Lascon
Select DB_NAME() AS [DatabaseName], Name, file_id, physical_name,
   (size * 8.0/1024) as Size,
   ((size * 8.0/1024) - (FILEPROPERTY(name, 'SpaceUsed') * 8.0/1024)) As FreeSpace
   From sys.database_files

This command will tell you the data file names, file IDs, physical name, total size and available free space in each of the database files.
If you want to know how many extents are allocated for a database, try the undocumented DBCC command below.

USE Lascon
DBCC showfilestats

This command shows you the number of extents for each data file. As each data page is 8KB and each extent contains 8 continuous pages, an extend will hold about 64KB. UsedExtents is the number of extents allocated with data in each file. While SQL will spread the data evenly over all the files, the primary data file will have a higher number of UsedExtents as it includes system information about the database.

back to top

MSSQL storage

Lascon updTES

I retired 2 years ago, and so I'm out of touch with the latest in the data storage world. The Lascon site has not been updated since July 2021, and probably will not get updated very much again. The site hosting is paid up until early 2023 when it will almost certainly disappear.
Lascon Storage was conceived in 2000, and technology has changed massively over those 22 years. It's been fun, but I guess it's time to call it a day. Thanks to all my readers in that time. I hope you managed to find something useful in there.
All the best