DB2 Storage

DB2 naming standards

This first section about naming standards applies to DB2 running on a z/OS mainframe.
Probably the first thing you notice about DB2 is the naming convention. DB2 names rarely fit in with the rest of the naming conventions in your site! They look like

hlq.DSNDBn.dbname.tblename.i0001.A001

hlq usually defines the application. The dbname, or database name, and tblename, or table (or index) name, were usually picked by DBAs to a standard which makes sense to them. In the i0001 component, the 'i' can be 'I' for Standard, 'S' for Shadow or 'T' for temp. Unfortunately, the DBA standards did not always lend themselves to automated storage management. Still, it is worth looking at some specific DB2 components, and how they fit into the grand naming scheme.

DB2 backup file names have the following format:

DB_alias.Type.Inst_name.NODEnnnn.CATNnnnn.timestamp.Seq_num

Where Database alias is a 1-8 character database alias name that was specified when the backup utility was invoked.
Type is the type of backup operation, where: 0 represents a full database-level backup, 3 represents a table space-level backup, and 4 represents a backup image generated by the LOAD...COPY TO command.
Instance name is the 1- to 8-character name of the current db2 instance.
Node number is the database partition number. In single partition database environments, this is always NODE0000. In partitioned database environments, it is NODExxxx and is the current database partition.
Catalog partition number is the database partition number of the catalog partition for the database. In single partition database environments, this is always CATN0000. In partitioned database environments, it is CATNxxxx.
Time stamp is a 14-character representation of the date and time at which the backup operation was performed in the format yyyymmddhhnnss.
Sequence number is a 3-digit number used as a file extension.

DB2 Components

A TABLE is a DB2 storage unit, consisting of rows and columns of data. Tables are implemented internally, within TABLESPACES. Tablespaces are allocated as VSAM LRDS files and pre. DB2 V8 they all had a 4k CI size. Version 8 allows the CI size to be 4, 8, 16 or 32K for performance tuning.

A SIMPLE TABLESPACE holds one table, and is composed of up to 32 LRDS VSAM files, with a maximum size of 64 GB. The fourth qualifier in the DB2 name identifies the tablespace. These DB2 databases all end with 'A001'. Your DBAs can't create simple tablespaces anymore, since DB2 v9 I think, but old ones are still supported and will exist.

A SEGMENTED TABLESPACE holds several tables, and can also be hold up to 64 GB. Think of this a being a bit like a PDS, a means of collecting together a number of small files into one container, to make management easier.

A PARTITIONED TABLESPACE works the other way. It takes one large table and splits it over several physical files. It can be between 64GB. and 128TB, depending on which version of DB2 is used. The last qualifier for partitioned databases identifies the partition number, so the fourth partition of a partitioned database, has a file name ending in 'A004'.
The advantage of using partitions, is that a large database can be split into smaller files. SQL queries will then just run against the partitions which match the WHERE clause, and not the whole database. DB2 utilities can also be set to run against sets of partitions, and not the whole database. This speeds up maintenance work. It also means that if a file is lost, then just one partition is affected, not the whole database.

There are two ways to generate partitions:
Partition-by-Growth starts with a single partition and simply adds more are the database grows up to a maximum of 128 TB.
Partition-by-Key is maintained by a key. When you initially partition a database, you will chose key ranges so that all the partitions are the same size. Over time, the database content will change, and some partitions will be bigger than others. If you want to change the key ranges, you need to unload the database, drop it, redefine it with the new keyranges, then load all the data back in again. This is not a trivial task, as all the plans and packages have to be rebound, and all views redefined. DB2 V6 simplified this a bit, as it allows you to just drop and redefine the partitions you are changing, not the whole database. DB2 V8 allows partitions to be added on-line and can take up to 4096 partitions per tablespace.

DB2 Database Structure

Rotating Partitions work a bit like Generation Data Groups and are typically used for time retention dependent data. Say you update data on a monthly basis and needed to keep a year's work of data. You would define thirteen rotating partitions, then as you add a new partition, the oldest one will automatically be deleted.

INDEXES contain keys which point to data rows in tables. Indexes are stored in INDEXSPACES, which are also implemented by Linear VSAM datasets. The DBAs usually use naming standards to differentiate between index spaces and table spaces, but these will obviously be site dependent. DB2 takes the index and table space names, and uses these to generate the fourth qualifier name.

Tables and indexes are combined into DATABASES. These usually make sense in terms of an application (the 'customer' database, for example, would be a collection of all the tables and associated indexes which held customer information). You will have local standards for table names, which DB2 uses when generating the names for the VSAM files. The table name is the third qualifier. A Database can hold a maximum of 4096 table spaces.

STOGROUPS map databases to physical disks. The significant difference between DB2 stogroups, and SMS storagegroups, is that the SMS relationship is 1:n, and the DB2 relationship n:n (hey, this stuff is about DB2, so why not use a bit of relational terminology?). Translated, this means that an SMS disk can only be defined to one storagegroup, but a DB2 disk can be assigned to several stogroups. In any case, if you are running DB2 under SMS, then your DBAs should define all their stogroups with volume(*), or just miss the volume parameter off altogether, and let SMS take control.

So that just leaves the high level qualifier. Ideally, this should identify the application, and whether the data is prod, development, or test. If so, you can use the hlq to direct allocations to the correct storage pool. Separating prod and dev by hlq also keeps production and development data in separate catalogs, always a good idea!

DB2 UDB Tablespaces

DB2 UDB on open systems (Windows, Unix, Linux) has two types of tablespaces called SMS (System Managed Space) and DMS (Database Managed Space). Do not confuse this with z/OS SMS. This is explained on the DB2 and SMS page.

Database (UDB) are usually found in home_dir/instance/NODEnnnn/SQLnnnnn/, where home_dir is site specific, chosen when a database is create instance is used to ensure that a database name is unique within a db2 instance, and is usually db2inst1, db2inst2 etc., partition is NODEnnn, is the partition number starting from 0000 and SQLnnnnn is the directories that contain the databases, with SQL00001 containing all the files for the first database created etc.
Database names must be unique within the location in which they are cataloged. On Linux and UNIX implementations, this location is a directory path, whereas on Windows implementations, it is a logical disk.

With SMS-tablespaces the operating system looks after the storage space, both for initial allocation and on-going management and the data is kept in the operating system directories. With DMS-tablespaces the data creator is responsible for the initial space allocation and the DBMS does the on-going management. The data is stored in operating system files or raw. DMS table spaces perform better, SMS tablespaces can be larger and are easier to manage. SMS tablespaces are best suited for tables that are growing fast, while DMS is best for fairly static tables that need very good performance. SMS allocation is the default.

A tablespace is physically implemented as a number of containers. DB2 UDB will stripe data across the containers of a tablespace based on the extent size. The first extent is written to container one and the second to container two and so on. This happens for both types of tablespace. Space in SMS containers is allocated and extended by the operating system, while DMS containers have to be sized and allocated up front.

SMS tablespaces are not allocated in extents, but in pages. The pages size can be 4,8,16 or 32k. The maximum table size is 16k times the page size, so tables can range from 64GB to 512GB depending on the page size. The striping extents are a multiple of the page size, with a maximum of 256 times the page size.
DMS tablespaces are allocated in extents. It is possible to add an extra container to a DMS table space, but not to an SMS table space. A DB2 UDB database has three types of object; data objects, index objects and long field objects. With DMS tablespaces these objects can be stored in individual tablespaces to improve performance, but the default is they all share the same tablespace.

The different types of tablespace are
A regular tablespace, which stores permanent data and can hold Large Objects (LOBS).
A Catalog tablespace, one per database, which holds system catalog tables and is automatically created when the database is created.
A Large tablespace, which holds permanent data and can hold LOBS. It must be DMS and a default large tablespace called USERSPACE1 is created when the database is created.
A System Temporary tablespace, the default is called TEMPSPACE1 and is used for sort, reorganisation and other temporary data requirements. At least one must exist for every database, but more can be created.

DB2 v.8.2.2 simplified the choice between SMS and DMS. You can set DMS up to auto-resize tablespaces, and automatically extend them as the tablespace becomes full. You can control the growth rate and the maximum size. DB2 v.8.2.2 pretty much automates the tablespace storage management tasks.
DB2 v.8.2.2 introduced a new feature called 'Automatic Storage'. Automatic storage tablespaces use DMS tablespaces, and uses 'storage paths' essentially file systems or directories. You associate one or more storage paths with each database and you don't need to specify a list of containers when allocating a tablespace. DB2 will automatically decide which containers to allocated based on the storage paths associated with the database. You can define a starting size, or you can take a default. Automatic storage will auto-resize databases by default. It will also add new stripe sets of containers to a tablespace