Windows File Systems - NTFS

A file system organises files into directories and records where each file starts and ends, so that the data belonging to one file is kept separate from the rest and can be stored and retrieved reliably. Windows supports three native file systems: FAT, NTFS and ReFS. The links below take you to a description of each file system.

NTFS

NTFS and the MFT

NTFS was introduced with the Windows NT operating system and has been the native file system for Windows systems ever since. However, it has some limitations, and these are starting to hold it back when Windows Server is used for critical business systems.
In theory, an NTFS filespace can be allocated up to 240TB, but tested capacities are much less than this. Disks are now supplied with multi-terabyte capacities and NTFS needs third party software to cope with them.
If NTFS disks or files became corrupt, it used to be necessary to run a chkdsk command with all systems down. This was time consuming on large disks, and it was not really acceptable to have important business systems down for extended periods. Windows Server 2012 introduced spot fixing for NTFS, an offline repair that is much faster than a full chkdsk run and so has less impact on business systems. With spot fixing, a scan runs as a background task alongside other active programs, checks the file system and logs the issues that it finds for later correction. You can then take the volume offline during a maintenance window and run Spotfix to repair the corruptions logged by the scan. The actual downtime should be just a few seconds.
ReFS is a newer file system that addresses these issues, and it is described in the Windows ReFS section below. However, ReFS is not intended to replace NTFS. The two file systems will run in tandem for the foreseeable future.

Some NTFS features are -

  • Transaction logs exist to help recover from disk failures.
  • NTFS has the ability to control access at file level by setting permissions for directories and/or individual files.
  • NTFS files are not accessible from other operating systems such as DOS. This stops people bypassing security by booting from a DOS floppy disk.
  • If a file holds only a small amount of data (small enough to fit inside its 1 KB MFT record, so well under 1 KB), the data is stored in the MFT record itself, which gives faster retrieval and saves space.
  • An NTFS volume can span several physical disks, which is essential for large, industrial strength applications.
  • NTFS uses a data retrieval technique called 'elevator seeking'. Data reads are sorted and read in track order, rather than in the order submitted. This means the read heads are not seeking back and forward over the disk, but read the I/Os in sequence up the disk.
  • The MFT directory is stored in the middle of the disk, to reduce the seek time needed to find files. You have to read the directory to find where the file is. Once located in the directory, your file is never more than half the disk away from the read heads.
  • NTFS will try to allocate data in a single contiguous extent. This also reduces seek time when reading the whole file.
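
The elevator seeking idea from the list above can be illustrated with a short Python sketch. This is a simplified model, not the way NTFS or the disk driver is actually written. The function names and the use of plain track numbers are assumptions made for the example.

```python
def elevator_order(pending_tracks, head_position):
    """Return pending reads as one upward sweep followed by one downward sweep.

    pending_tracks: track numbers in the order the requests were submitted.
    head_position: the track the read head is currently on.
    """
    # Requests at or above the head are served on the way up, in track order.
    upward = sorted(t for t in pending_tracks if t >= head_position)
    # Requests below the head are served on the way back down.
    downward = sorted((t for t in pending_tracks if t < head_position), reverse=True)
    return upward + downward


def total_head_travel(order, head_position):
    """Sum of the track-to-track distances the head moves for a service order."""
    travel = 0
    for track in order:
        travel += abs(track - head_position)
        head_position = track
    return travel


submitted = [95, 10, 60, 12, 80]
swept = elevator_order(submitted, head_position=50)
print(swept)                             # [60, 80, 95, 12, 10]
print(total_head_travel(submitted, 50))  # 296 tracks in submission order
print(total_head_travel(swept, 50))      # 130 tracks in elevator order
```

In this made-up example the head travels less than half as far when the reads are served in track order.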

The MFT

In NTFS, all objects are files, even the metadata about files. This allows the file system to handle all objects consistently. The Master File Table (MFT) is the most important system file. It contains information about all the files on the volume. There is exactly one MFT per volume, and there is at least one entry in the MFT for every file on an NTFS volume. If the base file record is not big enough to hold all the information about a file, an extension record is created. The MFT file records contain all the information about a file, including its size, time and date stamps, permissions, data content, etc.
The MFT consists of a series of 1 KB records, with Record 0 describing the MFT itself. Record 1 describes the MFT mirror, a duplicate of the first few MFT records that is kept for resilience. The file descriptors are called attributes. Resident attributes fit inside the MFT record, while non-resident attributes are too big and are held in clusters outside the MFT. Attributes include items such as the archive bit, time stamps, file names (a file can have several names, including a short name and a long name), and the ACL or security data for the file. If the file is small enough, then all the data held by that file can be contained within the MFT record. If a file is too big to fit into the MFT record, then the record points to the external clusters on disk that hold the data. A badly fragmented file will need several MFT records to hold all the pointers to the pieces of the file, so fragmentation degrades performance.
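
The resident versus non-resident decision can be sketched roughly as below. This is a toy model in Python, not the real on-disk format. The overhead figure, the function name and the dictionary layout are all hypothetical, chosen only to show that small files stay inside the 1 KB record while larger ones are referenced by cluster extents.

```python
MFT_RECORD_SIZE = 1024   # 1 KB records, as described above
ASSUMED_OVERHEAD = 360   # hypothetical space used by the header, names, time stamps and ACL


def store_file_data(data, next_free_cluster, cluster_size=4096):
    """Decide where a file's data would live in this simplified model."""
    room_in_record = MFT_RECORD_SIZE - ASSUMED_OVERHEAD
    if len(data) <= room_in_record:
        # Resident: the data sits inside the MFT record itself.
        return {"resident": True, "content": data}
    # Non-resident: the record only holds (start_cluster, cluster_count) pointers.
    clusters_needed = -(-len(data) // cluster_size)  # ceiling division
    return {"resident": False, "extents": [(next_free_cluster, clusters_needed)]}


small = store_file_data(b"x" * 100, next_free_cluster=1000)
large = store_file_data(b"x" * 5000, next_free_cluster=1000)
print(small["resident"])   # True
print(large["extents"])    # [(1000, 2)]
```

A fragmented file would carry many such extent pairs instead of one, which is why it can overflow into extra MFT records.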

Directories are held in the MFT as file records, with small directories fully contained in the MFT, while large directories are organised into B-trees, with records pointing to external clusters for the rest of the data.
It is important that the MFT does not become fragmented, as this can affect system performance. NTFS will reserve space for the MFT, but if the rest of the disk fills up, this reserved space will be used. NTFS does not delete records from the MFT when files are deleted, but it does mark them as reusable.

If you allocate lots of small files on your disk, you will fill up the MFT before the disk fills up. If you allocate big files, you will run out of disk space before the MFT is full. You can change the amount of space reserved by NTFS for the MFT by updating the NTFS zone reservation parameter, by editing

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem. Add a value named NtfsMftZoneReservation with type REG_DWORD and set the data value to a number between 1 and 4. The bigger the number, the more space will be reserved for the MFT. Caution: Microsoft warns that before you make any change to the registry, you should take a backup, and be prepared for the system to crash.

The valid values are -

  • 1 - default 12.5%
  • 2 - 25%
  • 3 - 37.5%
  • 4 - 50%

Note that this setting applies to all the disks on your server. It is best to set the parameter before the disks are created, because if it is increased after creation the MFT will become fragmented.
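
To get a feel for what the reservation values mean, the Python sketch below estimates how many 1 KB file records fit in the reserved zone and builds the text of a .reg file for the setting. It is an illustration only: the function names are made up for this example, the record count ignores everything except the zone size, and the generated text should be reviewed (and the registry backed up) before it is imported anywhere.

```python
MFT_RECORD_SIZE = 1024  # bytes per MFT record, as described earlier

# NtfsMftZoneReservation value -> fraction of the volume reserved for the MFT
ZONE_FRACTION = {1: 0.125, 2: 0.25, 3: 0.375, 4: 0.5}


def mft_zone_records(volume_bytes, setting=1):
    """Rough count of 1 KB file records that fit in the reserved MFT zone."""
    if setting not in ZONE_FRACTION:
        raise ValueError("NtfsMftZoneReservation must be between 1 and 4")
    return int(volume_bytes * ZONE_FRACTION[setting]) // MFT_RECORD_SIZE


def reg_file_text(setting):
    """Build the text of a .reg file that sets NtfsMftZoneReservation."""
    if setting not in ZONE_FRACTION:
        raise ValueError("NtfsMftZoneReservation must be between 1 and 4")
    return (
        "Windows Registry Editor Version 5.00\n\n"
        "[HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Control\\FileSystem]\n"
        f'"NtfsMftZoneReservation"=dword:{setting:08x}\n'
    )


one_tib = 2**40
print(mft_zone_records(one_tib, 1))  # 134217728 records at the 12.5% default
print(reg_file_text(2))
```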

Reparse points were introduced with the Windows 2000 storage subsystem. They provide 'hooks' into the file system that independent software vendors (ISVs) can use to add storage functionality.

NTFS Change Journal

The Change Journal was introduced with the Windows 2000 operating system. It is used by functions that are only interested in processing new or changed files on a volume. Examples are backup, virus scanning, indexing services and auditing. A record is added to the Change Journal every time a file or directory is updated. Applications that need to find changed files can get their information from the Change Journal, so they do not have to scan the entire volume. This can mean a considerable saving in I/O operations and time, especially if not many files have changed.
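
The way an application uses a change journal can be modelled in a few lines of Python. This is a toy in-memory model, not the real NTFS journal or its Windows API. The class, method and field names are hypothetical. The point it shows is that a consumer remembers the last sequence number it processed and then asks only for newer records.

```python
class ChangeJournal:
    """Toy model of a per-volume change journal."""

    def __init__(self):
        self._records = []  # list of (sequence_number, path, reason)

    def log(self, path, reason):
        """Append a record each time a file or directory is updated."""
        seq = len(self._records) + 1
        self._records.append((seq, path, reason))
        return seq

    def changes_since(self, last_seen):
        """Return the records newer than last_seen and the new cursor value."""
        new = [r for r in self._records if r[0] > last_seen]
        cursor = new[-1][0] if new else last_seen
        return new, cursor


journal = ChangeJournal()
journal.log("docs/report.txt", "modified")
journal.log("docs/new.txt", "created")

backup_cursor = 0  # the backup tool has not processed anything yet
changes, backup_cursor = journal.changes_since(backup_cursor)
print(sorted({path for _, path, _ in changes}))  # ['docs/new.txt', 'docs/report.txt']

journal.log("docs/report.txt", "modified")
changes, backup_cursor = journal.changes_since(backup_cursor)
print(changes)  # [(3, 'docs/report.txt', 'modified')]
```

On the second pass the backup tool sees one record rather than rescanning every file.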

NTFS cluster sizes

The minimum size of a file in the NTFS file system is the size of a single cluster, and files cannot share space within a cluster. The smaller the cluster size, the more efficiently a disk stores information, while the bigger the cluster, the better the performance, as more data is moved per I/O operation. The file system has a limit on the number of clusters it can support, so it chooses a default cluster size based on the size of the volume. A user can override the default cluster size, up to a maximum of 64 KB.
Windows Server 2016 can support volume sizes up to 256 TB.
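
The trade-off between cluster size and wasted space can be worked through with a small Python calculation. This is a sketch under simple assumptions: every file occupies a whole number of clusters with a minimum of one, and the case where very small files are held inside the MFT record is ignored. The function names are invented for the example.

```python
def size_on_disk(file_size, cluster_size):
    """Bytes allocated for a file when space is handed out in whole clusters."""
    clusters = max(1, -(-file_size // cluster_size))  # ceiling division, minimum one cluster
    return clusters * cluster_size


def slack(file_size, cluster_size):
    """Allocated bytes that hold no file data."""
    return size_on_disk(file_size, cluster_size) - file_size


print(size_on_disk(5000, 4096), slack(5000, 4096))            # 8192 3192
print(size_on_disk(5000, 64 * 1024), slack(5000, 64 * 1024))  # 65536 60536
```

The same 5,000 byte file wastes about 3 KB with 4 KB clusters and about 59 KB with 64 KB clusters.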

An NTFS partition consists of four areas:

  • The partition boot sector
  • The MFT
  • Filesystem data
  • The MFT backup

NTFS Cluster Size    Maximum Volume Size
4 KB                 16 TB
8 KB                 32 TB
16 KB                64 TB
32 KB                128 TB
64 KB                256 TB
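
The volume sizes in the table follow from multiplying the cluster size by the number of clusters the file system can address. The Python sketch below assumes a limit of 2^32 clusters per volume, which reproduces the table. The real implementation limit is usually quoted as one cluster fewer, so treat the figures as rounded.

```python
MAX_CLUSTERS = 2**32  # assumption: 32-bit cluster count, rounded up by one cluster
TIB = 2**40           # the table's "TB" figures are binary terabytes


def max_volume_tb(cluster_size_kb):
    """Largest volume, in TB, for a given cluster size in KB."""
    return cluster_size_kb * 1024 * MAX_CLUSTERS // TIB


for kb in (4, 8, 16, 32, 64):
    print(f"{kb} KB clusters -> {max_volume_tb(kb)} TB")
```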

Windows ReFS

Microsoft is developing Windows as a server operating system that is capable of hosting the most demanding of applications, and one of the issues it faced was NTFS. NTFS is the principal Windows file system, but it has a couple of serious limitations. It cannot easily handle the multi-terabyte drives that are now in common use, and when the file system breaks, it needs a lengthy disk check operation to fix it. Most companies cannot afford to have business critical applications down for an extended period while disk issues are being fixed.

To fix these issues, Microsoft developed a new file system called ReFS, or 'Resilient File System', which was first introduced with Windows Server 2012. Microsoft has a 'statement of intent' to move to ReFS as the default file system but there is no timescale for this as yet. The fact that we cannot yet boot from an ReFS system is an immediate show stopper. ReFS was designed to support most of the NTFS features, so it would not need new system APIs and so most file system filters will continue to work with ReFS volumes.
The NTFS features that are supported include: Access Control Lists, BitLocker encryption, change notifications, file IDs, USN Journals, junction points, symbolic links, mount points, reparse points, volume snapshots and oplocks.
Some NTFS features were removed in the initial release of ReFS, then restored in later editions. These include alternate data streams and automatic correction of corruption when integrity streams are used on parity spaces. Alternate data streams were required to allow ReFS to support Microsoft SQL Server.
Some features were dropped and have not been reinstated so far. These are object IDs, short 8.3 filenames, NTFS compression, file level encryption (EFS), user data transactions, sparse files, hard links, extended attributes, and quotas.
The major issues are that ReFS does not offer data deduplication, it does not support SAN attached volumes and Windows cannot be booted from a ReFS volume.

Ensuring Data Integrity

Metadata is 'data about data' and is used to describe disks, directories and files. As such it is vital that the metadata does not get corrupted. ReFS uses a number of techniques to make sure the metadata stays valid, including independently stored 64-bit checksums and ensuring that metadata is not written in place to avoid the possibility of 'torn writes'.
All ReFS metadata is check-summed at the level of a B+ tree page, and the checksum is stored independently from the page itself. This allows ReFS to detect all forms of disk corruption, including lost and misdirected writes and bit rot, or degradation of data on the media.

The same technique can be used for file data, but this is optional. The feature is called 'integrity streams', and if it is used then ReFS always writes the file changes to a location different from the original one. This allocate-on-write technique ensures that pre-existing data is not lost due to the new write. The action of writing the update and writing the checksum is atomic, that is, they must both be completed as a single transaction. An important result of this is that if a file does get corrupted, say by a power failure during a write, then that file can be deleted, then either restored from backup or re-created.
Older versions of NTFS could not open or delete corrupted files, so for NTFS the only resolution to a corrupt file was to run chkdsk against the whole volume. The ReFS solution ensures that if a single file does get corrupted, then access to the rest of the good data is not affected. This is especially important as volume sizes get ever larger and so volume checks take longer and longer to run.
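
The combination of allocate-on-write and independently stored checksums can be illustrated with a toy Python store. This is a sketch of the general technique only, not of how ReFS is implemented. The class and attribute names are hypothetical, and zlib.crc32 (a 32-bit checksum) stands in for the 64-bit checksums that ReFS is described as using.

```python
import zlib


class TinyStore:
    """Toy allocate-on-write store with checksums kept apart from the data."""

    def __init__(self):
        self.blocks = {}  # location -> bytes (stands in for the disk)
        self.index = {}   # file name -> (location, checksum), held separately
        self._next_location = 0

    def write(self, name, data):
        # Allocate-on-write: new data always goes to a fresh location,
        # so the previous version is never overwritten in place.
        location = self._next_location
        self._next_location += 1
        self.blocks[location] = data
        # The pointer and checksum are switched over together, in one step.
        self.index[name] = (location, zlib.crc32(data))

    def read(self, name):
        location, expected = self.index[name]
        data = self.blocks[location]
        if zlib.crc32(data) != expected:
            raise OSError(f"checksum mismatch for {name}")
        return data


store = TinyStore()
store.write("ledger", b"version 1")
store.write("ledger", b"version 2")
print(store.read("ledger"))  # b'version 2'
print(store.blocks[0])       # b'version 1' is still intact at its old location
```

If a block is damaged after it was written, read() raises an error for that one file while every other file in the store remains readable.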

FAT16 and FAT32

FAT (File Allocation Table) is the file system that has been around since MS-DOS days. Bill Gates supposedly created the original FAT system in 1976, in a hotel room in Albuquerque. The original FAT, or FAT16, system supported volumes up to a maximum size of 4 GB. The FAT32 file system supports volumes up to 32 GB, and exFAT, or extended FAT, supports bigger volumes. While FAT is over 40 years old, it is still supported by Windows 10. If you insert a USB drive, then right click on it in Windows Explorer and check 'Properties', you will probably see that the file system is FAT32. Most USB drives are formatted as FAT32, because FAT is supported by so many devices. The problem with FAT32 is that Windows will only format FAT32 volumes up to 32 GB, and the maximum file size is 4 GB. If this is a problem for you, reformat the USB drive as exFAT.
