Enterprise Storage Selection

When this page first appeared in about 2003, IBM mainframes probably still carried out most of the business processing, although Open Systems processing on Unix and Windows devices was catching up fast. For that reason, my original definition of an enterprise storage system was one which supported mainframe z/OS, UNIX variants, Netware and Windows. Now in 2017, while big financial organisations still use mainframes, z/OS is just a small part of the picture. However, this does not mean that I think z/OS is dead, or even dying. If your organisation uses mainframes, then z/OS support is critical and so it still plays a major part in the selection criteria below. However, I no longer think that an enterprise system has to support z/OS. In the Open Systems world, Netware has almost disappeared of course, and Linux support is now important.
So the current definition of an enterprise disk subsystem is one which supports the Unix variants, Linux variants and Windows, with z/OS support a bonus.

The Cloud seems to be changing everything too. It is said that no-one is building data centers anymore (except Cloud providers), and if you don't build data centers you don't need disk subsystems. Of course, your Cloud provider will be hosting your data on disks of some kind, but one of the Cloud benefits is that you don't worry about that, as long as you can store and access your data.
The other game changer in recent years is flash drives. Most vendors now supply all-Flash subsystems and indications are that they will replace spinning disk for most real time applications, relegating spinning disk to an archive storage role.

One thing that stands out is that the major subsystem builders have not changed their architectures for several years. The EMC DMX4, the HDS VSP and the IBM DS8870 have all been enhanced with large SSD disks and automated I/O tiering, but the basic architectures are unchanged. Is this because the technology has reached a plateau and a new wave of technology will arrive at some point, or is it because the advent of the Cloud means that we don't need technology upgrades every five years or so?

The links below will take you to discussions of the enterprise products from the five big enterprise vendors: EMC, HDS, IBM, HP and NetApp. The final link is to a table that compares some of their products.

Dell EMC

History

EMC started out producing cache memory and developed solid state disks, memory devices that emulated spinning disks, but with much faster performance. These solid state disks were usually re-badged and sold by StorageTek.

Around 1988, EMC entered the storage market in its own name, selling Symmetrix disk subsystems with what was at that time a very large 256MB cache, fronting 24GB of RAID 1 storage. Their Mosaic architecture was the first to map IBM CKD mainframe disk format onto standard FBA open systems back-end disks, and as such, could claim to be the first big user of storage virtualisation. In those days, EMC developed a reputation for delivering the best performance, but at a price.

In 2008, EMC became the first to use flash storage in an enterprise subsystem, for high performance applications. EMC introduced the latest addition to the Symmetrix range, the V-MAX, in April 2009.
In September 2016, Dell bought EMC and the company is now called Dell EMC.

Architecture

The old Symms used the Direct Matrix architecture, now called Enginuity. The principle behind Direct Matrix is that all IO comes into the box from the front-end directors. These are connected to global memory cache modules, which are in turn connected to back-end directors that drive the IO down to the physical disks. This connectivity is all done by a directly connected, point-to-point fibre-channel matrix.

The V-MAX architecture builds on the older DMX architecture, but has some fundamental differences. The directors and cache are combined together into a V-MAX engine. Each V-MAX engine contains two controllers, and each controller contains host and disk ports, a CPU complex, cache memory and a Virtual Matrix interface.
The current V-MAX consists of between 1 and 8 engines. Each engine is built from commodity processors, cache, host adapters and disk adapters, which makes them relatively cheap to produce and easier to upgrade later. Internally, the engine components communicate locally, so memory access is local. However, the engines must communicate with each other and also support the Enginuity global memory concept. To achieve this, the memory is virtualised, and each engine communicates with the other engines over fibre connections using RapidIO technology. When a director gets a memory request it checks the location; if it is local, it is served at memory bus speeds. If it is remote, the request is packaged up and sent off to the remote director for processing. Presumably EMC have optimised this setup to ensure that most memory accesses are local. Certainly the EMC diagrams show each engine with 2 directors, 16 host ports and 16 disk ports, but only 4 virtual matrix ports. There are two of these ports per director, and they are connected to the other engines with two MIBEs (Matrix Interface Boards). The cache memory is mirrored, and in configurations with 2 or more engines, it is mirrored between engines.
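
To make the local-versus-remote routing easier to picture, here is a small sketch of the idea in Python. Everything in it (class names, the placement rule, the request format) is invented for illustration; it is not EMC code.

    # Simplified illustration of the V-MAX global-memory idea described above.
    # All names and structures are invented for illustration; not EMC code.

    class VirtualMatrix:
        """Stands in for the RapidIO fabric that links the engines."""
        def __init__(self):
            self.engines = {}                     # engine_id -> Director

        def owner_of(self, slot_id):
            # Toy placement rule: cache slots spread round-robin across engines.
            return slot_id % len(self.engines)

        def send(self, engine_id, request):
            # 'Package up' the request and hand it to the remote director.
            return self.engines[engine_id].serve_remote(request)

    class Director:
        def __init__(self, engine_id, matrix):
            self.engine_id = engine_id
            self.matrix = matrix
            self.local_cache = {}                 # cache slots owned by this engine
            matrix.engines[engine_id] = self

        def read_slot(self, slot_id):
            owner = self.matrix.owner_of(slot_id)
            if owner == self.engine_id:
                return self.local_cache.get(slot_id)    # local: served at bus speed
            request = {"op": "read", "slot": slot_id}   # remote: ship it over the matrix
            return self.matrix.send(owner, request)

        def serve_remote(self, request):
            return self.local_cache.get(request["slot"])

    matrix = VirtualMatrix()
    engines = [Director(i, matrix) for i in range(4)]   # a 4-engine system
    engines[1].local_cache[5] = b"cached track data"    # slot 5 lives on engine 1
    print(engines[0].read_slot(5))                      # remote read via the matrix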

This architecture extends the direct matrix principle, but now the matrix is virtual. One of the difficulties in machine hall design is leaving room for various frames to grow as cabinets are added to increase capacity. The V-MAX can now be split into 4 frames, where the system bays can be up to 25m apart.

One interesting feature is the storage tiering, based on T0 Flash storage, T1 FC drives and T2 SATA drives.
EMC FAST, or "fully automated storage tiering", checks for data usage patterns on files and moves them as required between Fibre Channel, SAS and flash drives to balance cost effectiveness and performance requirements. Supported subsystems include the V-MAX, the Clariion CX4 and the NS unified system.
FAST can also be configured manually to move application data to higher performing disk on selected days of the month or year. This could be useful for a monthly payroll application, for example.
EMC introduced FAST2 in August 2010, which added true LUN tiering and can manage data at block level.
The tiering concept has been extended further by adding a 'Cloud' layer, the EMC Cloud Array.
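
As an illustration of this kind of tiering policy, here is a minimal sketch that combines a calendar rule (the monthly payroll case above) with an activity threshold. The tier names, thresholds and policy shape are my own assumptions, not EMC's actual FAST implementation.

    # A minimal sketch of a tiering policy: promote data by calendar rule
    # (e.g. payroll volumes before month-end) or by recent activity.
    # Thresholds and structures are illustrative, not EMC's FAST code.
    import datetime

    TIERS = ["flash", "fc", "sata"]                      # T0, T1, T2

    def target_tier(extent, today=None):
        today = today or datetime.date.today()
        # Manual / calendar rule: pin named applications to flash on chosen days.
        if extent["app"] == "payroll" and today.day >= 25:
            return "flash"
        # Automatic rule: place the extent according to how busy it has been.
        iops = extent["recent_iops"]
        if iops > 500:
            return "flash"
        if iops > 50:
            return "fc"
        return "sata"

    def rebalance(extents, today=None):
        """Return the list of moves the tiering engine would schedule."""
        moves = []
        for ext in extents:
            wanted = target_tier(ext, today)
            if wanted != ext["tier"]:
                moves.append((ext["id"], ext["tier"], wanted))
        return moves

    extents = [
        {"id": 1, "app": "payroll", "tier": "sata",  "recent_iops": 5},
        {"id": 2, "app": "web",     "tier": "flash", "recent_iops": 10},
    ]
    print(rebalance(extents, datetime.date(2017, 3, 28)))
    # [(1, 'sata', 'flash'), (2, 'flash', 'sata')]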

Models

The V-MAX range starts with the VMAX 100K, which supports between 1 and 4 engines, each with 128 GB cache, and 24 to 1,560 disk drives, giving a usable capacity of 1 PB. The virtual matrix bandwidth is 200GB/s.
The VMAX 200K supports up to 8 engines, each with 128 GB cache, and up to 3,200 drives. With a variety of different sized disk and Flash drives in different RAID configurations, the total capacity is very much dependent on the configuration. The maximum formatted capacity with 3TB disk drives is 2.9 PB and the maximum usable capacity in a RAID configuration is close to 2PB.
The VMAX 400K also supports up to 8 engines, but these are more powerful than the 200K engines. Each can support 256 GB cache, giving a maximum cache capacity of 2 TB. It supports up to 2,400 drives with a formatted capacity close to 4PB, and a potential RAID5 or RAID6 usable capacity of 3.8PB. The main differences between the 200K and the 400K seem to be increased internal bandwidth, 400GB/s compared to 192 GB/s, and that the 400K supports 4TB drives, which is where the capacity increase comes from.

Software

DMX software includes the EMC Symmetrix Management Console for defining and provisioning volumes and managing replication. The TimeFinder products are used for in-subsystem PIT replication, and SRDF for remote replication. SRDF can run in full PPRC compatibility mode, and can also replicate to three sites in a star configuration.
Enginuity 5784 adds new features including SRDF/EDP (Extended Distance Protection) which is similar to cascaded SRDF except that it uses a DLDEV (DiskLess Device) for the intermediate hop.
EMC was lacking in z/OS support for some years, but they have now licensed PAV and MA software from IBM, and have provided z/OS Storage Manager to manage mainframe volumes, datasets and replication.

GDPS support is provided, except for GDPS/GM or a three site GDPS/MGM solution.

Openness

In general, EMC subsystems are not open, the exception being if they are fronted by an EMC VPLEX, which allows different manufacturers' devices to co-exist with EMC. Software wise, SRDF will only work between EMC devices, and even then, not with all of them. EMC Open Replicator has the ability to take PIT copies from selected non-EMC subsystems to the DMX, or to copy from the DMX to selected non-EMC devices.
The V-MAX is a closed virtual system, as it cannot virtualise other storage subsystems, even those within the EMC range.

Full intersite connectivity is available with VPLEX.

EMC historically had an issue with supporting z/OS features like FlashCopy and PPRC mirroring, as the equivalent EMC features were introduced earlier, and were arguably (at least by EMC) better. This became a problem when GDPS came along, because while TimeFinder and SRDF worked fine, they did not work with GDPS. GDPS manages remote mirroring and site failover, but it does much more than just manage the storage; it also manages the failover of z/OS LPARs and applications. A lot of big sites use it and require that any disk purchase must be 100% GDPS compatible. EMC therefore licensed some of the IBM code to ensure good compatibility.

The EMC implementation of PPRC is called Symmetrix Compatible Peer and is built on SRDF/S code. Some minor differences are:
  • PPRC needs Fibre Channel path definitions between each z/OS LCU. A DS8000 uses the WWN of each FC adapter to define the links, but the VMAX does not use WWNs, it uses the serial number. This means that in the GEOPLEX LINKS definition of the GDPS Geoparm, you need to specify the link protocol as 'E', then define the links with the serial number (this is how ESCON links were defined, hence EMC uses the 'E' protocol).
  • Symmetrix Compatible Peer does not support cascaded PPRC, PPRC loopback configurations or Open Systems FBA disks.
  • For GDPS FREEZE to work correctly, the GDPS / PPRC CGROUP definitions must exactly match the SRDF GROUP definitions and link definitions in the VMAX config file.
  • If you use HyperSwap and FAST tiering, the FAST performance stats are copied over when a HyperSwap is invoked, so disk performance will be maintained.
  • GDPS requires small dedicated utility volumes on each LCU to manage the mirroring. These volumes should not be confused with EMC GDDR Gatekeeper volumes; they have completely different purposes.

The VMAX also supports XRC, which means that it can support 2 sites synchronously mirrored with PPRC, with a third site asynchronously mirrored with XRC.


HDS

History

Hitachi Data Systems was always known as the company that manufactured disks that were exactly compatible with IBM, but worked a little faster and cost a little less. HDS broke that mould when they introduced the 'Lightning' range of subsystems in 2000, which was a merging of telephony cross-bar technology and storage subsystem technology. They extended and developed that architecture further with the USP (Universal Storage Platform), released in September 2004.
In September 2010 HDS released the Virtual Storage Platform (VSP), a purpose built subsystem that provides automated tiering between flash and spinning disk drives. This model was augmented in late 2015 with the VSP F range, all-flash systems.

Architecture

Unlike competing storage subsystems, the VSP is not built from 'commodity' components, but uses parts designed and manufactured within HDS. HDS claims that this allows them to make a subsystem that outperforms its rivals.
The VSP is composed of 'racks', 'chassis' and 'boards'. The base model is a single rack and can be 'scaled up' by adding a second control rack and up to four disk racks. The base rack contains one control chassis and one drive chassis. The control chassis contains a number of functional boards, and more boards can be added to the first chassis to improve performance, and another disk chassis added to increase capacity. This is called 'scaling out'. Like the USP, the VSP supports adding external disks behind a virtualisation unit, and this is called 'scaling deep'.

There are five different kinds of functional boards.

  1. The Front End Director boards or FEDs provide the interface to host servers and also to any external storage that may be attached to the VSP. FEDs can be either 16-port 8Gb/s FICON or 16-port Fibre Channel.
  2. The Back End Director boards or BEDs interface to the disk or SSD devices. Each chassis can hold two or four BED boards and each BED has eight 6Gb/s SAS links, a significant departure from the USP which used FC-AL links to the disks. Two boards are normally installed and the extra two boards are added for extra performance, as that gives 32 * 6 Gb/s SAS links in a chassis. The BEDs connect to 128-disk Small Form Factor (SFF) disk containers or DKUs and 80-disk Large Form Factor (LFF) DKUs.
    The BEDs generate RAID parity. The RAID options are very flexible, but HDS recommends RAID5 for SSD devices and RAID10 for disks. Other RAID configurations are possible, including RAID6 P+Q. You can also buy a BED with no disks, which you can use as a virtualisation engine for external disks.
  3. A Virtual Storage Director (VSD) is the central processor and data movement engine. Either two or four VSDs are installed in each chassis; four are used for added performance. The processors are now Intel, another change from USP technology. Each VSD board contains a quad-core Xeon CPU and 4GB of RAM. The VSDs are paired for failover purposes and they hold their meta-data and control data in shared system memory to make this possible. If one VSD fails then failover to the other one is automatic with no loss of service. When the failed VSD is repaired, failback is also automatic. The maximum number of processor cores is 32: 2 chassis, each with 4 VSDs, each with a quad-core CPU. Control memory is no longer on a separate board, but is held on the VSD.
  4. The Cache boards or DCAs (Data Cache Adapter) hold the system memory. This contains transient user IO activity and also configuration details like RAID setup, dynamic tiering status and remote copy operations. Up to 6 DCAs can be installed per chassis. Each DCA board also has either one or two 32GB SSDs to allow the board to backup configuration details and any outstanding activity if the power drops. This means the VSP does not need the heavy, expensive batteries that were required to protect from power failure on the USPs. Write blocks are mirrored, but not read blocks, which means the cache utilisation is improved.
  5. The Grid Switch boards or GSWs are PCI Express based, connected by a crossbar switch. They form a HiStar-E network with two or four GSWs in each chassis. Every GSW board has 24 1GB/s, bi-directional ports connected as follows:
    • 8 ports connect to FED and BED boards and transfer both data and meta-data
    • 4 ports connect to VSD boards for job requests and system data transfer (like memory access requests)
    • 8 ports connect to DCA boards for user data transfer and control memory updates
    • 4 ports are used if an extra chassis is installed, to cross-connect to the matching GSW in the second chassis.

The switched PCI-e architecture means that internal communication is non-blocking and every input port can connect to any piece of memory and every BED port can connect to any disk. This means that data does not need to be placed behind specific ports to ensure performance.

VSP Models

There are 6 hybrid disk / flash models: the G200, G400, G600, G800, G1000 and G1500. The more recent all-flash models are the F400, F600, F800 and F1500. The F1500 is intended for mainframe systems.

Storage Tiering

The idea behind storage tiering is an old one - you keep your busiest data on fast but expensive storage, then as it ages and becomes less busy you move it down the hierarchy to cheaper, slower storage. To achieve this, you had to solve two problems: first, you had to run a report that identified data access profiles and use that report to work out what data was currently in the wrong place. Second, you had to move the incorrectly positioned data to the correct place in the storage hierarchy, a process that often required application downtime.
This data movement might involve whole volumes, or whole files. However, in many cases files are active for part of the day and waste expensive disk space for the rest of the day. Some very large files can have parts that are very active and parts that are rarely accessed, and moving the whole file to expensive storage is wasteful.

HDS has addressed those problems with Hitachi Dynamic Tiering (HDT). Storage inside the VSP can be either fast but expensive SSD, or slower and cheaper SAS/SATA drives. When you allocate a virtual volume on a VSP, it stripes the data over all the physical volumes in 42MB chunks or pages, and that striping can go over both SSD and spinning disk. The page size is much bigger than that used by other storage manufacturers, and HDS has used that bigger size to allow it to position parts of files on different types of disk. The process is called sub-LUN tiering.

Page access is checked on a regular basis, and if a page becomes 'hot' it is automatically moved up to SSD disk, while pages that have cooled down again are moved back to SAS disks. This means that active parts of files are held on high performance SSD and inactive parts on SAS disk, so optimising SSD usage. HDS claims that this effectively means that all files are on fast disk.
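
The page-based mechanics are easier to see as a sketch. The following is a rough illustration of the idea described above, with invented thresholds and structures; it is not HDS's actual HDT code.

    # Rough sketch of sub-LUN tiering: the virtual volume is divided into 42MB
    # pages, page heat is sampled, and only the hot pages are placed on SSD.
    # Thresholds and structures are illustrative, not the HDT implementation.

    PAGE_SIZE = 42 * 1024 * 1024          # HDT works on 42MB pages

    def page_index(byte_offset):
        """Which page of the virtual volume does this I/O land in?"""
        return byte_offset // PAGE_SIZE

    class VirtualVolume:
        def __init__(self, size_bytes):
            pages = (size_bytes + PAGE_SIZE - 1) // PAGE_SIZE
            self.heat = [0] * pages                    # I/O count per page
            self.tier = ["sas"] * pages                # where each page lives now

        def record_io(self, byte_offset):
            self.heat[page_index(byte_offset)] += 1

        def retier(self, hot_threshold=100):
            """Periodic pass: hot pages go up to SSD, cooled pages come back."""
            for i, count in enumerate(self.heat):
                self.tier[i] = "ssd" if count >= hot_threshold else "sas"
            self.heat = [0] * len(self.heat)           # start a new sample period

    vol = VirtualVolume(10 * 1024**3)                  # a 10GB virtual volume
    for _ in range(150):
        vol.record_io(0)                               # page 0 is busy
    vol.retier()
    print(vol.tier[:3])                                # ['ssd', 'sas', 'sas']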

HDT is not just a VSP feature, it is also used on HDS NAS and Content Management storage systems.

Some other VSP features are:

Thin provisioning. Disk space is only allocated as needed, up to the size of the virtual volume. When data is deleted from the virtual volume, a Zero Page Reclaim utility returns unused storage pages to the spare pool (a sketch of the idea follows this list).
Automatic Dynamic Rebalancing. When new physical volumes are added to the subsystem, virtual volume pages are re-striped to ensure they are still evenly spread over all the physical volumes.
Universal Virtualisation Layer. If you put some external storage behind the VSP then it is carved up and allocated to look the same as the internal storage. This means that mirroring, snapshot and replication software all work consistently for both internal and external storage.
Virtual Ports. Up to 1024 virtual FC ports can share the same physical port. Each attached server will only see its own virtual ports, which means they don't get to access each other's data. This feature allows the VSP to efficiently use the high bandwidth that is available on an individual port.
All data stored on the VSP is hardware encrypted for security.
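
Here is the thin provisioning and Zero Page Reclaim idea from the list above as a small Python sketch. The page size, pool and method names are illustrative assumptions, not the actual VSP implementation.

    # Small sketch of thin provisioning plus zero page reclaim; the page size
    # and structures are illustrative, not the VSP implementation.

    PAGE = 42 * 1024 * 1024                     # VSP allocates space in 42MB pages

    class ThinVolume:
        def __init__(self, virtual_size, pool):
            self.virtual_size = virtual_size
            self.pool = pool                    # shared free-page pool (a list)
            self.pages = {}                     # page index -> page, allocated on demand

        def write(self, page_no, data):
            assert page_no * PAGE < self.virtual_size   # stay inside the virtual size
            if page_no not in self.pages:
                self.pages[page_no] = self.pool.pop()   # allocate a real page on first write
            self.pages[page_no][:] = data               # fill the allocated page

        def zero_page_reclaim(self):
            """Return pages that now contain only zeros to the spare pool."""
            for page_no in [p for p, d in self.pages.items() if not any(d)]:
                self.pool.append(self.pages.pop(page_no))

    pool = [bytearray(8) for _ in range(10)]    # tiny stand-in for the page pool
    vol = ThinVolume(virtual_size=100 * PAGE, pool=pool)
    vol.write(3, bytearray(b"payload!"))
    vol.write(3, bytearray(8))                  # data later overwritten with zeros
    vol.zero_page_reclaim()
    print(len(pool))                            # back to 10: the page was reclaimed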

Software

Hitachi High Availability Manager provides non-disruptive failover between VSP and USP systems, giving instant data access at the remote site if the primary site goes down. This is aimed at non-mainframe SAN based applications.
Mainframe availability uses TrueCopy synchronous remote mirroring and Universal Replicator, with full support for GDPS.

The Storage Command suite includes:

  • Hitachi Device Manager for disk and storage configuration
  • Hitachi Replication Manager
  • Hitachi Storage Capacity Reporter for usage trending
  • Hitachi Tuning Manager

Openness

The VSP is an open architecture, in that it works with disks from many other vendors and virtualises the data. The list of supported vendors includes EMC, HP, IBM and SUN, as well as older HDS devices. In general, the VSP will support the hardware, but replaces the OEM replication software with its own.



IBM

History

The original IBM hard drive, the RAMAC 350, was manufactured in 1956, had a 24 inch (609mm) platter, and held 5 MB. The subsystem also weighed about 1 ton. That was a bit before my time, but when I joined IT, the storage market was dominated by IBM, the mainframe was king, and the standard disk type was the IBM 3380 model K which contained 1.89 GB. IBM lost their market leader position to EMC sometime in the 1990s.

IBM introduced the DSxxxx series in late 2004 in response to competition from EMC and HDS. They updated their internal bus architecture to increase the internal transfer speed by more than 200% over the ESxxx series, and also abandoned their SSA disk architecture for a switched FC-AL standard. The DS8000 series is essentially a follow-on from the ESS disk series, and re-uses much of the ESS microcode.
IBM introduced the XIV in 2008. The XIV is Open Systems only and sells alongside the DS8000 series which supports Open and Mainframe systems.

DS8880 Architecture

The DS8880 family has greater throughput and runs faster than previous models, to make use of the faster speeds of flash drives. It uses POWER8 processors connected by Gen 3 I/O controllers, which can run at 3.891 GHz or 3.535 GHz.

The POWER8 processor can parallelise workloads by running in SMT4 or SMT8 mode. As up to 8 instruction lanes can be allocated to a processor, if one lane is blocked waiting for an IO response from a database or suchlike, then the processor can continue working with instructions from one of the other lanes. This means that the processor is kept busy and can process a lot more work and so improve the IO throughput of the DS8880 subsystem. The different DS8880 models offer different CPC configurations, ranging from a 6-core processor with 64GB of memory to a 40 core processor configured with 2TB of memory.

Drive enclosures come in two types: high performance flash enclosures and standard drive enclosures. The high performance flash enclosures are connected over a PCIe Gen 3 fabric for improved IO performance and bandwidth. The standard drive enclosures use four-port 8Gb/s Fibre Channel adapters connected to 8Gb switched FC-AL, with point-to-point SAS connections to each drive. This means that there are 4 paths from the DS8880 processor to each drive.

DS8000 Models

The DS8000 series currently has three models available: the DS8888, the DS8886 and the DS8884. The DS8888 is an all-flash storage model. The DS8886 is a flash / spinning disk hybrid. Both models can hold up to 4.6 PB raw capacity and can be configured with varying numbers of CPU cores and memory, up to a maximum of 48 cores with 2TB of memory. The DS8884 is a single processor model with 6 cores and up to 256GB of memory.
The DS8886 base cabinet holds 128 disk drives; up to two expansion cabinets can be added, each holding 256 disk drives. The raw disks are supplied in blocks of sixteen, but are configured in groups of eight, with each group being called an array group. All the disks in an array group must be of identical size and rotation speed.

The DS8886 extent pools can be a mixture of SSD and spinning disk, so individual LUNs and mainframe CKD volumes can have some extents on SSD and some on Disk. If you already have discrete SSD and disk pools you can merge them together to create a mixed pool.

You can move LUNs or volumes manually and non-disruptively between storage tiers, but the Easy Tier product enhances this. It moves data at the 1 GB storage stripe level rather than as full volumes, and the movement is policy based, depending on how active or hot a 1GB storage stripe extent is.
Manual movement is called ELMR or Entire-LUN Manual Relocation, while the automated stripe based migration is called Easy stripe.

DS8000 Software

The DS software includes FlashCopy for internal subsystem point-in-time data copies, IBM Total Storage DS Manager for configuration, and Metro/Global Mirror for continuous inter-subsystem data replication.

The older ESS subsystems supported two kinds of z/OS FlashCopy, a basic version that just copied disks, and an advanced version that copied disks and files. The DS only supports the advanced FlashCopy.
FlashCopy versions include:
  • Multi-relationship, will support up to 12 targets
  • Incremental, can refresh an old FlashCopy to bring the data to a new point-in-time without needing to recopy unchanged data
  • Remote Mirror FlashCopy, permits dataset flash operations to a primary mirrored disk
  • Inband FlashCopy commands, permits the transmission of FlashCopy commands to a remote site through a Metro Mirror link
  • Consistency Groups, flash a group of volumes to a consistent point-in-time. A consistency group can span multiple disk subsystems.

Remote mirroring versions include:
  • Metro Mirror, synchronous remote mirroring up to 300km, was PPRC
  • Global Copy, asynchronous remote data copy intended for data migration or backup, was PPRC-XD
  • Global Mirror, asynchronous remote mirroring
  • Metro/Global Mirror, three site remote replication, two sites being synchronous and the third asynchronous
  • z/OS Global Mirror, z/OS host based asynchronous remote mirror, was called XRC
  • z/OS Metro/Global Mirror, three site remote replication, two sites being synchronous and quite close together, the third asynchronous and remote

Openness

The DS subsystem series is self-contained and does not interface with any other vendor's storage subsystem. For Open Systems data, IBM does support mirroring and copying to other vendors' subsystems if they are fronted with SVC virtualisation.

XIV

In early 2008 IBM bought XIV, a small storage company based in Tel Aviv. The XIV is a different type of box for IBM, and they sell it alongside their DS8000 range as an open systems solution.

XIV G3 Architecture

The XIV is based on a grid architecture of up to 15 interconnected but independent units called data modules. There is no common backplane; the modules are interconnected with InfiniBand switches. Each data module contains an Intel Xeon processor, cache and up to 12 storage disks. Interface modules are a special type of data module and contain the above, but can also connect to external hosts through Fibre Channel and iSCSI interfaces. They also manage external mirroring and data migration tasks. Note that as there is no FICON connectivity there is no z/OS support, which is unusual for a mainstream IBM storage unit.
As every module contains processors, all the modules share equally in processing the workload so a single module can be lost with little performance impact.

The other two types of component are the Ethernet switches and the UPS units. The redundant Ethernet switches connect the data and interface modules together so that every module can interface directly to every other module.

The XIV can be scaled out by adding new modules and scaled up by upgrading existing modules. When a new module is added, because it contains all of storage, cache and processing power, performance and bandwidth capability increase in proportion.
If a new interface module is added, Ethernet and Fibre Channel interfaces are added in proportion.

The XIV can hold a maximum of 180 physical volumes, which with 6 TB drives, gives a maximum raw capacity of 1080 TB. The system is designed to be able to cope with losing a whole module and three disks in other modules without losing data, so it reserves the equivalent capacity of 1*12 disk module plus 3 disks for this. It also reserves another 4% of the space for Metadata, then the available space is reduced by 50% for partition copies, so the maximum effective native capacity is 485 TB. If IBM Random Access Compression Engine technology is used, then the effective capacity can be up to 2,400 TB.
Solid State drives are a later addition, but these are not used in a conventional manner. Instead of being an extra tier of disks that requires tiering software for effective use, the SSDs sit in between the DRAM cache and the spinning disks as a second level of cache. They are primarily intended to improve random read hits.
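
As a rough check on the capacity figures quoted above, the arithmetic below works through the reservation steps described; the real reservation rules are more involved, so treat this as a back-of-envelope illustration.

    # Back-of-envelope check of the XIV usable-capacity figures quoted above.
    DRIVES, DRIVE_TB = 180, 6

    raw = DRIVES * DRIVE_TB                      # 1080 TB raw
    spare = (12 + 3) * DRIVE_TB                  # one 12-disk module plus 3 disks reserved
    after_spares = raw - spare                   # 990 TB
    after_metadata = after_spares * 0.96         # ~4% reserved for metadata
    usable = after_metadata / 2                  # halved again for the partition copies
    print(raw, round(usable))                    # 1080, ~475 TB -- close to the quoted 485 TB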

XIV Logical Volume layout

The logical volumes as presented to the hosts are made up of 1 MB data units called partitions. These partitions are striped over all the physical disks and are also duplicated, with each copy held on different modules. The partition copies are called primary copy or secondary copy.
The mapping of logical volume partitions to physical disks, and of primary to secondary partitions, is held in a distribution table which is built by the system at startup. The distribution table is obviously a very critical component, as the data would be inaccessible without it, so it is replicated over every module.
You have no control over where partitions are stored and in fact, you cannot interrogate the mapping from logical volume to partition to physical volume.

The XIV calculates its space in decimal GB (1 decimal GB = 1,000,000,000 bytes; a 'normal' GB = 1024*1024*1024 = 1,073,741,824 bytes). This makes volume allocation a challenge, as volume calculations normally use the higher value.
A logical volume is physically made up of 17 decimal GB chunks or 15.83 standard GB chunks, so it's best to define logical volume sizes as multiples of 17GB. You can define a maximum of 16,377 logical volumes including snapshots.
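
To make the decimal-versus-binary arithmetic concrete, here it is worked through as a short snippet; the 100GB volume at the end is just a hypothetical example.

    import math

    DECIMAL_GB = 1_000_000_000
    BINARY_GB = 1024 ** 3                            # 1,073,741,824 bytes

    chunk_bytes = 17 * DECIMAL_GB                    # one XIV allocation chunk
    print(chunk_bytes / BINARY_GB)                   # ~15.832, hence the 15.83 GB figure

    # A volume you want to see as 100 binary GB needs this many chunks:
    print(math.ceil(100 * BINARY_GB / chunk_bytes))  # 7 chunks, i.e. 119 decimal GB requested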

The data is mirrored and striped over all the disks, which can be considered a form of RAID10, but IBM say this is not really the case as the distribution follows different rules.
The 1 MB partitions are 'pseudo-randomly' spread over the disks in a way that ensures that the partition pairs never reside in the same module, the data for each volume is spread evenly over all disks, and each logically adjacent partition on a volume is distributed across a different disk.
If you add more volumes, the system creates a new goal distribution which re-balances the data to make sure it is still spread evenly over all the disks. So new physical disks are quickly used and contribute to overall system performance, with no action needed from you.
Logical volumes are 'thin provisioned', that is, the system only allocates physical space as it is required. The logical volume size is the one that is defined to the host, but the physical size is allocated in 17GB chunks as needed, until the physical size reaches the limit set by the logical size.
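
A toy model of the distribution table helps to show the key rule above: the two copies of a partition must never land in the same module. The placement logic below is a simple stand-in for illustration, not IBM's actual goal-distribution algorithm.

    # Toy sketch of the XIV distribution rule: each 1MB partition is stored
    # twice, and the two copies must sit in different modules.
    import random

    def build_distribution(num_partitions, modules, seed=42):
        rng = random.Random(seed)                    # deterministic 'pseudo-random' spread
        table = {}
        for part in range(num_partitions):
            primary = rng.choice(modules)
            secondary = rng.choice([m for m in modules if m != primary])
            table[part] = (primary, secondary)       # copies on different modules
        return table

    modules = list(range(15))                        # up to 15 data modules
    table = build_distribution(num_partitions=1000, modules=modules)
    assert all(p != s for p, s in table.values())    # no pair shares a module
    print(table[0], table[1])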

XIV Snapshots

Snapshots, or point-in-time copies of a volume, are fundamental to the XIV design. As the partitions that make up logical volumes are already tracked by pointers in the Distribution Table, it is very easy to create a snapshot by manipulating those pointers. Once a snapshot is created it is possible to update it, or even take another snapshot of it. Up to 16,000 snapshots can be created. Snapshots can be full refresh or differential, and it's possible to restore the original volume from a snapshot.
The XIV uses redirect-on-write to manage snapshots; that is, if data is updated, the new data is written out to a new partition. With a copy-on-write snapshot, the old data must be copied over to snapshot space before the new data can be written to disk. The proviso is that the update must apply to the whole 1MB partition; otherwise the non-updated data must also be copied to the new location.
Snapshots can be made to be consistent over several logical volumes by creating consistency groups. In this case I/O activity is suspended over all the volumes in the group until all the snapshots are created.
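
The difference between the two techniques is easy to see in a small sketch; the maps and storage list below are illustrative structures, not the XIV implementation.

    # Contrast of the two snapshot techniques described above: copy-on-write
    # copies the old partition aside before overwriting it in place, while
    # redirect-on-write writes the new data elsewhere and moves a pointer.

    def copy_on_write(volume_map, snap_map, part, new_data, storage):
        old_loc = volume_map[part]
        snap_map[part] = len(storage)
        storage.append(storage[old_loc])     # extra write: preserve the old data first
        storage[old_loc] = new_data          # then overwrite in place

    def redirect_on_write(volume_map, snap_map, part, new_data, storage):
        snap_map.setdefault(part, volume_map[part])  # snapshot keeps the old pointer
        storage.append(new_data)             # single write to a fresh partition
        volume_map[part] = len(storage) - 1  # volume pointer is redirected

    storage = ["old-A", "old-B"]
    vol, snap = {0: 0, 1: 1}, {}
    redirect_on_write(vol, snap, 0, "new-A", storage)   # moved the volume pointer
    copy_on_write(vol, snap, 1, "new-B", storage)       # moved the old data instead
    print(vol, snap, storage)
    # {0: 2, 1: 1} {0: 0, 1: 3} ['old-A', 'new-B', 'new-A', 'old-B']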

It is possible to partition the storage into independent groups of volumes called storage pools to simplify administration. You can set a maximum storage pool size for each pool, which could be useful for setting quotas on applications or user groups. A master volume and all of its associated snapshots are always a part of only one Storage Pool.

The XIV can be configured and managed with either a GUI interface or an XCLI interface. It is also possible to use the XIV as a host to other storage subsystems. This means you can migrate data from those subsystems in-band and non-disruptively.



HP

History

In terms of mainframe disk, HP has been a Hitachi reseller for some time, but while they buy VSP hardware from Hitachi, HP works in close collaboration with HDS and supplies its own software. I've always viewed HP as a major Intel player, but a supplier with limited presence in the mainframe market. HP certainly uses re-badged HDS subsystems for mainframe storage where they run a managed service.

HP XP7 Architecture

Because the HP XP7 is a re-badged Hitachi VSP, it has the same basic architecture.

HP mainframe software includes the following products:

  • VLVI Manager for Mainframe - used to reduce logical device contention and I/O queue times
  • Business Copy for Mainframe - used to provide local mirror copies of mainframe volumes
  • Continuous Access Synchronous for Mainframe - a PPRC equivalent synchronous remote copy
  • Continuous Access Journal for Mainframe - an XRC equivalent asynchronous remote copy
  • Logical Volume divider for Mainframe - works with Business Copy, renames datasets and creates a user catalog to make data accessible after a split operation.

For Open Systems solutions, HP software includes:

  • Storageworks Continuous Access which provides synchronous data mirroring between subsystems
  • Storageworks Business Copy which provides full volume copy within the subsystem. This looks similar to EMC TimeFinder rather than IBM FlashCopy
  • Storageworks Virtualization system, an internal and external virtualisation manager, can be used for data migration and replication
  • Storageworks LUN configuration and Security manager which is used to configure the XP12000, to define paths, array groups, volumes and LUNs
  • StorageWorks Performance Advisor which monitors performance within the XP subsystem

The HP XP7 has the same open architecture as the HDS VSP and supports the same range of OEM devices, plus it supports HP MSA devices.

HPE 3PAR StoreServ 20000 Storage

The StoreServ comes in 6 models: 2 are flash only and 4 are flash / disk hybrid systems. They include inline deduplication, which HP claims can cut capacity requirements by up to 75%. Raw capacity ranges from 3.9 to 12PB for the flash-only models and from 6 to 15PB for the hybrids.



NetApp

History

NetApp was founded in 1992 and started out producing NetApp filers. A filer, or NAS device, has a built-in operating system that owns a filesystem and presents data as files and directories over the network. Contrast this with the more traditional block storage approach used by IBM and EMC, where data is presented as blocks over a SAN, and the operating system on the server has to make sense of it and carve it up into filespaces.

NetApp filers are managed by NetApp's own operating system, Data ONTAP, which has progressively developed over the years, partly through a series of acquisitions. In June 2008 NetApp announced the Performance Acceleration Module (or PAM) to optimise the performance of workloads which carry out intensive random reads.
Data ONTAP 8.0, released at the end of 2010, introduced two major features: 64-bit support and the integration of the Spinnaker code to allow clustering of NetApp filers.
According to an IDC report in 2010, at that time NetApp was the third biggest company in the network storage industry, behind EMC and IBM.
NetApp released the EF550 Flash array device in 2013. This is an all-flash storage array, with obvious performance benefits. The current (2017) all-flash array, the AFF A700s 2-node cluster, will hold 3.3PB raw on MSW SSD drives, fronted by a 1TB cache.

Architecture

File system

Data ONTAP is an operating system, and it contains a file system called Write Anywhere File Layout (WAFL) which is proprietary to NetApp. When WAFL presents data as files, it can act as either NFS or CIFS, so it can present data to both UNIX and Windows, and share that data between them.

Snapshots

Snapshots are arguably the most useful feature of Data ONTAP. It is possible to take up to 255 snapshots of a given volume and up to 255,000 per controller. Snapshots are visible in a .snapshot directory on UNIX or a ~snapshot directory in Windows. They are normally read only, though it is possible to create writeable snapshots called FlexClones or virtual clones.

Snapshots are based at disk block level and use move-after-write techniques, based on inode pointers.

SnapMirror is an extension of Snapshot and is used to replicate snapshots between 2 filers. Cascading replication, that is, snapshots of snapshots, is also possible. Snapshots can be combined with SnapVault software to get full backup and recovery capability.

SyncMirror duplicates data at RAID group, aggregate or traditional volume level between two filers. This can be extended with a MetroCluster option to provide a geo-cluster or active/active cluster between two sites up to 100 km apart.

Snaplock provides WORM (Write Once Read Many) functionality for compliance purposes. Records are given a retention period, and then a volume cannot be deleted or altered until all those records have expired. A full 'Compliance' mode makes this rule absolute, and 'Enterprise' mode lets an administrator with root access override the restriction.

Models

The NetApp models are grouped into 3 series: All-Flash, Hybrid and Object stores. Detailed and up to date specifications can be found on the NetApp web site, but in general terms, the differences between the models are shown below. Each model uses in-line data reduction, which increases the effective capacity over raw by a factor of 5-10. Data updates use redirect-on-write techniques and all have Cloud connectivity for data archiving. Replication can be provided using MetroCluster (synchronous) or SnapMirror (asynchronous) and these can be combined into a three site configuration. The All-Flash and Hybrid models come in HA pairs and more pairs can be added to form a scale-out cluster. It is possible to combine All-Flash and Hybrid models in the same cluster.

Subsystem type   Model                    Max Capacity   Max Cache   Connectivity
All Flash        AFF A700 (12 HA pairs)   88PB           6TB         FC, iSCSI, NFS, pNFS, CIFS/SMB
All Flash        AFF A200 (1 HA pair)     2.2PB          5GB         FC, iSCSI, NFS, pNFS, CIFS/SMB
Hybrid           FAS9000                  172PB          1TB         12Gb SAS, 40GbE, 32Gb FC, 10GbE
Hybrid           FAS2600                  5.7PB          64GB        10GbE
Object           SG5660                   480TB          n/a         SMB/NFS
Object           SG5612                   96TB           n/a         SMB/NFS



Storage Subsystem Features table

The various suppliers of enterprise disks are contrasted below. For each factor, a short note explains why it might be important, followed by the facts for each vendor's device, which were correct at the time of writing, March 2017. However I'd advise you to check with your salesperson for up to date details.

The devices compared below are the IBM DS8886/8, the IBM XIV 2812-214, the EMC V-MAX 400K, the HDS VSP G1500 / F1500, the HP XP7 and the NetApp FAS9000.
Subsystem Architecture
Internal Comms Architecture. See the previous page for an explanation of the various types of comms architecture.
  • IBM DS8886/8: PCI-e gen3
  • IBM XIV: InfiniBand switch
  • EMC V-MAX 400K: Virtual Matrix
  • HDS VSP G1500/F1500: PCI-e
  • HP XP7: PCI-e
  • NetApp FAS9000: PCI-e
Internal Bandwidth. How fast can data move inside the box? The numbers quoted are marketing figures; you won't really see these numbers in practice. See the Architecture section for more information.
  • IBM DS8886/8: 192 Gb/s per server, which gives 384 Gb/s for 2 servers
  • IBM XIV: 480 Gb/s
  • EMC V-MAX 400K: 1,400 Gb/s with 8 engines
  • HDS VSP G1500/F1500: 384 Gb/s
  • HP XP7: 384 Gb/s
  • NetApp FAS9000: 40 GB/s switches
External Connectivity. How many external cables can you connect to the box, and how fast do they run? Numbers quoted are the maximum for each type, and if the maximum is installed then that may mean no other port types can be installed. The NetApp figures are for the 24 node NAS model.
  • IBM DS8886/8: 4 and 8-port 8 Gbps or 4-port 16 Gbps Fibre Channel/IBM FICON, to a max of 128 ports
  • IBM XIV: 24 * 8Gb/s FC and 22 * 10Gb/s iSCSI
  • EMC V-MAX 400K: 8/16 Gb/s FC, iSCSI, FCoE or FICON host ports, up to 32 per engine and 256 per array; 10 GbE SRDF ports, up to 16 per engine and 128 per array; 1 GbE SRDF ports, up to 32 per engine and 256 per array
  • HDS VSP G1500/F1500: Hybrid model; 192 FC, 176 FICON, 192 FCoE, 88 iSCSI. All-Flash model; 128 x 16Gb FICON or FCoE
  • HP XP7: 96 * 16 Gb/8 Gb Fibre Channel, 192 * 8 Gb Fibre Channel, 176 * 8 Gb FICON, 192 * 10 Gb FCoE, 88 * 10 Gb/sec iSCSI
  • NetApp FAS9000: 12Gb SAS, 40GbE, 32Gb FC, 10GbE
Protocol Support. What kind of cables you can plug into the box. A good box will support a mixture of protocols.
  • IBM DS8886/8: FICON, Fibre Channel
  • IBM XIV: Fibre Channel, iSCSI, FCoE
  • EMC V-MAX 400K: Fibre Channel, GbE, iSCSI, FCoE, FICON
  • HDS VSP G1500/F1500: NFS, SMB, FTP, iSCSI, HTTP to Cloud
  • HP XP7: FC, FICON, FCoE, iSCSI, HTTP to Cloud
  • NetApp FAS9000: FC, FCoE, iSCSI, NFS, pNFS, CIFS/SMB
Disk Connectivity. See the previous page for details of disk connectivity.
  • IBM DS8886/8: PCI-3 connection to an 8 Gbps FC-AL backbone
  • IBM XIV: SAS HBA, PCIe 2.0
  • EMC V-MAX 400K: PCIe Gen 3 to 6Gb/s 2 port SAS drives
  • HDS VSP G1500/F1500: 6Gb/sec SAS
  • HP XP7: 6Gb/sec SAS
  • NetApp FAS9000: 6Gb / 12Gb SAS
Storage Virtualisation Server. Can the storage subsystem act as a virtualisation engine in conjunction with a SAN? This enables lots of disparate storage to be controlled from one central point, including mirroring between different vendors' devices.
  • IBM DS8886/8: No
  • IBM XIV: No
  • EMC V-MAX 400K: No
  • HDS VSP G1500/F1500: Yes
  • HP XP7: Yes
  • NetApp FAS9000: No
Subsystem Capacities
Maximum, and maximum effective capacity. How much data can you cram into the box? The maximum configured capacity will be less than the rated capacity, partly due to RAID overhead, and partly due to 3390 emulation overhead. The maximum EFFECTIVE capacity for a mainframe workload running IO intensive TP systems can be as little as 33% of the maximum capacity, if you want adequate performance.
  • IBM DS8888: 4.6 PB of Flash only
  • IBM DS8886: 4.6PB with 6TB SAS disks plus 614TB Flash
  • IBM XIV: 485 TB usable with 6TB SAS drives, more with compression
  • EMC V-MAX 400K: usable capacity depends on RAID configuration, but is up to 4 PB
  • HDS VSP G1500/F1500: 40 PB usable with Flash drives
  • HP XP7: 40 PB usable with Flash drives, 247 PB external storage
  • NetApp FAS9000: 14.4PB per HA pair, max 172PB with 12 pairs
Cache size. In theory, the bigger the cache, the better the performance, as you will get a better read-hit ratio, and big writes should not flood the cache. If the cache is segmented, it is more resilient, and has more data paths through it.
  • IBM DS8886/8: 2 TB
  • IBM XIV: 720 GB, plus 12 TB flash cache
  • EMC V-MAX 400K: 16 TB with 8 engines
  • HDS VSP G1500/F1500: 2 TB
  • HP XP7: 2 TB
  • NetApp FAS9000: 1TB - 12TB with 12 HA pairs
Number of LUNs supported.
  • IBM DS8886/8: 65,336, LUN or CKD; 1TB CKD max. size, 16TB max. LUN size
  • IBM XIV: 4,000, volumes or snapshots
  • EMC V-MAX 400K: 64,000
  • HDS VSP G1500/F1500: 65,280, 256TB max LUN size
  • HP XP7: 65,280, 256TB max LUN size
  • NetApp FAS9000: 8,192
Disk types
Physical disk size. How big are the real, spinning disks and how fast do they run? The bigger the disks, the less you pay for a terabyte, but bigger disks might be performance bottlenecks. If you have really large disks, then there should be fewer of them on an FC-AL loop, and avoid RAID5 as rebuild times will be too long. Faster speeds mean less rotational delay.
  • IBM DS8886/8: 300/600/800GB; 1.2, 1.6, 1.8, 3.2, 4, 6TB disk
  • IBM XIV: 2 TB, 3 TB, 4 TB or 6 TB nearline SAS
  • EMC V-MAX 400K: 3.5" SAS drives; 10K RPM 300GB, 600GB, 1.2TB; 15K RPM 2TB; 7.2K RPM 4TB. 2.5" SAS drives; 10K RPM 300GB, 600GB, 1.2TB; 15K RPM 300GB
  • HDS VSP G1500/F1500: 300, 600, 900GB, 1.2, 1.8TB faster disks; 4, 6, 10TB slower disks
  • HP XP7: 300, 600, 900GB, 1.2, 1.8TB faster disks; 4, 6, 10TB slower disks
  • NetApp FAS9000: 4TB, 6TB, 8TB, 10TB at 7.2K RPM; 900GB, 1.2TB, 1.8TB at 10K RPM
Flash Disk support. How much flash capacity can be supplied?
  • IBM DS8886/8: 400/800/1,600GB flash drives; 400GB to 3.2TB high performance flash cards
  • IBM XIV: up to 12 TB SSD, but used as extra cache, not a storage tier
  • EMC V-MAX 400K: 3.5" SAS drives; 200GB, 400GB, 800GB, 1.6TB. 2.5" SAS drives; as above plus 960GB, 1.92TB
  • HDS VSP G1500/F1500: Flash only; 576 * 7TB or 14TB flash modules. Hybrid; 200, 400, 800, 1,900 GB flash drives
  • HP XP7: Flash only; 576 * 7TB or 14TB flash modules. Hybrid; 200, 400, 800, 1,900 GB flash drives
  • NetApp FAS9000: 960GB + 4TB, 960GB + 8TB, 960GB + 10TB
RAID levels supported. See the RAID section for details.
  • IBM DS8886/8: 5, 6, 10; RAID5 is not supported for drives bigger than 1TB
  • IBM XIV: RAID 10 equivalent
  • EMC V-MAX 400K: 1, 5 (3+1 or 7+1), 6 (6+2 or 14+2)
  • HDS VSP G1500/F1500: 1+0, 5, 6
  • HP XP7: 1, 5, 6
  • NetApp FAS9000: 4, 6
Availability features
Remote copy. Do you mirror data between two sites? If so you need this. The remote mirroring section has more details.
  • IBM DS8886/8: Global Mirror, asynchronous; Metro Mirror (PPRC), synchronous; 3 site MGM also supported
  • IBM XIV: XIV Remote Mirroring, synchronous or asynchronous
  • EMC V-MAX 400K: synchronous (SRDF/S) and asynchronous (SRDF/A) data replication between subsystems. SRDF/DM will migrate data between subsystems. SRDF/AR works with TimeFinder to create remote data replicas. SRDF products are all EMC to EMC. SRDF can emulate Metro Mirror and Global Mirror
  • HDS VSP G1500/F1500: Hitachi TrueCopy, PPRC compatible and synchronous; Hitachi Universal Replicator, asynchronous copy
  • HP XP7: Storageworks replication
  • NetApp FAS9000: MetroCluster (sync.), SnapMirror (async.); 3 site solution possible
Instant copy. 'Instant Copy' of volumes or datasets. Can be used for instant backups, or to create test data. Some implementations require a complete new disk, and so double the storage. Some implementations work on pointers, and just need a little more storage.
  • IBM DS8886/8: FlashCopy at volume and dataset level
  • IBM XIV: redirect-on-write snapshot, flexible options
  • EMC V-MAX 400K: TimeFinder at volume or dataset level. The BCV version requires a complete volume be supplied, the newer 'snap' version just uses pointers. EMC Compatible Flash (FlashCopy)
  • HDS VSP G1500/F1500: ShadowImage at volume level; copy-on-write snapshot
  • HP XP7: Storageworks copy software
  • NetApp FAS9000: SnapMirror
z/OS features
3380/90 emulation. 3380 drives are older legacy technology and most sites have now converted to 3390. 3390 comes in multiple sizes; a 3390-3 will hold 2.8 GB. The newest model is the 3390-M.
  • IBM DS8886/8: all models, including 1TB EAV volumes
  • IBM XIV: N/A
  • EMC V-MAX 400K: all models
  • HDS VSP G1500/F1500: all models, supports up to 65,536 logical devices
  • HP XP7: all models, supports up to 65,536 logical devices
  • NetApp FAS9000: N/A
GDPS support for automated site failover. See the GDPS pages for details.
  • IBM DS8886/8: Yes
  • IBM XIV: N/A
  • EMC V-MAX 400K: Yes, including HyperSwap
  • HDS VSP G1500/F1500: Yes
  • HP XP7: Yes
  • NetApp FAS9000: N/A
PAV and MA support. Parallel Access Volume and Multiple Allegiance. See the implementation tips section for details. Used to permit multi-tasking to logical devices.
  • IBM DS8886/8: Yes
  • IBM XIV: N/A
  • EMC V-MAX 400K: Yes, including HyperPAV support
  • HDS VSP G1500/F1500: Yes
  • HP XP7: Yes, including HyperPAV support
  • NetApp FAS9000: N/A

Price is usually very negotiable, but make sure that the vendor quotes for a complete solution with no hidden extras. Also, make sure that you get capped capacity upgrade prices, including increased software charges, as software is usually charged by capacity tiers.
