Looking at the world in general, the main disruptors for the medium-term future would seem to be: an aging population, especially in the West; the Internet of Things (IoT); advanced robotics; a focus on privacy, at both a personal and a regulatory level; and disaster recovery and business continuity. Our challenge is to investigate these disruptors and try to figure out what they will mean for us in terms of future directions of storage.
The amount of stored data continues to grow at an alarming rate, at least in the Open systems world. IoT should certainly fuel future growth, but savvy people are now a little worried about the size of the 'digital footprint' that they are leaving behind them, and the wealth of detail that it contains about them, so this could slow data growth in some areas. However, storage futures continue to be driven by data growth, cost containment and regulatory compliance. How do you plan for this when there is a bewildering array of products and services to choose from, all of which promise to fix all your issues with a minimum of effort?
These future possibilities can be split up into three areas:
If you decide to purchase capacity yourself, then you need to consider the trade-off between capacity, performance and cost. High speed storage usually comes in smaller increments and is more expensive. The trick is to make sure that the data that needs the best performance is on that high speed storage, while older data can be on cheaper media.
Flash Storage continues to be the favourite for fast access, and is rapidly replacing hard disk drives. Every vendor supplies all-Flash arrays now, and most will supply hybrid mixed disk and flash systems, but there are very few storage systems now that only host hard disk drives. With multi-terabyte Flash drives already in production, the disk drive future looks very uncertain. However, it looks like Flash storage itself is close to the limit of its development capabilities now, and new technologies are planned to supplement it.
Intel's Optane is based on phase-change memory, or PRAM, which at a very simplistic level works by changing the state of chalcogenide glass, as the two states have different electrical resistance. Optane drives are available in 2018, but sizes are small at 280 or 480GB. They have some way to go to supplant Flash. Optane has some developing competition from Crossbar, which has designed a ReRAM device which it claims is 1,000x faster than NAND flash, has 1/20th the power consumption, and lasts 1,000x longer. We need to see these products tested and integrated into storage subsystems before they can be considered mainstream, but they are an indication of things to come. There are also plans to store data on DNA strands, which would be storage at the molecular level. This promises very high capacity density, but a lot more development is needed if it is ever to become a mainstream storage product.
One of the problems with producing faster technology is that the storage devices can process data faster than the existing comms channels that serve them, so the comms channels become a bottleneck. I'm certainly hearing this from Mainframe performance experts, who tell me that Flash drives have resolved the problems with disk performance, but the FICON channels are now the problem. There are two angles to consider here: the interfaces that exist within a server or device, and the external connections and channels.
PCI Express, or PCIe, provides the fast interface internally, and NVMe is an alternative to the old SCSI protocol for transferring data between hosts and peripheral storage. NVMe was designed specifically to support the high IO rates demanded by PCIe connected Flash Storage. There are physical limits to how fast you can send a signal down a wire. To quote Scotty, "Ye cannae change the laws of physics, Jim". What you can do is use more wires in parallel. The M.2 PCIe device interface allows a device to connect to 2 or 4 PCIe lanes and as this scales up, it should cope with the fastest Flash Storage transfer rates, but this is a motherboard connection. It should resolve the comms issues inside the storage device, but will not help with the external channels.
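The 'more wires in parallel' point can be made concrete with a little arithmetic. The sketch below assumes PCIe 3.0 figures (8 GT/s per lane with 128b/130b line encoding), which are not stated in the text above, and it ignores protocol overhead, so treat the results as an upper bound rather than a measured rate. The function name is purely illustrative.

```python
# Assumed PCIe 3.0 figures; other PCIe generations use different rates.
GIGATRANSFERS_PER_SECOND = 8.0     # raw signalling rate per lane
ENCODING_EFFICIENCY = 128 / 130    # 128b/130b line encoding


def usable_gbytes_per_second(lanes: int) -> float:
    """Approximate one-direction bandwidth in GB/s for a link with `lanes` lanes."""
    gigabits = GIGATRANSFERS_PER_SECOND * ENCODING_EFFICIENCY * lanes
    return gigabits / 8  # convert bits to bytes


for lanes in (1, 2, 4):
    print(f"x{lanes}: {usable_gbytes_per_second(lanes):.2f} GB/s")
```

Under these assumptions a 4-lane link carries four times the data of a single lane, which is why adding lanes is the practical way to scale when the signalling rate per wire cannot be pushed much further.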
The next stage is NVMe over Fabrics (NVMe-oF), which is intended to improve the data transfer between host computers and target storage systems. At the moment, NVMe-oF just provides remote direct memory access (RDMA) over converged Ethernet (RoCE) and Fibre Channel (NVMe-FC). In 2017 the only major take-up of NVMe-oF was with appliances based on Microsoft's Storage Spaces Direct using NVMe over RoCE. In 2018, several all-flash array vendors such as Kaminario, Pure Storage and Western Digital's Tegile have moved to use NVMe-oF as a back-end fabric. The expectation is that more enterprise servers will be NVMe-enabled in 2018.
Magnetic Disk is not completely dead, as two technology improvements exist which could extend its life a bit. Heat assisted magnetic recording (HAMR) and Microwave assisted magnetic recording (MAMR) both use techniques to persuade the magnetic domains to change polarity faster, by using a laser (HAMR) or a microwave generator (MAMR) in the write head. This allows the data density on a disk to improve by a factor of two or four, so reducing the cost per terabyte. These technologies are not due until 2018 or so.
Magnetic tape does seem to be falling behind, with the current LTO-8 tape holding just 12TB raw storage capacity, though there is a roadmap to LTO-10 with a 48TB capacity. Magnetic tape is still the best product for cold archive data, as the slower access is offset by the cheaper price per TB. However, indications are that vendors are encouraging customers to abandon tape and use the Cloud instead.
The vendors like to pitch the Cloud as some kind of fuzzy new data storage, where your data is held almost by magic and is always available from anywhere you can get an internet connection. Of course we storage professionals know that data in the cloud is stored on SSD, HDD and magnetic tape, just the same as any other data! A lot of smaller businesses are now using the cloud to reduce their storage costs, often using SaaS applications. A lot of the pressure to move to the Cloud comes from CEOs or other senior management, rather than being driven by the IT department. One reason for this pressure is that as data moves to the Cloud, the cost moves from CapEx to OpEx, which is always preferred by accountants.
Large companies have traditionally kept their IT services in house as they get economies of scale. However, we are now seeing some very large companies outsourcing their IT facilities to the Cloud, even those with very stringent regulatory environments. It will be interesting to see how this works out, and if the benefits happen as promised then doubtless others will follow. Meanwhile most of the large enterprises tend to pursue a hybrid strategy and keep a significant amount of their storage capacity on site. Typically they keep mission-critical data in-house and use the cloud for lower-priority data like backups and long term archive. In 2018, more enterprises will use hybrid delivery models and store their workloads across multiple clouds. This model calls for flexible storage that will boost your efficiencies when you place data across public, private, and hybrid clouds. With multi-cloud storage, you can also reduce your risks of data loss or downtime if one of your services fails. For example, if your public cloud provider has an outage, your customers can still access data that you replicate across other clouds.
Your biggest challenge might be getting your data back out of the Cloud if you need to. If you use a Cloud provider, find out what options you have to extract your data.
What is the future for backup and recovery services? Applications can span several terabytes and while it is possible to back this up using traditional methods from snapshots, it would take several hours to recover an application from tape. That is pretty much unacceptable for most of today's businesses. For me, the future seems to be snapshots. The EMC DMX3 storage subsystem can snapshot a whole application with a single command, maintain up to 256 of those snapshots for each application, and restore the source volumes from any one of those snapshots. Of course, most of the time you don't want to restore the whole application, just a few files. So you mount the relevant snapshot on a different server and copy over the files that you want.
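To put a rough number on the tape restore time mentioned above, here is a back-of-envelope calculation. It assumes a hypothetical 5TB application and a single drive streaming at 360 MB/s, which is the LTO-8 native (uncompressed) rate. Mount, positioning and catalog lookup times are ignored, so a real restore would take longer.

```python
def restore_hours(size_tb: float, drive_mb_per_s: float, drives: int = 1) -> float:
    """Best-case hours to stream `size_tb` terabytes back from tape."""
    size_mb = size_tb * 1_000_000            # decimal units: 1 TB = 1,000,000 MB
    seconds = size_mb / (drive_mb_per_s * drives)
    return seconds / 3600


# Assumption: one drive running flat out at the LTO-8 native rate of 360 MB/s.
print(f"{restore_hours(5, 360):.1f} hours")
```

With these figures the best case works out at just under four hours, and that is before anyone has found and mounted the right cartridges.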
If you lose the whole subsystem you lose the source data and all the snapshots, so to fix that you need a second site with remote synchronous mirroring between the two. This is not the future of course; you can do all this now. I think the future is that backup and recovery applications will start to recognise that they do not need to move data about to create backups, but can use snapshots and mirrors as backup datastores. The role of the software would then be to manage all that hardware and maintain the necessary catalogs that refer to backups contained in all these snapshots, so the storage manager can easily work out what backups are available and also recover from them with simple commands.
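The catalog idea above can be sketched in a few lines. This is only an illustration of the data structure, not any vendor's product: the class names, fields and snapshot identifiers are all hypothetical, and nothing here talks to a real storage array.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Snapshot:
    application: str
    snapshot_id: str     # hypothetical identifier assigned by the array
    location: str        # e.g. "primary" or "mirror"
    taken_at: datetime


class SnapshotCatalog:
    """Records which snapshots exist so a restore point can be looked up."""

    def __init__(self) -> None:
        self._entries: List[Snapshot] = []

    def register(self, snap: Snapshot) -> None:
        self._entries.append(snap)

    def available(self, application: str) -> List[Snapshot]:
        """All recorded snapshots for an application, newest first."""
        matches = [s for s in self._entries if s.application == application]
        return sorted(matches, key=lambda s: s.taken_at, reverse=True)

    def latest_before(self, application: str, when: datetime) -> Optional[Snapshot]:
        """Newest snapshot taken at or before `when`, or None if there is none."""
        for snap in self.available(application):
            if snap.taken_at <= when:
                return snap
        return None


catalog = SnapshotCatalog()
catalog.register(Snapshot("payroll", "snap-001", "primary", datetime(2018, 3, 1, 2, 0)))
catalog.register(Snapshot("payroll", "snap-002", "mirror", datetime(2018, 3, 2, 2, 0)))
print(catalog.latest_before("payroll", datetime(2018, 3, 1, 12, 0)))
```

The point of the design is that the catalog only holds references. The data stays where the snapshot or mirror put it, and the software's job is to answer 'what can I restore, and from where?'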
We may see standard backups using the cloud as a longer term repository, with on-site snapshots retained for large application restore or user error type restores from recent backups. As more important data is moved to the cloud, it too needs to be protected. Cloud-to-cloud backup, where data is copied from one cloud service to another cloud, will be important in 2018. Backup vendors will need to add cloud-to-cloud capabilities to satisfy this requirement. Specifically, they will need to add tools to back up and restore applications within the cloud.
Ransomware attacks, such as WannaCry and Petya, have been big news in the last year or so. Victim organisations have two choices: pay the ransom or take a lot of downtime while fixing the problem. Backup and Recovery vendors are now adding ransomware protection to their products and this will continue in 2018. Ransomware is typically picked up from an infected email attachment or website. It then encrypts your data and demands money for the decryption key. One strategy is education, informing all employees of the risks and warning them not to open unsolicited attachments.
However, your backup and recovery product can help in various ways: by detecting suspicious application behavior before files are corrupted, with ransomware monitoring and detection tools, or by using predictive analytics to determine the probability that ransomware is operating on a server. Companies that are doing this now include Acronis, Druva, Unitrends and Quorum, and more will surely follow in 2018. Of course, don't overlook tape. A cartridge that has been ejected from the drive is offline, and WORM ('write once read many') cartridges cannot be overwritten, so ransomware cannot encrypt a backup held that way.
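As an illustration of the behaviour-based detection mentioned above, one commonly described heuristic is that encrypted data looks statistically random, so a sudden jump in the entropy of changed files is suspicious. The sketch below shows that single heuristic only. It is not the method used by any of the vendors named here, and the 7.5 bits-per-byte threshold is an assumption that would need tuning.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for empty input, at most 8.0."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


SUSPICIOUS_BITS_PER_BYTE = 7.5   # assumed threshold, not a published value


def looks_encrypted(data: bytes) -> bool:
    """Flag data whose byte distribution is close to random."""
    return shannon_entropy(data) >= SUSPICIOUS_BITS_PER_BYTE


plain = b"Quarterly storage capacity report. " * 200
random_like = os.urandom(len(plain))   # stands in for ransomware-encrypted content
print(looks_encrypted(plain), looks_encrypted(random_like))
```

Compressed files such as ZIP or JPEG also score close to 8 bits per byte, so on its own this test would raise false alarms. A practical tool would combine it with other signals, such as the rate at which files are being rewritten.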
Metadata Intelligence, the process of using metadata to manage data, is being touted as an exciting new way to get on top of managing your data. Of course, Mainframes have been using metadata like this for 30 years or more; the point is that Windows is starting to catch up. Metadata lets you see when a file was last opened and with this information, you can keep current data on fast flash storage and move older data off onto cheaper storage.
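The last-opened rule described above might look something like this. It is a minimal sketch: the tier names and the 30-day and 365-day thresholds are assumptions chosen for illustration, and it only reports where a file should live rather than moving anything.

```python
import os
import time
from typing import Optional

SECONDS_PER_DAY = 86_400


def choose_tier(age_days: float, hot_days: int = 30, warm_days: int = 365) -> str:
    """Map days since last access to a storage tier (thresholds are assumptions)."""
    if age_days <= hot_days:
        return "flash"
    if age_days <= warm_days:
        return "disk"
    return "archive"


def tier_for_file(path: str, now: Optional[float] = None) -> str:
    """Pick a tier for one file from its last-access timestamp."""
    now = time.time() if now is None else now
    last_access = os.stat(path).st_atime   # last-access time from the file's metadata
    return choose_tier((now - last_access) / SECONDS_PER_DAY)
```

One caveat: many file systems are mounted with options that update the access time lazily or not at all, so check that the timestamp can be trusted before driving data movement from it.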
The EU has recently introduced the General Data Protection Regulation (GDPR), which dictates how personal data must be stored and processed, and when it must be deleted because the 'right to be forgotten' applies. Metadata Intelligence will help manage this, as data can be automatically stored and deleted based on pre-determined rules.
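A rule-driven deletion check of the kind described above could be sketched as follows. The record fields, category labels and retention periods are all hypothetical, and the logic is deliberately simplified. A real policy would come from legal advice and would have to check for obligations to retain data before honouring an erasure request.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Record:
    record_id: str
    category: str                  # hypothetical labels such as "marketing"
    created: date
    erasure_requested: bool = False


# Assumed retention periods in days, for illustration only.
RETENTION_DAYS = {"marketing": 365, "invoice": 7 * 365}


def should_delete(record: Record, today: date) -> bool:
    """True when the simplified rules say the record should be deleted."""
    if record.erasure_requested:
        return True                # simplification: no legal-hold check
    limit = RETENTION_DAYS.get(record.category)
    if limit is None:
        return False               # unknown category: leave for manual review
    return today - record.created > timedelta(days=limit)


old_marketing = Record("r1", "marketing", date(2017, 1, 1))
print(should_delete(old_marketing, today=date(2018, 6, 1)))
```

The rules live in data (the `RETENTION_DAYS` table) rather than in code, which is what makes it practical to change them when the regulations or the business's interpretation of them change.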
The requirement to store data securely will mean that data copies must be geographically dispersed, especially for long term archived data.
The various disks, disk arrays, switches and other bits of the storage estate generate lots of data describing the current health of the product. Predictive Storage Analytics is about continuously analysing all those data points, to predict the future behaviour of the storage estate. The theory is that this can include pinpointing potential developing problems, such as defective cables, drives and network cards, then alerting support staff, with a precisely located problem and a recommended solution. One of my least favourite error messages goes something like 'An unidentified System Error has occurred'. I'm not sure how that would be pinpointed.
Predictive Storage Analytics would also be able to monitor storage pools, cache, CPU and channel utilisation and recommend capacity requirements now and in the future.
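The simplest form of the capacity forecasting described above is a straight-line trend fitted to past usage. The sketch below does that with ordinary least squares on hypothetical monthly samples. Real products presumably use richer models, and the function names and figures here are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept.
    Needs at least two samples with different x values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def months_until_full(used_tb, capacity_tb):
    """Months from the last sample until the trend line reaches capacity.
    Returns None if usage is flat or shrinking."""
    months = list(range(len(used_tb)))
    slope, intercept = fit_line(months, used_tb)
    if slope <= 0:
        return None
    full_at = (capacity_tb - intercept) / slope
    return max(0.0, full_at - months[-1])


# Hypothetical monthly usage samples for one storage pool, in TB.
samples = [40, 42, 44, 46, 48]
print(months_until_full(samples, capacity_tb=60))   # prints 6.0
```

Here the pool grows by 2TB a month, so a 60TB pool that currently holds 48TB has six months left on the trend. The same approach applies to cache, CPU or channel utilisation, with a utilisation ceiling in place of the capacity figure.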
Every manufacturer has their own idea of exactly what Hyperconverged Infrastructure (HCI) is, but a workable definition could be "HCI replaces the need to buy servers and storage arrays, then load them with a hypervisor and configure backups and monitoring software. Instead it supplies a total solution that includes pre-configured monitoring, backups, networking and storage configuration. Resources are dynamically allocated as needed, depending on the available hardware and the parameters defined by systems administrators." HCI could be Software Defined Storage, where the Hypervisor manages the storage, which could comprise a cluster of nodes, or it could be Container based.
HCI is a complete solution from a single vendor. You don't need to purchase storage yourself from a third party, nor maintain different hardware islands. This can be an off-site Cloud solution, or it could be built onsite. One obvious issue is vendor lock-in. When you need more resources, your choice is limited to that one supplier.
If your supplied solution includes a Global Namespace, then your applications can access file data regardless of physical location. This will greatly simplify data migration as it will be possible to move data between storage devices without any impact on applications.
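The reason a Global Namespace simplifies migration is that applications only ever see a logical path, while a separate mapping records where the data physically lives. A toy version of that mapping is sketched below. The class, paths and device names are hypothetical, and the actual copying of data is left out.

```python
class GlobalNamespace:
    """Toy mapping from logical paths to physical storage locations."""

    def __init__(self):
        self._locations = {}   # logical path -> physical location

    def publish(self, logical_path, physical_location):
        self._locations[logical_path] = physical_location

    def resolve(self, logical_path):
        """Return where the data currently lives; raises KeyError if unknown."""
        return self._locations[logical_path]

    def migrate(self, logical_path, new_location):
        # A real system would copy the data first; this only updates the mapping.
        if logical_path not in self._locations:
            raise KeyError(logical_path)
        self._locations[logical_path] = new_location


ns = GlobalNamespace()
ns.publish("/finance/ledger.db", "flash-array-1:/vol7/ledger.db")
ns.migrate("/finance/ledger.db", "disk-array-3:/vol2/ledger.db")
print(ns.resolve("/finance/ledger.db"))
```

The application keeps opening `/finance/ledger.db` before and after the move. Only the entry in the mapping changes.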