Looking at the world in general, the main disruptors for the medium-term future would seem to be an aging population (especially in the West), the Internet of Things (IoT), advanced robotics, the focus on privacy at both a personal and a regulatory level, and disaster recovery and business continuity. Our challenge is to investigate these disruptors and try to work out what they will mean for the future direction of storage.
The amount of stored data continues to grow at an alarming rate, at least in the Open Systems world. IoT should certainly fuel future growth, but savvy people are now a little worried about the size of the 'digital footprint' they are leaving behind them and the wealth of detail it contains about them, so this could slow data growth in some areas. Even so, storage futures continue to be driven by data growth, cost containment and regulatory compliance. How do you plan for this when there is a bewildering array of products and services to choose from, all of which promise to fix all your issues with a minimum of effort?
These future possibilities can be split up into three areas:
If you decide to purchase capacity yourself, you need to consider the trade-off between capacity, performance and cost. High-speed storage usually comes in smaller increments and is more expensive per terabyte. The trick is to make sure that the data that needs the best performance is on the high-speed storage, while older data sits on cheaper media.
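As a rough illustration of that trade-off, the sketch below greedily fills a small fast tier with the most frequently accessed datasets, spills everything else to a cheaper tier, and reports the resulting blended cost. The `Tier` and `Dataset` types, the tier sizes and the per-TB prices are all hypothetical figures chosen for the example, not vendor prices, and the cheaper tier's capacity is not enforced.

```python
from dataclasses import dataclass

@dataclass
class Dataset:
    name: str
    size_tb: float
    accesses_per_day: int

@dataclass
class Tier:
    name: str
    capacity_tb: float
    cost_per_tb: float  # hypothetical price, for illustration only

def place(datasets, fast, slow):
    """Fill the fast tier with the busiest datasets first; the rest go to the slow tier."""
    placement = {}
    used_fast = 0.0
    # Busiest data gets first claim on the limited fast capacity
    for ds in sorted(datasets, key=lambda d: d.accesses_per_day, reverse=True):
        if used_fast + ds.size_tb <= fast.capacity_tb:
            placement[ds.name] = fast.name
            used_fast += ds.size_tb
        else:
            placement[ds.name] = slow.name  # slow-tier capacity is not checked in this sketch
    return placement

def blended_cost(datasets, placement, tiers):
    """Total cost of the chosen placement, using each tier's price per TB."""
    by_name = {t.name: t for t in tiers}
    return sum(ds.size_tb * by_name[placement[ds.name]].cost_per_tb for ds in datasets)

fast = Tier("flash", capacity_tb=10, cost_per_tb=400)
slow = Tier("hdd", capacity_tb=100, cost_per_tb=50)
data = [Dataset("orders", 6, 5000), Dataset("logs-2017", 30, 2), Dataset("catalogue", 3, 800)]
placement = place(data, fast, slow)
print(placement)
print(blended_cost(data, placement, [fast, slow]))  # 5100
```

Putting all 39TB on the hypothetical flash tier would cost 15,600 at these prices, so the split is where the saving comes from.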
Flash storage continues to be the favourite for fast access and is rapidly replacing hard disk drives. Every vendor now supplies all-Flash arrays, and most will supply hybrid systems that mix disk and Flash, but very few storage systems now host only hard disk drives. With multi-terabyte Flash drives already in production, the future of the disk drive looks very uncertain. However, Flash storage itself appears to be approaching the limits of its development, and new technologies are planned to supplement it.
Intel's Optane uses phase change memory, or PRAM, which, at a very simplistic level, works by switching chalcogenide glass between two states that have different electrical resistance. Optane drives are available in 2018, but capacities are small, at 280GB or 480GB, so they have some way to go before they supplant Flash. Optane also has developing competition from Crossbar, which has designed a ReRAM device that it claims is 1,000x faster than NAND flash, uses 1/20th of the power and lasts 1,000x longer. Optane is already available in PCs as a cache between hard drive storage and main memory, and is getting good reviews. We can expect it to become more prevalent in 2019, and hopefully the cost will come down too.
There are plans to store data on DNA strands, at the molecular level. This promises very high capacity density, but a lot more development is needed before it could become a mainstream storage product. CATALOG Technologies is developing an implementation that uses standard DNA building blocks, or pre-made DNA molecules, which they say is a faster and cheaper way of building a data block than assembling the molecules individually as required. One to watch, but unlikely to surface until the 2020s.
One of the problems with producing faster technology is that the storage devices can process data faster than the existing comms channels that serve them, so the channels become a bottleneck. I'm certainly hearing this from Mainframe performance experts, who tell me that Flash drives have resolved the problems with disk performance, but the FICON channels are now the problem. There are two angles to consider here: the interfaces that exist within a server or device, and the external connections and channels.
PCI Express, or PCIe, provides the fast internal interface, and NVMe is an alternative to the old SCSI protocol for transferring data between hosts and peripheral storage. NVMe was designed specifically to support the high IO rates demanded by PCIe-connected Flash storage. There are physical limits to how fast you can send a signal down a wire. To quote Scotty, "Ye cannae change the laws of physics, Jim." What you can do is use more wires in parallel. The M.2 PCIe device interface allows a device to connect to 2 or 4 PCIe lanes, and as this scales up it should cope with the fastest Flash storage transfer rates, but this is a motherboard connection. It should resolve the comms issues inside the storage device, but it will not help with the external channels.
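The "more wires in parallel" point can be made concrete with some simple arithmetic. Usable bandwidth per lane is roughly the raw signalling rate multiplied by the line-encoding efficiency; PCIe 3.0, for example, signals at 8 GT/s per lane with 128b/130b encoding. The sketch below ignores packet and protocol overheads, so real-world throughput will be somewhat lower, and the function name is just an illustrative choice.

```python
# Raw signalling rate (gigatransfers/s per lane) and line-encoding efficiency
PCIE_GENERATIONS = {
    "2.0": (5.0, 8 / 10),      # 8b/10b encoding
    "3.0": (8.0, 128 / 130),   # 128b/130b encoding
    "4.0": (16.0, 128 / 130),  # 128b/130b encoding
}

def link_bandwidth_mb_s(generation: str, lanes: int) -> float:
    """Approximate one-direction bandwidth in MB/s, ignoring protocol overhead."""
    gt_per_s, efficiency = PCIE_GENERATIONS[generation]
    bits_per_s = gt_per_s * 1e9 * efficiency * lanes
    return bits_per_s / 8 / 1e6

# The M.2 lane counts mentioned above
for lanes in (2, 4):
    print(f"PCIe 3.0 x{lanes}: {link_bandwidth_mb_s('3.0', lanes):.0f} MB/s")
```

Doubling the lanes doubles the bandwidth without asking any single wire to go faster, which is exactly the point of the Scotty quote.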
The next stage is NVMe over Fabrics (NVMe-oF), which is intended to improve data transfer between host computers and target storage systems. At the moment, NVMe-oF runs over remote direct memory access (RDMA) transports, such as RDMA over Converged Ethernet (RoCE), and over Fibre Channel (FC-NVMe). In 2018, several all-flash array vendors, such as Kaminario, Pure Storage and Western Digital's Tegile, moved to NVMe-oF as a back-end fabric. The expectation is that NVMe will expand further in 2019, making inroads into storage systems, servers and SAN fabrics.
Magnetic disk is not completely dead; two technology improvements could extend its life a bit. Heat-assisted magnetic recording (HAMR) and microwave-assisted magnetic recording (MAMR) both make it easier to flip the polarity of the magnetic domains, using a laser (HAMR) or a microwave generator (MAMR) in the write head. This allows the data density on a disk to improve by a factor of two to four, reducing the cost per terabyte. These technologies are not due until 2018 or so.
Magnetic tape does seem to be falling behind, with the current LTO-8 cartridge holding just 12TB of raw capacity, though there is a roadmap to LTO-10 with a 48TB capacity. Magnetic tape is still the best product for cold archive data, as the slower access is offset by the lower price per TB. However, indications are that vendors are encouraging customers to abandon tape and use the Cloud instead.
The Cloud can be defined as a network of remote servers hosted on the Internet to store, manage and process data, rather than a local server or a personal computer. A lot of smaller businesses are now using the Cloud to reduce their storage costs, often through SaaS applications. Much of the pressure to move to the Cloud comes from CEOs or other senior management, rather than from the IT department. One reason for this pressure is that, as data moves to the Cloud, the cost moves from CapEx to OpEx, which accountants generally prefer.
Getting data in and out of the Cloud can take some time: seconds for small amounts of data, and hours for Big Data. This is beginning to become a problem, especially for the Internet of Things. Enter Edge computing and Fog computing. You can wait a second or two for a response from an Amazon Echo, but we want a device like a driverless car to respond instantly. For this to happen, you need processing power and storage on the device itself, or at the 'edge'.
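A quick calculation shows why the driverless car cannot wait on the Cloud: multiply the vehicle's speed by the response time to get the distance it covers before an answer arrives. The 100 km/h speed and the two response times below (200 ms for a round trip to a distant cloud region, 5 ms for on-board processing) are illustrative assumptions, not measurements.

```python
def distance_during_delay_m(speed_kmh: float, delay_ms: float) -> float:
    """Metres travelled while waiting delay_ms milliseconds at speed_kmh."""
    metres_per_second = speed_kmh * 1000 / 3600
    return metres_per_second * delay_ms / 1000

# Hypothetical response times, for illustration only
for label, delay_ms in (("cloud round trip", 200), ("on-board (edge)", 5)):
    d = distance_during_delay_m(100, delay_ms)
    print(f"{label}: {d:.2f} m travelled at 100 km/h before the answer arrives")
```

Even under these assumptions the car covers several metres while waiting on a remote service, against a few centimetres when the decision is made locally.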
If you consider the Internet to be a bit like a spider's web, then the Cloud would be the computing and storage 'spider' in the centre, and the web extends out to all the connected things. The idea behind Edge computing is that storage and processing are provided at the 'edge' of the web, to reduce the amount of raw data that needs to be passed over the web and to speed up processing for the things.
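One common way an edge node reduces the raw data it sends upstream is to summarise readings locally and forward only the summary. The sketch below assumes a hypothetical stream of `(timestamp_seconds, value)` sensor readings, groups them into fixed time windows, and produces one count/min/max/mean record per window; the `summarise` function and the 60-second window are illustrative choices.

```python
from collections import defaultdict
from statistics import mean

def summarise(readings, window_s=60):
    """Collapse (timestamp_s, value) readings into one summary record per time window."""
    windows = defaultdict(list)
    for ts, value in readings:
        # Integer division assigns each reading to the start of its window
        windows[int(ts // window_s) * window_s].append(value)
    return [
        {"window_start": start, "count": len(vals),
         "min": min(vals), "max": max(vals), "mean": mean(vals)}
        for start, vals in sorted(windows.items())
    ]

# Hypothetical example: one temperature reading per second for three minutes
raw = [(t, 20 + (t % 10) * 0.1) for t in range(180)]
summary = summarise(raw)
print(f"{len(raw)} raw readings reduced to {len(summary)} summaries")
```

Only the three summary records would travel to the Cloud; the per-second detail stays at the edge, where it could be kept for a short while or discarded.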
Fog computing provides the same functionality, but could be a little nearer the users than the Edge, or it could be the same as the Edge. As yet there is no agreed definition for it. Edge devices can range from small, low-cost cluster hardware in an SME to server farms with clustering and large-scale storage networks in a very large corporation.
Edge processing is expected to grow in 2019, which means that companies must provision and manage data storage at those edge locations. If your cloud, public or private, spans multiple cities or countries, this could be a challenge.
In 2019 you can expect to see more data movement between multiple cloud platforms, both on-premises and from public cloud vendors. Your biggest challenge might be getting your data back out of the Cloud if you need to, so if you use a Cloud provider, find out what options you have to extract your data.
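Before relying on being able to get your data back out, it is worth estimating how long a bulk extraction would take over the network. The sketch below divides the data volume by the link speed, scaled by an assumed efficiency factor for protocol overhead and contention; the 100TB volume, 1 Gbit/s link and 80% efficiency are hypothetical figures.

```python
def transfer_days(data_tb: float, link_gbit_s: float, efficiency: float = 0.8) -> float:
    """Estimated days to move data_tb terabytes over a link_gbit_s network link.

    efficiency is an assumed fraction of the nominal link speed actually achieved.
    """
    bits = data_tb * 1e12 * 8                          # decimal terabytes to bits
    seconds = bits / (link_gbit_s * 1e9 * efficiency)
    return seconds / 86400

# Hypothetical: 100TB pulled back over a 1 Gbit/s link at 80% efficiency
print(f"{transfer_days(100, 1):.1f} days")
```

At that scale the answer is measured in days rather than hours, which is why it pays to ask the provider about bulk export options up front.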
What is the future for backup and recovery services? An application's data can span several terabytes, and while it is possible to back this up from snapshots using traditional methods, it would take several hours to recover the application from tape. That is pretty much unacceptable for most of today's businesses. For me, the future seems to be snapshots. The EMC VMAX3 storage subsystem can snapshot a whole application with a single command, maintain up to 256 of those snapshots for each application, and restore the source volumes from any one of them. Of course, most of the time you don't want to restore the whole application, just a few files, so you mount the relevant snapshot on a different server and copy over the files that you want.
If you lose the whole subsystem, you lose the source data and all the snapshots, so to guard against that you need a second site with remote synchronous mirroring between the two. This is not the future, of course; you can do all this now. I think the future is that backup and recovery applications will start to recognise that they do not need to move data about to create backups, but can use snapshots and mirrors as backup datastores. The role of the software would then be to manage all that hardware and maintain the catalogs that record which backups are held in which snapshots, so the storage manager can easily see what backups are available and recover from them with simple commands.
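To make the catalog idea a little more concrete, here is a minimal sketch of what such software might track: each snapshot's application, identifier and creation time, a per-application limit on retained snapshots (defaulting to 256, matching the figure quoted above), and a lookup for the newest snapshot taken before a given point in time. `SnapshotCatalog` and its methods are hypothetical names; a real product would drive the array's own snapshot commands rather than just recording entries.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class SnapshotCatalog:
    """Hypothetical backup catalog that indexes array snapshots instead of copying data."""
    max_per_app: int = 256
    # application name -> list of (created, snapshot_id), kept in time order
    entries: dict = field(default_factory=dict)

    def record(self, app: str, snapshot_id: str, created: datetime) -> list:
        """Register a new snapshot and return the IDs of any that aged out."""
        snaps = self.entries.setdefault(app, [])
        snaps.append((created, snapshot_id))
        snaps.sort()
        expired = []
        while len(snaps) > self.max_per_app:
            expired.append(snaps.pop(0)[1])   # the oldest snapshot is dropped first
        return expired

    def latest_before(self, app: str, point_in_time: datetime):
        """Return the newest snapshot taken at or before point_in_time, or None."""
        candidates = [sid for created, sid in self.entries.get(app, []) if created <= point_in_time]
        return candidates[-1] if candidates else None

# Usage: a small retention limit so the ageing-out is visible
cat = SnapshotCatalog(max_per_app=3)
for day in range(1, 6):
    cat.record("payroll", f"snap-{day}", datetime(2018, 11, day, 2, 0))
print(cat.latest_before("payroll", datetime(2018, 11, 4, 12, 0)))   # snap-4
```

A restore request such as "payroll as it was on the morning of the 4th" then becomes a catalog lookup followed by mounting or restoring that one snapshot.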
We may see standard backups using the Cloud as a longer-term repository, with on-site snapshots retained for large application restores or for user-error restores from recent backups. As more important data moves to the Cloud, it too needs to be protected. Cloud-to-cloud backup, where data is copied from one cloud service to another, will be important in 2018. Backup vendors will need to add cloud-to-cloud capabilities to satisfy this requirement; specifically, they will need to add tools to back up and restore applications within the Cloud.
Ransomware attacks, such as WannaCry and Petya, have been big news in the last year or so. Without a clean backup, victim organisations have two choices: pay the ransom, or take a lot of downtime while fixing the problem. Backup and recovery vendors are now adding ransomware protection to their products, and this will continue in 2018. Ransomware is typically picked up from an infected email attachment or website. It then encrypts your data and demands money for the decryption key. One defence strategy is education: informing all employees of the risks and warning them not to open unsolicited attachments.
However, your backup and recovery product can also help in various ways: by detecting suspicious application behaviour before files are corrupted, with ransomware monitoring and detection tools, or by using predictive analytics to determine the probability that ransomware is operating on a server. Companies doing this now include Acronis, Druva, Unitrends and Quorum, and more will surely follow in 2018. And don't overlook tape: a cartridge that has been ejected from the library is offline, and WORM (write once, read many) cartridges cannot be overwritten, so ransomware cannot encrypt those backups.
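One heuristic often cited for this kind of detection is that encrypted output looks random, so a sudden burst of high-entropy file writes is suspicious. The sketch below computes Shannon entropy in bits per byte and flags a batch of recently written file samples if most of them exceed a threshold. The function names, the 7.5 bits/byte threshold and the 80% ratio are assumptions for illustration; compressed formats such as ZIP or JPEG also score high, so real products combine a signal like this with others.

```python
import math
import os
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0 for constant data, up to 8 for uniformly random data."""
    if not data:
        return 0.0
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def looks_like_mass_encryption(samples, threshold=7.5, ratio=0.8) -> bool:
    """Flag a batch of recently written file samples if most of them look near-random."""
    if not samples:
        return False
    high = sum(1 for s in samples if shannon_entropy(s) >= threshold)
    return high / len(samples) >= ratio

text_files = [b"quarterly report: revenue up, costs down. " * 100 for _ in range(10)]
random_files = [os.urandom(4096) for _ in range(10)]   # stand-in for encrypted output
print(looks_like_mass_encryption(text_files))    # False
print(looks_like_mass_encryption(random_files))  # True
```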
Metadata Intelligence, the process of using metadata to manage data, is being touted as an exciting new way to get on top of managing your data. Of course, Mainframes have been using metadata like this for 30 years or more; the point is that Windows is starting to catch up. Metadata lets you see when a file was last opened, and with this information you can keep current data on fast Flash storage and move older data off onto cheaper storage.
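The basic idea can be sketched in a few lines: read each file's last-access timestamp (`st_atime`) from the file system metadata and classify the file as hot or cold. The `classify_by_last_access` function and the 90-day cutoff are illustrative assumptions, and the sketch only reports candidates rather than moving anything. Bear in mind that access-time updates are often relaxed or disabled for performance (for example with the `relatime` or `noatime` mount options on Linux), so access times may be coarse.

```python
import os
import tempfile
import time

def classify_by_last_access(root: str, cold_after_days: int = 90):
    """Walk root and split files into hot and cold lists by last-access time."""
    cutoff = time.time() - cold_after_days * 86400
    hot, cold = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                atime = os.stat(path).st_atime   # last access time from file system metadata
            except OSError:
                continue                         # file vanished or is unreadable; skip it
            (cold if atime < cutoff else hot).append(path)
    return hot, cold

# Usage: two files, one of which is given an artificially old access time
with tempfile.TemporaryDirectory() as d:
    for name in ("current.db", "old-report.csv"):
        open(os.path.join(d, name), "w").close()
    old = time.time() - 200 * 86400
    os.utime(os.path.join(d, "old-report.csv"), (old, old))
    hot, cold = classify_by_last_access(d)
    print("keep on Flash:", [os.path.basename(p) for p in hot])
    print("move to cheaper storage:", [os.path.basename(p) for p in cold])
```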
The EU has recently introduced the General Data Protection Regulation (GDPR), which dictates how personal data must be stored and processed, and how it must be deleted when the 'right to be forgotten' applies. Metadata Intelligence will help manage this, as data can be automatically stored and deleted based on predetermined rules.
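A rule of that kind might look like the sketch below. Each record carries hypothetical metadata tags (a category, a creation date and an erasure-request flag), and a small rule table decides whether the record is due for deletion. The categories and retention periods are invented for the example and are not taken from the GDPR text. Real retention periods depend on the legal basis for holding each kind of data, and the sketch ignores the exemptions the regulation allows.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Record:
    record_id: str
    category: str
    created: date
    erasure_requested: bool = False

# Hypothetical retention rules: category -> maximum age
RETENTION = {
    "marketing": timedelta(days=365),
    "invoice": timedelta(days=7 * 365),
}

def due_for_deletion(record: Record, today: date) -> bool:
    """Honour erasure requests first, then enforce the maximum retention age."""
    if record.erasure_requested:
        return True
    limit = RETENTION.get(record.category)
    return limit is not None and today - record.created > limit

records = [
    Record("r1", "marketing", date(2017, 1, 10)),                         # older than a year
    Record("r2", "invoice", date(2017, 1, 10)),                           # within 7 years
    Record("r3", "marketing", date(2018, 6, 1), erasure_requested=True),  # erasure requested
]
deleted = [r.record_id for r in records if due_for_deletion(r, date(2018, 11, 1))]
print(deleted)
```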
The requirement to store data securely will mean that data copies must be geographically dispersed, especially for long-term archived data.
The various disks, disk arrays, switches and other parts of the storage estate generate lots of data describing the current health of each product. Predictive Storage Analytics is about continuously analysing all those data points to predict the future behaviour of the storage estate. The theory is that this can include pinpointing potential developing problems, such as defective cables, drives and network cards, and then alerting support staff with a precisely located problem and a recommended solution. One of my least favourite error messages goes something like 'An unidentified System Error has occurred'. I'm not sure how that would be pinpointed.
Predictive Storage Analytics would also be able to monitor storage pool, cache, CPU and channel utilisation, and forecast capacity requirements both now and in the future.
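The capacity-forecasting part can be illustrated with the simplest possible model: fit a straight line to recent daily pool utilisation figures by least squares and extrapolate to the day the pool would be full. Real predictive analytics products use far richer models. The `days_until_full` function, the 100TB pool and the sample figures are assumptions for illustration.

```python
def days_until_full(used_tb, capacity_tb):
    """Fit used = a + b*day by least squares; return days after the last sample until full."""
    n = len(used_tb)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(used_tb) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, used_tb))
    slope = sxy / sxx                      # growth in TB per day
    if slope <= 0:
        return None                        # not growing, so no projected fill date
    intercept = mean_y - slope * mean_x
    day_full = (capacity_tb - intercept) / slope
    return day_full - (n - 1)              # measured from the most recent sample

# Hypothetical pool: 100TB capacity, one utilisation sample per day
samples = [60.0, 60.4, 61.1, 61.5, 62.0, 62.6, 63.0]
print(f"Pool projected full in about {days_until_full(samples, 100.0):.0f} days")
```

A straight line is a deliberately naive choice; it is easy to explain to a capacity planner, but it will miss seasonal peaks or one-off migrations that a richer model would catch.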
Every manufacturer has its own idea of exactly what Hyperconverged Infrastructure (HCI) is, but a workable definition could be "HCI replaces the need to buy servers and storage arrays, then load them with a hypervisor and configure backups and monitoring software. Instead it supplies a total solution that includes pre-configured monitoring, backups, networking and storage configuration. Resources are dynamically allocated as needed, depending on the available hardware and the parameters defined by systems administrators." HCI could be Software Defined Storage, where the Hypervisor manages the storage, which could comprise a cluster of nodes, or it could be container-based.
HCI is a complete solution from a single vendor. You don't need to purchase storage yourself from a third party, nor maintain different hardware islands. This can be an off-site Cloud solution, or it could be built on-site. One obvious issue is vendor lock-in: when you need more resources, your choice is limited to that one supplier.
If your supplied solution includes a Global Namespace, then your applications can access file data regardless of its physical location. This will greatly simplify data migration, as it will be possible to move data between storage devices without any impact on applications.
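Conceptually, a global namespace is a level of indirection: applications open a logical path, and a mapping table resolves it to whichever physical system currently holds the data. The hypothetical sketch below shows why a migration need not touch the application, because only the mapping entry changes. The `GlobalNamespace` class, the system names and the paths are all invented for illustration, and logical prefixes are assumed to be written without a trailing slash.

```python
class GlobalNamespace:
    """Hypothetical global namespace: logical path prefixes mapped to physical locations."""

    def __init__(self):
        self._mounts = {}   # logical prefix (no trailing slash) -> physical prefix

    def mount(self, logical_prefix: str, physical_prefix: str) -> None:
        self._mounts[logical_prefix] = physical_prefix

    def resolve(self, logical_path: str) -> str:
        # The longest matching prefix wins, so nested mounts resolve to the most specific location
        for prefix in sorted(self._mounts, key=len, reverse=True):
            if logical_path == prefix or logical_path.startswith(prefix + "/"):
                return self._mounts[prefix] + logical_path[len(prefix):]
        raise FileNotFoundError(logical_path)

    def migrate(self, logical_prefix: str, new_physical_prefix: str) -> None:
        # Called once the data has been copied: only the mapping changes,
        # so applications keep using the same logical path
        if logical_prefix not in self._mounts:
            raise KeyError(logical_prefix)
        self._mounts[logical_prefix] = new_physical_prefix

ns = GlobalNamespace()
ns.mount("/finance", "nas01:/vol3/finance")
before = ns.resolve("/finance/ledger.db")
ns.migrate("/finance", "flash02:/pool1/finance")
after = ns.resolve("/finance/ledger.db")
print(before, "->", after)
```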