Lascon Storage

Lascon Storage was founded in 2000 and provides hints and tips on managing your data, along with strategic advice and news items. A lot of this advice is about smarter, cheaper ways of working without compromising service.
Use the links above to enter the different sections of the site, or just select from the highlights further down the screen.

Storage Vendor News

IBM Storage

Hu Yoshida's blog

Overcoming CPU Chokepoints For NVMe
Mon, 12 Aug 2019

Marc Staimer of Dragon Slayer Consulting recently published an article on the CPU chokepoint in servers and controllers for NVMe storage. This supports blogs that I have recently posted on CPU architectural limitations and the need for accelerated compute and other computer architectures.

Marc observes that as more NVMe flash SSDs are required, the supporting hardware gets increasingly complicated. “It usually means more CPUs, either internal or external ones. The storage can be DAS or shared across NVMe-oF. Either way, more CPUs, drives, drive drawers, switches, adapters, transceivers and cables will be required. The general industry consensus is that scaling capacity and performance using NVMe drives and NVMe-oF just requires more hardware. Storage Class Memory technologies will only exacerbate the CPU chokepoint problem, because their increased performance puts even more load pressure on the CPU.”

“But here's the rub. These systems offer quite noticeable diminishing marginal returns. The hardware grows much faster than the performance gains. This occurs no matter how many CPUs or NVMe flash SSDs are added. Eventually, more hardware means a negative return on overall performance.

“The root cause of this NVMe performance challenge isn't hardware. It's storage software that wasn't designed for CPU efficiency. Why bother with efficiency when CPU performance was doubling every 18 to 24 months? Features, such as deduplication, compression, snapshots, clones, replication, tiering and error detection and correction, were continually added to storage software. And many of these features were CPU intensive. When storage software is consuming CPU resources, they aren't available for storage I/O to the high-performance drives.”

While Hitachi Vantara has not yet delivered NVMe or NVMe-oF in the enterprise storage VSP platform, we have been making changes to the VSP storage controller in preparation for the introduction of NVMe and NVMe-oF when the standards become finalized. I blogged about this a year ago. Essentially, we have rewritten the SVOS (Storage Virtualization Operating System) VSP controller software for NVMe and released it as SVOS RF, where the RF stands for Resilient Flash. This system software was re-architected and designed to optimize and scale NVMe performance. The other thing we did in the hardware was to accelerate compute through the offload of some tasks to FPGAs. We also optimized the data path with improved cache algorithms.

These changes helped accelerate our performance even without NVMe or NVMe-oF. Our flash performance with SAS (Serial Attached SCSI) is comparable to that of some of the startups that are delivering NVMe storage systems.

The August 6, 2018 Gartner Critical Capabilities for Solid State Arrays report provides some insight into what our capabilities will be. In terms of performance rating, the VSP F series came in third, ahead of several vendors that had NVMe. This evaluation did not include the latest SVOS RF and VSP F900/F700/F370/F350 enhancements, which were announced in May 2018, because they did not make Gartner's cutoff date for the 2018 evaluation. These new enhancements featured an improved overall flash design, with 3x more IOPS, lower latency and 2.5x more capacity than previous VSP all-flash systems.

The only two products ahead of the F series in performance at that time were the Kaminario K2 and the Pure Storage FlashBlade, neither of which has the high reliability, scalability and enterprise data services of the VSP. In fact, the VSP F series placed the highest in RAS (reliability, availability, serviceability) of all 18 products that were evaluated. The Kaminario K2 has a proprietary NVMe-oF host connection which they call NVMeF. One can assume that the performance of the Hitachi Vantara all-flash arrays, even with SCSI/SAS, would be higher if the new models of the VSP and SVOS RF had been included in the evaluation. Here are the product scores for the High-Performance Use Case for the top three places, on a scale from 1 to 5 with 5 being the highest.

Kaminario K2                   4.13
Pure Storage FlashBlade        4.08
Hitachi VSP F Series           4.03
Pure Storage M and X Series    4.03

While the standards for NVMe-oF are still being worked on and have yet to be proven, the NVMe standards are close to being finalized, so you can expect to see NVMe from Hitachi in the near future, and you should expect to see it blow away the competition, since we have already done the groundwork to address the chokepoints that Marc Staimer identifies.


TCP Is A Network Protocol for NVMe
Thu, 08 Aug 2019

Last year in August, I posted a blog about NVMe, an open standards protocol for digital communications between servers and non-volatile memory storage. It replaces the SCSI protocol, which was designed and implemented for mechanical hard drives that processed one command at a time. NVMe was designed for flash and other non-volatile storage devices that may be in our future. The command set is leaner, and it supports a nearly unlimited queue depth that takes advantage of the parallel nature of flash drives (a maximum queue depth of 64K for up to 64K separate queues).
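
To make that queue model concrete, here is a toy Python sketch, illustrative only and not real driver code, contrasting SCSI's single command stream with NVMe's many deep queue pairs. The 64K figures are the nominal NVMe maximums; the eight queue pairs and the per-queue depth of 1,024 are example values I have chosen, not anything from the specification.

    # Toy illustration (not real driver code): NVMe allows up to 64K I/O
    # queues, each up to 64K entries deep; a host typically creates one
    # queue pair per CPU core so cores never contend on a single queue.
    from collections import deque

    NVME_MAX_QUEUES = 64 * 1024       # nominal spec maximum
    NVME_MAX_QUEUE_DEPTH = 64 * 1024  # nominal spec maximum

    class QueuePair:
        """One submission/completion queue pair, usually pinned to a core."""
        def __init__(self, depth):
            self.submission = deque(maxlen=depth)
            self.completion = deque(maxlen=depth)

        def submit(self, command):
            if len(self.submission) < self.submission.maxlen:
                self.submission.append(command)
                return True
            return False  # queue full; the caller must back off

    # Example: one queue pair per core on an 8-core host, depth 1,024 each.
    host_queues = [QueuePair(depth=1024) for _ in range(8)]
    for core_id, qp in enumerate(host_queues):
        qp.submit({"opcode": "read", "lba": core_id * 8, "blocks": 8})

    print(f"{len(host_queues)} queue pairs, depth 1024 each")

A SCSI HBA behaves roughly like a single QueuePair shared by every core, which is why its effective parallelism is so much lower.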

There are several transports for the NVMe protocol. NVMe by itself can use PCIe (Peripheral Component Interconnect Express), which is a standard type of connection for internal devices in a computer, to transport signals over a PCIe bus from a non-volatile memory storage device (SSD). Hitachi Vantara has implemented NVMe on our hyperconverged Unified Compute Platform (UCP HC), where internal NVMe flash drives are connected directly to the servers through PCIe. While direct-attached architectures offer high performance and are easy to deploy at a small scale, data services like snapshots and replication have to be done by the host CPU, which adds overhead. If a VM has to access another node to find data, you will need to transfer the data or the application to the same node. For smaller data sets this isn't an issue, but as the workload increases, this negates some of the performance advantages of NVMe. However, you are still ahead of the game compared to SCSI devices, and UCP HC with NVMe is a great option for hyperconverged infrastructure workloads.
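
As a practical aside, on a Linux host with the nvme-cli package installed you can check which locally attached NVMe namespaces are visible before placing a workload on them. The short Python sketch below simply wraps the standard nvme list command; the JSON field names used here are assumptions that may vary between nvme-cli versions.

    # Minimal sketch: list locally attached NVMe namespaces via nvme-cli.
    # Assumes a Linux host with nvme-cli installed and sufficient privileges.
    import json
    import subprocess

    def list_local_nvme():
        out = subprocess.run(
            ["nvme", "list", "-o", "json"],
            capture_output=True, text=True, check=True,
        )
        devices = json.loads(out.stdout).get("Devices", [])
        for dev in devices:
            # Field names follow nvme-cli's JSON output and may differ by version.
            print(dev.get("DevicePath"), dev.get("ModelNumber"), dev.get("PhysicalSize"))

    if __name__ == "__main__":
        list_local_nvme()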

In my post from last year, I introduced the other transports that enable NVMe to be transported over a fabric for external attachment (NVMe-oF). These transports included NVMe-oF using Fibre Channel and NVMe-oF using RDMA over InfiniBand, RoCE, or iWARP.

Late last year another transport was ratified: NVMe-oF using TCP. The value proposition for TCP is that it is well understood and can use existing TCP/IP routers and switches. One of the disadvantages of TCP/IP is congestion. Unlike FC, where buffer credits are used to ensure that the target can receive a packet before the packet is sent, the IP layer drops packets when the network gets congested, and it is up to TCP to ensure that no data is lost, which causes the transport to slow down when the network gets overloaded. While TCP overreacts to congestion, it doesn't fail; it just slows down. NVMe over TCP is still substantially ahead of SCSI in terms of latency, while still behind NVMe over FC and RDMA.
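
For readers curious what NVMe over TCP looks like from the host side, here is a hedged sketch of the discover-and-connect sequence using nvme-cli on Linux. The target address and NQN are placeholders rather than a real array; 4420 is the conventional NVMe/TCP port.

    # Sketch only: attach a Linux host to an NVMe-oF target over TCP with
    # nvme-cli. The address and NQN below are placeholders, not a real target.
    import subprocess

    TARGET_ADDR = "192.0.2.10"                      # placeholder (TEST-NET address)
    TARGET_NQN = "nqn.2019-08.com.example:array1"   # placeholder subsystem NQN

    # Ask the target which NVMe subsystems it exposes over TCP.
    subprocess.run(["nvme", "discover", "-t", "tcp",
                    "-a", TARGET_ADDR, "-s", "4420"], check=True)

    # Connect to one subsystem; its namespaces then appear on the host
    # as /dev/nvmeXnY block devices.
    subprocess.run(["nvme", "connect", "-t", "tcp",
                    "-a", TARGET_ADDR, "-s", "4420",
                    "-n", TARGET_NQN], check=True)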

RDMA provides direct memory access and will be the choice for high performance, but there will be decisions to be made on the choice of networking protocol: InfiniBand, RoCE (RDMA over Converged Ethernet) or iWARP (Internet Wide Area RDMA Protocol).

While there is still some standardization to be done on NVMe-oF for Fibre Channel, this will probably be the network protocol to be accepted first, since it is more mature than NVMe-oF over TCP, provides flow control through buffer credits, and is a familiar network protocol for storage users. Like TCP, Fibre Channel can use existing routers and switches, with relatively minor changes in software. There will likely be different network protocols depending on use case: direct-attached NVMe over PCIe for hyperconverged and software-defined storage, Fibre Channel for enterprise storage, TCP for distributed storage and RDMA for high-performance storage. SCSI will still be the dominant interface for the next few years. However, NVMe and NVMe-oF will eventually replace traditional SCSI-based storage. I would expect the first implementations to be 50% Fibre Channel, 30% TCP, 12% PCIe, and 8% RDMA.

This could change dramatically depending on what the hyperscale vendors do. This week Amazon acquired E8 Storage, an Israeli company that has an end-to-end 2U NVMe storage system that uses NVMe-oF over TCP. TCP is a logical choice for a cloud company.

