Thursday, February 7, 2013

Innovations towards more energy-efficient storage

source: ComEd
Electrical power consumption is a large component of data centre expenses. According to some available reports, 30% of a data centre's energy bill is attributed to IT equipment. Last September, the New York Times published an article highlighting how much energy modern data centres consume. According to that article [read here], Google’s data centers consume nearly 300 million watts and Facebook’s about 60 million watts. "Worldwide, the digital warehouses use about 30 billion watts of electricity, roughly equivalent to the output of 30 nuclear power plants, according to estimates industry experts compiled for The Times," the NYT reports.
Since power consumption has become such a big issue for data centres, there are many companies like ComEd whose business is focused entirely on solutions for reducing the energy use of data centers.
But given that 30% of that consumption is driven by servers and storage devices, the question arises as to why storage vendors are not bringing more energy-efficient storage to market. The fact that 'energy-efficient storage' has not been highlighted as one of the major trends for 2013 tells us that addressing the problem is not very simple. Let us see why.

Electrical power efficiency in disk-based systems

Storage systems at present are primarily disk-based. Even backup systems are predominantly disk-based, except possibly on mainframes. Historically, storage software for disk-based systems has been designed on the assumption that all attached disks are online, i.e. the disks are connected to the data bus of the storage controller, powered on, and always available to the higher-level software. These systems traditionally cannot differentiate between a failed disk and a powered-off disk. The fact is that disk vendors initially did not provide any distinct device state in the disk controller [the electrical circuitry attached to the disk that controls reads and writes] to indicate that a disk is powered down. So the storage vendors, too, designed their file systems and RAID arrays on the assumption that disks are always powered on and in the active state. Probably nobody imagined that hundreds of disks would one day be attached to a single storage controller.
With the rising clamour for better power management, by 2008-2010 the disk vendors had introduced a power management scheme in SATA controllers. Since SATA is mostly used in near-line, backup and archive systems, and these systems have a large number of disks that are not in use all the time, one can power down 'idle' disks where possible and achieve considerable power savings. SATA provides two link power management states in addition to the “Active” state. These states, “Partial” and “Slumber”, differ by specification only in the command sent on the bus to enter the low-power state and in the return latency. Partial has a maximum return latency of 10 microseconds, while Slumber has a maximum return latency of 10 milliseconds [further read]. Storage systems would need to modify their file system and/or RAID controller to take advantage of SATA power management. Handling microseconds of latency is relatively easy, but handling milliseconds of latency requires a major design change in the software.
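The difference between a failed disk and a disk in a low-power state can be sketched as a small state model. This is only an illustration, not any vendor's implementation: the names `DiskState`, `MAX_RETURN_LATENCY_S` and `classify_unresponsive_disk` are hypothetical, and the only figures used are the Partial and Slumber return latencies quoted above.

```python
from enum import Enum


class DiskState(Enum):
    ACTIVE = "active"
    PARTIAL = "partial"    # SATA link low-power state, fast return
    SLUMBER = "slumber"    # SATA link low-power state, slower return
    FAILED = "failed"


# Maximum return-to-active latency per state, in seconds.
# Partial and Slumber values are the ones quoted in the text;
# an active disk is assumed to have no wake-up allowance at all.
MAX_RETURN_LATENCY_S = {
    DiskState.ACTIVE: 0.0,
    DiskState.PARTIAL: 10e-6,
    DiskState.SLUMBER: 10e-3,
}


def classify_unresponsive_disk(state, waited_s):
    """Decide what to do with a disk that has not answered after waited_s seconds.

    Returns "wait" while the disk is still within the wake-up budget of its
    last known state, and "mark_failed" otherwise.
    """
    if state is DiskState.FAILED:
        return "mark_failed"
    budget_s = MAX_RETURN_LATENCY_S[state]
    return "wait" if waited_s <= budget_s else "mark_failed"


# A disk known to be in Slumber gets up to 10 ms before it is declared dead.
print(classify_unresponsive_disk(DiskState.SLUMBER, 0.005))
# The same 5 ms of silence from a disk in Partial is already too long.
print(classify_unresponsive_disk(DiskState.PARTIAL, 0.005))
```

A legacy controller effectively has only the ACTIVE and FAILED rows of this table, which is why any silence from a disk is treated as a failure.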
EMC first brought this power management feature to its CLARiiON platform. The solution was to create a new RAID group and assign powered-down disks to that group after 30 minutes of idle time. The controller can recognize these disk states and wait up to 10 seconds for the disks to return to the active state. EMC claims that this power-down feature saves around 54% on average [further read]. To my knowledge, other storage vendors are in the process of adopting this power-saving feature in their controllers. If they haven't done so already, it is probably because their disk-based systems would require fairly large design changes to accommodate these new disk states. I was personally involved in such an analysis for one prominent storage vendor, and was made well aware of how deep the changes would go. However, my take is that in the next 3-4 years most disk-based storage vendors will adopt SATA power management.
That obviously leaves out a large number of systems that use FC/SAS disks. Fortunately, SAS 2.1 brought in a new set of power management features, which disk vendors are expected to adopt in the next few years, and SAS is expected to replace FC disks going forward, so we have a workable solution for the future.
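The idle-timeout policy described above can be sketched roughly as follows. This is a simplified illustration and not EMC's implementation: it tracks a per-disk flag instead of moving disks between RAID groups, and the names `Disk`, `sweep` and `access`, as well as the `wake_time_s` parameter, are hypothetical. Only the 30-minute idle timeout and the 10-second wake limit come from the text.

```python
from dataclasses import dataclass

IDLE_TIMEOUT_S = 30 * 60   # power down after 30 minutes of idle time (from the text)
MAX_WAKE_WAIT_S = 10       # controller waits at most 10 seconds for wake-up (from the text)


@dataclass
class Disk:
    name: str
    last_io_s: float = 0.0
    powered_down: bool = False


def sweep(disks, now_s):
    """Power down every disk idle for at least IDLE_TIMEOUT_S; return their names."""
    parked = []
    for disk in disks:
        if not disk.powered_down and now_s - disk.last_io_s >= IDLE_TIMEOUT_S:
            disk.powered_down = True
            parked.append(disk.name)
    return parked


def access(disk, now_s, wake_time_s):
    """Serve one I/O request at time now_s.

    wake_time_s is the (hypothetical) time the disk needs to spin up.
    Returns True if the request is served, False if the disk misses the
    wake-up deadline and should be treated as failed.
    """
    if disk.powered_down:
        if wake_time_s > MAX_WAKE_WAIT_S:
            return False
        disk.powered_down = False
        now_s += wake_time_s
    disk.last_io_s = now_s
    return True


disks = [Disk("d0", last_io_s=0), Disk("d1", last_io_s=1000)]
print(sweep(disks, now_s=1800))                      # only d0 has been idle for 30 minutes
print(access(disks[0], now_s=2000, wake_time_s=6))   # wakes within the 10-second limit
```

The design change the vendors face is visible even in this toy version: every I/O path has to tolerate a multi-second wait and then decide whether the disk is slow or dead.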

Tape-based systems as an alternative

Tape controllers, on the other hand, do not suffer from such issues. Tapes are in fact designed specifically with offline storage in mind. One can back up data to tapes, take the cartridges out of the online system, store them in a separate locker, and insert them into the system when needed. Inside the vault, tape cartridges do not consume any electrical power. They do, however, need periodic data auditing, since tape reads fail more frequently than disk reads.
But with the new long-life, high-capacity LTO-5 and LTO-6 tapes, those problems are much reduced. Many are in fact bringing tape storage back into their systems. EMC is also promoting tape for backup data. Although it sounds like a regressive step, one must accept that tape does provide considerable power savings, especially for data backup and archival storage.

A little longer-term future

To envisage the future of power-efficient storage, we need to look at the problem holistically. One can power down idle disks, but more power is consumed by active disks. Data centres also spend considerable money on cooling. The pie chart at the top shows that almost 33% of total energy is spent on the cooling system, and that cost is going to rise with rising global temperatures.
A better solution would therefore be to design storage media that consume almost zero power when idle and also consume much less power in the active state than existing hard disks. It would be better still if these media could operate at room temperature, which would translate into a lower energy bill for cooling. Flash is an excellent option here. Flash storage [see previous post] consumes an order of magnitude less power for regular read/write operations and almost zero power when left idle. It also provides much higher random read/write throughput, making it ideal for high-performance storage. At present, its relatively high cost and limited write endurance are what prevent it from replacing disks in mainstream storage. There is little doubt that, with time, further innovations will bring down the cost per GB drastically. Storage capacity will also become comparable to SAS. The biggest dampener for SSD/flash so far has been the limit on its number of writes. A very recent article in IEEE Spectrum indicates that we already have a breakthrough. Macronix, a Taiwanese company, has reported the invention of a self-healing NAND flash memory that survives more than 100 million cycles. The fact is that they have yet to find the limit at which it breaks. They strongly believe that it will survive a billion writes. Their method is simply a local heat treatment on the chip to lengthen the life of the media. If that invention works, we have an alternative storage solution that meets all our stated needs, namely that it should 1. consume little power, 2. operate at room temperature, 3. provide both high capacity [around 2 TB] and high throughput, and 4. take up a fraction of the space of an HDD [the IEEE Spectrum article can be accessed here].
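To get a feel for what such an endurance figure would mean, here is a back-of-the-envelope estimate. It is a rough sketch under loudly stated assumptions: perfect wear levelling, a write amplification of 1, a hypothetical workload of 10 TB written per day, and a conventional endurance of 10,000 cycles taken as an illustrative baseline. Only the 2 TB capacity and the 100 million cycle figure come from the text; `lifetime_years` is a made-up helper name.

```python
def lifetime_years(capacity_tb, cycles, writes_tb_per_day):
    """Years until every cell has been written `cycles` times.

    Assumes perfect wear levelling and a write amplification of 1,
    so this is an upper bound rather than a prediction.
    """
    total_writable_tb = capacity_tb * cycles
    return total_writable_tb / writes_tb_per_day / 365


CAPACITY_TB = 2            # capacity figure mentioned in the text
WRITES_TB_PER_DAY = 10     # hypothetical, deliberately heavy workload

baseline = lifetime_years(CAPACITY_TB, 10_000, WRITES_TB_PER_DAY)        # assumed conventional endurance
self_healing = lifetime_years(CAPACITY_TB, 100_000_000, WRITES_TB_PER_DAY)  # figure reported by Macronix

print(f"baseline: {baseline:.1f} years")
print(f"self-healing: {self_healing:.0f} years")
```

Under these assumptions the baseline device would wear out in roughly five and a half years, while with 100 million cycles write endurance would no longer be the limiting factor for any realistic workload.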
In a couple of years the technology is very likely to mature, with full-fledged adoption of flash-only storage in mainstream storage systems. EMC's XtremIO, WhipTail, Violin Memory and other all-flash storage systems are likely to define tomorrow's mainstream storage.
