Thursday, February 21, 2013

Future Energy Efficient Enterprise Storage Servers

When I was finishing my last post, another thought came to mind. If 30% of a data centre's energy bill goes to servers and storage, and only half of that is consumed by storage, we still have a large expense head to target. The Intel Xeon and other x86-based processors that most enterprise servers use today are quite power hungry. Given that processors are the biggest power consumers in server hardware, should we not be looking at making them more energy efficient?
ARM today enjoys de-facto status as the processor design for energy-sensitive segments, particularly miniature and consumer devices such as smartphones. Intel's low-power answer has been the Atom line, but somehow the server industry stayed away from Atom. Traditionally, server workloads were considered too heavy for ARM-based processors, but that is slowly changing. Since ARM designs are inherently a lot more power efficient, there is growing interest in using clusters of ARM-core processors to replace the more powerful Intel variants. Baidu, the Chinese equivalent of Google search, announced that it has deployed ARM-based processors from Marvell in its storage servers; the particular part Baidu is using is reported to be Marvell's 1.6 GHz quad-core Armada processor, which Marvell launched in 2010. Meanwhile, AMD and semiconductor startups like Calxeda are trying to bring a 64-bit ARM part to the storage server market. In the last 3-4 years most storage vendors have moved to (mostly Intel-based) 64-bit processors for their enterprise servers, so for them to seriously consider an ARM-based processor they would need at least a 64-bit version. Taking cognizance of this need, ARM has announced two new core designs: last October it unveiled its 64-bit Cortex-A50 series aimed at servers. ZDNet reports that the design has already been licensed by AMD, Broadcom, Calxeda, HiSilicon, Samsung and STMicroelectronics, and AMD has announced that its first ARM-based server CPU is targeted for production in 2014.

[Embedded presentation: "AMD Bridges the X86 and ARM Ecosystems for the Data Center", from AMD]

At this point, it is not clear whether Intel's response will be another version of Atom or of Xeon. Storage vendors that adopted Xeon in their storage controllers would certainly like Intel to make Xeon more energy efficient. Either way, we can expect future data centres to be a lot more energy efficient than their present versions.

Thursday, February 7, 2013

Innovations towards more energy-efficient storage

[Pie chart of data centre energy consumption; source: ComEd]
Electrical power consumption is a large cost component in data centre expenses. As per some of the available reports, 30% of a data centre's energy bill is attributed to IT equipment. Last September, the New York Times published an article highlighting how large an energy bill these modern data centres run up. As per that article [read here], Google's data centers consume nearly 300 million watts and Facebook's about 60 million watts. "Worldwide, the digital warehouses use about 30 billion watts of electricity, roughly equivalent to the output of 30 nuclear power plants, according to estimates industry experts compiled for The Times," the NYT reports.
Since power consumption has become such a big issue for data centres, there are now many companies like ComEd whose business focuses entirely on solutions for reducing the energy use of data centers.
But given that 30% of that consumption is driven by servers and storage devices, the question arises: why are storage vendors not bringing out more energy-efficient storage? The fact that 'energy-efficient storage' has not been highlighted as one of the major trends for 2013 tells us that addressing the problem is not very simple. Let us see why.

Electrical power efficiency in disk-based systems

Storage systems at present are primarily disk-based. Even our backup systems are predominantly disk-based, with the possible exception of mainframe systems. With disk-based systems, storage software has historically been designed assuming that all attached disks are online, i.e. the disks are connected to the data bus of the storage controller, powered on, and always available to the higher layers. These systems traditionally cannot differentiate between a failed disk and a powered-off disk. The fact is that disk vendors initially did not provide any distinct device state in the disk controller [the electrical circuitry attached to the disk that controls reads and writes] to indicate that a disk was powered down. So the storage vendors, too, designed their file systems and RAID arrays with the assumption that disks are always powered on and active. Probably nobody imagined that hundreds of disks would one day be attached to a single storage controller. With the rising clamour for better power management, by 2008-2010 the disk vendors introduced power management schemes in SATA controllers. Since SATA is mostly used in near-line, backup and archive systems, which have large numbers of disks that are not in use all the time, powering down 'idle' disks where possible can bring considerable power savings. SATA provides two link power management states in addition to the "Active" state. These states, "Partial" and "Slumber", differ by specification only in the command sent on the bus to enter the low-power state and in the return latency: Partial has a maximum return latency of 10 microseconds, while Slumber has a maximum return latency of 10 milliseconds [further read]. Storage systems need to tweak their file system and/or RAID controller to take advantage of SATA power management. Handling microseconds of latency is easy enough, but handling milliseconds of latency requires major design changes in the software.
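To see how this surfaces to software on a commodity platform, Linux exposes the SATA link power management (ALPM) policy per host through sysfs. The sketch below simply reads or sets that policy; it assumes a Linux system (and root access for writes) and is not how a storage controller's RAID or file-system layer would manage disk states, but it shows the kind of knob the SATA link power states expose.

```python
#!/usr/bin/env python3
"""Minimal sketch: inspect and set the SATA link power management (ALPM)
policy that Linux exposes per SATA host. Assumes a Linux box with sysfs;
the set of accepted values can differ between kernel versions."""
import glob
import sys

POLICY_FILES = glob.glob("/sys/class/scsi_host/host*/link_power_management_policy")

def show_policies():
    # Print the current ALPM policy for every SATA host the kernel exposes.
    for path in POLICY_FILES:
        with open(path) as f:
            print(f"{path}: {f.read().strip()}")

def set_policy(policy):
    # Typical accepted values are "min_power", "medium_power" and
    # "max_performance"; writing requires root privileges.
    for path in POLICY_FILES:
        with open(path, "w") as f:
            f.write(policy)

if __name__ == "__main__":
    if len(sys.argv) > 1:
        set_policy(sys.argv[1])
    show_policies()
```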
EMC first brought this power management feature to its CLARiiON platform. The solution was to create a new RAID group and assign to it disks that have been powered down after 30 minutes of idle state. The controller recognizes these disk states and can wait up to 10 seconds for the disks to come back to the active state. EMC claims that this power-down feature saves around 54% on average [further read]. To my knowledge, other storage vendors are in the process of adopting this power-saving feature in their controllers. If they haven't done so already, it is probably because their disk-based systems would require fairly large design changes to accommodate these new disk states. I was personally involved in such an analysis for one prominent storage vendor, and was made adequately aware of how deep the changes go. My take, however, is that in the next 3-4 years most disk-based storage vendors will adopt SATA power management.
That obviously leaves out a large number of systems that use FC/SAS disks. Fortunately, SAS 2.1 brings in a new set of power management features which disk vendors are expected to adopt in the next few years, and since SAS is expected to replace FC disks going forward, we have a workable path for the future.

Tape-based systems as an alternative

Tape controllers, on the other hand, do not suffer from such issues. Tapes, in fact, are designed with specific attention to offline storage. One can back up the data to tapes, take the cartridges out of the online system, store them in a separate vault and insert them into the system only when needed. Inside the vault, the tape cartridges do not consume any electrical power. They do, however, need periodic data auditing, since tape reads fail more frequently than disk reads.
But with the new long-life, high-capacity LTO-5 and LTO-6 tapes, those problems are much reduced. Many are in fact bringing tape storage back into their systems, and EMC too is promoting tape for backup data. Although it sounds like a regressive step, one must accept that tape provides a considerable power-saving option, especially for backup and archival storage.

A little further into the future

To envisage the future of power-efficient storage, we need to look at the problem holistically. One can power down idle disks, but more power is consumed by active disks. Data centres also spend considerable money on cooling. The pie chart at the top shows that almost 33% of total energy goes to the cooling system, and that cost is going to rise with rising global temperatures.
A better solution would therefore be to design storage media that consume almost zero power when idle and much less power than existing hard disks even when active. Better still if the media can operate at room temperature, which would translate into a lower energy bill for cooling. Towards this, flash provides an excellent option. Flash storage [see previous post] consumes an order of magnitude less power for regular read/write operations and almost zero power when left idle. It also provides much higher random read/write throughput, making it ideal for high-performance storage. At present its relatively higher cost and limited write endurance are the hindrances to it replacing disks in mainstream storage. There is little doubt that further innovation will bring the cost/GB down drastically over time, and capacities will become comparable to SAS. The biggest dampener for SSD/flash so far has been its limit on the number of writes, and a very recent article in IEEE Spectrum indicates that we may already have a breakthrough: Macronix, a Taiwanese company, has reported the invention of a self-healing NAND flash memory that survives more than 100 million cycles.
In fact, they are yet to find the limit where it breaks, and they strongly believe it will survive a billion writes. Their method is essentially a local heat treatment on the chip to lengthen the life of the media. If the invention works out, we have an alternative storage solution that meets all our stated needs, namely: 1. consumes low power, 2. can operate at room temperature, 3. provides both high capacity [around 2 TB] and high throughput, and 4. consumes a fraction of the space of an HDD [the IEEE Spectrum article can be accessed here].
In a couple of years the technology is very likely to mature, with full-fledged induction of flash-only storage into mainstream storage systems. EMC's XtremIO, WhipTail, Violin Memory and other all-flash storage systems are likely to define tomorrow's mainstream storage system.

Tuesday, February 5, 2013

Building a SAAS infrastructure using opensource components

Let's assume that you are in the low-cost web server business and you want to build your entire setup using only open-source components. This means you are probably using an open-source LAMP stack for your web servers, and it also means you are almost certainly using MySQL for your backend database. As far as programming tools are concerned, you have plenty of choices. For the present discussion we will assume you are using PHP and JavaScript, since these are the tools the majority use today; changing tools should not be a big issue, and we should be able to add new ones as needed. So, in summary, we need an application server setup comprising a LAMP stack with a PHP server and a MySQL server. If you need a Windows configuration, you would simply replace the LAMP stack with a WAMP stack.
All right, now let's say you need many of these servers and you need to be able to provision an application fast, ideally in just a couple of hours. 'Application' in this case means a single-server application, not the Hadoop type. This means you would prefer a virtual server to a single dedicated machine. Since you do not know how many applications you will run, or how many users will subscribe to a single application, you want to design a setup that can scale out instead of scale up. What we mean is this: say your application handles 4,000 connections today. You can either design a server that scales up to a load of 100,000 connections [which is decidedly more complex], or you can design to share the load across multiple servers, which is relatively easier.
The advantage of the latter approach is that you can also scale down very easily, by decommissioning a few servers when your load goes down and re-provisioning them for different application(s).
In short, we need a cluster of VMs, with each VM running either a web server or MySQL.
Let's look at your hardware investment. To start with, you want a mid-size server machine. A good configuration would be a six-core x86 processor [Intel or AMD] with at least 8 GB of RAM, somewhat similar to a Dell PowerEdge. The hardware must support the Intel VT spec (or its AMD equivalent) for hardware-assisted virtualization. To cover for hardware failures, you may want two of these servers connected in hot-standby mode. For storage, you may want another pair of servers with a large set of SAS disks running GlusterFS. Needless to say, all these machines would be connected over a LAN and would interface with the external world through a firewall router.
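Before installing the hypervisor, it is worth verifying that the hardware really does expose its virtualization extensions to the OS. A minimal check on a Linux host, assuming the usual /proc/cpuinfo layout, is sketched below; the 'vmx' (Intel VT-x) or 'svm' (AMD-V) flag is what KVM looks for.

```python
#!/usr/bin/env python3
"""Quick sanity check, assuming a Linux host: confirm that the CPU advertises
hardware virtualization (Intel VT-x shows up as the 'vmx' flag, AMD-V as
'svm') before installing KVM."""

def has_hw_virt(cpuinfo_path="/proc/cpuinfo"):
    # Scan the CPU flags line for the virtualization-extension flags.
    with open(cpuinfo_path) as f:
        for line in f:
            if line.startswith("flags"):
                flags = line.split(":", 1)[1].split()
                return "vmx" in flags or "svm" in flags
    return False

if __name__ == "__main__":
    print("hardware virtualization available:", has_hw_virt())
```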
Now let's bring in virtualization. We will install the Linux KVM hypervisor on the PowerEdge-equivalent servers. Remember, the Linux KVM we are talking about here is free and does not come with the management tools bundled with the commercial offerings. Both Red Hat Enterprise Virtualization [RHEV] and the SUSE enterprise virtualization product are used in large enterprise setups, and one can choose either of them; both come with a license fee. If you would like to check which guest OS is supported on which KVM version, check this page.
Once KVM is installed, we can create around a hundred VMs on the server. Each VM can be pre-configured with a guest OS [we used Ubuntu] and a web-server template which provides the default configuration for the LAMP stack, PHP, IP address and domain name, so as to front-load our VM installation work. I know that RHEV provides a tool to create templates; with free KVM one may have to do it manually or write a tool, as sketched below. Having a template makes the provisioning job easier: all one needs to do is modify the configuration before commissioning the web server.
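For the 'write a tool' route, one option is the libvirt Python bindings available on most KVM hosts. The sketch below is only illustrative: the domain XML is a bare minimum, and the VM name, disk image path and sizes are placeholders for whatever your own LAMP template uses. A real tool would also rewrite the guest's hostname and IP before first boot.

```python
#!/usr/bin/env python3
"""Sketch of template-driven provisioning with the libvirt Python bindings
(free KVM, no RHEV tooling). The disk image path, memory size and template
fields are placeholders for your own web-server template."""
import libvirt

DOMAIN_TEMPLATE = """
<domain type='kvm'>
  <name>{name}</name>
  <memory unit='MiB'>{memory_mib}</memory>
  <vcpu>{vcpus}</vcpu>
  <os><type arch='x86_64'>hvm</type></os>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='{disk_image}'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <interface type='bridge'>
      <source bridge='br0'/>
    </interface>
  </devices>
</domain>
"""

def provision_web_vm(name, disk_image, memory_mib=2048, vcpus=2):
    # Connect to the local KVM hypervisor, register the domain built from
    # the template, and boot it.
    conn = libvirt.open("qemu:///system")
    xml = DOMAIN_TEMPLATE.format(name=name, memory_mib=memory_mib,
                                 vcpus=vcpus, disk_image=disk_image)
    dom = conn.defineXML(xml)   # persist the domain definition
    dom.create()                # start the VM
    conn.close()
    return name

if __name__ == "__main__":
    # Assumes a qcow2 image cloned from the preconfigured LAMP template.
    provision_web_vm("web-01", "/var/lib/libvirt/images/web-01.qcow2")
```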
For GlusterFS, many prefer Debian Linux as the host. Since the storage nodes are not part of the web servers, we can run a native Debian server and install GlusterFS on it; this page can be helpful in getting that setup ready (a minimal scripted sketch follows). Next we need to install a MySQL cluster [it provides replication support]; this How-to document could be useful.
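If you prefer to script the storage side as well, the GlusterFS command-line tools can be driven from Python. The sketch below, with assumed host names ('gluster1', 'gluster2') and brick paths, creates a two-way replicated volume on the pair of storage servers; adapt it to your own node names and layout.

```python
#!/usr/bin/env python3
"""Sketch: drive the GlusterFS CLI from Python to create a two-way replicated
volume across the two storage servers. Host names and brick paths are
placeholders; run this on one of the Gluster nodes with the CLI installed."""
import subprocess

NODES = ["gluster1", "gluster2"]   # assumed host names
BRICK = "/export/brick1"           # assumed brick directory on each node
VOLUME = "webdata"

def run(cmd):
    # Echo and execute a command, raising on failure.
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def create_replicated_volume():
    # Join the second node to the trusted pool, then create and start a
    # replica-2 volume spanning one brick per node.
    run(["gluster", "peer", "probe", NODES[1]])
    bricks = [f"{node}:{BRICK}" for node in NODES]
    run(["gluster", "volume", "create", VOLUME, "replica", "2"] + bricks)
    run(["gluster", "volume", "start", VOLUME])

if __name__ == "__main__":
    create_replicated_volume()
```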
Now the setup is ready, and we have almost the first version of our open-source-based cloud. You can commission web servers as needed. There is one glitch, though: you have no interface to monitor server load and health remotely. We picked the Nagios tool for that job, as it is relatively lightweight and easy to install; it is also used by some of the leading large cloud service providers. You may still have to develop a few tools to make application provisioning and monitoring fully autonomous, as per the specific needs of the setup.
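Nagios makes it fairly easy to add custom checks: a plugin is just a program that prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). Below is a minimal sketch of such a check for the web-server VMs, using illustrative load-average thresholds.

```python
#!/usr/bin/env python3
"""Sketch of a Nagios-style check for the web-server VMs: reports the
1-minute load average against warning/critical thresholds. The thresholds
are illustrative; Nagios only cares about the exit code and the one-line
status text."""
import os
import sys

WARN, CRIT = 4.0, 8.0   # assumed thresholds for a quad/six-core host

def main():
    try:
        load1, _, _ = os.getloadavg()
    except OSError:
        print("UNKNOWN - could not read load average")
        return 3
    if load1 >= CRIT:
        print(f"CRITICAL - load {load1:.2f}")
        return 2
    if load1 >= WARN:
        print(f"WARNING - load {load1:.2f}")
        return 1
    print(f"OK - load {load1:.2f}")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```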
There is also a nice article on building an Ubuntu Linux based cloud, and you may want to check that too.


Friday, February 1, 2013

Building your own cloud

Let me start with a disclaimer: I do not intend to write a 'How-to' document in this post. My primary motivation for writing this article is that I see too much overloaded and clouded use of the word 'cloud', which in my opinion is becoming a serious demotivator for honest, hype-shy people to take a real interest in it. From that point of view, my humble attempt here is to deglamourize the 'cloud' and lay it as bare as possible. Hopefully, along the way, we will understand what it takes to build one.
Conceptually, cloud refers to an arrangement where application server configuration and storage space can be changed dynamically, preferably through a software interface. Essentially this means that 1. the application server is decoupled from the physical server configuration, and 2. there is a software interface which allows the administrator to monitor loads and performance and to add or modify server configuration, most desirably remotely. The advantages are two-fold: the application server becomes a self-contained entity which can migrate from one piece of hardware to another as needed, and we get the freedom to think in terms of our application's needs alone. This simplifies application provisioning a great deal, especially if the user does not have expertise in data-centre technologies or does not want to incur large CapEx on a data centre.
Technologically, one crucial element has accelerated the shaping of the cloud, and that is virtualization. Not that the concept is new, but the way it has been made ubiquitous has brought new value. With more and faster processor cores and larger RAM becoming the norm for modern physical servers, it became evident some time back that running a single OS with a single server application means significant under-utilization of the server machine. For example, the Dell PowerEdge 310 is a relatively low-end server, yet in one configuration it provides a quad-core Intel® Xeon® 3400 series processor and 8 GB of RAM (expandable to 32 GB). Running a single application on it is a serious waste of all that processing power, unless the application is heavily used all the time. In the typical case, an application server's load is input-driven and takes up compute and networking bandwidth for only a fraction of the time the resources are up.
Instead, one can install VMware vSphere or Microsoft Hyper-V and have tens of VMs, each with its own application server, running on a single server machine. The great thing about these VMs (virtual machines) is that all the needed interfaces (serial, network) come bundled with them. One just has to install them (provisioning is almost effortless with all the commercial VM products) and they are ready for application provisioning. Best of all, after a one-time configuration of all the VMs, getting a new application up takes very little time. One can even keep VM templates for different types of servers [e.g. an Oracle server VM, an Exchange server VM or an Ubuntu-based Linux web server VM] and install the appropriate template once a VM is allocated.
Now, a server needs storage for its data, which also keeps growing. Adding a physical storage volume or a physical LUN [in SAN terms] to each VM is bound to under-utilize storage bandwidth. Instead, storage vendors provide virtual volumes/LUNs which can be provisioned on top of a physical volume or LUN [which is just a bundle of disks sharing the same RAID structure].
A VM and a vStorage unit (i.e. a virtual storage volume or virtual LUN) can thus be thought of as the units of provisioning in an IT setup. All one needs is a horizontal software layer that monitors and configures the VMs [with all their interfaces] and the vStorage units, and one has a basic working cloud. A user of this cloud can be allocated VMs with pre-installed applications and storage, plus a software interface with which the user can manage and monitor his resources. When he needs more compute or network bandwidth, he places a request with the cloud administrator, who assigns adequate resources to the user from the cloud's readily available VM pool. This generic model is what is known as IaaS, or Infrastructure as a Service. If the cloud needs to support a higher abstraction of service, it needs further sophistication in the horizontal software layer. For example, suppose the user needs to run an application that sifts through huge volumes of data distributed across many compute and storage nodes. That application needs the support of a more sophisticated interface which can scale resources up or down with the volume of data while providing a consistent data-manipulation engine across many nodes. Yes, we are talking about a Hadoop-like software layer. We cannot cover Hadoop here, but the point should be clear: the complexity of a cloud is driven by the sophistication of the applications it is going to host, but at the essential level a cloud remains a set of virtualized compute and storage resources governed by a single software layer.
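To make the 'unit of provisioning' idea concrete, here is a deliberately simplified toy model of that horizontal layer: a pool of pre-built VMs from which an administrator assigns resources to a user and reclaims them later. All names and sizes are invented, and a real IaaS layer would of course drive the hypervisor and storage APIs rather than in-memory objects.

```python
#!/usr/bin/env python3
"""Toy model of the 'horizontal software layer' described above: a pool of
pre-built VMs from which an administrator assigns resources to users.
Purely illustrative; names and sizes are made up."""
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class VM:
    name: str
    vcpus: int
    memory_gib: int
    owner: Optional[str] = None

@dataclass
class CloudPool:
    free_vms: List[VM] = field(default_factory=list)

    def allocate(self, user: str, vcpus: int, memory_gib: int) -> VM:
        # Hand the first free VM that satisfies the request to the user.
        for vm in self.free_vms:
            if vm.vcpus >= vcpus and vm.memory_gib >= memory_gib:
                self.free_vms.remove(vm)
                vm.owner = user
                return vm
        raise RuntimeError("no free VM matches the request")

    def release(self, vm: VM) -> None:
        # Decommission: return the VM to the pool for re-provisioning.
        vm.owner = None
        self.free_vms.append(vm)

if __name__ == "__main__":
    pool = CloudPool([VM("vm-01", 2, 4), VM("vm-02", 4, 8)])
    web = pool.allocate("example-user", vcpus=2, memory_gib=4)
    print(f"assigned {web.name} to {web.owner}")
    pool.release(web)
```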
As one can imagine, a basic cloud setup can be built entirely from open-source elements too. In the next post we will talk about a basic cloud setup that we built with Linux KVM and Python.