Monday, January 28, 2013

Storage Trends in 2013

As the year begins, it is a popular game to try to predict what is coming in the year ahead. As far as the storage industry goes, however, whichever way one looks, it does not appear that the game can be very interesting this time. One good measure of interesting-ness is (1) how predictable the market is, and another is (2) how likely a new technology is to be disruptive. As far as the market is concerned, results from recent years tell us it has been quite stable, with EMC leading the pack on almost all fronts of storage systems with more than a third of the market, while IBM, NetApp, HDS and HP compete closely with each other for second position in the rank. Gartner's 3rd Quarter 2012 chart [source], below, shows the relative positions of the storage system vendors.
Growth projections for the individual players also do not throw up any possibility of surprise, with EMC continuing to lead by a large margin for the foreseeable future. IDC, for example, in its press release last November forecast 2013 to be a slow-growth year: "While both EMC and NetApp continue to gain market share, which should enable both vendors to outpace the overall market growth rate, we are modestly concerned with the current estimates for EMC (+9.5% in 2013) and NTAP (+7.6%) vs. the (forecast) industry growth of 4%," the report says. Gartner sounded similar in its 2012-end forecast for the market. To sum up, we do not expect much reordering of ranks this year either.
As far as technology trends are concerned, we have the published views of 3PAR (HP) CEO David Scott and NetApp CTO Jay Kidd.
While both focus on their respective solution portfolios and positioning, Mr. Kidd paints the picture with a broader brush. He sees dominant market play for virtualization, clustered storage, flash storage and cloud access in 2013. EMC, in addition, talks very strongly about tapes. Tapes?? Some would sneer that it looks like a regressive step. But frankly, if enterprises are buying tape storage, there must be strong reasons for it. With larger archives, the need to drive down power consumption and improve rack density for archive storage has become stronger, and EMC is offering a solution where tape adequately addresses that need, especially where EMC gear constitutes most of the data centre equipment. But we are digressing.
Coming back to where we started: based on what we can distill from all the chatter, there are three distinct technology patterns:
1. deeper penetration of solid-state storage as a differentiator in tiered storage systems,
2. stronger play of virtualization in storage deployment to enable more data mobility, and
3. growth of object storage.
Let's take them individually.

Flash-based Storage / Solid-state Storage

Samsung SSD vs SATA HDD: source CNET
Solid-state storage devices [based on NOR or NAND flash] gained popularity in the last decade, especially in consumer devices. With passing years, SSD vendors have delivered better reliability, higher memory density and longer device lifetimes, so much so that enterprises now see SSDs as a strong alternative to high-speed HDDs. Compared to HDDs, SSDs offer almost 3 times lower power consumption and an order of magnitude faster access [there is no seek time for SSDs], making them a better fit for high-transaction-throughput servers. SAP's HANA, for example, runs entirely in memory to provide faster throughput; SSDs become the cheaper alternative in such scenarios. However, the big storage players have so far shown a lukewarm response, owing to the high cost of SSDs compared to HDD-based systems. Most of the large storage players brought in flash as a fast cache or accelerator in otherwise disk-based storage controllers to improve read/write throughput [some use it to store metadata for active storage volumes], but complete SSD arrays have not yet gone mainstream. Startups like Whiptail and Violin Memory are betting on full flash-based storage arrays, and not surprisingly they are making quite a few positive news splashes too. Many believe that 2013 will herald the era of SSD-based storage arrays in mainstream enterprise storage. Here is a recent story where the Indian Railways ticketing system [IRCTC] is looking at SSDs to boost performance of its online real-time ticketing. In a tiered storage structure, it looks like flash-based storage or SSDs will take a dominant role in Tier 1, the performance tier, this year. [For more on the tiered storage concept, see my previous post.]
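To put the access-time gap in perspective, here is a rough back-of-the-envelope comparison in Python. The latency figures are assumed ballpark values for illustration, not measurements of any particular device.

    # Why "no seek time" matters: random-access throughput in round numbers.
    # Both latency figures below are assumed ballpark values, not measurements.
    hdd_latency = 0.010    # ~10 ms average seek + rotational delay for an HDD
    ssd_latency = 0.0001   # ~100 microseconds for a NAND flash read

    hdd_iops = 1.0 / hdd_latency   # ~100 random reads per second
    ssd_iops = 1.0 / ssd_latency   # ~10,000 random reads per second

    print("HDD: ~%d IOPS, SSD: ~%d IOPS (%dx faster)" %
          (hdd_iops, ssd_iops, ssd_iops / hdd_iops))

Sequential workloads narrow this gap considerably, which is why the advantage shows up most in high-transaction, random-access servers.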

Virtualization

Virtualization is not a new story. VMware continues to shape the storage deployment landscape, where mobility not only of virtual machines but of entire solutions bundled with storage, server and networking infrastructure is making headway; here we are talking about mobility of the entire application ensemble. While EMC gets the biggest benefit of VMware's dominance, by working with all the leading storage players like NetApp and HP and with networking giant Cisco, VMware has almost created the de-facto virtualization solution for enterprises. There are a few IT managers, though, who are brave enough to try Linux-based virtualization solutions. IBM, for a change, is pushing KVM [the Linux virtualization solution] and trying to position it as an alternative to VMware. Read more at the IBM blog. There is, however, hardly any disagreement that virtualization will drive most of the storage deployment this year [I am not counting tape storage here]. IDC has also forecast that in 2013, 69 percent of workloads will be virtualized.
Software Defined Data Centre [SDDC] is a term that has become quite popular in electronic chatter these days. Although VMware coined the term some time back, the way people use it today is very different from the way VMware outlined it. SDDC describes a scenario where the entire data centre is defined in software and provided as a service. It takes a lot more than just virtualization of server and storage, but the primary constituent of the solution is definitely virtualization. From that perspective, we will put SDDC under virtualization in the present context.

Object Storage

Object storage largely comprises all cloud-based storage access. Typically a cloud is accessed over HTTP-based interfaces where all storage entities are referred to as objects: a URL is an object, a file is an object, a database entry too is an object. In other words, whenever one accesses storage using any of the cloud APIs, one is accessing object storage. In one sense this is an abstraction, but in many other senses it is a new way of dealing with storage; it gets closer to application semantics. As enterprises move to the cloud [public or private], storage access is getting objectified.
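To make that concrete, here is a minimal sketch of object-style access over HTTP in Python. The endpoint, bucket and key below are invented for illustration, and the call shape loosely follows the S3-style REST convention (PUT to create, GET to read) rather than any particular vendor's API.

    import requests

    BASE = "https://objects.example-cloud.com"  # hypothetical object store endpoint

    # Store an object: the URL itself names the object; there are no
    # directories or blocks, just a key inside a flat namespace (a bucket).
    with open("2013-01.csv", "rb") as f:
        resp = requests.put(BASE + "/mybucket/reports/2013-01.csv", data=f)
    resp.raise_for_status()

    # Read it back from anywhere that can reach the URL.
    data = requests.get(BASE + "/mybucket/reports/2013-01.csv").content

Notice that the interface says nothing about disks, volumes or file systems; that is exactly the abstraction object storage provides.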
In 2013, adoption of specialized cloud-based software will expand as applications become more and more cloud-aware. Mark Goros, CEO of Caringo, the leading provider of object storage software, tells us: "The shift to object storage is being driven by IT trends, including the adoption of cloud services, emerging analytics applications, BYOD and mobility, as well as the market momentum of research, healthcare, government and life sciences." [source] While there are public cloud gateways like the Nasuni File server or the NetApp® StorageGRID® gateway that connect enterprise data centres to public clouds like Amazon or Rackspace, the challenge of object storage is less about handling throughput and more about how one can organize, move and manage a huge number of objects of varied sizes in an unconstrained namespace. As is evident, enterprise object storage will closely follow the evolution of large public cloud infrastructure like Amazon EC2 or Microsoft Azure.

Tuesday, January 15, 2013

A storage system potpourri for beginners

Storage is easily one of the most talked-about, most attention-consuming and most confusing technologies around. Anyone who can read this sentence is aware of digital storage as a concept: any data generated by a computing machine is digital and requires digital storage. When it comes to the technology aspect, however, storage is easily the most clouded concept, infested with an unending series of acronyms: DAS, NAS, SAN, SCSI, SATA, SAS, NFS, CIFS, RAID... and multiple technology families, like tape storage, disk storage, solid-state storage, and then there is the all-encompassing Cloud. If you hoped that with cloud you finally have one thing to take refuge in, hold that hope: you must first ascertain what constitutes a cloud before you can rest with it.
One way to make sense of this apparent forest of acronyms and concepts is to appreciate what we need storage for. Essentially, the entire purpose of all storage technologies is to help us store our ever-expanding digital data in such a way that it is
  1. safe and persistent, that is, data does not get destroyed, lost, mutated or corrupted once stored,
  2. secure against unauthorized access,
  3. accessible when one needs it, and
  4. affordable.
There is one more complexity we must be mindful of, which is the complexity of size. As the size of the data grows, the means to deliver on all four of those parameters must evolve, often drastically, so that the overall solution remains attractive to the user. For example, if you have only 100 GB of data, a single external hard disk is often good enough for your need; however, if that data becomes 1 exabyte [1 exabyte is 1,000 petabytes and 1 petabyte is 1,000,000 GB], you need a whole range of technologies to manage it. The difference between personal storage and enterprise storage is, to a large extent, an illustration of how quantity transforms into a qualitative attribute at larger magnitudes.
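A quick scale calculation (in the decimal units capacities are marketed in) makes the jump vivid; the 1 TB drive here is just a stand-in for a commodity part.

    GB = 1
    PB = 1000 * 1000 * GB   # 1 petabyte = 1,000,000 GB
    EB = 1000 * PB          # 1 exabyte  = 1,000 petabytes

    drive = 1000 * GB       # one commodity 1 TB external disk
    print(EB // drive)      # => 1000000 drives to hold a single exabyte

A million drives is no longer a shopping problem; it is a power, cooling, failure-handling and management problem, which is exactly where enterprise storage technologies come in.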

Personal Storage

For non-professional personal needs, the 300 GB hard disk that typically comes by default with a laptop is more than sufficient. A 250 GB hard disk, for example, can hold around 50,000 normal-sized photos or MP3 files [at roughly 5 MB each]. If you are an avid user of video, you will probably buy a few 1 TB external hard disks in addition, and that would be DAS, or Directly Attached Storage, for you. If you are a cloud aficionado, you probably rely on Google Drive or Microsoft SkyDrive for your additional needs, in which case you have both DAS and public cloud in your system.

Enterprise Storage

When it comes to the enterprise, many aspects, like data growth, data retention, preparedness for recovery of data after a site disaster, and access frequency of data, come into consideration, making storage planning a costly and complex business. Additionally, with increasing sensitivity towards unstructured data, enterprises are experiencing faster expansion of storage demands. According to IDC's Worldwide Quarterly Disk Storage Systems Tracker, 3Q12 marked the first time that external disk storage systems makers shipped over seven exabytes, or 7,104 petabytes, of capacity in a single quarter, for a year-over-year growth rate of 24.4 percent [source: Infostor]. This means that in the next 5-6 years there will be many organizations that hit an exabyte of enterprise data.

Storage Tiers

To get around this challenge of data explosion, enterprises bring in storage tiers, where data is organized into different classes based on how actively it is used. For example, very active data (data modification rate is high and data access rate is very high) must be kept online in the fastest and most reliable storage tier [let's say Tier 1], while the least active data [no modification, accessed only in special scenarios like a past-data audit or recovery] can be archived in offline storage. This way, the enterprise provides the most resources to the most active data and efficiently reduces the cost of storing less active data.
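The tiering decision itself can be stated in a few lines. Here is a toy sketch; the thresholds and tier names are invented for illustration, and real systems use much richer access-heat statistics than daily counts.

    def assign_tier(reads_per_day, writes_per_day):
        """Pick a storage tier from simple activity counts (illustrative only)."""
        activity = reads_per_day + writes_per_day
        if writes_per_day > 100 or activity > 1000:
            return "tier-1"      # fast, most reliable online storage
        if activity > 10:
            return "near-line"   # cheaper, slower online disks
        return "archive"         # offline storage, retrieved on demand

    print(assign_tier(5000, 400))  # -> tier-1
    print(assign_tier(50, 0))      # -> near-line
    print(assign_tier(0, 0))       # -> archive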

Fig. 1: Storage tiers based on data usage
Fig. 2: Tapes and disks
Typically most of the online storage in an enterprise is maintained on disk-based storage. Traditionally, digital tapes were used for all offline storage, for the advantages that tapes can be preserved with very low electrical power consumption and can be physically moved to a different location at very little cost. But tapes are serial devices and therefore require a different hardware setup; they are also more prone to read failures compared to disks. The last ten years of innovation increased the storage density of disks manifold and brought the cost/GB of disk storage below that of tape, eventually establishing disks very strongly for archival storage, so much so that most enterprises of late are opting for disk-based backup over tape. It started with VTL [Virtual Tape Library] appliances replacing physical tape backup appliances, and of late VTLs have merged with standard disk-based backup appliances. Almost all backup appliances use deduplication in a major way to reduce the storage footprint. An added advantage of this transition is that archived data can be brought online within a very small time window. Data Domain appliances are a very good example of how disk-based backup appliances shaped up. Additionally, backup appliances provide desirable features such as compliance support, where the system can be configured to ensure immutability of data once written, for a duration defined by the administrator, and automatic data shredding, where the data gets destroyed if someone tries to access it from the disk without going through the proper authentication procedure.
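Deduplication itself is conceptually simple: split incoming data into chunks, fingerprint each chunk with a cryptographic hash, and store a chunk only if that fingerprint has not been seen before. Here is a minimal fixed-size-chunk sketch; real appliances use variable-size (content-defined) chunking, persistent indexes and safeguards far beyond this.

    import hashlib
    import io

    CHUNK = 4096  # fixed-size chunks; appliances typically chunk by content

    def dedup_write(stream, chunk_store):
        """Store a stream as a recipe of chunk hashes; unique chunks stored once."""
        recipe = []
        while True:
            chunk = stream.read(CHUNK)
            if not chunk:
                break
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in chunk_store:     # only new chunks consume space
                chunk_store[digest] = chunk
            recipe.append(digest)
        return recipe

    def dedup_read(recipe, chunk_store):
        """Reassemble the original stream from its recipe."""
        return b"".join(chunk_store[d] for d in recipe)

    store = {}
    recipe = dedup_write(io.BytesIO(b"A" * 8192 + b"B" * 4096), store)
    print(len(recipe), "chunks referenced,", len(store), "chunks stored")  # 3, 2

On backup workloads, where much of each run repeats the previous one, this is how appliances shrink the storage footprint so dramatically.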
Compared to archival storage, Tier-1 storage employs high-end, faster disks [15K RPM], quite often along with SSDs [Solid State Drives]. SSDs are the new favourite in this segment, with vendors like Samsung and SanDisk competing with each other to bring out new products that are cheaper, denser and last longer. SSDs are a lot faster and support true random read/write compared to disks. With fast-falling prices, higher capacity and increased lifetime, solid-state drives are finding their place in a large way in Tier-1 storage gear. Other advantages of SSDs are that they occupy less physical space, draw less electrical power and can transition from offline to online a lot quicker than disks. It will, however, take some time before we see SSDs completely replacing disks in this tier.
Fig. 3: Simple stack comparison: SAN, NAS and DAS
Fig. 4: Tiered NAS storage organization in a data centre
Sometimes called primary or mission-critical storage appliances, Tier-1 storage gear provides fast, reliable storage for mission-critical data, often with multiple levels of redundancy to reduce data downtime. Since these gears are the most expensive of the lot, many storage vendors provide mechanisms to transparently move less active data to less expensive disk storage. This low-cost storage tier, sometimes referred to as near-line storage, is often made up of a large set of high-capacity but slower SATA disks [5400/7200 RPM]. NAS (Network-Attached Storage) designs are inherently suited to this type of tiered use, which partly explains why NAS sells more than SAN. Also, a SAN uses Fibre Channel or SAS disks, making it more expensive than NAS where the data is not mission-critical [see slides for an illustrative comparison between NAS and SAN]. In either SAN or NAS, a single disk array must have all its disks of a similar type and speed: for example, either they will all be high-speed FC disks or they will all be SAS. Either way, higher-level data access semantics are built into the NAS/SAN software: NAS mimics the file access syntax provided by a file system, while SAN provides block access that file systems can use. So NFS (Network File System) and CIFS are the two primary interfaces that a NAS server supports, whereas iSCSI and FC are the two primary interfaces that a SAN provides to the host server's file systems.
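The file-versus-block distinction is easiest to see from the client side. Here is a small Python sketch; the mount point and device name are illustrative, and it assumes a Linux host with an NFS/CIFS share mounted and an iSCSI/FC LUN already attached.

    import os

    # File-level access (NAS): the server exports a file system, so the
    # client names data by path and the NAS box handles the block layout.
    with open("/mnt/nas_share/report.txt", "rb") as f:  # e.g. an NFS/CIFS mount
        head = f.read(4096)

    # Block-level access (SAN): the client sees a raw block device; any
    # file system semantics must be supplied by the host itself.
    fd = os.open("/dev/sdb", os.O_RDONLY)  # e.g. an iSCSI/FC LUN
    first_block = os.read(fd, 4096)        # raw 4 KB read at offset 0
    os.close(fd)

The NAS path hides blocks entirely, while the SAN path hands the host a disk and leaves everything above it to the host's own file system.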
Fig. 4 provides an illustration of a typical enterprise with two data centres, each simultaneously serving its users and providing storage replication service to the other site, a popular configuration to support site disaster recovery, while internally each data centre organizes data into three tiers. Tier-1 storage almost always comes in a primary-standby configuration in order to support high availability.

Cloud Storage

courtesy: HDS: Thin Provisioning with Virtual Volume

Cloud as a concept became popular only after virtualization succeeded at large scale. With virtualization, one could have hundreds of virtual servers running on a single physical server. With that came software that could make provisioning hundreds of applications a matter of running a few software commands, which could be invoked remotely over HTTP. The ability to dynamically configure servers using software brought up a new paradigm where an application can be commissioned to run across multiple virtual servers (communicating with each other over a common communication structure), serving a large user base, entirely through software commands that an administrator can execute remotely from a desktop. This type of server provisioning demanded a new way of storage provisioning, and the concept of the virtual volume, or logical storage container, became popular: one can define multiple containers residing in the same physical storage volume and provision them to the server manager remotely. Thin provisioning became the predominant idea in storage provisioning: a server is given a virtual volume that uses little physical storage to start with, but as it grows, the physical storage allocation underneath also grows on demand. The advantage is that one does not need to plan for all the storage in advance; as the data grows, one keeps adding more storage to the virtual volume, making it grow. That decoupled physical storage planning from the server's storage provisioning, and storage provisioning became dynamic, like virtualized server provisioning. As long as the software can provision, monitor and manage the servers and the virtual volumes allotted to them over a software-defined interface, without errors and within acceptable performance degradation, the model can scale to any size. As is apparent, there is no real category called 'cloud storage'; what we have, rather, is 'cloud service'. Data centres are designed and maintained the same way data centres have been designed and built all along, using a combination of NAS, SAN and DAS.
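The essence of thin provisioning fits in a few lines of Python. This is a toy model under stated assumptions: the page size, the dict-backed "physical pool" and the class name are all invented for illustration.

    class ThinVolume:
        """Toy thin-provisioned volume: physical pages allocated on first write."""
        PAGE = 4096

        def __init__(self, logical_size):
            self.logical_size = logical_size  # the size promised to the server
            self.pages = {}                   # page index -> bytearray, lazily allocated

        def write(self, offset, data):
            if offset + len(data) > self.logical_size:
                raise ValueError("write beyond provisioned size")
            for i, byte in enumerate(data):
                idx, off = divmod(offset + i, self.PAGE)
                # allocate the backing page only when it is first touched
                page = self.pages.setdefault(idx, bytearray(self.PAGE))
                page[off] = byte

        def physical_bytes(self):
            return len(self.pages) * self.PAGE

    vol = ThinVolume(100 * 10**9)  # the server sees a "100 GB" volume
    vol.write(0, b"hello")
    print(vol.physical_bytes())    # 4096 bytes actually consumed so far

The server believes it owns 100 GB, yet only one page of real storage is in use; the array keeps growing the backing allocation as writes arrive.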
Cloud provides a software framework to manage the resources in the data centres by bringing them into a common sharable pool. Cloud in that sense is more about integrating and managing the resources, and less about what storage technologies or systems per se are used in the data centre(s). Given that cloud software is the essential element of a cloud service, as long as the software is designed carefully one can have any type of device or system below it, ranging from inexpensive storage arrays of JBODs (Just a Bunch Of Disks) to highly sophisticated HDS, HP or EMC disk arrays or NAS servers. The figure below from EMC's online literature illustrates this nicely.
It is apparent that as the cloud grows larger and larger, the complexity and sophistication of the software increase by orders of magnitude, and so does the cost advantage of data storage. One can look at the cost of provisioning (server and storage) in public clouds like those of Google, Rackspace and Amazon and imagine the complexity and sophistication of their cloud management software. Fortunately, many have published a version of their software in open source for others to learn from and try.

source: http://managedview.emc.com/2012/08/the-software-defined-data-center/
courtesy: EMC





Further Reading:
Brocade Document on Data centre infrastructure

My slides on slideshare