
Monday, March 25, 2013

Mini-mill phenomenon in the Enterprise Storage sector

"Disruption is a theory : a conceptual model of cause and effect that makes it possible to better predict the outcomes of competitive battles in different circumstances. These forces almost always topple industry leaders when an attacker has harnessed them because disruptive strategies are predicated on competitors doing what is in their best and most urgent interest : satisfying their most important customers and investing where profits are most attractive. In a profit-seeking world, this is a safe bet." - The Innovator's Solution: Creating and Sustaining Successful Growth, by Michael E Raynor and Clayton M. Christensen

   Clayton M. Christensen, in his all-time business classic The Innovator's Dilemma, narrated how mini steel plants (called minimills) disrupted the US steel market between the 1960s and 1990s and pushed the established, large integrated steel mills out of business. It is a classic case of incumbents pursuing the path of higher profitability and ceding the low-profitability space to new entrants, which in turn helps the new entrants get better at their operations and find the opportunity to displace the incumbents in the more premium segments.
Backblaze's comparison
The question is whether a similar pattern can play out in the storage industry.
But steel was a commodity, you argue, and storage systems are not. Buyers know that EMC, NetApp, IBM and HP storage servers are expensive, but IT managers buy them for the vendor's guarantee of the system's reliability, features and performance. While that argument does hold good, there were a few who tried to build businesses whose success depended critically on the cost of storage, and they were astonished to find that they could bring that cost down by an order of magnitude if they built the entire system on their own. Backblaze in 2009 published its findings on how much it could save by building its own storage. Backblaze had a specific need, and it had to develop or customize a whole range of software to make its hardware useful for its business. Its chart shows that, even with all that R&D cost, it still managed to keep the cost of storage quite close to that of bare-bone disks. You may argue that this could be a special case and it hardly proves the rule.
Well, it wouldn't, had it been a single case. Gartner and other analysts have predicted that most enterprises will have cloud storage in their setup in the next 3-4 years. When they talk about cloud storage, they primarily mean a specific way of accessing and organizing storage, which is different from traditional methods of storage access like NFS, CIFS or fibre-channel-based block storage, i.e. SAN. Besides renting storage from public cloud vendors such as Amazon or Rackspace, enterprises may decide to build their own cloud infrastructure, especially if they have a large user base. Just as PC hardware broke free from the grip of IBM and HP and became commoditized at the hands of Taiwanese and Chinese ODMs a few decades ago, storage server hardware is showing distinct signs of getting commoditized at the hands of Taiwanese ODMs like Quanta and Wistron.
The fact is, almost all enterprise storage vendors at present outsource their hardware manufacturing to ODMs in Taiwan [Quanta and others] or China [where Jabil has a very large manufacturing base]. Traditionally, storage OEMs have claimed that their differentiation lies in their software. Now, thanks to the open source movement, there are proven solutions that can run on commoditized hardware. What prevents these enterprises from approaching the ODMs directly to get customized hardware at a low price?
Mike Yang, VP and GM of Quanta's Cloud Computing Business
Nancy Gohring's recent article in IT World is an interesting read. She reports how Quanta has built itself into a vendor of customized data-centre hardware. Quanta, traditionally an ODM for server OEMs like Dell and HP, is now selling directly to large-scale cloud providers after its first success story with Facebook around five years ago.
An EE Times article tells how the likes of Facebook and Google found in Quanta a viable alternative to the likes of IBM, EMC and NetApp, and how Quanta recognized the opportunity to expand into a customized hardware vendor for big data centres. Facebook's need was a source of large quantities of low-cost hardware customized for its huge data centres. Clearly, branded storage servers were too costly and a little too inflexible for Facebook to sustain its business model. So it followed what other cloud providers like Rackspace and Google did: it approached the ODMs, who had been supplying unbranded hardware to the storage/server OEMs, and brought its hardware cost down drastically. The advantage that Facebook, Google or Rackspace have is that their unique service models are driven entirely by software, and they have very strong software teams who just need powerful hardware [which the ODMs manufactured to customized designs from Facebook] for their software to run. The necessary question is: does one need data centres as big as the ones Amazon and Rackspace have to justify the risk of this approach against the cost advantage one gains?
There are reports that a second tier of cloud providers has already adopted this strategy of going directly to the ODMs for their hardware. Check this recent EE Times article if you need a reference.
Present Scenario

Future Scenario
Of course, the cloud providers do not represent the general enterprise buyers who form the customer base for enterprise storage/server vendors. Enterprises may be tempted but will rarely have the manpower or the risk appetite to justify overhauling their existing IT setup, unless it becomes so cost-effective in the long run that their stakeholders force the move [it seems that Goldman Sachs has also bought hardware directly from Quanta, bypassing brands like Dell and HP]. Remember how virtualization swept through every enterprise and forced every IT setup to adopt it? Adopting public cloud could become a similar sweeping wave in the next few years once the competition between Amazon, Rackspace, Google and the second tier of cloud providers flares up. Gartner says that cloud and CRM will constitute a large component of enterprise software spending in the next two years. "It's very clear that mature regions like North America, Western Europe are focusing on public cloud computing, while emerging regions are focusing on private cloud computing," said Ms. Swineheart from Gartner during a recent press meet. Salesforce [the leading CRM SaaS provider] has already expanded its infrastructure anticipating larger customer acquisition. So one can reasonably expect that further expansion of cloud infrastructure will reduce revenue for storage OEMs and expand the growth of Taiwanese commodity hardware ODMs like Quanta. Not all storage OEMs will immediately see a revenue reduction, since the overall growth of enterprise IT expenditure will adequately compensate in absolute terms. Also, following Clayton M. Christensen's conjecture, established players will find it unprofitable to compete with Quanta and will instead focus on high-value customers who are too paranoid to try out commodity hardware. That will improve their profit margins and push up their stock prices, assuming there is enough growth in that segment. The pattern will continue till commodity hardware becomes baked enough to compete against branded hardware. Facebook's Open Compute Project can accelerate the process towards an open, standardized hardware design that people can use for enterprise servers.
However, for the trend to be disruptive enough for the existing storage OEMs, enterprises need a viable, ready-made software solution: an open source server software stack that can run on this commodity hardware and be inducted directly into data centres, like an HP or Dell server. And that is one challenge for which we do not yet have a clear answer. While the Rackspace-sponsored OpenStack is becoming a strong contender as the open source software for cloud providers, it has some miles to go before mainstream enterprises can adopt it for their private cloud setups. But once these mature, Open Compute certified commodity hardware running OpenStack software could become a formidable combination to start the domino effect of disruption in the enterprise storage sector. Like enterprises still running mainframes, NAS and SAN will continue to exist in enterprise IT, but the major growth will be in cloud infrastructure.
  As Clayton M. Christensen explained, a disruptive solution always arrives first as an inferior alternative to the existing solution and appeals to niche buyers, in this case the large-scale cloud providers. But as time goes on, the opportunity drives further innovation, and soon the industry gets a credible alternative to existing products at a much lower cost point in every customer segment. And when that happens, the industry shifts to a new structure quite fast, creating new industry leaders. That is how EMC became the storage leader earlier, pushing aside IBM's costly storage servers. That is how Nucor became the leader riding the minimill disruption in the US steel industry. Commodity enterprise hardware and OpenStack software have the potential to bring the minimill effect to the enterprise storage sector, but whether we will see a new leader in place of EMC depends on which path the incumbents take. High-tech software sectors are a lot more connected than the steel industry was in the 70s and 80s. EMC is in fact on the board of OpenStack, so there is a chance that incumbents will see the long-term pattern well ahead and change themselves to survive the disruptive force.

Monday, January 28, 2013

Storage Trends in 2013

As the year begins, it is a popular game to try to predict what is coming in the year. However, as far as the storage industry goes, whichever way one looks, it does not appear the game can be very interesting. One good measure of interestingness is (1) how unpredictable the market is, and another could be (2) how likely a new technology is to be disruptive. As far as the market is concerned, results from the last few years tell us it has been quite stable, with EMC leading the pack on almost all fronts of storage systems with more than a third of the market, while IBM, NetApp, HDS and HP compete closely with each other for second position in the rankings. Gartner's 3rd quarter 2012 chart [source], below, shows the relative positions of the storage system vendors.
Growth trajectories of the individual players also do not suggest any surprises, with EMC continuing to lead by a large margin in the foreseeable future. IDC, for example, in its press release last November 2012, forecast 2013 to be a slow-growth year. "While both EMC and NetApp continue to gain market share, which should enable both vendors to outpace the overall market growth rate, we are modestly concerned with the current estimates for EMC (+9.5% in 2013) and NTAP (+7.6%) vs. the (forecast) industry growth of 4%.", the IDC report says. Gartner sounded similar in its 2012-end forecast of the market. To sum up, we do not expect much reordering of the ranks this year either.
As far as technology trends are concerned, we have seen the published views of 3PAR (HP) CEO David Scott and NetApp CTO Jay Kidd.
While both focus on their respective solution portfolios and positioning, Mr. Kidd paints the picture with a broader brush. He sees dominant market play from virtualization, clustered storage, flash storage and cloud access in 2013. EMC, in addition, talks very strongly about tape. Tape?? Some would sneer that it looks like a regressive step. But frankly, if enterprises are buying tape storage, there must be strong reasons for it. With larger archives, the need to drive down power consumption and rack density for archive storage has become stronger, and EMC offers a solution where tape adequately addresses that need, especially where EMC gear constitutes most of the data centre equipment. But we are digressing.
Coming back to where we started, based on what we can distill from all the chatter, there are three distinct technological patterns:
1. more penetration of solid-state storage as a differentiator in tiered storage systems,
2. a stronger play for virtualization in storage deployment to enable more data mobility, and
3. growth of object storage.
Let's take them individually.

Flash-based Storage / Solid-state Storage

Samsung SSD vs SATA HDD: source CNET
Solid-state storage devices [based on NOR or NAND flash] gained popularity in the last decade, especially in consumer devices. Over the years, SSD vendors have delivered better reliability, higher memory density and longer device lifetimes, so much so that enterprises now see SSDs as a strong alternative to high-speed HDDs. Compared to disk-based HDDs, SSDs offer almost three times lower power consumption and an order of magnitude faster access [there is no seek time on an SSD], making them a better fit for high-transaction-throughput servers. SAP's HANA, for example, runs entirely in memory to provide faster throughput; SSDs become a cheaper alternative in such scenarios. However, the big storage players have so far shown a lukewarm response due to the high cost of SSDs compared to HDD-based systems. Most of the large storage players have brought in flash as a fast cache or accelerator in otherwise disk-based storage controllers to improve read/write throughput [some use it to store metadata for active storage volumes], but so far complete SSD arrays have not gone mainstream. Startups like Whiptail and Violin Memory are betting on all-flash storage arrays, and not surprisingly they are making quite a few positive news splashes too. Many believe that 2013 will herald the era of SSD-based storage arrays in mainstream enterprise storage. Here is a recent story where the Indian Railways' ticketing system [IRCTC] is looking at SSDs to boost performance of its online real-time ticketing. In a tiered storage structure, it looks like flash-based storage and SSDs will see a dominant role in Tier 1, the performance tier, this year. [For more on the tiered storage concept, see my previous post.]
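If you want to see the "no seek time" effect for yourself, a rough micro-benchmark like the sketch below is enough to show the difference. The file path, block size and sample count are placeholders, and the OS page cache can mask device latency, so treat the numbers as indicative only and use a test file much larger than RAM. Run it once against a file on an HDD and once against a file on an SSD:

```python
# Rough random-read latency probe. Results are only indicative: the OS page
# cache can mask device latency, so use a test file much larger than RAM.
import os
import random
import time

PATH = "/path/to/large/testfile"   # placeholder: a file on the device under test
BLOCK = 4096
SAMPLES = 1000

fd = os.open(PATH, os.O_RDONLY)
size = os.fstat(fd).st_size
random.seed(42)

start = time.perf_counter()
for _ in range(SAMPLES):
    offset = random.randrange(0, max(size - BLOCK, 1))
    os.pread(fd, BLOCK, offset)    # read 4 KiB at a random offset
elapsed = time.perf_counter() - start
os.close(fd)

print(f"avg random read latency: {elapsed / SAMPLES * 1e6:.1f} microseconds")
```

On spinning disks the average is dominated by seek and rotational delay; on an SSD it is not, which is exactly why high-transaction workloads favour flash.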

Virtualization

Virtualization is not a new story. VMware continues to shape the storage deployment topography, where mobility not only of virtual machines but of entire solutions bundling storage, server and networking infrastructure is making headway. Here we are talking about mobility of an entire application ensemble. While EMC gets the biggest benefit of VMware's dominance, by working with all the leading storage players like NetApp and HP and networking giant Cisco, VMware has almost created the de-facto virtualization solution for enterprises. There are a few IT managers, though, who are brave enough to try Linux-based virtualization solutions. IBM, for a change, is pushing KVM [the Linux virtualization solution] and trying to position it as an alternative to VMware's offerings; read more at the IBM blog. There is, however, hardly any difference of opinion that virtualization will drive most storage deployment this year [I am not counting tape storage here]. IDC has also forecast that in 2013, 69 percent of workloads will be virtualized.
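For those experimenting with the KVM route, the hypervisor is typically driven through libvirt. As a small illustration (assuming the libvirt-python bindings are installed and a local qemu:///system hypervisor is reachable; this is a sketch, not a full management setup), the snippet below just lists the defined guests and their state:

```python
# Minimal libvirt sketch: list KVM/QEMU guests on the local hypervisor.
# Assumes the libvirt-python package is installed and qemu:///system is running.
import libvirt

conn = libvirt.open("qemu:///system")
try:
    for dom in conn.listAllDomains():
        state = "running" if dom.isActive() else "shut off"
        print(f"{dom.name():30} {state}")
finally:
    conn.close()
```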
Software Defined Data Centre [SDDC] is a term that has become quite popular in electronic chatter these days. Although VMware coined the term some time back, the way people use it today is quite different from the way VMware outlined it. SDDC describes a scenario where the entire data centre is defined in software and provided as a service. It takes a lot more than just virtualization of servers and storage, but the primary constituent of the solution is definitely virtualization. From that perspective, we will treat SDDC under virtualization in the present context.

Object Storage

Object storage largely covers all cloud-based storage access. Typically a cloud is accessed over HTTP-based interfaces where all storage entities are referred to as objects: a URL is an object, a file is an object, a database entry too is an object. In other words, whenever one accesses storage using any of the cloud APIs, one is accessing object storage. In one sense this is just an abstraction, but in many other senses it is a new way of dealing with storage; it gets closer to application semantics. As enterprises move to the cloud [public or private], storage access is getting objectified.
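To make the idea concrete, here is a minimal sketch of what "accessing storage as objects" looks like from an application, using Python's requests library against a hypothetical HTTP object-store endpoint. The URL, container name and token header below are illustrative assumptions, not any particular vendor's API:

```python
# A minimal sketch of object-style storage access over HTTP, assuming a
# hypothetical S3/Swift-like endpoint; real services differ in authentication
# and URL layout.
import requests

ENDPOINT = "https://objectstore.example.com/v1/my-container"  # hypothetical
HEADERS = {"X-Auth-Token": "REPLACE_WITH_TOKEN"}              # hypothetical auth

# Store ("PUT") an object: the key is part of the URL, the value is the body.
key = "reports/2013/q1-summary.txt"
resp = requests.put(f"{ENDPOINT}/{key}",
                    data=b"quarterly capacity report",
                    headers=HEADERS)
resp.raise_for_status()

# Retrieve ("GET") the same object back by name.
resp = requests.get(f"{ENDPOINT}/{key}", headers=HEADERS)
print(resp.content)
```

The point is that the application never sees a block device or a mounted file system; it only names whole objects and moves them over HTTP.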
In 2013, the adoption of specialized cloud storage software will expand as applications become more and more cloud-aware. Mark Goros, CEO of Caringo, a leading provider of object storage software, tells us that "The shift to object storage is being driven by IT trends, including the adoption of cloud services, emerging analytics applications, BYOD and mobility, as well as the market momentum of research, healthcare, government and life sciences." [source]. While there are public cloud gateways like the Nasuni File server or the NetApp® StorageGRID® gateway that connect enterprise data centres to public clouds like Amazon or Rackspace, the challenge of object storage is less about handling throughput and more about how one can organize, move and manage a huge number of objects of varied sizes in an unconstrained namespace. As is evident, enterprise object storage will closely follow the evolution of large public cloud infrastructure such as Amazon EC2 or Microsoft Azure.

Tuesday, January 15, 2013

A storage system potpourri for beginners

Storage is easily one of the most talked-about, most attention-consuming and most confusing technologies around. Anyone who can read this sentence is aware of digital storage as a concept: any data generated by a computing machine is digital and requires digital storage. However, when it comes to the technology, storage is easily the most clouded concept, infested with an unending series of acronyms: DAS, NAS, SAN, SCSI, SATA, SAS, NFS, CIFS, RAID... and multiple technology families, like tape storage, disk storage, solid-state storage, and then there is the all-encompassing Cloud. If you hoped that with cloud you finally have one thing you can take refuge in, hold that hope, for you must first ascertain what constitutes a cloud before you can rest with the Cloud.
                 One way to make sense of this apparent forest of acronyms and concepts is to appreciate what we need storage for. Essentially, the purpose of all storage technologies is to help us store our ever-expanding digital data in a way that is
  1. safe and persistent, that is, data does not get destroyed, lost, mutated or corrupted once stored,
  2. secure against unauthorized access,
  3. accessible when one needs it, and
  4. affordable.
There is one more complexity that we must be mindful of: the complexity of size. As the size of the data grows, the means to deliver on all four of those parameters must evolve, often drastically, so that the overall solution remains attractive to the user. For example, if you have only 100 GB of data, a single external hard disk is often good enough for your needs; however, if that data becomes 1 exabyte [1 exabyte is 1,000 petabytes and 1 petabyte is 1,000,000 GB], you need a whole range of technologies to manage it. The difference between personal storage and enterprise storage is to a large extent an illustration of how quantity transforms into a qualitative attribute at larger magnitudes.
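A back-of-the-envelope calculation makes the jump in scale vivid (the 4 TB drive size below is just an assumption for illustration):

```python
# Back-of-the-envelope: how many drives does 1 exabyte take?
EXABYTE_GB = 1_000_000_000     # 1 EB = 1,000 PB = 1,000,000,000 GB
DISK_GB = 4_000                # assume 4 TB drives (illustrative)
drives = EXABYTE_GB / DISK_GB
print(f"{drives:,.0f} drives") # 250,000 drives, before any RAID or replication overhead
```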

Personal Storage

 For non-professional personal needs, the 300 GB hard disk that typically comes by default with a laptop is more than sufficient. A 250 GB hard disk, for example, can hold around 50,000 normal-sized photos or MP3 music files. If you are an avid user of video, you will probably buy a few 1 TB external hard disks in addition, and those would be DAS, or Direct-Attached Storage, for you. If you are a cloud aficionado, you would probably rely on Google Drive or Microsoft SkyDrive for your additional needs, in which case you have both DAS and public cloud in your setup.

Enterprise Storage

When it comes to enterprises, many aspects such as data growth, data retention, preparedness for recovering data after a site disaster, and the access frequency of data come into consideration, making storage planning a costly and complex business. Additionally, with increasing sensitivity towards unstructured data, enterprises are experiencing faster expansion of storage demands. According to IDC's Worldwide Quarterly Disk Storage Systems Tracker, 3Q12 marked the first time that external disk storage systems makers shipped over seven exabytes, or 7,104 petabytes, of capacity in a single quarter, for a year-over-year growth rate of 24.4 percent [source: Infostor]. This means that in the next 5-6 years there will be many organizations that hit an exabyte of enterprise data.

Storage Tiers

To get around this challenge of data explosion, enterprises bring in storage tiers, where data is organized into different classes based on how actively it is used. For example, very active data (with high modification and access rates) needs to be kept online in the fastest, most reliable storage tier [let's say Tier 1], while the least active data [no modification, accessed only in special scenarios like a past-data audit or recovery] can be archived in offline storage. This way the enterprise devotes the most resources to the most active data and efficiently reduces the cost of storing the less active data.
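As a rough illustration of the idea (not any vendor's actual tiering engine), the sketch below classifies files into tiers purely by how recently they were accessed; the tier names and age thresholds are assumptions for the example:

```python
# Illustrative only: classify files into storage tiers by last-access age.
# Tier names and age thresholds are assumptions, not a product's policy.
import time
from pathlib import Path

DAY = 86400
TIER_RULES = [                       # (tier name, max age of last access in days)
    ("tier-1 (fast, online)", 30),
    ("tier-2 (near-line)", 180),
]
ARCHIVE_TIER = "tier-3 (archive/offline)"

def classify(path: Path, now: float) -> str:
    age_days = (now - path.stat().st_atime) / DAY
    for tier, max_age in TIER_RULES:
        if age_days <= max_age:
            return tier
    return ARCHIVE_TIER

if __name__ == "__main__":
    now = time.time()
    for p in Path(".").rglob("*"):
        if p.is_file():
            print(f"{classify(p, now):28} {p}")
```

Real tiering engines work at the block or volume level and weigh access frequency, not just recency, but the classification principle is the same.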

Fig 1. Storage Tiers based on Data usage
Fig. 2 tapes and disks
Typically, most of the online storage in an enterprise is kept on disk-based storage. Traditionally, digital tapes were used for all offline storage, for the advantages that tapes can be preserved with very little electrical power and can be physically moved to a different location at very little cost. But tapes are serial devices and therefore require a different hardware setup; they are also more prone to read failures than disks. The last ten years of innovation increased the storage density of disks manifold and brought the cost/GB of disk storage below that of tape, eventually establishing disks very strongly for archival storage, so much so that most enterprises of late are opting for disk-based backup over tape. It started with VTL [Virtual Tape Library] appliances replacing physical tape backup appliances, and lately VTLs have merged into standard disk-based backup appliances. Almost all backup appliances use deduplication in a major way to reduce the storage footprint. An added advantage of this transition is that archived data can be brought online within a very small time window. Data Domain appliances are a very good example of how disk-based backup appliances have shaped up. Additionally, backup appliances provide desirable features such as compliance support, where the system can be configured to ensure immutability of data once written for a duration defined by the administrator, and automatic data shredding, where data gets destroyed when someone tries to access it from disk without going through the proper authentication procedure.
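To get a feel for how deduplication shrinks the backup footprint, here is a toy sketch of chunk-level, content-addressed storage: identical chunks are stored once and referenced by their hash. It is a simplification under obvious assumptions (fixed-size chunks, an in-memory index); real appliances use variable-length chunking, compression and far more robust indexing:

```python
# Toy chunk-level deduplication: identical chunks are stored once and
# referenced by their content hash.
import hashlib

CHUNK_SIZE = 4096
chunk_store: dict[str, bytes] = {}   # content hash -> chunk data

def backup(data: bytes) -> list[str]:
    """Split data into chunks, store each unique chunk once, return the recipe."""
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        chunk_store.setdefault(digest, chunk)   # stored only if unseen
        recipe.append(digest)
    return recipe

def restore(recipe: list[str]) -> bytes:
    return b"".join(chunk_store[d] for d in recipe)

if __name__ == "__main__":
    original = b"A" * 8192 + b"B" * 4096          # two identical 'A' chunks
    recipe = backup(original)
    print(len(recipe), "chunk references,", len(chunk_store), "unique chunks stored")
    assert restore(recipe) == original
```

Repeated full backups of largely unchanged data are the best case: most chunks already exist in the store, so only references are added.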
Compared to archival storage, Tier-1 storage employs high-end, faster disks [15K RPM], quite often along with SSDs [Solid State Disks]. SSDs are the new favourite in this segment, with vendors like Samsung and SanDisk competing with each other to bring out products that are cheaper, denser and last longer. SSDs are a lot faster than disks and support true random read/write. With fast-falling prices, higher capacity and increased lifetime, solid-state drives are finding their place in a big way in Tier-1 storage gear. Other advantages of SSDs are that they occupy less physical space, consume less electrical power and can transition from offline to online a lot quicker than disks. It will, however, take some time before we see SSDs completely replacing disks in this tier.
Fig 3: simple stack comparison - SAN, NAS and DAS
Fig. 4 Tiered NAS storage organization in Data Centre
Sometimes called primary or mission-critical storage appliances, Tier-1 storage gear provides fast, reliable storage for mission-critical data. It often provides multiple levels of redundancy in order to reduce data downtime. Since this gear is the most expensive of the lot, many storage vendors provide mechanisms to transparently move less active data to less expensive disk storage. This low-cost storage tier, sometimes referred to as near-line storage, is often made up of a large set of high-capacity but slower SATA disks [5400/7200 RPM]. NAS (Network-Attached Storage) designs are inherently suited for this type of tiered use, which partly explains why NAS sells more than SAN. Also, SAN uses fibre-channel or SAS disks, making it more expensive than NAS when the data is not mission-critical [see slides for an illustrative comparison between NAS and SAN]. In either a SAN or a NAS, a single disk array must have all its disks of similar type and speed; for example, they will all be high-speed FC disks or they will all be SAS. Either way, higher-level data access semantics are built into the NAS/SAN software: NAS provides file access semantics as offered by a file system, while SAN provides block access that file systems can use. So NFS (Network File System) and CIFS are the two primary interfaces that a NAS server supports, whereas iSCSI and FC are the two interfaces that a SAN provides for the host server's file systems.
Fig 4 provides an illustration of a typical enterprise with two data centres, each simultaneously serving its users as well as providing storage replication service to the other site, a popular configuration to support site disaster recovery, while internally each data centre organizes data into three tiers. Tier-1 storage almost always comes in a primary-standby configuration in order to support high availability.

Cloud Storage

courtesy: HDS: Thin Provisioning with Virtual Volume

Cloud as a concept became popular only after virtualization succeeded at large scale. With virtualization, one could have hundreds of virtual servers running on a single physical server. With that came software that could make provisioning hundreds of applications a matter of running a few software commands, which could be invoked remotely over HTTP. The ability to configure servers dynamically using software brought up a new paradigm, where an application can be commissioned to run across multiple virtual servers (communicating with each other over a common communication fabric) and serve a large user base, entirely through software commands that an administrator executes remotely from his desktop. This type of server provisioning demanded a new way of storage provisioning, and the concept of the virtual volume, or logical storage container, became popular. Now one can define multiple containers residing on the same physical storage volume and provision them to the server manager remotely. The concept of thin provisioning became predominant in storage provisioning: a server is given a virtual volume that uses little physical storage to start with, but as it grows, the physical storage allocation underneath also grows on demand. The advantage is that one does not need to plan for all the storage in advance; as the data grows, one keeps adding more storage to the virtual volume, making it grow. That decoupled physical storage planning from the server's storage provisioning, and storage provisioning became as dynamic as virtualized server provisioning. As long as the software can provision, monitor and manage the servers and the virtual volumes allotted to them over a software-defined interface, without errors and within acceptable performance degradation, the model can scale to any size. As is apparent, there is no real category called 'cloud storage'; what we have rather is a 'cloud service'. The data centres behind it are designed and maintained the same way data centres have been designed and built all along, using a combination of NAS, SAN and DAS.
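To make the thin-provisioning idea concrete, here is a small, simplified sketch of a virtual volume that advertises a large logical size but takes physical extents from a shared pool only when blocks are first written. The extent size and pool model are assumptions for illustration, not how any particular array implements it:

```python
# Simplified thin provisioning: a virtual volume advertises a large logical size,
# but physical extents are drawn from a shared pool only on first write.
EXTENT_SIZE = 1 << 20   # 1 MiB extents (an assumption for illustration)

class StoragePool:
    """A shared pool of physical extents that many thin volumes draw on."""
    def __init__(self, total_extents: int):
        self.free_extents = total_extents

    def allocate(self) -> bytearray:
        if self.free_extents == 0:
            raise RuntimeError("pool exhausted: add physical storage")
        self.free_extents -= 1
        return bytearray(EXTENT_SIZE)

class ThinVolume:
    """Claims logical_size up front; consumes physical extents only on write."""
    def __init__(self, logical_size: int, pool: StoragePool):
        self.logical_size = logical_size
        self.pool = pool
        self.extents: dict[int, bytearray] = {}   # extent index -> backing bytes

    def write(self, offset: int, data: bytes) -> None:
        if offset + len(data) > self.logical_size:
            raise ValueError("write beyond the volume's logical size")
        for i, byte in enumerate(data):
            idx, off = divmod(offset + i, EXTENT_SIZE)
            if idx not in self.extents:            # first touch -> allocate an extent
                self.extents[idx] = self.pool.allocate()
            self.extents[idx][off] = byte

    def physical_bytes_used(self) -> int:
        return len(self.extents) * EXTENT_SIZE

pool = StoragePool(total_extents=1024)                  # ~1 GiB of real capacity
vol = ThinVolume(logical_size=100 << 30, pool=pool)     # advertises 100 GiB
vol.write(0, b"hello world")
print(vol.physical_bytes_used())                        # only 1 MiB actually consumed
```

The volume appears fully sized to the server from day one, while the administrator grows the shared pool only as real consumption approaches its limit.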
Cloud provides a software framework to manage the resources in the data centres by bringing them into a common, sharable pool. Cloud in that sense is more about integrating and managing the resources and less about which storage technologies or systems per se are used in the data centre(s). Given that the cloud software is the essential element of a cloud service, as long as that software is designed carefully, one can have any type of device or system below it, ranging from inexpensive storage arrays of JBODs (Just a Bunch Of Disks) to highly sophisticated HDS, HP or EMC disk arrays or NAS servers. The figure below from EMC's online literature illustrates this nicely.
It is apparent that as the cloud grows larger and larger, the complexity and sophistication of the software increase by magnitudes, and so does the cost advantage of data storage. One can look at the cost of provisioning (server and storage) in public clouds like those of Google, Rackspace and Amazon and imagine the complexity and sophistication of their cloud management software. Fortunately, many have published a version of their software as open source for others to learn from and try.

source: http://managedview.emc.com/2012/08/the-software-defined-data-center/
courtesy: EMC





Further Reading:
Brocade Document on Data centre infrastructure

My slides on slideshare