NewSQL database?
It appears that, although NoSQL gained popularity through the likes of Google, Yahoo and Facebook, it did not make much of a dent in the RDBMS clientele base. The main reason, as NewSQL advocates cite, is that people like SQL: irrespective of scalability and other issues, they decided to stay with it. SQL has been in the game for the last 20-25 years, which makes it so entrenched in the business application space that it is almost impossible to take it away. Even if one ignores the large investment in SQL, one cannot ignore the value SQL brought to the business community. SQL essentially separated the data, together with the code that protects it (i.e. the database engine), from the application code, which enabled DBMS packages to be commoditized.
SQL also provided a well-understood and simple data manipulation interface that business applications could use without having to absorb the full complexity of computer programming. Business applications would have become far more complex if they had to handle the ACID requirements of their data in addition to implementing their business logic.
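To make the point about ACID concrete, here is a minimal sketch using Python's built-in sqlite3 module (an illustrative choice; the post does not name a particular engine). The `accounts` table and the `make_db`, `transfer` and `balances` helpers are hypothetical. The application only states the changes in SQL; the engine guarantees that the credit and the debit either both commit or both roll back.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical accounts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn

def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst as one atomic transaction.

    `with conn:` commits if the block succeeds and rolls back if any
    statement raises, so the caller never sees a half-done transfer.
    """
    try:
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # Debit second: if it violates the CHECK constraint, the
            # credit above is rolled back together with it.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn: sqlite3.Connection) -> dict:
    """Return {name: balance} for all accounts."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
```

Transferring 30 from alice to bob leaves balances of 70 and 80; a second transfer of 500 fails the CHECK constraint and leaves both balances untouched. None of that consistency logic lives in the application code.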
NoSQL, however scalable and flexible, comes at a huge cost: one has to design one's own data manipulation engine, and the larger the scale of the data and the wider the distribution of computing resources, the more intensive the effort. The uniqueness of each application demands a specific design for the engine that manipulates its data, which ties that engine more closely to the business logic of the enterprise. While this approach has its strengths, it is evident that most businesses whose core operation is not about the data itself have stayed away from the NoSQL movement, however large their data.
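As a rough illustration (a toy, not modeled on any particular NoSQL product), the sketch below treats a plain Python dict as a key-value store. A question that SQL answers in one declarative statement has to be answered by hand-written scan-and-group code in the application. The `ORDERS` data and the helper names are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical order records: (order_id, customer, amount)
ORDERS = [(1, "acme", 120), (2, "globex", 80), (3, "acme", 40), (4, "initech", 200)]

def total_per_customer_kv(store: dict) -> dict:
    """Key-value style: the application owns the scan and the grouping logic."""
    totals = defaultdict(int)
    for record in store.values():
        totals[record["customer"]] += record["amount"]
    return dict(totals)

def total_per_customer_sql(conn: sqlite3.Connection) -> dict:
    """SQL style: the engine plans and executes the scan and the grouping."""
    rows = conn.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer")
    return dict(rows)

# Load the same data into both "stores".
kv_store = {oid: {"customer": c, "amount": a} for oid, c, a in ORDERS}
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)
```

Both functions return the same totals here, but every new filter, join or index on the key-value side means more application code, which is the cost the paragraph above describes.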
Promises of NewSQL: SQL for Big Data
NewSQL tries to bridge this gap by keeping the SQL interface intact while re-engineering the underlying database engine. Evidently this is far more daunting than coming up with a NoSQL alternative: it requires changing a design that has held its ground for almost 30 years, and only those who understand the intricacies of the original design can venture to re-architect the database engine while keeping its original promises intact. What does that mean? It means that the engine must fully support SQL and must guarantee ACID [my previous posts elaborated on how NoSQL addressed this requirement]. Additionally, the engine must (1) support a loosely connected set of computing resources, such as computers connected over the Internet, and (2) scale its performance with the number of computers. The last requirement comes from the NoSQL world, where a huge number of computing resources connected over the Internet are designed to sift through massive distributed data to find the answer to a single query. Essentially, the engine must be capable of building a distributed database spread over a huge number of affordable [i.e. cheap] computers connected over the Internet, while providing a SQL interface to the entire data. That is what makes NewSQL truly the database engine for Big Data.
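The idea of many cheap computers jointly answering one SQL query can be sketched as scatter-gather: every node runs the same SQL against its local shard, and a coordinator merges the partial results. This is a toy under stated assumptions, not how any of the products below actually work: three in-memory SQLite databases stand in for nodes, rows are hash-partitioned by a hypothetical `customer` key, and only a `SUM ... GROUP BY` aggregate is merged. Distributed transactions, replication and node failures are not modeled.

```python
import sqlite3
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def make_nodes(n: int) -> list:
    """Create n in-memory SQLite databases standing in for cluster nodes."""
    nodes = []
    for _ in range(n):
        # check_same_thread=False lets a worker thread query this connection.
        conn = sqlite3.connect(":memory:", check_same_thread=False)
        conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
        nodes.append(conn)
    return nodes

def node_for(customer: str, num_nodes: int) -> int:
    """Hash-partition rows by customer so each customer's rows share one node."""
    return zlib.crc32(customer.encode("utf-8")) % num_nodes

def insert(nodes: list, customer: str, amount: int) -> None:
    """Route a row to the node that owns its partition."""
    nodes[node_for(customer, len(nodes))].execute(
        "INSERT INTO orders VALUES (?, ?)", (customer, amount)
    )

QUERY = "SELECT customer, SUM(amount) FROM orders GROUP BY customer"

def scatter_gather_totals(nodes: list) -> dict:
    """Run QUERY on every node in parallel, then merge the partial sums."""
    # Scatter: every node evaluates the same SQL against its local shard.
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(lambda c: c.execute(QUERY).fetchall(), nodes))
    # Gather: add up partial SUMs. This stays correct even if one customer's
    # rows were spread over several nodes, because SUM is decomposable.
    totals = Counter()
    for rows in partials:
        for customer, subtotal in rows:
            totals[customer] += subtotal
    return dict(totals)
```

Inserting the orders ("acme", 120), ("globex", 80), ("acme", 40) and ("initech", 200) across three nodes and calling `scatter_gather_totals` yields totals of 160, 80 and 200, the same answer a single database would give. Adding nodes spreads the scan work, which is the scaling property the paragraph asks for.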
Contenders
Most of the challengers in this space were started by people who have been involved in database software development since the 70s. Take the example of VoltDB, started by Michael Stonebraker, a luminary in database research who architected Ingres [one of the first relational DBs], Postgres and many more.
Similarly, NuoDB boasts Jim Starkey, the person behind DEC's relational DB suites during 75-85.
Another prominent NewSQL venture, ScaleDB, was started by yet another database legend, Vern Watts, the architect of IBM's famed IMS.
Then there is JustOneDB, which proclaims itself the relational DB of the 21st century and boasts its CTO, Duncan Pauly.
The list is a long one, but I must mention Clustrix, which boasts its CTO, Aaron Passey, of Isilon fame [Isilon brought a new definition to mainstream storage clustering]. Clustrix appears to have brought the appliance model into the NewSQL database world.
I am sure there are many more to come in this space, and that I have missed a few here, but given the emergence of this new technology space, we will have to revisit this topic.
I will try to provide a more detailed review of these products in the next posts, starting with Clustrix [see the post here].
Summary
Here is a quick comparison between three different DB technologies:
SQL DB | NoSQL DB | NewSQL DB |
Architecture based on the relational database model of the 70s | New [2000s] architecture from the likes of Google and Yahoo; designed for a single large distributed database | Newer [post-2000] architecture; promises to scale for both standalone and large installations |
Centralized transaction processing | Distributed processing | Distributed processing |
Fully ACID-compliant | Breaks ACID; adopts an eventual consistency model | ACID-compliant |
Integrated SQL engine | No support for SQL | Full support for SQL |
Limited scalability | High scalability | High scalability; tries to break the dependency on any single engine |
Mature technology; has been at the core of all popular OLTP suites | Relatively mature; better suited to the SaaS model | Still evolving; has the potential to scale for both use cases |
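The "eventual consistency" entry in the table can be illustrated with a toy model of two replicas. It assumes last-writer-wins timestamps as the conflict rule (one common policy, chosen here for brevity; real NoSQL systems differ) and a hypothetical `anti_entropy` step that exchanges entries between replicas. Until that step runs, a read on the other replica returns stale data, which a fully ACID engine would not allow.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Replica:
    """Toy replica: key -> (timestamp, value), resolved by last writer wins."""
    data: dict = field(default_factory=dict)

    def write(self, key: str, value: str, ts: int) -> None:
        # Keep the incoming value only if it is newer than what we hold.
        # Timestamps are assumed unique; ties are not resolved here.
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def read(self, key: str) -> Optional[str]:
        entry = self.data.get(key)
        return entry[1] if entry is not None else None

def anti_entropy(a: Replica, b: Replica) -> None:
    """Exchange every entry in both directions so the replicas converge."""
    for key, (ts, value) in list(a.data.items()):
        b.write(key, value, ts)
    for key, (ts, value) in list(b.data.items()):
        a.write(key, value, ts)
```

If replica 1 accepts a write that replica 2 has not yet seen, replica 2 returns nothing for that key; after `anti_entropy`, both replicas hold the newest value. The replicas agree only "eventually".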