On My Watch

Tuesday, December 01, 2009

Cloud DBs

SQL vs. NoSQL, MVCC vs. distributed MVCC, relational vs. non-relational...
The challenges of managing data in the cloud are myriad, particularly since traditional databases (Oracle, SQL Server, MySQL, etc.) are difficult and expensive to scale and virtualize, requiring full backups for "instant" virtualization.

On the other hand, the choices are expanding. NoSQL stores, such as Google's Bigtable and Amazon's SimpleDB, are gaining real adherents as they pile up successes (just don't ask for joins). Joins in a massively distributed environment will probably require a distributed relational database based on MVCC or some other technique that ensures queries run against a consistent snapshot. While not yet proven (AFAIK), this approach holds real promise. Many of the applications I am currently working on, including Innerpass' collaboration application, do require relational queries but don't require subsecond response. They could also benefit from the ability to handle spikes without provisioning db servers. Eventual consistency is actually a good match for asynchronous Ajax-based page refreshes, where temporarily inaccurate or incomplete link lists can be tolerated.
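
To make the "consistent snapshot" idea concrete, here is a toy, single-process sketch of MVCC in Python. It is not how any particular database implements it. The class `MVCCStore`, the helper `join_at_snapshot`, and the `user:`/`doc:` keys are all names I made up for illustration, and a real distributed system would also need to agree on timestamps across nodes, which this sketch ignores.

```python
from dataclasses import dataclass, field


@dataclass
class MVCCStore:
    """Toy multi-version store (hypothetical): each write appends a new
    version stamped with a monotonically increasing commit timestamp."""
    clock: int = 0
    versions: dict = field(default_factory=dict)  # key -> [(commit_ts, value)]

    def write(self, key, value):
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def snapshot(self):
        # A snapshot is simply the latest commit timestamp at this moment.
        return self.clock

    def read(self, key, snapshot_ts):
        # Return the newest value committed at or before snapshot_ts.
        for commit_ts, value in reversed(self.versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None


def join_at_snapshot(store, snapshot_ts, user_keys, doc_keys):
    """Join documents to their owners, reading both sides at one snapshot."""
    users = {k: store.read(k, snapshot_ts) for k in user_keys}
    rows = []
    for doc_key in doc_keys:
        doc = store.read(doc_key, snapshot_ts)
        if doc is None:
            continue  # document did not exist yet at this snapshot
        owner = users.get(doc["owner"])
        if owner is not None:
            rows.append((owner["name"], doc["title"]))
    return rows


store = MVCCStore()
store.write("user:1", {"name": "Ann"})
store.write("doc:1", {"owner": "user:1", "title": "Plan"})
snap = store.snapshot()

# Writes that land after the snapshot was taken.
store.write("user:1", {"name": "Ann B."})
store.write("doc:2", {"owner": "user:1", "title": "Budget"})

old_rows = join_at_snapshot(store, snap, ["user:1"], ["doc:1", "doc:2"])
new_rows = join_at_snapshot(store, store.snapshot(), ["user:1"], ["doc:1", "doc:2"])
```

The join taken at `snap` sees only the data as it stood when the snapshot was taken, even though later writes have already landed, while a join at the latest timestamp sees everything. Writers never block readers because old versions are kept around, which is the property that makes the technique attractive for distributed joins.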


My guess is that it will come down, like most (if not all) technologies and architectures before it, to application requirements and business imperatives. First off, many apps won't need to change and won't change: if it ain't broke, don't fix it. Greenfield applications and, in some instances, applications that need to modernize (as they are exposed to larger user bases, for example) will need to consider cloud architectures. Technology adoption and adoption rates will be driven by the nature of the application, cost and time-to-market concerns. From what I can see, high-volume, low-data-value sites will tend toward non-relational deployments; data warehouses will tend toward columnar and MapReduce hybrids; and transactional applications will remain dependent on relational databases. The biggest questions are the in-between ones. And the type of relational database will depend heavily on performance and latency requirements. Those with high latency tolerance may find distributed databases acceptable and even preferable, while those demanding higher throughput and consistency will stick with the tried and true.
