Welcome!

AJAX & REA Authors: Rajesh Lain, Sebastian Kruk, RealWire News Distribution, Harald Zeitlhofer

Related Topics: Big Data Journal, .NET, Virtualization, Web 2.0, Cloud Expo, Apache

Big Data Journal: Blog Feed Post

Classifying Today’s “Big Data Innovators”

These 13 vendors distribute 16 unique data management products

By

Editor’s note: The piece below by   first appeared on the Hadapt blog and is republished with permission here. The framework presented provides insight into the very dynamic market around “Big Data Innovators” and should be of use for classifying many other firms in this interesting space. -bg

Recently InformationWeek published a piece, authored by Doug Henschen, that listed 13 innovative Big Data vendors. The complete list is reproduced below:

1.  MongoDB
2.  Amazon (Redshift, EMR, DynamoDB)
3.  Cloudera (CDH, Impala)
4.  Couchbase
5.  Datameer
6.  Datastax
7.  Hadapt
8.  Hortonworks
9.  Karmasphere
10.  MapR
11.  Neo Technology
12.  Platfora
13.  Splunk

Big-Data3These 13 vendors distribute 16 unique data management products (since both Amazon and Cloudera offer multiple distinct data management/processing systems), all of which push the boundary on Big Data management.

In this post I will attempt to subcategorize these 16 products into a competitive grouping, where products placed inside the same group can be considered replacements for each other (and hence are competitive), and each group is complementary to every other group.

Before starting this classification, I will remove three products that, while potentially being interesting from a Big Data perspective, are often used outside of what has become known as the “Big Data realm”, and therefore their primary competitors did not make it on the InformationWeek list. These three products are Splunk (which typically competes with companies focused on the security, compliance, and IT operations management verticals), Amazon Redshift (which typically completes with traditional MPP database vendors), and Neo Technology (which, although usually classified as a “NoSQL database”, its focus on graph data makes it highly unique from a technology and use case perspective relative to the other NoSQL databases on this list).

The remaining 13 products can be classified into four distinct groups:
1.  Operational data stores that allow flexible schemas
2.  Hadoop distributions
3.  Real-time Hadoop-based analytical platforms
4.  Hadoop-based BI solutions

Group 1 (operational data stores that allow flexible schemas)
This group is composed of database products that can be used to manage active data for dynamic applications with hard to define (or hard to predict) schemas. The database must be optimized for inserting, retrieving, updating, or deleting individual data items in real-time (latencies on the order of milliseconds), but should also support some sort of interface for performing analysis of the data stored within. The dynamic nature of the typical use case for databases in this group implies a NoSQL interface, and either a key-value or document-store retrieval model. From the InformationWeek list, MongoDB, DynamoDB, Couchbase, and Datastax all fit in this category. Although there are some significant technical differences between these products, they can nonetheless be roughly described as potential replacements for each other in Group 1 use cases.

Group 2 (Hadoop distributions)
The products in this group are designed for very different situations than Group 1. Hadoop is typically used for large scale data analysis and batch processing. Rather than inserting, retrieving, updating, or deleting individual data items, Hadoop is optimized for scanning through large swaths of data, processing and analyzing the data as it proceeds. Hadoop has become the poster-child for “Big Data” due to its proven massive scalability, and its ability to handle the “variety” aspect of Big Data (since Hadoop does not require data to fit neatly into rows and columns in order to be analyzed and processed). From the InformationWeek list, Cloudera, Hortonworks, MapR, and Amazon EMR all fit in this category.

Group 3 (real-time Hadoop-based analytical platforms)
Group 3 takes Hadoop to the next level, transforming it from a mere batch processing system to a full-fledged analytical platform that can answer queries in real-time. Furthermore, by adding a more robust SQL interface to Hadoop (in addition to industry-standard ODBC connectors), group 3 products help to hide the complexity of Hadoop and the need for Hadoop specialists, since traditional business intelligence and visualization tools are now able to interface directly with data stored inside Hadoop. From the InformationWeek list, Hadapt clearly fits in this category, and with certain caveats, so does Cloudera Impala (the caveats are that as of the time of writing this blog post (a) Impala is an extremely young codebase and is still only in beta (b) Impala only supports a small subset of SQL and does not support UDFs or other ways to combine structured and unstructured data in the same query, so calling it an “analytical platform” might be a bit of a stretch).

Group 4 (Hadoop-based BI solutions)
Often lumped together with group 3 products,  group 4 products are often confused as being competitive with group 3 products. However, just as business intelligence tools and analytical database solutions are highly complementary and were often packaged together in the pre-Hadoop world, the same is true in the Hadoop/Big Data world. Therefore, Datameer, Karmasphere, and Platfora, all of which function as a business intelligence layer above Hadoop, are capable of working closely with the group 3 products (with announcements along these lines already starting to begin).

In conclusion, although “Big Data” is an enormous and rapidly growing market, one single data management software product is not going to rule the market. Rather, there are four major groups of data management solutions within the Big Data space; and while there is fierce competition within each group, at the macro level these groups can not only co-exist, but are highly complementary. In the long run, it is likely that the 2-3 leaders in each group will emerge and share the Big Data pie.

Read the original blog entry...

More Stories By Bob Gourley

Bob Gourley, former CTO of the Defense Intelligence Agency (DIA), is Founder and CTO of Crucial Point LLC, a technology research and advisory firm providing fact based technology reviews in support of venture capital, private equity and emerging technology firms. He has extensive industry experience in intelligence and security and was awarded an intelligence community meritorious achievement award by AFCEA in 2008, and has also been recognized as an Infoworld Top 25 CTO and as one of the most fascinating communicators in Government IT by GovFresh.

Cloud Expo Latest Stories
With the explosion of the cloud, more businesses are transitioning to a recurring revenue model to generate reliable sales, grow profits, and open new markets. This opportunity requires businesses to get to market quickly with the pricing and packaging options customers want. In addition, you will want to take advantage of the ensuing tidal wave of data to more effectively upsell, cross-sell and manage your customers. All of this is possible, but only with the right approach. At 15th Cloud Expo, Brendan O'Brien, Co-founder at Aria Systems and the inventor of cloud billing panelists, will lead a panel discussion on what it takes to launch and manage a successful recurring revenue business. The panelists will offer their insights about what each department will need to consider, from financial management to line of business and IT. The panelists will also offer examples from their success in recurring revenue with companies such as Audi, Constant Contact, Experian, Pitney-Bowes, Teleko...
Planning scalable environments isn't terribly difficult, but it does require a change of perspective. In his session at 15th Cloud Expo, Phil Jackson, Development Community Advocate for SoftLayer, will broaden your views to think on an Internet scale by dissecting a video publishing application built with The SoftLayer Platform, Message Queuing, Object Storage, and Drupal. By examining a scalable modular application build that can handle unpredictable traffic, attendees will able to grow your development arsenal and pick up a few strategies to apply to your own projects.
Come learn about what you need to consider when moving your data to the cloud. In her session at 15th Cloud Expo, Skyla Loomis, a Program Director of Cloudant Development at Cloudant, will discuss the security, performance, and operational implications of keeping your data on premise, moving it to the cloud, or taking a hybrid approach. She will use real customer examples to illustrate the tradeoffs, key decision points, and how to be successful with a cloud or hybrid cloud solution.
The cloud provides an easy onramp to building and deploying Big Data solutions. Transitioning from initial deployment to large-scale, highly performant operations may not be as easy. In his session at 15th Cloud Expo, Harold Hannon, Sr. Software Architect at SoftLayer, will discuss the benefits, weaknesses, and performance characteristics of public and bare metal cloud deployments that can help you make the right decisions.
Over the last few years the healthcare ecosystem has revolved around innovations in Electronic Health Record (HER) based systems. This evolution has helped us achieve much desired interoperability. Now the focus is shifting to other equally important aspects – scalability and performance. While applying cloud computing environments to the EHR systems, a special consideration needs to be given to the cloud enablement of Veterans Health Information Systems and Technology Architecture (VistA), i.e., the largest single medical system in the United States.
Cloud and Big Data present unique dilemmas: embracing the benefits of these new technologies while maintaining the security of your organization’s assets. When an outside party owns, controls and manages your infrastructure and computational resources, how can you be assured that sensitive data remains private and secure? How do you best protect data in mixed use cloud and big data infrastructure sets? Can you still satisfy the full range of reporting, compliance and regulatory requirements? In his session at 15th Cloud Expo, Derek Tumulak, Vice President of Product Management at Vormetric, will discuss how to address data security in cloud and Big Data environments so that your organization isn’t next week’s data breach headline.
Scott Jenson leads a project called The Physical Web within the Chrome team at Google. Project members are working to take the scalability and openness of the web and use it to talk to the exponentially exploding range of smart devices. Nearly every company today working on the IoT comes up with the same basic solution: use my server and you'll be fine. But if we really believe there will be trillions of these devices, that just can't scale. We need a system that is open a scalable and by using the URL as a basic building block, we open this up and get the same resilience that the web enjoys.
Is your organization struggling to deal with skyrocketing volumes of digital assets? The amount of data is growing exponentially and organizations are having a hard time managing this growth. In his session at 15th Cloud Expo, Amar Kapadia, Senior Director of Open Cloud Strategy at Seagate, will walk through the essential considerations when developing a cloud storage strategy. In this discussion, you will understand the challenges IT is facing, why companies need to move to cloud, and how the right cloud model can help your business economically overcome the data struggle.
If cloud computing benefits are so clear, why have so few enterprises migrated their mission-critical apps? The answer is often inertia and FUD. No one ever got fired for not moving to the cloud – not yet. In his session at 15th Cloud Expo, Michael Hoch, SVP, Cloud Advisory Service at Virtustream, will discuss the six key steps to justify and execute your MCA cloud migration.
The 16th International Cloud Expo announces that its Call for Papers is now open. 16th International Cloud Expo, to be held June 9–11, 2015, at the Javits Center in New York City brings together Cloud Computing, APM, APIs, Security, Big Data, Internet of Things, DevOps and WebRTC to one location. With cloud computing driving a higher percentage of enterprise IT budgets every year, it becomes increasingly important to plant your flag in this fast-expanding business opportunity. Submit your speaking proposal today!
Most of today’s hardware manufacturers are building servers with at least one SATA Port, but not every systems engineer utilizes them. This is considered a loss in the game of maximizing potential storage space in a fixed unit. The SATADOM Series was created by Innodisk as a high-performance, small form factor boot drive with low power consumption to be plugged into the unused SATA port on your server board as an alternative to hard drive or USB boot-up. Built for 1U systems, this powerful device is smaller than a one dollar coin, and frees up otherwise dead space on your motherboard. To meet the requirements of tomorrow’s cloud hardware, Innodisk invested internal R&D resources to develop our SATA III series of products. The SATA III SATADOM boasts 500/180MBs R/W Speeds respectively, or double R/W Speed of SATA II products.
In today's application economy, enterprise organizations realize that it's their applications that are the heart and soul of their business. If their application users have a bad experience, their revenue and reputation are at stake. In his session at 15th Cloud Expo, Anand Akela, Senior Director of Product Marketing for Application Performance Management at CA Technologies, will discuss how a user-centric Application Performance Management solution can help inspire your users with every application transaction.
SYS-CON Events announced today that Gridstore™, the leader in software-defined storage (SDS) purpose-built for Windows Servers and Hyper-V, will exhibit at SYS-CON's 15th International Cloud Expo®, which will take place on November 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. Gridstore™ is the leader in software-defined storage purpose built for virtualization that is designed to accelerate applications in virtualized environments. Using its patented Server-Side Virtual Controller™ Technology (SVCT) to eliminate the I/O blender effect and accelerate applications Gridstore delivers vmOptimized™ Storage that self-optimizes to each application or VM across both virtual and physical environments. Leveraging a grid architecture, Gridstore delivers the first end-to-end storage QoS to ensure the most important App or VM performance is never compromised. The storage grid, that uses Gridstore’s performance optimized nodes or capacity optimized nodes, starts with as few a...
SYS-CON Events announced today that Cloudian, Inc., the leading provider of hybrid cloud storage solutions, has been named “Bronze Sponsor” of SYS-CON's 15th International Cloud Expo®, which will take place on November 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. Cloudian is a Foster City, Calif.-based software company specializing in cloud storage. Cloudian HyperStore® is an S3-compatible cloud object storage platform that enables service providers and enterprises to build reliable, affordable and scalable hybrid cloud storage solutions. Cloudian actively partners with leading cloud computing environments including Amazon Web Services, Citrix Cloud Platform, Apache CloudStack, OpenStack and the vast ecosystem of S3 compatible tools and applications. Cloudian's customers include Vodafone, Nextel, NTT, Nifty, and LunaCloud. The company has additional offices in China and Japan.
SYS-CON Events announced today that TechXtend (formerly Programmer’s Paradise), a leading value-added provider of server and storage virtualization, and r-evolution will exhibit at SYS-CON's 15th International Cloud Expo®, which will take place on November 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. TechXtend (formerly Programmer’s Paradise) is a leading value-added provider of software, systems and solutions for corporations, government organizations, and academic institutions across the United States and Canada. TechXtend is the Exclusive Reseller in the United States for r-evolution