|By Bob Gourley||
|January 2, 2013 08:00 AM EST||
By Daniel Abadi
Editor’s note: The piece below by Daniel Abadi first appeared on the Hadapt blog and is republished with permission here. The framework presented provides insight into the very dynamic market around “Big Data Innovators” and should be of use for classifying many other firms in this interesting space. -bg
Recently InformationWeek published a piece, authored by Doug Henschen, that listed 13 innovative Big Data vendors. The complete list is reproduced below:
2. Amazon (Redshift, EMR, DynamoDB)
3. Cloudera (CDH, Impala)
11. Neo Technology
These 13 vendors distribute 16 unique data management products (since both Amazon and Cloudera offer multiple distinct data management/processing systems), all of which push the boundary on Big Data management.
In this post I will attempt to subcategorize these 16 products into a competitive grouping, where products placed inside the same group can be considered replacements for each other (and hence are competitive), and each group is complementary to every other group.
Before starting this classification, I will remove three products that, while potentially being interesting from a Big Data perspective, are often used outside of what has become known as the “Big Data realm”, and therefore their primary competitors did not make it on the InformationWeek list. These three products are Splunk (which typically competes with companies focused on the security, compliance, and IT operations management verticals), Amazon Redshift (which typically completes with traditional MPP database vendors), and Neo Technology (which, although usually classified as a “NoSQL database”, its focus on graph data makes it highly unique from a technology and use case perspective relative to the other NoSQL databases on this list).
The remaining 13 products can be classified into four distinct groups:
1. Operational data stores that allow flexible schemas
2. Hadoop distributions
3. Real-time Hadoop-based analytical platforms
4. Hadoop-based BI solutions
Group 1 (operational data stores that allow flexible schemas)
This group is composed of database products that can be used to manage active data for dynamic applications with hard to define (or hard to predict) schemas. The database must be optimized for inserting, retrieving, updating, or deleting individual data items in real-time (latencies on the order of milliseconds), but should also support some sort of interface for performing analysis of the data stored within. The dynamic nature of the typical use case for databases in this group implies a NoSQL interface, and either a key-value or document-store retrieval model. From the InformationWeek list, MongoDB, DynamoDB, Couchbase, and Datastax all fit in this category. Although there are some significant technical differences between these products, they can nonetheless be roughly described as potential replacements for each other in Group 1 use cases.
Group 2 (Hadoop distributions)
The products in this group are designed for very different situations than Group 1. Hadoop is typically used for large scale data analysis and batch processing. Rather than inserting, retrieving, updating, or deleting individual data items, Hadoop is optimized for scanning through large swaths of data, processing and analyzing the data as it proceeds. Hadoop has become the poster-child for “Big Data” due to its proven massive scalability, and its ability to handle the “variety” aspect of Big Data (since Hadoop does not require data to fit neatly into rows and columns in order to be analyzed and processed). From the InformationWeek list, Cloudera, Hortonworks, MapR, and Amazon EMR all fit in this category.
Group 3 (real-time Hadoop-based analytical platforms)
Group 3 takes Hadoop to the next level, transforming it from a mere batch processing system to a full-fledged analytical platform that can answer queries in real-time. Furthermore, by adding a more robust SQL interface to Hadoop (in addition to industry-standard ODBC connectors), group 3 products help to hide the complexity of Hadoop and the need for Hadoop specialists, since traditional business intelligence and visualization tools are now able to interface directly with data stored inside Hadoop. From the InformationWeek list, Hadapt clearly fits in this category, and with certain caveats, so does Cloudera Impala (the caveats are that as of the time of writing this blog post (a) Impala is an extremely young codebase and is still only in beta (b) Impala only supports a small subset of SQL and does not support UDFs or other ways to combine structured and unstructured data in the same query, so calling it an “analytical platform” might be a bit of a stretch).
Group 4 (Hadoop-based BI solutions)
Often lumped together with group 3 products, group 4 products are often confused as being competitive with group 3 products. However, just as business intelligence tools and analytical database solutions are highly complementary and were often packaged together in the pre-Hadoop world, the same is true in the Hadoop/Big Data world. Therefore, Datameer, Karmasphere, and Platfora, all of which function as a business intelligence layer above Hadoop, are capable of working closely with the group 3 products (with announcements along these lines already starting to begin).
In conclusion, although “Big Data” is an enormous and rapidly growing market, one single data management software product is not going to rule the market. Rather, there are four major groups of data management solutions within the Big Data space; and while there is fierce competition within each group, at the macro level these groups can not only co-exist, but are highly complementary. In the long run, it is likely that the 2-3 leaders in each group will emerge and share the Big Data pie.
The 4th International Internet of @ThingsExpo, co-located with the 17th International Cloud Expo - to be held November 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA - announces that its Call for Papers is open. The Internet of Things (IoT) is the biggest idea since the creation of the Worldwide Web more than
Jul. 7, 2015 05:00 PM EDT Reads: 1,143
The time is ripe for high speed resilient software defined storage solutions with unlimited scalability. ISS has been working with the leading open source projects and developed a commercial high performance solution that is able to grow forever without performance limitations. In his session at Cloud Expo, Alex Gorbachev, President of Intelligent Systems Services Inc., shared foundation principles of Ceph architecture, as well as the design to deliver this storage to traditional SAN storage co...
Jul. 7, 2015 05:00 PM EDT Reads: 1,185
Announcing @ProfitBricksUSA to Exhibit at @CloudExpo Silicon Valley | #IoT #API #DevOps #Microservices
SYS-CON Events announced today that ProfitBricks, the provider of painless cloud infrastructure, will exhibit at SYS-CON's 17th International Cloud Expo®, which will take place on November 3–5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. ProfitBricks is the IaaS provider that offers a painless cloud experience for all IT users, with no learning curve. ProfitBricks boasts flexible cloud servers and networking, an integrated Data Center Designer tool for visual control over the...
Jul. 7, 2015 05:00 PM EDT Reads: 1,511
[video] Logging and Monitoring with @Sematext Founder @OtisG | @DevOpsSummit #DevOps #Logging #Monitoring
"We got started as search consultants. On the services side of the business we have help organizations save time and save money when they hit issues that everyone more or less hits when their data grows," noted Otis Gospodnetić, Founder of Sematext, in this SYS-CON.tv interview at @DevOpsSummit, held June 9-11, 2015, at the Javits Center in New York City.
Jul. 7, 2015 04:45 PM EDT Reads: 518
[slides] Policy in Hybrid Cloud By @DerekCollison | @CloudExpo #API #DevOps #Containers #Microservices
Enterprises are turning to the hybrid cloud to drive greater scalability and cost-effectiveness. But enterprises should beware as the definition of “policy” varies wildly. Some say it’s the ability to control the resources apps’ use or where the apps run. Others view policy as governing the permissions and delivering security. Policy is all of that and more. In his session at 16th Cloud Expo, Derek Collison, founder and CEO of Apcera, explained what policy is, he showed how policy should be arch...
Jul. 7, 2015 04:30 PM EDT Reads: 356
"CenturyLink brings a full suite of services to the table and that enables us to be an IT service provider," explained Jeff Katzen, Director of the Cloud Practice at CenturyLink, in this SYS-CON.tv interview at 16th Cloud Expo, held June 9-11, 2015, at the Javits Center in New York City.
Jul. 7, 2015 04:15 PM EDT Reads: 230
17th Cloud Expo, taking place Nov 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA, will feature technical sessions from a rock star conference faculty and the leading industry players in the world. Cloud computing is now being embraced by a majority of enterprises of all sizes. Yesterday's debate about public vs. private has transformed into the reality of hybrid cloud: a recent survey shows that 74% of enterprises have a hybrid cloud strategy. Meanwhile, 94% of enterprises ar...
Jul. 7, 2015 04:00 PM EDT Reads: 771
DevOps is about increasing efficiency, but nothing is more inefficient than building the same application twice. However, this is a routine occurrence with enterprise applications that need both a rich desktop web interface and strong mobile support. With recent technological advances from Isomorphic Software and others, it is now feasible to create a rich desktop and tuned mobile experience with a single codebase, without compromising performance or usability.
Jul. 7, 2015 03:45 PM EDT Reads: 683
One of the hottest areas in cloud right now is DRaaS and related offerings. In his session at 16th Cloud Expo, Dale Levesque, Disaster Recovery Product Manager with Windstream's Cloud and Data Center Marketing team, will discuss the benefits of the cloud model, which far outweigh the traditional approach, and how enterprises need to ensure that their needs are properly being met.
Jul. 7, 2015 03:45 PM EDT Reads: 1,303
In the midst of the widespread popularity and adoption of cloud computing, it seems like everything is being offered “as a Service” these days: Infrastructure? Check. Platform? You bet. Software? Absolutely. Toaster? It’s only a matter of time. With service providers positioning vastly differing offerings under a generic “cloud” umbrella, it’s all too easy to get confused about what’s actually being offered. In his session at 16th Cloud Expo, Kevin Hazard, Director of Digital Content for SoftL...
Jul. 7, 2015 03:45 PM EDT Reads: 1,278
[video] Infrastructure as a Toolbox By @SoftLayer at @CloudExpo New York | #IoT #API #Containers #Microservices
Countless business models have spawned from the IaaS industry. Resell Web hosting, blogs, public cloud, and on and on. With the overwhelming amount of tools available to us, it's sometimes easy to overlook that many of them are just new skins of resources we've had for a long time. In his General Session at 16th Cloud Expo, Phil Jackson, Lead Technology Evangelist at SoftLayer, broke down what we've got to work with and discuss the benefits and pitfalls to discover how we can best use them to d...
Jul. 7, 2015 03:30 PM EDT Reads: 1,590
WebRTC converts the entire network into a ubiquitous communications cloud thereby connecting anytime, anywhere through any point. In his session at WebRTC Summit,, Mark Castleman, EIR at Bell Labs and Head of Future X Labs, will discuss how the transformational nature of communications is achieved through the democratizing force of WebRTC. WebRTC is doing for voice what HTML did for web content.
Jul. 7, 2015 03:30 PM EDT Reads: 642
Discussions about cloud computing are evolving into discussions about enterprise IT in general. As enterprises increasingly migrate toward their own unique clouds, new issues such as the use of containers and microservices emerge to keep things interesting. In this Power Panel at 16th Cloud Expo, moderated by Conference Chair Roger Strukhoff, panelists addressed the state of cloud computing today, and what enterprise IT professionals need to know about how the latest topics and trends affect t...
Jul. 7, 2015 03:30 PM EDT Reads: 636
Malicious agents are moving faster than the speed of business. Even more worrisome, most companies are relying on legacy approaches to security that are no longer capable of meeting current threats. In the modern cloud, threat diversity is rapidly expanding, necessitating more sophisticated security protocols than those used in the past or in desktop environments. Yet companies are falling for cloud security myths that were truths at one time but have evolved out of existence.
Jul. 7, 2015 03:30 PM EDT Reads: 1,411
[slides] The Secure Path to Value in the Cloud By @Windstream | @CloudExpo #IoT #API #Containers #Microservices
Even as cloud and managed services grow increasingly central to business strategy and performance, challenges remain. The biggest sticking point for companies seeking to capitalize on the cloud is data security. Keeping data safe is an issue in any computing environment, and it has been a focus since the earliest days of the cloud revolution. Understandably so: a lot can go wrong when you allow valuable information to live outside the firewall. Recent revelations about government snooping, along...
Jul. 7, 2015 03:00 PM EDT Reads: 1,565
[slides] Workloads and Public Cloud at @CloudExpo By @utollwi | @ProfitBricksUSA #DevOps #Containers #Microservices
Public Cloud IaaS started its life in the developer and startup communities and has grown rapidly to a $20B+ industry, but it still pales in comparison to how much is spent worldwide on IT: $3.6 trillion. In fact, there are 8.6 million data centers worldwide, the reality is many small and medium sized business have server closets and colocation footprints filled with servers and storage gear. While on-premise environment virtualization may have peaked at 75%, the Public Cloud has lagged in adop...
Jul. 7, 2015 03:00 PM EDT Reads: 1,718
DevOps Summit, taking place Nov 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with 17th Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The widespread success of cloud computing is driving the DevOps revolution in enterprise IT. Now as never before, development teams must communicate and collaborate in a dynamic, 24/7/365 environment. There is no time to wait for long development...
Jul. 7, 2015 02:45 PM EDT Reads: 858
The last decade was about virtual machines, but the next one is about containers. Containers enable a service to run on any host at any time. Traditional tools are starting to show cracks because they were not designed for this level of application portability. Now is the time to look at new ways to deploy and manage applications at scale. In his session at @DevOpsSummit, Brian “Redbeard” Harrington, a principal architect at CoreOS, will examine how CoreOS helps teams run in production. Attende...
Jul. 7, 2015 02:45 PM EDT Reads: 646
"We have an new division call the Cloud Monetization Division, based on our platform Powua, which empowers enterprises and organizations to take the journey to cloud monetization and to make it a reality," explained Ian Khan, Manager, Innovation & Marketing at Solgenia, in this SYS-CON.tv interview at 16th Cloud Expo, held June 9-11, 2015, at the Javits Center in New York City.
Jul. 7, 2015 02:45 PM EDT Reads: 307
[video] Improving the Developer Experience with @DanKLynn | @CloudExpo #Agile #BigData #IoT #Microservices
"AgilData is the next generation of dbShards. It just adds a whole bunch more functionality to improve the developer experience," noted Dan Lynn, CEO of AgilData, in this SYS-CON.tv interview at 16th Cloud Expo, held June 9-11, 2015, at the Javits Center in New York City.
Jul. 7, 2015 02:30 PM EDT Reads: 481