By Elad Israeli
October 7, 2010 10:25 AM EDT
In recent times, one of the most popular subjects in the field of Business Intelligence (BI) has been in-memory BI technology. The subject gained popularity largely due to the success of QlikTech, provider of the in-memory-based QlikView BI product. Following QlikTech’s lead, many other BI vendors have jumped on the in-memory “hype wagon,” including the software giant Microsoft, which has been aggressively marketing PowerPivot, its own in-memory database engine.
The increasing hype surrounding in-memory BI has caused BI consultants, analysts and even vendors to spew out endless articles, blog posts and white papers on the subject, many of which have also gone the extra mile to describe in-memory technology as the future of business intelligence, the death blow to the data warehouse and the swan song of OLAP technology. I find one of these in my inbox every couple of weeks.
Just so it is clear - the concept of in-memory business intelligence is not new. It has been around for many years. The only reason it became widely known recently is that it wasn’t feasible before 64-bit computing became commonly available. Before 64-bit processors, the maximum amount of RAM a computer could utilize was barely 4GB, which is hardly enough to accommodate even the simplest of multi-user BI solutions. Only when 64-bit systems became cheap enough did it become possible to consider in-memory technology a practical option for BI.
The success of QlikTech and the relentless activities of Microsoft’s marketing machine have managed to confuse many in terms of what role in-memory technology plays in BI implementations. And that is why many of the articles out there, which are written by marketers or market analysts who are not proficient in the internal workings of database technology (and assume their readers aren’t either), are usually filled with inaccuracies and, in many cases, pure nonsense.
The purpose of this article is to put both in-memory and disk-based BI technologies in perspective, explain the differences between them and finally lay out, in simple terms, why disk-based BI technology isn’t on its way to extinction. Rather, disk-based BI technology is evolving into something that will significantly limit the use of in-memory technology in typical BI implementations.
But before we get to that, for the sake of those who are not very familiar with in-memory BI technology, here’s a brief introduction to the topic.
Disk and RAM
Generally speaking, your computer has two types of data storage mechanisms – disk (often called a hard disk) and RAM (random access memory). The important differences between them (for this discussion) are outlined in the following table:

|  | Disk | RAM |
| --- | --- | --- |
| Available capacity | Large (typically 15-100 times more than RAM) | Small |
| Read speed | Relatively slow | Much faster |
| Cost per GB | Cheap | Roughly 320 times more expensive |
| When the power goes down | Data is retained | Data is lost |
Most modern computers have 15-100 times more available disk storage than RAM. My laptop, for example, has 8GB of RAM and 300GB of available disk space. However, reading data from disk is much slower than reading the same data from RAM. This speed advantage is one of the reasons why 1GB of RAM costs approximately 320 times as much as 1GB of disk space.
Another important distinction is what happens to the data when the computer is powered down: data stored on disk is unaffected (which is why your saved documents are still there the next time you turn on your computer), but data residing in RAM is instantly lost. So, while you don’t have to re-create your disk-stored Microsoft Word documents after a reboot, you do have to re-load the operating system, re-launch the word processor and reload your document. This is because applications and their internal data are partly, if not entirely, stored in RAM while they are running.
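To make the speed gap concrete, here is a minimal, illustrative Python sketch (not a rigorous benchmark; the file name and data volume are arbitrary, and OS caching and interpreter overhead blur the numbers) that scans the same values once while they already sit in RAM and once after reloading them from a file on disk:

```python
# A minimal, illustrative sketch (not a rigorous benchmark): it sums the same
# values once while they already sit in RAM and once after loading them back
# from a file on disk. OS caching and interpreter overhead blur the numbers,
# but the gap between the two storage mechanisms is the point.
import os
import time
from array import array

values = array("q", range(5_000_000))          # ~40MB of 64-bit integers held in RAM

with open("values.bin", "wb") as f:             # persist the same data to disk
    values.tofile(f)

t0 = time.perf_counter()
ram_total = sum(values)                         # scan the copy that is already in RAM
t1 = time.perf_counter()

reloaded = array("q")
with open("values.bin", "rb") as f:             # load the data back from disk...
    reloaded.fromfile(f, len(values))
disk_total = sum(reloaded)                      # ...and then scan it
t2 = time.perf_counter()

assert ram_total == disk_total
print(f"RAM scan: {t1 - t0:.3f}s  |  disk load + scan: {t2 - t1:.3f}s")
os.remove("values.bin")
```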
Disk-based Databases and In-memory Databases
Now that we have a general idea of the basic differences between disk and RAM, what are the differences between disk-based and in-memory databases? Well, all data is always kept on disk (so that it is preserved even when the power goes down). When we talk about whether a database is disk-based or in-memory, we are talking about where the data resides while it is actively being queried by an application: with disk-based databases, the data is queried while it sits on disk; with in-memory databases, the data being queried is first loaded into RAM.
Disk-based databases are engineered to efficiently query data residing on the hard drive. At a very basic level, these databases assume that the entire dataset cannot fit inside the relatively small amount of RAM available, and therefore must perform very efficient disk reads in order for queries to return within a reasonable time frame. The engineers of such databases have the benefit of practically unlimited storage, but must face the challenge of relying on relatively slow disk operations.
In-memory databases, on the other hand, work under the opposite assumption: that the data can, in fact, fit entirely in RAM. Their engineers benefit from utilizing the fastest storage system a computer has (RAM), but have much less of it at their disposal.
That is the fundamental trade-off between disk-based and in-memory technologies: slower reads over practically unlimited amounts of data versus faster reads over limited amounts of data. These are two critical considerations for business intelligence applications, as it is important both to have fast query response times and to have access to as much data as possible.
The Data Challenge
A business intelligence solution (almost) always has a single data store at its center. This data store is usually called a database, data warehouse, data mart or OLAP cube. This is where the data that can be queried by the BI application is stored.
The challenge of creating this data store using traditional disk-based technologies is what gave in-memory technology its 15 minutes (ok, maybe 30 minutes) of fame. Having the entire data model stored inside RAM allowed in-memory products to bypass some of the challenges encountered by their disk-based counterparts, namely the issue of query response times, or ‘slow queries.’
By ‘traditional disk-based’ technologies, we typically mean relational database management systems (RDBMS) such as SQL Server, Oracle, MySQL and many others. It’s true that getting a BI solution to perform well with these types of databases as its backbone is far more challenging than simply shoving the entire data model into RAM, where performance gains are immediate simply because RAM is so much faster than disk.
It’s commonly thought that relational databases are too slow for BI queries over data in (or close to) its raw form because they are disk-based. The truth, however, is that the problem lies in how they use the disk and how often they use it.
Relational databases were designed with transactional processing in mind. But having a database support high-performance inserts and updates of transactions (i.e., rows in a table) while also properly accommodating the types of queries typically executed in BI solutions (e.g., aggregating, grouping, joining) is impossible. These are two mutually exclusive engineering goals; that is, they require completely different architectures at their very core. You simply can’t use the same approach to ideally achieve both.
In addition, the standard query language used to extract transactions from relational databases (SQL) is syntactically designed for the efficient fetching of rows, whereas BI queries rarely need to scan or retrieve an entire row of data. It is nearly impossible to formulate an efficient BI query using SQL syntax.
So while relational databases are great as the backbone of operational applications such as CRM, ERP or Web sites, where transactions are frequently and simultaneously inserted, they are a poor choice for supporting analytic applications which usually involve simultaneous retrieval of partial rows along with heavy calculations.
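To see the point in miniature, here is a hedged Python sketch with made-up sales data (it illustrates the row-versus-column idea only; it is not how any particular database engine is implemented):

```python
# A minimal Python sketch with made-up data (not any vendor's engine) showing why
# aggregation-heavy BI queries favor columnar access: summing one field touches
# every whole row in a row store, but only a single contiguous array in a column store.
rows = [                                         # row-oriented store: one record per transaction
    {"order_id": i, "customer": f"c{i % 100}", "region": "EMEA", "amount": i * 1.5}
    for i in range(1_000_000)
]

# Row-oriented scan: every record is visited just to read its 'amount' field.
row_store_total = sum(row["amount"] for row in rows)

# Column-oriented layout: each field lives in its own contiguous array,
# so the same aggregation scans only the column it actually needs.
amount_column = [row["amount"] for row in rows]
column_store_total = sum(amount_column)

assert row_store_total == column_store_total
```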
In-memory databases approach the querying problem by loading the entire dataset into RAM. In so doing, they remove the need to access the disk to run queries, thus gaining an immediate and substantial performance advantage (simply because scanning data in RAM is orders of magnitude faster than reading it from disk). Some of these databases introduce additional optimizations which further improve performance. Most of them also employ compression techniques to represent even more data in the same amount of RAM.
Regardless of what fancy footwork is used with an in-memory database, storing the entire dataset in RAM has a serious implication: the amount of data you can query with in-memory technology is limited by the amount of free RAM available, and there will always be much less available RAM than available disk space.
The bottom line is that this limited memory space hinders the quality and effectiveness of your BI application: the more historical data you can access and the more fields you can query, the better the analysis, insight and, well, intelligence you can get.
You could add more and more RAM, but then the hardware you require becomes exponentially more expensive. The fact that 64-bit computers are cheap and can theoretically support unlimited amounts of RAM does not mean they actually do in practice. A standard desktop-class (read: cheap) computer with standard hardware physically supports up to 12GB of RAM today. If you need more, you can move up to a different class of computer that costs about twice as much and allows up to 64GB. Beyond 64GB, you can no longer use what is categorized as a personal computer; you need a full-blown server, which brings you into very expensive computing territory.
It is also important to understand that the amount of RAM you need is determined not only by the amount of data you have, but also by the number of people querying it simultaneously. Having 5-10 people using the same in-memory BI application could easily double the amount of RAM required, because each concurrent query needs working space for the intermediate calculations that produce its results. A key success factor in most BI solutions is having a large number of users, so you need to tread carefully when considering in-memory technology for real-world BI. Otherwise, your hardware costs may spiral beyond what you are willing or able to spend (today, or in the future as your needs increase).
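As a rough illustration of this sizing exercise, the following Python sketch uses assumed figures (the compression ratio and per-user overhead are guesses for illustration, not vendor numbers or figures from this article) to show how quickly concurrent users inflate the RAM requirement:

```python
# A back-of-envelope sizing sketch under assumed figures: the compression ratio
# and per-user working-space overhead below are illustrative guesses, not numbers
# taken from any product.
def required_ram_gb(raw_gb, compression_ratio=3.0, concurrent_users=1, per_user_overhead=0.15):
    """Estimate RAM for an in-memory model plus working space for each concurrent query."""
    model_gb = raw_gb / compression_ratio
    return model_gb * (1 + concurrent_users * per_user_overhead)

for users in (1, 5, 10, 50):
    print(f"{users:>2} concurrent users -> ~{required_ram_gb(100, concurrent_users=users):.0f} GB of RAM")
# With 100GB of raw data, even one user exceeds the 12GB desktop ceiling in this
# toy model, and every additional concurrent user pushes the figure higher still.
```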
There are other implications of keeping your data model in memory, such as having to re-load it from disk into RAM every time the computer reboots, and not being able to use the machine for anything other than that particular data model because its RAM is all used up.
A Note about QlikView and PowerPivot In-memory Technologies
QlikTech is the most active in-memory BI player out there, so their QlikView in-memory technology is worth addressing in its own right. It has been repeatedly described as “unique, patented associative technology” but, in fact, there is nothing “associative” about QlikView’s in-memory technology. QlikView uses a simple tabular data model, stored entirely in memory, with basic token-based compression applied to it. In QlikView’s case, the word associative relates to the functionality of its user interface, not to how the data model is physically stored. Associative databases are a completely different beast and have nothing in common with QlikView’s technology.
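Token-based compression is, in essence, dictionary encoding: each distinct value is stored once and replaced by a small integer token wherever it repeats. The following Python sketch illustrates the general idea only; it is not QlikView’s actual storage format:

```python
# A minimal sketch of token-based (dictionary) encoding, the general idea behind the
# compression described above; illustrative only, not QlikView's actual format.
def dictionary_encode(column):
    """Replace each repeated value with a small integer token plus one lookup table."""
    dictionary = {}                   # distinct value -> token id
    tokens = []
    for value in column:
        if value not in dictionary:
            dictionary[value] = len(dictionary)
        tokens.append(dictionary[value])
    return dictionary, tokens

countries = ["US", "US", "DE", "US", "FR", "DE"] * 100_000   # a low-cardinality column
dictionary, tokens = dictionary_encode(countries)
print(len(dictionary), "distinct values,", len(tokens), "tokens")
# Storing three short strings once plus 600,000 small integers takes far less RAM
# than storing 600,000 repeated strings, which is why compression quality matters.
```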
PowerPivot uses a similar concept, but is engineered somewhat differently because it is meant to be used largely within Excel. In this respect, PowerPivot relies on a columnar approach to storage that is better suited both to the types of calculations conducted in Excel 2010 and to compression. Quality of compression is a significant differentiator between in-memory technologies, as better compression means you can store more data in the same amount of RAM (i.e., more data is available for users to query). In its current version, however, PowerPivot is still very limited in the amount of data it supports and requires a ridiculous amount of RAM.
The Present and Future Technologies
The destiny of BI lies in technologies that leverage the respective benefits of both disk-based and in-memory technologies to deliver fast query responses and extensive multi-user access without monstrous hardware requirements. Obviously, these technologies cannot be based on relational databases, but they must also not be designed to assume a massive amount of RAM, which is a very scarce resource.
These types of technologies are not theoretical anymore and are already utilized by businesses worldwide. Some are designed to distribute different portions of complex queries across multiple cheaper computers (this is a good option for cloud-based BI systems) and some are designed to take advantage of 21st-century hardware (multi-core architectures, upgraded CPU cache sizes, etc.) to extract more juice from off-the-shelf computers.
A Final Note: ElastiCube Technology
The technology developed by the company I co-founded, SiSense, belongs to the latter category. That is, SiSense utilizes technology which combines the best of disk-based and in-memory solutions, essentially eliminating the downsides of each. SiSense’s BI product, Prism, enables a standard PC to deliver a much wider variety of BI solutions, even when very large amounts of data, large numbers of users and/or large numbers of data sources are involved, as is the case in typical BI projects.
When we began our research at SiSense, our technological assumption was that it is possible to achieve in-memory-class query response times, even for hundreds of users simultaneously accessing massive data sets, while keeping the data (mostly) stored on disk. The result of our hybrid disk-based/in-memory technology is a BI solution based on what we now call ElastiCube, after which this blog is named. You can read more about this technological approach, which we call Just-in-Time In-memory Processing, at our BI Software Evolved technology page.