Welcome!

Machine Learning Authors: Pat Romanski, Elizabeth White, Yeshim Deniz, Rene Buest, Nate Vickery

Related Topics: Java IoT, Industrial IoT, Adobe Flex, Machine Learning , Recurring Revenue, Apache

Java IoT: Blog Feed Post

Debunking DRAM vs. Flash Controversy vis-a-vis In-Memory Processing

“Minimal” Performance Advantage of DRAM vs SSD

Wikibon produced an interesting material (looks like paid by Aerospike, NoSQL database recently emerged by resurrecting failed CitrusLeaf and acquihiring AlchemyDB, which product, of course, was recommended in the end) that compares NoSQL databases based on storing data in flash-based SSD vs. storing data in DRAM.

There are number of factual problems with that paper and I want to point them out.

Note that Wikibon doesn’t mention GridGain in this study (we are not a NoSQL datastore per-se after all) so I don’t have any bone in this game other than annoyance with biased and factually incorrect writing.

“Minimal” Performance Advantage of DRAM vs SSD
The paper starts with a simple statement “The minimal performance disadvantage of flash, relative to main memory…”. Minimal? I’ve seen number of studies where performance difference between SSDs and DRAM range form 100 to 10,000 times. For example, this University of California, Berkeley study claims that SSD bring almost no advantage to the Facebook Hadoop cluster and DRAM pre-caching is the way forward.

Let me provide even shorter explanation. Assuming we are dealing with Java – SSD devices are visible to Java application as typical block devices, and therefore accessed as such. It means that a typical object read from such device involves the same steps as reading this object from a file: hardware I/O subsystem, OS I/O subsystem, OS buffering, Java I/O subsystem & buffering, Java deserialization and induced GC. And… if you read the same object from DRAM – it involves few bytecode instructions – and that’s it.

Native C/C++ apps (like MongoDB) can take a slightly quicker route with memory mapped files (or various other IPC methods) – but the performance increase will not be significant (for obvious reason of needing to read/swap the entire pages vs. single object access pattern in DRAM).

Yet another recent technical explanation of the disadvantages of SSD storage can be found here (talking about Oracle’s “in-memory” strategy).

MongoDB, Cassandra, CouchDB DRAM-based?
Amid all the confusion on this topic it’s no wonder the author got it wrong. Neither MongoDB, Cassandra or CouchDB are in-memory systems. They are disk-based systems with support for memory caching. There’s nothing wrong with that and nothing new – every database developed in the last 25 years naturally provides in-memory caching to augment it’s main disk storage.

The fundamental difference here is that in-memory data systems like GridGain, SAP HAHA, GigaSpaces, GemFire, SqlFire, MemSQL, VoltDB, etc. use DRAM (memory) as the main storage medium and use disk for optional durability and overflow. This focus on RAM-based storage allows to completely re-optimized all main algorithms used in these systems.

For example, ACID implementation in GridGain that provides support for full-featured distributed ACID transactions beats every NoSQL database (EC-based) out there in read and even write performance: there are no single key limitations, no consistency trade offs to make, no application-side MVCC, no user-based conflict resolutions or other crutches – it just works the same way as it works in Oracle or DB2 – but faster.

2TB Cluster for $1.2M :)
If there was on piece in the original paper that was completely made up to fit the predefined narrative it was a price comparison. If the author thinks that 2TB RAM cluster costs $1.2M today – I have not one but two Golden Gate bridges to sell just for him…

Let’s see. A typical Dell/HP/IBM/Cisco blade with 256GB of DRAM will cost below $20K if you just buy on the list prices (Cisco seems to offer the best prices starting at around $15K for 256GB blades). That brings the total cost of 2TB cluster well below $200K (with all network and power equipment included and 100s TBs of disk storage).

Is this more expensive that SSD only cluster? Yes, by 2.5-3x times more expensive. But you are getting dramatic performance increase with the right software that more than justifies that price increase.

Conclusion
2-3x times price difference is nonetheless important and it provides our customers a very clear choice. If price is an issue and high performance is not – there are disk-based systems of wide varieties. If high performance and sub-second response on processing TBs of data is required – the hardware will be proportionally more expensive.

However, with 1GB of DRAM costing less than 1 USD and DRAM prices dropping 30% every 18 months – the era of disks (flash or spinning) is clearly coming to its logical end. It’s normal… it’s a progress and we all need to learn how to adapt.

Has anyone seen tape drives lately?

Read the original blog entry...

More Stories By Thomas Krafft

Over 15 years of experience in marketing and demand creation, with strategies driving over $500 million in revenue for a variety of companies in several high-growth and competitive markets, including consumer software and web services, ecommerce, demand creation through web and search, big data, and now healthcare.

@CloudExpo Stories
"As we've gone out into the public cloud we've seen that over time we may have lost a few things - we've lost control, we've given up cost to a certain extent, and then security, flexibility," explained Steve Conner, VP of Sales at Cloudistics,in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
Internet of @ThingsExpo, taking place October 31 - November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with 21st Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The Internet of Things (IoT) is the most profound change in personal and enterprise IT since the creation of the Worldwide Web more than 20 years ago. All major researchers estimate there will be tens of billions devic...
"The Striim platform is a full end-to-end streaming integration and analytics platform that is middleware that covers a lot of different use cases," explained Steve Wilkes, Founder and CTO at Striim, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
"We want to show that our solution is far less expensive with a much better total cost of ownership so we announced several key features. One is called geo-distributed erasure coding, another is support for KVM and we introduced a new capability called Multi-Part," explained Tim Desai, Senior Product Marketing Manager at Hitachi Data Systems, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
"With Digital Experience Monitoring what used to be a simple visit to a web page has exploded into app on phones, data from social media feeds, competitive benchmarking - these are all components that are only available because of some type of digital asset," explained Leo Vasiliou, Director of Web Performance Engineering at Catchpoint Systems, in this SYS-CON.tv interview at DevOps Summit at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
21st International Cloud Expo, taking place October 31 - November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA, will feature technical sessions from a rock star conference faculty and the leading industry players in the world. Cloud computing is now being embraced by a majority of enterprises of all sizes. Yesterday's debate about public vs. private has transformed into the reality of hybrid cloud: a recent survey shows that 74% of enterprises have a hybrid cloud strategy. Me...
SYS-CON Events announced today that DXWorldExpo has been named “Global Sponsor” of SYS-CON's 21st International Cloud Expo, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Digital Transformation is the key issue driving the global enterprise IT business. Digital Transformation is most prominent among Global 2000 enterprises and government institutions.
SYS-CON Events announced today that Datera, that offers a radically new data management architecture, has been named "Exhibitor" of SYS-CON's 21st International Cloud Expo ®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Datera is transforming the traditional datacenter model through modern cloud simplicity. The technology industry is at another major inflection point. The rise of mobile, the Internet of Things, data storage and Big...
Kubernetes is an open source system for automating deployment, scaling, and management of containerized applications. Kubernetes was originally built by Google, leveraging years of experience with managing container workloads, and is now a Cloud Native Compute Foundation (CNCF) project. Kubernetes has been widely adopted by the community, supported on all major public and private cloud providers, and is gaining rapid adoption in enterprises. However, Kubernetes may seem intimidating and complex ...
SYS-CON Events announced today that Calligo, an innovative cloud service provider offering mid-sized companies the highest levels of data privacy and security, has been named "Bronze Sponsor" of SYS-CON's 21st International Cloud Expo ®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Calligo offers unparalleled application performance guarantees, commercial flexibility and a personalised support service from its globally located cloud plat...
"We focus on SAP workloads because they are among the most powerful but somewhat challenging workloads out there to take into public cloud," explained Swen Conrad, CEO of Ocean9, Inc., in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
"Outscale was founded in 2010, is based in France, is a strategic partner to Dassault Systémes and has done quite a bit of work with divisions of Dassault," explained Jackie Funk, Digital Marketing exec at Outscale, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
"We are still a relatively small software house and we are focusing on certain industries like FinTech, med tech, energy and utilities. We help our customers with their digital transformation," noted Piotr Stawinski, Founder and CEO of EARP Integration, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
"I think DevOps is now a rambunctious teenager – it’s starting to get a mind of its own, wanting to get its own things but it still needs some adult supervision," explained Thomas Hooker, VP of marketing at CollabNet, in this SYS-CON.tv interview at DevOps Summit at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
"We've been engaging with a lot of customers including Panasonic, we've been involved with Cisco and now we're working with the U.S. government - the Department of Homeland Security," explained Peter Jung, Chief Product Officer at Pulzze Systems, in this SYS-CON.tv interview at @ThingsExpo, held June 6-8, 2017, at the Javits Center in New York City, NY.
There is a huge demand for responsive, real-time mobile and web experiences, but current architectural patterns do not easily accommodate applications that respond to events in real time. Common solutions using message queues or HTTP long-polling quickly lead to resiliency, scalability and development velocity challenges. In his session at 21st Cloud Expo, Ryland Degnan, a Senior Software Engineer on the Netflix Edge Platform team, will discuss how by leveraging a reactive stream-based protocol,...
"We're here to tell the world about our cloud-scale infrastructure that we have at Juniper combined with the world-class security that we put into the cloud," explained Lisa Guess, VP of Systems Engineering at Juniper Networks, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
Your homes and cars can be automated and self-serviced. Why can't your storage? From simply asking questions to analyze and troubleshoot your infrastructure, to provisioning storage with snapshots, recovery and replication, your wildest sci-fi dream has come true. In his session at @DevOpsSummit at 20th Cloud Expo, Dan Florea, Director of Product Management at Tintri, provided a ChatOps demo where you can talk to your storage and manage it from anywhere, through Slack and similar services with...
"I'm here to leverage my secret sauce, which is using outsourced development and the company that I utilize is delaPlex Software and they've basically allowed me to win Fortune 500 companies," noted Justin Witz, CTO of FRA and PlanTools, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
"We are an IT services solution provider and we sell software to support those solutions. Our focus and key areas are around security, enterprise monitoring, and continuous delivery optimization," noted John Balsavage, President of A&I Solutions, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.