Welcome!

Machine Learning Authors: Carmen Gonzalez, Yeshim Deniz, Bob Gourley, Elizabeth White, Pat Romanski

Related Topics: Java IoT, Industrial IoT, Adobe Flex, Machine Learning , Recurring Revenue, Apache

Java IoT: Blog Feed Post

Debunking DRAM vs. Flash Controversy vis-a-vis In-Memory Processing

“Minimal” Performance Advantage of DRAM vs SSD

Wikibon produced an interesting material (looks like paid by Aerospike, NoSQL database recently emerged by resurrecting failed CitrusLeaf and acquihiring AlchemyDB, which product, of course, was recommended in the end) that compares NoSQL databases based on storing data in flash-based SSD vs. storing data in DRAM.

There are number of factual problems with that paper and I want to point them out.

Note that Wikibon doesn’t mention GridGain in this study (we are not a NoSQL datastore per-se after all) so I don’t have any bone in this game other than annoyance with biased and factually incorrect writing.

“Minimal” Performance Advantage of DRAM vs SSD
The paper starts with a simple statement “The minimal performance disadvantage of flash, relative to main memory…”. Minimal? I’ve seen number of studies where performance difference between SSDs and DRAM range form 100 to 10,000 times. For example, this University of California, Berkeley study claims that SSD bring almost no advantage to the Facebook Hadoop cluster and DRAM pre-caching is the way forward.

Let me provide even shorter explanation. Assuming we are dealing with Java – SSD devices are visible to Java application as typical block devices, and therefore accessed as such. It means that a typical object read from such device involves the same steps as reading this object from a file: hardware I/O subsystem, OS I/O subsystem, OS buffering, Java I/O subsystem & buffering, Java deserialization and induced GC. And… if you read the same object from DRAM – it involves few bytecode instructions – and that’s it.

Native C/C++ apps (like MongoDB) can take a slightly quicker route with memory mapped files (or various other IPC methods) – but the performance increase will not be significant (for obvious reason of needing to read/swap the entire pages vs. single object access pattern in DRAM).

Yet another recent technical explanation of the disadvantages of SSD storage can be found here (talking about Oracle’s “in-memory” strategy).

MongoDB, Cassandra, CouchDB DRAM-based?
Amid all the confusion on this topic it’s no wonder the author got it wrong. Neither MongoDB, Cassandra or CouchDB are in-memory systems. They are disk-based systems with support for memory caching. There’s nothing wrong with that and nothing new – every database developed in the last 25 years naturally provides in-memory caching to augment it’s main disk storage.

The fundamental difference here is that in-memory data systems like GridGain, SAP HAHA, GigaSpaces, GemFire, SqlFire, MemSQL, VoltDB, etc. use DRAM (memory) as the main storage medium and use disk for optional durability and overflow. This focus on RAM-based storage allows to completely re-optimized all main algorithms used in these systems.

For example, ACID implementation in GridGain that provides support for full-featured distributed ACID transactions beats every NoSQL database (EC-based) out there in read and even write performance: there are no single key limitations, no consistency trade offs to make, no application-side MVCC, no user-based conflict resolutions or other crutches – it just works the same way as it works in Oracle or DB2 – but faster.

2TB Cluster for $1.2M :)
If there was on piece in the original paper that was completely made up to fit the predefined narrative it was a price comparison. If the author thinks that 2TB RAM cluster costs $1.2M today – I have not one but two Golden Gate bridges to sell just for him…

Let’s see. A typical Dell/HP/IBM/Cisco blade with 256GB of DRAM will cost below $20K if you just buy on the list prices (Cisco seems to offer the best prices starting at around $15K for 256GB blades). That brings the total cost of 2TB cluster well below $200K (with all network and power equipment included and 100s TBs of disk storage).

Is this more expensive that SSD only cluster? Yes, by 2.5-3x times more expensive. But you are getting dramatic performance increase with the right software that more than justifies that price increase.

Conclusion
2-3x times price difference is nonetheless important and it provides our customers a very clear choice. If price is an issue and high performance is not – there are disk-based systems of wide varieties. If high performance and sub-second response on processing TBs of data is required – the hardware will be proportionally more expensive.

However, with 1GB of DRAM costing less than 1 USD and DRAM prices dropping 30% every 18 months – the era of disks (flash or spinning) is clearly coming to its logical end. It’s normal… it’s a progress and we all need to learn how to adapt.

Has anyone seen tape drives lately?

Read the original blog entry...

More Stories By Thomas Krafft

Over 15 years of experience in marketing and demand creation, with strategies driving over $500 million in revenue for a variety of companies in several high-growth and competitive markets, including consumer software and web services, ecommerce, demand creation through web and search, big data, and now healthcare.

@CloudExpo Stories
Bert Loomis was a visionary. This general session will highlight how Bert Loomis and people like him inspire us to build great things with small inventions. In their general session at 19th Cloud Expo, Harold Hannon, Architect at IBM Bluemix, and Michael O'Neill, Strategic Business Development at Nvidia, discussed the accelerating pace of AI development and how IBM Cloud and NVIDIA are partnering to bring AI capabilities to "every day," on-demand. They also reviewed two "free infrastructure" pr...
New competitors, disruptive technologies, and growing expectations are pushing every business to both adopt and deliver new digital services. This ‘Digital Transformation’ demands rapid delivery and continuous iteration of new competitive services via multiple channels, which in turn demands new service delivery techniques – including DevOps. In this power panel at @DevOpsSummit 20th Cloud Expo, moderated by DevOps Conference Co-Chair Andi Mann, panelists will examine how DevOps helps to meet th...
With major technology companies and startups seriously embracing IoT strategies, now is the perfect time to attend @ThingsExpo 2016 in New York. Learn what is going on, contribute to the discussions, and ensure that your enterprise is as "IoT-Ready" as it can be! Internet of @ThingsExpo, taking place June 6-8, 2017, at the Javits Center in New York City, New York, is co-located with 20th Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry p...
With billions of sensors deployed worldwide, the amount of machine-generated data will soon exceed what our networks can handle. But consumers and businesses will expect seamless experiences and real-time responsiveness. What does this mean for IoT devices and the infrastructure that supports them? More of the data will need to be handled at - or closer to - the devices themselves.
Grape Up is a software company, specialized in cloud native application development and professional services related to Cloud Foundry PaaS. With five expert teams that operate in various sectors of the market across the USA and Europe, we work with a variety of customers from emerging startups to Fortune 1000 companies.
Financial Technology has become a topic of intense interest throughout the cloud developer and enterprise IT communities. Accordingly, attendees at the upcoming 20th Cloud Expo at the Javits Center in New York, June 6-8, 2017, will find fresh new content in a new track called FinTech.
@DevOpsSummit at Cloud taking place June 6-8, 2017, at Javits Center, New York City, is co-located with the 20th International Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The widespread success of cloud computing is driving the DevOps revolution in enterprise IT. Now as never before, development teams must communicate and collaborate in a dynamic, 24/7/365 environment. There is no time to wait for long developm...
SYS-CON Events announced today that Interoute, owner-operator of one of Europe's largest networks and a global cloud services platform, has been named “Bronze Sponsor” of SYS-CON's 20th Cloud Expo, which will take place on June 6-8, 2017 at the Javits Center in New York, New York. Interoute is the owner-operator of one of Europe's largest networks and a global cloud services platform which encompasses 12 data centers, 14 virtual data centers and 31 colocation centers, with connections to 195 add...
DevOps is often described as a combination of technology and culture. Without both, DevOps isn't complete. However, applying the culture to outdated technology is a recipe for disaster; as response times grow and connections between teams are delayed by technology, the culture will die. A Nutanix Enterprise Cloud has many benefits that provide the needed base for a true DevOps paradigm.
Today we can collect lots and lots of performance data. We build beautiful dashboards and even have fancy query languages to access and transform the data. Still performance data is a secret language only a couple of people understand. The more business becomes digital the more stakeholders are interested in this data including how it relates to business. Some of these people have never used a monitoring tool before. They have a question on their mind like “How is my application doing” but no id...
SYS-CON Events announced today that Juniper Networks (NYSE: JNPR), an industry leader in automated, scalable and secure networks, will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Juniper Networks challenges the status quo with products, solutions and services that transform the economics of networking. The company co-innovates with customers and partners to deliver automated, scalable and secure network...
The age of Digital Disruption is evolving into the next era – Digital Cohesion, an age in which applications securely self-assemble and deliver predictive services that continuously adapt to user behavior. Information from devices, sensors and applications around us will drive services seamlessly across mobile and fixed devices/infrastructure. This evolution is happening now in software defined services and secure networking. Four key drivers – Performance, Economics, Interoperability and Trust ...
Cloud Expo, Inc. has announced today that Aruna Ravichandran, vice president of DevOps Product and Solutions Marketing at CA Technologies, has been named co-conference chair of DevOps at Cloud Expo 2017. The @DevOpsSummit at Cloud Expo New York will take place on June 6-8, 2017, at the Javits Center in New York City, New York, and @DevOpsSummit at Cloud Expo Silicon Valley will take place Oct. 31-Nov. 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
In recent years, containers have taken the world by storm. Companies of all sizes and industries have realized the massive benefits of containers, such as unprecedented mobility, higher hardware utilization, and increased flexibility and agility; however, many containers today are non-persistent. Containers without persistence miss out on many benefits, and in many cases simply pass the responsibility of persistence onto other infrastructure, adding additional complexity.
SYS-CON Events announced today that Hitachi Data Systems, a wholly owned subsidiary of Hitachi LTD., will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City. Hitachi Data Systems (HDS) will be featuring the Hitachi Content Platform (HCP) portfolio. This is the industry’s only offering that allows organizations to bring together object storage, file sync and share, cloud storage gateways, and sophisticated search an...
Automation is enabling enterprises to design, deploy, and manage more complex, hybrid cloud environments. Yet the people who manage these environments must be trained in and understanding these environments better than ever before. A new era of analytics and cognitive computing is adding intelligence, but also more complexity, to these cloud environments. How smart is your cloud? How smart should it be? In this power panel at 20th Cloud Expo, moderated by Conference Chair Roger Strukhoff, pane...
@GonzalezCarmen has been ranked the Number One Influencer and @ThingsExpo has been named the Number One Brand in the “M2M 2016: Top 100 Influencers and Brands” by Analytic. Onalytica analyzed tweets over the last 6 months mentioning the keywords M2M OR “Machine to Machine.” They then identified the top 100 most influential brands and individuals leading the discussion on Twitter.
SYS-CON Events announced today that Twistlock, the leading provider of cloud container security solutions, will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Twistlock is the industry's first enterprise security suite for container security. Twistlock's technology addresses risks on the host and within the application of the container, enabling enterprises to consistently enforce security policies, monitor...
@ThingsExpo has been named the Most Influential ‘Smart Cities - IIoT' Account and @BigDataExpo has been named fourteenth by Right Relevance (RR), which provides curated information and intelligence on approximately 50,000 topics. In addition, Right Relevance provides an Insights offering that combines the above Topics and Influencers information with real time conversations to provide actionable intelligence with visualizations to enable decision making. The Insights service is applicable to eve...
Multiple data types are pouring into IoT deployments. Data is coming in small packages as well as enormous files and data streams of many sizes. Widespread use of mobile devices adds to the total. In this power panel at @ThingsExpo, moderated by Conference Chair Roger Strukhoff, panelists will look at the tools and environments that are being put to use in IoT deployments, as well as the team skills a modern enterprise IT shop needs to keep things running, get a handle on all this data, and deli...