Aggregated Data Dilemma

Valuable performance and behavioral nuances can be buried in the aggregated data

Okay, I am weird (tell me something that I don’t know, say most of my friends).  For Christmas I wanted a Nike Apple Watch to go with my existing Fitbit and Garmin fitness trackers (I look sort of like a cyborg in the photo below…which is always cool).

While I was intrigued by the ability to do all sorts of cool things on the Apple Watch (like take a phone call and talk into my wrist watch like Dick Tracy), the thing that most intrigued me was the ability to buy third-party apps that could yield detailed exercise and health data.  I was hoping that this detailed exercise and health data could help me understand what effect particular behaviors or activities (or lack of particular behaviors and activities) were having on my overall health.

Why is this important to me?  You can thank articles like “Unexpected Heart Attack Triggers” for my health and exercise anxiety.  The article highlighted several things that can trigger a heart attack including:

  • Lack of sleep (definitely an issue, especially when I’m traveling so much)
  • Migraine Headaches (how can you work in technology and not have headaches)
  • Cold Weather (need to find more clients in warmer weather)
  • Big, Heavy Meals (with the exception of Chipotle, right?)
  • Getting Out of Bed in the Morning (see, I knew that was a big danger!!)
  • Alcohol (just like to drink a beer now and then)
  • Coffee (I drink Chai Tea Lattes, that’s technically not coffee, and I know that I shouldn’t admit that I drink Chai Tea Lattes)

So there are many items on that above list that could trigger a heart attack, and I enjoy many of the things on that list (like sleeping and eating and the occasional beer).  Consequently, I thought I’d put my data science experience to work to monitor my exercise and diet behaviors and predict potential health outcomes.

Personal Fitness Analytics
I tested the downloadable data from each of the three devices. The Fitbit offered the easiest way to download my fitness data (and I have TONS of useful fitness and diet tracking suggestions if anyone at Fitbit, Garmin or Apple ever reads this blog!!). The problem with the fitness data is that I can only get daily-level data (see Table 1).

Table 1:  Daily Fitness Tracking Data

I can add more external data to the aggregated fitness data (e.g., days of the week, days when I travel, how much I travel on those travel days) to come up with some simple plots.

For example, Figure 2 shows a visual correlation between the calories that I burn per step and the days that I travel.  My assumption is that I burn more calories per step when I am doing something that requires more exertion (like running or climbing steps), so it makes sense that on days when I am traveling, I have fewer opportunities for highly exertive activities.

Figure 2:  How Many Calories I Burn Per Step When Traveling
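
As a rough illustration of the comparison behind Figure 2, the sketch below averages calories per step separately for travel and non-travel days. The records, the field names (`steps`, `calories`, `travel`) and the values are all invented for illustration; they are not the actual Fitbit export.

```python
from statistics import mean

# Hypothetical daily records (invented values, not real tracker data).
daily = [
    {"date": "2017-01-02", "steps": 10000, "calories": 3000, "travel": False},
    {"date": "2017-01-03", "steps": 8000, "calories": 2400, "travel": False},
    {"date": "2017-01-04", "steps": 6000, "calories": 1500, "travel": True},
    {"date": "2017-01-05", "steps": 4000, "calories": 1000, "travel": True},
]


def average_by_travel(days):
    """Return the mean calories-per-step for travel and non-travel days."""
    groups = {True: [], False: []}
    for day in days:
        if day["steps"] > 0:  # skip days with no recorded steps
            groups[day["travel"]].append(day["calories"] / day["steps"])
    return {flag: mean(values) for flag, values in groups.items() if values}


for is_travel, value in average_by_travel(daily).items():
    label = "travel" if is_travel else "non-travel"
    print(f"{label}: {value:.3f} calories per step")
```

With only one row per day, this is about as far as the analysis can go: it compares groups of days, but says nothing about what happened within any single day.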

While this information is “interesting,” unfortunately, data at the aggregated daily level is not actionable.  If I had more detailed or granular fitness data, I’d like to chart what happens to my heart rate (and related stress levels):

  • During an airplane flight
  • When racing through an airport to catch a connecting flight
  • Waking up very early in the morning while traveling
  • Immediately after eating a large meal
  • While I’m doing my taxes (I hate doing my taxes)

The problem is that the data provided by my fitness band is aggregated to a level that is not actionable.  If I had my fitness data at 5 or 10-minute intervals, then I could more easily spot unusual health outcomes and determine (and eventually predict?) what behaviors (e.g., flying in an airplane, eating large meals, heavy exercise exertion, waking up extremely early) might be causing health concerns.
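
To make "spotting unusual health outcomes" a bit more concrete, here is a minimal sketch that flags a heart-rate reading when it strays far from the readings just before it. It assumes a hypothetical list of readings taken at 5-minute intervals; the window size and threshold are arbitrary choices, and a rolling z-score is only one of many possible ways to define "unusual."

```python
from statistics import mean, stdev


def flag_unusual(readings, window=12, threshold=3.0):
    """Return indices of readings that deviate from the mean of the
    preceding `window` readings by more than `threshold` standard deviations."""
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma > 0 and abs(readings[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged


# Hypothetical heart-rate readings (beats per minute) at 5-minute intervals.
heart_rate = [60, 62, 61, 63, 60, 62, 61, 63, 60, 62, 61, 63, 62, 110, 61]
print(flag_unusual(heart_rate))
```

Each flagged index could then be lined up against a log of activities (a flight, a large meal, an early wake-up) to look for candidate causes. That step is impossible when a whole day collapses into a single number.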

Power of Granular Data
Big Data and data science are all about granular data because valuable performance and behavioral nuances can be buried in the aggregated data.  For example, the chart in Figure 3 shows how additional performance nuances are being uncovered as we transition from a 5-minute to a 1-minute and finally to a 5-second interval in the capture of the performance data.

Figure 3:  Performance Nuances Uncovered in Granular Data

As the data gets more granular, the behavioral and performance nuances buried in the data start to surface. Data at the 5-minute and 1-minute intervals in Figure 3 tells you very little. Aggregated data is the anti-data science. Data at the 5-second interval highlights some potential performance concerns.  In this example, data at the 5-second interval starts to become actionable.
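
The effect is easy to reproduce with synthetic numbers. The sketch below builds a made-up 5-second series (flat at 70 with a 15-second spike to 150) and averages it into 1-minute and 5-minute buckets. The series and the `downsample` helper are illustrative assumptions, not the data behind Figure 3.

```python
from statistics import mean


def downsample(samples, factor):
    """Average consecutive groups of `factor` samples.
    A trailing partial group, if any, is averaged as well."""
    return [mean(samples[i:i + factor]) for i in range(0, len(samples), factor)]


# Synthetic 5-second series covering 10 minutes (120 samples).
five_second = [70] * 120
five_second[60:63] = [150, 150, 150]  # a 15-second spike

one_minute = downsample(five_second, 12)   # 12 five-second samples per minute
five_minute = downsample(five_second, 60)  # 60 five-second samples per 5 minutes

print(max(five_second), max(one_minute), max(five_minute))
```

The spike peaks at 150 in the 5-second data, shrinks to 90 in the 1-minute averages, and is nearly invisible at 74 in the 5-minute averages. The event is the same in all three; only the resolution differs.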

For example, I might notice an overly sedentary heart rate whenever I sit too long on a cross-country flight, or my stress level jumping whenever I get another “flight delayed” message while trying to catch a connecting flight. I might then learn to do some in-seat exercises and walk around during those long flights, or to practice controlled breathing and some simple yoga when enduring yet another flight delay (SFO airport does have a yoga room, and now I know why).

Preparing for an IoT World of Granular Data
Understanding the challenges of capturing and analyzing real-time granular machine and device-generated data will become even more critical as we move into the Internet of Things (IoT), where hundreds of sensors are kicking off tens, hundreds or even thousands of data points per minute.  This will force two specific challenges upon those of us coming from the more traditional human-generated big data world:

  • Real-time data capture and compression
  • Real-time analytics at the edge
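
One simple technique that touches both challenges is a deadband filter running on the device: a new sample is forwarded only when it differs from the last forwarded value by more than a tolerance. This is a sketch of one possible approach rather than anything a particular tracker or IoT platform is known to do; the `(timestamp, value)` tuple layout and the tolerance are assumptions.

```python
def deadband_compress(samples, tolerance):
    """Keep a (timestamp, value) sample only when its value differs from
    the last kept value by more than `tolerance`."""
    kept = []
    for timestamp, value in samples:
        if not kept or abs(value - kept[-1][1]) > tolerance:
            kept.append((timestamp, value))
    return kept


# Hypothetical (seconds, heart rate) samples captured every 5 seconds.
raw = [(0, 70), (5, 71), (10, 70), (15, 85), (20, 86), (25, 70)]
print(deadband_compress(raw, tolerance=5))
```

The trade-off is the one this post is about: the filter cuts the volume of data sent upstream, but any nuance smaller than the tolerance is lost, so the tolerance has to be chosen with the downstream analysis in mind.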

For my fitness focus, I might need to expand my Personal Fitness Analysis to capture and analyze more of this detailed data in (near) real-time so that I can become aware of behaviors that are hurting or improving my health and fitness.  Ultimately, my goal is to change my behaviors, but I need to understand (and quantify?) what behaviors lead to desirable health and fitness outcomes (e.g., improved blood pressure, lower weight, less stress).

The post Aggregated Data Dilemma appeared first on InFocus Blog | Dell EMC Services.

More Stories By William Schmarzo

Bill Schmarzo, author of “Big Data: Understanding How Data Powers Big Business”, is responsible for setting the strategy and defining the Big Data service line offerings and capabilities for the EMC Global Services organization. As part of Bill’s CTO charter, he is responsible for working with organizations to help them identify where and how to start their big data journeys. He has written several white papers, is an avid blogger, and is a frequent speaker on the use of Big Data and advanced analytics to power organizations’ key business initiatives. He also teaches the “Big Data MBA” at the University of San Francisco School of Management.

Bill has nearly three decades of experience in data warehousing, BI and analytics. Bill authored EMC’s Vision Workshop methodology that links an organization’s strategic business initiatives with their supporting data and analytic requirements, and co-authored with Ralph Kimball a series of articles on analytic applications. Bill has served on The Data Warehousing Institute’s faculty as the head of the analytic applications curriculum.

Previously, Bill was the Vice President of Advertiser Analytics at Yahoo and the Vice President of Analytic Applications at Business Objects.
